Sunday, September 24, 2017

Latest Foxhound 4 Patches

Two new patches have been posted recently on the Foxhound 4 page:

[ Download Patch 3 ]  Fix "unable to load dll" messages:
 Could not load dynamic library
 Unable to load either rroad4.dll or rroad464.dll
[ Patch 3 Readme ]
[ Download Patch 4 ]  Fix Display Schema for billion-row tables:
 Value nnn out of range for destination
 [ Patch 4 Readme ]
[ Download Patch 5 ]   NEW Prevent Foxhound from becoming unresponsive
 when the Monitor cannot connect to a network target database
 and an endless loop of bogus error messages appears:
  Attempt to reload definition for event "rroad_monitor_sample_loop"
  failed due to concurrent event execution.
 [ Patch 5 Readme ]
[ Download Patch 8 ]  Prevent Foxhound from becoming unresponsive
 when Ping-Only Sampling is stopped and restarted, or
 when full sampling is attempted for an arbiter server.
 [ Patch 8 Readme ]
[ Download Patch 10 ]   NEW Prevent Foxhound from becoming unresponsive
 on startup, especially when the Foxhound database is very large.
 [ Patch 10 Readme ]

The two new patches are more likely to help Foxhound users than the others:

Patch 5 will help folks who are using recent builds of SQL Anywhere 16 and 17 to run Foxhound. In particular, it implements a workaround for a bug behavior change introduced after SQL Anywhere 16.0.0.2193 and SQL Anywhere 17.0.0.1359.

Patch 10 will help anyone with a Foxhound database that has grown into the multi-gigabyte range by decreasing the time it takes for Foxhound to start up. For example, here are the test results for a 1.4G database:
Before applying Patch 10:  Foxhound startup time: 10.8s
After:                     Foxhound startup time:  1.1s
That's a factor of 10, but... you might not notice a difference of only 9 seconds.

However, you will notice the effect with a larger Foxhound database. For example, here's what happened with a 35G database:
Before applying Patch 10:  Foxhound startup time: 7m 17s ...plus 30 minutes more for the Foxhound Menu to appear!
After:                     Foxhound startup time: 4.6s
If you see this message every time you start Foxhound 4, you probably need to apply Patch 10:
Foxhound was not available yet, probably because the Foxhound engine had not finished starting up.
It may be ready now, so click here to try again.

If the situation persists contact RisingRoad.
If that message goes away after a long while, replaced by a blank page, and you still don't see the Foxhound menu appear, you definitely need Patch 10... especially if Foxhound's instance of dbsrv16.exe is using up this much CPU:

The "Foxhound startup time: 7m 17s" only tells part of the story in the example above... after Foxhound finished starting up, it took a further 30 minutes before the menu page appeared!

How Patch 10 Was Created...
Step 1: Find the Queries From Hell


The first step was to start SQL Anywhere's Procedure Profiler as soon as Foxhound started executing, by adding this statement to DatabaseStart event script:
CALL sa_server_option ( 'ProcedureProfiling', 'YES' );
Then, after the Foxhound menu finally appeared, this dbisql query showed which SQL statements took the longest to execute:
SELECT * 
  FROM sa_procedure_profile()
 where millisecs >= 100
 ORDER BY object_name, line_num;
Here's the row that shows line 102 in the rroad_monitor_list_html stored procedure took 473,628 milliseconds to run, which is 8 minutes:
object_type, object_name,               owner_name, table_name, line_num, executions, millisecs, percentage,       foreign_owner, foreign_table
'F',         'rroad_monitor_list_html', 'DBA',      ,           102,      1,          473628,    99.9972553051784, ,
Here's how to find line 102 that stored procedure:
Tip: You can't use Sybase Central to view the source code because the version it displays (SYSPROCEDURE.source) isn't the version that executes (SYSPROCEDURE.proc_defn), so you have use an Old School query.
UNLOAD SELECT proc_defn 
         FROM SYSPROCEDURE 
        WHERE proc_name = 'rroad_monitor_list_html' 
    TO 'C:\temp2\\temp_rroad_monitor_list_html.txt' 
    DELIMITED BY '' ESCAPES OFF HEXADECIMAL OFF QUOTES OFF;
Here's the Query From Hell at line 102:
select "rroad_alert_union"."sampling_id" as "sampling_id",
  "COUNT"(distinct "rroad_alert_union"."alert_number") as "active_alert_count",
  "LIST"(
  distinct "STRING"(
  '#',
  "rroad_alert_union"."alert_number"),
  ', ' order by
  "rroad_alert_union"."alert_number" asc) as "active_alert_number_list"
  into local temporary table "active_alert_count"
  from "rroad_alert_union"
  where "rroad_alert_union"."record_type" = 'Alert'
  and "rroad_alert_union"."alert_is_clear_or_cancelled" = 'N'
  group by "rroad_alert_union"."sampling_id";

Step 2: Study the Query Plans From Hell


The next step was to copy and paste the Query From Hell into dbisql and click on Tools - Plan Viewer.

In this case, the "Main Query" didn't look too bad, just one table scan for 77K rows, but the subquery called "SubQ 5" was truly breathtaking... a 77K table scan repeated 77K times!

SubQ 5 was found lurking down inside a view, and there was no apparent way to make it run faster; no indexes helped, nor did a materialized view:
LEFT OUTER JOIN ( SELECT *, 
                         'Y' AS alert_is_clear
                    FROM rroad_all_clear 
                ) AS rroad_all_clear
             ON rroad_all_clear.sampling_id        = rroad_alert.sampling_id
            AND rroad_all_clear.alert_number       = rroad_alert.alert_number
            AND rroad_all_clear.alert_in_effect_at = rroad_alert.alert_in_effect_at 

The only solution was eliminate SubQ 5 altogether by adding a new derived (redundant) column to one of the tables.

Step 3: Fix the Queries From Hell


Patch 10 actually introduces two new derived columns, for two separate Queries From Hell; here's what the code looks like.

The first derived column...
ALTER TABLE rroad_alert 
   ADD alert_is_clear_or_cancelled   VARCHAR ( 1 ) NOT NULL DEFAULT 'N';
allowed this slow query code...
CREATE VIEW rroad_alert_union AS
SELECT CAST ( 'Alert' AS VARCHAR ( 20 ) )       AS record_type,
       ...
       COALESCE ( 
          rroad_all_clear.alert_is_clear, 
          rroad_alert_cancelled.alert_is_cancelled, 
          'N' )                                 AS alert_is_clear_or_cancelled
  FROM rroad_alert
          INNER JOIN rroad_sampling_options
                  ON rroad_sampling_options.sampling_id = rroad_alert.sampling_id
          LEFT OUTER JOIN ( SELECT *, 
                                   'Y' AS alert_is_clear
                              FROM rroad_all_clear 
                          ) AS rroad_all_clear
                       ON rroad_all_clear.sampling_id        = rroad_alert.sampling_id
                      AND rroad_all_clear.alert_number       = rroad_alert.alert_number
                      AND rroad_all_clear.alert_in_effect_at = rroad_alert.alert_in_effect_at 
          LEFT OUTER JOIN ( SELECT *,  
                                   'Y' AS alert_is_cancelled
                              FROM rroad_alert_cancelled 
                          ) AS rroad_alert_cancelled
                       ON rroad_alert_cancelled.sampling_id        =  rroad_alert.sampling_id
                      AND rroad_alert_cancelled.alert_number       =  rroad_alert.alert_number
                      AND rroad_alert_cancelled.alert_in_effect_at = rroad_alert.alert_in_effect_at
to be simplified (and speeded up) by eliminating both outer join subqueries (including SUBQ 5)...
CREATE VIEW rroad_alert_union AS
SELECT CAST ( 'Alert' AS VARCHAR ( 20 ) )       AS record_type,
       ...
       rroad_alert.alert_is_clear_or_cancelled  AS alert_is_clear_or_cancelled
  FROM rroad_alert
          INNER JOIN rroad_sampling_options
                  ON rroad_sampling_options.sampling_id = rroad_alert.sampling_id
Two new triggers were required to maintain the new column:
CREATE OR REPLACE TRIGGER tri_rroad_all_clear
   AFTER INSERT ON rroad_all_clear
   REFERENCING NEW AS new_rroad_all_clear
   FOR EACH ROW
BEGIN
   UPDATE rroad_alert
      SET rroad_alert.alert_is_clear_or_cancelled = 'Y'
    WHERE rroad_alert.sampling_id        = new_rroad_all_clear.sampling_id
      AND rroad_alert.alert_number       = new_rroad_all_clear.alert_number
      AND rroad_alert.alert_in_effect_at = new_rroad_all_clear.alert_in_effect_at;
END;

CREATE OR REPLACE TRIGGER tri_rroad_alert_cancelled
   AFTER INSERT ON rroad_alert_cancelled
   REFERENCING NEW AS new_rroad_alert_cancelled
   FOR EACH ROW
BEGIN
   UPDATE rroad_alert
      SET rroad_alert.alert_is_clear_or_cancelled = 'Y'
    WHERE rroad_alert.sampling_id        = new_rroad_alert_cancelled.sampling_id
      AND rroad_alert.alert_number       = new_rroad_alert_cancelled.alert_number
      AND rroad_alert.alert_in_effect_at = new_rroad_alert_cancelled.alert_in_effect_at;
END;
Patch 10 contains another derived column...
ALTER TABLE rroad_sampling_options 
   ADD latest_sample_set_number      UNSIGNED BIGINT NOT NULL DEFAULT 0;
which allowed this slow query code...
LEFT OUTER JOIN ( SELECT rroad_group_1_property_pivot.*
                    FROM rroad_group_1_property_pivot
                         INNER JOIN ( SELECT rroad_group_1_property_pivot.sampling_id               AS sampling_id,
                                             MAX ( rroad_group_1_property_pivot.sample_set_number ) AS sample_set_number
                                        FROM rroad_group_1_property_pivot
                                       GROUP BY rroad_group_1_property_pivot.sampling_id 
                                    ) AS latest_primary_key
                                 ON latest_primary_key.sampling_id       = rroad_group_1_property_pivot.sampling_id
                                AND latest_primary_key.sample_set_number = rroad_group_1_property_pivot.sample_set_number
                 ) AS rroad_group_1_property_pivot
             ON rroad_group_1_property_pivot.sampling_id = rroad_sampling_options.sampling_id
to be simplified (and speeded up) by eliminating the complex inner join subquery:
LEFT OUTER JOIN rroad_group_1_property_pivot
             ON rroad_group_1_property_pivot.sampling_id       = rroad_sampling_options.sampling_id
            AND rroad_group_1_property_pivot.sample_set_number = rroad_sampling_options.latest_sample_set_number
Two existing triggers had to be modified to maintain the second derived column:
ALTER TRIGGER tri_rroad_sample_set
   AFTER INSERT ON rroad_sample_set
   REFERENCING NEW AS new_rroad_sample_set
   FOR EACH ROW
BEGIN
...
UPDATE rroad_sampling_options
   SET rroad_sampling_options.latest_sample_set_number = new_rroad_sample_set.sample_set_number
 WHERE rroad_sampling_options.sampling_id              = new_rroad_sample_set.sampling_id
   AND rroad_sampling_options.latest_sample_set_number < new_rroad_sample_set.sample_set_number;
...
END; -- tri_rroad_sample_set

ALTER TRIGGER tru_rroad_sample_set
   BEFORE UPDATE OF sample_finished_at ON rroad_sample_set
   REFERENCING OLD AS old_rroad_sample_set NEW AS new_rroad_sample_set
   FOR EACH ROW
BEGIN
...
IF new_rroad_sample_set.sample_set_number <> old_rroad_sample_set.sample_set_number THEN

   UPDATE rroad_sampling_options
      SET rroad_sampling_options.latest_sample_set_number = new_rroad_sample_set.sample_set_number
    WHERE rroad_sampling_options.sampling_id              = new_rroad_sample_set.sampling_id
      AND rroad_sampling_options.latest_sample_set_number < new_rroad_sample_set.sample_set_number;

END IF;
...
END; -- tru_rroad_sample_set