Monday, March 17, 2008

CREATE EVENT ... TYPE DEADLOCK

Panorama has introduced a new EVENT type for catching deadlocks. The most common type of deadlock is cyclical deadlock, and it happens when each connection has a lock on something the other connection wants.

For example, let's say connection 1 has updated row 1 in a table, and connection 2 has updated row 2. If connection 1 tries to update row 2, it will be blocked by the lock held by the other connection... that's a fairly common occurrence, no big deal unless the other user goes out to lunch without committing.

But let's suppose connection 2 now tries to update row 1... suddenly both connections are blocked, neither can do anything at all, not even commit or rollback. They are completely stuck, in what is sometimes called a "deadly embrace" or deadlock.

Cyclical deadlocks are almost always the result of application program design errors. If two different pieces of code can update the same set of two or more rows, usually in two different tables, they should perform the updates in the same order: UPDATE A first, then UPDATE B, in both pieces of code. That way, when two connections collide, one of the connections will get to the UPDATE A first, and the other connection will be blocked when it reaches UPDATE A... before it gets any locks that might block the first connection... blocking good, deadlocking bad.
SQL Anywhere instantly detects and resolves deadlocks: it executes a ROLLBACK on the last connection that became blocked, and returns an almost-useless error message to the connection that was picked as the victim. Once that happens, all further trace of the deadlock vanishes, and unless you make the effort to record them it's gonna be hard for you to figure out what happened.
"Almost-useless" might sound a bit harsh, but all it says is "deadlock detected"; scroll down to see an example. Even if it did include extra diagnostic information, a message sent back to a client application often doesn't help the administrator or developer sitting thousands of miles away.
Panorama makes it easy for you to record deadlocks by providing the new CREATE EVENT ... TYPE DEADLOCK that gets executed whenever a deadlock is detected and resolved.

Let's plunge into the code:
CREATE EVENT deadlock TYPE deadlock HANDLER
BEGIN
-- Copy fresh entries from sa_report_deadlocks()
INSERT deadlock (
snapshotId,
snapshotAt,
waiter,
who,
what,
object_id,
record_id,
owner,
is_victim,
rollback_operation_count )
SELECT *
FROM sa_report_deadlocks()
WHERE NOT EXISTS (
SELECT * FROM deadlock
WHERE deadlock.snapshotId
= sa_report_deadlocks.snapshotId
AND deadlock.snapshotAt
= sa_report_deadlocks.snapshotAt );
COMMIT;
MESSAGE STRING ( 'DIAG ', CURRENT TIMESTAMP,
' deadlock' ) TO CONSOLE;
END;
The builtin sa_report_deadlocks() procedure returns a result set containing at least two rows for each deadlock that has occurred since the server started: one for the victim, one for the other connection. The INSERT above copies that data into a permanent user-defined table called "deadlock". The WHERE NOT EXISTS is needed to prevent old rows from being copied again when another deadlock happens.
Caution: This is not your father's sa_report_deadlocks. That procedure might have been introduced in SQL Anywhere 9.0.2 but its result set has been improved a lot in Panorama.
Here's what the permanent table looks like; it has one extra column "row_number" to act as an artificial primary key, plus columns for everything returned by sa_report_deadlocks():
CREATE TABLE deadlock (
row_number BIGINT NOT NULL
DEFAULT AUTOINCREMENT,
snapshotId BIGINT NOT NULL,
snapshotAt TIMESTAMP NOT NULL,
waiter INTEGER NOT NULL,
who VARCHAR ( 128 ) NOT NULL,
what LONG VARCHAR NOT NULL,
object_id UNSIGNED BIGINT NOT NULL,
record_id BIGINT NOT NULL,
owner INTEGER NOT NULL,
is_victim BIT NOT NULL,
rollback_operation_count UNSIGNED INTEGER NOT NULL,
PRIMARY KEY ( row_number ) );
In order to make this work, you have to tell the server to gather the report data:
SET OPTION PUBLIC.LOG_DEADLOCKS = 'ON';
It also helps a lot if you enable the 'LastStatement' connection property so the report data will show the actual SQL statements that were blocked. You do this by using the dbsrv11 -zl parameter; that's "dash-zee-el", not "dash-zee-one".
(that's "dash-zed-el" for fans of "zed-zed-top" :)
To see how it works, we'll force a deadlock using two dbisql sessions. First, here is command file that creates and starts a database and two dbisql's, updated for Panorama (beware the wrapping, there are only 4 commands):
"%SQLANY11%\bin32\dbinit.exe" -et -s ddd11.db

"%SQLANY11%\bin32\dbspawn.exe" -f "%SQLANY11%\bin32\dbsrv11.exe" -o dbeng11_log_ddd11.txt -os 10M -x tcpip -zl -zp -zt ddd11.db

"%SQLANY11%\bin32\dbisql.exe" -c "ENG=ddd11;DBN=ddd11;UID=dba;PWD=sql;CON=ddd11-1"

"%SQLANY11%\bin32\dbisql.exe" -c "ENG=ddd11;DBN=ddd11;UID=dba;PWD=sql;CON=ddd11-2"

After you run the CREATE EVENT, CREATE TABLE and SET OPTION statements from above, you can run these commands to see a deadlock:
-- Cyclical Deadlock

-- dbisql Session 1, Step 1.
UPDATE RowGenerator SET row_num = 1 WHERE row_num = 1;

-- dbisql Session 2, Step 1.
UPDATE RowGenerator SET row_num = 2 WHERE row_num = 2;

-- Session 1, Step 2.
UPDATE RowGenerator SET row_num = 2 WHERE row_num = 2;

-- ...Session 1 will be blocked by Session 2.

-- Session 2, Step 2.
UPDATE RowGenerator SET row_num = 1 WHERE row_num = 1;

-- ...deadlock will be detected and resolved.
Here's how dbisql Session 2 reports the error:



Here's a query showing what happened; "waiter" is the connection number, "who" is the user name, "what" is the CONNECTION_PROPERTY ( 'LastStatement' ) and "is_victim" tells you which connection got the ROLLBACK:
SELECT waiter,
who,
what,
is_victim
FROM deadlock
ORDER BY row_number;

And now a correction from Ivan Bowman: "It is not always the last connection that became blocked that is selected as a victim. In particular, if any of the connections participating in a deadlock cycle has the Blocking_timeout option set, then the connection in the cycle that would timeout soonest is selected."

Plus, a pop quiz for Panorama participants, courtesy Anil Goel: What is strange about the cyclical deadlock example shown above, and what effect does that strangeness have on the new rollback_operation_count column returned by sa_report_deadlocks?

And a request: Can someone please tell me what sa_report_deadlocks returns for a deadlock involving three or more connections? Or is the question moot?

2 comments:

JohnD said...

oh I love this.
Only last week I did my own event code to persist the deadlock snapshot, but that was in v9.0.2, so I didn't have a DEADLOCK type event - so I hooked onto disconnect and idle.
A suggestion is to clear out the sa_report_deadlocks buffer by turning the log deadlocks option off then on again after you have copied into your own table - saves the complexity of your restriction to not insert rows already in the base table.
Do you know if the DEADLOCK event code fires while the deadlock itself is still present? - would open up the opportunity of logging quite a lot of useful information about the locks held by the connections involved etc.

Breck Carter said...

Re does event fire while deadlock present: You should ask that in the Panorama newsgroup, one of the developers will know the answer. I am guessing "moot" since the event runs on a new connection and I don't think anything is synchronized with the culprit/victim connections.
Re turning the log deadlocks option off then on again... in a really busy environment with sackloads of deadlocks, one could conceive of other deadlocks occurring *between* the INSERT and SETs OFF/ON, or even between the SET OFF and SET ON. Plus thinking about what happens with all these separate and possibly concurrent executions of the deadlock event makes my head hurt. But I DID NOT THINK of any of this when writing the complex INSERT, it just feels right in retrospect :) - Breck Not Often Lucky