Monday, April 13, 2009

The Watcom Restatement

The name "Watcom" dates back to 1981 when a company of that name was formed. In 1988 Watcom created the PACEBase SQL Database System Version 1. Today that product has become... wait for it... SQL Anywhere 11.0.1, and Watcom has become iAnywhere Solutions.

Along the way, long before and long after PACEBase was created, Watcom gained and retained a powerful reputation summed up in this simple rule:

(1) The Watcom Rule: Watcom does things the way they should be done.
That isn't just a slogan, it has important implications for determining how SQL Anywhere works. Sometimes you don't have to look it up in the docs, you just have to apply first principles:
(2) The Watcom Implication: If you want to know how Watcom does something, simply determine how it should be done.
Sadly, life is not a box of chocolates, there is a dark side to The Watcom Rule:
(3) The Watcom Restatement: If you determine how something should be done, but it differs from the way Watcom does it, you got it wrong.
Here's the story of my personal (re)discovery of the dark side: There I am, working on the scripts for a client presentation:
   The World's Fastest Simplest And Most Complete 
End-to-End
Single-Machine
Demonstration Of
SQL Anywhere High Availability
Along the way, while testing and retesting, stopping and starting and crashing and restarting and connecting and reconnecting, I noticed some interesting behavior. Here's what I reported...
(1) Make an 11.0.1.2052 (EBF) dbisql.com connection to the current HA primary database for update OLTP activity.

(2) Make an 11.0.1.2052 (EBF) dbisql.com connection to the current HA secondary database for read-only OLAP activity.

(3) Leave both dbisql.com windows open.

(4) Kill the primary, then restart it. The original secondary is now the primary, and the original primary is now the secondary.

(5) See that both dbisql windows are still active, and apparently "still connected" from an end-user point of view. That's nice. Confusing and surprising to me, but nice.

(6) Look at the properties: dbisql window 1 is now connected to the NEW PRIMARY (different actual server)... that is as it should be.

(7) HOWEVER, dbisql window 2 is connected to the same actual server it used to be, and that server is now the NEW PRIMARY... and it is updatable.

Somehow, I don't think a failover should cause all the OLTP and OLAP sessions to share the same server.

Followup...

I just confirmed the behavior: a dbisql connection to a read-only secondary does not follow the failover swap to the new read-only secondary, but establishes (retains?) a connection to the new updatable primary.
If you didn't follow all that, here's some background: High Availability means having two identical but separate servers, with identical but separate databases. One server is the primary, and it accepts connections that can perform updates. The other server is the secondary or mirror, and the "high availability" feature makes sure its database is automatically synchronized in real time with the primary database. The secondary server doesn't allow updates... until it becomes primary... and that happens automatically when the primary crashes... that process is called failover.

Those are the basics of High Availability, SQL Anywhere's had it for years. What's new in Version 11 is read-only access to the secondary server; e.g., you can make use of the otherwise "wasted" secondary server to run read-only (e.g., OLAP) queries, thus offloading work from the busy (OLTP) primary server.

And here's what I was noticing: If you start server1 as the primary, and server2 as the secondary, and make an OLTP update connection to the primary and an OLAP read-only connection to the secondary, and then stop server1 so that server2 becomes the primary, the OLAP connection doesn't get dropped... it continues working on server2 even though that is now the primary. The OLTP connection does get dropped (this is expected), and it must reconnect, this time to... wait for it... server2 which is the new updatable primary.

In other words, the OLTP and OLAP connections are all now on the same server: server2, the new primary. Nothing changes if you restart server1; it becomes the new secondary, and all the connections remain on server2.

So... I thought this was... a bug.

I thought the OLAP connections should be dropped, and not allowed to reconnect until a read-only secondary server became available, in this case when server1 was restarted.

But I was wrong... it's not a bug, it's a feature...
"This is expected and intended behaviour. The connection to the mirror server is retained if a failover occurs and the mirror becomes primary. It seems arbitrary to disrupt that connection's work and force it to reconnect."
Of course, there are exceptions to every rule, and if you REALLY don't want the OLAP and OLTP connections to share the same server, there are ways around that... but it's YOUR decision... the Watcom way is to leave as many connections connected as possible, and that's the right default.

1 comment:

Anonymous said...

I't' very interesting for me to find more detailed history of watcom software. Especially information about evolution of well known products like watcom c compiler and watcom database. Also interesting to know about most interesting perosnalities in watcom and iAnywhere team e.t.c.