Thursday, October 20, 2016

Read any replica in MySQL Cluster 7.5

MySQL Cluster uses a distributed architecture where we have one primary
replica and one or more backup replicas of all data in the cluster.
In previous versions we have not supported reading from the backup
replicas. The reason for this comes from the transaction algorithm in
the NDB storage engine. When the transaction have committed we have
unlocked the rows in the primary replica, but the locks in the backup
replica is unlocked in the complete phase.

The impact of this is that we could potentially perform an update
of a row and then immediately read it and not see our own write. To
avoid this problem we have avoided to read the backup replica.

If it is important to read the backup replica then we need to ensure
that transactions are not reported back to the user until the complete
phase.

When we decided to support read from backup replicas we considered how
to handle this. One manner would be to simply have a flag in the
updating transactions that signals that we want the response after the
complete phase instead of after the commit phase.

We decided that this approach would be possible, but we opted for a
safer approach where it is a table property that defines if tables
can be used for reading the backup replica or not.

Given that there is a latency cost to wait until the complete phase
is done this feature is not enabled for all tables. In 7.5 we decided
to have the default that read from backup replicas isn't enabled.

There are a number of ways to enable read from backup replicas.

Let's start with the case when you haven't created the table yet.
In this case the first manner to handle it is to create a table
with the read backup flag set. This is done by using a special
flag set in the COMMENT section of the CREATE TABLE statement.
To set the flag we use the following comment:
COMMENT="... NDB_TABLE=READ_BACKUP=1 ... "
We run a parser on the COMMENT section where we look for the
NDB_TABLE keyword and after that we look for an equal sign
followed by a new keyword READ_BACKUP which can be set to 1 or
0. It is also possible set several table properties in one comment
and in this case we add a comma and the next property. So e.g.
COMMENT="NDB_TABLE=READ_BACKUP=1,PARTITION_BALANCE=FOR_RA_BY_NODE"

The above feature is a good approach when you want read from backup
replicas in some tables, but not from all tables. A more likely
approach is to set the new configuration variable ndb-read-backup
in the MySQL Server that ensures that all tables gets created with
the read backup flag set.

So basically it is necessary to decide if you want to focus on
read performance or low latency of updating transactions. Given
that NDB is very often used for high write loads we have kept the
default to not set the read backup flag. But many applications
are read-focused, so in this case it makes sense to set the
config variable ndb_read_backup to 1.

There is one more case where there are already existing tables
that have been upgraded from an earlier MySQL Cluster version.
In this case it is possible to change those tables to set the
read backup flag online.

The syntax to use this for a table called t1 would be:
ALTER TABLE t1 algorithm=inplace, COMMENT="NDB_TABLE=READ_BACKUP=1";
This will run an online change that can happen concurrent with
reading and writing transactions.

Now when equipped with read backup flag set on your tables it is
possible to localize your application. Assume that you have the
following MySQL Cluster setup.



Now we want to ensure that the MySQL Server 1 always uses the NDB
data node 1 and vice versa for reading. If the MySQL Server 1 and
the NDB data node 1 are using the same hostname or ip address we
will automatically detect that they are on the same host. So in
this case we will always prefer to send reads to the same host.
If they are on the same host but uses different ip address then
we can tell the MySQL Server 1 that its closest data node
neighbour is the node id of NDB data node 1. This is done by
setting this node id in the MySQL configuration variable
ndb-data-node-neighbour to this value.

So with this settings we are able to more or less make a read
optimised architecture using the MySQL Cluster.

Many people that uses MySQL/InnoDB that tries out MySQL Cluster
will most likely expect the behaviour provided with the read
backup flag set. So users of MySQL Cluster that comes from the
MySQL world should most likely set the ndb-read-backup flag
to get the expected behaviour.

No comments: