Wednesday, March 19, 2014

MySQL Cluster on Raspberry Pi - Sub-second failover

MySQL Cluster claims to achieve sub-second failover without any data loss for commited transactions. And I always wanted to show this in a demo. Now we created that demo finally. See Mark's blog and Keith's blog for setting up MySQL Cluster on RaspberryPi.
The nice thing about the RPis is that you can easily pull the plug to test failover. Ok, that is only one possible failure scenario but for sure the most obvious and more impressive than "kill -9".

That demo application is constantly using the database for storing new lines, removing old lines and reading all line data for the graphical view. There is no caching. It uses JDBC directly.
To document the setup here is the config.ini file for MySQL Cluster:

[ndbd default]


I made the cluster diskless so it will not write any logs and table spaces to disk. The SD card performance was not great and it does not affect failover behavior.
I also reduced the HeatbeatIntervallDbDb so that nodes detect immediately (well, 10ms) if a heartbeat is missed. After a few missed heartbeats cluster reconfigures and the remaining node takes responsibility and service continues.
BTW: Pulling the plug is nice but every now and then I had to manually fsck the root-fs during reboot.

Tuesday, March 12, 2013

New in 5.6: --innodb_read_only, running MySQL from DVD

I recently met two distinct customers who want to use MySQL as a read-only database, stored on a read-only medium like a DVD. With MyISAM this was easily possible. Starting with MySQL 5.6 it is also possible to run this setup with InnoDB based databases. There is a new option:


See details in the reference manual. This options opens all tablespaces and InnoDB log files as read-only. A small test showed that you need to set some more options in the my.cnf to avoid any write access to @@datadir:


I was a bit surprised why I had to disable the event scheduler. But on the other hand, what use has a regularly running statement that cannot store any data? After all your database is read only ;-)

And together with the new fulltext indexes in InnoDB and maybe compressed table spaces you can now deploy catalogue applications or reference manuals on DVD.

Tuesday, May 29, 2012

What is ndb doing?

In MySQL cluster each SQL statement is translated inside the NDB storage engine to NDB low level protocol that is sent to the data nodes. For the performance it is most interesting how many data is moved between data nodes and MySQL. To monitor this there are several NDB status variables that you can monitor. See this link for more documentation.

(There are also the NDBINFO tables that reflect cluster status. But these are only global values. The status variables also show session status. More about NDBINFO is here.)

To easily report the NDB status on an individual SQL statement, I wrote a little script that gives you the ndb status variables and automatically calculates the diffs before and after the statement in question:

#! /bin/bash
./mysql -t -h olga64 test <<EOF

And here are some small examples of understanding what NDB does:

[root@olga64 bin]# ./ "SELECT COUNT(*) FROM t;"
| COUNT(*) |
|    32771 |
| VARIABLE_NAME                              | VALUE    |
| NDB_API_WAIT_NANOS_COUNT_SESSION           | 19495775 |
| NDB_API_BYTES_SENT_COUNT_SESSION           |      132 |
| NDB_API_TRANS_START_COUNT_SESSION          |        1 |
| NDB_API_TRANS_CLOSE_COUNT_SESSION          |        1 |
| NDB_API_TABLE_SCAN_COUNT_SESSION           |        1 |
| NDB_API_SCAN_BATCH_COUNT_SESSION           |        2 |
| NDB_API_READ_ROW_COUNT_SESSION             |        2 |

 So SELECT COUNT(*) only returns two rows (NDB_API_READ_ROW_COUNT_SESSION). Probably one row per fragment. (I have a two node cluster). COUNT(*) is optimized! But if you add a WHERE condition:

[root@olga64 bin]# ./ "SELECT COUNT(*) FROM t WHERE a<100;"
| COUNT(*) |
|       99 |
| VARIABLE_NAME                              | VALUE    |
| NDB_API_WAIT_NANOS_COUNT_SESSION           | 18267962 |
| NDB_API_BYTES_SENT_COUNT_SESSION           |      140 |
| NDB_API_TRANS_START_COUNT_SESSION          |        1 |
| NDB_API_TRANS_CLOSE_COUNT_SESSION          |        1 |
| NDB_API_RANGE_SCAN_COUNT_SESSION           |        1 |
| NDB_API_SCAN_BATCH_COUNT_SESSION           |        2 |
| NDB_API_READ_ROW_COUNT_SESSION             |       99 |

 'a' is the primary key. This statement sends all 99 rows, well only the primary keys and mysqld is counting. This one can get more expensive, depending on the row count and primary key size. Look at NDB_API_BYTES_RECEIVED_COUNT_SESSION: 3248 bytes sent. This is roughly 32 bytes per row that is sent.

Monday, March 12, 2012

Why should I consider memcached plugin?

My last post explained what to expect from memcached plugin in MySQL 5.6 (labs release). But I want to take a step back and think about "why" first. Why is memcached plugin of interest at all? What can I gain from using this instead of plain SQL?

First: I don't see this as a replacement for memcached. If you want memory caching with memcached then use memcached.

But the memcached plugin to MySQL is a replacement or addition to the SQL interface to MySQL. So instead of using SQL queries in your application to persist or retrieve data from MySQL you can use the memcached interface. And what are the benefits?
  • Much higher performance
  • Easier scalability via sharding
  • Simpler application coding

1. Performance

Performance is always good. But there are two different aspects of performance: Latency (or runtime) of a specific query and throughput. And the relation of these two measures is not easy and simple to explain. So we need to discuss both:

1.1 Latency

Latency is the time it takes to execute an individual query. If using SQL the server needs to parse that SQL query (which is not so easy because SQL is quite a complex language), then optimize the execution plan of a query and finally execute it. Memcached interface is much simpler so parsing is close to nothing and accessing a single row by an index does not require any optimization. Only the real query execution is the same. I did a test in a VM with one virtual disk. (Not the best benchmark environment ;-) I inserted single rows in multiple threads via C-API (using SQL) and via libmemcachd (using the new plugin). The mean latency for a single query was 20%-30% higher for SQL queries. Considering that my IO configuration was bad, in a better environment the share of query execution would be smaller so the benefit of memcached plugin (no parsing, no optimization) would be even bigger.

1.2 Throughput

Usually lower latency helps improving the throughput. If each query is finished earlier, the server can already start working on the next query. This is true only if the bottleneck is really the CPU or RAM. If latency is mainly due to disk IO, it is not so helpful. During my test I achieved up to 50% higher throughput with memcached plugin. Again, a suitable IO config can probably improve this advantage even further.

2. Scalability

If a single server is no longer enough one solution may be sharding. With SQL you usually have two options: Switch to MySQL cluster, which offers auto-sharding. The other alternative is to implement sharding on your own in your application, which requires some effort.
With memcached plugin it is easier: For connecting to the database you would usually use some library like libmemcached. And many of these libraries offer ... a sharding feature. This is a natural feature for memcached libraries as the goal of memcached is to offer a distributed cache. Depending on the key value (or a hash of it) the library determines which instance to talk to. There is no need for you to extend your application code and manage different database connections. In libmemcached you can simply define your connectstring as "SERVER=donald:11211 SERVER=dagobert:11211 SERVER=daisy:11211". And that's it. Libmemcached will automatically shard your data to all three database servers.
Well, one thing must be mentioned: If you want to query your data via SQL and do aggregation, you have to aggregate in your application. The SQL interface is still not auto-sharded. If you need this kind of functionality, you should consider MySQL Cluster. Cluster will shard your data automatically, wether coming via memcached API or SQL. See more in Andrew's blog. And for MySQL Cluster the memcache interface is already recommended for production use in MySQL Cluster 7.2.

3. Ease of use

The typical architecture is very often application - memcache layer - MySQL database. And for read access the application talks to memcache first, and if this results in a "miss", the application will re-do the query but with a different protocol to the MySQL server. You must implement both protocols: memcache API and SQL. If you use the memcached plugin, you can now access the database directly with the same API functions that you already use for memcache. That makes development much easier.
The sharding feature in the memcached library will make it easier to scale the database. You do not have to implement it on your own.
"Easy of use" only applies if your application is happy with doing only low level data access to a key value store. Once you also need higher level data access like JOINs or aggregation queries, you will again use memcache API and SQL in the same application. So nothing will change to before.

4. Availability?

Not a huge argument. By using the memcached plugin you essentially work with MySQL and you will leverage all possible HA options that are available for MySQL to make your data highly available. It is worth mentioning that memcached plugin supports replication. You have to define daemon_memcached_enable_binlog = 1. That's it.

Code examples

Just for your reference here is the shortcut version of the code that I used to test SQL and memcache API in connecting to MySQL server.


MYSQL *mysql;
char stmt[256];
int key = 0;
mysql = mysql_init(NULL);
if (mysql_real_connect(mysql,"localhost","user","password",
"schema",3306,NULL,0L) == NULL) {
for (;;) {
sprintf (stmt, "INSERT INTO kvstore VALUES('key%d','%s',0,0,0)",
if (mysql_query (mysql, stmt) != 0) {goodbye();}


char *config_string = "--SERVER=localhost:11211";
memcached_st *memc;
memcached_return_t rc;
memc = memcached(config_string, strlen(config_string));
if (memc == NULL) {goodbye();}
for (;;) {
rc= memcached_set(memc, key, strlen(key), value, strlen(value),
(time_t)0, (uint32_t)0);
if (rc != MEMCACHED_SUCCESS) goodbye();

Wednesday, January 18, 2012

Playing with Memcached Plugin

I am currently playing a lot with the new memcached interface to MySQL. Making MySQL a "NoSQL" solution. 

Why should I access the same data via SQL and noSQL protocol? 
A simple noSQL protocol like memcached only has lightweight access methods like "set" or "get". No complex parsing, no optimizing, no execution of expressions. And this additional overhead in SQL can be tremendous: I did a set of SELECT statements only based on primary key. Buffer cache is 50% of the table size. With ENGINE=InnoDB it takes 7.6 seconds to read the data. If I switch to BLACKHOLE engine it takes 6.4 seconds! BLACKHOLE has no code in the storage engine. So queries on BLACKHOLE engine create only load on parser and optimizer but nothing in the storage engine. But if I run on InnoDB it adds only 1 second or 15% runtime. Obviously the main part of execution time is outside the storage engine. Erkan found the same behaviour here. See page 12. The memcached interface accesses the storage engine directly. So it bypasses all the computing in parser, optimizer, expression evaluation and so on. Of course with INSERT statements the part of InnoDB gets bigger as there is more I/O work to do. But nevertheless there can be huge performance gains in using a simpler protocol. If you only want a glass of milk, don't buy a cow. 

Why not use memcached directly? Why that plugin to MySQL?
Memcache is a memory caching but not a persistent datastore. The memcached plugin for MySQL stores the data in an InnoDB table. So you get persistence, nearly unlimited table size and still you can access your data through SQL if you want more complex stuff like COUNT(*) WHERE pkey LIKE ="page:%"; This would not be possible in memcached. But with the memcached plugin you can store data with memcached SET and report on your data with SQL queries.

How can I test it?
Download the labs release from, do the usual install and then follow the README-innodb_memcached file. It is very simple. Only executing a small sql script and you are ready to test memcached. BUT: If you SET data via telnet to memcached you can retrieve it via GET but you will probably not see the data in the table via SQL. This was a bit confusing to me at least. The secret is the variable daemon_memcached_w_batch_size which is set to 32 by default. The memcached plugin will commit data changes only after 32 SET commands (or INCR, DECR, ...). This batching is good for performance. In fact currently you cannot set daemon_memcached_w_batch_size to values lower than 32. Only bigger is possible. One exception is replication: If you enable replication for the plugin, daemon_memcached_w_batch_size is set to 1. See below.
If you want to see memcached data changes immediately even before a commit, you can set your session to SET TRANSACTION ISOLATION TO READ-UNCOMMITTED;

What about performance?
This is a more complex issue. I will write a separate blog post with some performance discussions. But the summary:
The main parameter is the daemon_memcached_w_batch_size. That will batch 32 statements in memcache protocol into one transaction on InnoDB side.
My first tests showed that (as expected) access via memcached protocoll offers nearly twice the throughput of SQL for writing. This is especially useful if you have only few user connections. If there are many simultaneous connections IO becomes the bottleneck and not the SQL processing. 

What about replication and memcached interface?
This works seamlessly. You only have to specify innodb_direct_access_enable_binlog=1. Currently this variable is not (yet?) dynamic. So best to put it into my.cnf/my.ini. The aforementioned daemon_memcached_w_batch_size is set to 1 in this case which means each SET operation is committed separately. As binlog group commit is not implemented in this labs release, this affects performance very badly. But binlog group commit is already in another labs release and this would probably solve a lot of this performance issue.
More about replication and memcached can be found here:

What if I want more than one value column?
Memcached only knows one value column. That's it. But the plugin to MySQL can help. Look at the configuration table innodb_memcache.config_options:
mysql> SELECT * FROM innodb_memcache.config_options;
| name      | value |
| separator | |     |
1 row in set (0.00 sec)
This is the magic character to separate different columns. Take another look at my innodb_memcache.containers table:
mysql> SELECT * FROM innodb_memcache.containers;
| name | db_schema | db_table | key_columns | value_columns   | flags | cas_column | expire_time_column | unique_idx_name_on_key |
| bbb  | test      | kvstore  | key         | value,val2,val3 | flags | cas        | expires            | PRIMARY                |

What is the value to memcached protocol is split into three different columns on the MySQL side. And the separator is the "|" pipe character. Let's try:
[root@olga ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
set mykey 0 0 14
get mykey
VALUE mykey 0 14
And in MySQL we find the following data:
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM kvstore WHERE `key`="mykey";
| key   | value | flags | cas  | expires | val2 | val3 |
| mykey | abcd  |     0 |    1 |       0 | 1234 | WXYZ |
1 row in set (0.00 sec)
Don't forget to set transaction isolation level to read-uncommitted. Due to write batching the last SET statements may not be visible otherwise.

What happens with Foreign Keys?
NoSQL usually means that data is unstructured and the knowledge about data is no longer in the database but in the application. So I am not sure if foreign keys are relevant to traditional key value store applications. But if you distribute your value into different columns in the innodb table, it might be interesting. So let's test it:
CREATE TABLE `kvstore` (
  `key` varchar(32) NOT NULL DEFAULT '',
  `value` varchar(1024) DEFAULT NULL,
  `flags` int(11) DEFAULT NULL,
  `cas` bigint(20) unsigned DEFAULT NULL,
  `expires` int(11) DEFAULT NULL,
  `val2` varchar(32) DEFAULT NULL,
  `val3` varchar(32) NOT NULL,
  PRIMARY KEY (`key`),
  KEY `val3` (`val3`),
  CONSTRAINT `kvstore_ibfk_1` FOREIGN KEY (`val3`) REFERENCES `refdata` (`val3`)

mysql> SELECT * FROM refdata;
| val3 | somefield           |
| ABCD | Another good val3   |
| WXYZ | This entry is valid |
2 rows in set (0.00 sec)
So we can add ABCD or WXYZ as the last field via memcached. Let's see what happens:
set mykey3 0 0 14
get mykey3
VALUE mykey3 0 14
set mykey4 0 0 14
NOT_STORED          <--- That is cool! Memcached plugin appreciates foreign keys!
get efgh
get mykey4
So foreign key constraints are enforced with the memcached plugin.

How can I access multiple tables via memcached?
Memcached does not know anything about tables. There is only one data store. Usually memcached programmers use a trick to simulate different data stores: They include the table name into the key. So the key is something like "user:4711" or "page:/shop/home" or "session:fh5hjk543bjk". But still all data is in a single table in MySQL. If you want to report via SQL on only one type of data like "session:" you can add "...WHERE pkey LIKE "session:%" to your query. To make it comfortable you can also define different views:
mysql> SELECT * FROM kvstore;
| key                  | value                         | flags | cas  | expires |
| session:grcn34r834cn | 2012-01-12 08:32|4711         |     0 |    0 |       0 |
| session:k35jnjkj56ff | 2012-01-14 23:11|4713         |     0 |    0 |       0 |
| user:4711            | dummy|secret|Berlin           |     0 |    0 |       0 |
| user:4712            | superman|unkown|London        |     0 |    0 |       0 |
| user:4713            | wonderwoman|dontknow|New York |     0 |    0 |       0 |
5 rows in set (0.00 sec)

mysql> CREATE VIEW user AS SELECT RIGHT(`key`,4) AS userID, value FROM kvstore WHERE `key` LIKE "user%";
Query OK, 0 rows affected (0.37 sec)

mysql> SELECT * FROM user;
| userID | value                         |
| 4711   | dummy|secret|Berlin           |
| 4712   | superman|unkown|London        |
| 4713   | wonderwoman|dontknow|New York |
3 rows in set (0.00 sec)

Memcached plugin is easy to enable and offers very lightweight access protocoll to InnoDB data.
You can store values into different columns. Foreign key relationships are enforced.
You can also replicate data that is stored via memcached plugin to slave servers.
The most important tuning parameter is daemon_memcached_w_batch_size, which is 32 by default, 1 if replicating.

I will add a more enhanced use case of the memcached configuration in another post that should show the benefits of using the same data via memcached protocoll and SQL at the same time.

Monday, March 29, 2010

MySQL is so slow on Windows... Really?

Last week a customer called me and reported that MySQL was 30 times slower than MS SQL server. Oooops. That's a lot. No way to argue or throw smoke bombs. 30 times slower!
It was a standard installation of MySQL (typical install option) on plain Windows 7 and the same for MS SQL Server 2008. The test run was a batch of 30.000 INSERT commands in an SQL script. Runtime was 1 minute on MSSQL and 30 minutes on MySQL.
Some tests later we found out that it was only bad on InnoDB. MyISAM was as fast as MSSQL. (I didn't care which one was a bit faster. I didn't care as long as InnoDB was 30 times slower) Finally we nailed the problem down to one parameter in MySQL:
Each INSERT statement is a single transaction (autocommit mode). MySQL is configured very faithfully and ensures that each transaction is really stored on disk. This is necessary for ACID compliance. D in ACID stands for 'durability'. To store data durable, at least the log file has to be written physically. That's why MySQL when a transaction commits forces the operating system to flush its buffers and even forces the disk cache to flush its buffer. That's the meaning of flush_log_at_trx_commit = 1 in the my.ini or my.cnf file.
MSSQL is much more relaxed with your data. It writes the data to disk device. But it may stay in the disk cache, and MSSQL does not care. If you have a crash, your data is not up-to-date on the physical disk and you may lose data. This is definitely not ACID compliant. Microsoft documented this here:
  • By default, the disk cache is enabled. Use the 'Disk Properties', Hardware tab to access the 'Properties', 'Policy' tab to control the disk cache setting. (Note Some drives do not honor this setting. These drives require a specific manufacturer utility to disable cache.)
  • ...
  • Disk caching should be disabled in order to use the drive with SQL Server.
So to have a fair comparison beween MSSQL and MySQL either
  • set innodb_flush_log_at_trx_commit = 0 (or 2)
    This forces the flush to disk only once per second and brings good performance but data is not 100% safe on disk (unless you have a battery backed write cache)
  • disable the disk cache in Windows 7
    This will force MSSQL to write physically to disk. And then MSSQL is 30 times slower than before. ;-)
Lessons learned:
  • Think a lot about how to do a fair comparison.
  • Either run (unsafe and fast), or (safe and slow) or (safe and fast and expensive) with a battery backed write cache controller
  • Read the manual for MSSQL. There may be important news on page 3647+x.
The funny thing, that confused me a lot: On my Mac InnoDB was at around 1 minute. And even MySQL on Windows 7 in Virtual Box on my Mac was about 1 minute. The reason is: MySQL on Mac makes a fsync() that does not flush the disk cache. It flushes only the buffer cache of the operating system. And Windows 7 on Virtual Box: The disk is a plain file on the host OS. And this file is of course buffered in the host OS. So Virtualization may destroy your ACID compliance...
If you need more info on the different cache levels for file IO here is a very good link:

Monday, January 11, 2010

"How to find the source of queries in MySQL Query Analyzer" or "SQL comments in Query Analyzer"

MySQL Enterprise Monitor offers a tool called "Query Analyzer" (QuAn). QuAn sits between any client app and the MySQL server and logs every query and its runtime statistics. A very cool tool for analyzing your SQL. More information is available here.
If you identify a query, that needs some improvement, sometimes it is hard to identify the source of that query as well. With hundreds of different PHP scripts for example it is not easy to know, which one issued the query, that you want to modify.
A good way to achieve this is adding C-style SQL comments. Let's look at an example:
SELECT * FROM mytable /*main.php*/;
Query Analyzer will strip that comment off before archiving the query. This is ok, because QuAn wants to consolidate all similar queries and this comment is irrelevant for the query.
But you can use version specific comments in MySQL. QuAn cannot ignore these comments,
because they may be relevant for query execution. And this is the solution for our problem:
SELECT * FROM mytable /*!99999 main.php*/;
This SQL comment will be executed on MySQL version 9.99.99 and later. But this version does not exist. So in reality the comment gets not executed at all. Currently we are at MySQL 5.1.42 so it will take some time before we reach MySQL 9.99.99 ;-)
The comment is in QuAn and will be logged. So you see the comment when monitoring the query and you know immediately, where that query came from.