Tuesday, May 29, 2012

What is ndb doing?

In MySQL Cluster each SQL statement is translated inside the NDB storage engine into the low-level NDB protocol that is sent to the data nodes. For performance, the most interesting question is how much data is moved between the data nodes and MySQL. There are several NDB status variables you can monitor for this. See this link for more documentation.

(There are also the ndbinfo tables that reflect cluster status. But these contain only global values, whereas the status variables also show session status. More about ndbinfo is here.)

To easily report the NDB status of an individual SQL statement, I wrote a little script that prints the NDB status variables and automatically calculates the diffs from before and after the statement in question:


#! /bin/bash
# Usage: ./ndb_prof.sh "<SQL statement>"
# Snapshots the per-session ndb_api counters before and after the
# statement and prints only the counters that changed.
./mysql -t -h olga64 test <<EOF
CREATE TEMPORARY TABLE tmp_before LIKE INFORMATION_SCHEMA.SESSION_STATUS;
CREATE TEMPORARY TABLE tmp_after LIKE INFORMATION_SCHEMA.SESSION_STATUS;
INSERT INTO tmp_before SELECT * FROM INFORMATION_SCHEMA.SESSION_STATUS WHERE VARIABLE_NAME LIKE 'ndb_api%session%';
$1
INSERT INTO tmp_after SELECT * FROM INFORMATION_SCHEMA.SESSION_STATUS WHERE VARIABLE_NAME LIKE 'ndb_api%session%';
SELECT tmp_before.VARIABLE_NAME, tmp_after.VARIABLE_VALUE - tmp_before.VARIABLE_VALUE AS 'VALUE' FROM tmp_after INNER JOIN tmp_before USING (VARIABLE_NAME) WHERE tmp_after.VARIABLE_VALUE <> tmp_before.VARIABLE_VALUE;
EOF



And here are some small examples of understanding what NDB does:


[root@olga64 bin]# ./ndb_prof.sh "SELECT COUNT(*) FROM t;"
+----------+
| COUNT(*) |
+----------+
|    32771 |
+----------+
+--------------------------------------------+----------+
| VARIABLE_NAME                              | VALUE    |
+--------------------------------------------+----------+
| NDB_API_WAIT_SCAN_RESULT_COUNT_SESSION     |        3 |
| NDB_API_WAIT_META_REQUEST_COUNT_SESSION    |        2 |
| NDB_API_WAIT_NANOS_COUNT_SESSION           | 19495775 |
| NDB_API_BYTES_SENT_COUNT_SESSION           |      132 |
| NDB_API_BYTES_RECEIVED_COUNT_SESSION       |      280 |
| NDB_API_TRANS_START_COUNT_SESSION          |        1 |
| NDB_API_TRANS_CLOSE_COUNT_SESSION          |        1 |
| NDB_API_TABLE_SCAN_COUNT_SESSION           |        1 |
| NDB_API_SCAN_BATCH_COUNT_SESSION           |        2 |
| NDB_API_READ_ROW_COUNT_SESSION             |        2 |
| NDB_API_TRANS_LOCAL_READ_ROW_COUNT_SESSION |        2 |
| NDB_API_ADAPTIVE_SEND_FORCED_COUNT_SESSION |        1 |
+--------------------------------------------+----------+


 So SELECT COUNT(*) reads only two rows (NDB_API_READ_ROW_COUNT_SESSION), probably one row per fragment (I have a two-node cluster). COUNT(*) is optimized! But if you add a WHERE condition:


[root@olga64 bin]# ./ndb_prof.sh "SELECT COUNT(*) FROM t WHERE a<100;"
+----------+
| COUNT(*) |
+----------+
|       99 |
+----------+
+--------------------------------------------+----------+
| VARIABLE_NAME                              | VALUE    |
+--------------------------------------------+----------+
| NDB_API_WAIT_SCAN_RESULT_COUNT_SESSION     |        3 |
| NDB_API_WAIT_META_REQUEST_COUNT_SESSION    |        2 |
| NDB_API_WAIT_NANOS_COUNT_SESSION           | 18267962 |
| NDB_API_BYTES_SENT_COUNT_SESSION           |      140 |
| NDB_API_BYTES_RECEIVED_COUNT_SESSION       |     3248 |
| NDB_API_TRANS_START_COUNT_SESSION          |        1 |
| NDB_API_TRANS_CLOSE_COUNT_SESSION          |        1 |
| NDB_API_RANGE_SCAN_COUNT_SESSION           |        1 |
| NDB_API_SCAN_BATCH_COUNT_SESSION           |        2 |
| NDB_API_READ_ROW_COUNT_SESSION             |       99 |
| NDB_API_TRANS_LOCAL_READ_ROW_COUNT_SESSION |       99 |
| NDB_API_ADAPTIVE_SEND_FORCED_COUNT_SESSION |        1 |
+--------------------------------------------+----------+


 'a' is the primary key. This statement sends all 99 rows to mysqld, well, only the primary keys, and mysqld does the counting. This can get more expensive, depending on the row count and the primary key size. Look at NDB_API_BYTES_RECEIVED_COUNT_SESSION: 3248 bytes received, which is roughly 33 bytes per row.

Monday, March 12, 2012

Why should I consider memcached plugin?

My last post explained what to expect from the memcached plugin in MySQL 5.6 (labs release). But I want to take a step back and think about the "why" first. Why is the memcached plugin of interest at all? What can I gain from using it instead of plain SQL?

First: I don't see this as a replacement for memcached. If you want memory caching with memcached then use memcached.

But the memcached plugin for MySQL is a replacement for, or addition to, the SQL interface to MySQL. So instead of using SQL queries in your application to persist or retrieve data in MySQL, you can use the memcached interface. And what are the benefits?
  • Much higher performance
  • Easier scalability via sharding
  • Simpler application coding

1. Performance

Performance is always good. But there are two different aspects of performance: the latency (or runtime) of a specific query, and throughput. The relation between these two measures is not simple to explain, so we need to discuss both:

1.1 Latency

Latency is the time it takes to execute an individual query. With SQL the server needs to parse the query (which is not trivial, because SQL is quite a complex language), then optimize the execution plan and finally execute it. The memcached interface is much simpler, so parsing cost is close to nothing, and accessing a single row by an index does not require any optimization. Only the real query execution is the same. I did a test in a VM with one virtual disk (not the best benchmark environment ;-). I inserted single rows in multiple threads via the C API (using SQL) and via libmemcached (using the new plugin). The mean latency for a single query was 20%-30% higher for SQL queries. Considering that my IO configuration was poor: in a better environment the share of query execution would be smaller, so the benefit of the memcached plugin (no parsing, no optimization) would be even bigger.

1.2 Throughput

Usually lower latency improves throughput: if each query finishes earlier, the server can start working on the next query sooner. This is only true if the bottleneck really is CPU or RAM; if latency is mainly due to disk IO, it does not help much. During my test I achieved up to 50% higher throughput with the memcached plugin. Again, a suitable IO configuration would probably widen this advantage even further.

2. Scalability

If a single server is no longer enough, one solution may be sharding. With SQL you usually have two options: switch to MySQL Cluster, which offers auto-sharding, or implement sharding on your own in your application, which requires some effort.
With the memcached plugin it is easier: for connecting to the database you would usually use a library like libmemcached, and many of these libraries offer ... a sharding feature. This is a natural feature for memcached libraries, as the goal of memcached is to offer a distributed cache: depending on the key value (or a hash of it), the library determines which instance to talk to. There is no need for you to extend your application code and manage different database connections. In libmemcached you can simply define your connect string as "--SERVER=donald:11211 --SERVER=dagobert:11211 --SERVER=daisy:11211", and that's it: libmemcached will automatically shard your data across all three database servers.
One thing must be mentioned, though: if you want to query your data via SQL and do aggregation, you have to aggregate in your application, since the SQL interface itself is still not auto-sharded. If you need this kind of functionality, you should consider MySQL Cluster. Cluster will shard your data automatically, whether it comes in via the memcached API or SQL. See more in Andrew's blog. And for MySQL Cluster, the memcached interface is already recommended for production use in MySQL Cluster 7.2.

3. Ease of use

The typical architecture is very often application - memcache layer - MySQL database. For read access the application talks to memcache first, and if this results in a "miss", the application re-does the query, but with a different protocol, against the MySQL server. So you must implement both protocols: the memcache API and SQL. With the memcached plugin you can now access the database directly with the same API functions that you already use for memcache. That makes development much easier.
The sharding feature in the memcached library will make it easier to scale the database; you do not have to implement it on your own.
"Ease of use" only applies if your application is happy with low-level data access to a key-value store. Once you also need higher-level data access like JOINs or aggregation queries, you will again use the memcache API and SQL in the same application, so nothing changes compared to before.

4. Availability?

Not a huge argument: by using the memcached plugin you essentially work with MySQL, so you can leverage all the HA options that are available for MySQL to make your data highly available. It is worth mentioning that the memcached plugin supports replication: you only have to set daemon_memcached_enable_binlog = 1. That's it.
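For reference, that is really all the configuration it takes; a minimal my.cnf fragment (placing it in the [mysqld] section is my assumption, since the variable is not dynamic):

```ini
[mysqld]
daemon_memcached_enable_binlog = 1
```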

Code examples

Just for your reference, here is a shortened version of the code that I used to test the SQL and memcache APIs when connecting to the MySQL server.

C-API

#include <mysql.h>   /* MySQL C API */
...
MYSQL *mysql;
char stmt[256];
int key = 0;

mysql = mysql_init(NULL);
if (mysql_real_connect(mysql, "localhost", "user", "password",
                       "schema", 3306, NULL, 0L) == NULL) {
    goodbye();   /* error handling elided */
}
for (;;) {
    sprintf(stmt, "INSERT INTO kvstore VALUES('key%d','%s',0,0,0)",
            key++, value);
    if (mysql_query(mysql, stmt) != 0) { goodbye(); }
}

Libmemcached

#include <libmemcached/memcached.h>
...
char *config_string = "--SERVER=localhost:11211";
memcached_st *memc;
memcached_return_t rc;

memc = memcached(config_string, strlen(config_string));
if (memc == NULL) { goodbye(); }
for (;;) {
    rc = memcached_set(memc, key, strlen(key), value, strlen(value),
                       (time_t)0, (uint32_t)0);
    if (rc != MEMCACHED_SUCCESS) { goodbye(); }
}

Wednesday, January 18, 2012

Playing with Memcached Plugin

I am currently playing a lot with the new memcached interface to MySQL, making MySQL a "NoSQL" solution.

Why should I access the same data via SQL and a NoSQL protocol? 
A simple NoSQL protocol like memcached has only lightweight access methods like "set" or "get": no complex parsing, no optimizing, no evaluation of expressions. And this additional overhead in SQL can be tremendous. I ran a set of SELECT statements based only on the primary key, with the buffer pool at 50% of the table size. With ENGINE=InnoDB it takes 7.6 seconds to read the data. If I switch to the BLACKHOLE engine it takes 6.4 seconds! BLACKHOLE has no code in the storage engine, so queries on the BLACKHOLE engine load only the parser and optimizer, but nothing in the storage engine. Yet running on InnoDB adds only about 1.2 seconds, or 16% of the runtime. Obviously the main part of the execution time is spent outside the storage engine. Erkan found the same behaviour here, see page 12. The memcached interface accesses the storage engine directly, so it bypasses all the computing in the parser, optimizer, expression evaluation and so on. Of course, with INSERT statements InnoDB's share gets bigger, as there is more I/O work to do. But nevertheless there can be huge performance gains in using a simpler protocol. If you only want a glass of milk, don't buy a cow. 

Why not use memcached directly? Why that plugin to MySQL?
Memcached is a memory cache but not a persistent datastore. The memcached plugin for MySQL stores the data in an InnoDB table. So you get persistence, nearly unlimited table size, and you can still access your data through SQL if you want more complex things like SELECT COUNT(*) FROM kvstore WHERE `key` LIKE "page:%". That would not be possible in memcached. But with the memcached plugin you can store data with memcached SET and report on your data with SQL queries.

How can I test it?
Download the labs release from http://labs.mysql.com, do the usual install and then follow the README-innodb_memcached file. It is very simple: you only execute a small SQL script and you are ready to test memcached. BUT: if you SET data via telnet to memcached, you can retrieve it via GET, but you will probably not see the data in the table via SQL. This was a bit confusing, to me at least. The secret is the variable daemon_memcached_w_batch_size, which is set to 32 by default. The memcached plugin commits data changes only after 32 SET commands (or INCR, DECR, ...). This batching is good for performance. In fact, you currently cannot set daemon_memcached_w_batch_size to values lower than 32, only higher. One exception is replication: if you enable replication for the plugin, daemon_memcached_w_batch_size is set to 1. See below.
If you want to see memcached data changes immediately, even before a commit, you can set your session to SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

What about performance?
This is a more complex issue. I will write a separate blog post with some performance discussions. But the summary:
The main parameter is daemon_memcached_w_batch_size, which batches 32 statements in the memcache protocol into one transaction on the InnoDB side.
My first tests showed that (as expected) access via the memcached protocol offers nearly twice the write throughput of SQL. This is especially useful if you have only few user connections. With many simultaneous connections, IO becomes the bottleneck rather than the SQL processing. 


What about replication and memcached interface?
This works seamlessly. You only have to specify innodb_direct_access_enable_binlog=1. Currently this variable is not (yet?) dynamic, so it is best to put it into my.cnf/my.ini. The aforementioned daemon_memcached_w_batch_size is set to 1 in this case, which means each SET operation is committed separately. As binlog group commit is not implemented in this labs release, this hurts performance badly. But binlog group commit is already in another labs release, and that would probably solve much of this performance issue.
More about replication and memcached can be found here: http://blogs.innodb.com/wp/2011/10/innodb-memcached-with-binlog-capability/

What if I want more than one value column?
Memcached only knows one value column. That's it. But the plugin to MySQL can help. Look at the configuration table innodb_memcache.config_options:
mysql> SELECT * FROM innodb_memcache.config_options;
+-----------+-------+
| name      | value |
+-----------+-------+
| separator | |     |
+-----------+-------+
1 row in set (0.00 sec)
This is the magic character to separate different columns. Take another look at my innodb_memcache.containers table:
mysql> SELECT * FROM innodb_memcache.containers;
+------+-----------+----------+-------------+-----------------+-------+------------+--------------------+------------------------+
| name | db_schema | db_table | key_columns | value_columns   | flags | cas_column | expire_time_column | unique_idx_name_on_key |
+------+-----------+----------+-------------+-----------------+-------+------------+--------------------+------------------------+
| bbb  | test      | kvstore  | key         | value,val2,val3 | flags | cas        | expires            | PRIMARY                |
+------+-----------+----------+-------------+-----------------+-------+------------+--------------------+------------------------+

What is a single value in the memcached protocol is split into three different columns on the MySQL side. And the separator is the "|" pipe character. Let's try:
[root@olga ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
set mykey 0 0 14
abcd|1234|WXYZ
STORED
get mykey
VALUE mykey 0 14
abcd|1234|WXYZ
END
And in MySQL we find the following data:
mysql> SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM kvstore WHERE `key`="mykey";
+-------+-------+-------+------+---------+------+------+
| key   | value | flags | cas  | expires | val2 | val3 |
+-------+-------+-------+------+---------+------+------+
| mykey | abcd  |     0 |    1 |       0 | 1234 | WXYZ |
+-------+-------+-------+------+---------+------+------+
1 row in set (0.00 sec)
Don't forget to set the transaction isolation level to READ UNCOMMITTED: due to write batching, the last SET statements may not be visible otherwise.

What happens with Foreign Keys?
NoSQL usually means that data is unstructured and the knowledge about the data is no longer in the database but in the application. So I am not sure whether foreign keys are relevant to traditional key-value store applications. But if you distribute your value over different columns in the InnoDB table, it might be interesting. So let's test it:
CREATE TABLE `kvstore` (
  `key` varchar(32) NOT NULL DEFAULT '',
  `value` varchar(1024) DEFAULT NULL,
  `flags` int(11) DEFAULT NULL,
  `cas` bigint(20) unsigned DEFAULT NULL,
  `expires` int(11) DEFAULT NULL,
  `val2` varchar(32) DEFAULT NULL,
  `val3` varchar(32) NOT NULL,
  PRIMARY KEY (`key`),
  KEY `val3` (`val3`),
  CONSTRAINT `kvstore_ibfk_1` FOREIGN KEY (`val3`) REFERENCES `refdata` (`val3`)
) ENGINE=InnoDB;

mysql> SELECT * FROM refdata;
+------+---------------------+
| val3 | somefield           |
+------+---------------------+
| ABCD | Another good val3   |
| WXYZ | This entry is valid |
+------+---------------------+
2 rows in set (0.00 sec)
So we can add ABCD or WXYZ as the last field via memcached. Let's see what happens:
set mykey3 0 0 14
abcd|1234|ABCD 
STORED
get mykey3
VALUE mykey3 0 14
abcd|1234|ABCD
END
set mykey4 0 0 14
efgh|5678|EFGH
NOT_STORED          <--- That is cool! Memcached plugin appreciates foreign keys!
get efgh
END
get mykey4
END
So foreign key constraints are enforced with the memcached plugin.

How can I access multiple tables via memcached?
Memcached does not know anything about tables; there is only one data store. Memcached programmers usually use a trick to simulate different data stores: they include the table name in the key, so a key looks like "user:4711", "page:/shop/home" or "session:fh5hjk543bjk". But all the data is still in a single table in MySQL. If you want to report via SQL on only one type of data, like "session:", you can add WHERE `key` LIKE "session:%" to your query. To make it comfortable you can also define different views:
mysql> SELECT * FROM kvstore;
+----------------------+-------------------------------+-------+------+---------+
| key                  | value                         | flags | cas  | expires |
+----------------------+-------------------------------+-------+------+---------+
| session:grcn34r834cn | 2012-01-12 08:32|4711         |     0 |    0 |       0 |
| session:k35jnjkj56ff | 2012-01-14 23:11|4713         |     0 |    0 |       0 |
| user:4711            | dummy|secret|Berlin           |     0 |    0 |       0 |
| user:4712            | superman|unkown|London        |     0 |    0 |       0 |
| user:4713            | wonderwoman|dontknow|New York |     0 |    0 |       0 |
+----------------------+-------------------------------+-------+------+---------+
5 rows in set (0.00 sec)

mysql> CREATE VIEW user AS SELECT RIGHT(`key`,4) AS userID, value FROM kvstore WHERE `key` LIKE "user%";
Query OK, 0 rows affected (0.37 sec)

mysql> SELECT * FROM user;
+--------+-------------------------------+
| userID | value                         |
+--------+-------------------------------+
| 4711   | dummy|secret|Berlin           |
| 4712   | superman|unkown|London        |
| 4713   | wonderwoman|dontknow|New York |
+--------+-------------------------------+
3 rows in set (0.00 sec)

Summary
The memcached plugin is easy to enable and offers a very lightweight access protocol to InnoDB data.
You can store values in different columns. Foreign key relationships are enforced.
You can also replicate data that is stored via the memcached plugin to slave servers.
The most important tuning parameter is daemon_memcached_w_batch_size, which is 32 by default and 1 when replicating.

I will add a more advanced use case of the memcached configuration in another post, which should show the benefits of using the same data via the memcached protocol and SQL at the same time.