Thursday, November 26, 2015

JSON memory consumption

I got some more questions on the new JSON data type and functions during our TechTours. And I like to summarize the answers in this blogpost.

Memory consumption 

The binary format of the JSON data type should consume more memory. But how much? I did a little test by comparing a freshly loaded 25,000 row dataset stored as JSON and stored as TEXT. Seven top level attributes per JSON document. Average JSON_DEPTH is 5.9 . Let's see:
mysql> DESC data_as_text;
| Field | Type    | Null | Key | Default | Extra |
| id    | int(11) | NO   | PRI | NULL    |       |
| doc   | text    | YES  |     | NULL    |       |
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*),AVG(JSON_LENGTH(doc)) FROM data_as_text;
| COUNT(*) | AVG(JSON_LENGTH(doc)) |
|    25359 |                7.0000 |
1 row in set (0.81 sec)

mysql> DESC data_as_json;
| Field | Type    | Null | Key | Default | Extra          |
| id    | int(11) | NO   | PRI | NULL    | auto_increment |
| doc   | json    | NO   |     | NULL    |                |
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*),AVG(JSON_LENGTH(doc)) FROM data_as_json;
| COUNT(*) | AVG(JSON_LENGTH(doc)) |
|    25359 |                7.0000 |
1 row in set (0.08 sec)

mysql> select name,allocated_size/1024/1024 AS "size in MB" from information_schema.innodb_sys_tablespaces where name like "%temp%";
| name              | size in MB  |
| temp/data_as_json | 23.00390625 |
| temp/data_as_text | 22.00390625 |
2 rows in set (0.00 sec)
The increased memory consumption is 1/22 in this case, which is roughly 4,5%. At the same time you see the benefit: The full table scan with some JSON operation has a 90% reduction in runtime when using JSON datatype.
Don't take this number for real. Of course it depends on the number of JSON attributes, character set and others. Just a rough indication. If you want all the details look at the JSON architecture in WL#8132.

Monday, November 23, 2015

Document validation of JSON columns in MySQL

Starting with the new release MySQL 5.7 there is support to store JSON documents in a column. During our recent Tech Tour events we got questions about document validation, so ensuring that a JSON document has a certain structure. (Funny. It all started with the idea to be schema-free. Now people seem to need schema enforcement.)
I have two ideas how to implement a schema validation for JSON columns. The first one is by leveraging generated columns together with a foreign key. The second idea is by implementing a trigger. Today I want to focus on the generated columns and foreign keys.
When defining foreign keys with generated columns there are two limitations we need to be aware of:
  • Foreign keys require indexes. JSON columns cannot be indexed. We need to leverage other types.
  • Only STORED generated columns are supported for foreign keys.
So here is an example of an address table that leverages JSON to define an arbitrary number of phone number entries per row. In fact I use a mixed model of relational (e.g. to enforce a strict model for name NOT NULL) and document so that phone numbers are more free to define.

 CREATE TABLE `people` (  
 `name` varchar(30) NOT NULL,  
 `firstname` varchar(30) DEFAULT NULL,  
 `birthdate` date DEFAULT NULL,  
 `phones` json DEFAULT NULL,  
 `phonekeys` varchar(30) GENERATED ALWAYS AS (json_keys(phones)) STORED,  
 KEY `phonekeys` (`phonekeys`));  

The generated column phonekeys is a string that includes the types of phone numbers for each row. Some sample data:

 mysql> INSERT INTO people (name,firstname,birthdate,phones)  
 VALUES ("Plumber", "Joe, the", "1972-05-05",'{"work": "+1(555)24680"}');  
 Query OK, 1 row affected (0.00 sec)  
 ...some more inserts...  
 mysql> SELECT * FROM people;  
 | name | firstname | birthdate | phones | phonekeys |  
 | Doe | John | 1995-04-17 | {"mobile": "+491715555555", "private": "+49305555555"} | ["mobile", "private"] |  
 | Dian | Mary | 1963-12-12 | {"work": "+43987654321"} | ["work"] |  
 | Plumber | Joe, the | 1972-05-05 | {"work": "+1(555)24680"} | ["work"] |  
 3 rows in set (0.00 sec)  

The column phonekeys gets populated automatically.
To check that we use "correct" attributes in our JSON object we can now create a table that contains the valid JSON keys:

  CREATE TABLE `valid_keys` (  
  `keylist` varchar(30) NOT NULL,  
  PRIMARY KEY (`keylist`)  
 1 row in set (0.01 sec)  
 ... after some inserts...  
 mysql> SELECT * FROM valid_keys;  
 | keylist            |  
 | ["mobile", "private", "work"] |  
 | ["mobile", "private"]     |  
 | ["work"]           |  
 3 rows in set (0.00 sec)  

Now we can define a foreign key with the people table as a child table:
mysql> alter table people add foreign key (phonekeys) references valid_keys (keylist);

That should enforce that inserted JSON documents in the people table must have a list of attributes that matches any entry in the valid_keys table. Let's try:

mysql> INSERT INTO people (name,phones) VALUES ("me", JSON_OBJECT("work","12243"));
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO people (name,phones) VALUES ("my friend", JSON_OBJECT("home","12243"));
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`mario`.`people`, CONSTRAINT `people_ibfk_1` FOREIGN KEY (`phonekeys`) REFERENCES `valid_keys` (`keylist`))

Works fine. "home" is not an allowed attribute. I can leverage the foreign keys to make sure my phone numbers match a certain attribute list. However it is not perfectly simple to use. With five different allowed attributes in an arbitrary order you would have to add all permutations to the valid_keys table. With five attributes you end up with 6! permutations ("not defining an attribute" is also an option, hence six), which results in 720 rows for valid_keys. But it is a first start. For more complex explamples the ideas with triggers might be more favorable.