Monday, November 23, 2015

Document validation of JSON columns in MySQL

Starting with the new release MySQL 5.7 there is support to store JSON documents in a column. During our recent Tech Tour events we got questions about document validation, so ensuring that a JSON document has a certain structure. (Funny. It all started with the idea to be schema-free. Now people seem to need schema enforcement.)
I have two ideas how to implement a schema validation for JSON columns. The first one is by leveraging generated columns together with a foreign key. The second idea is by implementing a trigger. Today I want to focus on the generated columns and foreign keys.
When defining foreign keys with generated columns there are two limitations we need to be aware of:
  • Foreign keys require indexes. JSON columns cannot be indexed. We need to leverage other types.
  • Only STORED generated columns are supported for foreign keys.
So here is an example of an address table that leverages JSON to define an arbitrary number of phone number entries per row. In fact I use a mixed model of relational (e.g. to enforce a strict model for name NOT NULL) and document so that phone numbers are more free to define.

 CREATE TABLE `people` (  
 `name` varchar(30) NOT NULL,  
 `firstname` varchar(30) DEFAULT NULL,  
 `birthdate` date DEFAULT NULL,  
 `phones` json DEFAULT NULL,  
 `phonekeys` varchar(30) GENERATED ALWAYS AS (json_keys(phones)) STORED,  
 KEY `phonekeys` (`phonekeys`));  

The generated column phonekeys is a string that includes the types of phone numbers for each row. Some sample data:

 mysql> INSERT INTO people (name,firstname,birthdate,phones)  
 VALUES ("Plumber", "Joe, the", "1972-05-05",'{"work": "+1(555)24680"}');  
 Query OK, 1 row affected (0.00 sec)  
 ...some more inserts...  
 mysql> SELECT * FROM people;  
 | name | firstname | birthdate | phones | phonekeys |  
 | Doe | John | 1995-04-17 | {"mobile": "+491715555555", "private": "+49305555555"} | ["mobile", "private"] |  
 | Dian | Mary | 1963-12-12 | {"work": "+43987654321"} | ["work"] |  
 | Plumber | Joe, the | 1972-05-05 | {"work": "+1(555)24680"} | ["work"] |  
 3 rows in set (0.00 sec)  

The column phonekeys gets populated automatically.
To check that we use "correct" attributes in our JSON object we can now create a table that contains the valid JSON keys:

  CREATE TABLE `valid_keys` (  
  `keylist` varchar(30) NOT NULL,  
  PRIMARY KEY (`keylist`)  
 1 row in set (0.01 sec)  
 ... after some inserts...  
 mysql> SELECT * FROM valid_keys;  
 | keylist            |  
 | ["mobile", "private", "work"] |  
 | ["mobile", "private"]     |  
 | ["work"]           |  
 3 rows in set (0.00 sec)  

Now we can define a foreign key with the people table as a child table:
mysql> alter table people add foreign key (phonekeys) references valid_keys (keylist);

That should enforce that inserted JSON documents in the people table must have a list of attributes that matches any entry in the valid_keys table. Let's try:

mysql> INSERT INTO people (name,phones) VALUES ("me", JSON_OBJECT("work","12243"));
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO people (name,phones) VALUES ("my friend", JSON_OBJECT("home","12243"));
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`mario`.`people`, CONSTRAINT `people_ibfk_1` FOREIGN KEY (`phonekeys`) REFERENCES `valid_keys` (`keylist`))

Works fine. "home" is not an allowed attribute. I can leverage the foreign keys to make sure my phone numbers match a certain attribute list. However it is not perfectly simple to use. With five different allowed attributes in an arbitrary order you would have to add all permutations to the valid_keys table. With five attributes you end up with 6! permutations ("not defining an attribute" is also an option, hence six), which results in 720 rows for valid_keys. But it is a first start. For more complex explamples the ideas with triggers might be more favorable.

No comments:

Post a Comment