Mablomy: TTL - Time-to-Live in MySQL

A customer recently asked for a TTL feature in MySQL. The idea is to automatically delete rows from a certain table after a defined lifespan, e.g. 60 seconds. This feature is common in many NoSQL databases, but it is not available in MySQL. However MySQL offers all you need to implement this. And due to partitioning much more efficient than only deleting rows. Let's test it.

tl;dr

Partition the table and truncate partitions in a regular event procedure, that does the trick and comes at a fraction of the cost of regularly deleting rows.

The test case

The table needs a column to keep track of row age. This can be either a "created_at" column or an "expires_at" column. ("expires_at" has the additional advantage that each row can have an individual lifespan. Not possible in many NoSQL solutions.)
So my table is
CREATE TABLE `t` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`content` varchar(42) DEFAULT NULL,
);
I tested two variants to implement a 10 seconds TTL on my table "t":

The simple solution

Run an event every 10 seconds to delete rows that have been created more than 10s ago.
DELIMITER |
CREATE EVENT ttl_delete
        ON SCHEDULE EVERY 10 SECOND STARTS '2019-03-04 16:00:00' DISABLE
        DO BEGIN
                DELETE FROM t WHERE created_at < NOW() - INTERVAL 10 SECOND;
        END |
DELIMITER ;
And index on "created_at" might improve performance for the DELETE job. But in any case it is quite expensive to scan the table and remove roughly 50% of the rows of a table, at least if the INSERT rate is high.

The efficient solution

Instead of DELETing we can use the much faster TRUNCATE operation. Obviously we do not want to TRUNCATE the whole table but if we distribute the inserted rows into partitions it is safe to truncate any partition that contains outdated rows. Let's define three partitions (or buckets): One that is currently being written to, one that holds rows of the last 10 seconds and one partition that can be truncated because the rows are older than 10 seconds. Key is to calculate the bucket from the current time. This can be done with the expression FLOOR(TO_SECONDS(NOW()/10)) % 3, or more generic FLOOR(TO_SECONDS(NOW()/ttl))% number_of_buckets
Now we can partition the table by this expression. For that we add a generated column to calculate the bucket from the column "created_at" and partition the table by column "bucket". The table now looks like this:
CREATE TABLE `t` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`content` varchar(42) DEFAULT NULL,
`bucket` tinyint(4) GENERATED ALWAYS AS
       (floor(TO_SECONDS(`created_at`) / 10) % 3) STORED NOT NULL,
PRIMARY KEY (`id`,`bucket`)
) PARTITION BY LIST (`bucket`)
(PARTITION p0 VALUES IN (0),
PARTITION p1 VALUES IN (1),
PARTITION p2 VALUES IN (2));
And the event procedure is like this:
DELIMITER |
CREATE EVENT ttl_truncate
    ON SCHEDULE EVERY 10 SECOND STARTS '2019-03-04 16:00:00' DISABLE
    DO BEGIN
            CASE FLOOR(TO_SECONDS(NOW())/10)%3
            WHEN 0 THEN ALTER TABLE test.t TRUNCATE PARTITION p1;
            WHEN 1 THEN ALTER TABLE test.t TRUNCATE PARTITION p2;
            WHEN 2 THEN ALTER TABLE test.t TRUNCATE PARTITION p0;
            END CASE;
        END|
DELIMITER ;

Watching the rows come and go

To verify that the procedure works as expected I created a small monitor procedure that displays each second the number of rows per partition. Then it is easy to follow in which partition data is currently added and when a partition gets truncated.
DELIMITER |
CREATE PROCEDURE monitor()
BEGIN
WHILE 1=1 DO
   SELECT "p0" AS "part", count(*) FROM t PARTITION (p0)
         UNION SELECT "p1", count(*) FROM t PARTITION (p1)
           UNION SELECT "p2", count(*) FROM t PARTITION (p2);
   SELECT now() AS "NOW", floor(to_seconds(now())/10)%3 AS "Bucket";
   SELECT sleep(1);
END WHILE;
END|
DELIMITER ;
This procedure is not ideal. Too many count(*) will create quite some locking. But it is accurate. The alternative is to read data from INFORMATION_SCHEMA.partitions, but this does not give the exact row count, which I needed for verification.

Increasing Accuracy

If TTL is 10 seconds, deleting or truncating every 10 seconds means you have at least 10 seconds of rows available. In reality you will have 10 to 20 seconds worth of data, so on average 15 seconds (assuming a constant INSERT rate). If you run the cleaner job more often (say once per second) the average number of rows is 10.5 seconds worth of data. This comes at the cost of running the cleaning event more often. But it might be very beneficial to increase this accuracy because all other queries benefit from less data to operate on and less memory consumed by expired rows.
If you go with the simple solution of a regular DELETE event, it is sufficient to schedule the event more often.
If you prefer the TRUNCATE PARTITION option, it is necessary to increase the number of partitions or buckets to 12 (= 2 + TTL / how often to run the cleaning job).
The expression for the calculated bucket column will be
`bucket` tinyint(4) GENERATED ALWAYS AS
       (floor(TO_SECONDS(`created_at`) / TTL) % #buckets) STORED NOT NULL
and the partitioning needs to be adapted as well.
And the CASE construct in the cleaner event must be extended for each newly existing bucket/partition:
        WHEN n THEN ALTER TABLE test.t TRUNCATE PARTITION p(n+1);

What happens if...

... the event stops?
Then you keep all your rows which will likely create some follow-up problems. As always: Proper monitoring is key. Think about MySQL Enterprise Monitor for example.

... the event procedure runs at inaccurate timing due to overall system load?
No big problem. It will never run too early. So it will never remove rows too early. If it runs too late it will clean rows too late so you have more garbage in your table which might affect other queries. The real TTL is increased if this happens.

Performance Considerations

By no means I am able to run proper performance tests. I am running on a Win10 laptop, VirtualBox with Oracle Linux and MySQL runs inside a Docker container. So plenty of reason to achieve bad numbers. But to compare the two implementations it should be sufficient.
I have extended the cleaner events to report the time needed to execute the event procedure. Here the example of the simple cleaner job:
CREATE EVENT ttl_delete
ON SCHEDULE EVERY 10 SECOND STARTS '2019-03-04 16:00:00' DISABLE
DO BEGIN
    DECLARE t1,t2 TIME(6);
    SET t1=current_time(6);
    DELETE FROM t WHERE created_at < NOW() - INTERVAL 10 SECOND;
    SET t2=current_time(6);
    INSERT INTO ttl_report VALUES ("DELETE simple", now(),
                                   timediff(t2,t1));
END|
DELIMITER ;
The load was generated by mysqlslap, which only inserted rows in the table. Each test run starts the respective cleaner event, runs the mysqlslap load and then stops the cleaner event.
mysql -h 127.0.0.1 -uroot -pXXX -e \
        "USE test; ALTER event ttl_delete ENABLE;"

mysqlslap -h 127.0.0.1 -uroot -pXXX --create-schema=test \
   --concurrency=5 --iterations=20 --number-of-queries=10000
   --query="INSERT INTO test.t (created_at, content) VALUES
            (NULL,md5(id));"

mysql -h 127.0.0.1 -uroot -pXXX -e \
    "USE test; ALTER event ttl_delete DISABLE;"

mysql -h 127.0.0.1 -uroot -pXXX -e \
      "USE test; ALTER event ttl_truncate ENABLE;"

mysqlslap -h 127.0.0.1 -uroot -pXXX --create-schema=test
   --concurrency=5 --iterations=20 --number-of-queries=10000
   --query="INSERT INTO test.t (created_at, content) VALUES
           (NULL,md5(id));"
mysql -h 127.0.0.1 -uroot -pXXX -e \
    "USE test; ALTER event ttl_truncate DISABLE;"

The results are clearly in favor of truncating partitions. And the difference should be even higher the higher the INSERT rate gets. My poor setup achieved only less than 1000 inserts per second...

select who, avg(how_long) from ttl_report GROUP BY who;
+---------------+--------------------+
| who           |avg(how_long)       |
+---------------+--------------------+
| DELETE simple | 1.1980474444444444 |
| truncate      | 0.0400163333333333 |
+---------------+--------------------+
3 rows in set (0.0014 sec)

Side note

You might wonder why my test load is
INSERT INTO test.t (created_at, content) VALUES (NULL,'foo');"
Why do I mention the column "created_at" but then store NULL to give it the default of current_timestamp? If I omit the created_at column in this INSERT statement I get an error from the generated column due to bug #94550. Setting explicit_defaults_for_timestamp to OFF and then mentioning the timestamp column during INSERT is a workaround.

Mablomy

Friday, March 8, 2019

TTL - Time-to-Live in MySQL