The optimizer in MySQL 5.7 leverages generated columns. Generated columns will physically store data in two cases: Either the column is defined as STORED or you create an index on a virtual column. The optimizer will leverage such an index automatically if it encounters the same expression in a statement. Let's see an example:
mysql> DESC squares;
+-------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| dx | int(10) unsigned | YES | | NULL | |
| dy | int(10) unsigned | YES | | NULL | |
+-------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> SELECT COUNT(*) FROM squares;
+----------+
| COUNT(*) |
+----------+
| 2097152 |
+----------+
1 row in set (0.77 sec)
We have a large table with 2 million rows. Selecting rows by the surface area of squares can hardly leverage an index on dx or dy:
mysql> EXPLAIN SELECT * FROM squares WHERE dx*dy=221\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: squares
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2092860
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Now let's add an index over a generated, virtual column that defines the area:
mysql> ALTER TABLE squares ADD COLUMN (area INT AS (dx*dy));
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE squares ADD INDEX (area);
Query OK, 0 rows affected (5.24 sec)
Records: 0 Duplicates: 0 Warnings: 0
Now we can run query again:
mysql> EXPLAIN SELECT * FROM squares WHERE dx*dy=221\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: squares
partitions: NULL
type: ref
possible_keys: area
key: area
key_len: 5
ref: const
rows: 18682
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
I did not change the query! The WHERE condition is still dx*dy. Nevertheless the optimizer finds the generated column, sees the index and decides to leverage that.
So you can add complex indexes and without changing the application code you can benefit from these indexes. That makes life much easier.
One limitation though: It seems the optimizer recognizes expressions only in the WHERE clause. It will not use the generated column and index for the SELECT expression:
mysql> EXPLAIN SELECT SUM(dx*dy) FROM squares\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: squares
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2092860
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT SUM(area) FROM squares\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: squares
partitions: NULL
type: index
possible_keys: NULL
key: area
key_len: 5
ref: NULL
rows: 2092860
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)
The aggregate function is indeed not using an index, but also something simpler like this is not (yet) using the index:
ReplyDeleteCREATE TABLE squares (dx int unsigned, dy int unsigned);
insert into squares select round(rand()*100,0), round(rand()*100,0);
insert into squares select round(rand()*100,0), round(rand()*100,0) from squares; -- until COUNT(*) == 2097152
ALTER TABLE squares ADD COLUMN (area INT AS (dx*dy));
ALTER TABLE squares ADD INDEX (area);
ALTER TABLE squares ADD INDEX `dx__area` (dx, area);
EXPLAIN SELECT * FROM squares WHERE dx*dy=221\G
EXPLAIN SELECT dx*dy FROM squares WHERE dx=5\G
EXPLAIN FORMAT=JSON SELECT dx*dy FROM squares WHERE dx=5\G
EXPLAIN FORMAT=JSON SELECT area FROM squares WHERE dx=5\G
Good point. Your example is even simpler without the aggregation function.
DeleteI also tested ... WHERE dy*dx=221; so reverse order of factors. That is mathematically the same but the optimizer does not recognize the expression. However I doubt that is a big issue. If your application is really creating both types of expressions either create both virtual columns with indexes (probably costly) or use the Query Rewrite Plugin.