ONLY_FULL_GROUP_BY是MySQL提供的一個(gè)sql_mode,通過這個(gè)sql_mode來提供SQL語句“分組求最值”合法性的檢查,在MySQL的sql_mode為非ONLY_FULL_GROUP_BY語義時(shí)。一條select語句,MySQL允許target list中輸出的表達(dá)式是除聚集函數(shù)或group by column以外的表達(dá)式,但這個(gè)表達(dá)式的值可能在經(jīng)過group by操作后變成了未知的,例如:
1 2 3 4 5 6 7 | create table emp(id int primary key, ename varchar(20), sal decimal(10, 2), deptno int); insert into emp select 1001,'emp_1001',100.00,10; insert into emp select 1002,'emp_1002',200.00,10; insert into emp select 1003,'emp_1003',300.00,20; insert into emp select 1004,'emp_1004',400.00,20; insert into emp select 1005,'emp_1005',500.00,30; insert into emp select 1006,'emp_1006',600.00,30; |
其中,empno是員工編號(hào),ename是員工姓名,sal是工資,deptno是員工所在部門號(hào)。
業(yè)務(wù)的需求是,求出每個(gè)部門中工資最高的員工的相關(guān)信息。
在MySQL 5.6中,可能會(huì)看見這種寫法:
1 2 3 4 5 6 7 8 9 | mysql> select deptno,ename,max(sal) from emp group by deptno; +--------+----------+----------+ | deptno | ename | max(sal) | +--------+----------+----------+ | 10 | emp_1001 | 200.00 | | 20 | emp_1003 | 400.00 | | 30 | emp_1005 | 600.00 | +--------+----------+----------+ 3 rows in set (0.00 sec) |
實(shí)在不明白,這里的ename在業(yè)務(wù)層有何意義,畢竟,他并不是工資最高的那位員工。但仔細(xì)觀察會(huì)發(fā)現(xiàn),ename值也算有規(guī)律,總是分組后的第一條記錄。由此,在5.6中,我們可以通過下面這個(gè)SQL來實(shí)現(xiàn)這個(gè)需求,算是一種投機(jī)方式。
1 2 3 4 5 | SELECT deptno,ename,sal FROM ( SELECT * FROM emp ORDER BY sal DESC ) t GROUP BY deptno; |
得到結(jié)果如下:
1 2 3 4 5 6 7 8 | +--------+----------+--------+ | deptno | ename | sal | +--------+----------+--------+ | 10 | emp_1002 | 200.00 | | 20 | emp_1004 | 400.00 | | 30 | emp_1006 | 600.00 | +--------+----------+--------+ 3 rows in set (0.00 sec) |
而對(duì)于語義限制都比較嚴(yán)謹(jǐn)?shù)亩嗉覕?shù)據(jù)庫,如SQLServer、Oracle、PostgreSql都不支持select target list中出現(xiàn)語義不明確的列,這樣的語句在這些數(shù)據(jù)庫中是會(huì)被報(bào)錯(cuò)的,這也是SQL92的標(biāo)準(zhǔn)。所以從MySQL 5.7版本開始修正了這個(gè)語義,就是我們所說的ONLY_FULL_GROUP_BY語義。
例如,查看MySQL 5.7默認(rèn)的sql_mode如下:
1 2 | mysql> select @@global.sql_mode; ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
所以,對(duì)于同樣的SQL,我們?cè)贛ySQL 5.7再跑一次就會(huì)報(bào)錯(cuò)了,剛才通過的查詢語句被MySQL拒絕掉了!如下錯(cuò)誤:
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 't.ename' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
在ONLY_FULL_GROUP_BY模式下,因?yàn)閠arget list中的ename沒有出現(xiàn)在聚集函數(shù)中,并且也沒有出現(xiàn)在group by list中,所以MySQL給拒絕了。這也是MySQL 5.6升級(jí)到MySQL 5.7需要注意的地方。如果我們?nèi)サ鬙NLY_FULL_GROUP_BY模式,如下操作:
1 | mysql> set session sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'; |
現(xiàn)在,我們?cè)谌サ鬙NLY_FULL_GROUP_BY語義的MySQL 5.7上把剛才的查詢?cè)俅螆?zhí)行,得到結(jié)果如下:
1 2 3 4 5 6 7 8 | +--------+----------+--------+ | deptno | ename | sal | +--------+----------+--------+ | 10 | emp_1001 | 100.00 | | 20 | emp_1003 | 300.00 | | 30 | emp_1005 | 500.00 | +--------+----------+--------+ 3 rows in set (0.00 sec) |
結(jié)果竟然跟5.6不一樣。
實(shí)際上,在MySQL5.7中,對(duì)該SQL進(jìn)行了改寫,改寫后的SQL可通過show warnings查看。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> explain SELECT deptno,ename,sal FROM ( SELECT * FROM emp ORDER BY sal DESC ) t GROUP BY deptno; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | emp | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using temporary; Using filesort | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show warnings; +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | /* select#1 */ select `employees`.`emp`.`deptno` AS `deptno`,`employees`.`emp`.`ename` AS `ename`,`employees`.`emp`.`sal` AS `sal` from `employees`.`emp` group by `employees`.`emp`.`deptno` | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) |
從改寫后的SQL來看,其消除了子查詢,導(dǎo)致結(jié)果未能實(shí)現(xiàn)預(yù)期效果。但可通過修改optimizer_switch來加以規(guī)避(setoptimizer_switch=”derived_merge=off”),derived_merge是MySQL 5.7引入的,其會(huì)試圖將derived table(派生表,from后面的子查詢)、視圖引用、共用表表達(dá)式與外層表進(jìn)行合并。
Tips:所以很多從5.6升級(jí)到5.7時(shí),為了語法兼容,大部分都會(huì)選擇調(diào)整sql_mode,使其保持跟5.6一致,為了盡量兼容程序。但是這個(gè)問題也是需要注意的,不要以為這種投機(jī)的寫法不會(huì)有人用。在stackoverflow中,該實(shí)現(xiàn)的點(diǎn)贊數(shù)就有116個(gè),由此可見其受眾之廣。
所以O(shè)NLY_FULL_GROUP_BY的語義就是確定select target list中的所有列的值都是明確語義,簡單的說來,在ONLY_FULL_GROUP_BY模式下,target list中的值要么是來自于聚集函數(shù)的結(jié)果,要么是來自于group by list中的表達(dá)式的值。但是由于表達(dá)式的表現(xiàn)形式非常豐富,對(duì)于MySQL來說,很難精確的確定一些表達(dá)式的輸出結(jié)果是明確的,比如:
1 2 3 | mysql> select deptno from emp group by deptno+1; ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'employees.emp.deptno' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by |
在上面的查詢語句中,其實(shí)count的值也是能被唯一確定的,但是由于程序無法分析出這種復(fù)雜的關(guān)系,所以這條查詢也被拒絕掉了。
因此,如果查詢語句中的target list,having condition或者order by list里引用了的表達(dá)式不是聚集函數(shù),但是和group by list中的表達(dá)式嚴(yán)格匹配,該語句也是合法的(deptno+1和deptno+1是嚴(yán)格匹配的,deptno+1和deptno+2在MySQL認(rèn)為是不嚴(yán)格匹配的, deptno+1和1+deptno也是不嚴(yán)格匹配的)。
如下嚴(yán)格匹配模式下,查詢是沒有問題的。
1 2 3 4 5 6 7 8 9 | mysql> select deptno+1 from emp group by deptno+1; +----------+ | deptno+1 | +----------+ | 11 | | 21 | | 31 | +----------+ 3 rows in set (0.00 sec) |
這條語句target list中的deptno+1和group by中的deptno+1是嚴(yán)格匹配的,所以MySQL認(rèn)為target list中的deptno+1是語義明確的,因此該語句可以通過。
再看下面這條語句:
1 2 3 4 5 6 7 8 9 | mysql> select deptno+1 as a from emp group by a order by deptno+1 desc; +------+ | a | +------+ | 31 | | 21 | | 11 | +------+ 3 rows in set (0.00 sec) |
MySQL允許target list中對(duì)于非聚集函數(shù)的alias column被group by、having condition以及order by語句引用,從上面兩條語句可以看出,group by和order by中引用了alias column,并且其等價(jià)于基礎(chǔ)列語義。
總結(jié)一下:MySQL對(duì)于ONLY_FULL_GROUP_BY語義的判斷規(guī)則是,如果group by list中的表達(dá)式是basic column,那么target list中允許出現(xiàn)表達(dá)式是group by list中basic column或者alias column的組合結(jié)果,如果group by list中的表達(dá)式是復(fù)雜表達(dá)式(非basic column或者alias column),那么要求target list中的表達(dá)式必須能夠嚴(yán)格和group by list中的表達(dá)式進(jìn)行匹配,否者這條查詢會(huì)被認(rèn)為不合法。
其實(shí)分組求最值是一個(gè)很普遍的需求。下面具體來看看,MySQL中有哪些實(shí)現(xiàn)方式。
方法一:
1 2 3 4 5 6 7 8 9 10 | SELECT e.deptno, ename, sal FROM emp e, ( SELECT deptno, max( sal ) maxsal FROM emp GROUP BY deptno ) t WHERE e.deptno = t.deptno AND e.sal = t.maxsal; |
方法二:
1 2 3 4 5 6 7 8 9 10 | SELECT a.deptno, a.ename, a.sal FROM emp a LEFT JOIN emp b ON a.deptno = b.deptno AND a.sal < b.sal WHERE b.sal IS NULL; |
性能呢?
方法一執(zhí)行計(jì)劃:
1 2 3 4 5 6 7 8 9 | mysql> desc SELECT e.deptno,ename,sal FROM emp e, ( SELECT deptno, max( sal ) maxsal FROM emp GROUP BY deptno ) t WHERE e.deptno = t.deptno AND e.sal = t.maxsal; +----+-------------+------------+------------+-------+----------------+----------------+---------+-------------------+------+----------+----------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+-------+----------------+----------------+---------+-------------------+------+----------+----------------------------------------+ | 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where | | 1 | PRIMARY | e | NULL | ref | idx_deptno_sal | idx_deptno_sal | 11 | t.deptno,t.maxsal | 1 | 100.00 | Using join buffer (Batched Key Access) | | 2 | DERIVED | emp | NULL | range | idx_deptno_sal | idx_deptno_sal | 5 | NULL | 4 | 100.00 | Using index for group-by | +----+-------------+------------+------------+-------+----------------+----------------+---------+-------------------+------+----------+----------------------------------------+ 3 rows in set, 1 warning (0.00 sec) |
在有復(fù)合索引(deptno, sal)的情況下,結(jié)果瞬間就能出來應(yīng)該。根據(jù)執(zhí)行計(jì)劃,先將group by的結(jié)果放到臨時(shí)表中,然后再將該臨時(shí)表作為驅(qū)動(dòng)表,來和emp表進(jìn)行關(guān)聯(lián)查詢。驅(qū)動(dòng)表小(只有3條記錄),關(guān)聯(lián)列又有索引,無怪乎,結(jié)果能秒出。
方法二執(zhí)行計(jì)劃:
1 2 3 4 5 6 7 8 | mysql> explain SELECT a.deptno,a.ename,a.sal FROM emp a LEFT JOIN emp b ON a.deptno = b.deptno AND a.sal < b.sal WHERE b.sal IS NULL; +----+-------------+-------+------------+------+----------------+----------------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+----------------+----------------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | a | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | NULL | | 1 | SIMPLE | b | NULL | ref | idx_deptno_sal | idx_deptno_sal | 5 | employees.a.deptno | 2 | 16.67 | Using where; Using index | +----+-------------+-------+------------+------+----------------+----------------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) |
兩表關(guān)聯(lián)查詢。其犯了SQL優(yōu)化中的兩個(gè)大忌。驅(qū)動(dòng)表對(duì)于優(yōu)化器來說,沒辦法自由選擇,只能使用a表,所以a表如果越大就越慢。被驅(qū)動(dòng)表雖然也有索引,但從執(zhí)行計(jì)劃上看,其只使用了復(fù)合索引(deptno,sal)中的deptno,如果deptno選擇性太低,則越慢。
所以,對(duì)于分組求最值的需求,建議使用方法一,其不僅符合SQL規(guī)范,查詢性能上也是最好的。
<參考>
https://mp.weixin.qq.com/s/u9Twv24IKxfHVyj62B4VtQ
聯(lián)系客服