1. IN 和 EXISTS的区别
1.1 IN 的用法
语法:SELECT * FROM table_name WHERE col_name in (value1, value2, ...); , 其一般形态是:SELECT * FROM table_a WHERE col_name IN (SELECT col_name FROM table_b)
IN 查询相当于多个 OR 条件的叠加;
IN 的内部工作原理: IN 只执行一次,它查出内表table_b 中的所有col_name 字段 并缓存起来; 之后,遍历循环外表table_a ,将符合条件的外表的记录加入结果集中,直至循环结束。
其本质,就是双层循环,外层循环是外表table_a: 当内表数据比较大时,不再适用IN,因为他会将内表的数据全部遍历一次。 即 IN适合内表比外表数据小的情况。
result_lst = []
table_a = (SELECT * FROM tale_a)
table_b = (SELECT col_name FROM table_b)
for i in table_a:
for j in table_b:
if table_a[i].col_name = table_b[j].col_name:
result_lst.append(table_a[i])
使用例子:
SELECT *
FROM employees e
WHERE e.emp_no IN (SELECT emp_no
FROM dept_emp)
1.2 EXISTS 的用法
EXISTS 语法:SELECT * FROM table_a a WHERE EXISTS (SELECT 1 FROM table_b WHERE a.col_name = b.col_name )
EXISTS 的执行原理: EXISTS 会执行 外表的长度(table_a.length )次,即遍历循环外表 table_a ,在内表中寻找与外表匹配的项(一般是相等)然后将满足条件的结果(TRUE)放进结果集; EXISTS 并不缓存EXISTS的结果集,因为结果集的内容并不重要,重要的是其内查询语句的结果集空或者非空,空则返回FALSE ,非空则返回TRUE ;
其本质是单层循环,循环的是外表 table_a 。 因此,在内表数据比外表数据大的时候使用EXISTS会更高效。
result_lst = []
table_a = {SELECT * FROM table_a)
for i in table_a:
if table_a[i].col_name:
result_lst.append(table_a[i])
例子:
SELECT *
FROM employees e
WHERE NOT EXISTS (SELECT emp_no
FROM dept_emp d
WHERE e.emp_no = d.emp_no
)
1.3 用法区别小结
EXISTS 与 IN 的区别:
IN 是把外表和内表作hash连接 ;而 EXISTS 是对外表作loop循环 ,每次loop循环再对内表进行查询IN 是在内存里遍历比较,而EXISTS 需要查询数据库,所以当内表数据量较大时,EXISTS 效率优于IN 。
2. 例题
题目一
这题总结自Module 3 Coding Assignment-第6题
题目描述:
Use a UNION to create a list of all the employee’s and customer’s first names and last names ordered by the last name in descending order.
表格:
我一开始使用 NOT IN 筛选出员工,但是一直报错,后来尝试使用 EXISTS (之前也遇到过,知道 NOT IN 在字段存在NULL 的情况下会出问题,所以尽量不要使用 ),没想到直接通了。
Employees表格 大概长这样: (注意 ReportsTo 与 EmployeeID 数据类型对应)
+
| EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | Email |
+
| 1 | Adams | Andrew | General Manager | None | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
| 2 | Edwards | Nancy | Sales Manager | 1 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
| 3 | Peacock | Jane | Sales Support Agent |
题解(错误地使用 NOT IN):
SELECT em.EmployeeId, em.FirstName, em.LastName
FROM Employees em
WHERE EmployeeId IN (SELECT DISTINCT ReportsTo FROM Employees)
UNION
SELECT ee.EmployeeId, ee.FirstName, ee.LastName
FROM Employees ee
WHERE EmployeeId NOT IN (SELECT DISTINCT ReportsTo FROM Employees)
ORDER BY em.LastName DESC
结果不对:
+
| em.EmployeeId | em.FirstName | em.LastName |
+
| 6 | Michael | Mitchell |
| 2 | Nancy | Edwards |
| 1 | Andrew | Adams |
+
正解(使用 EXISTS ):
SELECT em.EmployeeId, em.FirstName, em.LastName
FROM Employees em
WHERE EmployeeId IN (SELECT DISTINCT ReportsTo FROM Employees)
UNION
SELECT ee.EmployeeId, ee.FirstName, ee.LastName
FROM Employees ee
WHERE NOT EXISTS (SELECT DISTINCT ee2.ReportsTo FROM Employees ee2 WHERE ee.EmployeeId = ee2.ReportsTo)
ORDER BY em.LastName DESC
得到:
+
| em.EmployeeId | em.FirstName | em.LastName |
+
| 3 | Jane | Peacock |
| 4 | Margaret | Park |
| 6 | Michael | Mitchell |
| 7 | Robert | King |
| 5 | Steve | Johnson |
| 2 | Nancy | Edwards |
| 8 | Laura | Callahan |
| 1 | Andrew | Adams |
+
题目二
IN 的又一个坑
SELECT * FROM Score
WHERE c_id IN ('01', '02')
参考:
- SQL中exists和in的用法以及区别
- SQL查询中in和exists的区别分析
|