SELECT statement¶
The SELECT
statement is used to read data from a database. The basic syntax is:
Note
Although it is common practice to put SQL reserved words in capital letters,
in general it is not necessary. For example, SELECT
and select
are equivalent.
The last query will return columns c1
, c2
from the table some_table
.
We can also use the *
to query all rows from a table
Note
The select statement from SQL is similar to panda's .loc
function.
For example, to write the previous query in pandas we would do:
Aliases¶
Sometimes we want to rename the columns that we are querying. We can do this using aliases for tables and columns:
This is particularly useful when we are accessing data from multiple tables, and we want to avoid
ambiguities. For example, if we have two tables with a column called id
, we can use aliases to
distinguish between them:
We will see more about JOIN
statements later.
Note
We can use break lines in SQL queries to make them more readable, but this is not necessary. With breaklines or not, the query will be executed in the same way.
We can also use aliases to define new columns:
SELECT
CustomerName,
Address + ', ' + PostalCode + ' ' + City + ', ' + Country AS Address
FROM Customers;
Note
The table we query from does not need to be an existing table: it can be any form of
derived table, created by another SELECT ...
statement. Typical ways to create this
new tables to query from are Common Table Expressions (CTEs) and subqueries (just another
SELECT
statement between parentheses). We will see more about this later.
WHERE clauses¶
So far we have seen how to query all rows from a table. However, in most cases we want to query
only a subset of the rows. We can do this using the WHERE
clause:
In the last example, some_condition
is a boolean expression that evaluates to True
or False
.
Note
some_condition
can be of type =
(e.g. A = 3
), different !=
, >
, <
, etc.
Conditions can be chained by AND
and OR
operators. They can also be of the type in a set
, e.g.:
IN (value1, value2, ...);
, or equivalently in a derived table with one column, IN (SELECT ...)
.
Note
To match a string, we can use the LIKE
operator, which allows us to use wildcards such as %
and _
.
For example, to match all strings that start with a
and end with b
, we can use the following condition:
Other operators¶
DISTINCT¶
Distinct rows from a table can be queried using the DISTINCT
keyword:
ORDER BY¶
We can also sort the result set in ascending or descending order using the ORDER BY
clause:
LIMIT and OFFSET¶
To limit the number of rows returned by a query, we can use the LIMIT
clause:
OFFSET
clause:
GROUP BY and aggregate functions¶
Like in pandas, we can group unique values of one or more columns and apply an aggregate function to them
Aggregate functions are, essentially, functions that take a set of values and return a single value. We apply the aggregate function to each column that is not grouped by. For example, we can count the number of rows in each group:
Inside the COUNT
parentheses you can put a DISTINCT to count different appearances or all of them.
Note
The COUNT function returns the number of rows for which the expression evaluates to a non-null value. (* is a special expression that is not evaluated, it simply returns the number of rows.)
Usual aggregate functions are:
COUNT
(count the number of rows)SUM
(sum of all values in a column)AVG
(average of all values in a column)MIN
(minimum value in a column)MAX
(maximum value in a column)STDDEV
(standard deviation of all values in a column)
HAVING¶
HAVING is used to filter records that work on summarized GROUP BY results. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.
Note
Difference between HAVING and WHERE:
- HAVING is used to check conditions after the aggregation takes place.
- WHERE is used to check conditions before the aggregation takes place.
Hence, when we can use it, WHERE is faster than HAVING.
For example, this code:
Gives you a table of all cities in MA and the number of addresses in each city. This code:Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
CASE statements¶
The CASE statement goes through conditions as and if/elif
statement, and returns a column:
SELECT OrderID, Quantity,
CASE
WHEN Quantity > 30 THEN 'The quantity is greater than 30'
WHEN Quantity = 30 THEN 'The quantity is 30'
ELSE 'The quantity is under 30'
END AS QuantityText
FROM OrderDetails;
Table JOINs¶
A join stitches together 2 tables, based on some condition (e.g. equality of a row).
We use the JOIN
when we want to query data from multiple tables at once. The basic syntax is:
For example, we can join the Customers
and Orders
tables to get the name of the customer
who made each order:
SELECT
Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM
Orders
INNER JOIN
Customers
ON
Orders.CustomerID = Customers.CustomerID;
Note
The SQL Join is the equivalent of the merge
function in pandas.
Joins can be of several different types:
- INNER (intersection of values)
- LEFT (Intersection + unpaired from left table)
- RIGHT (same but with right table)
- FULL OUTER (both).
Combine rows from two queries¶
UNION ALL is used to combine the result from multiple SELECT statements into a single result set.
We can also find the intersection of two queries, or subtract one from the other. Intersection:
Subtract a result set from another result set:Subqueries¶
A subquery is a SELECT query nested inside another query. We can use subqueries to query data from multiple tables, or to query data from the same table using different conditions. For example, we can use a subquery to find the name of the customer who made the most recent order:
SELECT
CustomerName
FROM
Customers
WHERE
CustomerID = (
SELECT
CustomerID
FROM
Orders
ORDER BY
OrderDate DESC
LIMIT 1
);
Common Table Expressions (CTEs)¶
A Common Table Expression (CTE) is a temporary result set that we can reference within another query. It is similar to a subquery, but it is more readable (because it has a name), and hence it is easier to maintain.
CTEs always start with the WITH
keyword, followed by the name of the CTE, and the query that defines it.
For example, we can use a CTE to find the name of the customer who made the most
recent order:
WITH last_order AS (
SELECT
CustomerID
FROM
Orders
ORDER BY
OrderDate DESC
LIMIT 1
)
SELECT
CustomerName
FROM
Customers
WHERE
CustomerID = last_order.CustomerID;
We can define multiple CTEs in the same query, and we can also use CTEs to query data from multiple tables:
WITH
last_order AS (
SELECT
CustomerID
FROM
Orders
ORDER BY
OrderDate DESC
LIMIT 1
),
customer AS (
SELECT
CustomerName
FROM
Customers
WHERE
CustomerID = last_order.CustomerID
)
SELECT
customer.CustomerName,
Orders.OrderDate
FROM
Orders
INNER JOIN
customer
ON
Orders.CustomerID = customer.CustomerID;