Introduction
DCL: Data Control
DML: Data Manipulation
DQL: Data Query
Statements
-
SELECT [DISTINCT] colname [[AS] alias], ...
-
FROM tblname [[AS] alias]
-
JOIN tblname [[AS] alias] ON
-
WHERE condition to filter record that satisfy the condition
Most used operators to form condition are the followng:
NOT xx AND yx OR yBETWEEN x AND yIN (x, y, ...)x IS NULLandx IS NOT NULL
-
GROUP BY colname, ...
Some of the most used aggregation functions are the following:
COUNTnotice the difference betweenCOUNT(*),COUNT(colname)andCOUNT(DISTINCT colname)`SUMMINandMAXAVGSTDEVVAR- ``
-
HAVING condition to filter groups based on condition
-
ORDER BY colname [ASC | DESC]
-
CASE WHEN condition THEN value ... [ELSE value] END
-
AS colname
Functions
-
ISNULL(original, substitution)
-
COALESCE(value, value, ...)
-
CAST(varname AS datatype)
-
ROUND(col, length, [, action]) where length could be positive to round on the right, or negative to round on the left, while action could be 0 (or left away) for actual rounding, or any other integer number for truncating
-
**TRUNC(val, n)***
-
generate_series(start, end, step)
With generate_series it's possible to create a frequency distribution:
WITH bins AS ( SELECT generate_series(min, max - bin_width, bin_width) AS lower, generate_series(min + bin_width, max, bin_width) AS upper ), varq AS ( SELECT varname FROM tblname WHERE somecond ) SELECT lower, upper, count(varname) FROM bins LEFT JOIN varq ON varname >= lower AND varname < upper GROUP BY lower, upper ORDER BY lower; -
DATEADD(datepart, n, date) where datepart is the unit of measurement, that can assume the following values:
- year, yyyy, yy for Year
- quarter, qq, q for Quarter
- month, mm, m for month
- dayofyear for Day of the year
- day, dy, y for Day
- week, ww, wk for Week
- weekday, dw, w for Weekday
- hour, hh for hour
- minute, mi, n for Minute
- second, ss, s for Second
- millisecond, ms for Millisecond
while n is the number of the previously specified units to add, if positive, or subtract, if negative.
-
DATEDIFF(datepart, date_start, date_end)
-
ABS(val)
-
SQUARE(val)
-
SQRT(val)
-
LOG(val[, base]) where is natural logarithm if base is left out
CONSTRAINTS
Key Constraints
-
primary key
-
candidate key
-
composite key
-
-
foreign key Foreign keys must reference a column with unique values for each row so the referenced row can be identified
Notice that a foreign key column can only contain values included in the referenced column, but it does not need to contain all of them -
unique
Cell Constraints
-
NOT NULL
-
CHECK constraint. Depends on the previous contraint.
-
TYPE constraint: depending on the data type itself, it's possible to limit the length of the type, the set or the range of the possible values, etc. Remember that each column can only assume one data type, so that's a primitive constraint.
Join
Notice that even if two tables don't have a formal relationship between them in the database, this doesn't prevent you from joining them. To join the tables, you simply need to find a column that they have in common where the values are consistent across the tables. But remember: just because two tables have a column with the same name, it doesn't mean those columns necessarily contain compatible data. If you find more than one pair of columns with similar data, you may need to try joining with each in turn to see if you get the same number of results.
Inner Join
Left (Right) Join
Full Join
Anti Join
Semi Join
Self Join
Be careful that self-joins can easily get confusing. Use meaningful aliases to help keep everything straight.
Subquery
Expression
Derived Table
Common Table Expressions (CTE)
CTEs are like derived table, but they are predefined and can be used afterwards in any query as if they were a normal table.
The list of names in colnames can be different in wording from the result set of query but must have the same number of column.
WITH cte_name (colnames)
AS
( query )
length of colnames should reflect the column dimension of the query set result
Correlated Subquery
Window Functions
Aggregate Function(colname) OVER([PARTITION BY colname] [ORDER BY colname]) AS newname
If PARTITION is not present, the window equate to the entire table.
Notice that adding the ORDER clause could lead to different results depending on the column chosen to order on. In any case, you usually don't have to use ORDER BY when using aggregations, but if you want to create running totals, you have to arrange your rows according to the column you want to calculate totals on.
Other than aggregate function there are four dedicated window functions:
FIRST_VALUE(colname), returns the first value in each windowLAST_VALUE(colname), returns the last value in each windowLEAD(colname), returns the next value in each window. Notice that the last value of each window for the new column is always NULLLAG(colname), returns the previous value in each window. Notice that the first value of each window for the new column is always NULL- ROW_NUMBER() adds a counting number to each row, restarting on each window
In all the above window functions theORDER BYclause is required in the OVER statement.
Using CTEs and Window Functions together it's possible to calculate the mode for any column:
WITH mode_cte (col_mode, frequency)
AS
(
SELECT col_mode,
ROW_NUMBER()
OVER (PARTITION BY OrderPrice ORDER BY OrderPrice) AS frequency
FROM tblname
)
SELECT col_mode AS Mode
FROM mode_cte
WHERE frequency IN (SELECT MAX(frequency) FROM mode_cte)
Set Operators
UNION
UNION ALL
INTERSECTS
EXCEPT
VARIABLES
DECLARE @varname vartype
SET @varname = value or SELECT @varname = value
SELECT @varname
vartypes: CHAR(n), VARCHAR(n), INT, DECIMAL(p, s) or NUMERIC(p, s),
All variables names must start with the @ sign, and are case insensitive, as most things SQL
LOOPS
WHILE Loops
WHILE condition
BEGIN
code to perform desired outcome
code to ensure condition can be verified somehow (avoid infinite loops please!)
END
BREAK if you need to exit the loop before condition is true