SQL


        
                Markdown 7.00 KB
                                    
                        | None                    
                
                                        |
    0    0                            

            
                                    raw
                    download
                    clone
                    embed
                    print
                
                                    report
                
                
            Introduction
DCL: Data Control
DML: Data Manipulation
DQL: Data Query
Statements


SELECT [DISTINCT] colname [[AS] alias], ...


FROM tblname [[AS] alias] 


JOIN tblname [[AS] alias] ON 


WHERE condition to filter record that satisfy the condition 
Most used operators to form condition are the followng:

NOT x
x AND y
x OR y
BETWEEN x AND y
IN (x, y, ...)
x IS NULL and x IS NOT NULL


GROUP BY colname, ... 
Some of the most used aggregation functions are the following:

COUNT notice the difference between COUNT(*),COUNT(colname) and COUNT(DISTINCT colname)`
SUM
MIN and MAX
AVG
STDEV
VAR
``


HAVING condition to filter groups based on condition  


ORDER BY colname [ASC | DESC]


CASE WHEN condition THEN value ... [ELSE value] END


AS colname


Functions


ISNULL(original, substitution)


COALESCE(value, value, ...)


CAST(varname AS datatype)


ROUND(col, length, [, action]) where length could be positive to round on the right, or negative to round on the left, while action could be 0 (or left away) for actual rounding, or any other integer number for truncating


**TRUNC(val, n)***


generate_series(start, end, step)
With generate_series it's possible to create a frequency distribution:
WITH 
  bins AS (
    SELECT generate_series(min, max - bin_width, bin_width) AS lower, generate_series(min + bin_width, max, bin_width) AS upper
  ),
  varq AS (
    SELECT varname
    FROM tblname
    WHERE somecond
) 
SELECT lower, upper, count(varname) 
FROM bins
   LEFT JOIN varq ON varname >= lower AND varname < upper
GROUP BY lower, upper
ORDER BY lower;


DATEADD(datepart, n, date) where datepart is the unit of measurement, that can assume the following values:

year, yyyy, yy for Year
quarter, qq, q for Quarter
month, mm, m for month
dayofyear for Day of the year
day, dy, y for Day
week, ww, wk for Week
weekday, dw, w for Weekday
hour, hh for hour
minute, mi, n for Minute
second, ss, s for Second
millisecond, ms for Millisecond

while n is the number of the previously specified units to add, if positive, or subtract, if negative.


DATEDIFF(datepart, date_start, date_end) 


ABS(val)


SQUARE(val)


SQRT(val)


LOG(val[, base]) where is natural logarithm if base is left out


CONSTRAINTS
Key Constraints


primary key


candidate key


composite key


foreign key Foreign keys must reference a column with unique values for each row so the referenced row can be identified

Notice that a foreign key column can only contain values included in the referenced column, but it does not need to contain all of them


unique


Cell Constraints


NOT NULL


CHECK constraint. Depends on the previous contraint.


TYPE constraint: depending on the data type itself, it's possible to limit the length of the type, the set or the range of the possible values, etc. Remember that each column can only assume one data type, so that's a primitive constraint.


Join
Notice that even if two tables don't have a formal relationship between them in the database, this doesn't prevent you from joining them. To join the tables, you simply need to find a column that they have in common where the values are consistent across the tables. But remember: just because two tables have a column with the same name, it doesn't mean those columns necessarily contain compatible data. If you find more than one pair of columns with similar data, you may need to try joining with each in turn to see if you get the same number of results.
Inner Join
Left (Right) Join
Full Join
Anti Join
Semi Join
Self Join
Be careful that self-joins can easily get confusing. Use meaningful aliases to help keep everything straight.
Subquery
Expression
Derived Table
Common Table Expressions (CTE)
CTEs are like derived table, but they are predefined and can be used afterwards in any query as if they were a normal table.

The list of names in colnames can be different in wording from the result set of query but must have the same number of column.
WITH cte_name (colnames)
AS
( query )
length of colnames should reflect the column dimension of the query set result
Correlated Subquery
Window Functions
Aggregate Function(colname) OVER([PARTITION BY colname] [ORDER BY colname]) AS newname

If PARTITION is not present, the window equate to the entire table.

Notice that adding the ORDER clause could lead to different results depending on the column chosen to order on. In any case, you usually don't have to use ORDER BY when using aggregations, but if you want to create running totals, you have to arrange your rows according to the column you want to calculate totals on.
Other than aggregate function there are four dedicated window functions:

FIRST_VALUE(colname), returns the first value in each window
LAST_VALUE(colname), returns the last value in each window
LEAD(colname), returns the next value in each window. Notice that the last value of each window for the new column is always NULL
LAG(colname), returns the previous value in each window. Notice that the first value of each window for the new column is always NULL
ROW_NUMBER() adds a counting number to each row, restarting on each window

In all the above window functions the ORDER BY clause is required in the OVER statement.

Using CTEs and Window Functions together it's possible to calculate the mode for any column:
WITH mode_cte (col_mode, frequency)
AS
(
    SELECT col_mode,
    ROW_NUMBER() 
    OVER (PARTITION BY OrderPrice ORDER BY OrderPrice) AS frequency
    FROM tblname
)
SELECT col_mode AS Mode
FROM mode_cte
WHERE frequency IN (SELECT MAX(frequency) FROM mode_cte)
Set Operators
UNION
UNION ALL
INTERSECTS
EXCEPT

VARIABLES
DECLARE @varname vartype
SET @varname = value or SELECT @varname = value
SELECT @varname
vartypes: CHAR(n), VARCHAR(n), INT, DECIMAL(p, s) or NUMERIC(p, s), 
All variables names must start with the @ sign, and are case insensitive, as most things SQL
LOOPS
WHILE Loops
WHILE condition
  BEGIN
    code to perform desired outcome
    code to ensure condition can be verified somehow (avoid infinite loops please!)
  END
BREAK if you need to exit the loop before condition is true
FOR Loops