title | date | last_modified_at |
---|---|---|
[DB] Advanced SQL |
2024-03-27 |
2024-03-27 |
A database programmer must have access to a general purpose programming language for at least two reasons:
- Not all queries can be expressed in SQL.
SQL does not provide the full expressive power of a general-purpose language. - Non-declarative actions cannot be done from within SQL.
Non-delcatarive actions include printing a report, interacting with a user, and sending the results of a query to a GUI.
There are two approaches to accessing SQL from a general purpose programming language:
- Dynamic SQL: SQL statements that are constructed at runtime; for example, the application may allow users to enter their own queries
- A general purpose program can connect to and communicate with a database server using a collection of functions.
- Enabling you to build SQL statements dynamically at runtime
- Being able to create more general purpose, flexible applications by using dynamic SQL because the full text of a SQL statement may be unknown at compilation
- e.g. Dynamic SQL statement in Python
- A general purpose program can connect to and communicate with a database server using a collection of functions.
- Embedded SQL: SQL statements in an application that do not change at runtime and, therefore, can be hard-coded into the application
- Like dynamic SQL, it provides a means by which program can interact with a database server. However, under embedded SQL:
- The SQL statements are identified at compile time, and are translated into function calls.
- At runtime, these function calls connect to the database using an API that provides dynamic SQL facilities (but may be specific to the database being used).
- e.g. Oracle SQL
- Like dynamic SQL, it provides a means by which program can interact with a database server. However, under embedded SQL:
- A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
- e.g. whenever a tuple is inserted into the
takes
relation, updates thetot_creds
attribute of tuple in thestudent
relation.
- e.g. whenever a tuple is inserted into the
- The two requirements of a trigger design are:
- Specify the conditions under which the trigger is to be executed.
- Specify the actions to be taken when the trigger executes.
- Triggering event can be an insert, a deletion, or an update.
- Triggers on updates can be restricted to specific attributes.
- e.g.
AFTER UPDATE OF takes ON grade
- e.g.
- Values of attributes before and after an update can be referenced.
REFERENCING OLD ROW AS
: For deletions and updatesREFERENCING NEW ROW AS
: For insertions and updates
BEGIN ATOMIC ... END
can serve to collect multiple SQL statements into a single compound statement.- Triggers can be activated before and after an event.
AFTER UPDATE OF
BEFORE UPDATE OF
: Serves as an extra constraint that can prevent invalid updates, inserts, or deletes-
e.g., suppose the value of an inserted grade is blank, presumably to indicate the absence of a grade. We can define a trigger that replaces the value with
NULL
CREATE TRIGGER setnull_trigger BEFORE UPDATE OF takes REFERENCING NEW ROW AS nrow FOR EACH ROW WHEN (nrow.grade = ' ') BEGIN ATOMIC SET nrow.grade = NULL; END;
-
Example
- When the
grade
attribute is updated for a tuple in thetakes
relation, keep thetot_cred
attribute value ofstudent
tuples up-to-date:
-- When (Trigger Conditions)
CREATE TRIGGER credits_earned AFTER UPDATE OF takes ON (grade)
-- What (Trigger Actions)
REFERENCING NEW ROW AS nrow
REFERENCING OLD ROW AS orow
FOR EACH ROW
WHEN (nrow.grade <> 'F' AND nrow.grade IS NOT NULL)
AND (orow.grade = 'F' OR orow.grade IS NULL)
BEGIN ATOMIC
UPDATE student
SET tot_cred = tot_cred +
(
SELECT credits
FROM course
WHERE course.course_id = nrow.course_id
)
WHERE student.id = nrow.id;
END;
- Using triggers to maintain referential integrity
CREATE TRIGGER timeslot_check1 AFTER INSERT ON section
REFERENCING NEW ROW AS nrow
FOR EACH ROW
WHEN (nrow.time slot_id NOT IN (
SELECT time_slot_id
FROM time_slot)) /* time_slot_id not present in time_slot */
BEGIN
rollback
END;
CREATE TRIGGER timeslot_check2 AFTER DELETE ON timeslot
REFERENCING OLD ROW AS orow
FOR EACH ROW
WHEN (orow.time_slot_id NOT IN (
SELECT time_slot_id
FROM time_slot)/*last tuple for time_slot_id deleted from time_slot */
AND orow.time_slot_id IN (
SELECT time_slot_id
FROM section)) /* and time slot id still referenced from section*/
BEGIN
rollback
END;
- Place a reorder when inventory falls below a specified minimum
CREATE TRIGGER reorder AFTER UPDATE OF level ON inventory
REFERENCING OLD ROW AS orow, NEW ROW AS nrow
FOR EACH ROW
WHEN nrow.level <= (SELECT level
FROM minlevel
WHERE minlevel.item = orow.item)
AND orow.level > (SELECT level
FROM minlevel
WHERE minlevel.item = orow.item)
BEGIN ATOMIC
INSERT INTO orders
(SELECT item, amount
FROM reorder
WHERE reorder.item = orow.item);
END;
- Instead of executing a separate action for each affected row, a single action can be executed for all rows affected by a transaction.
- Instead of
FOR EACH ROW
, useFOR EACH STATEMENT
. - Instead of
REFERENCING [OLD/NEW] ROW AS ...
, useREFERENCING [OLD/NEW] TABLE AS ...
to refer to temporary tables (called transition tables) containing all the affected rows.
- Instead of
- Statement level triggers can be more efficient when dealing with SQL statements that update a large number of rows.
- Earlier, triggers were used for tasks such as
- Maintaining summary data (e.g., total salary of each department).
- Replicating databases by recording changes to special relations (called change or delta relations), and having a separate process that applies the changes over to a replica
- However, there are better ways of doing these nowadays.
- Databases today provide built-in materialized view facilities to maintain summary data.
- Databases provide built-in support for replication.
- Encapsulation facilities can be used instead of triggers in many cases. Define methods to update fields, which directly carry out actions as part of the update methods instead of through a trigger indirectly.
There are some risks for using triggers. For example,
- Risk of unintended execution of triggers. For example, when
- Loading data from a backup copy, or Replicating updates at a remote site
- In this case, the triggered action has already been executed and reflected in the backup or replica in the past
$\Rightarrow$ redundant - Thus triggers would have to be disabled initially and enabled when the backup site takes over processing from the primary system.
- In this case, the triggered action has already been executed and reflected in the backup or replica in the past
- Loading data from a backup copy, or Replicating updates at a remote site
- Trigger error at runtime might lead to the failure of critical transactions that set off the trigger.
- Cascading execution of triggers.
- e.g., an insert trigger on a relation causes another insert on the same relation
- might lead to an infinite chain of triggering
-
Recursive views make it possible to write queries that cannot be written without recursion or iteration.
- Without recursion, a non-recursive non-iterative program can perform only a fixed number of joins of relation with itself
- This program can give only a fixed number of levels of prerequisites
- Given a fixed non-recursive query, we can construct a database with a greater number of levels of prerequisites on which the query will not work
- Alternatively, we may write a procedure to iterate as many times as required
- See the procedure
findAllPrereqs
in the textbook
- See the procedure
- Without recursion, a non-recursive non-iterative program can perform only a fixed number of joins of relation with itself
-
Any recursive view must be defined as the union of two subqueries:
- base query: nonrecursive
- recursive query: uses the recursive view.
-
The meaning of a recursive view is best understood as follows:
- compute the base query and add all the resultant tuples to the recursively defined view relation (initially empty)
- compute the recursive query using the current contents of the view relation, and add all the resulting tuples back to the view relation
- keep repeating the above step until no new tuples are added to the view relation.
- at some point, no tuple is added to the view after recursion. This final result is called the fixed point of the recursive view definition.
-
Example. Find all (direct or indirect) prerequisite subjects of each course
WITH RECURSIVE rec_prereq(course_id, prereq_id) AS ( ( SELECT course_id, prereq_id FROM prereq ) UNION ( SELECT rec_prereq.course_id, prereq.prereq_id FROM rec_prereq, prereq WHERE rec_prereq.prereq_id = prereq.course_id ) ) SELECT course_id FROM rec_prereq;
- The example view
rec_prereq
is called the transitive closure of theprereq
relation, meaning that it is a relation that contains all pairs(cid, pre)
such thatpre
is a direct or indirect prerequisite ofcid
.
- The example view
-
Recursive views are required to be monotonic.
- For each step of the recursion, the view must contain all of the tuples it contained in the previous step, plus possibly some more.
- Recursive queries may not use any of the following construct:
- Aggregation on the recursive view
NOT EXISTS
on a subquery that uses the recursive view.- Set difference (
EXCEPT
) whose RHS uses the recursive view.
The aggregation support in SQL is quite powerful and handles most common tasks with ease.
-
Finding the position of a value in a larger set is a common operation. In SQL, ranking is done by
RANK() OVER
in conjunction with anORDER BY
specification. -
Examples
-
Find the rank of each student
SELECT ID, RANK() OVER (ORDER BY (GPA) DESC) AS s_rank FROM student_grades; /* student_grades(ID, GPA) is a relation * giving the GPA of each student */
-
Here, an extra
ORDER BY
clause can return the results in sorted order.SELECT ID, RANK() OVER (ORDER BY (GPA) DESC) AS s_rank FROM student_grades ORDER BY s_rank ASC;
-
-
A basic issue with ranking is how to deal with the case of mutiple tuples that are the same values on the ordering attribute(s).
- Naturally, the
RANK()
function leaves gaps.
Meaning that if the highest GPA is shared by two students, both would get rank1
, and the next rank would be3
. - There is a
DENSE_RANK()
function which does not leave gaps.
- Naturally, the
- Ranking can be done within partitions of data, using
PARTITION BY
. - Examples
-
Find the rank of students within each department
SELECT ID, dept_name, RANK() OVER ( PARTITION BY dept_name ORDER BY GPA DESC ) AS dept_rank FROM dept_grades ORDER BY dept_name, dept_rank;
-
- Multiple rank expressions can be used within a single
SELECT
statement. - When ranking (possibly with partitioning) occurs along with a
GROUP BY
clause, theGROUP BY
clause is applied first, and partitioning and ranking are done on the results ofGROUP BY
, allowing aggregate values to be used for ranking.
- Several other functions can be used in place of
RANK
PERCENT_RANK
: Gives the rank of the tuple as a fractionCUME_DIST
: Cumulative distributionROW_NUMBER
: Sorts the rows and gives each row a unique number corresponding to its position- Non-deterministic in presence of duplicates
NULLS FIRST
,NULLS LAST
NTILE(n)
: Takes the tuples in each partition in the specified order, and divides them inton
buckets with equal numbers of tuples
Example
-
Find for each student the quartile they belong to
SELECT ID, NTILE(4) OVER (ORDER BY GPA DESC) AS quartile FROM student_grades;
-
Find the rank of each student
SELECT ID, RANK() OVER (ORDER BY (GPA) DESC NULLS LAST) AS s_rank FROM student_grades; /* student_grades(ID, GPA) is a relation * giving the GPA of each student */
-
Window queries compute an aggregate function over ranges of tuples. They are used to smooth out random variables.
-
Unlike partitions, windows may overlap, in which case a tuple may contribute to more than one window.
-
SQL provides a windowing feature to support such queries.
ROWS BETWEEN n_1 PRECEDING AND n_2 FOLLOWING
-
Example
- Moving average: given sales values for each date, calculate for each date the average of the sales on that day, the previous day, and the next day
SELECT date, SUM(value) OVER ( ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) FROM sales;
- the average of total number of credits taken by students in each 3 year.
SELECT year AVG (num_credits) OVER ( ORDER BY year ROWS 3 PRECEDING ) FROM tot_credits;
- the average of total number of credits taken by students in all prior years
SELECT year AVG (num_credits) OVER ( ORDER BY year ROWS UNBOUNDED PRECEDING ) FROM tot_credits;
- SQL supports windowing within partitions.
Example
- Find total balance of each account after each transaction on the account
SELECT account_number,
date_time,
SUM(value) OVER
(
PARTITION BY account_number
ORDER BY date_time
ROWS UNBOUNDED PRECEDING
) AS balance
FROM transaction
ORDER BY account_number, date_time;
UNBOUNDED
: The number of preceding / following rows are unboundedCURRENT ROW
: Specifies the current row- e.g.,
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- e.g.,
RANGE BETWEEN ...
: Cover all tuples with a particular value rather than covering a specific number of tuples
[1] Silberschatz, Abraham, Henry F. Korth, and Shashank Sudarshan. "Database system concepts." (2011).