Skip to content

Latest commit

 

History

History
380 lines (310 loc) · 15.6 KB

File metadata and controls

380 lines (310 loc) · 15.6 KB
title date last_modified_at
[DB] Advanced SQL
2024-03-27
2024-03-27

Accessing SQL From a Programming Language

A database programmer must have access to a general purpose programming language for at least two reasons:

  • Not all queries can be expressed in SQL.
    SQL does not provide the full expressive power of a general-purpose language.
  • Non-declarative actions cannot be done from within SQL.
    Non-delcatarive actions include printing a report, interacting with a user, and sending the results of a query to a GUI.

There are two approaches to accessing SQL from a general purpose programming language:

  • Dynamic SQL: SQL statements that are constructed at runtime; for example, the application may allow users to enter their own queries
    • A general purpose program can connect to and communicate with a database server using a collection of functions.
      • Enabling you to build SQL statements dynamically at runtime
      • Being able to create more general purpose, flexible applications by using dynamic SQL because the full text of a SQL statement may be unknown at compilation
      • e.g. Dynamic SQL statement in Python
  • Embedded SQL: SQL statements in an application that do not change at runtime and, therefore, can be hard-coded into the application
    • Like dynamic SQL, it provides a means by which program can interact with a database server. However, under embedded SQL:
      • The SQL statements are identified at compile time, and are translated into function calls.
      • At runtime, these function calls connect to the database using an API that provides dynamic SQL facilities (but may be specific to the database being used).
      • e.g. Oracle SQL

Triggers

  • A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
    • e.g. whenever a tuple is inserted into the takes relation, updates the tot_creds attribute of tuple in the student relation.
  • The two requirements of a trigger design are:
    • Specify the conditions under which the trigger is to be executed.
    • Specify the actions to be taken when the trigger executes.

Triggering Events and Actions in SQL

  • Triggering event can be an insert, a deletion, or an update.
  • Triggers on updates can be restricted to specific attributes.
    • e.g. AFTER UPDATE OF takes ON grade
  • Values of attributes before and after an update can be referenced.
    • REFERENCING OLD ROW AS: For deletions and updates
    • REFERENCING NEW ROW AS: For insertions and updates
  • BEGIN ATOMIC ... END can serve to collect multiple SQL statements into a single compound statement.
  • Triggers can be activated before and after an event.
    • AFTER UPDATE OF
    • BEFORE UPDATE OF: Serves as an extra constraint that can prevent invalid updates, inserts, or deletes
      • e.g., suppose the value of an inserted grade is blank, presumably to indicate the absence of a grade. We can define a trigger that replaces the value with NULL

        CREATE TRIGGER setnull_trigger BEFORE UPDATE OF takes
          REFERENCING NEW ROW AS nrow
          FOR EACH ROW
            WHEN (nrow.grade = ' ')
            BEGIN ATOMIC
                SET nrow.grade = NULL;
            END;

Example

  • When the grade attribute is updated for a tuple in the takes relation, keep the tot_cred attribute value of student tuples up-to-date:
-- When (Trigger Conditions)
CREATE TRIGGER credits_earned AFTER UPDATE OF takes ON (grade)
  -- What (Trigger Actions)
  REFERENCING NEW ROW AS nrow
  REFERENCING OLD ROW AS orow
  FOR EACH ROW
    WHEN    (nrow.grade <> 'F' AND nrow.grade IS NOT NULL)
    AND     (orow.grade = 'F'  OR  orow.grade IS NULL)
    BEGIN ATOMIC
        UPDATE  student
        SET     tot_cred = tot_cred +
        (
            SELECT  credits
            FROM    course
            WHERE   course.course_id = nrow.course_id
        )
        WHERE   student.id = nrow.id;
    END;
  • Using triggers to maintain referential integrity
CREATE TRIGGER timeslot_check1 AFTER INSERT ON section 
  REFERENCING NEW ROW AS nrow 
  FOR EACH ROW
    WHEN (nrow.time slot_id NOT IN (
          SELECT time_slot_id 
          FROM time_slot)) /* time_slot_id not present in time_slot */
    BEGIN
      rollback 
    END;
  
CREATE TRIGGER timeslot_check2 AFTER DELETE ON timeslot 
  REFERENCING OLD ROW AS orow 
  FOR EACH ROW
    WHEN (orow.time_slot_id NOT IN (
          SELECT time_slot_id 
          FROM time_slot)/*last tuple for time_slot_id deleted from time_slot */
        AND orow.time_slot_id IN (
          SELECT time_slot_id
          FROM section)) /* and time slot id still referenced from section*/
    BEGIN
      rollback 
    END;
  • Place a reorder when inventory falls below a specified minimum
CREATE TRIGGER reorder AFTER UPDATE OF level ON inventory 
  REFERENCING OLD ROW AS orow, NEW ROW AS nrow
  FOR EACH ROW
    WHEN nrow.level <= (SELECT level
                        FROM minlevel
                        WHERE minlevel.item = orow.item)
    AND orow.level > (SELECT level
                      FROM minlevel
                      WHERE minlevel.item = orow.item)
    BEGIN ATOMIC
      INSERT INTO orders
        (SELECT item, amount 
        FROM reorder
        WHERE reorder.item = orow.item);
    END;

Statement level triggers

  • Instead of executing a separate action for each affected row, a single action can be executed for all rows affected by a transaction.
    • Instead of FOR EACH ROW, use FOR EACH STATEMENT.
    • Instead of REFERENCING [OLD/NEW] ROW AS ..., use REFERENCING [OLD/NEW] TABLE AS ... to refer to temporary tables (called transition tables) containing all the affected rows.
  • Statement level triggers can be more efficient when dealing with SQL statements that update a large number of rows.

When Not To Use Triggers

  • Earlier, triggers were used for tasks such as
    • Maintaining summary data (e.g., total salary of each department).
    • Replicating databases by recording changes to special relations (called change or delta relations), and having a separate process that applies the changes over to a replica
  • However, there are better ways of doing these nowadays.
    • Databases today provide built-in materialized view facilities to maintain summary data.
    • Databases provide built-in support for replication.
    • Encapsulation facilities can be used instead of triggers in many cases. Define methods to update fields, which directly carry out actions as part of the update methods instead of through a trigger indirectly.

Risks of Triggers

There are some risks for using triggers. For example,

  • Risk of unintended execution of triggers. For example, when
    • Loading data from a backup copy, or Replicating updates at a remote site
      • In this case, the triggered action has already been executed and reflected in the backup or replica in the past $\Rightarrow$ redundant
      • Thus triggers would have to be disabled initially and enabled when the backup site takes over processing from the primary system.
  • Trigger error at runtime might lead to the failure of critical transactions that set off the trigger.
  • Cascading execution of triggers.
    • e.g., an insert trigger on a relation causes another insert on the same relation
    • might lead to an infinite chain of triggering

Recursive Queries

WITH RECURSIVE clause

  • Recursive views make it possible to write queries that cannot be written without recursion or iteration.

    • Without recursion, a non-recursive non-iterative program can perform only a fixed number of joins of relation with itself
      • This program can give only a fixed number of levels of prerequisites
      • Given a fixed non-recursive query, we can construct a database with a greater number of levels of prerequisites on which the query will not work
    • Alternatively, we may write a procedure to iterate as many times as required
      • See the procedure findAllPrereqs in the textbook
  • Any recursive view must be defined as the union of two subqueries:

    • base query: nonrecursive
    • recursive query: uses the recursive view.
  • The meaning of a recursive view is best understood as follows:

    1. compute the base query and add all the resultant tuples to the recursively defined view relation (initially empty)
    2. compute the recursive query using the current contents of the view relation, and add all the resulting tuples back to the view relation
    3. keep repeating the above step until no new tuples are added to the view relation.
    4. at some point, no tuple is added to the view after recursion. This final result is called the fixed point of the recursive view definition.
  • Example. Find all (direct or indirect) prerequisite subjects of each course

    WITH RECURSIVE rec_prereq(course_id, prereq_id) AS
    (
        (
            SELECT  course_id,
                    prereq_id
            FROM    prereq
        )
        UNION
        (
            SELECT  rec_prereq.course_id,
                    prereq.prereq_id
            FROM    rec_prereq,
                    prereq
            WHERE   rec_prereq.prereq_id = prereq.course_id
        )
    )
    SELECT  course_id
    FROM    rec_prereq;
    • The example view rec_prereq is called the transitive closure of the prereq relation, meaning that it is a relation that contains all pairs (cid, pre) such that pre is a direct or indirect prerequisite of cid.
  • Recursive views are required to be monotonic.

    • For each step of the recursion, the view must contain all of the tuples it contained in the previous step, plus possibly some more.
    • Recursive queries may not use any of the following construct:
      • Aggregation on the recursive view
      • NOT EXISTS on a subquery that uses the recursive view.
      • Set difference (EXCEPT) whose RHS uses the recursive view.

Advanced Aggregation Features

The aggregation support in SQL is quite powerful and handles most common tasks with ease.

Ranking

  • Finding the position of a value in a larger set is a common operation. In SQL, ranking is done by RANK() OVER in conjunction with an ORDER BY specification.

  • Examples

    • Find the rank of each student

      SELECT  ID,
              RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
      FROM    student_grades;
      /* student_grades(ID, GPA) is a relation 
      * giving the GPA of each student */
    • Here, an extra ORDER BY clause can return the results in sorted order.

      SELECT  ID,
              RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
      FROM    student_grades
      ORDER BY s_rank ASC;
  • A basic issue with ranking is how to deal with the case of mutiple tuples that are the same values on the ordering attribute(s).

    • Naturally, the RANK() function leaves gaps.
      Meaning that if the highest GPA is shared by two students, both would get rank 1, and the next rank would be 3.
    • There is a DENSE_RANK() function which does not leave gaps.

Ranking with partitions

  • Ranking can be done within partitions of data, using PARTITION BY.
  • Examples
    • Find the rank of students within each department

      SELECT  ID,
              dept_name,
              RANK() OVER 
              (
                  PARTITION BY dept_name
                  ORDER BY GPA DESC
              ) AS dept_rank
      FROM    dept_grades
      ORDER BY dept_name, dept_rank;
  • Multiple rank expressions can be used within a single SELECT statement.
  • When ranking (possibly with partitioning) occurs along with a GROUP BY clause, the GROUP BY clause is applied first, and partitioning and ranking are done on the results of GROUP BY, allowing aggregate values to be used for ranking.

Other ranking related features

  • Several other functions can be used in place of RANK
    • PERCENT_RANK: Gives the rank of the tuple as a fraction
    • CUME_DIST: Cumulative distribution
    • ROW_NUMBER: Sorts the rows and gives each row a unique number corresponding to its position
      • Non-deterministic in presence of duplicates
    • NULLS FIRST, NULLS LAST
    • NTILE(n): Takes the tuples in each partition in the specified order, and divides them into n buckets with equal numbers of tuples

Example

  • Find for each student the quartile they belong to

    SELECT  ID,
            NTILE(4) OVER (ORDER BY GPA DESC) AS quartile
    FROM    student_grades;
  • Find the rank of each student

    SELECT  ID,
            RANK() OVER (ORDER BY (GPA) DESC NULLS LAST) AS s_rank
    FROM    student_grades;
    /* student_grades(ID, GPA) is a relation 
    * giving the GPA of each student */

Windowing

  • Window queries compute an aggregate function over ranges of tuples. They are used to smooth out random variables.

  • Unlike partitions, windows may overlap, in which case a tuple may contribute to more than one window.

  • SQL provides a windowing feature to support such queries.

    ROWS BETWEEN n_1 PRECEDING AND n_2 FOLLOWING
  • Example

    • Moving average: given sales values for each date, calculate for each date the average of the sales on that day, the previous day, and the next day
    SELECT  date,
            SUM(value) OVER
            (
                ORDER BY date
                ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
            )
    FROM    sales;
    • the average of total number of credits taken by students in each 3 year.
    SELECT  year
            AVG (num_credits) OVER
            (
                ORDER BY year
                ROWS 3 PRECEDING
            )
    FROM    tot_credits;
    • the average of total number of credits taken by students in all prior years
    SELECT  year
            AVG (num_credits) OVER
            (
                ORDER BY year
                ROWS UNBOUNDED PRECEDING
            )
    FROM    tot_credits;

Windowing with partitions

  • SQL supports windowing within partitions.

Example

  • Find total balance of each account after each transaction on the account
SELECT  account_number,
        date_time,
        SUM(value) OVER
        (
            PARTITION BY account_number
            ORDER BY date_time
            ROWS UNBOUNDED PRECEDING
        ) AS balance
FROM    transaction
ORDER BY account_number, date_time;

Other windowing related features

  • UNBOUNDED: The number of preceding / following rows are unbounded
  • CURRENT ROW: Specifies the current row
    • e.g., ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • RANGE BETWEEN ...: Cover all tuples with a particular value rather than covering a specific number of tuples




Reference

[1] Silberschatz, Abraham, Henry F. Korth, and Shashank Sudarshan. "Database system concepts." (2011).