This repository is for my final project for CS 562 at Stevens Institute of Technology.
This package aims to implement the Phi Operator, an extended version of the SQL/Relational Algebra Group-By Operator. Particular in the MF Query form.
ESQLConstructor
is intended to be used on the command line via ESQLConstructorCLI
executable target. Using input from the CLI target, ESQLConstructor
generates the package ESQLEvaluator
, which produces the result of the Phi operator for a given set of parameters.
To run ESQLConstructor
and ESQLEvaluator
and PostgreSQL database is required.
For simplicity, the package only recognizes the sales
table, defined with the following schema:
cust
:varchar(20)
prod
:varchar(20)
day
:integer
month
:integer
year
:integer
state
:character(2)
quant
:integer
date
:date
There are a couple benefits to using the Phi operator over the traditional group-by operator.
For one queries can be written a lot more succiently. For instance, suppose we want to get the number of sales in NY, the sum of sale quantities in NJ, and the max sale quantity in CT, all per customer.
One could write a query like this:
with t1 as (
SELECT cust, count(quant) as nyCount
FROM sales
WHERE state = 'NY'
GROUP BY cust
),
t2 as (
SELECT cust, sum(quant) as njSum
FROM sales
WHERE state = 'NJ'
GROUP BY cust
),
t3 as (
SELECT cust, max(quant) as ctMax
FROM sales
WHERE state = 'CT'
GROUP BY cust
)
SELECT t1.cust, nyCount, njSum, ctMax
FROM t1 natural join t2 natural join t3
This same query can be expressed a lot more succiently using ESQL and Phi and get us the exact same result.
SELECT cust, count(NY.quant), sum(NJ.quant), max(CT.quant)
FROM sales
GROUP BY cust; NY, NJ, CT
SUCH THAT NY.state = 'NY'
NJ.state = 'NJ'
CT.state = 'CT'
Additionally, Phi can operate with fewer table scans which reduces the time required to perform the query and is crucial when only having access to data for a limited amount of time.
This was the premise on which the Phi operator was created for, in the context of operating on packets passed through cell towers operated by AT&T.
The Phi operator, in its relational algebra form, has 6 parameters, which ESQLConstructor
takes in.
They are as follows:
- Projected Values
- Number of Grouping Variables
- The Group-By Attributes
- The Aggregate Functions
- Grouping Predicates
- Having Predicate (Optional)
For the above example, it would look like this:
cust
,count(NY.quant)
,sum(NJ.quant)
,max(CT.quant)
3
cust
count(NY.quant)
,sum(NJ.quant)
,max(CT.quant)
NY.state = 'NY'
,NJ.state = 'NJ'
,CT.state = 'CT'
- Nothing
This package aims to run the algorithm behind the Phi operator based on these parameters.
For more details you can read the following papers (recommended in this order):
- Querying Multiple Features of Groups in Relational Databases (MF Queries)
- Evaluation of Ad Hoc OLAP: In-Place Computation (EMF Queries)
I had no part in either paper, but were used as references in creating this package and the generated one.
This package has two targets, ESQLConstructor
and ESQLConstructorCLI
. The latter is an executable target and the main interface for the package operations. Although, ESQLConstructor could, in theory, be used as a normal dependency.
Regardless, an PostgreSQL database is required to use the functionality of ESQLConstructorCLI
and ESQLEvaluator
.
Some code from ESQLConstructor
could be used without, but wouldn't be fully functional without a Postgres Database.
To run ESQLConstructorCLI
select the scheme of the same name.
Depending on the command you want to run, you'll have to edit the scheme and change "Arguments Passed on Launch" under Run > Arguments.
As soon as the package is opened dependencies will be downloaded.
This can be run on Xcode or via the command line:
$ swift run ESQLConstructorCLI [cmd] [arguments]
For instance, if I wanted to run the db-setup
command (see more below), I would do something like this:
$ swift run ESQLConstructorCLI db-setup --host "myHost" --port 5432 --username "myUsername" --password "myPassword" --database "myDatabase"
On the first innovcation of swift run
, the package will resolve dependencies and compile before running.
Once built the following commands will need to be run, in this order.
swift run ESQLConstructorCLI db-setup [args]
swift run ESQLConstructorCLI constructor-file [args]
ORswift run ESQLConstructorCLI constructor-args [args]
swift run ESQLEvaluator
(produced byESQLConstructorCLI
)
db-setup
only needs to be run once, if the credentials don't change. Afterwards your credentials are stored for use in the other commands.
Each command of ESQLConstructorCLI
is detailed below.
ESQLConstructorCLI
supports three comands:
db-setup
: Store & verify database credentials (Run first!)constructor-file
: Create output package by reading a fileconstructor-args
: Create output package by reading from the command line
This command stores and verifies passed in database credentials for use in the other commands.
The command is used like this:
$ ./evaluator db-setup --host "myHost" --port 5432 --username "myUsername" --password "myPassword" --database "myDatabase"
These are the arguments for the command (used in this order):
--host
: The hostname of the database--port
: The port the database is hosted on (defaults to 5432)--username
: Database username--password
: Database password (optional)--database
: Database name (optional)
--port
is the only argument that must be a number!
Note: This command is required to be used before using the other two commands. ESQLEvaluator
will not be produced otherwise.
This command reads a file for the parameters of Phi and creates ESQLEvaluator
baws on it.
The command is used like this:
$ swift run ESQLConstructorCLI constructor-file --input "/my/input/file" --output "/my/output/path/"
These are the arguments for the command (used in this order)
-i
or--input
: Path to read parameters from-o
or--output
: Path to write produced package to
The input file itself will look like this:
cust, count_1_quant, sum_2_quant, max_3_quant
3
cust
count_1_quant, sum_2_quant, max_3_quant
1.state = 'NY'; 2.state = 'NJ'; 3.state = 'CT'
If there is a having predicate, it would be placed on a 6th line, otherwise it can be omitted.
Having Predicates can be inputed just as it is in SQL.
This command reads the command line arguments for the parameters of Phi and creates ESQLEvaluator
based on it.
This command is used like this:
$ swift run ESQLConstructorCLI constructor-args -S "cust, count_1_quant, sum_2_quant, max_3_quant" -n 3 -V "cust" -F "count_1_quant, sum_2_quant, max_3_quant" -s "1.state = 'NY'; 2.state = 'NJ'; 3.state = 'CT'" --output "/my/file/path"
These are the arguments for the command (used in this order)
-S
: Projected Values-n
: Number of Group-By Variables-V
: Group-By Attributes-F
: Aggregate Functions-s
: Grouping Predicates-G
: Having Predicate (optional)-o
or--output
: Path to write produced package to
Format for each argument is the same they are in the input file for the constructor-file
command. However, they may have to be wrapped in quotation marks.
ESQLEvaluator
is an executable target and can simply be run once it is produced by ESQLConstructor
. Necessary information from the ESQLConstructorCLI
's commands are passed along in the production process.
Unlike ESQLConstructorCLI
, it doesn't take any arguments, so it can be run simply from Xcode or by using:
$ swift run ESQLEvaluator
The result of the evaluation will be printed in the console.