TyphonQL is a query language and data-manipulation language (DML) to access polystores (federations of different kinds of database back-ends, relational, document, key-value etc.) while at the same time abstracting as much as possible from how the data is actually stored.
Executing TyphonQL queries is parameterized by a TyphonML model, which provides the logical data schema in the form of an object-oriented information model. A TyphonML model declares entities with primitively-typed attributes and bi-directional (possibly many-valued) relations, which may be containment/ownership relations.
TyphonQL is designed to allow the query writer to think at the level of TyphonML entities as much as possible. With TyphonQL one does not manipulate tables, graphs, documents, or key-value pairs, but sets of objects which may have relations to each other, and which conform to the entity types declared in the TyphonML model.
The present document aims to describe TyphonQL in sufficient detail for end-users of the language. Thus, it is not a formal reference document, but rather a short overview, touching upon the most common and most quirky features in equal measure.
The next section presents an abstract overview of the language, after which we present the language using numerous examples.
NB: like TyphonQL itself, this manual is work-in-progress. It describes how the designers of TyphonQL (Tijs van der Storm, Pablo Inostroza, Davy Landman) expect the language to work -- there might be bugs.
This section provides a cursory overview of the language.
TyphonQL supports the following literal (constant) expressions:

- Booleans: `true`, `false`
- Integer numbers: `123`, `-34934`
- Strings: `"this is a string value"`
- Floating point numbers: `0.123`, `3.14`, `-0.123e10`, `2324.3434e-23`
- Dates: `$2020-03-31$`
- Date and time values: `$2020-03-31T18:08:28.477+00:00$`
- Geographical points: `#point(23.4 343.34)`
- Polygons: `#polygon((0.1 1.0, 2.0 0.0, 3.0 3.0, 0.1 1.0))`
- Null (indicating absence of a reference or value): `null`
- Blob-pointers: `#blob:2ed99a8e-5259-4efd-8cb4-66748d52e8a1`
Furthermore, TyphonQL supports syntax for dealing with objects (instances of entity types):

- Object literals (tagged with the entity type, in this case `Person`): `Person {name: "Pablo", age: 30, reviews: [#879b4559-f590-48ea-968c-ff3b69ec5363, #23275eec-4746-4f23-a854-660160cafed2]}`
- Reference values (pointers), represented as UUIDs: `#879b4559-f590-48ea-968c-ff3b69ec5363`
- Collections of pointers to objects: `[#8bc3f0a0-5cf4-42e5-a664-0617feb2d400, #23275eec-4746-4f23-a854-660160cafed2, #879b4559-f590-48ea-968c-ff3b69ec5363]`
Object literals are used as arguments to insert statements, and (lists of) references are used in both insert and update statements to create links/relations between objects. In the future we might support nesting of object literals and within-insert symbolic cross-referencing to manipulate complete object graphs all at once.
Select queries as well as update and delete statements use expressions to filter results and to find objects to operate on, respectively. For instance, a from-select query specifies a number of result expressions and conditions in the where-clause. Update and delete find the object(s) to be updated resp. deleted using similar conditions in a where-clause.
TyphonQL supports the following non-literal expressions:

- Attribute or relation access: `entity.field`
- Accessing the identity of an object: `entity.@id`
- Boolean operators: `!exp` (negation), `exp1 && exp2` (conjunction), `exp1 || exp2` (disjunction)
- Arithmetic operators: `exp1 * exp2`, `exp1 / exp2`, `exp1 + exp2`, `exp1 - exp2`
- Comparison operators: `exp1 == exp2`, `exp1 != exp2`, `exp1 > exp2`, `exp1 >= exp2`, etc.
The prefix and infix operators follow the precedence levels of Java-like languages.
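For example, these operators can be combined in a where-clause. A minimal sketch, using the Product entity from the example TyphonML model introduced later in this document:

```
from Product p
select p.name
where p.price > 10 && !(p.name == "soap")
```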
To be implemented:

- member operator: `exp1 in exp2`
- textual match operator: `exp1 like exp2`
```
pt1 = point(1.3, 2.5)
pt2 = point(3.5, 4.6)
pg1 = polygon([
  [point(0, 0), pt1, point(1, 1), pt2, point(0, 0)]
])
pg2 = polygon([
  [point(3, 0), pt1, point(2, 2), pt2, point(3, 2)]
])
```

(Note: this is pseudo-QL syntax to make the examples a bit more readable.)
Distance in meters:

- two points: `distance(pt1, pt2)`
- one point and the closest edge of a polygon: `distance(pt1, pg2)`

Containment:

- point inside a polygon: `pt1 in pg2`
- polygon fully inside another polygon: `pg1 in pg2`

Overlap:

- polygon partially overlaps another polygon: `pg1 & pg2`
Note: on MongoDB back-ends `distance` is limited to the where-clause, and only in the presence of a comparison operator. Cassandra and Neo4j do not support geo operations.
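For example, the following query respects the MongoDB restriction by comparing a distance inside the where-clause. A sketch, assuming a hypothetical `Shop` entity with a `location` attribute of type point:

```
from Shop s
select s.name
where distance(s.location, #point(4.9 52.37)) < 1000.0
```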
If two entities are related to each other through an entity stored in a graph database, it is possible to use the reachability expression:

`entity1 -[edgeEntity, lower..upper]-> entity2`

- `entity1` and `entity2` are the ids of entities that can be stored in any back-end.
- `edgeEntity` is the id of the entity stored as an edge in a graph database.
- `lower` and `upper` are expressions that represent the bounds on the number of hops one wants to execute from `entity1`. By omitting one or both of them, we get different semantics, explained through the following examples.
Example 1: `person1 -[friendOf, 2..3]-> person2` finds friends of `person1` using at least 2 jumps and at most 3 jumps. Notice that this expression ignores the direct friends of `person1`.

Example 2: `person1 -[friendOf, 1..]-> person2` finds friends of `person1` using at least 1 jump, without constraining how many jumps can be used.

Example 3: `person1 -[friendOf, ..3]-> person2` finds friends of `person1` using at most 3 jumps. By default the lower limit is 1.

Example 4: `person1 -[friendOf]-> person2` finds friends of `person1` using at least 1 jump and without constraining the number of jumps. This expression computes the transitive closure of `friendOf` for the set `{ person1 }`.
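Put together in a full query, this might look as follows. A sketch, assuming `Person` entities connected through a `friendOf` edge entity:

```
from Person p1, Person p2
select p2.name
where p1.name == "Pablo", p1 -[friendOf, 1..3]-> p2
```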
Blobs are handled in a special way: during insertion/update you have to send them as a pointer to a blob, `#blob:UUID` (and pass along the contents of the blob to the API in a separate field). When selecting them, you get a base64-encoded version of the blob. It is not possible to do any operations on them; they are opaque.
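For instance, a sketch assuming a hypothetical `image` attribute of type blob on User (the blob contents themselves travel in a separate field of the API request, as shown in the HTTP API section below):

```
update User u where u.name == "Pablo"
set { image: #blob:2ed99a8e-5259-4efd-8cb4-66748d52e8a1 }
```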
Queries follow the tradition of SQL queries, except that the select and from parts are swapped. A basic query thus has the form "from bindings select results where conditions". The bindings consist of a comma-separated list of "Entity variable" pairs, which introduce the scope of the query. The results are a comma-separated list of expressions that make up the final result of the query. The where-clause is optional, but if present it consists of a comma-separated list of expressions filtering the result set.
For now, the only expressions allowed in results are `x` (an entity variable introduced in the bindings), `x.@id`, and `x.f` (attribute or relation access).
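For instance, the following query exercises all three forms. A sketch, using the example TyphonML model introduced later in this document:

```
from User u, Review r
select u, r.@id, u.name
where u.reviews == r
```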
The general form of the insert statement is "insert Entity { assignments }". The entity is the type of the object to be inserted, as defined in the TyphonML model. The assignments are bindings of the form "attrOrRelation: expression". The TyphonQL type checker will check that all assignments are correctly typed according to the TyphonML model, including multiplicity constraints.
Update and delete statements specify the objects to work on via where-clauses. For instance, update has the form "update Entity x where conditions set { assignments }". The assignments are the same as in insert, except that for many-valued relations, they can specify additions ("relation +: expression") and removals ("relation -: expression").
Delete has the form "delete Entity x where conditions", which will delete all entities of type Entity satisfying the conditions in the where-clause.
All three DML statements ensure (as much as possible) that relational integrity is preserved, even across database back-ends. In particular this means:

- creating resp. breaking a relation between entities entails creating resp. breaking the inverse link as well (if so declared in the TyphonML model)
- deleting an object will delete all objects "owned" by it via containment relations (cascading delete).

Cascading delete of contained objects is currently limited to one hop across database boundaries. In other words, if a sequence of containment relations alternatingly crosses multiple database back-ends, the cascade is only performed for the first relation.
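For instance, a sketch using the example model introduced later in this document, where Product owns its Reviews (the product name is hypothetical):

```
delete Product p where p.name == "Broken Gadget"
```

This deletes the matching products and, by cascading delete, all reviews owned by them, even though the reviews live on a different (document) back-end.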
If the type of a field is `freetext`, you have extra subfields available during select queries. Nothing special happens during insertion, but depending on the configured NLP analyses for this field, extra fields become available. The table below lists all of the custom fields.
Analysis | Fieldname | Type |
---|---|---|
PhraseExtraction | PhraseExtraction.Token | text |
PhraseExtraction | PhraseExtraction.end | int |
PhraseExtraction | PhraseExtraction.begin | int |
POSTagging | POSTagging.end | int |
POSTagging | POSTagging.begin | int |
POSTagging | POSTagging.PosTag | text |
POSTagging | POSTagging.PosValue | text |
RelationExtraction | RelationExtraction.TargetEntity.NamedEntity | text |
RelationExtraction | RelationExtraction.RelationName | text |
RelationExtraction | RelationExtraction.TargetEntity.begin | int |
RelationExtraction | RelationExtraction.TargetEntity.end | int |
RelationExtraction | RelationExtraction.end | int |
RelationExtraction | RelationExtraction.begin | int |
RelationExtraction | RelationExtraction.SourceEntity.NamedEntity | text |
RelationExtraction | RelationExtraction.SourceEntity.end | int |
RelationExtraction | RelationExtraction.SourceEntity.begin | int |
nGramExtraction | nGramExtraction.NgramType | text |
nGramExtraction | nGramExtraction.begin | int |
nGramExtraction | nGramExtraction.end | int |
ParagraphSegmentation | ParagraphSegmentation.end | int |
ParagraphSegmentation | ParagraphSegmentation.begin | int |
ParagraphSegmentation | ParagraphSegmentation.Paragraph | text |
Tokenisation | Tokenisation.Token | text |
Tokenisation | Tokenisation.end | int |
Tokenisation | Tokenisation.begin | int |
TermExtraction | TermExtraction.end | int |
TermExtraction | TermExtraction.WeightedToken | int |
TermExtraction | TermExtraction.TargetEntity.NamedEntity | int |
TermExtraction | TermExtraction.begin | int |
TermExtraction | TermExtraction.TargetEntity.begin | int |
TermExtraction | TermExtraction.TargetEntity.end | int |
Chunking | Chunking.begin | int |
Chunking | Chunking.end | int |
Chunking | Chunking.PosAnnotation.PosValue | text |
Chunking | Chunking.PosAnnotation.end | int |
Chunking | Chunking.TokenAnnotation.begin | int |
Chunking | Chunking.TokenAnnotation.end | int |
Chunking | Chunking.PosAnnotation.PosTag | text |
Chunking | Chunking.PosAnnotation.begin | int |
Chunking | Chunking.TokenAnnotation.Token | text |
Chunking | Chunking.Label | text |
NamedEntityRecognition | NamedEntityRecognition.NamedEntity | text |
NamedEntityRecognition | NamedEntityRecognition.begin | int |
NamedEntityRecognition | NamedEntityRecognition.GeoCode | point |
NamedEntityRecognition | NamedEntityRecognition.WordToken | text |
NamedEntityRecognition | NamedEntityRecognition.end | int |
Stemming | Stemming.begin | int |
Stemming | Stemming.end | int |
Stemming | Stemming.Stem | text |
Lemmatisation | Lemmatisation.begin | int |
Lemmatisation | Lemmatisation.end | int |
Lemmatisation | Lemmatisation.Lemma | text |
DependencyParsing | DependencyParsing.DependencyName | text |
DependencyParsing | DependencyParsing.TargetEntity.NamedEntity | text |
DependencyParsing | DependencyParsing.TargetEntity.begin | int |
DependencyParsing | DependencyParsing.TargetEntity.end | int |
DependencyParsing | DependencyParsing.begin | int |
DependencyParsing | DependencyParsing.SourceEntity.begin | int |
DependencyParsing | DependencyParsing.SourceEntity.end | int |
DependencyParsing | DependencyParsing.end | int |
DependencyParsing | DependencyParsing.SourceEntity.NamedEntity | text |
SentenceSegmentation | SentenceSegmentation.Sentence | text |
SentenceSegmentation | SentenceSegmentation.begin | int |
SentenceSegmentation | SentenceSegmentation.end | int |
SentimentAnalysis | SentimentAnalysis.SentimentLabel | text |
SentimentAnalysis | SentimentAnalysis.Sentiment | int |
CoreferenceResolution | CoreferenceResolution.Anaphor.begin | int |
CoreferenceResolution | CoreferenceResolution.Anaphor.Token | text |
CoreferenceResolution | CoreferenceResolution.Anaphor.end | int |
CoreferenceResolution | CoreferenceResolution.Antecedent.end | int |
CoreferenceResolution | CoreferenceResolution.begin | int |
CoreferenceResolution | CoreferenceResolution.Antecedent.begin | int |
CoreferenceResolution | CoreferenceResolution.end | int |
CoreferenceResolution | CoreferenceResolution.Antecedent.Token | text |
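The exact select syntax for reaching these subfields is not fixed in this draft; the sketch below assumes they are accessed with ordinary dot notation on the freetext attribute (here `Review.content` from the example model later in this document):

```
from Review r
select r.content.SentimentAnalysis.SentimentLabel
where r.content.SentimentAnalysis.Sentiment > 3
```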
Every database handles date-times, time zones, and time offsets differently. For TyphonQL we picked a scheme that is correct for most use cases and that we could support on all of the back-ends.
We store and normalize all date-times in the UTC/Zulu time zone. If you send a date-time with an offset, we apply that offset and store it in UTC time. If you send a date-time without an offset, we use the time zone of the QL server (configurable via the `TZ` environment variable) to translate it to UTC.
When querying a date-time field, you will always get back the UTC version. Client libraries can use that to convert it back to a display format if required. If you are dealing with an application spanning multiple time zones, it is a good idea to store the zone id (like `Europe/Amsterdam`) in a separate field next to the date-time.
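For example, a sketch using the CreditCard entity from the insert examples later in this document: inserting a date-time with a +02:00 offset stores the normalized UTC value, so a subsequent select of `expires` returns `$2020-02-21T14:03:45.274+00:00$`.

```
insert CreditCard { number: "1762376287", expires: $2020-02-21T16:03:45.274+02:00$ }
```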
In this section we illustrate TyphonQL using numerous examples. The example queries and DML statements should be understood in the context of an example TyphonML model, which is shown below.
```
entity Product {
  name : string[256]
  description : string[256]
  price : int
  productionDate : date
  reviews :-> Review."Review.product"[0..*]
  wish :-> Wish."Wish.product"[1]
}

entity Review {
  content: text
  product -> Product[1]
  user -> User[1]
}

entity User {
  name : string[256]
  address: string[256]
  biography :-> Biography[0..1]
  reviews -> Review."Review.user"[0..*]
  wish :-> Wish."Wish.user"[1]
}

entity Biography {
  content : string[256]
  user -> User[1]
}

entity Wish {
  intensity: int
  user -> User[1]
  product -> Product[1]
}

relationaldb Inventory {
  tables {
    table { UserDB : User }
    table { ProductDB : Product }
  }
}

documentdb Reviews {
  collections {
    Review : Review
    Biography : Biography
  }
}

graphdb Wishes {
  edges {
    edge Wish {
      from "Wish.user"
      to "Wish.product"
    }
  }
}
```
The entities Product and User are deployed to an SQL database (MariaDB) called Inventory; the Review and Biography entities are stored on a document store (MongoDB) called Reviews; and the Wish entity is stored on a graph database (Neo4j) called Wishes.
Products own a number of Reviews ("deleting a product will delete the associated reviews as well") via the relation `reviews`. The ownership link can be traversed from the `product` reference in Review because of the opposite declaration on `reviews`. Reviews are also authored by users, which is modeled by the `reviews` relation on the User entity. This relation is not a containment relation, because an entity can only be owned by a single entity at one point in time. User biographies, however, are owned by User entities via the `biography` relation.
A Wish relates one user to one product, holding a value for the "intensity" of this relation. Entities that are stored in graph databases have a number of constraints, as they represent edges in this kind of back-end. Wish must have exactly two related entities with cardinality 1, and the opposite relation may be declared in the related entities, as long as it represents containment and has cardinality one (see the `wish` relation in Product and User). In other words, removing any of the entities that correspond to the vertices should also remove the "edge" entity. The directionality of the relation is established in the database mapping, in particular in the `graphdb` section, where we see which relation represents the source and which the target inside the graph database.
TyphonQL assumes TyphonML models are well-formed in the following ways:

- all entities are placed on a database back-end
- containment is uni-directional (i.e., inverses of containment cannot themselves be containment)
- containment is not many-to-many (i.e., containment is tree shaped)
- containment is uniquely rooted: every owned entity can be reached via a unique path starting from an entity that is not owned
TyphonML references support bidirectional navigation over relations between entities through inverses (AKA "opposites"). In other words, it is possible to navigate across a single relation in two directions. In order to support this in the implementation of TyphonQL, such bidirectional relations are realized in the back-ends in both directions. TyphonQL ensures that updates to a relation are always mirrored in the other direction according to the opposite declaration(s). This means that how you navigate across a relation (from which direction) may have different consequences at the level of the implementation.
The only exceptions to this rule are:
- a containment relation within SQL is always modeled using a single foreign key from child to parent
- a cross-reference relation within SQL is modeled using a single junction table (representing both directions).
Selecting all users:

```
from User u select u
```

This will return all the attributes of all users (but excluding references to other entities).

Selecting specific attributes of users:

```
from User u select u.name
```

This will return the identities of the users paired with their names.

Selecting a specific relation:

```
from User u select u.reviews
```

This will return pairs of user identity and review identity. If a user has no reviews, its identity will be paired with `null`.
Filtering on a specific attribute:

```
from User u select u where u.name == "Pablo"
```
A complex query across database boundaries: find all user and product name pairs for which a user has written a review containing the word "bad".

```
from User u, Product p, Review r
select u.name, p.name
where u.reviews == r, p.reviews == r, r.content like "bad"
```

Note the use of `==` even for many-valued references. You may wonder why we do not use the `x in y` operator for "joining" on collections; this has to do with how relational back-ends deal with multiplicity, and it would complicate the implementation considerably.
TyphonQL currently supports the usual aggregation operators `count`, `sum`, `max`, `min`, and `avg`, to be used in combination with the group-construct. For instance, to count the number of reviews per user:

```
from User u
select u.name, count(u.reviews) as revCount
group u.name
```
NB: all aggregation expressions (e.g. `count(u.reviews)`) need to be aliased with a variable name using "as".

If you want to consider the whole entity as the only group, you may omit the group-clause, e.g.:

```
from User u
select count(u.@id) as cnt
```

This counts the number of unique users.
Aggregated values can be constrained using the having-clause:

```
from User u
select u.name, count(u.reviews) as revCount
group u.name
having revCount > 5
```

This selects all users (and their review counts) that have written more than five reviews.
The aggregation component of TyphonQL also supports limit-, offset-, and order-clauses for, respectively, limiting the size of the result set, skipping over an initial part of it, and sorting it. These can be used with or without aggregation (group-by).
For instance:

```
from User u
select u.name, count(u.reviews) as revCount
group u.name
order revCount desc
```

sorts the previous query in descending order by review count. The "desc" modifier can also be omitted, in which case it amounts to using "asc" for ascending order.
To get the second most prolific reviewer, one could write:

```
from User u
select u.name, count(u.reviews) as revCount
group u.name
order revCount desc
limit 1
offset 1
```
TODO: mention that parents should pre-exist their children ("children cannot exist without parents"). With insert, a custom data type value must be fully specified; in updates, you can partially update sub-fields.
```
insert User { name: "John Smith", age: 30 }

insert User { name: "John Smith", age: 30, cards: [#a129feec-4b92-4ab2-9ef5-d276a7566f56] }

insert CreditCard { number: "1762376287", expires: $2020-02-21T14:03:45.274+00:00$ }
```
The following is not allowed, because `owner` is an inverse:

```
insert CreditCard {
  number: "1762376287",
  expires: $2020-02-21T14:03:45.274+00:00$,
  owner: #ff704edc-5d85-470b-9ed4-fb8761bbe93a
}
```
An alternative is:

```
insert CreditCard {
  number: "1762376287",
  expires: $2020-02-21T14:03:45.274+00:00$
}
```

and then:

```
update User u where u.@id == #ff704edc-5d85-470b-9ed4-fb8761bbe93a
set { cards +: [#the-id-of-the-new-creditcard] }
```
Or (better), inserting into the owner directly: TODO.
Well-formedness of update:

- you cannot update `@id` fields
- no nested object literals
Updating simple-valued attributes:

```
update User u where u.name == "John Smith" set { age: 30 }
```
Updating custom data types: TODO.
Setting a relation:

```
update Review r where r.@id == #13245f43-634f-46bf-a73d-6bd30865f5d4
set { author: #a129feec-4b92-4ab2-9ef5-d276a7566f56 }
```

This is equivalent to:

```
update User u where u.@id == #a129feec-4b92-4ab2-9ef5-d276a7566f56
set { reviews +: [#13245f43-634f-46bf-a73d-6bd30865f5d4] }
```

(Open question: is this possible? Shouldn't the review already have an owner, i.e. the author?)
Setting a many-valued relation:

```
update User u where u.name == "John Smith"
set { cards: [#a129feec-4b92-4ab2-9ef5-d276a7566f56] }
```

Adding:

```
update User u where u.name == "John Smith"
set { cards +: [#a129feec-4b92-4ab2-9ef5-d276a7566f56] }
```

Removing:

```
update User u where u.name == "John Smith"
set { cards -: [#a129feec-4b92-4ab2-9ef5-d276a7566f56] }
```
Delete cascades to owned entities, but only one hop across database boundaries.

Anonymous placeholders:

```
update User u where u.@id == ?
set { cards +: [#a129feec-4b92-4ab2-9ef5-d276a7566f56] }
```

Named placeholders:

```
update User u where u.@id == ??param
set { cards +: [#a129feec-4b92-4ab2-9ef5-d276a7566f56] }
```
To execute queries outside of the TyphonQL IDE, the Typhon Polystore offers an HTTP API. This section describes how to invoke it; we only cover the part of the Polystore API related to TyphonQL.

In case you need to start fresh:

```
GET http://polystore-api/api/resetdatabases
```
Post the query as a JSON object to:

```
POST http://polystore-api/api/query
```

Body:

```
{
  "query": "from User u select u"
}
```

Note that you have to make sure to correctly encode the query as a JSON string; any language with a JSON library will take care of this.
Post the command as a JSON object to:

```
POST http://polystore-api/api/update
```

Body:

```
{
  "query": "insert User { name : \"John\", age: 35 }"
}
```
If you want to insert a new blob, you have to send along the contents of the blob separately from the query:

```
{
  "query": "update User u where u.name==\"John\" set { image: #blob:ecbbb3c4-6c5f-4ef0-bce4-58b847a82222 }",
  "blobs": {
    "ecbbb3c4-6c5f-4ef0-bce4-58b847a82222": "eW91IGdldCBhIGNvb2tpZQ=="
  }
}
```

The value in the `blobs` map is the base64 encoding of the blob contents.
For both ease of writing and performance, it is possible to send a single query in which placeholders are replaced by a list of bindings. This makes it possible to send multiple queries in a single command.
```
{
  "query": "insert User { @id: ??id, name: ??uname, age: ??uage }",
  "parameterNames": ["id", "uname", "uage"],
  "parameterTypes": ["uuid", "string", "int"],
  "boundRows": [
    ["beefc0b2-0393-4b73-951c-7243ee849275", "John Smith", "20"],
    ["52001d98-05f2-4832-b653-6077a4db05a7", "Smith John", "1"]
  ]
}
```
If you encounter problems with your queries, you can turn on the same type checker that runs inside the IDE: add `"validate": true` to your request. Type check errors (like non-existing fields) become a 500 error (with an error message). Warnings show up alongside the resulting JSON object; to not break the interface of the insert/update/delete API, the warnings are printed as entries of the result array.
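For example:

```
{
  "query": "from User u select u.name",
  "validate": true
}
```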
Some remarks about this API:

- You have to give the names of the parameters, their Typhon types, and then a 2D string array with rows that are bound to the parameters, in the same order.
- The syntax of the values is the same as the output of a query. You do not have to encode the values as QL literals (for example, a UUID doesn't have to be prefixed with `#`, and a string doesn't require double escaping).
- Valid types are: `int`, `bigint`, `float`, `string`, `bool`, `text`, `uuid`, `date`, `datetime`, `point`, `polygon`, and `blob` (for blobs you do have to use the QL literal `#blob:uuid...`).
- You can combine this with the `blobs` field.
- Any JSON library should be able to generate these objects.
- We encode numbers as strings, since JSON only supports doubles.