The Fili API provides access to the underlying data that powers Fili.
All interaction with the API is via HTTPS GET
requests, which means that your entire query is just a URL, making it
easy to save and share queries.
There are 6 main concepts in the Fili API: dimensions, metrics, tables, filters, intervals, and time grains.
Dimensions are the dimensions along which you can slice and dice the data. Dimensions can be used for grouping and aggregating, as well as filtering of data, and are a critical part of the system. Each dimension has a set of available fields, as well as a collection of possible values for that dimension. These dimension fields and values serve two primary purposes: Filtering and Annotating data query results.
Typical dimensions have an id property (a natural key) and a desc (description) property (a human-readable description). Both of these fields can be used to filter rows reported on, and both of these fields are included in the data query result set for each dimension.
Get a list of all dimensions:
GET https://sampleapp.fili.org/v1/dimensions
Get a specific dimension:
GET https://sampleapp.fili.org/v1/dimensions/productRegion
Get a list of possible values for a dimension:
GET https://sampleapp.fili.org/v1/dimensions/productRegion/values
Additionally, the values for a dimension have some options for querying them:
- Pagination
- Format
- Filtering (All filters are supported)
For example, to get the 2nd page of User Countries with U in the description, with 5 entries per page, in JSON format:
GET https://sampleapp.fili.org/v1/dimensions/userCountry/values?filters=userCountry|desc-contains[U]&page=2&perPage=5&format=json
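As a rough illustration, the example URL above can be assembled programmatically. This is a minimal sketch using only Python's standard library; the helper name `dimension_values_url` and its parameters are hypothetical and not part of Fili.

```python
from urllib.parse import quote

BASE_URL = "https://sampleapp.fili.org/v1"  # host taken from the examples above


def dimension_values_url(dimension, field, operation, values, page, per_page, fmt="json"):
    """Build a dimension-values URL with one filter plus pagination and format."""
    # Percent-encode each filter value; commas separate values inside the brackets.
    encoded = ",".join(quote(value, safe="") for value in values)
    filters = f"{dimension}|{field}-{operation}[{encoded}]"
    return (
        f"{BASE_URL}/dimensions/{dimension}/values"
        f"?filters={filters}&page={page}&perPage={per_page}&format={fmt}"
    )


url = dimension_values_url("userCountry", "desc", "contains", ["U"], page=2, per_page=5)
print(url)
```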
Metrics are the data points, and include things like Page Views, Daily Average Time Spent, etc. Since the metrics available depend on the particular table and time grain, you can discover the available metrics by querying the table you are interested in, as well as a metrics collection resource that lists all metrics supported by the system.
Get a list of all metrics:
GET https://sampleapp.fili.org/v1/metrics
Get a specific metric:
GET https://sampleapp.fili.org/v1/metrics/timeSpent
Tables are the connecting point that tell you what combination of Metrics and Dimensions are available, and at what time grain. For each table, and a specific time grain, there is a set of Metrics and Dimensions that are available on that table.
Get a list of all tables:
GET https://sampleapp.fili.org/v1/tables
Get a specific table:
GET https://sampleapp.fili.org/v1/tables/network/week
Filters allow you to constrain the data processed by the system. Depending on what resource is being requested, filters may constrain the rows in the response, or may constrain the data that the system is processing.
For non-Data resource requests, since there isn't any data aggregation happening, filters primarily exclude or include rows in the response.
For Data resource requests, however, filters primarily exclude or include raw data rows aggregated to produce a result. In order for filters on the Data resource requests to constrain the rows in the response, the request must have a Dimension Breakout along the dimension being filtered.
The Interval (or dateTime) of a data query is the date/time range for data to include in the query. The interval is expressed as a pair of start and stop instants in time, using the ISO 8601 format, where the start instant is inclusive and the end instant is exclusive. It sounds complicated, but it's pretty straightforward once you get the hang of it.
One important thing to note about intervals is that they must align to the time grain specified in the query. So, if you ask for monthly time grain, the interval must start and stop on month boundaries, and if you ask for weekly time grain, the interval must start and stop on a Monday, which is when our week starts and stops.
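The alignment rule can be checked on the client before sending a query. The sketch below is an assumption-laden illustration, not Fili's own validation: it handles only date-only intervals and the day, week, and month grains, and the function names are hypothetical.

```python
from datetime import date


def is_aligned(day, grain):
    """Return True if `day` falls on a boundary of the given time grain."""
    if grain == "week":
        return day.weekday() == 0  # Monday is 0 in Python's date.weekday()
    if grain == "month":
        return day.day == 1
    if grain == "day":
        return True
    raise ValueError(f"unsupported grain: {grain}")


def interval_is_aligned(interval, grain):
    """Check both ends of a 'start/end' interval written as ISO 8601 dates."""
    start, end = (date.fromisoformat(part) for part in interval.split("/"))
    return is_aligned(start, grain) and is_aligned(end, grain)


print(interval_is_aligned("2014-09-01/2014-09-08", "week"))  # both ends are Mondays
```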
Time grain is the granularity or "bucket size" of each response row. Or, to look at it another way, time grain is the period over which metrics are aggregated. In particular, this matters a lot for metrics that are counting "unique" things, like Unique Identifier.
The default granularities are second, minute, hour, day, week, month, quarter, and year. The all granularity aggregates all data into a single bucket.
Data queries are the meat of the Fili API. The data resource allows for:
- Grouping by dimensions
- Grouping by a time grain
- Filtering by dimension values
- Performing Having filters on metric values
- Selecting metrics
- Reporting on a specific interval of time
- Sorting by a metric within a time grain
- Selecting the response format
Let's start by looking at the URL format, using an example:
GET https://sampleapp.fili.io/v1/data/network/week?metrics=pageViews,dayAvgTimeSpent&dateTime=2014-09-01/2014-09-08
This basic query gives us network-level page views and daily average time spent data for one week. Let's break down the different components of this URL.
- https:// - The Fili API is only available over a secure connection, so HTTPS is required. HTTP queries will not work.
- sampleapp.fili.io - This is where the Fili API lives.
- v1 - The version of the API.
- data - This is the resource we are querying, and is the base for all data reports.
- network - Network is the table we are getting data from.
- week - The top-level reporting time grain of our results. Each row in our response will aggregate a week of data.
- metrics - The different metrics we want included in our response, as a comma-separated list of metrics. (Note: these are case-sensitive)
- dateTime - Indicates the interval that we are running the report over. See the interval details later in this document.
Great, we've got the basics! But, what if we want to add a dimension breakout? Perhaps along the Product Region dimension?
GET https://sampleapp.fili.io/v1/data/network/week/productRegion?metrics=pageViews,dayAvgTimeSpent&dateTime=2014-09-01/2014-09-08
The only difference is that we've added an additional path segment to the URL (productRegion). All breakout dimensions are added as path segments after the time grain path segment. To group by more dimensions, just add more path segments!
So, if we wanted to also break out by gender in addition to breaking out by productRegion (a 2-dimension breakout):
GET https://sampleapp.fili.io/v1/data/network/week/productRegion/gender?metrics=pageViews,dayAvgTimeSpent&dateTime=2014-09-01/2014-09-08
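The URL structure described so far (table, time grain, then one path segment per breakout dimension, followed by the query string) can be sketched as a small builder. The function name `data_query_url` and its signature are hypothetical; only the URL layout comes from the examples above.

```python
def data_query_url(table, grain, dimensions, metrics, date_time,
                   base="https://sampleapp.fili.io/v1"):
    """Assemble a data query URL; breakout dimensions follow the time grain."""
    path = "/".join(["data", table, grain, *dimensions])
    return f"{base}/{path}?metrics={','.join(metrics)}&dateTime={date_time}"


url = data_query_url(
    table="network",
    grain="week",
    dimensions=["productRegion", "gender"],
    metrics=["pageViews", "dayAvgTimeSpent"],
    date_time="2014-09-01/2014-09-08",
)
print(url)
```

Passing an empty `dimensions` list yields the network-level query with no breakout.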
Now that we can group by dimensions, can we filter out data that we don't want? Perhaps we want to see our global numbers, but excluding the Americas Region data?
GET https://sampleapp.fili.io/v1/data/network/week?metrics=pageViews,dayAvgTimeSpent&dateTime=2014-09-01/2014-09-08&filters=productRegion|id-notin[Americas Region]
We've now added a filter to our query that excludes (notin) rows that have the Americas Region id for the productRegion dimension. We also removed the productRegion grouping dimension we added earlier, since we wanted to see the global numbers, not the per-region numbers.
This example also shows that we can filter by dimensions that we are not grouping on! Filters are very rich and powerful, so take a look at the Filters section for more details. Oh, and one last thing about filters on the Data resource: By default, in, notin, eq, startswith, and contains are supported, but startswith and contains may be disabled.
Now, the very last thing we need from our report: We need it in CSV format, so that we can pull it into Excel and play around with it! No worries, the Fili API supports CSV!
GET https://sampleapp.fili.io/v1/data/network/week?metrics=pageViews,dayAvgTimeSpent&dateTime=2014-09-01/2014-09-08&filters=productRegion|id-notin[Americas Region]&format=csv
For additional information about response format, take a look at the Format section!
Availability for a table (logical table) is defined as the maximal set of intervals that the table may be able to respond to. Put another way, the table certainly cannot respond to any intervals beyond the range marked available.
Constrained availability will be defined as the availability for a table given a set of query constraints.
Currently the tables/table/timeGrain response lacks an indication of the time range(s) for which the table has data (i.e., can answer questions), but work is under way to expand the table resource so that it takes largely the same inputs as the data resource, and uses those inputs to constrain the available time ranges (and even the available schema, etc.) of the logical table.
The constraints are:
- Grouping dimensions
- Metrics
- Filters
- Date/Time
For example:
GET https://sampleapp.fili.io/v1/tables/myTable/week/dim1/dim2?metrics=myMetric&filters=dim3|id-in[foo,bar]&dateTime=2014-09-01/2018-09-08
This would result in a table response with the metrics, dimensions, and available intervals restricted to the set of items that are still "reachable" given the constraints in the query. So, if the table normally indicates that dim7 is one of its dimensions, but there isn't a backing physical table for myTable that has dim1, dim2, dim3, and myMetric along with dim7, then dim7 would not be in the dimension list returned in the response.
In the example above, the query accepts an optional list of path-separated grouping dimensions, an optional list of metrics, an optional filter clause, and an interval parameter. An example response could be:
{
"category": "General",
"name": "shapes",
"longName": "shapes",
"timeGrain": "day",
"retention": "P1Y",
"description": "shapes",
"availableIntervals": ["2016-05-01 00:00:00.000/2017-05-27 00:00:00.000"],
"dimensions": [
{
"cardinality": "0",
"category": "General",
"name": "color",
"longName": "color",
"uri": "http://localhost:9998/dimensions/color"
}
],
"metrics": [
{
"category": "General",
"name":"rowNum",
"longName": "rowNum",
"uri": "http://localhost:9998/metrics/rowNum"
}
]
}
Note the line "availableIntervals": ["2016-05-01 00:00:00.000/2017-05-27 00:00:00.000"].
Many of the resources in the Fili API support different query options. Here are the different options that are supported, and how to use them:
Pagination allows us to split our rows into pages, and then retrieve only the desired page. So rather than getting one giant response with a million result rows and then writing code to extract rows 5000 to 5999 ourselves, we can use pagination to break the response into a thousand pages, each with a thousand result rows, and then ask for page 5.
At this point, only the Dimension and Data endpoints support pagination.
In addition to containing only the desired page of results, the response also contains pagination metadata. Currently, the dimension and data endpoints show different metadata, but there are plans to have the dimension endpoint display the same kind of metadata as the data endpoint.
To paginate a resource that supports it, there are two query parameters at your disposal:
- perPage: How many result rows/resources to have on a page. Takes a positive integer as a parameter.
- page: Which page to return, with perPage rows per page. Takes a positive integer as a parameter.
With these two parameters, we can, for example, get the 2nd page with 3 records per page:
GET https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=2
With all response formats, a link header is added to the HTTP response. These are links to the first, last, next, and previous pages with rel attributes first, last, next, and prev respectively. If we use our previous example, the link header in the response would be:
Link:
<https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=1>; rel="first",
<https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=3>; rel="last",
<https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=3>; rel="next",
<https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=1>; rel="prev"
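A client can turn the link header into a lookup table keyed by rel. The sketch below assumes the header follows the standard web-linking form, with each URL wrapped in angle brackets; if the server emits a different layout, the pattern would need adjusting. The function name `parse_link_header` is hypothetical.

```python
import re

# One link looks like: <URL>; rel="name"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Map each rel attribute (first, last, next, prev) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


base = "https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&perPage=3"
header = (
    f'<{base}&page=1>; rel="first", '
    f'<{base}&page=3>; rel="last", '
    f'<{base}&page=3>; rel="next", '
    f'<{base}&page=1>; rel="prev"'
)
links = parse_link_header(header)
print(links["next"])
```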
There are, however, a few differences between pagination for Dimension and Data endpoints:
For JSON (and JSON-API) responses, a meta object is included in the body of the response:
{
"meta": {
"pagination": {
"currentPage": 2,
"rowsPerPage": 3,
"numberOfResults": 7,
"first": "https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=1",
"previous": "https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=1",
"next": "https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=3",
"last": "https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews&dateTime=2014-09-01/2014-09-08&perPage=3&page=3"
}
}
}
The meta object contains a pagination object which contains links to the first, last, next and previous pages. The pagination object contains other information as well:
- currentPage: The page currently being displayed
- rowsPerPage: The number of rows on each page
- numberOfResults: The total number of rows in the entire result
Note: For the data endpoint, both the perPage and page parameters must be provided. The data endpoint has no default pagination.
When paginating, the first and last links will always be present, but the next and previous links will only be included in the response if there is at least 1 page either after or before the requested page. Or, to put it another way, the response for the 1st page won't include a link to the previous page, and the response for the last page won't include a link to the next page.
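Because the next link is absent on the last page, a client can walk a paginated data response by following next until it disappears. The sketch below assumes a JSON body with rows and meta.pagination as shown earlier; the `fetch` callable (URL in, parsed JSON out) is a hypothetical stand-in for whatever HTTP client is in use, and the demo uses in-memory pages instead of a real server.

```python
def iter_rows(first_url, fetch):
    """Yield rows from every page, following meta.pagination.next until absent."""
    url = first_url
    while url is not None:
        body = fetch(url)
        yield from body.get("rows", [])
        url = body.get("meta", {}).get("pagination", {}).get("next")


# In-memory stand-in for three pages of results (illustrative data only).
FAKE_PAGES = {
    "page1": {"rows": [1, 2, 3], "meta": {"pagination": {"next": "page2"}}},
    "page2": {"rows": [4, 5, 6], "meta": {"pagination": {"next": "page3"}}},
    "page3": {"rows": [7], "meta": {"pagination": {}}},
}

all_rows = list(iter_rows("page1", FAKE_PAGES.__getitem__))
print(all_rows)
```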
Currently, the dimension endpoint only prints the previous and next links inside the top-level JSON object. It does, however, include the same links in the headers as the data endpoint: first, last, next and prev.
Unlike the Data endpoint, the Dimension endpoint always paginates. It defaults to page 1, and 10000 rows per page. The default rows per page is configurable, and may be adjusted by modifying the configuration default_per_page.
Note that default_per_page applies only to the Dimension endpoint. It does not affect the Data endpoint.
- perPage: Setting only the perPage parameter also gives a "limit" behavior, returning only the top perPage rows. Example:
GET https://sampleapp.fili.io/v1/dimensions/productRegion/values?perPage=2
Note: This will likely change to not return "all" by default in a future version
- page: page defaults to 1, the first page. Note: In order to use page, the perPage query parameter must also be set. Example:
GET https://sampleapp.fili.io/v1/dimensions/productRegion/values?perPage=2&page=2
Some resources support different response formats. The default response format is JSON, and some resources also support the CSV and JSON-API formats.
To change the format of a response, use the format query string parameter.
JSON: GET https://sampleapp.fili.io/v1/data/network/day/gender?metrics=pageViews&dateTime=2014-09-01/2014-09-02&format=json
{
"rows": [
{
"dateTime": "2014-09-01 00:00:00.000",
"gender|id": "-1",
"gender|desc": "Unknown",
"pageViews": 1681441753
},{
"dateTime": "2014-09-01 00:00:00.000",
"gender|id": "f",
"gender|desc": "Female",
"pageViews": 958894425
},{
"dateTime": "2014-09-01 00:00:00.000",
"gender|id": "m",
"gender|desc": "Male",
"pageViews": 1304365910
}
]
}
CSV: GET https://sampleapp.fili.io/v1/data/network/day/gender?metrics=pageViews&dateTime=2014-09-01/2014-09-02&format=csv
dateTime,gender|id,gender|desc,pageViews
2014-09-01 00:00:00.000,-1,Unknown,1681441753
2014-09-01 00:00:00.000,f,Female,958894425
2014-09-01 00:00:00.000,m,Male,1304365910
JSON-API: GET https://sampleapp.fili.io/v1/data/network/day/gender?metrics=pageViews&dateTime=2014-09-01/2014-09-02&format=jsonapi
{
"rows": [
{
"dateTime": "2014-09-01 00:00:00.000",
"gender": "-1",
"pageViews": 1681441753
},{
"dateTime": "2014-09-01 00:00:00.000",
"gender": "f",
"pageViews": 958894425
},{
"dateTime": "2014-09-01 00:00:00.000",
"gender": "m",
"pageViews": 1304365910
}
],
"gender": [
{
"id": "-1",
"desc": "Unknown"
},{
"id": "f",
"desc": "Female"
},{
"id": "m",
"desc": "Male"
}
]
}
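In the JSON-API format, each row carries only the dimension's id, and the descriptions live in a sidecar list named after the dimension. A client that wants the description on each row can join the two. The sketch below uses a trimmed copy of the response above; the function name `attach_descriptions` is hypothetical.

```python
def attach_descriptions(response, dimension):
    """Join each row with the description from the dimension's sidecar list."""
    lookup = {entry["id"]: entry["desc"] for entry in response[dimension]}
    return [
        {**row, f"{dimension}|desc": lookup[row[dimension]]}
        for row in response["rows"]
    ]


response = {
    "rows": [
        {"dateTime": "2014-09-01 00:00:00.000", "gender": "-1", "pageViews": 1681441753},
        {"dateTime": "2014-09-01 00:00:00.000", "gender": "f", "pageViews": 958894425},
    ],
    "gender": [
        {"id": "-1", "desc": "Unknown"},
        {"id": "f", "desc": "Female"},
    ],
}

joined = attach_descriptions(response, "gender")
print(joined[1]["gender|desc"])
```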
The default naming formula can produce attachments with long, hard to parse names. Fili provides the filename query string parameter, which specifies a filename for the result attachment to be downloaded as. The format of the attachment is determined by the format parameter defined above. As such, do not provide a file extension to the filename query.
For example, the query:
GET https://sampleapp.fili.io/v1/data/network/day/gender?metrics=pageViews&dateTime=2014-09-01/2014-09-02&format=json&filename=data
downloads an attachment named data.json.
The presence of the filename parameter indicates that the response should be downloaded as an attachment. Otherwise the response is rendered by the browser. The exception to this is the CSV format, which is always downloaded.
Filters allow you to filter by dimension values. What is being filtered depends on the resource, but the general format for filters and their logical meaning is the same regardless of resource.
The general format of a single filter is:
dimensionName|dimensionField-filterOperation[some,list,of,url,encoded,filter,strings]
These filters can be combined by comma-separating each individual filter, and the filter strings are URL-encoded, comma-separated values:
myDim|id-contains[foo,bar],myDim|id-notin[baz],yourDim|desc-startswith[Once%20upon%20a%20time,in%20a%20galaxy]
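To make the grammar concrete, the combined filter string above can be split back into its parts. This is a minimal sketch, not Fili's parser: it assumes dimension, field, and operation names are plain word characters and that literal commas or brackets never appear unencoded inside a value. The function name `parse_filters` is hypothetical.

```python
import re
from urllib.parse import unquote

# dimensionName|dimensionField-filterOperation[comma,separated,values]
FILTER_PATTERN = re.compile(r"(\w+)\|(\w+)-(\w+)\[([^\]]*)\]")


def parse_filters(text):
    """Split a filters parameter into (dimension, field, operation, values) tuples."""
    return [
        (dimension, field, operation, [unquote(v) for v in raw.split(",")])
        for dimension, field, operation, raw in FILTER_PATTERN.findall(text)
    ]


parsed = parse_filters(
    "myDim|id-contains[foo,bar],myDim|id-notin[baz],"
    "yourDim|desc-startswith[Once%20upon%20a%20time,in%20a%20galaxy]"
)
for item in parsed:
    print(item)
```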
These are the available filter operations (though not all of them are supported by all endpoints):
- in: In filters are an exact match on a filter string, where only matching rows are included
- notin: Not In filters are an exact match on a filter string, where all rows except matching rows are included
- contains: Contains filters search for the filter string to be contained in the searched field, and work like an in filter
- startswith: Starts With filters search for the filter string to be at the beginning of the searched field, and work like an in filter
Let's take an example, and break down what it means.
GET https://sampleapp.fili.io/v1/dimensions/productRegion/values?filters=productRegion|id-notin[Americas%20Region,Europe%20Region],productRegion|desc-contains[Region]
What this filter parameter means is the following:
Return dimension values that
don't have
productRegion dimension values
with an ID of "Americas Region" or "Europe Region",
and have
productRegion dimension values
with a description that contains "Region".
Having clauses allow you to filter out result rows based on conditions on aggregated metrics. This is similar to, but distinct from, Filtering, which allows you to filter out results based on dimensions. As a result, the format for writing a having clause is very similar to that of a filter.
The general format of a single having clause is:
metricName-operator[x,y,z]
where the parameters x, y, z are numbers (integer or float) in decimal (3, 3.14159) or scientific (4e8) notation. Although three numbers are used in the template above, the list of parameters may be of any length, but must be non-empty.
These clauses can be combined by comma-separating individual clauses:
metricName1-greaterThan[w,x],metricName2-equals[y],metricName3-notLessThan[y, z]
which is read as return all rows such that metricName1 is greater than w or x, and metricName2 is equal to y, and metricName3 is less than neither y nor z.
Note that you may only perform a having filter on metrics that have been requested in the metrics clause.
Following are the available having operators. Each operator has an associated shorthand. The shorthand is indicated in parentheses after the name of the operator. Both the full name and the shorthand may be used in a query.
- equal(eq): Equal returns rows whose having-metric is equal to at least one of the specified values.
- greaterThan(gt): Greater Than returns rows whose having-metric is strictly greater than at least one of the specified values.
- lessThan(lt): Less Than returns rows whose having-metric is strictly less than at least one of the specified values.
Each operation may also be prefixed with not, which negates the operation. So noteq returns all the rows whose having-metric is equal to none of the specified values.
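The semantics described above (any value may match within one clause, every clause must hold, and a not prefix negates the clause) can be modelled in a few lines. This is only an illustration of the logic, not how Fili or Druid evaluates a having clause; it covers the shorthand operator names only, and the function names are hypothetical.

```python
import operator

# Shorthand operators from the list above; a "not" prefix negates any of them.
COMPARATORS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def clause_matches(value, op_name, operands):
    """True if `value` satisfies the operator for at least one operand (or none, if negated)."""
    negated = op_name.startswith("not")
    compare = COMPARATORS[op_name[3:] if negated else op_name]
    hit = any(compare(value, operand) for operand in operands)
    return not hit if negated else hit


def row_matches(row, clauses):
    """A row is kept only if every comma-separated clause holds."""
    return all(clause_matches(row[metric], op, operands) for metric, op, operands in clauses)


# Models having=pageViews-notgt[4e9],users-lt[1e8]
clauses = [("pageViews", "notgt", [4e9]), ("users", "lt", [1e8])]
print(row_matches({"pageViews": 3e9, "users": 5e7}, clauses))
```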
Let's take an example and break down what it means.
GET https://sampleapp.fili.io/v1/data/network/day?metrics=pageViews,users&dateTime=2014-09-01/2014-09-08&having=pageViews-notgt[4e9],users-lt[1e8]
What this having clause means is the following:
Return the page views and users of all events aggregated at the day level from September 1 to
September 8, 2014 that
have at most 4 billion page views
and
have strictly fewer than 100 million users
The having filter is only applied at the Druid level. Therefore, the results of a having filter are not guaranteed to be accurate if Fili performs any post-Druid calculations on one of the metrics being filtered on.
By default, a query's response will return the id and description for each dimension in the request. However, you may be interested in more information about the dimensions, or less. To do this, you can specify a show clause on the relevant dimension path segment:
GET https://sampleapp.fili.io/v1/data/network/week/productRegion;show=desc/userCountry;show=id,regionId/?metrics=pageViews&dateTime=2014-09-01/2014-09-08
The results for this query will only show the description field for the Product Region dimension, and both the id and region id fields for the User Country dimension. In general, you add show to a dimension with a semicolon, then show=<fieldnames>. Use commas to separate multiple fields in the same show clause:
/<dimension>;show=<field>,<field>,<field>
There are a couple of keywords that can be used when selecting fields to show:
- All: Include all fields for the dimension in the response
- None: Include only the key field in the dimension.
The none keyword also simplifies the response to keep the size as small as possible. The simplifications applied to the response depend on the format of the response:
JSON: Instead of the normal format for each requested field for a dimension ("dimensionName|fieldName":"fieldValue"), each record in the response will only have a single entry for the dimension whose value is the value of the key-field for that dimension ("dimensionName":"keyFieldValue").
CSV: Instead of the normal header format for each requested field for a dimension (dimensionName|fieldName), the headers of the response will only have a single entry for the dimension, which will be the dimension's name. The values of the column for that dimension will be the key-field for that dimension.
JSON-API: The none keyword for a dimension prevents the sidecar object for that dimension from being included in the response.
Sorting of the records in a response can be done using the sort query parameter like so:
sort=myMetric
By default, sorting is descending; however, Fili supports sorting in either descending or ascending order. To specify the sort direction for a metric, you need to specify both the metric you want to sort on and the direction, separated by a pipe (|), like so:
sort=myMetric|asc
Sorting by multiple metrics, with a mixture of ascending and descending, can also be done by separating each sort with a comma:
sort=metric1|asc,metric2|desc,metric3|desc
There are, however, a few catches to this:
- Only the Data resource supports sorting
- Records are always sorted by dateTime first, and then by any sorts specified in the query, so records are always sorted within a timestamp
- Sort is only supported on Metrics
- Sorting is only applied at the Druid level. Therefore, the results of a sort are not guaranteed to be accurate if Fili performs any post-Druid calculations on one of the metrics that you are sorting on.
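The "dateTime first, then the requested sorts" ordering can be reproduced on a handful of rows to see what a sorted response looks like. The sketch below is a client-side model only: it assumes dateTime is ordered ascending (the text does not state the direction) and that a missing direction means descending. The function name `apply_sort` is hypothetical.

```python
def apply_sort(rows, sort_param):
    """Order rows by dateTime, then by each 'metric|direction' term in turn."""
    sorts = []
    for term in sort_param.split(","):
        metric, _, direction = term.partition("|")
        sorts.append((metric, (direction or "desc").lower() == "desc"))
    ordered = list(rows)
    # Stable sorts applied from the least significant key to the most significant.
    for metric, descending in reversed(sorts):
        ordered.sort(key=lambda row: row[metric], reverse=descending)
    ordered.sort(key=lambda row: row["dateTime"])
    return ordered


rows = [
    {"dateTime": "2014-09-02", "pageViews": 5},
    {"dateTime": "2014-09-01", "pageViews": 1},
    {"dateTime": "2014-09-01", "pageViews": 9},
]
print([row["pageViews"] for row in apply_sort(rows, "pageViews")])
```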
Suppose we would like to know which three pages have the top three pageview counts for each week between January and September 2014. We can easily answer such a question with a topN query. topN queries allow us to ask for the top results for each time bucket, up to a limit n in a request. Of course, a topN query implies that some sort of ordering has been imposed on the data. Therefore, a topN query has two components:
- topN=n where n is the number of results to return for each bucket
- sort=metricName|dir telling Fili how to sort the results before filtering down to the top N. See the section on sorting for more details about the sort clause.
Going back to the sample question at the start of the section, let's see how that looks as a Fili query:
GET https://sampleapp.fili.io/v1/data/network/week/pages?metrics=pageViews&dateTime=2014-01-06/2014-09-01&topN=3&sort=pageViews|desc
We want the three highest pageview counts for each week, so n is set to three, and the query is aggregated to the week granularity. Furthermore, we want the three largest page view counts. Therefore, we sort pageViews in descending order (the first entry is the highest, the second entry is the second highest, and so on).
Fili also supports asking for metrics in addition to the one being sorted on in topN queries. Suppose we want to know the daily average time spent (dayAvgTimeSpent) on the three pages with the highest number of page views for each week between January 6th and September 1st. Thus, we can add dayAvgTimeSpent to our original topN query:
GET https://sampleapp.fili.io/v1/data/network/week/pages?metrics=pageViews,dayAvgTimeSpent&dateTime=2014-01-06/2014-09-01&topN=3&sort=pageViews|desc
When executing a topN query with multiple metrics, Fili will compute the top N results using the sorted metric only.
Remember that topN provides the top N results for each time bucket. Therefore, when we ask a topN query, we will get n * numBuckets results. In both of the examples above, we would get 3 * 34 = 102 results. If you are only interested in the first n results, see pagination/limit.
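The n * numBuckets arithmetic can be checked directly. The sketch below assumes the weekly interval runs from January 6th to September 1st, 2014 (the dates named in the text, both Mondays), with the end instant exclusive; the function name `weekly_buckets` is hypothetical.

```python
from datetime import date


def weekly_buckets(start, end):
    """Count whole weeks in a half-open [start, end) interval of ISO 8601 dates."""
    days = (date.fromisoformat(end) - date.fromisoformat(start)).days
    if days % 7 != 0:
        raise ValueError("interval does not cover a whole number of weeks")
    return days // 7


buckets = weekly_buckets("2014-01-06", "2014-09-01")
top_n = 3
print(buckets, top_n * buckets)  # 34 102
```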
Fili supports asynchronous data queries. A new parameter asyncAfter is added on data queries. The asyncAfter parameter will control whether a data query should always be synchronous, or transition from synchronous to asynchronous on the fly. If asyncAfter=never then Fili will wait indefinitely for the data, and hold the connection with the client open as long as allowed by the network. This will be the default. However, the default behavior of asyncAfter may be modified by setting the default_asyncAfter configuration parameter. If asyncAfter=always, the query is asynchronous immediately. If asyncAfter=t for some positive integer t, then at least t milliseconds will pass before the query becomes asynchronous. Note however that the timing is best effort. The query may take longer than t milliseconds and still be synchronous. In other words, asyncAfter=0 and asyncAfter=always do not mean the same thing. It is possible for asyncAfter=0 to return the query results synchronously (this may happen if the results come back sufficiently fast). It is impossible for the query results to return synchronously if asyncAfter=always.
If the timeout passes, and the data has not come back, then the user receives a 202 Accepted response and the job meta-data.
The jobs endpoint is the one-stop shop for queries about asynchronous jobs. This endpoint is responsible for:
- Providing a list of all jobs in the system.
- Providing the status of a particular job queried via the jobs/TICKET resource.
- Providing access to the results via the jobs/TICKET/results resource.
A user may get the status of all jobs by sending a GET to the jobs endpoint.
https://HOST:PORT/v1/jobs
If no jobs are available in the system, an empty collection is returned.
The jobs endpoint supports filtering on job fields (e.g. userId, status), using the same syntax as the data endpoint filters. For example:
userId-eq[greg, joan], status-eq[success]
resolves into the following boolean operation:
(userId = greg OR userId = joan) AND status = success
which will return only those jobs created by greg or joan that have completed successfully.
When the user sends a GET request to jobs/TICKET, Fili will look up the specified ticket and return the job's meta-data as follows:
{
"query": "https://HOST:PORT/v1/data/QUERY",
"results": "https://HOST:PORT/v1/jobs/TICKET/results",
"syncResults": "https://HOST:PORT/v1/jobs/TICKET/results?asyncAfter=never",
"self": "https://HOST:PORT/v1/jobs/TICKET",
"status": ONE OF ["pending", "success", "error"],
"jobTicket": "TICKET",
"dateCreated": "ISO 8601 DATETIME",
"dateUpdated": "ISO 8601 DATETIME",
"userId": "Foo"
}
- query is the original query
- results provides a link to the data. Whether it is fully synchronous or switches from synchronous to asynchronous after a timeout depends on the default setting of asyncAfter.
- syncResults provides a synchronous link to the data (note the asyncAfter=never parameter)
- self provides a link that returns the most up-to-date version of this job
- status indicates the status of the results pointed to by this ticket
  - pending - The job is being worked on
  - success - The job has been completed successfully
  - error - The job failed with an error
  - canceled - The job was canceled by the user (coming soon)
- jobTicket is a unique identifier for the job
- dateCreated is the date on which the job was created
- dateUpdated is the date on which the job's status was last updated
- userId is an identifier for the user who submitted this job
If the ticket is not available in the system, we get a 404 error with the message No job found with job ticket TICKET
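One way a client might use the job meta-data is to re-request the self link until the status leaves pending. The sketch below is a hypothetical helper, not part of Fili: `fetch` stands in for an HTTP GET that returns the parsed meta-data, and the polling interval and attempt limit are arbitrary choices. The demo feeds canned responses rather than contacting a server.

```python
import time


def wait_for_job(job_url, fetch, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll the job's meta-data until its status is no longer 'pending'."""
    for _ in range(max_polls):
        job = fetch(job_url)
        if job["status"] != "pending":
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job at {job_url} still pending after {max_polls} polls")


# Canned meta-data responses standing in for three successive GET requests.
responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "success"}])
job = wait_for_job(
    "https://HOST:PORT/v1/jobs/TICKET",
    fetch=lambda url: next(responses),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(job["status"])
```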
The user may access the results of a query by sending a GET request to jobs/TICKET/results. This resource takes the following parameters:
- format - Allows the user to specify a response format, e.g. csv or JSON. This behaves just like the format parameter on queries sent to the data endpoint.
- filename - Allows the user to specify a filename for the result to be downloaded as. This behaves just like the filename parameter on queries sent to the data endpoint.
- page, perPage - The pagination parameters. Their behavior is the same as when sending a query to the data endpoint, and they allow the user to get pages of the results.
- asyncAfter - Allows the user to specify how long they are willing to wait for results from the result store. Behaves like the asyncAfter parameter on the data endpoint.
If the results for the given ticket are ready, we get the results in the format specified. Otherwise, we get the job's metadata.
If clients wish to long poll for the results, they may send a GET request to https://HOST:PORT/v1/jobs/TICKET/results?asyncAfter=never (the query linked to under the syncResults field in the async response). This request will perform like a synchronous query: Fili will not send a response until all of the data is ready.
The date interval is specified using the dateTime parameter with two terms, separated by a /. They describe an inclusive/exclusive interval: dateTime=d1/d2. For example, dateTime=2015-10-01/2015-10-03 will return the data rows for October 1st and 2nd, but not the 3rd.
Dates can be one of:
- ISO 8601 formatted date
- Date macros (see below)
Non date terms can include:
- ISO 8601 Periods
- iCalendar Recurrence rules (RFC 2445) (can only be used as the second term)
We have followed the ISO 8601 standards as closely as possible in the API. Wikipedia has a great article on ISO 8601 dates and times if you want to dig deep.
Date macros can use context such as the grain of the request to resolve their intended meaning.
We have created two default macros. The one named current
translates to the beginning of the current time grain period. For example, if your time grain is day
, then current
will resolve to the starts of the current calendar date. If your query time grain is month
, then current
will resolve to the first of the current month.
There is also a similar macro named next, which resolves to one time grain period after the date resolved by current. For example, if your time grain is day, then next will resolve to the next midnight.
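The macro behavior can be illustrated with a small Python sketch. This is not Fili's implementation: the functions `resolve_current` and `resolve_next` are hypothetical, only the day and month grains are handled, and time zones are ignored.

```python
from datetime import datetime, timedelta


def resolve_current(now, grain):
    """Return the start of the grain period that contains `now`."""
    if grain == "day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    if grain == "month":
        return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported grain: {grain}")


def resolve_next(now, grain):
    """Return the start of the grain period after the one containing `now`."""
    start = resolve_current(now, grain)
    if grain == "day":
        return start + timedelta(days=1)
    # Month grain: December rolls over to January of the following year.
    if start.month == 12:
        return start.replace(year=start.year + 1, month=1)
    return start.replace(month=start.month + 1)


now = datetime(2015, 12, 15, 13, 30)
print(resolve_current(now, "day"))   # 2015-12-15 00:00:00
print(resolve_next(now, "day"))      # 2015-12-16 00:00:00
print(resolve_next(now, "month"))    # 2016-01-01 00:00:00
```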
Date periods have been implemented in accordance with the ISO 8601 standard. Briefly, a period is specified by the letter P, followed by a number and then a unit of time (Y=year, M=month, W=week, D=day, etc.). For example, if you wanted 30 days of data, you would specify P30D. The number and unit may be repeated, so P1Y2M is a period of one year and two months.
This period can take the place of either the start or end date in the query.
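To make the period syntax concrete, here is a rough Python sketch that parses a date-only period and resolves a dateTime value in which one term is a period. The helper names are hypothetical, and the resolution rule (a leading period counts back from the end date, a trailing period counts forward from the start date) is an assumption based on ISO 8601 interval conventions rather than on Fili's code. Year and month arithmetic is left out to keep the sketch short.

```python
import re
from datetime import date, timedelta

# Date-only periods such as P30D, P2W or P1Y2M (time components not handled).
PERIOD_RE = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$")


def parse_period(text):
    """Split a period like P1Y2M into its numeric components."""
    match = PERIOD_RE.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"not a supported period: {text}")
    years, months, weeks, days = (int(g or 0) for g in match.groups())
    return {"years": years, "months": months, "weeks": weeks, "days": days}


def _as_timedelta(text):
    parts = parse_period(text)
    if parts["years"] or parts["months"]:
        # Years and months vary in length, so they need calendar arithmetic.
        raise NotImplementedError("only weeks and days are handled in this sketch")
    return timedelta(weeks=parts["weeks"], days=parts["days"])


def resolve_interval(date_time):
    """Resolve 'start/end' where either term may be a period."""
    first, second = date_time.split("/")
    if first.startswith("P"):
        end = date.fromisoformat(second)
        return end - _as_timedelta(first), end
    start = date.fromisoformat(first)
    if second.startswith("P"):
        return start, start + _as_timedelta(second)
    return start, date.fromisoformat(second)


print(resolve_interval("2015-10-01/P30D"))
# (datetime.date(2015, 10, 1), datetime.date(2015, 10, 31))
```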
If the first term is a date, the second one can be an iCalendar recurrence rule (https://www.ietf.org/rfc/rfc2445.txt).
The syntax for an rrule interval is dateTime=DATE/RRULE=FREQ=monthly...
RRULE syntax is rich and complex, but the most important details are these: a rule must start with a frequency, and it can contain either a count indicating how many repetitions to produce or a due date specifying a final date.
When using monthly frequency, the parameter bymonthday can be specified to select days within the month. A positive value counts days from the start of the month (e.g. bymonthday=7,14 would retrieve the seventh and fourteenth days of a month).
Negative values count backwards from the end of the month, so -1 indicates the last day of a month.
Similar syntax applies to days of week, month of year, or week of month.
Example: 2019-08-01/RRULE=FREQ=monthly;bymonthday=-1;COUNT=3
-> Evaluates to a set of three intervals, containing the last day of the month for three consecutive months starting with the last day of August 2019. The period of these intervals is inferred from the granularity of the query.
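The dates selected by the example rule can be reproduced with a short Python sketch. It does not parse RRULE syntax; it only illustrates what bymonthday=-1 with COUNT=3 selects when starting from a given date. The function name `last_days_of_months` is hypothetical.

```python
import calendar
from datetime import date


def last_days_of_months(start, count):
    """Return the last day of `count` consecutive months, beginning with
    the month that contains `start`."""
    results = []
    year, month = start.year, start.month
    for _ in range(count):
        # monthrange returns (weekday of the 1st, number of days in month).
        days_in_month = calendar.monthrange(year, month)[1]
        results.append(date(year, month, days_in_month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return results


print(last_days_of_months(date(2019, 8, 1), 3))
# [datetime.date(2019, 8, 31), datetime.date(2019, 9, 30), datetime.date(2019, 10, 31)]
```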
A time zone cannot be specified within the interval terms. Instead, the time zone of a query can be changed via the timeZone query parameter. This changes the time zone in which the intervals specified in the dateTime parameter are interpreted. By default, the query will use the default time zone of the API, but any time zone identifier can be specified to override that. For example, specifying query parameters
dateTime=2016-09-16/2016-09-17&timeZone=America/Los_Angeles
vs
dateTime=2016-09-16/2016-09-17&timeZone=America/Chicago
will result in the intervals resolving to
2016-09-16 00:00:00-07:00/2016-09-17 00:00:00-07:00
and
2016-09-16 00:00:00-05:00/2016-09-17 00:00:00-05:00
Note the hour offsets on the interval instants.
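The offsets above can be checked with Python's standard zoneinfo module (Python 3.9+, with time zone data available on the system). This is only an illustration of how the same wall-clock interval maps to different instants; the helper `interval_in_zone` is hypothetical and is not part of the API.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def interval_in_zone(date_time, time_zone):
    """Interpret both terms of a date-only 'start/end' interval as
    midnight in the given time zone."""
    zone = ZoneInfo(time_zone)
    start, end = date_time.split("/")
    return tuple(datetime.fromisoformat(term).replace(tzinfo=zone) for term in (start, end))


for zone_name in ("America/Los_Angeles", "America/Chicago"):
    start, end = interval_in_zone("2016-09-16/2016-09-17", zone_name)
    print(f"{start.isoformat(sep=' ')}/{end.isoformat(sep=' ')}")
# 2016-09-16 00:00:00-07:00/2016-09-17 00:00:00-07:00
# 2016-09-16 00:00:00-05:00/2016-09-17 00:00:00-05:00
```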
Everything in the API is case-sensitive, which means pageViews
is not the same as pageviews
is not the same as
PaGeViEwS
.
To prevent abuse of the system, the API only allows each user to have a certain number of data requests being processed at any one time. If you try to make another request that would put you above the allowed limit, you will be given an error response with an HTTP response status code of 429.
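One way a client might cope with the 429 response is to wait and retry. The sketch below assumes a caller-supplied `fetch` function that returns a `(status, body)` pair; that function, the exponential backoff schedule, and the parameter names are all illustrative assumptions, not part of the API.

```python
import time


def get_with_retry(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a status other than 429 or the
    attempts run out. Waits base_delay, 2*base_delay, 4*base_delay, ...
    seconds between attempts. Returns the last (status, body) pair."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to test, and `fetch` can wrap whichever HTTP client the application already uses.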
There are a number of different errors you may encounter when interacting with the API. All of them are indicated by the HTTP response status code, and most of them have a helpful message indicating what went wrong.
Code | Meaning | Cause |
---|---|---|
400 | BAD REQUEST | You have some sort of syntax error in your request. We couldn't figure out what you were trying to ask. |
401 | UNAUTHORIZED | We don't know who you are, so send your request again, but tell us who you are. Usually this means you didn't include proper security authentication information in your request. |
403 | FORBIDDEN | We know who you are, but you can't come in. |
404 | NOT FOUND | We don't know what resource you're talking about. You probably have a typo in your URL path. |
416 | REQUESTED RANGE NOT SATISFIABLE | We can't get that data for you from Druid. |
422 | UNPROCESSABLE ENTITY | We understood the request (i.e. the syntax is correct), but something else about the query is wrong. Likely something like a dimension mismatch, or a metric / dimension not being available in the logical table. |
429 | TOO MANY REQUESTS | You've hit the rate limit. Wait a little while for any requests you may have sent to finish and try your request again. |
500 | INTERNAL SERVER ERROR | Some other error on our end. We try to catch and investigate all of these, but we might miss them, so please let us know if you get 500 errors. |
502 | BAD GATEWAY | Bad Druid response. |
503 | SERVICE UNAVAILABLE | Druid can't be reached. |
504 | GATEWAY TIMEOUT | Druid query timeout. |
507 | INSUFFICIENT STORAGE | Too heavy of a query. |