The front end query language is rather remotely related to SQL, which should make it simple to pick up. The focus is on querying one metric at a time, given that those queries are incredible fast. Globs can also be used to query multiple metrics at a time.
The basic syntax looks like this:
SELECT <FIELDS> [ FROM <ALIASES>] <TIME RANGE> [IN <RESOLUTION>]
Please keep in mind that each query can only return a single row of data at the moment.
A field can either be a metric, an alias for a metric or a function:
cloud.zones.cpu.usage.eca485cf-bdbb-4ae5-aba9-dce767 BUCKET tachyon
- a fully qualified metric.vm
- an alias that is defined in theFROM
section of the query.avg(vm, 1m)
- a aggregation function.
Fields can be aliased for output adding a AS <alias>
directive after the field. The alias can be quoted, unless a keyword is used as alias (such as min
) in which case it must be quoted.
Multiple fields can be given separating two fields with a ,
. The resolution of fields does not have to be the same, and there is no validation to enforce this!
When a metric is used multiple times it is more readable to alias this metric in the FROM
section. Multiple elements can be given separated with a ,
. Each element takes the form: <metric> BUCKET <bucket> AS <alias>
.
It is possible to match multiple metrics by using a mulitget aggregator and a glob to match a metric. Valid multiget aggregators are sum
and avg
. For example: sum(some.metric.* BUCKET b)
.
There are two ways to declare ranges. Although numbers here represent seconds, DalmatinerDB does not care about the time unit at all:
BETWEEN <time:time> AND <time:time>
The above statement selects all points between the start
and the end
.
The most used query is 'what happened in the past X seconds?', so there is a simplified form for this:
LAST <amount:int>|<time:timerange>
Alternatively start / end and a duration can be provided:
BEFORE <time:time> FOR <amount:int>|<time:timerange>
To select a range prior to a specific point in time or:
AFTER <time:time> FOR <amount:int>|<time:timerange>
to select a range after a specific point in time.
By default queries treat incoming data as a one second resolution, however this can be adjusted by passing a resolution section to the query. The syntax is: IN <resolution:time>.
A simple number literal i.e. 42.
There are two ways to declare times:
- Relatively, in which case the time is a simple integer and corresponds to a number of metric points used. (i.e.
60
) - Absolute, in which case the time is an integer followed by a time unit such as
ms
,s
,m
,h
,d
andw
. In this case the resolution of the metric is taken into account.
Relative times can either be the keyword NOW
, an absolute timestamp (a simple integer) or a relative time in the past such as <time> AGO
.
Also it is possible to give absolute dates. Dates are provided in a "
quoted string, they have the form YYYY-MM-DD HH:MM:SS
however some simplifications are possible (minutes and seconds can be skipped), and adding a timezone is also an option.
For convenience a metric is often represented as a concatenation of simple strings that are separated by dots and a second string for the bucket. The two strings are separated by the keyword BUCKET
.
Example:
cloud.zones.cpu.usage.eca485cf-bdbb-4ae5-aba9-dce767 BUCKET tachyon
However internally DalmatinerDB uses a stricter format that allows storing any string inside a metric. To access those metrics you need to wrap each part of it in quotes ('
). For example a metric for an IP could be written as follows:
'10.0.0.1'.in.'pkgs/s'
Which would be a metric consisting of three parts, 10.0.0.1
, in
and pkgs/s
. Note that the in
does not need to be quoted but could be.
Aggregation functions aggregate a metric over a given range of time and decrease the resolution by doing so. Aggregation functions can be nested, in which case the 'higher' functions work with the decreased resolution of lower functions and not the raw resolution. This means the correct code to get the 1m average over 10s sums from a 1s resolution metric would be avg(sum(m, 10s), 1m) not avg(sum(m, 10s), 6s) - however this does not apply when using the point and not the time declaration, so it would be: avg(sum(m, 10s), 6) not avg(sum(m, 10s), 60) (please note the missing s).
The minimal value over a given range of time.
The maximal value over a given range of time.
The sum of all values of a time-range.
The average of a time-range (this is the mean not the median).
Returns the total of empty data-points in a time-range. This can be used to indicate the precision of the data and the loss occurring before they get stored.
Returns the value of the n
th percentile, where 0 < n
< 1. The percentile is given as the second value of the function, the time-range to aggregate over as the third.
Manipulation functions help to change the values of a value list they do not change the resolution or aggregate multiple values into one.
Calculates the derivate of a metric, meaning N'(X)=N(X) - N(X-1)
Note
Even if the resolution isn't changed this function removes exactly 1 element from the result.
Multiplies each element with integer constant.
Divides each element with a integer constant.
Calculates the min, max and average of a metric over a hour:
SELECT min(vm, 10m), avg(vm, 10m), max(vm, 10m) AS 'max' FROM cloud.zones.cpu.usage.eca485cf-bdbb-4ae5-aba9-dce767 BUCKET tachyon AS vm LAST 60m