category |
---|
API Reference |
Piwik persists two main types of data:
- log data: raw analytics data that Piwik receives in the tracker
- archive data: aggregated analytics data (constructed from log data) that is cached and used to build reports
Piwik also persists other simpler forms of data including:
- websites
- users
- goals
- options
Note: Piwik uses PHP arrays to hold persisted data. In the descriptions below we list properties by the name used in the PHP array.
Did you know? You can extend the database with a plugin.
There are four types of log data:
- visits
- action types
- conversions
- ecommerce items
All log data is persisted in a similar way: new data is constantly added to the set at high volume and updates are non-existent, except for visits.
Visit data is updated while visits are active. So until a visit ends it is possible that Piwik will try and update it.
Log data is read when calculating analytics data and old data will sometimes be deleted (via the data purging feature).
Backends must ensure that inserting new log data is as fast as possible and aggregating log data is not too slow (though obviously, faster is better).
Visits are stored in the log_visit
table.
Each visit contains the following information:
idsite
: the ID of the the website it was tracked foridvisitor
: a visitor ID (an 8 byte binary string)visitor_localtime
: the visit datetime in the visitor's time of dayvisitor_returning
: whether the visit is the first visit for this visitor or notvisitor_count_visits
: the number of visits the visitor has made up to this onevisitor_days_since_last
: the number of days since this visitor's last visit (if any)visitor_days_since_order
: the number of days since this visitor's last order (if any)visitor_days_since_first
: the number of days since this visitors' first visitvisit_first_action_time
: the datetime of the visit's first actionvisit_last_action_time
: the datetime of the visit's last actionvisit_exit_idaction_url
: the ID of the URL action type of the visit's last actionvisit_exit_idaction_name
: the ID of the page title action type of the visit's last actionvisit_entry_idaction_url
: the ID of the URL action type of the visit's first actionvisit_entry_idaction_name
: the ID of the page title action type of this visit's first actionvisit_total_actions
: the count of actions performed during this visitvisit_total_searches
: the count of site searches performed during this visitvisit_total_events
: the count of custom events performed during this visitvisit_total_time
: the total elapsed time of the visitvisit_goal_converted
: whether this visit converted a goal or notvisit_goal_buyer
: whether the visitor ordered something during this visit or notreferer_type
: the type of this visitor's referrer. Can be one of the following values:- Common::REFERRER_TYPE_DIRECT_ENTRY = 1: If set to this value, other
referer_...
fields have no meaning. - Common::REFERRER_TYPE_SEARCH_ENGINE = 2: If set to this value,
referer_url
is the url of the search engine andreferer_keyword
is the keyword used (if we can find it). - Common::REFERRER_TYPE_WEBSITE = 3: If set to this value,
referer_url
is the url of the website. - Common::REFERRER_TYPE_CAMPAIGN = 6: If set to this value,
referer_name
is the name of the campaign.
- Common::REFERRER_TYPE_DIRECT_ENTRY = 1: If set to this value, other
referer_name
: referrer name; its meaning depends on the specific referrer typereferer_url
: the referrer URL; its meaning depends on the specific referrer typereferer_keyword
: the keyword used if a search engine was the referrerconfig_id
: a hash of all the visit's configuration options, including the OS, browser name, browser version, browser language, IP address and all browser plugin informationconfig_os
: a short string identifiying the operating system used to make this visit. See UserAgentParser for more infoconfig_browser_name
: a short string identifying the browser used to make this visit. See UserAgentParser for more infoconfig_browser_version
: a string identifying the version of the browser used to make this visitconfig_resolution
: a string identifying the screen resolution the visitor used to make this visit (eg,'1024x768'
)config_pdf
: whether the visitor's browser can view PDF files or notconfig_flash
: whether the visitor's browser can view flash files or notconfig_java
: whether the visitor's browser can run Java or notconfig_director
:config_quicktime
: whether the visitor's browser uses quicktime to play media files or notconfig_realplayer
: whether the visitor's browser can play realplayer media files or notconfig_windowsmedia
: whether the visitor's browser uses windows media player to play media filesconfig_gears
:config_silverlight
: whether the visitor's browser can run silverlight programs or notconfig_cookie
: whether the visitor's browser has cookies enabled or notlocation_ip
: the IP address of the computer that the visit was made from. Can be anonymizedlocation_browser_lang
: a string describing the language used in the visitor's browserlocation_country
: a two character string describing the country the visitor was located in while visiting the site. Set by the UserCountry plugin.location_region
: a two character string describing the region of the country the visitor was in. Set by the UserCountry plugin.location_city
: a string naming the city the visitor was in while visiting the site. Set by the UserCountry plugin.location_latitude
: the latitude of the visitor while he/she visited the site. Set by the UserCountry plugin.location_longitude
: the longitude of the visitor while he/she visited the site. Set by the UserCountry plugin.custom_var_k1
: the custom variable name of the visit in the first slot for visit custom variables.custom_var_v1
: the custom variable value of the visit in the first slot for visit custom variables.custom_var_k2
: the custom variable name of the visit in the second slot for visit custom variables.custom_var_v2
: the custom variable value of the visit in the second slot for visit custom variables.custom_var_k3
: the custom variable name of the visit in the third slot for visit custom variables.custom_var_v3
: the custom variable value of the visit in the third slot for visit custom variables.custom_var_k4
: the custom variable name of the visit in the fourth slot for visit custom variables.custom_var_v4
: the custom variable value of the visit in the fourth slot for visit custom variables.custom_var_k5
: the custom variable name of the visit in the fifth slot for visit custom variables.custom_var_v5
: the custom variable value of the visit in the fifth slot for visit custom variables.
Some plugins, such as the Provider plugin, will add new information to visits.
The index_idsite_config_datetime
index is used when trying to recognize returning visitors.
The index_idsite_datetime
index is used when aggregating visits. Since log aggregation occurs only for individual day periods, this index helps Piwik find the visits for a website and period quickly. Without it, log aggregation would require a table scan through the entire log_visit
table.
Visits also contain a list of actions, one for each action the visitor makes during the visit. Those are stored in the log_link_visit_action
table.
Visit actions contain the following information:
server_time
: the datetime the action was tracked in the UTC timezoneidaction_url
: the ID of the URL action type for this actionidaction_url_ref
: the ID of the URL action type for the previous action in the visitidaction_name
: the ID of the page title action type for this actionidaction_name_ref
: the ID of the page title action type for the previous action in the visittime_spent_ref_action
: the amount of time spent doing the previous actioncustom_var_k1
: the custom variable name of the first slot for page custom variablescustom_var_v1
: the custom variable value of the first slot for page custom variablescustom_var_k2
: the custom variable name of the second slot for page custom variablescustom_var_v2
: the custom variable value of the second slot for page custom variablescustom_var_k3
: the custom variable name of the third slot for page custom variablescustom_var_v3
: the custom variable value of the third slot for page custom variablescustom_var_k4
: the custom variable name of the fourth slot for page custom variablescustom_var_v4
: the custom variable value of the fourth slot for page custom variablescustom_var_k5
: the custom variable name of the slot for page custom variablescustom_var_v5
: the custom variable value of the slot for page custom variablescustom_float
: an unspecified float field, usually used to hold the time it took the server to serve this action
The idsite
and idvisitor
columns are copied from the visit action's associated visit in order to avoid having to join the log_visit table in some cases.
The index_idvisit
index allows Piwik to quickly query the visit actions for a visit.
The index_idsite_servertime
index is used when aggregating visit actions. It allows quick access to the visit actions that were tracked for a specific website during a specific period and lets us avoid a table scan through the whole table.
Action types, such as a specific URL or page title, are analyzed as well as visits. Such analysis can lead to an understanding of, for example, which pages are more relevant to visitors than others.
When Piwik encounters a new action type, a new action type entity is persisted.
Action types are persisted in the log_action
table and contain the following information:
name
: a string describing the action type. Can be a URL, a page title, campaign name or anything else. The meaning is determined by thetype
field.hash
: a hash value calculated using the name.type
: the action type's category. Can be one of the following values:- Piwik\Tracker\Action::TYPE_PAGE_URL = 1: the action is a URL to a page on the website being tracked.
- Piwik\Tracker\Action::TYPE_OUTLINK = 2: the action is a URL is of a link on the website being tracked. A visitor clicked it.
- Piwik\Tracker\Action::TYPE_DOWNLOAD = 3: the action is a URL of a file that was downloaded from the website being tracked.
- Piwik\Tracker\Action::TYPE_PAGE_TITLE = 4: the action is the page title of a page on the website being tracked.
- Piwik\Tracker\Action::TYPE_ECOMMERCE_ITEM_SKU = 5: the action is the SKU of an ecommerce item that is sold on the site.
- Piwik\Tracker\Action::TYPE_ECOMMERCE_ITEM_NAME = 6: the action is the name of an ecommerce item that is sold on the site.
- Piwik\Tracker\Action::TYPE_ECOMMERCE_ITEM_CATEGORY = 7: the action is the name of an ecommerce item category that is used on the site.
- Piwik\Tracker\Action::TYPE_SITE_SEARCH = 8: the action type is a site search action.
- Piwik\Tracker\Action::TYPE_EVENT_CATEGORY = 10: the action is an event category (see Tracking Events user guide)
- Piwik\Tracker\Action::TYPE_EVENT_ACTION = 11: the action is an event category
- Piwik\Tracker\Action::TYPE_EVENT_NAME = 12: the action is an event name
- Piwik\Tracker\Action::TYPE_CONTENT_NAME = 13: the action is a content name (see Content Tracking user guide and developer guide)
- Piwik\Tracker\Action::TYPE_CONTENT_PIECE = 14: the action is a content piece
- Piwik\Tracker\Action::TYPE_CONTENT_TARGET = 15: the action is a content target
- Piwik\Tracker\Action::TYPE_CONTENT_INTERACTION = 16: the action is a content interaction
url_prefix
: if the name is a URL this refers to the prefix of the URL. The prefix is removed from actual URLs so the protocol and www. parts of a URL are ignored during analysis. Can be the following values:0
:'http://'
1
:'http://www.'
2
:'https://'
3
:'https://www.'
The index_type_hash
index is used during tracking to find existing action types.
When a visit action is tracked that matches a goal's conversion parameters, a conversion is created and persisted. A conversion is a tally that counts a desired action that one of your visitors took. Piwik will analyze these tallies in conjunction with the actions that caused them in order to help Piwik users understand how to make their visitors take more desired actions.
Conversions are stored in the log_conversion
table and consist of the following information:
idvisit
: the ID of the visit that caused this conversionidsite
: the ID of the site this conversion is foridvisitor
: the ID of the visitor that caused this conversionserver_time
: the datetime of the conversion in the UTC timezoneidaction_url
: the ID of the URL action type of the visit action that caused this conversionidlink_va
: the ID of the specific visit action that resulted in this conversionreferer_visit_server_date
:url
: the URL that caused this conversion to be trackedidgoal
: the ID of the goal this conversion is foridorder
: if this conversion is for an ecommerce order or abandoned cart, this will be the order's IDitems
: if this conversion is for an ecommerce order or abandoned cart, this will be the number of items in the order/cartrevenue
: if this conversion is for an ecommerce order or abandoned cart, this is the total revenue generated by the orderrevenue_subtotal
: if this conversion is for an ecommerce order or abandoned cart, this is the total cost of the items in the order/cartrevenue_tax
: if this conversion is for an ecommerce order or abandoned cart, this is the total tax applied to the items in the order/cartrevenue_shipping
: if this conversion is for an ecommerce order or abandoned cart, this is the total cost of shippingrevenue_discount
: if this conversion is for an ecommerce order or abandoned cart, this is the total discount applied to the order
All extra information stored in the table that is not listed above is replicated from the Visit entity this conversion is for. This allows us to avoid joining the log_visit
table in certain cases.
The index_idsite_datetime
index is used when aggregating conversions. It allows quick access to the conversions that were tracked for a specific website during a specific period and lets us avoid a table scan through the entire table.
An ecommerce item is an item that was sold in an ecommerce order or abandoned in an abandoned cart.
Ecommerce items are stored in the log_conversion_item
table and consist of the following information:
server_time
:idorder
: the ID of the order that this ecommerce item is a part ofidaction_sku
: the ID of the action type entity that contains the item's SKUidaction_name
: the ID of the action type entity that contains the ecommerce item's nameidaction_category
: the ID of an action type entity that contains a category for this ecommerce itemidaction_category2
: the ID of an action type entity that contains a category for this ecommerce itemidaction_category3
: the ID of an action type entity that contains a category for this ecommerce itemidaction_category4
: the ID of an action type entity that contains a category for this ecommerce itemidaction_category5
: the ID of an action type entity that contains a category for this ecommerce itemprice
: the price of this individual ecommerce itemquantity
: the amount of this item that were present in the associated ecommerce orderdeleted
: whether this item was removed from the order or not
The idsite
, idvisitor
, server_time
and idvisit
columns are copied from the Conversion entity this Ecommerce Item belongs to. They are copied so we can aggregate Ecommerce Items without having to join other tables.
The index_idsite_servertime
index is used when aggregating ecommerce items. It allows quick access to the items that were tracked for a specific website and during a specific period and lets us avoid a table scan through the entire table.
Archive data consists of metrics and reports. Metrics are numeric values and are stored as such. Reports are stored in DataTable instances and persisted as compressed binary strings.
Archive data is associated with the website ID, period and segment it is for along with the data's identifying name. All archive data will be queried many times by this information. Currently, the segment is hashed and attached to the end of the metric name. Archive data is also persisted with the current date and time so it is possible to know how old some data is.
All archive data will contain the following information:
idarchive
: An ID that is shared with all pieces of archive data that were archived with the same website ID, period and segment.name
: The name of the report or metric. If a segment is used, a hash of the segment is appended to the name.idsite
: The ID of the website this archive data is for.date1
: The first date in the period this archive data is for.date2
: The last date in the period this archive data is for.period
: The type of period this archive data is for. Can be one of the following values:1
: for day periods.2
: for week periods.3
: for month periods.4
: for year periods.5
: for range periods.
ts_archived
: The datetime the archive data was cached.value
: Either a numeric value (for a metric) or a binary string (for a report).
Archive data is stored in tables partitioned by months, and missing tables are created automatically. Reports that aggregate visits from January 2012 will be held in a different table from reports that aggregate visits from February 2012.
Piwik creates two types of archive tables, one for each type of archive data. The archive_numeric
tables store metric data and the archive_blob
tables store report data. Tables are suffixed with the year and the month: for example the archive_numeric
table for January 2012 would be named archive_numeric_2012_01
.
In archive_numeric
tables:
- the
index_idsite_dates_period
index is used when querying archive data. It lets Piwik quickly query archive data for any site and period, and for data that was archived past a certain date-time. - the
index_period_archived
index is used when purging archive data. It allows Piwik to quickly find archive data for a specific period that is old enough to be purged.
In archive_blob
tables:
- the
index_period_archived
index is used in the same way as the one inarchive_numeric
tables archive_blob
tables do not have an index that makes it fast to query for rows by site, period and archived date. This is because they should not be queried this way. Instead, thearchive_numeric
table should be queried and theidarchive
values saved. These values can be used to query data in thearchive_blob
table.
Site entities contain information regarding a website whose visits are tracked. There won't be nearly as many of these as there are visits and archive data entries, but they will be queried often.
Every reporting request (either through the Reporting API or through Piwik's UI) will query one or more site entities. The tracker will only query site data if the tracker cache needs to be updated. For most tracking requests, site data will not be queried (thus resulting in greater performance for the tracker).
Site entities are stored in the site
table and contain the following information:
idsite
: the unique ID of the website.name
: the name of the website.main_url
: the main URL visitors should use to access the website.ts_created
: the date & time the site entity was persisted.ecommerce
:1
if the site is an ecommerce site,0
if not.sitesearch
:1
if the site contains an internal search feature,0
if not.sitesearch_keyword_parameters
: the query parameters the site uses to hold internal site search keywords. This is a comma separated list.sitesearch_category_parameters
: the query parameters the site uses to hold internal site search categories. This is a comma separated list.timezone
: the timezone of the website.currency
: the currency the website uses. Only valid if the site is an ecommerce site.excluded_ips
: a comma separated list of IP addresses or IP address ranges. Visits that come from one of these IP addresses will not be tracked for this website.excluded_parameters
: a comma separated list of query parameter names. These query parameters will be removed from page URLs before visits and actions are tracked.excluded_user_agents
: a comma separated list of strings. Visits with a user agent that contains one of these strings will not be tracked for this website.group
:keep_url_fragment
:1
if the URL fragment (everything after the#
) should be kept in the URL when tracking actions,0
if not.
Site entities also contain a list of extra URLs that can be used to access the website. These are not stored within site entities themselves: they are stored in the site_url
table.
Site entity data access occurs primarily through the Piwik\Site class. Anything that cannot be queried through that class can be queried through the SitesManager core plugin.
Each site has an optional list of goals. A goal is a desired action that a website visitor should take.
Goals are stored in the goal
table and contain the following information:
idsite
: The ID of the website this goal belongs to.idgoal
: The ID for this goal (unique only among goals for this website).name
: The name of this goal.match_attribute
: string describing what part of the request should be matched against when converting a goal. Can be one of the following values:manually
: the goal is converted by manual conversion requests.url
: the goal is converted based on what the action URL contains.title
: the goal is converted based on what the action page title contains.file
: the goal is converted based on what the filename of a downloaded file contains.external_website
: the goal is converted based on what the URL of an outlink contains.
pattern
: the pattern to use when checking if a goal is converted.pattern_type
: the type of pattern matching to use when checking if a goal is converted.contains
: the goal is converted if the match attribute contains the pattern.exact
: the goal is converted if the match attribute equals the pattern exactly.regex
: the goal is converted if the match attribute is a regex match with the pattern.
case_sensitive
:1
if the matching should be case sensitive,0
if otherwise.allow_multiple
:1
if multiple conversions are allowed per visit,0
if otherwise.revenue
: the amount of revenue a conversion generates (if any).deleted
:1
if this goal was deleted by a Piwik user,0
if not.
Note: The ecommerce and abandoned cart goals are two special goals with special IDs. Ecommerce websites automatically have these goals.
User entities describe each Piwik user except the Super User. They are persisted in the users
table.
The following information is stored in a user entity:
login
: he user's login handle.password'
: a hash of the user's password.alias
: the user's alias if any. This value is displayed instead of the login handle when addressing the user in the UI.email
: the user's email address.token_auth
: a user's token auth.date_registered
: the date the user data was persisted.
User data is read on every UI and Reporting API request.
There is some user related information that is not stored directly in user entities. They are descirbed below:
Users can be allowed and disallowed access to websites. Piwik persists each user's access level for each website they have access to in the access
table.
To read more about this, read the Permissions guide.
Piwik will also persist each user's language of choice. User logins are associated with the name of the language (written in the chosen language as opposed to English).
This association and the persistence logic is implemented by the LanguagesManager plugin.
Options are key-value pairs where the key is a string and the value is a another string (possibly bigger and possibly binary). They are queried on every UI and Reporting API request. The tracker will cache relevant option values and so will only query options when the cache needs updating.
Some options should be loaded on every non-tracking request. These options have a special autoload property set to 1
.
- To learn how log data is aggregated see our Archiving guide and take a look at the LogAggregator class.
- To learn how archive data is cached see our Archive Data guide.
- To learn about Piwik's logging utility see this section in our Getting started extending Piwik guide.
- To learn about database backed sessions read this FAQ entry.
- To learn how plugins can persist data read the Extending the Database guide.