-
Notifications
You must be signed in to change notification settings - Fork 1
Data storage
ppegusii edited this page Jul 8, 2015
·
2 revisions
Hi all,
I have been looking at the database schema and scripts that Sean wrote.
Here is the status:
The schema has multiple tables to store the egauge configurations. We
could extend it to indicate which ones need to be polled and at what
interval.
The collection data is stored in a single table that has the following
format:
mysql> select * from data order by time desc limit 5;
+------------+---------------------+-----------+-----------+-------+
| id | time | egauge_id | device_id | watts |
+------------+---------------------+-----------+-----------+-------+
| 2715333905 | 2014-10-21 15:52:08 | 6 | 109 | -61 |
| 2715333904 | 2014-10-21 15:52:08 | 6 | 113 | -1 |
| 2715333903 | 2014-10-21 15:52:08 | 6 | 110 | 31 |
| 2715333902 | 2014-10-21 15:52:08 | 10 | 160 | 7 |
| 2715333901 | 2014-10-21 15:52:08 | 10 | 158 | 7 |
+------------+---------------------+-----------+-----------+-------+
Note that the device_id column is more a channel_id actually.
It cannot really be more compact than that unless we start to compress
the data (log a value only if it is different from the last value logged
for the egauge_id/channel_id).
Another options is to have multiple tables: data_egauge_id1,
data_egauge_id2, etc where we only store the channel/value for that
egauge. We can even envisage a separate table for each device/channel
combination with a naming schema like: data_egauge_x_ch_y
Note that MySQL does not scale very well with a very large number of
tables so 1 table per egauge might be a better approach.
All in all it depends how we want to query the data afterwards. If we
need to do a lot of cross device/cross channel queries, multiple tables
could be overkill in terms of joins. If we need all the info for a
particular time period, we can split the data per month data_10_2014,
data_11_2014.
We can also use a partitioned table
(http://dev.mysql.com/doc/refman/5.1/en/partitioning.html) to have a
partition per date range. We only add partitions to the high end of the
range, so repartitioning is not necessarily that easy.
The polling scripts are written in PHP and there is 1 process per
eGauge. We could eventually rewrite this in a single app with a thread
per device but that shouldn't change much in terms of performance (just
resource utilization would be lower on none at scale).
I guess the real missing part is the management. We probably need a web
interface to interact with the database to add/remove/update eGauge
configuration and devices, start/stop collection processes and provide a
dashboard with the current status (which means collection threads should
probably store their state in the database too). A simple WebApp could
take care of that.
I think we should all meet to decide how we want to proceed and what the
next steps should be.
It could be as simple as leaving everything as is and just moving the
data to a new DB that will have enough space, or a major redesign to
accommodate new features and requirements.
Emmanuel