Data Archive Considerations

Richard Strange edited this page Aug 7, 2024 · 10 revisions

This page highlights some initial considerations around the archiving of data in the vAirify platform. This includes some very rough calculations to decide if concentrating on archiving is worth the effort.

As the vAirify platform continues to gather data through daily runs of the ETL processes, both the database and the pre-processed data texture volumes will keep growing. Currently there is no upper limit on this growth.

The forecast ETL runs twice a day. The In-Situ data ETL runs once an hour. For the purposes of this estimate, logs are ignored.

To get a rough estimate of how quickly our data stores are likely to grow, I first cleared down the three main database tables and all local data textures, then reran the ETLs to repopulate them. I then removed any data from the current and previous days, as these may not have represented complete datasets. In effect, the only data stored covered a five-day period. This resulted in the following document counts:

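| Date    | Forecast documents | In Situ documents | Data Textures documents |
|---------|--------------------|-------------------|-------------------------|
| 1st Aug | 12546              | 32708             | 42                      |
| 2nd Aug | 12546              | 34019             | 42                      |
| 3rd Aug | 12546              | 34793             | 42                      |
| 4th Aug | 12546              | 34358             | 42                      |
| 5th Aug | 12546              | 33880             | 42                      |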

According to Mongo the storage size of these databases was:

| Database      | Size     |
|---------------|----------|
| forecast_data | 5.39 MB  |
| in_situ_data  | 12.88 MB |
| data_textures | 28.67 kB |

Combined, this is about 18.3 MB over the five days, which translates to roughly 3.7 MB a day.
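The daily database figure can be reproduced with a short sketch. The sizes are the ones reported above; the variable names are illustrative only, and the kB-to-MB conversion assumes decimal units (1 MB = 1000 kB).

```python
# Storage sizes reported by Mongo for the five-day sample, in MB.
# Assumption: decimal units, so 28.67 kB = 0.02867 MB.
SIZES_MB = {
    "forecast_data": 5.39,
    "in_situ_data": 12.88,
    "data_textures": 28.67 / 1000,
}
SAMPLE_DAYS = 5  # 1st to 5th Aug inclusive

total_mb = sum(SIZES_MB.values())
db_daily_mb = total_mb / SAMPLE_DAYS

print(f"Total over {SAMPLE_DAYS} days: {total_mb:.2f} MB")
print(f"Database growth per day: {db_daily_mb:.1f} MB")
```

The data_textures entry is small enough that it barely affects the result; almost all of the database growth comes from the forecast and in-situ data.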

In addition, we have the data texture files themselves. On my local machine these took up 221 MB overall, which equates to roughly 22.1 MB a day.

Combining these equates to a very rough estimate of 25.8 MB added daily by our processes.

Given that the Linux box has 200 GB of storage, if we were (very) cautious we could allocate 100 GB to data storage, which would take (100 * 1000) / 25.8 = 3,876 days, or just over 10.5 years, to fill.
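As a rough sketch, the same projection can be written out so it is easy to rerun if the inputs change. The daily figures are taken from this page as given; the constant names are illustrative, and 1 GB is treated as 1000 MB to match the calculation above.

```python
# Daily growth figures as estimated on this page, in MB.
DB_DAILY_MB = 3.7
TEXTURES_DAILY_MB = 22.1

# Assumption: 100 GB of the 200 GB box is set aside for data storage.
BUDGET_GB = 100
MB_PER_GB = 1000  # decimal units, as in the calculation above

daily_growth_mb = DB_DAILY_MB + TEXTURES_DAILY_MB
days_to_fill = (BUDGET_GB * MB_PER_GB) / daily_growth_mb
years_to_fill = days_to_fill / 365

print(f"Daily growth: {daily_growth_mb:.1f} MB")
print(f"Days to fill {BUDGET_GB} GB: {days_to_fill:,.0f}")
print(f"Years to fill: {years_to_fill:.1f}")
```

Because the result scales inversely with the daily growth rate, doubling the daily figure would halve the time to fill, which still leaves several years of headroom.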

It should be noted that these calculations are VERY high level and rough.
