
Format Requirements for CSV Files

HankHerr-NOAA edited this page Jul 17, 2024 · 3 revisions

Format Requirements for Comma Separated Values (CSV) Files

The WRES can read WRES-compliant CSV files as input data for an evaluation. For how to declare the use of a .csv file, see Declaration Language. All datetimes within a CSV file must be in GMT (also known as Zulu time or Z), as in the examples below.
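The example datetimes use the ISO 8601 form with a trailing Z. Below is a minimal Python sketch of converting a timezone-aware datetime into that form. The helper `to_wres_datetime` is hypothetical (not part of WRES), and it assumes whole-second precision, as in the examples.

```python
from datetime import datetime, timedelta, timezone


def to_wres_datetime(dt: datetime) -> str:
    """Format a timezone-aware datetime as a GMT string such as 1985-06-01T12:00:00Z."""
    if dt.tzinfo is None:
        # A naive datetime has no known offset, so converting it to GMT would be a guess.
        raise ValueError("datetime must be timezone-aware")
    # Convert to GMT/UTC, then append the literal "Z" designator.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 06:00 at a fixed UTC-6 offset is 12:00 GMT.
local = datetime(1985, 6, 1, 6, 0, tzinfo=timezone(timedelta(hours=-6)))
print(to_wres_datetime(local))  # 1985-06-01T12:00:00Z
```

Rejecting naive datetimes is a design choice of this sketch: it forces the caller to state which time zone the source data were in.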

Single-valued Forecasts

WRES is capable of ingesting single-valued forecast data provided in a CSV format. For example:

start_date,value_date,variable_name,location,measurement_unit,value
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,24.1255
1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,DRRC2,CMS,24.3102
1985-06-01T12:00:00Z,1985-06-01T15:00:00Z,SQIN,DRRC2,CMS,24.4921
1985-06-02T12:00:00Z,1985-06-02T13:00:00Z,SQIN,DRRC2,CMS,20.6023
1985-06-02T12:00:00Z,1985-06-02T14:00:00Z,SQIN,DRRC2,CMS,20.8583
1985-06-02T12:00:00Z,1985-06-02T15:00:00Z,SQIN,DRRC2,CMS,21.1095
1985-06-03T12:00:00Z,1985-06-03T13:00:00Z,SQIN,DRRC2,CMS,22.4598
1985-06-03T12:00:00Z,1985-06-03T14:00:00Z,SQIN,DRRC2,CMS,22.6702
1985-06-03T12:00:00Z,1985-06-03T15:00:00Z,SQIN,DRRC2,CMS,22.8758

The first line of the file is a header row, which is required as of WRES 4.0. All other lines are processed as data and must obey the following column format:

  1. Forecast issued date/time.
  2. Forecast valid date/time.
  3. Forecast data type or parameter id.
  4. Location identifier, handbook5id.
  5. Measurement unit.
  6. The value (only one per row).
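To illustrate the column layout, here is a minimal Python sketch that writes single-valued forecast rows with the standard csv module. It is not part of WRES; the function name `write_single_valued` is hypothetical, and it assumes the datetimes have already been formatted as GMT strings.

```python
import csv
import io

# Header names copied from the example above; a header row is required as of WRES 4.0.
HEADER = ["start_date", "value_date", "variable_name",
          "location", "measurement_unit", "value"]


def write_single_valued(rows, stream):
    """Write (issued, valid, variable, location, unit, value) tuples as CSV, header first."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(HEADER)
    for issued, valid, variable, location, unit, value in rows:
        writer.writerow([issued, valid, variable, location, unit, value])


buf = io.StringIO()
write_single_valued([
    ("1985-06-01T12:00:00Z", "1985-06-01T13:00:00Z", "SQIN", "DRRC2", "CMS", 24.1255),
    ("1985-06-01T12:00:00Z", "1985-06-01T14:00:00Z", "SQIN", "DRRC2", "CMS", 24.3102),
], buf)
print(buf.getvalue())
```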

Ensemble Forecasts

WRES is capable of ingesting ensemble forecast data provided in a CSV format. For example:

start_date,value_date,variable_name,location,measurement_unit,value,ensemble_name,qualifier_id,ensemblemember_id
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,22.9712,HEFSENSPOST,SIM1,1961
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,23.2453,HEFSENSPOST,SIM1,1962
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,23.9146,HEFSENSPOST,SIM1,1963
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,22.6584,HEFSENSPOST,SIM1,1964
...

As with single-valued forecasts, the first row must be a header row.

All other lines are processed as data and must obey the following column format:

  1. Forecast issued date/time.
  2. Forecast valid date/time.
  3. Forecast data type or parameter id.
  4. Location identifier, handbook5id.
  5. Measurement unit.
  6. The value (only one per row).
  7. Ensemble identifier.
  8. Qualifier identifier (an additional identifier for the ensemble that distinguishes two different ensembles with the same name and members).
  9. Ensemble member identifier.
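A minimal Python sketch of how one valid time's ensemble members expand into one row per member, following the nine-column layout above. The helper `ensemble_rows` and its arguments are hypothetical, not part of WRES; the member values are copied from the example.

```python
import csv
import io

HEADER = ["start_date", "value_date", "variable_name", "location",
          "measurement_unit", "value", "ensemble_name", "qualifier_id",
          "ensemblemember_id"]


def ensemble_rows(issued, valid, variable, location, unit,
                  ensemble_name, qualifier_id, members):
    """Yield one CSV row per ensemble member for a single valid time.

    `members` maps an ensemble member identifier to that member's value.
    """
    for member_id, value in members.items():
        yield [issued, valid, variable, location, unit, value,
               ensemble_name, qualifier_id, member_id]


buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(HEADER)
writer.writerows(ensemble_rows(
    "1985-06-01T12:00:00Z", "1985-06-01T13:00:00Z", "SQIN", "DRRC2", "CMS",
    "HEFSENSPOST", "SIM1", {1961: 22.9712, 1962: 23.2453}))
print(buf.getvalue())
```

Each member is a separate row because only one value is allowed per row; the ensemble name, qualifier, and member identifier together say which member a value belongs to.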

Observations and Simulations

WRES is capable of ingesting observation or simulation data provided in a CSV format. For example:

value_date,variable_name,location,measurement_unit,value
1985-06-01T13:00:00Z,QINE,DRRC2,CFS,747.78455
1985-06-01T14:00:00Z,QINE,DRRC2,CFS,735.21606
1985-06-01T15:00:00Z,QINE,DRRC2,CFS,722.6476
1985-06-01T16:00:00Z,QINE,DRRC2,CFS,710.0755
...

As with forecasts, the first row must be a header row. All other lines are processed as data and must obey the following column format:

  1. Valid date/time of the observed or simulated value.
  2. Data type or parameter id.
  3. Location identifier, handbook5id.
  4. Measurement unit.
  5. The value (only one per row).
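Before handing an observation or simulation file to WRES, a quick client-side check of the header can catch mistakes early. This Python sketch uses `csv.DictReader`; the function `read_observations` is hypothetical and does not reflect how WRES itself parses the file.

```python
import csv
import io

# Required header names, taken from the observation example above.
REQUIRED = ["value_date", "variable_name", "location", "measurement_unit", "value"]


def read_observations(stream):
    """Return (value_date, variable_name, location, unit, value) tuples from a CSV stream."""
    reader = csv.DictReader(stream)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing header column(s): {missing}")
    return [(row["value_date"], row["variable_name"], row["location"],
             row["measurement_unit"], float(row["value"]))
            for row in reader]


sample = io.StringIO(
    "value_date,variable_name,location,measurement_unit,value\n"
    "1985-06-01T13:00:00Z,QINE,DRRC2,CFS,747.78455\n"
    "1985-06-01T14:00:00Z,QINE,DRRC2,CFS,735.21606\n")
obs = read_observations(sample)
print(obs[0])
```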

Order and Grouping of Rows

The order of rows can affect how quickly WRES processes the CSV (and, potentially, how WRES interprets it). It is best to put the values in timeseries order, i.e., grouped by location, measurement_unit, and start_date. For example, the following data cover two locations with values at the same valid datetimes, and each location's timeseries is kept together:

start_date,value_date,variable_name,location,measurement_unit,value
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,24.1255
1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,DRRC2,CMS,24.3102
1985-06-01T12:00:00Z,1985-06-01T15:00:00Z,SQIN,DRRC2,CMS,24.4921
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,LOCA2,CMS,42.1255
1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,LOCA2,CMS,42.3102
1985-06-01T12:00:00Z,1985-06-01T15:00:00Z,SQIN,LOCA2,CMS,42.4921

The order of the value_date rows within a given location/measurement_unit/start_date group (that is, within a timeseries) matters less than keeping each timeseries grouped by those three key fields.
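One way to produce timeseries order is to sort rows on the three key fields. Below is a minimal Python sketch, assuming rows are dicts keyed by the header names and that all datetimes share the fixed-width form YYYY-MM-DDTHH:MM:SSZ, so text order equals chronological order. The helpers `group_into_timeseries` and `make_row` are hypothetical.

```python
def group_into_timeseries(rows):
    """Sort rows so each (location, measurement_unit, start_date) timeseries is contiguous.

    value_date is used only as a final tie-breaker, to make the output deterministic.
    """
    return sorted(rows, key=lambda r: (r["location"], r["measurement_unit"],
                                       r["start_date"], r["value_date"]))


def make_row(location, valid_hour, value):
    """Build one hypothetical single-valued forecast row issued at 1985-06-01T12:00:00Z."""
    return {"start_date": "1985-06-01T12:00:00Z",
            "value_date": f"1985-06-01T{valid_hour:02d}:00:00Z",
            "variable_name": "SQIN", "location": location,
            "measurement_unit": "CMS", "value": value}


# Rows interleaved by valid time, as a loop over timesteps might produce them.
rows = [make_row("DRRC2", 13, 24.1255), make_row("LOCA2", 13, 42.1255),
        make_row("DRRC2", 14, 24.3102), make_row("LOCA2", 14, 42.3102)]
for r in group_into_timeseries(rows):
    print(r["location"], r["value_date"], r["value"])
```

Python's `sorted` is stable, so dropping value_date from the key would still keep each timeseries contiguous while preserving the original order within it.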

Geometry metadata

The following optional columns are available:

  • location_description: the long name of the location.
  • location_srid: the EPSG SRID (an integer) of the coordinate reference system.
  • location_wkt: the Well-Known Text (WKT) of the geometry.

WRES does not yet make full use of these metadata, but future releases may use them more fully.
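A Python sketch of a file whose header includes the geometry columns. It assumes, without confirmation from this page, that WRES identifies optional columns by their header names and accepts them after the required columns. The description and coordinates are placeholders; 4326 is the EPSG code for WGS 84.

```python
import csv
import io

BASE = ["value_date", "variable_name", "location", "measurement_unit", "value"]
# Optional geometry columns named in this section. Placing them after the
# required columns is an assumption of this sketch.
GEOMETRY = ["location_description", "location_srid", "location_wkt"]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(BASE + GEOMETRY)
writer.writerow(["1985-06-01T13:00:00Z", "QINE", "DRRC2", "CFS", 747.78455,
                 "Placeholder description",  # hypothetical long name
                 4326,                       # EPSG code for WGS 84
                 "POINT (-105.0 39.0)"])     # placeholder coordinates
print(buf.getvalue())
```

A WKT string that contains commas (a polygon, for example) would be quoted by the csv module; this page does not say whether WRES accepts quoted fields.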

Encoding and end-of-line characters

UTF-8 encoding is required. Because ASCII is a subset of UTF-8, ASCII-encoded files also qualify. The end of a line may be represented by any of the following:

  • Carriage return,
  • line feed, or
  • carriage return immediately followed by line feed.
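A minimal Python sketch that checks a file's bytes against both requirements. It decodes strictly as UTF-8, so a file in another encoding (for example, Latin-1 containing accented characters) fails. It also splits on exactly the three line endings listed. The function `split_wres_lines` is hypothetical; WRES's own reader may behave differently, for instance with trailing blank lines.

```python
import re


def split_wres_lines(raw: bytes) -> list:
    """Decode bytes as strict UTF-8 and split on CR, LF, or CRLF line endings."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not valid UTF-8
    # Try CRLF first so it counts as one line ending, not two. A regex is used
    # because str.splitlines() would also split on other Unicode separators.
    return re.split(r"\r\n|\r|\n", text.rstrip("\r\n"))


# Mixed line endings: CRLF, then CR, then LF.
print(split_wres_lines(b"value_date,value\r\nA,1\rB,2\nC,3\n"))
```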

Time scale metadata

The following optional columns are available. If one is provided, the other must be provided too:

  • timescale_in_minutes: the duration or period that each value in the time series represents or was collected over, as an integer number of minutes.
  • timescale_function: the function that was applied over timescale_in_minutes. Valid values: MEAN, MINIMUM, MAXIMUM, TOTAL.

The time scale is distinct from (although frequently similar to) the time step between values. For example, consider a trailing one-hour MEAN value that is recorded every fifteen minutes: the time step (between values) is fifteen minutes, while the time scale (of each value) is one hour.
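The pairing rule and the fifteen-minute example can be sketched in Python. `parse_time_scale` is a hypothetical helper, not a WRES API, and treating a blank value the same as a missing column is an assumption of this sketch. The usage derives the time step from value_date differences to show that it differs from the declared time scale.

```python
from datetime import datetime

VALID_FUNCTIONS = {"MEAN", "MINIMUM", "MAXIMUM", "TOTAL"}


def parse_time_scale(row):
    """Return (minutes, function) from a CSV row dict, or None if both are absent.

    Supplying only one of the two columns is rejected, as is an unknown function.
    """
    minutes = row.get("timescale_in_minutes") or None  # blank counts as absent
    function = row.get("timescale_function") or None
    if minutes is None and function is None:
        return None
    if minutes is None or function is None:
        raise ValueError("timescale_in_minutes and timescale_function must be provided together")
    if function not in VALID_FUNCTIONS:
        raise ValueError(f"unknown timescale_function: {function}")
    return int(minutes), function  # int() raises ValueError for non-integers such as "1.5"


# A trailing one-hour MEAN recorded every fifteen minutes.
rows = [{"value_date": f"1985-06-01T13:{m:02d}:00Z",
         "timescale_in_minutes": "60", "timescale_function": "MEAN"}
        for m in (0, 15, 30)]
scales = {parse_time_scale(r) for r in rows}
times = [datetime.strptime(r["value_date"], "%Y-%m-%dT%H:%M:%SZ") for r in rows]
steps = {(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])}
print(scales, steps)
```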
