Format Requirements for CSV Files
The WRES can read WRES-compliant CSV files as input data for an evaluation. For how to declare the use of a .csv file, see Declaration Language. In the file format described below, note that all datetimes within a CSV must be in GMT (also known as Zulu time or Z).
WRES is capable of ingesting single-valued forecast data provided in a CSV format. For example:
start_date,value_date,variable_name,location,measurement_unit,value
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,24.1255
1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,DRRC2,CMS,24.3102
1985-06-01T12:00:00Z,1985-06-01T15:00:00Z,SQIN,DRRC2,CMS,24.4921
1985-06-02T12:00:00Z,1985-06-02T13:00:00Z,SQIN,DRRC2,CMS,20.6023
1985-06-02T12:00:00Z,1985-06-02T14:00:00Z,SQIN,DRRC2,CMS,20.8583
1985-06-02T12:00:00Z,1985-06-02T15:00:00Z,SQIN,DRRC2,CMS,21.1095
1985-06-03T12:00:00Z,1985-06-03T13:00:00Z,SQIN,DRRC2,CMS,22.4598
1985-06-03T12:00:00Z,1985-06-03T14:00:00Z,SQIN,DRRC2,CMS,22.6702
1985-06-03T12:00:00Z,1985-06-03T15:00:00Z,SQIN,DRRC2,CMS,22.8758
The first line in the example above is a header, which is required as of WRES 4.0. All other lines are processed as data and must obey the following column format:
- Forecast issued date/time.
- Forecast valid date/time.
- Forecast data type or parameter id.
- Location identifier, handbook5id.
- Measuring units.
- The value (only one per row).
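The column rules above can be checked before a file is handed to WRES. The following is a minimal sketch rather than anything WRES itself provides: the function names `parse_utc` and `check_single_valued_csv` are hypothetical, and the check looks columns up by the header names from the example instead of by position.

```python
import csv
import io
from datetime import datetime, timezone

# Column names copied from the example header above.
REQUIRED_COLUMNS = (
    "start_date", "value_date", "variable_name",
    "location", "measurement_unit", "value",
)


def parse_utc(text):
    """Parse a datetime such as 1985-06-01T12:00:00Z; reject anything not ending in Z."""
    if not text.endswith("Z"):
        raise ValueError(f"datetime is not in GMT/Zulu form: {text!r}")
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def check_single_valued_csv(stream):
    """Hypothetical pre-flight check: return the parsed data rows or raise ValueError."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"header is missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "start_date": parse_utc(row["start_date"]),
            "value_date": parse_utc(row["value_date"]),
            "variable_name": row["variable_name"],
            "location": row["location"],
            "measurement_unit": row["measurement_unit"],
            "value": float(row["value"]),  # exactly one value per row
        })
    return rows


sample = io.StringIO(
    "start_date,value_date,variable_name,location,measurement_unit,value\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,24.1255\n"
    "1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,DRRC2,CMS,24.3102\n"
)
rows = check_single_valued_csv(sample)
print(len(rows), "data rows parsed")
```

The sketch only verifies structure and datetime form; it does not attempt to reproduce any further validation that WRES may perform on ingest.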
WRES is capable of ingesting ensemble forecast data provided in a CSV format. For example:
start_date,value_date,variable_name,location,measurement_unit,value,ensemble_name,qualifier_id,ensemblemember_id
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,22.9712,HEFSENSPOST,SIM1,1961
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,23.2453,HEFSENSPOST,SIM1,1962
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,23.9146,HEFSENSPOST,SIM1,1963
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,22.6584,HEFSENSPOST,SIM1,1964
...
As with single-valued forecasts, the first row must be a header row.
All other lines are processed as data and must obey the following column format:
- Forecast issued date/time.
- Forecast valid date/time.
- Forecast data type or parameter id.
- Location identifier, handbook5id.
- Measuring units.
- The value (only one per row).
- Ensemble identifier.
- Qualifier identifier (an additional id for the ensemble, allowing for uniquely delineating two different ensembles with the same name and members).
- Ensemble member identifier.
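To see how the three extra columns identify a member, the rows can be grouped into one trace per ensemble member. This is an illustrative sketch only: the function name `group_by_member` and the choice of grouping key are assumptions made for the example, not a description of how WRES assembles ensembles internally.

```python
import csv
import io
from collections import defaultdict

# The first rows of the ensemble example above.
ensemble_csv = (
    "start_date,value_date,variable_name,location,measurement_unit,value,"
    "ensemble_name,qualifier_id,ensemblemember_id\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,22.9712,HEFSENSPOST,SIM1,1961\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,23.2453,HEFSENSPOST,SIM1,1962\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,23.9146,HEFSENSPOST,SIM1,1963\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,22.6584,HEFSENSPOST,SIM1,1964\n"
)


def group_by_member(stream):
    """Collect (value_date, value) pairs per ensemble member (hypothetical helper)."""
    traces = defaultdict(list)
    for row in csv.DictReader(stream):
        key = (
            row["ensemble_name"],
            row["qualifier_id"],
            row["ensemblemember_id"],
            row["location"],
            row["start_date"],
        )
        traces[key].append((row["value_date"], float(row["value"])))
    return traces


traces = group_by_member(io.StringIO(ensemble_csv))
for key, values in traces.items():
    print(key, values)
```

With the four sample rows, each member identifier (1961 to 1964) yields its own trace holding a single value.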
WRES is capable of ingesting observation or simulation data provided in a CSV format. For example:
value_date,variable_name,location,measurement_unit,value
1985-06-01T13:00:00Z,QINE,DRRC2,CFS,747.78455
1985-06-01T14:00:00Z,QINE,DRRC2,CFS,735.21606
1985-06-01T15:00:00Z,QINE,DRRC2,CFS,722.6476
1985-06-01T16:00:00Z,QINE,DRRC2,CFS,710.0755
...
As with forecasts, the first row must be a header row. All other lines are processed as data and must obey the following column format:
- Valid date/time.
- Data type or parameter id.
- Location identifier, handbook5id.
- Measuring units.
- The value (only one per row).
The order of values can affect the runtime performance of (and potentially the interpretation by) WRES when it uses the CSV. It is best to put the values in timeseries order, i.e. grouped by location, measurement_unit, and start_date. For example, here are data for two locations with the same valid datetimes:
start_date,value_date,variable_name,location,measurement_unit,value
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,24.1255
1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,DRRC2,CMS,24.3102
1985-06-01T12:00:00Z,1985-06-01T15:00:00Z,SQIN,DRRC2,CMS,24.4921
1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,LOCA2,CMS,42.1255
1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,LOCA2,CMS,42.3102
1985-06-01T12:00:00Z,1985-06-01T15:00:00Z,SQIN,LOCA2,CMS,42.4921
The order of the value_date rows within a given location/measurement_unit/start_date group (timeseries, if you will) is not as important as ensuring timeseries grouping by those three key fields.
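A file whose rows arrive interleaved can be regrouped before use. The sketch below sorts single-valued forecast rows by the three key fields named above, with value_date as a final tie-breaker. The function name `sort_into_series_order` is hypothetical, and the approach relies on all datetimes sharing the same ISO 8601 form so that text order matches time order.

```python
import csv
import io


def sort_into_series_order(csv_text):
    """Return the CSV text with rows grouped by location, unit, and issue time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = sorted(
        reader,
        key=lambda r: (
            r["location"],
            r["measurement_unit"],
            r["start_date"],
            r["value_date"],  # tie-breaker only; order within a group matters less
        ),
    )
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


interleaved = (
    "start_date,value_date,variable_name,location,measurement_unit,value\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,DRRC2,CMS,24.1255\n"
    "1985-06-01T12:00:00Z,1985-06-01T13:00:00Z,SQIN,LOCA2,CMS,42.1255\n"
    "1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,DRRC2,CMS,24.3102\n"
    "1985-06-01T12:00:00Z,1985-06-01T14:00:00Z,SQIN,LOCA2,CMS,42.3102\n"
)
grouped = sort_into_series_order(interleaved)
print(grouped)
```

For ensemble files, the ensemble columns would presumably need to join the sort key so that each member stays contiguous; that extension is not shown here.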
The following optional columns are available:
- location_description: The long name of the location.
- location_srid: The EPSG SRID (an integer) of the coordinate reference system.
- location_wkt: The Well-Known-Text of the geometry.
WRES does not yet make full use of these metadata, but future releases may use them more fully.
UTF-8 encoding is required. ASCII is a compatible subset of UTF-8, so ASCII-encoded files are also acceptable. The end of a line may be represented by any of the following:
- Carriage return,
- line feed, or
- carriage return immediately followed by line feed.
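When generating a file programmatically, the encoding and line terminator are worth setting explicitly. The following sketch uses Python's standard csv module; the helper name `write_csv` and the file name are made up for the example.

```python
import csv
import os
import tempfile

rows = [
    ["value_date", "variable_name", "location", "measurement_unit", "value"],
    ["1985-06-01T13:00:00Z", "QINE", "DRRC2", "CFS", "747.78455"],
]


def write_csv(path, rows):
    """Write rows as UTF-8 with carriage return + line feed terminators."""
    # newline="" stops Python from translating line endings, so csv.writer's
    # default "\r\n" terminator reaches the file unchanged.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "observations.csv")
write_csv(path, rows)

# Read the raw bytes back to confirm the encoding and terminators.
with open(path, "rb") as handle:
    raw = handle.read()
raw.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not valid UTF-8
print(raw.count(b"\r\n"), "lines terminated with CR LF")
```

Passing `lineterminator="\n"` to `csv.writer` would produce line-feed-only endings instead, which the list above also allows.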
The following optional columns are available, which must be provided in tandem or not at all:
- timescale_in_minutes: The duration or period that each value in the time series represents or was collected over. This must be an integer and is in minutes.
- timescale_function: The function that was applied over the timescale_in_minutes. Valid values: MEAN, MINIMUM, MAXIMUM, TOTAL.
The time scale is distinct from (although frequently similar to) the time step between values. One can imagine a trailing one-hour MEAN value that is recorded every fifteen minutes. In that example, the time step (between values) would be fifteen minutes while the time scale (of each value) would be one hour.
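The distinction can be made concrete with a small sketch that builds trailing one-hour means from fifteen-minute readings and writes them with the two time scale columns. The readings are invented for illustration, the helper `trailing_means` is hypothetical, and placing the optional columns after `value` is an assumption rather than a documented requirement.

```python
from datetime import datetime, timedelta, timezone

TIME_STEP = timedelta(minutes=15)   # spacing between recorded values
TIME_SCALE = timedelta(minutes=60)  # period that each value represents

# Invented instantaneous readings, one every fifteen minutes from 12:00Z.
start = datetime(1985, 6, 1, 12, 0, tzinfo=timezone.utc)
readings = [
    (start + i * TIME_STEP, float(v))
    for i, v in enumerate([10, 12, 14, 16, 18, 20])
]


def trailing_means(readings, scale):
    """Return (valid_time, mean) pairs wherever a full trailing window exists."""
    needed = scale // TIME_STEP  # number of readings in one full window
    results = []
    for valid_time, _ in readings:
        window = [v for t, v in readings if valid_time - scale < t <= valid_time]
        if len(window) == needed:
            results.append((valid_time, sum(window) / len(window)))
    return results


results = trailing_means(readings, TIME_SCALE)
scale_minutes = int(TIME_SCALE.total_seconds() // 60)

lines = [
    "value_date,variable_name,location,measurement_unit,value,"
    "timescale_in_minutes,timescale_function"
]
for valid_time, mean in results:
    stamp = valid_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    lines.append(f"{stamp},QINE,DRRC2,CFS,{mean},{scale_minutes},MEAN")
print("\n".join(lines))
```

Consecutive output rows are fifteen minutes apart (the time step), while every row declares a time scale of 60 minutes with the MEAN function.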