BMI_Tutorial
The tutorial runs on top of the standard Lower Colorado V4 tutorial directory structure, i.e., it runs entirely in t-route's V4 mode and with HYFeatures data. Due to its simplicity, the Cedar Bayou example is used to demonstrate and report on the capability here. The following inputs need to be collected:
- Cedar Bayou geopackage (or a geopackage containing the same), e.g., `Diff_Texas_CedarBayou_NGENv201.gpkg`
- Coastal boundary input file (in this example, DFLOW output), e.g., `coastal_boundary_input_file = boundary_forcing/FlowFM_0000_his.nc`
- Coastal domain yaml boundary file, e.g., `domain/coastal_domain_LC_US_DS.yaml`
- qlat input forcing files, e.g., `qlat_file_pattern_filter = "*.CHRTOUT_DOMAIN1.csv"` in a designated `qlat_input_folder`
- usgs timeslice and rfc input folders (other gages are not enabled), e.g., `usgs_timeslices` and `rfc_timeseries`
In the main BMI script, reference is made to a dedicated `Standard_Coastal_AnA.yaml` configuration file. That file needs to be adapted to point to the aforementioned inputs.
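For orientation, the snippet below sketches the kind of path bookkeeping that the initialization calls further down assume. The names `base_path`, `config`, and `config_dict` appear in `run_BMI_Coastal.py`, but their exact contents here are illustrative assumptions, not the script's actual values.

```python
# Hedged sketch of the path bookkeeping assumed by the initialize() calls below;
# the exact layout in run_BMI_Coastal.py may differ.
base_path = './'                # e.g., run from within test/LowerColorado_TX_v4
config = 'coastal_AnA'          # illustrative configuration key (assumption)
config_dict = {
    'coastal_AnA': {'yaml': 'Standard_Coastal_AnA.yaml'},
}
yaml_path = base_path + config_dict[config]['yaml']
```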
The main script `run_BMI_Coastal.py` is available in the standard t-route repository, in the `test/LowerColorado_TX_v4` folder. It relies on the following standard and t-route libraries:
Standard libraries:
import sys
import glob
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import pickle
from datetime import datetime, timedelta
import time
import xarray as xr
T-route repository libraries:
import bmi_troute
import bmi_DAforcing
from troute.HYFeaturesNetwork import HYFeaturesNetwork
from troute.AbstractNetwork import *
import bmi_df2array as df2a
In order to set up troute as a BMI "receiver", a troute model is instantiated and initialized with the `Standard_Coastal_AnA.yaml` configuration file:
print('Initialize t-route Module...')
troute = bmi_troute.bmi_troute()
troute.initialize(bmi_cfg_file = base_path + config_dict[config]['yaml'])
`troute` is an instance of the `bmi_troute` class that, when initialized, reads a yaml configuration file and creates and initializes a values dictionary (as a member of the class) that holds all the data needed by a troute run. In addition, the `bmi_troute` class has member functions, the most important of which are `set_value` and `get_value`.
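Conceptually, the values dictionary acts as the exchange surface between the driver script and t-route. The toy class below only sketches that pattern; it is not `bmi_troute`'s actual implementation.

```python
# Toy sketch of the values-dictionary pattern described above
# (illustration only; not bmi_troute's actual code).
import numpy as np

class TinyBMI:
    def __init__(self):
        self._values = {}           # holds all data exchanged over BMI

    def set_value(self, name, src):
        self._values[name] = src    # driver pushes data in

    def get_value(self, name):
        return self._values[name]   # driver pulls results out

model = TinyBMI()
model.set_value('depthArray_coastal', np.zeros(4, dtype=np.float64))
print(model.get_value('depthArray_coastal'))
```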
In addition to the BMI "receiver", a BMI "transmitter" is defined as a BMI data assimilation module:
print('Initialize DA Module...')
DAforcing = bmi_DAforcing.bmi_DAforcing()
DAforcing.initialize(bmi_cfg_file = base_path + config_dict[config]['yaml'])
`DAforcing`, in turn, is an instance of the `bmi_DAforcing` class, which ingests BMI compliant data (in principle, `numpy.ndarray`s) and converts it into dataframes that t-route can work with. When initialized, the `bmi_DAforcing` class combines the group of `numpy.ndarray`s corresponding to a dataframe (e.g., `usgs_df` or `rfc_timeseries_df`) and builds that dataframe. Complementary to `bmi_troute`, there are other member functions, the most important of which is `set_value`.
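As a toy illustration of that array-to-dataframe assembly, a flattened value array plus column and index arrays map back onto a dataframe as shown below. The gage IDs, time stamps, and values are made up for illustration; this is not `bmi_DAforcing`'s actual code.

```python
# Toy illustration of assembling a usgs_df-like dataframe from BMI-style arrays
# (made-up data; not bmi_DAforcing's actual implementation).
import numpy as np
import pandas as pd

flat_values = np.array([1.1, 1.2, 2.1, 2.2])   # flattened data entries
gage_ids = ['08067500', '08067525']            # row index (illustrative IDs)
timestamps = pd.to_datetime(['2008-09-12 00:00', '2008-09-12 01:00'])  # columns

usgs_like_df = pd.DataFrame(flat_values.reshape(len(gage_ids), len(timestamps)),
                            index=gage_ids, columns=timestamps)
```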
Specifically, the coastal dataframe is, at this point, still treated differently, as coastal data has not yet been integrated into DAforcing. The setup is therefore more direct: the dataframe is imported from a DFLOW file using the xarray library. Please note that while the `read_DFlow_output` helper function from t-route is used, troute does not actually read in the coastal dataframe directly here; this is only done to simulate BMI functionality, since no coastal BMI data "transmitter" is available:
coastalFile = 'boundary_forcing/FlowFM_0000_his.nc'
ds = xr.open_dataset(coastalFile)
coastalDataFrame = read_DFlow_output(ds)
BMI compatible arrays are prepared for transmission by setting the corresponding entries in the values dictionary of the troute class using `set_value`. For each dataframe to be imported, there are at least three `numpy.ndarray`s: (1) one flattened array containing the values of the data entries (usually 64-bit floats of physical quantities such as flow rates or water depths), (2) the column names (typically time stamps), and (3) the row indices (typically a coded designation of a flow path element). The array flattening is performed by concatenating successive rows into one single 1D numpy array, as shown for the dataframe containing the coastal water depth in the following image:
The length of the flattened array is the number of time stamps times the number of coastal boundary IDs, and its datatype depends on the dataframe entries (for the coastal boundary arrays, it is `np.float64`). To flatten arrays, the `_flatten_array` function from the `bmi_df2array` library is used:
depthArray_coastal = df2a._flatten_array(coastalDataFrame, np.float64)
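As a toy example of that row-wise flattening (using a made-up 2×2 dataframe rather than the tutorial data):

```python
# Toy example of the row-wise flattening described above (made-up data).
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.5, 0.6],            # station 101: depths at t0, t1
                   [0.7, 0.8]],           # station 102: depths at t0, t1
                  index=[101, 102], columns=['t0', 't1'])
flat = df.to_numpy(dtype=np.float64).flatten()   # -> array([0.5, 0.6, 0.7, 0.8])
```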
The column and row names are also each converted to at least one 1D `numpy.ndarray`. Since BMI transport layers are limited to either float or integer 1D array types, and more complex datatypes such as timestamps or strings are disallowed, more than one 1D array may be needed. For example, if the row index is comprised of a list of string IDs `stringList` of varying lengths, it is converted into two 1D arrays using the function `_stringsToBMI(stringList)` from the library `bmi_df2array` (a sketch of this encoding follows the list):
- Data array: one long 1D ndarray with the ASCII encodings of all ID strings concatenated together
- Meta array: an array recording how many characters each string contains, with one entry per string
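A minimal sketch of that two-array encoding, as a stand-in for `_stringsToBMI` rather than the actual t-route implementation, could look as follows:

```python
# Minimal stand-in for the string encoding described above
# (illustration only; not the actual df2a._stringsToBMI code).
import numpy as np

stringList = ['wb-1234', 'wb-56789']   # illustrative flow path IDs of varying length
dataArray = np.array([ord(c) for s in stringList for c in s], dtype=np.int64)
metaArray = np.array([len(s) for s in stringList], dtype=np.int64)
# dataArray holds the concatenated ASCII codes of all IDs;
# metaArray records each string's length so the receiver can split dataArray back up.
```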
However, for the coastal dataframe, the encoding is more benign, since all boundary IDs are integers, which can readily be cast as a BMI compatible 1D array of `np.int64`:
stationArray_coastal = coastalDataFrame.index.values.astype(np.int64)
nStations_coastal = len(stationArray_coastal)
where the latter is the vertical dimension of the dataframe, which will also be transmitted through the BMI transport layer. The timestamps in the columns are also encoded, as an integer array of time differences (in seconds) relative to an arbitrary reference time stamp set at the time of encoding. In the tutorial, the present runtime is used; however, a time corresponding to the time period to be modeled is also suitable:
start_time_coastal = time.time()
timeRef = datetime.fromtimestamp(start_time_coastal)
coastal_timeRef = timeRef.replace(microsecond=0)
The resulting relative time stamp array is generated using the `_time_from_df` function from the `bmi_df2array` library, which also returns the corresponding array dimension:
(timeArray_coastal, nTimes_coastal) = df2a._time_from_df(coastalDataFrame, coastal_timeRef)
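Conceptually, this amounts to differencing each column time stamp against the reference time, roughly as in the stand-in below (not the actual `_time_from_df` implementation):

```python
# Conceptual stand-in for df2a._time_from_df (illustration only; the actual
# function in bmi_df2array may differ in detail).
import numpy as np
import pandas as pd

def time_from_df_sketch(df, timeRef):
    # encode each column time stamp as integer seconds relative to timeRef
    seconds = [int((pd.Timestamp(t) - pd.Timestamp(timeRef)).total_seconds())
               for t in df.columns]
    return np.array(seconds, dtype=np.int64), len(seconds)
```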
Once all constituent BMI compatible arrays are defined in memory space, they are saved as entries in the values dictionary of troute, which will make them available once troute is run in BMI mode:
- Time stamp array (seconds): `troute.set_value('timeArray_coastal', timeArray_coastal)`
- Time (horizontal) dimension of dataframe: `troute.set_value('nTimes_coastal', nTimes_coastal)`
- Stations array (integer): `troute.set_value('stationArray_coastal', stationArray_coastal)`
- Station array dimension of dataframe: `troute.set_value('nStations_coastal', nStations_coastal)`
- Time reference: `troute.set_value('coastal_timeRef', coastal_timeRef)`
- Actual flattened array: `troute.set_value('depthArray_coastal', depthArray_coastal)`
The actual reassembly of the dataframes is performed when the simulation is run using `update_until(n)` in the main loop through hourly timesteps:
troute.update_until(n)
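A hedged sketch of such an hourly driver loop is shown below; the loop bounds and the units of `n` in `run_BMI_Coastal.py` are assumptions for illustration, and the snippet uses the `troute` instance initialized earlier.

```python
# Hedged sketch of an hourly driver loop; loop bounds and the units of n
# are assumptions, not the actual bookkeeping in run_BMI_Coastal.py.
n_hours = 24                      # length of the simulation window (assumption)
dt = 3600                         # one hour in seconds (assumption)
for k in range(1, n_hours + 1):
    n = k * dt                    # target model time for this step
    troute.update_until(n)        # troute: the bmi_troute instance from above
```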
In the case of the coastal dataframe, it is reassembled into a 2D pandas dataframe in the `run` function of the `troute_model` class. First, the flattened depth array is read from the values dictionary:
depthArray_coastal = values['depthArray_coastal']
If that array has nonzero entries, all the BMI compatible data structures that will be used in the reassembly of the coastal dataframe are read in and saved as internal class members:
self._depthArray_coastal = depthArray_coastal
self._timeArray_coastal = values['timeArray_coastal']
self._nTimes_coastal = values['nTimes_coastal']
self._stationArray_coastal = values['stationArray_coastal']
self._nStations_coastal = values['nStations_coastal']
self._coastal_timeRef = values['coastal_timeRef']
Then a series of functions from the `bmi_array2df` library is used to reassemble the dataframe, as also illustrated in the following figure:
First, `_unflatten_array` retrieves the actual 2D structure; this is followed by retrieving the time stamps from the corresponding BMI array and attaching them to the resulting dataframe using `_time_retrieve_from_arrays`, and finally by setting the station names as the index:
Unflatten array:
df_depthArray_coastal = a2df._unflatten_array(self._depthArray_coastal, self._nTimes_coastal, self._nStations_coastal)
Retrieve column time stamps:
timeAxisName = 'time'
freqString = 'None'
df_withTimes_coastal = a2df._time_retrieve_from_arrays(df_depthArray_coastal, self._coastal_timeRef, self._timeArray_coastal, timeAxisName, freqString)
Add station IDs as index:
stationAxisName = 'station_name'
index = pd.Index(self._stationArray_coastal, name=stationAxisName, dtype=np.int64)
df_withTimes_coastal.index = (index)
self._network._coastal_boundary_depth_df = df_withTimes_coastal
where the last line assigns the resulting raw coastal dataframe to the network object.
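For reference, a conceptual stand-in for the unflattening step (not the actual `_unflatten_array` code) is simply the inverse reshape of the row-wise flattening shown earlier:

```python
# Conceptual stand-in for a2df._unflatten_array (illustration only).
import numpy as np
import pandas as pd

def unflatten_array_sketch(flat_array, n_times, n_stations):
    # invert the row-wise flattening: one row per station, one column per time stamp
    return pd.DataFrame(np.asarray(flat_array).reshape(n_stations, n_times))
```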
The following image shows the results of a test in which the BMI formalism described above was compared with a standard command-line run of troute for the case of Hurricane Ike at Cedar Bayou near Houston, TX, indicating a perfect match between the BMI and command-line approaches.