flat_line_test doesn't work when threshold window mostly contains missing values (priority: low) #68

lgarzio · 2021-09-28T17:23:10Z

The flat_line_test doesn't appear to work on datasets with many missing values (ioos_qc version 2.0.1).

Example netcdf file here

Example configuration file:
test_flatline.txt

import xarray as xr
from ioos_qc.config import Config
from ioos_qc.streams import XarrayStream
from ioos_qc.results import collect_results

f = 'maracoos_02_20210716T190208Z_dbd.nc'
config_file = 'test_flatline.txt'
ds = xr.open_dataset(f)
c = Config(config_file)
xs = XarrayStream(ds, time='time', lat='latitude', lon='longitude')
qc_results = xs.run(c)
collected_list = collect_results(qc_results, how='list')

# Print the flag array for each collected test result
for cl in collected_list:
    flag_results = cl.results.data
    print(flag_results)

array([1, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 1, 9, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9,
       1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9,
       9, 1, 9, 1, 9, 9, 1, 9, 1])

ds.conductivity.values
array([0.     ,     nan, 0.     ,     nan,     nan,     nan,     nan,
           nan,     nan,     nan,     nan,     nan,     nan,     nan,
           nan,     nan,     nan,     nan,     nan, 4.45602,     nan,
       4.45214,     nan,     nan, 4.45171,     nan, 4.45178,     nan,
           nan, 4.45147,     nan, 4.45046,     nan,     nan, 4.45   ,
           nan, 4.45106,     nan,     nan, 4.45116,     nan, 4.45054,
           nan,     nan, 4.45027,     nan, 4.45089, 4.45019,     nan,
           nan, 4.45109,     nan,     nan, 4.4514 ,     nan, 4.45154,
           nan,     nan, 4.45173,     nan, 4.45156,     nan,     nan,
       4.45145,     nan, 4.45162,     nan, 4.4511 ,     nan,     nan,
       4.45092,     nan, 4.45045,     nan,     nan, 4.45007,     nan,
       4.44995,     nan,     nan, 4.44954,     nan, 4.4495 ,     nan,
           nan, 4.44886,     nan, 4.44779,     nan,     nan, 4.44765,
           nan, 4.44805,     nan,     nan, 4.44685,     nan, 4.44496,
           nan,     nan, 4.43886,     nan, 4.43323,     nan,     nan,
       4.43035,     nan, 4.46671,     nan,     nan, 4.52998,     nan,
       4.53362,     nan,     nan, 4.66421,     nan, 4.66618,     nan,
           nan, 4.61894,     nan, 4.54442,     nan,     nan, 4.51362,
           nan, 4.47128,     nan,     nan, 4.38806,     nan, 4.28966,
           nan,     nan, 4.23655,     nan, 4.23101,     nan,     nan,
       4.23322,     nan, 4.2036 ,     nan,     nan, 4.17473,     nan,
       4.16556,     nan,     nan, 4.16569,     nan, 4.16743,     nan,
           nan, 4.15847,     nan, 4.15089,     nan,     nan, 4.14448,
           nan, 4.14326], dtype=float32)

In this example, it looks like when the threshold window contains only one valid conductivity value surrounded by missing values, the test flags that one valid value as suspect (3). See the value 4.45602 at index 19, which is the only sample flagged 3.
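A minimal sketch of why a lone valid value could trip the check, assuming the test compares the range (max minus min) of the non-missing values in a window against the tolerance. The window contents and the `tolerance` value here are illustrative assumptions, not taken from the configuration file or from qartod.py:

```python
import numpy as np

# Hypothetical 5-sample window holding one valid value among NaNs,
# similar to the neighbourhood of 4.45602 in the example data.
window = np.ma.masked_invalid(
    np.array([np.nan, np.nan, 4.45602, np.nan, np.nan], dtype=np.float32)
)

tolerance = 0.001  # assumed value for illustration only

# Masked entries are skipped, so max and min are the same single value.
data_range = window.max() - window.min()
valid_count = np.ma.count(window)

# A range of zero is always below any positive tolerance.
looks_flat = bool(data_range < tolerance)
print(valid_count, float(data_range), looks_flat)
```

With only one valid point the range is necessarily zero, so a range-only comparison cannot distinguish "flat" from "not enough data".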

We think we have a modification to qartod.py that checks to make sure there are at least 3 valid data points in the test window:

add line 663:

data_count = np.ma.count(window, 1)

modify line 665 (now 666):

test_results = np.ma.filled(np.logical_and(data_range < tolerance, data_count > 2), fill_value=False)

When these lines are added/modified in qartod.py, the result for this example becomes:

array([1, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 1, 9, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9,
       1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9,
       9, 1, 9, 1, 9, 9, 1, 9, 1])
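The idea behind the proposed count check can be sketched in isolation as follows. This is a simplified illustration, not the code in qartod.py: `flat_windows` is a hypothetical helper, it uses fixed-length windows by sample count rather than a time-based threshold, and the `tolerance` and data values are made up for the example.

```python
import numpy as np


def flat_windows(values, window_size, tolerance, min_valid=3):
    """Return one boolean per trailing window: True where it looks flat.

    Hypothetical helper (not part of ioos_qc). A window is only allowed
    to count as flat when it holds at least `min_valid` valid samples.
    """
    data = np.ma.masked_invalid(np.asarray(values, dtype=float))
    results = []
    for end in range(window_size, data.size + 1):
        window = data[end - window_size:end]
        n_valid = np.ma.count(window)
        # Too few valid points: the range is not meaningful, so do not flag.
        if n_valid < max(min_valid, 1):
            results.append(False)
            continue
        data_range = window.max() - window.min()
        results.append(bool(data_range < tolerance))
    return np.array(results)


nan = np.nan
sample = [nan, nan, 4.45602, nan, nan, 4.45, 4.45, 4.45]

# Range-only behaviour (min_valid=1) versus requiring 3 valid points.
without_check = flat_windows(sample, window_size=5, tolerance=0.001, min_valid=1)
with_check = flat_windows(sample, window_size=5, tolerance=0.001, min_valid=3)
print(without_check)
print(with_check)
```

In this sketch the first window, which contains a single valid value, is reported as flat only when the count requirement is dropped, while the last window, with three identical valid values, is reported as flat in both cases.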

lgarzio changed the title ~~flat_line_test doesn't work with a lot of missing values~~ flat_line_test doesn't work with a lot of missing values (priority: low) Sep 28, 2021

lgarzio changed the title ~~flat_line_test doesn't work with a lot of missing values (priority: low)~~ flat_line_test doesn't work when threshold window contains mostly missing values (priority: low) Sep 28, 2021

lgarzio changed the title ~~flat_line_test doesn't work when threshold window contains mostly missing values (priority: low)~~ flat_line_test doesn't work when threshold window mostly contains missing values (priority: low) Sep 28, 2021
