flat_line_test doesn't work when threshold window mostly contains missing values (priority: low) #68

Open
lgarzio opened this issue Sep 28, 2021 · 0 comments

lgarzio commented Sep 28, 2021

The flat_line_test doesn't appear to work on datasets with many missing values (ioos_qc version 2.0.1).

Example netcdf file here

Example configuration file:
test_flatline.txt

import xarray as xr
from ioos_qc.config import Config
from ioos_qc.streams import XarrayStream
from ioos_qc.results import collect_results

f = 'maracoos_02_20210716T190208Z_dbd.nc'
config_file = 'test_flatline.txt'
ds = xr.open_dataset(f)
c = Config(config_file)
xs = XarrayStream(ds, time='time', lat='latitude', lon='longitude')
qc_results = xs.run(c)
collected_list = collect_results(qc_results, how='list')

# print the QC flag array produced by each configured test
for cl in collected_list:
    flag_results = cl.results.data
    print(flag_results)

array([1, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 1, 9, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9,
       1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9,
       9, 1, 9, 1, 9, 9, 1, 9, 1])

ds.conductivity.values
array([0.     ,     nan, 0.     ,     nan,     nan,     nan,     nan,
           nan,     nan,     nan,     nan,     nan,     nan,     nan,
           nan,     nan,     nan,     nan,     nan, 4.45602,     nan,
       4.45214,     nan,     nan, 4.45171,     nan, 4.45178,     nan,
           nan, 4.45147,     nan, 4.45046,     nan,     nan, 4.45   ,
           nan, 4.45106,     nan,     nan, 4.45116,     nan, 4.45054,
           nan,     nan, 4.45027,     nan, 4.45089, 4.45019,     nan,
           nan, 4.45109,     nan,     nan, 4.4514 ,     nan, 4.45154,
           nan,     nan, 4.45173,     nan, 4.45156,     nan,     nan,
       4.45145,     nan, 4.45162,     nan, 4.4511 ,     nan,     nan,
       4.45092,     nan, 4.45045,     nan,     nan, 4.45007,     nan,
       4.44995,     nan,     nan, 4.44954,     nan, 4.4495 ,     nan,
           nan, 4.44886,     nan, 4.44779,     nan,     nan, 4.44765,
           nan, 4.44805,     nan,     nan, 4.44685,     nan, 4.44496,
           nan,     nan, 4.43886,     nan, 4.43323,     nan,     nan,
       4.43035,     nan, 4.46671,     nan,     nan, 4.52998,     nan,
       4.53362,     nan,     nan, 4.66421,     nan, 4.66618,     nan,
           nan, 4.61894,     nan, 4.54442,     nan,     nan, 4.51362,
           nan, 4.47128,     nan,     nan, 4.38806,     nan, 4.28966,
           nan,     nan, 4.23655,     nan, 4.23101,     nan,     nan,
       4.23322,     nan, 4.2036 ,     nan,     nan, 4.17473,     nan,
       4.16556,     nan,     nan, 4.16569,     nan, 4.16743,     nan,
           nan, 4.15847,     nan, 4.15089,     nan,     nan, 4.14448,
           nan, 4.14326], dtype=float32)

In this example, when the threshold window contains only one valid conductivity value surrounded by missing values, the test flags that lone value: the 4.45602 at index 19 is flagged 3 (suspect).
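To illustrate the mechanism, here is a minimal NumPy sketch. It is a simplified, hypothetical re-implementation, not the ioos_qc code: it assumes a fixed trailing window measured in observations (ioos_qc derives its window from time thresholds), and the name `naive_flat_line_flags` is made up for illustration. Because NaNs are masked out before taking max minus min, a window holding a single valid value has a range of 0, which is always below the tolerance.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def naive_flat_line_flags(values, window_size, tolerance):
    """Hypothetical, simplified flat-line check (not the ioos_qc code).

    Flags a point when the range (max - min) of the valid values in the
    trailing window ending at that point is below `tolerance`. Missing
    values (NaN) are masked out, but nothing checks how many valid
    values remain in the window.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < window_size:
        return np.zeros(n, dtype=bool)

    # Row i is the trailing window values[i : i + window_size].
    windows = np.ma.masked_invalid(sliding_window_view(values, window_size))
    data_range = windows.max(axis=1) - windows.min(axis=1)

    # A window with a single valid value has a range of 0, so it counts as "flat".
    # Fully missing windows give a masked range, which is filled as False.
    flat = np.ma.filled(data_range < tolerance, fill_value=False)

    # Points before the first full window are not flagged.
    flat = np.concatenate([np.zeros(window_size - 1, dtype=bool), flat])

    # Only flag points that are themselves valid (NaNs get a missing flag elsewhere).
    return flat & np.isfinite(values)


# A lone valid value surrounded by NaNs, loosely modeled on the issue's data.
conductivity = np.array([1.0, 2.0, np.nan, np.nan, np.nan, 4.45602, np.nan, 4.45214])
flags = naive_flat_line_flags(conductivity, window_size=3, tolerance=0.001)
print(flags)  # only index 5 (the isolated 4.45602) is flagged
```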

We think a small modification to qartod.py fixes this by requiring at least 3 valid data points in the test window before a value can be flagged:

Add at line 663:

data_count = np.ma.count(window, 1)

Modify line 665 (now 666):

test_results = np.ma.filled(np.logical_and(data_range < tolerance, data_count > 2), fill_value=False)

With these changes to qartod.py, the result for this example becomes the following (the only difference is index 19, which is now flagged 1 (pass) instead of 3):

array([1, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 1, 9, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9,
       1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9,
       9, 1, 9, 1, 9, 9, 1, 9, 1])
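The proposed guard can be sketched the same way. This is again a simplified, hypothetical re-implementation rather than the actual qartod.py code: `flat_line_flags` and `min_valid` are made-up names, and the fixed observation-count window is an assumption. `min_valid=3` corresponds to the `data_count > 2` condition proposed above, while `min_valid=1` behaves like the unguarded test.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def flat_line_flags(values, window_size, tolerance, min_valid=3):
    """Hypothetical, simplified flat-line check with a minimum-count guard.

    A point is flagged only if the trailing window ending at it has a range
    below `tolerance` AND contains at least `min_valid` non-missing values.
    min_valid=3 mirrors the proposed `data_count > 2` condition;
    min_valid=1 reproduces the unguarded behavior.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < window_size:
        return np.zeros(n, dtype=bool)

    windows = np.ma.masked_invalid(sliding_window_view(values, window_size))
    data_range = windows.max(axis=1) - windows.min(axis=1)
    # Same idea as the proposed `data_count = np.ma.count(window, 1)`.
    data_count = np.ma.count(windows, axis=1)

    flat = np.ma.filled(data_range < tolerance, fill_value=False) & (data_count >= min_valid)
    # Points before the first full window are not flagged.
    flat = np.concatenate([np.zeros(window_size - 1, dtype=bool), flat])
    return flat & np.isfinite(values)


# Isolated valid value at index 5, genuinely flat stretch at indices 8-10.
conductivity = np.array(
    [1.0, 2.0, np.nan, np.nan, np.nan, 4.45602, np.nan, 4.45214, 4.0, 4.0, 4.0]
)
unguarded = flat_line_flags(conductivity, window_size=3, tolerance=0.001, min_valid=1)
guarded = flat_line_flags(conductivity, window_size=3, tolerance=0.001, min_valid=3)
print(np.flatnonzero(unguarded))  # [ 5 10]
print(np.flatnonzero(guarded))    # [10]
```

With the guard, the isolated value is no longer flagged, while a run of three identical readings still is. The threshold of 3 is a judgment call: a higher value makes the test more conservative on sparse data, at the cost of missing short flat stretches.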
@lgarzio lgarzio changed the title flat_line_test doesn't work with a lot of missing values flat_line_test doesn't work with a lot of missing values (priority: low) Sep 28, 2021
@lgarzio lgarzio changed the title flat_line_test doesn't work with a lot of missing values (priority: low) flat_line_test doesn't work when threshold window contains mostly missing values (priority: low) Sep 28, 2021
@lgarzio lgarzio changed the title flat_line_test doesn't work when threshold window contains mostly missing values (priority: low) flat_line_test doesn't work when threshold window mostly contains missing values (priority: low) Sep 28, 2021