specifying key_field parameter for split_occurrence_data causes exception #341

zzeppozz · 2022-05-31T20:27:13Z

Describe the bug

Specifying the key_field parameter in a config_file for split_occurrence_data throws an exception:
Traceback (most recent call last):
File "/usr/local/bin/split_occurrence_data", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/lmpy/tools/split_occurrence_data.py", line 135, in cli
occurrence_processor.process_reader(reader, wranglers)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 168, in process_reader
self.write_points(points)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 181, in write_points
if writer_key not in self.writers.keys():
TypeError: unhashable type: 'list'
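For context on the error itself: `unhashable type: 'list'` is what Python raises when a list is used as a dictionary key or tested for membership in a dict's keys. The snippet below is a minimal sketch of that behavior outside lmpy. It assumes the writer key reaches the membership check as a list; `normalize_key` and `writers` are hypothetical names for illustration, not the library's code.

```python
def normalize_key(writer_key):
    """Return a hashable version of a writer key (hypothetical helper)."""
    # Lists cannot be hashed, so convert them to tuples; leave strings alone.
    if isinstance(writer_key, list):
        return tuple(writer_key)
    return writer_key


writers = {}
list_key = ["Heuchera americana"]

# Membership tests hash the key first, so a list fails even on an empty dict.
try:
    list_key in writers.keys()
    raised = False
except TypeError:
    raised = True

# A tuple built from the same values works as a dictionary key.
hashable_key = normalize_key(list_key)
writers[hashable_key] = "open writer placeholder"
```

Converting to a tuple keeps multi-field keys distinct while making them usable as dictionary keys. Whether that is the right fix here depends on how the key is built upstream.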

To Reproduce

At a bash prompt, run split_occurrence_data --config_file=

with the config file:
{
    "max_open_writers": 100,
    "key_field": "scientificName",
    "dwca": [
        [
            "/volumes/data/input/occ_heuchera_gbif.zip",
            "/volumes/data/config/wrangle_nothing.json"
        ]
    ],
    "out_dir": "/volumes/output"
}

Expected behavior

No exception, executes correctly.

Desktop (please complete the following information)

Linux
Version 3.1.20

cjgrady · 2022-06-02T17:27:04Z

@zzeppozz Is the data available somewhere? And what does the wrangler configuration file look like?

zzeppozz · 2022-06-02T21:16:13Z

The split_occurrence_data CLI fails when key_field is provided in the config file. It fails in different ways with 1 or 2 fields in the list. It works fine if I leave out that parameter.

With 2 fields in the list:
{
    "max_open_writers": 100,
    "key_field": ["genus", "scientficName"],
    "dwca": [
        [
            "/volumes/data/input/occ_heuchera_gbif.zip",
            "/volumes/data/config/no_wranglers.json"
        ]
    ],
    "out_dir": "/volumes/output/heuchera_species"
}

root@c80091bd812f:/# split_occurrence_data --config_file /volumes/data/config/split_occurrence_data.json
Traceback (most recent call last):
File "/usr/local/bin/split_occurrence_data", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/lmpy/tools/split_occurrence_data.py", line 135, in cli
occurrence_processor.process_reader(reader, wranglers)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 168, in process_reader
self.write_points(points)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 184, in write_points
self.open_writer(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 143, in open_writer
writer_fn = self.get_writer_filename(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 33, in get_writer_filename_from_key
writer_fn = '{}.csv'.format(os.path.join(base_dir, *writer_key))
File "/usr/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
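The final frames show `os.path.join` receiving a `None` among the key parts. As a rough sketch of how that case could be caught with a clearer message, the function below checks each part before joining. `build_writer_filename` is a hypothetical stand-in written for this example; only the `'{}.csv'.format(os.path.join(base_dir, *writer_key))` expression is taken from the traceback.

```python
import os


def build_writer_filename(base_dir, writer_key):
    """Build a CSV path from a key (hypothetical helper, not lmpy's code)."""
    # Accept either a single string or a sequence of strings.
    if isinstance(writer_key, str):
        parts = [writer_key]
    elif writer_key is None:
        parts = [None]
    else:
        parts = list(writer_key)

    # Report which positions have no value instead of failing inside join().
    missing = [i for i, part in enumerate(parts) if part is None]
    if missing:
        raise ValueError(f"key has no value at position(s) {missing}")

    return "{}.csv".format(os.path.join(base_dir, *parts))


good_path = build_writer_filename("/out", ["Heuchera", "Heuchera americana"])

try:
    build_writer_filename("/out", ["Heuchera", None])
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Raising early with the offending position makes it easier to tell which configured field produced no value for a record.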

It fails differently with 1 key_field in the list:
{
    "max_open_writers": 100,
    "key_field": ["scientficName"],
    "dwca": [
        [
            "/volumes/data/input/occ_heuchera_gbif.zip",
            "/volumes/data/config/no_wranglers.json"
        ]
    ],
    "out_dir": "/volumes/output/heuchera_species"
}
root@c80091bd812f:/# split_occurrence_data --config_file /volumes/data/config/split_occurrence_data.json
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 31, in get_writer_filename_from_key
writer_fn = '{}.csv'.format(os.path.join(base_dir, writer_key))
File "/usr/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/split_occurrence_data", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/lmpy/tools/split_occurrence_data.py", line 135, in cli
occurrence_processor.process_reader(reader, wranglers)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 168, in process_reader
self.write_points(points)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 184, in write_points
self.open_writer(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 143, in open_writer
writer_fn = self.get_writer_filename(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 33, in get_writer_filename_from_key
writer_fn = '{}.csv'.format(os.path.join(base_dir, *writer_key))
TypeError: join() argument after * must be an iterable, not NoneType
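Both tracebacks in this comment end with a `None` key value. One possible cause, not confirmed in this thread, is that a configured field name has no value in the record; note that the two configs above spell the field `scientficName`, while the original report used `scientificName`. The sketch below shows one way such a mismatch could be detected before any writer is opened. `find_missing_key_fields` and the dictionary-shaped record are assumptions for illustration, not lmpy's data model.

```python
def find_missing_key_fields(key_fields, record):
    """List configured key fields with no value in a record (hypothetical)."""
    # A single field name may be given as a bare string.
    if isinstance(key_fields, str):
        key_fields = [key_fields]
    # dict.get returns None for absent names, matching the failure mode above.
    return [name for name in key_fields if record.get(name) is None]


# Assumed record shape: field name mapped to value.
record = {"genus": "Heuchera", "scientificName": "Heuchera americana"}

missing = find_missing_key_fields(["genus", "scientficName"], record)
present = find_missing_key_fields("scientificName", record)
```

Running a check like this on the first record would surface a misspelled or absent field name up front rather than as a `TypeError` deep in path construction.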

zzeppozz assigned cjgrady May 31, 2022

cjgrady added the bug and tools labels Jun 2, 2022

cjgrady added this to the Version-3.2 milestone Jun 2, 2022

cjgrady mentioned this issue Jun 2, 2022

341 specifying key field parameter for split occurrence data causes exception #345

Merged

7 tasks

cjgrady linked a pull request Jun 2, 2022 that will close this issue

341 specifying key field parameter for split occurrence data causes exception #345

Merged

7 tasks

cjgrady closed this as completed in #345 Jun 2, 2022

zzeppozz reopened this Jun 2, 2022
