This repository has been archived by the owner on Sep 4, 2024. It is now read-only.

specifying key_field parameter for split_occurrence_data causes exception #341

Open
zzeppozz opened this issue May 31, 2022 · 2 comments · Fixed by #345

@zzeppozz
Contributor

Describe the bug

Specifying the key_field parameter in the config file for split_occurrence_data raises an exception:
Traceback (most recent call last):
File "/usr/local/bin/split_occurrence_data", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/lmpy/tools/split_occurrence_data.py", line 135, in cli
occurrence_processor.process_reader(reader, wranglers)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 168, in process_reader
self.write_points(points)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 181, in write_points
if writer_key not in self.writers.keys():
TypeError: unhashable type: 'list'
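
The last frame is a dictionary membership test, so the error can be reproduced in isolation. The following is a minimal sketch, assuming (the traceback does not confirm this) that `writer_key` reaches that line as a list. The names `writers`, `try_lookup`, and `as_hashable_key` are hypothetical stand-ins and not lmpy code, and the species name is just an example value.

```python
# Hypothetical stand-in for self.writers in the traceback.
writers = {}


def as_hashable_key(writer_key):
    """Convert a list-valued key to a tuple so it can be used with a dict."""
    if isinstance(writer_key, list):
        return tuple(writer_key)
    return writer_key


def try_lookup(writer_key):
    """Run the same kind of membership test and report the outcome."""
    try:
        return writer_key in writers.keys()
    except TypeError as err:
        # Lists are mutable and therefore unhashable, so the test itself fails.
        return str(err)


list_outcome = try_lookup(["Heuchera americana"])
tuple_outcome = try_lookup(as_hashable_key(["Heuchera americana"]))
```

With a list the lookup fails before any comparison happens, even on an empty dictionary. With a tuple it simply returns `False`.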

To Reproduce

At a bash prompt, run split_occurrence_data --config_file=

with the config file:
{
"max_open_writers": 100,
"key_field": "scientificName",
"dwca": [
["/volumes/data/input/occ_heuchera_gbif.zip",
"/volumes/data/config/wrangle_nothing.json"
]
],
"out_dir": "/volumes/output"
}

Expected behavior

No exception, executes correctly.
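
The configs in this issue give `key_field` both as a single string and as a list of strings. One way a tool could accept both forms is to normalize the value right after loading the config. This is only a sketch under that assumption: `normalize_key_field` is a hypothetical helper, not part of lmpy, and nothing here reflects how the actual fix was made.

```python
import json


def normalize_key_field(config):
    """Return key_field as a list of field names, or None if it is absent."""
    key_field = config.get("key_field")
    if key_field is None:
        return None
    # A bare string is treated as a single field name.
    if isinstance(key_field, str):
        return [key_field]
    return list(key_field)


# Trimmed-down version of the config from this report.
config = json.loads('{"max_open_writers": 100, "key_field": "scientificName"}')
fields = normalize_key_field(config)
```

Downstream code would then only have to handle one shape (a list) or the absence of the parameter.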

Desktop (please complete the following information)

  • linux
  • Version 3.1.20
@cjgrady cjgrady added bug Something isn't working tools Tasks related to lmpy tools labels Jun 2, 2022
@cjgrady cjgrady added this to the Version-3.2 milestone Jun 2, 2022
@cjgrady
Contributor

cjgrady commented Jun 2, 2022

@zzeppozz Is the data available somewhere? And what does the wrangler configuration file look like?

@zzeppozz
Contributor Author

zzeppozz commented Jun 2, 2022

The split_occurrence_data CLI fails when key_field is provided in the config file. It fails in different ways with 1 or 2 fields in the list. It works fine if I leave out that parameter.

With 2 fields in the list:
{
"max_open_writers": 100,
"key_field": ["genus","scientficName"],
"dwca": [
["/volumes/data/input/occ_heuchera_gbif.zip",
"/volumes/data/config/no_wranglers.json"
]
],
"out_dir": "/volumes/output/heuchera_species"

}

root@c80091bd812f:/# split_occurrence_data --config_file /volumes/data/config/split_occurrence_data.json
Traceback (most recent call last):
File "/usr/local/bin/split_occurrence_data", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/lmpy/tools/split_occurrence_data.py", line 135, in cli
occurrence_processor.process_reader(reader, wranglers)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 168, in process_reader
self.write_points(points)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 184, in write_points
self.open_writer(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 143, in open_writer
writer_fn = self.get_writer_filename(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 33, in get_writer_filename_from_key
writer_fn = '{}.csv'.format(os.path.join(base_dir, *writer_key))
File "/usr/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

It fails differently with 1 key_field in the list:
{
"max_open_writers": 100,
"key_field": ["scientficName"],
"dwca": [
["/volumes/data/input/occ_heuchera_gbif.zip",
"/volumes/data/config/no_wranglers.json"
]
],
"out_dir": "/volumes/output/heuchera_species"

}
root@c80091bd812f:/# split_occurrence_data --config_file /volumes/data/config/split_occurrence_data.json
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 31, in get_writer_filename_from_key
writer_fn = '{}.csv'.format(os.path.join(base_dir, writer_key))
File "/usr/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/split_occurrence_data", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/lmpy/tools/split_occurrence_data.py", line 135, in cli
occurrence_processor.process_reader(reader, wranglers)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 168, in process_reader
self.write_points(points)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 184, in write_points
self.open_writer(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 143, in open_writer
writer_fn = self.get_writer_filename(writer_key)
File "/usr/local/lib/python3.8/dist-packages/lmpy/data_preparation/occurrence_splitter.py", line 33, in get_writer_filename_from_key
writer_fn = '{}.csv'.format(os.path.join(base_dir, *writer_key))
TypeError: join() argument after * must be an iterable, not NoneType
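
Both list-valued runs end with `os.path.join` receiving `None`, either as one path component or as the whole key. A plausible but unconfirmed cause is that the configured field was not found on a record: both list configs spell the field `scientficName`, while the first config uses `scientificName`. The sketch below shows a guard that fails with a clearer message before the path is built. `filename_from_key` is a hypothetical helper written for illustration, not lmpy's `get_writer_filename_from_key`, and the species name is an example value.

```python
import os


def filename_from_key(base_dir, writer_key):
    """Build '<base_dir>/<part>/.../<part>.csv', rejecting missing parts."""
    # Treat a single string key as a one-element list of path parts.
    parts = [writer_key] if isinstance(writer_key, str) else writer_key
    # Fail early with a readable message instead of deep inside os.path.join.
    if parts is None or any(part is None for part in parts):
        raise ValueError(f"writer key has missing value(s): {writer_key!r}")
    return "{}.csv".format(os.path.join(base_dir, *parts))


# os.path.join itself rejects None, which is the TypeError in the tracebacks.
try:
    os.path.join("/volumes/output", "Heuchera", None)
    join_rejected_none = False
except TypeError:
    join_rejected_none = True

good_path = filename_from_key("/volumes/output", ["Heuchera", "Heuchera americana"])
```

A guard like this does not fix a misspelled or missing field, but it would report the offending key directly rather than a `TypeError` from the path module.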
