
NOAA harvest job stuck because of large file size #4965

Open · rshewitt opened this issue on Nov 4, 2024 · 3 comments

Labels: bug (Software defect or bug), component/catalog (Related to catalog component), playbooks/roles, O&M (Operations and maintenance tasks for the Data.gov platform)

Comments

rshewitt (Contributor) commented on Nov 4, 2024

noaa-nesdis-ncei-accessions has some datasets that cause an out-of-memory error in catalog-fetch (i.e. the log message is "Killed"). Related to 1487. Here is a dataset that managed to be created after increasing catalog-fetch memory, but because of its size the server responds with a 500 in the UI.

How to reproduce

  1. Harvest the source.

Expected behavior

The job completes without timing out.

Actual behavior

The job gets stuck and times out after the 72-hour limit.

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

rshewitt added the bug label on Nov 4, 2024
FuhuXia (Member) commented on Nov 4, 2024

For this particular case, the stuck job is directly related to the enormous number of tags (keywords) in some XML records, 34,866 to be exact for the sampled one. The large file size is also a result of all those tags. If we set a maximum number of allowed tags, we can reject these nonsense records instead of letting the job get stuck.

Rejecting records based on file size alone may be too broad.

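A minimal sketch of what such a check could look like, assuming the harvester can inspect the raw ISO 19115 XML before creating the package; the MAX_KEYWORDS value and the function names are illustrative assumptions, not part of the current codebase:

```python
# Sketch only: reject a harvested ISO record whose keyword count is absurd,
# instead of letting catalog-fetch run out of memory on it.
import xml.etree.ElementTree as ET

MAX_KEYWORDS = 3000  # illustrative "ridiculously high" ceiling

ISO_NS = {"gmd": "http://www.isotc211.org/2005/gmd"}


def keyword_count(xml_bytes: bytes) -> int:
    """Count gmd:keyword elements anywhere in an ISO 19115 record."""
    root = ET.fromstring(xml_bytes)
    return len(root.findall(".//gmd:keyword", ISO_NS))


def should_reject(xml_bytes: bytes) -> bool:
    """True when the record exceeds the keyword limit and should be skipped
    with a harvest error rather than passed on to catalog-fetch."""
    return keyword_count(xml_bytes) > MAX_KEYWORDS
```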

btylerburton (Contributor) commented

To Fuhu's point, we should set a reasonably high limit for each field, publicize it somewhere, and then hard-fail datasets that exceed that limit. In H2.0 we can even throw custom errors to highlight this.

FuhuXia (Member) commented on Nov 5, 2024

If we set the limit ridiculously high, say 3,000, maybe we can get away without publicizing it, because it will be really rare for any record to reach that limit. And when one does, people will know why the dataset fails to harvest, because it is ridiculous. Who would create a dataset with 1,500 resources or 3,000 keywords?
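A sketch of how per-field limits with a custom error might look, using the 3,000-keyword and 1,500-resource figures from this thread as placeholders; the field names, limits, and exception class are assumptions for illustration, not the project's actual implementation:

```python
# Sketch only: hard-fail a dataset dict when any list-type field exceeds a
# published limit, so the harvest report shows a clear custom error.
FIELD_LIMITS = {
    "tags": 3000,       # keywords
    "resources": 1500,  # distributions
}


class FieldLimitExceeded(Exception):
    """Raised so the failure surfaces as a readable harvest error."""


def validate_field_limits(dataset_dict: dict) -> None:
    for field, limit in FIELD_LIMITS.items():
        count = len(dataset_dict.get(field) or [])
        if count > limit:
            raise FieldLimitExceeded(
                f"{field}: {count} items exceeds the maximum of {limit}"
            )
```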

hkdctol added the component/catalog, playbooks/roles, and O&M labels on Nov 7, 2024
hkdctol moved this to 📥 Queue on the data.gov team board on Nov 7, 2024