Improve seeders' understanding of what a data set is #9

postfalk · 2017-02-11T19:43:02Z

Currently, I see two different problems in the seeded URLs:

News releases ABOUT data. It would be nice if the actual data set could be tracked down in the thicket of government web pages. Sure, that can be tedious at times.
Some people log entire data portals. It would be better if these could be broken down into single data products. (It might be nice to keep the relations between them for code reuse.)

I would identify data products by the presence of metadata, scientific citations, and method documents. I am aware that this might not always be feasible, but in many cases it would make the scraping task more manageable. It would also be great to store these documents alongside the data set.
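
To make the heuristic above a bit more concrete, here is a minimal Python sketch of how a seeder-side check might look. Everything in it is an assumption for illustration: the keyword patterns, the `SIGNALS` table, the function names `data_product_signals` and `looks_like_data_product`, and the `min_signals` threshold are hypothetical and not part of any existing tool. It only scans the text of a page for hints of metadata, citations, and method documents.

```python
import re

# Hypothetical keyword patterns for the three signals named above.
# These are illustrative guesses and would need tuning per agency.
SIGNALS = {
    "metadata": re.compile(r"\bmeta[\s-]?data\b", re.IGNORECASE),
    "citation": re.compile(r"\b(cite as|suggested citation|citation|doi:)", re.IGNORECASE),
    "methods": re.compile(r"\b(methods?|methodology)\b", re.IGNORECASE),
}


def data_product_signals(page_text):
    """Return the names of all signals whose pattern occurs in page_text."""
    return {name for name, pattern in SIGNALS.items() if pattern.search(page_text)}


def looks_like_data_product(page_text, min_signals=2):
    """Guess whether a page describes a single data product.

    min_signals is an assumed threshold: how many of the three
    signals must be present before the page counts as a data product.
    """
    return len(data_product_signals(page_text)) >= min_signals
```

A press release that only talks about data would typically trigger none of the signals, while a product landing page that links metadata, a suggested citation, and a methods document would trigger all three. Keyword matching like this will produce false positives and negatives, so it is better suited to flagging URLs for a second look than to rejecting them automatically.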

suchthis · 2017-02-15T20:59:19Z

Thanks for the feedback!
