Currently, I see two different problems in the seeded URLs:
News releases about data rather than the data itself. It would be nice if the actual data set could be tracked down among the government web pages, although that can be tedious at times.
Entire data portals logged as a single entry. It would be better if they were broken down into individual data products. (It might be nice to keep track of how the products relate to each other, so that code can be reused.)
I would identify data products by the presence of metadata, scientific citations, and method documents. I am aware that this might not always be feasible, but in many cases it would make the scraping task more manageable. It would also be great to store these documents alongside the data set.
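To make the heuristic concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `SeededUrl` record, its `documents` field, the three labels in `SUPPORTING_DOCS`, and the `minimum` threshold are hypothetical names and do not come from any existing tool in this project.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the supporting documents named above.
SUPPORTING_DOCS = ("metadata", "citation", "methods")


@dataclass
class SeededUrl:
    """One seeded entry plus any supporting documents found next to it."""
    url: str
    # Maps a label from SUPPORTING_DOCS to the URL of that document.
    documents: dict = field(default_factory=dict)


def looks_like_data_product(entry: SeededUrl, minimum: int = 1) -> bool:
    """Return True if at least `minimum` supporting documents are present."""
    found = [label for label in SUPPORTING_DOCS if entry.documents.get(label)]
    return len(found) >= minimum
```

An entry for a whole portal or a news release would typically carry no such documents and be flagged for follow-up, while an entry that records, say, a metadata file and a methods document would pass. Storing the document URLs in the same record also keeps them alongside the data set.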