Not all databanks are indexed (correctly) #40

drlemmus · 2016-11-25T10:25:04Z

A number of databanks do not have any statistics (e.g. HSSP and PDB_REDO). The counts for other databanks (e.g. DSSP and STRUCTUREFACTORS) are incorrect: missing entries are not listed.

jonblack · 2016-11-25T11:18:54Z

Coos is looking into this for #38.

jonblack · 2016-11-30T08:48:32Z

#38 is now closed. The stats page now correctly reflects what's in the database. The remaining issue is with the crawler and annotator.

jonblack · 2016-12-01T08:55:31Z

The crawling/annotation method is a bit backwards. We start the process without any expectations. If we scan only 100 PDB files, then we expect at most 100 files in the other databanks that depend on the PDB.

In reality, we have a pretty good idea of the ideal scenario before we start. We can download a list of all valid and obsolete PDB IDs from pdb.org and use that as a base. When we crawl, we're no longer indexing what we have but instead checking to see what's missing. The entries that are missing can then be passed through the annotator.

I'm going to update this process to use the IDs downloaded from pdb.org as the source.
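
A minimal sketch of that idea, not the actual crawler code: the expected set is built from the valid and obsolete ID lists, and anything the crawl did not find counts as missing. The function name `missing_entries` and its parameters are hypothetical, and the sketch assumes the ID lists have already been downloaded and parsed into plain strings.

```python
def missing_entries(valid_ids, obsolete_ids, crawled_ids):
    """Return the expected PDB IDs that the crawl did not find, sorted.

    Hypothetical helper: IDs are compared case-insensitively after
    stripping whitespace, and blank entries are ignored.
    """
    def normalize(ids):
        return {i.strip().upper() for i in ids if i.strip()}

    # The base set is everything we know about: valid plus obsolete IDs.
    expected = normalize(valid_ids) | normalize(obsolete_ids)
    found = normalize(crawled_ids)

    # Whatever is expected but was not crawled goes to the annotator.
    return sorted(expected - found)


if __name__ == "__main__":
    to_annotate = missing_entries(
        valid_ids=["1abc", "2xyz", "3def"],
        obsolete_ids=["0old"],
        crawled_ids=["1ABC", "3def"],
    )
    print(to_annotate)
```

Using sets keeps the comparison independent of the order in which files are crawled, so a partial scan no longer lowers the expected count for the dependent databanks.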

jonblack self-assigned this Nov 30, 2016

jonblack mentioned this issue Nov 30, 2016

Comments page is empty #37

Open
