Accumulated stats for historical data #144

Open
gothub opened this issue Jul 25, 2018 · 3 comments
Assignees
Labels
metadig All issues related to metadig question

Comments

@gothub
Contributor

gothub commented Jul 25, 2018

The MetaDIG Solr engine will be used to generate stats for metadata reports. The current Solr schema doesn't indicate whether a PID has been obsoleted by a more recent PID. This makes it difficult, if not impossible, to track how an individual metadata document or a group of metadata documents has improved over time.

Do we want to include this obsolescence info in the Solr index?

The current Solr index fields are described in this issue.
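
To illustrate why the obsolescence info matters for tracking improvement over time, here is a minimal sketch of how a version chain could be walked once each PID records its successor. The `obsoleted_by` mapping, the `scores` mapping, and both function names are hypothetical and are not taken from the current MetaDIG schema or code.

```python
from typing import Dict, List, Optional, Tuple


def version_chain(first_pid: str,
                  obsoleted_by: Dict[str, Optional[str]]) -> List[str]:
    """Follow obsoletedBy links from first_pid to the newest version.

    obsoleted_by is a hypothetical mapping: PID -> PID that replaced it
    (or None / missing if the PID is the latest version).
    """
    chain: List[str] = []
    seen = set()
    pid: Optional[str] = first_pid
    # Stop at the end of the chain, or if a cycle is detected.
    while pid is not None and pid not in seen:
        chain.append(pid)
        seen.add(pid)
        pid = obsoleted_by.get(pid)
    return chain


def score_history(first_pid: str,
                  obsoleted_by: Dict[str, Optional[str]],
                  scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (pid, score) pairs in version order, skipping unscored PIDs."""
    return [(pid, scores[pid])
            for pid in version_chain(first_pid, obsoleted_by)
            if pid in scores]
```

Without a successor link on each entry, this ordering cannot be reconstructed from the index alone.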

@gothub gothub added the question and metadig labels Jul 25, 2018
@gothub gothub self-assigned this Jul 25, 2018
@gothub
Contributor Author

gothub commented Jul 25, 2018

Note that adding obsolescence info to the index would involve changing the index sub-processor so that, when a new PID obsoletes an existing one, it also updates the index entry for the obsoleted PID.
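
One possible way to update the older entry is a Solr atomic update, which changes a single field on an existing document. The sketch below only builds the JSON payload. The field names `metadataId` and `obsoletedBy` are assumptions for illustration, not confirmed fields in the current schema, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def obsoleted_by_update(old_pid: str,
                        new_pid: str,
                        id_field: str = "metadataId",        # assumed unique key field
                        obsoleted_by_field: str = "obsoletedBy"  # assumed new field
                        ) -> List[Dict[str, Any]]:
    """Build a Solr atomic-update document that marks old_pid as obsoleted.

    The {"set": value} modifier tells Solr to replace only that field
    on the document identified by the unique key.
    """
    return [{id_field: old_pid, obsoleted_by_field: {"set": new_pid}}]


# The payload would be POSTed as JSON to the core's /update handler.
payload = json.dumps(obsoleted_by_update("urn:uuid:old", "urn:uuid:new"))
```

Whether an atomic update works for this index depends on how the schema's fields are configured, so this would need checking against the actual schema.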

@mbjones
Member

mbjones commented Jul 25, 2018

As mentioned on a past call, we could also be loading quality stats into our metrics service that Rushi is building, as it has many of the metadata fields needed for faceting and aggregation across versions. That might be better than indexing it all over again. Let's discuss with @Rushiraj and @davev.

@gothub
Contributor Author

gothub commented Jul 25, 2018

One consideration for indexing the quality data into its own index is that the indexing component itself (indexing quality sub processor) is calculating the quality scores from a newly generated quality document. Once the scores are calculated in the metadig-engine indexing component, inserting the document into Solr is very fast.

@gothub gothub added this to the 2.1.0 milestone Nov 14, 2018
@gothub gothub modified the milestones: 2.1.0, 3.0 Apr 2, 2020
@jeanetteclark jeanetteclark removed this from the 3.0 milestone Jul 7, 2023

3 participants