Make smartBag Support Table Joins #120

Open
stevencox opened this issue Mar 15, 2018 · 6 comments

@stevencox
Collaborator

smartBag can generate a smartAPI from a BDBag.

However, the generated API is very simple: it does not support endpoints that require joining tabular data from multiple files.

@stevencox
Collaborator Author

@tubafrenzy, please assign a milestone date to this item and update status in the issue.

@tubafrenzy tubafrenzy modified the milestone: m2b.1 Apr 13, 2018
@tubafrenzy

This will be finished by the end of next week.

@tubafrenzy

@stevencox Do you have a sample join that I could use for testing and development purposes? Seems like the API will need to accept parameters that indicate which column on one data set to join to which column on another data set. For now I am playing around with a dummy metadata file keyed off of the bicluster "index_id" field.

@stevencox
Collaborator Author

All you need to design the feature is any column shared by two input files.

CTD_chem_gene_ixns.csv header:

# Fields:
# ChemicalName,ChemicalID,CasRN,GeneSymbol,GeneID,GeneForms,Organism,OrganismID,Interaction,InteractionActions,PubMedIDs

CTD_chemicals.csv header:


# Fields:
# ChemicalName,ChemicalID,CasRN,Definition,ParentIDs,TreeNumbers,ParentTreeNumbers,Synonyms,DrugBankIDs

The generated service should allow a query by ChemicalID to return data joining CTD_chemicals and CTD_chem_gene_ixns data. Assume column names are the same.
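A minimal sketch of that query, using only the Python standard library. It assumes the CTD files keep their column names on the comment line after `# Fields:`, as in the headers above. The helper names (`read_ctd_csv`, `index_by`, `query_by_key`) and the sample rows are hypothetical, made up for illustration; they are not smartBag's actual code or real CTD data.

```python
import csv
from collections import defaultdict

def read_ctd_csv(text):
    """Parse CSV text whose column names sit on the comment line after '# Fields:'."""
    fields, data_lines, expect_fields = None, [], False
    for line in text.splitlines():
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if expect_fields and body:
                fields = next(csv.reader([body]))  # the column-name line
                expect_fields = False
            elif body == "Fields:":
                expect_fields = True
            continue
        if line.strip():
            data_lines.append(line)
    if fields is None:
        raise ValueError("no '# Fields:' header found")
    return list(csv.DictReader(data_lines, fieldnames=fields))

def index_by(rows, column):
    """Group rows by the value of the shared join column."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index

def query_by_key(tables, column, value):
    """Return the matching rows from every table, keyed by table name."""
    return {name: index_by(rows, column).get(value, [])
            for name, rows in tables.items()}

# Made-up sample rows; only the header lines come from the issue.
CHEMICALS = """# Fields:
# ChemicalName,ChemicalID,CasRN,Definition,ParentIDs,TreeNumbers,ParentTreeNumbers,Synonyms,DrugBankIDs
example chemical,C000001,,,,,,,
other chemical,C000002,,,,,,,
"""
IXNS = """# Fields:
# ChemicalName,ChemicalID,CasRN,GeneSymbol,GeneID,GeneForms,Organism,OrganismID,Interaction,InteractionActions,PubMedIDs
example chemical,C000001,,GENE1,1,,,,,,
example chemical,C000001,,GENE2,2,,,,,,
"""

tables = {"chemicals": read_ctd_csv(CHEMICALS),
          "chem_gene_ixns": read_ctd_csv(IXNS)}
result = query_by_key(tables, "ChemicalID", "C000001")
```

Here `result` holds one list of rows per input file, so the caller sees which table each row came from. A real service would build each index once at load time rather than on every query.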

@tubafrenzy

Noticed that CTD_chem_gene_ixns.csv contains data of the form:

MESH:C533344

while CTD_chemicals.csv seems to have the prefix stripped off:

C025205

This discrepancy isn't completely germane to the development I am doing, but it would mean these tables don't join properly in a demo/example.
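One way a demo could work around the mismatch is to normalize the IDs before joining. The sketch below assumes the only difference is a leading namespace such as `MESH:`; the function name `strip_prefix` is hypothetical.

```python
def strip_prefix(identifier, sep=":"):
    """Drop a leading namespace such as 'MESH:' so IDs from both files compare equal."""
    # split(sep, 1) leaves identifiers without a prefix unchanged
    return identifier.split(sep, 1)[-1].strip()

with_prefix = strip_prefix("MESH:C533344")   # "C533344"
without_prefix = strip_prefix("C025205")     # "C025205"
```

Applying this to the join column of both tables at load time would let the two example files join without changing the join logic itself.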

Also, as I've been going down this road: I assume the API should be able to represent both one-to-one and many-to-one relationships from the perspective of either table? Or should the results be merged into a single denormalized table from the "many" perspective, with the "one" row duplicated on each line?

@stevencox
Collaborator Author

Option (a): the normalized, relational approach, not the denormalized one.
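To make the difference between the two response shapes concrete, here is a small sketch. The function names, the `chemical` / `interactions` keys, and the sample rows are all hypothetical; the issue does not specify the actual response format.

```python
def normalized(one_row, many_rows):
    """Relational shape: the 'one' row appears once, the 'many' rows sit beside it."""
    return {"chemical": one_row, "interactions": many_rows}

def denormalized(one_row, many_rows):
    """Flattened shape: the 'one' row is copied into every 'many' row."""
    return [{**one_row, **many} for many in many_rows]

# Made-up rows for illustration only.
chemical = {"ChemicalID": "C000001", "ChemicalName": "example chemical"}
interactions = [
    {"ChemicalID": "C000001", "GeneSymbol": "GENE1"},
    {"ChemicalID": "C000001", "GeneSymbol": "GENE2"},
]

relational = normalized(chemical, interactions)
flat = denormalized(chemical, interactions)
```

In the normalized shape the chemical's fields are stored once, while the flattened shape repeats `ChemicalName` in each of the two interaction rows.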
