Replies: 1 comment
-
Adding @n1zea144 @sheridancbio @dippindots @inodb. These are good points. I agree that the naming has evolved to be very confusing... To simplify the concepts, maybe we can view all data in a study as one big matrix organized by profiles. I think it would be good to rename to general terms, e.g.:
If we change this, we should also change the API naming, so it would be a fairly big change...
-
We're currently trying to use generic assay more. The current data model works, but it is not very intuitive when (1) looking at the data in the database and, by extension, when (2) creating the data files. One of the reasons is b/c both treatment and, later, generic assay were shoehorned into `genetic_profile`. I think that worked well for prototyping, but I worry about long-term maintainability. Now that we are also starting to add microbiome and mutational signature data, it might be worth revisiting.
Image from: RFC51: Generic Assay
Data model
- `genetic_profile` contains `GENERIC_ASSAY`. Maybe more clear if (1) this table is either renamed to `generic_profile` and `generic_assay_type` is set to `GENETIC` for all datatypes of `MAF/DISCRETE/CONTINUOUS/Z-SCORE/LOG2-VALUE/FUSION/SV`. One issue here is that the `datatype` field for `GENETIC` is always going to be different from `generic_assay`, which is not clear from the data model. It might make sense to put all `generic_assays` in a separate table instead
- `PIVOT_THRESHOLD`, `SORT_ORDER`
- `generic_entity_properties` -> `genetic_entity` -> `genetic_alteration` -> `genetic_profile`. Looking at the above schema image, I find it hard to imagine what any of these things mean without looking at the data. In addition, there are `NULL` values in `genetic_entity` for each `ENTITY_TYPE` of `GENE`. I get why that is, but it's not very clean, so it might make sense to refactor this?

Data files
When reading the documentation here: https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#generic-assay. This description is very clear, but the files themselves are a bit hard to follow:
- `genetic_alteration_type`: this might be tricky to understand for a user. The field is called `genetic_alteration_type`, but we are talking about non-genetic data? Let's allow an alias of `profile_type`
- `datatype: LIMIT-VALUE`: this is a pretty hard to understand datatype, so maybe better to use a simpler example first?
- `pivot_threshold_value`: very specific to treatment, is it optional?
- `value_sort_order`: seems like a bit of an edge case?

It might make sense to show the simplest example first with these data files and then show a complete reference for all possible properties. I guess once we have mutational signature data we can maybe use that as the more basic example.
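To make the `profile_type` alias idea concrete, here is a minimal sketch of how a meta file reader could accept both names. This is not the actual cBioPortal loader: `parse_meta` and `ALIASES` are hypothetical names, the alias itself is only the proposal above, and the field values in the example are placeholders rather than values checked against the importer. Whether `pivot_threshold_value` and `value_sort_order` are optional is still the open question; the sketch just treats any missing field as absent.

```python
# Hypothetical alias table: `profile_type` is the alias proposed in this thread,
# not something the current loader is known to accept.
ALIASES = {"profile_type": "genetic_alteration_type"}


def parse_meta(text):
    """Parse `key: value` lines into a dict, mapping aliases to canonical names."""
    meta = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got: {line!r}")
        key = key.strip()
        canonical = ALIASES.get(key, key)
        if canonical in meta:
            # Reject files that give both the alias and the original field.
            raise ValueError(f"duplicate field: {canonical}")
        meta[canonical] = value.strip()
    return meta


# A deliberately small meta file; the values are illustrative placeholders.
minimal_meta = """\
profile_type: GENERIC_ASSAY
generic_assay_type: MUTATIONAL_SIGNATURE
datatype: LIMIT-VALUE
"""

parsed = parse_meta(minimal_meta)
print(parsed["genetic_alteration_type"])
print(parsed.get("pivot_threshold_value"))  # missing fields come back as None
```

Mapping the alias to the existing field name at parse time would mean the rest of the pipeline (and the database column) could stay unchanged while the files become easier to read.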
Anyway, just wanted to capture my thoughts while going through this for future reference. We can discuss later