Clarify datasets #36

Merged
merged 6 commits into from
Nov 4, 2022

shntnu commented Nov 2, 2022

No description provided.

shntnu commented Nov 2, 2022

  • Delete 1.run-workflows/profiles/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1.csv.gz because it duplicates the copy in the BR_NCP_PROGENITORS_1/ subdirectory:

```r
library(readr)

df <- read_csv("1.run-workflows/profiles/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1.csv.gz")
df2 <- read_csv("1.run-workflows/profiles/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1.csv.gz")
compare::compare(df, df2)
# TRUE
```

  • Delete 1.run-workflows/profiles/NCP_PROGENITORS_1_BRANCHING/BR_NCP_PROGENITORS_1.csv.gz because it duplicates the copy in the BR_NCP_PROGENITORS_1/ subdirectory:

```r
library(readr)

df <- read_csv("1.run-workflows/profiles/NCP_PROGENITORS_1_BRANCHING/BR_NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1.csv.gz")
df2 <- read_csv("1.run-workflows/profiles/NCP_PROGENITORS_1_BRANCHING/BR_NCP_PROGENITORS_1.csv.gz")
compare::compare(df, df2)
# TRUE
```

The two profiles that remain differ in feature count (Metadata_ columns excluded):

```r
library(readr)
library(dplyr)

# Morphological profile: 380 rows, 4293 feature columns
df <- read_csv("1.run-workflows/profiles/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1.csv.gz")
df %>% select(-matches("Metadata_")) %>% dim()
# [1]  380 4293

# Branching profile: 380 rows, 23 feature columns
df <- read_csv("1.run-workflows/profiles/NCP_PROGENITORS_1_BRANCHING/BR_NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1.csv.gz")
df %>% select(-matches("Metadata_")) %>% dim()
# [1] 380  23
```
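
For anyone repeating these checks from Python, here is a minimal sketch using only the standard library. It is not the check that was run above: `is_duplicate` compares the decompressed bytes of the two files, which is stricter than `compare::compare` on parsed data frames, and `count_feature_columns` does a case-sensitive substring test rather than a regular-expression match. Both function names are made up for illustration.

```python
import csv
import gzip
import hashlib


def decompressed_sha256(path):
    """Hash the decompressed bytes so gzip header differences (e.g. timestamps) are ignored."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path_a, path_b):
    """True only if both files decompress to byte-identical content."""
    return decompressed_sha256(path_a) == decompressed_sha256(path_b)


def count_feature_columns(path, metadata_marker="Metadata_"):
    """Count header columns whose name does not contain the metadata marker."""
    with gzip.open(path, "rt", newline="") as handle:
        header = next(csv.reader(handle))
    return sum(1 for name in header if metadata_marker not in name)
```

A `False` from `is_duplicate` does not prove the tables differ (column order or number formatting could vary between otherwise equal files), so a negative result would still need a parsed comparison like the R one above.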

shntnu commented Nov 2, 2022

@yhan8 Have a look at the README

The remaining puzzle (for now) is whether the repeat progenitor plate (BR00127194) had branching features included in its profiles. Can you figure that out and update the table?

#10 (comment)

shntnu requested a review from yhan8 November 2, 2022 16:26

yhan8 commented Nov 2, 2022

Can either @shntnu or @mtegtmey confirm that my understanding of the profiles below is correct?

For stem cells, the profile is located here. I am using the normalized_variable_selected profile, which, if I am correct, has gone through normalization and feature selection. It has no branching features.


For progenitor cells, the profile of morphological features is located here. Note that the csv.gz file has 4200+ features, which indicates that it has not gone through feature selection. I am going to run pycytominer's default feature selection on this profile. Before I do so, can someone confirm whether this file has been normalized at all?
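
While waiting for confirmation, a rough heuristic can hint at the answer. The sketch below assumes that a profile normalized across the whole plate has per-feature medians near zero, whereas raw measurements (areas, intensities) usually do not. That assumption fails if normalization was done against control wells only, so a low score is inconclusive. The function names and the `tolerance` value are illustrative, not part of pycytominer.

```python
import csv
import gzip
import math
import statistics


def feature_medians(path, metadata_prefix="Metadata_"):
    """Return {feature_name: median} for each non-metadata column with numeric values."""
    columns = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            for name, value in row.items():
                if name is None or name.startswith(metadata_prefix):
                    continue
                try:
                    number = float(value)
                except (TypeError, ValueError):
                    continue  # skip blanks and non-numeric cells
                if not math.isnan(number):
                    columns.setdefault(name, []).append(number)
    return {name: statistics.median(values) for name, values in columns.items()}


def fraction_centered(medians, tolerance=0.5):
    """Fraction of features whose median lies within +/- tolerance of zero."""
    if not medians:
        return 0.0
    near_zero = sum(1 for m in medians.values() if abs(m) <= tolerance)
    return near_zero / len(medians)
```

A fraction close to 1.0 is consistent with a plate-wide normalized profile; anything else should be settled by checking how the file was produced.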

There are 20+ branching features for the progenitor cells, located here. I will add them to the feature-selected morphological features described in the paragraph above to generate the final progenitor profile for downstream analysis.
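
The combining step could look roughly like the sketch below, which joins branching columns onto morphology rows well by well. It assumes both profiles share `Metadata_Plate` and `Metadata_Well` as identifying columns; the thread does not confirm these names, so adjust `KEY_COLUMNS` to the actual metadata. The feature selection itself is left to pycytominer and is not shown.

```python
import csv
import gzip

# Assumed join keys; the actual identifying metadata columns may differ.
KEY_COLUMNS = ("Metadata_Plate", "Metadata_Well")


def read_profile(path):
    """Read a gzipped CSV profile into a list of row dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


def add_branching_features(morphology_rows, branching_rows, keys=KEY_COLUMNS):
    """Append non-metadata branching columns to each morphology row, matched on keys."""
    branching_by_key = {}
    for row in branching_rows:
        key = tuple(row[k] for k in keys)
        if key in branching_by_key:
            raise ValueError(f"duplicate key in branching profile: {key}")
        branching_by_key[key] = row

    merged = []
    for row in morphology_rows:
        key = tuple(row[k] for k in keys)
        if key not in branching_by_key:
            raise KeyError(f"no branching features for {key}")
        extra = {
            name: value
            for name, value in branching_by_key[key].items()
            if not name.startswith("Metadata_")
        }
        overlap = set(extra) & set(row)
        if overlap:
            raise ValueError(f"feature names present in both profiles: {sorted(overlap)}")
        merged.append({**row, **extra})
    return merged
```

The function raises on missing or duplicated wells instead of silently dropping rows, so a mismatch between the two profiles surfaces before downstream analysis.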

shntnu merged commit 0259a89 into master Nov 4, 2022
shntnu deleted the ss-clarify-datasets branch November 4, 2022 16:16