[NCP Progenitors 1] Profile 22q cohort progenitors (D4) #10

Closed
shntnu opened this issue Dec 3, 2020 · 64 comments

Comments

@shntnu
Collaborator

shntnu commented Dec 3, 2020

Goal

Perform Cell Painting on neural progenitor cells to delineate morphological traits that separate patients from controls during early forebrain development.

Experimental Design

Expected date for imaging: Done
Dyes: Cell Painting dyes
Cell type: Day 4 progenitors
Plates: 1 x 384-well
Plate layout: this will be identical to the layout used for the cmQTL project, consisting of 48 different lines segmented into 4-well blocks dispersed across the 384-well plate.
Plating parameters: 15k cells/well, fixed 24hrs post-plating (identified in our pilot)

Proposed analysis:

  1. Ensure we can stratify sample/feature profiles based on
    1. isolated cells
    2. colony-forming cells
  2. Identify particular features and organelle structures perturbed by the 22q11 deletion
  3. Compare ‘differential’ features between iPSCs and NPCs to identify whether there are shared pathways perturbed across cell states
  4. Use existing RNA-expression data to integrate imaging and molecular data

Metadata

@shntnu
Collaborator Author

shntnu commented Dec 3, 2020

@mtegtmey please upload the images here /imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

@mtegtmey
Collaborator

mtegtmey commented Dec 8, 2020

Images are uploaded!

@shntnu
Collaborator Author

shntnu commented Dec 8, 2020

For my notes because I keep looking around for the new instructions to upload from login01:

cd /imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

mv "Matt T Cell Painting"* BR_NCP_PROGENITORS_1 # rename the image folder to a standard name (the glob must sit outside the quotes to expand)

# now edit this line
#       <PlateID>Matt T Cell Painting LM 12012020</PlateID>
# to this
#      <PlateID>BR_NCP_PROGENITORS_1</PlateID>
emacs BR_NCP_PROGENITORS_1/Images/Index.idx.xml  

reuse UGER
ish -l h_vmem=4G -pe smp 4 # get a node
workon cellpntg2 # or whatever env in which you've installed awscli
aws configure # verify you're in the right account

aws s3 sync \
   /imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images \
   s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

Transfer is underway
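
The manual PlateID edit in the notes above could also be scripted. This is a minimal sketch, assuming the index file is UTF-8 text and contains a `<PlateID>…</PlateID>` element on one line, as shown in the comment. The function name `rename_plate_id` is hypothetical, and the file is treated as plain text rather than parsed as XML.

```python
import re
from pathlib import Path


def rename_plate_id(index_path, new_id):
    """Replace the text inside the first <PlateID> element; return the old value.

    Assumptions (not confirmed by the thread): the file is UTF-8, fits in
    memory, and holds <PlateID>...</PlateID> on a single line.
    """
    path = Path(index_path)
    text = path.read_text(encoding="utf-8")
    match = re.search(r"<PlateID>(.*?)</PlateID>", text)
    if match is None:
        raise ValueError(f"no <PlateID> element found in {index_path}")
    old_id = match.group(1)
    # Splice in the new ID so that nothing else in the file is touched.
    new_text = text[:match.start(1)] + new_id + text[match.end(1):]
    path.write_text(new_text, encoding="utf-8")
    return old_id
```

For example, `rename_plate_id("BR_NCP_PROGENITORS_1/Images/Index.idx.xml", "BR_NCP_PROGENITORS_1")` would perform the same change as the emacs step, and the returned old ID can be logged for the record.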

@shntnu
Collaborator Author

shntnu commented Dec 8, 2020

@pearlryder The images are ready for analysis
They live on /imaging/analysis at

/imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images 

and also on S3

s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

Feel free to pull in from either location

I think a good starting point to analyze these neuronal progenitor cells (Day 4) would be the pipeline used to analyze stem cells. For stem cells (a.k.a. NCP_STEM_1), I had reused an existing pipeline as mentioned here #7 (comment)

I can run DCP once the pipeline is configured if you prefer.

@shntnu
Collaborator Author

shntnu commented Dec 9, 2020

@mtegtmey could you comment on the priority for this one? Would bumping it to the new year work?

@mtegtmey
Collaborator

mtegtmey commented Dec 9, 2020

@shntnu it is high-priority, but bumping to the new year would be fine! For me, it would be ideal to try having profiles and 'feature differentials' (however you refer to them) by mid-Feb if that seems possible.

@shntnu
Collaborator Author

shntnu commented Dec 9, 2020

Thanks @mtegtmey ! @pearlryder feel free to make a call on prioritizing based on this info

@pearlryder

Thanks @mtegtmey! We're going to try to process this data before the end of this year, but it's great to know that we won't be holding you back too terribly if we need to wait until January. I'll keep you updated with our progress -- you can expect to hear from me by the end of next week. Cheers!

@pearlryder

Hi @mtegtmey and team,

I wanted to update everyone that we did have time to process these images and extracted the data over the weekend. We'll start the process of analyzing the data when I return to work in the New Year.

I hope everyone has a very happy holiday!

@mtegtmey
Collaborator

@pearlryder thank you so much for the update, and all your hard work getting to this point! Ralda and I so much appreciate the work all of you have done on this project, and cannot wait for all the exciting science we will get to do together over the coming years. It's a collaboration we value tremendously.

Have a wonderful holiday, 'see' you in the new year!

@shntnu
Collaborator Author

shntnu commented Jan 4, 2021

@pearlryder you can stop at the collate step i.e. just before https://cytomining.github.io/profiling-handbook/create-profiles.html#annotate and I'll handle things downstream

@pearlryder

Thanks @shntnu! I should have everything uploaded to AWS by the EOD tomorrow. I'll ping you here when it's ready.

@pearlryder

@shntnu, the analysis files are now available at s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/workspace/backend/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/.

The per-site analysis files are available at s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/

In double checking the .csv file, I noticed that 4 wells are missing data: F11, F12, O18, and P18. I looked at several images for these wells and confirmed that the wells appear to be empty / contain debris only. Please let me know if you have any questions!
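
A check like the one described above (finding wells with no rows in the .csv) can be sketched as follows. This is illustrative only: it assumes standard 384-well naming (rows A to P, columns 01 to 24), and the helper names are hypothetical. In practice the observed wells would come from whichever well column the .csv carries.

```python
from string import ascii_uppercase


def all_wells_384():
    """Return the 384 well names A01..P24 (16 rows x 24 columns)."""
    rows = ascii_uppercase[:16]  # A through P
    return [f"{row}{col:02d}" for row in rows for col in range(1, 25)]


def missing_wells(observed):
    """Return the wells of a 384-well plate that are absent from `observed`."""
    seen = set(observed)
    return [well for well in all_wells_384() if well not in seen]
```

Wells flagged this way still need the visual confirmation described in the comment, since a missing row only says that no objects were measured, not why.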

@shntnu
Collaborator Author

shntnu commented Jan 5, 2021

Awesome! Thanks @pearlryder

I noticed that the SQLite file is 100 GB (BR_NCP_STEM_1 was 25 GB). Was the cell density high?

@pearlryder

Yes @shntnu, most of the wells I examined were confluent. I just checked a few images from NCP_STEM_1 and they are indeed much lower density than the BR_NCP_PROGENITORS images (maybe ~ 50-75% confluency).

@mtegtmey
Collaborator

mtegtmey commented Jan 5, 2021

@shntnu This is something we should expect. The conditions for the NPCs are 15k cells per well with a 24hr incubation period (so they may proliferate) compared to 10k cells with a 6hr incubation for the stem cells.

@shntnu
Collaborator Author

shntnu commented Jan 5, 2021

Thanks @mtegtmey @pearlryder for clarifying!

@shntnu shntnu changed the title [NCP Progenitors 1] [NCP Progenitors 1] Profile 22q cohort progenitors (D4) Jun 17, 2021
@shntnu
Collaborator Author

shntnu commented Jun 24, 2021

@mtegtmey To get this off the ground: is there any specific advantage in starting with an analysis of the 4 branching metrics alone? Or would you rather just have the entire profile (4000+ features)?

@mtegtmey
Collaborator

mtegtmey commented Jun 24, 2021 via email

@shntnu
Collaborator Author

shntnu commented Jun 24, 2021

Sounds good

PS – you are tagging the wrong Shantanu :D This is a private repo so we are good. I'm @shntnu

@ruifanp
Collaborator

ruifanp commented Jul 15, 2021

I am looking at the D4 data. It seems right now that the inter-human variation is greater than the difference between controls and deletion in the progenitors.

image

If subjects 5,6 and 33 are removed:
image

However, there are still features that distinguish deletions from controls. There were 122 features that were statistically significant for control vs deletion in both stem cells and progenitors (out of 300+ effective features). Of those, about 75% changed in the same direction. I'll try some supervised methods and linear models to see if we can reliably distinguish controls from deletions despite the inter-human variation.
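
The "same direction" comparison can be illustrated with a small sketch. It assumes each cell state has a mapping from feature name to a signed effect (for example, deletion minus control); the function name and the input format are hypothetical and not taken from the analysis code.

```python
def direction_concordance(effects_a, effects_b):
    """Fraction of shared features whose effects have the same sign in both inputs.

    Each argument maps feature name -> signed effect (e.g. deletion minus control).
    Features missing from either mapping, or with a zero effect, are skipped.
    """
    shared = [
        name for name in effects_a
        if name in effects_b and effects_a[name] != 0 and effects_b[name] != 0
    ]
    if not shared:
        raise ValueError("no shared features with non-zero effects")
    same = sum((effects_a[name] > 0) == (effects_b[name] > 0) for name in shared)
    return same / len(shared)
```

Applied to the features significant in both stem cells and progenitors, a value near 0.5 would suggest no shared direction, while a value well above 0.5 suggests consistent effects across cell states.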


@shntnu
Collaborator Author

shntnu commented Jul 26, 2021

@ruifanp We wanted to see sample images for this plate

Please follow the steps here to do so

cytomining/cytoplot#8 (comment)

Ping me when you are stuck because I bet there are missing pieces of info

@shntnu
Collaborator Author

shntnu commented Jul 26, 2021

Oh, you will first need to download the images of course

  1. Follow steps here to set up the R environment
  2. Run this notebook on your system to download the sample images for the project. You may choose to edit this line to include only the dataset you care about right now (NCP_PROGENITORS_1):
datasets <- 
  tribble(
    ~batch, ~plate,
    "NCP_PROGENITORS_1", "BR_NCP_PROGENITORS_1"
  )

Note that you will need to run these lines on the command line to download the images:
https://github.com/broadinstitute/neuronal-cell-painting/blob/4ebc15074bb05a7e7ea09fe2a041d1a368d0a8a4/1.run-workflows/3.select_images_to_print.Rmd#L142-L158

  3. Create sample images for BR_NCP_PROGENITORS_1 by following the steps in "Visualize sample images from a plate in plate map view", cytomining/cytoplot#8 (comment)

@shntnu
Collaborator Author

shntnu commented Jul 28, 2021

@ruifanp can you please have this #10 (comment) squared away this week and tag @mtegtmey when you're done?

@mtegtmey
Collaborator

mtegtmey commented Aug 3, 2021

@ruifanp @shntnu any updates to this? I want to push a repeat experiment ASAP if necessary.

@mtegtmey
Collaborator

@shntnu @ruifanp repeat plate for the NPCs will be imaged tomorrow! Any specific place you'd like me to transfer the images once they're finished?

@shntnu
Collaborator Author

shntnu commented Sep 9, 2021

I am copying it now

aws s3 sync  --dryrun   /imaging/analysis/stanley/nehme_lab/cellpainting/22q11.2_NPC_8.31.21/BR00127194__2021-09-03T17_06_45-Measurement_2   s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images/BR00127194__2021-09-03T17_06_45-Measurement_2

I had to log in to an interactive node to do this; it can't be done from a login node

https://broadinstitute.slack.com/archives/C3QFX04P7/p1631212076006000?thread_ts=1598987109.024400&cid=C3QFX04P7

@shntnu
Collaborator Author

shntnu commented Sep 16, 2021

@bethac07 This is all set now

s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images/BR00127194__2021-09-03T17_06_45-Measurement_2

I'm not sure why it barfed twice, but looks good to go.

I believe Pearl's pipelines are here
https://imaging-platform.s3.us-east-1.amazonaws.com/projects/2019_05_28_Neuronal_Cell_Painting/workspace/pipelines/NCP_PROGENITORS_1

We want to run both analysis pipelines

Screen Shot 2021-09-16 at 1 45 49 PM

Please LMK if there's anything else you need.

@bethac07

Do you WANT the branch analysis run separately or just folded into the larger analysis?

@mtegtmey
Collaborator

mtegtmey commented Sep 16, 2021 via email

@shntnu
Collaborator Author

shntnu commented Sep 17, 2021

@mtegtmey @bethac07 IIUC the hands-on time for folding the branch analysis into the larger analysis is no longer than for running it separately, so let's run them together

@bethac07

So @rsenft1 and I were running this second batch through, and one thing we noticed is that the cell boundaries don't follow all the way out to the small dim processes. We confirmed that the same seems to be true in the first batch (see screenshot below from NCP_PROGENITORS_1/O-05 site 3). Is this the desired behavior? For this second batch, would you want us to a) match the results of the last batch as closely as possible or b) make the boundaries follow all these processes out? We can design a pipeline either way but wanted to get your thoughts on it.

image

@raldanehme
Collaborator

raldanehme commented Sep 17, 2021 via email

@bethac07

Ok, can do.

Do we need to rerun the first batch? Otherwise you may get different results from that batch and this one. Sorry, I haven't been in the loop enough to know whether this is intended to supplement or replace the plate from December.

@mtegtmey
Collaborator

mtegtmey commented Sep 17, 2021 via email

@bethac07

With the new settings, this is a more representative field of what we're seeing: green here is actin, magenta is DNA. Does this look like what you were expecting or hoping for with this cell type? If so, I can pull the trigger on analysis today or Monday.

image

@raldanehme
Collaborator

raldanehme commented Sep 17, 2021 via email

@bethac07

@shntnu Backends building now, do you want @rsenft1 and I to put them through a profiling workflow (and if so, the cytominer or pycytominer one, and where may we find the metadata for this set?) or just ping you that they're done?

@shntnu
Collaborator Author

shntnu commented Sep 21, 2021

@shntnu Backends building now, do you want @rsenft1 and I to put them through a profiling workflow

Yes, please

and if so, the cytominer or pycytominer one

let's stick with cytominer since that's what we did in this project

For your reference, here are the steps I followed
https://github.com/broadinstitute/neuronal-cell-painting/blob/master/1.run-workflows/generate_profiles.sh

and where may we find the metadata for this set?

https://github.com/broadinstitute/neuronal-cell-painting/tree/cb6b192df796866e690be07233fd7f3639620aa2/1.run-workflows/metadata/NCP_PROGENITORS_1

@bethac07

Done!

shntnu added a commit that referenced this issue Sep 22, 2021
@shntnu
Collaborator Author

shntnu commented Sep 22, 2021

Warp speed! Thank you!

image

The profiles have been added in #29

Over to you @ruifanp

@ruifanp
Collaborator

ruifanp commented Sep 27, 2021

@ruifanp
Collaborator

ruifanp commented Oct 14, 2021

@mtegtmey

We're seeing dramatic variance in the data which is driven by the cell count. I'm dealing with this by filtering out the wells with abnormal count and then de-correlating, which seems to have some positive effect. My question is can we make sure that the variability in cell count is due to technical effects rather than genetic? Is there any reason that a deletion should actually cause a different count than control?

Edit:
image

@mtegtmey
Collaborator

@ruifanp

It is possible the deletion has some sort of cell adherence phenotype, but it is more likely that this is due to technical effects rather than biological ones. Do you by chance have a plot showing the cell counts by donor? I'm curious whether it shows the same pattern we have seen previously (whereby earlier numbers have lower cell counts overall compared to later numbers).
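
A per-donor summary of cell counts, which could then be plotted, might be sketched like this. The input format (one `(donor, cell_count)` pair per well) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def median_count_by_donor(wells):
    """Map donor -> median per-well cell count.

    `wells` is an iterable of (donor, cell_count) pairs, one per well.
    """
    counts = defaultdict(list)
    for donor, count in wells:
        counts[donor].append(count)
    # Sort by donor so a trend across line numbers is easy to read off.
    return {donor: median(values) for donor, values in sorted(counts.items())}
```

Ordering the result by donor number makes it straightforward to see whether counts drift with line number.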

@ruifanp
Collaborator

ruifanp commented Oct 21, 2021

There is a wide range of objects. Previously, we have seen that abnormally low counts result in unreproducible and unreliable data, which mostly happens with the higher line numbers (see above).

Distribution of cell counts:
image

Following this, all wells with counts below 1000 or above 7500 were removed. The data were renormalized and feature selection was rerun. I also changed some of the categories I used for removing redundant features, and used pycytominer's replicate correlation function instead of my own. The Cells_AreaShape_Area feature, which is almost perfectly correlated with count, was also regressed out of the data.
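
The filtering and regression steps can be sketched as below. The thresholds come from the comment above; everything else is an assumption, since the thread does not say how the regression was implemented. Here it is shown as a one-covariate least-squares fit applied to a single feature, with hypothetical function names and a hypothetical `"count"` key per well.

```python
def filter_by_count(wells, low=1000, high=7500):
    """Keep wells whose 'count' lies within [low, high]."""
    return [w for w in wells if low <= w["count"] <= high]


def regress_out(values, covariate):
    """Return residuals of `values` after a least-squares fit on `covariate`.

    Fits values ~ intercept + slope * covariate and subtracts the fitted line.
    """
    n = len(values)
    if n != len(covariate) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]
```

In this sketch, `regress_out` would be called once per feature with Cells_AreaShape_Area as the covariate. The residuals are uncorrelated with the covariate by construction, which also means any real biological signal that tracks cell area is removed along with the technical effect.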

image

PCA shows some separation between controls and deletions, though not too much visually.

image

We can distinguish stem from progenitors easily using PCA though.

Logistic regression with 100 trials has an accuracy of 0.81 ± 0.068. This is an improvement over the previous run which had an accuracy of 0.67 ± 0.11.

@yhan8
Contributor

yhan8 commented Jun 8, 2022

I redid the logistic regression, splitting by patient number rather than using a purely random train/test split. Here are the results.

image

Note that the scores tend to be pretty unstable depending on the random state used for the splitting, so I ran with several random states and chose a representative score. It appears that restricting the data to wells with a cell count over 200 improves the quality of the data and leads to better separation of the classes.
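
A patient-level split of the kind described here can be sketched with the standard library alone. The function name, the default test fraction, and the input format (one patient identifier per well) are assumptions; the actual analysis may have used a different tool, for example scikit-learn's `GroupShuffleSplit`, which serves the same purpose.

```python
import random


def split_by_patient(patient_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no patient appears in both sets.

    `patient_ids` holds one patient identifier per well/profile.
    """
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Hold out whole patients, at least one, rather than individual wells.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx
```

Because each seed holds out different patients, looping over several seeds and reporting the spread of scores shows the instability mentioned above more directly than a single representative score.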

@shntnu If I am understanding this correctly here, you guys were trying to classify progenitors vs. stem cells using logistic regression while regressing out low cell count. The conclusion is that low cell count is not a driving factor and splitting based on patient id is unreliable. If this interpretation is correct, then I am unclear on what we want to predict with the neuronal cells since there is only one class?

@shntnu
Collaborator Author

shntnu commented Jun 15, 2022

@shntnu If I am understanding this correctly here, you guys were trying to classify progenitors vs. stem cells using logistic regression while regressing out low cell count. The conclusion is that low cell count is not a driving factor and splitting based on patient id is unreliable. If this interpretation is correct, then I am unclear on what we want to predict with the neuronal cells since there is only one class?

I skimmed this and couldn't recollect the rationale for this analysis – I can dig further but I am hoping that @ruifanp might be able to help us out here

@yhan8
Contributor

yhan8 commented Jun 27, 2022

@shntnu If I am understanding this correctly here, you guys were trying to classify progenitors vs. stem cells using logistic regression while regressing out low cell count. The conclusion is that low cell count is not a driving factor and splitting based on patient id is unreliable. If this interpretation is correct, then I am unclear on what we want to predict with the neuronal cells since there is only one class?

I skimmed this and couldn't recollect the rationale for this analysis – I can dig further but I am hoping that @ruifanp might be able to help us out here

Pinging @shntnu so this is on his radar.

@ruifanp
Collaborator

ruifanp commented Jul 5, 2022

@yhan8

Apologies for the late response; I just got back from my travels and had some GitHub access issues I needed to work out.

The logistic regression was actually attempting to classify diseased vs healthy samples at the stem cell or progenitor stage. Classifying stem vs progenitor cells is actually very easily done since they look very different, as seen from the clear separation in the PCA plot above. The challenge is being able to differentiate controls from deletions at the same development level (stem or progenitors). While any model was able to predict control vs deletions with much greater than random chance, the prediction scores were often unstable, especially in the progenitors. Oftentimes, there was greater variation between individuals of the same condition (deletion or control) than individuals of different conditions.

Anomalous cell counts had a large impact on the phenotype (see the PCA plot with the long, spread-out tail of progenitor deletions; those tend to be low cell counts), so we thought that regressing out the cell count could improve the quality of the data. Unfortunately, doing so removed too much signal or left too much noise to call it an improvement. Removing wells with low cell count was undoubtedly better, but it's hard to say what the cutoff should be when the cell counts per well smoothly cover such a wide range.

@shntnu
Collaborator Author

shntnu commented Jul 5, 2022

Thank you @ruifanp!

@yhan8 and I chatted today and she has all the information she needs to proceed

@shntnu shntnu mentioned this issue Nov 2, 2022
@shntnu shntnu closed this as completed Nov 14, 2024