Hi Tu,
Thanks for your question! Could you provide more details about your
intended use case? For example, are you performing a meta-analysis,
training a model, or something else?
In general, rather than using a machine with more memory, I would recommend
executing your query lazily, so that you never need to hold the entire
dataset in memory. You can do this with our ExperimentDataPipe class,
which retrieves data from the Census lazily, in batches.
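As a rough illustration of the batching idea only, here is a minimal sketch in plain Python. It does not use the ExperimentDataPipe API; `fake_cell_source`, `iter_batches`, and `count_by_assay` are hypothetical names made up for this example, and the generated rows are placeholder data rather than anything from the Census. The point is that an aggregate can be computed while only one batch is held in memory at a time.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def iter_batches(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size rows; only one batch is held at a time."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def fake_cell_source(n_cells: int) -> Iterator[dict]:
    """Hypothetical stand-in for a remote query; rows are produced on demand."""
    assays = ["10x 3' v2", "10x 3' v3", "Smart-seq2"]  # placeholder labels
    for i in range(n_cells):
        yield {"cell_id": i, "assay": assays[i % len(assays)]}


def count_by_assay(n_cells: int, batch_size: int) -> Counter:
    """Aggregate per-assay counts batch by batch, never building the full list."""
    counts: Counter = Counter()
    for batch in iter_batches(fake_cell_source(n_cells), batch_size):
        counts.update(row["assay"] for row in batch)
    return counts


if __name__ == "__main__":
    print(count_by_assay(n_cells=323_000, batch_size=10_000))
```

With this pattern, peak memory scales with `batch_size` rather than with the total number of cells, which is the same trade-off a lazy, batched reader makes.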
You can find more details about using ExperimentDataPipe in our Census
documentation: Training a PyTorch Model (Create an ExperimentDataPipe)
<https://chanzuckerberg.github.io/cellxgene-census/notebooks/experimental/pytorch.html#Create-an-ExperimentDataPipe>.
Let us know if this solution works for your use case, or feel free to
provide more context if you need additional guidance.
On Mon, Oct 14, 2024 at 12:10 PM Tu Hu ***@***.***> wrote:
Hi there,
I plan to read 323 thousand cells into memory. Is that too much? It
takes a really long time to respond.
I am running on a cluster with 16 vCPUs and 96 GB of memory. Should I use
more RAM?
Any advice on how to optimize this?
    import cellxgene_census

    with cellxgene_census.open_soma(census_version="2024-07-01") as census:
        adata = cellxgene_census.get_anndata(
            census=census,
            organism="Homo sapiens",
            obs_value_filter="tissue == 'lung' and cell_type == 'CD4-positive, alpha-beta T cell'",
            column_names={"obs": ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"]},
        )

    print(adata)