
Memory issues in random dataset generation and processing #41

Open
knaegle opened this issue Jan 4, 2021 · 1 comment

Comments

@knaegle
Contributor

knaegle commented Jan 4, 2021

While trying to run the PDX S/T example, we run out of memory even on a cluster (error below). We also cannot run a large number of replicates on the TCGA data with multiprocessing, because of an error similar to the one seen before (negative bit issues). Both suggest that we need to adjust the data streams to minimize the amount of data passed around.

  1. Update the code to minimize the size of the data passed between processes in the multiprocessing calculations.
  2. Update the assembly of random data so that it does not require a table the size of the entire phosphoproteome, but instead grows as merging increases the number of sites seen.
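A minimal sketch of item 2, assuming the phosphoproteome is available as an array of site identifiers and each random experiment is a binary column marking its sampled sites. `assemble_random_datasets` and its arguments are hypothetical names, not part of the KSTAR API. Each new random column is outer-joined on the site index, so the combined table only contains sites that have been sampled so far rather than every site in the phosphoproteome.

```python
import numpy as np
import pandas as pd

def assemble_random_datasets(phosphoproteome, sizes, seed=0):
    """Hypothetical helper: build one binary column per random experiment,
    growing the row index only with sites that have actually been sampled."""
    rng = np.random.default_rng(seed)
    combined = None
    for i, n in enumerate(sizes):
        # draw n distinct sites for random experiment i
        sites = rng.choice(phosphoproteome, size=n, replace=False)
        col = pd.DataFrame(
            {f"random_{i}": np.ones(n, dtype=np.int8)},
            index=pd.Index(sites, name="site"),
        )
        # outer join on the site index: rows are added only for newly seen sites
        combined = col if combined is None else combined.join(col, how="outer")
    # sites not drawn in a given experiment become 0; keep the compact dtype
    return combined.fillna(0).astype(np.int8)
```

Repeated joins re-copy the growing table, so with many replicates it may be cheaper to collect the columns in a list and call `pd.concat(columns, axis=1)` once, which also takes the union of the site indexes.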

Traceback (most recent call last):
  File "run_PDX.py", line 40, in <module>
    kstar_activity.normalize_analysis(kinact_dict, activity_log, num_random_experiments, target_alpha)
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/kstar/activity/kstar_activity.py", line 999, in normalize_analysis
    kinact.run_normalization(log, num_random_experiments, target_alpha)
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/kstar/activity/kstar_activity.py", line 125, in run_normalization
    self.random_kinact.calculate_kinase_activities( agg='count', threshold=1.0, greater=True )
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/kstar/activity/kstar_activity.py", line 401, in calculate_kinase_activities
    filtered_evidence_list = [self.evidence_binary[self.evidence_binary[col] ==1 ] for col in self.data_columns]
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/kstar/activity/kstar_activity.py", line 401, in <listcomp>
    filtered_evidence_list = [self.evidence_binary[self.evidence_binary[col] ==1 ] for col in self.data_columns]
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 2890, in __getitem__
    return self._getitem_bool_array(key)
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 2944, in _getitem_bool_array
    return self._take_with_is_copy(indexer, axis=0)
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 3354, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 3342, in take
    indices, axis=self._get_block_manager_axis(axis), verify=True
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1415, in take
    new_axis=new_labels, indexer=indexer, axis=axis, allow_dups=True
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1259, in reindex_indexer
    for blk in self.blocks
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1259, in <listcomp>
    for blk in self.blocks
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 1251, in take_nd
    values, indexer, axis=axis, allow_fill=allow_fill, fill_value=fill_value
  File "/home/kmn4mj/.local/lib/python3.7/site-packages/pandas/core/algorithms.py", line 1678, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError: Unable to allocate 354. MiB for an array with shape (3750, 12385) and data type int64
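For scale: the failed allocation is a dense 3750 × 12385 int64 array, i.e. 3750 × 12385 × 8 = 371,550,000 bytes ≈ 354.3 MiB, which matches the error. The list comprehension in `calculate_kinase_activities` keeps one filtered copy of `evidence_binary` per data column in memory at the same time. The sketch below checks that arithmetic and shows one possible lower-memory alternative, assuming downstream code only needs to know which sites carry evidence in each column: store the index labels per column instead of a full filtered DataFrame. `mib` and `evidence_sites_per_column` are illustrative names, not KSTAR functions.

```python
import numpy as np
import pandas as pd

def mib(n_rows, n_cols, dtype):
    """Size in MiB of a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**20

def evidence_sites_per_column(evidence_binary, data_columns):
    """Hypothetical alternative to
    [evidence_binary[evidence_binary[col] == 1] for col in data_columns]:
    keep only the row labels where each column equals 1, instead of
    copying every column of the matching rows for each data column."""
    return {
        col: evidence_binary.index[evidence_binary[col].to_numpy() == 1]
        for col in data_columns
    }

print(round(mib(3750, 12385, np.int64), 1))  # 354.3, as in the MemoryError
print(round(mib(3750, 12385, np.int8), 1))   # 44.3 if stored as int8
```

Independently of the per-column index idea, storing the binary evidence as `int8` instead of `int64` would shrink the same array roughly eightfold.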

@srcrowl
Contributor

srcrowl commented Nov 7, 2022

Steps we are taking to address the memory issues:

  • Add an option that does not save every generated random dataset, to reduce the memory burden
  • Explore using a single random dataset for all samples in an experiment (this requires that the distribution of study bias is similar across samples)
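A minimal sketch of the first bullet, assuming each random experiment is a sample of `n_sites` sites drawn from a pool and that only a per-dataset score is needed afterwards. `random_activity_stream`, `score_fn` and `save_datasets` are hypothetical names, not existing KSTAR parameters. Each random dataset is scored as soon as it is drawn and then discarded unless `save_datasets=True`, so peak memory holds one dataset plus the score array instead of all replicates.

```python
import numpy as np

def random_activity_stream(site_pool, n_sites, n_random, score_fn,
                           save_datasets=False, seed=0):
    """Hypothetical helper: draw n_random random datasets of n_sites sites,
    scoring each immediately. Datasets are kept only if save_datasets is True."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_random, dtype=float)
    saved = [] if save_datasets else None
    for i in range(n_random):
        # sample sites without replacement for this random experiment
        sample = rng.choice(site_pool, size=n_sites, replace=False)
        scores[i] = score_fn(sample)
        if save_datasets:
            saved.append(sample)
        # otherwise `sample` is released when it is rebound on the next iteration
    return scores, saved
```

If the study-bias distributions of the samples are similar (second bullet), the returned `scores` could be computed once and reused for every sample in the experiment instead of being regenerated per sample.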
