Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subset regions from cistromes params and fix dotplot repressors #337

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

DanieFD
Copy link

@DanieFD DanieFD commented Mar 26, 2024

Added the "path to regions to subset" as a parameter in the snakemake config file. This allows giving the path to the bed file containing the MACS called peaks that one would like to subset when creating cistromes (hence, cell type-specific cistromes). It can also be left empty (empty string) if no regions should be subsetted.

Also, one small change in the regulons dotplot: considered all regulons with "-" as repressors.

Copy link
Collaborator

@SeppeDeWinter SeppeDeWinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Danie

Thank you for the PR.
Could you have a look at the comments I made.

Best,

Seppe

multiome_mudata_fname: pathlib.Path,
out_file_direct_annotation: pathlib.Path,
out_file_extended_annotation: pathlib.Path,
out_file_tf_names: pathlib.Path,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason why you added blank lines between the parameters? Otherwise I would delete them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no reason. I will delete them

adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
paths_to_motif_enrichment_results=paths_to_motif_enrichment_results,
scplus_regions=set(mdata['scATAC'].var_names),
scplus_regions=regions,
subset_regions = regions_to_overlap, #could be the set of all ATAC regions or a subset if subset is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this function call within the if statement block.

if path_to_regions_to_subset is not None:
    [...]
    adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
        [...],
        subset_regions = regions_to_overlap
    )
else:
    adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
        [...],
    )

log.info("Reading multiome MuData.")
mdata = mudata.read(multiome_mudata_fname.__str__())
log.info("Getting cistromes.")
regions = set(mdata['scATAC'].var_names)
regions_to_overlap = set(mdata['scATAC'].var_names)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this within the if statement, see next comment as well.

parser.add_argument(
"--path_to_regions_to_subset", dest="path_to_regions_to_subset",
action="store", type=str, required=False,default ="",
help="Path to bed file for regions to subset when merging cistromes (MACS called peaks).")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can be any bed file right, don't necessarily need to be peaks called by MACS? If this is the case I would remove "(MACS called peaks)" from the documentation to avoid confusion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True could be removed. or could add for example. Will fix it

extended = True))
return cistromes

def _overlap_if_necessary(s_query_regions, test_regions, regions_to_overlap):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken this function returns either a set (else block) or a pyranges object (if block), while the Cistrome class expects a set for the target_regions parameter.

Also I would run this function not while providing the arguments to clearly seperate these two things (i.e. 1. initiating the class and 2. putting the subset region in the same coordinate system as SCENIC+).

@@ -124,7 +124,7 @@ def heatmap_dotplot(
# Plotting
plotnine.options.figure_size = figsize
plotting_df["repressor_activator"] = [
"activator" if "+" in n.split("_")[2] else "repressor" for n in plotting_df[feature_name_key]]
"repressor" if "-" in n.split("_")[2] else "activator" for n in plotting_df[feature_name_key]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, that was an obvious mistake from my part. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants