Exporting rankgenegroups to pandas dataframes #3486

Closed
Fougere87 opened this issue Feb 26, 2025 · 2 comments
Labels
Triage 🩺 This issue needs to be triaged by a maintainer

Comments

@Fougere87

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Hi scanpy team, and thank you for your work :-)

I wrote a small function that exports the rank_genes_groups results to pandas DataFrames, and I'd like to propose it:

Here is the code:

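import pandas as pd


def get_df_of_rank_gene_groups(adata, key="rank_genes_groups"):
    """Return the differentially expressed genes stored in adata.uns[key].

    Returns a dict {group: DataFrame}, or a single DataFrame if only one
    group was tested.
    """
    rgg = dict(adata.uns[key])  # shallow copy so entries can be dropped safely
    rgg.pop("params", None)  # run parameters are not per-gene results
    groups = list(rgg["names"].dtype.names)
    # Per-gene record arrays (names, scores, pvals, ...); pts/pts_rest are DataFrames
    fields = [f for f in rgg if f not in ("pts", "pts_rest")]

    d_return = {}
    for g in groups:
        # Renamed the loop variable to avoid shadowing the `key` argument
        df_g = pd.DataFrame({f: rgg[f][g] for f in fields})
        df_g = df_g.set_index("names")

        if "pts" in rgg:
            pts = rgg["pts"]  # indexed by var_names, one column per group
            if "pts_rest" in rgg:
                # reference="rest": fraction of expressing cells in all other cells
                pts_rest = rgg["pts_rest"][g]
            else:
                # reference is a single group: its column is the one not in `groups`
                ref_col = pts.columns.difference(groups)[0]
                pts_rest = pts[ref_col]
            pts_df = pd.DataFrame({"pts": pts[g], "pts_rest": pts_rest},
                                  index=adata.var_names)
            # Left join keeps only the ranked genes; concat(axis=1) would add
            # NaN rows for every unranked gene when n_genes < n_vars
            df_g = df_g.join(pts_df, how="left")
        d_return[g] = df_g

    # Return a bare DataFrame when only one group was tested
    if len(groups) > 1:
        return d_return
    return d_return[groups[0]]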

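Depending on the number of groups, it returns either a dictionary of DataFrames (one per group) or a single DataFrame, containing gene names, scores, p-values and, if rank_genes_groups was run with pts=True, the fraction of cells expressing each gene.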

I'm sure it needs improvements, but it can be really useful for exporting DEGs for publication or sharing.
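For the export step itself, a minimal sketch assuming the multi-group output of get_df_of_rank_gene_groups (a dict mapping group name to a DataFrame indexed by gene). The helper name export_deg_tables and the DEGs_*.csv file names are hypothetical; only standard pandas calls (DataFrame.to_csv, pd.concat) are used.

```python
import os
import tempfile

import pandas as pd


def export_deg_tables(tables, out_dir):
    """Write one CSV per group plus one combined long-format CSV.

    `tables` is assumed to be a dict {group name: DataFrame indexed by gene},
    i.e. the multi-group output of get_df_of_rank_gene_groups.
    Returns the combined DataFrame and the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for group, df in tables.items():
        # Hypothetical naming scheme: DEGs_<group>.csv
        path = os.path.join(out_dir, f"DEGs_{group}.csv")
        df.to_csv(path, index_label="names")
        paths.append(path)

    # Stack all groups into one table with explicit "group" and "names" columns
    combined = pd.concat(tables, names=["group", "names"]).reset_index()
    combined_path = os.path.join(out_dir, "DEGs_all_groups.csv")
    combined.to_csv(combined_path, index=False)
    paths.append(combined_path)
    return combined, paths


if __name__ == "__main__":
    # Made-up toy tables, only to show the call; not real results
    toy = {
        "0": pd.DataFrame({"scores": [2.0, 1.0]}, index=["gene_a", "gene_b"]),
        "1": pd.DataFrame({"scores": [3.0]}, index=["gene_c"]),
    }
    with tempfile.TemporaryDirectory() as tmp:
        combined, written = export_deg_tables(toy, tmp)
        print(combined)
        print(written)
```

Writing both per-group files and one long table is a design choice: the long format is easier to filter or share as a single supplementary table, while the per-group files mirror the dict structure.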

Best

Fougere87 added the "Triage 🩺" label (This issue needs to be triaged by a maintainer) on Feb 26, 2025
@maltekuehl

Hi, I'm not a scanpy maintainer; I just happened to see this. Are you aware of sc.get.rank_genes_groups_df? What additional functionality does your code add, and could it be implemented as a change to the existing function?

@Fougere87
Author

Hi,
Thank you, Malte! No, I was not aware of that one in the get module. I believe it does everything we need, since it also seems to include pts.
So all good!
