Permutations above 1K #5

Open
dmoaks opened this issue Jan 26, 2020 · 4 comments
@dmoaks

dmoaks commented Jan 26, 2020

Hello,
I am trying to increase the accuracy of the nominal p-value by increasing the number of permutations, but when I increase nperm from 1K to 10K it returns the error:
"Error in if (obs.phi[i, j] >= 0) { :
missing value where TRUE/FALSE needed"

Everything works fine when I set nperm to 1K. Please advise on how to fix this.
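For readers hitting the same message: in R, this error is raised whenever the condition of an `if` evaluates to `NA`. The sketch below reproduces it with a small hypothetical matrix standing in for `obs.phi` (the real matrix is built inside GSEA and is not shown in this thread), and then shows one way to locate missing or non-finite entries. It illustrates the error class only and does not claim to identify the cause in this report.

```r
# Hypothetical stand-in for the matrix named in the error message.
# Filled column-wise, so the NA sits at row 2, column 1.
obs.phi <- matrix(c(0.4, NA, -0.2, 0.1), nrow = 2)

# An `if` whose condition is NA stops with the reported error.
err <- tryCatch(
  {
    if (obs.phi[2, 1] >= 0) "non-negative" else "negative"
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Locate entries that are NA, NaN or infinite before comparing them.
bad <- which(!is.finite(obs.phi), arr.ind = TRUE)
print(bad)
```

If the intermediate score matrices are available (for example via `save.intermediate.results`), the same `is.finite` check could be run on them to see which gene set and permutation produce the missing value.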

@ACastanza
Collaborator

Hi,
I was not able to reproduce this issue in either phenotype or gene set permutation mode with nperm=10000.
Could you please provide the exact command you used to initialize GSEA?

@ACastanza ACastanza self-assigned this Jan 31, 2020
@ACastanza
Collaborator

@dmoaks we still need additional information to follow up on this issue. Please let us know if you are still experiencing it or were able to resolve it on your own. If we don't hear back in the next week or so, I'll be closing the report.

@dmoaks
Author

dmoaks commented Feb 12, 2020

@ACastanza I'm sorry for the delay. I forgot that my GitHub account was linked to a less-used email.
I am still having the same issue: the run works with 1K permutations but fails with 10K. FYI, I am using 'preranked' mode.
Here is the exact command:

@dmoaks
Author

dmoaks commented Feb 12, 2020

Posting the command again because the code formatting made it hard to read:
GSEA(
# Input/Output Files :-------------------------------------------------------------------------------
input.ds = inputds, # Input gene expression dataset file in GCT format
input.cls = NA, # Input class vector (phenotype) file in CLS format
gs.db = gsdb, # Gene set database in GMT format
input.chip = "NOCHIP", # CHIP File
output.directory = dir, # Directory where to store output and results (default: "")
# Program parameters :-------------------------------------------------------------------------------
doc.string = name, # Documentation string used as a prefix to name result files (default: "GSEA.analysis")
# Run in interactive (i.e. R GUI) or batch (R command line) mode (default: F)
reshuffling.type = "gene.labels", # Type of permutation reshuffling: "sample.labels" or "gene.labels" (default: "sample.labels")
nperm = 10000, # Number of random permutations (default: 1000)
weighted.score.type = 1, # Enrichment correlation-based weighting: 0=no weight (KS), 1=weighted, 2=over-weighted (default: 1)
nom.p.val.threshold = -1, # Significance threshold for nominal p-vals for gene sets (default: -1, no thres)
fwer.p.val.threshold = -1, # Significance threshold for FWER p-vals for gene sets (default: -1, no thres)
fdr.q.val.threshold = 0.05, # Significance threshold for FDR q-vals for gene sets (default: 0.25)
topgs = 20, # Besides those passing test, number of top scoring gene sets used for detailed reports (default: 10)
adjust.FDR.q.val = F, # Adjust the FDR q-vals (default: F)
gs.size.threshold.min = 1, # Minimum size (in genes) for database gene sets to be considered (default: 25)
gs.size.threshold.max = 5000, # Maximum size (in genes) for database gene sets to be considered (default: 500)
reverse.sign = F, # Reverse direction of gene list (pos. enrichment becomes negative, etc.) (default: F)
preproc.type = 0, # Preproc.normalization: 0=none, 1=col(z-score)., 2=col(rank) and row(z-score)., 3=col(rank). (def: 0)
random.seed = as.integer(as.POSIXct(Sys.time())), # Random number generator seed. (default: 123456)
perm.type = 0, # For experts only. Permutation type: 0 = unbalanced, 1 = balanced (default: 0)
fraction = 1.0, # For experts only. Subsampling fraction. Set to 1.0 (no resampling) (default: 1.0)
replace = F, # For experts only, Resampling mode (replacement or not replacement) (default: F)
collapse.dataset = F, # Collapse dataset to gene symbols using a user provided chip file (default: F)
collapse.mode = "NOCOLLAPSE",
save.intermediate.results = T, # For experts only, save intermediate results (e.g. matrix of random perm. scores) (default: F)
use.fast.enrichment.routine = T, # Use faster routine to compute enrichment for random permutations (default: T)
gsea.type = 'preranked', # Select Standard GSEA (default) or preranked
rank.metric = 'S2N'
)
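
One note on reproducing a report like this: the command above derives `random.seed` from the current time, so every run uses a different seed. A minimal sketch of recording the seed so that a failing run can be repeated is shown below. It uses only base R; whether GSEA passes `random.seed` directly to `set.seed` is an assumption here, and the variable names are illustrative.

```r
# Derive the seed the same way as in the command above, but keep it in a
# variable and log it so the exact run can be repeated later.
seed <- as.integer(as.POSIXct(Sys.time()))
message("random.seed = ", seed)

# With the same seed, base R's random permutations are identical.
set.seed(seed)
first_draw <- sample(10)
set.seed(seed)
second_draw <- sample(10)

print(identical(first_draw, second_draw))
```

Passing the logged value as `random.seed = seed` on a later run would then, under the assumption above, give the maintainers the same permutations that triggered the error.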
