Manual: Manual and help page containing a detailed explanation of the EPEPT input parameters and results
EPEPT Version 2.0
EPEPT: A web service for enhanced P-value estimation in permutation tests. Theo Knijnenburg, Jake Lin, Hector Rovira, John Boyle, Ilya Shmulevich
Fewer Permutations, More Accurate P-Values. Theo Knijnenburg, Lodewyk Wessels, Marcel Reinders, Ilya Shmulevich
Institute for Systems Biology, 401 Terry Ave North, Seattle, WA 98109-5234, US, (206) 732-1200
Software License: The software presented in this project is offered under the following open source license: GNU Lesser GPL.
Workflow
A data file is required for the EPEPT web application. Once the file is selected, fill in the other input parameters and click the Execute button. The page refreshes automatically (depending on your browser settings) and, once the run has completed, displays the results (see screenshot below). If the email address field is filled in, a message containing the results URL is sent to that address. See the web service clients for details of the HTTP workflow in different programming languages.
Inputs
File: Required input file containing either test statistics with their permutation values or a labeled dataset. Example datasets are available for download. The file should be a tab-delimited text file, a comma-separated text file, or an Excel file. EPEPT checks the file extension to decide on the format. Excel files should have the .xls or .xlsx extension, with the data on the first sheet. Comma-separated files should have the .csv extension. All other files are assumed to be tab-delimited text files.
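The extension rule above can be sketched as a small helper. You could use it to check a file locally before uploading it. This is an illustration, not EPEPT's own code. The name `detect_input_format` is hypothetical. Matching extensions case-insensitively is an assumption that the manual does not state.

```python
from pathlib import Path


def detect_input_format(filename: str) -> str:
    """Guess the EPEPT input format from the file extension.

    .xls/.xlsx -> "excel", .csv -> "csv", anything else -> "tab".
    Lower-casing the extension is an assumption, not documented behaviour.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in (".xls", ".xlsx"):
        return "excel"   # data is expected on the first sheet
    if suffix == ".csv":
        return "csv"     # comma-separated text
    return "tab"         # default: tab-delimited text
```

For example, `detect_input_format("perm_values.txt")` returns `"tab"`, because any unrecognised extension falls through to tab-delimited text.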
In the Permutation Values setting, each column in the file should contain one test statistic and its corresponding permutation values. Since multiple columns are allowed, different events (e.g. different genes or gene sets) can be tested simultaneously, yet independently.
Header Row: The file may have one header row. If a header row is present, the test statistics should be on the second row; otherwise, they should be on the first row.
All numerical values in the rows below the test statistic are taken to be permutation values. Non-numerical values, NaNs (not a number) and Infs (infinite values) are ignored. Each column needs at least 1,000 permutation values for the tail estimation procedure to be used.
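The Permutation Values layout described above can be checked before upload with a sketch like the one below. The layout is: an optional header row, then the test statistic, then the permutation values, with non-numerical, NaN and Inf cells ignored. `parse_permutation_table` and its `tail_ok` flag are hypothetical. The sketch assumes the rows have already been split into cells, for example by Python's `csv.reader` with the appropriate delimiter. It only checks the 1,000-value threshold and does not reproduce EPEPT's own parsing.

```python
import math


def _to_float(cell):
    """Return the cell as a finite float, or None if it is non-numerical, NaN or Inf."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def parse_permutation_table(rows, has_header=False, min_perms=1000):
    """Split a Permutation Values table into one record per column.

    rows: list of rows, each a list of cell strings (e.g. from csv.reader).
    Each record holds the column name, the test statistic, the permutation
    values that survive filtering, and whether there are at least min_perms
    of them (needed for tail estimation). Hypothetical helper, not EPEPT code.
    """
    start = 1 if has_header else 0
    header = rows[0] if has_header else []
    stat_row = rows[start]                      # test statistics
    n_cols = max(len(row) for row in rows[start:])
    columns = []
    for j in range(n_cols):
        statistic = _to_float(stat_row[j]) if j < len(stat_row) else None
        perms = []
        for row in rows[start + 1:]:            # everything below the statistic
            value = _to_float(row[j]) if j < len(row) else None
            if value is not None:               # drop text, NaN, Inf, empty cells
                perms.append(value)
        name = header[j] if j < len(header) else f"col{j + 1}"
        columns.append({
            "name": name,
            "statistic": statistic,
            "perms": perms,
            "tail_ok": len(perms) >= min_perms,
        })
    return columns
```

A typical call reads the file first. For example, `rows = list(csv.reader(f, delimiter="\t"))` for a tab-delimited file opened with `newline=""`, followed by `parse_permutation_table(rows, has_header=True)`.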
In the SAM and GSEA settings, each column should contain the expression levels of all genes in the dataset for one sample. The first row should contain the class labels (or other response values) assigned to the columns. The allowed configurations of the first row follow the `resp.type` options of the samR package (see the Response type parameter).
The first column can be used as a header column for the gene names.
EPEPT Mode: Three modes are available. The default is Permutation Values, where the expected input is a matrix of test statistics and their permutation values. The second mode is SAM (Significance Analysis of Microarrays), which requires the user to upload a labeled microarray gene expression dataset. EPEPT uses the samR package to compute permutation values, which are then used for P-value estimation. The third mode is GSEA (Gene Set Enrichment Analysis), which requires a labeled microarray gene expression dataset and a file with gene sets in the .gmt format. In this case, EPEPT uses the GSA package to compute permutation values, which are then used for P-value estimation.
Estimation method: Three methods are available to estimate the parameters of the generalized Pareto distribution (GPD), which models the tail of the distribution of the permutation values. These are probability weighted moments (PWM), maximum likelihood (ML), and method of moments (MOM). In tests on theoretical distributions and practical applications, we found that all three methods performed comparably. Other studies comparing these estimators have often favored ML.
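To illustrate the tail-modelling idea, the sketch below fits a generalized Pareto distribution to the exceedances over a threshold t. It uses the probability weighted moments (PWM) estimator of Hosking and Wallis (1987) and turns the fit into a tail P-value, P ≈ (N_exc / N) · (1 − F(x0 − t)). This is not EPEPT's implementation. The function names are hypothetical. The threshold rule (midway between the 250th and 251st largest permutation values) and the empirical fallback below the threshold are assumptions made for illustration. The ML and MOM estimators are not shown.

```python
import math


def fit_gpd_pwm(excesses):
    """PWM fit of a generalized Pareto distribution (Hosking & Wallis, 1987).

    Returns (xi, sigma) for F(y) = 1 - (1 + xi * y / sigma) ** (-1 / xi).
    """
    y = sorted(excesses)
    n = len(y)
    if n < 2:
        raise ValueError("need at least two excesses")
    a0 = sum(y) / n
    # Unbiased estimate of E[Y * (1 - F(Y))] from the ascending order statistics.
    a1 = sum((n - 1 - i) / (n - 1) * y[i] for i in range(n)) / n
    denom = a0 - 2.0 * a1
    if denom <= 0.0:
        raise ValueError("PWM fit undefined for this sample (shape >= 1)")
    k = a0 / denom - 2.0          # Hosking-Wallis shape parameter
    sigma = 2.0 * a0 * a1 / denom
    return -k, sigma              # xi = -k in the usual parametrization


def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi and scale sigma, y >= 0."""
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)   # exponential limit as xi -> 0
    z = 1.0 + xi * y / sigma
    if z <= 0.0:                      # beyond the finite upper end point (xi < 0)
        return 0.0
    return z ** (-1.0 / xi)


def tail_pvalue(x0, perms, n_exceed=250):
    """Estimate P(X >= x0) from permutation values with a GPD tail fit.

    n_exceed and the midpoint threshold are illustrative assumptions.
    """
    xs = sorted(perms, reverse=True)
    n = len(xs)
    if n <= n_exceed:
        raise ValueError("not enough permutation values for a tail fit")
    t = 0.5 * (xs[n_exceed - 1] + xs[n_exceed])   # threshold between order statistics
    xi, sigma = fit_gpd_pwm([x - t for x in xs[:n_exceed]])
    if x0 <= t:
        # Simplification: outside the tail, use the empirical fraction instead.
        return sum(x >= x0 for x in xs) / n
    return n_exceed / n * gpd_survival(x0 - t, xi, sigma)
```

The point of the tail fit is that `tail_pvalue` can return a nonzero estimate for a test statistic larger than every permutation value. In that case the empirical fraction would simply be 0.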
See the paper for more details on the estimation methods.
Confidence interval: The confidence interval of the estimated P-value indicates the reliability of the estimate. It is qualified by the confidence level (default 95%). Loosely speaking, the confidence level indicates how sure (e.g. 95% sure) we can be that the actual P-value lies within the confidence interval. The level can be set between 10 and 99.
Confidence interval checkbox: Determines whether the confidence interval is computed.
Optimal order preserving transform checkbox: Determines whether the optimal order preserving transform is applied.
Convergence criteria checkbox: Determines whether the convergence criteria are applied.
Random seed: If a number between 1 and 1,000,000 is given, it is used as the random seed, allowing the user to reproduce EPEPT runs. With the default value 0, the random seed is chosen arbitrarily.
Response type (SAM/GSEA mode): When EPEPT generates permutation values in the SAM or GSEA setting, the user can choose the response type. The value must be one of the following: Quantitative, Two class unpaired, Two class paired, Survival, or Multiclass.
NPerms (SAM/GSEA mode): When EPEPT generates permutation values in the SAM or GSEA setting, the user can choose the number of permutations to perform. In the SAM setting the maximum is 1,000. SAM evaluates the P-value of one gene using the permutation values of all genes, which effectively multiplies the number of permutations by the number of genes. In the GSEA setting the maximum is 10,000.
Geneset_file (GSEA mode): When EPEPT generates permutation values in the GSEA setting, a file with gene set annotations in gene matrix transposed (.gmt) format must be given. This tab-delimited text file contains one gene set per row. The first two columns contain the gene set ID and description.
The remaining columns contain the genes in that gene set. The gene identifiers should match the gene annotation in the header column of the gene expression data file.
GSEA statistic (GSEA mode): When EPEPT generates permutation values in the GSEA setting, the user can choose the statistic used to summarize gene sets. The value must be one of the following: maxmean, mean, or absmean.
Email address (optional): An email is sent to the given address when the EPEPT run completes. The message contains links to the results and logs. The email address is kept confidential and will not be used for any other purpose or shared with other systems. On the results page, the address is displayed as xxx@your_domain for added security.
Outputs
Estimated P-values: The main output of EPEPT is the set of estimated P-values, reported in a tab-delimited text file. Any headers in the original file are included in the output file. If confidence intervals were requested, the two rows under the P-value estimates give the lower and upper bounds of the confidence intervals. If convergence criteria were applied, an additional row contains binary values indicating whether each estimate converged (1) or not (0).
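The output layout described above can be read back with a sketch like the following. It assumes two things the manual does not spell out: the header row, when present, is the first row, and there is no row-label column. Check both against an actual results file. `parse_epept_output` is a hypothetical helper, not part of EPEPT.

```python
import csv
import io


def parse_epept_output(text, has_header, has_ci, has_convergence):
    """Split EPEPT's tab-delimited P-value output into per-column records.

    Assumed row order (no row-label column):
      [header row]            if the input file had one
      P-value estimates
      [lower bounds]          if confidence intervals were requested
      [upper bounds]          if confidence intervals were requested
      [converged flags 0/1]   if convergence criteria were applied
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    names = rows.pop(0) if has_header else None
    pvals = [float(v) for v in rows.pop(0)]
    lower = upper = converged = None
    if has_ci:
        lower = [float(v) for v in rows.pop(0)]
        upper = [float(v) for v in rows.pop(0)]
    if has_convergence:
        converged = [v.strip() == "1" for v in rows.pop(0)]
    records = []
    for j, p in enumerate(pvals):
        rec = {"name": names[j] if names else f"col{j + 1}", "p": p}
        if has_ci:
            rec["ci"] = (lower[j], upper[j])
        if has_convergence:
            rec["converged"] = converged[j]
        records.append(rec)
    return records
```

The three flags should mirror the options chosen for the run: a header in the input file, the Confidence interval checkbox, and the Convergence criteria checkbox.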
Images (.png and .eps format) are generated to visually depict the estimated P-values and their confidence bounds.
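For GSEA mode, the Geneset_file parameter expects a .gmt file with one tab-delimited gene set per row: the ID, the description, then the genes. The gene identifiers must match the gene names in the header column of the expression file. A small pre-submission check could look like the sketch below. `parse_gmt` and `unmatched_genes` are hypothetical helpers, not part of EPEPT. Requiring at least one gene per row and skipping blank lines and empty trailing cells are assumptions.

```python
def parse_gmt(text):
    """Parse .gmt text into {set_id: (description, [genes, ...])}.

    One gene set per line: ID, description, then genes, all tab-separated.
    """
    gene_sets = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected ID, description and at least one gene")
        set_id, description, *genes = fields
        gene_sets[set_id] = (description, [g for g in genes if g])  # drop empty cells
    return gene_sets


def unmatched_genes(gene_sets, expression_genes):
    """Map each set ID to its genes that are missing from the expression file's gene column."""
    known = set(expression_genes)
    result = {}
    for set_id, (_, genes) in gene_sets.items():
        missing = sorted(set(genes) - known)
        if missing:
            result[set_id] = missing
    return result
```

Pass the first column of the expression data file (minus any header cell) as `expression_genes`. Any set reported by `unmatched_genes` contains identifiers that will not line up with the expression data.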