diff --git a/COPYING.txt b/COPYING.txt new file mode 100644 index 0000000..b68ddb7 --- /dev/null +++ b/COPYING.txt @@ -0,0 +1,181 @@ +License + + Section of Biomedical Image Analysis + Department of Radiology + University of Pennsylvania + 3700 Hamilton Walk, Floor 7 + Philadelphia, PA 19104 + + Web: http://www.med.upenn.edu/sbia/software.html + Email: software at cbica.upenn.edu + + + +SBIA Contribution and Software License Agreement ("Agreement") +============================================================== + + Version 1.1 (June 29, 2017) + + This Agreement covers contributions to and downloads from Software maintained by + the Section of Biomedical Image Analysis ("SBIA"), Center for Biomedical Image + Computing and Analytics ("CBICA"), Department of Radiology at the University + of Pennsylvania. Part A of this Agreement applies to contributions of + software and/or data to the Software (including making revisions of or additions + to code and/or data already in this Software). Part B of this Agreement applies to + downloads of software and/or data from SBIA. Part C of this Agreement applies to + all transactions with SBIA. If you distribute Software (as defined below) downloaded + from SBIA, all of the paragraphs of Part B of this Agreement must be included with + and apply to such Software. + + Your contribution of software and/or data to SBIA (including prior to the date + of the first publication of this Agreement, each a "Contribution") and/or + downloading, copying, modifying, displaying, distributing or use of any software + and/or data from SBIA (collectively, the "Software") constitutes acceptance of + all of the terms and conditions of this Agreement. If you do not agree to such + terms and conditions, you have no right to contribute your Contribution, or to + download, copy, modify, display, distribute or use the Software. + + PART A. CONTRIBUTION AGREEMENT - LICENSE TO SBIA WITH RIGHT TO SUBLICENSE ("CONTRIBUTION AGREEMENT"). 
+ ----------------------------------------------------------------------------------------------------- + + 1. As used in this Contribution Agreement, "you" means the individual contributing + the Contribution to the Software maintained by SBIA and the institution or entity + which employs or is otherwise affiliated with such individual in connection with + such Contribution. + + 2. This Contribution Agreement applies to all Contributions made to the Software + maintained by SBIA, including without limitation Contributions made prior to + the date of first publication of this Agreement. If at any time you make a + Contribution to the Software, you represent that (i) you are legally authorized + and entitled to make such Contribution and to grant all licenses granted in this + Contribution Agreement with respect to such Contribution; (ii) if your + Contribution includes any patient data, all such data is de-identified in + accordance with U.S. confidentiality and security laws and requirements, + including but not limited to the Health Insurance Portability and Accountability + Act (HIPAA) and its regulations, and your disclosure of such data for the purposes + contemplated by this Agreement is properly authorized and in compliance with all + applicable laws and regulations; and (iii) you have preserved in the Contribution + all applicable attributions, copyright notices and licenses for any third party + software or data included in the Contribution. + + 3. Except for the licenses granted in this Agreement, you reserve all right, + title and interest in your Contribution. + + 4. You hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, + non-exclusive, no charge, royalty-free, irrevocable license to use, reproduce, + make derivative works of, display and distribute the Contribution. 
If your + Contribution is protected by patent, you hereby grant to SBIA, with the right + to sublicense, a perpetual, worldwide, non-exclusive, no-charge, royalty-free, + irrevocable license under your interest in patent rights covering the Contribution, + to make, have made, use, sell and otherwise transfer your Contribution, alone + or in combination with any other code. + + 5. You acknowledge and agree that SBIA may incorporate your Contribution into + the Software and may make the Software available to members of the public + on an open source basis under terms substantially in accordance with the + Software License set forth in Part B of this Agreement. You further acknowledge + and agree that SBIA shall have no liability arising in connection with claims + resulting from your breach of any of the terms of this Agreement. + + 6. YOU WARRANT THAT TO THE BEST OF YOUR KNOWLEDGE YOUR CONTRIBUTION DOES NOT + CONTAIN ANY CODE THAT REQUIRES OR PRESCRIBES AN "OPEN SOURCE LICENSE" FOR + DERIVATIVE WORKS (by way of non-limiting example, the GNU General Public + License or other so-called "reciprocal" license that requires any derived + work to be licensed under the GNU General Public License or other + "open source license"). + + PART B. DOWNLOADING AGREEMENT - LICENSE FROM SBIA WITH RIGHT TO SUBLICENSE ("SOFTWARE LICENSE"). + ------------------------------------------------------------------------------------------------ + + 1. As used in this Software License, "you" means the individual downloading and/or + using, reproducing, modifying, displaying and/or distributing the Software and + the institution or entity which employs or is otherwise affiliated with such + individual in connection therewith. 
The Section of Biomedical Image Analysis, + Department of Radiology at the Universiy of Pennsylvania ("SBIA") hereby grants + you, with right to sublicense, with respect to SBIA's rights in the software, + and data, if any, which is the subject of this Software License (collectively, + the "Software"), a royalty-free, non-exclusive license to use, reproduce, make + derivative works of, display and distribute the Software, provided that: + (a) you accept and adhere to all of the terms and conditions of this Software + License; (b) in connection with any copy of or sublicense of all or any portion + of the Software, all of the terms and conditions in this Software License shall + appear in and shall apply to such copy and such sublicense, including without + limitation all source and executable forms and on any user documentation, + prefaced with the following words: "All or portions of this licensed product + (such portions are the "Software") have been obtained under license from the + Section of Biomedical Image Analysis, Department of Radiology at the University + of Pennsylvania and are subject to the following terms and conditions:" + (c) you preserve and maintain all applicable attributions, copyright notices + and licenses included in or applicable to the Software; (d) modified versions + of the Software must be clearly identified and marked as such, and must not + be misrepresented as being the original Software; and (e) you consider making, + but are under no obligation to make, the source code of any of your modifications + to the Software freely available to others on an open source basis. + + 2. 
The license granted in this Software License includes without limitation the + right to (i) incorporate the Software into proprietary programs (subject to + any restrictions applicable to such programs), (ii) add your own copyright + statement to your modifications of the Software, and (iii) provide additional + or different license terms and conditions in your sublicenses of modifications + of the Software; provided that in each case your use, reproduction or + distribution of such modifications otherwise complies with the conditions + stated in this Software License. + + 3. This Software License does not grant any rights with respect to third party + software, except those rights that SBIA has been authorized by a third + party to grant to you, and accordingly you are solely responsible for + (i) obtaining any permissions from third parties that you need to use, + reproduce, make derivative works of, display and distribute the Software, + and (ii) informing your sublicensees, including without limitation your + end-users, of their obligations to secure any such required permissions. + + 4. The Software has been designed for research purposes only and has not been + reviewed or approved by the Food and Drug Administration or by any other + agency. YOU ACKNOWLEDGE AND AGREE THAT CLINICAL APPLICATIONS ARE NEITHER + RECOMMENDED NOR ADVISED. Any commercialization of the Software is at the + sole risk of the party or parties engaged in such commercialization. + You further agree to use, reproduce, make derivative works of, display + and distribute the Software in compliance with all applicable governmental + laws, regulations and orders, including without limitation those relating + to export and import control. + + 5. The Software is provided "AS IS" and neither SBIA nor any contributor to + the software (each a "Contributor") shall have any obligation to provide + maintenance, support, updates, enhancements or modifications thereto. 
+ SBIA AND ALL CONTRIBUTORS SPECIFICALLY DISCLAIM ALL EXPRESS AND IMPLIED + WARRANTIES OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. + IN NO EVENT SHALL SBIA OR ANY CONTRIBUTOR BE LIABLE TO ANY PARTY FOR + DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES + HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY ARISING IN ANY WAY RELATED + TO THE SOFTWARE, EVEN IF SBIA OR ANY CONTRIBUTOR HAS BEEN ADVISED OF THE + POSSIBILITY OF SUCH DAMAGES. TO THE MAXIMUM EXTENT NOT PROHIBITED BY LAW OR + REGULATION, YOU FURTHER ASSUME ALL LIABILITY FOR YOUR USE, REPRODUCTION, + MAKING OF DERIVATIVE WORKS, DISPLAY, LICENSE OR DISTRIBUTION OF THE SOFTWARE + AND AGREE TO INDEMNIFY AND HOLD HARMLESS SBIA AND ALL CONTRIBUTORS FROM + AND AGAINST ANY AND ALL CLAIMS, SUITS, ACTIONS, DEMANDS AND JUDGMENTS ARISING + THEREFROM. + + 6. None of the names, logos or trademarks of CBICA/SBIA or any of CBICA/SBIA's affiliates + or any of the Contributors, or any funding agency, may be used to endorse + or promote products produced in whole or in part by operation of the Software + or derived from or based on the Software without specific prior written + permission from the applicable party. + + 7. Any use, reproduction or distribution of the Software which is not in accordance + with this Software License shall automatically revoke all rights granted to you + under this Software License and render Paragraphs 1 and 2 of this Software + License null and void. + + 8. This Software License does not grant any rights in or to any intellectual + property owned by SBIA or any Contributor except those rights expressly + granted hereunder. + + + PART C. MISCELLANEOUS + --------------------- + + This Agreement shall be governed by and construed in accordance with the laws + of The Commonwealth of Pennsylvania without regard to principles of conflicts + of law. 
This Agreement shall supercede and replace any license terms that you + may have agreed to previously with respect to Software from SBIA. diff --git a/Pre_computed_results/HYDRA_results_test.mat b/Pre_computed_results/HYDRA_results_test.mat new file mode 100644 index 0000000..b918b35 Binary files /dev/null and b/Pre_computed_results/HYDRA_results_test.mat differ diff --git a/README.txt b/README.txt new file mode 100644 index 0000000..7e6cd46 --- /dev/null +++ b/README.txt @@ -0,0 +1,143 @@ +HYDRA + + Section of Biomedical Image Analysis + Department of Radiology + University of Pennsylvania + Richard Building + 3700 Hamilton Walk, 7th Floor + Philadelphia, PA 19104 + + Web: https://www.med.upenn.edu/sbia/ + Email: sbia-software at uphs.upenn.edu + + Copyright (c) 2018 University of Pennsylvania. All rights reserved. + See https://www.med.upenn.edu/sbia/software-agreement.html or COPYING file. + +Author: +Erdem Varol +software@cbica.upenn.edu + +=============== +1. INTRODUCTION +=============== +This software performs clustering of heterogeneous disease patterns within a patient group. The clustering is based on separating the patient imaging features from the control imaging features using a convex polytope classifier. Covariate correction can be performed optionally. + + +=============== +2. TESTING & INSTALLATION +=============== + +This software has been primarily implemented in MATLAB for Linux operating systems. + +---------------- + Requirements +---------------- +- Matlab optimization toolbox +- Matlab version >2014 + + +---------------- + Installation +---------------- + +Hydra can be run directly in a matlab environment without compilation. 
+ +OPTIONAL: + +If the user wants to run hydra as a standalone executable, then it must be compiled as follows (using the separately obtained MATLAB compiler "mcc"): + +Run the following command in a MATLAB environment: + + mcc -m hydra.m + +----------------- + Test +----------------- +We provided a test sample in the test folder. + +To test in a MATLAB environment, use the command: + +hydra('-i','test.csv','-o','.','-z','test_covar.csv','-k',3,'-f',3) + +To test on the command line using the compiled executable, use the command: + +hydra -i test.csv -o . -z test_covar.csv -k 3 -f 3 + +This runs a HYDRA experiment which may take a few minutes. The test case contains a subset of a functional MRI study dataset by T. Satterthwaite comprising 100 subjects and their functional ROIs. The output is the clustering labels of the input subjects (only patients are clustered) at varying clustering levels. Also, the clustering stability at varying levels is output to show the rationale for choosing the clustering level. + +----------------- + Test Verification +----------------- + +Pre-computed HYDRA results have been included in directory "Pre_computed_results". The user may verify that their test results match the pre-computed results to confirm proper set-up. If the clustering occurred properly, ARI for clustering level k=3 should be greater than that of clustering level k=2. + +========== +3. USAGE +========== + +I. Running "HYDRA": + +Here is a brief introduction to running HYDRA. For a complete list of parameters, see the --help option. + +To run this software, you will need an input csv file, with the following mandatory fields in the following column order: +(Column 1) ID: ID for subject +(Column 2---(last minus 1)) features: features to be used for clustering +(Column (last)) groups: label whether the subject is control (-1) or patient (1) + +NOTE: Controls must be labeled strictly -1 and patients strictly 1. +NOTE: Column header names are not strict. 
+ +An example input csv file looks as follows: + +ID, feature_1, feature_2, feature_3, group +subject_1, 5, 1, 79.3, -1 +subject_2, 10, 1, 71.4, 1 +subject_3, 3, 1, 82.7, -1 + +Optionally, you can provide a covariate file that will be used to remove covariate effects from imaging features before HYDRA analysis. The covariate file has the following format: +(Column 1) ID: ID for subject +(Column 2---(last)) covariates: covariates of subjects + +An example covariate csv file looks as follows: + +ID, age, sex +subject_1, 29, 1 +subject_2, 35, 1 +subject_3, 51, 0 + +If you install the package successfully, there will be two ways of running HYDRA: + +1. Running HYDRA in a matlab environment, a simple example: + + hydra('-i','test.csv','-o','.','-z','test_covar.csv','-k',3,'-f',3) + +2. Running matlab compiled HYDRA executables in the command line, a simple example: + + hydra -i test.csv -o . -z test_covar.csv -k 3 -f 3 + + +The software returns: + + +1. HYDRA_results.mat in the specified output directory. + +This mat file stores the following variables: + +CIDX - clustering indices for subjects (rows) at varying levels (columns) +ARI - adjusted Rand index of clustering at varying levels; the clustering level with the highest ARI should be selected +ID - subject ID of rows + +=========== +4. REFERENCE +=========== + +If you find this software useful, please cite: + +Varol, Erdem, Aristeidis Sotiras, Christos Davatzikos, and Alzheimer's Disease Neuroimaging Initiative. "HYDRA: Revealing heterogeneity of imaging and genetic patterns through a multiple max-margin discriminative analysis framework." NeuroImage 145 (2017): 346-364. + +=========== +5. LICENSING +=========== + + See https://www.med.upenn.edu/sbia/software-agreement.html or the COPYING.txt file. 
+ diff --git a/hydra.m b/hydra.m new file mode 100644 index 0000000..c6d3eb1 --- /dev/null +++ b/hydra.m @@ -0,0 +1,635 @@ +% HYDRA +% Version 1.0.0 --- January 2018 +% Section of Biomedical Image Analysis +% Department of Radiology +% University of Pennsylvania +% Richard Building +% 3700 Hamilton Walk, 7th Floor +% Philadelphia, PA 19104 +% +% Web: https://www.med.upenn.edu/sbia/ +% Email: sbia-software at uphs.upenn.edu +% +% Copyright (c) 2018 University of Pennsylvania. All rights reserved. +% See https://www.med.upenn.edu/sbia/software-agreement.html or COPYING file. +% +% Author: +% Erdem Varol +% software@cbica.upenn.edu + + +function [CIDX,ARI] = hydra(varargin) + +if nargin==0 + printhelp() + return +end + +if( strcmp(varargin{1},'--help') || isempty(varargin)) + printhelp() + return; +end + +if( strcmp(varargin{1},'-h') || isempty(varargin) ) + printhelp() + return +end + +if( strcmp(varargin{1},'--version') || isempty(varargin) ) + fprintf('Version 1.0.\n') + return +end + +if( strcmp(varargin{1},'-v') || isempty(varargin) ) + fprintf('Version 1.0.\n') + return +end + +if( strcmp(varargin{1},'-u') || isempty(varargin) ) + fprintf(' EXAMPLE USE (in matlab) \n'); + fprintf(' hydra(''-i'',''test.csv'',''-o'',''.'',''-k'',3,''-f'',3) \n'); + fprintf(' EXAMPLE USE (in command line) \n'); + fprintf(' hydra -i test.csv -o . -k 3 -f 3 \n'); + return +end + +if( strcmp(varargin{1},'--usage') || isempty(varargin) ) + fprintf(' EXAMPLE USE (in matlab) \n'); + fprintf(' hydra(''-i'',''test.csv'',''-o'',''.'',''-k'',3,''-f'',3) \n'); + fprintf(' EXAMPLE USE (in command line) \n'); + fprintf(' hydra -i test.csv -o . -k 3 -f 3 \n'); + return +end + +% function returns estimated subgroups by hydra for clustering +% configurations ranging from K=1 to K=10, or another specified range of +% values. The function returns also the Adjusted Rand Index that was +% calculated across the cross-validation experiments and comparing +% respective clustering solutions. 
+% +% INPUT +% +% REQUIRED +% [--input, -i] : .csv file containing the input features. (REQUIRED) +% every column of the file contains values for a feature, with +% the exception of the first and last columns. We assume that +% the first column contains subject identifying information +% while the last column contains label information. First line +% of the file should contain header information. Label +% convention: -1 -> control group; 1 -> pathological group +% that will be partitioned into subgroups +% [--outputDir, -o] : directory where the output from all folds will be saved (REQUIRED) +% +% OPTIONAL +% +% [--cov, -z] : .csv file containing values for different covariates, which +% will be used to correct the data accordingly (OPTIONAL). Every +% column of the file contains values for a covariate, with the +% exception of the first column, which contains subject +% identifying information. Correction is performed by solving a +% least-squares problem to estimate the respective +% coefficients and then removing their effect from the data. The +% effect of ALL provided covariates is removed. If no file is +% specified, no correction is performed. +% +% NOTE: featureCSV and covCSV files are assumed to have the subjects given +% in the same order in their rows +% +% [--c, -c] : regularization parameter (positive scalar). Smaller values produce +% sparser models (OPTIONAL - Default 0.25) +% [--reg_type, -r] : determines regularization type. 1 -> promotes sparsity in the +% estimated hyperplanes - 2 -> L2 norm (OPTIONAL - Default 1) +% [--balance, -b] : takes into account differences in the number of subjects between the two +% classes. 1-> in case there is mismatch between the number of +% controls and patients - 0-> otherwise (OPTIONAL - Default 1) +% [--init, -g] : initialization strategy. 
0 : assignment by random hyperplanes +% (not supported for regression), 1 : pure random assignment, 2: +% k-means assignment, 3: assignment by DPP random +% hyperplanes (default) +% [--iter, -t] : number of iterations alternating between hyperplane estimation and cluster +% estimation. Default is 50. Increase if the algorithm fails to +% converge +% [--numconsensus, -n] : number of clustering consensus steps. Default is 20. +% Increase if the algorithm gives unstable clustering results. +% [--kmin, -m] : determines the range of clustering solutions to evaluate +% (i.e., kmin to kmax). Default value is 1. +% [--kmax, -k] : determines the range of clustering solutions to evaluate +% (i.e., kmin to kmax). Default value is 10. +% [--kstep, -s] : determines the range of clustering solutions to evaluate +% (i.e., kmin to kmax, with step kstep). Default value is 1. +% [--cvfold, -f]: number of folds for cross validation. Default value is 10. +% [--vo, -j] : verbose output (i.e., also saves input data to verify that all were +% read correctly). Default value is 0. +% [--usage, -u] Prints basic usage message. +% [--help, -h] Prints help information. +% [--version, -v] Prints information about software version. +% +% OUTPUT: +% CIDX: sub-clustering assignments of the disease population (positive +% class). +% ARI: adjusted Rand index measuring the overlap/reproducibility of +% clustering solutions across folds +% +% NOTE: to compile this function do +% mcc -m hydra.m +% +% +% EXAMPLE USE (in matlab) +% hydra('-i','test.csv','-o','.','-k',3,'-f',3); +% EXAMPLE USE (in command line) +% hydra -i test.csv -o . 
-k 3 -f 3 + + +params.kernel=0; + + + +if( sum(or(strcmpi(varargin,'--input'),strcmpi(varargin,'-i')))==1) + featureCSV=varargin{find(or(strcmpi(varargin,'--input'),strcmp(varargin,'-i')))+1}; +else + error('hydra:argChk','Please specify input csv file!'); +end + + +if( sum(or(strcmpi(varargin,'--outputDir'),strcmpi(varargin,'-o')))==1) + outputDir=varargin{find(or(strcmp(varargin,'--outputDir'),strcmp(varargin,'-o')))+1}; +else + error('hydra:argChk','Please specify output directory!'); +end + + +if( sum(or(strcmpi(varargin,'--cov'),strcmpi(varargin,'-z')))==1) + covCSV=varargin{find(or(strcmpi(varargin,'--cov'),strcmp(varargin,'-z')))+1}; +else + covCSV=[]; +end + +if( sum(or(strcmpi(varargin,'--c'),strcmpi(varargin,'-c')))==1) + params.C=varargin{find(or(strcmpi(varargin,'--c'),strcmp(varargin,'-c')))+1}; +else + params.C=0.25; +end + +if( sum(or(strcmpi(varargin,'--reg_type'),strcmpi(varargin,'-r')))==1) + params.reg_type=varargin{find(or(strcmpi(varargin,'--reg_type'),strcmp(varargin,'-r')))+1}; +else + params.reg_type=1; +end + +if( sum(or(strcmpi(varargin,'--balance'),strcmpi(varargin,'-b')))==1) + params.balanceclasses=varargin{find(or(strcmpi(varargin,'--balance'),strcmp(varargin,'-b')))+1}; +else + params.balanceclasses=1; +end + +if( sum(or(strcmpi(varargin,'--init'),strcmpi(varargin,'-g')))==1) + params.init_type=varargin{find(or(strcmpi(varargin,'--init'),strcmp(varargin,'-g')))+1}; +else + params.init_type=3; +end + +if( sum(or(strcmpi(varargin,'--iter'),strcmpi(varargin,'-t')))==1) + params.numiter=varargin{find(or(strcmpi(varargin,'--iter'),strcmp(varargin,'-t')))+1}; +else + params.numiter=50; +end + +if( sum(or(strcmpi(varargin,'--numconsensus'),strcmpi(varargin,'-n')))==1) + params.numconsensus=varargin{find(or(strcmpi(varargin,'--numconsensus'),strcmp(varargin,'-n')))+1}; +else + params.numconsensus=20; +end + +if( sum(or(strcmpi(varargin,'--kmin'),strcmpi(varargin,'-m')))==1) + 
params.kmin=varargin{find(or(strcmpi(varargin,'--kmin'),strcmp(varargin,'-m')))+1}; +else + params.kmin=1; +end + +if( sum(or(strcmpi(varargin,'--kstep'),strcmpi(varargin,'-s')))==1) + params.kstep=varargin{find(or(strcmpi(varargin,'--kstep'),strcmp(varargin,'-s')))+1}; +else + params.kstep=1; +end + +if( sum(or(strcmpi(varargin,'--kmax'),strcmpi(varargin,'-k')))==1) + params.kmax=varargin{find(or(strcmpi(varargin,'--kmax'),strcmp(varargin,'-k')))+1}; +else + params.kmax=10; +end + +if( sum(or(strcmpi(varargin,'--cvfold'),strcmpi(varargin,'-f')))==1) + params.cvfold=varargin{find(or(strcmpi(varargin,'--cvfold'),strcmp(varargin,'-f')))+1}; +else + params.cvfold=10; +end + +if( sum(or(strcmpi(varargin,'--vo'),strcmpi(varargin,'-j')))==1) + params.vo=varargin{find(or(strcmpi(varargin,'--vo'),strcmp(varargin,'-j')))+1}; +else + params.vo=0; +end + +% create output directory +if (~exist(outputDir,'dir')) + [status,~,~] = mkdir(outputDir); + if (status == 0) + error('hydra:argChk','Cannot create output directory!'); + end +end + + +params.C=input2num(params.C); +params.reg_type=input2num(params.reg_type); +params.balanceclasses=input2num(params.balanceclasses); +params.init_type=input2num(params.init_type); +params.numiter=input2num(params.numiter); +params.numconsensus=input2num(params.numconsensus); +params.kmin=input2num(params.kmin); +params.kstep=input2num(params.kstep); +params.kmax=input2num(params.kmax); +params.cvfold=input2num(params.cvfold); +params.vo=input2num(params.vo); + + +% confirm validity of optional input arguments +validateFcn_reg_type = @(x) (x==1) || (x == 2); +validateFcn_balance = @(x) (x==0) || (x == 1); +validateFcn_init = @(x) (x==0) || (x == 1) || (x==2) || (x == 3) || (x == 4); +validateFcn_iter = @(x) isscalar(x) && (x>0) && (mod(x,1)==0); +validateFcn_consensus = @(x) isscalar(x) && (x>0) && (mod(x,1)==0); +validateFcn_kmin = @(x) isscalar(x) && (x>0) && (mod(x,1)==0); +validateFcn_kmax = @(x,y) isscalar(x) && (x>0) && (mod(x,1)==0) && 
(x>y); +validateFcn_kstep = @(x,y,z) isscalar(x) && (x>0) && (mod(x,1)==0) && (x+y<z); +validateFcn_cvfold = @(x) isscalar(x) && (x>0) && (mod(x,1)==0); +validateFcn_vo = @(x) (x==0) || (x == 1); + +if(~validateFcn_reg_type(params.reg_type)) + error('hydra:argChk','Input regularization type (reg_type) should be either 1 or 2!'); +end +if(~validateFcn_balance(params.balanceclasses)) + error('hydra:argChk','Input balance classes (balance) should be either 0 or 1!'); +end +if(~validateFcn_init(params.init_type)) + error('hydra:argChk','Initialization type can be either 0, 1, 2, 3, or 4!'); +end +if(~validateFcn_iter(params.numiter)) + error('hydra:argChk','Number of iterations should be a positive integer!'); +end +if(~validateFcn_consensus(params.numconsensus)) + error('hydra:argChk','Number of clustering consensus steps should be a positive integer!'); +end +if(~validateFcn_kmin(params.kmin)) + error('hydra:argChk','Minimum number of clustering solutions to consider should be a positive integer!'); +end +if(~validateFcn_kmax(params.kmax,params.kmin)) + error('hydra:argChk','Maximum number of clustering solutions to consider should be a positive integer that is greater than the minimum number of clustering solutions!'); +end +if(~validateFcn_kstep(params.kstep,params.kmin,params.kmax)) + error('hydra:argChk','Step number of clustering solutions to consider should be a positive integer that is between the minimum and maximum number of clustering solutions!'); +end +if(~validateFcn_cvfold(params.cvfold)) + error('hydra:argChk','Number of folds for cross-validation should be a positive integer!'); +end +if(~validateFcn_vo(params.vo)) + error('hydra:argChk','VO parameter should be either 0 or 1!'); +end + +disp('Done'); +disp('HYDRA runs with the following parameters'); +disp(['featureCSV: ' featureCSV]); +disp(['OutputDir: ' outputDir]); +disp(['covCSV: ' covCSV]) +disp(['C: ' num2str(params.C)]); +disp(['reg_type: ' num2str(params.reg_type)]); +disp(['balanceclasses: ' num2str(params.balanceclasses)]); 
+disp(['init_type: ' num2str(params.init_type)]); +disp(['numiter: ' num2str(params.numiter)]); +disp(['numconsensus: ' num2str(params.numconsensus)]); +disp(['kmin: ' num2str(params.kmin)]); +disp(['kmax: ' num2str(params.kmax)]); +disp(['kstep: ' num2str(params.kstep)]); +disp(['cvfold: ' num2str(params.cvfold)]); +disp(['vo: ' num2str(params.vo)]); + +% csv with features +fname=featureCSV; +if (~exist(fname,'file')) + error('hydra:argChk','Input feature .csv file does not exist'); +end + +% csv with covariates +covfname=covCSV; +if(~isempty(covfname)) + if(~exist(covfname,'file')) + error('hydra:argChk','Input covariate .csv file does not exist'); + end +end + +% input data +% assumption is that the first column contains IDs, and the last contains +% labels +disp('Loading features...'); +input=readtable(fname); +ID=input{:,1}; +XK=input{:,2:end-1}; +Y=input{:,end}; + +% z-score imaging features +XK=zscore(XK); +disp('Done'); + +% input covariate information if necessary +if(~isempty(covfname)) + disp('Loading covariates...'); + covardata = readtable(covfname) ; + IDcovar = covardata{:,1}; + covar = covardata{:,2:end}; + covar = zscore(covar); + disp('Done'); +end + +% NOTE: we assume that the imaging data and the covariate data are given in +% the same order. No test is performed to check that. 
By choosing to have a +% verbose output, you can have access to the ID values that are read by the +% software for both the imaging data and the covariates + +% verify that we have covariate data and imaging data for the same number +% of subjects +if(~isempty(covfname)) + if(size(covar,1)~=size(XK,1)) + error('hydra:argChk','The feature .csv and covariate .csv files contain data for a different number of subjects'); + end +end + +% residualize covariates if necessary +if(~isempty(covfname)) + disp('Residualize data...'); + [XK0,~]=GLMcorrection(XK,Y,covar,XK,covar); + disp('Done'); +else + XK0=XK; +end + +% for each realization of cross-validation +clustering=params.kmin:params.kstep:params.kmax; +part=make_xval_partition(size(XK0,1),params.cvfold); %Partition data into cvfold groups for cross validation +% for each fold of the k-fold cross-validation +disp('Run HYDRA...'); +for f=1:params.cvfold + % for each clustering solution + for kh=1:length(clustering) + params.k=clustering(kh); + disp(['Applying HYDRA for ' num2str(params.k) ' clusters. 
Fold: ' num2str(f) '/' num2str(params.cvfold)]); + model=hydra_solver(XK0(part~=f,:),Y(part~=f,:),[],params); + YK{kh}(part~=f,f)=model.Yhat; + end +end +disp('Done'); + +disp('Estimating clustering stability...') +% estimate cluster stability for the cross-validation experiment +ARI = zeros(length(clustering),1); +for kh=1:length(clustering) + tmp=cv_cluster_stability(YK{kh}(Y~=-1,:)); + ARI(kh)=tmp(1); +end +disp('Done') + +disp('Estimating final consensus group memberships...') +% Computing final consensus group memberships +CIDX=-ones(size(Y,1),length(clustering)); %variable that stores subjects in rows, and cluster memberships for the different clustering solutions in columns +for kh=1:length(clustering) + CIDX(Y==1,kh)=consensus_clustering(YK{kh}(Y==1,:),clustering(kh)); +end +disp('Done') + +disp('Saving results...') +if(params.vo==0) + save([outputDir '/HYDRA_results.mat'],'ARI','CIDX','clustering','ID'); +else + save([outputDir '/HYDRA_results.mat'],'ARI','CIDX','clustering','ID','XK','Y','covar','IDcovar'); +end +disp('Done') +end + +function [score,stdscore]=cv_cluster_stability(S) +k=0; +for i=1:size(S,2)-1 + for j=i+1:size(S,2) + k=k+1; + zero_idx=any([S(:,i) S(:,j)]==0,2); + [a(k),b(k),c(k),d(k)]=RandIndex(S(~zero_idx,i),S(~zero_idx,j)); + end +end +score=[mean(a) mean(b) mean(c) mean(d)]; +stdscore=[std(a) std(b) std(c) std(d)]; +end + +function [AR,RI,MI,HI]=RandIndex(c1,c2) +%RANDINDEX - calculates Rand Indices to compare two partitions +% ARI=RANDINDEX(c1,c2), where c1,c2 are vectors listing the +% class membership, returns the "Hubert & Arabie adjusted Rand index". +% [AR,RI,MI,HI]=RANDINDEX(c1,c2) returns the adjusted Rand index, +% the unadjusted Rand index, "Mirkin's" index and "Hubert's" index. +% +% See L. Hubert and P. 
Arabie (1985) "Comparing Partitions" Journal of +% Classification 2:193-218 + +%(C) David Corney (2000) D.Corney@cs.ucl.ac.uk + +if nargin < 2 || min(size(c1)) > 1 || min(size(c2)) > 1 + error('RandIndex: Requires two vector arguments') + return +end + +C=Contingency(c1,c2); %form contingency matrix + +n=sum(sum(C)); +nis=sum(sum(C,2).^2); %sum of squares of sums of rows +njs=sum(sum(C,1).^2); %sum of squares of sums of columns + +t1=nchoosek(n,2); %total number of pairs of entities +t2=sum(sum(C.^2)); %sum over rows & columns of nij^2 +t3=.5*(nis+njs); + +%Expected index (for adjustment) +nc=(n*(n^2+1)-(n+1)*nis-(n+1)*njs+2*(nis*njs)/n)/(2*(n-1)); + +A=t1+t2-t3; %no. agreements +D= -t2+t3; %no. disagreements + +if t1==nc + AR=0; %avoid division by zero; if k=1, define Rand = 0 +else + AR=(A-nc)/(t1-nc); %adjusted Rand - Hubert & Arabie 1985 +end + +RI=A/t1; %Rand 1971 %Probability of agreement +MI=D/t1; %Mirkin 1970 %p(disagreement) +HI=(A-D)/t1; %Hubert 1977 %p(agree)-p(disagree) + + function Cont=Contingency(Mem1,Mem2) + + if nargin < 2 || min(size(Mem1)) > 1 || min(size(Mem2)) > 1 + error('Contingency: Requires two vector arguments') + return + end + + Cont=zeros(max(Mem1),max(Mem2)); + + for i = 1:length(Mem1); + Cont(Mem1(i),Mem2(i))=Cont(Mem1(i),Mem2(i))+1; + end + end +end + +function IDXfinal=consensus_clustering(IDX,k) +[n,~]=size(IDX); +cooc=zeros(n); +for i=1:n-1 + for j=i+1:n + cooc(i,j)=sum(IDX(i,:)==IDX(j,:)); + end + %cooc(i,i)=sum(IDX(i,:)==IDX(i,:))/2; +end +cooc=cooc+cooc'; +L=diag(sum(cooc,2))-cooc; + +Ln=eye(n)-diag(sum(cooc,2).^(-1/2))*cooc*diag(sum(cooc,2).^(-1/2)); +Ln(isnan(Ln))=0; +[V,~]=eig(Ln); +try + IDXfinal=kmeans(V(:,1:k),k,'emptyaction','drop','replicates',20); +catch + disp('Complex Eigenvectors Found...Using Non-Normalized Laplacian'); + [V,~]=eig(L); + IDXfinal=kmeans(V(:,1:k),k,'emptyaction','drop','replicates',20); +end + +end + +function [part] = make_xval_partition(n, n_folds) +% MAKE_XVAL_PARTITION - Randomly generate cross 
validation partition. +% +% Usage: +% +% PART = MAKE_XVAL_PARTITION(N, N_FOLDS) +% +% Randomly generates a partitioning for N datapoints into N_FOLDS equally +% sized folds (or as close to equal as possible). PART is a 1 X N vector, +% where PART(i) is a number in (1...N_FOLDS) indicating the fold assignment +% of the i'th data point. + +% YOUR CODE GOES HERE + +s=mod(n,n_folds);r=n-s; +p1=ceil((1:r)/ceil(r/n_folds)); +p2=randperm(n_folds);p2=p2(1:s); +p3=[p1 p2]; +part=p3(randperm(size(p3,2))); +end + +function [X0train,X0test]=GLMcorrection(Xtrain,Ytrain,covartrain,Xtest,covartest) + +X1=Xtrain(Ytrain==-1,:); +C1=covartrain(Ytrain==-1,:); +B=[C1 ones(size(C1,1),1)]; +Z=X1'*B*inv(B'*B); +X0train=(Xtrain'-Z(:,1:end-1)*covartrain')'; +X0test=(Xtest'-Z(:,1:end-1)*covartest')'; +end + +function printhelp() +fprintf(' function returns estimated subgroups by hydra for clustering \n') +fprintf(' configurations ranging from K=1 to K=10, or another specified range of\n') +fprintf(' values. The function returns also the Adjusted Rand Index that was\n') +fprintf(' calculated across the cross-validation experiments and comparing\n') +fprintf(' respective clustering solutions.\n') +fprintf('\n') +fprintf(' INPUT\n') +fprintf('\n') +fprintf(' REQUIRED\n') +fprintf(' [--input, -i] : .csv file containing the input features. (REQUIRED)\n') +fprintf(' every column of the file contains values for a feature, with\n') +fprintf(' the exception of the first and last columns. We assume that\n') +fprintf(' the first column contains subject identifying information\n') +fprintf(' while the last column contains label information. First line\n') +fprintf(' of the file should contain header information. 
Label\n') +fprintf(' convention: -1 -> control group - 1 -> pathological group\n') +fprintf(' that will be partitioned to subgroups\n') +fprintf(' [--outputDir, -o] : directory where the output from all folds will be saved (REQUIRED)\n') +fprintf('\n') +fprintf(' OPTIONAL\n') +fprintf('\n') +fprintf(' [--covCSV, -z] : .csv file containing values for different covariates, which\n') +fprintf(' will be used to correct the data accordingly (OPTIONAL). Every\n') +fprintf(' column of the file contains values for a covariate, with the\n') +fprintf(' exception of the first column, which contains subject\n') +fprintf(' identifying information. Correction is performed by solving a\n') +fprintf(' least squares problem to estimate the respective\n') +fprintf(' coefficients and then removing their effect from the data. The\n') +fprintf(' effect of ALL provided covariates is removed. If no file is\n') +fprintf(' specified, no correction is performed.\n') +fprintf('\n') +fprintf(' NOTE: featureCSV and covCSV files are assumed to have the subjects given\n') +fprintf(' in the same order in their rows\n') +fprintf('\n') +fprintf(' [--c, -c] : regularization parameter (positive scalar). smaller values produce\n') +fprintf(' sparser models (OPTIONAL - Default 0.25)\n') +fprintf(' [--reg_type, -r] : determines regularization type. 1 -> promotes sparsity in the\n') +fprintf(' estimated hyperplanes - 2 -> L2 norm (OPTIONAL - Default 1)\n') +fprintf(' [--balance, -b] : takes into account differences in the number between the two\n') +fprintf(' classes. 1-> in case there is mismatch between the number of\n') +fprintf(' controls and patients - 0-> otherwise (OPTIONAL - Default 1)\n') +fprintf(' [--init, -g] : initialization strategy. 
0 : assignment by random hyperplanes\n') +fprintf(' (not supported for regression), 1 : pure random assignment, 2:\n') +fprintf(' k-means assignment, 3: assignment by DPP random\n') +fprintf(' hyperplanes (default)\n') +fprintf(' [--iter, -t] : number of iterations between estimating hyperplanes, and cluster\n') +fprintf(' estimation. Default is 50. Increase if algorithm fails to\n') +fprintf(' converge\n') +fprintf(' [--numconsensus, -n] : number of clustering consensus steps. Default is 20.\n') +fprintf(' Increase if algorithm gives unstable clustering results.\n') +fprintf(' [--kmin, -m] : determines the range of clustering solutions to evaluate\n') +fprintf(' (i.e., kmin to kmax). Default value is 1.\n') +fprintf(' [--kmax, -k] : determines the range of clustering solutions to evaluate\n') +fprintf(' (i.e., kmin to kmax). Default value is 10.\n') +fprintf(' [--kstep, -s] : determines the range of clustering solutions to evaluate\n') +fprintf(' (i.e., kmin to kmax, with step kstep). Default value is 1.\n') +fprintf(' [--cvfold, -f]: number of folds for cross validation. Default value is 10.\n') +fprintf(' [--vo, -j] : verbose output (i.e., also saves input data to verify that all were\n') +fprintf(' read correctly). Default value is 0\n') +fprintf(' [--usage, -u] Prints basic usage message. 
\n'); +fprintf(' [--help, -h] Prints help information.\n'); +fprintf(' [--version, -v] Prints information about software version.\n'); +fprintf('\n') +fprintf(' OUTPUT:\n') +fprintf(' CIDX: sub-clustering assignments of the disease population (positive\n') +fprintf(' class).\n') +fprintf(' ARI: adjusted rand index measuring the overlap/reproducibility of\n') +fprintf(' clustering solutions across folds\n') +fprintf('\n') +fprintf(' NOTE: to compile this function do\n') +fprintf(' mcc -m hydra.m\n') +fprintf('\n') +fprintf('\n') +fprintf(' EXAMPLE USE (in matlab)\n') +fprintf(' hydra(''-i'',''test.csv'',''-o'',''.'',''-k'',3,''-f'',3);\n') +fprintf(' EXAMPLE USE (in command line)\n') +fprintf(' hydra -i test.csv -o . -k 3 -f 3\n') +fprintf('======================================================================\n'); +fprintf('Contact: software@cbica.upenn.edu\n'); +fprintf('\n'); +fprintf('Copyright (c) 2018 University of Pennsylvania. All rights reserved.\n'); +fprintf('See COPYING file or http://www.med.upenn.edu/sbia/software/license.html\n'); +fprintf('======================================================================\n'); +end + +function o=input2num(x) +if isnumeric(x) + o=x; +else + o = str2double(x); +end +end diff --git a/hydra_solver.m b/hydra_solver.m new file mode 100644 index 0000000..8dbded7 --- /dev/null +++ b/hydra_solver.m @@ -0,0 +1,710 @@ +% HYDRA SOLVER +% Version 1.0.0 --- January 2018 +% Section of Biomedical Image Analysis +% Department of Radiology +% University of Pennsylvania +% Richard Building +% 3700 Hamilton Walk, 7th Floor +% Philadelphia, PA 19104 +% +% Web: https://www.med.upenn.edu/sbia/ +% Email: sbia-software at uphs.upenn.edu +% +% Copyright (c) 2018 University of Pennsylvania. All rights reserved. +% See https://www.med.upenn.edu/sbia/software-agreement.html or COPYING file. 
+% +% Author: +% Erdem Varol +% software@cbica.upenn.edu + + +function model=hydra_solver(XK,Y,Cov,params); +%% Parameters: +% numconsensus -- (int>=0) 0 for no consensus, positive integer for number of consensus +% runs +% numiter -- (int>0) number of iterative assignment steps +% C -- (real>0) loss penalty +% k -- (int>0) number of polytope faces (final number may be less due to +% face dropping) +% kernel -- (0 (default) or 1), treat XK as X*X' solve dual problem (1), else XK is X +% solve primal(0) +% init_type -- 0 : assignment by random hyperplanes (not supported for regression), 1 : pure random +% assignment, 2: k-means assignment (default), 3: assignment by DPP random +% hyperplanes +% reg_type -- (1 or 2): 1 solves L1-SVM, 2 solves L2-SVM +%% parameters +if ~isfield(params,'numconsensus') + params.numconsensus=50; +end +if ~isfield(params,'numiter') + params.numiter=20; +end +if ~isfield(params,'C') + params.C=1; +end +if ~isfield(params,'k') + params.k=1; +end +if ~isfield(params,'kernel') + params.kernel=0; +end +if ~isfield(params,'init_type') + params.init_type=2; +end +if ~isfield(params,'balanceclasses') + params.balanceclasses=0; +end +if ~isfield(params,'fixedclustering') + params.fixedclustering=0; +end +if ~isfield(params,'fixedclusteringIDX') + params.fixedclusteringIDX=ones(size(XK,1),1); +end +if ~isfield(params,'reg_type'); + params.reg_type=2; +end + +params.type='classification'; +initparams.init_type=params.init_type; +%% algorithms + + +switch params.type + case 'classification' + initparams.regression=0; + if params.fixedclustering==1 + params.k=numel(unique(params.fixedclusteringIDX(Y==1,1))); + [~,~,params.fixedclusteringIDX(Y==1,1)]=unique(params.fixedclusteringIDX(Y==1,1)); + end + + %option for l2-regularization (default) + if params.reg_type==2; + if params.kernel==0 + svmX=XK; + svmparams='-t 0'; + initparams.kernel=0; + elseif params.kernel==1 + svmX=[(1:size(XK,1))' XK]; + svmparams='-t 4'; + initparams.kernel=1; + end + + if 
params.fixedclustering==0 + IDX=zeros(size(Y(Y==1,:),1),params.numconsensus); + for tt=1:params.numconsensus + + %Initialization + W=ones(size(Y,1),params.k)/params.k; + W(Y==1,:)=hydra_init_v2(XK,Y,params.k,initparams); + S=zeros(size(W)); + cn=zeros(1,params.k);cp=zeros(1,params.k);nrm=zeros(1,params.k); + for t=1:params.numiter + for j=1:params.k + %Weights for negative and positive samples + cn(1,j)=1./mean(W(Y==-1,j),1); + cp(1,j)=1./mean(W(Y==1,j),1); + nrm(1,j)=cn(1,j)+cp(1,j); + cn(1,j)=cn(1,j)/nrm(1,j); + cp(1,j)=cp(1,j)/nrm(1,j); + + if params.balanceclasses==1 + %Weighted svm taking into account negative/positive imbalance to solve for polytope hyperplanes + mdl{j}=w_svmtrain(XK,Y,W(:,j),params.C,cp(1,j),cn(1,j),params.kernel); + else + %Unweighted svm to solve for polytope hyperplanes + mdl{j}=w_svmtrain(XK,Y,W(:,j),params.C,1,1,params.kernel); + end + %Solving subject projection score along each face of the polytope + S(:,j)=w_svmpredict(XK,mdl{j},params.kernel); + end + %Subject assignment to the face of the polytope with maximum score + [~,idx]=max(S(Y==1,:),[],2); + Wold=W; + W(Y==1,:)=0; + W(sub2ind(size(W),find(Y==1),idx))=1; + if norm(W-Wold,'fro')<1e-6; + disp('converged'); + break + end + end + IDX(:,tt)=idx; + + end + + %Consensus steps, solving the assignments multiple times for stability + if params.numconsensus>1 + IDXfinal=consensus_clustering(IDX,params.k); + W=zeros(size(Y,1),params.k); + W(sub2ind(size(W),find(Y==1),IDXfinal))=1; + W(Y==-1,:)=1/params.k; + cn=zeros(1,params.k);cp=zeros(1,params.k);nrm=zeros(1,params.k); + for j=1:params.k + cn(1,j)=1./mean(W(Y==-1,j),1); + cp(1,j)=1./mean(W(Y==1,j),1); + nrm(1,j)=cn(1,j)+cp(1,j); + cn(1,j)=cn(1,j)/nrm(1,j); + cp(1,j)=cp(1,j)/nrm(1,j); + if params.balanceclasses==1 + mdl{j}=w_svmtrain(XK,Y,W(:,j),params.C,cp(1,j),cn(1,j),params.kernel); + else + mdl{j}=w_svmtrain(XK,Y,W(:,j),params.C,1,1,params.kernel); + end + end + + else + IDXfinal=IDX; + end + + %If using fixed clustering inputs, 
solve polytope once: + elseif params.fixedclustering==1 + IDXfinal=params.fixedclusteringIDX(Y==1,1); + W=zeros(size(Y,1),params.k); + W(sub2ind(size(W),find(Y==1),IDXfinal))=1; + W(Y==-1,:)=1/params.k; + cn=zeros(1,params.k);cp=zeros(1,params.k);nrm=zeros(1,params.k); + for j=1:params.k + cn(1,j)=1./mean(W(Y==-1,j),1); + cp(1,j)=1./mean(W(Y==1,j),1); + nrm(1,j)=cn(1,j)+cp(1,j); + cn(1,j)=cn(1,j)/nrm(1,j); + cp(1,j)=cp(1,j)/nrm(1,j); + if params.balanceclasses==1 + mdl{j}=w_svmtrain(XK,Y,W(:,j),params.C,cp(1,j),cn(1,j),params.kernel); + else + mdl{j}=w_svmtrain(XK,Y,W(:,j),params.C,1,1,params.kernel); + end + end + + end +%store models and clustering outputs + model.mdl=mdl; + model.S=W(Y==1,:); + model.W=W; + model.Yhat=Y; + model.Yhat(Y==1)=IDXfinal; + model.cn=cn; + model.cp=cp; + end +%Option for sparse regularization + if params.reg_type==1 + if params.kernel==0 + svmX=sparse(XK); + initparams.kernel=0; + svmparams='-B 1'; + elseif params.kernel==1 + error('Kernel in sparse SVM not supported'); + end + if params.fixedclustering==0 + IDX=zeros(size(Y(Y==1,:),1),params.numconsensus); + for tt=1:params.numconsensus + W=ones(size(Y,1),params.k)/params.k; + W(Y==1,:)=hydra_init_v2(XK,Y,params.k,initparams); + S=zeros(size(W)); + cn=zeros(1,params.k);cp=zeros(1,params.k);nrm=zeros(1,params.k); + for t=1:params.numiter + for j=1:params.k + cn(1,j)=1./mean(W(Y==-1,j),1); + cp(1,j)=1./mean(W(Y==1,j),1); + nrm(1,j)=cn(1,j)+cp(1,j); + cn(1,j)=cn(1,j)/nrm(1,j); + cp(1,j)=cp(1,j)/nrm(1,j); + if params.balanceclasses==1 + mdl{j}=w_train(XK,Y,W(:,j),params.C,cp(1,j),cn(1,j)); + else + mdl{j}=w_train(XK,Y,W(:,j),params.C,1,1); + end + S(:,j)=w_svmpredict(XK,mdl{j},0); + end + [~,idx]=max(S(Y==1,:),[],2); + Wold=W; + W(Y==1,:)=0; + W(sub2ind(size(W),find(Y==1),idx))=1; + if norm(W-Wold,'fro')<1e-6; + disp('converged'); + break + end + end + IDX(:,tt)=idx; + + end + + if params.numconsensus>1 + IDXfinal=consensus_clustering(IDX,params.k); + W=zeros(size(Y,1),params.k); + 
W(sub2ind(size(W),find(Y==1),IDXfinal))=1; + W(Y==-1,:)=1/params.k; + cn=zeros(1,params.k);cp=zeros(1,params.k);nrm=zeros(1,params.k); + for j=1:params.k + cn(1,j)=1./mean(W(Y==-1,j),1); + cp(1,j)=1./mean(W(Y==1,j),1); + nrm(1,j)=cn(1,j)+cp(1,j); + cn(1,j)=cn(1,j)/nrm(1,j); + cp(1,j)=cp(1,j)/nrm(1,j); + if params.balanceclasses==1 + mdl{j}=w_train(XK,Y,W(:,j),params.C,cp(1,j),cn(1,j)); +% train(W(:,j),Y,svmX,['-s 5 -c ' num2str(params.C) ' -q -w-1 ' num2str(cn(1,j)) ' -w1 ' num2str(cp(1,j)) ' ' svmparams]); + else + mdl{j}=w_train(XK,Y,W(:,j),params.C,1,1); +% train(W(:,j),Y,svmX,['-s 5 -c ' num2str(params.C) ' -q ' svmparams]); + end + end + + else + IDXfinal=IDX; + end + elseif params.fixedclustering==1 + IDXfinal=params.fixedclusteringIDX(Y==1,1); + W=zeros(size(Y,1),params.k); + W(sub2ind(size(W),find(Y==1),IDXfinal))=1; + W(Y==-1,:)=1/params.k; + cn=zeros(1,params.k);cp=zeros(1,params.k);nrm=zeros(1,params.k); + for j=1:params.k + cn(1,j)=1./mean(W(Y==-1,j),1); + cp(1,j)=1./mean(W(Y==1,j),1); + nrm(1,j)=cn(1,j)+cp(1,j); + cn(1,j)=cn(1,j)/nrm(1,j); + cp(1,j)=cp(1,j)/nrm(1,j); + if params.balanceclasses==1 + mdl{j}=w_train(XK,Y,W(:,j),params.C,cp(1,j),cn(1,j)); +% train(W(:,j),Y,svmX,['-s 5 -c ' num2str(params.C) ' -q -w-1 ' num2str(cn(1,j)) ' -w1 ' num2str(cp(1,j)) ' ' svmparams]); + else + mdl{j}=w_train(XK,Y,W(:,j),params.C,1,1); +% train(W(:,j),Y,svmX,['-s 5 -c ' num2str(params.C) ' -q ' svmparams]); + end + end + + end + model.mdl=mdl; + model.S=W(Y==1,:); + model.W=W; + model.Yhat=Y; + model.Yhat(Y==1)=IDXfinal; + model.cn=cn; + model.cp=cp; + end + +end + +model.params=params; +end + +function IDXfinal=consensus_clustering(IDX,k) +%Function performs consensus clustering on a co-occurence matrix +[n,~]=size(IDX); +cooc=zeros(n); +for i=1:n-1 + for j=i+1:n + cooc(i,j)=sum(IDX(i,:)==IDX(j,:)); + end + %cooc(i,i)=sum(IDX(i,:)==IDX(i,:))/2; +end +cooc=cooc+cooc'; +L=diag(sum(cooc,2))-cooc; + +Ln=eye(n)-diag(sum(cooc,2).^(-1/2))*cooc*diag(sum(cooc,2).^(-1/2)); 
+Ln(isnan(Ln))=0; +[V,~]=eig(Ln); +try + IDXfinal=kmeans(V(:,1:k),k,'emptyaction','drop','replicates',20); +catch + disp('Complex Eigenvectors Found...Using Non-Normalized Laplacian'); + [V,~]=eig(L); + IDXfinal=kmeans(V(:,1:k),k,'emptyaction','drop','replicates',20); +end + +end + +function [S,Yhat]=hydra_init_v2(XK,Y,k,params) +%Function performs initialization for supervised clustering +nker=@(K)(K./sqrt(diag(K)*diag(K)')); +init_type=params.init_type; +if params.regression==0 + if params.kernel==0 + X=XK; + if init_type==0; %% Random hyperplanes + idxp=find(Y==1); + idxn=find(Y==-1); + prob=zeros(size(X(Y==1,:),1),k); + for j=1:k + ip=randi(length(idxp)); + in=randi(length(idxn)); + w0=(X(idxp(ip),:)-X(idxn(in),:)); + w0=w0/norm(w0); + prob(:,j)=bsxfun(@times,X(Y==1,:),1./norms(X(Y==1,:),2,2))*w0'; + end + l=min(prob-1,0); + d=prob-1; + S=LP1(l,d); + elseif init_type==1; %% Random assignment + S=drchrnd(ones(1,k),size(X(Y==1,:),1)); + elseif init_type==2; %% K-means + IDX=kmeans(X(Y==1,:),k,'replicates',20); + S=zeros(size(X(Y==1,:),1),k); + S(sub2ind(size(S),(1:size(S,1))',IDX))=1; + elseif init_type==3; %% DPP random hyperplanes + idxp=find(Y==1); + idxn=find(Y==-1); + num=size(X,1); + W=zeros(num,size(X,2)); + for j=1:num + ip=randi(length(idxp)); + in=randi(length(idxn)); + W(j,:)=(X(idxp(ip),:)-X(idxn(in),:)); + end + KW=W*W'; + KW=KW./sqrt(diag(KW)*diag(KW)'); + Widx = sample_dpp(decompose_kernel(KW),k); + prob=zeros(size(X(Y==1,:),1),k); + for j=1:k + prob(:,j)=bsxfun(@times,X(Y==1,:),1./norms(X(Y==1,:),2,2))*(W(Widx(j),:))'; + end + l=min(prob-1,0); + d=prob-1; + S=LP1(l,d); + end + Yhat=-ones(size(Y)); + [~,Yhat(Y==1)]=max(S,[],2); + elseif params.kernel==1 + K=XK; + if init_type==0 + Kn=nker(K); + idxp=find(Y==1); + idxn=find(Y==-1); + prob=zeros(size(K(Y==1,:),1),k); + for j=1:k + ip=randi(length(idxp)); + in=randi(length(idxn)); + prob(:,j)=Kn(:,idxp(ip))-Kn(:,idxn(in)); + end + l=min(prob-1,0); + d=prob-1; + S=LP1(l,d); + elseif init_type==1 + 
S=drchrnd(ones(1,k),size(K(Y==1,:),1)); + elseif init_type==2 + IDX=knkmeans(K(Y==1,Y==1),k,20); + S=zeros(size(K(Y==1,:),1),k); + S(sub2ind(size(S),(1:size(S,1))',IDX))=1; + elseif init_type==3; + Kn=nker(K); + idxp=find(Y==1); + idxn=find(Y==-1); + num=size(K,1); + KW=zeros(num,num); + KWidxp=zeros(num,1); + KWidxn=zeros(num,1); + for i=1:num + KWidxp(i,1)=randi(length(idxp)); + KWidxn(i,1)=randi(length(idxn)); + end + for i=1:num + for j=i:num + KW(i,j)=K(idxp(KWidxp(i,1)),idxp(KWidxp(j,1)))+K(idxn(KWidxn(i,1)),idxn(KWidxn(j,1)))-K(idxp(KWidxp(i,1)),idxn(KWidxn(j,1)))-K(idxn(KWidxn(i,1)),idxp(KWidxp(j,1))); + KW(j,i)=KW(i,j); + end + end + KW=KW./sqrt(diag(KW)*diag(KW)'); + Widx = sample_dpp(decompose_kernel(KW),k); + prob=zeros(size(K(Y==1,:),1),k); + for j=1:k + prob(:,j)=Kn(Y==1,idxp(KWidxp(Widx(j))))-Kn(Y==1,idxn(KWidxn(Widx(j)))); + end + l=min(prob-1,0); + d=prob-1; + S=LP1(l,d); + end + Yhat=-ones(size(Y)); + [~,Yhat(Y==1)]=max(S,[],2); + end +end +end + +function s=LP1(l,d) +% Proportional assignment based on margin +invL=1./l; +idx=find(invL==Inf); +invL(idx)=d(idx); +for i=1:size(invL,1) + pos=find(invL(i,:)>0); %#ok<*EFIND> + neg=find(invL(i,:)<0); + if ~isempty(pos) + invL(i,neg)=0; %#ok<*FNDSB> + else + invL(i,:)=invL(i,:)/min(invL(i,:),[],2); + invL(i,invL(i,:)<1)=0; + end +end +s=bsxfun(@times,invL,1./sum(invL,2)); +end + +function epsilon=svr_parameter_selection(XK,Y,params) +%Function selects epsilon for svr +sigma=noise_estimator(XK,Y,params); +epsilon=3*sigma*sqrt(log(size(XK,1))/size(XK,1)); +end + +function sigma=noise_estimator(XK,Y,params) + +if params.kernel==1 + Ypred=loo_kernel_knn(XK,Y,5); +elseif params.kernel==0 + K=XK*XK'; + Ypred=loo_kernel_knn(K,Y,5); +end + +sigma=sqrt((5/4)*(1/size(XK,1))*sum((Y-Ypred).^2)); +end + +function Ypred=loo_kernel_knn(K,Y,k) +[n,~]=size(K); +D=kernel2dist(K); +Ypred=zeros(n,1); +for i=1:n + Yi=Y((1:n)~=i); + [~,idx]=sort(D(i,(1:n)~=i),2,'ascend'); + Ypred(i,1)=mean(Yi(idx(1:k))); +end +end + +function 
D=kernel2dist(K) +D=zeros(size(K)); +for i=1:size(K,1)-1 + for j=i+1:size(K,1) + D(i,j)=K(i,i)+K(j,j)-2*K(i,j); + end +end +D=D+D'; +end + +function Y = sample_dpp(L,k) +% sample a set Y from a dpp. L is a decomposed kernel, and k is (optionally) +% the size of the set to return. + +if ~exist('k','var') + % choose eigenvectors randomly + D = L.D ./ (1+L.D); + v = find(rand(length(D),1) <= D); +else + % k-DPP + v = sample_k(L.D,k); +end +k = length(v); +V = L.V(:,v); + +% iterate +Y = zeros(k,1); +for i = k:-1:1 + + % compute probabilities for each item + P = sum(V.^2,2); + P = P / sum(P); + + % choose a new item to include + Y(i) = find(rand <= cumsum(P),1); + + % choose a vector to eliminate + j = find(V(Y(i),:),1); + Vj = V(:,j); + V = V(:,[1:j-1 j+1:end]); + + % update V + V = V - bsxfun(@times,Vj,V(Y(i),:)/Vj(Y(i))); + + % orthogonalize + for a = 1:i-1 + for b = 1:a-1 + V(:,a) = V(:,a) - V(:,a)'*V(:,b)*V(:,b); + end + V(:,a) = V(:,a) / norm(V(:,a)); + end + +end + +Y = sort(Y); +end + +function L = decompose_kernel(M) +L.M = M; +[V,D] = eig(M); +L.V = real(V); +L.D = real(diag(D)); +end + +function S = sample_k(lambda,k) +% pick k lambdas according to p(S) \propto prod(lambda \in S) + +% compute elementary symmetric polynomials +E = elem_sympoly(lambda,k); + +% iterate +i = length(lambda); +remaining = k; +S = zeros(k,1); +while remaining > 0 + + % compute marginal of i given that we choose remaining values from 1:i + if i == remaining + marg = 1; + else + marg = lambda(i) * E(remaining,i) / E(remaining+1,i+1); + end + + % sample marginal + if rand < marg + S(remaining) = i; + remaining = remaining - 1; + end + i = i-1; +end +end + +function E = elem_sympoly(lambda,k) +% given a vector of lambdas and a maximum size k, determine the value of +% the elementary symmetric polynomials: +% E(l+1,n+1) = sum_{J \subseteq 1..n,|J| = l} prod_{i \in J} lambda(i) + +N = length(lambda); +E = zeros(k+1,N+1); +E(1,:) = 1; +for l = (1:k)+1 + for n = (1:N)+1 + E(l,n) = E(l,n-1) 
+ lambda(n-1)*E(l-1,n-1); + end +end +end + +function [label, energy,LABEL,ENERGY] = knkmeans(K,init,replicates) +% Perform kernel k-means clustering. +% K: kernel matrix +% init: k (1 x 1) or label (1 x n, 1<=label(i)<=k) +% Reference: [1] Kernel Methods for Pattern Analysis +% by John Shawe-Taylor, Nello Cristianini +% Written by Michael Chen (sth4nth@gmail.com). +if nargin<3 + replicates=20; +end +LABEL=zeros(size(K,1),replicates); +ENERGY=zeros(1,replicates); +for TT=1:replicates + n = size(K,1); + if length(init) == 1 + label = ceil(init*rand(1,n)); + elseif size(init,1) == 1 && size(init,2) == n + label = init; + else + error('ERROR: init is not valid.'); + end + last = 0; + while any(label ~= last) + [u,~,label] = unique(label,'legacy'); % remove empty clusters + k = length(u); + E = sparse(label,1:n,1,k,n,n); + E = bsxfun(@times,E,1./sum(E,2)); + T = E*K; + Z = repmat(diag(T*E'),1,n)-2*T; + last = label; + [val, label] = min(Z,[],1); + end + [~,~,label] = unique(label,'legacy'); % remove empty clusters + LABEL(:,TT)=label'; + ENERGY(:,TT) = sum(val)+trace(K); +end +[energy,IDX]=min(ENERGY,[],2); +label=LABEL(:,IDX); +end + +function r = drchrnd(a,n) +% take a sample from a dirichlet distribution +p = length(a); +r = gamrnd(repmat(a,n,1),1,n,p); +r = r ./ repmat(sum(r,2),1,p); +end + +function o = norms( x, p, dim ) +%Function computes vector norms +switch p, + case 1, + o = sum( abs( x ), dim ); + case 2, + o = sqrt( sum( x .* conj( x ), dim ) ); + case Inf, + o = max( abs( x ), [], dim ); + otherwise, + o = sum( abs( x ) .^ p, dim ) .^ ( 1 / p ); +end +end + +function mdl=w_svmtrain(X,Y,W,C,Cp,Cn,dual) +%Function solves weighted l2-svm, requires matlab optimization toolbox version 2014+ +if any(isnan([Cp Cn])) + mdl.w=zeros(size(X,2),1); + mdl.b=0; + warning('Cluster dropped'); + return +end +if dual==0 + X=X; +elseif dual==1 + [U,S,~]=svd(X); + X=U*sqrt(S); + +end +idxp=find(Y==1); +idxn=find(Y==-1); +Cw=zeros(size(Y)); +Cw(idxp)=Cp; +Cw(idxn)=Cn; +[n,d] 
= size(X); +H = diag([ones(1, d), zeros(1, n + 1)]); +f = [zeros(1,d+1) C*(ones(1,n).*W'.*Cw')]'; +p = diag(Y) * X; +A = -[p Y eye(n)]; +B = -ones(n,1); +lb = [-inf * ones(d+1,1) ;zeros(n,1)]; +options=optimoptions('quadprog','Display','off'); +z = quadprog(H,f,A,B,[],[],lb,[],[],options); + +mdl.w = z(1:d,:); +mdl.b = z(d+1:d+1,:); +mdl.eps = z(d+2:d+n+1,:); +end + +function mdl=w_train(X,Y,W,C,Cp,Cn) +%Function solves weighted l1-svm, requires matlab optimization toolbox version 2014+ +if any(isnan([Cp Cn])) + mdl.w=zeros(size(X,2),1); + mdl.b=0; + %warning('Cluster dropped'); + return +end +idxp=find(Y==1); +idxn=find(Y==-1); +Cw=zeros(size(Y,1),1); +Cw(idxp)=Cp; +Cw(idxn)=Cn; +[n,d]=size(X); +f=[ones(d,1);ones(d,1);zeros(1,1);C*W.*Cw.*ones(n,1)]; +A=-[diag(Y)*X -diag(Y)*X Y eye(n)]; +b=-ones(n,1); +lb=[zeros(d,1);zeros(d,1);-inf(1,1);zeros(n,1)]; +ub=[inf(d,1);inf(d,1);inf(1,1);inf(n,1)]; +options=optimoptions('linprog','Display','off'); +v = linprog(f,A,b,[],[],lb,ub,[],options); + +mdl.w=v(1:d)-v(d+1:2*d); +mdl.b=v(2*d+1); +end + +function S=w_svmpredict(X,mdl,dual) +%Function makes svm prediction using model +if dual==0 + X=X; +elseif dual==1 + [U,S,~]=svd(X); + X=U*sqrt(S); + +end + +S=X*mdl.w+mdl.b; + +end diff --git a/test.csv b/test.csv new file mode 100644 index 0000000..d309b9b --- /dev/null +++ b/test.csv @@ -0,0 +1,100 @@ +bblid,Ct_Nmf18C1,Ct_Nmf18C2,Ct_Nmf18C3,Ct_Nmf18C4,Ct_Nmf18C5,Ct_Nmf18C6,Ct_Nmf18C7,Ct_Nmf18C8,Ct_Nmf18C9,Ct_Nmf18C10,Ct_Nmf18C11,Ct_Nmf18C12,Ct_Nmf18C13,Ct_Nmf18C14,Ct_Nmf18C15,Ct_Nmf18C16,Ct_Nmf18C17,Ct_Nmf18C18,DepAnxTd2 +80010,184.711969572233,208.073190954062,183.550133463312,177.062454866797,156.300463794408,142.405154825436,154.253794393129,132.108028517332,132.47804512327,143.6382975696,141.739998990058,124.655258982695,132.533830724621,138.789861526898,120.027919062625,126.527015443658,132.814406820135,112.848383561664,-1 
+80179,193.273270694236,169.471062680214,116.775197956897,169.66937748809,169.382036367007,147.985753532534,110.062178165077,122.120871704889,121.325052763506,140.618644748359,127.075311512378,121.134803243017,111.48045624697,107.970130203369,114.830295315865,107.143716751225,123.32275220604,108.597339243118,-1 +80199,175.492295461796,212.302410373465,175.414521860379,137.051800896619,171.055400471976,166.463954581143,146.853580785427,126.952091742605,145.377628161031,135.452520848344,151.117914628934,151.236505433842,123.692886361153,115.019329112865,126.503569889243,135.218086961578,127.263106636171,119.445967170215,-1 +80208,186.000563590667,193.821348301608,167.95087746834,165.840247093378,164.131873244708,140.110899779385,161.388754571173,143.491210354381,137.158026455331,146.669981517361,147.61461037914,143.67393170633,122.179270244777,136.70757534053,127.238011972477,129.078609190532,126.814872820586,97.9577978867572,1 +80249,167.215101330096,216.526553908641,179.492454995494,127.834857990617,157.068416967606,136.490351214596,156.128141215714,130.729863310192,119.231217089737,150.434680863076,134.09952555195,137.740657329345,128.345905814214,118.738783342523,132.269769677948,122.980596935778,116.146974296499,113.192808510964,-1 +80265,193.917327630079,207.385886161482,183.172918950404,165.281518796665,163.618082352931,168.094515212019,144.068219040907,154.576887499451,145.901026970263,146.075995206321,147.377380437765,145.655156447951,143.068457890204,124.340177313993,133.192388454429,126.845194850362,146.023637473513,144.086741471578,1 +80289,193.450535333028,190.581694250179,144.070758815272,141.213486381546,158.066845656822,132.06260640156,149.196658456476,130.584497863643,129.321863911991,145.141138584136,119.132489143198,117.139075411664,121.301584540212,121.415029316966,109.731279893959,113.070696034356,113.693797538447,101.180625555354,1 
+80396,185.943807135894,202.901471214286,179.547229814012,161.063546656666,160.128325836623,136.245246523113,148.303655161246,138.511156114976,135.80865935908,127.14207275312,131.631505568199,121.207116873467,133.064869604645,126.336187246304,116.896172819412,128.744607452773,100.004387244354,76.9495933743204,-1 +80425,211.991950580759,203.186597567448,176.37644426952,181.410101462562,182.662567566615,187.674643230638,168.828848461675,160.1509160178,167.04925165255,137.205429823978,140.53564200201,134.785488117091,137.860602841703,125.916078321931,135.046261141723,124.811888778018,139.877122809404,137.200953067739,1 +80498,175.860425826368,198.744734043915,160.503099222927,134.428183863139,165.799636346412,142.316004000431,153.749601774959,124.804464329097,125.038130563313,159.687877104653,127.905562355147,143.608469089693,115.624645970605,123.541745026646,122.567294392781,123.212936783118,123.370197667902,92.5999572810838,1 +80537,203.840150207278,192.847357389616,179.638540701635,175.986998032208,161.466769524764,153.59303469842,164.554475803877,142.516681807929,127.291021468144,149.612214412402,126.926654405355,144.236243677078,131.874725775098,130.744793119488,123.400495260124,116.730535449725,125.499601489975,109.107901166346,-1 +80557,181.495948299111,191.909888098683,144.529663873924,148.229995979605,144.132092342952,135.745315825875,134.653216339299,126.373835118904,113.623732622043,124.052612699085,116.231193977872,106.675951053127,111.908388025463,106.739085092883,105.66644011128,112.107976825705,128.573802983298,122.9888065633,-1 +80575,187.256063644666,202.88072581678,178.806476577318,162.320898137946,163.135521223637,146.517851333837,146.376503798061,136.707653242965,137.500601972945,143.677915769299,133.400227728745,135.870176987265,133.26851135589,143.500952823506,119.080649786713,123.516534859317,137.83743818789,101.077130838469,-1 
+80607,183.808206787467,206.171459321279,165.700105910549,154.866591410138,158.588303399503,145.873363952937,127.578120988809,133.109000745129,154.565985331612,133.900368495314,134.123038180249,133.377816707219,123.554959021946,124.380001365245,123.964958133902,117.179227759579,121.943870376535,118.264839056126,1 +80680,173.635092894071,210.147122674629,166.089418187501,139.286548831032,166.010322411311,135.191777501003,150.807778839806,123.876186766799,120.803140629207,143.276614904601,135.631484708707,131.088388789129,117.737398411752,126.072157024154,119.178027148892,135.626347687233,138.827161708861,125.689579880369,1 +80688,186.674934474583,197.560998451902,156.845862243447,146.867024503728,149.293526518835,133.460973632578,140.587913566104,135.968648978919,143.34166318231,122.804538498824,139.12986931229,114.102968182884,131.260070284761,119.002488810687,107.439383667831,112.620777216991,111.679737126796,96.2213406169383,-1 +80765,216.02104579827,209.185588138142,185.568629841796,150.70583490553,169.742523510451,166.547030546503,139.809133613569,142.825669086303,188.281116827985,146.657348183823,140.997190168962,141.889792545048,153.038211321771,131.29627004541,127.502545334423,136.606981614982,145.60717637486,139.555962148193,-1 +80812,184.657495424338,187.621858155783,166.245896878459,163.284785169606,157.144293775666,162.990447820797,156.334722149664,147.480554636829,124.953441827756,129.563754877457,115.282081768267,135.105807097746,128.959336835859,115.048368244867,129.886457418516,122.578576572654,116.86310570697,129.163732383231,1 +80854,185.020680336776,211.994904574952,182.325014581789,163.873404260239,171.508120921443,168.616966953365,152.612810195267,143.284242175234,126.499343602001,157.322552166084,147.5524972384,132.201230091164,122.969884799963,134.800332610804,130.726288396651,133.71387616152,129.718072486127,106.752542813065,-1 
+80889,180.408503193305,200.808842126808,173.574290461812,136.371871986506,160.265321637845,153.060838176115,141.388947835633,136.881004236382,122.753940322195,146.810702476394,135.203578764716,143.190391890885,117.035501051085,119.473558897576,125.494984851257,125.225766021654,133.545819156681,145.490005144656,-1 +81043,202.741222733871,199.803362505197,180.167886733158,212.561493840988,163.174256347832,157.186908914421,151.593176788281,162.476649104023,150.712414100636,134.768817162979,136.092782158188,124.110678843833,140.300866853933,153.341771024687,134.031385839492,126.946122977309,117.662622996293,103.334416268983,-1 +81222,191.227277067359,208.793859505049,170.689278056503,150.310804755834,147.338456104986,139.8018504946,141.137479919513,148.296032475012,119.755506413434,137.244829068067,128.74583650041,122.599093087836,129.104503109011,112.641485851645,115.231338174106,119.795968133578,114.01238888934,86.4597805949561,-1 +81231,173.951118681247,167.303927682018,121.954948765009,145.093405666409,155.764480788899,136.646680682516,125.02574271229,126.038870020759,116.560878275846,134.080779037033,109.444459173586,121.119282725741,99.0148600363296,97.544605640499,115.634175590867,107.288764736496,102.270671932399,90.8497473762696,1 +81287,195.855640434662,187.182318933634,171.592591845035,140.99462398532,154.392765822075,135.830508029174,134.260809781787,140.779141885036,141.487905536686,124.742556630573,148.8536067529,118.865295517026,142.835174838291,120.122656600588,116.929979518081,124.117895810131,122.073510535191,102.218037049687,-1 +81323,184.731622039006,200.470917025537,174.880571585105,140.719245446992,160.074305403943,133.453180509026,155.079631896728,128.672387264526,143.768779334712,149.354167421456,140.93549578961,137.725341018675,135.673537540334,111.795610462314,105.735791040008,119.644348940175,146.847209006317,121.470361962221,-1 
+81353,183.791011315868,214.88673645719,178.630883197233,167.326475187936,175.780544415484,157.57318999952,147.127103927919,152.583074918464,113.60967755374,145.848615775193,138.521899851783,150.221949097066,127.49261898015,135.405335560957,139.805755607718,126.727797199246,123.261963914268,114.947815146807,-1 +81456,188.351497753268,203.804520734997,180.229754794442,176.932177397112,158.212398201386,152.190857878528,151.640546791597,127.583563962161,137.539802613438,146.726299640256,137.886460465865,142.686754362683,126.608944294267,139.949283252773,137.656203915904,126.953273648364,126.670722649368,121.452204767846,-1 +81528,173.466607587345,179.141187327818,154.149510467778,131.130844570495,146.526254765375,132.289690025031,140.034053157514,125.95031503483,144.841970531183,141.057083054827,133.275844050624,126.616774588364,113.267848784402,108.373208624203,113.484864665957,116.469180231145,123.893363205284,99.2779126646425,-1 +81533,194.535926900645,203.250128833902,172.62091698492,169.120234411745,155.322564443198,137.154009807276,139.084103006045,135.496199727712,141.853516983401,136.211191849496,138.487505306105,132.744311847884,139.225186559326,129.361407996512,122.910032087405,117.849503226012,125.51276153112,110.687492779519,1 +81544,188.230539164423,185.766846926336,172.217462839062,175.95623947224,162.126880253091,167.403879203347,156.8439265777,158.942033677504,152.292655480838,142.227986356603,127.233135216447,140.779922332344,125.299666204253,132.359269152982,143.914814815372,128.037891565796,116.080396425386,127.063691551745,-1 +81644,207.438370865673,210.117669367261,166.956950012281,145.895527285308,171.957228954031,154.044789231472,147.894035230381,127.569571216529,165.307771875009,145.743724490374,128.006305279612,126.719562182933,148.276884592405,124.393637818077,122.106118710523,127.077277188521,127.461685221598,122.439399482592,-1 
+81659,202.353916077945,182.57553139666,156.566926096897,167.193918328586,176.50248367666,158.385438724968,150.318953906617,148.611827447015,130.07737411307,146.812158910182,131.918779504328,134.199677496161,136.365280853968,123.68576338293,123.532236050954,109.16616432137,113.822515129668,105.556443450624,-1 +81662,201.596893790262,183.005479952149,153.050905156549,184.992831230821,146.785028777572,130.900765058317,144.513086855742,151.818029267437,141.99446879797,137.408196216612,127.268954879253,114.466510089526,133.336073136164,127.791806585654,115.178227672063,114.450184517966,137.398968430358,97.163626813351,1 +81754,186.820632447415,198.931191471447,170.494172670559,146.859205090785,167.691923552191,145.738195479351,150.659314520806,133.179716271964,111.644461345092,161.142360308624,146.892342961862,132.494659008819,117.757315160617,138.263743444129,119.837223463286,124.387835403629,147.032297999998,157.263164391375,1 +81826,201.783724060203,202.353959621711,180.737782065343,175.455556731534,159.734484904841,145.642554649505,143.847402714609,135.870734104698,180.35210760092,138.162147083984,149.404456579547,132.030232320999,147.882333482824,133.791569201187,109.33423087758,129.018930274293,128.420056403791,124.789585509048,-1 +81865,165.518568060897,220.237019518305,184.59719939912,133.56865434298,162.680753999024,147.979779317444,159.597163053668,132.60651675935,122.461186787114,128.857658013451,136.606160602902,122.280849978827,122.334527397751,117.679292374975,127.418295998034,128.458787793562,113.737733874681,115.22030510284,-1 +81876,195.185781404938,200.484839362524,170.692803835512,156.635116329371,174.481269176,151.93717822684,142.426527487934,145.126282738191,136.031430111843,130.424626239206,136.004130078243,138.420292756984,129.277234891083,131.954534890159,122.691168221136,125.211637456651,117.387878178826,127.255831744113,-1 
+81903,197.58820989472,208.982826443101,179.92477292063,152.62010170471,179.353162721632,158.548244575339,155.81204742096,144.597801090905,135.64989808848,144.4789288086,152.804733535885,130.218117699749,128.218069226813,122.503291431467,129.080915110942,128.443846139353,124.749628845205,108.556311625098,-1 +81906,194.581619485061,211.6119571573,171.933879874861,126.236312836223,170.55884260352,160.499040732068,145.27140916041,132.714290030002,125.756726771605,154.610752565595,135.957819035356,137.291348620132,126.6012784212,116.459716083521,112.072853360605,121.811982379404,118.653905715831,111.331049375912,1 +81989,196.765953031284,195.650754244691,184.086083528063,183.539702900956,180.008109830494,200.119254364202,154.415115342952,151.901386463884,151.748774030156,142.780463539311,138.126759173219,140.542993063905,144.663691314291,132.770488713221,147.858760701774,132.847792094673,143.807255188982,125.318044635742,-1 +81992,197.914926762278,195.891306738574,177.378479364423,176.712895352515,163.986666704271,166.367945927126,152.113279570275,146.32598528489,135.232776244072,148.609712873702,132.0022100097,134.037464443143,139.671635066975,132.335403369389,124.989998096989,126.734549170746,128.958212951053,116.871539876859,-1 +82003,194.650997881069,193.27174759489,165.611837026958,165.481727221323,142.10739889724,128.980797139383,145.612151069499,137.577190488145,161.619292381798,125.75893076795,135.014334460605,130.815888926624,140.81909808044,119.352412150789,110.9646999069,116.412738619319,124.007974508027,122.719460266231,1 +82021,207.415315343335,204.435411731152,169.199780387556,173.218556046651,162.130398171134,161.215176702279,143.717994941999,139.758073205084,161.247625987084,137.851096988624,146.518730287929,133.466059519682,146.897753583793,130.898437674182,126.112160340129,128.362743691773,142.063145588498,140.152679562697,-1 
+82063,184.8044609244,208.482923624304,167.659594602203,149.069958926674,144.161296220547,132.265963668938,140.188065693071,136.068798327264,123.140085322571,147.049850502033,145.185757870917,124.440304292426,124.407077447121,120.762829646527,138.665312740998,118.789300567994,129.154460230834,103.250179713456,1 +82066,187.001961352677,201.889221636501,164.015988358846,174.262491768036,159.090224916454,143.293669972107,144.554404227249,146.735464323286,139.26637547815,132.701127857805,131.327526170887,140.998398107208,128.571949685705,127.883742466174,125.83551329641,127.559614593346,137.616726953668,107.593485255815,1 +82096,194.8798414485,204.439351836094,170.548136922107,177.635407979248,167.986374145885,177.058832608407,160.417385109536,148.328742122333,152.19318386004,151.169058486068,137.671907322612,139.025975300724,130.15428864866,125.354758257331,135.274701714748,120.773652081656,123.849246520599,122.331840333856,-1 +82124,167.912244158518,179.214639570786,149.266449690748,135.993282256829,140.723453569636,132.59207194166,140.771344744969,133.355778614346,130.010755639004,115.174041364347,117.333076739369,107.34914047768,122.262488078664,111.24870041001,114.490597848384,107.524941688127,112.243633588868,91.2423949364323,1 +82155,201.05048189645,188.576926035844,169.972350044317,169.232804177661,167.721770071981,170.315790969035,125.38426563126,138.991647864496,148.336477332782,126.360151797832,126.257213534729,124.584291426179,152.488941310166,131.202702509493,140.860869528477,130.105646507495,144.833942605388,107.772689421417,1 +82202,183.492759365241,212.891448993664,181.044891879293,144.504611367253,151.966033955127,154.804161418355,153.964311888248,138.972435863289,146.86598311504,123.82555436438,141.036123789684,136.99671662636,136.479408378826,123.331880303801,136.423652936127,133.406200693637,130.395922233747,113.786063913678,1 
+82208,199.132264405168,211.69899254189,194.761156051817,166.4897228191,157.536419126322,148.375873204094,166.74632106026,145.61233385739,141.827645799694,136.899079576487,147.210286341941,129.759616253735,137.207004293083,124.345159382541,124.056088234682,134.309796142692,119.286878238928,109.800320041019,-1 +82217,210.177868499674,203.205465144109,183.426578522159,156.47357235768,157.068650473597,158.970651996461,150.711611796688,143.45323645164,146.20938636147,147.629874718019,129.95526300055,135.097650356534,145.983626378188,129.136628276335,134.398135691187,136.428588765912,127.943126387863,122.636367531582,-1 +82229,182.203085396344,199.721822882095,178.948667219584,135.824139584283,167.698896191405,153.094507269862,160.555925381008,135.012573322113,151.771726802556,138.642743227456,145.700437280505,134.911620904569,130.445784889305,120.687416693722,116.006701706498,134.072424035423,113.324177411669,130.790684285251,1 +82232,196.582496023666,209.23898161958,169.919431850802,139.381573839579,136.966134280625,136.345167684326,148.606200542161,129.805161142578,144.321100174779,137.853411387772,118.567422377433,126.431193471462,136.345012549772,118.050182368542,113.825469454622,122.543468632972,109.69480716531,94.3681571243306,-1 +82281,195.274004206109,197.826513529998,185.936749177676,154.31498186534,184.889477199143,172.422715717891,154.283873755734,140.151221401587,160.109134856965,155.650155831219,157.833592705036,147.479902272912,131.664077114814,123.37472136451,149.887063309752,135.833658562578,142.505724195241,170.209101346245,1 +82293,193.08816646479,199.910245324403,165.490271320527,141.132771025227,164.28131836793,153.505405232248,138.757866534721,122.655792908805,126.025112137978,141.335778496496,127.581621349505,139.37511021623,121.47578239911,126.842962671789,128.382735792334,122.470021207535,123.477265364968,147.972791571861,1 
+82311,173.120296492968,192.622327030068,161.922291453378,140.154653614942,147.031374901069,147.211863265204,129.641717437622,136.435830762652,139.486788431032,128.825495726975,126.7807061274,119.032455685234,131.438166017706,115.938184927093,120.819933294039,114.192574050756,99.2415294769923,111.598696833021,-1 +82359,212.972183661996,214.928281477529,198.083910832136,165.075800043997,173.439723467707,166.072439532728,179.495240048246,149.520278463773,148.958084238745,148.727298039155,149.549338900081,142.830047472183,142.664988311072,151.545063338623,137.194276172643,139.630618598892,122.47814566273,133.252637073487,-1 +82373,189.887724369362,213.738297621433,166.680226039658,143.122617538436,170.873303219196,148.592446994756,143.828113516822,136.727774062254,126.683373218968,154.565186156961,132.989967486667,145.190413415566,128.686708668899,123.217879406446,129.720804261025,130.633520752081,123.831998342329,124.824514079147,-1 +82423,199.652387911055,199.43238993139,183.641487664356,178.796153973028,164.760708781768,151.561990618325,148.240817287925,141.777368730768,145.635893687882,146.853432893683,137.587264409131,139.38881559956,144.050519621478,157.510023791229,130.147211030483,133.514849144885,141.944447225392,119.911931569173,-1 +82453,179.896103224736,201.227896249225,169.81275628242,138.880081910045,164.879908296724,160.764303589893,135.917862293666,140.844702767219,137.52503256484,130.003160874926,127.364156466016,122.850539530718,124.117318476215,112.805222821588,116.982794481418,118.843593298338,115.253981838684,113.938856946331,1 +82458,186.941905727173,202.267924188189,177.285277447047,151.194552856086,166.285636267148,152.898027145633,155.141908627116,144.781587492513,131.36603150728,126.330983185857,129.830178091742,136.894467168508,134.34929092556,134.632923726203,124.647255472932,127.655436994016,123.881597363295,130.908911215076,-1 
+82467,179.480847052055,179.409749953211,143.037764118897,117.672660172057,154.663463381169,135.703002903422,132.0068713836,112.368664155362,127.744205452326,131.925978998015,117.755860050766,119.295441166729,111.649192166776,91.4418577308051,106.091485647482,111.961060515213,115.199508549002,106.083526944088,-1 +82492,199.135250098163,212.210654827173,182.754336620254,176.166576270141,174.752751020581,180.483580966445,154.383707087153,135.476619216632,137.222518807996,165.348810648877,150.440719351167,156.019627110793,136.877720922192,129.57679716064,139.787835125411,132.40652034396,142.060648561516,147.093562258807,1 +82511,186.669941768565,196.655486218833,173.670653734616,158.835088271773,159.993377729836,135.490676209326,164.561207113614,126.897747605229,154.786983441916,136.788652610008,143.85081641903,141.107163482512,132.107618891606,126.727850818939,128.033804243571,134.156392354536,123.34455823227,146.854744709796,1 +82587,185.880071130647,194.964140175912,176.470878002198,158.153776198763,154.400099527254,145.14924909894,141.528671269084,133.081327513508,134.270980791787,139.320545008324,137.104667891902,111.421350315025,129.911433314978,125.208172842227,113.508106683236,123.093001151973,119.799291502669,111.060096262352,-1 +82674,197.281559315538,211.979084019339,180.638731174547,155.795603164481,172.418244318108,166.558207844124,159.06330150813,144.052001041535,144.571881543313,140.176868961538,149.021855378398,150.991380199686,143.690924064802,139.751793359635,137.828112253754,134.635341543824,136.254689505868,144.30991696884,1 +82709,194.303858341477,221.970571218589,185.529851961899,176.300860392066,180.205968334286,157.452175759554,170.115806276671,148.547877568238,151.949370615078,153.47785921904,141.813883105623,145.548939399489,136.242619746405,136.125075207149,121.868913497701,128.39091896427,118.317527412575,131.832957688451,1 
+82754,192.975893638717,217.118579606522,188.591978004207,145.216494564656,162.034214127145,137.305450031862,145.637881123979,142.518113785857,161.278739757094,158.413386119607,143.148725373766,119.099709806754,135.00979986267,130.278372074243,110.118039624274,120.580005050777,125.363672583389,119.95538018753,-1 +82784,176.318156240964,191.299737121032,166.12873827634,131.262528660837,158.379202274214,154.992547411024,137.093395142774,134.047577478401,132.998133966604,144.181791608861,126.952393899118,149.976972145049,131.3163936868,102.429032849279,134.264552969304,134.407139789673,103.11394590302,111.636151463308,1 +82877,209.38979944586,200.784621008945,168.270652500528,168.467526337812,166.900233388394,157.638963343199,156.520305524151,147.327309330949,141.552516595802,143.715781023687,138.052911883513,127.830781256311,133.291612955209,126.598898276231,131.380082800281,131.544808678976,112.87938592474,110.91662897322,-1 +82962,202.473835211611,211.636706925035,178.880446727355,164.401733342161,158.644129525762,140.919576807155,163.194465650887,156.139556873281,150.995878365966,148.347192564436,143.982946795725,125.773975844123,148.699564932229,132.534526661349,117.441725943933,128.297684243905,140.407373759505,135.081555982228,1 +82982,201.750876673501,183.245286550591,165.643800454107,172.207685996429,166.756666295017,157.765088733138,156.34057776325,145.405098412174,144.03340910352,129.813190585177,131.380796623924,138.759182325581,142.191281187268,138.474641440801,144.250031114849,122.870492737523,139.990911720982,129.988720363123,-1 +82985,215.325955169057,190.763113022411,195.591071194063,191.571542301133,168.749855659045,187.947591803362,144.528889801359,170.289010679065,150.287638376159,146.591187815732,142.927898822255,139.529404358785,162.784051753468,145.767411390676,142.546864742637,138.13398979123,148.535010838387,161.078469320903,1 
+82989,201.4781994868,207.419939208093,181.516960275819,156.062648588665,173.451572875599,168.887311757583,154.672557160588,143.135586895635,152.412509503806,143.887854897024,129.466229120249,161.568110372046,143.429476782891,122.452671590441,136.00402456569,140.842677872242,136.254669569292,152.759318833434,1 +83010,168.558867994027,205.21051282294,172.814476413383,151.932518318357,156.393817800199,135.54222178847,134.176981954864,124.824243345518,109.067923867332,124.31237475418,133.58136419046,137.646825608161,125.487155563253,122.70812319563,111.466517157941,117.40397544533,115.084175101787,119.033326659744,1 +83013,187.063515062944,195.997455798209,171.646162560232,141.20668078104,167.824461857309,139.321708198867,144.532053819827,131.724977017729,153.619163317889,125.772370961122,128.007770418154,137.780582103358,127.320711058262,115.036308872096,124.218885772173,128.156440102172,111.147695797383,105.51730236435,1 +83044,192.194081955988,183.467264032455,168.181432514478,163.676192396142,153.096067312786,152.563410091811,161.225819155246,141.122114024106,130.024503572518,131.79493083304,127.690436340587,120.428400629844,132.677297971121,133.26105361486,131.248957049828,122.402747718506,126.571148251002,129.568462960093,1 +83080,184.176901885871,195.222817151136,160.575827167485,154.015756431598,152.018087307396,155.131299710161,130.952291204199,135.802832664122,157.604856919164,132.086121994334,129.06369382056,128.07011391964,140.543518336635,119.629382061515,116.345684052971,121.118526536659,109.273810498247,95.7199431749264,-1 +83103,201.705274509285,203.439439731613,188.712929416429,164.754021699637,168.782415531304,161.521553688962,179.099726543395,150.924318501929,155.26885642093,137.448594065576,141.909193930291,155.590279130013,140.737339850804,140.520223198076,141.11406958094,139.755380478866,131.864449000002,134.88656198492,-1 
+83113,187.768034781522,202.172857095413,161.748885491451,142.92499489013,176.255904890047,171.738823974368,131.218172817283,135.020507617146,140.462586871932,127.779021680938,138.931623922797,142.359652954484,122.403334911203,115.624228696932,120.662068849559,115.928946009085,109.940776732505,101.877647929846,1 +83207,196.810935018655,223.943171364471,185.58498342083,164.464111172529,159.794010700213,140.458395409023,172.191212391599,140.618159222946,144.39119771048,137.8806774181,147.607190507547,140.927404070051,137.138537648292,133.242432680625,126.752452889359,133.842905313603,131.498913434982,117.403384138529,-1 +83260,174.92957790078,174.012226681469,141.942468461634,142.859332928752,136.523892867342,125.491956462402,127.916801070664,130.534966520511,129.446238273169,121.758273461513,112.326228528598,115.684560769772,109.92807534895,111.468013382689,101.402748043327,103.404813856974,109.443626915481,93.979310178847,1 +83358,195.840433796147,187.077598764879,172.905849112768,161.388314727454,159.699603656979,141.056383958897,147.939494020734,139.54833362778,133.907920413198,140.806875600941,147.932065172369,128.563585962299,123.724961139807,136.608413130803,115.872594959533,108.835983218282,131.153650903399,138.281068462265,1 +83372,190.220307212212,189.268706712013,169.291452416391,142.426560502524,153.245751470177,160.229575190319,157.488143986558,148.322779185114,127.325126130591,152.84780192135,139.991693137051,127.849344807806,134.379736029656,118.418378158389,124.257261903218,133.853242800654,108.052965879576,106.172321259962,-1 +83423,178.440499856179,200.65254556456,165.579401121443,144.834553477553,160.935446400365,147.593749338649,129.482319952274,130.455466669109,144.96402139255,128.676968667326,126.873963775151,128.903591127547,129.367828510881,121.629834494918,124.384774934554,120.334989792314,130.220774884222,132.831143252285,-1 
+83429,215.45176045963,194.640969706567,165.248549336019,174.868210991122,179.716667778386,185.951291604066,159.573918627972,154.813812279405,146.321578878926,156.60953024643,142.32494539367,142.853286025871,135.475473374816,125.779092049348,142.947613975785,132.286932278329,127.35831582746,153.89649535954,1 +83454,180.884681707866,197.852210769323,160.567111255245,124.107341848546,157.001599349678,149.718713814416,132.828771414669,132.239299629266,123.979501056227,133.668001457558,135.817443580716,139.274093001875,128.569691511324,107.413891044346,120.35238800558,121.897004286842,122.383851458267,128.617387824467,-1 +83525,188.744976911065,200.550965433035,192.137911708731,174.373966291246,166.891650279147,163.613570772958,172.7961508972,143.988921151567,141.909577280499,145.435924516141,147.236687346094,141.167216847183,150.512001371583,144.717693976893,129.956688254551,132.176231204867,145.153502034592,135.183988522452,-1 +83531,196.628955594668,225.860955912858,183.881934187959,179.096052796298,170.577319610047,147.499013154728,138.711045574279,152.732098722718,142.476167681622,143.549578881708,142.885249646427,138.176970246614,140.111306887817,139.745389647258,131.338373640628,128.86875112286,133.979915722556,109.950639645315,1 +83580,195.096475973612,186.846765499261,165.780957990665,148.370604301683,163.975491353321,159.466456704958,144.525382681134,139.952169825885,125.41639424339,135.023361942473,122.892387431515,134.788161987828,131.251430628559,119.807554194378,121.639633622329,125.956874064879,103.133210115357,104.285111246583,-1 +83612,187.772738032596,213.774650127275,175.902709052128,160.283870825304,161.65330909586,169.818622332536,133.663246092458,137.631483352587,128.10828074404,120.308525050111,145.837652144318,135.004895715063,128.72959980545,131.920779035364,125.248098646268,118.134677569099,122.848600306566,131.152625636583,1 
+83616,175.67266153937,192.073100555239,162.961598739692,129.08768380968,150.193542350623,138.849992211436,130.871755130875,119.961519394824,157.709303536165,134.514299819652,138.974006502434,119.516653460456,131.026972575314,112.074046798795,113.93048744692,123.292184844551,114.163345780562,105.573156614859,1 +83632,183.995260661019,211.388946621345,167.083078877339,143.317401247709,154.343930817348,132.601452726378,133.88971120474,130.887629432097,145.595427477525,136.188101432665,144.569041631489,120.663693511854,132.878810710745,121.394016441105,110.08588506981,125.236034740576,135.029610289123,124.637967330231,1 +83648,212.978664305823,205.228543880118,185.052149420143,160.329758352353,180.47716879343,166.546139822349,161.038633511094,142.513894778926,169.201610954847,163.69149985101,144.21207959104,142.883706760876,153.439348249746,126.486066977041,144.143132820937,141.108574842525,152.564215818041,149.611379917327,1 +83835,209.142755565757,205.046951024158,180.53749709132,180.208092541053,171.193193309716,175.830702429494,152.427841887408,157.09302343567,169.867100801088,143.686727181449,136.507466169549,144.872748993151,136.186640887368,133.675722679933,139.504422559391,128.245427486948,141.579634544641,123.024712482278,-1 +83972,207.140118393657,209.8611335696,186.431580548383,159.821442701552,171.150685846884,181.941708569223,152.133233428113,151.045029816421,141.815894488848,150.309028544661,133.124786851693,151.603950219774,150.455936996533,136.431649625801,136.008622405418,137.781362133518,135.172853386324,118.168832538743,-1 +83987,200.906813751077,214.599713547498,180.786599205092,155.105194756144,167.390172804919,143.220860341409,162.583481859524,136.3198200843,162.063763636495,155.624539854399,146.326854193646,125.156512635219,161.564588361461,137.207868749249,122.091150588175,137.09537078005,162.801900193979,120.719177790173,-1 
+83999,196.07719674783,232.29190824387,183.141171570685,152.025042360673,164.744335935917,160.099885048049,149.616103316703,146.61875858089,136.059597658762,146.677820974888,157.672053664376,139.140753614186,131.308735926131,129.90210951838,126.048743665072,131.009288169389,114.567282256465,110.597056463706,1 +84002,182.877108480496,194.510740897736,163.833368203433,161.815577037177,167.0858341858,161.161043949378,133.480191412367,136.135042907855,160.165090386012,138.062151989191,123.982307969116,133.723769063821,133.959853584907,130.491723169664,125.688390832498,128.989450229346,130.795028422829,122.471876196879,-1 diff --git a/test_covar.csv b/test_covar.csv new file mode 100644 index 0000000..1b6c482 --- /dev/null +++ b/test_covar.csv @@ -0,0 +1,100 @@ +bblid,sex,medu1,age,ageSq,white +80010,0,16,21.75,45.3293735564122,1 +80179,1,14,21.1666666666667,37.8148239008634,1 +80199,0,12,20.3333333333333,28.2603085199208,0 +80208,0,12,20.5,30.0601004849982,0 +80249,1,12,20.8333333333333,33.8263510818197,1 +80265,1,16,20.5,30.0601004849982,0 +80289,0,12,20.0833333333333,25.6647872389713,0 +80396,0,18,20.8333333333333,33.8263510818197,1 +80425,1,12,20,24.8273912564326,0 +80498,0,16,20.9166666666667,34.8026359532473,0 +80537,1,12,20.9166666666667,34.8026359532473,1 +80557,1,12,21.5,42.0255189421294,0 +80575,0,20,21.75,45.3293735564122,1 +80607,0,12,21,35.7928097135638,0 +80680,0,12,21.0833333333333,36.7968723627691,1 +80688,1,14,21.9166666666667,47.6013877437118,0 +80765,1,16,20.5833333333333,30.9808298008702,1 +80812,1,14,20.5833333333333,30.9808298008702,0 +80854,0,18,20.1666666666667,26.5160721103989,1 +80889,0,12,21.75,45.3293735564122,1 +81043,1,12,20.75,32.863955099281,0 +81222,1,14,20.25,27.3812458707154,1 +81231,1,11,21.75,45.3293735564122,0 +81287,1,18,20,24.8273912564326,1 +81323,1,16,20.5833333333333,30.9808298008702,1 +81353,0,12,20.0833333333333,25.6647872389713,0 +81456,1,11,21.6666666666667,44.2141997960957,1 +81528,1,12,19.5833333333333,20.8487446770724,0 
+81533,1,15,21.5833333333333,43.1129149246681,0 +81544,1,14,19.3333333333333,18.6282233961229,0 +81644,1,12,19.25,17.9158274135842,0 +81659,0,12,19.25,17.9158274135842,0 +81662,1,12,19.3333333333333,18.6282233961229,0 +81754,1,13,19.3333333333333,18.6282233961229,1 +81826,1,12,19.0833333333333,16.5327021151735,0 +81865,0,14,21.75,45.3293735564122,1 +81876,0,11,21.25,38.8466643278466,0 +81903,1,16,19.25,17.9158274135842,1 +81906,1,13,21.3333333333333,39.8923936437186,1 +81989,1,20,19.0833333333333,16.5327021151735,0 +81992,0,12,21.1666666666667,37.8148239008634,1 +82003,1,16,21.1666666666667,37.8148239008634,0 +82021,1,14,19.3333333333333,18.6282233961229,0 +82063,0,12,20.1666666666667,26.5160721103989,0 +82066,1,14,21.4166666666667,40.9520118484796,0 +82096,1,12,20.0833333333333,25.6647872389713,0 +82124,1,13,20.6666666666667,31.9154480056312,0 +82155,1,13,19.5,20.094682027867,0 +82202,1,16,19.5833333333333,20.8487446770724,1 +82208,1,16,21.1666666666667,37.8148239008634,1 +82217,1,14,19.5,20.094682027867,1 +82229,1,14,21.3333333333333,39.8923936437186,1 +82232,1,18,19,15.8619727993015,1 +82281,1,18,20,24.8273912564326,0 +82293,1,12,19.8333333333333,23.1942659580219,0 +82311,1,18,20.6666666666667,31.9154480056312,0 +82359,0,18,21.6666666666667,44.2141997960957,1 +82373,0,12,19.6666666666667,21.6166962151667,1 +82423,0,13,21.25,38.8466643278466,1 +82453,0,18,21.4166666666667,40.9520118484796,1 +82458,0,16,19.0833333333333,16.5327021151735,1 +82467,0,12,20.0833333333333,25.6647872389713,1 +82492,0,12,19.5,20.094682027867,0 +82511,1,16,19.6666666666667,21.6166962151667,1 +82587,1,17,19.5833333333333,20.8487446770724,1 +82674,1,16,19.6666666666667,21.6166962151667,1 +82709,1,11,22.5,55.9909373992605,0 +82754,0,18,20.8333333333333,33.8263510818197,1 +82784,1,16,19.5833333333333,20.8487446770724,1 +82877,0,12,20.25,27.3812458707154,0 +82962,0,16,19.1666666666667,17.2173203199344,1 +82982,1,12,18.8333333333333,14.562180834224,0 
+82985,1,12,20.8333333333333,33.8263510818197,1 +82989,0,12,18.8333333333333,14.562180834224,1 +83010,0,20,19.5833333333333,20.8487446770724,1 +83013,1,12,20.25,27.3812458707154,1 +83044,1,14,20.8333333333333,33.8263510818197,0 +83080,1,8,18.5,12.1292635707359,0 +83103,0,14,18.1666666666667,9.91856852946994,0 +83113,1,12,18.6666666666667,13.3179444247022,0 +83207,0,12,19.25,17.9158274135842,1 +83260,0,16,22.6666666666667,58.5129515865602,0 +83358,1,12,19.8333333333333,23.1942659580219,0 +83372,0,16,18.25,10.4504089564531,1 +83423,1,16,20.75,32.863955099281,1 +83429,1,12,18.8333333333333,14.562180834224,0 +83454,1,20,20.1666666666667,26.5160721103989,1 +83525,1,18,18.9166666666667,15.2051323723183,1 +83531,0,12,19.9166666666667,24.0038841627828,1 +83580,0,17,19,15.8619727993015,1 +83612,1,12,20,24.8273912564326,0 +83616,1,14,18.5833333333333,12.7166595532746,1 +83632,1,13,20.9166666666667,34.8026359532473,1 +83648,1,16,19.5833333333333,20.8487446770724,1 +83835,0,12,20.6666666666667,31.9154480056312,0 +83972,0,12,19.0833333333333,16.5327021151735,1 +83987,1,14,18.4166666666667,11.5557564770861,1 +83999,0,18,18.6666666666667,13.3179444247022,1 +84002,0,12,22.25,52.3120827849777,1 diff --git a/test_data_readme.txt b/test_data_readme.txt new file mode 100644 index 0000000..e991e3d --- /dev/null +++ b/test_data_readme.txt @@ -0,0 +1,9 @@ +Test data is a subset of a functional MRI study dataset by T. Satterthwaite. + +The features are functional ROIs. + +Labels correspond to healthy and pathological subjects. + +The specific feature/label information is given in the headers of the test.csv and test_covar.csv files. + + diff --git a/user_manual.pdf b/user_manual.pdf new file mode 100644 index 0000000..b6202b2 Binary files /dev/null and b/user_manual.pdf differ
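The readme says that feature and label information lives in the headers of test.csv and test_covar.csv, and both files key their rows on a subject ID in the first column. A minimal sketch of joining the two tables on that ID follows, using only the Python standard library. Assumptions: the feature column names (`ID`, `ROI_1`, `ROI_2`, `label`) are hypothetical stand-ins, since the test.csv header is not shown here, and the inline sample keeps only two of the 18 feature columns, rounded, for brevity. The covariate header and rows are copied from test_covar.csv.

```python
import csv
import io


def load_by_id(handle):
    """Read a CSV with a header row and index its rows by the first column."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = {row[0]: dict(zip(header[1:], row[1:])) for row in reader if row}
    return header, rows


def join_on_id(features, covariates):
    """Merge two ID-indexed tables, keeping only IDs present in both."""
    return {
        sid: {**features[sid], **covariates[sid]}
        for sid in features
        if sid in covariates
    }


# Inline stand-in for test.csv. Header names are hypothetical, and only the
# first two feature columns are kept (rounded) to keep the sample short.
features_csv = io.StringIO(
    "ID,ROI_1,ROI_2,label\n"
    "83999,196.08,232.29,1\n"
    "84002,182.88,194.51,-1\n"
)

# Inline stand-in for test_covar.csv, with header and rows as in the file.
covar_csv = io.StringIO(
    "bblid,sex,medu1,age,ageSq,white\n"
    "83999,0,18,18.6666666666667,13.3179444247022,1\n"
    "84002,0,12,22.25,52.3120827849777,1\n"
)

_, features = load_by_id(features_csv)
_, covariates = load_by_id(covar_csv)
merged = join_on_id(features, covariates)

# All values are still strings here; convert them before any numeric use.
print(merged["84002"]["label"], merged["84002"]["age"])
```

To run this against the real files, pass `open("test.csv", newline="")` and `open("test_covar.csv", newline="")` in place of the `io.StringIO` objects. Joining on the ID rather than on row position avoids silent misalignment if the two files ever differ in order or length.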