Merge pull request #99 from senresearch/dev

Update to version v1.2.0
senresearch · GregFa · Aug 28, 2023 · d5ea4d3
2 parents b33af06 + b78d3ad
commit d5ea4d3

Showing 14 changed files with 651 additions and 242 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,13 @@
+## Version 1.2.0 (Aug 17, 2023)
+- Help documentation added: to view the help documentation for the functions `scan()` and `bulkscan()`, type `?` in front of the function name.
+- New features:
+    - Added the wrapper function `bulkscan()` for the three multiple-trait scan algorithms previously exposed as `bulkscan_null()`, `bulkscan_null_grid()`, and `bulkscan_alt_grid()`. The user can now call the common interface `bulkscan(...; method = )` and choose the method according to the preferred trade-off between computational speed and precision. Allowable inputs are the strings "null-exact", "null-grid", and "alt-grid". The default is "null-grid" with a coarse h2 grid of step size 0.1 (10 values: 0.0, 0.1, ..., 0.9).
+    - Added the option to use the singular value decomposition (SVD) of the kinship matrix. To use this feature, supply the option `decomp_scheme = svd`.
+    - Added an option in both `scan()` and `bulkscan()` to return $-\log_{10}(p)$, where $p$ is the likelihood ratio test p-value. To use this feature, supply the option `output_pvals = true`. For more details, see the help documentation via `?scan()` and `?bulkscan()`.
+- Fixed bugs: 
+    - Fixed a bug causing a compilation error in the `bulkscan()` "null-grid" algorithm with REML due to a typo.
+    - Fixed a bug causing an output dimension mismatch when using the function `scan()` for permutation testing with adjusted covariates.
+
 ## Version 1.1.1 (July 11, 2023)
 - Fixed bugs:  
     - REML option in multiple trait scan functions ("bulkscan"'s) used to lead to compilation errors.
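
As a rough illustration of two items in the 1.2.0 notes above (the default h2 grid and the $-\log_{10}(p)$ output), here is a minimal sketch. The helper `toy_lod_to_neglog10p` is hypothetical and is not part of BulkLMM. It assumes the likelihood ratio statistic equals $2\ln(10)\cdot\mathrm{LOD}$ and is compared against a chi-square distribution with one degree of freedom, which may differ from the package's internal conventions.

```julia
using Distributions

# The default grid described in the notes: 10 values 0.0, 0.1, ..., 0.9.
h2_grid = collect(0.0:0.1:0.9)

# Hypothetical helper (not BulkLMM's API): convert a LOD score to -log10(p).
# Assumption: LRT = 2 * ln(10) * LOD, referred to a chi-square with `df` degrees of freedom.
function toy_lod_to_neglog10p(lod::Real; df::Int = 1)
    lrt = 2 * log(10) * lod
    # logccdf returns the natural log of the upper-tail probability,
    # which stays accurate for very small p-values.
    return -logccdf(Chisq(df), lrt) / log(10)
end
```

Working on the log scale avoids underflow when the p-value is too small to represent directly, which is common for strong associations.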

diff --git a/Project.toml b/Project.toml
@@ -1,7 +1,7 @@
 name = "BulkLMM"
 uuid = "b8d15608-0852-4141-ae38-222578e2ed7b"
 authors = ["Zifan Yu, Gregory Farage, Saunak Sen"]
-version = "1.1.1"
+version = "1.2.0"
 
 [deps]
 CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"

diff --git a/README.md b/README.md
@@ -184,7 +184,8 @@ kinship = round.(kinship; digits = 12);
 
 For example, to conduct genome-wide associations mapping on the
 1112-th trait, we can run the function `scan()` with inputs of the trait (as
-a 2D-array of one column), geno matrix, and the kinship matrix.
+a 2D-array of one column), geno matrix, and the kinship matrix. Type `?scan()` for a more
+detailed description of the function.
 
 
 ```julia
@@ -308,41 +309,48 @@ multi-threads](https://docs.julialang.org/en/v1/manual/multi-threading/)
 or switch to a multi-threaded *julia* kernel if using Jupyter
 notebooks.
 
-Then, run the function `bulkscan_null()` with the matrices of
-traits, genome markers, kinship. The fourth required input is the
-number of parallelized tasks and we recommend it to be the number of
-*julia* threads.
+Then, run the function `bulkscan()` with the matrices of the
+traits of interest, genome markers, and kinship. Type `?bulkscan()` for a more
+detailed description of the function.
 
-Here, we started a 16-threaded *julia* and executed the program on a
-Linux server with the Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz to
-get the LOD scores for all **~35k** BXD traits:
+Here, we started a 16-threaded *julia* session in Julia version 1.9.2. The session info
+is as follows:
+```julia
+versioninfo()
+```
+
+	Julia Version 1.9.2
+	Commit e4ee485e909 (2023-07-05 09:39 UTC)
+	Platform Info:
+		OS: Linux (x86_64-linux-gnu)
+		CPU: 48 × Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
+		WORD_SIZE: 64
+		LIBM: libopenlibm
+		LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
+		Threads: 17 on 48 virtual cores
+	Environment:
+		JULIA_NUM_THREADS = 16
 
 
 ```julia
-@time multiple_results_allTraits = bulkscan_null(pheno_processed, geno_processed, kinship; nb = Threads.nthreads());
+@time multiple_results_allTraits = bulkscan(pheno_processed, geno_processed, kinship);
 ```
 
-     82.421037 seconds (2.86 G allocations: 710.821 GiB, 41.76% gc time)
+    2.112011 seconds (107.94 k allocations: 5.053 GiB, 2.59% gc time)
 
+Please note: the default method and modeling options for `bulkscan()` use an approximation for
+the best runtime performance. The user may choose other methods and options for more precision at the cost of a longer runtime, following the detailed instructions in `?bulkscan()`.
 
 The output `multiple_results_allTraits` is an object containing our model results:
 - the matrix of LOD scores $L_{p \times m}$, where $p$ is the number of markers and $m$ is number of traits; each column corresponds to the LOD scores resulting from performing GWAS on each given trait.
-- the vector of heritability estimate per trait, `h2_null_list`, obtained from fitting the null model. 
-
-Similarly as the single trait scan function `scan()`, variance components are estimated from maximum-likelihood (ML) by default ("reml = false"). The user may choose REML for estimating by specifying in the input "reml = true".
+- variance component (heritability) results, returned in a format that depends on the chosen method and other user-supplied options. For more details, enter `?bulkscan()`.
 
 ```julia
 size(multiple_results_allTraits.L)
 ```
 
     (7321, 35554)
 
-```julia
-length(multiple_results_allTraits.h2_null_list)
-```
-
-    35554
-
 To visualize the multiple-trait scan results, we can use the plotting function `plot_eQTL` from `BigRiverQTLPlots.jl` to generate the eQTL plot.
 In the following example, we only plot the LOD scores that are above 5.0 by calling the function and specifying in the optional argument `threshold = 5.0`:
 

diff --git a/src/BulkLMM.jl b/src/BulkLMM.jl
@@ -6,6 +6,8 @@ module BulkLMM
     using Random, Distributions
 
     include("./util.jl");
+    export p2lod, lod2p, lod2log10p
+
     include("./kinship.jl");
     export calcKinship
 
@@ -36,10 +38,10 @@ module BulkLMM
     include("./bulkscan_helpers.jl");
 
     include("./bulkscan.jl");
-    export bulkscan_null, bulkscan_null_grid, bulkscan_alt_grid
+    export bulkscan, bulkscan_null, bulkscan_null_grid, bulkscan_alt_grid
 
     include("./transform_helpers.jl");
-    # export transform_rotation
+    export transform_rotation
 
     include("./analysis_helpers/single_trait_analysis.jl");
     export LODthresholds, get_thresholds, getLL, plotLL

diff --git a/src/analysis_helpers/single_trait_analysis.jl b/src/analysis_helpers/single_trait_analysis.jl
@@ -28,7 +28,9 @@ end
 ## Outputs: the logliks (null, alt mean model) under the given h2
 function getLL(y0::Array{Float64, 2}, X0::Array{Float64, 2}, lambda0::Array{Float64, 1},
                num_of_covar::Int64, 
-               markerID::Int64, h2::Float64; prior::Array{Float64, 1} = [0.0, 0.0])
+               markerID::Int64, h2::Float64; 
+               prior::Array{Float64, 1} = [0.0, 0.0],
+               reml::Bool = false)
 
     n = size(y0, 1);
     w = makeweights(h2, lambda0);
@@ -43,13 +45,14 @@ function getLL(y0::Array{Float64, 2}, X0::Array{Float64, 2}, lambda0::Array{Floa
     X_design[:, 1:num_of_covar] = X0_covar;
     X_design[:, num_of_covar+1] = X0[:, markerID+num_of_covar];
 
-    return (ll_null = wls(y0, X0_covar, w, prior).ell, ll_markerID = wls(y0, X_design, w, prior).ell)
+    return (ll_null = wls(y0, X0_covar, w, prior; reml = reml).ell, 
+            ll_markerID = wls(y0, X_design, w, prior; reml = reml).ell)
 end
 
-function profileLL(y::Array{Float64, 2}, G::Array{Float64, 2}, covar::Array{Float64, 2}, 
+function profile_LL(y::Array{Float64, 2}, G::Array{Float64, 2}, covar::Array{Float64, 2}, 
                    K::Array{Float64, 2}, 
                    h2_grid::Array{Float64, 1}, markerID::Int64;
-                   prior::Array{Float64, 1} = [0.0, 0.0])
+                   prior::Array{Float64, 1} = [0.0, 0.0], reml::Bool = false)
 
     ## Initiate the vector to store the profile likelihood values evaluated under each given parameter value
     ell_null = zeros(length(h2_grid)); # loglikelihood under null
@@ -62,7 +65,7 @@ function profileLL(y::Array{Float64, 2}, G::Array{Float64, 2}, covar::Array{Floa
     ## Loop through the supplied h2 values, evaluate the profile loglik under each h2
     for k in 1:length(h2_grid)
         curr_h2 = h2_grid[k];
-        output = getLL(y0, X0, lambda0, num_of_covar, markerID, curr_h2; prior = prior);
+        output = getLL(y0, X0, lambda0, num_of_covar, markerID, curr_h2; prior = prior, reml = reml);
         ell_null[k] = output.ll_null;
         ell_alt[k] = output.ll_markerID;
     end
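
The loop above evaluates a profile log-likelihood at each value in an h2 grid. A minimal, self-contained sketch of that idea is shown here under stated assumptions: the `toy_*` functions are hypothetical stand-ins rather than BulkLMM's `makeweights`, `wls`, or `getLL`, the data are assumed to be already rotated by the kinship eigenvectors, only the ML (not REML) criterion is computed, and no prior is applied.

```julia
# Assumed weight form: the variance of observation i is proportional to
# h2 * lambda[i] + (1 - h2), so the weight is its reciprocal.
toy_weights(h2, lambda) = 1.0 ./ (h2 .* lambda .+ (1.0 - h2))

# Gaussian ML log-likelihood of a weighted least squares fit,
# with the residual variance profiled out.
function toy_wls_loglik(y::Vector{Float64}, X::Matrix{Float64}, w::Vector{Float64})
    n = length(y)
    sw = sqrt.(w)
    Xw = X .* sw                       # scale each row by sqrt(weight)
    yw = y .* sw
    beta = Xw \ yw                     # least squares solution
    rss = sum(abs2, yw .- Xw * beta)   # weighted residual sum of squares
    sigma2 = rss / n                   # ML estimate of the residual variance
    return -0.5 * n * (log(2pi) + log(sigma2) + 1.0) + 0.5 * sum(log, w)
end

# Evaluate the profile log-likelihood at every h2 in the grid.
function toy_profile(y, X, lambda, h2_grid)
    return [toy_wls_loglik(y, X, toy_weights(h2, lambda)) for h2 in h2_grid]
end

# Toy usage: intercept plus one predictor, with made-up eigenvalues.
y = [1.0, 2.0, 4.0, 3.0, 5.0]
X = [ones(5) collect(1.0:5.0)]
lambda = [2.5, 1.2, 0.8, 0.4, 0.1]
ell = toy_profile(y, X, lambda, collect(0.0:0.1:0.9))
h2_best = collect(0.0:0.1:0.9)[argmax(ell)]
```

Taking the `argmax` over the grid gives a grid-restricted heritability estimate. A finer grid trades runtime for precision.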