Commit

Merge pull request #99 from senresearch/dev
Update to version v1.2.0
GregFa authored Aug 28, 2023
2 parents b33af06 + b78d3ad commit d5ea4d3
Showing 14 changed files with 651 additions and 242 deletions.
10 changes: 10 additions & 0 deletions NEWS.md
@@ -1,3 +1,13 @@
## Version 1.2.0 (Aug 17, 2023)
- Help documentation added: to view it for the functions `scan()` and `bulkscan()`, type `?` in front of the function name.
- New features:
  - Added the wrapper function `bulkscan()` for the three multiple-trait scan algorithms previously exposed as `bulkscan_null()`, `bulkscan_null_grid()`, and `bulkscan_alt_grid()`. The user can now call the common interface `bulkscan(...; method = )` and choose the method according to the preferred trade-off between computational speed and precision. Allowable inputs are the strings "null-exact", "null-grid", and "alt-grid". The default is "null-grid" with a coarse grid of h2 step size 0.1 (10 values: 0.0, 0.1, ..., 0.9).
  - Added an option to decompose the kinship matrix by SVD. To use this feature, supply the option `decomp_scheme = svd`.
  - Added an option in both `scan()` and `bulkscan()` to return the $-\log_{10}(p)$ result, where $p$ is the likelihood ratio test p-value. To use this feature, supply the option `output_pvals = true`. For more details, see the help via `?scan()` and `?bulkscan()`.
- Fixed bugs:
  - Fixed a typo that caused a compilation error in the `bulkscan()` "null-grid" algorithm with REML.
  - Fixed a bug that caused an output dimension mismatch when using `scan()` for permutation testing with adjusted covariates.
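
The common-interface idea described in the 1.2.0 notes can be illustrated with a minimal sketch. This is not the package's implementation: `bulkscan_sketch`, the three stand-in functions, and the `h2_grid` keyword are hypothetical names used for illustration only. Only the three method strings and the default grid of 10 values come from the notes above.

```julia
# Hypothetical stand-ins for the three algorithms; each only reports which path was taken.
scan_null_exact() = :null_exact
scan_null_grid(h2_grid) = (:null_grid, h2_grid)
scan_alt_grid(h2_grid) = (:alt_grid, h2_grid)

# Hypothetical wrapper: select an algorithm from a method string.
# The default mirrors the notes: "null-grid" with h2 values 0.0, 0.1, ..., 0.9.
function bulkscan_sketch(; method::String = "null-grid",
                           h2_grid::Vector{Float64} = collect(0.0:0.1:0.9))
    if method == "null-exact"
        return scan_null_exact()
    elseif method == "null-grid"
        return scan_null_grid(h2_grid)
    elseif method == "alt-grid"
        return scan_alt_grid(h2_grid)
    else
        throw(ArgumentError("unknown method: " * method))
    end
end

default_result = bulkscan_sketch()                      # takes the "null-grid" path
exact_result = bulkscan_sketch(method = "null-exact")   # takes the "null-exact" path
```

Rejecting unknown strings with an `ArgumentError` is a design choice of this sketch; it keeps a misspelled method name from silently falling back to the default.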

## Version 1.1.1 (July 11, 2023)
- Fixed bugs:
  - REML option in multiple trait scan functions ("bulkscan"'s) used to lead to compilation errors.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "BulkLMM"
uuid = "b8d15608-0852-4141-ae38-222578e2ed7b"
authors = ["Zifan Yu, Gregory Farage, Saunak Sen"]
version = "1.1.1"
version = "1.2.0"

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
46 changes: 27 additions & 19 deletions README.md
@@ -184,7 +184,8 @@ kinship = round.(kinship; digits = 12);

For example, to conduct genome-wide associations mapping on the
1112-th trait, we can run the function `scan()` with inputs of the trait (as
a 2D-array of one column), geno matrix, and the kinship matrix.
a 2D-array of one column), geno matrix, and the kinship matrix. Type `?scan()` for a more
detailed description of the function.


```julia
@@ -308,41 +309,48 @@ multi-threads](https://docs.julialang.org/en/v1/manual/multi-threading/)
or switch to a multi-threaded *julia* kernel if using Jupyter
notebooks.

Then, run the function `bulkscan_null()` with the matrices of
traits, genome markers, kinship. The fourth required input is the
number of parallelized tasks and we recommend it to be the number of
*julia* threads.
Then, run the function `bulkscan()` with the matrices of the
traits of interest, the genome markers, and the kinship. Type `?bulkscan()` for a more
detailed description of the function.

Here, we started a 16-threaded *julia* and executed the program on a
Linux server with the Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz to
get the LOD scores for all **~35k** BXD traits:
Here, we started a 16-threaded *julia* session running Julia version 1.9.2. The session
information is as follows:
```julia
versioninfo()
```

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 48 × Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
Threads: 17 on 48 virtual cores
Environment:
JULIA_NUM_THREADS = 16


```julia
@time multiple_results_allTraits = bulkscan_null(pheno_processed, geno_processed, kinship; nb = Threads.nthreads());
@time multiple_results_allTraits = bulkscan(pheno_processed, geno_processed, kinship);
```

82.421037 seconds (2.86 G allocations: 710.821 GiB, 41.76% gc time)
2.112011 seconds (107.94 k allocations: 5.053 GiB, 2.59% gc time)

Please note: the default method and modeling options for `bulkscan()` take an approximate approach for
the best runtime performance. The user may choose other methods and options that give more precision at the cost of a longer runtime, following the detailed instructions in `?bulkscan()`.

The output `multiple_results_allTraits` is an object containing our model results:
- the matrix of LOD scores $L_{p \times m}$, where $p$ is the number of markers and $m$ is the number of traits; each column holds the LOD scores from performing GWAS on the corresponding trait.
- the vector of heritability estimate per trait, `h2_null_list`, obtained from fitting the null model.

Similarly as the single trait scan function `scan()`, variance components are estimated from maximum-likelihood (ML) by default ("reml = false"). The user may choose REML for estimating by specifying in the input "reml = true".
- variance component (heritability) results, returned in a format that depends on the chosen method and other user-specified options. For more details, enter `?bulkscan()`.
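
As a reminder of how LOD scores relate to the likelihood ratio test, the standard definitions can be written out as follows. This sketch assumes one tested parameter per marker and the usual asymptotic chi-square reference distribution; it is not taken from the package documentation. Here $\ell_0$ and $\ell_1$ denote the maximized log-likelihoods under the null and alternative models, and $p_{\mathrm{LRT}}$ denotes the test p-value (distinct from the marker count $p$ above).

$$
\mathrm{LOD} = \frac{\ell_1 - \ell_0}{\ln 10}, \qquad
\mathrm{LRT} = 2(\ell_1 - \ell_0) = 2 \ln(10)\,\mathrm{LOD}, \qquad
p_{\mathrm{LRT}} = \Pr\left(\chi^2_1 > \mathrm{LRT}\right)
$$

For example, a LOD score of 5.0 corresponds to $\mathrm{LRT} = 2 \ln(10) \times 5.0 \approx 23.03$.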

```julia
size(multiple_results_allTraits.L)
```

(7321, 35554)

```julia
length(multiple_results_allTraits.h2_null_list)
```

35554

To visualize the multiple-trait scan results, we can use the plotting function `plot_eQTL` from `BigRiverQTLPlots.jl` to generate the eQTL plot.
In the following example, we only plot the LOD scores that are above 5.0 by calling the function and specifying in the optional argument `threshold = 5.0`:

6 changes: 4 additions & 2 deletions src/BulkLMM.jl
@@ -6,6 +6,8 @@ module BulkLMM
using Random, Distributions

include("./util.jl");
export p2lod, lod2p, lod2log10p

include("./kinship.jl");
export calcKinship

@@ -36,10 +38,10 @@ module BulkLMM
include("./bulkscan_helpers.jl");

include("./bulkscan.jl");
export bulkscan_null, bulkscan_null_grid, bulkscan_alt_grid
export bulkscan, bulkscan_null, bulkscan_null_grid, bulkscan_alt_grid

include("./transform_helpers.jl");
# export transform_rotation
export transform_rotation

include("./analysis_helpers/single_trait_analysis.jl");
export LODthresholds, get_thresholds, getLL, plotLL
13 changes: 8 additions & 5 deletions src/analysis_helpers/single_trait_analysis.jl
@@ -28,7 +28,9 @@ end
## Outputs: the logliks (null, alt mean model) under the given h2
function getLL(y0::Array{Float64, 2}, X0::Array{Float64, 2}, lambda0::Array{Float64, 1},
num_of_covar::Int64,
markerID::Int64, h2::Float64; prior::Array{Float64, 1} = [0.0, 0.0])
markerID::Int64, h2::Float64;
prior::Array{Float64, 1} = [0.0, 0.0],
reml::Bool = false)

n = size(y0, 1);
w = makeweights(h2, lambda0);
@@ -43,13 +45,14 @@ function getLL(y0::Array{Float64, 2}, X0::Array{Float64, 2}, lambda0::Array{Floa
X_design[:, 1:num_of_covar] = X0_covar;
X_design[:, num_of_covar+1] = X0[:, markerID+num_of_covar];

return (ll_null = wls(y0, X0_covar, w, prior).ell, ll_markerID = wls(y0, X_design, w, prior).ell)
return (ll_null = wls(y0, X0_covar, w, prior; reml = reml).ell,
ll_markerID = wls(y0, X_design, w, prior; reml = reml).ell)
end

function profileLL(y::Array{Float64, 2}, G::Array{Float64, 2}, covar::Array{Float64, 2},
function profile_LL(y::Array{Float64, 2}, G::Array{Float64, 2}, covar::Array{Float64, 2},
K::Array{Float64, 2},
h2_grid::Array{Float64, 1}, markerID::Int64;
prior::Array{Float64, 1} = [0.0, 0.0])
prior::Array{Float64, 1} = [0.0, 0.0], reml::Bool = false)

## Initiate the vector to store the profile likelihood values evaluated under each given parameter value
ell_null = zeros(length(h2_grid)); # loglikelihood under null
@@ -62,7 +65,7 @@ function profileLL(y::Array{Float64, 2}, G::Array{Float64, 2}, covar::Array{Floa
## Loop through the supplied h2 values, evaluate the profile loglik under each h2
for k in 1:length(h2_grid)
curr_h2 = h2_grid[k];
output = getLL(y0, X0, lambda0, num_of_covar, markerID, curr_h2; prior = prior);
output = getLL(y0, X0, lambda0, num_of_covar, markerID, curr_h2; prior = prior, reml = reml);
ell_null[k] = output.ll_null;
ell_alt[k] = output.ll_markerID;
end
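
The profile-likelihood loop in the hunk above evaluates the null and alternative log-likelihoods at each value of a supplied `h2_grid`. A minimal, self-contained sketch of that idea follows. It is not the package's code: `make_weights`, `wls_loglik`, `profile_loglik`, and the toy data are hypothetical, and the sketch assumes ML estimation (no REML, no prior) on data already rotated by the kinship eigenvectors, with eigenvalues `lambda`.

```julia
using LinearAlgebra

# Hypothetical helper: precision weights, assuming
# Var(y_i) = sigma2 * (h2 * lambda_i + 1 - h2) after rotation.
make_weights(h2::Float64, lambda::Vector{Float64}) = 1.0 ./ (h2 .* lambda .+ (1.0 - h2))

# Hypothetical helper: ML log-likelihood of a weighted least-squares fit.
function wls_loglik(y::Vector{Float64}, X::Matrix{Float64}, w::Vector{Float64})
    n = length(y)
    sw = sqrt.(w)
    beta = (sw .* X) \ (sw .* y)        # least-squares solve on row-scaled data
    resid = y .- X * beta
    sigma2 = sum(w .* resid .^ 2) / n   # ML estimate of the residual variance
    return -0.5 * (n * log(2pi * sigma2) + n) + 0.5 * sum(log.(w))
end

# Evaluate the null and alternative log-likelihoods at every h2 in the grid.
function profile_loglik(y, X_null, X_alt, lambda, h2_grid)
    ell_null = [wls_loglik(y, X_null, make_weights(h2, lambda)) for h2 in h2_grid]
    ell_alt = [wls_loglik(y, X_alt, make_weights(h2, lambda)) for h2 in h2_grid]
    return (ell_null = ell_null, ell_alt = ell_alt)
end

# Toy data (deterministic, for illustration only).
n = 8
lambda = collect(range(0.5, 1.5; length = n))
intercept = ones(n, 1)
marker = reshape(Float64[0, 1, 0, 1, 1, 0, 1, 0], n, 1)
y = 1.0 .+ 0.8 .* vec(marker) .+ 0.1 .* sin.(1:n)

h2_grid = collect(0.0:0.1:0.9)          # 10 values: 0.0, 0.1, ..., 0.9
prof = profile_loglik(y, intercept, hcat(intercept, marker), lambda, h2_grid)
```

Because the alternative design contains the null design, `prof.ell_alt[k]` is never smaller than `prof.ell_null[k]` at the same grid point. The difference divided by `log(10)` is the LOD score at that fixed `h2`.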

2 comments on commit d5ea4d3

@GregFa
Member Author

@GregFa GregFa commented on d5ea4d3 Aug 28, 2023


@JuliaRegistrator


Registration pull request created: JuliaRegistries/General/90393

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v1.2.0 -m "<description of version>" d5ea4d373463f7d457c618b906b9b0dd834f60a5
git push origin v1.2.0
