R package

This repo is an installable R package. You can install a locally cloned copy with R CMD INSTALL ./cloned-location.

Alternatively, install directly from GitHub with:

install.packages("https://github.com/bioDS/Pint/archive/refs/heads/main.tar.gz", repos=NULL)

This library provides a single function that performs square-root lasso regularised linear regression, modelling Y ~ X with both the individual columns of the input matrix X and the interactions between them (all pairs of columns by default). The primary function, shown with its default arguments, is:

output <- interaction_lasso(X, Y, n = dim(X)[1], p = dim(X)[2],
    lambda_min = -1, halt_error_diff = 1.01, max_interaction_distance = -1,
    max_nz_beta = -1, max_lambdas = 200, verbose = FALSE,
    log_filename = "regression.log", depth = 2, log_level = "none",
    estimate_unbiased = FALSE, use_intercept = TRUE, num_threads = -1,
    approximate_hierarchy = FALSE, check_duplicates = FALSE,
    continuous_X = FALSE)
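As a minimal sketch of a call, the following simulates a small binary matrix with one main effect and one pairwise interaction, then fits it with mostly default arguments. The simulated data are purely illustrative, and the package name Pint is an assumption taken from the repository name. The call is guarded so the snippet still runs when the package is not installed.

```r
# Simulate a small binary design matrix and a response with one main effect
# and one pairwise interaction (all values here are illustrative).
set.seed(1)
n <- 200
p <- 20
X <- matrix(rbinom(n * p, size = 1, prob = 0.3), nrow = n, ncol = p)
Y <- 2 * X[, 1] - 3 * X[, 2] * X[, 5] + rnorm(n, sd = 0.1)

# Assumption: the installed package is called "Pint" and exports
# interaction_lasso(); skip the fit if it is not available.
if (requireNamespace("Pint", quietly = TRUE)) {
  fit <- Pint::interaction_lasso(X, Y, depth = 2)
  str(fit)  # inspect the returned list rather than assuming its layout
}
```

Using str() on the result is a deliberate choice here: it shows the returned structure without relying on any particular field names.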

Arguments:

X : A binary $n \times p$ matrix.

Y : A vector of $n$ real values.

lambda_min : Optionally set the final value of lambda. If $< 0$, the default value of $\phi^{-1}\left(\frac{0.95}{2 \times p}\right)$ is used.

halt_error_diff : The loss-threshold to determine when an iteration is complete.

max_interaction_distance : The maximum distance between any two components of an interaction effect. Set to '-1' for no limit (default).

max_nz_beta : If >=0, halt after this many $β$ values are non-zero (note that the current $λ$ iteration will be completed first, so more values may be set). '-1' implies no limit.

max_lambdas : maximum number of iterations (i.e. number of $λ$ values). Initial iterations in which no $β$ values are changed do not count.

depth : Maximum number of columns that may be included in an interaction. If depth=1, only main effects (columns on their own) are included. If depth=2, pairwise interactions are also included. If depth=3 main effects, pairwise and three-way interactions are included.

estimate_unbiased : once the non-zero $β$ values have been determined, optionally re-fit with $λ = 0$ to avoid the minimising effect on $β$ values, while still keeping the result sparse.

use_intercept : If true, allow a non-zero intercept.

approximate_hierarchy : Approximates a strong hierarchy by only allowing interactions between columns that are (or were at a larger $λ$ value) non-zero. Note that a main effect may still be set to zero after the interaction is included, so this does not strictly enforce either a strong or weak hierarchy. This can considerably speed up fitting interactions on large data sets.

check_duplicates : Identify and report any duplicate columns or interactions, and only assign an effect to one of them.

num_threads : Number of threads to use. Set to '-1' (default) to use all available CPU cores.
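To make the depth, max_interaction_distance and check_duplicates arguments more concrete, here is a rough base-R sketch of the candidate effects they describe for depth=2. It is not the package's implementation (which, per the acknowledgements, uses xxHash to identify identical columns). The helper names candidate_pairs, interaction_columns and duplicate_groups are hypothetical. The sketch also assumes that "distance" means the difference between column indices, with the bound inclusive.

```r
# Enumerate candidate pairwise interactions (depth = 2) between columns whose
# indices are at most max_dist apart; max_dist < 0 means no limit.
# Assumption: "distance" is the difference in column index, bound inclusive.
candidate_pairs <- function(p, max_dist = -1) {
  pairs <- t(combn(p, 2))                 # all (i, j) with i < j; needs p >= 2
  colnames(pairs) <- c("i", "j")
  if (max_dist >= 0) {
    keep <- (pairs[, "j"] - pairs[, "i"]) <= max_dist
    pairs <- pairs[keep, , drop = FALSE]
  }
  pairs
}

# Build the interaction columns as elementwise products of X_i and X_j.
interaction_columns <- function(X, pairs) {
  Z <- X[, pairs[, "i"], drop = FALSE] * X[, pairs[, "j"], drop = FALSE]
  colnames(Z) <- paste(pairs[, "i"], pairs[, "j"], sep = ":")
  Z
}

# Group columns (main effects and interactions together) that are identical.
duplicate_groups <- function(M) {
  key <- apply(M, 2, paste, collapse = ",")
  groups <- split(colnames(M), key)
  unname(groups[lengths(groups) > 1])
}

X <- cbind(c(1, 0, 1, 1), c(1, 1, 0, 1), c(1, 0, 1, 1), c(0, 1, 1, 0))
colnames(X) <- as.character(1:4)

pairs_all  <- candidate_pairs(ncol(X))       # choose(4, 2) = 6 pairs
pairs_near <- candidate_pairs(ncol(X), 1)    # adjacent columns only: 3 pairs

M <- cbind(X, interaction_columns(X, pairs_all))
dup <- duplicate_groups(M)
print(dup)
```

In this toy matrix, columns 1 and 3 are identical, so the interaction 1:3 is indistinguishable from both of them. With check_duplicates enabled, only one member of such a group would be assigned an effect.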

Experimental Features

A number of options have been implemented, but not thoroughly tested. These are:

continuous_X : If true, use floating point values for X. If false, all non-zero values in X are treated as 1. Note that this currently disables duplicate column detection.

log_filename : Name of the file in which to save current progress, in case the process needs to be interrupted and resumed.

log_level : Options are 'none' (no logging) and 'lambda', where progress is saved after each $λ$ iteration is completed.
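The effect of continuous_X=FALSE described above (every non-zero entry of X is treated as 1) can be reproduced in plain R as follows. This is only an illustration of that description, not code taken from the package, and the matrix values are made up.

```r
# With continuous_X = FALSE, non-zero entries are treated as 1.
# This reproduces that reading in base R; it is not the package's own code.
X_cont <- matrix(c(0, 0.5, 2, 0, -1, 0), nrow = 3)
X_bin  <- (X_cont != 0) * 1
print(X_bin)
```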

Return Values

A list of non-zero pairwise/interaction and main effects is returned.

More precisely:

final_lambda : the final value of $λ$ .

intercept : (if use_intercept=TRUE) the intercept value.

main : A data frame effects containing $i, β_{i}$ for individual columns $X_{i}$, and a list equivalent of the columns/interactions that were indistinguishable from each effect (if check_duplicates was enabled).

pairwise : (if depth $\geq 2$) A data frame effects containing $i, j, β_{i, j}$ for $X_{i} \circ X_{j}$, and a list equivalent of the columns/interactions that were indistinguishable from each effect (if check_duplicates was enabled).

triple : (if depth $\geq 3$) A data frame effects containing $i, j, k, β_{i, j, k}$ for $X_{i} \circ X_{j} \circ X_{k}$, and a list equivalent of the columns/interactions that were indistinguishable from each effect (if check_duplicates was enabled).

estimate_unbiased : (if estimate_unbiased=TRUE) $β_{i}, β_{i, j}, β_{i, j, k}$ fit with $λ = 0$, including only the effects that are non-zero for $λ =$ final_lambda. This gives an estimate of the best fit while excluding the columns that lasso regression sets to zero.

For an example that finds non-zero interactions with Pint and then uses lm() to obtain a more accurate estimate of effect strengths and various summary statistics, see lm_example.R.
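The idea of refitting only the selected effects without a penalty can be sketched independently of the package. The snippet below is not the contents of lm_example.R. It simulates data, takes a hypothetical selection (sel_main and sel_pairs stand in for whatever effects the lasso reports as non-zero), and refits those effects with ordinary least squares.

```r
set.seed(42)
n <- 300
p <- 10
X <- matrix(rbinom(n * p, 1, 0.4), n, p)
Y <- 1.5 * X[, 2] - 2 * X[, 3] * X[, 7] + rnorm(n, sd = 0.2)

# Hypothetical selection: pretend the lasso kept main effect 2 and pair (3, 7).
sel_main  <- c(2)
sel_pairs <- rbind(c(3, 7))

# Design matrix restricted to the selected effects.
D_main <- X[, sel_main, drop = FALSE]
D_pair <- X[, sel_pairs[, 1], drop = FALSE] * X[, sel_pairs[, 2], drop = FALSE]
D <- data.frame(Y = Y, D_main, D_pair)
names(D) <- c("Y", paste0("x", sel_main),
              paste0("x", sel_pairs[, 1], "_x", sel_pairs[, 2]))

# Ordinary least squares on the selected support (no penalty, i.e. lambda = 0).
refit <- lm(Y ~ ., data = D)
print(summary(refit)$coefficients)
```

Because the penalty is removed, the refitted coefficients are not shrunk towards zero, and summary() additionally provides standard errors and p-values for the selected effects.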

Build Requirements

Compiling on Ubuntu 22.04 requires the following package:

libxxhash-dev

Additionally, the following are required for the standalone executable and/or running tests:

libgsl-dev
ninja-build
libglib2.0-dev
meson
gcovr

Standalone Executable

There is an executable version (primarily for testing) that can be run on X/Y as .csv files.

Build Utils

meson --buildtype release build
ninja -C build

Usage

./build/utils/src/lasso_exe X.csv Y.csv [main/int] verbose=T/F [max lambda] N P [max interaction distance] [frac overlap allowed] [q/t/filename] [log_level [i]ter/[l]ambda/[n]one]

All arguments must be supplied.

Argument	Use
X.csv	Path to X matrix in .csv format (see testX.csv for an example).
Y.csv	Path to Y matrix in .csv format (see testY.csv for an example).
main/int	Find only main effects, or interactions. The main-effects-only mode is intended for testing and may be broken.
verbose	For debugging purposes.
max lambda	Initial lambda value for regression, must be > 0.
N	Number of rows of X/Y (e.g. no. fitness scores).
P	Number of columns of X (e.g. no. genes).
max interaction distance	Only columns within this distance in X will be considered. -1 to use all pairs.
frac overlap allowed	Fraction of columns being updated at the same time that is allowed to overlap. No longer used.
q/t/filename	Output mode. [q]uit immediately without printing output, [t]erminal: prints first 10 values < -500 to terminal, [filename]: prints all non-zero effects to the given file.
log_level	Whether and how to log partial results. iter -> every iteration, lambda -> every new lambda, none -> do not log.

Acknowledgements

This project includes the following work:

xxHash (for identifying identical columns) - BSD 2-Clause License.
Malte Skarupke's flat hash map - Boost Software License, Version 1.0.
