SPDX-FileCopyrightText: Copyright (C) 2019 Max Chandler, PhD student at Cardiff University
SPDX-FileCopyrightText: Copyright (C) 2020-2024 Frank C Langbein [email protected], Cardiff University
SPDX-FileCopyrightText: Copyright (C) 2021-2022 S Shermer [email protected], Swansea University
SPDX-License-Identifier: AGPL-3.0-or-later
MRSNet quantifies MR spectra using artificial neural networks and is aimed at MEGAPRESS spectra. It also provides methods to generate datasets from loaded LCModel ".BASIS" files or from spectra simulated by FID-A or PyGamma.
More information can be found in the associated paper:
M Chandler, C Jenkins, SM Shermer, FC Langbein. MRSNet: Metabolite Quantification from Edited Magnetic Resonance Spectra With Convolutional Neural Network. Preprint, 2019. arXiv:1909.03836 https://langbein.org/mrsnet-paper/
- Tested on Linux and may not work on any other platform without some adjustments.
Standard packages for Linux are:
- Git and git-lfs for git with submodules and LFS support.
- Python 3.11 (more recent versions may not work).
- Install these using your package manager with root privileges, e.g. on Debian-based distributions:
  sudo apt update && sudo apt install git git-lfs python3.11 python3.11-venv
- For all standard python packages used, see requirements.txt. These will be installed with the commands below, but here are some extra notes on potential issues.
  - Tensorflow as machine learning library. In particular for training, but also for quantification, a GPU (with tensorflow support) is strongly recommended, with cudnn or OneAPI. Version 2.15 should work, but more recent versions will likely fail.
  - For scipy/numpy you may need to install lapack and blas libraries for your system. By default we use numpy's fft, but you can also use fftw3 for the Fourier transform functions via pyfftw (see the npfft_module config variable and configuration files below), for which you should install libfftw3.
  - PyGamma, a MRS simulation toolbox. You only need this if you wish to use the pygamma basis spectra simulation. It is currently commented out in requirements.txt as not supported in python 3.11. If needed you can still try to install it manually or use a supported python version. See https://pygamma-mrs.github.io/gamma.io/release/GammaBuildingLibrary.html for installation instructions.
  - GPyOpt is no longer maintained, but usable, and depends on gpy. It can be safely commented out from requirements.txt if model selection is not used.
  - Any missing libraries may cause the pip3 install command below to fail.
- FID-A, a MRS simulation toolbox. This is provided
via a git submodule and integrated during the installation process below.
- MATLAB - Only required if you plan to simulate new FID-A spectra (the basis sets we used in the paper are in the git data/basis-dist submodule).
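As a quick sanity check before installing, the prerequisites listed above can be verified from Python. This is a minimal sketch and not part of MRSNet: the function name check_prerequisites is hypothetical, and it only assumes what is stated above, namely that Python 3.11 is the expected version and that git and git-lfs should be on the PATH.

```python
import shutil
import sys


def check_prerequisites(required=(3, 11), tools=("git", "git-lfs")):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Compare major.minor only, since the notes above name Python 3.11.
    if tuple(sys.version_info[:2]) != tuple(required):
        problems.append(
            "Python %d.%d expected, found %d.%d"
            % (required[0], required[1], sys.version_info[0], sys.version_info[1])
        )
    # shutil.which returns None if the executable is not on the PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems
```

Calling check_prerequisites() and printing each returned string gives a short list of things to fix; it does not check GPU support or system libraries such as lapack, blas or libfftw3.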
1. Clone the repository:
   git clone https://qyber.black/mrs/code-mrsnet.git mrsnet
   Check the clone URL, as it may differ if you use a different repository, e.g. a mirror or an alternative version for development.
2. Navigate to the directory:
   cd mrsnet
   To use a specific version instead of the main branch, select a branch or tag with
   git checkout BRANCH_OR_TAG
3. Update submodules:
   git submodule update --init --recursive
4. Install the requirements:
   pip3 install -r requirements.txt
   You should probably install these in a virtual environment to avoid conflicts. Note that the requirements may need additional libraries to be installed on your system that pip does not add (see the note above). You may have to set this up in a virtual environment or use the --break-system-packages option (at your own risk of breaking something else). Optionally, you may want to install pygamma manually (see the prerequisites above). In general, python packages that fail to install can be commented out of requirements.txt, but certain MRSNet functionality may then not work.
To update to the latest version (of your selected branch), run git pull and repeat steps 3 and 4 above in the project folder. To switch to another version or branch, run git checkout BRANCH_OR_TAG first.
Call mrsnet.py --help to get further information about all its sub-commands and mrsnet.py COMMAND --help for details on each sub-command. The sub-commands available are:
- basis: Generate basis, if it does not exist.
- simulate: Generate simulated spectra dataset.
- generate_datasets: Generate standard simulated spectra datasets.
- compare: Compare spectra with basis.
- train: Train model on dataset.
- select: Model selection on dataset.
- quantify: Quantify spectra in dicoms.
- benchmark: Run benchmark on model.
Generally it is best to run mrsnet.py from the base folder of the git repository. The folder locations in data are determined by the real location of the mrsnet.py file (not symbolic links). These and other configuration values can be overwritten by providing a ~/.config/mrsnet.json file (see the Cfg class in mrsnet/cfg.py for details; there is also a cfg.json file in the project folder, generated by this class, with the default values that can also be changed there). If you change the location of the folders in data, you have to make sure the submodule data is available in the new location. MRSNet has search paths for basis, model and simulation datasets defined as search_* variables in the configuration files. It stores any newly generated data under the data folder in basis, sim-spectra, or model, which are default paths that are always added by cfg.py. MRSNet also stores other configuration values in cfg.json in the project folder or alternatively mrsnet.json in the config folder. These overwrite the defaults from cfg.py (mrsnet.json overwrites cfg.json).
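The precedence described above (defaults from cfg.py, then cfg.json, then mrsnet.json) can be pictured as a layered merge. The following is only a sketch under stated assumptions, not the actual Cfg class: the function name load_layered_config is hypothetical, it assumes each file holds a flat JSON object, and the keys used in any call are made up for illustration.

```python
import json
from pathlib import Path


def load_layered_config(defaults, project_cfg, user_cfg):
    """Merge configuration layers; later layers overwrite earlier ones."""
    config = dict(defaults)  # built-in defaults form the lowest layer
    # Order matters: the project file is applied first, the user file last.
    for path in (project_cfg, user_cfg):
        path = Path(path).expanduser()
        if path.is_file():  # missing files are simply skipped
            with path.open() as fh:
                config.update(json.load(fh))
    return config
```

For example, load_layered_config(defaults, "cfg.json", "~/.config/mrsnet.json") would return the defaults with any values from cfg.json applied, and then any values from mrsnet.json applied on top. The merge here is shallow; whether MRSNet merges nested values differently is not covered by this sketch.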
The benchmark dataset is in data/benchmark. Newly generated basis sets are stored in data/basis. The default basis set is in a separate git repository as submodule in data/basis-dist. Newly generated artificial neural network models are stored under data/model. The best models we distribute are stored in data/model-dist as a separate git submodule. Newly simulated spectra are stored in data/sim-spectra. The submodules with this data are automatically installed with the above git submodule command. The *-dist paths are automatically added to the search paths in the configuration.
The additional git submodules containing the data are
- Data - MRS - MEGAPRESS Spectra - Swansea benchmark phantom datasets collected at Swansea University's 3T Siemens scanner (in data/benchmark);
- Data - MRSNet - Models - Dist - Best performing trained models for MRSNet (in data/model-dist);
- Data - MRSNet - Basis Spectra - Dist - Standard basis sets used for MEGAPRESS simulation (in data/basis-dist).
- Code - QDicom Utilities - Library to read dicoms.
There are further git repositories on qyber.black with more data, generated for the publications, etc. that you can also use for your own analysis:
- Data - MRSNet - Models CNN: contains a large number of CNN models that you could clone into data/model-cnn and then add that path to the model search path in cfg.json or mrsnet.json. Note that this is a very large repository. It contains the complete analysis data for the CNN models.
- Data - MRSNet - Simulated Spectra - MEGAPRESS: contains a range of simulated MEGAPRESS spectra generated with our simulators using the basis datasets in data/basis-dist. These datasets have been used in the papers for training and testing the models. You may use these to train your own models, etc. You can clone this into data/sim-spectra-megapress. Note that this is a very large repository.
To generate a simulated spectra dataset with the standard set of metabolites use
./mrsnet.py simulate --source lcmodel --sample random --noise_sigma 0.1 -n 10 -vv
This uses the lcmodel basis set (see the basis sub-command for other basis sets and how to generate them, if needed) to generate 10 spectra, sampling the concentrations randomly and adding normally distributed noise with a standard deviation of 0.1 to the time domain signal. The spectra are stored in a joblib data file under data/sim-spectra according to the parameters that were used to generate them. The above would be stored in
data/sim-spectra/lcmodel/siemens/123.23/1.0/Cr-GABA-Gln-Glu-NAA/megapress/random/1.0-0.0-0.1/10-1
where the folder 10-1 indicates that this is the 1st set of 10 spectra generated.
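To work with such paths in your own scripts, the parts explained above can be extracted with a few lines of Python. This is a sketch only: the function name describe_dataset_path is hypothetical, it assumes the component directly after sim-spectra is the --source value (as in the example path), and it deliberately leaves the intermediate components uninterpreted, since their meaning is not described here.

```python
from pathlib import PurePosixPath


def describe_dataset_path(dataset_path, root="sim-spectra"):
    """Extract the source and the final 'COUNT-INDEX' folder from a dataset path."""
    parts = PurePosixPath(dataset_path).parts
    # Assumption: the component right after the root folder is the --source value.
    source = parts[parts.index(root) + 1]
    # The last folder, e.g. '10-1', means: the 1st set of 10 spectra.
    count, set_index = parts[-1].split("-")
    return {"source": source, "count": int(count), "set_index": int(set_index)}
```

Applied to the example path above, this yields the source lcmodel, a count of 10 and a set index of 1.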
To train a model run, e.g.,
./mrsnet.py train -d TRAIN-DATA-PATH -e 100 --validate 5 -m cnn_small_softmax -vv
This trains the cnn_small_softmax model on the simulated spectra in TRAIN-DATA-PATH (see the previous section for how to generate these and what these paths are) for 100 epochs, using 5-fold cross-validation, with some verbosity.
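For readers unfamiliar with k-fold cross-validation, the idea is to split the dataset into k parts and to use each part once for validation while training on the rest. The following is a generic illustration of that splitting, not MRSNet's own implementation; the function name k_fold_indices is hypothetical and no shuffling is done.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

With 10 spectra and k = 5, each fold validates on 2 spectra and trains on the remaining 8, and every spectrum is used for validation exactly once.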
MRSNet can run model selection approaches over a set of model parameters (currently hardcoded in mrsnet/selection.py) and also run the training on a remote system using a separate script; see scheduler/run_scw.sh for an example running on Supercomputing Wales. For example, run
./mrsnet.py select -d DATASET_PATH -e 100 --validate 0.8 --method grid cnn-simple-all --remote ./scheduler/run_scw.sh:USERNAME:10:15 -vv
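Conceptually, a grid method evaluates every combination of the candidate parameter values. The sketch below illustrates only that enumeration; it is not taken from mrsnet/selection.py, the function name parameter_grid is hypothetical, and the parameter names in the usage note are invented for illustration.

```python
from itertools import product


def parameter_grid(options):
    """Return every combination of the given parameter options as a list of dicts."""
    names = sorted(options)  # fixed order makes the enumeration reproducible
    value_lists = [options[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]
```

For instance, parameter_grid({"batch_size": [16, 32], "learning_rate": [1e-3, 1e-4]}) produces four combinations, each of which would correspond to one model to train and evaluate.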
To run the benchmark dataset on a model, run
./mrsnet.py benchmark --model MODEL -vv
where MODEL is the path to the trained tensorflow model in the data/model-dist or data/model folders (the path indicates the parameters used for the model architecture and the training/testing data). Results are stored in the model folder.
Quantifying your own spectra in dicom files or spectra joblib files (from simulate) is done via
./mrsnet.py quantify -d DATASET -m MODEL -vv
DATASET is either a joblib file or a folder with dicom spectra. MODEL is the folder with the trained tensorflow model. Results are stored in the data folder specified, as a csv file. If there is a concentrations.json file at the top level of the data folder, this is assumed to contain the ground truth and the quantification results are compared to it.
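If you want to run a similar comparison yourself, the basic idea is to compute a per-metabolite error between predicted and known concentrations. The sketch below is an assumption-laden illustration: it does not reflect the actual layout of concentrations.json or of the csv output, the function name absolute_errors is hypothetical, and both inputs are assumed to be plain dictionaries mapping metabolite names to concentrations.

```python
def absolute_errors(predicted, ground_truth):
    """Return per-metabolite absolute errors for metabolites present in both dicts."""
    return {
        name: abs(predicted[name] - ground_truth[name])
        for name in predicted
        if name in ground_truth  # metabolites without ground truth are skipped
    }
```

Metabolites that appear in only one of the two dictionaries are ignored, so the result only covers values that can actually be compared.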
The code will attempt to analyse all of the spectra contained in the provided directory. For this to work correctly, the following requirements must be met:
- All three acquisitions for each MEGA-PRESS scan must be present (edit on, edit off, difference).
- Spectra that belong to the same scan must have a unique ID of your choice added to their filename (e.g. SCAN_001), or be in separate folders, in which case the folder becomes the ID.
- Spectra of the different acquisition types must be labelled by adding either "EDIT_OFF", "EDIT_ON" or "DIFF" anywhere after the unique ID (see the previous item) in their filename.
An example for two MEGA-PRESS scans would be six files:
SCAN_000_EDIT_OFF.ima
SCAN_000_EDIT_ON.ima
SCAN_000_DIFF.ima
SCAN_001_EDIT_OFF.ima
SCAN_001_EDIT_ON.ima
SCAN_001_DIFF.ima
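Before running quantify on a folder, it can help to check that the naming rules above are met. The following is a minimal sketch and not MRSNet's own file discovery: the names group_scans and incomplete_scans are hypothetical, and it assumes the simple case shown in the example, where the scan ID is the part of the filename before an underscore followed by the acquisition label. IDs taken from folder names are not handled.

```python
import re
from collections import defaultdict
from pathlib import Path

# Acquisition labels from the naming rules above.
LABELS = ("EDIT_OFF", "EDIT_ON", "DIFF")
# Assumption: filenames look like '<ID>_<LABEL>...', as in the example files.
LABEL_RE = re.compile(r"^(?P<scan_id>.+?)_(?P<label>EDIT_OFF|EDIT_ON|DIFF)")


def group_scans(filenames):
    """Map scan ID -> {label: filename}; files without a label are ignored."""
    scans = defaultdict(dict)
    for filename in filenames:
        match = LABEL_RE.match(Path(filename).name)
        if match:
            scans[match.group("scan_id")][match.group("label")] = filename
    return dict(scans)


def incomplete_scans(scans):
    """Return scan ID -> sorted list of missing labels for incomplete scans."""
    missing = {}
    for scan_id, files in scans.items():
        absent = sorted(set(LABELS) - set(files))
        if absent:
            missing[scan_id] = absent
    return missing
```

For the six example files, group_scans finds the two IDs SCAN_000 and SCAN_001 and incomplete_scans returns an empty dictionary; removing any one file would list its label as missing for that scan.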
Also see the folders in the benchmark dataset (data/benchmark), which you can use as an example structure where folders separate the spectra (e.g. data/benchmark/E1/MEGA_Combi_WS_ON; note that the concentrations.json file is not at the top level for each of the spectra collections, so it would not be used if you run quantify on it; it is found separately by the benchmark sub-command only).
Note, loading of non-Siemens DICOM files has not been tested.
- If GPyOpt for gpo selection fails with "not positive definite, even with jitter.", see SheffieldML/GPy#660 for a solution. Changing
  L = linalg.cholesky(A + np.eye(A.shape[0]) * jitter, lower=True)
  to
  L = np.linalg.cholesky(A + np.eye(A.shape[0]) * jitter)
  in GPy/util/linalg.py (GPy is a dependency of GPyOpt) seems to fix this.
Released versions:
- v1.0 - first release, tensorflow 1 and python2.
- v2.0 - update to python3 and tensorflow 2; code, api and ui cleanups; updates to spectra processing; extended dataset generation, model training, model selection, and quantification.
The code is developed and maintained on qyber\black at https://qyber.black/mrs/code-mrsnet
This code is mirrored at https://github.com/MaxChandler/MRSNet
The mirrors are only for convenience, accessibility and backup.
- Max Chandler, School of Computer Science and Informatics, Cardiff University
- Frank C Langbein, School of Computer Science and Informatics, Cardiff University; langbein.org
- Sophie M Shermer, Physics, Swansea University
- Christopher W Jenkins, Physics and Centre for Nanohealth and Clinical Imaging Unit, Swansea University; Cardiff University Brain Research Imaging Centre (CUBRIC)
- Brian Soher (VeSPA/PyGamma) for help locating the PyGamma pulse sequence code for MEGA-PRESS, PRESS and STEAM.
For any general enquiries relating to this project, send an e-mail.
M Chandler, SM Shermer, FC Langbein. Code - MRSNet. Version 2.0. Software, 2024. [DEV:https://qyber.black/mrs/code-mrsnet] [MIRROR:https://github.com/MaxChandler/MRSNet]