Install devtools with
install.packages("devtools")
and then install our software directly from GitHub:
devtools::install_github("MatteoLacki/LFQBench2")
The package can open Excel and CSV reports with quantification results.
To open data in wide format, where intensities are reported in columns, use read_wide_report.
For example, for ISOQuant reports, simply write:
library(LFQBench2)
R = read_wide_report(path_to_report, skip=1, sheet="TOP3 quantification")
read_wide_report uses readxl::read_excel or data.table::fread under the hood, so any additional arguments for these functions (like skip and sheet above) can be passed directly in the call.
To extract the intensities and the experimental design from the resulting data.table, use get_intensities:
library(LFQBench2)
I = get_intensities(R, I_col_pattern=".* SYE (:condition:.) 1:3 (:tech_repl:.)")
Under the hood, the pattern is a stringr regular expression, so it can be quite general; it only needs to distinguish the intensity columns from all other columns.
Note that we have extended these expressions with group names.
Thus, :condition: in (:condition:.) creates an additional column named condition in the output of get_intensities.
You can give groups arbitrary names and use as many groups as you like.
However, one group must be named condition, because it is needed to calculate the intensity ratios.
The output of get_intensities might then look like this:
$I
2019-068-03 SYE A 1:3 1 2019-068-03 SYE A 1:3 2 2019-068-03 SYE A 1:3 3 2019-068-06 SYE B 1:3 1 2019-068-06 SYE B 1:3 2 2019-068-06 SYE B 1:3 3
1: 1047382.333 1064490.000 1128334.000 1105886.667 1093933.333 1084953.333
2: 639159.667 671481.000 678078.667 668860.667 690640.667 671030.333
3: 1679442.333 1762389.333 1737078.333 1881198.000 1851061.667 1898205.667
4: 681142.333 679438.000 679724.000 677334.333 693968.333 666099.667
5: 56764.333 52828.000 65068.333 126121.667 127239.000 136701.667
---
278: 4702.500 5311.000 4066.500 13863.000 13288.000 13402.500
279: 4690.000 4768.000 4229.000 5138.000 5296.000 4388.000
280: 3960.000 4219.667 4002.333 2111.500 2258.500 1892.333
281: 4740.333 4664.667 5411.333 2692.667 2631.667 2757.500
282: 60659.500 61878.500 63753.000 67623.500 68640.500 56331.500
$design
I_col_name condition tech_repl
1: 2019-068-03 SYE A 1:3 1 A 1
2: 2019-068-03 SYE A 1:3 2 A 2
3: 2019-068-03 SYE A 1:3 3 A 3
4: 2019-068-06 SYE B 1:3 1 B 1
5: 2019-068-06 SYE B 1:3 2 B 2
6: 2019-068-06 SYE B 1:3 3 B 3
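To make the group syntax concrete, here is a minimal base-R sketch of how a pattern such as ".* SYE (:condition:.) 1:3 (:tech_repl:.)" can be turned into a design table like the one above. The helper parse_design is hypothetical and is not how LFQBench2 implements get_intensities; it uses base R regular expressions instead of stringr and assumes every label is written as "(:name:" at the start of a group.

```r
# Hypothetical helper (not part of LFQBench2): turn a pattern whose groups are
# labelled "(:name:...)" into a design table for the matching column names.
parse_design = function(col_names, pattern) {
  # Group names are the labels between "(:" and the next ":".
  group_names = regmatches(
    pattern, gregexpr("(?<=\\(:)[^:]+(?=:)", pattern, perl = TRUE)
  )[[1]]
  # Strip the labels, leaving an ordinary regular expression.
  plain = gsub("\\(:[^:]+:", "(", pattern)
  # For each column: the full match followed by one entry per group.
  m = regmatches(col_names, regexec(plain, col_names))
  matched = lengths(m) > 0  # only these are intensity columns
  hits = do.call(rbind, m[matched])
  design = data.frame(col_names[matched], hits[, -1, drop = FALSE],
                      stringsAsFactors = FALSE)
  names(design) = c("I_col_name", group_names)
  design
}

cols = c("accession",
         "2019-068-03 SYE A 1:3 1",
         "2019-068-06 SYE B 1:3 2")
parse_design(cols, ".* SYE (:condition:.) 1:3 (:tech_repl:.)")
```

Columns that do not match the pattern, such as accession, are simply left out of the design.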
If needed, the design information can easily be added to the intensities:
library(data.table)
LI = melt(I$I, variable.name='I_col_name')
merge(LI, I$design, by='I_col_name')
I_col_name value condition tech_repl
1: 2019-068-03 SYE A 1:3 1 1047382.333 A 1
2: 2019-068-03 SYE A 1:3 1 639159.667 A 1
3: 2019-068-03 SYE A 1:3 1 1679442.333 A 1
4: 2019-068-03 SYE A 1:3 1 681142.333 A 1
5: 2019-068-03 SYE A 1:3 1 56764.333 A 1
---
1688: 2019-068-06 SYE B 1:3 3 13402.500 B 3
1689: 2019-068-06 SYE B 1:3 3 4388.000 B 3
1690: 2019-068-06 SYE B 1:3 3 1892.333 B 3
1691: 2019-068-06 SYE B 1:3 3 2757.500 B 3
1692: 2019-068-06 SYE B 1:3 3 56331.500 B 3
This long format is handy in other projects, for example when you want to study how intensities depend on the groups defined by the design of your experiment.
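If you would rather not depend on data.table, the same reshaping can be sketched in base R with stack() and merge(). The small wide table and design below are made-up stand-ins for I$I and I$design, and the per-condition median is just one example of a group-wise summary.

```r
# Made-up stand-ins for I$I (wide intensities) and I$design.
wide = data.frame("A 1" = c(10, 20), "A 2" = c(12, 22),
                  "B 1" = c(30, 40), "B 2" = c(32, 42),
                  check.names = FALSE)
design = data.frame(I_col_name = names(wide),
                    condition = c("A", "A", "B", "B"),
                    tech_repl = c(1, 2, 1, 2),
                    stringsAsFactors = FALSE)

# stack() gives one row per value, recording the source column in "ind".
long = stack(wide)
long$I_col_name = as.character(long$ind)
long$ind = NULL
long = merge(long, design, by = "I_col_name")

# Example of a group-wise summary: median intensity per condition.
aggregate(values ~ condition, data = long, FUN = median)
```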
Next, we need to specify which proteomes/peptidomes are present and at which spike-in ratios:
sampleComposition = data.frame(
species = c("HUMAN","YEAS8", "ECOLI"),
A = c( 135, 03, 12 ),
B = c( 135, 09, 06 )
)
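The amounts in sampleComposition determine the log-ratio each species is expected to show between the two conditions. As a quick sanity check, assuming the ratio of interest is A over B on a log2 scale (the package's plots may use a different orientation), the expected values can be computed directly:

```r
sampleComposition = data.frame(
  species = c("HUMAN", "YEAS8", "ECOLI"),
  A = c(135, 3, 12),
  B = c(135, 9, 6)
)
# Expected log2(A/B) per species: 0 for HUMAN, log2(1/3) for YEAS8, 1 for ECOLI.
expected_log2_ratio = setNames(log2(sampleComposition$A / sampleComposition$B),
                               sampleComposition$species)
expected_log2_ratio
```

The result is stored in a separate vector so that sampleComposition itself keeps only the columns shown above.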
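Then it is all quite easy: extract the species of each protein/peptide from its accession and calculate the ratios of median intensities per protein/peptide with:
species = get_species(species_col=R[['accession']],
                      species_pattern=".*_(.*)")
MI = get_ratios_of_medians(I$I, I$design, species, sampleComposition)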
and plot the outcomes with
plots = plot_ratios(MI$I_cleanMeds, MI$sampleComposition)
plots$main
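For intuition, a ratio of medians can be sketched in base R: take each protein's median intensity within each condition, then the log2 ratio of the two medians. This is only an illustration on made-up numbers; get_ratios_of_medians may additionally filter or clean the data (note the I_cleanMeds element), so see the package documentation for its exact definition.

```r
# Made-up intensities: rows are proteins, columns are runs.
intensities = matrix(c(100, 110,  90,  50,  55,  45,
                        80,  80,  80, 160, 150, 170),
                     nrow = 2, byrow = TRUE)
condition = c("A", "A", "A", "B", "B", "B")  # condition of each run (column)

# Median intensity of every protein within each condition.
med_A = apply(intensities[, condition == "A", drop = FALSE], 1, median)
med_B = apply(intensities[, condition == "B", drop = FALSE], 1, median)

# log2 ratio of medians per protein (A over B).
log2_ratio = log2(med_A / med_B)
log2_ratio
```

Using medians rather than means keeps a single outlying run from dominating a protein's ratio.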
Altogether, the code is as short as:
library(LFQBench2)
sampleComposition = data.frame(
species = c("HUMAN","YEAS8", "ECOLI"),
A = c( 135, 03, 12 ),
B = c( 135, 09, 06 )
)
R = read_wide_report(path_to_report, skip=1, sheet="TOP3 quantification")
I = get_intensities(R, I_col_pattern=".* SYE (:condition:.) 1:3 (:tech_repl:.)")
species = get_species(species_col=R[['accession']],
species_pattern=".*_(.*)")
MI = get_ratios_of_medians(I$I, I$design, species, sampleComposition)
plots = plot_ratios(MI$I_cleanMeds, MI$sampleComposition)
plots$main
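The species_pattern argument is a regular expression whose capture group holds the species name. Its effect on UniProt-style entry names can be previewed with base R's sub(); the accessions below are made up, and get_species may treat entries without an underscore differently (sub() returns them unchanged).

```r
accessions = c("ALBU_HUMAN", "ENO1_YEAS8", "ACEA_ECOLI", "contaminant")
# ".*_" greedily consumes everything up to the last "_";
# the capture group (.*) keeps the rest, i.e. the species suffix.
species_guess = sub(".*_(.*)", "\\1", accessions)
species_guess
```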
You can open multiple ISOQuant reports at once by storing their paths as strings in a character vector. For instance, suppose that all your protein quantification files are Excel files with extension '.xlsx', stored in a folder called "greatResults". To find them all, use R's built-in Sys.glob function, name the elements of the resulting vector, and open the files:
reports_paths = Sys.glob('path/to/your/greatResults/*.xlsx') # * is a wildcard for any string
names(reports_paths) = your_short_names_for_files
mega_report = read_isoquant_protein_reports(reports_paths, "pattern_used_to_distinguish_intensity_columns")
Then mega_report will be a concatenation of the long-format protein reports.
For this to work, you must supply a pattern that uniquely identifies all the intensity columns, as described in ?read_isoquant_protein_report.
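One convenient way to produce the short names is to reuse the file names without folder and extension. A minimal sketch with base R's basename() and tools::file_path_sans_ext(); the paths are made up and stored in a separate variable, while in practice they would come from Sys.glob():

```r
# Made-up paths; in practice these come from Sys.glob(...).
example_paths = c("path/to/your/greatResults/run_01.xlsx",
                  "path/to/your/greatResults/run_02.xlsx")
# Name each path after its file name, minus folder and extension.
names(example_paths) = tools::file_path_sans_ext(basename(example_paths))
example_paths
```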
You can also read the configuration sets from ISOQuant output. Use this to make sure you have used the same parameters across different projects or, more generally, to monitor changes between files. For example:
library(stringr)
library(LFQBench2)
# This script illustrates how to make sure that the same configuration
# files were used across your ISOQuant analyses, or to pinpoint the differences.
reports_paths = Sys.glob("data/kuner_2018_072/data/obelix/output/*.xlsx") # here any character vector will do
names(reports_paths) = c('your','short','names','for','files')
configs = read_isoquant_configs(reports_paths)
config_diff = diff_isoquant_configs(configs)
# If the configs are the same, an empty data.table (data.frame) is returned.
# If there are differences, it is best to inspect them in the data viewer:
View(config_diff)
Note that configurations are compared pairwise, i.e. for each pair of configurations. Also note that reading and comparing configurations are separate steps, so you can always take a quick look at the configs themselves.
Finally, if you pass only a single path to read_isoquant_configs, it returns one data.table instead of a list of data.tables:
library(LFQBench2)
config = read_isoquant_configs(a_simple_path_to_either_config_ini_or_protein_report)
View(config)
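The pairwise comparison performed by diff_isoquant_configs can be pictured with a small base-R sketch. Purely for illustration, it assumes that each configuration is a named character vector of parameter values; the real function works on the data.tables returned by read_isoquant_configs and its output format differs. The parameter names and values below are invented.

```r
# Hypothetical configurations (invented parameter names), one per report.
toy_configs = list(
  run1 = c(min_peptides = "2", fdr = "0.01"),
  run2 = c(min_peptides = "2", fdr = "0.01"),
  run3 = c(min_peptides = "1", fdr = "0.01")
)

# Compare one pair of configurations; keep only the differing parameters.
compare_pair = function(p) {
  a = toy_configs[[p[1]]]
  b = toy_configs[[p[2]]]
  keys = union(names(a), names(b))
  # A parameter differs if it is missing on one side or has another value.
  differs = is.na(a[keys]) | is.na(b[keys]) | a[keys] != b[keys]
  keys = keys[differs]
  data.frame(config_1 = rep(p[1], length(keys)),
             config_2 = rep(p[2], length(keys)),
             parameter = keys,
             value_1 = unname(a[keys]),
             value_2 = unname(b[keys]),
             stringsAsFactors = FALSE)
}

# Every pair of configurations, compared one after the other.
pairs = combn(names(toy_configs), 2, simplify = FALSE)
toy_diff = do.call(rbind, lapply(pairs, compare_pair))
toy_diff  # zero rows if all configurations agree
```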
With our package, you can also check the quality of your chromatography system by comparing multiple technical replicates of the experiment acquired over time (i.e. different runs).
To do so, prepare your peptide report and run:
library(LFQBench2)
library(data.table)
# Path to a file with data: to get the raw data from ISOQuant you have to download it directly from XAMP
path = path.expand('~/Projects/retentiontimealignment/Data/annotated_data.csv')
D = fread(path)
# The following columns are required:
rt = D$rt # recorded retention times (any other value will do too, e.g. drift times from IMS)
runs = D$run # run in which the retention time was recorded
ids = D$id # which peptide was measured
S = get_smoothed_data(rt, runs, ids) # get the data for plotting
S[,run:=ordered(run)] # change run to ordered factor, for ggplot to be happy
plot_dist_to_reference(S)
Admittedly, with 10 runs in a single panel, the plot suffers from some overplotting.
This is easy to cope with, since plot_dist_to_reference returns a ggplot object that can be split into one panel per run:
library(ggplot2)
o = plot_dist_to_reference(S)
o + facet_wrap(~run) + geom_hline(yintercept=0, linetype='dotted')
Note that if you remove the bot and top columns from S, the ribbons will not be plotted:
S[, `:=`(top=NULL, bot=NULL)]
plot_dist_to_reference(S)
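The plotted quantity can be thought of as each peptide's deviation from a reference retention time. The sketch below assumes, for illustration only, that the reference is the peptide's median retention time across runs, and it skips the smoothing that get_smoothed_data performs; the numbers are made up and kept in their own data.frame so that rt, runs and ids above are not overwritten.

```r
# Made-up data: two peptides, each measured in three runs.
toy = data.frame(
  rt  = c(10.0, 10.4, 9.8, 20.0, 20.6, 19.9),
  run = c(1, 2, 3, 1, 2, 3),
  id  = c("pep1", "pep1", "pep1", "pep2", "pep2", "pep2"),
  stringsAsFactors = FALSE
)

# Assumed reference: median retention time of each peptide across runs.
toy$reference = ave(toy$rt, toy$id, FUN = median)
# Signed distance of each measurement to its peptide's reference.
toy$dist_to_ref = toy$rt - toy$reference
# Median distance per run: a crude summary of per-run drift.
drift_per_run = tapply(toy$dist_to_ref, toy$run, median)
drift_per_run
```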
- Find out where the package was installed by running
find.package('LFQBench2')
in your R console.
- Add that location to your PATH variable (this might work on Windows too, but it is much more complicated).
Now you can simply use:
read_isospec_report -p <Pattern> <Path>
Run read_isospec_report -h for further help.
- ISOQuant protein reports.