
readQFeatures no longer supports reading from a .txt file #213

Open
lmsimp opened this issue May 8, 2024 · 6 comments
lmsimp commented May 8, 2024

Hi Laurent,

I have been trying to use the readQFeatures function in the latest release to read in data from an external file, e.g. a third-party .txt file. To date, this is what we have been doing in our workflows when using QFeatures, but I now see this is no longer supported and is only an option with readSummarizedExperiment?

Would you consider adding the option back to readQFeatures for users so we can create a QFeatures object directly from an external file as per readSummarizedExperiment?

Best,

Lisa


An example:

This works perfectly, as the data is already a data.frame:

## Get an example PSM file from QFeatures
data("hlpsms")

## Create QF object
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")

Create an example .csv and write it locally as test data:

## Now write this data to a .csv as example data to read 
write.csv(hlpsms, file = "hlpsms.csv")

## Check the structure of the .csv
csv <- read.csv(file = "hlpsms.csv")
csv[1:3, 1:3]

# > csv[1:3, 1:3]
# X126      X127C      X127N
# 1 0.12283431 0.08045915 0.07080406
# 2 0.35268185 0.14162381 0.16752388
# 3 0.01546089 0.16142297 0.08693813

If I now try to read from the .csv file, I get the following errors:

## specify file name
f <- "hlpsms.csv"

grep("X1", names(read.csv(f, sep = ",")))
# [1]  2  3  4  5  6  7  8  9 10 11

## Looks good, quant data is now in 2:11, try reading this data
qf2 <- readQFeatures(f, quantCols = 2:11)

# Checking arguments.
# Error in .checkQuantCols(assayData, colData, quantCols) : 
#   Some column names in 'quantCols' are not found in 'assayData': NA, NA, NA, NA, NA, NA, NA, NA, NA, NA.

The same happens if a character vector is specified for quantCols:

(id_character <- grep("X1", names(read.csv(f, sep = ",")), value = TRUE))
# [1] "X126"  "X127C" "X127N" "X128C" "X128N" "X129C" "X129N" "X130C" "X130N" "X131" 
qf2 <- readQFeatures(f, quantCols = id_character)

# Checking arguments.
# Error in .checkQuantCols(assayData, colData, quantCols) : 
#   Some column names in 'quantCols' are not found in 'assayData': X126, X127C, X127N, X128C, X128N, X129C, X129N, X130C, X130N, X131.

This works perfectly for SummarizedExperiment objects:

se <- readSummarizedExperiment(f, quantCols = 2:11)
> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] MSnbase_2.30.1              ProtGenerics_1.36.0         mzR_2.38.0                 
 [4] Rcpp_1.0.12                 QFeatures_1.14.0            MultiAssayExperiment_1.30.0
 [7] SummarizedExperiment_1.34.0 Biobase_2.64.0              GenomicRanges_1.56.0       
[10] GenomeInfoDb_1.40.0         IRanges_2.38.0              S4Vectors_0.42.0           
[13] BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.3.0   

lgatto commented May 8, 2024

What about

> qf <- read.csv("hlpsms.csv") |> readQFeatures(quantCols = 2:11)
Checking arguments.
Loading data as a 'SummarizedExperiment' object.
Formatting sample annotations (colData).
Formatting data as a 'QFeatures' object.

@Charl-Hutchings

I agree with Lisa. The readQFeatures function was widely used in our lab, including in published workflows and courses that we designed around QFeatures. I think one of the great things about QFeatures is its simplicity, so this additional step seems unnecessary to me. It'd be great if the functionality to read directly from files could be added back.


lgatto commented May 8, 2024

I will look into it, but let me highlight some other aspects of your request:

You say that

I think that one of the great things about QF is the simplicity, so
this additional step seems to me unnecessary.

With regard to readQFeatures(), the example shown above is trivial, which is something you seem to appreciate. But we have files that are read and parsed into hundreds of assays. This is a situation that needed to be (and has been) simplified, in particular with regard to sample metadata incorporation. In such non-trivial situations, we never read the data directly from a file; in the simplest cases, using readSummarizedExperiment() also just works.

You would like the old behaviour to be added back. But please keep in mind that this seemingly simple request adds some non-trivial work to my busy plate. I don't know how easy it will be to add what you want, but I will probably have to:

  • write a helper function that checks the input, and reads it if it's a character of length 1 pointing to an existing file;
  • handle arguments in ... to only pass the relevant ones to the read.csv() function;
  • factorise that code in readQFeatures() and readSummarizedExperiment() to use the helper function;
  • make sure this doesn't impact the subsequent argument validation in readQFeatures();
  • write unit tests that check that all still works as expected.
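The first bullet could be sketched as follows. The helper's name and exact behaviour here are purely illustrative, not the actual QFeatures implementation:

```r
## Hypothetical helper (illustrative name and behaviour only): if 'x'
## is a length-1 character naming an existing file, read it with
## read.csv(); otherwise return the input unchanged.
.readIfNeeded <- function(x, ...) {
    if (is.character(x) && length(x) == 1L && file.exists(x))
        x <- read.csv(x, ...)
    x
}

## A file path gets read into a data.frame ...
f <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = 4:6), f, row.names = FALSE)
str(.readIfNeeded(f))
## ... while a data.frame passes through untouched.
str(.readIfNeeded(iris))
```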

From what I gathered from the original request, the only change needed on your side is the one I suggested above: from

readQFeatures(file, quantCols = e)

to

read.csv(file) |> readQFeatures(quantCols = e)

This applies to your teaching material [*], the F1000Research paper (which you can update to a new version), and possibly more. There is no way for me to guarantee backwards compatibility for code/material I don't maintain or know about, but in general I do make efforts to maintain backwards compatibility.

And last but not least, some news that, despite this issue, might be welcome: we are working on a dedicated import app that aims to make data import (especially the more difficult cases) as easy as possible.

[*] You should in any case update your material and start using quantCols instead of ecols, which is likely to be deprecated in a release or two.

@Charl-Hutchings

Hi Laurent,

Completely understand how much work it is to develop and maintain all of your packages, and appreciate that no task is trivial. We were simply disappointed to see what we had considered very useful functionality removed.


lgatto commented May 9, 2024

Don't forget that you can always

> readQFeaturesCCP <- function(filename, sep = ",", quantCols, ...) 
    read.csv(filename, sep = sep) |> 
    readQFeatures(quantCols = quantCols, ...)
> readQFeaturesCCP("hlpsms.csv", quantCols = 2:11)
Checking arguments.
Loading data as a 'SummarizedExperiment' object.
Formatting sample annotations (colData).
Formatting data as a 'QFeatures' object.
An instance of class QFeatures containing 1 assays:
 [1] quants: SummarizedExperiment with 3010 rows and 10 columns 

@cvanderaa (Collaborator)

(pinging @StijnVandenbulcke who raised the same issue to me.)

Hi @lmsimp and @Charl-Hutchings,

I'm sorry to hear that our recent changes negatively affected your teaching and research material.

I however agree with the points Laurent mentioned. The refactoring of readQFeatures() resulted from quite some discussion, and we deliberately decided to remove the functionality to read tables from files. In my experience and usage of readQFeatures(), I always use read.table() (or similar functions) because (i) I always double-check that the table was correctly imported; (ii) I always forget what quantCols should be; and (iii) I usually grep() on the imported column names, or look at the column indexing after printing the colnames to the console, when defining quantCols, both of which require the data to already be imported. I would argue this is best practice, and in fact this is also what you demonstrated in your recent workflow paper.
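That workflow can be sketched as follows, reusing the hlpsms.csv file written earlier in this thread (and assuming QFeatures is attached); the "psms" assay name is just an example:

```r
## 1. Import the table first, so it can be inspected.
x <- read.csv("hlpsms.csv")

## 2. Check the import and identify the quantitative columns by name.
colnames(x)[1:5]
quantCols <- grep("X1", colnames(x), value = TRUE)

## 3. Build the QFeatures object from the already-imported data.frame.
qf <- readQFeatures(x, quantCols = quantCols, name = "psms")
```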

Of course, this is my opinion based on a small sample (a few lab members/collaborators and me). I would be open to spending some time reverting to importing data from file if you could provide us with a convincing use case.
