
Error: unused arguments #6

Open
ashokpatowary opened this issue Jan 22, 2025 · 15 comments
Labels
bug Something isn't working

Comments

@ashokpatowary

Hi @lingminhao

I was trying to run bambu-sc on my test data and encountered the following error. Could you please take a look?

Error in bambu(reads = samples, annotations = annotations, genome = "GRCm39.primary_assembly.genome.fa", :
  unused arguments (processByChromosome = as.logical("TRUE"), processByBam = as.logical("false"))
Execution halted

Thanks
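For readers hitting the same message: R's "unused arguments" error means the installed `bambu()` signature predates the `processByChromosome`/`processByBam` parameters, i.e. a version mismatch between the pipeline and the package inside the container. A minimal Python analogy of the failure mode and a version check (`bambu_old` and `call_safely` are hypothetical names for illustration, not bambu code):

```python
import inspect

# Hypothetical stand-in for an older bambu() that predates the new options.
def bambu_old(reads, annotations, genome):
    return "ran"

def call_safely(fn, **kwargs):
    # Inspect the installed signature and report any keyword arguments it
    # does not accept, mirroring how R reports them as "unused arguments".
    accepted = set(inspect.signature(fn).parameters)
    unused = sorted(k for k in kwargs if k not in accepted)
    if unused:
        print(f"unused arguments: {unused}")
    return fn(**{k: v for k, v in kwargs.items() if k in accepted})

result = call_safely(
    bambu_old,
    reads="samples.bam",
    annotations="anno.gtf",
    genome="GRCm39.primary_assembly.genome.fa",
    processByChromosome=True,   # not known to the old signature
    processByBam=False,         # not known to the old signature
)
```

The same check in R would be `"processByChromosome" %in% names(formals(bambu))`; if it returns `FALSE`, the installed bambu is too old for the pipeline.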

@ashokpatowary
Author

ashokpatowary commented Jan 25, 2025

@andredsim @lingminhao can you please provide some input? If I try to use the bambu version downloaded from GitHub, even more arguments are missing.

@lingminhao
Collaborator

lingminhao commented Jan 25, 2025

Hi @ashokpatowary, sorry for the late response. We are currently updating the bambu package, which impacts the Nextflow pipeline here. Could you try whether -with-docker lingminhao/bambusc:beta1.2, or -with-singularity lingminhao/bambusc:beta1.2 if you are using Singularity, works?

@ashokpatowary
Author

Thanks @lingminhao. I did try with the new Singularity image; however, I encountered the following error:

Running Bambu-v3.9.0
WARNING - If you change the number of cores (ncore) between Bambu runs and there is no progress please restart your R session to resolve the issue that originates from the XGboost package.
--- Start generating read class files ---
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error: Error: object 'true' not found
Execution halted
INFO: Cleaning up image...

@lingminhao
Collaborator

Hi @ashokpatowary, I have fixed the bug regarding your errors. Let me know if that helps.

@lingminhao lingminhao added the bug Something isn't working label Jan 26, 2025
lingminhao referenced this issue in GoekeLab/bambu Jan 26, 2025
@ashokpatowary
Author

Thanks @lingminhao, but this time I encountered an additional error.

Running Bambu-v3.9.0
WARNING - If you change the number of cores (ncore) between Bambu runs and there is no progress please restart your R session to resolve the issue that originates from the XGboost package.
--- Start generating read class files ---
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error: Error in clipFunction(alignData = GenomicAlignments::cigar(alignmentInfo), :
  unused argument (alignData = GenomicAlignments::cigar(alignmentInfo))
Execution halted
INFO: Cleaning up image...

@lingminhao
Collaborator

lingminhao commented Jan 26, 2025

Hi @ashokpatowary, you will have to remove the old Singularity image and update to the latest one by pulling it again.

@ashokpatowary
Author

Thanks @lingminhao. This time it is not showing the previous error, but I encountered a different one. It looks like it is related to the input data; could you please share your thoughts?

prediction accuracy (CV) (higher for splice donor than splice acceptor)
estimate: 12.7803086562292
pValue: 0.0138140193757546
AUC: 0.88
prediction accuracy (CV) (higher for splice donor than splice acceptor)
Error in fisher.test(table(predictions > 0.5, labels.train.cv.test)) :
  'x' must have at least 2 rows and columns
Calls: bambu ... FUN -> testSpliceSites -> fitXGBoostModel -> fisher.test
Execution halted
INFO: Cleaning up image...

The command I am using:

nextflow run bambu-singlecell-spatial/main.nf --bams Run28_sort.bam --genome reference/GRCm39.primary_assembly.genome.fa --annotation reference/gencode.vM34.annotation.gtf --ncore 36 --outdir output -with-singularity lingminhao/bambusc:beta1.2 --cleanReads TRUE --deduplicateUMIs TRUE --processByBam FALSE --processByChromosome TRUE
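For context, fisher.test requires a full 2x2 contingency table; the error above arises when every cross-validation prediction falls on one side of the 0.5 threshold, collapsing the table to a single row. A minimal Python sketch of that degenerate case (the numbers are made up for illustration):

```python
from collections import Counter

# Build the (prediction > threshold, label) contingency table, as in
# table(predictions > 0.5, labels.train.cv.test) on the R side.
def contingency(predictions, labels, threshold=0.5):
    table = Counter((p > threshold, l) for p, l in zip(predictions, labels))
    rows = {k[0] for k in table}  # distinct prediction classes
    cols = {k[1] for k in table}  # distinct label classes
    return table, len(rows), len(cols)

# Degenerate case: all predictions fall below the threshold, so only one
# row exists -- exactly what triggers "'x' must have at least 2 rows and
# columns" in fisher.test.
_, n_rows, n_cols = contingency([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(n_rows, n_cols)
```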

@lingminhao
Collaborator

lingminhao commented Jan 27, 2025

Hi @ashokpatowary, I checked the error. It turns out it occurs in a part of the code that only prints messages about some statistics used in the model and does not affect the output. I have turned off this functionality temporarily, so you should be able to get past this error now. There is no need to pull a new Singularity image for this.

Having said that, it would be helpful if you could provide the test data if possible, or the .command.run and .command.log files under the work directory for the third Nextflow process (bambu), so that we can diagnose the bug in detail and fix it accordingly. But let's see if there are any further errors first.

@ashokpatowary
Author

Hi @lingminhao. Thanks for all your help. I ran into another error this time. I have uploaded all the files here: https://www.dropbox.com/scl/fo/6xybedt5u4dqjzj0xc0k4/AMIrTuRO1PesQUBzLDWtGoU?rlkey=v2lrxt8yc0zqbg9mknqlkq8r9&st=j0ajbihk&dl=0

--- Start extending annotations ---
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases
Calls: bambu ... filterTranscriptsByAnnotation -> recommendNDR -> predict -> lm -> lm.fit
Execution halted
INFO: Cleaning up image...

@lingminhao
Collaborator

lingminhao commented Jan 29, 2025

Hi @ashokpatowary, I checked your dataset and realised that the Rsamtools package parsed the information from the BC and UG tags in the BAM file incorrectly. This can happen when the BC and UG tags are added to the BAM file manually; errors can creep in during that process.

[Screenshot: the mis-parsed BC/UG tag values]

Nevertheless, we acknowledge this could be an issue for others in the future, so we now prioritise parsing the BC/UMI from the read name directly over the tags in the BAM file. You can proceed without modifying anything in the BAM file.

PS: I also tried running your downsampled dataset and found that 64 GB of RAM on my portable machine (I am on a trip) is insufficient, mostly due to the high number of cells. You may have to run it on a machine with more RAM.

Thanks!
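The read-name-first parsing described above might look roughly like this sketch (not the pipeline's actual code; the READID_BARCODE_UMI read-name layout and the BC/UG tag names for the fallback are assumptions for illustration):

```python
# Prefer the barcode/UMI embedded in the read name; only fall back to the
# manually added BAM tags when the read name does not carry them.
def parse_bc_umi(read_name, tags):
    parts = read_name.split("_")
    if len(parts) >= 3:
        # Read name carries the barcode and UMI: prefer it over the tags,
        # which may have been mis-parsed or added incorrectly.
        return parts[-2], parts[-1]
    # Fall back to the BAM tags (here BC = barcode, UG = UMI).
    return tags.get("BC"), tags.get("UG")

bc, umi = parse_bc_umi("read42_ACGTACGT_TTGGCC", {"BC": "NNNN", "UG": "NNNN"})
print(bc, umi)  # the read name wins: ACGTACGT TTGGCC
```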

@ashokpatowary
Author

Thanks @lingminhao

The data is from non-10X processing, so I added the BC and UMI tags manually. Anyway, thanks for the help, and yes, this was a very small dataset from a large number of cells. Our final data will have over 2 billion reads.

Regards
Ashok

@lingminhao
Collaborator

Hi @ashokpatowary,

Sorry, I forgot to mention that you'll need to update the Singularity image again for this BC/UMI parsing fix.

Yes, we haven't worked much with datasets from non-10X platforms, with over 60k cells, or with 2 billion reads, so your feedback here is very helpful to us!

Best regards,
Min Hao

@ashokpatowary
Author

@lingminhao

I think if we give a raw BAM file as input, there will always be a memory error, but if we clean the input data based on the knee, that can be avoided. That said, I am encountering the following error. Any suggestion as to where it went wrong?

Running Bambu-v3.9.0
WARNING - If you change the number of cores (ncore) between Bambu runs and there is no progress please restart your R session to resolve the issue that originates from the XGboost package.
--- Start isoform EM quantification ---
Error in fread(clusters[[j]], header = FALSE, data.table = FALSE) :
  input= must be a single character string containing a file name, a system command containing at least one space, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or, the input data itself containing at least one \n or \r
Calls: bambu -> fread -> stopf -> raise_condition -> signal
Execution halted
INFO: Cleaning up image...
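The fread message above indicates that one element of the cluster-file list was not a usable path string (for example a missing or NULL entry). A hedged Python sketch of the failure mode, with an up-front check that gives a clearer message (read_clusters is a hypothetical stand-in, not pipeline code):

```python
# Validate each cluster-file entry before attempting to read it, so a bad
# element fails with a message naming the offending index instead of a
# cryptic reader error like the fread one above.
def read_clusters(paths):
    tables = []
    for j, p in enumerate(paths):
        if not isinstance(p, str) or not p:
            raise ValueError(f"cluster file {j} is not a file path: {p!r}")
        tables.append(f"contents of {p}")  # placeholder for the actual read
    return tables

try:
    read_clusters(["clusters_1.txt", None])
except ValueError as e:
    msg = str(e)
```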

@ashokpatowary
Author

@lingminhao

I think it's the EM step that is causing the error. I tried running with 10X data and got the same error. That said, I have another concern. I used the same sample to generate:

  1. 10X + PacBio
  2. non-10X+ONT

For 10X + PacBio, I previously did the analysis with a different tool and experimentally validated many of the novel findings. To my surprise, when I used the same data with bambu-singlecell, fewer than 5% of those novel transcripts were called, and I am not sure why. Could using opt.discovery here be an issue? Additionally, since many reads that might come from novel transcripts are filtered out, are these filtered reads considered when calculating the gene count matrix? For example, a read might be an incomplete splice match for a transcript, but for the gene count it would still be a true copy.

@lingminhao
Collaborator

lingminhao commented Feb 6, 2025

Hi @ashokpatowary, sorry for the late response, I was on a trip this week.

Let me break your questions into bullet points so it's easier.

  • The bug is now fixed; remember to pull a new Singularity image.
  • Can I confirm what you meant by "knee"? Is that the knee plot used to filter cell barcodes? More information about this would be very helpful for improving the pipeline for datasets with a large number of cells.
  • To discover more novel transcripts, you can increase the threshold using the NDR argument. A higher NDR includes more novel transcripts, at the risk of an increased number of false discoveries. opt.discovery is used when your annotations are too incomplete for model training, or when you can only use annotations from another species for novel transcript discovery.
  • Even though these reads are not sufficient to evidence the existence of a novel transcript, they are still used for quantification.
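The knee-plot filter mentioned above can be sketched as follows: rank barcodes by read count, then pick the point on the log-log rank/count curve farthest from the straight line joining its endpoints; barcodes below that count are treated as empty droplets. A minimal sketch with synthetic counts, not the method of any particular tool:

```python
import math

# Find the knee of the barcode rank plot: the point of maximum
# perpendicular distance from the endpoint-to-endpoint chord in log-log
# space. Returns the count at the knee, usable as a filtering threshold.
def knee_point(counts):
    ranked = sorted(counts, reverse=True)
    x = [math.log10(r + 1) for r in range(len(ranked))]
    y = [math.log10(c) for c in ranked]
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    norm = math.hypot(x1 - x0, y1 - y0)
    # Perpendicular distance of each point from the chord.
    dist = [abs((y1 - y0) * xi - (x1 - x0) * yi + x1 * y0 - y1 * x0) / norm
            for xi, yi in zip(x, y)]
    return ranked[dist.index(max(dist))]

# Synthetic counts: 100 "real" cells with high counts, 1000 empty barcodes.
counts = [10_000 - i for i in range(100)] + [10 for _ in range(1000)]
threshold = knee_point(counts)
```

Filtering the BAM to barcodes at or above `threshold` before running the pipeline would shrink the input roughly tenfold in this synthetic example, which is the kind of pre-cleaning the comment above refers to.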
