problem publishing large file to azure blob storage #2610
-
I did some more digging and have a bit of additional information about the issue. We made some calls to the Azure Blob Service REST API (Get Block List) to look at the uncommitted and committed blocks of the file. When listing the blocks of the blob from the failed example upload, we see a very large number of uncommitted blocks, all of the same fixed size.

When we use the same calls on a successfully uploaded blob, we see only committed blocks and no uncommitted blocks. The committed blocks here are again all of that same fixed size. When going through the source code of the nf-azure plugin, it seems like Nextflow uses a fixed block size when transferring files with the Java SDK. My question now is why the block size is limited to this fixed value, and whether it can be made configurable.
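For reference, a minimal sketch of the same block-list check using the Azure Storage Java SDK (azure-storage-blob) instead of raw REST calls; the connection string, container, and blob name are placeholders:

import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.models.Block;
import com.azure.storage.blob.models.BlockList;
import com.azure.storage.blob.models.BlockListType;
import com.azure.storage.blob.specialized.BlockBlobClient;

public class ListBlocks {
    public static void main(String[] args) {
        // Placeholder credentials and names; point this at the blob that failed to publish.
        BlockBlobClient blob = new BlobServiceClientBuilder()
            .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
            .buildClient()
            .getBlobContainerClient("results")
            .getBlobClient("archive.tar.gz")
            .getBlockBlobClient();

        // BlockListType.ALL returns committed and uncommitted blocks, the same
        // information as the Get Block List REST operation (comp=blocklist).
        BlockList blocks = blob.listBlocks(BlockListType.ALL);
        System.out.println("committed:   " + blocks.getCommittedBlocks().size());
        System.out.println("uncommitted: " + blocks.getUncommittedBlocks().size());
        for (Block b : blocks.getUncommittedBlocks()) {
            System.out.println(b.getName() + " -> " + b.getSizeLong() + " bytes");
        }
    }
}

A failed oversized upload shows up here as tens of thousands of uncommitted blocks of identical size.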
-
(Reposting the response in the thread here to summarize the overall issue.)

Updates: how I reproduced it locally. Basically, have a process which produces a file of a specific size:

process PUBLISH_REPORTS {
publishDir "${params.outdir}"
output:
path("*.tmp")
script:
"""
mkfile -n 50m file_50mb.tmp
"""
}
and a nextflow.config along these lines:

profiles {
azurebatch {
params {
outdir = "az://${AZURE_STORAGE_CONTAINER_NAME}/results"
}
azure {
azcopy {
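// block size in MB passed to the azcopy tool, which only the Azure Batch
// executor invokes; it does not apply to publishDir uploads via the Java SDK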
blockSize = "10"
blobTier = "Hot"
}
storage {
accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
}
}
}
}
So, the crucial bit here is that the azcopy settings only take effect with the Azure Batch executor, which is the only place where the azcopy tool is invoked. When publishing from other executors, the Java SDK for Azure Storage is used instead for transferring the file to the blob container, using this code here.

TLDR: the azure.azcopy.blockSize setting has no effect on publishDir uploads, and the SDK transfer runs with a fixed block size. Given that Azure enforces a hard limit on the number of blocks per blob (the 409 error here is the 100,000 uncommitted-block limit), a fixed block size effectively caps the size of the files that can be published. This might be a major refactor for the nf-azure plugin, but I do suggest that you look into making the upload block size configurable, or scaling it with the file size.
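To make the arithmetic concrete: with 4 MB blocks (the Azure SDK's usual default, stated here as an assumption), a 1.7 TB file splits into over 400,000 blocks, four times the limit, while any block size of roughly 17 MB or more keeps it under. A minimal sketch of what a size-aware SDK upload could look like; this is not current nf-azure behavior, and the container name is a placeholder:

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.models.ParallelTransferOptions;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UploadLargeBlob {
    private static final long MAX_BLOCKS = 100_000L;              // Azure's uncommitted-block limit
    private static final long MIN_BLOCK_SIZE = 4L * 1024 * 1024; // assumed SDK default block size

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        long fileSize = Files.size(file);

        // Grow the block size with the file so the block count stays under
        // the limit: a 1.7 TB file needs blocks of at least ~17 MB.
        long blockSize = Math.max(MIN_BLOCK_SIZE, (fileSize + MAX_BLOCKS - 1) / MAX_BLOCKS);

        BlobClient blob = new BlobServiceClientBuilder()
            .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
            .buildClient()
            .getBlobContainerClient("results") // placeholder container name
            .getBlobClient(file.getFileName().toString());

        ParallelTransferOptions opts = new ParallelTransferOptions()
            .setBlockSizeLong(blockSize);

        // Overload taking ParallelTransferOptions; headers, metadata, tier,
        // request conditions, and timeout are left as null here.
        blob.uploadFromFile(file.toString(), opts, null, null, null, null, null);
        System.out.println("Uploaded " + fileSize + " bytes with block size " + blockSize);
    }
}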
-
Hello everyone,

I hope it's OK for me to ask this here. I have run into a problem when trying to publish a large file to Azure Blob Storage.

We run our pipeline with the local executor but want to write some of our output files, together with the process .command.* files and the report, timeline, and trace files, to an Azure blob storage container. This works for the most part, except for one file.

We have one process that creates a compressed tar.gz archive of intermediate output files. This is a relatively big file of 1.7 TB. When trying to publish this file to the blob storage, Nextflow throws error 409: The uncommitted block count cannot exceed the maximum limit of 100,000 blocks.

.nextflow.log excerpt:

All other output files and log files of other processes are successfully published to the Azure blob container. We thought maybe the file was too big, but when we try to upload the file manually to the same Azure blob container using azcopy, we don't get this error and the file is successfully uploaded.

As I'm not able to resolve the problem, I would appreciate any suggestions for fixing this issue. More information below:

nextflow.config file excerpt:

nextflow process .command.sh:

Please let me know if you want additional information.