Dataset Corrupted #166

Open
Abalam-29895 opened this issue Apr 14, 2023 · 4 comments

Comments

@Abalam-29895

The audio files are corrupted after downloading them with the provided shell script. The script I used and the error message from the shell are below.
https://github.com/microsoft/DNS-Challenge/blob/2db96d5f75257df764a6ef66513b4b97bc707f30/download-dns-challenge-2.sh

Error message:
bzip2: Data integrity error when decompressing.
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

Can you suggest a fix for this? Thank you!

@thebarnable

I'm experiencing a very similar issue with the download-dns-challenge-4.sh script, using the curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j line. It fails not for all tars, but for some (e.g. clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2):

curl: (56) OpenSSL SSL_read: Connection timed out, errno 110

bzip2: Compressed file ends unexpectedly;
	perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
	Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
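
The log above shows curl timing out mid-stream, so tar and bzip2 only ever see a truncated archive. One way to separate "download was cut off" from "file is bad on the server" is to save the archive to disk first and test it with bzip2 -t (the same check the bzip2 message recommends) before extracting. A minimal sketch, assuming the archive has already been saved locally; verify_and_extract is a hypothetical helper name, not something from the repository scripts:

```shell
#!/usr/bin/env bash
# Sketch only: verify_and_extract is a hypothetical helper, not part of the
# DNS-Challenge download scripts.
# Usage: verify_and_extract <archive.tar.bz2> <destination-directory>
verify_and_extract() {
    local archive="$1" dest="$2"

    # bzip2 -t decompresses the file without writing output and returns
    # non-zero if the data is damaged or ends early.
    if ! bzip2 -t "$archive" 2>/dev/null; then
        echo "corrupt or truncated archive: $archive" >&2
        return 1
    fi

    # Only touch the destination once the archive has passed the check.
    mkdir -p "$dest"
    tar -C "$dest" -x -j -f "$archive"
}
```

With this approach a failed download leaves the destination directory untouched, at the cost of temporarily needing disk space for the compressed archive as well as the extracted files.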

@seowwj

seowwj commented May 24, 2023

I faced the same issue (but with different files). What solved it for me was retrying the download.
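
Retrying can be automated rather than done by hand. A minimal sketch of a generic retry wrapper; retry and the RETRY_DELAY variable are hypothetical names introduced here, and the attempt count and delay are arbitrary choices, not values tested against this dataset:

```shell
#!/usr/bin/env bash
# Sketch only: retry and RETRY_DELAY are hypothetical, not part of the
# DNS-Challenge download scripts.
# Usage: retry <max_attempts> <command> [args...]
retry() {
    local max_attempts="$1"; shift
    local delay="${RETRY_DELAY:-5}"   # seconds to wait between attempts
    local attempt=1

    # Re-run the command until it exits with status 0.
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Possible use with the variables seen in this thread (URL, OUTPUT_PATH, BLOB):
#   retry 5 curl --fail --location --create-dirs --continue-at - \
#       --output "$OUTPUT_PATH/$BLOB" "$URL"
```

In the commented example, --continue-at - asks curl to resume a partial file instead of starting over, which only helps if the server honours range requests; that is an assumption here, not something confirmed in this thread. Saving to a file also means the archive has to be extracted in a separate step instead of being piped into tar.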

@JINSCOTT

JINSCOTT commented Dec 26, 2023

I tried using AzCopy to download the files, and it is much faster and more reliable than wget and curl: no more timeouts, and no more re-downloading an entire file from the start.
Get AzCopy working and try something like this in the download scripts:
azcopy copy "$URL" "$OUTPUT_PATH/$BLOB"

@valentin710

I had the same issue as @thebarnable with the download-dns-challenge-5-headset-training.sh script. I have attempted multiple downloads so far, but without success.
