This repository has been archived by the owner on Dec 3, 2023. It is now read-only.

[bug] Some youtube videos download xml instead of video #309

Open
akanellis opened this issue May 30, 2020 · 5 comments

Comments

@akanellis

Hello,

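const youtubedl = require('youtube-dl');
const fs = require('fs');

// Download format 251 of the given URL into output.webm and resolve with the filename.
function getYoutubeStream(url) {
    // The executor is a plain function: an async executor would swallow thrown errors.
    return new Promise((resolve, reject) => {
        const filename = "output.webm";

        youtubedl(url,
            ["--cookies", "cookies.txt", "--format=251"],
            { cwd: __dirname })
        .on('error', (err) => {
            console.error("youtube-dl error");
            // Pass real Error objects through unchanged; wrap anything else.
            reject(err instanceof Error ? err : new Error(err));
        })
        // 'end' fires when the download stream has no more data to read.
        .on('end', function() {
            resolve(filename);
        })
        .pipe(fs.createWriteStream(filename));
    });
}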

(async () => {
    let file = await getYoutubeStream("https://www.youtube.com/watch?v=WxMTeMc7R_A");
})();

The simple code above downloads a specific audio stream (format 251) from the YouTube link shown and saves it to a file. For most YouTube videos this works perfectly. But in some rare cases, like this specific video (please no comments, I know the video sucks, but it's an example :p), the output is a gzip'd XML file rather than an audio file.

The on('error') event is not emitted even though something has obviously gone wrong. I have also tried try/catch statements, but nothing is caught there either. This is bad because, with no working error handling, trying to play this file crashes the whole app.
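Until the library reports this case itself, one possible workaround is to inspect the first bytes of the written file before trying to play it. The sketch below is not part of youtube-dl or this package; the helper names sniffHeader and sniffFile are hypothetical. It relies only on the well-known magic numbers for EBML (the container header used by WebM, which matches the original "Did not find the EBML tag" symptom) and for gzip.

```javascript
const fs = require('fs');

// Magic numbers: EBML (WebM/Matroska) starts with 1A 45 DF A3, gzip with 1F 8B.
const EBML_MAGIC = Buffer.from([0x1a, 0x45, 0xdf, 0xa3]);
const GZIP_MAGIC = Buffer.from([0x1f, 0x8b]);

// Hypothetical helper: classify a buffer holding the first bytes of a file.
function sniffHeader(header) {
    if (header.length >= 4 && header.subarray(0, 4).equals(EBML_MAGIC)) {
        return 'webm';
    }
    if (header.length >= 2 && header.subarray(0, 2).equals(GZIP_MAGIC)) {
        return 'gzip';
    }
    return 'unknown';
}

// Hypothetical helper: read the first four bytes of a file and classify them.
function sniffFile(path) {
    const fd = fs.openSync(path, 'r');
    try {
        const header = Buffer.alloc(4);
        const bytesRead = fs.readSync(fd, header, 0, 4, 0);
        return sniffHeader(header.subarray(0, bytesRead));
    } finally {
        fs.closeSync(fd);
    }
}
```

In the snippet above, the 'end' handler could call sniffFile(filename) and reject when the result is not 'webm', so the caller gets an error instead of a file that crashes the player.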

I am attaching the extracted XML file for debugging purposes. (I extracted it, renamed it to .xml and zipped it so that GitHub would allow the upload.)

Regards
output.webm.zip

@akanellis changed the title from "Did not find the EBML tag at the start of the stream" to "[bug] Some youtube videos download xml instead of video" on May 30, 2020
@brunobg
Contributor

brunobg commented Aug 18, 2020

The video from this issue is gone, but the same thing happens with https://vimeo.com/56282283: this library downloads a gzipped XML file instead of the video, even though calling youtube-dl directly works.

After browsing the code, I am not sure what the intent is. It seems that instead of letting youtube-dl actually download the video, processData() makes a request to the URL directly. For this video the base url field points to an XML document, not an MP4.

@Kikobeats could you explain the idea here? I imagine downloading in JS gives you a stream object, which is nice, but apparently the -f best guess does not work as expected. Although there are a bunch of other formats, I'd expect this lib to download the highest-resolution video by default, or at least behave the same as youtube-dl on the command line.

Also, '-f', 'bestvideo+bestaudio' blows up in processData since there's no url field. So unfortunately this is not a fix.
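To illustrate the two failure modes described above, here is a rough sketch of a guard that could run before requesting the URL directly. It is not code from this package: the function name describeInfo is hypothetical, and it assumes the metadata object mirrors youtube-dl's JSON output, with a `url` field for single-file formats and a `protocol` field whose value is "http" or "https" for plain downloads. Both assumptions should be checked against the actual metadata before relying on this.

```javascript
// Hypothetical guard: decide whether a metadata object describes a single file
// that can be fetched with one plain HTTP(S) request.
// Assumption: `info` looks like youtube-dl's JSON output (`url`, `protocol`).
function describeInfo(info) {
    if (!info || typeof info.url !== 'string') {
        // Seen with '-f bestvideo+bestaudio': there is no single url to request.
        return { direct: false, reason: 'no url field' };
    }
    const protocol = info.protocol || '';
    if (protocol !== 'http' && protocol !== 'https') {
        // Manifest-based formats: the url may point at a manifest (for example
        // an XML document) instead of the media file itself.
        return { direct: false, reason: `protocol "${protocol}" is not a plain HTTP download` };
    }
    return { direct: true, reason: 'single file over HTTP(S)' };
}
```

With a check like this, the library could emit an 'error' event for unsupported cases instead of silently writing the manifest to disk.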

YTDL states this:

Since the end of April 2015 and version 2015.04.26, youtube-dl uses -f bestvideo+bestaudio/best as the default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading bestvideo and bestaudio separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to best and results in downloading the best available quality served as a single file. best is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add -f bestvideo[height<=?1080]+bestaudio/best to your configuration file. Note that if you use youtube-dl to stream to stdout (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as -o -, youtube-dl still uses -f best format selection in order to start content delivery immediately to your player and not to wait until bestvideo and bestaudio are downloaded and muxed.

My first impulse here is to use YTDL itself to do the download. This would change the package significantly, but would avoid this class of problem. YTDL prints progress output, one line per update, when using --newline:

[vimeo] 56282283: Downloading webpage
[vimeo] 56282283: Extracting information
[vimeo] 56282283: Downloading JSON metadata
[vimeo] 56282283: Downloading JSON metadata
[vimeo] 56282283: Downloading akfire_interconnect_quic m3u8 information
[vimeo] 56282283: Downloading akfire_interconnect_quic m3u8 information
[vimeo] 56282283: Downloading akfire_interconnect_quic MPD information
[vimeo] 56282283: Downloading akfire_interconnect_quic MPD information
[dashsegments] Total fragments: 3
[download] Destination: Public Test Video-56282283.mp4
[download]  26.5% of ~3.78KiB at Unknown speed ETA 00:02
[download]  33.3% of ~3.78KiB at Unknown speed ETA 00:01
[download]  33.3% of ~3.78KiB at Unknown speed ETA 00:01
[download]   0.1% of ~2.73MiB at Unknown speed ETA 34:33
[download]   0.2% of ~2.73MiB at 97.23KiB/s ETA 18:38
[download]   0.3% of ~2.73MiB at 224.95KiB/s ETA 09:36
[download]   0.6% of ~2.73MiB at 478.73KiB/s ETA 04:51
etc

It'd be nicer to get this in a machine-parseable format, but it's viable to parse the '[download]' lines.
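As a rough sketch of that approach, the code below parses progress lines shaped like the ones in the log above and feeds them to a callback while youtube-dl runs as a child process. The names parseProgressLine and downloadWithProgress are hypothetical, the regular expression is derived only from the sample lines shown here, and the code assumes a youtube-dl binary is available on the PATH. Lines in any other shape are ignored.

```javascript
const { spawn } = require('child_process');
const readline = require('readline');

// Matches lines such as:
//   [download]   0.2% of ~2.73MiB at 97.23KiB/s ETA 18:38
const PROGRESS_RE =
    /^\[download\]\s+(\d+(?:\.\d+)?)% of (~?)(\d+(?:\.\d+)?)([A-Za-z]+) at (.+?) ETA (\S+)$/;

// Hypothetical helper: turn one progress line into an object, or null if the
// line is not a progress update in the shape shown above.
function parseProgressLine(line) {
    const m = PROGRESS_RE.exec(line.trim());
    if (!m) return null;
    return {
        percent: parseFloat(m[1]),
        sizeIsEstimate: m[2] === '~',
        size: parseFloat(m[3]),
        sizeUnit: m[4],
        speed: m[5] === 'Unknown speed' ? null : m[5],
        eta: m[6],
    };
}

// Hypothetical wrapper: let youtube-dl do the download and report progress.
// Assumes a `youtube-dl` executable on the PATH.
function downloadWithProgress(url, onProgress) {
    return new Promise((resolve, reject) => {
        const child = spawn('youtube-dl', ['--newline', url]);
        const lines = readline.createInterface({ input: child.stdout });
        lines.on('line', (line) => {
            const progress = parseProgressLine(line);
            if (progress) onProgress(progress);
        });
        child.on('error', reject); // e.g. the binary could not be started
        child.on('close', (code) => {
            if (code === 0) resolve();
            else reject(new Error(`youtube-dl exited with code ${code}`));
        });
    });
}
```

Because youtube-dl performs the download itself here, a non-zero exit code surfaces as a rejected promise, which is the error signal missing in the original report.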

Any thoughts?

@brunobg
Contributor

brunobg commented Aug 19, 2020

Ok, so I looked into this, and fixing it here would mean reimplementing part of youtube-dl to support merging separate audio and video. I started implementing code that uses youtube-dl to download the video, but it ended up changing things so much that I ended up rewriting everything from scratch. The new project is at https://github.com/Corollarium/youtubedl-wrapper. I added a back link to this project.

@Kikobeats
Collaborator

@brunobg happy to ship it as a major version; keeping it as a separate project is also understandable 👍

@brunobg
Contributor

brunobg commented Aug 20, 2020

It's a major rewrite, moving to promises and all. I'll ping you when I finish a stable release and see if you want to overhaul and merge. I also noticed https://github.com/ghjbnm/youtube-dl-wrap, which I had missed before for some reason, and whose author created an API very similar to mine. Perhaps we could all work together.

@brunobg
Contributor

brunobg commented Aug 21, 2020

I released v1.0 today. It lacks playlist and subtitle support, which I'll add eventually. Check if you want to merge it somehow.
