Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetVideosAsync fix #821

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions YoutubeExplode.Tests/PlaylistSpecs.cs
Original file line number Diff line number Diff line change
Expand Up @@ -189,4 +189,18 @@ public async Task I_can_get_a_subset_of_videos_included_in_a_playlist()
// Assert
videos.Should().HaveCount(10);
}

[Fact]
public async Task I_can_get_videos_included_in_a_playlist_with_a_lot_of_duplicate()
{
// Arrange
var youtube = new YoutubeClient();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var youtube = new YoutubeClient();
// Arrange
var youtube = new YoutubeClient();


// Act
var videos = await youtube.Playlists.GetVideosAsync(PlaylistIds.EnormousDuplicates);

// Assert
videos.Should().HaveCountGreaterOrEqualTo(3_900);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also worth asserting that there are no duplicates

videos.Should().OnlyHaveUniqueItems();
}
}
1 change: 1 addition & 0 deletions YoutubeExplode.Tests/TestData/PlaylistIds.cs
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,5 @@ internal static class PlaylistIds
public const string UserUploads = "UUTMt7iMWa7jy0fNXIktwyLA";
public const string Weird = "PL601B2E69B03FAB9D";
public const string ContainsLongVideos = "PLkk2FsMngwGi9FNkWIoNZlfqglcldj_Zs";
public const string EnormousDuplicates = "PLI_eFW8NAFzYAXZ5DrU6E6mQ_XfhaLBUX";
}
19 changes: 15 additions & 4 deletions YoutubeExplode/Playlists/PlaylistClient.cs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using YoutubeExplode.Bridge;
using YoutubeExplode.Common;
using YoutubeExplode.Exceptions;
using YoutubeExplode.Videos;
Expand Down Expand Up @@ -92,8 +93,20 @@ public async IAsyncEnumerable<Batch<PlaylistVideo>> GetVideoBatchesAsync(
);

var videos = new List<PlaylistVideo>();
IEnumerable<PlaylistVideoData> originalVideos;
if (!encounteredIds.Any())
{
originalVideos = response.Videos.DistinctBy(v => v.Id);
}
else
{
originalVideos = response.Videos.Where(v =>
{
return !encounteredIds.Any(e => string.Equals(e.Value, v.Id));
});
}
Comment on lines +96 to +107
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the old code and the new code work the same 🤔

Previously, we used a HashSet to deduplicate video IDs. Using the following block:

  if (!encounteredIds.Add(videoId))
                    continue; // this will execute if the `videoId` already exists

And now we use this instead:

            if (!encounteredIds.Any())
            {
                originalVideos = response.Videos.DistinctBy(v => v.Id);
            }
            else
            {
                originalVideos = response.Videos.Where(v =>
                {
                    return !encounteredIds.Any(e => string.Equals(e.Value, v.Id));
                });
            }

Which seems to do the same.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can run it by yourself and see that my changes allow to download more videos. I can't explain by words, why this changes works better than code on master tree. My observation just showed that for some reasons, on some point (probably when last item is duplicate), it starts to return videos, that were already added to HashSet.


foreach (var videoData in response.Videos)
foreach (var videoData in originalVideos)
{
var videoId =
videoData.Id
Expand All @@ -105,9 +118,7 @@ public async IAsyncEnumerable<Batch<PlaylistVideo>> GetVideoBatchesAsync(
videoData.Index
?? throw new YoutubeExplodeException("Failed to extract the video index.");

// Don't yield the same video twice
if (!encounteredIds.Add(videoId))
continue;
encounteredIds.Add(videoId);

var videoTitle =
videoData.Title
Expand Down