Add invidious companion support #4985

Open: unixfox wants to merge 26 commits into master

@unixfox unixfox commented Oct 7, 2024

Description

Invidious companion is the new tool created for the retrieval of the YouTube streams: https://github.com/iv-org/invidious-companion

Invidious will no longer handle video stream retrieval itself, as adapting it has become a burden for the Invidious team. Instead, Invidious companion is based on https://github.com/LuanRT/YouTube.js, which is the most up-to-date project when it comes to video stream retrieval.

This allows us to spend more time on actually improving the Invidious frontend.

What does this PR do?

Invidious sends the /player request to Invidious companion over HTTP(S). Invidious companion does all of its magic, then Invidious does the usual parsing of the player endpoint response.

Invidious also delegates the work of latest_version, /api/manifest/dash/id and /videoplayback to Invidious companion.

What does invidious companion do

  • You can set multiple invidious_companion entries, allowing you to use multiple external servers
  • It supports SOCKS proxies - related to Allow to use proxies (HTTP(s) & socks) #301
  • There is a plan to support multiple proxies: handle the ability to use multiple proxies invidious-companion#5
  • There is no need to use inv_sig_helper, as the program automatically handles the deciphering.
  • There is no need for youtube-trusted-session-generator, as the program automatically generates a po_token at a configurable frequency.
  • You can log in with a YouTube account

Incompatibilities

  • You can't use inv_sig_helper with invidious companion

Not supported yet - will be work in progress after this PR is merged

Future potential work

  • Have Invidious proxying the requests to Invidious companion in order to make it easier for beginners.

How to try?

  1. Run Invidious companion with a secret key: https://github.com/iv-org/invidious-companion?tab=readme-ov-file#run-locally
  2. Configure Invidious companion in Invidious's config.yaml, using the same secret key:
invidious_companion:
  - private_url: "http://localhost:8282/"
    public_url: "http://localhost:8282/"
invidious_companion_key: hoMyBeautifulKey
  3. Run Invidious

Fixes

Related to

@unixfox unixfox force-pushed the invidious-companion branch 2 times, most recently from 7efa8f7 to 194fb72 on October 20, 2024 00:11
@unixfox unixfox force-pushed the invidious-companion branch from f6d8ddc to a63fca8 on November 1, 2024 20:34
@unixfox unixfox changed the title from "add invidious_companion option" to "Add invidious companion support" on Nov 1, 2024
@unixfox unixfox requested a review from syeopite November 1, 2024 21:19
@unixfox unixfox marked this pull request as ready for review November 1, 2024 21:19
@unixfox unixfox requested review from SamantazFox and a team as code owners November 1, 2024 21:19
Comment on lines 23 to 25
if local && CONFIG.invidious_companion
  return env.redirect "#{video.invidious_companion["baseUrl"].as_s}#{env.request.path}?#{env.request.query}"
end
Member

As the get_video() call above already talks to the companion and gets playable URLs, why still redirect to the companion manifest handler? Why not let Invidious generate the manifest?

PS: I realized that there is an important oversight here: what if the video object (e.g. returned from cache) doesn't have a companion field, or the companion base URL is no longer present in the config?

PPS: Why handle that case at all when the URL is already replaced in components/player.ecr?
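
One way to guard against the cached-video case raised in the PS is a nil-safe lookup that also checks the stored base URL against the companions currently configured. This is only a sketch: `companion_base_url`, the `invidiousCompanion` key and the `configured` list of public URLs are hypothetical names chosen for illustration, not the PR's actual code.

```crystal
require "json"

# Hypothetical helper (not from the PR): returns the companion base URL
# stored with a video, or nil when the field is missing or the URL is no
# longer among the configured companions.
def companion_base_url(info : Hash(String, JSON::Any), configured : Array(String)) : String?
  # Each lookup returns nil instead of raising when the key is absent.
  companion = info["invidiousCompanion"]?
  base_url = companion.try { |c| c["baseUrl"]? }.try { |u| u.as_s? }
  return nil if base_url.nil?

  configured.includes?(base_url) ? base_url : nil
end

cached = JSON.parse(%({"invidiousCompanion": {"baseUrl": "https://companion1.example.com"}})).as_h
stale = JSON.parse(%({"title": "cached before the companion was enabled"})).as_h
configured = ["https://companion1.example.com"]

puts companion_base_url(cached, configured).inspect
puts companion_base_url(stale, configured).inspect
```

A caller could then fall back to refetching the video (or to an error page) whenever the helper returns nil, instead of redirecting to a URL that no longer exists.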

Member Author

@unixfox unixfox Nov 5, 2024

There are multiple reasons:

In the design plan, I said that everything related to the video retrieval should be handled now in invidious companion. It doesn't make a lot of sense to handle all the logic explained above in invidious.

Member

> One may configure a proxy in invidious_companion. And invidious companion is going to handle multiple proxies soon, so this can't be handled inside Invidious

Unless this is done transparently to the end user (e.g. with NGINX rules), the CSP header will need to reflect the use of these external proxies, and so Invidious will need to take care of it.

> DASH manifest generation from youtube.js includes way more things than Invidious's, such as multi-language audio support

But is this compatible with videoJS? Our DASH manifest is missing multiple formats because of that.

> If an external program relies on the dash api, it makes more sense to redirect this request to Invidious companion because it avoids unnecessary processing in Invidious, as Invidious companion is the one requesting the video streams from YouTube.

Invidious already requests and parses the whole /player response returned by the companion (if it wasn't already in cache) so I don't think this will represent much more processing.

> One may configure multiple Invidious companions, so we need to make sure that the video stream is requested from the same public IP address that generated it.

For an API user, all URLs should already have that companion as the host, and for a web user, all the relevant URLs should already be handled in components/player.ecr. Am I right?

If so, the only way to access Invidious's own /latest_version and manifest endpoints would be deliberately, by a malicious or careless API user, so broken streams are a non-problem here imo.

Member Author

@unixfox unixfox Nov 6, 2024

First of all, I would like to say that I have been through the same line of thinking. I thought long about the advantages and disadvantages of each implementation. The current implementation comes out ahead of the one where Invidious handles almost all the logic and Invidious companion is just a "dumb proxy".

Also, Invidious companion is not entirely new; it's actually based on the ideas of invidious-stripped-down, a project that I have used for 2 years on yewtu.be to replace some functions of Invidious and really improve the performance and user experience of my instance.

> Unless this is done transparently to the end user (e.g. with NGINX rules), the CSP header will need to reflect the use of these external proxies, and so Invidious will need to take care of it.

It already does, in https://github.com/iv-org/invidious/pull/4985/files#diff-a86ced8aa99e3403588538decc008851d3d33dfadbb00d392d5ec8b4696f7df4

> But is this compatible with videoJS? Our DASH manifest is missing multiple formats because of that.

Yes. I have used it for 2 years on yewtu.be. I'm stripping out the codecs that do not work: https://github.com/iv-org/invidious-companion/blob/master/src/routes/invidious_routes/dashManifest.ts#L41-L43

>> If an external program relies on the dash api, it makes more sense to redirect this request to Invidious companion because it avoids unnecessary processing in Invidious, as Invidious companion is the one requesting the video streams from YouTube.
>
> Invidious already requests and parses the whole /player response returned by the companion (if it wasn't already in cache) so I don't think this will represent much more processing.

The benefit of handling latest_version and /api/manifest/dash/id/ is that you can more easily handle all the edge cases in Invidious companion itself, with no need to add more logic in Invidious. Especially at a time when YouTube is evolving rapidly, we want to keep pace thanks to youtube.js.

> For an API user, all URLs should already have that companion as the host, and for a web user, all the relevant URLs should already be handled in components/player.ecr. Am I right?
> If so, the only way to access Invidious's own /latest_version and manifest endpoints would be deliberately, by a malicious or careless API user, so broken streams are a non-problem here imo.

My number one priority is users using Invidious directly. We can think about API users later. Our main user base uses our frontend.

It doesn't mean that developers using the API are forever forgotten. I have ideas but for the moment, if invidious companion is configured then the video streams given through the API may not always work, especially when "proxying" through the same public IP address is needed.

I'm sorry, but a day is only 24 hours and at the moment I'm a single developer trying to improve the overall usability of Invidious through youtube.js. I can't deal with all the cases at the same time. Happy to receive more help for contribution!

Member

@SamantazFox SamantazFox Nov 7, 2024

> Already does in https://github.com/iv-org/invidious/pull/4985/files#diff-a86ced8aa99e3403588538decc008851d3d33dfadbb00d392d5ec8b4696f7df4

The problem is that it adds the companion's domains to the CSP, but not the companion's proxies themselves.

> The benefit of handling latest_version and /api/manifest/dash/id/ is that you can more easily handle all the edge cases in invidious companion itself [...]
> [...]
> My number one priority is users using Invidious directly. We can think about API users later. Our main user base uses our frontend.

My question was about the necessity of adding this redirect logic on these endpoints, which should never be called in the first place, as none of the URLs in the /watch page should lead to Invidious!

Member Author

@unixfox unixfox Nov 7, 2024

>> Already does in #4985 (files)
>
> The problem is that it adds the companion's domains to the CSP, but not the companion's proxies themselves.

This adds the baseUrl to the CSP: https://github.com/iv-org/invidious-companion/blob/master/config/default.toml#L5, which is given by Invidious companion in the JSON reply. This is the same domain that videojs will use for latest_version, DASH manifest API, or videoplayback requests.

We are not adding the URL passed in here: https://github.com/iv-org/invidious/pull/4985/files#diff-b68cf7cb2cb2dbd2275444b03cdc238962e05b5ccff0b89ba34b1f9126f39187R71

 invidious_companion:
 - http://127.0.0.1:8282

> My question was about the necessity of adding this redirect logic on these endpoints, which should never be called in the first place, as none of the URLs in the /watch page should lead to Invidious!

There are always things that use these endpoints directly without going through the frontend. It's to make sure that we are using Invidious companion all the time.

  • download function/button in invidious
  • bots
  • apps that use latest_version or dash manifest api instead of the official api of invidious

It's to make sure that we avoid sending wrong requests to YouTube's servers and getting the IP banned for that.

Member

> This adds in the CSP the baseUrl [...] which is given by invidious companion in the JSON reply

I got that ^^ But if the companion has, say, proxy1.example.com and proxy2.example.com set, those won't be added to the CSP. Unless this is completely transparent to Invidious (= the companion forwards the request to the proxies itself, and Invidious only ever connects to http://127.0.0.1:8282)

> It's to make sure that we avoid sending wrong requests to youtube servers and the IP get banned for that.

Fair enough!

Member Author

@unixfox unixfox Nov 17, 2024

I have rewritten some parts. Let me explain.

If you ENABLE invidious_companion then by default in Invidious:

  • /api/manifest/dash/id/ is blocked
  • /latest_version is blocked

This makes sense, as those endpoints are not used by Invidious when Invidious companion is enabled. It avoids any attempt to load a video stream through the wrong IP, which would make the requests look suspicious to YouTube's servers.
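
The blocking rule described above can be sketched as follows. This is an illustration under assumptions, not the PR's route code: `COMPANION_ONLY_PREFIXES` and `status_for` are hypothetical names, and companions are represented as plain strings.

```crystal
# Hypothetical sketch: path prefixes that Invidious stops serving itself
# once at least one companion is configured.
COMPANION_ONLY_PREFIXES = {"/api/manifest/dash/id/", "/latest_version"}

# Returns the HTTP status Invidious would answer with for a request path:
# 403 for a companion-only endpoint while a companion is configured,
# 200 otherwise.
def status_for(path : String, companions : Array(String)) : Int32
  return 200 if companions.empty?

  blocked = COMPANION_ONLY_PREFIXES.any? { |prefix| path.starts_with?(prefix) }
  blocked ? 403 : 200
end

companions = ["http://127.0.0.1:8282"]
puts status_for("/latest_version", companions)
puts status_for("/watch", companions)
puts status_for("/latest_version", [] of String)
```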

But for the "downloads" feature, all requests are now sent straight to the latest_version endpoint of Invidious companion.

I know this does not yet download the video file directly but the user can do it by clicking on the download feature in their browser and this works. It's being tracked in iv-org/invidious-companion#13

In my opinion this is not a dealbreaker as the Invidious download function is somewhat not useful anymore since the removal of hd720. Maybe in the future we could look into adding ffmpeg combining video and audio on the fly, but right now I'm prioritizing giving back more stability to Invidious.

Member Author

unixfox commented Nov 5, 2024

Feedback from @Fijxu:
Try to handle busted or IP blocked invidious companions.

@unixfox unixfox force-pushed the invidious-companion branch from 2683b24 to 37df2b4 on November 8, 2024 19:28
@unixfox unixfox marked this pull request as draft November 8, 2024 22:44
@unixfox unixfox force-pushed the invidious-companion branch from 97ae26c to 1aa154b on November 16, 2024 21:36
Comment on lines 4 to 6
if !CONFIG.invidious_companion.empty?
  return error_template(403, "This endpoint is not permitted because it is handled by Invidious companion.")
end
Member

What do you think about redirecting to a random Invidious companion here, rather than displaying an error message? Sure, if there are multiple companions, we might not hit the same one that initially loaded the watch page, but it might make playback smoother.

Member Author

The issue is that you have to make a request to a random Invidious companion to learn its "public" URL. I had the idea of including the public URL in config.yml, but my limited Crystal knowledge got in the way because of the new URIArrayConverter.

Something like:

invidious_companion:
 - http://127.0.0.1:8282:
      public_url: https://companion1.invidious.com

I want to avoid doing unnecessary requests to invidious companion.

http://127.0.0.1:8282 is the internal address that Invidious uses to communicate with Invidious companion, but the companion could very well be on another server, so a config like this is possible:

invidious_companion:
 - http://10.0.0.2:8282:
      public_url: https://companion1.invidious.com

10.0.0.2 is another server and 10.0.0.0/24 is an internal network faster than the internet network. example: https://www.ovhcloud.com/en/public-cloud/private-network/

Member

Something like this would be more straightforward to add

invidious_companion:
  - private_url: "http://localhost:8000"
    public_url: "https://example.com"

  - private_url: "http://localhost:8001"
    public_url: "https://example2.com"

struct CompanionConfig
  include YAML::Serializable

  @[YAML::Field(converter: Preferences::URIConverter)]
  property private_url : URI = URI.parse("")

  @[YAML::Field(converter: Preferences::URIConverter)]
  property public_url : URI = URI.parse("")
end

class Config
  # ...

  property invidious_companion : Array(CompanionConfig)? = nil
end
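
To show how such a config could be loaded and then used for the redirect discussed above, here is a self-contained sketch. It makes several assumptions: `URIConverter` is a stand-in for Invidious' `Preferences::URIConverter` (assumed to parse a YAML scalar into a `URI`), and `companion_redirect` is a hypothetical helper, not part of the PR.

```crystal
require "yaml"
require "uri"

# Stand-in for Preferences::URIConverter (assumed behaviour: a YAML scalar
# is parsed into a URI, anything else is a parse error).
module URIConverter
  def self.from_yaml(ctx : YAML::ParseContext, node : YAML::Nodes::Node) : URI
    if node.is_a?(YAML::Nodes::Scalar)
      URI.parse(node.value)
    else
      node.raise "Expected scalar, not #{node.class}"
    end
  end

  def self.to_yaml(value : URI, yaml : YAML::Nodes::Builder)
    yaml.scalar value.to_s
  end
end

struct CompanionConfig
  include YAML::Serializable

  @[YAML::Field(converter: URIConverter)]
  property private_url : URI = URI.parse("")

  @[YAML::Field(converter: URIConverter)]
  property public_url : URI = URI.parse("")
end

# Hypothetical helper: builds the public redirect target for a request
# from the companion's public URL, the request path and the query string.
def companion_redirect(companion : CompanionConfig, path : String, query : String?) : String
  base = companion.public_url.to_s.rchop('/')
  if (q = query) && !q.empty?
    "#{base}#{path}?#{q}"
  else
    "#{base}#{path}"
  end
end

yaml = <<-YAML
  - private_url: "http://localhost:8000"
    public_url: "https://example.com"
  - private_url: "http://localhost:8001"
    public_url: "https://example2.com"
  YAML

companions = Array(CompanionConfig).from_yaml(yaml)

# Pick a random companion, as suggested in the review comment.
puts companion_redirect(companions.sample, "/latest_version", "id=abc&itag=18")
```

With the public URL available from the config, no extra request to the companion is needed just to learn where to redirect.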

Member

@SamantazFox SamantazFox Dec 8, 2024

I think that the simplest way to do it would be with an intermediate class, like this. Then the URIArrayConverter is not needed anymore; you can just use the simpler URIConverter

class CompanionConfig
  include YAML::Serializable

  @[YAML::Field(converter: Preferences::URIConverter)]
  property internal_url : URI

  @[YAML::Field(converter: Preferences::URIConverter)]
  property public_url : URI
end

class Config
  # Invidious companion
  property invidious_companion : Array(CompanionConfig) = [] of CompanionConfig
end

Member Author

unixfox commented Dec 13, 2024

I have just added a new commit giving Invidious companion the ability to verify that a request originated from an Invidious watch page.

This helps combat bots that abuse the latest_version endpoint. This verification is not enabled by default in Invidious companion.

I deliberately did not include the verification ID in the internal latest_version redirect, mainly because this would defeat the purpose of combating bots, since the ID would be given by Invidious.

@unixfox unixfox force-pushed the invidious-companion branch from bce789b to 1de2054 on December 13, 2024 19:41
@unixfox unixfox marked this pull request as ready for review December 15, 2024 22:27
Member Author

unixfox commented Dec 15, 2024

I just marked this PR as ready as I think the code is now production ready.

@SamantazFox @syeopite could you please take a look again at the code? Thanks.

@@ -222,6 +238,24 @@ class Config
end
{% end %}

if !config.invidious_companion.empty?
Member

Code style nitpick

What do you think about replacing !empty? with present?, or with unless {...}.empty?

I've seen a couple of discussions in the Crystal community about if !empty? being harder to process cognitively because it is essentially a double negation.

Related discussions:
https://forum.crystal-lang.org/t/collections-any-vs-empty/5303
crystal-lang/crystal#13847
crystal-lang/shards#577 (comment)
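
For comparison, the three spellings of the check side by side. This sketch assumes a Crystal version where `Enumerable#present?` is available (to my knowledge Crystal 1.11 or later); the `companions` array is a placeholder for `CONFIG.invidious_companion`.

```crystal
# Placeholder for CONFIG.invidious_companion.
companions = ["http://127.0.0.1:8282"]

# Double negation: "not empty".
puts "companion enabled" if !companions.empty?

# Same check, with the negation carried by `unless`.
puts "companion enabled" unless companions.empty?

# Same check, stated positively.
# Assumption: Enumerable#present? exists in the Crystal version in use.
puts "companion enabled" if companions.present?
```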

Member Author

That's perfect for me. Writing !empty? already felt odd to me; happy there is an alternative.

Comment on lines +204 to +209
if companion_base_url = video.invidious_companion.try &.["baseUrl"].as_s
  env.response.headers["Content-Security-Policy"] =
    env.response.headers["Content-Security-Policy"]
      .gsub("media-src", "media-src #{companion_base_url}")
      .gsub("connect-src", "connect-src #{companion_base_url}")
end
Member

Can you move these to the before_all handler instead, maybe under something like:

if {"/embed", "/watch"}.any? { |r| env.request.resource.starts_with? r }
      env.response.headers["Content-Security-Policy"] =
        env.response.headers["Content-Security-Policy"]
          .gsub("media-src", "media-src #{companion_base_url}")
          .gsub("connect-src", "connect-src #{companion_base_url}")
end
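
The header rewrite itself can be tried in isolation. The sketch below separates the route check from the string substitution; `player_page?` and `with_companion_csp` are hypothetical helper names, and the CSP string and companion URL are made-up sample values. In a real `before_all` handler the companion base URL would still have to be known before the video is loaded, which this sketch does not address.

```crystal
# Hypothetical helper: whether a request resource is one of the pages that
# host the player and therefore need the companion in the CSP.
def player_page?(resource : String) : Bool
  {"/embed", "/watch"}.any? { |r| resource.starts_with?(r) }
end

# Hypothetical helper: inserts the companion origin right after the
# media-src and connect-src directive names of an existing CSP value.
def with_companion_csp(csp : String, companion_base_url : String) : String
  csp
    .gsub("media-src", "media-src #{companion_base_url}")
    .gsub("connect-src", "connect-src #{companion_base_url}")
end

csp = "default-src 'none'; media-src 'self' blob:; connect-src 'self'"
resource = "/watch?v=abc"

csp = with_companion_csp(csp, "https://companion.example.com") if player_page?(resource)
puts csp
```

Only origins passed to the helper end up in the header, so any additional proxy hosts that browsers must reach directly would need to be added the same way.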

client_config.client_type = YoutubeAPI::ClientType::AndroidTestSuite
new_player_response = try_fetch_streaming_data(video_id, client_config)
end
if !CONFIG.invidious_companion.empty?
Member

Suggested change
- if !CONFIG.invidious_companion.empty?
+ if CONFIG.invidious_companion.empty?

The Invidious stream data workarounds should run when invidious companion is not set


begin
invidious_companion = CONFIG.invidious_companion.sample
response = make_client(invidious_companion.private_url,
Member

Since Invidious is expected to constantly make requests to the Invidious companion, shouldn't it use a connection pool?

It shouldn't be too difficult to modify the connection pool's factory method to randomly select a companion URL for each client it creates.

Member Author

Could you suggest how to do that? I'm not very familiar with this stuff.

Member

I haven't tested it but something like this should do the trick:

Patch
diff --git a/src/invidious.cr b/src/invidious.cr
index b422dcbb..c0c78a79 100644
--- a/src/invidious.cr
+++ b/src/invidious.cr
@@ -97,6 +97,10 @@ YT_POOL = YoutubeConnectionPool.new(YT_URL, capacity: CONFIG.pool_size)
 
 GGPHT_POOL = YoutubeConnectionPool.new(URI.parse("https://yt3.ggpht.com"), capacity: CONFIG.pool_size)
 
+COMPANION_POOL = CompanionConnectionPool.new(
+  capacity: CONFIG.pool_size
+)
+
 # CLI
 Kemal.config.extra_options do |parser|
   parser.banner = "Usage: invidious [arguments]"
diff --git a/src/invidious/yt_backend/connection_pool.cr b/src/invidious/yt_backend/connection_pool.cr
index c4a73aa7..6f1ef9bd 100644
--- a/src/invidious/yt_backend/connection_pool.cr
+++ b/src/invidious/yt_backend/connection_pool.cr
@@ -46,6 +46,45 @@ struct YoutubeConnectionPool
   end
 end
 
+struct CompanionConnectionPool
+  property pool : DB::Pool(HTTP::Client)
+
+  def initialize(capacity = 5, timeout = 5.0)
+    options = DB::Pool::Options.new(
+      initial_pool_size: 0,
+      max_pool_size: capacity,
+      max_idle_pool_size: capacity,
+      checkout_timeout: timeout
+    )
+
+    @pool = DB::Pool(HTTP::Client).new(options) do
+      companion = CONFIG.invidious_companion.sample
+      next make_client(companion.private_url, force_resolve: true)
+    end
+  end
+
+  def client(&)
+    conn = pool.checkout
+    # Proxy needs to be reinstated every time we get a client from the pool
+    conn.proxy = make_configured_http_proxy_client() if CONFIG.http_proxy
+
+    begin
+      response = yield conn
+    rescue ex
+      conn.close
+
+      companion = CONFIG.invidious_companion.sample
+      conn = make_client(companion.private_url, force_resolve: true)
+
+      response = yield conn
+    ensure
+      pool.release(conn)
+    end
+
+    response
+  end
+end
+
 def add_yt_headers(request)
   request.headers.delete("User-Agent") if request.headers["User-Agent"] == "Crystal"
   request.headers["User-Agent"] ||= "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
diff --git a/src/invidious/yt_backend/youtube_api.cr b/src/invidious/yt_backend/youtube_api.cr
index 74f65449..9bc6fe05 100644
--- a/src/invidious/yt_backend/youtube_api.cr
+++ b/src/invidious/yt_backend/youtube_api.cr
@@ -695,9 +695,7 @@ module YoutubeAPI
     # Send the POST request
 
     begin
-      invidious_companion = CONFIG.invidious_companion.sample
-      response = make_client(invidious_companion.private_url,
-        &.post(endpoint, headers: headers, body: data.to_json))
+      response = COMPANION_POOL.client &.post(endpoint, headers: headers, body: data.to_json)
       body = response.body
       if (response.status_code != 200)
         raise Exception.new(

Keep in mind that the connection pool will have the behavior described here #4326 (comment)
