Deploy error occurs when parsing the Podcast RSS #1620

Closed
1 of 4 tasks
yasulab opened this issue Aug 6, 2024 · 6 comments
Labels
バグ Bug needed to be fixed. 急ぎじゃないよ Make something better but not rushed.

Comments

@yasulab
Member

yasulab commented Aug 6, 2024

To do

Background

Is the RSS feed too large...? cc/ @nanophate

RSS::NotWellFormedError: This is not well formed XML
entity expansion has grown too large



A deployment for coderdojo-japan failed due to a release phase command in release v3238. To inspect the failure, check your release phase log in the dashboard or run 'heroku releases:output v3238' in the CLI.

If you wish to retry the release, you can use the release retry CLI plugin.


(Login required) https://dashboard.heroku.com/apps/coderdojo-japan/activity/releases/3238

rails aborted!
RSS::NotWellFormedError: This is not well formed XML
entity expansion has grown too large
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/rexmlparser.rb:20:in `rescue in _parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/rexmlparser.rb:16:in `_parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:183:in `parse'
/app/vendor/ruby-3.1.4/lib/ruby/3.1.0/forwardable.rb:238:in `parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:88:in `parse'
/app/lib/tasks/podcasts.rake:16:in `block (2 levels) in <main>'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `block in execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:160:in `invoke_task'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:125:in `run_with_threads'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:110:in `top_level'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block (2 levels) in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:186:in `standard_exception_handling'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/rake_module.rb:59:in `with_application'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:18:in `perform'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/command.rb:50:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands.rb:18:in `<main>'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
Caused by:
entity expansion has grown too large
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/baseparser.rb:559:in `block in unnormalize'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/baseparser.rb:551:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/baseparser.rb:551:in `unnormalize'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/streamparser.rb:39:in `parse'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/document.rb:402:in `parse_stream'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/rexmlparser.rb:18:in `_parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:183:in `parse'
/app/vendor/ruby-3.1.4/lib/ruby/3.1.0/forwardable.rb:238:in `parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:88:in `parse'
/app/lib/tasks/podcasts.rake:16:in `block (2 levels) in <main>'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `block in execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:160:in `invoke_task'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:125:in `run_with_threads'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:110:in `top_level'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block (2 levels) in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:186:in `standard_exception_handling'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/rake_module.rb:59:in `with_application'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:18:in `perform'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/command.rb:50:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands.rb:18:in `<main>'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
Tasks: TOP => podcasts:upsert
(See full trace by running task with --trace)
yasulab added the バグ (Bug needed to be fixed.) and 急ぎじゃないよ (Make something better but not rushed.) labels on Aug 6, 2024
@yasulab
Member Author

yasulab commented Aug 7, 2024

The commit from #1619 (a530169) looks like the cause, so I have reverted it for now 👀 ✅ (The RSS cannot be set dynamically, so the 急ぎじゃないよ ("not urgent") label should be fine.)

@nanophate
Member

nanophate commented Aug 7, 2024

Cause

To find the cause, I searched the rexml changes for the error message and found that the following check had been added.
ruby/rexml@v3.3.2...v3.3.4#diff-f8c7cdefc29090ed525a2be70411ce741d4124853cf6425db7d18a6ea3bb9bb3R558-R561

if sum > Security.entity_expansion_text_limit
   raise "entity expansion has grown too large"
end

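With this change, the parser now reads Security.entity_expansion_text_limit, which defaults to 10240, and that appears to be why the error started occurring. For reference, with the RSS feed we fetch, the value climbs to nearly 1,800,000, well over the limit.

Workaround

I confirmed that setting the limit explicitly, for example REXML::Security.entity_expansion_text_limit = 2_000_000, lets the task run without problems. Because this setting applies broadly, I think we could wait until ruby/rexml#192, which adds a way to scope the setting more narrowly, is merged before making the change.

.oO(Our code uses RSS Parser, so I wonder whether a change is needed on the RSS Parser side… 🤔💭 )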

REXML::Security.entity_expansion_text_limit = 2_000_000
FM_RSS = "https://example.com/rss"
rss = RSS::Parser.parse(FM_RSS, false)

ANCHOR_FM_RSS = Rails.env.test? ?
'anchorfm_sample.rss' :
'https://anchor.fm/s/54d501e8/podcast/rss'
rss = RSS::Parser.parse(ANCHOR_FM_RSS, false)
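
Until a narrower setting is available upstream, one way to limit how long the raised limit stays in effect is to set it only around the parse call and restore it afterwards. The following is a minimal sketch, not the project's actual code: `with_entity_expansion_text_limit` is a hypothetical helper name, and the only rexml calls assumed are the `REXML::Security.entity_expansion_text_limit` getter and setter. The setting is still process-wide while the block runs, so this does not help if other threads parse XML at the same time.

```ruby
require "rexml/security"

# Hypothetical helper (not part of rexml or rss): temporarily raises the
# global entity expansion text limit, then restores the previous value
# even if the block raises.
def with_entity_expansion_text_limit(limit)
  previous = REXML::Security.entity_expansion_text_limit
  REXML::Security.entity_expansion_text_limit = limit
  begin
    yield
  ensure
    REXML::Security.entity_expansion_text_limit = previous
  end
end

before = REXML::Security.entity_expansion_text_limit

# In the rake task, the block would wrap the parse call, e.g.
#   rss = with_entity_expansion_text_limit(2_000_000) do
#     RSS::Parser.parse(ANCHOR_FM_RSS, false)
#   end
inside = with_entity_expansion_text_limit(2_000_000) do
  REXML::Security.entity_expansion_text_limit
end

after = REXML::Security.entity_expansion_text_limit
puts "before=#{before} inside=#{inside} after=#{after}"
```

The 2_000_000 value is the one tried in this comment; any value above the size the feed actually reaches would do.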

@yasulab
Member Author

yasulab commented Aug 8, 2024

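@nanophate Thank you for investigating the cause so quickly!! 😻🆒✨ My understanding is that the current system has no dynamic RSS input that would pose a security problem, so I also think the following approach is the way to go! (≧∇≦)b✨

Because this setting applies broadly, I think we could wait until ruby/rexml#192, which adds a way to scope the setting more narrowly, is merged before making the change.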

@naitoh

naitoh commented Aug 17, 2024

@yasulab @nanophate
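There was a mistake in how the value compared against Security.entity_expansion_text_limit was calculated, so I fixed it in ruby/rexml#195.

The fix is included in https://github.com/ruby/rexml/releases/tag/v3.3.5, so it would be great if you could try rexml 3.3.5 🙏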

@nanophate
Member

nanophate commented Aug 17, 2024

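@naitoh Thank you for kindly letting us know. With rexml updated to 3.3.5 in the PR below, I have just confirmed that the task runs without problems and that the deploy no longer fails...!! 💯 🚀 ✨ The number also dropped from 1,800,000 to 4,098, an appropriate size for the page! Thank you again for the fix 🙇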

https://github.com/coderdojo-japan/coderdojo.jp/pull/1622/files#diff-89cade48462044ee1b672dc5f4c3ec250fbd29effcd8932096a23c1283c6731fR365

 bundle exec rails podcasts:upsert
==== START podcasts:upsert ====

Frame number: 0/42

From: /Users/vivio/.code/coderdojo.jp/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.5/lib/rexml/parsers/baseparser.rb:559 REXML::Parsers::BaseParser#unnormalize:

    554:               if entity_value
    555:                 re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
    556:                 rv.gsub!( re, entity_value )
    557:                 binding.pry
    558:                 
 => 559:                 if rv.bytesize > Security.entity_expansion_text_limit
    560:                   raise "entity expansion has grown too large"
    561:                 end
    562:               else
    563:                 er = DEFAULT_ENTITIES[entity_reference]
    564:                 rv.gsub!( er[0], er[2] ) if er

[1] pry(#<REXML::Parsers::BaseParser>)> rv.bytesize
=> 4098
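
The gap between 1,800,000 and 4,098 is consistent with the two checks quoted in this thread: the earlier diff compared a running `sum` against the limit, while 3.3.5 compares `rv.bytesize`. The sketch below is a simplified, standalone illustration in plain Ruby, not rexml's code. It assumes the earlier `sum` added the whole buffer size after every replacement, which this thread does not show, and `expansion_sizes` and the sample entities are made up for the example.

```ruby
LIMIT = 10_240 # default limit mentioned earlier in this thread

# Hypothetical helper: replaces each entity reference in turn and reports
# both the final buffer size and a running total of the buffer size taken
# after every replacement (the assumed pre-3.3.5 measure).
def expansion_sizes(text, entities)
  rv = text.dup
  cumulative = 0
  entities.each do |name, value|
    rv.gsub!("&#{name};", value)
    cumulative += rv.bytesize
  end
  { final_bytes: rv.bytesize, cumulative_bytes: cumulative }
end

text = "a" * 4000 + "&e1;&e2;&e3;"
sizes = expansion_sizes(text, { "e1" => "x", "e2" => "y", "e3" => "z" })

puts "final buffer:  #{sizes[:final_bytes]} bytes, " \
     "over limit? #{sizes[:final_bytes] > LIMIT}"
puts "running total: #{sizes[:cumulative_bytes]} bytes, " \
     "over limit? #{sizes[:cumulative_bytes] > LIMIT}"
```

Under that assumption, a text of about 4 KB with only three short entities already pushes the running total past 10,240, while the buffer itself stays far below the limit.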

@yasulab
Member Author

yasulab commented Aug 18, 2024

Straight from the source...!!!! 🚜💨✨ Thank you so much for the thoughtful comment and advice!!!! (>人< )💖
