Fix a bug that Stream parser doesn't expand the user-defined entity references for "text" #200

Merged 5 commits on Aug 21, 2024

Commits on Aug 20, 2024

  1. Commit 68939ea
  2. Fix a bug that Stream parser doesn't expand the user-defined entity references for "text"
    
    ## Why?
    Stream parser expands character references and predefined entity references, but doesn't expand user-defined entity references.
    
    ## Change
    - text_stream_unnormalize.rb
    ```
    $LOAD_PATH.unshift(File.expand_path("lib"))
    require 'rexml/document'
    require 'rexml/parsers/sax2parser'
    require 'rexml/parsers/pullparser'
    require 'rexml/parsers/streamparser'
    require 'rexml/streamlistener'
    
    xml = <<EOS
    <!DOCTYPE foo [
      <!ENTITY la "1234">
      <!ENTITY lala "--&la;--">
      <!ENTITY lalal "&la;&la;">
    ]><root><la>&la;</la><lala>&lala;</lala><a>&lt;P&gt; &lt;I&gt; &lt;B&gt; Text &lt;/B&gt; &lt;/I&gt;</a><b>test&#8482;</b></root>
    EOS
    
    class StListener
      include REXML::StreamListener
    
      def text(text)
        puts text
      end
    end
    
    puts "REXML(DOM)"
    REXML::Document.new(xml).elements.each("/root/*") {|element| puts element.text}
    
    puts ""
    puts "REXML(Pull)"
    parser = REXML::Parsers::PullParser.new(xml)
    while parser.has_next?
      event = parser.pull
      case event.event_type
      when :text
        puts event[1]
      end
    end
    
    puts ""
    puts "REXML(Stream)"
    REXML::Parsers::StreamParser.new(xml, StListener.new).parse
    
    puts ""
    puts "REXML(SAX)"
    sax = REXML::Parsers::SAX2Parser.new(xml)
    sax.listen(:characters) {|x| puts x }
    sax.parse
    ```
    
    ## Before (master)
    ```
    $ ruby  text_stream_unnormalize.rb
    REXML(DOM)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    
    REXML(Pull)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    
    REXML(Stream)
    &la;           #<= This
    &lala;         #<= This
    <P> <I> <B> Text </B> </I>
    test™
    
    REXML(SAX)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    ```
    
    ## After (This PR)
    
    ```
    $ ruby  text_stream_unnormalize.rb
    REXML(DOM)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    
    REXML(Pull)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    
    REXML(Stream)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    
    REXML(SAX)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
    ```
    naitoh committed Aug 20, 2024
    Commit 8b97bae
  3. Add support for XML entity expansion limitation in Stream parser

    ## Why?
    
    See:
    - ruby#187
    - ruby#195
    
    ## Change
    - Supported `REXML::Security.entity_expansion_limit=` in Stream parser
    - Supported `REXML::Security.entity_expansion_text_limit=` in Stream parser
    naitoh committed Aug 20, 2024
    Commit dc48407
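
As a rough illustration of the change above, the sketch below parses a document with the Stream parser under a temporary `REXML::Security.entity_expansion_limit`. The `TextCollector` class and the `stream_texts_with_limit` helper are hypothetical names introduced only for this example, and the sketch assumes that exceeding the limit surfaces as a `RuntimeError`, as it does for the other REXML parsers. `REXML::Security.entity_expansion_text_limit=` could be wrapped the same way.

```ruby
require "rexml/document"
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

# Hypothetical listener for this sketch: collects every text event.
class TextCollector
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
  end

  def text(text)
    @texts << text
  end
end

# Hypothetical helper: parse with a temporary expansion limit and restore
# the previous global value afterwards. Returns the collected texts, or a
# String describing the error if parsing was aborted.
def stream_texts_with_limit(xml, limit)
  previous = REXML::Security.entity_expansion_limit
  REXML::Security.entity_expansion_limit = limit
  listener = TextCollector.new
  REXML::Parsers::StreamParser.new(xml, listener).parse
  listener.texts
rescue RuntimeError => e
  # Assumption: a limit violation is reported as a RuntimeError.
  "aborted: #{e.message}"
ensure
  REXML::Security.entity_expansion_limit = previous if previous
end

sample_xml = <<~XML
  <!DOCTYPE foo [
    <!ENTITY la "1234">
    <!ENTITY lala "--&la;--">
  ]><root><lala>&lala;</lala></root>
XML

p stream_texts_with_limit(sample_xml, 10_000)
p stream_texts_with_limit(sample_xml, 1)
```

On a REXML build that includes this commit, the second call is expected to abort because `&lala;` needs more than one expansion; older builds ignore the limit in the Stream parser, so no output is asserted here. The limit is set before the parser is created so that the parser sees the temporary value.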
  4. Update test_with_only_default_entities test case

    ## Why?
    Because `StreamParser#entity_expansion_count` was added.
    naitoh committed Aug 20, 2024
    Commit 7a8f3e3
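
A minimal sketch of how the new `StreamParser#entity_expansion_count` might be read after parsing a document that uses only predefined entities. The `TextJoiner` class and the sample XML are assumptions made for this example and are not taken from the test suite; the accessor is guarded with `respond_to?` because it only exists in REXML versions that include this change.

```ruby
require "rexml/document"
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

# Hypothetical listener for this sketch: concatenates all text events.
class TextJoiner
  include REXML::StreamListener

  attr_reader :buffer

  def initialize
    @buffer = +""
  end

  def text(text)
    @buffer << text
  end
end

# Only predefined entities and a character reference, no DOCTYPE.
default_only_xml = "<root>&lt;p&gt;Text&lt;/p&gt; &amp; test&#8482;</root>"

joiner = TextJoiner.new
stream_parser = REXML::Parsers::StreamParser.new(default_only_xml, joiner)
stream_parser.parse

# Guarded: older REXML versions do not define this accessor.
expansions =
  if stream_parser.respond_to?(:entity_expansion_count)
    stream_parser.entity_expansion_count
  end

puts joiner.buffer
puts "entity_expansion_count: #{expansions.inspect}"
```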

Commits on Aug 21, 2024

  1. Commit c636358