Perl version: v5.8.3 built for i386-linux-thread-multi
HTML::Parser version: 3.36
on Linux 2.4.25, Debian testing dist
I am working on emulating web browsers and have found that I need some level of preprocessing in the HTML parser. A useful primitive for this would be the ability to inject input immediately after the current parse token.
As best I can tell, when a browser hits a chunk of content such as:
<script>
document.write('<a href="http://www.perl.org/">the stuff</a>');
</script>
it essentially injects that text into the input parse buffer immediately after the </script> end tag.
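To make the idea concrete, here is a minimal sketch of that buffer behaviour in plain Perl. It is a toy tokenizer, not HTML::Parser code: `parse_with_injection`, the `%expansion` table, and the `<navbar>` placeholder are all hypothetical names chosen for illustration. The handler may return a string, and that string is placed at the front of the unparsed input so it becomes the next thing tokenized.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tokenizer (not HTML::Parser): the callback may return
# text, which is pushed back onto the front of the remaining input.
sub parse_with_injection {
    my ($input, $on_token) = @_;
    my $buffer = $input;
    my @tokens;
    while (length $buffer) {
        # Consume one tag (<...>) or one run of text up to the next '<'.
        $buffer =~ s/\A(<[^>]*>|[^<]+)//s or last;
        my $token = $1;
        push @tokens, $token;
        # Anything the callback returns is parsed before the rest of the input.
        my $injected = $on_token->($token);
        $buffer = $injected . $buffer if defined $injected;
    }
    return @tokens;
}

# Assumed example data: one placeholder tag that expands to some markup.
my %expansion = ('<navbar>' => '<b>menu</b>');
my @tokens = parse_with_injection('a<navbar>z', sub { $expansion{ $_[0] } });
print join('|', @tokens), "\n";   # prints: a|<navbar>|<b>|menu|</b>|z
```

The injected markup is tokenized before the trailing `z`, which is the ordering I would want from an injection primitive in the real parser.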
The attached patch adds an ->inject(chunk) method to the HTML::Parser object. It is far from a clean patch, but it shows my intent.
Here is a sample use of the inject method to do simple preprocessing:
#!/usr/bin/perl
use strict;
use warnings;
use lib 'blib/lib';
use lib 'blib/arch';
use HTML::Parser qw();
use URI::Escape qw();
use IO::String qw();
use IO::Handle qw();
my $h = <<EOF;
<deftag name="foo">bar</deftag>
<deftag name="navbar">
<foo>
<table>
<tr><td><a href="http://www.perl.org/">perl</a>
<tr><td><a href="http://www.apache.org/">apache</a>
<tr><td><a href="http://www.mozilla.org/">mozilla</a>
</table>
</deftag>
<html><head><title>foo</title></head><body>
<navbar>
Testing 1... 2... 3...
</body></html>
EOF
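my %special = ();      # macro name => captured replacement markup
my $cdt = undef;       # name of the <deftag> currently being captured
my $p;
my @out = (\*STDOUT);  # stack of output handles; $out[0] is the active one
$p = HTML::Parser->new(
    'start_h' => [ sub { my($tag, $attr, $txt) = @_;
        if(exists $special{$tag}) {
            # inject() comes from the attached patch, not stock HTML::Parser
            $p->inject($special{$tag});
        } elsif($tag eq 'deftag') {
            # start capturing the macro body into an in-memory handle
            $cdt = $attr->{'name'};
            unshift @out, IO::String->new();
        } else {
            $out[0]->print($txt);
        }
    }, 'tag,attr,text' ],
    'text_h' => [ sub { $out[0]->print(shift) }, 'text' ],
    'end_h' => [ sub { my($tag, $txt) = @_;
        # the 'tag' argspec prefixes end tags with '/'
        if($tag eq '/deftag') {
            # store the captured body and return to the previous handle
            $special{$cdt} = ${$out[0]->string_ref()};
            shift @out;
        } else {
            $out[0]->print($txt);
        }
    }, 'tag,text' ],
) or die "No parser: $!";
$p->parse($h);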
Migrated from rt.cpan.org#5941 (status was 'open')
Requestors:
Attachments:
From on 2004-04-05 23:25:04
: