tokuhirom / html-treebuilder-libxml
drop-in-replacement for HTML::TreeBuilder::XPath
License: Other
Calling as_HTML() on a tree that has a script tag will incorrectly encode the characters <>&, which results in invalid HTML.
$ perl -MHTML::TreeBuilder::LibXML\ 9999
HTML::TreeBuilder::LibXML version 9999 required--this is only version 0.26.
BEGIN failed--compilation aborted.
$ perl -Maliased=HTML::TreeBuilder::LibXML,T \
-E 'my $t = T->new; $t->parse( q|<script>var a = 5 > 3 && 2 < 1;</script>| ); $t->eof; say $t->disembowel->as_HTML'
<script>var a = 5 &gt; 3 &amp;&amp; 2 &lt; 1;</script>
I have:
my $jar = HTTP::CookieJar::LWP->new;
my $mech = WWW::Mechanize::GZip->new( cookie_jar => $jar, autocheck => 1, );
WWW::Mechanize::TreeBuilder->meta->apply($mech);
WWW::Mechanize::TreeBuilder->meta->apply($mech, tree_class => 'HTML::TreeBuilder::XPath');
And when running my script, I get the following error:
Use of uninitialized value in subroutine entry at <path>/perl5/perls/perl-5.34.0/lib/site_perl/5.34.0/HTML/TreeBuilder.pm line 108.
The error message is not emitted when I "downgrade" from WWW::Mechanize::GZip to vanilla WWW::Mechanize in the second line of the quoted code fragment. All modules are at the latest version.
While HTML::TreeBuilder::LibXML::Node says that it is an "HTML::Element compatible API", its new method is incompatible with that of HTML::Element. With HTML::TreeBuilder::LibXML::Node, we can't do operations such as:
my $new_element = ref($element)->new('div');
$element->replace_with($new_element);
I've just used minil test, minil release, and it created version 0.18 instead of 0.19.
Is it a bug, or did I do something wrong?
0.23 META.json declares Web::Scraper as a run-time dependency. However, this is needed only for running tests:
$ grep -Hnr Web::Scraper
cpanfile:6:requires 'Web::Scraper';
META.yml:39: Web::Scraper: 0
MYMETA.json:49: "Web::Scraper" : "0",
README.md:31:Web::Scraper work.
README.md:37: Web::Scraper: 0.26
t/02_web_scraper.t:6:plan skip_all => "this test requires Web::Scraper" unless eval "use Web::Scraper; 1";
t/02_web_scraper.t:21:use Web::Scraper;
MYMETA.yml:39: Web::Scraper: 0
META.json:47: "Web::Scraper" : "0",
lib/HTML/TreeBuilder/LibXML/Node.pm:88:# hack for Web::Scraper
lib/HTML/TreeBuilder/LibXML.pm:196:Web::Scraper work.
lib/HTML/TreeBuilder/LibXML.pm:202: Web::Scraper: 0.26
tools/benchmark.pl:6:use Web::Scraper;
tools/benchmark.pl:17:print "Web::Scraper: $Web::Scraper::VERSION\n";
Changes:89: * added workaround for Web::Scraper 0.36
Please move the Web::Scraper dependency from runtime requires to test requires.
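A minimal sketch of the requested change, assuming the distribution's metadata is generated from the cpanfile (the grep output above shows the runtime entry at cpanfile line 6). Other dependencies are omitted here, and the rest of the file's layout is an assumption:

```perl
# cpanfile (sketch): declare Web::Scraper for the test phase only.
# The plain top-level "requires 'Web::Scraper';" line would be removed.
on 'test' => sub {
    requires 'Web::Scraper';
};
```

Since t/02_web_scraper.t already skips itself when Web::Scraper cannot be loaded, a softer declaration such as `recommends` inside the test block may also be worth considering.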
After upgrading libxml2 from 2.9.10 to 2.9.12, t/05_empty.t test fails like this:
$ perl -Ilib t/05_empty.t
1..15
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
not ok 9
# Failed test at t/05_empty.t line 18.
# ''
# doesn't match '(?^:<html>)'
ok 10
ok 11
not ok 12
# Failed test at t/05_empty.t line 18.
# ''
# doesn't match '(?^:<html>)'
ok 13
ok 14
not ok 15
# Failed test at t/05_empty.t line 18.
# ''
# doesn't match '(?^:<html>)'
# Looks like you failed 3 tests of 15.
Hi.
I got the following warning:
Use of uninitialized value in subroutine entry at /Users/dan/perl5/perlbrew/perls/perl-5.20.0/lib/site_perl/5.20.0/HTML/TreeBuilder/LibXML/Node.pm line 160.
Here is code to reproduce it:
use strict;
use warnings;
use HTML::TreeBuilder::LibXML;
my $doc = <<'DOC';
<html>
<body>
<div>
<div>a</div>
<div>b</div>
</div>
</body>
</html>
DOC
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($doc);
$tree->eof;
my @nodes = $tree->findnodes("//body/div");
my $new_node = $nodes[0]->clone;
Thanks.