parse_htmlfile - Parse HTML text from file
    use HTML::Parse;
    $h = parse_htmlfile("test.html");
    print $h->dump;
    $h = parse_html("<p>Some more <i>italic</i> text", $h);
    $h->delete;

    print parse_htmlfile("index.html")->as_HTML;  # tidy up markup in a file
The HTML::Parse module provides functions to parse HTML documents. It exports two functions, parse_html() and parse_htmlfile().
The optional second argument to parse_html(), $obj, is assumed to be an object of a subclass of HTML::Parser. Refer to HTML::Parser for more documentation.
The $obj will default to an internally created HTML::TreeBuilder object. This class implements a parser that builds (and is) an HTML syntax tree with HTML::Element objects as nodes.
The return value from parse_html() is $obj.
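A minimal sketch of the two ways of calling parse_html() described above, first without $obj and then with the returned tree passed back in. The call to eof() is an addition not mentioned in this text; it is the HTML::TreeBuilder method that signals the end of input, used here on the assumption that the tree should be complete before it is printed. The sample markup is illustrative only.

```perl
use strict;
use warnings;
use HTML::Parse;                  # exports parse_html() and parse_htmlfile()
use Scalar::Util qw(refaddr);     # core module, compares object identity

# No $obj given: an HTML::TreeBuilder object is created internally
# and returned.
my $tree = parse_html("<p>First <b>bold</b> chunk");

# Passing the tree back in feeds more text to the same parser.
# The return value is the object that was passed.
my $same = parse_html("<p>Some more <i>italic</i> text", $tree);

my $is_same = refaddr($same) == refaddr($tree);
print "parse_html() returned the object it was given\n" if $is_same;

$tree->eof;                       # assumption: flush any pending text
my $html = $tree->as_HTML;
print $html, "\n";

$tree->delete;                    # release the tree when finished
```

Reusing one tree this way lets a document be parsed in several chunks; each call appends to the structure built so far.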
parse_htmlfile() is the same as parse_html(), but obtains HTML text from the named file. It returns undef if the file could not be opened, or $obj otherwise.
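Because parse_htmlfile() returns undef when the file cannot be opened, the result is worth checking before calling methods on it. The following is a sketch only; the file name is hypothetical and assumed not to exist, so the failure branch is the one exercised.

```perl
use strict;
use warnings;
use HTML::Parse;

# Hypothetical path, assumed not to exist on this system.
my $file = "no-such-file-for-html-parse-example.html";

my $tree = parse_htmlfile($file);

my $status;
if (defined $tree) {
    # The file was opened and parsed; $tree is the parser/tree object.
    print $tree->as_HTML, "\n";
    $tree->delete;
    $status = "parsed";
} else {
    # undef means the file could not be opened.
    warn "Could not open $file\n";
    $status = "unreadable";
}
```

Chaining a method directly onto the call, as in the one-line tidy-up idiom, would die with a "Can't call method ... on an undefined value" error for a missing file, so the explicit check is the safer form in scripts.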
When an HTML::TreeBuilder object is created, the following variables control how parsing takes place:
Implicit elements have the implicit() attribute set.
If true, the parser calls warn() with an appropriate message for syntax errors. Default is false.
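A sketch of turning on syntax warnings before a tree is created. The variable name $HTML::Parse::WARN does not appear in the text above; it is assumed here from the HTML::Parse distribution and should be checked against the installed version. The sample markup, with its stray end tag, is illustrative only.

```perl
use strict;
use warnings;
use HTML::Parse;

# Assumed variable name (not given in the text above). It has to be set
# before the internal HTML::TreeBuilder object is created, that is,
# before parse_html() is called without an $obj argument.
local $HTML::Parse::WARN = 1;

# Markup with a stray </i>; any diagnostics go to STDERR via warn().
my $tree = parse_html("<p>Unclosed <b>bold</b></i> text");

my $got_tree = defined $tree ? 1 : 0;

$tree->delete if $got_tree;
```

Since the variables are read only when the tree is created, changing them afterwards should have no effect on an existing $obj that is passed back in.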
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.