© 2004 by Philippe "BooK" Bruhat.
Imagine I hate the <em> tag and only want to see <i> tags.
The filtering routine is easy to write:
s!<(/?)em\b!<$1i!g;
And so is the filter:
$filter = HTTP::Proxy::BodyFilter::simple->new( sub { ${ $_[1] } =~ s!<(/?)em\b!<$1i!g; } );
Except that the filter might (and surely will) receive a chunk of data that ends with an incomplete tag, like:
ove <b><em></b> tags, love, <em>love</e
In this case, your regular expression won't match the closing tag, because it is split across two chunks, and the transmogrified HTML will no longer be well-formed.
You have to make sure no tag is cut. Maybe you could use HTML::Parser?
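One way to make sure no tag is cut, without a full parser, is to hold back an unterminated tag at the end of a chunk and prepend it to the next one. The sketch below is plain Perl and does not use HTTP::Proxy at all; the helper name em_to_i, the $carry buffer and the $last flag are made up for the illustration. It also assumes that no ">" appears inside attribute values or comments, which real-world HTML does not guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <em> to <i> in one chunk of data.
# $carry is a reference to a scalar that keeps a cut tag between calls;
# $last is true on the final call, when nothing should be held back.
sub em_to_i {
    my ( $chunk, $carry, $last ) = @_;

    # put back whatever was held over from the previous chunk
    my $data = $$carry . $chunk;
    $$carry = '';

    # an unterminated "<..." at the very end is kept for the next call
    if ( !$last && $data =~ s/(<[^>]*)\z// ) {
        $$carry = $1;
    }

    # same substitution as on the slide
    $data =~ s!<(/?)em\b!<$1i!g;
    return $data;
}

# feed the chunk from the slide, then the rest of the document
my $carry = '';
my $out   = '';
$out .= em_to_i( 'ove <b><em></b> tags, love, <em>love</e', \$carry );
$out .= em_to_i( 'm> is all you need',                      \$carry );
$out .= em_to_i( '',                                        \$carry, 1 );
print "$out\n";
```

The first call returns everything up to "love" and keeps "</e" in $carry, so the closing tag is only rewritten once its second half has arrived.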