Could someone show me how to cut down on these excessive calls to Perl, or better, put them all in one call?
The aim is to strip the unnecessary HTML (JavaScript and so on) and reduce the amount of data to be processed.
--.
set htm to (do shell script "curl -s 'http://www.myveryprivateurl_sorry/id/[25-245]'" & ¬
" | perl -ne 'print if ( m /<head>/s...m /<!--\\[RIGHT BODY\\] -->/s)'" & ¬
" | perl -ne 'print unless ( m /<script/s...m /<\\/script>/s )'" & ¬
" | perl -ne 'print unless ( m /<!-- \\[TOP\\] -->/s...m /<!-- \\[\\/TOP\\] -->/s)'" & ¬
" | perl -pe 's/^\\s*//g; s/ //g; \\r//g'")
-- etc.
Here is one way to combine all those perl invocations into a single Perl program:
set perlCode to "if ( m /<head>/s...m /<!--\\[RIGHT BODY\\] -->/s) {
    next if ( m /<script/s...m /<\\/script>/s );
    next if ( m /<!-- \\[TOP\\] -->/s...m /<!-- \\[\\/TOP\\] -->/s);
    s/^\\s*//g;
    s/ //g;
    s/\\r//g;
    print;
}"
--.
set htm to (do shell script "curl -s 'http://www.myveryprivateurl_sorry/id/[25-245]'" & ¬
" | perl -ne " & quoted form of perlCode)
-- etc.
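The last substitution in the original post was garbled (it read \\r//g with no leading s/). I have assumed it was meant to be s/\\r//g, which strips carriage returns, so adjust it if you intended something else.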