
Miscellaneous Topics II

HEREDOC

A long series of print statements can get very repetitive. A construct called a HEREDOC (borrowed from the shell) can help. Instead of:
print "\nThere are $count people.\n";
print "\n";
print "$women of them are women\n";
print "$men of them are men.\n\n";
Try this:
print <<"EOS";

There are $count people.

$women of them are women
$men of them are men.

EOS
Much less punctuation and easier to read, yes? The string constant extends from the line after the construct <<"EOS" up to a line that consists solely of EOS. String interpolation happens because the EOS was double-quoted.
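To see the effect of the quoting, here is a minimal sketch that builds the same text twice, once with a double-quoted terminator and once with a single-quoted one. The variable values are made up for the illustration; with <<'EOS' nothing is interpolated.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for this illustration.
my ($count, $women) = (5, 3);

# Double-quoted terminator: variables are interpolated.
my $interpolated = <<"EOS";
There are $count people.
$women of them are women
EOS

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOS';
There are $count people.
$women of them are women
EOS

print $interpolated;
print $literal;
```

The first print shows the numbers; the second shows the variable names exactly as typed.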

printf

If you want precise control over the printing of numbers and the alignment of columns, there is the printf (and sprintf) function. Its first parameter is a format string containing % conversion specifiers that control how each remaining argument is printed:
my $r = 5/6;
my $f = 15/8;
print "ratio $r\n";
print "fraction $f\n";
my $fmt = "%10s %9.4f\n";
printf $fmt, 'ratio', $r;
printf $fmt, 'fraction', $f;

# the above will print:
# ratio 0.833333333333333
# fraction 1.875
#      ratio    0.8333
#   fraction    1.8750
See 'perldoc -f sprintf' for the details.
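sprintf takes the same format as printf but returns the formatted string instead of printing it. A small sketch, with a made-up format and made-up values:

```perl
use strict;
use warnings;

# sprintf returns the formatted string rather than printing it.
# %-10s left-aligns a string in 10 columns,
# %6.2f rounds to 2 decimals in 6 columns,
# %5d right-aligns an integer in 5 columns.
my $row = sprintf "%-10s|%6.2f|%5d", 'ratio', 5/6, 42;

print "$row\n";
```

This is handy when the formatted text is needed for something other than immediate output, such as building up a report in a variable.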

The glob operator

The < and > characters are used for reading from files. They can also be used to get a list of filenames that match a pattern containing shell metacharacters:
my $line = <$in>;                  # read from $in
while (<STDIN>) {}                 # read from STDIN
my @txt_files = <*.txt>;           # get all .txt files in the current directory
my @pm_files = <src/perl/*.pm>;    # get all .pm files in src/perl
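How does Perl know how to interpret <>? As before, it tries to do "what you mean" and nearly always succeeds. Roughly, if the brackets enclose a filehandle name or a simple scalar variable, Perl reads a line; anything else is treated as a filename pattern. If you're curious about the details, see File::Glob and the "I/O Operators" section of perlop.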
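Both uses can be tried side by side. The sketch below is only an illustration: it creates a scratch directory with a few made-up files, reads a line with <$in>, and lists files with glob(), the named function that does the same job as the <*.txt> syntax.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch directory with a few hypothetical files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.pm)) {
    open my $out, '>', "$dir/$name" or die "cannot create $name: $!";
    print {$out} "hello\n";
    close $out;
}

# The brackets hold a simple scalar variable, so this reads a line.
open my $in, '<', "$dir/a.txt" or die "cannot open a.txt: $!";
my $line = <$in>;
close $in;

# glob() is the function form of the <*.txt> pattern syntax.
my @txt_files = glob("$dir/*.txt");

print "read: $line";
print "found: $_\n" for @txt_files;
```

Writing glob() explicitly is often easier to read than the angle-bracket form, especially when the pattern contains a variable.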

LWP::Simple

In the early days of the web (the 1990s) Perl was used to generate 95% of all dynamic web pages. Perl was affectionately termed 'The Duct Tape of the Internet'. Today is different, of course. Here is an example of how easy it is to do web-related stuff in Perl:
use LWP::Simple;
my $html = get("http://en.wikipedia.org/wiki/perl");
This gets the complete HTML source of the URL into a single scalar. You can then parse, slice & dice it with regex!

There is also getstore($url, $fname) which will store the contents of the URL in a file. Very convenient.
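Here is a minimal sketch of both functions with error checking. As an assumption made purely so that it runs without a network connection, it fetches a file:// URL pointing at a temporary file; an http URL would be passed in the same way. get() returns undef on failure, and getstore() returns the status code of the request, which is_success() can test.

```perl
use strict;
use warnings;
use LWP::Simple qw(get getstore is_success);
use File::Temp qw(tempdir);

# Assumption: a local file:// URL stands in for a real web address
# so that the sketch needs no network connection.
my $dir = tempdir(CLEANUP => 1);
my $src = "$dir/page.html";
open my $out, '>', $src or die "cannot create $src: $!";
print {$out} "<html><body>hello</body></html>\n";
close $out;
my $url = "file://$src";

# get() returns the content, or undef on failure.
my $html = get($url);
die "could not fetch $url\n" unless defined $html;
print "fetched ", length($html), " characters\n";

# getstore() saves the content to a file and returns the status code.
my $status = getstore($url, "$dir/copy.html");
print "stored OK\n" if is_success($status);
```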

Regex and Multi-line strings

The regular expressions we saw before matched a pattern to a single line of text. Regex can also quite effectively deal with multi-line strings of arbitrary length. There are two things to note about this:
  1. The dot '.' character normally matches any character except a newline. With the /s modifier it will also match a newline.
    my $html = get('http://en.wikipedia.org/wiki/perl');
    # we have a multi-line scalar
    $html =~ s{.*?<table>}{}s;
    # this discarded the first 53 lines, up to and including the first <table> tag.
    The ? after the * in the regex makes it match as little as possible rather than the default behavior of being 'greedy' and matching as much as possible.
  2. When dealing with multi-line strings it is advised to always use the /m modifier. It will change '^' and '$' from matching the start or end of the string to matching the start or end of any line anywhere within the string.

    The regex meta-characters \A and \z unambiguously match the start and end of the string.
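The two points above can be tried out on a small string. This is only a sketch: the text is made up and stands in for a fetched page.

```perl
use strict;
use warnings;

# A small made-up multi-line string standing in for fetched HTML.
my $text = "<html>\n<p>intro</p>\n<table>\nrow 1\n</table>\n"
         . "<table>\nrow 2\n</table>\n";

# Non-greedy .*? with /s: removes up to and including the FIRST <table>.
(my $lazy = $text) =~ s{.*?<table>}{}s;

# Greedy .* with /s: removes up to and including the LAST <table>.
(my $greedy = $text) =~ s{.*<table>}{}s;

# Without /m, ^ matches only at the start of the string.
my @no_m = $text =~ /^<(\w+)>/g;

# With /m, ^ matches at the start of every line.
my @with_m = $text =~ /^<(\w+)>/mg;

# \A means start of string even when /m is in effect.
my @anchored = $text =~ /\A<(\w+)>/mg;

print "without /m: @no_m\n";
print "with /m:    @with_m\n";
print "with \\A:    @anchored\n";
```

Without /m only the html tag is found; with /m the tags at the start of later lines are found as well, and \A brings the match back to the very start of the string.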

For a Perl script that you can use to explore the above click here. You should be able to understand the source code but may need to consult the lengthy 'perldoc perlre'.

Getopt::Long

As we saw in the discussion of Arrays, command line arguments are placed in @ARGV and you can do whatever you like with them. Many modules have been written to help with this. Getopt::Long is a very sophisticated one and likely the one you'll see the most. Here is one way to use it:
use Getopt::Long;

my %opt = (
    nlines => 10,    # default
);
GetOptions(\%opt, qw/ verbose nlines=i fname=s /)
    or die "usage: $0 -v -n #lines -f fname\n";

print "starting\n" if $opt{verbose};
for (1 .. $opt{nlines}) {
    ...
}
open my $in, '<', $opt{fname} or die "cannot open $opt{fname}: $!\n";
We'll discuss the \%opt construct later.
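To experiment without typing a command line, Getopt::Long also provides GetOptionsFromArray, which parses any array in place of @ARGV. The argument list in this sketch is made up.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# A made-up argument list standing in for @ARGV.
my @args = qw(--verbose --nlines 3 --fname data.txt extra);

my %opt = (
    nlines => 10,    # default, overridden by --nlines
);
GetOptionsFromArray(\@args, \%opt, qw/ verbose nlines=i fname=s /)
    or die "usage: $0 -v -n #lines -f fname\n";

print "verbose: $opt{verbose}\n";
print "nlines:  $opt{nlines}\n";
print "fname:   $opt{fname}\n";

# Recognized options are removed; other arguments are left behind.
print "left over: @args\n";
```

The =i and =s suffixes say that the option takes an integer or a string value; verbose has no suffix, so it is a simple on/off flag.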

Named Arguments

Hashes can be used to good effect to help with passing many arguments to a subroutine:
sub process {
    my %args = @_;
    my $html = get($args{url});
    for (1 .. $args{count}) {
        ...
    }
    print "$args{prefix}: $result\n";
}

process(
    url    => 'http://www.google.com',
    count  => 45,
    prefix => 'abc',
);
One improvement to the code in sub process would be to first check the validity of the hash keys.
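One possible way to check the keys is sketched below. The list of allowed names and the simplified body of process are assumptions made for the illustration, not the only way to do it.

```perl
use strict;
use warnings;

# Hypothetical list of the argument names that process() accepts.
my %allowed = map { $_ => 1 } qw(url count prefix);

sub process {
    my %args = @_;

    # Reject any key that is not in the allowed list.
    for my $key (sort keys %args) {
        die "unknown argument '$key'\n" unless $allowed{$key};
    }
    return "$args{prefix}: $args{count}";
}

print process(prefix => 'abc', count => 45), "\n";

# A misspelled key is caught instead of being silently ignored.
my $ok = eval { process(prefx => 'abc'); 1 };
print "error: $@" unless $ok;
```

Without the check, a misspelled key such as prefx would simply leave $args{prefix} undefined, which is much harder to track down.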

Exercise

For practice in using several of these miscellaneous topics, here is a task I made up. The challenge is contrived but is actually not that far off from a 'real' one.
