Miscellaneous Topics II
HEREDOC
A long series of print statements can get very repetitive.
A construct called a HEREDOC (a "here-document", borrowed from the shell) can help:
Instead of:
print "\nThere are $count people.\n";
print "\n";
print "$women of them are women\n";
print "$men of them are men.\n\n";
Try this:
print <<"EOS";

There are $count people.

$women of them are women
$men of them are men.

EOS
Much less punctuation and easier to read, yes?
The string constant extends from the line after the construct <<"EOS"
up to, but not including, a line that consists of exactly EOS
(with no leading or trailing whitespace). String interpolation will happen
because the EOS was double-quoted.
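As a small sketch of the quoting rule just described (the variable and its value are made up for illustration): single-quoting the terminator turns interpolation off, and, assuming Perl 5.26 or later, the <<~ form strips the common leading whitespace so a heredoc can be indented along with the surrounding code.

```perl
use strict;
use warnings;
use v5.26;    # needed only for the indented <<~ form

my $count = 3;    # illustrative value

# Double-quoted terminator: $count is interpolated.
my $interpolated = <<"EOS";
There are $count people.
EOS

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOS';
There are $count people.
EOS

# <<~ (Perl 5.26+) removes the common leading whitespace.
sub indented {
    return <<~"EOS";
        There are $count people.
        EOS
}

print $interpolated;    # There are 3 people.
print $literal;         # There are $count people.
print indented();       # There are 3 people.
```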
printf
If you want precise control over the printing of numbers and the alignment of
columns, there is the printf function (and sprintf, which returns the
formatted string instead of printing it). Its first parameter is
a format string whose % directives (%s, %d, %f and so on) control
how each of the following arguments is printed:
my $r = 5/6;
my $f = 15/8;
print "ratio $r\n";
print "fraction $f\n";
my $fmt = "%10s %9.4f\n";
printf $fmt, 'ratio', $r;
printf $fmt, 'fraction', $f;
# the above will print:
# ratio 0.833333333333333
# fraction 1.875
#      ratio    0.8333
#   fraction    1.8750
See 'perldoc -f sprintf' for the details.
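To illustrate a few more directives, here is a minimal sketch using sprintf. The values are invented for the example; a minus sign left-justifies a field and a leading zero pads a number with zeros.

```perl
use strict;
use warnings;

# sprintf takes the same format as printf but returns the string.
my $line = sprintf "%-10s|%6.2f|%05d", 'ratio', 5/6, 42;
print "$line\n";    # ratio     |  0.83|00042

# Reusing one format lines up a small table of name/value pairs.
my @rows  = ( [ ratio => 5/6 ], [ fraction => 15/8 ] );
my @table = map { sprintf "%-10s %9.4f", @$_ } @rows;
print "$_\n" for @table;
```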
The glob operator
The < and > characters are used for reading from files.
They can also be used to get filenames (with shell metacharacters):
my $line = <$in>; # read from $in
while (<STDIN>) {} # read from STDIN
my @txt_files = <*.txt>; # get all .txt files in the current directory
my @pm_files = <src/perl/*.pm>; # get all .pm files in src/perl
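How does Perl know how to interpret <>?
As before, it tries to do "what you mean" and nearly always succeeds.
Roughly: a filehandle or a simple scalar variable inside the angle brackets
means "read a line"; anything else is treated as a filename pattern.
The details are complicated. If you're curious, see File::Glob and the
"I/O Operators" section of perlop.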
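The two uses of the angle brackets can be tried side by side. The sketch below works in a scratch directory created with File::Temp; the file names are made up for the example, and glob() is shown as the explicit spelling of the filename form.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Build a scratch directory with a few files (names are illustrative).
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt c.pm)) {
    open my $out, '>', "$dir/$name" or die "cannot create $name: $!";
    print {$out} "first line of $name\n";
    close $out;
}

# A simple scalar holding a filehandle inside <> means "read a line".
open my $in, '<', "$dir/a.txt" or die "cannot open a.txt: $!";
my $line = <$in>;
close $in;

# Anything else inside <> is a filename pattern; glob() says so explicitly.
my $start = getcwd();
chdir $dir or die "cannot chdir to $dir: $!";
my @txt_files = <*.txt>;
my @pm_files  = glob('*.pm');
chdir $start or die "cannot chdir back to $start: $!";

print $line;             # first line of a.txt
print "@txt_files\n";    # a.txt b.txt
print "@pm_files\n";     # c.pm
```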
LWP::Simple
In the early days of the web (the 1990s) Perl was used to generate
95% of all dynamic web pages.
Perl was affectionately termed 'The Duct Tape of the Internet'.
Today is different, of course.
Here is an example of how easy it is to do web-related work in Perl:
use LWP::Simple;
my $html = get("http://en.wikipedia.org/wiki/perl");
This gets the complete HTML source of the URL
into a single scalar. You can then parse, slice & dice it with regex!
There is also getstore($url, $fname) which will store the contents of
the URL in a file. Very convenient.
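Both functions can fail, so it is worth checking what they return: get gives back undef on failure, and getstore returns the HTTP status code. The sketch below is one way to do that. The helper name fetch_or_die is hypothetical, and a local file:// URL stands in for a web page so that the example needs no network access; this assumes the URI::file module that is installed alongside LWP.

```perl
use strict;
use warnings;
use LWP::Simple qw(get getstore is_success);
use File::Temp qw(tempdir);
use URI::file;

# Hypothetical helper: fetch a URL or die with a clear message.
sub fetch_or_die {
    my ($url) = @_;
    my $content = get($url);
    die "could not fetch $url\n" unless defined $content;
    return $content;
}

# A local file stands in for a web page (no network needed).
my $dir  = tempdir( CLEANUP => 1 );
my $page = "$dir/page.html";
open my $out, '>', $page or die "cannot create $page: $!";
print {$out} "<html><body>hello</body></html>\n";
close $out;
my $url = URI::file->new($page)->as_string;

my $html = fetch_or_die($url);

# getstore returns the status code of the response.
my $copy   = "$dir/copy.html";
my $status = getstore( $url, $copy );
die "getstore failed with status $status\n" unless is_success($status);
print "stored ", -s $copy, " bytes\n";
```

With a real http:// URL the same checks apply unchanged.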
Regex and Multi-line strings
The regular expressions we saw before matched a pattern against a single line of text.
Regex can also deal quite effectively with multi-line strings of arbitrary length.
There are two things to note about this:
- The dot '.' character normally matches any character except a newline.
With the /s modifier it will also match a newline.
my $html = get('http://en.wikipedia.org/wiki/perl');
# we have a multi-line scalar
$html =~ s{.*?<table>}{}s;
# this discards everything up to and including the first <table> tag
# (the first 53 lines of this page).
The ? after the * in the regex makes it match as little as possible rather than
the default behavior of being 'greedy' and matching as much as possible.
- When dealing with multi-line strings it is advisable to always use the /m modifier.
It changes '^' and '$' from matching only at the start or end
of the string to matching at the start or end of any line
anywhere within the string.
The regex meta-characters \A and \z unambiguously match the
start and end of the string.
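The two points above can be checked on a small string without fetching anything. This is only a sketch; the text in $text is invented for the example.

```perl
use strict;
use warnings;

my $text = "header\n<table>\nrow 1\nrow 2\n</table>\nfooter\n";

# /s lets '.' cross newlines; '.*?' stops at the first <table>.
( my $body = $text ) =~ s{.*?<table>\n}{}s;

# Without /m, ^ matches only at the very start of the string.
my @first = $text =~ /^(\w+)/g;     # ('header')

# With /m, ^ matches at the start of every line.
my @each = $text =~ /^(\w+)/mg;     # ('header', 'row', 'row', 'footer')

# \A and \z are unaffected by /m.
my $starts = $text =~ /\Aheader/m ? 1 : 0;    # 1
my $ends   = $text =~ /footer\z/m ? 1 : 0;    # 0, a newline follows 'footer'

print $body;
print "@first\n";
print "@each\n";
print "$starts $ends\n";
```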
For a Perl script that you can use to explore the above click
here.
You should be able to understand the source code but may need to
consult the lengthy 'perldoc perlre'.
Getopt::Long
As we saw in the discussion of Arrays, command line arguments
are placed in @ARGV and you can do whatever you like with them.
Many modules have been written to help with this. Getopt::Long is
a very sophisticated one and likely the one you'll see the most.
Here is one way to use it:
use Getopt::Long;
my %opt = (
    nlines => 10,    # default
);
GetOptions(\%opt, qw/
    verbose
    nlines=i
    fname=s
/) or die "usage: $0 -v -n #lines -f fname\n";
print "starting\n" if $opt{verbose};
for (1 .. $opt{nlines}) {
    ...
}
open my $in, '<', $opt{fname} or die "cannot open $opt{fname}: $!\n";
We'll discuss the \%opt construct later.
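To experiment with the option specifications without typing a real command line, Getopt::Long also provides GetOptionsFromArray, which parses an array you supply instead of @ARGV. A minimal sketch, with a made-up argument list:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Made-up argument list standing in for @ARGV.
my @args = qw/--verbose --nlines 3 --fname data.txt extra1 extra2/;

my %opt = ( nlines => 10 );    # default, overridden by --nlines
GetOptionsFromArray( \@args, \%opt, qw/verbose nlines=i fname=s/ )
    or die "usage: $0 --verbose --nlines N --fname FILE\n";

print "verbose: $opt{verbose}\n";    # verbose: 1
print "nlines:  $opt{nlines}\n";     # nlines:  3
print "fname:   $opt{fname}\n";      # fname:   data.txt
print "left:    @args\n";            # left:    extra1 extra2
```

Recognized options are removed from the array; whatever is not an option is left behind for the script to handle.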
Named Arguments
Hashes can be used to good effect to help with
passing many arguments to a subroutine:
sub process {
    my %args = @_;
    my $html = get($args{url});
    my $result;    # filled in by the loop below
    for (1 .. $args{count}) { ... }
    print "$args{prefix}: $result\n";
}
process(
    url    => 'http://www.google.com',
    count  => 45,
    prefix => 'abc',
);
Note these things:
- There is no need to remember the order of the parameters.
- The call is self-documenting.
- It is easily extended with new parameters.
An improvement to the code in sub process would be to first check the
validity of the hash keys.
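One possible way to do that check is sketched below. This variant of process is hypothetical: it supplies defaults, rejects unknown keys, insists on url, and returns a string rather than fetching anything so that it can be run on its own.

```perl
use strict;
use warnings;

# Hypothetical variant of process() that checks its named arguments.
sub process {
    my %args = ( count => 1, prefix => 'result', @_ );    # defaults first

    my %known = map { $_ => 1 } qw(url count prefix);
    for my $key ( sort keys %args ) {
        die "process: unknown argument '$key'\n" unless $known{$key};
    }
    die "process: 'url' is required\n" unless defined $args{url};

    return "$args{prefix}: $args{url} x $args{count}";
}

print process( url => 'http://www.google.com', count => 45 ), "\n";

# A misspelled key is caught instead of being silently ignored.
my $ok = eval { process( url => 'http://www.google.com', cuont => 45 ); 1 };
print "error: $@" unless $ok;
```

Putting the defaults before @_ in the hash lets the caller's values override them.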
Exercise
For practice in using several of these miscellaneous topics,
here is a task I made up. The challenge is contrived
but is not that far off from a 'real' one.