PERL for Biologists

Course by Kurt Stüber

Part 2

Introduction to PERL arrays

Scalar arrays:

PERL arrays are indicated by the @-sign:
Examples:
@month
@weekday
@measurement

Array names are formed in the same way as variable names and are subject to the same rules. Only the leading $ is replaced by an @-sign.

Arrays can be filled at the time of declaration:

@month = ( "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December" );
@weekday = ( "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday" );
@measurement = ( 12, 3, 56, 17, 8, 32, 24 );
@base = ( "T","C","A","G" );
@amino_acid = ( "Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu", "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr", "***" );

The individual values in an array can be addressed by their indices. The values are numbered automatically, starting with index 0.

print $month[ 0 ];

This command produces the word January as output. When a single element of an array is used in an expression, the @-sign is replaced by the $-sign again, and the index of the element is given between square brackets [].
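A minimal sketch of element access, assuming a complete twelve-month list. The lines "use strict", "use warnings" and the keyword "my" are not introduced in this course; they are added here only because they are common practice.

```perl
use strict;
use warnings;

my @month = ( "January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December" );

# A single element is a scalar, so it is written with $ and an index in [].
my $first = $month[0];     # index 0 is the first element
my $last  = $month[11];    # index 11 is the twelfth element

print "$first\n";          # prints January
print "$last\n";           # prints December
```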

Size of arrays

PERL allows arrays of different lengths, and the length (the number of elements) of an array can change while the program is running. The length of an array can be obtained with the following statement:

$no_of_months = @month;

When a complete array is assigned to a single variable, the single variable receives the number of elements in the array. The above statement sets the variable $no_of_months to 12. Note that the array has no element with the index 12: the elements are numbered starting with 0 and ending with 11.
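The following sketch shows the length of an array changing at run time. It uses three standard PERL features that are not covered in this part and are added here only for illustration: $#measurement (the index of the last element), push (append a value) and scalar (force the "single variable" interpretation of an array).

```perl
use strict;
use warnings;

my @measurement = ( 12, 3, 56, 17, 8, 32, 24 );

my $count      = @measurement;        # array assigned to a scalar: the length, 7
my $last_index = $#measurement;       # index of the last element, 6

push @measurement, 41;                # append one value; the array grows
my $new_count = scalar @measurement;  # the length is now 8

print "before: $count elements, last index $last_index\n";
print "after:  $new_count elements\n";
```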

Associative Arrays

PERL knows another type of array, called "associative array". Here the elements are indexed not by numbers but by arbitrary strings:

%month_long = ( "Jan", "January",   "Feb", "February", "Mar", "March",    "Apr", "April",
                "May", "May",       "Jun", "June",     "Jul", "July",     "Aug", "August",
                "Sep", "September", "Oct", "October",  "Nov", "November", "Dec", "December" );

Here we have pairs of keys and values. "Jan" is the key and "January" the corresponding value, and so on for "Feb" and "February", "Mar" and "March" etc. An individual value is then specified using its corresponding key:

print $month_long{ "Jan" };

This statement will print the full name of the first month: January.
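As a small sketch of working with key and value pairs, the lines below look up one value, add a new pair and count the pairs. Only three months are listed to keep it short. The use of keys to count the pairs, like "use strict", "use warnings" and "my", is standard PERL but is not introduced in this part.

```perl
use strict;
use warnings;

my %month_long = ( "Jan", "January", "Feb", "February", "Mar", "March" );

my $full = $month_long{ "Mar" };      # look up the value stored under the key "Mar"
$month_long{ "Apr" } = "April";       # assigning to a new key adds a pair

my $n_pairs = keys %month_long;       # number of key/value pairs, now 4

print "Mar is short for $full\n";
print "The array now holds $n_pairs pairs\n";
```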

Another example from biology is the genetic code. A convenient way to specify the amino acid translations of DNA codons is the use of an associative array:

%genetic_code = ( "TTT", "Phe", "TTC", "Phe", "TTA", "Leu", "TTG", "Leu",
                  "TCT", "Ser", "TCC", "Ser", "TCA", "Ser", "TCG", "Ser",
                  "TAT", "Tyr", "TAC", "Tyr", "TAA", "***", "TAG", "***",
                  "TGT", "Cys", "TGC", "Cys", "TGA", "***", "TGG", "Trp",
                  "CTT", "Leu", "CTC", "Leu", "CTA", "Leu", "CTG", "Leu",
                  "CCT", "Pro", "CCC", "Pro", "CCA", "Pro", "CCG", "Pro",
                  "CAT", "His", "CAC", "His", "CAA", "Gln", "CAG", "Gln",
                  "CGT", "Arg", "CGC", "Arg", "CGA", "Arg", "CGG", "Arg",
                  "ATT", "Ile", "ATC", "Ile", "ATA", "Ile", "ATG", "Met",
                  "ACT", "Thr", "ACC", "Thr", "ACA", "Thr", "ACG", "Thr",
                  "AAT", "Asn", "AAC", "Asn", "AAA", "Lys", "AAG", "Lys",
                  "AGT", "Ser", "AGC", "Ser", "AGA", "Arg", "AGG", "Arg",
                  "GTT", "Val", "GTC", "Val", "GTA", "Val", "GTG", "Val",
                  "GCT", "Ala", "GCC", "Ala", "GCA", "Ala", "GCG", "Ala",
                  "GAT", "Asp", "GAC", "Asp", "GAA", "Glu", "GAG", "Glu",
                  "GGT", "Gly", "GGC", "Gly", "GGA", "Gly", "GGG", "Gly" );

This associative array can be used to translate codons. The following statement prints the translation of the codon "TCT":

print $genetic_code{ "TCT" };

Note that, as with scalar arrays, an individual element of the array is specified by replacing the %-sign with the $-sign and giving the key between curly braces {}.
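A possible sketch of translating a short DNA string codon by codon. The sequence in $dna and the three-codon table are made up for illustration; substr and join are standard PERL functions that are not covered in this part, and "use strict", "use warnings" and "my" are likewise additions.

```perl
use strict;
use warnings;

# Only the three codons needed here are listed.
my %genetic_code = ( "ATG", "Met", "TCT", "Ser", "TAA", "***" );

my $dna = "ATGTCTTAA";                # hypothetical example sequence

# substr( string, start, length ) cuts out three letters at a time.
my $codon1 = substr( $dna, 0, 3 );    # "ATG"
my $codon2 = substr( $dna, 3, 3 );    # "TCT"
my $codon3 = substr( $dna, 6, 3 );    # "TAA"

# Each codon is used as a key; join puts a "-" between the translations.
my $protein = join( "-", $genetic_code{ $codon1 },
                         $genetic_code{ $codon2 },
                         $genetic_code{ $codon3 } );

print "$protein\n";                   # prints Met-Ser-***
```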

foreach-loop

To perform an action repeatedly for each element of an array, PERL provides the foreach loop construction:

foreach $m ( @month )
   {
   print $m;
   print "\n";
   }

This is the first time we encounter a loop construction. In the first part of this statement the variable $m is set equal to the first element of the array @month. Then all statements in the block (the statements between the two curly braces) are executed. Program control then goes back to the first statement, $m is set equal to the next element of @month, and the block is executed again. This is repeated until the last element of the array has been processed. After that, the statements following the closing curly brace are executed. The loop above prints the names of all the months, one per line.
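The same loop construction can be combined with an associative array. The sketch below translates a list of codons one after the other; the list @codon and the four-entry table are made up for illustration, and push, join, "use strict", "use warnings" and "my" are standard PERL additions not covered in this part.

```perl
use strict;
use warnings;

# Only four codons are listed to keep the sketch short.
my %genetic_code = ( "ATG", "Met", "GCT", "Ala", "TGG", "Trp", "TAA", "***" );
my @codon        = ( "ATG", "GCT", "TGG", "TAA" );

my @amino_acid = ();
foreach my $c ( @codon )
   {
   # look up each codon in turn and append its translation
   push @amino_acid, $genetic_code{ $c };
   }

my $protein = join( " ", @amino_acid );
print "$protein\n";                   # prints Met Ala Trp ***
```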

Please note that the loop body, from the opening curly brace { to the closing curly brace }, is indented, i.e. it is shifted to the right. This can be done using tabulators or three blanks. The indentation lets the reader see immediately which statements belong to a given block and will be executed repeatedly. This greatly increases the readability of the program.

The array @measurement contains integer numbers. Using the following foreach loop, the sum of these values can be calculated:

$sum = 0;
foreach $value ( @measurement )
   {
   $sum = $sum + $value;
   }
print "The sum is $sum\n";

In the last statement a string is printed: "The sum is $sum\n". Before printing, the variable $sum is replaced by its contents. The control code "\n" at the end of the string ensures that a new line is started after the printed text. Several such control codes exist for special purposes:

\n   new line
\t   tabulator
\a   alarm bell
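As an illustration of \t and \n together, the following sketch builds a small two-column table of the measurements and also prints their sum and mean. The column headings, the variable names $table, $row and $mean, and the use of printf for rounding are assumptions made for this example; "use strict", "use warnings" and "my" are not introduced in this course.

```perl
use strict;
use warnings;

my @measurement = ( 12, 3, 56, 17, 8, 32, 24 );

my $sum   = 0;
my $table = "no.\tvalue\n";               # header line: two columns separated by a tab
my $row   = 1;
foreach my $value ( @measurement )
   {
   $table = $table . "$row\t$value\n";    # one line per measurement
   $sum   = $sum + $value;
   $row   = $row + 1;
   }

my $mean = $sum / @measurement;           # in a calculation the array gives its length, 7

print $table;
print "The sum is $sum\n";                # prints The sum is 152
printf "The mean is %.2f\n", $mean;       # prints The mean is 21.71
```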

If you want to write a maintainable program, you should not hesitate to add many comments to it. Any text following a hash-sign (#) up to the end of the line is a comment and will be ignored during the execution of the program.

# This program calculates the sum of the array @measurement
$sum = 0;
foreach $value ( @measurement )
   {
   $sum = $sum + $value;
   }
print "The sum is $sum\n";

This ensures that everyone reading your program (even yourself) will be able to understand what the program is intended for. Another use of the #-sign is the "commenting out" of program code:

# print "The sum is $sum\n";

This statement will then be ignored when the program runs. The program code is not discarded, however, and can be used again later if necessary, simply by deleting the #-sign.

Exercises

1. Rewrite the program for the weekly timetable using arrays. Try to make the program shorter in this way.

2. Write a program to generate a table of the genetic code, using arrays: @base and %genetic_code. The program should generate an output like the table you see here:

+--------------+--------------+--------------+--------------+
| TTT  Phe (F) | TCT  Ser (S) | TAT  Tyr (Y) | TGT  Cys (C) |
| TTC  Phe     | TCC  Ser     | TAC  Tyr     | TGC  Cys     |
| TTA  Leu (L) | TCA  Ser     | TAA  ***     | TGA  ***     | 
| TTG  Leu     | TCG  Ser     | TAG  ***     | TGG  Trp (W) |
+--------------+--------------+--------------+--------------+
| CTT  Leu (L) | CCT  Pro (P) | CAT  His (H) | CGT  Arg (R) |
| CTC  Leu     | CCC  Pro     | CAC  His     | CGC  Arg     |
| CTA  Leu     | CCA  Pro     | CAA  Gln (Q) | CGA  Arg     |
| CTG  Leu     | CCG  Pro     | CAG  Gln     | CGG  Arg     |
+--------------+--------------+--------------+--------------+
| ATT  Ile (I) | ACT  Thr (T) | AAT  Asn (N) | AGT  Ser (S) |
| ATC  Ile     | ACC  Thr     | AAC  Asn     | AGC  Ser     |
| ATA  Ile     | ACA  Thr     | AAA  Lys (K) | AGA  Arg (R) |
| ATG  Met (M) | ACG  Thr     | AAG  Lys     | AGG  Arg     |
+--------------+--------------+--------------+--------------+
| GTT  Val (V) | GCT  Ala (A) | GAT  Asp (D) | GGT  Gly (G) |
| GTC  Val     | GCC  Ala     | GAC  Asp     | GGC  Gly     |
| GTA  Val     | GCA  Ala     | GAA  Glu (E) | GGA  Gly     |
| GTG  Val     | GCG  Ala     | GAG  Glu     | GGG  Gly     |
+--------------+--------------+--------------+--------------+


© 2007, by Kurt Stüber.