IntroToPerl - University of Oxford

Introduction to Perl
Thaddeus Aid
IT Learning Programme
University of Oxford
About the course
• In these three sessions we will cover:
Basic Programming Concepts
Perl Syntax
Data input, manipulation, and output
Provide a foundation to learn more Perl or other programming languages
Provide a starting point for your own programs
• This class is designed to provide a supported learning environment
• Go at your own pace
What the course is not
• A complete review of Perl
• An in-depth introduction to programming
• A “taught” course
What is Perl?
• High-level. Strong abstraction from the details of the computer, some
natural language elements, easy-to-use, etc.
• General-purpose. People have done almost everything imaginable
with a computer using Perl!
• Interpreted. Perl translates source code into an efficient intermediate
representation which is then immediately executed.
• Dynamic. Perl executes at runtime many behaviours that other, lower-level languages might perform during compilation.
• Free Software: Perl is available for usage without charge.
A brief history of Perl
• Before Perl:
• C/C++/Fortran/COBOL: Compiled languages that were platform specific
• Shell Scripts: Basic automation of Operating System tasks
• Utilities (AWK/SED/GREP): Used to process text files
• There was a need by systems administrators to simplify things
• Perl was born in 1987
• Perl 5 (the current version) was released in 1994
What is Perl used for?
• Text processing. Designed as a Unix-based system for report
processing (systems administration).
• Web. Perl was one of the few programming languages at the time
suitable for processing the highly textual content of the web.
• Science. “Genomics” meant we needed a tool to process data
involving DNA sequences.
• Many enthusiasts have created extensions for Perl that allow its use
in almost any domain but check that it really is the right language for
Executing a Perl program
• What do you need?
• Source Code – a text file containing the instructions for the computer
• The Perl Interpreter – perl (available at
• (Optional) Data Files
• To execute type
“.pl” is the standard extension for a Perl program like “.doc” or “.exe”
• No executable program will be generated
Alternatives to Perl
• R/MATLAB/Octave/SciPy: Mathematical Computing. In Perl v5, this is
currently done via an external module (PDL) but some basic
functionality will be directly incorporated into v6.
• Python. Very powerful and popular alternative to Perl.
• Compiled languages. Speed and maximum performance.
Session Plan
• Session 1
Hello World! – Your first Perl program
Variables (Part 1) – Scalars and Arrays (1d and 2d)
Conditional Statements (if/else)
Loops (foreach/while)
• Session 2
• File handling
• Regular Expressions (RegEx)
• Variables (Part 2) – Hashes
• Session 3
• Functions
Hello World!
• Comments – “#” Notes to the programmer from the programmer
• Often you will need to remind yourself of what is happening, or you will need to explain something to a future programmer
• #!/usr/bin/perl – A special message to the operating system
• Pragma – “use” Sets environmental conditions for the program
• Functions – “something()” Reusable sections of code
• Sometimes the () is omitted for special functions such as “print”
• End of line – “;” The semicolon marks the end of a statement
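The pieces listed above can be seen together in a minimal sketch of a first program. The file name hello.pl is only an example and is not taken from the course booklet.

```perl
#!/usr/bin/perl
# hello.pl - a first Perl program (the file name is only an example)

use strict;      # pragma: insist that variables are declared
use warnings;    # pragma: report likely mistakes

# print is a built-in function, so the parentheses can be left out
print "Hello World!\n";    # the semicolon ends the statement
```

Saved under that example name, the program would typically be run from a command prompt with `perl hello.pl`.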
Escape Characters
• Some text mark up requires special codes
\n – New Line
\t – Tab
\\ – Normal Backslash
\" – Sometimes needed to include a double quote inside a double-quoted string
\' – Sometimes needed to include a single quote inside a single-quoted string
• More information at:
use strict;
Variables (Part 1) – Scalars
• Numbers
• Integers
• Floating Point Numbers
• Strings
• Chunks of text
"this is a string"
"this is another string"
"This string has a number 182939"
"Thad said \"Hi Everyone!\""
Scalars (cont.)
• To create a Scalar variable use “$”
• $variable = "something";
• To print a Scalar use the print function
• print $variable;
• print "$variable\n";
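As a small sketch of the points above, the following creates numeric and string scalars and prints them. The variable names and values are invented for illustration.

```perl
use strict;
use warnings;

my $count   = 42;                              # an integer
my $price   = 3.75;                            # a floating point number
my $message = "Thad said \"Hi Everyone!\"";    # a string with escaped double quotes

print $count;                # prints 42 with no newline after it
print "\n";
print "$message\n";          # the variable is replaced by its value
print "Price:\t$price\n";    # \t inserts a tab
```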
Numeric Operations
Perl can perform all the usual arithmetic operations on numeric scalars
“%” – Modulo can be thought of as the remainder after a division
“**” – Is shorthand for exponentiation
sin() and cos() are built in; further mathematics such as tan() is available through modules such as Math::Trig
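A short sketch of these operators, using invented values. The tan() function and the pi constant come from the Math::Trig module, which ships with Perl.

```perl
use strict;
use warnings;
use Math::Trig;    # provides tan() and pi

my $sum        = 7 + 3;     # 10
my $difference = 7 - 3;     # 4
my $product    = 7 * 3;     # 21
my $quotient   = 7 / 2;     # 3.5
my $remainder  = 7 % 3;     # 1, the remainder after dividing 7 by 3
my $power      = 2 ** 10;   # 1024, two to the power of ten

my $sine    = sin(0);       # 0, sin() is built in
my $tangent = tan(pi / 4);  # very close to 1

print "7 % 3 = $remainder, 2 ** 10 = $power\n";
```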
String Operations
Perl is a very powerful tool for the manipulation of strings
“.” concatenates strings together
Strings surrounded by double quotes automatically replace scalar names in the string with their values
Strings in single quotes are literal: variables and escape codes such as \n are not replaced
There are a number of functions that will modify your string
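The sketch below illustrates concatenation, the difference between double and single quotes, and a few built-in string functions (uc, length and substr). The strings are invented for illustration.

```perl
use strict;
use warnings;

my $first  = "Hello";
my $second = "World";

my $joined = $first . " " . $second;    # "Hello World"
my $double = "Value: $first";           # "Value: Hello" (variable replaced)
my $single = 'Value: $first';           # 'Value: $first' (taken literally)

my $upper  = uc($joined);               # "HELLO WORLD"
my $length = length($joined);           # 11
my $piece  = substr($joined, 0, 5);     # "Hello": 5 characters from position 0

print "$joined\n$double\n$single\n$upper\n";
```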
User Input
• Scalars can be used to store information given to the program after execution has started. This information can come from the user or from a file. For now we will focus on user input.
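A minimal sketch of reading user input, assuming the program is run from a command prompt. The prompt text and the fallback name "stranger" are invented for illustration.

```perl
use strict;
use warnings;

print "What is your name? ";
my $name = <STDIN>;                          # read one line typed by the user
$name = "stranger" unless defined $name;     # nothing could be read at all
chomp $name;                                 # remove the newline from the end
print "Hello, $name!\n";
```

The chomp function is used here because the line read from the keyboard includes the newline produced by the Enter key.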
Arrays
• Arrays are ordered sets of scalar variables
• Arrays can mix numbers and strings in the set
• Arrays use the “@” symbol when referencing them
Arrays (Cont.)
Starting with an array of 5 scalars
“shift” – takes off the first scalar
“pop” – takes off the last scalar
“push” adds a new scalar at the end of the array
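The three functions can be followed step by step in a sketch like this one, which starts with an array of 5 scalars. The colour names are invented for illustration.

```perl
use strict;
use warnings;

my @colours = ("red", "orange", "yellow", "green", "blue");

my $first = shift @colours;    # removes "red" from the front, 4 items remain
my $last  = pop @colours;      # removes "blue" from the end, 3 items remain
push @colours, "violet";       # adds "violet" at the end, 4 items again

print "Removed $first and $last\n";
print "Array is now: @colours\n";              # orange yellow green violet
print "First item: $colours[0]\n";             # single items use $ and an index
print "Number of items: ", scalar(@colours), "\n";
```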
2D Arrays
Members of an array can be an array
This is different to a 2D array in a language like C
These are known as “jagged arrays”
You can have higher dimensionality if needed
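A sketch of an array whose members are arrays, with rows of different lengths to show why the term "jagged" is used. The numbers are invented, and the square brackets create the inner arrays.

```perl
use strict;
use warnings;

# Each inner [ ... ] is an array in its own right, and rows may differ in length
my @grid = (
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
);

my $value             = $grid[2][1];             # row 2, column 1: the value 7
my $row_count         = scalar @grid;            # 3 rows
my $second_row_length = scalar @{ $grid[1] };    # 2 items in the second row

foreach my $row (@grid) {
    print "@{$row}\n";    # print every item in this row
}
```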
Conditionals
• Programs often need to make decisions during execution
• This is handled by the ideas of if/else
• if statement is true -> do something
• else -> do something different
• This also introduces the idea of a Boolean true/false test
if ($name eq "Thad") – this may or may not be true
if (9 > 0) – this is always true
if (9 < 0) – this is never true
if ($a + $b == 100) – perform the calculation then test against condition
Conditionals (Cont.)
• Conditionals control the direction that the program follows
• It is possible to combine more than one test in a single if, using && (and) or || (or)
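A minimal sketch of if/elsif/else with combined tests. The variable names and values are invented. Note that strings are compared with eq and numbers with ==.

```perl
use strict;
use warnings;

my $name = "Thad";
my $x    = 60;
my $y    = 40;
my $outcome;

if ($name eq "Thad" && $x + $y == 100) {
    $outcome = "the name matches and the total is 100";    # both tests are true
}
elsif ($name eq "Thad" || $x + $y == 100) {
    $outcome = "only one of the tests is true";
}
else {
    $outcome = "neither test is true";
}

print "$outcome\n";
```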
• Loops are used to repeat a section of code an arbitrary number of
times. For example: I want to check each employee record in a
company to see if the employee’s salary is > £50,000
• The classic example is the “for” loop, which starts at a number, is
changed in some way at the end of each loop and continues while a
conditional statement is true.
• The “for” loop was improved into the “foreach” loop, which will step
through an array and offer each scalar as a variable for testing.
• The final loop that we will encounter in this course is the “while”
loop. The while loop uses a conditional statement to determine if the
loop should execute.
For Loops
• The “for” loop requires three parts to execute
• The start condition $i = 0
• The conditional statement $i < 20
• The step statement $i += 1
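The three parts can be seen in place in a short sketch that adds up the numbers 0 to 19. The variable $total is invented for illustration.

```perl
use strict;
use warnings;

my $total = 0;

# start at 0; keep going while $i is below 20; add 1 after each pass
for (my $i = 0; $i < 20; $i += 1) {
    $total += $i;
}

print "The sum of 0 to 19 is $total\n";    # 190
```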
Foreach Loops
• Given a range or an array of items, the “foreach” loop will step
through each item and offer a special scalar $_ containing the current item
Foreach (Cont.)
• As an alternative to $_ you can name the loop variable to reduce confusion.
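Both styles of foreach loop are sketched below, first with $_ and then with a named variable, over an array and over a range. The names and numbers are invented for illustration.

```perl
use strict;
use warnings;

my @names = ("Ann", "Bob", "Cat");

# Using the special scalar $_
foreach (@names) {
    print "Hello $_\n";
}

# Using a named scalar instead
my @lengths;
foreach my $name (@names) {
    push @lengths, length($name);    # record how long each name is
}

# Stepping through a range of numbers
my $sum = 0;
foreach my $number (1 .. 5) {
    $sum += $number;
}
print "1 + 2 + 3 + 4 + 5 = $sum\n";    # 15
```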
While Loops
• The “while” loop checks a conditional statement before executing its
code. Once the end of the block has been reached, it returns to the
top of the code block and checks the conditional statement again.
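A small sketch of a while loop that keeps doubling a number until it reaches 100 or more. The variables are invented for illustration.

```perl
use strict;
use warnings;

my $value = 1;
my $steps = 0;

# The test is checked before every pass through the block
while ($value < 100) {
    $value *= 2;    # double the value
    $steps += 1;    # count how many times the block has run
}

print "Reached $value after $steps doublings\n";    # 128 after 7
```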
Additional Help
• When in doubt use Google or another search engine. There are millions
of code examples, questions, and solutions on the web and there is
no reason not to use them (with attribution if appropriate)
• If you have a question that you can’t solve with Google then try
asking the community at Stack Overflow
• Or find a Perl community on the web; there are a great many places
that a new programmer can go for help
Practical Session
• Please make your way to the computers. You will need to set up your
keyboards, and the instructions are given in your course booklet.
• Please feel free to ask questions
• Go at your own pace
• If you don’t finish in class: Perl is available for free at and is a simple install to get on to Windows
• Linux and Mac should already have Perl included
• Please feel free to email me questions during the week:
[email protected]
Introduction to Perl
Part 2
• Any questions from last week?
• We have covered
• Hello World!
• Variables
• $scalars
• @arrays
• Conditional Statements
• if
• elsif
• else
• Loops
• for
• foreach
• while
This Session
• Basic File Handling
• File Reading (open, <)
• File Writing (open, >)
• Regular Expressions
• Text searching and manipulation
• Variables (Part 2)
• %hashes
Reading a File
• We will only be dealing with simple text file handling.
• We will be using the open function and a few more in the book.
• The redirection symbols are a legacy of the Unix command line
• < redirect input from a file
• > redirect output to a file
• >> redirect output to a file in append mode
• open(FILEHANDLE, "Redirection Symbol", "Filename");
• open(INPUT, "<", "somedata.txt");
Reading a File (Cont.)
• Reading the entire file into an array for processing.
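A minimal sketch of reading a whole file into an array. So that it can run on its own, it first creates a small temporary file with the core File::Temp module. The file contents are invented, and in your own program you would open an existing file instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file so that the example is self-contained
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "first line\nsecond line\nthird line\n";
close($tmp);

open(INPUT, "<", $filename) or die "Cannot open $filename: $!";
my @lines = <INPUT>;    # in list context this reads every line of the file
close(INPUT);

chomp(@lines);          # remove the newline from the end of every line

print "Read ", scalar(@lines), " lines\n";
print "Last line: $lines[-1]\n";
```

Reading everything at once is convenient for small files, but the whole file has to fit in memory.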
Reading a File (Cont.)
• Reading a file one line at a time.
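A sketch of the line-at-a-time approach, again using a temporary sample file (created with File::Temp) with invented contents. Only one line is held in memory at any moment.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file so that the example is self-contained
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "short\na much longer line\nmedium line\n";
close($tmp);

open(INPUT, "<", $filename) or die "Cannot open $filename: $!";

my $line_count = 0;
my $longest    = "";
while (my $line = <INPUT>) {    # reads one line per pass, stops at end of file
    chomp $line;
    $line_count += 1;
    if (length($line) > length($longest)) {
        $longest = $line;       # remember the longest line seen so far
    }
}
close(INPUT);

print "Lines: $line_count, longest: $longest\n";
```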
Reading a File (Cont.)
• A more advanced example
• Skipping header lines
• Changing carriage return for new line
• Splitting a line of input
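One way the three steps above might be combined is sketched here. It assumes a tab-separated file with a single header line and Windows-style line endings; the file layout and column names are invented for illustration, and the carriage returns are simply stripped along with the newlines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample data: one header line, tab-separated columns, Windows line endings
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "name\tage\r\nAnn\t34\r\nBob\t27\r\n";
close($tmp);

open(INPUT, "<", $filename) or die "Cannot open $filename: $!";

my $header = <INPUT>;    # read the header line and do nothing with it

my @names;
my $total_age = 0;
while (my $line = <INPUT>) {
    $line =~ s/\r?\n\z//;                      # remove a Windows or Unix line ending
    my ($name, $age) = split(/\t/, $line);     # split the line at each tab
    push @names, $name;
    $total_age += $age;
}
close(INPUT);

print "Names: @names, total age: $total_age\n";
```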
Bringing it all together
• As a recommendation, never write to the same file that you are reading from
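A sketch of reading from one file and writing the results to a different file. Both files are temporary files created with File::Temp, and the contents are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# An input file with sample data, and a separate, empty output file
my ($in_tmp, $in_name) = tempfile(UNLINK => 1);
print $in_tmp "apple\nbanana\n";
close($in_tmp);
my ($out_tmp, $out_name) = tempfile(UNLINK => 1);
close($out_tmp);

open(INPUT,  "<", $in_name)  or die "Cannot read $in_name: $!";
open(OUTPUT, ">", $out_name) or die "Cannot write $out_name: $!";

while (my $line = <INPUT>) {
    chomp $line;
    print OUTPUT uc($line), "\n";    # write an upper-case copy of each line
}

close(INPUT);
close(OUTPUT);
```

Because the output goes to its own file, the original data is still intact if something goes wrong part of the way through.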
Regular Expressions
• Regular Expressions are a very powerful way to manipulate text files.
• Match – Find a substring in your string
• Translate – Replace one set of letters with another
• Substitute – A more powerful replacement command
Matching (Cont.)
• A simple example of matching
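A minimal sketch of matching with the =~ operator, using an invented DNA string. The !~ operator is the opposite test: it is true when the pattern is not found.

```perl
use strict;
use warnings;

my $sequence = "ATGCGTTAGGCTA";
my $found    = 0;
my $absent   = 0;

if ($sequence =~ /TAG/) {     # true if "TAG" appears anywhere in the string
    $found = 1;
    print "Found TAG\n";
}

if ($sequence !~ /AAAA/) {    # true if "AAAA" does not appear
    $absent = 1;
    print "No run of four As\n";
}
```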
Matching (Cont.)
• A more complex matching example
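A sketch of a more involved match that uses anchors, character classes and capture groups. The record format is invented for illustration. Each pair of parentheses captures part of the match into $1, $2 and so on.

```perl
use strict;
use warnings;

my $record = "sample_042 chr7:117559590 A/G";
my ($sample, $chromosome, $position, $ref, $alt);

# ^ and $ anchor the pattern to the start and end of the string
# \w+ is one or more word characters, \d+ one or more digits, \s+ whitespace
if ($record =~ /^(\w+)\s+chr(\d+):(\d+)\s+([ACGT])\/([ACGT])$/) {
    ($sample, $chromosome, $position, $ref, $alt) = ($1, $2, $3, $4, $5);
    print "$sample: chromosome $chromosome, position $position, $ref to $alt\n";
}
```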
• A simple substitution example
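A short sketch of substitution with s///, together with the translate operator tr///. The strings are invented for illustration.

```perl
use strict;
use warnings;

my $once = "The cat sat on the cat mat";
$once =~ s/cat/dog/;      # replaces the first match only

my $all = "The cat sat on the cat mat";
$all =~ s/cat/dog/g;      # the g option replaces every match

my $dna = "ATGC";
$dna =~ tr/ATGC/TACG/;    # translate: A becomes T, T becomes A, G becomes C, C becomes G

print "$once\n$all\n$dna\n";
```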
Variables: Part 2 - Hashes
• Hashes (known as associative arrays or maps in other languages) are
the third basic data structure in Perl.
• We store data in a hash using a key and value pair.
• The key acts as an index into the hash where a value is stored.
• In Perl, hashes are specified using the % symbol.
Hashes (Cont.)
• Hashes are like an array but use a “key” instead of an “index”.
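A minimal sketch of creating a hash, adding a pair and looking up a value by its key. The names and ages are invented for illustration.

```perl
use strict;
use warnings;

# key => value pairs; the whole hash uses the % symbol
my %ages = ("Ann" => 34, "Bob" => 27);

$ages{"Cat"} = 41;                 # add a new key and value
my $bob_age = $ages{"Bob"};        # look up a single value with $ and {key}
my $size    = scalar(keys %ages);  # number of keys in the hash

print "Bob is $bob_age and there are $size people\n";
```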
Hashes (Cont.)
• An example of a hash table.
• Getting all the keys can be done with
the “keys” function.
• You can loop through the keys array
like any other array.
• Hash keys are not sorted, because of
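A sketch of fetching the keys and looping through them, using an invented hash of capital cities. The order in which the keys are printed is not guaranteed.

```perl
use strict;
use warnings;

my %capitals = (
    France => "Paris",
    Italy  => "Rome",
    Spain  => "Madrid",
);

my @countries = keys %capitals;    # all the keys, in no particular order

foreach my $country (@countries) {
    print "$country: $capitals{$country}\n";
}
```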
Hashes (Cont.)
• “exists” checks to see if an entry is set.
• “sort” will sort an array
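A sketch of exists and sort applied to an invented stock hash. Note that exists tests whether the key is present, so it is true even when the stored value is 0.

```perl
use strict;
use warnings;

my %stock = (apples => 12, pears => 0, plums => 5);
my @messages;

if (exists $stock{pears}) {      # the key is there, although its value is 0
    push @messages, "pears is a key";
}
if (!exists $stock{kiwis}) {     # this key was never added
    push @messages, "kiwis is not a key";
}

my @ordered = sort keys %stock;  # the keys in alphabetical order

foreach my $fruit (@ordered) {
    print "$fruit: $stock{$fruit}\n";
}
print "$_\n" foreach @messages;
```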
A Real World Example
• This is a program that I wrote to translate 22 files containing phylogenetic information about humans from one version of the genome to another version of the genome (hg18 -> hg19)
• Please do the exercises in Chapters 3-5.
• Go at your own pace.
• Please ask questions if you get stuck.
Introduction to Perl
Part 3
• Any questions from last week?
• We have covered
Hello World!
Conditional Statements
File Handling
• Read
• Write
• Regular Expressions
• Variables
• Hashes
This Session
• Subroutines
• Discussions on the projects that you are going to work on
• sub
• A subroutine (function or method in other languages) is a self-contained block
of code that can be called from different parts of the program.
• A subroutine may take one or more variables as input
• A subroutine may return one or more variables as output
• You have already been using built-in subroutines such as print and
• To use a subroutine you type the name and add parentheses after
• Subroutines live in code blocks
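A minimal sketch of defining a subroutine with the sub keyword and calling it from two places. The name greeting is invented for illustration.

```perl
use strict;
use warnings;

# The code block between { and } is the body of the subroutine
sub greeting {
    return "Hello from a subroutine!";
}

my $text = greeting();      # call it by name, followed by parentheses
print "$text\n";
print greeting(), "\n";     # it can be called again from anywhere in the program
```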
• We have been using my as a keyword without knowing what it
means. When we use the keyword my as the declaration of our
variable we are declaring that it exists in the current scope and
cannot be accessed outside of that scope.
• A variable can only be seen within the scope it was created in and in
any scopes created within that scope
• Scope is created by using {} and placing code between them
• This is called a code block
• As a suggestion in each new code block indent your code an additional tab
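A small sketch of scope using a bare code block. The variable names are invented for illustration.

```perl
use strict;
use warnings;

my $outer = "outer";
my $seen_inside;

{
    # A new scope: $inner exists only between these braces
    my $inner = "inner";
    $seen_inside = "$outer and $inner";    # the outer variable is visible in here
}

# $inner is no longer accessible here. With "use strict" in force,
# removing the # from the next line stops the program from compiling.
# print $inner;

print "Inside the block we could see: $seen_inside\n";
```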
• A variable can be seen within the scope in which it is created and in its sub-scopes
• The green scope can see $test1
• The green scope cannot see $test4
• Variables cannot be seen in higher scopes
• The blue scope cannot see $test2 or 3 or 4
• Variables only exist for as long as their scope does
• Once you exit your scope your variables are no longer accessible
File Scope
• Where you declare
your variables is very
important, in this
code the variable $x
can be seen
anywhere in the
Local Scope
• Again, where you declare your
variable is very important. In
this example the declaration of
$x cannot be seen in
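The difference between file scope and local scope can be sketched as follows. This is an illustration with invented subroutine names, not the code from the original slides.

```perl
use strict;
use warnings;

my $x = 10;    # declared at the top level of the file: file scope

sub read_x {
    return $x;          # a subroutine in the same file can see $x
}

sub local_example {
    my $y = 5;          # declared inside the block: local scope
    return $x + $y;
}

my $from_sub = read_x();         # 10
my $total    = local_example();  # 15

# $y cannot be seen here, because it only exists inside local_example
print "x seen from the subroutine: $from_sub, total: $total\n";
```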
• Subroutines can take information into themselves and return sets of
• Parameters can be passed as scalars or as references
• Passing an array normally causes it to be treated as a series of scalars
• The alternative way is to pass the address of the array and then you can “dereference”
the address to access the elements
• You should pass a hash by reference and then dereference the address
• You access the passed parameters by the special array @_
• You output variables using the return function
Passing Parameters
• In this code we are passing the
subroutine a number and
returning the answer.
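A minimal sketch of passing one number to a subroutine and returning an answer. The subroutine name square is invented for illustration.

```perl
use strict;
use warnings;

sub square {
    my ($number) = @_;            # take the first parameter from @_
    return $number * $number;     # send the answer back to the caller
}

my $answer = square(7);
print "7 squared is $answer\n";    # 49
```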
Passing Parameters
• Here we pass two scalar variables. They are joined into an array and are accessed like a normal array would be.
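A sketch of passing two scalars, which arrive in the special array @_ and can be read by index. The subroutine name add_two is invented for illustration.

```perl
use strict;
use warnings;

sub add_two {
    my $first  = $_[0];    # first item of the special array @_
    my $second = $_[1];    # second item of @_
    return $first + $second;
}

my $sum = add_two(19, 23);
print "19 + 23 = $sum\n";    # 42
```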
Using References
• A reference is created using the \ character, such as \@x
• The array is then accessed by dereferencing the reference, such as
• To return an array
send it as a
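A sketch of passing an array by reference and returning a new array in the same way. The subroutine name double_all is invented, and @{ ... } is one common way to dereference an array reference.

```perl
use strict;
use warnings;

sub double_all {
    my ($numbers_ref) = @_;              # the parameter is a reference to an array
    my @doubled;
    foreach my $n (@{$numbers_ref}) {    # dereference to reach the items
        push @doubled, $n * 2;
    }
    return \@doubled;                    # return a reference to the new array
}

my @x = (1, 2, 3);
my $result_ref = double_all(\@x);        # \@x creates a reference to @x
my @result     = @{$result_ref};         # dereference to get an ordinary array

print "@result\n";    # 2 4 6
```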
References Illustrated
• How you pass your information determines how you access it
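The contrast can be sketched by passing the same two arrays in both ways, and by passing a hash by reference. All names here are invented for illustration.

```perl
use strict;
use warnings;

# Passed directly, the two arrays arrive as one flat series of scalars
sub count_flat {
    my @items = @_;
    return scalar @items;
}

# Passed by reference, the two arrays stay separate
sub count_by_ref {
    my ($first_ref, $second_ref) = @_;
    return (scalar @{$first_ref}, scalar @{$second_ref});
}

# A hash passed by reference; -> reaches a value through the reference
sub lookup {
    my ($hash_ref, $key) = @_;
    return $hash_ref->{$key};
}

my @first  = (1, 2);
my @second = (3, 4, 5);
my %ages   = (Ann => 34);

my $flat_count = count_flat(@first, @second);         # 5
my ($n1, $n2)  = count_by_ref(\@first, \@second);     # 2 and 3
my $age        = lookup(\%ages, "Ann");               # 34

print "Flat: $flat_count, by reference: $n1 and $n2, Ann is $age\n";
```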
• Any questions?
What next?
• There are a number of more advanced tutorials online to learn more Perl.
You can start at and look through the available
resources in the Learning Perl 5 section
• CPAN hosts over 25,000 code libraries to add new functions to your
• You can look into a course on algorithms; learning how to use and
implement algorithms will make your code better
• Join Stack Overflow which is a fantastic place to find answers to difficult
programming problems
• Always remember that the answers to your problems are probably only a
Google search away
• Please complete the exercises in Chapter 6
• Go at your own pace
• Please feel free to ask me questions about the exercises or about your
future projects
• Thank you for attending this class, I really enjoy teaching it
