
Introduction to PERL Scripting Basics

The document provides an introduction to the Perl programming language. It outlines what Perl is, why it would be used instead of other languages, its features like variables and flow control, how to run Perl scripts, and common Unix scripting tasks that can be accomplished in Perl like filtering files. The document contains examples of Perl syntax, variables, operators, functions, and subroutines.

Uploaded by

Gobara Dhan
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to PERL: Scripting for UNIX made simple and portable

Yuk Sham
MSI Consultant Phone: (612) 626 0802 (help) Email: help@[Link]

Outline

What is PERL?
Why would I use PERL instead of something else?
PERL features
How to run PERL scripts
PERL syntax, variables, quotes
Flow control constructs
Subroutines
Typical UNIX scripting tasks
  Filter a file or a group of files
  Searching/Matching
  Naming file sequences
  Executing applications
  Parsing files
More information

What is PERL?

Practical Extraction and Report Language
Written by Larry Wall
Combines capabilities of Bourne shell, csh, awk, sed, grep, sort and C
Assists with common tasks that are too heavy or too portability-sensitive for the shell, and yet too weird or too complicated to code in C or another programming language
File or list processing: matching, extraction, formatting (text reports, HTML, mail, etc.)

Why would I use PERL instead of something else?

Interpreted language
Commonly used for CGI programs
Very flexible
Very automatic
Can be very simple for a variety of tasks
WIDELY available
HIGHLY portable

PERL features

C-style flow control (similar)
Dynamic allocation
Automatic allocation
Numbers
Lists
Strings
Arrays
Associative arrays (hashes)

PERL features

Very large set of publicly available libraries for a wide range of applications
Math functions (trig, complex)
Automatic conversions as needed
Pattern matching
Standard I/O
Process control
System calls
Can be object oriented

How to run PERL scripts

% cat [Link]
print "Hello world from PERL.\n";
% perl [Link]
Hello world from PERL.

How to run PERL scripts

OR

% which perl
/usr/bin/perl
% cat [Link]
#!/usr/bin/perl
print "Hello world from PERL.\n";
% chmod a+rx [Link]
% [Link]
Hello world from PERL.

(The .pl suffix is just a convention; it has no special meaning to perl.)
/usr/local/bin/perl is another place perl might be linked at the Institute.

PERL syntax

Free form: whitespace and newlines are ignored, except as delimiters
PERL statements may be continued across line boundaries
All PERL statements end with a ; (semicolon)
Comments begin with the # (pound sign) and end at a newline
  no continuation
  may be anywhere, not just at the beginning of a line
Comments may be embedded in a statement
  see previous item

Hello world

Example 1:
#!/usr/bin/perl
# This is how perl says hello
print "Hello world from PERL.\n";          # It says hello once
print "Hello world again from PERL.\n";    # It says hello twice

Example 2:
#!/usr/bin/perl
print"Hello world from PERL.\n";print"Hello world again from PERL.\n";

Example 3:
#!/usr/bin/perl
print "Hello world from PERL.\n";
print "Hello world again from PERL.\n";

All three examples print:
Hello world from PERL.
Hello world again from PERL.

PERL variables

Number or string
  $count
Array
  List of numbers and/or strings
  Indexed by number starting at zero
  @an_array
Associative array or hash
  List of numbers and/or strings
  Indexed by anything
  %a_hash
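The three variable kinds above can be tried side by side. The following is only a minimal sketch: the variable names follow the slide, while the sample values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample values are hypothetical; only the variable kinds come from the slide.
my $count    = 3;                                # number or string
my @an_array = ("fred", 27, "wilma");            # list, indexed from zero
my %a_hash   = (dad => "fred", mom => "wilma");  # list, indexed by string keys

my $first = $an_array[0];        # a single array element is written with $
my $size  = scalar(@an_array);   # number of elements in the array
my $mom   = $a_hash{mom};        # a single hash value is also written with $

print "count=$count first=$first size=$size mom=$mom\n";
```

Note that the leading character ($, @ or %) describes what the expression yields, which is why one element of @an_array is written $an_array[0].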

Strings and arrays

$x = 27;
$y = 35;
$name = "john";
@a = ($x,$y,$name);
print "x = $x and y = $y\n";
print "The array is @a \n";

x = 27 and y = 35
The array is 27 35 john

@a = ("fred","barney","betty","wilma");
print "The names are @a \n";
print "The first name is $a[0] \n";
print "The last name is $a[3] \n";

The names are fred barney betty wilma
The first name is fred
The last name is wilma

Associative arrays

$a{dad} = "fred";
$a{mom} = "wilma";
$a{child} = "pebble";
print "The mom is $a{mom} \n";

The mom is wilma

@keys = keys(%a);
@values = values(%a);
print "The keys are @keys \n";
print "The values are @values \n";

The keys are mom dad child
The values are wilma fred pebble

(The order of keys and values in a hash is not guaranteed; it may differ from run to run.)
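Because hash order is not fixed, a report that must look the same on every run usually sorts the keys first. A minimal sketch, assuming the same family data as the slide (the names %family, @pairs and $report are chosen here for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %family = (dad => "fred", mom => "wilma", child => "pebble");

# Walk the keys in alphabetical order so the output is repeatable.
my @pairs;
foreach my $role (sort keys %family) {
    push(@pairs, "$role=$family{$role}");
}

my $report = join(" ", @pairs);
print "$report\n";    # child=pebble dad=fred mom=wilma
```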

Operators and functions

Increase or decrease an existing value by 1 (++, --)
Modify an existing value by +, -, * or / with an assigned value (+=, -=, *=, /=)

Example 1
$a = 1;
$b = "a";
++$a;
++$b;
print "$a $b \n";

2 b

Example 2
$a = $b = $c = 1;
++$b;
$c *= 3;
print "$a $b $c\n";

1 2 3

Operators and functions

Numeric logical operators
  ==, !=, <, >, <=, >=

String logical operators
  eq, ne, lt, gt, le, ge
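The two operator families can give different answers for the same operands, because one compares values as numbers and the other compares them character by character. A small sketch with made-up operands:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric comparison: 10 is not less than 9.
my $numeric_less = (10 < 9)       ? "true" : "false";
# String comparison: "10" sorts before "9" because "1" comes before "9".
my $string_less  = ("10" lt "9")  ? "true" : "false";

# "1.0" and "1" are the same number but different strings.
my $numeric_same = ("1.0" == 1)   ? "true" : "false";
my $string_same  = ("1.0" eq "1") ? "true" : "false";

print "10 <  9      : $numeric_less\n";
print "'10' lt '9'  : $string_less\n";
print "'1.0' == 1   : $numeric_same\n";
print "'1.0' eq '1' : $string_same\n";
```

Picking the wrong family is a common source of silent bugs, so it helps to decide up front whether a value is being treated as a number or as text.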

Operators and functions

Add and remove elements from an existing array (push, pop, unshift, shift)
Rearrange arrays (reverse, sort)

@a = qw(one two three four five six);
print "@a\n";
one two three four five six

unshift(@a,"zero");      # add elements to the array from the left side
print "@a\n";
zero one two three four five six

shift(@a);               # removes elements from the array from the left side
print "@a\n";
one two three four five six

@a = reverse(@a);        # reverse the order of the array
print "@a\n";
six five four three two one

@a = sort(@a);           # sort the array in alphabetical order
print "@a\n";
five four one six three two

Operators and functions

Removes the last character from a string (chop)
Removes the newline character, \n, from the end of a string (chomp)
Breaks a string into fields on a regular expression (split) and joins the pieces back (join)

$a = "this is my expression\n";
print "$a";
this is my expression

chomp($a);
print "$a . ";
@a = split(/ /,$a);      # splits $a string into an array called @a
print "$a[3] $a[2] $a[1] $a[0]\n";
this is my expression . expression my is this

$a = join(":",@a);       # create a string called $a by joining all the elements
print "$a \n";           # in the array @a and having : placed between them
this:is:my:expression
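split and join are often used as a pair: split a line into fields, rearrange or filter them, then join them back. A minimal sketch using the same sentence as the slide (the variable names $text, @words, $backwards and $joined are chosen here for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "this is my expression\n";
chomp($text);                               # drop the trailing newline

my @words     = split(/ /, $text);          # four fields
my $backwards = join(" ", reverse(@words)); # fields in reverse order
my $joined    = join(":", @words);          # fields separated by colons

print "$backwards\n";    # expression my is this
print "$joined\n";       # this:is:my:expression
```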

Operators and functions

Substituting a pattern (=~ s/.../.../)
Transliteration (=~ tr/.../.../)

$_ = "this is my expression\n";
print "$_\n";
this is my expression

$_ =~ s/my/your/;
print "$_\n";
this is your expression

$_ =~ tr/a-z/A-Z/;
print "$_\n";
THIS IS YOUR EXPRESSION

Flow control constructs

Control_operator (expression(s)) {
    statement_block;
}

Example:
if ( $i < $N ) {
    statement_block;
} else {
    statement_block;
}

foreach $i ( @list_of_items ) {
    statement_block;
}
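The skeletons above become runnable once the statement blocks are filled in. The following sketch combines foreach with if/else; the data and the names @small and @large are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list_of_items = (1, 2, 3, 4, 5, 6);   # hypothetical data
my $N = 4;                                # hypothetical threshold

my (@small, @large);
foreach my $i (@list_of_items) {
    if ($i < $N) {
        push(@small, $i);    # taken when the condition is true
    } else {
        push(@large, $i);    # taken otherwise
    }
}

print "small: @small\n";     # small: 1 2 3
print "large: @large\n";     # large: 4 5 6
```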

Subroutines

@a = qw(1 2 3 4);             # assigns an array @a
print summation(@a),"\n";     # prints results of subroutine summation
                              # using @a as input
sub summation {
    my $k = 0;
    foreach $i (@_) {         # summing every element in the array @a
        $k += $i;
    }
    return($k);               # and return the value as $k
}

10

Command-line arguments

#!/usr/bin/perl
print "Command name: $0\n";
print "Number of arguments: ", scalar(@ARGV), "\n";   # $#ARGV is the index of the last argument, one less than the count
for ($i=0; $i <= $#ARGV; $i++) {
    print "Arg $i is $ARGV[$i]\n";
}

% ./[Link] zero one two three
Command name: ./[Link]
Number of arguments: 4
Arg 0 is zero
Arg 1 is one
Arg 2 is two
Arg 3 is three
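The difference between the last index and the element count is easy to check on any array, not only @ARGV. A minimal sketch (the helper name arg_summary is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report both the last index and the element count of a list.
sub arg_summary {
    my @args       = @_;
    my $last_index = $#args;          # index of the last element, -1 if empty
    my $how_many   = scalar(@args);   # number of elements
    return "last index $last_index, count $how_many";
}

print arg_summary(qw(zero one two three)), "\n";   # last index 3, count 4
print arg_summary(), "\n";                         # last index -1, count 0
```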

Concatenating Strings with the . operator

$firstname = "George";
$midname = "walker";
$lastname = "Bush";
$fullname = $lastname . ", " . $firstname . " " . uc(substr($midname, 0, 1)) . ".\n";
print $fullname;

Bush, George W.

UNIX Environment Variables

print "your username is $ENV{USER} and \n";
print "your machine name is $ENV{HOST} and \n";
print "your display is set to $ENV{DISPLAY} and \n";
print "your shell is $ENV{SHELL} and \n";
print "your timezone is $ENV{TZ} etcetera.\n";

your username is shamy and
your machine name is [Link] and
your display is set to localhost:10.0 and
your shell is /bin/tcsh and
your timezone is CST6CDT etcetera.

Typical UNIX scripting tasks

Filter a file or a group of files
Searching/Matching
Naming file sequences
Executing applications
Parsing files

Filtering standard input

#!/usr/bin/perl
while( <> ) {                   # read from stdin one line at a time
    print "line $. : $_" ;      # print current line to stdout
}

[Link]
Silicon Graphics' Info Search lets you find all the information
available on a topic using a keyword search. Info Search looks
begin
through all the release notes, man pages, and/online books you
done
have installed on your system or on a networked server. From
the Toolchest on your desktop, choose Help-Info Search.
begin

Quick Answers tells you how to connect to an Internet Service Provider (ISP).
done
From the Toolchest on your desktop, choose
Help > Quick Answers > How Do I > Connect to an Internet Service Provider.
through all the release notes, man pages, and/online books you
Quick Answers tells you how to connect to an Internet Service Provider (ISP).

Filtering standard input

./[Link] [Link]
line 1 : Silicon Graphics' Info Search lets you find all the information
line 2 : available on a topic using a keyword search. Info Search looks
line 3 : begin
line 4 : through all the release notes, man pages, and/online books you
line 5 : done
line 6 : have installed on your system or on a networked server. From
line 7 : the Toolchest on your desktop, choose Help-Info Search.
line 8 : begin
line 9 :
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
line 11 : done
line 12 : From the Toolchest on your desktop, choose
line 13 : Help > Quick Answers > How Do I > Connect to an Internet Service Provider.
line 14 : through all the release notes, man pages, and/online books you
line 15 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).

Filtering standard input

#!/usr/bin/perl
while( <> ) {
    print "line $. : $_" unless $. %2;    # print only the even lines
}

./[Link] [Link]
line 2 : available on a topic using a keyword search. Info Search looks
line 4 : through all the release notes, man pages, and/online books you
line 6 : have installed on your system or on a networked server. From
line 8 : begin
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
line 12 : From the Toolchest on your desktop, choose
line 14 : through all the release notes, man pages, and/online books you

Filtering standard input

#!/usr/bin/perl
while( <> ) {
    if( /begin/ .. /done/ ) {       # prints any text that starts with begin
        print "line $. : $_";       # and finishes with done
    }
}

./[Link] [Link]
line 3 : begin
line 4 : through all the release notes, man pages, and/online books you
line 5 : done
line 8 : begin
line 9 :
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
line 11 : done

Filtering standard input

#!/usr/bin/perl
while( <> ) {
    if( /begin/ .. /done/ ) {
        unless( /begin/ || /done/ ) {     # skip the marker lines themselves
            print "line $. : $_";
        }
    }
}

./[Link] [Link]
line 4 : through all the release notes, man pages, and/online books you
line 9 :
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
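The range operator used in these filters can be tried without an input file by feeding it a list of lines. This is only a sketch: the helper name between_markers and the sample lines are invented, and the begin/done markers are the ones from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines that fall strictly between a "begin" line and a "done" line.
sub between_markers {
    my @kept;
    foreach my $line (@_) {
        # The .. operator turns true at a "begin" line and stays true
        # up to and including the next "done" line.
        if ($line =~ /begin/ .. $line =~ /done/) {
            push(@kept, $line) unless $line =~ /begin/ || $line =~ /done/;
        }
    }
    return @kept;
}

my @sample = ("intro", "begin", "first", "second", "done",
              "middle", "begin", "third", "done");
my @inside = between_markers(@sample);
print join(",", @inside), "\n";    # first,second,third
```

The operator remembers its state between evaluations, so input with a begin but no matching done leaves the range open for whatever lines come next.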

Files

Naming files
Reformatting files

Files

#!/usr/bin/perl
# [Link]
foreach $i ( 0 .. 50 ) {
    print "touch gifdir/$[Link]\n";
    system("touch gifdir/$[Link]");
}

./[Link]

Perl executes the following in unix:
touch gifdir/[Link]
touch gifdir/[Link]
touch gifdir/[Link]
touch gifdir/[Link]
touch gifdir/[Link]
. . .
touch gifdir/[Link]
touch gifdir/[Link]
touch gifdir/[Link]

Files

% ls -lt gifdir/*.gif
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
. . .
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:50 [Link]

Files

#!/usr/bin/perl
foreach $i ( 0 .. 50 ) {
    $new = sprintf("step%[Link]", $i);      # naming the gif file with
    print "mv gifdir2/$[Link] gifdir2/$new\n";    # a 3 digit numbering scheme
    system "mv gifdir2/$[Link] gifdir2/$new";
}

./[Link]

Perl executes the following in unix:
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
. .
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
mv gifdir2/[Link] gifdir2/[Link]
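A 3 digit numbering scheme matters because file listings sort names as text. The sketch below shows the effect; the helper name step_name and the pattern "step%03d.gif" are assumptions made for this example and are not taken from the slide, whose exact format string did not survive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a zero-padded file name; the "step%03d.gif" pattern is hypothetical.
sub step_name {
    my ($i) = @_;
    return sprintf("step%03d.gif", $i);
}

# Without padding, text order differs from numeric order.
my @unpadded = sort map { sprintf("step%d.gif", $_) } (2, 9, 10);
# With padding, text order and numeric order agree.
my @padded   = sort map { step_name($_) } (2, 9, 10);

print "@unpadded\n";   # step10.gif step2.gif step9.gif
print "@padded\n";     # step002.gif step009.gif step010.gif
```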

ls gifdir2 (before)
gifdir2: [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link]

Files

ls gifdir2 (after)
gifdir2: script [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link]


Parsing and reformatting Files

HEADER    CALCIUM-BINDING PROTEIN        29-SEP-92                 1CLL   2
COMPND    CALMODULIN (VERTEBRATE)                                  1CLL   3
REMARK   1 REFERENCE 1                                             1CLL  13
REMARK   1 AUTH [Link],[Link],[Link]                               1CLL  14
ORIGX2    0.000000 0.018659 0.001155 0.00000                       1CLL 143
. . .
ATOM      1  N   LEU  4   -6.873  21.082  25.312  1.00 49.53       1CLL 148
ATOM      2  CA  LEU  4   -6.696  22.003  26.447  1.00 48.82       1CLL 149
ATOM      3  C   LEU  4   -6.318  23.391  25.929  1.00 46.50       1CLL 150
ATOM      4  O   LEU  4   -5.313  23.981  26.352  1.00 45.72       1CLL 151
ATOM      5  N   THR  5   -7.147  23.871  25.013  1.00 46.77       1CLL 152
ATOM      6  CA  THR  5   -6.891  25.193  24.428  1.00 46.84       1CLL 153
. . .
CONECT  724  723 1137                                              1CLL1440
CONECT  736  735 1137                                              1CLL1441

Parsing Files

#!/usr/bin/perl
$pdbfile = shift;
($pref = $pdbfile) =~ s/\.pdb//;
print "Converting $pdbfile to $[Link] \n";
# the || die must follow the closing parenthesis of open, otherwise it never fires
open(FILIN, "<$pdbfile") || die "Cannot open pdb file $pdbfile \n ";
open(FILOUT,">$[Link]");
while (<FILIN>) {
    if (/^ATOM/) {
        chomp;
        @f = split;      # split the line on whitespace into fields
        # atom number, first letter of the atom name, then x, y, z
        printf FILOUT "%5d %4s %8.3f%8.3f%8.3f\n", $f[1], substr($f[2], 0, 1), $f[5], $f[6], $f[7];
    }
}
close(FILIN);
close(FILOUT);

Reformatting Files

./[Link] [Link]
more [Link]
    1    N   -6.873  21.082  25.312
    2    C   -6.696  22.003  26.447
    3    C   -6.318  23.391  25.929
    4    O   -5.313  23.981  26.352
    5    N   -7.147  23.871  25.013
    6    C   -6.891  25.193  24.428
. . .

Executing applications

#!/usr/bin/perl
$pdbfile = shift(@ARGV);
($pref = $pdbfile) =~ s/\.pdb//;        # create a variable $pref using the prefix of the pdb file name
system ("rm -r $pref");
system ("mkdir $pref");                 # create a directory named after $pref
chdir ("$pref");                        # change directory into $pref
open(SCRIPT,">script");                 # create a file called script
print SCRIPT "zap\n";
print SCRIPT "load pdb ../$pdbfile\n";
print SCRIPT "background black\n";
print SCRIPT "wireframe off\n";
print SCRIPT "ribbons on\n";
print SCRIPT "color ribbons yellow\n";
for ($i = 0; $i <= 50; ++$i) {          # assigns a value from 0 to 50
    $name = sprintf("%3.3d",$i);        # create a file name based on this value
    print SCRIPT "rotate x 10\n";       # for every value, rotate 10 degrees
    print SCRIPT "write $[Link]\n";     # generate a gif file for each value
}
print SCRIPT "quit\n";
close SCRIPT;
system("/usr/local/bin/rasmol < script");                   # execute the rasmol program
system("dmconvert -f mpeg1v -p video ###.gif [Link]");      # execute dmconvert to make movie
chdir ("..");

Executing applications

more foo/script
background black
wireframe off
ribbons on
color ribbons yellow
rotate x 10
write [Link]
rotate x 10
write [Link]
rotate x 10
write [Link]
. .

ls -lt foo
total 99699
-rw-------  1 shamy  support  256504 Oct 21 18:34 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:33 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:33 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:33 [Link]
. .
-rw-------  1 shamy  support  995343 Oct 21 18:32 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:32 [Link]
-rw-------  1 shamy  support  995343 Oct 21 18:32 [Link]
-rw-------  1 shamy  support    1418 Oct 21 18:32 script

Parsing a DNA sequence

>sequence1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGC
ACGAGGTGGAAAAGCAATATCTTAACATTTTAGGACTGATTTCAGAAATA
GAAGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGG
TGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCA
TGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAG
CAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCA
ATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>sequence2
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGC
GATGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGG
TGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCA
TGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAG
CAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCA
ATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

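Parsing a DNA sequence

#!/usr/bin/perl
while (<>) {
    if (/^\>/) {                       # a header line starts a new sequence
        flush_seq() if $count > 0;     # output the sequence collected so far
        $seq = $_;
    } else {
        chomp;
        ++$count;
        push(@line,$_);
    }
    if (eof) {                         # last line of the file: output the final
        flush_seq() if $count > 0;     # sequence, including the line just read
    }
}

sub flush_seq {
    $line = join("",@line);
    print $seq;
    fixhead($line);
    fixtail($line);
    write STDOUT;
    $count = 0;
    @line = ();
}

format STDOUT =
~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$line
.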

Parsing a DNA sequence

# Both subroutines work on the global $line.

sub fixhead {                          # strip leading X characters
    $length = length($line);
    for ($i = 0;$i <= $length; ++$i){
        if (substr($line,0,1) eq "X"){
            $line = substr($line,1,$length-1);
        } else {
            return;
        }
    }
}

sub fixtail {                          # strip trailing X characters
    $length = length($line);
    for ($i = 0;$i <= $length; ++$i){
        if (substr($line,$length-($i+1),1) eq "X"){
            $line = substr($line,0,$length-($i+1));
        } else {
            return;
        }
    }
}
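The same trimming can also be written with two substitutions instead of character-by-character loops. This is an alternative sketch, not the slide's method; the helper name trim_x is hypothetical, and unlike fixhead and fixtail it takes its input as an argument and returns the result rather than changing a global.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove runs of X from both ends of a sequence, leaving interior X runs alone.
sub trim_x {
    my ($sequence) = @_;
    $sequence =~ s/^X+//;    # leading X characters
    $sequence =~ s/X+$//;    # trailing X characters
    return $sequence;
}

print trim_x("XXXGGCACGXXAATGXXXX"), "\n";   # GGCACGXXAATG
```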

Parsing a DNA sequence

>sequence1
GGCACGAGGTGGAAAAGCAATATCTTAACATTTTAGGACTGATTTCAGAA
ATAGAAGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTG
TGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTA
CCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAAT
CAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAA
GCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAC
>sequence2
GGCGATGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTG
TGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTA
CCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAAT
CAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAA
GCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAC

Creating DNA sequence fragments

>chr4
GGACAGCGGAATCCTCGACCCGGTTGAGGAATGGTCGACGAAAATCTATCGGGTTCGAGG
ATTCGTCGACCAGGTGTTGGAATCGTCGACCGAGTCTGAGAATTCGTAGACCAGGACGGC
GGAATCCTCGACAATGACGAGGTATGGTCGAGGAAAATCTATCGGGTTCGAGGATTCGTC
TACCAGGTGATGGAATCCTCGACCAGGACAAAGAATTCGTCGACCAGGGGTGGAATTGTT
GTTATTCCGATCATGAGAGCGGATATCAGTACAGATCCGACGCTGGTGAAAAAGATCACG
GCGATCGTGGATAGTATCAAGCCACCGAGAGTCTCGTATTCGGAGAAAGATCGGCCGATG
AGGATAAGAGGTCGATCGGATGGACGGAAGAGGTAGAGGAAGAGCCATGAAGCGGCGAGG
CATAGGAGGAGGATGAGCGAGAATGGGTGGGCGGGAAGAGAGAAACTGATGATCAGAGCG
ATGATGCAGACGTAATTCACCCTGAAATAAGAGGAGTTCTTCCAGAATCGCGTCATGGCC
TAAGGGTTAGGGGTTAAGGGTTAAGGGTTTAGGGTTAAGGGTTAAGGGTTTAGGGTTTAG
GGTTTAGG

Parsing a DNA sequence

#!/usr/bin/perl
#
$infile = $ARGV[0];
$break_length = $ARGV[1];
$overlap_length = $ARGV[2];
$seq_count = 0;
$count = 0;
$fileflag = 0;

open (IN, "< $infile" ) || die "can not open input file for reading: $!\n";
while (<IN>) {
    if (!(/^\>/ )) {          # skip the header line, keep the sequence lines
        chomp;
        push(@line,$_);
    }
}
$seq = join("",@line);
$length = length($seq);
$nfrag = int($length/$break_length);
$frag_length = $break_length + $overlap_length;
print "The break length = $break_length\n";
print "The overlap length = $overlap_length\n";
print "The total length of the sequence = $length\n";
print "The total length of each fragment = $frag_length\n";
print "The total number of fragments = $nfrag\n\n";

Parsing a DNA sequence

for ($i = 0;$i <= $nfrag; ++$i){
    $start = $i * $break_length;
    $stop = $i * $break_length+$frag_length;
    $frag = substr($seq,$start,$frag_length);
    # $outfile = $[Link]("_%5.5d_%5.5d",$start,$stop);
    $outfile = $infile."_".$start."_".$stop;
    open( OUT, "> $outfile" ) || die "Can not open output file\n";
    print "Writing fragment from $start to $stop to fragment file $outfile\n";
    print OUT "$outfile $start $stop\n";
    write OUT;
}

format OUT =
~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$frag
.
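The fragment arithmetic is easier to check on a short string than on a real sequence. The following is a sketch under stated assumptions: the helper name fragments is hypothetical, it returns the pieces instead of writing files, and it stops once the start position passes the end of the sequence rather than looping a fixed $nfrag + 1 times as the script above does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut $seq into pieces that start every $break_length characters and
# extend $overlap_length characters into the next piece.
sub fragments {
    my ($seq, $break_length, $overlap_length) = @_;
    die "break length must be positive\n" unless $break_length > 0;
    my $frag_length = $break_length + $overlap_length;
    my @frags;
    for (my $start = 0; $start < length($seq); $start += $break_length) {
        # substr returns a shorter piece when fewer characters remain
        push(@frags, [$start, substr($seq, $start, $frag_length)]);
    }
    return @frags;
}

my @pieces = fragments("ABCDEFGHIJ", 4, 2);
foreach my $piece (@pieces) {
    print "$piece->[0] $piece->[1]\n";
}
```

With a break length of 4 and an overlap of 2, the ten-letter test string yields pieces starting at 0, 4 and 8, and each piece repeats the first two letters of the piece that follows it.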

Parsing a DNA sequence

[Link] short 50 5
The break length = 50
The overlap length = 5
The total length of the sequence = 608
The total length of each fragment = 55
The total number of fragments = 12

Writing fragment from 0 to 55 to fragment file short_0_55
Writing fragment from 50 to 105 to fragment file short_50_105
Writing fragment from 100 to 155 to fragment file short_100_155
Writing fragment from 150 to 205 to fragment file short_150_205
Writing fragment from 200 to 255 to fragment file short_200_255
Writing fragment from 250 to 305 to fragment file short_250_305
Writing fragment from 300 to 355 to fragment file short_300_355
Writing fragment from 350 to 405 to fragment file short_350_405
Writing fragment from 400 to 455 to fragment file short_400_455
Writing fragment from 450 to 505 to fragment file short_450_505
Writing fragment from 500 to 555 to fragment file short_500_555
Writing fragment from 550 to 605 to fragment file short_550_605
Writing fragment from 600 to 655 to fragment file short_600_655
.

Parsing a DNA sequence

more short_*
short_0_55 0 55
GGACAGCGGAATCCTCGACCCGGTTGAGGAATGGTCGACGAAAATCTATCGGGTT
...skipping...
short_100_155 100 155
AATTCGTAGACCAGGACGGCGGAATCCTCGACAATGACGAGGTATGGTCGAGGAA
...skipping...
short_150_205 150 205
AGGAAAATCTATCGGGTTCGAGGATTCGTCTACCAGGTGATGGAATCCTCGACCA
...skipping...
short_200_255 200 255
GACCAGGACAAAGAATTCGTCGACCAGGGGTGGAATTGTTGTTATTCCGATCATG
...skipping...
short_250_305 250 305
TCATGAGAGCGGATATCAGTACAGATCCGACGCTGGTGAAAAAGATCACGGCGAT
...skipping...
short_300_355 300 355
GCGATCGTGGATAGTATCAAGCCACCGAGAGTCTCGTATTCGGAGAAAGATCGGC
...skipping...

Convert seq to fasta format


ls *.seq [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link]


Convert seq to fasta format


source /usr/local/gcg/gcgstartup gcg [Link]


Convert seq to fasta format

#!/usr/bin/perl
@list =`ls -1 *.seq`;       # one file name per line
foreach $i (@list) {
    chomp($i);
    system("/usr/local/gcg_10.3/solarisbin/gcgbin/execute/tofasta $i -Default");
}

Convert seq to fasta format


ls README [Link] [Link]* [Link] [Link]~* [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link] [Link]


Parsing Blast output

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd" >
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>blastn 2.2.5 [Nov-16-2002]</BlastOutput_version>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), ~&quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>[Link]</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>sequence1</BlastOutput_query-def>
  <BlastOutput_query-len>319</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_sc-match>1</Parameters_sc-match>
      <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
      <Parameters_gap-open>5</Parameters_gap-open>
      <Parameters_gap-extend>2</Parameters_gap-extend>
      <Parameters_filter>D</Parameters_filter>
    </Parameters>
  </BlastOutput_param>

Parsing Blast output

  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|1789957|gb|AE000431.1|AE000431</Hit_id>
          <Hit_def>Escherichia coli K-12 MG1655 section 321 of 400 of the complete genome</Hit_def>
          <Hit_accession>AE000431</Hit_accession>
          <Hit_len>11575</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>30.2282</Hsp_bit-score>
              <Hsp_score>15</Hsp_score>
              <Hsp_evalue>1.12539</Hsp_evalue>
              <Hsp_query-from>267</Hsp_query-from>
              <Hsp_query-to>253</Hsp_query-to>
              <Hsp_hit-from>8485</Hsp_hit-from>
              <Hsp_hit-to>8499</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>-1</Hsp_hit-frame>
              <Hsp_identity>15</Hsp_identity>
              <Hsp_positive>15</Hsp_positive>
              <Hsp_align-len>15</Hsp_align-len>
              <Hsp_qseq>GCTAATCACTTTATT</Hsp_qseq>
              <Hsp_hseq>GCTAATCACTTTATT</Hsp_hseq>
              <Hsp_midline>|||||||||||||||</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_id>gi|1789185|gb|AE000366.1|AE000366</Hit_id>

.

Parsing Blast output

more [Link]
sequence1</BlastOutput_query-def> 319

more [Link]
sequence1</BlastOutput_query-def>  1  Esch  AE000431  30.2282  1.12539  11575  1
sequence1</BlastOutput_query-def>  2  Esch  AE000366  30.2282  1.12539  10405  1
sequence1</BlastOutput_query-def>  3  Esch  AE000467  28.2459  4.44683  15633  1
sequence1</BlastOutput_query-def>  4  Esch  AE000410  28.2459  4.44683  10826  1
sequence1</BlastOutput_query-def>  5  Esch  AE000300  28.2459  4.44683  16939  1
sequence1</BlastOutput_query-def>  6  Esch  AE000220  28.2459  4.44683   9780  1
sequence1</BlastOutput_query-def>  7  Esch  AE000170  28.2459  4.44683  10627  1
sequence1</BlastOutput_query-def>  8  Esch  AE000123  28.2459  4.44683  11093  1

more [Link]
sequence1</BlastOutput_query-def>  1  1  267  253  8485  8499  15  15  100.00  0  30.2282  1.12539
sequence1</BlastOutput_query-def>  2  1   59   73  7824  7838  15  15  100.00  0  30.2282  1.12539
sequence1</BlastOutput_query-def>  3  1  101  114  9628  9641  14  14  100.00  0  28.2459  4.44683
sequence1</BlastOutput_query-def>  4  1  160  147  2067  2080  14  14  100.00  0  28.2459  4.44683
sequence1</BlastOutput_query-def>  5  1   22    9  2971  2984  14  14  100.00  0  28.2459  4.44683
sequence1</BlastOutput_query-def>  6  1   95  108  5209  5222  14  14  100.00  0  28.2459  4.44683
sequence1</BlastOutput_query-def>  7  1   33   16  2344  2361  18  17   94.44  0  28.2459  4.44683
sequence1</BlastOutput_query-def>  8  1   40   53  2390  2403  14  14  100.00  0  28.2459  4.44683

Executing applications in a queue

foreach $i (@files) {
    ++$count;
    print $i;
    chomp($i);
    ($prefix = $i) =~ s/\.pdb//;
    `cp $dir/$i .`;
    &wait();                  # block until the queue has room
    `./[Link] $i`;
}

sub wait {
  loop:
    $check = `llq -u duany | grep " [IR] "|wc`;
    @check = split(' ',$check);     # wc separates its counts with spaces, so split on whitespace
    print "There are $check[0] in the queue\n";
    if ($check[0] > 5) {
        print "I am sleeping\n";
        sleep 60;
        goto loop;
    } else {
        print "I am awake\n";
        print "I am right now working on $i\n";
        return;
    }
}

More info
CPAN - Comprehensive Perl Archive Network Perl Resource Topics Bioperl
[Link] [Link] [Link] [Link] [Link] Countless more are available... [Link] Source, binaries, libs, scripts, FAQs, links [Link]

Others


Contact the Institute for additional help


Yuk Sham
Computational Biology/Biochemistry Consultant
Phone: (612) 624 7427 (direct)
Email: shamy@[Link]
