Monday, June 4, 2012

Scientific Computing in Perl


I come from a scientific computing background (computational biology) and have often had to perform numerical computing tasks in a range of programming languages, from tried-and-true stalwarts like Fortran and C to languages like MATLAB.  Much of my scientific computing work has also been done in Perl, which has a long and rich history in bioinformatics.  Recently, however, people entering computational biology and bioinformatics seem to be shifting towards learning and using Python instead of Perl, citing the availability of libraries such as SciPy (for numerical computing) and interfaces such as RPy (for statistical computing with R).  This post is not meant to start a flame war with fans of Python or its scientific computing capabilities; the Python community has done great work in this area and has even produced tools I have used extensively, such as PyMOL (http://www.pymol.org/).  In fact, I think competition is always a good thing.  Rather, I intend to use this post to raise awareness of some of the Perl modules that can be of great benefit to anyone considering Perl for scientific computing.

Perl Data Language (PDL) – allows Perl to manipulate large n-dimensional arrays (similar to MATLAB, NumPy, etc.) quickly and efficiently.  The PDL namespace on CPAN contains many modules that should be of interest to scientific programmers, including interfaces to the numerical computing functions found in the GNU Scientific Library (http://www.gnu.org/software/gsl/).  More information on PDL can be found at http://pdl.perl.org/.
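As a rough illustration of the array handling described above, here is a minimal sketch that assumes the PDL distribution is installed from CPAN.  The variable names and sample values are made up for the example.

```perl
use strict;
use warnings;
use PDL;    # assumes PDL is installed from CPAN

# Build a 1-D array holding 0, 1, 2, 3, 4
my $x = sequence(5);

# Arithmetic is applied element by element: 1, 2, 5, 10, 17
my $y = $x ** 2 + 1;

# Simple reductions over the whole array
my $total = $y->sum;
my $mean  = $y->avg;

# A small 2x2 matrix and its matrix product with itself
# (PDL overloads the "x" operator for matrix multiplication)
my $m    = pdl([ [1, 2], [3, 4] ]);
my $prod = $m x $m;

print "y = $y\n";
print "sum = $total, mean = $mean\n";
print "m x m = $prod\n";
```

The point of the sketch is that no explicit loops are needed; the element-wise work happens inside PDL's compiled code.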

The Statistics namespace on CPAN – This namespace includes many Perl modules that allow the computation of numerous statistical analyses, ranging from basic descriptive statistics (Statistics::Descriptive) to more sophisticated analyses like multivariate regression (e.g.  Statistics::Regression).  There is even a module that allows your Perl code to interface with the statistical computing language R (Statistics::R). 
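For a sense of what basic descriptive statistics look like in practice, the following is a small sketch that assumes Statistics::Descriptive is installed from CPAN.  The data values are invented for illustration only.

```perl
use strict;
use warnings;
use Statistics::Descriptive;    # assumes the module is installed from CPAN

# Hypothetical measurements, made up for this example
my @data = (2, 4, 4, 4, 5, 5, 7, 9);

# The "Full" object keeps the data, so order statistics like the median work
my $stat = Statistics::Descriptive::Full->new();
$stat->add_data(@data);

my $n      = $stat->count;
my $mean   = $stat->mean;
my $median = $stat->median;
my $sd     = $stat->standard_deviation;    # sample standard deviation (n - 1)

printf "n = %d, mean = %.2f, median = %.2f, sd = %.3f\n",
    $n, $mean, $median, $sd;
```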

The Math namespace on CPAN – This namespace includes all kinds of advanced math functions that can be easily integrated into a Perl application.  While there are too many to mention in such a short synopsis, modules located here will provide support for dealing with areas of math such as trigonometry (Math::Trig) and complex numbers (Math::Complex) as well as provide access to algorithms for solving many types of math problems, such as solving for the roots of polynomial equations (Math::Polynomial::Solve). 
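Math::Trig and Math::Complex both ship with Perl itself, so a quick sketch of them needs nothing extra installed.  The values below are chosen only for illustration.

```perl
use strict;
use warnings;
use Math::Trig;       # provides pi, deg2rad, rad2deg, and more
use Math::Complex;    # provides cplx and a complex-aware sqrt

# Trigonometry: convert degrees to radians before calling cos
my $angle  = deg2rad(60);
my $cosine = cos($angle);

# Complex numbers: build 3 + 4i and take its magnitude
my $z         = cplx(3, 4);
my $magnitude = abs($z);

# Arithmetic operators are overloaded for complex values
my $square = $z * $z;

# With Math::Complex loaded, sqrt of a negative number returns a complex value
my $root = sqrt(-4);

printf "cos(60 degrees) = %.3f\n", $cosine;
print  "|z| = $magnitude\n";
print  "z squared = $square\n";
print  "sqrt(-4) = $root\n";
```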

BioPerl – while not as generic to scientific computing as some of the ones mentioned above, BioPerl is a large and widely used collection of Perl modules for performing bioinformatics tasks.  More information on BioPerl can be found at http://www.bioperl.org/wiki/Main_Page. 

Perl’s Inline capability – the ability to integrate code from other languages into your Perl application (particularly inline C) can help to give your applications the flexibility and ease of use of Perl while allowing you to optimize certain parts of your application to improve performance.  
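To give an idea of what inline C looks like, here is a minimal sketch that assumes the Inline::C module is installed from CPAN and that a working C compiler is available.  The function name sum_of_squares is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# The C source is compiled the first time the script runs and cached afterwards.
# Assumes Inline::C and a C compiler are available on the system.
use Inline C => <<'END_C';
/* Hypothetical hot loop: sum of i*i for i = 1..n, done in C */
double sum_of_squares(int n) {
    double total = 0.0;
    int i;
    for (i = 1; i <= n; i++) {
        total += (double)i * i;
    }
    return total;
}
END_C

# The C function is now callable like any other Perl subroutine
my $result = sum_of_squares(10);
print "sum of squares 1..10 = $result\n";
```

The usual pattern is to keep the bulk of the application in Perl and move only the profiled bottleneck into C.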

Of course, many other modules could also be of great use in scientific applications, including modules for data mining, machine learning, and numerous other aspects of data analysis.  My intent, however, was to demonstrate that Perl has a rich set of tools for developing scientific computing applications and should not be quickly dismissed in favor of Python.  For anyone entering the fields of computational biology or bioinformatics, I think Perl is still a language worth learning, and it is still my preferred language for bioinformatics tasks.  Even if you decide in favor of Python for your new projects, be aware that many existing projects are written in Perl, and you may well have to maintain, modify, or interface with such codebases, where knowledge of Perl will only be to your advantage.


4 comments:

szabgab said...

If you'd like to have more people use Perl for scientific computation, it might be a good idea to think about and discuss what might have caused the shift of some people towards Python?

Is there something the Python community did better than the Perl community?
What would be the edge of Perl in this competition?

cfrenz said...

My response to this is contained in my latest blog post - http://perlgems.blogspot.com/2012/06/improving-image-of-perl-perl-marketing.html

I think it is not really a problem of what Python is doing better, but rather one of Perl being viewed as an old and uncool language that is difficult to work with. Perl does not seem to be trendy right now and that influences the choices a lot of up and coming programmers make.

Joel Berger said...

Certainly you have mentioned the major Perl-for-Science modules; however, I would like to mention my PerlGSL namespace, which provides Perlish (closure-based) interfaces to the GSL. They are young, but they are powerful as well!

djzort said...

By a fluke of nature, Google wrote its first search engine in Python. RedHat now parties in the Python discotheque.

Python forces indented code, so that makes your code readable.

Ruby is also hip atm, but it allows you to write as incomprehensible code as perl/c/java/c++/php/other.

So because it's cool, kids learn it in uni, then use it in their PhDs, startups, etc.

"The right tool for the right job" is secondary to "a tool I know how to use".