Applied Bioinformatics

1.1 Nucleic Acid Bioinformatics

The tremendous growth of bioinformatics and computational biology in the late 20th and early 21st centuries has been accompanied by a corresponding growth in software and algorithms for studying biological sequences. This book is designed to introduce students of the life sciences who have little or no programming experience to the concepts and methodologies of bioinformatics and to contemporary software applications. Most of this course deals with the analysis of nucleic acid sequence data. Many of these applications have web interfaces, but where possible we will use command line software and strategies for handling sequence data with the GNU/Linux command line interface (the terminal). We won't be content to run commands blindly; we will also learn much of the mathematics and theory behind these methods. In addition, these notes provide an introduction to Biopython, both as a tool for bioinformatics and as a framework that connects the concepts presented.

1.1.1 GNU/Linux and the command line

Most bioinformatics work is best done with the GNU/Linux command line interface, which is well suited to the kinds of files commonly used in the field, so we begin with a brief introduction. After this introduction, we will continue to learn new GNU/Linux commands as we broaden our repertoire of software tools, Biopython commands, file formats, and databases.

The GNU/Linux command line interface is a text-based interface to files and commands. This interface, the terminal, is available through many terminal applications in a GNU/Linux distribution, through the Terminal program (and others such as iTerm2) on Mac OS X, and through software available for Windows, such as PuTTY and MobaXterm. Advanced users of the terminal can accomplish difficult tasks with remarkable efficiency, such as running complex commands on thousands of files in one fell swoop. A strong knowledge of the terminal and shell scripting, combined with knowledge of a scripting language like Python or Perl, is very powerful and worth learning for any student of bioinformatics. Scripting languages are so named because they are used to write "scripts": programs that automate a sequence of commands and that are typically run by an interpreter rather than built by a compiler.
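To make the idea of a script concrete, here is a minimal Python sketch that applies the same step to every file in a directory. It is an illustration only, not a method prescribed by these notes: the function name `count_lines_per_file`, the `.txt` extension, and the `sample1.txt` to `sample3.txt` file names are all hypothetical, and the files are created in a temporary directory so nothing on your system is touched.

```python
import tempfile
from pathlib import Path


def count_lines_per_file(directory, pattern="*.txt"):
    """Return {file name: number of lines} for every file matching `pattern`."""
    counts = {}
    # Visit the matching files in alphabetical order.
    for path in sorted(Path(directory).glob(pattern)):
        with open(path) as handle:
            counts[path.name] = sum(1 for _ in handle)
    return counts


# Build a few small example files in a throwaway directory (hypothetical data).
with tempfile.TemporaryDirectory() as workdir:
    for i in range(1, 4):
        # File i contains i lines, each holding the text "ACGT".
        Path(workdir, f"sample{i}.txt").write_text("ACGT\n" * i)

    # One call processes every file; the same loop would work for thousands.
    line_counts = count_lines_per_file(workdir)
    for name, n_lines in line_counts.items():
        print(name, n_lines)
```

The point of the sketch is the loop: once a step is written as code, repeating it over three files or three thousand takes the same effort.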

Once you've opened your terminal application, you should see a prompt with a cursor at which to type text-based commands. For example, to print a list of all files and directories in your current location (directory), you use the command ls. Directories are like folders for keeping files, but a directory can also be a location. In other words, you can be working in a directory, and any files you create will then be kept in that directory. A file, such as a text file for storing a sequence, is the unit in which data is stored. Most people know what a file is from their experience with computers, and a fuller account of what a file actually is lies beyond the scope of these class notes. For our purposes right now, the intuitive concept of a file will suffice.
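The same ideas can be sketched in Python, the language used later in these notes. The example below lists the contents of a directory, much as the ls command does in the terminal, and shows that a file created in a directory then appears in that directory's listing. The function name `list_directory`, the file name `my_sequence.txt`, the directory name `results`, and the short sequence are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def list_directory(path="."):
    """Return the sorted names of the files and directories inside `path`."""
    return sorted(os.listdir(path))


# Work inside a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as workdir:
    # Create a small text file that stores a (made-up) nucleotide sequence.
    with open(os.path.join(workdir, "my_sequence.txt"), "w") as handle:
        handle.write("ATGCGTACGTTAG\n")

    # Create a subdirectory; directories can contain other directories.
    os.mkdir(os.path.join(workdir, "results"))

    # List what the directory now holds: one file and one subdirectory.
    contents = list_directory(workdir)
    print(contents)
```

Calling `list_directory()` with no argument lists the directory you are currently working in, which is the closest analogue to typing ls at the prompt.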

If you've never programmed before, you've just created your first script! The preceding was by no means a complete introduction to GNU/Linux, but it should give you a rudimentary starting point. Throughout these class notes, we will pick up new commands and new Python code as we go. The intent is to learn the essential pieces needed to do something simple and then keep building on them. In the end, we'll have a solid set of commands with which to accomplish a lot.

Attribution

Applied Bioinformatics by David A. Hendrix is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
