

Introduction to the following concepts: match, mismatch, substitution, gap, insertion, deletion, indel, global and local alignments.

Apples and oranges. They say that it is not worthwhile to compare apples and oranges, because they are fundamentally different. In any case, it is possible to compare whatever we like. Whether it is worth the effort is another story. Here we are interested in things that we can arrange in sequences; thus, we discuss the comparison of sequences. In general, we can compare two sequences by placing them above each other in rows and comparing them character by character. This way we could align two different audio recordings of a piece of music. There are apps available that can recognize songs by listening to them. In principle, apps accomplish this by aligning musical notes and rhythms. Programs can detect a similarity between songs the same way, perhaps to identify plagiarism. Nonetheless, we have a different purpose in bioinformatics, and that is why we will look at DNA, RNA and protein sequences.

Deoxyribonucleic acid (DNA) consists of four bases, adenine, guanine, thymine, and cytosine, which are represented by the single letters A, G, T, and C, respectively. In ribonucleic acid (RNA), thymine (T) is replaced by uracil, represented by the letter U. Protein sequences consist of 20 different amino acids. The single-letter codes for the amino acids are: I, L, V, F, M, C, A, G, P, T, S, Y, W, Q, N, H, E, D, K, R. We will get to that in detail in a later tutorial.

So, why do we compare sequences? There are many reasons, for example: (i) We can identify causes of genetic diseases by comparing sequences from healthy and unhealthy individuals. (ii) A comparison of gene sequences from several species can reveal sequence stretches that are preserved or similar among the species, thus hinting that these conserved regions may have a related function in the organisms. (iii) Importantly, all sequence database searches involve comparing sequences to detect similarity to a search sequence. Note that the list of reasons is longer than presented here.

Concepts. The following paragraphs introduce the ideas of a match, mismatch, substitution, gap, insertion, deletion, indel, global and local alignments. For demonstration purposes, we only use DNA sequences, although the concepts also apply to RNA and protein sequences. Let's take two sequences: a) ATGAAGCGTGC, length 11 bases, and b) ATGAAGAGTGCA, length 12 bases. We can make the following alignment as in Figure 1.

Figure 1. A pair-wise raw sequence alignment.

Figure 1 shows a pair-wise sequence alignment, but a close examination reveals that we can make the rows match better. In this alignment, there are only three matching letters in the columns. How can we make additional letters match?

By inserting gaps, the sequences become the same length, and by choosing proper positions for the gaps, additional letters will align, as shown in Figure 2. In this alignment, nine letters out of the total 13 are aligned and matching in the columns.

Figure 2.

The number of matching characters is the same regardless of whether the gap is placed in position four or five (Figure 3). These two alignments are considered to be separate and different alignments, although a pair-wise alignment alone does not allow us to conclude where the gap should be placed. We may obtain additional information on where to position the gap by comparing multiple sequences simultaneously, but that is a topic for the multiple sequence alignment tutorial.

Figure 3. Note columns four and five, where we placed the gaps in sequence (b). In both cases, we can consider the A to be an insertion into sequence (a) or, equivalently, a deletion at the same position in sequence (b).
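
The idea of counting matching columns, and the observation that moving a gap within a run of identical letters does not change that count, can be illustrated with a minimal Python sketch. The sequences below are hypothetical toy examples chosen for illustration and are not the ones shown in the figures; the function name `count_matches` and the use of `-` as the gap symbol are likewise assumptions of this sketch.

```python
GAP = "-"  # assumed gap symbol for this sketch


def count_matches(row_a: str, row_b: str) -> int:
    """Count the columns in which both rows carry the same letter.

    The two rows must already be aligned, i.e. padded with gaps to the
    same length. A column containing a gap never counts as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != GAP)


# Hypothetical toy sequences: the top row has one more A than the bottom row.
top = "ATGAAAGC"
raw = "ATGAAGC-"        # no internal gap, only padded at the end
gap_at_4 = "ATG-AAGC"   # gap placed in column four
gap_at_5 = "ATGA-AGC"   # gap placed in column five

print(count_matches(top, raw))       # 5
print(count_matches(top, gap_at_4))  # 7
print(count_matches(top, gap_at_5))  # 7
```

In this toy case an internal gap raises the number of matching columns from 5 to 7, and both gap positions score the same, so a simple match count cannot tell us which of the two alignments is the better one.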
