2 Search problems
“Search” is such a broad term that this entire book could be called Classic Search Problems in Python. This chapter is about core search algorithms that every programmer should know. It does not claim to be comprehensive, despite the declaratory title.
2.1 DNA search
Genes are commonly represented in computer software as a sequence of the characters A, C, G, and T. Each letter represents a nucleotide, and the combination of three nucleotides is called a codon. This is illustrated in figure 2.1. A codon codes for a specific amino acid that together with other amino acids can form a protein. A classic task in bioinformatics software is to find a particular codon within a gene.
Figure 2.1 A nucleotide is represented by one of the letters A, C, G, and T. A codon is composed of three nucleotides, and a gene is composed of multiple codons.
2.1.1 Storing DNA
We can represent a nucleotide as a simple enum with four cases.
Listing 2.1 Gene.java
package chapter2;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
public class Gene {
public enum Nucleotide {
A, C, G, T
}
Codons can be defined as a combination of three Nucleotides. The constructor for the Codon class converts a String of three letters into a Codon. To implement search methods, we will need to be able to compare one Codon to another. Java has an interface for that, Comparable.