DNA as a Biochemical Entity and Data String


David M. Francis, The Ohio State University; Heather L. Merk, The Ohio State University

This module provides an introduction to DNA as a biochemical entity and as a data string.


We can think of DNA as a biochemical entity or as a data string. With the advent of high-throughput sequencing and an increasingly central role for bioinformatics, the tendency is to think of DNA as a string of A's, C's, T's, and G's. However, to translate information from genome sequencing projects into applied outcomes, we need to revisit DNA as a biochemical entity so that we can design appropriate assays to detect genetic variation. The Khan Academy developed a thorough video introduction to DNA.

DNA Structure

The central theme of molecular biology is that DNA makes RNA, RNA makes protein, and these proteins are responsible for phenotypes. An understanding of the variation in DNA can therefore lead to a mechanistic understanding of phenotypic variation. We now know that this central dogma (DNA makes RNA and RNA makes protein) is an oversimplification. The ability of RNA to serve as a template for DNA, together with the role of small RNA molecules in regulating genes and conditioning genome imprinting, has contributed to the RNA world hypothesis, a term coined by Gilbert (1986). Despite revisions in our thinking about the central role of DNA, there is still a central assumption that most phenotypic variation can be linked to variation in DNA sequence. The Khan Academy developed an instructional video describing the generation of DNA sequence variation.
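
The simplified flow described above (DNA makes RNA, RNA makes protein) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `transcribe` and `translate` and the short input sequence are made up for this example, and the codon table is the standard genetic code (NCBI translation table 1), which does not apply to every organism or organelle.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), listed in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: same sequence, T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTTTTGA"  # hypothetical sequence, not taken from any real gene
mrna = transcribe(dna)
print(mrna)             # AUGGCCUUUUGA
print(translate(mrna))  # MAF
```

The sketch deliberately ignores everything the paragraph above calls an oversimplification, such as splicing, regulation by small RNAs, and reverse transcription.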

DNA is composed of four bases: adenine, guanine, cytosine, and thymine. In RNA, including the mRNA transcribed from DNA, uracil replaces thymine (Fig. 1 and video). In DNA, these bases are attached to a phosphate–deoxyribose backbone, and they pair, adenine with thymine and guanine with cytosine, with the two strands running in an anti-parallel manner as seen in Fig. 1. The implication of this pairing is that each strand forms a template for the other strand.
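
Because pairing is fixed (A with T, G with C) and the strands are anti-parallel, the sequence of one strand is enough to write down the other. A minimal sketch, assuming the input contains only the letters A, C, G, and T; the function name `reverse_complement` and the example sequence are illustrative choices:

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'.

    Complementing applies the pairing rule; reversing accounts for the
    anti-parallel orientation of the two strands.
    """
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATGGCCTTT"  # hypothetical example sequence
print(reverse_complement(forward))  # AAAGGCCAT
```

Applying the function twice returns the original sequence, which is one way to see that each strand carries the full information for the other.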

Figure 1. Chemical structure of DNA, with colored labels identifying the four bases as well as the phosphate and deoxyribose components of the backbone. Figure credit: Madeline Price Ball, Harvard University.

In the following video, created by the Dolan DNA Learning Center, James Watson describes how he and Francis Crick uncovered the structure of DNA.

Information in the DNA Code

The biochemical structure of DNA determines much of how we interpret a string of sequence. For example, the 4,527 bases from the tomato PSY1 gene can be depicted as a string, as shown in Fig. 2. This sequence is in FASTA format, in which the symbol > marks the line that carries the sequence name. The name can be simple, as in the sequence depicted here, or more complex. For example, the NCBI report for this sequence, found in Fig. 2 as well as at the NCBI website, lists the name as “>gi|155965506|gb|EF534740.1| Solanum lycopersicum cultivar Red Setter phytoene synthase 1 (psy1) gene, complete cds”. In the FASTA format, the name is followed by a line break and then by a string of DNA or protein sequence.
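
To make the FASTA layout concrete, the sketch below reads FASTA text into a dictionary that maps each name to its sequence. It is a minimal illustration, not a full parser: the function name `parse_fasta` and the two records are invented for this example, and the code assumes every record name is unique.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA name (the text after '>') to its joined sequence."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]  # drop the leading '>' marker
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in chunks.items()}


# Two made-up records; these are not real database entries.
example = """>example_1 hypothetical record
ATGGCC
TTTTGA
>example_2
GGGCCC
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence), sequence)
```

For real work, an established library parser is usually a safer choice than a hand-written one, since it handles edge cases this sketch ignores.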

Figure 2. The PSY1 gene in FASTA format taken from NCBI. Screenshot provided by Heather Merk, The Ohio State University.

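This DNA string contains all of the information that we need to determine the reverse strand, and therefore to search all six reading frames (three on each strand) for open reading frames that might code for a protein. Fifteen possible open reading frames are identified in this sequence (Fig. 3).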
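
A simplified version of this search can be sketched as follows. The code scans the three reading frames of the given strand and the three frames of its reverse complement, reporting stretches that begin with ATG and end at the first in-frame stop codon. The names `find_orfs` and `min_codons` and the short test sequence are assumptions made for illustration; this is not NCBI's ORF finder, which offers options (such as alternative start codons and minimum lengths) that this sketch does not reproduce.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stops in the standard genetic code


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (strand, frame, start, end) for each ATG...stop stretch.

    start and end are 0-based, end-exclusive positions measured along the
    strand being read; min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


test_seq = "CCATGGCCTTTTGACC"  # hypothetical sequence
print(find_orfs(test_seq))  # [('+', 3, 2, 14)]
```

In the test sequence, the only complete open reading frame is ATG GCC TTT TGA in the third frame of the forward strand; the reverse strand contains an ATG but no in-frame stop codon after it.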

Figure 3. Possible open reading frames identified in the PSY1 gene of tomato. The figure was generated with NCBI’s ORF finder. Screenshot provided by Heather Merk, The Ohio State University.

By comparing sequences between two or more plant varieties, we can identify differences that might affect the protein and therefore the phenotype. This topic will be covered on the molecular markers page.
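
As a rough illustration of such a comparison, the sketch below lists the positions at which two sequences differ. It assumes the two sequences are already aligned and of equal length, so it detects substitutions only, not insertions or deletions. The function name `find_differences` and both sequences are made up for the example.

```python
def find_differences(seq_a: str, seq_b: str) -> list:
    """Return (position, base_a, base_b) for each mismatch, positions 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


variety_1 = "ATGGCCTTTTGA"  # hypothetical allele from one variety
variety_2 = "ATGGCTTTTTGA"  # hypothetical allele from another variety
print(find_differences(variety_1, variety_2))  # [(6, 'C', 'T')]
```

Whether a difference like this one changes the protein depends on where it falls in the codon and on the genetic code, so a sequence difference is a candidate for, not proof of, a phenotypic difference.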

On a lighter note, “Ladders,” a poem about DNA, is included for your enjoyment.

References Cited

External Links

  • DNA [Online]. Khan Academy. Available at: www.khanacademy.org/video/dna (verified 5 Aug 2012).
  • Dolan DNA Learning Center [Online]. Cold Spring Harbor Laboratory. Available at: www.dnalc.org/ (verified 5 Aug 2012).
  • National Center for Biotechnology Information [Online]. U.S. National Library of Medicine, National Institutes of Health. Available at: http://www.ncbi.nlm.nih.gov (verified 5 Aug 2012).
  • Variation in a species [Online]. Khan Academy. Available at: www.khanacademy.org/video/variation-in-a-species (verified 5 Aug 2012).
  • Wikipedia contributor. File: DNA chemical structure.svg [Online]. Wikipedia, The Free Encyclopedia. Available at: http://en.wikipedia.org/wiki/File:DNA_chemical_structure.svg (verified 5 Aug 2012).

Additional Resources

To learn more about the RNA world hypothesis, see Penny (2005).

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.
