How Do Cells Read Genes?
Like the letters that spell out the words in a sentence, the DNA sequence of a gene determines the amino acid sequence of the protein it encodes. In the protein-coding region of a gene, the DNA sequence is read in groups of three nucleotide bases, called codons. Each codon specifies a single amino acid in a protein.
Learn about the other parts of a gene in Anatomy of a Gene.
DNA as a sentence
We can think about the protein-coding sequence of a gene as a sentence made up entirely of 3-letter words. In the sequence, each 3-letter word is a codon, specifying a single amino acid in a protein. Have a look at this sentence:
Thesunwashotbuttheoldmandidnotgethishat
If you were to split this sentence into individual 3-letter words, you would probably read it like this:
The sun was hot but the old man did not get his hat.
This sentence represents a gene. Each letter corresponds to a nucleotide base, and each word represents a codon. What if you shifted the "reading frame" by one or two letters? You would end up with:
T hes unw ash otb utt heo ldm and idn otg eth ish at.
Or Th esu nwa sho tbu tth eol dma ndi dno tge thi sha t.
As you can see, only one of these reading frames translates into an understandable sentence. In the same way, only one reading frame within a gene codes for the correct protein.
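To make the frame shift concrete, here is a minimal Python sketch that regroups the run-together sentence into 3-letter words starting at offset 0, 1, or 2. The helper name `split_into_codons` is hypothetical, and letters left over at either end are kept as short chunks so the output lines up with the three readings above.

```python
def split_into_codons(text: str, offset: int) -> list[str]:
    """Group letters into 3-letter 'words' starting at `offset` (0, 1 or 2).

    Letters before the offset, and any leftover letters at the end,
    become shorter chunks so nothing is dropped.
    """
    chunks = [text[:offset]] if offset else []
    for i in range(offset, len(text), 3):
        chunks.append(text[i:i + 3])
    return chunks


sentence = "thesunwashotbuttheoldmandidnotgethishat"
for offset in range(3):
    # Offset 0 gives the readable sentence; offsets 1 and 2 give nonsense.
    print(f"Offset {offset}: {' '.join(split_into_codons(sentence, offset))}")
```

Only offset 0 produces real words, which mirrors the point that a gene has a single correct reading frame.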
Mutating a DNA sentence
Take this DNA sequence: GCATGCTGCGAAACTTTGGCTGA
You can separate the sequence into 3-letter codons in 3 different ways:
- GCA TGC TGC GAA ACT TTG GCT GA
- G CAT GCT GCG AAA CTT TGG CTG A
- GC ATG CTG CGA AAC TTT GGC TGA
How can you tell which reading frame is the correct one?
All protein-coding regions begin with the sequence "ATG," which encodes the amino acid methionine (Met). Therefore, the correct reading frame will contain the codon "ATG." In this example, only the third grouping has ATG as one of its codons, so it is the correct reading frame.
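The same check can be sketched in a few lines of Python. This is a simplified illustration, assuming the rule stated above (the correct frame is the one containing the codon ATG); real gene-finding tools use more evidence than this. The helper name `reading_frames` is hypothetical, and incomplete groups of one or two bases are skipped rather than treated as codons.

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Return the three ways of grouping `seq` into codons (offsets 0, 1, 2)."""
    frames = []
    for offset in range(3):
        # Only complete 3-base groups count as codons; partial leftovers are skipped.
        frames.append([seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)])
    return frames


seq = "GCATGCTGCGAAACTTTGGCTGA"
for number, codons in enumerate(reading_frames(seq), start=1):
    marker = "  <- contains ATG" if "ATG" in codons else ""
    print(f"Frame {number}: {' '.join(codons)}{marker}")
```

Note that the test is "ATG appears as a codon", not "ATG appears somewhere in the letters": all three frames share the same letters, but only the third one groups A, T, and G together.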
You can predict the amino acid sequence of the protein by using the Universal Genetic Code.
The Universal Genetic Code
The Universal Genetic Code is the instruction manual that all cells use to read the DNA sequence of a gene and build a corresponding protein. Proteins are made of amino acids that are strung together in a chain. Each 3-letter DNA sequence, or codon, either encodes a specific amino acid or signals where the protein ends.
The code has several key features:
- All protein-coding regions begin with the "start" codon, ATG.
- There are three "stop" codons (TAA, TAG, and TGA) that mark the end of the protein-coding region.
- Multiple codons can code for the same amino acid. For example, leucine is encoded by six different codons.
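Putting these features together, translation can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: it writes the standard genetic code with DNA letters (packed into a 64-letter string as a compact shortcut of our own), assumes the first ATG in the sequence marks the start, and reads codons until it reaches a stop codon. Amino acids come out as one-letter codes (M for methionine, L for leucine, and so on), and `translate` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code written with DNA letters. The 64 codons are listed
# with bases in T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...);
# '*' marks the three stop codons TAA, TAG and TGA.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon, so no protein-coding region to read
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # a stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GCATGCTGCGAAACTTTGGCTGA"))  # MLRNFG
```

For the example sequence, the reading starts at ATG (Met), continues through CTG, CGA, AAC, TTT, and GGC, and stops at TGA. The table also shows the redundancy of the code: six different codons map to L.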
Note: Protein-building machinery does not read DNA directly. Instead, it reads an intermediate molecule, called messenger RNA, that is copied from DNA. Learn more about this process in The Connection Between DNA, Proteins, and Genes.
Mutations
Mutation is a process that makes a permanent change in a DNA sequence. Changing a gene's DNA sequence can change the amino acid sequence of the protein it codes for.
Point mutations are single base changes in a gene's DNA sequence. They can be further categorized:
- Missense mutations cause a single amino acid change within the protein.
- Nonsense mutations create a premature "stop" codon, so the protein is cut short.
- Silent mutations do not cause amino acid changes, because the new codon codes for the same amino acid as the original one.
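The three categories can be told apart by comparing the amino acid before and after the change. Below is a minimal Python sketch, assuming the mutation falls within a single codon and using the standard genetic code. The helper name `classify_point_mutation` is hypothetical. A stop codon mutating into an amino acid falls outside the three categories above; this sketch would simply report it as "missense".

```python
from itertools import product

# Standard genetic code with DNA letters; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(codon: str, mutated: str) -> str:
    """Classify a single-base change within one codon as silent, nonsense or missense."""
    if sum(a != b for a, b in zip(codon, mutated)) != 1:
        raise ValueError("a point mutation changes exactly one base")
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"    # same amino acid, thanks to the code's redundancy
    if after == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # a different amino acid


print(classify_point_mutation("CTG", "CTA"))  # silent: both code for leucine (L)
print(classify_point_mutation("CTG", "CCG"))  # missense: leucine becomes proline (P)
print(classify_point_mutation("TGG", "TGA"))  # nonsense: tryptophan (W) becomes stop
```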
Insertion and Deletion Mutations
Insertion mutations and deletion mutations add or remove one or more DNA bases. Unless the number of bases added or removed is a multiple of 3, an insertion or deletion shifts the reading frame of a gene, changing how every base after the mutation is grouped into codons. These changes, called frameshift mutations, can greatly alter a protein's amino acid sequence.
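The effect of a frameshift can be sketched in Python using the example sequence GCATGCTGCGAAACTTTGGCTGA, whose ATG start codon begins at position 2 (counting from 0). This is an illustration under stated assumptions: the bases inserted or deleted right after the start codon are arbitrary choices, the standard genetic code is used, and `translate_from` is a hypothetical helper that keeps reading from the original start position.

```python
from itertools import product

# Standard genetic code with DNA letters; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from(seq: str, start: int) -> str:
    """Translate codons from `start` until a stop codon or the end of `seq`."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "GCATGCTGCGAAACTTTGGCTGA"
start = 2  # position of the ATG start codon
after_start = start + 3  # edit the sequence just after the start codon

insert_one = original[:after_start] + "A" + original[after_start:]
delete_one = original[:after_start] + original[after_start + 1:]
insert_three = original[:after_start] + "GCA" + original[after_start:]

print(translate_from(original, start))      # MLRNFG
print(translate_from(insert_one, start))    # MTAKLWL  (frameshift)
print(translate_from(delete_one, start))    # MCETLA   (frameshift)
print(translate_from(insert_three, start))  # MALRNFG  (one extra amino acid)
```

Adding or removing a single base changes every amino acid after the mutation, and in both cases the original TGA stop codon is no longer in frame. Inserting three bases adds one amino acid (alanine, A) but leaves the rest of the protein unchanged.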