Anatomy of a Gene

Genes are best known as the instructions for building proteins. However, only a portion of the nucleotides in a gene actually code for the protein itself. Other parts of the gene provide additional information—including sequences that control when, where, and how much protein to make.

More about genes

The average human protein-coding gene is about 3,000 letters long, but our genes come in a wide range of sizes. The shortest has only 500 letters, and the longest has 2.3 million.

Given their importance, genes make up a surprisingly small proportion of the human genome. Our 21,000 or so protein-coding genes account for less than 2% of the genome's total nucleotides. Another small chunk of the genome contains non-coding genes, which code for RNA products like transfer and ribosomal RNA that are not translated into proteins. But the bulk of the genome doesn't code for any product at all. It does, however, provide the necessary structure and organization that keep our genes working properly.

Alternative splicing allows cells to use the information in our genes in different ways

By joining different combinations of exons, our cells can make different mRNAs from the same gene. This process, known as "alternative splicing," allows our cells to use the information in our genes in different ways. For example, for many proteins, one version (or "isoform") is embedded in the cell membrane, while another, shorter version floats freely. Thanks to alternative splicing, our cells can make many more proteins than we have genes.
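To see how quickly exon combinations add up, here is a minimal Python sketch. It is only an illustration: the exon names (E1 to E4) and the `splice_isoforms` function are hypothetical, and the sketch assumes every optional exon can be kept or skipped independently. Real cells tightly regulate splicing and use only some of the possible combinations.

```python
from itertools import combinations

def splice_isoforms(exons, constitutive):
    """List every exon combination that keeps the constitutive exons.

    Exons always stay in their original order; optional exons are
    either included or skipped (a simplifying assumption).
    """
    optional = [e for e in exons if e not in constitutive]
    isoforms = []
    for k in range(len(optional) + 1):
        # Choose k optional exons to leave out of the mRNA.
        for skipped in combinations(optional, k):
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms

# Hypothetical gene with four exons; the first and last are always kept.
exons = ["E1", "E2", "E3", "E4"]
isoforms = splice_isoforms(exons, constitutive={"E1", "E4"})

for iso in isoforms:
    print("-".join(iso))
```

With two optional exons, this one gene yields four possible mRNAs. Each additional optional exon doubles the count in this simplified model, which is one way to picture how a limited number of genes can give rise to many more proteins.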

More-complex organisms like humans don't typically have more genes than simpler organisms. Rather, our genomes have more sophisticated control mechanisms that allow our genes to be used in more ways, leading to greater complexity.

Switches are used in combination

Switches give cells the flexibility to react to signals from the outside world. They are also central to differentiation, the process by which a cell takes on an identity—as a liver cell as opposed to a skin cell, for example. Each cell type has a different combination of active and inactive genes. Whether a gene is turned "on" or "off" is regulated in part by switch proteins.

Our genome contains 21,000 or so genes, but only about 2,600 of those code for DNA-binding proteins, the proteins that activate switches. So how does a limited number of switch proteins regulate such a large number of genes? The cell uses two basic strategies to get the most out of its switch proteins. First, switches are used in combination. Many genes are controlled by multiple switches, each activated by a different switch protein, and different combinations of switches are used to activate different genes. Second, genes encoding proteins involved in the same process often have similar switches that are activated by the same switch proteins. In other words, a single switch protein may activate multiple genes.
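The two strategies can be pictured with a small Python sketch. Everything here is made up for illustration: the gene names, the switch protein names (P1 to P3), and the `active_genes` function. It also assumes a gene turns on only when all of its switches are activated, which is a simplification of how real gene regulation works.

```python
# Hypothetical map from each gene to the switch proteins it needs.
GENE_SWITCHES = {
    "gene_A": {"P1", "P2"},
    "gene_B": {"P2", "P3"},
    "gene_C": {"P1", "P3"},
    "gene_D": {"P1", "P2"},  # same process as gene_A, so same switches
}

def active_genes(present_proteins, gene_switches=GENE_SWITCHES):
    """Return the genes whose required switch proteins are all present."""
    present = set(present_proteins)
    return sorted(
        gene for gene, needed in gene_switches.items()
        if needed <= present  # every required protein is available
    )

# Two different cell types, each with its own mix of switch proteins.
print(active_genes({"P1", "P2"}))
print(active_genes({"P2", "P3"}))
```

In this toy model, three switch proteins are enough to give different cells different sets of active genes, and the pair P1 and P2 turns on both gene_A and gene_D at once.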

Other important pieces

At either end of the mRNA are 5-prime and 3-prime untranslated regions (UTRs). The UTRs are assembled from what are considered to be exons, even though they don't directly code for protein. They do, however, contain sequences that are important in the protein-building process. For example, many UTRs contain sequences that help the ribosome attach and detach, influence how much protein is made, and affect the lifespan of the mRNA. Some also contain "localization" signals, special tags that keep an mRNA within a specific area of the cell. UTRs vary in size from about 100 to a few thousand nucleotides.

  • Funding

    Funding provided by grant 51006109 from the Howard Hughes Medical Institute, Precollege Science Education Initiative for Biomedical Research.