Once the DNA sequence of a new gene is known, how do scientists figure
out the function of its protein product?
One of the first things a scientist would do is use the Universal Genetic Code
to predict the amino acids encoded by the gene. Unfortunately, this gives only
the protein's linear amino acid sequence. Proteins don't remain in the long,
straight chains in which they're encoded: they fold, and this folding is the key to their unique functions.
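The prediction step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: the function name `translate` and the example sequence are invented here, and the code assumes the input is the coding strand of the gene, already in the correct reading frame, with no introns.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence into one-letter amino acids.

    Reads three bases at a time from the first base and stops at the
    first stop codon. Codons with unrecognized letters become "X".
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: ATG GCC AAG TAA -> Met, Ala, Lys, stop
print(translate("ATGGCCAAGTAA"))
```

The output is only the ordered list of amino acids; nothing in it describes how the chain folds.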
Proteins fold into a variety of 3-dimensional shapes. Experiments to unfold and
refold proteins have shown that the amino acid sequence itself contains all the
instructions needed for proper folding. Scientists have searched for rules governing
folding, but predicting a 3-dimensional structure from sequence alone has long proved difficult.
Knowing a protein's sequence does help, however. Having determined the sequence, the
next thing a scientist would do is compare the DNA sequence of the newly discovered gene with those
of all previously discovered genes. Perhaps the sequence of the new gene will be similar
to that of another gene whose function is already known. Scientists have already determined the functions
of many proteins using a variety of methods.
For example, they can determine:
- How big a protein is
- Where it's located in an organism or even inside a cell
- Whether it interacts with DNA, RNA, nucleotides, membranes, or other proteins
- Whether it's changed by the cell after being made
- Whether it can change other proteins by modifying them or breaking them into pieces
All of this information tells a scientist something about the possible functions of a
protein. With this knowledge, the scientist can formulate testable hypotheses about the protein's role.
If you can relate the sequence of a new protein to that of one that has already been
characterized using tests like these, you have a head start on figuring out what the new protein
might be doing in the cell. What sorts of similarities might a scientist find between a
newly discovered gene and one we know more about?
- Genes may share high sequence similarity across their entire length.
- Genes may show sequence similarity that is limited to a certain region. For example,
the protein encoded by the gene may share a well-characterized DNA-binding domain with
other proteins, while other parts of the protein are different.
- Genes may share similar motifs. Motifs are common amino acid sequences whose folded structure
is known; zinc fingers and leucine zippers are good examples. Sequences residing between motifs
can differ greatly from protein to protein, and the folded structure of these areas may be
unknown, but the known motifs will usually fold into similar shapes.
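The kinds of similarity listed above can be illustrated with a small Python sketch. It is a toy under stated assumptions: the function names, the test sequences, and the window size are all invented for illustration, and the comparison works only on sequences that are already lined up position by position. Real database searches use alignment methods that allow gaps. The zinc finger pattern is a simplified regular expression modeled on the classical C2H2 consensus (two cysteines and two histidines at characteristic spacings) and should be treated as illustrative rather than definitive.

```python
import re


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def windowed_identity(a: str, b: str, window: int) -> list[float]:
    """Percent identity in each sliding window, to reveal regional similarity."""
    return [
        percent_identity(a[i:i + window], b[i:i + window])
        for i in range(len(a) - window + 1)
    ]


# Simplified C2H2 zinc-finger-like pattern (an assumption for illustration):
# Cys, 2-4 residues, Cys, 3 residues, a hydrophobic residue,
# 8 residues, His, 3-5 residues, His.
ZINC_FINGER = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (start position, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(protein.upper())]


# Similarity across the entire length (hypothetical sequences)
print(percent_identity("ACGT", "ACGA"))

# Similarity limited to one region: high at the start, absent at the end
print(windowed_identity("AAAACCCC", "AAAAGGGG", window=4))

# A shared motif inside an otherwise arbitrary toy peptide
print(find_motifs("MKTACPVESCDRRFSRSDELTRHIRIHTGQ"))
```

A whole-length score summarizes the first case, the per-window scores expose the second, and the pattern search finds the third even when the surrounding sequence differs.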
If the new gene shares no similarities with any other known gene, the scientist will term
it "unique." Without any clues to go on, it is more difficult to propose and test hypotheses
about the gene's function, but previous research findings can be useful.
Background information about a wide variety of known proteins helps speed the study of new ones.
In the same way, basic research -- meaning research for the purpose of discovering any and all
information, not necessarily immediately useful information -- is crucial to the progress of all science.
Funding provided by a Howard Hughes Medical Institute Precollege Science Education Initiative for Biomedical Research
Institutions Award (Grants 51000125, 51000176)