Gene Expression: How Cells Turn DNA Information into Proteins
Transcription and RNA Processing
What “gene expression” means (and why cells need it)
Gene expression is the process by which the information stored in DNA is used to build functional products—usually proteins (or sometimes functional RNAs). It matters because DNA is like a long-term storage archive: most of it stays protected in the genome, but cells constantly need specific molecules (enzymes, receptors, structural proteins) to do work right now. Expressing the right genes in the right amounts is how a liver cell behaves like a liver cell while a neuron behaves like a neuron, even though they have (essentially) the same DNA.
In AP Biology, gene expression is often taught as two big stages:
- Transcription: copying a gene’s DNA sequence into RNA.
- Translation: using RNA to assemble a polypeptide (protein).
This section starts with transcription and then explains how eukaryotic cells process RNA so it can be translated accurately.
Transcription: making an RNA copy of a gene
Transcription is the synthesis of an RNA molecule using DNA as a template. The key idea is complementary base pairing, similar to DNA replication, but with important differences:
- RNA uses uracil (U) instead of thymine (T).
- Only one strand of DNA is used as a template for a given gene.
- The enzyme is RNA polymerase, not DNA polymerase.
The roles of the DNA strands (template vs. coding)
For any gene, one DNA strand is the template strand (the strand RNA polymerase reads). The RNA sequence is complementary to this template.
The other strand is the coding strand (also called the non-template strand). Its sequence matches the RNA sequence except that DNA has T where RNA has U.
A common confusion is thinking “the gene is on one strand.” In reality, genes can be located on either strand of the double helix, but each individual gene is transcribed from only one strand.
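As a rough illustration of how the two strands relate, the sketch below starts from a coding strand and derives both the template strand and the RNA. The sequence and the helper names (`template_from_coding`, `rna_from_coding`) are made up for this example.

```python
# Illustrative only: the sequence and helper names are hypothetical.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def template_from_coding(coding_5to3):
    """Return the template strand, written 3'->5', aligned under the coding strand."""
    # Complementing base by base (without reversing) keeps the two strands
    # aligned, so the result reads 3'->5' from left to right.
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3)


def rna_from_coding(coding_5to3):
    """Return the RNA (5'->3'): same as the coding strand, with U replacing T."""
    return coding_5to3.replace("T", "U")


coding = "ATGAAGTGCTAA"                   # coding strand, 5'->3'
template = template_from_coding(coding)   # template strand, 3'->5'
rna = rna_from_coding(coding)             # transcript, 5'->3'

print(template)  # TACTTCACGATT
print(rna)       # AUGAAGUGCUAA
```

The RNA is never built from the coding strand in the cell; the T-to-U swap is only a shortcut for checking an answer.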
Direction matters: 5' and 3'
RNA polymerase adds nucleotides to the 3' end of the growing RNA, so RNA is synthesized 5' to 3'. That means RNA polymerase reads the DNA template 3' to 5'.
Students often memorize directions without understanding them. A helpful way to see it: the chemistry of polymerization requires adding to the 3' hydroxyl group, so the chain can only grow at that end.
Steps of transcription: initiation, elongation, termination
Even though details vary between prokaryotes and eukaryotes, the same three-stage structure applies.
Initiation (starting at the promoter)
A promoter is a DNA sequence where RNA polymerase (and associated proteins) binds to begin transcription. Promoters do two crucial jobs:
- They mark where transcription begins.
- They determine which direction transcription proceeds (which strand becomes the template).
In eukaryotes, RNA polymerase does not act alone. Transcription factors are proteins that help RNA polymerase bind to the promoter and start correctly. This is a common exam theme: if you disrupt transcription factors or promoter sequences, transcription drops or stops.
In prokaryotes, RNA polymerase can bind promoters with the help of a sigma factor (you do not always need to name sigma for AP-level responses, but you should understand that accessory proteins help locate promoters).
Elongation (building the RNA strand)
During elongation, RNA polymerase moves along the template strand and adds ribonucleotides (A, U, C, G) to the growing RNA. Base pairing rules:
- DNA A pairs with RNA U
- DNA T pairs with RNA A
- DNA C pairs with RNA G
- DNA G pairs with RNA C
RNA polymerase locally unwinds DNA as it moves and then DNA re-forms behind it. Unlike DNA polymerase, RNA polymerase does not require a primer.
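The pairing rules and the direction rule can be combined in a small sketch. It assumes the template is given as a plain string; the `written` parameter is a hypothetical convenience for stating which way that string is oriented, since exam questions write templates both ways.

```python
# DNA template base -> RNA base added by RNA polymerase (pairing rules above).
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template, written="3to5"):
    """Return the RNA transcript written 5'->3'.

    `written` says how the template string is oriented: "3to5" (the direction
    the polymerase reads) or "5to3" (the usual way DNA is written).
    """
    if written == "5to3":
        template = template[::-1]   # flip so the string reads 3'->5'
    elif written != "3to5":
        raise ValueError("written must be '3to5' or '5to3'")
    # Reading the template 3'->5' builds the RNA 5'->3', one base at a time.
    return "".join(PAIRING[base] for base in template.upper())


print(transcribe("TACCGAAAAACT"))           # AUGGCUUUUUGA
print(transcribe("TCAAAAAGCCAT", "5to3"))   # AUGGCUUUUUGA
```

Both calls describe the same template strand, so they give the same transcript. Forgetting to flip a template written 5' to 3' is the usual source of a backwards answer.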
Termination (stopping transcription)
Termination is the stage at which RNA polymerase stops transcribing and releases the RNA transcript.
- In prokaryotes, termination often involves specific DNA sequences that cause the polymerase to stop.
- In eukaryotes, termination is tied to processing signals; transcription typically continues past the coding region and the RNA is later cut to form the mature 3' end.
Prokaryotes vs. eukaryotes: why location changes everything
A high-yield distinction is that prokaryotes lack a nucleus.
| Feature | Prokaryotes | Eukaryotes |
|---|---|---|
| Where transcription occurs | Cytoplasm (nucleoid region) | Nucleus |
| Where translation occurs | Cytoplasm | Cytoplasm (after RNA export) |
| Are transcription and translation coupled? | Yes—ribosomes can translate as RNA is made | No—RNA must be processed and exported |
| RNA processing | Minimal | Extensive (cap, tail, splicing) |
This matters because it explains why eukaryotes need RNA processing: they must protect and prepare RNA for export, stability, and accurate translation.
RNA processing in eukaryotes: turning pre-mRNA into mRNA
In eukaryotes, the initial transcript is called pre-mRNA (or primary transcript). Before it can be translated, it is processed into mature mRNA.
1) 5' cap
A 5' cap is a modified guanine nucleotide added to the 5' end of the pre-mRNA.
Why it matters:
- Protects mRNA from degradation by enzymes.
- Helps ribosomes recognize and bind the mRNA during translation.
- Assists with export from the nucleus.
A common misconception is that the cap “starts translation” by itself. It helps with ribosome binding in eukaryotes, but translation still requires a start codon and proper initiation factors.
2) Poly-A tail
A poly-A tail is a string of adenine nucleotides added to the 3' end of the mRNA.
Why it matters:
- Increases mRNA stability (slows degradation).
- Helps with nuclear export.
- Can improve translation efficiency.
Students sometimes assume the poly-A tail is coded in the DNA as a long run of T’s. It’s typically added enzymatically after transcription using a signal in the RNA, not directly transcribed as a long poly-A sequence.
3) RNA splicing: removing introns, joining exons
Many eukaryotic genes contain introns (non-coding regions) and exons (expressed regions that remain in the final mRNA). RNA splicing removes introns and joins exons together.
Splicing is carried out by the spliceosome, a complex made of proteins and small RNAs.
Why splicing matters:
- It ensures the coding sequence is continuous so ribosomes read the correct codons.
- It allows alternative splicing, where the same pre-mRNA can be spliced in different ways to produce different proteins.
Alternative splicing (big idea: one gene, multiple proteins)
Alternative splicing means different combinations of exons are included in the final mRNA. This is a major reason eukaryotes can produce a wide variety of proteins without having a separate gene for each protein variant.
What can go wrong:
- A mutation at a splice site can cause an intron to remain or an exon to be skipped. That often shifts the reading frame, producing a nonfunctional protein.
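A minimal sketch of alternative splicing and of the reading-frame arithmetic behind exon skipping follows. The exon sequences and the names `splice` and `keeps_reading_frame` are hypothetical, and real splicing depends on splice-site signals that this toy model ignores.

```python
# Hypothetical exon sequences chosen only for illustration.
EXONS = {
    "exon1": "AUGGCU",
    "exon2": "UUUCCA",
    "exon3": "GGAUAA",
}


def splice(include):
    """Join the named exons in order, as if the introns between them were removed."""
    return "".join(EXONS[name] for name in include)


def keeps_reading_frame(bases_added_or_removed):
    """A length change preserves the downstream frame only if it is a multiple of 3."""
    return bases_added_or_removed % 3 == 0


isoform_a = splice(["exon1", "exon2", "exon3"])   # all exons kept
isoform_b = splice(["exon1", "exon3"])            # exon2 skipped

print(isoform_a)  # AUGGCUUUUCCAGGAUAA
print(isoform_b)  # AUGGCUGGAUAA

# Skipping the 6-base exon2 removes two whole codons, so the frame is kept.
print(keeps_reading_frame(len(EXONS["exon2"])))   # True
# Skipping a hypothetical 7-base exon would shift every downstream codon.
print(keeps_reading_frame(7))                     # False
```

The same length check applies to a retained intron: the downstream frame survives only when the extra bases come in a multiple of three.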
“Show it in action”: transcription and processing example
Suppose a short region of the DNA template strand is:
- Template DNA: 3'–T A C G G A T T A C–5'
The RNA polymerase reads 3' to 5' and builds complementary RNA 5' to 3'. The mRNA transcript (before processing) would be:
- pre-mRNA: 5'–A U G C C U A A U G–3'
Now imagine that the gene also contains an intron, so the pre-mRNA carries extra non-coding bases (shown as [intron]) between two exons:
- pre-mRNA exons/intron: 5'–AUG CCU [intron] AAUG–3'
After splicing, the intron is removed and exons join:
- mature mRNA: 5'–AUG CCU AAUG–3'
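Even in this simplified example, notice that the grouping into codons depends on exactly which bases are removed: if the intron were left in, or if a few exon bases were cut out along with it, the codons after CCU would change. That is why splicing errors can drastically change a protein.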
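The worked example can be replayed in a few lines. The template is the one given above; the 5-base intron `GUAAG` and its position are assumptions made only to give the intron some actual bases to remove.

```python
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}   # template DNA -> RNA


def codons(rna):
    """Split an RNA string into consecutive groups of three (last group may be short)."""
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


template = "TACGGATTAC"                               # 3'->5', from the example
exon_rna = "".join(PAIRING[base] for base in template)
print(exon_rna)                                       # AUGCCUAAUG

# Hypothetical intron inserted between CCU and AAUG.
INTRON = "GUAAG"
INTRON_START = 6
pre_mrna = exon_rna[:INTRON_START] + INTRON + exon_rna[INTRON_START:]

# Correct splicing removes exactly the intron bases and rejoins the exons.
mature = pre_mrna[:INTRON_START] + pre_mrna[INTRON_START + len(INTRON):]

print(codons(mature))     # ['AUG', 'CCU', 'AAU', 'G']
print(codons(pre_mrna))   # ['AUG', 'CCU', 'GUA', 'AGA', 'AUG']
```

Comparing the two printed lists shows the point of the example: with the intron retained, every codon after CCU is different.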
Memory aids
- Cap at the front, tail at the end: 5' cap and 3' poly-A tail protect the message.
- INtrons are IN the way: introns get removed.
Exam Focus
- Typical question patterns:
  - Given a DNA template or coding strand, determine the RNA transcript (watch strand orientation and base pairing).
  - Predict effects of mutations in a promoter, splice site, or poly-A signal on mRNA production and protein levels.
  - Compare prokaryotic vs. eukaryotic gene expression (especially coupling and RNA processing).
- Common mistakes:
  - Mixing up template vs. coding strand; fix this by writing “RNA matches coding strand (U for T).”
  - Writing RNA in the wrong direction; always build RNA 5' to 3'.
  - Treating introns as if they are translated; remind yourself: introns are removed before translation in eukaryotes.
Translation
What translation is (and why it matters)
Translation is the process of building a polypeptide (protein) by reading the sequence of an mRNA. If transcription is copying a recipe, translation is cooking the meal. The “language” changes from nucleotides to amino acids.
Translation is central because most cell structures and functions are carried out by proteins—enzymes that catalyze reactions, channels that move substances, and signaling molecules that coordinate responses.
The genetic code: codons specify amino acids
A codon is a three-nucleotide sequence on mRNA that specifies an amino acid or a stop signal. Because codons are read in groups of three, the mRNA has a reading frame.
Key properties of the genetic code (high-yield concepts):
- It is non-overlapping: each nucleotide is part of only one codon in a reading frame.
- It is redundant (degenerate): multiple codons can code for the same amino acid.
- It is nearly universal across life, which supports common ancestry.
Start and stop signals:
- Start codon is typically AUG, which codes for methionine (Met). It establishes the reading frame.
- Stop codons: UAA, UAG, UGA (they do not code for amino acids).
A common misconception: “AUG always means the first amino acid in the final protein is methionine.” Often it does start with Met, but in many organisms the initial Met may be removed or modified after translation. For AP questions, treat AUG as the start signal that sets the frame.
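To make "the start codon sets the reading frame" concrete, here is a small sketch that finds the first AUG and then steps through non-overlapping triplets until a stop codon. Scanning for the first AUG is a simplification of real initiation, and `reading_frame` is a hypothetical helper name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def reading_frame(mrna):
    """Return the codons from the first AUG through the first in-frame stop codon."""
    start = mrna.find("AUG")          # the start codon sets the frame
    if start == -1:
        return []                     # no start codon, so no frame to read
    frame = []
    for i in range(start, len(mrna) - 2, 3):   # step by 3: codons never overlap
        codon = mrna[i:i + 3]
        frame.append(codon)
        if codon in STOP_CODONS:
            break                     # a stop codon ends the frame
    return frame


print(reading_frame("GGAUGGCUUUUUGACC"))   # ['AUG', 'GCU', 'UUU', 'UGA']
```

The two bases before AUG and the two after UGA are ignored, which matches how untranslated regions flank the coding sequence.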
The molecular players: mRNA, tRNA, ribosomes
Translation requires coordinated work by several components.
mRNA: the message
mRNA (messenger RNA) carries the codon sequence copied from DNA to the ribosome (in eukaryotes, this means traveling from the nucleus to the cytoplasm).
tRNA: the adapter that matches codons to amino acids
tRNA (transfer RNA) molecules bring amino acids to the ribosome. Each tRNA has:
- An anticodon: a three-base sequence complementary to an mRNA codon.
- An attachment site for a specific amino acid.
Why this matters: the ribosome does not “know” amino acids directly from codons; tRNA is the translator.
tRNAs are “charged” (loaded) with the correct amino acid by enzymes called aminoacyl-tRNA synthetases. If these enzymes make mistakes, the wrong amino acid can be inserted even if codon-anticodon pairing is correct.
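Codon-anticodon pairing is antiparallel, which is easy to get backwards on paper. The sketch below assumes strict A-U and G-C pairing (no wobble) and returns the anticodon in both orientations; the function name `anticodon` is just for illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon_5to3):
    """Return the anticodon in both orientations for a codon written 5'->3'."""
    paired = "".join(RNA_COMPLEMENT[base] for base in codon_5to3)
    return {
        "3to5": paired,          # aligned directly under the codon
        "5to3": paired[::-1],    # same bases, written in the conventional direction
    }


print(anticodon("AUG"))   # {'3to5': 'UAC', '5to3': 'CAU'}
print(anticodon("GCU"))   # {'3to5': 'CGA', '5to3': 'AGC'}
```

When a question gives an anticodon without labels, check which orientation is intended before converting it back to a codon.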
Ribosomes: the protein-building machines
A ribosome is made of rRNA (ribosomal RNA) and proteins. Ribosomes have:
- A small subunit that binds mRNA.
- A large subunit that helps form peptide bonds.
A crucial concept is that ribosomes are ribozymes: rRNA helps catalyze peptide bond formation (often summarized as “the ribosome’s rRNA does the catalysis”). This supports the idea that RNA can be both information and function.
Where translation happens
- In prokaryotes, translation occurs in the cytoplasm and can begin while transcription is still happening.
- In eukaryotes, translation occurs in the cytoplasm (either on free ribosomes or on ribosomes bound to the rough ER), after mRNA processing and export.
Steps of translation: initiation, elongation, termination
Translation is often tested as a sequence of events. Understanding the logic of each step is more important than memorizing every factor name.
Initiation: assembling the parts at the start codon
During initiation:
- The small ribosomal subunit binds the mRNA.
- An initiator tRNA pairs its anticodon with the start codon (AUG).
- The large ribosomal subunit joins to form a complete ribosome.
Ribosomes have three key binding sites for tRNA:
- A site (aminoacyl site): where the next charged tRNA enters.
- P site (peptidyl site): holds the tRNA carrying the growing polypeptide.
- E site (exit site): where empty tRNA leaves.
A helpful way to remember the order: A-P-E corresponds to the flow of tRNA through the ribosome.
Elongation: adding amino acids and moving along the mRNA
During elongation, the ribosome repeats a cycle:
- A charged tRNA enters the A site and base-pairs with the next codon.
- A peptide bond forms between the amino acid in the A site and the growing chain in the P site, which transfers the chain onto the tRNA in the A site.
- The ribosome translocates (moves) one codon forward:
  - The tRNA with the growing chain shifts from the A site to the P site.
  - The empty tRNA shifts from the P site to the E site and exits.
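The elongation cycle above can be traced with a toy model that records which codon's tRNA sits in each site. This is a deliberately simplified sketch: `elongation_trace` is a hypothetical function, each tRNA is represented only by the codon it reads, and the exact timing of E-site exit is glossed over.

```python
def elongation_trace(codons):
    """Return snapshots showing which codon's tRNA occupies each ribosome site.

    `codons` is the list of sense codons, beginning with the start codon.
    Each snapshot is a dict with keys "E", "P", and "A"; None means empty.
    """
    snapshots = []
    # After initiation, the initiator tRNA sits in the P site.
    e_site, p_site, a_site = None, codons[0], None
    for codon in codons[1:]:
        # Step 1: a charged tRNA matching the next codon enters the A site
        # (any tRNA that was in the E site has exited by now).
        e_site, a_site = None, codon
        snapshots.append({"E": e_site, "P": p_site, "A": a_site})
        # Step 2: the peptide bond forms, moving the chain onto the A-site tRNA.
        # Step 3: translocation shifts each tRNA one site over (P to E, A to P).
        e_site, p_site, a_site = p_site, a_site, None
        snapshots.append({"E": e_site, "P": p_site, "A": a_site})
    return snapshots


for snapshot in elongation_trace(["AUG", "GCU", "UUU"]):
    print(snapshot)
# {'E': None, 'P': 'AUG', 'A': 'GCU'}
# {'E': 'AUG', 'P': 'GCU', 'A': None}
# {'E': None, 'P': 'GCU', 'A': 'UUU'}
# {'E': 'GCU', 'P': 'UUU', 'A': None}
```

Reading down the output, each tRNA appears first in the A site, then the P site, then the E site, which is the A-P-E flow.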
The polypeptide grows from the N-terminus to the C-terminus (a detail that sometimes appears in deeper questions, but the essential AP takeaway is that amino acids are added in the order specified by codons).
Because the code is read in triplets, adding or removing one nucleotide (a frameshift mutation) changes every codon downstream and often produces a drastically different, nonfunctional protein.
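A quick way to see a frameshift is to regroup the codons after inserting a single base. The mRNA below is hypothetical, and `codons` is a helper written for this sketch.

```python
def codons(mrna):
    """Split into complete triplets, dropping any leftover bases at the end."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


original = "AUGGCUCAUGGAUAA"                 # hypothetical mRNA
mutant = original[:3] + "C" + original[3:]   # one base inserted after the start codon

print(codons(original))   # ['AUG', 'GCU', 'CAU', 'GGA', 'UAA']
print(codons(mutant))     # ['AUG', 'CGC', 'UCA', 'UGG', 'AUA']
```

Every codon after the insertion differs, and in this particular case the original stop codon (UAA) is no longer read in frame.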
Wobble and redundancy
The genetic code is redundant, and some tRNAs can pair with more than one codon for the same amino acid because base pairing at the third codon position is flexible (the “wobble” position). You usually do not need deep “wobble rules” for AP, but you should understand the consequence: redundancy helps buffer against some mutations, especially changes in the third base that do not alter the amino acid (silent mutations).
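Redundancy can be checked directly against the standard codon table. The sketch below builds that table from a compact one-letter string (bases in U, C, A, G order, `*` for stop) and asks how often the third base is free to change; the variable and function names are illustrative choices.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons specify each amino acid (stop codons excluded)?
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
print(codons_per_amino_acid["L"])   # 6  (leucine)
print(codons_per_amino_acid["M"])   # 1  (methionine)


def third_base_is_free(first_two):
    """True if all four codons sharing these first two bases give the same product."""
    return len({CODON_TABLE[first_two + third] for third in BASES}) == 1


print(third_base_is_free("GC"))     # True: GCU, GCC, GCA, GCG are all alanine
print(third_base_is_free("AU"))     # False: AUG is Met, the other three are Ile

free_boxes = [a + b for a, b in product(BASES, repeat=2) if third_base_is_free(a + b)]
print(len(free_boxes))              # 8 of the 16 two-base prefixes
```

For half of the two-base prefixes, any change at the third position is silent, which is the buffering effect described above.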
Termination: stopping at a stop codon
When a stop codon enters the A site, no tRNA matches it. Instead, a release factor binds, prompting the ribosome to release the completed polypeptide. The ribosomal subunits then dissociate and can be reused.
“Show it in action”: translating an mRNA sequence
Consider the mRNA sequence:
- 5'–AUG GCU UUU UGA–3'
Step-by-step:
- AUG: start codon, codes for Met (methionine). Translation begins and the reading frame is set.
- GCU: codes for alanine (Ala).
- UUU: codes for phenylalanine (Phe).
- UGA: stop codon. Translation ends.
Resulting polypeptide (in order): Met–Ala–Phe
Now connect this to mutation reasoning (common on exams): if the second codon changes from GCU to GCA, it still codes for alanine (silent mutation). But if the second codon changes to UAA, that is a stop codon—translation would terminate early (a nonsense mutation), producing a shorter protein.
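The same walk-through can be reproduced in code. This is a minimal sketch that assumes the standard genetic code and starts reading at the first AUG; the `translate` function and the table names are written for this example rather than taken from any library.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna):
    """Translate from the first AUG until a stop codon; return three-letter names."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # stop codon: release the polypeptide
            break
        peptide.append(THREE_LETTER[amino_acid])
    return peptide


print("-".join(translate("AUGGCUUUUUGA")))   # Met-Ala-Phe
print("-".join(translate("AUGGCAUUUUGA")))   # Met-Ala-Phe  (GCU -> GCA, silent)
print("-".join(translate("AUGUAAUUUUGA")))   # Met          (GCU -> UAA, nonsense)
```

The second and third calls correspond to the two mutations discussed above: one leaves the polypeptide unchanged, the other truncates it after the first amino acid.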
Polyribosomes: making many proteins quickly
A single mRNA can be translated by multiple ribosomes at the same time, forming a polyribosome (polysome). This increases efficiency—one transcript can produce many protein copies quickly.
This is a frequent conceptual connection: gene expression levels can increase not only by making more mRNA, but also by translating existing mRNA more efficiently.
Real-world connections: why translation is a common drug target
Because bacterial ribosomes differ structurally from eukaryotic ribosomes, many antibiotics target bacterial translation with relatively limited harm to human cells.
- Some antibiotics block the bacterial ribosome’s function, preventing protein synthesis and therefore bacterial growth.
A typical reasoning question might ask why a drug affects bacteria but not human cells: differences in ribosome structure provide selectivity.
Common “what goes wrong” scenarios (and how to reason through them)
- Frameshift mutation (insertion/deletion not in multiples of three): changes downstream codons, often introduces an early stop codon. Expect major protein disruption.
- Nonsense mutation (codon becomes a stop codon): shorter protein, usually nonfunctional.
- Missense mutation (codon changes amino acid): effect depends on how important that amino acid is to protein structure/function.
- Start codon mutation: ribosome may not initiate properly, leading to no translation or initiation at a later AUG (shorter protein).
A common student error is assuming “any mutation changes the protein dramatically.” Due to redundancy, some mutations are silent. The key is to translate and compare, or at least classify the mutation type.
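The "translate and compare" approach can be sketched as a small classifier. It is a simplified illustration: it assumes both sequences begin at the start codon, uses the standard genetic code, and falls back to "other" for cases it does not cover (for example, a mutated stop codon). The names `translate` and `classify_mutation` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first base until a stop codon; return one-letter codes."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def classify_mutation(original, mutant):
    """Classify a mutant mRNA relative to the original (both start at AUG)."""
    length_change = len(mutant) - len(original)
    if length_change != 0:
        # Insertions or deletions shift the frame unless they are a multiple of 3.
        return "frameshift" if length_change % 3 else "in-frame insertion/deletion"
    if not mutant.startswith("AUG"):
        return "start codon lost"
    before, after = translate(original), translate(mutant)
    if after == before:
        return "silent"
    if len(after) < len(before):
        return "nonsense"            # an early stop codon shortened the protein
    if len(after) == len(before):
        return "missense"            # same length, at least one amino acid changed
    return "other"


normal = "AUGGCUUUUUGA"
print(classify_mutation(normal, "AUGGCAUUUUGA"))    # silent
print(classify_mutation(normal, "AUGGAUUUUUGA"))    # missense
print(classify_mutation(normal, "AUGUAAUUUUGA"))    # nonsense
print(classify_mutation(normal, "AUGGGCUUUUUGA"))   # frameshift
```

Classifying the mutation type says what changed in the sequence; how much the protein's function changes still depends on where the change falls.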
Linking transcription, RNA processing, and translation into one coherent flow
To really understand gene expression, keep the dependencies straight:
- If transcription is reduced (promoter mutation, missing transcription factors), there is less pre-mRNA and ultimately less protein.
- If RNA processing is defective (splicing error, missing cap/tail), mRNA may be unstable or never leave the nucleus, reducing translation.
- If translation machinery is blocked (ribosome-targeting toxin), mRNA may be present but protein is not produced.
These cause-and-effect chains show up often in AP Biology free-response questions where you must explain experimental results.
Exam Focus
- Typical question patterns:
  - Translate an mRNA (or determine amino acid sequence) and predict how a mutation changes the polypeptide (silent vs. missense vs. nonsense vs. frameshift).
  - Explain how tRNA anticodons pair with codons and how the ribosome’s A, P, and E sites coordinate elongation.
  - Compare translation in prokaryotes vs. eukaryotes (especially coupling with transcription and cellular location).
- Common mistakes:
  - Reading the mRNA in the wrong direction; always read codons 5' to 3'.
  - Losing the reading frame by grouping bases incorrectly; start at the start codon and partition into triplets.
  - Confusing codon vs. anticodon; codons are on mRNA, anticodons are on tRNA.