Protein Domains: Structure, Function, and Modularity

Domains are separate from primary, secondary, tertiary, and quaternary structure
Definition: a region of a protein that has its own discrete fold. If that stretch of the polypeptide were isolated from the rest of the protein, it would still adopt essentially the same fold. A domain therefore sits between secondary structure and the overall tertiary fold: it can be built from different secondary structure elements (often a mix of alpha helices and beta sheets) but has its own characteristic fold
Domains generally have their own functions too, such as:
- enabling dimerization (protein–protein interactions)
- helping localize the protein by binding other molecules (proteins, DNA, lipids)
- having enzymatic activity
Many proteins are composed of multiple domains, contributing to modular construction of protein function

SRC (sarcoma protein kinase) is a signaling enzyme with three domains, color-coded in diagrams: a C-terminal kinase domain (yellow/orange), an SH2 domain (blue), and an SH3 domain (green)
Kinase domain (C-terminal):
- ATP is sandwiched between two lobes; this domain functions as a kinase and carries out phosphorylation
- The domain is highly structured and contains both alpha helices and beta sheets
SH2 domain (blue): binds phosphotyrosine residues
SH3 domain (green): binds sequences containing proline and hydrophobic amino acids
SH2 and SH3 domains provide regulatory functions for SRC; these domains are common in many proteins and will be revisited in later topics on protein modification and regulation

SH2 domain (approx. $100$ amino acids) shown in four representations to illustrate folds
- Backbone model (top-left): used for overlays of domain folds; shows only backbone carbons and nitrogens
- Ribbon model (top-right): highlights secondary structure; makes alpha helices and beta sheets visually distinct
- Space-filling model (bottom-right): uses van der Waals radii to show how much space the domain occupies
- Wireframe model (bottom-left): shows amino acid side chains; useful for inspecting active sites and interaction surfaces
Proteins and domains are often shown as a mix of these representations (e.g., bulk of the protein in backbone or ribbon, with active-site residues in wireframe or the substrate in space-fill)

Cytochrome B562 (left): single-domain protein composed of alpha helices; involved in electron transport; shown using ribbon representation
NAD-binding domain of lactate dehydrogenase (center): core contains a mix of alpha helices and beta sheets
Immunoglobulin variable domain (right): beta-sheet structure, largely antiparallel; contains unstructured regions (linkers) represented in yellow that connect adjacent secondary structure elements
Unstructured regions (linkers): flexible sequences that connect helices to sheets or sheets to sheets, enabling dynamic interactions

Domains are generally small, modular parts of proteins that can be composed of alpha helices, beta sheets, or a mix
Each domain has its own fold and function, contributing to the overall properties of the protein

Homeodomain (DNA-binding domain) shown in ribbon (left) and backbone overlay (right)
Ortholog comparison: yeast vs Drosophila (two billion years of evolution, $2\times 10^{9}$ years)
Sequence conservation is low yet structural fold is highly conserved:
- 60 amino acids examined, with only $\frac{17}{60} \approx 0.283$ identical (about 28.3%)
- Despite this, the backbone overlay shows nearly identical fold, indicating that primary sequence can diverge while the domain fold remains conserved
Concept reinforced: at the level of domains, amino acid sequences can diverge substantially while still specifying the same conserved structural fold
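
The identity figure above is simple arithmetic over an alignment: count the positions where both sequences carry the same residue and divide by the alignment length. A minimal sketch follows, assuming the two sequences are already aligned to equal length. The function name `percent_identity` and the toy strings are hypothetical, chosen only for illustration; they are not the real yeast or Drosophila homeodomain sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions carrying the same residue.

    Assumes both strings are already aligned to the same length.
    A gap character ("-") is never counted as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches / len(seq_a)


# Toy aligned strings (hypothetical, not real homeodomain sequences).
toy_a = "MKTAYIAKQR"
toy_b = "MKSGWLPEQN"
toy_identity = percent_identity(toy_a, toy_b)

# The figure quoted in the notes: 17 identical positions out of 60.
homeodomain_identity = 17 / 60
print(f"{homeodomain_identity:.1%}")
```

Note that this measures sequence identity only. It says nothing about the fold, which is why a backbone overlay is needed to show that the structure is conserved.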

Fibronectin example: extracellular matrix protein composed of four adjacent, highly similar domains (fibronectin type III domains)
- These four domains are nearly identical because they arose by tandem duplication at the genomic level
- Concept: tandem duplication increases the number of copies of a domain within a protein
- Similar phenomena occur with cadherins (cell–cell adhesion proteins) showing repeated domains
Domain architecture as a recurrent theme in extracellular and signaling proteins

Domain shuffling slides show multiple proteins built from a combination of domains
Mechanism: accidental joining of DNA sequences encoding different domains during evolution; if the new gene/protein is useful, it is conserved
Visual takeaway: domains act as building blocks shared across many proteins; proteins—especially those involved in signaling—often assemble from common domain modules found across different genes
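
The two mechanisms described so far, tandem duplication and domain shuffling, can be pictured as operations on an ordered list of domain names (N-terminus first). The sketch below is a toy model only. The function names `tandem_duplicate` and `shuffle_join` and the label `FN3` (for a fibronectin type III domain) are hypothetical, and the fusion example is purely illustrative: it makes no claim about how any real gene actually arose.

```python
from typing import List


def tandem_duplicate(arch: List[str], index: int) -> List[str]:
    """Return a copy of `arch` with the domain at `index` repeated right after itself."""
    if not 0 <= index < len(arch):
        raise IndexError("index out of range")
    return arch[: index + 1] + [arch[index]] + arch[index + 1:]


def shuffle_join(upstream: List[str], downstream: List[str]) -> List[str]:
    """Model an accidental fusion of two domain-encoding regions into one gene."""
    return list(upstream) + list(downstream)


# Three rounds of tandem duplication turn one domain into four adjacent copies.
fn_repeat = ["FN3"]
for _ in range(3):
    fn_repeat = tandem_duplicate(fn_repeat, 0)

# Hypothetical fusion of two binding modules with a catalytic domain.
fused = shuffle_join(["SH3", "SH2"], ["kinase"])
```

In this picture, selection is the step the code leaves out: a new arrangement persists only if the resulting protein is useful.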

Five proteins are shown; all except EGF (a growth factor, at the top) are proteases that share a common protease domain (brown) at the C-terminus
Examples:
- Chymotrypsin (simple digestive enzyme): protease domain alone, with no other domains
- Urokinase, Factor IX, Plasminogen: multi-domain proteases with additional regulatory domains
Factor IX: multi-domain architecture with
- Calcium-binding domain (yellow) that enables binding to phospholipids in a calcium-dependent fashion
- Two EGF-like domains (green) that facilitate binding to tissue factor on sub-endothelial cells and platelets, directing activity to the right place at the right time during blood clotting
Plasminogen: protease domain plus five kringle domains (blue), which mediate binding to clots and so localize the protease activity, enabling clot breakdown
The shared protease domain, placed at the C-terminus in each case, shows how domain shuffling can pair a catalytic domain with regulatory or targeting domains to achieve precise control of activity
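
The architectures described above can be written down as ordered lists of domain names, which makes the shared modules easy to query. This is a minimal sketch: the dictionary holds only the domain counts stated in these notes (urokinase is left out because its extra domains are not specified here). The order of the non-protease domains is an assumption, and the helper names `proteins_with` and `domain_counts` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Domain lists, N-terminus first. Only the C-terminal position of the
# protease domain comes from the notes; the rest of the order is assumed.
ARCHITECTURES: Dict[str, List[str]] = {
    "EGF": ["EGF"],
    "Chymotrypsin": ["protease"],
    "Factor IX": ["calcium-binding", "EGF", "EGF", "protease"],
    "Plasminogen": ["kringle"] * 5 + ["protease"],
}


def proteins_with(domain: str) -> List[str]:
    """Names of proteins containing at least one copy of `domain`, sorted."""
    return sorted(name for name, doms in ARCHITECTURES.items() if domain in doms)


def domain_counts(protein: str) -> Counter:
    """How many copies of each domain type the named protein contains."""
    return Counter(ARCHITECTURES[protein])


proteases = proteins_with("protease")
egf_containing = proteins_with("EGF")
kringle_copies = domain_counts("Plasminogen")["kringle"]
```

Querying by domain rather than by protein mirrors the point of the slide: the same module (for example, the EGF-like domain) shows up in proteins with otherwise different functions.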

Domains are a separate class of structural organization from the classic four levels of structure; they are units that can fold independently
They provide specific properties: catalytic activity, binding to other proteins or molecules, or regulatory roles
Domain sharing is common: the same or similar domains appear in many different proteins due to domain shuffling and duplication
The modular nature of domains underpins evolution of complex signaling networks and multifunctional enzymes
Takeaway: understanding domains helps explain protein function, evolution, and how multi-domain proteins achieve precise spatial and temporal control of activity

Next video topic: covalent modification of proteins and protein regulation (to connect domain structure with regulation and control of activity)