The Structure and Function of Large Biological Molecules

The Molecules of Life

All living things are composed of four main classes of large biological molecules:
- Carbohydrates
- Lipids
- Proteins
- Nucleic acids
Macromolecules are large, complex molecules.
Large biological molecules possess unique properties derived from the ordered arrangement of their constituent atoms.

Macromolecules: Polymers Built from Monomers

Polymer: A long molecule composed of many similar or identical building blocks linked by covalent bonds.
Monomer: The repeating units that serve as the building blocks of a polymer.
Three of the four classes of life’s organic molecules are polymers:
- Carbohydrates
- Proteins
- Nucleic acids
Lipids are the exception: they are not true polymers and are generally too small to be considered macromolecules, although they are still grouped with the large biological molecules.

Synthesis and Breakdown of Polymers

Enzymes: Specialized macromolecules (usually proteins) that function as catalysts to speed up chemical reactions, including those that build or break down polymers.
Dehydration Reaction (Synthesis):
- Occurs when two monomers are covalently bonded together.
- Involves the loss of a water molecule ( $H_2O$ ).
- Forms a new covalent bond between the monomers.
Hydrolysis (Breakdown):
- The process by which polymers are disassembled into monomers.
- Essentially the reverse of a dehydration reaction.
- A water molecule ( $H_2O$ ) is added, breaking a covalent bond in the polymer.
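
The water bookkeeping described above can be checked with simple atom counting. The following is a minimal sketch, not a chemistry library: the names `condense` and `hydrolyze_fully` are hypothetical, and glucose is used as the example monomer only because its formula appears later in these notes. It assumes an unbranched chain, so that linking n monomers forms n - 1 bonds and releases n - 1 water molecules.

```python
from collections import Counter

WATER = Counter({"H": 2, "O": 1})
GLUCOSE = Counter({"C": 6, "H": 12, "O": 6})


def condense(monomers):
    """Atom counts of the unbranched polymer formed by dehydration reactions.

    Each new covalent bond releases one water molecule, so n monomers
    lose n - 1 waters in total.
    """
    total = Counter()
    for monomer in monomers:
        total.update(monomer)          # add the atoms of every monomer
    for _ in range(len(monomers) - 1):
        total.subtract(WATER)          # one water lost per bond formed
    return dict(total)


def hydrolyze_fully(polymer, n_monomers):
    """Atom counts after hydrolysis adds back one water per broken bond."""
    total = Counter(polymer)
    for _ in range(n_monomers - 1):
        total.update(WATER)
    return dict(total)


disaccharide = condense([GLUCOSE, GLUCOSE])
print(disaccharide)                      # {'C': 12, 'H': 22, 'O': 11}
print(hydrolyze_fully(disaccharide, 2))  # {'C': 12, 'H': 24, 'O': 12}
```

Two glucose units give $C_{12}H_{22}O_{11}$ after one water is removed, and complete hydrolysis restores the atoms of two separate glucose molecules.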

Diversity of Polymers

Each cell contains thousands of different macromolecules.
Macromolecules exhibit variation:
- Among different cells within an organism.
- To a greater degree among individuals within a species.
- Even greater variation between different species.
A vast array of polymers can be constructed from a relatively small set of monomers, much like a few letters of the alphabet form countless words.
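
The alphabet analogy can be made concrete with a quick count. This sketch assumes an unbranched polymer in which each position can hold any monomer from the set, so the number of possible sequences is the monomer count raised to the chain length. The function name `count_sequences` is hypothetical; the set sizes used (4 nucleotides, 20 amino acids) are the ones given elsewhere in these notes.

```python
def count_sequences(monomer_types, length):
    """Number of distinct linear sequences of a given length."""
    return monomer_types ** length


# A chain only 10 units long already allows a very large number of sequences.
print(count_sequences(4, 10))    # 1048576 possible 10-nucleotide sequences
print(count_sequences(20, 10))   # 10240000000000 possible 10-amino-acid sequences
```

Real polypeptides are often hundreds of amino acids long, so the number of possible sequences is far larger than these small cases suggest.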

Carbohydrates: Fuel and Building Material

Carbohydrates encompass sugars and the polymers formed from sugars.

Sugars

Monosaccharides (Simple Sugars):
- The simplest carbohydrates.
- Typically have molecular formulas that are multiples of $CH_2O$ .
- Glucose ( $C_6H_{12}O_6$ ) is the most common monosaccharide.
- Classification:
  - Location of the carbonyl group:
    - Aldose (aldehyde sugar): Carbonyl group at the end of the carbon skeleton (e.g., Glyceraldehyde, Ribose, Glucose, Galactose).
    - Ketose (ketone sugar): Carbonyl group within the carbon skeleton (e.g., Dihydroxyacetone, Ribulose, Fructose).
  - Number of carbons in the carbon skeleton:
    - Trioses: $3$ -carbon sugars (e.g., Glyceraldehyde, Dihydroxyacetone).
    - Pentoses: $5$ -carbon sugars (e.g., Ribose, Ribulose).
    - Hexoses: $6$ -carbon sugars (e.g., Glucose, Galactose, Fructose).
- In aqueous solutions, many sugars form ring structures rather than remaining in linear skeletons.
- Serve as a major fuel for cellular respiration and as raw material for synthesizing other organic molecules (e.g., in metabolic pathways).
Disaccharides:
- Formed when two monosaccharides are joined by a dehydration reaction.
- The covalent bond connecting them is called a glycosidic linkage.
- Examples:
  - Maltose: Glucose + Glucose, linked by a $1-4$ glycosidic linkage.
  - Sucrose: Glucose + Fructose, linked by a $1-2$ glycosidic linkage (table sugar).

Polysaccharides

Polymers of many sugar building blocks (monosaccharides).
Serve significant storage and structural roles.
- The specific architecture and function of a polysaccharide are determined by its sugar monomers and the positions of its glycosidic linkages.
Storage Polysaccharides:
- Starch:
  - The primary storage polysaccharide in plants.
  - Consists entirely of glucose monomers.
  - Plants store surplus starch as granules within chloroplasts and other plastids.
  - The simplest form is amylose (unbranched).
  - Amylopectin is a more complex, somewhat branched form.
  - The glucose monomers in starch are joined by $\alpha$ glycosidic linkages, causing the polymer to be largely helical in structure.
- Glycogen:
  - The main storage polysaccharide in animals.
  - Extensively branched polymer of glucose.
  - Primarily stored in liver and muscle cells.
  - Hydrolysis of glycogen in these cells releases glucose rapidly when the demand for sugar (energy) increases (e.g., during exercise or fasting).
Structural Polysaccharides:
- Cellulose:
  - A major component of the tough plant cell walls.
  - Like starch, it is a polymer of glucose.
  - However, the glycosidic linkages differ: cellulose uses $\beta$ glycosidic linkages (as opposed to $\alpha$ in starch).
  - Glucose ring forms: The two linkage types reflect two ring forms of glucose, alpha ( $\alpha$ ) and beta ( $\beta$ ), which differ in whether the hydroxyl group on carbon 1 lies below or above the plane of the ring. The two forms are in equilibrium in aqueous solution.
  - Cellulose molecules (with $\beta$ configuration) are straight and unbranched.
  - Many hydroxyl groups ( $-OH$ ) on parallel cellulose molecules can form hydrogen bonds with each other, leading to the formation of strong microfibrils.
  - Digestibility: Enzymes that digest starch by hydrolyzing $\alpha$ linkages cannot hydrolyze the $\beta$ linkages in cellulose due to the structural difference.
    - Humans cannot digest cellulose; it passes through the digestive tract as "insoluble fiber," which aids in digestion.
    - Some microbes (e.g., in the guts of cows, termites) possess enzymes to digest cellulose, forming symbiotic relationships with these herbivores.
- Chitin:
  - Another important structural polysaccharide.
  - Found in the tough exoskeletons of arthropods (insects, crustaceans).
  - Also provides structural support for the cell walls of many fungi.
  - Has practical applications, such as being used to make strong and flexible surgical thread.

Lipids: Diverse Hydrophobic Molecules

Lipids are the only class of large biological molecules that do not include true polymers.
Unifying feature: They mix poorly, if at all, with water; hence, they are hydrophobic.
Their hydrophobicity arises because they consist mostly of hydrocarbon regions, in which the carbon-hydrogen bonds are nonpolar covalent bonds.
The most biologically important lipids are fats, phospholipids, and steroids.

Fats

Constructed from two types of smaller molecules:
- Glycerol: A three-carbon alcohol with a hydroxyl group ( $-OH$ ) attached to each carbon.
- Fatty acids: Consist of a carboxyl group ( $-COOH$ ) at one end attached to a long carbon skeleton (hydrocarbon chain).
When forming a fat, three fatty acids are joined to glycerol by ester linkages via dehydration reactions.
The resulting molecule is called a triacylglycerol or triglyceride.
Fats separate from water because water molecules hydrogen-bond to each other, excluding the nonpolar fat molecules.
Fatty acids within a single fat molecule can be all the same or of two or three different kinds.
Variation in Fatty Acids:
- Vary in length (number of carbons).
- Vary in the number and locations of double bonds.
- Saturated Fatty Acids:
  - Have the maximum number of hydrogen atoms possible, meaning no double bonds between carbon atoms in the hydrocarbon chain.
  - Fats made from saturated fatty acids are called saturated fats and are typically solid at room temperature (e.g., most animal fats).
  - A diet rich in saturated fats may contribute to cardiovascular disease through plaque deposits in blood vessels.
- Unsaturated Fatty Acids:
  - Have one or more double bonds between carbon atoms in the hydrocarbon chain.
  - The double bonds often cause kinks or bends in the hydrocarbon chain (specifically, cis double bonds).
  - Fats made from unsaturated fatty acids are called unsaturated fats or oils and are typically liquid at room temperature (e.g., plant fats, fish fats).
- Hydrogenation:
  - A process that converts unsaturated fats to saturated fats by adding hydrogen atoms, thereby breaking double bonds and creating single bonds.
  - This process can also create trans double bonds in unsaturated fats, resulting in trans fats.
  - Trans fats are considered particularly unhealthy and may contribute more to cardiovascular disease than saturated fats.
- Essential Fatty Acids:
  - Certain unsaturated fatty acids that the human body cannot synthesize.
  - Must be obtained from the diet.
  - Omega-3 fatty acids are an example, required for normal growth and potentially offering protection against cardiovascular disease.
Major Function of Fats: Energy storage.
- Humans and other mammals store long-term food reserves in adipose cells.
- Adipose tissue also serves to cushion vital organs and insulate the body.

Phospholipids

In a phospholipid, two fatty acids and a phosphate group are attached to glycerol.
Structure:
- The two fatty acid tails are hydrophobic (water-fearing).
- The phosphate group and its attachments form a hydrophilic head (water-loving).
- This amphipathic nature (both hydrophilic and hydrophobic parts) is crucial for their function.
When phospholipids are added to water, they spontaneously self-assemble into double-layered structures called bilayers.
At the surface of a cell, phospholipids are arranged in a bilayer, with the hydrophobic tails pointing towards the interior of the membrane and the hydrophilic heads facing the aqueous environment inside and outside the cell.
The unique structure of phospholipids is fundamental to the formation of cell membranes, and thus, the existence of cells depends on them.

Steroids

Lipids characterized by a carbon skeleton consisting of four fused rings.
Cholesterol:
- A type of steroid that is an essential component in animal cell membranes.
- Serves as a precursor from which other steroids (e.g., sex hormones) are synthesized.
- High levels of cholesterol in the blood may contribute to cardiovascular disease.

Proteins: Diversity of Structures and Functions

Proteins are the most structurally and functionally diverse molecules, accounting for more than $50\%$ of the dry mass of most cells.
Diverse Functions of Proteins:
- Enzymatic proteins: Selective acceleration of chemical reactions (e.g., digestive enzymes catalyzing hydrolysis of food bonds).
- Storage proteins: Storage of amino acids (e.g., casein in milk, ovalbumin in egg white, plant seed proteins).
- Defensive proteins: Protection against disease (e.g., antibodies inactivating viruses and bacteria).
- Transport proteins: Transport of substances (e.g., hemoglobin transports oxygen; membrane proteins transport molecules across cell membranes).
- Hormonal proteins: Coordination of an organism’s activities (e.g., insulin regulates blood sugar concentration).
- Receptor proteins: Response of cells to chemical stimuli (e.g., nerve cell receptors detecting signaling molecules).
- Contractile and motor proteins: Movement (e.g., actin and myosin in muscle contraction; motor proteins responsible for cilia and flagella undulations).
- Structural proteins: Support (e.g., keratin in hair/horns/feathers; silk fibers in cocoons/webs; collagen and elastin in animal connective tissues).
Enzymes, most of which are proteins, act as catalysts to speed up specific chemical reactions without being consumed in the process.
- They can perform their functions repeatedly, acting as essential "workhorses" of life.

Amino Acid Polymers (Polypeptides)

All proteins are constructed from the same set of $20$ amino acids.
Polypeptides are unbranched polymers built from these amino acids.
A protein is a biologically functional molecule consisting of one or more polypeptides.
Amino Acid Structure:
- Each amino acid consists of a central $\alpha$ carbon atom bonded to:
  - An amino group ( $NH_2$ )
  - A carboxyl group ( $COOH$ )
  - A hydrogen atom ( $H$ )
  - A variable side chain, designated R group
Classification of Amino Acids by R Group Properties:
- Nonpolar side chains (hydrophobic): e.g., Glycine, Alanine, Valine, Leucine, Isoleucine, Proline, Tryptophan, Phenylalanine, Methionine.
- Polar side chains (hydrophilic): e.g., Serine, Threonine, Cysteine, Tyrosine, Asparagine, Glutamine.
- Electrically charged side chains (hydrophilic):
  - Acidic (negatively charged): e.g., Aspartic acid, Glutamic acid.
  - Basic (positively charged): e.g., Lysine, Arginine, Histidine.
Peptide Bonds:
- Amino acids are linked together by covalent bonds called peptide bonds.
- Formed via a dehydration reaction between the carboxyl group of one amino acid and the amino group of another.
Polypeptide Characteristics:
- Range in length from a few monomers to more than a thousand.
- Each polypeptide has a unique linear sequence of amino acids.
- Has a distinct amino end (N-terminus) with a free amino group and a carboxyl end (C-terminus) with a free carboxyl group.
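
The side-chain classification and the polypeptide characteristics above can be sketched in a few lines. This is an illustrative sketch only: the one-letter amino acid codes are the standard ones, the grouping follows the list given in these notes, and the function name `describe_polypeptide` and the four-residue peptide `MKDS` are hypothetical. Sequences are assumed to be written from the N-terminus to the C-terminus.

```python
from collections import Counter

# Grouping from the notes, using standard one-letter amino acid codes.
R_GROUP_CLASSES = {
    "nonpolar": "GAVLIPWFM",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
CLASS_OF = {aa: name for name, letters in R_GROUP_CLASSES.items() for aa in letters}


def describe_polypeptide(sequence):
    """Summarize a polypeptide written N-terminus to C-terminus."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    return {
        "n_terminus": sequence[0],            # residue with the free amino group
        "c_terminus": sequence[-1],           # residue with the free carboxyl group
        "peptide_bonds": len(sequence) - 1,   # one bond between each adjacent pair
        "side_chains": dict(Counter(CLASS_OF[aa] for aa in sequence)),
    }


info = describe_polypeptide("MKDS")  # hypothetical peptide: Met-Lys-Asp-Ser
print(info["n_terminus"], info["c_terminus"], info["peptide_bonds"])  # M S 3
print(info["side_chains"])
```

A chain of n amino acids contains n - 1 peptide bonds, each formed by one dehydration reaction.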

Protein Structure and Function

The specific activities of proteins are a direct result of their intricate three-dimensional architecture.
A functional protein consists of one or more polypeptides precisely twisted, folded, and coiled into a unique, biologically active shape.
The sequence of amino acids (primary structure) ultimately determines a protein’s 3D structure.
A protein’s structure dictates how it works; its function usually depends on its ability to recognize and bind to some other molecule with specificity (e.g., an antibody binding to a virus protein).

Four Levels of Protein Structure

1. Primary Structure ( $1^{\circ}$ ):
- The unique linear sequence of amino acids in a polypeptide chain.
- This sequence is like the order of letters in a very long word.
- It is determined by inherited genetic information.
2. Secondary Structure ( $2^{\circ}$ ):
- Consists of coils and folds in the polypeptide chain.
- These structures result from hydrogen bonds formed between the repeating constituents of the polypeptide backbone (not the R groups).
- Typical secondary structures include:
  - The $\alpha$ (alpha) helix: a delicate coil, stabilized by hydrogen bonds between every fourth amino acid.
  - The $\beta$ (beta) pleated sheet: a folded, accordion-like structure formed by hydrogen bonds between segments of the polypeptide backbone lying side by side (the strands may run parallel or antiparallel).
3. Tertiary Structure ( $3^{\circ}$ ):
- The overall three-dimensional shape of a polypeptide.
- Determined by interactions among the R groups (side chains) of the amino acids, rather than interactions between backbone constituents.
- These interactions can include:
  - Hydrogen bonds between polar side chains.
  - Ionic bonds between charged (acidic and basic) side chains.
  - Hydrophobic interactions and van der Waals interactions between nonpolar side chains clustered in the protein's core.
- Strong covalent bonds called disulfide bridges ( $-S-S-$ ) may form between the sulfhydryl groups of two cysteine monomers, further reinforcing the protein’s structure.
4. Quaternary Structure ( $4^{\circ}$ ):
- Arises when a protein consists of two or more polypeptide chains that aggregate to form one functional macromolecule.
- Examples:
  - Collagen: A fibrous protein composed of three polypeptides coiled together like a rope, providing connective tissue strength.
  - Hemoglobin: A globular protein consisting of four polypeptide subunits: two alpha ( $\alpha$ ) chains and two beta ( $\beta$ ) chains, each binding a heme group with iron to transport oxygen.

Sickle-Cell Disease: A Change in Primary Structure

A slight alteration in a protein’s primary structure can profoundly affect its overall structure and biological function.
Sickle-cell disease is an inherited blood disorder.
- It results from a single amino acid substitution (glutamic acid to valine) in the primary structure of the $\beta$ subunit of the protein hemoglobin.
- This change causes hemoglobin molecules to aggregate into fibers under low oxygen conditions, reducing their oxygen-carrying capacity and deforming red blood cells into a sickle shape.
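
The substitution can be located by comparing the two primary structures position by position. The sketch below uses the first seven residues of the $\beta$ chain as they are commonly shown in textbook figures (Val-His-Leu-Thr-Pro-Glu-Glu for normal hemoglobin); the position number and the short fragment are supplied here as an assumption and are not stated in these notes. The function name `find_substitutions` is hypothetical.

```python
NORMAL_BETA_START = "VHLTPEE"   # Val-His-Leu-Thr-Pro-Glu-Glu
SICKLE_BETA_START = "VHLTPVE"   # Val-His-Leu-Thr-Pro-Val-Glu


def find_substitutions(reference, variant):
    """Return (position, reference residue, variant residue) for each mismatch.

    Positions are 1-based, counted from the N-terminus. Both sequences
    must have the same length (substitutions only, no gaps).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


print(find_substitutions(NORMAL_BETA_START, SICKLE_BETA_START))  # [(6, 'E', 'V')]
```

A single mismatch, glutamic acid (E) replaced by valine (V), swaps a charged hydrophilic side chain for a nonpolar one.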

What Determines Protein Structure?

Beyond the amino acid sequence itself, physical and chemical conditions significantly affect protein structure.
Denaturation:
- The loss of a protein’s native (biologically active) structure (unraveling).
- Caused by alterations in environmental factors such as:
  - pH (changes in $H^+$ concentration)
  - Salt concentration (disrupting ionic bonds)
  - Temperature (excessive heat can disrupt weak bonds)
  - Exposure to certain chemicals
- A denatured protein is biologically inactive and usually cannot perform its function.
Renaturation: In some cases, if the denaturing agent is removed, a protein may spontaneously refold back into its correct functional shape.

Protein Folding in the Cell

It is challenging to predict a protein’s final 3D structure solely from its primary amino acid sequence.
Most proteins likely proceed through several intermediate stages on their way to a stable, functional structure.
Chaperonins (Chaperone Proteins):
- Protein molecules that assist the proper folding of other proteins.
- They provide a protective environment for polypeptide folding, preventing aggregation or incorrect folding.
- Mechanism: An unfolded polypeptide enters a hollow cylinder; a cap attaches, creating a hydrophilic environment inside for folding; the cap comes off, and the properly folded protein is released.
Misfolded proteins are associated with serious diseases, including Alzheimer’s disease, Parkinson’s disease, and mad cow disease.

Methods for Determining Protein Structure:
- X-ray crystallography: Often used to determine the precise three-dimensional structure of a crystallized protein by analyzing the diffraction pattern of X-rays.
- Nuclear Magnetic Resonance (NMR) spectroscopy: Another method that can determine protein structure, which has the advantage of not requiring protein crystallization.
- Bioinformatics: Computational approaches used to predict protein structure from amino acid sequences, especially valuable for large-scale analysis.

Nucleic Acids: Store, Transmit, and Express Hereditary Information

The amino acid sequence of a polypeptide is ultimately programmed by a unit of inheritance called a gene.
Genes are composed of DNA (deoxyribonucleic acid), which is a type of nucleic acid.
The monomers of nucleic acids are nucleotides.

The Roles of Nucleic Acids

There are two main types of nucleic acids:
- Deoxyribonucleic acid (DNA)
- Ribonucleic acid (RNA)
DNA’s Functions:
- Provides directions for its own replication when a cell divides.
- Directs the synthesis of messenger RNA (mRNA).
- Through mRNA, DNA ultimately controls protein synthesis.
This entire process from gene to protein is called gene expression.
Flow of Genetic Information (Central Dogma):
1. Synthesis of mRNA from DNA (Transcription): A gene along a DNA molecule serves as a template for the synthesis of a complementary mRNA molecule within the nucleus.
2. Movement of mRNA into cytoplasm.
3. mRNA directs synthesis of protein (Translation): The mRNA molecule interacts with the cell’s protein-synthesizing machinery (ribosomes) in the cytoplasm to direct the production of a specific polypeptide.
- Summarized as: $DNA \rightarrow RNA \rightarrow protein$
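
The flow $DNA \rightarrow RNA \rightarrow protein$ can be sketched as two string transformations. This is a simplified illustration, not a model of the cellular machinery: the function names `transcribe` and `translate` and the 12-base template are hypothetical, the template strand is assumed to be written $3^{\prime} \rightarrow 5^{\prime}$ so that the mRNA comes out $5^{\prime} \rightarrow 3^{\prime}$, and the codon table is the standard genetic code (not listed in these notes), with `*` marking stop codons.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the complementary mRNA (5'->3') from a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


template = "TACCTTCTCACT"   # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)                 # AUGGAAGAGUGA
print(translate(mrna))      # MEE
```

The example yields Met-Glu-Glu followed by a stop codon. Steps such as mRNA export from the nucleus are outside the scope of the sketch.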

The Components of Nucleic Acids

Nucleic acids are polymers known as polynucleotides.
Each polynucleotide is made up of monomers called nucleotides.
Each nucleotide consists of three parts:
- A nitrogenous base
- A pentose sugar ( $5$ -carbon sugar)
- One or more phosphate groups
A nucleoside is the portion of a nucleotide without the phosphate group (i.e., nitrogenous base + sugar).
Nitrogenous Bases (two families):
- Pyrimidines: Characterized by a single six-membered carbon-nitrogen ring.
  - Cytosine (C)
  - Thymine (T) (found only in DNA)
  - Uracil (U) (found only in RNA, replaces Thymine)
- Purines: Characterized by a six-membered ring fused to a five-membered ring.
  - Adenine (A)
  - Guanine (G)
Pentose Sugars:
- In DNA: The sugar is deoxyribose (it lacks an oxygen atom on the $2^{\prime}$ carbon compared to ribose).
- In RNA: The sugar is ribose.

Nucleotide Polymers

Nucleotides are linked together to construct a polynucleotide chain.
Adjacent nucleotides are joined by a phosphodiester linkage.
- This linkage consists of a phosphate group that covalently links the $5^{\prime}$ carbon of one sugar to the $3^{\prime}$ carbon of the next sugar.
These links create a repeating sugar-phosphate backbone along the polynucleotide, with the nitrogenous bases as appendages.
A polynucleotide has a distinct $5^{\prime}$ end (with a free phosphate group attached to the $5^{\prime}$ carbon of the sugar) and a $3^{\prime}$ end (with a free hydroxyl group attached to the $3^{\prime}$ carbon of the sugar).
The specific sequence of bases along a DNA or mRNA polymer is unique for each gene and carries the genetic information.

The Structures of DNA and RNA Molecules

DNA (Deoxyribonucleic Acid):
- Typically exists as a double helix, consisting of two polynucleotides spiraling around an imaginary central axis.
- The two sugar-phosphate backbones run in antiparallel directions to each other (one strand runs $5^{\prime} \rightarrow 3^{\prime}$ and the other runs $3^{\prime} \rightarrow 5^{\prime}$ ).
- One DNA molecule includes many genes.
- Complementary Base Pairing: Only specific pairs of nitrogenous bases can form hydrogen bonds and interact with each other in the double helix:
  - Adenine (A) always pairs with Thymine (T).
  - Guanine (G) always pairs with Cytosine (C).
- This complementary base pairing is essential for replicating DNA into two identical copies when a cell prepares to divide.
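
Because the pairing rules are fixed and the strands are antiparallel, one strand fully determines the other. A minimal sketch, assuming both strands are written in the conventional $5^{\prime} \rightarrow 3^{\prime}$ direction (the function name `complementary_strand` and the example sequence are hypothetical):

```python
# A pairs with T, G pairs with C.
PAIRING = str.maketrans("ATGC", "TACG")


def complementary_strand(strand_5_to_3):
    """Return the partner strand, also written 5'->3'.

    Pairing each base gives the partner read 3'->5'; reversing it
    accounts for the antiparallel orientation of the two backbones.
    """
    return strand_5_to_3.translate(PAIRING)[::-1]


strand = "ATGCCG"
partner = complementary_strand(strand)
print(partner)                        # CGGCAT
print(complementary_strand(partner))  # ATGCCG
```

Applying the function twice recovers the original strand, which reflects how each strand can serve as a template for rebuilding the other during replication.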
RNA (Ribonucleic Acid):
- In contrast to DNA, RNA molecules are generally single-stranded.
- Base Pairing in RNA: Although RNA is single-stranded, complementary pairing can still occur between regions of the same RNA molecule (forming complex three-dimensional shapes, e.g., in transfer RNA, tRNA) or between two different RNA molecules.
- In RNA, Uracil (U) replaces Thymine (T), so Adenine (A) pairs with Uracil (U).
- RNA molecules are more variable in form and can adopt diverse three-dimensional structures depending on their function.

Genomics and Proteomics: Transforming Biological Inquiry

Following the elucidation of DNA structure and its relationship to amino acid sequences, biologists began decoding genes by determining their base sequences.
The first chemical techniques for DNA sequencing were developed in the 1970s and refined over the next $20$ years.
The ability to sequence the full complement of DNA in an organism’s genome (all of its genetic material) has been immensely enlightening.
- The rapid development of faster and less expensive sequencing methods was significantly spurred by the Human Genome Project.
- Many genomes have now been sequenced, generating vast amounts of biological data.
Bioinformatics: An interdisciplinary field that utilizes computer software and other computational tools to manage, analyze, and interpret the enormous datasets generated from sequencing many genomes.
Genomics: The systematic study and comparison of large sets of genes, or even entire genomes, of different species.
Proteomics: A similar large-scale analysis of sets of proteins, including their sequences, structures, and functions.

DNA and Proteins as Tape Measures of Evolution

The sequences of genes and their protein products serve as a molecular record, documenting the hereditary background of an organism.
Linear sequences of DNA molecules are faithfully passed from parents to offspring.
The concept of tracing "molecular genealogy" can be extended to understand the evolutionary relationships between different species by comparing their DNA and protein sequences.
Molecular biology, through genomics and proteomics, has added a powerful new measure to the toolkit of evolutionary biology, allowing for precise comparisons at the molecular level.
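
One simple way to turn sequence comparison into a number is percent identity between aligned sequences. The sketch below is illustrative only: the three short sequences are invented for the example and do not come from real organisms, the function name `percent_identity` is hypothetical, and the sequences are assumed to be already aligned and of equal length (real comparisons also need to handle insertions and deletions).

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Invented protein fragments for three hypothetical species.
species_a = "MKTAYIAKQR"
species_b = "MKTAYLAKQR"
species_c = "MRTSYLAKHR"

print(percent_identity(species_a, species_b))  # 90.0
print(percent_identity(species_a, species_c))  # 60.0
```

Under the reasoning in this section, the higher identity between species A and B would suggest a closer relationship than that between A and C.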