We can therefore generate a large table showing domain usage for each organism whose genome has been sequenced. For example, the human genome contains the DNA sequences for about 1000 immunoglobulin domains, 500 protein kinase domains, 250 DNA-binding homeodomains, 300 SH3 domains, and 120 SH2 domains. Such tables also show that more than two-thirds of all proteins consist of two or more domains, and that the same pairs of domains typically occur in the same relative order within a protein. Although about half of all domain families are shared by archaea, bacteria, and eukaryotes, only about 5% of the two-domain combinations are similarly shared by these three kingdoms. This pattern suggests that most proteins containing especially useful two-domain combinations arose through domain shuffling relatively late in evolution.
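The bookkeeping behind a domain-usage table and the "shared combinations" comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming each protein is represented as an ordered list of domain labels (N- to C-terminus) and that a "two-domain combination" means an ordered pair of adjacent domains. The proteome data are hypothetical toy values, not real genome annotations, and the names `PROTEOMES`, `domain_usage`, `domain_pairs`, and `shared_fraction` are illustrative only.

```python
from collections import Counter

# Hypothetical toy proteomes: each protein is an ordered list of domain labels
# (N- to C-terminus). These are NOT real annotations; they only illustrate the counting.
PROTEOMES = {
    "archaea": [["kinase"], ["HTH", "kinase"], ["ABC", "HTH"], ["PAS"]],
    "bacteria": [["kinase", "HTH"], ["ABC", "HTH"], ["PAS"], ["GGDEF"]],
    "eukaryotes": [["HTH", "kinase", "SH2"], ["SH3", "SH2"], ["ABC", "HTH"], ["PAS"], ["Ig", "Ig"]],
}


def domain_usage(proteins):
    """Count how often each domain family occurs in a proteome (one row of a usage table)."""
    return Counter(domain for protein in proteins for domain in protein)


def domain_pairs(proteins):
    """Collect ordered pairs of adjacent domains; (A, B) and (B, A) count as different."""
    return {(a, b) for protein in proteins for a, b in zip(protein, protein[1:])}


def shared_fraction(item_sets):
    """Fraction of all observed items (families or pairs) that occur in every set."""
    union = set().union(*item_sets)
    if not union:
        return 0.0
    common = set.intersection(*item_sets)
    return len(common) / len(union)


if __name__ == "__main__":
    for kingdom, proteins in PROTEOMES.items():
        print(kingdom, dict(sorted(domain_usage(proteins).items())))
    families = [set(domain_usage(p)) for p in PROTEOMES.values()]
    pairs = [domain_pairs(p) for p in PROTEOMES.values()]
    print(f"shared domain families: {shared_fraction(families):.2f}")  # 0.50 for this toy data
    print(f"shared two-domain combinations: {shared_fraction(pairs):.2f}")  # 0.17 for this toy data
```

With this toy data, half of the domain families occur in all three groups but only one of the six adjacent pairs does; the numbers are fixed by construction, not measured. Treating pairs as ordered lets the same bookkeeping record whether two domains keep the same relative order within a protein.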
One surprise from the sequencing of the human genome was the discovery that our chromosomes contain only about 21,000 protein-coding genes. On the basis of this number alone, humans would appear to be no more complex than the tiny mustard weed Arabidopsis and only somewhat more complex than a nematode worm. The genome sequences also reveal that only about 7% of human protein domains are unique to vertebrates, which suggests that vertebrates inherited nearly all of their protein domains from invertebrates.
Human proteins contain roughly twice as many distinct domain combinations as the proteins of worms or flies, a difference that reflects the domain shuffling that took place during vertebrate evolution. For example, the trypsin-like serine protease domain is covalently linked to at least 18 other types of protein domains in human proteins, but to only 5 in worm proteins. This extra variety greatly increases the range of possible protein-protein interactions, but how it contributes to what makes us human is not known.
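Counting how many different domains a given domain is joined to, and how many distinct domain combinations a proteome contains, can be sketched as follows. This is a minimal illustration under stated assumptions: "covalently linked" is taken to mean "on the same polypeptide chain", each protein is an ordered list of domain labels, and `HUMAN_TOY`, `WORM_TOY`, `partner_domains`, and `architectures` are hypothetical names. The toy proteins use real domain family names but are not real annotations, so they do not reproduce the 18-versus-5 counts quoted above.

```python
# Hypothetical toy proteins (ordered domain labels). Real human and worm
# proteomes would give different counts; these only illustrate the method.
HUMAN_TOY = [
    ["Kringle", "Trypsin"],
    ["Gla", "EGF", "EGF", "Trypsin"],
    ["CUB", "Sushi", "Trypsin"],
    ["Trypsin"],
    ["SH3", "SH2"],
]
WORM_TOY = [["CUB", "Trypsin"], ["Trypsin"], ["SH3", "SH2"]]


def partner_domains(proteins, target):
    """Return the other domain families that share a polypeptide chain with `target`."""
    partners = set()
    for protein in proteins:
        if target in protein:
            partners.update(d for d in protein if d != target)
    return partners


def architectures(proteins):
    """Distinct domain combinations, treating each protein's ordered domain list as one architecture."""
    return {tuple(protein) for protein in proteins}


if __name__ == "__main__":
    print(sorted(partner_domains(HUMAN_TOY, "Trypsin")))  # ['CUB', 'EGF', 'Gla', 'Kringle', 'Sushi']
    print(sorted(partner_domains(WORM_TOY, "Trypsin")))   # ['CUB']
    print(len(architectures(HUMAN_TOY)), len(architectures(WORM_TOY)))  # 5 3
```

Using a set for partners means a domain that appears twice in one protein (EGF here) is still counted as a single partner type.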
The complexity of living things is mind-boggling, and it is sobering to realize that for more than 10,000 of the proteins identified so far through the study of the human genome, we do not have even the slightest hint of what their function might be. The next generation of cell biologists will certainly face enormous challenges, along with no shortage of exciting mysteries to solve.
The same weak noncovalent interactions that allow a protein chain to fold into a specific shape also allow proteins to bind to one another to build larger structures in the cell. Any region on a protein's surface that interacts with another molecule through a cluster of noncovalent bonds is called a binding site. A protein can contain binding sites for both large and small molecules. When a binding site recognizes the surface of another protein, the tight binding of two folded polypeptide chains at that site creates a larger protein molecule with a precisely defined geometry, called a protein complex. Each polypeptide chain in such a protein is called a protein subunit.
In the simplest case, two identical folded polypeptide chains bind to each other in a "head-to-head" arrangement, forming a symmetric complex of two protein subunits called a dimer. The dimer is held together by interactions between two identical binding sites, one on each chain.
Many other symmetric protein complexes, formed from multiple copies of the same polypeptide chain, are also common in cells.
Many of the proteins found in cells contain two or more different types of polypeptide chains. Hemoglobin, the protein that carries oxygen in red blood cells, contains two identical α-globin subunits and two identical β-globin subunits arranged symmetrically. Such multisubunit proteins are common in cells and can be very large.
Most of the proteins we have discussed so far are globular proteins, in which the polypeptide chain folds into a compact shape like a ball with an irregular surface. Some of these protein molecules can nevertheless assemble into filaments that span the length of a cell. Most simply, a long chain of identical protein molecules can be built if each molecule has a binding site complementary to another region of its own surface. One example is the actin filament, a long helical structure formed from many molecules of the globular protein actin. Actin is especially abundant in eukaryotic cells, where it forms one of the major filament systems of the cytoskeleton.
Enzymes tend to be globular proteins: even though many are large, complicated molecules made of multiple subunits, most have an overall rounded shape. Although globular proteins can associate to form long filaments, some proteins have roles that require each individual protein molecule to span a long distance. These proteins, usually called fibrous proteins, generally have a relatively simple, elongated three-dimensional structure.
α-Keratin and its relatives belong to a large family of intracellular fibrous proteins, which we first encountered when discussing the α helix. Keratin filaments are extremely resilient and are the main component of long-lasting structures such as hair, horn, and nails. An α-keratin dimer consists of two identical subunits whose long α helices wind around each other to form a coiled-coil.
Each end of the coiled-coil region is capped by a globular domain containing binding sites. These binding sites allow this class of proteins to assemble into intermediate filaments, a component of the cytoskeleton that provides the cell's internal structural framework.
Fibrous proteins are especially abundant outside the cell, where they are a major component of the extracellular matrix, a gel-like material that helps bind cells together to form tissues. Extracellular matrix proteins are secreted by cells into their surroundings, where they often assemble into sheets or long fibrils. Collagen is the most abundant protein in animal tissues. A collagen molecule consists of three long polypeptide chains, each containing the nonpolar amino acid glycine at every third position. This regular structure allows the three chains to wind around one another to form a long, regular triple helix. Many collagen molecules then bind to one another both side by side and end to end, creating long overlapping arrays that form extremely tough collagen fibrils; these fibrils give connective tissues their tensile strength.
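The "glycine at every third position" rule is regular enough to check mechanically. The sketch below is an illustration only, assuming sequences are written in standard one-letter amino acid codes (so modified residues such as hydroxyproline are not represented) and that only a triple-helical stretch is tested, not a whole chain with its non-helical ends. The function names and example sequences are hypothetical and made up for illustration.

```python
from typing import Optional


def glycine_every_third(seq: str, start: int = 0) -> bool:
    """True if every third residue, beginning at index `start`, is glycine ('G')."""
    positions = seq[start::3]
    return len(positions) > 0 and all(residue == "G" for residue in positions)


def glycine_frame(seq: str) -> Optional[int]:
    """Return the offset (0, 1 or 2) at which glycine repeats every third residue, or None."""
    for start in range(3):
        if glycine_every_third(seq, start):
            return start
    return None


if __name__ == "__main__":
    print(glycine_frame("GPPGAPGPPGER"))  # 0
    print(glycine_frame("PGPPGAPGP"))     # 1
    print(glycine_frame("MKTAYIAKQR"))    # None
```

Trying all three offsets handles a stretch that does not happen to begin on a glycine.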