What were the stated goals of the project?
- Identify all the genes in human nuclear DNA
- Determine the sequence of the 3 × 10^9 chemical base pairs that make up human DNA (estimated initial timespan: 15 years; estimated cost: 3 billion US dollars), initially focusing on euchromatin
- Provide the information to public databases
Why is studying the genome useful?
Sequence of DNA bases and features such as GC content and CpG content:
- Prediction and annotation of genes
- identification of CpG islands
DNA methylation status and histone modification status:
- gives information about the status and function of the underlying DNA sequence
Compare normal and disease states
- genetic and epigenetic
What is the shotgun method, in terms of DNA sequence assembly?
DNA is fragmented at random positions, so by chance many of the fragments overlap. These small pieces are cloned and sequenced, and the whole sequence is then reconstructed computationally from the overlaps.
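The computational step can be illustrated with a toy greedy overlap assembler. This is only a sketch: the function names, the minimum overlap of 3 bases and the example reads are all made up for illustration, and real assemblers also have to cope with sequencing errors, repeats and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


# Three hypothetical overlapping reads from the same stretch of DNA
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTGC"]
assembled = greedy_assemble(reads)
print(assembled)  # ['ATGCGTACCTGATTGC']
```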
What are markers?
Markers are ‘anonymous’ sequences (i.e. not known genes) whose location is already known. Their location was classically determined by linkage analysis in family studies, i.e. classic pedigree analysis.
What are restriction fragment length polymorphisms?
Restriction fragment length polymorphisms (RFLPs) are variations in the length of DNA fragments generated by the digestion of DNA with restriction enzymes
What are polymorphic sites?
Polymorphic sites are restriction sites that can differ between alleles (between the maternal and paternal alleles, or between different people)
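A small sketch of how a polymorphic restriction site produces an RFLP. The two allele sequences are invented for illustration; EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example enzyme, and the `digest` helper is hypothetical.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return the fragment lengths after cutting at every occurrence of site."""
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


ECORI_SITE = "GAATTC"  # EcoRI cuts between the G and the first A
ECORI_OFFSET = 1

# Allele A carries two EcoRI sites; in allele B a point mutation
# (GAATTC -> GAGTTC) has destroyed the second site.
allele_a = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TTTT"

fragments_a = digest(allele_a, ECORI_SITE, ECORI_OFFSET)
fragments_b = digest(allele_b, ECORI_SITE, ECORI_OFFSET)
print(fragments_a)  # [5, 12, 9]
print(fragments_b)  # [5, 21]
```

The two alleles give different fragment length patterns from the same enzyme, which is what is detected as an RFLP.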
What are microsatellites?
Microsatellites are repeats of short nucleotide sequences that are distributed throughout the genome.
The repeat units are 1-4 nucleotides long
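As an illustration, tandem repeats with a 1-4 nucleotide unit can be located with a regular expression. This is a minimal sketch: the threshold of four copies and the example sequence are arbitrary assumptions, and `find_microsatellites` is a hypothetical helper rather than a standard tool.

```python
import re


def find_microsatellites(sequence: str, min_repeats: int = 4):
    """Return (start, repeat_unit, copy_number) for each tandem repeat found."""
    # A unit of 1-4 bases (shortest unit tried first), followed by at least
    # min_repeats - 1 further copies of the same unit.
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Invented sequence with a (CA)5 repeat and an (A)7 repeat
sequence = "GGT" + "CA" * 5 + "CTTG" + "A" * 7 + "GC"
hits = find_microsatellites(sequence)
print(hits)  # [(3, 'CA', 5), (17, 'A', 7)]
```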
How can you sequence a large genome (if shotgun sequencing is not working)?
clone contig approach
due to the large size of the human genome and the technical difficulty in overlapping so many fragments, approaches to break this down into smaller "chunks" were developed.
a library of (large-insert) clones is made, and these are assembled into overlaps that form a contiguous stretch of DNA
What is flow cytometry? Describe the process.
This process separates individual chromosomes.
How this is done:
- Cells are broken open to release the condensed chromosomes
- The chromosomes are stained with a fluorescent dye
- They are passed through a fine aperture (a thin hole) so that only one chromosome passes at a time
- The fluorescence level of each chromosome is checked; if it has a certain fluorescence, a charge is applied to it
- The chromosomes are then passed through deflecting plates and separated out
What is the Sanger dideoxy method? When was it used?
The Sanger dideoxy method gives long sequence reads, as is required when sequencing a genome "de novo".
It was used to produce a genome sequence containing a mixture of sequences from different individuals, i.e. it was a composite.
What is chromatin immunoprecipitation? What can it be used for?
It is used to look at protein-DNA interactions in chromatin.
It can be used to:
- analyse histone modification patterns
- identify regions of the genome that are bound by a specific transcription factor, or regions of chromatin that are associated with a specific histone tail modification
Describe the process of chromatin immunoprecipitation.
Process:
- cells are treated with formaldehyde to cross-link the proteins to the genomic DNA; chromatin is isolated, and the DNA is sheared into small pieces by sonication
- the proteins A, B and C (as labelled in the diagram) could each be a specific transcription factor or a specific modified histone
- the mixture is incubated with an antibody that recognises the specific protein, so that only the DNA fragments bound to that protein are pulled down
- these specific pieces of DNA are then amplified
As a result, you identify which DNA sequences are bound by a specific transcription factor or are associated with a specific modified histone
How were protein coding genes predicted and annotated?
Gene annotation:
Known genes:
- gene has already been described and sequenced
- cDNA sequences
- expressed sequence Tags (ESTs)
Gene prediction ‘ab initio’:
- homology searching to identify evolutionarily conserved sequences
- finding novel genes by screening for particular characteristics of genes
- CpG islands, base composition and histone modification patterns (e.g. the H3K4me3 methylation mark)
What sequence does translation initiate at?
Translation initiates at an AUG codon. Initiation occurs most frequently when the AUG is found within a particular consensus sequence of surrounding nucleotides (the Kozak sequence).
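A rough sketch of checking the sequence context around each AUG. It assumes a deliberately simplified version of the consensus: a purine (A or G) three bases before the AUG and a G immediately after it. The function name and the example mRNA are hypothetical.

```python
def find_start_codons(mrna: str) -> list[tuple[int, bool]]:
    """Return (position, in_strong_context) for every AUG in the mRNA."""
    results = []
    pos = mrna.find("AUG")
    while pos != -1:
        # Simplified consensus: purine at -3 and G at +4 (A of AUG is +1)
        purine_at_minus3 = pos >= 3 and mrna[pos - 3] in "AG"
        g_at_plus4 = pos + 3 < len(mrna) and mrna[pos + 3] == "G"
        results.append((pos, purine_at_minus3 and g_at_plus4))
        pos = mrna.find("AUG", pos + 1)
    return results


# Invented mRNA: the first AUG is in a weak context, the second in a strong one
start_sites = find_start_codons("CCUUAUGCCGCCACCAUGGCG")
print(start_sites)  # [(4, False), (15, True)]
```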
How can transcription start sites be identified?
the transcription start sites of many genes are associated with CpG islands - the identification of CpG islands is therefore highly informative
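A minimal sketch of the two quantities commonly used to flag a CpG island: the GC content and the ratio of observed to expected CpG dinucleotides. The thresholds used here (GC content above 50%, observed/expected above 0.6) are widely quoted defaults but are still an assumption of this sketch; the usual minimum length of about 200 bp is ignored so that the toy sequences stay short.

```python
def cpg_stats(sequence: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length
    # Number of CpGs expected if C and G were placed independently
    expected = c_count * g_count / length
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(sequence: str, min_gc: float = 0.5,
                          min_ratio: float = 0.6) -> bool:
    gc_fraction, ratio = cpg_stats(sequence)
    return gc_fraction > min_gc and ratio > min_ratio


print(looks_like_cpg_island("CGCGCGCGAT"))  # True: GC-rich and CpG-rich
print(looks_like_cpg_island("CCCCCAGGGG"))  # False: GC-rich but no CpG
```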
What factors contribute to gene length?
The variation in length between individual human genes is due to variation in the number of exons ( but not exon length ) and to variation in the length of the introns
What are gene families? What are unique genes? How do they change as genome size increases?
Gene families:
- contain genes with related function; evolved by gene duplication
Unique genes:
- genes that are not a member of a gene family
In higher eukaryotes / as genome size increases:
- in higher eukaryotes, the total number of genes increases and is highly variable, but the number of gene families does not change much
- the number of gene families plateaus with genome size
The proportion of unique genes decreases with increasing genome size and the proportion of genes in families increases
Name characteristic features of higher eukaryotic genomes?
Two characteristic features of higher eukaryotic genomes are the presence of introns and the presence of repetitive DNA
How can unequal crossing over occur?
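If homologous chromosomes are not aligned properly:
- repeated sequences can misalign and recombination can occur between them, leading to duplication on one chromosome (and a corresponding deletion on the other)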
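Unequal crossing over can be sketched by treating each chromosome as a list of segments and exchanging the parts on either side of a break point. The segment names and the `crossover` helper are hypothetical; the point is only that a misaligned break gives one product with an extra gene copy and one with a copy missing.

```python
def crossover(chrom_a: list[str], chrom_b: list[str],
              break_a: int, break_b: int) -> tuple[list[str], list[str]]:
    """Exchange the segments to the right of each break point."""
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


# Two homologous chromosomes, each carrying two tandem copies of a gene
chrom_a = ["left", "gene", "gene", "right"]
chrom_b = ["left", "gene", "gene", "right"]

# Properly aligned: the break is at the same position on both chromosomes
equal = crossover(chrom_a, chrom_b, 2, 2)

# Misaligned: copy 1 on chromosome A has paired with copy 2 on chromosome B
unequal = crossover(chrom_a, chrom_b, 2, 1)

print(equal)    # both products still carry two copies
print(unequal)  # one product carries three copies, the other only one
```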
What are orthologs?
- Genes present in different species; their common ancestor predates the split between the species
What are paralogs?
Genes present in the same species, often as members of a multigene family; their common ancestor may or may not predate the species in which the genes are now found
How are genes classed into a gene family? How does this similarity between genes occur?
Closely related members have a high degree of sequence homology over most of the coding sequence
- happens when: recombination between repeat sequences located between individual genes
or
members of the gene family are related by the presence of a common encoded protein domain; other parts of the gene may be very different
- happens when: recombination between repeat sequences located within introns
What can aberrant recombination lead to?
recombination between introns can lead to duplication of domains and domain shuffling (see the diagram on page 20)
aberrant recombination within introns leads to exon duplication
Aberrant recombination within introns generates novel genes
What does exon duplication lead to?
Alternative splicing:
- following exon duplication, the new domains may evolve different roles - for example, individual domains may become adapted for functioning in specific tissues.
- the gene may then evolve such that it is spliced differently in different tissues.
What is an acrocentric chromosome?
Acrocentric chromosomes: chromosomes in which the centromere is near the end
An acrocentric chromosome's centromere is situated so that one of the chromosome arms is much shorter than the other
What is a nucleolar organiser?
a cluster of tandemly repeated rRNA genes on one chromosome region
Associated with the nucleolus
What happens in the nucleolus, in terms of rRNA synthesis?
Nucleolus:
- fibrillar core: rRNA transcription from the rDNA template
- granular cortex: ribonucleoprotein particles into which rRNA is assembled
- other RNA-protein complexes are also assembled in the nucleolus
Transcription of rDNA and assembly of ribonucleoprotein particles take place in different regions of the nucleolus
What are the two classes of repetitive DNA? Describe their roles
Two classes of repetitive DNA: interspersed and tandem
Tandem repeats:
In which the repeats are organized next to each other
Interspersed repeats ( also known as transposon repeats ):
In which the repeats are distributed through the genome
Give examples of tandemly repeated DNA?
satellite DNA
minisatellite DNA
microsatellite DNA
Give examples of interspersed/dispersed repeats?
LINEs
SINEs
retrovirus-like (LTR transposons)
DNA transposon fossils
What is satellite DNA?
it is tandemly repeated DNA
Satellite DNA comprises constitutive heterochromatin
Comprises enormous arrays of DNA repeats whose base composition is very different from that of bulk genomic DNA
the classic satellite bands I, II and III have a buoyant density that is different from that of the "main" band of genomic DNA - this is a reflection of their different base composition
satellites are the largest type of tandem repeat – the size of each unit of the repeat is very variable, and each satellite band (on the centrifugation) is made up of several types of repeat
Where can satellite DNA be found on the chromosome?
Satellite DNA is found in large tracts at the centromeres and on the short arms of the acrocentric chromosomes. There are additional blocks close to the centromeres (i.e. pericentric) on some chromosomes, at the telomeres, and on most of the Y chromosome
What are mini and micro satellites? What are their properties?
Mini- and microsatellites are two types of tandemly repeated DNA
- mini- and microsatellites are much smaller than satellites, both with respect to the size of the repeat unit and the size of the overall array.
- the base composition of mini- and microsatellites is not very different from bulk genomic DNA
- these classes of repeat are highly polymorphic - i.e. the length of the array of a given repeat at a specific genomic location will frequently vary between individuals, and there are many possibilities for its overall size (e.g. it could contain 10, 13, 19 repeats etc)
Often called variable number tandem repeat (VNTR) regions
What is array length?
Total array length is the overall length of the section of the genome occupied by a block of tandem repeats
Tandemly repetitive DNA is classified according to the size of the repeat array
On a chromosome, what locations are specific tandem repeats (such as satellite DNA) associated with?
Satellite DNA is associated with all centromeres (centromeric repeats); satellite DNA is also found close to the centromere (pericentric repeats)
Minisatellite DNA is associated with the telomeric region (telomeric repeats); minisatellite DNA near the telomeric region is known as subtelomeric repeats
Tandem repeats can also be found on the short arm of an acrocentric chromosome
What are the properties associated with certain regions of a chromosome? in terms of repetitive DNA
- repeats at the centromeres are very important for chromosome stability, and repeats at the telomeres have a specific function in maintaining the integrity of the ends of linear chromosomes
- the repetitive DNA at subtelomeric and pericentric regions is very prone to recombination; genes located in these regions have become duplicated and distributed to other regions
What are transposons?
Transposons = DNA sequences that can change their position within the genome
How are transposons and interspersed repeats linked?
Interspersed repeats are derived from transposons = transposable elements
What are the 2 groups of transposons, in terms of method of transposition?
Transposons are organized into 2 groups depending on the method of transposition
Retrotransposons = retroposons
DNA transposons
What are retrotransposons? What is their function and how do they do it?
Retrotransposons = retroposons
- reverse transcriptase converts an RNA transcript into cDNA, which then integrates into the genomic DNA at a different location
- most of the transposons in the human genome "move" via an RNA intermediate
- in this situation, the original transposon stays in its original position, and a new one is "gained" at another position
- this is a "copy and paste" mechanism
What are DNA transposons? What is their function and how do they do it?
DNA transposons:
- Migrate directly without copying of the sequence
- Cut-and-paste mechanism
DNA transposons simply move from one position to another by "cut and paste": the transposon is removed from its original position and then integrates elsewhere in the genome
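The difference between "copy and paste" (retrotransposons) and "cut and paste" (DNA transposons) can be sketched by modelling the genome as a list of elements. The names below are invented for illustration, and the target index in `cut_and_paste` is interpreted after the element has been removed.

```python
def copy_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """Retrotransposon-style: the original stays and a new copy is inserted."""
    new_genome = list(genome)
    new_genome.insert(target, genome[source])
    return new_genome


def cut_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """DNA-transposon-style: the element is removed and inserted elsewhere."""
    new_genome = list(genome)
    element = new_genome.pop(source)
    new_genome.insert(target, element)
    return new_genome


genome = ["geneA", "TE", "geneB", "geneC"]

copied = copy_and_paste(genome, source=1, target=3)
moved = cut_and_paste(genome, source=1, target=2)

print(copied)  # ['geneA', 'TE', 'geneB', 'TE', 'geneC']
print(moved)   # ['geneA', 'geneB', 'TE', 'geneC']
```

Only the copy-and-paste route increases the number of copies, which fits with retrotransposons being the group that makes up most of the transposons in the human genome.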
What is complementary DNA?
Complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA (e.g. mRNA) template in a reaction catalysed by the enzyme reverse transcriptase
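In sequence terms, the first-strand cDNA is the reverse complement of the RNA template, written with T instead of U. A minimal sketch, using an invented mRNA fragment:

```python
# Base pairing used when copying RNA into DNA
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the first-strand cDNA, written 5'->3', for an mRNA given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna))


cdna = reverse_transcribe("AUGGCCAAAA")
print(cdna)  # TTTTGGCCAT
```

The poly A run at the 3' end of the mRNA becomes a run of Ts at the 5' end of the cDNA strand.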
What is a retrovirus?
Retrovirus - any of a family of RNA viruses that have an enzyme (reverse transcriptase) capable of making a complementary DNA copy of the viral RNA, which then is integrated into a host cell’s DNA
What is an endonuclease?
An endonuclease cleaves a phosphodiester bond within a polynucleotide chain (DNA or RNA). In transposition, this enables the section of DNA that has been copied by reverse transcriptase to be inserted into a new position in the genome.
What is an alternative way transposons may be classified? What are the differences?
Autonomous transposable elements
- Can transpose independently
- some transposons encode the enzymatic machinery required for their transposition (autonomous)
Nonautonomous transposable elements
- Cannot transpose independently
these transposons use the enzymatic machinery provided from other sources (e.g. by an autonomous transposon)
Give examples of types of retrotransposon?
4 classes of human transposon repeat
LTR = long terminal repeat
LINE = long interspersed nuclear element
SINE = short interspersed nuclear element
P = transcriptional promoter
Why are the cis-acting nucleotide sequences of the transposon important?
- the cis-acting nucleotide sequences of the transposon are required to enable it to move. For example, in the case of the retrotransposons, the LTR and the promoter direct transcription of the initial RNA (for the LTR and poly A transposons respectively).
- if a transposon acquires defects in any of these cis-acting regions (e.g. point mutations, deletions), it will no longer be able to transpose.
Give examples of the types of retrotransposons that are non-autonomous?
Not all classes of transposon encode enzymatic activity: some LTR elements and the SINEs do not, and instead use the enzymatic activity encoded by other transposons
What are human LTR transposons?
autonomous and non-autonomous retro-virus like elements
flanked by long terminal repeats containing necessary transcriptional regulatory elements
What is a human endogenous retroviral sequence (HERV)?
It is a type of LTR transposon
HERVs contain gag and pol genes
They encode protease, reverse transcriptase, RNase H and integrase
They can transpose independently (autonomous)
Most copies are defective
What are long terminal repeats?
LTR = long terminal repeat sequence that acts as the transcriptional promoter
What is a "full length" HERV similar to?
"full length" versions encode reverse transcriptase (pol gene), and have the genome organisation of a typical retrovirus
Humans possess retroviruses that exist in two forms. What are they?
as normal genetic elements in their chromosomal DNA (endogenous retroviruses)
as horizontally-transmitted infectious RNA-containing viruses which are transmitted from human-to-human (exogenous retroviruses, e.g. HIV and human T cell leukemia virus, HTLV)
Give an example of another type of LTR transposon?
Human non-autonomous retrovirus-like elements are another example of an LTR transposon
They lack the pol gene and often also gag
They are very truncated
What's different about poly A transposons?
most transposons are species-specific, i.e. they have a short "lifespan"; this contrasts with the poly A transposons found in the human genome
What are poly A transposons?
Poly A transposons, also known as polyadenylated transposons, are a type of transposable element that contains a poly A (polyadenylated) tail at its 3' end
They are retrotransposons, so they move by a copy-and-paste mechanism
They can be autonomous (LINEs) or non-autonomous (SINEs)
Give an example of a poly A transposon?
the largest of the polyA transposons are the LINE elements.
these have a promoter, a poly A sequence and two open reading frames - in many ways they look like a "regular" cellular gene.
the ORF2 encodes the enzymatic activity required for transposition (reverse transcriptase and endonuclease).
Why are LINEs, particularly LINE-1, the most important human transposons?
- of the three families of LINEs, LINE-1 is the only family still transposing.
- not only can LINE-1 transpose "itself", it also provides the transposition machinery (reverse transcriptase and endonuclease) for the transposition of non-autonomous poly A transposons (SINEs), and sometimes also leads to the transposition of cellular RNAs.
- as a result, LINEs are very relevant both in the evolution of our genomes and in disease
What is a promoter?
Promoter is a sequence of DNA to which proteins bind to initiate transcription of a single RNA transcript from the DNA downstream of the promoter.
Within a LINE, why is the poly A sequence important?
the poly A sequence is important not only in determining the end of the transcript, but also has relevance in the integration process
Describe the propagation of LINEs in detail.
Propagation of LINEs – 1: LINE mRNA associates with ORF1 and ORF2 proteins
- it is important to note that the transcript (LINE mRNA) contains the promoter (P) nucleotide sequences (which are downstream from the transcription start site itself).
- the ORF1 and ORF2 proteins bind to the LINE mRNA near the polyA sequence at the 3' end.
- these proteins preferentially bind the transcript from which they were translated
Propagation of LINEs – 2: Target site recognition and cleavage
- the ribonucleoprotein moves to the nucleus, and interacts with an AT-rich sequence in the genome.
- the endonuclease (encoded by the LINE) nicks the genomic DNA (adjacent to the Ts in the bottom strand of the schematic).
- LINEs preferentially integrate in AT-rich regions; since AT-rich DNA is gene-poor, LINE insertions do not in general cause too many mutations
Propagation of LINEs – 3: Reverse transcription of LINE mRNA and integration
- the reverse transcriptase activity (encoded by the LINE) then copies the RNA into cDNA (it is primed by the genomic DNA bound at the "nick"), and this cDNA is then integrated into the genome
How is a “truncated” transposon formed?
Often the reverse transcriptase does not extend to the very 5' end of the LINE mRNA, resulting in "truncated" transposons. Since the promoter is located at the 5' end, it is not incorporated into the transposed LINE copy, and this new copy therefore cannot "re-transpose".
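A toy model of 5' truncation, assuming the LINE can be represented as an ordered list of parts from 5' to 3'. Reverse transcription starts at the 3' end, so an incomplete copy loses its 5' parts first. The part names and both helper functions are hypothetical.

```python
LINE_PARTS = ["promoter", "ORF1", "ORF2", "polyA"]  # 5' -> 3'


def reverse_transcribe_line(parts: list[str], parts_copied: int) -> list[str]:
    """Return the new copy when only the last parts_copied parts are copied."""
    if not 0 <= parts_copied <= len(parts):
        raise ValueError("parts_copied must be between 0 and len(parts)")
    # Copying proceeds from the 3' end, so whatever is missing is at the 5' end
    return parts[len(parts) - parts_copied:]


def can_retranspose(copy: list[str]) -> bool:
    """A copy can only be transcribed again if it kept its promoter."""
    return "promoter" in copy


full_copy = reverse_transcribe_line(LINE_PARTS, 4)
truncated_copy = reverse_transcribe_line(LINE_PARTS, 2)

print(full_copy, can_retranspose(full_copy))            # full length, True
print(truncated_copy, can_retranspose(truncated_copy))  # ['ORF2', 'polyA'] False
```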
What is a 3' poly A sequence?
It is a run of adenine residues (AAAA, as seen on the diagram) that follows the 3' untranslated region (3' UTR)
What is the function of a poly A site?
The poly A site directs ORF1/ORF2 protein binding and integration
What are the features of a Pol II cellular transcript transposed by the LINE machinery?
The transposed copy is known as a processed pseudogene
It is characterised by the lack of a promoter nucleotide sequence and by having no introns
Describe the process of transposition by the LINE machinery? for a Pol II-transcribed cellular transcript
- the LINE machinery binds to the spliced transcript and inserts it into a new genomic location.
- the translocated cellular gene has therefore lost its introns and is known as a processed pseudogene.
- the translocated gene no longer has its promoter - it cannot therefore be re-transposed because it cannot be expressed into RNA.
What is a retrogene?
sometimes the LINE-generated copy of the cellular gene happens by chance to integrate downstream from a cellular promoter that can drive expression of the processed gene copy.
in this situation, the transposed intronless gene is expressed, and is known as a retrogene.
What are SINEs?
SINEs are 100-400 bp long
They are non-autonomous poly A retrotransposons
They do not encode proteins and cannot transpose independently
SINEs can be mobilised by neighbouring LINEs, because both share sequences at their 3' ends
How are SINEs transcribed?
SINEs originated from cDNA copies of genes transcribed by RNA polymerase III
Why can full length SINEs be re-transposed?
- polymerase III-transcribed genes usually have an internal promoter (located downstream from the transcription start site), and such sequences are therefore present in transposed "full length" copies of SINEs.
- full-length SINEs can therefore be re-transposed (using LINE-encoded transposition machinery) because the transposed copy contains a promoter and can therefore be expressed (contrast transposed copies of cellular transcripts transcribed by RNA polymerase II).
Give an example of SINE sequence?
the most important SINE sequences are the Alu family of repeats - there are over a million copies of Alu elements in the human genome
What is an Alu sequence? structure and functions?
a SINE sequence
function:
most contain a recognition site for the restriction enzyme AluI
Alu elements act as a site for recombination
structure:
comprises two tandem repeats
the Alu repeat is so-named because it contains sequence that is recognised by the restriction enzyme AluI.
many Alu repeats are still capable of being transposed because they contain an internal promoter (boxes A and B) and are transcribed by RNA polymerase III.
What are the properties and genomic location of Alu repeats?
- Alu repeats are very abundant in the human genome; they are only found in primates.
- Alu sequences are GC-rich, and most Alu repeats are found in GC-rich euchromatin (contrast LINEs)
- Alu repeats are sometimes transcribed when the organism is under stress
- Alu repeats do not generally damage the "host" gene at their site of insertion