Human Genome Project

75 Terms

1
New cards

What were the stated goals of the project?

-        Identify all the genes in human nuclear DNA

-        Determine the sequence of the 3 × 10^9 chemical base pairs that make up human DNA (estimated initial timespan 15 years; estimated cost 3 billion US dollars), initially focusing on euchromatin

-        Provide the information to public databases

2
New cards

Why is studying the genome useful?

Sequence of DNA bases and features such as GC content and CpG content:

-        Prediction and annotation of genes

-        identification of CpG islands

DNA methylation status and histone modification status:

-        gives information about the status and function of the underlying DNA sequence

Compare normal and disease states

-        genetic and epigenetic

3
New cards

What is the shotgun method, in terms of DNA sequence assembly?

The DNA is fragmented at random positions, so by chance many of the resulting DNA sequences will overlap. These small pieces are cloned and sequenced, and the whole sequence is then reconstructed computationally from the overlaps.
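The computational step can be illustrated with a toy greedy assembler: repeatedly merge the two reads with the longest suffix/prefix overlap. This is only a minimal sketch; the function names, the example reads and the `min_len` threshold are assumptions made for illustration, and real assemblers also have to handle sequencing errors, repeats and both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining reads do not overlap; leave them as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Three overlapping "reads" from a made-up 14 bp sequence
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(reads)
```

With these reads the overlaps are unambiguous, so the result is a single contig. Repetitive DNA breaks this assumption, which is one reason large genomes needed additional mapping information.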

4
New cards

What are markers?

Markers are ‘anonymous’ sequences (i.e. not known genes) whose location is already known (their location was classically determined by linkage analysis in family studies, i.e. classic pedigree analysis)

5
New cards

What are restriction fragment length polymorphisms?

Restriction fragment length polymorphisms (RFLPs) are variations in the length of DNA fragments generated by the digestion of DNA with restriction enzymes

6
New cards

What are polymorphic sites?

Polymorphic sites are restriction sites that can differ between alleles (between the maternal and paternal alleles, or between different people)
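As a rough illustration of how a polymorphic restriction site produces an RFLP, the sketch below "digests" two made-up alleles with EcoRI (recognition site GAATTC, cut after the first G). The allele sequences and the function name are assumptions for illustration only.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site and return the fragment lengths."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)  # the enzyme cuts cut_offset bases into the site
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Allele A has two EcoRI sites; in allele B a point mutation destroys the second site
allele_a = "A" * 10 + "GAATTC" + "C" * 10 + "GAATTC" + "T" * 10
allele_b = "A" * 10 + "GAATTC" + "C" * 10 + "GAGTTC" + "T" * 10

fragments_a = digest(allele_a)
fragments_b = digest(allele_b)
```

Allele A gives three fragments and allele B gives two, so the two alleles can be told apart by fragment length alone.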

7
New cards

What are microsatellites?

Microsatellites are repeats of short nucleotide sequences that are distributed throughout the genome.

The repeat units are 1-4 nucleotides long
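A simple way to spot candidate microsatellites in a sequence is a regular expression that looks for a 1-4 nucleotide unit repeated in tandem. This is a sketch only; the function name and the `min_copies` threshold of 5 are assumptions, not a standard definition.

```python
import re


def find_microsatellites(seq, min_copies=5):
    """Return (start, repeat_unit, copies) for tandem repeats with a 1-4 nt unit."""
    # Lazy {1,4}? tries the shortest unit first, so CACACA... is reported as (CA)n
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


example = "GGT" + "CA" * 6 + "TTG"
hits = find_microsatellites(example)
```

Running the same search on the same locus in different individuals would give different copy numbers where the array is polymorphic.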

8
New cards

How can you sequence a large genome (if shotgun sequencing isn't working)?

Clone contig approach

Due to the large size of the human genome and the technical difficulty of overlapping so many fragments, approaches were developed to break it down into smaller "chunks".

A library of (large-insert) clones is made, and overlapping clones are assembled into a contiguous stretch of DNA (a contig)

9
New cards

What is flow cytometry? Describe the process

Flow cytometry (flow sorting) separates individual chromosomes

How is this done:

-        Cells containing condensed chromosomes are broken open

-        The chromosomes are stained with a fluorescent dye

-        They are passed through a fine aperture (a thin hole) so that only one chromosome passes at a time

-        The fluorescence level of each chromosome is checked; if it has a certain fluorescence, a charge is applied to it

-        The chromosomes are then passed through deflecting plates and separated out

10
New cards

What is the Sanger dideoxy method? When was it used?

Sanger dideoxy method: this gives long sequence reads, as is required when sequencing a genome "de novo"

It was used to sequence the human genome; the resulting genome sequence contained a mixture of sequences from different individuals, i.e. it was a composite

11
New cards

What is chromatin immunoprecipitation? What can it be used for?

Chromatin immunoprecipitation (ChIP) is used to look at protein-DNA interactions in chromatin

It can be used to:

-        analyse histone modification patterns

-        identify regions of the genome that are bound by a specific transcription factor, or regions of chromatin that are associated with a specific histone tail modification

12
New cards

Describe the process of chromatin immunoprecipitation?

Process:

-        Cells are treated with formaldehyde to cross-link the proteins to the genomic DNA; chromatin is isolated, and the DNA is sheared into small pieces by sonication.

-        The protein of interest (A, B or C in the diagram) could be a specific transcription factor or a specific modified histone.

-        The mixture is incubated with an antibody that recognises the specific protein, so that only the DNA fragments bound to that protein are pulled down.

-        These specific pieces of DNA are then amplified.

As a result you end up identifying which DNA sequences are bound by a specific transcription factor or are associated with a specific modified histone

13
New cards

How were protein coding genes predicted and annotated?

Gene annotation:

Known genes:

-        gene has already been described and sequenced

-        cDNA sequences

-        expressed sequence Tags (ESTs)

Gene prediction ‘ab initio’

-        homology searching to identify evolutionarily conserved sequences

-        find novel genes by screening for particular characteristics of genes

-        CpG islands, base composition and histone modification patterns, e.g. H3K4me3 (trimethylation of lysine 4 on histone H3)

14
New cards

What sequence does translation initiate at?

Translation initiates at an AUG codon. Initiation occurs most frequently when the AUG is found within a particular consensus sequence of nucleotides (the Kozak consensus)
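The consensus around the start codon is usually summarised as the Kozak sequence, in which a purine (A or G) three bases before the AUG and a G immediately after it matter most. The sketch below scans an mRNA for AUG codons and flags those in that context; the function name and the example sequence are assumptions for illustration.

```python
def find_start_codons(mrna):
    """Return (position, in_strong_context) for every AUG in the mRNA."""
    mrna = mrna.upper()
    hits = []
    pos = mrna.find("AUG")
    while pos != -1:
        purine_at_minus3 = pos >= 3 and mrna[pos - 3] in "AG"
        g_at_plus4 = pos + 3 < len(mrna) and mrna[pos + 3] == "G"
        hits.append((pos, purine_at_minus3 and g_at_plus4))
        pos = mrna.find("AUG", pos + 1)
    return hits


# The second AUG sits in the context GCCACCAUGG; the first does not
candidates = find_start_codons("CCUAUGCCCGCCACCAUGGCG")
```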

15
New cards

16
New cards

How can transcription start sites be identified?

the transcription start sites of many genes are associated with CpG islands - the identification of CpG islands is therefore highly informative
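CpG islands are usually picked out computationally from base composition. A minimal sketch is given below, assuming the commonly quoted thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); the function names are hypothetical and real annotation pipelines scan the genome in sliding windows.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from the C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC content and CpG ratio thresholds to one window."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


island_like = looks_like_cpg_island("CG" * 100)   # CpG-dense window
at_rich = looks_like_cpg_island("AT" * 100)       # AT-rich window
```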

17
New cards

What factors contribute to gene length?

The variation in length between individual human genes is due to variation in the number of exons ( but not exon length ) and to variation in the length of the introns

18
New cards

What are gene families? What are unique genes? How do they change as genome size increases?

Gene families:

-        contain genes with related function; evolved by gene duplication

Unique genes:

-        genes that are not a member of a gene family

In higher eukaryotes, as genome size increases:

-        in higher eukaryotes, the total number of genes increases and is highly variable, but the number of gene families does not change much

-        the number of gene families plateaus with genome size

The proportion of unique genes decreases with increasing genome size and the proportion of genes in families increases

19
New cards

Name characteristic features of higher eukaryotic genomes?

Two characteristic features of higher eukaryotic genomes are the presence of introns and the presence of repetitive DNA

20
New cards

How can unequal crossing over occur?

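If homologous chromosomes are not aligned properly:

-        repeated sequences can misalign and recombination between them can occur, leading to duplication on one chromosome (and a corresponding deletion on the other)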

 

21
New cards

What are orthologs?

-        Genes present in different species; the common ancestor predates the split between the species

22
New cards

What are paralogs?

Genes present in the same species, often as members of a multigene family; the common ancestor may or may not predate the species in which the genes are now found

23
New cards

How are genes classed into a gene family? How does this similarity between genes occur?

Closely related members have a high degree of sequence homology over most of the coding sequence

-        happens when: recombination between repeat sequences located between individual genes

or

members of the gene family are related by the presence of a common encoded protein domain; other parts of the gene may be very different

-        happens when: recombination between repeat sequences located within introns

24
New cards

What can aberrant recombination lead to?

recombination between introns can lead to duplication of domains and domain shuffling ( page 20 to see diagram )

aberrant recombination within introns leads to exon duplication

Aberrant recombination within introns generates novel genes

25
New cards

What does exon duplication lead to?

Alternative splicing:

 

-        following exon duplication, the new domains may evolve different roles - for example, individual domains may become adapted for functioning in specific tissues.

-        the gene may then evolve such that it is spliced differently in different tissues.

26
New cards

What is an acrocentric chromosome?

Acrocentric chromosome: a chromosome whose centromere is near the end

An acrocentric chromosome's centromere is situated so that one of the chromosome arms is much shorter than the other

27
New cards

What is a nucleolar organiser?

  • a cluster of tandemly repeated rRNA genes on one chromosome region

  • Associated with the nucleolus

28
New cards

What happens in the nucleolus, in terms of rRNA synthesis?

Nucleolus:

-        Fibrillar core: rRNA transcription from rDNA template

-        granular cortex: ribonucleoprotein particles into which rRNA is assembled

-        other RNA-protein complexes are also assembled in the nucleolus

  • transcription of rDNA and assembly of ribonucleoprotein particles takes place in different regions of the nucleolus

29
New cards

What are the two classes of repetitive DNA? Describe their roles

Two classes of repetitive DNA: interspersed and tandem

Tandem repeats:

In which the repeats are organized next to each other

Interspersed repeats ( also known as transposon repeats ):

In which the repeats are distributed through the genome

30
New cards

Give examples of tandemly repeated DNA?

satellite DNA

minisatellite DNA

microsatellite DNA

31
New cards

Give examples of interspersed/dispersed repeats?

LINEs

SINEs

retrovirus-like (LTR transposons)

DNA transposon fossils

32
New cards

What is satellite DNA?

it is tandemly repeated DNA

Satellite DNA comprises constitutive heterochromatin

Comprises enormous arrays of DNA repeats whose base composition is very different from that of bulk genomic DNA

the classic satellite bands I, II and III have a buoyant density that is different from that of the "main" band of genomic DNA - this is a reflection of their different base composition

Satellites are the largest type of tandem repeat. The size of each repeat unit is very variable, and each satellite band (seen on density-gradient centrifugation) is made up of several types of repeat

33
New cards

Where can satellite DNA be found on the chromosome?

Satellite DNA is found in large tracts at the centromeres and on the short arms of the acrocentric chromosomes; there are additional blocks close to the centromeres (i.e. pericentric) on some chromosomes, at the telomeres, and on most of the Y chromosome

34
New cards

What are mini and micro satellites? What are their properties?

Mini- and microsatellites are two types of tandemly repeated DNA

- mini- and microsatellites are much smaller than satellites, both with respect to the size of the repeat unit and the size of the overall array.

- the base composition of mini- and microsatellites is not very different from bulk genomic DNA

- these classes of repeat are highly polymorphic - i.e. the length of the array of a given repeat at a specific genomic location will frequently vary between individuals, and there are many possibilities for its overall size (e.g. it could contain 10, 13, 19 repeats etc)

Often called variable number tandem repeat (VNTR) regions

35
New cards

What is array length?

Total array length is the length of the section of the genome that is occupied by the tandem repeats

Tandemly repetitive DNA is classified according to the size of the repeat array

36
New cards

On a chromosome, what locations are specific tandem repeats (such as satellite DNA) associated with?

Satellite DNA is associated with all centromeres (centromeric repeats); satellite DNA is also found close to the centromere (these are known as pericentric repeats)

Minisatellite DNA is associated with the telomeric region (these are known as telomeric repeats); minisatellite DNA near the telomeric region is known as subtelomeric repeats

Tandem repeats can also be found on the short arm of an acrocentric chromosome

37
New cards

What are the properties associated with certain regions of a chromosome? in terms of repetitive DNA

-repeats at the centromeres are very important for chromosome stability, and repeats at the telomeres have a specific function in maintaining the integrity of the ends of linear chromosomes

 

-the repetitive DNA at subtelomeric and pericentric regions is very prone to recombination - genes located in these regions have become duplicated and distributed to other regions

38
New cards

What are transposons?

Transposons = DNA sequences that can change their position within the genome

39
New cards

How are transposons and interspersed repeats linked?

Interspersed repeats are derived from transposons = transposable elements

40
New cards

What are the 2 groups of transposons, in terms of method of transposition?

Transposons are organized into 2 groups depending on the method of transposition

Retrotransposons = retroposons

DNA transposons

41
New cards

What are retrotransposons? What is its function and how does it do it?

Retrotransposons = retroposons

-        reverse transcriptase converts an RNA transcript into cDNA, which then integrates into genomic DNA at a different location

most of the transposons in the human genome "move" via an RNA intermediate

in this situation, the original transposon is still in its original position, and a new one is "gained" at another position.

copy and paste

42
New cards

What are DNA transposons? What is their function and how do they do it?

DNA transposons:

-        Migrate directly without copying of the sequence

-        Cut-and-paste mechanism

The DNA transposons simply move from one position to another by "cut and paste": the transposon is removed from its original position and then integrates elsewhere in the genome
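The difference between the two mechanisms can be made concrete with a toy model in which the genome is just an ordered list of labelled segments. This is an illustrative sketch; the function names and the segment labels are made up.

```python
def copy_and_paste(genome, element, target):
    """Retrotransposon style: the original stays put and a new copy is inserted at index target."""
    if element not in genome:
        raise ValueError("element not present in genome")
    new_genome = list(genome)
    new_genome.insert(target, element)
    return new_genome


def cut_and_paste(genome, element, target):
    """DNA transposon style: the element is excised, then inserted at index target
    (target is counted in the genome after excision)."""
    new_genome = list(genome)
    new_genome.remove(element)  # raises ValueError if the element is absent
    new_genome.insert(target, element)
    return new_genome


genome = ["geneA", "TE", "geneB", "geneC"]
after_copy = copy_and_paste(genome, "TE", 3)
after_cut = cut_and_paste(genome, "TE", 2)
```

Copy and paste increases the number of copies of the element, while cut and paste leaves the copy number unchanged.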

43
New cards

What is complementary DNA?

Complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA (e.g. mRNA) template in a reaction catalysed by the enzyme reverse transcriptase
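In sequence terms, the first cDNA strand is the reverse complement of the RNA template, written with T instead of U. A minimal sketch, with a hypothetical function name and a made-up RNA:

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Return the first-strand cDNA (5'->3') complementary to an RNA given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


# A short made-up mRNA ending in a poly A stretch
cdna = first_strand_cdna("AUGGCCAAAA")
```

The poly A tail at the 3' end of the RNA appears as a run of Ts at the 5' end of the cDNA.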

44
New cards

What is a retrovirus?

Retrovirus - any of a family of RNA viruses that have an enzyme (reverse transcriptase) capable of making a complementary DNA copy of the viral RNA, which then is integrated into a host cell’s DNA

45
New cards

What is an endonuclease?

Endonuclease: an enzyme that cleaves a phosphodiester bond within a polynucleotide chain (namely DNA or RNA). In transposition, this cleavage enables the section of DNA that has been copied by reverse transcriptase to be inserted at a new position in the genome.

46
New cards

What is an alternative way transposons may be classified? What are the differences?

Autonomous transposable elements

-        Can transpose independently

-        some transposons encode the enzymatic machinery required for their transposition (autonomous)

Nonautonomous transposable elements

-        Cannot transpose independently

these transposons use the enzymatic machinery provided from other sources (e.g. by an autonomous transposon)

47
New cards

Give examples of types of retrotransposon?

4 classes of human transposon repeat

LTR = long terminal repeat

LINE = long interspersed nuclear element

SINE = short interspersed nuclear element

P = transcriptional promoter

48
New cards

Why are the cis-acting nucleotide sequences of the transposon important?

-        the cis-acting nucleotide sequences of the transposon are required to enable it to move. For example, in the case of the retrotransposons, the LTR and the promoter direct transcription of the initial RNA (for the LTR and poly A transposons respectively).

 

-        if a transposon acquires defects in any of these cis-acting regions (e.g. point mutations, deletions) they will no longer be able to transpose.

49
New cards

Give examples of the types of retrotransposons that are non-autonomous?

Not all transposons encode enzymatic activity (e.g. non-autonomous LTR elements and SINEs); these use the enzymatic activity encoded by other transposons

50
New cards

What are human LTR transposons?

  • autonomous and non-autonomous retro-virus like elements

  • flanked by long terminal repeats containing necessary transcriptional regulatory elements

51
New cards

What is a human endogenous retroviral sequence (HERV)?

is a type of LTR transposon

contain gag and pol genes

encode protease, reverse transcriptase, RNaseH and integrase

can transpose independently ( autonomous )

mostly defective

52
New cards

What are long terminal repeats?

LTR = long terminal repeat sequence that acts as the transcriptional promoter

53
New cards

What is a “full length” HERV similar to?

"full length" versions encode reverse transcriptase (pol gene), and have the genome organisation of a typical retrovirus

54
New cards

Humans possess retroviruses that exist in two forms. What are they?

as normal genetic elements in their chromosomal DNA (endogenous retroviruses)

as horizontally-transmitted infectious RNA-containing viruses which are transmitted from human-to-human (exogenous retroviruses, e.g. HIV and human T cell leukemia virus, HTLV)

55
New cards

Give an example of another type of LTR transposon?

human non-autonomous retrovirus-like elements

  • lack the pol gene and often also gag

  • are very truncated

56
New cards

What's different about poly A transposons?

Most transposons are species-specific, i.e. they have a short "lifespan"; this contrasts with the poly A transposons found in the human genome

57
New cards

What are poly A transposons?

Poly A transposons, also known as polyadenylated transposons, are a type of retrotransposon that carries a poly A tail at the 3' end

copy and paste mechanism (they move via an RNA intermediate)

can be autonomous (LINEs) or non-autonomous (SINEs)

58
New cards

Give an example of a poly A transposon?

the largest of the polyA transposons are the LINE elements.

these have a promoter, a poly A sequence and two open reading frames - in many ways they look like a "regular" cellular gene.

the ORF2 encodes the enzymatic activity required for transposition (reverse transcriptase and endonuclease).

59
New cards

Why are LINEs, particularly LINE-1, the most important human transposons?

-        of the three families of LINEs, LINE-1 is the only family still transposing.

 

-        LINE-1 is important because not only can it transpose "itself", it also provides the transposition machinery (reverse transcriptase and endonuclease) for the transposition of non-autonomous poly A transposons (SINEs), and also sometimes leads to the transposition of cellular RNAs.

 

-         As a result LINEs are very relevant both in the evolution of our genomes and in disease

60
New cards

What is a promoter?

Promoter is a sequence of DNA to which proteins bind to initiate transcription of a single RNA transcript from the DNA downstream of the promoter.

61
New cards

Within a LINE, why is the poly A sequence important?

the poly A sequence is important not only in determining the end of the transcript, but also has relevance in the integration process

62
New cards

Describe the propagation of LINEs in detail

Propagation of LINEs – 1: LINE mRNA associates with ORF1 and ORF2 proteins

-        it is important to note that the transcript (LINE mRNA) contains the promoter (P) nucleotide sequences (which are downstream from the transcription start site itself).

-        the ORF1 and ORF2 proteins bind to the LINE mRNA near the polyA sequence at the 3' end.

-        these proteins preferentially bind the transcript from which they were translated

Propagation of LINEs – 2: Target site recognition and cleavage

-        the ribonucleoprotein moves to the nucleus, and interacts with an AT-rich sequence in the genome.

-        the endonuclease (encoded by the LINE) nicks the genomic DNA (adjacent to the Ts in the bottom strand of the schematic).

-        LINEs preferentially integrate in AT-rich regions; since AT-rich DNA is gene-poor, LINEs do not in general cause too many mutations

Propagation of LINEs – 3: Reverse transcription of LINE mRNA and integration

-        the reverse transcriptase activity (encoded by the LINE) then copies the RNA into cDNA (it is primed by the genomic DNA bound at the "nick"), and this cDNA is then integrated into the genome

63
New cards

How is a “truncated” transposon formed?

Often the reverse transcriptase does not extend to the very 5' end of the LINE mRNA, resulting in "truncated" transposons. Since the promoter is located at the 5' end, this sequence is not incorporated into the translocated LINE, and this new copy cannot therefore "re-transpose".
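The logic of 5' truncation can be sketched with a LINE modelled as an ordered list of features. Reverse transcription starts at the 3' (poly A) end, so an incomplete copy loses features from the 5' end first. The feature list and function names below are simplifications made up for illustration.

```python
FULL_LENGTH_LINE = ["promoter", "ORF1", "ORF2", "polyA"]  # listed 5' -> 3'


def reverse_transcribed_copy(element, features_copied):
    """Copy the last features_copied features, starting from the 3' end."""
    if not 1 <= features_copied <= len(element):
        raise ValueError("features_copied must be between 1 and len(element)")
    return element[-features_copied:]


def can_retranspose(copy):
    """A copy needs its own promoter to be transcribed again."""
    return "promoter" in copy


full_copy = reverse_transcribed_copy(FULL_LENGTH_LINE, 4)
truncated_copy = reverse_transcribed_copy(FULL_LENGTH_LINE, 2)
```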

64
New cards

What is a 3’ poly A sequence?

It is a run of adenines (AAAA..., as seen on the diagram) that follows the 3' untranslated region (3' UTR)

65
New cards

What is the function of a poly A site?

The poly A site directs ORF1/ORF2 protein binding and integration

66
New cards

What are the features of a Pol II cellular transcript transposed by the LINE machinery?

The transposed copy is known as a processed pseudogene since it has no introns

It is characterised by a lack of promoter nucleotide sequence and no introns

67
New cards

Describe the process of transposition by the LINE machinery for a Pol II-transcribed cellular transcript

-        the LINE machinery binds to the spliced transcript and inserts it into a new genomic location.

-        the translocated cellular gene has therefore lost its introns and is known as a processed pseudogene.

-        the translocated gene no longer has its promoter - it cannot therefore be re-transposed because it cannot be expressed into RNA.
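The steps above can be sketched with a toy gene made of labelled features: only the exons survive into the spliced transcript, so the copy inserted by the LINE machinery has neither introns nor the upstream promoter. The gene structure, sequences and function names are invented for illustration.

```python
GENE = [
    ("promoter", "TATAAA"),      # upstream of the transcription start site
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "GGATAA"),
]


def spliced_transcript(gene):
    """Join the exons: the promoter is not transcribed and introns are spliced out."""
    return "".join(seq for kind, seq in gene if kind == "exon")


def retropose(gene):
    """Feature list of the copy inserted by the LINE machinery (written as DNA)."""
    return [("exon", spliced_transcript(gene))]


processed_copy = retropose(GENE)
kinds_in_copy = {kind for kind, _ in processed_copy}
```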

68
New cards

What is a retrogene?

  • sometimes the LINE-generated copy of the cellular gene happens by chance to integrate downstream from a cellular promoter that can drive expression of the processed gene copy.

  • in this situation, the transposed intronless gene is expressed, and is known as a retrogene.

69
New cards

What are SINEs?

SINEs are 100-400 bp long

non-autonomous poly A retrotransposons

do not encode proteins and cannot transpose independently

SINEs can be mobilised by neighbouring LINEs because they share sequences at their 3' ends

70
New cards

How are SINEs transcribed?

SINEs originated from cDNA copies of genes transcribed by RNA polymerase III

71
New cards

Why can full length SINEs be re-transposed?

-        polymerase III-transcribed genes usually have an internal promoter (located downstream from the transcription start site), and such sequences are therefore present in transposed "full length" copies of SINEs.

-        full-length SINEs can therefore be re-transposed (using LINE-encoded transposition machinery) because the transposed copy contains a promoter and can therefore be expressed (contrast transposed copies of cellular transcripts transcribed by RNA polymerase II).

72
New cards

Give an example of SINE sequence?

the most important SINE sequences are the Alu family of repeats - there are over a million copies of Alu elements in the human genome

73
New cards

What is an Alu sequence? What are its structure and functions?

a SINE sequence

function:

most contain a recognition site for the restriction enzyme AluI

Alu elements act as a site for recombination

structure:

comprises two tandem repeats

the Alu repeat is so-named because it contains sequence that is recognised by the restriction enzyme AluI.

many Alu repeats are still capable of being transposed because they contain an internal promoter (boxes A and B) and are transcribed by RNA polymerase III.

74
New cards

What are the properties and genomic location of Alu repeats?

-        Alu repeats are very abundant in the human genome; they are only found in primates.

-        Alu sequences are GC-rich, and most Alu repeats are found in GC-rich euchromatin (contrast LINEs)

- Alu repeats are sometimes transcribed when the organism is under stress

-        Alu repeats do not generally damage the "host" gene at their site of insertion

75
New cards