Genome Assembly and Annotation

81 Terms

1

What is genome sequencing?

The determination of the complete DNA sequence of an organism's genome.

2

What insights does genome sequencing provide?

It provides insight into the genetic basis of evolutionary relationships and diseases, and into genes, including both coding and non-coding regions.

3

What are the three generations of sequencing?

First-generation, second-generation, and third-generation sequencing.

4

What was the significance of Rosalind Franklin's work in 1952?

She produced X-ray diffraction images of DNA, providing crystallographic data crucial for working out the structure of DNA.

5

Who solved the three-dimensional structure of DNA?

James Watson and Francis Crick in 1953.

6

What is first-generation sequencing?

Sanger Sequencing, which uses a chain termination method and is considered the 'gold standard' for accuracy.

7

Who developed Sanger Sequencing?

Frederick Sanger in 1977.

8

What are dNTPs?

Deoxyribonucleoside triphosphates, the building blocks for DNA replication.

9

What are ddNTPs?

Dideoxyribonucleoside triphosphates, which act as chain-terminating inhibitors in Sanger Sequencing.

10

What is the role of DNA polymerase in Sanger Sequencing?

It synthesizes new DNA strands by adding nucleotides to a growing chain.

11

What is the main challenge associated with genome assembly?

Dealing with repetitive sequences and accurately aligning reads to reconstruct the genome.

12

Why is annotation important in bioinformatics?

It helps identify the locations of genes and other features in the genome, providing functional insights.

13

What does the term 'coding regions' refer to?

Parts of the genome whose sequence is transcribed into mRNA and then translated into protein.

14

What does the term 'non-coding regions' refer to?

Parts of the genome that do not code for proteins but may have regulatory or other functions.

15

What is the historical significance of the year 1953 in genetics?

It marks the year Watson and Crick published their model of the DNA double helix.

16

What is the primary method used in Sanger Sequencing?

Chain termination method using dideoxynucleotides.

17

What is the purpose of primers in Sanger Sequencing?

Primers are short sequences that initiate DNA synthesis during the sequencing process.

18

What is the difference between dNTPs and ddNTPs?

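dNTPs lack the 2'-hydroxyl group found on ribose, while ddNTPs lack both the 2'- and 3'-hydroxyl groups. Without a 3'-hydroxyl, no phosphodiester bond can form with the next nucleotide, so the DNA strand cannot be elongated further.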

19

What is the relevance of sequencing in understanding diseases?

Sequencing can reveal genetic mutations associated with diseases, aiding in diagnosis and treatment.

20

What does 'assembly' refer to in genome sequencing?

The process of piecing together short DNA sequences to form a complete genome.
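
As a rough illustration of "piecing together", the Python sketch below joins two reads when the end of one matches the start of the other. The function names and the `min_length` parameter are hypothetical, chosen for this example; real assemblers use far more elaborate graph-based methods and tolerate sequencing errors.

```python
def suffix_prefix_overlap(left: str, right: str, min_length: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0  # no overlap of at least min_length bases


def merge_reads(left: str, right: str, min_length: int = 3):
    """Merge two reads on their exact overlap, or return None if they do not overlap."""
    size = suffix_prefix_overlap(left, right, min_length)
    if size == 0:
        return None
    return left + right[size:]


# The reads share the four bases "GTAC", so they merge into one longer sequence.
merged = merge_reads("ATGCGTAC", "GTACCTTA")
print(merged)  # ATGCGTACCTTA
```

Exact matching is the simplification here: with repeats or sequencing errors, the longest overlap is not always the correct one, which is why repeats are listed among the main assembly challenges.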

21

What are the main challenges in second-generation sequencing?

Higher error rates than Sanger sequencing, and difficulty assembling short reads into longer contiguous sequences (contigs).

22

What advancements characterize third-generation sequencing?

The ability to sequence longer DNA fragments in real-time, improving assembly and accuracy.

23

What is the main mechanism of Automated Sanger Sequencing?

It uses four ddNTPs labeled with different fluorescent tags.

24

What is the difference between manual and automated Sanger sequencing?

Manual uses radioisotopes and polyacrylamide gel slabs, while automated uses capillary electrophoresis and dye-labeled ddNTPs.

25

What is the Human Genome Project?

An ambitious research effort to decipher the entire human genetic code. A draft sequence was published in 2001 and the project was declared complete in 2003.

26

What are the basic steps of second-generation sequencing?

1. Library preparation
2. Template amplification
3. Sequencing

27

What are the common features of second-generation sequencing?

Highly parallel, microscale reactions, fast results, and low-cost genome sequencing.

28

What is 454 GS20?

The first commercial NGS platform, developed by 454 Life Sciences (later acquired by Roche), which allowed massively parallel sequencing.

29

What are the advantages of Illumina Sequencing?

It allows high-throughput sequencing at reduced cost; the trade-off is that it produces shorter reads.

30

What is the significance of Illumina Sequencing in NGS data generation?

It accounts for about 80% of all NGS data generated.

31

What is the process of bridge amplification in Illumina Sequencing?

Template DNA attached to the flow cell surface bends over to form U-shaped bridges with nearby surface-bound primers and is copied repeatedly, generating dense clusters of identical DNA.

32

What is PacBio SMRT sequencing?

A third-generation sequencing method that does not require amplification of template DNA.

33

What is Nanopore Sequencing?

A sequencing technology that reads DNA by measuring changes in ionic current as a strand passes through a nanopore, producing longer reads.

34

What is a major challenge after sequencing?

Ensuring high quality of the assembled and annotated genomic sequence.

35

What is the primary goal of the Human Genome Project?

To identify genes associated with rare and common diseases and examine ethical implications.

36

What are the characteristics of second-generation sequencing?

It is fast, low-cost, and allows for high-throughput sequencing.

37

What are the disadvantages of 454 GS20 technology?

It is prone to errors, especially in indels and homopolymer regions.

38

What is the purpose of adapter ligation in library preparation?

To attach short known sequences (adapters) to the ends of fragmented DNA so that the fragments can be captured and primed for sequencing.

39

What is the difference between single-end and paired-end sequencing?

Single-end sequencing reads a DNA fragment from one end only, while paired-end sequencing reads it from both ends.

40

What is the role of DNA polymerase in Illumina Sequencing?

To synthesize new DNA strands during the sequencing process.

41

What does 'base calling' refer to in sequencing?

The process of determining the identity of each base in a read from the raw signal of the sequencing reaction.

42

What is the significance of massively increased throughput in sequencing?

It allows for parallelization of many reactions, enhancing efficiency.

43

What is the purpose of the sequencing primer in second-generation sequencing?

To provide the starting point from which DNA polymerase adds bases complementary to the template DNA.

44

What are the key differences between first, second, and third-generation sequencing?

First generation (Sanger) is highly accurate but low-throughput, second generation is high-throughput and cost-effective, and third generation allows real-time sequencing of long fragments without amplification.

45

What are the implications of genetic technologies examined by the Human Genome Project?

Ethical, legal, and social implications related to genetics.

46

What is the output of Illumina Sequencing?

Shorter reads that can be sequenced from one or both ends.

47

What is the primary aim of genome assembly?

To create a genome assembly with the longest possible sequences (least fragmented) and the smallest number of mis-assemblies.

48

What does the phrase 'garbage in, garbage out' imply in bioinformatics?

The quality of output is determined by the quality of the input data.

49

Why is quality control important in genome assembly?

To avoid erroneous downstream applications and conclusions.

50

What is the FASTQ file format?

A format that contains 4 lines per read: Read Name, Sequence, Plus sign, and Quality Scores.
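
To make the four-line layout concrete, here is a minimal Python sketch that reads records from FASTQ text. The `FastqRecord` type, the `read_fastq` function, and the two example reads are made up for illustration; it assumes each sequence and quality string sits on a single line, as described above.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # line 1, without the leading "@"
    sequence: str   # line 2
    quality: str    # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record for every four lines of FASTQ text."""
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(lines, None)
        plus = next(lines, None)
        quality = next(lines, None)
        if quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads, used only to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!#I\n")
records = list(read_fastq(example))
for record in records:
    print(record.name, record.sequence, record.quality)
```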

51

What does the Phred Quality Score (Q Score) measure?

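The confidence in a base call, expressed through the probability that the call is wrong: Q = -10 log10(P_error). A higher Q score means a lower chance of error; for example, Q20 corresponds to a 1 in 100 chance of an incorrect base call.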
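
Under the standard Phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. The sketch below converts between the two and decodes a FASTQ quality string. The ASCII offset of 33 is an assumption (it is the common Sanger/Illumina 1.8+ encoding), and the function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with quality score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_quality(quality_string: str, offset: int = 33) -> list:
    """Convert FASTQ quality characters to integer Q scores (offset 33 assumed)."""
    return [ord(char) - offset for char in quality_string]


scores = decode_quality("I5!")
print(scores)  # [40, 20, 0]
for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))
```

Q10, Q20, and Q30 correspond to error probabilities of roughly 1 in 10, 1 in 100, and 1 in 1000.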

52

What is FastQC?

A commonly used tool for read quality assessment that can be run either through its interactive graphical interface or from the command line.

53

What is adapter trimming in genome assembly?

The removal of adapter sequences from the ends of DNA fragments to ensure only the actual target DNA is analyzed.
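
A simplified Python sketch of the idea: find the adapter at the 3' end of a read, either in full or as a partial prefix running off the end, and cut the read there. This uses exact matching only and is not how any particular trimming tool is implemented. The adapter string and the `min_overlap` parameter are assumptions made for this example.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove an adapter (or a partial adapter at the read end) from a read."""
    # Case 1: the whole adapter occurs inside the read.
    index = read.find(adapter)
    if index != -1:
        return read[:index]
    # Case 2: only the first bases of the adapter fit before the read ends.
    longest_partial = min(len(adapter) - 1, len(read))
    for overlap in range(longest_partial, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read  # no adapter found


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence, assumed for this sketch

print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT
print(trim_adapter("ACGTACGT", ADAPTER))                  # ACGTACGT
```

The `min_overlap` threshold matters: a very short partial match (one or two bases) occurs by chance so often that trimming on it would remove real sequence.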

54

What is low-quality end trimming?

The removal of poor-quality base calls at the ends of reads to ensure only high-quality bases are present.
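
One simple way to picture this is to walk inward from each end of the read until a base meets a quality threshold. The Python sketch below does exactly that; the threshold of Q20 and the ASCII offset of 33 are assumptions, and trimming tools often use sliding-window rules rather than this per-base rule.

```python
def trim_low_quality_ends(sequence: str, quality: str,
                          min_q: int = 20, offset: int = 33):
    """Drop bases from both ends of a read until a base with Q >= min_q is reached."""
    scores = [ord(char) - offset for char in quality]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1  # trim the 5' end
    while end > start and scores[end - 1] < min_q:
        end -= 1    # trim the 3' end
    return sequence[start:end], quality[start:end]


# "!" is Q0, "#" is Q2 and "I" is Q40 under the assumed offset of 33.
trimmed = trim_low_quality_ends("ACGTACGT", "!!IIII##")
print(trimmed)  # ('GTAC', 'IIII')
```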

55

What is genome assembly?

A computational process of reconstructing the sequence of an organism's genetic material from numerous short sequences called reads.

56

What are the two main types of genome assembly?

Reference assembly and de novo assembly.

57

What is a reference genome?

A representative example of a set of chromosomes for a species, ideally produced from the DNA of one member of that species.

58

What challenges exist in genome assembly?

Repetitive regions can cause gaps, rearrangements, and inaccurate repetitions in the assembly.

59

What are short tandem repeats?

Repetitive sequences in the genome that consist of a short unit repeated in tandem, such as ATATATAT (the unit AT repeated four times).
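
A tandem repeat can be found with a regular expression that captures a short unit and then requires the same unit to follow immediately. The Python sketch below is a minimal illustration; the function name, the unit-length limits (1 to 6 bases), and the minimum of three copies are assumptions for this example, not a standard definition.

```python
import re


def find_tandem_repeats(sequence: str, unit_min: int = 1, unit_max: int = 6,
                        min_copies: int = 3):
    """Return (start, unit, copies) for each run of a short unit repeated in tandem."""
    # Group 1 is the repeat unit; \1{n,} demands at least n further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (unit_min, unit_max, min_copies - 1)
    )
    repeats = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), unit, copies))
    return repeats


hits = find_tandem_repeats("GGCATATATATCC")
print(hits)  # [(3, 'AT', 4)]
```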

60

What are long interspersed nuclear elements (LINEs)?

Repetitive sequences in the genome that are approximately 7000 base pairs long.

61

What is scaffolding in genome assembly?

The process of ordering, orienting, and stitching assembled contigs together based on linking information from paired short reads.

62

What does N50 measure in genome assembly?

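The length of the shortest contig in the set of longest contigs that together cover at least 50% of the total assembly length. In other words, half of the assembled bases lie in contigs of this length or longer.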
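
N50 is easiest to understand by computing it: sort the contigs from longest to shortest, add up their lengths, and stop at the contig that takes the running total to at least half of the assembly. A minimal Python sketch follows; the function name and the contig lengths are made up for illustration.

```python
def n50(contig_lengths) -> int:
    """Length of the contig at which the running total reaches half the assembly size."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # running total covers at least 50% of the assembly
            return length


# Total is 290, so half is 145; 80 + 70 = 150 passes it at the 70 bp contig.
lengths = [80, 70, 50, 40, 30, 20]
print(n50(lengths))  # 70
```

A larger N50 generally indicates a less fragmented assembly, but it says nothing about correctness: wrongly joined contigs also raise N50.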

63

What is the purpose of gap filling in genome assembly?

To fill in gaps using actual sequences to improve the continuity of the assembly.

64

What tools are commonly used for adapter trimming?

PRINSEQ and Trimmomatic.

65

What is the significance of misassemblies in genome assembly?

Misassemblies need to be corrected before scaffolding to ensure accurate genome representation.

66

What is the role of QUAST in genome assembly?

It is a tool used to compute and compare quality metrics between different genome assemblies.

67

What is the difference between short read and long read assembly?

Short-read assembly builds the genome from short sequences, while long-read assembly uses much longer sequences, which provide more context and can span repetitive regions.

68

What is genome annotation?

The process of deriving structural and functional information of a protein or gene from raw data using various analysis techniques.

69

What are the two main components of genome annotation?

(a) Identifying elements on the genome (gene prediction) and (b) attaching biological information to these elements.

70

What are the three categories of genome annotation?

1. Nucleotide-level: identify the location of gene features.
2. Protein-level: determine the possible functions of genes.
3. Process-level: identify the pathways and processes in which genes interact.

71

What does structural annotation involve?

Attaching biological meaning to genome sequences by analyzing their sequence structure and composition.

72

What is the output of structural annotation?

Gene maps and location of elements.

73

What does functional annotation assign?

Biologically relevant information to predicted polypeptides and the features they derive from, such as genes and mRNA.

74

What are the outputs of functional annotation?

Biological processes, cellular components, and molecular functions.

75

What factors should be considered in genome assembly?

1. Genome size
2. Repeats
3. Heterozygosity

76

What is the General Feature Format (GFF)?

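A tab-delimited text format that records the location and type of genes and other features on a sequence. It is often the output of a genome annotation and is used to submit data to databases for improved availability and findability.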
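
For orientation, a GFF3 feature line has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The Python sketch below splits one such line into named fields. The function name and the example line are invented for illustration, and the sketch ignores comment lines and escaped characters.

```python
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line: str) -> dict:
    """Split one GFF3 feature line into a dictionary keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError("a GFF3 feature line must have 9 tab-separated columns")
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"] = int(feature["start"])  # coordinates are 1-based
    feature["end"] = int(feature["end"])      # and the end is inclusive
    # Column 9 holds key=value pairs separated by semicolons.
    feature["attributes"] = dict(
        item.split("=", 1)
        for item in feature["attributes"].split(";")
        if "=" in item
    )
    return feature


# A made-up feature line, not taken from a real annotation.
example_line = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo"
feature = parse_gff3_line(example_line)
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
```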

77

What is the relevance of genome annotation in bioinformatics?

It translates raw genetic data, together with physical and genetic maps, into understandable biological information.

78

What are the three sequencing technologies?

1. First generation (Sanger): high accuracy, reads of ~1000 bp.
2. Second generation: massively parallel, high throughput, reads of 150-300 bp.
3. Third generation: direct sequencing without amplification, reads of 10,000 to 1,000,000 bp.

79

What are the challenges associated with genome assembly?

Presence of repeating sequences, short reads, sequencing errors, and computational requirements.

80

Why is quality control important prior to bioinformatic analyses?

To ensure the accuracy and reliability of the data being analyzed.

81

What are the two types of genome annotation?

Structural annotation and functional annotation.