Platypus Genomics and Wildlife Conservation

Platypus (Ornithorhynchus anatinus): From Genetics to Genomics

Learning Outcomes

  • Describe the main characteristics of experimental approaches and bioinformatic pipelines in genomics.
  • Discuss the applications of genomic and bioinformatic pipelines in wildlife conservation.

Outline

  • Recommended readings.
  • Brief introduction to genomics: genomic approaches and pipelines.
  • Platypus case studies x 3 using genome resequencing and reduced representation sequencing.

Recommended Readings

  • Barbosa S., Hendricks S.A., Funk W.C., Rajora O.P. (2020) Wildlife Population Genomics: Applications and Approaches. In: Hohenlohe P.A., Rajora O.P. (eds) Population Genomics: Wildlife. Population Genomics. Springer, Cham. https://doi.org/10.1007/13836_2020_83
  • Martin HC, Batty EM, Hussin J, Westall P, Daish T, Kolomyjec S, Piazza P, Bowden R, Hawkins M, Grant T, Moritz C, Grutzner F, Gongora J, Donnelly P. Insights into Platypus Population Structure and History from Whole-Genome Sequencing. Mol Biol Evol. 2018 May 1;35(5):1238-1252. doi: 10.1093/molbev/msy041.
  • Zhou et al. (2021) Platypus and echidna genomes reveal mammalian biology and evolution. Nature. 2021 Apr;592(7856):756-762. doi: 10.1038/s41586-020-03039-0.
  • Warren WC, et al (2008). Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008 May 8;453(7192):175-83. doi: 10.1038/nature06936.
  • Litticia M. Bryant, Matt N. Krosch, Lines in the land: a review of evidence for eastern Australia's major biogeographical barriers to closed forest taxa, Biological Journal of the Linnean Society, Volume 119, Issue 2, October 2016, Pages 238–264, https://doi.org/10.1111/bij.12821.

Experimental Approach

  • Further details can be found at: Barbosa et al (2020) Wildlife Population Genomics: Applications and Approaches. In: Hohenlohe P.A., Rajora O.P. (eds) Population Genomics: Wildlife. Population Genomics. Springer, Cham. https://doi.org/10.1007/13836_2020_83
  • Sampling collection method
  • Sample preparation
  • DNA extraction method (purpose: long reads or short reads)
  • Assessing DNA quality
  • Library preparation
  • Size partitioning/fragmentation
  • Ligation of adaptors
  • PCR enrichment, depending on the sequencing method
  • Preparation of sequencing reactions (single/multiplexing)
  • Short- or long-read sequencing

Traditional Markers

  • Various techniques for small to moderate numbers of markers
  • Examples: Microsatellites; exon-priming intron-crossing markers
  • General considerations:
    • Cost per sample: Variable, US$10–50
    • Number of markers: 10^1 –10^2
    • Applicability to new taxa: Moderate
    • Ability to target candidate loci: Yes
    • DNA quality required: Low
    • Equipment needed: PCR machine; traditional sequencer

qPCR-Based SNP Chips

  • Hybridizing array; genotyping by real-time qPCR
  • Examples: Fluidigm dynamic arrays; Illumina GoldenGate; Applied Biosystems OpenArray
  • General considerations:
    • Cost per sample: 200–500
    • Number of markers: 10^2 –10^3
    • Applicability to new taxa: Low
    • Ability to target candidate loci: Yes
    • DNA quality required: Low
    • Equipment needed: $100,000 platform

High-Density SNP Chips

  • High-density oligonucleotide hybridizing array with fluorescent probes
  • Examples: Affymetrix GeneChip; Illumina BeadChip
  • General considerations:
    • Cost per sample: 200–1,000
    • Number of markers: 10^4 –10^5
    • Applicability to new taxa: Low
    • Ability to target candidate loci: Yes
    • DNA quality required: High
    • Equipment needed: $150,000 platform

Targeted DNA Sequencing

  • Fragment capture with oligonucleotide array; genotyping by next-generation sequencing
  • Examples: Exon capture
  • General considerations:
    • Cost per sample: 50–150
    • Number of markers: 10^4 –10^5
    • Applicability to new taxa: Low–moderate
    • Ability to target candidate loci: Yes
    • DNA quality required: High
    • Equipment needed: $5,000 for equipment; next-generation sequencer

Anonymous DNA Sequencing

  • High-throughput sequencing of reduced representation genomic DNA fragments
  • Examples: RAD sequencing
  • General considerations:
    • Cost per sample: 500–5,000
    • Number of markers: 10^4 –10^6
    • Applicability to new taxa: High
    • Ability to target candidate loci: No
    • DNA quality required: Low–moderate
    • Equipment needed: PCR machine; next-generation sequencer

Whole-Genome Resequencing

  • Sequencing of whole genome for multiple individuals in a sample
  • Examples: Next-generation and future sequencing technologies
  • General considerations:
    • Cost per sample: 500–5,000
    • Number of markers: Complete genome
    • Applicability to new taxa: Low
    • Ability to target candidate loci: Yes (bioinformatically)
    • DNA quality required: High
    • Equipment needed: Next-generation sequencer, bioinformatics resources
  • qPCR, quantitative PCR; RAD, restriction-site-associated DNA. Sourced and modified from: Allendorf, F., Hohenlohe, P. & Luikart, G. Genomics and the future of conservation genetics. Nat Rev Genet 11, 697–709 (2010). https://doi.org/10.1038/nrg2844
  • Assessing the requirements, resolution and costs of genomic approaches

Sequencing Approaches

  • 1st, 2nd and 3rd generation

Primary Problem & Possible Genomic Solution

  • Estimation of effective population size (Ne), migration (m) and selection coefficient (s):
    • Increasing the number of markers, reconstructing pedigrees and using haplotype information will provide greater power to estimate and monitor Ne and m, as well as to identify migrants, estimate the direction of migration and estimate s for individual loci within a population.
  • Reducing the amount of admixture in hybrid populations:
    • Genome scanning of many markers will help to identify individuals with greater amounts of admixture so that they can be removed from the breeding pool.
  • Identification of units for conservation: species, ESU and MU:
    • The incorporation of adaptive genes and gene expression will augment our understanding of conservation units based on neutral genes. The use of individual-based landscape genetics will help to identify boundaries between conservation units more precisely.
  • Minimising adaptation to captivity:
    • Numerous markers throughout the genome could be monitored to detect whether populations are becoming adapted to captivity.
  • Predicting the negative impact of inbreeding depression:
    • Understanding the genetic basis of inbreeding depression will facilitate the prediction of the effectiveness of purging. Genotyping of individuals at loci associated with inbreeding depression will allow the selection of individuals as founders or mates in captive populations. Pedigree reconstruction will allow more powerful tests of inbreeding depression.
  • Estimating the amount of outbreeding depression:
    • Understanding the divergence of populations at adaptive genes will help to predict effects on fitness when these genes are combined. Detecting chromosomal rearrangements will help to predict outbreeding depression.
  • Predicting the viability of local populations:
    • Incorporating genotypes that affect vital rates and the genetic architecture of inbreeding depression will improve population viability models.
  • Predicting the potential of populations to adapt to environmental changes and threats:
    • Understanding adaptive genetic variation will help to predict the response to a rapidly changing environment or to harvesting by humans and allow the selection of individuals for assisted migration.
  • Sourced and modified from: Allendorf, F., Hohenlohe, P. & Luikart, G. Genomics and the future of conservation genetics. Nat Rev Genet 11, 697–709 (2010). https://doi.org/10.1038/nrg2844
  • The need for thinking about the problem and possible solution before deciding on the genomic methods

Questions About Genomic Approaches

  • In what circumstances should we use complete/entire genome sequencing approaches instead of low genome representation approaches in wildlife research?
    • To sequence the entire genome in order to develop a reference sequence
    • To gain a comprehensive understanding of the genetic variation across the genome
    • To make genome-wide associations
    • To investigate structural genome rearrangements
  • In what circumstances should we use low genome representation approaches instead of complete/entire genome sequencing approaches in wildlife research?
    • When we are interested in studying only portions of the genomes of species or populations
    • For DNA barcoding, targeted sequencing, DNA capture or genotyping by sequencing
    • For genotyping SNPs to assess population structure, population dynamics, population demographics and genetic diversity within and across populations
  • In what circumstances should we use third-generation/long-read sequencing instead of short-read sequencing?
    • Complete genome sequencing projects
    • Long fragments of the genome
    • Complex genome rearrangements
    • Complex genomic regions (e.g. containing gene copy number variation)

Development of Bioinformatic Pipelines for Raw Data

  • Further details can be found at: Barbosa et al (2020) Wildlife Population Genomics: Applications and Approaches. In: Hohenlohe P.A., Rajora O.P. (eds) Population Genomics: Wildlife. Population Genomics. Springer, Cham. https://doi.org/10.1007/13836_2020_83
  • Raw genomic data
  • Quality filtering of raw sequence data
  • Read alignment and mapping
  • Mapping statistics
  • Post-alignment filtering
  • Base quality score recalibration
  • Variant calling
  • Filtering of variants
  • Variant annotation
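The quality-filtering step in the list above can be illustrated with a minimal sketch. It assumes FASTQ input with Phred+33 quality encoding and a mean-quality threshold of 20; the function names and the threshold are hypothetical choices for illustration, not the settings of any pipeline described in these slides. Real projects would normally use a dedicated read-trimming tool.

```python
from statistics import mean


def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def filter_reads(fastq_lines, min_mean_quality=20):
    """Yield (header, sequence, quality) for reads whose mean Phred score passes the threshold."""
    records = [line.rstrip("\n") for line in fastq_lines]
    # A FASTQ record is four lines: header, sequence, separator, quality.
    for i in range(0, len(records) - 3, 4):
        header, sequence, _separator, quality = records[i:i + 4]
        if mean(phred_scores(quality)) >= min_mean_quality:
            yield header, sequence, quality


# Toy input (hypothetical reads): 'I' encodes Phred 40, '#' encodes Phred 2.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
]
kept = list(filter_reads(example))
print([header for header, _, _ in kept])
```

Only the first toy read passes, because its mean quality (40) is above the assumed threshold while the second read's (2) is not.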

Development of Data Analysis Pipelines

  • This refers to the actual analyses of the dataset. It depends on the quality and quantity of the data and on the questions addressed in your study.
  • Further details can be found at: Barbosa et al (2020) Wildlife Population Genomics: Applications and Approaches. In: Hohenlohe P.A., Rajora O.P. (eds) Population Genomics: Wildlife. Population Genomics. Springer, Cham. https://doi.org/10.1007/13836_2020_83
  • Data integration pipeline modified from Lee et al. 2018 https://pubmed.ncbi.nlm.nih.gov/29256177/
  • [Figure: data integration pipeline; recoverable labels: Species, Genus, Family; Hierarchical clustering; Multiway ANOVA; Result interpretation]
  • Guide to develop a pipeline for population genomics in natural populations
  • Example of pipeline in comparative genomics

Case Study: Genome Resequencing of Platypuses

  • Martin HC, Batty EM, Hussin J, Westall P, Daish T, Kolomyjec S, Piazza P, Bowden R, Hawkins M, Grant T, Moritz C, Grutzner F, Gongora J, Donnelly P. Insights into Platypus Population Structure and History from Whole-Genome Sequencing. Mol Biol Evol. 2018 May 1;35(5):1238-1252. doi: 10.1093/molbev/msy041. PMID: 29688544; PMCID: PMC5913675.

Previous Genetic Studies on Platypuses

Paper/Thesis | # samples | Sampling locations | Genetic data | Major finding
------------------------------
Gemmell (1994) | 121 | Mainly Shoalhaven; & Goulburn, Thredbo, Mitta Mitta, Tambo, Merri rivers | Mitochondrial RFLP haplotypes | Geographical partitioning of platypuses
Akiyama (1998) | (not given) | Mainly Shoalhaven & a few from Warrawong Sanctuary | Microsatellites | Geographical partitioning of platypuses
Warren et al. | 90 | QLD, NSW, VIC, TAS, SA | First platypus genome draft / 57 retrotransposons | Two major groups (mainland vs TAS)
Kolomyjec et al. | 130 | Mainly Hawkesbury and Shoalhaven river basins, NSW | 12 microsatellites | Rivers act as discrete population units/MU
Gongora et al. | 284 | 22 river basins across platypus range | Mitochondrial control region and cytochrome b gene | Two major groups (mainland vs TAS); 3-4 major genetic lineages/ESUs across the species’ range
Kolomyjec et al. | 235 | 12 basins across platypus range | 12 microsatellites | 3-4 ESUs across the species’ range
Furlan et al. | 752 | River basins across NSW, Victoria, TAS | 13 microsatellites, two mitochondrial haplotypes | Rivers act as discrete population units/MU

Why Use Genome Sequencing to Understand Platypus Population Dynamics and Demographics?

  • Rationale:
    • A small number of markers gives limited information about the underlying population history
    • A genealogy built from a single locus may not provide sufficient resolution to understand the historical relationships between populations
    • Whole-genome sequencing data are much more informative than mitochondrial DNA or microsatellites
    • Previous attempts to assess kinship relationships using microsatellites have failed
  • Aims:
    • Investigate diversity
    • Investigate structure and differentiation between subpopulations
    • Investigate relative historical effective population sizes
    • Investigate relatedness between the individuals sampled
    • Investigate dispersal
  • The next slides present the work described in Martin et al. (2018) Insights into Platypus Population Structure and History from Whole-Genome Sequencing. Mol Biol Evol 35(5):1238-1252. doi: 10.1093/molbev/msy041.
  • Most of the figures and tables presented here have been sourced from this paper.

Sampling

  • Samples provided by Jaime Gongora, Frank Grutzner, Tasman Daish, Tom Grant and other collaborators
  • Sampling locations across Queensland, New South Wales, and Tasmania

How Genetic Work Is Done

  • https://schoolworkhelper.net/gel-electrophoresis-basics-steps/
  • DNA extraction
  • Library preparation & short-read genome resequencing
  • Mapping and variant calling: we used the improved genome assembly ornAna2
    • To assign contigs to chromosomes, we used chromosome assignment made for the ornAna1 reference: chromosomes 1 to 7, 10-12, 14, 15, 17, 18, 20, X1, X2, X3 and X5.
  • Identifying relatives to estimate the kinship coefficient
  • Population genetic (genomic) analyses: diversity indices, FST estimates, principal component analysis (PCA) and STRUCTURE
  • Pairwise sequentially Markovian coalescent (PSMC) to investigate historical population sizes
  • Sampling Pipeline

Results

  • Variant calling and SNP distribution
  • The variant calls were filtered to produce a set of 6.7M high-quality SNPs across 54 autosomal scaffolds comprising 965Mb of the assembly.

Principal Component Analysis

  • PCA identified five genetically distinct groups within the platypus

Population differences FST

For comparison, FST between the most distant human populations is ~0.1-0.15.

Pairwise FST | North QLD | North NSW | Central NSW | Tasmania
------------------------------
North QLD | - | 0.352 | 0.384 | 0.769
North NSW | 0.352 | - | 0.134 | 0.651
Central NSW | 0.384 | 0.134 | - | 0.649
Tasmania | 0.769 | 0.651 | 0.649 | -
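To make the pairwise FST values above more concrete, here is a minimal sketch of the textbook definition FST = (HT - HS) / HT for biallelic loci in two equally weighted populations. The estimator actually used in the study is not stated on these slides, so this is an illustration of the concept only; the function name and the allele frequencies are hypothetical.

```python
def fst_two_populations(freqs_pop1, freqs_pop2):
    """Wright's FST from per-locus allele frequencies in two populations.

    HS is the mean within-population expected heterozygosity and HT the
    expected heterozygosity of the pooled populations; both are summed
    across loci before taking the ratio.
    """
    hs_total = 0.0
    ht_total = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        hs_total += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        ht_total += 2 * p_bar * (1 - p_bar)
    if ht_total == 0:
        raise ValueError("no polymorphic loci in the pooled sample")
    return (ht_total - hs_total) / ht_total


# Hypothetical allele frequencies at three loci.
print(fst_two_populations([0.8, 0.5, 0.1], [0.2, 0.5, 0.1]))
```

Identical allele frequencies give FST = 0, and loci fixed for different alleles in the two populations give FST = 1, which is why values near 0.77 (North QLD vs Tasmania) indicate very strong differentiation.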
Number of fixed divergences from the reference per population (out of 2.5 million stringent SNPs)

Population | # fixed divergences
------------------------------
North QLD | 150,596
North NSW* | 9,675
Central NSW | 39,286
Tasmania | 353,964
*Excluding the reference sample

Genomics Provides Resolution to Assess Relatedness

  • Confirmed father-daughter pair from Taronga Zoo used in the study
  • 31 of the 57 samples had a first-, second- or third-degree relative amongst the other samples
    • (one individual from each relative pair was removed for population genetic analyses, leaving 42 samples)
  • In most cases, the relative pairs were male-female or female-female (due to male-biased dispersal?)
  • Family quartet: mother + two offspring (dizygotic twins?) (+ mother’s sister) collected 2 km downstream of father, all within 3 years
  • One pair of 2nd-degree relatives from Barnard River, plus six pairs from Shoalhaven
  • Five pairs of 3rd-degree relatives each from Barnard, Shoalhaven and Wingecarribee, and four pairs from Dirran Creek
  • The finding of many relative pairs across sampling sites suggests limited dispersal of at least some relatives
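As a rough illustration of how relative pairs can be binned by degree, the sketch below classifies a kinship coefficient using the cut-offs commonly applied with KING-style estimates (expected values: 1/4 for first-degree, 1/8 for second-degree and 1/16 for third-degree relatives). The slides do not state which thresholds the study used, so the bounds and the function name are assumptions for illustration.

```python
# Lower bounds for each relationship class (commonly used KING-style cut-offs,
# assumed here; they lie halfway, on a log2 scale, between expected values).
KINSHIP_BOUNDS = [
    (0.354, "duplicate or monozygotic twin"),
    (0.177, "first-degree"),
    (0.0884, "second-degree"),
    (0.0442, "third-degree"),
]


def classify_kinship(phi):
    """Return the relationship class for an estimated kinship coefficient phi."""
    for lower_bound, label in KINSHIP_BOUNDS:
        if phi > lower_bound:
            return label
    return "unrelated or more distant"


# Expected kinship for parent-offspring, half-sib and first-cousin pairs.
for phi in (0.25, 0.125, 0.0625):
    print(phi, classify_kinship(phi))
```

Estimates from real data scatter around the expected values, so pairs close to a boundary should be interpreted with caution.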

Genomics Allowed Estimating the De Novo Mutation Rate

Genomics allowed estimating the de novo mutation rate in the platypus genome which was not possible in previous genetic studies

  • Putative de novo mutations in the two offspring in the family quartet
  • Our point estimate of the rate is 7.0 × 10^{-9}/bp/generation (95% CI 4.1 × 10^{-9} – 1.2 × 10^{-8}) in the platypus.
  • This is lower than the estimated rate of 1.2 × 10^{-8} in humans and chimpanzees (Kong et al. 2012; Venn et al. 2014) but higher than the rates estimated for laboratory mice (5.4 × 10^{-9}) (Uchimura et al. 2015).
  • Using the lenient call set:
    • \mu = \frac{\text{no. putative de novos in N742} + \text{no. putative de novos in N757}}{\text{no. sites callable in N742 and parents} + \text{no. sites callable in N757 and parents}} = \frac{66 + 103}{872184858 + 883117446} \approx 9.6 \times 10^{-8}\ \text{per bp per generation}
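The arithmetic for the lenient call set can be checked with a few lines of code. This sketch only reproduces the ratio as written on the slide (putative de novo mutations divided by callable sites, summed over the two offspring); it applies no further correction, and the function name is hypothetical.

```python
def de_novo_rate(mutation_counts, callable_sites):
    """Ratio of total putative de novo mutations to total callable sites."""
    return sum(mutation_counts) / sum(callable_sites)


# Counts and callable sites for offspring N742 and N757, taken from the slide.
mu = de_novo_rate([66, 103], [872_184_858, 883_117_446])
print(f"{mu:.1e} per bp per generation")
```

The result rounds to 9.6 × 10^{-8}, matching the value quoted for the lenient call set.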

There Is Not Extreme Inbreeding in the Platypus

  • F_{ROH} – estimate of inbreeding coefficient.
  • Little evidence of extreme inbreeding
  • Most individuals don’t have substantial homozygosity, suggesting a mechanism for avoiding mating with close (1st- or 2nd-degree) relatives, possibly male-biased dispersal
  • For the Carnarvon sample (n=1), which has low genetic diversity, it is difficult to distinguish inbreeding from historically low Ne: the high homozygosity may reflect the low effective population size over the last ∼50,000 years or may be the result of recent inbreeding
  • Proportion of genome in homozygous chunks (F_{ROH})
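F_{ROH} as described above is a simple proportion, which the following sketch computes from a list of runs of homozygosity (ROH). The segment coordinates, the genome length and the minimum segment length are hypothetical values chosen for illustration; segments are assumed to be non-overlapping, half-open intervals in base pairs.

```python
def f_roh(roh_segments, genome_length, min_length=0):
    """Proportion of the genome covered by ROH segments of at least min_length bp."""
    total_roh = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length
    )
    return total_roh / genome_length


# Hypothetical individual: one 2 Mb and one 0.5 Mb run on a 100 Mb genome.
segments = [(0, 2_000_000), (5_000_000, 5_500_000)]
print(f_roh(segments, genome_length=100_000_000, min_length=1_000_000))
```

With the assumed 1 Mb minimum, only the 2 Mb run counts, giving F_{ROH} = 0.02; lowering the minimum length includes shorter runs, which tend to reflect older shared ancestry rather than recent inbreeding.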

STRUCTURE: Five Genetic Lineages

STRUCTURE: five genetic lineages were identified in the platypus

  • Structure program:
    • Infers the genetic ancestry of populations
    • Assigns individuals to populations
    • Infers admixed individuals
    • Identifies migrants

Unexpected Similarities and Differences

  • Unexpected similarities and differences between river systems were identified, e.g. between river systems on opposite sides of the Great Dividing Range and between adjacent river systems
  • Murray-Darling basin connected to east-flowing basins?
  • Previous genetic studies did not provide resolution to assess this

PSMC: Pairwise Sequentially Markovian Coalescent Analyses

  • Last Glacial Maximum (LGM, 16-20K YBP)
  • YBP: years before present
  • Ne: effective population size
  • Analyses of Ne in the context of divergence time allowed assessment of which populations underwent a bottleneck and whether this persists at present
    • NSW and TAS populations have had some fluctuations but a stable and higher Ne
    • In contrast, low Ne was identified in QLD populations around the LGM, in line with the impact of arid periods (glacial maxima) on the mesic forest biotas of the region (Bryant and Krosch 2016).
    • A considerable reduction of Ne (strong bottleneck) was found in NQLD populations, but there has been some recovery
    • More studies are required to assess the degree of this bottleneck in the region in contemporary populations
  • Previous genetic studies did not provide resolution to assess this

Summary Findings

  • Very strong population structure in the platypus
  • Finding of many relative pairs suggests limited dispersal of at least some relatives
  • Little evidence of extreme inbreeding
  • First de novo mutation rate estimate in the platypus
  • Some patterns of similarity and differences between populations not easy to reconcile with geography
  • Some populations underwent bottleneck events and some seemed to have recovered
  • Impact for conservation:
    • Population structure, genomic diversity and demographic histories of platypus populations should be used to inform captive breeding and reintroduction programs
    • Carnarvon River platypuses (NQLD) may be particularly vulnerable due to their high level of homozygosity and should therefore be a priority for conservation

Fragmentation by Large Dams and Implications

  • Insights from neutral loci using reduced representation sequencing

How Functional Is Landscape Connectivity?

  • Do dams affect dispersal between populations?
  • Hypotheses
    • Unregulated (undammed) rivers
      • If there are no barriers to platypus dispersal within a river system, we would expect to see a continuous platypus population with little or no genetic differentiation across the river system
    • Regulated (dammed) rivers
      • If dams affect platypus dispersal, genetic differentiation will be greater across the dam than along a similar stretch of the undammed river

Methods

  • This study investigated 214 platypuses from regulated and unregulated rivers using reduced-representation sequencing (2.6K SNPs) on putatively neutral loci (i.e. non-coding genomic regions).
  • Samples were genotyped using DArTseq™ (DArT Pty Ltd, Canberra, ACT, Australia).
  • DArT’s procedure uses a combination of genome complexity reduction methods using restriction enzymes, implicit fragment size selection and next-generation sequencing to produce thousands of SNPs randomly distributed throughout the genome.
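The idea of reducing genome complexity with restriction enzymes and fragment size selection can be sketched in silico. The example below cuts a sequence at the PstI recognition site (CTGCAG, cut after the A) and keeps fragments within a size window. The choice of PstI, the size window and the toy sequence are assumptions for illustration only; the actual DArTseq protocol uses its own enzyme combinations and laboratory steps, which are not reproduced here.

```python
def digest(sequence, site="CTGCAG", cut_offset=5):
    """Cut a sequence at every occurrence of a recognition site.

    cut_offset is the position within the site after which the cut falls
    (5 for PstI, which cuts CTGCA^G).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


def size_select(fragments, min_length, max_length):
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_length <= len(f) <= max_length]


# Toy sequence with two recognition sites (hypothetical).
toy_sequence = "AAAA" + "CTGCAG" + "TTTTTTTT" + "CTGCAG" + "CC"
fragments = digest(toy_sequence)
print([len(f) for f in fragments])
print(size_select(fragments, min_length=5, max_length=10))
```

Only fragments inside the size window would go on to sequencing, which is how the same subset of loci is sampled across individuals without sequencing the whole genome.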

Genetic Differentiation vs Dam Age

  • Genetic differentiation is greater across older dams

PCA Based on Genetic Data

  • PCA shows genetic differences based on location relative to damming

Alleles in Space - Dammed vs Undammed River

  • Alleles in Space based on the genetic data – Dammed River
  • Alleles in Space based on the genetic data – Undammed River

Conservation Implications of Dams

  • Genetic Differentiation is:
    • Greater across older dams
    • Highest in the location of the dams
  • Conservation Implications
    • Dams appear to block immigration and recolonization
    • Dams subdivide populations which might lose genetic variation at a faster rate
  • This study found that genetic differentiation between populations (FST) across dams was 4- to 20-fold higher than along similar stretches of adjacent undammed rivers, indicating that major dams act as major barriers to platypus movement.
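The point that subdivided populations might lose genetic variation faster follows from the standard drift expectation H_t = H_0 (1 - 1/(2 Ne))^t, where H is expected heterozygosity and t is the number of generations. The sketch below evaluates it for two hypothetical effective population sizes; the Ne values, the starting heterozygosity and the number of generations are illustrative assumptions, not estimates from the study.

```python
def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after drift in an ideal population of size ne."""
    return h0 * (1 - 1 / (2 * ne)) ** generations


# Hypothetical comparison: a connected population versus a fragment isolated by a dam.
for ne in (500, 50):
    remaining = expected_heterozygosity(h0=0.3, ne=ne, generations=10)
    print(f"Ne={ne}: H after 10 generations = {remaining:.4f}")
```

The smaller population retains less heterozygosity over the same number of generations, which is the expectation behind the conservation concern about populations subdivided by dams.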

The Influence of Dams

  • Insights from genome-wide neutral and adaptive loci

Introduction and Methods - Dams and Platypus Populations

  • Background: Neutral markers have shown that large dams restrict connectivity and disturb platypus habitat
  • Issue: It is unclear whether this impact on neutral markers has also extended to adaptive changes.
  • Aim: In this study, whole genomes of platypuses from groups below and above Pindari Dam on the Severn River (dammed river) were resequenced and compared with those from Tenterfield Creek, an adjacent undammed stream.
  • Methods: Through genome resequencing, we analysed 9.2 million SNPs in 25,862 genomic regions sampled every 100 Kbp, including coding sequences (possibly adaptive) and non-coding regions (usually presumed neutral).

Study Area - Dams and Platypus Populations

  • Dataset: Whole genome sequencing data consisting of 26 platypus samples that were collected in the unregulated Tenterfield Creek (n=11) and below the dam (n=8), and above the dam (n=7) in the regulated Severn River.

Potential Effects of Dam Construction

  • Three potential effects of dam construction were evaluated in this study:
    • A. Only adaptive regions diverged;
    • B. Only neutral regions diverged;
    • C. Adaptive regions diverged and influenced adjacent neutral loci.

Dividing the Dataset - Dams and Platypus

  • To determine the scenarios regarding the fate of neutral and potentially adaptive loci following dam construction, the dataset was divided into four groups:
    • Coding regions in Severn.
    • Non-coding regions in Severn.
    • Coding regions in Tenterfield.
    • Non-coding regions in Tenterfield.

Analyzing FST of Chromosomes

  • We then sampled genomic regions in each chromosome for each group every 100Kbp and assigned to each region a scenario based on the following expectations:

    • a) Adaptive loci diverged; neutral loci did not diverge; if:
      • - FST coding region Severn > 3 SD + FST coding region Tenterfield; and
      • - FST coding region Severn > 3 SD + FST non-coding region Severn.
    • b) Neutral loci diverged; adaptive loci did not diverge; if:
      • - FST non-coding region Severn > 3 SD + FST non-coding region Tenterfield; and
      • - FST non-coding region Severn > 3 SD + FST coding region Severn.
    • c) Adaptive loci diverged; neutral loci diverged; if:
      • - FST coding region Severn > 3 SD + FST coding region Tenterfield; and
      • - FST non-coding region Severn > 3 SD + FST non-coding region Tenterfield.
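The three sets of expectations above can be written as a small decision function. This is a sketch under stated assumptions: each condition "X > 3 SD + Y" is read as X exceeding Y by more than three standard deviations, a single SD value is supplied by the caller (the slides do not specify how SD was computed), and the function returns every scenario whose conditions are met because the slides give no precedence rule. The function name, dictionary keys and example values are hypothetical.

```python
def classify_region(fst, sd):
    """Return the list of scenarios ('a', 'b', 'c') whose conditions a region satisfies.

    fst is a dict with keys 'coding_severn', 'noncoding_severn',
    'coding_tenterfield' and 'noncoding_tenterfield'.
    """
    margin = 3 * sd
    cs, ns = fst["coding_severn"], fst["noncoding_severn"]
    ct, nt = fst["coding_tenterfield"], fst["noncoding_tenterfield"]

    coding_exceeds_control = cs > ct + margin
    noncoding_exceeds_control = ns > nt + margin

    scenarios = []
    if coding_exceeds_control and cs > ns + margin:
        scenarios.append("a")  # adaptive loci diverged, neutral loci did not
    if noncoding_exceeds_control and ns > cs + margin:
        scenarios.append("b")  # neutral loci diverged, adaptive loci did not
    if coding_exceeds_control and noncoding_exceeds_control:
        scenarios.append("c")  # both adaptive and neutral loci diverged
    return scenarios


# Hypothetical FST values for one 100 Kbp region.
region = {
    "coding_severn": 0.30,
    "noncoding_severn": 0.05,
    "coding_tenterfield": 0.04,
    "noncoding_tenterfield": 0.04,
}
print(classify_region(region, sd=0.02))
```

Returning a list makes it visible when a region meets more than one set of conditions, which can happen because scenarios a) and c) share their first condition.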

Identifying Loci Under Selection

  • The program OutFLANK (Whitlock & Lotterhos 2015) was used to identify loci with unusually large values of FST.

  • Loci with some alleles favored in some places and other alleles favored elsewhere should be more genetically differentiated among populations than otherwise.

  • Whitlock, M.C. and Lotterhos K.J. (2015) Reliable detection of loci responsible for local adaptation: inference of a neutral model through trimming the distribution of Fst. The American Naturalist 186: 24 - 36.

Principal Component Analysis (PCA)

  • PCA was performed to determine which loci contributed the most to genetic structure between the groups below and above the dam, as well as the group in the undammed river

Extracting PCA Loadings

  • FYI: Extracting PCA loadings
  • PCA loadings describe how much each variable contributes to a particular principal component.
  • Large loadings (positive or negative) indicate that a particular variable has a strong relationship to a particular principal component.
  • The sign of a loading indicates whether a variable and a principal component are positively or negatively correlated.
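A minimal sketch of extracting loadings for the first principal component is given below. It assumes a genotype matrix coded 0/1/2 (individuals in rows, loci in columns) and finds the leading eigenvector of the covariance matrix by power iteration, using only the standard library. This is not the software used in the study, the function name and toy data are hypothetical, and loadings are reported here as entries of the unit eigenvector (some packages additionally scale them by the square root of the eigenvalue).

```python
def first_pc_loadings(genotypes, iterations=200):
    """Loadings of each locus on the first principal component (power iteration)."""
    n_individuals = len(genotypes)
    n_loci = len(genotypes[0])

    # Centre each locus (column) on its mean.
    means = [sum(row[j] for row in genotypes) / n_individuals for j in range(n_loci)]
    centred = [[row[j] - means[j] for j in range(n_loci)] for row in genotypes]

    # Covariance matrix between loci.
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n_individuals - 1) for b in range(n_loci)]
        for a in range(n_loci)
    ]

    # Power iteration from an arbitrary non-zero starting vector.
    vector = [1.0 + 0.1 * j for j in range(n_loci)]
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(n_loci)) for a in range(n_loci)]
        norm = sum(x * x for x in product) ** 0.5
        if norm == 0:
            raise ValueError("no variance in the genotype matrix")
        vector = [x / norm for x in product]
    return vector


# Toy data: loci 0 and 1 separate two groups, locus 2 does not vary.
toy_genotypes = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 1],
    [2, 2, 1],
]
loadings = first_pc_loadings(toy_genotypes)
print([round(value, 3) for value in loadings])
```

In the toy data the two loci that differ between groups receive equal, large loadings and the invariant locus receives a loading of zero, which mirrors how loadings are used to find the loci contributing most to structure. The overall sign of a set of loadings is arbitrary.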

Results - Dams and Platypus Populations

  • We found 115, 46, and 185 genomic regions explained by scenarios a), b) and c), respectively (see the slide ‘Analyzing FST of Chromosomes’).
  • We found probable selection sites in genes with metabolic and development functions including KCNIP4, ADAM10 and OXR1.

Discussion - Dams and Platypus Populations

  • This could be explained by differential environmental challenges above and below dams, resulting in natural selection.
  • This could indicate that adaptation to new environments may have occurred in as few as seven generations.