Molecular Markers
Thomas Lubberstedt, Iowa State University, Agronomy
Outline - DNA versus non-DNA markers - Understand SSRs (incl. PCR) as an example of classical markers - Understand INDEL and SNP marker basics - Genotyping-by-Sequencing - Fundamental marker applications
[Figure: marker levels: metabolite profiles; transcripts, expression profiles, methylation status; DNA (environment and development stage independent). Example image: haploid induction in maize]
DNA versus Non-DNA Markers - DNA markers: not affected by the environment, but only a measure of GENETIC POTENTIAL - Non-DNA markers: dependent on environment and cell type, but indicative of REALIZED POTENTIAL - Humans: a carrier of a disease gene does not necessarily become sick -> diagnosing a disease condition and choosing a possible treatment requires non-DNA biomarkers
Non-DNA Marker Example: Human Biomarkers
Unique MicroRNA Profile in Lung Cancer Diagnosis and Prognosis • miRNAs are small non-coding RNAs that play key roles in regulating the translation and degradation of mRNAs
• Genetic and epigenetic alterations may affect miRNA expression, thereby leading to aberrant expression of target genes in cancers
• Yanaihara et al., Cancer Cell, 2006: - miRNA profiles of 104 pairs of primary lung cancers and corresponding noncancerous lung tissues were analyzed by miRNA microarrays - 43 miRNAs showed statistically significant differences in expression between cancerous and noncancerous tissue
• A univariate Cox proportional hazards regression model with a global permutation test indicated that expression of the miRNAs hsa-mir-155 and hsa-let-7a-2 was related to adenocarcinoma patient outcome • Lung adenocarcinoma patients with either high hsa-mir-155 or reduced hsa-let-7a-2 expression had poor survival
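A univariate Cox proportional hazards model relates survival time to one covariate at a time. The sketch below fits such a model for a single miRNA coded 1 = high and 0 = low expression by maximizing the Cox partial likelihood with Newton-Raphson. It is a minimal illustration under stated assumptions: it assumes no tied event times, it omits the global permutation test used in the study, and the function name and survival data are hypothetical.

```python
import math

def cox_univariate(times, events, x, iters=50, tol=1e-10):
    """Fit a one-covariate Cox proportional hazards model (Newton-Raphson).

    times  -- follow-up time per patient (assumed free of ties)
    events -- 1 if the patient died at that time, 0 if censored
    x      -- covariate per patient, e.g. 1 = high miRNA expression
    Returns beta, the log hazard ratio per unit of x.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for pos, i in enumerate(order):
            if not events[i]:
                continue
            risk = order[pos:]              # patients still at risk at times[i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0         # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0:
            break                           # covariate carries no information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Invented data: 8 patients, all deaths observed, high expression in 4
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
high = [1, 1, 0, 1, 0, 1, 0, 0]
beta = cox_univariate(times, events, high)
print(round(math.exp(beta), 2))  # hazard ratio above 1: high expression, shorter survival
```

The hazard ratio exp(beta) is the quantity behind a statement such as "patients with high expression had poor survival"; a permutation test would refit the model on shuffled expression labels to judge significance.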
DNA Markers
Historic markers: PCR - a key technology
PCR: polymerase chain reaction
[Figure: PCR on a double-stranded template (strands oriented 5’ to 3’ and 3’ to 5’)]
Exponential increase of the number of copies during PCR
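The exponential increase can be written as N = N0 · (1 + E)^n, with N0 template molecules, n cycles and amplification efficiency E (E = 1 means every molecule is copied in every cycle). The sketch below assumes this idealized model; the function name and efficiency parameter are illustrative, and real reactions run below 100 % efficiency and reach a plateau in late cycles, which the formula ignores.

```python
def pcr_copies(n0, cycles, efficiency=1.0):
    """Idealized PCR yield: n0 * (1 + efficiency) ** cycles.

    efficiency = 1.0 means perfect doubling in every cycle;
    the plateau phase of real reactions is not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return n0 * (1.0 + efficiency) ** cycles

# 30 cycles of perfect doubling from a single template molecule
print(pcr_copies(1, 30))          # 1073741824.0
print(pcr_copies(1, 30, 0.9))     # lower yield at 90 % efficiency
```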
Microsatellites (SSRs) • Design primers in the flanking regions
Strengths: - Hypervariable, multiple alleles (high polymorphism information content, PIC) - In silico development is straightforward
Weaknesses: - Limited multiplexing capacity (max. 10-15 markers) ⇒ higher cost per data point - Few intragenic SSRs
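PIC summarizes how informative a marker is from its allele frequencies. A minimal sketch using the standard formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², shows why a multi-allelic SSR scores higher than a biallelic SNP; the allele frequencies in the example are hypothetical.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content of one marker locus.

    freqs -- allele frequencies at the locus (must sum to 1)
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction

# A biallelic SNP vs. a hypothetical SSR with eight equally frequent alleles
print(round(pic([0.5, 0.5]), 4))    # 0.375
print(round(pic([1 / 8] * 8), 4))   # 0.8613
```

The biallelic maximum of 0.375 is one reason a single SNP carries less information than a hypervariable SSR, which SNP platforms compensate for with marker numbers.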
INDEL & SNP Markers
Characteristics of “next generation” vs. “historic” markers: - Most (not all) rely on a massive sequence portfolio - High throughput (sample numbers on a 96- or 384-well scale) - Automated procedures - Detection of the most basic types of polymorphism: - Insertion-deletion (INDEL) - Single nucleotide polymorphism (SNP) - Most are not restricted to particular sequence features (restriction sites, SSRs)
INDEL Array – Arabidopsis thaliana
Salathia et al. 2007
Current markers: SNPs (Single Nucleotide Polymorphisms)
TaqMan - real-time PCR
• 4 oligos (2 PCR primers and 2 allele-specific probes) must be designed and tested for each SNP • Fast and cheap for large numbers of samples
Genotype Calling - Cluster Analysis
[Figure: scatter plot of the T-allele signal against the C-allele signal for SNP 1252]
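Genotype calling by cluster analysis groups samples by the relative strength of the two allele-specific signals. The sketch below is a simplified stand-in for the calling software, not its actual algorithm: it runs a one-dimensional k-means on the fraction of signal from the C allele, with three clusters seeded at "T only", "both alleles" and "C only". The function name, the min_total threshold and the signal values are hypothetical.

```python
def call_genotypes(signals, min_total=0.2, iters=20):
    """Assign TT / CT / CC calls from two allele-specific signals.

    signals   -- list of (t_signal, c_signal) pairs, one per sample
    min_total -- samples with a weaker total signal get "no call"
    Clusters samples by 1-D k-means on the C-allele fraction c / (t + c).
    """
    labels = ["TT", "CT", "CC"]
    centers = [0.0, 0.5, 1.0]            # seeds: T only, both alleles, C only
    frac = []
    for t, c in signals:
        total = t + c
        frac.append(c / total if total > 0 and total >= min_total else None)

    def nearest(f):
        return min(range(3), key=lambda k: abs(f - centers[k]))

    for _ in range(iters):
        groups = [[], [], []]
        for f in frac:
            if f is not None:
                groups[nearest(f)].append(f)
        # Move each center to the mean of its members (empty clusters stay put)
        new_centers = [sum(g) / len(g) if g else centers[k] for k, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return ["no call" if f is None else labels[nearest(f)] for f in frac]

signals = [(1.0, 0.05), (0.9, 0.1), (0.5, 0.55), (0.45, 0.5),
           (0.05, 1.1), (0.1, 0.9), (0.05, 0.05)]
print(call_genotypes(signals))
```

The last sample has too little total signal and is left uncalled, which is what lowers a SNP's call rate.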
Components of the GeneChip® System
http://www.noble.org/corefacilities/genomics/
http://www.memsjournal.com/2011/02/health-andmedical-market-strategies-for-mems-companies.html
• The basic component of the GeneChip® system is the array (top right); the actual array is about the size of a thumbnail. • The array requires specialized equipment (top left), from left to right: hybridization chamber, fluidics station, analysis software, and chip scanner. • Cost of the complete package: $250,000 • The degree of automation varies, but the system can be upgraded
Making a GeneChip®
500K: Content Optimized SNP Selection
Selection funnel: ~2,200,000 SNPs from public sources & Perlegen (48 individuals; call rate, concordance) → ~650K SNPs (400 samples; call rate, accuracy, LD) → 500K SNPs
• Initial selection: 48 people – 2.2M SNPs – 25 million genotypes – 16 each Caucasian, African, Asian – all HapMap samples
• Maximize performance: second selection over 400 people – 270 HapMap samples – 130 diversity samples – accuracy: Hardy-Weinberg equilibrium (HWE), Mendel errors, reproducibility – call rates
• Maximize information content: prioritize SNPs based on LD & HapMap (Broad Institute)
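Two of the performance filters named in the SNP selection, call rate and Hardy-Weinberg equilibrium, can be sketched per SNP as follows. This is an illustration only: the thresholds (95 % call rate, HWE p ≥ 10⁻⁴) and function names are hypothetical and not the criteria used for the 500K array, and the Mendel-error and reproducibility checks are not shown.

```python
import math

def call_rate(genotypes):
    """Share of samples with a genotype call (None = missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_pvalue(genotypes):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at one SNP.

    genotypes -- 0, 1 or 2 copies of the alternative allele; None = missing
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no genotype calls")
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (2 * observed[0] + observed[1]) / (2 * n)   # reference allele frequency
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0                                  # monomorphic: test not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, min_call_rate=0.95, min_hwe_p=1e-4):
    """Keep a SNP only if it passes both (hypothetical) thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and hwe_pvalue(genotypes) >= min_hwe_p)

balanced = [0] * 25 + [1] * 50 + [2] * 25   # 1:2:1, in HWE
all_het = [1] * 100                         # only heterozygotes: typical assay artifact
print(passes_qc(balanced), passes_qc(all_het))   # True False
```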
The Assay - Details
Optimized for fragments of 250-2,000 bp
http://www.affymetrix.com/products/arrays/specific/100k.affx
Some Cost and Time Estimates • Array cost: $225-$500 depending on size
– Design fees apply for custom arrays: ~$200-$2,000 depending on size
• UW Madison service fee (not including array cost) – 1-4 arrays: $0.04 per data point – 5-12 arrays: $0.05 per data point
• The cost per data point is fixed (if the equipment is owned) – $0.02 for the maize array
• With the most up-to-date equipment – 96 samples: 22 hours – 960 samples: 220 hours – 9,600 samples: 2,200 hours
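The time estimates scale linearly with sample number (22 hours per 96 samples). A small sketch of this arithmetic together with the data-point cost, assuming one array per sample; the number of data points per sample is a hypothetical parameter, and array prices and design fees are not included.

```python
def run_hours(n_samples, hours_per_96=22.0):
    """Scale the quoted throughput (96 samples in 22 h) linearly."""
    return n_samples / 96 * hours_per_96

def data_point_cost(n_samples, points_per_sample, cost_per_point=0.02):
    """Data-point cost only; array prices and design fees are extra."""
    return n_samples * points_per_sample * cost_per_point

for n in (96, 960, 9600):
    print(n, "samples:", run_hours(n), "hours")
# e.g. 96 samples with a hypothetical 50,000 data points each at $0.02
print(round(data_point_cost(96, 50000)))
```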
Genotyping by Sequencing
Advances in Sequencing
Chan 2005
Sequencing Costs
http://www.genome.gov/Pages/Newsroom/Webcasts/2010ScienceReportersWorkshop/Sch
Progress in Genomics: Sequencing Technology Solexa - 3×10^9 bp / run - $3,000-10,000 / run - Short sequences (75 bp) - 8 channels - Primer indexing allows multiplexing
https://www.youtube.com/watch?v=womKfi
https://www.youtube.com/watch?v=KzdWZ
454 - 4×10^7 bp / run - Intermediate read length (300-400 bp)
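Throughput and read length together give the number of reads per run, and run cost over throughput gives the cost per megabase. A sketch using the figures quoted for Solexa and 454; the 350 bp read length for 454 is an assumed midpoint of the 300-400 bp range, and the function names are illustrative.

```python
def reads_per_run(bp_per_run, read_length):
    """Number of reads a run yields at a given read length."""
    return bp_per_run / read_length

def cost_per_mb(cost_per_run, bp_per_run):
    """Sequencing cost per megabase (10^6 bp)."""
    return cost_per_run / (bp_per_run / 1e6)

print(reads_per_run(3e9, 75))    # Solexa: 40000000.0 reads
print(reads_per_run(4e7, 350))   # 454 at an assumed 350 bp read length
print(cost_per_mb(3000, 3e9), cost_per_mb(10000, 3e9))   # dollars per Mb, Solexa
```

Tens of millions of short reads per run are what make it worthwhile to pool many barcoded samples in one run.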
Barcoding
Craig et al. 2008
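Barcoding lets many samples share one sequencing run: each sample's fragments carry a short known sequence, and reads are sorted back to samples by that sequence afterwards. A minimal demultiplexing sketch, assuming the barcode sits at the 5' end of each read; the barcodes, reads and the mismatch tolerance are hypothetical.

```python
def demultiplex(reads, barcodes, max_mismatches=0):
    """Assign reads to samples by their leading barcode.

    reads    -- read sequences with the barcode at the 5' end
    barcodes -- dict: sample name -> barcode sequence
    Returns dict: sample -> reads with the barcode trimmed, plus an
    "unassigned" bin for reads matching no barcode or more than one.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        hits = []
        for name, bc in barcodes.items():
            prefix = read[:len(bc)]
            if len(prefix) == len(bc):
                mismatches = sum(a != b for a, b in zip(prefix, bc))
                if mismatches <= max_mismatches:
                    hits.append((name, len(bc)))
        if len(hits) == 1:
            name, n = hits[0]
            bins[name].append(read[n:])        # trim the barcode
        else:
            bins["unassigned"].append(read)
    return bins

barcodes = {"s1": "ACGT", "s2": "TTAG"}
reads = ["ACGTGGCC", "TTAGCCAA", "GGGGAAAA", "ACGAGG"]
print(demultiplex(reads, barcodes))
print(demultiplex(reads, barcodes, max_mismatches=1)["s1"])   # tolerates one error
```

Allowing mismatches rescues reads with sequencing errors in the barcode, but only if the barcodes differ from each other by enough positions; reads matching two barcodes are discarded rather than guessed.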
Genotyping by Sequencing
Sliding Window Approach
Huang et al. 2010
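With low-coverage sequencing, many single-SNP calls in a recombinant inbred line are missing or wrong. A sliding window reduces this noise by calling the genotype from the share of parental alleles across several consecutive SNPs. The sketch below illustrates the idea only: it assumes SNP calls are already assigned to parent A or B, and the window size and the 0.8 threshold are hypothetical, not the parameters of Huang et al. 2010.

```python
def sliding_window_calls(alleles, window=15, min_fraction=0.8):
    """Smooth noisy low-coverage SNP calls along a chromosome.

    alleles -- per-SNP calls in physical order: "A" (parent A),
               "B" (parent B) or None (no read covering the SNP)
    One call per window position: "A" or "B" if that parent's share
    among observed SNPs reaches min_fraction, otherwise "H"
    (heterozygous or unclear); None if the window has no data.
    """
    calls = []
    for start in range(len(alleles) - window + 1):
        chunk = [a for a in alleles[start:start + window] if a is not None]
        if not chunk:
            calls.append(None)
            continue
        if chunk.count("A") / len(chunk) >= min_fraction:
            calls.append("A")
        elif chunk.count("B") / len(chunk) >= min_fraction:
            calls.append("B")
        else:
            calls.append("H")
    return calls

# 10 SNPs from parent A, then 10 from parent B, with one miscalled SNP
alleles = ["A"] * 10 + ["B"] * 10
alleles[3] = "B"
print(sliding_window_calls(alleles, window=5))
```

The single miscalled SNP disappears, while the true switch from A to B (a recombination breakpoint) remains visible as a short "H" stretch.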
Recombination & Bin Maps
Huang et al. 2010
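Each recombination event shows up as a switch between parental genotypes along a chromosome. In a bin map, adjacent SNPs that show no recombination in any line of the population are merged into one bin, so the bins are delimited by the union of all breakpoints. A minimal sketch, assuming complete, already smoothed calls for every line; the function names and data are hypothetical.

```python
def breakpoints(calls):
    """Indices i where a line's genotype changes between SNP i-1 and SNP i."""
    return [i for i in range(1, len(calls)) if calls[i] != calls[i - 1]]

def bin_map(lines):
    """Group consecutive SNP positions with identical genotypes in all lines.

    lines -- list of per-line call lists, all the same length
    Returns (first, last) SNP index pairs, inclusive, one per bin.
    """
    n = len(lines[0])
    cuts = sorted({b for calls in lines for b in breakpoints(calls)})
    starts = [0] + cuts
    ends = [c - 1 for c in cuts] + [n - 1]
    return list(zip(starts, ends))

lines = [["A", "A", "B", "B", "B"],
         ["A", "A", "A", "A", "B"],
         ["B", "B", "B", "A", "A"]]
print(bin_map(lines))   # [(0, 1), (2, 2), (3, 3), (4, 4)]
```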
Defining the scale of the genotyping project is key to selecting an approach. Example costs for 1,000 individuals:
- 5 to 10 SNPs in a candidate gene: many approaches, expensive (~$0.60 per SNP genotype); $6,000
- 48 (to 96) SNPs in a handful of candidate genes: ~$0.25 to 0.30 per SNP genotype; ~$29,000
- 384 to 1,536 SNPs: cost reductions based on scale (~$0.08 to 0.15 per SNP genotype); $57,600 to 122,880
- 10,000 to 20,000 SNPs, defined and custom formats: ~$0.03 per SNP genotype; >$250,000
- 300,000 to 500,000 SNPs, defined format: ~$0.002 per SNP genotype; $800,000
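The project totals follow from individuals × SNPs × cost per SNP genotype; for example 1,000 × 96 × $0.30 = $28,800, i.e. the ~$29,000 quoted. A short sketch reproducing the smaller tiers (function name illustrative):

```python
def project_cost(n_individuals, n_snps, cost_per_genotype):
    """Total cost = individuals x SNPs x cost per SNP genotype."""
    return n_individuals * n_snps * cost_per_genotype

for snps, unit in [(10, 0.60), (96, 0.30), (384, 0.15), (1536, 0.08)]:
    print(snps, "SNPs:", round(project_cost(1000, snps, unit)))
# 10 SNPs: 6000 / 96 SNPs: 28800 / 384 SNPs: 57600 / 1536 SNPs: 122880
```

The per-genotype price falls by more than two orders of magnitude from single assays to genome-wide arrays, but the total project cost still rises with marker number.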
[Figure (Studer & Kolliker 2013): SNP genotyping technologies arranged by number of SNPs (1 to 1,000,000) and number of samples (10 to 10,000), grouped as single marker assays (low scale), medium density assays (medium scale) and (ultra-)high density assays (genome-wide scale). Technologies shown: HRM, TaqMan, KASPar, Invader, SNuPe, SNaPshot, iPLEX, SNPlex, GoldenGate, MIP, Infinium, Axiom, genotyping by sequencing]
Two Major Applications of Markers: Fingerprinting and Gene Tagging
“Fingerprinting”: corn salad elite set (decoded), all lanes, ABCD bands
[Figure: AFLP fingerprint of genotypes 1, 2, 3, ... and dendrogram of the genotypes; similarity coefficient axis from 0.42 to 1.00]
Genetic similarity: number of common bands / all bands
Cluster analysis
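Genetic similarity from band data and the subsequent cluster analysis can be sketched as below. Two assumptions are labeled here: "all bands" is read as the bands present in at least one of the two genotypes (which makes the coefficient the Jaccard index; other coefficients such as Dice are also common), and UPGMA (average linkage) is used as the clustering method, since the slide only says "cluster analysis". The band profiles are invented.

```python
from itertools import combinations

def band_similarity(a, b):
    """Common bands / bands present in either genotype (0/1 band scores)."""
    common = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return common / either if either else 1.0

def upgma(profiles):
    """UPGMA (average linkage) clustering on distance = 1 - similarity.

    profiles -- dict: genotype name -> list of 0/1 band scores
    Returns the tree as nested tuples and the similarity at each merge.
    """
    clusters = {(name,): name for name in profiles}   # member tuple -> subtree

    def distance(c1, c2):
        # Average pairwise distance between the members of two clusters
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - band_similarity(profiles[a], profiles[b])
                   for a, b in pairs) / len(pairs)

    merge_levels = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: distance(*p))
        merge_levels.append(round(1 - distance(c1, c2), 3))
        clusters[c1 + c2] = (clusters.pop(c1), clusters.pop(c2))
    (tree,) = clusters.values()
    return tree, merge_levels

profiles = {"G1": [1, 1, 1, 1, 0, 0, 0, 0],
            "G2": [1, 1, 1, 0, 0, 0, 0, 0],
            "G3": [0, 0, 0, 0, 1, 1, 1, 1],
            "G4": [0, 0, 0, 1, 1, 1, 1, 1]}
tree, levels = upgma(profiles)
print(tree, levels)   # (('G3', 'G4'), ('G1', 'G2')) [0.8, 0.75, 0.031]
```

The merge levels correspond to the similarity-coefficient axis of a dendrogram: closely related genotypes join near 1.00, unrelated groups join near the bottom of the scale.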
Use of the “genetic fingerprint”
Phase I: Genetic variation → Parent selection → Recurrent selection → Assignment to heterotic pools → Choice of genetic resources
Phase II + III: Variety parents, test hybrids → Measure heterozygosity: predict hybrid performance → Backcrossing
Seed multiplication & variety protection (AFLP fingerprint) → Purity of hybrids & inbreds → Variety approval → “Essentially derived varieties” (EDV)
Why tag a gene with a molecular marker if the gene can be scored phenotypically?
Marker-assisted Selection (MAS) Target gene: P, p
DNA marker: M, m
Complete linkage: three diploid genotypes PP/MM, Pp/Mm, pp/mm
Development and improvement of inbreds, comparison of phenotypic selection and MAS:
- Phenotypic selection: selection only after flowering; dominance: PP = Pp; expression affected by the environment
- MAS: early selection; environment independent; discrimination of PP and Pp, so that PP can be recognized (→ fixation of P) and carriers of the recessive allele detected (→ backcrossing)
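The gain from a tightly linked marker can be made concrete by enumerating an F2 from a PM/pm F1. The sketch assumes complete linkage (only parental gametes), dominance of P, and a codominant marker, as the three marker genotypes MM, Mm and mm suggest; the function name is illustrative.

```python
from collections import Counter
from itertools import product

def f2_classes():
    """Phenotype and marker classes in an F2 from a PM/pm F1.

    Assumes complete linkage (only parental gametes PM and pm),
    dominance of P over p and a codominant marker M/m.
    """
    gametes = [("P", "M"), ("p", "m")]
    phenotypes, marker_classes = Counter(), Counter()
    for (p1, m1), (p2, m2) in product(gametes, repeat=2):
        # Dominance: PP and Pp share the phenotype "P_"
        phenotypes["P_" if "P" in (p1, p2) else "pp"] += 1
        # Codominant marker: MM, Mm and mm are all distinguishable
        marker_classes["".join(sorted(m1 + m2))] += 1
    return phenotypes, marker_classes

phenotypes, marker_classes = f2_classes()
print(dict(phenotypes))      # {'P_': 3, 'pp': 1}
print(dict(marker_classes))  # {'MM': 1, 'Mm': 2, 'mm': 1}
```

The phenotype gives only two classes (3:1), while the marker separates all three genotypes (1:2:1), so MM plants can be kept as PP before flowering and Mm plants identified as carriers of p.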
Use of linked markers and fingerprinting to assist Backcrossing
Marker-assisted production of cms lines: background selection
[Figure: backcross scheme: donor (S)msms × recurrent parent (N)MsMs → F1 (S)Msms → BC1 → BC2 → ... → (S)msms (BC = backcross); segregation in BC1]
Background selection: fast production of new cms lines
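Background selection speeds up backcrossing because, among segregating backcross plants, markers identify those that already carry more of the recurrent parent genome than expected. The expected share after n backcrosses is 1 - (1/2)^(n+1) (F1 50 %, BC1 75 %, BC2 87.5 %). The sketch below compares this with a marker-based estimate; the marker coding "RR"/"RD" and the plant data are hypothetical.

```python
def expected_rp_share(bc_generation):
    """Expected recurrent-parent genome share: F1 = 0, BC1 = 1, BC2 = 2, ..."""
    return 1 - 0.5 ** (bc_generation + 1)

def observed_rp_share(marker_calls):
    """Recurrent-parent share from background markers.

    marker_calls -- per marker "RR" (homozygous recurrent parent) or
                    "RD" (one recurrent, one donor allele)
    """
    rp_alleles = sum(call.count("R") for call in marker_calls)
    return rp_alleles / (2 * len(marker_calls))

def select_background(plants, n_best):
    """Return the names of the n_best plants closest to the recurrent parent."""
    ranked = sorted(plants, key=lambda name: observed_rp_share(plants[name]),
                    reverse=True)
    return ranked[:n_best]

bc1 = {"plant_1": ["RR", "RD", "RR", "RD"],
       "plant_2": ["RR", "RR", "RR", "RD"],
       "plant_3": ["RD", "RD", "RR", "RD"]}
print(expected_rp_share(1))          # 0.75
print(select_background(bc1, 1))     # ['plant_2']
```

Selecting plant_2 (87.5 % recurrent parent genome) already in BC1 reaches the level expected only after BC2, which is why background selection saves backcross generations.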
Marker-assisted production of new cms lines: foreground selection
Old ♀ (donor): (S)msms × new ♀ (recurrent parent): (N)MsMs
F1: (S)Msms × (N)MsMs → BC1: 1 (S)Msms : 1 (S)MsMs
Problem: both BC1 classes are fertile; (S)Msms plants can only be identified phenotypically by selfing (segregation: 3 fertile : 1 sterile)
Solution: Ms/ms-linked marker
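Without a marker, each fertile BC1 plant must be progeny-tested by selfing: only (S)Msms plants produce sterile offspring. Whether an observed progeny ratio fits 3 fertile : 1 sterile can be checked with a chi-square goodness-of-fit test, sketched below; the function name and progeny counts are hypothetical.

```python
import math

def fit_3_to_1(n_fertile, n_sterile):
    """Chi-square goodness of fit to 3 fertile : 1 sterile (1 df).

    Returns (chi-square statistic, p-value).
    """
    n = n_fertile + n_sterile
    expected = (0.75 * n, 0.25 * n)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_fertile, n_sterile), expected))
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical progeny counts from selfing two BC1 plants
print(fit_3_to_1(148, 52))   # fits 3:1 -> plant is (S)Msms
print(fit_3_to_1(200, 0))    # no sterile plants -> plant is (S)MsMs
```

The progeny test costs a full generation plus a field evaluation of fertility, whereas the Ms/ms-linked marker gives the same answer on the BC1 plant itself.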
Further Reading (Books) Molecular Plant Breeding, Yunbi Xu
Diagnostics in Plant Breeding, Thomas Lubberstedt & Rajeev Varshney (eds.)
Both are available online in the ISU e-library