De novo Genome Assembly
A/Prof Torsten Seemann
Winter School in Mathematical & Computational Biology - Brisbane, Australia - 4 July 2016
Introduction
The human genome has 47 pieces
MT
(or XY)
The shortest chromosome is about 48,000,000 bp
Total haploid size
3,200,000,000 bp
(3.2 Gbp)
Total diploid size
6,400,000,000 bp
(6.4 Gbp)
∑ = 6,400,000,000 bp
Human DNA iSequencer ™ 46 chromosomal
and 1 mitochondrial
sequences
In an ideal world ...
AGTCTAGGATTCGCTATAG
ATTCAGGCTCTGATATATT
TCGCGGCATTAGCTAGAGA
TCTCGAGATTCGTCCCAGT
CTAGGATTCGCTAT
AAGTCTAAGATTC...
The real world ( for now )
Short
fragments
Reads are stored in “FASTQ” files
Genome
Sequencing
Assemble Compare
Genome assembly
(the red pill)
De novo genome assembly
Ideally, one sequence per replicon.
Millions of short
sequences
(reads)
A few long
sequences
(contigs)
Reconstruct the original genome
from the sequence reads only
De novo genome assembly
“From scratch”
An example
A small “genome”
Friends,
Romans,
countrymen,
lend me your ears;
I’ll return
them
tomorrow!
• Reads
ds, Romans, count
ns, countrymen, le
Friends, Rom
send me your ears;
crymen, lend me
Shakespearomics
Whoops!
I dropped
them.
• Reads
ds, Romans, count
ns, countrymen, le
Friends, Rom
send me your ears;
crymen, lend me
• Overlaps
Friends, Rom
ds, Romans, count
ns, countrymen, le
crymen, lend me
send me your ears;
Shakespearomics
I am good
with words.
• Reads
ds, Romans, count
ns, countrymen, le
Friends, Rom
send me your ears;
crymen, lend me
• Overlaps
Friends, Rom
ds, Romans, count
ns, countrymen, le
crymen, lend me
send me your ears;
• Majority consensus
Friends, Romans, countrymen, lend me your ears; (1 contig)
Shakespearomics
We have
reached a
consensus !
Overlap - Layout - Consensus
Amplified DNA
Shear DNA
Sequenced reads
Overlaps
Layout
Consensus ↠ “Contigs”
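The overlap and layout steps can be sketched in a few lines of Python. This is only a toy greedy merger, assuming error-free reads (the two reads with typos from the Shakespearomics slides are corrected here) and a hypothetical minimum overlap of 3 characters; real assemblers use far more careful graph methods.

```python
def overlap_length(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Error-free versions of the example reads (an assumption of this sketch).
reads = [
    "ds, Romans, count",
    "ns, countrymen, le",
    "Friends, Rom",
    "lend me your ears;",
    "trymen, lend me",
]
contigs = greedy_assemble(reads)
print(contigs)
```

Greedy merging works on this small example because no read overlaps the wrong neighbour; repeats break that assumption.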
Assembly graphs
Overlap graph
Another example is this
Overlaps find
Do the graph traverse
Size matters not. Look at me. Judge me by my size, do you?
Size matters not. Look at me. Judge me by my soze, do you?
2 supporting reads
1 supporting read
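Majority consensus can be illustrated as a per-column vote. The sketch below assumes the reads are already aligned and of equal length, which a real assembler would have to establish first; the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Pick the most common character in each column of pre-aligned reads."""
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Two reads support "size", one read supports "soze".
reads = [
    "Judge me by my size, do you?",
    "Judge me by my size, do you?",
    "Judge me by my soze, do you?",
]
consensus = majority_consensus(reads)
print(consensus)
```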
So far, so good.
Why is it so hard?
What makes a jigsaw puzzle hard?
Repetitive
regions
Lots of
pieces
Missing
pieces
Multiple
copies
No corners
(circular
genomes)
No box
Dirty
pieces
Frayed
pieces
What makes genome assembly hard
Size of the human genome = 3.2 × 10^9 bp (3,200,000,000)
Typical short read length = 10^2 bp (100)
A puzzle with millions to billions of pieces
Storing in RAM is a challenge ⇢ “succinct data structure”
1. Many pieces (read length is very short compared to the genome)
What makes genome assembly hard
Finding overlaps means
examining every pair of reads
Comparisons = N×(N-1)/2
~ N^2
Lots of smart tricks to reduce
this close to ~N
2. Lots of overlaps
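To get a feel for how fast the all-pairs count grows, the formula N×(N-1)/2 can be evaluated directly. The read counts below are arbitrary illustrative values, not figures from the talk.

```python
def pairwise_comparisons(n_reads: int) -> int:
    """Number of unordered read pairs: N * (N - 1) / 2."""
    return n_reads * (n_reads - 1) // 2


for n in (5, 1_000, 1_000_000):
    print(f"{n:>9,} reads -> {pairwise_comparisons(n):,} comparisons")
```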
What makes genome assembly hard
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATA
TATATATA TATATATA TATATATA TATATATA TATATATA
TATATATA TATATATA TATATATA
TATATATA TATATATA TATATATA
3. Lots of sky (short repeats)
How could we possibly assemble this segment of DNA?
All the reads are the same!
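A quick way to see the problem is to count the distinct reads a short tandem repeat can produce. This sketch assumes error-free reads of length 8 taken at every offset of a 64 bp AT repeat; only the starting phase can differ, so at most two distinct sequences appear and none of them says where in the repeat it came from.

```python
def distinct_reads(genome: str, read_len: int) -> set[str]:
    """All distinct substrings of length `read_len`, one per start position."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


repeat = "AT" * 32            # 64 bp tandem repeat
reads = distinct_reads(repeat, 8)
print(len(repeat) - 8 + 1, "start positions,", len(reads), "distinct reads")
```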
What makes genome assembly hard
Read 1: GGAACCTTTGGCCCTGT
Read 2: GGCGCTGTCCATTTTAGAAACC
4. Dirty pieces (sequencing errors)
1. The error in Read 2 might prevent us from seeing that it
overlaps with Read 1
2. How would we know which is the correct sequence?
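The effect of a single error on overlap detection can be checked with the two reads above. The sketch compares suffixes of Read 1 with prefixes of Read 2 and tolerates a chosen number of mismatches; the function names, the minimum overlap of 5, and the mismatch limits are assumptions for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def overlap_with_mismatches(left: str, right: str,
                            min_len: int = 5, max_mismatches: int = 0) -> int:
    """Longest suffix of `left` matching a prefix of `right` within the mismatch limit."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if hamming(left[-k:], right[:k]) <= max_mismatches:
            return k
    return 0


read1 = "GGAACCTTTGGCCCTGT"
read2 = "GGCGCTGTCCATTTTAGAAACC"

exact = overlap_with_mismatches(read1, read2, max_mismatches=0)
tolerant = overlap_with_mismatches(read1, read2, max_mismatches=1)
print(exact, tolerant)
```

With exact matching no overlap is found; allowing one mismatch recovers an 8 bp overlap (GGCCCTGT vs GGCGCTGT). The code cannot tell which of the two bases is correct; that needs more supporting reads.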
What makes genome assembly hard
5. Multiple copies (long repeats)
Our old nemesis the REPEAT !
Gene Copy 1 Gene Copy 2
Repeats
What is a repeat?
A segment of DNA
that occurs more than once
in the genome
Major classes of repeats in the human genome
Repeat class             Arrangement              Coverage of human genome   Length (bp)
Satellite (micro, mini)  Tandem                   3%                         2-100
SINE                     Interspersed             15%                        100-300
Transposable elements    Interspersed             12%                        200-5k
LINE                     Interspersed             21%                        500-8k
rDNA                     Tandem                   0.01%                      2k-43k
Segmental duplications   Tandem or interspersed   0.2%                       1k-100k
Repeats
Repeat copy 1 Repeat copy 2
Collapsed repeat consensus
1 locus
4 contigs
Repeats are hubs in the graph
Long reads can span repeats
Repeat copy 1 Repeat copy 2
long
reads
1 contig
Draft vs Finished genomes (bacteria)
250 bp - Illumina - $250
12,000 bp - PacBio - $2500
The two laws of repeats
1. It is impossible to resolve repeats of length L
unless you have reads longer than L.
2. It is impossible to resolve repeats of length L
unless you have reads longer than L.
How much data do we need?
Coverage vs Depth
Coverage
Fraction of the genome
sequenced by
at least one read
Depth
Average number of reads
that cover any given region
Intuitively more reads should increase coverage and depth
Example: Read length = ⅛ Genome Length
Coverage = ⅜
Depth = ⅜
Genome
Reads
Example: Read length = ⅛ Genome Length
Coverage = 4.5/8
Depth = 5/8
Genome
Reads
Newly Covered Regions = 1.5 Reads
Example: Read length = ⅛ Genome Length
Coverage = 5.2/8
Depth = 7/8
Genome
Reads
Newly Covered Regions = 0.7 Reads
Example: Read length = ⅛ Genome Length
Coverage = 6.7/8
Depth = 10/8
Genome
Reads
Newly Covered Regions = 1.5 Reads
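The difference between the two numbers can be reproduced with a small calculation. The sketch below uses a hypothetical genome of 80 positions and reads of length 10 (so read length = ⅛ genome length); the start positions are made up so that the first two slides' values come out, and all reads are assumed to lie fully inside the genome.

```python
def coverage_and_depth(genome_len: int, read_len: int,
                       starts: list[int]) -> tuple[float, float]:
    """Return (fraction of positions covered, average depth) for reads at `starts`."""
    covered = [False] * genome_len
    for s in starts:
        for pos in range(s, min(s + read_len, genome_len)):
            covered[pos] = True
    coverage = sum(covered) / genome_len
    depth = len(starts) * read_len / genome_len  # assumes reads fit inside the genome
    return coverage, depth


# Three non-overlapping reads: coverage and depth are both 3/8.
first = coverage_and_depth(80, 10, [0, 30, 60])
# Two more reads, one half-overlapping an old read: depth 5/8, coverage only 4.5/8.
second = coverage_and_depth(80, 10, [0, 30, 60, 5, 45])
print(first, second)
```

Every read adds the same amount of depth, but an overlapping read adds less than a full read of new coverage.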
Depth is easy to calculate
Depth = N × L / G
Depth = Y / G
N = Number of reads
L = Length of a read
G = Genome length
Y = Sequence yield (N x L)
Example: Coverage of the Tarantula genome
“The size estimate of the tarantula
genome based on k-mer analysis is 6 Gb
and we sequenced at 40x coverage from
a single female A. geniculata”
Sanggaard et al 2014. Nature Comm (5) p1-11
Depth = N × L / G
40 = N x 100 / (6 Billion)
N = 40 * 6 Billion / 100
N = 2.4 Billion
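The same arithmetic as a small helper, rearranging Depth = N × L / G to solve for N. The function names are hypothetical, and the 100 bp read length is the value assumed on the slide.

```python
def depth(n_reads: int, read_len: int, genome_len: int) -> float:
    """Depth = N * L / G."""
    return n_reads * read_len / genome_len


def reads_needed(target_depth: int, read_len: int, genome_len: int) -> int:
    """Smallest N with N * L / G >= target depth (integer ceiling division)."""
    return (target_depth * genome_len + read_len - 1) // read_len


n = reads_needed(40, 100, 6_000_000_000)
print(f"{n:,} reads")                       # 2.4 billion reads
print(depth(n, 100, 6_000_000_000))         # back to 40x
```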
Coverage and depth are related
Approximate
formula for
coverage
assuming
random reads
Coverage = 1 - e^(-Depth)
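Evaluating the approximation for a few depths shows how quickly the expected coverage saturates. This assumes ideal, uniformly random reads, as the formula does; the depth values are arbitrary examples.

```python
import math


def expected_coverage(depth: float) -> float:
    """Expected fraction of the genome covered by at least one read: 1 - e^(-depth)."""
    return 1.0 - math.exp(-depth)


for d in (0.5, 1, 2, 5, 10):
    print(f"depth {d:>4}: expected coverage {expected_coverage(d):.4f}")
```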
Much more sequencing needed in reality
● Sequencing is not random
○ GC-rich and AT-rich regions
are under-represented
○ Other chemistry quirks
● More depth needed for:
○ sequencing errors
○ polymorphisms
Assessing assemblies
Contiguity
Completeness
Correctness
Contiguity
● Desire
○ Fewer contigs
○ Longer contigs
● Metrics
○ Number of contigs
○ Average contig length
○ Median contig length
○ Maximum contig length
○ “N50”, “NG50”, “D50”
Contiguity: the N50 statistic
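N50 is commonly defined as the contig length such that contigs of that length or longer contain at least half of the assembled bases; NG50 uses half of the estimated genome size instead of half of the assembly size. A minimal sketch under that definition, with made-up contig lengths and a hypothetical function name:

```python
from typing import Optional


def n50(contig_lengths: list[int], genome_len: Optional[int] = None) -> int:
    """N50 of the contigs, or NG50 when an estimated `genome_len` is given."""
    target = genome_len if genome_len is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= target:  # reached half of the target size
            return length
    raise ValueError("contigs do not reach half of the target length")


lengths = [80, 70, 50, 40, 30, 20, 10]     # total = 300
print(n50(lengths))                        # N50
print(n50(lengths, genome_len=400))        # NG50 for an assumed 400 bp genome
```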
Completeness : Total size
Proportion of the original genome represented by the assembly
Can be between 0 and 1
Proportion of estimated genome size
… but estimates are not perfect
Completeness: core genes
Proportion of coding sequences can be estimated based on
known core genes thought to be present in a wide variety of
organisms.
Assumes that the proportion of assembled genes is equal to the
proportion of assembled core genes.
Completeness ≈ (Number of core genes in assembly) / (Number of core genes in database)
In the past this was done with a tool called CEGMA
There is a new tool for this called BUSCO
Correctness
Proportion of the assembly that is free from errors
Errors include
1. Mis-joins
2. Repeat compressions
3. Unnecessary duplications
4. Indels / SNPs caused by assembler
Correctness: check for self consistency
● Align all the reads back to the contigs
● Look for inconsistencies
Original Reads
Align
Assembly
Mapped Read
Assemble ALL the things
Why assemble genomes
● Produce a reference for new species
● Genomic variation can be studied by comparing
against the reference
● A wide variety of molecular biological tools become
available or more effective
● Coding sequences can be studied in the context of their
related non-coding (eg regulatory) sequences
● High level genome structure (number, arrangement of
genes and repeats) can be studied
Vast majority of genomes remain unsequenced
Eukaryotes
Estimated number of species ~ 9 Million
Whole genome sequences (including incomplete drafts) ~ 3000
Bacteria & Archaea
Estimated number of species: Millions to Billions
Genomic sequences ~ 150,000
Every individual has a different genome
Some diseases such as cancer give rise to massive genome level variation
Not just genomes
● Transcriptomes
○ One contig for every isoform
○ Do not expect uniform coverage
● Metagenomes
○ Mixture of different organisms
○ Host, bacteria, virus, fungi all at once
○ All different depths
● Metatranscriptomes
○ Combination of above!
Conclusions
Take home points
● De novo assembly is the process of reconstructing a long
sequence from many short ones
● Represented as a mathematical “overlap graph”
● Assembly is very challenging (“impossible”) because
○ Sequencing bias under-represents certain regions
○ Reads are short relative to genome size
○ Repeats create tangled hubs in the assembly graph
○ Sequencing errors cause detours and bubbles in the assembly graph
Contact
tseemann.github.io
torsten.seemann@gmail.com
@torstenseemann
The End

 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013
 
Pipeline or pipe dream - Midlands Micro Meeting UK - mon 15 sep 2014
Pipeline or pipe dream - Midlands Micro Meeting UK - mon 15 sep 2014Pipeline or pipe dream - Midlands Micro Meeting UK - mon 15 sep 2014
Pipeline or pipe dream - Midlands Micro Meeting UK - mon 15 sep 2014
 
Decoding our bacterial overlords - Melbourne Knowledge Week - tue 28 oct 2014
Decoding our bacterial overlords - Melbourne Knowledge Week - tue 28 oct 2014Decoding our bacterial overlords - Melbourne Knowledge Week - tue 28 oct 2014
Decoding our bacterial overlords - Melbourne Knowledge Week - tue 28 oct 2014
 
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
 

De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au - 4 july 2016

  • 1. De novo Genome Assembly A/Prof Torsten Seemann Winter School in Mathematical & Computational Biology - Brisbane, Australia - 4 July 2016
  • 3. The human genome has 47 pieces MT (or XY)
  • 4. The shortest piece is 48,000,000 bp Total haploid size 3,200,000,000 bp (3.2 Gbp) Total diploid size 6,400,000,000 bp (6.4 Gbp) ∑ = 6,400,000,000 bp
  • 5. Human DNA iSequencer ™ 46 chromosomal and 1 mitochondrial sequences In an ideal world ... AGTCTAGGATTCGCTATAG ATTCAGGCTCTGATATATT TCGCGGCATTAGCTAGAGA TCTCGAGATTCGTCCCAGT CTAGGATTCGCTAT AAGTCTAAGATTC...
  • 6. The real world ( for now ) Short fragments Reads are stored in “FASTQ” files Genome Sequencing
  • 10. De novo genome assembly Ideally, one sequence per replicon. Millions of short sequences (reads) A few long sequences (contigs) Reconstruct the original genome from the sequence reads only
  • 11. De novo genome assembly “From scratch”
  • 13. A small “genome” Friends, Romans, countrymen, lend me your ears; I’ll return them tomorrow!
  • 14. • Reads ds, Romans, count ns, countrymen, le Friends, Rom send me your ears; crymen, lend me Shakespearomics Whoops! I dropped them.
  • 15. • Reads ds, Romans, count ns, countrymen, le Friends, Rom send me your ears; crymen, lend me • Overlaps Friends, Rom ds, Romans, count ns, countrymen, le crymen, lend me send me your ears; Shakespearomics I am good with words.
  • 16. • Reads ds, Romans, count ns, countrymen, le Friends, Rom send me your ears; crymen, lend me • Overlaps Friends, Rom ds, Romans, count ns, countrymen, le crymen, lend me send me your ears; • Majority consensus Friends, Romans, countrymen, lend me your ears; (1 contig) Shakespearomics We have reached a consensus !
  • 17. Overlap - Layout - Consensus Amplified DNA Shear DNA Sequenced reads Overlaps Layout Consensus ↠ “Contigs”
  • 22. Do the graph traverse Size matters not. Look at me. Judge me by my size, do you? Size matters not. Look at me. Judge me by my soze, do you? 2 supporting reads 1 supporting read
  • 23. So far, so good.
  • 25. Why is it so hard?
  • 26. What makes a jigsaw puzzle hard? Repetitive regions Lots of pieces Missing pieces Multiple copies No corners (circular genomes) No box Dirty pieces Frayed pieces
  • 27. What makes genome assembly hard Size of the human genome = 3.2 × 10^9 bp (3,200,000,000) Typical short read length = 10^2 bp (100) A puzzle with millions to billions of pieces Storing in RAM is a challenge ⇢ “succinct data structure” 1. Many pieces (read length is very short compared to the genome)
  • 28. What makes genome assembly hard Finding overlaps means examining every pair of reads Comparisons = N×(N-1)/2 ~ N^2 Lots of smart tricks to reduce this close to ~N 2. Lots of overlaps
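The pair count on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and it does not implement any of the "smart tricks" the slide alludes to.

```python
from itertools import combinations


def pairwise_comparisons(n_reads: int) -> int:
    """Number of unordered read pairs: N x (N - 1) / 2."""
    return n_reads * (n_reads - 1) // 2


def count_pairs_by_enumeration(reads) -> int:
    """Brute-force check: enumerate every unordered pair of reads."""
    return sum(1 for _ in combinations(reads, 2))


if __name__ == "__main__":
    # Growth is roughly quadratic: 10x more reads means ~100x more pairs.
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} reads -> {pairwise_comparisons(n):>13,} pairs")
```

The closed form and the enumeration agree on small inputs, which is a quick sanity check before trusting the formula for millions of reads.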
  • 29. What makes genome assembly hard ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATA TATATATA TATATATA TATATATA TATATATA TATATATA TATATATA TATATATA TATATATA TATATATA TATATATA TATATATA 3. Lots of sky (short repeats) How could we possibly assemble this segment of DNA? All the reads are the same!
  • 30. What makes genome assembly hard Read 1: GGAACCTTTGGCCCTGT Read 2: GGCGCTGTCCATTTTAGAAACC 4. Dirty pieces (sequencing errors) 1. The error in Read 2 might prevent us from seeing that it overlaps with Read 1 2. How would we know which is the correct sequence?
  • 31. What makes genome assembly hard 5. Multiple copies (long repeats) Our old nemesis the REPEAT ! Gene Copy 1 Gene Copy 2
  • 33. What is a repeat? A segment of DNA that occurs more than once in the genome
  • 34. Major classes of repeats in the human genome
      Repeat class              Arrangement              Coverage (Hg)  Length (bp)
      Satellite (micro, mini)   Tandem                   3%             2-100
      SINE                      Interspersed             15%            100-300
      Transposable elements     Interspersed             12%            200-5k
      LINE                      Interspersed             21%            500-8k
      rDNA                      Tandem                   0.01%          2k-43k
      Segmental Duplications    Tandem or Interspersed   0.2%           1k-100k
  • 35. Repeats Repeat copy 1 Repeat copy 2 Collapsed repeat consensus 1 locus 4 contigs
  • 36. Repeats are hubs in the graph
  • 37. Long reads can span repeats Repeat copy 1 Repeat copy 2 long reads 1 contig
  • 38. Draft vs Finished genomes (bacteria) 250 bp - Illumina - $250; 12,000 bp - PacBio - $2500
  • 39. The two laws of repeats 1. It is impossible to resolve repeats of length L unless you have reads longer than L. 2. It is impossible to resolve repeats of length L unless you have reads longer than L.
  • 40. How much data do we need?
  • 41. Coverage vs Depth. Coverage: fraction of the genome sequenced by at least one read. Depth: average number of reads that cover any given region. Intuitively, more reads should increase coverage and depth.
  • 42. Example: Read length = ⅛ Genome Length Coverage = ⅜ Depth = ⅜ Genome Reads
  • 43. Example: Read length = ⅛ Genome Length Coverage = 4.5/8 Depth = 5/8 Genome Reads Newly Covered Regions = 1.5 Reads
  • 44. Example: Read length = ⅛ Genome Length Coverage = 5.2/8 Depth = 7/8 Genome Reads Newly Covered Regions = 0.7 Reads
  • 45. Example: Read length = ⅛ Genome Length Coverage = 6.7/8 Depth = 10/8 Genome Reads Newly Covered Regions = 1.5 Reads
  • 46. Depth is easy to calculate: Depth = N × L / G, or Depth = Y / G, where N = number of reads, L = length of a read, G = genome length, Y = sequence yield (N × L)
  • 47. Example: Coverage of the Tarantula genome “The size estimate of the tarantula genome based on k-mer analysis is 6 Gb and we sequenced at 40x coverage from a single female A. geniculata” Sanggaard et al 2014. Nature Comm (5) p1-11 Depth = N × L / G; 40 = N × 100 / (6 billion); N = 40 × 6 billion / 100; N = 2.4 billion
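The depth formula and the tarantula arithmetic above can be reproduced in Python. This is a minimal sketch: the function names are illustrative, and the 100 bp read length is the value used in the slide's calculation.

```python
def depth(n_reads: float, read_length: float, genome_length: float) -> float:
    """Depth = N x L / G."""
    return n_reads * read_length / genome_length


def reads_needed(target_depth: float, read_length: float, genome_length: float) -> float:
    """Rearranged for N: N = Depth x G / L."""
    return target_depth * genome_length / read_length


if __name__ == "__main__":
    genome = 6_000_000_000   # 6 Gb tarantula genome size estimate
    read_len = 100           # read length assumed in the slide's arithmetic
    n = reads_needed(40, read_len, genome)
    print(f"Reads needed for 40x: {n:,.0f}")
    print(f"Depth with that many reads: {depth(n, read_len, genome):.1f}x")
```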
  • 48. Coverage and depth are related. Approximate formula for coverage assuming random reads: Coverage = 1 - e^(-Depth)
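A short Python sketch shows how quickly this approximation saturates. The function names are illustrative; the inverse function is just the same formula solved for depth, and both inherit the slide's assumption that reads are placed at random.

```python
import math


def expected_coverage(depth: float) -> float:
    """Approximate fraction of the genome covered: 1 - e^(-depth)."""
    return 1.0 - math.exp(-depth)


def depth_for_coverage(coverage: float) -> float:
    """Inverse of the formula above: depth = -ln(1 - coverage), for 0 <= coverage < 1."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return -math.log(1.0 - coverage)


if __name__ == "__main__":
    for d in (0.5, 1, 2, 5, 10):
        print(f"depth {d:>4}x -> expected coverage {expected_coverage(d):.4%}")
```

Under this idealised model a depth of 1x leaves over a third of the genome uncovered, which is why depth well above 1x is needed even before accounting for bias and errors.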
  • 49. Much more sequencing needed in reality ● Sequencing is not random ○ GC and AT rich regions are under-represented ○ Other chemistry quirks ● More depth needed for: ○ sequencing errors ○ polymorphisms
  • 52. Contiguity ● Desire ○ Fewer contigs ○ Longer contigs ● Metrics ○ Number of contigs ○ Average contig length ○ Median contig length ○ Maximum contig length ○ “N50”, “NG50”, “D50”
  • 53. Contiguity: the N50 statistic
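The slide text here gives only the title, so the sketch below uses the commonly cited definition of N50 rather than anything taken from the slide: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the example lengths are illustrative.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of
    longest-first contig lengths first reaches 50% of the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig")


if __name__ == "__main__":
    lengths = [80, 70, 50, 40, 30, 20]   # hypothetical contig lengths
    print("Total size:", sum(lengths))
    print("N50:", n50(lengths))
```

Comparing `running * 2` against `total` avoids a floating-point division when deciding whether the halfway point has been reached.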
  • 54. Completeness : Total size Proportion of the original genome represented by the assembly Can be between 0 and 1 Proportion of estimated genome size … but estimates are not perfect
  • 55. Completeness: core genes Proportion of coding sequences can be estimated based on known core genes thought to be present in a wide variety of organisms. Assumes that the proportion of assembled genes is equal to the proportion of assembled core genes. (Number of Core Genes in Assembly) / (Number of Core Genes in Database) In the past this was done with a tool called CEGMA. There is a new tool for this called BUSCO.
  • 56. Correctness Proportion of the assembly that is free from errors Errors include 1. Mis-joins 2. Repeat compressions 3. Unnecessary duplications 4. Indels / SNPs caused by assembler
  • 57. Correctness: check for self consistency ● Align all the reads back to the contigs ● Look for inconsistencies Original Reads Align Assembly Mapped Read
  • 59. Why assemble genomes ● Produce a reference for new species ● Genomic variation can be studied by comparing against the reference ● A wide variety of molecular biological tools become available or more effective ● Coding sequences can be studied in the context of their related non-coding (eg regulatory) sequences ● High level genome structure (number, arrangement of genes and repeats) can be studied
  • 60. Vast majority of genomes remain unsequenced. Eukaryotes: estimated number of species ~ 9 million; whole genome sequences (including incomplete drafts) ~ 3000. Bacteria & Archaea: estimated number of species millions to billions; genomic sequences ~ 150,000. Every individual has a different genome. Some diseases such as cancer give rise to massive genome level variation.
  • 62. Not just genomes ● Transcriptomes ○ One contig for every isoform ○ Do not expect uniform coverage ● Metagenomes ○ Mixture of different organisms ○ Host, bacteria, virus, fungi all at once ○ All different depths ● Metatranscriptomes ○ Combination of above!
  • 64. Take home points ● De novo assembly is the process of reconstructing a long sequence from many short ones ● Represented as a mathematical “overlap graph” ● Assembly is very challenging (“impossible”) because ○ Sequencing bias under-represents certain regions ○ Reads are short relative to genome size ○ Repeats create tangled hubs in the assembly graph ○ Sequencing errors cause detours and bubbles in the assembly graph