Targeted Sequencing in Allopolyploids: Comparison of BAC-by-BAC and Whole Genome Approaches using Third Generation Sequencing

Authors

Carlos Ernesto Maldonado, Beatriz Padilla, Alvaro Gaitán, Marcela Yepes, Aleksey Zimin, Keithanne Mockaitis, Carrie Ganote, Sheri A. Sanders, Dave Kudrna, Rod A. Wing, Herb Aldwinckle

A targeted sequencing approach was implemented to characterize genomic regions containing QTLs associated with important agronomic traits in the allotetraploid Coffea arabica. The physical map of C. arabica var. Caturra was integrated with the C. canephora genome using the programs FPC and SyMAP. The Minimum Tiling Path (MTP) covering the target region was selected, sequenced by single-molecule real-time sequencing (SMRT-Seq), and assembled with HGAP (PacBio) and postHGAP (Arizona Genomics Institute). A second dataset comprised the whole genome, obtained by PacBio long-read sequencing of 20 kb libraries (WGS-SMRT, ~80X coverage) and assembled using Falcon/Falcon Unzip (Chin et al. 2016) and MaSuRCA (Zimin et al. 2017), together with the transcriptome from pooled, MID-labeled tissues (meristem, leaves and flowers), for which full-length transcripts were obtained by cDNA sequencing (PacBio Iso-Seq) of C. arabica var. Caturra. For both approaches, a region covering 30 cM upstream and downstream of the markers (SNV, DArT and SSR) associated with the QTL was delimited by integrating the genetic and physical maps of C. arabica with the C. canephora genome, used as a reference; the specific markers or BAC end sequences (BES) were aligned using BLAT or e-PCR as mapping tools. The C. arabica WGS assembly was aligned to C. canephora using SyMAP, and the contigs covering the target region were selected. As expected for an allotetraploid species, several WGS contigs mapped to the same reference region; contigs with higher identity to C. canephora were assigned to that subgenome and those with lower identity to the C. eugenioides subgenome. Contigs from non-overlapping regions were assigned to both subgenomes in the scaffolding process. For the BAC-by-BAC sequencing, the assembly was performed using Minimus. Gene prediction was done with MAKER, functional annotation with Blast2GO-pro, and repeat annotation with REPET.
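The subgenome assignment described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the `Hit` record, the function names, and the rule for regions covered by more than two contigs (highest identity goes to C. canephora, all others to C. eugenioides) are assumptions made for illustration only. The input is assumed to be one alignment record per contig, with its coordinates on the C. canephora reference and its percent identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One contig aligned to the C. canephora reference (hypothetical record)."""
    contig: str
    ref_start: int
    ref_end: int
    identity: float  # percent identity to the C. canephora reference


def cluster_overlaps(hits):
    """Group hits whose reference intervals overlap into clusters."""
    clusters = []
    current = []
    current_end = -1
    for hit in sorted(hits, key=lambda h: (h.ref_start, h.ref_end)):
        if current and hit.ref_start <= current_end:
            # Overlaps the running cluster: extend it.
            current.append(hit)
            current_end = max(current_end, hit.ref_end)
        else:
            # No overlap: close the running cluster and start a new one.
            if current:
                clusters.append(current)
            current = [hit]
            current_end = hit.ref_end
    if current:
        clusters.append(current)
    return clusters


def assign_subgenomes(hits):
    """Return a dict mapping contig name to 'canephora', 'eugenioides' or 'both'."""
    assignment = {}
    for cluster in cluster_overlaps(hits):
        if len(cluster) == 1:
            # A contig with no overlapping partner is used for both subgenomes.
            assignment[cluster[0].contig] = "both"
            continue
        # Assumption: the single highest-identity contig is the canephora copy.
        best = max(cluster, key=lambda h: h.identity)
        for hit in cluster:
            assignment[hit.contig] = "canephora" if hit is best else "eugenioides"
    return assignment
```

Given two contigs that overlap on the reference and a third that stands alone, the sketch assigns the higher-identity contig of the pair to the C. canephora subgenome, the other to C. eugenioides, and the isolated contig to both.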

An assembled sequence of 7.38 Mb with 1,188 predicted genes was obtained from the BAC-by-BAC approach. The WGS-derived scaffolds covered 10.2 Mb with 1,675 predicted genes for the C. canephora subgenome and 10.7 Mb with 1,991 predicted genes for the C. eugenioides subgenome. Dot-plot analysis showed evidence of chimeric assembly between subgenomes in the BAC-by-BAC approach. In the functional annotation, transferases were the most represented enzyme category. InterProScan identified 11 NBS domains of disease resistance genes in the C. eugenioides subgenome and 9 in the C. canephora subgenome. These results, coupled with the annotation of P-loop domains and leucine-rich region domains, suggest that this region could be associated with disease resistance, which could in turn contribute to coffee yield. Overall, the best sequencing strategy for obtaining high-quality data from the tetraploid genome was WGS sequencing, as shown by the absence of apparent chimeric regions and the clear differentiation of subregions between subgenomes.

authors
  • Carlos Ernesto Maldonado
    • Centro Nacional de Investigaciones de Café, CENICAFE
  • Beatriz Padilla
    • Universidad Católica de Manizales
  • Alvaro Gaitán
    • Centro Nacional de Investigaciones de Café, CENICAFE
  • Marcela Yepes
    • Cornell University/ School of Integrative Plant Sciences/ Plant Pathology and Plant Microbe Biology Section
  • Aleksey Zimin
    • Johns Hopkins University, Department of Computer Science
    • University of Maryland
  • Keithanne Mockaitis
    • Department of Biology, Indiana University
    • National Center for Genome Analysis Support, Pervasive Technology Institute
  • Carrie Ganote
    • National Center for Genome Analysis Support, Pervasive Technology Institute/ Indiana University
  • Sheri A. Sanders
    • National Center for Genome Analysis Support, Pervasive Technology Institute
  • Dave Kudrna
    • Arizona Genomics Institute, University of Arizona
  • Rod A. Wing
    • Arizona Genomics Institute, University of Arizona
  • Herb Aldwinckle
    • Cornell University/ School of Integrative Plant Sciences/ Plant Pathology and Plant Microbe Biology Section
Date of publication:
2018