Improved Lentil genome assemblies through long read technology

Objectives
  • Develop high-quality, long-read genome assemblies with annotation for at least one accession of each Lens species.

  • Facilitate precise identification of molecular markers and candidate genes.

Germplasm
Germplasm Genus
Lens
Germplasm Scientific Name
Lens spp.
Germplasm Collection
CDC Greenstar, CDC Redberry, IG 72815, L01-827A, IG 72623, BGE 016880, IG 110813, ILWL 28
Executive Summary

Long-read sequencing technologies have considerably improved the Lens culinaris (Lcu) assembly by allowing us to sequence through the many repeat-heavy regions, correct scaffold order and orientation, and pull apart collapsed regions. Further exploration of long-read data will greatly improve our understanding of the variation outside the genic regions in both cultivated lentil and its wild relatives, and enhance our ability to harness adaptive variation lurking in the dark corners of the Lens genepool. It will also help to better resolve quantitative trait loci (QTL) and facilitate the design of markers useful for marker-assisted selection. Emerging research suggests that, like SNPs, structural variations are not evenly distributed throughout plant genomes, but are instead elevated in regions of high recombination, where unequal crossing over is most likely to occur. It has been hypothesized that many structural variants that segregate in wild taxa are fixed during domestication. Understanding lentil phenotypes in the context of these structural variations will help us design better crossing strategies for harnessing existing variation.

In this study, we are focusing on generating improved genome assemblies for all Lens species to provide a solid basis for ongoing research. This includes generating additional data using new technologies such as PacBio HiFi, fine-tuning our existing long-read genome assembly workflow, and leveraging information from other cool-season legumes. Our initial goal was to collect at least 50x whole-genome coverage for each genome and to assemble each genome independently.
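The coverage target above is simple arithmetic: depth of coverage is the total number of sequenced bases divided by the genome size. The following is a minimal sketch of that calculation, not part of the study's workflow. The function names are hypothetical, the 3.69 Gb figure is the CDC Redberry v2.0 assembly size used here as a stand-in for genome size, and the read total in the second call is an invented number for illustration only.

```python
def bases_needed(genome_size_bp, target_coverage):
    """Total read bases required to reach a target depth of coverage."""
    return genome_size_bp * target_coverage


def depth_of_coverage(total_read_bases, genome_size_bp):
    """Average depth of coverage given the total bases sequenced."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bp


# Assumption: assembly size (3.69 Gb) stands in for the true genome size.
genome_size = 3_690_000_000

# Read bases needed for a 50-fold target, reported in Gb.
print(bases_needed(genome_size, 50) / 1e9)

# Depth achieved by a hypothetical sequencing run (made-up read total).
print(depth_of_coverage(125_460_000_000, genome_size))
```

At roughly 3 to 4 Gb per genome, a 50-fold target therefore implies on the order of 150 to 200 Gb of read data per accession.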

Attribution
The following researchers and their organizations were involved in this work and should be credited for their role in any resulting or related publications.
Data Custodian
Kirstin E Bett
Collaborator
  • Larissa Ramsay
  • Kevin Koh
  • Mari Baker
Data Curator
  • Lacey-Anne Sanderson
  • Carolyn T Caron
Research Organization
Research Outputs
The following research outputs are specific to the study as a whole. There are additional research outputs associated with the linked experiments.
Research Outputs

GenomeFocus: long-read genome assembly workflow for large and highly repetitive genomes.

Developed by Larissa Ramsay

This bioinformatics tool was developed during this study, drawing on our experience with large and highly repetitive plant genomes, and was used to assemble many of the genomes resulting from the study. It is open source and freely available on GitHub for others to use in their own research. Accompanying documentation details how to prepare your files, run the associated tools and use the scripts in this workflow.

Datasets
  • Lens culinaris: CDC Redberry Genome Assembly v2.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (34x PacBio SMRT, 20x Oxford Nanopore reads). The contiguity of the assembly was further validated and improved using Hi-C data, as well as both an optical map and a genetic map (LR-01; ILL 1704 x CDC Robin intraspecific RIL). The finished assembly is 3.69 Gb arranged in 7 pseudomolecules and 2,068 unplaced unitigs.

  • Lens culinaris: CDC Greenstar Genome Assembly v1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (34x HiFi). Contigs were scaffolded using Irys optical mapping data; the scaffolds were then aligned to the CDC Redberry 2.0 assembly and ordered relative to that reference. Further validation was done using genetic maps with CDC Greenstar as one of the parents. The finished assembly is 3.839 Gb, 3.799 Gb of which is arranged in 7 pseudomolecules, and has a contig N50 of 37.93 Mb.

  • Lens orientalis: BGE 016880 Genome Assembly 1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (50x Oxford Nanopore reads). The contiguity of the assembly was further validated and improved using Hi-C data and a genetic map (LR-89; BGE 016880 x IG 72529 intraspecific RIL). The finished assembly is 3.75 Gb arranged in 7 pseudomolecules and 689 unplaced unitigs, with a contig N50 of 2.316 Mb.

  • Lens lamottei: IG 110813 Genome Assembly 1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This IG 110813 Lens lamottei genome assembly (Lla.1ESP) was assembled from nanopore long reads (>5 kb) at 42x coverage using the assembler SMARTdenovo, resulting in 3,695 contigs. The LR-74 genetic map was used to assign the contigs to chromosome bins, which were ordered and oriented with ALLHiC. The finished assembly is 3.238 Gb, 3.052 Gb of which is arranged in 7 pseudomolecules, with a contig N50 of 2.53 Mb.

  • Lens tomentosus: IG 72805 Genome Assembly 1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (60x Oxford Nanopore reads). The contiguity of the assembly was further validated and improved using Hi-C data and a genetic map (LR-90; IG 72614 x IG 72805 intraspecific F2). The finished assembly is 4.07 Gb arranged in 7 pseudomolecules and 1,093 unplaced unitigs, with a contig N50 of 3.211 Mb.

  • Lens odemensis: IG 72623 Genome Assembly 1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (41x Oxford Nanopore reads). The contiguity of the assembly was further validated and improved using Hi-C data and a genetic map (IG 72543 x IG 72623 intraspecific RIL). The finished assembly is 3.55 Gb arranged in 7 pseudomolecules and 1,360 unplaced unitigs, with a contig N50 of 3.269 Mb.

  • Lens ervoides: IG 72815 Genome Assembly 1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (52x Oxford Nanopore reads). The contiguity of the assembly was further validated and improved using Hi-C data and a genetic map (LR-66; L1-02-827 x IG 72815 intraspecific RIL). The finished assembly is 2.87 Gb arranged in 7 pseudomolecules and 1,134 unplaced unitigs, with a contig N50 of 4.7 Mb.

  • Lens nigricans: ILWL 25 Genome Assembly 1.0

    The above link takes you to the genome assembly page, which provides methodology and metadata as well as a download link. This genome assembly was constructed with long-read data (Oxford Nanopore reads). Contigs were initially assigned their pseudomolecule positions through reference-guided assembly against the Ler.1DRT assembly, and these placements were further validated and improved using Hi-C data in Juicebox. The finished assembly is 2.77 Gb arranged in 7 pseudomolecules (96.9% of the genome sequence) and 1,445 unplaced unitigs.
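Several of the dataset descriptions above report a contig N50. For readers unfamiliar with the statistic, it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below shows the conventional calculation on a toy list of lengths; the function name is hypothetical and this is not necessarily how the reported values were computed.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest until their summed length
    reaches at least half of the total; the length of the last contig
    taken is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: total length is 54, and the three longest contigs
# (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher contig N50 indicates a more contiguous assembly, which is why the HiFi-based CDC Greenstar assembly (37.93 Mb) stands out against the nanopore-based assemblies (2.3 to 4.7 Mb).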

Grant Activity
Title
EVOLVES: Enhancing the Value of Lentil Variation for Ecosystem Survival
Data Custodian
  • Kirstin E Bett
  • Albert Vandenberg
Research Organization
Funding Range

2019-2023