DNA barcoding is a global initiative for species identification through sequencing of short DNA sequence markers. Sequences of two loci, ITS and LSU, were generated as barcode data for all (ca. 9 000) yeast strains included in the CBS collection, originally assigned to ca. 2 000 species. Taxonomic sequence validation turned out to be the most severe bottleneck, owing to the large volume of generated trace files and the lack of reference sequences. We have analysed and validated CBS strains and barcode sequences automatically. Our analysis shows that 6 % and 9.5 % of CBS yeast species could not be distinguished by ITS and LSU, respectively. Among them, ∼3 % were indistinguishable by both loci. Except for those species, both loci successfully resolved yeast species: the grouping of yeast DNA barcodes at the predicted taxonomic thresholds was more than 90 % similar to the grouping by expected taxon names. The taxonomic thresholds predicted to discriminate yeast species were 98.41 % for ITS and 99.51 % for LSU. To discriminate current yeast genera, the thresholds were 96.31 % for ITS and 97.11 % for LSU. Using ITS and LSU barcodes, we were also able to show that the reclassifications of basidiomycetous yeasts in 2015 significantly improved the generic taxonomy of those organisms. The manually validated barcodes of 4 730 (51 %) CBS yeast strains, representing 1 351 (80 %) accepted yeast species, have been released to GenBank and the CBS-KNAW website as reference sequences for yeast identification.
- automated curation
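
To make the idea of a taxonomic threshold concrete, the following minimal Python sketch groups barcodes whose pairwise similarity reaches a threshold and then lists the species names that end up sharing a group. Only the threshold values come from the text above. The similarity measure (identity over pre-aligned, equal-length sequences), the single-linkage grouping, and the names `similarity`, `group_by_threshold` and `unresolved_species` are assumptions made for illustration and are not the clustering procedure used in the study.

```python
from collections import defaultdict

# Species-level thresholds reported in the abstract, as similarity fractions.
SPECIES_THRESHOLDS = {"ITS": 0.9841, "LSU": 0.9951}


def similarity(a, b):
    """Fraction of identical positions in two pre-aligned, equal-length sequences.

    Toy measure (assumption): real barcode comparison would use a proper
    alignment-based similarity score.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_by_threshold(seqs, threshold):
    """Single-linkage grouping (assumption): link strains with similarity >= threshold."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if similarity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return sorted(sorted(g) for g in groups.values())


def unresolved_species(groups, names):
    """Species names that share a group with at least one other species name."""
    mixed = set()
    for group in groups:
        present = {names[i] for i in group}
        if len(present) > 1:
            mixed |= present
    return mixed


# Hypothetical toy data: strain id -> aligned barcode, strain id -> taxon name.
toy_seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAC",
    "s3": "ACGTACGTAT",
    "s4": "TTTTTTTTTT",
}
toy_names = {"s1": "sp. A", "s2": "sp. A", "s3": "sp. B", "s4": "sp. C"}

strict = group_by_threshold(toy_seqs, SPECIES_THRESHOLDS["ITS"])
loose = group_by_threshold(toy_seqs, 0.85)
print(strict, unresolved_species(strict, toy_names))
print(loose, unresolved_species(loose, toy_names))
```

On this toy data, the ITS species threshold keeps the three taxon names in separate groups, whereas a looser threshold of 0.85 merges "sp. A" and "sp. B" into one group, which is the sense in which two species become indistinguishable by a locus. Single linkage is chosen here only because it is short to write; it can chain distant sequences together through intermediates.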