CRISTAL-BONSAI
Software Year: 2023

supersampler Fractional hitting set implementation for lightweight genomic data sketching

Timothé Rouzé
Igor Martayan
Camille Marchet

Abstract

Bird's-eye view: SuperSampler (SPSP) is an implementation of a novel k-mer selection scheme that we call Fractional Hitting Sets (FHS), a generalisation of Universal Hitting Sets (UHS). It quickly creates sketches of genomes and metagenomes and compares these sketches to obtain containment or Jaccard indices of the input data. SuperSampler uses super-k-mers instead of k-mers, which yields lighter sketches, lower RAM usage, and less computation time during comparison than traditional subsampling methods, thanks to a sketch organisation made possible by the super-k-mer structure. Sketch creation applies FracMinHash to the selection of minimizers (the minimizer of a k-mer is its m-mer whose hash value is minimal). When a minimizer is selected, all consecutive k-mers around it that share the same minimizer are selected as well and together form a super-k-mer.
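The selection scheme described above can be illustrated with a minimal Python sketch. It is not SuperSampler's actual code: the function names (`hash64`, `minimizer`, `super_kmer_sketch`, `kmers_of`, `containment`, `jaccard`), the BLAKE2b-based hash, and the parameter names are assumptions made for illustration, and the sketch ignores reverse complements and the sketch organisation that SuperSampler uses for fast comparison. It only shows the idea: compute the minimizer of each k-mer, keep a minimizer when its hash falls below a fraction of the hash space (FracMinHash), and emit each run of consecutive k-mers sharing a kept minimizer as one super-k-mer.

```python
import hashlib


def hash64(s: str) -> int:
    """Deterministic 64-bit hash (assumed; any well-mixed hash would do)."""
    digest = hashlib.blake2b(s.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer(kmer: str, m: int) -> str:
    """Return the m-mer of `kmer` with the smallest hash (leftmost on ties)."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)), key=hash64)


def super_kmer_sketch(seq: str, k: int, m: int, fraction: float) -> list[str]:
    """Keep the super-k-mers whose minimizer hash is below fraction * 2**64."""
    threshold = int(fraction * 2**64)
    sketch = []
    current, start = None, 0          # minimizer of the current run, run start
    n_kmers = len(seq) - k + 1
    for i in range(n_kmers + 1):      # one extra step flushes the last run
        mz = minimizer(seq[i:i + k], m) if i < n_kmers else None
        if mz != current:
            # The run covers k-mers start..i-1, i.e. seq[start : i + k - 1].
            if current is not None and hash64(current) < threshold:
                sketch.append(seq[start:i + k - 1])
            current, start = mz, i
    return sketch


def kmers_of(super_kmers: list[str], k: int) -> set[str]:
    """Expand super-k-mers back into the set of k-mers they contain."""
    return {s[i:i + k] for s in super_kmers for i in range(len(s) - k + 1)}


def containment(a: set[str], b: set[str]) -> float:
    """Fraction of the k-mers of `a` that are also found in `b`."""
    return len(a & b) / len(a) if a else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0
```

For two sequences `x` and `y`, one would compare `kmers_of(super_kmer_sketch(x, k, m, f), k)` with the same expression for `y`. Because both sketches apply the same hash threshold, a k-mer shared by `x` and `y` is either kept in both sketches or dropped from both, which is what allows the sketch-level indices to estimate the indices of the full inputs.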


Cite

Antoine Limasset, Timothé Rouzé, Igor Martayan, Camille Marchet. supersampler Fractional hitting set implementation for lightweight genomic data sketching. 2023, ⟨swh:1:dir:bb6dea99f95337e2bf26180f5242fa4b09b897f6;origin=https://hal.archives-ouvertes.fr/hal-04448163;visit=swh:1:snp:1f35ba3a6890c0ed7ca0e06e4662f5abc05db16c;anchor=swh:1:rel:42686bbd8658ab7a2a280c4dd46d8bd8223a0851;path=/⟩. ⟨hal-04448163⟩
