Targeted reduction of highly abundant transcripts with pseudo-random primers

In quantitative analysis of gene expression using DNA sequencers, precision depends on the number of sequence reads, and each read has a cost. Therefore, preventing uninformative sequences from being read improves the cost/performance ratio of the analysis. We have developed a method to deplete target sequences from transcriptome libraries (typically nanoCAGE).

Before sequencing, RNA molecules are usually converted into DNA molecules with an enzyme, the reverse transcriptase, that uses short DNA oligonucleotides as synthesis primers. Reactions to convert the whole transcriptome are often conducted with a "random" mixture of 4,096 different primers, which covers all possible combinations of A, C, G and T over 6 consecutive nucleotides. In 2009, Armour and collaborators showed that by using a subset of these "random" primers, one can avoid the conversion of target RNA molecules (typically the ribosomal RNAs, which are highly abundant but carry little information). However, this subset still comprised several hundred oligonucleotides, which is costly to synthesize.
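The subset idea described above can be illustrated with a minimal sketch. It enumerates the 4^6 = 4,096 hexamers and keeps only those that cannot anneal perfectly anywhere on a target RNA. The function names and the toy target sequence are hypothetical, and the exact-match criterion is a simplifying assumption, not the published selection procedure.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def all_hexamers() -> list[str]:
    """Enumerate every combination of A, C, G and T over 6 positions."""
    return ["".join(bases) for bases in product("ACGT", repeat=6)]


def non_matching_primers(target_rna: str) -> list[str]:
    """Keep the hexamers with no perfect annealing site on the target.

    A primer anneals where its reverse complement occurs in the target,
    so hexamers whose reverse complement is found there are discarded.
    Assumption: only perfect matches are considered.
    """
    target = target_rna.upper().replace("U", "T")
    sites = {target[i:i + 6] for i in range(len(target) - 5)}
    return [p for p in all_hexamers() if reverse_complement(p) not in sites]


# Hypothetical toy target; a real ribosomal RNA is thousands of bases long.
toy_target = "ACGUACGUAC"
kept = non_matching_primers(toy_target)
print(len(all_hexamers()), len(kept))
```

On a full-length ribosomal RNA many more hexamers would be discarded, which is consistent with a remaining subset of several hundred primers.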

Observing that the reverse transcriptase is highly tolerant of mismatches, we reasoned that a similar result could be obtained with a dramatically lower number of oligonucleotides, which we called "pseudo-random primers". In an article published this month in BioTechniques (Arnaud et al., 2016), we demonstrate our approach by reducing either ribosomal or hemoglobin RNA. Importantly, our method also applies to the reduction of the artifacts created by the cross-priming of the oligonucleotide tails.
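To give an intuition for why mismatch tolerance allows a much smaller primer set, the sketch below counts how many hexamer sites a single primer could cover if a given number of mismatches were tolerated. The mismatch thresholds are hypothetical, the position of the mismatches is ignored, and this is an illustration of the reasoning rather than the procedure used in the article.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def covered_sites(primer: str, max_mismatches: int) -> list[str]:
    """List all sequences within max_mismatches of the primer.

    Assumption: every mismatch is tolerated equally, wherever it falls.
    """
    candidates = ("".join(b) for b in product("ACGT", repeat=len(primer)))
    return [c for c in candidates if hamming(primer, c) <= max_mismatches]


# Hypothetical primer; any hexamer gives the same counts.
for allowed in (0, 1, 2):
    print(allowed, len(covered_sites("ACGTAC", allowed)))
```

Under these assumptions, one hexamer covers 1 site with no mismatch, 19 with one mismatch and 154 with two, so a few dozen well-chosen primers could in principle reach most of the 4,096 possible sites.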

During the publication process of this work, we also explored new ways of communicating scientific results on the Internet. We deposited our presubmission manuscript in the "bioRxiv" repository, and the scripts of our bioinformatics analysis as supplemental material on GitHub.