Library Barcode Design Lowers Error Rate in Sequencing
Typically, a library assigns a unique barcode to each item in its collection and attaches that barcode to the item. A barcode scanner reads the barcode and links it to a record in the library database. This allows the item to be tracked throughout its circulation and helps ensure that each item is matched to the correct patron when it is returned.
Problems arise when a large number of items circulate in the same batch and many different barcodes are present in the sequence data. This crowding can cause errors in the library records and make it difficult to identify which item is associated with each barcode.
To reduce these errors, we developed a new barcode design that significantly lowers the error rate of the resulting sequencing data. The design uses a pair of twofold degenerate "WS" bases (W is A or T, S is C or G), which both control the GC content and limit the length of mononucleotide runs, together with fourfold degenerate bases, which increase information content and further reduce the length of dinucleotide runs. The result is a markedly lower error rate across a wide range of sequence datasets.
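The effect of the degenerate positions on GC content and run length can be illustrated with a small sketch. It is not the authors' implementation: the repeating `WSN` layout, the barcode length of 12, and all function names are assumptions chosen for illustration. Only the IUPAC meanings of the symbols (W = A or T, S = C or G, N = any base) are standard.

```python
import random

# IUPAC degenerate codes: W = A or T (twofold), S = C or G (twofold), N = any base (fourfold).
IUPAC = {"W": "AT", "S": "CG", "N": "ACGT"}

# Hypothetical layout, not taken from the original design: four "WS" pairs,
# each followed by one fourfold degenerate position.
PATTERN = "WSN" * 4


def sample_barcode(pattern, rng):
    """Draw one concrete barcode by resolving each degenerate symbol at random."""
    return "".join(rng.choice(IUPAC[symbol]) for symbol in pattern)


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "CG" for base in seq) / len(seq)


def longest_run(seq):
    """Length of the longest mononucleotide (homopolymer) run."""
    best = current = 1
    for prev, base in zip(seq, seq[1:]):
        current = current + 1 if base == prev else 1
        best = max(best, current)
    return best


rng = random.Random(0)  # fixed seed so the sketch is reproducible
barcodes = [sample_barcode(PATTERN, rng) for _ in range(1000)]

# Every "WS" pair contributes exactly one G/C base, so GC content is bounded.
gc_values = [gc_fraction(b) for b in barcodes]
# W and S can never be the same base, so each pair interrupts a run.
max_run = max(longest_run(b) for b in barcodes)
```

Under this assumed layout, each barcode has between 4 and 8 G/C bases out of 12, and no mononucleotide run can exceed two bases, because the single N position between an S and the following W can match at most one of its neighbours. The fourfold positions are what widen the code space: the pattern admits 2^8 × 4^4 = 65,536 distinct barcodes.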
In addition, the new design is amenable to library barcode tag (BLT) experiments, in which chemicals are tagged with batch and target codes from different experiments and then pooled for sequencing. We demonstrated this by creating a BLT library that contains all possible combinations of the three bit values (1, 2, and 3). In a series of BLT experiments, our approach produced a lower error rate than a library constructed using standard methods.
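As a rough illustration of what "all possible combinations" of three values means in practice, the sketch below enumerates every code of a fixed length over the values 1, 2, and 3. The code length of 4 is a hypothetical choice, since the text does not state one, and the variable names are illustrative only.

```python
from itertools import product

SYMBOLS = (1, 2, 3)  # the three values named in the text
N_POSITIONS = 4      # hypothetical code length; the text does not give one

# Every possible code: one tuple per combination of values across positions.
codes = list(product(SYMBOLS, repeat=N_POSITIONS))

print(len(codes))  # 3**4 = 81
```

The library size grows as 3 raised to the number of positions, so each added position triples the number of codes that must be synthesized and told apart after pooling.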