DNA Data Search, Retrieval Medium Designed

Digital DNA

(Pete Linforth, Pixabay. https://pixabay.com/illustrations/dna-life-biotechnology-evolution-4068826/)

11 June 2021. A bioengineering team created a practical technique for using DNA as a medium to store and retrieve vast and rapidly growing volumes of data. Researchers from Massachusetts Institute of Technology describe their process in yesterday’s issue of the journal Nature Materials (paid subscription required).

Advances in informatics, such as artificial intelligence, the Internet of things, and autonomous vehicles, are expected to generate massive volumes of data that will stretch current storage facilities, such as data centers, to their limits. Meeting that demand would likely require ever-larger physical plants, more power, and much higher costs, yet building more of these large and expensive facilities is becoming increasingly impractical and unsustainable.

Researchers from MIT and the Broad Institute of MIT and Harvard are seeking methods that take advantage of DNA's data storage potential. The lab of MIT biological engineering professor Mark Bathe studies the structural properties of DNA, particularly for nanoscale applications, including therapeutics, computing, and data storage.

In the article, Bathe and colleagues note DNA's capacity for ultra-high-density data storage. Instead of coding data as 0s and 1s, as in digital media, DNA codes data in the four bases found in DNA molecules: adenine, cytosine, guanine, and thymine, or A, C, G, and T. Data are written into synthesized nucleic acids, which are chemically stable and can therefore last for long periods of time. DNA sequencing can then read the stored sequences, which are decoded back into the original file.
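To make the idea of writing binary data in four bases concrete, here is a minimal sketch that assumes the simplest possible mapping, two bits per base (00 to A, 01 to C, 10 to G, 11 to T). This mapping and the function names `encode` and `decode` are illustrative assumptions, not the encoding scheme used by the MIT team.

```python
# Hypothetical 2-bit-per-base mapping; NOT the encoding used in the MIT study.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Practical DNA storage schemes are more involved than this: they typically avoid problematic sequences such as long runs of a single base and add error correction, so they store fewer bits per nucleotide than this naive mapping.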

Burning the haystack to find the needle

Holding back the adoption of DNA for data storage are its high cost and the lack of a practical method for retrieving the desired data. Synthesizing DNA can be done, but it is still far too expensive for large-scale use, at least for now. In the paper, the MIT/Broad team addressed the second issue, a practical retrieval method. "You're going to have a pile of DNA," says Bathe in an MIT statement, "which is a gazillion files, images or movies and other stuff, and you need to find the one picture or movie you're looking for. It's like trying to find a needle in a haystack."

In addition, retrieving data today requires amplifying the target DNA with polymerase chain reaction, or PCR, a basic amplification technique, before it is sequenced. PCR relies on short primer sequences that bind to the target DNA to jump-start copying. For most other uses, the primer does not affect the outcome, but in a data pool a primer could bind to DNA other than the data sought for retrieval. And the enzymes used for PCR amplification could also consume other DNA data in the pool. "You're kind of burning the haystack to find the needle," notes Bathe, "because all the other DNA is not getting amplified and you're basically throwing it away."
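Why a primer can pick up the wrong file can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a primer "binds" any strand whose leading bases differ from it in at most a set number of positions. Real primer binding depends on thermodynamics and can occur anywhere along a strand, and the sequences and the function `primer_hits` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def primer_hits(primer: str, pool: dict[str, str], max_mismatches: int) -> list[str]:
    """Return names of strands whose leading bases are within max_mismatches of the primer.

    Simplified model: mismatch counting stands in for imperfect chemical binding.
    """
    k = len(primer)
    return [name for name, seq in pool.items()
            if len(seq) >= k and hamming(primer, seq[:k]) <= max_mismatches]


# Hypothetical data pool: file_b's first 8 bases differ from the primer at one position.
pool = {
    "file_a": "ACGTACGTTTGACCA",
    "file_b": "ACGTACGATTGCAGT",
    "file_c": "GGCCTTAAGGCCTTA",
}
primer = "ACGTACGT"  # intended to retrieve file_a only

print(primer_hits(primer, pool, max_mismatches=0))  # ['file_a']
print(primer_hits(primer, pool, max_mismatches=1))  # ['file_a', 'file_b']
```

Once the model tolerates a single mismatch, the primer also selects `file_b`, a stand-in for the off-target amplification the researchers describe.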

The researchers devised a technique to solve this retrieval problem. The team stores DNA data files in tiny capsules of silica (silicon dioxide, a naturally occurring substance found in quartz, sand, and human soft tissue). Each impervious silica particle is labeled with strands of DNA that act like a bar code, carrying metadata that identify and describe the data inside the particle. The search process uses fluorescently labeled primers that light up when they match the metadata encoded in the DNA bar code. The authors say each capsule can store up to the equivalent of a gigabyte of digital data.

The team demonstrated the process with 20 digital images coded into DNA. Each image required some 3,000 nucleotides, the building-block molecules of nucleic acids, which the authors say is equivalent to about 100 bytes of digital data. The authors say the demonstration shows the technique can handle searches, including Boolean searches, with an accuracy of 1 in 1 million files.
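The query logic behind a Boolean search over bar-coded capsules can be sketched in software. In the lab, matching happens chemically, with labeled strands binding to the capsules' DNA bar codes; this sketch models only the selection logic, assuming each capsule carries a set of metadata labels. The `Capsule` class, the `search` function, and the file names and labels are hypothetical, not the paper's actual bar codes or images.

```python
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """A hypothetical silica capsule: a stored file plus its bar-code metadata labels."""
    file_name: str
    barcodes: set[str] = field(default_factory=set)


def search(capsules, all_of=(), any_of=(), none_of=()):
    """Select capsules whose bar codes satisfy an AND / OR / NOT query.

    all_of:  every listed label must be present (AND)
    any_of:  at least one listed label must be present, if any are given (OR)
    none_of: no listed label may be present (NOT)
    """
    hits = []
    for c in capsules:
        if not set(all_of) <= c.barcodes:
            continue
        if any_of and not (set(any_of) & c.barcodes):
            continue
        if set(none_of) & c.barcodes:
            continue
        hits.append(c.file_name)
    return hits


library = [
    Capsule("cat_orange.jpg", {"cat", "orange"}),
    Capsule("cat_gray.jpg", {"cat", "gray"}),
    Capsule("dog_orange.jpg", {"dog", "orange"}),
    Capsule("airplane.jpg", {"vehicle", "gray"}),
]

print(search(library, all_of=["cat"], none_of=["gray"]))  # ['cat_orange.jpg']
print(search(library, any_of=["dog", "vehicle"]))         # ['dog_orange.jpg', 'airplane.jpg']
```

Keeping the metadata on the outside of each capsule, separate from the stored file, is what lets a query select whole capsules without amplifying, and thereby consuming, the rest of the pool.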

Bathe envisions a short-term need for storing DNA data from medical specimens, such as Covid-19 tests, as well as a longer-term need for archival data storage, where data are not intended to be retrieved frequently. Bathe and postdoctoral researcher James Banal, the paper's first author, are founders of the start-up company Cache DNA, which is developing DNA data storage and retrieval using uniquely identified micro- to nanoscale capsules.
