Using DNA for stable, long-term, limitless data storage
The Takeaway: For as long as almost all of us have been alive, digital data have been stored on disk and tape systems or flash memory. While efficient, these systems have their drawbacks. A new era of digital storage may be upon us, however—using synthetic DNA to store important information for eternity.
Ever since we started creating digital content, we have needed a way to store it. From nostalgic floppy disks to cell phones that can hold a terabyte of data, data storage has been a healthy, growing industry. Today, most data are stored in “the cloud,” which actually means off-site servers, and the giant facilities that house them keep the internet humming, but at a cost. Data centers have become controversial for consuming vast amounts of electricity, colossal quantities of water, and precious open space, and for generating pollution and noise.
One emerging solution? Storing data in the sequence of nucleotides (A, C, G, and T) that make up DNA molecules. While this method is far from common and has its share of problems, it’s an emerging field with a lot of opportunities.
Why is DNA good for data storage?
DNA has several properties that make it an attractive option for long-term data storage:
- Density: DNA can store immense amounts of information in a very small space. It is estimated that a gram of DNA can store about 215 petabytes (215 million gigabytes) of data.
- Durability: Under the right conditions, DNA can remain stable for thousands of years, making it a potential candidate for long-term data archiving.
- Information security: Data stored in DNA can be encrypted, offering potential security benefits for sensitive data.
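To put the density estimate in perspective, here is a small back-of-the-envelope calculation. It assumes decimal units (1 petabyte = 1,000,000 gigabytes) and takes the 215 petabytes per gram figure at face value; the function name `grams_needed` is made up for this illustration.

```python
# Back-of-the-envelope arithmetic using the density estimate quoted above.
# Assumption: decimal units, so 1 PB = 10**6 GB.
PETABYTE_IN_GB = 1_000_000
DENSITY_PB_PER_GRAM = 215  # estimated storage density of DNA


def grams_needed(data_gb: float) -> float:
    """Grams of DNA needed to hold data_gb gigabytes at the quoted density."""
    return data_gb / (DENSITY_PB_PER_GRAM * PETABYTE_IN_GB)


if __name__ == "__main__":
    # Gigabytes that fit in one gram of DNA
    print(DENSITY_PB_PER_GRAM * PETABYTE_IN_GB)
    # Grams needed for one exabyte (10**9 GB)
    print(round(grams_needed(1_000_000_000), 2))
```

At that density, an exabyte of data would fit in under five grams of DNA, at least on paper.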
How can data be stored on DNA?
Here's a general overview of the process of storing data on DNA:
- Encoding data into DNA:
a. Convert data: Before data can be encoded into DNA, they need to be converted into binary format (0s and 1s).
b. Mapping to nucleotides: Assign each group of binary digits to a specific nucleotide or sequence of nucleotides. For example, you could use A for 00, C for 01, G for 10, and T for 11, so that each byte (8 bits) becomes a sequence of four nucleotides.
c. Error correction: DNA synthesis and sequencing are error-prone processes. To ensure accuracy, error correction codes are often applied.
- Synthesizing DNA: Use a DNA synthesizer to create short strands of DNA according to the encoded sequence.
- Storing DNA: DNA needs to be stored in a stable environment to prevent degradation. Specialized storage methods, such as cryogenic storage or desiccation, can be used.
- Retrieving data: To retrieve the data, the DNA is read using DNA sequencing technologies. The resulting sequence is then decoded back into binary and subsequently into the original digital format.
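The encoding and decoding steps above can be sketched in a few lines of Python. This is a minimal illustration of the example mapping only (A for 00, C for 01, G for 10, T for 11); the function names `encode` and `decode` are made up for this sketch, and it leaves out error correction and the other safeguards a practical scheme would need.

```python
# Illustrative 2-bits-per-nucleotide mapping from the example above.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a nucleotide string, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Convert a nucleotide string back to the original bytes."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    sequence = encode(b"Hi")
    print(sequence)          # CAGACGGC
    print(decode(sequence))  # b'Hi'
```

Here the two bytes of "Hi" become the eight nucleotides CAGACGGC, and decoding that sequence returns the original bytes.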
Challenges and considerations when using DNA for data storage
- Cost: Synthesizing and sequencing DNA can be expensive, although costs are decreasing with continuing advancements in these technologies.
- Speed: Current DNA synthesis and sequencing processes are slower compared to traditional digital storage methods.
- Error rates: DNA synthesis and sequencing can introduce errors. Error correction techniques are essential to ensure data integrity.
- Ethical and regulatory considerations: There may be ethical and regulatory concerns regarding the synthesis and storage of DNA containing digital information.
- Standardization: Developing standards for encoding, decoding, and storing DNA data is necessary for widespread adoption.
Despite these challenges, ongoing research and advancements in biotechnology are making DNA data storage increasingly feasible and practical for certain applications, particularly for long-term archival storage.
IDT’s role in using DNA for data storage
For more than 35 years, IDT has been a leader in supplying custom DNA oligos for research applications, and IDT DNA oligos have successfully been used in groundbreaking data storage studies.
Data were first stored on DNA using IDT’s gBlocks™ Gene Fragments to demonstrate random access and information rewriting on a DNA-based storage system. Today, DNA storage at IDT is led by oPools™ Oligo Pools, which offer high fidelity, uniformity, low error rates, and low dropout rates.
How does it work? As noted, researchers translate the 1s and 0s of data into As, Ts, Cs, and Gs, then synthesize this code onto a molecule. To retrieve the data, PCR hunts for the targeted section of the sequence, which is then replicated, sequenced, decoded, and adjusted for errors. Because the process is error-prone, redundancy is used to ensure the correct data are read, a step earlier methods did not use.
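One simple way to picture how redundancy helps is a majority vote across several reads of the same sequence. The sketch below is an illustration only, not the method used by IDT or in any particular study: it assumes the reads are already aligned and of equal length, and it handles only substitution errors. The function name `consensus` is made up for this example.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common nucleotide at each position across the reads.

    Assumes all reads are aligned and the same length, so it corrects
    substitutions only, not insertions or deletions.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one tuple of nucleotides per position
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


if __name__ == "__main__":
    # Three reads of the same strand; the second has one substitution error.
    reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGC"]
    print(consensus(reads))  # CAGACGGC
```

With three reads, a single misread nucleotide is outvoted by the two correct copies.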
While the cost is high, DNA storage could prove to be perfect for anyone who needs to store data for long periods of time and access them infrequently. Because DNA is so stable, it is possible that, with some optimization, data could be stored in it practically forever. The fact that IDT makes enough material to be reused many times may also drive down costs.