DNA as Information Storage- Integrating Biology With Technology

Information Technology and Biology have always been two very distant fields having little in common.

But the world is changing, and with it come new possibilities. Read this article to learn how DNA could be used for information storage, merging the boundaries of biology and technology.

The world is continuously evolving, and with this development the rate at which we generate, consume, and store data has grown rapidly.

There was a time when even downloading an image could take significant time; now we can stream live HD video with ease. As the volume of data grows, so does the need to store it.

To meet this demand, extensive research is under way in various fields to build better data storage devices. One such novel idea is using DNA to store digital data.

Origin of DNA Memory Storage

The idea of storing digital data in DNA may seem fresh and new, but in reality it has been in the works for decades. The theoretical concept was first proposed in the 1960s as a thought experiment.

Due to the technological constraints of the time, the idea could not be fully explored. However, the recent advancements in the field have made researchers revisit the idea.

With the development of CRISPR technology, editing genes has become easier than ever before. Advances in nanofluidics, coupled with the development of nanopore sensors and optofluidic devices, have enabled real-time monitoring and analysis of DNA molecules during data storage and retrieval, bringing us even closer to successfully creating DNA drives.

The Potential of DNA as Information Storage

Before we invest time and money in anything, we must first evaluate its potential and decide whether it is worth the investment.

Instead of abstract ideas and theories, let us look at some statistics and do some calculations to better understand the potential of DNA for data storage.

Statista, in its worldwide survey of data generation, forecasts that the amount of data produced in 2025 will be 181 zettabytes (ZB).

A huge amount of this data gets lost, creating a hassle for users. For more in-depth insight into data loss, read the data loss report 2023.

Let us calculate the amount of DNA that will be required to store this data.

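DNA has a double helix structure in which A bonds with T and G bonds with C. As a simplifying assumption, let us say that each base pair of the DNA encodes 1 bit of data.

The average molar mass of a DNA base pair is about 660 g/mol, and one mole contains 6.022 × 10^23 base pairs (Avogadro's number). The average number of base pairs in 1 gram of DNA is therefore

(1 g ÷ 660 g/mol) × 6.022 × 10^23 per mol ≈ 9.12 × 10^20 base pairs

At 1 bit per base pair, that is 9.12 × 10^20 bits, or about 1.14 × 10^20 bytes. Since 1 ZB is 10^21 bytes, 1 g of DNA can store about 0.114 ZB of data. So to store 181 ZB of data, about 1587.71 g of DNA is necessary. Taking the density of dried DNA as 1.7 g/cm³, we get a required volume of approximately 934 cm³.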

This is smaller than even a basketball. Imagine: all the data produced across the globe during an entire year could be stored inside a drive the size of a basketball. This is the potential of DNA for data storage.

Even saying that a DNA data storage drive would make the most advanced present-day storage drives seem like floppy disks is an understatement.
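
The estimate above can be reproduced with a short Python sketch. It uses only the figures quoted in this article (660 g/mol per base pair, 1 bit per base pair, 1.7 g/cm³, 181 ZB) plus Avogadro's number. The function names are made up for illustration, and the 1-bit-per-base-pair value is the article's simplifying assumption rather than a physical limit.

```python
AVOGADRO = 6.022e23      # base pairs per mole
BP_MOLAR_MASS = 660.0    # g/mol, average for one base pair (figure from the article)
BITS_PER_BP = 1          # simplifying assumption used in the article
DNA_DENSITY = 1.7        # g/cm^3, dried DNA (figure from the article)
ZETTABYTE = 1e21         # bytes in one zettabyte


def zb_per_gram(bits_per_bp=BITS_PER_BP):
    """Zettabytes that one gram of DNA could hold under the assumptions above."""
    base_pairs = AVOGADRO / BP_MOLAR_MASS
    return base_pairs * bits_per_bp / 8 / ZETTABYTE


def dna_needed(data_zb, bits_per_bp=BITS_PER_BP):
    """Return (mass in grams, volume in cm^3) needed to hold data_zb zettabytes."""
    grams = data_zb / zb_per_gram(bits_per_bp)
    return grams, grams / DNA_DENSITY


grams, cm3 = dna_needed(181)
print(f"{zb_per_gram():.3f} ZB per gram")
print(f"{grams:.0f} g of DNA, about {cm3:.0f} cm^3")
```

Because the script does not round 0.114 ZB before dividing, its mass comes out a fraction of a gram lower than the hand calculation; the volume still rounds to roughly 934 cm³.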

How does DNA Memory Storage work?

The principle behind DNA memory storage is quite simple. There are four possible nucleotides in DNA- Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). Sequences of these nucleotides are what hold the data. Broadly, five steps are involved in DNA memory storage.

  • Encode- First the data gets encoded. All digital data is essentially binary code, and we can convert this binary code to a nucleotide sequence. This step is called encoding the data. There are various methods to encode binary code as a nucleotide sequence.
  • Synthesis- The next step is synthesis. This involves writing the sequence into an actual DNA molecule.
  • Storing- After synthesis comes storage. To store DNA, it is either dried or encapsulated within a protective material. DNA is by nature very stable and, if proper conditions are provided, will remain unaffected for centuries, if not millennia.
  • Retrieving- When the data needs to be accessed, the DNA molecule is retrieved and sequenced. The stored strands are typically amplified with PCR and then read with a sequencing method such as Next-Generation Sequencing (NGS).
  • Decoding- After the DNA gets sequenced, we get the nucleotide sequence. Decoding is done using the same scheme that was used to encode the data, and we can then access the data.
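
The encode and decode steps can be sketched in a few lines of Python. This is a minimal illustration, not a production scheme: the mapping of two bits to one nucleotide (00→A, 01→C, 10→G, 11→T) is an assumption chosen for simplicity, and the function names are hypothetical. Practical encodings also add error correction and avoid long runs of the same base, which this sketch does not attempt.

```python
# Illustrative 2-bits-per-nucleotide mapping (an assumption, not a standard).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert a nucleotide string produced by encode() back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)
print(strand)                      # CACACATGCAAC
print(decode(strand) == message)   # True
```

Each byte becomes four nucleotides here. The synthesis, storage, and sequencing steps happen in the lab between `encode` and `decode`, so they have no counterpart in the code.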

Why Choose DNA?

The demand for data storage increases every day and is not going to stop anytime soon, so a lot of research is going on in different sectors to develop better alternative technology for data storage. But with such a vast variety of biomolecules and materials available, why focus on DNA?

  • Long-Term Stability- DNA molecules are innately very stable and, under suitable conditions, will remain unchanged for centuries, making them excellent for archival purposes. Other technologies, such as 5D Optical Data Storage, also have high potential for archiving, but unlike them, DNA is not limited to archiving.
  • High Information Density- DNA has a very high information density, making it an excellent candidate for high storage capacity devices.
  • Environment Friendliness- A great deal of focus is now on eco-friendliness, with companies moving towards green data technologies and green data centers. DNA is very environment-friendly and can be a huge step forward in reducing the carbon footprint of data storage.
  • Security- DNA sequences are not easy to edit, and the molecule can be physically protected from unauthorized access. It has the potential for better data security against ransomware, because you can physically remove the DNA, sequence it, and decode it to retrieve your data.

All of these advantages make DNA an excellent candidate to explore for data storage.

Challenges Stopping Us from Implementing this Novel Tech

As great as its potential is, there are some challenges stopping us from employing this technology. The major challenges include:

  • Cost- No matter how useful a technology is, it is only feasible if its cost is reasonable. The cost of editing and sequencing DNA has fallen a lot compared to the past, but it is still high enough that building this technology is not feasible at present.
  • Error- Even though CRISPR-Cas9 has made editing genes much easier than ever before, it is still at risk of errors. This is another significant obstacle that needs to be tackled.
  • Speed- Although modern DNA sequencing technologies are very fast, they are still slow compared to the speeds of conventional data storage devices. The speed of encoding, editing, and decoding DNA molecules has to increase to bring DNA drives to fruition.
  • Advanced Software Requirements- To make this technology scalable, special algorithms, software, and hardware that can quickly encode, edit, and decode DNA sequences would be required.
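
As one small example of the kind of algorithm the error and software points call for, the sketch below rebuilds a strand from several noisy reads by taking a majority vote at each position. It is an illustration under strong assumptions, not how any particular system works: the `consensus` function and the sample reads are hypothetical, all reads are assumed to have the same length, and only substitution errors are handled (insertions and deletions are not).

```python
from collections import Counter


def consensus(reads):
    """Majority vote per position across equal-length reads of one strand."""
    if not reads or len({len(read) for read in reads}) != 1:
        raise ValueError("need at least one read, all of the same length")
    # zip(*reads) yields one tuple of bases per position in the strand.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Four hypothetical reads of the same strand; two contain a single wrong base.
reads = [
    "CACACATGCAAC",
    "CACACTTGCAAC",
    "CACACATGCAAC",
    "CAGACATGCAAC",
]
print(consensus(reads))  # CACACATGCAAC
```

Reading every strand several times costs extra sequencing time and money, which is part of why cost, speed, and error handling are so closely linked.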

Taking all these challenges into account, we can estimate that a lot of development is still needed before we see DNA storage devices.

Final Verdict

Even with many challenges in the way of fully developing this technology, its potential is so enticing that data storage companies are still investing in it.

Although you may not see it anytime soon, a fully developed DNA data storage drive has the potential to revolutionize the industry forever.

About the Author: Hamid Imtiaz

Hamid Imtiaz is the Chief Technology Officer (CTO) at Remo Software, where he leads the charge in developing cutting-edge solutions in the fields of data recovery, data security, and data forensics. With a strong engineering background, Hamid has a keen eye for detail and a meticulous approach to his work. His expertise spans a wide…