The Impact of Big Data in Drug Discovery


In recent years, drug discovery has undergone a remarkable evolution. We’ve moved from laborious trial and error with minimal data to an exciting new era in which we combine vast amounts of information with advanced technology to understand diseases and discover potential cures. At the heart of this transformation is Big Data, a concept that’s more than just massive datasets – it’s changing how we develop life-saving medicines.

What is Big Data in Drug Discovery?

Let’s dive into how we can use these massive datasets to revolutionize the way we discover new drugs. In 2023, an estimated 328.77 million terabytes of data were created every single day from multiple sources, far more than anyone could examine by hand. So, we need specialized technology to help us make sense of all this data, spot patterns, and make smart decisions.

The same concept applies when we use big data in drug discovery. Scientists gather large and complex sets of chemical and biological information and use computer models to find new medicines faster and at lower cost. This lets them identify potential new drugs more efficiently, save money during development, and increase the chances of finding new treatments for different diseases. It’s all about making the most of the available data to improve healthcare and save lives.

How is Big Data being used in Drug Discovery?

  1. Scientists collect large amounts of data from various sources, including genetic data, patient and clinical records, chemical and biological information, and other scientific data related to drugs.
  2. Scientists use advanced algorithms to uncover patterns in the data, such as what causes a disease or how a new compound might work.
  3. Machine learning helps predict how well potential drugs will work and how they’ll behave in the body.
  4. High-performance computing quickly screens thousands of compounds to find potential drug candidates.
  5. Promising candidates found through data analysis are tested in computer simulations. If those tests succeed, the candidates proceed to clinical trials to confirm that they are safe and effective.
  6. Drugs that pass these trials can be submitted for approval to agencies like the FDA.
  7. Even after approval, big data is still useful for monitoring how drugs perform in the real world, so they can keep being improved.
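
To make the screening step more concrete, here is a minimal Python sketch of one common idea, similarity-based virtual screening: compounds in a library are ranked by how closely their structural "fingerprint" matches a compound already known to be active. Everything in it is an illustrative assumption rather than a real pipeline: the compound names, the fingerprints (plain sets of integers standing in for structural features), and the 0.5 threshold are all made up.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(library: dict, reference: frozenset, threshold: float = 0.5) -> list:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, tanimoto(bits, reference)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: each integer stands for one structural feature.
known_active = frozenset({1, 4, 7, 9, 12})
compound_library = {
    "compound_A": frozenset({1, 4, 7, 9, 13}),    # shares 4 features with the reference
    "compound_B": frozenset({2, 3, 5}),           # shares none
    "compound_C": frozenset({1, 4, 7, 9, 12}),    # identical to the reference
    "compound_D": frozenset({1, 4, 20, 21, 22}),  # shares 2 features
}

hits = screen(compound_library, known_active)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

In practice, fingerprints would be computed from real chemical structures with a cheminformatics toolkit, and the library would hold thousands or millions of compounds, which is where high-performance computing comes in.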

A Notable Example: The Cancer Genome Atlas (TCGA) Project

Back in 2006, The Cancer Genome Atlas (TCGA) project kicked off as a partnership between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). TCGA is a comprehensive research initiative that has compiled and analyzed genetic data from many cancer types. It aims to understand the genetic changes that occur in cancer cells in order to advance our knowledge of cancer biology, improve diagnosis, and develop targeted treatments. Since its launch, it has significantly advanced the understanding of cancer and paved the way for more targeted and effective treatments.

This extensive genomic and clinical data has revolutionized cancer research, enabling scientists like Dr. Matthew L. Meyerson, renowned for his work in lung cancer genomics and the identification of genetic changes in various cancer types, to delve into the intricacies of cancer genetics. Dr. Meyerson’s work, deeply intertwined with TCGA’s contributions, exemplifies the real-world application of this wealth of information. By leveraging TCGA’s data, he has made significant strides in understanding and targeting cancer more effectively, ultimately contributing to the development of personalized cancer therapies. TCGA’s open-access data has democratized knowledge, propelling advances in the field and offering hope to cancer patients worldwide.

Comprehensive access to TCGA datasets, such as gene expression, copy number variation, and clinical information, is available via the NCI Genomic Data Commons (GDC) Data Portal, which replaced the original TCGA Data Portal.
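
For programmatic access, TCGA data can also be queried through the public REST API of the NCI Genomic Data Commons (GDC). The Python sketch below only builds a query URL for cases in one TCGA project and does not send any request. The helper name `build_cases_url`, the choice of the lung adenocarcinoma project `TCGA-LUAD`, and the selected fields are illustrative assumptions. Check the GDC API documentation for the current list of available fields.

```python
import json
from urllib.parse import urlencode

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_cases_url(project_id: str, fields: list, size: int = 10) -> str:
    """Build a GDC /cases query URL restricted to a single project."""
    # GDC filters are JSON objects made of an operator and its content.
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),  # comma-separated list of fields to return
        "format": "JSON",
        "size": str(size),           # maximum number of cases in the response
    }
    return f"{GDC_CASES_ENDPOINT}?{urlencode(params)}"


# Hypothetical example: a handful of lung adenocarcinoma cases.
url = build_cases_url("TCGA-LUAD", ["submitter_id", "primary_site"])
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Some TCGA data is controlled-access and requires authorization that this sketch does not cover.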


This move toward using big data to discover drugs promises quicker and more effective solutions for various diseases in the future. Many techniques, such as retrosynthesis, use big data to make drug discovery more efficient, and that’s definitely a topic worth digging into.