Data – The fuel that powers the engine of genomics
Personalization is now more of a customer expectation than a luxury when it comes to products and services, and healthcare is no exception: genomic data is increasingly central to the creation of tailor-made drugs. The power of genomics lies in getting the right data to the right people so they can make the right decisions! To appreciate the role of data in genomics, it is essential first to recognize its varied applications, from diagnostics to drug discovery, although the underlying goal remains the same: to understand the biology of a disease.
What is data?
Genomics is all about identifying the order of the letters that make up an organism’s DNA and translating it into meaningful information. Each human genome has 20,000 to 25,000 genes and is made up of about 3 billion base pairs. The Human Genome Project, which began in 1990, took 13 years to sequence the entire human genome and gave the world a detailed source of information on the structure, organization and function of the complete set of human genes. It has helped us understand how genes work together to direct the growth and development of an organism.
While whole genome sequencing became popular after that and was widely adopted in research, whole exome sequencing rose to prominence in clinical diagnostics. The exome is the collection of all the exons (pieces of DNA that provide instructions for making proteins) in a genome. Exons are thought to make up only about 1% of a person’s genome. Because most known disease-causing mutations occur in exons, this method can efficiently identify variations in the protein-coding regions of genes.
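To put the 1% figure in perspective, the short calculation below estimates how much sequence an exome covers. It assumes a haploid human genome of roughly 3.1 billion base pairs and takes the 1% fraction from the text; both are round approximations, and the function name `exome_size` is purely illustrative.

```python
# Rough size comparison of exome vs. whole genome (illustrative round numbers).
GENOME_BASE_PAIRS = 3_100_000_000  # approx. haploid human genome size (assumption)
EXOME_FRACTION = 0.01              # exons make up roughly 1% of the genome (from the text)


def exome_size(genome_bp: int, fraction: float) -> int:
    """Return the approximate number of base pairs covered by the exome."""
    return round(genome_bp * fraction)


exome_bp = exome_size(GENOME_BASE_PAIRS, EXOME_FRACTION)
print(f"Exome: ~{exome_bp / 1e6:.0f} million bp "
      f"out of ~{GENOME_BASE_PAIRS / 1e9:.1f} billion bp")
```

Sequencing only a few tens of millions of bases per sample is why exome sequencing is cheaper and produces far less data than whole genome sequencing, at the cost of missing variants outside the coding regions.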
The data generated by sequencing techniques includes large amounts of information that is potentially important to an individual’s current and future health and may also have implications for family members. The same data can also be reused for additional clinical, health, research, or recreational purposes. However, decoding the genome sequence is only a starting point!
Human biological systems are far more complex than a genome sequence alone, and understanding the interactions between molecules will require a more integrated approach across multiple levels such as the genome, epigenome, transcriptome, proteome and metabolome. Multi-omics data integration captures information about biomolecules from these different layers, sequentially or simultaneously, and can bridge the gap between an individual’s genetic data and their observable traits. Various studies have shown that integrating multi-omics data can help identify multiple pathways and processes that cause disease; on this basis, patients can be stratified for informed, targeted therapeutic disease management.
The vastness of the data
Compute demands arise throughout the lifecycle of an omics data set, from acquisition and storage to distribution, analysis and sharing. Sequencing a single human genome generates nearly 200 gigabytes of raw data, and downstream analysis adds roughly another 100 gigabytes per genome. Sequencing many human genomes alone would account for hundreds of petabytes of data, and the data created by analyzing molecular interactions multiplies that figure. By one estimate, we will need nearly 40 exabytes by 2025 just to store the genomic sequence data generated worldwide. By comparison, 5 exabytes could store all the words ever spoken by human beings!
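The per-genome figures can be turned into a back-of-the-envelope estimate. The sketch below uses the numbers quoted in the text (about 200 GB of raw sequence data plus about 100 GB from downstream analysis) and assumes decimal units (1 PB = 1,000,000 GB); the cohort sizes and the helper name `cohort_storage_pb` are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope storage estimate for a sequencing cohort.
RAW_GB_PER_GENOME = 200         # raw sequence data per genome (figure from the text)
DOWNSTREAM_GB_PER_GENOME = 100  # additional analysis output per genome (figure from the text)
GB_PER_PB = 1_000_000           # decimal units assumed: 1 PB = 10^6 GB


def cohort_storage_pb(num_genomes: int) -> float:
    """Approximate total storage, in petabytes, for a cohort of genomes."""
    per_genome_gb = RAW_GB_PER_GENOME + DOWNSTREAM_GB_PER_GENOME
    return num_genomes * per_genome_gb / GB_PER_PB


# Hypothetical cohort sizes, for illustration only.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} genomes -> {cohort_storage_pb(n):,.1f} PB")
```

At a million genomes the total already reaches the hundreds of petabytes mentioned above, before any epigenomic, transcriptomic or proteomic data is added.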
Given the huge computing power required, relying on in-house high-performance computers for data analysis is economically unfeasible for most companies and research institutes. Large servers require considerable capital and significant maintenance overhead, with constant upgrades to maintain performance. Differences in hardware configurations and software versions between systems also make analyses harder to reproduce.
When it comes to human data, complexity matters as much as volume. Before sequencing even begins, gaining access to this data and using it while protecting patient confidentiality is the most critical step. Building biorepositories or shareable databases is not an easy task when different countries have different legal requirements for patient data.
The long way to go!
Genomics has advanced our understanding of many diseases and contributed to the development of new drugs. However, before embarking on the journey of converting raw data into actionable clinical insights, we need to answer some critical questions: who can access the data, and how do we ensure that it does not fall into the wrong hands? Beyond the obvious technical difficulties, the major challenge is overcoming the underlying ethical, legal and privacy concerns. This has prompted various funding agencies, research institutes and private research consortia to develop their own bespoke databases, which raises another big question: how can this data be made easily accessible so that future work can build on past efforts?
With the advent of AI and cloud computing tools in healthcare, it will become easier to move from data to actionable insights. Cloud computing offers flexibility through a pay-as-you-go model for renting computing power and storage. Some companies now also offer cloud computing infrastructure, analytical workflows, algorithms, and data science tools that are fast, scalable, and secure.
However, when dealing with sensitive data sets, the solution may still seem elusive to many.
Implementing data-driven analytical tools to identify genetic biomarkers of disease will bring more targeted drugs to market. AI and machine learning will not only speed up analysis but also help identify disease patterns, stratify patients, make clinical predictions, and examine the impact of drugs on disease progression.
We are still at the tip of the iceberg! But, interestingly, we are also not far from the day when DNA itself could be used as a digital storage tool. Scientists are already trying to solve the lack of storage space for sequence data by using living organisms as hard drives!
The opinions expressed above are those of the author.