Google AI tool identifies genetic drivers of cancer

Google introduced DeepSomatic, an open-source AI tool using convolutional neural networks to improve the accuracy of cancer-related mutation identification in tumor genomes. Trained with the high-quality CASTLE dataset, DeepSomatic analyzes genetic variants, distinguishing clinically relevant mutations from noise. It outperforms existing methods, particularly in identifying insertions and deletions (Indels), even in challenging samples. It can also operate in “tumor-only” mode and generalizes well to new cancer types, with the goal of enabling more precise and personalized cancer treatments.

Google has unveiled DeepSomatic, an artificial intelligence tool designed to identify cancer-related mutations in tumor genetic sequences more accurately. The tool, detailed in a publication in *Nature Biotechnology*, aims to address a critical challenge in oncology: pinpointing the genetic drivers behind a tumor’s growth to facilitate the development of targeted therapies.

Cancer, at its core, is a disease of uncontrolled cell division triggered by malfunctions in the regulatory mechanisms governing cellular processes. Identifying the specific somatic (acquired) genetic mutations fueling a tumor is paramount for devising effective, personalized treatment strategies. The current standard of care often involves sequencing tumor cell genomes obtained from biopsies to inform treatment decisions targeting the unique vulnerabilities of a specific cancer’s growth and propagation.

DeepSomatic uses convolutional neural networks to analyze genetic variants in tumor cells, reportedly achieving higher accuracy than existing methods. Significantly, Google has made both the DeepSomatic tool and the high-quality training dataset (CASTLE) used in its development open source, potentially accelerating research and clinical applications across oncology. This move aligns with a growing trend in AI-driven healthcare, where transparency and collaboration are seen as essential for fostering innovation and trust.

The Somatic Variant Challenge: Noise vs. Signal

Cancer genetics presents a complex and nuanced picture. While genome sequencing can reveal a multitude of genetic variations within cancer cells, distinguishing genuine, clinically relevant mutations from sequencing artifacts and background noise poses a significant challenge. This is precisely where AI tools like DeepSomatic can offer substantial value. The vast majority of cancers are driven by somatic variants, which are acquired throughout a person’s lifetime due to environmental factors or errors during DNA replication, rather than inherited germline variants.

These somatic mutations, occurring post-birth, can be triggered by factors such as UV radiation damage or random errors arising during DNA replication. When these mutations disrupt normal cellular function, they can lead to the uncontrolled proliferation characteristic of cancer development and progression.

The difficulty in identifying somatic variants arises from their often low frequency within tumor cell populations, sometimes even lower than the inherent error rates of DNA sequencing technologies. Detection methods must therefore be sensitive enough to find true mutations and accurate enough to distinguish them from spurious signals.
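To make the signal-versus-noise problem concrete, the sketch below estimates how often sequencing errors alone could produce a given number of reads supporting a variant. It assumes a simple binomial model in which each read independently shows the wrong base with a fixed probability; the coverage depth and the 1% error rate are hypothetical numbers chosen for illustration, and this is not how DeepSomatic itself scores variants.

```python
from math import comb


def prob_alt_reads_from_error(depth: int, min_alt_reads: int, error_rate: float) -> float:
    """Probability that errors alone yield at least `min_alt_reads` variant-supporting reads.

    Assumes each of the `depth` reads is independently wrong with probability
    `error_rate` (a simplifying assumption, not a property of any real platform).
    """
    # Probability of seeing fewer than `min_alt_reads` erroneous reads.
    p_fewer = sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical example: 100 reads cover the site, 1% per-read error rate.
p_three = prob_alt_reads_from_error(depth=100, min_alt_reads=3, error_rate=0.01)
p_ten = prob_alt_reads_from_error(depth=100, min_alt_reads=10, error_rate=0.01)

print(f"P(>= 3 supporting reads from error alone):  {p_three:.4f}")
print(f"P(>= 10 supporting reads from error alone): {p_ten:.2e}")
```

Under these assumptions, a variant supported by only a few reads is hard to separate from noise, while one supported by many reads is very unlikely to be an artifact. Low-frequency somatic variants tend to fall in the first category.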

A Deep Dive into DeepSomatic’s Architecture

In clinical practice, a typical workflow involves sequencing both tumor cells from a biopsy specimen and normal, healthy cells from the same patient. DeepSomatic operates by identifying the differences between these two sets of sequencing data, specifically zeroing in on variations present in the tumor cells but absent in the inherited germline information. These tumor-specific variations provide crucial insights into the molecular mechanisms driving cancer progression.
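The tumor-versus-normal comparison can be pictured as a set difference. The sketch below is only an illustration of that idea on already-called variants: the `Variant` type, the function name, and the coordinates are all hypothetical, and the article describes DeepSomatic as making this comparison inside a neural network on read-level data rather than by subtracting two variant lists.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position on the chromosome
    ref: str  # reference allele
    alt: str  # observed alternate allele


def somatic_candidates(tumor: set[Variant], normal: set[Variant]) -> list[Variant]:
    """Return variants seen in the tumor sample but not in the matched normal."""
    return sorted(tumor - normal)


# Made-up calls for illustration only.
tumor_calls = {
    Variant("chr1", 1000, "A", "G"),  # also in normal, so likely germline
    Variant("chr2", 500, "C", "T"),
    Variant("chr3", 250, "G", "GA"),  # a one-base insertion
}
normal_calls = {Variant("chr1", 1000, "A", "G")}

for variant in somatic_candidates(tumor_calls, normal_calls):
    print(variant)
```

The shared variant on `chr1` is dropped because it appears in the normal sample, leaving the two tumor-specific candidates.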

The DeepSomatic AI model processes raw genetic sequencing data from both tumor and normal samples, transforming it into visual representations, or “images,” encompassing pertinent data points. These include the raw sequencing reads, alignment information depicting their positions along the chromosome, and other relevant metadata. These images are then fed into a convolutional neural network, which is trained to discern between the standard reference genome, the individual’s inherited genetic variants, and the cancer-causing somatic mutations, while simultaneously filtering out sequencing errors. The final output is a curated list of identified cancer-related mutations.
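A rough sketch of what turning aligned reads into an image-like grid might look like is shown below. The article does not specify the actual encoding, so everything here is an assumption: the base codes, the single-channel layout, and the `encode_pileup` function are hypothetical, and the sketch ignores insertions, deletions, base quality, and other metadata that a real encoder would need.

```python
# Hypothetical numeric codes for bases; 0 means "no read covers this position".
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode_pileup(reads, window_start, window_len):
    """Build a grid with one row per read and one column per reference position.

    `reads` is a list of (start_position, sequence) pairs. Gapless alignment is
    assumed, which is a simplification made only for this sketch.
    """
    grid = []
    for read_start, sequence in reads:
        row = [0] * window_len
        for offset, base in enumerate(sequence):
            col = read_start + offset - window_start
            if 0 <= col < window_len:  # keep only bases inside the window
                row[col] = BASE_CODES.get(base, 0)
        grid.append(row)
    return grid


# Two made-up reads overlapping a 5-position window that starts at position 101.
tumor_reads = [(100, "ACGT"), (102, "GTTA")]
image = encode_pileup(tumor_reads, window_start=101, window_len=5)

for row in image:
    print(row)
```

A grid like this, stacked with further channels for the normal sample and read metadata, is the kind of input a convolutional network can classify position by position.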

A notable feature of DeepSomatic is its ability to operate in “tumor-only” mode when normal cell samples are not available. This is particularly relevant in the context of hematological malignancies such as leukemia, where obtaining a suitable normal cell control sample can be challenging. This “tumor-only” capability broadens the tool’s applicability significantly across a diverse range of research and clinical scenarios.

Training Data: The Bedrock of AI Accuracy

The accuracy of any AI model hinges on the quality and comprehensiveness of the data used to train it. To this end, Google collaborated with the UC Santa Cruz Genomics Institute and the National Cancer Institute to create CASTLE, a benchmark dataset meticulously designed for this purpose. This dataset comprises sequencing data from tumor and normal cells derived from four breast cancer samples and two lung cancer samples.

These samples were rigorously analyzed using three leading sequencing platforms to generate a highly accurate reference dataset. This involved combining the outputs from each platform and meticulously removing platform-specific biases and errors. Interestingly, the data highlighted the substantial heterogeneity in mutational signatures even within the same cancer type, a finding that could prove instrumental in predicting patient responses to specific therapies.

In validation studies, DeepSomatic consistently outperformed other established methods across all three major sequencing platforms. The tool demonstrated particular proficiency in identifying complex mutations known as insertions and deletions, or “Indels.” For Indels, DeepSomatic achieved a 90% F1-score on Illumina sequencing data, compared to 80% for the next-best method. The improvement was even more pronounced on Pacific Biosciences data, where DeepSomatic achieved a score exceeding 80%, while the nearest competitor scored less than 50%. This improved accuracy in Indel detection is particularly significant as these types of mutations are often missed by traditional methods but can play a crucial role in cancer development.
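The F1-score cited here is the harmonic mean of precision (the fraction of reported variants that are real) and recall (the fraction of real variants that are reported). The sketch below computes it from counts of true positives, false positives, and false negatives; the counts are hypothetical and are not taken from the study.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall), written in terms of counts."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # no variants called and none expected
    return 2 * true_positives / denominator


# Hypothetical caller: finds 90 of 100 real indels and reports 10 false ones.
score = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(f"F1 = {score:.2f}")
```

Because F1 penalizes both missed variants and false calls, a caller cannot score well simply by reporting everything or by reporting almost nothing.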

Further tests were performed on challenging samples, including a breast cancer sample preserved as formalin-fixed paraffin-embedded (FFPE) tissue, a common preservation technique known to introduce DNA damage and complicate downstream analysis. Performance was also evaluated using data generated from whole exome sequencing (WES), a more cost-effective method that sequences only the approximately 1% of the genome that codes for proteins. DeepSomatic consistently outperformed other tools in these scenarios, suggesting its potential application in analyzing lower-quality or historical samples where DNA integrity may be compromised.

A Universal Tool for Cancer Genomics?

A compelling attribute of DeepSomatic is its demonstrated ability to generalize its learning and analyze new cancer types beyond those it was initially trained on. In an analysis of a glioblastoma sample, an aggressive form of brain cancer, DeepSomatic accurately identified the key variants known to drive the disease. Furthermore, in a collaborative project with Children’s Mercy in Kansas City, DeepSomatic analyzed eight pediatric leukemia samples and successfully identified previously known variants while uncovering ten novel mutations, despite the analysis being performed on tumor-only samples.

Google’s overarching vision is for research laboratories and clinical practitioners to widely adopt DeepSomatic to gain deeper insights into the unique characteristics of individual tumors. By facilitating the detection of known cancer variants, the tool could assist in selecting optimal existing treatments. Furthermore, by identifying novel variants, DeepSomatic could pave the way for the development of innovative therapies targeting previously uncharacterized molecular targets. The ultimate objective is to advance the field of precision medicine and deliver more effective, personalized treatments to cancer patients.

Original article, Author: Samuel Thompson. If you wish to reprint this article, please indicate the source: https://aicnbc.com/11091.html
