Using machine learning to predict chromatin interactions in DNA and to improve cancer research

by Nicholas ANG Siong Lim | May 6, 2022 | Biology, School of Biological Sciences, Women in Science

Have you ever wondered how Spotify knows what new tunes to recommend to you? Or how Netflix can suggest another television series for you to binge-watch? It is all down to a branch of artificial intelligence known as machine learning, where computer algorithms analyse data and improve themselves automatically through experience, making decisions without needing much human intervention. Machine learning has become ubiquitous in our everyday lives and is rapidly growing in importance in many fields, including medicine.

Chromatin Interactions and Current Machine Learning Frameworks

In science and medicine, machine learning is being harnessed to study how diseases develop as well as to test new drugs. It is also being used in genomics to better understand and predict chromatin interactions.

Chromatin refers to the complexes of DNA and proteins that are compacted to fit inside a cell’s nucleus. Chromatin interactions play an important role in how gene expression is regulated. During gene expression, DNA sequences known as enhancers are brought close to other DNA sequences, known as promoters, to aid in the transcription of an associated gene. The three-dimensional folding of the genome brings these sequences together, and the resulting contacts are called “chromatin interactions”.

Machine learning frameworks can be trained on known genome folding data to recognise these folding profiles, and can then predict folding profiles in new data. However, current frameworks leave much to be desired. Some, like Akita and DeepC, can currently make predictions only within a limited DNA sequence region, so chromatin interactions between distantly located genomic regions cannot be predicted. These frameworks have also not been tested for their ability to predict chromatin interactions in new patient cancer samples, and some have a high operational cost. There is therefore a need for a better predictor of chromatin interactions.
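To make the limitation concrete, here is a minimal sketch of why a fixed prediction window rules out distant interactions. The window size, the coordinates and the function name `within_window` are assumptions chosen for illustration; they are not the actual settings of Akita or DeepC.

```python
def within_window(anchor_a, anchor_b, window_bp):
    """Return True if both anchors sit on the same chromosome and the
    region spanning them is no longer than window_bp base pairs."""
    chrom_a, start_a, end_a = anchor_a
    chrom_b, start_b, end_b = anchor_b
    if chrom_a != chrom_b:
        return False
    span = max(end_a, end_b) - min(start_a, start_b)
    return span <= window_bp


# Assumed window size, for illustration only.
WINDOW_BP = 1_000_000

# Hypothetical enhancer and promoter coordinates: (chromosome, start, end).
enhancer = ("chr1", 1_200_000, 1_201_000)
near_promoter = ("chr1", 1_650_000, 1_651_000)
far_promoter = ("chr1", 4_800_000, 4_801_000)

near_ok = within_window(enhancer, near_promoter, WINDOW_BP)
far_ok = within_window(enhancer, far_promoter, WINDOW_BP)
print(near_ok, far_ok)
```

In this toy case the nearby pair fits inside the window and could be scored, while the distant pair falls outside it and would never be considered by a window-limited model.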

Chromatin Interaction Neural Network (ChINN)

A team of scientists led by Assistant Professor Melissa J. Fullwood, from NTU’s School of Biological Sciences, and Associate Professor Kwoh Chee Keong, from NTU’s School of Computer Science and Engineering, has developed a new machine learning method called the Chromatin Interaction Neural Network (ChINN). The first authors of this work, which was recently published in Genome Biology, are Dr Cao Fan, Dr Zhang Yu and Dr Yichao Cai. ChINN is a convolutional neural network, a form of artificial intelligence that uses deep learning algorithms and is best known for processing and analysing images. Deep learning is a type of machine learning that mimics the way humans learn. The network is inspired by the biological neural networks that make up a brain, and its algorithms are more complex than those used in traditional machine learning. This means that ChINN is a very powerful processing system that excels at recognising patterns, learning and improving itself. With this large amount of processing power, ChINN can identify open chromatin interactions in a genome-wide manner, providing greater sequence coverage than other frameworks.
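For readers curious about how a convolutional network can "read" DNA, the following is a minimal sketch of the basic building block: a filter that slides along a one-hot encoded sequence. It is not ChINN's actual architecture. The filter here is hand-made to respond to the motif "TATA", whereas a real network learns many filters from data, and the names `one_hot` and `conv1d_scan` are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) matrix.
    Unknown bases (for example N) are left as all-zero rows."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded


def conv1d_scan(encoded, kernel):
    """Slide a (width, 4) filter along the sequence and apply ReLU,
    giving one activation per window position."""
    width = kernel.shape[0]
    n_positions = encoded.shape[0] - width + 1
    scores = np.array(
        [np.sum(encoded[i:i + width] * kernel) for i in range(n_positions)]
    )
    return np.maximum(scores, 0.0)


# Hypothetical hand-made filter that responds most strongly to "TATA".
motif_filter = one_hot("TATA")

sequence = "GGCTATAAGC"
activations = conv1d_scan(one_hot(sequence), motif_filter)
best_position = int(np.argmax(activations))
print(activations, best_position)
```

The activation is highest where the window matches the motif exactly. Stacking many such learned filters, followed by further layers, is what lets a convolutional network pick up sequence patterns without being told in advance what to look for.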

Assistant Professor Melissa Fullwood and Associate Professor Kwoh Chee Keong

Patient Heterogeneity

The team was also able to demonstrate the predictive power of ChINN. They applied their framework to six chronic lymphocytic leukaemia (CLL) patient samples to see whether it could predict chromatin interactions in a new dataset and identify heterogeneity (differing chromatin interactions) among the samples. Genomic sequences can be nearly identical across different patient samples, except for regions of patient-specific cancer structural variations.

During testing, the team observed few shared chromatin interactions across the CLL samples, with many of the open chromatin interactions found in only a single sample. This showed that there was extensive patient heterogeneity. This heterogeneity was further demonstrated when the team ran ChINN on a larger cohort of 84 CLL samples. Their results showed that although some open chromatin interactions were present in all the samples, there was still a large number of patient-specific open chromatin interactions. The team noted that this widespread nature of patient-specific chromatin interactions had not been previously reported in the 3D genome organisation field.
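The comparison described above can be sketched as a simple count of how many samples each predicted interaction appears in. This is an illustrative sketch only, not the team's analysis pipeline: the patient labels and coordinates are made-up toy data, and the function names `recurrence` and `split_by_recurrence` are hypothetical.

```python
from collections import Counter


def recurrence(interactions_by_sample):
    """Count, for each interaction, how many samples it was predicted in."""
    counts = Counter()
    for interactions in interactions_by_sample.values():
        # set() ensures a sample counts at most once per interaction.
        counts.update(set(interactions))
    return counts


def split_by_recurrence(interactions_by_sample):
    """Return (interactions seen in every sample, interactions seen in one)."""
    counts = recurrence(interactions_by_sample)
    n_samples = len(interactions_by_sample)
    shared = {pair for pair, c in counts.items() if c == n_samples}
    specific = {pair for pair, c in counts.items() if c == 1}
    return shared, specific


# Toy data: each interaction is a pair of anchor regions (made up).
toy = {
    "patient_1": [("chr1:100-200", "chr1:900-1000"), ("chr2:50-150", "chr2:700-800")],
    "patient_2": [("chr1:100-200", "chr1:900-1000"), ("chr3:10-110", "chr3:400-500")],
    "patient_3": [("chr1:100-200", "chr1:900-1000"), ("chr5:30-130", "chr5:600-700")],
}

shared, specific = split_by_recurrence(toy)
print(len(shared), len(specific))
```

In this toy cohort one interaction is common to all three patients while three are each found in a single patient, mirroring in miniature the pattern of a small shared core alongside many patient-specific interactions.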

Future Studies and Applications

While ChINN needs further refinement to improve the accuracy of its chromatin interaction predictions, the framework requires only open chromatin data, and it generalises well on the same type of chromatin interactions across different cell types. This means that it has the potential to be used on large sets of clinical samples that have only limited amounts of biological material. This offers scientists a more cost-effective method to carry out large-scale genetic testing and chromatin interaction prediction.

The widespread nature of patient heterogeneity, as observed from the results of ChINN on CLL samples, also points to a possible new direction for future cancer research and treatments. By identifying specific chromatin interaction-based biomarkers, scientists may be able to differentiate between the various subtypes of cancers and develop more precise therapies, resulting in more effective cancer treatments.