Discovery consists of seeing what everybody has seen and thinking about what nobody else has thought. – Albert Szent-Györgyi
Jonathan Ng is a second-year Nanyang Research Scholar at the NTU School of Biological Sciences (SBS). His Ph.D. project involves developing machine learning models to identify the functions of plant genes, such as those involved in the biosynthesis of secondary metabolites. Having moved from molecular biology research to developing machine learning models that harness genomic information, Jonathan’s long-term career aspiration is to work in the Biotechnology Research & Development industry, developing AI solutions for personalized medicine applications.
What are your research interests and why?
My interest is in developing machine learning models that harness genomic information to predict and classify gene function. Currently, I am working on the genomic data of plants and aim to develop a method to predict the function of all plant genes, particularly secondary metabolism genes. I really enjoy data analytics and would like to apply it to investigate genomic phenomena.
This interest was triggered when you were working with Illumina. Could you share a little about what brought about this change?
During my time at Illumina, I conducted molecular biology research in the field of protein engineering. While I enjoyed it, I felt that dry lab work would suit me better, so I wanted to change fields. My time at Illumina also allowed me to appreciate the challenge of interpreting genomic information: sequencing genomes can be relatively easy, but interpreting them is a lot harder. This sparked my interest in bioinformatics, as I wanted to develop machine learning models to help us apply genomic information to solve biological problems.
Why did you choose to pursue your Ph.D. at NTU SBS, in Dr Marek’s lab?
Dr Marek, who had just joined NTU at the time, had an interesting machine learning project in mind, focusing on plant genomics. Since I was looking for a dry lab project to work on, I was keen to join him. In addition, having completed my first degree at NUS, I wanted to experience research life at NTU.
Could you share a little more about what your research encompasses and its desired outcomes?
Plant secondary metabolites play vital economic roles in society, as they are used as drugs, cosmetics, food flavourings and other important chemicals. Identifying the genes that produce such metabolites would therefore be useful for biotechnological purposes, as we could apply that knowledge to make more of these chemicals. However, despite an abundance of genomic information, it is still difficult for scientists to infer gene function. One way to address this is to use machine learning, trained on genomic information, to predict which genes are responsible for secondary metabolite synthesis. This would enable us to identify candidate secondary metabolite genes for experimental confirmation, which could then be used for biotechnological purposes.
What would you share with prospective students who might want to apply for the Nanyang Research Scholarship?
Think about your research interests and career goals, and speak to the different professors and students in SBS. This helps you get a feel for the culture of different labs and find out which lab suits you most. While selecting a lab based on your research interest is important, I was also advised to find a lab where I could get along with the professors and team, which I felt was valuable advice.
Jonathan was featured in the Author Spotlight in the January 2020 issue of Oxford Academic here.
About Dr Marek Mutwil’s lab
Dr Marek’s lab uses systems biology to elucidate the function of genes in the plant kingdom. Today, knowledge of gene function is mostly confined to Arabidopsis thaliana, the main model organism in plant research, which limits our understanding of the plant kingdom. To remedy this, his team is also characterising other plant species in terms of genomics, gene expression, metabolomics and protein-protein interaction networks. Based on this biological data, the team can then predict gene function with state-of-the-art network ensemble methods and other dry-lab techniques. This knowledge is important for understanding plants and tailoring them to our needs.
Find out more about their lab here.