Publications: List

A Review of Molecular Representation in the Age of Machine Learning

WIREs Computational Molecular Science, 2022

Research in chemistry increasingly requires interdisciplinary work prompted by, among other things, advances in computing, machine learning, and artificial intelligence. Everyone working with molecules, whether chemist or not, needs an understanding of the representation of molecules in a machine-readable format, as this is central to computational chemistry. Four classes of representations are introduced: string, connection table, feature-based, and computer-learned representations. Three of the most significant representations are SMILES, InChI, and the MDL molfile, of which SMILES was the first to be used successfully in conjunction with a variational autoencoder to yield a continuous representation of molecules. This is noteworthy because a continuous representation allows for efficient navigation of the immensely large chemical space of possible molecules. Since 2018, when the first model of this type was published, considerable effort has been put into developing novel and improved methodologies. Most, if not all, researchers in the community make their work easily accessible on GitHub, though discussion of computation time and domain of applicability is often overlooked. Herein we present questions for consideration in future work which we believe will make chemical variational autoencoders even more accessible.

Multi-task Bayesian Optimisation of Chemical Reactions

NeurIPS Machine Learning for Molecules, 2020

Recent work has shown how Bayesian optimization (BO) is an efficient method for optimizing expensive experiments such as chemical reactions. However, in previous studies, each optimization has been started from scratch with no information about previous or similar chemical optimization studies. Therefore, BO can still require more iterations than many experimental budgets provide. Here, we overcome this challenge using multi-task BO. Through in silico benchmarking studies, we show how past experimental data can be leveraged to improve the quality and speed of reaction optimization.

A Framework for Biogas Exploitation in Italian Waste Water Treatment Plants

Chemical Engineering Transactions, 2019

Effective utilisation of biogas is an important step in increasing usage of renewable energy, because biogas offers a flexibility that solar and wind power in particular lack. Biogas generated through anaerobic digestion (AD) of sewage sludge addresses environmental concerns while also creating electricity generation potential. There is currently no optimisation-based decision-support framework to determine the best use of biogas from a Waste Water Treatment Plant (WWTP) and provide a market outlook for each of the options. This work proposes a novel multi-period Mixed Integer Linear Program (MILP) model for dispatch and selection of technologies capable of exploiting biogas produced from sludge. The novelty is also highlighted by extrapolating the optimised results to a broader analysis of 855 Italian WWTPs with Population Equivalent (P.E.) > 20,000. The use of real input data provides a unique added value to the work. The modelling framework is applied to several case studies. Results show that 7–23% savings in operating costs are possible from integrating three systems to exploit biogas, and the trade-offs between capital and operating costs affect the optimal system choice. Furthermore, market-driven scenarios are used to analyse how to improve the economic performance.

Recent Presentations

June 2022

Cambridge Chemical Engineering and Biotechnology Conference

In silico prediction of PDB by a mechanistic DFT-aided algorithm
Explaining how a network of linear equations fitted to energy difference calculations performed using Density Functional Theory (DFT) can be used to predict protodeboronation (PDB).

April 2022

Guest Speaker, Amaro & McCammon Labs
UC-San Diego, US (virtual)

A practical guide to machine-readable molecular representation

In this talk I presented an overview of molecular representation, explored reading and writing SMILES and MDL molfiles, and finally took a deep dive into the ECFP algorithm.

March 2022

Lapkin Lab Machine Learning Subgroup
Cambridge, UK

Molecular Representation Workshop

In this workshop I presented an overview of molecular representation and set practice problems for understanding SMILES and DeepSMILES.

February 2022

St Edmund's Student Conference
Cambridge, UK

Teaching Chemistry to Computers

How are molecules represented to be understandable to computers? Based on my recently accepted review paper, I explained feature engineering for chemistry.

January 2022

Aspect Capital ML Reading Group
London, UK (virtual)

Multi-task Bayesian Optimisation of Chemical Reactions

Outlining the results from my conference paper on the same topic, which I presented at NeurIPS 2020.

May 2021

Lapkin Lab Machine Learning Subgroup
Cambridge, UK (virtual)

ChemDraw Workshop

You may be using ChemDraw in your research, but are you aware of all the shortcuts and hotkeys designed to speed up your work?

April 2021

Joint CDT Conference Data Driven Chemical Synthesis and Catalysis, UK (virtual)

Multi-task Bayesian Optimisation

This presentation was based on my NeurIPS conference paper.

December 2020

NeurIPS Machine Learning for Molecules

Multi-task Bayesian Optimisation of Chemical Reactions

Presented a poster based on the accepted manuscript, which is listed above.

Course List

PhD Machine Learning for Chemistry

3rd year

Cambridge Rising Stars (public engagement), Career Progression in Academia and Industry, Statistics for Chemical Engineers (CET IIA Statistics)

2nd year

EnterpriseTECH: A hands-on entry-level entrepreneurship programme. Our team developed a commercialisation strategy for a by-product formed when extracting fluids with medical applications from watercress.

1st year

Computational Parametrisation, Green Chemistry, Introduction to Computer Science and Programming Using Python, Introduction to Probabilistic Modelling, Machine Learning in Chemistry 101, Model Development and Model Based DoE, Philosophy for Chemists, Responsible Research and Innovation, Science Communication (in Science, Media, and Business), Statistics for Chemists, The Drug Discovery Process.

Course List

MEng Chemical Engineering

4th year

Research project: Optimisation of Biogas Exploitation in Waste Water Treatment Plants using GAMS. 
Design project: Technical design for continuous Lapatinib production plant in South Africa.
Advanced Bioprocess Engineering, Business Economics, Corporate Finance, Dynamic Behaviour of Process Systems, Modelling of Biological Systems, Transport Processes in Biological Systems.

3rd year

Research project: Predicting Protein Aggregation using Self-Interaction Chromatography
Coursework projects: Flowsheeting, Mechanical Design, Techno-economic Evaluation
Accounting, Biochemical Engineering, Environmental Engineering, Numerical Methods, Project Management, Reaction Engineering II, Safety and Loss Prevention, Strategy of Design, Transfer Processes III.

2nd year

Research project: Optimising performance of a propane-fed power plant
Coursework projects: Carbon Capture Pilot Plant, Reactor Design and Control
Biochemistry, Business for Engineers II, Development of Business and Economic Ideas 1800-2010, Industrial Chemistry, Mathematics II, Process Dynamics and Control, Reaction Engineering I, Separation Processes II, Thermodynamics II, Transfer Processes II.

1st year

Coursework projects: Foundational lab work
Business for Engineers I, Chemistry I, Introduction to Management, Introduction to MATLAB, Mathematics I, Process Analysis, Professional Skills for Employability, Properties of Matter, Separation Processes I, Transfer Processes I.
