IBM bets on AI and robotics to speed up drug discovery
Using AI for chemical synthesis and retrosynthesis
IBM’s RoboRXN is the culmination of three years of research and development in applying AI to chemical research. In 2017, the company developed an AI system for predicting chemical reactions in forward synthesis.
Hypothesizing about chemical reactions and experimenting with different chemical components is one of the most time-consuming parts of chemical research. It requires a lot of experience, and chemists usually specialize in specific fields, making it challenging for them to tackle new tasks.
IBM’s AI is a neural machine translation system tailored to chemical synthesis. Artificial neural networks have made great inroads in natural language processing in recent years. While neural networks do not understand the context of human language, their broader capabilities in processing sequential data can serve many fields, including chemical research.
For instance, recurrent neural networks (RNNs) and transformers can perform sequence-to-sequence mapping. Train an RNN on a set of input strings and their corresponding output strings, and it will find statistical correlations that map the inputs to outputs (you still need quality data, though). These strings can contain any kind of symbols, including letters, musical notes, or character representations of atoms and molecules. As long as there is consistency in the data and there are patterns to be learned, the neural network can find a way to map the inputs to the outputs.
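To make the idea of "character representations of atoms and molecules" concrete, here is a minimal sketch of how a reaction could be turned into input and output symbol sequences for a sequence-to-sequence model. It assumes molecules are written in SMILES notation, which the article does not name, and the token pattern and function names (`tokenize`, `make_pair`) are illustrative simplifications, not IBM's actual preprocessing.

```python
import re
from typing import List, Tuple

# Simplified token pattern (an assumption for illustration only):
# bracket atoms such as [Na+], two-letter halogens, two-digit ring
# closures such as %12, then any remaining single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into the symbols a seq2seq model would read."""
    return TOKEN_PATTERN.findall(smiles)


def make_pair(reactants: str, product: str) -> Tuple[List[str], List[str]]:
    """Build one (input sequence, output sequence) training example."""
    return tokenize(reactants), tokenize(product)


# Acetic acid + ethanol -> ethyl acetate, written as SMILES strings.
source_tokens, target_tokens = make_pair("CC(=O)O.OCC", "CC(=O)OCC")
print(source_tokens)
print(target_tokens)
```

A model trained on many such pairs never sees "chemistry" as such, only regularities in how input symbols map to output symbols.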
Trained on a dataset of more than 2 million chemical reactions, the neural network was first introduced in a paper presented by the IBM Research team at the NIPS 2017 AI conference. The next year, IBM developed the AI into RXN for Chemistry, a cloud-based platform for chemical research, and presented it at the American Chemical Society annual exposition. RXN for Chemistry aids chemists in predicting the likely outcome of chemical reactions, saving research time, and reducing the years it takes to acquire experience.
In 2019, the IBM Research team improved the AI behind RXN for Chemistry to also support retrosynthesis. This is the inverse process of chemical synthesis. In this case, you already know the molecular structure you want to achieve. The AI must predict the series of steps and chemical components needed to reach the desired result.
“The retrosynthesis planning models were developed in collaboration with retrosynthesis experts from the University of Pisa, who constantly gave us feedback how to improve our models,” Teodoro Laino, the manager of IBM Research Zurich, told TechTalks.
IBM RXN for Chemistry can also design retrosynthetic routes in an interactive mode.
In the interactive mode, the human chemist goes through the route step by step, getting suggestions from the AI at each stage. “Chemical synthesis becomes a human-AI interaction game,” Laino says.
Bringing the AI pieces together
Philippe Schwaller, predoctoral researcher at IBM Research Zurich, told TechTalks that the final AI system used in RoboRXN is composed of several sequence-to-sequence transformer models, each performing one part of the task.
“Given a target molecule, RoboRXN breaks it down in multiple recipe steps using predictions by a retro reaction prediction and a pathway scoring model until the system finds commercially available molecules,” Schwaller said. “Then, for each step in the recipe, the reaction equations are converted using another seq-2-seq transformer model to all necessary actions, which the robot has to perform, to successfully run the chemical reaction. This model predicts reaction conditions (e.g. temperature, duration) for the different actions (e.g. add, stir, filter).”
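The breakdown Schwaller describes can be sketched as a recursive search that stops once every branch ends in a purchasable molecule. This is only a toy illustration: the lookup table `TOY_RETRO_MODEL` stands in for the retro reaction prediction model, trying candidates in order of score is a crude substitute for the pathway scoring model, and all molecule names are hypothetical placeholders.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, List[str]]  # (product, precursors needed to make it)

# Hypothetical stand-in for a retro reaction prediction model:
# each molecule maps to candidate precursor sets with a confidence score.
TOY_RETRO_MODEL: Dict[str, List[Tuple[List[str], float]]] = {
    "target": [(["intermediate", "reagent_a"], 0.9), (["exotic_precursor"], 0.4)],
    "intermediate": [(["reagent_b", "reagent_c"], 0.8)],
}

# Hypothetical catalog of molecules that can simply be bought.
COMMERCIALLY_AVAILABLE: Set[str] = {"reagent_a", "reagent_b", "reagent_c"}


def plan_route(molecule: str, max_depth: int = 5) -> Optional[List[Step]]:
    """Return recipe steps in the order they must be run, or None if no route exists."""
    if molecule in COMMERCIALLY_AVAILABLE:
        return []  # nothing to synthesize
    if max_depth == 0:
        return None
    # Try the highest-scoring disconnection first (a greedy simplification).
    candidates = sorted(TOY_RETRO_MODEL.get(molecule, []), key=lambda c: c[1], reverse=True)
    for precursors, _score in candidates:
        steps: List[Step] = []
        for precursor in precursors:
            sub_route = plan_route(precursor, max_depth - 1)
            if sub_route is None:
                break  # this candidate has a dead end; try the next one
            steps.extend(sub_route)
        else:
            return steps + [(molecule, precursors)]
    return None


for product, precursors in plan_route("target") or []:
    print(f"make {product} from {', '.join(precursors)}")
```

In this sketch the intermediate is made first and the target last, mirroring a recipe that starts from commercially available molecules.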
In the process of creating the AI, the team published their findings in several peer-reviewed journals and made their AI models available on a GitHub repository. Their latest paper, published in Nature in July, explores the use of transformers to translate chemical experiments written in open prose into distinct steps. This is a key component in integrating the AI system with robo-labs, which expect distinct commands.
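One way to picture the "distinct commands" a robo-lab expects is as a list of structured actions, each carrying its own conditions. The sketch below uses the action names (add, stir, filter) and conditions (temperature, duration) mentioned by Schwaller, but the `Action` class, its fields, and the command text format are assumptions made for illustration, not the format RoboRXN uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    """One robot-executable step (hypothetical schema)."""
    name: str                              # e.g. "add", "stir", "filter"
    material: Optional[str] = None         # what to add, if anything
    temperature_c: Optional[float] = None  # reaction condition
    duration_min: Optional[float] = None   # reaction condition


def to_command(action: Action) -> str:
    """Render an action as a plain-text command line."""
    parts = [action.name.upper()]
    if action.material is not None:
        parts.append(action.material)
    if action.temperature_c is not None:
        parts.append(f"at {action.temperature_c:g} C")
    if action.duration_min is not None:
        parts.append(f"for {action.duration_min:g} min")
    return " ".join(parts)


# Placeholder procedure; the values are invented, not taken from a real experiment.
procedure: List[Action] = [
    Action("add", material="reagent_a"),
    Action("stir", temperature_c=25, duration_min=30),
    Action("filter"),
]
commands = [to_command(action) for action in procedure]
print("\n".join(commands))
```

Unlike a free-text lab note, every field here is explicit, which is what lets hardware execute the step without interpretation.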
“For a given target molecule, RoboRXN provides not only a recipe made of multiple chemical reactions that would lead from commercially available molecules to the target molecule, but is also able to generate for each step in the recipe, the specific actions that a robot or human has to perform to successfully run the reaction step,” Laino says.
To draw an analogy with cooking, if you ask the system how to cook pizza, one AI layer will predict the ingredients, and a second will predict the sequence of operations to go from the ingredients to the final dish.
“In all cases, the AI can choose between several predictions. We provide the ones with the highest confidence score, but a user can always override the recommendations and give human feedback,” Laino says. In Wednesday’s presentation, the team showed how a user could jump into the process by adding, removing, or modifying the steps predicted by the neural networks.
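The interplay between confidence scores and human feedback can be sketched in a few lines. The function name `choose_prediction`, the candidate format, and the example steps are hypothetical; the sketch only illustrates the general pattern of defaulting to the top-scoring prediction unless a user supplies their own.

```python
from typing import List, Optional, Tuple


def choose_prediction(
    candidates: List[Tuple[str, float]],
    user_choice: Optional[str] = None,
) -> str:
    """Return the user's override if given, else the highest-confidence candidate."""
    if user_choice is not None:
        return user_choice
    best, _score = max(candidates, key=lambda candidate: candidate[1])
    return best


# Invented candidates with confidence scores.
candidates = [("stir for 30 min", 0.72), ("stir for 2 h", 0.21)]
default_step = choose_prediction(candidates)
overridden_step = choose_prediction(candidates, user_choice="stir for 45 min")

# Adding, removing, or modifying predicted steps is then ordinary list editing.
steps = ["add reagent_a", default_step, "filter"]
steps[1] = overridden_step        # modify a step
steps.insert(2, "cool to 0 C")    # add a step
steps.remove("filter")            # remove a step
print(steps)
```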
Tackling explainability issues
A pure neural network–based approach comes with some benefits. The AI models scale well with the availability of data. And the system will benefit from all the research going into deep learning in general and transformers in particular.
But deep learning comes with interpretability challenges. Neural networks are very good at finding and exploiting correlations between different data points in their training corpus, but those correlations do not necessarily have causal value and can yield erroneous results. The scientists employing the system should be able to explore and correct the reasoning used by the AI system.
The fact that the system provides a step-by-step procedure of creating the target molecule provides a level of explainability, making it easier for scientists to review the entire process. But the IBM researchers acknowledged that providing more granular explanations of the individual steps is still a work in progress.
Schwaller told TechTalks that the team has investigated BERT and ALBERT, two other transformer-based neural network architectures, to improve the interpretability of the predictions, classify them into named reactions, and link the predicted reactions back to similar reactions in the patents. The researchers have published their findings in two separate papers on the ChemRxiv preprint server.
“Recently, we have also investigated why language models learn organic chemistry and chemical reactions so well and discovered that, without human labelling or supervision Transformer models capture how atoms rearrange during a chemical reaction,” Laino adds. “From this so-called atom-mapping signal we can extract the rules and grammar of chemical reactions and make our prediction models more interpretable.”
The team has developed a visualization tool for the RXN AI models and made it available online.
Integration with the robotics lab
The original idea for the fully automated chemistry lab came when IBM presented RXN for Chemistry at the American Chemical Society annual exposition in 2018. “It was surprising to see that irrespective of the flaws that every data-driven model has (including RXN) the reaction of the chemical community was overwhelming—we actually had a line of people at our booth to try out demo,” Laino said. “We saw the real potential in front of us. I asked myself: Can an AI model drive an autonomous chemical lab?”
After Laino discussed the idea with the rest of the team, RoboRXN was conceived. “The rest was only an intense but gratifying run to build everything: the remaining AI models, the integration of commercially existing hardware and the deployment of all services in the cloud,” Laino says.
During the online presentation, Laino and his team ran a hypothetical experiment with RoboRXN. A user connected to the IBM Cloud application and provided a target molecule to RoboRXN. The AI system processed the request and provided a suggested instruction set for the experiment. After the user tweaked and confirmed the result, RoboRXN fed the commands to the robotic research lab and the experiment was kicked off. A live camera view allowed us to follow the steps as the robotic lab conducted the experiments.
The hardware used in the project is already commercially available, making it possible to integrate it with the robotic labs that organizations already have in place.
“Rather than developing our own hardware we decided to use industry standard hardware and use AI and Cloud to solve the issue of programming and accessing the robot remotely,” Laino said. “The project is hardware agnostic. Different types of hardware can be easily interfaced.”
The team also envisions RoboRXN scaling up to run parallel experiments. Research labs can use the platform to coordinate operations across multiple labs and speed up the process of testing hypotheses and gathering the results.
Research during the pandemic and beyond
Automated tools such as RoboRXN could give a boost to research labs and scientists who have been constrained by the COVID-19 lockdown.
“The pandemic rang a bell to each of us on how to integrate all existing digital solutions to avoid similar disruption in the future. Lab chemists, even today, are facing severe limitations to come back to work,” said Matteo Manica, machine learning researcher at IBM. “Computational scientists can work remotely, accessing supercomputing resources available online. We decided to provide the same at the level of a chemical lab. A chemical laboratory accessible remotely, that is supervised by AI and executed by robotic chemical hardware.”
But the benefits can go beyond just providing remote access and help direct the cognitive capacity of human scientists where it is needed most.
“RoboRXN can be considered for chemists what robotic vacuum cleaners are for humans. They do not necessarily make things faster, but they make things in a very reproducible way and during their work, you can focus on doing something else,” Laino said.
The increased adoption of automated labs will also generate more digital data, which can help improve the performance of the AI models in the future. Organizations can use the IBM Cloud to run RoboRXN and store the results obtained from the robotic labs. Alternatively, they can have the entire system installed on-premises or in a private cloud. IBM does not currently have plans to use data obtained from RoboRXN to fine-tune its AI models. Researchers using the platform can, however, integrate their own results with other open datasets and use them to train the deep learning models IBM has made publicly available.
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. You can read the original article here.
Story by Ben Dickson
Ben Dickson is the founder of TechTalks. He writes regularly about business, technology and politics. Follow him on Twitter and Facebook.