Computer-guided Retrosynthesis

Gaurav Kumar


Computer-aided retrosynthesis is a milestone for chemists: it opens the way to using machine learning techniques to plan the synthesis of a target molecule. In 1997 Deep Blue, the chess computer developed by IBM, defeated the world chess champion Garry Kasparov. This prompted several chemists to consider applying the same kind of technology in their own research. Bartosz Grzybowski explored the idea of applying Deep Blue-style algorithms to molecules. Computer-aided retrosynthesis can therefore be described as the use of such search algorithms to work out how a target molecule can be made. Systems based on machine learning algorithms can efficiently suggest the reactions that lead to a target molecule. Andy Extance recently reviewed the various approaches to computer-aided retrosynthesis and the thinking behind them.

Figure: Retrosynthesis tree of a target molecule.
Figure: Main disconnection approaches in retrosynthesis.

History of Retrosynthesis

Computer-aided retrosynthesis has a short history. It was first attempted by the chemist E. J. Corey, whose early efforts were held back by the limitations of the time. In 2005, Bartosz Grzybowski and his team took up the idea and described molecular synthesis as a network of connected reactions. In 2012, they introduced scoring functions for evaluating and optimizing existing synthetic routes. Grzybowski continued to refine the approach, aiming for a system that plans a synthesis the way Deep Blue planned a sequence of strong moves against the world champion. In 2016, the team published its first computer-planned pathway, having manually coded the rules that describe the reaction mechanisms. In 2020, they produced routes for complex products using Chematica.

Synthia: Success Story

Merck KGaA, the owner of Sigma-Aldrich, bought Chematica in 2017 and renamed it Synthia. Synthia was developed to bring together the large body of data sources and literature needed for computer-aided retrosynthesis. Merck KGaA is a German company and a well-known giant in the field of chemistry. It tested automated retrosynthesis on a total of 7 compounds, as directed by Sigma-Aldrich. According to Lindsey Rickershauser, sales and marketing manager of Synthia, the company focused on reducing the pressure on its chemists and scientists. The software produced improved synthetic routes for various products, and it reduced both the cost and the number of steps in the syntheses it planned.

Molecular Similarity in a Computer-aided Retrosynthesis

To synthesize any target molecule, we must first identify the reaction steps that lead to it. Robert Robinson synthesized tropinone in 1917. E. J. Corey later formalized retrosynthetic analysis and received the Nobel Prize in Chemistry in 1990 for this work. This motivated chemists to automate the technique in order to obtain better, higher-yielding routes.

Much of the recent development in computer-aided retrosynthesis is credited to J. Gasteiger. Most computer-aided retrosynthesis works by encoding reaction templates, or generalized rules that describe how a molecule can be disconnected. These generalized, abstracted templates are either extracted algorithmically from reaction databases or written by hand.

Several techniques have been developed for extracting meaningful reaction centers from reaction data. Very large template sets are difficult to extract, and the computational cost is high, because working with reaction templates requires solving subgraph isomorphism problems. This cost has to be considered before automating retrosynthesis, although it has not held back research in the field. Reaction templates are useful because they encode a transformation compactly and identify where it applies in a molecule. As an example, consider cleaving the single bond between two aromatic carbons, a disconnection associated with several different pairs of groups. The reaction site can be encoded as the SMARTS string [cH0]-[cH0].
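To make the template idea concrete, here is a minimal sketch in plain Python of what matching [cH0]-[cH0] amounts to. It does not parse SMARTS; the pattern is hand-coded as "a single bond between two aromatic carbons that carry no hydrogens". The molecule (biphenyl), the dictionary-based atom representation, and the function names make_biphenyl, find_template_sites, and disconnect are all assumptions made for this illustration. A real cheminformatics toolkit would do this with a general subgraph isomorphism search.

```python
from collections import defaultdict


def make_biphenyl():
    """Build a hand-written graph of biphenyl: two six-membered aromatic rings
    (atoms 0-5 and 6-11) joined by a single bond between atoms 0 and 6."""
    atoms = [{"element": "C", "aromatic": True, "h_count": 1} for _ in range(12)]
    atoms[0]["h_count"] = 0  # ring-junction carbons carry no hydrogen
    atoms[6]["h_count"] = 0
    bonds = []
    for start in (0, 6):
        for k in range(6):
            bonds.append((start + k, start + (k + 1) % 6, "aromatic"))
    bonds.append((0, 6, "single"))
    return atoms, bonds


def is_ch0(atom):
    """Hand-coded equivalent of the SMARTS atom [cH0]."""
    return atom["element"] == "C" and atom["aromatic"] and atom["h_count"] == 0


def find_template_sites(atoms, bonds):
    """Return every bond (i, j) that matches the pattern [cH0]-[cH0]."""
    return [
        (i, j)
        for i, j, order in bonds
        if order == "single" and is_ch0(atoms[i]) and is_ch0(atoms[j])
    ]


def disconnect(atoms, bonds, site):
    """Remove the bond at `site` and return the resulting fragments
    as sorted lists of atom indices (connected components)."""
    adjacency = defaultdict(set)
    for i, j, _ in bonds:
        if {i, j} != set(site):
            adjacency[i].add(j)
            adjacency[j].add(i)
    seen, fragments = set(), []
    for start in range(len(atoms)):
        if start in seen:
            continue
        stack, fragment = [start], set()
        while stack:
            node = stack.pop()
            if node in fragment:
                continue
            fragment.add(node)
            stack.extend(adjacency[node] - fragment)
        seen |= fragment
        fragments.append(sorted(fragment))
    return fragments


atoms, bonds = make_biphenyl()
sites = find_template_sites(atoms, bonds)
fragments = disconnect(atoms, bonds, sites[0])
print(sites, fragments)
```

For biphenyl the only matching site is the bond joining the two rings, and cutting it leaves two six-atom ring fragments. A full template would also say which groups to attach to each fragment; that step is left out of this sketch.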


The approaches, history, and applications of computer-aided retrosynthesis discussed above mostly work by drawing analogies between the target and known reactions. Many attempts have been made to develop and improve these methods for modern chemistry. For this reason, it is important to understand molecular similarity, both between reactants and between products.

Molecular similarity is needed in computer-aided retrosynthesis to find related compounds and reaction precedents. Various retrosynthesis tools are based on machine learning, and their performance is compared against Synthia. The knowledge obtained from reaction datasets is very useful for chemists who develop and improve computer-guided retrosynthesis techniques. Several chemists in this field have succeeded in developing synergies between machine learning and molecular synthesis.
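As a rough illustration of how similarity can be used to find precedents, the sketch below ranks known reactions by the Tanimoto similarity between their products and the target. Tanimoto similarity is a common choice for comparing molecules, but the article does not name a specific metric, so its use here is an assumption. The feature sets, the reaction names, and the functions tanimoto and rank_precedents are likewise hypothetical; real systems compare molecular fingerprints computed by a cheminformatics toolkit.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two feature sets:
    shared features divided by all distinct features."""
    union = features_a | features_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(features_a & features_b) / len(union)


def rank_precedents(target, precedents):
    """Sort precedent reactions from most to least similar to the target.
    Ties are broken alphabetically by name so the order is deterministic."""
    scored = [(name, tanimoto(target, feats)) for name, feats in precedents.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical substructure features of the target and of three known products.
target = {"aryl", "biaryl_bond", "methoxy"}
precedents = {
    "rxn_A": {"aryl", "biaryl_bond", "fluoro"},
    "rxn_B": {"alkyl", "ester"},
    "rxn_C": {"aryl", "biaryl_bond", "methoxy", "nitro"},
}

for name, score in rank_precedents(target, precedents):
    print(name, round(score, 2))
# rxn_C 0.75
# rxn_A 0.5
# rxn_B 0.0
```

The most similar precedent (rxn_C here) would be the first candidate whose transformation is tried on the target. The same comparison can be applied to reactants once a disconnection has been proposed.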

Machine learning and artificial intelligence alone are not sufficient. Computer-aided retrosynthesis also draws on molecular and quantum mechanics. The approach therefore demands knowledge from several fields, yet it has already brought considerable success in the chemical industry. The techniques involved are effective and give precise results. The application of machine learning and artificial intelligence to synthesis planning, known as computer-aided retrosynthesis, is thus a key success in the research and development of modern science.


Cook, A.; Johnson, A. P.; Law, J.; Mirzazadeh, M.; Ravitz, O.; Simon, A. Computer-aided synthesis design: 40 years on Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012, 2, 79–107.

Corey, E. J.; Long, A. K.; Rubenstein, S. D. Computer-assisted analysis in organic synthesis Science 1985, 228, 408–419.

Corey, E. J. The logic of chemical synthesis: multistep synthesis of complex carbogenic molecules (Nobel Lecture) Angew. Chem., Int. Ed. Engl. 1991, 30, 455–465.

Gelernter, H.; Rose, J. R.; Chen, C. Building and refining a knowledge base for synthetic organic chemistry via the methodology of inductive and deductive machine learning J. Chem. Inf. Model. 1990, 30, 492–504.

Law, J.; Zsoldos, Z.; Simon, A.; Reid, D.; Liu, Y.; Khew, S. Y.; Johnson, A. P.; Major, S.; Wade, R. A.; Ando, H. Y. Route Designer: A Retrosynthetic Analysis Tool Utilizing Automated Retrosynthetic Rule Generation J. Chem. Inf. Model. 2009, 49, 593–602.

Ott, M. A.; Noordik, J. H. Computer tools for reaction retrieval and synthesis planning in organic chemistry. A brief review of their history, methods, and programs Recl. Trav. Chim. Pays-Bas 1992, 111, 239–246.

Satoh, H.; Funatsu, K. SOPHIA, a knowledge base-guided reaction prediction system-utilization of a knowledge base derived from a reaction database J. Chem. Inf. Model. 1995, 35, 34–44.

Satoh, K.; Funatsu, K. A novel approach to retrosynthetic analysis using knowledge bases derived from reaction databases J. Chem. Inf. Comput. Sci. 1999, 39, 316–325.

Todd, M. H. Computer-aided organic synthesis Chem. Soc. Rev. 2005, 34, 247–266.

Warr, W. A. A Short Review of Chemical Reaction Database Systems, Computer-Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility Mol. Inf. 2014, 33, 469–476.

