Releasing the Kraken

This feature is adapted from a story by the University of Toronto. Find the original here.

An open-access tool for chemists that promises to save time and money in the discovery of chemical reactions was launched this week by the research group of Distinguished Professor Matt Sigman of the University of Utah Department of Chemistry and the Matter group of Professor Alán Aspuru-Guzik at the University of Toronto.

Kraken is a library of over 330,000 virtual, machine-learning calculated organic compounds. Photo credit: Gabriel dos Passos Gomes and Robert Pollice.

Kraken, created in a collaboration between the Matter lab, the Sigman group, IBM Research and AstraZeneca, is a library of roughly 330,000 virtual organic compounds whose properties were calculated with machine learning, each with 190 descriptors.

“This collaborative project changes how researchers will approach reaction optimization both in industry and academics,” Sigman says. “It will provide unforeseen opportunities to investigate new reactions while also [providing] the ability to know why the reactions work.”

“The world has no time for science as usual,” says Aspuru-Guzik. “Neither for science done in a silo. This is a collaborative effort to accelerate catalysis science that involves a very exciting team from academia and industry.”

“It takes a long time, a lot of money and a whole lot of human resources to discover, develop and understand new catalysts and chemical reactions,” says co-lead author and Banting Fellow Dr. Gabriel dos Passos Gomes. “These are some of the tools that allow molecular scientists to precisely develop materials and drugs, from the plastics in your smartphone to the probes that allowed for humanity to achieve the COVID-19 vaccines at an unforeseen pace. This work shows how machine learning can change the field.”

When developing a transition-metal-catalyzed chemical reaction, a chemist must find a suitable combination of metal and ligand. Despite the innovations in computer-optimized ligand design led by the Sigman group, ligands are still typically identified by trial and error in the lab. With Kraken, chemists will eventually have a vast, data-rich collection at their fingertips, reducing the number of trials needed to achieve optimal results.

The Kraken library features organophosphorus ligands, which Tobias Gensch, one of the co-lead authors of this work, describes as “some of the most prevalent ligands in homogeneous catalysis.”

“We worked extremely hard to make this not only open and available to the community, but as convenient and easy to use as we possibly could,” says Gomes, who worked with graduate student Theophile Gaudin in the development of the web application. “With that in mind, we created a web app where users can search for ligands and their properties in a straightforward manner.”

The team also notes that while 330,000 compounds will be available at launch, a larger library of over 190 million ligands will be made available in the future. By comparison, similar libraries have been limited to hundreds of compounds with far fewer properties.

“This is very exciting as it shows the potential of AI for scientific research,” says Aspuru-Guzik. “In this context, the University of Toronto has launched a global initiative called the Acceleration Consortium which hopes to bring academia, government, and industry together to tackle AI-driven materials discovery. It is exciting to have Professor Matthew Sigman on board with the consortium and seeing results of this collaborative work come to fruition.”

Kraken can be freely accessed here. The preprint describing how the dataset was built and how the tool can be used for reaction optimization is available at ChemRxiv.

Media Contacts

Matt Sigman, distinguished professor, Department of Chemistry

Paul Gabrielsen, research/science communications specialist, University of Utah Communications
Mobile: 801-505-8253