Computing InChIs
About this interactive recipe
Author: Vincent Scalfani
Reviewer: Stuart Chalk
Topics: How to calculate InChIs from SMILES using RDKit or Open Babel. Adapted from the CPCDS 2021 Digital IUPAC Session, 51st IUPAC General Assembly.
Format: Interactive Jupyter Notebook (Python)
Scenarios: You need to convert a SMILES string into its equivalent InChI string.
Skills: You should be familiar with
Learning outcomes: After completing this example you should understand:
How to load and use RDKit to obtain and display chemical identifiers
How to load and use Open Babel to obtain and display chemical identifiers
Citation: ‘Computing InChIs’, Vincent Scalfani, The IUPAC FAIR Chemistry Cookbook, Contributed: 2024-02-14 https://w3id.org/ifcc/IFCC012.
Reuse: This notebook is made available under a CC-BY-4.0 license.
1. Using RDKit
1.1 Import RDKit Modules
from rdkit import Chem
from rdkit.Chem import Draw
1.2 Create a Molecular Object from SMILES
# PubChem CID: 134601
m = Chem.MolFromSmiles('COC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CC(=O)O)N')
m # to show image of molecule
# Internally, we have created an RDKit molecular object
print(m)
1.3 Calculate InChI
# Compute InChI from RDKit mol
Chem.MolToInchi(m)
# Compute InChIKey from RDKit mol
Chem.MolToInchiKey(m)
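The InChIKey is a fixed-length, hashed form of the InChI that is convenient for database lookups. As a minimal sketch, assuming the standard 27-character layout (a 14-letter hash of the connectivity layers, an 8-letter hash of the remaining layers, a standard/version flag pair, and a protonation flag), the hypothetical helper `split_inchikey` below checks a key's shape and names its parts. It only validates the layout, not the hashes, uses just the Python standard library, and the ethanol key is an illustrative input.

```python
import re

# Standard InChIKey layout (27 characters):
#   14 letters : hash of the main (connectivity) layers
#   8 letters  : hash of the remaining layers (stereo, isotopes, ...)
#   1 letter   : 'S' for a standard InChIKey, 'N' for non-standard
#   1 letter   : InChI version ('A' for version 1)
#   1 letter   : protonation indicator ('N' means neutral)
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into named parts; raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,
        "other_layers": other_layers,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


# Ethanol has no stereo or isotope layers, so its second block is UHFFFAOY.
parts = split_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N")
print(parts["skeleton"], parts["other_layers"], parts["standard"])
# LFQSCWFLJHTTHZ UHFFFAOY True
```

Because the first block depends only on the main layers, stereoisomers such as the first two SMILES in section 1.4 should share their first 14 characters and differ in the second block.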
1.4 Calculate InChIs for a List of Molecules
# Import a file of SMILES strings
smiles_list = []
with open('../files/my_smiles.smi') as infile:
    for smi in infile:
        smiles_list.append(smi.rstrip())  # rstrip removes newline
print(smiles_list)
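The loop above keeps each whole line. SMILES files often carry a molecule name after the SMILES, and a stray blank line would also be passed on unchanged. A minimal sketch of a more defensive reader, assuming whitespace separates the SMILES from any name; `read_smiles` is a hypothetical helper, and the demo writes a throwaway file rather than using '../files/my_smiles.smi'.

```python
import os
import tempfile


def read_smiles(path):
    """Return the SMILES column of a SMILES file.

    Assumes one molecule per line: the SMILES first, optionally followed by
    whitespace and a name. Blank or whitespace-only lines are skipped.
    """
    smiles_list = []
    with open(path) as infile:
        for line in infile:
            fields = line.split()
            if fields:  # skip blank lines
                smiles_list.append(fields[0])
    return smiles_list


# Demo with a temporary file; in the notebook you would pass
# '../files/my_smiles.smi' instead.
with tempfile.NamedTemporaryFile("w", suffix=".smi", delete=False) as tmp:
    tmp.write("CCO ethanol\n\nc1ccccc1 benzene\n")
demo = read_smiles(tmp.name)
os.remove(tmp.name)
print(demo)  # ['CCO', 'c1ccccc1']
```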
# Or create a list directly
smiles_list = ['COC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CC(=O)O)N',
'COC(=O)[C@@H](CC1=CC=CC=C1)NC(=O)[C@@H](CC(=O)O)N',
'COC(=O)[C@H](CC1=CC=CC=C1)NC(=O)C[C@@H](C(=O)O)N',
'C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC=O',
'C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N',
'CC(C)C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)C']
# Next, loop through the smiles_list and create RDKit molecular objects
mols = []
for smi in smiles_list:
    mols.append(Chem.MolFromSmiles(smi))
print(mols)
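Chem.MolFromSmiles returns None, rather than raising an exception, when it cannot parse a SMILES string, and a None left in mols would most likely make the InChI step fail. A minimal sketch that keeps successes and failures apart; `parse_all` and `toy_parser` are hypothetical, and the toy parser (which only rejects unbalanced parentheses) stands in for RDKit so the example runs on its own.

```python
def parse_all(smiles_list, parser):
    """Parse every SMILES, keeping successes and failures apart.

    `parser` is any callable that returns a molecule object, or None when
    parsing fails; RDKit's Chem.MolFromSmiles follows that convention.
    Returns (parsed, failed), where parsed holds (smiles, mol) pairs so each
    molecule stays linked to its input string.
    """
    parsed, failed = [], []
    for smi in smiles_list:
        mol = parser(smi)
        if mol is None:
            failed.append(smi)
        else:
            parsed.append((smi, mol))
    return parsed, failed


# Hypothetical stand-in parser: "fails" on unbalanced parentheses.
# With RDKit you would pass Chem.MolFromSmiles instead.
def toy_parser(smi):
    return smi if smi.count("(") == smi.count(")") else None


parsed, failed = parse_all(["CCO", "CC(C", "c1ccccc1"], toy_parser)
print(len(parsed), failed)  # 2 ['CC(C']
```

With RDKit, `mols = [mol for _, mol in parsed]` would then feed the drawing and InChI steps, and `failed` lists the inputs worth inspecting.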
# alternative solution
# mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
# Display the molecules in a grid
# useSVG=False renders the grid as a PNG image
Draw.MolsToGridImage(mols, molsPerRow=3, useSVG=False)
# Loop through mols (molecular objects) and calculate InChIs
InChIs = [Chem.MolToInchi(mol) for mol in mols]
print(InChIs)
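The InChIs printed above are layered strings: after the InChI=1S/ prefix come the formula, the connectivity (/c) and hydrogen (/h) layers, and then optional layers such as charge (/q, /p) and stereochemistry (/b, /t, /m, /s). A minimal sketch, assuming standard InChIs in which each layer prefix appears at most once (isotopic, fixed-H or reconnected sections can repeat prefixes and are not handled), splits the string on "/" and groups inputs that share formula, connectivity, hydrogen and charge layers. Members of a group then differ only in the remaining layers, which is where stereoisomers such as the first two SMILES in this list should show up. The helper names are hypothetical, and the alanine and ethanol InChIs are illustrative inputs.

```python
from collections import defaultdict


def inchi_layers(inchi):
    """Split a standard InChI into {'version', 'formula', <prefix>: value}.

    Assumes each layer prefix (c, h, q, p, b, t, m, s, ...) occurs at most
    once; if a prefix repeats, the last occurrence wins.
    """
    prefix, _, body = inchi.strip().partition("=")
    if prefix != "InChI" or "/" not in body:
        raise ValueError(f"not an InChI string: {inchi!r}")
    version, formula, *rest = body.split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        if layer:  # ignore empty segments
            layers[layer[0]] = layer[1:]
    return layers


def group_by_constitution(inchis):
    """Group InChIs that share formula, /c, /h, /q and /p layers."""
    groups = defaultdict(list)
    for inchi in inchis:
        layers = inchi_layers(inchi)
        key = tuple(layers.get(k, "") for k in ("formula", "c", "h", "q", "p"))
        groups[key].append(inchi)
    return dict(groups)


L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

for key, members in group_by_constitution([L_ALA, D_ALA, ETHANOL]).items():
    print(key[0], len(members))
# C3H7NO2 2
# C2H6O 1
```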
2. Using Open Babel
2.1 Import Open Babel Modules
# Open Babel v3.1.1
from openbabel import pybel
2.2 Create a Molecular Object from SMILES
m = pybel.readstring("smi", "COC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CC(=O)O)N")
m # to show image of molecule
# Internally, we have created an Open Babel molecular object
print(type(m))
2.3 Calculate InChI
# Set up InChI conversion
conv = pybel.ob.OBConversion()
conv.SetOutFormat("inchi")
# Calculate InChI
inchi_output = conv.WriteString(m.OBMol)
print(inchi_output)
# Set up InChIKey conversion
conv = pybel.ob.OBConversion()
conv.SetOutFormat("inchikey")
# Calculate InChIKey
inchikey_output = conv.WriteString(m.OBMol)
print(inchikey_output)
2.4 Calculate InChIs for a List of Molecules
# Import a file of SMILES
smiles_list = []
with open('../files/my_smiles.smi') as infile:
    for smi in infile:
        smiles_list.append(smi.rstrip())  # rstrip removes newline
print(smiles_list)
# Next, loop through the smiles_list and create OB molecular objects
ms = [pybel.readstring("smi", smi) for smi in smiles_list]
print(ms)
# Set up InChI conversion
conv = pybel.ob.OBConversion()
conv.SetOutFormat("inchi")
# Loop through ms (Open Babel molecules) and calculate InChIs; rstrip() drops the trailing newline
InChIs = [conv.WriteString(m.OBMol).rstrip() for m in ms]
print(InChIs)
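RDKit and Open Babel both rely on the IUPAC InChI software to generate InChIs, so for the same SMILES input we would expect the two lists to agree; a mismatch may point to the toolkits interpreting the input differently, for example in stereochemistry. A minimal sketch of such a check, assuming both lists were computed from the same SMILES in the same order. Note that this section reuses the name InChIs, so the RDKit results from section 1.4 would need to be saved under another name first. `compare_inchis` is a hypothetical helper, and the InChIs in the demo are illustrative inputs.

```python
def compare_inchis(rdkit_inchis, openbabel_inchis):
    """List (index, rdkit, openbabel) for positions where the InChIs differ.

    Assumes both lists come from the same SMILES in the same order.
    Surrounding whitespace is ignored, since Open Babel's WriteString output
    ends with a newline.
    """
    if len(rdkit_inchis) != len(openbabel_inchis):
        raise ValueError("the two lists have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(rdkit_inchis, openbabel_inchis))
        if a.strip() != b.strip()
    ]


# Illustrative inputs only; the second pair differs in the /m stereo layer.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
mismatches = compare_inchis([ethanol, l_ala], [ethanol + "\n", d_ala])
print([i for i, _, _ in mismatches])  # [1]
```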