Clinical Trial Outcome Prediction | by Lennart Langouche | October 2023 | Intelligenza-Artificiale

Molecules can be represented as SMILES strings. SMILES is a line notation for encoding molecular structure. Drug molecule data are extracted from ClinicalTrials.gov and linked to their molecular structures (SMILES strings) using CACTUS.

import requests

def get_smiles(drug_name):
    # URL for the CIR API
    base_url = "https://cactus.nci.nih.gov/chemical/structure"
    url = f"{base_url}/{drug_name}/smiles"
    smiles = ''  # Default value returned when the lookup fails
    try:
        # Send a GET request to retrieve the SMILES representation
        response = requests.get(url)
        if response.status_code == 200:
            smiles = response.text.strip()  # Get the SMILES string
            print(f"Drug Name: {drug_name}")
            print(f"SMILES: {smiles}")
        else:
            print(f"Failed to retrieve SMILES for {drug_name}. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
    return smiles

# Define the drug name you want to convert
drug_name = "aspirin"  # Replace with the drug name of your choice
get_smiles(drug_name)

### Output:
# Drug Name: aspirin
# SMILES: CC(=O)Oc1ccccc1C(O)=O
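A trial can list several drugs, and the same drug often appears in many trials, so it may be worth resolving each distinct name only once. The following is a minimal sketch of that idea using only the standard library. The helper names `build_smiles_url`, `fetch_smiles` and `lookup_smiles` are hypothetical and not part of the original code; the URL pattern is the one used above. Percent-encoding the name is a precaution for drug names that contain spaces or other special characters.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"

def build_smiles_url(drug_name):
    """Build the lookup URL, percent-encoding the drug name (hypothetical helper)."""
    return f"{BASE_URL}/{quote(drug_name.strip(), safe='')}/smiles"

def fetch_smiles(drug_name, timeout=10):
    """Return the SMILES string for one drug, or '' if the lookup fails."""
    try:
        with urlopen(build_smiles_url(drug_name), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:
        # Covers HTTP errors, unreachable hosts and timeouts
        return ""

def lookup_smiles(drug_names, fetch=fetch_smiles):
    """Resolve each distinct drug name once and reuse the cached result."""
    cache = {}
    for name in drug_names:
        key = name.strip()
        if key not in cache:
            cache[key] = fetch(key)
    return cache
```

Calling `lookup_smiles(["aspirin", "ibuprofen", "aspirin"])` would issue two requests instead of three. The `fetch` parameter exists so that the network call can be swapped out, for example when testing offline.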

DeepPurpose can be used to encode molecular compounds. It currently supports 15 different encodings. We will use the Morgan encoding, which encodes the groups of atoms in a chemical compound into a binary vector, with length and radius as its two parameters. First we need to install the DeepPurpose library.

pip install DeepPurpose

Overview of DeepPurpose encoders (image by Huang et al., CC license)
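To make the two parameters of the Morgan encoding concrete, here is a deliberately simplified, standard-library sketch of the underlying idea: every atom starts with an identifier, each round merges in the identifiers of its neighbors (so `radius` controls how large the encoded atom groups get), and all identifiers are folded into a binary vector of fixed `length`. This is a toy illustration, not the implementation used by DeepPurpose; it ignores bond orders, charges and the other atom properties a real fingerprint uses, and the function names and default values are assumptions chosen for the example.

```python
import zlib

def stable_hash(value):
    """Deterministic 32-bit hash of any printable value."""
    return zlib.crc32(repr(value).encode("utf-8"))

def toy_morgan(atoms, bonds, radius=2, length=1024):
    """Simplified Morgan-style fingerprint of a molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom index pairs (bond orders are ignored)
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom is identified by its element alone
    ids = [stable_hash(symbol) for symbol in atoms]
    seen = set(ids)

    # Each round widens every atom's neighborhood by one bond
    for _ in range(radius):
        ids = [
            stable_hash((ids[i], sorted(ids[n] for n in neighbors[i])))
            for i in range(len(atoms))
        ]
        seen.update(ids)

    # Fold all identifiers into a fixed-length binary vector
    bits = [0] * length
    for identifier in seen:
        bits[identifier % length] = 1
    return bits

# Ethanol (SMILES: CCO) written out as a graph by hand
ethanol = toy_morgan(["C", "C", "O"], [(0, 1), (1, 2)])
print(len(ethanol), sum(ethanol))
```

A larger radius can only add identifiers, so it never clears a bit that a smaller radius set. A shorter vector saves memory but makes it more likely that two different atom groups collide on the same bit.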

We create a dictionary that maps SMILES strings to their Morgan representation, and a second dictionary that maps clinical trial identifiers (NCTIDs) directly to their Morgan representation.

import pickle
from functools import reduce

import numpy as np
import pandas as pd
from tqdm import tqdm
from DeepPurpose.utils import smiles2morgan

# Note: txt_to_lst and load_smiles2morgan_dict are helper functions that are not shown in this section.

def create_smiles2morgan_dict():
    # Import toy dataset
    toy_df = pd.read_csv('data/toy_df.csv')
    smiles_lst = list(map(txt_to_lst, toy_df['smiless'].tolist()))
    # Collect every distinct SMILES string that appears in any trial
    unique_smiles = list(set(reduce(lambda x, y: x + y, smiles_lst)))
    morgan = pd.Series(unique_smiles).apply(smiles2morgan)
    smiles2morgan_dict = dict(zip(unique_smiles, morgan))
    with open('data/smiles2morgan_dict.pkl', 'wb') as f:
        pickle.dump(smiles2morgan_dict, f)

def create_nctid2molecule_embedding_dict():
    # Import toy dataset
    toy_df = pd.read_csv('data/toy_df.csv')
    smiles_lst = list(map(txt_to_lst, toy_df['smiless'].tolist()))
    smiles2morgan_dict = load_smiles2morgan_dict()
    embedding = []
    for drugs in tqdm(smiles_lst):
        vec = []
        for drug in drugs:
            vec.append(smiles2morgan_dict[drug])
        # Average the Morgan vectors of all drugs in the trial
        vec = np.mean(np.array(vec), axis=0)
        embedding.append(vec)
    print(np.array(embedding).shape)
    # Map each trial identifier to its averaged embedding
    nctid2molecule_embedding_dict = {}
    for key, row in zip(toy_df['nctid'], np.array(embedding)):
        nctid2molecule_embedding_dict[key] = row
    with open('data/nctid2molecule_embedding_dict.pkl', 'wb') as f:
        pickle.dump(nctid2molecule_embedding_dict, f)

# The SMILES-to-Morgan pickle must already exist before this call
create_nctid2molecule_embedding_dict()
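The averaging step is easier to follow on a tiny example. The sketch below uses plain Python lists instead of NumPy, hypothetical 4-bit vectors in place of real Morgan fingerprints, and made-up trial identifiers; only the logic (one vector per drug, element-wise mean per trial) mirrors the code above.

```python
def mean_embedding(smiles_list, smiles2vec):
    """Average the vectors of all drugs in one trial, element by element."""
    vectors = [smiles2vec[s] for s in smiles_list]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical 4-bit fingerprints standing in for real Morgan vectors
smiles2vec = {
    "CCO": [1, 0, 1, 0],
    "CC(=O)O": [1, 1, 0, 0],
}

# Made-up trial identifiers mapped to the SMILES strings of their drugs
trials = {
    "NCT-EXAMPLE-1": ["CCO"],
    "NCT-EXAMPLE-2": ["CCO", "CC(=O)O"],
}

nctid2embedding = {
    nctid: mean_embedding(smiles, smiles2vec)
    for nctid, smiles in trials.items()
}
print(nctid2embedding["NCT-EXAMPLE-1"])  # [1.0, 0.0, 1.0, 0.0]
print(nctid2embedding["NCT-EXAMPLE-2"])  # [1.0, 0.5, 0.5, 0.0]
```

Averaging gives every trial a vector of the same length regardless of how many drugs it lists. The result is no longer binary: a value of 0.5 means that one of the trial's two drugs sets that bit.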