- Research
- Open access
Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity
Journal of Cheminformatics volume 16, Article number: 122 (2024)
Abstract
The adverse outcome pathway (AOP) concept has gained attention as a way to explore the mechanism of chemical toxicity. In this study, quantitative structure–activity relationship (QSAR) models were developed to predict compound activity toward protein targets relevant to molecular initiating events (MIE) upstream of organ-specific toxicities, namely liver steatosis, cholestasis, nephrotoxicity, neural tube closure defects, and cognitive functional defects. Utilizing bioactivity data from the ChEMBL 33 database, various machine learning algorithms, chemical features and methods to assess prediction reliability were compared and applied to develop robust models to predict compound activity. The results demonstrate high predictive performance across multiple targets, with balanced accuracy exceeding 0.80 for the majority of models. Furthermore, stability checks confirmed the consistency of predictive performance across multiple training-test splits. The results obtained by using QSAR predictions to identify known markers of adversities highlighted the utility of the models for risk assessment and for prioritizing compounds for further experimental evaluation.
Scientific contribution
The work describes the development of QSAR models as tools for screening chemicals with potential systemic toxicity, thus contributing to resource savings and providing indications for further, better-targeted testing. This study advances the computational modeling of MIEs and the use of information from AOPs, a field that is still relatively young and unexplored. The comprehensive modeling procedure is highly generalizable and offers a robust framework for predicting a wide range of toxicological endpoints.
Introduction
Toxicology has recently experienced a paradigm shift, emphasizing alternative testing methods based on an understanding of modes of action and biological mechanisms responsible for adverse effects. The adverse outcome pathway (AOP) concept has received significant attention as the foundation for the development of novel testing methods rooted in a deep comprehension of biological events leading to toxic outcomes. An AOP is a logical construct linking an upstream molecular initiating event (MIE), such as a chemical interaction with a molecular target, to a downstream adverse outcome (AO), progressing through key events (KE) causally connected by KE relationships [1, 3, 48, 50, 51].
In recent years, the AOP framework has attracted interest as a strategy for chemical risk assessment [34, 43]. The Organisation for Economic Cooperation and Development (OECD) has recognized the promising role of AOPs, actively promoting their development since their conceptualization [33]. The AOP-Wiki, a knowledgebase platform facilitating sharing and collaborative AOP development (https://aopwiki.org/), exemplifies the utility of this tool, housing approximately 460 AOPs, 1700 KEs, and over 2500 KE relationships.
A key advantage of AOP use lies in its chemical-agnostic representation of toxicological processes. This implies that the knowledge underlying a single AOP can characterize the toxicity of multiple chemical stressors capable of interacting with and modulating endogenous targets initiating toxicological processes and triggering associated MIEs. Consequently, AOPs enable the prioritization of chemicals based on activated MIEs and their extent of activation, aiding in the assessment of compounds with unknown hazards [28].
Accordingly, the AOP concept has proven valuable in guiding the development of New Approach Methodologies (NAMs). Direct assessment of systemic-level adverse effects often necessitates complex, time-consuming, and ethically questionable in vivo testing methods. In contrast, some in vitro assays focusing on single cellular events connected to MIEs or KEs are more straightforward and advantageous in several ways and are also scientifically supported by the causative link between molecular events and AO documented by AOPs.
Similar to AOP-based in vitro testing methods, various authors have emphasized the integration of in silico methods with the AOP concept [13, 28, 34, 43]. Notably, quantitative structure–activity relationships (QSAR) models have gained interest because they can estimate the capability of chemicals to modulate MIE-linked activity [2, 17, 20, 38, 47]. AOP-derived information aids in simplifying complex systemic endpoints into simpler ones that describe the modulation of single molecular targets, thereby facilitating the establishment of relationships effectively captured by QSAR models [20].
The integration of QSAR and AOP information is further supported by the abundance of data from cellular assays on MIE-relevant targets. High-throughput screening (HTS) and high-content screening (HCS) assays, accessible from chemical databases and computational toxicology dashboards (e.g., ToxCast/Tox21, PubChem, ChEMBL), provide substantial and high-quality information. This wealth of data enables the exploitation of the full potential of machine learning (ML) and artificial intelligence advancements in computational toxicology [30, 36].
In this study, we leverage extensive data availability to develop a battery of QSAR models to predict the modulation of MIEs upstream of five AOP networks. These networks encompass multiple AOPs that converge on common AOs related to liver (steatosis and cholestasis) [44, 46], kidney (nephrotoxicity) [5] and developing brain (neural tube closure and cognitive functional defects) [23, 29, 41]. These organ-specific adversities are of primary concern in drug safety assessment, with organ toxicity identified as a major contributor to preclinical safety attrition [12]. Moreover, these organs are potential targets in oral repeated-dose toxicity studies on food additives [25], pesticides [40] and cosmetic products [49].
Multiple ML algorithms were applied to chemical bioactivity data for relevant protein targets (receptors, enzymes, and transporters) associated with the selected AOP networks. Extensive hyperparameter tuning, benchmarking of various modeling techniques, and external validation were conducted to assess the predictivity of the models under different conditions. Multiple strategies were applied to select reliable predictions. Finally, the models were applied to discriminate between decoys and known markers of the adversities under study. The promising external validation performance and the results achieved by the screening of markers suggest that the models presented here could serve as an effective first tier in an integrated testing strategy tailored to rapidly screen numerous chemicals and prioritize those posing the highest potential hazard.
Methods
Selection of targets associated with MIEs
An initial list of MIEs was compiled from AOP networks related to five organ-specific adversities: liver steatosis (STE) [46], cholestasis (CHO) [44], kidney failure (KF) [5], neural tube closure defects (NTC) [23], and cognitive functional defects (CFD) [29, 41]. Only MIEs describing agonism, antagonism, binding, or inhibition of specific proteins (e.g., receptors, transporters, and enzymes) were considered. Chemical events not attributable to a specific protein target, such as cytotoxicity, protein alkylation, or reactive oxygen species production, were discarded.
Bioactivity data for the selected MIE targets were manually extracted from the ChEMBL 33 database [14]. Targets not found in the ChEMBL database or those lacking bioactivity data were excluded. The final list of MIE targets for each adversity is detailed below.
-
Liver steatosis (STE): aryl hydrocarbon receptor (AHR), liver X receptor (LXR), pregnane X receptor (PXR), peroxisome proliferator-activated receptor alpha (PPARα) and gamma (PPARγ).
-
Cholestasis (CHO): bile salt export pump (BSEP), multidrug resistance-associated protein isoform 2 (MRP2), isoform 3 (MRP3), isoform 4 (MRP4), Na+/taurocholate co-transporting polypeptide (NTCP).
-
Kidney failure (KF): organic anion transporter (OAT1), cyclooxygenase 1 (COX1), angiotensin-converting enzyme (ACE) and angiotensin II receptor type 1 (AT1R).
-
Neural tube closure (NTC): histone deacetylase (HDeac), cytochrome P450 26A1 (CYP26), sonic hedgehog signaling molecule (SHH), fibroblast growth factors (FGF), bone morphogenetic proteins (BMP), protein Wnt (WNT), carbamoyl-phosphate synthetase 2, aspartate transcarbamylase, dihydroorotase (CAD), noggin (NOG), smoothened (SMO), frizzled (FRIZ), fibroblast growth factor receptor isoform 1 (FGFR1), isoform 2 (FGFR2), isoform 3 (FGFR3), isoform 4 (FGFR4).
-
Cognitive functional defects (CFD): acetylcholinesterase (ACHE), Na+/I− symporter (NIS), N-methyl-d-aspartic acid receptor (NMDA), ryanodine receptors (RYR), thyroperoxidase (TPO), transthyretin (TTR), thyroid hormone receptor alpha (THRα) and beta (THRβ), voltage-gated sodium channels (VGSC).
Bioactivity data curation
Bioactivity data from ChEMBL 33 were curated according to a protocol adapted from Bosc et al. [9]. Specifically:
-
Only bioactivities with pChEMBL values, representing measures of half-maximal responses (molar IC50, XC50, EC50, AC50, Ki, Kd, potency, and ED50) expressed on a negative logarithmic scale [6] were chosen.
-
Only records with no potential duplicates, no ‘Data Validity Comment’ flag, and no classification as ‘inconclusive,’ ‘undetermined,’ or ‘not determined’ in the ‘Comment’ field were considered.
-
When possible, data related to Homo sapiens were prioritized, and when multiple isoforms were available for a single target, data were aggregated unless a single isoform was specified in the MIE description.
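As a reminder of the scale used in the selection criteria above, the snippet below converts a half-maximal response in nM to its negative log10 molar value. It is a minimal sketch: the function name `pchembl_from_nm` is a hypothetical helper and not part of ChEMBL or of the authors' workflow.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert a half-maximal response given in nM to -log10(molar)."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    molar = value_nm * 1e-9  # nM -> M
    return -math.log10(molar)


# The 10,000 nM cut-off used later for labeling corresponds to a value of 5.
print(round(pchembl_from_nm(10_000), 2))
```

On this scale, larger values mean higher potency, so a compound at 100 nM maps to 7.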
Selected pChEMBL data were converted into a binary classification (active, not active) based on the information reported in the ‘Standard Value’ (the standardized bioactivity value), ‘Standard Relation’ (the activity qualifier) and ‘Comment’ fields (reporting activity conclusions from the data depositor after taking into account other factors such as counter screens). A graphical overview of the data curation procedure is shown in Fig. 1.
-
When both a ‘Standard Relation’ and a ‘Standard Value’ were available and the ‘Standard Unit’ was ‘nM’, records were classified as not active if the ‘Standard Relation’ was “=”, “>” or “≥” and the ‘Standard Value’ was greater than or equal to 10,000 nM. Conversely, records were classified as active if the ‘Standard Relation’ was “=” or “<” and the ‘Standard Value’ was lower than 10,000 nM.
-
When a ‘Standard Value’ was available but a ‘Standard Relation’ was missing, records with a ‘Standard Value’ greater than or equal to 10,000 nM were flagged as not active, while those with a ‘Standard Value’ lower than 10,000 nM were flagged as active.
-
If neither the ‘Standard Value’ nor the ‘Standard Relation’ was available, the ‘Comment’ field was searched for keywords suggesting an activity classification (e.g., active/not active, inhibitor/not inhibitor).
After assigning activity labels, the ‘Comment’ field was further searched for keywords suggesting an activity classification (e.g., active/not active, inhibitor/not inhibitor, etc.). If any keyword was found suggesting a contradictory activity with respect to the one previously assigned, the record was discarded.
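The labeling rules above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' implementation: the keyword lists are hypothetical (the text only gives examples), and records that match none of the stated rules, such as a ">" relation with a value below the cut-off, are returned as `None` here.

```python
from typing import Optional

THRESHOLD_NM = 10_000.0  # activity cut-off stated in the text

# Hypothetical keyword lists; the source only names examples such as
# active/not active and inhibitor/not inhibitor.
INACTIVE_KEYWORDS = ("not active", "inactive", "not inhibitor")
ACTIVE_KEYWORDS = ("active", "inhibitor")


def label_from_comment(comment: Optional[str]) -> Optional[str]:
    """Return a label suggested by the comment text, or None."""
    if not comment:
        return None
    text = comment.lower()
    # Check negative keywords first because they contain the positive ones.
    if any(k in text for k in INACTIVE_KEYWORDS):
        return "not active"
    if any(k in text for k in ACTIVE_KEYWORDS):
        return "active"
    return None


def label_from_value(relation: Optional[str], value_nm: float) -> Optional[str]:
    """Apply the value/relation rules; None means no rule matched."""
    if relation is None:
        return "not active" if value_nm >= THRESHOLD_NM else "active"
    if relation in ("=", ">", ">=", "≥") and value_nm >= THRESHOLD_NM:
        return "not active"
    if relation in ("=", "<") and value_nm < THRESHOLD_NM:
        return "active"
    return None


def classify(relation: Optional[str], value_nm: Optional[float],
             comment: Optional[str]) -> Optional[str]:
    """Return 'active', 'not active', or None when the record is discarded."""
    from_comment = label_from_comment(comment)
    if value_nm is None:
        # Without a value, only the comment can provide a label.
        return from_comment if relation is None else None
    label = label_from_value(relation, value_nm)
    if label is not None and from_comment is not None and from_comment != label:
        return None  # contradictory comment: discard the record
    return label
```

For example, `classify("=", 50.0, "inactive")` returns `None` because the comment contradicts the value-based label, which mirrors the final discard step described above.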
Workflow for bioactivity data selection and curation. Raw data extracted from ChEMBL were filtered for a specific ‘standard type’. Only data with no potential duplicates, flagged with no ‘data validity comments’, and without concerning ‘Activity comments’ were considered. Based on the availability of information in the ‘Standard Value’ (SV), ‘Standard Relation’ (SR) and ‘Comment’ fields, different criteria were used to classify records into active or not active
Structural data curation
A semi-automated curation procedure [18] was applied to SMILES strings retrieved from ChEMBL to neutralize ionized chemical structures, remove counterions, and discard inorganics, organometallics, and mixtures. Duplicate structures were verified automatically at the InChI level. Duplicates with contradictory activity labels were removed. Curated datasets with fewer than 100 data points or poorly represented for one of the two activity categories (fewer than 20 data points in either class) were discarded, resulting in the exclusion of 11 targets (LXR, MRP2, MRP3, NTCP, OAT1, NIS, TPO, CAD, NOG, SHH, FRIZ).
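The duplicate handling and dataset size filters described above might look like the following sketch. It assumes that each record is an (InChI, label) pair and that concordant duplicates are merged into one entry; the merging step and the function names are assumptions, since the text only states that contradictory duplicates were removed.

```python
from collections import Counter, defaultdict

MIN_RECORDS = 100    # minimum dataset size stated in the text
MIN_PER_CLASS = 20   # minimum number of data points per activity class


def drop_conflicting_duplicates(records):
    """Map each InChI to its label, dropping structures with mixed labels."""
    seen = defaultdict(set)
    for inchi, label in records:
        seen[inchi].add(label)
    return {inchi: next(iter(labels))
            for inchi, labels in seen.items() if len(labels) == 1}


def is_modelable(dataset):
    """Check the size and class-representation criteria for a dataset."""
    counts = Counter(dataset.values())
    return (len(dataset) >= MIN_RECORDS
            and counts["active"] >= MIN_PER_CLASS
            and counts["not active"] >= MIN_PER_CLASS)
```

A dataset with 80 actives and 20 not actives passes both checks, while one with 81 actives and 19 not actives is rejected on the per-class criterion.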
Table 1 provides a list of the targets considered for modeling along with information on the dataset numerosity. The curated datasets are available in the Supporting Material (File 1).
To visualize the structural coverage of the datasets, chemicals were plotted in a reference chemical space encompassing the major chemical categories of real-life interest. The reference chemical space integrated data from the ECHA database of substances registered under the REACH regulation representing industrial chemicals (https://echa.europa.eu/da/information-on-chemicals/registered-substances), DrugBank representing approved drugs ([54], https://go.drugbank.com/), and the Natural Products Atlas representing natural chemical products ([45], https://www.npatlas.org/). The chemicals included in the three datasets are reported in Supporting Material (File 2, Tables S1–S3). Compounds from the MIE datasets and the three reference datasets were standardized, and functional connectivity circular fingerprints (FCFP) with a radius of 2, folded to 1024 bits, were computed using the Chemistry Development Kit (CDK; https://cdk.github.io/). Each dataset was subsequently plotted on the two-dimensional chemical space resulting from principal component analysis, allowing for an assessment of the chemical space covered by the MIE datasets.
As depicted in Fig. 2, the datasets collected from ChEMBL have different sizes, and some of them cover very specific portions of the chemical space. However, all of them mainly cover areas associated with drugs (blue density curves), with some partially spreading into areas covered by industrial chemicals (green density curves) and natural products (orange density curves). This was not unexpected, considering that ChEMBL mainly collects literature data on drug-like compounds and potential lead compounds.
Reference chemical space with projections of the training dataset. The reference chemical space is represented by a kernel density estimate plot, with three probability density curves indicating the chemical space covered by drugs (blue), natural chemicals (orange) and industrial chemicals (green). The training datasets are represented with scatter plots characterized by different colors based on the target protein
QSAR development
The collected data were used to develop QSAR models by employing various independent variables, ML methods, and associated parameters. The following procedure was applied to each curated MIE dataset. A graphical representation of the QSAR model development is shown in Fig. 3.
Graphical representation of the modeling workflow. 1 Targets for modeling were identified from five AOPs of the liver, kidney, and developing brain adversities. 2 Bioactivity data for each target were extracted from ChEMBL and curated. 3 Three types of chemical features were evaluated as independent variables. Feature selection was performed using the VSURF method. 4 The datasets were partitioned into training and test set. Synthetic minority oversampling (SMOTE) was applied to the training data. 5 Six machine learning methods were evaluated for modeling (MLP multilayer perceptron, SVM support vector machine, KNN k-nearest neighbors, GB gradient boosting, XGB extreme gradient boosting, BRF balanced random forest), and hyper-parameterized in cross-validation. The five best models were selected based on their balanced accuracy in external validation. 6 The stability of the selected models was checked by iteratively replicating the training-test split, and the final model was selected based on the average balanced accuracy. Confidence estimation and novelty detection methods were applied to identify reliable predictions
Independent variables
Three types of independent variables were compared for their ability to describe chemical structures within the QSAR models:
-
Extended structural fingerprints: Provided by the KNIME-CDK extension [8] and implemented in KNIME Analytics Platform v5.2.5 [7].
-
Molecular descriptors: Calculated using DRAGON software [42], which includes the simplest atom types, functional groups and fragment counts, topological and geometrical descriptors, and several physico-chemical properties. The descriptors were normalized in a 0–1 range.
-
Continuous data-driven descriptors (CDDD): Generated by deep neural networks, these descriptors are derived from the embedding learned by a model trained to translate semantically different molecular representations and have been reported to outperform other molecular fingerprints [52]. Descriptors were calculated using the code available at https://github.com/jrwnter/cddd
The models were built using both the entire pool of descriptors and a reduced set selected by the R package VSURF, based on a Random Forest (RF) algorithm that eliminates redundant or irrelevant variables [22]. A reduced pool of descriptors often improves performance and facilitates model interpretation [4, 16, 18, 26].
Classification algorithms
Six classification algorithms implemented in the KNIME Analytics Platform v5.2.5 [7] were applied to develop QSAR models:
-
Balanced random forest (BRF): A variation of RF [11] was used which artificially balances the class distribution in each tree. The number of trees was tuned (100 to 500, step: 50).
-
K-nearest neighbors (KNN): The number of neighbors was tuned (2 to 9, step: 2) with chemical similarity weighting.
-
Multilayer perceptron artificial neural network (MLP): Tuned parameters included the number of hidden layers (1 or 2) and the number of hidden neurons (2 to 10).
-
Support vector machine (SVM) with radial basis function: Tuned parameters included cost (10^−1 to 10^−2) and sigma (10^−4 to 10) on a logarithmic grid [35].
-
Gradient boosting (GB): Tuned parameters included the number of trees (100 to 500, step: 50) and the learning rate (0.1 to 10, step: 0.1) [53].
-
Extreme gradient boosting (XGB): Tuned parameters included gamma (0 to 10, step: 0.1), lambda (0 to 10, step: 0.1), alpha (0 to 10, step 0.1), eta (0 to 1.0, step: 0.1), and the maximum depth of each tree (3 to 10, step: 1) [53].
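To make the size of the search spaces concrete, the integer-valued grids listed above can be enumerated as follows. This is only an illustration: the dictionary keys are hypothetical names rather than KNIME node settings, and a step of 1 is assumed for the number of hidden neurons because the text does not state one.

```python
from itertools import product

# BRF: number of trees from 100 to 500 in steps of 50
brf_grid = [{"n_trees": n} for n in range(100, 501, 50)]

# KNN: number of neighbors from 2 to 9 in steps of 2
knn_grid = [{"k": k} for k in range(2, 10, 2)]

# MLP: 1 or 2 hidden layers, 2 to 10 hidden neurons (step of 1 assumed)
mlp_grid = [{"hidden_layers": layers, "hidden_neurons": neurons}
            for layers, neurons in product((1, 2), range(2, 11))]

print(len(brf_grid), len(knn_grid), len(mlp_grid))  # 9 4 18
```

The GB and XGB spaces are much larger, which is consistent with the use of Bayesian optimization instead of an exhaustive grid for those two methods.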
Model development workflow
-
Dataset partitioning: Datasets were randomly partitioned into training (TrS) and test (TeS) sets (80:20 ratio) using stratified sampling to ensure a uniform distribution of active and not active samples between the two datasets for each endpoint.
-
Hyperparameter tuning: Model parameters were selected based on five-fold cross-validation performance [37], calculated by micro-averaging the performance across folds. Balanced accuracy (BA) was used as the objective function. A grid search [27] was employed for parameter selection except for GB and XGB, where Bayesian optimization was used to reduce the computational time [39].
-
Addressing imbalance: ML methods can be ineffective for highly unbalanced data, which are common in the case of biological activities and toxicological endpoints. Synthetic Minority Oversampling Technique (SMOTE) was applied to mitigate the impact of highly unbalanced data and create new synthetic samples until the two categories were balanced [10]. SMOTE was not applied to the KNN, whereas two versions with and without SMOTE were trained for the remaining ML models.
-
Model evaluation: A total of 66 models were developed for each dataset, considering three independent variables, six ML methods, and the possible application of feature selection and SMOTE. The models were ranked based on BA on TeS, and the top five solutions were selected.
-
Validation stability check: The splitting procedure was repeated 100 times for each model to ensure the stability and reliability of external predictivity performance. The optimal parameters identified from the previous grid search were kept constant, and statistics on the TeS were collected for each iteration.
-
Optimal model selection: Overall external predictivity was obtained by macro-averaging statistics obtained by single splitting iterations, and the optimal solution among the top five models was identified based on the maximum average BA.
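The count of 66 models per dataset follows from the design described above: three feature types, each with or without feature selection, combined with six ML methods, each with or without SMOTE except KNN. The sketch below enumerates these configurations and shows how balanced accuracy and its average over repeated splits could be computed. Names are illustrative, and the metric function assumes both classes are present in the test set, which stratified sampling is meant to ensure.

```python
from itertools import product
from statistics import mean

FEATURES = ("fingerprints", "dragon", "cddd")
METHODS = ("BRF", "KNN", "MLP", "SVM", "GB", "XGB")


def configurations():
    """Yield (features, feature_selection, method, smote) tuples."""
    for feat, vsurf, method, smote in product(
            FEATURES, (False, True), METHODS, (False, True)):
        if method == "KNN" and smote:
            continue  # SMOTE was not applied to KNN
        yield feat, vsurf, method, smote


def balanced_accuracy(y_true, y_pred, positive="active"):
    """Mean of sensitivity and specificity (both classes must be present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def average_over_splits(scores):
    """Macro-average a statistic collected on each training-test split."""
    return mean(scores)


print(len(list(configurations())))  # 66
```

Without the KNN exception the product would give 72 combinations; removing the six KNN configurations with SMOTE leaves 66.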
Analysis of the reliability of predictions
The reliability of single predictions was evaluated. Two different strategies were applied to identify unreliable predictions based on similarity and probability-based approaches. The former, often referred to as ‘novelty detection’, is independent of the original classifier and is useful for flagging predicted test chemicals structurally dissimilar to the training samples. The latter, referred to as ‘confidence estimation’, uses information of the trained model to identify test chemicals falling close to the decision boundary of the classifier [24].
-
Novelty detection (ND): a QSAR model cannot reliably predict chemical structures significantly different from those of TrS compounds. Extended fingerprints in CDK-KNIME [8] were calculated to estimate the average pairwise similarity between each TeS compound and the five closest analogues in the TrS. The similarity was expressed as the Tanimoto coefficient. TeS chemicals with an average similarity lower than a given threshold were considered too remote from the TrS and the associated predictions were flagged as unreliable.
-
Confidence estimation (CE): probabilities associated with predictions returned by each model were used to identify predictions characterized by high reliability. Probabilities have a different meaning depending on the technique applied (e.g., for RF the probability is equal to the percentage of trees returning the same prediction), and lower values indicate smaller distances to the decision boundary and therefore a potentially higher error probability. Predictions with probabilities lower than a given threshold were considered unreliable.
ND and CE strategies were compared by evaluating the variation in global performance (MCC and BA) of each model when unreliable TeS predictions were excluded from the statistical evaluation. The combination of the two strategies was also evaluated by excluding predictions deemed unreliable by either the ND or CE.
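The two reliability filters can be sketched as follows, assuming fingerprints are represented as sets of on-bit indices and that each prediction comes with the probability of the predicted class. The function names are hypothetical; the thresholds are left as parameters because the text varies them rather than fixing a value.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_neighbor_similarity(query, training, k=5):
    """Average similarity between a test compound and its k closest analogues."""
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def reliable_mask(similarities, probabilities, nd_threshold, ce_threshold):
    """Combined filter: reliable only if neither ND nor CE flags the prediction."""
    return [s >= nd_threshold and p >= ce_threshold
            for s, p in zip(similarities, probabilities)]


def coverage(mask):
    """Fraction of test set predictions retained as reliable."""
    return sum(mask) / len(mask)
```

Setting one of the two thresholds to 0.0 disables that filter, so the same function covers ND alone, CE alone, and their combination.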
QSAR models for screening adversities
The QSAR models developed to predict binding to MIE-associated proteins were utilized to identify chemicals known to correlate with the five adversities under study. The Comparative Toxicogenomics Database (CTD) was queried for chemicals reported to be correlated with five diseases (i.e., markers of diseases) aligned with the five adversities: ‘Non-alcoholic Fatty Liver Disease’ (NAFLD) for STE, ‘Intrahepatic Cholestasis’ for CHO, ‘Cognition Disorders’ for CFD, ‘Neural Tube Defects’ for NTC, and ‘Chronic Kidney Failure’ for KF. The CTD is a comprehensive collection of scientific literature linking chemical exposure with genetic, molecular, and biological outcomes (e.g., diseases) curated manually using controlled vocabularies and ontologies [15]. The SMILES notation for these chemicals was retrieved and curated using the same semi-automated procedure applied to prepare the datasets of the models [18]. A curated collection of known markers of diseases was predicted using the developed QSAR models. The lists of chemical markers associated with diseases and associated MIE predictions are included in the Supporting Material (File 2, Table S4). To verify the utility of QSAR models to identify chemicals potentially leading to adversities, the variation in the relative abundance of markers was analysed with respect to the number of active QSAR predictions for MIEs upstream of each of the five adversities. The markers for remaining diseases were iteratively treated as decoys.
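The comparison of marker abundance before and after screening can be illustrated with a toy example. The records below are invented purely to show the calculation and do not come from the CTD or from the study results; the field names `is_marker` and `mie_active` are hypothetical.

```python
def marker_percentage(chemicals):
    """Percentage of chemicals that are markers of the adversity."""
    if not chemicals:
        return 0.0
    return 100.0 * sum(c["is_marker"] for c in chemicals) / len(chemicals)


def screen_by_active_mies(chemicals, min_active):
    """Keep chemicals with at least min_active active MIE predictions."""
    return [c for c in chemicals
            if sum(c["mie_active"].values()) >= min_active]


# Toy data (made up): one marker and three decoys, three MIE models.
toy = [
    {"is_marker": True,  "mie_active": {"ACE": True,  "COX1": True,  "AT1R": True}},
    {"is_marker": False, "mie_active": {"ACE": True,  "COX1": True,  "AT1R": True}},
    {"is_marker": False, "mie_active": {"ACE": True,  "COX1": False, "AT1R": False}},
    {"is_marker": False, "mie_active": {"ACE": False, "COX1": False, "AT1R": False}},
]

before = marker_percentage(toy)
subset = screen_by_active_mies(toy, min_active=3)
after = marker_percentage(subset)
print(before, after)  # 25.0 50.0
```

An increase from `before` to `after` corresponds to the enrichment of markers among chemicals with multiple active MIE predictions.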
Results
Model performance
The average BA over 100 training-test splits was used to rank the models and to select the single best ones for each investigated target. Performance of the best models, along with optimal parameters and modeling methods, is presented in Table 2. A comprehensive overview of the performance achieved by all the developed models is provided in the Supporting Material (File 2, Table S5) while optimal models for each MIE are provided as a KNIME workflow at https://github.com/DGadaleta88/MIE_QSAR.
Overall, the results reported in Table 2 support the effectiveness of the approach employed, as the majority of the developed models yielded high statistics in terms of overall predictivity and balanced capability to predict samples from both the active and not active categories. Nearly all the best models for each MIE achieved an average balanced accuracy (BA avg) in external validation exceeding 0.80, with some targets surpassing 0.90, particularly those with larger datasets. Conversely, endpoints with a lower BA avg typically had fewer validation samples (fewer than 50), as observed in the cases of BSEP, MRP4, and TTR.
The amount of data is a crucial factor to consider when evaluating the models’ capability to predict samples accurately from both activity classes without generating excess false positives or false negatives. Models with fewer test samples, such as AHR, MRP4, and FGF, exhibited a lower balance between sensitivity (SEN) and specificity (SPE), with some endpoints returning values lower than 0.75 for one of the two parameters. The imbalance in datasets and the limited number of samples in one of the activity classes for training the ML models could explain this behaviour. Notably, these cases often refer to models that were neither BRF nor SMOTE-based, underscoring the importance of artificial data balancing methods for achieving better predictive performance from unbalanced data.
Stability checks performed by averaging statistical parameters across multiple training-validation splits confirmed the consistency of predictive performance across various splits without being biased by favourable partitioning. The low standard deviations associated with the average statistical parameters (typically lower than 0.10) indicated stability across iterations, except for endpoints with a low number of validation samples in one of the activity categories, where variability in statistical parameters was more pronounced (e.g., MRP4, FGF, SMO, WNT, and CYP26).
Figure 4 provides a graphical breakdown of the performance (average SEN and SPE) of the top five models preliminarily identified from the initial training-test splits subjected to the iterative stability check, highlighting differences in the effectiveness of ML methods and features applied. Targets associated with NTC and STE consistently exhibited good predictive performance, whereas targets related to CFD, KF, and CHO demonstrated variable performance (Fig. 4A).
Comparison of model statistics for protein targets involved in MIEs. The top five preliminarily selected models were grouped based on A adversity associated with the predicted target (STE steatosis, CHO cholestasis, CFD cognitive functional defect, KF kidney failure, NTC neural tube closure), B ML method (BRF balanced random forest, GB gradient boosting, KNN k-nearest neighbors, MLP multi-layer perceptron, SVM support vector machine with radial basis function, XGB extreme gradient boosting), and C the features used in the model. On the left side of the figure, the scatter plots show the SPE and SEN of the models. Models applying SMOTE and VSURF are marked with stars, while the best models for each target based on average BA are circled in red. On the right side of the figure, the average statistics are summarized in box plots
When looking at both the preliminarily selected (Fig. 4B) and single best models for each target (Table 2), BRF emerged as the most successful ML method. Owing to its intrinsic capability to handle unbalanced data, the BRF generally does not benefit from SMOTE application. Conversely, XGB, GB, and SVM were also successful and were often associated with SMOTE rebalancing. The MLP and KNN models generally yielded lower statistics.
Regarding features, there is no clear-cut optimal solution. Dragon descriptors were slightly more frequent among the preliminarily selected models (Fig. 4C) than CDDD and fingerprints, although fingerprints consistently showed good predictivity (SEN and SPE higher than 0.60), whereas Dragon models were spread more widely across the SEN and SPE ranges. Fingerprint models were also the most frequent among the single best models (Table 2). BRF models were often coupled with Dragon descriptors and fingerprints, whereas CDDD was more effective when applied to SVM. Feature selection with VSURF did not appreciably affect the performance of the developed models and rarely appeared among the single best models (Table 2). It was notably relevant only for the SVM and CDDD models, which benefitted from feature selection in approximately 50% of the preliminarily selected models (Fig. 4C).
Evaluation of reliability of predictions
The reliability of the predictions generated by QSAR models for MIEs was defined using two orthogonal approaches based on similarity (ND) and probability (CE). ND involves calculating the average similarity of each TeS chemical with respect to its five closest neighbors in the TrS. In contrast, CE utilizes prediction probabilities to assess the reliability of each prediction. Predictions below a certain threshold are considered unreliable.
Figure 5 graphically depicts the variation in global performance parameters, namely BA and MCC, as the thresholds for the minimum accepted neighbor similarity or probability were gradually increased from 0.0 to 1.0. As expected, both ND and CE demonstrated that excluding unreliable predictions generally improved the predictive performances of the models. An overview of various statistical parameters with respect to the variation in thresholds is provided in the Supporting Material (File 2, Table S6). With a few exceptions in smaller datasets, CE resulted in better and more gradual discrimination of reliable predictions compared to the ND method. A sufficient number of training and test samples was crucial for generating stable models and observing clear data trends during validation. Indeed, the superior performance of the CE method was particularly evident in models characterized by a higher number of test samples, such as ACHE, COX1, HDEAC, PPARa, PPARg, and FGFR1. Conversely, models with fewer test samples, such as AHR, MRP4, FGF, and WNT, exhibited significant variability in performance, likely owing to the challenges of observing clear data trends with limited test data.
Variation of global performance after removing unreliable predictions resulting from the ‘novelty detection’ and the ‘confidence estimation’ methods. The variations in the BA and MCC parameters are represented by solid and dashed lines, respectively, whereas the variation in coverage (i.e., percentage of reliable test set predictions) is represented by dotted lines. Variation of parameters associated with the application of the ‘novelty detection’ are represented by red lines, while those associated with the ‘confidence estimation’ are represented by blue lines. The combination of the two approaches is represented by green lines. ND novelty detection, CE confidence estimation
Increasing both the similarity and probability thresholds naturally reduces the percentage of reliable predictions. The reduction in coverage was more gradual with CE, particularly for the datasets with fewer data points. For larger test sets, the difference between the two methods was less pronounced, allowing for a fairer comparison of the variation in statistics caused by the two approaches. Generally, the integration of the two methods either did not improve the performance returned by the best strategy for each model or was associated with an unacceptable decrease in coverage (such as for ACHE, BSEP, FGFR2, FGFR3, MRP4, TTR and VGSC).
Screening markers of adversities
The developed QSAR models for MIEs were applied to predict a dataset extracted from the CTD that included chemicals (‘markers’ of diseases) known to correlate with the five adversities under study. When considering each adversity, the markers of the remaining diseases were treated as decoys. Given the causative correlation between MIEs and adversities downstream of the specific AOP, a higher number of active MIE responses were hypothesized to be associated with markers for a disease.
While the variation in the percentage of markers does not fully justify using this strategy to screen chemicals for potential adversities, a trend emerged favoring chemicals characterized by a high number of active MIE predictions upstream of each adversity. In particular, the portion of the dataset characterized by multiple active predictions generally showed a higher relative percentage of markers than the original list of chemicals. Notably, an increase in the percentage of markers was observed for KF (from 16 to 25% for chemicals predicted to be simultaneously active for ACE, COX1, and AT1R), CFD (from 52 to 67% for chemicals active for the six relevant MIEs), CHO (from 20 to 25% for chemicals active for both BSEP and MRP4), and STE (from 17 to 28% for chemicals active for all four MIEs). No favourable trends were observed for chemicals with multiple active predictions for the MIEs of the NTC (Fig. 6).
Fig. 6 Variation in the abundance of the markers of adversities after screening CTD chemicals using MIE predictions. For each adversity, the percentage of markers was compared between the complete dataset compiled from the CTD (bottom bars, lighter colours) and the subset of the dataset characterized by multiple positive MIE predictions upstream of adversities (top bars, darker colours). Red bars refer to markers of the specific adversity, while blue bars refer to decoys. CHO cholestasis, STE steatosis, NTC neural tube closure, CFD cognitive functional defects, KF kidney failure
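The comparison of marker percentages before and after screening can be sketched as follows. The data layout (a list of dictionaries with an `is_marker` flag and per-MIE binary predictions), the function names, and the six toy chemicals are assumptions made for illustration; they do not reproduce the CTD dataset or the percentages reported above.

```python
def marker_percentage(chemicals):
    """Percentage of chemicals that are markers of the adversity of interest."""
    if not chemicals:
        return 0.0
    return 100.0 * sum(1 for c in chemicals if c["is_marker"]) / len(chemicals)

def screen(chemicals, mies, min_active):
    """Keep chemicals predicted active (1) for at least `min_active` of `mies`."""
    return [c for c in chemicals
            if sum(c["predictions"].get(m, 0) for m in mies) >= min_active]

# Hypothetical chemicals; the remaining (non-marker) entries play the role of decoys.
chemicals = [
    {"name": "chem_A", "is_marker": True,  "predictions": {"ACE": 1, "COX1": 1, "AT1R": 1}},
    {"name": "chem_B", "is_marker": False, "predictions": {"ACE": 1, "COX1": 1, "AT1R": 1}},
    {"name": "chem_C", "is_marker": True,  "predictions": {"ACE": 1, "COX1": 0, "AT1R": 0}},
    {"name": "chem_D", "is_marker": False, "predictions": {"ACE": 0, "COX1": 0, "AT1R": 0}},
    {"name": "chem_E", "is_marker": False, "predictions": {"ACE": 0, "COX1": 1, "AT1R": 0}},
    {"name": "chem_F", "is_marker": False, "predictions": {"ACE": 0, "COX1": 0, "AT1R": 1}},
]

kf_mies = ["ACE", "COX1", "AT1R"]  # MIEs named in the text for kidney failure
baseline = marker_percentage(chemicals)
screened = marker_percentage(screen(chemicals, kf_mies, min_active=len(kf_mies)))
print(f"markers in full set: {baseline:.1f}%, after screening: {screened:.1f}%")
```

Requiring all listed MIEs to be active is one possible reading of "multiple active predictions"; lowering `min_active` trades a smaller enrichment for a larger retained subset.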
Discussion
The development and validation of QSAR models for the prediction of chemical toxicity is crucial for reducing reliance on animal testing. In this study, a comprehensive approach was developed and employed to model the disruption of various proteins relevant to the initiation of organ-related adverse outcomes. Diverse feature sets and ML algorithms were used, with the results demonstrating the robustness and generalizability of the proposed methodology, as evidenced by the high BA avg across most models, particularly for targets with larger datasets and a less pronounced imbalance between activity categories.
As suggested by previous findings [4, 16, 17, 18, 26], the consistent performance of BRF, particularly when coupled with Dragon descriptors, highlights the effectiveness of this method in handling unbalanced datasets and its capability to capture the complexity and variability inherent in biological data. The use of SMOTE to balance datasets proved beneficial for most ML methods, particularly the SVM and XGB approaches, which otherwise struggle with unbalanced data. Conversely, models built on a reduced set of descriptors obtained through feature selection did not perform appreciably better than those built on the entire pool, suggesting that the ML methods used perform relatively well ‘off-the-shelf’.
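To make the balancing step concrete, the following is a simplified, pure-Python sketch of the interpolation idea behind SMOTE [10]: each synthetic minority sample is placed at a random position on the segment joining a minority sample and one of its nearest minority neighbours. It is not the implementation used in the study; the function name, the parameters `k` and `seed`, and the toy descriptor vectors are assumptions, and in practice a maintained library implementation would normally be preferred.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic samples from `minority` (list of feature lists).

    Simplified illustration: requires at least two minority samples and uses
    squared Euclidean distance to rank neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy two-descriptor dataset: 4 actives (minority) versus 10 inactives (majority).
actives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_inactives = 10

new_actives = smote_sketch(actives, n_new=n_inactives - len(actives))
balanced_actives = actives + new_actives
print(len(balanced_actives), "actives after oversampling")
```

Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the region already spanned by the minority class rather than being arbitrary noise.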
Despite the generally good performance disclosed by external validation, the variability in SEN and SPE and the general discrepancies in model performance for targets with fewer validation samples underscore the challenges posed by limited data and the potential for overfitting in such scenarios. In this regard, both expanding the biological coverage of HTS assays in terms of molecular targets and generating new, high-quality bioactivity data may increase the predictivity of the ML models.
Considering the limitations associated with the availability and diversity of training data, identifying less reliable predictions is of utmost importance. Both ND and CE have already been proposed to define the applicability domain of classification models [24, 31] because of their capability to remove unreliable predictions and generally improve model performance, with each strategy also bringing its own specific advantages. In particular, the models are not expected to generate reliable predictions for chemicals that are significantly different from those in the TrS. ND recognizes this aspect, underscoring the necessity of carefully checking the inclusion of test compounds in the chemical space covered during model development to ensure robust and reliable predictions independently of the classifier used. Conversely, CE uses information from the underlying classifier to identify the most uncertain predictions generated for chemicals falling close to the model’s decision threshold. This method also offers the significant advantage of quantifying the confidence level of each prediction, adding a layer of decision-making that goes beyond a simple binary classification. This is potentially relevant for various applications, such as probabilistic hazard and risk assessment; indeed, understanding the probability associated with each prediction can help prioritize chemicals for further testing.
Based on these results, CE provided a better balance between the improvement of predictive performance and coverage, confirming the findings of previous studies [19, 21]. This method appears to be more useful for identifying reliable predictions, especially in regulatory contexts where the accuracy of predictions is critical. In this regard, a probability threshold of 0.70 represents a good compromise in terms of increased performance and loss of predictions for the models evaluated, while lower thresholds (e.g., 0.60) are likely more suitable for ND because of the more drastic reduction in coverage at higher thresholds.
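A minimal sketch of how the two reliability checks could be applied to a single prediction with the thresholds suggested above is given below. Several details are assumptions rather than statements about the published workflow: fingerprints are represented as sets of ‘on’ bit indices, ND uses the Tanimoto similarity to the single closest training compound, and CE uses the probability of the predicted class in a binary classifier. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def nd_reliable(query_fp, training_fps, min_similarity=0.60):
    """Novelty detection: reliable if the closest training compound is similar enough."""
    return max(tanimoto(query_fp, fp) for fp in training_fps) >= min_similarity

def ce_reliable(prob_active, min_probability=0.70):
    """Confidence estimation: reliable if the predicted class is probable enough."""
    return max(prob_active, 1.0 - prob_active) >= min_probability

# Toy training fingerprints and two query compounds (illustrative only).
training_fps = [{1, 2, 3, 4}, {2, 3, 5}]
similar_query = {1, 2, 3}   # shares 3 of 4 bits with the first training compound
novel_query = {7, 8}        # shares no bits with any training compound

print(nd_reliable(similar_query, training_fps))  # inside the covered chemical space
print(nd_reliable(novel_query, training_fps))    # outside the covered chemical space
print(ce_reliable(0.85), ce_reliable(0.45))      # confident vs. near the decision threshold
```

Keeping the two checks as separate functions makes it straightforward to apply either one alone or to require both, which corresponds to the combined strategy evaluated in the Results.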
To investigate the practical utility of MIE predictions for screening chemicals with potential systemic adverse effects, the developed models were applied to a set of chemicals, including known markers of adversities downstream of the MIEs. Despite the general increase in the percentage of markers for the subset of data returning positive MIE predictions compared with the entire dataset, this increase was often modest. The strategy applied suffers from a few limitations that explain the moderate results. Notably, the proteins covered by the models represent only a subset of the MIEs potentially involved in the onset of the adversities under investigation, implying that the current set of QSAR models can only capture a portion of the mechanisms leading to the adversities, and some markers might not be correctly identified because of the incomplete coverage of the AOPs. Another aspect to keep in mind is that the markers for the remaining adversities were used as decoys in the analysis, rather than chemicals explicitly reported as negative. This approach introduces a degree of uncertainty in the activity assignment of negative samples, as these decoys may not be truly inactive and could possess some unreported or undetected adverse effects, potentially leading to misclassification.
Despite these limitations, the results demonstrated that predictions from the QSAR models could provide hints about the presence of chemicals with potential adverse effects. This finding underscores the utility of computational predictions in prioritizing chemicals for further testing, especially when integrated with other non-animal methods such as in vitro and HTS assays. Such an integrated approach could enhance the accuracy of the screening process, allowing for more reliable identification of hazardous substances.
Conclusion
This study demonstrates the utility of QSAR modeling as a valuable tool for assessing the potential effects on specific protein targets associated with organ-specific toxicities. By leveraging multiple ML algorithms and chemical descriptors, as well as feature selection techniques and data balancing methods, robust models have been developed to accurately predict compound activity across various targets. The consistency of the predictive performance observed across multiple TrS-TeS splits underscores the robustness of the procedure, as well as its versatility and potential utility across a broad spectrum of endpoints. The implementation of strategies for identifying and excluding unreliable predictions further enhances the practical utility of the models.
Overall, the results highlight the utility of computational models and the suitability of the statistical methods used to provide evidence of adverse effects in a more time- and resource-efficient manner than traditional testing methods. In particular, QSAR models represent an ideal and valuable first tier of integrated frameworks because of their capability to rapidly screen large numbers of chemicals and prioritize compounds for additional experimental testing based on their potential adverse effects. The predictions of MIEs provided by QSARs may provide indications of which assays to prioritise among the battery of in vitro tests available for assessing chemical toxicity.
Exploring the integration of QSAR models with other experimental approaches, such as HTS and HCS assays, is an essential future step to enable a more comprehensive and holistic assessment of the biological activity. This integrative approach leverages the strengths of each method to provide more accurate and reliable predictions, thereby facilitating informed decision-making in chemical risk assessment.
Importantly, the QSARs are built on AOP knowledge that guarantees a strong mechanistic basis for fulfilling regulatory requirements [32]. By modeling the interactions between chemicals and these specific targets, QSAR models provide valuable insights into the molecular mechanisms driving adverse outcomes, although they currently capture only part of the mechanisms owing to incomplete AOP coverage. Despite this limitation, QSAR predictions provide a scientifically sound rationale for the identification and mitigation of potential hazards, with the potential to further expand their mechanistic coverage as more AOP knowledge and training data become available.
Future work should focus on expanding the size and quality of the datasets and on exploring additional data and feature types, which could enhance model interpretability and further improve the robustness and applicability of QSAR models in a regulatory context.
Availability of data and materials
QSAR models developed for each MIE are implemented as a KNIME workflow and are available at https://github.com/DGadaleta88/MIE_QSAR. The datasets supporting the conclusions of this article are included within the article and its additional files. Additional File 1 (XLSX): Curated SMILES of chemicals included in each MIE dataset. Additional File 2 (XLSX): • Table S1: DrugBank chemicals used to build the reference chemical space. • Table S2: ECHA registered chemicals used to build the reference chemical space. • Table S3: Natural Product Atlas (NPA) chemicals used to build the reference chemical space. • Table S4: Lists of chemical markers from the Comparative Toxicogenomics Database (CTD) associated with diseases. • Table S5: Validation performance and optimized parameters of the best five models for each target associated with MIEs. • Table S6: Variation of models’ performance after removing unreliable test predictions.
Abbreviations
- ACHE: Acetylcholinesterase
- AHR: Aryl hydrocarbon receptor
- AO: Adverse outcome
- AOP: Adverse outcome pathway
- BA: Balanced accuracy
- BA avg: Average balanced accuracy
- BMP: Bone morphogenetic proteins
- BSEP: Bile-salt export pump
- CE: Confidence estimation
- CAD: Dihydroorotase
- CDDD: Continuous data-driven descriptors
- CFD: Cognitive functional defects
- CHO: Cholestasis
- CTD: Comparative toxicogenomics database
- HDeac: Histone deacetylase
- CYP26: Cytochrome P450 26A1
- FGF: Fibroblast growth factors
- FGFR: Fibroblast growth factor receptor
- FRIZ: Frizzled
- GB: Gradient boosting
- HTS: High-throughput screening
- HCS: High-content screening
- KE: Key event
- KF: Kidney failure
- KNN: K-nearest neighbors
- LXR: Liver X receptor
- MIE: Molecular initiating event
- ML: Machine learning
- MCC: Matthews correlation coefficient
- MLP: Multilayer perceptron artificial neural network
- MRP: Multidrug resistance-associated protein
- NAFLD: Non-alcoholic fatty liver disease
- NAM: New approach methodology
- ND: Novelty detection
- NIS: Na+/I− symporter
- NMDA: N-methyl-d-aspartic acid receptor
- NOG: Noggin
- NRF2: Nuclear factor erythroid 2-related factor 2
- NTC: Neural tube closure
- NTCP: Na+/taurocholate co-transporting polypeptide
- OECD: Organisation for Economic Cooperation and Development
- PXR: Pregnane X receptor
- PPAR: Peroxisome proliferator activated receptor
- QSAR: Quantitative structure–activity relationships
- RF: Random forest
- RYR: Ryanodine receptor
- SEN: Sensitivity
- SMOTE: Synthetic minority oversampling technique
- SPE: Specificity
- SHH: Sonic Hedgehog signaling molecule
- SMO: Smoothened
- STE: Steatosis
- SVM: Support vector machine
- TeS: Test set
- THR: Thyroid hormone receptors
- TPO: Thyroperoxidase
- TrS: Training set
- TTR: Transthyretin
- VGSC: Voltage-gated sodium channels
- WNT: Protein Wnt
- XGB: Extreme gradient boosting
References
Allen TE, Goodman JM, Gutsell S, Russell PJ (2016) A history of the molecular initiating event. Chem Res Toxicol 29(12):2060–2070
Allen TE, Goodman JM, Gutsell S, Russell PJ (2019) Quantitative predictions for molecular initiating events using three-dimensional quantitative structure-activity relationships. Chem Res Toxicol 33(2):324–332
Ankley GT, Bennett RS, Erickson RJ et al (2010) Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment. Env Toxicol Chem 29(3):730–741
Baderna D, Gadaleta D, Lostaglio E et al (2020) New in silico models to predict in vitro micronucleus induction as marker of genotoxicity. J Haz Mat 385:121638
Barnes DA, Firman JW, Belfield SJ, Cronin MTD, Vinken M, Janssen MJ, Masereeuw R (2024) Development of an adverse outcome pathway network for nephrotoxicity. Arch Toxicol 98(24):1–14
Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Krüger FA, Light Y, Mak L, McGlinchey S, Nowotka M, Papadatos G, Santos R, Overington JP (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res 42(D1):D1083–D1090
Berthold MR, Cebron N, Dill F et al (2008) KNIME: The Konstanz information miner. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 319–326
Beisken S, Meinl T, Wiswedel B, de Figueiredo LF, Berthold M, Steinbeck C (2013) KNIME-CDK: workflow-driven cheminformatics. BMC Bioinformatics 14:1–4
Bosc N, Atkinson F, Felix E, Gaulton A, Hersey A, Leach AR (2019) Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J Cheminform 11:1–16
Chawla NV, Bowyer KW, Hall LO et al (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Chen C, Liaw A (2004) Using random forest to learn imbalanced data. University of California, Berkeley, CA
Cook D, Brown D, Alexander R, March R, Morgan P, Satterthwaite G, Pangalos MN (2014) Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework. Nat Rev Drug Discovery 13(6):419–431
Cronin MT, Richarz AN (2017) Relationship between adverse outcome pathways and chemistry-based in silico models to predict toxicity. Appl Vitro Toxicol 3(4):286–297
Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP (2015) ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res 43(W1):W612–W620
Davis AP, Wiegers TC, Johnson RJ, Sciaky D, Wiegers J, Mattingly CJ (2023) Comparative toxicogenomics database (CTD): update 2023. Nucleic Acids Res 51(D1):D1257–D1262
Delre P, Lavado G, Lamanna G et al (2022) Ligand-based prediction of hERG-mediated cardiotoxicity based on the integration of different machine learning techniques. Front Pharmacol 13:951. https://doi.org/10.3389/fphar.2022.951083
Gadaleta D, Manganelli S, Roncaglioni A et al (2018) QSAR modeling of toxcast assays relevant to the molecular initiating events of AOPs leading to hepatic steatosis. J Chem Inf Model 58:1501–1517
Gadaleta D, Lombardo A, Toma C, Benfenati E (2018) A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications. J Cheminform 10(1):1–13
Gadaleta D, Marzo M, Toropov A, Toropov A, Lavado GJ, Escher SE, Dorne JLC, Benfenati E (2020) Integrated in silico models for the prediction of no-observed-(adverse)-effect levels and lowest-observed-(adverse)-effect levels in rats for sub-chronic repeated-dose toxicity. Chem Res Toxicol 34(2):247–257
Gadaleta D, Spînu N, Roncaglioni A, Cronin MT, Benfenati E (2022) Prediction of the neurotoxic potential of chemicals based on modelling of molecular initiating events upstream of the adverse outcome pathways of (developmental) neurotoxicity. Int J Mol Sci 23(6):3053
Garcia de Lomana M, Weber AG, Birk B, Landsiedel R, Achenbach J, Schleifer KJ, Mathea M, Kirchmair J (2020) In silico models to predict the perturbation of molecular initiating events related to thyroid hormone homeostasis. Chem Res Toxicol 34(2):396–411
Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit Lett 31:2225–2236
Heusinkveld HJ, Staal YC, Baker NC, Daston G, Knudsen TB, Piersma A (2021) An ontology for developmental processes and toxicities of neural tube closure. Reprod Toxicol 99:160–167
Klingspohn W, Mathea M, Ter Laak A, Heinrich N, Baumann K (2017) Efficiency of different measures for defining the applicability domain of classification models. J Cheminform 9:1–17
Kramer NI, Hoffmans Y, Wu S, Thiel A, Thatcher N, Allen TEH, Levorato S, Traussnig H, Schulte S, Boobis A, Rietjens IMCM, Vinken M (2019) Characterizing the coverage of critical effects relevant in the safety evaluation of food additives by AOPs. Arch Toxicol 93:2115–2125
Lavado GJ, Gadaleta D, Toma C et al (2020) Zebrafish AC50 modelling: (Q)SAR models to predict developmental toxicity in zebrafish embryo. Ecotoxicol Environ Saf 202:110936
La Valle SM, Branicky MS, Lindemann SR (2004) On the relationship between classical grid search and probabilistic roadmaps. Int J Robotics Res 23:673–692
Leist M, Ghallab A, Graepel R, Marchan R, Hassan R, Bennekou SH, Limonciel A, Vinken M, Schildknecht S, Waldmann T, Danen E, van Ravenzwaay B, Kamp H, Gardner I, Godoy P, Bois FY, Braeuning A, Reif R, Oesch F, Drasdo D, Höhme S, Schwarz M, Hartung T, Braunbeck T, Beltman J, Vrieling H, Sanz F, Forsby A, Gadaleta D, Fisher C, Kelm J, Fluri D, Ecker G, Zdrazil B, Terron A, Jennings P, Burg BVD, Dooley S, Meijer AH, Willighagen E, Martens M, Evelo C, Mombelli E, Taboureau O, Mantovani A, Hardy B, Koch B, Escher S, van Thriel C, Cadenas C, Kroese D, Water BVD, Hengstler JG (2017) Adverse outcome pathways: opportunities, limitations and open questions. Arch Toxicol 91:3477–3505
Li J, Settivari R, LeBaron MJ, Marty MS (2019) An industry perspective: a streamlined screening strategy using alternative models for chemical assessment of developmental neurotoxicity. Neurotoxicology 73:17–30
Luechtefeld T, Hartung T (2017) Computational approaches to chemical hazard assessment. Altex 34(4):459
Mathea M, Klingspohn W, Baumann K (2016) Chemoinformatic classification methods and their applicability domain. Mol Inform 35(5):160–180
OECD (2014) Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models, OECD Series on Testing and Assessment, No. 69. OECD Publishing, Paris
OECD (2017) Guidance document on developing and assessing adverse outcome pathways. OECD Publishing, Paris
Patlewicz G, Simon TW, Rowlands JC, Budinsky RA, Becker RA (2015) Proposing a scientific confidence framework to help support the application of adverse outcome pathways for regulatory purposes. Regul Toxicol Pharmacol 71(3):463–477
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Ram RN, Gadaleta D, Allen TE (2022) The role of ‘big data’ and ‘in silico’ New Approach Methodologies (NAMs) in ending animal use–a commentary on progress. Computational Toxicology 23:100232
Schaffer C (1993) Selecting a classification method by cross-validation. Mach Learn 13:135–143
Seo M, Chae CH, Lee Y, Kim HR, Kim J (2021) Novel QSAR models for molecular initiating event modeling in two intersecting adverse outcome pathways based pulmonary fibrosis prediction for biocidal mixtures. Toxics 9(3):59
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems. Curran Associates Inc, New York.
Spielmann H, Gerbracht U (2001) The use of dogs as second species in regulatory testing of pesticides: Part II: subacute, subchronic and chronic studies in the dog. Arch Toxicol 75(1):1–21
Spinu N, Bal-Price A, Cronin MT, Enoch SJ, Madden JC, Worth AP (2019) Development and analysis of an adverse outcome pathway network for human neurotoxicity. Arch Toxicol 93(10):2759–2772
Todeschini R, Consonni V (2008) Handbook of molecular descriptors, vol 11. John Wiley & Sons, Hoboken
Tollefsen KE, Scholz S, Cronin MT, Edwards SW, de Knecht J, Crofton K, Garcia-Reyero N, Hartung T, Worth A, Patlewicz G (2014) Applying adverse outcome pathways (AOPs) to support integrated approaches to testing and assessment (IATA). Regul Toxicol Pharmacol 70(3):629–640
van Ertvelde J, Verhoeven A, Maerten A, Cooreman A, Santos Rodrigues BD, Sanz-Serrano J, Mihajlovic M, Tripodi I, Teunis M, Jover R, Luechtefeld T, Vanhaecke T, Jiang J, Vinken M (2023) Optimization of an adverse outcome pathway network on chemical-induced cholestasis using an artificial intelligence-assisted data collection and confidence level quantification approach. J Biomed Inform 145:104465
van Santen JA, Poynton EF, Iskakova D, McMann E, Alsup TA, Clark TN, Fergusson CH, Fewer DP, Hughes AH, McCadden CA, Parra J, Soldatou S, Rudolf JD, Janssen EML, Duncan KR, Linington RG (2022) The natural products Atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res 50(D1):D1317–D1323
Verhoeven A, van Ertvelde J, Boeckmans J, Gatzios A, Jover R, Lindeman B, Lopez-Soop G, Rodrigues RM, Rapisarda A, Sanz-Serrano J, Stinckens M, Sepehri S, Teunis M, Vinken M, Jiang J, Vanhaecke T (2024) A quantitative weight-of-evidence method for confidence assessment of adverse outcome pathway networks: a case study on chemical-induced liver steatosis. Toxicol 505:153814
Viganò EL, Ballabio D, Roncaglioni A (2024) Artificial intelligence and machine learning methods to evaluate cardiotoxicity following the adverse outcome pathway frameworks. Toxics 12(1):87
Villeneuve DL, Crump D, Garcia-Reyero N et al (2014) Adverse outcome pathway (AOP) development I: strategies and principles. Toxicol Sci 142(2):312–320
Vinken M, Pauwels M, Ates G, Vivier M, Vanhaecke T, Rogiers V (2012) Screening of repeated dose toxicity data present in SCC(NF)P/SCCS safety evaluations of cosmetic ingredients. Arch Toxicol 86:405–412
Vinken M (2013) The adverse outcome pathway concept: a pragmatic tool in toxicology. Toxicology 312:158–165
Vinken M, Knapen D, Vergauwen L, Hengstler JG, Angrish M, Whelan M (2017) Adverse outcome pathways: a concise introduction for toxicologists. Arch Toxicol 91:3697–3707
Winter R, Montanari F, Noé F, Clevert D-A (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 10:1692–1701
Wade C, Glynn K (2020) Hands-On Gradient Boosting with XGBoost and scikit-learn: perform accessible machine learning and extreme gradient boosting with Python. Packt Publishing Ltd., Birmingham
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34(1):D668–D672
Acknowledgements
The authors acknowledge Julen Sanz-Serrano and Job Berkhout for their useful feedback.
Funding
This work was performed in the context of the ONTOX project (https://ontoxproject.eu/) that has received funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 963845. ONTOX is part of the ASPIS project cluster (https://aspiscluster.eu/).
Author information
Contributions
DG: Conceptualization, Methodology, Data Collection, Data Analysis, Writing – Original Draft. MGL, ESC, ROV: Methodology, Writing – Review & Editing. AR, RG, EB: Supervision, Writing – Review & Editing, Funding Acquisition.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
13321_2024_917_MOESM2_ESM.xlsx
Additional file 2: Table S1. DrugBank chemicals used to build the reference chemical space. Table S2. ECHA registered chemicals used to build the reference chemical space. Table S3. Natural Product Atlas (NPA) chemicals used to build the reference chemical space. Table S4. Lists of chemical markers from the Comparative Toxicogenomic Database (CTD) associated with diseases. Table S5. Validation performance and optimized parameters of the best five models for each target associated with MIEs. Table S6. Variation of models’ performance after removing unreliable test predictions
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gadaleta, D., Garcia de Lomana, M., Serrano-Candelas, E. et al. Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity. J Cheminform 16, 122 (2024). https://doi.org/10.1186/s13321-024-00917-x
DOI: https://doi.org/10.1186/s13321-024-00917-x