Fig. 1 | Journal of Cheminformatics

From: Evaluating the generalizability of graph neural networks for predicting collision cross section

A Model architecture. The upper section of this figure illustrates how the SMILES representation of a molecule is converted into a molecular graph, which is then represented by three matrices: an adjacency matrix, an edge attributes matrix, and a node attributes matrix. These matrices are fed into a GNN. The GNN output is concatenated with the output of a linear model that takes additional features (such as adduct and instrument type) as input. The concatenated vector is then passed through another set of fully connected layers, which output a CCS value. B Evaluation schema. Each database is split into a training set (80%) and a test set (20%) based on molecule type (e.g., lipid, small molecule) and Murcko scaffold. Each model is then trained on the training set of one database (CCSBase train or METLIN-CCS train) and evaluated on the test sets of both databases (CCSBase test and METLIN-CCS test). When the model is evaluated on the test set of the database it was trained on, it has already seen similar molecules, so the evaluation covers a similar chemical space (left). When the model is evaluated on a test set containing dissimilar molecules, the evaluation covers a novel chemical space (middle). Finally, the two databases are also combined for training and testing (right).
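The data flow in panel A can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the message-passing rule (sum of messages from bonded neighbours followed by ReLU), the sum-pooling readout, the layer sizes, the dense per-atom-pair edge-attribute layout and the one-hot adduct encoding are all hypothetical, as are the names `ToyCCSModel`, `matvec` and `rand_matrix`. The weights are random, so the returned number is not a meaningful CCS value.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix w (out_dim x in_dim) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyCCSModel:
    """Hypothetical sketch: one GNN layer + linear branch + fully connected head.

    Weights are random (untrained); layer choices are assumptions, not the paper's.
    """

    def __init__(self, n_node_feat: int, n_edge_feat: int, n_extra_feat: int,
                 hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_self = rand_matrix(hidden, n_node_feat, rng)
        # Message weights act on [neighbour node features ++ bond features].
        self.w_msg = rand_matrix(hidden, n_node_feat + n_edge_feat, rng)
        # Linear model for the additional features (adduct, instrument type, ...).
        self.w_extra = rand_matrix(hidden, n_extra_feat, rng)
        self.w_fc = rand_matrix(hidden, 2 * hidden, rng)
        self.w_out = rand_matrix(1, hidden, rng)

    def forward(self, adj: Matrix, edge_attr: List[Matrix], node_attr: Matrix,
                extra: Vector) -> float:
        n = len(node_attr)
        states = []
        for i in range(n):
            h = matvec(self.w_self, node_attr[i])
            for j in range(n):
                if adj[i][j]:  # aggregate messages from bonded neighbours only
                    msg = matvec(self.w_msg, node_attr[j] + edge_attr[i][j])
                    h = [a + b for a, b in zip(h, msg)]
            states.append(relu(h))
        # Sum pooling over atoms gives a graph vector independent of atom order.
        graph_vec = [sum(col) for col in zip(*states)]
        extra_vec = matvec(self.w_extra, extra)
        # Concatenate both branches and pass through the fully connected head.
        z = relu(matvec(self.w_fc, graph_vec + extra_vec))
        return matvec(self.w_out, z)[0]


# Toy heavy-atom graph of ethanol (C-C-O); node features are one-hot [C, O].
node_attr = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Dense edge attributes: [1.0] for a single bond, [0.0] where no bond exists.
edge_attr = [[[float(adj[i][j])] for j in range(3)] for i in range(3)]
extra = [1.0, 0.0]  # hypothetical one-hot adduct encoding, e.g. [M+H]+ vs [M+Na]+

model = ToyCCSModel(n_node_feat=2, n_edge_feat=1, n_extra_feat=2)
ccs = model.forward(adj, edge_attr, node_attr, extra)
```

Because the graph branch pools with a sum over atoms, renumbering the atoms of the input leaves the prediction unchanged. In practice a GNN library would replace the nested-list arithmetic.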

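The scaffold-based 80/20 split in panel B can be approximated with a group-wise split, assuming each record already carries its molecule type and a Murcko scaffold SMILES (for example, computed beforehand with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`). The per-type stratification, the greedy fill order, the seeded shuffle and the function name `scaffold_split` are assumptions for illustration; the paper's exact procedure may differ. The property that matters is that no scaffold appears in both sets, so the test set probes structures the model has not seen.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# (record id, molecule type, Murcko scaffold SMILES); assumption: scaffolds are
# precomputed, and acyclic molecules (e.g. many lipids) use the empty string.
Record = Tuple[str, str, str]


def scaffold_split(records: List[Record], test_frac: float = 0.2,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hypothetical group-wise split: every (molecule type, scaffold) group lands
    entirely in train or in test, and each molecule type contributes roughly
    test_frac of its records to the test set."""
    by_type: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for rec_id, mol_type, scaffold in records:
        by_type[mol_type][scaffold].append(rec_id)

    rng = random.Random(seed)
    train: List[str] = []
    test: List[str] = []
    for mol_type in sorted(by_type):
        groups = [by_type[mol_type][s] for s in sorted(by_type[mol_type])]
        rng.shuffle(groups)
        n_total = sum(len(g) for g in groups)
        n_test = 0
        for group in groups:
            # Fill the test set group by group until the target fraction is reached.
            if n_test < test_frac * n_total:
                test.extend(group)
                n_test += len(group)
            else:
                train.extend(group)
    return train, test


records = [
    ("m1", "small molecule", "c1ccccc1"),
    ("m2", "small molecule", "c1ccccc1"),
    ("m3", "small molecule", "c1ccncc1"),
    ("m4", "small molecule", "C1CCCCC1"),
    ("m5", "small molecule", "c1ccc2ccccc2c1"),
    ("l1", "lipid", ""),
    ("l2", "lipid", ""),
    ("l3", "lipid", "C1CCCC1"),
]
train_ids, test_ids = scaffold_split(records)
```

Running the same function on each database separately yields the training and test sets used in the left and middle panels; concatenating the per-database splits is one way to build the combined setting on the right.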