Fig. 4
From: CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability

Visualization of the CardioGenAI framework applied to pimozide. In (A), the input molecule (pimozide), the 100 generated refined molecules, and the molecules in the training set of the transformer-based models (approximately 5 million datapoints) are projected into a physicochemical descriptor space reduced by principal component analysis (PCA). Pimozide is colored yellow, the generated refined compounds purple, and the training-set compounds red. The first two principal components explain 45.07% and 17.61% of the total variance, respectively. The framework identifies a region of physicochemical space occupied by compounds that are similar to pimozide yet, as shown in (B), are predicted to have substantially reduced activity against the hERG channel. Panel (B) shows the density of predicted hERG pIC50 values for the generated refined compounds, compared with the predicted value for pimozide. The generated compounds have a maximum predicted pIC50 of 6.00, a mean of 5.59, and a minimum of 4.64.
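The projection in (A) and the summary statistics in (B) can be illustrated with a minimal sketch. Everything below is an assumption rather than the authors' pipeline: the descriptor matrices are random stand-ins (the caption does not list which physicochemical descriptors were used), descriptor columns are z-scored before PCA, and the PCA is fit on the training set only and then used to project the query and generated molecules. The predicted pIC50 values are likewise synthetic. `fit_pca`, `project`, and `summarize` are hypothetical helper names. The explained-variance ratios are the quantity behind the caption's 45.07% and 17.61% figures; the sketch's own numbers will differ.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Fit PCA on a descriptor matrix X (rows = molecules, columns = descriptors).

    Columns are z-scored first (assumption: descriptors on very different
    scales, e.g. molecular weight vs. logP, would otherwise dominate).
    Returns the fitted parameters and the explained-variance ratio of each
    retained component.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    Z = (X - mean) / std
    # SVD of the centred, scaled data; rows of Vt are the principal axes,
    # sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    ratio = var / var.sum()  # fraction of total variance per component
    params = {"mean": mean, "std": std, "axes": Vt[:n_components]}
    return params, ratio[:n_components]


def project(params, X):
    """Project molecules onto the fitted principal axes (same scaling as the fit)."""
    Z = (X - params["mean"]) / params["std"]
    return Z @ params["axes"].T


def summarize(pic50):
    """Max, mean and min of a set of predicted pIC50 values, as reported for (B)."""
    values = np.asarray(pic50, dtype=float)
    return {"max": values.max(), "mean": values.mean(), "min": values.min()}


# Hypothetical stand-in data, NOT from the paper.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 8))                  # stand-in for the training set
query = rng.normal(size=(1, 8))                     # stand-in for pimozide
generated = query + 0.1 * rng.normal(size=(100, 8))  # stand-in for 100 refined molecules

params, ratio = fit_pca(train, n_components=2)
coords_query = project(params, query)
coords_gen = project(params, generated)
print(f"PC1 explains {ratio[0]:.2%}, PC2 explains {ratio[1]:.2%}")

pred_pic50 = rng.uniform(4.6, 6.0, size=100)  # synthetic predicted pIC50 values
stats = summarize(pred_pic50)
print(stats)
```

Fitting on the training set alone keeps the axes fixed regardless of which molecules are later generated; fitting on all three sets together is an equally plausible choice. For a training set of roughly 5 million molecules, an incremental or randomized PCA solver would likely be more practical than a full SVD.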