chapter seven

7 Unsupervised Learning: Repurposing Drugs, Curating Compounds, & Screening Fragments

This chapter covers

Key methods in unsupervised learning
Application of dimensionality reduction for drug repurposing
Clustering to facilitate the design of diverse and focused compound libraries
Density estimation for pharmacophore modeling
The fragment-based drug design paradigm

In the last three chapters, we applied supervised learning to molecular property prediction (e.g., solubility and metabolism) and QSAR model development. However, experiments in the life sciences are costly and time-consuming, so labels are scarce and the vast majority of available data remains unlabeled. Unsupervised learning lets us uncover hidden patterns and intrinsic structure in that data without labels. Unlike supervised learning, where models are trained on input-output pairs, unsupervised learning algorithms use only the inputs to identify the relationships and distributions that govern the data.
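To make "using only the inputs" concrete, here is a minimal sketch of k-means clustering in plain Python. It is an illustration under stated assumptions rather than a method this chapter prescribes: the six two-dimensional `descriptors` are hypothetical values standing in for compound features, and the helper `kmeans` is written here for illustration only. Note that no activity or property labels are ever passed to the algorithm.

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Group points (tuples of floats) into k clusters using only the inputs."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical 2D descriptors for six compounds; no labels are supplied.
descriptors = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(descriptors, k=2)
print(labels)
```

The first three compounds end up sharing one cluster index and the last three the other, even though the algorithm was never told which compounds belong together. Which group receives index 0 depends on the random initialization.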

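Through unsupervised learning, we will tackle three tasks in this chapter: dimensionality reduction, clustering, and density estimation. The sections are organized as follows: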

7.1 Dimensionality Reduction: Drug Repurposing

7.1.1 High-Throughput Screening Data

7.1.2 Self-Organizing Maps

7.1.3 Drug Repurposing

7.1.4 Uniform Manifold Approximation & Projection (UMAP)

7.2 Clustering: Curating Diverse Compound Libraries

7.2.1 Diversity and Focus

7.2.2 Combinatorial Libraries

7.2.3 Cluster-based Compound Selection

7.2.4 Dissimilarity-based Compound Selection

7.3 Density Estimation: Fragment-based Drug Discovery

7.3.1 Fragment-based Drug Discovery

7.3.2 Pharmacophore Modeling

7.3.3 Density Estimation

7.4 Summary

7.5 Exercises

7.6 References