2 Ligand-based Screening: Filtering & Similarity Searching
This chapter covers
- Virtual screening taxonomy, with a focus on ligand-based screening.
- How to acquire, curate, visualize, and represent molecule datasets.
- Filtering out compounds with undesirable properties and substructures.
- Similarity searching to uncover antimalarial hit compounds.
After discussing how drug discovery and machine learning intersect to unearth novel therapeutics, we are ready to focus on specific components of the drug discovery pipeline. We begin our journey with virtual screening, the computational alternative to experimental high-throughput screening in a lab (table 2.1 compares these two methods). Thanks to advances in robotics and miniaturization, high-throughput facilities can generate large amounts of experimental data, testing up to millions of compounds in a reasonable amount of time. Although high-throughput screening is relatively inexpensive for simple assays, it becomes costly for complex ones. However, we can use data generated by high-throughput screens and similar sources to train machine learning models. With these models, we can quickly and affordably scale up testing to virtually screen billions of compounds.