
11 Graph Neural Networks for Predicting Drug-Target Affinity


This chapter covers

  • Transforming SMILES strings and proteins into graph representations
  • Understanding core GNN theory and message-passing mechanisms
  • Building a dual-stream GNN to predict drug-target binding affinity
  • Training and evaluating the model on real-world benchmark datasets
  • Interpreting the model's performance and prediction results

In previous chapters, we treated molecules as sequences of characters in a SMILES string. While powerful, this simplification forces a linear structure onto objects that are inherently three-dimensional. A SMILES string discards the rich topological and structural information that dictates chemical behavior. This loss of information can impair a model's predictive power and the functional relevance of its learned representations.
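To make this contrast concrete, here is a minimal plain-Python sketch (no cheminformatics library; the hand-written atom and bond lists for ethanol are illustrative assumptions, and the `canonical_form` helper is a toy, not a true graph-isomorphism test). Two distinct SMILES strings, "CCO" and "OCC", describe the same molecule: as strings they differ, but a simple canonical form of the underlying graph exposes the equivalence.

```python
# Two valid SMILES strings for the same molecule, ethanol.
smiles_variants = ["CCO", "OCC"]
print(smiles_variants[0] == smiles_variants[1])  # False: as strings they differ

# The same molecule as graphs, written out by hand for illustration:
# nodes are heavy-atom symbols, edges are undirected bonds (index pairs).
graph_a = {"nodes": ["C", "C", "O"], "edges": [(0, 1), (1, 2)]}  # from "CCO"
graph_b = {"nodes": ["O", "C", "C"], "edges": [(0, 1), (1, 2)]}  # from "OCC"

def canonical_form(graph):
    """Toy canonicalization: sorted atom list plus sorted list of bonded
    atom-symbol pairs. Enough to show the two graphs coincide here, though
    it is not a general graph-isomorphism check."""
    atoms = sorted(graph["nodes"])
    bonds = sorted(
        tuple(sorted((graph["nodes"][i], graph["nodes"][j])))
        for i, j in graph["edges"]
    )
    return atoms, bonds

# The two string variants collapse to one and the same graph.
print(canonical_form(graph_a) == canonical_form(graph_b))  # True
```

The graph is the representation the string merely serializes; a GNN operates on this invariant structure directly instead of on one arbitrary linearization of it.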

In this chapter, we embrace a more natural representation of molecules as graphs. We will tackle one of the most critical tasks in computational drug discovery: predicting the binding affinity between a drug and a target protein. This task is central to reducing the immense cost and time associated with de novo drug development.

Our starting point for this journey will be the influential GraphDTA model, which demonstrated the power of Graph Neural Networks (GNNs) for learning from drug structures [1]. The original study showed that by representing drugs as graphs, the model could outperform contemporary deep learning approaches that relied on 1D string representations.

11.1 Challenges of Drug-Target Affinity Prediction

11.1.1 Benchmarking Drug-Target Affinity

11.1.2 Necessity for Better Molecular and Protein Representations

11.2 Molecular Graph Construction

11.2.1 Small Molecules as Molecular Graphs

11.2.2 From SMILES to Molecular Graphs

11.3 Proteins as Residue Interaction Graphs

11.3.1 Contact Maps as Structure Representation

11.3.2 Amino Acid Features and Position-Specific Scoring Matrices

11.4 Graph Neural Network Foundations

11.4.1 The Engine of GNNs: Message Passing

11.4.2 Graph Convolutional Networks

11.4.3 Graph Attention Networks (GATs)

11.4.4 Graph Isomorphism Networks (GINs)

11.4.5 Graph-level Pooling for Molecular Representations

11.4.6 Challenges in Deep GNNs

11.5 DualGraphDTA’s Dual-Stream Architecture

11.5.1 The Overall Architecture

11.5.2 Architectural Components and Information Flow

11.5.3 Embedding Fusion and Prediction

11.5.4 Alternative Architectures: GAT and GIN

11.5.5 Graph Data Preparation & Abstraction with PyTorch Geometric

11.5.6 Loss Function and Optimization

11.6 Evaluating Drug-Target Interaction Models

11.6.1 Performance Comparison of GCN, GAT, and GIN

11.7 Summary