10 Graph feature engineering: manual and semi-automated approaches
This chapter covers
- Manual feature engineering techniques for nodes and relationships in graphs
- Combining domain expertise with semi-automated extraction in graph representation
- Real-world applications of feature engineering in fraud detection and drug discovery
The success of machine learning on graphs depends on solving a fundamental challenge: how to represent graph elements (nodes, relationships, and entire graphs) as vectors that machine learning algorithms can process. This representation step, often called vectorization or featurization, determines how well our models can learn and make predictions.
While modern machine learning algorithms – from traditional approaches like logistic regression and random forests to sophisticated deep learning models – are well established, they can't directly process graph structures. Instead, they require numerical input vectors. The quality of these vectors directly impacts the performance of any downstream task, whether it's classifying nodes, predicting relationships, or analyzing entire graphs.
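To make the idea of turning graph elements into numerical vectors concrete, here is a minimal sketch in plain Python. The toy graph, the `node_features` helper, and the three features chosen (degree, triangle count, mean neighbor degree) are illustrative assumptions, not a method prescribed by this chapter.

```python
# Toy undirected graph stored as an adjacency mapping (hypothetical data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def node_features(adj, node):
    """Return [degree, triangle count, mean neighbor degree] for one node."""
    neighbors = adj[node]
    degree = len(neighbors)
    # A triangle through `node` is a pair of its neighbors that are
    # themselves connected; `u < v` counts each pair only once.
    triangles = sum(
        1 for u in neighbors for v in neighbors if u < v and v in adj[u]
    )
    # Average degree of the node's neighbors (0.0 for an isolated node).
    mean_neighbor_degree = (
        sum(len(adj[n]) for n in neighbors) / degree if degree else 0.0
    )
    return [float(degree), float(triangles), mean_neighbor_degree]


# One fixed-length vector per node, ready to feed to a standard model.
feature_matrix = {node: node_features(graph, node) for node in sorted(graph)}

for node, vector in feature_matrix.items():
    print(node, vector)
```

Each node ends up with a vector of the same length, which is the form that algorithms such as logistic regression or random forests expect. Which features to compute is a modeling decision; the three used here are only simple structural examples.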