chapter ten

10 Graph feature engineering: Manual and semiautomated approaches

 

This chapter covers

  • Manual feature engineering techniques for nodes and relationships in graphs
  • Combining domain expertise with semiautomated extraction in a graph representation
  • Real-world applications of feature engineering

The success of machine learning (ML) on graphs depends on solving a fundamental challenge: how to effectively represent graph elements (nodes, relationships, and entire graphs) as vectors that ML algorithms can process. This representation step, often called vectorization or featurization, determines how well our models can learn and make predictions.

Modern ML algorithms, from traditional approaches like logistic regression and random forests to sophisticated deep learning models, are well established, but they can’t directly process graph structures. Instead, they require numerical input vectors. The quality of these vectors directly affects the performance of any downstream task, whether it’s classifying nodes, predicting relationships, or analyzing entire graphs.

This chapter explores the art and science of creating these vector representations, progressing from manual to semiautomated approaches. We start with manual feature engineering, crafting interpretable features based on domain knowledge and graph properties. This hands-on approach, although time-consuming, provides insights into what makes representations effective and helps us understand why certain features work better than others.
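To make the idea of vectorization concrete, here is a minimal sketch in plain Python that turns each node of a small undirected graph into a three-element feature vector: degree, number of triangles, and local clustering coefficient. The toy edge list, the helper names `build_adjacency` and `node_features`, and the choice of these three features are illustrative assumptions, not the chapter's own code; in practice, a graph library or graph database would typically compute such measures.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set_of_neighbors} from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def node_features(adjacency):
    """Return {node: [degree, triangles, local_clustering]} for every node."""
    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        # A triangle through `node` is a pair of its neighbors that are
        # themselves connected to each other.
        triangles = sum(
            1 for u, v in combinations(sorted(neighbors), 2) if v in adjacency[u]
        )
        # Number of neighbor pairs that could have formed a triangle.
        possible = degree * (degree - 1) / 2
        clustering = triangles / possible if possible else 0.0
        features[node] = [float(degree), float(triangles), clustering]
    return features


# Hypothetical toy graph: one triangle (a, b, c) plus a pendant node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
features = node_features(build_adjacency(edges))

for node in sorted(features):
    print(node, features[node])
```

Each row printed is a numerical vector that a standard classifier could consume directly. Because every entry has a plain graph-theoretic meaning, the representation stays interpretable, which is the main appeal of manual features.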

10.1 Manual node features

10.1.1 Degree

10.1.2 Triangles

10.1.3 Density

10.1.4 Geodesic (or shortest) path

10.1.5 Closeness

10.1.6 Betweenness

10.1.7 PageRank

10.1.8 Prediction

10.2 Manual relationship features

10.2.1 Node-based representation

10.2.2 Path-based features

10.3 Semiautomated feature extraction

10.3.1 Performing ReFeX manually

10.3.2 Performing ReFeX automatically with code

Summary