chapter seven

7 Time series analysis: Day trading with machine learning

This chapter covers

Working with time series data
Constructing a custom feature set and response variable, using standard time series feature types
Tracking intraday profits from our ML pipeline
Adding domain-specific features to our dataset to enhance performance
Extracting and selecting features to minimize noise and maximize latent signal

We have been through a lot together, from tabular data to bias reduction to text and image vectorization. All of the datasets we have worked with so far shared one major trait: each was a snapshot in time. Everyone represented in the COMPAS dataset had their data aggregated before we started our analysis, every tweet had already been sent, and every image had already been taken.

A second shared trait is that each row was independent of the other rows. If we pick a single person from the COMPAS dataset or a single tweet from our NLP dataset, the values attached to that data point do not depend on any other data point in the dataset. We were not, for example, tracking a person's values across time.

Finally, we were always given a straightforward response variable to target in our ML pipelines. We always knew the sentiment of the tweet, the object in the photo, or whether the patient had COVID-19. There was never any doubt about what we were trying to predict. Time series data, the focus of this chapter, breaks all three of these patterns: the order of observations in time matters, each row depends on the rows before it, and we must construct the response variable ourselves.
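The contrast between independent rows and time-ordered rows can be made concrete with a small pandas sketch. The prices below are made-up toy values, not the TWLO data. The response variable is whether the next minute's close is higher than the current one. It was chosen purely for illustration and is not necessarily the target this chapter builds. The lag column `close_lag_1` shows a row's features being copied from a neighboring row, which is exactly the dependence the earlier datasets lacked.

```python
import pandas as pd

# Hypothetical minute-level closing prices (toy values, not real TWLO data)
prices = pd.DataFrame(
    {"close": [100.0, 101.5, 101.0, 102.2, 101.8]},
    index=pd.date_range("2021-01-04 09:30", periods=5, freq="min"),
)

# Lag feature: each row borrows the previous row's close, so rows are
# no longer independent of one another
prices["close_lag_1"] = prices["close"].shift(1)

# Helper column used only to build the target: the next minute's close
prices["next_close"] = prices["close"].shift(-1)

# Assumed response variable for illustration: 1 if the price rises over the
# next minute, else 0. dropna() removes the first row (no lag available) and
# the last row (no future price to compare against).
model_ready = prices.dropna().assign(
    target_up=lambda df: (df["next_close"] > df["close"]).astype(int)
)

# Features must exclude next_close, which would leak the future into the model
X = model_ready[["close", "close_lag_1"]]
y = model_ready["target_up"]
print(X.join(y))
```

The `next_close` column exists only to define the target. Keeping it in `X` would hand the model the very value it is supposed to predict.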

7.1 The TWLO dataset

7.1.1 The problem statement

7.2 Feature construction

7.2.1 Date/time features

7.2.2 Lag features

7.2.3 Rolling/expanding window features

7.2.4 Domain-specific features

7.3 Feature selection

7.3.1 Selecting features using ML

7.3.2 Recursive feature elimination

7.6 Answers to exercises