appendix D Machine learning tools and techniques
Machine learning is the foundation of most NLP pipelines. If you had a lot of time but not a lot of data, you could handcraft most of the algorithms in this book. But think about how difficult it is to design a regular expression to match just a single keyword (and all its variations). Machine learning lets you replace all that hard software development work with data. So it pays to understand some of the basic tools and techniques of machine learning, so that the machine works for you instead of the other way around.
D.1 Change of paradigm
Machine learning is a paradigm shift: a fundamentally different approach to problem solving from the traditional software programming paradigm. In traditional software development, you, the programmer, write a set of instructions for the computer to follow to turn the given inputs into the desired outputs. For example, if you wanted to predict home prices, you would have to figure out a price per square foot for each neighborhood and write some mathematical formulas that take into account all the other features of a house that affect its price. And whenever the relationship between the inputs and outputs changed, you would have to edit the program to adapt it to the new situation. If, for example, a new solar panel subsidy was announced in a particular state, you’d have to go into your program and change the formula for that state to account for your new mental model of how homes are priced.
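To make the traditional approach concrete, here is a minimal sketch of what such a handcrafted home price program might look like. Everything in it is hypothetical and chosen only for illustration: the function name `estimate_price`, the neighborhood names, the price-per-square-foot values, and the solar bonus are assumptions, not real market data.

```python
# A handcrafted pricing "model": every number below was chosen by a person.
# All names and values are hypothetical, for illustration only.
PRICE_PER_SQFT = {
    "downtown": 450.0,
    "suburbs": 250.0,
}

# Added by hand after a (hypothetical) solar subsidy was announced in one state.
SOLAR_BONUS_BY_STATE = {
    "CA": 15_000.0,
}


def estimate_price(sqft, neighborhood, state, has_solar=False):
    """Estimate a home price from rules the programmer wrote down."""
    price = sqft * PRICE_PER_SQFT[neighborhood]
    if has_solar:
        # States without an entry get no bonus.
        price += SOLAR_BONUS_BY_STATE.get(state, 0.0)
    return price


print(estimate_price(1000, "suburbs", "TX"))
print(estimate_price(1000, "suburbs", "CA", has_solar=True))
```

Notice that every change in the housing market means another manual edit to one of these lookup tables or to the formula itself. That maintenance burden is what the traditional paradigm places on the programmer.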