In the previous chapter, we looked in some detail at a few important shallow neural network architectures for transfer learning in NLP, including word2vec and sent2vec. Recall that the vectors produced by these methods are static and noncontextual: they produce the same vector for a given word or sentence regardless of the surrounding context. This means these methods are unable to disambiguate, or distinguish between, different possible meanings of a word or sentence.
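To make this concrete, the following minimal Python sketch illustrates the static lookup behavior. It assumes the gensim library and its downloadable "glove-wiki-gigaword-50" pretrained vectors, which stand in here for any word2vec-style static embedding; the specific model name and library are assumptions for illustration, not part of the methods discussed above.

import gensim.downloader as api

# Load a small set of pretrained static word vectors
# (model name assumed to be available in gensim's download catalog).
wv = api.load("glove-wiki-gigaword-50")

# A static embedding is a pure table lookup: the same token always maps to
# the same vector, no matter which sentence it occurs in.
river_sentence = "he sat on the bank of the river".split()
money_sentence = "she deposited cash at the bank".split()

vec_in_river_context = wv["bank"]
vec_in_money_context = wv["bank"]
print((vec_in_river_context == vec_in_money_context).all())  # True: context is ignored

Because the lookup depends only on the token itself, the two occurrences of "bank" receive identical vectors, even though one refers to a riverbank and the other to a financial institution.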
In this and the next chapter, we will cover some representative deep transfer learning modeling architectures for NLP that rely on recurrent neural networks (RNNs) for key functions. Specifically, we will look at the modeling frameworks SIMOn,1 ELMo,2 and ULMFiT.3 The deeper neural networks employed by these methods allow the resulting embeddings to be contextual, that is, to produce word representations that are functions of the surrounding context and therefore permit disambiguation. Recall that we first encountered ELMo in chapter 3. In the next chapter, we will take a closer look at its architecture.
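By contrast, a contextual model such as ELMo computes a different vector for the same word depending on the sentence it appears in. The sketch below shows one way to observe this; it assumes the ElmoEmbedder class from the allennlp library (the 0.9.x API), which downloads the default pretrained ELMo weights on first use. The API and version are assumptions for illustration only.

import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

# Downloads the default pretrained ELMo options and weights on first use.
elmo = ElmoEmbedder()

river_sentence = "he sat on the bank of the river".split()
money_sentence = "she deposited cash at the bank".split()

# embed_sentence returns an array of shape (3 layers, number of tokens, 1024 dimensions);
# index [2] selects the top LSTM layer.
bank_river = elmo.embed_sentence(river_sentence)[2][river_sentence.index("bank")]
bank_money = elmo.embed_sentence(money_sentence)[2][money_sentence.index("bank")]

# Cosine similarity is noticeably below 1.0: the two "bank" vectors differ with context.
cosine = np.dot(bank_river, bank_money) / (np.linalg.norm(bank_river) * np.linalg.norm(bank_money))
print(cosine)

Here the embedding of "bank" is a function of the whole sentence, so the two occurrences receive different vectors, which is exactly the disambiguation ability that static embeddings lack.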