concept model in category reinforcement learning

appears as: model, models, A model, The model
Grokking Deep Reinforcement Learning

This is an excerpt from Manning's book Grokking Deep Reinforcement Learning.

Supervised learning (SL) is the task of learning from labeled data. In SL, a human decides which data to collect and how to label it. The goal in SL is to generalize. A classic example of SL is a handwritten-digit recognition application; a human gathers images with handwritten digits, labels those images, and trains a model to recognize and classify digits in images correctly. The trained model is expected to generalize and correctly classify handwritten digits in new images.
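For instance, here is a minimal sketch of that workflow using scikit-learn and its bundled handwritten-digits dataset (the library, dataset, and classifier choice are illustrative assumptions, not the book's):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: 8x8 images of handwritten digits with human-assigned labels 0-9.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=2000)  # a simple parametric classifier
model.fit(X_train, y_train)                # learn from the labeled examples

# Generalization: accuracy on images the model has never seen.
print("test accuracy:", model.score(X_test, y_test))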

Unsupervised learning (UL) is the task of learning from unlabeled data. Even though the data no longer needs labeling, the methods the computer uses to gather data still need to be designed by a human. The goal in UL is to compress. A classic example of UL is a customer segmentation application; a human collects customer data and trains a model to group customers into clusters. These clusters compress the information, uncovering underlying relationships among customers.
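A minimal sketch of that idea, assuming synthetic two-feature customer data and scikit-learn's KMeans (both assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled customer data: rows are customers, columns are hypothetical
# features (annual spend, purchases per year). No labels are provided.
rng = np.random.default_rng(0)
customers = np.vstack([
    rng.normal(loc=[200, 5], scale=[50, 2], size=(100, 2)),    # occasional low spenders
    rng.normal(loc=[900, 30], scale=[100, 5], size=(100, 2)),  # frequent big spenders
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
# Each customer is compressed down to a single cluster id, exposing the
# underlying group structure.
print(kmeans.labels_[:10])
print(kmeans.cluster_centers_)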

Second, we explore algorithms that use experience samples to learn a model of the environment, a Markov Decision Process (MDP). By doing so, these methods extract the most out of the data they collect and often arrive at optimality more quickly than methods that don't. The group of algorithms that attempt to learn a model of the environment is referred to as model-based reinforcement learning.
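To make the idea concrete, here is a minimal sketch (not the book's code) of learning an MDP model from experience tuples, assuming a small discrete state and action space: transition counts give estimated transition probabilities, and running averages give estimated rewards.

from collections import defaultdict

transition_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
reward_sums = defaultdict(float)                           # (s, a, s') -> summed reward

def update_model(s, a, r, s_next):
    transition_counts[(s, a)][s_next] += 1
    reward_sums[(s, a, s_next)] += r

def estimated_model(s, a):
    counts = transition_counts[(s, a)]
    total = sum(counts.values())
    probs = {s_next: c / total for s_next, c in counts.items()}
    rewards = {s_next: reward_sums[(s, a, s_next)] / c for s_next, c in counts.items()}
    return probs, rewards  # a learned MDP the agent can plan against

# Three experience tuples (s, a, r, s') from the same state-action pair.
for experience in [(0, 1, 1.0, 2), (0, 1, 1.0, 2), (0, 1, 0.0, 3)]:
    update_model(*experience)
print(estimated_model(0, 1))  # P(s'|s=0,a=1) is roughly {2: 0.67, 3: 0.33}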

Deep Reinforcement Learning in Action

This is an excerpt from Manning's book Deep Reinforcement Learning in Action.

Deep learning models are just one of many kinds of machine learning models we can use to classify images. In general, we just need some sort of function that takes in an image and returns a class label (in this case, the label identifying which kind of animal is depicted in the image), and usually this function has a fixed set of adjustable parameters—we call these kinds of models parametric models. We start with a parametric model whose parameters are initialized to random values—this will produce random class labels for the input images. Then we use a training procedure to adjust the parameters so the function iteratively gets better and better at correctly classifying the images. At some point, the parameters will be at an optimal set of values, meaning that the model cannot get any better at the classification task. Parametric models can also be used for regression, where we try to fit a model to a set of data so we can make predictions for unseen data (figure 1.2). A more sophisticated approach might perform even better if it had more parameters or a better internal architecture.

Figure 1.2. Perhaps the simplest machine learning model is a simple linear function of the form f(x) = mx + b, with parameters m (the slope) and b (the intercept). Since it has adjustable parameters, we call it a parametric function or model. If we have some 2-dimensional data, we can start with a randomly initialized set of parameters, such as [m = 3.4, b = 0.3], and then use a training algorithm to optimize the parameters to fit the training data, in which case the optimal set of parameters is close to [m = 2, b = 1].
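The fit described in the caption can be reproduced with a few lines of gradient descent. This is a sketch under assumed synthetic data (generated so the optimum sits near m = 2, b = 1), not the book's code:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 1 + rng.normal(scale=0.5, size=100)  # noisy line with m = 2, b = 1

m, b = 3.4, 0.3   # randomly initialized parameters, as in the caption
lr = 0.01         # learning rate
for _ in range(1000):
    err = (m * x + b) - y
    # Gradients of the mean squared error with respect to m and b.
    m -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)
print(m, b)  # ends up close to m = 2, b = 1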

Deep neural networks are popular because they are in many cases the most accurate parametric machine learning models for a given task, like image classification. This is largely due to the way they represent data. Deep neural networks have many layers (hence the “deep”), which induces the model to learn layered representations of input data. This layered representation is a form of compositionality, meaning that a complex piece of data is represented as the combination of more elementary components, and those components can be further broken down into even simpler components, and so on, until you get to atomic units.
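As a concrete (and assumed, not from the book) illustration of "many layers," here is a small PyTorch classifier for 28 x 28 images:

import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),          # image -> flat vector of raw pixels
    nn.Linear(784, 256),   # pixels -> low-level features
    nn.ReLU(),
    nn.Linear(256, 64),    # low-level features -> higher-level parts
    nn.ReLU(),
    nn.Linear(64, 10),     # parts -> class scores
)

Each Linear + ReLU layer re-represents the output of the previous one, so later layers operate on increasingly abstract components of the input.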

As we mentioned in the introduction, our goal in this chapter is to implement a model called distributed advantage actor-critic (DA2C), and we’ve discussed the “advantage actor-critic” part of the name at a conceptual level. Let’s do the same for the “distributed” part now.

Figure 5.8. The most common way to train a deep learning model is to feed a batch of data into the model, which returns a batch of predictions. We then compute the loss for each prediction and average or sum all the losses before backpropagating and updating the model parameters, which averages out the variability across the experiences in the batch. Alternatively, we can run multiple copies of the model, each taking a single experience and making a single prediction, backpropagate through each copy to get its gradients, and then sum or average the gradients before making any parameter updates.

As with other deep learning methods, we generally must use batches of data to train effectively. Training with a single example at a time introduces too much noise, and the training will likely never converge. To get batch training with Q-learning, we used an experience replay buffer from which we could randomly select batches of previous experiences. We could have used experience replay with actor-critic as well, but it is more common to use distributed training with actor-critic (and, to be clear, Q-learning can also be distributed). Distributed training is more common with actor-critic models because we often want to include a recurrent neural network (RNN) layer in the reinforcement learning model when keeping track of prior states is necessary or helpful for achieving the goal. But RNNs need a sequence of temporally related examples, whereas experience replay supplies a batch of independent experiences. We could store entire trajectories (sequences of experiences) in a replay buffer, but that just adds complexity. Instead, with distributed training, where each process runs online with its own environment, the models can easily incorporate RNNs.
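Here is a deliberately simplified, synchronous sketch of the right-hand side of figure 5.8 (the book's actual DA2C uses torch.multiprocessing with concurrently running workers, and a placeholder loss stands in here for the actor-critic loss): each worker copy of the model processes one experience, we backpropagate in each copy, then average the gradients before a single shared parameter update.

import copy
import torch
import torch.nn as nn

shared = nn.Linear(4, 2)  # stand-in for the actor-critic network
opt = torch.optim.Adam(shared.parameters(), lr=1e-3)

workers = [copy.deepcopy(shared) for _ in range(4)]
experiences = [torch.randn(4) for _ in workers]  # one fake observation per worker

grads = []
for model, obs in zip(workers, experiences):
    loss = model(obs).pow(2).sum()  # placeholder loss for illustration
    model.zero_grad()
    loss.backward()
    grads.append([p.grad.clone() for p in model.parameters()])

# Average the per-worker gradients and apply them to the shared model.
opt.zero_grad()
for shared_p, *worker_grads in zip(shared.parameters(), *grads):
    shared_p.grad = torch.stack(worker_grads).mean(dim=0)
opt.step()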

Listing 6.13. Training the models
num_generations = 25                                  # number of generations to evolve
population_size = 500                                 # individuals per generation
mutation_rate = 0.01                                  # fraction of each parameter vector to mutate
pop_fit = []                                          # records average fitness per generation
pop = spawn_population(N=population_size, size=407)   # random initial population; size matches the model's 407 parameters
for i in range(num_generations):
    pop, avg_fit = evaluate_population(pop)           # score every individual's fitness
    pop_fit.append(avg_fit)
    pop = next_generation(pop, mut_rate=mutation_rate, tournament_size=0.2)  # tournament selection + mutation
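
The listing relies on three helper functions defined earlier in the book. For context, here is a minimal self-contained sketch of what they might look like, assuming each individual is a dict holding a flat parameter vector and a fitness score, and using a toy objective in place of the book's environment rollouts:

import numpy as np

def spawn_population(N=50, size=407):
    # Each individual: a random parameter vector plus an unset fitness score.
    return [{'params': np.random.randn(size), 'fitness': 0.0} for _ in range(N)]

def evaluate_population(pop):
    # Toy fitness (push parameters toward zero); in the book, fitness comes
    # from playing episodes of the environment with each parameter vector.
    for agent in pop:
        agent['fitness'] = -np.sum(agent['params'] ** 2)
    return pop, sum(agent['fitness'] for agent in pop) / len(pop)

def next_generation(pop, mut_rate=0.01, tournament_size=0.2):
    new_pop = []
    k = int(len(pop) * tournament_size)  # contestants per tournament
    while len(new_pop) < len(pop):
        # Tournament selection: sample a subset, take the two fittest as parents.
        contestants = np.random.choice(len(pop), size=k, replace=False)
        ranked = sorted(contestants, key=lambda i: pop[i]['fitness'], reverse=True)
        p1, p2 = pop[ranked[0]]['params'], pop[ranked[1]]['params']
        cut = np.random.randint(len(p1))                 # one-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        mask = np.random.rand(len(child)) < mut_rate     # mutate a fraction of genes
        child[mask] = np.random.randn(int(mask.sum()))
        new_pop.append({'params': child, 'fitness': 0.0})
    return new_pop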