4 Building a concurrent web crawler


This chapter covers

  • Asynchronous context managers
  • Making asyncio-friendly web requests with aiohttp
  • Running web requests concurrently with gather
  • Processing results as they come in with as_completed
  • Keeping track of in-flight requests with wait
  • Setting and handling timeouts for groups of requests
  • Cancelling in-flight requests

In the previous chapter, we learned more about the inner workings of sockets and built a basic echo server. Now that we’ve seen how to design a basic application, we’ll take this knowledge and apply it to making concurrent, non-blocking web requests. Utilizing asyncio for web requests lets us make hundreds of them at the same time, cutting down on our application’s runtime compared to a synchronous approach. This is useful when we need to make requests to a set of REST APIs, as can happen in a microservice architecture, or when we have a web-crawling task. This approach also lets other code run while we wait for potentially long web requests to finish, allowing us to build more responsive applications.
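
Before diving into the details, the listing below is a minimal sketch of the pattern this chapter builds toward: making several web requests concurrently with aiohttp and gather. The fetch_status helper and the placeholder URL are illustrative stand-ins, and we’ll develop each piece properly in the sections that follow. (aiohttp is a third-party library; install it with pip install aiohttp.)

import asyncio

import aiohttp


async def fetch_status(session: aiohttp.ClientSession, url: str) -> int:
    # async with ensures the underlying connection is released back
    # to the session's connection pool once we're done with the response.
    async with session.get(url) as response:
        return response.status


async def main() -> None:
    # Share one session across all requests so connections are pooled
    # and reused instead of being opened anew for every request.
    async with aiohttp.ClientSession() as session:
        urls = ['https://www.example.com' for _ in range(10)]  # placeholder URLs
        requests = [fetch_status(session, url) for url in urls]
        # gather runs all the requests concurrently and waits for every result.
        status_codes = await asyncio.gather(*requests)
        print(status_codes)


asyncio.run(main())

Because all ten requests run concurrently, the total runtime is roughly the time of the slowest single request rather than the sum of them all.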

4.1   Introducing aiohttp

4.2   Asynchronous context managers

4.3   Making a web request with aiohttp

4.3.1   Setting timeouts with aiohttp

4.4   Running tasks concurrently revisited

4.5   Running requests concurrently with gather

4.5.1   Handling exceptions with gather

4.6   Processing requests as they complete

4.6.1   Timeouts with as_completed

4.7   Finer-grained control with wait

4.7.1   Waiting for all tasks to complete

4.7.2   Watching for exceptions

4.7.3   Processing results as they complete

4.7.4   Handling timeouts

4.7.5   Why wrap everything in a Task?

4.8   Summary
