4 Building a concurrent web crawler
This chapter covers
- Asynchronous context managers
- Making asyncio-friendly web requests with aiohttp
- Running web requests concurrently with gather
- Processing results as they come in with as_completed
- Keeping track of in-flight requests with wait
- Setting and handling timeouts for groups of requests
- Cancelling in-flight requests
In the previous chapter, we learned more about the inner workings of sockets and built a basic echo server. Now that we’ve seen how to design a basic application, we’ll apply this knowledge to making concurrent, non-blocking web requests. Using asyncio for web requests lets us make hundreds of them at the same time, cutting our application's runtime compared to a synchronous approach. This is useful when we need to make multiple requests to a set of REST APIs, as can happen in a microservice architecture, or when we have a web crawling task. This approach also allows other code to run while we wait for potentially long web requests to finish, letting us build more responsive applications.
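To see where the runtime savings come from, here is a minimal sketch of the idea. It uses `asyncio.sleep` as a stand-in for a real web request (we haven't introduced aiohttp yet), and the URLs and `fake_request` helper are invented for illustration. Because each coroutine yields control while it waits, `gather` overlaps the waits, and three one-second "requests" finish in about one second rather than three:

```python
import asyncio
import time

async def fake_request(url: str, delay: float) -> str:
    # Stand-in for a web request; awaiting the sleep yields control
    # to the event loop, just like awaiting a real network response.
    await asyncio.sleep(delay)
    return f"response from {url}"

async def main() -> list:
    # gather schedules all three coroutines concurrently and
    # returns their results in the order they were passed in.
    return await asyncio.gather(
        fake_request("example.com/a", 1.0),
        fake_request("example.com/b", 1.0),
        fake_request("example.com/c", 1.0),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
# The three 1-second waits overlap, so elapsed is roughly 1 second, not 3.
print(f"{elapsed:.1f} seconds elapsed")
```

A synchronous version of the same three calls would take the sum of the delays; the concurrent version takes roughly the maximum. The rest of this chapter replaces the stand-in with real aiohttp requests and explores `gather`, `as_completed`, and `wait` in depth.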