Managing Failed Requests in Python

This guide explains how to handle failed HTTP requests in Python with retry strategies and custom logic.

What Are Status Codes?
Retry Strategies
HTTPAdapter
Tenacity
Building a Custom Retry Mechanism
Conclusion

What Are Status Codes?

Status codes are standardized three-digit numbers used in various protocols to indicate the result of a request. According to Mozilla, HTTP status codes can be broken down into the following categories:

100-199: Informational responses
200-299: Successful responses
300-399: Redirection messages
400-499: Client error messages
500-599: Server error messages

When developing client-side applications like web scrapers, it's crucial to pay attention to status codes in the 400 and 500 ranges. Codes in the 400s typically indicate client-side errors, such as authentication failures, rate limiting, timeouts, or the well-known 404: Not Found error. Meanwhile, status codes in the 500s signal server-side issues that may require retries or alternative handling strategies.

Here is a list of common error codes (taken from Mozilla’s official documentation) you will encounter when performing web scraping:

Status Code	Meaning	Description
400	Bad Request	Check your request format
401	Unauthorized	Check your API key
403	Forbidden	You cannot access this data
404	Not Found	Site/Endpoint doesn’t exist
408	Request Timeout	Request timed out, try again
429	Too Many Requests	Slow down your requests
500	Internal Server Error	Generic server error, retry request
501	Not Implemented	Server doesn’t support this yet
502	Bad Gateway	Failed response from an upstream server
503	Service Unavailable	Server is temporarily down, retry later
504	Gateway Timeout	Timed out waiting for an upstream server
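
The table above can be turned into a small lookup that decides whether a failed response is worth retrying. This is only a minimal sketch: the names RETRYABLE_CODES and classify_status are hypothetical, and which codes count as retryable is an assumption you should adjust for the sites you scrape.

```python
# Assumption: these codes usually signal a temporary condition worth retrying.
RETRYABLE_CODES = frozenset({408, 429, 500, 502, 503, 504})


def classify_status(status_code: int) -> str:
    """Return 'success', 'retry', 'fail', or 'other' for an HTTP status code."""
    if 200 <= status_code < 300:
        return "success"
    if status_code in RETRYABLE_CODES:
        return "retry"
    if 400 <= status_code < 600:
        return "fail"  # client or server error that a retry is unlikely to fix
    return "other"  # 1xx and 3xx responses


for code in (200, 301, 404, 429, 503):
    print(code, classify_status(code))
```

Keeping the retryable set in one place makes it easy to reuse the same policy across the retry approaches described in the rest of this guide.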

Retry Strategies

When implementing a retry mechanism in Python, you can leverage pre-built tools such as the HTTPAdapter class from Requests and the Tenacity library. Alternatively, you may choose to develop custom retry logic based on your specific needs.

A well-designed retry strategy should include both a retry limit and a backoff mechanism. The retry limit prevents infinite loops, ensuring that failed requests don’t continue indefinitely. A backoff strategy, which gradually increases the delay between retries, helps prevent excessive requests that could lead to being blocked or overloading the server.

Retry Limits: It’s essential to define a retry limit. After a specified number of attempts (X), the scraper should stop retrying to avoid infinite loops.
Backoff Algorithm: A gradual increase in wait time between retries helps prevent overwhelming the server. Start with a small delay, such as 0.3 seconds, then incrementally increase it to 0.6 seconds, 1.2 seconds, and so forth.
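
To make the backoff schedule concrete, here is a minimal sketch that computes the planned delays. The function name backoff_delays and the max_delay cap are assumptions added for illustration; they are not part of any library discussed here.

```python
def backoff_delays(retries, base=0.3, factor=2.0, max_delay=10.0):
    """Return the planned wait before each retry: base, base * factor, and so on.

    Each delay is capped at max_delay so that a long run of failures
    cannot produce an unbounded wait.
    """
    return [min(base * factor ** i, max_delay) for i in range(retries)]


print(backoff_delays(3))  # [0.3, 0.6, 1.2]
```

With the defaults, three retries wait 0.3, 0.6, and 1.2 seconds, which matches the schedule described above.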

HTTPAdapter

With HTTPAdapter, we need to configure three things on a urllib3 Retry object: total, backoff_factor, and status_forcelist. allowed_methods isn’t strictly required, but it defines which HTTP methods may be retried and thus makes our code safer. In the code below, we use httpbin to force an error and trigger the retry logic.

import logging
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create a session
session = requests.Session()

# Configure retry settings
retry = Retry(
    total=3,  # Maximum number of retries
    backoff_factor=0.3,  # Base delay for exponential backoff between retries
    status_forcelist=(429, 500, 502, 503, 504),  # Status codes that trigger a retry
    allowed_methods={"GET", "POST"},  # Retry GET and POST (POST is not idempotent, so enable it with care)
    raise_on_status=False,  # Return the last response instead of raising once retries run out
)

# Mount the adapter with our custom settings
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Function to make a request and test retry logic
def make_request(url, method="GET"):
    response = None  # Stays None if the request fails before any response arrives
    try:
        logger.info("Making a %s request to %s with retry logic...", method, url)

        if method == "GET":
            response = session.get(url)
        elif method == "POST":
            response = session.post(url)
        else:
            logger.error("Unsupported HTTP method: %s", method)
            return

        response.raise_for_status()
        logger.info("✅ Request successful: %s", response.status_code)

    except requests.exceptions.RequestException as e:
        logger.error("❌ Request failed after retries: %s", e)
        # urllib3 attaches its Retry state to the raw response; its history holds one entry per retry
        retry_state = getattr(response.raw, "retries", None) if response is not None else None
        logger.info("Retries attempted: %d", len(retry_state.history) if retry_state is not None else 0)

# Test Cases
make_request("https://httpbin.org/status/200")  # ✅ Should succeed without retries
make_request("https://httpbin.org/status/500")  # ❌ Should retry 3 times and fail
make_request("https://httpbin.org/status/404")  # ❌ Should fail immediately (no retries)
make_request("https://httpbin.org/status/500", method="POST")  # ❌ Should retry 3 times and fail

Once you have created a Session object, do the following:

Create a Retry object and define:
- total: The maximum number of retries for a request.
- backoff_factor: The base delay between retries. urllib3 roughly doubles the wait after each consecutive failure.
- status_forcelist: A list of bad status codes. Any response with one of these codes automatically triggers a retry.
Create an HTTPAdapter object with the retry variable: adapter = HTTPAdapter(max_retries=retry).
Once you’ve created the adapter, mount it on the session for the http:// and https:// URL prefixes using session.mount().
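
The steps above can be wrapped in a reusable factory function. This is a sketch under stated assumptions: build_retry_session is a hypothetical helper name, and the default of retrying only GET and HEAD is a conservative choice made for this example, since repeating a POST can duplicate side effects on the server.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retry_session(
    total=3,
    backoff_factor=0.3,
    status_forcelist=(429, 500, 502, 503, 504),
    allowed_methods=("GET", "HEAD"),  # assumption: only retry idempotent methods
):
    """Return a requests.Session whose HTTP and HTTPS adapters retry failed requests."""
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        allowed_methods=allowed_methods,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = build_retry_session()
# Inspect the adapter that would handle this URL; no request is sent here.
adapter = session.get_adapter("https://httpbin.org/status/500")
print(adapter.max_retries.total)
```

Building the session in one function keeps the retry policy in a single place, so every part of a scraper that calls build_retry_session() shares the same limits.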

When you run this code, each request that returns a status code from status_forcelist is retried up to three times (total=3) before giving up, and the output looks like this:

2024-06-10 12:00:00 - INFO - Making a GET request to https://httpbin.org/status/200 with retry logic...
2024-06-10 12:00:00 - INFO - ✅ Request successful: 200

2024-06-10 12:00:01 - INFO - Making a GET request to https://httpbin.org/status/500 with retry logic...
2024-06-10 12:00:02 - ERROR - ❌ Request failed after retries: 500 Server Error: INTERNAL SERVER ERROR for url: ...
2024-06-10 12:00:02 - INFO - Retries attempted: 3

2024-06-10 12:00:03 - INFO - Making a GET request to https://httpbin.org/status/404 with retry logic...
2024-06-10 12:00:03 - ERROR - ❌ Request failed after retries: 404 Client Error: NOT FOUND for url: ...
2024-06-10 12:00:03 - INFO - Retries attempted: 0

2024-06-10 12:00:04 - INFO - Making a POST request to https://httpbin.org/status/500 with retry logic...
2024-06-10 12:00:05 - ERROR - ❌ Request failed after retries: 500 Server Error: INTERNAL SERVER ERROR for url: ...
2024-06-10 12:00:05 - INFO - Retries attempted: 3

Tenacity

You can also use Tenacity, a popular open source retry library for Python. It’s not limited to HTTP, but it gives you an expressive way to implement retries.

Start by installing Tenacity:

pip install tenacity

Once installed, create a decorator and use it to wrap a requests function. With the @retry decorator, add the stop, wait, and retry arguments.

import logging
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type, retry_if_result, RetryError

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Define a retry strategy
@retry(
    stop=stop_after_attempt(3),  # Stop after 3 attempts in total (first call + 2 retries)
    wait=wait_exponential(multiplier=0.3),  # Exponential backoff
    retry=(
        retry_if_exception_type(requests.exceptions.RequestException) |  # Retry on request failures
        retry_if_result(lambda r: r.status_code in {500, 502, 503, 504})  # Retry on specific HTTP status codes
    ),
)
def make_request(url):
    logger.info("Making a request with retry logic to %s...", url)
    response = requests.get(url)
    response.raise_for_status()
    logger.info("✅ Request successful: %s", response.status_code)
    return response

# Attempt to make the request
try:
    make_request("https://httpbin.org/status/500")  # Test with a failing status code
except RetryError as e:
    logger.error("❌ Request failed after all retries: %s", e)

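The logic and settings here are very similar to the first example with HTTPAdapter:

stop=stop_after_attempt(3) tells Tenacity to give up after 3 attempts in total, counting the first call.
wait=wait_exponential(multiplier=0.3) uses the same 0.3 multiplier as before and backs off exponentially between attempts.
retry=retry_if_exception_type(requests.exceptions.RequestException) | retry_if_result(...) tells Tenacity to retry whenever a RequestException is raised or the returned response has one of the listed status codes. Because raise_for_status() raises an HTTPError for any 4xx or 5xx response, the exception condition is the one that fires in this example.
make_request() makes a request to our error endpoint. It picks up all of this behavior from the decorator above it. When every attempt fails, Tenacity raises a RetryError, which the try/except block catches.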

When you run this code, you get a similar output:

2024-06-10 12:00:00 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:01 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:02 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:03 - ERROR - ❌ Request failed after all retries: RetryError[...]

Building a Custom Retry Mechanism

You can also create a custom retry mechanism, which is often the best approach when working with specialized code. With a relatively small amount of code, you can achieve the same functionality provided by existing libraries while tailoring it to your specific needs.

The code below imports sleep for the exponential backoff, sets the configuration (TOTAL_RETRIES, INITIAL_BACKOFF, and BAD_CODES), and uses a while loop to hold the retry logic. While you still have tries left and haven’t succeeded, the loop attempts the request.

import logging
import requests
from time import sleep

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create a session
session = requests.Session()

# Define retry settings
TOTAL_RETRIES = 3
INITIAL_BACKOFF = 0.3
BAD_CODES = {429, 500, 502, 503, 504}

def make_request(url):
    current_tries = 0
    backoff = INITIAL_BACKOFF
    success = False

    while current_tries < TOTAL_RETRIES and not success:
        try:
            logger.info("Making a request with retry logic to %s...", url)
            response = session.get(url)
            
            if response.status_code in BAD_CODES:
                raise requests.exceptions.HTTPError(f"Received {response.status_code}, triggering retry")
            
            response.raise_for_status()
            logger.info("✅ Request successful: %s", response.status_code)
            success = True
            return response

        except requests.exceptions.RequestException as e:
            logger.error("❌ Request failed: %s, retries left: %d", e, TOTAL_RETRIES - current_tries - 1)
            if current_tries < TOTAL_RETRIES - 1:
                logger.info("⏳ Retrying in %.1f seconds...", backoff)
                sleep(backoff)
                backoff *= 2  # Exponential backoff
            current_tries += 1

    logger.error("🚨 Request failed after all retries.")
    return None

# Test Cases
make_request("https://httpbin.org/status/500")  # ❌ Should make 3 attempts and fail
make_request("https://httpbin.org/status/200")  # ✅ Should succeed without retries

The actual logic here is handled by a simple while loop.

If response.status_code is in BAD_CODES, the script raises an HTTPError.
If a request fails, the script:
- Logs an error message.
- Calls sleep(backoff) to wait before sending the next request (skipped after the final attempt).
- Doubles backoff with backoff *= 2 for the next try.
- Increments current_tries so it doesn’t stay in the loop indefinitely.
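
The same loop can be written so that it is easy to test without touching the network. The sketch below is one possible variant, not the code above: fetch_with_retries, FakeResponse, and the injectable sleep parameter are hypothetical names introduced for illustration. Unlike the script above, it returns non-retryable errors such as 404 immediately, and it does not catch network exceptions.

```python
import time

RETRYABLE_CODES = {429, 500, 502, 503, 504}


def fetch_with_retries(fetch, url, attempts=3, initial_backoff=0.3, sleep=time.sleep):
    """Call fetch(url) until the status is not retryable or attempts run out.

    fetch is any callable that returns an object with a status_code
    attribute, for example session.get.
    """
    backoff = initial_backoff
    response = None
    for attempt in range(1, attempts + 1):
        response = fetch(url)
        if response.status_code not in RETRYABLE_CODES:
            return response  # success, or an error that a retry will not fix
        if attempt < attempts:
            sleep(backoff)  # wait before the next attempt
            backoff *= 2  # exponential backoff
    return response  # last retryable response; the caller decides what to do


# Offline demonstration with a stand-in for session.get
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


statuses = iter([503, 503, 200])
waits = []
result = fetch_with_retries(
    lambda url: FakeResponse(next(statuses)),
    "https://example.com",
    sleep=waits.append,  # record the waits instead of actually sleeping
)
print(result.status_code, waits)  # 200 [0.3, 0.6]
```

Passing fetch and sleep as arguments lets you swap in session.get and time.sleep in production while keeping tests fast and deterministic.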

Here’s the output from the custom retry code.

2024-06-10 12:00:00 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:01 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 2
2024-06-10 12:00:01 - INFO - ⏳ Retrying in 0.3 seconds...
2024-06-10 12:00:02 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:03 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 1
2024-06-10 12:00:03 - INFO - ⏳ Retrying in 0.6 seconds...
2024-06-10 12:00:04 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:05 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 0
2024-06-10 12:00:05 - ERROR - 🚨 Request failed after all retries.

Conclusion

To avoid all kinds of failed requests, we’ve developed products like the Web Unlocker API and Scraping Browser. These tools automatically handle anti-bot measures, CAPTCHA challenges, and IP blocks, ensuring seamless and efficient web scraping for even the most challenging websites.

Sign up now and start your free trial today.
