# Requests Library (Python) - HTTP Client Best Practices

**Authority Tier:** Tier 2 (Vendor - most widely used HTTP library)
**Source:** https://requests.readthedocs.io/
**Relevance:** Timeout configuration, retry strategies, TLS verification, session pooling

---

## Timeout Configuration

### Separate Connect and Read Timeouts

> **Best Practice:** Use a tuple `(connect_timeout, read_timeout)` for fine-grained control.
>
> ```python
> requests.get(url, timeout=(10, 30))  # 10s connect, 30s read
> ```

**Rationale:**
- **Connect timeout:** Should be short (3-10s) - if the server doesn't accept the connection quickly, it's likely down or unreachable
- **Read timeout:** Should be longer (30-60s) - it bounds each wait for data from the server (not the total download time), and slow endpoints may take a while to start responding

**Key Claim:**
- `httpclient/timeout/separate_connect_read :: recommended = true`
- **Consequence:** A single timeout value can't be tuned for both connection setup and response delivery

### Default Timeout Values

> **Requests defaults:** None. Requests applies no timeout unless one is passed, so a request without `timeout` can wait indefinitely.
>
> **Recommended values:**
> - **Connect:** 10 seconds
> - **Read:** 30 seconds
>
> These values are a reasonable starting point for most use cases.

**Key Claims:**
- `httpclient/connect_timeout :: default_value = 10`
- `httpclient/read_timeout :: default_value = 30`

---

## TLS Verification

### Certificate Validation

> **Default behavior:** Requests **enables** certificate verification by default.
>
> **Critical warning:** Never use `verify=False` in production.

```python
# BAD - disables verification
requests.get(url, verify=False)

# GOOD - verifies against Requests' default CA bundle (this is the default)
requests.get(url, verify=True)
```

**Key Claim:**
- `httpclient/tls/verify :: required = true`
- **Consequence:** `verify=False` exposes traffic to man-in-the-middle (MITM) attacks and credential theft

### Custom CA Bundle

> **Best Practice:** When a server uses a self-signed or private-CA certificate, pass the path to a CA bundle instead of disabling verification.

```python
requests.get(url, verify='/path/to/ca-bundle.crt')
```

**Key Claim:**
- `httpclient/tls/custom_ca :: recommended = path_over_disabled`
- **Consequence:** Disabling verification is easier but creates a security hole

---

## Session Pooling

### Connection Reuse

> **Best Practice:** Use `requests.Session()` for multiple requests to the same host.
>
> **Benefit:** Reuses TCP connections (HTTP keep-alive), which avoids repeated handshakes and is significantly faster.

```python
import requests

session = requests.Session()
session.get('https://api.example.com/users')
session.get('https://api.example.com/posts')  # Reuses connection
```

**Key Claim:**
- `httpclient/sessions/connection_pooling :: recommended = true`
- **Consequence:** Without pooling, every request pays the TCP handshake + TLS handshake cost

### Default Pool Size

> **Requests default:** `HTTPAdapter` caches up to 10 per-host connection pools (`pool_connections=10`) and keeps up to 10 connections in each pool (`pool_maxsize=10`), built on urllib3's `PoolManager`.
>
> **Configurable:** Both values can be increased for high-throughput scenarios.

```python
import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=20, pool_maxsize=20)
session.mount('https://', adapter)
```

**Key Claim:**
- `httpclient/pool/default_size :: default_value = 10`
- **Consequence:** The default works for most cases, but high-concurrency apps need tuning

---

## Retry Logic

### Retry Adapter

> **Best Practice:** Use `urllib3.util.retry.Retry` for automatic retries with exponential backoff.

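
Exponential backoff means each successive retry waits roughly twice as long as the previous one. The sketch below is a generic illustration in plain Python, not urllib3's internal code: the helper name `backoff_delays` and the 60-second cap are assumptions made for the example, and urllib3's actual schedule may differ in detail (for example, the first retry can be immediate).

```python
from typing import List


def backoff_delays(retries: int, factor: float = 1.0, cap: float = 60.0) -> List[float]:
    """Return the delay in seconds before each retry: factor * 2**n, capped at `cap`."""
    return [min(cap, factor * 2 ** n) for n in range(retries)]


print(backoff_delays(3))  # [1.0, 2.0, 4.0]
```

Capping the delay keeps a long retry sequence from growing into multi-minute waits.
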
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=3,                                     # At most 3 retries
    backoff_factor=1,                            # Base delay in seconds; doubles after each retry
    status_forcelist=[429, 500, 502, 503, 504],  # Retry on these status codes
    allowed_methods=["GET", "PUT", "DELETE"],    # Only idempotent methods
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("https://", adapter)
```

**Key Claims:**
- `httpclient/retry/max_attempts :: max_value = 3`
- `httpclient/retry/backoff :: required = exponential`
- `httpclient/retry/idempotent_only :: required = true`
- **Consequence:** More than 3 retries amplify load during outages (retry storms)

### Retry-Safe Methods

> **Default:** Requests itself performs no retries (`max_retries=0`). When a `Retry` object is configured, urllib3 by default retries only idempotent methods (GET, HEAD, PUT, DELETE, OPTIONS, TRACE).
>
> **POST is not in the default set** - it is non-idempotent, so a retry may cause duplicate operations.

**Key Claim:**
- `httpclient/retry/post_excluded :: required = true`
- **Consequence:** Retrying POST can cause duplicate charges, bookings, etc.

---

## Redirect Handling

### Max Redirects

> **Requests default:** 30 redirects allowed.
>
> **Recommendation:** 10 redirects. RFC 7231 does not set a numeric limit; it only advises clients to detect and break redirect loops.

```python
session = requests.Session()
session.max_redirects = 10  # Set on the Session; not a keyword argument of requests.get()
session.get(url)
```

**Key Claim:**
- `httpclient/redirects/max :: max_value = 10`
- **Consequence:** Requests' default (30) is permissive and allows long redirect chains

### Redirect Loop Detection

> **Built-in:** Requests stops following redirects once `max_redirects` is exceeded and raises a `TooManyRedirects` exception, which also terminates redirect loops.

**Key Claim:**
- `httpclient/redirects/loop_detection :: required = true`
- **Consequence:** Without detection, infinite loops exhaust resources

---

## Headers

### User-Agent

> **Default:** Requests sends `User-Agent: python-requests/<version>`.
>
> **Best Practice:** Customize User-Agent to identify your application.

```python
headers = {'User-Agent': 'MyApp/1.0.0 (https://example.com)'}
requests.get(url, headers=headers)
```

**Key Claim:**
- `httpclient/headers/user_agent :: recommended = custom`
- **Consequence:** A generic User-Agent may trigger rate limiting or blocking

### Accept-Encoding

> **Automatic:** Requests automatically handles gzip/deflate compression.
>
> **Transparent:** Decompresses response bodies automatically.

**Key Claim:**
- `httpclient/compression/automatic :: recommended = true`
- **Consequence:** Without compression, responses waste bandwidth

---

## Error Handling

### Timeout Errors

> **Exception:** `requests.exceptions.Timeout` is raised on timeout.
>
> **Best Practice:** Always catch and handle timeouts explicitly.

```python
try:
    response = requests.get(url, timeout=10)
except requests.exceptions.Timeout:
    # Handle timeout (log, retry, return error)
    pass
```

**Key Claim:**
- `httpclient/error_handling/timeout :: must = raise_exception`
- **Consequence:** An unhandled `Timeout` exception can crash the application; omitting `timeout` altogether lets a request hang indefinitely

### Connection Errors

> **Exception:** `requests.exceptions.ConnectionError` is raised for network failures.

**Key Claim:**
- `httpclient/error_handling/connection :: must = raise_exception`
- **Consequence:** Callers must distinguish connection errors from other failures

---

## Summary of Requests Library Defaults

| Setting | Requests Default | httpclient Should Use |
|---------|------------------|----------------------|
| **Connect Timeout** | None (waits indefinitely) | 10s |
| **Read Timeout** | None (waits indefinitely) | 30s |
| **Max Redirects** | 30 | 10 |
| **TLS Verify** | True | True ✅ |
| **Max Retries** | 0 (manual) | 3 (with backoff) |
| **Pool Size** | 10 per host | 10-50 (configurable) |
| **Retry Methods** | Idempotent only (once `Retry` is configured) | Idempotent only ✅ |

**Deviations from Requests:**
- **Timeouts:** Set 10s connect / 30s read explicitly (Requests applies no timeout by default)
- **Max Redirects:** Use 10 instead of 30
- **Retries:** Enable by default (Requests requires manual setup)

**Authority Tier:** Tier 2 (Vendor - 100M+ downloads/month, de facto standard)
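
The recommendations in the summary table can be combined into a single session factory. The following is a minimal sketch rather than a definitive implementation: the names `TimeoutHTTPAdapter` and `build_session` and the User-Agent string are illustrative assumptions, and the adapter subclass is one common way to supply a default timeout, since Requests has no session-wide timeout setting. TLS verification is left at its default (enabled).

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller passes none."""

    def __init__(self, *args, timeout=(10, 30), **kwargs):
        self.timeout = timeout  # (connect, read) in seconds
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send() forwards timeout=None when the caller did not set one.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def build_session(user_agent="MyApp/1.0.0 (https://example.com)"):
    """Build a Session with timeouts, retries, pooling, and a redirect limit."""
    retry = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "PUT", "DELETE"],  # keyword requires urllib3 >= 1.26
    )
    adapter = TimeoutHTTPAdapter(max_retries=retry, pool_connections=10, pool_maxsize=10)

    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.max_redirects = 10
    session.headers["User-Agent"] = user_agent
    return session
```

A caller would then use `build_session().get(url)` and can still override the timeout per request with `timeout=...`.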