Building an HTTP proxy scanner is an excellent project for understanding network protocols, concurrency, and proxy mechanics. This guide will walk you through creating a functional, concurrent HTTP proxy scanner in Python.
By the end of this article, you will have a script that takes a list of potential proxy servers, tests their connectivity, and verifies if they successfully mask your IP address.

Understanding the Architecture
An HTTP proxy scanner operates on a simple fetch-and-verify mechanism.
    [Scanner] --> [Candidate Proxy] --> [Judge Server]
        ^                                     |
        |_________ Returns public IP _________|
The Target List: A collection of IP addresses and ports to test.
The Proxy Judge: A reliable, public server that echoes back the requester’s IP address (e.g., http://httpbin.org/ip).
The Concurrency Engine: A system to test hundreds of proxies simultaneously so the script does not stall on dead connections.

Step 1: Setting Up the Environment
We will use Python 3 and the requests library for handling HTTP traffic. Because standard sequential requests are too slow for scanning, we will use Python’s built-in concurrent.futures module to handle simultaneous connections.

First, ensure you have the required library installed:

    pip install requests

Step 2: Designing the Core Verification Logic
The core function must attempt to route a request through a specific proxy to a “proxy judge.” If the judge returns the proxy’s IP instead of your home IP, the proxy works.
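The comparison above can also be run from the opposite direction: instead of looking for the proxy's IP in the judge's reply, confirm that your own IP is absent from it. The following is a minimal sketch under the assumption that the judge reports one or more addresses as a comma-separated string, which httpbin's `origin` field may do when a proxy adds forwarding headers. The function name `ip_is_masked` and the sample addresses (taken from documentation-only ranges) are hypothetical and not part of this guide's script.

```python
def ip_is_masked(origin: str, real_ip: str) -> bool:
    """Return True if real_ip does not appear in the judge's origin string.

    Assumption: origin is a single IP or a comma-separated list of IPs.
    """
    seen_addresses = [part.strip() for part in origin.split(",")]
    return real_ip not in seen_addresses


# Hypothetical values: the judge saw only the proxy, so the real IP is hidden.
print(ip_is_masked("203.0.113.7", real_ip="198.51.100.20"))

# Hypothetical values: the proxy forwarded the real IP, so it leaked.
print(ip_is_masked("198.51.100.20, 203.0.113.7", real_ip="198.51.100.20"))
```

A proxy that forwards your address would fail this check even though the request technically passed through it.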
    import requests

    def check_proxy(proxy_address):
        """
        Tests a single proxy.
        Format expected: 'ip:port' (e.g., '192.168.1.50:8080')
        """
        proxy_dict = {
            "http": f"http://{proxy_address}",
            "https": f"http://{proxy_address}",
        }
        # Using httpbin.org/ip to echo back the IP address seen by the server
        judge_url = "http://httpbin.org/ip"
        try:
            # Low timeout is critical; dead proxies shouldn't hang your script
            response = requests.get(judge_url, proxies=proxy_dict, timeout=5)
            if response.status_code == 200:
                # Verify the judge actually saw the proxy IP, confirming anonymity
                returned_ip = response.json().get("origin")
                if returned_ip and proxy_address.split(":")[0] in returned_ip:
                    return {
                        "proxy": proxy_address,
                        "status": "Working",
                        # elapsed is a timedelta; convert seconds to milliseconds
                        "speed_ms": response.elapsed.total_seconds() * 1000,
                    }
        except (requests.RequestException, ValueError):
            # Captures timeouts, connection drops, bad handshakes,
            # and replies that are not valid JSON
            pass
        return {"proxy": proxy_address, "status": "Dead", "speed_ms": None}

Step 3: Implementing Concurrency
If you have 1,000 proxies to scan and every one of them hits the 5-second timeout, a single-threaded scanner would need about 5,000 seconds, which is over an hour. By using a thread pool, we can process dozens of proxies at the same time.
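One way to wire this up is with `ThreadPoolExecutor` from `concurrent.futures`. The sketch below is an illustration rather than a definitive implementation: the names `scan_proxies` and `fastest_working`, and the default of 50 workers, are assumptions chosen for this example. The `check` argument is expected to be a function like `check_proxy` from Step 2 that returns a dict with `proxy`, `status`, and `speed_ms` keys.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_proxies(proxies, check, max_workers=50):
    """Run check(proxy) for each proxy on a thread pool and collect the results.

    Results arrive in completion order, not input order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its proxy so failures can be reported.
        futures = {pool.submit(check, proxy): proxy for proxy in proxies}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception:
                # An unexpected error in check() marks that proxy as dead
                # instead of aborting the whole scan.
                results.append(
                    {"proxy": futures[future], "status": "Dead", "speed_ms": None}
                )
    return results


def fastest_working(results):
    """Return only the working proxies, fastest first."""
    working = [r for r in results if r["status"] == "Working"]
    return sorted(working, key=lambda r: r["speed_ms"])
```

Assuming `check_proxy` is defined as in Step 2 and `candidates` is your list of `ip:port` strings, a scan would look like `results = scan_proxies(candidates, check_proxy)`. With 50 workers, 1,000 dead proxies are processed in about 20 rounds of 5-second timeouts, so the worst case drops to something on the order of 100 seconds. Threads suit this job because each worker spends nearly all of its time waiting on the network.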