I thought my setup was bullet‑proof: a tiny Apache server inside an Ubuntu VM, a single index.html, and a friendly hostname entry in /etc/hosts. Type that hostname into any browser on the host and the page pops right up.
Then I tried to be clever and fetch the same HTML with nothing but Python sockets. Instead of the page, Apache tossed me a 404 Not Found. One hour of head‑scratching later, I discovered a one‑character typo that broke the whole request.
Below is the journey from the wrong code, through the detective work, to the quick fix and a handful of practice ideas so you can stretch your own socket muscles.
Error Code
import socket
import sys # for sys.exit()

try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print("Failed to initialize socket")
    sys.exit()
print("Socket initialized")

host = "vulnerable"
port = 80
try:
    remote_ip = socket.gethostbyname(host)
except socket.gaierror:
    print("Hostname could not be resolved. Exiting")
    sys.exit()

s.connect((remote_ip, port))
print(f"Socket connected to {host} on IP {remote_ip}")

message = "GET /HTTP/1.1\r\n\r\n".encode() # send as bytes
try:
    s.sendall(message)
except socket.error:
    print("Send failed")
    sys.exit()
print("Message sent successfully")

reply = s.recv(4096)
print(reply)
The program ran, but Apache answered:
HTTP/1.1 404 Not Found
...
The requested URL /HTTP/1.1 was not found on this server.
Why the Server Spat Out 404
An HTTP/1.1 request line must look exactly like this:
<METHOD> <PATH> <VERSION>\r\n
My request line was:
GET /HTTP/1.1
There’s only one space, so Apache parses it as:
Field | Value |
---|---|
Method | GET |
Path | /HTTP/1.1 |
Version | (missing) |
Apache dutifully looked for a file called /HTTP/1.1, couldn’t find it, and returned 404. A browser, however, sends something like:
GET / HTTP/1.1
Host: vulnerable
Note the space after the slash and the Host header; HTTP/1.1 requires both.
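To see why the missing space matters, here is a rough sketch of how a server might split a request line on spaces. It is not Apache's actual parser, and split_request_line is a hypothetical name used only for illustration:

```python
def split_request_line(line: str) -> dict:
    """Split an HTTP request line on single spaces into its three fields."""
    parts = line.rstrip("\r\n").split(" ")
    return {
        "method": parts[0] if len(parts) > 0 else None,
        "path": parts[1] if len(parts) > 1 else None,
        "version": parts[2] if len(parts) > 2 else None,
    }


# Compare the broken request line with the corrected one.
for line in ("GET /HTTP/1.1\r\n", "GET / HTTP/1.1\r\n"):
    print(repr(line), "->", split_request_line(line))
```

With only one space, the whole string /HTTP/1.1 lands in the path field and the version field stays empty, which matches the table above.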
A Minimal, Working Fix
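import socket
import sys

HOST, PORT = "vulnerable", 80

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        # gaierror (failed lookup) is a subclass of socket.error, so one except covers both
        s.connect((socket.gethostbyname(HOST), PORT))
        print(f"Connected to {HOST}")
    except socket.error as e:
        sys.exit(f"Socket error → {e}")

    request = (
        "GET / HTTP/1.1\r\n"          # method, space, path, space, version
        f"Host: {HOST}\r\n"           # mandatory in HTTP/1.1
        "Connection: close\r\n\r\n"   # blank line ends the headers
    ).encode()
    s.sendall(request)

    # The server closes the connection after replying, so recv() eventually returns b""
    response = bytearray()
    while chunk := s.recv(4096):
        response.extend(chunk)

print(response.decode(errors="replace"))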
What changed?
- GET / HTTP/1.1: correct path and space
- Added Host: header (mandatory)
- Added Connection: close so Apache closes the socket when done
- Replaced the single recv with a loop so I capture the whole response, not just the first 4 kB
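One way to make this kind of typo harder to repeat is to assemble the request from its parts instead of typing the whole string by hand. The sketch below is only an illustration; build_request is a hypothetical helper, not something from the standard library:

```python
def build_request(host: str, path: str = "/", method: str = "GET") -> bytes:
    """Assemble a minimal HTTP/1.1 request so the spaces cannot be forgotten."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    if " " in path:
        raise ValueError("path must not contain spaces")
    lines = [
        f"{method} {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",              # required by HTTP/1.1
        "Connection: close",          # ask the server to close when done
    ]
    # Every line ends with CRLF, and an empty line terminates the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_request("vulnerable"))
```

The result can be passed straight to s.sendall() in place of the hand-written request.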
Extra Practice Ideas
- Save the HTML to a file
    # note: response includes the headers as well as the HTML body
    with open("index_from_socket.html", "wb") as f:
        f.write(response)
- Measure round‑trip time
    import time
    start = time.perf_counter()
    # …send/recv…
    print(f"Took {time.perf_counter() - start:.3f}s")
- Send a HEAD request: swap GET / for HEAD / to fetch headers only.
- Parse response headers
    header_blob, _, body = response.partition(b"\r\n\r\n")
    headers = {}
    for line in header_blob.split(b"\r\n")[1:]:  # skip status line
        name, _, value = line.decode().partition(":")
        headers[name.strip()] = value.strip()  # drop the space after the colon
    print(headers.get("Content-Type"))
- Turn it into a helper: wrap everything in fetch(host, path="/", *, port=80) that returns (status, headers, body).
- Try basic HTTPS
    import ssl
    context = ssl.create_default_context()
    with context.wrap_socket(socket.socket(), server_hostname=HOST) as s:
        s.connect((HOST, 443))
        s.sendall(request) # same request string
        ...
- Follow one redirect: if the status is 301 or 302, grab the Location: header and call fetch() again.
- Mini stress‑test: spin up 50 threads via concurrent.futures.ThreadPoolExecutor and record average latency.
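As a starting point for the helper and redirect ideas, here is a minimal sketch of what fetch() could look like. The names parse_response and redirect_target, the timeout and method parameters, and the choice to lower-case header names are my own assumptions. It also does not decode chunked transfer encoding, so treat it as a toy rather than a real HTTP client:

```python
import socket


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK"; the code is the second field.
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def redirect_target(status: int, headers: dict):
    """Return the Location header for a 301/302 response, else None."""
    if status in (301, 302):
        return headers.get("location")
    return None


def fetch(host, path="/", *, port=80, method="GET", timeout=5.0):
    """Send one request over a plain TCP socket and parse the reply."""
    request = (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(request)
        raw = bytearray()
        while chunk := s.recv(4096):  # read until the server closes
            raw.extend(chunk)
    return parse_response(bytes(raw))


# Offline check with a canned response, so no server is needed to try the parser.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>hi</h1>"
print(parse_response(sample))
```

Against the VM from this post, the call would be status, headers, body = fetch("vulnerable"), and passing method="HEAD" covers the HEAD exercise as well.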
Final Thoughts
One missing space, /HTTP/1.1 instead of / HTTP/1.1, cost me an afternoon. Raw sockets give you total control, but that means every byte must be perfect. Once the syntax is fixed, the same 30‑line script becomes a playground: you can benchmark your server, parse headers, experiment with HTTPS, or even build a toy web crawler.