Building an HTTP Server From Scratch in Python

Estimated read time: 15 min

In this in-depth tutorial, the author explores how an HTTP server works at the byte level, from raw TCP sockets to request parsing, routing, MIME types, and gzip compression. With clear explanations and focused code examples drawn from building a server from scratch in Python, the article is aimed at developers who want to understand how the web works beneath frameworks.

Why I Built My Own HTTP Server

When I started backend development, I felt that if I jumped straight into frameworks, I would be building on top of something I did not truly understand. And with AI able to scaffold projects and generate boilerplate in seconds, slowing down to learn the fundamentals felt more important than ever.

So, I built an HTTP server from scratch in Python.

I wanted to see what really happens between a client and a server, to understand HTTP at the level where everything is just bytes and structure. Once I began reading raw bytes from a browser, the “magic” disappeared and HTTP revealed itself as a simple, disciplined dialogue between two machines.

You can find the full implementation and commits on my GitHub repo. To make the flow easier to visualize, the end-to-end path a request follows inside the server is: accept a TCP connection, buffer the raw bytes, parse the headers, read the body, route the request, and build the response, with each stage adding just enough structure to turn a byte stream into a valid HTTP response.

The Bare Metal: TCP Sockets

Every HTTP server begins by creating a TCP socket. TCP, which stands for Transmission Control Protocol, provides a continuous stream of bytes between two machines. The server binds this socket to an address and listens for incoming connections. The moment a client connects, you receive a stream of bytes, and that stream is your only source of truth.

Python snippet

# module-level imports used here: socket, threading, and NoReturn from typing
def serve_forever(self) -> NoReturn:
    sem = threading.Semaphore(self.config.max_concurrent_connections)
    lock = threading.Lock()
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    server_socket.bind((self.config.host, self.config.port))
    server_socket.listen(64)

In the code above, the server sets up that listening socket. It creates a semaphore to cap the number of concurrent connections, a lock to protect shared state, and then a TCP socket using AF_INET and SOCK_STREAM. The socket option SO_REUSEPORT is enabled so the server can restart cleanly on the same port, after which the socket is bound to the configured host and port and put into listening mode, ready to accept incoming clients.

This is the first real lesson: TCP does not care about message boundaries. You might receive half a request, one and a half requests, headers in one chunk and the body in another, or everything mixed together. It’s your job to impose structure on that chaos. Working at this layer forces a new mindset: thinking in terms of buffering, partial reads, and unpredictable arrival patterns. Before I even touched HTTP, the socket taught me to treat incoming data as unstructured and indifferent to my expectations.
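
The excerpt above stops before the accept loop, so here is a minimal sketch of how the rest of serve_forever might continue; handle_connection is my placeholder name for a per-connection handler, not necessarily what the repo calls it. Each accepted socket is handed to a thread, and the semaphore created earlier caps how many handlers run at once.

Python snippet

while True:
    conn, addr = server_socket.accept()            # blocks until a client connects
    sem.acquire()                                  # cap the number of in-flight handlers

    def worker(conn=conn, addr=addr):
        try:
            self.handle_connection(conn, addr)     # hypothetical per-connection handler
        finally:
            conn.close()
            sem.release()                          # free the slot for the next client

    threading.Thread(target=worker, daemon=True).start()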

Parsing the request: finding structure in the chaos

Once bytes arrive, the next challenge is deciding where the HTTP request ends. HTTP gives you two essential landmarks:

  • \r\n\r\n: a blank line made of two CRLF sequences (CRLF being carriage return + line feed), marking where the headers end and the body begins
  • Content-Length: how many bytes belong to the body

To turn the raw header bytes into a structured request, the server does three things:

  1. Decode the incoming bytes so it can search for \r\n\r\n and split text lines safely, even if the client sends odd characters.
  2. Extract the start-line by splitting at the first \r\n. If that separator is not found, the request is malformed because HTTP requires a start-line followed by headers.
  3. Validate the start-line format by splitting it on spaces. A proper request line must have exactly three parts: <method> <request-target> <http-version>.

Python snippet

text = header_bytes.decode("utf-8", errors="replace")
try:
    start_line, header_block = text.split(CRLF, 1)
except ValueError:
    raise BadRequest("Malformed request: missing CRLF after the start-line")

parts = start_line.split(" ")
if len(parts) != 3:
    raise BadRequest("A request line must follow: "
                     "<method>SPACE<request-target>SPACE<HTTP-version>CRLF")

method, target, http_version = parts[0], parts[1], parts[2]

In the code above, the server decodes the raw header bytes into text, then tries to split once on the first CRLF to separate the start-line from the rest of the header block. If this split fails, it raises a BadRequest error because a valid HTTP request must begin with a start-line. It then splits the start-line on spaces and checks that there are exactly three parts; otherwise, it raises another BadRequest explaining the required format. Finally, it assigns those three parts to method, target, and http_version, turning an unstructured line of text into a clean, typed request line. Once the header section is isolated, the server can parse the remaining headers and determine whether a body is expected.
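
As a rough sketch of that final step (my own illustration, not the repo's exact code), the remaining header_block can be split on CRLF and each line on its first colon. Lowercasing the names matches how the later snippets look headers up, for example req.get_header("user-agent") and "accept-encoding" in req.headers.

Python snippet

headers: dict[str, str] = {}
for line in header_block.split(CRLF):
    if not line:
        continue                                   # skip the empty lines left by the final \r\n\r\n
    name, sep, value = line.partition(":")
    if not sep:
        raise BadRequest(f"Malformed header line: {line!r}")
    headers[name.strip().lower()] = value.strip()  # header names are case-insensitive

content_length = int(headers.get("content-length", 0))  # a body is expected only if this is > 0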

Handling real-world byte streams

Clients rarely send data in perfectly shaped segments. A header block might be split across reads, or arrive glued together with the first part of the body. To handle this, the server uses a simple loop:

  1. Read the socket in chunks and append each chunk to a growing buffer.
  2. Stop as soon as \r\n\r\n appears; at that point, the headers are complete. Reject oversized or stalled requests via limits and timeouts.
  3. Split the buffer into two parts:
    • header_bytes: everything up to and including \r\n\r\n
    • body_prefix: any bytes that arrived after the headers (often the first slice of the request body)

Python snippet

# 'end' is the header terminator b"\r\n\r\n"; 'buffer' starts as an empty bytearray
while True:
    chunk = conn.recv(4096)
    if not chunk:
        break
    buffer += chunk
    if end in buffer:
        break
    if len(buffer) > max_headers:
        raise BadRequest("Header section too large.")

if end not in buffer:
    raise BadRequest("Header section incomplete.")

head_end = buffer.index(end) + len(end)
header_bytes = bytes(buffer[:head_end])
body_prefix = bytes(buffer[head_end:])
return header_bytes, body_prefix

In the snippet above, the server repeatedly calls recv(4096) and appends each chunk to buffer. As soon as the header terminator sequence is found, the loop breaks. If the buffer grows beyond max_headers, it raises BadRequest("Header section too large.") to protect the server from oversized or malicious requests. After the loop, it checks that the terminator is actually present; if not, it raises BadRequest("Header section incomplete."). Finally, it computes the position of the header end, slices out header_bytes and body_prefix, and returns both so later code can continue parsing.

Now the server needs to read the request body. It does this by using Content-Length to finish the job:

  1. Check how many bytes the client promised to send using Content-Length.
  2. Reject oversized bodies early using a configured maximum size.
  3. Start the body buffer using the prefix already received.
  4. Continue reading from the socket until the body length matches Content-Length.
  5. If the connection closes before all bytes arrive, the server raises an error for an incomplete body.

Python snippet

total = req.content_length or 0
if total == 0:
    req.add_body(b"")
    return

if total > self.config.max_body_bytes:
    raise PayloadTooLarge()

body = bytearray(req.body_prefix)
if len(body) > total:
    body = body[:total]

while len(body) < total:
    chunk = conn.recv(min(4096, total - len(body)))
    if not chunk:
        raise IncompleteBody()
    body += chunk

req.add_body(bytes(body))  # attach the complete body, mirroring the empty-body case above

In this second snippet, the server does exactly that. It looks at req.content_length to determine how many bytes it should expect. If the total is zero, it simply attaches an empty body and returns. If the total exceeds max_body_bytes, it raises PayloadTooLarge. It then initializes body from the previously captured body_prefix. If that prefix already contains more bytes than expected, it trims it down to exactly total. Otherwise, it keeps calling recv() until len(body) reaches total, and if the connection closes too early it raises IncompleteBody. This guarantees that the final body matches the declared Content-Length exactly. This approach works even when clients pipeline multiple requests or send headers and body together, because the server trusts boundaries, not timing. Parsing HTTP manually forces precision and reveals why real servers care so much about buffering and strict message framing.

Routing, file serving and HTTP response

Once the request is parsed, the server decides how to respond. Every HTTP response follows a strict skeleton: a status line, a series of HTTP headers, and usually a message body. If any part is malformed, especially Content-Length, the browser rejects it.
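
For concreteness, here is roughly what that skeleton looks like as bytes on the wire, using illustrative values of my own (the body "Hello, world!" is 13 bytes, which is exactly what Content-Length must declare):

Python snippet

response = (
    b"HTTP/1.1 200 OK\r\n"            # status line
    b"Content-Type: text/plain\r\n"   # headers, one per CRLF-terminated line
    b"Content-Length: 13\r\n"
    b"Connection: close\r\n"
    b"\r\n"                           # blank line that ends the header section
    b"Hello, world!"                  # body: exactly 13 bytes, as promised above
)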

Example: the /user-agent endpoint

This is a simple dynamic route that illustrates the basics:

  1. Read the User-Agent header from the parsed request.
  2. Use it as the plain-text body.
  3. Build a response with 200 OK, Content-Type: text/plain, and the correct Content-Length.
  4. Send headers and body together with correct CRLF formatting.

Python snippet

elif path == "/user-agent":
    ua = req.get_header("user-agent")
    body = ua.encode("utf-8")
    head = CRLF.join([
        HTTP_CODE_200,
        "Content-Type: text/plain",
        f"Content-Length: {len(body)}",
        "Connection: close",
    ]) + END_HEADERS
    return head.encode('utf-8') + body

In the snippet above, when the path equals /user-agent, the server pulls the User-Agent header from the request, encodes it as bytes, and uses that as the response body. It then assembles the response head by joining the status line and headers (Content-Type, Content-Length, Connection) with CRLF, appends the final blank line, and returns the encoded headers followed by the body bytes. This is a minimal but complete example of constructing a valid HTTP response by hand.

Even tiny mistakes here break clients, which made response construction feel surprisingly exacting.

Example: the /files/ endpoint

The /files/ endpoint turns the server into a small static file server with safety checks:

  • Validate the filename (rejecting .. and unsafe path escapes).
  • Resolve it against a fixed root directory.
  • Return 404 if the file is outside the root or does not exist.
  • For GET, read the file’s bytes and return them with Content-Type: application/octet-stream.
  • For POST, write the request body into a file and return 201 Created.

In the code snippets below, you can see these safeguards in action.

Extracting and validating the filename

The server first checks whether the path starts with /files/ and uses partition to extract the requested filename. If the filename is empty or contains / or .., the server immediately returns 404, blocking directory-traversal attempts.

Python snippet

elif path.startswith("/files/"):
    _, _, filename = path.partition("/files/")
    if not filename or "/" in filename or ".." in filename:
        head = HTTP_CODE_404 + END_HEADERS
        return head.encode("utf-8")

Resolving the path safely

Once the filename passes the initial validation, the server resolves it against a fixed root directory. Using relative_to, it verifies that the resolved path is still inside that root. If the resolution escapes the allowed directory, the server logs a message and rejects the request with 404.

Python snippet

else:
    full_root = file_root.resolve()
    full_path = (full_root / filename).resolve()
    try:
        full_path.relative_to(full_root)
    except ValueError:
        print(f"Cannot read '{filename}' "
              "as it is outside the permitted directory.")
        head = HTTP_CODE_404 + END_HEADERS
        return head.encode("utf-8")

Handling GET requests

If the request method is GET, the server checks whether the resolved path points to a real file. If not, it returns 404. Otherwise, it reads the file’s bytes, constructs a 200 OK response with Content-Type: application/octet-stream and an accurate Content-Length, and sends the headers followed by the file content.

Python snippet

if req.method == 'GET':
    if not full_path.is_file():
        print(f"'{filename}' is not a file or format not allowed.")
        head = HTTP_CODE_404 + END_HEADERS
        return head.encode("utf-8")
    try:
        content = full_path.read_bytes()
        head = CRLF.join([
            HTTP_CODE_200,
            "Content-Type: application/octet-stream",
            f"Content-Length: {len(content)}",
            "Connection: close",
        ]) + END_HEADERS
        return head.encode('utf-8') + content
    except OSError:
        # reading failed (permissions, file removed mid-request): answer with 404
        head = HTTP_CODE_404 + END_HEADERS
        return head.encode("utf-8")

Handling POST requests

For POST requests, the server delegates the write operation to a helper function that writes the body to disk and verifies the result. If the write succeeds, the handler returns 201 Created; otherwise, it logs the error and returns a failure response.

Python snippet

if req.method == 'POST':
    if create_write_file(full_path, req):
        head = HTTP_CODE_201 + END_HEADERS
        return head.encode('utf-8')
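
The helper itself is not shown in the article; a minimal version might look like the sketch below, assuming the parsed request exposes its body as bytes (the attribute name body is my guess, and the repo's helper may well differ).

Python snippet

def create_write_file(full_path, req) -> bool:
    """Write the request body to disk and report whether the file now exists."""
    try:
        full_path.write_bytes(req.body)            # 'body' attribute is an assumption
    except OSError as exc:
        print(f"Could not write '{full_path.name}': {exc}")
        return False
    return full_path.is_file()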

MIME types in practice

At this point, MIME types start to matter. To the server, everything is just bytes; the browser needs labels. The Content-Type header tells the browser whether the data is HTML, JSON, an image, or something else. It’s a small detail, but essential to client behavior:

  • text/plain → raw text
  • text/html → HTML document
  • application/json → JSON data
  • application/octet-stream → arbitrary binary file
  • image/png → PNG image
  • image/jpeg → JPEG image
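
If you do not want to hard-code that table, Python's standard mimetypes module can guess the label from the file extension. The sketch below is my own, not code from the repo; it falls back to application/octet-stream for unknown types, matching what the /files/ endpoint serves.

Python snippet

import mimetypes

def guess_content_type(filename: str) -> str:
    # guess_type returns (mime_type, encoding); mime_type is None for unknown extensions
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"

# guess_content_type("index.html")  -> "text/html"
# guess_content_type("photo.png")   -> "image/png"
# guess_content_type("data.xyz123") -> "application/octet-stream"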

Gzip compression: transforming content the right way

To support gzip, the server checks the Accept-Encoding header. If gzip is not listed, it returns the original body unchanged. If it is supported, the server:

  1. compresses the body,
  2. adds Content-Encoding: gzip,
  3. recalculates Content-Length to match the compressed size.

Python snippet

if "accept-encoding" in req.headers:
    comp_scheme = req.get_header("accept-encoding")
    if "gzip" not in comp_scheme:
        head = CRLF.join([
            HTTP_CODE_200,
            "Content-Type: text/plain",
            f"Content-Length: {len(body)}",
            "Connection: close",
        ]) + END_HEADERS
        return head.encode("utf-8") + body
    else:
        compressed_body = gzip.compress(body)
        head = CRLF.join([
            HTTP_CODE_200,
            "Content-Type: text/plain",
            "Content-Encoding: gzip",
            f"Content-Length: {len(compressed_body)}",
            "Connection: close",
        ]) + END_HEADERS
        return head.encode("utf-8") + compressed_body

As shown in the code above, the server retrieves the Accept-Encoding header and checks whether it contains the string "gzip". If not, it builds a normal 200 OK response with the uncompressed body. If gzip is allowed, the server compresses the response body using gzip.compress(), sets Content-Encoding: gzip, and computes the new Content-Length based on the compressed bytes before sending the response.

This is where message framing becomes unavoidable. After compression, the server must send exactly the number of bytes it declares. One extra or missing byte leads to a corrupted stream or a hanging browser.

Beyond correctness, gzip shows why the modern web feels fast. Human-readable data compresses extremely well, cutting bandwidth and latency. Clients expect gzip almost automatically, because compression is now a standard part of web performance. Implementing it gave me a practical understanding of why servers care so much about precision and efficiency.

Pitfalls that only show up when you build it yourself

Working through these features exposed the problems that frameworks normally shield you from. Buffering and partial reads became constant companions: a single recv() might contain half a header, two headers glued together, or a body split across multiple fragments.

Another challenge was handling more than one request on the same connection. Even without full keep-alive support, a single read could contain the end of one request and the beginning of the next. Reading even one byte too far meant accidentally swallowing part of the next message.

Then there was the inevitable off-by-one Content-Length bug. If the length I advertised did not match the actual body, even by a single byte, the browser would discard the response or hang indefinitely.

Gzip came with its own sharp edges as well: one wrong size, one incorrect frame, and the browser simply refused to decode the stream. These small battles are where HTTP stops being a neat list of rules and becomes something you can feel. TCP is not trying to help you; it simply delivers whatever arrives. In those unpredictable corners, real intuition is built. No framework or tutorial can give you that.

Where a custom HTTP server makes sense

I built this server as a learning exercise, but it also made me notice where a tiny hand-rolled server genuinely has value.

Some environments need something light and predictable, especially embedded or constrained systems where a full framework is excessive. In other contexts, you may need complete control over the bytes on the wire: for experimenting with protocol behavior or testing edge cases that mainstream servers hide.

There are long-lived systems built on older or proprietary protocols that expect very specific communication patterns. In those cases, building your own server is not over-engineering; it is the only way to meet the requirements.

And for debugging, a minimal server that shows exactly what it sends and receives is incredibly useful for performance investigations and networking issues, where removing abstraction brings clarity. I did not begin this project thinking about real-world uses, but working this close to the protocol made those possibilities hard to miss.

Next step: persistent connections

With the basics and gzip in place, my next step is implementing persistent connections.

HTTP/1.1 encourages reusing the same connection for multiple requests, which means:

  • reading the next request without mixing it with leftover bytes,
  • avoiding reads that block forever,
  • keeping clean state across consecutive messages.

At that point, it starts feeling less like a single exchange and more like a conversation streamed over time.
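
As a first sketch of that direction (a plan of mine, not code that exists in the repo yet), the key change is to carry any bytes read past the end of one message into the parse of the next, and to bound how long an idle connection may stay open; read_request and build_response below are hypothetical names.

Python snippet

leftover = b""
conn.settimeout(5.0)                               # avoid reads that block forever
while True:
    try:
        # hypothetical: parse one full message, return any bytes that belong to the next one
        req, leftover = read_request(conn, leftover)
    except socket.timeout:
        break                                      # idle too long: close the connection
    if req is None:
        break                                      # client closed the connection cleanly
    conn.sendall(build_response(req))              # hypothetical response builder
    if req.get_header("connection") == "close":
        break
conn.close()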

Why fundamentals matter even more in the AI era

Building this server changed how I see backend development. AI can write code, scaffold apps, and fill in patterns, but it cannot give you intuition. It cannot teach you why something breaks or help you debug a system that behaves unexpectedly.

The fundamentals — buffering, framing, parsing, MIME types, compression — are the bedrock beneath everything else. The more we automate, the more these foundations matter. Building an HTTP server was not about reinventing anything. It was about grounding myself before climbing higher.

Camille Onoda https://www.linkedin.com/in/camilleonoda/

Camille Onoda is a backend developer in training working with Python, Go, and AWS. Coming from a background in translation, she brings clarity, structure, and cross-cultural communication into her engineering work. She is focused on learning backend fundamentals through practical, self-built projects. Experienced in remote collaboration and version control, she values reliability, clean design, and understanding how systems really work under the surface. She speaks French, English and Japanese.
