How a Tiny Django Droplet Turned 502 Bad Gateway into a Bulletproof Stack

I run a small Django site on a DigitalOcean droplet. One morning every page started throwing 502 Bad Gateway. A quick peek at the logs showed Django’s classic connection-refused bomb:

The Error

django.db.utils.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")

Below is the full play-by-play: the original setup, the detective work, the code I shipped, and the extra tooling I added so the same glitch won’t bite me twice.

My Code

# settings.py (old)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "hms",
        "USER": "hms",
        "PASSWORD": "27Kovelov",
        "HOST": "127.0.0.1",
        "PORT": "3306",
        "OPTIONS": {"init_command": "SET sql_mode='STRICT_TRANS_TABLES'"},
    }
}

Nothing exotic here: plain local MySQL, default port, one SQL-mode tweak.

What the Error Means

  • Error 2003 is MySQL-speak for “can’t reach the server.”
  • (111) is the underlying Linux errno for connection refused.
  • When Django fails its first handshake, my application server (Gunicorn) bails, so Nginx gives the visitor a 502.

Because the failure hit every route at once, I knew the bottleneck was the socket handshake, not a dodgy query buried in a view.
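
The errno is easy to confirm without Django in the loop. Here is a minimal sketch (the script name is mine; the host and port match the settings above) that pokes port 3306 with a bare socket:

# probe_db_socket.py -- is anything listening on 3306?
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2)
try:
    sock.connect(("127.0.0.1", 3306))
    print("TCP handshake OK: something is listening on 3306")
except ConnectionRefusedError as exc:  # errno 111, the (111) in the traceback
    print(f"connection refused (errno {exc.errno}): mysqld is not accepting sockets")
except socket.timeout:
    print("timed out: likely a firewall or routing issue, not a refused socket")
finally:
    sock.close()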

Why the Crash Was Only “Sometimes”

Likely triggers, and what actually happens:

  • MySQL restarts or is killed by the OOM reaper: Django’s first attempt after the restart dies; Nginx shows 502.
  • max_connections is reached: new sockets are refused until old ones close.
  • A firewall or DNS hiccup: the handshake times out; Django gives up.
  • A separate data-logging app pounds MySQL: CPU or I/O starves, and new logins are rejected.

So concurrent access isn’t the real villain; resource exhaustion is.
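
A quick way to test the exhaustion theory is to ask MySQL how big its connection budget is and how much of it is in use. A sketch run from python manage.py shell, using standard MySQL variables and status counters:

# run from: python manage.py shell
from django.db import connection

with connection.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'max_connections'")
    print(cur.fetchone())  # e.g. ('max_connections', '151') -- the hard ceiling
    cur.execute("SHOW STATUS LIKE 'Threads_connected'")
    print(cur.fetchone())  # how many of those slots are occupied right now

If Threads_connected keeps brushing up against max_connections while the logging app is busy, the refused handshakes explain themselves.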

Correct Code

First I asked the box a simple question:

# on the droplet
systemctl status mysql
# from my laptop
mysqladmin -h 127.0.0.1 -P 3306 ping

If either command stalls or prints “mysqld is not running,” the fix starts at the database, not Django.
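
If mysqladmin isn’t installed on the laptop, the same ping can be approximated with the driver Django’s MySQL backend depends on (mysqlclient, imported as MySQLdb). A hedged sketch; the script name is mine and the password is assumed to come from the environment rather than being typed inline:

# mysql_ping.py -- rough equivalent of: mysqladmin ping
import os

import MySQLdb  # the mysqlclient driver behind django.db.backends.mysql

try:
    conn = MySQLdb.connect(
        host="127.0.0.1",
        port=3306,
        user="hms",
        passwd=os.environ["DB_PASSWORD"],  # assumed to be exported beforehand
        connect_timeout=2,
    )
    conn.close()
    print("mysqld is alive")
except MySQLdb.OperationalError as exc:
    print(f"mysqld is not reachable: {exc}")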

Hardening settings.py

I decided to stop hard-coding secrets, keep sockets alive a bit longer, and fail fast when MySQL really is down:

# settings.py (new)
import os
from pathlib import Path
from django.core.exceptions import ImproperlyConfigured

def env(key, default=None):
    value = os.getenv(key, default)
    if value is None:
        raise ImproperlyConfigured(f"Missing env var {key}")
    return value

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": env("DB_NAME", "hms"),
        "USER": env("DB_USER", "hms"),
        "PASSWORD": env("DB_PASSWORD"),
        "HOST": env("DB_HOST", "127.0.0.1"),
        "PORT": env("DB_PORT", "3306"),
        "CONN_MAX_AGE": 300,  # keep sockets for 5 min
        "OPTIONS": {
            "init_command": "SET sql_mode='STRICT_TRANS_TABLES'",
            "connect_timeout": 2,  # fail fast
        },
    }
}

  • Environment variables keep credentials out of Git (one way to load them is sketched right after this list).
  • CONN_MAX_AGE means fewer handshakes, so a short MySQL blip doesn’t nuke every request.
  • connect_timeout stops the user from staring at a spinner while the load-balancer tries another worker.
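
To feed those variables to Gunicorn and manage.py alike, one option is a .env file that never gets committed. The sketch below assumes the python-dotenv package and a .env file sitting next to manage.py; an EnvironmentFile= line in the Gunicorn systemd unit works just as well without the extra dependency.

# settings.py (very top, before the env() helper is called)
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

# Copy KEY=value pairs from <project root>/.env into os.environ; this is a
# no-op when the file is absent (e.g. in production, where real env vars exist).
load_dotenv(Path(__file__).resolve().parent.parent / ".env")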

Small Tools That Pay Big Dividends

Health-Check Endpoint

# core/views.py
from django.db import connections
from django.http import JsonResponse

def db_health(request):
    with connections["default"].cursor() as cur:
        cur.execute("SELECT 1")
    return JsonResponse({"mysql": "ok"})

I wired this view into urls.py and pointed my uptime monitor at it. Now I get a ping the moment MySQL naps.
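
The wiring itself is one line. A sketch against the project’s root urls.py; the route and its name are my own choices:

# urls.py
from django.urls import path

from core.views import db_health

urlpatterns = [
    # ... existing routes ...
    path("healthz/db/", db_health, name="db-health"),
]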

Connection Stress Test

# core/management/commands/db_burnin.py
from django.core.management.base import BaseCommand
from django.db import connections, OperationalError

class Command(BaseCommand):
    help = "Open N rapid connections to probe MySQL limits"

    def add_arguments(self, parser):
        parser.add_argument("count", type=int)

    def handle(self, *args, **opts):
        failures = 0
        for _ in range(opts["count"]):
            try:
                with connections["default"].cursor() as cur:
                    cur.execute("SELECT 1")
            except OperationalError:
                failures += 1
            finally:
                # Close so the next iteration opens a genuinely new connection
                # instead of reusing the one Django caches per thread.
                connections["default"].close()
        self.stdout.write(
            self.style.SUCCESS(f"Done: {failures}/{opts['count']} failures")
        )

Running python manage.py db_burnin 500 after hours tells me exactly when MySQL starts refusing new logins.

Lightweight Retry Middleware

# middleware.py
import time
from django.db import OperationalError, connections

class DBRetryMiddleware:
    def __init__(self, get_response, retries=3):
        self.get_response = get_response
        self.retries = retries

    def __call__(self, request):
        for attempt in range(self.retries):
            try:
                return self.get_response(request)
            except OperationalError:
                if attempt + 1 == self.retries:
                    raise
                time.sleep(0.5)
                for conn in connections.all():
                    conn.close_if_unusable_or_obsolete()

Added low in MIDDLEWARE, it waits half a second and tries again: good enough for a transient spike, cheap enough for normal traffic.
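
For completeness, “low in MIDDLEWARE” means listed last, so the retry wraps the view call as tightly as possible. A sketch with Django’s default stack; the dotted path is an assumption about where middleware.py lives:

# settings.py
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
    "hms.middleware.DBRetryMiddleware",  # last = closest to the view it retries
]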

Final Thought

Every “intermittent” database meltdown I’ve hit boiled down to resources (RAM, I/O, connection slots) or a silent MySQL restart, never a rogue Django query. By moving secrets to the environment, keeping sockets warm, adding fast timeouts, and building tiny self-tests into the project, I traded midnight mysteries for clean alerts and quick recoveries.
