How a Tiny Django Droplet Turned 502 Bad Gateway into a Bulletproof Stack

I run a small Django site on a DigitalOcean droplet. One morning every page started throwing 502 Bad Gateway. A quick peek at the logs showed Django’s classic connection-refused bomb:

The Error

django.db.utils.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")

Below is the full play-by-play: the original setup, the detective work, the code I shipped, and the extra tooling I added so the same glitch won’t bite me twice.

My Code

# settings.py (old)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "hms",
        "USER": "hms",
        "PASSWORD": "27Kovelov",
        "HOST": "127.0.0.1",
        "PORT": "3306",
        "OPTIONS": {"init_command": "SET sql_mode='STRICT_TRANS_TABLES'"},
    }
}

Nothing exotic here: plain local MySQL, default port, one SQL-mode tweak.

What the Error Means

  • Error 2003 is MySQL-speak for “can’t reach the server.”
  • (111) is the underlying Linux errno for connection refused.
  • When Django fails its first handshake, my application server (Gunicorn) bails, so Nginx gives the visitor a 502.

Because the failure hit every route at once, I knew the bottleneck was the socket handshake, not a dodgy query buried in a view.
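
The errno is easy to confirm without Django in the loop. Here is a minimal sketch (the script name is mine; the host and port match the settings above) that pokes port 3306 with a bare socket:

# probe_db_socket.py -- is anything listening on 3306?
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2)
try:
    sock.connect(("127.0.0.1", 3306))
    print("TCP handshake OK: something is listening on 3306")
except ConnectionRefusedError as exc:  # errno 111, the (111) in the traceback
    print(f"connection refused (errno {exc.errno}): mysqld is not accepting sockets")
except socket.timeout:
    print("timed out: likely a firewall or routing issue, not a refused socket")
finally:
    sock.close()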

Why the Crash Was Only “Sometimes”

Likely triggers, and what actually happens:

  • MySQL restarts or is killed by the OOM reaper: Django’s first attempt after the restart dies; Nginx shows 502.
  • max_connections is reached: new sockets are refused until old ones close.
  • A firewall or DNS hiccup: the handshake times out; Django gives up.
  • A separate data-logging app pounds MySQL: CPU or I/O starves, and new logins are rejected.

So concurrent access isn’t the real villain; resource exhaustion is.
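
A quick way to test the exhaustion theory is to ask MySQL how big its connection budget is and how much of it is in use. A sketch run from python manage.py shell, using standard MySQL variables and status counters:

# run from: python manage.py shell
from django.db import connection

with connection.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'max_connections'")
    print(cur.fetchone())  # e.g. ('max_connections', '151') -- the hard ceiling
    cur.execute("SHOW STATUS LIKE 'Threads_connected'")
    print(cur.fetchone())  # how many of those slots are occupied right now

If Threads_connected keeps brushing up against max_connections while the logging app is busy, the refused handshakes explain themselves.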

Correct Code

First I asked the box a simple question:

# on the droplet
systemctl status mysql
# from my laptop
mysqladmin -h 127.0.0.1 -P 3306 ping

If either command stalls or prints “mysqld is not running,” the fix starts at the database, not Django.
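
If mysqladmin isn’t installed on the laptop, the same ping can be approximated with the driver Django’s MySQL backend depends on (mysqlclient, imported as MySQLdb). A hedged sketch; the script name is mine and the password is assumed to come from the environment rather than being typed inline:

# mysql_ping.py -- rough equivalent of: mysqladmin ping
import os

import MySQLdb  # the mysqlclient driver behind django.db.backends.mysql

try:
    conn = MySQLdb.connect(
        host="127.0.0.1",
        port=3306,
        user="hms",
        passwd=os.environ["DB_PASSWORD"],  # assumed to be exported beforehand
        connect_timeout=2,
    )
    conn.close()
    print("mysqld is alive")
except MySQLdb.OperationalError as exc:
    print(f"mysqld is not reachable: {exc}")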

Hardening settings.py

I decided to stop hard-coding secrets, keep sockets alive a bit longer, and fail fast when MySQL really is down:

# settings.py (new)
import os
from pathlib import Path
from django.core.exceptions import ImproperlyConfigured

def env(key, default=None):
    value = os.getenv(key, default)
    if value is None:
        raise ImproperlyConfigured(f"Missing env var {key}")
    return value

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": env("DB_NAME", "hms"),
        "USER": env("DB_USER", "hms"),
        "PASSWORD": env("DB_PASSWORD"),
        "HOST": env("DB_HOST", "127.0.0.1"),
        "PORT": env("DB_PORT", "3306"),
        "CONN_MAX_AGE": 300,  # keep sockets for 5 min
        "OPTIONS": {
            "init_command": "SET sql_mode='STRICT_TRANS_TABLES'",
            "connect_timeout": 2,  # fail fast
        },
    }
}

  • Environment variables keep credentials out of Git (one way to load them is sketched right after this list).
  • CONN_MAX_AGE means fewer handshakes, so a short MySQL blip doesn’t nuke every request.
  • connect_timeout stops the user from staring at a spinner while the load-balancer tries another worker.
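
To feed those variables to Gunicorn and manage.py alike, one option is a .env file that never gets committed. The sketch below assumes the python-dotenv package and a .env file sitting next to manage.py; an EnvironmentFile= line in the Gunicorn systemd unit works just as well without the extra dependency.

# settings.py (very top, before the env() helper is called)
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

# Copy KEY=value pairs from <project root>/.env into os.environ; this is a
# no-op when the file is absent (e.g. in production, where real env vars exist).
load_dotenv(Path(__file__).resolve().parent.parent / ".env")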

Small Tools That Pay Big Dividends

Health-Check Endpoint

# core/views.py
from django.db import connections
from django.http import JsonResponse

def db_health(request):
    with connections["default"].cursor() as cur:
        cur.execute("SELECT 1")
    return JsonResponse({"mysql": "ok"})

I wired this view into urls.py and pointed my uptime monitor at it. Now I get a ping the moment MySQL naps.
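
The wiring itself is one line. A sketch against the project’s root urls.py; the route and its name are my own choices:

# urls.py
from django.urls import path

from core.views import db_health

urlpatterns = [
    # ... existing routes ...
    path("healthz/db/", db_health, name="db-health"),
]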

Connection Stress Test

# core/management/commands/db_burnin.py
from django.core.management.base import BaseCommand
from django.db import connections, OperationalError

class Command(BaseCommand):
    help = "Open N rapid connections to probe MySQL limits"

    def add_arguments(self, parser):
        parser.add_argument("count", type=int)

    def handle(self, *args, **opts):
        failures = 0
        for _ in range(opts["count"]):
            try:
                with connections["default"].cursor() as cur:
                    cur.execute("SELECT 1")
            except OperationalError:
                failures += 1
            finally:
                # Close so the next iteration opens a genuinely new connection
                # instead of reusing the one Django caches per thread.
                connections["default"].close()
        self.stdout.write(
            self.style.SUCCESS(f"Done: {failures}/{opts['count']} failures")
        )

Running python manage.py db_burnin 500 after hours tells me exactly when MySQL starts refusing new logins.

Lightweight Retry Middleware

# middleware.py
import time
from django.db import OperationalError, connections

class DBRetryMiddleware:
    def __init__(self, get_response, retries=3):
        self.get_response = get_response
        self.retries = retries

    def __call__(self, request):
        for attempt in range(self.retries):
            try:
                return self.get_response(request)
            except OperationalError:
                if attempt + 1 == self.retries:
                    raise
                time.sleep(0.5)
                for conn in connections.all():
                    conn.close_if_unusable_or_obsolete()

Added low in MIDDLEWARE, it waits half a second and tries again: good enough for a transient spike, cheap enough for normal traffic.
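
For completeness, “low in MIDDLEWARE” means listed last, so the retry wraps the view call as tightly as possible. A sketch with Django’s default stack; the dotted path is an assumption about where middleware.py lives:

# settings.py
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
    "hms.middleware.DBRetryMiddleware",  # last = closest to the view it retries
]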

Final Thought

Every “intermittent” database meltdown I’ve hit boiled down to resources (RAM, I/O, connection slots) or a silent MySQL restart, never a rogue Django query. By moving secrets to the environment, keeping sockets warm, adding fast timeouts, and building tiny self-tests into the project, I traded midnight mysteries for clean alerts and quick recoveries.
