I run a small Django site on a DigitalOcean droplet. One morning every page started throwing 502 Bad Gateway. A quick peek at the logs showed Django’s classic connection-refused bomb:
The Error
django.db.utils.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")
Below is the full play-by-play: the original setup, the detective work, the code I shipped, and the extra tooling I added so the same glitch won’t bite me twice.
My Code
# settings.py (old)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "hms",
        "USER": "hms",
        "PASSWORD": "27Kovelov",
        "HOST": "127.0.0.1",
        "PORT": "3306",
        "OPTIONS": {"init_command": "SET sql_mode='STRICT_TRANS_TABLES'"},
    }
}
Nothing exotic here: plain local MySQL, default port, one SQL-mode tweak.
What the Error Means
- Error 2003 is MySQL-speak for “can’t reach the server.”
- (111) is the underlying Linux errno for connection refused.
- When Django fails its first handshake, my application server (Gunicorn) bails, so Nginx gives the visitor a 502.
Because the failure hit every route at once, I knew the bottleneck was the socket handshake, not a dodgy query buried in a view.
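A throwaway probe run on the droplet (not part of the project, just a sanity check I find handy) reproduces that refusal without Django in the way:
# probe_3306.py - throwaway sketch: is anything listening on the MySQL port?
import socket

try:
    socket.create_connection(("127.0.0.1", 3306), timeout=2).close()
    print("port 3306 accepted the connection")
except ConnectionRefusedError as exc:  # this is errno 111 on Linux
    print(f"refused: {exc}")
except OSError as exc:  # timeouts, unreachable host, etc.
    print(f"other failure: {exc}")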
Why the Crash Was Only “Sometimes”
| Likely Trigger | What Actually Happens |
| --- | --- |
| MySQL restarts or is killed by the OOM reaper | Django's first attempt after the restart dies; Nginx shows 502 |
| max_connections reached | New sockets are refused until old ones close |
| Firewall or DNS hiccup | Handshake times out; Django gives up |
| A separate data-logging app pounds MySQL | CPU or I/O starves; new logins are rejected |
So concurrent access isn’t the real villain; resource exhaustion is.
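When the max_connections row is the suspect, a quick look from python manage.py shell shows how close the server is running to its cap (a sketch; Threads_connected and max_connections are standard MySQL status/variable names):
# run inside `python manage.py shell`: how many connection slots are in use?
from django.db import connection

with connection.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'max_connections'")
    cap = int(cur.fetchone()[1])
    cur.execute("SHOW STATUS LIKE 'Threads_connected'")
    used = int(cur.fetchone()[1])

print(f"{used} of {cap} connection slots in use")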
Correct Code
First I asked the box a simple question:
# on the droplet
systemctl status mysql
# still on the droplet, over TCP (the same path Django uses)
mysqladmin -h 127.0.0.1 -P 3306 ping
If either command stalls or reports that mysqld isn’t running, the fix starts at the database, not Django.
Hardening settings.py
I decided to stop hard-coding secrets, keep sockets alive a bit longer, and fail fast when MySQL really is down:
# settings.py (new)
import os
from pathlib import Path

from django.core.exceptions import ImproperlyConfigured


def env(key, default=None):
    value = os.getenv(key, default)
    if value is None:
        raise ImproperlyConfigured(f"Missing env var {key}")
    return value


BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": env("DB_NAME", "hms"),
        "USER": env("DB_USER", "hms"),
        "PASSWORD": env("DB_PASSWORD"),
        "HOST": env("DB_HOST", "127.0.0.1"),
        "PORT": env("DB_PORT", "3306"),
        "CONN_MAX_AGE": 300,  # keep sockets for 5 min
        "OPTIONS": {
            "init_command": "SET sql_mode='STRICT_TRANS_TABLES'",
            "connect_timeout": 2,  # fail fast
        },
    }
}
- Environment variables keep credentials out of Git.
- CONN_MAX_AGE means fewer handshakes, so a short MySQL blip doesn’t nuke every request.
- connect_timeout stops the user from staring at a spinner while the load balancer tries another worker.
Small Tools That Pay Big Dividends
Health-Check Endpoint
# core/views.py
from django.db import connections
from django.http import JsonResponse


def db_health(request):
    # one cheap round trip; if MySQL is down this raises and the monitor sees a non-200
    with connections["default"].cursor() as cur:
        cur.execute("SELECT 1")
    return JsonResponse({"mysql": "ok"})
I wired this view into urls.py (sketched below) and pointed my uptime monitor at it. Now I get a ping the moment MySQL naps.
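The wiring itself is one line; the /healthz/db/ path and the project-level urls module are my choices here, not something the project dictates:
# urls.py - expose the health check for the uptime monitor
from django.urls import path

from core.views import db_health

urlpatterns = [
    # ... existing routes ...
    path("healthz/db/", db_health, name="db_health"),
]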
Connection Stress Test
# core/management/commands/db_burnin.py
from django.core.management.base import BaseCommand
from django.db import connections, OperationalError


class Command(BaseCommand):
    help = "Open N rapid connections to probe MySQL limits"

    def add_arguments(self, parser):
        parser.add_argument("count", type=int)

    def handle(self, *args, **opts):
        failures = 0
        for _ in range(opts["count"]):
            try:
                with connections["default"].cursor() as cur:
                    cur.execute("SELECT 1")
            except OperationalError:
                failures += 1
            finally:
                # drop the socket so each iteration really opens a new connection
                connections["default"].close()
        self.stdout.write(
            self.style.SUCCESS(f"Done: {failures}/{opts['count']} failures")
        )
Running python manage.py db_burnin 500 after hours tells me exactly when MySQL starts refusing new logins.
Lightweight Retry Middleware
# middleware.py
import time

from django.db import OperationalError, connections


class DBRetryMiddleware:
    """Retry the MySQL handshake a few times before handing the request on."""

    def __init__(self, get_response, retries=3):
        self.get_response = get_response
        self.retries = retries

    def __call__(self, request):
        # View exceptions are converted to 500 responses before they reach
        # __call__, so probe the connection up front instead of wrapping the view.
        for attempt in range(self.retries):
            try:
                with connections["default"].cursor() as cur:
                    cur.execute("SELECT 1")  # cheap round trip; raises if MySQL is down
                break
            except OperationalError:
                if attempt + 1 == self.retries:
                    raise
                time.sleep(0.5)
                for conn in connections.all():
                    conn.close_if_unusable_or_obsolete()
        return self.get_response(request)
Added low in MIDDLEWARE, it waits half a second and tries the handshake again: good enough for a transient spike, cheap enough for normal traffic.
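For completeness, the settings entry looks roughly like this; the hms.middleware dotted path assumes the file sits next to settings.py in a project package called hms, so adjust it to wherever middleware.py actually lives:
# settings.py - register the retry middleware near the end of the stack
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
    "hms.middleware.DBRetryMiddleware",  # assumed dotted path; match your layout
]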
Final Thought
Every “intermittent” database meltdown I’ve hit boiled down to resources (RAM, I/O, connection slots) or a silent MySQL restart, never a rogue Django query. By moving secrets to the environment, keeping sockets warm, adding fast timeouts, and building tiny self-tests into the project, I traded midnight mysteries for clean alerts and quick recoveries.