After deploying Docker containers to production serving millions of requests, I learned that development Docker and production Docker are completely different beasts. A poorly configured container can be a security nightmare, performance bottleneck, and debugging hell. This guide covers battle-tested practices for running Docker in production safely and efficiently.
Security Best Practices
1. Never Run as Root
# ❌ Bad: Runs as root (default)
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]
# ✅ Good: Create and use non-root user
FROM node:20-alpine
# Create app user
RUN addgroup -g 1001 -S appuser && \
    adduser -S -u 1001 -G appuser appuser
WORKDIR /app
# Copy package files first
COPY --chown=appuser:appuser package*.json ./
RUN npm ci --omit=dev
# Copy app files
COPY --chown=appuser:appuser . .
# Switch to non-root user
USER appuser
EXPOSE 3000
CMD ["node", "server.js"]
For .NET:
# ✅ Good: .NET with non-root user
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime
# Create non-root user
RUN addgroup -g 1001 -S appuser && \
    adduser -S -u 1001 -G appuser appuser
WORKDIR /app
COPY --from=build --chown=appuser:appuser /app/publish .
# Switch to non-root user
USER appuser
EXPOSE 8080
ENTRYPOINT ["dotnet", "MyApp.dll"]
2. Use Multi-Stage Builds
# ✅ Multi-stage build for .NET
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
# Copy csproj and restore dependencies (cached layer)
COPY ["MyApp/MyApp.csproj", "MyApp/"]
RUN dotnet restore "MyApp/MyApp.csproj"
# Copy everything else and build
COPY . .
WORKDIR "/src/MyApp"
RUN dotnet build "MyApp.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false
# Final stage: runtime only (smaller image)
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "MyApp.dll"]
# Size comparison:
# SDK image: ~700 MB
# Runtime-only image: ~110 MB
3. Scan for Vulnerabilities
# Install Trivy scanner
brew install trivy # macOS
# or
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
# Scan image for vulnerabilities
trivy image myapp:latest
# Scan and fail on HIGH/CRITICAL
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest
# Generate report
trivy image --format json --output report.json myapp:latest
In CI/CD:
# GitHub Actions
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'

- name: Upload Trivy results to GitHub Security
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: 'trivy-results.sarif'
4. Minimize Attack Surface
# ✅ Use minimal base images (sizes approximate)
FROM alpine:3.19                        # ~7 MB
FROM scratch                            # 0 MB (for static Go binaries)
FROM gcr.io/distroless/static-debian12  # ~2 MB

# ❌ Avoid full OS images
FROM ubuntu:22.04                       # ~77 MB
5. Don't Include Secrets in Images
# ❌ Bad: Secrets in environment variables
ENV DATABASE_PASSWORD="MyPassword123"

# ❌ Bad: Secrets copied into the image
COPY secrets.json /app/

# ✅ Good: Inject secrets at runtime
# (Docker secrets, environment variables, or mounted volumes)
Image Optimization
1. Optimize Layer Caching
# ✅ Good: Dependencies cached separately
FROM node:20-alpine
WORKDIR /app
# Install dependencies first (cached if package.json unchanged)
COPY package*.json ./
RUN npm ci --omit=dev
# Copy source code (changes frequently)
COPY . .
CMD ["node", "server.js"]
# ❌ Bad: Dependencies reinstalled on every code change
FROM node:20-alpine
WORKDIR /app
# Copies everything, so any file change invalidates the cache
COPY . .
# Reinstalls dependencies on every build
RUN npm install
CMD ["node", "server.js"]
2. Use .dockerignore
# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.vscode
.idea
*.md
.env
.env.local
Dockerfile
docker-compose.yml
*.log
coverage/
.next/
dist/
build/
3. Minimize Layers
# ❌ Bad: Many layers
FROM alpine:3.19
RUN apk add --no-cache curl
RUN apk add --no-cache ca-certificates
RUN apk add --no-cache bash
RUN apk add --no-cache git
# ✅ Good: Single layer
FROM alpine:3.19
RUN apk add --no-cache \
    curl \
    ca-certificates \
    bash \
    git
4. Remove Build Dependencies
# ✅ Install and remove build dependencies in the same layer
FROM alpine:3.19
RUN apk add --no-cache python3 py3-pip && \
    apk add --no-cache --virtual .build-deps \
        gcc \
        musl-dev \
        python3-dev && \
    pip install --no-cache-dir numpy && \
    apk del .build-deps
5. Compress Images
# Use docker-slim (now SlimToolkit) to shrink images, often dramatically
docker-slim build --http-probe myapp:latest

# Example results:
# Original: 500 MB
# Optimized: 150 MB (70% reduction)
Resource Management
1. Set Resource Limits
# docker-compose.yml
# Note: the version field is optional in Compose V2+
version: '3.8'

services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: '1.0'      # Max 1 CPU
          memory: 512M     # Max 512 MB RAM
        reservations:
          cpus: '0.5'      # Guaranteed 0.5 CPU
          memory: 256M     # Guaranteed 256 MB RAM
    restart: unless-stopped
# Docker run with limits
docker run -d \
  --name myapp \
  --cpus="1.0" \
  --memory="512m" \
  --memory-swap="1g" \
  --restart=unless-stopped \
  myapp:latest
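When generating or validating these limit values from tooling, it helps to normalize Docker-style memory strings to bytes. A small helper sketch (hypothetical, not part of the Docker CLI or API):

```javascript
// Hypothetical helper: convert Docker-style memory strings ("512m", "1g")
// to a byte count, e.g. for validating generated compose files.
function parseMemory(value) {
  const match = /^(\d+(?:\.\d+)?)([bkmg]?)$/i.exec(value.trim());
  if (!match) throw new Error(`Invalid memory string: ${value}`);
  const units = { '': 1, b: 1, k: 1024, m: 1024 ** 2, g: 1024 ** 3 };
  return Math.round(Number(match[1]) * units[match[2].toLowerCase()]);
}

console.log(parseMemory('512m')); // 536870912
console.log(parseMemory('1g'));   // 1073741824
```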
2. Health Checks
# Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD node healthcheck.js || exit 1
CMD ["node", "server.js"]
// healthcheck.js
const http = require('http');

const options = {
  host: 'localhost',
  port: 3000,
  path: '/health',
  timeout: 2000
};

const req = http.request(options, (res) => {
  console.log(`STATUS: ${res.statusCode}`);
  process.exit(res.statusCode === 200 ? 0 : 1);
});

req.on('error', (err) => {
  console.error('ERROR:', err);
  process.exit(1);
});

req.end();
For .NET:
# Install curl for health check (Alpine doesn't include it by default)
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime
RUN apk add --no-cache curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# Alternative: Use wget (usually pre-installed)
# HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
# CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1
// Program.cs
app.MapHealthChecks("/health");

// Or a custom health check
app.MapGet("/health", () =>
{
    // Check database, external services, etc.
    return Results.Ok(new { status = "healthy" });
});
3. Graceful Shutdown
# Use tini for proper signal handling (PID 1 forwards signals to the app)
FROM node:20-alpine
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
// server.js - Graceful shutdown
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, starting graceful shutdown');

  // Stop accepting new connections
  server.close(async () => {
    console.log('HTTP server closed');
    // Close database connections
    await db.close();
    // Exit
    process.exit(0);
  });

  // Force exit after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 30000);
});
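The same pattern can be factored into a reusable handler. This is a sketch with the exit function and cleanup injected, so the shutdown logic can be unit-tested without killing the process (`server`, `cleanup`, and `db` are stand-ins for your own objects):

```javascript
// Reusable shutdown handler sketch: dependencies are injected so the
// logic can be tested without terminating the test runner.
function makeShutdownHandler({ server, cleanup, timeoutMs = 30000, exit = process.exit }) {
  return () => {
    // Force exit if server.close() hangs past the timeout
    const timer = setTimeout(() => exit(1), timeoutMs);
    server.close(() => {
      cleanup();           // e.g. close database connections
      clearTimeout(timer);
      exit(0);
    });
  };
}

// Usage:
// process.on('SIGTERM', makeShutdownHandler({ server, cleanup: () => db.close() }));
```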
Networking
1. Use Bridge Networks
# docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    networks:
      - frontend
      - backend
    ports:
      - "3000:3000"

  db:
    image: postgres:15
    networks:
      - backend
    # Not exposed to the host

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No external access
2. Use Reverse Proxy
# docker-compose.yml with Nginx
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    networks:
      - frontend
    depends_on:
      - app

  app:
    image: myapp:latest
    networks:
      - frontend
    # No ports exposed to the host

networks:
  frontend:
    driver: bridge
# nginx.conf (these blocks belong inside the http { } context)
upstream app {
    server app:3000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Logging & Monitoring
1. Structured Logging
// Use structured logging (JSON)
const logger = require('pino')();
logger.info({ userId: 123, action: 'login' }, 'User logged in');
// Output (abridged; pino emits numeric levels):
// {"level":30,"time":1234567890,"userId":123,"action":"login","msg":"User logged in"}
// .NET structured logging
Log.Information("User {UserId} logged in from {IpAddress}", userId, ipAddress);
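If pulling in a logging library is not an option, the same one-JSON-object-per-line shape can be produced in a few lines. A dependency-free sketch (field names loosely mirror pino's; pino itself uses numeric levels):

```javascript
// Minimal structured logger sketch: one JSON object per line to stdout.
function formatLog(level, fields, msg) {
  return JSON.stringify({ level, time: Date.now(), ...fields, msg });
}

function info(fields, msg) {
  console.log(formatLog('info', fields, msg));
}

info({ userId: 123, action: 'login' }, 'User logged in');
```

Keeping one object per line is what lets log shippers (Fluentd, Loki, CloudWatch) parse container output without extra configuration.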
2. Log to STDOUT/STDERR
# ✅ Good: Log to stdout
CMD ["node", "server.js"]

# ❌ Bad: Log to files inside the container
# (shell redirection doesn't even work in exec form; this passes ">>" as an argument)
CMD ["node", "server.js", ">>", "/var/log/app.log"]
3. Centralized Logging
# docker-compose.yml with logging
version: '3.8'

services:
  app:
    image: myapp:latest
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "production,app"

  # Or use a logging driver such as fluentd
  app-with-fluentd:
    image: myapp:latest
    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: myapp
4. Monitoring with Prometheus
# docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    ports:
      - "3000:3000"
      - "9090:9090"  # Metrics endpoint

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9091:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

volumes:
  prometheus-data:
  grafana-data:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['app:9090']
Health Checks
Advanced Health Check
// Program.cs
// Requires the AspNetCore.HealthChecks.SqlServer, .Redis, and .Uris NuGet packages
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy())
    .AddSqlServer(
        connectionString: builder.Configuration.GetConnectionString("DefaultConnection")!,
        name: "database",
        timeout: TimeSpan.FromSeconds(5))
    .AddRedis(
        redisConnectionString: builder.Configuration.GetConnectionString("Redis")!,
        name: "redis")
    .AddUrlGroup(
        new Uri("https://api.example.com/health"),
        name: "external-api",
        timeout: TimeSpan.FromSeconds(3));

app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = async (context, report) =>
    {
        context.Response.ContentType = "application/json";
        var result = JsonSerializer.Serialize(new
        {
            status = report.Status.ToString(),
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description,
                duration = e.Value.Duration
            }),
            totalDuration = report.TotalDuration
        });
        await context.Response.WriteAsync(result);
    }
});
Secrets Management
1. Docker Secrets
# docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    secrets:
      - db_password
      - api_key
    environment:
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - API_KEY_FILE=/run/secrets/api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt
// Read secrets from files
const fs = require('fs');

function getSecret(secretName) {
  const secretPath = process.env[`${secretName}_FILE`];
  if (secretPath) {
    return fs.readFileSync(secretPath, 'utf8').trim();
  }
  return process.env[secretName]; // Fallback to env var
}

const dbPassword = getSecret('DB_PASSWORD');
const apiKey = getSecret('API_KEY');
2. Use Environment Variables Securely
# ✅ Good: Pass secrets at runtime
docker run -d \
  -e DATABASE_PASSWORD="$DB_PASSWORD" \
  myapp:latest

# ❌ Bad: Secrets hardcoded in the Dockerfile
# ENV DATABASE_PASSWORD="hardcoded"
3. Azure Key Vault / AWS Secrets Manager
// .NET with Azure Key Vault
builder.Configuration.AddAzureKeyVault(
    new Uri($"https://{keyVaultName}.vault.azure.net/"),
    new DefaultAzureCredential());

// Access secrets
var dbPassword = builder.Configuration["DbPassword"];
Production Deployment Checklist
# Complete production docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    container_name: myapp-prod
    restart: unless-stopped

    # Security
    user: "1001:1001"
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL

    # Resources
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          cpus: '1.0'
          memory: 512M

    # Health check (ensure curl is available, or use a custom healthcheck script)
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 40s

    # Logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

    # Networking
    networks:
      - app-network

    # Secrets
    secrets:
      - db_password

    # Environment
    environment:
      - NODE_ENV=production
      - DB_PASSWORD_FILE=/run/secrets/db_password

    # Volumes (read-only where possible)
    volumes:
      - type: tmpfs
        target: /tmp
      - type: bind
        source: ./config
        target: /app/config
        read_only: true

networks:
  app-network:
    driver: bridge

secrets:
  db_password:
    external: true
Conclusion
Production Docker requires discipline and attention to detail:
- ✅ Security first - non-root user, minimal images, vulnerability scanning
- ✅ Optimize images - multi-stage builds, layer caching, .dockerignore
- ✅ Resource limits - prevent resource exhaustion
- ✅ Health checks - automatic recovery from failures
- ✅ Structured logging - easier debugging and monitoring
- ✅ Secrets management - never hardcode credentials
Production Checklist:
- ✅ Running as non-root user
- ✅ Multi-stage build for smaller images
- ✅ Vulnerability scanning in CI/CD
- ✅ Resource limits configured
- ✅ Health checks implemented
- ✅ Graceful shutdown handling
- ✅ Structured logging to stdout
- ✅ Secrets managed securely
- ✅ Monitoring and alerting setup
- ✅ Backup and recovery plan
Remember: If it's not monitored, it's not in production.
💡 Pro Tip: Always test your Docker configuration in a staging environment that mirrors production before deploying. Use tools like docker-compose to maintain consistency across environments.