After deploying Docker containers that serve millions of production requests, I learned that development Docker and production Docker are completely different beasts. A poorly configured container can be a security nightmare, a performance bottleneck, and a debugging hell. This guide covers battle-tested practices for running Docker in production safely and efficiently.

Security Best Practices

1. Never Run as Root

# ❌ Bad: Runs as root (default)
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]

# ✅ Good: Create and use non-root user
FROM node:20-alpine

# Create app user
RUN addgroup -g 1001 -S appuser && \
    adduser -S -u 1001 -G appuser appuser

WORKDIR /app

# Copy package files first
COPY --chown=appuser:appuser package*.json ./
RUN npm ci --omit=dev

# Copy app files
COPY --chown=appuser:appuser . .

# Switch to non-root user
USER appuser

EXPOSE 3000
CMD ["node", "server.js"]

For .NET:

# ✅ Good: .NET with non-root user
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

# Create non-root user
RUN addgroup -g 1001 -S appuser && \
    adduser -S -u 1001 -G appuser appuser

WORKDIR /app
# (the build stage is defined as in the multi-stage example in the next section)
COPY --from=build --chown=appuser:appuser /app/publish .

# Switch to non-root user
USER appuser

EXPOSE 8080
ENTRYPOINT ["dotnet", "MyApp.dll"]

2. Use Multi-Stage Builds

# ✅ Multi-stage build for .NET
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src

# Copy csproj and restore dependencies (cached layer)
COPY ["MyApp/MyApp.csproj", "MyApp/"]
RUN dotnet restore "MyApp/MyApp.csproj"

# Copy everything else and build
COPY . .
WORKDIR "/src/MyApp"
RUN dotnet build "MyApp.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false

# Final stage: runtime only (smaller image)
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "MyApp.dll"]

# Size comparison:
# SDK image: ~700 MB
# Runtime-only image: ~110 MB

3. Scan for Vulnerabilities

# Install Trivy scanner
brew install trivy  # macOS
# or
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin

# Scan image for vulnerabilities
trivy image myapp:latest

# Scan and fail on HIGH/CRITICAL
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest

# Generate report
trivy image --format json --output report.json myapp:latest

In CI/CD:

# GitHub Actions
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'

- name: Upload Trivy results to GitHub Security
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: 'trivy-results.sarif'

4. Minimize Attack Surface

# ✅ Use minimal base images
FROM alpine:3.19                        # ~7 MB
FROM scratch                            # 0 MB (for static Go binaries)
FROM gcr.io/distroless/static-debian12  # ~2 MB

# ❌ Avoid full OS images
FROM ubuntu:22.04                       # ~77 MB
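
`scratch` and distroless images ship no shell, libc, or package manager, so they only suit self-contained static binaries. A minimal multi-stage sketch for a Go service (paths and Go version are illustrative; note that `scratch` also lacks CA certificates, which TLS clients need):

```dockerfile
# Build stage: produce a fully static binary
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /server .

# Final stage: nothing but the binary
FROM scratch
COPY --from=build /server /server
ENTRYPOINT ["/server"]
```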

5. Don't Include Secrets in Images

# ❌ Bad: Secrets in environment variables
ENV DATABASE_PASSWORD="MyPassword123"

# ❌ Bad: Secrets copied into the image
COPY secrets.json /app/

# ✅ Good: Use secrets at runtime
# Docker secrets, environment variables, or mounted volumes
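
Build-time secrets (for example a private registry token) deserve the same care: BuildKit can mount a secret for a single RUN step without writing it into any layer. A sketch, assuming a hypothetical `npm_token` secret:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The secret is available only during this RUN step; it never lands in a layer
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci --omit=dev
```

Build with `docker build --secret id=npm_token,src=./npm_token.txt .` (the source file name is illustrative).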

Image Optimization

1. Optimize Layer Caching

# ✅ Good: Dependencies cached separately
FROM node:20-alpine
WORKDIR /app

# Install dependencies first (cached if package.json unchanged)
COPY package*.json ./
RUN npm ci --omit=dev

# Copy source code (changes frequently)
COPY . .

CMD ["node", "server.js"]

# ❌ Bad: Dependencies reinstalled on every code change
FROM node:20-alpine
WORKDIR /app
COPY . .                    # Copies everything
RUN npm install             # Runs every time
CMD ["node", "server.js"]

2. Use .dockerignore

# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.vscode
.idea
*.md
.env
.env.local
Dockerfile
docker-compose.yml
*.log
coverage/
.next/
dist/
build/

3. Minimize Layers

# ❌ Bad: Many layers
FROM alpine:3.19
RUN apk add --no-cache curl
RUN apk add --no-cache ca-certificates
RUN apk add --no-cache bash
RUN apk add --no-cache git

# ✅ Good: Single layer
FROM alpine:3.19
RUN apk add --no-cache \
    curl \
    ca-certificates \
    bash \
    git

4. Remove Build Dependencies

# ✅ Install and remove in same layer
FROM alpine:3.19
# Keep python3/pip for runtime; group build-only packages under a virtual name
RUN apk add --no-cache python3 py3-pip && \
    apk add --no-cache --virtual .build-deps \
        gcc \
        musl-dev \
        python3-dev && \
    pip install --no-cache-dir --break-system-packages numpy && \
    apk del .build-deps

5. Compress Images

# Use docker-slim (now maintained as "slim") to shrink images automatically
docker-slim build --http-probe myapp:latest

# Example results:
# Original: 500 MB
# Optimized: 150 MB (70% reduction)

Resource Management

1. Set Resource Limits

# docker-compose.yml
# Note: the top-level version field is obsolete in Compose V2 and can be removed
version: '3.8'

services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: '1.0'        # Max 1 CPU
          memory: 512M       # Max 512 MB RAM
        reservations:
          cpus: '0.5'        # Guaranteed 0.5 CPU
          memory: 256M       # Guaranteed 256 MB RAM
    restart: unless-stopped

# Docker run with limits
docker run -d \
  --name myapp \
  --cpus="1.0" \
  --memory="512m" \
  --memory-swap="1g" \
  --restart=unless-stopped \
  myapp:latest

2. Health Checks

# Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .

HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD node healthcheck.js || exit 1

CMD ["node", "server.js"]

// healthcheck.js
const http = require('http');

const options = {
  host: 'localhost',
  port: 3000,
  path: '/health',
  timeout: 2000
};

const req = http.request(options, (res) => {
  console.log(`STATUS: ${res.statusCode}`);
  process.exit(res.statusCode === 200 ? 0 : 1);
});

req.on('error', (err) => {
  console.error('ERROR:', err);
  process.exit(1);
});

req.end();

For .NET:

# Install curl for health check (Alpine doesn't include it by default)
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime
RUN apk add --no-cache curl

HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Alternative: Use wget (usually pre-installed)
# HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
#   CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1

// Program.cs
app.MapHealthChecks("/health");

// Or custom health check
app.MapGet("/health", () =>
{
    // Check database, external services, etc.
    return Results.Ok(new { status = "healthy" });
});

3. Graceful Shutdown

# Use tini as PID 1 for proper signal handling (docker run --init does the same)
FROM alpine:3.19
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]

// server.js - Graceful shutdown
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, starting graceful shutdown');
  
  // Stop accepting new connections
  server.close(async () => {
    console.log('HTTP server closed');
    
    // Close database connections
    await db.close();
    
    // Exit
    process.exit(0);
  });
  
  // Force exit after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 30000);
});

Networking

1. Use Bridge Networks

# docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    networks:
      - frontend
      - backend
    ports:
      - "3000:3000"

  db:
    image: postgres:15
    networks:
      - backend
    # Not exposed to host

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No external access

2. Use Reverse Proxy

# docker-compose.yml with Nginx
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    networks:
      - frontend
    depends_on:
      - app

  app:
    image: myapp:latest
    networks:
      - frontend
    # No ports exposed to host

# nginx.conf
upstream app {
    server app:3000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Logging & Monitoring

1. Structured Logging

// Use structured logging (JSON)
const logger = require('pino')();

logger.info({ userId: 123, action: 'login' }, 'User logged in');
// Output: {"level":"info","time":1234567890,"userId":123,"action":"login","msg":"User logged in"}

// .NET structured logging (Serilog)
Log.Information("User {UserId} logged in from {IpAddress}", userId, ipAddress);
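
If you'd rather see the shape without pulling in a dependency, the same one-JSON-object-per-line output can be produced in a few lines (an illustrative sketch, not a pino replacement):

```javascript
// Minimal JSON-lines logger sketch (illustrative only).
function logLine(level, fields, msg) {
  // One self-describing JSON object per line, easy for log collectors to parse
  return JSON.stringify({ level, time: Date.now(), ...fields, msg });
}

console.log(logLine('info', { userId: 123, action: 'login' }, 'User logged in'));
```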

2. Log to STDOUT/STDERR

# ✅ Good: Log to stdout
CMD ["node", "server.js"]

# ❌ Bad: Log to files inside the container
# (note: exec-form CMD doesn't do shell redirection, hence the sh -c wrapper)
CMD ["sh", "-c", "node server.js >> /var/log/app.log 2>&1"]

3. Centralized Logging

# docker-compose.yml with logging
version: '3.8'

services:
  app:
    image: myapp:latest
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "production,app"

  # Or use a logging driver
  app-with-fluentd:
    image: myapp:latest
    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: myapp

4. Monitoring with Prometheus

# docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    ports:
      - "3000:3000"
      - "9090:9090"  # Metrics endpoint

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9091:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # change this in production

volumes:
  prometheus-data:
  grafana-data:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['app:9090']

Health Checks

Advanced Health Check

// Program.cs
// Requires the AspNetCore.HealthChecks.SqlServer, .Redis, and .Uris NuGet packages
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy())
    .AddSqlServer(
        connectionString: builder.Configuration.GetConnectionString("DefaultConnection")!,
        name: "database",
        timeout: TimeSpan.FromSeconds(5))
    .AddRedis(
        redisConnectionString: builder.Configuration.GetConnectionString("Redis")!,
        name: "redis")
    .AddUrlGroup(
        new Uri("https://api.example.com/health"),
        name: "external-api",
        timeout: TimeSpan.FromSeconds(3));

app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = async (context, report) =>
    {
        context.Response.ContentType = "application/json";
        var result = JsonSerializer.Serialize(new
        {
            status = report.Status.ToString(),
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description,
                duration = e.Value.Duration
            }),
            totalDuration = report.TotalDuration
        });
        await context.Response.WriteAsync(result);
    }
});

Secrets Management

1. Docker Secrets

# docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    secrets:
      - db_password
      - api_key
    environment:
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - API_KEY_FILE=/run/secrets/api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt

// Read secrets from files
const fs = require('fs');

function getSecret(secretName) {
  const secretPath = process.env[`${secretName}_FILE`];
  if (secretPath) {
    return fs.readFileSync(secretPath, 'utf8').trim();
  }
  return process.env[secretName]; // Fallback to env var
}

const dbPassword = getSecret('DB_PASSWORD');
const apiKey = getSecret('API_KEY');

2. Use Environment Variables Securely

# ✅ Good: Pass secrets at runtime
docker run -d \
  -e DATABASE_PASSWORD="$DB_PASSWORD" \
  myapp:latest

# ❌ Bad: Secrets in Dockerfile
# ENV DATABASE_PASSWORD="hardcoded"

3. Azure Key Vault / AWS Secrets Manager

// .NET with Azure Key Vault
builder.Configuration.AddAzureKeyVault(
    new Uri($"https://{keyVaultName}.vault.azure.net/"),
    new DefaultAzureCredential());

// Access secrets
var dbPassword = builder.Configuration["DbPassword"];

Production Deployment Checklist

# Complete production docker-compose.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    container_name: myapp-prod
    restart: unless-stopped
    
    # Security
    user: "1001:1001"
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    
    # Resources
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          cpus: '1.0'
          memory: 512M
    
    # Health check (ensure curl is available, or use custom healthcheck script)
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 40s
    
    # Logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    
    # Networking
    networks:
      - app-network
    
    # Secrets
    secrets:
      - db_password
    
    # Environment
    environment:
      - NODE_ENV=production
      - DB_PASSWORD_FILE=/run/secrets/db_password
    
    # Volumes (read-only where possible)
    volumes:
      - type: tmpfs
        target: /tmp
      - type: bind
        source: ./config
        target: /app/config
        read_only: true

networks:
  app-network:
    driver: bridge

secrets:
  db_password:
    external: true

Conclusion

Production Docker requires discipline and attention to detail:

  • ✅ Security first - non-root user, minimal images, vulnerability scanning
  • ✅ Optimize images - multi-stage builds, layer caching, .dockerignore
  • ✅ Resource limits - prevent resource exhaustion
  • ✅ Health checks - automatic recovery from failures
  • ✅ Structured logging - easier debugging and monitoring
  • ✅ Secrets management - never hardcode credentials

Production Checklist:

  • ☐ Running as non-root user
  • ☐ Multi-stage build for smaller images
  • ☐ Vulnerability scanning in CI/CD
  • ☐ Resource limits configured
  • ☐ Health checks implemented
  • ☐ Graceful shutdown handling
  • ☐ Structured logging to stdout
  • ☐ Secrets managed securely
  • ☐ Monitoring and alerting setup
  • ☐ Backup and recovery plan

Remember: If it's not monitored, it's not in production.

💡 Pro Tip: Always test your Docker configuration in a staging environment that mirrors production before deploying. Use tools like docker-compose to maintain consistency across environments.