PostgreSQL 18: OLD_DATABASES detection fails to find data in /var/lib/postgresql/data/18/docker/ causing silent data loss on container recreation #1400

@athom

Summary

The docker_setup_env() function in the PostgreSQL 18 entrypoint script fails to detect existing database data located
at /var/lib/postgresql/data/18/docker/ during container recreation, leading to silent data loss through creation of
a new database in an anonymous volume.

Environment

  • Image: postgres:18.1-alpine
  • PGDATA: /var/lib/postgresql/18/docker (default for PostgreSQL 18)
  • Volume Configuration:
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    

Root Cause

The glob pattern used to detect old databases is incomplete:

Current detection logic (docker-entrypoint.sh, lines 250-254):

  for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker; do
      if [ -s "$d/PG_VERSION" ]; then
          OLD_DATABASES+=( "$d" )
      fi
  done

Problem: The pattern /var/lib/postgresql/*/docker matches only two-level paths:

  • ✅ Matches: /var/lib/postgresql/18/docker
  • ❌ Misses: /var/lib/postgresql/data/18/docker (three-level path)
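
The depth difference can be checked outside the container. The following is a minimal sketch, assuming a scratch directory created with mktemp stands in for /var/lib/postgresql; demo_root and print_pg_dirs are illustrative names, not part of the entrypoint script.

```shell
#!/bin/sh
set -eu

# Scratch tree standing in for /var/lib/postgresql (assumption for this demo).
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/18/docker" "$demo_root/data/18/docker"
echo 18 > "$demo_root/18/docker/PG_VERSION"
echo 18 > "$demo_root/data/18/docker/PG_VERSION"

# Print each argument that holds a non-empty PG_VERSION,
# with the scratch prefix stripped for readability.
print_pg_dirs() {
    for d in "$@"; do
        if [ -s "$d/PG_VERSION" ]; then
            echo "${d#"$demo_root"}"
        fi
    done
}

echo "two-level pattern (*/docker):"
print_pg_dirs "$demo_root"/*/docker          # prints /18/docker
echo "three-level pattern (data/*/docker):"
print_pg_dirs "$demo_root"/data/*/docker     # prints /data/18/docker
```

The `*` matches exactly one path component, so `*/docker` never descends into `data/18/docker`; only the second pattern reaches it.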

Reproduction Steps

Initial Setup (First Container)

  # Create docker-compose.yml
  cat > docker-compose.yml <<EOF
  services:
    postgres:
      image: postgres:18.1-alpine
      container_name: postgres  # fixed name so the `docker exec postgres` commands resolve
      volumes:
        - ./data/postgres:/var/lib/postgresql/data
      environment:
        POSTGRES_USER: testuser
        POSTGRES_PASSWORD: testpass
        POSTGRES_DB: testdb
  EOF

  # Start container
  docker compose up -d

  # Verify data location
  docker exec postgres ls -la /var/lib/postgresql/data/18/docker/
  # Output shows PG_VERSION exists

  # Insert test data
  docker exec postgres psql -U testuser -d testdb -c "CREATE TABLE test (id INT); INSERT INTO test VALUES (1);"
  docker exec postgres psql -U testuser -d testdb -c "SELECT * FROM test;"
  # Output: 1 row

Container Recreation (Triggers Bug)

  # Recreate container (e.g., due to config change)
  docker compose down
  docker compose up -d

  # Check data - LOST!
  docker exec postgres psql -U testuser -d testdb -c "SELECT * FROM test;"
  # ERROR: relation "test" does not exist

Why Data Loss Occurs

  1. First container: PostgreSQL initializes data at /var/lib/postgresql/data/18/docker/ (bind mount)
  2. Container recreation: New anonymous volume created for /var/lib/postgresql/
  3. Entrypoint check:

       # PGDATA = /var/lib/postgresql/18/docker (in anonymous volume)
       [ -s "$PGDATA/PG_VERSION" ]  # FALSE (anonymous volume is empty)

       # Check old database locations
       for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker; do
           # /var/lib/postgresql/18/docker - matches, but is new (empty)
           # /var/lib/postgresql/data/18/docker - NOT MATCHED by glob pattern!
       done

  4. Result: OLD_DATABASES array remains empty → No error raised → initdb runs in anonymous volume → New database created
    → Old data silently abandoned
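
The sequence above can be reproduced as a small simulation. This is a sketch under stated assumptions, not the entrypoint's actual code: decide_startup is a hypothetical helper that returns "reuse" when PGDATA is already initialized, "refuse" when any candidate old location holds a non-empty PG_VERSION, and "initdb" otherwise. A scratch directory stands in for /var/lib/postgresql.

```shell
#!/bin/sh
set -eu

# decide_startup PGDATA CANDIDATE...   (hypothetical helper for illustration)
decide_startup() {
    pgdata="$1"
    shift
    if [ -s "$pgdata/PG_VERSION" ]; then
        echo "reuse"
        return 0
    fi
    for d in "$@"; do
        if [ -s "$d/PG_VERSION" ]; then
            echo "refuse"
            return 0
        fi
    done
    echo "initdb"
}

# Scratch tree: empty new PGDATA, existing data under data/18/docker.
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/18/docker" "$root/data/18/docker"
echo 18 > "$root/data/18/docker/PG_VERSION"

# Candidate list as in the current loop: the nested directory is never visited.
decide_startup "$root/18/docker" "$root" "$root/data" "$root"/*/docker
# prints: initdb

# Candidate list with the additional pattern: the old data is seen.
decide_startup "$root/18/docker" "$root" "$root/data" "$root"/*/docker "$root"/data/*/docker
# prints: refuse
```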

Verification

Test glob pattern behavior:

  # In running container
  docker exec postgres sh -c 'for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker; do if [ -s "$d/PG_VERSION" ]; then echo "Found: $d"; fi; done'
  # Output: Found: /var/lib/postgresql/18/docker
  # MISSING: /var/lib/postgresql/data/18/docker

With fixed pattern:

  docker exec postgres sh -c 'for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker /var/lib/postgresql/data/*/docker; do if [ -s "$d/PG_VERSION" ]; then echo "Found: $d"; fi; done'
  # Output: Found: /var/lib/postgresql/18/docker
  #         Found: /var/lib/postgresql/data/18/docker ✓

Impact

  • Severity: Critical - Silent data loss
  • Scope: Any deployment where:
    • Volume mounted at /var/lib/postgresql/data (common legacy pattern)
    • Container recreation occurs (e.g., docker-compose config changes, docker run --force-recreate)
  • No warnings: Users receive no error messages - database appears to work but data is lost
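
To check whether a deployment falls into this scope, the host side of the bind mount can be scanned for nested data directories. A minimal sketch, assuming a find that supports -maxdepth (GNU, BSD, and BusyBox do); find_nested_pgdata is a hypothetical helper name and the depth limit of 3 is an assumption chosen to cover the <version>/docker layout.

```shell
#!/bin/sh
set -eu

# find_nested_pgdata DIR
# Print every directory at or up to two levels below DIR
# that contains a non-empty PG_VERSION file.
find_nested_pgdata() {
    find "$1" -maxdepth 3 -type f -name PG_VERSION | while IFS= read -r f; do
        if [ -s "$f" ]; then
            dirname "$f"
        fi
    done
}

# Demo on a scratch directory standing in for ./data/postgres.
host_dir="$(mktemp -d)"
trap 'rm -rf "$host_dir"' EXIT
mkdir -p "$host_dir/18/docker"
echo 18 > "$host_dir/18/docker/PG_VERSION"

find_nested_pgdata "$host_dir"
```

On a real host the call would be `find_nested_pgdata ./data/postgres`; any line of output ending in `<version>/docker` indicates data in the nested location described in this issue.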

Proposed Fix

Update glob pattern in docker_setup_env() to include three-level paths:

  for d in /var/lib/postgresql \
           /var/lib/postgresql/data \
           /var/lib/postgresql/*/docker \
           /var/lib/postgresql/data/*/docker; do  # <-- ADD THIS LINE
      if [ -s "$d/PG_VERSION" ]; then
          OLD_DATABASES+=( "$d" )
      fi
  done

This ensures detection of databases in:

  • /var/lib/postgresql/18/docker (new default)
  • /var/lib/postgresql/data/18/docker (legacy bind mount + PostgreSQL 18 subdirectory)
  • /var/lib/postgresql/17/docker (old version upgrade scenarios)

Workaround

Option 1: Mount parent directory (PostgreSQL 18 recommendation)

  volumes:
    - ./data/postgres:/var/lib/postgresql  # <-- No /data suffix

Option 2: Set explicit PGDATA in bind mount

  environment:
    PGDATA: /var/lib/postgresql/data/pgdata
  volumes:
    - ./data/postgres:/var/lib/postgresql/data

Option 3: Manual recovery after recreation

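  # Stop the stack and remove the anonymous volume with it.
  # Anonymous volumes have hash names, so a name=postgres filter would not match them;
  # -v also removes named volumes declared in the compose file (bind mounts are untouched).
  docker compose down -v
  # Restart - will use bind mount data
  docker compose up -d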

Related Issues

Additional Context

This bug was discovered during a production deployment where a configuration change (adding extra_hosts to Caddy
service) triggered container recreation via docker compose up -d. The PostgreSQL container was recreated despite no
changes to its own configuration, resulting in data loss for a production database. The bind mount contained 9.9MB of
user data in /var/lib/postgresql/data/18/docker/, but the entrypoint script failed to detect it, creating a fresh
database in an anonymous volume instead.
