
Replica data file corrupted after restoring database from pg dump file (on primary pod) #3923

Open

Description

@mrain32

Hi, I'm getting data-file corruption errors, with messages like this in the PostgreSQL log file:
ERROR: could not read block xxx in file "base/xxx/xxx": read only 0 of 8192 bytes

Not all Postgres data files were corrupted, only some of them.

I tried to detect broken FSM files following the instructions in this link, by running this query:

SELECT oid::regclass AS relname, pg_relation_filepath(oid) || '_fsm' AS fsm
FROM pg_class,
     CAST(current_setting('block_size') AS bigint) AS bs
WHERE relkind IN ('r', 'i', 't', 'm')
  AND EXISTS (SELECT 1
              FROM generate_series(pg_relation_size(oid) / bs,
                                   (pg_relation_size(oid, 'fsm') - 2*bs) / 2) AS blk
              WHERE pg_freespace(oid, blk) > 0);

and got results like the following:

[screenshot of the query output listing the affected relations and their _fsm files]
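
Note that pg_freespace() comes from the pg_freespacemap contrib extension, so the extension must already exist in the database being checked. A minimal sketch (the user and database name are placeholders):

# pg_freespace() is provided by the pg_freespacemap contrib extension;
# create it in the target database before running the detection query above.
psql -U postgres -d mydb -c "CREATE EXTENSION IF NOT EXISTS pg_freespacemap;"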

Reproducing steps:

  1. Install the Postgres cluster via Helm with the following config YAML:
postgresVersion: 16
pgBouncerReplicas: 1
instances:
  - name: "instance1"
    replicas: 2
    resources:
      requests:
        cpu: "0.5"
        memory: "1Gi"
      limits:
        cpu: "4.0"
        memory: "10Gi"
    dataVolumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: "1Gi"
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        shared_buffers: "1GB"
        max_connections: "300"
pgBackRestConfig: 
  global:
    repo1-retention-full: "2"
    repo1-retention-full-type: count
  repos:
  - name: repo1
    schedules:
      full: "00 18 * * *"
    volume:
      volumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: "1Gi"
pgBouncerConfig: 
  replicas: 1
  service: 
    metadata:
      labels:
        k8slens-edit-resource-version: v1
        postgres-operator.crunchydata.com/cluster: app
        postgres-operator.crunchydata.com/role: pgbouncer
    type: NodePort
    nodePort: [Port number]
  config:
    global:
      max_client_conn: "2000"
      pool_mode: "transaction"
      default_pool_size: "30"
      server_idle_timeout: "60"
service:
  metadata:
    name: app-primary-ex
    namespace: pgo
    labels:
      app-label: app-label-1
  type: NodePort
  nodePort: [Port Number]
  2. Restore a database on the primary from a pg_dump backup file (I tried different pg_dump files, from different databases, and hit the same issue); a command sketch follows this list.
  3. Fail over to change the primary pod (patronictl switchover).
  4. See the broken-FSM errors in the Postgres log file.
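
For reference, a rough sketch of how I run the steps above; the cluster name (app), namespace (pgo), pod names, database name, dump path, and chart location are all placeholders from my environment, and the log path assumes the operator's default layout under /pgdata:

# 1. Install the cluster from the config YAML above (chart path is a placeholder).
helm install app ./postgrescluster-chart -n pgo -f values.yaml

# 2. Restore a database on the primary from a plain-format pg_dump file.
kubectl -n pgo exec -i app-instance1-abcd-0 -c database -- \
  psql -U postgres -d mydb < dump.sql

# 3. Fail over so the replica becomes the new primary.
kubectl -n pgo exec -it app-instance1-abcd-0 -c database -- \
  patronictl switchover app

# 4. Look for the read errors in the Postgres log on the new primary.
kubectl -n pgo exec app-instance1-efgh-0 -c database -- \
  grep -r "could not read block" /pgdata/pg16/log/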

Solved by:

  1. Clear the data folder of the replica pod right after restoring the database (from the pg_dump file) --> Patroni automatically reinitializes the replica and copies all the files over (see the reinit sketch below).
  2. Switch over again --> no error.
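
Instead of deleting the files by hand, the same thing can be done with Patroni's built-in reinit, which wipes the replica's data directory and re-copies everything from the current primary. A sketch (cluster and member names are placeholders):

# Reinitialize the broken replica from the current leader; Patroni removes
# its data directory and takes a fresh copy from the primary.
kubectl -n pgo exec -it app-instance1-abcd-0 -c database -- \
  patronictl reinit app app-instance1-efgh-0

# Once the replica is back in sync, the switchover completes without errors.
kubectl -n pgo exec -it app-instance1-abcd-0 -c database -- \
  patronictl switchover app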

Is this a bug, or expected behavior?

Environment


  • Platform: Kubernetes
  • Platform Version: 1.28.2
  • PGO Image Tag: ubi8-5.5.1-0
  • Postgres Version: 16
  • Storage: NFS
