Description
Hi, I got data file corruption errors in the PostgreSQL log file:
ERROR: could not read block xxx in file "base/xxx/xxx": read only 0 of 8192 bytes
Only some of the Postgres data files were corrupted, not all of them.
I tried to detect the broken FSM files following the instructions in this link by running this query:

```sql
SELECT oid::regclass AS relname,
       pg_relation_filepath(oid) || '_fsm' AS fsm
FROM pg_class,
     CAST(current_setting('block_size') AS bigint) AS bs
WHERE relkind IN ('r', 'i', 't', 'm')
  AND EXISTS (
    SELECT 1
    FROM generate_series(pg_relation_size(oid) / bs,
                         (pg_relation_size(oid, 'fsm') - 2*bs) / 2) AS blk
    WHERE pg_freespace(oid, blk) > 0
  );
```
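Note that `pg_freespace()` comes from the `pg_freespacemap` contrib extension, so it has to be created in the database before the query above will run. A minimal sketch (the database name is a placeholder, not from the original report):

```shell
# Run inside the Postgres pod; "mydb" is a hypothetical database name.
# pg_freespace() is provided by the pg_freespacemap contrib extension.
psql -d mydb -c 'CREATE EXTENSION IF NOT EXISTS pg_freespacemap;'
```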
Steps to reproduce:
- Install a Postgres cluster via Helm with this config YAML:
```yaml
postgresVersion: 16
pgBouncerReplicas: 1
instances:
  - name: "instance1"
    replicas: 2
    resources:
      requests:
        cpu: "0.5"
        memory: "1Gi"
      limits:
        cpu: "4.0"
        memory: "10Gi"
    dataVolumeClaimSpec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: "1Gi"
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        shared_buffers: "1GB"
        max_connections: "300"
pgBackRestConfig:
  global:
    repo1-retention-full: "2"
    repo1-retention-full-type: count
  repos:
    - name: repo1
      schedules:
        full: "00 18 * * *"
      volume:
        volumeClaimSpec:
          accessModes:
            - "ReadWriteOnce"
          resources:
            requests:
              storage: "1Gi"
pgBouncerConfig:
  replicas: 1
  service:
    metadata:
      labels:
        k8slens-edit-resource-version: v1
        postgres-operator.crunchydata.com/cluster: app
        postgres-operator.crunchydata.com/role: pgbouncer
    type: NodePort
    nodePort: [Port number]
  config:
    global:
      max_client_conn: "2000"
      pool_mode: "transaction"
      default_pool_size: "30"
      server_idle_timeout: "60"
service:
  metadata:
    name: app-primary-ex
    namespace: pgo
    labels:
      app-label: app-label-1
  type: NodePort
  nodePort: [Port Number]
```
- Restore a database from a pg_dump backup file (I tried several different pg_dump backups, from different databases, but got the same issue).
- Fail over to change the primary pod (`patronictl switchover`).
- The broken-FSM error appears in the Postgres log file.
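The restore-then-switchover steps above can be sketched as follows (pod, cluster, and dump-file names are hypothetical placeholders, not from the original report):

```shell
# Restore a logical backup into the current primary pod
# ("app-instance1-abcd-0" and "mydb" are made-up names).
kubectl exec -n pgo -i app-instance1-abcd-0 -- \
  pg_restore -U postgres -d mydb --no-owner /tmp/mydb.dump

# Trigger a Patroni switchover so a replica becomes the new primary;
# "app" is the assumed Patroni cluster (scope) name.
kubectl exec -n pgo -it app-instance1-abcd-0 -- \
  patronictl switchover app --force
```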
Solved by:
- Clearing the data folder of the replica pod right after restoring the database (from the pg_dump file) --> Patroni automatically reinitializes the replica and copies all the files.
- Trying the switchover again --> no error.
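The same workaround of discarding the replica's data directory can be done with Patroni's built-in reinit, which wipes the member's data and re-copies it from the primary. A sketch, assuming a cluster named `app` and a hypothetical replica member name:

```shell
# Re-initialize the replica member from the primary, discarding its
# data directory ("app-instance1-wxyz-0" is a made-up member name).
kubectl exec -n pgo -it app-instance1-abcd-0 -- \
  patronictl reinit app app-instance1-wxyz-0 --force
```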
Is this a bug or expected behavior?
Environment
- Platform: Kubernetes
- Platform Version: 1.28.2
- PGO Image Tag: ubi8-5.5.1-0
- Postgres Version: 16
- Storage: nfs