[BUG] Node Lock Issue After Manually Deleting and Rebuilding an Elasticsearch Node #1257

Open
iziang opened this issue Nov 28, 2024 · 0 comments

Description:

I hit an error when starting an Elasticsearch node after manually deleting and rebuilding it. The log shows that Elasticsearch failed to obtain the node lock on its data path, which aborted startup. The node started successfully on the second attempt.

Error Details:

  • Timestamp: 2024-11-28T06:26:16.939Z
  • Log Level: ERROR
  • Message: fatal exception while booting Elasticsearch
  • Node Name: bean7-6c556b54b-mdit-0
  • Cluster Name: bean7-6c556b54b
  • Error Type: java.lang.IllegalStateException
  • Error Message: failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?
  • Stack Trace:
    java.lang.IllegalStateException: failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?
        at [email protected]/org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:291)
        at [email protected]/org.elasticsearch.node.Node.<init>(Node.java:483)
        at [email protected]/org.elasticsearch.node.Node.<init>(Node.java:327)
        at [email protected]/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:216)
        at [email protected]/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:216)
        at [email protected]/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)
    Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /usr/share/elasticsearch/data/node.lock
        at [email protected]/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:117)
        at [email protected]/org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)
        at [email protected]/org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)
        at [email protected]/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:229)
        at [email protected]/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:204)
        at [email protected]/org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:283)
        ... 5 more
    

Environment:

  • Elasticsearch Version: 8.8.2
  • Kubernetes Version: [Specify your Kubernetes version]
  • Operating System: [Specify your OS and version]
  • Filesystem: [Specify your filesystem type, e.g., ext4, xfs]

Cluster Configuration:

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  annotations:
    cloud.kubeblocks.io/fields: '{"mode":"multi-node","proxyEnabled":null}'
    kubeblocks.io/extra-env: '{"mdit-roles":"master,data,ingest,transform","mode":"multi-node"}'
  name: bean7-6c556b54b
  namespace: kubeblocks-cloud-ns
spec:
  affinity:
    nodeLabels:
      node-role.cloud.kubeblocks.io/data-plane: ""
      topology.kubernetes.io/region: hangzhou
      topology.kubernetes.io/zone: hangzhou
    podAntiAffinity: Required
    tenancy: SharedNode
    topologyKeys:
    - kubernetes.io/hostname
  componentSpecs:
  - componentDef: elasticsearch-8
    monitor: false
    name: mdit
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 2Gi
    serviceAccountName: kb-bean7-6c556b54b
    serviceVersion: 8.8.2
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: apelocal-hostpath-default

Steps to Reproduce:

  1. Deploy a three-node Elasticsearch cluster using the provided YAML configuration.
  2. Manually delete one of the Elasticsearch nodes (e.g., bean7-6c556b54b-mdit-0).
  3. Trigger the rebuild of the deleted node.
  4. Observe the error in the logs indicating the failure to obtain node locks.
  5. Restart the node and observe that it starts successfully.
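
A possible way to script steps 2 and 4 is sketched below. It assumes that the Elasticsearch node runs as a Kubernetes pod with the same name in the namespace from the YAML above, and that `kubectl` is on the PATH; neither is confirmed in this report. The helper names are hypothetical. Step 3 (triggering the rebuild) is left manual because the report does not say how it was done.

```python
import subprocess

# Values taken from the cluster YAML and error details in this issue.
NAMESPACE = "kubeblocks-cloud-ns"
POD = "bean7-6c556b54b-mdit-0"  # assumption: the ES node is a pod with this name
LOCK_ERROR = "failed to obtain node locks"


def kubectl_args(*args, namespace=NAMESPACE):
    """Build a kubectl argument list scoped to the given namespace."""
    return ["kubectl", "--namespace", namespace, *args]


def delete_pod_cmd(pod=POD):
    """Step 2: command that deletes the pod."""
    return kubectl_args("delete", "pod", pod)


def previous_logs_cmd(pod=POD):
    """Step 4: command that fetches logs of the previous (crashed) container.

    A multi-container pod may also need `-c <container>`; the container
    name is not given in the report, so it is omitted here.
    """
    return kubectl_args("logs", pod, "--previous")


def has_lock_error(log_text):
    """Return True if the log contains the node lock failure message."""
    return LOCK_ERROR in log_text


def reproduce():
    """Run steps 2 and 4; step 3 is performed by hand in between."""
    subprocess.run(delete_pod_cmd(), check=True)
    input("Trigger the rebuild, wait for the first start attempt, then press Enter: ")
    result = subprocess.run(previous_logs_cmd(), capture_output=True, text=True)
    print("lock error seen:", has_lock_error(result.stdout))
```

Calling `reproduce()` against a test cluster reports whether the first start attempt logged the lock failure.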

Expected Behavior:
The Elasticsearch node should start successfully without any errors after the manual deletion and rebuild.

Actual Behavior:
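The Elasticsearch node fails to start on the first attempt with an IllegalStateException because it cannot obtain the node lock. The root cause in the trace is a LockObtainFailedException: /usr/share/elasticsearch/data/node.lock is held by another program. The node starts successfully on the second attempt.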

Additional Information:

  • The error suggests that the data directory might not be writable or that multiple nodes are trying to use the same data path.
  • I have verified that the data directory /usr/share/elasticsearch/data is writable and no other Elasticsearch instances are running on the same data path.
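
One way to double-check the second point at the moment of failure is to probe `node.lock` directly. The sketch below is a diagnostic idea, not a confirmed fix. It assumes the lock is an fcntl-style advisory lock (which is how Java file locks typically behave on Linux) and that the probe runs on the same host as the data path, as it would with a hostPath volume. The function names are hypothetical.

```python
import errno
import fcntl
import os
import time


def lock_is_free(lock_path):
    """Return True if no other process holds an advisory lock on lock_path."""
    if not os.path.exists(lock_path):
        return True  # no lock file yet, so nothing can be holding it
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails at once if another process holds it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        # Release right away; this is a probe, not a claim on the lock.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


def wait_for_lock_release(lock_path, timeout_s=60.0, interval_s=1.0):
    """Poll until the lock is free; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if lock_is_free(lock_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

If `lock_is_free("/usr/share/elasticsearch/data/node.lock")` returns False just before the rebuilt node starts, some process on that host still holds the lock, for example an instance that has not finished shutting down. That would be consistent with the second attempt succeeding, but it remains a hypothesis until observed.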

Questions:

  • Is there a known issue with the current version of Elasticsearch (8.8.2) related to node locks during the rebuild process?
  • Are there any specific permissions or configurations that need to be set for the data directory to avoid this issue?
  • What steps can I take to resolve this issue and ensure that the Elasticsearch node starts successfully on the first attempt?

iziang self-assigned this on Nov 28, 2024