Distributed deployment: replicas not taking effect #2716

Open
shuai-smart opened this issue Dec 30, 2024 · 4 comments
Labels: inactive, pd (PD module), store (Store module)

Comments


shuai-smart commented Dec 30, 2024

Problem Type

others (please edit later)

Before submit

  • I have confirmed and searched the existing issues and FAQ, and there are no similar or duplicate problems.

Environment

  • Server Version: 1.5.0 (Apache Release Version)
  • Backend: hstore

Your Question

hadoop01 -> 110
hadoop02 -> 115
hadoop03 -> 140

1. server: 110
2. pd: 110, 115, 140
3. store: 110, 115, 140


PD config on 110 (hadoop01)

spring:
  application:
    name: hugegraph-pd

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

logging:
  config: 'file:./conf/log4j2.xml'
license:
  verify-path: ./conf/verify-license.json
  license-path: ./conf/hugegraph.license
grpc:
  port: 8787
  host: hadoop01

server:
  port: 8620

pd:
  data-path: ./pd_data
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: hadoop01:8500,hadoop02:8500,hadoop03:8500

raft:
  address: hadoop01:8610
  peers-list: hadoop01:8610,hadoop02:8610,hadoop03:8610

store:
  max-down-time: 172800
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 1 day

partition:
  default-shard-count: 2
  store-max-shard-count: 5

PD config on 115 (hadoop02)

spring:
  application:
    name: hugegraph-pd

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

logging:
  config: 'file:./conf/log4j2.xml'
license:
  verify-path: ./conf/verify-license.json
  license-path: ./conf/hugegraph.license
grpc:
  port: 8787
  host: hadoop02

server:
  port: 8620

pd:
  data-path: ./pd_data
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: hadoop01:8500,hadoop02:8500,hadoop03:8500

raft:
  address: hadoop02:8610
  peers-list: hadoop01:8610,hadoop02:8610,hadoop03:8787

store:
  max-down-time: 172800
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 1 day

partition:
  default-shard-count: 2
  store-max-shard-count: 5

PD config on 140 (hadoop03)

spring:
  application:
    name: hugegraph-pd

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

logging:
  config: 'file:./conf/log4j2.xml'
license:
  verify-path: ./conf/verify-license.json
  license-path: ./conf/hugegraph.license
grpc:
  port: 8787
  host: hadoop03

server:
  port: 8620

pd:
  data-path: ./pd_data
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: hadoop01:8500,hadoop02:8500,hadoop03:8500

raft:
  address: hadoop03:8610
  peers-list: hadoop01:8610,hadoop02:8610,hadoop03:8610

store:
  max-down-time: 172800
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 1 day

partition:
  default-shard-count: 2
  store-max-shard-count: 5

Store config on 110 (hadoop01)

pdserver:
  address: hadoop01:8787,hadoop02:8787,hadoop03:8787

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

grpc:
  host: hadoop01
  port: 8500
  netty-server:
    max-inbound-message-size: 1000MB
raft:
  disruptorBufferSize: 1024
  address: hadoop01:8510
  max-log-file-size: 600000000000
  snapshotInterval: 1800
server:
  port: 8520

app:
  data-path: ./storage

spring:
  application:
    name: store-node-grpc-server
  profiles:
    active: default
    include: pd

logging:
  config: 'file:./conf/log4j2.xml'
  level:
    root: info

Store config on 115 (hadoop02)

pdserver:
  address: hadoop01:8787,hadoop02:8787,hadoop03:8787

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

grpc:
  host: hadoop02
  port: 8500
  netty-server:
    max-inbound-message-size: 1000MB
raft:
  disruptorBufferSize: 1024
  address: hadoop02:8510
  max-log-file-size: 600000000000
  snapshotInterval: 1800
server:
  port: 8520

app:
  data-path: ./storage

spring:
  application:
    name: store-node-grpc-server
  profiles:
    active: default
    include: pd

logging:
  config: 'file:./conf/log4j2.xml'
  level:
    root: info

Store config on 140 (hadoop03)

pdserver:
  address: hadoop01:8787,hadoop02:8787,hadoop03:8787

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

grpc:
  host: hadoop03
  port: 8500
  netty-server:
    max-inbound-message-size: 1000MB
raft:
  disruptorBufferSize: 1024
  address: hadoop03:8510
  max-log-file-size: 600000000000
  snapshotInterval: 1800
server:
  port: 8520

app:
  data-path: ./storage

spring:
  application:
    name: store-node-grpc-server
  profiles:
    active: default
    include: pd

logging:
  config: 'file:./conf/log4j2.xml'
  level:
    root: info

Server hugegraph.properties config on 110 (hadoop01)

gremlin.graph=org.apache.hugegraph.HugeFactory
hstore.partition_count=3

vertex.cache_type=l2
edge.cache_type=l2


backend=hstore
serializer=binary

store=hugegraph


pd.peers=hadoop01:8787,hadoop02:8787,hadoop03:8787


task.scheduler_type=distributed
task.schedule_period=10
task.retry=0
task.wait_timeout=10


search.text_analyzer=jieba
search.text_analyzer_mode=INDEX

Server rest-server.properties config on 110 (hadoop01)

restserver.url=http://hadoop01:8083
gremlinserver.url=http://hadoop01:8184

graphs=./conf/graphs

batch.max_write_ratio=80
batch.max_write_threads=0

arthas.telnet_port=8562
arthas.http_port=8561
arthas.ip=127.0.0.1
arthas.disabled_commands=jad

rpc.server_host=hadoop01
rpc.server_port=8092

server.id=server-1
server.role=master

log.slow_query_threshold=1000

memory_monitor.threshold=0.85
memory_monitor.period=2000

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

/v1/partitions
{
  "message": "OK",
  "data": {
    "partitions": [
      {
        "id": 0,
        "version": 0,
        "graphName": "hugegraph/g",
        "startKey": 0,
        "endKey": 21845,
        "workState": "PState_Normal",
        "shards": [
          {
            "address": "hadoop03:8500",
            "storeId": "6354478704347015657",
            "role": "Leader",
            "state": "SState_Normal",
            "progress": 0,
            "committedIndex": "3",
            "partitionId": 0
          }
        ],
        "timestamp": "2024-12-30 16:18:32"
      },
      {
        "id": 1,
        "version": 0,
        "graphName": "hugegraph/g",
        "startKey": 21845,
        "endKey": 43690,
        "workState": "PState_Normal",
        "shards": [
          {
            "address": "hadoop02:8500",
            "storeId": "8980730234736961059",
            "role": "Leader",
            "state": "SState_Normal",
            "progress": 0,
            "committedIndex": "2",
            "partitionId": 1
          }
        ],
        "timestamp": "2024-12-30 16:18:32"
      },
      {
        "id": 2,
        "version": 0,
        "graphName": "hugegraph/g",
        "startKey": 43690,
        "endKey": 65535,
        "workState": "PState_Normal",
        "shards": [
          {
            "address": "hadoop01:8500",
            "storeId": "5761019429836672228",
            "role": "Leader",
            "state": "SState_Normal",
            "progress": 0,
            "committedIndex": "3",
            "partitionId": 2
          }
        ],
        "timestamp": "2024-12-30 16:18:32"
      }
    ]
  },
  "status": 0
}

----------------------------------------------------------------------------------

/v1/graphs
{
  "message": "OK",
  "data": {
    "graphs": [
      {
        "graphName": "hugegraph",
        "partitionCount": 3,
        "state": "PState_Normal",
        "partitions": [
          {
            "partitionId": 0,
            "graphName": "hugegraph",
            "workState": "PState_Normal",
            "startKey": 0,
            "endKey": 21845,
            "shards": [
              {
                "partitionId": 0,
                "storeId": 6.354478704347015e+18,
                "state": "SState_Normal",
                "role": "Leader",
                "progress": 0
              }
            ],
            "dataSize": 1
          },
          {
            "partitionId": 1,
            "graphName": "hugegraph",
            "workState": "PState_Normal",
            "startKey": 21845,
            "endKey": 43690,
            "shards": [
              {
                "partitionId": 1,
                "storeId": 8.980730234736962e+18,
                "state": "SState_Normal",
                "role": "Leader",
                "progress": 0
              }
            ],
            "dataSize": 1
          },
          {
            "partitionId": 2,
            "graphName": "hugegraph",
            "workState": "PState_Normal",
            "startKey": 43690,
            "endKey": 65535,
            "shards": [
              {
                "partitionId": 2,
                "storeId": 5.761019429836672e+18,
                "state": "SState_Normal",
                "role": "Leader",
                "progress": 0
              }
            ],
            "dataSize": 1
          }
        ],
        "dataSize": 3,
        "nodeCount": 0,
        "edgeCount": 0,
        "keyCount": 55
      }
    ]
  },
  "status": 0
}

----------------------------------------------------------------------------------

/v1/stores
{
  "message": "OK",
  "data": {
    "stores": [
      {
        "storeId": 5.761019429836672e+18,
        "address": "hadoop01:8500",
        "raftAddress": "hadoop01:8510",
        "version": "",
        "state": "Up",
        "deployPath": "/ssd01/build/bigdata/hugegraph/hugegraph/apache-hugegraph-incubating-1.5.0/apache-hugegraph-store-incubating-1.5.0/lib/hg-store-node-1.5.0.jar",
        "dataPath": "./storage",
        "startTimeStamp": 1735546322025,
        "registedTimeStamp": 1735546322025,
        "lastHeartBeat": 1735548184409,
        "capacity": 944990375936,
        "available": 449771778048,
        "partitionCount": 1,
        "graphSize": 1,
        "keyCount": 17,
        "leaderCount": 1,
        "serviceName": "hadoop01:8500-store",
        "serviceVersion": "",
        "serviceCreatedTimeStamp": 1735546321000,
        "partitions": [
          {
            "partitionId": 2,
            "graphName": "hugegraph",
            "role": "Leader",
            "workState": "PState_Normal",
            "dataSize": 4
          }
        ]
      },
      {
        "storeId": 8.980730234736962e+18,
        "address": "hadoop02:8500",
        "raftAddress": "hadoop02:8510",
        "version": "",
        "state": "Up",
        "deployPath": "/home/ws/apache-hugegraph-store-incubating-1.5.0/lib/hg-store-node-1.5.0.jar",
        "dataPath": "./storage",
        "startTimeStamp": 1735546322710,
        "registedTimeStamp": 1735546322710,
        "lastHeartBeat": 1735548184413,
        "capacity": 1963882692608,
        "available": 1024558968832,
        "partitionCount": 1,
        "graphSize": 1,
        "keyCount": 17,
        "leaderCount": 1,
        "serviceName": "hadoop02:8500-store",
        "serviceVersion": "",
        "serviceCreatedTimeStamp": 1735546322000,
        "partitions": [
          {
            "partitionId": 1,
            "graphName": "hugegraph",
            "role": "Leader",
            "workState": "PState_Normal",
            "dataSize": 3
          }
        ]
      },
      {
        "storeId": 6.354478704347015e+18,
        "address": "hadoop03:8500",
        "raftAddress": "hadoop03:8510",
        "version": "",
        "state": "Up",
        "deployPath": "/opt/software/ws/apache-hugegraph-incubating-1.5.0/apache-hugegraph-store-incubating-1.5.0/lib/hg-store-node-1.5.0.jar",
        "dataPath": "./storage",
        "startTimeStamp": 1735546321846,
        "registedTimeStamp": 1735546321846,
        "lastHeartBeat": 1735548184350,
        "capacity": 993795248128,
        "available": 233263923200,
        "partitionCount": 1,
        "graphSize": 1,
        "keyCount": 19,
        "leaderCount": 1,
        "serviceName": "hadoop03:8500-store",
        "serviceVersion": "",
        "serviceCreatedTimeStamp": 1735546116000,
        "partitions": [
          {
            "partitionId": 0,
            "graphName": "hugegraph",
            "role": "Leader",
            "workState": "PState_Normal",
            "dataSize": 3
          }
        ]
      }
    ],
    "stateCountMap": {
      "Up": 3
    },
    "numOfService": 3,
    "numOfNormalService": 3
  },
  "status": 0
}

dosubot bot commented Dec 30, 2024

@shuai-smart Hello! I'm a bot that can help you fix bugs, answer questions, and become a contributor. I'm here to help while you wait for a human maintainer.

The issue with the distributed deployment of replicas not taking effect in your HugeGraph setup might be related to incorrect configuration settings. Here are a few things to check and adjust:

  1. Replica Configuration: Ensure that the replica configuration is correctly set with valid integer values for replica counts and that both the datacenter name and the replica count are specified correctly. For example, a valid configuration would look like dc1:2, dc2:1 [1].

  2. Partition and Shard Settings: Verify that the partition and shard settings in your configuration are correctly set. The default-shard-count and store-max-shard-count should align with your deployment needs. The initial-store-count and initial-store-list should accurately reflect the number of stores and their addresses.

  3. Raft Configuration: Check the raft configuration to ensure that the address and peers-list are correctly set across all nodes. Any discrepancies here can lead to issues in the distributed setup.

  4. PD and Store Configuration: Make sure that the pdserver addresses in the store configurations are consistent and correctly point to all PD nodes. Similarly, ensure that the pd configurations in each node are consistent and correctly list all store nodes.

  5. Logs and Monitoring: Review the logs for any errors or warnings that might indicate configuration issues. Also, ensure that monitoring is enabled and check the metrics to see if there are any anomalies.

By ensuring these configurations are correct, you should be able to resolve the issue with the distributed deployment of replicas not taking effect.
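
To make points 3 and 4 concrete, here is a minimal sketch (not a verified fix) of the sections that have to stay consistent across the cluster, using the hostnames and ports from the configs posted above; only raft.address should differ between PD nodes:

# PD node (identical on every PD node except raft.address)
raft:
  address: hadoop01:8610                                  # this node's own raft endpoint
  peers-list: hadoop01:8610,hadoop02:8610,hadoop03:8610   # same raft peer list (port 8610) on all PD nodes

pd:
  initial-store-count: 3
  initial-store-list: hadoop01:8500,hadoop02:8500,hadoop03:8500   # store gRPC addresses

# Store node: point at the PD gRPC ports (8787), not the PD raft ports
pdserver:
  address: hadoop01:8787,hadoop02:8787,hadoop03:8787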

To continue talking to Dosu, mention @dosu.



JackyYangPassion (Contributor) commented Dec 30, 2024

PD configuration:

# the number of replicas should be odd
default-shard-count = 2  <- this setting is the problem

@shuai-smart
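
A minimal sketch of that suggestion, assuming one replica per store is wanted on this 3-store cluster (hypothetical values, not a verified fix); only default-shard-count changes relative to the configs above:

partition:
  default-shard-count: 3    # shard (replica) count per partition; an odd value, here one replica per store
  store-max-shard-count: 5  # unchanged from the original config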

JackyYangPassion (Contributor) commented Dec 30, 2024

The monitoring dashboard in #2611 can be used as a reference; it helps with analyzing and locating the problem.


Due to the lack of activity, this issue has been marked as stale and will be closed after 20 days; any update will remove the stale label.
