
ceph-iscsi / tcmu-runner bad performance with VMware ESXi #246

Open
lightmans2 opened this issue Sep 23, 2021 · 2 comments

Comments

lightmans2 commented Sep 23, 2021

Hello everyone,

I need some help with our Ceph 16.2.5 cluster, which we use as an iSCSI target for our ESXi nodes.

Background info:

  • We have built 3 OSD nodes with 60 BlueStore OSDs: 60x 6 TB spinning disks, 12 SSDs, and 3 NVMe drives.
  • The OSD nodes have 32 cores and 256 GB RAM each.
  • The OSD disks are connected to a SCSI RAID controller; each disk is configured as a single-disk RAID 0 with write-back enabled, to make use of the RAID controller cache.
  • We have 3 MONs and 2 iSCSI gateways.
  • All servers are connected to a 10 Gbit network (switches).
  • All servers have two 10 Gbit network adapters configured as bond-rr.
  • We created one RBD pool with autoscaling and 128 PGs (at the moment).
  • The pool currently holds 5 RBD images: 2x 10 TB and 3x 500 GB, with the exclusive-lock feature and striping v2 enabled (4 MB object size / 1 MB stripe unit / stripe count 4).
  • All the images are attached to the two iSCSI gateways running tcmu-runner 1.5.4 and exposed as iSCSI targets.
  • We have 6 ESXi 6.7U3 servers as compute nodes connected to the Ceph iSCSI target.

ESXi iSCSI config:
# raise the maximum iSCSI I/O size to 512 KB
esxcli system settings advanced set -o /ISCSI/MaxIoSizeKB -i 512
# cap the per-LUN and per-adapter iSCSI queue depths at 64
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=64
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_HostQDepth=64
# enable VAAI hardware-accelerated move (XCOPY)
esxcli system settings advanced set --int-value 1 --option /DataMover/HardwareAcceleratedMove
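
A quick way to confirm these options actually took effect is to list the current values back (a sanity-check sketch using standard esxcli list commands):

# show the current advanced option value
esxcli system settings advanced list -o /ISCSI/MaxIoSizeKB
# show the iscsi_vmk module parameters, including the queue depths set above
esxcli system module parameters list -m iscsi_vmk | grep -E 'LunQDepth|HostQDepth'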

The OSD nodes, MONs, RGW/iSCSI gateways, and ESXi nodes are all connected to the 10 Gbit network with bond-rr.
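
To rule out the bond itself as the bottleneck, a raw TCP throughput test between a gateway and another node is a quick check (a sketch assuming iperf3 is installed; the IP is a placeholder):

# on one iSCSI gateway (server side)
iperf3 -s
# on an OSD node (client side): 4 parallel streams for 30 seconds
iperf3 -c <gateway-ip> -P 4 -t 30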

rados benchmark test (on the rbd pool):

root@cd133-ceph-osdh-01:~# rados bench -p rbd 10 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_cd133-ceph-osdh-01_87894
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        69        53   211.987       212    0.250578    0.249261
    2      16       129       113   225.976       240    0.296519    0.266439
    3      16       183       167   222.641       216    0.219422    0.273838
    4      16       237       221   220.974       216    0.469045     0.28091
    5      16       292       276   220.773       220    0.249321     0.27565
    6      16       339       323   215.307       188    0.205553     0.28624
    7      16       390       374   213.688       204    0.188404    0.290426
    8      16       457       441   220.472       268    0.181254    0.286525
    9      16       509       493   219.083       208    0.250538    0.286832
   10      16       568       552   220.772       236    0.307829    0.286076
Total time run:         10.2833
Total writes made:      568
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     220.941
Stddev Bandwidth:       22.295
Max bandwidth (MB/sec): 268
Min bandwidth (MB/sec): 188
Average IOPS:           55
Stddev IOPS:            5.57375
Max IOPS:               67
Min IOPS:               47
Average Latency(s):     0.285903
Stddev Latency(s):      0.115162
Max latency(s):         0.88187
Min latency(s):         0.119276
Cleaning up (deleting benchmark objects)
Removed 568 objects
Clean up completed and total clean up time :3.18627

The rados benchmark shows that at least ~250 MB/s should be possible, and I have actually seen much more, up to 550 MB/s.
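
To narrow down whether the loss happens in the iSCSI layer rather than in RBD itself, a similar write load can be run through librbd directly against one of the images (a sketch; the image name is hypothetical):

# 4 MB sequential writes through librbd, 16 in flight, 4 GB total
rbd bench --io-type write --io-size 4M --io-threads 16 --io-total 4G rbd/esxi-lun-01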

If I start iftop on one OSD node, I can see the Ceph iSCSI gateways (named "rgw"), and the traffic is only around 80 MB/s:
[screenshot: iftop on an OSD node]

The Ceph dashboard shows that the iSCSI write performance is only 40 MB/s; the maximum value I saw was between 40 and 60 MB/s. Very poor.
[screenshot: Ceph dashboard iSCSI performance]

If I look at the datastore performance in vCenter and on the ESXi hosts, I see very high storage device latencies between 50 and 100 ms. Very bad.
[screenshot: vCenter datastore latency]
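
One way to take ESXi out of the picture is to log in to the same LUN from a plain Linux initiator and benchmark it directly (a sketch assuming fio and multipathd are set up; the multipath device name is hypothetical):

# WARNING: this writes to the device and will destroy any data on it
# 1 MB sequential writes, queue depth 16, direct I/O against the multipathed iSCSI LUN
fio --name=iscsi-seqwrite --filename=/dev/mapper/mpatha --rw=write \
    --bs=1M --iodepth=16 --ioengine=libaio --direct=1 --size=4G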

root@cd133-ceph-mon-01:/home/cephadm# ceph config dump
WHO                                               MASK       LEVEL     OPTION                                       VALUE                                                                                        RO
global                                                       basic     container_image                              docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb  *
global                                                       advanced  journal_max_write_bytes                      1073714824
global                                                       advanced  journal_max_write_entries                    10000
global                                                       advanced  mon_osd_cache_size                           1024
global                                                       dev       osd_client_watch_timeout                     15
global                                                       dev       osd_heartbeat_interval                       5
global                                                       advanced  osd_map_cache_size                           128
global                                                       advanced  osd_max_write_size                           512
global                                                       advanced  rados_osd_op_timeout                         5
global                                                       advanced  rbd_cache_max_dirty                          134217728
global                                                       advanced  rbd_cache_max_dirty_age                      5.000000
global                                                       advanced  rbd_cache_size                               268435456
global                                                       advanced  rbd_op_threads                               2
  mon                                                        advanced  auth_allow_insecure_global_id_reclaim        false
  mon                                                        advanced  cluster_network                              10.50.50.0/24                                                                                *
  mon                                                        advanced  public_network                               10.50.50.0/24                                                                                *
  mgr                                                        advanced  mgr/cephadm/container_init                   True                                                                                         *
  mgr                                                        advanced  mgr/cephadm/device_enhanced_scan             true                                                                                         *
  mgr                                                        advanced  mgr/cephadm/migration_current                2                                                                                            *
  mgr                                                        advanced  mgr/cephadm/warn_on_stray_daemons            false                                                                                        *
  mgr                                                        advanced  mgr/cephadm/warn_on_stray_hosts              false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/10.50.50.21/server_addr                                                                                                     *
  mgr                                                        advanced  mgr/dashboard/ALERTMANAGER_API_HOST          http://10.221.133.161:9093                                                                   *
  mgr                                                        advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY         false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/GRAFANA_API_URL                https://10.221.133.161:3000                                                                  *
  mgr                                                        advanced  mgr/dashboard/ISCSI_API_SSL_VERIFICATION     true                                                                                         *
  mgr                                                        advanced  mgr/dashboard/NAME/server_port               80                                                                                           *
  mgr                                                        advanced  mgr/dashboard/PROMETHEUS_API_HOST            http://10.221.133.161:9095                                                                   *
  mgr                                                        advanced  mgr/dashboard/PROMETHEUS_API_SSL_VERIFY      false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/RGW_API_ACCESS_KEY             W8VEKVFDK1RH5IH2Q3GN                                                                         *
  mgr                                                        advanced  mgr/dashboard/RGW_API_SECRET_KEY             IkIjmjfh3bMLrPOlAFbMfpigSIALAQoKGEHzZgxv                                                     *
  mgr                                                        advanced  mgr/dashboard/camdatadash/server_addr        10.251.133.161                                                                               *
  mgr                                                        advanced  mgr/dashboard/camdatadash/ssl_server_port    8443                                                                                         *
  mgr                                                        advanced  mgr/dashboard/cd133-ceph-mon-01/server_addr                                                                                               *
  mgr                                                        advanced  mgr/dashboard/dasboard/server_port           80                                                                                           *
  mgr                                                        advanced  mgr/dashboard/dashboard/server_addr          10.251.133.161                                                                               *
  mgr                                                        advanced  mgr/dashboard/dashboard/ssl_server_port      8443                                                                                         *
  mgr                                                        advanced  mgr/dashboard/server_addr                    0.0.0.0                                                                                      *
  mgr                                                        advanced  mgr/dashboard/server_port                    8080                                                                                         *
  mgr                                                        advanced  mgr/dashboard/ssl                            false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/ssl_server_port                8443                                                                                         *
  mgr                                                        advanced  mgr/orchestrator/orchestrator                cephadm
  mgr                                                        advanced  mgr/prometheus/server_addr                   0.0.0.0                                                                                      *
  mgr                                                        advanced  mgr/telemetry/channel_ident                  true                                                                                         *
  mgr                                                        advanced  mgr/telemetry/contact                        [email protected]                                                                                *
  mgr                                                        advanced  mgr/telemetry/description                    ceph cluster                                                                         *
  mgr                                                        advanced  mgr/telemetry/enabled                        true                                                                                         *
  mgr                                                        advanced  mgr/telemetry/last_opt_revision              3                                                                                            *
  osd                                                        dev       bluestore_cache_autotune                     false
  osd                                             class:ssd  dev       bluestore_cache_autotune                     false
  osd                                                        dev       bluestore_cache_size                         4000000000
  osd                                             class:ssd  dev       bluestore_cache_size                         4000000000
  osd                                                        dev       bluestore_cache_size_hdd                     4000000000
  osd                                                        dev       bluestore_cache_size_ssd                     4000000000
  osd                                             class:ssd  dev       bluestore_cache_size_ssd                     4000000000
  osd                                                        advanced  bluestore_default_buffered_write             true
  osd                                             class:ssd  advanced  bluestore_default_buffered_write             true
  osd                                                        advanced  osd_max_backfills                            1
  osd                                             class:ssd  dev       osd_memory_cache_min                         4000000000
  osd                                             class:hdd  basic     osd_memory_target                            6000000000
  osd                                             class:ssd  basic     osd_memory_target                            6000000000
  osd                                                        advanced  osd_recovery_max_active                      3
  osd                                                        advanced  osd_recovery_max_single_start                1
  osd                                                        advanced  osd_recovery_sleep                           0.000000
    client.rgw.ceph-rgw.cd133-ceph-rgw-01.klvrwk             basic     rgw_frontends                                beast port=8000                                                                              *
    client.rgw.ceph-rgw.cd133-ceph-rgw-01.ptmqcm             basic     rgw_frontends                                beast port=8001                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.czajah              basic     rgw_frontends                                beast port=8000                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.pdknfg              basic     rgw_frontends                                beast port=8000                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.qkdlfl              basic     rgw_frontends                                beast port=8001                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.tdsxpb              basic     rgw_frontends                                beast port=8001                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.xnadfr              basic     rgw_frontends                                beast port=8001                                                                              *

Can somebody explain what I am doing wrong, or what I can do to get better performance with ceph-iscsi?
No matter what I do or tweak, the write performance does not get better.

I have already experimented with gwcli, the iSCSI queue depth, and other settings.
Currently I have set:
hw_max_sectors 8192
max_data_area_mb 32
cmdsn_depth 64 (the ESXi nodes are already fixed at a maximum of 64 outstanding iSCSI commands)
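
For reference, the first two attributes can be changed per image with gwcli's reconfigure subcommand; this is a sketch with a hypothetical image name, and the exact syntax may vary between ceph-iscsi releases:

# inside gwcli, at the /disks node
/disks> reconfigure rbd/esxi-lun-01 max_data_area_mb 32
/disks> reconfigure rbd/esxi-lun-01 hw_max_sectors 8192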

Everything else is fine, multipathing is working, and recovery is fast... but iSCSI is very slow and I don't know why.
Can somebody help me?

@breeze-cool

Try turning off multipath, or turn off the exclusive-lock feature.

@chengzhidada

Yes, the ceph-iscsi performance loss is more than 50%.
