Data mismatch on block volume when multiple pods share the same PVC #46

stoneshi-yunify opened this issue Oct 12, 2020 · 1 comment


stoneshi-yunify commented Oct 12, 2020

This issue was detected when running the Kubernetes CSI E2E test suite InitMultiVolumeTestSuite while the CSI driver reports support for the RWX (ReadWriteMany) access mode.

Test steps:

  • Set up a multi-node NeonSAN cluster with Kubernetes and neonsan-csi installed.
  • Create a storage class.
  • Create a PVC pvc1 with the above storage class, volumeMode = Block, access mode = ReadWriteMany.
  • Create pod1 on node1 with pvc1 (as a block volume), and create pod2 on node2 with pvc1 (as a block volume) as well; a sketch of the manifests appears after this list.
  • Write some data from node1, then read the data back on both node1 and node2.
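
A minimal sketch of the kind of objects involved, assuming a hypothetical storage class name, image, and node names (the E2E suite generates its own names, so this is only illustrative):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1
spec:
  accessModes: ["ReadWriteMany"]
  volumeMode: Block
  storageClassName: neonsan            # placeholder storage class name
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  nodeName: node1                      # pin the pod to node1
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeDevices:                     # raw block device, not a filesystem mount
    - name: data
      devicePath: /dev/xvda
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc1
EOF
# pod2 is identical except for metadata.name: pod2 and nodeName: node2.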

Expected Result:

  • node1 and node2 should read back exactly the same data, since they share the same underlying NeonSAN storage blocks (they use the same PVC).

Actual Result:

  • The data on node1 and node2 does not match: node1 reads the correct data, while node2 does not.

Test Env:
172.31.30.10, ssh 192.168.101.174-176

Logs:

root@testr01n01:~# kubectl -n multivolume-7887 get pvc
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                    AGE
neonsan.csi.qingstor.com2s7jp   Bound    pvc-b88eb38a-6ff7-4a1c-a062-eef382aa53cc   5Gi        RWX            multivolume-7887-neonsan8wx4p   137m
root@testr01n01:~# kubectl -n multivolume-7887 get pod -o wide
NAME                                                    READY   STATUS    RESTARTS   AGE    IP             NODE         NOMINATED NODE   READINESS GATES
security-context-0cce6ebe-968e-4db2-ae00-f8a9a8d911ca   1/1     Running   0          139m   10.233.98.51   testr01n01   <none>           <none>
test-pod2                                               1/1     Running   0          38m    10.233.73.59   testr01n02   <none>           <none>

node1:

root@testr01n01:~# echo "i love china" | dd of=/dev/qbd7 bs=64 count=1
0+1 records in
0+1 records out
13 bytes copied, 7.0572e-05 s, 184 kB/s
root@testr01n01:~# head -c 64 /dev/qbd7
i love china
ay
O�0~�$f��R�1��dy��6n�u	#1�^;S�S�ϕC����q
m��root@testr01n01:~#
root@testr01n01:~#

node2:

root@testr01n02:~# qbd -l | grep b88eb38a-6ff7-4a1c-a062-eef382aa53cc
49	0x87a000000	qbd49	tcp://kube/pvc-b88eb38a-6ff7-4a1c-a062-eef382aa53cc	/etc/neonsan/qbd.conf	0	0	0	0
root@testr01n02:~# head -c 64 /dev/qbd49
test write data 
O�0~�$f��R�1��dy��6n�u	#1�^;S�S�ϕC����q
m��root@testr01n02:~#
root@testr01n02:~# blockdev --flushbufs /dev/qbd49
root@testr01n02:~# head -c 64 /dev/qbd49
i love china
ay
O�0~�$f��R�1��dy��6n�u	#1�^;S�S�ϕC����q
m��root@testr01n02:~#

You can see that node2 read stale data until blockdev --flushbufs was executed. However, running the flush command on another node does not make sense, nor is it practical: users cannot run this command every time new data is written from a different node.

The data should read back correctly on whichever node shares the same PVC, without flushing any buffers.
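
For comparison, one way to sidestep the reading node's page cache entirely (rather than flushing it) is a direct read; a rough sketch against the device path from the logs above:

# Read the first sector directly from the shared device, bypassing this node's
# page cache. O_DIRECT I/O must be block-aligned, hence the full 512-byte read.
dd if=/dev/qbd49 iflag=direct bs=512 count=1 2>/dev/null | head -c 64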

This issue should be fixed; otherwise we cannot claim NeonSAN supports RWX in Kubernetes.

Thanks.

stoneshi-yunify (Contributor, Author) commented:

Discussed with the NeonSAN developers: when doing I/O to a shared block volume, it is essential to open the device with O_DIRECT for writes, so that the system cache is bypassed and the data is actually staged on the device. It is the upper application's responsibility to do this; in this case, that means the Kubernetes E2E test suite.
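
As a rough illustration of the write side (not necessarily the exact change the E2E suite made), a direct write from node1 would look something like this; O_DIRECT transfers must be block-aligned, so the payload is padded to a full 512-byte sector instead of writing 13 raw bytes as in the logs above:

# Pad the test string to 512 bytes, then write it with O_DIRECT so it bypasses
# the page cache and lands on the shared device; iflag=fullblock makes dd read
# the whole block from the pipe before issuing the write.
printf 'i love china%500s' '' | dd of=/dev/qbd7 oflag=direct iflag=fullblock bs=512 count=1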

Luckily, this issue was also detected by the Kubernetes community and fixed 17 days ago; see kubernetes/kubernetes#94881 for more details.

So far no official Kubernetes release contains the fix. We will wait a few days for one and revisit this issue then.
