Bug Report
dvc push: failed to push data to the cloud
Description
Hello, I have run into a problem. When multiple people work on the same project and run different experiments on their own computers, DVC sometimes creates directories in .dvc/cache with the same names as directories that already exist on the DVC remote server. As a result, a user cannot push data after dvc run. For example, the directory .dvc/cache/16 already exists on the remote as /path/to/remote/server/16. In that case, the following error is shown:
ERROR: failed to transfer 'md5: 24dd737c0642bf1ff8eee74eb121fbb6' - Permission denied
ERROR: failed to transfer 'md5: 233c0d5895672b19e2428dae2ead5447' - Permission denied
ERROR: failed to push data to the cloud - 2 files failed to upload
This happens even though the user has full permissions on the directory /path/to/remote/server/.
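This sketch assumes DVC 2.x stores objects in local and SSH caches and remotes under a directory named after the first two hex characters of the md5, with the remaining 30 characters as the file name. Under that assumption, the two failed transfers above would write into the 24/ and 23/ subdirectories of the remote, not into /path/to/remote/server/ itself. There are only 256 two-character hex prefixes, so directory names like 16 would be expected to recur across machines. When run on the remote host, the sketch lists existing prefix directories that the current user cannot write to. This is one way to check whether the permissions on /path/to/remote/server/ also apply to its subdirectories. The helper names object_path and unwritable_prefix_dirs are hypothetical and not part of DVC.

```python
import os
from pathlib import Path
from typing import List

# Assumption: DVC 2.x lays out a local/SSH cache or remote as
# <root>/<first 2 hex chars of md5>/<remaining 30 hex chars>.
def object_path(root: str, md5: str) -> Path:
    """Return where an object with the given md5 would be stored under root."""
    return Path(root) / md5[:2] / md5[2:]


def unwritable_prefix_dirs(root: str, md5s: List[str]) -> List[Path]:
    """List existing prefix directories under root the current user cannot write into."""
    result = []
    for md5 in md5s:
        prefix_dir = object_path(root, md5).parent
        # Creating a file inside a directory needs write and execute permission on it.
        if prefix_dir.is_dir() and not os.access(prefix_dir, os.W_OK | os.X_OK):
            result.append(prefix_dir)
    return result


# The md5 values from the error messages above.
failed = ["24dd737c0642bf1ff8eee74eb121fbb6", "233c0d5895672b19e2428dae2ead5447"]
for md5 in failed:
    print(object_path("/path/to/remote/server", md5))
print(unwritable_prefix_dirs("/path/to/remote/server", failed))
```

On the remote host, a non-empty second line of output would point to subdirectories with ownership or mode settings that differ from the top-level directory.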
This mostly happens when multiple people are working on the same project at the same time, or when a user deletes their entire local cache.
I believe this problem could be avoided if every user downloaded the entire cache from the remote server. However, this is not possible in my case because there are terabytes of data.
Reproduce
Repeat the following several times:
- dvc add dataset.csv
- dvc run -n first_step -d dataset.csv -o output.csv Rscript first_step.R
- dvc push
Then delete all DVC cache files from the computer and repeat the steps several more times. After a while, DVC generates cache folders with the same names as folders that already exist on the remote server.
Expected
I expect to be able to push files from any computer without errors, and without having to download the entire cache from the remote server.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.9.5 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-81-generic-x86_64-with-glibc2.14
Supports:
azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.7.1),
gdrive (pydrive2 = 1.10.0),
gs (gcsfs = 2022.1.0),
hdfs (fsspec = 2022.1.0, pyarrow = 7.0.0),
webhdfs (fsspec = 2022.1.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.1.0, boto3 = 1.20.24),
ssh (sshfs = 2021.11.2),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.4),
webdavs (webdav4 = 0.9.4)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb2
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sdb2
Repo: dvc, git
OS - Ubuntu 20.04.4 LTS