
dvc push in a project with multiple users #7510

Closed
@emilijapur


Bug Report

dvc push: failed to push data to the cloud

Description

Hello, I have run into a problem. When multiple people work on the same project and run different experiments on their own computers, DVC sometimes generates directories in .dvc/cache with the same name as a directory that already exists on the DVC remote server. If a user then wants to push data after dvc run, the push fails. For example, the directory .dvc/cache/16 already exists on the remote as /path/to/remote/server/16. In that case, the following error is shown:

ERROR: failed to transfer 'md5: 24dd737c0642bf1ff8eee74eb121fbb6' - Permission denied
ERROR: failed to transfer 'md5: 233c0d5895672b19e2428dae2ead5447' - Permission denied
ERROR: failed to push data to the cloud - 2 files failed to upload

This happens even though the user has full permissions on the directory /path/to/remote/server/.
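For context, and assuming the DVC 2.x layout, both the local cache and the remote store each object under a directory named after the first two hex characters of its MD5 hash. For example, the object 24dd737c... from the error above would live at 24/dd737c... in both places. Because there are only 256 possible two-character names, a prefix directory such as 16 existing both locally and on the remote is expected. The following minimal sketch reproduces that mapping. `file_md5` and `cache_relpath` are hypothetical helper names, not DVC API.

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_relpath(md5: str) -> str:
    """Map an MD5 digest (optionally with a '.dir' suffix) to its relative
    path in an assumed DVC 2.x-style cache: <first 2 hex chars>/<the rest>."""
    digest = md5[: -len(".dir")] if md5.endswith(".dir") else md5
    if len(digest) != 32 or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"not an MD5 hex digest: {md5!r}")
    return f"{md5[:2]}/{md5[2:]}"


if __name__ == "__main__":
    # The two objects from the error message in this report.
    for digest in ("24dd737c0642bf1ff8eee74eb121fbb6", "233c0d5895672b19e2428dae2ead5447"):
        print(cache_relpath(digest))
```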

This happens mostly when multiple people are working on the same project at the same time, or when a user deletes their entire local cache.
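Having full rights on /path/to/remote/server/ itself does not necessarily mean the current user can write into the two-character subdirectories inside it. One untested hypothesis is that a subdirectory created by another user is owned by that user, with a mode that denies others. The following minimal diagnostic sketch lists every directory under a given root in which the current user cannot create files, along with its owner uid and mode. It is meant to be run on the SSH remote host itself. The function name `unwritable_dirs` is hypothetical, and the ownership explanation is an assumption to verify, not a confirmed cause.

```python
import os
import stat
import sys
from typing import List, Tuple


def unwritable_dirs(root: str) -> List[Tuple[str, int, str]]:
    """Return (path, owner_uid, mode) for every directory under `root`
    (including `root`) in which the current user cannot create files."""
    problems: List[Tuple[str, int, str]] = []

    def check(path: str) -> None:
        # Creating a file in a directory needs write and execute (search) permission.
        if not os.access(path, os.W_OK | os.X_OK):
            st = os.stat(path)
            problems.append((path, st.st_uid, stat.filemode(st.st_mode)))

    check(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        # Check children from the parent's listing so unreadable ones are still reported.
        for name in dirnames:
            check(os.path.join(dirpath, name))
    return problems


if __name__ == "__main__":
    remote_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, uid, mode in unwritable_dirs(remote_root):
        print(f"{mode}  uid={uid}  {path}")
```

For example, it could be run as `python3 check_perms.py /path/to/remote/server` (the script name is hypothetical). If the reported directories are owned by a different uid than the user who is pushing, that would be consistent with the Permission denied errors above.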

I believe this problem could be solved if every user downloaded the entire cache from the remote server. However, this is not possible in my case because there are terabytes of data.

Reproduce

Repeat the following steps several times:

  1. dvc add dataset.csv
  2. dvc run -n first_step -d dataset.csv -o output.csv Rscript first_step.R
  3. dvc push

Delete all DVC cache files from the computer and repeat the steps several more times. After a while, DVC generates directories with the same names as directories that already exist on the remote server.
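The "after a while" observation matches simple arithmetic, assuming MD5 prefixes are uniformly distributed over the 256 possible two-character directory names. Once k distinct objects have been stored, a new object lands in an already existing prefix directory with probability 1 - (255/256)^k. The following small sketch computes that estimate and the expected number of occupied prefix directories. The function names are hypothetical, and the uniformity assumption is an idealization.

```python
def prob_prefix_exists(k: int, buckets: int = 256) -> float:
    """Probability that a new object's prefix directory already exists after
    k distinct objects were stored, assuming uniformly distributed prefixes."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - ((buckets - 1) / buckets) ** k


def expected_prefix_dirs(k: int, buckets: int = 256) -> float:
    """Expected number of distinct prefix directories occupied by k objects."""
    return buckets * prob_prefix_exists(k, buckets)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(f"k={k:5d}  P(prefix dir exists)={prob_prefix_exists(k):.3f}  "
              f"expected dirs={expected_prefix_dirs(k):.1f}")
```

Under this assumption, after roughly 1000 pushed objects a new object's prefix directory already exists about 98% of the time. Shared prefix directories are therefore the normal case rather than the exception.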

Expected

I expect to be able to push files from any computer without errors, and without having to download the entire cache from the remote server.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.9.5 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-81-generic-x86_64-with-glibc2.14
Supports:
        azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.7.1),
        gdrive (pydrive2 = 1.10.0),
        gs (gcsfs = 2022.1.0),
        hdfs (fsspec = 2022.1.0, pyarrow = 7.0.0),
        webhdfs (fsspec = 2022.1.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.1.0, boto3 = 1.20.24),
        ssh (sshfs = 2021.11.2),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.4),
        webdavs (webdav4 = 0.9.4)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb2
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sdb2
Repo: dvc, git

OS - Ubuntu 20.04.4 LTS
