Analysis of the RemoteIO framework #80

Open
mih opened this issue Sep 20, 2023 · 0 comments

This is a closer look at the functionality provided by the IO framework used in the ora special remote, shipped with datalad-core. The aim is to determine what functionality is not immediately available from datalad-next's UrlOperations, and how the two approaches can be consolidated.

The IOBase class defines a set of operations (https://github.com/datalad/datalad/blob/776f465b6332bb6320d1b4dc45c85112ced1dd67/datalad/distributed/ora_remote.py#L109). Three classes (LocalIO, SSHRemoteIO, HTTPRemoteIO) implement these operations for particular environments/protocols. However, HTTPRemoteIO is not actually derived from IOBase, and most operations are not implemented for HTTP.

The following list describes each operation and notes its availability ([not-http] marks operations that are not implemented for HTTP):

  • get_7z [not-http]:
    Returns boolean, indicating the availability of the 7z command on the remote end.
  • mkdir [not-http]:
    Creates a directory at a given path, including all parents and regardless of whether it already exists.
  • symlink [not-http]:
    Creates a symlink.
  • put [not-http]:
    Uploads a file.
  • get:
    Downloads a file.
  • rename [not-http]:
    Moves a file/directory.
  • remove [not-http]:
    Deletes an existing file.
  • remove_dir [not-http]:
    Deletes an empty directory.
  • exists:
    Returns a boolean indicating the presence of a file/directory given by a path.
  • get_from_archive [not-http]:
    Extracts a file from a 7z archive and directly writes it through a pipe into a local target file.
  • in_archive [not-http]:
    Returns a boolean indicating the presence of a file/directory inside a 7z archive given by a path.
  • read_file:
    Reads a remote file's content (all at once) and returns it. This is pretty much get, without writing to a target file.
  • write_file [not-http]:
    Writes content to a remote file. Content is passed all at once to printf, so this likely only works for small files. This is pretty much put, without reading from a source file.
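
For orientation, the semantics of several of these operations can be sketched against a local filesystem. This is a minimal illustration only, not the code in ora_remote.py: the class name LocalIOSketch and all method signatures are assumptions made for this example, and the archive-related operations are left out.

```python
import shutil
from pathlib import Path


class LocalIOSketch:
    """Hypothetical local-filesystem take on some IOBase-style operations."""

    def mkdir(self, path):
        # create all parents; do not fail if the directory already exists
        Path(path).mkdir(parents=True, exist_ok=True)

    def put(self, src, dst):
        # "upload": locally this is just a copy
        shutil.copy(str(src), str(dst))

    def get(self, src, dst):
        # "download": locally this is just a copy
        shutil.copy(str(src), str(dst))

    def rename(self, src, dst):
        Path(src).rename(dst)

    def remove(self, path):
        # deletes an existing file; raises if it is missing
        Path(path).unlink()

    def remove_dir(self, path):
        # only succeeds for an empty directory
        Path(path).rmdir()

    def exists(self, path):
        return Path(path).exists()

    def read_file(self, path):
        # like get, but returns the content instead of writing a target file
        return Path(path).read_text()

    def write_file(self, path, content):
        # like put, but takes the content directly instead of a source file
        Path(path).write_text(content)
```

In this sketch, read_file and write_file are thin variations of get and put, which mirrors the observation in the list above.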

Mapping to UrlOperations

UrlOperations defines the following operations. The list contains notes on which IOBase functionality could be mapped onto them:

  • stat:
    Can be used for exists, by handling the UrlOperationsResourceUnknown exception.
  • download:
    Implements get and read_file.
  • upload [not-http]:
    Implements put and write_file. Handles mkdir implicitly.
  • delete [not-ssh, not-http]:
    Implements remove.
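
The mapping above could be sketched as a thin adapter. This is a rough illustration under stated assumptions, not an existing datalad-next class: IOViaUrlOperations is a hypothetical name, ResourceUnknown is a local stand-in for UrlOperationsResourceUnknown, and the wrapped object is only assumed to offer stat(url), download(from_url, to_path), upload(from_path, to_url) and delete(url). The real UrlOperations methods accept further keyword arguments that are omitted here.

```python
import tempfile
from pathlib import Path


class ResourceUnknown(Exception):
    """Stand-in for UrlOperationsResourceUnknown (assumption of this sketch)."""


class IOViaUrlOperations:
    """Hypothetical adapter: IOBase-style calls on top of an object that
    provides stat/download/upload/delete."""

    def __init__(self, ops):
        self.ops = ops

    def exists(self, url):
        # stat raises for an unknown resource; translate that into a boolean
        try:
            self.ops.stat(url)
        except ResourceUnknown:
            return False
        return True

    def get(self, url, dst):
        self.ops.download(url, Path(dst))

    def read_file(self, url):
        # download into a temporary file, then return its content
        with tempfile.TemporaryDirectory() as tmp:
            dst = Path(tmp) / "content"
            self.ops.download(url, dst)
            return dst.read_bytes()

    def put(self, src, url):
        # no separate mkdir: upload is expected to create parents implicitly
        self.ops.upload(Path(src), url)

    def write_file(self, url, content):
        # write the content to a temporary file, then upload that file
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "content"
            src.write_bytes(content)
            self.ops.upload(src, url)

    def remove(self, url):
        self.ops.delete(url)
```

Routing read_file and write_file through temporary files keeps the adapter independent of any in-memory transfer mode, at the cost of extra disk I/O for small payloads.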

There is no equivalent for the "extract from archive" functionality. A more general implementation (via FSSPEC) was proposed (datalad/datalad-next#210), but has not yet materialized. @christian-monch mentioned an implementation matching get_from_archive for HTTP, but it also has not been completed yet.

UrlOperations seems to implement progress reporting consistently, whereas IOBase and friends do not.

SSH-specific observation

SshUrlOperations uses _SshCat, a different, simplistic helper to execute remote SSH commands and read their output. It uses ThreadedRunner and exposes a stdin argument.
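
To make the general pattern concrete (run a command, optionally feed it stdin, read its output), here is a minimal sketch built on the standard library's subprocess module. It does not reproduce _SshCat or ThreadedRunner; the function names ssh_cat_cmd and run_and_read are hypothetical, and the ssh invocation shown is only one plausible way to read a remote file.

```python
import shlex
import subprocess


def ssh_cat_cmd(host, path):
    """Build an ssh command line that prints a remote file (illustrative)."""
    # quote the path, because the remote side passes it through a shell
    return ["ssh", host, "cat " + shlex.quote(path)]


def run_and_read(cmd, stdin=None):
    """Run cmd, optionally feeding bytes to its stdin, and return stdout."""
    proc = subprocess.run(
        list(cmd),
        input=stdin,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return proc.stdout
```

A call such as run_and_read(ssh_cat_cmd("example.org", "/some/file")) would return the file content as bytes, assuming ssh is installed and the host is reachable without interactive authentication.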
