Devex 1368 file overwrite (#43)
Allowing file overwrite
Reducing race conditions by passing around inodes instead of file structures
Xattrs can be modified and removed
orodeh authored Feb 14, 2020
1 parent 004d66f commit fc0c782
Showing 37 changed files with 2,651 additions and 1,525 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,5 @@
main
__pycache__
ENV
cmd_results.txt
cmd_results.txt
diff.txt
2 changes: 2 additions & 0 deletions Makefile
@@ -0,0 +1,2 @@
/go/bin/dxfuse : $(wildcard *.go)
	go build -o /go/bin/dxfuse /go/src/github.com/dnanexus/dxfuse/cli/main.go
76 changes: 60 additions & 16 deletions README.md
@@ -36,16 +36,22 @@ is dropped, and a warning is emitted to the log.
dxfuse approximates a normal POSIX filesystem, but does not always have the same semantics. For example:
1. Metadata such as last access time is not supported
2. Directories have approximate create/modify times. This is because DNAx does not keep such attributes for directories.
3. Files are immutable, which means that they cannot be overwritten.
4. A newly written file is located locally. When it is closed, it becomes read-only, and is uploaded to the cloud.

There are several limitations currently:
- Primarily intended for Linux, but can be used on OSX
- Intended to operate on platform workers
- Limits directories to 10,000 elements
- Updates to the project emanating from other machines are not reflected locally
- Rename does not allow removing the target file or directory. This is because this cannot be
done automatically by dnanexus.
- Does not support hard links

Updates to files are batched and asynchronously applied to the cloud
object system. For example, if `foo.txt` is updated, the changes will
not be immediately visible to another user looking at the platform
object directly. Because platform files are immutable, even a minor
modification requires rewriting the entire file, creating a new
version. This is an inherent limitation that makes file updates
inefficient.

## Implementation

@@ -87,21 +93,32 @@ download methods were (1) `dx cat`, and (2) `cat` from a dxfuse mount point.
# Building

To build the code from source, you'll need, at the very least, the `go` and `git` tools.
Assuming the go directory is `/go`, then, clone the code with:
Install dependencies:
```
go get github.com/google/subcommands
go get golang.org/x/sync/semaphore
go install github.com/google/subcommands
go get github.com/dnanexus/dxda
go install github.com/dnanexus/dxda
go install github.com/dnanexus/dxda/cmd/dx-download-
```

Assuming the go directory is `/go`, clone the code with:
```
cd /go/src/github.com/dnanexus/dxfuse
git clone git@github.com:dnanexus/dxfuse.git
```

Build the code:
```
go build -o /go/bin/dxfuse /go/src/github.com/dnanexus/cmd/main.go
go build -o /go/bin/dxfuse /go/src/github.com/dnanexus/dxfuse/cli/main.go
```

# Usage

To mount a dnanexus project `mammals` on local directory `/home/jonas/foo` do:
```
sudo dxfuse -uid $(id -u) -gid $(id -g) /home/jonas/foo mammals
sudo -E dxfuse -uid $(id -u) -gid $(id -g) /home/jonas/foo mammals
```

The bootstrap process has some asynchrony, so it could take it a
@@ -111,12 +128,12 @@ the `verbose` flag. Debugging output is written to the log, which is
placed at `/var/log/dxfuse.log`. The maximal verbosity level is 2.

```
sudo dxfuse -verbose 1 MOUNT-POINT PROJECT-NAME
sudo -E dxfuse -verbose 1 MOUNT-POINT PROJECT-NAME
```

Project ids can be used instead of project names. To mount several projects, say, `mammals`, `fish`, and `birds`, do:
```
sudo dxfuse /home/jonas/foo mammals fish birds
sudo -E dxfuse /home/jonas/foo mammals fish birds
```

This will create the directory hierarchy:
@@ -135,15 +152,22 @@ To stop the dxfuse process do:
sudo umount MOUNT-POINT
```

There are situations where you want the background process to
synchronously update all modified and newly created files, for example
before shutting down a machine or unmounting the filesystem. This can be done by issuing the command:
```
$ sudo dxfuse -sync
```

## Extended attributes (xattrs)

DNAx data objects have properties and tags; these are exposed as POSIX extended attributes. The package we use for testing is `xattr`, which is native on MacOS (OSX), and can be installed with `sudo apt-get install xattr` on Linux. Xattrs can be written and removed. The examples here use `xattr`, although other tools will work just as well.
DNAx data objects have properties and tags; these are exposed as POSIX extended attributes. Xattrs can be read, written, and removed. The package we use here is `attr`; it can be installed with `sudo apt-get install attr` on Linux. On OSX the `xattr` package comes with the base operating system, and can be used to the same effect.

DNAx tags and properties are prefixed. For example, if `zebra.txt` is a file then `xattr -l zebra.txt` will print out all the tags, properties, and attributes that have no POSIX equivalent. These are split into three corresponding prefixes _tag_, _prop_, and _base_, all under the `user` Linux namespace.
DNAx tags and properties are prefixed. For example, if `zebra.txt` is a file then `attr -l zebra.txt` will print out all the tags, properties, and attributes that have no POSIX equivalent. These are split into three corresponding prefixes _tag_, _prop_, and _base_, all under the `user` Linux namespace.

Here `zebra.txt` has no properties or tags.
```
$ xattr -l zebra.txt
$ attr -l zebra.txt
base.state: closed
base.archivalState: live
@@ -152,24 +176,24 @@ base.id: file-xxxx

Add a property named `family` with value `mammal`
```
$ xattr -w prop.family mammal zebra.txt
$ attr -s prop.family -V mammal zebra.txt
```

Add a tag `africa`
```
$ xattr -w tag.africa XXX zebra.txt
$ attr -s tag.africa -V XXX zebra.txt
```

Remove the `family` property:
```
$ xattr -d prop.family zebra.txt
$ attr -r prop.family zebra.txt
```

You cannot modify any _base.*_ attribute; these are read-only. Currently, setting and deleting xattrs can be done only for files that are closed on the platform.
You cannot modify _base.*_ attributes; these are read-only. Currently, setting and deleting xattrs can be done only for files that are closed on the platform.
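
The prefix rules above can be sketched in Go. This is a minimal illustration, not the dxfuse implementation: the helper names `splitXattr` and `checkWritable` and both error values are hypothetical, and the sketch assumes the `user.` namespace has already been stripped from the attribute name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// The three prefixes come from the README; all other names are hypothetical.
const (
	prefixTag  = "tag."
	prefixProp = "prop."
	prefixBase = "base."
)

var (
	errReadOnly = errors.New("base.* attributes are read-only")
	errUnknown  = errors.New("unrecognized xattr prefix")
)

// splitXattr splits a name such as "prop.family" into its kind ("prop")
// and its key ("family"). Names without a known prefix are rejected.
func splitXattr(name string) (kind string, key string, err error) {
	for _, prefix := range []string{prefixTag, prefixProp, prefixBase} {
		if strings.HasPrefix(name, prefix) {
			return strings.TrimSuffix(prefix, "."), strings.TrimPrefix(name, prefix), nil
		}
	}
	return "", "", errUnknown
}

// checkWritable returns nil if the attribute may be set or removed,
// and an error for base.* attributes or unknown prefixes.
func checkWritable(name string) error {
	kind, _, err := splitXattr(name)
	if err != nil {
		return err
	}
	if kind == "base" {
		return errReadOnly
	}
	return nil
}

func main() {
	for _, name := range []string{"prop.family", "tag.africa", "base.state"} {
		fmt.Printf("%-12s writable: %v\n", name, checkWritable(name) == nil)
	}
}
```

The sketch only classifies names. It does not model the additional restriction that the file must be closed on the platform before its xattrs can be changed.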

## Mac OS (OSX)

For OSX you will need to install [OSXFUSE](http://osxfuse.github.com/). Note that Your Mileage May Vary (YMMV) on this platform; we are focused on Linux currently.
For OSX you will need to install [OSXFUSE](http://osxfuse.github.com/). Note that Your Mileage May Vary (YMMV) on this platform; we are mostly focused on Linux.

# Common problems

@@ -178,3 +202,23 @@ If a project appears empty, or is missing files, it could be that the dnanexus t
If you do not set the `uid` and `gid` options then creating hard links will fail on Linux. This is because it will fail the kernel's permissions check.

There is no natural match for DNAnexus applets and workflows, so they are presented as block devices. They do not behave like block devices, but the shell colors them differently from files and directories.

Mmap doesn't work all that well with FUSE ([stack overflow issue](https://stackoverflow.com/questions/46839807/mmap-no-such-device)). For example, trying to memory-map (mmap) a file with python causes an error.

```
>>> import mmap
>>> fd = open('/home/orodeh/MNT/dxfuse_test_data/README.md', 'r')
>>> mmap.mmap(fd.fileno(), 0, mmap.PROT_READ)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 19] No such device
```

A workaround is to make the mapping private:

```
>>> import mmap
>>> fd = open('/home/orodeh/MNT/dxfuse_test_data/README.md', 'r')
>>> mmap.mmap(fd.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE, offset=0)
>>> fd.readline()
```
9 changes: 6 additions & 3 deletions RELEASE_NOTES.md
@@ -1,10 +1,13 @@
# Release Notes

## v0.19
Improvements to extended attributes (xattrs). The testing tool we use is `xattr`, which is native on MacOS (OSX), and can be installed with `sudo apt-get install xattr` on Linux.
- *Experimental support for overwriting files*. There is a limit of 16 MiB on a file that undergoes modification. This is because it needs to first be downloaded in its entirety, before allowing any changes. It will then be uploaded to the platform. This is an expensive operation that is required because DNAnexus files are immutable.

- Xattrs can be written and removed, with the current limitation that this works only for closed files.
- Tags and properties are namespaced. For example, if `zebra.txt` is a normal text file with no DNAx tags or properties then `xattr -l` will print out all the tags, properties, and extra attributes that have no POSIX equivalent. This is split into three namespaces: _base_, _prop_, and _tag_.
- Removed support for hard links. The combination of hard-links, cloning on DNAx, and writable files does not work at the moment.

- Improvements to extended attributes (xattrs). The testing tool we use is `xattr`, which is native on MacOS (OSX), and can be installed with `sudo apt-get install xattr` on Linux. Xattrs can be written and removed.

Tags and properties are namespaced. For example, if `zebra.txt` is a normal text file with no DNAx tags or properties then `xattr -l` will print out all the tags, properties, and extra attributes that have no POSIX equivalent. This is split into three namespaces: _base_, _prop_, and _tag_.

```
$ xattr -l zebra.txt
10 changes: 10 additions & 0 deletions cli/main.go
@@ -42,6 +42,7 @@ func usage() {

var (
	debugFuseFlag = flag.Bool("debugFuse", false, "Tap into FUSE debugging information")
	fsSync = flag.Bool("sync", false, "Synchronize the filesystem and exit")
	gid = flag.Int("gid", -1, "User group id (gid)")
	help = flag.Bool("help", false, "display program options")
	readOnly = flag.Bool("readOnly", false, "mount the filesystem in read-only mode")
@@ -55,6 +56,10 @@ func lookupProject(dxEnv *dxda.DXEnvironment, projectIdOrName string) (string, e
		// This is a project ID
		return projectIdOrName, nil
	}
	if strings.HasPrefix(projectIdOrName, "container-") {
		// This is a container ID
		return projectIdOrName, nil
	}

	// This is a project name, describe it, and
	// return the project-id.
@@ -179,6 +184,11 @@ func parseCmdLineArgs() Config {
		fmt.Println(dxfuse.Version)
		os.Exit(0)
	}
	if *fsSync {
		cmdClient := dxfuse.NewCmdClient()
		cmdClient.Sync()
		os.Exit(0)
	}
	if *help {
		usage()
		os.Exit(0)
33 changes: 33 additions & 0 deletions cmd_client.go
@@ -0,0 +1,33 @@
package dxfuse

import (
	"fmt"
	"net/rpc"
	"os"
)

// CmdClient sends commands to a running dxfuse server.
type CmdClient struct {
}

// NewCmdClient creates a client for sending commands.
func NewCmdClient() *CmdClient {
	return &CmdClient{}
}

// Sync asks the dxfuse server to synchronize the filesystem and
// blocks until the call returns. It exits the process on failure.
func (client *CmdClient) Sync() {
	rpcClient, err := rpc.Dial("tcp", fmt.Sprintf(":%d", CmdPort))
	if err != nil {
		fmt.Printf("could not connect to the dxfuse server: %s\n", err.Error())
		os.Exit(1)
	}
	defer rpcClient.Close()

	// Synchronous call
	var reply bool
	err = rpcClient.Call("CmdServerBox.GetLine", "sync", &reply)
	if err != nil {
		fmt.Printf("sync error: %s\n", err.Error())
		os.Exit(1)
	}
}
83 changes: 83 additions & 0 deletions cmd_server.go
@@ -0,0 +1,83 @@
/* Accept commands from the dxfuse_tools program. The only command
 * right now is sync, but this is the place to implement additional
 * ones to come in the future.
 */
package dxfuse

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

const (
	// A port number for accepting commands
	CmdPort = 7205
)

type CmdServer struct {
	options Options
	sybx    *SyncDbDx
	inbound *net.TCPListener
}

// A separate structure used for exporting through RPC
type CmdServerBox struct {
	cmdSrv *CmdServer
}

func NewCmdServer(options Options, sybx *SyncDbDx) *CmdServer {
	cmdServer := &CmdServer{
		options: options,
		sybx:    sybx,
		inbound: nil,
	}
	return cmdServer
}

// write a log message, and add a header
func (cmdSrv *CmdServer) log(a string, args ...interface{}) {
	LogMsg("CmdServer", a, args...)
}

// Init starts listening on CmdPort and serves RPC requests in the background.
func (cmdSrv *CmdServer) Init() {
	addy, err := net.ResolveTCPAddr("tcp", fmt.Sprintf(":%d", CmdPort))
	if err != nil {
		log.Fatal(err)
	}

	inbound, err := net.ListenTCP("tcp", addy)
	if err != nil {
		log.Fatal(err)
	}
	cmdSrv.inbound = inbound

	cmdSrvBox := &CmdServerBox{
		cmdSrv: cmdSrv,
	}
	rpc.Register(cmdSrvBox)
	go rpc.Accept(inbound)

	cmdSrv.log("started command server, accepting external commands")
}

// Close stops accepting new connections.
func (cmdSrv *CmdServer) Close() {
	cmdSrv.inbound.Close()
}

// Note: all export functions from this module have to have this format.
// Nothing else will work with the RPC package.
func (box *CmdServerBox) GetLine(arg string, reply *bool) error {
	cmdSrv := box.cmdSrv
	cmdSrv.log("Received line %s", arg)
	switch arg {
	case "sync":
		cmdSrv.sybx.CmdSync()
	default:
		cmdSrv.log("Unknown command")
	}

	*reply = true
	return nil
}
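
The client/server exchange in `cmd_client.go` and `cmd_server.go` can be tried in isolation with the sketch below. It is not the dxfuse code: `CommandBox` and `roundTrip` are hypothetical names, the server listens on an ephemeral loopback port rather than `CmdPort`, and, unlike `GetLine` above, an unknown command is returned to the caller as an error instead of only being logged.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// CommandBox is a hypothetical stand-in for CmdServerBox.
type CommandBox struct{}

// GetLine has the shape net/rpc requires: an exported method with two
// arguments, the second a pointer, and an error return value.
func (b *CommandBox) GetLine(arg string, reply *bool) error {
	if arg != "sync" {
		return fmt.Errorf("unknown command %q", arg)
	}
	// A real server would flush pending updates here.
	*reply = true
	return nil
}

// roundTrip starts a server on a free loopback port, sends one command
// with a synchronous call, and returns the reply.
func roundTrip(cmd string) (bool, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&CommandBox{}); err != nil {
		return false, err
	}

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()
	go srv.Accept(ln)

	client, err := rpc.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer client.Close()

	var reply bool
	err = client.Call("CommandBox.GetLine", cmd, &reply)
	return reply, err
}

func main() {
	ok, err := roundTrip("sync")
	fmt.Println("sync acknowledged:", ok, "error:", err)
}
```

Binding to `127.0.0.1` keeps the command port reachable only from the local machine, whereas an address of the form `:7205` listens on all interfaces. Using a private `rpc.NewServer()` also avoids registering on the package-level default server.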