Merge branch 'subtag' (closes #27,#28)
mwatts15 committed Jan 13, 2015
2 parents 08a468f + cd0afa8 commit 87dba80
Showing 41 changed files with 3,140 additions and 979 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -16,3 +16,5 @@ tests/test_sqlite3
tests/test_stage
tests/test_tagdb
tests/test_trie
*.log
*.db
7 changes: 2 additions & 5 deletions Makefile
@@ -5,7 +5,7 @@
#

# define the C compiler to use
CC ?= gcc
CC=gcc

MARCO = ./marco.pl

@@ -34,10 +34,8 @@ file.c \
log.c \
file_log.c \
trie.c \
scanner.c \
key.c \
set_ops.c \
stream.c \
tag.c \
tagdb.c \
types.c \
@@ -49,11 +47,10 @@ path_util.c \
tagdb_fs.c \
fs_util.c \
sql.c \
file_cabinet.c
#query.c \
#search_fs.c \
SRCS+= file_cabinet.c
LIBS+= -lsqlite3
CFLAGS+= -DSQLITE_DEFAULT_MMAP_SIZE=268435456
#
# This uses Suffix Replacement within a macro:
104 changes: 101 additions & 3 deletions NOTES
@@ -1,5 +1,4 @@
(Note to developers: a lot of these notes may be out of date. I don't intend to go through
and mark which ones, but you can test things out to see or email me.)
(Note to developers: a lot of these notes may be out of date.)

For a filesystem with writing you must have truncate as well as write.
I'm not sure if flush is necessary (since it doesn't do anything) but I have
@@ -17,9 +16,17 @@ so that they are hidden while you can still cd to those directories without
the dot. It shouldn't create any conflicts since tags with filetype names
should refer to files with that type. That all is ultimately up to the user
though.
%%%
If there are such tags, and they shouldn't be visible in listings, then it is
the responsibility of the user to name the tags appropriately.
-- Mon Dec 1 04:05:57 CST 2014

Tag Types
----------
(* This is entirely invalid now. All files have the same "type" which is just a character
string. The types may be re-expanded later to include integers and floats as well, but the
list type probably won't return, and the dictionary type definitely won't.
-- Mon Dec 1 03:57:47 CST 2014 *)
The tag types are given in the name.types file corresponding to the name.db file for a tagdb.
A tag's type is created when the tag is created and saved when the database is saved. No code
other than that dealing with the actual storage and retrieval of the database files
@@ -61,6 +68,7 @@ how to store the data we get in there.

Sending commands/queries to the tag db
--------------------------------------
(* This is entirely invalid now. -- Mon Dec 1 03:57:47 CST 2014 *)
To communicate directly with the tag database, we have a special file called #LISTEN# that
clients can write to from any location in the filesystem. The #LISTEN# file is not listed
in any readdir calls, but the name is hard coded into the filesystem but may be moved to a
@@ -116,9 +124,23 @@ entry for the tag union every time a file is inserted into a file drawer, but
don't check to see if that file is actually new. It wouldn't be too hard to do
that, but to do a check wouldn't solve the whole problem. Instead, what I could
do is have
%%%
See the README for how this is handled now. Currently, you can't rename a file
onto another file. The rename operation succeeds, but only the first file will be
listed under the name and both the just-renamed file and the other file will be
listed as <id>#<file-name>.
-- Mon Dec 1 03:56:47 CST 2014

Transactions and durability
---------------------------
(* This has been partially addressed by using a sqlite3 database for storage.
My initial thinking was that the transactions should be scoped to file system
operations, but I was disappointed by the lack of nested transactions in
sqlite3. I'll either write a wrapper that checks if a transaction is in
progress (not sure that's possible) or just accept database-scoped transactions
instead. Incidentally, SQLite supports the kind of online backups I talk about
here: https://www.sqlite.org/capi3ref.html#sqlite3backupinit
-- Mon Dec 1 03:34:19 CST 2014 *)
One of the big problems in this alpha stage is that during debugging there
often come times when the simplest thing to do is kill TagFS. However, without
following the normal unmount procedure, the database file doesn't get written.
@@ -132,6 +154,10 @@ the database before transaction commit.

File storage and access in the run-time
---------------------------------------
(* This idea was, essentially, rolled into the sqlite3 database in the file_tag
table. The performance questions I was considering have been deferred until the
sqlite3 database shows significant slowdown for the relevant file creation and
deletion events. -- Mon Dec 1 03:15:50 CST 2014 *)
During normal operation, files are stored in heap memory and accessed through
either a file-id indexed table or a tag-set indexed table (called FileCabinet).
In order to link an in-memory file back to its disk record, we keep a reference
@@ -162,8 +188,80 @@ structure as byte offsets from the base address of the mapped file.

Finally, tags have a tagdb tagdb_value_t associated with them and regular
files have zero or more values associated with them (one for each tag on the
file). A tagdb_value_t can be of variable size and like the file-tag relation,
file). A tagdb_value_t can be of variable size and, like the file-tag relation,
entries are only added while the file system is mounted.

Tag Hierarchy
-------------
The tag hierarchy is intended to be a way of organizing tags where the rest of
the system is strictly tag-based. Super-tags serve the function of namespaces
allowing a user to distinguish tags that should have the same name, but which
have different semantics (e.g., "animal::bat", "baseball::bat", "wom::bat").

Deleting a tag will promote all of its children to the level of the tag being
deleted. For example, if a tag named "a" had a child "b", and "b" had a child
"c", then deleting "b" would make "c" a direct child of "a". Using shell file
utilities, this would look like:

$ ls
a a::b a::b::c
$ rmdir a::b
$ ls
a a::c
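A minimal C sketch of the promotion rule, assuming a hypothetical in-memory table in which each tag stores only its short name and its parent's index (this is not TagDB's actual representation). Deleting "b" re-parents "c" onto "a", so the fully qualified name changes from "a::b::c" to "a::c". Name conflicts among promoted children are deliberately ignored here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model: each tag keeps its short name and its parent's index
 * (-1 for a root tag). Not TagDB's real data structure. */
#define MAX_TAGS 16

struct tag {
    char name[32];
    int parent;   /* index into tags[], or -1 for a root tag */
    int deleted;  /* nonzero once the tag has been removed */
};

static struct tag tags[MAX_TAGS];
static int n_tags = 0;

/* Append a tag and return its index, or -1 if the table is full. */
static int add_tag (const char *name, int parent)
{
    if (n_tags >= MAX_TAGS)
        return -1;
    snprintf(tags[n_tags].name, sizeof tags[n_tags].name, "%s", name);
    tags[n_tags].parent = parent;
    tags[n_tags].deleted = 0;
    return n_tags++;
}

/* Delete tag t and promote each of its children to t's own level. */
static void delete_tag (int t)
{
    for (int i = 0; i < n_tags; i++) {
        if (!tags[i].deleted && tags[i].parent == t)
            tags[i].parent = tags[t].parent;
    }
    tags[t].deleted = 1;
}

/* Write the fully qualified name (e.g. "a::b::c") of tag t into buf. */
static void full_name (int t, char *buf, size_t len)
{
    if (tags[t].parent < 0) {
        snprintf(buf, len, "%s", tags[t].name);
        return;
    }
    char prefix[256];
    full_name(tags[t].parent, prefix, sizeof prefix);
    snprintf(buf, len, "%s::%s", prefix, tags[t].name);
}
```

Re-parenting in place keeps the operation linear in the number of tags; a real implementation would also have to check the promoted names against the new siblings.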

(* The things below here haven't been coded yet.
-- Mon Dec 1 09:14:50 CST 2014 *)
(* I opted to simply fail the operation rather than do the renaming. It's
easier and doesn't have the potential problems with additional conflicts and
unreasonably long names.
-- Sun Jan 11 16:59:02 CST 2015 *)
If "a" also had a child "c" in the situation described above, then there would
be a conflict between that tag and the "c" child of "b". We use two strategies
to deal with this. First, so that all conflicted files can be recovered using
the information from the 'remove' operation, we prefix the deleted tag's name
onto the child's name, separated by an underscore. Although this doesn't
completely respect the user's attempt to destroy the semantics of the super-tag,
it has the advantage that we can list all of the collided files of a tag named
"tag" by doing "ls tag_*". Using shell utilities:

$ ls
a a::b a::c a::b::c
$ rmdir a::b
$ ls
a a::c a::b_c

Of course, this first correction may be insufficient if "a" also has a child
with the name "b_c". To resolve this, we append an underscore character to the
end of the new name:

$ ls
a a::b a::b_c a::c a::b::c
$ rmdir a::b
$ ls
a a::c a::b_c a::b_c_

Additional conflicts are addressed by appending more underscores until there is
no longer a conflict or the max file name size is reached. If the max file name
size is reached, then the tag-remove operation will fail completely, the tag
will still be there, and all of its children will have their original names.
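The renaming scheme described above (which, per the note, was not adopted; the operation simply fails instead) could be sketched in C as follows. `resolve_conflict`, `MAX_TAG_NAME` and the flat sibling array are all hypothetical, and the small limit is chosen only to make the failure case easy to see.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name-length limit; a real file system would use NAME_MAX. */
#define MAX_TAG_NAME 16

/* Return nonzero if name appears among the n names in siblings. */
static int name_taken (const char *name, const char *const *siblings, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, siblings[i]) == 0)
            return 1;
    return 0;
}

/* Pick a name for child after its parent `deleted` is removed.
 * No conflict: keep the child's name. Otherwise try "<deleted>_<child>",
 * then keep appending '_' until the name is free.
 * Returns 0 and fills out on success; returns -1 if the name would exceed
 * MAX_TAG_NAME, meaning the whole tag-remove operation should fail. */
static int resolve_conflict (const char *deleted, const char *child,
                             const char *const *siblings, size_t n,
                             char out[MAX_TAG_NAME + 1])
{
    if (!name_taken(child, siblings, n)) {
        if (strlen(child) > MAX_TAG_NAME)
            return -1;
        strcpy(out, child);
        return 0;
    }
    int w = snprintf(out, MAX_TAG_NAME + 1, "%s_%s", deleted, child);
    if (w < 0 || w > MAX_TAG_NAME)
        return -1;
    size_t len = (size_t) w;
    while (name_taken(out, siblings, n)) {
        if (len + 1 > MAX_TAG_NAME)
            return -1;
        out[len++] = '_';
        out[len] = '\0';
    }
    return 0;
}
```

With siblings {"c"} the promoted "c" becomes "b_c"; with siblings {"c", "b_c"} it becomes "b_c_", matching the shell examples.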

Removing a sub-tag relationship does not do any promotion. For the "a::b::c"
example above, removing the "b::c" relationship would not then establish a new
"a::c" relationship. In the TagFS, you would remove a sub-tag relationship by
simply renaming the tag like this:

$ ls
a a::b a::b::c a::b::c::d
$ mv a::b::c c
$ ls
a a::b c c::d

As demonstrated in the example above, a 'remove' is necessarily combined with an
'insert', so the removal of the "b::c" relationship is paired with the promotion
of "c" to a root position.

It is appropriate to think of there being a single root tag that sits at the top
of the hierarchy, although I have discussed things as though there were multiple
'root' tags. TagDB is currently structured so that there are several 'root' tags
but that might change, and in any case it won't affect the description above.
-- Mon Dec 1 09:29:43 CST 2014
28 changes: 23 additions & 5 deletions README.md
@@ -13,12 +13,15 @@ in the main project directory. Then run:

to build and run the tests to make sure everything works correctly on your system.

Prerequisites
- Fuse (libs and headers) (>= 2.8.7)
- GLib (Works with glib versions 2.24.1, 2.30.2, 2.32.3)
- CUnit for tests (>= 2.1-3)
Prerequisites:
- GCC (4.8.2) or Clang (3.3)
- Fuse (libs and headers) (2.8.7)
- GLib (Works with glib versions 2.24.1, 2.30.2, 2.32.3, 2.40.2)
- CUnit for tests (2.1-3)
- Valgrind for testing and development

*Other versions may or may not work.*

On Debian/Ubuntu:

$ apt-get install libglib2.0-dev libfuse-dev libcunit1-dev valgrind
@@ -36,9 +39,24 @@ Where `<mount directory>` is an empty directory. TagFS will create the files it

All un-tagged files are shown in the top level. A file can be referenced at any point in the file system by its id, so a file `movies/Seven_Samurai` with id `12334` can be referenced as `12334#` or `akira_kurosawa/12334#Seven_Samurai` or `some/random/directory/12334#what-even-is-this-file-s-name`.
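The `<id>#<name>` convention can be recognized one path component at a time. This is a sketch, not TagFS's code: `parse_id_ref` is hypothetical, and `unsigned long long` stands in for `file_id_t`. Everything after `#` is ignored for lookup, which is why `12334#what-even-is-this-file-s-name` still resolves.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Recognize a path component of the form "<id>#<name>" (name may be empty).
 * Returns 1 and sets *id and *name on a match, 0 for an ordinary name.
 * The id type is an assumption; TagFS uses file_id_t. */
static int parse_id_ref (const char *component, unsigned long long *id,
                         const char **name)
{
    if (!isdigit((unsigned char) component[0]))
        return 0;              /* ids must start with a digit */
    char *end;
    errno = 0;
    unsigned long long v = strtoull(component, &end, 10);
    if (errno == ERANGE || *end != '#')
        return 0;              /* overflow, or digits not followed by '#' */
    *id = v;
    *name = end + 1;           /* text after '#'; not used to find the file */
    return 1;
}
```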

Tags are created by making a directory under the mounted TagFS (`mkdir tagname`). In general, directories appear in any location where you could open the directory and find more files. Tags can be renamed and made to appear at additional locations by moving the directory (`mv tag a/b/c/tag`).

Tags can be put in a hierarchy for purposes of organization. If you make a tag `a::b`, TagFS will first create the `a` tag and then the `a::b` tag as its child. An `a::b` directory can then be found under `a` and any files that belong to the `a::b` tag will also appear under `a`. An arbitrarily deep (but resource-limited!) nesting of tags can be made. Renaming tags can detach them from their parent tags, but child tags are only detached from their parents if the child-tags themselves are renamed or the parent tag is deleted.
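Creating `a::b` implies creating `a` first. One way to get that order is to collect every prefix that ends just before a `::` separator, root first, as in this hypothetical helper (it does not validate empty segments such as `a::::b`):

```c
#include <stddef.h>
#include <string.h>

/* For a hierarchical tag name such as "a::b::c", store the length of each
 * prefix ("a", "a::b", "a::b::c") in lens, root first. Creating tags in
 * this order guarantees each parent exists before its child.
 * Returns the number of levels, or -1 if there are more than max. */
static int tag_prefixes (const char *name, size_t lens[], int max)
{
    int n = 0;
    const char *p = name;
    const char *sep;
    while ((sep = strstr(p, "::")) != NULL) {
        if (n == max)
            return -1;
        lens[n++] = (size_t) (sep - name);
        p = sep + 2;           /* skip past the "::" */
    }
    if (n == max)
        return -1;
    lens[n++] = strlen(name);  /* the full name is the last level */
    return n;
}
```

Each level can then be printed or created with `printf("%.*s\n", (int) lens[i], name)`, giving `a`, `a::b`, `a::b::c`.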

When you "copy" a file to the mounted tagfs, the file is tagged with the directory name it falls under and thus appears where it would in a normal hierarchical file system. The actual file content is stored in your tagfs data directory which, by default, is in your tagfs user-data directory. You can set the location of your data directory with the `--data-dir` option to tagfs.

Moving a file already within the tagfs to another directory in the tagfs will add the tags that comprise the path except in the special case for removing tags. To remove tags from a file, move the file to a parent directory of the one you are moving from (`mv a/b/tag-to-remove/1#file a/b`)[1]. Note that this depends ONLY on the starting and ending locations rather than on the tags associated with the file. However, the tags associated with the file are generally the only ones that make up a path to the file. Moving a tag to a new location (`mv the-tag new-location`) will cause the tag to show up there, but it will also remain in the original location; you could do the same thing by calling `mkdir new-location/the-tag` assuming `the-tag` already exists.
Moving a file already within the tagfs to another directory in the tagfs will add the tags that comprise the path except in the special case for removing tags. To remove tags from a file, move the file to a directory above the one you are moving from (`mv a/b/tag-to-remove/1#file a/b`)[1]. Note that the set of tags removed depends ONLY on the starting and ending locations rather than on all of the tags associated with the file:

$ ls a/b/c/d
file
$ mv a/b/file a/
$ ls a/b/c/d
$ ls a/c/d
file

If the destination location isn't an ancestor of the starting location, no tags are removed; instead, any tags from the destination path that aren't already attached to the file are added.
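Because the removal rule depends only on the two paths, it can be computed without looking at the file's tags. In this C sketch the helper `removed_tags` is hypothetical and assumes mount-relative directory paths with no trailing `/` (the root is `""`). It returns the part of the source path below the destination; each `/`-separated component of that part names a tag to remove.

```c
#include <stddef.h>
#include <string.h>

/* If dst is a proper ancestor of src, return the part of src below dst,
 * e.g. src "a/b/tag-to-remove", dst "a/b" -> "tag-to-remove".
 * Returns NULL when dst is not an ancestor, i.e. no tags are removed. */
static const char *removed_tags (const char *src, const char *dst)
{
    size_t n = strlen(dst);
    if (n == 0)
        return src[0] ? src : NULL;  /* the root is above every directory */
    if (strncmp(src, dst, n) != 0 || src[n] != '/')
        return NULL;                 /* "a/bc" is not below "a/b" */
    return src + n + 1;
}
```

For `mv a/b/tag-to-remove/1#file a/b`, the source directory is `a/b/tag-to-remove` and the result is `tag-to-remove`; for `mv a/b/file a/` it is `b`.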

Moving a *directory* to a new location (`mv the-tag new-location`) will cause the directory to show up there, but it will also remain in the original location; you could do the same thing by calling `mkdir new-location/the-tag` assuming `the-tag` already exists.

When listing files, there are situations where two files with the same name would be listed together. In this case, one of the files is listed normally, but all of the files (including that first one) are also listed with their prefixed name (e.g., `1#filename`). This allows for accessing the file under the usual name as well as accessing all of the files regardless of where they are accessed from.
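The listing rule for duplicate names could be sketched as below. `struct entry`, `list_dir` and the fixed 64-byte entry width are hypothetical, and which file keeps the plain name (here, the first one in input order) is an arbitrary choice; the text above only says that one of them does.

```c
#include <stdio.h>
#include <string.h>

#define LIST_WIDTH 64  /* hypothetical max length of one listed name */

struct entry {
    unsigned long long id;  /* stand-in for file_id_t */
    const char *name;
};

/* Emit each distinct name once. When several entries share a name, also
 * emit "<id>#<name>" for every one of them, including the file that owns
 * the plain name. Writes at most max names into out; returns the count. */
static int list_dir (const struct entry *e, int n,
                     char (*out)[LIST_WIDTH], int max)
{
    int k = 0;
    for (int i = 0; i < n; i++) {
        int dups = 0, first = 1;
        for (int j = 0; j < n; j++) {
            if (strcmp(e[i].name, e[j].name) == 0) {
                if (j < i) first = 0;  /* an earlier entry owns the name */
                if (j != i) dups++;
            }
        }
        if (first && k < max)
            snprintf(out[k++], LIST_WIDTH, "%s", e[i].name);
        if (dups > 0 && k < max)
            snprintf(out[k++], LIST_WIDTH, "%llu#%s", e[i].id, e[i].name);
    }
    return k;
}
```

The nested loop is quadratic, which is acceptable for a sketch; a hash table keyed on the name would be the obvious change for large directories.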

2 changes: 2 additions & 0 deletions TESTING
@@ -15,3 +15,5 @@ The options can be used in conjunction with one another.
TESTS="test_trie test_log" NO_VALGRIND=1 make tests

Tests are set up to run on Travis-CI under my free account. Currently the acceptance tests don't run because the fuse file system never mounts. I have no idea why.

One small note: If you want to write a test that deals with startup after doing something specific before shutdown of tagfs, then it's currently easier to make a test in test_tagdb.lc rather than in acceptance_test.pl, since doing the shutdown and checking data structures is easier in test_tagdb.lc. In any case, all of the file system state that is stored between runs is in the SQLite database and in the "copies" directory.
5 changes: 5 additions & 0 deletions abstract_file.c
@@ -119,3 +119,8 @@ file_id_t get_file_id (AbstractFile *f)
{
return f->id;
}

void set_file_id (AbstractFile *f, file_id_t id)
{
f->id = id;
}
2 changes: 1 addition & 1 deletion abstract_file.h
@@ -24,7 +24,7 @@ const char *abstract_file_get_name (AbstractFile *f);
void _set_name (AbstractFile *f, const char *new_name);
int file_id_cmp (AbstractFile *f1, AbstractFile *f2);
int file_name_cmp (AbstractFile *f1, AbstractFile *f2);
int file_str_cmp (AbstractFile *f, char *name);
int file_name_str_cmp (AbstractFile *f, char *name);
file_id_t get_file_id (AbstractFile *f);
void set_file_id (AbstractFile *f, file_id_t);
