Merge branch 'subtag' (closes #27,#28)

mwatts15 · Jan 13, 2015 · commit 87dba80
2 parents 08a468f + cd0afa8

Showing 41 changed files with 3,140 additions and 979 deletions.
diff --git a/.gitignore b/.gitignore
@@ -16,3 +16,5 @@ tests/test_sqlite3
 tests/test_stage
 tests/test_tagdb
 tests/test_trie
+*.log
+*.db
diff --git a/Makefile b/Makefile
@@ -5,7 +5,7 @@
 #
 
 # define the C compiler to use
-CC ?= gcc
+CC=gcc
 
 MARCO = ./marco.pl
 
@@ -34,10 +34,8 @@ file.c \
 log.c \
 file_log.c \
 trie.c \
-scanner.c \
 key.c \
 set_ops.c \
-stream.c \
 tag.c \
 tagdb.c \
 types.c \
@@ -49,11 +47,10 @@ path_util.c \
 tagdb_fs.c \
 fs_util.c \
 sql.c \
+file_cabinet.c
 #query.c \
 #search_fs.c \
 
-SRCS+= file_cabinet.c
-LIBS+= -lsqlite3
 CFLAGS+= -DSQLITE_DEFAULT_MMAP_SIZE=268435456
 #
 # This uses Suffix Replacement within a macro:

diff --git a/NOTES b/NOTES
@@ -1,5 +1,4 @@
-(Note to developers: a lot of these notes may be out of date. I don't intend to go through
- and mark which ones, but you can test things out to see or email me.)
+(Note to developers: a lot of these notes may be out of date.)
 
 For a filesystem with writing you must have truncate as well as write.
 I'm not sure if flush is necessary (since it doesn't do anything) but I have
@@ -17,9 +16,17 @@ so that they are hidden while you can still cd to those directories without
 the dot. It shouldn't create any conflicts since tags with filetype names
 should refer to files with that type. That all is ultimately up to the user
 though.
+%%%
+If there are such tags, and they shouldn't be visible in listings, then it is
+the responsibility of the user to name the tags appropriately.
+-- Mon Dec  1 04:05:57 CST 2014
 
 Tag Types 
 ----------
+(* This is entirely invalid now. All files have the same "type" which is just a character
+string. The types may be re-expanded later to include integers and floats as well, but the
+list type probably won't return, and the dictionary type definitely won't.
+-- Mon Dec  1 03:57:47 CST 2014 *)
 The tag types are given in the name.types file corresponding to the name.db file for a tagdb.
 A tag's type is created when the tag is created and saved when the database is saved. No code
 other than that dealing with the actual storage and retrieval of the database files
@@ -61,6 +68,7 @@ how to store the data we get in there.
 
 Sending commands/queries to the tag db
 --------------------------------------
+(* This is entirely invalid now. -- Mon Dec  1 03:57:47 CST 2014 *)
 To communicate directly with the tag database, we have a special file called #LISTEN# that
 clients can write to from any location in the filesystem. The #LISTEN# file is not listed
 in any readdir calls, but the name is hard coded into the filesystem but may be moved to a
@@ -116,9 +124,23 @@ entry for the tag union every time a file is inserted into a file drawer, but
 don't check to see if that file is actually new. It wouldn't be too hard to do
 that, but to do a check wouldn't solve the whole problem. Instead, what I could
 do is have 
+%%%
+See the README for how this is handled now. Currently, you can't rename a file
+onto another file. The rename operation succeeds, but only the first file will be
+listed under the name and both the just-renamed file and the other file will be
+listed as <id>#<file-name>.
+-- Mon Dec  1 03:56:47 CST 2014
 
 Transactions and durability
 ---------------------------
+(* This has been partially addressed by using a sqlite3 database for storage.
+My initial thinking was that the transactions should be scoped to file system
+operations, but I was disappointed by the lack of nested transactions in 
+sqlite3. I'll either write a wrapper that checks if a transaction is in 
+progress (not sure that's possible) or just accept database-scoped transactions
+instead. Incidentally, SQLite supports the kind of online backups I talk about
+here: https://www.sqlite.org/capi3ref.html#sqlite3backupinit
+-- Mon Dec  1 03:34:19 CST 2014 *)
 One of the big problems in this alpha stage is that during debugging there 
 often come times when the simplest thing to do is kill TagFS. However, without
 following the normal unmount procedure, the database file doesn't get written.
@@ -132,6 +154,10 @@ the database before transaction commit.
 
 File storage and access in the run-time
 ---------------------------------------
+(* This idea was, essentially, rolled into the sqlite3 database in the file_tag
+table. The performance questions I was considering have been deferred until the
+sqlite3 database shows significant slowdown for the relevant file creation and
+deletion events. -- Mon Dec  1 03:15:50 CST 2014 *)
 During normal operation, files are stored in heap memory and accessed through 
 either a file-id indexed table or a tag-set indexed table (called FileCabinet).
 In order to link an in-memory file back to its disk record, we keep a reference
@@ -162,8 +188,80 @@ structure as byte offsets from the base address of the mapped file.
 
 Finally, tags have a tagdb tagdb_value_t associated with them and regular 
 files have zero or more values associated with them (one for each tag on the 
-file). A tagdb_value_t can be of variable size and like the file-tag relation,
+file). A tagdb_value_t can be of variable size and, like the file-tag relation,
 entries are only added while the file system is mounted. 
 
+Tag Hierarchy
+-------------
+The tag hierarchy is intended to be a way of organizing tags where the rest of
+the system is strictly tag-based. Super-tags serve the function of namespaces
+allowing a user to distinguish tags that should have the same name, but which
+have different semantics (e.g., "animal::bat", "baseball::bat", "wom::bat").
 
+Deleting a tag will promote all of its children to the level of the tag being
+deleted. For example, if a tag named "a" had a child "b", and the tag named "b"
+had a child named "c", then, if the "b"-tag was deleted, the "c" tag becomes a
+direct child of "a". Using shell file utilities, this would look like:
 
+    $ ls
+    a a::b a::b::c
+    $ rmdir a::b
+    $ ls
+    a a::c
+
+(* The things below here haven't been coded yet.
+-- Mon Dec  1 09:14:50 CST 2014 *)
+(* I opted to simply fail the operation rather than do the renaming. It's
+easier and doesn't have the potential problems with additional conflicts and
+unreasonably long names.
+-- Sun Jan 11 16:59:02 CST 2015 *)
+If "a" also had a child "c" in the situation described above, then there would
+be a conflict between the child of "b" and that. We use two strategies to deal
+with this. First, to allow all conflicted files to be recovered using info used
+in the 'remove' operation, we prefix the deleted tag's name on to the child's
+with an underscore character between. Although this doesn't completely respect
+the user's attempt to destroy the semantics of the super-tag, it has the
+advantage that we can list all of the collided files of a tag named "tag" by
+doing "ls tag_*". Using shell utilities:
+
+    $ ls 
+    a a::b a::c a::b::c
+    $ rmdir a::b
+    $ ls
+    a a::c a::b_c
+
+Of course, this first correction may be insufficient if "a" also has a child
+with the name "b_c". To resolve this, we append an underscore character to the
+end of the new name:
+
+    $ ls 
+    a a::b a::b_c a::c a::b::c
+    $ rmdir a::b
+    $ ls
+    a a::c a::b_c a::b_c_
+
+Additional conflicts are addressed by appending more underscores until there is
+no longer a conflict or the max file name size is reached. If the max file name
+size is reached, then the tag-remove operation will fail completely, the tag
+will still be there, and all of its children will have their original names.
+
+Removing a sub-tag relationship does not do any promotion. For the "a::b::c"
+example above, removing the "b::c" relationship would not then establish a new
+"a::c" relationship. In the TagFS, you would remove a sub-tag relationship by
+simply renaming the tag like this:
+
+    $ ls
+    a a::b a::b::c a::b::c::d
+    $ mv a::b::c c
+    $ ls
+    a a::b c c::d
+
+As demonstrated in the example above, a 'remove' is necessarily combined with an
+'insert', so the removal of the "b::c" relationship is paired with the promotion 
+of "c" to a root position.
+
+It is appropriate to think of there being a single root tag that sits at the top
+of the hierarchy, although I have discussed things as though there were multiple
+'root' tags. TagDB is currently structured so that there are several 'root' tags
+but that might change, and in any case it won't affect the description above.
+-- Mon Dec  1 09:29:43 CST 2014
diff --git a/README.md b/README.md
@@ -13,12 +13,15 @@ in the main project directory. Then run:
 
 to build and run the tests to make sure everything works correctly on your system.
 
-Prerequisites
- - Fuse (libs and headers) (>= 2.8.7)
- - GLib (Works with glib versions 2.24.1, 2.30.2, 2.32.3)
- - CUnit for tests (>= 2.1-3)
+Prerequisites:
+ - GCC (4.8.2) or Clang (3.3)
+ - Fuse (libs and headers) (2.8.7)
+ - GLib (Works with glib versions 2.24.1, 2.30.2, 2.32.3, 2.40.2)
+ - CUnit for tests (2.1-3)
  - Valgrind for testing and development
 
+*Other versions may or may not work.*
+
 On Debian/Ubuntu:
 
      $ apt-get install libglib2.0-dev libfuse-dev libcunit1-dev valgrind
@@ -36,9 +39,24 @@ Where `<mount directory>` is an empty directory. TagFS will create the files it
 
 All un-tagged files are shown in the top level. A file can be referenced at any point in the file system by it's id, so a file `movies/Seven_Samurai` with id `12334` can be referenced as `12334#` or `akira_kurosawa/12334#Seven_Samurai` or `some/random/directory/12334#what-even-is-this-file-s-name`.
 
+Tags are created by making a directory under the mounted TagFS (`mkdir tagname`). In general, directories appear in any location where you could open the directory and find more files. Tags can be renamed and made to appear at additional locations by moving the directory (`mv tag a/b/c/tag`). 
+
+Tags can be put in a hierarchy for purposes of organization. If you make a tag `a::b`, TagFS will first create the `a` tag and then the `a::b` tag as its child. An `a::b` directory can then be found under `a` and any files that belong to the `a::b` tag will also appear under `a`. An arbitrarily deep (but resource-limited!) nesting of tags can be made. Renaming tags can detach them from their parent tags, but child tags are only detached from their parents if the child-tags themselves are renamed or the parent tag is deleted.
+
 When you "copy" a file to the mounted tagfs, the file is tagged with the directory name it falls under and thus appears where it would in a normal  hierarchical file system. The actual file content is stored in your tagfs data directory which, by default, is in your tagfs user-data directory. You can set the location of your data directory with the `--data-dir` option to tagfs.
 
-Moving a file already within the tagfs to another directory in the tagfs will add the tags that comprise the path except in the special case for removing tags. To remove tags from a file, move the file to a parent directory of the one you are moving from (`mv a/b/tag-to-remove/1#file a/b`)[1]. Note that this depends ONLY on the starting and ending locations rather than on the tags associated with the file. However, the tags associated with the file are generally the only ones that make up a path to the file. Moving a tag to a new location (`mv the-tag new-location`) will cause the tag to show up there, but it will also remain in the original location; you could do the same thing by calling `mkdir new-location/the-tag` assuming `the-tag` already exists.
+Moving a file already within the tagfs to another directory in the tagfs will add the tags that comprise the path except in the special case for removing tags. To remove tags from a file, move the file to a directory above the one you are moving from (`mv a/b/tag-to-remove/1#file a/b`)[1]. Note that the set of tags removed depends ONLY on the starting and ending locations rather than on all of the tags associated with the file: 
+
+    $ ls a/b/c/d
+    file
+    $ mv a/b/file a/
+    $ ls a/b/c/d
+    $ ls a/c/d
+    file
+
+If the destination location isn't an ancestor of the starting location, no tags will be removed, but tags besides those already attached to the file will be added.
+
+Moving a *directory* to a new location (`mv the-tag new-location`) will cause the directory to show up there, but it will also remain in the original location; you could do the same thing by calling `mkdir new-location/the-tag` assuming `the-tag` already exists.
 
 When listing files, there are situations where two files with the same name would be listed together. In this case, one of the files is listed normally, but all of the files (including that first one) are also listed with their prefixed name (e.g., `1#filename`). This allows for accessing the file under the usual name as well as accessing all of the files regardless of where they are accessed from.
 

diff --git a/TESTING b/TESTING
@@ -15,3 +15,5 @@ The options can be used in conjunction with one another.
     TESTS="test_trie test_log" NO_VALGRIND=1 make tests
 
 Tests are set up to run on Travis-CI under my free account. Currently the acceptance tests don't run because the fuse file system never mounts. I have no idea why.
+
+One small note: If you want to write a test that deals with startup after doing something specific before shutdown of tagfs, then it's currently easier to make a test in test_tagdb.lc rather than in acceptance_test.pl since doing the shutdown and checking data structures is easier in test_tagdb.lc . In any case, all of the file system state that is stored between runs is in the SQLite database and in the "copies" directory.
diff --git a/abstract_file.c b/abstract_file.c
@@ -119,3 +119,8 @@ file_id_t get_file_id (AbstractFile *f)
 {
     return f->id;
 }
+
+void set_file_id (AbstractFile *f, file_id_t id)
+{
+    f->id = id;
+}
diff --git a/abstract_file.h b/abstract_file.h
@@ -24,7 +24,7 @@ const char *abstract_file_get_name (AbstractFile *f);
 void _set_name (AbstractFile *f, const char *new_name);
 int file_id_cmp (AbstractFile *f1, AbstractFile *f2);
 int file_name_cmp (AbstractFile *f1, AbstractFile *f2);
-int file_str_cmp (AbstractFile *f, char *name);
+int file_name_str_cmp (AbstractFile *f, char *name);
 file_id_t get_file_id (AbstractFile *f);
 void set_file_id (AbstractFile *f, file_id_t);
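
The "Tag Hierarchy" section added to NOTES in this commit says that deleting a tag promotes its children to the deleted tag's level, and that (per the Jan 11 note) the operation simply fails when the promoted name would collide with an existing tag. The sketch below illustrates that rule on plain `::`-separated name strings. It is not taken from the TagFS sources: `promoted_name`, `promote_or_fail`, and `TAG_SEP` are hypothetical names chosen for illustration, and the real code presumably works on TagDB structures rather than on strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAG_SEP "::" /* separator used in the NOTES examples */

/* Hypothetical helper: compute the name a child tag would get when the tag
 * `deleted` is removed, e.g. deleted "a::b", child "a::b::c" gives "a::c".
 * Returns 0 on success, -1 if `child` is not below `deleted` or if the
 * result does not fit in `out`. */
int promoted_name(const char *deleted, const char *child,
                  char *out, size_t out_len)
{
    assert(deleted != NULL && child != NULL && out != NULL);

    size_t dlen = strlen(deleted);
    size_t seplen = strlen(TAG_SEP);

    /* The child must look like "<deleted>::<rest>" with a non-empty rest. */
    if (strncmp(child, deleted, dlen) != 0 ||
        strncmp(child + dlen, TAG_SEP, seplen) != 0 ||
        child[dlen + seplen] == '\0')
        return -1;

    const char *rest = child + dlen + seplen;

    /* Keep everything in `deleted` up to and including its last separator;
     * that is the path of the deleted tag's own parent (empty for a root). */
    size_t prefix_len = 0;
    const char *last = NULL;
    for (const char *p = deleted; (p = strstr(p, TAG_SEP)) != NULL; p += seplen)
        last = p;
    if (last != NULL)
        prefix_len = (size_t)(last - deleted) + seplen;

    int n = snprintf(out, out_len, "%.*s%s", (int)prefix_len, deleted, rest);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}

/* Hypothetical helper: like promoted_name, but refuse the promotion when the
 * new name already exists, mirroring the "fail rather than rename" choice. */
int promote_or_fail(const char *deleted, const char *child,
                    const char *const *existing, size_t n_existing,
                    char *out, size_t out_len)
{
    if (promoted_name(deleted, child, out, out_len) != 0)
        return -1;
    for (size_t i = 0; i < n_existing; i++) {
        if (strcmp(existing[i], out) == 0)
            return -1; /* conflict: leave the hierarchy untouched */
    }
    return 0;
}
```

A caller would run `promote_or_fail` for every child of the tag being deleted and only commit the deletion if all of the calls succeed, so that a single conflict leaves the tag and all of its children under their original names.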