Releases: mar-file-system/GUFI
0.6.6
Memory Usage
- Changed struct work name to be dynamically allocated, but still contiguous within the struct
  - Address now points to space after struct work
  - This is similar to, but not the same as, a flexible array member to avoid needing to compile with C++ extensions enabled
- Swap work items to storage if queue limit is hit
  - gufi_dir2index, gufi_dir2trace, gufi_trace2index, and gufi_query
  - Use -M <bytes> to set queue_limit
    - <bytes> is divided across the number of threads (-n) and the work size to produce queue_limit
  - Use -s <prefix> to write swap files to a location other than the current working directory
QueuePerThreadPool
- Large amounts of code reorganization and separating out code into functions
- API updates to support swapping and BottomUp updates, and to improve the overall design
Miscellaneous
- CMake minimum version is now 3.5.0
  - Updated GitHub Actions Older CMake build
- Now using _at functions to reduce the cost of path name resolution
- struct input now has dynamically allocated values; call input_fini to clean up
- descend
  - If struct dirent d_type is DT_UNKNOWN, call lstat(2)
  - Removed unnecessary arguments
- Individual trace files are now scouted in parallel to reduce likelihood of work generation being a bottleneck
- Some executables now dump VmHWM from /proc/self/status at the end
- Updated version string to print consistently across C and Python
- Split BottomUp code into multiple functions in case non-BottomUp functions need to be run with BottomUp
- Removed PRINT_CUMULATIVE_TIMES, PRINT_PER_THREAD_STATS, and PRINT_QPTPOOL_QUEUE_SIZE
- Performance History Framework
- gnuplot scripts
- GitHub Actions debug builds
- Updated gpfs-scan-tool to compile with latest code
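The descend changes above (resolving entries relative to an already-open directory with the _at family, and falling back to an lstat(2)-style lookup when d_type is DT_UNKNOWN) can be sketched in C roughly as follows. This is an illustration under assumptions, not GUFI's descend: entry_is_dir and count_subdirs are hypothetical names, fstatat(2) with AT_SYMLINK_NOFOLLOW stands in for lstat(2), and d_type/DT_UNKNOWN are BSD/glibc extensions rather than plain POSIX.

```c
#define _DEFAULT_SOURCE   /* exposes DT_UNKNOWN, dirfd(), fstatat() on glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the entry is a directory, 0 if not, -1 on error.
 * The name is resolved relative to dir_fd, so the kernel does not have to
 * walk the full path again for every entry. */
static int entry_is_dir(int dir_fd, const struct dirent *entry) {
    if (entry->d_type != DT_UNKNOWN) {
        return entry->d_type == DT_DIR;
    }
    /* The filesystem did not fill in d_type: fall back to an lstat-style
     * lookup (AT_SYMLINK_NOFOLLOW does not follow symlinks). */
    struct stat st;
    if (fstatat(dir_fd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0) {
        return -1;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Counts the immediate subdirectories of path, skipping "." and "..".
 * Returns -1 if path cannot be opened as a directory. */
static int count_subdirs(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return -1;
    }
    DIR *dir = fdopendir(fd);   /* on success, dir owns fd */
    if (!dir) {
        close(fd);
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        if (entry_is_dir(dirfd(dir), entry) == 1) {
            count++;
        }
    }
    closedir(dir);   /* also closes fd */
    return count;
}
```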
GitHub Actions
- Removed macOS 12 and remnants of 13
- Added macOS 15
- Python 2 build now runs on CentOS 8
0.6.5
gufi_query
- Amortize creation of external database views when -Q is not provided
- Amortize creation of xattr views when -x is not passed in
gufi_rollup
- Clear out old rollup data before copying new data in
- Unified with gufi_unrollup SQL
- treesummary is no longer copied upwards when rolling up
- Added an index on summary.inode to speed up queries
- Fixed accidental modification of index during dry run
QueuePerThreadPool
- Claimed work can now be stolen to prevent starvation caused by long running threads
- If there is work that can be stolen, at least one work item will be taken even if the multiplier results in 0
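A tiny sketch of the stealing rule above. The exact multiplier semantics in QueuePerThreadPool are not described here, so this assumes the multiplier is a fraction of the victim's queue length; steal_count is a hypothetical helper, not part of the QPTPool API.

```c
#include <assert.h>
#include <stddef.h>

/* How many items to steal from a victim queue holding `available` items,
 * given a fractional multiplier in [0, 1]. */
static size_t steal_count(size_t available, double multiplier) {
    assert(multiplier >= 0.0 && multiplier <= 1.0);
    size_t count = (size_t)((double)available * multiplier);
    if (count == 0 && available > 0) {
        count = 1;   /* take at least one item if any work can be stolen */
    }
    return count;
}
```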
External Databases
- Admins no longer have to know what files to track
- Changed external databases to be set by users in per-directory files called external.gufi that list one path per line
  - Relative paths will be treated as relative to the source tree (not the current directory in the index)
- Changed -q to check that external db files listed are valid
- Now tracked in trace files (old trace files do not have to be changed)
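A hedged sketch of how one line of an external.gufi file might be turned into a path, following the rule above that relative paths are resolved against the source tree directory rather than the index directory. resolve_external_line is a hypothetical helper, and the sketch assumes a line is absolute exactly when it starts with '/'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* source_dir is the directory in the *source tree* that the external.gufi
 * file belongs to. Returns 0 on success, -1 for a blank line or if the
 * result does not fit in out. */
static int resolve_external_line(const char *source_dir, const char *line,
                                 char *out, size_t out_size) {
    assert(source_dir && line && out && out_size > 0);
    const size_t len = strcspn(line, "\r\n");   /* ignore the line ending */
    if (len == 0) {
        return -1;                              /* skip blank lines */
    }
    int n;
    if (line[0] == '/') {
        n = snprintf(out, out_size, "%.*s", (int)len, line);
    } else {
        n = snprintf(out, out_size, "%s/%.*s", source_dir, (int)len, line);
    }
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```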
contrib/gufi_sqlite → src/gufi_sqlite3
- Added printing of results (previous usage did not require it)
NEW gufi_index2dir
- Convert an index into a source tree with file sizes of 0
NEW gufi_trace2dir
- Convert trace files into a source tree with file sizes of 0
NEW parallel_cpr
- Parallel cp -r
Misc
- When descending a directory, if struct dirent d_type is not set, fall back to calling lstat(2)
- Updated opendb behavior
- Updated dupdir behavior
- No longer replacing both search and prefix with prefix in regression test output
GitHub Actions
- Restored code coverage report with codecov
- Updated actions/checkout to v4
- Updated actions/cache to v4
- Added Rocky Linux 9
0.6.4
New: External User Databases
- Allows for arbitrary user data to be attached to filesystem metadata and queried
- Can be rolled up
- gufi_dir2index/gufi_dir2trace -q
- Added e type to trace file format (does not affect old trace files)
- Added gufi_query -I -Q
- Added new views: esummary, epentries, exsummary, expentries, evrsummary, evrpentries, evrxsummary, evrxpentries
  - Always available, but will not be filled unless -Q is used
- Reorganized processdir to be easier to read
- External user database count is tracked in treesummary
- Removed attachname column from external_dbs
Extended Attributes
- xattrs view and convenience views are now always available, but only filled when -x is passed to gufi_query
gufi_client now calls ssh with subprocess.Popen instead of paramiko
parallel_rmr top-level directory bug fix
Longitudinal Snapshot
- More columns
- Cache intermediate results
- Allow for rolled up indexes
- Different views
- Graph (G)
- Per Level (L)
- Siblings (S)
- Per Directory (D)
Dependencies
- Updated sqlite3-pcre to use pcre2
- Existing installs should delete the GUFI sqlite3-pcre build/install and rebuild
- Removed paramiko tarball
- Added new SQLite3 patch to increase attach limit to 254
- Existing installs should delete the GUFI SQLite3 build/install and rebuild
GitHub Actions
- Removed macOS 11 and 13
- Added macOS 14
- Added Ubuntu 24.04
- Now uploading PDFs and RPMs to tagged releases
- Added test on Windows with cygwin
- Codecov actions update is causing issues, so changed to not error on upload failure
0.6.3
gufi_query
- Input paths can now be symlinks
- Immediate subdirectories of input paths can now be symlinks
- gufi_query will get the realpath of the top-level input paths for traversal, but the custom SQLite functions path and rpath will print the current path with the original user-provided prefix; fpath still prints the actual path
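The prefix handling described above amounts to swapping the resolved top-level path for the one the user typed. A minimal sketch, assuming roots are given without a trailing slash; user_visible_path is a hypothetical helper and not GUFI's implementation of path() or rpath().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite a traversal path rooted at real_root (the realpath of an input)
 * so that it starts with user_root (what the user originally typed).
 * Returns 0 on success, -1 if current is not under real_root or out is too small. */
static int user_visible_path(const char *real_root, const char *user_root,
                             const char *current, char *out, size_t out_size) {
    assert(real_root && user_root && current && out);
    const size_t root_len = strlen(real_root);
    /* current must be real_root itself or lie beneath it ("/a/b" must not match "/a/bc") */
    if (strncmp(current, real_root, root_len) != 0 ||
        (current[root_len] != '\0' && current[root_len] != '/')) {
        return -1;
    }
    const int n = snprintf(out, out_size, "%s%s", user_root, current + root_len);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```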
Schema changes
- Added ppinode to pentries
- dmaxgidI → dmaxgid
- inode INT64 → inode TEXT
New: contrib/longitudinal_snapshot.py
- Takes a snapshot of an index tree, summarizes each directory's metadata (i.e. min, max, mean, median, histograms, etc. of file size, file count, timestamps, string lengths, etc.), and places the data into a single SQLite database file that is much smaller than the index itself
- Recommend running gufi_treesummary_all before generating a snapshot
- See Discussion in #149
New: contrib/treediff
- Walks directory tree and prints top-most directory mismatches
More tests
- Added empty directory to test tree
- Added deploy test
GitHub Actions
- macOS 11 → macOS 14
- Now keeping PDF documentation as artifacts for 14 days when building the main branch
0.6.2
Schema Changes
- summary now has a column called totzero that counts files with a size of 0
- New views summarylong and vrsummarylong join summary and vrsummary with tables/views that contain additional data that should be associated with them but do not need to be added into the summary table. Currently, no extra information is attached.
- ${SEARCH} now contains an empty db.db to guarantee a db.db above all indexes under ${SEARCH}.
  - This can be expanded in the future to add information that is separate from the rest of the index tree.
  - Fixes #49
NEW: gufi_treesummary_all generates treesummary tables for all directories in an index instead of one directory at a time.
gufi_rollup now also generates treesummary tables while processing the index
gufi_unrollup does not remove treesummary tables because there is no way to tell whether or not they were generated by gufi_rollup. A column recording which utility generated them might be added in the future.
gufi_stat → gufi_stat_bin
- gufi_stat is now a script that calls gufi_stat_bin
- Server configuration file now also needs the path to gufi_stat_bin
gufi_stats
- average-leaf-files
- average-leaf-links
- average-leaf-size
- median-leaf-links
- median-leaf-size
- filesize-log2-bins
- filesize-log1024-bins
- dirfilecount-log2-bins (#146)
- dirfilecount-log1024-bins (#146)
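The log2/log1024 statistics above group sizes or counts into power-of-two buckets. The exact bucket boundaries gufi_stats uses are not stated here; this sketch assumes bucket k holds values in [2^k, 2^(k+1)) and reports 0 separately as -1, and uses the identity floor(log1024 x) = floor(floor(log2 x) / 10), which holds because 1024 = 2^10. Both function names are hypothetical.

```c
#include <stdint.h>

/* Bucket index floor(log2(size)); 0 has no logarithm, so it maps to -1. */
static int log2_bin(uint64_t size) {
    if (size == 0) {
        return -1;
    }
    int bin = 0;
    while (size >>= 1) {   /* count how many times the value can be halved */
        bin++;
    }
    return bin;
}

/* Bucket index floor(log1024(size)), derived from the log2 bucket. */
static int log1024_bin(uint64_t size) {
    const int b = log2_bin(size);
    return (b < 0) ? -1 : b / 10;
}
```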
Scripts now have a --verbose/-V flag to print the command being run (#142)
bfwreaddirplus2db was reorganized.
NEW: split_trace splits trace files into chunks for parallel processing by gufi_trace2index
SQLite3
- Updated from version 3.27 to version 3.43 to get built-in math functions
  - Existing indexes should be rebuilt
- Also added math functions stdevs, stdevp, and median
- Replaced subdirs_walked() with subdirs(srollsubdirs, sroll)
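The new functions are registered with SQLite inside GUFI; the sketch below shows only the arithmetic they presumably compute, in plain C, assuming stdevs is the sample standard deviation (divide by n - 1) and stdevp the population one (divide by n), as the names suggest. To stay free of libm, it returns variances; the standard deviations are their square roots. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sum of squared deviations from the mean (simple two-pass version). */
static double sum_sq_dev(const double *x, size_t n) {
    assert(n > 0);
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) ss += (x[i] - mean) * (x[i] - mean);
    return ss;
}

/* Population variance: divide by n (stdevp would be its square root). */
static double variance_pop(const double *x, size_t n) {
    return sum_sq_dev(x, n) / (double)n;
}

/* Sample variance: divide by n - 1 (stdevs would be its square root). */
static double variance_samp(const double *x, size_t n) {
    assert(n >= 2);
    return sum_sq_dev(x, n) / (double)(n - 1);
}

static int cmp_double(const void *a, const void *b) {
    const double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Median of a sorted copy: the middle value, or the mean of the two middle values. */
static int median(const double *x, size_t n, double *out) {
    if (n == 0) return -1;
    double *copy = malloc(n * sizeof(*copy));
    if (!copy) return -1;
    memcpy(copy, x, n * sizeof(*copy));
    qsort(copy, n, sizeof(*copy), cmp_double);
    *out = (n % 2) ? copy[n / 2] : (copy[n / 2 - 1] + copy[n / 2]) / 2.0;
    free(copy);
    return 0;
}
```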
When printing result columns, the delimiter after the last column is no longer printed
- Prevents pandas from unnecessarily generating a column of None values when parsing output
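A sketch of the trailing-delimiter fix above: emit the delimiter before every column except the first, so a line never ends with a delimiter and parsers such as pandas do not infer an extra empty column. join_columns is a hypothetical helper that writes into a buffer for illustration; GUFI itself prints to its output streams.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join ncols columns with delim, with no delimiter after the last column.
 * Returns the number of characters written, or -1 if out is too small. */
static int join_columns(char *out, size_t out_size, const char *const *cols,
                        size_t ncols, char delim) {
    assert(out && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < ncols; i++) {
        const int n = (i == 0)
            ? snprintf(out + used, out_size - used, "%s", cols[i])
            : snprintf(out + used, out_size - used, "%c%s", delim, cols[i]);
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```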
Significant increase in testing and code coverage
CMake
- db2, fuse, and gpfs tool building can be disabled even if the libraries are found
- Added make pylint, make shellcheck, and make checkstyle
- gufi_client_jail should not have been created
- Example configuration files are no longer renamed to config.example
GitHub Actions
- Now building on macOS 11, 12, and 13
- Now building with -Wall -Wextra -Werror -pedantic
Added cygwin GCC support (not tested with CI)
0.6.1
Reduced size of struct work
Added optional work compression with zlib to gufi_dir2index, gufi_dir2trace, and gufi_query
Added in-situ processing of work items in the descend function: after enqueuing n directories, the remaining immediate subdirectories are processed in the parent thread instead of being enqueued
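The in-situ rule above can be sketched as follows. enqueue and process are hypothetical callbacks standing in for the thread pool's enqueue call and the per-directory work; the two counting callbacks exist only to make the example observable and are not part of GUFI.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*dir_fn)(const char *dir, void *ctx);

/* Hand the first max_enqueue subdirectories to the pool, then process the
 * rest in the current thread instead of enqueuing them.
 * Returns how many subdirectories were processed in situ. */
static size_t descend_subdirs(const char *const *subdirs, size_t count,
                              size_t max_enqueue, dir_fn enqueue,
                              dir_fn process, void *ctx) {
    assert(enqueue && process);
    size_t i = 0;
    for (; i < count && i < max_enqueue; i++) {
        enqueue(subdirs[i], ctx);
    }
    const size_t in_situ = count - i;
    for (; i < count; i++) {
        process(subdirs[i], ctx);
    }
    return in_situ;
}

/* Demo callbacks that only count how often they are called. */
struct counts { size_t enqueued, processed; };
static void count_enqueue(const char *dir, void *ctx) { (void)dir; ((struct counts *)ctx)->enqueued++; }
static void count_process(const char *dir, void *ctx) { (void)dir; ((struct counts *)ctx)->processed++; }
```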
gufi_query no longer requires at least one of -T, -S, or -E
Changed gufi_trace2index to read from file descriptors using pread(2) instead of FILE * streams with getline(3)
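Because pread(2) takes an explicit offset and leaves the descriptor's file position alone, several threads can read from one descriptor without sharing a FILE * and its lock. A minimal sketch of reading one newline-terminated record at a known offset; read_record is a hypothetical helper, and GUFI's actual trace parsing (record format, buffer handling) is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the record starting at byte offset off into buf (without the '\n',
 * NUL-terminated) and return the offset of the next record. Returns -1 on
 * error, at end of file, or if the record does not fit in buf. */
static off_t read_record(int fd, off_t off, char *buf, size_t buf_size) {
    assert(buf && buf_size > 1);
    const ssize_t got = pread(fd, buf, buf_size - 1, off);
    if (got <= 0) {
        return -1;
    }
    char *nl = memchr(buf, '\n', (size_t)got);
    if (!nl) {
        return -1;   /* record longer than buf, or no trailing newline */
    }
    *nl = '\0';
    return off + (nl - buf) + 1;
}
```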
Removed BENCHMARK macro
Documentation and test updates
QueuePerThreadPool
- Added soft memory limit via deferred processing
  - If a thread's wait queue gets too big, new work items are placed in a different queue so they are not processed until the wait queue is empty
  - QPTPool_enqueue now returns whether the new work item was placed in the wait queue or in the deferred queue
- QPTPool_init now only requires thread count and thread arguments to initialize
  - The other properties can be optionally set with setter functions
  - Previous QPTPool_init has been renamed to QPTPool_init_with_props
- Symmetrical start up (QPTPool_init and QPTPool_start) and end (QPTPool_wait and QPTPool_destroy)
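A count-only model of the deferred queue described above, assuming a per-thread soft limit on the wait queue. The names (enqueue_item, take_item, QUEUE_WAIT, QUEUE_DEFERRED) are hypothetical and are not the QPTPool API; a real pool also stores the work items and protects the queues with locks.

```c
#include <assert.h>
#include <stddef.h>

/* Which queue a new item landed in. */
enum queue_choice { QUEUE_WAIT, QUEUE_DEFERRED };

struct thread_queues {
    size_t wait;         /* items eligible to run */
    size_t deferred;     /* items parked until the wait queue drains */
    size_t wait_limit;   /* soft limit on the wait queue */
};

static enum queue_choice enqueue_item(struct thread_queues *q) {
    assert(q != NULL);
    if (q->wait < q->wait_limit) {
        q->wait++;
        return QUEUE_WAIT;
    }
    q->deferred++;
    return QUEUE_DEFERRED;
}

/* Take one item to run; deferred items are only promoted once the wait
 * queue is empty. Returns 1 if an item was taken, 0 if nothing is left. */
static int take_item(struct thread_queues *q) {
    assert(q != NULL);
    if (q->wait == 0 && q->deferred > 0) {
        q->wait = q->deferred;   /* promote the whole deferred batch (soft limit) */
        q->deferred = 0;
    }
    if (q->wait == 0) {
        return 0;
    }
    q->wait--;
    return 1;
}
```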
SQLite3
- Renamed path() to rpath()
  - Returns full path properly for original and rolled up indices
  - Use with new views vrsummary, vrpentries, vrxsummary, and vrxpentries
- Restored path(), epath(), and fpath() functions
- Removed alignment arguments from functions
- Updated URI processing to replace percent characters
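In an SQLite URI filename, '%' introduces a %HH escape, so a literal percent sign in an index path has to be written as %25; '?' and '#' would likewise start a query string or fragment. A sketch of that escaping, with path_to_sqlite_uri as a hypothetical helper; which characters GUFI actually rewrites beyond '%' is an assumption here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a "file:" URI for path, percent-encoding '%', '?' and '#'.
 * Returns 0 on success, -1 if out is too small. */
static int path_to_sqlite_uri(const char *path, char *out, size_t out_size) {
    assert(path && out && out_size > 0);
    int n = snprintf(out, out_size, "file:");
    if (n < 0 || (size_t)n >= out_size) return -1;
    size_t used = (size_t)n;
    for (const char *p = path; *p; p++) {
        if (*p == '%' || *p == '?' || *p == '#') {
            n = snprintf(out + used, out_size - used, "%%%02X", (unsigned char)*p);
        } else {
            n = snprintf(out + used, out_size - used, "%c", *p);
        }
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```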
Renamed
- bfti → gufi_treesummary
- rollup → gufi_rollup
- unrollup → gufi_unrollup
Performance History Framework
- Added helper script that allows user to specify a range of commits and how many times to benchmark each commit
- Downloads second copy of repo
- Added support for collecting new/renamed/removed cumulative_times debug values for gufi_query in older commits
- Plotting supports including or excluding commits without data
- More documentation
Removed INSTALL, NOTES.txt, Makefiles, and bfmi
Added SQL guide
Added presentation from MSST 2023
0.6.0
Extended Attributes (xattrs) support
- Secure storage and retrieval of user data
- Generic permission/user/group "external data" framework
- Can be rolled up
gufi_query
- Moved into its own directory
- Reorganized into smaller files
- Output targets with and without aggregation are now clearly defined
Enabled GitHub Actions
- Removed Travis CI
- Automatic testing on multiple OSs
- RPM packages
- Run pylint and shellcheck to clean up scripts
- Run valgrind to check for memory leaks
- Code scanning with CodeQL
- Code coverage with CodeCov
SQLite3
- Now handling URI paths
- Updated path() function
- Removed fpath()
- Removed epath()
Documentation
- Added user, administrator, and developer guides (LaTeX)
- Added PDFs of slides from presentations
- Added citation to SC22 paper
Updated QueuePerThreadPool API
Added ability to skip directories listed in a file with -k
Added performance history collection framework
All Python code should work with Python 2 and Python 3
Increased testing
XATTR
0.5.2-rc2 Add CentOS 7 Docker image
XAttr
Initial implementation of querying with XAttrs
0.5.2-rc0
Added
rollup
executable to reduce the number of opens that need to be done during a tree walk.unrollup
to remove rollup information from a rolled up tree.parallel_rmr
to delete trees in parallel.