Possible Future Work for CRADA Year Two or Otherwise

SimpleByteRange Index Entry Compression

Start time could be 6 bytes if you use days from a recent year and microseconds since that day, gives you hundreds of years

End time could be 4 bytes if you use elapsed from start in microseconds if you have an hour max for the elapse, 5 bytes is about 300 hours elapsed in microseconds

Make sure we don't load end time when you open for read

Data Buffering

Fopen fwrite with buffer tuning, why not stick this as an option in fstest and try write size n to n sweep on Cray Lustre to see if it fixes shallow slope, try it on Panasas on Cray to see if it helps DVS, try it on Panasas on cluster, we could try it on GPFS also to be complete. That way we know if this is worth doing and if we get write buffering for nearly free. Seems easy to add to fstest and Alfred could run the write size n to n sweep test with write and fwrite to compare.

If this is good, we can consider doing an IOStore::Write() parent class to the IOHandle::write and make IOHandle::write private and a friend to IOStore so that only IOStore can call the IOHandle::write. Then we can add buffering in the IOStore::Write() and all IOStores can benefit from buffering. We'd need each IOHandle to provide an optional buffer which the IOStore::Write() wrapper could use when available.

Data Compression

Compression test. Why don't we take your program that reads the plfs map, serially, create a program that reads the map entry, reads the data associated with the record, if bigger than X bytes, runs compress on that read data, and reports on ave, min, max and total compression for the file. We run it against rage and silverton files to see if either wins on compression so we can see if adding compression to plfs is worth the trouble. Also measure time to compress and add that up and report so that we know the overhead. Seems like a simple thing to do?

Again, this could be in the IOStore::Write() parent class as described above in the Data Buffering section.

Fix O_RDWR when user really should have opened O_READONLY

Currently in O_RDWR mode, we destroy the index and recreate it for every read. This is probably the most correct behavior to help ensure that writes by another process are available for reads by others. However, it kills performance so we should sacrifice the correct behavior in favor of performance. We just need to document that reads following writes by others in N-1 O_RDWR is undefined and this is probably true of most file systems and users shouldn't be doing N-1 in O_RDWR anyway.

Move collectives into the library

Pass a table of message exchange general function pointers. So we'd have a table like in MPI-IO where we specify things like ad_plfs_open. This table would have things like plfs_broadcast and the ad_plfs layer would set that function pointer to MPI_Broadcast and the upc layer would do something similar. This would make the ad_plfs layer trivially small. ad_plfs_open would just pass its args directly to plfs_open along with the table of function pointers. In the library, if the table isn't passed, then plfs_open just does what it currently does. But if the table is passed, then plfs_open would do the optimizations that are currently in the ad_plfs_open code but would use the function pointers instead of the MPI_* calls. This then would allow PLFS to build these optimizations without linking against MPI. Then the thin ad_plfs layer could be patched into official MPI distributions and we wouldn't have to worry about updating them when we want to change the optimizations.

Get Doxygen working on the code

Jingwang started doing this on some of his code. I think Chuck was as well. Then figure out how to make the doxygen webpage publicly available.

Get Gerrit working with the existing github

The code review that John has been doing with EMC is a fantastic interface. github.com is not nearly so nice.

Redesign metalinks

We're not sure that the current design of metalinks is as good as it can and should be.

Merge Jun's index compression and Zhenhua's.

Although both are waiting for at-scale testing to see whether they deliver the performance improvements we expect.

Going through trac and cleaning up small lingering bugs

These are listedhere and marked with "bug"

Start porting the new SmallFile and new Index Compression branches to the new IOStore branch

Chuck ported the existing trunk and it wasn't super difficult. One hard part might be what to do about the fd cache in small file code.

Not creating unnecessary empty files in container mode.

This could also be done in IOStore::Open wrapper class which could do deferred writes. This would be an option to the IOStore::Open class which could specify things either to defer to first write or to immediately open. Then if there were never writes, it'd never be opened. This would also rely on an IOStore::Write wrapper.

Creating an IOHandle pool of recently opened IOHandle's.

We don't necessarily have to immediately close every one so that if a caller closes one and then reopens one, existing handles in the pool could be reused to reduce the number of actual system calls coming out of the PLFS library going to the backends

Adding a checksum to the SimpleByteRange index entry

This would give better data integrity by detecting data corruption on the read.

Large scale performance regression test suite up and running

This is something that only LANL can deliver.

plfs collectives

This is what was described in the Fast Forward proposal that we delivered with Cray. This is similar to MPI-IO collective IO except that MPI-IO transfers a lot of data in order to get filesytem aligned blocks. Within PLFS, we need to transfer only a much smaller amount of data so that everyone writes the same amount of data. This will enable patterned index entries instead of SimpleByteRange entries.

Better deletion of containers

Currently when we unlink a container, we just recurse and try to delete everything. Sometimes however there might be a dropping which we can't unlink for some reason. If we have already removed the access file then the unlink will fail and then the container will now appear as a directory and this can cause all sorts of problems. So we should delay deleting the access file until the very last operation (before removing toplevel).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly