diff --git a/README.md b/README.md index bea343c..609f2f7 100644 --- a/README.md +++ b/README.md @@ -56,6 +56,7 @@ A reading list related to storage systems, including data deduplication, erasure 22. *The Dilemma between Deduplication and Locality: Can Both be Achieved?*---FAST'21 ([link](https://www.usenix.org/system/files/fast21-zou.pdf)) [summary](https://yzr95924.github.io/paper_summary/MFDedup-FAST'21.html) 23. *SLIMSTORE: A Cloud-based Deduplication System for Multi-version Backups*----ICDE'21 ([link](http://www.cs.utah.edu/~lifeifei/papers/slimstore-icde21.pdf)) 24. *Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling*----ACM TOS'21 ([link](https://dl.acm.org/doi/full/10.1145/3459626)) +25. *BURST: A Chunk-Based Data Deduplication System with Burst-Encoded Fingerprint Matching*----MSST'24 ([link](https://www.msstconference.org/MSST-history/2024/Papers/msst24-1.2.pdf)) ### Restore Performances @@ -158,6 +159,7 @@ A reading list related to storage systems, including data deduplication, erasure 22. *Palantir: Hierarchical Similarity Detection for Post-Deduplication Delta Compression*----ASPLOS'24 ([link](https://qiangsu97.github.io/files/asplos24spring-final6.pdf)) 23. *DedupSearch: Two-Phase Deduplication Aware Keyword Search*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-elias.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupSearch-FAST'22.html) 24. *Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index*----FAST'24 ([link](https://www.usenix.org/system/files/fast24-levi.pdf)) [summary](https://yzr95924.github.io/paper_summary/IDEA-FAST'24.html) +25. 
*Is Low Similarity Threshold A Bad Idea in Delta Compression?*----HotStorage'24 ([link](https://henryhxu.github.io/share/hongming-hotstorage24.pdf)) ### Memory && Block-Layer Deduplication @@ -361,6 +363,11 @@ A reading list related to storage systems, including data deduplication, erasure 1. *How the Great Firewall of China Detects and Blocks Fully Encrypted Traffic*----USENIX Security'23 ([link](https://people.cs.umass.edu/~amir/papers/UsenixSecurity23_Encrypted_Censorship.pdf)) ## General Storage + +### HDD, SMR + +1. *Revisiting HDD Rules of Thumb: 1/3 Is Not (Quite) the Average Seek Distance*----MSST'24 ([link](https://www.msstconference.org/MSST-history/2024/Papers/msst24-1.1.pdf)) + ### Distributed Storage System 1. *MapReduce: Simplified Data Processing on Large Clusters*----OSDI'04 ([link](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)) 2. *Cumulus: Filesystem Backup to the Cloud*----FAST'09 ([link](https://www.usenix.org/legacy/event/fast09/tech/full_papers/vrable/vrable.pdf)) [summary](https://yzr95924.github.io/paper_summary/Cumulus-FAST'09.html) @@ -381,14 +388,15 @@ A reading list related to storage systems, including data deduplication, erasure 1. *TinyLFU: A Highly Efficient Cache Admission Policy*----ACM TOS'17 ([link](https://arxiv.org/pdf/1512.00727.pdf)) 2. *Hyperbolic Caching: Flexible Caching for Web Applications*----USENIX ATC'17 ([link](https://www.cs.princeton.edu/~mfreed/docs/hyperbolic-atc17.pdf)) -3. *It’s Time to Revisit LRU vs. FIFO*----HotStorage'20 ([link](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf)) [summary](https://yzr95924.github.io/paper_summary/Cache-HotStorage'20.html) [trace](http://iotta.snia.org/traces/key-value) -4. *The CacheLib Caching Engine: Design and Experiences at Scale*----OSDI'20 ([link](https://www.usenix.org/system/files/osdi20-berg.pdf)) -5. *Unifying the Data Center Caching Layer — Feasible? 
Profitable?*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470884)) -6. *Learning Cache Replacement with Cacheus*----FAST'21 ([link](https://www.usenix.org/system/files/fast21-rodriguez.pdf)) -7. *Kangaroo: Caching Billions of Tiny Objects on Flash*----SOSP'21 ([link](https://jasony.me/publications/sosp21-kangaroo.pdf)) -8. *Segcache: a Memory-efficient and Scalable In-memory Key-value Cache for Small Objects*----NSDI'21 ([link](https://jasony.me/publications/nsdi21-segcache.pdf)) -9. *FarReach: Write-back Caching in Programmable Switches*----USENIX ATC'23 ([link](http://www.cse.cuhk.edu.hk/~pclee/www/pubs/atc23.pdf)) -10. *FIFO can be Better than LRU: the Power of Lazy Promotion and Quick Demotion*----HotOS'23 ([link](https://www.pdl.cmu.edu/PDL-FTP/Storage/Yang-FIFO-HotOS23.pdf)) +3. *Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification*----USENIX NSDI'19 ([link]()) +4. *It’s Time to Revisit LRU vs. FIFO*----HotStorage'20 ([link](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf)) [summary](https://yzr95924.github.io/paper_summary/Cache-HotStorage'20.html) [trace](http://iotta.snia.org/traces/key-value) +5. *The CacheLib Caching Engine: Design and Experiences at Scale*----OSDI'20 ([link](https://www.usenix.org/system/files/osdi20-berg.pdf)) +6. *Unifying the Data Center Caching Layer — Feasible? Profitable?*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470884)) +7. *Learning Cache Replacement with Cacheus*----FAST'21 ([link](https://www.usenix.org/system/files/fast21-rodriguez.pdf)) +8. *Kangaroo: Caching Billions of Tiny Objects on Flash*----SOSP'21 ([link](https://jasony.me/publications/sosp21-kangaroo.pdf)) +9. *Segcache: a Memory-efficient and Scalable In-memory Key-value Cache for Small Objects*----NSDI'21 ([link](https://jasony.me/publications/nsdi21-segcache.pdf)) +10. 
*FarReach: Write-back Caching in Programmable Switches*----USENIX ATC'23 ([link](http://www.cse.cuhk.edu.hk/~pclee/www/pubs/atc23.pdf))
+11. *FIFO can be Better than LRU: the Power of Lazy Promotion and Quick Demotion*----HotOS'23 ([link](https://www.pdl.cmu.edu/PDL-FTP/Storage/Yang-FIFO-HotOS23.pdf))

### Hash

@@ -402,18 +410,15 @@ A reading list related to storage systems, including data deduplication, erasure
1. *A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring*----IPDPS'10 ([link](https://www.cse.cuhk.edu.hk/~pclee/www/pubs/ipdps10.pdf))
2. *Lock-Free Collaboration Support for Cloud Storage Services with Operation Inference and Transformation*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-chen_jian.pdf))

-### SSD, NVMe
+### SSD, Flash

1. *Design Tradeoffs for SSD Performance*----USENIX ATC'08 ([link](https://www.usenix.org/legacy/events/usenix08/tech/full_papers/agrawal/agrawal.pdf))
1. *Design Tradeoffs for SSD Reliability*----FAST'19 ([link](https://www.usenix.org/system/files/fast19-kim-bryan.pdf))
1. *The Tail at Store: A Revelation from Millions of Hours of Disk and SSD Deployments*----FAST'16 ([link](https://www.usenix.org/system/files/conference/fast16/fast16-papers-hao.pdf))
1. *The Unwritten Contract of Solid State Drives*----EuroSys'17 ([link](https://dl.acm.org/doi/pdf/10.1145/3064176.3064187))
-1. *ZNS: Avoiding the Block Interface Tax for Flash-based SSDs*----USENIX ATC'21 ([link](https://www.usenix.org/system/files/atc21-bjorling.pdf)) [code](https://github.com/westerndigitalcorporation/zenfs)
-1. *ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction*----OSDI'21 ([link](https://www.usenix.org/system/files/osdi21-han.pdf))
1. 
*The CASE of FEMU: Cheap, Accurate, Scalable and Extensible Flash Emulator*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-li.pdf)) [summary](https://yzr95924.github.io/paper_summary/FEMU-FAST'18.html) 1. *From blocks to rocks: a natural extension of zoned namespaces*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470870)) 1. *Don’t Be a Blockhead: Zoned Namespaces Make Work on Conventional SSDs Obsolete*----HotOS'21 ([link](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s07-stavrinos.pdf)) [summary](https://yzr95924.github.io/paper_summary/BlockHead-HotOS'21.html) -1. Zone Append: A New Way of Writing to Zoned Storage----Vault'20 ([link](https://www.usenix.org/system/files/vault20_slides_bjorling.pdf)) 1. *What Systems Researchers Need to Know about NAND Flash*----HotStorage'13 ([link](https://www.usenix.org/system/files/conference/hotstorage13/hotstorage13-desnoyers.pdf)) 1. *Caveat-Scriptor: Write Anywhere Shingled Disks*----HotStorage'15 ([link](https://www.usenix.org/system/files/conference/hotstorage15/hotstorage15-kadekodi.pdf)) 1. *Towards an Unwritten Contract of Intel Optane SSD*----HotStorage'19 ([link](https://www.usenix.org/system/files/hotstorage19-paper-wu-kan.pdf)) @@ -428,28 +433,21 @@ A reading list related to storage systems, including data deduplication, erasure 1. *NVMeVirt: A Versatile Software-defined Virtual NVMe Device*----FAST'23 ([link](https://www.usenix.org/system/files/fast23-kim.pdf)) 1. *Excessive SSD-Internal Parallelism Considered Harmful*----HotStorage'23 ([link](https://dl.acm.org/doi/pdf/10.1145/3599691.3603412)) 1. *Is Garbage Collection Overhead Gone? Case study of F2FS on ZNS SSDs*----HotStorage'23 ([link](https://dl.acm.org/doi/pdf/10.1145/3599691.3603409)) - -### File system - -1. *Scale and Concurrency of GIGA+: File System Directories with Millions of Files*----FAST''11 ([link](https://www.usenix.org/legacy/event/fast11/tech/full_papers/PatilNew.pdf)) -2. 
*Journaling of Journal Is (Almost) Free*----FAST'14 ([link](https://www.usenix.org/system/files/conference/fast14/fast14-paper_shen.pdf)) -3. *F2FS: A New File System for Flash Storage*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-lee.pdf)) -4. *POSIX is Dead! Long Live... errr... What Exactly?*----HotStorage'15 ([link](https://www.fsl.cs.stonybrook.edu/docs/cosy-hotos/hotstorage17posux.pdf)) -5. *BetrFS: A Right-Optimized Write-Optimized File System*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-jannen_william.pdf)) -6. *File Systems Fated for Senescence? Nonsense, Says Science!*----FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-conway.pdf)) -7. *To FUSE or Not to FUSE: Performance of User-Space File Systems*----FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-vangoor.pdf)) -8. *iJournaling: Fine-Grained Journaling for Improving the Latency of Fsync System Call*----USENIX ATC'17 ([link](https://www.usenix.org/system/files/conference/atc17/atc17-park.pdf)) -9. *The Full Path to Full-Path Indexing*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-zhan.pdf)) -10. *SplitFS: persistent-memory file system that reduces software overhead*----SOSP'19 ([link](https://www.cs.utexas.edu/~vijay/papers/sosp19-splitfs.pdf)) -11. *EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-gao.pdf)) -12. *Performance and Resource Utilization of FUSE User-Space File Systems*----ACM TOS'19 ([link](https://dl.acm.org/doi/10.1145/3310148)) -13. *Filesystem Aging: It's more Usage than Fullness*----HotStorage'19 ([link](https://www.cs.unc.edu/~porter/pubs/hotstorage19-paper-conway.pdf)) -14. *How to Copy Files*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-zhan.pdf)) -15. 
*XFUSE: An Infrastructure for Running Filesystem Services in User Space*----USENIX ATC'21 ([link](https://www.usenix.org/system/files/atc21-huai.pdf))
-16. *WineFS: a hugepage-aware file system for persistent memory that ages gracefully*----SOSP'21 ([link](https://www.cs.utexas.edu/~vijay/papers/winefs-sosp21.pdf))
-17. *LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism*----SOSP'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3477132.3483565))
-18. *BetrFS: A Compleat File System for Commodity SSDs*----EuroSys'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3492321.3519571))
-19. *Survey of Distributed File System Design Choices*----ACM TOS'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3465405))
+1. *ZapRAID: Toward High-Performance RAID for ZNS SSDs via Zone Append*----APSys'23 ([link](https://www.cse.cuhk.edu.hk/~pclee/www/pubs/apsys23.pdf))
+1. *BypassD: Enabling fast userspace access to shared SSDs*----ASPLOS'24 ([link](https://dl.acm.org/doi/pdf/10.1145/3617232.3624854))
+
+### Open-Channel SSD, ZNS, SMR
+
+1. *LightNVM: The Linux Open-Channel SSD Subsystem*----USENIX FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-bjorling.pdf))
+2. *ZoneAlloy: Elastic Data and Space Management for Hybrid SMR Drives*----HotStorage'19 ([link](https://www.usenix.org/system/files/hotstorage19-paper-wu-fenggang.pdf))
+3. *Zone Append: A New Way of Writing to Zoned Storage*----Vault'20 ([link](https://www.usenix.org/system/files/vault20_slides_bjorling.pdf))
+4. *ZNS: Avoiding the Block Interface Tax for Flash-based SSDs*----USENIX ATC'21 ([link](https://www.usenix.org/system/files/atc21-bjorling.pdf)) [code](https://github.com/westerndigitalcorporation/zenfs)
+5. *ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction*----OSDI'21 ([link](https://www.usenix.org/system/files/osdi21-han.pdf))
+6. 
*RAIZN: Redundant Array of Independent Zoned Namespaces*----ASPLOS'23 ([link](https://dl.acm.org/doi/pdf/10.1145/3575693.3575746)) +7. *An Efficient Order-Preserving Recovery for F2FS with ZNS SSD*----HotStorage'23 ([link](https://www.hotstorage.org/2023/papers/hotstorage23-final108.pdf)) +8. *Is Garbage Collection Overhead Gone? Case study of F2FS on ZNS SSDs*----HotStorage'23 ([link](https://huaicheng.github.io/p/hotstorage23-zgc.pdf)) +9. *A Free-Space Adaptive Runtime Zone-Reset Algorithm for Enhanced ZNS Efficiency*----HotStorage'23 ([link](https://discos.sogang.ac.kr/file/2023/intl_conf/HotStorage_2023_S_Byeon.pdf)) +10. *Can ZNS SSDs be Better Storage Devices for Persistent Cache?*----HotStorage'24 ([link](https://dl.acm.org/doi/pdf/10.1145/3655038.3665946)) [summary](https://yzr95924.github.io/paper_summary/ZNS_SSD_Cache-HotStorage'24.html) ### Non-volatile Memory @@ -518,14 +516,71 @@ A reading list related to storage systems, including data deduplication, erasure 1. *GPFS: A Shared-Disk File System for Large Computing Clusters*----FAST'02 ([link](https://www.usenix.org/legacy/publications/library/proceedings/fast02/full_papers/schmuck/schmuck.pdf)) 2. *Efficient Object Storage Journaling in a Distributed Parallel File System*----FAST'10 ([link](https://www.usenix.org/legacy/events/fast10/tech/full_papers/oral.pdf)) -3. *Taking back control of HPC file systems with Robinhood Policy Engine*----arxiv'15 ([link](https://arxiv.org/abs/1505.01448)) -4. *Lustre Lockahead: Early Experience and Performance using Optimized Locking*----CUG'17 ([link](https://cug.org/proceedings/cug2017_proceedings/includes/files/pap141s2-file1.pdf)) -5. *LPCC: Hierarchical Persistent Client Caching for Lustre*----SC'19 ([link](https://dl.acm.org/doi/pdf/10.1145/3295500.3356139)) [slides](https://sc19.supercomputing.org/proceedings/tech_paper/tech_paper_files/pap112s5.pdf) -6. 
*A Performance Study of Lustre File System Checker: Bottlenecks and Potentials*----MSST'19 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8890077&casa_token=uy7uU5C8DQ4AAAAA:9Sp-zG-QWKhgkn5QkmpxDTuHmGljhJJEoq_c9bzVSYb9gUD5eXk2orJYhnvLdQE0HY3RaIRG_9zDYA)) -7. *I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning*----ICPP'19 ([link](https://dl.acm.org/doi/pdf/10.1145/3337821.3337902)) -8. *HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers*----FAST'23 ([link](https://www.usenix.org/system/files/fast23-he.pdf)) -9. *Accelerating I/O performance of ZFS-based Lustre file system in HPC environment*----Journal of Supercomputing'23 ([link](https://link.springer.com/article/10.1007/s11227-022-04966-7)) -10. *MetaWBC: POSIX-compliant Metadata Write-back Caching for Distributed File Systems*----SC'22 ([link](https://dl.acm.org/doi/pdf/10.5555/3571885.3571959)) -11. *Xfast: Extreme File Attribute Stat Acceleration for Lustre*----SC'23 ([link](https://dl.acm.org/doi/10.1145/3581784.3607080)) [slides](http://lustrefs.cn/wp-content/uploads/2023/11/CLUG2023_12_Emoly_Liu_Qian_Yingjin_Xfast_Extreme_File_Attribute_Stat_Acceleration_for_Lustre.pdf) -12. *The I/O Trace Initiative: Building a Collaborative I/O Archive to Advance HPC*----SC-workshop'23 ([link](https://salkhordeh.de/publication/trace-pdsw/trace-pdsw.pdf)) -13. *Combining Buffered I/O and Direct I/O in Distributed File Systems*----FAST'24 ([link](https://www.usenix.org/system/files/fast24-qian.pdf)) [slides](https://www.usenix.org/system/files/fast24_slides-qian.pdf) [summary](https://yzr95924.github.io/paper_summary/Lustre_BIO_DIO-FAST'24.html) +3. *Tips and Tricks for Diagnosing Lustre Problems on Cray Systems*----CUG'11 ([link](https://cug.org/5-publications/proceedings_attendee_lists/CUG11CD/pages/1-program/final_program/Wednesday/12A-Spitz-Paper.pdf)) +4. 
*Lustre Resiliency: Understanding Lustre Message Loss and Tuning for Resiliency*----CUG'15 ([link](https://cug.org/proceedings/cug2015_proceedings/includes/files/pap101.pdf)) +5. *Taking back control of HPC file systems with Robinhood Policy Engine*----arxiv'15 ([link](https://arxiv.org/abs/1505.01448)) +6. *Lustre Lockahead: Early Experience and Performance using Optimized Locking*----CUG'17 ([link](https://cug.org/proceedings/cug2017_proceedings/includes/files/pap141s2-file1.pdf)) +7. *LPCC: Hierarchical Persistent Client Caching for Lustre*----SC'19 ([link](https://dl.acm.org/doi/pdf/10.1145/3295500.3356139)) [slides](https://sc19.supercomputing.org/proceedings/tech_paper/tech_paper_files/pap112s5.pdf) +8. *A Performance Study of Lustre File System Checker: Bottlenecks and Potentials*----MSST'19 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8890077&casa_token=uy7uU5C8DQ4AAAAA:9Sp-zG-QWKhgkn5QkmpxDTuHmGljhJJEoq_c9bzVSYb9gUD5eXk2orJYhnvLdQE0HY3RaIRG_9zDYA)) +9. *I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning*----ICPP'19 ([link](https://dl.acm.org/doi/pdf/10.1145/3337821.3337902)) +10. *HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers*----FAST'23 ([link](https://www.usenix.org/system/files/fast23-he.pdf)) +11. *Accelerating I/O performance of ZFS-based Lustre file system in HPC environment*----Journal of Supercomputing'23 ([link](https://link.springer.com/article/10.1007/s11227-022-04966-7)) +12. *MetaWBC: POSIX-compliant Metadata Write-back Caching for Distributed File Systems*----SC'22 ([link](https://dl.acm.org/doi/pdf/10.5555/3571885.3571959)) +13. *Xfast: Extreme File Attribute Stat Acceleration for Lustre*----SC'23 ([link](https://dl.acm.org/doi/10.1145/3581784.3607080)) [slides](http://lustrefs.cn/wp-content/uploads/2023/11/CLUG2023_12_Emoly_Liu_Qian_Yingjin_Xfast_Extreme_File_Attribute_Stat_Acceleration_for_Lustre.pdf) +14. 
*The I/O Trace Initiative: Building a Collaborative I/O Archive to Advance HPC*----SC-workshop'23 ([link](https://salkhordeh.de/publication/trace-pdsw/trace-pdsw.pdf)) +15. *Combining Buffered I/O and Direct I/O in Distributed File Systems*----FAST'24 ([link](https://www.usenix.org/system/files/fast24-qian.pdf)) [slides](https://www.usenix.org/system/files/fast24_slides-qian.pdf) [summary](https://yzr95924.github.io/paper_summary/Lustre_BIO_DIO-FAST'24.html) + +## File System + +### File Fragmentation + +1. *The Effects of Filesystem Fragmentation*----OLS'06 ([link](https://www.landley.net/kdocs/ols/2006/ols2006v1-pages-193-208.pdf)) +2. *Ext4 Block and Inode Allocator Improvements*----OLS'08 ([link](https://www.kernel.org/doc/ols/2008/ols2008v1-pages-263-274.pdf)) +3. *File Systems Fated for Senescence? Nonsense, Says Science!*----FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-conway.pdf)) +4. *Filesystem Aging: It's more Usage than Fullness*----HotStorage'19 ([link](https://www.cs.unc.edu/~porter/pubs/hotstorage19-paper-conway.pdf)) + +### File System Analysis + +1. *Understanding Configuration Dependencies of File Systems*----HotStorage'22 ([link](https://www.hotstorage.org/2022/camera-ready/hotstorage22-132/pdf/hotstorage22-132.pdf)) +2. *CONFD: Analyzing Configuration Dependencies of File Systems for Fun and Profit*----FAST'24 ([link](https://www.usenix.org/system/files/fast23-mahmud.pdf)) + +### Journaling + +1. *Journaling of Journal Is (Almost) Free*----FAST'14 ([link](https://www.usenix.org/system/files/conference/fast14/fast14-paper_shen.pdf)) +1. *iJournaling: Fine-Grained Journaling for Improving the Latency of Fsync System Call*----USENIX ATC'17 ([link](https://www.usenix.org/system/files/conference/atc17/atc17-park.pdf)) +1. *FastCommit: Resource-efficient, Performant and Cost-effective File System Journaling*----USENIX ATC'24 ([link](https://www.usenix.org/system/files/atc24-shirwadkar.pdf)) + +### Page Cache + +1. 
*StreamCache: Revisiting Page Cache for File Scanning on Fast Storage Devices*----USENIX ATC'24 ([link](https://www.usenix.org/system/files/atc24-li-zhiyue.pdf))
+
+### System Design
+
+1. *The Linear Tape File System*----MSST'10 ([link](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=55becb668bc6cbf0c13b09caa92b849246c36882))
+2. *Scale and Concurrency of GIGA+: File System Directories with Millions of Files*----FAST'11 ([link](https://www.usenix.org/legacy/event/fast11/tech/full_papers/PatilNew.pdf))
+3. *F2FS: A New File System for Flash Storage*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-lee.pdf))
+4. *POSIX is Dead! Long Live... errr... What Exactly?*----HotStorage'17 ([link](https://www.fsl.cs.stonybrook.edu/docs/cosy-hotos/hotstorage17posux.pdf))
+5. *BetrFS: A Right-Optimized Write-Optimized File System*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-jannen_william.pdf))
+6. *The Full Path to Full-Path Indexing*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-zhan.pdf))
+7. *SplitFS: persistent-memory file system that reduces software overhead*----SOSP'19 ([link](https://www.cs.utexas.edu/~vijay/papers/sosp19-splitfs.pdf))
+8. *EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-gao.pdf))
+9. *How to Copy Files*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-zhan.pdf))
+10. *WineFS: a hugepage-aware file system for persistent memory that ages gracefully*----SOSP'21 ([link](https://www.cs.utexas.edu/~vijay/papers/winefs-sosp21.pdf))
+11. *LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism*----SOSP'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3477132.3483565))
+12. 
*BetrFS: A Compleat File System for Commodity SSDs*----EuroSys'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3492321.3519571)) + +### FUSE + +1. *To FUSE or Not to FUSE: Performance of User-Space File Systems*----FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-vangoor.pdf)) +2. *Performance and Resource Utilization of FUSE User-Space File Systems*----ACM TOS'19 ([link](https://dl.acm.org/doi/10.1145/3310148)) +3. *XFUSE: An Infrastructure for Running Filesystem Services in User Space*----USENIX ATC'21 ([link](https://www.usenix.org/system/files/atc21-huai.pdf)) + +### Survey + +1. *Survey of Distributed File System Design Choices*----ACM TOS'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3465405)) + +## Storage + AI + +### LLM in Storage + +1. *Can Modern LLMs Tune and Configure LSM-based Key-Value Stores?*----HotStorage'24 ([link](https://asu-idi.github.io/publications/files/HS24_GPT_Project.pdf)) diff --git a/paper_figure/image-20240805014222417.png b/paper_figure/image-20240805014222417.png new file mode 100644 index 0000000..4eeb080 Binary files /dev/null and b/paper_figure/image-20240805014222417.png differ diff --git a/paper_figure/image-20240807011652739.png b/paper_figure/image-20240807011652739.png new file mode 100644 index 0000000..64b8a00 Binary files /dev/null and b/paper_figure/image-20240807011652739.png differ diff --git a/storage_paper_note/deduplication/post_dedup/IDEA-FAST'24.md b/storage_paper_note/deduplication/post_dedup/IDEA-FAST'24.md new file mode 100644 index 0000000..532673a --- /dev/null +++ b/storage_paper_note/deduplication/post_dedup/IDEA-FAST'24.md @@ -0,0 +1,195 @@ +--- +typora-copy-images-to: ../paper_figure +--- +# Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index + +| Venue | Category | +| :------------------------: | :------------------: | +| FAST'24 | Deduplicated System Design, Post-Deduplication Management | +[TOC] + +## 1. 
Summary
+### Motivation of this paper
+
+- motivation
+  - indexing deduplicated data might result in extreme inefficiencies
+    - index size
+      - proportional to the logical data size, **regardless of its deduplication ratio**
+      - each term must point to all the files containing it, **even if the files' content is almost identical**
+    - index creation overhead
+      - random and redundant accesses to the physical chunks
+  - **term indexing** is not supported by any deduplicating storage system
+    - focus on **textual data**
+    - VMware vSphere and Commvault only support file indexing
+      - identifies individual files within a backup based on metadata
+    - Dell-EMC Data Protection Search
+      - support full content indexing
+      - warn: processing the full content of a large number of files can be **time consuming**
+      - recommend performing targeted indexing on **specific backups and file types**
+- challenge
+  - two separate trends
+    - the growing need to process **cold data** (e.g., old backups)
+      - e.g., full-system scans, keyword searches --> deduplication-aware search
+    - the growing application of deduplication on primary storage of hot and warm data
+      - e.g., perform single-term searches for files within a deduplicated personal workstation
+  - indexing software at the file-system level --> **unaware** of the underlying deduplication at the storage system
+    - index size
+      - increase --> increase the latency of lookups
+    - index time
+      - scan all files in the system --> random IOs, high read amplification
+    - split terms
+      - chunking process will likely split the incoming data into chunks (at **arbitrary positions**)
+      - splitting words between adjacent chunks
+
+### IDEA
+
+- ![image-20240321002025742](./../paper_figure/image-20240321002025742.png)
+
+- key idea
+  - map terms to the unique physical chunks they appear in
+    - instead of the logical documents (disproportionately high)
+  - replace term-to-file mapping with
+    - term-to-chunk map
+    - chunk-to-file map (file ID)
+  - only need to modify the chunking process in the deduplication system
+    - **white-space aware** --> enforce chunk boundaries only between words
+- white-space aligned chunking
+  - content-defined chunking
+    - **continue scanning** the following characters until a white-space character is encountered
+  - fixed-size chunking
+    - **backward scanning** this chunk until a white-space character is encountered
+    - resulting chunks are always smaller than the fixed size --> can be stored in a single block
+      - can trim the block in memory to chunk boundary
+  - non-textual content
+    - applies only to chunking of **textual content**
+    - identify textual content by the file extension of the incoming data
+      - .c, .h, and .htm
+    - add a Boolean field to the metadata of each chunk in the file recipe and container
+      - only process chunks marked as textual
+- term-to-chunk mapping
+  - number of documents in the index --> number of physical chunks
+    - might be higher than the number of logical files
+  - chunks are **read sequentially**, each chunk is processed only once
+    - processing chunks is easily parallelizable
+  - lookup
+    - return the fingerprints of the chunks this term appears in
+- chunk-to-file mapping
+  - two complementing maps
+    - chunk-to-file map
+      - chunk fingerprint --> file IDs
+    - file-to-path map
+      - file IDs --> file's full pathname
+    - created from the metadata in the file recipe
+- keyword/term lookup
+  - step-1: yield the fingerprints of all the relevant chunks
+  - step-2: a series of lookups in the chunk-to-file map
+    - retrieves the IDs of all files containing these chunks
+  - step-3: a lookup of each file ID in the file-to-path map
+    - returns the final list of file names
+- ranking results
+  - extend IDEA to support document ranking with the TF-IDF metric
+
+### Implementation and Evaluation
+
+- implementation
+  - LucenePlusPlus + Destor
+    - use Lucene term-to-doc map
+    - ![image-20240321204347685](./../paper_figure/image-20240321204347685.png)
+  - scan all file recipes from Destor
+    - create the list of files containing each chunk using a key-value store
+  - use an SSD for the data structures which are external to Lucene
+- experimental setup
+  - trace
+    - ![image-20240321210826877](./../paper_figure/image-20240321210826877.png)
+  - hardware
+    - maps of all index alternatives were stored on a separate HDD
+    - chunk-to-file and file-to-path maps of IDEA were stored on an SSD
+- evaluation
+  - baseline
+    - traditional deduplication-oblivious indexing (Naive)
+  - indexing time
+    - the reduction is proportional to the **deduplication ratio**
+    - recipe-processing time is negligible compared to the chunk-processing time
+    - indexing time of IDEA is shorter than that of Naive by 49% to 76%
+  - index size
+    - Naive must record more files for all the terms included in them
+    - IDEA's additional information is recorded per chunk, not per term
+  - lookup times
+    - faster than Naive by up to 82%
+    - smaller size of its term-to-doc map
+      - incurs shorter lookup latency
+  - IDEA overhead
+    - IDEA has no advantage when compared to deduplication-oblivious indexing
+      - additional layer of indirection incurs **non-negligible overheads**, which are masked where the deduplication ratio is sufficiently high
+
+## 2. Strength (Contributions of the paper)
+
+- first design of a deduplication-aware term index
+- implementation of IDEA on Lucene
+  - open-source single-node inverted index used by Elasticsearch
+- extensive evaluation
+
+## 3. Weakness (Limitations of the paper)
+
+- trace is not very large
+- files containing compressed text (.pdf, .docx)
+  - their textual content can only be processed after the file is opened by a suitable application or converted by a dedicated tool
+  - individual chunks cannot be processed during offline index creation
+
+## 4. Some Insights (Future work)
+
+- deduplication scenarios
+  - backup and archival systems
+    - log-structured manner: chunk --> containers
+    - content-defined chunking
+  - primary (non-backup) storage systems and appliances
+    - support direct access to individual chunks
+    - fixed-sized chunking
+      - align the deduplicated chunks with the storage interface
+- deduplication data management
+  - implicit sharing of content between files complicates the following: it transforms logically-sequential data accesses into random IOs in the underlying physical media
+    - GC
+    - load balancing between volumes
+    - caching
+    - charge-back
+- term indexing: **term-to-file** indexing (map)
+  - ![image-20240321001530743](./../paper_figure/image-20240321001530743.png)
+  - return the files containing **a keyword** or **term**
+  - search engines, data analytics
+    - searched data might be deduplicated
+  - e.g. Elasticsearch
+    - built on top of the single-node Apache Lucene
+    - based on a hierarchy of skip-lists
+    - other variations
+      - Amazon OpenSearch, IBM Watson
+  - keyword: any searchable strings (natural language words)
+  - query
+    - the list of files containing this keyword
+    - optional: byte offsets in which the term appears
+  - indexing creation
+    - collect the documents
+    - identify the terms within each document
+    - normalize the terms
+    - create the list of documents, and optionally offsets, containing each term
+  - result ranking
+    - using a **scoring formula** on each result
+      - TF-IDF
+    - ![image-20240319012231775](./../paper_figure/image-20240319012231775.png)
+- deduplication basic
+  - file recipe
+    - a list of chunks' fingerprints, their sizes
+    - restore: locate the chunk by searching in the fingerprint map or cache of its entries
+  - pack the **compressed data** into containers
+- standard storage functionality
+  - can be made more efficient by taking advantage of deduplicated state
diff --git a/storage_paper_note/general_storage/OC_ZNS/ZNS_SSD_Cache-HotStorage'24.md b/storage_paper_note/general_storage/OC_ZNS/ZNS_SSD_Cache-HotStorage'24.md
new file mode 100644
index 0000000..87b8a95
--- /dev/null
+++ b/storage_paper_note/general_storage/OC_ZNS/ZNS_SSD_Cache-HotStorage'24.md
@@ -0,0 +1,140 @@
+---
+typora-copy-images-to: ../paper_figure
+---
+# Can ZNS SSDs be Better Storage Devices for Persistent Cache?
+
+| Venue | Category |
+| :------------------------: | :------------------: |
+| HotStorage'24 | ZNS SSDs, Cache |
+[TOC]
+
+## 1. Summary
+### Motivation of this paper
+
+- motivation
+  - existing works mainly focus on cache data on block-based regular SSDs
+    - widely used as storage backends for **persistent cache**
+  - caching workloads are **write- and update-intensive** with high capacity utilization
+    - incurs a large amount of **device-level write amplification** (WA)
+      - with many random and small writes to SSDs
+      - internal garbage collection (GC)
+    - SSD lifespan and performance issues
+  - ZNS SSDs
+    - two advantages
+      - need much lower internal over-provisioning --> larger capacity
+        - a better overall cache hit ratio
+      - new interfaces --> potential to reduce WA
+- problem
+  - explore three possible schemes to adapt the existing persistent cache system on ZNS SSDs
+  - utilize **CacheLib** as a general cache framework
+
+### ZNS SSDs in Persistent Cache
+
+- three possible schemes
+  - ![image-20240805014222417](./../paper_figure/image-20240805014222417.png)
+  - **File-Cache**
+    - run CacheLib on a ZNS-compatible file system (F2FS)
+    - FS handles all low-level operation management
+  - **Zone-Cache**
+    - directly maps the cache on-disk management unit (i.e., region) to the fixed-size zone
+    - achieve true zero WA and be GC-free
+  - **Region-Cache**
+    - a simple middle layer to translate the zone interface to the region interface
+    - needs GC to clean the zones
+- File-Cache
+  - ZNS SSD can be formatted with a compatible file system
+    - zone allocation, zone cleaning with GC, and indexing handled by FS
+  - **fully transparent** to CacheLib
+    - treat ZNS SSD like a regular device
+  - bad
+    - feasible and convenient, but will **bring noticeably high overhead**
+- Zone-Cache
+  - most of the persistent cache designs
+    - group the newly inserted cache objects into a much larger management unit (fixed-size regions)
+    - reduce WA and improve IO efficiency --> **allocating and evicting large IO units**
+  - enlarge the region size to match the zone size
+    - one region per zone
+    - when a region is evicted, the zone can be directly reset without any data migration
+  - real zero WA
+    - GC-free
+      - no OP is needed for GC
+    - no extra indexing
+      - adding one entry of zone number to the region metadata for IOs
+  - bad
+    - need to match the region to a large zone size (1077 MiB in Western Digital ZNS SSD)
+    - evicting a large region --> causes many valid or hot cache objects to be evicted
+      - impact the hit ratio
+    - need a larger region buffer in memory to cache the newly inserted objects
+      - more DRAM space
+    - long allocation time in eviction and a long filling time in insertion
+      - reducing the parallelism effectiveness
+- Region-Cache
+  - add a simple middle layer to translate region to physical zone addresses
+  - data management
+    - region ID --> in-zone addresses
+    - bitmap indicates whether the region is valid in zone
+    - 1024 MiB zone --> 16 MiB region
+  - GC
+    - use a background thread to check the empty zone number and valid data size
+    - GC threshold and the zone selection threshold are configurable
+      - depends on the workloads
+  - opens the design space to further optimize the throughput and WA
+    - co-design between cache management and zone management
+
+### Implementation and Evaluation
+
+- evaluation
+  - setting
+    - flexibility, space efficiency, performance, and WA
+    - compared with CacheLib on regular SSDs (**Block-Cache**)
+  - ZNS SSDs
+    - Western Digital Ultrastar DC ZN540 with 904 zones; the zone size is 1077 MiB
+  - regular SSD
+    - 1TiB SN540 SSD
+  - overall 
comparison + - ![image-20240807011652739](./../paper_figure/image-20240807011652739.png) + - Zone-Cache has the largest cache size (no OP) --> highest cache hit ratio + + - different OP ratio + - tradeoff between throughput and hit ratio + - higher WA --> lower throughput + + - end-to-end evaluation with RocksDB + - throughput: Region-Cache is highest, Zone-Cache is lowest + - ZNS SSDs can give a larger cache size than regular SSDs + + +## 2. Strength (Contributions of the paper) + +- ZNS SSDs persistent cache can reduce the tail latency and lower WA compared with regular SSDs +- ZNS SSDs can be better storage devices for persistent cache +- Zone-Cache can perform better in the **hit ratio** +- Region-Cache can perform better in **throughput** + +## 3. Weakness (Limitations of the paper) + +## 4. Some Insights (Future work) + +- open-channel SSDs + - separate different data streams into different channels + - relieving WA and GC penalties +- zone-based storage + - sequential write and zone-based cleaning constraints + - avoid internal GC + - GC task can be managed by the applications + - write pointer + - shift to the start by ***zone reset*** + - jump to the end of the zone by ***zone finish*** +- CacheLib + - a pluggable caching engine developed by Meta + - log-structured cache + - flash space is partitioned into regions + - each region is used to package cache objects with different sizes + - **evict entire regions** rather than individual cache objects + - region size is configurable, e.g., 16MiB + - are designed to use either + - a raw regular block device + - one large file allocated in a file system (pre-allocated file)
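The Region-Cache idea (fixed-size regions packed into zones, a per-zone validity bitmap, and threshold-triggered GC that migrates still-valid regions) can be sketched as a toy model. This is not the paper's implementation: all names, thresholds, and sizes are illustrative, actual IO is omitted, and it assumes an empty (over-provisioned) zone is always available when the active zone fills up.

```python
SLOTS = 1024 // 16  # regions per zone: 1 GiB zone, 16 MiB regions (simplified)

class RegionCache:
    """Toy Region-Cache translation layer: region ID -> (zone, slot)."""

    def __init__(self, num_zones, gc_low_watermark=1):
        self.low = gc_low_watermark                  # GC when empty zones drop below this
        self.valid = [[False] * SLOTS for _ in range(num_zones)]  # per-zone bitmap
        self.loc = {}                                # region id -> (zone, slot)
        self.rid = {}                                # (zone, slot) -> region id
        self.free = list(range(1, num_zones))        # empty zones; zone 0 starts active
        self.active, self.wp = 0, 0                  # active zone and its write pointer
        self.migrated = 0                            # regions rewritten by GC (the WA)
        self.resets = 0                              # zone resets issued

    def insert(self, region_id):
        while self.wp == SLOTS:                      # active zone full: open an empty one
            self.active, self.wp = self.free.pop(0), 0
            if len(self.free) < self.low:
                self._gc()
        z, s = self.active, self.wp
        self.valid[z][s] = True
        self.loc[region_id], self.rid[(z, s)] = (z, s), region_id
        self.wp += 1

    def evict(self, region_id):
        z, s = self.loc.pop(region_id)
        self.valid[z][s] = False
        del self.rid[(z, s)]
        if z != self.active and not any(self.valid[z]):
            self.resets += 1                         # fully invalid zone: reclaim by reset
            self.free.append(z)

    def _gc(self):
        # Victim: the non-active, non-free zone with the least valid data.
        candidates = [z for z in range(len(self.valid))
                      if z != self.active and z not in self.free]
        victim = min(candidates, key=lambda z: sum(self.valid[z]))
        for s in range(SLOTS):
            if self.valid[victim][s]:                # still-valid regions must move:
                r = self.rid.pop((victim, s))        # this rewrite is the device-level WA
                self.valid[victim][s] = False
                self.migrated += 1
                self.insert(r)
        self.resets += 1
        self.free.append(victim)
```

Zone-Cache is the degenerate case `SLOTS == 1`: every eviction leaves the zone fully invalid, so it is always reclaimed by a bare reset and `migrated` stays zero, which is the paper's zero-WA argument; Region-Cache trades some migration WA for finer-grained (16 MiB) eviction.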