---
typora-copy-images-to: ../paper_figure
---

iDedup: Latency-aware, Inline Data Deduplication for Primary Storage
------------------------------------------

| Venue | Category |
| :------------------------: | :------------------: |
| FAST'12 | Deduplication System |

[TOC]

## 1. Summary

### Motivation of this paper

- Motivation
  Many primary storage workloads are unable to leverage the benefits of deduplication
  > due to the associated latency costs.

  Prior research has not applied deduplication techniques **inline** to the request path for **latency-sensitive**, **primary workloads**.
  > inline deduplication: adds work to the write path, increasing latency
  > offline deduplication: waits for system idle time to perform deduplication
  > reads remain fragmented in both

- Disadvantages of offline deduplication
  - causes a bloat in storage usage, leading to inaccurate space accounting and provisioning
  - needs system idle time to perform deduplication without impacting foreground requests
  - uses extra disk bandwidth when reading in the staged data

- Current workloads exhibit two kinds of locality:
  > 1. spatial locality
  > 2. temporal locality

  Key question: how to trade off capacity savings against deduplication performance?

### iDedup

- Goal: do not increase the latency of the already latency-sensitive foreground operations.
  1. read path: deduplication fragments the on-disk data layout.
  2. write path: to identify duplicates, on-disk data structures must be accessed.

- Main idea
  1. Amortize the seeks caused by deduplication by performing deduplication only when a sequence of on-disk blocks is duplicated.
  > examine blocks at write time
  > configure a *minimum sequence length*
  > tradeoff: capacity savings vs. performance
  2. Maintain an in-memory fingerprint cache to detect duplicates, in lieu of any on-disk structures.
  > a completely memory-resident LRU cache
  > tradeoff: performance (hit rate) vs. capacity savings (dedup-metadata size)

- Design rationale
  1. *Spatial locality* in the data workloads: duplicated data is clustered.
  2. *Temporal locality* in the data workloads: makes the fingerprint table amenable to caching.

- System Architecture

  

  1. Cache design
     One entry per block.
     > maps the fingerprint of a block to its disk block number (DBN)
     > uses an LRU policy over (fingerprint, DBN) entries
  2. Metadata management
     In RAM:
     > Dedup-metadata cache: a pool of block entries (content-nodes)
     > Fingerprint hash table: maps a fingerprint to a DBN
     > DBN hash table: maps a DBN to its content-node
     On disk:
     > Reference count file: maintains the reference counts of deduplicated file system blocks in a file.
     >
     > > refcount updates are often collocated in the same disk blocks (thereby amortizing IOs to the refcount file)
  3. iDedup algorithm: sequence identification (see the sketch below)
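
A minimal sketch of the two pieces above (illustrative Python, not the paper's code: `FingerprintCache` stands in for the in-memory dedup-metadata structures, and the sequence-identification loop deduplicates a run of incoming blocks only when their cached DBNs form a sequential on-disk run of at least `threshold` blocks):

```python
from collections import OrderedDict

class FingerprintCache:
    """Memory-resident LRU cache mapping fingerprint -> disk block number (DBN)."""
    def __init__(self, capacity: int):
        self.capacity, self.entries = capacity, OrderedDict()

    def get(self, fp: bytes):
        dbn = self.entries.get(fp)
        if dbn is not None:
            self.entries.move_to_end(fp)      # refresh LRU position on a hit
        return dbn

    def put(self, fp: bytes, dbn: int):
        self.entries[fp] = dbn
        self.entries.move_to_end(fp)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

def identify_dedup_sequences(fingerprints, fp_cache: FingerprintCache, threshold: int):
    """Yield (start_index, dbns) for each duplicate run of length >= threshold;
    blocks outside such runs are written normally to avoid read fragmentation."""
    run = []  # consecutive (write index, DBN) hits forming a sequential run
    for i, fp in enumerate(fingerprints):
        dbn = fp_cache.get(fp)
        if dbn is not None and (not run or dbn == run[-1][1] + 1):
            run.append((i, dbn))              # a cache hit that extends the run
            continue
        if len(run) >= threshold:             # run ended: dedup only if long enough
            yield run[0][0], [d for _, d in run]
        run = [(i, dbn)] if dbn is not None else []
    if len(run) >= threshold:
        yield run[0][0], [d for _, d in run]
```

Raising `threshold` trades capacity savings for better sequentiality, which matches the evaluation trends below.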

### Implementation and Evaluation

- Evaluation
  Two tunable parameters:
  > 1. the minimum duplicate sequence threshold
  > 2. the in-memory dedup-metadata cache size

  Two comparisons:
  1. baseline: without deduplication
  2. threshold = 1: exact deduplication

  Results:
  1. Deduplication ratio vs. threshold
     As the threshold increases, the deduplication ratio drops.
  2. Disk fragmentation vs. threshold
     As the threshold increases, fragmentation decreases.
  3. Client read response time vs. threshold
     Same trend as disk fragmentation.
  4. CPU utilization vs. threshold
     Utilization increases slightly with the threshold; the iDedup algorithm has little impact on the overall utilization.
  5. Buffer cache hit rate vs. dedup-metadata cache size

## 2. Strength (Contributions of the paper)

1. This paper provides insights on the spatial and temporal locality of duplicated data in real-world, primary workloads.

## 3. Weakness (Limitations of the paper)

## 4. Some Insights (Future work)

1. This paper mentions that the higher the deduplication ratio, the higher the likelihood of fragmentation.
   > deduplication can convert sequential reads from the application into random reads from storage
2. The threshold in iDedup must be derived empirically to match the randomness in the workload.
   > depends on the workload's properties
   > open question: how to enable the system to make this tradeoff automatically
3. Primary storage system traces
   CIFS traces: NetApp (USENIX ATC'08)
---
typora-copy-images-to: ../paper_figure
---

EnclaveCache: A Secure and Scalable Key-Value Cache in Multi-tenant Clouds using Intel SGX
------------------------------------------

| Venue | Category |
| :------------------------: | :------------------: |
| Middleware'19 | SGX Storage |

[TOC]

## 1. Summary

### Motivation of this paper

- Motivation
  - In-memory key-value caches such as Redis and Memcached have been widely used to speed up web applications and reduce the burden on backend databases.
  - Data security is still a major concern, which affects the adoption of cloud caches (multi-tenant environment)
    - co-located malicious tenants
    - the untrusted cloud provider
- Limitations of existing approaches
  - virtualization and containerization technologies
    - achieve tenant isolation at the cost of system scalability and resource contention
  - property-preserving encryption to enable query processing over encrypted data
    - suffers from high computation overhead and information leakage
- Threat model
  - multiple mutually distrusting parties in a multi-tenant cloud environment
  - a privileged adversary can access the data stored outside the trusted environment
  - malicious tenants may issue spurious accesses to increase their cache hit rate and evict the data of co-located tenants from memory

### EnclaveCache

- Main idea
  - enforce data isolation among co-located tenants using multiple SGX enclaves
  - securely guard each tenant's encryption key inside its enclave
  - key question: how to utilize SGX enclaves to realize secure key-value caches within the limited trusted memory
    - remains an open question
- Key design decisions
  - tenant isolation
    - allow multiple tenants to share a single cache instance, and `each tenant gets a separate enclave as a secret container`
  - data protection
    - plaintext data only stays inside enclaves to get serialized, deserialized and processed; the data is encrypted once it leaves the enclave.
- Cache isolation
  - application container: supports unmodified applications inside enclaves (`bad scalability`)
    - e.g., SCONE
  - data container: hosts each tenant's data in a dedicated enclave (`oversubscribes the SGX resources`)
  - secret container: stores only the sensitive information as well as the critical code inside enclaves (`this paper's design`)
- Architecture
  - 
  - The TLS connection is terminated inside the enclave.
  - The **encryption engine** inside the secret enclave is responsible for encrypting the sensitive fields of the requests passed from the TLS server endpoint.
  - The encryption key used by the encryption engine is acquired by the Key Request Module (KRM) from a Key Distribution Center (KDC).
    - via SGX remote attestation
- Key distribution and management
  - Each tenant is bound to a unique *encryption key* for the encryption/decryption of the tenant's data stored outside the enclave.
  - Every newly created secret enclave has to go through the remote attestation (RA) procedure to be attested and provisioned.
  - The encryption key can be stored securely and persistently on the local disk.
    - SGX sealing mechanism
- Query processing (see the sketch below)
  - only the sensitive fields of a message, such as the key/value fields, need to be protected via encryption
  - the IV for encryption is computed from the SHA-256 hash of each sensitive field
  - the IV and the MAC are appended to the ciphertext, to be used at the time of decryption
  - bind the key and value
    - append the hash of the key to its corresponding value; encryption is then performed on the newly generated value
    - to prevent an attacker from replacing the encrypted value
  - query with the encrypted key
    - forward to the request handler
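
A minimal sketch of this query-processing scheme (illustrative Python, not the paper's code; the notes name AES-128 and SHA-256 but not the exact cipher mode, so AES-GCM is assumed here because it produces the MAC tag directly):

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def seal_field(key_128: bytes, field: bytes) -> bytes:
    """Encrypt one sensitive field (a key, or a value bound to its key)."""
    # The IV is derived from the SHA-256 hash of the field itself, so identical
    # plaintexts yield identical ciphertexts; this is what allows the cache to
    # look up an encrypted key without decrypting it.
    iv = hashlib.sha256(field).digest()[:12]               # 96-bit GCM nonce
    ct_and_mac = AESGCM(key_128).encrypt(iv, field, None)  # ciphertext || 16-byte MAC
    return iv + ct_and_mac                                 # IV and MAC travel with the data

def open_field(key_128: bytes, sealed: bytes) -> bytes:
    iv, ct_and_mac = sealed[:12], sealed[12:]
    return AESGCM(key_128).decrypt(iv, ct_and_mac, None)   # raises if the MAC check fails

def seal_pair(key_128: bytes, k: bytes, v: bytes) -> tuple[bytes, bytes]:
    # Bind key and value: append the key's hash to the value before encryption,
    # so an attacker cannot splice a valid encrypted value onto a different key.
    return seal_field(key_128, k), seal_field(key_128, v + hashlib.sha256(k).digest())
```

Note that this determinism is also what makes the hashed-key distribution visible to an adversary, one of the limitations listed in Section 3.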

### Implementation and Evaluation

- Implementation
  - mbedtls-sgx: AES-128, SHA-256
  - Tenant isolation
    - per-tenant LRU as the shared multi-tenant cache management strategy
      - the same amount of data is bound to be evicted from each tenant
    - bind each tenant to a logical database to enable the per-tenant LRU strategy (see the sketch below)
  - switchless calls to optimize performance
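
A minimal sketch of the per-tenant LRU strategy (illustrative Python, not the paper's code; class and parameter names are assumptions), where each tenant's logical database evicts independently, so one tenant's accesses cannot evict another tenant's entries:

```python
from collections import OrderedDict

class PerTenantLRU:
    def __init__(self, capacity_per_tenant: int):
        self.capacity = capacity_per_tenant
        self.dbs: dict[str, OrderedDict] = {}    # one logical database per tenant

    def get(self, tenant: str, key: bytes):
        db = self.dbs.get(tenant)
        if db is None or key not in db:
            return None
        db.move_to_end(key)                      # refresh recency within this tenant only
        return db[key]

    def put(self, tenant: str, key: bytes, value: bytes):
        db = self.dbs.setdefault(tenant, OrderedDict())
        if key in db:
            db.move_to_end(key)
        db[key] = value
        if len(db) > self.capacity:              # evict only from this tenant's database
            db.popitem(last=False)
```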

- Evaluation
  - four instances: Redis + stunnel, EnclaveCache + switchless, EnclaveCache, Graphene-SGX + Redis
  - YCSB benchmark suite
  1. throughput
  2. hotspot analysis
     - using Intel VTune Amplifier
  3. latency
     - for requests with large values, performance decreases greatly, mainly due to the increased computation overhead of cryptographic operations
  4. scalability
  5. cache fairness

## 2. Strength (Contributions of the paper)

- Leverages trusted hardware to solve the problems of **tenant isolation** and **data protection** in multi-tenant clouds.
- Adopts fine-grained, tenant-specific key-value encryption in SGX enclaves to `overcome the memory limit of SGX`.
- Extensive evaluation
  - better performance and higher scalability than running native, unmodified applications in enclaves

## 3. Weakness (Limitations of the paper)

- Issues with encrypted data stored outside the enclaves
  - malicious adversaries can delete or re-insert previous key-value pairs
  - the operation types, key access frequencies, and hashed-key distributions are also visible and exploitable

## 4. Some Insights (Future work)

- Security issues in a multi-tenant environment
  - the multi-tenant environment may expose users' sensitive data to other co-located, possibly malicious tenants
  - the cloud platform provider itself cannot be considered trusted
- SGX attack surface
  - with SGX enclaves, the attack surface is significantly reduced to only the `processor` and `the software inside enclaves`