Commit 26d9a25: update
yzr95924 committed Jun 1, 2021 (1 parent 4531dcd)
Showing 17 changed files with 1,238 additions and 15 deletions.
README.md (30 changes: 15 additions & 15 deletions)
12. *SmartDedup: Optimizing Deduplication for Resource-constrained Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-yang-qirui.pdf))
13. Can't We All Get Along? Redesigning Protection Storage for Modern Workloads----USENIX ATC'18 ([link](https://www.usenix.org/system/files/conference/atc18/atc18-allu.pdf)) [summary](https://yzr95924.github.io/paper_summary/Redesigning-ATC'18.html)
14. *Deduplication in SSDs: Model and quantitative analysis*----MSST'12 ([link](https://ieeexplore.ieee.org/document/6232379))
16. *iDedup: Latency-aware, Inline Data Deduplication for Primary Storage*----FAST'12 ([link]( https://www.usenix.org/legacy/event/fast12/tech/full_papers/Srinivasan.pdf )) [summary](https://yzr95924.github.io/paper_summary/iDedup-FAST'12.html)
17. *DupHunter: Flexible High-Performance Deduplication for Docker Registries*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-zhao.pdf))
18. Design Tradeoffs for Data Deduplication Performance in Backup Workloads----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-fu.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupDesignTradeoff-FAST'15.html)
19. The Dilemma between Deduplication and Locality: Can Both be Achieved?----FAST'21 ([link](https://www.usenix.org/system/files/fast21-zou.pdf)) [summary](https://yzr95924.github.io/paper_summary/MFDedup-FAST'21.html)
7. *On Information Leakage in Deduplication Storage Systems*----CCS Workshop'16 [summary](https://yzr95924.github.io/paper_summary/InformationLeakage-CCSW'16.html)
8. *SecDep: A User-Aware Efficient Fine-Grained Secure Deduplication Scheme with Multi-Level Key Management*----MSST'15 ([link](https://cswxia.github.io/SecDep-final-2015.pdf))
9. *Message-Locked Encryption and Secure Deduplication*----EuroCrypt'13 [summary](https://yzr95924.github.io/paper_summary/MLE-EuroCrypto'13.html)
10. *Proofs of Ownership in Remote Storage System*----CCS'11 ([link](https://dl.acm.org/doi/pdf/10.1145/2046707.2046765))
11. *Tapping the Potential: Secure Chunk-based Deduplication of Encrypted Data for Cloud Backup*----CNS'18 [summary](https://yzr95924.github.io/paper_summary/TappingPotential-CNS'18.html)
12. *A Bandwidth-Efficient Middleware for Encrypted Deduplication*----DSC'18 [summary](https://yzr95924.github.io/paper_summary/UWare-DSC'18.html)
13. *Bloom Filter Based Privacy Preserving Deduplication System*----Springer International Conference on Security & Privacy'19 ([link](https://link.springer.com/chapter/10.1007/978-981-13-7561-3_2)) [summary](https://yzr95924.github.io/paper_summary/BloomFilterDedup-ICSP'19.html)

### Computation Deduplication

1. *Secure Deduplication of General Computations*


### Metadata Management
1. *UKSM: Swift Memory Deduplication via Hierarchical and Adaptive Memory Region Distilling*----FAST'18 [summary](https://yzr95924.github.io/paper_summary/UKSM-FAST'18.html)
2. *Using Hints to Improve Inline Block-Layer Deduplication*----FAST'16 [summary](https://yzr95924.github.io/paper_summary/HintsDeduplication-FAST'16.html)
3. *XLM: More Effective Memory Deduplication Scanners through Cross-Layer Hints*----USENIX ATC'13
4. *OrderMergeDedup: Efficient, Failure-Consistent Deduplication on Flash*----FAST'16 ([link](https://www.usenix.org/system/files/conference/fast16/fast16-papers-chen-zhuan.pdf))

### Data Chunking
1. *SS-CDC: A Two-stage Parallel Content-Defined Chunking for Deduplicating Backup Storage*----SYSTOR'19 ([link](http://ranger.uta.edu/~sjiang/pubs/papers/ni19-ss-cdc.pdf)) [summary](https://yzr95924.github.io/paper_summary/SSCDC-SYSTOR'19.html)
3. *A Scalable Inline Cluster Deduplication Framework for Big Data Protection*----Middleware'12 ([link](https://hal.inria.fr/hal-01555548/document))
4. *Tradeoffs in Scalable Data Routing for Deduplication Clusters*----FAST'11 ([link](https://www.usenix.org/legacy/events/fast11/tech/full_papers/Dong.pdf)) [summary]( https://yzr95924.github.io/paper_summary/TradeoffDataRouting-FAST'11.html )
5. *Cluster and Single-Node Analysis of Long-Term Deduplication Patterns*----ToS'18 ([link](https://dl.acm.org/doi/pdf/10.1145/3183890)) [summary](https://yzr95924.github.io/paper_summary/ClusterSingle-ToS'18.html)
6. *Decentralized Deduplication in SAN Cluster File Systems*----FAST'09 ([link](https://static.usenix.org/events/usenix09/tech/full_papers/clements/clements.pdf))

## B. Erasure Coding


1. *NEXUS: Practical and Secure Access Control on Untrusted Storage Platforms using Client-side SGX*----DSN'19 ([link](https://people.cs.pitt.edu/~adamlee/pubs/2019/djoko2019dsn-nexus.pdf))
2. *Securing the Storage Data Path with SGX Enclaves*----arxiv'18 ([link](https://arxiv.org/abs/1806.10883)) [summary](https://yzr95924.github.io/paper_summary/StorageDataPathSGX-arxiv.html)
3. *EnclaveDB: A Secure Database using SGX*----S&P'18 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8418608))
4. *Isolating Operating System Components with Intel SGX*----SysTEX'16 ([link](https://faui1-files.cs.fau.de/filepool/projects/sgx-kernel/sgx-kernel.pdf))
5. *SPEICHER: Securing LSM-based Key-Value Stores using Shielded Execution*----FAST'19 ([link](https://www.usenix.org/system/files/fast19-bailleu.pdf)) [summary](https://yzr95924.github.io/paper_summary/SPEICHER-FAST'19.html)
6. *ShieldStore: Shielded In-memory Key-Value Storage with SGX*----EUROSYS'19 ([link]( http://calab.kaist.ac.kr:8080/~jhuh/papers/kim_eurosys19_shieldst.pdf )) [summary](https://yzr95924.github.io/paper_summary/ShieldStore-EuroSys'19.html)
1. *A Privacy-Preserving Defense Mechanism Against Request Forgery Attacks*----TrustCom'11 ([link](https://www.cse.cuhk.edu.hk/~pclee/www/pubs/trustcom11.pdf)) [summary]( https://yzr95924.github.io/paper_summary/DeRef-TrustCom'11.html )

## D. General Storage
### Cloud Storage System
1. *Kurma: Secure Geo-Distributed Multi-Cloud Storage Gateways*----SYSTOR'19 [summary](https://yzr95924.github.io/paper_summary/Kurma-SYSTOR'19.html)
2. *SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services*----SOSP'13 [summary](https://yzr95924.github.io/paper_summary/SPANStore-SOSP'13.html)
3. *CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Service*----NSDI'15
4. *A Day Late and a Dollar Short: The Case for Research on Cloud Billing Systems*----HotCloud'14
5. *Cumulus: Filesystem Backup to the Cloud*----FAST'09 ([link](https://www.usenix.org/legacy/event/fast09/tech/full_papers/vrable/vrable.pdf)) [summary](https://yzr95924.github.io/paper_summary/Cumulus-FAST'09.html)
6. *Ceph: A Scalable, High-Performance Distributed File System*----OSDI'06
7. *The Hadoop Distributed File System*----MSST'10 ([link](http://storageconference.us/2010/Papers/MSST/Shvachko.pdf)) [summary](https://yzr95924.github.io/paper_summary/HDFS-MSST'10.html)
8. *RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters*----PDSW'07
9. *CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data*----SC'06
10. *MapReduce: Simplified Data Processing on Large Clusters*----OSDI'04
11. *The Google File System*----SOSP'03
12. *Bigtable: A Distributed Storage System for Structured Data*----OSDI'06

### New PAXOS

1. *In Search of an Understandable Consensus Algorithm*----USENIX ATC'14


### Cache

1. *TinyLFU: A Highly Efficient Cache Admission Policy*----ACM ToS'17
---
typora-copy-images-to: ../paper_figure
---
iDedup: Latency-aware, Inline Data Deduplication for Primary Storage
------------------------------------------
| Venue | Category |
| :------------------------: | :------------------: |
| FAST'12 | Deduplication System |
[TOC]

## 1. Summary
### Motivation of this paper
- Motivation
Many primary storage workloads are unable to leverage the benefits of deduplication
> due to the associated latency costs.
Prior research has not applied deduplication techniques **inline** to the request path for **latency-sensitive**, **primary workloads**.
> inline deduplication: add work to the write path, increase latency
> offline deduplication: wait for system idle time to do deduplication.
> reads remain fragmented in both.
- Disadvantages of offline deduplication
  - causes storage bloat, leading to inaccurate space accounting and provisioning
  - needs system idle time to perform deduplication without impacting foreground requests
  - uses extra disk bandwidth when reading back the staged data

- Two insights from current workloads:
> 1. spatial locality
> 2. temporal locality
Key question: how to balance capacity savings against deduplication performance?

### iDedup
- Goal: avoid increasing the latency of the already latency-sensitive foreground operations.
1. read operation: fragmentation in data layout.
2. write operation: to identify duplicates, on-disk data structures are accessed.

- Main idea
1. Amortize the seeks caused by deduplication by only performing deduplication when a sequence of on-disk blocks is duplicated.
> examine blocks at write time
> configure a *minimum sequence length*
> tradeoff: capacity savings and performance
2. maintain an in-memory fingerprint cache to detect duplicates in lieu of any on-disk structures.
> a completely memory-resident, LRU cache.
> tradeoff: performance (hit rate) and capacity savings (dedup-metadata size)
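The in-memory fingerprint cache (idea 2 above) can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the class and method names are assumptions:

```python
from collections import OrderedDict

class FingerprintCache:
    """Memory-resident LRU cache mapping a block fingerprint to its
    disk block number (DBN), in lieu of any on-disk dedup structures."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # fingerprint -> DBN, in LRU order

    def lookup(self, fingerprint):
        """Return the DBN of a duplicate block, or None; a hit refreshes recency."""
        if fingerprint not in self.entries:
            return None
        self.entries.move_to_end(fingerprint)
        return self.entries[fingerprint]

    def insert(self, fingerprint, dbn):
        self.entries[fingerprint] = dbn
        self.entries.move_to_end(fingerprint)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
```

A larger cache raises the hit rate (more capacity savings) at the cost of more dedup-metadata memory, which is exactly the second tradeoff noted above.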
- Design rationale
1. *Spatial locality* in the data workloads
Duplicated data is clustered.

2. *Temporal locality* in the data workloads
making the fingerprint table amenable to caching


- System Architecture

![image-20200130153405675](../paper_figure/image-20200130153405675.png)

1. Cache design
One entry per block.
> maps the fingerprint of a block to its disk block number (DBN) on disk.
> use LRU policy, (fingerprint, DBN)
2. Metadata management
In RAM:
> Dedup-metadata cache: a pool of block entries (content-nodes)
> Fingerprint hash table: maps fingerprint to DBN
> DBN hash table: maps a DBN to its content-node.
On disk:
> Reference count file: maintains reference counts of deduplicated file system blocks in a file.
>
> > refcount updates are often collocated to the same disk blocks (thereby amortizing IOs to the refcount file)
3. iDedup algorithm: Sequence identification
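The sequence-identification step under the minimum-sequence-length policy can be sketched as below. This is a simplified model with assumed names (the actual algorithm runs in the write path over a file's block sequences): deduplicate only maximal runs of written blocks whose on-disk duplicates are sequential and at least `threshold` long.

```python
def select_dedup_sequences(duplicate_dbns, threshold):
    """duplicate_dbns[i] is the DBN of an on-disk duplicate of written
    block i (or None if no duplicate was found in the fingerprint cache).
    Return (start, end) index ranges to deduplicate: maximal runs whose
    duplicate DBNs are consecutive on disk and at least `threshold` long."""
    runs = []
    i, n = 0, len(duplicate_dbns)
    while i < n:
        if duplicate_dbns[i] is None:
            i += 1
            continue
        j = i
        # extend the run while the next block's duplicate is the next disk block
        while j + 1 < n and duplicate_dbns[j + 1] == duplicate_dbns[j] + 1:
            j += 1
        if j - i + 1 >= threshold:
            runs.append((i, j))  # amortize one seek over the whole sequence
        i = j + 1
    return runs
```

Runs shorter than the threshold are written as new blocks, trading some capacity savings for sequential on-disk layout.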

### Implementation and Evaluation


- Evaluation
Two tunable parameters:
> 1. the minimum duplicate sequence threshold
> 2. in-memory dedup-metadata cache size
Two comparisons:
1. baseline: without deduplication
2. threshold = 1: exact deduplication

1. Deduplication ratio vs. threshold
as the threshold increases, the deduplication ratio drops
2. Disk fragmentation vs. threshold
as the threshold increases, fragmentation decreases
3. Client read response time vs. threshold
same trend as disk fragmentation
4. CPU utilization vs. threshold
utilization increases slightly with the threshold; the iDedup algorithm has little impact on overall utilization
5. Buffer cache hit rate vs. dedup-metadata cache size

## 2. Strength (Contributions of the paper)
1. This paper provides insights on the spatial and temporal locality of duplicated data in real-world, primary workloads.

## 3. Weakness (Limitations of the paper)


## 4. Some Insights (Future work)
1. This paper mentions that the higher the deduplication ratio, the higher the likelihood of fragmentation.
> deduplication can convert sequential reads from the application into random reads from storage.
2. It mentions the threshold in iDedup must be derived empirically to match the randomness in the workload.
> depends on the workload property
> how to enable the system to automatically make this tradeoff.
3. primary storage system trace
CIFS traces: NetApp (USENIX ATC'08)
StoragePaperNote/EnclaveCache-Middleware'19.md (108 changes: 108 additions & 0 deletions)
---
typora-copy-images-to: ../paper_figure
---
EnclaveCache: A Secure and Scalable Key-Value Cache in Multi-tenant Clouds using Intel SGX
------------------------------------------
| Venue | Category |
| :------------------------: | :------------------: |
| Middleware'19 | SGX Storage |
[TOC]

## 1. Summary
### Motivation of this paper
- Motivation
- In-memory key-value caches such as Redis and Memcached have been widely used to speed up web applications and reduce the burden on backend databases.
- Data security is still a major concern, which affects the adoption of cloud caches (multi-tenant environment)
- co-located malicious tenants
- the untrusted cloud provider
- Limitation of existing approaches
- virtualization and containerization technologies
- achieved tenant isolation at the cost of system scalability, resource contention
- adopt property-preserving encryption to enable query processing over encrypted data
- suffer from high computation overhead and information leakage
- Threat model
- multiple mutually distrusting parties in a multi-tenant cloud environment
- privileged adversary can access the data stored outside the trusted environment
- malicious tenants may issue spurious accesses to increase their own cache hit rate and evict co-located tenants' data from memory

### EnclaveCache

- Main idea
- enforce data isolation among co-located tenants using multiple SGX enclaves
- securely guard the encryption key of each tenant by the enclave
- key question: how to utilize SGX enclaves to realize secure key-value caches within the limited trusted memory
- remains an open question
- Key design decisions
- tenant isolation
- allow multiple tenants to share a single cache instance, and `each tenant gets a separate enclave as a secret container`
- data protection
- plaintext data only stays inside enclaves to get serialized, deserialized, and processed; the data is encrypted once it leaves the enclave.
- Cache isolation
- application container: supports unmodified applications inside enclaves (`bad scalability`)
- e.g., SCONE
- data container: hosting only each tenant's data in a dedicated enclave (`oversubscribe the SGX resources`)
- secret container: storing only the sensitive information as well as the critical code into enclaves (`this paper design`)
- Architecture
- ![image-20210529210747569](../paper_figure/image-20210529210747569.png)
- The TLS connection is terminated inside the enclave
- **Encryption engine** inside the secret enclave is responsible for encrypting the sensitive fields of the requests passed from the TLS server endpoint.
- The encryption key used by the encryption engine is acquired by the Key Request Module (KRM) from a Key Distribution Center (KDC).
- via SGX remote attestation
- Key distribution and management
- Each tenant is bound with a unique *encryption key* for the encryption/decryption of tenant's data stored outside the enclave.
- Every newly-created secret enclave has to go through RA procedure to be attested and provisioned
- the encryption key can be stored securely and persistently in the local disk
- SGX sealing mechanism
- Query processing
- only the sensitive fields of a message, such as the key/value field, need to be protected via encryption.
- the IV for encryption is computed from the SHA-256 hash of each sensitive field
- the IV and the MAC are appended to the ciphertext to be used at the time of decryption
- bind the key and value
- appends the hash of the key to its corresponding value, and the encryption is then performed on the newly generated value
- to prevent an attacker from replacing the encrypted value.
- query with the encrypted key
- forward to the request handler
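The query-path protection above can be sketched as follows. This is a stdlib-only illustration under stated assumptions: the paper uses AES-128 inside the enclave, while here a toy SHA-256-based keystream stands in for the cipher; the function names (`seal_value`, `open_value`) and the exact `IV || ciphertext || MAC` layout are illustrative, not the system's wire format.

```python
import hashlib
import hmac

def _keystream(key, iv, length):
    """Toy stand-in for AES: SHA-256(key || iv || counter) blocks."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal_value(enc_key, mac_key, key_field, value_field):
    """Encrypt a value bound to its key, EnclaveCache-style:
    the key's hash is appended to the value before encryption (binding),
    the IV is derived from the SHA-256 of the sensitive data (deterministic),
    and the IV and MAC travel with the ciphertext for decryption time."""
    bound = value_field + hashlib.sha256(key_field).digest()
    iv = hashlib.sha256(bound).digest()[:16]
    ct = bytes(a ^ b for a, b in zip(bound, _keystream(enc_key, iv, len(bound))))
    mac = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
    return iv + ct + mac

def open_value(enc_key, mac_key, key_field, blob):
    """Verify and decrypt; reject a ciphertext pasted under a different key."""
    iv, ct, mac = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(mac, hmac.new(mac_key, iv + ct, hashlib.sha256).digest()):
        raise ValueError("MAC verification failed")
    bound = bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, iv, len(ct))))
    value, key_hash = bound[:-32], bound[-32:]
    if key_hash != hashlib.sha256(key_field).digest():
        raise ValueError("key/value binding violated: value was replaced")
    return value
```

The binding check is what defeats the value-replacement attack: a valid ciphertext moved under another key fails the appended key-hash comparison even though its MAC still verifies.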

### Implementation and Evaluation
- Implementation
- mbedtls-sgx: AES-128, SHA-256
- Tenant isolation
- per-tenant LRU for shared multi-tenant cache management strategy
- the same amount of data is bound to be evicted from each tenant
- bind each tenant with a logical database to enable the per-tenant LRU strategy
- switchless call to optimize the performance
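The per-tenant LRU strategy can be sketched as follows (illustrative names; the real system binds each tenant to a logical database): every tenant evicts only from its own LRU list, so spurious accesses by one tenant cannot push a co-located tenant's data out of the cache.

```python
from collections import OrderedDict

class MultiTenantCache:
    """Shared cache instance with per-tenant LRU: eviction never crosses tenants."""

    def __init__(self, per_tenant_quota):
        self.quota = per_tenant_quota
        self.tenants = {}  # tenant_id -> OrderedDict(key -> value), LRU order

    def put(self, tenant_id, key, value):
        db = self.tenants.setdefault(tenant_id, OrderedDict())
        db[key] = value
        db.move_to_end(key)
        if len(db) > self.quota:
            db.popitem(last=False)  # evict this tenant's own LRU entry only

    def get(self, tenant_id, key):
        db = self.tenants.get(tenant_id)
        if db is None or key not in db:
            return None
        db.move_to_end(key)
        return db[key]
```

This is the cache-fairness property evaluated later: each tenant's hit rate depends only on its own access pattern and quota.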

- Evaluation
- four instances: redis + stunnel, EnclaveCache + switchless, EnclaveCache, Graphene-SGX + redis
- YCSB benchmark suite
- 1. throughput
- 2. hotspots analysis
- using Intel VTune amplifier
- 3. latency
- for requests with large values, performance degrades greatly, mainly due to the increased computation overhead of cryptographic operations
- 4. scalability
- 5. cache fairness




## 2. Strength (Contributions of the paper)

- leverage trusted hardware to solve the problem of **tenant isolation** and **data protection** in multi-tenant clouds.
- adopts fine-grained, tenant-specific key-value encryption in SGX enclaves to `overcome the limited trusted memory of SGX`.
- Extensive evaluation
- better performance, higher scalability than running native, unmodified applications in the enclaves

## 3. Weakness (Limitations of the paper)

- Issues of encrypted data stored outside the enclaves
- malicious adversaries can delete or re-insert previous key-value pairs
- the operation types, key access frequencies and hashed-key distributions are also visible and exploitable.

## 4. Some Insights (Future work)

- Security issues in multi-tenants environment
- the multi-tenant environment may expose users' sensitive data to the other co-located, possibly malicious tenants
- the cloud platform provider itself cannot be considered trusted
- SGX attack surface
- the attack surface with SGX enclaves is significantly reduced to only the `processor` and `the software inside enclaves`.
