---
typora-copy-images-to: ../paper_figure
---

iDedup: Latency-aware, Inline Data Deduplication for Primary Storage
------------------------------------------

| Venue | Category |
| :------------------------: | :------------------: |
| FAST'12 | Deduplication System |

[TOC]

## 1. Summary

### Motivation of this paper

- Motivation
  Many primary storage workloads are unable to leverage the benefits of deduplication
  > due to the associated latency costs.

  Prior research has not applied deduplication techniques **inline** to the request path for **latency-sensitive**, **primary workloads**.
  > inline deduplication: adds work to the write path, increasing latency
  > offline deduplication: waits for system idle time to perform deduplication
  > reads remain fragmented in both

- Disadvantages of offline deduplication
  - causes a bloat in storage usage, leading to inaccurate space accounting and provisioning
  - needs system idle time to perform deduplication without impacting foreground requests
  - uses extra disk bandwidth when reading in the staged data

- Current workloads exhibit two kinds of locality:
  > 1. spatial locality
  > 2. temporal locality

  Key question: how to trade off capacity savings against deduplication performance?

### iDedup

- Goal: do not increase the latency of the already latency-sensitive foreground operations.
  1. read path: deduplication fragments the on-disk data layout.
  2. write path: to identify duplicates, on-disk data structures must be accessed.

- Main idea
  1. Amortize the seeks caused by deduplication by performing deduplication only when a sequence of on-disk blocks is duplicated.
  > examine blocks at write time
  > configure a *minimum sequence length*
  > tradeoff: capacity savings vs. performance
  2. Maintain an in-memory fingerprint cache to detect duplicates, in lieu of any on-disk structures.
  > a completely memory-resident LRU cache
  > tradeoff: performance (hit rate) vs. capacity savings (dedup-metadata size)

- Design rationale
  1. *Spatial locality* in the data workloads: duplicated data is clustered.
  2. *Temporal locality* in the data workloads: makes the fingerprint table amenable to caching.

- System Architecture

  

  1. Cache design
     One entry per block.
     > maps the fingerprint of a block to its disk block number (DBN)
     > uses an LRU policy over (fingerprint, DBN) entries
  2. Metadata management
     In RAM:
     > Dedup-metadata cache: a pool of block entries (content-nodes)
     > Fingerprint hash table: maps a fingerprint to a DBN
     > DBN hash table: maps a DBN to its content-node
     On disk:
     > Reference count file: maintains the reference counts of deduplicated file system blocks in a file.
     >
     > > refcount updates are often collocated in the same disk blocks (thereby amortizing IOs to the refcount file)
  3. iDedup algorithm: sequence identification (see the sketch below)
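
A minimal sketch of the two pieces above (illustrative Python, not the paper's code: `FingerprintCache` stands in for the in-memory dedup-metadata structures, and the sequence-identification loop deduplicates a run of incoming blocks only when their cached DBNs form a sequential on-disk run of at least `threshold` blocks):

```python
from collections import OrderedDict

class FingerprintCache:
    """Memory-resident LRU cache mapping fingerprint -> disk block number (DBN)."""
    def __init__(self, capacity: int):
        self.capacity, self.entries = capacity, OrderedDict()

    def get(self, fp: bytes):
        dbn = self.entries.get(fp)
        if dbn is not None:
            self.entries.move_to_end(fp)      # refresh LRU position on a hit
        return dbn

    def put(self, fp: bytes, dbn: int):
        self.entries[fp] = dbn
        self.entries.move_to_end(fp)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

def identify_dedup_sequences(fingerprints, fp_cache: FingerprintCache, threshold: int):
    """Yield (start_index, dbns) for each duplicate run of length >= threshold;
    blocks outside such runs are written normally to avoid read fragmentation."""
    run = []  # consecutive (write index, DBN) hits forming a sequential run
    for i, fp in enumerate(fingerprints):
        dbn = fp_cache.get(fp)
        if dbn is not None and (not run or dbn == run[-1][1] + 1):
            run.append((i, dbn))              # a cache hit that extends the run
            continue
        if len(run) >= threshold:             # run ended: dedup only if long enough
            yield run[0][0], [d for _, d in run]
        run = [(i, dbn)] if dbn is not None else []
    if len(run) >= threshold:
        yield run[0][0], [d for _, d in run]
```

Raising `threshold` trades capacity savings for better sequentiality, which matches the evaluation trends below.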

### Implementation and Evaluation

- Evaluation
  Two tunable parameters:
  > 1. the minimum duplicate sequence threshold
  > 2. the in-memory dedup-metadata cache size

  Two comparisons:
  1. baseline: without deduplication
  2. threshold = 1: exact deduplication

  Results:
  1. Deduplication ratio vs. threshold
     As the threshold increases, the deduplication ratio drops.
  2. Disk fragmentation vs. threshold
     As the threshold increases, fragmentation decreases.
  3. Client read response time vs. threshold
     Same trend as disk fragmentation.
  4. CPU utilization vs. threshold
     Utilization increases slightly with the threshold; the iDedup algorithm has little impact on the overall utilization.
  5. Buffer cache hit rate vs. dedup-metadata cache size

## 2. Strength (Contributions of the paper)

1. This paper provides insights on the spatial and temporal locality of duplicated data in real-world, primary workloads.

## 3. Weakness (Limitations of the paper)

## 4. Some Insights (Future work)

1. This paper mentions that the higher the deduplication ratio, the higher the likelihood of fragmentation.
   > deduplication can convert sequential reads from the application into random reads from storage
2. The threshold in iDedup must be derived empirically to match the randomness in the workload.
   > depends on the workload's properties
   > open question: how to enable the system to make this tradeoff automatically
3. Primary storage system traces
   CIFS traces: NetApp (USENIX ATC'08)
---
typora-copy-images-to: ../paper_figure
---

EnclaveCache: A Secure and Scalable Key-Value Cache in Multi-tenant Clouds using Intel SGX
------------------------------------------

| Venue | Category |
| :------------------------: | :------------------: |
| Middleware'19 | SGX Storage |

[TOC]

## 1. Summary

### Motivation of this paper

- Motivation
  - In-memory key-value caches such as Redis and Memcached have been widely used to speed up web applications and reduce the burden on backend databases.
  - Data security is still a major concern, which affects the adoption of cloud caches (multi-tenant environment)
    - co-located malicious tenants
    - the untrusted cloud provider
- Limitations of existing approaches
  - virtualization and containerization technologies
    - achieve tenant isolation at the cost of system scalability and resource contention
  - property-preserving encryption to enable query processing over encrypted data
    - suffers from high computation overhead and information leakage
- Threat model
  - multiple mutually distrusting parties in a multi-tenant cloud environment
  - a privileged adversary can access the data stored outside the trusted environment
  - malicious tenants may issue spurious accesses to increase their cache hit rate and evict the data of co-located tenants from memory

### EnclaveCache

- Main idea
  - enforce data isolation among co-located tenants using multiple SGX enclaves
  - securely guard each tenant's encryption key inside its enclave
  - key question: how to utilize SGX enclaves to realize secure key-value caches within the limited trusted memory
    - remains an open question
- Key design decisions
  - tenant isolation
    - allow multiple tenants to share a single cache instance, and `each tenant gets a separate enclave as a secret container`
  - data protection
    - plaintext data only stays inside enclaves to get serialized, deserialized and processed; the data is encrypted once it leaves the enclave.
- Cache isolation
  - application container: supports unmodified applications inside enclaves (`bad scalability`)
    - e.g., SCONE
  - data container: hosts each tenant's data in a dedicated enclave (`oversubscribes the SGX resources`)
  - secret container: stores only the sensitive information as well as the critical code inside enclaves (`this paper's design`)
- Architecture
  - 
  - The TLS connection is terminated inside the enclave.
  - The **encryption engine** inside the secret enclave is responsible for encrypting the sensitive fields of the requests passed from the TLS server endpoint.
  - The encryption key used by the encryption engine is acquired by the Key Request Module (KRM) from a Key Distribution Center (KDC).
    - via SGX remote attestation
- Key distribution and management
  - Each tenant is bound to a unique *encryption key* for the encryption/decryption of the tenant's data stored outside the enclave.
  - Every newly created secret enclave has to go through the remote attestation (RA) procedure to be attested and provisioned.
  - The encryption key can be stored securely and persistently on the local disk.
    - SGX sealing mechanism
- Query processing (see the sketch below)
  - only the sensitive fields of a message, such as the key/value fields, need to be protected via encryption
  - the IV for encryption is computed from the SHA-256 hash of each sensitive field
  - the IV and the MAC are appended to the ciphertext, to be used at the time of decryption
  - bind the key and value
    - append the hash of the key to its corresponding value; encryption is then performed on the newly generated value
    - to prevent an attacker from replacing the encrypted value
  - query with the encrypted key
    - forward to the request handler
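
A minimal sketch of this query-processing scheme (illustrative Python, not the paper's code; the notes name AES-128 and SHA-256 but not the exact cipher mode, so AES-GCM is assumed here because it produces the MAC tag directly):

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def seal_field(key_128: bytes, field: bytes) -> bytes:
    """Encrypt one sensitive field (a key, or a value bound to its key)."""
    # The IV is derived from the SHA-256 hash of the field itself, so identical
    # plaintexts yield identical ciphertexts; this is what allows the cache to
    # look up an encrypted key without decrypting it.
    iv = hashlib.sha256(field).digest()[:12]               # 96-bit GCM nonce
    ct_and_mac = AESGCM(key_128).encrypt(iv, field, None)  # ciphertext || 16-byte MAC
    return iv + ct_and_mac                                 # IV and MAC travel with the data

def open_field(key_128: bytes, sealed: bytes) -> bytes:
    iv, ct_and_mac = sealed[:12], sealed[12:]
    return AESGCM(key_128).decrypt(iv, ct_and_mac, None)   # raises if the MAC check fails

def seal_pair(key_128: bytes, k: bytes, v: bytes) -> tuple[bytes, bytes]:
    # Bind key and value: append the key's hash to the value before encryption,
    # so an attacker cannot splice a valid encrypted value onto a different key.
    return seal_field(key_128, k), seal_field(key_128, v + hashlib.sha256(k).digest())
```

Note that this determinism is also what makes the hashed-key distribution visible to an adversary, one of the limitations listed in Section 3.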

### Implementation and Evaluation

- Implementation
  - mbedtls-sgx: AES-128, SHA-256
  - Tenant isolation
    - per-tenant LRU as the shared multi-tenant cache management strategy
      - the same amount of data is bound to be evicted from each tenant
    - bind each tenant to a logical database to enable the per-tenant LRU strategy (see the sketch below)
  - switchless calls to optimize performance
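
A minimal sketch of the per-tenant LRU strategy (illustrative Python, not the paper's code; class and parameter names are assumptions), where each tenant's logical database evicts independently, so one tenant's accesses cannot evict another tenant's entries:

```python
from collections import OrderedDict

class PerTenantLRU:
    def __init__(self, capacity_per_tenant: int):
        self.capacity = capacity_per_tenant
        self.dbs: dict[str, OrderedDict] = {}    # one logical database per tenant

    def get(self, tenant: str, key: bytes):
        db = self.dbs.get(tenant)
        if db is None or key not in db:
            return None
        db.move_to_end(key)                      # refresh recency within this tenant only
        return db[key]

    def put(self, tenant: str, key: bytes, value: bytes):
        db = self.dbs.setdefault(tenant, OrderedDict())
        if key in db:
            db.move_to_end(key)
        db[key] = value
        if len(db) > self.capacity:              # evict only from this tenant's database
            db.popitem(last=False)
```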

- Evaluation
  - four instances: Redis + stunnel, EnclaveCache + switchless, EnclaveCache, Graphene-SGX + Redis
  - YCSB benchmark suite
  1. throughput
  2. hotspot analysis
     - using Intel VTune Amplifier
  3. latency
     - for requests with large values, performance decreases greatly, mainly due to the increased computation overhead of cryptographic operations
  4. scalability
  5. cache fairness

## 2. Strength (Contributions of the paper)

- Leverages trusted hardware to solve the problems of **tenant isolation** and **data protection** in multi-tenant clouds.
- Adopts fine-grained, tenant-specific key-value encryption in SGX enclaves to `overcome the memory limit of SGX`.
- Extensive evaluation
  - better performance and higher scalability than running native, unmodified applications in enclaves

## 3. Weakness (Limitations of the paper)

- Issues with encrypted data stored outside the enclaves
  - malicious adversaries can delete or re-insert previous key-value pairs
  - the operation types, key access frequencies, and hashed-key distributions are also visible and exploitable

## 4. Some Insights (Future work)

- Security issues in a multi-tenant environment
  - the multi-tenant environment may expose users' sensitive data to other co-located, possibly malicious tenants
  - the cloud platform provider itself cannot be considered trusted
- SGX attack surface
  - with SGX enclaves, the attack surface is significantly reduced to only the `processor` and `the software inside enclaves`