diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524747449744.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1524747449744.png
deleted file mode 100644
index cf00d16..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524747449744.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524753088554.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1524753088554.png
deleted file mode 100644
index 5d3e314..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524753088554.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524908286158.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1524908286158.png
deleted file mode 100644
index ebe9578..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524908286158.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1525441410699.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1525441410699.png
deleted file mode 100644
index 6defdcb..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1525441410699.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/CRT_Chain.md b/Paper Reading Note/CRT_Chain-INFOCOM'18/CRT_Chain.md
deleted file mode 100644
index 79adfad..0000000
--- a/Paper Reading Note/CRT_Chain-INFOCOM'18/CRT_Chain.md
+++ /dev/null
@@ -1,193 +0,0 @@
----
-typora-copy-images-to: ./
----
-# On Scalable Service Function Chaining with O(1) Flowtable Entries
-@INFOCOM'18
-[TOC]
-##1. Background and Motivation
-- Motivation
-NFV offers customization capability; however, it comes at the cost of consuming precious TCAM resources.
- > The number of service chains that an SDN can support is **limited by the flowtable size of a switch**.
- > The **scalability** is a fundamental challenge of enabling such configurable service chaining.
-
-- The core of CRT-Chain is an encoding mechanism that leverages the **Chinese Remainder Theorem (CRT)** to compress the forwarding information into small labels.
-
-- VNFs can be efficiently scaled up/down and, hence, provide agility and flexibility by adapting network components to dynamic user demands.
-
-- SDN is able to deliver **elastic services** that allow traffic to go through a customized sequence of services.
-
-##2. Technical Challenges
-1. The **ambiguity** in forwarding
-An SFC may request to be served by different functions associated with the same switch in *a specific order*.
-
-2. How to reduce the label size?
-Propose a chain segmentation scheme that allows the overall label size of all the segments to be smaller than the label size of the **end-to-end** path.
-
-3. A CRT label's size is determined by the primes assigned to the functions along a chain.
-It exploits a prime assignment strategy based on the distribution of function popularity to reduce the expected label size.
-
-##3. Related Work
-Three aspects of related work are discussed:
-- Flowtable Management
-- Entry Size Reduction
-- Flowtable-Free Routing
-
-##4. The Model of SFC
-- The SFC protocol defined in **IETF RFC7665**
-1. To improve reliability and load balancing, each Service Function (SF) can have multiple instances, each associated with a different Service Function Forwarder (SFF).
-
-2. An SFF can insert/process the forwarding rules generated for an SFC and can further know how to parse the **Network Service Header (NSH)** of an SFC packet.
-
-two SFC requests: $c_1$, $c_2$
-$c_1$: $SF3 \rightarrow SF11 \rightarrow SF7$
-$c_2$: $SF3 \rightarrow SF7$
-
-3. Each SF instance along the SFP decreases the value of SI by 1 such that all the SFFs can know whether an SFC has been executed completely.
-
-4. **The problem**: the number of required flowtable entries grows **linearly** with the number of SFC requests. (*to distinguish SFCs*)
-
-##5. CRT-Chain Design
-###5.1 Overview of CRT-Chain
-Goal: to resolve the scalability issue.
-- High-level Idea of CRT-Chain: replace **per-chain** forwarding rules with **per-function** forwarding rules.
- > For all the chains requesting to be served by an SF instance, its associated SFF inserts **only one forwarding rule** for this SF instance (regardless of how many chains are assigned to it).
-
-An SFF then uses very simple modular arithmetic to extract the forwarding information directly from the labels, **without knowing which chain it belongs to**.
-
-- CRT-Chain only requires **a constant number** of flowtable entries in each SFF.
-- CRT-Chain transfers the cost of flowtable entries to the label overhead.
-
-
-###5.2 CRT-based Chaining
-- Notation:
-$F$: the set of SFF or switches
-$S$: the set of available SF types
-$C$: the set of SFC requests, $c \in C$
-$P(c)$: the routing path of $c$
-$SP(c)$: the SFP of $c$
-
-For example:
-$P(c_1) = IC \rightarrow S3 \rightarrow SFF7 \rightarrow SFF13 \rightarrow S17 \rightarrow EC$
-$SP(c_1) = SF3(SFF7) \rightarrow SF11(SFF13) \rightarrow SF7(SFF13)$
-
-- In its design, CRT-Chain encodes $P(c)$ and $SP(c)$ of each $c \in C$ into **two variable-length labels**, $X_c$ and $Y_c$ (two different labels are used).
-
-1. Each forwarder decodes $X_c$ and $Y_c$ to extract the forwarding rules for routing and SFP.
-2. The lengths of the fields $X_c$ and $Y_c$ can be set to $\log_2|X_{max}|$ and $\log_2|Y_{max}|$.
-
-- Assign each forwarder in $F$ a **unique** prime and, similarly, assign each SF type in $S$ a **unique** prime (as the ID of the forwarder (SF)); a forwarder and an SF can share the **same prime**.
- > $f \neq f'$ for all $f, f' \in F$
- > $s \neq s'$ for all $s, s' \in S$. The number of required SF primes is determined by the number of SF types $|S|$, instead of the number of SF instances.
- > $f = s$ for any $f \in F$, $s \in S$ is allowed
-
-- **Encoding and decoding $X_c$**
-Given a path $P(c)$, the path label $X_c$ should satisfy the following constraints:
-$X_c \equiv e_i \pmod{f_i(c)}, \forall 1\leq i \leq |P(c)|$ (the remainder encodes the output port)
- > $f_i(c)$ is the prime assigned to the $i$-th forwarder along the path $P(c)$
-
-According to CRT, the solution of $X_c$ can be found as follows:
- > $X_c = \left(\sum_{i=1}^{|P(c)|} w_i \cdot e_i\right) \bmod X$, where $X = \prod_{i=1}^{|P(c)|}f_i$ and $w_i = \frac{X}{f_i}\left[\left(\frac{X}{f_i}\right)^{-1} \bmod f_i\right]$ (the standard formula for solving a system of congruences)
-
-For example, a given service function chain $c_1$ traverses forwarders with primes 3, 7, 13 and 17 using output ports 1, 3, 1 and 2 respectively. Hence, the path label $X_{c_1}$ should meet the following constraints:
->$X_{c_1} \equiv 1 \pmod 3$
->$X_{c_1} \equiv 3 \pmod 7$
->$X_{c_1} \equiv 1 \pmod{13}$
->$X_{c_1} \equiv 2 \pmod{17}$
-
-This gives $X_{c_1} = 4,252$.
-
-The maximum possible label, $X_{max}$, is bounded by the product of the largest primes (this accounts for the maximum value of the solution).
-To forward an SFC packet, each forwarder decodes $X_c$ and extracts the forwarding port using its assigned prime: $e_i = X_c \bmod f_i(c)$ (each forwarder computes its output port directly).
-
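-To make this concrete, here is a minimal Python sketch (my illustration, assuming Python 3.8+ for `pow(w, -1, p)`; not the paper's implementation) that reproduces the $X_{c_1}$ example:
-
-```python
-# CRT path-label encode/decode sketch; primes and ports follow the c1 example.
-from math import prod
-
-def encode_label(primes, remainders):
-    """Solve X = r_i (mod p_i) for pairwise-coprime moduli via the CRT."""
-    X = prod(primes)
-    x = 0
-    for p, r in zip(primes, remainders):
-        w = X // p                      # product of the other moduli
-        x += r * w * pow(w, -1, p)      # w * w^{-1} = 1 (mod p)
-    return x % X
-
-primes, ports = [3, 7, 13, 17], [1, 3, 1, 2]
-X_c = encode_label(primes, ports)
-assert X_c == 4252                      # matches the example above
-# Each forwarder extracts its output port with a single modulo operation:
-assert [X_c % p for p in primes] == ports
-```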
-
-- **Encoding and decoding $Y_c$**
-1. The **difference** between SFP label $Y_c$ and path label $X_c$
-An SFF might associate with multiple SF instances and, more importantly, the order of forwarding to different SF instances matters.
-
-How to overcome this problem?
-It introduces **a step counter $N$** for each SFC packet.
-
-For example:
-c:$SF3 \rightarrow SF11 \rightarrow SF7$
->$Y_c \equiv 1 \pmod 3$
->$Y_c \equiv 2 \pmod{11}$
->$Y_c \equiv 3 \pmod 7$
-
-In Step 1: if $Y_c \bmod 3 = 1$, the SFF forwards to SF3 in *step 1*; the SF3 instance then increments the step counter $N$ by 1 (the remainder encodes the order).
-
-If none of the currently associated SF instances matches the current step counter, the SFF sends the packet out.
-
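-A small sketch of this step-counter decoding (my illustration), reusing `encode_label` from the $X_c$ sketch above; here $Y_c = 178$ solves the three congruences:
-
-```python
-sf_primes = [3, 11, 7]                   # primes of SF3, SF11, SF7
-order     = [1, 2, 3]                    # required execution order
-Y_c = encode_label(sf_primes, order)     # CRT solver from the X_c sketch; 178
-
-N = 1                                    # step counter carried in the packet
-for _ in order:
-    # An SFF forwards to a local SF instance only if its remainder matches N.
-    matches = [p for p in sf_primes if Y_c % p == N]
-    assert len(matches) == 1
-    print(f"step {N}: execute the SF with prime {matches[0]}")
-    N += 1                               # the SF instance increments N
-```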
-
-- Ensuring unique SFP forwarding (this addresses uniqueness: the simple encoding above is not always general)
-1. This simple CRT encoding cannot guarantee the **correct SFP schedule** decided by the controller. (e.g., when one SF type has multiple instances, anomalous forwarding behavior can occur)
-
-2. To avoid this ambiguity, it combines all the forwarders along $P(c)$ and all the SF instances along $SP(c)$ into a **merged path $MP(c)$** in their traversing order.
-
- e.g. $MP(c) = (1)S3 \rightarrow (2)SFF7 \rightarrow (3)SF3 \rightarrow (4)SFF13 \rightarrow (5)SF11 \rightarrow (6)SF7 \rightarrow (7)S17$ (the device at each step is also encoded into the sequence)
-
-CRT-Chain now encodes the SFP label $Y_c$ using **the step counts in $MP(c)$**, rather than the original step counts in $SP(c)$; thus, $Y_c$ should now satisfy the following constraints:
-$Y_c \equiv ix_n \pmod{s_n(c)}, \forall 1 \leq n \leq |SP(c)|$
-$ix_n$ denotes the step count (index) of $s_n(c)$ in **the merged path $MP(c)$**
-
-###5.3 Chain Segmentation
-**Problem**:
-The header overhead, i.e., $|X_c|$ and $|Y_c|$, is determined by the primes used in the congruence system.
-> e.g., larger primes lead to larger labels, $X_c$ and $Y_c$. (the header size scales up with the number of forwarders [SF types] in a network)
-
-- An intuitive solution (reduce the number of primes)
-To allow different forwarders (SFs) to use the same prime, thereby minimizing the number of primes.
-
-Given $F$ ($S$), instead of using $|F|$ ($|S|$) unique primes, it can use only $\alpha |F|$ $(0 < \alpha \leq 1)$.
-
-**This poses a new problem**:
-When some forwarders (SFs) share the same prime, a conflict arises if they happen to belong to the same path (SFP).
-
-To avoid this problem, it proposes a **segmentation technique** that partitions a path (SFP) into several sub-paths *(within each of which no two forwarders (SFs) share the same prime)*.
-
-- Partitioning a path (the idea is simple):
-1. Trace the path and check whether any $f \in P(c)$ ($s \in SP(c)$) has a prime duplicating that of any forwarder (SF) located prior to it.
-
-2. For any duplicated prime found, the path is cut at that point, making the forwarders (SFs) prior to it **a conflict-free sub-path**.
-
-For example:
-Consider an example path:
-$P(c) = f_1 \rightarrow f_2 \rightarrow f_3 \rightarrow f_4 \rightarrow f_5$
-assigned the primes:
-$5 \rightarrow 13 \rightarrow 5 \rightarrow 2 \rightarrow 7$
-It should be partitioned into two conflict-free sub-paths:
-$P_1(c) = f_1 \rightarrow f_2 = 5 \rightarrow 13$
-$P_2(c) = f_3 \rightarrow f_4 \rightarrow f_5 = 5 \rightarrow 2 \rightarrow 7$
-Those sub-labels are then concatenated together as the header in the format of
-$(N, l_{X_{c, 1}}, X_{c, 1}, l_{Y_{c, 1}}, Y_{c, 1}, ...)$
-
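-A minimal sketch of this cut rule (my illustration, not the paper's code):
-
-```python
-def segment(primes):
-    """Cut the path whenever a prime repeats within the current sub-path."""
-    subpaths, current = [], []
-    for p in primes:
-        if p in current:            # duplicate found: close the sub-path here
-            subpaths.append(current)
-            current = []
-        current.append(p)
-    subpaths.append(current)
-    return subpaths
-
-assert segment([5, 13, 5, 2, 7]) == [[5, 13], [5, 2, 7]]   # the example above
-```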
-
-- Forwarding sub-paths:
-The remaining problem is how a forwarder knows which sub-label $(X_{c, i}, Y_{c, i})$ it should decode and when a sub-label should be discarded.
-
-Its design is motivated by an observation that **each forwarder has a limited number of output ports**.
-> Hence, it can use this parameter to encode the termination rule of a sub-path
-
-Let $o_f$ denote the maximum output port of forwarder $f$; the remainder $e_{null}$ can be any integer larger than the maximum output port $o_f$:
-$X_{c, i} \equiv e_{null} \pmod{f}$
-
-By doing this, the last-hop forwarder $f$ of a sub-path will get an invalid port $e_{null}$ and easily detect that it should end the current sub-path. **Then, the forwarder drops the current sub-label $(X_{c, i}, Y_{c, i})$** and extracts the next one, $(X_{c, i+1}, Y_{c, i+1})$.
-
-
-- Prime Assignment
-It can further reduce the header overhead by **decreasing the probability of using large primes**.
- > many paths going through those popular SFs with large primes can yield large labels.
-
-**Solution**: assign primes to forwarders and SFs according to their **popularity (or load)**
-1. Assign small primes to *heavily loaded forwarders and popular SF types*, while letting less-used ones have *large primes*.
-
-2. **popularity score**: count the number of chains that traverse through a forwarder $f \in F$ (SF $s \in S$), denoted by the **popularity score** $w_f (w_s)$, and sort $f \in F$ ($s \in S$) in descending order of their popularity $w_f (w_s)$
-
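-A sketch of this popularity-aware assignment (my illustration; `sympy.prime` is just one convenient prime generator):
-
-```python
-from sympy import prime                 # prime(n) returns the n-th prime
-
-def assign_primes(popularity):          # popularity: {device: score w}
-    """Most popular devices get the smallest primes, shrinking CRT labels."""
-    ranked = sorted(popularity, key=popularity.get, reverse=True)
-    return {dev: prime(i + 1) for i, dev in enumerate(ranked)}  # 2, 3, 5, ...
-
-print(assign_primes({"SF_nat": 120, "SF_fw": 300, "SF_ids": 40}))
-# -> {'SF_fw': 2, 'SF_nat': 3, 'SF_ids': 5}
-```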
-
-##6. Implementation and Experiment
-A network accommodates 64 VMs and hence is capable of supporting 64 SF instances in total.
-- Impact of the Number of SFC Requests (scalability test)
-- Impact of the Length of SFCs
-- Impact of Prime Reuse and Path Segmentation
-- Impact of the Number of SF Types
-- Impact of Segmentation on Bandwidth Consumption
-- Overall Overhead
\ No newline at end of file
diff --git a/Paper Reading Note/DDP-ICDCS'18/DDP-ICDCS'18.md b/Paper Reading Note/DDP-ICDCS'18/DDP-ICDCS'18.md
deleted file mode 100644
index 620d2bb..0000000
--- a/Paper Reading Note/DDP-ICDCS'18/DDP-ICDCS'18.md
+++ /dev/null
@@ -1,85 +0,0 @@
----
-typora-copy-images-to: ./
----
-
-#DDP: Distributed Network Updates in SDN
-
-@ICDCS'18 @SDN Update
-[TOC]
-
-
-##1. Motivation
-
-- Current update approaches heavily rely on the centralized controller to initiate and orchestrate network updates, resulting in **long latency** of update completion.
-> Quickly and consistently updating the distributed data plane poses a major and common challenge in SDN systems
-> coordination of the distributed data plane requires frequent communication between the controller and switches (slowing down the update completion time)
-
-- Asynchronous communication channels:
-control messages are often received and executed by switches in an order different from the order sent by the controller.
-> an inappropriate control order may violate the consistency properties of the data plane, resulting in network anomalies, e.g., blackholes, traffic loops and congestion.
-
-
-
-##2. Overview of this paper
-
-### **Main Idea** of this paper
-DDP develops distributed coordination abilities at the data plane.
-Each datapath operation container (DOC) is encoded with an individual operation and its dependency logic.
-> the network update can be both triggered and executed in a fully local manner, further improving the update speed while maintaining the consistency.
-
-**Insight**: if switches can coordinate with each other to apply the operations in order, the update time as well as the controller's processing load will be greatly reduced.
-
-- Real-time update
-the involved DOCs are sent to the data plane in one shot, and the switches can consistently execute them in a distributed manner.
-
-- Updates directly triggered by local events
-the controller prestores the DOCs at the data plane, and when corresponding events happen, the updates can be locally triggered and executed.
-
-### **Challenges** of this paper
-
-
-### Contributions of this paper
-
-1. design novel algorithms to compute and optimize the primitive DOCs for consistent updates
-
-2. implement the Distributed Datapath (DDP) system to evaluate its performance in various update scenarios.
-
-### The Method
-
-1. Network Update Problem
-Let $C$ denote a network configuration state (a collection of exact-match rules).
-A network update is defined as a transition of the configuration state from $C$ to $C' = update(C, O, e)$
-> $O=\{o\}$ is a set of datapath operations to implement the update, e.g., to insert/delete/modify a flow rule at a particular switch.
-> $e$ is a local event at the data plane to trigger the update, e.g., a link/switch failure and link congestion.
-
-2. DDP Design
-- Operation Dependency Graph (ODG)
-Introduce the concept of an **Operation Dependency Graph (ODG)** that captures the data plane dependencies
-> 1. The dependency is **unidirectional** (no cycles in the graph)
-> 2. The ODG expresses an optimized result of the whole dependency relation.
-> 3. Connectivity is dispensable in the ODG (the graph need not be connected).
-> 4. the ODGs are composable (multiple ODGs for different update events can be composed together)
-
-- Datapath Operation Container (DOC)
-In DDP system, the SDN controller adopts DOCs to configure the data plane, rather than directly sending operations as in traditional SDN.
-> The switches then coordinate with each other to execute the update at right time.
-
-- Execution Behaviors
->1. Push: triggered when a DOC is executed
->2. Pull: triggered when a DOC is received at the data plane
-
-Push and Pull are complementary to each other, and with the two behaviors, all operations will be **consistently** applied in a correct order.
-
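-A minimal sketch of a DOC (my reading of the design, not the authors' code): an operation plus its dependency logic, executed locally once its upstream DOCs have fired.
-
-```python
-class DOC:
-    """A datapath operation plus the DOCs it depends on."""
-    def __init__(self, op, deps):
-        self.op, self.deps, self.done = op, set(deps), False
-
-    def try_execute(self, executed):
-        # Pull-style check: apply the operation once all dependencies are met.
-        if not self.done and self.deps <= executed:
-            print(f"apply {self.op}")
-            self.done = True
-        return self.done
-
-docs = {"r1": DOC("insert rule r1 @ s1", []),
-        "r2": DOC("insert rule r2 @ s2", ["r1"])}   # r2 must wait for r1
-executed = set()
-for name, doc in docs.items():  # push-style notifications would replace this loop
-    if doc.try_execute(executed):
-        executed.add(name)
-```
-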
-- Algorithm
-
-
-
-### Implementation
-
-
-### Experiment
-
-
-### Related Work
-
-
diff --git a/Paper Reading Note/FADE-ICC'16/1520838349163.png b/Paper Reading Note/FADE-ICC'16/1520838349163.png
deleted file mode 100644
index 82e761d..0000000
Binary files a/Paper Reading Note/FADE-ICC'16/1520838349163.png and /dev/null differ
diff --git a/Paper Reading Note/FADE-ICC'16/1520861458179.png b/Paper Reading Note/FADE-ICC'16/1520861458179.png
deleted file mode 100644
index ca3ab0f..0000000
Binary files a/Paper Reading Note/FADE-ICC'16/1520861458179.png and /dev/null differ
diff --git a/Paper Reading Note/FADE-ICC'16/FADE.md b/Paper Reading Note/FADE-ICC'16/FADE.md
deleted file mode 100644
index 3581c2b..0000000
--- a/Paper Reading Note/FADE-ICC'16/FADE.md
+++ /dev/null
@@ -1,77 +0,0 @@
----
-typora-copy-images-to: ./
----
-
-# FADE: Detecting Forwarding Anomaly in Software Defined Network
-@IEEE ICC 2016
-
-[TOC]
-
-## Motivation
-- Flow rules installed in switches can be easily tampered with by different entities, intentionally or unintentionally.
-
-- Forwarding Anomalies are normally triggered by **equipment failure** and **network attacks**.
-- Flow rules enforced at the data plane may not be the same as the flow rules installed by the control plane.
-
-- Forwarding Anomaly Detection (FAD) in SDN is typically achieved by sending probing packets or analyzing flow statistics. These approaches are neither **effective** nor **efficient**.
- e.g.
- > 1. high communication overheads
- > 2. cannot capture all attacks
-
-##Problem Statement
-- This paper defines a network flow as a set of packets processed by the same sequence of flow rules.
-- Normally, forwarding anomalies can be classified into two categories,
- > 1. traffic interception attacks: flows are dropped or forwarded to wrong rule paths that **never return** to their correct rule paths
- > 2. traffic hijacking attacks
-
-##Challenge
-1. Flow Selection: select a minimal set of flows in the rule graph so that their rule paths can cover all rule paths. (**Flow Selection Algorithm**)
-2. Rule Generation
-3. Traffic Aggregation and Traffic Burst: Anomaly identification should accurately collect flow statistics and verify them under traffic burst and traffic aggregation. (**Using Label**)
-
-
-##Solution
-**Basic Idea**: flow rules forwarding the same flow should have a consistent view of the flow's statistics.
-There are three steps in FADE:
-- Firstly, FADE builds a **rule graph** according to the topology and flow rules, and uses a flow selection algorithm to select a small set of flows whose rule paths cover all existing rule paths in the rule graph.
-- Secondly, FADE generates dedicated flow rules for every selected flow and installs them in the data plane to track the flows.
-- Thirdly, FADE collects flow statistics to identify if there is any forwarding anomaly.
-
-
-- Flow Selection: For each egress rule, flow selection traverses the rule graph **in reverse** to find a rule that has an indegree of 0 in the rule graph, i.e., an ingress rule.
- > Due to traffic aggregation, a rule may have multiple previous rules in the rule graph. Thus, the rule graph is constructed as a forest, in which the roots of the trees are egress rules and the leaves are ingress rules.
- 
-
-- Rule Generation: FADE generates several dedicated flow rules to collect flow statistics and computes the set of switches on which these dedicated flow rules should be installed. FADE assumes a **bijection** between the flow rules in a rule graph and the switches.
- > e.g. $\{r_{11}, r_{31}, r_{41}\}$ maps to $\{S_1, S_3, S_4\}$
-- It generates $k$ ($k > 2$) dedicated flow rules for each selected flow, and installs them on the first switch, $k-2$ intermediate switches and the last switch of the flow's forwarding path.
-
-- Once a dedicated flow rule is installed on the switch where the malicious flow rule is enforced, **it forwards the flow prior to the malicious rule and hides the malicious rule**. There is an optimal $k$ that maximizes the successful detection probability. (Do the calculation to find the optimal number.)
- > In practice, they find rule paths are hardly longer than 32.
- > It calculates the optimal $k$ for different rule path lengths: $p(k)=p_1(k)+\sum_{l=2}^{m-1}p_2(k, l)$ $(2 \leq k \leq n)$
- > $p_1(k)=\frac{n-k}{n-2}$ is the probability that **traffic interception attacks** can be detected.
- > $p_2(k, l)=\frac{n-k}{n-2}-\frac{(n-k)\cdots(n-k-l+1)}{(n-2)\cdots(n-l-1)}$ $(2 \leq l < n)$ is the probability that **traffic hijacking attacks** can be detected.
- > The result and the Anomaly Identification Algorithm are shown below:
-
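-A sketch evaluating $p(k)$ numerically to pick the optimal $k$ (my illustration; $n$ is the rule path length, $m$ is as in the summation above, and the concrete values are assumptions):
-
-```python
-def falling(a, terms):                  # a * (a-1) * ... * (a-terms+1)
-    out = 1.0
-    for i in range(terms):
-        out *= a - i
-    return out
-
-def p(k, n, m):
-    """Detection probability p(k) = p1(k) + sum_{l=2}^{m-1} p2(k, l)."""
-    p1 = (n - k) / (n - 2)
-    p2 = sum((n - k) / (n - 2) - falling(n - k, l) / falling(n - 2, l)
-             for l in range(2, m))
-    return p1 + p2
-
-n, m = 16, 8                            # assumed lengths for illustration
-best_k = max(range(2, n + 1), key=lambda k: p(k, n, m))
-print(best_k, p(best_k, n, m))
-```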
-
-
-## Implementation
-- This paper implements FADE as an application on the **Floodlight Controller**. And there are three modules in FADE:
- > 1. Rule Storage Module: it is extended from **HSA**, maintains all flow rules by monitoring **OFFlowMod** messages, and analyzes the dependencies among these rules.
- > 2. Rule Graph Module: It monitors rule storage updates and **LDUpdate** messages, i.e., topology update messages to build rule graph.
- > 3. Anomaly Detection Module: It interacts with the above two modules and detects anomalies according to information retrieved from them.
-
-## Evaluation
-- Floodlight 1.1, Mininet 2.2.1, OVS 2.3.2
-- Using a virtual machine which has a 2.5 GHz dual-core CPU and 16GB memory to emulate different networks.
-- Malicious rules are simulated by injecting flow rules directly into OVS through **ovs-ofctl**.
-- Link throughputs are measured by **iperf**.
-
-## Related Work
-- ATPG: it is a test packet generation framework whose results can be used to verify all flow rules in the network.
- > It only supports static configuration and is time-consuming.
-- SDN Traceroute: it uses a label-based scheme to generate test packets to verify flows' forwarding paths hop by hop.
- > It generates lots of packets and only adapts to anomaly location.
-- NetPlumber: it is an invariant checking tool based on HSA. (Similar to rule graph)
-- SPHINX: It uses flow statistics to verify data plane forwarding, which is very similar to FADE
- > It defines flow as source and destination MAC address.
\ No newline at end of file
diff --git a/Paper Reading Note/FADE-ICC'16/flow_selection_algorithm.png b/Paper Reading Note/FADE-ICC'16/flow_selection_algorithm.png
deleted file mode 100644
index 224384c..0000000
Binary files a/Paper Reading Note/FADE-ICC'16/flow_selection_algorithm.png and /dev/null differ
diff --git a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520995111174.png b/Paper Reading Note/SDNtraceroute-HotSDN'14/1520995111174.png
deleted file mode 100644
index 81940b9..0000000
Binary files a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520995111174.png and /dev/null differ
diff --git a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520997877920.png b/Paper Reading Note/SDNtraceroute-HotSDN'14/1520997877920.png
deleted file mode 100644
index 2c5661a..0000000
Binary files a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520997877920.png and /dev/null differ
diff --git a/Paper Reading Note/SDNtraceroute-HotSDN'14/SDNtraceroute.md b/Paper Reading Note/SDNtraceroute-HotSDN'14/SDNtraceroute.md
deleted file mode 100644
index 7e0bed4..0000000
--- a/Paper Reading Note/SDNtraceroute-HotSDN'14/SDNtraceroute.md
+++ /dev/null
@@ -1,70 +0,0 @@
----
-typora-copy-images-to:./
----
-# SDN traceroute: Tracing SDN Forwarding without Changing Network Behavior
-
-@HotSDN'14
-[TOC]
-
-## Motivation
-- Flexibility in SDN brings added complexity, which requires new debugging tools that can provide insights into network behavior.
-- SDN programs and controllers often translate **high-level configurations** into **low-level rules**. The result is that it can be difficult for network operators to **predict** the exact low-level rules.
-- It is imperative to have tools that can provide visibility into how different packets are handled by the network at any given time.
-- The main limitation of $traceroute$ is that it can only provide the *layer-3 (IP)* path information because it relies on the time-to-live (TTL) field in the IP header to trigger ICMP error messages from intermediate routers.
-
-## Problem Statement
-- Goal: trace the path of a given packet using the actual forwarding rules in the network, with as little impact on the network as possible. Requirements:
- > 1. Non-invasive: existing rules in forwarding tables should remain unchanged.
- > 2. Accurate: The existing forwarding rules should be applied directly to the probes as well as production traffic when measuring the behavior of a switch.
- > 3. Low resource consumption: Requiring only a small number of rules per switch and update those rules infrequently.
- > 4. Commodity hardware: This work can work on existing SDN protocols.
- > 5. Arbitrary traffic: It should be possible to trace the path of any flow and even any given packet from within a flow.
-
-- SDN traceroute runs as an application on an SDN controller so that it can push rules to the switches and listen to OpenFlow messages. It has access to the **topology of the network**.
- > *input*: an arbitrary Ethernet frame, an injection point in the form of a switch identifier and port.
- > *output*: an ordered list of $(switch, port)$ pairs corresponding to each hop encountered by the packet as it traversed the network.
-
-## Solution
-### Network Configuration
-
-- SDN traceroute must install rules that allow it to selectively **trap probes**.
- > 1. matching the incoming probe packet so the hop can be logged at the controller
- > 2. not matching the controller-returned probe, so as to forward the packet downstream.
-
-- SDN traceroute first applies a graph coloring algorithm to the topology.
- > 1. Colors will serve as tags that are an integral part of the rules
- > 2. The coloring algorithm assigns each switch a color such that **no two adjacent switches are assigned the same color**. (a classic graph coloring problem)
- > 3. This problem is **NP-hard** in graph theory, so SDN traceroute uses a **greedy algorithm** to color the vertices (see the sketch after this list).
- > 4. All traffic carries a color so that the switches can decide whether or not to send a probe to the controller. (Using **VLAN priority field** as the tag)
- > 5. In general, many datacenter topologies use a hierarchical tree structure consisting of core, aggregation and ToR switches. Those topologies require only 2-bit tags as **trees are 2-colorable**.
-
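-A minimal sketch of the greedy coloring (my illustration; the controller's actual heuristic may differ):
-
-```python
-def greedy_color(adj):                 # adj: {switch: set of neighbor switches}
-    """Give each switch the smallest color unused by already-colored neighbors."""
-    colors = {}
-    for v in adj:                      # visit order is arbitrary in this sketch
-        used = {colors[u] for u in adj[v] if u in colors}
-        colors[v] = next(c for c in range(len(adj)) if c not in used)
-    return colors
-
-# A small tree (one core switch above two ToRs) needs only two colors:
-print(greedy_color({"core": {"tor1", "tor2"},
-                    "tor1": {"core"}, "tor2": {"core"}}))
-# -> {'core': 0, 'tor1': 1, 'tor2': 1}
-```
-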
-- The number of rules installed in a switch depends on the number of colors used by its **adjacent switches**. In most scenarios, this requires installing one or two *TCAM rules*, e.g.
- 
-- These rules need only be changed when the network topology changes.
-### Conducting the Trace Route
-
-- Initialize the injection point $(switch, port)$:
- > 1. Use the API call
- > 2. the attachment point of the source host, which is looked up by source MAC or IP address.
- > 3. it looks up **the color of the ingress switch** and inserts the color into the header tag bit of the probe frame.
- > 4. SDN traceroute sends the probe to the ingress switch as a $PACKET\_OUT$ message with the input port set to the injection point. The action of the $PACKET\_OUT$ is set to $TABLE$. (**Purpose: make the switch treat the probe as if it had been received on that input port.**)
-
-- Running Steps:
-
-
-###The Assumptions in this problem
-- The tag bits must not be modified by any devices in the network. (e.g. middleboxes)
-- The bits must correspond to header field(s) that can be matched on using rules in the switches. (e.g. 12 matchable fields in OpenFlow 1.0)
-- SDN traceroute reserves the **highest-priority rules**.
-
-## Evaluation
-- Five IBM RackSwitch G8264 OpenFlow-enabled switches connecting several commodity servers running Open vSwitch
-- Implementation of SDN traceroute is a module for the Floodlight controller providing a REST API allowing a network operator to perform a trace route for an arbitrary packet. (600 LOC)
-## Related Work
-- ATPG
-- NetSight
-- Anteater
-- Header Space Analysis (HSA)
-- Veriflow
-- Libra
-- OFRewind
\ No newline at end of file
diff --git a/Paper Reading Note/SDProber-SOSR'18/1521122343477.png b/Paper Reading Note/SDProber-SOSR'18/1521122343477.png
deleted file mode 100644
index d5e1b23..0000000
Binary files a/Paper Reading Note/SDProber-SOSR'18/1521122343477.png and /dev/null differ
diff --git a/Paper Reading Note/SDProber-SOSR'18/1521277960477.png b/Paper Reading Note/SDProber-SOSR'18/1521277960477.png
deleted file mode 100644
index 25c4229..0000000
Binary files a/Paper Reading Note/SDProber-SOSR'18/1521277960477.png and /dev/null differ
diff --git a/Paper Reading Note/SDProber-SOSR'18/1521284130008.png b/Paper Reading Note/SDProber-SOSR'18/1521284130008.png
deleted file mode 100644
index 90425a0..0000000
Binary files a/Paper Reading Note/SDProber-SOSR'18/1521284130008.png and /dev/null differ
diff --git a/Paper Reading Note/SDProber-SOSR'18/SDProber.md b/Paper Reading Note/SDProber-SOSR'18/SDProber.md
deleted file mode 100644
index 0c192f3..0000000
--- a/Paper Reading Note/SDProber-SOSR'18/SDProber.md
+++ /dev/null
@@ -1,141 +0,0 @@
----
-typora-copy-images-to: ./
----
-
-# SDProber: A Software Defined Prober for SDN
-@SOSR'18
-
-[TOC]
-
-## Motivation
-- **Persistent delays** in wide area networks are perilous and can adversely affect the effectiveness of online services. (There is a need to proactively detect long delays as early as possible.)
-- There is a trade-off between *detection time* and *cost* in proactive measurement of delays in SDN.
- > 1. Increasing the inspection rate of a link can reduce the detection time of a delay
- > 2. Inspecting a link too often could hinder traffic via that link or via the nodes it connects
- > 3. There are limits on the inspection rate of each link:
- > A lower bound specifies how often the link should be inspected.
- > An upper bound restricts the number of inspections per link.
-
-- The frequency of congestion and high delays could be learned from history, and the inspection rates modified accordingly. (this is the rationale for an adaptive approach)
-- Traditional tools (e.g. $ping$, $traceroute$) are unsuitable for adaptive measurement where different links should be inspected at **different rates**. (ping is limited to a predefined path)
-- SDN's central control over forwarding rules allows for efficient implementation of adaptable delay monitoring.
-- Adaption is achieved by changing the probabilities that govern the random walk.
-- SDProber is used for **proactive delay measurements** in SDN.
-## Goal
-- 1. inspect links at specified rates.
-- 2. reduce the total number of probe packets when monitoring the network
-- 3. minimize the number of excess packets through the links
-- 4. detect delays early
-
-## Problem Statement
-### Model of Network and Delays
-- Network is represented as a directed graph $G=(V, E)$ (normal model)
-- The network operator specifies the minimum and maximum rates of probe-packet dispatching per link.
- > 1. **Input**: a network $G$, rate constraints on edges, a cost constraint $C$ that specifies the total probe packets per minute.
- > 2. **Objective**: probe $G$ such that the probe rates satisfy the rate constraints and the cost constraint $C$.
-- Computing a set of paths that satisfies the probe-rate constraints is complex, expensive in terms of running times (essentially, NP-hard), and inflexible.
-
- > if sending probes via predefined paths
-
-
-
-### Overview of SDProber
-#### Delay Measurement
-- The delay between two given nodes $s_1$ and $s_2$ is measured by SDProber using probe packets. The schematic representation of the process is shown below:
-
-
-- $t_1^{\leftrightarrow}$ and $t_2^{\leftrightarrow}$ are the round trip times between the nodes $s_1$ and $s_2$ and the collector. (can be easily measured by $ping$).
- > $t_1^{\rightarrow}$ and $t_2^{\rightarrow}$ are the one-way trip times from the nodes $s_1$ and $s_2$ to the collector
- > $t_2-t_1 \leq delay(s_1, s_2)+t_2^{\rightarrow} \leq delay(s_1, s_2)+t_2^{\leftrightarrow}$
- > $t_2-t_1 \geq delay(s_1, s_2)-t_1^{\rightarrow} \geq delay(s_1, s_2)-t_1^{\leftrightarrow}$
- > Result in: $t_2-t_1-t_2^{\leftrightarrow} \leq delay(s_1, s_2) \leq t_2-t_1+t_1^{\leftrightarrow}$
-
-#### System Architecture
-- SDProber sends probe packets repeatedly to measure delays in different parts of the network.
-
- > Collector collects mirrored packets to compute the expected delay per link or path.
-
-- **Probe Agent**: It crafts and dispatches probe packets. In general, a probe client is attached to every node, and the number of probe clients can vary.
-
-- Each probe has a **unique ID** in its payload and will be emitted from a **probe client.**
- > 1. Probe packets are marked to distinguish them from genuine traffic.
- > 2. The collector separates the mirrored packets of different probes from one another.
-
-
-- **SDN Controller and Open vSwitch**: Underlying elements of the network are OVS with OpenFlow programming interface.
- > 1. SDProber routes probe packets in a **random walk fashion**.
- > 2. To achieve this, it uses a combination of **group tables** and **match-action rules**.
- > 3. OpenFlow's group tables are designed to execute one or more buckets for a single match
- > e.g. SDProber uses group tables in **ALL** (execute all the buckets) and **SELECT** (execute a selected bucket) modes.
- > 4. Each bucket has a weight; for each forwarded packet, the OVS chooses a bucket and executes the actions in that bucket.
- > Each bucket contains a forwarding rule to a different neighbor node (a different port)
- > The buckets are selected arbitrarily (e.g., by a hash of field values, in proportion to the weights).
- > To add randomness to the bucket selection, the probe agent assigns a **unique source MAC address** to each probe packet. (On repeated visits to a node, the same actions are applied at each visit, so the traversal is a **pseudo random walk**.) Why?
-
-
-
-- **Collector**: For each mirrored probe packet, the collector records the **arrival time**, extracts the UDP source from the header and gets the **unique identifier** from the payload.
- > 1. Mirrored packets are grouped by the **identifier** of the probe.
- > 2. The number of groups should be equal to the total number of the probe packets.
- > 3. The number of packets in each group is equal to the **initial TTL limit**.
- > 4. After grouping, the collector computes the traversed path of each probe packet by ordering the mirrored packets of each group based on **the DPID values of switches and the known network topology**.
- > 5. The recorded times of arrival of the ordered mirrored packets are used for estimating the delay for **each link** on the path.
-
-## Monitoring By Random Walk
-- SDProber needs to satisfy the rate constraints when routing probes. Instead of computing a set of paths directly, the probe packets perform a random walk over a weighted graph.
- > 1. The initial node and each traversal step are selected randomly, per probe.
- > 2. The **link-selection probabilities** are proportional to **the weight of forwarding rules**.
- > 3. The length of path is limited by setting the TTL field. It determines the number of inspected links per probe packet.
-
-- The model:
- > 1. $n$ is the number of nodes in the network $G$, each node $v_i \in V$ has a weight $w_i$.
- > 2. $W = \sum^{n}_{i=1}w_i$ is the sum of weights.
- > 3. The probability of selecting node $v_i$ is $\frac{w_i}{W}$. For each selection, a number $x$ is drawn uniformly from $[0,1)$; if $\frac{\sum^{i-1}_{j=1}w_j}{W} \leq x < \frac{\sum^{i}_{j=1}w_j}{W}$, then $v_i$ is the selected node.
- > 4. For each probe packet forwarding, the link (port) for the next step is chosen **proportionally to the weights** assigned to forwarding rules.
-
-- To control the inspection rates:
- > 1. It needs to estimate the number of probes passing through each link for **a given number of emitted probes**.
- > + **First**, it computes visit probabilities for nodes. $P_0$ is a vector where $P_0[i]$ is the probability of selecting $v_i$ as the initial node, $1 \leq i \leq n$.
- > + The transition matrix of $G$ is an $n \times n$ matrix $M=(p_{ij})_{1 \leq i,j \leq n}$, where $p_{ij}$ is the probability of forwarding the probe packet from $v_i$ to $v_j$.
- > + For each node $v_i$, the array $(p_{i1},\ldots,p_{in})$ specifies the probabilities for the next step after reaching $v_i$. If $v_i$ and $v_j$ are not neighbors, then $p_{ij} = 0$; and $\sum^{n}_{j=1} p_{ij}=1$ for every $v_i \in V$.
-
-- Given the initial probabilities vector $P_0$, $P_1 = (M^T) P_0$ is the vector of probabilities of reaching each node after one step.
- > 1. $P_t = (M^T)^tP_0$ denotes the probabilities of reaching each node after $t$ steps of the random walk.
- > 2. The probability of traversing a link $(v_i, v_j)$ within $k$ steps equals the probability of reaching node $v_i$ at step $t$ and proceeding to node $v_j$ at step $t+1$, for some $0 \leq t < k$; this probability is denoted $p\text{-}traverse_{ij}$.
- > 3. $p\text{-}traverse_{ij} = \sum^{k-1}_{t=0} (P_t)_i \, p_{ij}$, where **$(P_t)_i$ is the probability of reaching node $v_i$ at step $t$** and **$p_{ij}$ is the probability of forwarding to $v_j$ a packet that arrived at $v_i$**.
-
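-A numpy sketch of this estimate (my illustration; each row of $M$ must sum to 1):
-
-```python
-import numpy as np
-
-def p_traverse(M, P0, k):
-    """Accumulate (P_t)_i * p_ij for t = 0..k-1, with P_t = (M^T)^t P_0."""
-    out = np.zeros_like(M)
-    Pt = np.asarray(P0, dtype=float)
-    for _ in range(k):
-        out += Pt[:, None] * M         # step-t contribution for every link (i, j)
-        Pt = M.T @ Pt                  # advance the random walk by one step
-    return out
-
-# Toy two-node network: each node always forwards to the other.
-M = np.array([[0.0, 1.0], [1.0, 0.0]])
-print(p_traverse(M, [0.5, 0.5], k=2))
-```
-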
-- Why use a random walk: with the random walk approach, they do not conduct **complex computations** to craft probe packets or change them as they traverse the graph.
-
- > If network changes require **adjustments** of probe rates, they merely alter the node weights of the initial node selection or alter the weights in the group tables.
-
-## Weight Adaptation
-- The weights that affect the random walk are adjusted to help satisfy the rate constraints.
- > 1. SDProber modifies the weights iteratively using **binary exponential backoff**
- > 2. The iterations continue indefinitely as long as the monitoring continues.
-
-- **Link-weight adaptation**: Weights are doubled (halved) when the probing rate is below (above) the minimum (maximum) rate.
- > 1. Rates within the limits specified by the rate constraints are adjusted after each iteration.
- > 2. Historically delayed links could receive **a higher weight** than links with no history of delays, so that they are visited more frequently. (controlled by adjusting the iteration coefficient)
-
-- **Node-weight adaptation**: Node weights are modified to reflect changes in links.
-
- > 1. The weight of a node with a link below the minimum rate is doubled, to increase the chances of visiting it in the next iteration.
-
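-The per-link rule as a one-function sketch (my illustration):
-
-```python
-def adapt_weight(weight, rate, min_rate, max_rate):
-    """Binary exponential backoff: double below the floor, halve above the ceiling."""
-    if rate < min_rate:
-        return weight * 2       # inspect this link more often
-    if rate > max_rate:
-        return weight / 2       # back off
-    return weight
-```
-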
-## Baseline Method
-- Probe packets are sent via the **shortest path** between two selected nodes. There are two baseline methods:
- > 1. **Random Pair Selection (RPS)**: In each iteration, a source-destination pair is selected uniformly at random from **the set of pairs that have not been selected previously**, until all the links are probed.
- > 2. **Greedy Path Selection**: In each iteration, for each pair of nodes, the $weight$ of the shortest path $P$ between them is $\sum_{e \in P, e \notin Visited}min\text{-}rate(e)$, the sum of the min-rate values of all the unvisited links on the path.
- > The path with the maximal weight is selected and its links are added to $Visited$.
-
-## Evaluation
-- Mininet, OpenVSwitch 2.7.2, RYU controller, publicly-available real topology (196 nodes and 243 links)
-- Experiments: Detection Time, Cost Effectiveness, Adjusting $\alpha$
-
-## Related Work
-### Utilizing mirroring for measurements
-- **NetSight** uses mirroring to gather information about the trajectories of all the packets in a network. (does not scale)
-- **Everflow** provides a scalable **sampling of packets** in datacenter networks. (it requires specific hardware)
-
-### Using Probe Packet
-- **SLAM** uses the time of arrival of OpenFlow **packet-in messages** at the controller to estimate the delay between links. (It is only relevant to datacenter traffic, where enough packet-in messages are generated)
-- **OpenNetMon** provides per-flow metrics (e.g. throughput, delay and packet loss) for OpenFlow networks.
\ No newline at end of file
diff --git a/Paper Reading Note/The Quantcast File System-VLDB'13/1523365611372.png b/Paper Reading Note/The Quantcast File System-VLDB'13/1523365611372.png
deleted file mode 100644
index 1d03f9d..0000000
Binary files a/Paper Reading Note/The Quantcast File System-VLDB'13/1523365611372.png and /dev/null differ
diff --git a/Paper Reading Note/The Quantcast File System-VLDB'13/1523504685432.png b/Paper Reading Note/The Quantcast File System-VLDB'13/1523504685432.png
deleted file mode 100644
index 28fde56..0000000
Binary files a/Paper Reading Note/The Quantcast File System-VLDB'13/1523504685432.png and /dev/null differ
diff --git a/Paper Reading Note/The Quantcast File System-VLDB'13/QFS.md b/Paper Reading Note/The Quantcast File System-VLDB'13/QFS.md
deleted file mode 100644
index cfa58a8..0000000
--- a/Paper Reading Note/The Quantcast File System-VLDB'13/QFS.md
+++ /dev/null
@@ -1,131 +0,0 @@
----
-typora-copy-images-to: ./
----
-
-# The Quantcast File System
-@VLDB2013
-[TOC]
-
-##1. Background and Motivation
-- The QFS is an efficient alternative to the Hadoop Distributed File System (HDFS), written in C++. It offers several efficiency improvements relative to HDFS:
- > 1. 50% disk space savings through **erasure coding** instead of replication
- > 2. a resulting doubling of **write throughput**
- > 3. a faster name node
- > 4. support for faster sorting and logging through a concurrent append feature
- > 5. a native command line client much faster than hadoop fs
- > 6. global feedback-directed I/O device management.
-
-- Apache Hadoop maximized use of hardware by adopting a principle of **data locality**.
-- To achieve fault tolerance, the HDFS adopted a sensible **3x replication strategy**.
- > store one copy of the data on the machine writing it, another on the same rack, and a third on a distant rack.
- > Thus HDFS is not particularly storage efficient.
- > At today's prices that is about \$40,000 per PB; for reference, Amazon currently charges \$2.3 million to store 1 PB for three years.
-
-- Given these developments (e.g. high-bandwidth networks), QFS **abandoned data locality**, relying on faster networks to deliver the data where it is needed, and instead optimized for storage efficiency.
-
-- QFS employs **Reed-Solomon erasure coding** instead of three-way replication, which delivers comparable or better fault tolerance.
-- QFS was developed on the frame of the **Kosmos File System (KFS)**, an open-source distributed file system architecturally similar to Hadoop's HDFS but implemented in C++ rather than Java and at an experimental level of maturity.
-
-##2. QFS Architecture
-- The basic design goal:
- > It is intended for efficient **map-reduce-style** processing, where files are written once and read multiple times by **batch processes**, rather than for random access or update operations.
- > The hardware will be **heterogeneous**, as clusters tend to be built in stages over time, and disk, machine and network failures will be routine.
-
-- Data is physically stored in **64MB chunks**, which are accessed via a **chunk server** running on the local machine.
-- A single **metaserver** keeps an in-memory mapping of logical path names to file IDs, file IDs to chunk IDs and chunk IDs to physical locations.
-- A **client library** which implements Hadoop's FileSystem interface and its equivalent in C++.
-
-
-
-###2.1 Erasure Coding and Striping
-- Erasure coding enables QFS not only to reduce the amount of storage but also to **accelerate large sequential write patterns** common to MapReduce workloads.
-- Quantcast's **proprietary MapReduce** implementation uses QFS not only for results but also for intermediate sort spill files.
- > Erasure coding is critical to getting these large jobs to run quickly while **tolerating hardware failures** without having to **re-execute** map tasks.
-
-- A data stream is stored physically using **Reed-Solomon 6+3** encoding
- > The original data is striped over six chunks plus three parity chunks.
-
-- **Write Data**: The QFS client collects data stripes, usually 64 KB each, into **six 1MB buffers**. When they fill, it calculates an additional three parity blocks and sends all **nine blocks** to **nine different chunk servers** (usually one local and the other eight on different racks).
-
-- **Read Data**: The client requests the six chunks holding the original data.
- > If one or more chunks cannot be retrieved, the client will fetch enough parity data to execute the Reed-Solomon arithmetic and reconstruct the original.
-
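-A sketch of the 6+3 write path (my illustration; `rs_encode` is a hypothetical placeholder for the real Reed-Solomon parity arithmetic):
-
-```python
-STRIPE, BUF = 64 * 1024, 1 << 20        # 64 KB stripes into six 1 MB buffers
-
-def fill_and_encode(data: bytes, rs_encode):
-    bufs = [bytearray() for _ in range(6)]
-    for i in range(0, len(data), STRIPE):
-        bufs[(i // STRIPE) % 6] += data[i:i + STRIPE]   # round-robin striping
-    if all(len(b) >= BUF for b in bufs):                # buffers are full
-        parity = rs_encode(bufs)          # hypothetical: three RS parity blocks
-        return list(bufs) + list(parity)  # nine blocks -> nine chunk servers
-    return bufs                           # otherwise keep buffering
-```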
-
-
-###2.2 Failure Groups
-- *To maximize data availability*, a cluster must be partitioned into **failure groups**.
- > Each failure group represents machines with shared physical dependencies such as power circuits or rack switches, which are therefore more likely to fail together.
- > The metaserver will attempt to assign the nine chunks to **nine different failure groups**.
- >
-
-###2.3 Metaserver
-- The QFS metaserver holds all the directory and file structure of the file system, though **none of the data**.
-- For each file, it keeps the **list of chunks** that store the data and their **physical locations on the cluster**.
-- It handles client requests
- > creates and mutates the directory and file structure on their behalf.
- > refers client to chunk servers and manages the overall health of the file system.
-
-- Metaserver holds all its data in RAM.
- > As clients change files and directories, it records the changes atomically both in memory and in **a transaction log**
- > It forks periodically to dump **the whole file system image** into a checkpoint.
- >
-
-####2.3.1 Chunk creation
-- For **load balance**:
- > 1. Chunk servers continuously report to the metaserver the size of I/O queues and available space for each disk they manage.
- > 2. The metaserver dynamically decides where to allocate new chunks so as to keep disks evenly filled and evenly busy.
- > 3. It **proactively** avoids disks with problems, as they usually have large I/O queues.
-
-####2.3.2 Space Rebalancing and Re-replication
-- QFS **rebalances** files continuously to maintain a predefined measure of balance across all devices.
- > The rebalance takes place when one or more disks fill up **over a ceiling threshold**, and moves chunks to devices with space utilization **below a floor threshold**.
- >
-
-####2.3.3 Maintaining Redundancy
-- In a large cluster, **components are failing constantly**, so the file system can be caught with **less redundancy** than it should have.
-- The metaserver continuously monitors redundancy and recreates missing data.
-
-####2.3.4 Evicting Chunks
-- Eviction is a request to recreate a chunk server's data elsewhere so that its machine can be safely taken down.
-
-####2.3.5 Hibernation
-- For quick maintenance such as an operating system kernel upgrade, chunks are not **evicted**. Instead, the metaserver is told that chunk server directories are being **hibernated**.
- > This will set **a 30-minute window** during which the metaserver will not attempt to replicate or recover the data on the servers being upgraded.
-
-
-###2.4 Chunk server
-- Each chunk server stores chunks as files on the **local file system**.
- > The chunk server accepts connections from clients to write and read data.
-
-- It verifies **data integrity** on reads and initiates recovery on permanent I/O errors or checksum mismatches.
-
-###2.5 Interoperability
-- QFS does not depend on Hadoop, though, and can be used in other contexts.
-- The open-source distribution includes FUSE bindings, command-line tools, and C++/Java APIs.
-
-##3. QFS Implementation
-###3.1 Direct I/O for MapReduce Workloads
-- By default QFS uses direct I/O rather than the system buffer cache, for several reasons
- > 1. It wanted to ensure that data is indeed written contiguously in large blocks.
- > 2. It wanted RAM usage to be predictable.
- > 3. The QFS metaserver makes chunk allocation decisions based on global knowledge of the queue sizes of all the I/O devices it manages.
-
-###3.2 Scalable Concurrent Append
-- QFS implements *a concurrent append operation*, which scales up to tens of thousands of concurrent clients writing to the same file at once.
-
-###3.3 Metaserver Optimization
-- The metaserver represents the file system metadata in a **B+ tree** to minimize random memory access.
-
-###3.4 Client
-- The QFS client library is designed to allow concurrent I/O access to multiple files from a single client.
- > 1. **non-blocking**, **run-until-completion protocol state machines** for handling a variety of tasks.
- > 2. The state machines can be used directly to create highly scalable applications.
-
-- The QFS library API is implemented by **running the protocol state machines** in a dedicated protocol worker thread.
- > All file I/O processing including network I/O, checksumming, and recovery information calculation are performed within this thread.
- >
-
-- The file system meta information manipulations such as **move**, **rename**, **delete**, **stat**, or **list** require communication only with **the metaserver**.
- > These operations are **serialized** by the QFS client library and block the caller thread until the metaserver responds.
-
-- The use of **read ahead** and **write behind** keeps disk and network I/O at a reasonable size.
\ No newline at end of file
diff --git a/Paper Reading Note/Track-CloudNet'17/1524660411325.png b/Paper Reading Note/Track-CloudNet'17/1524660411325.png
deleted file mode 100644
index fc9313e..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524660411325.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/1524660530235.png b/Paper Reading Note/Track-CloudNet'17/1524660530235.png
deleted file mode 100644
index 98f4067..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524660530235.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/1524661117396.png b/Paper Reading Note/Track-CloudNet'17/1524661117396.png
deleted file mode 100644
index 3f94b23..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524661117396.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/1524663396210.png b/Paper Reading Note/Track-CloudNet'17/1524663396210.png
deleted file mode 100644
index e45eb6f..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524663396210.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/Track.md b/Paper Reading Note/Track-CloudNet'17/Track.md
deleted file mode 100644
index d52b44b..0000000
--- a/Paper Reading Note/Track-CloudNet'17/Track.md
+++ /dev/null
@@ -1,93 +0,0 @@
----
-typora-copy-images-to: ./
----
-
-# Track: Tracerouting in SDN Networks with Arbitrary Network Functions
-@CloudNet'17
-[TOC]
-
-##1. Background and Motivation
-- Existing path tracing tools largely utilize **packet tags** to probe network paths among SDN-enabled switches.
-- **Ubiquitous Network Functions (NFs)** or middleboxes can drop packets or alter their tags, which can break the probing mechanism.
-- Sending probing packets through network functions could **corrupt their internal states**, risking the correctness of their service logic.
- > e.g. incorrect load balancing decisions
-
-##2. Related Work
-- 1. **SDN traceroute**: querying the current path taken by *any types of packet*.
- > SDN traceroute cannot work correctly in a network with network functions (or middleboxes) because some NFs, such as proxy and load balancer can modify packet headers and/or payload.
-
-- 2. **SFC Path Trace**: it is able to trace paths consisting of NFs. However, it also relies on **tagged probing packets**. It identifies the type of the NFs that have forwarded tagged packets to the controller by **looking up their device IDs in the predefined topology**.
- > a. This greatly limits its usability when a person has only partial or no access to the topology information.
- > b. sending probing packets through them may corrupt their internal states.
-
-##3. Track
-- The main idea:
- >1. **Track** treats the whole path as several sub-paths joined by NFs.
- >2. It injects a probing packet with user-defined header fields into networks to trace each sub-path.
- >3. It runs **a correlation procedure to infer the behaviors of NFs** and concatenates all sub-paths in the correct order according to the correlation results. (this eliminates the requirement of looking up an NF's ID in pre-defined topology information.)
- > This method uses a correlation procedure rather than sending probing packets through the NFs, **preserving their internal states**.
-
-###3.1 System Design and Implementation
-####Design Principles:
-**Track** is a diagnosing tool for debugging in SDN environment with NFs.
-1. Do not corrupt NF states
-2. Do not modify NF service logic
-3. Do not modify production rules
-- What is assumed about the controller?
-1. It knows the topology of the given network
-2. It knows which switches have an NF attached to them (NF-switches)
-
-####System Architecture
-
-
-####Correlation Module:
-1) NFs may drop the packet or dynamically modify its headers and contents. This paper roughly classifies NFs into 4 types.
-
-
-2) Correlation Procedure:
-It treats NFs as **blackboxes** and infers their relevant **input-output** behaviors. (it is assumed that network administrators are not asked for information about the NFs)
-It only needs to reason about the NF behaviors pertinent to packets' forwarding.
-**Workflow**:
-a. Collecting packets
-b. Flow mapping
-c. Calculate payload similarity
-d. Identify the most similar flows
-
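-A minimal sketch of step d (my illustration; `difflib` stands in for whatever payload-similarity metric Track actually uses):
-
-```python
-from difflib import SequenceMatcher
-
-def correlate(in_flows, out_flows):
-    """Map each NF input flow to the output flow with the most similar payload."""
-    return {fid: max(out_flows,
-                     key=lambda o: SequenceMatcher(None, payload,
-                                                   out_flows[o]).ratio())
-            for fid, payload in in_flows.items()}
-
-print(correlate({"f1": "GET /a HTTP/1.1"},
-                {"g1": "GET /a HTTP/1.1 X-Proxy: 1", "g2": "TLS handshake"}))
-# -> {'f1': 'g1'}
-```
-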
-3) Implementation of the Correlation Module:
-The controller **installs rules** at NF-switches to retrieve the first few packets of each new flow.
-
-
-####Tracing Module
-
-1) Pre-installed rules: The rules must support two different tasks:
- > a) matching the incoming probing packets so the hop can be logged at the controller (**similar to SDN traceroute**)
- > b) forwarding the controller returned probing packets as normal packets.
-
-Track requires all probing packets to carry a tag so that switches can differentiate probing packets from normal packets. (this is also necessary)
-
-2) Tracing procedure:
-1. Users need to specify the packet header fields (e.g. source/destination IP address, source/destination port, and so on)
-2. Track constructs the packet with user specified packet header fields and tag
-3. Identify the injection point which is the switch connected to the source specified by user.
-4. Track **sends the probing packet to the injection point**, where it is sent to the controller as **a PACKET_IN**.
-5. If the current hop is not an NF-switch, Track does the same operations as **SDN Traceroute**.
-6. If current hop is an NF-switch, Track would modify the probing packet **as the NFs do** (according to the mappings) and preserve the probe tag.
-
-Track only logs the information of PACKET_IN messages carrying the probe tag.
-
-3) Implementation of Tracing Module:
-Using the interfaces of RYU to construct probing packets.
-
-
-##4. Experiment
-- Two metrics: **accuracy** and **latency**
-Latency VS Path Length (Compared with SDN traceroute)
-Latency VS Different types of NFs
-Effectiveness and efficiency of **correlation procedure** (accuracy)
-
-##5. The weaknesses in its method
-1. It does not consider dynamic changes in the SFC
-2. It needs to find the correlation between incoming and outgoing flows
-3. It cannot handle the case of multiple paths
-
-
diff --git a/README.md b/README.md
index c68bea1..957673b 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Storage System Paper List
+# Zuoru's Storage System Reading List
-In this repo, it records some paper related to storage system, including **Data Deduplication** (aka, dedup), **Erasure Coding** (aka, EC), general **Distributed Storage System** (aka, DSS) and other related topics (i.e., Network Security.....), updating from time to time~
+A reading list on storage systems, including data deduplication, erasure coding, general storage, and other related topics (e.g., security), updated from time to time~
[TOC]
## A. Data Deduplication
@@ -27,6 +27,8 @@ In this repo, it records some paper related to storage system, including **Data
11. *Inside Dropbox: Understanding Personal Cloud Storage Services*----IMC'12
11. *Identifying Trends in Enterprise Data Protection Systems*----USENIX ATC'15 ([link](https://www.usenix.org/system/files/conference/atc15/atc15-paper-amvrosladis.pdf))
11. *Deduplication Analyses of Multimedia System Images*----HotStorage'18 ([link](https://www.usenix.org/system/files/conference/hotedge18/hotedge18-papers-suess.pdf))
+14. *Improving Docker Registry Design based on Production Workload Analysis*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-anwar.pdf))
+14. *Insights for Data Reduction in Primary Storage: a Practical Analysis*----SYSTOR'12 ([link](https://dl.acm.org/doi/pdf/10.1145/2367589.2367606))
### Deduplication System Design
@@ -44,15 +46,14 @@ In this repo, it records some paper related to storage system, including **Data
12. *SmartDedup: Optimizing Deduplication for Resource-constrained Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-yang-qirui.pdf))
13. Can't We All Get Along? Redesigning Protection Storage for Modern Workloads----USENIX ATC'18 ([link](https://www.usenix.org/system/files/conference/atc18/atc18-allu.pdf)) [summary](https://yzr95924.github.io/paper_summary/Redesigning-ATC'18.html)
14. *Deduplication in SSDs: Model and quantitative analysis*----MSST'12 ([link](https://ieeexplore.ieee.org/document/6232379))
-16. *iDedup: Latency-aware, Inline Data Deduplication for Primary Storage*----FAST'12 ([link]( https://www.usenix.org/legacy/event/fast12/tech/full_papers/Srinivasan.pdf )) [summary](https://yzr95924.github.io/paper_summary/iDedup-FAST'12.html)
-17. *DupHunter: Flexible High-Performance Deduplication for Docker Registries*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-zhao.pdf))
-18. *Design Tradeoffs for Data Deduplication Performance in Backup Workloads*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-fu.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupDesignTradeoff-FAST'15.html)
-19. *The Dilemma between Deduplication and Locality: Can Both be Achieved?*---FAST'21 ([link](https://www.usenix.org/system/files/fast21-zou.pdf)) [summary](https://yzr95924.github.io/paper_summary/MFDedup-FAST'21.html)
+15. *iDedup: Latency-aware, Inline Data Deduplication for Primary Storage*----FAST'12 ([link]( https://www.usenix.org/legacy/event/fast12/tech/full_papers/Srinivasan.pdf )) [summary](https://yzr95924.github.io/paper_summary/iDedup-FAST'12.html)
+16. *DupHunter: Flexible High-Performance Deduplication for Docker Registries*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-zhao.pdf))
+17. *Design Tradeoffs for Data Deduplication Performance in Backup Workloads*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-fu.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupDesignTradeoff-FAST'15.html)
+18. *The Dilemma between Deduplication and Locality: Can Both be Achieved?*---FAST'21 ([link](https://www.usenix.org/system/files/fast21-zou.pdf)) [summary](https://yzr95924.github.io/paper_summary/MFDedup-FAST'21.html)
19. *SLIMSTORE: A Cloud-based Deduplication System for Multi-version Backups*----ICDE'21 ([link](http://www.cs.utah.edu/~lifeifei/papers/slimstore-icde21.pdf))
20. *Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling*----ToS'21 ([link](https://dl.acm.org/doi/full/10.1145/3459626))
-20. *Sorted Deduplication: How to Process Thousands of Backup Streams*----MSST'16 ([link](https://storageconference.us/2016/Papers/SortedDeduplication.pdf))
-20. *Deriving and Comparing Deduplication Techniques Using a Model-Based Classification*----EuroSys'15 ([link](https://dl.acm.org/doi/pdf/10.1145/2741948.2741952))
-20. *DedupSearch: Two-Phase Deduplication Aware Keyword Search*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-elias.pdf))
+21. *Sorted Deduplication: How to Process Thousands of Backup Streams*----MSST'16 ([link](https://storageconference.us/2016/Papers/SortedDeduplication.pdf))
+22. *Deriving and Comparing Deduplication Techniques Using a Model-Based Classification*----EuroSys'15 ([link](https://dl.acm.org/doi/pdf/10.1145/2741948.2741952))
### Restore Performances
@@ -60,7 +61,7 @@ In this repo, it records some paper related to storage system, including **Data
2. *ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look-Ahead Window Assisted Chunk Caching*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-cao.pdf)) [summary](https://yzr95924.github.io/paper_summary/ALACC-FAST'18.html)
3. *Reducing Impact of Data Fragmentation Caused by In-line Deduplication*----SYSTOR'12 ([link](http://9livesdata.com/wp-content/uploads/2017/04/AsPresentedOnSYSTOR-1.pdf))
4. *Reducing Fragmentation Impact with Forward Knowledge in Backup Systems with Deduplication*----SYSTOR'15 ([link](https://dl.acm.org/doi/10.1145/2757667.2757678))
-5. *Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets*----MASCOTS'12
+5. *Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets*----MASCOTS'12 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6298180))
6. *Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance*----FAST'19 ([link](https://www.usenix.org/system/files/fast19-cao.pdf)) [summary](https://yzr95924.github.io/paper_summary/LookBackWindow-FAST'19.html)
7. *Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication*---FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final124.pdf)) [summary](https://yzr95924.github.io/paper_summary/ImproveRestore-FAST'13.html)
8. *Chunk Fragmentation Level: An Effective Indicator for Read Performance Degradation in Deduplication Storage*----HPCC'11
@@ -99,7 +100,7 @@ In this repo, it records some paper related to storage system, including **Data
29. *S2Dedup: SGX-enabled Secure Deduplication*----SYSTOR'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3456727.3463773)) [summary](https://yzr95924.github.io/paper_summary/S2Dedup-SYSTOR'21.html)
30. *Secure Deduplication of General Computations*----USENIX ATC'15 ([link](https://www.usenix.org/system/files/conference/atc15/atc15-paper-tang.pdf))
31. *When Delta Sync Meets Message-Locked Encryption: a Feature-based Delta Sync Scheme for Encrypted Cloud Storage*----ICDCS'21 ([link](https://shenzr.github.io/publications/featuresync-icdcs21.pdf)) [summary](https://yzr95924.github.io/paper_summary/FeatureSync-ICDCS'21.html)
-31. *DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-bacs.pdf)) [summary](https://yzr95924.github.io/paper_summary/DeepSketch-FAST'22.html)
+31. *DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-bacs.pdf)) [summary](https://yzr95924.github.io/paper_summary/DUPEFS-FAST'22.html)
### Metadata Management
@@ -141,7 +142,10 @@ In this repo, it records some paper related to storage system, including **Data
13. Ddelta: A Deduplication-inspired Fast Delta Compression Approach----Performance'14 ([link](https://www.sciencedirect.com/science/article/pii/S0166531614000790))
14. *Odess: Speeding up Resemblance Detection for Redundancy Elimination by Fast Content-Defined Sampling*----ICDE'14 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9458911))
15. *Exploring the Potential of Fast Delta Encoding: Marching to a Higher Compression Ratio*----CLUSTER'20 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9229609)) [summary](https://yzr95924.github.io/paper_summary/Gdelta-CLUSTER'20.html)
-15. *DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-park.pdf))
+15. *DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-park.pdf)) [summary](https://yzr95924.github.io/paper_summary/DeepSketch-FAST'22.html)
+17. *DedupSearch: Two-Phase Deduplication Aware Keyword Search*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-elias.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupSearch-FAST'22.html)
+17. *To Zip or not to Zip: Effective Resource Usage for Real-Time Compression*----FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final38.pdf)) [summary](https://yzr95924.github.io/paper_summary/CompressionEst-FAST'13.html)
+17. *Adaptively Compressing IoT Data on the Resource-constrained Edge*----HotEdge'20 ([link](https://www.usenix.org/system/files/hotedge20_paper_lu.pdf))
### Memory && Block-Layer Deduplication
@@ -151,6 +155,8 @@ In this repo, it records some paper related to storage system, including **Data
4. *OrderMergeDedup: Efficient, Failure-Consistent Deduplication on Flash*----FAST'16 ([link](https://www.usenix.org/system/files/conference/fast16/fast16-papers-chen-zhuan.pdf))
5. *CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives*----FAST'11 ([link](https://www.usenix.org/legacy/event/fast11/tech/full_papers/Chen.pdf)) [summary](https://yzr95924.github.io/paper_summary/CAFTL-FAST'11.html)
5. *Remap-SSD: Safely and Efficiently Exploiting SSD Address Remapping to Eliminate Duplicate Writes*----FAST'21 ([link](https://www.usenix.org/system/files/fast21-zhou.pdf))
+7. *Memory Deduplication for Serverless Computing with Medes*----EuroSys'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3492321.3524272))
+8. On the Effectiveness of Same-Domain Memory Deduplication----EuroSec'22 ([link](https://download.vusec.net/papers/dedupestreturns_eurosec22.pdf))
### Data Chunking
1. *SS-CDC: A Two-stage Parallel Content-Defined Chunking for Deduplicating Backup Storage*----SYSTOR'19 ([link]( http://ranger.uta.edu/~sjiang/pubs/papers/ni19-ss-cdc.pdf )) [summary](https://yzr95924.github.io/paper_summary/SSCDC-SYSTOR'19.html)
@@ -171,10 +177,6 @@ In this repo, it records some paper related to storage system, including **Data
3. *Nitro: A Capacity-Optimized SSD Cache for Primary Storage*----USENIX ATC'14 ([link](https://www.usenix.org/system/files/conference/atc14/atc14-paper-li_cheng_nitro.pdf))
4. *Austere Flash Caching with Deduplication and Compression*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-wang-qiuping.pdf))
-### Benchmark
-
-1. *SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-gracia-tinedo.pdf))
-
### Garbage Collection
1. *Memory Efficient Sanitization of a Deduplicated Storage System*----FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final100_0.pdf)) [summary](https://yzr95924.github.io/paper_summary/MemorySanitization-FAST'13.html)
@@ -194,8 +196,9 @@ In this repo, it records some paper related to storage system, including **Data
4. *Tradeoffs in Scalable Data Routing for Deduplication Clusters*----FAST'11 ([link](https://www.usenix.org/legacy/events/fast11/tech/full_papers/Dong.pdf)) [summary]( https://yzr95924.github.io/paper_summary/TradeoffDataRouting-FAST'11.html )
5. *Cluster and Single-Node Analysis of Long-Term Deduplication Patterns*----ToS'18 ([link](https://dl.acm.org/doi/pdf/10.1145/3183890)) [summary](https://yzr95924.github.io/paper_summary/ClusterSingle-ToS'18.html)
6. *Decentralized Deduplication in SAN Cluster File Systems*----USENIX ATC'09 ([link](https://static.usenix.org/events/usenix09/tech/full_papers/clements/clements.pdf))
-6. *GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-nachman.pdf))
-6. *The what, The from, and The to: The Migration Games in Deduplicated Systems*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-kisous.pdf))
+7. *HYDRAstore: A Scalable Secondary Storage*----FAST'09 ([link](http://9livesdata.com/wp-content/uploads/2017/04/HYDRAstor-A-Scalable-Secondary-Storage-1.pdf))
+8. *GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-nachman.pdf))
+9. *The what, The from, and The to: The Migration Games in Deduplicated Systems*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-kisous.pdf)) [summary](https://yzr95924.github.io/paper_summary/MigrationGame-FAST'22.html)
## B. Erasure Coding
@@ -279,6 +282,8 @@ In this repo, it records some paper related to storage system, including **Data
17. *Splinter: Practical Private Queries on Public Data*----NSDI'17 ([link](https://www.usenix.org/system/files/conference/nsdi17/nsdi17-wang-frank.pdf))
18. *Quantifying Information Leakage of Deterministic Encryption*----CCSW'19 ([link]( http://users.cs.fiu.edu/~mjura011/documents/2019_CCSW_Quantifying_Information_Leakage_of_Deterministic_Encryption )) [summary](https://yzr95924.github.io/paper_summary/QuantifyingInformationLeakage-CCSW'19.html)
18. *Pancake: Frequency Smoothing for Encrypted Data Stores*----USENIX Security'20 ([link](https://www.usenix.org/system/files/sec20-grubbs.pdf))
+19. *Hiding the Lengths of Encrypted Message via Gaussian Padding*----CCS'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3460120.3484590))
+20. *On Fingerprinting Attacks and Length-Hiding Encryption*----CT-RSA'22 ([link]())
### Secure Deletion
@@ -340,8 +345,9 @@ In this repo, it records some paper related to storage system, including **Data
11. *The Google File System*----SOSP'03 ([link](https://dl.acm.org/doi/pdf/10.1145/945445.945450))
12. *Bigtable: A Distributed Storage System for Structured Data*----OSDI'06 ([link](https://dl.acm.org/doi/pdf/10.1145/1365815.1365816))
13. *Duplicacy: A New Generation of Cloud Backup Tool Based on Lock-Free Deduplication*----ToCC'20 ([link](https://github.com/gilbertchen/duplicacy/blob/master/duplicacy_paper.pdf)) [summary](https://yzr95924.github.io/paper_summary/Duplicacy-ToCC'20.html)
+13. *RACS: A Case for Cloud Storage Diversity*----SoCC'10 ([link](http://pubs.0xff.co/papers/racs-socc.pdf))
-### New PAXOS
+### Consensus
1. *In Search of an Understandable Consensus Algorithm*----USENIX ATC'14 ([link](https://raft.github.io/raft.pdf))
@@ -350,6 +356,7 @@ In this repo, it records some paper related to storage system, including **Data
1. *TinyLFU: A Highly Efficient Cache Admission Policy*----ACM ToS'17 ([link](https://arxiv.org/pdf/1512.00727.pdf))
2. *It’s Time to Revisit LRU vs. FIFO*----HotStorage'20 ([link](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf)) [summary](https://yzr95924.github.io/paper_summary/Cache-HotStorage'20.html) [trace](http://iotta.snia.org/traces/key-value)
3. *Unifying the Data Center Caching Layer — Feasible? Profitable?*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470884))
+4. *Learning Cache Replacement with Cacheus*----FAST'21 ([link](https://www.usenix.org/system/files/fast21-rodriguez.pdf))
### Hash
@@ -357,6 +364,7 @@ In this repo, it records some paper related to storage system, including **Data
2. *An Analysis of Compare-by-Hash*----HotOS'03 ([link](http://www.cs.utah.edu/~shanth/stuff/research/dup_elim/hash_cmp.pdf))
3. *On-the-Fly Verification of Rateless Erasure Codes for Efficient Content Distribution*----S&P'04 ([link](https://pdos.csail.mit.edu/papers/otfvec/paper.pdf))
4. *Algorithmic Improvements for Fast Concurrent Cuckoo Hashing*----EuroSys'14 ([link](https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf))
+4. *Don’t Thrash: How to Cache your Hash on Flash*----HotStorage'11 ([link](https://www.usenix.org/legacy/events/hotstorage11/tech/final_files/Bender.pdf))
### Lock-free storage
1. *A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring*----IPDPS'10 ([link](https://www.cse.cuhk.edu.hk/~pclee/www/pubs/ipdps10.pdf))
@@ -383,6 +391,9 @@ In this repo, it records some paper related to storage system, including **Data
1. *From blocks to rocks: a natural extension of zoned namespaces*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470870))
1. *Don’t Be a Blockhead: Zoned Namespaces Make Work on Conventional SSDs Obsolete*----HotOS'21 ([link](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s07-stavrinos.pdf)) [summary](https://yzr95924.github.io/paper_summary/BlockHead-HotOS'21.html)
1. Zone Append: A New Way of Writing to Zoned Storage----Vault'20 ([link](https://www.usenix.org/system/files/vault20_slides_bjorling.pdf))
+1. *What Systems Researchers Need to Know about NAND Flash*----HotStorage'13 ([link](https://www.usenix.org/system/files/conference/hotstorage13/hotstorage13-desnoyers.pdf))
+1. *Caveat-Scriptor: Write Anywhere Shingled Disks*----HotStorage'15 ([link](https://www.usenix.org/system/files/conference/hotstorage15/hotstorage15-kadekodi.pdf))
+1. *Improving the Reliability of Next Generation SSDs using WOM-v Codes*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-jaffer.pdf))
### File system
@@ -393,8 +404,22 @@ In this repo, it records some paper related to storage system, including **Data
5. *EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-gao.pdf))
5. *F2FS: A New File System for Flash Storage*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-lee.pdf))
5. *How to Copy Files*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-zhan.pdf))
+5. *BetrFS: A Compleat File System for Commodity SSDs*----EuroSys'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3492321.3519571))
+5. *The Full Path to Full-Path Indexing*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-zhan.pdf))
+5. *BetrFS: A Right-Optimized Write-Optimized File System*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-jannen_william.pdf))
+11. *Filesystem Aging: It's more Usage than Fullness*----HotStorage'19 ([link](https://www.cs.unc.edu/~porter/pubs/hotstorage19-paper-conway.pdf))
+12. *File Systems Fated for Senescence? Nonsense, Says Science!*----FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-conway.pdf))
### Persistent Memories
1. *SLM-DB: Single-Level Key-Value Store with Persistent Memory*----FAST'19 ([link](https://www.usenix.org/system/files/fast19-kaiyrakhmet.pdf)) [summary](https://yzr95924.github.io/paper_summary/SLMDB-FAST'19.html)
2. *Redesigning LSMs for Nonvolatile Memory with NoveLSM*----USENIX ATC'18 ([link](https://www.usenix.org/system/files/conference/atc18/atc18-kannan.pdf)) [summary](https://yzr95924.github.io/paper_summary/NoveLSM-ATC'18.html)
+
+### Data Structure
+
+1. *An Introduction to Be-trees and Write-Optimization*----USENIX Login'15 ([link](https://www.usenix.org/system/files/login/articles/login_oct15_05_bender.pdf)) [code](https://github.com/oscarlab/Be-Tree)
+1. *Building Workload-Independent Storage with VT-Trees*----FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final165_0.pdf))
+
+### Benchmark
+
+1. *SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-gracia-tinedo.pdf))
\ No newline at end of file
diff --git a/StoragePaperNote/Deduplication/Distributed-Dedup/MigrationGame-FAST'22.md b/StoragePaperNote/Deduplication/Distributed-Dedup/MigrationGame-FAST'22.md
new file mode 100644
index 0000000..aeabae6
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Distributed-Dedup/MigrationGame-FAST'22.md
@@ -0,0 +1,108 @@
+---
+typora-copy-images-to: ../paper_figure
+---
+The what, The from, and The to: The Migration Games in Deduplicated Systems
+------------------------------------------
+| Venue | Category |
+| :------------------------: | :------------------: |
+| FAST'22 | Distributed Deduplication |
+[TOC]
+
+## 1. Summary
+### Motivation of this paper
+
+- motivation
+  - the high-level management aspects of large-scale systems (e.g., capacity planning, caching, and cost of service) still need to be adapted to deduplication storage
+  - data migration: files are remapped between separate **deduplication domains**, or **volumes**
+ - volumes: a single server within a large-scale system, or an independent set of servers dedicated to a customer or dataset
+ - employ a separate fingerprint index in each physical server
+ - optimize several possibly conflicting objectives
+ - the physical size of the stored data (after migration)
+ - the load balancing between the system's volumes
+ - the network bandwidth generated by the migration
+- the main goal
+ - formulate the general migration problem for deduplicated systems as an optimization problem
+ - minimize the system's size
+ - ensuring that the storage load is evenly distributed between the system's volumes (**load balancing** consideration)
+ - the network traffic required for the migration does not exceed its allocation (**traffic** consideration)
+
+### Migration Games
+
+- problem statement
+ - minimizing migration traffic
+ - the amount of data that is transferred between volumes during migration
+ - load balancing
+ - trade-off between minimizing the total physical data size and maximizing load balancing
+ - extreme case: map all files to a single volume
+ - evenly distribute the capacity load between volumes
+    - use a fairness metric: the ratio between the size of the smallest volume in the system and that of the largest volume (perfect: 1; see the sketch after this list)
+ - traffic constraint, load balancing constraint
+ - traffic constraint: the maximum traffic allowed during migration
+ - load balancing constraint: a margin of the average volume size
+- Greedy (extend SketchVolume)
+ - iterates over all the files in each volume, and calculates the space-saving ratio from remapping a single file to each of the other volumes
+ - each phase is allocated an even portion of the traffic allocated for migration
+ - load-balancing step
+ - remap files from large volumes to small ones, until the volume sizes are within the margin defined for this phase
+ - capacity-reduction step
+ - use **remaining traffic** to reduce the system's size
+- ILP (extend GoSeed)
+  - all variables are Boolean
+ - objective: maximize the sum of sizes of all blocks that are deleted minus all blocks that are copied
+ - acceleration methods
+ - fingerprint sampling: k leading zeroes, reducing the number of blocks in the problem
+ - solver timeout: halts the ILP solver's execution after a pre-determined runtime
+- Clustering
+  - main idea: files are similar if they share a large portion of their blocks
+ - create clusters of similar files and to assign each cluster to a volume
+ - remapping those files that were assigned to a volume different from their original location
+ - hierarchical clustering
+ - in each iteration, merge the most similar pair of clusters into a new cluster
+ - file similarity
+ - use Jaccard index for shared blocks
+ - traffic and load-balancing consideration
+ - determine the maximal cluster size by estimating the system's size after migration
+ - sensitivity to sample
+ - rather than merging the pair of clusters with the smallest distance, we merge a **random** pair from the set of pairs with the smallest distances
+ - constructing the final migration plan
+ - for the same given system and migration constraints, execute the clustering process with different parameters, use the best deletion as the final result
+
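+A minimal sketch of these mechanics, assuming a toy model in which a volume is a dict mapping block fingerprints to block sizes and a file is a set of fingerprints; the model, the helper names, and the greedy step below are assumptions of this sketch, not the paper's implementation:
+
+```python
+def physical_size(volume):
+    # Physical size of a volume = sum of its unique blocks' sizes.
+    return sum(volume.values())
+
+def balance_score(volumes):
+    # Fairness metric used by the paper: smallest volume size divided by
+    # largest volume size (a perfectly balanced system scores 1).
+    sizes = [physical_size(v) for v in volumes]
+    return min(sizes) / max(sizes)
+
+def jaccard(file_a, file_b):
+    # File similarity used by the Cluster algorithm: shared blocks over union.
+    return len(file_a & file_b) / len(file_a | file_b)
+
+def remap_saving(blocks, src, dst):
+    # Net space saved by remapping one file from volume src to volume dst:
+    # bytes deleted from src minus bytes copied to dst (the migration traffic).
+    # Treating all of the file's blocks as deletable is an approximation;
+    # blocks still referenced by other files on src must actually stay.
+    deleted = sum(src[fp] for fp in blocks)
+    copied = sum(src[fp] for fp in blocks if fp not in dst)
+    return deleted - copied
+
+def greedy_step(volumes, files, owner, traffic_left):
+    # One capacity-reduction step of a Greedy-style plan: pick the single
+    # (file, destination) remap with the best saving that still fits the
+    # remaining traffic budget. `owner` maps a file name to its volume index.
+    best = None
+    for name, blocks in files.items():
+        src = volumes[owner[name]]
+        for j, dst in enumerate(volumes):
+            if j == owner[name]:
+                continue
+            copied = sum(src[fp] for fp in blocks if fp not in dst)
+            saving = remap_saving(blocks, src, dst)
+            if copied <= traffic_left and (best is None or saving > best[0]):
+                best = (saving, name, j)
+    return best  # (saving, file, target volume index), or None if nothing fits
+```
+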
+### Implementation and Evaluation
+
+- trace:
+ - MS, FSL, Linux (all of them are public)
+- evaluation
+ - basic comparison between algorithms
+ - the deletion percentage of the initial system's physical size
+ - balance score
+ - the total runtime
+ - sensitivity to problem parameters
+ - effect of sampling degree
+ - effect of load balancing and traffic constraints
+ - effect of randomization on Cluster
+ - effect of the number of volumes
+
+## 2. Strength (Contributions of the paper)
+
+- formulate a general migration problem with three approaches
+ - a greedy algorithm, an ILP-based approach, and hierarchical clustering
+
+## 3. Weakness (Limitations of the paper)
+
+- does not provide a system to apply its algorithm
+ - how to collect metadata for solving the optimization problem?
+- hard to follow, as the data migration problem is not yet common
+  - it only arises in very large-scale storage systems
+
+## 4. Some Insights (Future work)
+
+- related work
+ - SketchVolume-FAST'19
+ - a greedy algorithm
+ - GoSeed-FAST'20
+ - files are remapped into an initially **empty** target volume
+ - Rangoli-SYSTOR'13
+ - a greedy algorithm for space reclamation
+ - a set of files is deleted to reclaim some of the system's capacity
+- data migration in distributed deduplication systems
+ - if a subsystem becomes full while another subsystem has available capacity, migration is quicker and cheaper than adding capacity to the full subsystem
\ No newline at end of file
diff --git a/StoragePaperNote/Deduplication/Post-Dedup/CompressionEst-FAST'13.md b/StoragePaperNote/Deduplication/Post-Dedup/CompressionEst-FAST'13.md
new file mode 100644
index 0000000..765c3d2
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Post-Dedup/CompressionEst-FAST'13.md
@@ -0,0 +1,94 @@
+---
+typora-copy-images-to: ../paper_figure
+---
+To Zip or Not to Zip: Effective Resource Usage for Real-Time Compression
+---------------------------------------
+
+| Venue | Category |
+| :------------------------: | :------------------: |
+| FAST'13 | Compression |
+[TOC]
+
+## 1. Summary
+### Motivation of this paper
+
+- motivation
+ - adding compression on the data path consumes **scarce CPU** and **memory** resources on the storage system
+ - real-time compression for block and file primary storage systems
+ - it is advisable to avoid compressing what we refer to as "incompressible" data
+    - standard LZ-type compression algorithms incur higher performance overheads **when the data does not compress well**
+- main problem
+ - identifying **incompressible data** in an efficient manner, allowing systems to effectively utilize their limited resources
+ - a macro-scale compression estimation for the whole data set (**offline**)
+ - a micro-scale compressibility test for individual write operations (**online**)
+
+### Compression Estimation/Test
+
+- the macro-scale solution
+ - for an entire volume or file system of a storage system
+ - estimate the overall compression ratio with **accuracy guarantee**
+ - the general framework
+ - choose `m` random locations
+ - compute an average of the compression ratio of these locations
+ - location, contribution
+    - real-life implementations of compression algorithms are subject to **locality limits** (a chunk can be used to define the locality)
+ - don’t want to hold long back pointers
+ - memory management, need to flush their buffers
+ - define the contribution of a byte as **the compression ratio of its locality**
+- the micro-scale solution
+ - for a single write: 8KB, 16KB, 32KB, 128KB
+ - recommend to zip or not to zip (has to be much faster than actual compression)
+ - do not want to read the entire chunk, impossible to get guarantees
+ - the heuristics method
+ - collect **a set of basic indicators** about the chunk
+ - from random samples from the chunk rather than the whole chunk
+ - core-set size: the character set that makes up most of the data
+ - byte-entropy
+ - symbol-pairs distribution indicator (from random distribution)
+ - sample: at most 2KB of data per write buffer
+ - 16 consecutive bytes from up to 128 randomly chosen locations
+ - define several thresholds to test the indicators
+
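+A minimal sketch of the micro-scale indicators; the sampling parameters (16-byte runs, up to 128 locations, at most 2 KiB) follow the paper's description, while the helper names and the 90% core-set coverage are assumptions of this sketch:
+
+```python
+import math
+import random
+from collections import Counter
+
+def sample_runs(buf, runs=128, run_len=16):
+    # Sample at most 2 KiB per write buffer: 16 consecutive bytes from up to
+    # 128 randomly chosen locations (small buffers are used whole).
+    if len(buf) <= runs * run_len:
+        return bytes(buf)
+    starts = random.sample(range(len(buf) - run_len + 1), runs)
+    return b"".join(buf[s:s + run_len] for s in starts)
+
+def byte_entropy(data):
+    # Shannon entropy of the byte histogram, in bits per byte (0 to 8);
+    # values near 8 indicate near-random, likely incompressible data.
+    n = len(data)
+    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())
+
+def core_set_size(data, coverage=0.9):
+    # Core-set size: how many distinct byte values account for `coverage` of
+    # the sample; a small core set hints that the data compresses well.
+    counts = sorted(Counter(data).values(), reverse=True)
+    acc, k = 0, 0
+    for c in counts:
+        acc, k = acc + c, k + 1
+        if acc >= coverage * len(data):
+            break
+    return k
+```
+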
+### Implementation and Evaluation
+
+- implementation
+ - the macro-scale solution: written in C, multi-threaded
+- evaluation
+ - compression ratios v.s. the number of samples
+ - running time v.s. compression trade-off
+ - compared with the prefix method and the full compression
+
+## 2. Strength (Contributions of the paper)
+
+- the macro-scale test provides a quick and accurate estimate for which data sets to compress
+- the micro-scale test heuristics have proved critical in reducing resource consumption while maximizing compression for volumes containing a mix of compressible and incompressible data
+
+## 3. Weakness (Limitations of the paper)
+
+- it is not general to other compression algorithms (e.g., LZ4, ZSTD)
+- how to define the thresholds that mark a good point for disabling compression is not clear
+- evaluation is limited, no end-to-end system performance evaluation
+
+## 4. Some Insights (Future work)
+
+- a bit about compression techniques
+  - this paper focuses on **Zlib** - a popular compression engine (used in zip), which combines:
+    - **LZ compression**: pointers instead of repetitions
+    - **Huffman encoding**: shorter encodings for popular characters
+- existing solutions for estimating compression ratios
+ - by file extension
+ - not always accurate, not always available
+ - look at the actual data
+ - scan and compress everything
+ - look at a prefix of (a file or a chunk) and deduce about the rest
+      - no guarantees on the outcome
+ - good for compressible data - zero overhead
+
+- put all together (a hedged decision sketch follows this list)
+ - when most is compressible
+ - use prefix estimation
+ - when significant percent is incompressible
+ - use heuristics method
+ - when most is incompressible
+ - turn compression off
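+
+A hedged sketch of this combined policy, reusing the indicator helpers from the earlier sketch; every threshold below is illustrative, not a value tuned in the paper:
+
+```python
+import zlib
+
+def should_zip(buf):
+    # Hypothetical zip-or-not decision covering the three regimes above:
+    # clearly incompressible -> skip, clearly compressible -> compress,
+    # gray zone -> fall back to prefix estimation.
+    sample = sample_runs(buf)
+    h = byte_entropy(sample)
+    if h > 7.5:                      # near-random bytes: turn compression off
+        return False
+    if h < 5.5 or core_set_size(sample) < 50:
+        return True                  # clearly compressible: zip it
+    prefix = bytes(buf[:4096])       # gray zone: compress a prefix, extrapolate
+    return len(zlib.compress(prefix)) < 0.9 * len(prefix)
+```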
diff --git a/StoragePaperNote/Deduplication/Post-Dedup/DUPEFS-FAST'22.md b/StoragePaperNote/Deduplication/Post-Dedup/DUPEFS-FAST'22.md
new file mode 100644
index 0000000..11b6d33
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Post-Dedup/DUPEFS-FAST'22.md
@@ -0,0 +1,25 @@
+---
+typora-copy-images-to: ../paper_figure
+---
+DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels
+------------------------------------------
+| Venue | Category |
+| :------------------------: | :------------------: |
+| FAST'22 | secure deduplication |
+[TOC]
+
+## 1. Summary
+### Motivation of this paper
+
+-
+
+### Method Name
+
+### Implementation and Evaluation
+
+## 2. Strength (Contributions of the paper)
+
+## 3. Weakness (Limitations of the paper)
+
+## 4. Some Insights (Future work)
+
diff --git a/StoragePaperNote/Deduplication/Post-Dedup/DedupSearch-FAST'22.md b/StoragePaperNote/Deduplication/Post-Dedup/DedupSearch-FAST'22.md
new file mode 100644
index 0000000..47c743e
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Post-Dedup/DedupSearch-FAST'22.md
@@ -0,0 +1,108 @@
+---
+typora-copy-images-to: ../paper_figure
+---
+DedupSearch: Two-Phase Deduplication Aware Keyword Search
+------------------------------------------
+| Venue | Category |
+| :------------------------: | :------------------: |
+| FAST'22 | Post-Deduplication functionality |
+[TOC]
+
+## 1. Summary
+### Motivation of this paper
+
+- motivation
+  - in deduplicated storage, deduplication creates multiple logical pointers, from different files and even users, to each physical chunk
+  - this many-to-one relationship complicates many functionalities (e.g., caching, capacity planning, and support for QoS)
+    - but it also presents an opportunity to rethink those functionalities to be **deduplication-aware** and **more efficient**
+ - this paper aims to address the keyword search issue in deduplicated storage
+- the main goal
+ - focus on **offline search** of large, deduplicated storage systems for legal or analytics purposes
+- why other approaches cannot work
+  - their index size is proportional to **the logical size of the data** and consumes a large fraction of storage capacity
+  - they are not useful for binary strings or more complex keyword patterns (they assume a delimiter set such as whitespace)
+ - their data structures must be continually updated as new data is received
+
+### DedupSearch
+
+- naive approaches
+ - opening each file and scanning its content for the specified keywords (**inefficient due to fragmentation and resulting random accesses**)
+ - a given chunk may be read repeatedly from storage due to deduplication
+- main idea
+ - begin with a **physical phase** that performs a **physical scan** of the storage system and scans each chunk of data for the keywords
+ - reading the data sequentially with large I/Os as well as reading each chunk of data only once
+    - record the **exact match** of the keyword, if it is found, as well as the prefixes or suffixes of the keyword (**partial matches**) found at chunk boundaries (see the sketch after this list)
+ - then, with a **logical phase** that performs a logical scan of the file system by traversing the chunk pointers that make up the files
+ - instead of reading the actual data chunks
+- challenges
+  - most deduplication systems do not maintain "back pointers" from chunks to the files that contain them (addressed by the logical phase)
+ - cannot associate keyword matches in a chunk with the corresponding file
+ - keywords might be split **between adjacent chunks** in a file (addressed by recording the partial matches)
+ - record the prefixes of the keyword that appear at the end of a chunk and suffixes that appear at the beginning of a chunk
+
+- string-matching algorithm
+ - use the Aho-Corasick string-match algorithm
+ - a trie-based algorithm for matching multiple strings in a single scan of the input
+ - construct a trie for the **reverse** dictionary to identify suffixes at the beginning of a chunk
+- match result database
+ - exact matches
+ - chunk-result record
+ - location-list record: only if the chunk contains more than one exact match
+ - long location-list record
+ - tiny substrings
+    - keywords that begin or end with frequent letters of the alphabet might result in the allocation of numerous chunk-result records
+ - tiny-result record
+      - only if the chunk contains neither an exact match nor a partial match
+ - database organization
+ - in-memory database: chunk-result index, location-list index
+ - disk-based hash table: the tiny-result index
+- generation of full search results
+  - for each file in the system, the **file recipe** is read, and the fingerprints of its chunks are used to look up result records in the database
+ - collecting exact match and combining partial matches for each fingerprint
+ - the logical phase can be parallelized to some extent
+ - separate backups or files can be processed in parallel
+
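+A minimal sketch of the two phases, under simplifying assumptions: plain substring matching stands in for the paper's Aho-Corasick automaton, a keyword is assumed shorter than a chunk (matches spanning three or more chunks are not handled), and the chunk store and recipes are toy dict stand-ins:
+
+```python
+def boundary_marks(data, kw):
+    # Partial matches at the chunk boundaries: keyword prefixes that end this
+    # chunk, and keyword suffixes that start it.
+    prefixes = {j for j in range(1, len(kw)) if data.endswith(kw[:j])}
+    suffixes = {j for j in range(1, len(kw)) if data.startswith(kw[-j:])}
+    return prefixes, suffixes
+
+def physical_phase(chunks, kw):
+    # Phase 1: scan each unique chunk exactly once, recording exact matches
+    # and the boundary partial matches per fingerprint.
+    return {fp: (kw in data, *boundary_marks(data, kw))
+            for fp, data in chunks.items()}
+
+def logical_phase(recipes, results, kw):
+    # Phase 2: traverse file recipes only (no chunk data is read), combining a
+    # prefix at one chunk's end with the complementary suffix at the start of
+    # the next chunk.
+    matched = []
+    for fname, fps in recipes.items():
+        prev_prefixes = set()
+        for fp in fps:
+            exact, prefixes, suffixes = results[fp]
+            if exact or any(len(kw) - p in suffixes for p in prev_prefixes):
+                matched.append(fname)
+                break
+            prev_prefixes = prefixes
+    return matched
+
+chunks = {"f1": b"hello wor", "f2": b"ld!"}   # fingerprint -> chunk data
+recipes = {"a.txt": ["f1", "f2"]}             # file -> ordered fingerprints
+print(logical_phase(recipes, physical_phase(chunks, b"world"), b"world"))
+# -> ['a.txt']: "world" straddles the f1/f2 boundary
+```
+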
+### Implementation and Evaluation
+
+- implementation
+  - based on Destor: three restore threads
+ - use Destor to ingest all the data
+- evaluation
+ - traces
+    - Wikipedia backups, Linux kernel versions, and Web server VM backups
+    - Linux kernel versions are ordered by major version, minor version, and patch
+    - Wikipedia backups: archived twice a month since 2017; each snapshot is 1 GiB and consists of a single archive file
+ - experiments
+ - DedupSearch performance
+ - effect of deduplication ratio, chunk size, dictionary size, and keywords in the dictionary
+ - DedupSearch data structures
+ - index sizes, database accesses
+ - DedupSearch overheads
+ - physical phase, logical phase
+
+## 2. Strength (Contributions of the paper)
+
+- very strong experiments
+- address the string search issue from the deduplication aspect (a new direction)
+ - no previous work targets this issue
+
+## 3. Weakness (Limitations of the paper)
+
+- the scenario is limited
+  - it is more appropriate when queries are **infrequent** and moderate latency is acceptable, such as in legal discovery
+- the main idea is very similar to DeduplicationGC-FAST'17, GoSeed-FAST'20
+ - process the **post-deduplication data** **sequentially** along with an analysis phase **on the file recipes**
+
+- lack the support of wildcards
+  - since its prefix/suffix approach incurs high overhead, it would be more challenging to support wildcards
+    - doing so would require attempting to match the chunk content starting at all possible offsets within the keyword
+
+## 4. Some Insights (Future work)
+
+- the concept from **near-storage processing**
+ - the storage system supports certain computations to **reduce I/O traffic and memory usage**
+
+- the restore process considered by it
+ - parse the file recipe
+ - looking up the chunk locations in the fingerprint index
+ - reading their containers
\ No newline at end of file
diff --git a/StoragePaperNote/Deduplication/Secure-Dedup/DUPEFS-FAST'22.md b/StoragePaperNote/Deduplication/Secure-Dedup/DUPEFS-FAST'22.md
new file mode 100644
index 0000000..2a8771b
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Secure-Dedup/DUPEFS-FAST'22.md
@@ -0,0 +1,96 @@
+---
+typora-copy-images-to: ../paper_figure
+---
+DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels
+------------------------------------------
+| Venue | Category |
+| :------------------------: | :------------------: |
+| FAST'22 | Secure Deduplication |
+[TOC]
+
+## 1. Summary
+### Motivation of this paper
+
+- motivation
+ - the implementation in today's advanced filesystems such as ZFS and Btrfs yields **timing side channels** that can reveal whether a chunk of data has been deduplicated
+ - explore the security risks existing in filesystem deduplication
+- main goal
+  - use carefully crafted read/write operations to show that exploitation is not only feasible, but that the signal can be amplified to mount **byte-granular attacks over the network**
+ - the main difference from previous secure deduplication work (memory deduplication):
+ - filesystem operations tend to be **asynchronous** for efficiency
+    - filesystem deduplication operates at a large granularity (often as large as 128 KiB)
+
+### DUPEFS
+
+- threat model
+  - an attacker who has direct or indirect (possibly remote) access to the same filesystem as a victim, where the filesystem performs inline deduplication
+    - local: using low-level system calls such as write(), read(), sync(), fsync()
+    - remote: interacts with the filesystem through a program that is not under the attacker's control
+ - e.g., a server program
+- challenges
+ - **performance**: the I/O operations are mostly asynchronous to hide the latency
+    - filesystem data caching complicates the construction of a timing attack
+ - **reliability**: even if data is deduplicated, the metadata still needs to be written to disk, which interferes with the timing channel
+ - **capacity**: modern filesystems perform deduplication only across many blocks that are either temporally or spatially close to each other, clustered together in a deduplication record
+ - increase the entropy of any target secret deduplication record
+- data fingerprinting
+  - relies on the general timed read/write primitive to **reveal the presence of existing known but inaccessible** data (a timing sketch follows the mitigation list)
+- data exfiltration
+  - allows two colluding parties with direct/indirect access to the same system to communicate over a stealthy covert channel
+- data leak
+ - alignment probing
+ - stretch controlled data to fill the deduplication record minus one or more bytes of secret data
+ - secret spraying
+ - generate a stronger signal over LAN/WAN
+ - spray candidate secret values over many deduplication records and issue many writes for the corresponding guesses
+- attack primitives (summary figure omitted)
+
+- mitigation
+ - using pseudo-same-behavior
+ - write path
+ - even for duplicated data, it still overwrites existing on-disk data
+ - slow down deduplicated write path
+ - read path
+ - introduce time jitter on the read path
+ - enforce pseudo-same-behavior for disk access patterns
+
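+A minimal sketch of the timed-write primitive, with loud assumptions: the paths, the 128 KiB record size, the trial count, and the 2x ratio threshold are all illustrative; repeated probes will also deduplicate against the probe files themselves; and a real attack additionally needs the paper's cache-evasion and spraying techniques to obtain a usable signal:
+
+```python
+import os
+import time
+
+REC = 128 * 1024  # assumed deduplication record size (e.g., ZFS recordsize)
+
+def timed_record_write(path, block):
+    # Write one full record and fsync it, returning the elapsed time. On an
+    # inline-dedup filesystem, a duplicate record tends to complete faster,
+    # since only metadata must be persisted.
+    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+    try:
+        t0 = time.perf_counter()
+        os.write(fd, block)
+        os.fsync(fd)
+        return time.perf_counter() - t0
+    finally:
+        os.close(fd)
+
+def probably_deduplicated(guess, trials=16):
+    # Compare the median latency of writing the guessed record against fresh
+    # random records; a clearly faster guess suggests it already exists.
+    dup = sorted(timed_record_write("/tmp/dup_probe", guess)
+                 for _ in range(trials))[trials // 2]
+    uniq = sorted(timed_record_write("/tmp/uniq_probe", os.urandom(REC))
+                  for _ in range(trials))[trials // 2]
+    return dup * 2 < uniq
+
+guess = b"admin:hunter2\n".ljust(REC, b"\0")  # hypothetical secret candidate
+print("likely present" if probably_deduplicated(guess) else "no clear signal")
+```
+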
+### Implementation and Evaluation
+
+- evaluation
+ - on FreeBSD for ZFS, and Linux for Btrfs
+ - attack effectiveness
+ - success rate
+ - attack time
+ - I/O
+ - data fingerprinting, data exfiltration, data leak
+
+## 2. Strength (Contributions of the paper)
+
+- analyze filesystem deduplication side channels and differentiate them from previous work (asynchronous disk accesses and large deduplication granularities)
+ - the attacker can mount byte-level data leak attacks across the network
+- propose some light-weight mitigation for such attacks
+
+## 3. Weakness (Limitations of the paper)
+
+- the remote attack is based on the browser implementation, which is not very general
+- the mitigation approach is practical but cannot completely eradicate the signal
+
+## 4. Some Insights (Future work)
+
+- SHA-256 vs. faster hashing
+ - it can also rely on faster hash functions that are not collision-resistant (such as **fletcher4**)
+  - since hashing may incur collisions, some implementations include an additional step to verify that the data inside the matching deduplication records is identical
+
+- Deduplication granularity in filesystem deduplication
+ - filesystems perform deduplication at a granularity that is **a multiple of the data block size**
+ - a sufficient number of data blocks must be written to the filesystem to reach the deduplication record size
+- the timed write primitive
+ - the **timing difference** of handling unique data and duplicate data
+  - processing duplicate data is cheaper (only the metadata is updated)
+  - this allows an attacker to learn whether certain data is present on the filesystem during a write operation
+- the timed read primitive
+ - duplicated data from different files end up in distinct physical memory pages
+ - as the page cache (in Linux) operates at the file level
+ - if a block of a file becomes deduplicated, its physical location on the disk **differs from its surrounding blocks**
\ No newline at end of file
diff --git a/StoragePaperNote/template.md b/StoragePaperNote/template.md
index 2765d67..a0745df 100644
--- a/StoragePaperNote/template.md
+++ b/StoragePaperNote/template.md
@@ -1,8 +1,8 @@
---
typora-copy-images-to: ../paper_figure
---
-Redesigning LSMs for Nonvolatile Memory with NoveLSM
-------------------------------------------
+# To Zip or Not to Zip: Effective Resource Usage for Real-Time Compression
+
| Venue | Category |
| :------------------------: | :------------------: |
| ATC'18 | LSM+PM |
@@ -20,4 +20,3 @@ Redesigning LSMs for Nonvolatile Memory with NoveLSM
## 3. Weakness (Limitations of the paper)
## 4. Some Insights (Future work)
-
diff --git a/paper_figure/image-20220316134336877.png b/paper_figure/image-20220316134336877.png
new file mode 100644
index 0000000..fc46ef1
Binary files /dev/null and b/paper_figure/image-20220316134336877.png differ
diff --git a/paper_figure/image-20220316134407739.png b/paper_figure/image-20220316134407739.png
new file mode 100644
index 0000000..d3ab1ab
Binary files /dev/null and b/paper_figure/image-20220316134407739.png differ
diff --git a/paper_figure/image-20220526171832531.png b/paper_figure/image-20220526171832531.png
new file mode 100644
index 0000000..7dfb4c3
Binary files /dev/null and b/paper_figure/image-20220526171832531.png differ