diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524747449744.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1524747449744.png
deleted file mode 100644
index cf00d16..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524747449744.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524753088554.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1524753088554.png
deleted file mode 100644
index 5d3e314..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524753088554.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524908286158.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1524908286158.png
deleted file mode 100644
index ebe9578..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1524908286158.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/1525441410699.png b/Paper Reading Note/CRT_Chain-INFOCOM'18/1525441410699.png
deleted file mode 100644
index 6defdcb..0000000
Binary files a/Paper Reading Note/CRT_Chain-INFOCOM'18/1525441410699.png and /dev/null differ
diff --git a/Paper Reading Note/CRT_Chain-INFOCOM'18/CRT_Chain.md b/Paper Reading Note/CRT_Chain-INFOCOM'18/CRT_Chain.md
deleted file mode 100644
index 79adfad..0000000
--- a/Paper Reading Note/CRT_Chain-INFOCOM'18/CRT_Chain.md
+++ /dev/null
@@ -1,193 +0,0 @@
---
typora-copy-images-to: ./
---
#On Scalable Service Function Chaining with O(1) Flowtable Entries
@INFOCOM'18
[TOC]
##1. Background and Motivation
- Motivation
NFV offers customization capability. It, however, comes at the cost of consuming precious TCAM resources.
 > The number of service chains that an SDN can support is **limited by the flowtable size of a switch**.
 > **Scalability** is a fundamental challenge of enabling such configurable service chaining.

- The core of CRT-Chain is an encoding mechanism that leverages the **Chinese Remainder Theorem (CRT)** to compress the forwarding information into small labels.

- VNFs can be efficiently scaled up/down and, hence, provide agility and flexibility to adapt network components to dynamic user demands.

- SDN is able to deliver **elastic services**, allowing traffic to go through a customized sequence of services.

##2. Technical Challenges
1. The **ambiguity** in forwarding
An SFC may request to be served by different functions associated with the same switch in *a specific order*.

2. How to reduce the label size?
It proposes a chain segmentation scheme that allows the overall label size of all the segments to be smaller than the label size of the **end-to-end** path.

3. A CRT label's size is determined by the primes assigned to the functions along a chain.
It exploits a prime assignment strategy based on the distribution of function popularity to reduce the expected label size.

##3. Related Work
Three lines of related work are discussed:
- Flowtable Management
- Entry Size Reduction
- Flowtable-Free Routing

##4. The Model of SFC
- The SFC protocol defined in **IETF RFC 7665**
1. To improve reliability and load balancing, each Service Function (SF) can have multiple instances, each associated with a different Service Function Forwarder (SFF).

2. An SFF can insert/process the forwarding rules generated for an SFC and further knows how to parse the **Network Service Header (NSH)** of an SFC packet.
![1524747449744](1524747449744.png)
Two SFC requests: $c_1$, $c_2$
$c_1$: $SF3 \rightarrow SF11 \rightarrow SF7$
$c_2$: $SF3 \rightarrow SF7$

3. Each SF instance along the SFP decreases the value of SI by 1 so that all the SFFs can tell whether an SFC has been executed completely.
4. **The problem**: the number of required flowtable entries grows **linearly** with the number of SFC requests (one rule set per distinct SFC).

##5. CRT-Chain Design
###5.1 Overview of CRT-Chain
Goal: to resolve the scalability issue.
- High-level idea of CRT-Chain: replace **per-chain** forwarding rules with **per-function** forwarding rules.
 > For all the chains requesting to be served by an SF instance, its associated SFF inserts **only one forwarding rule** for this SF instance, regardless of how many chains are assigned to it.

An SFF then uses very simple modular arithmetic to extract the forwarding information directly from the labels, **without knowing which chain a packet belongs to**.

- CRT-Chain only requires **a constant number** of flowtable entries in each SFF.
- CRT-Chain transfers the cost of flowtable entries to the label overhead.


###5.2 CRT-based Chaining
- Notation:
$F$: the set of SFFs or switches
$S$: the set of available SF types
$C$: the set of SFC requests, $c \in C$
$P(c)$: the routing path of $c$
$SP(c)$: the SFP of $c$

For example:
$P(c_1) = IC \rightarrow S3 \rightarrow SFF7 \rightarrow SFF13 \rightarrow S17 \rightarrow EC$
$SP(c_1) = SF3(SFF7) \rightarrow SF11(SFF13) \rightarrow SF7(SFF13)$

- In its design, it encodes $P(c)$ and $SP(c)$ of each $c \in C$ into **two variable-length labels**, $X_c$ and $Y_c$ (two separate labels).
![1524753088554](1524753088554.png)
1. Each forwarder decodes $X_c$ and $Y_c$ to extract the forwarding rules for routing and the SFP.
2. The lengths of the fields $X_c$ and $Y_c$ can be set to $\log_2|X_{max}|$ and $\log_2|Y_{max}|$.

- Assign each forwarder in $F$ a **unique** prime and, similarly, assign each SF type in $S$ a **unique** prime (as the ID of the forwarder (SF)); a forwarder and an SF can share the **same prime**.
 > $f \neq f'$ for all $f, f' \in F$
 > $s \neq s'$ for all $s, s' \in S$. The number of required SF primes is determined by the number of SF types $|S|$, instead of the number of SF instances.
 > $f = s$ for any $f \in F$, $s \in S$ is allowed

- **Encoding and decoding $X_c$**
Given a path $P(c)$, the path label $X_c$ should satisfy the following congruences (the remainder encodes the output port):
$X_c \equiv e_i \pmod{f_i(c)}, \forall 1 \leq i \leq |P(c)|$
 > $f_i(c)$ is the prime assigned to the $i$-th forwarder along the path $P(c)$, and $e_i$ is the output port used there.

According to the CRT, the solution of $X_c$ can be found with the standard formula for a system of simultaneous congruences:
 > $X_c = \left(\sum_{i=1}^{|P(c)|} w_i e_i\right) \bmod X$, where $X = \prod_{i=1}^{|P(c)|} f_i(c)$

For example, consider a service function chain $c_1$ that traverses forwarders with primes 3, 7, 13 and 17 using the output ports 1, 3, 1 and 2, respectively. The path label $X_{c_1}$ should meet the following constraints:
>$X_{c_1} \equiv 1 \pmod 3$
>$X_{c_1} \equiv 3 \pmod 7$
>$X_{c_1} \equiv 1 \pmod{13}$
>$X_{c_1} \equiv 2 \pmod{17}$

which gives $X_{c_1} = 4{,}252$.

The maximum possible label $X_{max}$ can be bounded using the largest primes (this bounds the label length).
To forward an SFC packet, each forwarder decodes $X_c$ and extracts the forwarding port using its assigned prime: $e_i = X_c \bmod f_i(c)$ (a forwarder obtains its output port with a single modulo operation; see the sketch below).
![1524908286158](1524908286158.png)
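To make the arithmetic concrete, here is a minimal sketch (mine, not the paper's implementation) that solves the congruence system of the $X_{c_1}$ example above and shows the per-hop decoding:

```python
from math import prod

def crt_encode(primes, ports):
    """Solve X ≡ ports[i] (mod primes[i]) via the standard CRT construction."""
    X = prod(primes)                      # product of the per-hop primes
    x = 0
    for p, e in zip(primes, ports):
        n = X // p                        # product of all the other primes
        x += e * n * pow(n, -1, p)        # w_i = n * (n^{-1} mod p)
    return x % X

primes, ports = [3, 7, 13, 17], [1, 3, 1, 2]
X_c = crt_encode(primes, ports)
assert X_c == 4252
# each forwarder recovers its output port with a single modulo operation
assert [X_c % p for p in primes] == ports
```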
- **Encoding and decoding $Y_c$**
1. The **difference** between the SFP label $Y_c$ and the path label $X_c$:
an SFF might be associated with multiple SF instances and, more importantly, the order of forwarding to different SF instances matters.

How to overcome this problem?
It introduces **a step counter $N$** for each SFC packet.

For example:
$c$: $SF3 \rightarrow SF11 \rightarrow SF7$
>$Y_c \equiv 1 \pmod 3$
>$Y_c \equiv 2 \pmod{11}$
>$Y_c \equiv 3 \pmod 7$

In step 1: if $Y_c \bmod 3 = 1$, the SFF forwards to SF3 in step 1, and the SF3 instance then increases the step counter $N$ by 1 (the remainder encodes the order).

If none of the currently associated SF instances matches the current step counter, the SFF sends the packet out.


- Ensuring unique SFP forwarding (the simple encoding above does not always give a unique interpretation)
1. This simple CRT encoding cannot guarantee the **correct SFP scheduled** by the controller (e.g., when one SF type has multiple instances, anomalous forwarding behavior can occur).

2. To avoid this ambiguity, it combines all the forwarders along $P(c)$ and all the SF instances along $SP(c)$ into a **merged path $MP(c)$** in their traversing order.
 e.g., $MP(c) = (1)S3 \rightarrow (2)SFF7 \rightarrow (3)SF3 \rightarrow (4)SFF13 \rightarrow (5)SF11 \rightarrow (6)SF7 \rightarrow (7)S17$ (the device of every step is encoded into the sequence)

CRT-Chain now encodes the SFP label $Y_c$ using **the step counts in $MP(c)$**, rather than the original step counts in $SP(c)$; thus, $Y_c$ should now satisfy the following constraints:
$Y_c \equiv ix_n \pmod{s_n(c)}, \forall 1 \leq n \leq |SP(c)|$
$ix_n$ denotes the step count (index) of $s_n(c)$ in **the merged path $MP(c)$**

###5.3 Chain Segmentation
**Problem**:
The header overhead, i.e., $|X_c|$ and $|Y_c|$, is determined by the primes used in the congruence system.
> e.g., larger primes lead to larger labels $X_c$ and $Y_c$ (the header size scales up with the number of forwarders [SF types] in a network).

- An intuitive solution (reduce the number of primes)
Allow different forwarders (SFs) to use the same prime, thereby minimizing the number of primes.

Given $F$ ($S$), instead of using $|F|$ ($|S|$) unique primes, it can use only $\alpha |F|$ $(0 \leq \alpha \leq 1)$.

**This poses a new problem**:
When some forwarders (SFs) share the same prime, there is a conflict if they happen to belong to the same path (SFP).

To avoid this problem, it proposes a **segmentation technique** that partitions a path (SFP) into several sub-paths *(in each of which no two forwarders (SFs) share the same prime)*.

- Partitioning a path (the idea is simple; see the sketch below):
1. Trace the path and check whether any $f \in P(c)$ ($s \in SP(c)$) has a prime duplicating that of any one located prior to it.

2. For any duplicated prime found, the path is cut there, making the forwarders (SFs) prior to it **a conflict-free sub-path**.

For example, consider a path:
$P(c) = f_1 \rightarrow f_2 \rightarrow f_3 \rightarrow f_4 \rightarrow f_5$
assigned the primes:
$5 \rightarrow 13 \rightarrow 5 \rightarrow 2 \rightarrow 7$
It is partitioned into two conflict-free sub-paths:
$P_1(c) = f_1 \rightarrow f_2 = 5 \rightarrow 13$
$P_2(c) = f_3 \rightarrow f_4 \rightarrow f_5 = 5 \rightarrow 2 \rightarrow 7$
The sub-labels are then concatenated together as the header, in the format
$(N, l_{X_{c, 1}}, X_{c, 1}, l_{Y_{c, 1}}, Y_{c, 1}, ...)$
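A small sketch of the partitioning rule above (cut the path whenever a prime repeats within the current segment); it reproduces the $5 \rightarrow 13$ / $5 \rightarrow 2 \rightarrow 7$ example and is an illustrative reimplementation, not the authors' code:

```python
def segment_path(primes):
    """Cut a path into conflict-free sub-paths (no prime repeats in a segment)."""
    segments, current, seen = [], [], set()
    for p in primes:
        if p in seen:                 # duplicate prime: close the segment here
            segments.append(current)
            current, seen = [], set()
        current.append(p)
        seen.add(p)
    if current:
        segments.append(current)
    return segments

assert segment_path([5, 13, 5, 2, 7]) == [[5, 13], [5, 2, 7]]
```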
- Forwarding sub-paths:
The remaining problem is how a forwarder knows which sub-label $(X_{c, i}, Y_{c, i})$ it should decode and when a sub-label should be discarded.

The design is motivated by the observation that **each forwarder has a limited number of output ports**.
> Hence, this parameter can be used to encode the termination rule of a sub-path.

Let $o_f$ denote the maximum output port of forwarder $f$; the remainder $e_{null}$ can be any integer larger than the maximum output port $o_f$:
$X_{c, i} \equiv e_{null} \pmod{f}$

By doing this, the last-hop forwarder $f$ of a sub-path will get an invalid port $e_{null}$ and easily detect that it should end the current sub-path. **The forwarder then drops the current sub-label $(X_{c, i}, Y_{c, i})$** and extracts the next one, $(X_{c, i+1}, Y_{c, i+1})$.
![1525441410699](1525441410699.png)

- Prime Assignment
The header overhead can be further reduced by **decreasing the probability of using the large primes**:
> many paths going through popular SFs with large primes would yield large labels.

**Solution**: assign primes to forwarders and SFs according to their **popularity (or load)**; a sketch follows below.
1. Assign small primes to *heavily loaded forwarders and popular SF types*, while letting less-used ones have *large primes*.

2. **Popularity score**: count the number of chains that traverse a forwarder $f \in F$ (SF $s \in S$), denoted by the **popularity score** $w_f$ ($w_s$), and sort $f \in F$ ($s \in S$) in descending order of popularity $w_f$ ($w_s$).
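A toy sketch of this popularity-based assignment (the forwarder names and scores are made up): sort by $w_f$ in descending order and hand out primes in ascending order.

```python
def first_primes(n):
    """Return the n smallest primes by trial division (fine at this scale)."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes):
            primes.append(k)
        k += 1
    return primes

# hypothetical popularity scores w_f: number of chains traversing each forwarder
popularity = {"SFF13": 40, "SFF7": 25, "S3": 10, "S17": 3}
ranked = sorted(popularity, key=popularity.get, reverse=True)
assignment = dict(zip(ranked, first_primes(len(ranked))))
# the most popular forwarder gets the smallest prime, shrinking expected labels
assert assignment["SFF13"] == 2 and assignment["S17"] == 7
```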
##6. Implementation and Experiment
The testbed network accommodates 64 VMs and hence is capable of supporting 64 SF instances in total.
- Impact of the Number of SFC Requests (scalability test)
- Impact of the Length of SFCs
- Impact of Prime Reuse and Path Segmentation
- Impact of the Number of SF Types
- Impact of Segmentation on Bandwidth Consumption
- Overall Overhead
\ No newline at end of file
diff --git a/Paper Reading Note/DDP-ICDCS'18/DDP-ICDCS'18.md b/Paper Reading Note/DDP-ICDCS'18/DDP-ICDCS'18.md
deleted file mode 100644
index 620d2bb..0000000
--- a/Paper Reading Note/DDP-ICDCS'18/DDP-ICDCS'18.md
+++ /dev/null
@@ -1,85 +0,0 @@
---
typora-copy-images-to: ./
---

#DDP: Distributed Network Updates in SDN

@ICDCS'18 @SDN Update
[TOC]


##1. Motivation

- Current update approaches heavily rely on the centralized controller to initiate and orchestrate the network updates, resulting in **long latency** of update completion.
> Quickly and consistently updating the distributed data plane poses a major and common challenge in SDN systems.
> Coordination of the distributed data plane requires frequent communication between the controller and switches (slowing down the update completion time).

- Asynchronous communication channels:
control messages are often received and executed by switches in an order different from the order sent by the controller.
> An inappropriate control order may violate the consistency properties of the data plane, resulting in network anomalies, e.g., blackholes, traffic loops and congestion.



##2. Overview of this paper

### **Main Idea** of this paper
DDP develops distributed coordination abilities at the data plane.
Each datapath operation container (DOC) is encoded with an individual operation and its dependency logic.
> The network update can be both triggered and executed in a fully local manner, further improving the update speed while maintaining consistency.

**Insight**: if switches coordinate with each other to apply the operations in order, the update time as well as the controller's processing load will be greatly reduced.

- Real-time update
The involved DOCs are sent to the data plane in one shot, and the switches can consistently execute them in a distributed manner.

- Updates directly triggered by local events
The controller prestores the DOCs at the data plane, and when the corresponding events happen, the updates can be locally triggered and executed.

### **Challenges** of this paper


### Contributions of this paper

1. design novel algorithms to compute and optimize the primitive DOCs for consistent updates

2. implement the Distributed Datapath (DDP) system to evaluate its performance in various update scenarios.

### The Method

1. Network Update Problem
Let $C$ denote a network configuration state (a collection of exact-match rules).
A network update is defined as a transition of configuration state from $C$ to $C' = update(C, O, e)$
> $O=\{o\}$ is a set of datapath operations to implement the update, e.g., to insert/delete/modify a flow rule at a particular switch.
> $e$ is a local event at the data plane to trigger the update, e.g., a link/switch failure or link congestion.

2. DDP Design
- Operation Dependency Graph (ODG)
Introduce the concept of an **Operation Dependency Graph (ODG)** that captures the data plane dependencies
> 1. The dependency is **unidirectional** (no cycles in the graph)
> 2. An ODG expresses an optimized result of the whole dependency relations.
> 3. Connectivity is dispensable in the ODG
> 4. ODGs are composable (multiple ODGs for different update events can be composed together)

- Datapath Operation Container (DOC)
In the DDP system, the SDN controller adopts DOCs to configure the data plane, rather than directly sending operations as in traditional SDN.
> The switches then coordinate with each other to execute the update at the right time.

- Execution Behaviors
>1. Push: when a DOC is executed, it notifies the operations that depend on it
>2. Pull: when a DOC is received at the data plane, it checks the status of its dependencies

Push and Pull are complementary to each other, and with the two behaviors, all operations will be **consistently** applied in a correct order (see the toy sketch below).
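As a rough illustration of the push/pull idea (my own toy model, not DDP's actual protocol), the sketch below lets a DOC execute only once all of its dependencies have executed, regardless of arrival order:

```python
class DOC:
    """Toy datapath operation container: one operation plus its dependency set."""
    def __init__(self, op_id, deps):
        self.op_id, self.pending, self.executed = op_id, set(deps), False

    def try_execute(self, switch):
        if not self.executed and not self.pending:
            self.executed = True
            print(f"execute {self.op_id}")
            switch.push(self.op_id)          # push: notify dependents afterwards

class Switch:
    def __init__(self):
        self.docs = {}

    def receive(self, doc):                  # pull: check already-executed deps
        self.docs[doc.op_id] = doc
        doc.pending -= {d.op_id for d in self.docs.values() if d.executed}
        doc.try_execute(self)

    def push(self, done_id):
        for doc in list(self.docs.values()):
            doc.pending.discard(done_id)
            doc.try_execute(self)

sw = Switch()
sw.receive(DOC("insert-r2", deps=["insert-r1"]))  # arrives first, must wait
sw.receive(DOC("insert-r1", deps=[]))             # executes and unblocks insert-r2
```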
- Algorithm



### Implementation


### Experiment


### Related Work


diff --git a/Paper Reading Note/FADE-ICC'16/1520838349163.png b/Paper Reading Note/FADE-ICC'16/1520838349163.png
deleted file mode 100644
index 82e761d..0000000
Binary files a/Paper Reading Note/FADE-ICC'16/1520838349163.png and /dev/null differ
diff --git a/Paper Reading Note/FADE-ICC'16/1520861458179.png b/Paper Reading Note/FADE-ICC'16/1520861458179.png
deleted file mode 100644
index ca3ab0f..0000000
Binary files a/Paper Reading Note/FADE-ICC'16/1520861458179.png and /dev/null differ
diff --git a/Paper Reading Note/FADE-ICC'16/FADE.md b/Paper Reading Note/FADE-ICC'16/FADE.md
deleted file mode 100644
index 3581c2b..0000000
--- a/Paper Reading Note/FADE-ICC'16/FADE.md
+++ /dev/null
@@ -1,77 +0,0 @@
---
typora-copy-images-to: ./
---

# FADE: Detecting Forwarding Anomaly in Software Defined Network
@IEEE ICC 2016

[TOC]

## Motivation
- Flow rules installed in switches can be easily tampered with by different entities, intentionally or unintentionally.

- Forwarding anomalies are normally triggered by **equipment failure** and **network attacks**.
- Flow rules enforced at the data plane may not be the same as the flow rules installed by the control plane.

- Forwarding Anomaly Detection (FAD) in SDN is usually achieved by sending probing packets or analyzing flow statistics. These approaches are neither **effective** nor **efficient**, e.g.,
 > 1. high communication overheads
 > 2. cannot capture all attacks

##Problem Statement
- This paper defines a network flow as a set of packets processed by the same sequence of flow rules.
- Normally, forwarding anomalies can be classified into two categories:
 > 1. traffic interception attacks: flows are dropped or forwarded to wrong rule paths that **never return** to their correct rule paths
 > 2. traffic hijacking attacks

##Challenge
1. Flow Selection: select a minimal set of flows in the rule graph so that their rule paths cover all rule paths. (**Flow Selection Algorithm**)
2. Rule Generation
3. Traffic Aggregation and Traffic Burst: anomaly identification should accurately collect flow statistics and verify them under traffic burst and traffic aggregation. (**Using Label**)


##Solution
**Basic Idea**: flow rules forwarding the same flow should have a consistent view of the flow's statistics.
There are three steps in FADE:
- First, FADE builds a **rule graph** according to the topology and flow rules, and uses a flow selection algorithm to select a small set of flows whose rule paths cover all existing rule paths in the rule graph.
- Second, FADE generates dedicated flow rules for every selected flow and installs them in the data plane to track the flows.
- Third, FADE collects flow statistics to identify whether there is any forwarding anomaly.


- Flow Selection: for each egress rule, flow selection traverses the rule graph **in reverse** to find a rule that has an indegree of 0 in the rule graph, i.e., an ingress rule (see the sketch below).
 > In the case of traffic aggregation, a rule may have multiple previous rules in the rule graph. Thus, the rule graph is constructed as a forest. In the forest, the roots of the trees are egress rules and the leaves are ingress rules.
 ![flow_selection_algorithm](flow_selection_algorithm.png)
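An illustrative sketch of this reverse traversal (my reconstruction from the description, not FADE's actual algorithm), which walks backwards from each egress rule and emits one rule path per ingress leaf:

```python
def cover_rule_paths(prev_rules, egress_rules):
    """Walk backwards from each egress rule (a tree root) to every ingress rule
    (a leaf with indegree 0); each resulting path is covered by one test flow."""
    paths = []

    def walk(rule, suffix):
        preds = prev_rules.get(rule, [])
        if not preds:                       # indegree 0: an ingress rule
            paths.append([rule] + suffix)
            return
        for p in preds:                     # traffic aggregation: branch per parent
            walk(p, [rule] + suffix)

    for egress in egress_rules:
        walk(egress, [])
    return paths

# hypothetical rule graph: rule r41 aggregates traffic coming from r11 and r31
prev_rules = {"r41": ["r11", "r31"]}
print(cover_rule_paths(prev_rules, ["r41"]))  # [['r11','r41'], ['r31','r41']]
```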
- Rule Generation: FADE generates several dedicated flow rules to collect flow statistics and computes the set of switches on which these dedicated flow rules should be installed. FADE assumes a **bijection** between flow rules in the rule graph and switches.
 > e.g., $\{r_{11}, r_{31}, r_{41}\}$ maps to $\{S_1, S_3, S_4\}$
- It generates $k$ ($k>2$) dedicated flow rules for each selected flow, and installs them on the first switch, $k-2$ intermediate switches and the last switch on the flow's forwarding path.

- Once a dedicated flow rule is installed on the switch where the malicious flow rule is enforced, **it forwards the flow prior to the malicious rule and hides the malicious rule**. There is an optimal $k$ that maximizes the successful detection probability. (do the calculation to find the optimal number)
 > In practice, they find rule paths are hardly longer than 32.
 > It calculates the optimal $k$ for different rule path lengths: $p(k) = p_1(k) + \sum_{l=2}^{m-1} p_2(k, l) \ (2 \leq k \leq n)$
 > $p_1(k) = \frac{n-k}{n-2}$ is the probability that **traffic interception attacks** can be detected.
 > $p_2(k, l) = \frac{n-k}{n-2} - \frac{(n-k)\cdots(n-k-l+1)}{(n-2)\cdots(n-l-1)} \ (2 \leq l < n)$ is the probability that **traffic hijacking attacks** can be detected.
 > The results and the Anomaly Identification Algorithm are shown below:
![1520838349163](1520838349163.png)
![1520861458179](1520861458179.png)

## Implementation
- This paper implements FADE as an application on the **Floodlight Controller**. There are three modules in FADE:
 > 1. Rule Storage Module: it is extended from **HSA** and maintains all flow rules by monitoring **OFFlowMod** messages and analyzing the dependencies among these rules.
 > 2. Rule Graph Module: it monitors rule storage updates and **LDUpdate** messages, i.e., topology update messages, to build the rule graph.
 > 3. Anomaly Detection Module: it interacts with the above two modules and detects anomalies according to the information retrieved from them.

## Evaluation
- Floodlight 1.1, Mininet 2.2.1, OVS 2.3.2
- Uses a virtual machine with a 2.5 GHz dual-core CPU and 16 GB memory to emulate different networks.
- Malicious rules are simulated by injecting flow rules directly into OVS through **ovs-ofctl**.
- Link throughputs are measured by **iperf**.

## Related Work
- ATPG: a test packet generation framework whose results can be used to verify all flow rules in the network.
 > It only supports static configurations and is time-consuming.
- SDN Traceroute: it uses a label-based scheme to generate test packets to verify flows' forwarding paths hop by hop.
 > It generates lots of packets and is only suited to anomaly localization.
- NetPlumber: an invariant checking tool based on HSA. (similar to the rule graph)
- SPHINX: it uses flow statistics to verify data plane forwarding, which is very similar to FADE.
 > It defines a flow by source and destination MAC addresses.
\ No newline at end of file
diff --git a/Paper Reading Note/FADE-ICC'16/flow_selection_algorithm.png b/Paper Reading Note/FADE-ICC'16/flow_selection_algorithm.png
deleted file mode 100644
index 224384c..0000000
Binary files a/Paper Reading Note/FADE-ICC'16/flow_selection_algorithm.png and /dev/null differ
diff --git a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520995111174.png b/Paper Reading Note/SDNtraceroute-HotSDN'14/1520995111174.png
deleted file mode 100644
index 81940b9..0000000
Binary files a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520995111174.png and /dev/null differ
diff --git a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520997877920.png b/Paper Reading Note/SDNtraceroute-HotSDN'14/1520997877920.png
deleted file mode 100644
index 2c5661a..0000000
Binary files a/Paper Reading Note/SDNtraceroute-HotSDN'14/1520997877920.png and /dev/null differ
diff --git a/Paper Reading Note/SDNtraceroute-HotSDN'14/SDNtraceroute.md b/Paper Reading Note/SDNtraceroute-HotSDN'14/SDNtraceroute.md
deleted file mode 100644
index 7e0bed4..0000000
--- a/Paper Reading Note/SDNtraceroute-HotSDN'14/SDNtraceroute.md
+++ /dev/null
@@ -1,70 +0,0 @@
---
typora-copy-images-to:./
---
# SDN traceroute: Tracing SDN Forwarding without Changing Network Behavior

@HotSDN'14
[TOC]

## Motivation
- Flexibility in SDN brings added complexity, which requires new debugging tools that can provide insights into network behavior.
- SDN programs and controllers often translate **high-level configurations** into **low-level rules**. The result is that it can be difficult for network operators to **predict** the exact low-level rules.
- It is imperative to have tools that can provide visibility into how different packets are handled by the network at any given time.
- The main limitation of $traceroute$ is that it can only provide *layer-3 (IP)* path information, because it relies on the time-to-live (TTL) field in the IP header to trigger ICMP error messages from intermediate routers.

## Problem Statement
- Goal: trace the path of a given packet using the actual forwarding rules in the network, with as little impact on the network as possible. Requirements:
 > 1. Non-invasive: existing rules in forwarding tables should remain unchanged.
 > 2. Accurate: the existing forwarding rules should be applied to the probes exactly as to production traffic when measuring the behavior of a switch.
 > 3. Low resource consumption: require only a small number of rules per switch and update those rules infrequently.
 > 4. Commodity hardware: the approach should work with existing SDN protocols.
 > 5. Arbitrary traffic: it should be possible to trace the path of any flow, and even of any given packet from within a flow.

- SDN traceroute runs as an application on an SDN controller so that it can push rules to the switches and listen to OpenFlow messages. It has access to the **topology of the network**.
 > *Input*: an arbitrary Ethernet frame, plus an injection point in the form of a switch identifier and port.
 > *Output*: an ordered list of $(switch, port)$ pairs corresponding to each hop encountered by the packet as it traverses the network.

## Solution
### Network Configuration

- SDN traceroute must install rules that allow it to selectively **trap probes**:
 > 1. matching the incoming probe packet so the hop can be logged at the controller
 > 2. not matching the controller-returned probe, so as to forward the packet downstream.

- SDN traceroute first applies a graph coloring algorithm to the topology.
 > 1. Colors serve as tags that are an integral part of the rules.
 > 2. The coloring algorithm assigns each switch a color such that **no two adjacent switches are assigned the same color** (the classic graph coloring problem).
 > 3. This problem in graph theory is **NP-hard**, so SDN traceroute uses a **greedy algorithm** to color the vertices (see the sketch below).
 > 4. All traffic carries a color so that the switches can decide whether or not to send a probe to the controller. (using the **VLAN priority field** as the tag)
 > 5. In general, many datacenter topologies use a hierarchical tree structure consisting of core, aggregation and ToR switches. Those topologies require only 2-bit tags, as **trees are 2-colorable**.
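A minimal greedy-coloring sketch in the spirit of the step above (give each switch the smallest color unused by an already-colored neighbor); the 3-switch tree topology is made up:

```python
def greedy_color(adj):
    """Assign each switch the smallest color not used by a colored neighbor."""
    colors = {}
    for v in adj:                      # visiting order affects greedy quality
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = next(c for c in range(len(adj)) if c not in used)
    return colors

# hypothetical two-level tree: one core switch c1 above ToR switches t1, t2
adj = {"c1": ["t1", "t2"], "t1": ["c1"], "t2": ["c1"]}
print(greedy_color(adj))               # trees are 2-colorable: c1=0, t1=t2=1
```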
- The number of rules installed in a switch depends on the number of colors used by its **adjacent switches**. In most scenarios, this requires installing one or two *TCAM rules*, e.g.,
 ![1520995111174](1520995111174.png)
- These rules need only be changed when the network topology changes.
### Conducting the Trace Route

- Initialize the injection point $(switch, port)$:
 > 1. Use the API call
 > 2. the attachment point of the source host, which is looked up by source MAC or IP address.
 > 3. It looks up **the color of the ingress switch** and inserts the color into the header tag bits of the probe frame.
 > 4. SDN traceroute sends the probe to the ingress switch as a $PACKET\_OUT$ message with the input port set to the injection point. The action of the $PACKET\_OUT$ is set to $TABLE$. (Purpose: make the switch process the probe as if it had been received on that input port.)

- Running Steps:
![1520997877920](1520997877920.png)

###The Assumption in this problem
- The tag bits must not be modified by any devices in the network. (e.g., middleboxes)
- The bits must correspond to header field(s) that can be matched on using rules in the switches. (e.g., the 12 matchable fields in OpenFlow 1.0)
- SDN traceroute reserves the **highest-priority rules**.

## Evaluation
- Five IBM RackSwitch G8264 OpenFlow-enabled switches connecting several commodity servers running Open vSwitch
- The implementation of SDN traceroute is a module for the Floodlight controller providing a REST API that allows a network operator to perform a trace route for an arbitrary packet. (600 LOC)
## Related Work
- ATPG
- NetSight
- Anteater
- Header Space Analysis (HSA)
- Veriflow
- Libra
- OFRewind
\ No newline at end of file
diff --git a/Paper Reading Note/SDProber-SOSR'18/1521122343477.png b/Paper Reading Note/SDProber-SOSR'18/1521122343477.png
deleted file mode 100644
index d5e1b23..0000000
Binary files a/Paper Reading Note/SDProber-SOSR'18/1521122343477.png and /dev/null differ
diff --git a/Paper Reading Note/SDProber-SOSR'18/1521277960477.png b/Paper Reading Note/SDProber-SOSR'18/1521277960477.png
deleted file mode 100644
index 25c4229..0000000
Binary files a/Paper Reading Note/SDProber-SOSR'18/1521277960477.png and /dev/null differ
diff --git a/Paper Reading Note/SDProber-SOSR'18/1521284130008.png b/Paper Reading Note/SDProber-SOSR'18/1521284130008.png
deleted file mode 100644
index 90425a0..0000000
Binary files a/Paper Reading Note/SDProber-SOSR'18/1521284130008.png and /dev/null differ
diff --git a/Paper Reading Note/SDProber-SOSR'18/SDProber.md b/Paper Reading Note/SDProber-SOSR'18/SDProber.md
deleted file mode 100644
index 0c192f3..0000000
--- a/Paper Reading Note/SDProber-SOSR'18/SDProber.md
+++ /dev/null
@@ -1,141 +0,0 @@
---
typora-copy-images-to: ./
---

# SDProber: A Software Defined Prober for SDN
@SOSR'18

[TOC]

## Motivation
- **Persistent delays** in wide area networks are perilous and can adversely affect the effectiveness of online services. (need to proactively detect long delays as early as possible)
- There is a trade-off between *detection time* and *cost* in proactive measurement of delays in SDN.
 > 1. Increasing the inspection rate of a link can reduce the detection time of a delay.
 > 2. Inspecting a link too often could hinder traffic via that link or via the nodes it connects.
 > 3. The inspection rate per link is therefore bounded:
 > a lower bound specifies how often the link should be inspected;
 > an upper bound restricts the number of inspections per link.

- The frequency of congestion and high delays can be learned from history, and the inspection rates modified accordingly (which is why an adaptive approach works).
- Traditional tools (e.g., $ping$, $traceroute$) are unsuitable for adaptive measurement where different links should be inspected at **different rates** (ping's predefined-path limitation).
- SDN's central control over forwarding rules allows for an efficient implementation of adaptable delay monitoring.
- Adaptation is achieved by changing the probabilities that govern the random walk.
- SDProber is used for **proactive delay measurements** in SDN.
## Goal
- 1. inspect links at specified rates
- 2. reduce the total number of probe packets when monitoring the network
- 3. minimize the number of excess packets through the links
- 4. detect delays early

## Problem Statement
### Model of Network and Delays
- The network is represented as a directed graph $G=(V, E)$ (a standard model).
- The network operator specifies the minimum and maximum rates of probe-packet dispatching per link.
 > 1. **Input**: a network $G$, rate constraints on edges, and a cost constraint $C$ that bounds the total probe packets per minute.
 > 2. **Objective**: probe $G$ such that the probe rates satisfy the rate constraints and the cost constraint $C$.
- Computing a set of paths that satisfies the probe-rate constraints is complex, expensive in terms of running time (essentially NP-hard), and inflexible
 > if sending probes via predefined paths.
### Overview of SDProber
#### Delay Measurement
- The delay between two given nodes $s_1$ and $s_2$ is measured by SDProber using probe packets. A schematic representation of the process is shown below:
![1521122343477](1521122343477.png)

- $t_1^{\leftrightarrow}$ and $t_2^{\leftrightarrow}$ are the round-trip times between the nodes $s_1$ and $s_2$ and the collector (easily measured by $ping$).
 > Let $t_1^{\rightarrow}$ and $t_2^{\rightarrow}$ be the one-way trip times from the nodes $s_1$ and $s_2$ to the collector.
 > $t_2 - t_1 \leq delay(s_1, s_2) + t_2^{\rightarrow} \leq delay(s_1, s_2) + t_2^{\leftrightarrow}$
 > $t_2 - t_1 \geq delay(s_1, s_2) - t_1^{\rightarrow} \geq delay(s_1, s_2) - t_1^{\leftrightarrow}$
 > Resulting in: $t_2 - t_1 - t_2^{\leftrightarrow} \leq delay(s_1, s_2) \leq t_2 - t_1 + t_1^{\leftrightarrow}$

#### System Architecture
- SDProber sends probe packets repeatedly to measure delays in different parts of the network.
 > The collector collects mirrored packets to compute the expected delay per link or path.

- **Probe Agent**: it crafts and dispatches probe packets. In general, a probe client is attached to every node, and the number of probe clients can vary.

- Each probe has a **unique ID** in its payload and is emitted from a **probe client**.
 > 1. Probe packets are marked to distinguish them from genuine traffic.
 > 2. The collector separates mirrored packets of different probes from one another.
![1521277960477](1521277960477.png)

- **SDN Controller and Open vSwitch**: the underlying elements of the network are OVS instances with an OpenFlow programming interface.
 > 1. SDProber routes probe packets in a **random walk fashion**.
 > 2. To achieve this, it uses a combination of **group tables** and **match-action rules**.
 > 3. OpenFlow's group tables are designed to execute one or more buckets for a single match,
 > e.g., SDProber uses group tables in **ALL** (execute all the buckets) and **SELECT** (execute a selected bucket) modes.
 > 4. Each bucket has a weight; for each forwarded packet, the OVS chooses a bucket and executes the actions in that bucket.
 > Each bucket contains a forwarding rule to a different neighbor node (a different port).
 > The buckets are selected arbitrarily (e.g., per a hash of field values, in proportion to the weights).
 > To add randomness to the bucket selection, the probe agent assigns a **unique source MAC address** to each probe packet. (On repeated visits at a node, the same actions are applied at each visit, so the traversal is a **pseudo random walk**.)

![1521284130008](1521284130008.png)

- **Collector**: for each mirrored probe packet, the collector records the **arrival time**, extracts the UDP source from the header and gets the **unique identifier** from the payload.
 > 1. Mirrored packets are grouped by the **identifier** of the probe.
 > 2. The number of groups should be equal to the total number of probe packets.
 > 3. The number of packets in each group is equal to the **initial TTL limit**.
 > 4. After grouping, the collector computes the traversed path of each probe packet by ordering the mirrored packets of each group based on the **DPID values of the switches and the known network topology**.
 > 5. The recorded arrival times of the ordered mirrored packets are used for estimating the delay of **each link** on the path.
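Combining the collector's per-probe timestamps with the bound derived in the delay-measurement subsection above, a collector-side helper might look like this (a sketch; the function and variable names are mine):

```python
def delay_bounds(t1, t2, rtt1, rtt2):
    """Bound delay(s1, s2) given the mirrored-packet arrival times t1, t2 at the
    collector and the ping-measured round trips rtt1, rtt2 from s1/s2 to it."""
    return t2 - t1 - rtt2, t2 - t1 + rtt1

lo, hi = delay_bounds(t1=10.0, t2=14.5, rtt1=1.2, rtt2=0.8)   # ms, made-up values
print(f"{lo:.1f} ms <= delay(s1, s2) <= {hi:.1f} ms")          # 3.7 .. 5.7
```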
## Monitoring By Random Walk
- SDProber needs to satisfy the rate constraints when routing probes. Instead of computing a set of paths directly, the probe packets perform a random walk over a weighted graph.
 > 1. The initial node and each traversal step are selected randomly, per probe.
 > 2. The **link-selection probabilities** are proportional to **the weights of the forwarding rules**.
 > 3. The length of the path is limited by setting the TTL field. It determines the number of inspected links per probe packet.

- The model:
 > 1. $n$ is the number of nodes in the network $G$; each node $v_i \in V$ has a weight $w_i$.
 > 2. $W = \sum^{n}_{i=1} w_i$ is the sum of the weights.
 > 3. The probability of selecting node $v_i$ is $\frac{w_i}{W}$. For each selection of a node, draw a number $x$ in the range $[0,1)$; if $\frac{\sum^{i-1}_{j=1}w_j}{W} \leq x < \frac{\sum^{i}_{j=1}w_j}{W}$, then $v_i$ is the selected node.
 > 4. For each probe-packet forwarding, the link (port) for the next step is chosen **proportionally to the weights** assigned to the forwarding rules.

- To control the inspection rates:
 > 1. It needs to estimate the number of probes passing through each link for **a given number of emitted probes**.
 > + **First**, it computes visit probabilities for nodes. $P_0$ is a vector and $P_0[i]$ is the probability of selecting $v_i$ as the initial node, $1 \leq i \leq n$.
 > + The transition matrix of $G$ is an $n \times n$ matrix $M=(p_{ij})_{1 \leq i,j \leq n}$, where $p_{ij}$ is the probability of forwarding the probe packet from $v_i$ to $v_j$.
 > + For each node $v_i$, the array $(p_{i1},\dots,p_{in})$ specifies the probabilities for the next step after reaching $v_i$. If $v_i$ and $v_j$ are not neighbors, then $p_{ij} = 0$; and $\sum^{n}_{j=1} p_{ij}=1$ for every $v_i \in V$.

- Given the initial probability vector $P_0$, $P_1 = (M^T) P_0$ is the vector of probabilities of reaching each node after one step.
 > 1. $P_t = (M^T)^t P_0$ gives the probabilities of reaching each node after $t$ steps of the random walk.
 > 2. The probability of traversing a link $(v_i, v_j)$ within $k$ steps is the probability of reaching node $v_i$ at some step $t$ and proceeding to node $v_j$ at step $t+1$, for some $0 \leq t < k$; denote this probability by $p\text{-}traverse_{ij}$.
 > 3. $p\text{-}traverse_{ij} = \sum^{k-1}_{t=0} (P_t)_i \, p_{ij}$, where **$(P_t)_i$ is the probability of reaching node $v_i$ at step $t$** and **$p_{ij}$ is the probability of forwarding to $v_j$ a packet that arrived at $v_i$** (see the sketch below).
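A small numerical sketch of this estimate (a made-up 3-node line graph; not the paper's code):

```python
import numpy as np

# toy transition matrix: M[i][j] = probability of forwarding v_i -> v_j
M = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
P0 = np.array([1 / 3, 1 / 3, 1 / 3])      # uniform initial-node selection

def p_traverse(M, P0, i, j, k):
    """Probability that a k-step random walk uses link (v_i, v_j)."""
    total, Pt = 0.0, P0
    for _ in range(k):
        total += Pt[i] * M[i, j]          # reach v_i at step t, then go to v_j
        Pt = M.T @ Pt                     # P_{t+1} = M^T P_t
    return total

print(p_traverse(M, P0, 0, 1, k=3))
```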
- Why use a random walk: in the random walk approach, they do not conduct **complex computations** to craft probe packets or change them as they traverse the graph.
 > If network changes require **adjustments** of probe rates, they merely alter the node weights of the initial node selection or alter the weights in the group tables.

## Weight Adaptation
- The weights that affect the random walk are adjusted to help satisfy the rate constraints.
 > 1. SDProber modifies the weights iteratively using **binary exponential backoff**.
 > 2. The iterations continue indefinitely as long as the monitoring continues.

- **Link-weight adaptation**: weights are doubled (halved) when the probing rate is below (above) the minimum (maximum) rate.
 > 1. Rates within the limits specified by the rate constraints are adjusted after each iteration.
 > 2. Historically delayed links can receive **a higher weight** than links with no history of delays, to be visited more frequently (controlled by tuning the iteration coefficients).

- **Node-weight adaptation**: node weights are modified to reflect changes in links.
 > 1. The weight of a node with a link below the minimum rate is doubled, to increase the chances of visiting it in the next iteration.

## Baseline Method
- Probe packets are sent via the **shortest path** between two selected nodes. There are two baseline methods:
 > 1. **Random Pair Selection (RPS)**: in each iteration, the pair of source and destination nodes is selected uniformly from **the set of pairs that have not been selected previously**, until all the links are probed.
 > 2. **Greedy Path Selection**: in each iteration, for each pair of nodes, the $weight$ of the shortest path $P$ between these nodes is $\sum_{e \in P, e \notin Visited} min\text{-}rate(e)$, the sum of the min-rate values of all the unvisited links on the path.
 > The path with the maximal weight is selected and its links are added to $Visited$.

## Evaluation
- Mininet, Open vSwitch 2.7.2, RYU controller, a publicly available real topology (196 nodes and 243 links)
- Experiments: Detection Time, Cost Effectiveness, Adjusting $\alpha$

## Related Work
### Utilizing mirroring for measurements
- **NetSight** uses mirroring to gather information about the trajectories of all the packets in a network. (does not scale)
- **Everflow** provides scalable **sampling of packets** in datacenter networks. (it requires specific hardware)

### Using Probe Packets
- **SLAM** uses the arrival times of OpenFlow **packet-in messages** at the controller to estimate the delay between links. (it is only relevant to datacenter traffic, where enough packet-in messages are generated)
- **OpenNetMon** provides per-flow metrics (e.g., throughput, delay and packet loss) for OpenFlow networks.
\ No newline at end of file
diff --git a/Paper Reading Note/The Quantcast File System-VLDB'13/1523365611372.png b/Paper Reading Note/The Quantcast File System-VLDB'13/1523365611372.png
deleted file mode 100644
index 1d03f9d..0000000
Binary files a/Paper Reading Note/The Quantcast File System-VLDB'13/1523365611372.png and /dev/null differ
diff --git a/Paper Reading Note/The Quantcast File System-VLDB'13/1523504685432.png b/Paper Reading Note/The Quantcast File System-VLDB'13/1523504685432.png
deleted file mode 100644
index 28fde56..0000000
Binary files a/Paper Reading Note/The Quantcast File System-VLDB'13/1523504685432.png and /dev/null differ
diff --git a/Paper Reading Note/The Quantcast File System-VLDB'13/QFS.md b/Paper Reading Note/The Quantcast File System-VLDB'13/QFS.md
deleted file mode 100644
index cfa58a8..0000000
--- a/Paper Reading Note/The Quantcast File System-VLDB'13/QFS.md
+++ /dev/null
@@ -1,131 +0,0 @@
---
typora-copy-images-to: ./
---

# The Quantcast File System
@VLDB2013
[TOC]

##1. Background and Motivation
- QFS is an efficient alternative to the Hadoop Distributed File System (HDFS), written in C++. It offers several efficiency improvements relative to HDFS:
 > 1. 50% disk space savings through **erasure coding** instead of replication
 > 2. a resulting doubling of **write throughput**
 > 3. a faster name node
 > 4. support for faster sorting and logging through a concurrent append feature
 > 5. a native command line client much faster than hadoop fs
 > 6. global feedback-directed I/O device management

- Apache Hadoop maximized use of hardware by adopting a principle of **data locality**.
- To achieve fault tolerance, HDFS adopted a sensible **3x replication strategy**:
 > store one copy of the data on the machine writing it, another on the same rack, and a third on a distant rack.
 > Thus HDFS is not particularly storage efficient.
 > At today's cost of \$40,000 per PB. For reference, Amazon currently charges \$2.3 million to store 1 PB for three years.

- Following these developments (e.g., high-bandwidth networks), QFS **abandoned data locality**, relying on faster networks to deliver the data where it is needed, and instead optimized for storage efficiency.

- QFS employs **Reed-Solomon erasure coding** instead of three-way replication, which delivers comparable or better fault tolerance.
- QFS was developed on the frame of the **Kosmos File System**, an open-source distributed file system architecturally similar to Hadoop's HDFS but implemented in C++ rather than Java, and at an experimental level of maturity.

##2. QFS Architecture
- The basic design goals:
 > It is intended for efficient **map-reduce-style** processing, where files are written once and read multiple times by **batch processes**, rather than for random access or update operations.
 > The hardware will be **heterogeneous**, as clusters tend to be built in stages over time, and disk, machine and network failures will be routine.

- Data is physically stored in **64 MB chunks**, which are accessed via a **chunk server** running on the local machine.
- A single **metaserver** keeps an in-memory mapping of logical path names to file IDs, file IDs to chunk IDs, and chunk IDs to physical locations.
- A **client library** implements Hadoop's FileSystem interface and its equivalent in C++.

![1523365611372](1523365611372.png)

###2.1 Erasure Coding and Striping
- Erasure coding enables QFS not only to reduce the amount of storage but also to **accelerate large sequential write patterns** common to MapReduce workloads.
- Quantcast's **proprietary MapReduce** implementation uses QFS not only for results but also for intermediate sort spill files.
 > Erasure coding is critical to getting these large jobs to run quickly while **tolerating hardware failures** without having to **re-execute** map tasks.

- A data stream is stored physically using **Reed-Solomon 6+3** encoding:
 > the original data is striped over six chunks plus three parity chunks.

- **Write Data**: the QFS client collects data stripes, usually 64 KB each, into **six 1 MB buffers**. When they fill, it calculates three additional parity blocks and sends all **nine blocks** to **nine different chunk servers** (usually one local and the other eight on different racks).

- **Read Data**: the client requests the six chunks holding the original data.
 > If one or more chunks cannot be retrieved, the client fetches enough parity data to execute the Reed-Solomon arithmetic and reconstruct the original.

![1523504685432](1523504685432.png)

###2.2 Failure Groups
- *To maximize data availability*, a cluster must be partitioned into **failure groups**.
 > Each failure group represents machines with shared physical dependencies, such as power circuits or rack switches, which are therefore more likely to fail together.
 > The metaserver attempts to assign the nine chunks to **nine different failure groups** (see the sketch below).
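A sketch of this placement policy (hypothetical rack layout; real QFS also weighs disk queues and free space when choosing chunk servers):

```python
import random

def place_chunks(servers_by_group, n=9):
    """Pick n chunk servers from n distinct failure groups (6 data + 3 parity)."""
    if len(servers_by_group) < n:
        raise ValueError("need at least n failure groups for full isolation")
    groups = random.sample(list(servers_by_group), n)
    return [random.choice(servers_by_group[g]) for g in groups]

# hypothetical failure groups: ten racks with four chunk servers each
racks = {f"rack{i}": [f"cs{i}-{j}" for j in range(4)] for i in range(10)}
print(place_chunks(racks))     # one chunk server per rack, nine racks total
```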
###2.3 Metaserver
- The QFS metaserver holds all the directory and file structure of the file system, though **none of the data**.
- For each file, it keeps the **list of chunks** that store the data and their **physical locations on the cluster**.
- It handles client requests:
 > creates and mutates the directory and file structure on their behalf,
 > refers clients to chunk servers, and manages the overall health of the file system.

- The metaserver holds all its data in RAM.
 > As clients change files and directories, it records the changes atomically both in memory and in **a transaction log**.
 > It forks periodically to dump **the whole file system image** into a checkpoint.

####2.3.1 Chunk creation
- For **load balance**:
 > 1. Chunk servers continuously report to the metaserver the size of the I/O queues and the available space for each disk they manage.
 > 2. The metaserver dynamically decides where to allocate new chunks so as to keep disks evenly filled and evenly busy.
 > 3. It **proactively** avoids disks with problems, as they usually have large I/O queues.

####2.3.2 Space Rebalancing and Re-replication
- QFS **rebalances** files continuously to maintain a predefined measure of balance across all devices.
 > Rebalancing takes place when one or more disks fill up **over a ceiling threshold**, and moves chunks to devices with space utilization **below a floor threshold**.

####2.3.3 Maintaining Redundancy
- In a large cluster, **components are failing constantly**. The file system can be caught with **less redundancy** than it should have.
- The metaserver continuously monitors redundancy and recreates missing data.

####2.3.4 Evicting Chunks
- Eviction is a request to recreate a chunk server's data elsewhere so that its machine can be safely taken down.

####2.3.5 Hibernation
- For quick maintenance such as an operating system kernel upgrade, chunks are not **evicted**. Instead, the metaserver is told that the chunk server directories are being **hibernated**.
 > This sets **a 30-minute window** during which the metaserver will not attempt to replicate or recover the data on the servers being upgraded.


###2.4 Chunk server
- Each chunk server stores chunks as files on the **local file system**.
 > The chunk server accepts connections from clients to write and read data.

- It verifies **data integrity** on reads and initiates recovery on permanent I/O errors or checksum mismatches.

###2.5 Interoperability
- QFS does not depend on Hadoop and can be used in other contexts.
- The open-source distribution includes FUSE bindings, command-line tools, and C++/Java APIs.

##3. QFS Implementation
###3.1 Direct I/O for MapReduce Workloads
- By default QFS uses direct I/O rather than the system buffer cache, for several reasons:
 > 1. It wants to ensure that data is indeed written contiguously in large blocks.
 > 2. It wants RAM usage to be predictable.
 > 3. The QFS metaserver makes chunk allocation decisions based on global knowledge of the queue sizes of all the I/O devices it manages.

###3.2 Scalable Concurrent Append
- QFS implements *a concurrent append operation*, which scales up to tens of thousands of concurrent clients writing to the same file at once.

###3.3 Metaserver Optimization
- The metaserver represents the file system metadata in a **B+ tree** to minimize random memory access.

###3.4 Client
- The QFS client library is designed to allow concurrent I/O access to multiple files from a single client.
 > 1. **non-blocking**, **run-until-completion protocol state machines** for handling a variety of tasks.
 > 2. The state machines can be used directly to create highly scalable applications.
- The QFS library API is implemented by **running the protocol state machines** in a dedicated protocol worker thread.
 > All file I/O processing, including network I/O, checksumming, and recovery-information calculation, is performed within this thread.

- File system meta-information manipulations such as **move**, **rename**, **delete**, **stat**, or **list** require communication only with **the metaserver**.
 > These operations are **serialized** by the QFS client library and block the caller thread until the metaserver responds.

- The use of **read ahead** and **write behind** keeps disk and network I/O at a reasonable size.
\ No newline at end of file
diff --git a/Paper Reading Note/Track-CloudNet'17/1524660411325.png b/Paper Reading Note/Track-CloudNet'17/1524660411325.png
deleted file mode 100644
index fc9313e..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524660411325.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/1524660530235.png b/Paper Reading Note/Track-CloudNet'17/1524660530235.png
deleted file mode 100644
index 98f4067..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524660530235.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/1524661117396.png b/Paper Reading Note/Track-CloudNet'17/1524661117396.png
deleted file mode 100644
index 3f94b23..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524661117396.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/1524663396210.png b/Paper Reading Note/Track-CloudNet'17/1524663396210.png
deleted file mode 100644
index e45eb6f..0000000
Binary files a/Paper Reading Note/Track-CloudNet'17/1524663396210.png and /dev/null differ
diff --git a/Paper Reading Note/Track-CloudNet'17/Track.md b/Paper Reading Note/Track-CloudNet'17/Track.md
deleted file mode 100644
index d52b44b..0000000
--- a/Paper Reading Note/Track-CloudNet'17/Track.md
+++ /dev/null
@@ -1,93 +0,0 @@
---
typora-copy-images-to: ./
---

# Track: Tracerouting in SDN Networks with Arbitrary Network Functions
@CloudNet'17
[TOC]

##1. Background and Motivation
- Existing path tracing tools largely utilize **packet tags** to probe network paths among SDN-enabled switches.
- **Ubiquitous Network Functions (NFs)** or middleboxes can drop packets or alter their tags, which can break the probing mechanism.
- Sending probing packets through network functions could **corrupt their internal states**, risking the correctness of the servicing logic.
 > e.g., incorrect load balancing decisions

##2. Related Work
- 1. **SDN traceroute**: queries the current path taken by *any type of packet*.
 > SDN traceroute cannot work correctly in a network with network functions (or middleboxes), because some NFs, such as proxies and load balancers, can modify packet headers and/or payloads.

- 2. **SFC Path Trace**: it is able to trace paths consisting of NFs. However, it also relies on **tagging probing packets**. It identifies the types of NFs that have forwarded tagged packets to the controller through **looking up their device IDs from the predefined topology**.
 > a. This greatly limits its usability when a person has only partial or no access to the topology information.
 > b. Sending probing packets through NFs may corrupt their internal states.

##3. Track
- The main idea:
 >1. **Track** treats the whole path as several sub-paths joined by NFs.
 >2. It injects a probing packet with user-defined header fields into the network to trace each sub-path.
 >3. It runs **a correlation procedure to infer the behaviors of NFs** and concatenates all sub-paths in the correct order according to the correlation results (eliminating the need to look up an NF's ID from predefined topology information).
 > This method uses a correlation procedure rather than sending probing packets through the NFs, **preserving their internal states**.

###3.1 System Design and Implementation
####Design Principles:
**Track** is a diagnosing tool for debugging in SDN environments with NFs.
1. Do not corrupt NF states
2. Do not modify NF service logic
3. Do not modify production rules
- What about the controller?
1. It knows the topology of the given network
2. The controller knows which switches have an NF attached (NF-switches)

####System Architecture
![1524660530235](1524660530235.png)

####Correlation Module:
1) NFs may drop the packet or dynamically modify its headers and contents. This paper roughly classifies NFs into 4 types.
![1524661117396](1524661117396.png)

2) Correlation Procedure:
It treats NFs as **blackboxes** and infers their relevant **input-output** behaviors (assuming network administrators are not asked for information about the NFs).
It only needs to reason about the NF behaviors pertinent to packet forwarding.
**Workflow**:
a. Collecting packets
b. Flow mapping
c. Calculate payload similarity
d. Identify the most similar flows

3) Implementation of the Correlation Module:
The controller **installs rules** at NF-switches to retrieve the first few packets of each new flow.


####Tracing Module
![1524663396210](1524663396210.png)
1) Pre-installed rules: the rules must support two different tasks:
 > a) matching the incoming probing packets so the hop can be logged at the controller (**similar to SDN traceroute**)
 > b) forwarding the controller-returned probing packets as normal packets.

Track requires all probing packets to carry a tag so that switches can differentiate probing packets from normal packets (this is also necessary).

2) Tracing procedure:
1. Users specify the packet header (e.g., source/destination IP address, source/destination port and so on).
2. Track constructs the packet with the user-specified packet header fields and the tag.
3. Identify the injection point, which is the switch connected to the source specified by the user.
4. Track **sends the probing packet to the injection point**; it will be sent back to the controller as **a PACKET_IN**.
5. If the current hop is not an NF-switch, Track does the same operations as **SDN traceroute**.
6. If the current hop is an NF-switch, Track modifies the probing packet **as the NF would** (according to the learned mappings) and preserves the probe tag.

Track only logs the information of PACKET_IN messages carrying the probe tag.

3) Implementation of the Tracing Module:
Using the interfaces of RYU to construct probing packets.


##4. Experiment
- Two metrics: **accuracy** and **latency**
Latency vs. path length (compared with SDN traceroute)
Latency vs. different types of NFs
Effectiveness and efficiency of the **correlation procedure** (accuracy)

##5. The weakness in its method
1. Does not consider dynamic changes in the SFC
2. Needs to find the correlation between incoming and outgoing flows
3. 
Cannot handle the case of multiple paths - - diff --git a/README.md b/README.md index c68bea1..957673b 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ -# Storage System Paper List +# Zuoru's Storage System Reading List -In this repo, it records some paper related to storage system, including **Data Deduplication** (aka, dedup), **Erasure Coding** (aka, EC), general **Distributed Storage System** (aka, DSS) and other related topics (i.e., Network Security.....), updating from time to time~ +A reading list related to storage system, including data deduplication, erasure coding, general storage and other related topics (i.e., Security...), updating from time to time~ [TOC] ## A. Data Deduplication @@ -27,6 +27,8 @@ In this repo, it records some paper related to storage system, including **Data 11. *Inside Dropbox: Understanding Personal Cloud Storage Services*----IMC'12 11. *Identifying Trends in Enterprise Data Protection Systems*----USENIX ATC'15 ([link](https://www.usenix.org/system/files/conference/atc15/atc15-paper-amvrosladis.pdf)) 11. *Deduplication Analyses of Multimedia System Images*----HotStorage'18 ([link](https://www.usenix.org/system/files/conference/hotedge18/hotedge18-papers-suess.pdf)) +14. *Improving Docker Registry Design based on Production Workload Analysis*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-anwar.pdf)) +14. *Insights for Data Reduction in Primary Storage: a Practical Analysis*----SYSTOR'12 ([link](https://dl.acm.org/doi/pdf/10.1145/2367589.2367606)) ### Deduplication System Design @@ -44,15 +46,14 @@ In this repo, it records some paper related to storage system, including **Data 12. *SmartDedup: Optimizing Deduplication for Resource-constrained Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-yang-qirui.pdf)) 13. Can't We All Get Along? Redesigning Protection Storage for Modern Workloads----USENIX ATC'18 ([link](https://www.usenix.org/system/files/conference/atc18/atc18-allu.pdf)) [summary](https://yzr95924.github.io/paper_summary/Redesigning-ATC'18.html) 14. *Deduplication in SSDs: Model and quantitative analysis*----MSST'12 ([link](https://ieeexplore.ieee.org/document/6232379)) -16. *iDedup: Latency-aware, Inline Data Deduplication for Primary Storage*----FAST'12 ([link]( https://www.usenix.org/legacy/event/fast12/tech/full_papers/Srinivasan.pdf )) [summary](https://yzr95924.github.io/paper_summary/iDedup-FAST'12.html) -17. *DupHunter: Flexible High-Performance Deduplication for Docker Registries*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-zhao.pdf)) -18. *Design Tradeoffs for Data Deduplication Performance in Backup Workloads*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-fu.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupDesignTradeoff-FAST'15.html) -19. *The Dilemma between Deduplication and Locality: Can Both be Achieved?*---FAST'21 ([link](https://www.usenix.org/system/files/fast21-zou.pdf)) [summary](https://yzr95924.github.io/paper_summary/MFDedup-FAST'21.html) +15. *iDedup: Latency-aware, Inline Data Deduplication for Primary Storage*----FAST'12 ([link]( https://www.usenix.org/legacy/event/fast12/tech/full_papers/Srinivasan.pdf )) [summary](https://yzr95924.github.io/paper_summary/iDedup-FAST'12.html) +16. *DupHunter: Flexible High-Performance Deduplication for Docker Registries*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-zhao.pdf)) +17. 
*Design Tradeoffs for Data Deduplication Performance in Backup Workloads*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-fu.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupDesignTradeoff-FAST'15.html) +18. *The Dilemma between Deduplication and Locality: Can Both be Achieved?*---FAST'21 ([link](https://www.usenix.org/system/files/fast21-zou.pdf)) [summary](https://yzr95924.github.io/paper_summary/MFDedup-FAST'21.html) 19. *SLIMSTORE: A Cloud-based Deduplication System for Multi-version Backups*----ICDE'21 ([link](http://www.cs.utah.edu/~lifeifei/papers/slimstore-icde21.pdf)) 20. *Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling*----ToS'21 ([link](https://dl.acm.org/doi/full/10.1145/3459626)) -20. *Sorted Deduplication: How to Process Thousands of Backup Streams*----MSST'16 ([link](https://storageconference.us/2016/Papers/SortedDeduplication.pdf)) -20. *Deriving and Comparing Deduplication Techniques Using a Model-Based Classification*----EuroSys'15 ([link](https://dl.acm.org/doi/pdf/10.1145/2741948.2741952)) -20. *DedupSearch: Two-Phase Deduplication Aware Keyword Search*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-elias.pdf)) +21. *Sorted Deduplication: How to Process Thousands of Backup Streams*----MSST'16 ([link](https://storageconference.us/2016/Papers/SortedDeduplication.pdf)) +22. *Deriving and Comparing Deduplication Techniques Using a Model-Based Classification*----EuroSys'15 ([link](https://dl.acm.org/doi/pdf/10.1145/2741948.2741952)) ### Restore Performances @@ -60,7 +61,7 @@ In this repo, it records some paper related to storage system, including **Data 2. *ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look-Ahead Window Assisted Chunk Caching*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-cao.pdf)) [summary](https://yzr95924.github.io/paper_summary/ALACC-FAST'18.html) 3. *Reducing Impact of Data Fragmentation Caused by In-line Deduplication*----SYSTOR'12 ([link](http://9livesdata.com/wp-content/uploads/2017/04/AsPresentedOnSYSTOR-1.pdf)) 4. *Reducing Fragmentation Impact with Forward Knowledge in Backup Systems with Deduplication*----SYSTOR'15 ([link](https://dl.acm.org/doi/10.1145/2757667.2757678)) -5. *Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets*----MASCOTS'12 +5. *Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets*----MASCOTS'12 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6298180)) 6. *Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance*----FAST'19 ([link](https://www.usenix.org/system/files/fast19-cao.pdf)) [summary](https://yzr95924.github.io/paper_summary/LookBackWindow-FAST'19.html) 7. *Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication*---FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final124.pdf)) [summary](https://yzr95924.github.io/paper_summary/ImproveRestore-FAST'13.html) 8. *Chunk Fragmentation Level: An Effective Indicator for Read Performance Degradation in Deduplication Storage*----HPCC'11 @@ -99,7 +100,7 @@ In this repo, it records some paper related to storage system, including **Data 29. 
*S2Dedup: SGX-enabled Secure Deduplication*----SYSTOR'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3456727.3463773)) [summary](https://yzr95924.github.io/paper_summary/S2Dedup-SYSTOR'21.html) 30. *Secure Deduplication of General Computations*----USENIX ATC'15 ([link](https://www.usenix.org/system/files/conference/atc15/atc15-paper-tang.pdf)) 31. *When Delta Sync Meets Message-Locked Encryption: a Feature-based Delta Sync Scheme for Encrypted Cloud Storage*----ICDCS'21 ([link](https://shenzr.github.io/publications/featuresync-icdcs21.pdf)) [summary](https://yzr95924.github.io/paper_summary/FeatureSync-ICDCS'21.html) -31. *DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-bacs.pdf)) [summary](https://yzr95924.github.io/paper_summary/DeepSketch-FAST'22.html) +31. *DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-bacs.pdf)) [summary](https://yzr95924.github.io/paper_summary/DUPEFS-FAST'22.html) ### Metadata Management @@ -141,7 +142,10 @@ In this repo, it records some paper related to storage system, including **Data 13. Ddelta: A Deduplication-inspired Fast Delta Compression Approach----Performance'14 ([link](https://www.sciencedirect.com/science/article/pii/S0166531614000790)) 14. *Odess: Speeding up Resemblance Detection for Redundancy Elimination by Fast Content-Defined Sampling*----ICDE'14 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9458911)) 15. *Exploring the Potential of Fast Delta Encoding: Marching to a Higher Compression Ratio*----CLUSTER'20 ([link](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9229609)) [summary](https://yzr95924.github.io/paper_summary/Gdelta-CLUSTER'20.html) -15. *DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-park.pdf)) +15. *DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-park.pdf)) [summary](https://yzr95924.github.io/paper_summary/DeepSketch-FAST'22.html) +17. *DedupSearch: Two-Phase Deduplication Aware Keyword Search*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-elias.pdf)) [summary](https://yzr95924.github.io/paper_summary/DedupSearch-FAST'22.html) +17. *To Zip or not to Zip: Effective Resource Usage for Real-Time Compression*----FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final38.pdf)) [summary](https://yzr95924.github.io/paper_summary/CompressionEst-FAST'13.html) +17. *Adaptively Compressing IoT Data on the Resource-constrained Edge*----HotEdge'20 ([link](https://www.usenix.org/system/files/hotedge20_paper_lu.pdf)) ### Memory && Block-Layer Deduplication @@ -151,6 +155,8 @@ In this repo, it records some paper related to storage system, including **Data 4. *OrderMergeDedup: Efficient, Failure-Consistent Deduplication on Flash*----FAST'16 ([link](https://www.usenix.org/system/files/conference/fast16/fast16-papers-chen-zhuan.pdf)) 5. *CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives*----FAST'11 ([link](https://www.usenix.org/legacy/event/fast11/tech/full_papers/Chen.pdf)) [summary](https://yzr95924.github.io/paper_summary/CAFTL-FAST'11.html) 5. 
*Remap-SSD: Safely and Efficiently Exploiting SSD Address Remapping to Eliminate Duplicate Writes*----FAST'21 ([link](https://www.usenix.org/system/files/fast21-zhou.pdf)) +7. *Memory Deduplication for Serverless Computing with Medes*----EuroSys'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3492321.3524272)) +8. On the Effectiveness of Same-Domain Memory Deduplication----EuroSec'22 ([link](https://download.vusec.net/papers/dedupestreturns_eurosec22.pdf)) ### Data Chunking 1. *SS-CDC: A Two-stage Parallel Content-Defined Chunking for Deduplicating Backup Storage*----SYSTOR'19 ([link]( http://ranger.uta.edu/~sjiang/pubs/papers/ni19-ss-cdc.pdf )) [summary](https://yzr95924.github.io/paper_summary/SSCDC-SYSTOR'19.html) @@ -171,10 +177,6 @@ In this repo, it records some paper related to storage system, including **Data 3. *Nitro: A Capacity-Optimized SSD Cache for Primary Storage*----USENIX ATC'14 ([link](https://www.usenix.org/system/files/conference/atc14/atc14-paper-li_cheng_nitro.pdf)) 4. *Austere Flash Caching with Deduplication and Compression*----USENIX ATC'20 ([link](https://www.usenix.org/system/files/atc20-wang-qiuping.pdf)) -### Benchmark - -1. *SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-gracia-tinedo.pdf)) - ### Garbage Collection 1. *Memory Efficient Sanitization of a Deduplicated Storage System*----FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final100_0.pdf)) [summary](https://yzr95924.github.io/paper_summary/MemorySanitization-FAST'13.html) @@ -194,8 +196,9 @@ In this repo, it records some paper related to storage system, including **Data 4. *Tradeoffs in Scalable Data Routing for Deduplication Clusters*----FAST'11 ([link](https://www.usenix.org/legacy/events/fast11/tech/full_papers/Dong.pdf)) [summary]( https://yzr95924.github.io/paper_summary/TradeoffDataRouting-FAST'11.html ) 5. *Cluster and Single-Node Analysis of Long-Term Deduplication Patterns*----ToS'18 ([link](https://dl.acm.org/doi/pdf/10.1145/3183890)) [summary](https://yzr95924.github.io/paper_summary/ClusterSingle-ToS'18.html) 6. *Decentralized Deduplication in SAN Cluster File Systems*----USENIX ATC'09 ([link](https://static.usenix.org/events/usenix09/tech/full_papers/clements/clements.pdf)) -6. *GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-nachman.pdf)) -6. *The what, The from, and The to: The Migration Games in Deduplicated Systems*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-kisous.pdf)) +7. *HYDRAstore: A Scalable Secondary Storage*----FAST'09 ([link](http://9livesdata.com/wp-content/uploads/2017/04/HYDRAstor-A-Scalable-Secondary-Storage-1.pdf)) +8. *GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-nachman.pdf)) +9. *The what, The from, and The to: The Migration Games in Deduplicated Systems*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-kisous.pdf)) [summary](https://yzr95924.github.io/paper_summary/MigrationGame-FAST'22.html) ## B. Erasure Coding @@ -279,6 +282,8 @@ In this repo, it records some paper related to storage system, including **Data 17. *Splinter: Practical Private Queries on Public Data*----NSDI'17 ([link](https://www.usenix.org/system/files/conference/nsdi17/nsdi17-wang-frank.pdf)) 18. 
*Quantifying Information Leakage of Deterministic Encryption*----CCSW'19 ([link]( http://users.cs.fiu.edu/~mjura011/documents/2019_CCSW_Quantifying_Information_Leakage_of_Deterministic_Encryption )) [summary](https://yzr95924.github.io/paper_summary/QuantifyingInformationLeakage-CCSW'19.html) 18. *Pancake: Frequency Smoothing for Encrypted Data Stores*----USENIX Security'20 ([link](https://www.usenix.org/system/files/sec20-grubbs.pdf)) +19. *Hiding the Lengths of Encrypted Message via Gaussian Padding*----CCS'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3460120.3484590)) +20. *On Fingerprinting Attacks and Length-Hiding Encryption*----CT-RSA'22 ([link]()) ### Secure Deletion @@ -340,8 +345,9 @@ In this repo, it records some paper related to storage system, including **Data 11. *The Google File System*----SOSP'03 ([link](https://dl.acm.org/doi/pdf/10.1145/945445.945450)) 12. *Bigtable: A Distributed Storage System for Structured Data*----OSDI'06 ([link](https://dl.acm.org/doi/pdf/10.1145/1365815.1365816)) 13. *Duplicacy: A New Generation of Cloud Backup Tool Based on Lock-Free Deduplication*----ToCC'20 ([link](https://github.com/gilbertchen/duplicacy/blob/master/duplicacy_paper.pdf)) [summary](https://yzr95924.github.io/paper_summary/Duplicacy-ToCC'20.html) +13. *RACS: A Case for Cloud Storage Diversity*----SoCC'10 ([link](http://pubs.0xff.co/papers/racs-socc.pdf)) -### New PAXOS +### Consensus 1. *In Search of an Understandable Consensus Algorithm*----USENIX ATC'14 ([link](https://raft.github.io/raft.pdf)) @@ -350,6 +356,7 @@ In this repo, it records some paper related to storage system, including **Data 1. *TinyLFU: A Highly Efficient Cache Admission Policy*----ACM ToS'17 ([link](https://arxiv.org/pdf/1512.00727.pdf)) 2. *It’s Time to Revisit LRU vs. FIFO*----HotStorage'20 ([link](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf)) [summary](https://yzr95924.github.io/paper_summary/Cache-HotStorage'20.html) [trace](http://iotta.snia.org/traces/key-value) 3. *Unifying the Data Center Caching Layer — Feasible? Profitable?*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470884)) +4. *Learning Cache Replacement with Cacheus*----FAST'21 ([link](https://www.usenix.org/system/files/fast21-rodriguez.pdf)) ### Hash @@ -357,6 +364,7 @@ In this repo, it records some paper related to storage system, including **Data 2. *An Analysis of Compare-by-Hash*----HotOS'03 ([link](http://www.cs.utah.edu/~shanth/stuff/research/dup_elim/hash_cmp.pdf)) 3. *On-the-Fly Verification of Rateless Erasure Codes for Efficient Content Distribution*----S&P'04 ([link](https://pdos.csail.mit.edu/papers/otfvec/paper.pdf)) 4. *Algorithmic Improvements for Fast Concurrent Cuckoo Hashing*----EuroSys'14 ([link](https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf)) +4. *Don’t Thrash: How to Cache your Hash on Flash*----HotStorage'11 ([link](https://www.usenix.org/legacy/events/hotstorage11/tech/final_files/Bender.pdf)) ### Lock-free storage 1. *A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring*----IPDPS'10 ([link](https://www.cse.cuhk.edu.hk/~pclee/www/pubs/ipdps10.pdf)) @@ -383,6 +391,9 @@ In this repo, it records some paper related to storage system, including **Data 1. *From blocks to rocks: a natural extension of zoned namespaces*----HotStorage'21 ([link](https://dl.acm.org/doi/pdf/10.1145/3465332.3470870)) 1. 
*Don’t Be a Blockhead: Zoned Namespaces Make Work on Conventional SSDs Obsolete*----HotOS'21 ([link](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s07-stavrinos.pdf)) [summary](https://yzr95924.github.io/paper_summary/BlockHead-HotOS'21.html) 1. Zone Append: A New Way of Writing to Zoned Storage----Vault'20 ([link](https://www.usenix.org/system/files/vault20_slides_bjorling.pdf)) +1. *What Systems Researchers Need to Know about NAND Flash*----HotStorage'13 ([link](https://www.usenix.org/system/files/conference/hotstorage13/hotstorage13-desnoyers.pdf)) +1. *Caveat-Scriptor: Write Anywhere Shingled Disks*----HotStorage'15 ([link](https://www.usenix.org/system/files/conference/hotstorage15/hotstorage15-kadekodi.pdf)) +1. *Improving the Reliability of Next Generation SSDs using WOM-v Codes*----FAST'22 ([link](https://www.usenix.org/system/files/fast22-jaffer.pdf)) ### File system @@ -393,8 +404,22 @@ In this repo, it records some paper related to storage system, including **Data 5. *EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices*----USENIX ATC'19 ([link](https://www.usenix.org/system/files/atc19-gao.pdf)) 5. *F2FS: A New File System for Flash Storage*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-lee.pdf)) 5. *How to Copy Files*----FAST'20 ([link](https://www.usenix.org/system/files/fast20-zhan.pdf)) +5. *BetrFS: A Compleat File System for Commodity SSDs*----EuroSys'22 ([link](https://dl.acm.org/doi/pdf/10.1145/3492321.3519571)) +5. *The Full Path to Full-Path Indexing*----FAST'18 ([link](https://www.usenix.org/system/files/conference/fast18/fast18-zhan.pdf)) +5. *BetrFS: A Right-Optimized Write-Optimized File System*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-jannen_william.pdf)) +11. *Filesystem Aging: It's more Usage than Fullness*----HotStorage'19 ([link](https://www.cs.unc.edu/~porter/pubs/hotstorage19-paper-conway.pdf)) +12. *File Systems Fated for Senescence? Nonsense, Says Science!*----FAST'17 ([link](https://www.usenix.org/system/files/conference/fast17/fast17-conway.pdf)) ### Persistent Memories 1. *SLM-DB: Single-Level Key-Value Store with Persistent Memory*----FAST'19 ([link](https://www.usenix.org/system/files/fast19-kaiyrakhmet.pdf)) [summary](https://yzr95924.github.io/paper_summary/SLMDB-FAST'19.html) 2. *Redesigning LSMs for Nonvolatile Memory with NoveLSM*----USENIX ATC'18 ([link](https://www.usenix.org/system/files/conference/atc18/atc18-kannan.pdf)) [summary](https://yzr95924.github.io/paper_summary/NoveLSM-ATC'18.html) + +### Data Structure + +1. *An Introduction to Be-trees and Write-Optimization*----USENIX Login'15 ([link](https://www.usenix.org/system/files/login/articles/login_oct15_05_bender.pdf)) [code](https://github.com/oscarlab/Be-Tree) +1. *Building Workload-Independent Storage with VT-Trees*----FAST'13 ([link](https://www.usenix.org/system/files/conference/fast13/fast13-final165_0.pdf)) + +### Benchmark + +1. 
*SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks*----FAST'15 ([link](https://www.usenix.org/system/files/conference/fast15/fast15-paper-gracia-tinedo.pdf))
\ No newline at end of file
diff --git a/StoragePaperNote/Deduplication/Distributed-Dedup/MigrationGame-FAST'22.md b/StoragePaperNote/Deduplication/Distributed-Dedup/MigrationGame-FAST'22.md
new file mode 100644
index 0000000..aeabae6
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Distributed-Dedup/MigrationGame-FAST'22.md
@@ -0,0 +1,108 @@
---
typora-copy-images-to: ../paper_figure
---
The what, The from, and The to: The Migration Games in Deduplicated Systems
------------------------------------------
| Venue | Category |
| :------------------------: | :------------------: |
| FAST'22 | Distributed Deduplication |
[TOC]

## 1. Summary
### Motivation of this paper

- motivation
  - the high-level management aspects of large-scale systems (e.g., capacity planning, caching, and cost of service) still need to be adapted to deduplication storage
  - data migration: files are remapped between separate **deduplication domains**, or **volumes**
    - volumes: a single server within a large-scale system, or an independent set of servers dedicated to a customer or dataset
    - each physical server employs a separate fingerprint index
  - migration must optimize several possibly conflicting objectives
    - the physical size of the stored data (after migration)
    - the load balancing between the system's volumes
    - the network bandwidth generated by the migration
- the main goal
  - formulate the general migration problem for deduplicated systems as an optimization problem
    - minimize the system's size
    - while ensuring that the storage load is evenly distributed between the system's volumes (**load balancing** consideration)
    - and that the network traffic required for the migration does not exceed its allocation (**traffic** consideration)

### Migration Games

- problem statement
  - minimizing migration traffic
    - the amount of data that is transferred between volumes during migration
  - load balancing
    - trade-off between minimizing the total physical data size and maximizing load balancing
      - extreme case: map all files to a single volume
    - evenly distribute the capacity load between volumes
      - fairness metric: the ratio between the size of the smallest volume in the system and that of the largest volume (perfect: 1)
  - traffic constraint, load-balancing constraint
    - traffic constraint: the maximum traffic allowed during migration
    - load-balancing constraint: a margin around the average volume size
- Greedy (extends SketchVolume)
  - iterates over all the files in each volume, and calculates the space-saving ratio of remapping a single file to each of the other volumes
  - each phase is allocated an even portion of the traffic allocated for migration
  - load-balancing step
    - remap files from large volumes to small ones, until the volume sizes are within the margin defined for this phase
  - capacity-reduction step
    - use the **remaining traffic** to reduce the system's size
- ILP (extends GoSeed)
  - all variables are boolean
  - objective: maximize the sum of the sizes of all blocks that are deleted minus all blocks that are copied
  - acceleration methods
    - fingerprint sampling: keep fingerprints with k leading zeroes, reducing the number of blocks in the problem
    - solver timeout: halt the ILP solver's execution after a pre-determined runtime
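To make the optimization problem concrete, here is a tiny exhaustive-search sketch of my own (not the paper's code, which uses the greedy/ILP/clustering approaches above; exhaustive search only works on toy instances). Files are sets of unit-sized block fingerprints; for simplicity it charges the full size of every remapped file as traffic, whereas the real problem only transfers blocks missing at the target volume.

```python
# Brute-force illustration of the migration objective and its constraints.
from itertools import product

def plan_migration(files, init, volumes, max_traffic, min_balance):
    """files: {fid: set of block fingerprints}; init: {fid: initial volume}.
    Returns the assignment minimizing total physical size under the
    (simplified) traffic and load-balancing constraints."""
    best, best_size = None, float("inf")
    for assign in product(volumes, repeat=len(files)):
        mapping = dict(zip(files, assign))
        vol_blocks = {v: set() for v in volumes}
        for fid, v in mapping.items():
            vol_blocks[v] |= files[fid]          # physical size = union of blocks
        # simplified traffic: every remapped file is transferred in full
        traffic = sum(len(files[f]) for f in files if mapping[f] != init[f])
        sizes = [len(b) for b in vol_blocks.values()]
        balance = min(sizes) / max(sizes) if max(sizes) else 1.0
        if traffic <= max_traffic and balance >= min_balance:
            if sum(sizes) < best_size:
                best, best_size = mapping, sum(sizes)
    return best, best_size
```

Even on a toy instance this exposes the stated trade-off: with `min_balance = 0` the optimum maps everything to one volume, and raising it forces a larger total physical size.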
- Clustering
  - main idea: files are similar if they share a large portion of their blocks
    - create clusters of similar files and assign each cluster to a volume
    - remap those files that were assigned to a volume different from their original location
  - hierarchical clustering
    - in each iteration, merge the most similar pair of clusters into a new cluster
  - file similarity
    - use the Jaccard index over shared blocks
  - traffic and load-balancing considerations
    - determine the maximal cluster size by estimating the system's size after migration
  - sensitivity to the sample
    - rather than merging the pair of clusters with the smallest distance, merge a **random** pair from the set of pairs with the smallest distances
  - constructing the final migration plan
    - for the same given system and migration constraints, execute the clustering process with different parameters and use the plan with the best deletion as the final result

### Implementation and Evaluation

- traces
  - MS, FSL, Linux (all of them are public)
- evaluation
  - basic comparison between the algorithms
    - the deletion percentage of the initial system's physical size
    - balance score
    - the total runtime
  - sensitivity to problem parameters
    - effect of the sampling degree
    - effect of the load-balancing and traffic constraints
    - effect of randomization on Cluster
    - effect of the number of volumes

## 2. Strength (Contributions of the paper)

- formulates the general migration problem and solves it with three approaches
  - a greedy algorithm, an ILP-based approach, and hierarchical clustering

## 3. Weakness (Limitations of the paper)

- does not provide a system that applies its algorithms
  - how to collect the metadata for solving the optimization problem?
- hard to follow, as the data migration problem is not common yet
  - it only arises in very large-scale storage systems

## 4. Some Insights (Future work)

- related work
  - SketchVolume-FAST'19
    - a greedy algorithm
  - GoSeed-FAST'20
    - files are remapped into an initially **empty** target volume
  - Rangoli-SYSTOR'13
    - a greedy algorithm for space reclamation
    - a set of files is deleted to reclaim some of the system's capacity
- data migration in distributed deduplication systems
  - if a subsystem becomes full while another subsystem has available capacity, migration is quicker and cheaper than adding capacity to the full subsystem
\ No newline at end of file
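A compact sketch of the hierarchical clustering idea above (my own illustration, with made-up parameter names): greedily merge the pair of clusters with the highest Jaccard similarity of their block sets, subject to a maximal cluster size. The paper's variant instead merges a random pair among the closest ones, to reduce sensitivity to the fingerprint sample.

```python
# Hierarchical clustering of files by Jaccard similarity over block sets.
def cluster_files(files, max_cluster_blocks):
    """files: {fid: set of block fingerprints}. Returns a list of clusters,
    each a set of file ids, whose merged block sets respect the size cap."""
    clusters = [({fid}, set(blocks)) for fid, blocks in files.items()]

    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a | b) else 0.0

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i][1] | clusters[j][1]
                if len(merged) > max_cluster_blocks:
                    continue                     # merge would exceed the cap
                sim = jaccard(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best is None:                         # no feasible merge remains
            break
        _, i, j = best
        fids = clusters[i][0] | clusters[j][0]
        blocks = clusters[i][1] | clusters[j][1]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((fids, blocks))
    return [fids for fids, _ in clusters]
```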
diff --git a/StoragePaperNote/Deduplication/Post-Dedup/CompressionEst-FAST'13.md b/StoragePaperNote/Deduplication/Post-Dedup/CompressionEst-FAST'13.md
new file mode 100644
index 0000000..765c3d2
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Post-Dedup/CompressionEst-FAST'13.md
@@ -0,0 +1,94 @@
---
typora-copy-images-to: ../paper_figure
---
To Zip or Not to Zip: Effective Resource Usage for Real-Time Compression
---------------------------------------

| Venue | Category |
| :------------------------: | :------------------: |
| FAST'13 | Compression |
[TOC]

## 1. Summary
### Motivation of this paper

- motivation
  - adding compression on the data path consumes **scarce CPU** and **memory** resources on the storage system
    - real-time compression for block and file primary storage systems
  - it is advisable to avoid compressing what the authors refer to as "incompressible" data
    - standard LZ-type compression algorithms incur higher performance overheads **when the data does not compress well**
    - ![image-20220526171832531](../paper_figure/image-20220526171832531.png)
- main problem
  - identifying **incompressible data** in an efficient manner, allowing systems to effectively utilize their limited resources
    - a macro-scale compression estimation for the whole data set (**offline**)
    - a micro-scale compressibility test for individual write operations (**online**)

### Compression Estimation/Test

- the macro-scale solution
  - for an entire volume or file system of a storage system
    - estimate the overall compression ratio with an **accuracy guarantee**
  - the general framework
    - choose `m` random locations
    - compute the average of the compression ratios of these locations
    - locations and contributions
      - real-life implementations of compression algorithms are subject to **locality limits** (a chunk can be used to define the locality)
        - they don't want to hold long back pointers
        - memory management: they need to flush their buffers
      - define the contribution of a byte as **the compression ratio of its locality**
- the micro-scale solution
  - for a single write: 8KB, 16KB, 32KB, 128KB
    - recommends whether to zip or not to zip (it has to be much faster than actual compression)
    - it does not read the entire chunk, so guarantees are impossible
  - the heuristics method
    - collect **a set of basic indicators** about the chunk
      - from random samples of the chunk rather than the whole chunk
    - core-set size: the character set that makes up most of the data
    - byte entropy
    - symbol-pairs distribution indicator (distance from a random distribution)
    - sample: at most 2KB of data per write buffer
      - 16 consecutive bytes from up to 128 randomly chosen locations
    - define several thresholds to test the indicators against
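A rough sketch of the micro-scale test follows. The sampling shape matches the description above (16-byte runs from up to 128 random offsets, at most ~2KB), but the numeric thresholds are my own guesses, not the paper's.

```python
# Cheap compressibility indicators computed on a small random sample.
import math
import random

def sample_runs(buf: bytes, runs: int = 128, run_len: int = 16) -> bytes:
    """Collect at most ~2KB: 16 consecutive bytes from random offsets."""
    if len(buf) <= run_len:
        return buf
    offs = (random.randrange(len(buf) - run_len + 1) for _ in range(runs))
    return b"".join(buf[o:o + run_len] for o in offs)

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0..8)."""
    freq = [0] * 256
    for b in data:
        freq[b] += 1
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in freq if c)

def core_set_size(data: bytes, coverage: float = 0.9) -> int:
    """Smallest number of distinct byte values covering `coverage` of the data."""
    freq = sorted((data.count(b) for b in set(data)), reverse=True)
    need, covered = coverage * len(data), 0
    for k, c in enumerate(freq, start=1):
        covered += c
        if covered >= need:
            return k
    return len(freq)

def looks_incompressible(buf: bytes) -> bool:
    """Made-up thresholds: near-random entropy plus a large core set."""
    s = sample_runs(buf)
    return byte_entropy(s) > 7.5 and core_set_size(s) > 200
```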
### Implementation and Evaluation

- implementation
  - the macro-scale solution: written in C, multi-threaded
- evaluation
  - compression ratios vs. the number of samples
  - running time vs. compression trade-off
    - compared with the prefix method and full compression

## 2. Strength (Contributions of the paper)

- the macro-scale test provides a quick and accurate estimate of which data sets to compress
- the micro-scale test heuristics have proved critical in reducing resource consumption while maximizing compression for volumes containing a mix of compressible and incompressible data

## 3. Weakness (Limitations of the paper)

- it does not generalize to other compression algorithms (e.g., LZ4, ZSTD)
- how to define the thresholds that find a good point for disabling compression is not clear
- the evaluation is limited; there is no end-to-end system performance evaluation

## 4. Some Insights (Future work)

- a bit about compression techniques
  - this paper focuses on **Zlib**, a popular compression engine (used by zip), which combines:
    - **LZ compression**: pointers instead of repetitions
    - **Huffman encoding**: shorter encodings for popular characters
- existing solutions for estimating compression ratios
  - by file extension
    - not always accurate, not always available
  - look at the actual data
    - scan and compress everything
    - look at a prefix (of a file or a chunk) and deduce about the rest
      - no guarantees on the outcome
      - good for compressible data: zero overhead

- putting it all together
  - when most data is compressible
    - use prefix estimation
  - when a significant percentage is incompressible
    - use the heuristics method
  - when most data is incompressible
    - turn compression off
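Putting the estimators together might look like the following gate (a self-contained sketch of mine; the 6.0/7.5-bit entropy bands and the 0.9 prefix-ratio cutoff are invented for illustration, not taken from the paper):

```python
# Gate a write: cheap entropy indicator first, zlib prefix estimate for the
# unclear middle band ("look at a prefix and deduce about the rest").
import math
import zlib

def _entropy(data: bytes) -> float:
    """Bits per byte of the sampled data (8.0 means uniformly random)."""
    n = len(data)
    freq = [0] * 256
    for b in data:
        freq[b] += 1
    return -sum(c / n * math.log2(c / n) for c in freq if c)

def should_compress(buf: bytes, prefix: int = 4096) -> bool:
    if not buf:
        return False
    ent = _entropy(buf[:2048])
    if ent > 7.5:      # almost certainly incompressible: turn compression off
        return False
    if ent < 6.0:      # clearly compressible: no further estimation needed
        return True
    head = buf[:prefix]                      # unclear: estimate from a prefix
    return len(zlib.compress(head)) < 0.9 * len(head)
```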
diff --git a/StoragePaperNote/Deduplication/Post-Dedup/DUPEFS-FAST'22.md b/StoragePaperNote/Deduplication/Post-Dedup/DUPEFS-FAST'22.md
new file mode 100644
index 0000000..11b6d33
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Post-Dedup/DUPEFS-FAST'22.md
@@ -0,0 +1,25 @@
---
typora-copy-images-to: ../paper_figure
---
DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels
------------------------------------------
| Venue | Category |
| :------------------------: | :------------------: |
| FAST'22 | secure deduplication |
[TOC]

## 1. Summary
### Motivation of this paper

-

### Method Name

### Implementation and Evaluation

## 2. Strength (Contributions of the paper)

## 3. Weakness (Limitations of the paper)

## 4. Some Insights (Future work)

diff --git a/StoragePaperNote/Deduplication/Post-Dedup/DedupSearch-FAST'22.md b/StoragePaperNote/Deduplication/Post-Dedup/DedupSearch-FAST'22.md
new file mode 100644
index 0000000..47c743e
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Post-Dedup/DedupSearch-FAST'22.md
@@ -0,0 +1,108 @@
---
typora-copy-images-to: ../paper_figure
---
DedupSearch: Two-Phase Deduplication Aware Keyword Search
------------------------------------------
| Venue | Category |
| :------------------------: | :------------------: |
| FAST'22 | Post-Deduplication functionality |
[TOC]

## 1. Summary
### Motivation of this paper

- motivation
  - deduplicated storage creates multiple logical pointers, from different files and even users, to each physical chunk
    - this many-to-one relationship complicates many functionalities (e.g., caching, capacity planning, and support for QoS)
    - it also presents an opportunity to rethink those functionalities to be **deduplication-aware** and **more efficient**
  - this paper addresses the keyword search issue in deduplicated storage
- the main goal
  - focus on **offline search** of large, deduplicated storage systems for legal or analytics purposes
- why other approaches cannot work
  - their index size is proportional to **the logical size of the data** and consumes a large fraction of the storage capacity
  - they are not useful for binary strings or more complex keyword patterns (they assume a delimiter set such as whitespace)
  - their data structures must be continually updated as new data is received

### DedupSearch

- naive approach
  - open each file and scan its content for the specified keywords (**inefficient due to fragmentation and the resulting random accesses**)
    - a given chunk may be read repeatedly from storage due to deduplication
- main idea
  - begin with a **physical phase** that performs a **physical scan** of the storage system and scans each chunk of data for the keywords
    - read the data sequentially with large I/Os, and read each chunk of data only once
    - record each **exact match** of a keyword, as well as the prefixes or suffixes of the keyword (**partial matches**) found at chunk boundaries
  - then run a **logical phase** that performs a logical scan of the file system by traversing the chunk pointers that make up the files
    - instead of reading the actual data chunks
- challenges
  - most deduplication systems do not maintain "back pointers" from chunks to the files that contain them (addressed by the logical phase)
    - so a keyword match in a chunk cannot be directly associated with the corresponding file
  - keywords might be split **between adjacent chunks** in a file (addressed by recording the partial matches)
    - record the prefixes of the keyword that appear at the end of a chunk and the suffixes that appear at the beginning of a chunk

- string-matching algorithm
  - uses the Aho-Corasick string-matching algorithm
    - a trie-based algorithm for matching multiple strings in a single scan of the input
  - constructs a trie for the **reversed** dictionary to identify suffixes at the beginning of a chunk
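The physical phase's boundary handling can be illustrated with a simplified scanner (mine, not the authors'): plain substring search stands in for Aho-Corasick, and a single keyword stands in for the dictionary. Per chunk it records the exact match offsets plus the longest keyword prefix ending the chunk and the longest keyword suffix starting it.

```python
# Per-chunk scan: exact matches plus partial matches at the chunk boundaries.
def scan_chunk(chunk: bytes, kw: bytes) -> dict:
    """Returns exact match offsets, the length of the longest proper prefix
    of kw that ends the chunk, and of the longest proper suffix of kw that
    starts the chunk (both 0 if absent)."""
    exact, start = [], 0
    while (i := chunk.find(kw, start)) != -1:
        exact.append(i)
        start = i + 1
    prefix = max((k for k in range(1, len(kw)) if chunk.endswith(kw[:k])),
                 default=0)
    suffix = max((k for k in range(1, len(kw)) if chunk.startswith(kw[-k:])),
                 default=0)
    return {"exact": exact, "prefix": prefix, "suffix": suffix}
```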
- match result database
  - exact matches
    - chunk-result record
    - location-list record: only if the chunk contains more than one exact match
    - long location-list record
  - tiny substrings
    - keywords that begin or end with frequent letters in the alphabet might result in the allocation of numerous chunk-result records
    - tiny-result record
      - only if the chunk contains neither an exact match nor a partial match
  - database organization
    - in-memory database: the chunk-result index and the location-list index
    - disk-based hash table: the tiny-result index
- generation of the full search results
  - for each file in the system, the **file recipe** is read, and the fingerprints of its chunks are used to look up the result records in the database
    - collecting the exact matches and combining the partial matches for each fingerprint
  - the logical phase can be parallelized to some extent
    - separate backups or files can be processed in parallel

### Implementation and Evaluation

- implementation
  - based on Destor: three restore threads
  - uses Destor to ingest all the data
- evaluation
  - traces
    - Wikipedia backups, Linux kernel versions, and Web server VM backups
    - Linux versions ordered by version, major version, minor version, and patch
    - Wikipedia backups: archived twice a month since 2017; each snapshot is 1GiB and consists of a single archive file
  - experiments
    - DedupSearch performance
      - effect of the deduplication ratio, chunk size, dictionary size, and keywords in the dictionary
    - DedupSearch data structures
      - index sizes, database accesses
    - DedupSearch overheads
      - physical phase, logical phase

## 2. Strength (Contributions of the paper)

- very strong experiments
- addresses the string search issue from the deduplication aspect (a new direction)
  - no previous work targets this issue

## 3. Weakness (Limitations of the paper)

- the scenario is limited
  - it is more appropriate when queries are **infrequent** and moderate latency is acceptable, such as in legal discovery
- the main idea is very similar to DeduplicationGC-FAST'17 and GoSeed-FAST'20
  - process the **post-deduplication data** **sequentially**, along with an analysis phase **on the file recipes**

- it lacks support for wildcards
  - since its prefix/suffix approach already incurs high overhead, supporting wildcards would be even more challenging
  - it would require attempting to match the chunk content starting at all possible offsets within the keyword

## 4. Some Insights (Future work)

- the concept is from **near-storage processing**
  - the storage system supports certain computations to **reduce I/O traffic and memory usage**

- the restore process it considers
  - parse the file recipe
  - look up the chunk locations in the fingerprint index
  - read their containers
\ No newline at end of file
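A matching sketch of the logical phase (again my own simplification, complementing the chunk scanner above): walk the file recipe, look up each fingerprint's scan result, and stitch a prefix recorded at one boundary to the complementary suffix at the next. Keeping only the longest prefix/suffix per chunk can miss rare overlapping corner cases that the paper's more complete records handle.

```python
# Logical phase: combine per-chunk results along a file recipe.
def search_file(recipe, results, kw_len):
    """recipe: [(fingerprint, chunk_size), ...] as stored in the file recipe;
    results: {fingerprint: scan result dict} produced by the physical phase.
    Returns the keyword match offsets within the file's logical byte stream."""
    hits, offset, pending = [], 0, 0   # pending = prefix length at last boundary
    for fp, size in recipe:
        r = results[fp]
        hits += [offset + i for i in r["exact"]]
        if pending and r["suffix"] == kw_len - pending:
            hits.append(offset - pending)    # keyword spans the chunk boundary
        pending = r["prefix"]
        offset += size
    return hits
```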
diff --git a/StoragePaperNote/Deduplication/Secure-Dedup/DUPEFS-FAST'22.md b/StoragePaperNote/Deduplication/Secure-Dedup/DUPEFS-FAST'22.md
new file mode 100644
index 0000000..2a8771b
--- /dev/null
+++ b/StoragePaperNote/Deduplication/Secure-Dedup/DUPEFS-FAST'22.md
@@ -0,0 +1,96 @@
---
typora-copy-images-to: ../paper_figure
---
DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels
------------------------------------------
| Venue | Category |
| :------------------------: | :------------------: |
| FAST'22 | Secure Deduplication |
[TOC]

## 1. Summary
### Motivation of this paper

- motivation
  - the deduplication implementation in today's advanced filesystems, such as ZFS and Btrfs, yields **timing side channels** that can reveal whether a chunk of data has been deduplicated
  - this paper explores the security risks of filesystem deduplication
- main goal
  - use carefully-crafted read/write operations to show that exploitation is not only feasible, but that the signal can be amplified to mount **byte-granular attacks over the network**
  - the main differences from previous secure deduplication work (on memory deduplication):
    - filesystem operations tend to be **asynchronous** for efficiency
    - the granularity of filesystem deduplication is large (often as large as 128 KiB)

### DUPEFS

- threat model
  - an attacker who has direct or indirect (possibly remote) access to the same filesystem as a victim, where the filesystem performs inline deduplication
    - local: using low-level system calls such as write(), read(), sync(), fsync()
    - remote: interacts with the filesystem through a program that is not under the attacker's control
      - e.g., a server program
- challenges
  - **performance**: the I/O operations are mostly asynchronous to hide latency
    - filesystem data caching complicates the construction of a timing attack
  - **reliability**: even if data is deduplicated, the metadata still needs to be written to disk, which interferes with the timing channel
  - **capacity**: modern filesystems deduplicate only across many blocks that are either temporally or spatially close to each other, clustered together in a deduplication record
    - this increases the entropy of any target secret deduplication record
- data fingerprinting
  - relies on the general timed read/write primitive to **reveal the presence of existing, known but inaccessible** data
- data exfiltration
  - allows two colluding parties with direct/indirect access to the same system to communicate over a stealthy covert channel
- data leak
  - alignment probing
    - stretch controlled data to fill the deduplication record minus one or more bytes of the secret data
    - ![image-20220316134336877](../paper_figure/image-20220316134336877.png)
  - secret spraying
    - generates a stronger signal over LAN/WAN
    - spray candidate secret values over many deduplication records and issue many writes for the corresponding guesses
- attack primitives
  - ![image-20220316134407739](../paper_figure/image-20220316134407739.png)
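In the spirit of the attack primitives above, a timed-write probe might look like the following sketch. It is my own illustration with hypothetical paths and parameters: it ignores the caching, asynchrony, and metadata-noise issues that DUPEFS spends most of its effort defeating, so treat it as showing the shape of the signal only.

```python
# Timed-write probe: on an inline-deduplicating filesystem, writing content
# that already exists tends to complete faster (metadata-only update).
import os
import statistics
import time

def timed_write(path: str, offset: int, buf: bytes) -> float:
    """Time one write+fsync at the given offset, in seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        os.pwrite(fd, buf, offset)
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)

def spray_probe(path: str, guesses, record_size: int, trials: int = 9):
    """Median timing per candidate secret; the fastest candidate is the most
    likely to already exist on the filesystem (cf. secret spraying above).
    A real attack would use fresh offsets per trial and defeat the caches."""
    timing = {}
    for n, g in enumerate(guesses):
        record = g.ljust(record_size, b"\x00")   # pad guess to one dedup record
        t = [timed_write(path, n * record_size, record) for _ in range(trials)]
        timing[g] = statistics.median(t)
    return min(timing, key=timing.get)
```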
- mitigation
  - using pseudo-same-behavior
    - write path
      - even for duplicate data, still overwrite the existing on-disk data
      - slow down the deduplicated write path
    - read path
      - introduce time jitter on the read path
    - enforce pseudo-same-behavior for disk access patterns

### Implementation and Evaluation

- evaluation
  - on FreeBSD for ZFS, and on Linux for Btrfs
  - attack effectiveness
    - success rate
    - attack time
    - I/O
  - data fingerprinting, data exfiltration, data leak

## 2. Strength (Contributions of the paper)

- analyzes filesystem deduplication side channels and differentiates them from previous work (asynchronous disk accesses and large deduplication granularities)
  - the attacker can mount byte-level data leak attacks across the network
- proposes some lightweight mitigations for such attacks

## 3. Weakness (Limitations of the paper)

- the remote attack is based on a specific browser implementation, which is not very general
- the mitigation approach is practical but cannot completely eradicate the signal

## 4. Some Insights (Future work)

- SHA-256 vs. faster hashing
  - deduplication can also rely on faster hash functions that are not collision-resistant (such as **fletcher4**)
    - since hashing may incur collisions, some implementations include an additional step to verify that the data inside the matching deduplication records is identical

- deduplication granularity in filesystem deduplication
  - filesystems perform deduplication at a granularity that is **a multiple of the data block size**
    - a sufficient number of data blocks must be written to the filesystem to reach the deduplication record size
- the timed write primitive
  - exploits the **timing difference** between handling unique data and duplicate data
    - processing duplicate data is cheaper (it only updates the metadata)
  - allows an attacker to learn whether certain data is present on the filesystem during a write operation
- the timed read primitive
  - deduplicated data shared by different files ends up in distinct physical memory pages
    - because the page cache (in Linux) operates at the file level
  - if a block of a file becomes deduplicated, its physical location on disk **differs from that of its surrounding blocks**
\ No newline at end of file
diff --git a/StoragePaperNote/template.md b/StoragePaperNote/template.md
index 2765d67..a0745df 100644
--- a/StoragePaperNote/template.md
+++ b/StoragePaperNote/template.md
@@ -1,8 +1,8 @@
 ---
 typora-copy-images-to: ../paper_figure
 ---
-Redesigning LSMs for Nonvolatile Memory with NoveLSM
-------------------------------------------
+# To Zip or Not to Zip: Effective Resource Usage for Real-Time Compression
+
 | Venue | Category |
 | :------------------------: | :------------------: |
 | ATC'18 | LSM+PM |
@@ -20,4 +20,3 @@
 ## 3. Weakness (Limitations of the paper)
 ## 4. Some Insights (Future work)
-
diff --git a/paper_figure/image-20220316134336877.png b/paper_figure/image-20220316134336877.png
new file mode 100644
index 0000000..fc46ef1
Binary files /dev/null and b/paper_figure/image-20220316134336877.png differ
diff --git a/paper_figure/image-20220316134407739.png b/paper_figure/image-20220316134407739.png
new file mode 100644
index 0000000..d3ab1ab
Binary files /dev/null and b/paper_figure/image-20220316134407739.png differ
diff --git a/paper_figure/image-20220526171832531.png b/paper_figure/image-20220526171832531.png
new file mode 100644
index 0000000..7dfb4c3
Binary files /dev/null and b/paper_figure/image-20220526171832531.png differ