Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geneva: Evolving Censorship Evasion Strategies (CCS 19) #23

Open
wkrp opened this issue Jan 6, 2020 · 3 comments
Open

Geneva: Evolving Censorship Evasion Strategies (CCS 19) #23

wkrp opened this issue Jan 6, 2020 · 3 comments
Labels
China India Kazakhstan reading group summaries and discussions of research papers and other publications

Comments

@wkrp
Copy link
Member

wkrp commented Jan 6, 2020

Geneva: Evolving Censorship Evasion Strategies
Kevin Bock, George Hughey, Xiao Qiang, Dave Levin
https://censorbib.nymity.ch/#Bock2019a

Geneva is a genetic algorithm that automatically discovers censorship evasion strategies by combining primitive operations in various ways and evaluating the combinations against a network censor (real or simulated). The strategies it discovers are packet-level manipulations like those of Khattak et al. 2013, lib·erate, and INTANG—things like sending overlapping segments or dropping certain packets. In fact, Geneva automatically rediscovers most of the evasions that prior work had found manually, as well as new and updated ones that manual analysis probably would not have found. They train and evaluate Geneva in the lab aginst simulated censors, and in the wild against real censors in China, India, and Kazakhstan.

An evasion strategy consists of paired triggers and actions. A trigger is a predicate over packets; for example [TCP:flags:R] matches TCP RST packets. An action is an operation on a single packet: like duplicate, which makes a copy of a packet and allows you to modify the original and the copy separately; fragment, which breaks a packet or segment into two parts and likewise permits further processing on both parts; and tamper, which sets a field in the packet to a static or a random value, while updating dependent fields such as the checksum. Whenever a trigger is true, it causes its associated action to happen. Actions may recursively invoke other actions, forming a tree structure: for example duplicate has two action subtrees that say what to do with the each of the two copies. At the leaves of the tree appear the special terminal actions send and drop. There are separate lists of triggers and actions in the incoming and outgoing directions.

A sample strategy is:

[TCP:flags:A]-duplicate(
    send,
    tamper{TCP:flags:replace:R}(
        tamper{TCP:chksum:corrupt}(
            send
        )
    )
)-|
\/

The trigger [TCP:flags:A] matches outgoing ACK packets. duplicate makes two copies of the packet. The first one is sent unmodified, but the second is changed into a RST packet with a bad checksum before being sent. The \/ separates the outgoing and incoming trigger–action lists; in this case the incoming section is empty. This strategy, currently effective against the GFW, tricks the middlebox into thinking the connection has been terminated because of the RST packet it sends, but the RST actually has no effect on the remote server because of its incorrect checksum. There's a catalog of strategies in Table 1 and at https://github.com/Kkevsterrr/geneva/blob/master/strategies.md.

Geneva starts with a population of strategies that may be initialized randomly or seeded with known strategies. Individuals in the population undergo mutation (random changes to their actions) and crossover (swapping subtrees of actions with another individual). The fitness of an individual is primarily determined by its effectiveness against the censor—Geneva tries it out to see if it works—with penalties for large action trees or high network overhead. High-fitness individuals are more likely to be selected and survive into the next generation. The authors report that computing each new generation takes 5–10 minutes, and full training 4–8 hours.

The bulk of the authors' validation was in China against the Great Firewall. Geneva finds a number of strategies that confuse the firewall's notion of what the correct TCP sequence number is or whether the connection has been closed. It also finds a few weird and unexpected strategies that seem to expose previously unknown and subtle characteristics of the GFW's classification algorithms. Take for example Strategy 7 in Section 5.2: splitting a TCP segment at offset 8 doesn't work, but splitting it at offsets 8 and 12 does—even when the censored keyword is not split across segments. They additionally tested in India (on the Airtel ISP) and Kazakhstan (during the time when the TLS MITM was still happening), where Geneva found effective strategies that were comparatively simpler than the China ones.

There's a project home page:

As of this writing, the genetic training algorithm is not yet available to download, but there is source code for the client-side software that implements pre-trained strategies.

@wkrp wkrp added reading group summaries and discussions of research papers and other publications China Kazakhstan India labels Jan 6, 2020
@wkrp
Copy link
Member Author

wkrp commented Jul 2, 2020

Geneva was the topic of the 2020-06-25 Tor anti-censorship reading group.

  • Geneva could be interesting as a client-side method of reaching blocked Tor bridges. To do that, however, it would have to run on a variety of operating systems, including mobile operating systems, which could be difficult because of the low-level packet manipulations it needs to do.
    • The example of GoodbyeDPI, which has users, shows that the deployment of a system based on packet manipulation can be successful, however.
  • Why is the source code for the genetic algorithm not yet available?
  • Could the technique be applied to server-side evasion? It would be great to have something that a bridge operator could run, not requiring as much portability. (The Internet Measurement Village talk on Geneva discusses server-side evasion.)

@wkrp
Copy link
Member Author

wkrp commented Jul 22, 2020

  • Why is the source code for the genetic algorithm not yet available?

@Kkevsterrr informs me that the genetic algorithm component is now available:

@Kkevsterrr
Copy link

Hi all - I wanted to share an update in response to another question that was asked earlier:

Could the technique be applied to server-side evasion?

The answer is yes, and we've written a second paper on extending Geneva to the server-side, the challenges of server-side packet manipulation evasion, and the strategies Geneva was able to find. Our paper was recently accepted to SIGCOMM, and is available here: https://geneva.cs.umd.edu/papers/come-as-you-are.pdf . We focused specifically on ‘vanilla’/non-evasion protocols (HTTP, HTTPS, FTP, SMTP, and DNS), but in theory Geneva could be applied to identify strategies to reach/protect bridges as well. Outside the scope of that paper, we’ve done a little exploration of protecting Shadowsocks in the past with success. All of the server-side code is included in the code release mentioned by @wkrp above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
China India Kazakhstan reading group summaries and discussions of research papers and other publications
Projects
None yet
Development

No branches or pull requests

2 participants