Skip to content
Toby Dylan Hocking edited this page Mar 23, 2020 · 1 revision

Background

Regular expressions are powerful tools for parsing loosely structured text data. Russ Cox’s ”Regular Expression Matching Can Be Simple And Fast” explains that due to backreference support, several common regular expression engines have a time complexity which is exponential in pattern size (including PCRE which is used by R). One way to achieve polynomial time complexity is to drop backreference support and use the re2 C++ library.

Related work

The figures below show timings of these libraries for matching the pattern a?a?a?aaa against the subject aaa (N=3), for several different values of N. Source.

Coding project: re2r back on CRAN

R package re2r was implemented by Qin Wenfeng in R-GSOC’16 but was archived (removed from CRAN) recently, https://cloud.r-project.org/web/packages/re2r/

TODO MORE DETAILS

The goal of this project is to get re2r back up on CRAN. That involves fixing the issues discussed here, https://github.com/qinwf/re2r/issues/22

Fixing some of the other open issues would be good as well, https://github.com/qinwf/re2r/issues

Expected impact

The re2 package provides a fast and secure way to extract data from text files using regex. More specifically, the polynomial time complexity of the re2 package will be useful for writing secure web applications that accept user-provided regular expressions. For example, say that you write a shiny app that accepts a user-provided regex. If R passes this regex to existing regex libraries such as PCRE or ICU, then this shiny app is vulnerable to a denial-of-service attack. If the user’s regex is malicious, then it could cause the R on the shiny server to stay busy computing a match for a long time (worst case exponential time complexity). In contrast, the re2 library has a worst case polynomial time complexity so would not suffer from such attacks.

Mentors

Please get in touch with Toby Dylan Hocking <[email protected]> and Qin Wenfeng <[email protected]> after completing at least one of the tests below.

Tests

Do one or several — doing more hard tests makes you more likely to be selected.

Solutions of tests

Students, please post a link to your test results here.

Clone this wiki locally