---
title: Research
layout: default
permalink: /research/
---

Research

Here we loosely organize our research efforts by subject. If you can't find something or think we've forgotten anything, make sure you also consult the publications list, and consider contacting us directly or filing an issue with this page at https://github.com/squaresLab/squaresLab.github.io.

We try to be reasonably thorough, but note that this page does not attempt to reference every paper squaresLab has ever published; see publications for a more complete list.

Program Transformation and Repair

Repairing errors in software systems has historically been a resource-intensive task that relies heavily on manual developer effort. We have created and analyzed automatic program repair approaches to improve both their performance and the quality of the patches they produce. We also develop approaches to automate other types of transformation, including API updates and migration. This work has produced frameworks and toolsets for automatic program transformation and repair, as well as datasets for their evaluation.

{% capture transform_text %}

We have developed a number of frameworks and toolsets for automatic program transformation and repair.

  • PolyglotPiranha: an expressive and polyglot code transformation toolset designed for automating large-scale refactorings
  • Comby: a lightweight, declarative way to change code across many languages. (Watch the StrangeLoop talk!)
  • Darjeeling: a framework for language-agnostic search-based/heuristic program repair. If you want something GenProg-like but significantly more modern, this is your cup of tea.
  • GenProg4Java: a framework for heuristic program repair for Java programs; a generally faithful reproduction of the original GenProg4C technique (below).
  • GenProg: stochastic search methods like genetic programming, combined with lightweight program analyses, to find patches for bugs in extant software. The GenProg website covers most GenProg-related research and prior results. If you want to run new experiments, you'll want to start with Darjeeling and/or BugZoo. This stuff is old. {% endcapture %}

{% capture semantic_text %}

Related Publications:

{% bibliography --query @*[project~=static-repair] %}

{% endcapture %}

{% capture eval_text %} We have developed frameworks and datasets for evaluating program repair, and conducted empirical evaluations of repair along a number of axes. Datasets include:

  • PreciseBugCollector: second place at ASE Challenge 2023! An extremely large dataset of security vulnerabilities, with associated metadata.
  • ArduBugs: a dataset of robotics-specific bugs in the Ardu* ecosystem.
  • BugZoo: an active effort to support controlled experiments on buggy C programs, particularly for program repair; it supports the reproduction, in a modern environment, of a number of scenarios from existing datasets, including ManyBugs.
  • ManyBugs and IntroClass: benchmarks and results intended to support evaluations of program repair research. We recommend BugZoo for new ManyBugs experiments.

Related Publications: {% bibliography --query @*[project~=benchmarks] %}

{% endcapture %}

{% capture genprog_text %} GenProg combines stochastic search methods like genetic programming with lightweight program analyses to find patches for real bugs in extant software. The GenProg website covers most GenProg-related research, with links to the various GitHub repositories, results, and reproduction instructions, as well as a historical list of largely GenProg-specific papers (through about 2016).

Papers analyzing or evaluating specific components of GP-based repair are (additionally) listed under search-based software engineering.
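
As a rough illustration of the general idea, and not GenProg's actual implementation, the sketch below runs a toy search-based repair loop in Python. Everything in it is hypothetical: the three-statement program, the test suite, and the helper names (`run`, `fitness`, `mutate`, `repair`). It assumes that candidate patches only delete, insert, or replace statements using code already present in the program, and that fitness is the number of passing tests. Crossover and fault localization are omitted for brevity.

```python
import random

# Hypothetical three-statement program meant to compute max(a, b).
ORIGINAL = [
    "result = a",
    "if b > a: result = b",
    "result = a",  # injected bug: a stray reset clobbers the answer
]

# Test suite as ((a, b), expected) pairs; the buggy program fails the first.
TESTS = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]


def run(program, a, b):
    """Execute a statement list and return `result`, or None on any error."""
    env = {"a": a, "b": b}
    try:
        exec("\n".join(program), {}, env)
    except Exception:
        return None
    return env.get("result")


def fitness(program):
    """Number of test cases the variant passes."""
    return sum(run(program, *args) == expected for args, expected in TESTS)


def mutate(program, rng):
    """Return a copy with one random edit, reusing statements from ORIGINAL."""
    variant = list(program)
    op = rng.choice(["delete", "insert", "replace"])
    donor = rng.choice(ORIGINAL)
    if op == "delete" and variant:
        del variant[rng.randrange(len(variant))]
    elif op == "insert":
        variant.insert(rng.randrange(len(variant) + 1), donor)
    elif op == "replace" and variant:
        variant[rng.randrange(len(variant))] = donor
    return variant


def repair(seed=0, pop_size=20, generations=200):
    """Search for a variant that passes every test; None if the budget runs out."""
    rng = random.Random(seed)
    population = [mutate(ORIGINAL, rng) for _ in range(pop_size)]
    for _ in range(generations):
        for variant in population:
            if fitness(variant) == len(TESTS):
                return variant
        # Binary tournament selection, followed by one mutation per child.
        parents = [max(rng.sample(population, 2), key=fitness)
                   for _ in range(pop_size)]
        population = [mutate(parent, rng) for parent in parents]
    return None


if __name__ == "__main__":
    patch = repair()
    print("\n".join(patch) if patch is not None else "no patch found")
```

Several different variants pass all three tests here (deleting the stray reset is one of them), so which patch the search returns depends on the seed. As in real repair tools, passing the test suite does not by itself guarantee that a patch is correct.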

{% endcapture %}

{% capture other_transformation %}

Transformation has many uses in developer tooling, and many problems are amenable to a repair-like approach. We have explored transformation for API migration (SOAR) and API upgrades (MELT), transpilation (BatFix), and clone mitigation. We have also used program transformation to improve static analysis and to triage fuzz tests. Beyond triage, we have an ongoing body of work on general-purpose transformation in the context of fuzz and mutation testing:

{% bibliography --query @*[project~=transform-testing] %}

{% endcapture %}

{% capture all_repair %} For completeness, the following publications either (a) give a general overview of repair (CACM articles and the like) or (b) propose new or substantially augmented program repair techniques (the majority). This list omits specific studies of SBSE operators like crossover; see the SBSE section.

{% bibliography --query @*[project~=new-repair] %}

{% endcapture %}

Heuristic transformation and repair generally
{{ transform_text | markdownify }}
Static and semantic repair
{{ semantic_text | markdownify }}
Datasets, experimental frameworks, and evaluations
{{ eval_text | markdownify }}
Transformation beyond repair
{{ other_transformation | markdownify }}
GenProg
{{ genprog_text | markdownify }}
All new repair techniques
{{ all_repair | markdownify }}

Search-based software engineering

{% capture sbse_papers %}

{% bibliography --query @*[project~=sbse] %}

{% endcapture %}

Our interest in applying AI to software engineering started with search-based techniques. Much of our work in this space has been repair-specific, though we have also applied GP and related search-based approaches to self-adaptive systems and to knowledge reuse at the model level. Relevant publications cover both code and models/plans; they exclude papers that propose new APR approaches and contribute to SBSE algorithm design only incidentally:

Search-based Software Engineering publications
{{ sbse_papers | markdownify }}

SE for Robotics

{% capture robot_papers %}

{% bibliography --query @*[project~=robots] %}

{% endcapture %}

Robotics and autonomous systems are becoming increasingly prevalent. These systems present new quality assurance challenges, which we both study and attempt to address via new testing and analysis techniques.

SE for robotics publications
{{ robot_papers | markdownify }}

Decompilation and reverse engineering

{% capture decomp_papers %}

{% bibliography --query @*[project~=decomp] %}

{% endcapture %}

Our work on improving developer experiences by integrating program analysis with AI includes a line of research on reverse engineering, specifically to improve decompilation and decompiler output.

Decompilation and reverse engineering
{{ decomp_papers | markdownify }}

AI and LLMs

{% capture ai_papers %}

{% bibliography --query @*[project~=ai] %}

{% endcapture %}

Our work often leverages advances in AI to build new developer tools and QA approaches that improve testing and program transformation. (This list excludes SBSE-specific work.)

AI and LLMs for SE
{{ ai_papers | markdownify }}

Understanding Develop[ment/er] Practices

To produce tools that are useful to developers, it is important to understand current software development practices. We study developers, along with the software and artifacts they produce, to understand both the current state of software quality and the factors that affect it.

{% capture human_papers %}

{% bibliography --query @*[project~=develop] %}

{% endcapture %}

Understanding SE and QA practices
{{ human_papers | markdownify }}