% !TEX root = thesis.tex
\chapter{Experimental Evaluation}
\label{ch:experiments}
In this chapter we present the experimental studies that we performed for the techniques described in \mychapter\ref{ch:profiling,ch:continuous}, which we implemented in production systems and evaluated against prominent benchmarks.
%typically evaluated against industry-strength benchmarks. For performance metrics, reported figures have been obtained by performing multiple runs in a Linux system with negligible background activity, and we also show confidence intervals stated at $95\%$ confidence level where possible.
In the first part of the chapter, we evaluate our space-efficient inter-procedural technique for context-sensitive profiling. Our analysis takes into account a large collection of popular interactive Linux applications and industrial-strength benchmarks. Results collected for a number of accuracy and space usage metrics reinforce the theoretical prediction that the Hot Calling Context Tree (HCCT) achieves a precision similar to that of the CCT in a space that is several orders of magnitude smaller and roughly proportional to the number of hot contexts. Our implementation is part of a full-fledged infrastructure that we developed for profiling multi-threaded Linux C/C++ applications, and ships as a plugin for the GNU Compiler Collection. We also discuss how we integrated our technique with static bursting, resulting in faster running times without substantially affecting accuracy: we incur a slowdown competitive with the \gprof\ call-graph profiler while collecting finer-grained program profiles.
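To make the space argument concrete, the following C sketch (purely illustrative, and not our actual data layout) shows that a CCT-like structure pays one constant-size node per calling context it stores; the claimed bound then follows from the HCCT keeping nodes essentially only for hot contexts, plus a limited amount of bookkeeping.
\begin{verbatim}
/* Illustrative sketch only: each calling context kept by a CCT-like
 * structure is one constant-size node, so space grows linearly with
 * the number of contexts stored (all distinct contexts for the full
 * CCT, roughly the hot ones for the HCCT). */
struct cct_node {
    void            *call_site;    /* address of the calling instruction */
    void            *routine;      /* address of the invoked routine     */
    unsigned long    counter;      /* occurrences of this context        */
    struct cct_node *first_child;  /* first-child, next-sibling layout   */
    struct cct_node *next_sibling;
};
\end{verbatim}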
In the second part, we discuss an implementation in Jikes RVM of our intra-procedural technique for multi-iteration path profiling. We present a broad experimental study on a large suite of prominent Java benchmarks, showing that our profiler can collect profiles that would have been too costly to gather using previous multi-iteration techniques. The key to the efficiency of our approach is to replace the costly hash table accesses that the Ball-Larus algorithm requires to maintain path counters for larger programs with substantially faster operations on trees. We then study structural properties of path profiles that span multiple iterations for several representative benchmarks, and discuss the memory footprints of the \ksf\ and \kipf\ data structures for increasing values of $k$.
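The following C sketch illustrates this idea under simplifying assumptions (plain linked child lists, no bound on the tracked depth); it is not our actual implementation, but it shows how consuming the stream of Ball-Larus path identifiers reduces to one child lookup and one counter increment per identifier, with no hashing of entire identifier sequences.
\begin{verbatim}
#include <stdlib.h>

/* Illustrative sketch only: counting sequences of Ball-Larus path IDs
 * with a prefix tree instead of hashing whole ID tuples. */
typedef struct sf_node {
    unsigned int    path_id;   /* Ball-Larus path ID labeling this node */
    unsigned long   count;     /* times this ID sequence was observed   */
    struct sf_node *child;     /* first child                           */
    struct sf_node *sibling;   /* next sibling                          */
} sf_node_t;

/* Consume one more path ID emitted by the instrumented code: move from
 * the current node to the child labeled with path_id (creating it if
 * absent) and increment its counter. */
sf_node_t *sf_advance(sf_node_t *cur, unsigned int path_id) {
    sf_node_t *c;
    for (c = cur->child; c != NULL; c = c->sibling)
        if (c->path_id == path_id) break;
    if (c == NULL) {
        c = calloc(1, sizeof(*c));
        c->path_id = path_id;
        c->sibling = cur->child;
        cur->child = c;
    }
    c->count++;
    return c;
}
\end{verbatim}
In this toy version, the labels along a root-to-node walk spell out a sequence of consecutive path IDs, so counters for paths spanning up to $k$ iterations can be read off the first $k$ levels of the forest.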
Finally, we present an extensive experimental evaluation of our on-stack replacement (OSR) techniques for continuous program optimization. We first analyze the performance of \osrkit\ in the \tinyvm\ proof-of-concept virtual machine that we developed in LLVM. Our goal is to address a number of typical concerns of VM builders, measuring, e.g., the impact of having an OSR point in a hot code portion and the actual cost of performing an OSR transition. We then present an LLVM implementation of the techniques for automatically constructing OSR mappings described in \mysection\ref{ss:osr-mapping-algorithms}, evaluating the fraction of program locations where they allow OSR to be fired efficiently in prominent benchmarks. Our experiments suggest that bidirectional OSR transitions between rather different program versions can be supported almost everywhere in the code under several classic optimizations.
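Before diving into the numbers, the following C-level sketch may help fix intuition on the two costs mentioned above; it is purely illustrative (\osrkit\ operates on LLVM IR, and the helper functions below are made up for the example). The steady-state cost of an OSR point is a cheap guard executed in the hot code, while the cost of an OSR transition is paid only when the guard fires and execution resumes in another version of the function, with the live state passed along.
\begin{verbatim}
/* Purely illustrative C rendition of an OSR point; the actual mechanism
 * operates on LLVM IR, and both helpers below are made up. */
extern int  osr_condition(void);              /* e.g., a counter/profile test */
extern long f_continuation(long i, long acc); /* other version of f, resumed
                                                 with the live variables      */

long f(long n) {
    long acc = 0;
    for (long i = 0; i < n; i++) {
        /* OSR point in a hot code portion: one guard per iteration...   */
        if (osr_condition())
            /* ...and, if it fires, an OSR transition: hand the live
               state (i, acc) over to the other version of the function. */
            return f_continuation(i, acc);
        acc += i;
    }
    return acc;
}
\end{verbatim}
In the steady state only the guard executes, which is why we measure its impact on hot code separately from the (much rarer) cost of actually firing a transition.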
\ifdefined \noauthorea
\input{hcct-evaluation}
\input{kblpp-evaluation}
\input{osr-evaluation}
\fi
\section{Conclusions}
The experimental studies presented in this chapter suggest that the ideas from \mychapter\ref{ch:profiling,ch:continuous} can be efficiently implemented in production systems, yielding promising performance results for popular benchmarks. Our performance profiling techniques incur a low run-time overhead that makes them amenable to use in an adaptive optimization system. \osrkit\ can insert OSR points in both optimized and unoptimized code with hardly noticeable overhead. In \mysection\ref{se:CS-matlab} we discuss an example of effective adaptive optimization that it enables. The \buildcomp\ algorithm for automatic compensation code generation allowed OSR to be performed at a very large fraction of program locations in our experiments. \mysection\ref{se:CS-debug} illustrates a case study on the use of the algorithm to improve optimized code debugging.