From d1a62c2188d08d04e6b9126b97f395fd5fde1220 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Thu, 1 Aug 2024 17:15:10 +0200
Subject: [PATCH 1/8] start working on the tracing post

---
 posts/2024/08/state-of-tracing.md | 182 ++++++++++++++++++++++++++++++
 1 file changed, 182 insertions(+)
 create mode 100644 posts/2024/08/state-of-tracing.md

diff --git a/posts/2024/08/state-of-tracing.md b/posts/2024/08/state-of-tracing.md
new file mode 100644
index 000000000..9042016da
--- /dev/null
+++ b/posts/2024/08/state-of-tracing.md
@@ -0,0 +1,182 @@
+<!--
+.. title: The State of Tracing
+.. slug: mining-jit-traces-missing-optimizations-z3
+.. date: 2024-08-01 17:01:09 UTC
+.. tags:
+.. category:
+.. link:
+.. description:
+.. type: text
+.. author: CF Bolz-Tereick
+-->
+
+
+
+"I'm curious what the current state of tracing JITs is. They used to be all the
+rage for a while, then I though I heard they weren't so effective, then I
+haven't heard of them at all. Is the latter because they are ubiquitous, or
+because they proved to not work so well?"
+
+https://twitter.com/ShriramKMurthi/status/1818009884484583459
+
+my opinion on tracing. this is such a complicated question, kind of too large for twitter. here's a thread that should be a blog post, with sections:
+
+
+## Meta-tracing
+
+personal context: been working on pypy since ~20 years. pypy has a meta-JIT,
+which allows it to re-use jit infrastructure for the various Python versions,
+and also for some experimental different languages like Prolog, Racket, also an
+ARM and RISC-V emulator
+
+PyPy gives itself the goal to try to be extremely compatible with all the
+quirks of the Python language. so changing the language to make things easier
+to compile is a no no. we try hard to have no opinions on language design, they
+come up with the semantics, we somehow deal.
+
+PyPy started using a tracing JIT approach *not* because we thought method jits
+are bad. but because we had failed to do a method-based meta-JIT that was using
+partial evaluation (we wrote three or four method-based prototypes that all
+weren't as good as we hoped).
+
+In the meta-JIT context tracing is nice, because tracing has relatively
+understandable behavior and its easy(ish) to tweak how things work with extra
+annotations in the interpreter source.
+
+meta-tracing often works well for us/pypy. It can often slice through the
+complicated layers of Python quite effectively and remove a lot of overhead
+(Python is more complicated than JS, imo. it's big and complex and growing)
+
+### Truffle
+
+later truffle came along and made a method-based meta-JIT using partial
+evaluation work, but with a lot more people/resources and at first requiring a
+quite specific style of interpreters
+
+it's still my impression that getting similar results with truffle is a lot
+more work than with rpython and the warmup of truffle can often pretty bad. but
+both are questions more for @smarr again
+
+## Tracing, the good
+
+the aggressive partial inlining of tracing, following just the hot path through
+lots of layers of abstraction, is obviously often really useful for generating
+fast code
+
+it should be possible to achieve the same effect in a method-based context with
+path splitting. but it's not trivial, because the path execution counts of
+inlined functions can often be very call-site dependent, and tracing gives you
+call-site dependent path splitting
+
+(the aggressive partial inlining and path splitting is even more important in
+the meta-tracing context of pypy, where some of layers are part of the runtime,
+and where  rare corner cases are basically absolutely everywhere)
+
+tracing makes a whole bunch of optimizations really easy to implement, because
+there are (to first approximation) no control flow merges. This allows us to do
+optimizations in exactly one forwards and one backwards pass. Eg our allocation
+removal/partial escape analysis is simple
+
+in a tracing jit it can therefore be quite easy to get some pretty decent
+optimizations. Our optimization of temporary allocations, the way we can reason
+about the heap, about dictionary accesses, about properties of functions of the
+runtime is all quite decent
+
+
+## Tracing, the bad
+
+downsides of tracing: in my experience it tends to have big performance cliffs.
+The 'good' cases are really good, but if something goes wrong you are annoyed
+and performance can become a lot slower. with a simple method jit perf is more
+"even"
+
+there are a bunch of strange corner cases that tracing has (when do you stop
+inlining, what about tracing recursion, what happens if your traces are too
+long, stuff like that)
+
+I agree with this too (and @samth and I have discussed it a few times when
+working on Pycket): if you trace the bytecode dispatch loop of a bytecode
+interpreter (or other interpreter-like control flow), you will get not great
+results
+
+https://twitter.com/pnguyen0112/status/1818100321652199456
+
+this is because the core assumption of the tracing jit "loops take similar
+control flow paths" is just really wrong in the case of interpreters
+
+
+this is because the core assumption of the tracing jit "loops take similar
+control flow paths" is just really wrong in the case of interpreters
+
+
+
+## Discussion
+
+"This is a really great summary. Meta-tracing is probably the one biggest
+success story. I think it has to do with how big and branchy the bytecode
+implementations are for typical dynamic languages; the trace captures latent
+type feedback naturally.
+
+There is an upper limit, tho."
+
+https://twitter.com/TitzerBL/status/1818385622203298265
+
+Exactly, the complexity of py bytecodes is a big factor for why meta tracing
+works well for us. But also in python there are many builtin types (collection
+types, types that form the mop, stdlib modules implemented in C/rpython) and
+tracing operations on them is important too
+
+Stefan Marr:
+"I think Mozilla had a blog post talking more about the difficulty with
+TraceMonkey, could only find this one:
+https://blog.mozilla.org/nnethercote/category/jagermonkey/"
+https://twitter.com/smarr/status/1818600052752797990
+
+imo doing tracing for JS is really hard mode, because the browser is so
+incredibly warmup-sensitive. IIRC tracemonkey used a really low loop trip count
+(single-digit?) to decide when to start tracing (pypy uses >1000). the JS
+interpreters of the time were also quite slow.
+
+Max Bernstein:
+"What about basic block versioning?"
+https://twitter.com/tekknolagi/status/1818368411157905482
+
+It's another point in the phase space ;-). I like it a lot, and maybe it could
+be pushed really far to give the best of both cfg-based and tracing approaches.
+I'd be curious to see a BBV-based meta-JIT (but unfortunately writing meta-JITs
+is super expensive in terms of time).
+
+Maxime Chevalier:
+"There are a number of corner cases you have to deal with in a tracing JIT. It's
+unfortunately not as simple and easy as the initial papers would have you
+believe. One example is how would you deal with a loop inside a loop? Is your
+tracing now recursive?
+
+There's been some research work on trace stitching to deal with trace explosion
+but it does add complexity. With the increase in complexity, I think most
+industrial VM developers would rather pick tried-and-true method-based JITs
+that are well understood."
+
+https://twitter.com/Love2Code/status/1818292516753383644
+
+## Conclusion
+
+In a non-meta-jit it's very unclear to me that you should use tracing. Rather
+spend effort on a solid cfg-based baseline and then try to get some of the good
+properties of tracing on top (path splitting, partial inlining, etc)
+
+in the meta-jit of pypy context I still think it's a relatively pragmatic
+choice, and in the cases where it works well the performance of pypy is quite
+hard to beat (particularly with the constraint of not being "allowed" to change
+the language)
+
+this is all purely based on the data point of a single project, of course,
+albeit one that has implemented a whole bunch of different languages. please
+everyone tell me if you disagree with me.
+
+a side point: nobody in the current thread did this, but people who haven't
+worked on python tend to underestimate its complexity. A pet peeve of mine is
+C++ compiler devs/static analysis people/other well-meaning communities coming
+with statements like "why don't you just..."  🤷‍♀️
+
+

From 09aadff88ed37eee2299cdc4d515f847af1a9c65 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Fri, 23 Aug 2024 16:04:22 +0200
Subject: [PATCH 2/8] prosify

---
 posts/2024/08/state-of-tracing.md | 226 ++++++++++++++++--------------
 1 file changed, 124 insertions(+), 102 deletions(-)

diff --git a/posts/2024/08/state-of-tracing.md b/posts/2024/08/state-of-tracing.md
index 9042016da..617fa0207 100644
--- a/posts/2024/08/state-of-tracing.md
+++ b/posts/2024/08/state-of-tracing.md
@@ -10,143 +10,167 @@
 .. author: CF Bolz-Tereick
 -->
 
-
+A few weeks ago, [Shriram Krishnamurthi](https://cs.brown.edu/~sk/) [asked on
+Twitter](https://twitter.com/ShriramKMurthi/status/1818009884484583459):
 
 "I'm curious what the current state of tracing JITs is. They used to be all the
 rage for a while, then I though I heard they weren't so effective, then I
 haven't heard of them at all. Is the latter because they are ubiquitous, or
 because they proved to not work so well?"
 
-https://twitter.com/ShriramKMurthi/status/1818009884484583459
-
-my opinion on tracing. this is such a complicated question, kind of too large for twitter. here's a thread that should be a blog post, with sections:
+I replied with my personal (partly pretty subjective) opinions about the
+question in a lengthy Twitter thread (which also spawned an even lengthier
+discussion). I wanted to turn what I wrote there into a blog post to make it
+more widely available. The blog post is still somewhat terse, but I've tried to at
+least add links to further information. Please ask in the comments if something
+is particularly unclear.
 
 
 ## Meta-tracing
 
-personal context: been working on pypy since ~20 years. pypy has a meta-JIT,
-which allows it to re-use jit infrastructure for the various Python versions,
-and also for some experimental different languages like Prolog, Racket, also an
-ARM and RISC-V emulator
+First some personal context: my perspective is informed by nearly two decades
+of work on PyPy. PyPy's implementation language, RPython, has support a
+meta-JIT, which allows it to re-use its JIT infrastructure for the various
+Python versions that we support (currently we do releases of PyPy2.7 and
+PyPy3.10 together). We have also used the meta-JIT infrastructure for some
+experimental different languages like Prolog, Racket, a database (those
+implementations had various degrees of maturity and most of them are research
+software and aren't maintained any more), but also some more surprising things
+like an ARM and RISC-V emulator.
 
 PyPy gives itself the goal to try to be extremely compatible with all the
-quirks of the Python language. so changing the language to make things easier
-to compile is a no no. we try hard to have no opinions on language design, they
-come up with the semantics, we somehow deal.
-
-PyPy started using a tracing JIT approach *not* because we thought method jits
-are bad. but because we had failed to do a method-based meta-JIT that was using
-partial evaluation (we wrote three or four method-based prototypes that all
-weren't as good as we hoped).
-
-In the meta-JIT context tracing is nice, because tracing has relatively
-understandable behavior and its easy(ish) to tweak how things work with extra
-annotations in the interpreter source.
+quirks of the Python language. We don't change the Python language to make
+things easier to compile. We try very hard to have no opinions on language
+design. The CPython core developers come up with the semantics, we somehow deal
+with them.
+
+PyPy started using a tracing JIT approach *not* because we thought method-based
+just-in-time compilers are bad. Historically we had tried to implemend a
+method-based meta-JIT that was partial evaluation (we wrote three or four
+method-based prototypes that all weren't as good as we hoped). After all those
+experiments failed we switched to the tracing approach, and only at this point
+did our meta-JIT start producing interesting performance.
+
+In the meta-JIT context tracing has good properties, because tracing has
+relatively understandable behavior and it's easy(ish) to tweak how things work
+with extra annotations in the interpreter source.
+
+Another reason why meta-tracing often works well for PyPy is that it can often
+slice through the complicated layers of Python quite effectively and remove a
+lot of overhead. Python is often described as simple, but I think that's
+actually a misconception. On the implementation level it's a very big and
+complicated language, and it is also continuously getting new features every
+year (the language is quite a bit more complicated than Javascript, for
+example).
 
-meta-tracing often works well for us/pypy. It can often slice through the
-complicated layers of Python quite effectively and remove a lot of overhead
-(Python is more complicated than JS, imo. it's big and complex and growing)
 
 ### Truffle
 
-later truffle came along and made a method-based meta-JIT using partial
-evaluation work, but with a lot more people/resources and at first requiring a
-quite specific style of interpreters
-
-it's still my impression that getting similar results with truffle is a lot
-more work than with rpython and the warmup of truffle can often pretty bad. but
-both are questions more for @smarr again
-
-## Tracing, the good
-
-the aggressive partial inlining of tracing, following just the hot path through
-lots of layers of abstraction, is obviously often really useful for generating
-fast code
+Later Truffle came along and made a method-based meta-JIT using partial
+evaluation work. However Truffle (and Graal) has had significantly more people
+working on it and much more money invested. In addition, it at first required a
+quite specific style of AST-based interpreters (in the last few years they have
+also started supporting bytecode-based interpreters).
 
-it should be possible to achieve the same effect in a method-based context with
-path splitting. but it's not trivial, because the path execution counts of
-inlined functions can often be very call-site dependent, and tracing gives you
-call-site dependent path splitting
+It's still my impression that getting similar results with Truffle is a lot
+more work for language implementers than with RPython, and the warmup of
+Truffle can often be pretty bad. But Truffle is definitely an existence proof that
+meta-JITs don't *have* to be based on tracing.
 
-(the aggressive partial inlining and path splitting is even more important in
-the meta-tracing context of pypy, where some of layers are part of the runtime,
-and where  rare corner cases are basically absolutely everywhere)
 
-tracing makes a whole bunch of optimizations really easy to implement, because
-there are (to first approximation) no control flow merges. This allows us to do
-optimizations in exactly one forwards and one backwards pass. Eg our allocation
-removal/partial escape analysis is simple
+## Tracing, the good
 
-in a tracing jit it can therefore be quite easy to get some pretty decent
-optimizations. Our optimization of temporary allocations, the way we can reason
-about the heap, about dictionary accesses, about properties of functions of the
-runtime is all quite decent
+Let's now discuss some of the advantages of tracing that go beyond the ease of
+using tracing for a meta-JIT.
 
+Tracing allows for doing very aggressive partial inlining, following just the
+hot path through lots of layers of abstraction, is obviously often really
+useful for generating fast code
 
-## Tracing, the bad
+It's definitely possible to achieve the same effect in a method-based context
+with path splitting. But it requires a lot more implementation work and is not
+trivial, because the path execution counts of inlined functions can often be
+very call-site dependent, and tracing gives you call-site dependent path
+splitting "for free".
 
-downsides of tracing: in my experience it tends to have big performance cliffs.
-The 'good' cases are really good, but if something goes wrong you are annoyed
-and performance can become a lot slower. with a simple method jit perf is more
-"even"
+(The aggressive partial inlining and path splitting is even more important in
+the meta-tracing context of PyPy, where some of the inlined layers are part of
+the language runtime, and where rare corner cases that are never executed in
+practice are basically absolutely everywhere.)
 
-there are a bunch of strange corner cases that tracing has (when do you stop
-inlining, what about tracing recursion, what happens if your traces are too
-long, stuff like that)
+Another advantage of tracing is that it makes a whole bunch of optimizations
+really easy to implement, because there are (to first approximation) no control
+flow merges. This makes all the optimizations that we do (super-)local
+optimizations that operate on a single (very long) basic block. This allows the JIT
+to do the optimizations in exactly one forwards and one backwards pass. E.g. our
+allocation removal/partial escape analysis is simple.
 
-I agree with this too (and @samth and I have discussed it a few times when
-working on Pycket): if you trace the bytecode dispatch loop of a bytecode
-interpreter (or other interpreter-like control flow), you will get not great
-results
+This ease of implementation of optimizations allowed us to implement some
+pretty decent optimizations. Our optimization of temporary allocations, the way
+we can reason about the heap, about dictionary accesses, about properties of
+functions of the runtime, about the range and known bits of integer variables
+is all quite solid.
 
-https://twitter.com/pnguyen0112/status/1818100321652199456
 
-this is because the core assumption of the tracing jit "loops take similar
-control flow paths" is just really wrong in the case of interpreters
+## Tracing, the bad
 
+Tracing also comes with a significant number of downsides. Probably the biggest
+one is that it tends to have big performance cliffs (PyPy certainly has them,
+and other tracing JITs such as TraceMonkey had them too). The 'good' cases are
+really good, but if something goes wrong you are annoyed and performance can
+become a lot slower. With a simple method jit the performance is often much
+more "even".
 
-this is because the core assumption of the tracing jit "loops take similar
-control flow paths" is just really wrong in the case of interpreters
+Another set of downsides is that tracing has a number of corner cases and
+"weird" behaviour in certain situations. Questions such as:
+- When do you stop inlining?
+- What happens when you trace recursion?
+- What happens if your traces are consistently too long, even without inlining?
+- and so on...
 
+There are also some classes of programs that tend to perform quite poorly when
+they are executed by a tracing JIT, bytecode interpreters in particular, and
+other extremely unpredictably branchy code. This is because the core assumption
+of the tracing jit "loops take similar control flow paths" is just really wrong
+in the case of interpreters.
 
 
 ## Discussion
 
+The Twitter thread spawned quite a bit of discussion; please look at the
+original thread. Here are three replies that I wanted to highlight:
+
 "This is a really great summary. Meta-tracing is probably the one biggest
 success story. I think it has to do with how big and branchy the bytecode
 implementations are for typical dynamic languages; the trace captures latent
 type feedback naturally.
 
-There is an upper limit, tho."
+There is an upper limit, tho." [Ben Titzer](https://twitter.com/TitzerBL/status/1818385622203298265)
 
-https://twitter.com/TitzerBL/status/1818385622203298265
+I agree with this completely: the complexity of Python bytecodes is a big
+factor for why meta-tracing works well for us. But also in Python there are
+many builtin types (collection types, types that form the meta-object protocol
+of Python, standard library modules implemented in C/RPython) and tracing
+operations on them is very important too, for good performance.
 
-Exactly, the complexity of py bytecodes is a big factor for why meta tracing
-works well for us. But also in python there are many builtin types (collection
-types, types that form the mop, stdlib modules implemented in C/rpython) and
-tracing operations on them is important too
 
-Stefan Marr:
+----
+
 "I think Mozilla had a blog post talking more about the difficulty with
 TraceMonkey, could only find this one:
 https://blog.mozilla.org/nnethercote/category/jagermonkey/"
-https://twitter.com/smarr/status/1818600052752797990
 
-imo doing tracing for JS is really hard mode, because the browser is so
+[Stefan Marr](https://twitter.com/smarr/status/1818600052752797990)
+
+"imo doing tracing for JS is really hard mode, because the browser is so
 incredibly warmup-sensitive. IIRC tracemonkey used a really low loop trip count
 (single-digit?) to decide when to start tracing (pypy uses >1000). the JS
-interpreters of the time were also quite slow.
+interpreters of the time were also quite slow."
 
-Max Bernstein:
-"What about basic block versioning?"
-https://twitter.com/tekknolagi/status/1818368411157905482
+[me](https://twitter.com/cfbolz/status/1818609594219811245)
 
-It's another point in the phase space ;-). I like it a lot, and maybe it could
-be pushed really far to give the best of both cfg-based and tracing approaches.
-I'd be curious to see a BBV-based meta-JIT (but unfortunately writing meta-JITs
-is super expensive in terms of time).
+----
 
-Maxime Chevalier:
 "There are a number of corner cases you have to deal with in a tracing JIT. It's
 unfortunately not as simple and easy as the initial papers would have you
 believe. One example is how would you deal with a loop inside a loop? Is your
@@ -157,26 +181,24 @@ but it does add complexity. With the increase in complexity, I think most
 industrial VM developers would rather pick tried-and-true method-based JITs
 that are well understood."
 
-https://twitter.com/Love2Code/status/1818292516753383644
+[Maxime Chevalier](https://twitter.com/Love2Code/status/1818292516753383644)
 
 ## Conclusion
 
-In a non-meta-jit it's very unclear to me that you should use tracing. Rather
-spend effort on a solid cfg-based baseline and then try to get some of the good
-properties of tracing on top (path splitting, partial inlining, etc)
+In a non-meta-JIT it's very unclear to me that you should use tracing. It makes
+more sense to spend effort on a solid control-flow-graph-based baseline
+and then try to get some of the good properties of tracing on top (path
+splitting, partial inlining, etc).
 
-in the meta-jit of pypy context I still think it's a relatively pragmatic
-choice, and in the cases where it works well the performance of pypy is quite
+For PyPy with its meta-JIT I still think tracing is a relatively pragmatic
+choice, and in the cases where it works well the performance of PyPy is quite
 hard to beat (particularly with the constraint of not being "allowed" to change
-the language)
-
-this is all purely based on the data point of a single project, of course,
-albeit one that has implemented a whole bunch of different languages. please
-everyone tell me if you disagree with me.
-
-a side point: nobody in the current thread did this, but people who haven't
-worked on python tend to underestimate its complexity. A pet peeve of mine is
-C++ compiler devs/static analysis people/other well-meaning communities coming
-with statements like "why don't you just..."  🤷‍♀️
+the language).
 
+All of the above is purely based on the data point of a single project, of
+course, but one that has implemented a number of different languages.
 
+(A side point: people who haven't worked on Python tend to underestimate
+its complexity. A pet peeve of mine is C++ compiler devs/static
+analysis/Javascript people/other well-meaning communities coming with
+statements like "why don't you just..."  🤷‍♀️)

From 16418a935b38c417511aa5963b2d093fe76c5fe5 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Fri, 23 Aug 2024 16:10:53 +0200
Subject: [PATCH 3/8] move

---
 posts/2024/{08 => 09}/state-of-tracing.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename posts/2024/{08 => 09}/state-of-tracing.md (100%)

diff --git a/posts/2024/08/state-of-tracing.md b/posts/2024/09/state-of-tracing.md
similarity index 100%
rename from posts/2024/08/state-of-tracing.md
rename to posts/2024/09/state-of-tracing.md

From c38f34643ecb645e995d0f8b893c28d36b582743 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Fri, 23 Aug 2024 17:48:03 +0200
Subject: [PATCH 4/8] more edits

---
 posts/2024/09/state-of-tracing.md | 245 ++++++++++++++++++------------
 1 file changed, 149 insertions(+), 96 deletions(-)

diff --git a/posts/2024/09/state-of-tracing.md b/posts/2024/09/state-of-tracing.md
index 617fa0207..b0f3f70c9 100644
--- a/posts/2024/09/state-of-tracing.md
+++ b/posts/2024/09/state-of-tracing.md
@@ -1,7 +1,7 @@
 <!--
-.. title: The State of Tracing
-.. slug: mining-jit-traces-missing-optimizations-z3
-.. date: 2024-08-01 17:01:09 UTC
+.. title: Musings on Tracing in PyPy
+.. slug: musings-tracing
+.. date: 2024-09-01 17:01:09 UTC
 .. tags:
 .. category:
 .. link:
@@ -13,10 +13,10 @@
 A few weeks ago, [Shriram Krishnamurthi](https://cs.brown.edu/~sk/) [asked on
 Twitter](https://twitter.com/ShriramKMurthi/status/1818009884484583459):
 
-"I'm curious what the current state of tracing JITs is. They used to be all the
-rage for a while, then I though I heard they weren't so effective, then I
-haven't heard of them at all. Is the latter because they are ubiquitous, or
-because they proved to not work so well?"
+> "I'm curious what the current state of tracing JITs is. They used to be all the
+> rage for a while, then I though I heard they weren't so effective, then I
+> haven't heard of them at all. Is the latter because they are ubiquitous, or
+> because they proved to not work so well?"
 
 I replied with my personal (partly pretty subjective) opinions about the
 question in a lengthy Twitter thread (which also spawned an even lengthier
@@ -28,32 +28,52 @@ is particularly unclear.
 
 ## Meta-tracing
 
-First some personal context: my perspective is informed by nearly two decades
-of work on PyPy. PyPy's implementation language, RPython, has support a
+First some personal context: my perspective is informed by nearly [two
+decades](https://mail.python.org/archives/list/pypy-dev@python.org/thread/TZM37YJ733G445R6JGTV26333RQEPLRX/)
+of work on PyPy. PyPy's implementation language, [RPython](https://rpython.readthedocs.io/), has support for a
 meta-JIT, which allows it to re-use its JIT infrastructure for the various
 Python versions that we support (currently we do releases of PyPy2.7 and
-PyPy3.10 together). We have also used the meta-JIT infrastructure for some
-experimental different languages like Prolog, Racket, a database (those
-implementations had various degrees of maturity and most of them are research
-software and aren't maintained any more), but also some more surprising things
-like an ARM and RISC-V emulator.
+PyPy3.10 together). Our meta-JIT infrastructure has also been used for a number
+of other, mostly experimental, language implementations, such as:
+
+- PyPy's [regular expression engine](https://pypy.org/posts/2010/11/pypy-14-ouroboros-in-practice-5437628000869417542.html#more-highlights)
+- [RPySom](https://github.com/SOM-st/PySOM), a tiny Smalltalk
+- [Ruby](https://github.com/topazproject/topaz)
+- [PHP](https://github.com/hippyvm/hippyvm)
+- [Prolog](https://dl.acm.org/doi/10.1145/1836089.1836102)
+- [Racket](https://dl.acm.org/doi/10.1145/2784731.2784740)
+- a [database (SQLite)](https://drops.dagstuhl.de/storage/00lipics/lipics-vol056-ecoop2016/LIPIcs.ECOOP.2016.4/LIPIcs.ECOOP.2016.4.pdf)
+- [Lox](https://www.youtube.com/watch?v=fZj3uljJl_k), the language of [Crafting Interpreters](https://craftinginterpreters.com/)
+- an [ARM and RISC-V emulator](https://docs.pydrofoil.org/en/latest/)
+- and many more
+
+Those implementations had various degrees of maturity and many of them are
+research software and aren't maintained any more.
 
 PyPy gives itself the goal to try to be extremely compatible with all the
 quirks of the Python language. We don't change the Python language to make
-things easier to compile. We try very hard to have no opinions on language
-design. The CPython core developers come up with the semantics, we somehow deal
-with them.
-
-PyPy started using a tracing JIT approach *not* because we thought method-based
-just-in-time compilers are bad. Historically we had tried to implemend a
-method-based meta-JIT that was partial evaluation (we wrote three or four
-method-based prototypes that all weren't as good as we hoped). After all those
-experiments failed we switched to the tracing approach, and only at this point
-did our meta-JIT start producing interesting performance.
+things easier to compile and we support the introspection and debugging
+features of Python. We try very hard to have no opinions on language design.
+The CPython core developers come up with the semantics, and we somehow deal with
+them.
+
+PyPy started using a [tracing
+JIT](https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation) approach
+*not* because we thought method-based just-in-time compilers are bad.
+Historically we [had
+tried](https://foss.heptapod.net/pypy/extradoc/-/blob/branch/extradoc/eu-report/D08.2_JIT_Compiler_Architecture-2007-05-01.pdf)
+to implement a method-based meta-JIT that was using partial evaluation (we wrote
+three or four method-based prototypes that all weren't as good as we hoped).
+After all those [experiments
+failed](https://pypy.org/posts/2008/10/sprint-discussions-jit-generator-3301578822967655604.html)
+we switched to the [tracing
+approach](https://dl.acm.org/doi/10.1145/1565824.1565827), and only at this
+point did our meta-JIT start producing interesting performance.
 
 In the meta-JIT context tracing has good properties, because tracing has
 relatively understandable behavior and it's easy(ish) to tweak how things work
-with extra annotations in the interpreter source.
+[with extra annotations in the interpreter
+source](https://dl.acm.org/doi/10.1145/2069172.2069181).
 
 Another reason why meta-tracing often works well for PyPy is that it can often
 slice through the complicated layers of Python quite effectively and remove a
@@ -61,19 +81,28 @@ lot of overhead. Python is often described as simple, but I think that's
 actually a misconception. On the implementation level it's a very big and
 complicated language, and it is also continuously getting new features every
 year (the language is quite a bit more complicated than Javascript, for
-example).
+example[^help]).
 
+[^help]: (A side point: people who haven't worked on Python tend to
+    underestimate its complexity and pace of development. A pet peeve of mine
+    is C++ compiler devs/static analysis/Javascript people/other well-meaning
+    communities coming with statements like "why don't you just..."  🤷‍♀️)
 
-### Truffle
 
-Later Truffle came along and made a method-based meta-JIT using partial
-evaluation work. However Truffle (and Graal) has had significantly more people
-working on it and much more money invested. In addition, it at first required a
-quite specific style of AST-based interpreters (in the last few years they have
-also started supporting bytecode-based interpreters).
+### Truffle
 
-It's still my impression that getting similar results with Truffle is a lot
-more work for language implementers than with RPython, and the warmup of
+Later [Truffle](https://dl.acm.org/doi/abs/10.1145/2509578.2509581) came along
+and made a method-based meta-JIT using partial evaluation work. However Truffle
+(and [Graal]()) has had significantly more people working on it and much more
+money invested. In addition, it at first required a quite specific style of
+[AST-based interpreters](https://dl.acm.org/doi/10.1145/2384577.2384587) (in
+the last few years they have also started supporting bytecode-based
+interpreters).
+
+It's still my impression that getting similar results with Truffle is [more
+work for language
+implementers](https://stefan-marr.de/downloads/oopsla15-marr-ducasse-meta-tracing-vs-partial-evaluation.pdf)
+than with RPython, and the [warmup](https://arxiv.org/pdf/1602.00602) of
 Truffle can often pretty bad. But Truffle is definitely an existence proof that
 meta-JITs don't *have* to be based on tracing.
 
@@ -83,15 +112,17 @@ meta-JITs don't *have* to be based on tracing.
 Let's now discuss some of the advantages of tracing that go beyond the ease of
 using tracing for a meta-JIT.
 
-Tracing allows for doing very aggressive partial inlining, following just the
-hot path through lots of layers of abstraction, is obviously often really
-useful for generating fast code
+Tracing allows for doing very aggressive [partial
+inlining](https://www.cs.fsu.edu/~xyuan/INTERACT-15/papers/paper11.pdf),
+following just the hot path through lots of layers of abstraction, is obviously
+often really useful for generating fast code
 
 It's definitely possible to achieve the same effect in a method-based context
-with path splitting. But it requires a lot more implementation work and is not
-trivial, because the path execution counts of inlined functions can often be
-very call-site dependent, and tracing gives you call-site dependent path
-splitting "for free".
+with [path splitting](https://dl.acm.org/doi/pdf/10.1145/117954.117955). But it
+requires a lot more implementation work and is not trivial, because the path
+[execution counts](https://dl.acm.org/doi/10.1145/504282.504295) of inlined
+functions can often be very call-site dependent, and tracing gives you
+call-site dependent path splitting "for free".
 
 (The aggressive partial inlining and path splitting is even more important in
 the meta-tracing context of PyPy, where some of inlined layers are a part of
@@ -100,105 +131,127 @@ practice are basically absolutely everywhere.)
 
 Another advantage of tracing is that it makes a whole bunch of optimizations
 really easy to implement, because there are (to first approximation) no control
-flow merges. This makes all the optimizations that we do (super-)local
-optimizations, that operate on a single (very long) basic block. This the JIT
-to do the optimizations in exactly one forwards and one backwards pass. Eg our
-allocation removal/partial escape analysis is simple.
+flow merges. This makes all the optimizations that we do (super-)[local
+optimizations](https://en.wikipedia.org/wiki/Optimizing_compiler#Local_vs._global_scope),
+that operate on a single (very long) basic block. This allows the JIT to do the
+optimizations in exactly one forwards and one backwards pass. An example is our
+[allocation removal](https://dl.acm.org/doi/10.1145/1929501.1929508)/partial
+escape analysis pass, which is [quite
+simple](https://pypy.org/posts/2022/10/toy-optimizer-allocation-removal.html),
+whereas the [version for general control
+flow](https://ssw.jku.at/Teaching/PhDTheses/Stadler/Thesis_Stadler_14.pdf) has
+a lot more complexity, particularly in its handling of loops.
 
 This ease of implementation of optimizations allowed us to implement some
-pretty decent optimizations. Our optimization of temporary allocations, the way
-we can reason about the heap, about dictionary accesses, about properties of
-functions of the runtime, about the range and known bits of integer variables
-is all quite solid.
+pretty decent optimizations. Our allocation removal, the way PyPy's JIT can
+reason about the heap, about dictionary accesses, about properties of functions
+of the runtime, about the range and [known bits of integer
+variables](https://pypy.org/posts/2024/08/toy-knownbits.html) is all quite
+solid.
 
 
 ## Tracing, the bad
 
 Tracing also comes with a significant number of downsides. Probably the biggest
 one is that it tends to have big performance cliffs (PyPy certainly has them,
-and other tracing JITs such as TraceMonkey had them too). The 'good' cases are
-really good, but if something goes wrong you are annoyed and performance can
-become a lot slower. With a simple method jit the performance is often much
-more "even".
+and other tracing JITs such as TraceMonkey had them too). In my experience the
+'good' cases of tracing are really good, but if something goes wrong you are
+annoyed and performance can become a lot slower. With a simple method jit the
+performance is often much more "even".
 
 Another set of downsides is that tracing has a number of corner cases and
 "weird" behaviour in certain situations. Questions such as:
 - When do you stop inlining?
-- What happens when you trace recursion?
-- What happens if your traces are consistently too long, even without inling?
+- What happens when you [trace recursion](https://mail.python.org/archives/list/pypy-dev@python.org/thread/GQQ7ABUFHGEAHWN7RQZM6D54CDROQINR/)?
+- What happens if your traces are [consistently too long, even without inlining](https://pypy.org/posts/2021/09/jit-auto-generated-code.html)?
 - and so on...
 
-There are also some classes of programs that tend to perform quite poorly when
-they are executed by a tracing JIT, bytecode interpreters in particularly, and
-other extremely unpredictably branchy code. This is because the core assumption
-of the tracing jit "loops take similar control flow paths" is just really wrong
-in the case of interpreters.
+Some of these problems can be solved by adding heuristics to the tracing JIT,
+but doing so loses a lot of the simplicity of tracing again.
+
+There are also some classes of programs that tend to generally perform quite
+poorly when they are executed by a tracing JIT, bytecode interpreters in
+particular, and other extremely unpredictably branchy code. This is because
+the core assumption of the tracing jit "loops take similar control flow paths"
+is just really wrong in the case of interpreters.
 
 
 ## Discussion
 
 The Twitter thread spawned quite a bit of discussion, please look at the
-original thread. Here are three that I wanted to highlight:
+original thread for all of the comments. Here are three that I wanted to
+highlight:
 
-"This is a really great summary. Meta-tracing is probably the one biggest
-success story. I think it has to do with how big and branchy the bytecode
-implementations are for typical dynamic languages; the trace captures latent
-type feedback naturally.
+> "This is a really great summary. Meta-tracing is probably the one biggest
+> success story. I think it has to do with how big and branchy the bytecode
+> implementations are for typical dynamic languages; the trace captures latent
+> type feedback naturally.
 
 There is an upper limit, tho." [Ben Titzer](https://twitter.com/TitzerBL/status/1818385622203298265)
 
 I agree with this completely, the complexity of Python bytecodes is a big
 factor for why meta tracing works well for us. But also in Python there are
-many builtin types (collection types, types that form the meta-object protocol
-of Python, standard library modules implemented in C/RPython) and tracing
+many builtin types (collection types, types that form the [meta-object
+protocol](https://en.wikipedia.org/wiki/Metaobject#Metaobject_protocol) of
+Python, standard library modules implemented in C/RPython) and tracing
 operations on them is very important too, for good performance.
 
 
 ----
 
-"I think Mozilla had a blog post talking more about the difficulty with
-TraceMonkey, could only find this one:
-https://blog.mozilla.org/nnethercote/category/jagermonkey/"
+> "I think Mozilla had a blog post talking more about the difficulty with
+> TraceMonkey, could only find this one:
+> https://blog.mozilla.org/nnethercote/category/jagermonkey/"
 
 [Stefan Marr](https://twitter.com/smarr/status/1818600052752797990)
 
-"imo doing tracing for JS is really hard mode, because the browser is so
-incredibly warmup-sensitive. IIRC tracemonkey used a really low loop trip count
-(single-digit?) to decide when to start tracing (pypy uses >1000). the JS
-interpreters of the time were also quite slow."
+> "imo doing tracing for JS is really hard mode, because the browser is so
+> incredibly warmup-sensitive. IIRC tracemonkey used a really low loop trip count
+> (single-digit?) to decide when to start tracing (pypy uses >1000). the JS
+> interpreters of the time were also quite slow."
 
 [me](https://twitter.com/cfbolz/status/1818609594219811245)
 
 ----
 
-"There are a number of corner cases you have to deal with in a tracing JIT. It's
-unfortunately not as simple and easy as the initial papers would have you
-believe. One example is how would you deal with a loop inside a loop? Is your
-tracing now recursive?
+> "There are a number of corner cases you have to deal with in a tracing JIT. It's
+> unfortunately not as simple and easy as the initial papers would have you
+> believe. One example is how would you deal with a loop inside a loop? Is your
+> tracing now recursive?
 
-There's been some research work on trace stitching to deal with trace explosion
-but it does add complexity. With the increase in complexity, I think most
-industrial VM developers would rather pick tried-and-true method-based JITs
-that are well understood."
+> There's been some research work on trace stitching to deal with trace explosion
+> but it does add complexity. With the increase in complexity, I think most
+> industrial VM developers would rather pick tried-and-true method-based JITs
+> that are well understood."
 
 [Maxime Chevalier](https://twitter.com/Love2Code/status/1818292516753383644)
 
 ## Conclusion
 
-In a non-meta-jit it's very unclear to me that you should use tracing. It makes
+Given access to enough developers and in the context of "normal" jitting (ie not meta-jitting) it's very unclear to me that you should use tracing. It makes
 more sense to rather spend effort on a solid control-flow-graph-based baseline
 and then try to get some of the good properties of tracing on top (path
-splitting, partial inlining, etc).
-
-For PyPy with it's meta-JIT I still think tracing is a relatively pragmatic
-choice, and in the cases where it works well the performance of PyPy is quite
-hard to beat (particularly with the constraint of not being "allowed" to change
-the language).
-
-All of the above is all purely based on the data point of a single project, of
-course, but one that has implemented a number of different languages.
+splitting, partial inlining, partial escape analysis, etc).
+
+For PyPy with its meta-JIT (and the fact that we don't have particularly much
+funding nor people) I still think tracing was/is a relatively pragmatic choice.
+When I talked with [Sam Tobin-Hochstadt](https://samth.github.io/) about this
+topic recently he characterized it like this: "tracing is a labor-saving device
+for compiler authors".
+
+Performance-wise PyPy is still quite hard to beat in the cases where it works
+well (i.e. pure Python code that doesn't use too many C modules, which are
+[supported but slow in
+PyPy](https://pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-8083064623681286567.html)).
+In general, there are very few JITs for Python (particularly with the
+constraint of not being "allowed" to change the language), the most competitive
+other ones are [GraalPy](https://www.graalvm.org/python/), also based on a
+meta-JIT approach. Instagram is running on
+[Cinder](https://github.com/facebookincubator/cinder/) and also CPython has
+[grown a JIT
+recently](https://tonybaloney.github.io/posts/python-gets-a-jit.html) which
+will be part of the upcoming [3.13 release, but only as an off-by-default build
+option](https://docs.python.org/3.13/whatsnew/3.13.html#an-experimental-just-in-time-jit-compiler),
+so I'm very excited about how Python's performance will develop in the next
+years!
 
-(A side point: people who haven't worked on Python tend to underestimate
-its complexity. A pet peeve of mine is C++ compiler devs/static
-analysis/Javascript people/other well-meaning communities coming with
-statements like "why don't you just..."  🤷‍♀️)

From 8c3166a1281ecf94db9d60dff3f4273aef543405 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Tue, 27 Aug 2024 15:49:37 +0200
Subject: [PATCH 5/8] wip: changes

---
 posts/2024/09/state-of-tracing.md | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/posts/2024/09/state-of-tracing.md b/posts/2024/09/state-of-tracing.md
index b0f3f70c9..d93801606 100644
--- a/posts/2024/09/state-of-tracing.md
+++ b/posts/2024/09/state-of-tracing.md
@@ -21,14 +21,20 @@ Twitter](https://twitter.com/ShriramKMurthi/status/1818009884484583459):
 I replied with my personal (partly pretty subjective) opinions about the
 question in a lengthy Twitter thread (which also spawned an even lengthier
 discussion). I wanted to turn what I wrote there into a blog post to make it
-more widely available. The blog post i still somewhat terse, I've tried to at
-least add links to further information. Please ask in the comments if something
-is particularly unclear.
+more widely available. The blog post i still somewhat terse, I've written a
+small background section and tried to at least add links to further
+information. Please ask in the comments if something is particularly unclear.
 
 
-## Meta-tracing
+## Background
+
+I'll explain a few of the central terms of the rest of the post. JIT compilers
+are compilers that do their work at runtime, interleaved (or concurrent with)
+the execution of the program.
+
+## Personal and Project Context
 
-First some personal context: my perspective is informed by nearly [two
+Some personal context: my perspective is informed by nearly [two
 decades](https://mail.python.org/archives/list/pypy-dev@python.org/thread/TZM37YJ733G445R6JGTV26333RQEPLRX/)
 of work on PyPy. PyPy's implementation language, [RPython](https://rpython.readthedocs.io/), has support for a
 meta-JIT, which allows it to re-use its JIT infrastructure for the various
@@ -57,6 +63,8 @@ features of Python. We try very hard to have no opinions on language design.
 The CPython core developers come up with the semantics, we somehow deal with
 them.
 
+## Meta-tracing
+
 PyPy started using a [tracing
 JIT](https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation) approach
 *not* because we thought method-based just-in-time compilers are bad.
@@ -186,8 +194,10 @@ highlight:
 > success story. I think it has to do with how big and branchy the bytecode
 > implementations are for typical dynamic languages; the trace captures latent
 > type feedback naturally.
+>
+> There is an upper limit, tho."
 
-There is an upper limit, tho." [Ben Titzer](https://twitter.com/TitzerBL/status/1818385622203298265)
+[Ben Titzer](https://twitter.com/TitzerBL/status/1818385622203298265)
 
 I agree with this completely, the complexity of Python bytecodes is a big
 factor for why meta tracing works well for us. But also in Python there are
@@ -226,9 +236,11 @@ operations on them is very important too, for good performance.
 
 [Maxime Chevalier](https://twitter.com/Love2Code/status/1818292516753383644)
 
+
 ## Conclusion
 
-Given access to enough developers and in the context of "normal" jitting (ie not meta-jitting) it's very unclear to me that you should use tracing. It makes
+Given access to enough developers and in the context of "normal" jitting (ie
+not meta-jitting) it's very unclear to me that you should use tracing. It makes
 more sense to rather spend effort on a solid control-flow-graph-based baseline
 and then try to get some of the good properties of tracing on top (path
 splitting, partial inlining, partial escape analysis, etc).

From fc1bec6ee30d3750862c8c0ebdba05a08d470006 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Wed, 28 Aug 2024 11:39:49 +0200
Subject: [PATCH 6/8] add guestpost tag to icelandic post too

---
 posts/2022/02/nlp-icelandic-pypy.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2022/02/nlp-icelandic-pypy.rst b/posts/2022/02/nlp-icelandic-pypy.rst
index fffc49238..8e13fbe15 100644
--- a/posts/2022/02/nlp-icelandic-pypy.rst
+++ b/posts/2022/02/nlp-icelandic-pypy.rst
@@ -1,7 +1,7 @@
 .. title: Natural Language Processing for Icelandic with PyPy: A Case Study
 .. slug: nlp-icelandic-case-study
 .. date: 2022-02-06 15:00:00 UTC
-.. tags: casestudy
+.. tags: casestudy, guestpost
 .. category: 
 .. link: 
 .. description: 

From c8625c4282d725efcebc0c80bf38699883acae75 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Sat, 4 Jan 2025 11:07:19 +0100
Subject: [PATCH 7/8] try to make the tracing blog post publishable

---
 .../{2024/09 => 2025/01}/state-of-tracing.md  | 64 ++++++++++++-------
 1 file changed, 41 insertions(+), 23 deletions(-)
 rename posts/{2024/09 => 2025/01}/state-of-tracing.md (82%)

diff --git a/posts/2024/09/state-of-tracing.md b/posts/2025/01/state-of-tracing.md
similarity index 82%
rename from posts/2024/09/state-of-tracing.md
rename to posts/2025/01/state-of-tracing.md
index d93801606..615567efb 100644
--- a/posts/2024/09/state-of-tracing.md
+++ b/posts/2025/01/state-of-tracing.md
@@ -1,7 +1,7 @@
 <!--
 .. title: Musings on Tracing in PyPy
 .. slug: musings-tracing
-.. date: 2024-09-01 17:01:09 UTC
+.. date: 2025-01-05 17:01:09 UTC
 .. tags:
 .. category:
 .. link:
@@ -10,7 +10,7 @@
 .. author: CF Bolz-Tereick
 -->
 
-A few weeks ago, [Shriram Krishnamurthi](https://cs.brown.edu/~sk/) [asked on
+Last summer, [Shriram Krishnamurthi](https://cs.brown.edu/~sk/) [asked on
 Twitter](https://twitter.com/ShriramKMurthi/status/1818009884484583459):
 
 > "I'm curious what the current state of tracing JITs is. They used to be all the
@@ -18,19 +18,31 @@ Twitter](https://twitter.com/ShriramKMurthi/status/1818009884484583459):
 > haven't heard of them at all. Is the latter because they are ubiquitous, or
 > because they proved to not work so well?"
 
-I replied with my personal (partly pretty subjective) opinions about the
+I replied with my personal (pretty subjective) opinions about the
 question in a lengthy Twitter thread (which also spawned an even lengthier
 discussion). I wanted to turn what I wrote there into a blog post to make it
-more widely available. The blog post i still somewhat terse, I've written a
-small background section and tried to at least add links to further
-information. Please ask in the comments if something is particularly unclear.
+more widely available (Twitter is no longer easily consumable without an
+account), and also because I'm mostly not using Twitter anymore. The blog post
+is still somewhat terse, I've written a small background section and tried to at
+least add links to further information. Please ask in the comments if something
+is particularly unclear.
 
 
 ## Background
 
-I'll explain a few of the central terms of the rest of the post. JIT compilers
+I'll explain a few of the central terms of the rest of the post. *JIT compilers*
 are compilers that do their work at runtime, interleaved (or concurrent with)
-the execution of the program.
+the execution of the program. There are (at least) two common general styles of
+JIT compiler architectures. The most common one is that of a method-based JIT,
+which will compile one method or function at a time. Then there are tracing JIT
+compilers, which generate code by tracing the execution of the user's program.
+They often focus on loops as their main unit of compilation.
+
+Then there is the distinction between a "regular" JIT compiler and a
+*meta-JIT*. A regular JIT is built to compile one specific source language to
+machine code. A meta-JIT is a framework for building JIT compilers for a
+variety of different languages, re-using as much machinery as possible between
+the different implementations.
 
 ## Personal and Project Context
 
@@ -70,7 +82,7 @@ JIT](https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation) approach
 *not* because we thought method-based just-in-time compilers are bad.
 Historically we [had
 tried](https://foss.heptapod.net/pypy/extradoc/-/blob/branch/extradoc/eu-report/D08.2_JIT_Compiler_Architecture-2007-05-01.pdf)
-to implemend a method-based meta-JIT that was partial evaluation (we wrote
+to implemend a method-based meta-JIT that was using partial evaluation (we wrote
 three or four method-based prototypes that all weren't as good as we hoped).
 After all those [experiments
 failed](https://pypy.org/posts/2008/10/sprint-discussions-jit-generator-3301578822967655604.html)
@@ -78,7 +90,7 @@ we switched to the [tracing
 approach](https://dl.acm.org/doi/10.1145/1565824.1565827), and only at this
 point did our meta-JIT start producing interesting performance.
 
-In the meta-JIT context tracing has good propreties, because tracing has
+In the meta-JIT context tracing has good properties, because tracing has
 relatively understandable behavior and its easy(ish) to tweak how things work
 [with extra annotations in the interpreter
 source](https://dl.acm.org/doi/10.1145/2069172.2069181).
@@ -101,10 +113,10 @@ example[^help]).
 
 Later [Truffle](https://dl.acm.org/doi/abs/10.1145/2509578.2509581) came along
 and made a method-based meta-JIT using partial evaluation work. However Truffle
-(and [Graal]()) has had significantly more people working on it and much more
+(and [Graal](https://www.oracle.com/java/graalvm/)) has had significantly more people working on it and much more
 money invested. In addition, it at first required a quite specific style of
 [AST-based interpreters](https://dl.acm.org/doi/10.1145/2384577.2384587) (in
-the last few years they have also started supporting bytecode-based
+the last few years they have also added support for bytecode-based
 interpreters).
 
 It's still my impression that getting similar results with Truffle is [more
@@ -117,27 +129,28 @@ meta-JITs don't *have* to be based on tracing.
 
 ## Tracing, the good
 
-Let's now discuss some of the advantages of tracing that go beyond the ease of
-using tracing for a meta-JIT.
+Let's now actually get to the heart of Shriram's question and discuss some of
+the advantages of tracing that go beyond the ease of using tracing for a
+meta-JIT.
 
 Tracing allows for doing very aggressive [partial
 inlining](https://www.cs.fsu.edu/~xyuan/INTERACT-15/papers/paper11.pdf),
-following just the hot path through lots of layers of abstraction, is obviously
-often really useful for generating fast code
+following just the hot path through lots of layers of abstraction. This is
+often really useful for generating fast code.
 
 It's definitely possible to achieve the same effect in a method-based context
 with [path splitting](https://dl.acm.org/doi/pdf/10.1145/117954.117955). But it
 requires a lot more implementation work and is not trivial, because the path
 [execution counts](https://dl.acm.org/doi/10.1145/504282.504295) of inlined
-functions can often be very call-site dependent, and tracing gives you
-call-site dependent path splitting "for free".
+functions can often be very call-site dependent. Tracing, on the other hand,
+gives you call-site dependent path splitting "for free".
 
 (The aggressive partial inlining and path splitting is even more important in
 the meta-tracing context of PyPy, where some of inlined layers are a part of
 the language runtime, and where rare corner cases that are never executed in
-practice are basically absolutely everywhere.)
+practice are everywhere.)
 
-Another advantage of tracing is that it makes a whole bunch of optimizations
+Another advantage of tracing is that it makes a number of optimizations
 really easy to implement, because there are (to first approximation) no control
 flow merges. This makes all the optimizations that we do (super-)[local
 optimizations](https://en.wikipedia.org/wiki/Optimizing_compiler#Local_vs._global_scope),
@@ -164,7 +177,7 @@ Tracing also comes with a significant number of downsides. Probably the biggest
 one is that it tends to have big performance cliffs (PyPy certainly has them,
 and other tracing JITs such as TraceMonkey had them too). In my experience the
 'good' cases of tracing are really good, but if something goes wrong you are
-annoyed and performance can become a lot slower. With a simple method jit the
+annoyed and performance can become a lot slower. With a simple method JIT the
 performance is often much more "even".
 
 Another set of downsides is that tracing has a number of corner cases and
@@ -199,7 +212,7 @@ highlight:
 
 [Ben Titzer](https://twitter.com/TitzerBL/status/1818385622203298265)
 
-I agree with this completely, the complexity of Python bytecodes is a big
+I agree with this completely. The complexity of Python bytecodes is a big
 factor for why meta tracing works well for us. But also in Python there are
 many builtin types (collection types, types that form the [meta-object
 protocol](https://en.wikipedia.org/wiki/Metaobject#Metaobject_protocol) of
@@ -222,6 +235,11 @@ operations on them is very important too, for good performance.
 
 [me](https://twitter.com/cfbolz/status/1818609594219811245)
 
+In the meantime there were some more reminiscences about tracing in Javascript
+by [Shu-Yu Guo in a panel
+discussion](https://www.youtube.com/live/_VF3pISRYRc?t=24797s) and by [Jason
+Orendorff on Mastodon](https://kfogel.org/notice/AngH0uqyJl231yLLOa).
+
 ----
 
 > "There are a number of corner cases you have to deal with in a tracing JIT. It's
@@ -262,7 +280,7 @@ meta-JIT approach. Instagram is running on
 [Cinder](https://github.com/facebookincubator/cinder/) and also CPython has
 [grown a JIT
 recently](https://tonybaloney.github.io/posts/python-gets-a-jit.html) which
-will be part of the upcoming [3.13 release, but only as an off-by-default build
+was part of the recent [3.13 release, but only as an off-by-default build
 option](https://docs.python.org/3.13/whatsnew/3.13.html#an-experimental-just-in-time-jit-compiler),
 so I'm very excited about how Python's performance will develop in the next
 years!

From a7377b7d56e88f3d0c7bcfd5c1a0f43b2cd65e52 Mon Sep 17 00:00:00 2001
From: CF Bolz-Tereick <cfbolz@gmx.de>
Date: Sat, 4 Jan 2025 11:10:23 +0100
Subject: [PATCH 8/8] fix typos

---
 posts/2025/01/state-of-tracing.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/posts/2025/01/state-of-tracing.md b/posts/2025/01/state-of-tracing.md
index 615567efb..dafccf412 100644
--- a/posts/2025/01/state-of-tracing.md
+++ b/posts/2025/01/state-of-tracing.md
@@ -49,7 +49,7 @@ the different implementation.
 Some personal context: my perspective is informed by nearly [two
 decades](https://mail.python.org/archives/list/pypy-dev@python.org/thread/TZM37YJ733G445R6JGTV26333RQEPLRX/)
 of work on PyPy. PyPy's implementation language, [RPython](https://rpython.readthedocs.io/), has support for a
-meta-JIT, which allows it to re-use its JIT infrastructure for the various
+meta-JIT, which allows it to reuse its JIT infrastructure for the various
 Python versions that we support (currently we do releases of PyPy2.7 and
 PyPy3.10 together). Our meta-JIT infrastructure has been used for some
 experimental different languages like:
@@ -82,7 +82,7 @@ JIT](https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation) approach
 *not* because we thought method-based just-in-time compilers are bad.
 Historically we [had
 tried](https://foss.heptapod.net/pypy/extradoc/-/blob/branch/extradoc/eu-report/D08.2_JIT_Compiler_Architecture-2007-05-01.pdf)
-to implemend a method-based meta-JIT that was using partial evaluation (we wrote
+to implement a method-based meta-JIT that was using partial evaluation (we wrote
 three or four method-based prototypes that all weren't as good as we hoped).
 After all those [experiments
 failed](https://pypy.org/posts/2008/10/sprint-discussions-jit-generator-3301578822967655604.html)