Ruby 3 JIT can make Rails faster

"MJIT Does Not Improve Rails Performance" (RubyKaigi 2018: The Method JIT Compiler for Ruby 2.6)

As I wrote in the Ruby 3.0.0 Release Note and my previous post, that's what we thought.

Why does Rails become slower on Ruby 3’s JIT?

People have wondered:

Is it because we use C compilers?

MJIT writes C code and invokes a C compiler to generate native code. Setting aside slow compilation time, which doesn't impact peak performance, I see two performance problems in using C compilers as a JIT backend.

i-cache misses by compiling many methods?

I've measured JIT-ed code with Linux perf many times; I even wrote a perf plugin for it. Through that investigation, I observed cycles stalled on filling the i-cache. Because the number of i-cache misses generally increases as you compile more methods, we changed the default of --jit-max-cache from 1000 to 100 in Ruby 2.7 to address the problem, and it indeed helped Rails performance. So I'd believed that JIT-ing a lot of methods is a bad thing.
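The measurement itself looks roughly like this (a hedged sketch rather than my exact commands; the process name and the perf event names are illustrative and vary by CPU):

    # Attach to the (assumed) Rails server process for 30 seconds and read i-cache counters.
    $ perf stat -e cycles,instructions,L1-icache-load-misses,iTLB-load-misses \
        -p $(pgrep -f puma) -- sleep 30

This reports the i-cache-related counters alongside total cycles and instructions for a warmed-up process.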

The “compile all” magic

Yes, that’s what we needed. If you compile all benchmarked methods instead of just the top 100 methods, the JIT makes Rails faster.
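Concretely, "compile all" just means raising the JIT method cache above the number of methods the benchmark ends up calling, e.g. (a sketch; the script name is a placeholder):

    # Any value large enough to cover every method the benchmark calls works; 10000 is arbitrary.
    $ ruby --jit --jit-max-cache=10000 your_benchmark.rb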

Sinatra Benchmark

This benchmark, which just returns a static string from Sinatra, was from an article saying "enabling the JIT makes Sinatra 11% slower!".
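The endpoint being measured is essentially of this shape (my sketch, not the benchmark's actual code):

    # app.rb: a Sinatra route that returns a static string
    require 'sinatra'

    get '/' do
      'Hello, world!'
    end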

Rails Simpler Bench

Rails Simpler Bench was authored by Noah Gibbs. The default benchmark endpoint returns static text, so it's essentially a Rails version of the benchmark above.
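The Rails counterpart looks roughly like this (again my sketch, not the benchmark's actual code):

    # app/controllers/static_controller.rb: an action that renders static text
    class StaticController < ApplicationController
      def index
        render plain: 'Hello, world!'
      end
    end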

Railsbench

Railsbench measures the performance of the HTML #show action of a controller generated by "rails g scaffold". It was originally headius/pgrailsbench, used by the JRuby team; I forked it as k0kubun/railsbench to upgrade Rails, and Shopify/yjit-bench uses that fork too.
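In other words, the app under test is generated roughly like this (the model name and columns here are hypothetical, not Railsbench's actual schema), and the benchmark repeatedly hits the scaffolded #show action such as GET /posts/1:

    # Hypothetical scaffold; Railsbench's actual model differs.
    $ rails g scaffold Post title:string body:text
    $ rails db:migrate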

Discourse

Discourse is one of the most popular open-source Rails applications in the world. You can see it at Twitter Developers, GitHub Support, CircleCI Discuss, and more.

Why does compiling everything make it faster?

To spot the locations impacting metrics like i-cache misses, I've tried tools like perf and cachegrind before, but because the slowness appears everywhere and no single place shows a significant difference, it's hard to pin down the exact reason.
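For reference, the cachegrind side of that kind of investigation looks roughly like this (a sketch; the script name is a placeholder):

    # Run the workload under cachegrind, then annotate per-function cache miss counts.
    $ valgrind --tool=cachegrind ruby --jit benchmark.rb
    $ cg_annotate cachegrind.out.<pid>

cg_annotate shows cache misses per function, but in this case the misses are spread thinly across the whole program rather than concentrated in a few hot spots.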

So, should I use JIT on Rails?

We're almost there, but a few things need to be fixed before bringing this to production.

Ruby 3.0.1 bug that's not in 3.0.0

One of the backports in Ruby 3.0.1 causes JIT compilation failures, so 3.0.1 is significantly slower than 3.0.0 if you use the JIT. Let's wait for the next release.

Ruby 3 bug that stops compilation in the middle

There's a bug where the JIT worker stops compiling the remaining methods in some edge cases. You cannot reproduce the above speed if it happens. Please wait until the fix is backported and released.

Incompatibility with Zeitwerk / TracePoint

In Ruby 2.5, Koichi, the author of TracePoint, implemented an optimization that makes overall Ruby execution faster as long as you never use TracePoint. I've believed you should not enable TracePoint if you care about performance, and thus MJIT has not supported TracePoint. However, Zeitwerk, the default code loader since Rails 6, enables a TracePoint on the :class event to support explicit namespaces, hence the incompatibility.
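For illustration (not Zeitwerk's actual code), this is the kind of TracePoint usage involved; the :class event fires every time a class or module body is opened:

    # Illustration only: a :class-event TracePoint like the one Zeitwerk enables.
    tp = TracePoint.new(:class) do |event|
      puts "opened #{event.self}"
    end
    tp.enable

    class Foo; end   # prints "opened Foo"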

Incompatibility with GC.compact

We either need to compromise performance by making every memory access indirect, or recompile many methods whenever GC.compact is called. Neither has been implemented yet, so for now all JIT-ed code is canceled when GC.compact is called.

The default value of --jit-max-cache

To compile everything, the default --jit-max-cache of 100 is too small. I'll change it to maybe 10000 in Ruby 3.1, but you'd need to override the default yourself on Ruby 3.0.
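For a Rails app on Ruby 3.0, overriding it looks something like this (a sketch; the server command and the value 10000 are illustrative):

    # Enable the JIT with a larger method cache via RUBYOPT.
    $ RUBYOPT="--jit --jit-max-cache=10000" bin/rails server -e production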

Scalability of "JIT compaction"

If you want to reproduce the above benchmark results, you have to see the "JIT compaction" log at the end of the --jit-verbose=1 logs. However, because the compacted binary has all methods and we currently have no way to distinguish which version of JIT-ed code is in use, the compacted code is almost never GC-ed properly. Because of that, we restricted the maximum number of "JIT compaction" runs to 10% of --jit-max-cache (10 times by default). If the JIT continues to compile methods after the last JIT compaction, it'll be slow.
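Checking for that log line looks roughly like this (a sketch; I'm assuming the verbose log goes to stderr):

    # Capture the verbose JIT log, warm the app up, then look for the compaction line.
    $ RUBYOPT="--jit --jit-verbose=1 --jit-max-cache=10000" bin/rails server 2> mjit.log
    $ grep "JIT compaction" mjit.log

If a JIT compaction appears after all the methods you care about have been compiled, you're in the fast configuration; if compilation keeps going after the last compaction, you're not.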

Next steps

I don't care which JIT backend we use to generate machine code out of MJIT, MIR, or YJIT. What I care about the most is the performance in real-world workloads and the optimization ideas implemented in the JIT compiler on top of the backend, which will continue to be useful even if you replace MJIT with MIR and/or YJIT.

Ruby-based JIT compiler

I considered a "Ractor-based JIT worker" in the previous article because process invocation is something Ruby is good at, and I thought doing it at the Ruby level might make it easier to maintain locks, in addition to making it easier to maintain complicated optimization logic in Ruby.
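Purely as an illustration of the idea (my assumption of what it could look like, not MJIT's actual design), a Ractor that receives compile requests and shells out to a C compiler might be sketched like this:

    # Illustrative sketch only: a Ractor acting as a JIT worker.
    worker = Ractor.new do
      while (req = Ractor.receive)
        c_file, so_file = req
        # Invoke the system C compiler to turn generated C code into a shared object.
        system('cc', '-O2', '-shared', '-fPIC', '-o', so_file, c_file)
        Ractor.yield(so_file)
      end
    end

    worker.send(['generated_method.c', 'generated_method.so'])
    worker.take  # => "generated_method.so"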

Faster deoptimization

One theme in the previous article was "On-Stack Replacement", but this time I split it into "Faster deoptimization" and the next item, "Lazy stack frame push". Ultimately, On-Stack Replacement is just one way to implement those optimizations, and whether we use OSR or not doesn't really matter.

Lazy stack frame push

From Ruby 3.0, we started to annotate methods that don't need any stack frame to be pushed. The number of methods we can annotate like that is limited for now, because object allocation might raise NoMemoryError and a backtrace needs the frame. If we could lazily push the frame only when the error is actually being raised, we could skip pushing the frame in the common case. It's tricky to implement that with MJIT, but it'd be useful for reducing method call overheads.

Sponsors

My contributions to OSS like the JIT, ERB, Haml, and IRB, and my own projects like Hamlit, sqldef, pp, mitamae, and rspec-openapi, have been supported by the following GitHub Sponsors. Thank you!

https://github.com/sponsors/k0kubun
