JRuby 9000
Charles O. Nutter
JRuby; Red Hat
JRuby is Ruby…
(on the JVM...shhhh!)
Why the JVM is great
for Ruby!
Shoulders of Giants
JVM
J. Rose
Hiro
Marcin
Nahi
Subbu Douglas
Christian Dmitry
Tom
Charlie
JRuby
All the stuff!
JVM
J. Rose
• Garbage Collection
• Native JIT
• Profiled Optimizations
• Native Threading
• Tooling
• Cross Platform
Can leverage Java Ecosystem
47k libraries in Maven
Hadoop, EHCache, Selenium, Sitemesh, Lucene, Neo4j, JMonkeyEngine
Polyglot
JVM
• Clojure
• Scala
• Groovy
• Jython
• Rhino/Nashorn/DynJS (JavaScript)
• Micro Focus Visual COBOL
• Java
[Chart: red/black tree, pure Ruby versus native. Runtime per iteration (axis 0 to 3) for ruby-2.0.0 + Ruby, ruby-2.0.0 + C ext, and jruby + Ruby. Data labels: 0.29s, 0.51s, 2.48s.]
GC
GC Matters
• Applications grow over time
• Ruby is very object-heavy
• Multiprocess multiplies the problem
• You will eventually have issues
gc_demo.rb
• Heavy GC, mix of old and young
• Steadily growing heap use
require 'benchmark'

class Simple
  attr_accessor :next
end

top = Simple.new
puts Benchmark.measure {
  outer = 10
  total = 100000
  per = 100
  outer.times do
    total.times do
      per.times { Simple.new }  # short-lived objects
      s = Simple.new            # linked from the previous object
      top.next = s
      top = s
    end
  end
}
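The charts for this demo report GC counts and GC time. A minimal sketch of how a GC count could be taken around a smaller version of the same workload is shown here. It assumes `GC.count` is available on the Ruby in use (it is part of core Ruby; behavior on other implementations may differ). The helper name `gc_runs_during` and the reduced iteration counts are illustrative and not from the talk.

```ruby
require 'benchmark'

class Simple
  attr_accessor :next
end

# Hypothetical helper: returns [gc_runs, elapsed_seconds] for the block.
def gc_runs_during
  before = GC.count
  elapsed = Benchmark.realtime { yield }
  [GC.count - before, elapsed]
end

runs, seconds = gc_runs_during do
  top = Simple.new
  20_000.times do
    100.times { Simple.new }  # short-lived objects
    s = Simple.new            # linked from the previous object
    top.next = s
    top = s
  end
end

puts "GC runs: #{runs}, elapsed: #{seconds.round(3)}s"
```

Running the same file under different Ruby implementations gives a rough side-by-side of collector activity for this allocation pattern.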
[Chart: GC count, Ruby 2.1.1 vs. JRuby, linear scale (0 to 3000)]
[Chart: GC count, Ruby 2.1.1 vs. JRuby, log scale (1 to 10000)]
[Chart: GC time %, Ruby 2.2.2 vs. JRuby (axis 0s to 1.8s)]
Threads
Real Parallelism
• Ruby thread = JVM thread = native thread
• One process can use all cores
• One server can handle all requests
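A minimal sketch of a CPU-bound threaded benchmark in the spirit of the bullets above. Each thread performs the full workload, so on a runtime where Ruby threads run in parallel the wall-clock time should stay roughly flat as threads are added (up to the core count), while on a runtime with a global lock it should grow. The method names `reverse_work` and `run_threaded` and the workload sizes are hypothetical, not taken from the talk's benchmark.

```ruby
require 'benchmark'

# Hypothetical CPU-bound workload: reverse a string many times.
def reverse_work(iterations)
  str = 'abcdefghij' * 100
  iterations.times { str = str.reverse }
  str
end

# Runs the same workload on thread_count threads and returns each result.
def run_threaded(thread_count, iterations)
  threads = Array.new(thread_count) do
    Thread.new { reverse_work(iterations) }
  end
  threads.map(&:value)  # Thread#value joins the thread and returns its result
end

[1, 2, 4].each do |n|
  time = Benchmark.realtime { run_threaded(n, 20_000) }
  puts "#{n} thread(s): #{time.round(3)}s"
end
```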
[Chart: threaded_reverse. Per-iteration time versus thread count (one to four threads; axis 0.2s to 0.8s) for Ruby 2.2 unthreaded, Ruby 2.2 threaded, JRuby unthreaded, and JRuby threaded.]
Tools
Profiling
• Java profilers
• VisualVM, YourKit, NetBeans, JXInsight
• jruby [--profile | --profile.graph]
• JVM command-line profilers
VisualVM
• CPU, memory, thread monitoring
• CPU and memory profiling
• VisualGC
• Heap analysis
Scripting Java
Purugin
• Nearly 100% Ruby wrapper
• Thin shim makes Java feel very Ruby-like
• It’s Minecraft!
Egg Madness
class EggMadnessPlugin
  include Purugin::Plugin
  description 'EggMadness', 0.1

  def on_enable
    event(:player_egg_throw) do |e|
      e.hatching = true
      e.num_hatches = 50
      e.hatching_type = :chicken
    end
  end
end
JRuby Roadmap
[Diagram: release branches. jruby-1_7 branch (Ruby 1.8, 1.9): 1.7.3, 1.7.4, 1.7.5, 1.7.6, ..., 1.7.7, 1.7.22, ..., 1.7.23. master branch (Ruby 2.2): 9.0.4. Callouts: "End of week", "Last Friday".]
JRuby 9000
• Ruby 2.2
• New runtime (IR)
• Major IO and Encodings overhaul
“It’s over 9000!!!!”
Now What?
PERFORMANCE WORK
(topic of this talk)
Now and into the not-so-distant future!
Recent Wins
• JITable blocks
• define_method performance
• Reduced-cost transient exceptions
Block Jitting
• JRuby 1.7 only jitted methods
• Not free-standing procs/lambdas
• Not define_method blocks
• Easier to do now with 9000's IR
• Blocks JIT now in 9.0.4.0
Jitting is Winning
[Chart: performance of define_method in a loaded file, run as ruby -e 'load "bench_define_method.rb"'. Iterations per second (0k to 3000k) for MRI, JRuby 9.0.1.0, and JRuby 9.0.4.0; series: normal method, define_method method.]
define_method
Convenient for metaprogramming,
but blocks have more overhead than methods.
define_method(:add) do |a, b|
  a + b
end

names.each do |name|
  define_method(name) { send :"do_#{name}" }
end
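A minimal sketch of how the overhead of a define_method method relative to a normal method could be measured. This is not the talk's bench_define_method.rb, whose contents are not shown; the class `Adder`, its method names, and the iteration count are hypothetical.

```ruby
require 'benchmark'

class Adder
  # Normal method.
  def add_def(a, b)
    a + b
  end

  # Same body, defined from a block.
  define_method(:add_block) do |a, b|
    a + b
  end
end

adder = Adder.new
iterations = 200_000

Benchmark.bm(14) do |bm|
  bm.report('def')           { iterations.times { adder.add_def(1, 2) } }
  bm.report('define_method') { iterations.times { adder.add_block(1, 2) } }
end
```

Absolute numbers depend heavily on the runtime, its version, and warm-up, so the ratio between the two rows is the more useful figure.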
:-(
[Chart: iterations per second (0k to 4000k) for MRI and JRuby 9.0.1.0; series: def, define_method, define_method w/ capture.]
Optimizing define_method
• Noncapturing
• Treat as method in compiler
• Ignore surrounding scope
• Capturing (future work)
• Lift read-only variables as constant
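To make the two cases concrete, here is a small sketch of a noncapturing and a capturing define_method block. The class and method names are hypothetical. In the capturing case, `prefix` is only read by the block, which is the kind of variable the last bullet describes as a candidate for lifting.

```ruby
class Greeter
  # Noncapturing: the block uses only its own parameter,
  # so nothing from the surrounding scope is needed.
  define_method(:shout) { |word| word.upcase }

  # Capturing: the block reads `prefix`, a local variable
  # from the surrounding scope.
  prefix = 'Hello, '
  define_method(:greet) { |name| prefix + name }
end

g = Greeter.new
puts g.shout('jruby')  # prints "JRUBY"
puts g.greet('world')  # prints "Hello, world"
```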
Getting Better!
[Chart: iterations per second (0k to 4000k) for MRI, JRuby 9.0.1.0, and JRuby 9.0.4.0; series: def, define_method, define_method w/ capture.]
Reduced-cost Exceptions
• Backtrace cost is VERY high on JVM
• Heavily optimized, lots of work to build
• Exceptions frequently ignored
• ...or used as flow control (shame!)
• If ignored, backtrace is not needed!
Postfix Antipattern
foo rescue nil
Exception raised
StandardError rescued
Exception ignored
Result is simple expression, so exception is never visible.
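A short sketch of what the postfix form amounts to. The rescue modifier behaves like a begin/rescue StandardError whose handler never binds the exception to a variable, so no code can ever ask it for a backtrace. The method names are hypothetical.

```ruby
# Postfix form: any StandardError becomes nil.
def to_int_postfix(text)
  Integer(text) rescue nil
end

# Equivalent long form: the exception object is never bound or inspected.
def to_int_explicit(text)
  begin
    Integer(text)
  rescue StandardError
    nil
  end
end

p to_int_postfix('42')     # prints 42
p to_int_postfix('oops')   # prints nil
p to_int_explicit('oops')  # prints nil
```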
csv.rb Converters
Converters = { integer: lambda { |f|
                 Integer(f.encode(ConverterEncoding)) rescue f
               },
               float:   lambda { |f|
                 Float(f.encode(ConverterEncoding)) rescue f
               },
               ...
All trivial rescues, no traces needed.
Simple rescue Improvement
[Chart: iterations per second, linear scale (0 to 600,000). Data labels: 524,475 and 10,700.]
Nearly Two Orders of Magnitude!
[Chart: the same data on a log scale (1 to 1,000,000). Data labels: 524,475 and 10,700.]
New Runtime?
• AST to semantic representation
• Traditional Compiler Design
• Wanted Architectural longevity

[Diagram: compiler pipeline. Stages: Lexical Analysis, Parsing, Semantic Analysis, Optimization, Bytecode Generation, Interpret, Dalvik Generation, ... Data between stages: AST, IR Instructions, CFG, DFG, ... Labels: JRuby 1.7.x, 9000+.]
def foo(a, b)
  c = 1
  d = a + c
end
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
2 b = recv_pre_reqd_arg(1)
3 %block = recv_closure
4 thread_poll
5 line_num(1)
6 c = 1
7 line_num(2)
8 %v_0 = call(:+, a, [c])
9 d = copy(%v_0)
10 return(%v_0)
Register-based
3 address format
IR Instructions
Semantic Analysis
-Xir.passes=LocalOptimizationPass,DeadCodeElimination
Optimization
def foo(a, b)
  c = 1
  d = a + c
end
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
5 line_num(1)
6 c = 1
7 line_num(2)
8 %v_0 = call(:+, a, [c])
9 d = copy(%v_0)
10 return(%v_0)
Constant propagation: the literal 1 replaces c in the call at instruction 8, which leaves instruction 6 (c = 1) dead:
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
5 line_num(1)
7 line_num(2)
8 %v_0 = call(:+, a, [1])
9 d = copy(%v_0)
10 return(%v_0)
With c = 1 gone, instruction 5 (line_num(1)) marks a line with no remaining instructions and is removed as well:
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
7 line_num(2)
8 %v_0 = call(:+, a, [1])
9 d = copy(%v_0)
10 return(%v_0)
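The two passes named above can be illustrated on a toy three-address IR. This is only a sketch of the general techniques (local constant propagation followed by backward dead code elimination on straight-line code), not JRuby's implementation. The `Instr` struct, the op names, and the `PURE_OPS` list are assumptions made for the example, and the toy is more aggressive than the listing above: it also drops the unused `d = copy(%v_0)`.

```ruby
# Toy IR: variables are Symbols, literals are Integers, method names are Strings.
Instr = Struct.new(:op, :result, :args)

# Ops assumed (for this sketch only) to be removable when their result is unused.
PURE_OPS = %i[copy recv_arg recv_closure].freeze

# Replaces uses of variables known to hold a literal with that literal.
def propagate_constants(instrs)
  constants = {}
  instrs.map do |ins|
    args = ins.args.map { |a| constants.fetch(a, a) }
    if ins.op == :copy && args.first.is_a?(Integer)
      constants[ins.result] = args.first
    else
      constants.delete(ins.result)  # result no longer holds a known literal
    end
    Instr.new(ins.op, ins.result, args)
  end
end

# Walks backward, keeping side-effecting instructions and any pure
# instruction whose result is still needed by a later kept instruction.
def eliminate_dead_code(instrs)
  live = {}
  kept = instrs.reverse.select do |ins|
    needed = !PURE_OPS.include?(ins.op) || live[ins.result]
    if needed
      live.delete(ins.result)
      ins.args.each { |a| live[a] = true if a.is_a?(Symbol) }
    end
    needed
  end
  kept.reverse
end

program = [
  Instr.new(:recv_arg, :a, [0]),
  Instr.new(:recv_arg, :b, [1]),
  Instr.new(:copy, :c, [1]),
  Instr.new(:call, :v0, ['+', :a, :c]),
  Instr.new(:copy, :d, [:v0]),
  Instr.new(:return, nil, [:v0])
]

optimized = eliminate_dead_code(propagate_constants(program))
optimized.each do |ins|
  lhs = ins.result ? "#{ins.result} = " : ''
  puts "#{lhs}#{ins.op}(#{ins.args.join(', ')})"
end
```

Running propagation first matters: only after `c` is replaced by 1 in the call does `c = 1` become dead and removable.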
Inlining
• 500 pound gorilla of optimizations
• shove method/closure back to callsite
• eliminate stack frame
• eliminate parameter passing/return
• eliminate additional allocation
Optimization
Today’s Inliner
def decrement_one(i)
  i - 1
end
i = 1_000_000
while i > 0
  i = decrement_one(i)
end

# After inlining (guard_same? is pseudocode for the inliner's guard)
def decrement_one(i)
  i - 1
end
i = 1_000_000
while i > 0
  if guard_same? self
    i = i - 1
  else
    i = decrement_one(i)
  end
end
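The guarded shape can be emulated in plain Ruby to show its structure. This sketch is not how JRuby implements its guard: it assumes a comparison of Method objects as a stand-in, and the names `cached_target` and `countdown` are hypothetical. The guard here costs more than the call it avoids, so it illustrates the control flow only, not the speedup.

```ruby
def decrement_one(i)
  i - 1
end

# Remember which definition was current when the body was "inlined".
cached_target = method(:decrement_one)

def countdown(start, cached_target)
  i = start
  while i > 0
    if method(:decrement_one) == cached_target
      i = i - 1             # fast path: inlined body of decrement_one
    else
      i = decrement_one(i)  # slow path: definition changed, do a real call
    end
  end
  i
end

puts countdown(1_000, cached_target)  # prints 0
```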
Numeric Specialization
• Everything's an object
• JVM has only references and primitives
• Not compatible in bytecode
• Need to optimize numerics as primitive
def looper(n)
  i = 0
  while i < n
    do_something(i)
    i += 1
  end
end
Cached object
Call with i
New Fixnum i + 1
Probably a Fixnum?
def looper(long n)
  long i = 0
  while i < n
    do_something(i)
    i += 1
  end
end
Specialize n, i to long
Deopt to object version if n or i + 1 is not Fixnum
JVM Futures
• We're good friends with OpenJDK folks
• Working to improve JVM as well
• FFI being added at JVM level
• AOT compilation for startup perf
FFI in JVM
• Project Panama (JEP-191)
• Native support for FFI
• Code generators for binding
• JIT support for calling
• API support for userland
[Diagram: three call paths from Java to C/native]
User Code → JNI call → JNI impl → Target Library
User Code → JNR stub → JNI call → JNI impl → libffi → Target Library
User Code → Panama → Target Library (JIT knows about both sides)
JIT Magic
callq <getpid address> ; - libSystem.B.dylib
;*invokeinterface getpid
; - GetPidJNRExample::benchGetPid@13 (line 26)
; {optimized virtual_call}
Direct call from JITed Ruby code
Startup Time
• By far our greatest challenge
• Everything starts cold: parser, interpreter, compiler, core classes, boot logic
• Increasing amount of Ruby in JRuby
• Aggravates the problem
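A rough sketch of how the simplest startup measurement (running `-e 1`) could be scripted. It times a fresh interpreter process and keeps the best of a few runs. `RbConfig.ruby` is assumed to point at the interpreter running the script; the helper name `startup_seconds` and the run count are hypothetical.

```ruby
require 'benchmark'
require 'rbconfig'

# Times a fresh interpreter process that evaluates a trivial script.
# Returns the fastest of `runs` attempts, in seconds.
def startup_seconds(ruby = RbConfig.ruby, runs = 3)
  times = Array.new(runs) do
    Benchmark.realtime { system(ruby, '-e', '1') }
  end
  times.min
end

puts "startup: #{startup_seconds.round(3)}s"
```

Passing a different interpreter path as the first argument allows the same script to compare implementations.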
JRuby Startup
[Chart: time in seconds (lower is better; axis 0s to 14s) for C Ruby and JRuby on three tasks: -e 1, gem --list, rake -T in Rails app.]
--dev
• Disables JRuby JIT
• Sets JVM to reduced optimization mode
• 50% reduction in startup time
• Much lower peak perf
JRuby --dev
[Chart: time in seconds (lower is better; axis 0s to 14s) for C Ruby, JRuby, and JRuby --dev on three tasks: -e 1, gem --list, rake -T in Rails app.]
AOT
• Precompile JVM bytecode to native
• Focus on hot code
• Save original structure for optimization
• Get JRuby running native right away
• AOT compile Ruby to native in future
Getting There
[Chart: rake -T in Rails app. Time in seconds (lower is better; axis 0s to 14s) for C Ruby, JRuby, JRuby --dev, Non-opto AOT, and Opto AOT.]
AOT Future
• AOT might be available in Java 9
• Many tweaks we can make to help it
• Ideal: all code run at boot runs native
• Should get closer to MRI
Thank You
• @headius
• @tom_enebo
• http://jruby.org