Carakan

Over the past few months, a small team of developers and testers have been working on implementing a new ECMAScript/JavaScript engine for Opera. When Opera’s current ECMAScript engine, called Futhark, was first released in a public version, it was the fastest engine on the market. That engine was developed to minimize code footprint and memory usage, rather than to achieve maximum execution speed. This has traditionally been a correct trade-off on many of the platforms Opera runs on. The Web is a changing environment however, and tomorrow’s advanced web applications will require faster ECMAScript execution, so we have now taken on the challenge to once again develop the fastest ECMAScript engine on the market.

The name Carakan, like the names of Opera’s previous ECMAScript engines, Futhark, Linear A and Linear B, is the name of a writing system, or “script”.

We have focused our efforts to improve upon our previous engine in three main areas:

Register-based bytecode

The last couple of generations of Opera’s ECMAScript engine have used a stack-based bytecode instruction set. This type of instruction set is based around a stack of values, where most instructions “pop” input operands from the value stack, process them, and “push” the result back onto the value stack. Some instructions simply push values onto the value stack, and others rearrange the values on the stack. This gives compact bytecode programs and is easy to generate bytecode for.

In the new engine, we’ve instead opted for a register-based bytecode instruction set. In a register-based machine, instead of a dynamically sized stack of values, there’s a fixed size block of them, called “registers”. Instead of only looking at the values at the top of the stack, each instruction can access any register. Since there is no need to copy values to and from the top of the stack to work on them, fewer instructions need to be executed, and less data needs to be copied.

Native code generation

Although our new engine’s bytecode instruction set permits the implementation of a significantly faster bytecode execution engine, there is still significant overhead involved in executing simple ECMAScript code, such as loops performing integer arithmetics, in a bytecode interpreter. To get rid of this overhead we are implementing compilation of whole or parts of ECMAScript programs and functions into native code.

This native code compilation is based on static type analysis (with an internal type system that is richer than ECMAScript’s ordinary one) to eliminate unnecessary type-checks, speculative specialization (with regards to statically indeterminate types) where appropriate, and a relatively ambitious register allocator that allows generation of compact native code with as few unnecessary inter-register moves and memory accesses as possible.

On ECMAScript code that is particularly well-suited for native code conversion, our generated native code looks more or less like assembly code someone could have written by hand, trying to keep everything in registers.

The register allocator is designed to be architecture independent, and so is the code generation “frontend” where most of the complicated decisions are made. We have initially implemented a backend for 32- and 64-bit x86, but some preliminary work has already started to generate native code for other CPU architectures, such as ARM.

In addition to generating native code from regular ECMAScript code, we also generate native code that performs the matching of simple regular expressions. This improves performance a lot when searching for matches of simple regular expressions in long strings. For sufficiently long strings, this actually makes searching for a substring using a regular expression faster than the same search using String.prototype.indexOf. For shorter strings, the speed is limited by the overhead of compiling the regular expression.

For more complex regular expressions, native code generation becomes more involved, and improves performance less since the regular expression engine is already reasonably fast. The base regular expression engine is a newly developed one, but it’s already made its debut in Presto 2.2 (the browser engine used in Opera 10 Alpha). It’s fundamentally a typical backtracking regular expression engine, but does some tricks to avoid redundant backtracking, which usually avoids the severe performance issues a backtracking regular expression engine can have on specific regular expressions.

Automatic object classification

Another area of major improvement over our current engine is in the representation of ECMAScript objects. In the new engine, each object is assigned a class that collects various information about the object, such as its prototype and the order and names of some or all of its properties. Class assignment is naturally very dynamic, since ECMAScript is a very dynamic language, but it is organized such that objects with the same prototype and the same set of properties are assigned the same class.

This representation allows compact storage of individual objects, since most of the complicated structures representing the object’s properties are stored in the class, where they are shared with all other objects with the same class. In real-world programs with many objects of the same classes, this can save significant amounts of memory. It can be expected that most programs that do create many objects still only have a few different classes of objects.

The shared lookup structures for properties also allow us to share the result of looking up a property between different objects. For two objects with the same class, if the lookup of a property “X” on the first object gave the result Y, we know that the same lookup on the second object will also give the result Y. We use this to cache the result of individual property lookups in ECMAScript programs, which greatly speeds up code that contains many property reads or writes.

Performance

So how fast is Carakan? Using a regular cross-platform switch dispatch mechanism (without any generated native code) Carakan is currently about two and a half times faster at the SunSpider benchmark than the ECMAScript engine in Presto 2.2 (Opera 10 Alpha). Since Opera is ported to many different hardware architectures, this cross-platform improvement is on its own very important.

The native code generation in Carakan is not yet ready for full-scale testing, but the few individual benchmark tests that it is already compatible with runs between 5 and 50 times faster, so it is looking promising so far.