Inside the super-fast CSS engine: Quantum CSS (aka Stylo)
You've probably heard of Project Quantum, a major rework of Firefox's internals to make the browser faster. We are incorporating pieces of our experimental browser, Servo, and making significant improvements to the rest of the engine.
The project has been compared to replacing a jet engine while the plane is still flying. We are making the changes to Firefox component by component, so you can see the effect of each one in a new release of Firefox as soon as it is ready.
The first major component from Servo, the new CSS engine Quantum CSS (formerly known as Stylo), is now available for testing in Firefox Nightly. You can enable it by flipping the layout.css.servo.enabled option in about:config.
The new engine embodies the best innovations from other browsers.
Quantum CSS takes advantage of modern hardware by parallelizing the work across all of the processor's cores, which can produce speedups of 2, 4, or even 18 times.
On top of that, it combines state-of-the-art optimizations from other browsers, so even without parallelization it is very fast.
But what exactly does the CSS engine do? First, let's look at what a CSS engine is and where it fits in the browser. Then we can look at how Quantum CSS makes it all faster.
What is a CSS engine?

The CSS engine is part of the browser's rendering engine. The rendering engine takes a site's HTML and CSS files and turns them into pixels on the screen.
Each browser has a rendering engine. Chrome has Blink, Edge has EdgeHTML, Safari has WebKit, and Firefox has Gecko.
To turn those files into pixels, all rendering engines do roughly the same things:
1) Parse the files into objects the browser understands, including the DOM. At this stage, the DOM knows the structure of the page and the parent-child relationships between elements, but it does not know what those elements should look like.
2) Figure out what the elements should look like. For each DOM node, the CSS engine works out which CSS rules apply to it, then determines a value for every CSS property. It styles each node in the DOM tree by attaching these computed styles to it.
3) Determine the size and position of each node. Boxes are created for everything that will show up on the screen. The boxes represent not just DOM nodes, but also things inside them, such as lines of text.
4) Paint the boxes. This can happen on several layers. I think of it like old hand-drawn animation, with the layers drawn on sheets of translucent paper. That makes it possible to change one layer without having to repaint the others.
5) Composite the layers into a single image, applying compositor-only properties (such as transforms) along the way. It is like taking a photo of the stacked layers. That image is then shown on the screen.
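The five steps above can be sketched as a toy pipeline. Everything here, from the function names to the data shapes, is illustrative only; a real rendering engine is vastly more complex.

```python
# A toy model of the five stages: parse -> style -> layout -> paint -> composite.

def parse(html):
    # Stage 1: turn markup into a tree. Here we fake a one-node DOM.
    return {"tag": "p", "text": html, "children": []}

def compute_style(node):
    # Stage 2: attach a computed style to the node.
    node["style"] = {"color": "black", "font-size": 16}
    return node

def layout(node):
    # Stage 3: create a box with a size and position for the node.
    node["box"] = {"x": 0, "y": 0, "w": 100, "h": node["style"]["font-size"]}
    return node

def paint(node):
    # Stage 4: paint the box onto a layer (here, a list of draw commands).
    return [("fill_text", node["text"], node["box"], node["style"]["color"])]

def composite(layers):
    # Stage 5: flatten all layers into the final display list.
    return [cmd for layer in layers for cmd in layer]

display_list = composite([paint(layout(compute_style(parse("hello"))))])
print(display_list[0][0], display_list[0][1])  # fill_text hello
```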
So, by the time style computation starts, the CSS engine has two things as input:

The DOM tree
A list of style rules
It then works through the DOM nodes one by one, computing the styles for each. Every CSS property gets assigned a value, even if it is never mentioned in a style sheet.
I think of it as filling out a form where every field is required, and a form like this has to be filled out for every DOM node.
To fill out the form, the CSS engine has to do two things:
Figure out which rules apply to the node (selector matching)
Fill in any missing values with defaults or with values inherited from the parent (the cascade)
Selector matching

First, the engine finds all the rules in the list that match the node. Since multiple rules can match, there may be several declarations of the same property.
On top of that, the browser itself adds some default styles (user agent style sheets). So how does the CSS engine decide which value to use?
This is where specificity comes in. The CSS engine builds a table of the declarations and sorts it by various columns.
The most specific rule wins. Using this table, the engine fills in the values it can on the form.
The rest are filled in using the cascade.
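Selector matching and specificity sorting can be sketched like this. The selector grammar here is restricted to single simple selectors, and the specificity tuple is the usual (inline, ids, classes, types) ordering; all names and data are made up for illustration.

```python
# A simplified model of selector matching plus specificity sorting.
# Specificity is compared as a tuple: (inline, #ids, .classes, types).

def specificity(selector):
    if selector.startswith("#"):
        return (0, 1, 0, 0)
    if selector.startswith("."):
        return (0, 0, 1, 0)
    return (0, 0, 0, 1)  # a plain type selector like "p"

def matches(selector, node):
    if selector.startswith("#"):
        return node.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in node.get("classes", ())
    return node["tag"] == selector

def declared_value(prop, node, rules):
    # Collect every matching declaration of `prop`; the one with the
    # highest specificity wins, and later rules win ties.
    candidates = [
        (specificity(sel), position, decl[prop])
        for position, (sel, decl) in enumerate(rules)
        if prop in decl and matches(sel, node)
    ]
    return max(candidates)[2] if candidates else None

rules = [
    ("p",        {"color": "black"}),
    (".warning", {"color": "orange"}),
    ("#notice",  {"color": "red"}),
]
node = {"tag": "p", "id": "notice", "classes": ["warning"]}
print(declared_value("color", node, rules))  # all three match; "#notice" wins: red
```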
The cascade

The cascade makes CSS easier to write and maintain. Thanks to it, you can set the color property on the body and know that the text color in p, span, and li elements will be the same (unless you override it yourself).
To do this, the CSS engine looks at the blank fields on the form. If the property is inherited by default, the engine walks up the tree and checks whether any ancestor has set a value for it. If none of the ancestors defines a value, or if the property is not inherited, the default value is used.
Now all the styles for this DOM node have been computed; the form is filled out.
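The inheritance step can be sketched as a walk up the ancestor chain. The sets of inherited properties and initial values below are tiny illustrative stand-ins, not the real CSS property tables.

```python
# A sketch of the cascade's inheritance step: if a property has no
# declared value, walk up the ancestors when the property is inherited,
# otherwise fall back to the initial (default) value.

INHERITED = {"color", "font-size"}  # inherited-by-default (illustrative subset)
INITIAL = {"color": "black", "font-size": "16px", "margin": "0"}

def computed_value(prop, node):
    declared = node["declared"].get(prop)
    if declared is not None:
        return declared
    if prop in INHERITED and node["parent"] is not None:
        return computed_value(prop, node["parent"])
    return INITIAL[prop]

body = {"declared": {"color": "green"}, "parent": None}
p    = {"declared": {},                "parent": body}
span = {"declared": {},                "parent": p}

print(computed_value("color", span))   # inherited from body: green
print(computed_value("margin", span))  # not inherited: initial value 0
```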
Note: style struct sharing

The form described above is a bit of a simplification. CSS has hundreds of properties. If the CSS engine stored a value for every property on every DOM node, it would quickly run out of memory.
Instead, engines usually use a mechanism called style struct sharing. Values that tend to be used together (for example, the font properties) are stored in a separate object called a style struct. Rather than keeping every property in one object, a computed style object holds pointers: one per category of properties, each pointing at the style struct with the right values.
This saves both memory and time. Nodes with similar styles can simply point to the same style structs for the properties they share. And because many properties are inherited, an ancestor can share a struct with any descendant that does not override its values.
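The pointer-sharing idea can be shown in a few lines. The class names and the split into a single "font" struct are illustrative; real engines partition properties into many more structs.

```python
# A sketch of style struct sharing: computed styles hold pointers to
# shared structs instead of storing every property value themselves.

class FontStruct:
    def __init__(self, family, size):
        self.family = family
        self.size = size

class ComputedStyle:
    def __init__(self, font, **overrides):
        self.font = font            # pointer to a shared style struct
        self.overrides = overrides  # node-specific values, if any

shared_font = FontStruct("serif", 16)

# Three paragraphs with the same font properties: one struct in memory,
# three pointers to it.
p1 = ComputedStyle(shared_font)
p2 = ComputedStyle(shared_font)
p3 = ComputedStyle(shared_font, color="red")

print(p1.font is p2.font is p3.font)  # True: a single struct is shared
```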
So how do we speed this up?

That is what style computation looks like when it has not been optimized.
There is a lot of work going on here, and not only on the first page load. Styles are recalculated again and again as you interact with the page, hovering over elements or making changes to the DOM.
This makes CSS style computation an excellent candidate for optimization, and browsers have been testing different optimization strategies for the past 20 years. Quantum CSS tries to combine the best of them to create a new super-fast engine.
Let's look at how these techniques work together.
Parallelization

The Servo project (which Quantum CSS comes from) is an experimental browser that tries to parallelize everything involved in rendering a web page. What does that mean?
A computer is like a brain. There is a part that does the thinking (the ALU), and near it something like short-term memory (the registers); these are grouped together on the CPU. There is also longer-term memory (RAM).
Early computers could only think one thing at a time. But over the past decades, CPUs have changed: they now have several ALUs and register groups bundled into cores, so the processor can think multiple thoughts at once, in parallel.
Quantum CSS makes use of this by splitting the style computation for different DOM nodes across the different cores.
This might seem easy: just split up the branches of the tree and process them on different cores. In reality it is much harder, for several reasons. One reason is that DOM trees are often lopsided, which means some cores would end up with much more work than others.
To balance the work more evenly, Quantum CSS uses a technique called work stealing. When a DOM node is being processed, the code takes its direct children and splits them into one or more work units, which get pushed onto a queue.
When a core finishes all the work in its own queue, it can look in the other queues for more work to do. This spreads the work out evenly without having to walk the whole tree ahead of time to estimate how to balance it.
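The queue-per-worker scheme can be sketched as follows. This is only a toy: it uses one coarse lock and Python threads, whereas real work-stealing schedulers (like the one in Servo) use lock-free deques and genuinely parallel threads. All names here are made up.

```python
# A sketch of work stealing over a DOM-like tree: each worker has its
# own deque of work units and steals from other workers when idle.
import threading
from collections import deque

def make_tree(depth, fanout):
    children = [make_tree(depth - 1, fanout) for _ in range(fanout)] if depth else []
    return {"children": children}

def restyle(tree, num_workers=4):
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(tree)
    lock = threading.Lock()
    pending = [1]  # work units queued or currently being processed
    styled = [0]   # how many nodes have been "styled"

    def worker(my_id):
        while True:
            node = None
            with lock:
                if pending[0] == 0:
                    return                      # all work is finished
                if queues[my_id]:
                    node = queues[my_id].pop()  # take from my own queue
                else:
                    for q in queues:            # otherwise steal from another
                        if q:
                            node = q.popleft()
                            break
            if node is None:
                continue  # spin until work appears or everything finishes
            with lock:
                # "Compute style" for this node, then push its children
                # onto my own queue as new work units.
                styled[0] += 1
                queues[my_id].extend(node["children"])
                pending[0] += len(node["children"]) - 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return styled[0]

print(restyle(make_tree(3, 3)))  # 1 + 3 + 9 + 27 = 40 nodes styled
```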
In most browsers it would be hard to get this right. Parallelism is a notoriously difficult problem, and the CSS engine is very complex in its own right. It also sits between the two other most complex parts of the rendering engine, the DOM and layout. So it would be easy to introduce a bug, and parallelism can lead to extremely hard-to-find bugs called data races. I describe these bugs in more detail in another article.
If you accept contributions from hundreds or thousands of contributors, how can you program in parallel without fear? That is what we have Rust for.
Rust lets you statically verify that you do not have data races. That means you avoid these hard-to-find bugs by simply not letting them into your code in the first place; the compiler will not allow it. I will write more about this in future articles. In the meantime, you can watch an introductory video about parallelism in Rust or a more detailed talk about work stealing.
This makes things much easier. Now there is almost nothing stopping us from computing CSS styles efficiently in parallel, which means we can get close to linear speedups. On a 4-core processor, parallelization gives nearly a 4x speedup.
Speeding up recalculation with the rule tree

For each DOM node, the CSS engine needs to go through all of the rules and do selector matching. For most nodes, the set of matching rules is unlikely to change very often. For example, when the user hovers over an element, the rules that match it may change, and we still need to recompute styles for all of its descendants to handle property inheritance, but the rules that match those descendants will probably stay the same.
It would be nice if we could note down which rules match those descendants, so we do not have to do selector matching for them again. That is exactly what the rule tree, which came from earlier versions of Firefox, does.
The CSS engine figures out which selectors match the element and then sorts them by specificity. The result is a linked list of rules.
This list is added to the tree.
The CSS engine tries to keep the number of branches in the tree to a minimum by reusing them wherever possible.
If most of the selectors in the list match an existing branch, it follows that branch. But it may reach a point where the next rule in the list is not in the branch; only then is a new branch created.
The DOM node gets a pointer to the rule that was added last (div#warning in this example). It is the most specific one.
When styles are recomputed, the engine does a quick check to see whether the change to the parent could affect the rules that match its children. If not, then for all of the descendants the engine can simply follow the pointer to the right rule in the tree, skipping selector matching and sorting entirely.
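The branch-sharing idea behind the rule tree can be sketched like this. The class and rule names are illustrative, and rules are represented as plain strings for simplicity.

```python
# A sketch of the rule tree: each node's specificity-sorted rule list
# becomes a path in a shared tree, so nodes with a common prefix of
# rules share branches instead of storing separate lists.

class RuleNode:
    def __init__(self, rule, parent=None):
        self.rule = rule        # the style rule at this point in the path
        self.parent = parent    # the next, less specific rule
        self.children = {}

ROOT = RuleNode(None)

def insert(sorted_rules):
    """Insert a specificity-sorted rule list into the tree and return
    the last (most specific) node; the DOM node keeps a pointer to it."""
    node = ROOT
    for rule in sorted_rules:
        if rule not in node.children:
            node.children[rule] = RuleNode(rule, node)  # new branch only here
        node = node.children[rule]
    return node

# Two elements whose rule lists share a prefix reuse the same branch.
a = insert(["html", "body", "div"])
b = insert(["html", "body", "div", "div#warning"])

print(b.parent is a)  # True: b's list extends a's, so the branch is reused
```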
So the rule tree helps save time on restyles, but the initial styling is still time-consuming. If you have 10,000 nodes, you still have to do selector matching 10,000 times. But there is a way to speed that up too.
Speeding up initial render with the style sharing cache

Think about a page with thousands of nodes. Many of them will match the same rules. For example, think of a long Wikipedia page: the paragraphs in the main content area should all have exactly the same style rules and exactly the same computed styles.
Without optimization, the CSS engine would have to match selectors and compute styles for each paragraph individually. But if there were a way to prove that the styles of all those paragraphs are the same, the engine could do the work just once and have each paragraph node simply point to the same computed style.
That is what the style sharing cache, inspired by Safari and Chrome, does. After an element is processed, its computed style goes into the cache. Then, before starting to compute styles for the next element, a few checks are run to see whether something from the cache can be reused.
The checks are as follows:
Do the two nodes have the same ids, classes, and so on? If so, they match the same rules.
Do they have the same values for anything that is not determined by selectors (for example, inline styles)? If so, the rules above will either not be overridden, or will be overridden identically for both.
Do both of their parents point to the same computed style object? If so, the inherited values will be the same as well.
Checks like these have been in style sharing caches from the very beginning. But there are many small cases where the styles still might not match. For example, if a CSS rule uses the :first-child selector, two paragraphs might end up with different styles even when the checks above suggest otherwise.
In WebKit and Blink, the style sharing cache simply gives up in these cases and is not used. As more sites use these modern selectors, the optimization became less and less useful, so the Blink team recently removed it entirely. But it turns out there is a way to keep the style sharing cache working despite these changes.
In Quantum CSS, we gather up all of those tricky selectors and check whether they apply to the DOM node. Then we store the answers as ones and zeros for each such selector. If two elements have an identical set of ones and zeros, we know they definitely match.
If a DOM node can use styles that have already been computed, almost all of the work is skipped. Pages often have many nodes with the same styles, so the style sharing cache saves memory and really speeds things up.
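The ones-and-zeros idea can be sketched as a small bit pattern per node. The selector set and node representation below are made up for illustration; in the real engine these answers feed into the cache checks listed above rather than replacing them.

```python
# A sketch of recording the "tricky" selectors as ones and zeros: each
# selector is evaluated against the node and the answers form a tuple.
# Two nodes may only share a computed style if their patterns match.

TRICKY_SELECTORS = [
    lambda node: node["index_in_parent"] == 0,      # stand-in for :first-child
    lambda node: node["index_in_parent"] % 2 == 0,  # stand-in for :nth-child(odd)
]

def revalidation_bits(node):
    return tuple(int(sel(node)) for sel in TRICKY_SELECTORS)

def can_share(a, b):
    # A real engine also checks ids, classes, inline styles, and that
    # both parents point at the same computed style object.
    return revalidation_bits(a) == revalidation_bits(b)

p1 = {"index_in_parent": 1}
p2 = {"index_in_parent": 3}
p3 = {"index_in_parent": 0}

print(can_share(p1, p2))  # True: same answer for every tricky selector
print(can_share(p1, p3))  # False: p3 is a first child, p1 is not
```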
Conclusion

This is the first major technology transfer from Servo to Firefox. Along the way, we have learned a lot about bringing modern, high-performance code written in Rust into the core of Firefox.
We are very excited to have this big piece of Project Quantum ready for users to try first-hand. We would be happy for you to try it out, and to hear from you if you run into any problems.