Doubling the Number of Content Processes in Firefox

Over the past year, the Fission MemShrink project has been working tirelessly to reduce the memory overhead of Firefox. The goal is to allow us to start spinning up more processes while still maintaining a reasonable memory footprint. I’m happy to announce that we’ve seen the fruits of this labor: as of version 66 we’re doubling the default number of content processes from 4 to 8.

Doubling the number of content processes is the logical extension of the e10s-multi project. Back when that project wrapped up we chose to limit the default number of processes to 4 in order to balance the benefits of multiple content processes — fewer crashes, better site isolation, improved performance when loading multiple pages — with the impact on memory usage for our users.

Our telemetry has looked really good: if we compare beta 59 (roughly when this project started) with beta 66, where we decided to let the increase ship to our regular users, we see virtually unchanged total memory usage at the 25th, median, and 75th percentiles and a modest 9% increase at the 95th percentile on Windows 64-bit.

Doubling the number of content processes and not seeing a huge jump is quite impressive. Even on our worst-case-scenario stress test — AWSY which loads 100 pages in 30 tabs, repeated 3 times — we only saw a 6% increase in memory usage when turning on 8 content processes when compared to when we started the project.

This is a huge accomplishment and I’m very proud of the loose-knit team of contributors who have pulled off some phenomenal feats to get us to this point. There have been some big wins, but really it’s the myriad of minor improvements that compounded into a large impact. These have ranged from delay-loading browser JavaScript code until it’s needed (or not loading it at all), to low-level changes that pack C++ data structures more efficiently, to large system-wide changes to how we generate the bindings that glue together our JavaScript and C++ code. You can read more about the background of this project and many of the changes in our initial newsletter and the follow-up.

While I’m pleased with where we are now, we still have a way to go to get our overhead down even further. Fear not, for we have quite a few changes in the pipeline, including a fork server to help further reduce memory usage on Linux and macOS, work to share font data between processes, and work to share more CSS data between processes. In addition to reducing overhead we now have a tab unloading feature in Nightly 67 that will proactively unload tabs when it looks like you’re about to run out of memory. So far the results in reducing the number of out-of-memory crashes are looking really good and we’re hoping to get that released to a wider audience in the near future.

Firefox memory usage with multiple content processes

This is a continuation of my Are They Slim Yet series; for background see my previous installment.

With Firefox’s next release, 54, we plan to enable multiple content processes — internally referred to as the e10s-multi project — by default. That means if you have e10s enabled we’ll use up to four processes to manage web content instead of just one.

My previous measurements found that four content processes are a sweet spot for both memory usage and performance. As a follow-up we wanted to run the tests again to confirm my conclusions and make sure that we’re testing on what we plan to release. Additionally I was able to work around our issues testing Microsoft Edge and have included both 32-bit and 64-bit versions of Firefox on Windows; 32-bit is currently our default, 64-bit is a few releases out.

The methodology for the test is the same as in previous runs: I used the atsy project to load 30 pages and measure the memory usage of the various processes that each browser spawns during that time.

Without further ado, the results:

Graph of browser memory usage, Chrome uses a lot.

So we continue to see Chrome leading the pack in memory usage across the board: 2.4X the memory of Firefox 32-bit and 1.7X that of Firefox 64-bit on Windows. IE 11 does well; in fact it was the only one to beat Firefox. Its successor Edge, the default browser on Windows 10, appears to be striving for Chrome-level consumption. On macOS 10.12 we see Safari going the Chrome route as well.

Browsers included are the default versions of IE 11 and Edge 38 on Windows 10, Chrome Beta 59 on all platforms, Firefox Beta 54 on all platforms, and Safari Technology Preview 29 on macOS 10.12.4.

Note: For Safari I had to run the test manually; they seem to have made some changes that cause all the pages from my test to be loaded in the same content process.

Are they slim yet, round 2

A year later let’s see how Firefox fares on Windows, Linux, and OSX with multiple content processes enabled.

Results

Graph comparing memory usage, chrome is still quite high

We can see that Firefox with four content processes fares better than Chrome on all platforms which is reassuring; Chrome is still about 2X worse on Windows and Linux. Our current plan is to only move up to four content processes, so this is great news.

Two content processes is still better than IE; with four we’re a bit worse. This is pretty impressive given that last year we were in the same position with one content process.

Surprisingly, on Mac Firefox is better than Safari with two content processes. Last year we used 2X the memory with just one process; now we’re on par with four content processes.

I included Firefox with eight content processes to keep us honest. As you can see we actually do pretty well, but I don’t think it’s realistic to ship with that many nor do we currently plan to. We already have or are adding additional processes such as the plugin process for Flash and the GPU process. These need to be taken into consideration when choosing how many content processes to enable and pushing to eight doesn’t give us much breathing room. Making sure we have measurements now is important; it’s good to know where we can improve.

Overall I feel solid about these numbers, especially considering where we were just a year ago. This bodes well for the e10s-multi project.

Test setup

This is the same setup as last year. I load the first 30 pages of the tp5 page set (a snapshot of Alexa top 100 websites from a few years ago), each in its own tab, with 10 seconds in between loads and 60 seconds of settle time at the end.

Note: There was a minor change to the setup to give each page a unique domain. At least Safari and Chrome are roughly doing process per domain, so just using different ports on localhost was not enough. A simple solution was to modify my /etc/hosts file to add localhost-<1-30> aliases.
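As a rough illustration, the aliases could be generated rather than typed by hand. This is a minimal sketch: the `hosts_aliases` helper is a hypothetical name, and pointing every alias at 127.0.0.1 is an assumption based on the pages being served from localhost.

```python
def hosts_aliases(count=30, address="127.0.0.1"):
    """Build /etc/hosts lines for localhost-1 .. localhost-<count>.

    The loopback address is an assumption; the post only says the
    aliases were added to /etc/hosts.
    """
    return [f"{address}\tlocalhost-{i}" for i in range(1, count + 1)]


# Print the lines so they can be reviewed before being appended to /etc/hosts.
for line in hosts_aliases():
    print(line)
```

Each page can then be requested from its own hostname, which is what browsers doing roughly process-per-domain need in order to spread the pages across content processes.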

Methodology

Measuring multiprocess browser memory usage is tricky. I’ve settled on a somewhat simple formula of:

total_memory = sum_uss(content processes) + sum_rss(parent processes)

Where a parent process is defined as anything that is not a content process (I’ll explain in a moment). Historically there was just one parent process that managed all other processes. This is still somewhat the case, but each browser has other executables it may run in addition to content processes. A content process has a slightly different definition per browser, but is generally “where the pages are loaded”. This is an oversimplification, but it’s good enough for now.

My definitions:

| Browser | Content definition | Example “parent” |
|---|---|---|
| Firefox | firefox processes launched with the -contentproc command line. | firefox without the -contentproc command line, plugin-process which is used for Flash, etc. |
| Chrome | chrome processes launched with the --type command line. | chrome without the --type command line, nacl_helper, etc. |
| Safari | WebContent processes. | Safari, SafariServices, SafariHistory, Webkit.Networking, etc. |
| IE | iexplore.exe processes launched with the /prefetch command line. | iexplore without the /prefetch command line. |
| Edge | MicrosoftEdgeCP.exe processes. | MicrosoftEdge.exe, etc. |

For Firefox this is a reasonable and fair measurement; for other browsers we might be undercounting memory by a bit. For example, Edge has a parent executable, MicrosoftEdge.exe, and a different content executable, MicrosoftEdgeCP.exe. Arguably we should measure the RSS of one of the MicrosoftEdgeCP.exe processes and the USS of the rest, so we’re probably undercounting. On the other hand, we might end up overcounting if the parent and content processes are sharing dynamic libraries. In future measurements I may tweak how we sum the memory, but for now I’d rather possibly undercount than worry about being unfair to other browsers.
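The definitions above can be expressed as a small classifier. This is only a sketch of the idea, not the code the atsy project uses: the `is_content_process` name, the lowercase browser keys, and the exact matching rules (argument prefix or process-name fragment) are assumptions made for illustration.

```python
# Markers taken from the definitions table; how they are matched is an assumption.
CONTENT_ARG_PREFIX = {
    "firefox": "-contentproc",
    "chrome": "--type",
    "ie": "/prefetch",
}
CONTENT_NAME_FRAGMENT = {
    "safari": "WebContent",
    "edge": "MicrosoftEdgeCP",
}


def is_content_process(browser, name, cmdline):
    """Return True if a process counts as a content process for `browser`.

    `name` is the process name and `cmdline` its argument list, with the
    executable itself in position 0.
    """
    prefix = CONTENT_ARG_PREFIX.get(browser)
    if prefix is not None:
        # Skip the executable and look for the marker argument.
        return any(arg.startswith(prefix) for arg in cmdline[1:])
    fragment = CONTENT_NAME_FRAGMENT.get(browser)
    if fragment is not None:
        return fragment in name
    raise ValueError(f"unknown browser: {browser}")
```

Anything the classifier rejects is treated as a “parent” and measured by RSS, which is why helper executables such as nacl_helper end up on the parent side of the sum.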

Raw numbers

| OS | Browser | Total Memory |
|---|---|---|
| Ubuntu 16.04 LTS | Chrome 54 (see note) | 1,478 MB |
| Ubuntu 16.04 LTS | Firefox 55 – 2 CP | 765 MB |
| Ubuntu 16.04 LTS | Firefox 55 – 4 CP | 817 MB |
| Ubuntu 16.04 LTS | Firefox 55 – 8 CP | 990 MB |
| macOS 10.12.3 | Chrome 59 | 1,365 MB |
| macOS 10.12.3 | Firefox 55 – 2 CP | 1,113 MB |
| macOS 10.12.3 | Firefox 55 – 4 CP | 1,215 MB |
| macOS 10.12.3 | Firefox 55 – 8 CP | 1,399 MB |
| macOS 10.12.3 | Safari 10.2 (see note) | 1,203 MB |
| Windows 10 | Chrome 59 | 1,382 MB |
| Windows 10 | Edge (see note) | N/A |
| Windows 10 | Firefox 55 – 2 CP | 587 MB |
| Windows 10 | Firefox 55 – 4 CP | 839 MB |
| Windows 10 | Firefox 55 – 8 CP | 905 MB |
| Windows 10 | IE 11 | 660 MB |
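To make the Chrome comparison concrete, the ratio of Chrome to Firefox with four content processes can be computed from the totals above. This is a small sketch; the `totals` dictionary and `chrome_ratio` helper are illustrative names, and the numbers are copied from the raw numbers table.

```python
# Totals in MB, copied from the raw numbers table.
totals = {
    "Ubuntu 16.04 LTS": {"Chrome": 1478, "Firefox 4 CP": 817},
    "macOS 10.12.3": {"Chrome": 1365, "Firefox 4 CP": 1215},
    "Windows 10": {"Chrome": 1382, "Firefox 4 CP": 839},
}


def chrome_ratio(os_name):
    """Chrome's total memory divided by Firefox's total with 4 content processes."""
    row = totals[os_name]
    return row["Chrome"] / row["Firefox 4 CP"]


for os_name in totals:
    print(f"{os_name}: Chrome uses {chrome_ratio(os_name):.2f}x Firefox 4 CP")
```

On these numbers the ratio comes out at roughly 1.8x on Ubuntu, 1.6x on Windows 10, and 1.1x on macOS.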

Browser Version Notes

  • Chrome 54 — aka chrome-unstable — was used on Ubuntu 16.04 LTS as that’s the latest branded version available (rather than Chromium)
  • Firefox Nightly 55 – 2 CP is Firefox with 2 content processes and one parent process, the default configuration for Nightly.
  • Firefox Nightly 55 – 4 CP is Firefox with 4 content processes and one parent process; this is a longer-term goal.
  • Firefox Nightly 55 – 8 CP is Firefox with 8 content processes and one parent process; this is aspirational, a good sanity check.
  • Safari Technology Preview 10.2 release 25 was used on macOS as that’s the latest branded version available (rather than WebKit nightly)
  • Edge was disqualified because it seemed to bypass the hosts file and wouldn’t load pages from unique domains. I can do measurements so I might revisit this, but it wouldn’t have been a fair comparison as-is.

Memory Usage of Firefox with e10s Enabled

Quick background

With the e10s project full steam ahead, likely to be enabled for many users in mid-2016, it seemed like a good time to measure the memory overhead of switching Firefox from a single-process architecture to a multi-process architecture. The concern here is simple: the more processes we have, the more memory we use. Starting Q4-2015 I began setting up a test to measure the memory usage of Firefox with a variable number of content processes.

Methodology

For the test I used a slightly modified version of the AWSY framework that I maintain for areweslimyet.com. This test runs through a sample pageset, the same one used in Talos perf testing, in an attempt to simulate a long-lived session.

The steps:

  1. Open Firefox configured to use N content processes.
  2. Measure memory usage.
  3. Open 100 URLs in 30 tabs, cycling through tabs once 30 are opened. Wait 10 seconds per tab.
  4. Measure memory usage.
  5. Close all tabs.
  6. Measure memory usage.

For this test I performed two iterations of these steps, reporting the startup memory usage from the first and the end-of-test memory usage (TabsOpen, TabsClosed) from the second.
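The tab cycling in step 3 can be sketched as follows. This is an illustration of the schedule only, assuming tabs are reused in round-robin order once all 30 are open; `tab_for_url` and `loads_per_tab` are hypothetical helpers, not functions from the AWSY framework.

```python
def tab_for_url(url_index, tab_count=30):
    """Map the i-th URL (0-based) to the tab that loads it.

    Assumes round-robin reuse: the first `tab_count` URLs each open a new
    tab, and later URLs replace the page in the oldest tab.
    """
    return url_index % tab_count


def loads_per_tab(url_count=100, tab_count=30):
    """Count how many pages each tab ends up loading over one pass."""
    counts = [0] * tab_count
    for i in range(url_count):
        counts[tab_for_url(i, tab_count)] += 1
    return counts


counts = loads_per_tab()
print(counts[:10], counts[10:])
```

Under that assumption the first 10 tabs load four pages each and the remaining 20 load three, and with 10 seconds per load a single pass spends 1,000 seconds on page loads alone.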

Note: Just summing the total memory usage of each Firefox process is not a useful metric as it will include memory shared between the main process and the content processes. For a more realistic baseline I chose to use a combination of RSS and USS (aka unique set size, private working bytes):

total_memory = RSS(parent_process) + sum(USS(content_processes))

For example if we had:

| Process | RSS | USS |
|---|---|---|
| parent | 100 | 50 |
| content_1 | 90 | 30 |
| content_2 | 95 | 40 |

total_memory = 100 + 30 + 40 = 170
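The same arithmetic as a minimal Python sketch. The `Proc` record and `total_memory` helper are illustrative names, not part of AWSY; the numbers are the ones from the example above.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    name: str
    rss: int
    uss: int
    is_parent: bool


def total_memory(procs):
    """RSS for the parent process, USS for every content process."""
    return sum(p.rss if p.is_parent else p.uss for p in procs)


procs = [
    Proc("parent", rss=100, uss=50, is_parent=True),
    Proc("content_1", rss=90, uss=30, is_parent=False),
    Proc("content_2", rss=95, uss=40, is_parent=False),
]

print(total_memory(procs))        # 170
print(sum(p.rss for p in procs))  # 285, the naive sum that double counts shared memory
```

The gap between 285 and 170 is the memory that would otherwise be counted once per process even though it is shared.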

Results

Note on memory checkpoints:

  • Settled: 30 seconds have passed since previous checkpoint.
  • ForceGC: We manually invoked garbage collection.
  • We list the memory usage for each checkpoint using 0, 1, 2, 4, 8 content processes.

Linux, 64-bit

| Checkpoint | 0 | 1 | 2 | 4 | 8 |
|---|---|---|---|---|---|
| Start | 190 MiB | 232 MiB | 223 MiB | 223 MiB | 229 MiB |
| StartSettled | 173 MiB | 219 MiB | 216 MiB | 219 MiB | 213 MiB |
| TabsOpen | 457 MiB | 544 MiB | 586 MiB | 714 MiB | 871 MiB |
| TabsOpenSettled | 448 MiB | 542 MiB | 582 MiB | 696 MiB | 872 MiB |
| TabsOpenForceGC | 415 MiB | 510 MiB | 560 MiB | 670 MiB | 820 MiB |
| TabsClosed | 386 MiB | 507 MiB | 401 MiB | 381 MiB | 381 MiB |
| TabsClosedSettled | 264 MiB | 359 MiB | 325 MiB | 308 MiB | 303 MiB |
| TabsClosedForceGC | 242 MiB | 322 MiB | 304 MiB | 285 MiB | 281 MiB |

Windows 7, 64-bit

32-bit Firefox

| Checkpoint | 0 | 1 | 2 | 4 | 8 |
|---|---|---|---|---|---|
| Start | 172 MiB | 212 MiB | 207 MiB | 204 MiB | 213 MiB |
| StartSettled | 194 MiB | 236 MiB | 234 MiB | 232 MiB | 234 MiB |
| TabsOpen | 461 MiB | 537 MiB | 631 MiB | 800 MiB | 1,099 MiB |
| TabsOpenSettled | 463 MiB | 535 MiB | 635 MiB | 808 MiB | 1,108 MiB |
| TabsOpenForceGC | 447 MiB | 514 MiB | 593 MiB | 737 MiB | 990 MiB |
| TabsClosed | 429 MiB | 512 MiB | 435 MiB | 333 MiB | 347 MiB |
| TabsClosedSettled | 356 MiB | 427 MiB | 379 MiB | 302 MiB | 306 MiB |
| TabsClosedForceGC | 342 MiB | 392 MiB | 360 MiB | 297 MiB | 295 MiB |

64-bit Firefox

| Checkpoint | 0 | 1 | 2 | 4 | 8 |
|---|---|---|---|---|---|
| Start | 245 MiB | 276 MiB | 275 MiB | 279 MiB | 295 MiB |
| StartSettled | 236 MiB | 290 MiB | 287 MiB | 288 MiB | 289 MiB |
| TabsOpen | 618 MiB | 699 MiB | 805 MiB | 1,061 MiB | 1,334 MiB |
| TabsOpenSettled | 625 MiB | 690 MiB | 795 MiB | 1,058 MiB | 1,338 MiB |
| TabsOpenForceGC | 600 MiB | 661 MiB | 740 MiB | 936 MiB | 1,184 MiB |
| TabsClosed | 568 MiB | 663 MiB | 543 MiB | 481 MiB | 435 MiB |
| TabsClosedSettled | 451 MiB | 517 MiB | 454 MiB | 426 MiB | 377 MiB |
| TabsClosedForceGC | 432 MiB | 480 MiB | 429 MiB | 412 MiB | 374 MiB |

OSX, 64-bit

| Checkpoint | 0 | 1 | 2 | 4 | 8 |
|---|---|---|---|---|---|
| Start | 319 MiB | 350 MiB | 342 MiB | 336 MiB | 336 MiB |
| StartSettled | 311 MiB | 393 MiB | 383 MiB | 384 MiB | 382 MiB |
| TabsOpen | 889 MiB | 1,038 MiB | 1,243 MiB | 1,397 MiB | 1,694 MiB |
| TabsOpenSettled | 876 MiB | 977 MiB | 1,105 MiB | 1,252 MiB | 1,632 MiB |
| TabsOpenForceGC | 795 MiB | 966 MiB | 1,096 MiB | 1,235 MiB | 1,540 MiB |
| TabsClosed | 794 MiB | 996 MiB | 977 MiB | 889 MiB | 883 MiB |
| TabsClosedSettled | 738 MiB | 925 MiB | 876 MiB | 823 MiB | 832 MiB |
| TabsClosedForceGC | 621 MiB | 800 MiB | 799 MiB | 755 MiB | 747 MiB |

Conclusions

Simply put: the more content processes we use, the more memory we use. On the plus side it’s not a 1:1 factor: with 8 content processes we see roughly a doubling of memory usage on the TabsOpenSettled measurement. It’s a bit worse on Windows, a bit better on OSX, but it’s not 8 times worse.

Overall we see a 10-20% increase in memory usage for the 1 content process case (which is what we plan on shipping initially). This seems like a fair tradeoff for potential security and performance benefits, but as we try to grow the number of content processes we’ll need to take another look at where that memory is being used.
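For reference, the overhead figures can be recomputed from the TabsOpenSettled rows in the results tables. This is a small sketch; the `tabs_open_settled` dictionary and `increase_pct` helper are illustrative names, and the values (in MiB) are copied from the tables for 0, 1, and 8 content processes.

```python
# TabsOpenSettled in MiB, keyed by number of content processes.
tabs_open_settled = {
    "Linux 64-bit": {0: 448, 1: 542, 8: 872},
    "Windows 7, 32-bit Firefox": {0: 463, 1: 535, 8: 1108},
    "Windows 7, 64-bit Firefox": {0: 625, 1: 690, 8: 1338},
    "OSX 64-bit": {0: 876, 1: 977, 8: 1632},
}


def increase_pct(row, content_processes):
    """Percentage increase over the single-process (0 content process) baseline."""
    return 100 * (row[content_processes] - row[0]) / row[0]


for platform, row in tabs_open_settled.items():
    print(f"{platform}: 1 CP +{increase_pct(row, 1):.0f}%, 8 CP +{increase_pct(row, 8):.0f}%")
```

With one content process the increase ranges from about 10% (64-bit Firefox on Windows) to about 21% (Linux). With eight it ranges from about 86% on OSX to about 139% for 32-bit Firefox on Windows, which is where the "roughly a doubling" comes from.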

For the next steps I’d like to take a look at how our memory usage compares to other browsers. Expect a follow-up post on that shortly.