> Unfortunately, the BHR Telemetry data from both Aurora 43 & Beta 44
> experiments suggests that e10s is jankier than non-e10s. This holds true
> for profiles with & without extensions.
> However, we have identified bugs causing inaccuracies in BHR reporting and
> we are working to improve BHR as well as other Telemetry performance
> measurements. We have even built an extension to visualize BHR's jank
> detection: https://github.com/chutten/statuser
> In general, as we evaluate e10s performance using A/B experiments, we also
> validate and improve the performance probes in parallel.
I have been told that this part of my post is misleading, so I'll go into more detail about what we know and don't know about the reliability of the BHR responsiveness measurement and, consequently, about e10s performance.
1) We know BHR over-reported jank for e10s Firefox during both A/B experiments. This was fixed & uplifted in bug 1234618, but the uplifts didn't make it into either the Aurora 43 or the Beta 44 experiment. This means that the BHR experiment analyses linked above are not reliable. The upcoming Beta 45 A/B experiment will have this issue fixed.
2) Bill McCloskey looked at BHR e10s vs. non-e10s performance in the general Aurora 45 population (not as part of an A/B experiment) and found that, in these (not randomly selected) e10s & non-e10s populations, e10s is more responsive.
Bill's analysis: http://people.mozilla.org/~wmccloskey/aurora-analysis.html
3) We know that, up to now, BHR did not report jank from the e10s child process at all. This would have caused under-reporting of e10s jank during the A/B experiments as well as in billm's Aurora analysis. This is now fixed in bug 1228437 and pending uplift.
4) BHR uses "pseudostacks" to report the sources of hangs. The C++ pseudostack is lacking in coverage, which reduces the usefulness of the collected stacks. This is now addressed in bug 1224374 and pending uplift.
5) There were additional issues with the most recent Beta 44 A/B experiment (e.g. bug 1236754) which reduced the quality of the collected data.
To summarize, validating & fixing BHR is an ongoing effort, and the BHR numbers do not yet give us a basis for determining whether e10s performance is better or worse than non-e10s performance. I am very optimistic about the quality of the BHR data we will obtain from the upcoming Beta 45 A/B experiment.
P.S. Note that we have much higher confidence in the e10s Telemetry stability numbers.