In WebRTC (Web Real-Time Communication) development, understanding the quality of audio and video streams is critical for delivering seamless communication experiences. One of the most common metrics for evaluating call quality is the Mean Opinion Score (MOS).
While MOS provides valuable insights into the perceived quality of a call, calculating it in real-time WebRTC applications can be complex.
In this article, I share my understanding to help you compute this score yourself.
MOS is a metric that represents the quality of audio (and/or video) communication as perceived by the user. It is traditionally measured by having users rate the quality of a call on a scale of 1 to 5, where:
• 1 = Bad
• 2 = Poor
• 3 = Fair
• 4 = Good
• 5 = Excellent
This subjective feedback is then averaged across multiple users to obtain a mean opinion score.
In the context of WebRTC, the goal is to estimate this subjective score based on objective parameters like packet loss, latency, and codec performance. MOS offers developers an indication of how users are likely to perceive the quality of their calls, without relying on direct feedback.
So far, it doesn’t seem too complicated: we just need to identify which parameters to use and how to combine them.
Here are a few key points (this list is not exhaustive).
MOS is traditionally based on human perception, and that can vary widely across different users. Some may tolerate slight audio distortions, while others may be more sensitive to them.
As a result, it’s difficult to accurately predict MOS for your audience using purely objective data.
The WebRTC stack exposes a lot of statistics through the getStats API: more than 170 statistics are defined, standardized, and grouped into reports.
Unfortunately, MOS is not computed… The stack reports only raw information.
So, if you want it, you need to compute it on your own!
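As a starting point, here is a sketch of pulling the raw fields a MOS estimate needs out of a getStats() report list. The report shapes are simplified, and the helper name is my own invention; only the report types and field names come from the WebRTC stats spec:

```typescript
// Minimal sketch: collect the raw inputs for a MOS estimate from a
// flattened list of getStats() reports. Field names follow the
// WebRTC statistics spec; everything else is illustrative.
interface RawCallStats {
  jitter?: number;          // seconds
  packetsLost?: number;
  packetsReceived?: number;
  roundTripTime?: number;   // seconds
}

const extractRawStats = (reports: Iterable<any>): RawCallStats => {
  const out: RawCallStats = {};
  for (const r of reports) {
    if (r.type === "inbound-rtp" && r.kind === "audio") {
      out.jitter = r.jitter;
      out.packetsLost = r.packetsLost;
      out.packetsReceived = r.packetsReceived;
    }
    if (r.type === "remote-inbound-rtp" && r.kind === "audio") {
      out.roundTripTime = r.roundTripTime;
    }
  }
  return out;
};

// In a real app, reports would come from:
//   const stats = await pc.getStats();
//   const raw = extractRawStats(stats.values());
```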
Like many developers, I started by googling MOS, looking for an algorithm to copy/paste into my application.
Within minutes, I found the “Holy Grail” and implemented it in my application.
I copied and pasted the formula and was very happy to have it.
The formula was a bit complex, and I had no choice but to use it “as is”, relying on the community to detect and fix any issues (such as some corner cases).
So, it was like a black box. I could only hope that whoever implemented it in JavaScript didn’t make any mistakes. :-)
I then spent more time tracing the source from which this algorithm was derived.
After more searching, I finally found ITU documentation explaining how to compute MOS, notably Recommendation G.107: The E-model: a computational model for use in transmission planning.
You may find this documentation challenging, as it is primarily aimed at scientists conducting research rather than Web developers…
I identified the core computation, but it relies on more complex calculations where some parameters depend on real-world experiments or environmental factors.
Fortunately, there is a Simplified MOS Algorithm available.
Finally, I discovered multiple methods to compute the MOS.
On one hand, there are the ITU-provided algorithms. On the other, alternative algorithms rely on effective latency, fitting parameters, or other metrics that may better reflect modern codecs like Opus.
The original algorithm was developed before WebRTC… Meanwhile, more users now have fiber connections with significantly reduced latency.
So, is the MOS algorithm still accurate today? And which version of the algorithm should I use?
I’m not sure if there is a definitive MOS formula, but here are some key considerations for implementing MOS.
Each paragraph outlines a specific aspect, from individual statistics to consider to the core algorithm itself.
All MOS algorithms rely on the Round Trip Time.
When examining WebRTC statistics, relevant information can be found in the remote-inbound-rtp, remote-outbound-rtp, and candidate-pair reports.
So, which report should be used?
From my understanding, the RTT should ideally be taken from the remote-outbound-rtp report, as we are computing the MOS for an inbound audio stream to assess the quality experienced by the user who listens to this audio.
However, I observed that the RTT in this report is not computed, with measurements consistently showing as 0.
In such cases, consider alternative sources for RTT:
• The remote-inbound-rtp report for an outgoing audio stream (where kind=audio).
• The candidate-pair report, though it is less accurate since its value derives from STUN binding requests, which may differ from RTCP-based measurements.
• The last valid RTT value, if available, when the MOS calculation is frequent.
This approach helps ensure accurate and consistent MOS computations even when RTT is unavailable in the remote-outbound-rtp report.
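The fallback order described above can be sketched as a small helper. The function name and the idea of passing in the last known value are my own; the report types and RTT field names come from the stats spec. Note that a roundTripTime of 0 is treated as "not computed", matching the observation above:

```typescript
// Hypothetical helper: pick an RTT (in seconds) from a flattened
// getStats() report list, falling back through the sources in order.
const pickRtt = (reports: any[], lastRtt?: number): number | undefined => {
  // Preferred: remote-outbound-rtp (often reported as 0, i.e. not computed)
  const remoteOut = reports.find(
    (r) => r.type === "remote-outbound-rtp" && r.kind === "audio",
  );
  if (remoteOut?.roundTripTime) return remoteOut.roundTripTime;

  // Fallback 1: remote-inbound-rtp for the outgoing audio stream
  const remoteIn = reports.find(
    (r) => r.type === "remote-inbound-rtp" && r.kind === "audio",
  );
  if (remoteIn?.roundTripTime) return remoteIn.roundTripTime;

  // Fallback 2: candidate-pair (STUN-based, less accurate)
  const pair = reports.find((r) => r.type === "candidate-pair" && r.nominated);
  if (pair?.currentRoundTripTime) return pair.currentRoundTripTime;

  // Fallback 3: reuse the last valid value, if any
  return lastRtt;
};
```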
There are several RTT statistics available: currentRoundTripTime and roundTripTime on one hand, and totalRoundTripTime with roundTripTimeMeasurements on the other.
So, which should be used?
Personally, I rely on totalRoundTripTime and roundTripTimeMeasurements to calculate an average RTT over the interval between my previous and current measurements. This approach smooths out fluctuations by providing an average over the entire interval.
I avoid using currentRoundTripTime and roundTripTime because these represent the most recent measurements by the stack and can be too instantaneous. Within an interval, multiple RTT calculations might have been made by the stack.
Note: When calculating a delta RTT, be cautious with short intervals (less than 2 seconds). If roundTripTimeMeasurements hasn’t increased since the last report, no new RTT calculations were made; in this case, reuse the last computed average if available.
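In code, the interval average described above could look like this. The helper name is hypothetical; per the stats spec, totalRoundTripTime is a cumulative value in seconds and roundTripTimeMeasurements a cumulative counter:

```typescript
// Average RTT (in seconds) over the interval between two stats
// snapshots, using the cumulative counters. When no new measurement
// was made (counter unchanged), reuse the previous average.
const deltaAverageRtt = (
  prevTotal: number,   // totalRoundTripTime at the previous tick
  prevCount: number,   // roundTripTimeMeasurements at the previous tick
  curTotal: number,    // totalRoundTripTime now
  curCount: number,    // roundTripTimeMeasurements now
  lastAverage?: number // last computed average, if any
): number | undefined => {
  const count = curCount - prevCount;
  if (count <= 0) {
    // No new RTT computed in this interval: reuse the last average
    return lastAverage;
  }
  return (curTotal - prevTotal) / count;
};
```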
As previously mentioned, do not assume that the RTT statistic will always be present in the report.
For short intervals (under 5 seconds), you may receive remote-*-rtp reports without an RTT value. For very short intervals (under 2 seconds), check the timestamp in the remote-*-rtp report: if it matches the previous timestamp, the stack has not received a new report from the remote side, so you can keep the previously collected value.
Be sure to account for this in your algorithm!
All algorithms use a notion of delay: sometimes the Absolute Delay, sometimes the Effective Delay.
My understanding on this point is that:
The Absolute delay refers to the total time taken from the sender’s mouth to the listener’s ear. This includes all components contributing to latency, not just the network delay.
The Effective delay, in contrast, refers to the network delay component alone, rather than the full end-to-end or absolute delay. This effective latency is essentially the time it takes for data to travel across the network from sender to receiver.
Algorithms based on absolute delay should not rely on Jitter because Jitter does not represent additional delay—it is compensated by the Jitter Buffer Delay. Therefore, absolute delay calculations should use the jitter buffer delay statistic instead.
For effective delay, adding RTT and jitter can be a bit misleading, as it combines two different metrics. However, this approach may still provide some insight.
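To illustrate that combination, here is one rough approximation of the effective delay, adding half the RTT to the reported jitter (both in milliseconds). Treat this as a sketch of the idea discussed above, not a standardized formula:

```typescript
// Rough effective (network-only) delay approximation, in milliseconds.
// Combining RTT and jitter mixes two different metrics, so this is
// only indicative, as noted above.
const effectiveDelayMs = (rttMs: number, jitterMs: number): number =>
  rttMs / 2 + jitterMs;
```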
This calculation is straightforward, but you will need to compute a delta by dividing jitterBufferDelay by jitterBufferEmittedCount for your interval.
This will give you the average delay over the most recent interval.
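The delta computation described above can be sketched like this. The helper name is mine; per the stats spec, jitterBufferDelay is a cumulative value in seconds and jitterBufferEmittedCount a cumulative sample count:

```typescript
// Average jitter buffer delay (in milliseconds) over the interval
// between two stats snapshots, using the cumulative counters.
const deltaJitterBufferDelayMs = (
  prevDelay: number,   // jitterBufferDelay (s) at the previous tick
  prevEmitted: number, // jitterBufferEmittedCount at the previous tick
  curDelay: number,    // jitterBufferDelay (s) now
  curEmitted: number   // jitterBufferEmittedCount now
): number | undefined => {
  const emitted = curEmitted - prevEmitted;
  if (emitted <= 0) {
    return undefined; // no samples emitted in this interval
  }
  return ((curDelay - prevDelay) / emitted) * 1000;
};
```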
To calculate the Absolute Delay, I use the following formula
const absolute_delay = RTT / 2 + JitterBufferDelay + PlayoutDelay + 20;
All data is in milliseconds, with 20 representing the packetization time (in milliseconds). For PlayoutDelay, a value of 0 can be used, as this statistic is not yet standardized.
Personally, I use two algorithms depending on the codec:
• For Opus, I use the G.107.2 algorithm, which provides a MOS computation for a fullband codec.
• For any G.7xx codec, I rely on the G.107 simplified algorithm.
Here are the algorithms:
const getAbsoluteDelay = (roundTripTime: number, jitterBufferDelay: number): number =>
  roundTripTime / 2 + jitterBufferDelay + 20;

const computeMOS = (R: number): number => {
  if (R < 0) {
    return 1;
  }
  if (R > 100) {
    return 4.5;
  }
  return 1 + 0.035 * R + R * (R - 60) * (100 - R) * 0.000007;
};

/**
 * For Opus
 * @param rtt Measured in milliseconds
 * @param packetLost In range [0...1]
 * @param jitterBufferDelay Measured in milliseconds
 */
const score_G107_2 = (rtt: number, packetLost: number, jitterBufferDelay: number): number => {
  const Ro = 148;
  const Is = 0;
  let Id = 0;
  const A = 0;
  const Ta = getAbsoluteDelay(rtt, jitterBufferDelay);
  const Ppl = packetLost * 100; // percentage of packets lost
  const Iee = 10.2 + (132 - 10.2) * (Ppl / (Ppl + 4.3));
  if (Ta <= 100) {
    Id = 0;
  } else {
    const x = (Math.log(Ta) - Math.log(100)) / Math.log(2);
    Id = 1.48 * 25 * (Math.pow(1 + Math.pow(x, 6), 1 / 6) - 3 * Math.pow(1 + Math.pow(x / 3, 6), 1 / 6) + 2);
  }
  const Rx = Ro - Is - Id - Iee + A;
  const R = Rx / 1.48;
  return Math.max(1, computeMOS(R));
};

/**
 * For G711, G729, G7xx
 * @param rtt Measured in milliseconds
 * @param packetLost In range [0...1]
 * @param jitterBufferDelay Measured in milliseconds
 */
const score_G107_simplified = (rtt: number, packetLost: number, jitterBufferDelay: number): number => {
  const absolute_delay = getAbsoluteDelay(rtt, jitterBufferDelay);
  const Ppl = packetLost * 100; // percentage of packets lost
  const h = absolute_delay - 177.3 < 0 ? 0 : 1;
  const Id = 0.024 * absolute_delay + 0.11 * (absolute_delay - 177.3) * h;
  const R0 = 93.2;
  const Ie = 0;
  const rFactor = Math.max(0, R0 - Ppl);
  const Rx = 0.18 * rFactor * rFactor - 27.9 * rFactor + 1126.62 - Id - Ie;
  return Math.max(1, computeMOS(Rx));
};
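To choose between the two scoring functions at runtime, the codec can be read from the matching codec stats report (its mimeType field). Here is a small dispatcher sketch; the helper name and the idea of passing the scoring functions in as parameters are mine, not part of any specification:

```typescript
// Signature shared by both scoring functions above:
// (rtt ms, packetLost in [0...1], jitterBufferDelay ms) => MOS
type ScoreFn = (rtt: number, packetLost: number, jitterBufferDelay: number) => number;

// Hypothetical dispatcher: Opus gets the fullband algorithm,
// everything else (G.711, G.729, ...) the simplified one.
const scoreForCodec = (
  mimeType: string,   // e.g. "audio/opus" from the "codec" stats report
  opusScore: ScoreFn,
  g7xxScore: ScoreFn
): ScoreFn => (mimeType.toLowerCase().includes("opus") ? opusScore : g7xxScore);
```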
It depends on what you expect from the MOS:
• If you want to signal to the user that the call is degraded, computing it regularly over periods of 3 to 5 seconds is pretty common.
• If you want to report a global indicator at the end of the call, a single measure can be computed.
But latency, jitter, jitter buffer delay, packet loss, and codec performance all vary throughout the call. These fluctuations affect MOS scores differently at various points in time, making a single MOS for the whole call especially tricky to interpret.
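One possible way to keep the per-interval picture while still producing an end-of-call number is to store each periodic MOS sample and report both the average and the worst window. A minimal sketch, with a hypothetical helper name:

```typescript
// Summarize periodic MOS samples collected during a call.
// Reporting the worst window alongside the average avoids hiding a
// transient degradation behind a single whole-call number.
const summarizeMos = (
  samples: number[]
): { avg: number; worst: number } | undefined => {
  if (samples.length === 0) {
    return undefined;
  }
  const avg = samples.reduce((sum, s) => sum + s, 0) / samples.length;
  const worst = Math.min(...samples);
  return { avg, worst };
};
```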
Is my MOS satisfactory?
MOS is a valuable indicator, but on its own, it doesn’t help improve the call quality experienced by users—it’s simply an observation based on certain metrics.
To truly understand and diagnose call issues, you will need:
Additional statistics (e.g., accessible via chrome://webrtc-internals for deeper observation and analysis),
A custom call-inspector tool to collect and track the relevant KPIs you have defined,
Or, consider professional solutions like testRTC, which provides comprehensive call analytics, statistics, and insights.
Because you need to look at a sufficient number of calls (call quality at scale) to detect meaningful issues; examining just one call might not provide a clear picture of the main problem.
Now that you understand this entire thought process, you might realize that, in the end, you can stick with your existing MOS computation. Perhaps it’s not as critical as it may seem. 🙂
If you have a different perspective on any of the technical points discussed, I’d love to hear your thoughts! I’m always eager to deepen my understanding.