How to calculate MOS?

By Olivier Anguenot

Published in dev

November 13, 2024

What is MOS (Mean Opinion Score)?

Why is MOS complicated to calculate in WebRTC?

MOS recipe

Conclusion

In WebRTC (Web Real-Time Communication) development, understanding the quality of audio and video streams is critical for delivering seamless communication experiences. One of the most common metrics for evaluating call quality is the Mean Opinion Score (MOS).

While MOS provides valuable insights into the perceived quality of a call, calculating it in real-time WebRTC applications can be complex.

In this article, I will try to share my understanding in order to help you compute your score.

What is MOS (Mean Opinion Score)?

MOS is a metric that represents the quality of audio (and/or video) communication as perceived by the user. It is traditionally measured by having users rate the quality of a call on a scale of 1 to 5, where:

•   1 = Bad
•   2 = Poor
•   3 = Fair
•   4 = Good
•   5 = Excellent

This subjective feedback is then averaged across multiple users to obtain a mean opinion score.

In the context of WebRTC, the goal is to estimate this subjective score based on objective parameters like packet loss, latency, and codec performance. MOS offers developers an indication of how users are likely to perceive the quality of their calls, without relying on direct feedback.

Up to now, it does not seem so complicated: you only need to understand which parameters to use and which computation to apply.

Why is MOS complicated to calculate in WebRTC?

Here are a few key points (this list is not exhaustive).

Subjectivity of Human Perception

MOS is traditionally based on human perception, and that can vary widely across different users. Some may tolerate slight audio distortions, while others may be more sensitive to them.

As a result, it is difficult to predict a MOS that represents your whole audience from purely objective data.

MOS is not computed by the stack

The WebRTC stack gives a lot of statistics thanks to the getStats API. There are more than 170 statistics defined, standardized and grouped into reports.

Unfortunately, MOS is not computed… The stack reports only raw information.

So, if you want it, you need to compute it on your own!
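As a starting point, the raw reports can be grouped by type before extracting the values needed for the MOS. The sketch below is only an illustration: `StatsEntry` and `groupByType` are hypothetical names, and the entry shape is reduced to the few fields every report is expected to carry.

```typescript
// Minimal shape of one stats entry; real entries carry many more fields.
interface StatsEntry {
  id: string;
  type: string;
  timestamp: number;
  [key: string]: unknown;
}

// Hypothetical helper: groups entries by their `type` field.
// `report` can be anything exposing forEach, such as the RTCStatsReport
// returned by RTCPeerConnection.getStats(), or a plain array.
const groupByType = (
  report: { forEach(cb: (entry: StatsEntry) => void): void }
): Map<string, StatsEntry[]> => {
  const groups = new Map<string, StatsEntry[]>();
  report.forEach((entry) => {
    const list = groups.get(entry.type) ?? [];
    list.push(entry);
    groups.set(entry.type, list);
  });
  return groups;
};

// In a browser (assuming `pc` is an existing RTCPeerConnection):
//   const groups = groupByType(await pc.getStats());
//   const remoteInbound = groups.get("remote-inbound-rtp") ?? [];
```

Grouping first keeps the later steps simple, since each one only has to deal with a single report type.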

MOS is based on Math computation

Like many developers, I started by searching for MOS on Google, looking for an algorithm to “copy/paste” into my application.

Within a few minutes, I found the “Holy Grail” and had it implemented in my application.

I copied and pasted this formula and was very happy to have it.

The formula I found was a bit complex, and I had no choice but to use it “as is”, relying on the community to detect and fix any issues (in some corner cases, for example).

So, it is like a black box. I hope that the one who implemented it in JavaScript didn’t make any mistake :-)

MOS computation is not clearly explained

I then spent more time trying to understand which source the algorithm I had found was derived from.

After more searching, I finally found documentation from the ITU that explains how to compute MOS, such as Recommendation G.107: The E-model: a computational model for use in transmission planning

You may find this documentation challenging, as it is primarily aimed at scientists conducting research rather than Web developers…

I identified the core computation, but it relies on more complex calculations where some parameters depend on real-world experiments or environmental factors.

Fortunately, there is a Simplified MOS Algorithm available.

MOS can be computed differently

Finally, I discovered multiple methods to compute the MOS.

On one hand, there are the ITU-provided algorithms. On the other, alternative algorithms rely on effective latency, fitting parameters, or other metrics that may better reflect modern codecs like Opus.

The original algorithm was developed before WebRTC… Meanwhile, more users now have fiber connections with significantly reduced latency.

So, is the MOS algorithm still accurate today? And which version of the algorithm should I use?

MOS recipe

I’m not sure if there is a definitive MOS formula, but here are some key considerations for implementing MOS.

Each paragraph outlines a specific aspect, from individual statistics to consider to the core algorithm itself.

Which report does the RTT come from?

All MOS algorithms rely on the Round Trip Time.

When examining WebRTC statistics, relevant information can be found in the remote-inbound-rtp, remote-outbound-rtp, and candidate-pair reports.

So, which report should be used?

From my understanding, the RTT should ideally be taken from the remote-outbound-rtp report, as we are computing the MOS for an inbound audio stream to assess the quality experienced by the user who listens to this audio.

However, I observed that the RTT in this report is not computed, with measurements consistently showing as 0.

In such cases, consider alternative sources for RTT:

•   remote-inbound-rtp report for an outgoing audio stream (where kind=audio).
•   candidate-pair report, though less accurate since it derives from STUN binding requests, which may differ from RTCP-based values.
•   If the MOS calculation is frequent, you can also use the last valid RTT value if available.

This approach helps ensure accurate and consistent MOS computations even when RTT is unavailable in the remote-outbound-rtp report.
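The fallback order described above could be sketched as follows. This is only an illustration under assumptions: `RttSources` and `pickRtt` are hypothetical names, the values are expected to have been extracted from the reports beforehand, and an RTT of 0 is treated as "not computed", in line with the observation about remote-outbound-rtp.

```typescript
// Hypothetical, simplified view of the values that may carry an RTT (in seconds).
interface RttSources {
  remoteOutboundRtt?: number; // from the remote-outbound-rtp report
  remoteInboundRtt?: number;  // from the remote-inbound-rtp report (kind=audio)
  candidatePairRtt?: number;  // from the candidate-pair report (STUN based)
}

// Assumption: a missing, non-finite, or zero value means "no measurement".
const isUsableRtt = (value: number | undefined): value is number =>
  typeof value === "number" && Number.isFinite(value) && value > 0;

// Returns the first usable RTT together with the name of its source.
const pickRtt = (
  sources: RttSources,
  lastValidRtt?: number
): { rtt: number; source: string } | undefined => {
  if (isUsableRtt(sources.remoteOutboundRtt)) {
    return { rtt: sources.remoteOutboundRtt, source: "remote-outbound-rtp" };
  }
  if (isUsableRtt(sources.remoteInboundRtt)) {
    return { rtt: sources.remoteInboundRtt, source: "remote-inbound-rtp" };
  }
  if (isUsableRtt(sources.candidatePairRtt)) {
    return { rtt: sources.candidatePairRtt, source: "candidate-pair" };
  }
  if (isUsableRtt(lastValidRtt)) {
    return { rtt: lastValidRtt, source: "last-valid" };
  }
  return undefined; // nothing usable: skip this MOS computation
};
```

Returning the source alongside the value makes it possible to log how often each fallback is actually used.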

What statistics should I use for the RTT?

There are several RTT statistics available: currentRoundTripTime and roundTripTime on one hand, and totalRoundTripTime with roundTripTimeMeasurements on the other.

So, which should be used?

Personally, I rely on totalRoundTripTime and roundTripTimeMeasurements to calculate an average RTT over the interval between my previous and current measurements. This approach smooths out fluctuations by providing an average over the entire interval.

I avoid using currentRoundTripTime and roundTripTime because these represent the most recent measurements by the stack and can be too instantaneous. Within an interval, multiple RTT calculations might have been made by the stack.

Note: When calculating a delta RTT, be cautious with short intervals (less than 2 seconds). If roundTripTimeMeasurements hasn’t increased since the last report, this indicates no new RTT calculations were made. In this case, reuse the last computed average if available.
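A minimal sketch of this interval average, assuming two successive snapshots of the same report are available. `RttSnapshot` and `averageRttMs` are hypothetical names; the stack reports totalRoundTripTime in seconds, hence the conversion to milliseconds.

```typescript
// Cumulative RTT counters taken from one report at a given time.
interface RttSnapshot {
  totalRoundTripTime: number;        // cumulative sum of RTTs, in seconds
  roundTripTimeMeasurements: number; // cumulative number of RTT measurements
}

// Average RTT (in milliseconds) over the interval between two snapshots.
// Falls back to the last computed average when no new measurement was made.
const averageRttMs = (
  prev: RttSnapshot | undefined,
  curr: RttSnapshot,
  lastAverageMs?: number
): number | undefined => {
  const deltaTotal = curr.totalRoundTripTime - (prev?.totalRoundTripTime ?? 0);
  const deltaCount =
    curr.roundTripTimeMeasurements - (prev?.roundTripTimeMeasurements ?? 0);
  if (deltaCount <= 0) {
    return lastAverageMs; // no new RTT computed during this interval
  }
  return (deltaTotal / deltaCount) * 1000;
};

// Example: 3 new measurements totalling 0.3 s give an average of about 100 ms.
const exampleAverage = averageRttMs(
  { totalRoundTripTime: 1.0, roundTripTimeMeasurements: 10 },
  { totalRoundTripTime: 1.3, roundTripTimeMeasurements: 13 }
);
```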

What to do when there is no RTT statistic?

As previously mentioned, do not assume that the RTT statistic will always be present in the report.

For short intervals (under 5 seconds), you may receive remote-inbound-rtp or remote-outbound-rtp reports without an RTT value. For very short intervals (under 2 seconds), check the timestamp in the report. If it matches the previous timestamp, the stack has not received a new report from the remote side, so you can keep the value already collected.

Be sure to account for this in your algorithm!
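One possible way to handle both cases is sketched below. `RemoteRtpReport` and `rttOrPrevious` are hypothetical names, and the report shape is reduced to the two fields needed here.

```typescript
// Hypothetical, reduced view of a remote-inbound-rtp or remote-outbound-rtp entry.
interface RemoteRtpReport {
  timestamp: number;      // timestamp of the report, in milliseconds
  roundTripTime?: number; // in seconds; may be absent on short intervals
}

// Returns the RTT to use (in milliseconds) for the current interval.
const rttOrPrevious = (
  prev: RemoteRtpReport | undefined,
  curr: RemoteRtpReport,
  previousRttMs: number | undefined
): number | undefined => {
  // Same timestamp: the stack has not received a new report from the remote side.
  if (prev !== undefined && prev.timestamp === curr.timestamp) {
    return previousRttMs;
  }
  // New report but no RTT value in it: keep the previous value as well.
  if (curr.roundTripTime === undefined) {
    return previousRttMs;
  }
  return curr.roundTripTime * 1000;
};
```

When the function returns undefined, no RTT has ever been collected, and it is probably safer to skip the MOS computation for that interval than to assume a value.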

Should I use the Jitter or the Jitter Buffer Delay?

All algorithms use a notion of delay: sometimes it is the absolute delay and sometimes the effective delay.

My understanding of this point is the following:

•   The absolute delay refers to the total time taken from the sender’s mouth to the listener’s ear. This includes all components contributing to latency, not just the network delay.
•   The effective delay, by contrast, refers to the network delay component alone, rather than the full end-to-end or absolute delay. This effective latency is essentially the time it takes for data to travel across the network from sender to receiver.

Algorithms based on absolute delay should not rely on Jitter because Jitter does not represent additional delay—it is compensated by the Jitter Buffer Delay. Therefore, absolute delay calculations should use the jitter buffer delay statistic instead.

For effective delay, adding RTT and jitter can be a bit misleading, as it combines two different metrics. However, this approach may still provide some insight.

How to compute the Jitter Buffer Delay?

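This calculation is straightforward, but both statistics are cumulative, so you need to work on deltas: divide the increase in jitterBufferDelay by the increase in jitterBufferEmittedCount over your interval. As jitterBufferDelay is expressed in seconds, multiply the result by 1000 to get milliseconds.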

This will give you the average delay over the most recent interval.
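As a rough sketch, assuming two successive snapshots of the inbound-rtp report, the computation could look like this. `JitterBufferSnapshot` and `averageJitterBufferDelayMs` are hypothetical names.

```typescript
// Cumulative jitter buffer counters taken from the inbound-rtp report.
interface JitterBufferSnapshot {
  jitterBufferDelay: number;        // cumulative delay, in seconds
  jitterBufferEmittedCount: number; // cumulative number of emitted samples
}

// Average jitter buffer delay (in milliseconds) between two snapshots.
// Returns undefined when nothing was emitted during the interval.
const averageJitterBufferDelayMs = (
  prev: JitterBufferSnapshot | undefined,
  curr: JitterBufferSnapshot
): number | undefined => {
  const deltaDelay = curr.jitterBufferDelay - (prev?.jitterBufferDelay ?? 0);
  const deltaCount =
    curr.jitterBufferEmittedCount - (prev?.jitterBufferEmittedCount ?? 0);
  if (deltaCount <= 0) {
    return undefined;
  }
  return (deltaDelay / deltaCount) * 1000;
};

// Example: 48000 samples emitted with 1920 s of cumulated delay
// correspond to roughly 40 ms per sample.
const exampleJitterBufferDelay = averageJitterBufferDelayMs(
  { jitterBufferDelay: 0, jitterBufferEmittedCount: 0 },
  { jitterBufferDelay: 1920, jitterBufferEmittedCount: 48000 }
);
```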

How to compute Absolute delay?

To calculate the absolute delay, I use the following formula:

const absolute_delay = RTT / 2 + JitterBufferDelay + PlayoutDelay + 20;

All values are in milliseconds, with 20 representing the packetization time. For PlayoutDelay, a value of 0 can be used, as this statistic is not yet standardized.

Which algorithm to use?

Personally, I use two algorithms depending on the codec:

•   For Opus, I use the G.107.2 algorithm, which provides a MOS computation for a fullband codec.
•   For any G7xx codec, I rely on the G.107 Simplified algorithm.

Here are the algorithms:

const getAbsoluteDelay = (roundTripTime: number, jitterBufferDelay: number): number => roundTripTime / 2 + jitterBufferDelay + 20;

const computeMOS = (R: number): number => {
    if (R < 0) {
      return 1;
    }
    if (R > 100) {
      return 4.5;
    }
    return 1 + 0.035 * R + R * (R - 60) * (100 - R) * 0.000007;
};


/**
* For Opus
* @param rtt  Measured in milliseconds
* @param packetLost   In range [0...1]
* @param jitterBufferDelay Measured in milliseconds
*/
const score_G107_2 = (rtt: number, packetLost: number, jitterBufferDelay: number): number => {
    const Ro = 148;
    const Is = 0;
    let Id = 0;
    const A = 0;
    const Ta = getAbsoluteDelay(rtt, jitterBufferDelay);
    const Ppl = packetLost * 100; // percentage of packets lost

    const Iee = 10.2 + (132 - 10.2) * (Ppl / (Ppl + 4.3));

    if (Ta <= 100) {
      Id = 0;
    } else {
      const x = (Math.log(Ta) - Math.log(100)) / Math.log(2);
      Id = 1.48 * 25 * (Math.pow(1 + Math.pow(x, 6), 1 / 6) - 3 * Math.pow(1 + Math.pow(x / 3, 6), 1 / 6) + 2);
    }

    const Rx = Ro - Is - Id - Iee + A;
    const R = Rx / 1.48;
    return Math.max(1, computeMOS(R));
};

/**
* For G711, G729, G7xx
* @param rtt  Measured in milliseconds
* @param packetLost   In range [0...1]
* @param jitterBufferDelay Measured in milliseconds
*/
const score_G107_simplified = (rtt: number, packetLost: number, jitterBufferDelay: number): number => {
    const absolute_delay = getAbsoluteDelay(rtt, jitterBufferDelay);
    const Ppl = packetLost * 100; // percentage of packets lost

    const h = absolute_delay - 177.3 < 0 ? 0 : 1;
    const Id = 0.024 * absolute_delay + 0.11 * (absolute_delay - 177.3) * h;
    const R0 = 93.2;
    const Ie = 0;

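    // Note: the quadratic below reaches its minimum near rFactor ≈ 77.5
    // (about 15.7% packet loss) and rises again for lower values, so
    // results under heavier loss should be treated with caution.
    const rFactor = Math.max(0, R0 - Ppl);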
    const Rx = 0.18 * rFactor * rFactor - 27.9 * rFactor + 1126.62 - Id - Ie;
    return Math.max(1, computeMOS(Rx));
};
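To choose between the two functions at runtime, the codec can be read from the mimeType of the codec report (for example "audio/opus" or "audio/PCMU"). The selector below is a sketch under assumptions: `MosAlgorithm` and `selectAlgorithm` are hypothetical names, and treating PCMU/PCMA (the names under which G.711 appears) like the other G7xx codecs is my own reading of the rule above.

```typescript
// Names of the two algorithms used in this article.
type MosAlgorithm = "G107_2" | "G107_simplified";

// Hypothetical selector based on the mimeType found in the codec report.
const selectAlgorithm = (mimeType: string): MosAlgorithm | undefined => {
  const codec = mimeType.toLowerCase().replace(/^audio\//, "");
  if (codec === "opus") {
    return "G107_2"; // fullband codec
  }
  // G.711 is reported as PCMU or PCMA; other G7xx codecs as g722, g729, ...
  if (/^(pcmu|pcma|g7\d\d)/.test(codec)) {
    return "G107_simplified";
  }
  return undefined; // unknown codec: no MOS computed
};
```

Returning undefined for an unknown codec avoids silently applying a model to a codec it was not designed for.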

How often to compute the MOS?

It depends on what you expect from the MOS.

If you want to signal to the user that the call is degraded, computing it regularly over periods of 3 to 5 seconds is pretty common.

If you want to report a global indicator at the end of the call, a single measure can be computed.

However, latency, jitter or jitter buffer delay, packet loss, and codec performance vary throughout the call. These fluctuations affect the MOS differently at various points in time, which makes a single MOS for the whole call especially tricky to interpret.
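One way to keep some of this variation visible is to store the periodic MOS values and summarize them at the end of the call. This is only a sketch: `MosSummary` and `summarizeMos` are hypothetical names, and the default threshold of 3.5 is an arbitrary assumption, not a value from any recommendation.

```typescript
// Summary of the MOS values collected during one call.
interface MosSummary {
  average: number;       // mean of all samples
  min: number;           // worst sample observed
  degradedRatio: number; // share of samples below the threshold, in [0...1]
}

// Summarizes periodic MOS samples (e.g. one every 3 to 5 seconds).
// The threshold is an assumption to adapt to your own quality target.
const summarizeMos = (
  samples: number[],
  degradedThreshold = 3.5
): MosSummary | undefined => {
  if (samples.length === 0) {
    return undefined;
  }
  const sum = samples.reduce((acc, value) => acc + value, 0);
  const degraded = samples.filter((value) => value < degradedThreshold).length;
  return {
    average: sum / samples.length,
    min: Math.min(...samples),
    degradedRatio: degraded / samples.length,
  };
};

// Example: one bad interval out of four.
const exampleSummary = summarizeMos([4, 4, 2, 4]);
// average = 3.5, min = 2, degradedRatio = 0.25
```

The average alone would hide the bad interval here, while the minimum and the degraded ratio keep it visible.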

Conclusion

Is my MOS satisfactory?

MOS is a valuable indicator, but on its own, it doesn’t help improve the call quality experienced by users—it’s simply an observation based on certain metrics.

To truly understand and diagnose call issues, you will need:

•   Additional statistics (e.g., accessible via chrome://webrtc-internals for deeper observation and analysis),
•   A custom call-inspector tool to collect and track the relevant KPIs you have defined,
•   Or professional solutions like testRTC, which provides comprehensive call analytics, statistics, and insights.

You also need to look at a sufficient number of calls (call quality at scale) to detect meaningful issues. Examining just one call might not provide a clear picture of the main problem.

Now that you have followed this entire thought process, you might realize that, in the end, you can stick with your existing MOS computation. Perhaps it’s not as critical as it may seem. 🙂

If you have a different perspective on any of the technical points discussed, I’d love to hear your thoughts! I’m always eager to deepen my understanding.