WebRTC Statistics using getStats

By Olivier Anguenot

Published in api

January 30, 2022

Statistics give you real-time information from the WebRTC stack, but is it helpful?

Introduction

RTCPeerConnection.getStats() is the API to call to access the statistics in real time. This method returns a Promise that resolves with the statistics available at that moment. If an application needs to collect statistics regularly, it has to call the method repeatedly (e.g. using setTimeout() or setInterval()).

Once collected, the application has a lot of statistics in hand. But getting the data is not the hard part of the exercise; the hard part is interpreting the data and choosing which values are important to check.
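As a minimal sketch of this polling pattern: the helper names collectReports and startStatsPolling are mine (hypothetical, not part of any API), and pc is assumed to be an RTCPeerConnection or any object exposing getStats().

```javascript
// Resolve with the current reports as a plain array.
// `pc` is assumed to be an RTCPeerConnection (or any object exposing getStats()).
async function collectReports(pc) {
  const stats = await pc.getStats(); // RTCStatsReport, a Map-like object
  const reports = [];
  stats.forEach((report) => reports.push(report));
  return reports;
}

// Hypothetical helper: poll every `intervalMs` and pass the reports to `onReports`.
// Returns a function that stops the polling.
function startStatsPolling(pc, intervalMs, onReports) {
  let timer = null;
  let stopped = false;
  const tick = async () => {
    try {
      onReports(await collectReports(pc));
    } catch (err) {
      console.error("getStats failed", err);
    }
    // Schedule the next call only once the previous one has completed
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };
  timer = setTimeout(tick, intervalMs);
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Chaining setTimeout() calls rather than using setInterval() is a design choice here: it avoids overlapping requests if one getStats() call takes longer than expected.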

“Audio, video, or data packets transmitted over a peer-connection can be lost, and experience varying amounts of network delay. A web application implementing WebRTC expects to monitor the performance of the underlying network and media pipeline.” extracted from the Identifiers for WebRTC’s Statistics API W3C document.

Having statistics

Before exploring the getStats() API, note that there is an easy way to access the WebRTC statistics in real time: the debug tool provided by the browser itself. In Chrome, type chrome://webrtc-internals in the address bar; in Firefox, the equivalent tool is available at about:webrtc.

Note: At the time of writing, there is no equivalent tool in Safari, but you can access the WebRTC logs directly in the console: from the JavaScript console, open the Options menu, select the Console tab and finally change the WebRTC Logging selector to Basic (only signaling and some partial information from the stack) or Verbose (which includes all the statistics reports collected).

These statistics are very helpful when developing, testing and qualifying an application. They help to understand where problems may lie by reporting the state of the signaling part as well as the state of the media part. Moreover, these tools display very detailed statistics and graphs in real time that can be used to understand what happened and how the stack reacts to network disturbances.

However, the application sometimes needs to make decisions depending on the quality of the network. For that, the application itself has to access these statistics. This is the purpose of the getStats API.

Just another API… Hmm…

If we take a few seconds to look back, Chrome 23, launched in November 2012, officially supported WebRTC (no longer behind a flag). The getStats API was introduced in Chrome 24, but at that time no specification was available. The result was a first version of the API that was not in line with the future specification.

It was not simple for developers who had implemented this first version to then support Firefox, whose implementation was based on the specification.

More than that, this API does not just return a JSON object with four or five values. getStats returns a list of available reports, and each report contains statistics.

The specification defines a lot of report types:

{
  "codec",
  "inbound-rtp",
  "outbound-rtp",
  "remote-inbound-rtp",
  "remote-outbound-rtp",
  "media-source",
  "csrc",
  "peer-connection",
  "data-channel",
  "stream",
  "track",
  "transceiver",
  "sender",
  "receiver",
  "transport",
  "sctp-transport",
  "candidate-pair",
  "local-candidate",
  "remote-candidate",
  "certificate",
  "ice-server"
};

As a developer, the first step is to understand where to find what…

The most important reports to understand are:

inbound-rtp: Statistics about the inbound RTP stream (i.e. the media stream you are receiving from your peer).
outbound-rtp and remote-inbound-rtp: Statistics about the outbound RTP stream (i.e. the media stream you are sending), computed locally for the outbound-rtp report or by the remote peer for the remote-inbound-rtp report.
candidate-pair: Statistics about the ICE candidate pairs, used to identify the pair in use and its associated data.
local-candidate and remote-candidate: Statistics about the local and remote candidates. The IDs of the candidates in use are reported in the candidate-pair report.
codec: Statistics about the codec used.
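To find "what is where", one option is to group the reports by their type property first. This is only a sketch; groupReportsByType is a hypothetical helper name, and stats is assumed to be the Map-like object resolved by getStats().

```javascript
// Group the reports of a getStats() result by their `type` property.
// `stats` is any object exposing forEach() over reports (RTCStatsReport, Map, array).
function groupReportsByType(stats) {
  const byType = new Map();
  stats.forEach((report) => {
    if (!byType.has(report.type)) {
      byType.set(report.type, []);
    }
    byType.get(report.type).push(report);
  });
  return byType;
}

// Usage in a browser (assuming `pc` is a connected RTCPeerConnection):
//   const byType = groupReportsByType(await pc.getStats());
//   const inbound = byType.get("inbound-rtp") || [];
```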

Common traps?

There are some pitfalls to watch out for in order to use the statistics properly.

Browser differences

In other words, browsers are not at the same level when it comes to the getStats API: statistics as well as properties may or may not be available depending on the browser.

Unsurprisingly, Chrome implements the most statistics, followed by Safari, with Firefox last…

The W3C has a web page where the getStats API can be tested.

In detail:

Browser	Tests Passed	Tests Failed	Percent
Chrome	203	107	65%
Safari	156	154	50%
Firefox	88	222	28%

Assuming that web page is up to date and in line with the specification, the results are poor for all browsers. Even Chrome is far from fully implementing the specification.

Fortunately, the basic reports and properties needed to compute a MOS score are available.

But here comes the second point…

Specification in progress

In other words, it is complicated to keep track of the implementation in browsers. To be honest, the Identifiers for WebRTC’s Statistics API document is a Candidate Recommendation Draft. The current edition is the 8th in that state since October 2020. It is still not a Proposed Recommendation, and so not a Standard.

For example, in the latest edition, the networkType property has been marked as deprecated (for privacy reasons) and the statsended event has been removed (never seen in action…).

Have a look at the browsers’ release notes to learn about the latest changes.

A black box

As mentioned in the specification, the application has no control over how often the browser samples the statistics. This means that the application asks for the reports at a given time and gets the latest available information. If the application asks too often, for example every 50 ms, the same statistics are returned (same timestamp). If the application asks rarely, intermediate statistics will not be retrieved. So it is up to the application to define a reasonable delay between two requests.

In general, statistics can be requested every second. But for longer calls or for computing a MOS score, collecting statistics every 4 or 5 seconds gives good results.

Absent report

Be careful: don’t take for granted that all reports are collected.

If you call getStats() too early, for example 500 ms after the connected event (from connectionState), the WebRTC stack will not have generated the RTCRemoteInboundRTPAudio and RTCRemoteInboundRTPVideo reports yet.

If you wait less than 2 seconds, in most cases the RTCRemoteInboundRTPAudio report will still be missing.

So jitter, packetsLost and roundTripTime are not available for your outgoing stream during the first 2 seconds after the connection is established.

Absent property

Be careful: don’t take for granted that a property exists in a report.

If the browser doesn’t have the time to compute a new value, the property could be missing in the next report. This is the case, for example, for the roundTripTime property reported for audio in Chrome: this property is reported roughly every 5 seconds only. In Firefox, this is not the case; instead, the same value is reported several times… To avoid any ambiguity, the specification added a counter that increments each time a new value is available. But this property is not yet implemented in Firefox…

Take the time to read the specification, because it contains hints on how to compute averages for some properties.
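Here is a hedged sketch of how an application could guard against a missing report or property and against repeated values. I assume the counter mentioned above is roundTripTimeMeasurements on the remote-inbound-rtp report; falling back to the report timestamp when the counter is absent is my own assumption, and createRttReader is a hypothetical helper name.

```javascript
// Returns a reader that yields a round-trip time (in seconds) only when a new
// measurement is available, and null otherwise.
function createRttReader() {
  let lastMeasurements = null;
  let lastTimestamp = null;

  return function readFreshRtt(remoteInbound) {
    // The report or the property may simply be absent
    if (!remoteInbound || typeof remoteInbound.roundTripTime !== "number") {
      return null;
    }
    const count = remoteInbound.roundTripTimeMeasurements;
    if (typeof count === "number") {
      // Counter available: a new value exists only if the counter moved
      if (count === lastMeasurements) return null;
      lastMeasurements = count;
    } else if (remoteInbound.timestamp === lastTimestamp) {
      // Assumption: without the counter, an unchanged timestamp means no new value
      return null;
    }
    lastTimestamp = remoteInbound.timestamp;
    return remoteInbound.roundTripTime;
  };
}
```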

Order of reports doesn’t matter

Once the reports have been received, the application can iterate over them to collect the information. Sometimes, the application needs to collect information from one report (for example an id) and then use it to retrieve another specific report (for example the local-candidate). The fact that you received the reports in a certain order once does not mean that the application will receive the next reports in the same order.

Store temporary information such as identifiers so that you can retrieve the right report, or deduce global statistics from properties available in different reports.
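As an illustration, the following sketch indexes every report by id before resolving the candidates in use, so the result does not depend on iteration order. It assumes the transport report exposes selectedCandidatePairId; when it does not, it falls back to a nominated pair in the succeeded state, which is a simplification. findSelectedCandidates is a hypothetical helper name.

```javascript
// Find the candidate pair in use and its local and remote candidates,
// whatever the order in which the reports are iterated.
function findSelectedCandidates(stats) {
  const byId = new Map();
  const pairs = [];
  let selectedPairId = null;

  // First pass: index everything and remember the identifiers we need
  stats.forEach((report) => {
    byId.set(report.id, report);
    if (report.type === "transport" && report.selectedCandidatePairId) {
      selectedPairId = report.selectedCandidatePairId;
    }
    if (report.type === "candidate-pair") {
      pairs.push(report);
    }
  });

  // Second step: resolve the references once all reports are known
  const pair =
    byId.get(selectedPairId) ||
    pairs.find((p) => p.nominated && p.state === "succeeded") ||
    null;
  if (!pair) return null;
  return {
    pair,
    local: byId.get(pair.localCandidateId) || null,
    remote: byId.get(pair.remoteCandidateId) || null,
  };
}
```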

Use the timestamp and not the frequency

Most of the time, your application asks repeatedly for the statistics with a fixed delay between two requests. For example, your application collects the statistics every 5 seconds.

But having a fixed delay does not mean that you will receive the statistics exactly at that interval. That’s why there is a timestamp in each report.

And depending on the CPU load, this delay can fluctuate even more… I did some tests and found an average gap of 8 ms between the requested interval and the difference between two consecutive timestamps. If you collect statistics every 2 seconds, this is an error margin of 0.4%.

Always compute intermediate statistics from the difference between timestamps rather than from the nominal collection interval.

Take care of the timestamp

The fact that statistics arrive together does not mean that their timestamp fields are equal.

For statistics that are collected remotely, such as remote-inbound-rtp, the timestamp differs from the others. In the tests I made, I sometimes observed a variation of around 2 seconds. The consequence, if you compute the MOS or display these stats together in a graph, is that you can have strange gaps between values computed locally (inbound-rtp or outbound-rtp) and those computed remotely (remote-inbound-rtp). The timestamp of the remote-inbound-rtp report corresponds to the time the report was received locally.

In the same way, the timestamps of two consecutive reports computed remotely could be the same, meaning that no new statistic has been collected. If you display a graph, you should not add an extra point because the wrong timestamp was used (local instead of remote).

Do not take the timestamp of the first local report of a series as the reference for computing the statistics of the other reports in that series. Each report has its own timestamp, meaning that reports may have been collected at different moments, which is always the case for the remote ones.
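One possible way to apply this, sketched below, is to remember the last timestamp seen for each report id and to keep only the reports whose own timestamp has changed. createFreshnessFilter is a hypothetical helper, not something provided by WebRTC.

```javascript
// Returns a filter that keeps only the reports whose own timestamp changed
// since the previous call, tracked per report id.
function createFreshnessFilter() {
  const lastTimestampById = new Map();

  return function keepFreshReports(stats) {
    const fresh = [];
    stats.forEach((report) => {
      if (lastTimestampById.get(report.id) !== report.timestamp) {
        lastTimestampById.set(report.id, report.timestamp);
        fresh.push(report);
      }
    });
    return fresh;
  };
}
```

When plotting, a point would then be added only for the reports returned by the filter, using each report's own timestamp for the x axis.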

Statistics take time

(To keep in mind.) Depending on the computer, tablet or mobile device used, collecting stats and computing metrics takes time. This can be a problem if the application is encoding and decoding several streams with several PeerConnections: it can hurt the performance of the application (for example by introducing latency when decoding or encoding streams…).

You can measure the time it takes to collect and compute the stats, or rely on the qualityLimitationReason property recently introduced in Chrome to detect CPU issues.
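A rough way to measure that cost is sketched here. measureStatsCost and its compute callback are hypothetical names; the sketch relies on performance.now(), which is available in browsers and in recent Node.js versions.

```javascript
// Measure how long collecting the stats and computing the metrics take.
// `compute` is the application's own (hypothetical) metrics function.
async function measureStatsCost(pc, compute) {
  const start = performance.now();
  const stats = await pc.getStats();
  const collected = performance.now();
  const result = compute(stats);
  const done = performance.now();
  return {
    result,
    collectMs: collected - start, // time spent waiting for getStats()
    computeMs: done - collected, // time spent in the application's own code
  };
}
```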

What statistics to use?

This section lists some interesting properties to collect.

Round Trip Time and Jitter

The Round Trip Time (RTT) measures the time it takes to send a request and receive a response. It is a good indicator of the speed of the network: the lower, the better. RTT can be collected from the remote-inbound-rtp reports (audio and video) as well as from the candidate-pair report (based on STUN requests).

The jitter measures the variation in packet delay. High values and high variability are good indicators for diagnosing quality problems. For good audio and video quality, the network should be as stable as possible. Jitter can be collected from the remote-inbound-rtp reports for your outbound streams and from the inbound-rtp reports for your inbound streams.
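The sketch below gathers these values in one pass. It assumes the property names roundTripTime, jitter and currentRoundTripTime (all expressed in seconds) and one stream per kind; with several streams of the same kind, the last one would win. extractRttAndJitter is a hypothetical helper.

```javascript
const numberOrNull = (value) => (typeof value === "number" ? value : null);

// Collect RTT and jitter (in seconds) for outbound and inbound streams,
// plus the RTT of the active candidate pair.
function extractRttAndJitter(stats) {
  const result = { outbound: {}, inbound: {}, transportRtt: null };
  stats.forEach((report) => {
    // Older implementations expose `mediaType` instead of `kind`
    const kind = report.kind || report.mediaType;
    if (report.type === "remote-inbound-rtp") {
      // Computed by the remote peer about the streams we send
      result.outbound[kind] = {
        rtt: numberOrNull(report.roundTripTime),
        jitter: numberOrNull(report.jitter),
      };
    } else if (report.type === "inbound-rtp") {
      result.inbound[kind] = { jitter: numberOrNull(report.jitter) };
    } else if (
      report.type === "candidate-pair" &&
      report.nominated &&
      report.state === "succeeded"
    ) {
      // RTT measured with STUN requests on the pair in use
      result.transportRtt = numberOrNull(report.currentRoundTripTime);
    }
  });
  return result;
}
```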

Bytes sent and received

The bytesSent and bytesReceived properties let you measure your outgoing and incoming bitrates in real time. To do so, take two consecutive reports, subtract the values, and divide by the duration of the interval, obtained from the timestamp property.
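A minimal sketch of that computation follows. bitrateKbps is a hypothetical helper; it takes two consecutive reports with the same id and relies on the timestamp being expressed in milliseconds.

```javascript
// Bitrate in kbit/s between two consecutive reports of the same stream.
// `property` is "bytesSent" (outbound-rtp) or "bytesReceived" (inbound-rtp).
function bitrateKbps(previous, current, property) {
  const elapsedMs = current.timestamp - previous.timestamp;
  if (!(elapsedMs > 0)) return null; // same sample, or timestamps missing
  const bytes = current[property] - previous[property];
  if (!(bytes >= 0)) return null; // property missing or counter reset
  return (bytes * 8) / elapsedMs; // bits per millisecond equals kbit/s
}

// Example: 125500 bytes sent over 2008 ms (not exactly the 2000 ms requested)
const kbps = bitrateKbps(
  { timestamp: 1000, bytesSent: 0 },
  { timestamp: 3008, bytesSent: 125500 },
  "bytesSent"
);
console.log(kbps); // 500
```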

Quality limitation reason

The qualityLimitationReason property gives the reason why your outgoing stream may be degraded. If the value is not none, something is limiting your outgoing stream to less than 100% of the requested quality. This can be a problem with your bandwidth (value bandwidth), with your computer or device being overloaded (value cpu), or another reason (value other).

Having information about the CPU is very important for identifying the cause of bad quality. Bandwidth is not always the reason…
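Reading the property could look like the sketch below. I assume it is reported on the outbound-rtp video reports; getQualityLimitation is a hypothetical helper that returns the first reason different from none, or null when the browser does not report the property.

```javascript
// Returns "cpu", "bandwidth", "other", "none", or null when the property
// is not reported for any outgoing video stream.
function getQualityLimitation(stats) {
  let limitation = null;
  stats.forEach((report) => {
    const kind = report.kind || report.mediaType;
    if (
      report.type === "outbound-rtp" &&
      kind === "video" &&
      report.qualityLimitationReason
    ) {
      // Keep the first real limitation found; "none" can be overridden
      if (limitation === null || limitation === "none") {
        limitation = report.qualityLimitationReason;
      }
    }
  });
  return limitation;
}
```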

Packets lost, NACK, FIR, PLI

The packetsLost property tells you whether packets transmitted for the incoming and outgoing streams have been lost somewhere across the network. Lost packets do not necessarily decrease the perceived quality: WebRTC has internal mechanisms to compensate for lost packets, such as the Forward Error Correction (FEC) used in Opus. But knowing that packets are lost is a good indication of the quality of the network. Packets lost can be collected from the inbound-rtp reports for the incoming streams and from remote-inbound-rtp for the outgoing streams.

The nackCount (Negative Acknowledgement) property is a good indicator of packet loss because it indicates how many times the WebRTC stack asked the remote peer to retransmit packets and, conversely, how many times the remote peer asked your stack to retransmit packets. A good practice is to reduce the bitrate when your recipient sends you NACKs.

The firCount (Full Intra Request) property is another indicator. When a FIR message is received, it indicates that the recipient is no longer able to render the stream correctly, so a complete frame should be sent instead of just the delta.

The pliCount (Picture Loss Indication) property is quite similar. When the recipient loses a full frame, it sends a PLI message asking the sender to send a full frame again.

NACK, FIR and PLI apply to video streams only. These properties can be collected from the inbound-rtp and outbound-rtp reports.

These indicators tell the application about the quality of the network in terms of loss. In most cases, reducing the bitrate should help to quickly recover a better situation. Don’t forget that the WebRTC stack automatically adapts the quality of your outgoing stream.
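For an incoming stream, a loss percentage over an interval can be sketched as follows. I assume the inbound-rtp report exposes the cumulative counters packetsLost and packetsReceived; inboundLossPercent is a hypothetical helper and the clamping of negative values is my own choice.

```javascript
// Percentage of packets lost between two consecutive inbound-rtp reports.
function inboundLossPercent(previous, current) {
  const lost = current.packetsLost - previous.packetsLost;
  const received = current.packetsReceived - previous.packetsReceived;
  const expected = lost + received;
  if (!(expected > 0)) return null; // nothing expected, or counters missing
  // The cumulative counter can go down (late or duplicated packets): clamp to 0
  return (Math.max(0, lost) * 100) / expected;
}

// Example: 5 packets lost and 95 received during the interval
const lossPercent = inboundLossPercent(
  { packetsLost: 10, packetsReceived: 1000 },
  { packetsLost: 15, packetsReceived: 1095 }
);
console.log(lossPercent); // 5
```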
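Since these three properties are cumulative counters, what matters is how much they grew between two collections. A small sketch, with feedbackDeltas as a hypothetical helper name:

```javascript
// Growth of the NACK, FIR and PLI counters between two consecutive reports
// of the same video stream (inbound-rtp or outbound-rtp).
function feedbackDeltas(previous, current) {
  const deltas = {};
  for (const key of ["nackCount", "firCount", "pliCount"]) {
    // A counter may be absent from a report: treat it as unchanged
    const before = typeof previous[key] === "number" ? previous[key] : 0;
    const after = typeof current[key] === "number" ? current[key] : before;
    deltas[key] = Math.max(0, after - before);
  }
  return deltas;
}
```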

How to react?

Display a quality indicator

Most applications start by displaying a quality indicator that tells users when the quality is not good. In most cases, it indicates that the network is not good. In fact, this is a UX reaction that tells users “Hey, don’t worry, this is under control…”.

In most cases, that indicator is based on the Mean Opinion Score (MOS), a score between 1 and 4.5 that describes the quality of the call (audio part). When the MOS is below 3.6, the perception starts to change from good to bad. When the score stays below 3 for some time, the user needs to make extra effort to listen and understand the speaker. Finally, when the score is below 2, the conversation is no longer understandable.

Computing the MOS can be done by computing an R-value that can then be converted into a score. For that, you need the percentage of packets lost, the RTT and the jitter. The method can be found here: Monitoring VoIP call using improved simplified E-model
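These thresholds can be turned into a simple indicator, as in the sketch below. The labels are hypothetical, and the sketch ignores the "during some time" aspect, which would require smoothing the score over several samples.

```javascript
// Map a MOS value to a (hypothetical) quality label using the thresholds above.
function qualityLevel(mos) {
  if (mos >= 3.6) return "good";
  if (mos >= 3) return "fair";
  if (mos >= 2) return "poor";
  return "bad";
}
```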
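To give an idea of the shape of such a computation, here is a sketch. Only the conversion from R-value to MOS follows the standard E-model formula. The estimation of R from RTT, jitter and loss is a commonly circulated simplification that I use as an assumption; it is not the method of the paper referenced above, and its constants should be treated as illustrative. Inputs are in milliseconds and percent, whereas getStats reports RTT and jitter in seconds.

```javascript
// Standard E-model conversion from an R-value to a MOS between 1 and 4.5.
function rValueToMos(r) {
  if (r <= 0) return 1;
  if (r >= 100) return 4.5;
  return 1 + 0.035 * r + 0.000007 * r * (r - 60) * (100 - r);
}

// Assumption: simplified R-value estimation (illustrative constants).
function estimateRValue(rttMs, jitterMs, lossPercent) {
  // Jitter is weighted twice and 10 ms are added for processing delay
  const effectiveLatency = rttMs + 2 * jitterMs + 10;
  let r =
    effectiveLatency < 160
      ? 93.2 - effectiveLatency / 40
      : 93.2 - (effectiveLatency - 120) / 10;
  r -= 2.5 * lossPercent; // each percent of loss costs 2.5 points
  return Math.max(0, Math.min(100, r));
}

function estimateMos(rttMs, jitterMs, lossPercent) {
  return rValueToMos(estimateRValue(rttMs, jitterMs, lossPercent));
}
```

With this sketch, a call with 50 ms of RTT, 5 ms of jitter and no loss scores around 4.4, while 300 ms of RTT, 30 ms of jitter and 10% of loss drops to around 2.2.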

Changing the bitrate

As seen previously, the application can control, or in other words constrain, the bitrate used by the stack.

This can be done prior to the call by modifying the SDP and adding a new line:

# For Chrome/Safari (value in kbit/s, here a limit of 500 kbit/s)
b=AS:500

# For Firefox (value in bit/s)
b=TIAS:500000
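As a sketch of how such a line could be added programmatically: setVideoBandwidth is a hypothetical helper that inserts both forms after the c= line of the video section (b= lines must follow c= in a media description). It assumes an SDP with CRLF line endings where each media section has its own c= line, which is what browsers usually generate.

```javascript
// Add a bandwidth limit to the video section of an SDP string.
function setVideoBandwidth(sdp, kbps) {
  const out = [];
  let inVideo = false;
  for (const line of sdp.split("\r\n")) {
    if (line.startsWith("m=")) {
      inVideo = line.startsWith("m=video");
    }
    // Drop any limit previously set in the video section
    if (inVideo && (line.startsWith("b=AS:") || line.startsWith("b=TIAS:"))) {
      continue;
    }
    out.push(line);
    if (inVideo && line.startsWith("c=")) {
      out.push(`b=AS:${kbps}`); // kbit/s
      out.push(`b=TIAS:${kbps * 1000}`); // bit/s
    }
  }
  return out.join("\r\n");
}

// Usage in a browser, before applying the description (assuming `pc` exists):
//   const offer = await pc.createOffer();
//   await pc.setLocalDescription({ type: offer.type, sdp: setVideoBandwidth(offer.sdp, 500) });
```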

During the call, it is possible to change the bandwidth used “on the fly” by calling the setParameters() function on the RTCRtpSender, as in the following:

const sender = peerConnection.getSenders()[0];

// Limit to 500 kbit/s
const parameters = sender.getParameters();
// For compatibility reasons
if (!parameters.encodings || parameters.encodings.length === 0) {
  parameters.encodings = [{}];
}
parameters.encodings[0].maxBitrate = 500000;
sender.setParameters(parameters).then(() => {
  // In case of success
  console.log("limiter set");
}).catch(err => {
  // In case of error
  console.error(err);
});

// Later, to remove the limiter, the maxBitrate property should be removed.
// getParameters() must be called again before each setParameters() call:
// a parameters object that has already been applied cannot be reused.
const currentParameters = sender.getParameters();
delete currentParameters.encodings[0].maxBitrate;
sender.setParameters(currentParameters).then(() => {
  // In case of success
  console.log("limiter removed");
}).catch(err => {
  // In case of error
  console.error(err);
});

Note: Another possibility is to scale the resolution down using the scaleResolutionDownBy parameter. For example, a value of 2 results in a video 1/4 the size of the original. But as you will see, decreasing the bitrate will also decrease the quality of the video to adapt to the given bitrate.

Limiting the bandwidth (on both sides) is a way to reduce network congestion for some time. Once the statistics look good again, this limitation can be removed.

Displaying fewer videos (SFU)

In a conference where many participants’ videos are displayed, if the application detects an issue with the CPU (using the qualityLimitationReason property), it can decide to render fewer videos (for example, only 4 videos at the same time instead of 6).

A CPU limitation can appear at any time, but most often on tablets, where CPUs are less powerful. But if you have several applications open on your computer during the call, one of which uses a lot of CPU, you could have the same problem: this will affect the encoding and decoding of streams.
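A sketch of that alternative, with scaleVideoDown as a hypothetical helper and sender assumed to be the RTCRtpSender of the video track. With a factor of 2, a 1280x720 video is encoded at 640x360, hence a quarter of the pixels.

```javascript
// Ask the sender to divide the width and height of the encoded video by `factor`.
async function scaleVideoDown(sender, factor) {
  const parameters = sender.getParameters();
  // For compatibility reasons, as in the bitrate example
  if (!parameters.encodings || parameters.encodings.length === 0) {
    parameters.encodings = [{}];
  }
  parameters.encodings[0].scaleResolutionDownBy = factor; // must be >= 1
  await sender.setParameters(parameters);
  return parameters.encodings[0];
}

// Usage in a browser (assuming `videoSender` is the video RTCRtpSender):
//   await scaleVideoDown(videoSender, 2); // half width, half height
//   await scaleVideoDown(videoSender, 1); // back to the original resolution
```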
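The decision itself can be as simple as the sketch below. The policy (remove two videos on a CPU limitation, add one back when there is no limitation) and the name nextVideoCount are hypothetical; a real application would probably wait for several consecutive samples before changing the layout.

```javascript
// Decide how many videos to render from the current qualityLimitationReason.
function nextVideoCount(currentCount, limitationReason, options = {}) {
  const { min = 1, max = 6, step = 2 } = options;
  if (limitationReason === "cpu") {
    return Math.max(min, currentCount - step); // relieve the CPU
  }
  if (limitationReason === "none") {
    return Math.min(max, currentCount + 1); // slowly restore videos
  }
  return currentCount; // "bandwidth", "other" or unknown: leave the layout as is
}
```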

The good news

Fortunately, WebRTC is a robust stack that does everything possible to prioritize the audio part with the best possible quality, and then keeps the video part on a best-effort basis but with the best possible resolution and framerate.

That’s why, when starting a call, the WebRTC stack never sends high-quality streams right away, even if the application asked for a 720p video stream. Depending on the bandwidth estimation, it can start at 15% of the requested quality and then gradually increase the resolution. If everything looks good, after around 30 seconds the video reaches 100% of the requested resolution. And at any time, if the WebRTC stack detects a limitation, as explained, the encoded resolution will decrease to adapt (best effort).