Managing Devices in WebRTC

By Olivier Anguenot

Published in dev

July 29, 2024

Introduction

Permissions

Devices enumeration

Grouping devices together

Input devices selection

Output devices selection

Devices Changed

Alternatives

To conclude

To have good WebRTC conversations, you first have to offer the user the right devices to use.

Imagine you’re on a video call with a friend and you can’t hear him because the microphone isn’t the right one. Or you can’t see him because he’s sharing a virtual camera… It’s no fun, is it?

To avoid these situations, it’s important to manage devices correctly in your WebRTC application. And that’s not always easy, especially when you’re connecting and disconnecting devices on the fly or when you’re pairing a new device like AirPods.

Why isn’t it easy? Because each browser implements device management differently, which leads to inconsistent behaviour from one browser to another.

What’s worse is that this API was designed in 2013 and more than 10 years later, we’re still having problems with it.

In this article, I’ll explore how to manage devices in WebRTC applications.

Introduction

Managing devices in a WebRTC application involves several key steps:

Authorisation: The first step in device management is to request and obtain permission from the user to access their media devices (camera, microphone). This is usually done using the navigator.mediaDevices.getUserMedia() method, which asks the user for permission to access their camera and/or microphone.
Enumeration: Once access has been granted, the next step is to enumerate the available media devices. This can be done using the navigator.mediaDevices.enumerateDevices() method. In return, you get a list of available devices, including their IDs, labels and types (i.e. audioinput, audiooutput, videoinput).
Selection: Having listed the devices, the user should be able to select the desired device for their audio and video inputs. Again, this can be done using the navigator.mediaDevices.getUserMedia() method, but this time with constraints specifying the IDs of the desired devices.
Detection: The final step is to detect changes in the available devices, for example when a new device is connected or an existing device is disconnected. This can be done by listening for the devicechange event on the navigator.mediaDevices object.
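
The four steps above can be sketched as a single function. This is only a minimal illustration: the names setupDevices and onDevicesChanged are hypothetical, and the mediaDevices object is passed as a parameter (navigator.mediaDevices in a browser) so that the function can also be exercised with a stub.

```javascript
// Sketch of the four steps. `mediaDevices` is injected: pass
// navigator.mediaDevices in a browser, or a stub when testing.
async function setupDevices(mediaDevices, onDevicesChanged) {
  // 1. Authorisation: this call triggers the permission prompt.
  const stream = await mediaDevices.getUserMedia({ audio: true, video: true });

  // 2. Enumeration: list the devices once access has been granted.
  const devices = await mediaDevices.enumerateDevices();

  // 3. Selection is left to the UI (constraints with a deviceId, see later).

  // 4. Detection: re-enumerate each time the browser reports a change.
  mediaDevices.addEventListener('devicechange', async () => {
    onDevicesChanged(await mediaDevices.enumerateDevices());
  });

  return { stream, devices };
}
```

In a page, this would be called as setupDevices(navigator.mediaDevices, updateUi), where updateUi is whatever function refreshes your device pickers.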

Permissions

No common rules?

This step is mandatory: the application asks the user for permission to access the devices on behalf of the website or the application (SPA).

Why do you need this? Because you don’t want an application to access your camera and microphone without your consent. It’s a question of privacy. For the camera, it is easy to see if it is being used, but for the microphone, it is not so easy. How would you know that an application is listening to you if you don’t pay attention to the browser toolbar or the system toolbar?

Permission is requested by calling the method navigator.mediaDevices.getUserMedia() with the type of media you want to get: audio, video or both. The point to understand is that if you don’t request a specific device, the browser will choose one for you (Chrome/Safari) or let the user select which one to use (Firefox).

So, the way each browser handles the permission is not exactly the same. What they all have in common is that authorization relates to a domain.

In Chrome, by default, you request authorization for all devices of the type you want. Not for a particular device. So once you have access to a camera, you don’t need another permission to use another camera. Additionally, you can authorize for a single session or permanently. You can also block the permission.
In Safari, this is per domain too. But every time you reload the page, you have to ask for the permission again.
In Firefox, you always give the permission for a specific device. If you want to use another device, you have to ask for the permission again. However, Firefox has recently added the option to authorize all devices of the same type (i.e. all cameras or all microphones).

Depending on the choice of the user, the experience may be different when the user wants to switch to a different device.

System permissions

Please note that an operating system such as macOS requires an extra permission (the first time you use the browser) that globally allows the browser to access the devices.

Note that this permission only concerns browsers other than Safari…

Once this authorisation has been accepted, the system will no longer ask you for it.

But what happens if by mistake you refuse this permission?

When getUserMedia is called, the application cannot access the device even if the user grants the permission in the browser. The error received differs between browsers:

In Firefox, it will generate a DOMException: The object can not be found here.
In Chrome, it will be the error NotAllowedError: Permission denied by system.

In Chrome, you can deduce that the permission has been refused by the system. In Firefox, it is not so clear.

Permissions API

At any time, you can query this permission thanks to the Permissions API.

Be careful, Firefox is still not managing permissions for the microphone and the camera, so this API does not work in all browsers.

Here is an example in the latest Chrome Canary 129:

try {
  const permission = await navigator.permissions.query({ name: 'camera' });
  console.log(permission.state);
  // granted, denied, prompt
} catch(err) {
  // Handle the error
}

Note: At this time (last update from March’24), only the following permissions have been standardized: geolocation, notifications, push and web-share. Others are still in the draft.
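
Since the query is not supported everywhere, it helps to wrap it so that an unsupported browser gives an "unknown" answer instead of an exception. Here is a minimal sketch; the function name getMediaPermissionState and the 'unknown' value are my own conventions, not part of the API, and the permissions object is passed as a parameter (navigator.permissions in a browser).

```javascript
// Returns 'granted', 'denied', 'prompt', or 'unknown' when the browser
// cannot answer. `permissions` would be navigator.permissions in a browser.
async function getMediaPermissionState(permissions, name) {
  if (!permissions || typeof permissions.query !== 'function') {
    return 'unknown'; // Permissions API not available at all
  }
  try {
    const status = await permissions.query({ name });
    return status.state;
  } catch (err) {
    // A browser that does not know the 'camera' or 'microphone'
    // permission name rejects the query.
    return 'unknown';
  }
}
```

When the result is 'unknown', the only way to find out is to call getUserMedia and handle the error.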

Remove or reset permissions

At any time, a permission can be removed. Not by the application but by the user.

From the system settings (on macOS), the user can globally remove the browser’s authorization to access the devices, for browsers other than Safari.
From the browser settings, the user can remove or reset any permission given to a specific domain.

As before, if the user declines the authorization by mistake, your application can detect it and suggest that the user grants the permission again.

In Chrome, the application receives the error DOMException: Permission denied
In Firefox, the application receives the error DOMException: The request is not allowed by the user agent or the platform in the current context.
In Safari: this is the error NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.

Detecting this error can be useful to propose to the user a way to reset the permission.
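
A possible way to detect these cases is to look at the name of the error and, as a hint only, at its message. The sketch below is based on the messages quoted in this article. It assumes that a refused permission is reported with the name NotAllowedError and a missing device with NotFoundError. Messages are not standardized and may change, so treat the 'denied-by-system' result as a best guess. The function name and the returned labels are hypothetical.

```javascript
// Classifies an error thrown by getUserMedia into a few coarse categories.
// The labels returned here are my own convention.
function classifyGetUserMediaError(err) {
  const name = err && err.name;
  const message = String((err && err.message) || '').toLowerCase();

  if (name === 'NotAllowedError') {
    // Chrome mentions the system in its message when the OS blocks access.
    return message.includes('system') ? 'denied-by-system' : 'denied-by-user';
  }
  if (name === 'NotFoundError') {
    // No matching device, or (as observed in Firefox) blocked by the OS.
    return 'not-found';
  }
  return 'other';
}
```

You would call it from the catch block around getUserMedia and display a message that matches the category.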

To summarize

The getUserMedia API is an “all-in-one” API, meaning that in case of success, you get a stream (of type MediaStream) containing the tracks (of type MediaStreamTrack) you asked for (i.e. audio, video or both).

So, if you already asked with specific constraints, you don’t need to do anything else. You can use the stream directly.

If you want more information on how to use this API, please refer to the article GetUserMedia Constraints Explained. It mainly covers constraints, but it gives you a good overview of how to use this API.

Here is a simple example of how to ask for the permission to access the camera and the microphone:

const constraints = {
  audio: true,
  video: true,
};

try {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
} catch(err) {
  // Handle the error
}

Devices enumeration

Once the permission is granted, you can list the devices available on the system. This is done by calling the method navigator.mediaDevices.enumerateDevices().

This method returns a promise that resolves with an array of MediaDeviceInfo objects. Each object represents a media input or output device such as a microphone, camera, or speaker. The MediaDeviceInfo object contains information about the device, including its deviceId, groupId, kind (i.e. audioinput, audiooutput, videoinput) and label.
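
As a small illustration, the array returned by enumerateDevices() can be split by kind before filling the device pickers. The function name groupByKind is hypothetical, and it works on any array of objects that have a kind property.

```javascript
// Splits a list of MediaDeviceInfo-like objects into one array per kind.
function groupByKind(devices) {
  const byKind = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    if (byKind[device.kind]) {
      byKind[device.kind].push(device);
    }
  }
  return byKind;
}
```

In a browser, this would be used as groupByKind(await navigator.mediaDevices.enumerateDevices()).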

Here is the result of the enumeration after asking for the same basic constraints (i.e. audio and video) in the different browsers on my machine:

Browser	Total	Audio Input	Audio Output	Video Input
Chrome	24	Default + 3 physical devices + 7 virtual devices	Default + 5 physical devices + 5 virtual devices	1 physical device + 1 virtual device
Firefox	19	3 physical devices + 7 virtual devices	2 physical devices + 5 virtual devices	1 physical device + 1 virtual device
Safari	12	3 physical devices + 7 virtual devices	-	1 physical device + 1 virtual device

The main differences are:

Chrome adds the default devices (input and output) to the list of devices. In fact, default devices are existing devices whose id has been replaced by default. If you don’t ask for a specific device, Chrome will use this default device, whereas Firefox/Safari take the first one in the list.
Safari still doesn’t support output devices.
Firefox does not display the built-in speakers: the one integrated into my Mac Mini and those integrated into my 2 monitors (HDMI and DisplayPort).
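
To avoid showing the same device twice in a picker, the Chrome default entries can be separated from the real ones. The sketch below rests on two assumptions taken from what I see in Chrome, neither of which is guaranteed: a default entry carries the literal deviceId default, and it shares its groupId with the real device it points to. The function name splitDefaultEntries is hypothetical.

```javascript
// Assumption: Chrome-style default entries use the literal deviceId 'default'
// and share their groupId with the real device they point to.
function splitDefaultEntries(devices) {
  const defaults = devices.filter((d) => d.deviceId === 'default');
  const real = devices.filter((d) => d.deviceId !== 'default');

  // For each default entry, try to find the real device behind it.
  const resolved = defaults.map((entry) => ({
    kind: entry.kind,
    device:
      real.find((d) => d.kind === entry.kind && d.groupId === entry.groupId) || null,
  }));

  return { real, resolved };
}
```

In Firefox and Safari, resolved is simply empty and real contains the whole list.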

Grouping devices together

There are devices capable of simultaneously managing audio and video or audio input and output. For example, a webcam with a built-in microphone or a microphone with built-in speakers such as the equipment used in a meeting room.

In this case, it is useful to group these devices so that the user can select them as a single device, which means that selecting one will automatically select the other device of the same group.

The association is possible thanks to the groupId attribute of the MediaDeviceInfo object. This attribute is a unique identifier for the group of devices to which the device belongs.

Two devices have the same group identifier if they belong to the same physical device; for example, a monitor with both a built-in camera and microphone (source: MDN).

As usual, behavior in browsers is different:

Firefox groups physical devices and also all virtual devices
Chrome groups physical devices but not all virtual devices (e.g. it works for the Teams audio devices but not for the Rode Connect virtual devices)
Safari doesn’t care about grouping devices… (not yet?)
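
Here is a minimal sketch of how the groupId can be used, assuming the browser fills it in. The function names groupByGroupId and findCompanions are hypothetical. Devices with an empty groupId are skipped, because they carry no usable group information.

```javascript
// Builds a Map from groupId to the devices that share it.
function groupByGroupId(devices) {
  const groups = new Map();
  for (const device of devices) {
    if (!device.groupId) continue; // no usable group information
    if (!groups.has(device.groupId)) {
      groups.set(device.groupId, []);
    }
    groups.get(device.groupId).push(device);
  }
  return groups;
}

// Returns the devices of another kind that belong to the same group as
// `selected`, e.g. the speaker that goes with the selected microphone.
function findCompanions(devices, selected) {
  if (!selected.groupId) return [];
  return devices.filter(
    (d) => d.groupId === selected.groupId && d.kind !== selected.kind
  );
}
```

When the user picks a microphone, findCompanions gives the candidates to select at the same time, if any.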

Input devices selection

Once you have listed the devices, you can allow the user to select the desired devices for audio and video input. This can be done by specifying the device IDs in the constraints object passed to the getUserMedia() method.

The article GetUserMedia Constraints Explained describes how to do that.

Here is an example of selecting a microphone:

const constraints = {
  audio: {
    deviceId: {
      exact: "0e05387a88dec20949ff8d8d18ee288ed4d7271a8d4380b497feb1432300c4bd",
    },
  },
};

try {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
} catch(err) {
  // Handle the error
}

Note: Remember to catch the error in case the user denies the permission or the device is not available.
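
The constraints object can be built from what the user picked in the UI. The helper below is only a sketch with hypothetical names (toTrackConstraint, buildConstraints): a device id string becomes an exact constraint, true lets the browser choose, and anything else means the media type is not requested.

```javascript
// Turns a user choice into a constraint for one media type.
function toTrackConstraint(choice) {
  if (typeof choice === 'string' && choice !== '') {
    return { deviceId: { exact: choice } }; // this device or nothing
  }
  return choice === true; // true: browser picks, otherwise not requested
}

// Builds the object to pass to getUserMedia().
function buildConstraints({ audio, video } = {}) {
  return {
    audio: toTrackConstraint(audio),
    video: toTrackConstraint(video),
  };
}
```

For example, buildConstraints({ audio: selectedMicId, video: true }) asks for one precise microphone and any camera. Keep in mind that getUserMedia rejects when both audio and video are false, and that an exact deviceId that no longer exists also leads to a rejection.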

Output devices selection

In Firefox and Chrome, you can select the speaker by using the setSinkId() method on the HTMLMediaElement object. This method expects the deviceId of an audiooutput media device.

Here is an example of how to use the setSinkId() method:

const audio = document.querySelector('#audio');
// setSinkId() returns a promise that rejects if the device cannot be used
await audio.setSinkId("51D9CC25B5DFDD54160FC1E357577D50116FDA89");

Firefox goes a step further by implementing the API navigator.mediaDevices.selectAudioOutput. This API allows the user to select the output device from a pop-up window displaying all the devices.

I tested it on Firefox Nightly (130) and this API also offers the built-in speakers.

Please note that this API is not yet completely standardized and requires a gesture from the user to work (for example, by clicking on a button). This is to prevent an application from sending the media to an external speaker without your consent.

This specification is available here: Audio Output Devices API
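
Because setSinkId() is not available in every browser, it is safer to check for it before calling it. Here is a minimal sketch; the function name setOutputDevice is hypothetical, and the element is any audio or video element of your page.

```javascript
// Routes the audio of a media element to the given output device.
// Resolves to false when the browser has no setSinkId() on the element,
// and rejects if setSinkId() itself rejects (unknown device, no permission).
async function setOutputDevice(element, deviceId) {
  if (typeof element.setSinkId !== 'function') {
    return false; // output selection not supported for this element
  }
  await element.setSinkId(deviceId);
  return true;
}
```

A false result is a good reason to hide the speaker picker instead of showing a control that does nothing.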

Devices Changed

The interesting part…

From the specification:

The set of media devices, available to the User Agent, has changed. The current list of devices is available in the devices attribute

Currently, the browsers don’t implement this event as defined in the specification.

Here is what I observed:

Browser	Event fired when	devices attribute
Chrome	The default device is changed at System level (OS)	NO
	A new device is paired (1x or 2x)	NO
	A device is removed (1x or 2x)	NO
Firefox	A new device is paired (1x)	NO
	A device is removed (1x)	NO
Safari	The default device is changed at System level (OS)	NO
	A new device is paired (1x)	NO
	A device is removed (1x)	NO

The main things to remember are:

Firefox seems not to fire events when the default device is changed at the system level.
Firefox fires events only when the browser is active. Otherwise, the event is fired as soon as the browser gets the focus.
Chrome fires events for each type of device added or removed (i.e. audioinput, audiooutput, videoinput). For example, if you pair your AirPods, you will see two events: one for the input and one for the output.
None of the browsers fires events with the devices attribute as defined in the specification.

Firefox case

As Firefox does not fire events when the default device is changed at the system level (at least on macOS), your application can get out of sync, mainly regarding the speakers in use. This is the main issue I see with Firefox.

The way I decided to prevent this issue is to call enumerateDevices regularly and to compare the result with the current list of devices.

This way, I can detect when I plug or unplug a device and display a banner asking the user to confirm the switch to the new device.
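
The comparison can be sketched as follows. The names diffDevices and watchDevices are hypothetical, and the 2-second interval is an arbitrary value chosen for the example, not a recommendation. Devices are compared on their kind and deviceId.

```javascript
// Identifies a device by kind and id (the same id may exist for two kinds).
const keyOf = (d) => `${d.kind}:${d.deviceId}`;

// Compares two enumerations and returns what appeared and what disappeared.
function diffDevices(previous, current) {
  const before = new Set(previous.map(keyOf));
  const after = new Set(current.map(keyOf));
  return {
    added: current.filter((d) => !before.has(keyOf(d))),
    removed: previous.filter((d) => !after.has(keyOf(d))),
  };
}

// Polls enumerateDevices() and reports differences. Returns a stop function.
// `mediaDevices` would be navigator.mediaDevices in a browser.
function watchDevices(mediaDevices, onChange, intervalMs = 2000) {
  let previous = null;
  const timer = setInterval(async () => {
    const current = await mediaDevices.enumerateDevices();
    if (previous) {
      const { added, removed } = diffDevices(previous, current);
      if (added.length || removed.length) {
        onChange({ added, removed, current });
      }
    }
    previous = current;
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The onChange callback is the place to display the banner. Calling the returned function stops the polling, for example when the call ends.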

Chrome double event

Here, the problem is different. So I chose to wait a few hundred milliseconds (up to 500 ms) after receiving a devicechange event so that the second one is ignored. As there is no devices attribute, this second event is not helpful, except to tell you that there have been two changes.

After this delay, I call enumerateDevices to get the list of devices and update the UI accordingly.

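// Guard flag: true while waiting for the burst of events to settle
let hasChanged = false;

navigator.mediaDevices.addEventListener('devicechange', () => {
  if (!hasChanged) {
    // Ignore the second event fired during the delay
    hasChanged = true;
    setTimeout(async () => {
      hasChanged = false;
      // Both events have been fired: refresh the list, then update the UI
      const devices = await navigator.mediaDevices.enumerateDevices();
    }, 500);
  }
});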

Managing AirPods

When you open the case to take your AirPods, they’re automatically paired with your Mac. All browsers detect the new devices and add them to the list of devices.

What is interesting is that as long as the AirPods aren’t in your ears, the device is only added to the list of devices.

If you put the AirPods in your ears, Chrome and Safari fire a new devicechange event. Why? Because the AirPods are now used as the default device for input and output.

If you call enumerateDevices again, you will see that in Chrome, the default device changed and in Safari, the AirPods are now the first audioinput device.

So, don’t take for granted that devicechange means that a device has been added or removed. It can also mean that the default device has changed, just as when this is done manually from the macOS settings.
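
A change of default device can be detected by comparing two enumerations, with one rule per browser behaviour described above. This sketch assumes that the default is the entry with the literal deviceId default when there is one (Chrome), and the first device of the kind otherwise (Safari). Both rules come from my observations and are not guaranteed. The function names are hypothetical.

```javascript
// Returns the device treated as the default for a kind: the entry with
// deviceId 'default' when present, else the first device of that kind.
function currentDefault(devices, kind) {
  const ofKind = devices.filter((d) => d.kind === kind);
  return ofKind.find((d) => d.deviceId === 'default') || ofKind[0] || null;
}

// Tells whether the default device of a kind differs between two lists.
function hasDefaultChanged(previous, current, kind) {
  const before = currentDefault(previous, kind);
  const after = currentDefault(current, kind);
  if (!before || !after) return before !== after;
  // A 'default' entry keeps its deviceId, so compare groupId and label too.
  return (
    before.deviceId !== after.deviceId ||
    before.groupId !== after.groupId ||
    before.label !== after.label
  );
}
```

Combined with a check for added and removed devices, this makes it possible to tell a pairing from a simple switch of default device.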

Alternatives

Input devices used

Assuming you have a MediaStream stream, you can get the devices used from its tracks.

const devices = await navigator.mediaDevices.enumerateDevices();

// Get the tracks
const tracks = stream.getTracks();

// Get the deviceId associated to each track from the settings
const devicesId = tracks.map(track => track.getSettings().deviceId);

// Compare this id to the list of devices your application knows to find the right ones
const devicesUsed = devices.filter(device => devicesId.includes(device.deviceId));

console.log(devicesUsed);
// [ { deviceId: "51D9CC25B5DFDD54160FC1E357577D50116FDA89", kind: "audioinput", label: "Rode NT-Usb", groupId:... }, { deviceId: "E2BF4D17BF7BD448FE0CF9C1141924A9B4FC5237", kind: "videoinput", label: "StreamCam", groupId: ... } ]

Output devices used

The way I found to confirm which speaker is used is to use the sinkId property of the <audio> or <video> element.

First, find this element in the DOM using a function such as getElementsByTagName and then if an HTMLAudioElement or HTMLVideoElement exists, get the id of the device used from the sinkId property.

It should match an audiooutput device.

In some circumstances, this can be a way to confirm the devices used.
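
Here is a minimal sketch of that lookup. The function name findOutputDevicesUsed is hypothetical. It takes the media elements and the enumerated devices as parameters, and it treats an empty sinkId as "the browser default output", which is how the sinkId property is specified.

```javascript
// For each media element, finds the audiooutput device it plays on.
function findOutputDevicesUsed(elements, devices) {
  const outputs = devices.filter((d) => d.kind === 'audiooutput');
  return Array.from(elements).map((element) => {
    const sinkId = element.sinkId;
    if (!sinkId) {
      // Empty (or missing) sinkId: the element uses the default output.
      return { element, device: null, usesDefault: true };
    }
    const device = outputs.find((d) => d.deviceId === sinkId) || null;
    return { element, device, usesDefault: false };
  });
}
```

In a page, the elements could come from document.querySelectorAll('audio, video') and the devices from enumerateDevices().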

To conclude

From what has been described here, the main points to consider are:

a) Detect any denied permission so you can help the user resolve it as soon as possible: use the query API or the getUserMedia API.
b) Show the current or default devices, the ones the user will use for the next call: store the devices used previously and compare them with the current list of devices to detect whether each device is still available or not.
c) Select the new device when the user plugs or pairs a new device, because most probably they want to use it: rely on the devicechange event and on the enumerateDevices API.
d) Group devices when applicable: use the groupId attribute and, when a new device is plugged or paired, switch to it globally (audio and video or input and output).
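
For the point about previously used devices, the restore logic can be as small as the sketch below. The function name pickDevice is hypothetical, and it assumes that the deviceId you stored (for example in localStorage) is still valid, which depends on the browser and on the user not having cleared the site data.

```javascript
// Returns the stored device if it is still in the list, otherwise the first
// device of that kind, otherwise null when no such device exists.
function pickDevice(devices, kind, storedDeviceId) {
  const candidates = devices.filter((d) => d.kind === kind);
  return (
    candidates.find((d) => d.deviceId === storedDeviceId) ||
    candidates[0] ||
    null
  );
}
```

When the returned device is not the stored one, it is worth telling the user that the previous device is no longer available.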

In all cases, feedback is the key to helping the user understand what’s going on.

What you don’t want is the user having to reload the page to get the new devices available…

Don’t hesitate to share your views on this topic!