To have good WebRTC conversations, you first have to offer the user the right devices to use.
Imagine you’re on a video call with a friend and you can’t hear him because the microphone isn’t the right one. Or you can’t see him because he’s sharing a virtual camera… It’s no fun, is it?
To avoid these situations, it’s important to manage devices correctly in your WebRTC application. And that’s not always easy, especially when you’re connecting and disconnecting devices on the fly or when you’re pairing a new device like AirPods.
Why isn’t it easy? Because each browser implements device management differently, which leads to inconsistencies from one browser to another.
What’s worse, this API was designed in 2013, and more than 10 years later, we’re still having problems with it.
In this article, I’ll explore how to manage devices in WebRTC applications.
Managing devices in a WebRTC application involves several key steps:
1. Authorization: The first step in device management is to request and obtain permission from the user to access their media devices (camera, microphone). This is usually done using the `navigator.mediaDevices.getUserMedia()` method, which asks the user for permission to access their camera and/or microphone.
2. Enumeration: Once access has been granted, the next step is to enumerate the available media devices. This can be done using the `navigator.mediaDevices.enumerateDevices()` method. In return, you get a list of available devices, including their IDs, labels and types (i.e. `audioinput`, `audiooutput`, `videoinput`).
3. Selection: Having listed the devices, the user should be able to select the desired device for their audio and video inputs. Again, this can be done using the `navigator.mediaDevices.getUserMedia()` method, but this time with constraints specifying the IDs of the desired devices.
4. Detection: The final step is to detect changes in the available devices, for example when a new device is connected or an existing device is disconnected. This can be done by listening for the `devicechange` event on the `navigator.mediaDevices` object.
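The four steps above can be chained in a single function. This is a minimal sketch, not the author's implementation. `initDevices` and its `onDevicesChanged` callback are hypothetical names. `mediaDevices` is passed in as a parameter (normally `navigator.mediaDevices`), so the flow can also be exercised with a mock outside a browser. The selection step is simplified: it only reports which devices the browser picked.

```javascript
// Hypothetical sketch of the four steps. `mediaDevices` is normally
// navigator.mediaDevices; `onDevicesChanged` receives the new device list.
async function initDevices(mediaDevices, onDevicesChanged) {
  // 1. Authorization: asking for audio and video triggers the permission prompt.
  const stream = await mediaDevices.getUserMedia({ audio: true, video: true });

  // 2. Enumeration: once permission is granted, the labels are filled in.
  const devices = await mediaDevices.enumerateDevices();

  // 3. Selection (simplified): report the devices the browser picked.
  // A real UI would let the user choose and call getUserMedia() again
  // with deviceId constraints.
  const usedIds = stream.getTracks().map((track) => track.getSettings().deviceId);
  const selected = devices.filter((device) => usedIds.includes(device.deviceId));

  // 4. Detection: enumerate again whenever the set of devices changes.
  mediaDevices.addEventListener("devicechange", async () => {
    onDevicesChanged(await mediaDevices.enumerateDevices());
  });

  return { stream, devices, selected };
}
```

In a page, this would be called as `initDevices(navigator.mediaDevices, (devices) => { /* refresh the UI */ })`.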
Authorization is mandatory: the application asks the user for permission to access the devices for the website or for the application (SPA).
Why do you need this? Because you don’t want an application to access your camera and microphone without your consent. It’s a question of privacy. For the camera, it is easy to see if it is being used, but for the microphone, it is not so easy. How would you know whether an application is listening to you if you don’t pay attention to the browser toolbar or the system toolbar?
Permission is requested by calling the `navigator.mediaDevices.getUserMedia()` method with the type of media you want to get: audio, video or both. The point to understand is that if you don’t request a specific device, the browser will choose one for you (Chrome/Safari) or let you select which one to use (Firefox).
Each browser handles the permission slightly differently. What they all have in common is that the authorization is tied to a domain.
In Chrome, by default, you request authorization for all devices of the requested type, not for a particular device. So once you have access to a camera, you don’t need another permission to use another camera. Additionally, you can grant the authorization for a single session or permanently. You can also block it.
In Safari, the authorization is per domain too. But every time you reload the page, you have to ask for the permission again.
In Firefox, you always grant the permission for a specific device. If you want to use another device, you have to ask for the permission again. However, Firefox has recently added the option to authorize all devices of the same type (i.e. all cameras or all microphones).
Depending on the user’s choice, the experience differs when they later want to switch to a different device.
Note that some operating systems, such as macOS, require an extra permission (the first time you use the browser) that globally allows the browser to access the devices.
This permission only concerns browsers other than Safari.
Once this authorization has been granted, the system will no longer ask you for it.
But what happens if by mistake you refuse this permission?
In that case, when `getUserMedia()` is called, the application cannot access the device, even if the user grants the permission in the browser. The error received differs:
In Firefox, it generates a `DOMException: The object can not be found here`.
In Chrome, it is the error `NotAllowedError: Permission denied by system`.
In Chrome, you can deduce that the permission has been refused by the system. In Firefox, it is not so clear.
At any time, you can query this permission using the Permissions API.
Be careful: Firefox still doesn’t support querying the microphone and camera permissions, so this API doesn’t work in all browsers.
Here is an example in the latest Chrome Canary 129:
```javascript
try {
  const permission = await navigator.permissions.query({ name: 'camera' });
  console.log(permission.state); // granted, denied, prompt
} catch (err) {
  // Handle the error
}
```
Note: At this time (last update from March’24), only the following permissions have been standardized: `geolocation`, `notifications`, `push` and `web-share`. The others are still drafts.
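Since the query can fail in some browsers, it can help to wrap it. This is a minimal sketch with a hypothetical `getPermissionState` helper. It takes the `Permissions` object (normally `navigator.permissions`) as a parameter and treats a rejected query as "unsupported". It can optionally listen for the `change` event that a `PermissionStatus` fires when the user changes the permission.

```javascript
// Hypothetical helper: returns "granted", "denied", "prompt" or "unsupported".
// `permissions` is normally navigator.permissions.
async function getPermissionState(permissions, name, onChange) {
  // Some browsers don't expose the Permissions API at all.
  if (!permissions || typeof permissions.query !== "function") {
    return "unsupported";
  }
  try {
    const status = await permissions.query({ name });
    if (onChange) {
      // Fired when the user changes or resets the permission in the settings.
      status.addEventListener("change", () => onChange(status.state));
    }
    return status.state;
  } catch (err) {
    // Permission names the browser doesn't know make query() reject
    // (typically with a TypeError).
    return "unsupported";
  }
}
```

An "unsupported" result simply means the application has to fall back to calling `getUserMedia()` and inspecting the error.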
A permission can be removed at any time, not by the application but by the user.
From the system settings (on macOS), the user can globally remove the browser’s authorization to access the devices (for browsers other than Safari).
From the browser settings, you can remove or reset any permissions given for a specific domain.
As before, if the user declines the authorization by mistake, your application can detect it and offer to ask for the permission again.
In Chrome, the application receives the error `DOMException: Permission denied`.
In Firefox, the application receives the error `DOMException: The request is not allowed by the user agent or the platform in the current context`.
In Safari, it is the error `NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.`
Detecting this error is useful to offer the user a way to reset the permission.
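These error names and messages can be turned into a cause the UI can act on. The sketch below is hypothetical: the `classifyGetUserMediaError` name and the returned strings are illustrative. It makes two assumptions. First, that Firefox's `The object can not be found here` message comes with the `NotFoundError` name. Second, that a device requested with `exact` which no longer exists is reported as `OverconstrainedError`. Error messages are not standardized, so matching on them is fragile.

```javascript
// Hypothetical helper mapping getUserMedia() errors to a cause the UI can act on.
// Based on the messages observed above; they are not standardized and may change.
function classifyGetUserMediaError(err) {
  const name = err && err.name;
  const message = (err && err.message) || "";
  if (name === "NotAllowedError") {
    // Chrome reports an OS-level refusal with "Permission denied by system".
    return /by system/i.test(message) ? "denied-by-system" : "denied-by-user";
  }
  if (name === "NotFoundError") {
    // Assumed Firefox case: an OS-level refusal can't be told apart
    // from a genuinely missing device.
    return "not-found-or-denied-by-system";
  }
  if (name === "OverconstrainedError") {
    // An `exact` deviceId constraint could not be satisfied.
    return "device-unavailable";
  }
  return "unknown";
}
```

The UI can then show a different message for each cause, for instance pointing to the system settings for "denied-by-system".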
The `getUserMedia()` API is an “all-in-one” API, meaning that on success, you get a stream (of type `MediaStream`) containing the tracks (of type `MediaStreamTrack`) you asked for (i.e. audio, video or both).
So, if you have already asked with specific constraints, you don’t need to do anything extra: you can use the stream directly.
If you want more information on how to use this API, please refer to the article GetUserMedia Constraints Explained. It focuses on constraints, but it gives you a good overview of how to use this API.
Here is a simple example of how to ask for the permission to access the camera and the microphone:
```javascript
const constraints = {
  audio: true,
  video: true,
};

try {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  // Use the stream
} catch (err) {
  // Handle the error
}
```
Once the permission is granted, you can list the devices available on the system. This is done by calling the `navigator.mediaDevices.enumerateDevices()` method.
This method returns a promise that resolves with an array of `MediaDeviceInfo` objects. Each object represents a media input or output device such as a microphone, camera, or speaker. The `MediaDeviceInfo` object contains information about the device, including its `deviceId`, `groupId`, `kind` (i.e. `audioinput`, `audiooutput`, `videoinput`) and `label`.
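Because the list mixes all kinds of devices, a common first step is to split it by `kind`, for instance to fill one drop-down per kind. This is a minimal sketch with a hypothetical `groupByKind` helper:

```javascript
// Hypothetical helper: split the result of enumerateDevices() by kind
// so that each list can feed its own drop-down in the UI.
function groupByKind(devices) {
  const groups = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    // Ignore any kind this helper doesn't know about.
    if (groups[device.kind]) {
      groups[device.kind].push(device);
    }
  }
  return groups;
}
```

In a page, it would be called as `groupByKind(await navigator.mediaDevices.enumerateDevices())`.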
Here is the result of the enumeration after asking for the same basic constraints (i.e. audio and video) in the different browsers on my machine:
| Browser | Total | Audio input | Audio output | Video input |
|---|---|---|---|---|
| Chrome | 24 | Default + 3 physical + 7 virtual | Default + 5 physical + 5 virtual | 1 physical + 1 virtual |
| Firefox | 19 | 3 physical + 7 virtual | 2 physical + 5 virtual | 1 physical + 1 virtual |
| Safari | 12 | 3 physical + 7 virtual | - | 1 physical + 1 virtual |
The main differences are:
- Chrome adds an extra `default` device for audio input and audio output. If you don’t ask for a specific device, Chrome will use this default device, whereas Firefox/Safari take the first one in the list.
- Safari doesn’t list any audio output devices.

There are devices capable of simultaneously managing audio and video, or audio input and output. For example, a webcam with a built-in microphone, or a microphone with built-in speakers such as the equipment used in a meeting room.
In this case, it is useful to group these devices so that the user can select them as a single device: selecting one automatically selects the other devices of the same group.
This association is possible thanks to the `groupId` attribute of the `MediaDeviceInfo` object. This attribute is a unique identifier for the group of devices to which the device belongs.
Two devices have the same group identifier if they belong to the same physical device; for example, a monitor with both a built-in camera and microphone (source: MDN).
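Here is a minimal sketch of this grouping. It assumes a hypothetical `findCompanions` helper that receives the device the user picked and returns the devices of other kinds that share its `groupId`. As a precaution, it returns nothing when `groupId` is empty, in case a browser doesn’t fill it in.

```javascript
// Hypothetical helper: given the device the user picked, return the devices of
// the other kinds that share its groupId (e.g. the webcam's built-in microphone).
function findCompanions(devices, selected) {
  // Without a groupId there is nothing to match on.
  if (!selected.groupId) {
    return [];
  }
  return devices.filter(
    (device) => device.groupId === selected.groupId && device.kind !== selected.kind
  );
}
```

When the user picks a webcam, the application could then preselect the returned microphone as well.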
As usual, behavior in browsers is different:
Once you have listed the devices, you can allow the user to select the desired devices for audio and video input. This can be done by specifying the device IDs in the constraints object passed to the `getUserMedia()` method.
The article GetUserMedia Constraints Explained describes how to do this in detail.
Here is an example of selecting a microphone:
```javascript
const constraints = {
  audio: {
    deviceId: {
      exact: "0e05387a88dec20949ff8d8d18ee288ed4d7271a8d4380b497feb1432300c4bd",
    },
  },
};

try {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
} catch (err) {
  // Handle the error
}
```
Note: Remember to catch the error in case the user denies the permission or the device is not available.
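Building on this example, one way to handle a stored device that is no longer available is to retry without the `exact` constraint. This sketch assumes the browser reports the missing device as an `OverconstrainedError`, which is the error `getUserMedia()` uses when a required constraint such as `exact` can’t be satisfied. `getStreamWithFallback` is a hypothetical name, and `mediaDevices` is normally `navigator.mediaDevices`.

```javascript
// Hypothetical helper: request a specific microphone and fall back to the
// browser's choice if that device no longer exists.
async function getStreamWithFallback(mediaDevices, audioDeviceId) {
  try {
    return await mediaDevices.getUserMedia({
      audio: { deviceId: { exact: audioDeviceId } },
    });
  } catch (err) {
    if (err.name !== "OverconstrainedError") {
      // e.g. NotAllowedError: retrying won't help, let the caller handle it.
      throw err;
    }
    // The requested device is gone: let the browser pick one instead.
    return mediaDevices.getUserMedia({ audio: true });
  }
}
```

Without `exact`, the browser treats the `deviceId` as a preference and may silently use another device. This helper makes that fallback explicit, so the UI can tell the user about it.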
In Firefox and Chrome, you can select the speaker by using the `setSinkId()` method on the `HTMLMediaElement` object. This method expects the `deviceId` of an `audiooutput` media device.
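Here is an example of how to use the `setSinkId()` method:
```javascript
const audio = document.querySelector('#audio');

try {
  // setSinkId() returns a promise that rejects if the device can't be used
  await audio.setSinkId("51D9CC25B5DFDD54160FC1E357577D50116FDA89");
} catch (err) {
  // Handle the error
}
```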
Firefox goes a step further by implementing the `navigator.mediaDevices.selectAudioOutput()` API. This API allows the user to select the output device from a pop-up window displaying all the devices.
I tested it on Firefox Nightly (130) and this API also offers the built-in speakers.
Please note that this API is not yet completely standardized and requires a gesture from the user to work (for example, by clicking on a button). This is to prevent an application from sending the media to an external speaker without your consent.
This specification is available here: Audio Output Devices API
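Both approaches can be combined with feature detection. This is a minimal sketch with a hypothetical `chooseSpeaker` helper. It uses `selectAudioOutput()` when available, and otherwise a `deviceId` chosen in the application’s own UI. It then calls `setSinkId()` on the media element. Because of the user-gesture requirement, it should be called from a click handler.

```javascript
// Hypothetical helper: route an <audio>/<video> element to a speaker.
// Call it from a user gesture (e.g. a click handler): selectAudioOutput()
// requires one. `mediaDevices` is normally navigator.mediaDevices.
async function chooseSpeaker(mediaDevices, element, fallbackDeviceId) {
  if (typeof element.setSinkId !== "function") {
    return null; // setSinkId() not supported: keep the default output
  }
  let deviceId = fallbackDeviceId;
  if (typeof mediaDevices.selectAudioOutput === "function") {
    // Firefox: the browser shows its own picker and resolves with a MediaDeviceInfo.
    const device = await mediaDevices.selectAudioOutput();
    deviceId = device.deviceId;
  }
  if (deviceId === undefined) {
    return null; // nothing to apply
  }
  await element.setSinkId(deviceId);
  return deviceId;
}
```

If the user dismisses the Firefox picker, the promise rejects, so the caller should still wrap the call in a `try`/`catch`.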
Now the interesting part: detecting device changes.
From the specification, the `devicechange` event is fired when:
“The set of media devices, available to the User Agent, has changed. The current list of devices is available in the devices attribute.”
Currently, the browsers don’t implement this event as defined in the specification.
Here is what I observed:
| Browser | Event fired when | `devices` attribute |
|---|---|---|
| Chrome | The default device is changed at system level (OS) | NO |
| | A new device is paired (1x or 2x) | NO |
| | A device is removed (1x or 2x) | NO |
| Firefox | A new device is paired (1x) | NO |
| | A device is removed (1x) | NO |
| Safari | The default device is changed at system level (OS) | NO |
| | A new device is paired (1x) | NO |
| | A device is removed (1x) | NO |
The main things to remember are:
- Chrome can fire the event twice, once per kind of device affected (`audioinput`, `audiooutput`, `videoinput`). For example, if you pair your AirPods, you will see two events: one for the input and one for the output.
- No browser provides the `devices` attribute as defined in the specification.

As Firefox doesn’t fire events when the default device is changed at the system level (at least on macOS), your application can get out of sync, mainly regarding the speakers in use. This is the main issue I see with Firefox.
To prevent this issue, I decided to call `enumerateDevices()` regularly and compare the result with the current list of devices.
This way, I can detect when a device is plugged in or unplugged and display a banner asking the user to confirm switching to the new device.
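Here is a minimal sketch of this polling approach. It is not the author’s actual code. `diffDevices` compares two results of `enumerateDevices()`, and `pollDevices` runs that comparison on a timer; both names and the interval are assumptions. Devices are keyed by `kind` and `deviceId` together, because the same `deviceId` (such as Chrome’s `default`) can appear for several kinds.

```javascript
// Hypothetical helper: compare two results of enumerateDevices() and report
// what was plugged in or unplugged. A device is keyed by kind + deviceId.
function diffDevices(previous, current) {
  const key = (device) => `${device.kind}:${device.deviceId}`;
  const before = new Set(previous.map(key));
  const after = new Set(current.map(key));
  return {
    added: current.filter((device) => !before.has(key(device))),
    removed: previous.filter((device) => !after.has(key(device))),
  };
}

// Hypothetical poller: enumerate every `intervalMs` milliseconds and call
// onChange(diff, devices) when the list differs. Returns a stop function.
function pollDevices(mediaDevices, intervalMs, onChange) {
  let known = null;
  const timer = setInterval(async () => {
    try {
      const devices = await mediaDevices.enumerateDevices();
      if (known) {
        const { added, removed } = diffDevices(known, devices);
        if (added.length > 0 || removed.length > 0) {
          onChange({ added, removed }, devices);
        }
      }
      known = devices;
    } catch (err) {
      // Ignore this round; the next tick will try again.
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The `added` list is what a "switch to the new device?" banner would offer to the user.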
For Chrome, the problem is different. I chose to wait a few milliseconds (up to 500 ms) after receiving a `devicechange` event so that the second one is ignored. As there is no `devices` attribute, this second event isn’t helpful, except to tell you that there have been two changes.
After this delay, I call `enumerateDevices()` to get the list of devices and update the UI accordingly.
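```javascript
// true while waiting for a possible second devicechange event
let hasChanged = false;

navigator.mediaDevices.addEventListener('devicechange', () => {
  if (!hasChanged) {
    // Avoid doing something on the second event
    hasChanged = true;
    setTimeout(async () => {
      hasChanged = false;
      // Do something once the 2 events have been fired,
      // e.g. call enumerateDevices() and update the UI
    }, 500);
  }
});
```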
When you open the case to take your AirPods, they’re automatically paired with your Mac. All browsers detect the new devices and add them to the list of devices.
What is interesting is that as long as the AirPods aren’t in your ears, they are only added to the list of devices: they don’t become the default device.
If you put the AirPods in your ears, Chrome and Safari fire a new `devicechange` event. Why? Because the AirPods are now used as the default device for input and output.
If you call `enumerateDevices()` again, you will see that in Chrome, the `default` device has changed and in Safari, the AirPods are now the first `audioinput` device.
So, don’t assume that `devicechange` means that a device has been added or removed. It can also mean that the default device has changed, as when you change it manually from the macOS settings.
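To tell this case apart from an added or removed device, one option is to compare the device the browser would use by default before and after the event. This sketch relies on the behavior described above: Chrome’s `default` entry if present, otherwise the first device of the kind. `getEffectiveDefault` and `defaultChanged` are hypothetical names. Comparing `label` and `groupId` assumes that Chrome keeps the `default` ID but updates those fields when the system default changes.

```javascript
// Hypothetical helper: the device a browser uses when none is requested.
// Chrome exposes it as an entry whose deviceId is "default"; Firefox and
// Safari take the first device of that kind in the list.
function getEffectiveDefault(devices, kind) {
  const ofKind = devices.filter((device) => device.kind === kind);
  return ofKind.find((device) => device.deviceId === "default") || ofKind[0] || null;
}

// True if the default device of `kind` differs between two enumerations.
// label and groupId are compared too, assuming Chrome keeps the "default"
// deviceId and only updates the other fields of that entry.
function defaultChanged(previous, current, kind) {
  const before = getEffectiveDefault(previous, kind);
  const after = getEffectiveDefault(current, kind);
  if (!before || !after) {
    return before !== after;
  }
  return (
    before.deviceId !== after.deviceId ||
    before.label !== after.label ||
    before.groupId !== after.groupId
  );
}
```

Calling `defaultChanged()` for `audioinput` and `audiooutput` after each (debounced) `devicechange` tells the application whether it should follow the new default.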
Assuming you have a `MediaStream` named `stream`, you can get the devices used from its tracks.
```javascript
const devices = await navigator.mediaDevices.enumerateDevices();

// Get the tracks
const tracks = stream.getTracks();

// Get the deviceId associated with each track from its settings
const devicesId = tracks.map(track => track.getSettings().deviceId);

// Compare these ids to the list of devices your application knows to find the right ones
const devicesUsed = devices.filter(device => devicesId.includes(device.deviceId));

console.log(devicesUsed);
// [ { deviceId: "51D9CC25B5DFDD54160FC1E357577D50116FDA89", kind: "audioinput", label: "Rode NT-Usb", groupId:... },
//   { deviceId: "E2BF4D17BF7BD448FE0CF9C1141924A9B4FC5237", kind: "videoinput", label: "StreamCam", groupId: ... } ]
```
The way I found to confirm which speaker is used is to read the `sinkId` property of the `<audio>` or `<video>` element.
First, find this element in the DOM using a function such as `getElementsByTagName()`. Then, if an `HTMLAudioElement` or `HTMLVideoElement` exists, get the ID of the device used from its `sinkId` property.
It should match an `audiooutput` device (an empty string means that the default output device is used).
In some circumstances, this can be a way to confirm the devices used.
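Here is a minimal sketch of this check with a hypothetical `findSpeakersInUse` helper. It uses `querySelectorAll('audio, video')` rather than `getElementsByTagName()` so that both kinds of elements come back in one call. It takes the root (normally `document`) and the result of `enumerateDevices()` as parameters.

```javascript
// Hypothetical helper: list the output device used by each media element.
// `root` is normally document; `devices` comes from enumerateDevices().
function findSpeakersInUse(root, devices) {
  const outputs = devices.filter((device) => device.kind === "audiooutput");
  return Array.from(root.querySelectorAll("audio, video")).map((element) => {
    // An empty sinkId means the element plays on the default output.
    const sinkId = element.sinkId || "";
    const device =
      sinkId === "" ? null : outputs.find((output) => output.deviceId === sinkId) || null;
    return { element, sinkId, device };
  });
}
```

An entry with a non-empty `sinkId` but no matching `device` suggests that the speaker has disappeared from the list.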
From what has been described here, the main points to consider are:
a) Detect any denied permission as soon as possible to help the user resolve it: use the Permissions API’s `query()` method or the `getUserMedia()` API.
b) Show the current or default devices, the ones the user will use for their next call: store the devices used previously, compare them with the current list of devices and detect whether they are still available.
c) Select the new device when the user plugs in or pairs a new device, because they most probably want to use it: rely on the `devicechange` event and on the `enumerateDevices()` API.
d) Group devices when applicable: use the `groupId` attribute and, when a new device is plugged in or paired, switch to it globally (audio and video, or input and output).
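For point b), here is a minimal sketch. It assumes the application stores the `deviceId` used in the previous call somewhere, for example in `localStorage`, which is left out here. The hypothetical `pickDevice` helper reuses that device if it is still listed and otherwise falls back to the first device of that kind. It also flags the change so the UI can give feedback.

```javascript
// Hypothetical helper: reuse the stored device if it is still present,
// otherwise fall back to the first device of that kind.
function pickDevice(devices, kind, preferredId) {
  const candidates = devices.filter((device) => device.kind === kind);
  const preferred = candidates.find((device) => device.deviceId === preferredId);
  return {
    device: preferred || candidates[0] || null,
    // Lets the UI tell the user that the previously used device is gone.
    preferredMissing: Boolean(preferredId) && !preferred,
  };
}
```

Calling it after each enumeration keeps the displayed "next call" devices in sync with what is actually available.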
In all cases, feedback is the key to helping the user understand what’s going on.
What you don’t want is the user having to reload the page to get the new devices available…
Don’t hesitate to share your views on this topic!