Sometimes users complain about not being able to hear the sound from one person in a conference, or from their recipient when in a peer-to-peer call.
If the problem is not on the receiver's side, it may come from the sender. This article focuses on detecting whether the microphone really works, that is, whether it captures audio data or not.
Verifying that the microphone works can be done prior to the call. That is certainly a good moment to check, before engaging in the conversation. But during the call, the user can manipulate his microphone or his computer and so introduce "bad things", accidentally or not… So having a "live" check during the call is interesting too, to detect any trouble and to help the user recover from the situation.
In general, two cases are interesting to detect:
- When the user speaks and his microphone is physically muted or is not working correctly
- When the microphone is open (not muted by the application) and no sound is detected
In both cases, the user thinks that his recipients can hear him, but in reality nobody is able to hear him.
For this article, I used the following microphones:
- A Rode NT-USB microphone, which works in stereo
- A Konftel Ego, which mixes a microphone and a loudspeaker
The first thing to do when dealing with devices is to check that the browser is able to access them.
This can be done by checking the permission to access the device using the Permissions API. This API tells you whether the browser already has the authorization to access and use the microphone.
It is interesting to know if the user has denied the authorization, because he often does not pay attention to the authorization request and denies it accidentally.
```javascript
const permission = await navigator.permissions.query({ name: 'microphone' });

if (permission.state === 'granted') {
  // OK - Access has been granted to the microphone
} else if (permission.state === 'denied') {
  // KO - Access has been denied. Microphone can't be used
} else {
  // Permission should be asked
}

permission.onchange = () => {
  // React when the permission changes
};
```
And by listening to the change event, the application is able to react when the permission changes.
Note: At the time of writing, the Permissions API (for the microphone) only works in Chrome.
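Since support is limited, a defensive approach is to wrap the query and fall back to an "unknown" state when the browser does not recognize the microphone permission name. The following is only a sketch (the getMicrophonePermissionState name is mine):

```javascript
// Minimal sketch: query the microphone permission where supported
async function getMicrophonePermissionState() {
  if (!navigator.permissions || !navigator.permissions.query) {
    return "unknown";
  }
  try {
    const permission = await navigator.permissions.query({ name: "microphone" });
    return permission.state; // "granted", "denied" or "prompt"
  } catch (err) {
    // Thrown when "microphone" is not a supported permission name in this browser
    return "unknown";
  }
}
```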
Input audio from the microphone can be captured using the getUserMedia function from the MediaDevices API. When the browser can't access the device, an error is thrown and the application can react.
```javascript
let stream;
try {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true });
} catch (err) {
  // Errors when accessing the device
}
```
Then, the application can check that an active audio track exists. An active audio track is a track that is actively sending media (audio) data.
```javascript
const audioTracks = stream.getAudioTracks();

if (audioTracks.length === 0) {
  // No audio from the microphone has been captured
  return;
}

// We asked for the microphone, so there is one track
const track = audioTracks[0];

if (track.muted) {
  // Track is muted, which means that the track is currently unable to provide media data.
  // The muted state is controlled by the browser and can change back to unmuted
  // when the source is able to provide data again.
}

if (!track.enabled) {
  // Track is disabled (muted by the application), which means that the track provides silence instead of real data.
  // When disabled, a track can be enabled again.
  // In that case, the user can't be heard until the track is enabled again.
}

if (track.readyState === "ended") {
  // Possibly a disconnection of the device.
  // When ended, a track can't become active again:
  // this track will not provide data anymore.
}
```
When the state of a track changes, events are fired to inform the application:
```javascript
track.addEventListener("ended", () => {
  // Fired when track.readyState goes to "ended"
});

track.addEventListener("mute", () => {
  // Fired when track.muted goes to true
});

track.addEventListener("unmute", () => {
  // Fired when track.muted goes back to false
});
```
If a track fires the ended event (its readyState property goes to ended), the track is terminated and becomes obsolete: no more audio data will be received.
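In that situation, the only way to resume the capture is to request a new stream. Here is a minimal recovery sketch (the reacquireMicrophone name and the onNewStream callback are mine):

```javascript
// Minimal sketch: when the track ends, try to capture the microphone again
async function reacquireMicrophone(onNewStream) {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const [track] = stream.getAudioTracks();
    // Re-arm the handler so that a future disconnection is handled too
    track.addEventListener("ended", () => reacquireMicrophone(onNewStream));
    onNewStream(stream);
  } catch (err) {
    // The device is still not available: inform the user
  }
}
```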
The Audio APIs are a set of APIs that can be used to manipulate audio by building an audio pipeline, in which the audio stream goes through nodes that can access and modify it before passing the transformed stream to the next node, until the final output node.
Here, with these APIs, we can plug an AnalyserNode to look at the signal generated by the microphone.
```javascript
// Get the stream
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Create and configure the audio pipeline
const audioContext = new AudioContext();
const analyzer = audioContext.createAnalyser();
analyzer.fftSize = 512;
analyzer.smoothingTimeConstant = 0.1;

const sourceNode = audioContext.createMediaStreamSource(stream);
sourceNode.connect(analyzer);

// Analyze the sound
setInterval(() => {
  // Compute the max volume level (-Infinity...0)
  const fftBins = new Float32Array(analyzer.frequencyBinCount); // Number of values manipulated for each sample
  analyzer.getFloatFrequencyData(fftBins);
  // audioPeakDB varies from -Infinity up to 0
  const audioPeakDB = Math.max(...fftBins);

  // Compute a wave (0...)
  const frequencyRangeData = new Uint8Array(analyzer.frequencyBinCount);
  analyzer.getByteFrequencyData(frequencyRangeData);
  const sum = frequencyRangeData.reduce((p, c) => p + c, 0);
  // audioMeter varies from 0 to 10
  const audioMeter = Math.sqrt(sum / frequencyRangeData.length);
}, 100);
```
Using the variables audioPeakDB and audioMeter, the application can deduce the level of sound and display something on screen representing the activity of the microphone.
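For example, these values can feed a very simple visual indicator. The following is only a sketch, assuming the page contains a `<meter id="vumeter" min="0" max="10">` element and a `<span id="mic-status">` element:

```javascript
// Minimal sketch: reflect audioPeakDB / audioMeter in the UI
// (#vumeter and #mic-status are hypothetical elements of the page)
const vuMeter = document.querySelector("#vumeter");
const micStatus = document.querySelector("#mic-status");

function displayMicrophoneActivity(audioPeakDB, audioMeter) {
  vuMeter.value = Math.min(audioMeter, 10);
  if (audioPeakDB === -Infinity) {
    micStatus.textContent = "No signal detected";
  } else if (audioPeakDB > -50) {
    micStatus.textContent = "Voice activity or loud sound";
  } else {
    micStatus.textContent = "Background noise or quiet";
  }
}
```

displayMicrophoneActivity can simply be called from the setInterval callback shown above.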
By mixing together the Audio APIs, the getUserMedia API and the Permissions API, we get a clear view of the parameters to monitor. The following table summarizes the different cases where the application can consider that no audible sound will be sent to the recipient(s).
API | Status | Description |
---|---|---|
Permissions API | state = denied | Access to the device has not been granted |
GetUserMedia API | Throws an error | Error when accessing the device |
GetUserMedia API | No audio track | No audio captured (never seen such a problem) |
MediaStreamTrack API | track.muted = true | Audio has been muted for that stream (e.g. direction = "recvonly") |
MediaStreamTrack API | track.readyState = ended | Device is disconnected or the track is unable to provide audio data anymore |
MediaStreamTrack API | track.enabled = false | Audio is temporarily inactive |
Audio API | peakDBLevel = -Infinity | Audio is (temporarily) inactive |
From the previous paragraphs, we can deduce a number of situations where it is interesting for the application to know whether the microphone works well or not.
All the information gathered allows deducing 8 states:
States | How to detect? | Status |
---|---|---|
Active sound (voice activity or loud sound) | When audioPeakDB is above -50 dB | OK |
Background noise | When audioPeakDB is below -50 dB and audioMeter > 0 | OK |
Quiet | When audioMeter equals 0 and audioPeakDB is different from -Infinity | OK |
Disabled "In-app" | When track.enabled equals false | OK |
Muted "In-app" | When track.muted equals true | OK |
Muted | When audioMeter equals 0 and audioPeakDB goes to -Infinity (in practice, lower than -900 dB) | ? |
Ended | When track.readyState equals ended | KO |
Not accessible | When the permission equals denied or when getUserMedia throws an error | KO |
This information is useful to identify the status of the microphone and so to detect a possible problem.
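As an illustration, these detection rules can be combined into a single helper that returns one of the states above. This is only a sketch (the getMicrophoneState name and its input object are mine; the -50 dB threshold and the state names come from the table):

```javascript
// Minimal sketch: deduce the microphone state from the information gathered so far
// (permissionState, gumError, track, audioPeakDB and audioMeter are assumed to be
// collected by the application as described in the previous sections)
function getMicrophoneState({ permissionState, gumError, track, audioPeakDB, audioMeter }) {
  if (permissionState === "denied" || gumError) {
    return "Not accessible";
  }
  if (track.readyState === "ended") {
    return "Ended";
  }
  if (!track.enabled) {
    return 'Disabled "In-app"';
  }
  if (track.muted) {
    return 'Muted "In-app"';
  }
  if (audioPeakDB === -Infinity) {
    return "Muted";
  }
  if (audioPeakDB > -50) {
    return "Active sound";
  }
  if (audioMeter > 0) {
    return "Background noise";
  }
  return "Quiet";
}
```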
Using an Analyser and setInterval is not the most optimized way to deal with the Audio API.
If the sound is analyzed too often (a short interval of a few milliseconds) and over a long period, it will affect the performance of the application.
This computation can be optimized by using a Worklet:
"The Worklet interface is a lightweight version of Web Workers and gives developers access to low-level parts of the rendering pipeline. With Worklets, you can run JavaScript and WebAssembly code to do graphics rendering or audio processing where high performance is required." (MDN Web Docs)
By using an AudioWorklet, the audio processing is done outside of the main thread. Like any AudioNode, the AudioWorkletProcessor processes 128 frames at a time. This ensures that no extra latency is added, but if you want to work on more frames, you will need to implement your own buffer (a possible approach is sketched after the example below).
Here is an example of an AudioWorkletProcessor.
```javascript
// Put this code in a file named audioMeter.js
const SMOOTHING_FACTOR = 0.99;

class AudioMeter extends AudioWorkletProcessor {
  constructor() {
    super();
    this._volume = 0;
    this.port.onmessage = (event) => {
      // Deal with messages received from the main thread - event.data
    };
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const samples = input[0];
    const sumSquare = samples.reduce((p, c) => p + (c * c), 0);
    const rms = Math.sqrt(sumSquare / (samples.length || 1));
    this._volume = Math.max(rms, this._volume * SMOOTHING_FACTOR);
    this.port.postMessage({ volume: this._volume });
    // Don't forget to return true - otherwise the worklet is ended
    return true;
  }
}

registerProcessor('audioMeter', AudioMeter);
```
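If you want to analyze more samples than the 128 provided per call, you need to accumulate them yourself, as mentioned above. The following is only a sketch of such a buffer (BUFFER_SIZE and the bufferedAudioMeter name are arbitrary choices):

```javascript
// Minimal sketch: accumulate the 128-sample blocks into a larger buffer
// before computing the volume (BUFFER_SIZE is an arbitrary example value)
const BUFFER_SIZE = 2048;

class BufferedAudioMeter extends AudioWorkletProcessor {
  constructor() {
    super();
    this._buffer = new Float32Array(BUFFER_SIZE);
    this._offset = 0;
  }

  process(inputs) {
    const samples = inputs[0][0];
    if (!samples) {
      return true;
    }
    if (this._offset + samples.length > BUFFER_SIZE) {
      // Should not happen with 128-sample blocks, but stay safe
      this._offset = 0;
    }
    this._buffer.set(samples, this._offset);
    this._offset += samples.length;
    if (this._offset >= BUFFER_SIZE) {
      // Enough data collected: compute the RMS over the whole buffer
      const sumSquare = this._buffer.reduce((p, c) => p + c * c, 0);
      this.port.postMessage({ volume: Math.sqrt(sumSquare / BUFFER_SIZE) });
      this._offset = 0;
    }
    return true;
  }
}

registerProcessor('bufferedAudioMeter', BufferedAudioMeter);
```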
The AudioWorkletProcessor can be loaded and executed from the application
```javascript
// Get the audio stream
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

// Create the Audio Context
const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(stream);

// Load the worklet
await audioContext.audioWorklet.addModule('./audioMeter.js');
const node = new AudioWorkletNode(audioContext, 'audioMeter');
node.port.onmessage = (event) => {
  // Deal with messages received from the Worklet processor - event.data
};

// Connect the audio pipeline - this will start the processing
source.connect(node).connect(audioContext.destination);
```
A more complete description of AudioWorklet can be found here.
Using AudioWorklet, your application is able to monitor the microphone during a long period of time without having to worry about performance.
Note: Be careful to embed your worklet file when using Webpack
Now that we know how to interpret the microphone state, we can see what happens when the user manipulates his computer or his microphone.
When an external microphone is plugged into the computer, the user can interact with it and sometimes put it in a bad state: the cable can be unplugged, the physical mute button can be pressed unintentionally…
The following table summarizes what can be detected:
Actions | All browsers | State |
---|---|---|
Pressing on the mute button | peakDBLevel = -Infinity | Muted |
Disconnecting the device (unplugged / Bluetooth disconnected) | track.readyState = ended, peakDBLevel = -Infinity | Ended |
Note: The 3 major browsers detect these changes.
Note 2: Be careful with devices that have both a microphone and a speaker. Detection of the Muted state does not always work: it is as if there were always some noise that prevents moving to that state (-Infinity).
On macOS (this should be similar on other operating systems), we can go to the Sound panel in the System Preferences and modify the input level of the microphone.
This has an impact when using the Audio API because the captured level will be different: lower if you decrease the input level and higher if you increase it.
Actions | Chrome | State |
---|---|---|
Put the input level at 1% | Average level is around 40 dB lower than at 100% | Active sound, Background noise, Quiet |
Put the input level to 0% | track.muted switches to true, average level goes down to -Infinity | Muted "in-app" |
Note: Chrome fires the mute and unmute events when the input level is at 0%.
Actions | Safari & Firefox | State |
---|---|---|
Put the input level at 1% | Average level is around 40 dB lower than at 100% | Active sound, Background noise, Quiet |
Put the input level to 0% | Average level goes down to -Infinity | Muted "in-app" |
Note: No mute/unmute event is fired in Safari/Firefox when the input level reaches 0%.
In Safari, the user has the possibility to mute or unmute the microphone during a call directly from the browser itself. This action is available by clicking on the microphone icon located at the end of the URL field.
Actions | Safari | State |
---|---|---|
Click on the microphone button to mute/unmute the microphone | mute/unmute event is fired, track.muted switches to true/false, average level goes down to -Infinity | Muted "in-app" |
The authorization can be disabled during a call in Chrome. This action is available by clicking on the microphone icon located in the URL field.
Actions | Chrome | State |
---|---|---|
Click on the microphone button to remove the authorization | Permission changes to denied, the ended event is fired, track.readyState goes to ended, average level goes down to -Infinity | Not accessible |
The authorization can also be disabled during a call in Firefox. This action is available by clicking on the microphone icon located in the URL field.
Actions | Firefox | State |
---|---|---|
Click on the microphone button to remove the authorization | The ended event is fired, track.readyState goes to ended, average level goes down to -Infinity | Ended |
When in a call, the application can propose to mute the microphone. The easiest way to do that is to manipulate the MediaStreamTrack by setting the property enabled to false; setting it back to true unmutes the microphone.
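As an illustration, here is a minimal in-app mute helper (the toggleMicrophone name is mine; stream is the MediaStream obtained from getUserMedia):

```javascript
// Minimal sketch: mute/unmute the microphone from the application
function toggleMicrophone(stream, muted) {
  stream.getAudioTracks().forEach((track) => {
    track.enabled = !muted; // false: silence is sent, true: real audio is sent
  });
}

toggleMicrophone(stream, true);  // mute
toggleMicrophone(stream, false); // unmute
```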
Actions | All browsers | State |
---|---|---|
Mute/Unmute the microphone from the application | track.enabled goes to false/true, average level goes down to -Infinity | Muted "in-app" |
If a virtual microphone (virtual audio recorder) is selected by default in the System Preferences, or if the user selects it accidentally, this can lead to trouble too.
Actions | All browsers | State |
---|---|---|
Using a virtual device | Got the error "A MediaStreamTrack ended due to a capture failure", the ended event is fired, track.readyState goes to ended, average level is always equal to -Infinity | Ended |
Actions | All browsers | State |
---|---|---|
Using a virtual device | Average level is always equal to -Infinity | Muted |
Monitoring the microphone can be done at 2 levels: at the Device level (for the permission and the authorization) and at the Media level (using the MediaStreamTrack and MediaStream WebRTC interfaces).
Having that monitoring in place can help prevent users from complaining about sound issues. But as seen in this article, there are a number of different cases to handle…
To remember: Safari does not allow asking for several audio streams at the same time by calling the getUserMedia API several times consecutively. Each call to getUserMedia automatically ends the previously obtained track: the current audio stream captured from the microphone is ended if getUserMedia is called again during a call.
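To avoid that pitfall, a common approach is to keep a single reference to the microphone stream and reuse it instead of calling getUserMedia again. This is only a sketch (the getMicrophoneStream name and the currentStream variable are mine):

```javascript
// Minimal sketch: reuse the current microphone stream when it is still live,
// and only call getUserMedia again when needed
let currentStream = null;

async function getMicrophoneStream() {
  const hasLiveTrack = currentStream &&
    currentStream.getAudioTracks().some((track) => track.readyState === "live");
  if (hasLiveTrack) {
    return currentStream;
  }
  // Stop any previous (ended or stale) tracks before asking again
  if (currentStream) {
    currentStream.getTracks().forEach((track) => track.stop());
  }
  currentStream = await navigator.mediaDevices.getUserMedia({ audio: true });
  return currentStream;
}
```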