Sometimes users complain about not being able to hear the sound from one person in a conference, or from their recipient when in a peer-to-peer call.
If the problem is not on the receiver's side, it may come from the sender. This article focuses on detecting whether the microphone really works, that is, whether it captures audio data or not.
Verifying that the microphone works can be done prior to the call. That is certainly a good moment to check, before engaging in the conversation. But during the call, the user can manipulate his microphone or his computer and so introduce "bad things", accidentally or not… So having a "live" check during the call is interesting too, to detect any trouble and to help the user recover from the situation.
In general, two cases are interesting to detect:
- When the user speaks and his microphone is physically muted or is not working correctly
- When the microphone is open (not muted by the application) and no sound is detected
In both cases, the user thinks that his recipients can hear him, but in reality nobody is able to hear him.
For this article, I used the following microphones:
- A Rode NT-USB microphone, which works in stereo
- A Konftel Ego, which mixes a microphone and a loudspeaker
The first thing to do when dealing with devices is to check that the browser is able to access them.
This can be done by checking the permission to access the device using the Permissions API. This API tells you whether the browser already has the authorization to access and use the microphone.
It is interesting to know if the user has denied the authorization, because he often does not pay attention to the authorization request and denies it accidentally.
```javascript
const permission = await navigator.permissions.query({ name: 'microphone' });

if (permission.state === 'granted') {
  // OK - Access has been granted to the microphone
} else if (permission.state === 'denied') {
  // KO - Access has been denied. Microphone can't be used
} else {
  // Permission should be asked
}

permission.onchange = () => {
  // React when the permission changes
};
```
And by listening to the change event, the application is able to react when the permission changes.
Note: At the time of writing, the Permissions API (for the microphone) only works in Chrome.
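Since support is limited, a defensive approach is to wrap the query and fall back to an "unknown" state when the browser does not recognize the microphone permission name. The following is only a sketch (the getMicrophonePermissionState name is mine):

```javascript
// Minimal sketch: query the microphone permission where supported
async function getMicrophonePermissionState() {
  if (!navigator.permissions || !navigator.permissions.query) {
    return "unknown";
  }
  try {
    const permission = await navigator.permissions.query({ name: "microphone" });
    return permission.state; // "granted", "denied" or "prompt"
  } catch (err) {
    // Thrown when "microphone" is not a supported permission name in this browser
    return "unknown";
  }
}
```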
Input audio from the microphone can be captured using the getUserMedia function from the MediaDevices API. When the browser can't access the device, an error is thrown and the application can react.
```javascript
let stream;
try {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true });
} catch (err) {
  // Errors when accessing the device
}
```
Then, the application can check that an active audio track exists. An active audio track is a track that is actively sending media (audio) data.
```javascript
const audioTracks = stream.getAudioTracks();

if (audioTracks.length === 0) {
  // No audio from the microphone has been captured
  return;
}

// We asked for the microphone, so there is one track
const track = audioTracks[0];

if (track.muted) {
  // Track is muted, which means that the track is currently unable to provide media data.
  // The muted state is controlled by the browser and can change back to unmuted
  // when the source is able to provide data again.
}

if (!track.enabled) {
  // Track is disabled (muted by the application), which means that the track provides silence instead of real data.
  // When disabled, a track can be enabled again.
  // In that case, the user can't be heard until the track is enabled again.
}

if (track.readyState === "ended") {
  // Possibly a disconnection of the device.
  // When ended, a track can't become active again:
  // this track will not provide data anymore.
}
```
When the state of a track changes, events are fired to inform the application:
```javascript
track.addEventListener("ended", () => {
  // Fired when track.readyState goes to "ended"
});

track.addEventListener("mute", () => {
  // Fired when track.muted goes to true
});

track.addEventListener("unmute", () => {
  // Fired when track.muted goes back to false
});
```
If a track fires the ended event (its readyState property goes to ended), the track is terminated and becomes obsolete: no more audio data will be received.
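In that situation, the only way to resume the capture is to request a new stream. Here is a minimal recovery sketch (the reacquireMicrophone name and the onNewStream callback are mine):

```javascript
// Minimal sketch: when the track ends, try to capture the microphone again
async function reacquireMicrophone(onNewStream) {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const [track] = stream.getAudioTracks();
    // Re-arm the handler so that a future disconnection is handled too
    track.addEventListener("ended", () => reacquireMicrophone(onNewStream));
    onNewStream(stream);
  } catch (err) {
    // The device is still not available: inform the user
  }
}
```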
The Audio APIs are a set of APIs that can be used to manipulate audio by building an audio pipeline, in which the audio stream goes through nodes that can access and modify it before passing the transformed stream to the next node, until the final output node.
Here, with these APIs, we can plug an AnalyserNode to look at the signal generated by the microphone.
```javascript
// Get the stream
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Create and configure the audio pipeline
const audioContext = new AudioContext();
const analyzer = audioContext.createAnalyser();
analyzer.fftSize = 512;
analyzer.smoothingTimeConstant = 0.1;

const sourceNode = audioContext.createMediaStreamSource(stream);
sourceNode.connect(analyzer);

// Analyze the sound
setInterval(() => {
  // Compute the max volume level (-Infinity...0)
  const fftBins = new Float32Array(analyzer.frequencyBinCount); // Number of values manipulated for each sample
  analyzer.getFloatFrequencyData(fftBins);
  // audioPeakDB varies from -Infinity up to 0
  const audioPeakDB = Math.max(...fftBins);

  // Compute a wave (0...)
  const frequencyRangeData = new Uint8Array(analyzer.frequencyBinCount);
  analyzer.getByteFrequencyData(frequencyRangeData);
  const sum = frequencyRangeData.reduce((p, c) => p + c, 0);
  // audioMeter varies from 0 to 10
  const audioMeter = Math.sqrt(sum / frequencyRangeData.length);
}, 100);
```
Using the variables audioPeakDB and audioMeter, the application can deduce the level of sound and display something on screen representing the activity of the microphone.
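For example, these values can feed a very simple visual indicator. The following is only a sketch, assuming the page contains a `<meter id="vumeter" min="0" max="10">` element and a `<span id="mic-status">` element:

```javascript
// Minimal sketch: reflect audioPeakDB / audioMeter in the UI
// (#vumeter and #mic-status are hypothetical elements of the page)
const vuMeter = document.querySelector("#vumeter");
const micStatus = document.querySelector("#mic-status");

function displayMicrophoneActivity(audioPeakDB, audioMeter) {
  vuMeter.value = Math.min(audioMeter, 10);
  if (audioPeakDB === -Infinity) {
    micStatus.textContent = "No signal detected";
  } else if (audioPeakDB > -50) {
    micStatus.textContent = "Voice activity or loud sound";
  } else {
    micStatus.textContent = "Background noise or quiet";
  }
}
```

displayMicrophoneActivity can simply be called from the setInterval callback shown above.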
By mixing together the Audio APIs, the getUserMedia API and the Permissions API, we get a clear view of the parameters to monitor. The following table summarizes the different cases where the application can consider that no audible sound will be sent to the recipient(s).
API | Status | Description |
---|---|---|
Permissions API | state = denied | Access to the device has not been granted |
GetUserMedia API | Throws an error | Error when accessing the device |
GetUserMedia API | No audio track | No audio captured (never seen such a problem) |
MediaStreamTrack API | track.muted = true | Audio has been muted for that stream (e.g. direction = "recvonly") |
MediaStreamTrack API | track.readyState = ended | Device is disconnected or the track is unable to provide audio data anymore |
MediaStreamTrack API | track.enabled = false | Audio is temporarily inactive |
Audio API | peakDBLevel = -Infinity | Audio is (temporarily) inactive |
From the previous paragraphs, we can deduce a number of situations where it is interesting for the application to know whether the microphone works well or not.
All the information gathered allows deducing 8 states:
States | How to detect? | Status |
---|---|---|
Active sound (voice activity or loud sound) | When audioPeakDB is above -50 dB | OK |
Background noise | When audioPeakDB is below -50 dB and audioMeter > 0 | OK |
Quiet | When audioMeter equals 0 and audioPeakDB is different from -Infinity | OK |
Disabled "In-app" | When track.enabled equals false | OK |
Muted "In-app" | When track.muted equals true | OK |
Muted | When audioMeter equals 0 and audioPeakDB goes to -Infinity (in practice, lower than -900 dB) | ? |
Ended | When track.readyState equals ended | KO |
Not accessible | When the permission equals denied or when getUserMedia throws an error | KO |
This information is useful to identify the status of the microphone and so to detect a possible problem.
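As an illustration, these detection rules can be combined into a single helper that returns one of the states above. This is only a sketch (the getMicrophoneState name and its input object are mine; the -50 dB threshold and the state names come from the table):

```javascript
// Minimal sketch: deduce the microphone state from the information gathered so far
// (permissionState, gumError, track, audioPeakDB and audioMeter are assumed to be
// collected by the application as described in the previous sections)
function getMicrophoneState({ permissionState, gumError, track, audioPeakDB, audioMeter }) {
  if (permissionState === "denied" || gumError) {
    return "Not accessible";
  }
  if (track.readyState === "ended") {
    return "Ended";
  }
  if (!track.enabled) {
    return 'Disabled "In-app"';
  }
  if (track.muted) {
    return 'Muted "In-app"';
  }
  if (audioPeakDB === -Infinity) {
    return "Muted";
  }
  if (audioPeakDB > -50) {
    return "Active sound";
  }
  if (audioMeter > 0) {
    return "Background noise";
  }
  return "Quiet";
}
```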
Using an Analyser and setInterval is not the most optimized way to deal with the Audio API.
If the sound is analyzed too often (a short interval of a few milliseconds) and over a long period, it will affect the performance of the application.
This computation can be optimized by using a Worklet:
"The Worklet interface is a lightweight version of Web Workers and gives developers access to low-level parts of the rendering pipeline. With Worklets, you can run JavaScript and WebAssembly code to do graphics rendering or audio processing where high performance is required." (MDN Web Docs)
By using an AudioWorklet, the audio processing is done outside of the main thread. Like any AudioNode, the AudioWorkletProcessor processes 128 frames at a time. This ensures that no extra latency is added, but if you want to work on more frames, you will need to implement your own buffer (a possible approach is sketched after the example below).
Here is an example of an AudioWorkletProcessor.
```javascript
// Put this code in a file named audioMeter.js
const SMOOTHING_FACTOR = 0.99;

class AudioMeter extends AudioWorkletProcessor {
  constructor() {
    super();
    this._volume = 0;
    this.port.onmessage = (event) => {
      // Deal with messages received from the main thread - event.data
    };
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const samples = input[0];
    const sumSquare = samples.reduce((p, c) => p + (c * c), 0);
    const rms = Math.sqrt(sumSquare / (samples.length || 1));
    this._volume = Math.max(rms, this._volume * SMOOTHING_FACTOR);
    this.port.postMessage({ volume: this._volume });
    // Don't forget to return true - otherwise the worklet is ended
    return true;
  }
}

registerProcessor('audioMeter', AudioMeter);
```
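If you want to analyze more samples than the 128 provided per call, you need to accumulate them yourself, as mentioned above. The following is only a sketch of such a buffer (BUFFER_SIZE and the bufferedAudioMeter name are arbitrary choices):

```javascript
// Minimal sketch: accumulate the 128-sample blocks into a larger buffer
// before computing the volume (BUFFER_SIZE is an arbitrary example value)
const BUFFER_SIZE = 2048;

class BufferedAudioMeter extends AudioWorkletProcessor {
  constructor() {
    super();
    this._buffer = new Float32Array(BUFFER_SIZE);
    this._offset = 0;
  }

  process(inputs) {
    const samples = inputs[0][0];
    if (!samples) {
      return true;
    }
    if (this._offset + samples.length > BUFFER_SIZE) {
      // Should not happen with 128-sample blocks, but stay safe
      this._offset = 0;
    }
    this._buffer.set(samples, this._offset);
    this._offset += samples.length;
    if (this._offset >= BUFFER_SIZE) {
      // Enough data collected: compute the RMS over the whole buffer
      const sumSquare = this._buffer.reduce((p, c) => p + c * c, 0);
      this.port.postMessage({ volume: Math.sqrt(sumSquare / BUFFER_SIZE) });
      this._offset = 0;
    }
    return true;
  }
}

registerProcessor('bufferedAudioMeter', BufferedAudioMeter);
```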
The AudioWorkletProcessor can be loaded and executed from the application
```javascript
// Get the audio stream
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

// Create the Audio Context
const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(stream);

// Load the worklet
await audioContext.audioWorklet.addModule('./audioMeter.js');
const node = new AudioWorkletNode(audioContext, 'audioMeter');
node.port.onmessage = (event) => {
  // Deal with messages received from the Worklet processor - event.data
};

// Connect the audio pipeline - this will start the processing
source.connect(node).connect(audioContext.destination);
```
A more complete description of AudioWorklet can be found here.
Using AudioWorklet, your application is able to monitor the microphone during a long period of time without having to worry about performance.
Note: Be careful to embed your worklet file when using Webpack
Now that we know how to interpret the microphone state, we can see what happens when the user manipulates his computer or his microphone.
When an external microphone is plugged into the computer, the user can interact with it and sometimes put it in a bad state: the cable can be unplugged, the physical mute button can be pressed unintentionally…
The following table summarizes what can be detected:
Actions | All browsers | State |
---|---|---|
Pressing on the mute button | peakDBLevel = -Infinity | Muted |
Disconnecting the device (unplugged / Bluetooth disconnected) | track.readyState = ended, peakDBLevel = -Infinity | Ended |
Note: The 3 major browsers detect these changes.
Note 2: Be careful with devices that have both a microphone and a speaker. Detection of the Muted state does not always work: it is as if there were always some noise that prevents moving to that state (-Infinity).
On macOS (this should be similar on other operating systems), we can go to the Sound panel in the System Preferences and modify the input level of the microphone.
This has an impact when using the Audio API because the captured level will be different: lower if you decrease the input level and higher if you increase it.
Actions | Chrome | State |
---|---|---|
Put the input level at 1% | Average level is around 40 dB lower than at 100% | Active sound, Background noise, Quiet |
Put the input level to 0% | track.muted switches to true, average level goes down to -Infinity | Muted "in-app" |
Note: Chrome fires the mute and unmute events when the input level is at 0%.
Actions | Safari & Firefox | State |
---|---|---|
Put the input level at 1% | Average level is around 40 dB lower than at 100% | Active sound, Background noise, Quiet |
Put the input level to 0% | Average level goes down to -Infinity | Muted "in-app" |
Note: No mute/unmute event is fired in Safari/Firefox when the input level reaches 0%.
In Safari, the user has the possibility to mute or unmute the microphone during a call directly from the browser itself. This action is available by clicking on the microphone icon located at the end of the URL field.
Actions | Safari | State |
---|---|---|
Click on the microphone button to mute/unmute the microphone | mute/unmute event is fired, track.muted switches to true/false, average level goes down to -Infinity | Muted "in-app" |
The authorization can be disabled during a call in Chrome. This action is available by clicking on the microphone icon located in the URL field.
Actions | Chrome | State |
---|---|---|
Click on the microphone button to remove the authorization | Permission changes to denied, the ended event is fired, track.readyState goes to ended, average level goes down to -Infinity | Not accessible |
The authorization can also be disabled during a call in Firefox. This action is available by clicking on the microphone icon located in the URL field.
Actions | Firefox | State |
---|---|---|
Click on the microphone button to remove the authorization | The ended event is fired, track.readyState goes to ended, average level goes down to -Infinity | Ended |
When in a call, the application can propose to mute the microphone. The easiest way to do that is to manipulate the MediaStreamTrack by setting the property enabled to false; setting it back to true unmutes the microphone.
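As an illustration, here is a minimal in-app mute helper (the toggleMicrophone name is mine; stream is the MediaStream obtained from getUserMedia):

```javascript
// Minimal sketch: mute/unmute the microphone from the application
function toggleMicrophone(stream, muted) {
  stream.getAudioTracks().forEach((track) => {
    track.enabled = !muted; // false: silence is sent, true: real audio is sent
  });
}

toggleMicrophone(stream, true);  // mute
toggleMicrophone(stream, false); // unmute
```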
Actions | All browsers | State |
---|---|---|
Mute/Unmute the microphone from the application | track.enabled goes to false/true, average level goes down to -Infinity | Muted "in-app" |
If a virtual microphone (virtual audio recorder) is selected by default in the System Preferences, or if the user selects it accidentally, this can lead to trouble too.
Actions | All browsers | State |
---|---|---|
Using a virtual device | Got the error "A MediaStreamTrack ended due to a capture failure", the ended event is fired, track.readyState goes to ended, average level is always equal to -Infinity | Ended |
Actions | All browsers | State |
---|---|---|
Using a virtual device | Average level is always equal to -Infinity | Muted |
Monitoring the microphone can be done at 2 levels: at the Device level (for the permission and the authorization) and at the Media level (using the MediaStreamTrack and MediaStream WebRTC interfaces).
Having that monitoring in place can help prevent users from complaining about sound issues. But as seen in this article, there are a number of different cases to handle…
To remember: Safari does not allow asking for several audio streams at the same time by calling the getUserMedia API several times consecutively. Each call to getUserMedia automatically ends the previously obtained track: the current audio stream captured from the microphone is ended if getUserMedia is called again during a call.
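To avoid that pitfall, a common approach is to keep a single reference to the microphone stream and reuse it instead of calling getUserMedia again. This is only a sketch (the getMicrophoneStream name and the currentStream variable are mine):

```javascript
// Minimal sketch: reuse the current microphone stream when it is still live,
// and only call getUserMedia again when needed
let currentStream = null;

async function getMicrophoneStream() {
  const hasLiveTrack = currentStream &&
    currentStream.getAudioTracks().some((track) => track.readyState === "live");
  if (hasLiveTrack) {
    return currentStream;
  }
  // Stop any previous (ended or stale) tracks before asking again
  if (currentStream) {
    currentStream.getTracks().forEach((track) => track.stop());
  }
  currentStream = await navigator.mediaDevices.getUserMedia({ audio: true });
  return currentStream;
}
```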