The W3C ECMAScript API defines an event to handle issues during the ICE gathering process. But should your application handle these errors? And if so, what should it do?
During the ICE candidate gathering process, an error could occur when your browser is exchanging information with the STUN and TURN servers.

    url: xxx.xxx.xxx:443
    address: [2a01:e0a:410:x:x:x:x:x]
    port: 53449
    host_candidate: [2a01:e0a:410:x:x:x:x:x]:53449
    error_text: STUN host lookup received error.
    error_code: 701

Even if your JavaScript code is correct, your application needs to detect and react to any errors during that step, which are mainly due to the network environment or to the servers themselves.
For that, the RTCPeerConnection fires the `icecandidateerror` event, which can be handled with `onicecandidateerror`.
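As a minimal sketch of what such a listener could look like, the snippet below copies the fields exposed by `RTCPeerConnectionIceErrorEvent` (`url`, `address`, `port`, `errorCode`, `errorText`) into a plain object. The helper name `summarizeIceError` and the variable `pc` are assumptions used for illustration only.

```javascript
// Hypothetical helper: copy the fields of an RTCPeerConnectionIceErrorEvent
// into a plain object that is easy to log or serialize.
function summarizeIceError(event) {
  return {
    url: event.url,             // STUN/TURN server URL being contacted
    address: event.address,     // local IP address used (may be null)
    port: event.port,           // local port used (may be null)
    errorCode: event.errorCode, // STUN error code, or 701
    errorText: event.errorText, // human-readable reason
  };
}

// Browser-only wiring, assuming `pc` is an existing RTCPeerConnection:
// pc.addEventListener("icecandidateerror", (event) => {
//   console.warn("ICE candidate error", summarizeIceError(event));
// });
```

Keeping the copy in a separate function makes it possible to exercise the logic outside a browser with a hand-written event object.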
However, dealing with that error is not so easy… Firstly, because in most cases it is too late: your application can no longer put countermeasures in place to initiate the call. Secondly, because understanding what happened from the `errorCode` is complicated…
Note: The ICE errors are described in several documents. Basically, the STUN error codes are listed in the main document Session Traversal Utilities for NAT (STUN) Parameters, which is the umbrella for all the other documents.
When dealing with a STUN server, ICE errors are documented in the IETF Proposed Standard document Session Traversal Utilities for NAT (STUN). Error codes are described in section 14.8 ERROR-CODE.
Note: This document obsoletes RFC 5389.
ErrorCode | Description |
---|---|
300 Try Alternate | The client should contact an alternate server for this request. This error response MUST only be sent if the request included either a USERNAME or USERHASH attribute and a valid MESSAGE-INTEGRITY or MESSAGE-INTEGRITY-SHA256 attribute; otherwise, it MUST NOT be sent and error code 400 (Bad Request) is suggested. This error response MUST be protected with the MESSAGE-INTEGRITY or MESSAGE-INTEGRITY-SHA256 attribute, and receivers MUST validate the MESSAGE-INTEGRITY or MESSAGE-INTEGRITY-SHA256 of this response before redirecting themselves to an alternate server. Note: Failure to generate and validate message integrity for a 300 response allows an on-path attacker to falsify a 300 response thus causing subsequent STUN messages to be sent to a victim. |
400 Bad Request | The request was malformed. The client SHOULD NOT retry the request without modification from the previous attempt. The server may not be able to generate a valid MESSAGE-INTEGRITY or MESSAGE-INTEGRITY-SHA256 for this error, so the client MUST NOT expect a valid MESSAGE-INTEGRITY or MESSAGE-INTEGRITY-SHA256 attribute on this response. |
401 Unauthenticated | The request did not contain the correct credentials to proceed. The client should retry the request with proper credentials. |
420 Unknown Attribute | The server received a STUN packet containing a comprehension-required attribute that it did not understand. The server MUST put this unknown attribute in the UNKNOWN-ATTRIBUTE attribute of its error response. |
438 Stale Nonce | The NONCE used by the client was no longer valid. The client should retry, using the NONCE provided in the response. |
When dealing with a TURN server, these errors are documented in the IETF Proposed Standard document Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN). Error codes are described in section 19 STUN Error Response Codes.
Note: This document obsoletes RFC 5766.
ErrorCode | Description |
---|---|
403 Forbidden | The request was valid but cannot be performed due to administrative or similar restrictions. |
437 Allocation Mismatch | A request was received by the server that requires an allocation to be in place, but no allocation exists, or a request was received that requires no allocation, but an allocation exists. |
440 Address Family not Supported | The server does not support the address family requested by the client. |
441 Wrong Credentials | The credentials in the (non-Allocate) request do not match those used to create the allocation. |
442 Unsupported Transport Protocol | The Allocate request asked the server to use a transport protocol between the server and the peer that the server does not support. NOTE: This does NOT refer to the transport protocol used in the 5-tuple. |
443 Peer Address Family Mismatch | A peer address is part of a different address family than that of the relayed transport address of the allocation. |
486 Allocation Quota Reached | No more allocations using this username can be created at the present time. |
508 Insufficient Capacity | The server is unable to carry out the request due to some capacity limit being reached. In an Allocate response, this could be due to the server having no more relayed transport addresses available at that time, having none with the requested properties, or the one that corresponds to the specified reservation token is not available. |
The first document that specified how entities behind a NAT discover the presence of a NAT and then learn the address bindings allocated by the NAT is STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (RFC 3489). It was obsoleted by RFC 5389, itself obsoleted by the RFC 8489 described above.
But that document defines additional codes that can still be returned by STUN & TURN servers.
ErrorCode | Description |
---|---|
430 Stale Credentials | The Binding Request did contain a MESSAGE-INTEGRITY attribute, but it used a shared secret that has expired. The client should obtain a new shared secret and try again. |
431 Integrity Check Failure | The Binding Request contained a MESSAGE-INTEGRITY attribute, but the HMAC failed verification. This could be a sign of a potential attack, or client implementation error. |
432 Missing Username | The Binding Request contained a MESSAGE-INTEGRITY attribute, but not a USERNAME attribute. Both must be present for integrity checks. |
433 Use TLS | The Shared Secret request has to be sent over TLS, but was not received over TLS. |
500 Server Error | The server has suffered a temporary error. The client should try again. |
600 Global Failure | The server is refusing to fulfill the request. The client should not retry. |
Some other documents also describe ICE errors:
When dealing with mobility, ICE errors are described in the document Mobility with Traversal Using Relays around NAT (TURN), which describes a new error code in section 3.4 New STUN Error Response Code.
ErrorCode | Description |
---|---|
405 Mobility Forbidden | Mobility request was valid but cannot be performed due to administrative or similar restrictions. |
The document Traversal Using Relays around NAT (TURN) Extensions for TCP Allocations lists some new error codes in section 6.3 New STUN Error Codes.
ErrorCode | Description |
---|---|
446 Connection Already Exists | No description |
447 Connection Timeout or Failure | No description |
The document Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal describes an additional ICE error code, used when the ICE roles conflict, in section 16.2 New Error-Response Codes.
ErrorCode | Description |
---|---|
487 Role Conflict | The Binding request contained either the ICE-CONTROLLING or ICE-CONTROLLED attribute, indicating an ICE role that conflicted with the server. The remote server compared the tiebreaker values of the client and the server and determined that the client needs to switch roles. |
In terms of ECMAScript APIs, the W3C Recommendation document WebRTC 1.0: Real Time Communication Between Browsers gives the JavaScript definition of the ICE error as well as the definition of the event to listen to in order to detect these errors.
The `icecandidateerror` event and the `RTCPeerConnectionIceErrorEvent` interface are described in section 4.8.3 RTCPeerConnectionIceErrorEvent.
The value of the `errorCode` attribute points to the IANA document that summarizes these codes, STUN Error Codes. Of course, the ECMAScript API only relays the ICE errors coming from the STUN/TURN servers.
But additionally, the W3C defines a new error code when no host candidate can reach the server:
ErrorCode | Description |
---|---|
701 | If no host candidate can reach the server, errorCode will be set to the value 701 which is outside the STUN error code range. This error is only fired once per server URL while in the RTCIceGatheringState of “gathering”. |
Note: That specific error is generated by the browser, not by the STUN & TURN servers.
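To make logs easier to read, the codes from the tables above can be gathered into a small lookup. This is only a sketch: the names `ICE_ERROR_NAMES` and `iceErrorName` are hypothetical, the labels are the ones listed in this article, and new codes may be defined later, so unknown values get a generic label.

```javascript
// Error names copied from the tables above; 701 is the browser-generated code.
const ICE_ERROR_NAMES = new Map([
  [300, "Try Alternate"],
  [400, "Bad Request"],
  [401, "Unauthenticated"],
  [403, "Forbidden"],
  [405, "Mobility Forbidden"],
  [420, "Unknown Attribute"],
  [430, "Stale Credentials"],
  [431, "Integrity Check Failure"],
  [432, "Missing Username"],
  [433, "Use TLS"],
  [437, "Allocation Mismatch"],
  [438, "Stale Nonce"],
  [440, "Address Family not Supported"],
  [441, "Wrong Credentials"],
  [442, "Unsupported Transport Protocol"],
  [443, "Peer Address Family Mismatch"],
  [446, "Connection Already Exists"],
  [447, "Connection Timeout or Failure"],
  [486, "Allocation Quota Reached"],
  [487, "Role Conflict"],
  [500, "Server Error"],
  [508, "Insufficient Capacity"],
  [600, "Global Failure"],
  [701, "No host candidate can reach the server"],
]);

// Returns a readable label for an errorCode, with a fallback for new codes.
function iceErrorName(code) {
  return ICE_ERROR_NAMES.has(code)
    ? ICE_ERROR_NAMES.get(code)
    : `Unknown error (${code})`;
}
```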
Now that we know the possible values (be careful, new values could be defined), we need to listen to the `icecandidateerror` event in our JavaScript application to catch these errors.
From the errors listed previously, two main categories emerge:
When `errorCode` < 700: In that case, we can conclude that the application succeeded in contacting the STUN & TURN servers but something went wrong during the gathering. Some IT skills are needed to understand what happened. The STUN or TURN server may not be correctly configured. If the TURN credentials have been verified, there is little chance that the error is in your application, because it is the browser that contacts the STUN & TURN servers directly.
When `errorCode` >= 700: As mentioned, these errors are generated by the browser itself and not by the STUN server, so they have a different meaning. Depending on the result of the ICE gathering, these errors may or may not need to be taken into account, because the call can still work in some circumstances. So, the first thing to do is a health check of your STUN & TURN servers to be sure they still respond correctly to requests, and then to analyze more deeply (as proposed below).
Here is what you can add to your code to separate these two kinds of errors:

    // Somewhere in your code
    pc.onicecandidateerror = function (event) {
      if (event.errorCode >= 300 && event.errorCode <= 699) {
        // Here this is a standardized ICE error
        // Do something...
      } else if (event.errorCode >= 700 && event.errorCode <= 799) {
        // Here, the application perhaps didn't reach the server?
        // Do something else...
      }
    };

But then what? What should you do once the application catches an ICE error? :-)
The minimal viable error management is to save that error somewhere: either in your application log file or somewhere in the Cloud (it could be a Loki system or any log or event aggregation system already in place). This helps you understand in which circumstances that error occurs, how many times, and whether it is specific to a user, a place, etc.
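As an illustration of "saving the error somewhere", here is a small in-memory sketch that counts errors per server URL and error code. The factory name `createIceErrorLog` and its methods are hypothetical; how the report is shipped to your log system is deliberately left out because it depends on your infrastructure.

```javascript
// Hypothetical in-memory store: counts ICE errors per (server URL, errorCode).
function createIceErrorLog() {
  const entries = new Map();
  return {
    // Call this from the "icecandidateerror" handler.
    record(event) {
      const key = `${event.url}|${event.errorCode}`;
      const entry =
        entries.get(key) ||
        { url: event.url, errorCode: event.errorCode, count: 0 };
      entry.count += 1;
      entries.set(key, entry);
    },
    // Returns a copy of the counters, ready to be sent to a log system.
    report() {
      return [...entries.values()].map((entry) => ({ ...entry }));
    },
  };
}

// Browser-only wiring, assuming `pc` is an existing RTCPeerConnection:
// const iceErrorLog = createIceErrorLog();
// pc.addEventListener("icecandidateerror", (e) => iceErrorLog.record(e));
```

Counting per URL and code keeps the report small while still showing whether an error is tied to one server or appears everywhere.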
Unfortunately, listening to the `icegatheringstatechange` event does not help: once gathering is finished, the gathering state just moves to `complete`, which only means that this step is over.
The RTCPeerConnection offers the possibility to listen to the `iceconnectionstatechange` event. If a connection can be established between a candidate pair (which means a local candidate and a remote candidate), the state should move to `connected` or `completed`.
But the final assessment should be done by listening to the `connectionstatechange` event and by checking the state reached. If the connection is successful, the state should be `connected`.
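A possible way to express that final assessment is sketched below: ICE errors are only counted during gathering and interpreted once `connectionState` settles. The function name `assessIceErrors` and the returned labels are assumptions, not part of the WebRTC API.

```javascript
// Hypothetical helper: interpret the number of ICE errors seen so far
// in light of the RTCPeerConnection connectionState.
function assessIceErrors(connectionState, errorCount) {
  if (connectionState === "connected") {
    // The call works: errors, if any, were not blocking.
    return errorCount > 0 ? "connected-with-warnings" : "connected";
  }
  if (connectionState === "failed") {
    // The call failed: ICE errors are worth investigating first.
    return errorCount > 0 ? "failed-check-ice-errors" : "failed";
  }
  // "new", "connecting", "disconnected" or "closed": nothing to conclude.
  return "undetermined";
}

// Browser-only wiring, assuming `pc` is an existing RTCPeerConnection:
// let errorCount = 0;
// pc.addEventListener("icecandidateerror", () => { errorCount += 1; });
// pc.addEventListener("connectionstatechange", () => {
//   console.log(assessIceErrors(pc.connectionState, errorCount));
// });
```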
That said, ICE errors are rare. When the STUN & TURN servers are up and correctly configured, and when the application sends the correct URLs and credentials, it should work in 99% of cases. So, as seen in the previous paragraph, your application can stay simple without having to handle these different events specifically. It is better to invest in monitoring your STUN & TURN equipment to detect any crashes.
If we suppose that the STUN & TURN servers are accessible from the browser because they use common ports and protocols, the minimal case that the application needs to handle correctly is when only host candidates have been gathered, because this can affect the call if the participants are not on the same network.
Two main issues could be at the origin of this result:
- The STUN & TURN information has not been given correctly to WebRTC, which ends up using no STUN server.
- The STUN & TURN servers are down or not reachable.
To rule out the first possibility and possibly confirm the second, a quick test can be done with the WebRTC sample Tickle ICE. If more candidates appear, the conclusion could be that the problem is in your code and not in the configuration of the STUN & TURN servers.
If there are still only host candidates, you can have a look at the `icecandidateerror` event to see if it contains interesting information. Sadly, depending on the browser used, it may not help you understand the issue.
It seems that Firefox (86) and Safari (14.0.3) don’t fire an `icecandidateerror` event when the STUN & TURN servers can’t be reached or are down. Only Chrome fires that event. Extra information is added in the `errorText` property, with the value STUN host lookup received error or TURN host lookup received error. I’m not sure that this information has to be interpreted as meaning that the STUN & TURN servers can’t be reached. So, using the error in that case is useless, as we have no real hints…
So, finally, we have to deal with that result: the browser only got host candidates, and that’s all! The application has to deduce that the STUN & TURN servers have an issue.
You can add the following logical block to your code to deduce a STUN issue:

    const candidates = [/* ... */]; // List of candidates received

    // True when at least one candidate can traverse a NAT (srflx or relay)
    const hasNatOrRelayCandidates = (listOfCandidates) => {
      return listOfCandidates.some(
        (candidate) => candidate.type === "srflx" || candidate.type === "relay"
      );
    };

    if (!hasNatOrRelayCandidates(candidates)) {
      // Potentially there is an issue contacting the STUN & TURN server
    }

Then, when in that state, your application can disable the call button and inform the user that the service is unavailable.
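The block above assumes that a list of candidates already exists. One possible way to build it is sketched here: a small collector fed by the `icecandidate` event, which also falls back to parsing the `typ` token of the candidate string when the `type` attribute is not available. The names `candidateType` and `createCandidateCollector` are hypothetical.

```javascript
// Returns "host", "srflx", "prflx" or "relay" for an ICE candidate.
// Uses the `type` attribute when present, else parses the "typ" token.
function candidateType(candidate) {
  if (candidate.type) return candidate.type;
  const match = / typ (\S+)/.exec(candidate.candidate || "");
  return match ? match[1] : null;
}

// Hypothetical collector fed by the "icecandidate" event.
function createCandidateCollector() {
  const candidates = [];
  let finished = false;
  return {
    onIceCandidate(event) {
      if (!event.candidate) {
        finished = true; // a null candidate signals the end of gathering
        return;
      }
      // Skip candidates whose string is empty (end-of-candidates marker).
      if (event.candidate.candidate) candidates.push(event.candidate);
    },
    isFinished: () => finished,
    types: () => candidates.map(candidateType),
  };
}

// Browser-only wiring, assuming `pc` is an existing RTCPeerConnection:
// const collector = createCandidateCollector();
// pc.addEventListener("icecandidate", collector.onIceCandidate);
```

Once `isFinished()` returns true, the result of `types()` can be checked for `srflx` or `relay` entries.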
If you have configured a Coturn server, your application expects to receive srflx and relay candidates.
One case that can happen is that the TURN credentials are not the right ones. How can you detect that case?
In fact, two things can help you detect it:
- Only host and srflx candidates have been received, and no relay candidate.
- The `icecandidateerror` event is fired with a 401 Unauthorized error.
After a quick test with the current versions of Safari (14.1.1) and Firefox (89), it appears that only Chrome fires an `icecandidateerror` event. So, to have common code, the following logical block can be added to deduce a TURN-specific issue:

    const candidates = [/* ... */]; // List of candidates received

    // True when at least one relay candidate has been gathered
    const hasRelayCandidates = (listOfCandidates) => {
      return listOfCandidates.some((candidate) => candidate.type === "relay");
    };

    if (!hasRelayCandidates(candidates)) {
      // Potentially there is an issue with the TURN credentials
    }

Then, as for the previous paragraph, this is up to your application to adapt.
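The two hints listed above can be combined in one place. In this sketch the 401 error is treated as an optional refinement, since only Chrome was seen firing the event, and the absence of a relay candidate remains the common signal. The function name `diagnoseTurn` and the returned labels are assumptions.

```javascript
// Hypothetical helper combining the two hints:
// - candidateTypes: array of gathered candidate types ("host", "srflx", ...)
// - errorCodes: array of errorCode values seen on "icecandidateerror"
function diagnoseTurn(candidateTypes, errorCodes) {
  if (candidateTypes.includes("relay")) {
    return "ok"; // a relay candidate was gathered, TURN works
  }
  if (errorCodes.includes(401)) {
    return "turn-credentials-rejected"; // explicit hint (Chrome only so far)
  }
  return "no-relay-candidate"; // credentials or reachability issue
}
```

For example, `diagnoseTurn(["host", "srflx"], [401])` returns `"turn-credentials-rejected"`, while the same candidates without any reported error return `"no-relay-candidate"`.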
That case is not easy to handle and understand but, fortunately, in some cases the impact is minor and does not prevent the candidates from being received.
First, as mentioned previously, that error is generated by the browser, and at the time of writing only Chrome generates it.
Second, that error is defined as SERVER_NOT_REACHABLE_ERROR in libwebrtc, which means that it relates to the network. After checking the libwebrtc source code, here are the cases in which this error can be generated. Interestingly, additional information is attached to the event, which can be useful.
Error | Details | Description |
---|---|---|
SERVER_NOT_REACHABLE_ERROR | Failed to create TURN client socket. | Chrome failed to create a TCP or UDP socket to the server. |
SERVER_NOT_REACHABLE_ERROR | TURN host lookup received error. | DNS failed to resolve the server name. DNS queries could be blocked by a firewall. |
SERVER_NOT_REACHABLE_ERROR | — | Socket not ready when trying to close it |
SERVER_NOT_REACHABLE_ERROR | TURN allocate request timed out. | When receiving a timeout message from the TURN server |
As you can see, that error is issued in several different cases. In my case, I only received TURN host lookup received error, which still lets my application grab relay and srflx candidates without problems in Chrome, Firefox and Safari. So it’s OK today, and I hope this will continue to be the case.
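To make these 701 errors more readable in logs, the details from the table above can be matched against the `errorText` of the event. This is a fragile sketch: the strings are Chrome (libwebrtc) implementation details that may change at any time, and the names `KNOWN_701_CAUSES` and `explain701` are hypothetical.

```javascript
// Fragments of errorText observed for error 701 and their likely cause,
// based on the table above. These strings are not standardized.
const KNOWN_701_CAUSES = [
  ["host lookup received error", "DNS resolution of the server name failed"],
  ["Failed to create TURN client socket", "TCP or UDP socket could not be created"],
  ["allocate request timed out", "TURN allocate request timed out"],
];

// Returns a short explanation for a 701 error, or "Unknown cause".
function explain701(errorText) {
  const text = errorText || "";
  const found = KNOWN_701_CAUSES.find(([fragment]) => text.includes(fragment));
  return found ? found[1] : "Unknown cause";
}
```

Matching on fragments rather than on the full sentence covers both the STUN and TURN variants of the host lookup message.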
Note: There is still an open GitHub issue in the Coturn repository, limited to the case TURN allocate request timed out.
My conclusion is that, from the application point of view, the easiest way to have a common logical block that handles ICE issues is to rely on the ICE candidates received. This is the minimal code to add in order to alert the user as soon as possible that something is wrong, without warning them about non-blocking issues such as error 701 when all candidates have been gathered correctly.
And you?
How do you manage the ICE errors reported to your web application? Do you handle them or simply ignore them?