How to fix the "Could not decode a text frame as UTF-8." bug

Sometimes Google Chrome throws a "Could not decode a text frame as UTF-8" error. It happens when the server sends invalid Unicode characters (see Unicode surrogates) to the browser (via WebSockets or any other transport). I've found two workarounds for this issue.

The first one is, from my point of view, the best approach (the original code came from the SockJS codebase). It removes all the invalid Unicode characters from the string so you can send it from the server side without further decoding.
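
As a rough illustration of the stripping approach, here is a minimal sketch. It is not the SockJS code itself: the function name stripLoneSurrogates is hypothetical, and it assumes the only invalid characters of concern are unpaired surrogates (valid surrogate pairs, such as emoji, are kept).

```javascript
// Hypothetical helper, not the original SockJS implementation.
// Removes unpaired UTF-16 surrogates, which cannot be encoded as valid UTF-8.
function stripLoneSurrogates(str) {
  // The first alternative matches a valid high+low pair (kept as-is);
  // the second matches any remaining single surrogate (dropped).
  return str.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g, function (match) {
    return match.length === 2 ? match : '';
  });
}

// Usage: clean the payload on the server before sending it.
var cleaned = stripLoneSurrogates('a\uD800b'); // 'ab'
```

Because the invalid code units are simply dropped, the receiver cannot tell that anything was removed.
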
The second one takes another approach, which seems valid (I only tested the former) but requires an extra decoding step on the other side.
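
One way such a data-preserving scheme could look is sketched below. This is an assumption for illustration, not necessarily the code the post originally referred to: the helper names escapeLoneSurrogates and unescapeLoneSurrogates are hypothetical, and the escape format (a backslash followed by "u" and four hex digits) is just one possible choice.

```javascript
// Hypothetical helpers; the escape format is an assumption of this sketch.

// Server side: turn each unpaired surrogate into a plain-ASCII escape
// so the resulting string can be sent as valid UTF-8.
function escapeLoneSurrogates(str) {
  return str.replace(/\\|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g, function (match) {
    if (match === '\\') return '\\\\';     // escape the escape character itself
    if (match.length === 2) return match;  // valid surrogate pair: leave untouched
    return '\\u' + match.charCodeAt(0).toString(16); // lone surrogate -> \udXXX
  });
}

// Client side: the extra decoding step that restores the original string.
function unescapeLoneSurrogates(str) {
  return str.replace(/\\(?:\\|u([0-9a-f]{4}))/g, function (match, hex) {
    return hex === undefined ? '\\' : String.fromCharCode(parseInt(hex, 16));
  });
}
```

Unlike stripping, the round trip gives back exactly the string the server started with, at the cost of requiring both sides to agree on the escaping.
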
Hope this helps!

[Update] Dougal Campbell made an important point: “the second method preserves the original data, while the first strips out information, altering the original data”. Thus, the first method can lead to potential security leaks (see his comment).

Unidecode for JavaScript (NodeJS)

Unidecode is a JavaScript port of the Perl module Text::Unidecode. It takes UTF-8 data and tries to represent it in US-ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F). The representation is almost always an attempt at transliteration, i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system.

See Text::Unidecode for the original README file, including methodology and limitations.

Note that all the files named 'x??.php' in data are derived directly from the equivalent Perl files, and both sets of files are distributed under the Perl license, not the BSD license.


$ npm install unidecode


$ node
> var unidecode = require('unidecode');
> unidecode("aéà)àçé");
'aea)ace'
> unidecode("に間違いがないか、再度確認してください。再読み込みしてください。");
'niJian Wei iganaika, Zai Du Que Ren sitekudasai. Zai Du miIp misitekudasai. '
