5/05/2012

How to fix the "Could not decode a text frame as UTF-8." bug

Sometimes Google Chrome throws a "Could not decode a text frame as UTF-8." error. It happens when the server sends invalid Unicode characters (see Unicode surrogates) to the browser, via WebSockets or any other transport. I've found two work-arounds for this issue.
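Some background on "invalid" characters: a JavaScript string is a sequence of UTF-16 code units, so a character outside the BMP (an emoji, for instance) is stored as a high surrogate followed by a low surrogate. A surrogate half on its own is not a valid Unicode character and has no UTF-8 encoding, so an encoder that writes it out anyway produces bytes a strict UTF-8 decoder rejects. The sketch below uses a hypothetical helper, `findLoneSurrogates`, which is not part of SockJS or socket.io, to show how such code units can be spotted:

```javascript
// Hypothetical helper (not from SockJS or socket.io): returns the indices of
// UTF-16 code units that are surrogate halves without a matching partner.
function findLoneSurrogates(str) {
  var lone = [];
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: only valid when immediately followed by a low surrogate.
      var next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        lone.push(i);
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate with no high surrogate in front of it.
      lone.push(i);
    }
  }
  return lone;
}

var valid = findLoneSurrogates('ok \ud83d\ude00'); // valid pair (emoji): returns []
var broken = findLoneSurrogates('bad \ud83d end'); // lone high surrogate: returns [4]
```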

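The first one is, from my point of view, the best approach (the original code comes from the SockJS codebase). It strips every character that might cause trouble from the string (ASCII control characters, all UTF-16 surrogate halves, zero-width and direction marks, and a few other invisible or special characters), so you can send it from the server side without any extra decoding step.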

/*
 * Fix the "Could not decode a text frame as UTF-8." bug #socket.io #nodejs #websocket
 *
 * Usage:
 *   cleanedString = filterUnicode(maybeHarmfulString);
 *
 * Original work-around from SockJS: https://github.com/sockjs/sockjs-node/commit/e0e7113f0f8bd8e5fea25e1eb2a8b1fe1413da2c
 * Other work-around: https://gist.github.com/2024272
 * 
 */

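// Characters to strip: ASCII control characters (\x00-\x1f, including \t and \n),
// every UTF-16 surrogate half (paired or lone), zero-width and direction marks
// (U+200C to U+200F), U+2028 to U+202F (line/paragraph separators, bidi controls),
// invisible format characters (U+2060 to U+206F) and the Specials block (U+FFF0 to U+FFFF).
var escapable = /[\x00-\x1f\ud800-\udfff\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufff0-\uffff]/g;

function filterUnicode(quoted) {

  // test() on a /g regex starts searching at lastIndex and updates it,
  // so reset it first or a previous call could make this test start mid-string.
  escapable.lastIndex = 0;
  if (!escapable.test(quoted)) return quoted;

  // Remove every matching character.
  return quoted.replace(escapable, '');
}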
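Here is a quick usage sketch of `filterUnicode`, repeating its definition so the snippet runs on its own. Note that `\n` and `\t` fall inside `\x00-\x1f`, so multi-line strings lose their line breaks as well:

```javascript
// Copy of escapable/filterUnicode from above so this snippet is self-contained.
var escapable = /[\x00-\x1f\ud800-\udfff\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufff0-\uffff]/g;

function filterUnicode(quoted) {
  escapable.lastIndex = 0;
  if (!escapable.test(quoted)) return quoted;
  return quoted.replace(escapable, '');
}

// Nothing to strip: the input comes back untouched.
var clean = filterUnicode('plain ASCII text');

// The newline (\n), zero-width joiner (\u200d), line separator (\u2028)
// and the lone high surrogate (\ud83d) are all removed.
var dirty = filterUnicode('line one\nline two\u200d\u2028end\ud83d');
// dirty === 'line oneline twoend'
```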

The second one takes a different approach, which seems valid (I only tested the first one), but it requires an extra decoding step on the receiving side:

/**
 * encode to handle invalid UTF
 * 
 * If Chrome tells you "Could not decode a text frame as UTF-8" when you try sending
 * data from nodejs, try using these functions to encode/decode your JSON objects.
 * 
 * see discussion here: http://code.google.com/p/v8/issues/detail?id=761#c8
 * see also, for browsers that don't have native JSON: https://github.com/douglascrockford/JSON-js
 * 
 * Any time you need to send data between client and server (or vice versa), encode before sending,
 * and decode upon receiving. This is useful, for example, if you are using socket.io for real-time
 * client/server communication of data fetched from a third-party service like Twitter, which might
 * contain Emoji, or other UTF characters outside the BMP.
 */
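function strencode( data ) {
  // JSON -> percent-encoded UTF-8 -> one character per UTF-8 byte (U+0000 to U+00FF),
  // so the string handed to the transport never contains a surrogate.
  return unescape( encodeURIComponent( JSON.stringify( data ) ) );
}

function strdecode( data ) {
  // Exact inverse: bytes -> percent-encoding -> UTF-8 decoding -> JSON.parse.
  return JSON.parse( decodeURIComponent( escape( data ) ) );
}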
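A round-trip sketch of the second method, with a sample payload of my own choosing. It relies on the legacy `escape`/`unescape` globals, which are deprecated but still available in browsers and Node.js. The check on `maxUnit` illustrates why Chrome accepts the result: after `strencode`, every UTF-16 code unit stands for a single UTF-8 byte, so none of them is a surrogate:

```javascript
// strencode/strdecode from above, repeated so this snippet runs on its own.
function strencode(data) {
  return unescape(encodeURIComponent(JSON.stringify(data)));
}

function strdecode(data) {
  return JSON.parse(decodeURIComponent(escape(data)));
}

// Sample payload: a Latin-1 character plus an emoji outside the BMP.
var original = { user: 'alice', text: 'caf\u00e9 \ud83d\ude00' };

var wire = strencode(original);

// Largest code unit in the encoded string: at most 0xFF, because each
// character now represents exactly one UTF-8 byte.
var maxUnit = 0;
for (var i = 0; i < wire.length; i++) {
  maxUnit = Math.max(maxUnit, wire.charCodeAt(i));
}

var restored = strdecode(wire);
// restored.user === original.user and restored.text === original.text
```

The price is bandwidth: when the WebSocket layer encodes the frame as UTF-8, each character in the U+0080 to U+00FF range takes two bytes, so the non-ASCII part of the payload roughly doubles in size.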

Hope this helps!

[Update] Dougal Campbell made an important point: “the second method preserves the original data, while the first strips out information, altering the original data”. As a result, the first method can potentially introduce security issues (see his comment).
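To make this concrete, the following comparison runs both functions from this post on a perfectly valid string containing U+1F496, which JavaScript stores as the surrogate pair `\ud83d\udc96`. The sample string is my own; the functions are the ones shown above:

```javascript
// Both work-arounds from this post, repeated so the comparison runs on its own.
var escapable = /[\x00-\x1f\ud800-\udfff\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufff0-\uffff]/g;

function filterUnicode(quoted) {
  escapable.lastIndex = 0;
  if (!escapable.test(quoted)) return quoted;
  return quoted.replace(escapable, '');
}

function strencode(data) {
  return unescape(encodeURIComponent(JSON.stringify(data)));
}

function strdecode(data) {
  return JSON.parse(decodeURIComponent(escape(data)));
}

// Valid input: U+1F496 encoded as the surrogate pair \ud83d\udc96.
var message = 'I \ud83d\udc96 node';

// Method 1 deletes both halves of the pair, so the emoji is silently lost.
var filtered = filterUnicode(message);              // 'I  node'

// Method 2 gives back the exact original string on the receiving side.
var roundTripped = strdecode(strencode(message));   // 'I \ud83d\udc96 node'
```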
