The first one is, from my point of view, the best approach (the original code came from the SockJS codebase). It removes all invalid Unicode characters from the string, so you can send it from the server side without any further decoding.
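The SockJS code itself isn't reproduced here, and its character list is much longer than what follows. As a rough illustration of the stripping idea, here is a simplified sketch. The `INVALID_CHARS` regex and the `stripInvalid` helper are hypothetical names. It assumes that "invalid" means C0 control characters other than tab, newline and carriage return, the noncharacters U+FFFE/U+FFFF, and unpaired UTF-16 surrogates. The lookbehind needs Node 10 or later.

```javascript
// Simplified sketch, NOT the SockJS code. The regex has no `u` flag, so it
// works on raw UTF-16 code units, which is what lets it see lone surrogates.
const INVALID_CHARS =
  // 1) C0 controls except \t \n \r, plus the noncharacters U+FFFE / U+FFFF
  // 2) a high surrogate that is not followed by a low surrogate
  // 3) a low surrogate that is not preceded by a high surrogate
  /[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Drop every matching code unit; valid surrogate pairs (emoji etc.) survive.
function stripInvalid(str) {
  return str.replace(INVALID_CHARS, '');
}

console.log(JSON.stringify(stripInvalid('ok\u0000\uD800!'))); // prints "ok!"
```

Because the surrogate checks use lookahead and lookbehind on the original string, a properly paired character like `'\uD83D\uDE00'` (😀) is left untouched.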
The second one takes a different approach, which also seems valid (I only tested the first one), but it requires an extra decoding step on the other side:
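The original code for this second method isn't shown here. The sketch below only illustrates the same trade-off: keep every code unit, then decode on the receiving side. It uses two hypothetical helpers, `encodeAsciiSafe` and `decodeAsciiSafe`. The encoder escapes everything outside printable ASCII (plus `"` and `\`) as `\uXXXX`, which yields a plain-ASCII JSON string literal. The decoder relies on `JSON.parse` to turn the escapes back into the original string. This is an assumption about how such a method could look, not the author's code.

```javascript
// Encode: every UTF-16 code unit outside printable ASCII, and the two
// characters that are special inside JSON strings, becomes \uXXXX.
// Without the `u` flag the regex matches single code units, so lone
// surrogates are escaped (and therefore preserved) instead of dropped.
function encodeAsciiSafe(str) {
  const body = str.replace(/[^\x20-\x7e]|["\\]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
  return '"' + body + '"';
}

// Decode (the extra step on the other side): JSON.parse understands the
// \uXXXX escapes and rebuilds the exact original code units.
function decodeAsciiSafe(encoded) {
  return JSON.parse(encoded);
}

const original = 'caf\u00e9 \uD800';            // contains a lone surrogate
const wire = encodeAsciiSafe(original);        // pure ASCII
console.log(decodeAsciiSafe(wire) === original); // prints true
```

The cost is visible in the sketch: the receiver has to run `decodeAsciiSafe` (or plain `JSON.parse`), whereas the stripped string from the first method can be used as is.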
Hope this helps!
[Update] Dougal Campbell made an important point: “the second method preserves the original data, while the first strips out information, altering the original data”. As a result, the first method can lead to security problems (see his comment).
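I can't vouch that this is the exact scenario his comment describes. A well-known risk of deleting characters is that it can reassemble text which a filter has already checked. The sketch below uses a hypothetical, deliberately naive `looksSafe` filter. It shows that risk and the usual alternative: replace invalid code units with U+FFFD instead of deleting them. Unicode's security recommendations favor that alternative over deletion.

```javascript
// Matches unpaired UTF-16 surrogates only (needs Node 10+ for lookbehind).
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical, deliberately naive filter run BEFORE sanitizing.
function looksSafe(str) {
  return !str.toLowerCase().includes('<script');
}

const input = '<scr\uD800ipt>';
console.log(looksSafe(input));    // true: the lone surrogate hides the keyword

// Stripping (first method) glues "<scr" and "ipt>" back together.
const stripped = input.replace(LONE_SURROGATE, '');
console.log(looksSafe(stripped)); // false: "<script>" reappeared after the check

// Replacing with U+FFFD keeps the keyword broken.
const replaced = input.replace(LONE_SURROGATE, '\uFFFD');
console.log(looksSafe(replaced)); // true
```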
See Text::Unidecode for the original README file, including methodology and limitations.
Note that all the files named 'x??.php' in the data directory are derived directly from the equivalent Perl files, and both sets of files are distributed under the Perl license, not the BSD license.
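The per-block data files suggest how the lookup works. This assumes Text::Unidecode's scheme, where each `x??` file holds the ASCII replacements for one block of 256 code points. The high byte of a character picks the file, and the low byte indexes into it. Here is a minimal sketch of that lookup. `TABLES` is a hypothetical three-entry stand-in for the real data, `transliterate` is a hypothetical name, and mapping unknown or non-BMP characters to `'?'` is my own choice, not necessarily what the library does.

```javascript
// Hypothetical stand-in for the data files: key = high byte (file x00, x01, ...),
// value = replacements indexed by low byte. Only three Latin-1 letters here.
const TABLES = {
  0x00: { 0xe0: 'a', 0xe7: 'c', 0xe9: 'e' },
};

function transliterate(str) {
  let out = '';
  for (const ch of str) {                     // iterates by code point
    const code = ch.codePointAt(0);
    if (code < 0x80) { out += ch; continue; } // ASCII passes through unchanged
    if (code > 0xffff) { out += '?'; continue; } // outside the BMP: placeholder (assumption)
    const high = code >> 8;                   // which x?? file
    const low = code & 0xff;                  // index inside that file
    const table = TABLES[high];
    const mapped = table ? table[low] : undefined;
    out += mapped !== undefined ? mapped : '?'; // unknown: placeholder (assumption)
  }
  return out;
}

console.log(transliterate('a\u00e9\u00e0)\u00e0\u00e7\u00e9')); // prints aea)ace
```

Splitting the data by high byte means only the blocks a string actually touches need to be loaded.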
$ npm install unidecode
$ node
> var unidecode = require('unidecode');
'niJian Wei iganaika, Zai Du Que Ren sitekudasai. Zai Du miIp misitekudasai. '