9/26/2012

Strpad function in Lua

I started playing with Lua 5.1 (embedded in Redis 2.6) today and couldn't find an implementation of strpad, so here it is.
8/21/2012

JavaScript - Return the array elements until a selector is matched
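A minimal sketch of the idea, assuming the "selector" is a predicate function and that the first matching element itself should be excluded from the result; takeUntil is a hypothetical name:

```javascript
// Hypothetical helper: returns the leading elements of `array`, stopping
// before the first element for which `selector(element, index, array)` is truthy.
// If nothing matches, a shallow copy of the whole array is returned.
function takeUntil(array, selector) {
  const stop = array.findIndex(selector);
  return stop === -1 ? array.slice() : array.slice(0, stop);
}
```

For example, takeUntil([1, 2, 3, 4], n => n > 2) returns [1, 2]. Relying on findIndex keeps the scan lazy: it stops at the first match instead of testing every element.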

6/06/2012

How to get Growl notifications from Grunt JS

I just started using Grunt on a new project. While I really enjoy Grunt, I quickly missed desktop notifications when something went wrong. Here are two snippets that display cross-platform Growl notifications on Grunt warn/fatal events.
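The underlying technique is to wrap Grunt's failure functions so that a notification fires before the original handler runs. A minimal sketch, assuming Grunt's grunt.fail.warn and grunt.fail.fatal functions and the growl npm module; notifyOn is a hypothetical helper, not one of the snippets from this post:

```javascript
// Hypothetical helper: wraps `target[name]` for each listed method so that
// `notify(name, message)` runs before the original method is invoked.
function notifyOn(target, methodNames, notify) {
  for (const name of methodNames) {
    const original = target[name];
    target[name] = function (...args) {
      const first = args[0];
      // grunt.fail.warn/fatal accept either a string or an Error.
      const message = first instanceof Error ? first.message : String(first);
      // Notify first: the original fatal handler may terminate the process.
      notify(name, message);
      return original.apply(this, args);
    };
  }
}

// Possible use in a Gruntfile (assumes `npm install growl`):
// const growl = require('growl');
// notifyOn(grunt.fail, ['warn', 'fatal'], (level, msg) =>
//   growl(msg, { title: 'Grunt ' + level }));
```

Keeping the notifier as a plain callback means the same wrapper works with any desktop notification module, not only growl.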


5/05/2012

How to fix the "Could not decode a text frame as UTF-8." bug

Sometimes Google Chrome throws a "Could not decode a text frame as UTF-8" error. It happens when the server sends invalid Unicode characters (see Unicode surrogates) to the browser, via WebSockets or any other transport. I've found two work-arounds for this issue.

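The first one is, from my point of view, the best approach (the original code comes from the SockJS codebase). It strips the problematic characters from the string on the server side, so no extra decoding step is needed on the other side. Note that it is broad: it removes every surrogate code unit (so valid pairs such as emoji disappear too) as well as control and invisible formatting characters.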

/*
 * Fix the "Could not decode a text frame as UTF-8." bug #socket.io #nodejs #websocket
 *
 * Usage:
 *   cleanedString = filterUnicode(maybeHarmfulString);
 *
 * Original work-around from SockJS: https://github.com/sockjs/sockjs-node/commit/e0e7113f0f8bd8e5fea25e1eb2a8b1fe1413da2c
 * Other work-around: https://gist.github.com/2024272
 * 
 */

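// The character class below strips: C0 control characters including tab and
// newline (\x00-\x1f); every UTF-16 surrogate code unit (\ud800-\udfff), so valid
// pairs such as emoji are removed too; zero-width and directional marks
// (\u200c-\u200f); line/paragraph separators, bidi controls and the narrow
// no-break space (\u2028-\u202f); the word joiner and other invisible format
// characters (\u2060-\u206f); and the Specials block (\ufff0-\uffff).
var escapable = /[\x00-\x1f\ud800-\udfff\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufff0-\uffff]/g;

function filterUnicode(quoted){
  // The regex has the g flag, so test() is stateful: start from position 0.
  escapable.lastIndex = 0;
  // Fast path: return the string untouched when there is nothing to strip.
  if (!escapable.test(quoted)) return quoted;

  // With a g regex, replace() always scans the whole string from the start.
  return quoted.replace(escapable, '');
}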

The second one takes another approach which seems valid (I only tested the former) but requires an extra decoding step on the other side:

/**
 * encode to handle invalid UTF
 * 
 * If Chrome tells you "Could not decode a text frame as UTF-8" when you try sending
 * data from nodejs, try using these functions to encode/decode your JSON objects.
 * 
 * see discussion here: http://code.google.com/p/v8/issues/detail?id=761#c8
 * see also, for browsers that don't have native JSON: https://github.com/douglascrockford/JSON-js
 * 
 * Any time you need to send data between client and server (or vice versa), encode before sending,
 * and decode upon receiving. This is useful, for example, if you are using socket.io for real-time
 * client/server communication of data fetched from a third-party service like Twitter, which might
 * contain Emoji, or other UTF characters outside the BMP.
 */
function strencode( data ) {
  return unescape( encodeURIComponent( JSON.stringify( data ) ) );
}

function strdecode( data ) {
  return JSON.parse( decodeURIComponent( escape ( data ) ) );
}

Hope this helps!

[Update] Dougal Campbell made some important notes: “the second method preserves the original data, while the first strips out information, altering the original data”. Thus, the first method can lead to potential security leaks (see his comment).
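A middle ground between the two is to drop only unpaired surrogates, which are the code units that cannot be encoded as UTF-8, while keeping valid surrogate pairs such as emoji. The sketch below is not from SockJS or the gist; stripLoneSurrogates is a hypothetical name, and the regex lookbehind it uses assumes Node.js 10 or later:

```javascript
// Matches a high surrogate not followed by a low one, or a low surrogate
// not preceded by a high one. Without the u flag, the regex works on
// UTF-16 code units, which is what we need here.
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

function stripLoneSurrogates(str) {
  return str.replace(LONE_SURROGATE, '');
}
```

It still alters the data when a lone surrogate is present, so Dougal Campbell's remark applies to it as well. In recent runtimes, String.prototype.toWellFormed() is a related built-in: it replaces lone surrogates with U+FFFD instead of removing them.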

5/01/2012

Unidecode for JavaScript (NodeJS)

Unidecode is a JavaScript port of the Perl module Text::Unidecode. It takes UTF-8 data and tries to represent it in US-ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F). The representation is almost always an attempt at transliteration: conveying, in Roman letters, the pronunciation expressed by the text in some other writing system.

See Text::Unidecode for the original README file, including methodology and limitations.

Note that all the files named 'x??.php' in data are derived directly from the equivalent Perl files, and both sets of files are distributed under the Perl license, not the BSD license.

Installation

$ npm install unidecode

Usage

$ node
> var unidecode = require('unidecode');
> unidecode("aéà)àçé");
'aea)ace'
> unidecode("に間違いがないか、再度確認してください。再読み込みしてください。");
'niJian Wei iganaika, Zai Du Que Ren sitekudasai. Zai Du miIp misitekudasai. '

node-unidecode on GitHub
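For comparison, when the input is only accented Latin text, Unicode normalization gives a much narrower approximation: decompose each character with NFD, then drop the combining marks. This is a different technique from unidecode's per-block x?? data tables and does nothing useful for scripts such as Japanese; stripDiacritics is a hypothetical name:

```javascript
// Not how unidecode works: a narrow approximation for accented Latin text.
function stripDiacritics(str) {
  // NFD splits e.g. "é" into "e" followed by U+0301 (combining acute accent);
  // the regex then removes the combining diacritical marks (U+0300 to U+036F).
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}
```

With the first example above, stripDiacritics("aéà)àçé") also yields 'aea)ace'.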

4/29/2012

[MacOS] Nginx Homebrew Formula for tcp_proxy support

Since we have to wait for Nginx 1.3 to get WebSocket support, I updated the official Nginx Homebrew formula to support the tcp_proxy module (a pull request to the Homebrew repository is on its way):

brew install "https://raw.github.com/FGRibreau/homebrew-formulas/master/nginx.rb" --with-tcpproxy

Happy coding!

3/06/2012

NodeJS process management at Brin.gr

Today I saw a question on StackOverflow about "Running and managing nodejs applications on a single server" and thought it would be a good idea to share how we manage NodeJS applications at brin.gr.

  • First, we use supervisord to manage and automatically restart applications. The configuration file for each application looks like this:




  • Finally, for remote control and application monitoring, we've set up Monit. Each application has a Monit configuration file like the following:


Note: before this workflow we were using forever/forever-webui, but forever remained quite unstable, so we switched to supervisord/monit. Everything has run fine since the migration.
2/24/2012

Why I do computer programming

There are three main reasons: instant results, never-ending knowledge, and creativity (and challenge).

Instant results

Before I started playing with computers, a large part of my childhood was dedicated to imagining and building things (mainly with wood and screws). Some years later (around age 10) I started to learn electronics and built my first electronic circuit. I don't think I really knew what I was doing at the time, but it did work. I was always dismantling things and reassembling them to learn how everything worked.

One day, as I was assembling another circuit (one I hadn't designed), everything fell apart: it didn't work. I knew I wouldn't be able to find and fix the issue, because it required more knowledge of mathematics and physics than the schoolboy I was had (and, no doubt, more than I have even now). Unlike woodworking, electronics seemed complex in every respect, and the learning curve was too steep for me. That day I decided to give up electronics and start using a computer. That failure was the turning point.

Everything was instant: it took me less than a week to learn enough to set up my first website. I loved how web programming worked: I just had to write or edit something, hit reload, and immediately see the result. It reminded me why I had enjoyed working with wood: each modification to the structure had a purpose, and the feedback, good or bad, was instant!

But unlike wood, computers didn't limit me by the quantity of raw material. That problem doesn't exist in computing (or at least not as much), where the virtual world can (theoretically) be infinite. The only limits to building something in this world are the developer's knowledge and imagination. As a developer, I like building things that didn't exist before (or that existed without all the features I wanted), as well as finding out how existing programs work. When I can't access their source code, I try to imagine how it's done behind the scenes (I like to see this activity as a form of reverse engineering). There are so many areas of computer science that I wish I had known about earlier, so I could have used them at the right time, in the right project, instead of doing things inefficiently.

The never-ending knowledge

Which leads to the second part: never-ending knowledge. The more I learn about computer science, the more I think the schoolboy I was was a fool to believe he knew how to make a website. Even a small area like website creation, or a wider one like web-service consumption, can become increasingly complex. Every day I browse the web and social networks to find relevant information related to my favorite domains. This lets me discover new ways of thinking (for example, the power of partial application and higher-order functions in functional programming languages), new ways of structuring project development (with test- or behaviour-driven development (TDD/BDD), or even documentation-driven development), and opens me up to new areas like machine learning and, more recently, distributed asynchronous computation.

Each time I learn something new, or discover a new library, framework, or workflow for building a specific type of program, if it's interesting enough I always try it myself and, if it works, add it to my toolbox so I can be more precise and efficient in future work.

Every single day I discover, learn, code, share or release something that I didn't know (or didn't know how to do) before. And this intellectual curiosity is directly connected, in both directions, with the last part: creativity.

Creativity (and challenge)

Creativity is what feeds this passion for computer science. I've never accepted a single project in which every component was already available somewhere (as a library, a framework, or a tool). Don't get me wrong, I like using them and often studying them (it's always fascinating to see how developers solve problems in specific domains), but if the project is only about making these components fit together, the creativity required will be too limited and, even worse, there won't be any challenge. From my point of view, challenge goes hand in hand with creativity.

For example, for the first assignment in my Information Retrieval course, I unconsciously added multiple challenges to the original requirements. I would do it in JavaScript (which may not seem the best language for this kind of exercise) in less than 2 hours, using a hand-made lexer (the goal was also to learn how to write a lexer, something I had wanted to learn since I first heard about it), and the program would have to be faster than its Java equivalent. Each of these additional requirements led to the last point: excitement.

In fact, while coding I added another requirement: use as few lines of code as possible. After 2 hours it worked, and that feeling of accomplishment is what makes me want to learn more every day. The lexer parsed the Cranfield collection of 1400 documents in less than 500ms (8 times faster than the hand-made Java parser), and after a rewrite from scratch this went down to 42ms. The final version, with only 136 lines of code, takes 4 seconds to process and index all the documents.

But what's important here isn't the result, it's the path I took to reach it. It was all about using functional tools and the language's quirks to reduce the number of lines and improve the program's efficiency. This project was not about architecture and code design but about specific problem areas that are not usually related: IR, language performance and code size. It was more an experiment than an assignment. In fact, from the beginning I knew it wouldn't be the program I'd hand in, simply because it didn't meet the initial requirements. But it let me discover the Vector Space Model, TF-IDF and how to write a lexer from another angle, before rewriting it the traditional way in another language.


That's why I do computer programming: it constantly feeds my mind with new things to learn, try, or reproduce, and lets me express my creativity in challenging projects.
2/16/2012

[Snippet] Cross-platform .pid management for NodeJS

... and by cross-platform I mean of course "cross-unix".

I wrote the following snippets because I wanted to use the same tools in my development and production environments to manage my apps (bye bye init.d on Debian and launchd on Mac). Since my recent switch from Forever (too buggy, even when driven through a hand-made web UI) to Supervisord, I haven't found a way to make Supervisord do the heavy lifting of pid management for me (at the time of writing it doesn't even seem possible). So here we are:


In CoffeeScript:
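A minimal JavaScript sketch of the same idea, assuming Node.js: managePidFile is a hypothetical helper that writes process.pid to a file at startup and deletes the file when the process exits. SIGINT and SIGTERM are handled explicitly because supervisord stops programs with SIGTERM by default, and a signal alone does not fire Node's 'exit' event:

```javascript
const fs = require('fs');
const os = require('os');

// Hypothetical helper: writes the current pid to `pidPath` and removes the
// file when the process exits. Returns the cleanup function.
function managePidFile(pidPath) {
  fs.writeFileSync(pidPath, String(process.pid));

  const cleanup = () => {
    try {
      fs.unlinkSync(pidPath);
    } catch (err) {
      // Already removed (e.g. cleanup ran twice): nothing to do.
      if (err.code !== 'ENOENT') throw err;
    }
  };

  // 'exit' listeners must be synchronous, hence unlinkSync above.
  process.on('exit', cleanup);

  // Exit explicitly on these signals, with the conventional 128 + signal
  // number status, so that the 'exit' listener above still runs.
  for (const signal of ['SIGINT', 'SIGTERM']) {
    process.once(signal, () => process.exit(128 + os.constants.signals[signal]));
  }

  return cleanup;
}
```

Calling managePidFile('/var/run/myapp.pid') at startup (the path is only an example) gives monitoring tools a pid file that disappears on a clean shutdown.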
1/26/2012

[Twitter Unofficial API] Getting the tweet's number of favorites, RTs and replies

I was updating an internal tool we use at brin.gr when I suddenly needed a way to find how many times a specific tweet had been favorited. The Search API results didn't include this information, even with include_entities set to true.

I knew there had to be an API for that, since the official web and mobile apps show this information. After a little network monitoring I finally found the endpoint:

https://api.twitter.com/i/statuses/[tweet.id]/activity/summary.json
Note: the request must be done via OAuth. Sample payload:
{
   "favoriters":[113682166],
   "favoriters_count":"1",
   "repliers":[],
   "repliers_count":"0",
   "retweeters_count":"1",
   "retweeters":[113682166]
}
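In this sample the *_count fields are strings, not numbers. A small sketch that normalizes such a payload, assuming it has the shape shown above (the field names come from the sample; summarizeActivity and its output keys are hypothetical, and fetching the payload with OAuth is not shown):

```javascript
// Hypothetical helper: converts the string counts of an activity summary
// payload into numbers. Missing or non-numeric counts become 0.
function summarizeActivity(payload) {
  const count = (value) => Number.parseInt(value, 10) || 0;
  return {
    favorites: count(payload.favoriters_count),
    retweets: count(payload.retweeters_count),
    replies: count(payload.repliers_count),
  };
}
```

Applied to the sample payload, it returns { favorites: 1, retweets: 1, replies: 0 }.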
Made with on a hot august night from an airplane the 19th of March 2017.