The world is awash in Javascript, but not everyone is speaking the same dialect

The Background

Recently I've been spending a fair amount of time hacking away in Javascript on the improbable node.js. As I've gushed, my stack of choice nowadays is Redis, node.js and Riak. One great thing about this arrangement is that both node.js and Riak speak Javascript natively. Considering that Javascript is the lingua franca of end user browsers everywhere, you can imagine a situation where being able to develop in Javascript throughout the stack could be a compelling proposition.

With visions of one language to rule them all running through my mind I set off to remake my world of applications in a new image. A better image. The transformation from a hodgepodge of languages goes smoothly. Synergies are being realized as functions get packaged for reuse in different environments. Alas, all is not well in the Shire. I've run into a snag and it is somewhat interrupting secondsies. 

The Problem

Middle-earth aside, it turns out that not all Javascript engines are created equal. Beyond specific performance differences between the various Javascript implementations there are also subtle differences in feature support. In my humble opinion, nowhere is this more glaring than in date parsing. In an era where you are executing javascript in three separate environments - browser, middleware and database - and potentially against three different engines, you need to concern yourself with the details. On the client end your user can be using any of a number of different browsers, all with their own Javascript engine and version number. If you are using node.js your engine is v8, while Riak uses spidermonkey as its engine. Do not be lulled into a sense of security by the fact that you are using one language throughout. In some instances you may as well be in two completely different languages. Ok maybe that's a bit much, but let us take a closer look.

Here is a picture of a Date.parse() test I ran on the three major browsers (chrome, safari and firefox) on osx snow leopard, and the code used to make it happen. It's clear that v8 has the best date parsing support, but I'll let you be the judge.

chrome, safari and firefox
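The original test script isn't reproduced in this text, so what follows is only a minimal sketch of that kind of test, not the exact code behind the picture. The sample strings and the probeDateParse name are hypothetical; swap in whatever formats your application actually has to deal with.

```javascript
// Hypothetical sample formats, chosen for illustration only.
const samples = [
  '2011-01-01T00:00:00Z',          // ISO 8601 with an explicit UTC designator
  'Tue, 31 Aug 2010 06:46:02 GMT', // the style used in HTTP headers
  'not a date'                     // control string that should never parse
];

// Run each string through Date.parse() and record what the engine returns:
// an integer (milliseconds since the epoch) or NaN.
function probeDateParse(formats) {
  return formats.map(function (s) {
    const ms = Date.parse(s);
    return { input: s, ms: ms, parsed: !isNaN(ms) };
  });
}

// Print one line per format so results from different engines can be compared.
probeDateParse(samples).forEach(function (r) {
  console.log(r.parsed ? 'ok ' : 'NaN', JSON.stringify(r.input), r.ms);
});
```

Run the same file in each engine you care about and compare the output line by line; any row that flips between ok and NaN is a format you cannot rely on.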

What I'm doing here is running an array of different date and date/time strings through Date.parse() to see which date/time formats a given engine will accept. If an engine can successfully parse a string it will return an integer; if not, it will return NaN (Not a Number). This integer is the date/time represented as the number of milliseconds since Jan 1, 1970 UTC, aka the epoch. Dates older than Jan 1, 1970 come back as negative integers, but I'm mainly interested in present and forward dates anyway. Run the code for yourself, feel free to add any formats you are interested in and take a look at the results. After you are done being all 'WTW' about it come back. Ya, this sucks. This sucks... hard. But, like, why... specifically.

As it turns out computers can sort all kinds of stuff, strings included. However, what you'll find is that string representations of dates like Jan 1, 2011 won't sort the way you want them to... you know, chronologically. To skip that problem you need to convert said date into an integer - which computers also have no problem sorting - yet this time the sorting will mean exactly what you want it to mean. For example, this is an issue in Riak when you want to map/reduce over a range of keys and then do interesting things to them based on dates contained therein. I talk about sorting by date in Riak in a previous post.
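To make the sorting point concrete, here is a small sketch with made-up data. The events array is hypothetical, and the integers are built with Date.UTC() rather than parsed from the labels, so the example does not depend on any engine's parsing quirks.

```javascript
// Hypothetical records: a human-readable label plus the same date as
// milliseconds since the epoch.
const events = [
  { label: 'Jan 1, 2011', ms: Date.UTC(2011, 0, 1) },
  { label: 'Feb 1, 2010', ms: Date.UTC(2010, 1, 1) },
  { label: 'Mar 1, 2009', ms: Date.UTC(2009, 2, 1) }
];

// Sorting the strings gives alphabetical order, which is not chronological.
const byLabel = events.map(function (e) { return e.label; }).sort();

// Sorting on the integers gives chronological order.
const byTime = events
  .slice()
  .sort(function (a, b) { return a.ms - b.ms; })
  .map(function (e) { return e.label; });

console.log('string sort: ', byLabel);
console.log('integer sort:', byTime);
```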

 

So now that we know our date parsing is all out of whack what can we do about it?

If your application calls for date conversions do yourself a favor and do it in your middleware - node.js, python, ruby, what have you - and ship integers out directly to the end user and to the database. Centralizing your parsing will ensure that all your parsing is being done under the same rule set. All javascript implementations will be able to go from an integer to a date like so:

new Date(integer)

This will eliminate the vagaries of conversion in multiple environments. Sure, a lot of these headaches can be curtailed by controlling the date format before it enters your system but sometimes it is just out of your control. Minimize the number of places in your application for discrepancy and ship data around your stack in a format every language implementation can understand.
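As a rough sketch of that arrangement, assuming a node.js middleware: parse once, fail loudly on anything unparseable, and pass only the integer along. The toEpochMs helper and the created field are hypothetical names used for illustration.

```javascript
// Hypothetical helper: the one place in the stack where date strings get parsed.
function toEpochMs(dateString) {
  const ms = Date.parse(dateString);
  if (isNaN(ms)) {
    // Refuse bad input here instead of letting NaN leak downstream.
    throw new Error('Unparseable date: ' + dateString);
  }
  return ms;
}

// The middleware ships the integer, here inside a JSON payload.
const payload = JSON.stringify({
  created: toEpochMs('Tue, 31 Aug 2010 06:46:02 GMT')
});

// Any consumer (browser, database-side function) rebuilds the date from the integer.
const received = JSON.parse(payload);
const created = new Date(received.created);
console.log(created.toISOString());
```

The consumer never sees the original string, so it never has a chance to disagree about what it means.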

If you know of any other interesting gotchas between the various Javascript implementations please share them in the comments.

Using Riak's map/reduce for sorting

From a database perspective, Riak is a schemaless, key/value datastore. The focus of this post is to show you how to do the equivalent of the sql "ORDER BY date DESC" using Riak's map/reduce interface. Due to Riak's schemaless, document-focused nature Riak lacks internal indexing and by extension, native sorting capabilities. Additionally, Riak does not have a single file backend. The primary default backend is called Bitcask but Riak does offer a number of different backends for specific use cases. This makes an internal general purpose index implementation impractical, especially so once you factor in the distributed nature of the platform.

So how does a sort actually work in this environment? Map/Reduce. Riak implements map/reduce as its way of querying the riak cluster. Let's keep this description light and simply say: Riak brings your query (for the most part) to the node where your data lives. The map part of your query is distributed about the cluster to the nodes where the data resides and executed, then the results are sent back to the originating node for the reduce phase. You can write your map/reduce query in two different languages: erlang and javascript (Spidermonkey is the internal JavaScript engine).

So now that you have a basic theoretical underpinning, how does this actually work in practice? I'm including here a snippet of a heavily commented javascript function that I use in one of my nodejs apps. The bridge between nodejs and Riak is a module called riak-js (disclosure: I've contributed some patches). Let's take a look, I'll see you on the other side.

Let's break this down. This function is part of a larger nodejs application that uses the fu router library lifted from node_chat, a quite approachable getting-to-know-node example application. No, you cannot cut and paste this code somewhere and have it work. What you should do is take a look at the map and reduceDescending variables (lines 15 and 40). Those functions are written in javascript and sent over the wire to riak. Let's go over some of the magic that makes this work.

Riak will gladly accept a bucket as its input mechanism in a map/reduce. Although Basho has done a good amount of work to make this performant, simply passing a bucket will force an expensive list:keys operation internally. The more keys you have in your system the longer this will take. Sometimes this is unavoidable or even desirable. Most likely you will want to expressly pass keys to the map/reduce job. This is done in the format:

[ ["bucket","key1"],["bucket","key2"],["bucket","key3"],["bucket","key4"] ] 
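If the keys are already sitting in an array in your application, building that structure is a one-liner. This is only a sketch: buildInputs is a hypothetical helper, not part of riak-js or Riak.

```javascript
// Hypothetical helper: pair every key with its bucket name to produce the
// [ [bucket, key], ... ] input list shown above.
function buildInputs(bucket, keys) {
  return keys.map(function (key) {
    return [bucket, key];
  });
}

const inputs = buildInputs('bucket', ['key1', 'key2', 'key3', 'key4']);
console.log(JSON.stringify(inputs));
```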

Now, although I'm passing the keys here in order (key1... keyN), recall that riak has no internal concept of ordering. The map phase will seek out the keys wherever they live and the result is not guaranteed to be ordered. What is needed is to sort the result set in the reduce phase once all the data has been collected. In this case I will be sorting by the X-Riak-Last-Modified header which is a date kept in the format "Tue, 31 Aug 2010 06:46:02 GMT". Well, that doesn't look like a sortable string, does it? The trick is to turn it into an int, as I do on line 28:

o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]); 

Here the string date is pulled out of the header and converted via the native javascript function Date.parse() into an int. It is the int that allows the numeric sorting in the reduce phase on line 46:

v.sort ( function(a,b) { return b['lastModifiedParsed'] - a['lastModifiedParsed'] } );

The format "b-a" is what dictates descending order, conversely ascending order would be written as "a-b". Remember the value is embedded within a javascript object and needs to be accessed as such. This trick can be used with any integer value embedded in a json object. If my "key" (on line 30) were an int I could use that, or maybe a price or quantity value.
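To see the two phases working together without a cluster, here is a local sketch that runs both functions against mock objects. The function names follow the map and reduceDescending variables mentioned above, but the bodies are a reconstruction, not the original snippet. The mock objects carry only the fields this post touches (key and the X-Riak-Last-Modified metadata), and the keys and dates are made up.

```javascript
// Map phase: runs once per stored object and returns a list of results.
function map(v) {
  var o = {};
  o.key = v["key"];
  // Turn the header string into an integer so it can be sorted numerically.
  o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
  return [o];
}

// Reduce phase: receives the collected map results and sorts newest first.
function reduceDescending(v) {
  v.sort(function (a, b) { return b["lastModifiedParsed"] - a["lastModifiedParsed"]; });
  return v;
}

// Mock objects standing in for what the map function would be handed.
const mockObjects = [
  { key: "key1", values: [{ metadata: { "X-Riak-Last-Modified": "Tue, 31 Aug 2010 06:46:02 GMT" } }] },
  { key: "key2", values: [{ metadata: { "X-Riak-Last-Modified": "Wed, 01 Sep 2010 10:00:00 GMT" } }] },
  { key: "key3", values: [{ metadata: { "X-Riak-Last-Modified": "Mon, 30 Aug 2010 23:59:59 GMT" } }] }
];

// Simulate the job locally: map every object, gather the results, then reduce.
const mapped = mockObjects.reduce(function (acc, obj) {
  return acc.concat(map(obj));
}, []);
const sortedKeys = reduceDescending(mapped).map(function (o) { return o.key; });

console.log(sortedKeys);
```

In a real job the two functions are shipped to Riak instead of being called directly, and the map results arrive in whatever order the cluster produces them, which is why the sort has to live in the reduce phase.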

Map/reduce is a bit tricky to wrap your mind around when coming from a relational/sql background, but the new breed of NoSQL databases makes it easy to duplicate many of those features. Riak exposes a fully functional map/reduce implementation to get at all the nested parts of your complex json documents. So what are you waiting for? Get codin!