The world is awash in Javascript, but not everyone is speaking the same dialect

The Background

Recently I've been spending a fair amount of time hacking away in Javascript on the improbable node.js. As I've gushed, my preferred stack of choice nowadays is Redis, node.js and Riak. One great thing about this arrangement is that both node.js and Riak speak Javascript natively. Considering that Javascript is the lingua franca of end user browsers everywhere you can imagine a situation where being able to develop in Javascript throughout the stack could be a compelling proposition.

With visions of one language to rule them all running through my mind I set off to remake my world of applications in a new image. A better image. The transformation from a hodgepodge of languages goes smoothly. Synergies are being realized as functions get packaged for reuse in different environments. Alas, all is not well in the Shire. I've run into a snag and it is somewhat interrupting secondsies. 

The Problem

Middle-earth aside, it turns out that all Javascript engines are not created equal. Beyond specific performance differences between the various Javascript implementations there are also subtle differences in feature support. In my humble opinion, nowhere is this more glaring then in date parsing. In an era where you are executing javascript in three separate environments - browser, middleware and database - and potentially against three different engines, you need to concern yourself with the details. On the client end your user can be using any of a number of different browsers all with their own Javascript engine and version number. If you are using node.js your engine is v8, Riak uses spidermonkey as its engine. Do not be lulled into a sense of security due to the fact that you are using one language throughout. In some instances you may as well be in two completely different languages. Ok maybe that's a bit much, but let us take a closer look.

Here is a picture of a Date.parse() test I ran on the three major browsers, chrome, safari and firefox on osx snow leopard and the code used to make it happen. It's clear that v8 has the best date parsing support, but I'll let you be the judge.

chrome, safari and firefox

What I'm doing here is running an array of different date and date/time strings through Date.parse() to see which date/time formats a given engine will accept. If an engine can successfully parse a string it will return an integer, if not it will return NaN (Not a Number). This integer is the date/time represented as the number of milliseconds since Jan 1, 1970, aka. the epoch. Obviously this won't work for dates older than Jan 1, 1970 but I'm mainly interested in present and forward dates. Run the code for yourself, feel free to add any formats you are interested in and take a look at the results. After you are done being all 'WTW' about it come back. Ya, this sucks. This sucks... hard. But, like, why... specifically. 

As it turns out computers can sort all kinds of stuff, strings included. However, what you'll find is that string representations of dates like Jan, 1 2011 won't sort the way you want them to... you know chronologically. To skip that problem you need to convert said date into an integer - which computers also have no problem sorting - yet this time the sorting will mean exactly what you want it to mean. For example, this is an issue in Riak when you want to map/reduce over a range of keys and then do interesting things to them based on dates contained therein. I talk about sorting by date in Riak in a previous post.

 

So now that we know our date parsing is all out of whack what can we do about it?

If your application calls for date conversions do yourself a favor and do it in your middleware - node.js, python, ruby, what have you - and ship integers out directly to the end user and to the database. Centralizing your parsing will ensure that all your parsing is being done under the same rule set. All javascript implementations will be able to go from an integer to a date like so:

new Date(integer)

This will eliminate the vagaries of conversion in multiple environments. Sure, a lot of these headaches can be curtailed by controlling the date format before it enters your system but sometimes it is just out of your control. Minimize the number of places in your application for discrepancy and ship data around your stack in a format every language implementation can understand.

If you know of any other interesting gotchas between the various Javascript implementations please share them in the comments.

Paginating with Riak

The question of pagination comes up from time to time on the Riak mailing list and in #riak on irc.freenode.net, most recently a few days ago. In reply, I always say something along the lines of "No. Riak does not do pagination." Let's take a look at what pagination is and why Riak has a hard time doing it. Pagination is generally defined as the ordered numbering of pages in a publication, usually a book. Now let's take that book and make it a Hot 100 list of super cool things that we want to put on a website. As far as we are concerned pagination is the ability to select a subset of information, in sequence, from a larger set of information.

Let's work with the numbers 1 through 100, in order. We could interest ourselves with the numbers one at a time or, perhaps, 10 at a time. If we were to page through those numbers we would have to know primarily two things: where are we starting and how much do we want. In addition, any meaningful pagination would require the larger set to be sorted. Working with our earlier definition and our example, Riak presents one chief complaint: sorting.

Riak at its core is a distributed key/value persisted data store that also happens to do a lot of other things. Now break that down. Looking at those words individually we have "distributed", meaning that your data lives on a number of different machines in your cluster. Good thing, right? Yes. However it also means that no single machine is the canonical reference for all your data. Which in turn means that you need to ask multiple machines for your data and those machines will return data to you when they see fit, ie. not in order. Moving on, we have "key/value". In regards to the topic at hand, this means that Riak has no insight into any data held within your keys, ie. Riak does not care if your stored json object has an age value in it. Next, we have "persisted". Riak has no native internal index, meaning Riak will not store on disk the data you send it in any useful way - useful to you at least. Lastly, we have "happens to do a lot of other things." Thankfully for us, one of those other things is Map/Reduce.

Map/Reduce is where all those previous sorting problems, uh, sort themselves out. Map/Reduce is basically the way for you to marshal your data in Riak. Basically, Map/Reduce takes your unsorted heaping mess of data and whips it into shape. I'll be using the riak-js module for nodejs to talk to Riak and walk through our example. Using a stock Riak install we will populate 100 keys, named 1 through 100, with a simple json object and select a subset of those keys using m/r. This example expounds on a brief mention of the subject in a Basho blog post from the summer of 2010. We will be taking advantage of a number of built-in javascript functions that come bundled with Riak. See you after the code.

Basically run populate-riak to populate a bucket, then run paginate-riak to get a "page" of that data back. All this works off the command line. Cool, right? Well, ya. Except... if you are contemplating running a site with any meaningful scale this will not function that well for you. Hmm, why is that? Well, on its face this method will work but what it is doing is pulling all records in your bucket and sorting your result set on the fly - every time you call it. This will fall down as the number of records in your system grows, aka. the number of records you need to sort grows. You really need to employ caching at different layers of your application to make this work better. Allowing your users to run the above every time they want to paginate a set of records is just a recipe for disaster. As an ad-hoc query run once in a while it should work fine, ie. perhaps run on a frequency to build a paginated cache that your user facing application hits directly.

Bear in mind that this is not a knock on Riak. It is simply a limitation that is inherent in the design of Riak. When evaluating a persistent data store you should take into account the good, of which Riak has a fair amount of, and the not so good. This is just one area where your application will have to accomodate the shortcomings by making judicious usage of pre-emptive caching. Now, when asked in the future whether or not Riak supports pagination I'll simply give a qualified "Sort of."