siculars on stuff

Ricon 2012 Recap

Back in July 2012, Basho's Technical Evangelist, Tom Santero, posted an invitation to the riak mailing list to a conference Basho was holding on distributed databases called Ricon. Tom said:

...the goal is to put on a two day event that caters to developers working in, with and
around distributed systems. While Riak will be a primary focus, we've
invited several select speakers whose reputations and work in distributed
systems procceed them.

I've been in and or around the non-traditional database world for a few years now prodding and poking around, checking out the various flavors, reading up on comparisons and perusing their mailing lists. Although garnering a lot of attention ranging from justified to hyperbolic in the last few years, non-relational databases - often falling under the conflated NOSQL moniker - are still a niche, fledgling set of technologies. Who would come to this conference? Basho's Riak simply had not had http://www.google.com/trends/explore#q=riak%2C%20couchdb%2C%20redis%2C%20mongodb&cmpt=q" target="_blank">the same mindshare that a Mongo or even a Cassandra or Couch have enjoyed, not to mention the overall Hadoop ecosystem. In truth, I was skeptical. Oftentimes, a company specific conference devolves into a marketing and sales fun fest that leaves you running for the exits.

Well, turns out Ricon ended up selling out at around 350 distributed database enthusiasts and believe it or not, the conference did not devolve but presented a very interesting mix of academic theory and practical implementation. Others thought so as well:

High pace, quality content. Refreshing mix of academic and practical. Looking forward to #ricon2013. #ricon2012
— Pieter Noordhuis (@pnoordhuis) October 12, 2012

The Talks

Bookended by two preeminent speakers, Joseph Hellerstein and Eric Brewer, Ricon started and ended like an academic conference. Their talks were both theoretical and practical in that they both presented distributed database concepts like monotinicity and commutative replicated data types (CRDT's) in an approachable way and coupled that with practical next steps for current and future development like Joe and his team's efforts on Bloom language (not to be confused with Bloom filters) and Russel Sears' Stasis. These talks were about as product agnostic as any at Ricon.

There were talks by Basho engineers that built on and further bridged the gap of the theoretical with the practical as implemented in Riak. Sean Cribbs and Russell Brown showed us how some of those theoretical principles are being realized in code by releasing riak_dt, an experimental Riak branch that adds things like counters to standard Riak. Counters have been available in Cassandra for a while now and have been something the Riak community has been interested for some time. And by Riak community, I mean me... and possibly a few others. Bryan Fink gave a talk on Riak Pipe which is the underpinnings of Riak's map reduce implementation. The Q&A after Bryan's talk was dominated by the possibility of leveraging Pipe to create a real-time processing environment in a similar vein to that of Storm.

One talk I was particularly interested in was Ryan Zezeski's Yokozuna presentation which highlighted ongoing development of Riak's search capabilities. The new effort is a fresh start and departure from the current implementation. Yokozuna replaces the current Lucene-like, a.k.a. not Lucene, search internals for actual Lucene by way of integrating Solr directly into Riak. There are quite a few advantages to be had here. As Ryan indicated in his talk, Basho is not in the business of search. By relying on Lucene to drive search, Basho effectively outsources the core search problem to a proven and well-known solution. Additionally, this reorients the current term-based partitioning mechanism with a document-based partitioning mechanism. What that basically means, is that your search index will live on the same node as your document which has implications for most composite search queries. Part of Gary Flake's keynote delved into this particular consideration and how his company, Clipboard, deals with this currently.

Two non-Riak talks that were quite interesting were Dana Contreras's talk on Twitters' internal stack re-architecture employing interconnected independent services and Accenture on how to present "big data" as a concern, opportunity and solution to decision-makers in your organization. Dana's talk took a look at the internal engineering and operational process as it evolves in an orginization as their code base and engineering head count grows. For Twitter, segmenting their internal services not only provided for a cleaner code base but also allowed teams of people to focus on one area of the code base at a time. Accenture's talk was the most "businessy" of the conference but I think it is something that developers and engineers should have some insight into. Market fit, risk, productization and cost are all concerns for your executive suite and as a developer or consultant you need to speak to those concerns when pitching solutions. Accenture's analysis shows that in the last few years, mainly thanks to the popularity of Hadoop, executives in traditional industries are aware of "big data" and are open to entertaining solutions in the space.

Future directions

Where does Ricon go from here? There is always room for improvement. My main concern is that the content remain diverse in terms of products. Politics and money aside (is that even possible?), I would like to see continued participation from other vendors or even end users whose solutions employ other vendors' products if this conference is to continue billing itself as a distributed systems gathering. Riak is not the only solution in the space and it is worthwhile knowing how other solutions are implementing distributed principles. I definitely think, on the whole, an inclusive approach simply stands to grow the non-traditional distributed database/systems pie.

Chatter by conference attendees left me convinced that Ricon was a success. Ricon was-well executed, well-attended and actually interesting. But more importantly, it was relevant. For those of us at the conference, we actually work in this space. We are interested in the ongoing development of distributed solutions to a number of problems. The conference delivered on creating a space that brought us together to share solutions and learn about continuing advancements. For a new conference to have a successful maiden voyage is no small feat in my book. I, for one, am looking forward to the next one.

I would love to hear your thoughts on what would make a good distributed systems conference. Were you at Ricon? What did you think?

The world is awash in Javascript, but not everyone is speaking the same dialect

The Background

Recently I've been spending a fair amount of time hacking away in Javascript on the improbable node.js. As I've gushed, my preferred stack of choice nowadays is Redis, node.js and Riak. One great thing about this arrangement is that both node.js and Riak speak Javascript natively. Considering that Javascript is the lingua franca of end user browsers everywhere you can imagine a situation where being able to develop in Javascript throughout the stack could be a compelling proposition.

With visions of one language to rule them all running through my mind I set off to remake my world of applications in a new image. A better image. The transformation from a hodgepodge of languages goes smoothly. Synergies are being realized as functions get packaged for reuse in different environments. Alas, all is not well in the Shire. I've run into a snag and it is somewhat interrupting secondsies.

The Problem

Middle-earth aside, it turns out that all Javascript engines are not created equal. Beyond specific performance differences between the various Javascript implementations there are also subtle differences in feature support. In my humble opinion, nowhere is this more glaring then in date parsing. In an era where you are executing javascript in three separate environments - browser, middleware and database - and potentially against three different engines, you need to concern yourself with the details. On the client end your user can be using any of a number of different browsers all with their own Javascript engine and version number. If you are using node.js your engine is v8, Riak uses spidermonkey as its engine. Do not be lulled into a sense of security due to the fact that you are using one language throughout. In some instances you may as well be in two completely different languages. Ok maybe that's a bit much, but let us take a closer look.

Here is a picture of a Date.parse() test I ran on the three major browsers, chrome, safari and firefox on osx snow leopard and the code used to make it happen. It's clear that v8 has the best date parsing support, but I'll let you be the judge.

chrome, safari and firefox

What I'm doing here is running an array of different date and date/time strings through Date.parse() to see which date/time formats a given engine will accept. If an engine can successfully parse a string it will return an integer, if not it will return NaN (Not a Number). This integer is the date/time represented as the number of milliseconds since Jan 1, 1970, aka. the epoch. Obviously this won't work for dates older than Jan 1, 1970 but I'm mainly interested in present and forward dates. Run the code for yourself, feel free to add any formats you are interested in and take a look at the results. After you are done being all 'WTW' about it come back. Ya, this sucks. This sucks... hard. But, like, why... specifically.

As it turns out computers can sort all kinds of stuff, strings included. However, what you'll find is that string representations of dates like Jan, 1 2011 won't sort the way you want them to... you know chronologically. To skip that problem you need to convert said date into an integer - which computers also have no problem sorting - yet this time the sorting will mean exactly what you want it to mean. For example, this is an issue in Riak when you want to map/reduce over a range of keys and then do interesting things to them based on dates contained therein. I talk about sorting by date in Riak in a previous post.

So now that we know our date parsing is all out of whack what can we do about it?

If your application calls for date conversions do yourself a favor and do it in your middleware - node.js, python, ruby, what have you - and ship integers out directly to the end user and to the database. Centralizing your parsing will ensure that all your parsing is being done under the same rule set. All javascript implementations will be able to go from an integer to a date like so:

new Date(integer)

This will eliminate the vagaries of conversion in multiple environments. Sure, a lot of these headaches can be curtailed by controlling the date format before it enters your system but sometimes it is just out of your control. Minimize the number of places in your application for discrepancy and ship data around your stack in a format every language implementation can understand.

If you know of any other interesting gotchas between the various Javascript implementations please share them in the comments.

Paginating with Riak

The question of pagination comes up from time to time on the Riak mailing list and in #riak on irc.freenode.net, most recently a few days ago. In reply, I always say something along the lines of "No. Riak does not do pagination." Let's take a look at what pagination is and why Riak has a hard time doing it. Pagination is generally defined as the ordered numbering of pages in a publication, usually a book. Now let's take that book and make it a Hot 100 list of super cool things that we want to put on a website. As far as we are concerned pagination is the ability to select a subset of information, in sequence, from a larger set of information.

Let's work with the numbers 1 through 100, in order. We could interest ourselves with the numbers one at a time or, perhaps, 10 at a time. If we were to page through those numbers we would have to know primarily two things: where are we starting and how much do we want. In addition, any meaningful pagination would require the larger set to be sorted. Working with our earlier definition and our example, Riak presents one chief complaint: sorting.

Riak at its core is a distributed key/value persisted data store that also happens to do a lot of other things. Now break that down. Looking at those words individually we have "distributed", meaning that your data lives on a number of different machines in your cluster. Good thing, right? Yes. However it also means that no single machine is the canonical reference for all your data. Which in turn means that you need to ask multiple machines for your data and those machines will return data to you when they see fit, ie. not in order. Moving on, we have "key/value". In regards to the topic at hand, this means that Riak has no insight into any data held within your keys, ie. Riak does not care if your stored json object has an age value in it. Next, we have "persisted". Riak has no native internal index, meaning Riak will not store on disk the data you send it in any useful way - useful to you at least. Lastly, we have "happens to do a lot of other things." Thankfully for us, one of those other things is Map/Reduce.

Map/Reduce is where all those previous sorting problems, uh, sort themselves out. Map/Reduce is basically the way for you to marshal your data in Riak. Basically, Map/Reduce takes your unsorted heaping mess of data and whips it into shape. I'll be using the riak-js module for nodejs to talk to Riak and walk through our example. Using a stock Riak install we will populate 100 keys, named 1 through 100, with a simple json object and select a subset of those keys using m/r. This example expounds on a brief mention of the subject in a Basho blog post from the summer of 2010. We will be taking advantage of a number of built-in javascript functions that come bundled with Riak. See you after the code.

Basically run populate-riak to populate a bucket, then run paginate-riak to get a "page" of that data back. All this works off the command line. Cool, right? Well, ya. Except... if you are contemplating running a site with any meaningful scale this will not function that well for you. Hmm, why is that? Well, on its face this method will work but what it is doing is pulling all records in your bucket and sorting your result set on the fly - every time you call it. This will fall down as the number of records in your system grows, aka. the number of records you need to sort grows. You really need to employ caching at different layers of your application to make this work better. Allowing your users to run the above every time they want to paginate a set of records is just a recipe for disaster. As an ad-hoc query run once in a while it should work fine, ie. perhaps run on a frequency to build a paginated cache that your user facing application hits directly.

Bear in mind that this is not a knock on Riak. It is simply a limitation that is inherent in the design of Riak. When evaluating a persistent data store you should take into account the good, of which Riak has a fair amount of, and the not so good. This is just one area where your application will have to accomodate the shortcomings by making judicious usage of pre-emptive caching. Now, when asked in the future whether or not Riak supports pagination I'll simply give a qualified "Sort of."

Using Riak's map/reduce for sorting

From a database perspective, Riak is a schemaless, key/value datastore. The focus of this post is to show you how to do the equivalent of the sql "SORT BY date DESC" using Riak's map/reduce interface. Due to Riak's schemaless, document focused nature Riak lacks internal indexing and by extension, native sorting capabilities. Additionally, Riak does not have a single file backend. The primary default backend is called Bitcask but Riak does offer a number of different backends for specific use cases. This makes an internal general purpose index implementation impractical, especially so once you factor in the distributed nature of the platform.

So how does a sort actually work in this environment? Map/Reduce. Riak implements map/reduce as its way of querying the riak cluster. Lets keep this description light and simply say: Riak brings your query (for the most part) to the node where your data lives. The map part of your query is distributed about the cluster to the nodes where the data resides, executed, then results sent back to the originating node for the reduce phase. You can write your map/reduce query in two different languages - erlang and javascript (Spidermonkey is the internal JavaScript engine.)

So now that you have a basic theoretical underpinning, how does this actually work in practice? I'm including here a snippet of a heavily commented javascript function that i use in one of my nodejs apps. The bridge between nodejs and Riak is a module called riak-js (disclosure, I've contributed some patches.) Let's take a look, I'll see you on the other side.

Lets break this down. This function is part of a larger nodejs application that uses the fu router library lifted from node_chat, a quite approchable getting-to-know-node example application. No you can not cut and paste this code somewhere and have it work. What you should do is take a look at the map and reduceDescending variables (lines 15 and 40). Those functions are written in javascript and sent over the wire to riak. Lets go over some of the magic that makes this work.

Riak will gladly accept a bucket as it's input mechanism in a map/reduce. Although Basho has done a good amount of work to make this performant, simply passing a bucket will force an expensive list:keys operation internally. The more keys you have in your system the longer this will take. Sometimes this is unavoidable or even desirable. Most likely you will want to expressly pass keys to the map/reduce job. This is done in the format:

[ ["bucket","key1"],["bucket","key2"],["bucket","key3"],["bucket","key4"] ]

Now, although I'm passing the keys here in order (key1... keyN), recall that riak has no internal concept of ordering. The map phase will seek out the keys wherever they live and the result is not guaranteed to be ordered. What is needed is to sort the result set in the reduce phase once all the data has been collected. In this case I will be sorting by the X-Riak-Last-Modified header which is a date kept in the format "Tue, 31 Aug 2010 06:46:02 GMT". Well, that doesn't look like a sortable string, does it? The trick is to turn it into an int, as I do on line 28:

o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);

Here the string date is pulled out of the header and converted via the native javascript function Date.parse() into an int. It is the int that allows the numeric sorting in the reduce phase on line 46:

v.sort ( function(a,b) { return b['lastModifiedParsed'] - a['lastModifiedParsed'] } );

The format "b-a" is what dictates descending order, conversely ascending order would be written as "a-b". Remember the value is embedded within a javascript object and needs to be accessed as such. This trick can be used with any integer value embedded in a json object. If my "key" (on line 30) were an int I could use that, or maybe a price or quantity value.

Map/reduce is a bit tricky to wrap you mind around when coming from a relational/sql background but the new breed of NoSQL databases available make it easy to duplicate many of those features. Riak exposes a fully functional map/reduce implementation to get at all the nested parts of your complex json documents. So what are you waiting for? Get codin!

Notes from the Basho webinar on benchmarking Riak

Something that often comes up in the various nosql message boards and irc channels from new and experienced users alike is the broad question of performance. How many ops/sec can I squeeze out of Riak/Mongodb/Cassandra/etc.? How many keys can it hold? How will performance degrade if most of the values I'm keeping are less than 100KB on Tuesday's but on every other Thursday they spike to 500KB. Most of the time I have 80% reads vs 20% writes but I want to know what would happen if that mix changes. Will it shred my disk? Do I have enough I/O for my load? I've seen all those and then some out there in the wild... Ok. Maybe not the alternate Thursday's, but you get my point.

Users need a uniform, simple to use mechanism to test their systems themselves. There are so many floating variables that govern overall system performance that it is hard to get a straight answer from anybody, but more specifically - hard to get an answer that is right for you and your unique needs.

Earlier today I had the pleasure of sitting in on a webinar hosted by Basho, the makers of Riak. Shortly, Basho will release basho_bench (I believe that is the correct name), a framework for benchmarking Riak. This all dovetails nicely with a Basho blog post regarding the inevitable comparisons between various nosql offerings. Beyond having many knobs and levers to tweak for your demanding benchmarking needs, I'll touch on three features that make this tool very useful.

Each baso_bench test is a configurable, simple text file. This will allow standard test patterns to be developed and shared amongst the community for various use cases. Basho_bench is also integrated with the R statistical analysis programming language. All tests dump their results to their own self contained folder which is than used by R to print out eye candy graphs. Oooh... shiny. Most importantly, basho_bench has the ability to change the transport mechanism by which it connects to Riak. Because Riak itself supports multiple access methods (http, protobuf and native erlang client), the framework will allow the basho_bench tool to be extended to support benchmarking on other nosql key/value like systems. I see the glimmer of a thrift interface in the distance... This single feature will go a ways to making basho_bench a standard test suite in the nosql space.

Keep your eyes open for the release of the basho_bench tool in the next week or so.

The following are some of my non-authoritative, off the cuff notes form the presentation. Many of them you should be familiar with from benchmarking in general and some are specific to the options available in this new suite. The full slide stack should be available from Basho in the next week or so.

Performance measured in -

Throughput - operation/sec
Latency

Test typical and worst-case scenarios

Minimize variables changes between tests

Run early and often

Iterative testing process

Introducing basho_bench

benchmark anything that is a key/value store (other nosql solutions)
spins up multiple threads (akin to concurrent requests)
driver specification (http, protobuf, etc)
event generator (80% read / 20% write)
key generator (incrementing integer)
payload generator (various size, binary)

Microbenchmarks are bad

benchmarks should be long running
cache warm ups
page flushes
backend specific issues

Eye candy output via R integration

Key generation

sequential ints
pareto ints (simulate hot keys)

Value generation

fixed length random bin data
random length random bin data

Benchmarking is Hard

tool and system limits
multi-variate space
designing accurate tests
dont take results out of context
everything is relative

Gotchas

file handler exhaustion
swapping thrashing (one run only developer problems after 12hrs)

Conduct your own tests, things to find out

gets vs puts vs deletes
key distribution
value size distribution

Sad face - "Too many open files"

While developing a new project that utilizes node.js as the webserver and riak as the datastore, I encountered this error (here is a relevant dump):

InnoDB: Operating system error number 24 in a file operation.

InnoDB: Error number 24 means 'Too many open files'.

InnoDB: Check InnoDB website for details

My developer platform is a beefy mac with lots of ram that lets me run many virtual machines at the same time, but I digress. Thankfully, riak dumps operations to a log file in its log folder so basically all you need to do to get that info is to check your most recent log like so:

$cat < `ls -tr log/erlang.log.* | tail -n 1`

The problem has to do with the fact that the Mac (and other *nix's) limits the maximum number of files you can have open at any one time. Unfortunately, as my testing has shown, riak likes to hog files. Especially when using the innostore backend. More or less, as of this writing (riak v0.8), the innostore backend to riak will open 64 files for every bucket that you use. Even if there is no data in the bucket. A call to "http://host:port/riak/someBucket" will instruct innostore to create the 64 files if they do not exist and open them. Each file is about 100KB with no real data in them, aka empty, in case you were wondering.

So how do I keep testing while Basho figures out a way to fix this issue? By upping my max file limit, of course! First let us check what are current limits are:

$launchctl limit

Note that that limit is a user specific limit. The kernel has its own limit which is check like so:

$sysctl -A | grep kern.maxfiles

I got most of this information from krypted.com, which has the most cohesive description in one place that I have yet to find on this subject outside of kernel-developer mailing lists. I got the userland shell info from here. Now for the eagle eyed, those two pages may have some overlapping yet conflicting information. I employed the following method, which may not be best for all users, YMMV. Be warned.

Borrowing heavily from the aforementioned posts, I simply created an /etc/launchd.conf file like so:

$sudo touch /etc/launchd.conf

Then edit the launchd.conf file to include these lines:

limit maxproc 1024 2048

limit maxfiles 2048 4096

The two columns are a soft limit and a hard limit. The distinction being... eh whatever, google is your friend. But importantly, the soft limit must be a number smaller than the hard limit and the hard limit may be "unlimited". I also added a maxproc line in there for good measure while I was at it. Of note, after reboot, sysctl and launchctl command both return the new values. If I didn't have so many windows open right now I would do more testing about that, but eh, whatever.

[UPDATE 10MAR2010]

Got an email from Jon Meredith, a Basho employee stating the following:

You can control the maximum number of file descriptors that Embedded
InnoDB will use by setting open_files in the innostore configuration.
The default value compiled in is 300 which exceeds the OS X default of
256.
In riak/etc/app.config add this to your innostore section
{innostore, [
           %% Other Inno settings
           {open_files,               100}            %% Limit to 100 open file handles
          ]}

So there you have it. A number of ways to work around the open files limit on macs.