NYC Dives Into Open Data

If you have been following the Government 2.0 revolution happening in America, you know there is a big push at all levels of government to make government data available to the public. Considering the glacial pace of American bureaucracy, it is quite amazing there has been any movement in this area at all, and a marvel we have come as far as we have. Sure enough, the Federal government has its own offering and, not to be outdone, my very own municipality brings us the DataMine. We are in for an interesting future as every person walking the streets gains access to applications that wring useful information out of the municipal data maze.

Admittedly, I haven't followed this space that closely, but with Twitter and blog bots you're never really too removed from the conversation. Yesterday, RWW published a timely piece on a new contest being conducted right here in my neck of the woods. The NYC BigApps Competition describes itself as:
"A software application competition to
make New York City more transparent,
accessible and accountable, and
an easier place to live, work and play."

For me, the contest is a huge incentive to attract interested parties to the data honeypot. Only good can come of this. What I am really interested in are the datasets that NYC has just gifted us. With that, let us take a look at how to get at them. All files are available via the DataMine, a fairly unassuming site nowhere near as flashy as its Federal counterpart. Simply navigate to the datasets section and, before you know it, some two hundred data sets are at your fingertips. If I had designed the site I probably would have merged raw with geo data and given users a unified search interface. No doubt direction was taken from the federal raw/geo data set breakdown. Datasets are available in many different formats, which you are responsible for knowing how to read; the DataMine provides links to appropriate reading software to point you in the right direction.

Beyond the search awkwardness, the main gripe I have with the DataMine is that it requires you to acknowledge its terms of use on every download page, whereas the feds bind you to their terms simply by downloading the data. Interestingly, the federal ToS (terms of service) are, in my humble non-legal opinion, the more permissive of the two. To wit, compare these excerpts:
Excerpt from Federal ToS:
"Data accessed through do not, and should not, include controls over its end use."

Excerpt from NYC ToS:
"The City may require a user to terminate any and all display, distribution or other use of any or all of the datasets or feeds for any reason including, without limitation, violation of these Terms of Use or other terms defined by City entities contributing data to the Site."

Clearly we are at Mayor-for-Life Michael Bloomberg's mercy. Anybody see Rome? Oh, good times... but I digress ;) All jesting aside, we have the Dear Leader to thank for this. Michael Bloomberg has indeed pushed to make this a reality and, in turn, I salute him.


So, by now you may have played around with the DataMine search system. You may have even wondered how this whole thing works behind the scenes. Let's take a closer look under the hood. As with most website deconstruction, Firebug is your friend. Just browsing the site with the Firebug inspection window open yields a number of interesting tidbits. The first things you should look for in this kind of analysis are script blocks and script includes. Such a search yields the following (non-exhaustive, excluding some Google API includes) list of scripts:

Navigation related, left hand column stuff


+ points for offering site translation.
++ points for Google Translate.
-- for an army of calls to google.language.translate().
Better still would be to pre-translate the static pages (the majority of them) and stash those in a cache somewhere instead of hitting Google every time.
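The caching idea above can be sketched in a few lines. This is purely illustrative: the translate() stub stands in for whatever Google-backed call the site actually makes (the function and key names here are my own invention), but the point is that each page/language pair hits the API at most once.

```python
# Hypothetical sketch of a translate-once cache. translate() is a
# placeholder for the real (rate-limited) Google translation call.
cache = {}

def translate(text, lang):
    # Stand-in for the real API call.
    return f"[{lang}] {text}"

def translated_page(page_id, text, lang):
    """Return a cached translation, calling the API only on a miss."""
    key = (page_id, lang)
    if key not in cache:
        cache[key] = translate(text, lang)
    return cache[key]
```

For static pages this could even run offline at publish time, so visitors never wait on the translation service at all.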


Privacy people may be interested in what's in here. I stopped looking after noticing they keep cookies around for a year.


Manages the search functionality and result display code.


Code that is called on the ToS confirmation page that passes the actual file url.

Now... the good stuff:


These two files are the meat of the entire app. All the datasets are enumerated in a trivially parseable text file that builds the 'datasets' JavaScript array used throughout the site. There are no column headers, but it is not too hard to decipher the fields by comparing them against search results. Note that there is no version information other than what appears to be an integer increment appended to the end of each file name.
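Parsing a headerless file like that is a one-liner kind of job. The sketch below assumes delimited records, one dataset per line; the delimiter, the column order, and the sample values are all guesses on my part (the real raw.js/geo.js layout has to be deciphered against search results, as noted above).

```python
# Hypothetical sketch: split a headerless dump of dataset records
# into rows of fields. Delimiter and columns are assumptions.
def parse_datasets(text, delimiter="|"):
    """Return a list of field lists, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append(line.split(delimiter))
    return rows

# Made-up sample records in a guessed agency|name|format|file layout.
sample = """DOT|Street Centerlines|SHP|centerlines_2.zip
DOITT|WiFi Hotspots|CSV|hotspots_1.zip"""

for row in parse_datasets(sample):
    print(row)
```

Once the real column meanings are pinned down, the field lists could be mapped onto named records and fed straight into a search index or feed generator.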


(The "../.." means climb two directories up from the site root and then follow the path back down; that, for example, is how the cookie script is reached.)

(I won't link directly to the js files, if you are with me so far you know how to get them.)

Where do we go from here?

For those running DataMine I make these recommendations:
  • Streamline the current download scenario by adopting a ToS acceptance model similar to that of the federal site.
  • Publish an RSS feed encapsulating what is currently available via the raw.js and geo.js files.
  • Augment raw.js/geo.js to include a publication date and replace the integer at the end of each file with that publication date (yyyymmdd format).
  • Change your translation implementation to cache translated static pages.
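Recommendation two is barely any work once the dataset files are parsed. Here is a minimal sketch of turning dataset entries into RSS 2.0 with the standard library; the dict keys (name, url, pub_date) and the sample entry are my own assumptions about how the raw.js/geo.js columns would map once deciphered.

```python
# Sketch: build an RSS 2.0 feed from parsed dataset entries.
# Field names and the sample entry are assumptions, not the real schema.
import xml.etree.ElementTree as ET

def datasets_to_rss(datasets, channel_title="NYC DataMine datasets"):
    """Return an RSS 2.0 document string, one <item> per dataset."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for ds in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ds["name"]
        ET.SubElement(item, "link").text = ds["url"]
        ET.SubElement(item, "pubDate").text = ds["pub_date"]
    return ET.tostring(rss, encoding="unicode")

feed = datasets_to_rss([
    {"name": "Street Centerlines",
     "url": "http://example.org/centerlines_20091008.zip",
     "pub_date": "Thu, 08 Oct 2009 00:00:00 GMT"},
])
print(feed)
```

Pair this with recommendation three (a real publication date in the file name) and subscribers would know the moment a dataset is refreshed.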

Overall I am quite happy that NYC has taken this big step in the right direction. For those thinking it is not enough, just stop for a moment and consider how hard taking the first step in any endeavor can be. The team that put the DataMine together and managed to wrangle data out of the bureaucracy that is NYC government should be commended. This is just the beginning. As for me, watch this space for more on what to do with all this data.