At the Climate Corporation, we have a great demand for storing large amounts of raster-based data, and an even greater demand to retrieve small amounts of it quickly. Mandoline is our distributed, immutable, versioned database for storing big multidimensional data. We use it for storing weather data, elevation data, satellite imagery, and other kinds of data. It is one of the core systems that we use in production.
What can Mandoline do for me?
Mandoline can store your multidimensional array data in a versionable way that doesn’t bloat your storage. When you don’t know what your query pattern is, or when you want to preserve past versions of your data, Mandoline may be the solution for you.
Some more details
- It’s a Clojure library.
- Using the Mandoline library, we have built services to both access and ingest data in a distributed fashion. We did it this way so that our scientists have easy, language-agnostic access to the data, and so that it plays well with existing scientific tools, such as netCDF libraries and applications in any language.
- When we say distributed, we mean that you can have distributed reads and writes from different machines to your dataset.
- Mandoline uses swappable backends, so the actual data can be saved to different stores. In production, it currently runs on Amazon’s DynamoDB. For testing, we can use either the SQLite or the in-memory backend.
- We want to expand the backend offerings to other databases like Cassandra and HBase.
- Mandoline takes advantage of shared structure to make immutability and versions possible.
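To make the last two points a bit more concrete, here is a minimal sketch in Python of how swappable backends and shared structure can fit together. This is not Mandoline's API or its actual implementation. Every name in it (`ChunkBackend`, `InMemoryBackend`, `VersionedStore`, `commit`, `read`) is hypothetical, and it assumes chunks are addressed by a hash of their contents. Each version is just a small index from chunk coordinates to chunk keys, so a new version copies the index and writes only the chunks that changed.

```python
import hashlib
from typing import Dict, List, Protocol, Tuple

Coord = Tuple[int, ...]  # chunk coordinates in an N-dimensional grid


class ChunkBackend(Protocol):
    """Hypothetical backend interface: any key/value store would do."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Toy backend for testing; a real one might wrap a database."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Chunks are immutable: an existing key is never overwritten.
        self.blobs.setdefault(key, data)

    def get(self, key: str) -> bytes:
        return self.blobs[key]


class VersionedStore:
    """Each version is an index from chunk coordinates to chunk keys."""

    def __init__(self, backend: ChunkBackend) -> None:
        self.backend = backend
        self.versions: List[Dict[Coord, str]] = [{}]  # version 0 is empty

    def commit(self, parent: int, changed: Dict[Coord, bytes]) -> int:
        # Copy the parent's index (cheap), not the chunk data itself.
        index = dict(self.versions[parent])
        for coord, data in changed.items():
            key = hashlib.sha256(data).hexdigest()  # content address
            self.backend.put(key, data)
            index[coord] = key
        self.versions.append(index)
        return len(self.versions) - 1

    def read(self, version: int, coord: Coord) -> bytes:
        return self.backend.get(self.versions[version][coord])


backend = InMemoryBackend()
store = VersionedStore(backend)
v1 = store.commit(0, {(0, 0): b"aaaa", (0, 1): b"bbbb"})
v2 = store.commit(v1, {(0, 1): b"BBBB"})  # only one chunk changes
```

In this sketch, `v1` stays readable after `v2` is written, and the untouched chunk at `(0, 0)` is stored once and referenced by both versions. Swapping the backend means passing in any other object with the same `put` and `get` methods.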
For more of an introduction to Mandoline (formerly known as Doc Brown), here is a video of the talk I gave at Clojure/West 2014.
Mandoline was primarily written by Brian Davis, Alice Liang, and Sebastian Galkin. Brian Davis and Steve Kim have led the push to open-source Mandoline for the general public.