*This post and work on HipHip are done in part by Leon Barrett, who spent a ‘sprintbatical’ visiting fellow Clojure shop Prismatic, and Emil Flakk. This post is cross-posted to Prismatic’s blog.*

The Climate Corporation and Prismatic share the unique challenge of developing new algorithms and making them work at a large scale. This combination of research and engineering takes many cycles of algorithm and model design, including rapid implementation to test on lots of real data. In contrast to most engineering prototyping, such numeric modeling demands high performance out of the gate, since correctness cannot be judged on just a little bit of data. In this post, we introduce our open-source array processing library HipHip, which combines Clojure’s expressiveness with the fastest math Java has to offer.

# Computational Functional Programming

As outlined in several past posts, Climate and Prismatic love Clojure (and functional programming generally) because it allows us to quickly build performant and reusable tools. However, many of the data abstractions that make Clojure code so reusable don’t yield fast numerical computation. For instance, consider the dot product operation, which is the core performance bottleneck in most machine learning algorithms. Using Clojure’s built-in sequence operations, it would be natural to write the dense dot product as:

```
;; 1 ms for 10k doubles: 20 MFlops
(defn dot-product [^doubles ws ^doubles xs]
  (reduce + (map * ws xs)))
```

The above code succinctly expresses the idea that we form the sequence of element-wise products between `ws` and `xs`, and then we sum those entries. But its performance is a disaster, especially if you’re considering real-world machine learning use where each prediction could require thousands of dot products. The core of the slowness arises from the fact that intermediate operations produce sequences of ‘boxed’ Java Double objects. All arithmetic operations on these boxed objects are significantly slower than on their primitive counterparts. This implementation also creates an unnecessary intermediate sequence (the result of the `map`), rather than just summing the numbers directly.
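To see the boxing concretely: every element of the intermediate sequence produced by `map *` is a `java.lang.Double` object, not a primitive double. A quick REPL check, using nothing beyond `clojure.core`:

```
;; Each element of the lazy seq from `map *` is a boxed Double:
(class (first (map * (double-array [1.0 2.0]) (double-array [3.0 4.0]))))
;; => java.lang.Double
```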

# Clojure’s built-in array macros vs. plain-old Java

Clojure comes with built-in array processing macros (`amap` and `areduce`) which extend Clojure sequence operations to arrays and allow for primitive arithmetic. Here’s what dot-product looks like using those macros:

```
;; 8.5 us for 10k doubles: 2.3 GFlops
;; (11 us with *unchecked-math* false)
(defn dot-product [^doubles ws ^doubles xs]
  (areduce xs i ret 0.0
           (+ ret (* (aget xs i)
                     (aget ws i)))))
```

The performance here is acceptable, but the code itself is pretty convoluted. Arguably, this isn’t even much of an improvement over the Java version:

```
// 8.5 us for 10k doubles: 2.3 GFlops
public double dotProduct(double[] ws, double[] xs) {
  double result = 0.0;
  for (int i = 0; i < ws.length; ++i) {
    result += ws[i] * xs[i];
  }
  return result;
}
```

# Why not just use Java for math bottlenecks?

Since the dot product is a crucial bottleneck for machine learning, why not just suck it up and use the Java version? Clojure makes Java-interop easy so that you can take the core bottleneck of your system and make it fast.

The problem is that for computational systems, there typically isn’t a single bottleneck. Instead, any function that performs array operations (whether `double`, `long`, `float`, etc.) over each item of data can cause an unacceptable slowdown, even during the research prototyping stage. Since these sorts of array operations are the bulk of our computational code, our research engineers would mostly be stuck writing Java. We need the expressiveness of Clojure **and** the speed of Java primitives.

# Introducing the HipHip array processing library

We’re happy to announce HipHip, a Clojure array-processing library. Here’s what a dot product looks like in HipHip:

```
;; 8.5 us for 10k doubles: 2.3 GFlops
(require '[hiphip.double :as dbl])
(defn dot-product [ws xs]
  (dbl/asum [x xs w ws] (* x w)))
```

All the arithmetic operations are performed on primitive doubles, which lets the speed match Java. We get a clean, functional representation of the operation, at Java performance, without any worrying about type hints.

HipHip provides namespaces for working on arrays of the four main primitive math types (`double`, `float`, `long`, and `int`) without type hints, which include:

- Versions of `clojure.core` array fns that do not require type hints:

```
(dbl/aset xs i (* 2.0 (dbl/aget xs (inc i))))
```

- A family of macros for efficiently iterating over array(s) with a common binding syntax, including `amake`, `doarr`, `afill!`, and our own versions of `amap` and `areduce`:

```
(defn add-in-place!
  "Modify xs, adding to each element the corresponding element of ys"
  [xs ys]
  (dbl/afill! [x xs y ys] (+ x y)))

(defn pointwise-product
  "Produce a new double array of the element-wise product of xs and ys"
  [xs ys]
  (dbl/amap [x xs y ys] (* x y)))
```

- Common ‘mathy’ operations like summing over an array and selecting a maximal element (or many):

```
(defn standard-deviation [xs]
  (let [mean (dbl/amean xs)
        mean-sq (/ (dbl/asum [x xs] (* x x))
                   (dbl/alength xs))]
    (Math/sqrt (- mean-sq (* mean mean)))))

(= 1 (dbl/amax-index (double-array [1.0 3.0 2.0])))

(defn max-normalize [xs]
  (let [m (dbl/amax xs)]
    (dbl/afill! [x xs] (/ x m))))
```

You might find some of these operations useful, even if your data isn’t already in arrays. For example, the following function can find the top 1000 elements of vector `xs` by `score-fn` about 20 times faster than `(take 1000 (reverse (sort-by score-fn xs)))` on 100k elements.

```
(defn top-k
  "Return the top k elements of xs according to score-fn"
  [k score-fn xs]
  (let [n (count xs)
        scores (dbl/amake [i n] (score-fn (nth xs i)))
        ^ints idxs (dbl/amax-indices scores k)]
    (seq (hiphip.array/amake Object [i k]
                             (nth xs (aget idxs (- n i 1)))))))
```

HipHip also provides a generic namespace `hiphip.array` with variants of the iteration macros that work on all (hinted) array types, including mixing-and-matching a variety of array types in the same operation.

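The generic form can be sketched from the `hiphip.array/amake` call in `top-k` above, where the element type is passed explicitly ahead of the binding. The function name `index-pairs` below is ours, and the readme is the authoritative source for the full generic API; this is just a sketch of the same type-then-bindings pattern:

```
(require 'hiphip.array)

;; Same pattern as the Object amake in top-k above: build an Object
;; array pairing each element of v with its index.
(defn index-pairs [v]
  (seq (hiphip.array/amake Object [i (count v)]
                           [i (nth v i)])))
```

So `(index-pairs [:a :b])` yields `([0 :a] [1 :b])`.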
HipHip uses Criterium for benchmarking and runs benchmarks as tests, so we can be pretty confident that most of our double array operations run 0-50% slower than Java (we’re not quite there for other array types yet; see the ‘Known Issues’ section of the readme for details). The readme also describes a critical option for fast array math if you’re running under Leiningen.
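On that last point, one likely candidate is Leiningen’s default JVM options: Leiningen passes `-XX:+TieredStopAtLevel=1` by default, which limits JIT optimization of hot numeric loops. A sketch of the usual workaround in `project.clj` (check the readme for the authoritative recommendation):

```
;; project.clj (sketch): ^:replace clears Leiningen's default
;; :jvm-opts, including -XX:+TieredStopAtLevel=1, so the JIT can
;; fully optimize hot numeric loops.
(defproject example-project "0.1.0-SNAPSHOT"
  :jvm-opts ^:replace [])
```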

Check it out and let us know what you think in the Hacker News thread.

Tried running dot-product and ended up with the following error:

```
Exception in thread "main" java.lang.ClassCastException: clojure.lang.PersistentVector cannot be cast to [D
    at hiphip.core$dot_product.invoke(core.clj:6)
    at hiphip.core$_main.doInvoke(core.clj:14)
    at clojure.lang.RestFn.invoke(RestFn.java:397)
    at clojure.lang.Var.invoke(Var.java:411)
```

Yes, filou, these macros are designed to work on primitive arrays, made for example by `float-array` or `hiphip.float/amake`. For speed, you would want primitive arrays anyway because they avoid the cost of boxing. They are also mutable, which can be important for writing really fast inner loops. So, this will **not** work:

`(hiphip.double/asum [1 2 3 4])`

but this will work:

`(hiphip.double/asum (double-array [1 2 3 4]))`