Abstraction and Performance with Graph

This post is by Leon Barrett, who has been collaborating with our fellow Clojure users at Prismatic. The Climate Corporation has “sprintbaticals,” two-week sabbaticals to work on something a little different, and Leon decided to contribute to Prismatic’s open-source Graph library, making it 30X faster for our use case. This post is cross-posted to their blog.

The Promise of Graph

At The Climate Corporation, we model the weather and how it affects the growth of crops. As one small aspect of that, we have a computation for incoming sunlight at a given latitude on a given day of the year. This code is run in the inner loop of some of our models; every time we need to know how plants will fare in some place on some day, we need to compute it. We want it to be fast (microseconds per call), and of course it should be: it consists of only several dozen floating-point operations. The code is not too complicated, just a bit of physics and trigonometry, but it is still complex enough that we benefit from the use of Graph. In this post (spoiler alert), we’re going to describe how we were able to speed up Graph by a factor of 30 for our use case.

As discussed in a post on Prismatic’s blog, Graph helps reduce “incidental complexity“, and thus make coding easier, by letting us specify computations and their dependencies declaratively. With Graph, some things that are normally hidden in code—that the reason this function is called before that one is because its value is needed there—are completely explicit, making it far easier to change that code. Further, we can separately give an “execution strategy” that says how to execute a graph—in parallel, lazily, etc.

To use Graph (described in more detail on Github), you use the fnk macro to declare functions with attached schemas describing their inputs and outputs. Then, you link those up in a map, like this one showing how to compute incoming sunlight at a given place and time. The details of the math don’t matter too much for this discussion; what does matter is that it has interesting dependencies but simple individual computations.

{:Ra (fnk [day-of-year lat] (solar-rad-et day-of-year lat))
 :Rs (fnk [tmax tmin {kRs 0.16} Ra] (-> (- tmax tmin)
                                        (* kRs Ra)))
 :Rso (fnk [alt Ra] (-> (* 2e-5 alt) (+ 0.75) (* Ra)))
 :Rns (fnk [{a 0.23} Rs] (-> (- 1 a) (* Rs)))
 :tmaxK (fnk [tmax] (to-kelvin tmax))
 :tminK (fnk [tmin] (to-kelvin tmin))
 :ea (fnk [tmin] (sat-vapour-pressure tmin))
 :term1 (fnk [tmaxK tminK {s 4.903e-9}]
             (-> (Math/pow tmaxK 4) (+ (Math/pow tminK 4))
                 (* s) (/ 2)))
 :term2 (fnk [ea] (-> (Math/sqrt ea) (* -0.14) (+ 0.34)))
 :term3 (fnk [Rs Rso] (-> (* 1.35 Rs) (/ Rso) (- 0.35)))
 :Rnl (fnk [term1 term2 term3] (* term1 term2 term3))
 :solar-radiation (fnk [Rns Rnl] (- Rns Rnl))}

Note that that code clearly declares what variables each computation depends on. One benefit of that is in the form of this programmatically-generated image, which reveals that structure.

Image of graph of solar radiation computation

Despite all the benefits of Graph, none of the execution methods available in the first version of Graph were fast enough for our needs; each added a 30X (yes, 30 times) overhead. It only takes about 1 microsecond to run our floating-point computations, but more than 30 microseconds to run the graph computation! Because the solar radiation functions are so simple, any overhead in executing the graph is magnified enormously.

What was the source of this overhead? In the first version of Graph, every time a fnk was called, the caller made a map (e.g. {:day-of-year 102 :lat 34.2}) and the fnk pulled out its parameters. So, every time we called a fnk, we created a map full of its parameters, like so:

(defn restricted-call
  "Call fnk f on the subset of keys its input schema
  explicitly asks for."
  [f in-map]
  (f (select-keys in-map (keys (pfnk/input-schema f)))))

Similarly, the fnk macro generates destructuring code like this:

(fn term32635
    [Rs (get map2634 :Rs nil)
     Rso (get map2634 :Rso nil)]
    (-> (* 1.35 Rs) (/ Rso) (- 0.35))))

That use of maps was a nice win for Prismatic, making it simple to store lots of intermediate values. Prismatic uses Graph in their outer loop, so their individual fnks are much more complex (and slower) than Climate’s solar radiation calculation. In that case, the overhead of two dozen map constructions doesn’t matter. But, when applied to Climate’s inner loop, those end up being much more expensive than the several dozen floating-point operations.

Code generation is your friend

Rather than do map construction and destructuring with every function call, it would be more efficient to, well, just call a function with the desired parameters. It’s easy to have the fnk macro generate a normal positional function. Then we just need to arrange to call that function with the right arguments in the right place, making something like this:

(let [Ra (Ra-fnk day-of-year lat)
      Rs (Rs-fnk tmax tmin kRs Ra)

However, notice that we’ll need to generate this code at runtime. The structure of a graph may not be known at compile time; we might want to assoc new fnks into a graph or select-keys to pull an important piece out of the graph. That means macros will not be enough; even though famously “eval is evil“, we need to eval some code. But eval isn’t evil in Clojure—there’s no heavyweight construction of a compiler, no risk of mis-interpreting strings, just the same machinery we use to write macros. So we can inspect the graph at runtime, generate a nice little let, and then eval it, rather like this:

  `(fn positional-graph#
     (let ~(vec (interleave (vals value-syms) function-calls))
       (new ~(def-graph-record g)
            ~@(->> g pfnk/output-schema keys (map value-syms))))))

(For even more speed, we define a record for the output of the graph. A record is a lot like a map with a fixed set of keys, backed by a class with a member datum for each key, so it’s as fast as the JVM allows.)

The result is that Graph can now compile our tiny arithmetic graph with not 30X, but rather 20% overhead, essentially just the cost of calling twelve different functions (and a little for checking of optional parameters). This change takes the graph execution time from 30 microseconds down to 1.2 microseconds per call. Nothing is lost, and everything is backwards-compatible—the new positional functions are stored in the fnk‘s metadata, and the earlier graph compilation methods are still available. But now if you need to exercise your floating-point unit, Graph is prepared to help.

To inspect exactly what was changed, you can see the change on Github. Add [prismatic/plumbing "0.1.0"] to your Leiningen configuration to use the new library.

P.S. To learn more about how The Climate Corporation employees get to share open-source code with other Clojure users, do math/science research, go on sprintbaticals, and take advantage of many other benefits, you can check out this page about all those things.

Posted in Engineering

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: