A DSL to access the Google App Engine datastore from Clojure

[Update: This post was written as a guest post for the Google Code blog. Read it here.]

After more than 10 years of bringing new products online for our customers, we now have our very own startup: TheDeadline, an intelligent to-do management system. One of the tools we used to build TheDeadline was a domain specific language, or DSL, we created for working with the App Engine datastore.
TheDeadline runs on Google App Engine and is written in Clojure (read why here). Clojure is a modern Lisp running on the Java Virtual Machine. App Engine provides a distributed key-value datastore based on Google's Bigtable system. You can use the datastore from Python and Java, as well as other languages that run on the JVM. If you are using Java, you have several options to store and access your data, including standardized object persistence mapping via the provided JDO and JPA interfaces.

Modeling your data structures for a distributed key-value store behind large-scale internet applications differs in several key aspects from ER modeling: forget normalization, optimize for read access, and so on. As a result, we believe that object-oriented persistence mapping can lead a developer to abstract object relationships incorrectly: you should not have complex object relationships in your datastore. In addition, since Clojure is a functional programming language, it makes less sense to use a persistence mechanism rooted in object-oriented practices. In Clojure you use structs (maps), not "objects", to hold your data, which means that you already have simple key-value structured data at hand, so there is no need for object persistence mapping. The most natural approach is to use the low-level datastore API directly.

One more thing to mention: The App Engine datastore is a schema-free database. This means that the schema is maintained at the application level and not at the database level. But it is still desirable to have a schema, because you still have to structure your data. What you really want is to define your data structures and let Clojure generate the code needed to store and query your data. You can do this by writing Clojure macros. A Clojure macro is a Clojure program that generates another Clojure program. Macros allow you to extend the Clojure language with your own embedded mini-languages, also known as DSLs.
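
To make the idea of a program that generates a program concrete, here is a minimal sketch of such a macro. It is not TheDeadline's defentity: the name def-constructor and its behavior are assumptions made up for illustration. It takes an entity name and a list of attribute keywords and generates a make-<name> function that returns a plain map.

```clojure
;; Hypothetical sketch, not the real defentity macro.
;; Generates a constructor function named make-<entity-name>.
(defmacro def-constructor [entity-name & attrs]
  (let [fn-name (symbol (str "make-" entity-name))
        kind    (str entity-name)]
    `(defn ~fn-name [& {:as values#}]
       ;; Start with every declared attribute set to nil, overlay the
       ;; supplied values, and record the entity kind.
       (merge (zipmap [~@attrs] (repeat nil))
              values#
              {:kind ~kind}))))

;; Expands into a (defn make-book ...) form at compile time.
(def-constructor book :key :title :pages)

(make-book :title "On Lisp" :pages 413)
;; => a map equal to {:key nil, :title "On Lisp", :pages 413, :kind "book"}
```

The macro runs at compile time and only produces code; the generated make-book is an ordinary function that could just as well have been written by hand.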

Our solution for TheDeadline consists of two parts: a data structure definition language and a query language. Let's say you want to store data about books in the App Engine datastore. The first step is to define the data structure of a book with the defentity macro. This defines a book entity with six attributes:

(defentity book
  [:key]
  [:title]
  [:author]
  [:publisher]
  [:isbn]
  [:pages])

defentity generates several functions. The most important one in this case is make-book. Let's create some books now:

(def *books*
  (list (make-book :title "On Lisp"
           :author "Paul Graham"
           :publisher "Prentice Hall"
           :isbn "978-0130305527"
           :pages 413)
    (make-book :title "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp"
           :author "Peter Norvig"
           :publisher "Morgan Kaufmann"
           :isbn "978-1558601918"
           :pages 946)
    (make-book :title "Programming Clojure"
           :author "Stuart Halloway"
           :publisher "Pragmatic Programmers"
           :isbn "978-1934356333"
           :pages 304)))

When you now evaluate the variable *books*, you'll see it contains a list of Clojure maps with your book data:

repl-prompt> *books*
({:key nil, :title "On Lisp", :author "Paul Graham", :publisher "Prentice Hall", :isbn "978-0130305527", :pages 413, :kind "book"} {:key nil, :title "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp", :author "Peter Norvig", :publisher "Morgan Kaufmann", :isbn "978-1558601918", :pages 946, :kind "book"} {:key nil, :title "Programming Clojure", :author "Stuart Halloway", :publisher "Pragmatic Programmers", :isbn "978-1934356333", :pages 304, :kind "book"})

The :key attribute is nil because we have not yet saved the data to the datastore. We have only created the books in memory. To store these books in the App Engine datastore, you just call the function store-entities!:

repl-prompt> (store-entities! *books*)
({:pages 413, :isbn "978-0130305527", :publisher "Prentice Hall", :author "Paul Graham", :title "On Lisp", :key #<Key book(7)>, :parent-key nil, :kind "book"} {:pages 946, :isbn "978-1558601918", :publisher "Morgan Kaufmann", :author "Peter Norvig", :title "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp", :key #<Key book(8)>, :parent-key nil, :kind "book"} {:pages 304, :isbn "978-1934356333", :publisher "Pragmatic Programmers", :author "Stuart Halloway", :title "Programming Clojure", :key #<Key book(9)>, :parent-key nil, :kind "book"})

store-entities! returns a list of the entities that have just been stored. The exclamation mark at the end of the function name signals that the function has a side effect. You can see that the key attributes are no longer nil. What you see here is the string representation of objects of the App Engine datastore Key class. You can now access each entity by its key. Much of the time, however, you will want to select the subset of entities that satisfies certain criteria. You can do this with another DSL: the datastore query language.

Let's say we want to select all books by the author "Peter Norvig":

repl-prompt> (select (where book ([= :author "Peter Norvig"])))
({:pages 946, :isbn "978-1558601918", :publisher "Morgan Kaufmann", :author "Peter Norvig", :title "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp", :key #<Key book(8)>, :parent-key nil, :kind "book"})

Or all books with fewer than 400 pages:

repl-prompt> (select (where book ([< :pages 400])))
({:pages 304, :isbn "978-1934356333", :publisher "Pragmatic Programmers", :author "Stuart Halloway", :title "Programming Clojure", :key #<Key book(12)>, :parent-key nil, :kind "book"})
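
As a rough illustration of how filter clauses written as plain vectors can be interpreted, here is a small sketch that evaluates them against in-memory maps. It is an assumption-laden stand-in, not the implementation of select and where: the names operators, matches? and select-in-memory are hypothetical, and the real DSL queries the datastore instead.

```clojure
;; Hypothetical sketch: interpret clauses such as [< :pages 400]
;; against in-memory maps instead of the datastore.
(def operators {'= =, '< <, '> >, '<= <=, '>= >=})

(defn matches?
  "True when entity satisfies every [op attribute value] clause.
  Assumes each attribute named in a comparison clause is present."
  [entity clauses]
  (every? (fn [[op attr value]]
            ((operators op) (get entity attr) value))
          clauses))

(defn select-in-memory
  "Returns the entities that satisfy all clauses."
  [entities clauses]
  (filter #(matches? % clauses) entities))

(def sample-books
  [{:title "On Lisp" :author "Paul Graham" :pages 413}
   {:title "Programming Clojure" :author "Stuart Halloway" :pages 304}])

(map :title (select-in-memory sample-books '[[< :pages 400]]))
;; => ("Programming Clojure")
```

Because the clauses are ordinary data, they can be built, combined and inspected with the usual sequence functions before they are turned into a query.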

These mini-languages are very simple to use, and you don't need to know anything about the datastore internals or the Java datastore API! You don't even need to know what an entity is, because you just work with Clojure maps. You can map, reduce and filter these maps just like any other built-in Clojure data structure, so the integration with the language is very natural.
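
For example, once query results are plain maps, the standard sequence functions apply directly. The sketch below uses hand-written maps shaped like the results above rather than a live query; the name loaded-books is only illustrative.

```clojure
;; Maps shaped like the query results shown earlier (keys omitted).
(def loaded-books
  [{:title "On Lisp" :author "Paul Graham" :pages 413 :kind "book"}
   {:title "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp"
    :author "Peter Norvig" :pages 946 :kind "book"}
   {:title "Programming Clojure" :author "Stuart Halloway" :pages 304 :kind "book"}])

;; map: pull one attribute out of every book.
(map :author loaded-books)
;; => ("Paul Graham" "Peter Norvig" "Stuart Halloway")

;; filter: keep only the books with more than 400 pages.
(map :author (filter #(> (:pages %) 400) loaded-books))
;; => ("Paul Graham" "Peter Norvig")

;; reduce: total number of pages across all books.
(reduce + (map :pages loaded-books))
;; => 1663
```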

There is more: At some point you'll need functions that convert complex types back and forth between your application and the datastore, because the datastore supports only a fixed set of data types. If you want to store unsupported types, you have to take care of the serialization and deserialization yourself (check the supported types here).

Let's construct a simple example: You want to store a Boolean value for whether the book is out of print, but the input data in your program is a string "yes" or "no". "yes" should be translated to true and "no" should be translated to false before the entity is saved to the datastore. When the entity is loaded from the datastore, the Boolean values should be translated back to the string values again. To do this, we add the attribute outofprint to our book definition and we define a :pre-save and a :post-load anonymous function to convert between string and Boolean values. They are called with the current value of the attribute as their only parameter (the placeholder for this is the '%' sign). The return value is set as the new attribute value:

(defentity book
  [:key]
  [:title]
  [:author]
  [:publisher]
  [:isbn]
  [:pages]
  [:outofprint
   :pre-save #(= % "yes")
   :post-load #(if %
                 "yes"
                 "no")])


Storing a book now looks like this, with the outofprint value supplied as a string:

(store-entities! (make-book :title "On Lisp"
                             :author "Paul Graham"
                             :publisher "Prentice Hall"
                             :isbn "978-0130305527"
                             :pages 413
                             :outofprint "yes"))

Now let's load the book from the datastore again. The parameter is still a string:


repl-prompt> (select (where book ([= :author "Paul Graham"])))
({:outofprint "yes", :pages 413, :isbn "978-0130305527", :publisher "Prentice Hall", :author "Paul Graham", :title "On Lisp", :key #<Key book(21)>, :parent-key nil, :kind "book"})


But when you examine the entity under the hood, you see that the value of outofprint is stored as a Boolean:


#<Entity <Entity [book(21)]:
    author = Paul Graham
    title = On Lisp
    pages = 413
    isbn = 978-0130305527
    outofprint = true
    publisher = Prentice Hall
>
>


For every attribute, our mini-language automatically executes the :pre-save function before it saves data to the datastore and the :post-load function after it loads data from the datastore. You can use these functions for type conversions, for calculations or other transformations of the data, and of course for validation.
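
One way to picture this mechanism is a table of hook functions per attribute that is applied to each entity map. The following is only a sketch under that assumption: book-hooks and apply-hooks are hypothetical names, and TheDeadline's actual code may organize this differently. It also shows a :pre-save hook used for validation.

```clojure
;; Hypothetical sketch of per-attribute hooks, not the real DSL code.
(def book-hooks
  {:outofprint {:pre-save  #(= % "yes")
                :post-load #(if % "yes" "no")}
   ;; Validation example: reject non-positive page counts before saving.
   :pages      {:pre-save (fn [n]
                            (when-not (pos? n)
                              (throw (IllegalArgumentException.
                                      "pages must be positive")))
                            n)}})

(defn apply-hooks
  "Runs the hook for the given phase (:pre-save or :post-load) on every
  attribute that defines one; other attributes pass through unchanged."
  [hooks phase entity]
  (reduce-kv (fn [result attr value]
               (let [hook (get-in hooks [attr phase] identity)]
                 (assoc result attr (hook value))))
             {}
             entity))

(def saved
  (apply-hooks book-hooks :pre-save
               {:title "On Lisp" :pages 413 :outofprint "yes"}))
;; saved holds :outofprint true, as it would be stored.

(apply-hooks book-hooks :post-load saved)
;; :outofprint is converted back to "yes" for the application.
```

Keeping the hooks in a plain map means the same apply-hooks function serves both directions; only the phase keyword changes.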



This is all you need to know to write code that accesses the datastore. If you are new to Clojure (or any other Lisp), then you might now get a feeling for why Paul Graham once said: "Lisp's power is multiplied by the fact that your competitors don't get it." Use simple data structures. Create powerful functional abstractions. Write less code. If you want to give our mini-languages a try, you can find the code here. You will also find support for :pre-save and :post-load functions at the entity level, transactions with automatic retries, queries by key, queries that return only keys, automatic resolution of parent/child relationships between entities, and automatic resolution of entities from attributes that contain keys.


If you want to get started with Clojure on Google App Engine, I can recommend this post. You'll need this post to set up an interactive programming environment.


If you are curious now and would like to try out TheDeadline, you can sign up here. For more Clojure and App Engine related posts, you can follow our blog H.W.A. If you're attending Google I/O, you can chat with us about Google App Engine, Clojure and the Universe in the Developer Sandbox on May 19 and 20.

Happy hacking!