Aggregation

(Aggregation is alpha and subject to change.)

Datomic's aggregate syntax is incorporated in the :find clause:

[:find ?a (min ?b) (max ?b) ?c (sample 12 ?d) 
 :where ...]

The parenthesized list expressions in the :find clause are aggregate expressions. Query variables that are not inside an aggregate expression group the results and appear intact in the result. Thus, the above query binds ?a, ?b, ?c, and ?d, then groups by ?a and ?c, and produces a result for each aggregate expression for each group, yielding 5-tuples.
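The grouping rule can be modeled in plain Clojure, without a database. The following is only a sketch of the semantics for a smaller query of the shape [:find ?a (min ?b) (max ?b)]; the tuples data and the min-max-by-group function are hypothetical names introduced for illustration and are not part of Datomic.

```clojure
;; Hypothetical bound tuples [?a ?b], standing in for the results of a :where clause.
(def tuples [[:x 3] [:x 7] [:y 5]])

(defn min-max-by-group
  "Groups tuples by their first element (?a) and computes the min and
  max of the second element (?b) within each group."
  [tuples]
  (set (for [[a rows] (group-by first tuples)
             :let [bs (map second rows)]]
         [a (apply min bs) (apply max bs)])))

(min-max-by-group tuples)
;; => #{[:x 3 7] [:y 5 5]}
```

Each group contributes one tuple: the grouping variable followed by one value per aggregate expression.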

Aggregates Returning a Single Value

The aggregation functions min, max, count, count-distinct, sum, avg, median, variance, and stddev all behave as their names suggest. For example, the following query finds the highest value of :object/meanRadius in a data set about the solar system.

(d/q '[:find (max ?radius)
       :where [_ :object/meanRadius ?radius]]
     db)
=> [[696000.0]]

min and max support all database types (via comparators), not just numbers.

The rand aggregator selects a random element from the collection being aggregated:

(d/q '[:find (rand ?name)
       :where [?e :object/name ?name]]
     db)
=> [["Triton"]]

 

 

Aggregates Returning Collections

  • (distinct ?xs)
  • (min n ?xs)
  • (max n ?xs)
  • (rand n ?xs)
  • (sample n ?xs)

distinct returns the set of distinct values in the collection. (min n ?xs) and (max n ?xs) return the n least and the n greatest items, respectively. (rand n ?xs) selects n items with potential for duplicates, while (sample n ?xs) attempts to return n distinct elements, treating the collection as a population. In all cases where n is provided, fewer than n items may be returned if that is all that is available.

The following query returns five names from a population of solar system objects:

 

(d/q '[:find (sample 5 ?name)
       :with ?e
       :where [?e :object/name ?name]]
     db)
=> [[["Sun" "Io" "Triton" "Ganymede" "Mars"]]]
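As a rough mental model, the collection aggregates resemble the plain-Clojure functions below. This is a sketch under assumptions, not Datomic's implementation: the function names are hypothetical, the ordering of the returned items is a choice made here, and sample-n reflects one reading of "distinct elements" (distinct values).

```clojure
;; Hypothetical input collection standing in for the aggregated values.
(def xs [5 3 9 3 1])

(defn distinct-agg [coll]
  (set coll))

(defn min-n [n coll]
  ;; the n least items, or fewer if coll is smaller than n
  (vec (take n (sort coll))))

(defn max-n [n coll]
  ;; the n greatest items, or fewer if coll is smaller than n
  (vec (take n (sort #(compare %2 %1) coll))))

(defn rand-n [n coll]
  ;; n random picks; the same item may be picked more than once
  (let [v (vec coll)]
    (if (empty? v)
      []
      (vec (repeatedly n #(rand-nth v))))))

(defn sample-n [n coll]
  ;; up to n picks with no value repeated
  (vec (take n (shuffle (distinct coll)))))

(min-n 2 xs)      ;; => [1 3]
(max-n 2 xs)      ;; => [9 5]
(distinct-agg xs) ;; => #{1 3 5 9}
```

Note that (sample-n 10 xs) can return at most four items here, because xs contains only four distinct values.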

 

 

Control Grouping via :with

Unless otherwise specified, Datomic's datalog returns sets, and you will not see duplicate values. This is often undesirable when producing aggregates. Consider the following data set describing mythological monsters:

(def monsters [["Cerberus" 3]
               ["Medusa" 1]
               ["Cyclops" 1]
               ["Chimera" 1]])

and this (incorrect!) head-counting query:

(d/q '[:find (sum ?heads)
       :in [[_ ?heads]]]
     monsters)
=> [[4]]

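The query returns 4 rather than 6 because ?heads is the only variable in the result: the three one-headed monsters collapse into a single value, so the sum is taken over the set #{1 3}. The solution to this problem is the :with clause, which considers additional variables when forming the basis set for the query result. The :with variables are then removed, leaving a bag (not a set!) of values available for aggregation.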

(d/q '[:find (sum ?heads)
       :with ?monster
       :in [[?monster ?heads]]]
     monsters)
=> [[6]]
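The difference between the two results can be reproduced in plain Clojure. This is only an illustration of set versus bag semantics, not how Datomic evaluates the query; the two function names are hypothetical.

```clojure
(def monsters [["Cerberus" 3]
               ["Medusa" 1]
               ["Cyclops" 1]
               ["Chimera" 1]])

(defn sum-heads-without-with
  "Models the query without :with: the ?heads values form a set first."
  [rows]
  (reduce + (set (map second rows))))

(defn sum-heads-with-monster
  "Models :with ?monster: rows are distinct on [?monster ?heads], then
  ?monster is dropped, leaving a bag of ?heads values."
  [rows]
  (->> (set rows)
       (map second)
       (reduce +)))

(sum-heads-without-with monsters) ;; => 4
(sum-heads-with-monster monsters) ;; => 6
```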

 

 

Custom Aggregates

You may call an arbitrary Clojure function as an aggregation function as follows:

  • Use the fully qualified name of the function.
  • The one and only aggregated variable must be the last argument to the function.
  • Other arguments to the function must be constants in the query.

Your function will be called with a partial implementation of java.util.List: only size(), iterator(), and get(i) are supported.

For example, the following query might come in handy when analyzing naming conventions in a database. It returns the modes of schema name size, using a custom modes aggregator.

(d/q '[:find (datomic.samples.query/modes ?length)
       :with ?e
       :where
       [?e :db/ident ?ident]
       [(name ?ident) ?name]
       [(count ?name) ?length]]
     db)
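The body of the modes aggregator is not shown here. One possible shape for such a function is sketched below; the namespace example.aggregates is a hypothetical name, and this is not necessarily how datomic.samples.query/modes is written. It relies only on iterating over its argument, which should be compatible with the partial java.util.List described above.

```clojure
(ns example.aggregates)

(defn modes
  "Returns the set of values that occur most often in coll,
  or an empty set if coll is empty."
  [coll]
  (let [freqs (frequencies coll)]
    (if (empty? freqs)
      #{}
      (let [top (apply max (vals freqs))]
        (set (for [[v n] freqs
                   :when (= n top)]
               v))))))

(modes [1 1 2 2 3])
;; => #{1 2}
```

Under these assumptions, the function would be referenced in a :find clause by its fully qualified name, for example (example.aggregates/modes ?length).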

 
