(Aggregation is alpha and subject to change.)
Datomic's aggregate syntax is incorporated in the :find clause:
[:find ?a (min ?b) (max ?b) ?c (sample 12 ?d) :where ...]
The list expressions are aggregate expressions. Query variables not in aggregate expressions will group the results and appear intact in the result. Thus, the above query binds ?a, ?b, ?c, and ?d, then groups by ?a and ?c, and produces a result for each aggregate expression for each group, yielding 5-tuples.
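The grouping rule can be approximated in plain Clojure. This is only a sketch of the semantics, not Datomic's implementation: the rows data and the group-min-max helper are hypothetical, and the sample aggregate is left out, so each group yields a 4-tuple rather than a 5-tuple.

```clojure
;; Hypothetical bindings of [?a ?b ?c], as a :where clause might produce them.
(def rows
  #{["x" 1 :p]
    ["x" 5 :p]
    ["y" 2 :q]})

(defn group-min-max
  "Groups rows by [?a ?c] and returns [?a (min ?b) (max ?b) ?c] tuples."
  [rows]
  (set
   (for [[[a c] group] (group-by (fn [[a _ c]] [a c]) rows)
         :let [bs (map second group)]]
     [a (apply min bs) (apply max bs) c])))

(group-min-max rows)
;; => #{["x" 1 5 :p] ["y" 2 2 :q]}
```

The non-aggregated variables ?a and ?c pass through unchanged, while ?b is reduced once per group.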
Aggregates Returning a Single Value
The aggregation functions min, max, count, count-distinct, sum, avg, median, variance, and stddev all behave as their names suggest. For example, the following query finds the highest value of :object/meanRadius in a data set about the solar system.
(d/q '[:find (max ?radius) :where [_ :object/meanRadius ?radius]] db) => [[696000.0]]
min and max support all database types (via comparators), not just numbers.
The rand aggregator selects a random element from the collection being aggregated:
(d/q '[:find (rand ?name) :where [?e :object/name ?name]] db) => [["Triton"]]
Aggregates Returning Collections
(distinct ?xs)
(min n ?xs)
(max n ?xs)
(rand n ?xs)
(sample n ?xs)
distinct returns the set of distinct values in the collection. min n and max n return the n (if available) least or greatest items. rand n selects n items with potential for duplicates, while sample n attempts to return n distinct elements, treating the collection as a population. In all cases where n is provided, fewer than n may be returned if that's all that is available.
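The difference between rand n and sample n can be sketched in plain Clojure. The rand-n and sample-n helpers below are hypothetical stand-ins, not Datomic's implementation. In particular, this rand-n always returns n items for a non-empty collection, which may differ from what Datomic does.

```clojure
(defn rand-n
  "Picks n items with replacement, so duplicates are possible."
  [n coll]
  (let [v (vec coll)]
    (if (empty? v)
      []
      (vec (repeatedly n #(rand-nth v))))))

(defn sample-n
  "Picks up to n distinct items; returns fewer if fewer are available."
  [n coll]
  (vec (take n (shuffle (distinct coll)))))

;; Five draws from two names must repeat at least one of them.
(rand-n 5 ["Io" "Triton"])

;; Only two distinct names exist, so only two can be returned.
(sample-n 5 ["Io" "Triton" "Io"])
```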
The following query returns five names from a population of solar system objects:
(d/q '[:find (sample 5 ?name) :with ?e :where [?e :object/name ?name]] db) => [[["Sun" "Io" "Triton" "Ganymede" "Mars"]]]
Control Grouping via :with
Unless otherwise specified, Datomic's datalog returns sets, and you will not see duplicate values. This is often undesirable when producing aggregates. Consider the following data set describing mythological monsters:
(def monsters [["Cerberus" 3] ["Medusa" 1] ["Cyclops" 1] ["Chimera" 1]])
and this (incorrect!) head-counting query:
(d/q '[:find (sum ?heads) :in [[_ ?heads]]] monsters) => [[4]]
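The query returns 4 rather than 6 because the result is a set: the three one-headed monsters collapse into a single value of 1, so only 3 and 1 are summed. The solution to this problem is the :with clause, which considers additional variables when forming the basis set for the query result. The :with variables are then removed, leaving a bag (not a set!) of values available for aggregation.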
(d/q '[:find (sum ?heads) :with ?monster :in [[?monster ?heads]]] monsters) => [[6]]
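The set-versus-bag distinction can be reproduced without Datomic. The following plain-Clojure sketch only mimics the semantics described here; the names heads-as-set and heads-as-bag are made up for illustration.

```clojure
(def monsters [["Cerberus" 3] ["Medusa" 1] ["Cyclops" 1] ["Chimera" 1]])

;; Without :with, only ?heads is kept, and the set collapses the three 1s.
(def heads-as-set (set (map second monsters)))   ; #{1 3}
(reduce + heads-as-set)
;; => 4

;; With :with ?monster, the basis set holds distinct [?monster ?heads] pairs.
;; Dropping ?monster afterwards leaves a bag of head counts.
(def heads-as-bag (map second (set monsters)))   ; 3 1 1 1 in some order
(reduce + heads-as-bag)
;; => 6
```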
Custom Aggregates
You may call an arbitrary Clojure function as an aggregation function as follows:
- Use the fully qualified name of the function.
- The one and only aggregated variable must be the last argument to the function.
- Other arguments to the function must be constants in the query.
Your function will be called with a partial implementation of java.util.List: only size(), iterator(), and get(i) are supported.
For example, the following query might come in handy when analyzing naming conventions in a database. It returns the modes of schema name size, using a custom modes aggregator.
(d/q '[:find (datomic.samples.query/modes ?length) :with ?e :where [?e :db/ident ?ident] [(name ?ident) ?name] [(count ?name) ?length]] db)
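A modes aggregator along these lines might look like the sketch below. It is a hypothetical stand-in, not necessarily how datomic.samples.query/modes is written. It takes the aggregated collection as its only argument and relies solely on iterating over it, which should stay within the supported subset of java.util.List.

```clojure
(defn modes
  "Returns the set of values that occur most often in coll (empty set for empty input)."
  [coll]
  (let [freqs (frequencies coll)]
    (if (empty? freqs)
      #{}
      (let [top (apply max (vals freqs))]
        (set (for [[v n] freqs :when (= n top)] v))))))

;; 3 and 5 each occur twice; 7 and 9 occur once.
(modes [3 5 5 7 3 9])
;; => #{3 5}
```

To use such a function in a query, refer to it by its fully qualified name, as the rules above require.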