Querying collections in SQL-like style

Groovy’s groovy-ginq module provides a higher-level abstraction over collections. It can perform SQL-like queries against in-memory collections of objects. XML, JSON, YAML, etc. can be queried as well, because they can be parsed into collections. As GORM and jOOQ are already powerful enough for querying databases, we will cover collections first.

1. GINQ a.k.a. Groovy-Integrated Query

GINQ is a DSL for querying with SQL-like syntax, which consists of the following structure:

GQ, i.e. abbreviation for GINQ
|__ from
|   |__ <data_source_alias> in <data_source>
|__ [join/innerjoin/leftjoin/rightjoin/fulljoin/crossjoin]*
|   |__ <data_source_alias> in <data_source>
|   |__ on <condition> ((&& | ||) <condition>)* (NOTE: `crossjoin` does not need `on` clause)
|__ [where]
|   |__ <condition> ((&& | ||) <condition>)*
|__ [groupby]
|   |__ <expression> [as <alias>] (, <expression> [as <alias>])*
|   |__ [having]
|       |__ <condition> ((&& | ||) <condition>)*
|__ [orderby]
|   |__ <expression> [in (asc|desc)] (, <expression> [in (asc|desc)])*
|__ [limit]
|   |__ [<offset>,] <size>
|__ select
    |__ <expression> [as <alias>] (, <expression> [as <alias>])*

[] means the related clause is optional, * means zero or more times, and + means one or more times. GINQ clauses are also order-sensitive, so they must appear in the order given in the structure above.
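To make the required clause order concrete, here is a small sketch that uses several optional clauses in the sequence given above. The input lists are illustrative only. The comments trace the intermediate rows by hand.

```groovy
// Clauses appear in the order GINQ expects:
// from -> join -> where -> orderby -> limit -> select
def result = GQ {
    from n1 in [1, 2, 3, 4, 5]
    join n2 in [2, 3, 4, 5, 6] on n1 == n2   // pairs: (2,2), (3,3), (4,4), (5,5)
    where n1 % 2 == 1                        // keep odd values: (3,3), (5,5)
    orderby n1 in desc                       // (5,5), (3,3)
    limit 1                                  // first row only
    select n1, n2
}.toList()

assert [[5, 5]] == result
```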

As we can see, the simplest GINQ consists of a from clause and a select clause, which looks like:

from n in [0, 1, 2]
select n

ONLY ONE from clause is required in GINQ. GINQ also supports multiple data sources through from and the related join clauses.

As a DSL, GINQ must be wrapped in the following block to be executed:

GQ { /* GINQ CODE */ }

For example,

def numbers = [0, 1, 2]
assert [0, 1, 2] == GQ {
    from n in numbers
    select n
}.toList()

import java.util.stream.Collectors

def numbers = [0, 1, 2]
assert '0#1#2' == GQ {
    from n in numbers
    select n
}.stream()
    .map(e -> String.valueOf(e))
    .collect(Collectors.joining('#'))

It is strongly recommended to use def to declare the variable that holds the result of a GINQ execution, which is a lazy Queryable instance:

def result = GQ {
    /* GINQ CODE */
}
def stream = result.stream() // get the stream from GINQ result
def list = result.toList() // get the list from GINQ result

Currently GINQ does not work well when static type checking (STC) is enabled.

GINQ can also be written in a method marked with @GQ:

@GQ
def someGinqMethod() {
    /* GINQ CODE */
}

For example,

Mark the ginq method as a GINQ method with @GQ annotation:

@groovy.ginq.transform.GQ
def ginq(list, b, e) {
    from n in list
    where b < n && n < e
    select n
}

assert [3, 4] == ginq([1, 2, 3, 4, 5, 6], 2, 5).toList()

Specify the result type as List:

import groovy.ginq.transform.GQ

@GQ(List)
def ginq(b, e) {
    from n in [1, 2, 3, 4, 5, 6]
    where b < n && n < e
    select n
}

assert [3, 4] == ginq(2, 5)

GINQ supports many result types, e.g. List, Set, Collection, Iterable, Iterator, java.util.stream.Stream and array types.
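For example, asking for a Set gives you de-duplication without needing distinct. The sketch below assumes @GQ(Set) converts the query result the same way @GQ(List) does in the example above. The method name wordLengths is hypothetical.

```groovy
import groovy.ginq.transform.GQ

// Ask the @GQ transformation to return a Set instead of a Queryable
@GQ(Set)
def wordLengths(words) {
    from w in words
    select w.length()
}

def lengths = wordLengths(['a', 'bb', 'cc', 'ddd'])
assert lengths instanceof Set
assert lengths == ([1, 2, 3] as Set)   // the duplicate length 2 collapses
```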

Enable parallel querying:

import groovy.ginq.transform.GQ

@GQ(parallel=true)
def ginq(x) {
    from n in [1, 2, 3]
    where n < x
    select n
}

assert [1] == ginq(2).toList()

1.1. GINQ Syntax

1.1.1. Data Source

The data source for GINQ is specified by the from clause, which is equivalent to SQL’s FROM. Currently GINQ supports Iterable, Stream, array and GINQ result sets as data sources:

`Iterable` Data Source

from n in [1, 2, 3] select n

`Stream` Data Source

from n in [1, 2, 3].stream() select n

Array Data Source

from n in new int[] {1, 2, 3} select n

GINQ Result Set Data Source

def vt = GQ {from m in [1, 2, 3] select m}
assert [1, 2, 3] == GQ {
    from n in vt select n
}.toList()
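Because any Iterable qualifies, a Map can be queried through its entrySet(). java.util.Map itself does not implement Iterable. The stock map below is illustrative data.

```groovy
def stock = [apple: 3, banana: 0, cherry: 7]   // illustrative data

// The Map is not Iterable, but its entrySet() is
def inStock = GQL {
    from e in stock.entrySet()
    where e.value > 0
    select e.key
}

assert ['apple', 'cherry'] == inStock
```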

1.1.2. Projection

Column names can be renamed with the as clause:

def result = GQ {
    from n in [1, 2, 3]
    select Math.pow(n, 2) as powerOfN
}
assert [[1, 1], [4, 4], [9, 9]] == result.stream().map(r -> [r[0], r.powerOfN]).toList()

A renamed column can be referenced by its new name, e.g. r.powerOfN, or by its index, e.g. r[0].

assert [[1, 1], [2, 4], [3, 9]] == GQ {
    from v in (
        from n in [1, 2, 3]
        select n, Math.pow(n, 2) as powerOfN
    )
    select v.n, v.powerOfN
}.toList()

select P1, P2, …, Pn is a simplified syntax for select new NamedRecord(P1, P2, …, Pn) if and only if n >= 2. A NamedRecord instance is also created whenever the as clause is used. The values stored in a NamedRecord can be referenced by their names.
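The following sketch shows the three ways of reading a NamedRecord outside the GINQ block: by the source variable's name, by an alias given with as, and by index. Because records are lists, they also compare equal to plain lists.

```groovy
def records = GQ {
    from n in [1, 2, 3]
    select n, n * 10 as tenfold    // two projections -> NamedRecord
}.toList()

def second = records[1]
assert second.n == 2               // by name (taken from the variable n)
assert second.tenfold == 20        // by the alias given with `as`
assert second[0] == 2              // by index
assert second[1] == 20
assert [[1, 10], [2, 20], [3, 30]] == records   // records compare like lists
```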

Construct new objects as column values:

@groovy.transform.EqualsAndHashCode
class Person {
    String name
    Person(String name) {
        this.name = name
    }
}
def persons = [new Person('Daniel'), new Person('Paul'), new Person('Eric')]
assert persons == GQ {
    from n in ['Daniel', 'Paul', 'Eric']
    select new Person(n)
}.toList()

Distinct

distinct is equivalent to SQL’s DISTINCT

def result = GQ {
    from n in [1, 2, 2, 3, 3, 3]
    select distinct(n)
}
assert [1, 2, 3] == result.toList()

def result = GQ {
    from n in [1, 2, 2, 3, 3, 3]
    select distinct(n, n + 1)
}
assert [[1, 2], [2, 3], [3, 4]] == result.toList()

1.1.3. Filtering

where is equivalent to SQL’s WHERE

from n in [0, 1, 2, 3, 4, 5]
where n > 0 && n <= 3
select n * 2

In

from n in [0, 1, 2]
where n in [1, 2]
select n

from n in [0, 1, 2]
where n in (
    from m in [1, 2]
    select m
)
select n

import static groovy.lang.Tuple.tuple
assert [0, 1] == GQ {
    from n in [0, 1, 2]
    where tuple(n, n + 1) in (
        from m in [1, 2]
        select m - 1, m
    )
    select n
}.toList()

Not In

from n in [0, 1, 2]
where n !in [1, 2]
select n

from n in [0, 1, 2]
where n !in (
    from m in [1, 2]
    select m
)
select n

import static groovy.lang.Tuple.tuple
assert [2] == GQ {
    from n in [0, 1, 2]
    where tuple(n, n + 1) !in (
        from m in [1, 2]
        select m - 1, m
    )
    select n
}.toList()

Exists

from n in [1, 2, 3]
where (
    from m in [2, 3]
    where m == n
    select m
).exists()
select n

Not Exists

from n in [1, 2, 3]
where !(
    from m in [2, 3]
    where m == n
    select m
).exists()
select n

1.1.4. Joining

More data sources for GINQ can be specified with join clauses.

from n1 in [1, 2, 3]
join n2 in [1, 3] on n1 == n2
select n1, n2

join is preferred over innerjoin and innerhashjoin because it reads better and is smart enough to choose the appropriate concrete join (i.e. innerjoin or innerhashjoin) based on its on clause.

from n1 in [1, 2, 3]
innerjoin n2 in [1, 3] on n1 == n2
select n1, n2

from n1 in [1, 2, 3]
leftjoin n2 in [2, 3, 4] on n1 == n2
select n1, n2

from n1 in [2, 3, 4]
rightjoin n2 in [1, 2, 3] on n1 == n2
select n1, n2

from n1 in [1, 2, 3]
fulljoin n2 in [2, 3, 4] on n1 == n2
select n1, n2

from n1 in [1, 2, 3]
crossjoin n2 in [3, 4, 5]
select n1, n2
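The join examples above do not show their results. The sketch below asserts two of them. leftjoin keeps every left row and fills unmatched right values with null. crossjoin produces the full Cartesian product.

```groovy
// leftjoin keeps every left row; unmatched right values become null
def leftJoined = GQ {
    from n1 in [1, 2, 3]
    leftjoin n2 in [2, 3, 4] on n1 == n2
    select n1, n2
}.toList()
assert [[1, null], [2, 2], [3, 3]] == leftJoined

// crossjoin produces the Cartesian product: 3 x 3 = 9 rows
def pairs = GQ {
    from n1 in [1, 2, 3]
    crossjoin n2 in [3, 4, 5]
    select n1, n2
}.toList()
assert 9 == pairs.size()
```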

Hash join is especially efficient when data sources contain many objects:

from n1 in [1, 2, 3]
innerhashjoin n2 in [1, 3] on n1 == n2
select n1, n2

from n1 in [1, 2, 3]
lefthashjoin n2 in [2, 3, 4] on n1 == n2
select n1, n2

from n1 in [2, 3, 4]
righthashjoin n2 in [1, 2, 3] on n1 == n2
select n1, n2

from n1 in [1, 2, 3]
fullhashjoin n2 in [2, 3, 4] on n1 == n2
select n1, n2

Only binary expressions (==, &&) are allowed in the on clause of a hash join.
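Within that restriction, a composite key is written as equalities combined with &&. The sketch below joins illustrative map rows on an (id, tag) pair. It selects a.id and b.tag so that the two projected columns get distinct names.

```groovy
// Illustrative rows; the join key is the (id, tag) pair
def lefts  = [[id: 1, tag: 'x'], [id: 2, tag: 'y'], [id: 3, tag: 'z']]
def rights = [[id: 1, tag: 'x'], [id: 2, tag: 'q'], [id: 3, tag: 'z']]

def matched = GQ {
    from a in lefts
    innerhashjoin b in rights on a.id == b.id && a.tag == b.tag   // only == combined with &&
    select a.id, b.tag
}.toList()

assert [[1, 'x'], [3, 'z']] == matched
```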

1.1.5. Grouping

groupby is equivalent to SQL’s GROUP BY, and having is equivalent to SQL’s HAVING

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, count(n)

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
having n >= 3
select n, count(n)

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
having count() < 3
select n, count()
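To check the grouping results without relying on the order in which groups are emitted, the sketch below turns each (n, count) record into a map entry. It also asserts the having example. count() yields java.lang.Long, hence the L suffixes.

```groovy
// Map entries make the check independent of group emission order
def counts = GQ {
    from n in [1, 1, 3, 3, 6, 6, 6]
    groupby n
    select n, count()
}.toList().collectEntries { r -> [r[0], r[1]] }
assert [1: 2L, 3: 2L, 6: 3L] == counts

// having filters whole groups after grouping
def frequent = GQ {
    from n in [1, 1, 3, 3, 6, 6, 6]
    groupby n
    having count() > 2
    select n, count()
}.toList()
assert [[6, 3]] == frequent
```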

Group columns can be renamed with the as clause:

from s in ['ab', 'ac', 'bd', 'acd', 'bcd', 'bef']
groupby s.size() as length, s[0] as firstChar
select length, firstChar, max(s)

from s in ['ab', 'ac', 'bd', 'acd', 'bcd', 'bef']
groupby s.size() as length, s[0] as firstChar
having length == 3 && firstChar == 'b'
select length, firstChar, max(s)

Aggregate Functions

GINQ provides some built-in aggregate functions:

Function	Argument Type(s)	Return Type	Description
count()		java.lang.Long	number of rows, similar to `count(*)` in SQL
count(expression)	any	java.lang.Long	number of rows for which the value of expression is not `null`
min(expression)	java.lang.Comparable	same as argument type	minimum value of expression across all non-null values
max(expression)	java.lang.Comparable	same as argument type	maximum value of expression across all non-null values
sum(expression)	java.lang.Number	java.math.BigDecimal	sum of expression across all non-null values
avg(expression)	java.lang.Number	java.math.BigDecimal	the average (arithmetic mean) of all non-null values
median(expression)	java.lang.Number	java.math.BigDecimal	value such that the number of non-null values above and below it is the same ("middle" value, not necessarily same as average or mean)
stdev(expression)	java.lang.Number	java.math.BigDecimal	the statistical standard deviation of all non-null values
stdevp(expression)	java.lang.Number	java.math.BigDecimal	the statistical standard deviation for the population for all non-null values
var(expression)	java.lang.Number	java.math.BigDecimal	the statistical variance of all non-null values
varp(expression)	java.lang.Number	java.math.BigDecimal	the statistical variance for the population for all non-null values
agg(expression)	any	any	customizes the aggregation logic in expression and returns single value

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, count()

from s in ['a', 'b', 'cd', 'ef']
groupby s.size() as length
select length, min(s)

from s in ['a', 'b', 'cd', 'ef']
groupby s.size() as length
select length, max(s)

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, sum(n)

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, avg(n)

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, median(n)

from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, agg(_g.stream().map(r -> r.n).reduce(BigDecimal.ZERO, BigDecimal::add))

_g is an implicit variable for the agg aggregate function. It represents the grouped Queryable object, and each of its records (e.g. r) can reference the data source by its alias (e.g. n).

from fruit in ['Apple', 'Apricot', 'Banana', 'Cantaloupe']
groupby fruit.substring(0, 1) as firstChar
select firstChar, agg(_g.stream().map(r -> r.fruit).toList()) as fruit_list
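agg can produce any single value per group, not only numbers. The sketch below joins the strings of each group with '+'. It assumes records inside a group keep their source encounter order, and it converts the result to a map so that group order does not matter.

```groovy
def joined = GQ {
    from s in ['ab', 'cd', 'efg']
    groupby s.length() as len
    // _g is the group; each record r exposes the source alias s
    select len, agg(_g.stream().map(r -> r.s).toList().join('+'))
}.toList().collectEntries { r -> [r[0], r[1]] }

assert [2: 'ab+cd', 3: 'efg'] == joined
```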

Aggregate functions can also be applied to the whole GINQ result, i.e. no groupby clause is needed:

assert [3] == GQ {
    from n in [1, 2, 3]
    select max(n)
}.toList()

assert [[1, 3, 2, 2, 6, 3, 3, 6]] == GQ {
    from n in [1, 2, 3]
    select min(n), max(n), avg(n), median(n), sum(n), count(n), count(),
            agg(_g.stream().map(r -> r.n).reduce(BigDecimal.ZERO, BigDecimal::add))
}.toList()

assert [0.816496580927726] == GQ {
    from n in [1, 2, 3]
    select stdev(n)
}.toList()

assert [1] == GQ {
    from n in [1, 2, 3]
    select stdevp(n)
}.toList()

assert [0.6666666666666667] == GQ {
    from n in [1, 2, 3]
    select var(n)
}.toList()

assert [1] == GQ {
    from n in [1, 2, 3]
    select varp(n)
}.toList()
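The table above says most aggregates work on non-null values only. The sketch below contrasts count(n) with count() and shows sum and avg over data containing a null. It assumes, as the table states, that avg divides by the number of non-null values.

```groovy
def stats = GQ {
    from n in [1, null, 3]
    select count(n), count(), sum(n), avg(n)
}.toList()

// count(n) skips the null (2), count() counts every row (3),
// sum and avg are computed over the non-null values 1 and 3
assert [[2, 3, 4, 2]] == stats
```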

1.1.6. Sorting

orderby is equivalent to SQL’s ORDER BY

from n in [1, 5, 2, 6]
orderby n
select n

in asc is optional when sorting in ascending order

from n in [1, 5, 2, 6]
orderby n in asc
select n

from n in [1, 5, 2, 6]
orderby n in desc
select n

from s in ['a', 'b', 'ef', 'cd']
orderby s.length() in desc, s in asc
select s

from s in ['a', 'b', 'ef', 'cd']
orderby s.length() in desc, s
select s

from n in [1, null, 5, null, 2, 6]
orderby n in asc(nullslast)
select n

nullslast is equivalent to SQL’s NULLS LAST and applied by default. nullsfirst is equivalent to SQL’s NULLS FIRST.

from n in [1, null, 5, null, 2, 6]
orderby n in asc(nullsfirst)
select n

from n in [1, null, 5, null, 2, 6]
orderby n in desc(nullslast)
select n

from n in [1, null, 5, null, 2, 6]
orderby n in desc(nullsfirst)
select n
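The sketch below asserts what the null-ordering examples return. Because nullslast is the default, nulls end up last for both ascending and descending order unless nullsfirst is requested. GQL is used to get lists directly.

```groovy
def data = [1, null, 5, null, 2, 6]

def ascDefault    = GQL { from n in data orderby n select n }
def ascNullsFirst = GQL { from n in data orderby n in asc(nullsfirst) select n }
def descDefault   = GQL { from n in data orderby n in desc select n }

assert [1, 2, 5, 6, null, null] == ascDefault      // nullslast is the default
assert [null, null, 1, 2, 5, 6] == ascNullsFirst
assert [6, 5, 2, 1, null, null] == descDefault     // still nullslast
```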

1.1.7. Pagination

limit is similar to MySQL’s LIMIT clause: it can specify the offset (first argument) and the size (second argument) for pagination, or only the size when a single argument is given.

from n in [1, 2, 3, 4, 5]
limit 3
select n

from n in [1, 2, 3, 4, 5]
limit 1, 3
select n
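Offset-based pagination follows directly from the two-argument form. The sketch below uses a hypothetical helper, pageOf, that computes the offset from a 0-based page index. It assumes that limit accepts arbitrary expressions, not just literals.

```groovy
// Hypothetical helper: returns page `pageIndex` (0-based) of `pageSize` items
def pageOf(List data, int pageIndex, int pageSize) {
    GQL {
        from n in data
        orderby n
        limit pageIndex * pageSize, pageSize   // offset, size
        select n
    }
}

def data = [5, 3, 1, 4, 2]
assert [1, 2] == pageOf(data, 0, 2)
assert [3, 4] == pageOf(data, 1, 2)
assert [5] == pageOf(data, 2, 2)   // the last page may be shorter
```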

1.1.8. Nested GINQ

Nested GINQ in `from` clause

from v in (
    from n in [1, 2, 3]
    select n
)
select v

Nested GINQ in `where` clause

from n in [0, 1, 2]
where n in (
    from m in [1, 2]
    select m
)
select n

from n in [0, 1, 2]
where (
    from m in [1, 2]
    where m == n
    select m
).exists()
select n

Nested GINQ in `select` clause

assert [null, 2, 3] == GQ {
    from n in [1, 2, 3]
    select (
        from m in [2, 3, 4]
        where m == n
        limit 1
        select m
    )
}.toList()

It’s recommended to use limit 1 to restrict the sub-query result to a single value, because a TooManyValuesException will be thrown if more than one value is returned.

The as clause can be used to name the sub-query result:

assert [[1, null], [2, 2], [3, 3]] == GQ {
    from n in [1, 2, 3]
    select n, (
        from m in [2, 3, 4]
        where m == n
        select m
    ) as sqr
}.toList()

1.1.9. Window Functions

A window can be defined by partitionby, orderby, rows and range:

over(
    [partitionby <expression> (, <expression>)*]
    [orderby <expression> (, <expression>)*
       [rows <lower>, <upper> | range <lower>, <upper>]]
)

0 used as a bound of the rows and range clauses is equivalent to SQL’s CURRENT ROW; a negative bound means PRECEDING and a positive bound means FOLLOWING.
null used as the lower bound of the rows and range clauses is equivalent to SQL’s UNBOUNDED PRECEDING.
null used as the upper bound of the rows and range clauses is equivalent to SQL’s UNBOUNDED FOLLOWING.
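As an example of these bounds, rows null, 0 spans from UNBOUNDED PRECEDING to CURRENT ROW, which gives a running total. The input list is illustrative.

```groovy
// rows null, 0 = UNBOUNDED PRECEDING .. CURRENT ROW -> running total
def totals = GQ {
    from n in [1, 2, 3, 4]
    select n, (sum(n) over(orderby n rows null, 0))
}.toList()

assert [[1, 1], [2, 3], [3, 6], [4, 10]] == totals
```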

Also, GINQ provides some built-in window functions:

Function	Argument Type(s)	Return Type	Description
rowNumber()		java.lang.Long	number of the current row within its partition, counting from `0`
rank()		java.lang.Long	rank of the current row with gaps
denseRank()		java.lang.Long	rank of the current row without gaps
percentRank()		java.math.BigDecimal	relative rank of the current row: (rank - 1) / (total rows - 1)
cumeDist()		java.math.BigDecimal	relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntile(expression)	java.lang.Long	java.lang.Long	bucket index ranging from `0` to `expression - 1`, dividing the partition as equally as possible
lead(expression [, offset [, default]])	any [, java.lang.Long [, same as expression type]]	same as expression type	returns expression evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as expression). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to `1` and default to `null`
lag(expression [, offset [, default]])	any [, java.lang.Long [, same as expression type]]	same as expression type	returns expression evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as expression). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to `1` and default to `null`
firstValue(expression)	any	same type as expression	returns expression evaluated at the row that is the first row of the window frame
lastValue(expression)	any	same type as expression	returns expression evaluated at the row that is the last row of the window frame
nthValue(expression, n)	any, java.lang.Long	same type as expression	returns expression evaluated at the row that is the nth row of the window frame
count()		java.lang.Long	number of rows, similar to `count(*)` in SQL
count(expression)	any	java.lang.Long	number of rows for which the value of expression is not `null`
min(expression)	java.lang.Comparable	same as argument type	minimum value of expression across all non-null values
max(expression)	java.lang.Comparable	same as argument type	maximum value of expression across all non-null values
sum(expression)	java.lang.Number	java.math.BigDecimal	sum of expression across all non-null values
avg(expression)	java.lang.Number	java.math.BigDecimal	the average (arithmetic mean) of all non-null values
median(expression)	java.lang.Number	java.math.BigDecimal	value such that the number of non-null values above and below it is the same ("middle" value, not necessarily same as average or mean)
stdev(expression)	java.lang.Number	java.math.BigDecimal	the statistical standard deviation of all non-null values
stdevp(expression)	java.lang.Number	java.math.BigDecimal	the statistical standard deviation for the population for all non-null values
var(expression)	java.lang.Number	java.math.BigDecimal	the statistical variance of all non-null values
varp(expression)	java.lang.Number	java.math.BigDecimal	the statistical variance for the population for all non-null values
agg(expression)	any	any	INCUBATING: customizes the aggregation logic in expression and returns single value

`rowNumber`

assert [[2, 1, 1, 1], [1, 0, 0, 2], [null, 3, 3, 3], [3, 2, 2, 0]] == GQ {
    from n in [2, 1, null, 3]
    select n, (rowNumber() over(orderby n)),
              (rowNumber() over(orderby n in asc)),
              (rowNumber() over(orderby n in desc))
}.toList()

assert [[1, 0, 1, 2, 3], [2, 1, 2, 1, 2], [null, 3, 0, 3, 0], [3, 2, 3, 0, 1]] == GQ {
    from n in [1, 2, null, 3]
    select n, (rowNumber() over(orderby n in asc(nullslast))),
              (rowNumber() over(orderby n in asc(nullsfirst))),
              (rowNumber() over(orderby n in desc(nullslast))),
              (rowNumber() over(orderby n in desc(nullsfirst)))
}.toList()

The parentheses around the window function are required.

`rank`, `denseRank`, `percentRank`, `cumeDist` and `ntile`

assert [['a', 1, 1], ['b', 2, 2], ['b', 2, 2],
        ['c', 4, 3], ['c', 4, 3], ['d', 6, 4],
        ['e', 7, 5]] == GQ {
    from s in ['a', 'b', 'b', 'c', 'c', 'd', 'e']
    select s,
        (rank() over(orderby s)),
        (denseRank() over(orderby s))
}.toList()

assert [[60, 0, 0.4], [60, 0, 0.4], [80, 0.5, 0.8], [80, 0.5, 0.8], [100, 1, 1]] == GQ {
    from n in [60, 60, 80, 80, 100]
    select n,
        (percentRank() over(orderby n)),
        (cumeDist() over(orderby n))
}.toList()

assert [[1, 0], [2, 0], [3, 0],
        [4, 1], [5, 1],
        [6, 2], [7, 2],[8, 2],
        [9, 3], [10, 3]] == GQ {
    from n in 1..10
    select n, (ntile(4) over(orderby n))
}.toList()

`lead` and `lag`

assert [[2, 3], [1, 2], [3, null]] == GQ {
    from n in [2, 1, 3]
    select n, (lead(n) over(orderby n))
}.toList()

assert [[2, 3], [1, 2], [3, null]] == GQ {
    from n in [2, 1, 3]
    select n, (lead(n) over(orderby n in asc))
}.toList()

assert [['a', 'bc'], ['ab', null], ['b', 'a'], ['bc', 'ab']] == GQ {
    from s in ['a', 'ab', 'b', 'bc']
    select s, (lead(s) over(orderby s.length(), s in desc))
}.toList()

assert [['a', null], ['ab', null], ['b', 'a'], ['bc', 'ab']] == GQ {
    from s in ['a', 'ab', 'b', 'bc']
    select s, (lead(s) over(partitionby s.length() orderby s.length(), s in desc))
}.toList()

assert [[2, 1], [1, null], [3, 2]] == GQ {
    from n in [2, 1, 3]
    select n, (lag(n) over(orderby n))
}.toList()

assert [[2, 3], [1, 2], [3, null]] == GQ {
    from n in [2, 1, 3]
    select n, (lag(n) over(orderby n in desc))
}.toList()

assert [['a', null], ['b', 'a'], ['aa', null], ['bb', 'aa']] == GQ {
    from s in ['a', 'b', 'aa', 'bb']
    select s, (lag(s) over(partitionby s.length() orderby s))
}.toList()

assert [[2, 3, 1], [1, 2, null], [3, null, 2]] == GQ {
    from n in [2, 1, 3]
    select n, (lead(n) over(orderby n)), (lag(n) over(orderby n))
}.toList()

An offset other than the default of 1 can be specified:

assert [[2, null, null], [1, 3, null], [3, null, 1]] == GQ {
    from n in [2, 1, 3]
    select n, (lead(n, 2) over(orderby n)), (lag(n, 2) over(orderby n))
}.toList()

A default value, e.g. 'NONE', can be returned when the row specified by the offset falls outside the window:

assert [[2, 'NONE', 'NONE'], [1, 3, 'NONE'], [3, 'NONE', 1]] == GQ {
    from n in [2, 1, 3]
    select n, (lead(n, 2, 'NONE') over(orderby n)), (lag(n, 2, 'NONE') over(orderby n))
}.toList()
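A common use of lag is computing the difference from the previous row. In the sketch below, the subtraction is done in plain Groovy on the returned records, not inside the GINQ block, and the input list is illustrative.

```groovy
// lag(n) fetches the previous value in window order;
// the delta is computed afterwards on the returned records
def deltas = GQ {
    from n in [10, 13, 19, 20]
    select n, (lag(n) over(orderby n))
}.toList().collect { r -> r[1] == null ? null : r[0] - r[1] }

assert [null, 3, 6, 1] == deltas
```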

`firstValue`, `lastValue` and `nthValue`

assert [[2, 1], [1, 1], [3, 2]] == GQ {
    from n in [2, 1, 3]
    select n, (firstValue(n) over(orderby n rows -1, 1))
}.toList()

assert [[2, 3], [1, 2], [3, 3]] == GQ {
    from n in [2, 1, 3]
    select n, (lastValue(n) over(orderby n rows -1, 1))
}.toList()

assert [[2, 2], [1, 1], [3, 3]] == GQ {
    from n in [2, 1, 3]
    select n, (firstValue(n) over(orderby n rows 0, 1))
}.toList()

assert [[2, 1], [1, null], [3, 1]] == GQ {
    from n in [2, 1, 3]
    select n, (firstValue(n) over(orderby n rows -2, -1))
}.toList()

assert [[2, 1], [1, null], [3, 2]] == GQ {
    from n in [2, 1, 3]
    select n, (lastValue(n) over(orderby n rows -2, -1))
}.toList()

assert [[2, 3], [1, 3], [3, null]] == GQ {
    from n in [2, 1, 3]
    select n, (lastValue(n) over(orderby n rows 1, 2))
}.toList()

assert [[2, 3], [1, 2], [3, null]] == GQ {
    from n in [2, 1, 3]
    select n, (firstValue(n) over(orderby n rows 1, 2))
}.toList()

assert [[2, 2], [1, 1], [3, 3]] == GQ {
    from n in [2, 1, 3]
    select n, (lastValue(n) over(orderby n rows -1, 0))
}.toList()

assert [[2, 1], [1, 1], [3, 1]] == GQ {
    from n in [2, 1, 3]
    select n, (firstValue(n) over(orderby n rows null, 1))
}.toList()

assert [[2, 3], [1, 3], [3, 3]] == GQ {
    from n in [2, 1, 3]
    select n, (lastValue(n) over(orderby n rows -1, null))
}.toList()

assert [['a', 'a', 'b'], ['aa', 'aa', 'bb'], ['b', 'a', 'b'], ['bb', 'aa', 'bb']] == GQ {
    from s in ['a', 'aa', 'b', 'bb']
    select s, (firstValue(s) over(partitionby s.length() orderby s)),
            (lastValue(s) over(partitionby s.length() orderby s))
}.toList()

assert [[1, 1, 2, 3, null], [2, 1, 2, 3, null], [3, 1, 2, 3, null]] == GQ {
    from n in 1..3
    select n, (nthValue(n, 0) over(orderby n)),
              (nthValue(n, 1) over(orderby n)),
              (nthValue(n, 2) over(orderby n)),
              (nthValue(n, 3) over(orderby n))
}.toList()

`min`, `max`, `count`, `sum`, `avg`, `median`, `stdev`, `stdevp`, `var`, `varp` and `agg`

assert [['a', 'a', 'b'], ['b', 'a', 'b'], ['aa', 'aa', 'bb'], ['bb', 'aa', 'bb']] == GQ {
    from s in ['a', 'b', 'aa', 'bb']
    select s, (min(s) over(partitionby s.length())), (max(s) over(partitionby s.length()))
}.toList()

assert [[1, 2, 2, 2, 1, 1], [1, 2, 2, 2, 1, 1],
        [2, 2, 2, 4, 2, 2], [2, 2, 2, 4, 2, 2],
        [3, 2, 2, 6, 3, 3], [3, 2, 2, 6, 3, 3]] == GQ {
    from n in [1, 1, 2, 2, 3, 3]
    select n, (count() over(partitionby n)),
              (count(n) over(partitionby n)),
              (sum(n) over(partitionby n)),
              (avg(n) over(partitionby n)),
              (median(n) over(partitionby n))
}.toList()

assert [[2, 6, 3, 1, 3, 4], [1, 6, 3, 1, 3, 4],
        [3, 6, 3, 1, 3, 4], [null, 6, 3, 1, 3, 4]] == GQ {
    from n in [2, 1, 3, null]
    select n, (sum(n) over()),
              (max(n) over()),
              (min(n) over()),
              (count(n) over()),
              (count() over())
}.toList()

assert [[1, 1, 1], [2, 2, 3], [5, 2, 10], [5, 2, 10]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n range -2, 0)),
              (sum(n) over(orderby n range -2, 0))
}.toList()

assert [[1, 2, 3], [2, 1, 2], [5, 2, 10], [5, 2, 10]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n range 0, 1)),
              (sum(n) over(orderby n range 0, 1))
}.toList()

assert [[1, 2, 3], [2, 2, 3], [5, 2, 10], [5, 2, 10]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n range -1, 1)),
              (sum(n) over(orderby n range -1, 1))
}.toList()

assert [[1, 1, 2], [2, 0, 0], [5, 0, 0], [5, 0, 0]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n in desc range 1, 2)),
              (sum(n) over(orderby n in desc range 1, 2))
}.toList()

assert [[1, 0, 0], [2, 1, 1], [5, 0, 0], [5, 0, 0]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n in desc range -2, -1)),
              (sum(n) over(orderby n in desc range -2, -1))
}.toList()

assert [[1, 3, 12], [2, 2, 10], [5, 0, 0], [5, 0, 0]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n range 1, null)),
              (sum(n) over(orderby n range 1, null))
}.toList()

assert [[1, 2, 3], [2, 2, 3], [5, 4, 13], [5, 4, 13]] == GQ {
    from n in [1, 2, 5, 5]
    select n, (count() over(orderby n range null, 1)),
              (sum(n) over(orderby n range null, 1))
}.toList()

assert [[1, 0.816496580927726],
        [2, 0.816496580927726],
        [3, 0.816496580927726]] == GQ {
    from n in [1, 2, 3]
    select n, (stdev(n) over())
}.toList()

assert [[1, 1], [2, 1], [3, 1]] == GQ {
    from n in [1, 2, 3]
    select n, (stdevp(n) over())
}.toList()

assert [[1, 0.6666666666666667],
        [2, 0.6666666666666667],
        [3, 0.6666666666666667]] == GQ {
    from n in [1, 2, 3]
    select n, (var(n) over())
}.toList()

assert [[1, 1], [2, 1], [3, 1]] == GQ {
    from n in [1, 2, 3]
    select n, (varp(n) over())
}.toList()

assert [[1, 4], [2, 2], [3, 4]] == GQ {
    from n in [1, 2, 3]
    select n,
           (agg(_g.stream().map(r -> r.n).reduce(BigDecimal.ZERO, BigDecimal::add)) over(partitionby n % 2))
}.toList()

1.2. GINQ Tips

1.2.1. Row Number

_rn is the implicit variable representing the row number of each record in the result set. It starts with 0.

from n in [1, 2, 3]
select _rn, n
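The sketch below asserts what _rn produces for a small illustrative list. Numbering starts at 0.

```groovy
def numbered = GQ {
    from s in ['a', 'b', 'c']
    select _rn, s
}.toList()

assert [[0, 'a'], [1, 'b'], [2, 'c']] == numbered
```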

1.2.2. List Comprehension

List comprehension is an elegant way to define and create lists based on existing lists:

assert [4, 16, 36, 64, 100] == GQ {from n in 1..<11 where n % 2 == 0 select n ** 2}.toList()

assert [4, 16, 36, 64, 100] == GQ {from n in 1..<11 where n % 2 == 0 select n ** 2} as List

assert [4, 16, 36, 64, 100] == GQL {from n in 1..<11 where n % 2 == 0 select n ** 2}

GQL {…} is the abbreviation of GQ {…}.toList()

GINQ can be used directly as a list comprehension in loops:

def result = []
for (def x : GQ {from n in 1..<11 where n % 2 == 0 select n ** 2}) {
    result << x
}
assert [4, 16, 36, 64, 100] == result

1.2.3. Query & Update

This is similar to an UPDATE statement in SQL:

import groovy.transform.*
@TupleConstructor
@EqualsAndHashCode
@ToString
class Person {
    String name
    String nickname
}

def linda = new Person('Linda', null)
def david = new Person('David', null)
def persons = [new Person('Daniel', 'ShanFengXiaoZi'), linda, david]
def result = GQ {
    from p in persons
    where p.nickname == null
    select p
}.stream()
    .peek(p -> { p.nickname = 'Unknown' }) // update `nickname`
    .toList()

def expected = [new Person('Linda', 'Unknown'), new Person('David', 'Unknown')]
assert expected == result
assert ['Unknown', 'Unknown'] == [linda, david]*.nickname // ensure the original objects are updated

1.2.4. Alternative for `with` clause

GINQ does not support the with clause for now, but a temporary variable can be defined as a workaround:

def v = GQ { from n in [1, 2, 3] where n < 3 select n }
def result = GQ {
    from n in v
    where n > 1
    select n
}
assert [2] == result.toList()

1.2.5. Alternative for `case-when`

SQL’s CASE WHEN can be replaced with a switch expression:

assert ['a', 'b', 'c', 'c'] == GQ {
    from n in [1, 2, 3, 4]
    select switch (n) {
        case 1 -> 'a'
        case 2 -> 'b'
        default -> 'c'
    }
}.toList()

1.2.6. Query JSON

import groovy.json.JsonSlurper
def json = new JsonSlurper().parseText('''
    {
        "fruits": [
            {"name": "Orange", "price": 11},
            {"name": "Apple", "price": 6},
            {"name": "Banana", "price": 4},
            {"name": "Mango", "price": 29},
            {"name": "Durian", "price": 32}
        ]
    }
''')

def expected = [['Mango', 29], ['Orange', 11], ['Apple', 6], ['Banana', 4]]
assert expected == GQ {
    from f in json.fruits
    where f.price < 32
    orderby f.price in desc
    select f.name, f.price
}.toList()
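YAML works the same way once it is parsed into lists and maps. The sketch below assumes the groovy-yaml module (groovy.yaml.YamlSlurper) and its Jackson YAML dependency are on the classpath. The fruit data is illustrative.

```groovy
import groovy.yaml.YamlSlurper

def yaml = new YamlSlurper().parseText('''
fruits:
  - name: Orange
    price: 11
  - name: Apple
    price: 6
  - name: Banana
    price: 4
''')

def cheap = GQ {
    from f in yaml.fruits          // a List of Maps after parsing
    where f.price < 10
    orderby f.price
    select f.name                  // single projection -> plain values
}.toList()

assert ['Banana', 'Apple'] == cheap
```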

1.2.7. Parallel Querying

Parallel querying is especially efficient when querying large data sources. It is disabled by default, but it can be enabled explicitly:

assert [[1, 1], [2, 2], [3, 3]] == GQ(parallel: true) {
    from n1 in 1..1000
    join n2 in 1..10000 on n2 == n1
    where n1 <= 3 && n2 <= 5
    select n1, n2
}.toList()

As parallel querying uses a shared thread pool, the following code releases its resources once all GINQ statements have finished executing. It waits until all tasks running in the pool are completed.

GQ {
    shutdown
}

Once shutdown is issued, parallel querying no longer works.

The following code is equivalent to the above code; in other words, immediate is optional:

GQ {
    shutdown immediate
}

Shut down without waiting for tasks to complete:

GQ {
    shutdown abort
}

1.2.8. Customize GINQ

Advanced users can customize GINQ behaviour by specifying their own target code generator. For example, the qualified class name org.apache.groovy.ginq.provider.collection.GinqAstWalker can be specified as the target code generator to generate GINQ method calls for querying collections, which is the default behaviour of GINQ:

assert [0, 1, 2] == GQ(astWalker: 'org.apache.groovy.ginq.provider.collection.GinqAstWalker') {
    from n in [0, 1, 2]
    select n
}.toList()

1.2.9. Optimize GINQ

The GINQ optimizer is enabled by default for better performance. It transforms the GINQ AST to achieve a better execution plan. It can be disabled explicitly:

assert [[2, 2]] == GQ(optimize: false) {
    from n1 in [1, 2, 3]
    join n2 in [1, 2, 3] on n1 == n2
    where n1 > 1 &&  n2 < 3
    select n1, n2
}.toList()

1.3. GINQ Examples

1.3.1. Generate Multiplication Table

from v in (
    from a in 1..9
    join b in 1..9 on a <= b
    select a as f, b as s, "$a * $b = ${a * b}".toString() as r
)
groupby v.s
select max(v.f == 1 ? v.r : '') as v1,
       max(v.f == 2 ? v.r : '') as v2,
       max(v.f == 3 ? v.r : '') as v3,
       max(v.f == 4 ? v.r : '') as v4,
       max(v.f == 5 ? v.r : '') as v5,
       max(v.f == 6 ? v.r : '') as v6,
       max(v.f == 7 ? v.r : '') as v7,
       max(v.f == 8 ? v.r : '') as v8,
       max(v.f == 9 ? v.r : '') as v9

1.3.2. More examples

link: the latest GINQ examples

Some examples in the above link require the latest SNAPSHOT version of Groovy to run.