
How to get distinct rows from a list of lists using the Painless scripting language?

I have a Groovy script:

def results = []
def cluster = ['cluster1', 'cluster1', 'cluster1', 'cluster1', 'cluster1', 'cluster1'];
def ports =  ['4344', '4344', '4344', '4344', '4344', '4344'];
def hostname = [ 'cluster1.com','cluster1.com','cluster1.com','cluster1.com','cluster1.com','cluster1.com' ];

def heapu = ['533.6', '526.72' , '518.82' , '515.73', '525.69', '517.71'] ;
def heapm = ['1212.15', '1212.15', '1212.15', '1212.15', '1212.15', '1212.15'];
def times = ['2017-10-08T07:26:21.050Z', '2017-10-08T07:26:11.042Z', '2017-10-08T07:25:51.047Z', '2017-10-08T07:25:31.055Z', '2017-10-08T07:26:01.047Z', '2017-10-08T07:25:41.041Z'] ;

for (int i = 0; i < cluster.size(); ++i){
    def c = cluster[i]
    def p = ports[i]
    def h = hostname[i]
    def hu = heapu[i]
    def hm = heapm[i]
    def t = times[i]

    results.add(['cluster': c,
                 'port': p,
                 'hostname': h,
                 'heap_used': hu,
                 'heap_max': hm,
                 'times': t])
}
// De-duplicate once, after all rows have been added.
results = results.unique()
//    return ['results': results, 'singlex': singlex]

for (int i = 0; i < results.size(); i++){
    println(results[i])
}

The output of this script looks like:

[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:533.6, heap_max:1212.15, times:2017-10-08T07:26:21.050Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:526.72, heap_max:1212.15, times:2017-10-08T07:26:11.042Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:518.82, heap_max:1212.15, times:2017-10-08T07:25:51.047Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:515.73, heap_max:1212.15, times:2017-10-08T07:25:31.055Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:525.69, heap_max:1212.15, times:2017-10-08T07:26:01.047Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:517.71, heap_max:1212.15, times:2017-10-08T07:25:41.041Z]

As can be seen from the output, I basically have 6 near-identical lines that differ in their timestamp. heap_used also differs (heap_max does not), but that is not that important.

Since the cluster is the same for all six entries (cluster1), I consider them one output. Ideally, I would like to apply some sort of unique() function that gives me a single line of output, like the following:

[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:523.0450, heap_max:1212.15, times:2017-10-08T07:25:41.041Z]

where heap_used is the average of the 6 values, and so is heap_max. I know that in Python pandas I can do this with one command. However, I have no idea how to do it in Groovy, and I keep searching the internet.

EDIT: Unfortunately, the Groovy solution does not transfer 1:1 to Painless.

user2156115 asked Nov 16 '25 06:11


1 Answer

You can process your results list in the following way:

def grouped = results.groupBy { [it.cluster, it.port, it.hostname] }
        .entrySet()
        .collect { it -> [cluster: it.key.get(0), port: it.key.get(1), hostname: it.key.get(2)] + [
                heap_used: it.value.heap_used*.toBigDecimal().sum() / it.value.size(),
                heap_max: it.value.heap_max*.toBigDecimal().sum() / it.value.size(),
                times: it.value.times.max()
        ]}

First, we group all list elements by a triplet containing cluster, port and hostname. Then we collect all entries by combining cluster, port and hostname with heap_used: avg(heap_used), heap_max: avg(heap_max) and times: max(times).
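To make the grouping step more concrete, here is a minimal sketch of what groupBy produces before anything is averaged. The rows list and the second cluster (cluster2) are made up for illustration and are not part of the original data.

```groovy
// Hypothetical sample rows, shaped like the entries in `results`.
def rows = [
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '533.6'],
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '526.72'],
    [cluster: 'cluster2', port: '4344', hostname: 'cluster2.com', heap_used: '100.0'],
]

// groupBy returns a Map: each key is the value returned by the closure
// (here a [cluster, port, hostname] list), each value is the list of rows sharing that key.
def byKey = rows.groupBy { [it.cluster, it.port, it.hostname] }

// Print every key together with the number of rows that fell into its group.
byKey.each { key, group ->
    println "${key} -> ${group.size()} row(s)"
}
```

Each map entry then becomes one output row in the collect step, which is why six input rows for the same cluster, port and hostname end up as a single result.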

Here

it.value.heap_used*.toBigDecimal().sum()

we take a list of all heap_used values (it.value.heap_used) and then use the spread operator (*.) to apply .toBigDecimal() to each list element, because your initial values are represented as strings. To calculate the average, we just divide the sum of all heap_used values by the size of the list.
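The same averaging step in isolation, as a small sketch using the heap_used values from the question (the variable names heapUsed, numbers and average are only for illustration):

```groovy
// The heap_used strings from the question.
def heapUsed = ['533.6', '526.72', '518.82', '515.73', '525.69', '517.71']

// *. calls toBigDecimal() on every element;
// it is shorthand for heapUsed.collect { it.toBigDecimal() }.
def numbers = heapUsed*.toBigDecimal()

// Average = sum of the values divided by how many there are.
def average = numbers.sum() / numbers.size()
println average
```

Working with BigDecimal rather than double keeps the decimal digits exact, which is why the result prints as 523.045 without floating-point noise.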

Output

Printing the grouped variable will display the following result:

[[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:523.045, heap_max:1212.15, times:2017-10-08T07:26:21.050Z]]
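Since the question mentions that the Groovy solution does not transfer 1:1 to Painless, here is a rough sketch of the same aggregation written with plain loops and a map only, with no closures, groupBy or spread operator. It is still Groovy and has not been tested in Elasticsearch, so treat the idea that this style ports more easily to Painless as an assumption. The names groups, acc and key, the '|' key separator, and the use of double instead of BigDecimal are choices made for this sketch.

```groovy
// Input in the same shape as `results` from the question.
def results = [
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '533.6',  heap_max: '1212.15', times: '2017-10-08T07:26:21.050Z'],
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '526.72', heap_max: '1212.15', times: '2017-10-08T07:26:11.042Z'],
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '518.82', heap_max: '1212.15', times: '2017-10-08T07:25:51.047Z'],
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '515.73', heap_max: '1212.15', times: '2017-10-08T07:25:31.055Z'],
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '525.69', heap_max: '1212.15', times: '2017-10-08T07:26:01.047Z'],
    [cluster: 'cluster1', port: '4344', hostname: 'cluster1.com', heap_used: '517.71', heap_max: '1212.15', times: '2017-10-08T07:25:41.041Z'],
]

// One accumulator per (cluster, port, hostname); LinkedHashMap keeps first-seen order.
Map groups = new LinkedHashMap()
for (Map row : results) {
    // Hypothetical composite key; assumes '|' never appears in the three fields.
    String key = row.cluster + '|' + row.port + '|' + row.hostname
    Map acc = groups.get(key)
    if (acc == null) {
        acc = [cluster: row.cluster, port: row.port, hostname: row.hostname,
               used: 0.0d, max: 0.0d, count: 0, times: row.times]
        groups.put(key, acc)
    }
    // Running sums and row count, used for the averages below.
    acc.used += Double.parseDouble(row.heap_used)
    acc.max += Double.parseDouble(row.heap_max)
    acc.count += 1
    // Timestamps share one ISO-8601 format and zone, so string order equals time order.
    if (row.times.compareTo(acc.times) > 0) {
        acc.times = row.times
    }
}

// Turn every accumulator into one output row with averaged heap values.
List grouped = new ArrayList()
for (Map acc : groups.values()) {
    grouped.add([cluster: acc.cluster, port: acc.port, hostname: acc.hostname,
                 heap_used: acc.used / acc.count,
                 heap_max: acc.max / acc.count,
                 times: acc.times])
}
println grouped
```

Because this version sums doubles, the averages may carry tiny floating-point rounding differences compared with the BigDecimal result above.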
Szymon Stepniak answered Nov 18 '25 20:11