Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using SQL-style JOIN operator with jq

Tags:

json

left-join

jq

I'd like to use jq to do a left join on these two objects in an array by using the jq JOIN operator:

[
  {
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "Ben",
        "male"
      ],
      [
        "2",
        "Cathy",
        "female"
      ]
    ],
    "columns": [
      "id",
      "name",
      "sex"
    ]
  },
  {
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "25"
      ],
      [
        "2",
        "28"
      ]
    ],
    "columns": [
      "id",
      "age"
    ]
  }
]

to this:

{
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "Ben",
        "male",
        "25"
      ],
      [
        "2",
        "Cathy",
        "female",
        "28"
      ]
    ],
    "columns": [
      "id",
      "name",
      "sex",
      "age"
    ]
}

The jq manual references SQL-style operators like INDEX and JOIN but I can't find any examples online of how to use them. Like any join, I need to iterate over the rows in the first object and merge with rows found in the second object based on the id column.

like image 701
Ben Davis Avatar asked Sep 19 '25 01:09

Ben Davis


1 Answers

The following adopts an object-oriented approach to the task, and ignores keys other than "rows" and "columns" in the input.

There are accordingly three steps:

  1. convert the tabular (columns/rows) representation of the "database" to an OO representation -- this is accomplished by toOO;
  2. perform the update -- accomplished by left_join;
  3. convert the OO representation back to the tabular form (toDB)

The main program is thus quite short:

map(toOO) | left_join(.id) | toDB   

The helper functions, most of which are independently useful, are presented below but should appear before the main program above.

Helper Functions

# headers should be an array of strings
def objectify(headers):
  . as $in
  | reduce range(0; headers|length) as $i ({};
      . + {(headers[$i]): $in[$i]} );

def fromObject:
  . as $in
  | reduce keys_unsorted[] as $k ( {rows:[], columns:[]};
     .columns += [$k] | .rows += [$in[$k]] );

# Input: an array of object
# Output: a single object with array-valued keys formed from
# the input object by (blindly) concatenating the values
def merge:
  def allkeys: map(keys) | add | unique;
  allkeys as $allkeys
  | reduce .[] as $in ({};
     reduce $allkeys[] as $k (.;
      . + {($k): (.[$k] + [$in[$k]]) } ));

# id should be a filter such as .id
# Apply merge but do not replicate id
def merge_by(id):
  merge | (id |= .[0] ) ;

# Create an object-representation of the columns/rows database
def toOO:
  .columns as $headers
  | [.rows[] | objectify($headers)];

def toDB:
  map(fromObject)
  | merge_by(.columns);

# Input: an array of two arrays, each consisting of objects forming a database.
# "key" specifies the (possibly composite) key to be used to form the
# left-join of the two databases. 
# That is, objects are NOT added to the first database.
def left_join(key):
  def updateObject(obj):
    reduce .[] as $x ([];
      if ($x|key) == (obj|key) 
      then . + [$x + obj] else . + [$x]
      end);
  reduce .[1][] as $x (.[0]; updateObject($x) );

Output

{
  "columns": [
    "id",
    "name",
    "sex",
    "age"
  ],
  "rows": [
    [
      "1",
      "Ben",
      "male",
      "25"
    ],
    [
      "2",
      "Cathy",
      "female",
      "28"
    ]
  ]
}

Using JOIN builtin

If your jq has JOIN/4, then it can be used to implement left_join/1 as follows:

def left_join(key):
  map(INDEX(key)) 
  | [ JOIN( .[1]; .[0][]; key; add) ] ;

In theory, this should be more efficient than the definition given above.

like image 68
peak Avatar answered Sep 20 '25 16:09

peak