Filter tuples on a nested value of an inner bag

Question

I'm quite a beginner with Pig Latin and I need some (basic, I think) help.

DESCRIBE shows my data as:

xmlToTuple: {(node_attr_id: int,tag: {(tag_attr_k: chararray,tag_attr_v: chararray)})}

and DUMP shows it like this:

((704398904,{(lat,-13.00583333),(lon,45.24166667)}))
((1230941976,{(place,village)}))
((1230941977,{(name,Mtsahara)}))
((1751057677,{(amenity,fast_food),(name,Brochetterie)}))
((100948360,{(amenity,ferry_terminal)}))
((362795028,{(amenity,fuel),(operator,Total)}))

I want to extract the records that have a certain value in the tag_attr_k field. For example: give me the records that contain a tag with tag_attr_k = amenity. That should be:

((1751057677,{(amenity,fast_food),(name,Brochetterie)}))
((100948360,{(amenity,ferry_terminal)}))
((362795028,{(amenity,fuel),(operator,Total)}))

Can anybody explain how to do that? I'm a bit lost…

reo katoa · Accepted Answer

You should use a map instead of a bag of tuples. The keys will be your tag_attr_k values, and the values your tag_attr_v values. One line of your data would then be, e.g.,

(1751057677,['amenity'#'fast_food', 'name'#'Brochetterie'])

Then you can check whether a key exists by trying to access it and testing whether the result is NULL; looking up a key that is not in the map returns NULL.

amenities = FILTER xml BY tag_attr#'amenity' IS NOT NULL;
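
For context, here is a minimal end-to-end sketch of the map approach. The file name nodes.tsv, its tab-separated layout, and the alias amenities are assumptions made for illustration and do not come from the question; the original data would first have to be loaded or converted so that the tags arrive as a map rather than a bag.

```pig
-- Hypothetical input file 'nodes.tsv' (an assumption, not from the question):
-- one node per line, the id and the tag map separated by a tab, e.g.
--   1751057677<TAB>[amenity#fast_food,name#Brochetterie]
--   1230941976<TAB>[place#village]
xml = LOAD 'nodes.tsv' USING PigStorage('\t')
      AS (node_attr_id:int, tag_attr:map[]);

-- Looking up a key that is absent from the map yields NULL,
-- so this keeps only the nodes that carry an 'amenity' tag.
amenities = FILTER xml BY tag_attr#'amenity' IS NOT NULL;

DUMP amenities;
```

If the sample records from the question were stored in that format, the filter would be expected to keep the nodes 1751057677, 100948360 and 362795028. Checking for a specific value instead of mere presence works the same way, e.g. tag_attr#'amenity' == 'fuel'.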

Tags:

hadoop

apache-pig

psic
