/google/gmail/inbox
 /google/drive/map
 /google/apps
 /yahoo/news/cricket
 /yahoo/mail/
 /yahoo/sports
 /wiki/ind/jack
 /wiki/us/jil
I need to get the required page groups. If i search for page group starting with 'google' using hive query, I need to get the data of the first 3 rows.
/google/gmail/inbox
 /google/drive/map
 /google/apps
In this way I need to get the data based on page group.
I searched for the string using like function.
select * from table where field like '%/google/%';
It sounds like you want page group. Which may be google, but it seems like it could also be yahoo perhaps. If you want to extract the page group by search engine you can use a regular expression. You can place multiple websites in the (page1|page2|...|pageN).
Select column from table
where column rlike '.*(google|yahoo).*'
Output:
/google/gmail/inbox
/google/drive/map
/google/apps
You may want to create a new column that has the search engine name it or landing page. It seems like the first place in the path is the landing page. You can extact the landing page in this fashion:
select * from
    (Select column
    , regexp_extract('^(\\/[a-zA-Z]*\\/)',1) as landing_page
    from table) a
  where landing page in ('google','yahoo',...,'bing')
  ;
Output:
column                   new column
/google/gmail/inbox      /google/
/google/drive/map        /google/
/google/apps             /google/
/yahoo/news/cricket      /yahoo/
/yahoo/mail/             /yahoo/
/yahoo/sports            /yahoo/
/bing/meats/delisandwich /bing/
/bing/maps/delis         /bing/
If you don't want /google/ and prefer just google then do:
regexp_extract('^\\/([a-zA-Z]*)\\/',1) as landing_page
Now I am assuming that the landing page comes first in the paths you describe.
The question is a bit ambiguous, but I believe you are trying to search for the word google in the string and return the rows which contain the word google in the string.
Suppose you have the following table:
create table test (val string);
And it contain the following records:
hive> select * from test;
/google/gmail/inbox
/google/drive/map
/yahoo/mail/
You can select the rows which contains the string google using the below query:
select val from test
where instr(val, 'google') = 2;
This gives the result:
/google/gmail/inbox
/google/drive/map
instr gives the position of the string you searched for. In this case, the position of google is 2. If you are trying to get all the rows which contains google anywhere, then you can use:
select val from test
where instr(val, 'google') != 0;
You can read about the various Hive string functions from the documentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With