Here is my BigQuery
SELECT word,word_count,corpus_date FROM 
[publicdata:samples.shakespeare] 
WHERE word="the" ORDER BY word_count asc
which gives output as
    Row word    word_count corpus_date   
    1   the       57       1609  
    2   the       106      0     
    3   the       287      1609  
    4   the       353      1594  
    5   the       363      0     
    6   the       399      1592  
    7   the       421      1611  
I want the data to be group by corpus_date.I tried using a group by corpus_date
    SELECT word,word_count,corpus_date FROM 
   [publicdata:samples.shakespeare] 
    WHERE word="the" group by corpus_date 
    ORDER BY word_count asc
but it did'nt allow me to do a group by corpus_date.Any way to get data grouped by corpus_date
The GROUP BY clause allows you to group rows that have the same values for a given field. You can then perform aggregate functions on each of the groups. Grouping occurs after any selection or aggregation in the SELECT clause.
4) Google BigQuery SQL Syntax: GROUP BY Clause The GROUP BY clause is used only while using the SELECT statement. The GROUP BY clause comes after the WHERE clause in the query.
ARRAY_AGG. Returns an ARRAY of expression values. To learn more about the optional arguments in this function and how to use them, see Aggregate function calls.
You'll need to GROUP BY all non aggregated values in your query. However, since you are simply looking for a single word, you don't need to show or even GROUP BY that word in the result set (it's implicitly selected using the word="the" clause).
Therefore, if you want the total sum of word counts for the word "the" grouped by date, you can run something like this:
SELECT
  SUM(word_count) as sum_for_the,
  corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date
ORDER BY
  sum_for_the ASC;
That's not super useful on it's own... so if you want to do something more involved, such as learn which corpus the count per date comes from, SUM the counts of the word and list the corpora using a query like this:
SELECT
  SUM(word_count) AS sum_for_the, corpus, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date, corpus
ORDER BY
  sum_for_the ASC;
For listing all volumes that a word appeared in per year, I like to use the GROUP_CONCAT function. The word "the" appears in everything, so it's probably not as interesting as a less common word, like "swagger." (Which is one of many words invented by Shakespeare).
SELECT
  SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="swagger"
GROUP BY
  corpus_date ORDER BY corpus_date ASC;
Even more fun is to look at word prefixes, and GROUP BY variations of a word per volume and date:
SELECT
  word, SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word CONTAINS "swagger"
GROUP BY
  word, corpus_date
ORDER BY
  corpus_date ASC
IGNORE CASE;
Check out the BigQuery Query Language reference and the BigQuery Cookbook for more examples.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With