Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to write Bigquery in new schema with replacing event_dim in old schema from Firebase analytics?

The old BigQuery Export schema wise script is running.It is given below. But I want to replicate this code and write it according to new export schema as we Bigquery schema has been changed. Please help becasue in new BigQuery Export schema I don't find any other corresponding record against event_dim (event_dim is in according to old BigQuery Export schema).

Here is link for BigQuery Export schema: click here

 SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                event.name,
                params.value.int_value engagement_time
        FROM `xxx.app_events_*`,
        UNNEST(event_dim) as event,
        UNNEST(event.params) as params,
        UNNEST(user_dim.user_properties) as user_params
        where (event.name = "user_engagement" and params.key = "engagement_time_msec")
        and
                (user_params.key = "access" and user_params.value.value.string_value = "true") and
                PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"

Tried the query below but what I want app_instance, min_time, max_time, event_name, engagement_time at one SELECT statement. And as I am using 'group by', I am not able to get all those (app_instance, min_time, max_time, event_name, engagement_time) at a time. Please help.

 SELECT user_pseudo_id
     , MIN(event_timestamp) AS min_time
      ,MAX(event_timestamp) AS max_time
    FROM `xxx.app_events_*` as T,
       T.event_params,
       T.user_properties,
       T.event_timestamp
    where (event_name = "user_engagement" and event_params.key = "engagement_time_msec")
    and
            (user_properties.key = "access" and user_properties.value.string_value = "true") and
            PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
            PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
    group by 1
like image 839
subhendu rana Avatar asked Sep 18 '25 14:09

subhendu rana


1 Answers

As I believe my previous answer provides some general ideas for the Community, I will keep it and write a new one in order to be more specific for your use case.

First of all, I would like to clarify that in order to adapt a query (just like you are asking us to do), one needs to have a clear understanding of the statement, objective of the query, expected results and data to play with. As this is not the case, it is difficult to work with it, even more considering that there are some functionalities that are not clear from the query, for example: in order to obtain the "min_time" and "max_time" for each event, you are taking the min and max value across multiple events, which does not make clear sense to me (it may, depending on your use case, reason why I suggested that it would be better if you could provide more details or work more on the query yourself). Moreover, the new schema "flattens" events, in such a way that each event is written in a different line (you can easily check this by running a SELECT COUNT(*) FROM 'table_with_old_schema' and compare it to SELECT COUNT(*) FROM 'table_with_new_schema'; you will see that the second one has many more rows), so your query does not make sense anymore, because events are not grouped anymore, and then you cannot pick a minimum and maximum between nested fields.

This being clarified, and having removed some fields that cannot be directly adapted to the new schema (you may be able to adapt this from your side, but this would require some additional effort and understanding of what did those fields mean to you in your previous query), here there are two queries that provide exactly the same results, when run against the same table, with different schema:

Query against a table with the old schema:

SELECT
  user_dim.app_info.app_instance_id,
  event.name,
  params.value.int_value engagement_time
FROM
  `DATASET.app_events_YYYYMMDD`,
  UNNEST(event_dim) AS event,
  UNNEST(event.params) AS params,
  UNNEST(user_dim.user_properties) AS user_params
WHERE
  (event.name = "user_engagement"
    AND params.key = "engagement_time_msec")
  AND (user_params.key = "plays_quickplay"
    AND user_params.value.value.string_value = "true")
ORDER BY 1, 2, 3

Query against the same table, with the new schema:

SELECT
  user_pseudo_id,
  event_name,
  params.value.int_value engagement_time
FROM
  `DATASET.events_YYYYMMDD`,
  UNNEST(event_params) AS params,
  UNNEST(user_properties) AS user_params
WHERE
  (event_name = "user_engagement"
    AND params.key = "engagement_time_msec")
  AND (user_params.key = "plays_quickplay"
    AND user_params.value.string_value = "true")
ORDER BY 1, 2, 3

Again, for this I am using the following table from the public dataset: firebase-public-project.com_firebase_demo_ANDROID.app_events_YYYYMMDD, so I had to change some filters and remove some others in order for it to retrieve sensible results against that table. Therefore, feel free to modify or add the ones you need in order for it to be useful for your use case.

like image 62
dsesto Avatar answered Sep 20 '25 05:09

dsesto