I'd like to add events to BigQuery so I can view them as graphs using services like ModeAnalytics.
I'm not sure I fully grasp the concept of BigQuery, and maybe I'm making wrong assumptions about it, but what I'd like to use it for is to have a (kind of) "events" table and add events to it by event name.
This could be like "new account", "user search", etc.
But all the documentation I could find is about loading large amounts of data from other tables, a CSV file, a JSON file, etc.
I wasn't able to find any documentation about simply making something like a POST request with data to add to the "events" table.
How can I do that?
I'd suggest reading up on BigQuery a little more; you don't seem to have fully grasped what it actually is yet. Try here for starters. Think "massively scalable data analytics using SQL, powered by Google's infrastructure". After that, have a look at its streaming API functionality, which allows you to insert "events" using HTTP POST requests via its RESTful API.
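To make that concrete, here's a minimal sketch of a streaming insert using the google-cloud-bigquery Python client. The project, dataset, table and field names are illustrative assumptions; it assumes the table already exists with a matching schema.

```python
# Minimal sketch of a BigQuery streaming insert (assumed table and fields).
from datetime import datetime, timezone

from google.cloud import bigquery

client = bigquery.Client()  # credentials picked up from the environment

rows = [
    {
        "event_name": "new account",
        "user_id": "12345",
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
]

# Under the hood this is an HTTP POST to the tabledata.insertAll REST endpoint.
errors = client.insert_rows_json("my-project.analytics.events", rows)
if errors:
    print("Insert failed:", errors)
```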
Once you get your head around that, there are a few ways to stream the data in. You could write to BigQuery directly from your application, but that's not a great idea: you'd have no buffer for retries and you'd couple your app to BigQuery's availability and quotas. As mentioned in varun's answer, you could indeed hook up Pub/Sub & Dataflow, but I feel that might be too much for what you need.
So, a common pattern/solution is to use a queue, have something consume the events off that queue asynchronously, and push them to BigQuery in batches. This is how we do it for many of our projects and it works wonderfully.
You don't have to use GAE to achieve this. You can use any type of queue, or simply build your own custom one. For example, we run some of our queues on AWS using SQS (our events originate from Apache nodes running on EC2), and have an application which then consumes the events off SQS asynchronously.
We also use Redis to achieve the same effect. This runs in production, is scalable, and pushes around 50K events a minute to BigQuery for us. One of our engineers wrote a blog post about it here.
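As a rough sketch of that worker loop (not our actual code), here's what a Redis-backed consumer might look like: your app servers push JSON events onto a list, and a worker drains them and streams batches into BigQuery. The queue key, batch size and table id are all assumptions you'd adapt to your setup.

```python
# Rough sketch of a queue consumer: drain events from a Redis list and
# stream them into BigQuery in batches.
import json
import time

import redis
from google.cloud import bigquery

QUEUE_KEY = "events_queue"                    # hypothetical list your app LPUSHes to
BATCH_SIZE = 500                              # illustrative batch size
TABLE_ID = "my-project.analytics.events"      # assumed existing table

r = redis.Redis(host="localhost", port=6379)
bq = bigquery.Client()

while True:
    # Pull up to BATCH_SIZE events off the queue.
    batch = []
    for _ in range(BATCH_SIZE):
        raw = r.rpop(QUEUE_KEY)
        if raw is None:
            break
        batch.append(json.loads(raw))

    if batch:
        errors = bq.insert_rows_json(TABLE_ID, batch)
        if errors:
            print("Some rows failed:", errors)
    else:
        time.sleep(1)  # queue is empty; back off briefly
```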
So, as you can see, there are many ways to build a solution to this. However, the basic premise is to have some sort of queue that processes your "events" asynchronously and pushes them in batches to BigQuery, where you can then do your analysis and plug in a BI tool to make nice graphs. Which solution you choose depends on your specific use case, e.g. where are your events originating from? What are your skill sets? etc.
Hope this helps.