De-Nesting Google Analytics Data in BigQuery | by Martin Weitzmann

The proper way to flatten tables

Photo of Singapore by Mike Enerio on Unsplash

BigQuery is an analytics engine optimized to crunch pre-joined (that is, nested) data. Sub-relations make sense in analytical scenarios because we don’t want to deal with joins over bigger datasets. Just imagine daily year-over-year comparisons over the last 3 years, aggregating terabytes of data, with joins adding another layer of complexity on top.

A sub-relation, or sub-table, is usually implemented as an array of structs. The array, a list-like data type, provides the rows; the struct, similar to a map or dictionary, provides the columns. The sub-schema is consistent throughout the table, in contrast to JSON types, which can change their schema from row to row.
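To make the "array gives rows, struct gives columns" idea concrete, here is a minimal Python sketch of a single nested row. It only models the shape of the data and is not BigQuery code; the field names (`visit_id`, `hits`, `hit_number`, `page_path`) are illustrative assumptions, not the exact GA export schema.

```python
from typing import TypedDict


class Hit(TypedDict):
    # One struct: a fixed set of named fields (the sub-table's "columns").
    hit_number: int
    page_path: str


class Session(TypedDict):
    # One parent row; `hits` is the array holding the sub-table's rows.
    visit_id: int
    hits: list[Hit]


# Hypothetical example data, not taken from a real export.
session: Session = {
    "visit_id": 1001,
    "hits": [
        {"hit_number": 1, "page_path": "/home"},
        {"hit_number": 2, "page_path": "/pricing"},
    ],
}

# Every element of the array exposes the same columns,
# which is what distinguishes a sub-table from free-form JSON.
sub_table_columns = {tuple(hit.keys()) for hit in session["hits"]}
```

Because every element of `hits` has the same keys, `sub_table_columns` contains exactly one column layout, which mirrors the consistent sub-schema described above.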

The only other engine going down this route of nested data seems to be Amazon Redshift Spectrum. Yet if you want to use Google Analytics (GA) data in another system, you will almost always want to de-nest the data into flat tables, because capabilities to aggregate or change arrays of structs are quite limited. Most analytical database engines seem to optimize for…
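As a rough illustration of what de-nesting into a flat table means, the following Python sketch repeats the parent fields once per element of the nested array. It is an analogy for the operation, not the author's method and not BigQuery SQL; `flatten_sessions` and all field names are hypothetical.

```python
def flatten_sessions(sessions):
    """Yield one flat row per (session, hit) pair."""
    for session in sessions:
        for hit in session["hits"]:
            # Parent fields are repeated; struct fields become plain columns.
            yield {"visit_id": session["visit_id"], **hit}


# Hypothetical example data: two hits, one hit, and no hits.
sessions = [
    {"visit_id": 1001, "hits": [
        {"hit_number": 1, "page_path": "/home"},
        {"hit_number": 2, "page_path": "/pricing"},
    ]},
    {"visit_id": 1002, "hits": [
        {"hit_number": 1, "page_path": "/blog"},
    ]},
    {"visit_id": 1003, "hits": []},
]

flat_rows = list(flatten_sessions(sessions))
```

Note that the session with an empty `hits` array produces no flat row at all in this sketch. Whether parent rows without sub-rows should survive is a design decision to make explicitly when flattening.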

De-Nesting Google Analytics Data in BigQuery | by Martin Weitzmann | Mar, 2024
