The proper way to flatten tables

Photo of Singapore by Mike Enerio on Unsplash

BigQuery is an analytics engine optimized to crunch pre-joined (that is, nested) data. Sub-relations make sense in analytical scenarios because we don't want to deal with joins over bigger datasets. Just imagine daily year-over-year comparisons over the last 3 years, aggregating terabytes of data, with joins adding another layer of complexity on top.

A sub-relation, or sub-table, is usually implemented as an array of structs. The array, a list-like data type, provides the rows; the struct, similar to a map or dictionary, provides the columns. The sub-schema is consistent throughout the table, in contrast to JSON types, which can change their schema from row to row.
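To make the array-of-structs idea concrete, here is a minimal Python sketch. It is not BigQuery code: the `sessions` data, the `hits` field, and the `flatten` helper are hypothetical names made up for illustration. Lists stand in for arrays, and dicts with a fixed set of keys stand in for structs.

```python
from typing import Any

# Hypothetical nested rows: each row has a scalar column plus a sub-table
# ("hits") stored as an array (list) of structs (dicts with a fixed schema).
sessions: list[dict[str, Any]] = [
    {"session_id": "s1", "hits": [{"page": "/home", "time_ms": 0},
                                   {"page": "/cart", "time_ms": 5200}]},
    {"session_id": "s2", "hits": [{"page": "/home", "time_ms": 0}]},
]


def flatten(rows: list[dict[str, Any]], array_field: str) -> list[dict[str, Any]]:
    """Return one flat row per array element, repeating the parent columns."""
    flat = []
    for row in rows:
        # Parent columns are everything except the nested sub-table.
        parent = {k: v for k, v in row.items() if k != array_field}
        for element in row[array_field]:
            # Struct fields become ordinary columns next to the parent columns.
            flat.append({**parent, **element})
    return flat


flat_hits = flatten(sessions, "hits")
for flat_row in flat_hits:
    print(flat_row)
```

Each element of the array becomes its own flat row, with the parent's `session_id` repeated. Note that in this sketch a parent row with an empty array produces no output rows at all, which is also how a cross join with `UNNEST` behaves in BigQuery.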

The only other engine going down this route of nested data seems to be AWS Redshift Spectrum. Yet if you want to use Google Analytics (GA) data in another system, you'd almost always want to de-join the data into flat tables, because capabilities to aggregate or change arrays of structs are quite limited. Most analytical database engines seem to optimize for…
