Using BigQuery data canvas: a deep dive

BigQuery data canvas in action

To give you a better sense of the impact BigQuery data canvas can have in your organization, let’s walk through an example. Companies of all sizes, from large enterprises to small startups, can benefit from a deeper understanding of their developer teams’ productivity. In this technical deep dive, we’ll show you how to use the github_repos public dataset with data canvas to generate valuable insights in a shareable workspace. Along the way, you’ll see how data canvas simplifies complex queries: it can produce SQL that joins tables, unnests nested and repeated fields, converts timestamps, extracts the month and year from date fields, and more. With Gemini’s capabilities, you can generate these queries, explore your data, and build insightful visualizations, all using natural language.
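
The post doesn’t show the SQL that data canvas generates, so the following is only an illustration. EQUIVALENT_SQL is a hypothetical BigQuery query of the kind described above. It unnests a repeated field, converts Unix seconds with TIMESTAMP_SECONDS, and extracts the year and month. It assumes that commits.repo_name is a repeated STRING and that committer.time_sec holds epoch seconds, so check the schema in data canvas before relying on it. The hypothetical Python function monthly_commit_counts mirrors the same steps on a few made-up rows, so you can see what each clause does without touching the 3TB+ dataset.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical query of the kind described above. Assumes commits.repo_name is a
# repeated STRING and committer.time_sec holds Unix epoch seconds.
EQUIVALENT_SQL = """
SELECT
  repo,
  EXTRACT(YEAR FROM TIMESTAMP_SECONDS(committer.time_sec)) AS commit_year,
  EXTRACT(MONTH FROM TIMESTAMP_SECONDS(committer.time_sec)) AS commit_month,
  COUNT(*) AS commits
FROM `bigquery-public-data.github_repos.commits`, UNNEST(repo_name) AS repo
WHERE STARTS_WITH(repo, 'GoogleCloudPlatform/')
GROUP BY repo, commit_year, commit_month
ORDER BY repo, commit_year, commit_month
"""

# Hypothetical toy rows shaped like the assumed commits schema (not real data).
SAMPLE_COMMITS = [
    {"commit": "a1", "repo_name": ["GoogleCloudPlatform/example-a"],
     "committer": {"time_sec": 1577836800}},  # 2020-01-01 00:00:00 UTC
    {"commit": "b2", "repo_name": ["GoogleCloudPlatform/example-b",
                                   "someone/example-b-fork"],
     "committer": {"time_sec": 1580515200}},  # 2020-02-01 00:00:00 UTC
    {"commit": "c3", "repo_name": ["GoogleCloudPlatform/example-a"],
     "committer": {"time_sec": 1580601600}},  # 2020-02-02 00:00:00 UTC
]


def monthly_commit_counts(rows, owner_prefix="GoogleCloudPlatform/"):
    """Count commits per (repo, year, month), mimicking UNNEST + EXTRACT + GROUP BY."""
    counts = Counter()
    for row in rows:
        # Convert Unix seconds to a UTC datetime (like TIMESTAMP_SECONDS).
        ts = datetime.fromtimestamp(row["committer"]["time_sec"], tz=timezone.utc)
        # "Unnest" the repeated repo_name field: one output row per element.
        for repo in row["repo_name"]:
            # Keep only repos under the owner prefix (like STARTS_WITH).
            if repo.startswith(owner_prefix):
                counts[(repo, ts.year, ts.month)] += 1
    # Sort by (repo, year, month), like the ORDER BY clause.
    return sorted(counts.items())
```

Note that a commit listed under several repositories is counted once per matching repository, which is the same fan-out UNNEST produces in SQL.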

Please note that, as with many new AI products and services today, getting good results from any LLM-enabled application depends on strong prompt engineering skills. Many people assume that out-of-the-box large language models (LLMs) are not good at generating SQL. In our experience, however, with the right prompting techniques, Gemini in BigQuery via data canvas can generate complex SQL queries that take the context of your data corpus into account. We see that data canvas determines the sort order, grouping, row limits, and overall SQL structure from natural language queries. To learn about engineering prompts for BigQuery data canvas, check out our other blog post, BigQuery Data Canvas: Prompting Best Practices.

The github_repos dataset, available in BigQuery public datasets, is a 3TB+ dataset spread across multiple tables. It records activity such as commits and watch counts for 3M+ open-source repositories. For this example, we want to look at the Google Cloud Platform repository. As always, make sure you have the proper IAM permissions before getting started, including permissions to use data canvas and to access the datasets, so that your nodes run successfully.
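
Because the dataset is over 3TB, it can help to check how much data a query would scan before you run it. The Python sketch below is not part of data canvas. It uses the google-cloud-bigquery client library’s dry-run mode (QueryJobConfig with dry_run=True), which validates a query and reports total_bytes_processed without executing it. It assumes that the library is installed, that you have credentials and a billing project, and that the licenses table has a STRING repo_name column in "owner/repo" form. EXAMPLE_SQL, format_bytes, and estimate_bytes are hypothetical names. Filtering on the GoogleCloudPlatform/ prefix is also an assumption about what "the Google Cloud Platform repository" covers.

```python
from typing import Optional

# Hypothetical example query: licenses of repositories whose name starts with the
# GoogleCloudPlatform/ owner prefix (assumes repo_name looks like "owner/repo").
EXAMPLE_SQL = """
SELECT repo_name, license
FROM `bigquery-public-data.github_repos.licenses`
WHERE STARTS_WITH(repo_name, 'GoogleCloudPlatform/')
"""


def format_bytes(num_bytes: int) -> str:
    """Render a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def estimate_bytes(sql: str, project: Optional[str] = None) -> int:
    """Dry-run `sql` and return the number of bytes it would scan.

    Requires the google-cloud-bigquery package and application default
    credentials. The import is deferred so the rest of this module works without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    # dry_run validates the query and estimates bytes scanned without running it;
    # use_query_cache=False keeps a cached result from hiding the real scan size.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

With credentials configured, `print(format_bytes(estimate_bytes(EXAMPLE_SQL)))` would print the estimated scan size before you commit to running the query.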

Exploring each of the tables within the github_repos dataset is easy with data canvas. Here, we compare tables side by side and examine their schemas, details, and data previews, all in the same panel.
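
Data canvas shows each table’s schema in the panel. You may also want a programmatic view, for example to spot which fields are REPEATED and therefore need UNNEST in a query. For that case, the hypothetical helper below flattens a nested schema into dotted paths. It relies only on the name, field_type, mode, and fields attributes that google-cloud-bigquery’s SchemaField objects expose. FakeField and TOY_SCHEMA are illustrative stand-ins so the sketch runs without credentials, and they are not the real commits schema.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FakeField:
    """Hypothetical stand-in with the same attribute names as bigquery.SchemaField."""
    name: str
    field_type: str
    mode: str = "NULLABLE"
    fields: Sequence["FakeField"] = ()


def flatten_schema(fields, prefix: str = "") -> List[Tuple[str, str, str]]:
    """Return (dotted_path, field_type, mode) for every field, recursing into RECORDs."""
    rows: List[Tuple[str, str, str]] = []
    for f in fields:
        path = f"{prefix}{f.name}"
        rows.append((path, f.field_type, f.mode))
        # RECORD/STRUCT fields carry their sub-fields in `.fields`.
        if f.fields:
            rows.extend(flatten_schema(f.fields, prefix=path + "."))
    return rows


def repeated_paths(fields) -> List[str]:
    """List paths whose own mode is REPEATED (candidates for UNNEST)."""
    return [path for path, _, mode in flatten_schema(fields) if mode == "REPEATED"]


# Toy schema loosely modeled on the commits table (assumed, not authoritative).
TOY_SCHEMA = [
    FakeField("commit", "STRING"),
    FakeField("repo_name", "STRING", "REPEATED"),
    FakeField("committer", "RECORD", fields=(
        FakeField("name", "STRING"),
        FakeField("time_sec", "INTEGER"),
    )),
]
```

With credentials configured, you could pass a real schema instead, e.g. `repeated_paths(bigquery.Client().get_table("bigquery-public-data.github_repos.commits").schema)`.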