in

How to use an LLM to create data schemas in BigQuery


The intricate hierarchical data structures in data warehouses and lakes sourced from diverse origins can make data modeling a protracted and error-prone process. To quickly adapt and create data models that meet evolving business requirements without having to rework them excessively, you need data models that are flexible, modular and adaptable enough to accommodate many requirements. This requires advanced technologies, proficient personnel, and robust methodologies.

The advancements in generative AI offer numerous opportunities to address these challenges. Multimodal large language models (LLMs) can analyze examples of data in the data lake, including text descriptions, code, and even images of existing databases. By understanding this data and its relationships, LLMs can suggest or even automatically generate schema layouts, simplifying the laborious process of implementing the data model within the database, so developers can focus on higher value data management tasks.

In this blog, we walk you through how to use multimodal LLMs in BigQuery to create a database schema. To do so, we’ll take a real-world example of entity relationship (ER) diagrams and examples of data definition languages (DDLs), and create a database schema in three steps. 

For this demonstration, we will use Data Beans, a fictional technology company built on BigQuery that provides a SaaS platform to coffee sellers. Data Beans leverages BigQuery’s integration with Vertex AI to access Google AI models like Gemini Vision Pro 1.0 to analyze unstructured data and integrate it with structured data, while using BigQuery to help with data modeling and generating insights. 

STEP1 : Create an entity relationship diagram 

The first step is to create an ER diagram using your favorite modeling tool, or to take a screenshot of an existing ER diagram. The ER diagram can contain primary key and foreign key relationships, and will then be used as an input to the Gemini Vision Pro 1.0 model to create relevant BigQuery DDLs.


SciTechDaily

MIT Scientists Use AI To Target “Sleeper” Bacteria

A Cheaper OpenAI? Introducing Goose.ai | Unscripted Coding