
Real-time data processing for ML with Striim and BigQuery


The ability to use real-time data in machine learning applications can change how quickly an organization acts on what its data shows. Striim and Google Cloud’s BigQuery are a strong combination for making this possible. Striim is a real-time data integration platform that continuously moves data from diverse sources to destinations such as cloud databases, messaging systems, and data warehouses, which makes it a vital component in modern data architectures. BigQuery is an enterprise data platform that unifies data and workloads across multiple formats, storage types, and processing engines. BigQuery ML is built into the BigQuery environment, so you can create and deploy machine learning models using SQL-like syntax in a single, unified experience.
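To give a sense of what that SQL-like syntax looks like, here is a minimal sketch that assembles a BigQuery ML `CREATE MODEL` statement as a string. The dataset, model, table, and label column names (`my_dataset`, `iris_model`, `iris_dataset`, `species`) are hypothetical placeholders, and logistic regression is only one of the model types BigQuery ML supports; treat this as an illustration rather than the exact statement used later in this post.

```python
def build_create_model_sql(model: str, source_table: str, label_col: str) -> str:
    """Return a BigQuery ML CREATE MODEL statement for a logistic regression model.

    All identifiers passed in are illustrative assumptions, not names from this post.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names, chosen only for illustration.
sql = build_create_model_sql(
    model="my_dataset.iris_model",
    source_table="my_dataset.iris_dataset",
    label_col="species",
)
print(sql)
```

The resulting statement could be pasted into the BigQuery console or submitted through a client library; the training data is simply whatever the `SELECT` returns, which is why keeping that table current matters.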

Real-time data processing lets data scientists and ML engineers focus on model development and monitoring. In traditional workflows, they manually ran workflows and code to gather, clean, and label raw data in batches, which introduced delays and made pipelines less responsive. Striim can connect to over 150 data sources, which enables real-time data acquisition from virtually any location and simplifies data transformations. This helps businesses build machine learning models faster and make data-driven decisions and predictions quickly, improving customer experiences and optimizing operations. Because the models draw on the most current data, the resulting insights and decisions reflect the latest information available.

Prerequisites

Before integrating Striim with BigQuery ML for real-time data processing in machine learning, make sure the following prerequisites are in place.

  1. Striim instance: To get started, you need to have a Striim instance created and have access to it. Striim is the backbone of this integration, and having a working Striim instance is essential for setting up the data pipelines and connecting to your source databases. For a free trial, please sign up for a Striim Cloud on Google Cloud trial at https://go2.striim.com/trial-google-cloud
  2. Basic understanding of Striim: Familiarity with the basic concepts of Striim and the ability to create data pipelines is crucial. You should understand how to navigate the Striim environment, configure data sources, and set up data flows. If you’re new to Striim or need a refresher on its core functionalities, you can review the documentation and resources available at https://github.com/schowStriim/striim-PoC-migration.

In the following sections of this blog post, we will walk through the integration of Striim with BigQuery ML step by step, from connecting to a Postgres database to deploying machine learning models. Combining Striim’s real-time data integration with BigQuery ML’s machine learning services lets users move data continuously and build and deploy models on the latest data. The demonstration covers real-time data acquisition, transformation, and model deployment, so that organizations can make quick, data-driven decisions and predictions while improving operational efficiency.

Section 1: Connecting to the source database

The first step in this integration journey is connecting Striim to a database that contains raw machine learning data. In this blog, we will focus on a PostgreSQL database. Inside this database, we have an iris_dataset table with the following column structure.
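If you want to confirm the column structure of the source table yourself before building the pipeline, one option is to query PostgreSQL's `information_schema.columns` view. The sketch below only builds the query and its parameters; the helper name `column_structure_query` and the `public` schema are assumptions made for illustration, not something Striim requires.

```python
def column_structure_query(table: str, schema: str = "public"):
    """Build a parameterized query that lists a table's columns in order.

    Returns the SQL text and a parameter tuple. The %s placeholders follow
    the DB-API "format" paramstyle used by common PostgreSQL drivers.
    The default schema "public" is an assumption.
    """
    query = (
        "SELECT column_name, data_type, is_nullable "
        "FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s "
        "ORDER BY ordinal_position"
    )
    return query, (schema, table)


query, params = column_structure_query("iris_dataset")
print(query)
print(params)
```

With a driver such as psycopg2 (assumed here, not prescribed by this post), the pair can be passed directly to `cursor.execute(query, params)`, and each returned row describes one column of `iris_dataset`.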
