The advancement and widespread availability of new artificial intelligence (AI) capabilities, through platforms like the Databricks Data Intelligence Platform and Mosaic AI, have completely reset expectations for engineering teams across every industry. Business now moves at a new pace, demanding rapid delivery of intelligent, real-time applications rather than slowly stitched-together systems solving problems defined and scoped months prior. But even the most advanced AI platform is only as effective as the data that fuels it. To deliver true value, AI needs a continuous supply of fresh, real-time data.
Built by the original creators of Apache Kafka®, Confluent’s data streaming platform connects every corner of your business with Databricks in real time to power highly contextualized AI and analytics at scale. From legacy mainframes to modern cloud databases and enterprise applications to IoT devices at the edge, Confluent brings all your operational data to Databricks to fuel intelligent applications and proactive, automated decision-making.
In this blog, you’ll learn how Confluent and Databricks can be used together to unlock real-time AI, then step into a hands-on tutorial with step-by-step guidance for building an AI-powered marketing personalization engine.
Ready to jump right into the demo?
Many enterprises struggle to make AI operational because their data is siloed into different systems that weren’t built to work together. Most organizations have two critical data silos:
Operational systems that power applications, transactions, and real-time events
Analytical systems that drive data intelligence and AI for better decision-making
Data often moves between these silos in batch jobs that are slow, brittle, and manual, with governance and lineage getting lost along the way. This is a major issue for large language models (LLMs) and agentic AI, where wrong or outdated data means poor reasoning and incorrect decisions. A reliable, real-time bridge between operational and analytical systems is critical, enabling high-value operational data to be analyzed, joined with historical data, and fed into AI models to make real decisions. Making this happen requires Kafka data streams to be structured in formats compatible with data lakehouse architectures.
Tableflow represents Kafka topics and their associated schemas as Delta Lake tables (Open Preview) in just a few clicks. No pipelines to stitch together, less data duplication, and no schema headaches. Just enable Tableflow, and your Kafka data becomes instantly accessible to your analytics and AI tooling. It’s that simple.
Tableflow establishes a seamless highway between data in motion and data at rest—the operational and analytical estates. This highway is the exchange mechanism for the fuel that brings AI to life: real-time, trusted data products. Not just raw data, but governed, reusable data assets—designed to power AI and analytics, regardless of their origin. Confluent makes building these data products easy.
With 120+ pre-built connectors spanning the entire data ecosystem, teams can seamlessly integrate data streams from every operational system or application, ensuring AI models and analytics are always working with the freshest, most relevant information. Together with Apache Flink® stream processing and built-in governance, including schema management and validation, these tools enable cleaning, processing, and enriching of all data in motion—ensuring only high-quality, governed, and discoverable data lands in downstream data lakehouses and analytics platforms.
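To make that concrete, below is a minimal sketch (not part of the workshop itself) of what a schema-governed producer can look like with the confluent-kafka Python client: the event is validated against a JSON Schema registered in Schema Registry before it ever reaches the topic. The topic name, event fields, endpoints, and credentials are illustrative placeholders rather than the workshop’s actual configuration.

```python
# A minimal sketch of a schema-governed producer, assuming a Confluent Cloud
# cluster and Schema Registry. The "clickstream" topic, the event fields, and
# all endpoints/credentials below are placeholders, not the workshop's values.
import json

from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

CLICKSTREAM_SCHEMA = json.dumps({
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "ClickstreamEvent",
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "hotel_id": {"type": "string"},
        "page": {"type": "string"},
        "event_time": {"type": "string"},
    },
    "required": ["customer_id", "hotel_id", "event_time"],
})

schema_registry = SchemaRegistryClient({
    "url": "https://<schema-registry-endpoint>",
    "basic.auth.user.info": "<sr-api-key>:<sr-api-secret>",
})
serializer = JSONSerializer(CLICKSTREAM_SCHEMA, schema_registry)

producer = Producer({
    "bootstrap.servers": "<bootstrap-endpoint>:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<cluster-api-key>",
    "sasl.password": "<cluster-api-secret>",
})

event = {
    "customer_id": "c-123",
    "hotel_id": "h-042",
    "page": "/rooms/deluxe-suite",
    "event_time": "2025-01-01T12:00:00Z",
}

# Serialization validates the event against the registered JSON Schema, so only
# well-formed, governed records land on the topic for downstream consumers.
producer.produce(
    topic="clickstream",
    key=event["customer_id"],
    value=serializer(event, SerializationContext("clickstream", MessageField.VALUE)),
)
producer.flush()
```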
In the workshop that follows, you’ll learn how to build an AI-powered marketing personalization engine for River Hotels, a fictional luxury hotel brand. River Hotels wants to use Confluent and Databricks to build an application that will identify hotel locations with low booking numbers and recommend promotional content to lift sales.
Confluent will ingest and merge data from multiple sources using connectors, including the Oracle XStream CDC Source Connector, resulting in real-time data products that identify properties with low bookings. These data products will then be converted to Delta tables with Tableflow and written to Databricks. With Databricks Genie, built atop Mosaic AI, you’ll then be able to generate best-fit promotional copy and audience segments on the fly. Before you get going, make sure you have the following in place:
Confluent Cloud account with admin privileges
Databricks account and existing workspace (trial account supported)
AWS CLI installed and authenticated with resource creation permissions
Terraform installed
Docker Desktop installed
Git installed
Working knowledge of cloud platforms (AWS)
Basic SQL proficiency
Familiarity with streaming data concepts
This diagram provides an overview of the River Hotels implementation on Confluent and Databricks.
Customer and hotel updates from Oracle are continuously captured and streamed to Kafka using the Oracle XStream CDC Source Connector, while bookings, reviews, and clickstream events flow into Kafka topics in real time. Apache Flink processes and joins these streams to pinpoint high-intent customers—such as those who browsed but didn’t book within 20 minutes.
From there, a Databricks AI model analyzes hotel reviews to craft personalized marketing content. Tableflow syncs the enriched Kafka data as Delta Lake tables in Databricks, where insights are generated and AI-powered campaigns are launched to re-engage potential customers.
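To give a feel for that join logic, here’s a minimal, self-contained PyFlink sketch that uses tiny in-memory stand-ins for the clickstream and bookings topics; in the workshop itself, the equivalent Flink SQL runs directly in Confluent Cloud, where the Kafka topics already appear as Flink tables. Table names, column names, and the exact way the 20-minute window is expressed here are illustrative assumptions, not the workshop’s schema.

```python
# A minimal sketch of the "browsed but didn't book within 20 minutes" join,
# using bounded in-memory data so it runs standalone. The real pipeline runs
# streaming Flink SQL over the Kafka-backed clickstream and bookings tables.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Tiny stand-ins: c-2 books 5 minutes after browsing, c-1 never books.
clicks = t_env.from_elements(
    [("c-1", "h-42", "2025-01-01 12:00:00"),
     ("c-2", "h-42", "2025-01-01 12:05:00")],
    ["customer_id", "hotel_id", "event_time"],
)
bookings = t_env.from_elements(
    [("b-9", "c-2", "h-42", "2025-01-01 12:10:00")],
    ["booking_id", "customer_id", "hotel_id", "booking_time"],
)
t_env.create_temporary_view("clickstream", clicks)
t_env.create_temporary_view("bookings", bookings)

# High-intent customers: a click with no matching booking for the same hotel
# within the 20 minutes that follow it.
result = t_env.sql_query("""
    SELECT c.customer_id, c.hotel_id, c.event_time AS browsed_at
    FROM clickstream AS c
    LEFT JOIN bookings AS b
      ON  b.customer_id = c.customer_id
      AND b.hotel_id    = c.hotel_id
      AND TIMESTAMPDIFF(MINUTE,
                        CAST(c.event_time AS TIMESTAMP),
                        CAST(b.booking_time AS TIMESTAMP)) BETWEEN 0 AND 20
    WHERE b.booking_id IS NULL
""")
result.execute().print()  # only c-1 remains: browsed but never booked
```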
| Capability | Components |
|---|---|
| Data Ingestion | Oracle XE database with the Oracle XStream CDC Source Connector enabled for customer and hotel data; data generators producing realistic booking, review, and clickstream events; Oracle XStream CDC Source Connector for real-time database change capture |
| Stream Processing | Confluent Cloud with Kafka clusters for handling varying data volumes and latencies; Apache Flink for real-time data processing and AI model inference; Schema Registry for data governance and quality |
| Data Enrichment | Native Flink integration with AI models; data aggregation; real-time review summarization and customer targeting |
| Integration & Analytics | Tableflow for seamless Delta Lake integration with Amazon S3; Databricks Genie AI for natural language data exploration; Databricks Notebook for crafting targeted social media posts |
Below is a summary of the steps in this quickstart. Check out the full GitHub repository for the complete step-by-step instructions.
1. Set Up Local Environment: The first step is to complete the prerequisites by installing the necessary tools locally and signing up for and/or logging in to your AWS, Confluent, and Databricks accounts.
2. Deploy Cloud Infrastructure: Next, you will initialize Terraform and add the API and account details needed to spin up this workshop’s cloud resources across AWS, Confluent, and Databricks. Once initialized and validated, you will deploy the cloud resources with a single Terraform command—just like magic.
3. Ingest Mock Data: At this point, you will generate the mock workshop data by running a local service that produces two streams of data to Oracle and three to Kafka. To get the Oracle data into Confluent, you will use the Oracle XStream CDC Source Connector, which makes it easy to capture data changes.
4. Process and Enrich Data With Apache Flink: Now that the data is being ingested into Confluent Cloud, you will convert these data streams into enriched data products using Flink, creating a table of target customers with AI-summarized reviews and a combined hotel-and-bookings table.
5. Stream Data Products to Delta Lake With Tableflow: Next, you will stream these data products as Delta tables by enabling Tableflow for them. This will sync them to an S3 bucket that your Databricks account can also access.
6. Generate Insights and Create Targeted Campaigns With Databricks: Within Databricks, you will visualize and derive insights from these data products with AI/BI Genie and use additional AI features to generate targeted social marketing campaigns (a minimal PySpark sketch of this step follows this list).
7. Clean Up: In this final step, you will manually remove a few cloud resources that you created in the Confluent and Databricks UIs. Then you can efficiently spin down and remove the rest with Terraform.
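For a rough sense of the Databricks side of steps 5 and 6, here’s a minimal PySpark sketch that reads a Tableflow-synced Delta table and drafts promotional copy with Databricks’ ai_query() SQL function. The catalog, table, and model serving endpoint names are placeholder assumptions, and the workshop itself uses AI/BI Genie and a provided notebook rather than this exact code.

```python
# A minimal sketch, assuming Tableflow has already synced the enriched topics
# as Delta tables registered in Unity Catalog, and that `spark` is the
# preconfigured SparkSession in a Databricks notebook. The table name and the
# model serving endpoint below are illustrative placeholders.

# Surface the hotels with the fewest bookings from the Tableflow-synced table.
low_bookings = spark.sql("""
    SELECT hotel_id, hotel_name, COUNT(booking_id) AS bookings
    FROM   river_hotels.tableflow.hotel_bookings   -- placeholder table name
    GROUP  BY hotel_id, hotel_name
    ORDER  BY bookings ASC
    LIMIT  5
""")
low_bookings.createOrReplaceTempView("low_booking_hotels")

# Draft a short promotional post per under-booked hotel with ai_query(),
# pointed at a foundation model serving endpoint (name is illustrative).
promos = spark.sql("""
    SELECT hotel_id,
           ai_query(
             'databricks-meta-llama-3-3-70b-instruct',
             CONCAT('Write a two-sentence social media post promoting a stay at ',
                    hotel_name, ', highlighting a limited-time offer.')
           ) AS promo_copy
    FROM low_booking_hotels
""")
promos.show(truncate=False)
```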
Ready to get going? Check out the quick start and start building with Confluent and Databricks today. By connecting Confluent with Databricks via Tableflow, you can deploy an AI strategy that bridges the operational and analytical divide without the costs and headaches of complex data engineering.
Not yet a Confluent customer? Start your free trial of Confluent Cloud today. New users receive $400 to spend during their first 30 days.
Apache®, Apache Kafka, Kafka®, Apache Flink®, Flink®, Apache Iceberg™️, Iceberg™️, and the Kafka, Flink, and Iceberg logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.