The advancement and widespread availability of new artificial intelligence (AI) capabilities, through platforms like the Databricks Data Intelligence Platform and Mosaic AI, have completely reset expectations for engineering teams across every industry. Business now moves at a new pace, demanding rapid delivery of intelligent, real-time applications rather than slowly stitched-together systems solving problems defined and scoped months prior. But even the most advanced AI platform is only as effective as the data that fuels it. To deliver true value, AI needs a continuous supply of fresh, real-time data.
Built by the original creators of Apache Kafka®, Confluent’s data streaming platform connects every corner of your business with Databricks in real time to power highly contextualized AI and analytics at scale. From legacy mainframes to modern cloud databases and enterprise applications to IoT devices at the edge, Confluent brings all your operational data to Databricks to fuel intelligent applications and proactive, automated decision-making.
In this blog, you’ll learn how Confluent and Databricks can be used together to unlock real-time AI, then step into a hands-on tutorial with step-by-step guidance for building an AI-powered marketing personalization engine.
Ready to jump right into the demo?
Many enterprises struggle to make AI operational because their data is siloed into different systems that weren’t built to work together. Most organizations have two critical data silos:
Operational systems that power applications, transactions, and real-time events
Analytical systems that drive data intelligence and AI for better decision-making
Data often moves between these silos in batch jobs that are slow, brittle, and manual, with governance and lineage getting lost along the way. This is a major issue for large language models (LLMs) and agentic AI, where wrong or outdated data means poor reasoning and incorrect decisions. A reliable, real-time bridge between operational and analytical systems is critical, enabling high-value operational data to be analyzed, joined with historical data, and fed into AI models to make real decisions. Making this happen requires Kafka data streams to be structured in formats compatible with data lakehouse architectures.
Tableflow represents Kafka topics and their associated schemas as Delta Lake tables (Open Preview) in just a few clicks. No pipelines to stitch together, less data duplication, and no schema headaches. Just enable Tableflow, and your Kafka data becomes instantly accessible to your analytics and AI tooling. It’s that simple.
Tableflow establishes a seamless highway between data in motion and data at rest—the operational and analytical estates. This highway is the exchange mechanism for the fuel that brings AI to life: real-time, trusted data products. Not just raw data, but governed, reusable data assets—designed to power AI and analytics, regardless of their origin. Confluent makes building these data products easy.
With 120+ pre-built connectors spanning the entire data ecosystem, teams can seamlessly integrate data streams from every operational system or application, ensuring AI models and analytics are always working with the freshest, most relevant information. Together with Apache Flink® stream processing and built-in governance, including schema management and validation, these tools enable cleaning, processing, and enriching of all data in motion—ensuring only high-quality, governed, and discoverable data lands in downstream data lakehouses and analytics platforms.
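To make that concrete, below is a minimal sketch (not part of the workshop itself) of what a schema-governed producer can look like with the confluent-kafka Python client: the event is validated against a JSON Schema registered in Schema Registry before it ever reaches the topic. The topic name, event fields, endpoints, and credentials are illustrative placeholders rather than the workshop’s actual configuration.

```python
# A minimal sketch of a schema-governed producer, assuming a Confluent Cloud
# cluster and Schema Registry. The "clickstream" topic, the event fields, and
# all endpoints/credentials below are placeholders, not the workshop's values.
import json

from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

CLICKSTREAM_SCHEMA = json.dumps({
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "ClickstreamEvent",
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "hotel_id": {"type": "string"},
        "page": {"type": "string"},
        "event_time": {"type": "string"},
    },
    "required": ["customer_id", "hotel_id", "event_time"],
})

schema_registry = SchemaRegistryClient({
    "url": "https://<schema-registry-endpoint>",
    "basic.auth.user.info": "<sr-api-key>:<sr-api-secret>",
})
serializer = JSONSerializer(CLICKSTREAM_SCHEMA, schema_registry)

producer = Producer({
    "bootstrap.servers": "<bootstrap-endpoint>:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<cluster-api-key>",
    "sasl.password": "<cluster-api-secret>",
})

event = {
    "customer_id": "c-123",
    "hotel_id": "h-042",
    "page": "/rooms/deluxe-suite",
    "event_time": "2025-01-01T12:00:00Z",
}

# Serialization validates the event against the registered JSON Schema, so only
# well-formed, governed records land on the topic for downstream consumers.
producer.produce(
    topic="clickstream",
    key=event["customer_id"],
    value=serializer(event, SerializationContext("clickstream", MessageField.VALUE)),
)
producer.flush()
```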
In the workshop that follows, you’ll learn how to build an AI-powered marketing personalization engine for River Hotels, a fictional luxury hotel brand. River Hotels wants to use Confluent and Databricks to build an application that will identify hotel locations with low booking numbers and recommend promotional content to lift sales.
Confluent will ingest and merge data from multiple sources using connectors, including the Oracle XStream CDC Source Connector, resulting in real-time data products that identify properties with low bookings. These data products will then be converted to Delta tables with Tableflow and written to Databricks. With Databricks Genie, built atop Mosaic AI, you’ll then be able to generate best-fit promotional copy and audience segments on the fly. Before you get going, make sure you have the following in place:
Confluent Cloud account with admin privileges
Databricks account and existing workspace (trial account supported)
AWS CLI installed and authenticated with resource creation permissions
Terraform installed
Docker Desktop installed
Git installed
Working knowledge of cloud platforms (AWS)
Basic SQL proficiency
Familiarity with streaming data concepts
This diagram provides an overview of the River Hotels implementation on Confluent and Databricks.
Customer and hotel updates from Oracle are continuously captured and streamed to Kafka using the Oracle XStream CDC Source Connector, while bookings, reviews, and clickstream events flow into Kafka topics in real time. Apache Flink processes and joins these streams to pinpoint high-intent customers—such as those who browsed but didn’t book within 20 minutes.
From there, a Databricks AI model analyzes hotel reviews to craft personalized marketing content. Tableflow syncs the enriched Kafka data as Delta Lake tables in Databricks, where insights are generated and AI-powered campaigns are launched to re-engage potential customers.
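To give a feel for that join logic, here’s a minimal, self-contained PyFlink sketch that uses tiny in-memory stand-ins for the clickstream and bookings topics; in the workshop itself, the equivalent Flink SQL runs directly in Confluent Cloud, where the Kafka topics already appear as Flink tables. Table names, column names, and the exact way the 20-minute window is expressed here are illustrative assumptions, not the workshop’s schema.

```python
# A minimal sketch of the "browsed but didn't book within 20 minutes" join,
# using bounded in-memory data so it runs standalone. The real pipeline runs
# streaming Flink SQL over the Kafka-backed clickstream and bookings tables.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Tiny stand-ins: c-2 books 5 minutes after browsing, c-1 never books.
clicks = t_env.from_elements(
    [("c-1", "h-42", "2025-01-01 12:00:00"),
     ("c-2", "h-42", "2025-01-01 12:05:00")],
    ["customer_id", "hotel_id", "event_time"],
)
bookings = t_env.from_elements(
    [("b-9", "c-2", "h-42", "2025-01-01 12:10:00")],
    ["booking_id", "customer_id", "hotel_id", "booking_time"],
)
t_env.create_temporary_view("clickstream", clicks)
t_env.create_temporary_view("bookings", bookings)

# High-intent customers: a click with no matching booking for the same hotel
# within the 20 minutes that follow it.
result = t_env.sql_query("""
    SELECT c.customer_id, c.hotel_id, c.event_time AS browsed_at
    FROM clickstream AS c
    LEFT JOIN bookings AS b
      ON  b.customer_id = c.customer_id
      AND b.hotel_id    = c.hotel_id
      AND TIMESTAMPDIFF(MINUTE,
                        CAST(c.event_time AS TIMESTAMP),
                        CAST(b.booking_time AS TIMESTAMP)) BETWEEN 0 AND 20
    WHERE b.booking_id IS NULL
""")
result.execute().print()  # only c-1 remains: browsed but never booked
```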
| Capability | Components |
|---|---|
| Data Ingestion | Oracle XE database with the Oracle XStream CDC Source Connector enabled for customer and hotel data; data generators producing realistic booking, review, and clickstream events; Oracle XStream CDC Source Connector for real-time database change capture |
| Stream Processing | Confluent Cloud with Kafka clusters for handling varying data volumes and latencies; Apache Flink for real-time data processing and AI model inference; Schema Registry for data governance and quality |
| Data Enrichment | Native Flink integration with AI models; data aggregation; real-time review summarization and customer targeting |
| Integration & Analytics | Tableflow for seamless Delta Lake integration with Amazon S3; Databricks Genie AI for natural language data exploration; Databricks Notebook for crafting targeted social media posts |
Below is a summary of the steps in this quickstart. Check out the full GitHub repository for the complete step-by-step instructions.
1. Set Up Local Environment: The first step is to complete the prerequisites by installing the necessary tools locally and signing up for and/or logging in to your AWS, Confluent, and Databricks accounts.
2. Deploy Cloud Infrastructure: Next, you will initialize Terraform and add the API and account details needed to spin up this workshop’s cloud resources across AWS, Confluent, and Databricks. Once initialized and validated, you will deploy the cloud resources with a single Terraform command—just like magic.
3. Ingest Mock Data: At this point, you will generate the mock workshop data by running a local service that produces two streams of data to Oracle and three to Kafka. To get the Oracle data into Confluent, you will use the Oracle XStream CDC Source Connector, which makes it easy to capture data changes.
4. Process and Enrich Data With Apache Flink: Now that the data is being ingested into Confluent Cloud, you will convert these data streams into enriched data products using Flink, creating a table of target customers with AI-summarized reviews and a combined hotel-and-bookings table.
5. Stream Data Products to Delta Lake With Tableflow: Next, you will stream these data products as Delta tables by enabling Tableflow for them. This will sync them to an S3 bucket that your Databricks account can also access.
6. Generate Insights and Create Targeted Campaigns With Databricks: Within Databricks, you will visualize and derive insights from these data products with AI/BI Genie and use additional AI features to generate targeted social marketing campaigns (a minimal PySpark sketch of this step follows this list).
7. Clean Up: In this final step, you will manually remove a few cloud resources that you created in the Confluent and Databricks UIs. Then you can efficiently spin down and remove the rest with Terraform.
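For a rough sense of the Databricks side of steps 5 and 6, here’s a minimal PySpark sketch that reads a Tableflow-synced Delta table and drafts promotional copy with Databricks’ ai_query() SQL function. The catalog, table, and model serving endpoint names are placeholder assumptions, and the workshop itself uses AI/BI Genie and a provided notebook rather than this exact code.

```python
# A minimal sketch, assuming Tableflow has already synced the enriched topics
# as Delta tables registered in Unity Catalog, and that `spark` is the
# preconfigured SparkSession in a Databricks notebook. The table name and the
# model serving endpoint below are illustrative placeholders.

# Surface the hotels with the fewest bookings from the Tableflow-synced table.
low_bookings = spark.sql("""
    SELECT hotel_id, hotel_name, COUNT(booking_id) AS bookings
    FROM   river_hotels.tableflow.hotel_bookings   -- placeholder table name
    GROUP  BY hotel_id, hotel_name
    ORDER  BY bookings ASC
    LIMIT  5
""")
low_bookings.createOrReplaceTempView("low_booking_hotels")

# Draft a short promotional post per under-booked hotel with ai_query(),
# pointed at a foundation model serving endpoint (name is illustrative).
promos = spark.sql("""
    SELECT hotel_id,
           ai_query(
             'databricks-meta-llama-3-3-70b-instruct',
             CONCAT('Write a two-sentence social media post promoting a stay at ',
                    hotel_name, ', highlighting a limited-time offer.')
           ) AS promo_copy
    FROM low_booking_hotels
""")
promos.show(truncate=False)
```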
Ready to get going? Check out the quick start and start building with Confluent and Databricks today. By connecting Confluent with Databricks via Tableflow, you can deploy an AI strategy that bridges the operational and analytical divide without the costs and headaches of complex data engineering.
Not yet a Confluent customer? Start your free trial of Confluent Cloud today. New users receive $400 to spend during their first 30 days.
Apache®, Apache Kafka, Kafka®, Apache Flink®, Flink®, Apache Iceberg™️, Iceberg™️, and the Kafka, Flink, and Iceberg logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.