Users of Snowflake and other data lakes and data warehouses need real-time data for artificial intelligence (AI) and analytical workloads—but they struggle to get that data into their lakes and warehouses. In response to this ubiquitous challenge, Confluent developed Tableflow. The goal was simple: Make it push-button easy to represent Apache Kafka® topics as open table formats such as Apache Iceberg™️ (now generally available), ready for consumption by data lakes and warehouses like Snowflake, without complex ETL pipelines.
Last year, we shared our vision for Tableflow to bridge the gap between real-time operational data flowing through Kafka and the analytical AI and machine learning (ML) ecosystems. Throughout our Early Access Program, we received invaluable feedback from customers who experienced firsthand how Tableflow could simplify their data architectures and accelerate insights. Building on that momentum and success, we announced the General Availability (GA) of Tableflow during Current Bengaluru ’25, featuring robust support for Iceberg and seamless integration with the Snowflake Open Catalog.
Learn more about Tableflow via Confluent product documentation or contact us to request a personalized demo.
Snowflake is equally excited about this expanded integration, which makes it even easier for our joint customers to unlock the value of their real-time data streams using the breadth of Snowflake analytics and ML services. This collaboration underscores our shared commitment to helping organizations innovate faster by simplifying data access and management in the cloud.
This blog post examines the historical separation of analytical and operational systems and details how Tableflow's features bring the two together. It also explains how Tableflow integrates with Snowflake's analytical services, such as Cortex AI, through Open Catalog to enable real-time analytics.
Operational and analytical systems have historically been distinct due to their differing design principles and objectives. Operational systems, encompassing microservices, software-as-a-service (SaaS) applications, and transactional databases, prioritize rapid, high-volume transaction processing to support real-time application responsiveness. Conversely, analytical systems are engineered for intricate queries, historical data exploration, and AI applications, necessitating batch processing and specialized storage architectures. Data collected in the operational estate must be shared with the analytical estate in real time to perform analytics and generate business insights. Essentially, you want to streamline the process of feeding your operational data in Kafka topics into Iceberg tables so that it’s ready to power analytics in data lakes or warehouses.
Feeding raw operational data from Kafka into Snowflake and other data lakes and warehouses in Iceberg format is a complex, expensive, and error-prone process that requires building custom data pipelines. In these pipelines, you need to transfer data (using sink connectors), clean data, manage schemas, materialize change data capture (CDC) streams, transform and compact data, and store it as Apache Parquet files in Iceberg table format.
This intricate workflow demands significant effort and expertise to ensure data consistency and usability. What if you could eliminate all the hassle and have your Kafka topics automatically materialized into analytics-ready Iceberg tables in your data lake or warehouse? That’s precisely what Tableflow allows you to do.
Tableflow revolutionizes the way Kafka data is materialized into data lakes and warehouses by seamlessly representing Kafka topics as Iceberg tables. Tableflow uses innovations in Kora’s storage layer that allow the flexibility to take Kafka segments and write them out to other storage formats—in this case, Parquet files. Tableflow also uses a new metadata publishing service behind the scenes that taps into Confluent’s Schema Registry to generate Iceberg metadata transaction logs while handling schema mapping, schema evolution, and type conversions.
Here are the key capabilities of Tableflow:
Data Conversion: It converts Kafka segments and schemas in Avro, JSON, or Protobuf into Iceberg-compatible schemas and Parquet files, using Schema Registry in Confluent Cloud as the source of truth.
Schema Evolution: It automatically detects schema changes, such as adding fields or widening types, and applies them to the respective table (see the sketch after this list).
Catalog Syncing: You can sync Tableflow-created tables as external tables in AWS Glue, Snowflake Open Catalog, Apache Polaris, and Unity Catalog (coming soon).
Table Maintenance and Metadata Management: It automatically compacts small files when it detects enough of them and also handles snapshot and version expiration.
Choose Your Storage: You can store the data in your own Amazon S3 bucket or let Confluent host and manage the storage for you.
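To make the data conversion and schema evolution capabilities concrete, here is a minimal sketch using PyIceberg's type classes. The topic, field names, and field IDs below are hypothetical, and the exact mapping Tableflow applies is described in the Confluent product documentation; this only illustrates the general shape of an Avro-to-Iceberg conversion and of a compatible schema change.

```python
from pyiceberg.schema import Schema
from pyiceberg.types import (
    IntegerType,
    LongType,
    NestedField,
    StringType,
    TimestampType,
)

# Hypothetical "orders" topic registered in Schema Registry as Avro:
#   {"type": "record", "name": "Order", "fields": [
#     {"name": "order_id", "type": "string"},
#     {"name": "quantity", "type": "int"},
#     {"name": "ordered_at", "type": {"type": "long", "logicalType": "timestamp-micros"}}]}
#
# An Iceberg-compatible schema in the spirit of what such a conversion
# could produce (field names and IDs are illustrative only):
orders_v1 = Schema(
    NestedField(1, "order_id", StringType(), required=True),
    NestedField(2, "quantity", IntegerType(), required=True),
    NestedField(3, "ordered_at", TimestampType(), required=True),
)

# A compatible evolution: "quantity" is widened from int to long and a new
# optional field is added -- changes Iceberg can absorb without rewriting data.
orders_v2 = Schema(
    NestedField(1, "order_id", StringType(), required=True),
    NestedField(2, "quantity", LongType(), required=True),
    NestedField(3, "ordered_at", TimestampType(), required=True),
    NestedField(4, "coupon_code", StringType(), required=False),
)
```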
With just the push of a button, you can now represent your Kafka data in Confluent Cloud as Iceberg tables to feed your data lake, warehouse, or analytical engine.
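Once a topic is materialized, any Iceberg-aware engine can read the result straight from object storage. The snippet below is a minimal sketch using PyIceberg; the metadata path, bucket name, and AWS settings are placeholders for whatever Tableflow writes into your own bucket.

```python
from pyiceberg.table import StaticTable

# Hedged sketch: point PyIceberg at the current Iceberg metadata file for a
# Tableflow-materialized topic (the path below is a placeholder) and scan it.
table = StaticTable.from_metadata(
    "s3://my-tableflow-bucket/<path-to-table>/metadata/<latest>.metadata.json",
    properties={"s3.region": "us-east-1"},
)

# Read the analytics-ready Parquet data as an Arrow table (or call to_pandas()).
arrow_table = table.scan().to_arrow()
print(arrow_table.schema)
print(arrow_table.num_rows)
```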
Open Catalog is Snowflake's innovative approach to managing Iceberg tables. It's an implementation of the open source Iceberg REST catalog API and is designed to provide a centralized way to register, discover, and govern Iceberg tables regardless of where the data physically resides (e.g., in your Amazon S3 buckets).
The key principles behind Snowflake Open Catalog are:
Openness: It embraces the open Iceberg standard, promoting a vendor-neutral approach to data management.
Interoperability: It allows various query engines and data processing tools that support the Iceberg REST API—including Snowflake's own powerful engine—to seamlessly access and work with the same Iceberg tables.
Centralized Management: It offers a single point of reference for discovering and managing Iceberg table metadata, simplifying data governance and access control.
By leveraging Snowflake Open Catalog, you can build flexible, open data lakehouse architectures without being locked into proprietary formats.
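Because Open Catalog speaks the standard Iceberg REST API, any REST-capable client can discover and read the same tables. The following is a hedged PyIceberg sketch; the endpoint URI, catalog name, namespace, credentials, and scope are placeholders that come from your Open Catalog configuration, not values defined in this post.

```python
from pyiceberg.catalog import load_catalog

# Hedged sketch: connect an Iceberg-REST-capable client (here PyIceberg) to
# Snowflake Open Catalog. All values in angle brackets are placeholders.
catalog = load_catalog(
    "open_catalog",
    **{
        "type": "rest",
        "uri": "https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog",
        "warehouse": "<open-catalog-catalog-name>",
        "credential": "<client-id>:<client-secret>",
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)

# Discover namespaces and load a Tableflow-materialized table for reading.
print(catalog.list_namespaces())
table = catalog.load_table("<namespace>.<topic_name>")
print(table.scan().to_arrow())
```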
Tableflow supports integration with Snowflake Open Catalog along with its own built-in Iceberg REST catalog for seamless discovery of materialized Iceberg tables. As stated earlier, Tableflow's primary purpose is to materialize streaming Kafka data as Iceberg tables. These tables are written directly to cloud object storage (Amazon S3), enabling a durable and query-optimized representation of the original event stream without requiring custom pipelines or ETL processes.
Now let’s look at a walk-through demo of Tableflow integration with Snowflake Open Catalog, which takes only a few clicks. To begin, open your Confluent Cloud environment and configure Tableflow by selecting the Kafka topics you want to stream into Snowflake. As part of this setup, specify your Amazon S3 bucket as the destination where Tableflow will write the data in Iceberg format. While setting up Amazon S3, use AWS AssumeRole and establish the necessary provider integration within Confluent Cloud. Tableflow handles the ongoing conversion of Kafka messages into Parquet files, organizes them into Iceberg tables, and manages the associated metadata and maintenance tasks like compaction. It essentially prepares your streaming data for easy consumption by analytical engines.
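For the AWS AssumeRole step, the sketch below shows one way the AWS side could be scripted with boto3. The Confluent principal ARN, external ID, role name, and bucket are placeholders supplied by your Confluent Cloud provider integration, and the policy scope shown is illustrative only; follow the Tableflow quick start for the exact permissions required.

```python
import json

import boto3

# Hedged sketch of the AWS side of Tableflow's "bring your own bucket" setup:
# create an IAM role that Confluent's provider integration can assume, with
# read/write access to the destination S3 bucket. Placeholder values only.
iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "<confluent-provider-integration-role-arn>"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "<external-id>"}},
    }],
}

s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:GetObject", "s3:PutObject", "s3:DeleteObject",
            "s3:ListBucket", "s3:GetBucketLocation",
        ],
        "Resource": [
            "arn:aws:s3:::my-tableflow-bucket",
            "arn:aws:s3:::my-tableflow-bucket/*",
        ],
    }],
}

iam.create_role(
    RoleName="tableflow-s3-access",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="tableflow-s3-access",
    PolicyName="tableflow-s3-rw",
    PolicyDocument=json.dumps(s3_policy),
)
```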
Once Tableflow is materializing your Kafka data as Iceberg tables in Amazon S3, the next step is to make these tables discoverable and queryable in Snowflake. This is where Snowflake's Open Catalog capabilities come into play. Configure Snowflake to connect to your Amazon S3 storage by creating an external volume, which points to the location of your Iceberg data. Then set up a catalog integration, which allows Snowflake to understand and access the Iceberg metadata.
Once the catalog integration is complete, the Iceberg tables should be automatically published to Snowflake Open Catalog. Within Snowflake, you can create external volumes and externally managed Iceberg tables that link directly to the Iceberg data exposed by Tableflow. This brings real-time operational data into the analytical environment with zero duplication and minimal latency.
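The Snowflake side of that setup can be scripted as well. The sketch below runs the relevant DDL through the Snowflake Python connector; the external volume, catalog integration, table names, URIs, and OAuth values are placeholders, and the exact option names for your account are covered in the quick start guide linked below rather than defined here.

```python
import snowflake.connector

# Hedged sketch: create an external volume over the bucket Tableflow writes to,
# a catalog integration pointing at Snowflake Open Catalog, and an externally
# managed Iceberg table. All identifiers and credentials are placeholders.
conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    role="ACCOUNTADMIN",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)
cur = conn.cursor()

cur.execute("""
CREATE OR REPLACE EXTERNAL VOLUME tableflow_vol
  STORAGE_LOCATIONS = ((
    NAME = 'tableflow-s3'
    STORAGE_PROVIDER = 'S3'
    STORAGE_BASE_URL = 's3://my-tableflow-bucket/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
  ))
""")

cur.execute("""
CREATE OR REPLACE CATALOG INTEGRATION open_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<namespace>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog'
    WAREHOUSE = '<open-catalog-catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client-id>'
    OAUTH_CLIENT_SECRET = '<client-secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE
""")

cur.execute("""
CREATE OR REPLACE ICEBERG TABLE orders
  EXTERNAL_VOLUME = 'tableflow_vol'
  CATALOG = 'open_catalog_int'
  CATALOG_TABLE_NAME = '<topic_name>'
""")
```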
With the data now accessible in Snowflake, Cortex AI can be used to extract key business insights, empowering teams to take action faster and with greater confidence. For a detailed walk-through of Tableflow’s integration with Snowflake, refer to this quick start guide.
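As one hedged illustration of that last step, the snippet below calls an existing Cortex AI SQL function (SNOWFLAKE.CORTEX.SENTIMENT) over an externally managed Iceberg table. The table and column names are hypothetical, and function and model availability varies by region and account.

```python
import snowflake.connector

# Hedged sketch: run a Cortex AI function over the freshly synced Iceberg data.
# The "orders" table and "review_text" column are hypothetical examples.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()
cur.execute("""
SELECT review_text,
       SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM orders
LIMIT 10
""")
for review_text, sentiment in cur.fetchall():
    print(sentiment, review_text)
```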
This powerful integration unlocks compelling benefits for joint Confluent and Snowflake users. It radically simplifies getting streaming Kafka data into Snowflake, automatically transforming it into queryable Iceberg tables and eliminating complex manual ETL. This accelerates time to insight, enabling analysis of the freshest data almost instantly for critical applications like real-time fraud detection and personalization. Data governance and consistency are bolstered through the Snowflake Open Catalog and Confluent Schema Registry, ensuring robust integrity. Organizations achieve significant cost savings by reducing pipeline development and maintenance, thanks to Tableflow’s automation. Moreover, it enhances collaboration by providing analysts, data scientists, and business users with easy, democratized access to up-to-date data in Snowflake while preserving unparalleled openness and flexibility through Iceberg on their own Amazon S3 storage, preventing vendor lock-in.
Further, the combined power of Tableflow and Snowflake analytics services opens transformative possibilities across industries. For instance, retailers can now leverage real-time customer interactions that are streamed via Confluent and instantly analyzed in Snowflake to personalize offers and optimize inventory on the fly. Financial services firms can detect fraudulent transactions and assess risk intra-day, moving beyond batch processing limitations. In manufacturing, streaming Internet of Things (IoT) sensor data can fuel predictive maintenance models and enhance quality control directly within Snowflake. Customer service organizations can power their AI with the freshest interaction data for more relevant next-best-action recommendations. These are just a few examples; virtually any scenario requiring fresh, reliable data for advanced analytics, from optimizing supply chains to enhancing patient care, benefits immensely from this seamless integration, accelerating insights and driving significant business value.
The integration of Tableflow with Snowflake Open Catalog is a significant step forward in making real-time data analytics more accessible, efficient, and governable. By eliminating traditional complexities and embracing open standards, Confluent and Snowflake are empowering organizations to unlock the full potential of their data in motion. This powerful combination allows them to focus on deriving value and insights from their data rather than the intricacies of pipeline management. Both Confluent and Snowflake are committed to fostering open and interoperable data ecosystems, and this integration is a testament to that shared vision.
The General Availability of Tableflow on Amazon Web Services (AWS) with Iceberg support and Snowflake Open Catalog integration marks a pivotal milestone, but our journey doesn't stop here. Throughout 2025 and beyond, Confluent will continue to invest heavily in Tableflow, enhancing its capabilities, performance, and integration points based on customer feedback and evolving market needs. Our roadmap includes features such as upserts, dead letter queues (DLQs), Flink integration, bidirectional data flow, and Azure and Google Cloud support to further streamline the flow of data from operational systems to analytical systems for driving insights and AI-driven actions. Together, Confluent and Snowflake will continue to empower organizations to build next-generation, real-time applications and analytics on the cloud.
Ready to eliminate ETL complexity and unlock real-time analytics on Snowflake? Explore Tableflow today!
See it in action: Watch our short introduction video or Tim Berglund's lightboard explanation.
Get started: If you're already using Confluent Cloud, navigate to the Tableflow section for your cluster. New users can sign up for Confluent Cloud on the AWS Marketplace and explore Tableflow's capabilities.
Contact us today for a personalized demo and start unlocking the full potential of your data on AWS. We’re incredibly excited to see how you leverage Tableflow and Snowflake to turn your real-time data streams into tangible business value!
Confluent and associated marks are trademarks or registered trademarks of Confluent, Inc.
Apache®, Apache Kafka®, Kafka®, Apache Flink®, Flink®, Apache Iceberg™️, and Iceberg™️ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.