Databricks' Innovations in Open-Source AI and Data Governance: A Conversation with Ivo Everts
Ahead of the AI & Big Data Expo Europe, Ivo Everts, Senior Solutions Architect at Databricks, discussed key developments in open-source AI and data governance. Everts highlighted how Databricks is shaping AI and data management with the DBRX model, Unity Catalog, and newer products such as Databricks AI/BI and Mosaic AI.
Setting New Standards with the DBRX Model
Databricks has set a new benchmark for open-source large language models (LLMs) with the release of the DBRX model. The model outperformed many leading open LLMs on standard benchmarks and offers inference up to twice as fast as models like Llama2-70B. Everts explained that this achievement was made possible through advanced training techniques.
According to Everts, "DBRX is one of the best open-source models available today, excelling in various industry benchmarks, including language comprehension, programming, and mathematical tasks." The model aims to democratize the training of custom LLMs, allowing organizations to develop world-class models using their own data in a cost-effective manner.
Open-Source Unity Catalog: A Game Changer for Data Governance
Databricks has also open-sourced Unity Catalog, broadening its adoption across cloud platforms such as AWS and Azure, as well as on-premises infrastructure. This move supports consistent data governance regardless of where the data is stored or processed.
Everts highlighted several features of Unity Catalog that address data governance challenges:
- Centralized Data Access Management: Unity Catalog allows organizations to manage data access in a unified way.
- Role-Based Access Control (RBAC): This feature enables organizations to assign roles and permissions based on user profiles.
- Data Lineage and Auditing: Lineage tracks how data is used in detail, helping identify and eliminate outdated or redundant data. In addition, all data access and changes are logged to support compliance with data security policies.
- Cross-Cloud and Hybrid Support: Unity Catalog offers governance across multi-cloud and hybrid environments, ensuring consistent management no matter where the data resides.
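As a rough illustration of how role-based access control and audit logging fit together, here is a minimal Python sketch. It is not Unity Catalog's API: the `Catalog` class, the role names, and the privilege strings are hypothetical, chosen only to show that every access decision is derived from a role and then recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-privilege mapping (an assumption for this sketch,
# not Unity Catalog's actual privilege model).
ROLE_PRIVILEGES = {
    "analyst": {"SELECT"},
    "engineer": {"SELECT", "MODIFY"},
}


@dataclass
class Catalog:
    """Toy central catalog: one place for role assignments and the audit trail."""

    user_roles: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def assign_role(self, user: str, role: str) -> None:
        """Give a user exactly one role; unknown roles are rejected."""
        if role not in ROLE_PRIVILEGES:
            raise ValueError(f"unknown role: {role}")
        self.user_roles[user] = role

    def check_access(self, user: str, table: str, privilege: str) -> bool:
        """Decide access from the user's role and log the decision either way."""
        role = self.user_roles.get(user)
        allowed = privilege in ROLE_PRIVILEGES.get(role, set())
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "table": table,
                "privilege": privilege,
                "allowed": allowed,
            }
        )
        return allowed


if __name__ == "__main__":
    catalog = Catalog()
    catalog.assign_role("ana", "analyst")
    print(catalog.check_access("ana", "sales.orders", "SELECT"))
    print(catalog.check_access("ana", "sales.orders", "MODIFY"))
    print(len(catalog.audit_log), "audit entries recorded")
```

Denied requests are logged as well as granted ones, which is what makes the audit trail useful for compliance reviews.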
The Rise of Databricks AI/BI for Business Intelligence
Databricks has introduced Databricks AI/BI, a business intelligence tool that uses generative AI to enhance data exploration and visualization. Everts believes that a truly intelligent BI system must understand the unique nuances of a business in order to give business users useful answers.
The AI/BI system consists of two main components:
- Dashboards: An AI-powered, low-code interface for creating interactive dashboards that include features like visualizations, cross-filtering, and periodic reports.
- Genie: A conversational interface that allows users to ask ad-hoc questions using natural language. Genie continuously learns from underlying data to offer better visualizations and suggestions over time.
Everts explained that Databricks AI/BI enables self-service data analysis for everyone in an organization, powered by a compound AI system that learns from data usage across the entire data stack.
Mosaic AI: A Unified Platform for AI and ML Solutions
Databricks also unveiled Mosaic AI, a comprehensive platform for building, deploying, and managing machine learning (ML) and generative AI applications. Mosaic AI integrates enterprise data to enhance both performance and governance, offering the following key components:
- Unified Tooling: Tools for building, deploying, and managing AI and ML solutions.
- Generative AI Patterns: Supports prompt engineering, retrieval-augmented generation (RAG), and fine-tuning of AI models.
- Centralized Model Management: Allows for centralized governance and querying of AI models, including custom and foundation models.
- Monitoring and Governance: Ensures comprehensive tracking and governance across the entire AI lifecycle through Lakehouse Monitoring and Unity Catalog.
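Of the generative AI patterns listed above, retrieval-augmented generation is the easiest to sketch in a few lines. The Python example below is a simplified illustration, not Mosaic AI's implementation: it ranks documents by plain keyword overlap as a stand-in for the vector search a production system would use, and the function names `retrieve` and `build_prompt` are hypothetical. The resulting prompt would then be sent to whichever model an organization has chosen.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the question.

    Keyword overlap is an assumption made to keep the sketch dependency-free;
    real systems typically compare embeddings instead.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: list, k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Refunds are processed within 5 business days.",
        "Our office is closed on public holidays.",
        "Refunds require a receipt.",
    ]
    print(build_prompt("How long do refunds take?", docs))
```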
Everts emphasized that Mosaic AI enables companies to train and serve custom LLMs in a cost-effective way, tailored to their specific needs. The platform also features fast startup times, live prompt evaluation, and support for custom pre-trained checkpoints, making it an ideal solution for organizations looking to scale their AI capabilities.
The Data Intelligence Platform: A Unified Approach to AI and Data Management
At the core of these innovations is the Data Intelligence Platform, which Everts describes as a revolutionary solution that transforms data management through the use of AI models. The platform combines data lake and data warehouse features using Delta Lake technology for real-time processing and Delta Sharing for secure data exchange across organizations.
The Data Intelligence Platform enables businesses to:
- Leverage a Unified Data and AI Architecture: Combining the capabilities of data lakes and warehouses into a single, scalable platform.
- Ensure Real-Time Data Processing: Through Delta Lake, providing reliable governance and real-time insights.
- Collaborate and Share Data: Using Delta Sharing for secure and open data collaboration across organizational boundaries.
- Integrate ML and AI: With support for popular ML tools and frameworks such as MLflow, PyTorch, and TensorFlow.
- Achieve Scalability and High Performance: The platform’s cloud-native architecture and Photon engine optimize query execution for improved performance.
Opportunities and Challenges of AI Deployment
Everts noted that AI, particularly generative AI, is rapidly expanding the range of technologies in the data ecosystem. New tools and models are emerging at a fast pace, and companies are looking for ways to integrate these technologies seamlessly into their existing workflows. While this can be challenging, it also presents significant opportunities. Databricks' platforms, including Mosaic AI, offer the flexibility and adaptability needed to incorporate new AI tools into existing systems without requiring extensive reworking.
Case Study: Domino’s Pizza
Domino’s Pizza is a prime example of how Databricks solutions are used to orchestrate large-scale data pipelines. With over 20,000 stores worldwide, Domino’s manages more than 3,000 data pipelines that pull data from internal systems, sales data, and third-party sources. This data is then processed to inform decisions related to food quality, customer satisfaction, and operational efficiency.
Control-M, BMC's workflow orchestration platform, orchestrates these workflows, integrating technologies such as Apache Kafka, SQL Server, and Power BI. The platform also provides end-to-end visibility of the pipelines, helping Domino's meet strict service-level agreements (SLAs) while scaling operations efficiently.
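At its core, orchestrating pipelines like these means running each task only after the tasks it depends on have finished. The Python sketch below shows that idea using the standard library's `graphlib.TopologicalSorter`. It does not represent Control-M or Domino's actual setup: the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_sales": set(),
    "ingest_third_party": set(),
    "clean": {"ingest_sales", "ingest_third_party"},
    "aggregate": {"clean"},
    "publish_report": {"aggregate"},
}


def run_pipeline(dependencies: dict, tasks: dict) -> list:
    """Run every task after its dependencies and return the execution order."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()  # a real orchestrator would also handle retries and alerts
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Placeholder task bodies; each one just announces itself.
    tasks = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    run_pipeline(PIPELINE, tasks)
```

A production orchestrator adds scheduling, monitoring, and SLA tracking on top of this ordering, but the dependency graph remains the foundation.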
The Future of Databricks: What’s Next?
Looking ahead, Databricks plans to continue expanding its capabilities across cloud platforms while maintaining a focus on enabling businesses to adopt modern technologies through its platform. In addition to introducing more integrations with public cloud providers, Databricks aims to enhance collaboration within DataOps by offering persona-based user experiences.
Everts also hinted at new developments in data quality and its integration with data orchestration, ensuring that data quality becomes a first-class citizen within the broader data ecosystem. This will further strengthen Databricks’ position as a leader in AI and data governance.
Conclusion
Databricks is at the forefront of transforming how businesses leverage AI and data governance. With innovations like DBRX, Unity Catalog, Databricks AI/BI, and Mosaic AI, the company is helping organizations democratize AI, streamline data management, and scale their data-driven initiatives. As the AI landscape continues to evolve, Databricks’ focus on open-source solutions and flexible data governance will play a crucial role in shaping the future of AI and data management.
