Cognite Databricks Integration Documentation
Overview
cognite-databricks provides two approaches for registering and using User-Defined Table Functions (UDTFs) to query CDF data from Databricks:
- Session-Scoped Registration: Temporary registration for development and testing
- Catalog-Based Registration: Permanent registration in Unity Catalog for production
Choosing the Right Approach
Use Session-Scoped Registration When:
- ✅ Developing and Testing: Quickly test UDTFs before committing them to Unity Catalog
- ✅ Prototyping: Experimenting with different configurations and queries
- ✅ Temporary Analysis: Running ad-hoc queries without permanent registration
- ✅ Learning: Getting familiar with UDTFs and CDF data access patterns
Key Characteristics:
- UDTFs are registered within a single Spark session
- No Unity Catalog access required
- Credentials are passed directly in SQL queries (or via Secret Manager)
- Automatically cleaned up when the session ends
- Faster setup, ideal for development
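To make the "credentials in the query" trade-off concrete, here is a minimal sketch of building such a SQL statement. The UDTF name `assets_udtf` and its named arguments are hypothetical placeholders, not the exact signature generated by cognite-databricks:

```python
# Hypothetical example: the UDTF name and its credential parameters are
# placeholders, not the actual generated signature.
def session_scoped_sql(udtf_name: str, client_id: str, client_secret: str) -> str:
    """Build a Spark SQL statement that passes CDF credentials inline."""
    return (
        f"SELECT * FROM {udtf_name}("
        f"client_id => '{client_id}', "
        f"client_secret => '{client_secret}')"
    )

sql = session_scoped_sql("assets_udtf", "my-app-id", "***")
# In a Databricks notebook this would be executed with spark.sql(sql).
print(sql)
```

Because the credentials are visible in the query text, this mode suits development; production setups should prefer Secret Manager or catalog-based registration.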
Use Catalog-Based Registration When:
- ✅ Production Deployments: Permanent registration with Unity Catalog governance
- ✅ Data Discovery: Views are indexed and searchable in the Databricks UI
- ✅ Access Control: Fine-grained permissions via GRANT/REVOKE
- ✅ Enterprise Security: Credentials stored securely in Databricks Secret Manager
- ✅ Team Collaboration: Shared, discoverable data assets across teams
Key Characteristics:
- UDTFs and views are registered in Unity Catalog
- Permanent, discoverable, and governable
- Credentials managed via Secret Manager (no credentials in SQL)
- Views provide a simplified query interface
- Requires Unity Catalog access and Secret Manager setup
Quick Start
Session-Scoped (Development)
```python
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml

# Authenticate against CDF using a TOML configuration file
client = load_cognite_client_from_toml("config.toml")

# data_model_id identifies the CDF data model to generate UDTFs for
generator = generate_udtf_notebook(data_model_id, client)
generator.register_session_scoped_udtfs()
```
See the Session-Scoped Documentation for a complete guide.
Catalog-Based (Production)
```python
from cognite.databricks import generate_udtf_notebook, SecretManagerHelper
from databricks.sdk import WorkspaceClient

# Workspace client used for Unity Catalog and Secret Manager operations
workspace_client = WorkspaceClient()

# `client` and `data_model_id` are set up as in the session-scoped example
generator = generate_udtf_notebook(data_model_id, client, workspace_client=workspace_client)
result = generator.register_udtfs_and_views(secret_scope="cdf_scope")
```
See the Catalog-Based Documentation for a complete guide.
Documentation Structure
Session-Scoped UDTF Registration
Catalog-Based UDTF Registration
- Quickstart
- Prerequisites
- Secret Manager
- Registration
- Views
- Querying
- Filtering
- Joining
- Time Series
- SQL-Native Time Series (Alpha)
- Governance
- Troubleshooting
Examples
All examples are available in the examples/ directory:
- Session-Scoped Examples: examples/session_scoped/
- Catalog-Based Examples: examples/catalog_based/
Package Architecture
cognite-databricks extends pygen-spark with Databricks-specific features:
- Code Generation: Uses `pygen-spark` for template-based UDTF generation (both Data Model and Time Series UDTFs)
- Generic Components: Generic utilities (`TypeConverter`, `CDFConnectionConfig`, `to_udtf_function_name`) are provided by `pygen-spark` and re-exported from `cognite.databricks` for backward compatibility
- Databricks-Specific: Unity Catalog registration, Secret Manager integration, and Databricks-specific utilities
Import Paths for Generic Components:
```python
# Preferred: import directly from pygen-spark (the source package)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name

# Backward compatible: still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name
```
Related Resources
- README: Package overview and installation
- Technical Plan: Architecture and design details
- pygen-spark Documentation: Generic Spark UDTF code generation library (works with any Spark cluster)