Quickstart (Unity Catalog) — start here

This guide is the recommended starting path for all customers. It covers one flow: install → CDF + Databricks clients → generate UDTFs → Secret Manager → register UDTFs and Views in Unity Catalog.

Notebook (copy-paste friendly): quickstart.ipynb on GitHub — same steps as below, with markdown explanations and inline comments in each code cell.

Prerequisites: see Prerequisites (Unity Catalog, Secret Manager, a CDF data model, and a TOML file with [cognite] credentials).


1. Install

Install the package from PyPI. In a Databricks notebook:

# Installs cognite-databricks (Unity Catalog registration, Secret Manager, notebook helpers).
%pip install --upgrade cognite-databricks

Restart the kernel if the installer tells you to.


2. Imports and CDF client

  • load_cognite_client_from_toml: builds a Cognite client from a TOML file, the same approach as the cognite-pygen notebooks. The client loads the data model and talks to CDF.
  • TOML during provisioning: the credentials file is read only to seed Secret Manager. End users who query Views do not use this file; the SQL reads credentials through SECRET().
from cognite.databricks import generate_udtf_notebook
from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.pygen import load_cognite_client_from_toml
from databricks.sdk import WorkspaceClient

import toml

# Path to your CDF config in the workspace (see example structure in prerequisites).
toml_file_path = "/Workspace/Users/<your-email>/config/credentials.toml"

# CDF API client — validates config and fetches the data model for codegen.
client = load_cognite_client_from_toml(toml_file_path)

3. Databricks workspace and SQL warehouse

  • WorkspaceClient: authenticates to Databricks (PAT/OAuth in the notebook environment) for SQL and Secrets APIs.
  • Warehouse: CREATE FUNCTION / view DDL runs through a SQL warehouse. Pick one your principal can use (often your team’s default SQL warehouse).
# Uses notebook-attached Databricks credentials.
workspace_client = WorkspaceClient()

# List the SQL warehouses visible to this principal (name + id).
warehouses = list(workspace_client.warehouses.list())
for w in warehouses:
    print(f"Name: {w.name}, ID: {w.id}")

if not warehouses:
    raise RuntimeError("No SQL warehouses found. Create one in Databricks SQL.")

# Pick explicitly — avoid using the last loop variable by mistake.
warehouse = warehouses[0]
# Or: next(w for w in warehouses if w.name == "Analytics WH")

print(f"Using warehouse: {warehouse.name} ({warehouse.id})")
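
The by-name lookup in the comment above raises a bare StopIteration when no warehouse matches. A minimal sketch of a more explicit selection follows. The helper name pick_warehouse, the "Starter WH" entry, and the ids are hypothetical. The stand-in objects only mimic the .name and .id attributes used in this guide, so the sketch runs outside Databricks.

```python
from types import SimpleNamespace


def pick_warehouse(warehouses, name=None):
    """Return the warehouse called `name`, or the first one when name is None."""
    if not warehouses:
        raise RuntimeError("No SQL warehouses found. Create one in Databricks SQL.")
    if name is None:
        return warehouses[0]
    for candidate in warehouses:
        if candidate.name == name:
            return candidate
    # Fail with the available names instead of a bare StopIteration.
    available = ", ".join(sorted(w.name for w in warehouses))
    raise LookupError(f"No SQL warehouse named {name!r}. Available: {available}")


# Hypothetical stand-ins; in the notebook, pass
# list(workspace_client.warehouses.list()) instead.
demo_warehouses = [
    SimpleNamespace(name="Starter WH", id="abc123"),
    SimpleNamespace(name="Analytics WH", id="def456"),
]

chosen = pick_warehouse(demo_warehouses, name="Analytics WH")
print(f"Using warehouse: {chosen.name} ({chosen.id})")
```

In the notebook, the result can be assigned to warehouse so that warehouse.id is passed on unchanged in the next step.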

4. Generate UDTF Python files

  • generate_udtf_notebook: uses cognite-pygen-spark templates to write Python UDTF modules under output_dir.
  • catalog / schema: Unity Catalog destination for functions and views.
  • workspace_client: required for catalog-based registration and Secret Manager helpers.
# CDF data model to expose in Databricks (space, external id, version).
data_model_id = DataModelId(space="cdf_cdm", external_id="CogniteCore", version="v1")

generator = generate_udtf_notebook(
    data_model_id,
    client,
    workspace_client=workspace_client,
    output_dir="/Workspace/Users/<your-email>/udtf_generated",
    catalog="my_catalog",  # Your Unity Catalog name
    schema="CDF_CogniteCore_v1",  # Or None to auto-derive from the data model
    warehouse_id=warehouse.id,
    debug=False,
)

5. Persist CDF credentials in Secret Manager

  • Scope naming: cdf_{space}_{external_id.lower()} aligns with generated SQL that references SECRET('cdf_...', 'client_id'), etc.
  • set_cdf_credentials: creates the scope if missing; stores project, cdf_cluster, client_id, client_secret, tenant_id.
# Stable secret scope name for this data model.
secret_scope = f"cdf_{data_model_id.space}_{data_model_id.external_id.lower()}"

# Load the same TOML again and push values into Databricks Secret Manager.
toml_content = toml.load(toml_file_path)
cognite_config = toml_content.get("cognite", {})

generator.secret_helper.set_cdf_credentials(
    scope_name=secret_scope,
    project=cognite_config["project"],
    cdf_cluster=cognite_config["cdf_cluster"],
    client_id=cognite_config["client_id"],
    client_secret=cognite_config["client_secret"],
    tenant_id=cognite_config["tenant_id"],
)

More detail: Secret Manager.


6. Register UDTFs, then Views

  • register_udtfs: creates Unity Catalog functions from generated files.
  • register_views: creates views over those UDTFs so analysts can query without embedding secrets.
  • if_exists: replace for iterative setup; skip / error for stricter pipelines.
  • debug=True: verbose logging of registration steps.
# Register functions first (views depend on them).
udtf_result = generator.register_udtfs(
    secret_scope=secret_scope,
    if_exists="replace",  # "skip" | "replace" | "error"
    debug=False,
)

view_result = generator.register_views(
    secret_scope=secret_scope,
    if_exists="replace",
    debug=False,
)

Optional: single call (Runtime 18.1+)

result = generator.register_udtfs_and_views(
    secret_scope=secret_scope,
    if_exists="replace",
    debug=False,
)

Note: register_udtfs_and_views() requires Databricks Runtime 18.1+. On older runtimes, or when debugging step by step, keep the two separate calls shown above.
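
One way to choose between the two paths in a shared notebook is to compare the runtime version numerically. The sketch below assumes the cluster exposes its version in the DATABRICKS_RUNTIME_VERSION environment variable in "major.minor" form. That variable and its format are assumptions, not something this guide documents, and the helper name runtime_at_least is hypothetical. Unparseable values fall back to the two-call path.

```python
import os
import re


def runtime_at_least(version_string, minimum=(18, 1)):
    """Return True when a 'major.minor' version string is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version_string or "")
    if match is None:
        # Unknown or unexpected format: be conservative.
        return False
    # Compare as integer tuples so that "18.10" ranks above "18.9".
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Assumed environment variable; empty when run outside Databricks.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
use_single_call = runtime_at_least(runtime)
print(f"Runtime {runtime or 'unknown'}: single call supported = {use_single_call}")
```

When use_single_call is False, the notebook would run register_udtfs followed by register_views, as in the two-call example.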


7. Verify and continue

Next steps