Unity Catalog and dbt Deep Dive - Lakehouse Governance Best Practices
Unity Catalog and dbt Deep Dive - Lakehouse Governance Best Practices
“Between ‘we can do everything in Databricks’ and ‘we can do everything in dbt,’ there’s a practical balance worth finding.”
When using Databricks, you often run into questions like:
- We use dbt – why do we need Unity Catalog?
- We manage permissions with Unity Catalog – how far should we go with dbt’s
schema.ymltests and docs? - How should we name catalogs/schemas/tables so environment separation stays clean?
This post dives deep into architecture, governance, and patterns when using Unity Catalog (UC) and dbt together, from a practical standpoint.
📚 Table of Contents
- Role Split: Unity Catalog vs dbt
- Namespace Design: catalog.schema.table Strategy
- Permission Model: Mapping UC Permissions to dbt Roles
- Environment Separation: dev / staging / prod
- Where dbt Tests End and Unity Catalog Policies Begin
- Lineage and the Catalog: Connecting dbt, UC, and BI
- Practical Pattern: Domain-Centric Lakehouse + dbt Project
- Checklist Summary
🎭 Role Split: Unity Catalog vs dbt
It helps to clearly separate roles so the two tools don’t overlap in confusing ways.
Unity Catalog’s Domain
- Who can see what?
- Catalog / schema / table / column permissions
- Row- and column-level security, data masking
- Where did the data come from and where does it go?
- Lineage, system tables, audit logs
- Organization-wide source of truth
- e.g. “
prod.analytics.ordersis our official orders mart”
- e.g. “
dbt’s Domain
- How should data be transformed?
- SQL-based modeling, Jinja macros
- How do we enforce model quality?
schema.ymltests, documentation, docs site
- Data team productivity
- Developer experience, review/deploy/rollback workflows
Summary
- Unity Catalog: Security, governance, lineage, and official object definitions
- dbt: Transformation logic, quality, and domain modeling
Treat them as complementary, not competing: each fills gaps the other doesn’t.
🧭 Namespace Design: catalog.schema.table Strategy
Unity Catalog uses a three-level namespace.
-- catalog.schema.table
SELECT * FROM prod.analytics.orders;
Three Common Patterns
- Catalog = environment, schema = domain
- e.g.
dev.analytics.orders,prod.analytics.orders - Pros: Strong isolation per environment, intuitive permission model
- Cons: More catalogs as environments grow
- e.g.
- Catalog = domain, schema = layer (raw/stg/mart) (recommended)
- e.g.
analytics.raw.orders_raw,analytics.mart.orders - Environment separation handled via dbt profile / job config
- Pros: Domain-centric design, easy for BI/ML to reference
- e.g.
- Catalog = organization/platform, schema = domain + environment
- e.g.
corp.analytics_prod.orders,corp.analytics_dev.orders - Similar to legacy data warehouse naming
- e.g.
What Fits dbt Best
dbt naturally works with layer-based schemas:
# dbt project config example (dbt_project.yml)
models:
my_project:
raw:
+schema: raw
staging:
+schema: stg
marts:
+schema: mart
The best fit is:
- Catalog = domain (e.g. analytics, marketing)
- Schema = layer (raw, stg, mart)
Examples:
analytics.raw.events_rawanalytics.stg.events_staginganalytics.mart.events_sessionized
🔐 Permission Model: Mapping UC Permissions to dbt Roles
Basics of Unity Catalog Permissions
- Principal: User, group, or service principal
- Object: Catalog, schema, table, view, function, volume, etc.
- Privileges:
USAGE,SELECT,MODIFY,OWNERSHIP,APPLY TAG,EXECUTE, etc.
-- Example: analytics catalog and mart schema
GRANT USAGE ON CATALOG analytics TO `analytics_readonly`;
GRANT USAGE ON SCHEMA analytics.mart TO `analytics_readonly`;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.mart TO `analytics_readonly`;
Aligning with dbt “Roles”
Typical dbt-side roles:
analytics_dev: developersanalytics_ci: CI/CD pipelineanalytics_reader: reporting / analytics (read-only)
Map these to Unity Catalog groups or service principals:
-- Developer permissions
GRANT USAGE ON CATALOG analytics TO `grp_analytics_dev`;
GRANT USAGE, CREATE ON SCHEMA analytics.raw TO `grp_analytics_dev`;
GRANT USAGE, CREATE, MODIFY ON SCHEMA analytics.stg TO `grp_analytics_dev`;
GRANT USAGE, SELECT ON SCHEMA analytics.mart TO `grp_analytics_dev`;
-- Read-only
GRANT USAGE ON CATALOG analytics TO `grp_analytics_reader`;
GRANT USAGE ON SCHEMA analytics.mart TO `grp_analytics_reader`;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.mart TO `grp_analytics_reader`;
Pattern
- dbt execution (CI/Job) usually uses a machine account like
grp_analytics_ci- People (analysts, engineers) get separate groups with least-privilege access
🌱 Environment Separation: dev / staging / prod
You can separate environments in two main ways:
- At the Unity Catalog level
- e.g.
dev_analytics,stg_analytics,prod_analytics
- e.g.
- At the dbt profile / job level
- Same UC catalog, different schema / storage / cluster
1) Catalog-Based Environment Separation
CREATE CATALOG dev_analytics;
CREATE CATALOG prod_analytics;
# profiles.yml (catalog-based)
databricks_uc:
target: dev
outputs:
dev:
type: databricks
catalog: dev_analytics
schema: mart
...
prod:
type: databricks
catalog: prod_analytics
schema: mart
...
- Pros: Strong isolation, lower risk of touching prod by mistake
- Cons: More catalogs, more permission/policy management
2) Shared Catalog + Schema/Storage Separation (common pattern)
-- Single analytics catalog
CREATE CATALOG analytics;
-- Per-environment schemas
CREATE SCHEMA analytics.mart_dev;
CREATE SCHEMA analytics.mart_prod;
# profiles.yml (schema-based)
databricks_uc:
target: dev
outputs:
dev:
type: databricks
catalog: analytics
schema: mart_dev
...
prod:
type: databricks
catalog: analytics
schema: mart_prod
...
- Pros: Domain-centric catalog, easy for BI/ML to reference
- Cons: More schemas/tables; naming rules must be followed
Practical tip
- Small teams: Catalog = domain, schema = raw/stg/mart; separate environments via profile/workspace
- Larger enterprises: Often mix in catalog = environment for compliance
✅ Where dbt Tests End and Unity Catalog Policies Begin
Both tools touch “quality” and “policy,” so it’s useful to draw a line: dbt for what, UC/platform for what.
dbt’s Responsibility
- Logical model quality
not_null,unique,accepted_values,relationships, etc.
- Domain rules
- e.g.
status IN ('COMPLETED','CANCELLED') - e.g.
order_date <= current_date
- e.g.
- Documentation
- Column descriptions, business definitions, examples
version: 2
models:
- name: orders
description: "Fact table of orders from the application"
columns:
- name: order_id
tests:
- not_null
- unique
- name: status
tests:
- accepted_values:
values: ['CREATED', 'COMPLETED', 'CANCELLED']
Unity Catalog / Platform’s Responsibility
- Access control, masking, filtering
- PII, PHI, financial data, etc.
- Audit and compliance
- Who accessed what, when
- Organization-wide policy
- e.g. “SSN must always be masked for US residents”
-- Conceptual example: tag-based masking in UC
ALTER TABLE prod.analytics.customers
ALTER COLUMN ssn
SET TAGS ('classification' = 'pii');
CREATE MASKING POLICY mask_pii AS
(value STRING) RETURNS STRING ->
CASE
WHEN current_role() IN ('pii_approved_role') THEN value
ELSE '***-**-****'
END;
APPLY MASKING POLICY mask_pii
ON prod.analytics.customers.ssn;
Summary
- dbt: “Is the data correct?” (logical and domain quality)
- Unity Catalog: “Who can see what?” (security and compliance)
🔗 Lineage and the Catalog: Connecting dbt, UC, and BI
Lineage in dbt
- Model dependency graph via
dbt docsanddbt meta - Developer view: “Which raw sources feed this mart table?”
Lineage in Unity Catalog
- UC captures lineage at engine/query level
- SQL Warehouse, Notebooks, Workflows, BI tools, etc.
- Organization view: “Which reports, models, or notebooks use this table/view?”
-- Conceptual example: UC lineage (actual view/API may vary by version)
SELECT *
FROM system.lineage.relations
WHERE downstream_object = 'analytics.mart.orders';
Using Both Together
- dbt: “Code-level” lineage for developers
- Unity Catalog: “Platform-level” lineage for security, governance, and BI/ML owners
Treat the two lineage views as complementary, serving different audiences, rather than redundant.
🧩 Practical Pattern: Domain-Centric Lakehouse + dbt Project
Example Domain: analytics Lakehouse
1. Unity Catalog setup
CREATE CATALOG analytics;
CREATE SCHEMA analytics.raw;
CREATE SCHEMA analytics.stg;
CREATE SCHEMA analytics.mart;
2. dbt project structure
models/
raw/
raw_events.sql
staging/
stg_events.sql
marts/
fct_sessions.sql
dim_users.sql
# dbt_project.yml (summary)
models:
analytics_dbt:
raw:
+schema: raw
staging:
+schema: stg
marts:
+schema: mart
3. Connect to UC via profiles.yml
analytics_dbt:
target: dev
outputs:
dev:
type: databricks
catalog: analytics
schema: mart_dev
host: adb-123.45.azuredatabricks.net
http_path: /sql/1.0/warehouses/abc
token: ""
threads: 8
prod:
type: databricks
catalog: analytics
schema: mart
host: adb-123.45.azuredatabricks.net
http_path: /sql/1.0/warehouses/xyz
token: ""
threads: 16
4. Permissions / governance
-- Developer group
GRANT USAGE ON CATALOG analytics TO `grp_analytics_dev`;
GRANT USAGE, CREATE, MODIFY ON SCHEMA analytics.raw TO `grp_analytics_dev`;
GRANT USAGE, CREATE, MODIFY ON SCHEMA analytics.stg TO `grp_analytics_dev`;
GRANT USAGE, SELECT ON SCHEMA analytics.mart TO `grp_analytics_dev`;
-- Reader group (read-only)
GRANT USAGE ON CATALOG analytics TO `grp_analytics_reader`;
GRANT USAGE ON SCHEMA analytics.mart TO `grp_analytics_reader`;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.mart TO `grp_analytics_reader`;
5. Quality and lineage
- dbt: Column-level quality rules and descriptions in
schema.yml - Unity Catalog: Tags/labels for sensitivity, ownership, data domain; system tables and Lineage UI for end-to-end lineage
🧾 Checklist Summary
1. Role definition
- Unity Catalog: security, permissions, lineage, compliance
- dbt: modeling, transformation logic, tests, documentation
2. Namespace / environment strategy
catalog.schema.tabledesigned in a domain-centric way?- Decided whether dev/stg/prod are separated by catalog vs schema/workspace?
3. Permission model
- dbt execution (machine) account separate from human accounts?
- UC permissions designed around groups/roles?
4. Quality and policy
- Domain rules and model quality in dbt tests?
- Security, masking, and compliance in UC/platform?
5. Lineage and observability
- Lineage visible in both dbt and Unity Catalog?
- Change history traceable via system tables and logs?
“Unity Catalog and dbt are not replacements for each other – they fill each other’s gaps in the lakehouse.”
Use UC for organization-wide governance and safe data access, and dbt for domain logic, tests, and documentation. That way, data and governance teams can collaborate without stepping on each other’s toes.
Use this post as a reference to design a UC + dbt pattern that fits your team.