
Connect Databricks as a Data Source

Step-by-step guide to connecting your Databricks workspace to Crossbeam as a data source.

Written by Joy Rudnick


Overview

Databricks as a Data Source pulls account, contact, deal, lead, and user data from your Databricks workspace into Crossbeam. Once connected, Crossbeam uses this data to generate partner overlaps.

Crossbeam integrates with Databricks in two directions. This article covers Databricks as a Data Source (pull), which syncs your data into Crossbeam. You can also use the Databricks Integration (push) to send partner overlaps back into your Databricks workspace. The two are independent and can point to different catalogs and schemas.

👀 Looking to push Crossbeam overlap data back into Databricks?


Plan Availability

Free

Connector

Supernode

Enterprise

Connect Databricks as a Data Source


Prerequisites

Before connecting, confirm you have the following in your Databricks environment:

  • A Databricks workspace with Unity Catalog enabled

  • A SQL Warehouse that Crossbeam can connect to. You'll need the following two values, both shown under your SQL Warehouse → Connection Details:

    • Server hostname (e.g. dbc-12345abc-6789.cloud.databricks.com)

    • HTTP path (e.g. /sql/1.0/warehouses/abc123def456)

  • A Service Principal with an OAuth client secret. Crossbeam authenticates as this Service Principal and does not support personal access tokens or interactive OAuth.


Creating the Service Principal

In your Databricks account console:

  • Navigate to Settings → Identity and access → Service principals

  • Click Add service principal

  • Open the newly created Service Principal and navigate to Secrets → Generate secret

  • Copy the Client ID (Application ID) and Client Secret. Crossbeam needs both.

    • The Client Secret is shown only once. Store it securely before closing the window.

  • Grant the Service Principal the Can use permission on your SQL Warehouse:

    • In your workspace, navigate to SQL Warehouses → your warehouse → Permissions, then add the Service Principal with Can use

✍️ Note

If you plan to enable both Databricks as a Data Source and the Databricks Integration, you can use the same Service Principal for both.


Step 1: Create the Catalog and Schema

Create a dedicated catalog and schema for the data you're sharing with Crossbeam. You can reuse existing objects, but most customers create a dedicated crossbeam schema.

CREATE CATALOG IF NOT EXISTS crossbeam_share;
CREATE SCHEMA  IF NOT EXISTS crossbeam_share.crm;

💡 Check out these Databricks resources if you need help with creating a catalog and creating a schema.


Step 2: Create the Source Tables

The only required table is accounts. Add other tables only if they're relevant to your use case. Crossbeam will automatically discover any extra columns you add and surface them as fields in the UI.

About the users table: This table holds your CRM users, the record owners behind accounts, deals, and leads. It powers AE-attribution features in Crossbeam. To set it up, give your accounts, deals, and/or leads tables an owner_id STRING column that references users.id (the tables below already include one).

Add an optional is_deleted BOOLEAN column to any table to enable soft-delete detection. When rows are flagged as deleted, Crossbeam removes them from your overlaps.

-- REQUIRED: accounts
CREATE TABLE IF NOT EXISTS crossbeam_share.crm.accounts (
  id          STRING    NOT NULL,
  name        STRING,
  website     STRING,
  duns_number STRING,
  owner_id    STRING,
  is_deleted  BOOLEAN,
  updated_at  TIMESTAMP NOT NULL
) USING DELTA;

-- OPTIONAL: contacts
CREATE TABLE IF NOT EXISTS crossbeam_share.crm.contacts (
  id         STRING    NOT NULL,
  name       STRING,
  email      STRING,
  account_id STRING,
  is_deleted BOOLEAN,
  updated_at TIMESTAMP NOT NULL
) USING DELTA;

-- OPTIONAL: deals
CREATE TABLE IF NOT EXISTS crossbeam_share.crm.deals (
  id         STRING    NOT NULL,
  account_id STRING,
  amount     DECIMAL(18,2),
  owner_id   STRING,
  is_deleted BOOLEAN,
  updated_at TIMESTAMP NOT NULL
) USING DELTA;

-- OPTIONAL: leads
CREATE TABLE IF NOT EXISTS crossbeam_share.crm.leads (
  id         STRING    NOT NULL,
  email      STRING,
  owner_id   STRING,
  is_deleted BOOLEAN,
  updated_at TIMESTAMP NOT NULL
) USING DELTA;

-- OPTIONAL: users
CREATE TABLE IF NOT EXISTS crossbeam_share.crm.users (
  id         STRING    NOT NULL,
  email      STRING    NOT NULL,
  name       STRING,
  phone      STRING,
  title      STRING,
  is_deleted BOOLEAN,
  updated_at TIMESTAMP NOT NULL
) USING DELTA;
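If you create the users table, a quick integrity check can confirm that owner_id values actually resolve to users. The query below is a sketch, assuming the table names from the DDL above: it lists accounts whose owner_id has no matching users.id. Swap accounts for deals or leads to check those tables.

```sql
-- Accounts whose owner_id does not match any users.id.
-- Rows with a NULL owner_id are excluded; they simply have no owner.
SELECT a.id, a.owner_id
FROM crossbeam_share.crm.accounts AS a
LEFT JOIN crossbeam_share.crm.users AS u
  ON a.owner_id = u.id
WHERE a.owner_id IS NOT NULL
  AND u.id IS NULL;
```

An empty result means every owner reference resolves to a user.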


✍️ Note

Every table must include id (STRING) and updated_at (TIMESTAMP) for incremental syncs to work.
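Because incremental syncs are bookmarked on updated_at, a soft delete most likely needs to advance updated_at as well; otherwise the flagged row may not be picked up until a full re-sync. A minimal sketch, assuming a hypothetical account ID:

```sql
-- Flag one account as deleted and bump updated_at so the change
-- lands after the current sync bookmark.
UPDATE crossbeam_share.crm.accounts
SET is_deleted = TRUE,
    updated_at = current_timestamp()
WHERE id = '0011234567ABCDE';  -- hypothetical account ID
```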


Step 3: Grant the Service Principal Read Access

Replace crossbeam-sp with the display name or UUID of your Service Principal.

GRANT USE CATALOG ON CATALOG crossbeam_share          TO `crossbeam-sp`;
GRANT USE SCHEMA  ON SCHEMA  crossbeam_share.crm      TO `crossbeam-sp`;
GRANT SELECT      ON SCHEMA  crossbeam_share.crm      TO `crossbeam-sp`;

To grant access table-by-table instead of schema-wide:

GRANT SELECT ON TABLE crossbeam_share.crm.accounts  TO `crossbeam-sp`;
GRANT SELECT ON TABLE crossbeam_share.crm.contacts  TO `crossbeam-sp`;
-- … etc.
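To confirm the grants took effect, you can list what the Service Principal holds on each object with Databricks' SHOW GRANTS statement. As above, replace crossbeam-sp with your Service Principal.

```sql
-- Expect USE CATALOG on the catalog.
SHOW GRANTS `crossbeam-sp` ON CATALOG crossbeam_share;

-- Expect USE SCHEMA and SELECT on the schema.
SHOW GRANTS `crossbeam-sp` ON SCHEMA crossbeam_share.crm;

-- If you granted SELECT table by table, check each table instead.
SHOW GRANTS `crossbeam-sp` ON TABLE crossbeam_share.crm.accounts;
```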

Step 4: Connect from Crossbeam

  • In Crossbeam, navigate to Data Sources → click the Databricks tile

Enter the following when prompted:

  • Server hostname: from SQL Warehouse → Connection Details

  • HTTP path: from SQL Warehouse → Connection Details

  • Catalog: crossbeam_share

  • Schema: crm

  • Client ID: the Service Principal's Application ID

  • Client Secret: the OAuth client secret generated for the Service Principal

Click Connect.

Crossbeam will:

  • Connect using your Service Principal credentials

  • Verify the required accounts table is present

  • Discover all other tables and columns

  • Validate required field types
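To catch type problems before Crossbeam's validation does, you can run a similar check yourself. This sketch approximates the validation rather than reproducing it: it queries the catalog's information_schema for the id and updated_at columns of every table in the crm schema. Each table should return two rows, id with type STRING and updated_at with type TIMESTAMP.

```sql
-- One row per required column per table; a table with fewer than
-- two rows is missing id or updated_at.
SELECT table_name, column_name, data_type
FROM crossbeam_share.information_schema.columns
WHERE table_schema = 'crm'
  AND column_name IN ('id', 'updated_at')
ORDER BY table_name, column_name;
```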

Once connected, the initial sync starts automatically. Subsequent syncs are incremental and run against the updated_at column.
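Since syncs are incremental on updated_at, whatever job loads these tables should ideally advance updated_at only when a row actually changes. One way to do that is a Delta MERGE. The sketch below assumes a hypothetical staging table, main.staging.crm_accounts, that carries id, name, website, duns_number, and owner_id.

```sql
MERGE INTO crossbeam_share.crm.accounts AS t
USING main.staging.crm_accounts AS s  -- hypothetical staging table
  ON t.id = s.id
-- Update only rows whose values changed, so unchanged rows keep
-- their old updated_at and are not re-synced.
WHEN MATCHED AND (
       t.name        IS DISTINCT FROM s.name
    OR t.website     IS DISTINCT FROM s.website
    OR t.duns_number IS DISTINCT FROM s.duns_number
    OR t.owner_id    IS DISTINCT FROM s.owner_id
  ) THEN UPDATE SET
    t.name        = s.name,
    t.website     = s.website,
    t.duns_number = s.duns_number,
    t.owner_id    = s.owner_id,
    t.updated_at  = current_timestamp()
-- New rows get a fresh updated_at and start out not deleted.
WHEN NOT MATCHED THEN
  INSERT (id, name, website, duns_number, owner_id, is_deleted, updated_at)
  VALUES (s.id, s.name, s.website, s.duns_number, s.owner_id, FALSE, current_timestamp());
```

IS DISTINCT FROM is used instead of <> so that a change to or from NULL still counts as a change.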


Supported Features

  • Authentication: OAuth M2M (Service Principal client_id and client_secret)

  • Supported objects: accounts (required), contacts, deals, leads, users

  • Incremental sync: Yes, bookmarked on updated_at

  • Full re-sync: Yes, resets the bookmark

  • Soft-delete detection: Yes, opt-in via an is_deleted BOOLEAN column

  • Preview sync: Yes, the last 10,000 records ordered by updated_at DESC

  • Discovery: Automatic via INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS

  • Per-org record limit: Yes; contact support if you exceed it
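If you're unsure whether you're near the per-org record limit, you can count what you'd be sending. This sketch counts rows in accounts and contacts and assumes both tables exist; add a UNION ALL branch for each other table you created.

```sql
-- Total and soft-deleted rows per shared table.
SELECT 'accounts' AS table_name,
       count(*) AS total_rows,
       count_if(coalesce(is_deleted, FALSE)) AS soft_deleted_rows
FROM crossbeam_share.crm.accounts
UNION ALL
SELECT 'contacts', count(*), count_if(coalesce(is_deleted, FALSE))
FROM crossbeam_share.crm.contacts;
```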


Supported Column Types

Each Databricks column type maps to a Crossbeam field type as follows:

  • STRING, VARCHAR, CHAR → Text

  • INT, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, NUMERIC, REAL → Number

  • DATE, TIMESTAMP, TIMESTAMP_NTZ, TIMESTAMP_LTZ → Timestamp (with time zone)

  • BOOLEAN → Boolean

  • ARRAY, MAP, STRUCT, INTERVAL, BINARY → Not supported; the column is skipped
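To find columns that will be skipped, you can query information_schema for the unsupported types. A sketch: the regular expression matches type names by prefix, on the assumption that data_type values for these columns start with the type keyword (for example ARRAY or INTERVAL).

```sql
-- Columns in the shared schema whose types Crossbeam skips.
SELECT table_name, column_name, full_data_type
FROM crossbeam_share.information_schema.columns
WHERE table_schema = 'crm'
  AND upper(data_type) RLIKE '^(ARRAY|MAP|STRUCT|INTERVAL|BINARY)'
ORDER BY table_name, ordinal_position;
```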


✍️ Note

The deals.amount column is treated as a money type. Complex types (ARRAY, MAP, STRUCT, INTERVAL, BINARY) are not supported. To share their contents, flatten them into separate columns or views.
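As a sketch of the flattening approach, the view below exposes a hypothetical source table, main.sales.accounts_raw, as the accounts object. It pulls a STRUCT's fields into their own columns and joins an ARRAY into a single string. The source table and its billing_address and tags columns are assumptions. Use a view like this in place of the accounts table from Step 2, not alongside it, because a table and a view can't share a name in the same schema.

```sql
-- Hypothetical source columns:
--   billing_address STRUCT<city: STRING, country: STRING>
--   tags            ARRAY<STRING>
CREATE OR REPLACE VIEW crossbeam_share.crm.accounts AS
SELECT
  id,
  name,
  website,
  owner_id,
  billing_address.city    AS billing_city,     -- STRUCT field → Text column
  billing_address.country AS billing_country,  -- STRUCT field → Text column
  array_join(tags, ', ')  AS tags,             -- ARRAY<STRING> → Text column
  is_deleted,
  updated_at
FROM main.sales.accounts_raw;
```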


Limitations

  • Authentication: Service Principal OAuth M2M only. Personal access tokens and interactive OAuth are not supported.

  • Required table: The accounts table is mandatory. The connection cannot be enabled without it.

  • Required columns: Every table must include id (STRING) and updated_at (TIMESTAMP) for incremental syncs to work.

  • Unsupported types: Complex Databricks types (ARRAY, MAP, STRUCT, INTERVAL, BINARY) are skipped.


Manage Your Databricks Connection

In Crossbeam, navigate to Data Sources → click the Settings icon next to your Databricks connection.

From here, you can open the following tabs:

  • General: Adjust how and when Crossbeam syncs data from your Databricks workspace, check the status of the integration, and see connection details

  • Field Sync: Control which individual fields are pulled into Crossbeam from Databricks

  • Field Presets: Create a custom set of fields to control which data is shared with partners

  • Field Mapping: Map your Databricks workspace as the data source, establish your opportunity fields, and set your product line type field

To delete the connection from your Crossbeam account, click Remove data source.

Click Save when done managing your settings.

