Databricks integration for HyperDB in Airtable


Plan availability

Enterprise Scale plans only

Permissions

  • Enterprise Admin

  • Integration Admin

Platform(s)

Web/Browser

The Databricks HyperDB integration allows admins to sync a table of data from Databricks into Airtable.

  • Because HyperDB is the destination for this integration, you can sync large datasets from Databricks. One HyperDB table can contain up to 100M records.

  • The Databricks integration is a one-way sync into HyperDB, so changes made in Airtable or HyperDB cannot currently be synced back into Databricks.

Understanding Databricks requirements when integrating with Airtable HyperDB

  • Setup requires at least workspace admin access in Databricks, since you will need to create a service principal and configure permissions to prepare your Databricks instance for integration with Airtable. Gather the following information:

    • Databricks Account ID

    • Databricks Workspace URL

    • Service Principal Client ID and Secret

  • Databricks customers must have a SQL Warehouse configured.

  • Tables exported with the Databricks connector must be configured as “Delta” tables within Databricks for versioning purposes.

  • Tables must have a unique primary key, which you will select during the table creation flow.

    • Note that uniqueness is not enforced by Databricks SQL.

    • Duplicate keys will lead to import errors. A quick way to check for duplicates is shown in the sketch after this list.

  • We currently offer first-class support for AWS-hosted Databricks instances.
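
One way to confirm these prerequisites before setup is to connect as the service principal and check the source table for duplicate primary keys. The following is a minimal sketch using the Databricks Python SDK (databricks-sdk); the workspace URL, warehouse ID, table name, and primary-key column are placeholder values, not something Airtable requires you to run.

from databricks.sdk import WorkspaceClient

# Authenticate as the Airtable service principal using the OAuth
# machine-to-machine credentials gathered above (placeholder values).
w = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    client_id="YOUR_SERVICE_PRINCIPAL_CLIENT_ID",
    client_secret="YOUR_SERVICE_PRINCIPAL_SECRET",
)

# Look for duplicate values in the column you plan to use as the primary key.
# Uniqueness is not enforced by Databricks SQL, and duplicates cause import errors.
resp = w.statement_execution.execute_statement(
    warehouse_id="YOUR_SQL_WAREHOUSE_ID",
    statement="""
        SELECT id, COUNT(*) AS n           -- 'id' is a hypothetical primary key column
        FROM main.sales.orders             -- hypothetical catalog.schema.table
        GROUP BY id
        HAVING COUNT(*) > 1
    """,
    wait_timeout="50s",
)

duplicates = resp.result.data_array if resp.result else []
print("Duplicate primary keys:", duplicates or "none found")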

Databricks integration configuration steps

  1. Log in to your Databricks account. The account ID is located at the end of the URL.

    1. Example: https://accounts.cloud.databricks.com/select-workspace?auto_login=true&account_id=xxxxxxx

    2. If you are an account admin, you can also find this in the account settings.

  2. After you select a workspace in the UI from the previous link, the URL of the page you land on is your workspace URL.

    1. Example: https://your-workspace.cloud.databricks.com/

  3. Configure the Databricks Service Principal Client ID & Secret

    1. If you don’t yet have one, follow the steps in the Databricks documentation to create a service principal to represent Airtable in all transactions.

    2. Grant the service principal the correct permissions

      1. At a minimum, go to each table of interest and grant the service principal access using the following logic: grant SELECT on the table, USE SCHEMA on its parent schema, and USE CATALOG on its parent catalog. (See the sketch after these steps for one way to apply these grants with SQL.)

        • You can grant broader permissions if you’d like, in order to populate the Airtable dropdowns described in the table creation flow with more data.

      2. Go to the SQL warehouse(s) you’d like Airtable to use for import queries and grant the service principal CAN USE permission.

    3. Open the service principal details and go to the “Secrets” tab.

    4. Click Generate to generate an OAuth secret.

      1. For lifetime, we recommend selecting 730 days. (After the duration you select, you’ll have to delete and recreate the account for now; reconnect capabilities will follow in a later release.)

      2. The secret and client ID will be displayed on the next page. Copy both values.

      3. Note that the Databricks client secret is scoped to an individual user account. Other users in your org can either re-use an existing secret or add their own service principal client secret.
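
If you prefer SQL over the Databricks UI, the table, schema, and catalog grants described above can also be applied with GRANT statements. The sketch below is one possible approach, run with workspace admin credentials against a SQL warehouse; the catalog, schema, table, and the service principal’s application ID are placeholders. (CAN USE on the warehouse itself is a workspace permission and is still granted from the warehouse’s permissions settings.)

from databricks.sdk import WorkspaceClient

# Run as a workspace admin; credentials are picked up from the environment
# (e.g. DATABRICKS_HOST / DATABRICKS_TOKEN) or a configuration profile.
w = WorkspaceClient()

SP_APP_ID = "00000000-0000-0000-0000-000000000000"  # service principal application ID (placeholder)
WAREHOUSE_ID = "YOUR_SQL_WAREHOUSE_ID"

# Minimum Unity Catalog grants for the table Airtable will import.
grants = [
    f"GRANT USE CATALOG ON CATALOG main TO `{SP_APP_ID}`",        # parent catalog (placeholder)
    f"GRANT USE SCHEMA ON SCHEMA main.sales TO `{SP_APP_ID}`",    # parent schema (placeholder)
    f"GRANT SELECT ON TABLE main.sales.orders TO `{SP_APP_ID}`",  # table to sync (placeholder)
]

for statement in grants:
    w.statement_execution.execute_statement(
        warehouse_id=WAREHOUSE_ID,
        statement=statement,
        wait_timeout="50s",
    )
    print("Applied:", statement)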

Configuring a Databricks integration in admin panel

For first-time org setup, you need to create a new Databricks account configuration via the settings panel in the Airtable admin panel:

  1. Open your admin panel.

  2. Click Settings.

  3. Click the Integrations & development tab.

  4. Click the > to the right of the “Block integrations in automations & external source sync” setting.

  5. Click Configure integrations ⌄.

  6. Scroll to the “Databricks” section and click + Add configuration.

  7. When prompted to create a new configuration, enter a descriptive name and fill in the Databricks fields for your account. All of the information gathered during the Databricks setup above, except the service principal secret, goes here.

  8. Click Save.

Note

  • If you have Enterprise Hub enabled, you will need to update the permissions on the specific org unit that will own the data tables. Enabling permissions at the top-level org will not automatically propagate them to the individual org units.

  • The account configuration you’ve created here will be shared across your org unit. Other Airtable admins in your org will be able to select it, but will have to authenticate it themselves with a service principal secret, as described in the “Connecting Databricks and Airtable in HyperDB” section below.

Connecting Databricks and Airtable in HyperDB

  1. From the admin panel, click HyperDB.

  2. Click Add table.

    1. If your organization uses Enterprise Hub, then select an org unit.

  3. Under “Set up a recurring import,” select Databricks and then click Next.

  4. Select an existing connected account or click + Connect new Databricks account.

    1. Below “Databricks account,” click Select account and select the configuration you created in the “Integrations & development” tab of the admin panel in the previous section.

    2. Click Next.

    3. Enter an “Account name” and the “Service principal client secret” you copied from Databricks.

      1. Note that the Databricks client secret is scoped to your individual user account only. Other users in your org will have to repeat these steps to generate and add their own secret for the service principal.

    4. Click Save.

  5. After the account is set up, you can now select it under “Databricks account” and proceed:

    1. Below “Warehouse ID,” select the warehouse containing the data that you want to sync into Airtable.

    2. Next, select a “Catalog Name,” “Schema Name,” and “Table Name” from Databricks.

      1. Only tables that your service principal has SELECT permission on will be included. (The sketch at the end of this section shows one way to preview this.)

    3. Click Next.

  6. You can now proceed with configuration of the HyperDB table that will be created in Airtable. Here you will:

    1. Choose a primary field. For Databricks, you must explicitly select a primary key during schema configuration.

    2. Optionally, choose a “Search field.” This allows you to search data using a human-readable value.

    3. Review field mapping. Here, you can make any changes to the schema of the table (e.g., converting a value to a single select field).

    4. Click Continue when you are finished configuring the options in this step.

  7. This will bring you to the final setup screen. Here you will give the HyperDB table a name and description and set an “Update frequency.”

Note

After creating the table, click on the table to view import progress. An email notification should be sent once the table is completed.
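
To preview which catalogs, schemas, and tables will appear in the dropdowns during table creation, you can list the Unity Catalog objects visible to the service principal. A minimal sketch, again using the databricks-sdk package with placeholder credentials; what it prints should roughly mirror what the Airtable dropdowns offer:

from databricks.sdk import WorkspaceClient

# Authenticate as the Airtable service principal (placeholder credentials).
w = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    client_id="YOUR_SERVICE_PRINCIPAL_CLIENT_ID",
    client_secret="YOUR_SERVICE_PRINCIPAL_SECRET",
)

# Unity Catalog list calls only return objects the caller has privileges on,
# so this approximates the catalog/schema/table dropdowns in Airtable.
for catalog in w.catalogs.list():
    print("catalog:", catalog.name)
    for schema in w.schemas.list(catalog_name=catalog.name):
        print("  schema:", schema.name)
        for table in w.tables.list(catalog_name=catalog.name, schema_name=schema.name):
            print("    table:", table.name)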

Supported Databricks Field Types

Databricks field type → Default Airtable field type

  • array → Single line text

  • boolean → Single line text

  • byte → Single line text

  • char → Single line text

  • date → Date

  • decimal → Number

  • double → Number

  • float → Number

  • int → Number

  • long → Number

  • map → Single line text

  • short → Number

  • string → Single line text

  • struct → Single line text

  • timestamp → Date

HyperDB Databricks integration limitations

  • HyperDB can support tables of up to 100M rows.

  • The total size of the table can be up to 100 GB; this is a Databricks API limit. If the table exceeds this limit, rows beyond it (records are imported in ascending order of primary key) will be missing, but the import will still complete successfully. A pre-flight check against these limits is sketched after this list.

  • The total size of requests to fetch catalogs, schemas, and tables can be up to 25 MB. Beyond this, results may be missing from the dropdowns.

  • We offer limited support for table/schema changes:

    • Columns added to the source table after initial import will be ignored.

    • Deleting columns from the source table after initial import will result in import errors.

    • Columns whose types are changed on the source table after initial import will be imported on a best-effort basis.

  • Once a service principal secret expires, you will have to recreate the account. A “reconnect” flow is currently not supported.

  • We offer limited support for syncing Databricks views. They can be imported as normal, but if the underlying tables for the view change on Databricks in the middle of an import, the resulting HyperDB table may be silently corrupted.

  • “Binary” and “Variant” Databricks field types are not supported.
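
Before the first import, you can run a rough pre-flight check of the source table against the row and size limits above. A minimal sketch with placeholder names, assuming a Delta table and credentials with access to it; for Delta tables, DESCRIBE DETAIL reports the on-disk size in its sizeInBytes column.

from databricks.sdk import WorkspaceClient

# Credentials are picked up from the environment or a configuration profile.
w = WorkspaceClient()

WAREHOUSE_ID = "YOUR_SQL_WAREHOUSE_ID"
TABLE = "main.sales.orders"  # placeholder catalog.schema.table

# Row count vs. the 100M-record limit.
count = w.statement_execution.execute_statement(
    warehouse_id=WAREHOUSE_ID,
    statement=f"SELECT COUNT(*) FROM {TABLE}",
    wait_timeout="50s",
)
print("Row count (limit 100M):", count.result.data_array[0][0])

# Table details vs. the 100 GB limit; look at the sizeInBytes column.
detail = w.statement_execution.execute_statement(
    warehouse_id=WAREHOUSE_ID,
    statement=f"DESCRIBE DETAIL {TABLE}",
    wait_timeout="50s",
)
print("DESCRIBE DETAIL row:", detail.result.data_array[0])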

FAQs

Are Azure-hosted or GCP-hosted Databricks instances supported for this integration?

We've made our best attempt to support Azure- and GCP-hosted instances. However, there may be unexpected behavior at times. Reach out to your AE if you notice any functionality issues.