Release Version: Beta | Vendor Support: Official

The Specto DataLake (Target) integration writes IMSURGE pipeline data to parquet storage in a selected Specto DataLake project. Based on the rollover period you choose, older rows are moved into monthly archive parquet files while newer rows remain in the current dataset.


Setup

Prerequisites

Before setting up this integration, obtain the SDL License Key provided by Specto upon account setup.

A valid, unexpired SDL License Key is required for IMSURGE to discover available Specto DataLake projects during setup.

Credential Setup

Use Specto DataLake Credentials to create or select the credential for this integration. The same credential works for both Specto DataLake source and target setups.

Integration Setup

After selecting the Specto DataLake credential, configure the integration in this order:

  • Data Rollover Period – Choose how many days of data remain in the current pipeline parquet before older rows are moved into monthly archive parquet files. Allowed values are 30, 60, or 90 days. This field can be edited after the integration is created.
  • Project – Select the Specto DataLake project that IMSURGE writes into. This list is populated dynamically from your SDL License Key and is a setup-only field.

These are the only fields shown for this target integration. IMSURGE does not ask you to select a device, calculation, or pipeline on the Specto DataLake (Target) setup screen.

After the integration is first saved, the selected Project cannot be changed.
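As a sketch of how the Data Rollover Period maps to an archive cutoff, the helper below is illustrative (the function name and signature are hypothetical; IMSURGE computes the actual cutoff internally at upload time, per the Behavior Notes):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def rollover_cutoff(rollover_days: int, now: Optional[datetime] = None) -> datetime:
    """Return the UTC cutoff: rows with timestamps older than this
    are moved into monthly archive parquet files.

    rollover_days must be one of the allowed values: 30, 60, or 90.
    """
    if rollover_days not in (30, 60, 90):
        raise ValueError("Data Rollover Period must be 30, 60, or 90 days")
    # The cutoff is based on the current UTC time when the upload runs.
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=rollover_days)
```

Because the cutoff is anchored to the current UTC time rather than the data's own range, backfilled historical rows can fall behind the cutoff and be archived on their first upload (see Concerns & Limitations).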


Reference

For credential fields, see Specto DataLake Credentials.


Behavior Notes

  • IMSURGE writes the current dataset to /<project_key>/pipeline/<pipeline_name>.parquet.
  • Older rows are grouped into monthly archive parquet files under /<project_key>/pipeline/archive/<pipeline_name>/ using YYYY-MM file names.
  • On each run, IMSURGE reads the existing current parquet, combines it with new rows, sorts by timestamp, and keeps the latest row when duplicate timestamps exist.
  • IMSURGE flattens exported data for parquet storage. Single-device data becomes metric columns, while multi-device data becomes combined device and metric columns. Numeric values are stored as floating-point columns.
  • After each export, Specto DataLake compute orchestration is triggered asynchronously.
  • Specto DataLake (Source) does not read this /pipeline/ parquet directly. If you want to read the data back into IMSURGE, a separate computed output must exist under /computed/....
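The flattening rule above can be sketched as follows. The row shape and the `<device>_<metric>` column-naming separator are illustrative assumptions; the documentation only states that single-device data becomes metric columns and multi-device data becomes combined device and metric columns:

```python
def flatten_rows(rows):
    """Flatten exported rows into parquet-style columns.

    rows: iterable of (timestamp, device, metric, value) tuples.
    Single-device data yields plain metric columns; multi-device data
    yields combined device-and-metric columns (separator is illustrative).
    Numeric values are stored as floats.
    """
    devices = {device for _, device, _, _ in rows}
    flat = {}
    for ts, device, metric, value in rows:
        col = metric if len(devices) == 1 else f"{device}_{metric}"
        flat.setdefault(ts, {})[col] = float(value)
    return flat
```

For example, two devices reporting the same metric produce two distinct columns per timestamp, while one device reporting two metrics produces two plain metric columns.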

Concerns & Limitations

  • UTC-based rollover – The rollover cutoff is based on the current UTC time when the upload runs. Backfilled older rows may be archived immediately.
  • Archive retention behavior – Archive parquet files are append-and-merge only. No automatic pruning or cleanup behavior is documented here.
  • Empty current dataset – If no rows remain inside the current rollover window after processing, the main parquet may not be rewritten for that run.
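The rollover behavior described in this section (UTC-based cutoff, keep-latest dedup, monthly archive grouping) can be sketched in pure Python. All names here are illustrative, rows are simplified to (timestamp, payload) pairs, and the real implementation works on parquet files rather than in-memory lists:

```python
from collections import defaultdict

def merge_and_rollover(existing, new_rows, cutoff):
    """Combine existing and new rows, dedup by timestamp (latest write wins),
    then split the result into the current dataset and monthly archive buckets.

    existing, new_rows: iterables of (timestamp, payload) pairs with
    timezone-aware UTC timestamps. cutoff: a UTC datetime; rows older
    than it are grouped into YYYY-MM archive buckets.
    """
    merged = {}
    for ts, payload in list(existing) + list(new_rows):
        merged[ts] = payload  # later rows override earlier duplicates
    current, archives = [], defaultdict(list)
    for ts in sorted(merged):
        if ts < cutoff:
            archives[ts.strftime("%Y-%m")].append((ts, merged[ts]))
        else:
            current.append((ts, merged[ts]))
    return current, dict(archives)
```

Note how a backfilled row older than the cutoff lands in an archive bucket on its very first run, matching the UTC-based rollover concern above.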
