Options for Extracting Data from TeamForm
This document outlines the options available for extracting data from TeamForm.
Overview
TeamForm has two mechanisms for extracting data:
- A batch-created dataset written to S3 (Options 1A, 1B, 1C), with broad coverage of the TeamForm reporting tables.
- The Public API (Option 2), for real-time data needs focused on core people, tag, and team fields.
You may adopt either or both paths depending on your use case.
Option 1A: S3 + SFTP
How it works: TeamForm runs the extract and writes files to an S3 bucket. You download the files via SFTP using SSH key authentication, then load them into your data lake (e.g. via internal stage + COPY, or external stage if you first copy files into your own S3).
Pros | Cons |
|---|---|
No AWS account required on your side for access | Requires SFTP client and key management |
Works with any downstream stack (not only AWS) | You must run a separate step to move/load files into your data lake |
Familiar protocol for many enterprises | Optional: if files are PGP-encrypted, you must decrypt before load |
Can restrict access by IP if required | |
Customer setup
Provide SSH public key(s)
TeamForm will create SFTP user(s) and associate your public key(s). You’ll receive the SFTP host and username.
Optional: IP allow list
If you use a fixed egress IP for SFTP, we can restrict SFTP access to that IP.
Connect and download
Use any SFTP client (e.g. sftp, WinSCP, or an orchestration tool) to connect and download the extract files from the provided path.
Load into downstream system
- Upload files to a stage (internal or S3 external stage in your account), then use COPY INTO … FROM @stage to load tables, or
- If you first copy files into your own S3, create an external stage on that bucket and run COPY from there.
Note: SFTP is currently tied to the integration upload bucket. If the extract is written to a different bucket, TeamForm may configure an additional output (e.g. copy) so that the same files are available under the SFTP-accessible path. Confirm the exact S3 path and SFTP path with your TeamForm contact.
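The connect-and-download step above can be scripted. The following is a minimal sketch, assuming the OpenSSH sftp command-line client is installed and that paths contain no spaces. The function names, the batch file name fetch.sftp, and the host, user, and directory values are all hypothetical placeholders, not values TeamForm provides.

```python
import subprocess
from pathlib import Path


def build_batch_script(remote_dir: str, local_dir: str, pattern: str = "*") -> str:
    """Return sftp batch commands that download matching files from remote_dir."""
    lines = [
        f"lcd {local_dir}",   # local directory to download into
        f"cd {remote_dir}",   # remote path confirmed with your TeamForm contact
        f"get {pattern}",     # the glob is expanded against the remote directory
        "bye",
    ]
    return "\n".join(lines) + "\n"


def build_sftp_command(host: str, user: str, key_path: str, batch_path: str) -> list:
    """Return the argv for a non-interactive, key-authenticated sftp session."""
    # -i selects the private key; -b runs the commands in the batch file.
    return ["sftp", "-i", key_path, "-b", batch_path, f"{user}@{host}"]


def download_extract(host: str, user: str, key_path: str,
                     remote_dir: str, local_dir: str) -> None:
    """Write the batch file and run sftp; raises CalledProcessError on failure."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    batch_path = Path(local_dir) / "fetch.sftp"  # hypothetical file name
    batch_path.write_text(build_batch_script(remote_dir, local_dir))
    command = build_sftp_command(host, user, key_path, str(batch_path))
    subprocess.run(command, check=True)
```

In batch mode sftp stops at the first failing command, so a non-zero exit code is a reasonable signal to skip the downstream load for that run.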
Option 1B: AWS IAM – Pull (you read from our bucket)
How it works: TeamForm writes extract files to a TeamForm-owned S3 bucket in our AWS account. You assume an IAM role we create (trusting your AWS principal) and read objects from that bucket. You can then load into your data lake from your own S3 or via a data lake external stage that uses that assumed role.
Pros | Cons |
|---|---|
No inbound access from TeamForm into your account | You must have an AWS identity (role or user) to assume our role |
You control when and how often you read | Data lives in our account until you copy it |
Fits well with data lake external stages and storage integrations | You need to configure data lake to use the assumed-role credentials (e.g. storage integration) |
Optional IP restriction on the role for extra security | |
Customer setup
Provide your AWS role ARN
Give TeamForm the ARN of the IAM role (or user) that will be allowed to assume our “remote reader” role (e.g. arn:aws:iam::YOUR_ACCOUNT:role/YourDatalakeDataIngestionRole).
Optional: IP allow list
If you want to restrict access by source IP, provide the CIDR(s). We will add a condition on the role’s policy so only requests from those ranges can use it.
Assume our role and read
From your side (e.g. EC2, Lambda, or Datalake storage integration):
- Assume the TeamForm-provided role (e.g. data-extractor-batch-remote-role-<tenantId>).
- Use the temporary credentials to read from the bucket and prefix we give you (e.g. s3://teamform-reports-data-extract-<env>-<tenantId>/<prefix>/).
Load into Datalake
- Option A: Use a Datalake storage integration that assumes the TeamForm role (if supported in your Datalake/AWS setup), and create an external stage on our bucket. Then COPY INTO … FROM @external_stage.
- Option B: Use a job in your account (e.g. Lambda, ECS) that assumes the role, reads from our bucket, and writes to your S3; then point Datalake at your S3.
TeamForm will provide: bucket name, optional prefix, role ARN to assume, and region. If objects are KMS-encrypted, the role we create will have permission to use the relevant KMS key.
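The assume-and-read flow for a job in your own account might look like the sketch below. It assumes the third-party boto3 library is installed and that your job already runs as the AWS principal you registered with TeamForm. The function names and the session name are hypothetical; the role ARN, bucket, and prefix are whatever TeamForm provides.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def open_s3_with_assumed_role(role_arn: str, session_name: str = "teamform-extract-pull"):
    """Assume the TeamForm-provided role and return an S3 client using its credentials."""
    import boto3  # third-party; imported here so the helpers work without it

    credentials = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    # The temporary credentials expire, so create a fresh client per run.
    return boto3.client(
        "s3",
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )


def list_extract_keys(s3_client, bucket: str, prefix: str) -> list:
    """Return every object key under the prefix, following pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects omit the "Contents" field.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Passing the client into list_extract_keys keeps the listing logic independent of how credentials were obtained, which also makes it straightforward to exercise with a stub.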
Option 1C: AWS IAM – Push (we write to your bucket)
How it works: TeamForm’s Batch job assumes an IAM role in your AWS account and writes extract files directly into your S3 bucket. You then load from that bucket into Datalake (e.g. external stage + COPY).
Pros | Cons |
|---|---|
Data lands in your account; no pull step | You must create a role and bucket and allow our account to assume the role |
Simple Datalake integration: external stage on your bucket | Requires AWS and some IAM setup on your side |
You control retention, lifecycle, and access in your bucket | Cross-account and optional KMS setup to configure |
Can target a different region via configuration | |
Customer setup
Create an S3 bucket
In the account and region where you want the extract (e.g. same region as Datalake or a dedicated data-lake account). Note the bucket name and, if different from our region, the region.
Create an IAM role for TeamForm to assume
- Create a role that only the TeamForm AWS account can assume (trust policy: Principal: { AWS: "arn:aws:iam::TEAMFORM_ACCOUNT_ID:root" } or a specific role ARN we provide).
- Attach a policy that allows: s3:PutObject, s3:GetObject, s3:DeleteObject on arn:aws:s3:::YOUR_BUCKET_NAME/* (and optionally s3:ListBucket on arn:aws:s3:::YOUR_BUCKET_NAME if we need it).
- If you use a customer-managed KMS key for the bucket, grant the role kms:Decrypt, kms:GenerateDataKey on that key.
Provide TeamForm
- Role ARN (e.g. arn:aws:iam::YOUR_ACCOUNT:role/TeamFormDataExtractWriteRole).
- Bucket name.
- Optional: object prefix (e.g. teamform/extract/), target region if different from our default.
Optional: PGP encryption
If you want files encrypted at rest in your bucket, provide a public PGP key. TeamForm will encrypt the Parquet files with it before uploading. You decrypt in your pipeline before loading into Datalake (or use a tool that supports PGP in the load step).
Load into Datalake
Create an external stage (and storage integration if needed) on your bucket and run COPY INTO … FROM @stage for the extract files. If files are PGP-encrypted, add a decrypt step before or during load.
TeamForm will configure the Batch job with your role ARN and bucket (and optional prefix/region) so that each run writes directly to your bucket.
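The two IAM documents described in the customer setup can be generated rather than hand-written. This is a sketch under stated assumptions: the function names are hypothetical, the principal ARN and bucket name are placeholders you replace with real values, and your security team may require tighter conditions than shown here.

```python
import json
from typing import Optional


def trust_policy(teamform_principal_arn: str) -> dict:
    """Trust policy letting only the given TeamForm principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": teamform_principal_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def write_policy(bucket: str, allow_list: bool = False,
                 kms_key_arn: Optional[str] = None) -> dict:
    """Permissions policy for the role: object access, plus optional list and KMS."""
    statements = [{
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
        "Resource": f"arn:aws:s3:::{bucket}/*",  # object-level actions
    }]
    if allow_list:
        statements.append({
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
        })
    if kms_key_arn:
        # Only needed when the bucket uses a customer-managed KMS key.
        statements.append({
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder values; substitute the account ID and bucket agreed with TeamForm.
    print(json.dumps(trust_policy("arn:aws:iam::TEAMFORM_ACCOUNT_ID:root"), indent=2))
    print(json.dumps(write_policy("YOUR_BUCKET_NAME", allow_list=True), indent=2))
```

Note that s3:ListBucket is granted on the bucket ARN itself, while the object actions are granted on the /* resource.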
Option 2: Public API
How it works: TeamForm exposes a REST-style Public API (POST endpoints, JSON request/response) secured by OAuth2 client credentials. You call the API to query people, teams, memberships, tags, objectives, and workspaces with filters, pagination, and optional point-in-time (asOfDate). You then ETL the responses into Datalake (e.g. scheduled jobs that call the API and load into tables). This option is not a bulk dump: it is request/response, so building a full data lake copy requires many calls and your own orchestration.
Pros | Cons |
|---|---|
Near real-time data (no wait for batch schedule) | Not designed for bulk export; you must paginate and orchestrate |
Fine-grained queries (filter by team, person, date, etc.) | Rate limits apply; large datasets need many requests |
No file transfer or AWS setup required | Only a subset of data is available (see comparison below) |
Point-in-time (asOfDate) supported per request | JSON format; you own transforming and loading into Datalake |
Good for incremental syncs or small/medium datasets | Several extract-only datasets have no API equivalent |
Customer setup
Obtain API credentials
Create a Public API credential (Auth0 machine-to-machine) via TeamForm (GraphQL mutation createPublicAPICredential or self-serve if enabled). You receive a client ID and client secret; use them to get a JWT with audience https://api.teamform.co/<tenant-id>/api.
Call the API
Base URL: https://api.teamform.co/<tenant-id>/api (or regional endpoint, e.g. api-euw2.teamform.co). All listed endpoints are POST; send workspaceId and optional asOfDate (ISO 8601) in the body. Use the OpenAPI spec (/getReference or /getSpecification) for exact request/response schemas.
Paginate and sync
Endpoints such as searchPeople and searchTeams support size and page. Implement a sync job that pages through results, then loads into Datalake (e.g. merge into staging tables).
Optional: IP allow list
If your Confluence/tenant has Public API IP filtering enabled, ensure your egress IPs are allow-listed.
Note: The Public API does not expose all datasets that the batch extract provides. See Data availability and format comparison below for what is available via API vs extract and the main gaps.
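The token, call, and paginate steps above can be sketched with the Python standard library. Several details here are assumptions to verify against the OpenAPI spec: that size and page travel in the JSON body alongside workspaceId, that endpoint names are appended directly to the base URL, and that page numbering starts at 0. The token URL is the Auth0 token endpoint for your tenant, which this document does not specify, and all function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def build_token_request(token_url: str, client_id: str, client_secret: str,
                        audience: str) -> urllib.request.Request:
    """OAuth2 client-credentials request; the response JSON carries access_token."""
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,
    }
    return urllib.request.Request(
        token_url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST",
    )


def build_search_request(base_url: str, endpoint: str, token: str, workspace_id: str,
                         page: int, size: int,
                         as_of_date: Optional[str] = None) -> urllib.request.Request:
    """POST request for one page of a search endpoint such as searchPeople."""
    body = {"workspaceId": workspace_id, "size": size, "page": page}  # assumed body shape
    if as_of_date:
        body["asOfDate"] = as_of_date  # ISO 8601
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{endpoint}", data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def post_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_pages(fetch_page: Callable[[int], list], size: int,
               first_page: int = 0) -> Iterator[dict]:
    """Yield items page by page until a page comes back shorter than size."""
    page = first_page
    while True:
        items = fetch_page(page)
        yield from items
        if len(items) < size:
            return
        page += 1
```

The fetch_page callable is where you combine build_search_request and post_json and pull the item list out of the response, since the response field names come from the OpenAPI spec rather than from this sketch. A production sync would also add retry and backoff for rate-limit responses.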
Data format and scope
- Format: Parquet (default). Filenames and optional date-based prefixes are configurable (e.g. <source>_<date>.parquet or .parquet.pgp when encrypted).
- Datasets: The extract can include many entity types, e.g. people, teams, memberships, allocations, baselines, objectives, tags, comments, planning, and others. The exact list and names are configurable; default list is documented in the data extract Batch job (e.g. in teamform-api: DATA_EXTRACTOR_FILE_LIST / FILE_LIST in the Batch job).
- Source code: Extract logic and scheduling live in teamform-api (Batch job definition, output configs, SFTP, IAM); the reporting data and pipeline that produce the source Parquet are in teamform-reporting.
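A loader typically needs to map each file to a target table and decide whether to decrypt first. The sketch below parses the example naming pattern <source>_<date>.parquet, assuming the date token contains no underscore. Because filenames are configurable, treat the pattern, the ExtractFile type, and the function name as illustrative and confirm the real naming for your tenant.

```python
from typing import NamedTuple


class ExtractFile(NamedTuple):
    source: str      # dataset name, e.g. "people" or "baseline_people"
    date: str        # date token, kept as an opaque string
    encrypted: bool  # True when the file carries the .pgp suffix


def parse_extract_filename(name: str) -> ExtractFile:
    """Split '<source>_<date>.parquet[.pgp]' into its parts."""
    encrypted = name.endswith(".parquet.pgp")
    if encrypted:
        stem = name[: -len(".parquet.pgp")]
    elif name.endswith(".parquet"):
        stem = name[: -len(".parquet")]
    else:
        raise ValueError(f"not a Parquet extract file: {name!r}")
    # Dataset names may contain underscores, so split on the last one only.
    source, separator, date = stem.rpartition("_")
    if not separator or not source or not date:
        raise ValueError(f"expected <source>_<date> in {name!r}")
    return ExtractFile(source, date, encrypted)
```

For example, a file named baseline_people_2024-01-31.parquet.pgp would be routed to a baseline_people table after a decrypt step.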
Data availability and format comparison (API vs batch extract)
Choosing between the Public API and the batch extract (Options 1A–1C) depends on how much data you need, how it’s shaped, and how you want to load it. Below is a concise comparison and gap summary.
What the Public API exposes
The Public API is documented in OpenAPI form (e.g. /getSpecification, /getReference). In summary it offers:
Area | Endpoints (examples) | Notes |
|---|---|---|
Workspaces | | List workspaces (e.g. for picking |
Teams | | Search/filter, pagination, point-in-time |
People | | Search/filter, pagination, point-in-time; attributes via unlisted |
Tags | | Tag definitions and applied tags |
Objectives | | Objectives (e.g. OKRs) |
Memberships | | Memberships with filters |
- Format: JSON request/response. Pagination via size and page (e.g. max 100 per page). Point-in-time via asOfDate in the request body where supported.
- Auth: Bearer JWT (OAuth2 client credentials). Optional IP allow list per tenant.
What the batch extract includes (default file list)
The extract writes Parquet files (one per dataset) from the reporting data store. The default FILE_LIST includes many more “tables” than the API exposes, for example:
Available in both (but different shape): people, teams, memberships, tags (and related: tag_attributes, tag_people, tag_teams, entity_tag_attributes, entity_tags_today), objectives (objectives, team_objectives), associations, people_attributes.
Extract only (no Public API equivalent): projects, allocations, baseline_allocations, baseline_people, baseline_teams, comments, divisions, forecast, individual_allocations, memberships_journal, memberships_schedule, people_to_teams, person_history, planning, team_attributes, team_history.
So: the API is a subset of the data model, focused on current-state and search-oriented access. The extract is a full snapshot of the reporting schema, including history, baselines, planning, and allocations.
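When planning a pipeline, it can help to check a list of required datasets against the two groups above. The sketch below encodes the names listed in this section; the function name and result keys are hypothetical, and because the default file list is configurable, the sets should be checked against your tenant's actual configuration.

```python
# Datasets listed above as available in both the API and the extract (different shape).
API_OR_EXTRACT = frozenset({
    "people", "teams", "memberships", "tags", "tag_attributes", "tag_people",
    "tag_teams", "entity_tag_attributes", "entity_tags_today", "objectives",
    "team_objectives", "associations", "people_attributes",
})

# Datasets listed above as extract only (no Public API equivalent).
EXTRACT_ONLY = frozenset({
    "projects", "allocations", "baseline_allocations", "baseline_people",
    "baseline_teams", "comments", "divisions", "forecast", "individual_allocations",
    "memberships_journal", "memberships_schedule", "people_to_teams",
    "person_history", "planning", "team_attributes", "team_history",
})


def plan_sources(needed: list) -> dict:
    """Group the requested dataset names by where they can be sourced from."""
    wanted = set(needed)
    return {
        "api_or_extract": sorted(wanted & API_OR_EXTRACT),
        "extract_only": sorted(wanted & EXTRACT_ONLY),
        # Names in neither list: confirm them with your TeamForm contact.
        "unknown": sorted(wanted - API_OR_EXTRACT - EXTRACT_ONLY),
    }
```

If the extract_only group is non-empty, the batch extract is required for at least part of the pipeline, whichever path you choose for the rest.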
Main gaps when using the Public API for a data lake
1. No bulk snapshot – You must call the API multiple times (paginated) to assemble the dataset. There is no single “dump all people” or “dump all teams” endpoint that returns the same scope as one Parquet file. (Querying changes since an “as of” date can avoid the need to pull the full dataset; see point 5.)
2. Missing datasets – comments, planning, forecast, history tables and similar are not available via the Public API. For those, you need the batch extract.
3. Format and shape – API returns nested JSON (e.g. team with memberships); extract returns flat or normalized Parquet tables. Field names and structure can differ; you need a mapping layer if you mix API and extract in the same lake.
4. Rate limits – The API is rate-limited (e.g. per API key). Large or frequent bulk syncs can hit limits; the batch extract is better for “full refresh” of large datasets.
5. History and point-in-time – The API supports asOfDate on individual requests. The extract gives you full table snapshots (and optionally dated paths); for historical series you may still need the extract (e.g. team_history, person_history).
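The mapping layer mentioned under format and shape usually starts by splitting nested API objects into flat rows. The sketch below assumes a hypothetical response shape in which each team carries a memberships list; the field names id, memberships, and teamId are illustrative and must be replaced with the names in the OpenAPI spec and your extract tables.

```python
def flatten_teams(teams: list) -> tuple:
    """Split nested team objects into flat team rows and membership rows."""
    team_rows = []
    membership_rows = []
    for team in teams:
        # Keep scalar team fields; the nested list becomes its own table.
        team_rows.append({key: value for key, value in team.items() if key != "memberships"})
        for membership in team.get("memberships") or []:
            # Carry the parent key down so the rows can be joined back later.
            membership_rows.append({"teamId": team.get("id"), **membership})
    return team_rows, membership_rows
```

The resulting row lists can be loaded into staging tables and then renamed or cast to line up with the corresponding extract tables.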
When to use which
Use the Public API when you need near real-time, incremental, or filtered access to people, teams, memberships, tags, and objectives, and your volume is manageable (pagination and rate limits are acceptable). Good for syncs into Datalake that don’t require planning and forecast data.
Use the batch extract (Options 1A–1C) when you need full reporting snapshots, including planning, forecasting, comments, and history tables. Best for a full data lake load or when the API doesn’t expose the entities you need.
Summary
Option | Who moves data? | Where data lives | Best when |
|---|---|---|---|
S3 + SFTP | You pull via SFTP | Our S3 (SFTP-accessible path) | You don’t use AWS or prefer SFTP |
IAM Pull | You pull via assume role | Our S3 | You use AWS and want to read from us |
IAM Push | We push to your bucket | Your S3 | You use AWS and want data in your account |
Public API | You call API, then load | Your pipeline / Datalake | Near real-time, subset of data, no file transfer |
For Datalake:
- SFTP: Download from SFTP → upload to Datalake stage (or your S3) → COPY into tables.
- Pull: Assume our role → read from our bucket → either stage in your S3 and use Datalake external stage, or use a storage integration to our bucket if your Datalake setup supports it.
- Push: Data lands in your S3 → create external stage on that bucket → COPY into Datalake tables (with optional PGP decrypt in the pipeline).
- Public API: Call the API (paginated), write JSON to stage or tables (e.g. variant columns), or ETL into relational tables. Best for the subset of data the API exposes; use extract for full scope and for data not available via API.
If you tell us your preference (SFTP vs Pull vs Push vs API) and whether you use AWS and PGP, we can confirm the exact setup steps and config keys (e.g. DATA_EXTRACTOR_OUTPUT_CONFIGS, DATA_EXTRACTOR_REMOTE_ROLE, SFTP host, bucket names, role ARNs, or API credentials) for your tenant.