Beton
Open Source · MIT

DryFit

Synthetic analytics datasets with hidden ground truth — so you can benchmark signal-discovery agents deterministically.

DryFit is an open-source Python tool that generates PostHog-shaped event data from a YAML config, along with a ground_truth.json listing the exact event_ids behind each positive and negative signal path. Point an agent at it and grade the output against the truth. Ships with 14 SaaS business-model scenarios and a Grafana inspection stack.

Python 3.12 uv Typer Pydantic PostgreSQL Faker Docker Compose Grafana MIT License

What's in the box

A generator, a truth file, and enough scenario diversity to stress-test any signal-detection pipeline.

Hidden ground truth

Every dataset ships with a ground_truth.json that maps each positive and negative signal back to the exact event_ids that produced it. Grade any detector output deterministically.
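The generated file is the authoritative reference for the schema; purely as an illustration (the field and signal names below are invented, not DryFit's actual layout), a truth file of this kind might look like:

```json
{
  "positive_signals": [
    {"signal": "activation_path", "event_ids": ["evt_0001", "evt_0042"]}
  ],
  "negative_signals": [
    {"signal": "churn_path", "event_ids": ["evt_0107"]}
  ]
}
```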

14 PostHog business-model scenarios

Seat-based, usage-based, transaction, storage, contact, feature-gated, marketplace, revenue-share, credits, hybrid, freemium, event-volume — plus a combined-coverage dataset exercising every event type.

Realistic noise, controlled

Missing events, duplicates, out-of-order arrivals, null properties, anonymous actors. Configurable probabilities per scenario — and noise never touches rows referenced by ground truth.

PostgreSQL native output

Writes an events table into PostgreSQL (local Unix-socket or dockerized). Reproducible with the same seed. No proprietary format — any SQL client works.
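Because the output is a plain events table, any Postgres client can read it back. A minimal sketch, assuming psycopg 3 and the DSN from the Docker quickstart (column names follow the PostHog-like shape described in the FAQ):

```python
# Query text for the generated events table; adjust columns to your dataset.
QUERY = """
SELECT entity_id, event_id, event_name, event_timestamp
FROM events
ORDER BY event_timestamp DESC
LIMIT %s
"""

def fetch_recent(dsn: str, limit: int = 10) -> list[tuple]:
    """Fetch the most recent generated events."""
    import psycopg  # psycopg 3; imported lazily so the module loads without it
    with psycopg.connect(dsn) as conn:
        return conn.execute(QUERY, (limit,)).fetchall()

if __name__ == "__main__":
    dsn = "postgresql://dryfit_writer:dryfit_writer@127.0.0.1:54329/dryfit"
    for row in fetch_recent(dsn):
        print(row)
```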

Grafana inspection built-in

docker compose up brings up Postgres, Grafana, a provisioned datasource, and a "Generated Event Inspection" dashboard. See the data you just generated in a browser.

MIT licensed

Python 3.12 with uv, typer, pydantic, psycopg. Fork it, extend it, wire it into CI. No license fees, no cloud dependency.

How it works

YAML in, events and ground truth out.

01

Pick a scenario config

Choose one of the 14 bundled scenarios or author your own YAML. Each declares the success event, positive and negative signal paths, scale, and noise parameters.
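The bundled configs under configs/ are the authoritative reference; as a rough sketch only (every key name below is hypothetical, not DryFit's actual schema), a scenario config declares something like:

```yaml
# Illustrative only — key names are invented; see the bundled configs for the real schema.
scenario: seat_based_mvp
seed: 42                    # same seed + config + scale => same event_ids
scale:
  accounts: 3000
success_event: seat_activated
positive_paths:
  - [signup, invite_sent, seat_activated]
negative_paths:
  - [signup, trial_expired]
noise:                      # never applied to rows referenced by ground truth
  missing_event_prob: 0.02
  duplicate_prob: 0.01
  out_of_order_prob: 0.03
```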

02

Run the generator

uv run dryfit -c configs/...yaml writes events to your Postgres instance and dumps ground_truth.json plus manifest.json.

03

Benchmark your detector

Run your agent, SQL query, or ML model against the events table. Score its output against ground_truth.json. Inspect the raw data in the provisioned Grafana dashboard while you iterate.
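Grading reduces to set comparison between the event_ids your detector flags and those in the truth file. A sketch, assuming a truth layout like {"positive_signals": [{"event_ids": [...]}]} (the key names are illustrative; adapt the extraction to the schema your generated file actually uses):

```python
import json

def score(detected: set[str], ground_truth_path: str) -> dict[str, float]:
    """Precision and recall of detected event_ids against a truth file."""
    with open(ground_truth_path) as f:
        truth = json.load(f)
    # Flatten every positive signal's event_ids into one set.
    positive = {eid for sig in truth["positive_signals"] for eid in sig["event_ids"]}
    tp = len(detected & positive)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(positive) if positive else 0.0
    return {"precision": precision, "recall": recall}
```

Because the truth file references exact event_ids, scores are reproducible across runs with the same seed.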

Docker quickstart
# Clone + bring up Postgres and Grafana
git clone https://github.com/getbeton/dryfit.git && cd dryfit
docker compose up -d

# Generate a dataset into the dockerized Postgres
uv run dryfit \
  -c configs/posthog_seat_based_mvp.yaml \
  --dsn postgresql://dryfit_writer:dryfit_writer@127.0.0.1:54329/dryfit \
  --print-summary

# Inspect in Grafana at http://127.0.0.1:3000 (admin/admin)

Bundled scenarios

Each scenario models a specific SaaS business motion — value metric, positive funnel, negative paths, and realistic noise. Click through for the full config breakdown.

Baseline (500 accounts): PostHog Web (baseline) → purchase
  Generic SaaS activation (baseline)

Chat / messaging (0 accounts): Telegram Chat → event_signup
  Chat engagement / retention

Record-based (300 accounts): Contact / record-based SaaS → contact_created
  Contacts, leads, subscribers, accounts managed

Credits (3,000 accounts): Credits / token-based → credits_purchased
  Credits consumed, tokens used, compute units

Event-volume (320 accounts): Event-volume SaaS → custom_event_tracked
  Events tracked, data points ingested, log lines

Feature-gated (260 accounts): Feature-gated (tiered) SaaS → upgrade_clicked
  Plan tier / feature access level

Freemium (280 accounts): Freemium-to-paid → trial_started
  Free-tier limit hits that drive paid conversion

Hybrid (320 accounts): Hybrid (seat + usage) → compute_hours_used
  Seats plus usage overage

Marketplace (240 accounts): Platform / marketplace → listing_published
  Listings, storefronts, connected accounts, integrations

Revenue-share (240 accounts): Revenue-share / take-rate → commission_calculated
  Revenue processed, bookings, GMV through platform

Per-seat (3,000 accounts): Seat-based SaaS → seat_activated
  Active seats / users

Storage (260 accounts): Storage-based SaaS → file_uploaded
  GB stored, records managed, files hosted

Transaction (280 accounts): Transaction / volume-based SaaS → payment_completed
  Transactions processed, GMV, payments

Usage-based (320 accounts): Usage-based (metered) SaaS → job_completed
  API calls, compute hours, messages, requests

All-models (360 accounts): Combined coverage (all models) → upgrade_clicked
  Union of event types across all business models

Frequently Asked Questions

Does DryFit read from my production database?
No. DryFit generates synthetic data — it does not consume or read from any production system. You give it a YAML config, it writes synthetic events into a PostgreSQL database you control (local or dockerized).
Is this a signal-detection tool?
No. DryFit produces the synthetic datasets you run signal-detection tools against. Its value is the ground_truth.json shipped with every dataset — run any agent or detector against the events table, and score its output deterministically against the known positive and negative paths.
Who uses this?
Teams building agentic or LLM-powered analytics pipelines that need reproducible benchmarks. Beton's own signal-discovery agents are tested on DryFit datasets in CI. External researchers, integration-test authors, and data-engineering teams testing pipeline correctness also use it.
What data layout does it produce?
A PostgreSQL events table with entity_id, event_id, event_name, event_timestamp, event_props and related columns — PostHog-like shape. Plus a ground_truth.json referencing actual generated event_ids and a manifest.json describing the run. Artifacts default to artifacts/<dataset_id>/.
How is this different from Faker or other synthetic-data libraries?
Generic libraries give you random rows. DryFit generates event sequences that match real SaaS business models, with explicit positive and negative signal paths, and emits machine-checkable ground truth so you can grade detectors. Faker is used internally for human-like metadata — it doesn't drive signal logic.
Is it deterministic?
Yes. Each config has a seed (default 42). Re-running with the same config, seed, and scale yields the same event_ids and ground truth. Change the seed to produce a different sample without changing the generator.

Give your agents a benchmark to beat

Deterministic event data with machine-checkable ground truth. Open source, free forever.