How to Ingest Data Into Statsig

Tue Jul 16 2024

Logan Bates

Solutions Engineer, Statsig

During my endeavors as a data engineer, I’d spend hours helping teams translate data models and building data pipelines between software tools so they could effectively communicate.

The frustration of getting the data into tools often spoiled the intended initial experience and added complexities to a production implementation. To add to this, as our models evolved, we’d have to spend meaningful development time to ensure everything continued to operate smoothly, often due to rigid downstream schemas.

Now as a solutions engineer, where I’m tasked with helping prospects understand the lift of data ingestion, I can rest assured that they can avoid a lot of these frustrating experiences with Statsig.

Whether your team has an existing data pipeline feeding a warehouse as a single source of truth, is using a third-party tool for data collection, or is jumping into data logging for the first time, Statsig offers several mechanisms to address these different data flows: sophisticated SDKs, third-party integrations, and a Data Warehouse Native solution. This guide provides a step-by-step walkthrough of how to implement these solutions, with clear instructions and best practices to ensure a smooth setup.

Before you begin, ensure you have:

  • An account with Statsig

  • Access to your data source (e.g., data warehouse, 3rd party data tool/CDP, or application code).

  • Necessary permissions to read from and write to your data source (if applicable)

  • Familiarity with SQL (if you're using a data warehouse)

1. Choose Your Ingestion Method

Choose the method that best fits your infrastructure and current needs. In most cases, you can also combine multiple ingestion methods if necessary. If you’re just starting out, you’ll typically want to integrate the SDK into your code to get oriented with the platform. On the other hand, if you have existing metric models in a warehouse, our Warehouse Native solution will generally work better for your teams. Let’s dive in!

Server, Client and Mobile SDKs

Ideal for real-time event tracking directly from your application code. Load the Statsig SDK into your app or server and use its built-in methods to log any event with relevant metadata for thorough analysis. The SDKs also make it easy to run experiments across multiple surfaces.

  • Use the logEvent method in various languages to quickly populate Statsig with data for experimentation and analysis

  • Log additional metrics alongside 3rd party or internal logging systems to enhance measurement capabilities

HTTP API

Suitable for server-side event logging when the SDKs don’t cover your stack or if you prefer direct API calls. Note: this typically requires a bit more work than the SDKs to reach a reliable state.

  • Use in situations where the SDKs are not supported or preferred

  • Can be used to support custom metric ingestion workflows

Data Integrations

Use pre-built integrations with CDPs like Segment or mParticle for seamless data flow. Maintain your existing logging infrastructure to power experimentation quickly. Event filters can be utilized to reduce downstream event volume to Statsig, so only your relevant metrics are analyzed.

  • Quickly populate Statsig with metric data and analyze with the metrics explorer

  • Reduce time to analysis by simplifying experiment setup

  • Control event billing volume with event filtering

Data Warehouse Ingestion

Connect your data warehouse (e.g., Snowflake, BigQuery) to Statsig for bulk data import. Map your data to Statsig’s expected schema and schedule regular (daily) imports for metric analysis. A copy of your metric data is stored in Statsig servers for processing.

  • Send custom events or precomputed metrics to cover more complex use cases, internal computations, attribution windows, etc

  • Use SQL queries to pull in data, joining data where needed to bring in more metrics

Data Warehouse Native Solution

Ideal if you have an existing warehouse with metric data, perhaps downstream of a reverse ETL tool or internal logging system. Unlike Data Warehouse Ingestion, this method lets Statsig operate directly on top of your warehouse data and use warehouse resources to run experimentation and analysis in real time. No user-level data is replicated in Statsig servers, so this pathway is preferred for privacy-conscious industries.

  • Easy SQL interfaces for connecting metrics, as well as assignment data for experiment analysis (offline experiments, 3rd party systems, internal systems)

  • Create multiple metric sources and build additional aggregate metrics on top of these sources

  • If you have existing experiment allocation data, you can perform experiment analysis in ~30 minutes

  • Reload experiment analysis on demand

2. Set Up Your Data Connection and/or Instrument SDKs (if applicable)

Depending on your chosen method, you'll need to prepare your data connection and/or initialize our SDKs within your app:

For SDKs

Once you’ve chosen your SDK, you’ll need to integrate Statsig into your application. Follow the official SDK documentation for specific instructions. The high-level steps are:

  1. Initialize the SDK in your chosen language.

  2. Start calling the logEvent method. See a more in-depth walkthrough here.

import type { StatsigEvent } from '@statsig/client-core';

// log a simple event
myStatsigClient.logEvent('simple_event');

// or, include more information by using a StatsigEvent object
const robustEvent: StatsigEvent = {
  eventName: 'add_to_cart',
  value: 'SKU_12345',
  metadata: {
    price: '9.99',
    item_name: 'diet_coke_48_pack',
  },
};

myStatsigClient.logEvent(robustEvent);
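
When the same event is logged from several places in an app, a small helper can keep its shape consistent. The sketch below is illustrative only: `EventLike`, `EventLogger`, `CartItem`, `buildAddToCartEvent`, and `logAddToCart` are hypothetical names, and `EventLike` is a local stand-in that mirrors only the fields used in the example above, not the SDK's own `StatsigEvent` type.

```typescript
// Local stand-in mirroring the fields used in the example above.
// Assumption: this is NOT the SDK's StatsigEvent type, just the same shape.
type EventLike = {
  eventName: string;
  value?: string | number;
  metadata?: Record<string, string>;
};

// The minimal surface this helper needs from a client (hypothetical interface).
interface EventLogger {
  logEvent(event: EventLike): void;
}

// Hypothetical application type.
interface CartItem {
  sku: string;
  name: string;
  price: number;
}

// Build the event in one place so every call site logs the same shape.
function buildAddToCartEvent(item: CartItem): EventLike {
  return {
    eventName: 'add_to_cart',
    value: item.sku,
    metadata: {
      // Metadata values are kept as strings, as in the example above.
      price: item.price.toFixed(2),
      item_name: item.name,
    },
  };
}

// Log through whatever client object is passed in.
function logAddToCart(client: EventLogger, item: CartItem): void {
  client.logEvent(buildAddToCartEvent(item));
}
```

Because the helper depends only on a `logEvent` method, it can be exercised in unit tests with a fake client before being wired to a real one.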

For HTTP API

Authenticate with your Statsig API key and use the log_event endpoint to send data. Check the HTTP API documentation for details. Generally:

  1. Fetch a client SDK key (or generate a new one) from the console

  2. Send a POST request to the /log_event endpoint

curl \
--header "statsig-api-key: <CLIENT-SDK-KEY>" \
--header "Content-Type: application/json" \
--request POST \
--data '{"events": [{"user": { "userID": "42" }, "time": 1616826986211, "eventName": "test_api_event"}]}' \
"https://api.statsig.com/v1/log_event"

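The same request can be sketched in TypeScript. This is a minimal sketch that mirrors the curl call above: the URL, headers, and payload fields are taken from that example, while `ApiEvent`, `FetchLike`, `buildLogEventRequest`, and `sendEvents` are hypothetical names. The fetch function is passed in as a parameter, which is an assumption made here so the sketch does not depend on any particular runtime.

```typescript
// One event, mirroring the fields in the curl example above.
interface ApiEvent {
  user: { userID: string };
  time: number; // Unix timestamp in milliseconds
  eventName: string;
}

// Minimal fetch-like signature (hypothetical); a standard global fetch fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

const LOG_EVENT_URL = 'https://api.statsig.com/v1/log_event';

// Build the request without sending it, so it can be inspected or tested.
function buildLogEventRequest(sdkKey: string, events: ApiEvent[]) {
  return {
    url: LOG_EVENT_URL,
    init: {
      method: 'POST',
      headers: {
        'statsig-api-key': sdkKey,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ events }),
    },
  };
}

// Send the events; throws on a non-2xx response so failures are not silent.
async function sendEvents(
  fetchFn: FetchLike,
  sdkKey: string,
  events: ApiEvent[],
): Promise<void> {
  const { url, init } = buildLogEventRequest(sdkKey, events);
  const response = await fetchFn(url, init);
  if (!response.ok) {
    throw new Error(`log_event failed with status ${response.status}`);
  }
}
```

In a runtime with a global `fetch`, a call would look like `sendEvents(fetch, '<CLIENT-SDK-KEY>', events)`. Retries and batching are left out here; they are part of the extra work needed to make direct API calls reliable.
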
For Integrations

Set up the integration within your CDP's interface and link it to Statsig. Refer to the Integrations documentation for guidance on the specific tool you’re using. Generally:

  • Navigate to the integration section (settings → project → integrations) and select the applicable tile

  • Follow the specific instructions for connecting the data

  • Optional: Apply event filters to reduce downstream event volume
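
Event filters are set up as part of the integration, but the underlying idea fits in a few lines. The example below is purely illustrative and is not how Statsig implements filtering: `IncomingEvent`, `makeAllowlistFilter`, and the event names are hypothetical. It shows an allowlist that forwards only the events relevant to your metrics.

```typescript
// Hypothetical shape of an event arriving from an upstream tool.
interface IncomingEvent {
  eventName: string;
  userID: string;
}

// Returns a predicate that keeps only events whose name is on the allowlist.
function makeAllowlistFilter(allowed: string[]): (event: IncomingEvent) => boolean {
  const names = new Set(allowed);
  return (event) => names.has(event.eventName);
}

// Keep only the events that feed the metrics you plan to analyze.
const keepRelevant = makeAllowlistFilter(['add_to_cart', 'purchase']);

const incoming: IncomingEvent[] = [
  { eventName: 'page_scroll', userID: '42' },
  { eventName: 'add_to_cart', userID: '42' },
  { eventName: 'purchase', userID: '7' },
];

// Only add_to_cart and purchase pass the filter.
const forwarded = incoming.filter(keepRelevant);
```

Dropping high-volume, low-value events such as the hypothetical `page_scroll` is what reduces downstream event volume.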

For Data Warehouse Ingestion

Configure the connection in the Statsig console, map your data fields, and schedule ingestion. The Data Warehouse Ingestion guide provides comprehensive instructions. The high-level steps are:

  • Establish a connection with your data warehouse. You’ll need to create a service role that can read metric and assignment data from your warehouse, write to a staging dataset for caching and experiment results, and run queries on top of the warehouse.

  • Establish metric and assignment sources via SQL queries or simply by table name. Map your data to Statsig’s expected schemas to establish baseline data.
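
Field mapping itself is configured in Statsig, but it can help to think of it as a simple rename from your column names to the expected field names. The sketch below illustrates only that idea: `Row`, `FieldMapping`, `applyMapping`, and every column and field name in it are placeholders, not Statsig's actual schema. Refer to the Data Warehouse Ingestion guide for the real field names.

```typescript
// A warehouse row as a loose bag of columns.
type Row = Record<string, unknown>;

// Maps target field name -> source column name.
type FieldMapping = Record<string, string>;

// Rename columns according to the mapping, failing loudly on missing columns.
function applyMapping(row: Row, mapping: FieldMapping): Row {
  const out: Row = {};
  for (const [target, source] of Object.entries(mapping)) {
    if (!(source in row)) {
      throw new Error(`Missing source column "${source}" for field "${target}"`);
    }
    out[target] = row[source];
  }
  return out;
}

// Placeholder names on both sides; NOT the real expected schema.
const exampleMapping: FieldMapping = {
  user_id: 'customer_id',
  event_name: 'action',
  timestamp: 'occurred_at',
};

// Columns not named in the mapping (here, "extra") are dropped.
const mapped = applyMapping(
  { customer_id: '42', action: 'add_to_cart', occurred_at: '2024-07-16T00:00:00Z', extra: 1 },
  exampleMapping,
);
```

Failing on a missing column, rather than silently emitting an empty field, mirrors the kind of check worth running on your own data before scheduling an import.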

3. Validate Your Data Ingestion

Once a data connection has been established, you’ll want to verify that metrics are correctly flowing into the system and in the correct format. Depending on your implementation, Statsig provides a few ways of doing so:

  • Events Logstream - A live view of events that are logged and/or ingested. Dig into individual events to verify the name, value, date, and metadata, and that the correct ID is provided. (A common mistake during implementation is unmatched or incorrect IDs.)

  • Metrics Explorer - Provides ways to dive into metric views via charts, funnels, and more. You can apply filters on event and user metadata to validate that your data is being ingested correctly. Read more here.

  • SQL Debugging (for Warehouse Native/Ingestion) - With Warehouse Native, each metric source (and each metric definition built on top of a source) is produced as the result of a SQL query, so you’ll be able to quickly verify that the data exists in your warehouse.
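
Before events ever reach the Logstream, a quick local check can catch the most common problems, such as a missing ID. This is a minimal sketch, assuming events shaped like the HTTP API payload shown earlier; `CandidateEvent` and `findEventProblems` are hypothetical helpers, not part of any Statsig SDK, and the seconds-versus-milliseconds check is a heuristic.

```typescript
// Hypothetical event shape, with every field optional so gaps can be detected.
interface CandidateEvent {
  user?: { userID?: string };
  time?: number; // expected: Unix timestamp in milliseconds
  eventName?: string;
}

// Returns a list of human-readable problems; an empty list means none found.
function findEventProblems(event: CandidateEvent): string[] {
  const problems: string[] = [];
  if (!event.eventName || event.eventName.trim() === '') {
    problems.push('missing eventName');
  }
  if (!event.user?.userID) {
    problems.push('missing userID');
  }
  if (typeof event.time !== 'number' || !Number.isFinite(event.time)) {
    problems.push('missing or non-numeric time');
  } else if (event.time < 1e12) {
    // Heuristic: values below 1e12 are probably seconds, not milliseconds.
    problems.push('time looks like seconds, expected milliseconds');
  }
  return problems;
}
```

A check like this complements the tools above rather than replacing them: it validates the shape of an event, while the Logstream confirms that the event actually arrived.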

If you encounter issues:

  • For SDKs: Ensure the SDK is initialized correctly and that you're using the latest version. Check out the client SDK debugging guide.

  • For HTTP API: Check for errors in your API requests and ensure you're using the correct endpoints and authentication.

  • For Integrations: Verify that the integration settings match between your upstream tool and Statsig. Many of these systems have debugging tools to help diagnose improper data flows.

  • For Warehouse Native: Refer to this guide for assistance with debugging.

  • Reach out on our community Slack channel for support if you still need assistance.

4. Creating Metrics and Next Steps

Once you’ve correctly orchestrated data logging and ingestion, you’ll be able to start creating a metrics catalogue that can be leveraged for analysis and experimentation. This metrics catalogue will ultimately become the bedrock of your product measurement, so it’s important to spend some time getting oriented.

If you’ve made it this far, I hope you’ve found this guide useful and you now have a clear understanding of how to ingest data into Statsig so that you can begin measuring impact!

As more teams get involved, revisit this guide to help orient new members. Should you have any questions or feedback on how we can improve our existing ingestion methods, please visit our community Slack and drop us a line!
