🤖 AI Feature Store Workshop
A comprehensive hands-on lab for building production-grade ML feature stores with Datorth streaming. Learn to serve features in real time with sub-millisecond latency while maintaining training-serving consistency.
Workshop Overview
This 4-hour hands-on workshop guides you through building a complete feature store implementation using Datorth. You'll learn to create, serve, and monitor ML features at production scale while avoiding common pitfalls like training-serving skew.
What You'll Learn
- Feature engineering patterns — Batch and streaming feature pipelines
- Real-time feature serving — Sub-millisecond feature lookups for online inference
- Point-in-time joins — Correct historical feature retrieval for training
- Feature monitoring — Drift detection and quality tracking
- Feature discovery — Catalog and governance for feature reuse
Workshop Agenda
Module 1: Feature Store Fundamentals (45 min)
Understand the core concepts and architecture of modern feature stores.
- Why feature stores matter for ML operations
- Online vs. offline feature serving
- Feature store architecture with Datorth
- Training-serving skew and how to prevent it
Module 2: Building Feature Pipelines (60 min)
Hands-on lab creating batch and streaming feature pipelines.
- Defining feature schemas and metadata
- Building batch features with Spark SQL
- Creating streaming features with Flink
- Implementing windowed aggregations
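The windowed-aggregation pattern from this module can be sketched in plain Python. The lab itself uses Flink and Datorth APIs; the function name and event shape below are illustrative only:

```python
from collections import defaultdict

def tumbling_window_sum(events, window_seconds):
    """Sum amounts per (window, user) over fixed, non-overlapping time windows.
    events: iterable of (epoch_ts, user_id, amount)."""
    windows = defaultdict(float)
    for ts, user_id, amount in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        windows[(window_start, user_id)] += amount
    return dict(windows)

events = [
    (5, "u1", 20.0),
    (30, "u1", 5.0),
    (75, "u1", 10.0),   # lands in the 60-120s window
    (10, "u2", 7.5),
]
features = tumbling_window_sum(events, window_seconds=60)
# features[(0, "u1")] == 25.0, features[(60, "u1")] == 10.0
```

A streaming engine adds the hard parts this sketch omits: late and out-of-order events, watermarks, and incremental state.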
Module 3: Feature Serving (60 min)
Deploy features for real-time inference with low latency.
- Configuring online feature stores
- Serving features via REST and gRPC APIs
- Caching strategies for ultra-low latency
- Handling missing features gracefully
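The serving patterns above (caching plus graceful degradation) can be sketched as a read-through cache with per-feature defaults. A plain dict stands in for the Redis-backed store here; class and parameter names are illustrative, not Datorth APIs:

```python
class OnlineFeatureClient:
    """Sketch of an online feature lookup: read-through cache in front of a
    key-value store, with per-feature defaults so missing values degrade
    gracefully instead of failing the inference request."""

    def __init__(self, store, defaults):
        self.store = store          # stand-in for a Redis-backed online store
        self.defaults = defaults    # fallback value per feature name
        self.cache = {}

    def get_features(self, entity_id, feature_names):
        row = self.cache.get(entity_id)
        if row is None:
            row = self.store.get(entity_id, {})  # one round-trip on cache miss
            self.cache[entity_id] = row
        return {name: row.get(name, self.defaults.get(name))
                for name in feature_names}

store = {"user_42": {"avg_spend_30d": 183.5}}
client = OnlineFeatureClient(store, defaults={"txn_count_5m": 0})
feats = client.get_features("user_42", ["avg_spend_30d", "txn_count_5m"])
# {"avg_spend_30d": 183.5, "txn_count_5m": 0}
```

A production cache would also need TTL-based invalidation so stale features age out.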
Module 4: Training Data Generation (45 min)
Generate correct training datasets with point-in-time feature retrieval.
- Understanding point-in-time correctness
- Building training datasets with time-travel
- Avoiding data leakage in features
- Integration with ML training frameworks
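Point-in-time correctness boils down to an as-of join: for each training label, take the latest feature value observed at or before the label's timestamp, never after. A minimal pure-Python sketch (the lab uses Datorth's time-travel APIs instead):

```python
import bisect

def point_in_time_join(label_rows, feature_history):
    """For each (entity_id, label_ts), attach the latest feature value with
    timestamp <= label_ts -- never a future value, which would leak data.
    feature_history: entity_id -> list of (ts, value), sorted by ts."""
    joined = []
    for entity_id, label_ts in label_rows:
        history = feature_history.get(entity_id, [])
        timestamps = [ts for ts, _ in history]
        i = bisect.bisect_right(timestamps, label_ts)  # first ts > label_ts
        value = history[i - 1][1] if i > 0 else None
        joined.append((entity_id, label_ts, value))
    return joined

history = {"u1": [(100, 0.2), (200, 0.5), (300, 0.9)]}
rows = point_in_time_join([("u1", 250), ("u1", 50)], history)
# [("u1", 250, 0.5), ("u1", 50, None)]
```

Note the second row: with no feature value recorded before t=50, the join yields None rather than peeking at the future value.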
Module 5: Monitoring & Governance (30 min)
Ensure feature quality and enable discovery across teams.
- Feature drift detection and alerting
- Data quality monitoring for features
- Feature catalog and documentation
- Access control and lineage tracking
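One common drift-detection metric covered by tools in this space is the population stability index (PSI), which compares a feature's current binned distribution against a training-time baseline. A minimal sketch (thresholds are a rule of thumb, not a Datorth default):

```python
import math

def population_stability_index(expected, actual):
    """PSI between two binned distributions (lists of bin proportions).
    Rule of thumb: PSI > 0.2 suggests significant drift worth alerting on."""
    eps = 1e-6  # avoid log(0) for empty bins
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline = [0.25, 0.25, 0.25, 0.25]
drifted  = [0.10, 0.20, 0.30, 0.40]
psi = population_stability_index(baseline, drifted)
# psi is about 0.23 here, above the common 0.2 alert threshold
```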
Lab Environment
Each participant receives access to a fully configured Datorth environment with:
- Pre-configured Kafka clusters for streaming
- Flink and Spark environments for processing
- Redis-backed online feature store
- Sample datasets and starter notebooks
- Jupyter environment for hands-on exercises
Use Cases Covered
Real-time Fraud Detection
Build features for transaction fraud scoring:
- User spending patterns (30-day rolling averages)
- Transaction velocity (last 5 minutes)
- Device and location features
- Merchant risk scores
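The transaction-velocity feature above ("last 5 minutes") can be sketched as a sliding-window counter; this assumes timestamps arrive in order, which the streaming engine handles for real in the lab:

```python
from collections import deque

class VelocityCounter:
    """Transaction velocity: count of a user's events in a sliding window
    (here 5 minutes), a classic real-time fraud feature."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first

    def record(self, ts):
        """Record one transaction at epoch time ts; return the current count."""
        self.events.append(ts)
        while self.events and self.events[0] <= ts - self.window:
            self.events.popleft()  # evict events older than the window
        return len(self.events)

counter = VelocityCounter(window_seconds=300)
counts = [counter.record(t) for t in (0, 60, 120, 400)]
# counts == [1, 2, 3, 2]  -- the t=0 and t=60 events expired by t=400
```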
Personalization Engine
Create features for real-time recommendations:
- User engagement signals (clicks, views, purchases)
- Item popularity and trending scores
- Collaborative filtering embeddings
- Session context features
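The trending-score idea from this list can be sketched with exponential decay, so recent interactions dominate; the function name and half-life default below are illustrative assumptions, not Datorth APIs:

```python
def trending_score(event_times, now, half_life_seconds=3600.0):
    """Exponentially decayed popularity: each interaction contributes
    0.5 ** (age / half_life), so an hour-old event counts half as much
    as one happening now (with a one-hour half-life)."""
    return sum(0.5 ** ((now - t) / half_life_seconds) for t in event_times)

now = 7200.0
score = trending_score([7200.0, 3600.0, 0.0], now)
# 1.0 + 0.5 + 0.25 == 1.75
```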
Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and SQL
- Experience with data engineering (helpful but not required)
- Laptop with modern web browser
Workshop Formats
Live Virtual Workshop
Instructor-led sessions with live Q&A and hands-on support.
- Duration: 4 hours
- Class size: Up to 30 participants
- Schedule: Monthly sessions (see calendar)
Private Workshop
Customized for your team with your use cases and data.
- Duration: 4-8 hours (customizable)
- Delivered on-site or virtually
- Custom labs using your datasets
Self-Paced Lab
Work through the materials at your own pace.
- Access to recorded sessions
- Lab environment for 7 days
- Community support via Slack
Upcoming Sessions
- December 15, 2025 — 10:00 AM EST (Virtual)
- January 12, 2026 — 10:00 AM EST (Virtual)
- January 26, 2026 — 2:00 PM PST (Virtual)
Build your ML feature store
Reserve your seat in an upcoming workshop or request a private session for your team.
Reserve a seat