# EAF Event Sourcing Performance Baseline

## Overview

This document defines the performance baseline assumptions, hardware requirements, and SLA interpretations for the EAF Event Sourcing infrastructure performance tests.
## Performance SLA Targets

### Recalibrated Targets (Based on Real Hardware)

| Operation | Target SLA | Safety Margin | Measured Baseline | Test Conditions |
|---|---|---|---|---|
| Token Operations | <5ms | 2.5x | ~2ms average | 100 iterations, microsecond precision |
| Aggregate Loading | <100ms | 2x | ~47ms average | 1000-event aggregate, 10 iterations |
| Event Streaming | <200ms | 4x | ~50ms average | 1000-event batch, 5 iterations |
| Single Event Append | <100ms | - | Variable | Individual event writes |
| Batch Event Append | <500ms | - | Variable | 100-event batches |
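
How these targets are verified: a minimal sketch of the measurement loop, assuming plain Kotlin test helpers; `performTokenOperation()` is a hypothetical stand-in for the SDK call under test.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical stand-in for the operation under test (e.g. token validation).
fun performTokenOperation() { /* call the real SDK operation here */ }

// Minimal SLA check: average latency over 100 iterations, asserted against
// the <5ms token-operation target from the table above.
fun assertTokenOperationSla(iterations: Int = 100, slaMillis: Double = 5.0) {
    val totalNanos = (1..iterations).sumOf { measureNanoTime { performTokenOperation() } }
    val averageMillis = totalNanos / iterations / 1_000_000.0
    check(averageMillis < slaMillis) { "averaged ${averageMillis}ms, SLA is ${slaMillis}ms" }
}
```

The same loop shape applies to the other rows; only the iteration count and threshold change.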
## Hardware Baseline Assumptions

**Development/CI Environment:**

- CPU: Modern multi-core (4+ cores, 2.4GHz+)
- Memory: 8GB+ RAM available to JVM
- Storage: SSD storage (not spinning disk)
- Database: PostgreSQL 15+ with default tuning
- Network: Local/containerized (minimal latency)
- JVM: OpenJDK 21 with default GC settings

**Production Scaling Factors:**

- Network Latency: +10-50ms for remote database
- Concurrent Load: 2-5x degradation under high concurrency
- Dataset Size: Gradual degradation with >10M events
- Hardware Constraints: Linear scaling with CPU/memory
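
As a rough worked example of how these factors combine (the multipliers are assumptions picked from the ranges above, not measurements):

```kotlin
// Rough production estimate from a local baseline: apply a concurrency
// degradation factor, then add remote-database network latency.
fun estimateProductionLatencyMillis(
    localBaselineMillis: Double,
    concurrencyFactor: Double = 3.0,    // within the 2-5x degradation range
    networkLatencyMillis: Double = 25.0 // within the +10-50ms range
): Double = localBaselineMillis * concurrencyFactor + networkLatencyMillis

// Example: the ~47ms local aggregate load projects to roughly 166ms under
// high concurrency against a remote database.
```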

## Test Data Characteristics

### Event Generation Patterns

```kotlin
import kotlin.random.Random

// Realistic payload size distribution. Sampling once keeps the buckets
// mutually exclusive: 70% small, 25% medium, 5% large events.
fun randomPayloadSizeBytes(random: Random = Random): Int {
    val p = random.nextDouble()
    return when {
        p < 0.70 -> random.nextInt(100, 501)      // 70% small events: 100-500 bytes
        p < 0.95 -> random.nextInt(1_024, 5_121)  // 25% medium events: 1-5 KB
        else -> random.nextInt(5_120, 10_241)     // 5% large events: 5-10 KB
    }
}

// Multi-tenant distribution follows the 80/20 rule: 80% of events land in 20% of tenants.

// Aggregate lifecycle patterns exercised by the generator:
val aggregatePatterns = listOf(
    "User: create → activate → update → deactivate",
    "Order: create → add_items → checkout → fulfill → complete",
    "Product: create → update_details → price_change → discontinue",
)
```
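
A sketch of how the 80/20 tenant skew can be generated; the helper and its parameters are illustrative, not part of the SDK:

```kotlin
import kotlin.random.Random

// Illustrative 80/20 sampler: 80% of events go to the "hot" first 20% of tenants.
fun pickTenant(tenantIds: List<String>, random: Random = Random): String {
    require(tenantIds.isNotEmpty()) { "need at least one tenant" }
    val hotCount = maxOf(1, tenantIds.size / 5)  // the hot 20%
    val coldCount = tenantIds.size - hotCount
    return if (coldCount == 0 || random.nextDouble() < 0.8) {
        tenantIds[random.nextInt(hotCount)]      // 80% of the traffic
    } else {
        tenantIds[hotCount + random.nextInt(coldCount)]
    }
}
```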

## Database Configuration

**Test Database Settings:**

```ini
# Connection pooling (sized for a 20-50 connection pool)
max_connections = 50
shared_buffers = 256MB          # minimum
work_mem = 4MB

# Write optimization
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 1GB

# Query optimization
effective_cache_size = 1GB
random_page_cost = 1.1          # tuned for SSD
```
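
On the application side, the pool must stay below the server's `max_connections`. A minimal sketch, assuming HikariCP; the JDBC URL, credentials, and sizes are illustrative:

```kotlin
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource

// Illustrative pool sizing: keep well below max_connections so concurrent
// tests cannot exhaust the server.
val testDataSource = HikariDataSource(HikariConfig().apply {
    jdbcUrl = "jdbc:postgresql://localhost:5432/eaf_eventstore_test"
    maximumPoolSize = 20          // server allows 50; leave headroom
    minimumIdle = 5
    connectionTimeout = 30_000    // ms; pool exhaustion surfaces as timeouts
})
```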

## Performance Test Categories

### 1. CI Smoke Tests (Always Enabled)

- **Purpose**: Catch major performance regressions in the CI pipeline
- **Dataset**: 10-25 events per test
- **Runtime**: <30 seconds total
- **SLA**: 2x more lenient than the full tests

```bash
# Enable in CI
-Dperformance.smoke.tests.enabled=true
-Dci.performance.enabled=true
```
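
A sketch of how a smoke test can be gated on these flags, assuming JUnit 5; the class and test names are illustrative:

```kotlin
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.condition.EnabledIfSystemProperty

// Illustrative smoke test, skipped unless the CI flag above is set.
class TokenOperationSmokeTest {
    @Test
    @EnabledIfSystemProperty(named = "performance.smoke.tests.enabled", matches = "true")
    fun `token operations meet the relaxed CI SLA`() {
        // measure a 10-25 event workload and assert against 2x the full-test target
    }
}
```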

### 2. Full Performance Tests (Manual/Nightly)

- **Purpose**: Comprehensive performance validation
- **Dataset**: 1000-10000 events per test
- **Runtime**: 5-15 minutes per test
- **SLA**: Production-ready targets

```bash
# Enable full tests
# Uncomment @Disabled annotations in test classes
```
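
The pattern in the test classes looks roughly like this (the class name is illustrative); removing or commenting out the annotation enables the test:

```kotlin
import org.junit.jupiter.api.Disabled
import org.junit.jupiter.api.Test

// Full performance tests ship disabled so they stay out of the CI time budget.
@Disabled("Run manually or in the nightly job; takes 5-15 minutes")
class AggregateLoadingPerformanceTest {
    @Test
    fun `loads a 1000-event aggregate in under 100ms`() {
        // full-scale measurement against the production SLA targets
    }
}
```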

### 3. Large Dataset Tests (On-Demand)

- **Purpose**: Scale testing with production-like data volumes
- **Dataset**: 1M+ events
- **Runtime**: 30+ minutes
- **SLA**: Throughput and memory efficiency
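
For this category the headline number is throughput rather than latency. A sketch of the measurement; `appendBatch` is a hypothetical stand-in for a real batch write:

```kotlin
// Hypothetical stand-in for writing one batch of events to the store.
fun appendBatch(batchSize: Int) { /* call the real batch append here */ }

// Illustrative throughput measurement: events per second over a large run.
fun measureWriteThroughput(totalEvents: Int = 1_000_000, batchSize: Int = 100): Double {
    val startNanos = System.nanoTime()
    repeat(totalEvents / batchSize) { appendBatch(batchSize) }
    val elapsedSeconds = (System.nanoTime() - startNanos) / 1e9
    return totalEvents / elapsedSeconds
}
```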

## Performance Monitoring

### Key Metrics to Track

- **Latency Metrics**
  - Average response time
  - 95th percentile response time
  - Maximum response time
- **Throughput Metrics**
  - Events per second (write)
  - Queries per second (read)
  - Concurrent user capacity
- **Resource Metrics**
  - CPU utilization
  - Memory usage (heap + off-heap)
  - Database connection pool usage
  - Disk I/O patterns
- **Database Metrics**
  - Query execution plans
  - Index usage statistics
  - Lock contention
  - Connection pool efficiency
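
A sketch of capturing the latency metrics above in code, assuming Micrometer is on the classpath; the metric name is illustrative:

```kotlin
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer

// Illustrative timer covering average, 95th percentile, and max latency.
fun eventAppendTimer(registry: MeterRegistry): Timer =
    Timer.builder("eaf.eventstore.append")
        .publishPercentiles(0.95)
        .register(registry)

// Usage: eventAppendTimer(registry).record { /* append the event */ }
```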

### Alerting Thresholds

```yaml
# Example monitoring configuration
performance_alerts:
  token_operations:
    warning: 3ms
    critical: 5ms
  aggregate_loading:
    warning: 75ms
    critical: 100ms
  event_streaming:
    warning: 150ms
    critical: 200ms
```

## Troubleshooting Performance Issues

### Common Performance Problems

1. **Optimistic Locking Conflicts**
   - Symptom: `EventStoreException: Concurrency conflict detected`
   - Cause: Duplicate aggregate IDs or sequence numbers
   - Fix: Ensure unique aggregate identifiers per test; genuinely transient conflicts can be retried (see the sketch after this list)
2. **Database Index Misses**
   - Symptom: Query plans show `Seq Scan` instead of `Index Scan`
   - Cause: Missing or ineffective indexes
   - Fix: Analyze `EXPLAIN ANALYZE` output and optimize the indexes
3. **Connection Pool Exhaustion**
   - Symptom: Timeouts during concurrent tests
   - Cause: Too few database connections
   - Fix: Increase the pool size or reduce concurrency
4. **Memory Pressure**
   - Symptom: GC overhead, `OutOfMemoryError`
   - Cause: Large result sets or memory leaks
   - Fix: Implement streaming, pagination, and proper cleanup
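
A minimal retry sketch for the transient case of problem 1; the helper is illustrative, `EventStoreException` is the SDK exception quoted above:

```kotlin
// Illustrative retry wrapper for transient concurrency conflicts.
// Genuine duplicate-ID bugs should be fixed, not retried.
fun <T> retryOnConflict(attempts: Int = 3, block: () -> T): T {
    var lastError: EventStoreException? = null
    repeat(attempts) {
        try {
            return block()
        } catch (e: EventStoreException) {
            lastError = e
        }
    }
    throw lastError ?: IllegalStateException("attempts must be positive")
}
```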

### Performance Debugging Commands

```bash
# Run performance tests with profiling
nx test eaf-eventsourcing-sdk --args="--tests '*performance*' -Dperformance.profiling.enabled=true"

# Generate JMH benchmark reports
nx test eaf-eventsourcing-sdk --args="--tests '*runJmhBenchmarks*'"

# Database query analysis
psql -d eaf_eventstore_test -c "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM domain_events WHERE tenant_id = 'test' LIMIT 1000;"
```

## Continuous Improvement

### Performance Regression Detection

- Baseline Tracking: Store performance baselines in CI artifacts
- Trend Analysis: Monitor performance changes over time
- Automated Alerts: Flag significant performance degradations
- Root Cause Analysis: Correlate performance changes with code changes
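
The core of the baseline comparison is simple; a sketch with an assumed 20% tolerance:

```kotlin
// Illustrative regression gate: flag a run that is more than `tolerance`
// slower than the stored baseline. The 20% default is an assumption.
fun isRegression(baselineMillis: Double, currentMillis: Double, tolerance: Double = 0.20): Boolean =
    currentMillis > baselineMillis * (1 + tolerance)

// Example: a 47ms baseline flags anything slower than 56.4ms.
```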

### Hardware Scaling Guidelines

| Environment | CPU Cores | RAM | Storage | Expected Throughput |
|---|---|---|---|---|
| Development | 4 cores | 8GB | SSD | 1K events/sec |
| Testing | 8 cores | 16GB | NVMe SSD | 5K events/sec |
| Production | 16+ cores | 32GB+ | Enterprise SSD | 10K+ events/sec |