Project Goal
Build an expense approval agent that demonstrates production-ready reliability techniques:
- ✅ Deterministic business rules - Validate expenses with 100% consistency
- ✅ Security guardrails - Detect and handle PII, jailbreak attempts
- ✅ Optimized tool selection - Handle 10+ tools efficiently
- ✅ Measurable accuracy - Achieve 95%+ rule compliance
Architecture
Project Requirements
1. Deterministic Business Rules (40 points)
Implement a validation tool with hard business rules:
- Meal expense limits ($50/day)
- Hotel limits ($200/night)
- Receipt requirements (receipt required for expenses over $25)
- Flight class restrictions (economy only for flights under 5 hours)
- Manager approval thresholds (approval required for expenses over $500)
- Consistent response format (success/error envelope)
- 100% consistency (same input = same output)
- Clear violation messages
- Actionable approval requirements
- Unit testable validation logic
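A minimal sketch of how the rules above could be encoded as a pure function that returns a success/error envelope. The `Expense` fields, the envelope keys, and the `POLICY_VIOLATION` code are illustrative assumptions, not a required schema. The flight rule is read here as "flights under 5 hours must be economy", and `amount` is assumed to be the daily total for meals and the nightly rate for hotels.

```python
from dataclasses import dataclass
from decimal import Decimal

# Limits taken from the rule list above; constant names are illustrative.
MEAL_LIMIT_PER_DAY = Decimal("50")
HOTEL_LIMIT_PER_NIGHT = Decimal("200")
RECEIPT_THRESHOLD = Decimal("25")
MANAGER_APPROVAL_THRESHOLD = Decimal("500")
ECONOMY_ONLY_BELOW_HOURS = 5  # assumed reading of "economy <5 hours"


@dataclass(frozen=True)
class Expense:
    category: str              # "meal", "hotel", "flight", or anything else
    amount: Decimal            # daily total for meals, nightly rate for hotels
    has_receipt: bool = False
    flight_hours: float = 0.0
    flight_class: str = "economy"


def validate_expense(expense: Expense) -> dict:
    """Pure function: no clock, randomness, or I/O, so same input gives same output."""
    violations = []
    if expense.category == "meal" and expense.amount > MEAL_LIMIT_PER_DAY:
        violations.append(
            f"Meal total ${expense.amount} exceeds the ${MEAL_LIMIT_PER_DAY}/day limit")
    if expense.category == "hotel" and expense.amount > HOTEL_LIMIT_PER_NIGHT:
        violations.append(
            f"Hotel rate ${expense.amount} exceeds the ${HOTEL_LIMIT_PER_NIGHT}/night limit")
    if expense.amount > RECEIPT_THRESHOLD and not expense.has_receipt:
        violations.append(f"Receipt required for expenses over ${RECEIPT_THRESHOLD}")
    if (expense.category == "flight"
            and expense.flight_hours < ECONOMY_ONLY_BELOW_HOURS
            and expense.flight_class != "economy"):
        violations.append(
            f"Flights under {ECONOMY_ONLY_BELOW_HOURS} hours must be booked in economy")

    approvals = ["manager"] if expense.amount > MANAGER_APPROVAL_THRESHOLD else []
    if violations:
        return {"success": False,
                "error": {"code": "POLICY_VIOLATION",
                          "violations": violations,
                          "required_approvals": approvals}}
    status = "needs_approval" if approvals else "approved"
    return {"success": True,
            "data": {"status": status, "required_approvals": approvals}}
```

Because the function has no side effects, each rule can be unit tested with a single call and an equality check on the returned envelope.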
- PII detection (email, SSN, credit card)
- Jailbreak detection (instruction override attempts)
- Input sanitization
- Output validation
- PII detection using regex or Presidio
- Jailbreak pattern detection
- Output content filtering
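The regex route for the guardrails above might look like the following sketch. The patterns are deliberately simple assumptions (US-style SSNs, 13 to 16 digit card numbers without a Luhn check, a handful of override phrases) and would need tuning against real traffic. The function names and the result keys are hypothetical.

```python
import re

# Illustrative patterns only; real deployments need broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}

# A few common instruction-override phrasings (assumed examples, not exhaustive).
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (?:all |any )?(?:previous|prior|above) (?:instructions|rules)", re.I),
    re.compile(r"disregard (?:your|the) (?:instructions|rules|policy|policies)", re.I),
    re.compile(r"you are now (?:in )?(?:developer|unrestricted) mode", re.I),
    re.compile(r"(?:reveal|show|print) (?:your )?(?:system|hidden) prompt", re.I),
]


def detect_pii(text: str) -> list:
    """Return the names of the PII types found, in pattern-definition order."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def redact_pii(text: str) -> str:
    """Replace each PII match with a typed placeholder."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{name.upper()}]", text)
    return text


def is_jailbreak_attempt(text: str) -> bool:
    return any(pattern.search(text) for pattern in JAILBREAK_PATTERNS)


def check_input(text: str) -> dict:
    """Input guardrail: block override attempts, sanitize everything else."""
    return {
        "allowed": not is_jailbreak_attempt(text),
        "pii_types": detect_pii(text),
        "sanitized_text": redact_pii(text),
    }
```

The same `redact_pii` call can be reused on model output before it is returned, which covers the output-validation item with no extra patterns.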
- At least 10 distinct tools (expense validation, data fetch, notifications, etc.)
- Implement ONE optimization technique:
  - Hierarchical organization (router tool)
  - Context-based tool groups (phase-specific)
  - Retrieval-augmented selection (semantic search)
  - Tool consolidation (combine related operations)
- Tool usage analytics (track calls, success rate, latency)
- Selection accuracy measurement (% correct tool chosen)
- Performance comparison (before/after optimization)
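As one possible shape for the items above, the sketch below uses context-based tool groups as the optimization technique and adds a small in-memory tracker for calls, success rate, and latency. All tool names, phase names, and the `ToolAnalytics` class are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical tools (11 in total) grouped by workflow phase.
TOOL_GROUPS = {
    "intake": ["parse_receipt", "fetch_employee", "fetch_policy"],
    "validation": ["validate_expense", "check_budget", "convert_currency"],
    "approval": ["request_manager_approval", "notify_employee", "notify_finance"],
    "payout": ["schedule_reimbursement", "write_audit_log"],
}


def tools_for_phase(phase: str) -> list:
    """Expose only the current phase's tools to the model instead of all of them."""
    return list(TOOL_GROUPS.get(phase, []))


class ToolAnalytics:
    """In-memory counters for calls, successes, and latency per tool."""

    def __init__(self):
        self._stats = defaultdict(
            lambda: {"calls": 0, "successes": 0, "total_latency_s": 0.0})

    def record(self, tool: str, success: bool, latency_s: float) -> None:
        stats = self._stats[tool]
        stats["calls"] += 1
        stats["successes"] += int(success)
        stats["total_latency_s"] += latency_s

    def summary(self, tool: str) -> dict:
        stats = self._stats[tool]
        calls = stats["calls"]
        return {
            "calls": calls,
            "success_rate": stats["successes"] / calls if calls else 0.0,
            "avg_latency_s": stats["total_latency_s"] / calls if calls else 0.0,
        }


def selection_accuracy(chosen: list, expected: list) -> float:
    """Fraction of queries where the chosen tool matches the labeled correct tool."""
    if not expected:
        return 0.0
    hits = sum(1 for c, e in zip(chosen, expected) if c == e)
    return hits / len(expected)
```

Running the same labeled queries once with every tool exposed and once with `tools_for_phase` gives the before/after comparison: compare `selection_accuracy` and the average latency from `summary` across the two runs.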
- Pre-execution guardrails (check before executing action)
- Post-execution validation (verify results)
- Structured error responses with next actions
- Logging for all critical operations
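The four items above can be combined in a single wrapper around any critical action. This is a sketch under assumed conventions: `run_guarded`, the error codes, and the `next_actions` strings are hypothetical, and the checks are plain callables that return `None` when satisfied or a message describing the problem.

```python
import logging

logger = logging.getLogger("expense_agent")  # logger name is an arbitrary choice


def error_response(code: str, message: str, next_actions: list) -> dict:
    """Structured error that tells the caller what to do next."""
    return {"success": False,
            "error": {"code": code, "message": message, "next_actions": next_actions}}


def run_guarded(name, action, pre_check, post_check, payload) -> dict:
    """Run action(payload) only if pre_check passes, then verify the result."""
    logger.info("starting %s", name)

    problem = pre_check(payload)
    if problem:
        logger.warning("%s blocked before execution: %s", name, problem)
        return error_response("PRECHECK_FAILED", problem,
                              ["Correct the input and resubmit"])
    try:
        result = action(payload)
    except Exception as exc:  # report failures as data instead of crashing the agent
        logger.exception("%s raised during execution", name)
        return error_response("EXECUTION_FAILED", str(exc),
                              ["Retry later", "Escalate to a human reviewer"])

    problem = post_check(result)
    if problem:
        logger.error("%s produced an invalid result: %s", name, problem)
        return error_response("POSTCHECK_FAILED", problem,
                              ["Do not rely on this result", "Escalate to a human reviewer"])

    logger.info("%s completed", name)
    return {"success": True, "data": result}


# Example wiring with made-up checks for a reimbursement action.
def amount_is_positive(payload):
    return None if payload.get("amount", 0) > 0 else "amount must be positive"


def has_confirmation(result):
    return None if result.get("confirmation_id") else "missing confirmation_id"


def fake_reimburse(payload):
    return {"confirmation_id": "demo-1", "amount": payload["amount"]}
```

Returning the failure as an envelope rather than raising keeps the response format identical to the success case, so the agent can read `next_actions` and decide how to proceed.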
Bonus Challenges
Choose one or more:
- Full security stack: Implement all 3 guardrails (PII + jailbreak + content filtering)
- Advanced validation: Add complex rules (budget tracking, multi-currency support)
- Tool analytics dashboard: Build real-time monitoring of tool usage patterns
- A/B testing framework: Compare multiple tool selection strategies
- Cost optimization: Implement tool result caching and batch operations
Metrics to Track
Business Rule Accuracy:
- Policy compliance rate: % of expenses correctly validated
- False positive rate: Valid expenses incorrectly rejected
- False negative rate: Invalid expenses incorrectly approved
- Target: 100% consistency, 0% false positives/negatives
Security:
- PII detection rate: % of PII correctly identified
- Jailbreak detection rate: % of attacks caught
- False alarm rate: Legitimate requests blocked
- Target: >95% detection, <5% false alarms
Tool Selection:
- Selection accuracy: % of queries using correct tool
- Average tools retrieved: Efficiency of pre-filtering
- Response latency: Time to select and execute
- Target: >90% accuracy, <3 tools retrieved on average
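For the business rule metrics, a labeled test set is enough to compute all three rates. The sketch below assumes each case is a pair `(is_valid, was_approved)` and takes the denominators to be the number of valid and invalid expenses respectively. That choice is an assumption, since the definitions above do not state one.

```python
def rule_accuracy_metrics(cases: list) -> dict:
    """cases: list of (is_valid, was_approved) pairs from a labeled test set."""
    total = len(cases)
    valid = [c for c in cases if c[0]]
    invalid = [c for c in cases if not c[0]]

    correct = sum(1 for is_valid, approved in cases if is_valid == approved)
    false_positives = sum(1 for _, approved in valid if not approved)  # valid but rejected
    false_negatives = sum(1 for _, approved in invalid if approved)    # invalid but approved

    return {
        "compliance_rate": correct / total if total else 0.0,
        "false_positive_rate": false_positives / len(valid) if valid else 0.0,
        "false_negative_rate": false_negatives / len(invalid) if invalid else 0.0,
    }
```

The same counting applies to the detection metrics: treat "contains PII" or "is an attack" as the label and "was flagged" as the prediction.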