```markdown
# Agent Reasoning Flow Guide
## Overview
RewardPilot uses a multi-stage reasoning process powered by Claude 3.5 Sonnet (planning) and Gemini 2.0 Flash (synthesis). This guide explains how the agent thinks through complex credit card optimization decisions.
## Why Multi-LLM Architecture?
| Stage | LLM | Reason |
|-------|-----|--------|
| **Planning** | Claude 3.5 Sonnet | Best at strategic thinking, tool use |
| **Synthesis** | Gemini 2.0 Flash | Fast context processing, cost-effective |
| **Verification** | GPT-4o | High accuracy for critical decisions |
**Cost Comparison:**
- Single GPT-4o: $0.15 per recommendation
- Multi-LLM: $0.03 per recommendation (5x cheaper)
---
## Four-Phase Reasoning Process
```
┌─────────────────────────────────────────────────────────┐
│                    USER TRANSACTION                     │
│            "Whole Foods, $127.50, Groceries"            │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    PHASE 1: PLANNING                    │
│                   (Claude 3.5 Sonnet)                   │
│                                                         │
│  Input: Transaction context                             │
│  Output: Execution strategy                             │
│                                                         │
│  Questions:                                             │
│  1. What category is this? (Groceries)                  │
│  2. Which cards have grocery bonuses?                   │
│  3. Are there spending caps to check?                   │
│  4. Need to forecast future spending?                   │
│  5. Any special merchant restrictions?                  │
│                                                         │
│  Strategy:                                              │
│  - Call Smart Wallet MCP (get card recommendations)     │
│  - Call RAG MCP (check merchant acceptance)             │
│  - Call Forecast MCP (check cap status)                 │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                   PHASE 2: EXECUTION                    │
│               (Parallel MCP Server Calls)               │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Smart Wallet │  │ Rewards RAG  │  │   Forecast   │   │
│  │     MCP      │  │     MCP      │  │     MCP      │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         ▼                 ▼                 ▼           │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Results:                                         │   │
│  │ - Amex Gold: 4x = $5.10                          │   │
│  │ - Citi Custom: 5% but cap hit                    │   │
│  │ - Chase Freedom: Not in grocery quarter          │   │
│  │                                                  │   │
│  │ - Merchant: Amex accepted at Whole Foods         │   │
│  │                                                  │   │
│  │ - Forecast: $450/$500 cap remaining this month   │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                   PHASE 3: REASONING                    │
│                 (Gemini 2.0 Flash Exp)                  │
│                                                         │
│  Input: All MCP results + transaction context           │
│  Output: Synthesized explanation                        │
│                                                         │
│  Reasoning Chain:                                       │
│                                                         │
│  1. Compare Rewards:                                    │
│     - Amex Gold: 4x points = $5.10 cash value           │
│     - Citi Custom Cash: Would be 5% ($6.38) but         │
│       monthly cap already hit                           │
│     - Winner: Amex Gold ($5.10 > $1.28)                 │
│                                                         │
│  2. Check Constraints:                                  │
│     - Amex accepted at Whole Foods? ✅ Yes              │
│     - Annual cap status? $2,450/$25,000 (safe)          │
│     - Foreign transaction fee? ✅ None                  │
│                                                         │
│  3. Future Optimization:                                │
│     - Forecast shows 3 more grocery trips this month    │
│     - Total: $127.50 × 3 = $382.50                      │
│     - Rewards: $382.50 × 4% = $15.30                    │
│     - Recommendation: Continue using Amex Gold          │
│                                                         │
│  4. Alternative Scenarios:                              │
│     - If Citi cap not hit: Use Citi ($6.38 > $5.10)     │
│     - If at Costco: Use Citi (Amex not accepted)        │
│     - If annual cap near: Switch to Citi next month     │
│                                                         │
│  Confidence: 95% (high certainty)                       │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│              PHASE 4: RESPONSE FORMATTING               │
│                   (Structured Output)                   │
│                                                         │
│  {                                                      │
│    "recommended_card": {                                │
│      "card_id": "c_amex_gold",                          │
│      "card_name": "American Express Gold",              │
│      "issuer": "American Express"                       │
│    },                                                   │
│    "rewards": {                                         │
│      "points_earned": 510,                              │
│      "cash_value": 5.10,                                │
│      "earn_rate": "4x points"                           │
│    },                                                   │
│    "reasoning": "Amex Gold offers 4x points...",        │
│    "confidence": 0.95,                                  │
│    "alternatives": [...],                               │
│    "warnings": [...]                                    │
│  }                                                      │
└─────────────────────────────────────────────────────────┘
```
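The dollar figures in the diagram follow from simple arithmetic. The sketch below reproduces them, assuming points are valued at 1 cent each (the valuation under which 510 points equal $5.10) and that Citi Custom Cash earns 1% once its cap is hit (which is what the $1.28 figure implies). The function names `points_value` and `cashback_value` are illustrative and not part of RewardPilot.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def points_value(amount: str, multiplier: int, cents_per_point: str = "1.0") -> Decimal:
    """Cash value of points earned at `multiplier` points per dollar."""
    points = Decimal(amount) * multiplier
    value = points * Decimal(cents_per_point) / 100
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def cashback_value(amount: str, percent: str) -> Decimal:
    """Cash back earned at a flat percentage rate."""
    value = Decimal(amount) * Decimal(percent) / 100
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


amex_gold = points_value("127.50", 4)          # 4x points at 1 cent per point
citi_uncapped = cashback_value("127.50", "5")  # 5% if the cap had not been hit
citi_capped = cashback_value("127.50", "1")    # assumed 1% base rate after the cap
forecast = cashback_value("382.50", "4")       # three more trips at an effective 4%
```

Using `Decimal` with explicit half-up rounding avoids the binary floating-point surprises that can turn $6.375 into $6.37 instead of $6.38.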
---
## Phase 1: Planning (Claude 3.5 Sonnet)
### Implementation
```python
import json
import os

from anthropic import AsyncAnthropic

# Async client, so the API call below can be awaited without blocking the event loop
anthropic = AsyncAnthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))


async def create_execution_plan(transaction: dict) -> dict:
    """
    Claude analyzes the transaction and creates an execution strategy
    """
    prompt = f"""
You are a credit card optimization expert. Analyze this transaction and create an execution plan.

Transaction:
- Merchant: {transaction['merchant']}
- Category: {transaction['category']}
- Amount: ${transaction['amount_usd']}
- MCC Code: {transaction['mcc']}
- User ID: {transaction['user_id']}

Available MCP servers:
1. smart_wallet - Analyzes user's cards and calculates rewards
2. rewards_rag - Semantic search of card benefits and restrictions
3. spend_forecast - Predicts spending and cap warnings

Your task:
1. Determine which MCP servers to call
2. Prioritize the calls (some may depend on others)
3. Identify key decision factors
4. Set confidence threshold for recommendation

Return a JSON plan with:
{{
  "strategy": "optimization approach (e.g., 'max_rewards', 'cap_aware')",
  "mcp_calls": [
    {{
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Need to know available cards and base rewards"
    }},
    {{
      "service": "rewards_rag",
      "priority": 2,
      "reason": "Check if merchant accepts top card"
    }},
    {{
      "service": "spend_forecast",
      "priority": 3,
      "reason": "Verify monthly cap status"
    }}
  ],
  "decision_factors": [
    "reward_rate",
    "merchant_acceptance",
    "spending_caps",
    "annual_fees"
  ],
  "confidence_threshold": 0.85,
  "complexity": "medium"
}}
"""
    response = await anthropic.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        temperature=0.3,  # Lower temperature for consistent planning
        messages=[{
            "role": "user",
            "content": prompt
        }]
    )

    # Parse JSON response (raises json.JSONDecodeError if the model wraps the JSON in prose)
    plan = json.loads(response.content[0].text)
    return plan
```
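Calling `json.loads` directly on model output fails whenever the model adds a sentence before or after the JSON. One way to make parsing more forgiving is sketched below; `parse_plan` and `REQUIRED_KEYS` are hypothetical names, and the required-key set is an assumption based on the plan format shown in the prompt.

```python
import json

# Keys taken from the plan template in the planning prompt (assumed to be mandatory)
REQUIRED_KEYS = {"strategy", "mcp_calls", "decision_factors", "confidence_threshold"}


def parse_plan(text: str) -> dict:
    """Extract the outermost JSON object from model output and check its shape."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    plan = json.loads(text[start:end + 1])

    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"plan is missing keys: {sorted(missing)}")
    if not plan["mcp_calls"]:
        raise ValueError("plan contains no MCP calls")
    return plan


raw_output = (
    "Here is the plan:\n"
    '{"strategy": "max_rewards", '
    '"mcp_calls": [{"service": "smart_wallet", "priority": 1}], '
    '"decision_factors": ["reward_rate"], "confidence_threshold": 0.9}\n'
    "Let me know if you need changes."
)
parsed = parse_plan(raw_output)
```

Rejecting an empty `mcp_calls` list at this stage keeps the failure close to its cause rather than surfacing later during execution.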
### Example Plans
#### Simple Transaction
```json
{
  "strategy": "max_rewards",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Straightforward category bonus"
    }
  ],
  "decision_factors": ["reward_rate"],
  "confidence_threshold": 0.90,
  "complexity": "low"
}
```
#### Complex Transaction
```json
{
  "strategy": "cap_aware_optimization",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Get all card options"
    },
    {
      "service": "spend_forecast",
      "priority": 2,
      "reason": "Check if near monthly/annual caps"
    },
    {
      "service": "rewards_rag",
      "priority": 3,
      "reason": "Verify merchant acceptance for top 2 cards"
    }
  ],
  "decision_factors": [
    "reward_rate",
    "spending_caps",
    "merchant_acceptance",
    "future_spending"
  ],
  "confidence_threshold": 0.80,
  "complexity": "high"
}
```
---
## Phase 2: Execution (Parallel MCP Calls)
### Implementation
```python
import asyncio
from itertools import groupby

import httpx

# MCP_ENDPOINTS is assumed to be defined elsewhere as a mapping of service name -> base URL


async def execute_mcp_calls(plan: dict, transaction: dict) -> dict:
    """
    Execute MCP calls based on plan
    """
    # Sort by priority
    sorted_calls = sorted(
        plan["mcp_calls"],
        key=lambda x: x["priority"]
    )

    results = {}
    # Execute in priority order; calls that share a priority run in parallel.
    # groupby also handles an empty call list without raising IndexError.
    for _, group in groupby(sorted_calls, key=lambda x: x["priority"]):
        group_results = await execute_priority_group(
            list(group),
            transaction
        )
        results.update(group_results)
    return results


async def execute_priority_group(calls: list, transaction: dict) -> dict:
    """Execute MCP calls of same priority in parallel"""
    handlers = {
        "smart_wallet": call_smart_wallet,
        "rewards_rag": call_rewards_rag,
        "spend_forecast": call_forecast,
    }
    # Skip unknown services so results stay aligned with their service names
    known_calls = [c for c in calls if c["service"] in handlers]
    tasks = [handlers[c["service"]](transaction) for c in known_calls]
    results = await asyncio.gather(*tasks)
    return dict(zip([c["service"] for c in known_calls], results))


async def call_smart_wallet(transaction: dict) -> dict:
    """Call Smart Wallet MCP"""
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            f"{MCP_ENDPOINTS['smart_wallet']}/analyze",
            json=transaction
        )
        response.raise_for_status()
        return response.json()

# Similar for other MCP servers...
```
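With a plain `asyncio.gather`, one failing or slow MCP server aborts the whole priority group. The sketch below shows one possible way to keep partial results instead, using `return_exceptions=True` and a per-call timeout. The helper name `gather_tolerant`, the 5-second default, and the stand-in coroutines are assumptions for illustration only.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_tolerant(
    calls: dict[str, Callable[[], Awaitable[dict]]],
    timeout: float = 5.0,
) -> tuple[dict, dict]:
    """Run named async calls in parallel; return (results, errors) keyed by name."""
    names = list(calls)
    outcomes = await asyncio.gather(
        *(asyncio.wait_for(calls[name](), timeout) for name in names),
        return_exceptions=True,  # failures come back as values instead of raising
    )

    results, errors = {}, {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = repr(outcome)
        else:
            results[name] = outcome
    return results, errors


# Stand-ins for real MCP calls
async def fake_smart_wallet() -> dict:
    return {"recommended_card": "Amex Gold"}


async def fake_rewards_rag() -> dict:
    raise RuntimeError("rewards_rag unavailable")


results, errors = asyncio.run(
    gather_tolerant({"smart_wallet": fake_smart_wallet, "rewards_rag": fake_rewards_rag})
)
```

The caller can then decide whether the missing services are essential or whether the recommendation can proceed with a lower confidence score.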
---
## Phase 3: Reasoning (Gemini 2.0 Flash)
### Implementation
```python
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash-exp")


async def synthesize_reasoning(
    transaction: dict,
    mcp_results: dict,
    plan: dict
) -> str:
    """
    Gemini synthesizes all information into a coherent explanation
    """
    prompt = f"""
You are a credit card optimization expert. Synthesize the following information into a clear recommendation.

Transaction:
{json.dumps(transaction, indent=2)}

MCP Results:
{json.dumps(mcp_results, indent=2)}

Decision Factors (in order of importance):
{json.dumps(plan['decision_factors'], indent=2)}

Your task:
1. Compare all card options on the decision factors
2. Identify the optimal card with clear reasoning
3. Explain why alternatives are suboptimal
4. Provide any warnings or caveats
5. Suggest future optimizations

Format your response as:

## Recommended Card
[Card name and key benefit]

## Reasoning
[Step-by-step logic]

## Comparison
[Table comparing top 3 options]

## Warnings
[Any caveats or cap warnings]

## Future Optimization
[How to maximize rewards going forward]

Be specific with numbers and percentages.
"""
    # Async variant, so the call does not block the event loop
    response = await model.generate_content_async(
        prompt,
        generation_config={
            "temperature": 0.7,
            "max_output_tokens": 2048
        }
    )
    return response.text
```
### Example Reasoning Output
```markdown
## Recommended Card
**American Express Gold** - 4x points on U.S. supermarkets
## Reasoning
1. **Reward Rate Comparison:**
- Amex Gold: 4x points = $5.10 cash value (510 points valued at 1 cpp)
- Citi Custom Cash: Would be 5% = $6.38, but monthly cap hit
- Chase Freedom Flex: 1x points = $1.28 (not grocery quarter)
Winner: Amex Gold ($5.10 actual rewards)
2. **Merchant Acceptance:**
- Whole Foods accepts American Express ✅
- No foreign transaction fees ✅
3. **Spending Cap Status:**
- Current: $2,450 / $25,000 annual cap (9.8% used)
- This transaction: $127.50 (0.5% of cap)
- Safe to use ✅
4. **Future Spending Forecast:**
- Predicted 3 more grocery trips this month ($382.50 total)
- Projected rewards: $15.30
- Still well under annual cap
## Comparison
| Card | Earn Rate | Rewards | Cap Status | Accepted? |
|------|-----------|---------|------------|-----------|
| **Amex Gold** | 4x | **$5.10** | 9.8% used | ✅ Yes |
| Citi Custom Cash | 5% | $1.28 | Cap hit | ✅ Yes |
| Chase Freedom Flex | 1x | $1.28 | N/A | ✅ Yes |
## Warnings
⚠️ **Citi Custom Cash Cap Hit**: You've reached the $500 monthly limit on Citi Custom Cash. It will reset on Feb 1st. Consider using it for non-grocery purchases this month.
⚠️ **Annual Cap Tracking**: You're at $2,450/$25,000 on Amex Gold's supermarket bonus. At current pace, you'll hit the cap in November. Plan to switch to Citi Custom Cash after that.
## Future Optimization
1. **This Month**: Continue using Amex Gold for groceries (best rate)
2. **Next Month**: Switch to Citi Custom Cash (5% > 4x after cap resets)
3. **After $25k Cap**: Use Citi Custom Cash or Chase Freedom (if grocery quarter)
4. **Consider**: Blue Cash Preferred (6% at U.S. supermarkets on up to $6,000 per year, then 1%) for the first $6,000 of annual grocery spending
**Estimated Annual Savings**: $523 by following this strategy vs. using single card
```
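Because the synthesis prompt asks for fixed `##` headings, downstream code can split the response into sections and check that none are missing. A minimal sketch, assuming the model follows the requested format; `split_sections` and `EXPECTED_SECTIONS` are hypothetical names.

```python
EXPECTED_SECTIONS = [
    "Recommended Card",
    "Reasoning",
    "Comparison",
    "Warnings",
    "Future Optimization",
]


def split_sections(markdown_text: str) -> dict[str, str]:
    """Map each level-2 heading to the text that follows it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


def missing_sections(markdown_text: str) -> list[str]:
    """List the expected headings that the response left out."""
    found = split_sections(markdown_text)
    return [name for name in EXPECTED_SECTIONS if name not in found]


sample = "## Recommended Card\n**American Express Gold** - 4x points\n## Reasoning\nStep 1"
parts = split_sections(sample)
gaps = missing_sections(sample)
```

A non-empty result from `missing_sections` could be used to trigger a retry or to lower the confidence score.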
---
## Phase 4: Response Formatting
### Implementation
```python
from typing import List

from pydantic import BaseModel


class RecommendedCard(BaseModel):
    card_id: str
    card_name: str
    issuer: str


class Rewards(BaseModel):
    points_earned: int
    cash_value: float
    earn_rate: str


class Alternative(BaseModel):
    card_name: str
    rewards: float
    reason: str


class FinalRecommendation(BaseModel):
    recommended_card: RecommendedCard
    rewards: Rewards
    reasoning: str
    confidence: float
    alternatives: List[Alternative]
    warnings: List[str]
    processing_time_ms: float


def format_recommendation(
    mcp_results: dict,
    reasoning: str,
    processing_time: float
) -> FinalRecommendation:
    """Format final response"""
    smart_wallet_result = mcp_results["smart_wallet"]
    best_card = smart_wallet_result["recommended_card"]

    # Extract alternatives (index 0 is the recommended card itself)
    alternatives = []
    for card in smart_wallet_result["all_cards_comparison"][1:4]:
        alternatives.append(Alternative(
            card_name=card["card_name"],
            rewards=card["rewards"],
            reason=card.get("note", "Lower rewards rate")
        ))

    # Extract warnings; results are keyed by the service name used in the plan
    warnings = []
    if "spend_forecast" in mcp_results:
        warnings.extend(mcp_results["spend_forecast"].get("warnings", []))

    return FinalRecommendation(
        recommended_card=RecommendedCard(**best_card),
        rewards=Rewards(**smart_wallet_result["rewards"]),
        reasoning=reasoning,
        confidence=calculate_confidence(mcp_results),
        alternatives=alternatives,
        warnings=warnings,
        processing_time_ms=processing_time
    )
```
---
## Advanced Reasoning Patterns
### 1. Chain-of-Thought Reasoning
```python
prompt = """
Let's think through this step-by-step:
Step 1: Identify the category
- Merchant: {merchant}
- MCC: {mcc}
- Likely category: ?
Step 2: List cards with bonuses in this category
- Card A: X% on category
- Card B: Y points per dollar
- Card C: Z% cashback
Step 3: Calculate actual rewards
- Card A: ${amount} × X% = $?
- Card B: ${amount} × Y points × $0.01 = $?
- Card C: ${amount} × Z% = $?
Step 4: Check constraints
- Is Card A accepted at merchant?
- Is Card B near spending cap?
- Does Card C have annual fee?
Step 5: Make recommendation
Based on steps 1-4, the best card is...
"""
```
### 2. Self-Consistency
```python
from collections import Counter

# extract_card is assumed to be a helper that pulls the recommended card name out of a response

# Generate multiple reasoning paths
reasoning_paths = []
for _ in range(5):
    response = model.generate_content(
        prompt,
        generation_config={"temperature": 0.8}  # temperature is set via generation_config
    )
    reasoning_paths.append(response.text)

# Vote on most common recommendation
recommendations = [extract_card(path) for path in reasoning_paths]
most_common = Counter(recommendations).most_common(1)[0][0]

# Use the first reasoning path that led to the most common answer
final_reasoning = next(
    path for path in reasoning_paths
    if extract_card(path) == most_common
)
```
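The voting step depends on an `extract_card` helper that the snippet above does not define. One possible implementation is sketched here, assuming each reasoning path follows the `## Recommended Card` format requested in Phase 3, with the card name first and an optional ` - ` separated benefit after it. `majority_card` is a hypothetical companion that also reports the agreement ratio.

```python
from collections import Counter
from typing import List, Optional, Tuple


def extract_card(reasoning: str) -> Optional[str]:
    """Return the card named on the first non-empty line after '## Recommended Card'."""
    lines = reasoning.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower() == "## recommended card":
            for candidate in lines[i + 1:]:
                text = candidate.strip()
                if not text:
                    continue
                if text.startswith("#"):
                    return None  # next heading reached without a card line
                # Drop the benefit after " - " and any bold markers
                return text.split(" - ")[0].replace("*", "").strip()
    return None


def majority_card(paths: List[str]) -> Tuple[str, float]:
    """Return the most common card and the share of paths that agree with it."""
    votes = Counter(card for card in map(extract_card, paths) if card)
    if not votes:
        raise ValueError("no reasoning path named a card")
    card, count = votes.most_common(1)[0]
    return card, count / len(paths)


paths = [
    "## Recommended Card\n**American Express Gold** - 4x points on U.S. supermarkets",
    "## Recommended Card\n\nAmerican Express Gold - best grocery rate",
    "## Recommended Card\n**Citi Custom Cash** - 5% on top category",
]
winner, agreement = majority_card(paths)
```

The agreement ratio could feed into the confidence score, since a 3-of-5 vote is weaker evidence than a unanimous one.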
### 3. Reflection & Verification
```python
# Runs inside an async function; generate_recommendation and refine_recommendation
# are assumed to be defined elsewhere.

# Initial recommendation
initial_rec = await generate_recommendation(transaction, mcp_results)

# Self-critique
critique_prompt = f"""
Review this credit card recommendation:
{initial_rec}

Are there any errors or oversights?
- Did we miss a better card?
- Are the math calculations correct?
- Did we consider all constraints?
- Is the reasoning sound?

If you find issues, provide corrections.
"""
critique = model.generate_content(critique_prompt)

# Refine if needed
if "error" in critique.text.lower() or "issue" in critique.text.lower():
    final_rec = await refine_recommendation(initial_rec, critique.text)
else:
    final_rec = initial_rec
```
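Keyword matching on the critique is fragile: a reply such as "No errors found" contains the word "error" and would trigger an unnecessary refinement. A possible alternative is to ask the critic for an explicit verdict line and parse only that. The sketch below assumes the critique prompt is extended with `VERDICT_INSTRUCTION`; both names are hypothetical.

```python
import re

# Appended to the critique prompt (assumption: the model follows this instruction)
VERDICT_INSTRUCTION = (
    "End your review with a final line that reads exactly "
    "'VERDICT: PASS' or 'VERDICT: REVISE'."
)

_VERDICT_PATTERN = re.compile(
    r"^\s*VERDICT:\s*(PASS|REVISE)\s*$", re.MULTILINE | re.IGNORECASE
)


def needs_revision(critique_text: str) -> bool:
    """Return True when the critic asks for a revision or gives no parseable verdict."""
    verdicts = _VERDICT_PATTERN.findall(critique_text)
    if not verdicts:
        return True  # be conservative when the verdict line is missing
    return verdicts[-1].upper() == "REVISE"


clean_review = "No errors found. The math checks out.\nVERDICT: PASS"
flawed_review = "The Citi figure ignores the monthly cap.\nVERDICT: REVISE"
```

Treating a missing verdict as "revise" costs one extra model call but avoids silently accepting an unreviewed recommendation.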
---
## Confidence Scoring
```python
def calculate_confidence(mcp_results: dict) -> float:
    """
    Calculate confidence score based on multiple factors
    """
    confidence = 1.0

    # Factor 1: Reward difference (higher difference = higher confidence)
    smart_wallet = mcp_results["smart_wallet"]
    comparison = smart_wallet["all_cards_comparison"]
    best_reward = smart_wallet["recommended_card"]["rewards"]
    # Guard against a single-card wallet and a zero best reward
    if len(comparison) > 1 and best_reward > 0:
        second_best = comparison[1]["rewards"]
        reward_gap = (best_reward - second_best) / best_reward
        if reward_gap < 0.1:  # Less than 10% difference
            confidence *= 0.8

    # Factor 2: Merchant acceptance certainty
    if "rewards_rag" in mcp_results:
        sources = mcp_results["rewards_rag"].get("sources") or []
        if sources:
            confidence *= sources[0]["relevance_score"]

    # Factor 3: Cap warnings (results are keyed by the service name used in the plan)
    if "spend_forecast" in mcp_results:
        if mcp_results["spend_forecast"].get("warnings"):
            confidence *= 0.9

    # Factor 4: Data freshness
    # (Lower confidence for stale data)
    return round(confidence, 2)
```
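Factor 4 is left as a comment in the function above. One way it could be filled in is a multiplier that stays at 1.0 while the data is fresh and then decays linearly to a floor. Everything in this sketch is an assumption: the function name, the 24-hour freshness window, the one-week decay window, and the 0.7 floor are illustrative values, not RewardPilot settings.

```python
def freshness_factor(
    age_hours: float,
    fresh_for_hours: float = 24.0,
    decay_window_hours: float = 7 * 24.0,
    floor: float = 0.7,
) -> float:
    """Confidence multiplier in [floor, 1.0] based on the age of the data."""
    if age_hours <= fresh_for_hours:
        return 1.0
    # Fraction of the decay window already elapsed, capped at 1.0
    elapsed = min(age_hours - fresh_for_hours, decay_window_hours) / decay_window_hours
    return 1.0 - (1.0 - floor) * elapsed


fresh = freshness_factor(2.0)           # within the freshness window
halfway = freshness_factor(24.0 + 84)   # half of the decay window elapsed
stale = freshness_factor(24.0 * 30)     # far beyond the decay window
```

Inside `calculate_confidence`, the result would be applied with `confidence *= freshness_factor(age_hours)`, provided the MCP responses carry a timestamp from which `age_hours` can be derived.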
---
## Error Handling & Fallbacks
```python
import logging
import time

logger = logging.getLogger(__name__)


async def recommend_with_fallback(transaction: dict):
    """Graceful degradation if MCP servers fail"""
    start = time.perf_counter()
    try:
        # Try full reasoning pipeline
        plan = await create_execution_plan(transaction)
        mcp_results = await execute_mcp_calls(plan, transaction)
        reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
        # format_recommendation requires the elapsed time as its third argument
        elapsed_ms = (time.perf_counter() - start) * 1000
        return format_recommendation(mcp_results, reasoning, elapsed_ms)
    except Exception as e:
        logger.error(f"Full pipeline failed: {e}")
        try:
            # Fallback: Use only Smart Wallet MCP
            result = await call_smart_wallet(transaction)
            # format_simple_recommendation is assumed to be defined elsewhere
            return format_simple_recommendation(result)
        except Exception as e2:
            logger.error(f"Fallback failed: {e2}")
            # Last resort: Rule-based recommendation
            return rule_based_recommendation(transaction)


def rule_based_recommendation(transaction: dict):
    """Simple rule-based fallback"""
    rules = {
        "Groceries": "Amex Gold (4x points)",
        "Dining": "Amex Gold (4x points)",
        "Travel": "Chase Sapphire Reserve (3x points)",
        "Gas": "Costco Anywhere Visa (4% cashback)",
        "Default": "Citi Double Cash (2% on everything)"
    }
    # .get avoids a KeyError when the transaction has no category
    category = transaction.get("category", "Default")
    recommended = rules.get(category, rules["Default"])
    return {
        "recommended_card": recommended,
        "reasoning": f"Based on category rules for {category}",
        "confidence": 0.60,  # Lower confidence for rule-based
        "warnings": ["Recommendation based on simplified rules (MCP servers unavailable)"]
    }
```
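The nested `try`/`except` blocks above grow one level deeper with every added fallback. The same degradation order can be written as a flat list of strategies tried in sequence. This is a sketch only: `first_successful` and the stand-in strategies are hypothetical names, not part of the pipeline.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def first_successful(strategies):
    """Try (name, zero-argument coroutine function) pairs in order; return the first success."""
    last_error = None
    for name, strategy in strategies:
        try:
            return name, await strategy()
        except Exception as exc:
            logger.warning("strategy %s failed: %s", name, exc)
            last_error = exc
    raise RuntimeError("all strategies failed") from last_error


# Stand-ins for the full pipeline, the Smart Wallet fallback, and the rule-based fallback
async def full_pipeline():
    raise TimeoutError("planning call timed out")


async def smart_wallet_only():
    raise ConnectionError("smart_wallet unreachable")


async def rule_based():
    return {"recommended_card": "Citi Double Cash (2% on everything)", "confidence": 0.60}


used, outcome = asyncio.run(
    first_successful([
        ("full_pipeline", full_pipeline),
        ("smart_wallet_only", smart_wallet_only),
        ("rule_based", rule_based),
    ])
)
```

Returning the name of the strategy that succeeded makes it easy to log how often the system runs in degraded mode.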
---
## Testing & Evaluation
### Unit Tests
```python
import pytest


@pytest.mark.asyncio
async def test_planning_phase():
    """Test Claude's planning logic"""
    transaction = {
        "user_id": "test_user",  # required by the planning prompt
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    plan = await create_execution_plan(transaction)
    assert "strategy" in plan
    assert "mcp_calls" in plan
    assert len(plan["mcp_calls"]) > 0
    assert plan["confidence_threshold"] >= 0.5


@pytest.mark.asyncio
async def test_reasoning_phase():
    """Test Gemini's synthesis"""
    transaction = {
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50
    }
    mcp_results = {
        "smart_wallet": {
            "recommended_card": {"card_name": "Amex Gold"},
            "rewards": {"cash_value": 5.10}
        }
    }
    # synthesize_reasoning reads plan["decision_factors"], so the plan cannot be empty
    plan = {"decision_factors": ["reward_rate"]}
    reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
    assert "Amex Gold" in reasoning
    assert "$5.10" in reasoning
```
### Integration Tests
```python
@pytest.mark.asyncio
async def test_end_to_end_recommendation():
    """Test full recommendation pipeline"""
    transaction = {
        "user_id": "test_user",
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    result = await recommend_with_fallback(transaction)
    # The full pipeline returns a FinalRecommendation model, so use attribute access
    assert result.recommended_card.card_name
    assert result.rewards.cash_value > 0
    assert result.confidence >= 0.5
    assert len(result.reasoning) > 100
```
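The tests above call the live Claude and Gemini APIs, which makes them slow, costly, and non-deterministic. A common alternative is to inject a fake client. The sketch below assumes the planning function is refactored to accept the client as a parameter; `plan_with` and `make_fake_client` are hypothetical names, and the fake only mimics the `response.content[0].text` shape used in Phase 1.

```python
import asyncio
import json
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def plan_with(client, prompt: str) -> dict:
    """Planning call with an injected client (hypothetical refactoring of Phase 1)."""
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.content[0].text)


def make_fake_client(plan: dict):
    """Build an object that mimics client.messages.create and returns a canned plan."""
    fake_response = SimpleNamespace(content=[SimpleNamespace(text=json.dumps(plan))])
    create = AsyncMock(return_value=fake_response)
    return SimpleNamespace(messages=SimpleNamespace(create=create))


def test_planning_without_network():
    expected = {
        "strategy": "max_rewards",
        "mcp_calls": [{"service": "smart_wallet", "priority": 1}],
        "decision_factors": ["reward_rate"],
        "confidence_threshold": 0.9,
    }
    client = make_fake_client(expected)
    plan = asyncio.run(plan_with(client, "Whole Foods, $127.50, Groceries"))
    assert plan == expected
    client.messages.create.assert_awaited_once()
```

Live-API tests can then be kept in a separate, opt-in suite that runs less frequently.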
---
**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [LlamaIndex RAG Setup](./llamaindex_setup.md)
```
---