
Amit Kothari · AI
Cache the prompt, not the response - why most LLM caching fails
Your LLM API bills are eating your budget because you are caching the wrong thing. Most teams cache responses when they should cache prompts. Anthropic's prompt caching cuts costs by up to 90% and latency by up to 85% by reusing already-processed context instead of reprocessing it on every request.
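Concretely, this works by marking a large, stable prompt prefix as cacheable with `cache_control` in Anthropic's Messages API. The sketch below builds the request payload only (no API call is made); the model name, document text, and function name are illustrative placeholders, but the payload shape follows the documented API.

```python
# Sketch of Anthropic prompt caching: mark a large, stable prompt prefix
# with cache_control so the API reuses its processed form on later calls.
# The reference document and model name here are placeholder assumptions;
# only the payload structure matters.

LARGE_REFERENCE_DOC = "[thousands of tokens of docs, schemas, or few-shot examples]"

def build_cached_request(user_question: str) -> dict:
    """Build a Messages API payload whose system prefix is cacheable."""
    return {
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "Answer questions using the reference material below."},
            {
                "type": "text",
                "text": LARGE_REFERENCE_DOC,
                # Marks the end of the cacheable prefix. Later requests with an
                # identical prefix hit the cache instead of reprocessing it,
                # which is where the cost and latency savings come from.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Only this part changes between requests, so it stays outside the cache.
        "messages": [{"role": "user", "content": user_question}],
    }

payload = build_cached_request("How do I rotate an API key?")
```

The key design point: everything before the `cache_control` marker must be byte-identical across requests for the cache to hit, so put stable context (docs, tool definitions, examples) first and the changing user turn last.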
