A Match Made in Heaven: LLM Judgment at Vector-Search Speed
Last year, I spent far too long chasing a stubborn data quirk: our system couldn't see that “The Law Offices of John Miller” and “Miller, John S. PLLC” were the same entity. That led to a bigger question: can you achieve LLM-level judgment at vector-search speed? Traditional vector search is fast but often misses subtle context like this, while relying entirely on LLMs is too slow and costly for production at scale. In this talk, I’ll share how we bridged that gap at Intuit with a hybrid system that combines fast vector search, efficient small transformer-based models, and fine-tuned small language models (SLMs). The result handles millions of entity comparisons per week, significantly boosting recall while keeping precision comparable to traditional methods.
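To make the hybrid idea concrete, here is a minimal sketch of one common way such pieces can be combined: a cascade in which a cheap similarity score settles the clear cases and only the ambiguous middle band is escalated to a slower judge. This is an illustration under stated assumptions, not the system described in the talk. The character-trigram vectors stand in for real embeddings, `slow_judge` is a rule-based placeholder for a small transformer or fine-tuned SLM, and the function names, stopword list, and thresholds are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical list of words that carry little identity signal in firm names.
STOPWORDS = {"the", "of", "and", "law", "office", "offices", "pllc", "llc", "inc"}


def trigram_vector(name):
    """Toy stand-in for an embedding: counts of character trigrams."""
    text = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    text = " ".join(text.split())
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def core_tokens(name):
    """Tokens left after dropping stopwords and single-letter initials."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return {t for t in tokens if len(t) > 1 and t not in STOPWORDS}


def slow_judge(a, b):
    """Placeholder for the expensive stage (assumed: a small model or SLM call).

    Here it is only a rule: one name's core tokens must contain the other's.
    """
    ta, tb = core_tokens(a), core_tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)


def match_entities(a, b, accept=0.85, reject=0.15):
    """Return (is_match, stage). Thresholds are illustrative assumptions."""
    score = cosine(trigram_vector(a), trigram_vector(b))
    if score >= accept:
        return True, "fast"
    if score < reject:
        return False, "fast"
    # Ambiguous band: pay for the slower, more careful judgment.
    return slow_judge(a, b), "judge"


if __name__ == "__main__":
    print(match_entities("The Law Offices of John Miller", "Miller, John S. PLLC"))
    print(match_entities("Acme Plumbing", "Zenith Bakery Co"))
```

In this sketch, the pair from the opening anecdote scores too low for the fast path to accept and too high for it to reject, so it falls into the ambiguous band and is resolved by the judge stage. Widening or narrowing that band is the knob that trades cost against recall.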