The Problem
AI/ML model performance is typically assessed by how often a model produces the expected outcomes on a test data set. In the real world, however, this approach often breaks down because the definition of an acceptable outcome keeps changing. This can look like:
- Different sensitivities to errors or defects: User requirements vary widely. Stryker may have much narrower tolerances when detecting defects for medical device screws than Ikea has for furniture screws.
- Competing definitions of a variable: Think about sentiment classification - what counts as negative sentiment will vary significantly between a fast-casual restaurant chain and a car manufacturer. Both want to detect negative reviews, but they set very different criteria.
- Varying costs of false positives and false negatives: For fraud detection models, companies tolerate detected and undetected fraudulent transactions differently depending on how the cost of fraud compares to the cost of intervention. This shifts the point at which transactions get escalated (or don't); a brief numerical sketch follows this list.
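As a rough illustration of the fraud example, the sketch below shows how cost assumptions alone move the score threshold at which the same model's predictions get escalated. The function and dollar figures are hypothetical, not drawn from any particular deployment.

```python
# Illustrative only: different fraud/review costs imply different escalation
# thresholds for the *same* model score. All figures below are hypothetical.

def escalation_threshold(cost_missed_fraud: float, cost_review: float) -> float:
    """Escalate when the expected loss from ignoring a transaction exceeds the
    cost of reviewing it: score * cost_missed_fraud > cost_review."""
    return cost_review / cost_missed_fraud

# Client A: fraud losses dwarf review costs -> escalate at very low scores.
print(escalation_threshold(cost_missed_fraud=500.0, cost_review=5.0))   # 0.01

# Client B: reviews are expensive relative to typical fraud -> escalate sparingly.
print(escalation_threshold(cost_missed_fraud=50.0, cost_review=10.0))   # 0.20
```

Two clients scoring transactions with the identical model would escalate at scores above 0.01 and 0.20 respectively, purely because of their cost structures.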
These variations can cause previously high-performing models to become inaccurate or inappropriate for certain users, making it hard to meet diverse or shifting requirements.
The Status Quo
A static model inherently has a limited ability to respond to these changing scenarios, and it becomes less effective the moment clients want outcomes weighted differently. To address these challenges, teams often rely on two common strategies:
- Training new models: This requires new data and significant time to build production-grade models that meet new criteria. It also leads to managing an expanding set of models, creating technical debt.
- Human-in-the-loop approaches: While this mitigates the need for new models, it introduces ongoing expenses, especially for highly skilled reviewers, and it scales poorly, since more reviewers must be hired as review volume grows.
Both approaches clean up inaccuracies but limit the ability to respond quickly and cost-effectively to new scenarios. This can delay revenue, prolong sales cycles, and lead to shelved internal projects.
How Orca Fixes This
Orca’s unique model architecture, in which traditional deep-learning models learn to explicitly leverage external information during inference, solves this challenge of diverse (or shifting) opinions about what correct outputs actually are. Simply introduce a new set of external data into your model’s memory, and the model adjusts its outputs, increasing its accuracy against the new criteria (a simplified sketch of this idea appears at the end of this section). With this approach, Orca creates:
- Real-time adaptation: By introducing new external data into the model's memory, outputs adjust to new criteria almost immediately.
- Leveraging existing capabilities: The model benefits from its pre-existing reasoning abilities, reducing the amount of new data needed for adaptation.
- Simplified model management: Maintaining one base model with independent data sets allows teams to focus on improvements rather than managing multiple models.
This solution enables businesses to quickly and efficiently respond to changing criteria, whether driven by diverse customer needs or shifting internal goals, without the drawbacks of traditional approaches.
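For readers who want something concrete, here is a deliberately simplified sketch of the general idea of memory-based adaptation. It is not Orca's implementation; it only illustrates how a frozen model's score can be blended with labels retrieved from a swappable, client-specific memory, so that changing the memory changes the behavior without retraining. The embedding function, blend weight, and example data are all hypothetical.

```python
# Hedged sketch, not Orca's architecture: a frozen base score is blended with
# the labels of the most similar examples in a swappable client memory.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real system would use the model's own encoder."""
    seed = sum(ord(c) for c in text) % (2**32)
    return np.random.default_rng(seed).normal(size=8)

def memory_adjusted_score(x: str, base_score: float,
                          memory: list[tuple[str, float]],
                          weight: float = 0.5, k: int = 3) -> float:
    """Blend the frozen model's score with the mean label of the k most
    similar examples in a client-specific memory (no retraining)."""
    if not memory:
        return base_score
    q = embed(x)
    sims = []
    for example, _ in memory:
        e = embed(example)
        sims.append(q @ e / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))
    top_k = np.argsort(sims)[-k:]
    memory_vote = float(np.mean([memory[i][1] for i in top_k]))
    return (1 - weight) * base_score + weight * memory_vote

# Swapping the memory changes the behavior of the same base model.
client_a_memory = [("slow service", 1.0), ("cold food", 1.0), ("great value", 0.0)]
client_b_memory = [("engine noise", 1.0), ("smooth ride", 0.0), ("minor rattle", 0.0)]
print(memory_adjusted_score("the service was slow", 0.4, client_a_memory))
print(memory_adjusted_score("the service was slow", 0.4, client_b_memory))
```

Swapping client_a_memory for client_b_memory changes the output for the same input and base score, which is the property described above: the criteria live in the data set, not in the model weights.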