Understanding real-time model adaptability

Orca is powered by contextual embeddings in an external database, ensuring that your model uses the most appropriate data during inference.

Step 1: Teach the model to use context

Traditional neural networks learn to make predictions by memorizing the distribution of their training data. However, that training data can fall out of date, compromising the model's effectiveness.

When you set up a classifier or any other deep-learning model using Orca, the model learns to adapt its outputs based on context you choose to supply.

Step 2: Access data during inference

Because the behavior is now encoded in the neural network, an Orca-trained model leverages that context during inference, just like a generative model using retrieval augmentation.

Now, any inference run reacts almost instantly to new information you supply to the model.
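As a conceptual sketch (not Orca's actual API), a retrieval-augmented classifier can be thought of as embedding the input, retrieving the most similar labeled memories from an external store, and letting those neighbors drive the prediction. The toy embeddings, labels, and function names below are all illustrative assumptions:

```python
import numpy as np

# Toy memory bank: labeled embeddings stored outside the model's weights.
# In a real system these would come from an embedding model and a vector
# database; here they are hand-made 2-D vectors for illustration.
MEMORY_EMBEDDINGS = np.array([
    [0.9, 0.1],   # "positive" example
    [0.8, 0.2],   # "positive" example
    [0.1, 0.9],   # "negative" example
    [0.2, 0.8],   # "negative" example
])
MEMORY_LABELS = ["positive", "positive", "negative", "negative"]

def retrieve(query_embedding, k=3):
    """Return the k nearest memories by cosine similarity."""
    norms = np.linalg.norm(MEMORY_EMBEDDINGS, axis=1) * np.linalg.norm(query_embedding)
    sims = MEMORY_EMBEDDINGS @ query_embedding / norms
    top = np.argsort(sims)[::-1][:k]
    return [(MEMORY_LABELS[i], sims[i]) for i in top]

def classify(query_embedding, k=3):
    """Similarity-weighted vote over the retrieved memories."""
    scores = {}
    for label, sim in retrieve(query_embedding, k):
        scores[label] = scores.get(label, 0.0) + sim
    return max(scores, key=scores.get)

print(classify(np.array([0.85, 0.15])))  # near the "positive" cluster
```

Because the memories live outside the model, appending a row to the memory bank changes predictions immediately, with no retraining step.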

Step 3: Monitor. Update. Swap.

Once your model learns to use Orca, you can proactively manage memories to address data drift and other dynamic changes. You can even do complete swaps of context for customized or personalized variants of a model.
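The "complete swap" idea can be sketched as keeping multiple memory banks and choosing one at inference time, while the model weights stay fixed. The bank names, labels, and 2-D embeddings below are illustrative assumptions, not Orca's actual interface:

```python
import math

# Hypothetical per-customer memory banks that can be swapped at inference
# time without touching model weights.
memory_banks = {
    "default":    [("spam",   [1.0, 0.0]), ("ham",     [0.0, 1.0])],
    "customer_a": [("urgent", [1.0, 0.0]), ("routine", [0.0, 1.0])],
}

def predict(x, bank="default"):
    """Nearest-memory lookup against whichever bank is currently active."""
    best_label, best_sim = None, -1.0
    for label, emb in memory_banks[bank]:
        dot = sum(a * b for a, b in zip(x, emb))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in emb))
        sim = dot / norm
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Same input, same weights; swapping the bank changes the behavior.
print(predict([0.9, 0.1]))                     # → spam
print(predict([0.9, 0.1], bank="customer_a"))  # → urgent
```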

FAQs

Let's dive into the details.

Installation and Set-up

What types of models does Orca work with?

Orca works with discriminative models, including:

- Image classifiers
- Ranking models (including within recommendation systems)
- Text classifiers
- More model types coming soon!

How long does it take to implement an Orca model?

Implementing an Orca model typically requires about the same engineering effort as accessing a foundation LLM via an API call. However, just like training or fine-tuning any model (or setting up a high-performing RAG pipeline), a model's accuracy depends heavily on the available data. As a result, the overall time needed to get started with Orca depends on the quality of the data available for your use case. Retrieval-augmented models also make data wrangling comparatively faster, because you don't need to wait for a training cycle to complete before seeing whether new or different data improved your model's accuracy.

Will Orca help with collecting and curating the data used in my models' memory?

The Orca team will work closely with you during both data gathering and preparation phases to optimize your model’s performance. Activities may include:

Developing Synthetic Data: Creating synthetic data to fill gaps in your existing data. Teams often use this synthetic data initially and gradually replace it with real data as it becomes available, refining the model's performance over time.

Labeling Unlabeled Data: Accelerating data labeling through a combination of (1) auto-labeling new data points and (2) flagging low-confidence outliers for human review and labeling by your team (or a third party). We frequently use examples you provide as memories, then use our own Orca-enabled classifier to create labels. This approach lets Orca's auto-labeling become more effective and accurate as we start with, or begin to acquire, more real-world examples.

Testing and Curating Memory Data: Evaluating and refining memory data to ensure accurate and unbiased distributions.
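The auto-labeling flow above amounts to confidence routing: confidently classified points are labeled automatically, and low-confidence outliers are queued for humans. The threshold, function names, and toy classifier below are assumptions for illustration, not Orca's tooling:

```python
# Auto-label points the classifier is confident about; queue the rest
# for human review.
def route_for_labeling(points, classify_with_confidence, threshold=0.8):
    auto_labeled, needs_review = [], []
    for point in points:
        label, confidence = classify_with_confidence(point)
        if confidence >= threshold:
            auto_labeled.append((point, label))
        else:
            needs_review.append(point)
    return auto_labeled, needs_review

# Toy confidence function standing in for a memory-backed classifier.
def toy_classifier(x):
    return ("positive", 0.95) if x > 0 else ("negative", 0.55)

auto, review = route_for_labeling([1.2, -0.4, 3.1], toy_classifier)
print(auto)    # confidently labeled points
print(review)  # outliers queued for human labeling
```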

Do I need to collect actual, real-world data for an Orca model to work?

Technically, you can build an Orca model by populating the initial memories entirely with synthetic data. However, if that data's distribution deviates from the real world, you may find that model accuracy in production is lower than initial test results suggest. To manage this risk, you could:

- Expand a limited data set by supplementing the existing distribution with synthetic data.
- Proactively re-tune the memories as you gather new data. This approach is especially helpful if you don’t have a way to collect data without an app in production.

In situations where you have limited data, we recommend prioritizing that data for your evaluation dataset and using synthetic data for your model's first memories. This ensures you can evaluate and optimize model performance against ground truth, enabling a higher level of accuracy, and helps you avoid accidental biases in a synthetic evaluation set.
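A minimal sketch of that recommendation, using made-up example data: reserve scarce real examples for the evaluation set, seed the memories with synthetic examples, and fold real data into the memories as it arrives. All names and records below are illustrative assumptions:

```python
# Scarce real data goes to evaluation; synthetic data seeds the memories.
real_data = [("great service", "positive"), ("lost my bag", "negative")]
synthetic_data = [
    ("wonderful flight", "positive"), ("terrible delay", "negative"),
    ("friendly crew", "positive"), ("rude agent", "negative"),
]

eval_set = real_data               # ground truth for measuring accuracy
initial_memories = synthetic_data  # what the model retrieves at first

def refresh_memories(memories, new_real_examples):
    """As real data arrives, fold it in so memories drift toward reality."""
    return new_real_examples + memories

initial_memories = refresh_memories(
    initial_memories, [("smooth check-in", "positive")]
)
print(len(eval_set), len(initial_memories))  # 2 5
```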

Will an Orca model work with my existing tech stack and ML tooling?

Yes. Orca does not require you to change your current tech stack. This includes your existing data storage solution along with the tooling you use for monitoring or evaluating your ML models. Orca’s embedding database and memory-tuning tools sit alongside your existing solutions.

How do retrieval-augmented models perform on traditional benchmarking tests?

Depending on the exact use case, Orca may increase your model's accuracy against a static benchmark. Orca performs better in static evaluations when you have very dense data (like images) or challenges with skewed datasets (e.g., toxicity detection). In other cases, a model using retrieval augmentation initially reaches parity with a traditional classifier or ranking model, but the augmented model maintains that performance across changes to inputs and desired outputs.

It’s important to be realistic, however. A toy model using retrieval augmentation with very limited data in its memory simply won’t be able to compete with a state-of-the-art model trained on abundant data. When comparing comparably sized models, retrieval-augmented models achieve the same initial accuracy and then maintain that accuracy even as the data drifts over time.

Model Performance

How does a retrieval-augmented model perform compared to a traditional deep learning model?

In more dynamic scenarios, retrieval-augmented models maintain better performance over time than a traditional deep learning model. You can also optimize a retrieval-augmented model to perform well across multiple test datasets (for example, when you have different preferences for how to classify certain inputs).

In a static scenario, where models have access to well-curated training data, retrieval-augmented models and traditional deep learning models have comparable levels of accuracy. You’ll notice this when measuring these two types of models against benchmark datasets.

How is retrieval augmentation different from using an LLM and RAG?

A retrieval-augmented classifier built with Orca utilizes the same concepts as injecting context into an LLM through RAG. Both approaches ensure that the model can access the latest information and provide customized responses based on data injected during inference.

Both types of models make sense in specific use cases. LLMs customized using RAG are most effective in generative use cases: content creation, expanding or structuring documents from text inputs, and so on. In these applications, an LLM's ability to generate text merits the increased compute and architectural complexity of these massive models.

Retrieval-augmented classifiers and rankers solve the discriminative use cases described in their names (classification and ranking) that require the ability to customize or quickly update the model. Using a retrieval-augmented classifier has several significant advantages over a generative model masquerading as a classifier:
- Scope can be broader than natural language processing, so you can support use cases like image classification.
- Purpose-built models typically achieve similar (or better) accuracy with lower latency and are more cost-effective than a large language model.

How can I improve model performance over time?

The single best way to improve the performance of a retrieval-augmented model is to keep optimizing the data that populates the model’s memory. By actively collecting more data and updating both your evaluation data and the data stored in your model’s memory, your model becomes more effective at matching your goals and reflecting ground truth.

Managing a Production Instance

Does Orca offer hosting?

Orca does offer the option of full hosting, including inference, for retrieval-augmented models that you build with us. However, Orca can also run in your dedicated cloud or on-premises setup if you’d prefer that for security reasons.

Can I use Orca to manage models that use highly sensitive data like PHI and PII?

If you’re willing to set up an Orca instance within your own cloud or on-premises environment, absolutely. Otherwise, we can help you set up proxy datasets for your model’s memories, but that is operationally more complex.

What does Orca do with the data stored as memories in my data set?

Orca does not train any models with your data unless we have contractually agreed to build a custom embedding model as part of your Enterprise engagement with us.

Right now, the data you store as memories in Orca acts as an external reference that your model uses to boost performance.

We have a high volume of reads and writes per minute. Can Orca support that kind of scale?

Yes. Orca has scaling characteristics very similar to a conventional database, meaning it can scale to nearly arbitrary read and write volumes, though at some point you will need to employ standard storage and database scalability techniques to get there (e.g., sharding, partitioning, and read replicas).

Does Orca introduce any latency?

Yes, accessing external data with Orca introduces some latency compared to a model of similar size that relies solely on encoded training data. However, this latency penalty, typically a few milliseconds, is generally very small and usually imperceptible to end users.

In some cases, Orca can actually reduce an application's overall latency compared to injecting context through a repurposed generative model. By shifting to a smaller, more efficient model that maintains accuracy and context awareness, Orca may require less compute per input and a simpler, faster architecture.

Still have questions?

Get in touch. We'd love to help you build more performant models.

Book a call

Blog

Learn more about what Orca is doing and where we’re going.

Stop Contorting Your AI App into an LLM
4 min read

Why converting your discriminative model into an LLM for RAG isn't always worth it.

Building Adaptable AI Systems for a Dynamic World
4 min read

Orca’s vision for the future of AI is one where models adapt instantly to changing data and objectives—unlocking real-time agility without the burden of retraining.

How Orca Helps You Customize to Different Preferences
1 min read

When evaluating an ML model's performance, the definition of "correct" can vary greatly across individuals and customers, posing a challenge in managing diverse preferences.

Keep Up With Rapidly-Evolving Data Using Orca
1 min read

Using memory augmentation techniques, Orca can help models adapt to rapid data drift without costly retraining.

Tackling Toxicity: How Orca’s Retrieval Augmented Classifiers Simplify Content Moderation
10 min read

Detecting toxicity is challenging due to data imbalances and the trade-off between false positives and false negatives. Retrieval-augmented classifiers provide a robust solution for this complex problem.

How Orca Helps Your AI Adapt to Changing Business Objectives
2 min read

ML models must adapt to remain effective as business problems shift toward new customers, products, or goals. Learn how Orca can help.

How Orca Helps You Instantly Expand to New Use Cases
2 min read

ML models in production often face unexpected use cases, and adapting to them can provide significant business value; the challenge is figuring out how to achieve this flexibility.

Orca's Retrieval-Augmented Image Classifier Shows Perfect Robustness Against Data Drift
5 min read

Memory-based updates enable an image classifier to maintain near-perfect accuracy even as data distributions shift—without the need for costly retraining.

Retrieval-Augmented Text Classifiers Adapt to Changing Conditions in Real-Time
6 min read

Orca’s RAC text classifiers adapt in real time to changing data, maintaining accuracy comparable to retraining in a sentiment analysis of airline-related tweets.

Survey: Data Quality and Consistency Are Top Issues for ML Engineers
4 min read

Orca's survey of 205 engineers revealed that data challenges remain at the forefront of machine learning model development.

Find out if Orca is right for you

Speak to our ML engineers to see if we can help you create more consistent models.