Version: 2.0

Reranking

Initial search results often miss nuanced relevance or diversity, which can degrade the user experience. Vectara's reranking can improve the quality and usefulness of search results, leading to more effective information retrieval.

Reranking rescores and refines an initial set of query results to produce a more precise ranking. It uses a machine learning model that, while slower than the rapid retrieval step, offers more accurate results.

Available rerankers

Vectara offers multiple reranking models so that you can choose the best one for your data and use case. You can evaluate different models against your own dataset to determine which gives the best results for your domain and for your accuracy and latency requirements.

Reranker Name	API Name	Description
Qwen3 Reranker (default)	`qwen3-reranker`	High-performance multilingual neural reranker optimized for accuracy. In many benchmarks, Qwen3 demonstrates strong performance, though results vary by dataset.
Mixbread Reranker	`mxbai-rerank-base-v2`	Efficient production-friendly model offering a good balance between speed and accuracy.
Multilingual Reranker v1 (Slingshot)	`Rerank_Multilingual_v1`	Neural reranker providing more accurate ranking than initial Boomerang retrieval. While computationally more expensive, it offers improved text scoring across a wide range of languages.
Maximal Marginal Relevance (MMR) Reranker	`type=mmr`	Diversifies results while maintaining relevance.
User Defined Function Reranker	`type=userfn`	Applies custom scoring based on metadata or business rules.

tip

To enable reranking in the Vectara console, navigate to the Query tab of a corpus and select Retrieval. Use this for exploration and experimenting with the API.

Chain reranking

The Vectara Chain Reranker (type=chain) lets you combine multiple reranking strategies in sequence to meet more complex search requirements, giving you fine-grained control over the ranking functions. For details, see Chain Reranker.

Knee reranking

Designed to work after the Slingshot reranker in a chain (type=userfn and user_function=knee()), knee reranking dynamically filters results by detecting natural cutoff points, improving precision while maintaining recall.
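A chain that applies knee filtering after Slingshot might be assembled as in the sketch below. The values type=chain, type=userfn, user_function=knee(), type=customer_reranker, and the Rerank_Multilingual_v1 name come from this page. The rerankers key used to list the stages is an assumption and may differ in the actual request schema.

```python
import json

# Stage 1: the Slingshot neural reranker rescoring the retrieved results.
slingshot_stage = {
    "type": "customer_reranker",
    "reranker_name": "Rerank_Multilingual_v1",
}

# Stage 2: knee filtering, expressed as a user defined function.
knee_stage = {
    "type": "userfn",
    "user_function": "knee()",
}

# "rerankers" is an assumed key name for the ordered list of stages.
chain_reranker = {
    "type": "chain",
    "rerankers": [slingshot_stage, knee_stage],
}

print(json.dumps(chain_reranker, indent=2))
```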

Enable reranking

To enable reranking, specify the appropriate value for the type in the reranker object. For the MMR reranker, use mmr. In most scenarios, it makes sense to use the default query start value of 0 so that you're reranking all of the best initial results. You can also set the limit of the query to the total number of documents you wish to rerank. The default value is 25.

The following example shows the limit and type values in a query. Note that this simplified example intentionally omits several parameter values.

CODE EXAMPLE
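As a minimal sketch, the request body can be built as a Python dictionary. The reranker object, its type value, and the limit of 25 come from the text above. The query and search wrapper keys and the build_query helper are assumptions made for illustration, and all other parameters are omitted.

```python
import json

def build_query(text: str, limit: int = 25, reranker_type: str = "mmr") -> dict:
    """Build a simplified query body that reranks the top `limit` results."""
    return {
        "query": text,          # assumed field name for the query text
        "search": {             # assumed wrapper for retrieval settings
            "limit": limit,     # number of initial results to rerank
            "reranker": {"type": reranker_type},
        },
    }

body = build_query("What is reranking?")
print(json.dumps(body, indent=2))
```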

Using neural rerankers

For neural rerankers like Qwen3, Mixbread, or Multilingual v1, use type=customer_reranker and specify the reranker_name.

QUERY WITH QWEN3 RERANKER

QUERY WITH MIXBREAD RERANKER
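The reranker objects for the neural models might look like the sketch below. The type and reranker_name fields and the API names come from this page. The neural_reranker helper and the short keys in NEURAL_RERANKERS are hypothetical conveniences, not part of the API.

```python
import json

# API names taken from the "Available rerankers" table.
NEURAL_RERANKERS = {
    "qwen3": "qwen3-reranker",
    "mixbread": "mxbai-rerank-base-v2",
    "multilingual_v1": "Rerank_Multilingual_v1",
}

def neural_reranker(model: str) -> dict:
    """Return a reranker object for one of the neural models."""
    if model not in NEURAL_RERANKERS:
        raise ValueError(f"unknown reranker: {model!r}")
    return {
        "type": "customer_reranker",
        "reranker_name": NEURAL_RERANKERS[model],
    }

qwen3_reranker = neural_reranker("qwen3")
mixbread_reranker = neural_reranker("mixbread")
print(json.dumps([qwen3_reranker, mixbread_reranker], indent=2))
```

Keeping the API names in one mapping makes it easy to swap models when you compare them on your own dataset.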

Best practices

When working with multiple rerankers, consider the following best practices:

Experimentation: Each reranker behaves differently depending on your content and queries. Evaluate each reranker on your own dataset to determine which provides the best results for your specific use case.
Latency vs. accuracy: Larger models like Qwen3 tend to provide more accurate results but can add more latency compared to smaller models like Mixbread. Test both models to find the right balance for your application.
Fallback handling: Ensure your application handles reranker errors gracefully and can fall back to retrieval-only results if a reranker fails or times out.
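The fallback practice above can be sketched as follows. The search_fn callable is a hypothetical stand-in for whatever client your application uses to send queries, and flaky_search only simulates a reranker timeout. Neither is part of the Vectara API.

```python
from typing import Callable, Optional

# Hypothetical signature: takes the query text and an optional reranker
# object, and returns a list of results. Stands in for your real client.
SearchFn = Callable[[str, Optional[dict]], list]

def search_with_fallback(search_fn: SearchFn, query: str, reranker: dict) -> tuple[list, bool]:
    """Try a reranked query; on failure, return retrieval-only results.

    Returns (results, reranked), where `reranked` tells the caller
    whether the reranker was actually applied.
    """
    try:
        return search_fn(query, reranker), True
    except Exception:
        # Reranker failed or timed out: retry without reranking.
        # In real code, narrow this to your client's error types.
        return search_fn(query, None), False

# Stand-in client that simulates a reranker timeout.
def flaky_search(query: str, reranker: Optional[dict]) -> list:
    if reranker is not None:
        raise TimeoutError("reranker timed out")
    return ["doc-a", "doc-b"]

results, reranked = search_with_fallback(
    flaky_search,
    "pricing",
    {"type": "customer_reranker", "reranker_name": "qwen3-reranker"},
)
print(results, reranked)
```

Returning the reranked flag lets the caller log or surface the fact that results came from retrieval only.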

Search cutoffs

Sometimes you may want to exclude results that are not relevant to a query, and ensure that only results with higher scores than a specific threshold are displayed. The cutoff property of the reranker object allows you to specify a minimum score threshold for search results to include after reranking.

By setting this cutoff value, you can control which results are considered relevant enough to return, filtering out results that do not meet the desired level of relevance. For example, when you set the cutoff to 0.5, only results with a score of 0.5 or higher are returned:

CODE EXAMPLE
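A reranker object with a cutoff might look like the following sketch. The cutoff property and the 0.5 value come from the text above. Pairing it with the Multilingual reranker is an illustrative choice, and the rest of the query is omitted.

```python
import json

reranker_with_cutoff = {
    "type": "customer_reranker",
    "reranker_name": "Rerank_Multilingual_v1",
    "cutoff": 0.5,  # keep only results scoring 0.5 or higher after reranking
}

print(json.dumps(reranker_with_cutoff, indent=2))
```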

When a reranker is applied with a cutoff, it performs the following steps:

Reranks all input results based on the selected reranker.
Applies the cutoff, removing any results with scores below the specified threshold.
Returns the remaining results, sorted by their new scores.
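The steps above can be imitated locally to see what a cutoff does to a result set. This is a simplified model of the described behavior, not Vectara's implementation, and the document IDs and scores are made up for illustration.

```python
def apply_cutoff(reranked_scores: dict[str, float], cutoff: float) -> list[tuple[str, float]]:
    """Drop results scoring below `cutoff`, then sort by score, highest first."""
    kept = [(doc, score) for doc, score in reranked_scores.items() if score >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Illustrative scores, as if a reranker had already rescored three results.
scores = {"doc-a": 0.42, "doc-b": 0.91, "doc-c": 0.50}

print(apply_cutoff(scores, cutoff=0.5))
# [('doc-b', 0.91), ('doc-c', 0.5)]
```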

note

This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff value, potentially further reducing the number of results at each stage. If both limit and cutoff are specified, the cutoff is applied first, followed by the limit.

caution

Search cutoffs are most effective when used with neural rerankers like Qwen3, Mixbread, or the Vectara Multilingual reranker (Slingshot), which provide normalized scores between 0 and 1. If you use hybrid search methods that involve BM25, scores may be unbounded, making cutoff values less predictable.

Search limits

The limit property of the reranker object gives you more granular control over the number of results returned after reranking. The limit is applied at each reranking stage, for example within a chain, and it affects the output of the reranker, not its input. When you apply this limit to the Multilingual reranker (Slingshot), Maximal Marginal Relevance (MMR) reranker, or User Defined Function reranker, it performs the following steps:

Reranks all input results based on the selected reranker.
Applies score cutoff, if specified.
Eliminates results that return null scores. Null scores are only returned deliberately by the User Defined Function reranker. For example, to eliminate some search results, have the UDF return null for them.
Sorts the reranked results based on their new scores.
Returns the top N results, where N is the value specified by this limit.

Suppose you want to limit the results that a reranker outputs, whether it runs on its own or as a stage in a chain. For example, you want to process blog posts and ignore everything else. You would set up a UDF that keeps the score for blog posts and returns a null score for non-blog content:

if (get('$.document_metadata.category') == 'blog') get('$.score') else null

This would remove non-blog posts from the results. Then you can set a limit of 10 to get only the top 10 blog post results.
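To make the behavior concrete, the sketch below mirrors the UDF in plain Python and then applies a limit. It is a local simulation under the assumption that each result carries a score and a document_metadata.category field. The sample results are made up, and this is not how Vectara evaluates UDFs internally.

```python
from typing import Optional

def blog_only_score(result: dict) -> Optional[float]:
    """Python mirror of the UDF above: keep the score for blogs, else None."""
    if result["document_metadata"].get("category") == "blog":
        return result["score"]
    return None

def rerank_with_limit(results: list[dict], limit: int) -> list[dict]:
    """Rescore, drop null scores, sort by new score, return the top `limit`."""
    rescored = [(blog_only_score(r), r) for r in results]
    kept = [(score, r) for score, r in rescored if score is not None]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept[:limit]]

# Illustrative input; ids, categories, and scores are made up.
results = [
    {"id": "a", "score": 0.7, "document_metadata": {"category": "blog"}},
    {"id": "b", "score": 0.9, "document_metadata": {"category": "docs"}},
    {"id": "c", "score": 0.8, "document_metadata": {"category": "blog"}},
]

top = rerank_with_limit(results, limit=10)
print([r["id"] for r in top])
# ['c', 'a']
```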

Combine cutoffs and limits

Using both cutoffs and limits in a chain allows for more refined control over query results.

CODE EXAMPLE
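One way to express such a chain is sketched below as a Python dictionary. The type, user_function, reranker_name, cutoff, and limit values come from this page, and the UDF expression is the blog filter shown earlier. The rerankers key used to list the stages is an assumption about the request schema.

```python
import json

chain_reranker = {
    "type": "chain",
    # "rerankers" is an assumed key name for the ordered list of stages.
    "rerankers": [
        {
            # Stage 1: UDF keeps blog posts only and outputs at most 10 results.
            "type": "userfn",
            "user_function": (
                "if (get('$.document_metadata.category') == 'blog') "
                "get('$.score') else null"
            ),
            "limit": 10,
        },
        {
            # Stage 2: neural reranker drops scores below 0.5, returns top 3.
            "type": "customer_reranker",
            "reranker_name": "Rerank_Multilingual_v1",
            "cutoff": 0.5,
            "limit": 3,
        },
    ],
}

print(json.dumps(chain_reranker, indent=2))
```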

This chain first uses the UDF reranker to filter out non-blog content and limit its output to 10 results. It then sends these 10 results to the Vectara Multilingual reranker, which removes results with a score below 0.5 and returns the top 3 results from the remaining set.

Improve summarization

You can also improve LLM summarization by using cutoffs and limits. For example, setting a high cutoff filters out low-scoring results before they are sent for summarization, which can improve the quality of the generated summary. This example uses both Slingshot and a User Defined Function to send only highly relevant and recent documents for summarization.

CODE EXAMPLE
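A chain along these lines is sketched below. The 0.75 cutoff, the limit of 10, and the publish_ts field come from this page. The rerankers key, the location of publish_ts under document_metadata, and the UDF expression that scores by timestamp alone are assumptions modeled on the get() syntax shown earlier.

```python
import json

summarization_reranker = {
    "type": "chain",
    # "rerankers" is an assumed key name for the ordered list of stages.
    "rerankers": [
        {
            # Stage 1: Slingshot keeps results scoring 0.75 or higher, top 10.
            "type": "customer_reranker",
            "reranker_name": "Rerank_Multilingual_v1",
            "cutoff": 0.75,
            "limit": 10,
        },
        {
            # Stage 2: assumed UDF that ranks newer documents first by
            # using the publication timestamp as the new score.
            "type": "userfn",
            "user_function": "get('$.document_metadata.publish_ts')",
        },
    ],
}

print(json.dumps(summarization_reranker, indent=2))
```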

The first stage in the chain filters out documents with scores lower than 0.75 and limits the results to 10.
The next stage prioritizes documents based on their publish_ts value, which represents the publication timestamp.
