
RAG in Practice: Inaccurate Retrieval Results? Try These Three Practical Methods


Author: @小白创作中心


Source: https://www.53ai.com/news/RAG/2024101145798.html

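When building a RAG (Retrieval-Augmented Generation) system, optimizing the retrieval stage is critical. This article walks through three practical methods, Reranking, Refinement (compression), and Corrective RAG, that help improve the accuracy of an LLM's answers.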

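During retrieval, the user's question is passed to an embedding model and converted into a vector. The system then searches the vector database for knowledge passages or past conversation records that are semantically similar to the question vector and returns them. In Naive RAG, every retrieved chunk is fed directly to the LLM to generate the answer, which causes problems such as information in the middle of the context getting lost, a high proportion of noise, and context-length limits.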
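To make the naive pipeline concrete, here is a minimal sketch of it. It assumes a toy bag-of-words `embed` function (hypothetical, standing in for a real embedding model) and an in-memory list instead of a vector database. Note that the top-k chunks are returned unchanged, which is exactly what the three methods below try to improve on.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy "embedding": a bag-of-words count vector.
    # A real pipeline would call an embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Naive RAG: vectorize the question, score every chunk, return the top-k unchanged
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "rerank retrieved documents with a cross encoder",
    "the weather is sunny today",
    "compress retrieved documents before the llm",
]
print(retrieve("how to rerank retrieved documents", chunks))
```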

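Below, with the help of source code, we walk through three approaches to improving retrieval accuracy: Reranking, Refinement (compression), and Corrective RAG. The link to the full source code is given at the end of the article.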

1. Rerank (Reranking)

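Reranking, as the name suggests, reorders the retrieved results according to some rule or logic so that the more relevant or accurate results come first, improving retrieval quality. There are two main types: statistical score-based reranking and deep-learning-based reranking.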

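Statistical reranking merges the candidate lists from multiple sources and recomputes a score for every result, using either a weighted sum of the scores from each retrieval route (multi-route recall) or the Reciprocal Rank Fusion (RRF) algorithm. Its advantages are simple computation, low cost, and high efficiency, so it is widely used in traditional retrieval systems that are sensitive to latency, such as internal knowledge-base search and e-commerce customer-service search.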
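The weighted-score variant can be sketched as follows. This is a minimal illustration, not the article's code: the per-route min-max normalization, the route names, and the 0.6/0.4 weights are all assumptions chosen for the example. Normalizing first matters because raw scores from different routes (for example, cosine similarity versus BM25) live on very different scales.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    # Rescale one route's scores to [0, 1] so routes with different scales are comparable
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(
    routes: list[dict[str, float]], weights: list[float]
) -> list[tuple[str, float]]:
    fused: dict[str, float] = {}
    for scores, weight in zip(routes, weights):
        for doc, s in min_max(scores).items():
            # A document missing from a route simply gets no contribution from it
            fused[doc] = fused.get(doc, 0.0) + weight * s
    # Highest fused score first
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)


# Hypothetical scores from two retrieval routes
vector_scores = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}
bm25_scores = {"doc_b": 12.0, "doc_d": 9.0, "doc_a": 3.0}
print(weighted_fusion([vector_scores, bm25_scores], weights=[0.6, 0.4]))
```

Here `doc_b` ends up first because it scores well on both routes, while `doc_a` wins only the vector route.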

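In the article "RAG in Practice: Five Advanced Methods for Optimizing Query Transformation So the LLM Truly Understands User Intent", the reciprocal_rank_fusion function used in RAG Fusion is an example of statistical score-based reranking. Let's review it in the following code: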

# dumps/loads serialize LangChain Document objects (older releases: from langchain.load import dumps, loads)
from langchain_core.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """Reciprocal rank fusion over multiple lists of ranked documents,
    with an optional parameter k used in the RRF formula."""
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}
    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Serialize the document to a string so it can be used as a dictionary key
            doc_str = dumps(doc)
            # If the document is not yet in fused_scores, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)
    # Sort the documents by fused score in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    # Return a list of (document, fused score) tuples
    return reranked_results
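To see the RRF arithmetic without LangChain, here is a dependency-free version of the same formula operating on plain string IDs (the IDs and ranked lists are made up for illustration). With k = 60, a document at 0-based rank r in a list contributes 1/(r + 60), and contributions from all lists are summed.

```python
def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        # rank is 0-based, as in reciprocal_rank_fusion above
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (rank + k)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


fused = rrf([["A", "B", "C"], ["B", "D", "A"]])
print([doc_id for doc_id, _ in fused])  # expected order: B, A, D, C
```

B overtakes A even though A tops the first list, because B sits near the top of both lists: 1/61 + 1/60 is larger than 1/60 + 1/62.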

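Deep-learning-based reranking is usually called a cross-encoder reranker: a specially trained neural network reads the question and a document together as a single input, which lets it analyze the relevance between them more accurately. It scores the semantic relevance of each question-document pair, and the score depends only on the text of the question and the document, not on the document's position in the retrieved results. This approach gives higher retrieval accuracy, but at a higher cost and with slower responses, so it is best suited to scenarios that demand very high retrieval precision, such as medical consultation.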
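The key property, that each score depends only on the (question, document) pair, can be sketched with a pluggable scoring function. In this sketch `toy_pair_score` is a hypothetical word-overlap stand-in for a real model. With the sentence-transformers library, for example, a `CrossEncoder` model's `predict` method takes a list of (query, document) pairs and returns relevance scores that could be used in its place.

```python
from typing import Callable


def toy_pair_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: it looks at the (query, doc) pair
    # and returns the fraction of query words that appear in the document.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def cross_encoder_rerank(
    query: str,
    docs: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    # Score every (query, doc) pair independently; the input order is ignored
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


query = "side effects of aspirin"
docs = [
    "aspirin price list",
    "common side effects of aspirin include stomach upset",
    "history of the aspirin brand",
]
for doc, score in cross_encoder_rerank(query, docs, toy_pair_score, top_n=2):
    print(f"{score:.2f}  {doc}")
```

Shuffling `docs` before the call does not change the result, which is the position-independence described above.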

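We can also rerank with the well-known Cohere Rerank, a reranking model offered through Cohere's commercial API, which LangChain wraps as CohereRerank. It is very simple to use, as the following code shows: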

# Requires the langchain-cohere package and a COHERE_API_KEY environment variable
# (older LangChain releases exposed CohereRerank under langchain.retrievers.document_compressors)
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

# `vectorstore` and `question` are assumed to be defined earlier in the pipeline
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})  # recall 10 candidates
# Re-rank the candidates with Cohere's rerank model and keep the most relevant ones
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(question)

2. Refinement (Compression)

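Refinement (compression) means that retrieved chunks are not passed straight to the LLM. Instead, irrelevant content is removed first and the important context is highlighted, which shortens the overall prompt and reduces the interference of redundant information with the LLM. LangChain provides a basic contextual compression retriever for this, called ContextualCompressionRetriever. It wraps a base retriever and a document compressor; in the example below the compressor is LLMChainExtractor, which uses an LLM to extract only the query-relevant parts of each retrieved document.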

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

# `retriever` is the base vector-store retriever defined earlier
llm = OpenAI(temperature=0)
# The extractor asks the LLM to keep only the parts of each document relevant to the query
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
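What the extractor does can be approximated without an LLM. The sketch below is a hypothetical heuristic stand-in for LLMChainExtractor, not its actual logic: it keeps only sentences that share a content word (crudely, any word longer than four characters) with the question, and drops documents with nothing relevant left, so the final prompt gets shorter.

```python
import re


def extract_relevant(question: str, doc: str) -> str:
    # Crude stopword filter: only words longer than four characters count as content words
    q_terms = {w for w in re.findall(r"\w+", question.lower()) if len(w) > 4}
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    # Keep the sentences that share at least one content word with the question
    kept = [s for s in sentences if q_terms & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)


def compress(question: str, docs: list[str]) -> list[str]:
    # Drop documents with nothing relevant left after extraction
    extracted = (extract_relevant(question, d) for d in docs)
    return [e for e in extracted if e]


docs = [
    "Reranking reorders retrieved chunks. The office cafeteria opens at noon. "
    "A cross-encoder can improve precision.",
    "Lunch menus change weekly.",
]
print(compress("How does reranking improve retrieval precision?", docs))
```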

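LLMChainFilter is a slightly simpler but more robust compressor: it uses an LLM chain to decide which of the initially retrieved documents to filter out and which to return, without modifying the document contents.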

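from langchain.retrievers.document_compressors import LLMChainFilter

def pretty_print_docs(docs):
    # Print each document's content, separated by a divider line
    divider = f"\n{'-' * 100}\n"
    print(divider.join(f"Document {i + 1}:\n\n{d.page_content}" for i, d in enumerate(docs)))

# Reuses `llm`, `retriever`, and ContextualCompressionRetriever from the previous example
_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)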
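The difference from extraction is that filtering keeps or drops each document whole. Here is a minimal sketch of that behavior, where `keyword_judge` is a hypothetical stand-in for the LLM's yes/no decision (it is not what LLMChainFilter does internally):

```python
import re
from typing import Callable


def keyword_judge(question: str, doc: str) -> bool:
    # Hypothetical stand-in for the LLM's yes/no call: relevant if the doc shares
    # any word longer than four characters with the question
    q_terms = {w for w in re.findall(r"\w+", question.lower()) if len(w) > 4}
    return bool(q_terms & set(re.findall(r"\w+", doc.lower())))


def filter_documents(
    question: str, docs: list[str], judge: Callable[[str, str], bool]
) -> list[str]:
    # Each document is kept or dropped as a whole; its text is never modified
    return [d for d in docs if judge(question, d)]


docs = ["Cohere offers a reranking model.", "Unrelated note about parking."]
print(filter_documents("explain reranking models", docs, keyword_judge))
```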

3. Corrective RAG

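Corrective-RAG (CRAG) is a RAG strategy that adds self-reflection/self-grading of the retrieved documents. CRAG improves generation by using a lightweight "retrieval evaluator" that returns a confidence score for each retrieved document; that score then determines which retrieval action is triggered. For example, based on the confidence scores, the evaluator can place the retrieved documents into one of three buckets: Correct, Ambiguous, or Incorrect.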
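The mapping from confidence scores to an action can be sketched as follows. The two thresholds (an upper one for Correct, a lower one for Incorrect) follow the setup of the original CRAG paper, but the values 0.7 and 0.3 are placeholders; real values depend on the evaluator being used.

```python
from typing import Literal

Action = Literal["correct", "ambiguous", "incorrect"]

# Placeholder thresholds; real values depend on the specific evaluator
UPPER_THRESHOLD = 0.7
LOWER_THRESHOLD = 0.3


def decide_action(
    confidences: list[float],
    upper: float = UPPER_THRESHOLD,
    lower: float = LOWER_THRESHOLD,
) -> Action:
    # At least one confidently relevant document: keep retrieval and refine it
    if any(c > upper for c in confidences):
        return "correct"
    # Every document scores low (or nothing was retrieved): fall back to e.g. web search
    if all(c < lower for c in confidences):
        return "incorrect"
    # Mixed evidence: combine refined documents with external search results
    return "ambiguous"


print(decide_action([0.9, 0.2]))  # correct
print(decide_action([0.1, 0.2]))  # incorrect
print(decide_action([0.5, 0.2]))  # ambiguous
```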

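If every retrieved document's confidence score falls below the threshold, the retrieval is assumed to be "Incorrect". This triggers an action that turns to a new knowledge source (such as web search) to improve generation quality. If at least one retrieved document's confidence score is above the threshold, the retrieval is assumed to be "Correct", which triggers knowledge refinement on the retrieved documents. Knowledge refinement splits the documents into "knowledge strips", scores each strip for relevance, and recombines the most relevant strips into the internal knowledge used for generation. The original CRAG paper actually uses two thresholds, an upper one for Correct and a lower one for Incorrect; when neither condition holds, the retrieval is labeled "Ambiguous" and the refined internal knowledge is combined with the external search results.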
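Knowledge refinement (decompose, score, filter, recompose) can be sketched as below, under stated assumptions: each strip is a single sentence, `overlap_score` is a hypothetical word-overlap scorer standing in for the retrieval evaluator, and the selected strips are rejoined in their original order (a choice made here for readability).

```python
import re
from typing import Callable


def overlap_score(question: str, strip: str) -> int:
    # Hypothetical relevance scorer: number of shared words (stands in for the evaluator)
    q_words = set(re.findall(r"\w+", question.lower()))
    return len(q_words & set(re.findall(r"\w+", strip.lower())))


def refine_knowledge(
    question: str,
    docs: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 2,
) -> str:
    # 1. Decompose: split every document into sentence-level "knowledge strips"
    strips = [s for d in docs for s in re.split(r"(?<=[.!?])\s+", d.strip()) if s]
    # 2. Score each strip's relevance to the question
    scores = [score_fn(question, s) for s in strips]
    # 3. Filter: take the top_k highest-scoring strips, dropping any with score 0
    ranked = sorted(range(len(strips)), key=lambda i: scores[i], reverse=True)
    keep = sorted(i for i in ranked[:top_k] if scores[i] > 0)
    # 4. Recompose: join the kept strips, in original order, as internal knowledge
    return " ".join(strips[i] for i in keep)


docs = [
    "Corrective RAG grades retrieved documents. It adds a retrieval evaluator.",
    "Pizza is popular. Corrective RAG can fall back to web search.",
]
print(refine_knowledge("explain corrective rag", docs, overlap_score))
```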

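The key to Corrective RAG is therefore the design of the retrieval evaluator. Below is an example implementation. It prompts an LLM to return a binary "yes"/"no" relevance grade for a document, a simplified stand-in for the confidence-scored evaluator described above: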

from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
# langchain-core >= 0.3 accepts Pydantic v2 models (older versions used langchain_core.pydantic_v1)
from pydantic import BaseModel, Field

# Data model for the grader's structured output
class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""
    binary_score: Literal["yes", "no"] = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )

# LLM with function calling, constrained to the GradeDocuments schema
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeDocuments)

# Prompt
system = """You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.
Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."""
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)
retrieval_grader = grade_prompt | structured_llm_grader

# `retriever` is the base retriever defined earlier
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[0].page_content  # grade the top-ranked document
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))
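The grader's verdicts then have to drive the next step. Below is a minimal sketch of that routing, following the rule described earlier: fall back to another source only when no document is graded relevant. `grade` is any function returning "yes" or "no"; with the grader above it could be `lambda q, d: retrieval_grader.invoke({"question": q, "document": d}).binary_score`. The `stub_grade` function in the example is hypothetical and only exists so the sketch runs offline.

```python
from typing import Callable


def grade_and_route(
    question: str,
    docs: list[str],
    grade: Callable[[str, str], str],
) -> tuple[list[str], bool]:
    # Keep only the documents the grader marks as relevant ("yes")
    relevant = [d for d in docs if grade(question, d) == "yes"]
    # No relevant document at all: treat the retrieval as incorrect and search elsewhere
    needs_web_search = not relevant
    return relevant, needs_web_search


def stub_grade(question: str, doc: str) -> str:
    # Hypothetical grader for demonstration: "yes" if any question word appears in the doc
    return "yes" if set(question.lower().split()) & set(doc.lower().split()) else "no"


docs = ["agents keep long-term memory in a vector store", "stock prices fell today"]
print(grade_and_route("agent memory", docs, stub_grade))
```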

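However, Corrective RAG depends heavily on the quality of the retrieval evaluator and is susceptible to biases introduced by web search results, so fine-tuning the retrieval evaluator may be unavoidable.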

That concludes the three effective methods for optimizing retrieval.

Summary

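This article described three methods for optimizing retrieval in detail: Rerank (reranking), Refinement (compression), and Corrective RAG. The step after retrieval is final content generation. Even with the retrieval optimizations above, the generation stage can still run into problems such as formatting errors, incomplete content, and politically inappropriate output. The generation stage therefore needs its own optimizations to bring the whole RAG pipeline to a clean finish.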