RAG for LLMs: Translation and Commentary on "Retrieval-Augmented Generation for Large Language Models: A Survey"
導(dǎo)讀:這篇論文主要圍繞信息檢索增強生成(Retrieval Augmented Generation,簡稱RAG)技術(shù)進行概述和分析。
背景痛點:
>> 大語言模型(LLM)在處理知識密集型任務(wù)和回答離線知識更豐富的問題時面臨難題,例如產(chǎn)生錯誤信息或過時信息等問題。
>> 往往需要對LLM進行定制化訓(xùn)練,才能適應(yīng)不同場景下的應(yīng)用,這對開發(fā)人員和研究人員來說難度很大。
RAG技術(shù)的核心思想和解決方案:RAG通過將外部知識庫中的信息檢索成果整合到LLM的輸入context中,從而增強LLM處理知識型任務(wù)和產(chǎn)生更準確答案的能力。
RAG技術(shù)發(fā)展趨勢:
>> 從初級RAG到高級RAG,再到模塊化RAG,不斷優(yōu)化框架結(jié)構(gòu)。
>> 結(jié)合信息檢索、生成和增強不同技術(shù)模塊,形成完整流程。
>> 利用結(jié)構(gòu)化和非結(jié)構(gòu)化數(shù)據(jù)、LLM產(chǎn)生的內(nèi)容等不同來源進行信息增強。
>> 探索迭代檢索、遞歸檢索、自適應(yīng)檢索等方法來優(yōu)化檢索過程。
>> 將RAG技術(shù)應(yīng)用和整合到定制訓(xùn)練中,實現(xiàn)LLM優(yōu)化的多種方式結(jié)合。
RAG技術(shù)的優(yōu)勢:
>> 無需重新訓(xùn)練LLM即可將外部新知識整合到模型中,更輕松地應(yīng)對需求變化。
>> 借助外部知識庫,LLM產(chǎn)出的答案更加準確、相關(guān),能更好解決知識型問題。
>> RAG框架性能不斷提高,且可擴展到圖像、語音等多模態(tài)信息處理。
綜上,RAG技術(shù)通過有效結(jié)合LLM與外部知識,在保留LLM優(yōu)點的同時彌補其知識不足的缺陷,為LLM應(yīng)用于生產(chǎn)環(huán)境提供一條良好的路徑。
Translation and Commentary on "Retrieval-Augmented Generation for Large Language Models: A Survey"
Link
Paper: https://arxiv.org/abs/2312.10997
Date
January 5, 2024
Authors
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang
Tongji University, Fudan University
Abstract
Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces the metrics and benchmarks for assessing RAG models, along with the most up-to-date evaluation framework. In conclusion, the paper delineates prospective avenues for research, including the identification of challenges, the expansion of multi-modalities, and the progression of the RAG infrastructure and its ecosystem.
1 Introduction
Large language models (LLMs) such as the GPT series [Brown et al., 2020, OpenAI, 2023] and the LLama series [Touvron et al., 2023], along with other models like Gemini [Google, 2023], have achieved remarkable success in natural language processing, demonstrating superior performance on various benchmarks including SuperGLUE [Wang et al., 2019], MMLU [Hendrycks et al., 2020], and BIG-bench [Srivastava et al., 2022]. Despite these advancements, LLMs exhibit notable limitations, particularly in handling domain-specific or highly specialized queries [Kandpal et al., 2023]. A common issue is the generation of incorrect information, or "hallucinations" [Zhang et al., 2023b], especially when queries extend beyond the model's training data or necessitate up-to-date information. These shortcomings underscore the impracticality of deploying LLMs as black-box solutions in real-world production environments without additional safeguards. One promising approach to mitigate these limitations is Retrieval-Augmented Generation (RAG), which integrates external data retrieval into the generative process, thereby enhancing the model's ability to provide accurate and relevant responses.
RAG, introduced by Lewis et al. [Lewis et al., 2020] in mid-2020, stands as a paradigm within the realm of LLMs, enhancing generative tasks. Specifically, RAG involves an initial retrieval step where the LLMs query an external data source to obtain relevant information before proceeding to answer questions or generate text. This process not only informs the subsequent generation phase but also ensures that the responses are grounded in retrieved evidence, thereby significantly enhancing the accuracy and relevance of the output. The dynamic retrieval of information from knowledge bases during the inference phase allows RAG to address issues such as the generation of factually incorrect content, commonly referred to as "hallucinations." The integration of RAG into LLMs has seen rapid adoption and has become a pivotal technology in refining the capabilities of chatbots and rendering LLMs more viable for practical applications.
The evolutionary trajectory of RAG unfolds across four distinctive phases, as illustrated in Figure 1. In its inception in 2017, aligned with the emergence of the Transformer architecture, the primary thrust was on assimilating additional knowledge through Pre-Training Models (PTM) to augment language models. This epoch witnessed RAG's foundational efforts predominantly directed at optimizing pre-training methodologies.
Following this initial phase, a period of relative dormancy ensued before the advent of ChatGPT, during which there was minimal advancement in related research for RAG. The subsequent arrival of ChatGPT marked a pivotal moment in the trajectory, propelling LLMs into the forefront. The community's focal point shifted towards harnessing the capabilities of LLMs to attain heightened controllability and address evolving requirements. Consequently, the lion's share of RAG endeavors concentrated on inference, with a minority dedicated to fine-tuning processes. As LLM capabilities continued to advance, especially with the introduction of GPT-4, the landscape of RAG technology underwent a significant transformation. The emphasis evolved into a hybrid approach, combining the strengths of RAG and fine-tuning, alongside a dedicated minority continuing the focus on optimizing pre-training methodologies.
Despite the rapid growth of RAG research, there has been a lack of systematic consolidation and abstraction in the field, which poses challenges in understanding the comprehensive landscape of RAG advancements. This survey aims to outline the entire RAG process and encompass the current and future directions of RAG research, by providing a thorough examination of retrieval augmentation in LLMs.
Therefore, this paper aims to comprehensively summarize and organize the technical principles, developmental history, content, and, in particular, the relevant methods and applications after the emergence of LLMs, as well as the evaluation methods and application scenarios of RAG. It seeks to provide a comprehensive overview and analysis of existing RAG technologies and offer conclusions and prospects for future development methods. This survey intends to furnish readers and practitioners with a thorough and systematic comprehension of large models and RAG, elucidate the progression and key technologies of retrieval augmentation, clarify the merits and limitations of various technologies along with their suitable contexts, and forecast potential future developments.
Our contributions are as follows:
>>We present a thorough and systematic review of the state-of-the-art RAG, delineating its evolution through paradigms including naive RAG, advanced RAG, and modular RAG. This review contextualizes the broader scope of RAG research within the landscape of LLMs.
>>We identify and discuss the central technologies integral to the RAG process, specifically focusing on the aspects of "Retrieval", "Generator" and "Augmentation", and delve into their synergies, elucidating how these components intricately collaborate to form a cohesive and effective RAG framework.
>>We construct a thorough evaluation framework for RAG, outlining the evaluation objectives and metrics. Our comparative analysis clarifies the strengths and weaknesses of RAG compared to fine-tuning from various perspectives. Additionally, we anticipate future directions for RAG, emphasizing potential enhancements to tackle current challenges, expansions into multi-modal settings, and the development of its ecosystem.
The paper unfolds as follows: Sections 2 and 3 define RAG and detail its developmental process. Sections 4 through 6 explore the core components of "Retrieval", "Generation" and "Augmentation", highlighting diverse embedded technologies. Section 7 focuses on RAG's evaluation system. Section 8 compares RAG with other LLM optimization methods and suggests potential directions for its evolution. The paper concludes in Section 9.
第二節(jié)和第三節(jié)對RAG進行了定義,并詳細介紹了RAG的發(fā)展過程。第4節(jié)至第6節(jié)探討了核心組件——檢索、“生成”和“增強”——重點介紹了各種嵌入式技術(shù)。第7節(jié)重點介紹RAG的評估體系。第8節(jié)將RAG與其他LLM優(yōu)化方法進行了比較,并提出了其可能的發(fā)展方向。本文在第9節(jié)結(jié)束。
2 Definition
The definition of RAG can be summarized from its workflow. Figure 2 depicts a typical RAG application workflow. In this scenario, a user asks ChatGPT about a recent high-profile event (i.e., the abrupt dismissal and reinstatement of OpenAI's CEO) which generated considerable public discourse. ChatGPT, as the most renowned and widely utilized LLM, constrained by its pretraining data, lacks knowledge of recent events. RAG addresses this gap by retrieving up-to-date document excerpts from external knowledge bases. In this instance, it procures a selection of news articles pertinent to the inquiry. These articles, alongside the initial question, are then amalgamated into an enriched prompt that enables ChatGPT to synthesize an informed response. This example illustrates the RAG process, demonstrating its capability to enhance the model's responses with real-time information retrieval.
Technologically, RAG has been enriched through various innovative approaches addressing pivotal questions such as "what to retrieve", "when to retrieve", and "how to use the retrieved information". For "what to retrieve", research has progressed from simple token [Khandelwal et al., 2019] and entity retrieval [Nishikawa et al., 2022] to more complex structures like chunks [Ram et al., 2023] and knowledge graph [Kang et al., 2023], with studies focusing on the granularity of retrieval and the level of data structuring. Coarse granularity brings more information but with lower precision. Retrieving structured text provides more information while sacrificing efficiency. The question of "when to retrieve" has led to strategies ranging from single [Wang et al., 2023e, Shi et al., 2023] to adaptive [Jiang et al., 2023b, Huang et al., 2023] and multiple retrieval [Izacard et al., 2022] methods. A high frequency of retrieval brings more information but lower efficiency. As for "how to use" the retrieved data, integration techniques have been developed across various levels of the model architecture, including the input [Khattab et al., 2022], intermediate [Borgeaud et al., 2022], and output layers [Liang et al., 2023]. Although the "intermediate" and "output layers" are more effective, they require training and suffer from low efficiency.
在技術(shù)上,RAG通過各種創(chuàng)新方法得到了豐富,這些方法解決了諸如“檢索什么”、“何時檢索”和“如何使用檢索到的信息”等關(guān)鍵問題。對于“檢索什么”的研究已經(jīng)從簡單的令牌[Khandelwal等人,2019]和實體檢索[Nishikawa等人,2022]發(fā)展到更復(fù)雜的結(jié)構(gòu),如塊[Ram等人,2023]和知識圖譜[Kang等人,2023],研究重點是檢索的粒度和數(shù)據(jù)結(jié)構(gòu)的水平。粗粒度帶來更多的信息,但精度較低。檢索結(jié)構(gòu)化文本可以在犧牲效率的同時提供更多信息?!昂螘r檢索”的問題導(dǎo)致了從單一[Wang等人,2023e, Shi等人,2023]到自適應(yīng)[Jiang等人,2023b, Huang等人,2023]和多重檢索[Izacard等人,2022]方法的策略。檢索頻率高,信息量大,效率低。至于“如何使用”檢索到的數(shù)據(jù),已經(jīng)在模型架構(gòu)的各個層次上開發(fā)了集成技術(shù),包括輸入層[Khattab等人,2022]、中間層[Borgeaud等人,2022]和輸出層[Liang等人,2023]。雖然“中間層”和“輸出層”更有效,但存在需要訓(xùn)練和效率低的問題。
RAG is a paradigm that enhances LLMs by integrating external knowledge bases. It employs a synergistic approach, combining information retrieval mechanisms and In-Context Learning (ICL) to bolster the LLM's performance. In this framework, a query initiated by a user prompts the retrieval of pertinent information via search algorithms. This information is then woven into the LLM's prompts, providing additional context for the generation process. RAG's key advantage lies in its obviation of the need for retraining of LLMs for task-specific applications. Developers can instead append an external knowledge repository, enriching the input and thereby refining the model's output precision. RAG has become one of the most popular architectures in LLMs' systems, due to its high practicality and low barrier to entry, with many conversational products being built almost entirely on RAG.
The RAG workflow comprises three key steps. First, the corpus is partitioned into discrete chunks, upon which vector indices are constructed utilizing an encoder model. Second, RAG identifies and retrieves chunks based on their vector similarity to the query and indexed chunks. Finally, the model synthesizes a response conditioned on the contextual information gleaned from the retrieved chunks. These steps form the fundamental framework of the RAG process, underpinning its information retrieval and context-aware generation capabilities. Next, we will provide an introduction to the RAG research framework.
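To make the workflow concrete, here is a minimal end-to-end sketch of these three steps. The encoder choice and corpus are illustrative, and `call_llm` is a hypothetical stand-in for any chat-completion API; this is a sketch of the idea, not the paper's reference implementation.

```python
# Minimal sketch of the three-step RAG workflow: chunk/index, retrieve, generate.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Step 1: partition the corpus into discrete chunks and build a vector index.
documents = ["...long document one...", "...long document two..."]
chunks = [doc[i:i + 200] for doc in documents for i in range(0, len(doc), 200)]
index = encoder.encode(chunks, normalize_embeddings=True)  # (n_chunks, dim)

# Step 2: retrieve the top-k chunks by vector similarity to the query.
def retrieve(query: str, k: int = 3) -> list[str]:
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = index @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(-scores)[:k]]

# Step 3: condition the LLM on the retrieved context.
def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return call_llm(prompt)  # hypothetical wrapper around a chat-completion API
```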
3 RAG Framework
The RAG research paradigm is continuously evolving, and this section primarily delineates its progression. We categorize it into three types: Naive RAG, Advanced RAG, and Modular RAG. While RAG models were cost-effective and surpassed the performance of the native LLM, they also exhibited several limitations. The development of Advanced RAG and Modular RAG was a response to these specific shortcomings in Naive RAG.
3.1 Naive RAG
The Naive RAG research paradigm represents the earliest methodology, which gained prominence shortly after the widespread adoption of ChatGPT. The Naive RAG follows a traditional process that includes indexing, retrieval, and generation. It is also characterized as a "Retrieve-Read" framework [Ma et al., 2023a].
Indexing
The indexing process is a crucial initial step in data preparation that occurs offline and involves several stages. It begins with data indexing, where original data is cleansed and extracted, and various file formats such as PDF, HTML, Word, and Markdown are converted into standardized plain text. In order to fit within the context limitations of language models, this text is then segmented into smaller, more manageable chunks in a process known as chunking. These chunks are subsequently transformed into vector representations through an embedding model, chosen for its balance between inference efficiency and model size. This facilitates similarity comparisons during the retrieval phase. Finally, an index is created to store these text chunks and their vector embeddings as key-value pairs, which allows for efficient and scalable search capabilities.
Retrieval
Upon receipt of a user query, the system employs the same encoding model utilized during the indexing phase to transcode the input into a vector representation. It then proceeds to compute the similarity scores between the query vector and the vectorized chunks within the indexed corpus. The system prioritizes and retrieves the top K chunks that demonstrate the greatest similarity to the query. These chunks are subsequently used as the expanded contextual basis for addressing the user's request.
Generation
The posed query and selected documents are synthesized into a coherent prompt to which a large language model is tasked with formulating a response. The model's approach to answering may vary depending on task-specific criteria, allowing it to either draw upon its inherent parametric knowledge or restrict its responses to the information contained within the provided documents. In cases of ongoing dialogues, any existing conversational history can be integrated into the prompt, enabling the model to engage in multi-turn dialogue interactions effectively.
Drawbacks in Naive RAG
Naive RAG faces significant challenges in three key areas: “Retrieval,” “Generation,” and “Augmentation”.
Retrieval quality poses diverse challenges, including low precision, leading to misaligned retrieved chunks and potential issues like hallucination or mid-air drop. Low recall also occurs, resulting in the failure to retrieve all relevant chunks, thereby hindering the LLMs' ability to craft comprehensive responses. Outdated information further compounds the problem, potentially yielding inaccurate retrieval results.
檢索質(zhì)量帶來了各種各樣的挑戰(zhàn),包括精度低,導(dǎo)致檢索塊不對齊以及潛在的問題,如幻覺或半空中掉落。低回憶率也會發(fā)生,導(dǎo)致無法檢索所有相關(guān)的塊,從而阻礙了法學(xué)碩士制定全面回應(yīng)的能力。過時的信息使問題進一步復(fù)雜化,可能產(chǎn)生不準確的檢索結(jié)果。
Response generation quality presents a hallucination challenge, where the model generates answers not grounded in the provided context, as well as issues of irrelevant context and potential toxicity or bias in the model's output.
響應(yīng)生成質(zhì)量呈現(xiàn)幻覺挑戰(zhàn),即模型生成的答案不基于所提供的上下文,以及模型輸出中不相關(guān)的上下文和潛在的毒性或偏見問題。
The augmentation process presents its own challenges in effectively integrating context from retrieved passages with the current generation task, potentially leading to disjointed or incoherent output. Redundancy and repetition are also concerns, especially when multiple retrieved passages contain similar information, resulting in repetitive content in the generated response.
Discerning the importance and relevance of multiple retrieved passages to the generation task is another challenge, requiring the proper balance of each passage's value. Additionally, reconciling differences in writing styles and tones to ensure consistency in the output is crucial.
Lastly, there's a risk of generation models overly depending on augmented information, potentially resulting in outputs that merely reiterate the retrieved content without providing new value or synthesized information.
3.2 Advanced RAG
Advanced RAG has been developed with targeted enhancements to address the shortcomings of Naive RAG. In terms of retrieval quality, Advanced RAG implements pre-retrieval and post-retrieval strategies. To address the indexing challenges experienced by Naive RAG, Advanced RAG has refined its indexing approach using techniques such as sliding window, fine-grained segmentation, and metadata. It has also introduced various methods to optimize the retrieval process [ILIN, 2023].
Pre-Retrieval Process
Optimizing Data Indexing. The goal of optimizing data indexing is to enhance the quality of the content being indexed. This involves five primary strategies: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.
Enhancing data granularity aims to elevate text standardization, consistency, factual accuracy, and rich context to improve the RAG system's performance. This includes removing irrelevant information, dispelling ambiguity in entities and terms, confirming factual accuracy, maintaining context, and updating outdated documents.
增強數(shù)據(jù)粒度旨在提高文本的標準化、一致性、事實準確性和豐富的上下文,從而提高RAG系統(tǒng)的性能。這包括刪除不相關(guān)的信息,消除實體和術(shù)語中的歧義,確認事實的準確性,維護上下文和更新過時的文檔。
Optimizing index structures involves adjusting the size of chunks to capture relevant context, querying across multiple index paths, and incorporating information from the graph structure to capture relevant context by leveraging relationships between nodes in a graph data index.
優(yōu)化索引結(jié)構(gòu)包括調(diào)整塊的大小以捕獲相關(guān)上下文,跨多個索引路徑進行查詢,以及通過利用圖數(shù)據(jù)索引中節(jié)點之間的關(guān)系來合并圖結(jié)構(gòu)中的信息以捕獲相關(guān)上下文。
Adding metadata information involves integrating referenced metadata, such as dates and purposes, into chunks for filtering purposes, and incorporating metadata like chapters and subsections of references to improve retrieval efficiency.
添加元數(shù)據(jù)信息包括將引用的元數(shù)據(jù)(如日期和用途)集成到塊中以進行過濾,以及將引用的章節(jié)和小節(jié)等元數(shù)據(jù)集成到塊中以提高檢索效率。
Alignment optimization addresses alignment issues and disparities between documents by introducing "hypothetical questions" [Li et al., 2023d] into documents.
對齊優(yōu)化通過在文檔中引入“假設(shè)問題”[Li等人,2023]來糾正對齊問題和差異,從而解決文檔之間的對齊問題和差異。
Retrieval
During the retrieval stage, the primary focus is on identifying the appropriate context by calculating the similarity between the query and chunks. The embedding model is central to this process. In the advanced RAG, there is potential for optimization of the embedding models.
Fine-tuning Embedding. Fine-tuning embedding models significantly impacts the relevance of retrieved content in RAG systems. This process involves customizing embedding models to enhance retrieval relevance in domain-specific contexts, especially for professional domains dealing with evolving or rare terms. The BGE embedding model [BAAI, 2023], such as BGE-large-EN developed by BAAI, is an example of a high-performance embedding model that can be fine-tuned to optimize retrieval relevance. Training data for fine-tuning can be generated using language models like GPT-3.5-turbo to formulate questions grounded on document chunks, which are then used as fine-tuning pairs.
微調(diào)嵌入。微調(diào)嵌入模型會顯著影響RAG系統(tǒng)中檢索內(nèi)容的相關(guān)性。該過程包括自定義嵌入模型,以增強特定領(lǐng)域上下文中的檢索相關(guān)性,特別是對于處理演化或罕見術(shù)語的專業(yè)領(lǐng)域。BGE嵌入模型[BAAI, 2023],如BAAI2開發(fā)的BGE-large- en,就是一個可以微調(diào)以優(yōu)化檢索相關(guān)性的高性能嵌入模型的例子??梢允褂肎PT-3.5-turbo等語言模型生成用于微調(diào)的訓(xùn)練數(shù)據(jù),以制定基于文檔塊的問題,然后將其用作微調(diào)對。
Dynamic Embedding adapts to the context in which words are used, unlike static embedding, which uses a single vector for each word [Karpukhin et al., 2020]. For example, in transformer models like BERT, the same word can have varied embeddings depending on surrounding words. OpenAI's embeddings-ada-02 model, built upon the principles of LLMs like GPT, is a sophisticated dynamic embedding model that captures contextual understanding. However, it may not exhibit the same sensitivity to context as the latest full-size language models like GPT-4.
與靜態(tài)嵌入不同,動態(tài)嵌入適應(yīng)單詞使用的上下文,靜態(tài)嵌入為每個單詞使用單個向量[Karpukhin等人,2020]。例如,在像BERT這樣的變壓器模型中,相同的單詞可以根據(jù)周圍的單詞具有不同的嵌入。Ope-nAI的embedding_ada -02模型建立在法學(xué)碩士(如GPT)的原理之上,是一個復(fù)雜的動態(tài)嵌入模型,可以捕獲上下文理解。然而,它可能不會像最新的全尺寸語言模型(如GPT-4)那樣對上下文表現(xiàn)出同樣的敏感性。
Post-Retrieval Process
After retrieving valuable context from the database, it is essential to merge it with the query as an input into LLMs while addressing challenges posed by context window limits. Simply presenting all relevant documents to the LLM at once may exceed the context window limit, introduce noise, and hinder the focus on crucial information. Additional processing of the retrieved content is necessary to address these issues.
Re-Ranking. Re-ranking the retrieved information to relocate the most relevant content to the edges of the prompt is a key strategy. This concept has been implemented in frameworks such as LlamaIndex, LangChain, and Haystack [Blagojevi, 2023]. For example, Diversity Ranker prioritizes reordering based on document diversity, while LostInTheMiddleRanker alternates placing the best document at the beginning and end of the context window. Additionally, approaches like cohereAI rerank [Cohere, 2023], bge-rerank, and LongLLMLingua [Jiang et al., 2023a] recalculate the semantic similarity between relevant text and the query, addressing the challenge of interpreting vector-based simulated searches for semantic similarity.
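The alternating placement used by LostInTheMiddleRanker can be sketched in a few lines; given documents ranked best-first, the strongest ones end up at the edges of the prompt and the weakest in the middle, where models attend least.

```python
# Sketch of "lost in the middle" reordering over a best-first ranked list.
def lost_in_the_middle_reorder(docs_best_first: list) -> list:
    reordered = []
    for i, doc in enumerate(reversed(docs_best_first)):
        if i % 2 == 1:
            reordered.append(doc)     # every other document goes to the back
        else:
            reordered.insert(0, doc)  # the rest go to the front
    return reordered

# Ranked best-to-worst d1..d5 -> [d1, d3, d5, d4, d2]
print(lost_in_the_middle_reorder(["d1", "d2", "d3", "d4", "d5"]))
```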
Prompt Compression. Research indicates that noise in retrieved documents adversely affects RAG performance. In post-processing, the emphasis lies in compressing irrelevant context, highlighting pivotal paragraphs, and reducing the overall context length. Approaches such as Selective Context and LLMLingua [Litman et al., 2020, Anderson et al., 2022] utilize small language models to calculate prompt mutual information or perplexity, estimating element importance. Recomp [Xu et al., 2023a] addresses this by training compressors at different granularities, while Long Context [Xu et al., 2023b] and "Walking in the Memory Maze" [Chen et al., 2023a] design summarization techniques to enhance LLM's key information perception, particularly in dealing with extensive contexts.
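A simplified sketch in the spirit of these perplexity-based methods: score each retrieved sentence with a small LM and keep only the least predictable (most informative) ones. The model choice and keep ratio are illustrative assumptions.

```python
# Sketch: drop the most predictable context sentences, judged by GPT-2 perplexity.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    return float(torch.exp(lm(ids, labels=ids).loss))  # loss = mean token cross-entropy

def compress(sentences: list[str], keep_ratio: float = 0.6) -> list[str]:
    ranked = sorted(sentences, key=perplexity, reverse=True)  # most surprising first
    kept = set(ranked[: max(1, int(len(sentences) * keep_ratio))])
    return [s for s in sentences if s in kept]  # preserve the original order
```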
3.3 Modular RAG
The modular RAG structure diverges from the traditional Naive RAG framework, providing greater versatility and flexibility. It integrates various methods to enhance functional modules, such as incorporating a search module for similarity retrieval and applying a fine-tuning approach in the retriever [Lin et al., 2023]. Restructured RAG modules [Yu et al., 2022] and iterative methodologies like [Shao et al., 2023] have been developed to address specific issues. The modular RAG paradigm is increasingly becoming the norm in the RAG domain, allowing for either a serialized pipeline or an end-to-end training approach across multiple modules. The comparison of three RAG paradigms is depicted in Figure 3. However, Modular RAG is not standalone. Advanced RAG is a specialized form of modular RAG, and further, Naive RAG itself is a special case of Advanced RAG. The relationship among the three paradigms is one of inheritance and development.
模塊化的RAG結(jié)構(gòu)與傳統(tǒng)的樸素RAG框架不同,提供了更大的通用性和靈活性。它集成了各種方法來增強功能模塊,例如在檢索器中加入相似檢索的搜索模塊和應(yīng)用微調(diào)方法[Lin et al., 2023]。重構(gòu)RAG模塊[Yu et al., 2022]和迭代方法(如[Shao et al., 2023])已被開發(fā)用于解決特定問題。模塊化的RAG范例正日益成為RAG領(lǐng)域的規(guī)范,它允許序列化的管道或跨多個模塊的端到端訓(xùn)練方法。圖3描述了三個RAG范例的比較。然而,模塊化RAG并不是獨立的。高級RAG是模塊化RAG的一種特殊形式,此外,幼稚RAG本身是高級RAG的一種特殊情況。三種范式之間是一種繼承與發(fā)展的關(guān)系。
New Modules
Search Module. In contrast to the similarity retrieval in Naive/Advanced RAG, the Search Module is tailored to specific scenarios and incorporates direct searches on additional corpora. This integration is achieved using code generated by the LLM, query languages such as SQL or Cypher, and other custom tools. The data sources for these searches can include search engines, text data, tabular data, and knowledge graphs [Wang et al., 2023d].
Memory Module. This module harnesses the memory capabilities of the LLM to guide retrieval. The approach involves identifying memories most similar to the current input. Selfmem [Cheng et al., 2023b] utilizes a retrieval-enhanced generator to create an unbounded memory pool iteratively, combining the "original question" and "dual question". By employing a retrieval-enhanced generative model that uses its own outputs to improve itself, the text becomes more aligned with the data distribution during the reasoning process. Consequently, the model's own outputs are utilized instead of the training data [Wang et al., 2022a].
內(nèi)存模塊。該模塊利用LLM的內(nèi)存功能來指導(dǎo)檢索。這種方法包括識別與當(dāng)前輸入最相似的記憶。Selfmem [Cheng et al., 2023b]利用檢索增強生成器迭代創(chuàng)建無界內(nèi)存池,將“原始問題”和“雙重問題”結(jié)合起來。通過使用檢索增強的生成模型,該模型使用自己的輸出來改進自己,文本在推理過程中與數(shù)據(jù)分布更加一致。因此,使用模型自身的輸出來代替訓(xùn)練數(shù)據(jù)[Wang et al., 2022a]。
Fusion. RAG-Fusion [Raudaschl, 2023] enhances traditional search systems by addressing their limitations through a multi-query approach that expands user queries into multiple, diverse perspectives using an LLM. This approach not only captures the explicit information users seek but also uncovers deeper, transformative knowledge. The fusion process involves parallel vector searches of both original and expanded queries, intelligent re-ranking to optimize results, and pairing the best outcomes with new queries. This sophisticated method ensures search results that align closely with both the explicit and implicit intentions of the user, leading to more insightful and relevant information discovery.
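The re-ranking step of RAG-Fusion is commonly implemented with reciprocal rank fusion (RRF), which rewards documents that appear near the top of several ranked lists; a minimal sketch:

```python
# Sketch: reciprocal rank fusion over the ranked lists returned for the
# original query and its LLM-generated variants.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Three searches: the original query plus two LLM rewrites.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "d"], ["c", "b", "e"]])
```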
Routing. The RAG system's retrieval process utilizes diverse sources, differing in domain, language, and format, which can be either alternated or merged based on the situation [Li et al., 2023b]. Query routing decides the subsequent action to a user's query, with options ranging from summarization, searching specific databases, or merging different pathways into a single response. The query router also chooses the appropriate data store for the query, which may include various sources like vector stores, graph databases, or relational databases, or a hierarchy of indices, for instance a summary index and a document block vector index for multi-document storage. The query router's decision-making is predefined and executed via LLMs calls, which direct the query to the chosen index.
Predict. This module addresses the common issues of redundancy and noise in retrieved content. Instead of directly retrieving from a data source, it utilizes the LLM to generate the necessary context [Yu et al., 2022]. The content produced by the LLM is more likely to contain pertinent information compared to that obtained through direct retrieval.
預(yù)測。它解決了檢索內(nèi)容中的冗余和噪聲等常見問題。該模塊不是直接從數(shù)據(jù)源中檢索,而是利用LLM生成必要的上下文[Yu et al., 2022]。與通過直接檢索獲得的內(nèi)容相比,法學(xué)碩士產(chǎn)生的內(nèi)容更有可能包含相關(guān)信息。
Task Adapter. This module focuses on adapting RAG to a variety of downstream tasks. UPRISE automates the retrieval of prompts for zero-shot task inputs from a pre-constructed data pool, thereby enhancing universality across tasks and models [Cheng et al., 2023a]. Meanwhile, PROMPTAGATOR [Dai et al., 2022] utilizes LLM as a few-shot query generator and, based on the generated data, creates task-specific retrievers. By leveraging the generalization capability of LLMs, it enables the development of task-specific end-to-end retrievers with minimal examples.
任務(wù)適配器。本模塊側(cè)重于使RAG適應(yīng)各種下游任務(wù)。UPRISE自動從預(yù)構(gòu)建的數(shù)據(jù)池中檢索零shot任務(wù)輸入的提示,從而增強了任務(wù)和模型之間的通用性[Cheng等人,2023a]。同時,PROMPTAGA-TOR [Dai et al., 2022]利用LLM作為少量查詢生成器,并基于生成的數(shù)據(jù)創(chuàng)建特定于任務(wù)的檢索器。通過利用llm的泛化能力,它可以用最少的示例開發(fā)特定于任務(wù)的端到端檢索器。
New Patterns
The organizational structure of Modular RAG is highly adaptable, allowing for the substitution or rearrangement of modules within the RAG process to suit specific problem contexts.
Naive RAG and Advanced RAG can both be considered as being composed of some fixed modules. As illustrated in Figure 3, Naive RAG primarily consists of the "Retrieve" and "Read" modules. A typical pattern of Advanced RAG builds upon the foundation of Naive RAG by adding "Rewrite" and "Rerank" modules. However, on the whole, modular RAG enjoys greater diversity and flexibility.
Current research primarily explores two organizational paradigms. The first involves adding or replacing modules, while the second focuses on adjusting the organizational flow between modules. This flexibility enables tailoring the RAG process to effectively address a wide array of tasks.
Adding or Replacing Modules. The strategy of introducing or substituting modules involves maintaining the core structure of the Retrieval-Read process while integrating additional modules to enhance specific functionalities. The RRR model [Ma et al., 2023a] introduces the Rewrite-Retrieve-Read process, utilizing the LLM performance as a reinforcement learning incentive for a rewriting module. This enables the rewriter to fine-tune retrieval queries, thereby improving the downstream task performance of the reader.
Similarly, modules can be selectively swapped in methodologies like Generate-Read [Yu et al., 2022], where the LLM's generation module takes the place of the retrieval module. The Recite-Read approach [Sun et al., 2022] transforms external retrieval into retrieval from model weights, requiring the LLM to initially memorize task-specific information and subsequently produce output capable of handling knowledge-intensive natural language processing tasks.
Adjusting the Flow between Modules. In the realm of module flow adjustment, there is a focus on enhancing the interaction between language models and retrieval models. DSP [Khattab et al., 2022] introduces the Demonstrate-Search-Predict framework, treating the context learning system as an explicit program rather than a final task prompt, leading to more effective handling of knowledge-intensive tasks. The ITER-RETGEN [Shao et al., 2023] approach utilizes generated content to guide retrieval, iteratively implementing "retrieval-enhanced generation" and "generation-enhanced retrieval" within a Retrieve-Read-Retrieve-Read flow. This method demonstrates an innovative way of using one module's output to improve the functionality of another.
調(diào)整模塊間的流程。在模塊流調(diào)整領(lǐng)域,重點是加強語言模型和檢索模型之間的交互。DSP [Khattab等人,2022]引入了演示-搜索-預(yù)測框架,將上下文學(xué)習(xí)系統(tǒng)視為一個明確的程序,而不是最終的任務(wù)提示,從而更有效地處理知識密集型任務(wù)。ITER-RETGEN [Shao等人,2023]方法利用生成的內(nèi)容來指導(dǎo)檢索,在檢索-讀取-檢索-讀取流程中迭代實現(xiàn)“檢索增強生成”和“生成增強檢索”。這種方法展示了一種使用一個模塊的輸出來改進另一個模塊的功能的創(chuàng)新方法。
Optimizing the RAG Pipeline
The optimization of the retrieval process aims to enhance the efficiency and quality of information in RAG systems. Current research focuses on integrating diverse search technologies, refining retrieval steps, incorporating cognitive backtracking, implementing versatile query strategies, and leveraging embedding similarity. These efforts collectively strive to achieve a balance between retrieval efficiency and the depth of contextual information in RAG systems.
RAG管道優(yōu)化
優(yōu)化檢索過程的目的是提高檢索效率和檢索質(zhì)量。目前的研究主要集中在整合多種搜索技術(shù)、優(yōu)化檢索步驟、結(jié)合認知回溯、實現(xiàn)通用查詢策略以及利用老化嵌入相似度等方面。這些努力共同努力實現(xiàn)檢索效率和上下文信息深度在RAG系統(tǒng)之間的平衡。
Hybrid Search Exploration. The RAG system optimizes its performance by intelligently integrating various techniques, including keyword-based search, semantic search, and vector search. This approach leverages the unique strengths of each method to accommodate diverse query types and information needs, ensuring consistent retrieval of highly relevant and context-rich information. The use of hybrid search serves as a robust supplement to retrieval strategies, thereby enhancing the overall efficacy of the RAG pipeline.
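One common way to realize such a hybrid, sketched below with the rank_bm25 package: normalize a sparse BM25 score and a dense cosine score to a common range, then blend them with a weight alpha (the weighting scheme is illustrative, not the paper's prescription).

```python
# Sketch: blend keyword (BM25) and dense vector scores for hybrid retrieval.
import numpy as np
from rank_bm25 import BM25Okapi

def minmax(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def hybrid_scores(query: str, chunks: list[str], dense: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    bm25 = BM25Okapi([c.split() for c in chunks])
    sparse = np.array(bm25.get_scores(query.split()))
    return alpha * minmax(dense) + (1 - alpha) * minmax(sparse)  # per-chunk scores
```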
Recursive Retrieval and Query Engine. Recursive retrieval involves acquiring smaller chunks during the initial retrieval phase to capture key semantic meanings. Subsequently, larger chunks containing more contextual information are provided to the LLM in later stages of the process. This two-step retrieval method helps to strike a balance between efficiency and the delivery of contextually rich responses.
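A minimal "small-to-big" sketch of this two-step idea: small child windows are indexed for precise matching, but the LLM receives the larger parent chunk each hit belongs to. `vector_search` is a hypothetical placeholder for any similarity search over the child chunks.

```python
# Sketch: retrieve on small child chunks, return their larger parent chunks.
parents = ["<large section 1>", "<large section 2>"]
children, child_parent = [], []
for p_idx, parent in enumerate(parents):
    for i in range(0, len(parent), 100):  # small windows for precise matching
        children.append(parent[i:i + 100])
        child_parent.append(p_idx)

def retrieve_small_to_big(query: str, k: int = 4) -> list[str]:
    hit_ids = vector_search(query, children, k)  # hypothetical: ranked child indices
    seen, out = set(), []
    for cid in hit_ids:  # deduplicate parents while keeping rank order
        if child_parent[cid] not in seen:
            seen.add(child_parent[cid])
            out.append(parents[child_parent[cid]])
    return out
```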
The StepBack-prompt approach encourages the LLM to move away from specific instances and engage in reasoning around broader concepts and principles [Zheng et al., 2023]. Experimental results demonstrate a significant performance increase in various challenging, inference-based tasks when backward prompts are used, highlighting their natural adaptability to the RAG process. These retrieval-enhancing steps can be applied both in generating responses to backward prompts and in the final question-answering process.
Sub-Queries. Depending on the scenario, various query strategies can be employed, such as using query engines provided by frameworks like LlamaIndex, leveraging tree queries, utilizing vector queries, or executing simple sequential querying of chunks.
Hypothetical Document Embeddings. HyDE operates on the belief that the answers generated might be closer in the embedding space than a direct query. Using the LLM, HyDE creates a hypothetical document (answer) in response to a query, embeds this document, and uses the resulting embedding to retrieve real documents similar to the hypothetical one. Instead of seeking embedding similarity based on the query, this approach focuses on the embedding similarity from one answer to another [Gao et al., 2022]. However, it might not consistently produce desirable outcomes, especially when the language model is unfamiliar with the subject matter, potentially leading to more instances with errors.
假設(shè)的文檔嵌入。HyDE相信生成的答案在嵌入空間中可能比直接查詢更接近。使用LLM, HyDE為響應(yīng)查詢創(chuàng)建一個假設(shè)文檔(答案),嵌入該文檔,并使用生成的嵌入來檢索與假設(shè)文檔相似的真實文檔。該方法不是基于查詢尋求嵌入相似度,而是側(cè)重于從一個答案到另一個答案的嵌入相似度[Gao et al., 2022]。然而,它可能不會始終產(chǎn)生理想的結(jié)果,特別是當(dāng)語言模型不熟悉主題時,可能會導(dǎo)致更多帶有錯誤的實例。
4 Retrieval
In the context of RAG, it is crucial to efficiently retrieve relevant documents from the data source. However, creating a proficient retriever presents significant challenges. This section delves into three fundamental questions: 1) How can we achieve accurate semantic representations? 2) What methods can align the semantic spaces of queries and documents? 3) How can the retriever's output be aligned with the preferences of the Large Language Model?
4.1 Enhancing Semantic Representations
In RAG, the semantic space is essential as it involves the multidimensional mapping of queries and documents. Retrieval accuracy in this semantic space significantly impacts RAG outcomes. This section will present two methods for building accurate semantic spaces.
Chunk Optimization
When managing external documents, the initial step involves breaking them down into smaller chunks to extract fine-grained features, which are then embedded to represent their semantics. However, embedding overly large or excessively small text chunks may lead to sub-optimal outcomes. Therefore, identifying the optimal chunk size for documents within the corpus is crucial to ensuring the accuracy and relevance of the retrieved results.
塊優(yōu)化
在管理外部文檔時,最初的步驟包括將它們分解為更小的塊,以提取細粒度的特性,然后嵌入這些特性以表示它們的語義。然而,嵌入過大或過小的文本塊可能會導(dǎo)致次優(yōu)結(jié)果。因此,確定語料庫中文檔的最佳塊大小對于確保檢索結(jié)果的準確性和相關(guān)性至關(guān)重要。
Choosing an appropriate chunking strategy requires careful consideration of several vital factors, such as the nature of the indexed content, the embedding model and its optimal block size, the expected length and complexity of user queries, and the specific application's utilization of the retrieved results. For instance, the selection of a chunking model should be based on the content's length, whether it is longer or shorter. Additionally, different embedding models demonstrate distinct performance characteristics at varying block sizes. For example, sentence-transformer performs better with single sentences, while text-embedding-ada-002 excels with blocks containing 256 or 512 tokens.
選擇適當(dāng)?shù)姆謮K策略需要仔細考慮幾個重要因素,例如索引內(nèi)容的性質(zhì)、嵌入模型及其最優(yōu)塊大小、用戶查詢的預(yù)期長度和復(fù)雜性,以及特定應(yīng)用程序?qū)z索結(jié)果的利用。例如,分塊模型的選擇應(yīng)該基于內(nèi)容的長度——是長還是短。此外,不同的嵌入模型在不同塊大小下表現(xiàn)出不同的性能特征。例如,句子轉(zhuǎn)換器在處理單個句子時表現(xiàn)更好,而text-embedding-ada-002在處理包含256或512個令牌的塊時表現(xiàn)出色。
Additionally, factors like the length and complexity of user input questions, and the specific needs of the application (e.g., semantic search or question answering), have an effect on the choice of a chunking strategy. This choice can be directly influenced by the token limits of the selected LLMs, requiring adjustments to the block size. In reality, getting precise query results involves flexibly applying different chunking strategies. There is no one-size-fits-all "best" strategy, only the most appropriate one for a particular context.
Current research in RAG explores various block optimization techniques aimed at improving both retrieval efficiency and accuracy. One such approach involves the use of sliding window technology, enabling layered retrieval by merging globally related information across multiple retrieval processes. Another strategy, known as the "small2big" method, utilizes small text blocks during the initial search phase and subsequently provides larger related text blocks to the language model for processing.
當(dāng)前RAG的研究探索了各種塊優(yōu)化技術(shù),旨在提高檢索效率和準確性。其中一種方法涉及使用滑動窗口技術(shù),通過跨多個檢索過程合并全局相關(guān)信息來實現(xiàn)分層檢索。另一種策略,稱為“small2big”方法,在初始搜索階段利用小文本塊,隨后向語言模型提供更大的相關(guān)文本塊進行處理。
The abstract embedding technique prioritizes top K retrieval based on document abstracts (or summaries), offering a comprehensive understanding of the entire document context. Additionally, the metadata filtering technique leverages document metadata to enhance the filtering process. An innovative approach, the graph indexing technique, transforms entities and relationships into nodes and connections, significantly improving relevance, particularly in the context of multi-hop problems.
The combination of these diverse methods has led to notable advancements, resulting in enhanced retrieval outcomes and improved performance for RAG.
Fine-tuning Embedding Models
Once the appropriate size of chunks is determined, the next crucial step involves embedding these chunks and the query into the semantic space using an embedding model. The effectiveness of the embedding is critical as it impacts the model's ability to represent the corpus. Recent research has introduced prominent embedding models such as AngIE, Voyage, BGE, etc. [Li and Li, 2023, VoyageAI, 2023, BAAI, 2023]. These models have undergone pre-training on extensive corpora. However, their capability to accurately capture domain-specific information may be limited when applied to specialized domains.
微調(diào)嵌入模型
一旦確定了適當(dāng)?shù)膲K大小,下一個關(guān)鍵步驟是使用嵌入模型將這些塊和查詢嵌入到語義空間中。嵌入的有效性至關(guān)重要,因為它影響模型表示語料庫的能力。最近的研究引入了AngIE、Voyage、BGE等突出的嵌入模型[Li and Li, 2023, VoyageAI, 2023, BAAI, 2023]。這些模型在廣泛的語料庫上進行了預(yù)訓(xùn)練。然而,當(dāng)應(yīng)用于特定領(lǐng)域時,它們準確捕獲特定領(lǐng)域信息的能力可能會受到限制。
Moreover, task-specific fine-tuning of embedding models is essential to ensure that the model comprehends the user query in terms of content relevance. A model without fine-tuning may not adequately address the requirements of a specific task. Consequently, fine-tuning an embedding model becomes crucial for downstream applications. There are two primary paradigms in embedding fine-tuning methods.
Domain Knowledge Fine-tuning. To ensure that an embedding model accurately captures domain-specific information, it is imperative to utilize domain-specific datasets for fine-tuning. This process diverges from standard language model fine-tuning, chiefly in the nature of the datasets involved. Typically, the dataset for embedding model fine-tuning encompasses three principal elements: queries, a corpus, and relevant documents. The model employs these queries to identify pertinent documents within the corpus. The efficacy of the model is then gauged based on its ability to retrieve these relevant documents in response to the queries. The dataset construction, model fine-tuning, and evaluation phases each present distinct challenges. The LlamaIndex [Liu, 2023] introduces a suite of pivotal classes and functions designed to enhance the embedding model fine-tuning workflow, thereby simplifying these intricate processes. By curating a corpus infused with domain knowledge and leveraging the methodologies offered, one can adeptly fine-tune an embedding model to align closely with the specific requirements of the target domain.
領(lǐng)域知識微調(diào)。為了確保嵌入模型準確地捕獲特定于領(lǐng)域的信息,必須利用特定于領(lǐng)域的數(shù)據(jù)集進行微調(diào)。這個過程與標準語言模型微調(diào)不同,主要在于所涉及的數(shù)據(jù)集的性質(zhì)。通常,用于嵌入模型微調(diào)的數(shù)據(jù)集包含三個主要元素:查詢、語料庫和相關(guān)文檔。該模型使用這些查詢來識別語料庫中的相關(guān)文檔。然后,根據(jù)響應(yīng)查詢而重新檢索這些相關(guān)文檔的能力來衡量模型的有效性。數(shù)據(jù)集構(gòu)建、模型微調(diào)和評估階段各有不同的挑戰(zhàn)。LlamaIn-dex [Liu, 2023]引入了一套關(guān)鍵類和函數(shù),旨在增強嵌入模型微調(diào)工作流程,從而簡化這些復(fù)雜的過程。通過管理充滿領(lǐng)域知識的語料庫并利用所提供的方法,可以熟練地微調(diào)嵌入模型,使其與目標領(lǐng)域的特定需求緊密結(jié)合。
Fine-tuning for Downstream Tasks. Fine-tuning embedding models for downstream tasks is a critical step in enhancing model performance. In the realm of utilizing RAG for these tasks, innovative methods have emerged to fine-tune embedding models by harnessing the capabilities of LLMs. For example, PROMPTAGATOR [Dai et al., 2022] utilizes the LLM as a few-shot query generator to create task-specific retrievers, addressing challenges in supervised fine-tuning, particularly in data-scarce domains. Another approach, LLM-Embedder [Zhang et al., 2023a], exploits LLMs to generate reward signals for data across multiple downstream tasks. The retriever is fine-tuned with two types of supervised signals: hard labels for the dataset and soft rewards from the LLMs. This dual-signal approach fosters a more effective fine-tuning process, tailoring the embedding model to diverse downstream applications.
對下游任務(wù)進行微調(diào)。對下游任務(wù)的嵌入模型進行微調(diào)是提高模型性能的關(guān)鍵步驟。在利用RAG完成這些任務(wù)的領(lǐng)域中,通過利用llm的功能來微調(diào)嵌入模型的創(chuàng)新方法已經(jīng)出現(xiàn)。例如,PROMPTAGATOR [Dai等人,2022]利用LLM作為少量查詢生成器來創(chuàng)建特定于任務(wù)的檢索器,解決了監(jiān)督微調(diào)中的挑戰(zhàn),特別是在數(shù)據(jù)稀缺領(lǐng)域。另一種方法是LLM-Embedder [Zhang等,2023a],利用llm為跨多個下游任務(wù)的數(shù)據(jù)生成獎勵信號。檢索器使用兩種類型的監(jiān)督信號進行微調(diào):數(shù)據(jù)集的硬標簽和來自llm的軟獎勵。這種雙信號方法實現(xiàn)了更有效的微調(diào)過程,使嵌入模型適應(yīng)不同的下游應(yīng)用。
While these methods improve semantic representation by incorporating domain knowledge and task-specific fine-tuning, retrievers may not always exhibit optimal compatibility with certain LLMs. To address this, some researchers have explored direct supervision of the fine-tuning process using feedback from LLMs. This direct supervision seeks to align the retriever more closely with the LLM, thereby improving performance on downstream tasks. A more comprehensive discussion on this topic is presented in Section 4.3.
雖然這些方法通過結(jié)合領(lǐng)域知識和特定于任務(wù)的微調(diào)來改進語義表示,但檢索器可能并不總是表現(xiàn)出與某些llm的最佳兼容性。為了解決這個問題,一些研究人員探索了利用法學(xué)碩士的反饋直接監(jiān)督微調(diào)過程。這種直接監(jiān)督旨在使檢索器更緊密地與LLM保持一致,從而提高下游任務(wù)的性能。關(guān)于這個主題的更全面的討論將在第4.3節(jié)中介紹。
4.2 Aligning Queries and Documents
In the context of RAG applications, retrievers may utilize a single embedding model for encoding both the query and the documents, or employ separate models for each. Additionally, the user's original query may suffer from imprecise phrasing and lack of semantic information. Therefore, it is crucial to align the semantic space of the user's query with those of the documents. This section introduces two fundamental techniques aimed at achieving this alignment.
在RAG應(yīng)用程序的上下文中,檢索器可以使用單個嵌入模型對查詢和文檔進行編碼,或者為每個模型使用單獨的模型。此外,用戶的原始查詢可能會受到措辭不精確和缺乏語義信息的影響。因此,將用戶查詢的語義空間與文檔的語義空間保持一致是至關(guān)重要的。本節(jié)將介紹兩種旨在實現(xiàn)這種一致性的基本技術(shù)。
Query Rewriting
Query rewriting is a fundamental approach for aligning the semantics of a query and a document. Methods such as Query2Doc and ITER-RETGEN leverage LLMs to create a pseudo-document by combining the original query with additional guidance [Wang et al., 2023c, Shao et al., 2023]. HyDE constructs query vectors using textual cues to generate a "hypothetical" document capturing essential patterns [Gao et al., 2022]. RRR introduces a framework that reverses the traditional retrieval and reading order, focusing on query rewriting [Ma et al., 2023a]. STEP-BACKPROMPTING enables LLMs to perform abstract reasoning and retrieval based on high-level concepts [Zheng et al., 2023]. Additionally, the multi-query retrieval method utilizes LLMs to generate and execute multiple search queries simultaneously, advantageous for addressing complex problems with multiple sub-problems.
Embedding Transformation
Beyond broad strategies such as query rewriting, there exist more granular techniques specifically designed for embedding transformations. LlamaIndex [Liu, 2023] exemplifies this by introducing an adapter module that can be integrated after the query encoder. This adapter facilitates fine-tuning, thereby optimizing the representation of query embeddings to map them into a latent space that is more closely aligned with the intended tasks.
嵌入轉(zhuǎn)換
除了諸如查詢重寫之類的廣泛策略之外,還有專門為嵌入轉(zhuǎn)換設(shè)計的更細粒度的技術(shù)。LlamaIndex [Liu, 2023]通過引入一個可以集成在查詢編碼器之后的適配器模塊來舉例說明這一點。這個適配器促進了微調(diào),從而優(yōu)化了查詢嵌入的表示,將它們映射到與預(yù)期任務(wù)更緊密結(jié)合的潛在空間中。
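To illustrate the general shape of such an adapter, here is a rough sketch in PyTorch. It is inspired by, but not identical to, the LlamaIndex adapter described above: the frozen encoder's output is passed through a small trainable projection, and only the adapter's parameters are updated. All tensors and hyperparameters are placeholders.

```python
# A minimal sketch of an adapter placed after a frozen query encoder.
import torch
import torch.nn as nn

class QueryAdapter(nn.Module):
    """Maps frozen query embeddings into a task-aligned latent space."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the adapted embedding close to the original.
        return query_emb + self.proj(query_emb)

# Training-step sketch: pull adapted queries toward relevant documents.
adapter = QueryAdapter(dim=768)
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)
query_emb = torch.randn(32, 768)    # frozen encoder output (placeholder)
pos_doc_emb = torch.randn(32, 768)  # embeddings of relevant documents
loss = 1 - nn.functional.cosine_similarity(adapter(query_emb), pos_doc_emb).mean()
loss.backward()
optimizer.step()
```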
The challenge of aligning queries with structured external documents, particularly when addressing the incongruity between structured and unstructured data, is addressed by SANTA [Li et al., 2023d]. It enhances the retriever's sensitivity to structured information through two pre-training strategies: first, by leveraging the intrinsic alignment between structured and unstructured data to inform contrastive learning in a structure-aware pre-training scheme; and second, by implementing Masked Entity Prediction. The latter utilizes an entity-centric masking strategy that encourages language models to predict and fill in the masked entities, thereby fostering a deeper understanding of structured data.
SANTA [Li等人,2023d]解決了將查詢與結(jié)構(gòu)化外部文檔對齊的挑戰(zhàn),特別是結(jié)構(gòu)化和非結(jié)構(gòu)化數(shù)據(jù)之間的不一致問題。它通過兩種預(yù)訓(xùn)練策略來提高檢索器對結(jié)構(gòu)化信息的敏感性:其一,利用結(jié)構(gòu)化和非結(jié)構(gòu)化數(shù)據(jù)之間的內(nèi)在對齊關(guān)系,在結(jié)構(gòu)感知的預(yù)訓(xùn)練方案中指導(dǎo)對比學(xué)習(xí);其二,實施屏蔽實體預(yù)測。后者采用以實體為中心的屏蔽策略,鼓勵語言模型預(yù)測并填充被屏蔽的實體,從而促進對結(jié)構(gòu)化數(shù)據(jù)的更深入理解。
4.3 Aligning Retriever and LLM
In the RAG pipeline, enhancing the retrieval hit rate through various techniques may not necessarily improve the final outcome, as the retrieved documents may not align with the specific requirements of the LLMs. Therefore, this section introduces two methods aimed at aligning the retriever outputs with the preferences of the LLMs.
在RAG管道中,通過各種技術(shù)提高檢索命中率不一定會改善最終結(jié)果,因為檢索到的文檔可能與LLM的特定需求不一致。因此,本節(jié)介紹兩種旨在使檢索器輸出與LLM偏好保持一致的方法。
Fine-tuning Retrievers
Several studies utilize feedback signals from LLMs to refine retrieval models. For instance, AAR [Yu et al., 2023b] introduces supervisory signals for a pre-trained retriever with an encoder-decoder architecture. This is achieved by identifying the LM's preferred documents through FiD cross-attention scores. Subsequently, the retriever undergoes fine-tuning with hard negative sampling and a standard cross-entropy loss. Ultimately, the refined retriever can be directly applied to enhance unseen target LMs, resulting in improved performance on the target task. Additionally, it is suggested that LLMs may have a preference for focusing on readable rather than information-rich documents.
微調(diào)檢索器
一些研究利用LLM的反饋信號來改進檢索模型。例如,AAR [Yu等人,2023b]為采用編碼器-解碼器架構(gòu)的預(yù)訓(xùn)練檢索器引入監(jiān)督信號,具體做法是通過FiD交叉注意力分數(shù)識別LM偏好的文檔。隨后,使用困難負樣本采樣和標準交叉熵損失對檢索器進行微調(diào)。最終,改進后的檢索器可以直接用于增強未見過的目標LM,從而提高目標任務(wù)的性能。此外,有研究指出,LLM可能更傾向于關(guān)注可讀性強而非信息豐富的文檔。
REPLUG [Shi et al., 2023] utilizes a retriever and an LLM to calculate the probability distributions of the retrieved documents and then performs supervised training by computing the KL divergence. This straightforward and effective training method enhances the performance of the retrieval model by using the LM as the supervisory signal, eliminating the need for specific cross-attention mechanisms.
REPLUG [Shi et al., 2023]利用檢索器和LLM計算檢索文檔的概率分布,然后通過計算KL散度進行監(jiān)督訓(xùn)練。這種簡單有效的訓(xùn)練方法通過使用LM作為監(jiān)督信號來提高檢索模型的性能,消除了對特定交叉注意機制的需要。
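A rough sketch of this kind of KL-based supervision follows. It is illustrative only, not REPLUG's released code: the scores are random placeholders, and in practice the LM-side scores would come from how much each document improves the frozen LM's likelihood of the gold answer.

```python
# REPLUG-style supervision sketch: push the retriever's distribution over
# retrieved documents toward the distribution implied by the frozen LM.
import torch
import torch.nn.functional as F

retriever_scores = torch.randn(8, requires_grad=True)  # similarity(query, doc_i)
lm_scores = torch.randn(8)  # e.g. LM log-likelihood of the answer given doc_i

log_p_retriever = F.log_softmax(retriever_scores, dim=-1)  # log-probs (input to kl_div)
q_lm = F.softmax(lm_scores, dim=-1)                        # target distribution (LM frozen)

# KL(q_lm || p_retriever): only the retriever receives gradients.
loss = F.kl_div(log_p_retriever, q_lm, reduction="sum")
loss.backward()
```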
UPRISE [Cheng et al., 2023a] also employs frozen LLMs to fine-tune the prompt retriever. Both the LLM and the retriever take prompt-input pairs as inputs and utilize the scores provided by the LLM to supervise the retriever's training, effectively treating the LLM as a dataset labeler. In addition, Atlas [Izacard et al., 2022] proposes four methods of supervised fine-tuning for embedding models:
>>Attention Distillation. This approach employs cross-attention scores generated by the LLM during output to distill the model’s knowledge.
>>EMDR2. By using the Expectation-Maximization algorithm, this method trains the model with retrieved documents as latent variables.
>>Perplexity Distillation directly trains the model using the perplexity of generated tokens as an indicator.
>>LOOP. This method presents a novel loss function based on the impact of document deletion on LLM prediction, offering an efficient training strategy to better adapt the model to specific tasks.
UPRISE [Cheng et al., 2023a]也使用凍結(jié)的LLM對提示檢索器進行微調(diào)。LLM和檢索器都將提示-輸入對作為輸入,并利用LLM給出的分數(shù)來監(jiān)督檢索器的訓(xùn)練,實際上是將LLM視為數(shù)據(jù)集標注器。此外,Atlas [Izacard et al., 2022]提出了四種監(jiān)督微調(diào)嵌入模型的方法:
>>注意力蒸餾。該方法利用LLM在輸出過程中生成的交叉注意力分數(shù)來蒸餾模型的知識。
>>EMDR2。該方法使用期望最大化算法,以檢索到的文檔作為潛在變量來訓(xùn)練模型。
>>困惑度蒸餾。直接使用生成token的困惑度作為指標來訓(xùn)練模型。
>>LOOP。該方法提出了一種基于文檔刪除對LLM預(yù)測影響的新型損失函數(shù),提供了一種高效的訓(xùn)練策略,使模型更好地適應(yīng)特定任務(wù)。
These approaches aim to improve the synergy between the retriever and the LLM, leading to enhanced retrieval performance and more accurate responses to user inquiries.
這些方法旨在提高檢索器和LLM之間的協(xié)同作用,從而提高檢索性能并更準確地響應(yīng)用戶查詢。
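To make one of the four Atlas-style signals concrete, here is a minimal sketch of perplexity distillation under stated assumptions: the per-document answer NLLs are placeholder numbers, and the target distribution simply favors documents that make the answer less perplexing to the LM.

```python
# Perplexity-distillation sketch: documents that lower the LM's perplexity
# on the gold answer receive higher retriever probability.
import torch
import torch.nn.functional as F

retriever_scores = torch.randn(4, requires_grad=True)  # retriever score per document
answer_nll = torch.tensor([2.1, 0.7, 1.5, 3.0])        # LM NLL of the answer given each doc

target = F.softmax(-answer_nll, dim=-1)        # lower NLL -> higher target probability
log_p = F.log_softmax(retriever_scores, dim=-1)
loss = F.kl_div(log_p, target, reduction="sum")  # align retriever with the LM signal
loss.backward()
```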
Adapters
Fine-tuning models may present challenges, such as integrating functionality through an API or addressing constraints arising from limited local computational resources. Consequently, some approaches opt to incorporate an external adapter to aid in alignment.
適配器
微調(diào)模型可能會帶來挑戰(zhàn),例如需要通過API集成功能,或應(yīng)對本地計算資源有限帶來的約束。因此,一些方法選擇引入外部適配器來輔助對齊。
PRCA trains the adapter through a context extraction phase and a reward-driven phase. The retriever's output is then optimized using a token-based autoregressive strategy [Yang et al., 2023b]. The token filtering approach employs cross-attention scores to efficiently filter tokens, selecting only the highest-scoring input tokens [Berchansky et al., 2023]. RECOMP introduces both extractive and generative compressors for summary generation. These compressors either select relevant sentences or synthesize document information, creating summaries tailored to multi-document queries [Xu et al., 2023a].
PRCA通過上下文提取階段和獎勵驅(qū)動階段訓(xùn)練適配器。然后使用基于令牌的自回歸策略對檢索器的輸出進行優(yōu)化[Yang等人,2023b]。令牌過濾方法采用交叉注意分數(shù)來有效地過濾令牌,只選擇得分最高的輸入令牌[Berchansky等人,2023]。RECOMP引入了抽取壓縮器和生成壓縮器來生成摘要。這些壓縮器要么選擇相關(guān)句子,要么合成文檔信息,創(chuàng)建適合多文檔查詢的摘要[Xu等人,2023a]。
Furthermore, PKG introduces an innovative method for integrating knowledge into white-box models via directive fine-tuning [Luo et al., 2023]. In this approach, the retriever module is directly substituted to generate relevant documents according to a query. This method assists in addressing the difficulties encountered during the fine-tuning process and enhances model performance.
此外,PKG引入了一種通過指令微調(diào)將知識集成到白盒模型中的創(chuàng)新方法[Luo等人,2023]。在這種方法中,直接替換檢索器模塊,根據(jù)查詢生成相關(guān)文檔。該方法有助于解決在微調(diào)過程中遇到的困難,并提高模型性能。
5 Generation
A crucial component of RAG is its generator, which is responsible for converting retrieved information into coherent and fluent text. Unlike traditional language models, RAG's generator sets itself apart by improving accuracy and relevance via the incorporation of retrieved data. In RAG, the generator's input encompasses not only typical contextual information but also relevant text segments obtained through the retriever. This comprehensive input enables the generator to gain a deep understanding of the question's context, resulting in more informative and contextually relevant responses.
RAG的一個關(guān)鍵組件是它的生成器,它負責(zé)將檢索到的信息轉(zhuǎn)換成連貫流暢的文本。與傳統(tǒng)的語言模型不同,RAG的生成器通過整合檢索到的數(shù)據(jù)來提高準確性和相關(guān)性,從而使自己與眾不同。在RAG中,生成器的輸入不僅包括典型的上下文信息,還包括通過檢索器獲得的相關(guān)文本片段。這種全面的輸入使生成器能夠深入了解問題的上下文,從而產(chǎn)生更多信息和上下文相關(guān)的響應(yīng)。
Furthermore, the generator is guided by the retrieved text to ensure coherence between the generated content and the obtained information. The diverse input data has led to targeted efforts during the generation phase, all aimed at refining the adaptation of the large model to the input data derived from queries and documents. In the following subsections, we will explore the introduction of the generator by delving into aspects of post-retrieval processing and fine-tuning.
此外,生成器由檢索文本引導(dǎo),以確保生成的內(nèi)容與獲取的信息之間的一致性。不同的輸入數(shù)據(jù)導(dǎo)致在生成階段進行有針對性的工作,所有這些工作都旨在改進大型模型對來自查詢和文檔的輸入數(shù)據(jù)的適應(yīng)。在接下來的小節(jié)中,我們將通過深入研究檢索后處理和微調(diào)的各個方面來探討生成器的介紹。
5.1 Post-retrieval with Frozen LLM
In the realm of untunable LLMs, many studies rely on well-established models like GPT-4 [OpenAI, 2023] to harness their comprehensive internal knowledge for systematically synthesizing retrieved information from various documents.
在不可調(diào)的LLM領(lǐng)域,許多研究依賴GPT-4 [OpenAI, 2023]等成熟模型,利用其全面的內(nèi)部知識,系統(tǒng)地綜合從各類文檔中檢索到的信息。
However, challenges persist with these large models, including limitations on context length and susceptibility to redundant information. To tackle these issues, certain research endeavors have turned their focus to post-retrieval processing.
然而,這些大型模型仍然存在挑戰(zhàn),包括上下文長度的限制和對冗余信息的敏感性。為了解決這些問題,一些研究人員將重點轉(zhuǎn)向了檢索后處理。
Post-retrieval processing involves treating, filtering, or optimizing the relevant information retrieved by the retriever from a large document database. Its main goal is to enhance the quality of retrieval results, aligning them more closely with user needs or subsequent tasks. It can be viewed as a reprocessing of the documents obtained during the retrieval phase. Common operations in post-retrieval processing typically include information compression and result reranking.
檢索后處理包括處理、過濾或優(yōu)化檢索器從大型文檔數(shù)據(jù)庫檢索到的相關(guān)信息。它的主要目標是提高檢索結(jié)果的質(zhì)量,使它們更貼近用戶需求或后續(xù)任務(wù)。它可以看作是對檢索階段獲得的文檔的再處理。檢索后處理中的常見操作通常包括信息壓縮和結(jié)果重新排序。
Information Compression
The retriever excels at retrieving relevant information from a vast knowledge base, but managing the substantial amount of information within retrieved documents is a challenge. Ongoing research aims to extend the context length of large language models to tackle this issue. However, current large models still struggle with context limitations. Therefore, there are scenarios where condensing information becomes necessary. Information condensation is significant for reducing noise, addressing context length restrictions, and enhancing generation effects.
信息壓縮
檢索器擅長從龐大的知識庫中檢索相關(guān)信息,但是管理檢索文檔中的大量信息是一個挑戰(zhàn)。正在進行的研究旨在擴展大型語言模型的上下文長度來解決這個問題。然而,當(dāng)前的大型模型仍然與上下文限制作斗爭。因此,在某些情況下,壓縮信息是必要的。信息凝聚對于降低噪聲、解決上下文長度限制和增強生成效果具有重要意義。
PRCA tackled this issue by training an information extractor [Yang et al., 2023b]. In the context extraction phase, when provided with an input text S_input, it is capable of producing an output sequence C_extracted that represents the condensed context from the input document. The training process is designed to minimize the difference between C_extracted and the actual context C_truth.
PRCA通過訓(xùn)練一個信息提取器來解決這個問題[Yang等,2023b]。在上下文提取階段,給定輸入文本S_input,它能夠生成一個輸出序列C_extracted,表示從輸入文檔中提煉出的濃縮上下文。訓(xùn)練過程旨在最小化C_extracted與真實上下文C_truth之間的差異。
Similarly, RECOMP adopts a comparable approach by training an information condenser using contrastive learning [Xu et al., 2023a]. Each training data point consists of one positive sample and five negative samples, and the encoder undergoes training with a contrastive loss throughout this process [Karpukhin et al., 2020].
類似地,RECOMP采用了類似的方法,使用對比學(xué)習(xí)來訓(xùn)練信息壓縮器[Xu et al., 2023a]。每個訓(xùn)練數(shù)據(jù)點由一個正樣本和五個負樣本組成,編碼器在整個過程中使用對比損失進行訓(xùn)練[Karpukhin et al., 2020]。
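A minimal sketch of this contrastive setup, assuming one positive and five negatives per training point as described above. The vectors are random placeholders standing in for encoder outputs; the loss is a standard InfoNCE-style cross-entropy, which is one common instantiation of the contrastive loss cited here.

```python
# Contrastive (InfoNCE-style) training sketch for a compressor/encoder.
import torch
import torch.nn.functional as F

def info_nce(query_vec, pos_vec, neg_vecs, temperature=0.05):
    pos_sim = F.cosine_similarity(query_vec, pos_vec, dim=-1).unsqueeze(0)
    neg_sim = F.cosine_similarity(query_vec.unsqueeze(0), neg_vecs, dim=-1)
    logits = torch.cat([pos_sim, neg_sim]) / temperature
    # The positive sits at index 0; cross-entropy pushes it above the negatives.
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))

query_vec = torch.randn(128, requires_grad=True)  # encoder output (placeholder)
pos_vec = torch.randn(128)                        # one positive sample
neg_vecs = torch.randn(5, 128)                    # five negative samples
loss = info_nce(query_vec, pos_vec, neg_vecs)
loss.backward()
```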
Another study has taken a different approach by aiming to reduce the number of documents in order to improve the accuracy of the model's answers. In the study by [Ma et al., 2023b], they propose the "Filter-Reranker" paradigm, which combines the strengths of LLMs and Small Language Models (SLMs). In this paradigm, SLMs serve as filters, while LLMs function as reordering agents. The research shows that instructing LLMs to rearrange challenging samples identified by SLMs leads to significant improvements in various Information Extraction (IE) tasks.
另一項研究采取了不同的方法,旨在減少文檔數(shù)量以提高模型答案的準確性。在[Ma et al., 2023b]的研究中,他們提出了"Filter-Reranker"范式,該范式結(jié)合了LLM和小語言模型(Small Language Models, SLM)的優(yōu)勢。在這一范式中,SLM充當(dāng)過濾器,而LLM充當(dāng)重排序代理。研究表明,指示LLM重新排列由SLM識別出的高難度樣本,可以顯著改善各種信息抽?。↖E)任務(wù)。
Reranking
The re-ranking model is pivotal in optimizing the document set retrieved from the retriever. Language models often face performance declines when additional context is introduced, and re-ranking effectively addresses this issue. The core concept involves rearranging document records to prioritize the most relevant items at the top, thereby limiting the total number of documents. This not only resolves the challenge of context window expansion during retrieval but also enhances retrieval efficiency and responsiveness.
重排序
重排序模型是優(yōu)化從檢索器檢索到的文檔集的關(guān)鍵。當(dāng)引入額外的上下文時,語言模型經(jīng)常面臨性能下降的問題,重排序可以有效地解決這個問題。其核心思想是重新排列文檔記錄,將最相關(guān)的條目放在最前面,從而限制文檔的總數(shù)。這既解決了檢索過程中上下文窗口擴張的難題,又提高了檢索效率和響應(yīng)速度。
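The core rerank-and-truncate idea reduces to a few lines. In this sketch, `score` is a hypothetical stand-in for any stronger relevance scorer (e.g., a cross-encoder); the list is sorted by that score and capped so the context window is not overloaded.

```python
# Post-retrieval reranking sketch: score, sort, truncate.
from typing import Callable, List

def rerank(query: str, docs: List[str],
           score: Callable[[str, str], float], keep: int = 3) -> List[str]:
    # Put the most relevant documents first according to the scorer.
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return ranked[:keep]  # cap the total number of documents passed to the LLM
```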
The re-ranking model assumes a dual role throughout the information retrieval process, functioning as both an optimizer and a refiner. It provides more effective and accurate input for subsequent language model processing [Zhuang et al., 2023].
重新排序模型在整個信息檢索過程中扮演雙重角色,既充當(dāng)優(yōu)化器,又充當(dāng)精煉器。它為后續(xù)的語言模型處理提供了更有效和準確的輸入[Zhuang等,2023]。
Contextual compression is incorporated into the reordering process to offer more precise retrieval information. This method entails reducing the content of individual documents and filtering the entire document, with the ultimate goal of presenting the most relevant information in the search results for a more focused and accurate display of pertinent content.
上下文壓縮被整合到重新排序過程中,以提供更精確的檢索信息。這種方法需要減少單個文檔的內(nèi)容并過濾整個文檔,其最終目標是在搜索結(jié)果中顯示最相關(guān)的信息,以便更集中、更準確地顯示相關(guān)內(nèi)容。
5.2 Fine-tuning LLM for RAG
Optimizing the generator within the RAG model is a critical aspect of its architecture. The generator's role is to take the retrieved information and produce relevant text, forming the final output of the model. The optimization of the generator aims to ensure that the generated text is both natural and effectively leverages the retrieved documents to better meet the user's query needs.
在RAG模型中優(yōu)化生成器是其體系結(jié)構(gòu)的一個關(guān)鍵方面。生成器的作用是獲取檢索到的信息并生成相關(guān)文本,形成模型的最終輸出。生成器的優(yōu)化旨在確保生成的文本既自然又有效地利用檢索到的文檔來更好地滿足用戶的查詢需求。
In standard LLM generation tasks, the input typically consists of a query. RAG stands out by incorporating not only a query but also various retrieved documents (structured/unstructured) from the retriever into the input. This additional information can significantly influence the model's understanding, particularly for smaller models. In such cases, fine-tuning the model to adapt to the input of both query and retrieved documents becomes crucial. Before presenting the input to the fine-tuned model, post-retrieval processing usually occurs for the documents retrieved by the retriever. It is essential to note that the fine-tuning method for the generator in RAG aligns with the general fine-tuning approach for LLMs. In the following, we will briefly describe some representative works involving data (formatted/unformatted) and optimization functions.
在標準LLM生成任務(wù)中,輸入通常由查詢組成。RAG的突出之處在于,它不僅將查詢,還將檢索器檢索到的各種文檔(結(jié)構(gòu)化/非結(jié)構(gòu)化)納入輸入。這些附加信息會顯著影響模型的理解,特別是對于較小的模型。在這種情況下,對模型進行微調(diào)以適應(yīng)查詢和檢索文檔的聯(lián)合輸入變得至關(guān)重要。在將輸入提供給微調(diào)后的模型之前,通常會對檢索器檢索到的文檔進行檢索后處理。需要注意的是,RAG中生成器的微調(diào)方法與LLM的通用微調(diào)方法是一致的。下面,我們將簡要介紹一些涉及數(shù)據(jù)(格式化/非格式化)和優(yōu)化函數(shù)的代表性工作。
General Optimization Process
As part of the general optimization process, the training data typically consists of input-output pairs, aiming to train the model to produce the output y given the input x. In the work of Self-Mem [Cheng et al., 2023b], a traditional training process is employed, where given the input x, relevant documents z are retrieved (selecting Top-1 in the paper), and after integrating (x, z), the model generates the output y. The paper utilizes two common paradigms for fine-tuning, namely Joint-Encoder and Dual-Encoder [Arora et al., 2023, Wang et al., 2022b, Lewis et al., 2020, Xia et al., 2019, Cai et al., 2021, Cheng et al., 2022].
一般優(yōu)化過程
作為一般優(yōu)化過程的一部分,訓(xùn)練數(shù)據(jù)通常由輸入-輸出對組成,目的是訓(xùn)練模型在給定輸入x的情況下產(chǎn)生輸出y。在Self-Mem [Cheng et al., 2023b]的工作中,采用了傳統(tǒng)的訓(xùn)練過程:給定輸入x,檢索相關(guān)文檔z(論文中選擇Top-1),整合(x, z)后,模型生成輸出y。該論文采用了兩種常見的微調(diào)范式,即聯(lián)合編碼器(Joint-Encoder)和雙編碼器(Dual-Encoder)[Arora等,2023,Wang等,2022b, Lewis等,2020,Xia等,2019,Cai等,2021,Cheng等,2022]。
In the Joint-Encoder paradigm, a standard encoder-decoder model is used. Here, the encoder initially encodes the input, and the decoder, through attention mechanisms, combines the encoded results to generate tokens in an autoregressive manner. In the Dual-Encoder paradigm, by contrast, the system sets up two independent encoders, each encoding the input (query, context) and the document, respectively. The resulting outputs undergo bidirectional cross-attention processing by the decoder in sequence. Both architectures utilize the Transformer [Vaswani et al., 2017] as the foundational block and optimize with a negative log-likelihood loss.
在聯(lián)合編碼器范式中,使用標準的編碼器-解碼器模型:編碼器首先對輸入進行編碼,解碼器通過注意力機制組合編碼結(jié)果,以自回歸方式生成token。而在雙編碼器范式中,系統(tǒng)設(shè)置兩個獨立的編碼器,分別對輸入(查詢、上下文)和文檔進行編碼,所得輸出依次由解碼器進行雙向交叉注意力處理。這兩種架構(gòu)都以Transformer [Vaswani等人,2017]作為基礎(chǔ)模塊,并使用負對數(shù)似然損失進行優(yōu)化。
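The shared objective in both paradigms is ordinary token-level negative log-likelihood over the target y. A minimal sketch, with placeholder decoder logits standing in for either architecture's output:

```python
# NLL training objective sketch for the generator: given the combined
# (x, z) input, maximize the likelihood of the target tokens y.
import torch
import torch.nn.functional as F

vocab, seq_len = 1000, 12
logits = torch.randn(1, seq_len, vocab, requires_grad=True)  # decoder outputs (placeholder)
targets = torch.randint(0, vocab, (1, seq_len))              # gold tokens y

# Cross-entropy on logits is the token-level negative log-likelihood.
loss = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))
loss.backward()
```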
Utilizing Contrastive Learning
In the phase of preparing training data for language models, interaction pairs of input and output are usually created. This traditional method can lead to "exposure bias," where the model is only trained on individual, correct output examples, thus restricting its exposure to the range of possible outputs. This limitation can hinder the model's real-world performance by causing it to overfit to the particular examples in the training set, thereby reducing its ability to generalize across various contexts.
運用對比學(xué)習(xí)
在為語言模型準備訓(xùn)練數(shù)據(jù)的階段,通常會構(gòu)建輸入和輸出的交互對。這種傳統(tǒng)方法可能導(dǎo)致"暴露偏差":模型只在單個正確的輸出樣本上進行訓(xùn)練,從而限制了它接觸各種可能輸出的機會。這種限制會使模型過度擬合訓(xùn)練集中的特定示例,降低其在各種上下文中的泛化能力,從而損害模型的實際性能。
To mitigate exposure bias, SURGE [Kang et al., 2023] proposes the use of graph-text contrastive learning. This method includes a contrastive learning objective that prompts the model to produce a range of plausible and coherent responses, expanding beyond the instances encountered in the training data. This approach is crucial in reducing overfitting and strengthening the model's ability to generalize.
為了減輕暴露偏差,SURGE [Kang等人,2023]提出使用圖文對比學(xué)習(xí)。這種方法包括一個對比學(xué)習(xí)目標,促使模型產(chǎn)生一系列合理和連貫的響應(yīng),擴展到訓(xùn)練數(shù)據(jù)中遇到的實例之外。這種方法對于減少過擬合和增強模型的泛化能力至關(guān)重要。
For retrieval tasks that engage with structured data, the SANTA framework [Li et al., 2023d] implements a tripartite training regimen to effectively encapsulate both structural and semantic nuances. The initial phase focuses on the retriever, where contrastive learning is harnessed to refine the query and document embeddings.
對于涉及結(jié)構(gòu)化數(shù)據(jù)的檢索任務(wù),SANTA框架[Li et al., 2023d]實施了一個三階段的訓(xùn)練方案,以有效捕捉結(jié)構(gòu)和語義上的細微差別。初始階段側(cè)重于檢索器,利用對比學(xué)習(xí)來改進查詢和文檔的嵌入。
Subsequently, the generator’s preliminary training stage employs contrastive learning to align the structured data with its unstructured document descriptions. In a further stage of generator training, the model acknowledges the critical role of entity semantics in the representation learning of textual data for retrieval, as highlighted by [Sciavolino et al., 2021, Zhang et al., 2019]. This process commences with the identi-fication of entities within the structured data, followed by the application of masks over these entities within the generator’s input data, thus setting the stage for the model to anticipate and predict these masked elements.
隨后,生成器的初步訓(xùn)練階段采用對比學(xué)習(xí)將結(jié)構(gòu)化數(shù)據(jù)與其非結(jié)構(gòu)化文檔描述對齊。在生成器訓(xùn)練的進一步階段,該模型承認實體語義在文本數(shù)據(jù)的表示學(xué)習(xí)中起著關(guān)鍵作用,如[Sciavolino等人,2021,Zhang等人,2019]所強調(diào)的那樣。這個過程從識別結(jié)構(gòu)化數(shù)據(jù)中的實體開始,然后在生成器的輸入數(shù)據(jù)中對這些實體應(yīng)用掩碼,從而為模型預(yù)測和預(yù)測這些掩碼元素奠定基礎(chǔ)。
The training regimen progresses with the model learning to reconstruct the masked entities by leveraging contextual information. This exercise cultivates the model's comprehension of the textual data's structural semantics and facilitates the alignment of pertinent entities within the structured data. The overarching optimization goal is to train the language model to accurately restore the obscured spans, thereby enriching its understanding of entity semantics [Ye et al., 2020].
隨著訓(xùn)練的推進,模型學(xué)會利用上下文信息重構(gòu)被掩碼的實體。這一練習(xí)培養(yǎng)了模型對文本數(shù)據(jù)結(jié)構(gòu)語義的理解,并促進結(jié)構(gòu)化數(shù)據(jù)中相關(guān)實體的對齊??傮w優(yōu)化目標是訓(xùn)練語言模型準確地恢復(fù)被遮蔽的片段,從而豐富其對實體語義的理解[Ye et al., 2020]。
6 Augmentation in RAG
This section is structured around three key aspects: the augmentation stage, sources of augmentation data, and the augmentation process. These facets elucidate the critical technologies pivotal to RAG's development. A taxonomy of RAG's core components is presented in Figure 4.
本節(jié)圍繞三個關(guān)鍵方面展開:增強階段、增強數(shù)據(jù)的來源和增強過程。這些方面闡明了對RAG發(fā)展至關(guān)重要的關(guān)鍵技術(shù)。RAG的核心組件的分類如圖4所示。
6.1 RAG in Augmentation Stages
RAG, a knowledge-intensive endeavor, incorporates a variety of technical methodologies across the pre-training, fine-tuning, and inference stages of language model training.
RAG是一項知識密集型的工作,它在語言模型訓(xùn)練的預(yù)訓(xùn)練、微調(diào)和推理階段整合了各種技術(shù)方法。
Pre-training Stage
During the pre-training stage, researchers have investigated methods to bolster PTMs for open-domain QA through retrieval-based strategies. The REALM model adopts a structured, interpretable method for knowledge embedding, framing pre-training and fine-tuning as a retrieve-then-predict workflow within the masked language model (MLM) framework [Arora et al., 2023].
預(yù)訓(xùn)練階段
在預(yù)訓(xùn)練階段,研究人員研究了通過基于檢索的策略來增強面向開放域問答的PTM的方法。REALM模型采用結(jié)構(gòu)化、可解釋的知識嵌入方法,將預(yù)訓(xùn)練和微調(diào)框定為掩碼語言模型(MLM)框架內(nèi)的"檢索-預(yù)測"工作流[Arora等人,2023]。
RETRO [Borgeaud et al., 2022] leverages retrieval augmentation for large-scale pre-training from scratch, achieving a reduction in model parameters while surpassing standard GPT models in terms of perplexity. RETRO distinguishes itself with an additional encoder designed to process features of entities retrieved from an external knowledge base, building on the foundational structure of GPT models.
RETRO [Borgeaud等人,2022]利用檢索增強從頭開始進行大規(guī)模預(yù)訓(xùn)練,實現(xiàn)了模型參數(shù)的減少,同時在困惑度方面超過了標準GPT模型。RETRO的獨特之處在于它有一個額外的編碼器,該編碼器設(shè)計用于處理從外部知識庫檢索到的實體的特征,建立在GPT模型的基礎(chǔ)結(jié)構(gòu)上。
Atlas [Izacard et al., 2022] also incorporates a retrieval mechanism into the T5 architecture [Raffel et al., 2020] in both the pre-training and fine-tuning stages. It uses a pre-trained T5 to initialize the encoder-decoder language model and a pre-trained Contriever for the dense retriever, improving its efficiency for complex language modeling tasks.
Atlas [Izacard等人,2022]也在預(yù)訓(xùn)練和微調(diào)階段將檢索機制納入T5架構(gòu)[Raffel等人,2020]。它使用預(yù)訓(xùn)練的T5初始化編碼器-解碼器語言模型,并使用預(yù)訓(xùn)練的Contriever作為密集檢索器,從而提高了其在復(fù)雜語言建模任務(wù)上的效率。
Furthermore, COG [Lan et al., 2022] introduces a novel text generation methodology that emulates copying text fragments from pre-existing collections. Utilizing efficient vector search tools, COG computes and indexes contextually meaningful representations of text fragments, demonstrating superior performance in domains such as question answering and domain adaptation when compared to RETRO.
此外,COG [Lan等人,2022]引入了一種新的文本生成方法,該方法模擬從預(yù)先存在的集合中復(fù)制文本片段。利用高效的向量搜索工具,COG計算和索引文本片段的上下文有意義的表示,與RETRO相比,在問答和領(lǐng)域適應(yīng)等領(lǐng)域表現(xiàn)出卓越的性能。
The advent of scaling laws has catalyzed the growth of model parameters, propelling autoregressive models into the mainstream. Researchers are expanding the RAG approach to pretrained larger models, with RETRO++ exemplifying this trend by scaling up the model parameters while preserving or enhancing performance [Wang et al., 2023b].
標度定律的出現(xiàn)促進了模型參數(shù)的增長,推動自回歸模型成為主流。研究人員正在將RAG方法擴展到預(yù)訓(xùn)練更大的模型,RETRO++通過在保持或增強性能的同時擴大模型參數(shù)來體現(xiàn)這一趨勢[Wang等人,2023b]。
Empirical evidence underscores marked improvements in text generation quality, factual accuracy, reduced toxicity, and downstream task proficiency, especially in knowledge-intensive applications like open-domain QA. These results imply that integrating retrieval mechanisms into the pre-training of autoregressive language models constitutes a promising avenue, marrying sophisticated retrieval techniques with expansive language models to yield more precise and efficient language generation.
經(jīng)驗證據(jù)強調(diào)了在文本生成質(zhì)量、事實準確性、降低毒性和下游任務(wù)熟練程度方面的顯著改進,特別是在像開放領(lǐng)域QA這樣的知識密集型應(yīng)用中。這些結(jié)果表明,將檢索機制集成到自回歸語言模型的預(yù)訓(xùn)練中是一條很有前途的途徑,將復(fù)雜的檢索技術(shù)與廣泛的語言模型相結(jié)合,以產(chǎn)生更精確和有效的語言生成。
The benefits of augmented pre-training include a robust foundational model that outperforms standard GPT models in perplexity, text generation quality, and task-specific performance, all while utilizing fewer parameters. This method is particularly adept at handling knowledge-intensive tasks and facilitates the development of domain-specific models through training on specialized corpora.
增強預(yù)訓(xùn)練的好處包括一個健壯的基礎(chǔ)模型,該模型在困惑度、文本生成質(zhì)量和特定任務(wù)性能方面優(yōu)于標準GPT模型,同時使用更少的參數(shù)。這種方法特別擅長處理知識密集型任務(wù),并通過對專門語料庫的訓(xùn)練促進特定領(lǐng)域模型的開發(fā)。
Nonetheless, this approach faces challenges such as the necessity for extensive pre-training datasets and resources, as well as diminished update frequencies with increasing model sizes. Despite these hurdles, the approach offers significant advantages in model resilience. Once trained, retrieval-enhanced models can operate independently of external libraries, enhancing generation speed and operational efficiency. The potential gains identified render this methodology a compelling subject for ongoing investigation and innovation in artificial intelligence and machine learning.
盡管如此,這種方法面臨著挑戰(zhàn),例如需要廣泛的預(yù)訓(xùn)練數(shù)據(jù)集和資源,以及隨著模型大小的增加而減少的更新頻率。盡管存在這些障礙,但該方法在模型彈性方面提供了顯著的優(yōu)勢。經(jīng)過訓(xùn)練后,檢索增強模型可以獨立于外部庫運行,從而提高了生成速度和操作效率。所確定的潛在收益使這種方法成為人工智能和機器學(xué)習(xí)領(lǐng)域正在進行的研究和創(chuàng)新的引人注目的主題。
Fine-tuning Stage
RAG and fine-tuning are powerful tools for enhancing LLMs, and combining the two can meet the needs of more specific scenarios. On one hand, fine-tuning allows for the retrieval of documents with a unique style, achieving better semantic expression and aligning the differences between queries and documents. This ensures that the output of the retriever is more aptly suited to the scenario at hand. On the other hand, fine-tuning can fulfill the generation needs of making stylized and targeted adjustments. Furthermore, fine-tuning can also be used to align the retriever and generator for improved model synergy.
微調(diào)階段
RAG和微調(diào)都是增強LLM的強大工具,將兩者結(jié)合起來可以滿足更具體場景的需求。一方面,微調(diào)允許檢索具有獨特風(fēng)格的文檔,實現(xiàn)更好的語義表達,并對齊查詢和文檔之間的差異,從而確保檢索器的輸出更適合當(dāng)前場景。另一方面,微調(diào)可以滿足進行風(fēng)格化和針對性調(diào)整的生成需求。此外,微調(diào)還可以用于對齊檢索器和生成器,以改進模型協(xié)同。
The main goal of fine-tuning the retriever is to improve the quality of semantic representations, achieved by directly fine-tuning the embedding model using a corpus [Liu, 2023]. By aligning the retriever's capabilities with the preferences of the LLMs through feedback signals, both can be better coordinated [Yu et al., 2023b, Izacard et al., 2022, Yang et al., 2023b, Shi et al., 2023]. Fine-tuning the retriever for specific downstream tasks can lead to improved adaptability [cite]. The introduction of task-agnostic fine-tuning aims to enhance the retriever's versatility in multi-task scenarios [Cheng et al., 2023a].
微調(diào)檢索器的主要目標是提高語義表示的質(zhì)量,具體做法是使用語料庫直接微調(diào)嵌入模型[Liu, 2023]。通過反饋信號使檢索器的能力與LLM的偏好對齊,可以更好地協(xié)調(diào)兩者[Yu et al., 2023b, Izacard et al., 2022, Yang et al., 2023b, Shi et al., 2023]。針對特定下游任務(wù)微調(diào)檢索器可以提高適應(yīng)能力[cite]。引入任務(wù)無關(guān)的微調(diào)旨在增強檢索器在多任務(wù)場景中的通用性[Cheng等人,2023a]。
Fine-tuning the generator can result in outputs that are more stylized and customized. On one hand, it allows for specialized adaptation to different input data formats, for example, fine-tuning LLMs to fit the structure of knowledge graphs [Kang et al., 2023], the structure of text pairs [Kang et al., 2023, Cheng et al., 2023b], and other specific structures [Li et al., 2023d]. On the other hand, by constructing directive datasets, one can demand that LLMs generate content in specific formats. For instance, in adaptive or iterative retrieval scenarios, LLMs are fine-tuned to generate content that helps determine the timing for the next step of action [Jiang et al., 2023b, Asai et al., 2023].
微調(diào)生成器可以產(chǎn)生更加風(fēng)格化和定制化的輸出。一方面,它允許專門適配不同的輸入數(shù)據(jù)格式,例如,微調(diào)LLM以適配知識圖譜的結(jié)構(gòu)[Kang等人,2023]、文本對的結(jié)構(gòu)[Kang等人,2023,Cheng等人,2023b]以及其他特定結(jié)構(gòu)[Li等人,2023d]。另一方面,通過構(gòu)建指令數(shù)據(jù)集,可以要求LLM生成特定格式的內(nèi)容。例如,在自適應(yīng)或迭代檢索場景中,LLM被微調(diào)以生成有助于確定下一步行動時機的內(nèi)容[Jiang等人,2023b, Asai等人,2023]。
By synergistically fine-tuning both the retriever and the generator, we can enhance the model's generalization capabilities and avoid the overfitting that may arise from training them separately. However, joint fine-tuning also leads to increased resource consumption. RA-DIT [Lin et al., 2023] presents a lightweight, dual-instruction tuning framework that can effectively add retrieval capabilities to any LLM. The retrieval-augmented directive fine-tuning updates the LLM, guiding it to make more efficient use of the retrieved information and to disregard distracting content.
通過協(xié)同微調(diào)檢索器和生成器,我們可以增強模型的泛化能力,并避免單獨訓(xùn)練它們可能產(chǎn)生的過擬合。然而,聯(lián)合微調(diào)也會導(dǎo)致資源消耗增加。RA-DIT [Lin等,2023]提出了一種輕量級的雙指令調(diào)優(yōu)框架,可以有效地為任何LLM添加檢索能力。檢索增強的指令微調(diào)更新LLM,引導(dǎo)它更高效地利用檢索到的信息,并忽略干擾內(nèi)容。
Despite its advantages, fine-tuning has limitations, including the need for specialized datasets for RAG fine-tuning and the requirement for significant computational resources. However, this stage allows for customizing models to specific needs and data formats, potentially reducing resource usage compared to the pre-training phase while still being able to fine-tune the model's output style.
盡管有其優(yōu)點,但微調(diào)也有局限性,包括需要專門的數(shù)據(jù)集進行RAG微調(diào)以及需要大量的計算資源。然而,這個階段允許根據(jù)特定的需求和數(shù)據(jù)格式定制模型,與預(yù)訓(xùn)練階段相比,潛在地減少了資源使用,同時仍然能夠微調(diào)模型的輸出樣式。
In summary, the fine-tuning stage is essential for the adaptation of RAG models to specific tasks, enabling the refinement of both retrievers and generators. This stage enhances the model's versatility and adaptability to various tasks, despite the challenges presented by resource and dataset requirements. The strategic fine-tuning of RAG models is therefore a critical component in the development of efficient and effective retrieval-augmented systems.
總之,微調(diào)階段對于使RAG模型適應(yīng)特定的任務(wù)是必不可少的,從而可以對檢索器和生成器進行細化。這一階段增強了模型的通用性和對各種任務(wù)的適應(yīng)性,盡管存在資源和數(shù)據(jù)集需求帶來的挑戰(zhàn)。因此,RAG模型的戰(zhàn)略性微調(diào)是開發(fā)高效和有效的檢索增強系統(tǒng)的關(guān)鍵組成部分。
Inference Stage
The inference stage in RAG models is crucial, as it involves extensive integration with LLMs. Traditional RAG approaches, also known as Naive RAG, involve incorporating retrieval content at this stage to guide the generation process.
推理階段
RAG模型中的推理階段至關(guān)重要,因為它涉及與LLM的廣泛集成。傳統(tǒng)的RAG方法,也稱為樸素RAG,在此階段納入檢索內(nèi)容以指導(dǎo)生成過程。
To overcome the limitations of Naive RAG, advanced techniques introduce more contextually rich information during inference. The DSP framework [Khattab et al., 2022] utilizes a sophisticated exchange of natural language text between frozen LMs and retrieval models (RMs), enriching the context and thereby improving generation outcomes. The PKG [Luo et al., 2023] method equips LLMs with a knowledge-guided module that allows for the retrieval of pertinent information without modifying the LMs' parameters, enabling more complex task execution. CREA-ICL [Li et al., 2023b] employs a synchronous retrieval of cross-lingual knowledge to enhance context, while RECITE [Sun et al., 2022] generates context by sampling paragraphs directly from LLMs.
為了克服樸素RAG的局限性,先進的技術(shù)在推理過程中引入了更豐富的上下文信息。DSP框架[Khattab等人,2022]利用凍結(jié)的LM和檢索模型(RM)之間復(fù)雜的自然語言文本交換,豐富了上下文,從而改善了生成結(jié)果。PKG [Luo等人,2023]方法為LLM配備了一個知識引導(dǎo)模塊,允許在不修改LM參數(shù)的情況下檢索相關(guān)信息,從而能夠執(zhí)行更復(fù)雜的任務(wù)。CREA-ICL [Li et al., 2023b]采用跨語言知識的同步檢索來增強上下文,而RECITE [Sun et al., 2022]通過直接從LLM中采樣段落來生成上下文。
Further refinement of the RAG process during inference is seen in approaches that cater to tasks necessitating multi-step reasoning. ITRG [Feng et al., 2023] iteratively retrieves information to identify the correct reasoning paths, thereby improving task adaptability. ITER-RETGEN [Shao et al., 2023] follows an iterative strategy, merging retrieval and generation in a cyclical process that alternates between "retrieval-enhanced generation" and "generation-enhanced retrieval". For non-knowledge-intensive (NKI) tasks, PGRA [Guo et al., 2023] proposes a two-stage framework, starting with a task-agnostic retriever followed by a prompt-guided reranker to select and prioritize evidence. In contrast, IRCOT [Trivedi et al., 2022] combines RAG with Chain of Thought (CoT) methodologies, alternating CoT-guided retrievals with retrieval-informed CoT processes, significantly boosting GPT-3's performance across various question-answering tasks.
在推理過程中對RAG流程的進一步細化,體現(xiàn)在面向多步推理任務(wù)的方法中。ITRG [Feng et al., 2023]通過迭代檢索信息來識別正確的推理路徑,從而提高任務(wù)適應(yīng)性。ITER-RETGEN [Shao et al., 2023]采用迭代策略,在"檢索增強生成"和"生成增強檢索"交替的循環(huán)過程中融合檢索和生成。對于非知識密集型(NKI)任務(wù),PGRA [Guo等人,2023]提出了一個兩階段框架:首先使用任務(wù)無關(guān)的檢索器,然后使用提示引導(dǎo)的重排序器來選擇證據(jù)并排定優(yōu)先級。相比之下,IRCOT [Trivedi等人,2022]將RAG與思維鏈(CoT)方法相結(jié)合,交替進行CoT引導(dǎo)的檢索和以檢索結(jié)果指導(dǎo)的CoT過程,顯著提高了GPT-3在各種問答任務(wù)中的表現(xiàn)。
In essence, these inference-stage enhancements provide lightweight, cost-effective alternatives that leverage the capabilities of pre-trained models without necessitating further training. The principal advantage is maintaining static LLM parameters while supplying contextually relevant information to meet specific task demands. Nevertheless, this approach is not without limitations, as it requires meticulous data processing and optimization, and is bound by the foundational model's intrinsic capabilities. To address diverse task requirements effectively, this method is often paired with procedural optimization techniques such as step-wise reasoning, iterative retrieval, and adaptive retrieval strategies.
從本質(zhì)上講,這些推理階段的增強提供了輕量級的、經(jīng)濟有效的替代方案,可以利用預(yù)訓(xùn)練模型的功能,而不需要進一步的訓(xùn)練。其主要優(yōu)點是在提供上下文相關(guān)信息以滿足特定任務(wù)需求的同時維護靜態(tài)LLM參數(shù)。然而,這種方法并非沒有局限性,因為它需要細致的數(shù)據(jù)處理和優(yōu)化,并且受到基礎(chǔ)模型固有能力的約束。為了有效地解決不同的任務(wù)需求,這種方法通常與過程優(yōu)化技術(shù)相結(jié)合,如分步推理、迭代檢索和自適應(yīng)檢索策略。
6.2 Augmentation Source
The effectiveness of RAG models is heavily impacted by the selection of data sources for augmentation. Different levels of knowledge and dimensions require distinct processing techniques. They are categorized as unstructured data, structured data, and content generated by LLMs. The technology tree of representative RAG research with different augmentation aspects is depicted in Figure 5. The leaves, colored in three different shades, represent enhancements using various types of data: unstructured data, structured data, and content generated by LLMs. The diagram clearly shows that initially, augmentation was mainly achieved through unstructured data, such as pure text. This approach later expanded to include the use of structured data (e.g., knowledge graphs) for further improvement. More recently, there has been a growing trend in research that utilizes content generated by the LLMs themselves for retrieval and augmentation purposes.
擴充數(shù)據(jù)源的選擇在很大程度上影響RAG模型的有效性。不同層次和維度的知識需要不同的處理技術(shù),可分為非結(jié)構(gòu)化數(shù)據(jù)、結(jié)構(gòu)化數(shù)據(jù)和LLM生成的內(nèi)容。具有代表性的、涉及不同增強方面的RAG研究技術(shù)樹如圖5所示。葉子以三種不同的深淺顏色表示使用不同類型數(shù)據(jù)的增強:非結(jié)構(gòu)化數(shù)據(jù)、結(jié)構(gòu)化數(shù)據(jù)和LLM生成的內(nèi)容。圖中清楚地表明,最初的增強主要通過非結(jié)構(gòu)化數(shù)據(jù)(如純文本)實現(xiàn),隨后擴展到使用結(jié)構(gòu)化數(shù)據(jù)(例如知識圖譜)以進一步改進。最近,利用LLM自身生成的內(nèi)容進行檢索和增強的研究呈增長趨勢。
Augmented with Unstructured Data
Unstructured text is gathered from corpora, such as prompt data for fine-tuning large models [Cheng et al., 2023a] and cross-lingual data [Li et al., 2023b]. Retrieval units vary from tokens (e.g., kNN-LM [Khandelwal et al., 2019]) to phrases (e.g., NPM, COG [Lee et al., 2020, Lan et al., 2022]) and document paragraphs, with finer granularities offering precision at the cost of increased retrieval complexity.
使用非結(jié)構(gòu)化數(shù)據(jù)增強
非結(jié)構(gòu)化文本從語料庫中收集,例如用于微調(diào)大模型的提示數(shù)據(jù)[Cheng等人,2023a]和跨語言數(shù)據(jù)[Li等人,2023b]。檢索單元從token(例如kNN-LM [Khandelwal等人,2019])到短語(例如NPM、COG [Lee等人,2020,Lan等人,2022])再到文檔段落不等,更細的粒度以增加檢索復(fù)雜度為代價換取更高的精度。
FLARE [Jiang et al., 2023b] introduces an active retrieval approach, triggered by the LM's generation of low-probability words. It creates a temporary sentence for document retrieval, then regenerates the sentence with the retrieved context to predict subsequent sentences. RETRO uses the previous chunk to retrieve the nearest neighbor at the chunk level; combined with the previous chunk's context, it guides the generation of the next chunk. To preserve causality, the generation of the next block C_i only utilizes the nearest neighbor of the previous block, N(C_{i-1}), and not N(C_i).
FLARE [Jiang等人,2023b]引入了一種主動檢索方法,由LM生成低概率詞時觸發(fā)。它為文檔檢索創(chuàng)建一個臨時句子,然后使用檢索到的上下文重新生成該句子,以預(yù)測后續(xù)句子。RETRO使用前一個塊在塊級別檢索最近鄰,并結(jié)合前一個塊的上下文來指導(dǎo)下一個塊的生成。為了保持因果性,下一個塊C_i的生成只利用前一個塊的最近鄰N(C_{i-1}),而不是N(C_i)。
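The FLARE trigger logic can be sketched in a few lines. This is an illustration of the idea rather than the authors' implementation; `generate_with_probs`, `retrieve`, and `generate` are hypothetical stand-ins for the LM's decoding API and a retriever.

```python
# FLARE-style active retrieval sketch: draft the next sentence, and if
# any token's probability falls below a threshold, use the draft as a
# temporary query and regenerate the sentence with retrieved context.
def flare_step(prompt, generate_with_probs, retrieve, generate, threshold=0.3):
    draft, token_probs = generate_with_probs(prompt)  # tentative next sentence
    if min(token_probs) < threshold:                  # low-confidence token detected
        docs = retrieve(draft)                        # the draft acts as the query
        return generate(prompt, context=docs)         # regenerate with evidence
    return draft                                      # confident: keep the draft
```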
Augmented with Structured Data
Structured data, such as knowledge graphs (KGs), provide high-quality context and mitigate model hallucinations. RET-LLMs [Modarressi et al., 2023] constructs a knowledge graph memory from past dialogues for future reference. SUGRE [Kang et al., 2023] employs Graph Neural Networks (GNNs) to encode relevant KG subgraphs, ensuring consistency between retrieved facts and generated text through multi-modal contrastive learning. KnowledGPT [Wang et al., 2023d] generates KB search queries and stores knowledge in a personalized base, enhancing the RAG model's knowledge richness and contextuality.
使用結(jié)構(gòu)化數(shù)據(jù)增強
結(jié)構(gòu)化數(shù)據(jù)(如知識圖譜,KG)提供高質(zhì)量的上下文,并減輕模型幻覺。RET-LLMs [Modarressi et al., 2023]從過去的對話中構(gòu)建知識圖譜記憶,以供將來參考。SUGRE [Kang et al., 2023]使用圖神經(jīng)網(wǎng)絡(luò)(GNN)對相關(guān)的KG子圖進行編碼,通過多模態(tài)對比學(xué)習(xí)確保檢索到的事實與生成文本之間的一致性。KnowledGPT [Wang et al., 2023d]生成知識庫搜索查詢并將知識存儲在個性化庫中,增強了RAG模型的知識豐富度和上下文關(guān)聯(lián)性。
LLMs-Generated Content in RAG
Addressing the limitations of external auxiliary information in RAG, some research has focused on exploiting LLMs' internal knowledge. SKR [Wang et al., 2023e] classifies questions as known or unknown, applying retrieval enhancement selectively. GenRead [Yu et al., 2022] replaces the retriever with an LLM generator, finding that LLM-generated contexts often contain more accurate answers due to better alignment with the pre-training objectives of causal language modeling. Selfmem [Cheng et al., 2023b] iteratively creates an unbounded memory pool with a retrieval-enhanced generator, using a memory selector to choose outputs that serve as dual problems to the original question, thus self-enhancing the generative model.
RAG中由LLM生成的內(nèi)容
針對外部輔助信息在RAG中的局限性,一些研究側(cè)重于利用LLM的內(nèi)部知識。SKR [Wang et al., 2023e]將問題分類為已知或未知,有選擇地應(yīng)用檢索增強。GenRead [Yu et al., 2022]用LLM生成器取代了檢索器,發(fā)現(xiàn)LLM生成的上下文通常包含更準確的答案,因為它與因果語言建模的預(yù)訓(xùn)練目標更一致。Selfmem [Cheng et al., 2023b]使用檢索增強生成器迭代創(chuàng)建無界記憶池,并使用記憶選擇器選擇可作為原始問題對偶問題的輸出,從而實現(xiàn)生成模型的自我增強。
These methodologies underscore the breadth of innovative data source utilization in RAG, striving to improve model performance and task effectiveness.
這些方法強調(diào)了RAG中創(chuàng)新數(shù)據(jù)源利用的廣度,努力提高模型性能和任務(wù)有效性。
6.3 Augmentation Process
In the domain of RAG, the standard practice often involves a singular retrieval step followed by generation, which can lead to inefficiencies. A notable issue, termed the "lost in the middle" phenomenon, arises when a single retrieval yields redundant content that may dilute or contradict essential information, thereby degrading the generation quality [Liu et al., 2023a]. Furthermore, such singular retrieval is typically insufficient for complex problems demanding multi-step reasoning, as it provides a limited scope of information [Yoran et al., 2023].
在RAG領(lǐng)域中,標準做法通常只包含一次檢索步驟,然后進行生成,這可能導(dǎo)致效率低下。一個值得注意的問題被稱為"迷失在中間"現(xiàn)象:當(dāng)單次檢索產(chǎn)生冗余內(nèi)容時,可能稀釋關(guān)鍵信息或與之矛盾,從而降低生成質(zhì)量[Liu et al., 2023a]。此外,這種單次檢索通常不足以解決需要多步推理的復(fù)雜問題,因為它提供的信息范圍有限[Yoran等人,2023]。
As illustrated in Figure 5, to circumvent these challenges, contemporary research has proposed methods for refining the retrieval process: iterative retrieval, recursive retrieval, and adaptive retrieval. Iterative retrieval allows the model to engage in multiple retrieval cycles, enhancing the depth and relevance of the information obtained. Recursive retrieval is a process where the results of one retrieval operation are used as the input for the subsequent retrieval; it helps to delve deeper into relevant information, particularly when dealing with complex or multi-step queries. Recursive retrieval is often used in scenarios where a gradual approach is needed to converge on a final answer, such as in academic research, legal case analysis, or certain types of data mining tasks. Adaptive retrieval, on the other hand, offers a dynamic adjustment mechanism, tailoring the retrieval process to the specific demands of varying tasks and contexts.
如圖5所示,為了規(guī)避這些挑戰(zhàn),當(dāng)代研究提出了改進檢索過程的方法:迭代檢索、遞歸檢索和自適應(yīng)檢索。迭代檢索允許模型參與多個檢索周期,增強所獲得信息的深度和相關(guān)性。遞歸檢索過程,其中一次檢索操作的結(jié)果用作后續(xù)檢索的輸入。它有助于深入研究相關(guān)信息,特別是在處理復(fù)雜或多步驟查詢時。遞歸檢索通常用于需要逐步收斂于最終答案的場景,例如在學(xué)術(shù)研究、法律案例分析或某些類型的數(shù)據(jù)挖掘任務(wù)中。另一方面,自適應(yīng)檢索提供了一種動態(tài)調(diào)整機制,使檢索過程適應(yīng)不同任務(wù)和上下文的具體要求。
Iterative Retrieval
Iterative retrieval in RAG models is a process where documents are repeatedly collected based on the initial query and the text generated thus far, providing a more comprehensive knowledge base for LLMs [Borgeaud et al., 2022, Arora et al., 2023]. This approach has been shown to enhance the robustness of subsequent answer generation by offering additional contextual references through multiple retrieval iterations. However, it may suffer from semantic discontinuity and the accumulation of irrelevant information, as it typically relies on a sequence of n tokens to demarcate the boundaries between generated text and retrieved documents.
迭代檢索
RAG模型中的迭代檢索是基于初始查詢和迄今為止生成的文本反復(fù)收集文檔的過程,為LLM提供更全面的知識庫[Borgeaud等人,2022,Arora等人,2023]。這種方法已被證明可以通過多次檢索迭代提供額外的上下文參考,從而增強后續(xù)答案生成的魯棒性。然而,它可能受到語義不連續(xù)和無關(guān)信息累積的影響,因為它通常依賴n個token的序列來劃定生成文本和檢索文檔之間的邊界。
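The retrieve-generate cycle described here reduces to a short loop. This is a generic sketch of the pattern, not any particular system's code; `retrieve` and `generate` are hypothetical stand-ins.

```python
# Iterative retrieval sketch: each round retrieves with the original
# query plus the text generated so far, accumulating context.
from typing import Callable, List

def iterative_retrieval(query: str,
                        retrieve: Callable[[str], List[str]],
                        generate: Callable[[str, List[str]], str],
                        rounds: int = 3) -> str:
    context: List[str] = []
    answer = ""
    for _ in range(rounds):
        # Retrieval is conditioned on the query and the current draft answer.
        context.extend(retrieve(query + " " + answer))
        answer = generate(query, context)
    return answer
```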
To address specific data scenarios, recursive retrieval and multi-hop retrieval techniques are utilized. Recursive retrieval involves a structured index to process and retrieve data in a hierarchical manner, which may include summarizing sections of a document or a lengthy PDF before performing a retrieval based on this summary. Subsequently, a secondary retrieval within the document refines the search, embodying the recursive nature of the process. In contrast, multi-hop retrieval is designed to delve deeper into graph-structured data sources, extracting interconnected information [Li et al., 2023c].
為了應(yīng)對特定的數(shù)據(jù)場景,可以使用遞歸檢索和多跳檢索技術(shù)。遞歸檢索借助結(jié)構(gòu)化索引以分層方式處理和檢索數(shù)據(jù),其中可能包括在基于摘要執(zhí)行檢索之前,先對文檔的各個部分或冗長的PDF進行摘要;隨后,在文檔內(nèi)部進行二次檢索以細化搜索,體現(xiàn)了該過程的遞歸性質(zhì)。相比之下,多跳檢索旨在更深入地挖掘圖結(jié)構(gòu)數(shù)據(jù)源,提取相互關(guān)聯(lián)的信息[Li et al., 2023c]。
Additionally, some methodologies integrate the steps of retrieval and generation. ITER-RETGEN [Shao et al., 2023] employs a synergistic approach that leverages "retrieval-enhanced generation" alongside "generation-enhanced retrieval" for tasks that necessitate the reproduction of specific information. The model harnesses the content required to address the input task as a contextual basis for retrieving pertinent knowledge, which in turn facilitates the generation of improved responses in subsequent iterations.
此外,一些方法將檢索和生成步驟整合在一起。ITER-RETGEN [Shao等人,2023]采用協(xié)同方法,在需要復(fù)現(xiàn)特定信息的任務(wù)中結(jié)合"檢索增強生成"和"生成增強檢索"。該模型將處理輸入任務(wù)所需的內(nèi)容作為檢索相關(guān)知識的上下文基礎(chǔ),這反過來又有助于在后續(xù)迭代中生成更好的響應(yīng)。
Recursive Retrieval
Recursive retrieval is often used in information retrieval and NLP to improve the depth and relevance of search results. The process involves iteratively refining search queries based on the results obtained from previous searches. Recursive retrieval aims to enhance the search experience by gradually converging on the most pertinent information through a feedback loop. IRCoT [Trivedi et al., 2022] uses chain-of-thought to guide the retrieval process and refines the CoT with the obtained retrieval results. ToC [Kim et al., 2023] creates a clarification tree that systematically optimizes the ambiguous parts of the query. It can be particularly useful in complex search scenarios where the user's needs are not entirely clear from the outset or where the information sought is highly specialized or nuanced. The recursive nature of the process allows for continuous learning and adaptation to the user's requirements, often resulting in improved satisfaction with the search outcomes.
遞歸檢索
遞歸檢索常用于信息檢索和自然語言處理,以提高搜索結(jié)果的深度和相關(guān)性。該過程涉及基于從以前的搜索中獲得的結(jié)果迭代地改進搜索查詢。遞歸檢索旨在通過反饋循環(huán)逐步收斂到最相關(guān)的信息,從而增強搜索體驗。IRCoT [Trivedi et al., 2022]使用思維鏈(chain-of-thought)來指導(dǎo)檢索過程,并利用獲得的檢索結(jié)果對CoT進行細化。ToC [Kim等人,2023]創(chuàng)建了一個澄清樹,系統(tǒng)地優(yōu)化查詢中的模糊部分。在復(fù)雜的搜索場景中,如果用戶的需求從一開始就不完全清楚,或者所搜索的信息非常專門化或微妙,那么它特別有用。該過程的遞歸性質(zhì)允許不斷學(xué)習(xí)和適應(yīng)用戶的需求,通常會提高對搜索結(jié)果的滿意度。
Adaptive Retrieval
Adaptive retrieval methods, exemplified by FLARE and Self-RAG [Jiang et al., 2023b, Asai et al., 2023], refine the RAG framework by enabling LLMs to actively determine the optimal moments and content for retrieval, thus enhancing the efficiency and relevance of the information sourced.
自適應(yīng)檢索
自適應(yīng)檢索方法,以FLARE和Self-RAG [Jiang等人,2023b, Asai等人,2023]為代表,通過使LLM能夠主動確定檢索的最佳時機和內(nèi)容來完善RAG框架,從而提高所獲取信息的效率和相關(guān)性。
These methods are part of a broader trend wherein LLMs employ active judgment in their operations, as seen in model agents like AutoGPT, Toolformer, and Graph-Toolformer [Yang et al., 2023c, Schick et al., 2023, Zhang, 2023]. Graph-Toolformer, for instance, divides its retrieval process into distinct steps in which LLMs proactively use retrievers, apply Self-Ask techniques, and employ few-shot prompts to initiate search queries. This proactive stance allows LLMs to decide when to search for necessary information, akin to how an agent utilizes tools.
這些方法屬于一種更廣泛的趨勢,即LLM在其操作中進行主動判斷,正如AutoGPT、Toolformer和Graph-Toolformer等模型代理所展示的那樣[Yang等人,2023c, Schick等人,2023, Zhang, 2023]。例如,Graph-Toolformer將其檢索過程劃分為不同的步驟:LLM主動使用檢索器,應(yīng)用Self-Ask技術(shù),并使用少樣本提示來發(fā)起搜索查詢。這種主動的姿態(tài)允許LLM決定何時搜索必要的信息,類似于代理使用工具的方式。
WebGPT [Nakano et al., 2021] integrates a reinforcement learning framework to train the GPT-3 model to autonomously use a search engine during text generation. It navigates this process using special tokens that facilitate actions such as search engine queries, browsing results, and citing references, thereby expanding GPT-3's capabilities through the use of external search engines.
WebGPT [Nakano等人,2021]集成了一個強化學(xué)習(xí)框架,訓(xùn)練GPT-3模型在文本生成過程中自主使用搜索引擎。它使用特殊token來引導(dǎo)這一過程,這些token可以觸發(fā)搜索引擎查詢、瀏覽結(jié)果和引用參考文獻等操作,從而通過使用外部搜索引擎擴展了GPT-3的能力。
FLARE automates the timing of retrieval by monitoring the confidence of the generation process, as indicated by the probability of generated terms [Jiang et al., 2023b]. When the probability falls below a certain threshold, the retrieval system is activated to collect relevant information, thus optimizing the retrieval cycle.
FLARE通過監(jiān)測生成過程的置信度(以生成詞項的概率來衡量)來自動確定檢索時機[Jiang等,2023b]。當(dāng)概率低于某個閾值時,會激活檢索系統(tǒng)收集相關(guān)信息,從而優(yōu)化檢索周期。
Self-RAG [Asai et al., 2023] introduces "reflection tokens" that allow the model to introspect its outputs. These tokens come in two varieties: "retrieve" and "critic". The model autonomously decides when to activate retrieval, or alternatively, a predefined threshold may trigger the process. During retrieval, the generator conducts a fragment-level beam search across multiple paragraphs to derive the most coherent sequence. Critic scores are used to update the subdivision scores, with the flexibility to adjust these weights during inference, tailoring the model's behavior. Self-RAG's design obviates the need for additional classifiers or reliance on Natural Language Inference (NLI) models, thus streamlining the decision-making process for when to engage retrieval mechanisms and improving the model's autonomous judgment capabilities in generating accurate responses.
Self-RAG [Asai等人,2023]引入了"反思token",允許模型對其輸出進行自省。這些token分為兩類:"檢索"和"批評"。模型自主決定何時激活檢索,或者由預(yù)定義的閾值觸發(fā)該過程。在檢索過程中,生成器跨多個段落進行片段級束搜索,以得到最連貫的序列。批評分數(shù)用于更新細分分數(shù),并可在推理過程中靈活調(diào)整這些權(quán)重,從而定制模型的行為。Self-RAG的設(shè)計不需要額外的分類器,也不依賴自然語言推理(NLI)模型,從而簡化了何時啟用檢索機制的決策過程,并提高了模型在生成準確響應(yīng)方面的自主判斷能力。
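The control flow can be sketched roughly as below. This is an illustration of the retrieve/critic decision pattern, not the Self-RAG implementation: `wants_retrieval`, `retrieve`, `generate`, and `critic_score` are hypothetical stand-ins for the model's reflection-token predictions and a retriever.

```python
# Self-RAG-style control flow sketch: decide whether to retrieve, then
# keep the candidate continuation judged best by the critic signal.
def self_rag_step(prompt, wants_retrieval, retrieve, generate, critic_score):
    if not wants_retrieval(prompt):       # model's "retrieve" token says no
        return generate(prompt, passage=None)
    candidates = []
    for passage in retrieve(prompt):
        output = generate(prompt, passage=passage)
        candidates.append((critic_score(prompt, passage, output), output))
    # Keep the continuation with the highest critic score.
    return max(candidates)[1]
```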
LLM optimization has received significant attention due to its increasing prevalence. Techniques such as prompt engineering, fine-tuning (FT), and RAG each have distinct characteristics, visually represented in Figure 6. While prompt engineering leverages a model's inherent capabilities, optimizing LLMs often requires the application of both RAG and FT methods. The choice between RAG and FT should be based on the specific requirements of the scenario and the inherent properties of each approach. A detailed comparison of RAG and FT is presented in Table 1.
由于應(yīng)用日益普及,LLM優(yōu)化受到了極大關(guān)注。提示工程、微調(diào)(FT)和RAG等技術(shù)各有不同的特征,如圖6所示。提示工程利用模型的固有能力,而優(yōu)化LLM通常需要同時應(yīng)用RAG和FT方法。在RAG和FT之間的選擇應(yīng)基于場景的具體需求和每種方法的固有屬性。表1給出了RAG和FT的詳細比較。
6.4 RAG vs Fine-Tuning
RAG is like giving a model a textbook for tailored information retrieval, perfect for specific queries. On the other hand, FT is like a student internalizing knowledge over time, better for replicating specific structures, styles, or formats. FT can improve model performance and efficiency by reinforcing base model knowledge, adjusting outputs, and teaching complex instructions. However, it is not as good for integrating new knowledge or rapidly iterating new use cases.
RAG就像給模型提供一本用于定向信息檢索的教科書,非常適合特定查詢。而FT則像一個學(xué)生隨著時間的推移將知識內(nèi)化,更適合復(fù)現(xiàn)特定的結(jié)構(gòu)、風(fēng)格或格式。FT可以通過強化基礎(chǔ)模型知識、調(diào)整輸出和教授復(fù)雜指令來提高模型性能和效率。然而,它不太適合整合新知識或快速迭代新用例。
The two methods, RAG and FT, are not mutually exclusive and can be complementary, augmenting a model's capabilities at different levels. In some cases, their combined use may yield optimal performance. The optimization process involving RAG and FT can necessitate multiple iterations to achieve satisfactory results.
這兩種方法,RAG和FT,并不是相互排斥的,而是可以互補的,可以在不同層次上增強模型的能力。在某些情況下,它們的組合使用可能產(chǎn)生最佳性能。涉及RAG和FT的優(yōu)化過程可能需要多次迭代才能獲得滿意的結(jié)果。
7 RAG Evaluation
The rapid advancement and growing adoption of RAG in the field of Natural Language Processing (NLP) have propelled the evaluation of RAG models to the forefront of research in the LLM community. The primary objective of this evaluation is to comprehend and optimize the performance of RAG models across diverse application scenarios.
RAG在自然語言處理(NLP)領(lǐng)域的快速發(fā)展和日益廣泛的采用,將RAG模型的評估推向了LLM社區(qū)研究的前沿。該評估的主要目標是理解和優(yōu)化RAG模型在不同應(yīng)用場景中的性能。
Historically, RAG model assessments have centered on their execution in specific downstream tasks. These evaluations employ established metrics suitable to the tasks at hand. For instance, question answering evaluations might rely on EM and F1 scores [Wang et al., 2023a, Shi et al., 2023, Feng et al., 2023, Ma et al., 2023a], whereas fact-checking tasks often hinge on accuracy as the primary metric [Lewis et al., 2020, Izacard et al., 2022, Shao et al., 2023]. Tools like RALLE, designed for the automatic evaluation of RAG applications, similarly base their assessments on these task-specific metrics [Hoshi et al., 2023]. Despite this, there is a notable paucity of research dedicated to evaluating the distinct characteristics of RAG models, with only a handful of related studies.
從歷史上看,RAG模型評估集中在它們在特定下游任務(wù)中的執(zhí)行。這些評估采用適合手頭任務(wù)的既定指標。例如,問答評估可能依賴于EM和F1分數(shù)[Wang等人,2023a, Shi等人,2023,Feng等人,2023,Ma等人,2023a],而事實核查任務(wù)通常依賴于準確性作為主要指標[Lewis等人,2020,Izacard等人,2022,Shao等人,2023]。為RAG應(yīng)用程序的自動評估而設(shè)計的工具,如RALLE,同樣基于這些特定于任務(wù)的指標進行評估[Hoshi等人,2023]。盡管如此,致力于評估RAG模型獨特特征的研究明顯缺乏,只有少數(shù)相關(guān)研究。
The following section shifts the focus from task-specific evaluation methods and metrics to provide a synthesis of the existing literature based on their unique attributes. This exploration covers the objectives of RAG evaluation, the aspects along which these models are assessed, and the benchmarks and tools available for such evaluations. The aim is to offer a comprehensive overview of RAG model evaluation, outlining the methodologies that specifically address the unique aspects of these advanced generative systems.
以下部分將重點從特定于任務(wù)的評估方法和度量轉(zhuǎn)移到基于其獨特屬性的現(xiàn)有文獻的綜合。本文探討了RAG評估的目標、評估這些模型的各個方面,以及可用于此類評估的基準和工具。目的是提供RAG模型評估的全面概述,概述了具體解決這些先進生成系統(tǒng)獨特方面的方法。
7.1 Evaluation Targets
The assessment of RAG models mainly revolves around two key components: the retrieval and generation modules. This division ensures a thorough evaluation of both the quality of context provided and the quality of content produced.
RAG模型的評估主要圍繞兩個關(guān)鍵組件進行:檢索和生成模塊。這種劃分確保了對所提供的上下文質(zhì)量和所產(chǎn)生的內(nèi)容質(zhì)量的全面評估。
Retrieval Quality
Evaluating the retrieval quality is crucial for determining the effectiveness of the context sourced by the retriever component. Standard metrics from the domains of search engines, recommendation systems, and information retrieval systems are employed to measure the performance of the RAG retrieval module. Metrics such as Hit Rate, MRR, and NDCG are commonly utilized for this purpose [Liu, 2023, Nguyen, 2023].
檢索質(zhì)量
評估檢索質(zhì)量對于確定檢索器組件來源的上下文的有效性至關(guān)重要。使用來自搜索引擎、推薦系統(tǒng)和信息檢索系統(tǒng)領(lǐng)域的標準度量來度量RAG檢索模塊的性能。命中率、MRR和NDCG等指標通常用于此目的[Liu, 2023, Nguyen, 2023]。
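For concreteness, the three metrics named above can be computed as follows over a ranked list of document ids and a set of relevant ids. This sketch uses binary relevance, which is a common simplification.

```python
# Standard retrieval metrics sketch: Hit Rate@k, MRR, and NDCG@k.
import math
from typing import List, Set

def hit_rate(ranked: List[str], relevant: Set[str], k: int) -> float:
    return float(any(doc in relevant for doc in ranked[:k]))

def mrr(ranked: List[str], relevant: Set[str]) -> float:
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i          # reciprocal rank of the first hit
    return 0.0

def ndcg(ranked: List[str], relevant: Set[str], k: int) -> float:
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```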
Generation Quality
The assessment of generation quality centers on the generator's capacity to synthesize coherent and relevant answers from the retrieved context. This evaluation can be categorized based on the content's objectives: unlabeled and labeled content. For unlabeled content, the evaluation encompasses the faithfulness, relevance, and non-harmfulness of the generated answers. In contrast, for labeled content, the focus is on the accuracy of the information produced by the model [Liu, 2023]. Additionally, both retrieval and generation quality assessments can be conducted through manual or automatic evaluation methods [Liu, 2023, Lan et al., 2022, Leng et al., 2023].
生成質(zhì)量
生成質(zhì)量的評估集中在生成器從檢索到的上下文中合成連貫且相關(guān)答案的能力上。這種評估可以根據(jù)內(nèi)容的目標分類:未標注的內(nèi)容和已標注的內(nèi)容。對于未標注的內(nèi)容,評估包括生成答案的忠實性、相關(guān)性和無害性。相比之下,對于已標注的內(nèi)容,重點是模型所產(chǎn)生信息的準確性[Liu, 2023]。此外,檢索和生成質(zhì)量評估都可以通過人工或自動評估方法進行[Liu, 2023, Lan等,2022,Leng等,2023]。
7.2 Evaluation Aspects
Contemporary evaluation practices of RAG models emphasize three primary quality scores and four essential abilities, which collectively inform the evaluation of the two principal targets of the RAG model: retrieval and generation.
Quality Scores
Quality scores include context relevance, answer faithfulness, and answer relevance. These scores assess the RAG model's efficiency from different angles throughout the information retrieval and generation process [Es et al., 2023, Saad-Falcon et al., 2023, Jarvis and Allard, 2023].
Context Relevance evaluates the precision and specificity of the retrieved context, ensuring relevance and minimizing processing costs associated with extraneous content.
Answer Faithfulness ensures that the generated answers remain true to the retrieved context, maintaining consistency and avoiding contradictions.
Answer Relevance requires that the generated answers are directly pertinent to the posed questions, effectively addressing the core inquiry.
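As a hedged illustration of how these three quality scores are often operationalized, the sketch below rates a single RAG output with an LLM judge. The 1-5 rubric prompts are illustrative, and `call_llm` is a hypothetical placeholder for any chat-completion API; nothing here is drawn from a specific evaluation framework.

```python
# LLM-as-judge scoring for the three quality scores (illustrative).
# `call_llm` is a hypothetical stand-in for any function that sends a
# prompt to a chat model and returns its text reply.

JUDGE_PROMPTS = {
    "context_relevance": (
        "Question: {question}\nRetrieved context: {context}\n"
        "On a scale of 1-5, how relevant and specific is the context "
        "to the question? Reply with a single digit."
    ),
    "answer_faithfulness": (
        "Context: {context}\nAnswer: {answer}\n"
        "On a scale of 1-5, is every claim in the answer supported by "
        "the context? Reply with a single digit."
    ),
    "answer_relevance": (
        "Question: {question}\nAnswer: {answer}\n"
        "On a scale of 1-5, how directly does the answer address the "
        "question? Reply with a single digit."
    ),
}

def score_rag_output(question, context, answer, call_llm):
    """Rate one (question, context, answer) triple on each score."""
    scores = {}
    for name, template in JUDGE_PROMPTS.items():
        prompt = template.format(
            question=question, context=context, answer=answer)
        reply = call_llm(prompt)             # e.g. "4"
        scores[name] = int(reply.strip()[0])
    return scores
```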
Required Abilities
RAG evaluation also encompasses four abilities indicative of its adaptability and efficiency: noise robustness, negative rejection, information integration, and counterfactual robustness [Chen et al., 2023b, Liu et al., 2023b]. These abilities are critical for the model's performance under various challenges and complex scenarios, impacting the quality scores.
Noise Robustness appraises the model's capability to manage noise documents that are question-related but lack substantive information.
Negative Rejection assesses the model's discernment in refraining from responding when the retrieved documents do not contain the necessary knowledge to answer a question.
Information Integration evaluates the model's proficiency in synthesizing information from multiple documents to address complex questions.
Counterfactual Robustness tests the model's ability to recognize and disregard known inaccuracies within documents, even when instructed about potential misinformation.
Context relevance and noise robustness are important for evaluating the quality of retrieval, while answer faithfulness, answer relevance, negative rejection, information integration, and counterfactual robustness are important for evaluating the quality of generation.
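To suggest how such abilities can be probed in practice, the sketch below shows how RGB-style test cases for two of the abilities might be constructed and scored. The case format, the expected-refusal string, and the `run_rag` callback are all assumptions for illustration, not the benchmark's actual code.

```python
# Illustrative ability probes in the spirit of RGB: each case pairs a
# question with a controlled document mix, and a simple checker scores
# the model's behaviour.

def make_noise_robustness_case(question, answer, gold_doc, noise_docs):
    # The relevant document is buried among question-related but
    # uninformative ones; the model should still answer correctly.
    return {"question": question, "docs": [gold_doc] + noise_docs,
            "expect": answer}

def make_negative_rejection_case(question, noise_docs):
    # No document contains the answer; the model should decline
    # rather than hallucinate (the refusal phrasing is an assumption).
    return {"question": question, "docs": noise_docs,
            "expect": "cannot answer"}

def evaluate_ability(cases, run_rag):
    """run_rag(question, docs) -> answer string from your RAG pipeline."""
    passed = sum(
        1 for case in cases
        if case["expect"].lower() in run_rag(case["question"],
                                             case["docs"]).lower()
    )
    return passed / len(cases)
```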
The specific metrics for each evaluation aspect are summarized in Table 2. It is essential to recognize that these metrics, derived from related work, are traditional measures and do not yet represent a mature or standardized approach for quantifying RAG evaluation aspects. Custom metrics tailored to the nuances of RAG models, though not included here, have also been developed in some evaluation studies.
7.3 Evaluation Benchmarks and Tools
This section delineates the evaluation framework for RAG models, comprising benchmark tests and automated evaluation tools. These instruments furnish quantitative metrics that not only gauge RAG model performance but also enhance comprehension of the model's capabilities across various evaluation aspects. Prominent benchmarks such as RGB and RECALL [Chen et al., 2023b, Liu et al., 2023b] focus on appraising the essential abilities of RAG models. Concurrently, state-of-the-art automated tools like RAGAS [Es et al., 2023], ARES [Saad-Falcon et al., 2023], and TruLens employ LLMs to adjudicate the quality scores. These tools and benchmarks collectively form a robust framework for the systematic evaluation of RAG models, as summarized in Table 3.
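As one concrete example of LLM-adjudicated scoring, the hedged sketch below runs RAGAS over a one-example dataset. It assumes the ragas API current around the time of this survey (metric names and expected dataset columns vary across versions, and a judge-LLM API key must be configured for the underlying provider), so treat it as illustrative rather than canonical.

```python
# Hedged sketch of automated RAG evaluation with RAGAS; import paths
# and column names reflect the early-2024 API and may differ today.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,       # answer faithfulness to the retrieved context
    answer_relevancy,   # answer relevance to the posed question
    context_relevancy,  # precision/specificity of the context
)

# A single illustrative record; real evaluations use many.
data = Dataset.from_dict({
    "question": ["What does the retrieval module of a RAG system do?"],
    "contexts": [["The retriever fetches documents related to the "
                  "query from an external knowledge base."]],
    "answer":   ["It fetches query-related documents from an external "
                 "knowledge base."],
})

# Each metric prompts a judge LLM under the hood.
result = evaluate(data, metrics=[faithfulness, answer_relevancy,
                                 context_relevancy])
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.97, ...}
```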
8 Future Prospects
This section explores three future prospects for RAG: future challenges, modality expansion, and the RAG ecosystem.
8.1 Future Challenges of RAG
Despite the considerable progress in RAG technology, several challenges persist that warrant in-depth research:
Context Length. RAG's efficacy is limited by the context window size of Large Language Models (LLMs). Balancing the trade-off between a window that is too short, risking insufficient information, and one that is too long, risking information dilution, is crucial. With ongoing efforts to expand LLM context windows to virtually unlimited sizes, the adaptation of RAG to these changes presents a significant research question [Xu et al., 2023c, Packer et al., 2023, Xiao et al., 2023].
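One common engineering response to this trade-off, sketched below under assumptions, is to greedily pack the highest-ranked retrieved chunks into a fixed token budget, so the prompt neither starves the model of evidence nor dilutes it; `count_tokens` is a hypothetical stand-in for whatever tokenizer the deployed LLM uses.

```python
# Greedy token-budget packing of retrieved chunks (illustrative).

def pack_context(ranked_chunks, count_tokens, budget=3000):
    """Keep the best-ranked chunks that fit within `budget` tokens."""
    packed, used = [], 0
    for chunk in ranked_chunks:          # assumed best-first order
        cost = count_tokens(chunk)
        if used + cost > budget:
            continue                     # skip chunks that would overflow
        packed.append(chunk)
        used += cost
    return "\n\n".join(packed)
```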
Robustness. The presence of noise or contradictory information during retrieval can detrimentally affect RAG's output quality. This situation is figuratively referred to as "Misinformation can be worse than no information at all". Improving RAG's resistance to such adversarial or counterfactual inputs is gaining research momentum and has become a key performance metric [Yu et al., 2023a, Glass et al., 2021, Baek et al., 2023].
Hybrid Approaches (RAG+FT). Combining RAG with fine-tuning is emerging as a leading strategy. Determining the optimal integration of RAG and fine-tuning (whether sequential, alternating, or through end-to-end joint training), and how to harness both parameterized and non-parameterized advantages, are areas ripe for exploration [Lin et al., 2023].
Expanding LLM Roles. Beyond generating final answers, LLMs are leveraged for retrieval and evaluation within RAG frameworks. Identifying ways to further unlock LLMs' potential in RAG systems is a growing research direction.
Scaling Laws. While scaling laws [Kaplan et al., 2020] are established for LLMs, their applicability to RAG remains uncertain. Initial studies [Wang et al., 2023b] have begun to address this, yet the parameter count in RAG models still lags behind that of LLMs. The possibility of an Inverse Scaling Law, where smaller models outperform larger ones, is particularly intriguing and merits further investigation.
Production-Ready RAG. RAG's practicality and alignment with engineering requirements have facilitated its adoption. However, enhancing retrieval efficiency, improving document recall in large knowledge bases, and ensuring data security, such as preventing inadvertent disclosure of document sources or metadata by LLMs, are critical engineering challenges that remain to be addressed [Alon et al., 2022].
8.2 Modality Extension of RAG
RAG has transcended its initial text-based question-answering confines, embracing a diverse array of modal data. This expansion has spawned innovative multimodal models that integrate RAG concepts across various domains:
Image. RA-CM3 [Yasunaga et al., 2022] stands as a pioneering multimodal model for both retrieving and generating text and images. BLIP-2 [Li et al., 2023a] leverages frozen image encoders alongside LLMs for efficient visual language pre-training, enabling zero-shot image-to-text conversions. The "Visualize Before You Write" method [Zhu et al., 2022] employs image generation to steer the LM's text generation, showing promise in open-ended text generation tasks.
Audio and Video. The GSS method retrieves and stitches together audio clips to convert machine-translated data into speech-translated data [Zhao et al., 2022]. UEOP marks a significant advancement in end-to-end automatic speech recognition by incorporating external, offline strategies for voice-to-text conversion [Chan et al., 2023]. Additionally, KNN-based attention fusion leverages audio embeddings and semantically related text embeddings to refine ASR, thereby accelerating domain adaptation. Vid2Seq augments language models with specialized temporal markers, facilitating the prediction of event boundaries and textual descriptions within a unified output sequence [Yang et al., 2023a].
Code. RBPS [Nashid et al., 2023] excels in small-scale learning tasks by retrieving code examples that align with developers' objectives through encoding and frequency analysis. This approach has demonstrated efficacy in tasks such as test assertion generation and program repair. For structured knowledge, the CoK method [Li et al., 2023c] first extracts facts pertinent to the input query from a knowledge graph, then integrates these facts as hints within the input, enhancing performance in knowledge graph question-answering tasks.
8.3 Ecosystem of RAG
Downstream Tasks and Evaluation
RAG has shown considerable promise in enriching language models with the capacity to handle intricate queries and produce detailed responses by leveraging extensive knowledge bases. Empirical evidence suggests that RAG excels in a variety of downstream tasks, including open-ended question answering and fact verification. The integration of RAG not only bolsters the precision and relevance of responses but also their diversity and depth.
The scalability and versatility of RAG across multiple domains warrant further investigation, particularly in specialized fields such as medicine, law, and education. In these areas, RAG could potentially reduce training costs and enhance performance compared to traditional fine-tuning approaches for professional domain knowledge question answering.
Concurrently, refining the evaluation framework for RAG is essential to maximize its efficacy and utility across different tasks. This entails the development of nuanced metrics and assessment tools that can gauge aspects such as contextual relevance, creativity of content, and non-maleficence.
Furthermore, improving the interpretability of RAG-driven models continues to be a key goal. Doing so would allow users to understand the reasoning behind the responses generated by the model, thereby promoting trust and transparency in the use of RAG applications.
Technical Stack
The development of the RAG ecosystem is greatly impacted by the progression of its technical stack. Key tools like LangChain and LLamaIndex have quickly gained popularity with the emergence of ChatGPT, providing extensive RAG-related APIs and becoming essential in the realm of LLMs.
Emerging technical stacks, while not as feature-rich as LangChain and LLamaIndex, distinguish themselves with specialized offerings. For instance, Flowise AI prioritizes a low-code approach, enabling users to deploy AI applications, including RAG, through a user-friendly drag-and-drop interface. Other technologies like Haystack, Meltano, and Cohere Coral are also gaining attention for their unique contributions to the field.
In addition to AI-focused providers, traditional software and cloud service providers are expanding their offerings to include RAG-centric services. Verba from Weaviate is designed for personal assistant applications, while Amazon's Kendra provides an intelligent enterprise search service, allowing users to navigate through various content repositories using built-in connectors. During the evolution of the RAG technology landscape, there has been a clear divergence towards different specializations, such as: 1) Customization: tailoring RAG to meet specific requirements. 2) Simplification: making RAG easier to use, thereby reducing the initial learning curve. 3) Specialization: refining RAG to serve production environments more effectively.
The mutual growth of RAG models and their technical stack is evident; technological advancements consistently establish new standards for the existing infrastructure. In turn, enhancements to the technical stack drive the evolution of RAG capabilities. The RAG toolkit is converging into a foundational technical stack, laying the groundwork for advanced enterprise applications. However, the concept of a fully integrated, comprehensive platform remains on the horizon, pending further innovation and development.
9 Conclusion
The summary of this paper, as depicted in Figure 7, highlights RAG's significant advancement in enhancing the capabilities of LLMs through the integration of parameterized knowledge from language models with extensive non-parameterized data from external knowledge bases. Our survey illustrates the evolution of RAG technologies and their impact on knowledge-intensive tasks. Our analysis delineates three developmental paradigms within the RAG framework: Naive, Advanced, and Modular RAG, each marking a progressive enhancement over its predecessors. The Advanced RAG paradigm extends beyond the Naive approach by incorporating sophisticated architectural elements, including query rewriting, chunk reranking, and prompt summarization. These innovations have led to a more nuanced and modular architecture that enhances both the performance and the interpretability of LLMs. RAG's technical integration with other AI methodologies, such as fine-tuning and reinforcement learning, has further expanded its capabilities. In content retrieval, a hybrid methodology that leverages both structured and unstructured data sources is emerging as a trend, providing a more enriched retrieval process. Cutting-edge research within the RAG framework is exploring novel concepts such as self-retrieval from LLMs and the dynamic timing of information retrieval.
Despite the strides made in RAG technology, research opportunities abound in improving its robustness and its ability to manage extended contexts. RAG's application scope is also widening into multimodal domains, adapting its principles to interpret and process diverse data forms such as images, videos, and code. This expansion underscores RAG's significant practical implications for AI deployment, attracting interest from both academic and industrial sectors. The growing ecosystem of RAG is underscored by an increase in RAG-centric AI applications and the ongoing development of supportive tools. However, as RAG's application landscape expands, there is an imperative need to refine evaluation methodologies to keep pace with its evolution. Ensuring that performance assessments remain accurate and representative is crucial for capturing the full extent of RAG's contributions to the AI research and development community.