As applications of LLMs develop further, a bare LLM is not capable of handling more complex tasks, due to:

  • lack of fact-verification capabilities
  • no access to external data sources
  • an opaque reasoning process
  • insufficient depth in specialized domain knowledge
  • hallucination, whose main causes include
    • training-data bias
    • lack of a real-time fact-checking mechanism
    • the model prioritizing “fluency” over “accuracy”
    • overconfidence

So RAG is proposed to address these problems (although it is still not powerful enough for some tasks, compared to scaling up the models). The core idea of RAG is to retrieve up-to-date external information and inject it into the LLM’s prompt, so that the LLM can generate (at least theoretically) accurate answers grounded in up-to-date knowledge.
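The “inject as prompt” step can be sketched in a few lines. This is a hypothetical illustration, not a real framework API: `build_augmented_prompt` and its wording are assumptions, and the retrieved chunks here are hard-coded stand-ins for a real retriever’s output.

```python
def build_augmented_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Inject retrieved passages into the prompt as grounding context."""
    # Number each chunk so the model can cite its sources (traceability).
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below. "
        "Cite passages by their [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Example usage with made-up retrieved chunks:
prompt = build_augmented_prompt(
    "When was the system released?",
    ["The system was released in March 2024.", "It supports streaming output."],
)
```

The instruction to answer “ONLY” from the context is what (theoretically) curbs hallucination, and the `[number]` citations are what enable source traceability.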

RAG brings benefits in real-time knowledge, response accuracy, source traceability, and knowledge-base scalability.

Retrieval-Augmented Generation

RAG is, in essence, a system/pipeline for LLM applications. The word “system” will come up again in the chapter on agents, but let’s focus on RAG itself for now.

In short, RAG can be decomposed into 3 separate components (Retrieval-Augmented Generation):

graph LR;
A[User Query] --> B[Retriever] --> C[Augment] --> D[Generation] --> E(Answer)
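The pipeline above can be expressed as a straight function composition. The three stage functions below (`retrieve`, `augment`, `generate`) are hypothetical stubs that only mark where each stage’s real work would happen:

```python
def retrieve(query: str) -> list[str]:
    # Stand-in retriever: a real one would do similarity search over an index.
    return ["RAG injects retrieved text into the prompt."]

def augment(query: str, docs: list[str]) -> str:
    # Augment: combine the user query with the retrieved context.
    return "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Stand-in generator: a real one would call an LLM with the prompt.
    return f"(LLM answer grounded in: {prompt.splitlines()[1]})"

def rag_answer(query: str) -> str:
    # User Query -> Retriever -> Augment -> Generation -> Answer
    return generate(augment(query, retrieve(query)))

answer = rag_answer("What does RAG do?")
```

The point of the sketch is the shape: each stage is independently replaceable, which is why RAG variants mostly differ in which stage they redesign.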

For naive RAG, sources are pre-processed for retrieval:

  1. Indexing: chunk the source documents, embed each chunk, and store the vectors in an index (e.g., a vector database).
  2. Retrieval: at query time, embed the user query and fetch the top-k most similar chunks.
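The two steps above can be sketched end to end. To keep the example self-contained, it uses a toy bag-of-words vector as an assumed stand-in for a real embedding model; the documents and query are made up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: chunk the sources and store each chunk with its vector.
chunks = [
    "RAG retrieves external documents to ground the answer.",
    "Vector databases store embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval: embed the query and return the top-k most similar chunks.
def retrieve(query: str, k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

top = retrieve("How does RAG ground the answer?")
```

In production the term-frequency vectors would be replaced by dense embeddings and the linear scan by an approximate nearest-neighbor index, but the indexing/retrieval split stays the same.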

GraphRAG


Since then, different RAG variants have been proposed to handle different tasks.

  • MiniRAG addresses concerns about cost, privacy, and storage.
  • RAG-Anything focuses on multi-modal data.
  • VideoRAG focuses on video sources.