As applications of LLMs develop further, a bare LLM is not capable of handling some more complex tasks, due to
- lack of fact verification capabilities
- inability to access external data sources
- opaque reasoning process
- insufficient depth in specialized domain knowledge
- hallucination, whose main causes include
  - training data bias
  - lack of a real-time fact-checking mechanism
  - the model prioritizing “fluency” over “accuracy”
  - overconfidence
RAG was therefore proposed to address these problems (although, to some extent, it is still less powerful than scaling up the models). The core idea of RAG is to retrieve up-to-date external information and inject it into the prompt, so that the LLM can generate an (at least theoretically) accurate answer based on up-to-date knowledge.
RAG improves real-time knowledge, response accuracy, source traceability, and knowledge base scalability.
Retrieval-Augmented Generation
In essence, RAG is a system (or pipeline) for LLM applications. The word “system” will be emphasized again in the chapter on agents. But let’s focus on RAG itself for now.
In short, RAG can be decomposed into 3 separate components (Retrieval-Augmented Generation):
graph LR; A[User Query] --> B[Retriever] --> C[Augment] --> D[Generation] --> E(Answer)
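The three stages in the diagram can be wired together as plain functions. The following is a minimal sketch, not a reference implementation: `retrieve`, `augment`, `generate`, and `rag_answer` are hypothetical names, the retriever is a naive word-overlap ranking, and `generate` is a stub standing in for a real LLM call.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # Python's sort is stable, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no word with the query at all.
    return [doc for score, doc in scored[:top_k] if score > 0]


def augment(query: str, contexts: List[str]) -> str:
    """Inject the retrieved passages into the prompt, ahead of the question."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{numbered}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def generate(prompt: str) -> str:
    """Stub for the generation step (assumption: swap in a real LLM client)."""
    return f"<LLM answer conditioned on {len(prompt)} prompt characters>"


def rag_answer(query: str, documents: List[str],
               llm: Callable[[str], str] = generate) -> str:
    """User Query -> Retriever -> Augment -> Generation -> Answer."""
    contexts = retrieve(query, documents)
    prompt = augment(query, contexts)
    return llm(prompt)
```

Passing the generator in as `llm` keeps the pipeline independent of any particular model provider, which is one reason RAG is better described as a system than as a model.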
For naive RAG, sources will be pre-processed for retrieval.
- Indexing
- Retrieval
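These two steps can be sketched as follows. This is a toy illustration under stated assumptions: sources are split into fixed-size word windows, the "embedding" is a bag-of-words count vector rather than a learned encoder, and all function names (`chunk_text`, `embed`, `cosine`, `build_index`, `search`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, chunk_size: int = 12) -> List[str]:
    """Split a source into fixed-size word windows (one simple chunking choice)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would use a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(sources: List[str], chunk_size: int = 12) -> List[Tuple[str, Counter]]:
    """Indexing: chunk every source and store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for src in sources for chunk in chunk_text(src, chunk_size)]


def search(index: List[Tuple[str, Counter]], query: str, top_k: int = 1) -> List[str]:
    """Retrieval: embed the query and return the top_k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


index = build_index([
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
])
print(search(index, "What is the capital of France?"))
```

Indexing runs once, ahead of time, while retrieval runs per query against the stored vectors. Note that every chunk is scored in isolation, which is the "flat" representation that the limitations of naive RAG stem from.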
However, naive RAG has certain limitations:
- Reliance on flat data representation with limited ability to capture complex relationships.
  Vectorized text chunks often overlook semantic associations between entities and relations.
- Lack of context awareness, leading to incoherent answers.
  Retrieval and generation are linearly connected, with retrieved results fed into the generation model as isolated fragments, lacking integration of global context.
- Redundant information in text chunks, with large amounts of query-irrelevant content.
  Retrieved text chunks usually contain noisy information, which may degrade the performance of the RAG system during the generation stage.
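The last limitation can be made concrete with a rough measurement. The sketch below uses a hypothetical helper, `irrelevant_share`, to compute the fraction of sentences in a retrieved chunk that share no content word with the query. Word overlap is only a crude proxy for relevance, and the stop-word list is an arbitrary assumption.

```python
import re

# Assumption: a tiny, hand-picked stop-word list, just enough for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to",
              "and", "it", "what", "where", "how", "does"}


def content_words(text: str) -> set:
    """Lowercased words of the text, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def irrelevant_share(query: str, chunk: str) -> float:
    """Fraction of sentences in the chunk that share no content word with the query."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]
    if not sentences:
        return 0.0
    query_terms = content_words(query)
    irrelevant = [s for s in sentences if not (query_terms & content_words(s))]
    return len(irrelevant) / len(sentences)


chunk = (
    "The Eiffel Tower is in Paris. It opened in 1889. "
    "Tickets can be bought online. The tower is repainted every seven years."
)
print(irrelevant_share("Where is the Eiffel Tower?", chunk))
```

Even when a chunk is retrieved correctly, much of it may be unrelated to the question, and all of it is passed to the generation model.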
Since then, different RAG variants have been proposed to handle different tasks.
- GraphRAG integrates knowledge graphs into text indexing and retrieval.
- MiniRAG handles problems regarding cost, privacy and storage.
- RAG-Anything focuses on multimodal data.
- VideoRAG focuses on video sources.