KnowTrace: How Structured Knowledge Tracing is Revolutionizing Multi-Hop Question Answering
Large language models (LLMs) have made impressive strides in natural language tasks, but they still struggle with complex, multi-hop questions that require piecing together information from multiple sources. Retrieval-augmented generation (RAG) has emerged as a promising solution, allowing LLMs to fetch external knowledge to fill in gaps. However, traditional iterative RAG methods face two critical challenges: ever-growing, unstructured contexts that overload the LLM, and non-contributive reasoning steps that trigger irrelevant retrievals.
Enter KnowTrace, a novel framework introduced in a recent arXiv paper by researchers from Renmin University of China and Huawei Noah’s Ark Lab. KnowTrace reimagines iterative RAG through the lens of structured knowledge tracing, offering a more elegant and effective approach to multi-hop question answering (MHQA).
The Problem with Traditional Iterative RAG
Existing iterative RAG systems, like IRCoT and ReAct, alternate between LLM reasoning and retrieval, accumulating external information into the model’s context. But this approach has flaws:
- Context Overload: As more information is piled into the context, the LLM struggles to perceive connections between critical pieces of knowledge.
- Futile Reasoning: Not every LLM generation is useful. Non-contributive reasoning steps can trigger irrelevant retrievals, exacerbating the overload issue.
Some attempts to mitigate this involve restructuring retrieved documents into more organized formats, but these methods are computationally expensive and impractical for iterative workflows.
KnowTrace: A Structured Solution
KnowTrace addresses these challenges by reformulating iterative RAG as a knowledge graph (KG) expansion process. Instead of haphazardly stacking retrieved content, KnowTrace autonomously traces out question-relevant knowledge triplets (subject, relation, object) to build a structured KG. This workflow provides the LLM with an intelligible context, making it easier to reason through complex questions.
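To make this concrete, here is a minimal Python sketch of the kind of structure KnowTrace maintains. The Triplet and KnowledgeGraph classes are illustrative stand-ins under my own naming, not code from the paper:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Triplet:
    """A (subject, relation, object) knowledge triplet,
    e.g. ("Alan Turing", "born in", "Maida Vale, London")."""
    subject: str
    relation: str
    obj: str

@dataclass
class KnowledgeGraph:
    """The question-relevant KG that grows across retrieval iterations."""
    triplets: set[Triplet] = field(default_factory=set)

    def add(self, triplet: Triplet) -> None:
        self.triplets.add(triplet)

    def as_context(self) -> str:
        """Render the KG as a compact, structured context for the LLM,
        in place of a pile of raw retrieved passages."""
        return "\n".join(
            f"({t.subject}, {t.relation}, {t.obj})"
            for t in sorted(self.triplets, key=str)
        )
```

The key design point is that the LLM's working context is this compact triplet rendering, not the full text of every document retrieved so far.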
Here’s how it works:
- Knowledge Exploration: The LLM assesses whether the current KG is sufficient to answer the question. If not, it identifies entities and relations to explore next.
- Knowledge Completion: For each entity-relation pair, relevant passages are retrieved, and the LLM extracts knowledge triplets to expand the KG (a sketch of this two-phase loop follows the list).
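Building on the sketch above, the two phases can be pictured as a simple loop. Here llm_explore, retrieve, and llm_complete are assumed helpers standing in for prompted LLM calls and a passage retriever; none of these names come from the paper's code:

```python
from dataclasses import dataclass, field

@dataclass
class ExplorationDecision:
    """Output of the (hypothetical) exploration prompt."""
    sufficient: bool
    answer: str = ""
    pairs_to_explore: list[tuple[str, str]] = field(default_factory=list)

def knowtrace_loop(question: str, kg: KnowledgeGraph, max_steps: int = 5) -> str:
    """A minimal sketch of the exploration-completion loop, assuming
    llm_explore / retrieve / llm_complete are implemented elsewhere."""
    for _ in range(max_steps):
        # Knowledge Exploration: decide whether the current KG suffices.
        decision: ExplorationDecision = llm_explore(question, kg.as_context())
        if decision.sufficient:
            return decision.answer
        # Knowledge Completion: retrieve passages for each entity-relation
        # pair and extract new triplets to expand the KG.
        for entity, relation in decision.pairs_to_explore:
            for passage in retrieve(entity, relation):
                for triplet in llm_complete(entity, relation, passage):
                    kg.add(triplet)
    # Budget exhausted: answer with whatever knowledge was gathered.
    return llm_explore(question, kg.as_context(), force_answer=True).answer
```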
This structured approach not only improves reasoning quality but also enables a reflective backtracing mechanism. By analyzing the final KG, KnowTrace can retrospectively identify which reasoning steps were truly contributive, filtering out noise to create high-quality training data for self-improvement.
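One plausible way to realize the backtracing idea is a backward pass over the final KG that keeps only triplets connected to the answer. The sketch below is one reading of the mechanism, not the paper's exact algorithm:

```python
def backtrace_contributive(kg: KnowledgeGraph,
                           answer_entities: set[str]) -> set[Triplet]:
    """Walk backward from the entities appearing in the final answer,
    keeping every triplet that touches an entity known to matter."""
    kept: set[Triplet] = set()
    frontier = set(answer_entities)
    changed = True
    while changed:
        changed = False
        for t in kg.triplets - kept:
            if t.subject in frontier or t.obj in frontier:
                kept.add(t)
                frontier.update({t.subject, t.obj})
                changed = True
    return kept
```

Reasoning steps whose extracted triplets fall entirely outside the kept set can then be discarded as non-contributive.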
Performance and Benefits
Experiments on three MHQA benchmarks (HotpotQA, 2WikiMultihopQA, and MuSiQue) show that KnowTrace consistently outperforms existing methods. For instance, with LLaMA3-8B-Instruct as the backbone, KnowTrace achieved absolute exact-match (EM) gains of 5.3% over IRCoT and 4.3% over the restructuring-based ERA-CoT. With GPT-3.5-Turbo-Instruct, the gains were even higher.
Key advantages of KnowTrace include:
- Efficiency: Unlike restructuring-based methods, which run costly LLM-driven restructuring over every batch of retrieved documents, KnowTrace extracts only the triplets relevant to the question, keeping computational overhead low.
- Flexibility: It adaptively traces relevant knowledge, avoiding the rigidity of question-parsing approaches.
- Self-Bootstrapping: The backtracing mechanism allows KnowTrace to distill its own successful reasoning into training data and improve over time (sketched below).
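For illustration, the self-bootstrapping step might combine backtracing with correctness filtering roughly as follows. The trajectory fields (answer_correct, steps, prompt, response, extracted_triplets) are hypothetical names for whatever the training pipeline records:

```python
def build_self_training_data(trajectories) -> list[tuple[str, str]]:
    """Turn solved questions into fine-tuning pairs, keeping only the
    reasoning steps whose triplets survive backtracing (sketched above)."""
    examples = []
    for traj in trajectories:
        if not traj.answer_correct:
            continue  # bootstrap only from successful trajectories
        contributive = backtrace_contributive(traj.kg, traj.answer_entities)
        for step in traj.steps:
            # Keep a step only if it produced at least one kept triplet.
            if set(step.extracted_triplets) & contributive:
                examples.append((step.prompt, step.response))
    return examples
```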
The Future of Structured RAG
KnowTrace represents a significant step forward in making iterative RAG more robust and scalable. By integrating structured knowledge tracing with LLM reasoning, it not only enhances performance but also opens the door to more interpretable and self-improving AI systems. As the paper concludes, this approach could inspire further innovations in how we combine retrieval, reasoning, and structured knowledge for complex question answering.
For more details, check out the full paper on arXiv or the GitHub repository.