Paper: https://arxiv.org/pdf/2410.04343v1

Notes:
- Problem: scale test-time compute for RAG beyond "more docs"; measure via *effective context length* = total input tokens across all LLM calls.
- Core idea: two inference-scaling strategies, DRAG (demonstration-based RAG) and IterDRAG (iterative DRAG with interleaved retrieval + generation).
- DRAG mechanism: prepend retrieved documents plus complete in-context examples (each example = k docs + question + answer); reverse the document order so higher-ranked docs sit nearer the query.
- IterDRAG loop: generate either a sub-query, an intermediate answer, or the final answer; on a sub-query, retrieve additional docs, merge them with the existing context, and generate an intermediate answer; repeat up to n iterations (default ≤ 5), then force a final answer.
- Creating demonstrations: use constrained decoding / Self-Ask to produce (sub-query, intermediate answer) chains; keep only examples whose final answer is correct as in-context examples.
- Budgeting: decision space θ = (k docs, m shots, n iterations); for a max budget Lmax, choose the θ that maximizes the average metric subject to l(x; θ) ≤ Lmax. Empirically, grid-search to find the optima.
- Empirical law: optimal performance P*(Lmax) grows nearly linearly with log-scale increases in effective context length up to ≈1M tokens; IterDRAG outperforms at very large budgets (≥128k to 1M+).
- Failure modes observed: retrieval noise / distraction from too many similar docs; inability to pick relevant facts from ultra-long windows; hallucination; outdated or incorrect retrieval; missing reasoning steps. IterDRAG mitigates several of these.
- Practical heuristics to avoid collapse/noise: prefer iterative retrieval for multi-hop questions; limit iterations; constrain decoding to the Self-Ask format; reverse doc order; include documents inside demonstrations to teach extraction behavior.
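The IterDRAG loop above can be sketched in a few lines. This is a minimal control-flow sketch, not the paper's implementation: `retrieve` and `generate` are hypothetical stand-ins for a retriever and an LLM call, stubbed here so the loop is runnable.

```python
# Hedged sketch of the IterDRAG loop from the notes above.
# `retrieve` and `generate` are hypothetical stubs, not real APIs.

def retrieve(query, k=2):
    # Stub retriever: returns k placeholder documents for the query.
    return [f"doc({query})#{i}" for i in range(k)]

def generate(context, query):
    # Stub LLM: emits one sub-query first, then a final answer.
    if "sub:" not in query:
        return ("sub_query", f"sub:{query}")
    return ("final_answer", f"answer:{query}")

def iter_drag(question, k=2, max_iters=5):
    """Interleave retrieval and generation, IterDRAG-style."""
    context = retrieve(question, k)  # initial retrieval
    query = question
    for _ in range(max_iters):
        kind, text = generate(context, query)
        if kind == "final_answer":
            return text, context
        # On a sub-query: retrieve more docs, merge into context, continue.
        context += retrieve(text, k)
        query = text
    # Iteration budget exhausted: force a final answer.
    return generate(context, "sub:" + query)[1], context
```

Note how the effective context length grows with each iteration: every sub-query adds k documents to the shared context, which is what the token-budget accounting l(x; θ) ≤ Lmax has to cover.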
- Computation-allocation model (predictor): apply the inverse sigmoid to the metric, then model σ⁻¹(P(θ)) ≈ (a + b ⊙ i)ᵀ log(θ) + c, where i = (i_doc, i_shot, 0) captures per-task informativeness (the gain from +1 doc / +1 shot). Fit a, b, c with OLS.
- Model diagnostics & use: the full model (with b and the sigmoidal link σ) gives the best R² / lowest MSE; it generalizes across domains and extrapolates to larger Lmax (best below 1M; predictions degrade toward 5M). Use the model to pick a near-optimal θ under compute constraints.
- Efficiency notes: retrieval cost is negligible relative to LLM inference; document recall improves with k, but NDCG and generation quality show diminishing returns, so re-ranking / filtering is recommended. IterDRAG trades extra API calls for better use of tokens.
- Takeaway intuition: rather than dumping more docs into one shot, teach the model via demonstrations and/or let it iteratively decompose queries; this allocates the same token budget more effectively and yields near-linear gains in practice.

Disclaimer: This is an AI-powered production. The scripts, insights, and voices featured in this podcast are generated entirely by Artificial Intelligence models. While we strive for technical accuracy by grounding our episodes in original research papers, listeners are encouraged to consult the primary sources for critical applications.
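Appendix sketch: the computation-allocation predictor above reduces to ordinary least squares once the metric is pushed through the inverse sigmoid. A minimal NumPy version, assuming θ = (k, m, n) and i = (i_doc, i_shot, 0) as in the notes (the function names are my own, not the paper's):

```python
import numpy as np

def inv_sigmoid(p, eps=1e-6):
    # Logit transform; clip to keep log() finite at the boundaries.
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def fit_allocation_model(thetas, infos, metrics):
    """Fit sigma^{-1}(P(theta)) ~ (a + b * i)^T log(theta) + c via OLS.

    thetas:  (N, 3) array of (k docs, m shots, n iterations)
    infos:   (N, 3) array of informativeness i = (i_doc, i_shot, 0)
    metrics: (N,) array of observed metric values P in (0, 1)
    """
    logt = np.log(thetas)
    # Columns: a^T log(theta), b^T (i * log(theta)), intercept c.
    X = np.hstack([logt, infos * logt, np.ones((len(logt), 1))])
    y = inv_sigmoid(metrics)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:3], coef[3:6], coef[6]  # a, b, c

def predict(theta, info, a, b, c):
    # Sigmoid maps the linear score back to the metric scale.
    z = (a + b * info) @ np.log(theta) + c
    return 1 / (1 + np.exp(-z))
```

With a, b, c fitted on a grid of (θ, metric) observations, picking a near-optimal configuration under a budget is just argmax of `predict` over the θ satisfying l(x; θ) ≤ Lmax.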