SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs Paper • 2512.00722 • Published 13 days ago • 14
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 37