Transformers Optimization: Part 1 - KV Cache

Image by Martin Adams

In this Transformers Optimization series, we will explore various optimization techniques for Transformer models. As a kickoff piece, we dive deep into the KV cache, an inference optimization technique that significantly improves the inference performance of large language models.

What is KV Cache?

A common technique for speeding up large-model inference is to reuse the KV cache from the previous inference step. Reusing the cached keys and values improves inference performance and reduces end-to-end latency without affecting accuracy....

October 7, 2023 · 8 min · Rajan Ghimire