Nvidia says its new KV Cache Transform Coding method can shrink the key-value (KV) cache that large language models hold in memory during inference by up to 20 times, without changing the model's weights. That matters most to companies paying for expensive long-context workloads, coding assistants and agentic workflows.