Over the past decade, foundation models have emerged as the dominant paradigm for working with language, images, and code. Large language models (LLMs) can generate text. Vision models can ...