As artificial intelligence models become more advanced and enterprise needs expand, Google Cloud is reshaping its AI infrastructure to support the next phase of development.
According to Mark Lohmeyer, vice president and general manager of AI computing infrastructure at Google Cloud, the company is focusing on boosting performance and optimizing costs—especially as hardware expenses continue to climb. This evolution is driven by innovations ranging from open-source tools like JAX to powerful large language models.
“We’re calling 2025 the ‘year of inference’ at Google,” Lohmeyer said. “This shift is evident not just in our own operations but also in how our cloud customers are adopting AI. We’re seeing the rise of reasoning models—those that require several steps to determine the best outcome. These models are now forming the backbone of AI agents and agentic workflows. The infrastructure demands from these models are unprecedented.”
Lohmeyer shared these insights during an interview with theCUBE’s Rob Strechay and Rebecca Knight at the Red Hat Summit, featured in an exclusive livestream by SiliconANGLE Media.
Shifting Infrastructure for the Inference Era
The emergence of reasoning models, which can perform multi-step decision-making processes, has significantly increased the computational burden on AI infrastructure. These models operate as intelligent agents within complex workflows, collaborating to solve intricate problems. Meeting these needs requires robust, scalable, and cost-effective systems, Lohmeyer explained.
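To make that cost dynamic concrete, a single agentic request typically triggers many model calls rather than one. The sketch below is a hypothetical, minimal agent loop (the llm_call client, the toy "CALL:" protocol, and the step budget are illustrative assumptions, not Google's implementation); each loop iteration is a separate inference pass, which is why reasoning models multiply infrastructure demand:

```python
# Minimal, hypothetical sketch of a multi-step "reasoning" agent loop.
# Every iteration is a full model call, so one user request can fan out
# into many inference passes on the serving infrastructure.

def run_agent(llm_call, tools, task, max_steps=8):
    """llm_call(prompt) -> str is any model client; tools maps names to functions."""
    scratchpad = [f"Task: {task}"]
    for step in range(max_steps):
        # Each reasoning step is a separate inference pass against the model.
        reply = llm_call("\n".join(scratchpad) + "\nNext action?")
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        # Toy protocol: the model answers "CALL:<tool>:<argument>".
        _, tool_name, arg = reply.split(":", 2)
        result = tools[tool_name](arg)
        scratchpad.append(f"Step {step}: {tool_name}({arg}) -> {result}")
    return "No answer within the step budget."
```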
One area of innovation is the adoption of vLLM, an open-source inference engine known for delivering high throughput and cost-efficiency, especially on GPUs. Google Cloud is now extending vLLM support to Tensor Processing Units (TPUs), offering customers improved cost-performance options. This flexibility allows organizations to choose the most suitable hardware for each workload, which helps minimize inference costs, a crucial factor for AI-based businesses.
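For readers unfamiliar with vLLM, the snippet below is a minimal sketch of its offline inference API (the model identifier is a placeholder, and running it on TPUs assumes the TPU build of vLLM; otherwise it targets GPUs):

```python
# Minimal sketch of offline batch inference with vLLM's Python API.
# The model id is a placeholder; which accelerator backend is used
# (GPU, or TPU with the TPU build) depends on the installed vLLM package.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize why inference cost matters for AI products.",
    "List two ways to reduce serving latency.",
]
params = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

Because the serving code stays the same across backends, the hardware choice becomes a deployment decision rather than a rewrite, which is the cost-performance flexibility Lohmeyer describes.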
“Google has long been a leader in open-source development,” Lohmeyer noted. “Kubernetes was transformative, and more recently, tools like JAX have proven to be powerful frameworks for training and deploying AI models. We originally built JAX to support our own Gemini model development, but we realized it was too valuable to keep internal—so we made it open-source.”
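As a flavor of the framework Lohmeyer mentions, here is a minimal, self-contained JAX sketch (a toy linear model, not Gemini training code) showing the composable grad and jit transformations that make it well suited to training:

```python
# Toy example of JAX's composable transformations: grad differentiates
# a plain Python loss function, and jit compiles it for the accelerator.
import jax
import jax.numpy as jnp

def loss(params, x, y):
    w, b = params
    pred = x @ w + b          # simple linear model
    return jnp.mean((pred - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))  # compiled gradient of the loss

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
y = jnp.ones((32,))
params = (jnp.zeros((4,)), 0.0)

grads = grad_fn(params, x, y)      # gradients w.r.t. (w, b)
params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
```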