
NVIDIA and Google Unleash Game-Changing AI Optimizations for Gemma

Optimizations for Gemma on AI Platforms

Gemma, Google’s new family of lightweight large language models with 2 billion and 7 billion parameters, can run anywhere, lowering costs and speeding up innovative work for domain-specific use cases. NVIDIA and Google have launched optimizations for Gemma across all NVIDIA AI platforms.


The companies’ teams worked closely together to accelerate Gemma’s performance with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, running on NVIDIA GPUs in the data center, in the cloud, and locally on workstations with NVIDIA RTX GPUs or on PCs with GeForce RTX GPUs. Gemma is built from the same research and technology as the Gemini models. With these optimizations, developers can target the more than 100 million high-performance AI PCs worldwide that are equipped with NVIDIA RTX GPUs.
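For developers who want a feel for the workflow, below is a minimal sketch of generating text from Gemma through TensorRT-LLM’s high-level Python API. The LLM and SamplingParams classes come from the library’s LLM API in recent releases; the checkpoint name, prompt, and sampling settings are illustrative assumptions rather than an official NVIDIA or Google recipe.

# Minimal sketch (illustrative, not an official recipe): running
# Gemma 2B with TensorRT-LLM's high-level Python LLM API. Assumes
# tensorrt_llm is installed and a supported NVIDIA GPU is available;
# the Hugging Face checkpoint name "google/gemma-2b" is an assumption.
from tensorrt_llm import LLM, SamplingParams

def main():
    # The LLM wrapper builds (or loads) a TensorRT engine for the
    # given checkpoint when it is constructed.
    llm = LLM(model="google/gemma-2b")

    # Ordinary sampling settings; tune these for your application.
    params = SamplingParams(max_tokens=64, temperature=0.8, top_p=0.95)

    outputs = llm.generate(
        ["Why are lightweight LLMs useful on a single workstation GPU?"],
        params,
    )
    for out in outputs:
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()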


 



On top of that, developers can use Gemma on NVIDIA GPUs in the cloud, including Google Cloud’s A3 instances, which are built on the H100 Tensor Core GPU, and, soon, NVIDIA’s H200 Tensor Core GPUs, which offer 141 GB of HBM3e memory and 4.8 terabytes per second of memory bandwidth. To further tune Gemma and deploy the optimized model in production applications, enterprise developers can draw on NVIDIA’s extensive ecosystem of tools, including NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM.


Developers can find out more about how TensorRT-LLM is boosting Gemma’s inference, along with additional resources. Several Gemma checkpoints were optimized with TensorRT-LLM, including an FP8-quantized version of the model.
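As a rough illustration of what the FP8-quantized path might look like, the sketch below uses the quantization configuration that recent TensorRT-LLM releases expose through the same LLM API. The QuantConfig and QuantAlgo names and the 7B checkpoint identifier are assumptions for illustration; NVIDIA’s documentation describes the exact workflow.

# Hedged sketch (assumed API surface, not an official example):
# requesting FP8 quantization when building a Gemma engine with
# TensorRT-LLM's LLM API. FP8 requires hardware support, e.g. a
# Hopper-class GPU such as the H100.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantAlgo, QuantConfig

# FP8 trades a small amount of accuracy for lower memory use and
# higher inference throughput.
quant = QuantConfig(quant_algo=QuantAlgo.FP8)

llm = LLM(model="google/gemma-7b", quant_config=quant)
for out in llm.generate(["Summarize FP8 quantization in one sentence."]):
    print(out.outputs[0].text)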

