‘For many AI applications, GPUs are compute overkill and consume far more power and money than necessary’: How Ampere Computing plans to ride the AI wave
Ampere Computing is a startup making waves in the technology industry by challenging the dominance of giants such as AMD, Nvidia and Intel. With the rise of AI, demand for computing power has increased dramatically, as have energy costs and the strain on the world's electricity grids. Ampere wants to address this with a low-power, high-performance solution.
Despite being the underdog, Ampere’s offering has been adopted by almost all major hyperscalers worldwide. It has broken the scale wall several times with its CPUs, and the company plans to continue to scale in ways that older architectures cannot. We spoke with Jeff Wittich, CPO of Ampere, about his company’s success and future plans.
I sometimes feel like challenger startups like Ampere Computing are stuck between a rock and a hard place. On the one hand, you have billion-dollar companies like AMD, Nvidia and Intel; on the other, hyperscalers like Microsoft, Google and Amazon have their own offerings. How does it feel to be the small mammal in the land of dinosaurs?
It is truly an exciting time for Ampere. We may only be six years old, but as we predicted when we started the company, the need for a new cloud computing solution has never been greater. The industry doesn’t need more dinosaurs, it needs something new.
The needs of the cloud have changed. The amount of computing power required for today’s connected world is far greater than anyone could have ever imagined and will only increase with the rise of AI. At the same time, energy costs have skyrocketed, demand on the world’s electricity grids is exceeding supply, and construction of new data centers is being halted for a number of reasons. The convergence of these factors has created the perfect opportunity for Ampere to provide a much-needed low-power, high-performance solution that has not yet been delivered by major, legacy players.
Because of our ability to provide this, we have grown rapidly and been adopted by almost all major hyperscalers around the world. We are also seeing increasing adoption across the enterprise as companies look to get the most out of their existing data center footprint. The increased demand we continue to see for Ampere products gives us confidence that the industry recognizes our value.
Ampere has been leading the server CPU market in core count for a few years now. But others – AMD and Intel – have caught up. Given the immutable laws of physics, when do you expect to hit a wall in terms of physical core count, and how do you plan to break through it?
As you mentioned, Ampere has been leading in high-core-count, compact and efficient computing power in recent years. We identified early on where the key challenges would arise for cloud growth, and we're addressing those exact challenges today with our Ampere CPUs, which are well suited to all kinds of cloud usage and a wide range of workloads.
We’ve broken the scale wall several times now, first with 128 cores and now with 192. Such innovation requires a new approach that breaks through existing limitations. Ampere’s new approach to CPU design, from the microarchitecture to the feature set, will allow us to continue to scale in ways that older architectures cannot.
Another credible threat looming on the horizon is the rise of RISC-V, with China putting its weight behind the open instruction set architecture. What is your own personal opinion on that front? Could Ampere ever join team RISC-V?
Ampere’s core strategy is to develop sustainable processors that can increase computing capacity now and in the future. We will build our CPUs using the best available technologies to deliver leadership performance, efficiency and scalability, as long as these technologies can be easily used by our customers to run their desired operating systems, infrastructure software and user applications.
What can you tell us about the sequel to Ampere One? Will it follow the same trajectory as Altra > One? More cores? Same frequency, more L2 cache per core? Will it be called Ampere 2 and still be single threaded?
In the coming years, we will continue to focus on releasing CPUs that are more efficient and deliver higher core counts, as well as more memory bandwidth and IO capabilities. This will provide us with increasing throughput for increasingly important workloads such as AI inference, while uniquely meeting the sustainability goals of cloud providers and users.
Our products will also continue to focus on delivering predictable performance to cloud users, eliminating noisy neighbor issues, and enabling providers to run Ampere CPUs at high utilization rates. We will introduce additional features that provide cloud providers with a greater degree of flexibility to meet the diverse range of customer applications. These are critical to the performance of Cloud Native workloads now and in the future.
Given Ampere Computing’s focused approach, can you give us a brief description of what your typical customer does and what types of workloads they run?
Because our CPUs are general-purpose, they serve a broad spectrum of applications. We built our CPUs from the ground up as Cloud Native Processors so they perform very well on virtually all cloud workloads. AI inference, web services, databases and video processing are just a few examples. In many cases, we can get twice the performance for these workloads with half the power of older x86 processors.
In terms of customers, we work with almost all major hyperscalers in the US, Europe and China. For example, in the US you can find Ampere instances at Oracle Cloud, Google Cloud, Microsoft Azure and more. Ampere CPUs are also available throughout Europe from various cloud providers.
In addition to the major cloud providers, we see a lot of movement in the enterprise through our offerings with OEMs such as HPE and Supermicro. This is largely due to the greater efficiency and rack density that these companies can achieve by deploying Ampere servers. Companies want to save energy and do not want to build additional data centers that are not part of their core activities.
With the rise of AI, once “simple” devices are becoming increasingly intelligent, leading to greater demand for cloud computing in hyper-local areas. These edge deployments have stringent space and power requirements, and due to Ampere’s ability to deliver such a large number of cores in a low power environment, we are also seeing high demand for these workloads.
AI has become the biggest talking point in the semiconductor industry and beyond this year. Do you think this will change in 2024? How do you view this market?
We are convinced that AI will remain the most important topic of conversation. But we do think the conversation will change – and it’s already starting to do so.
In 2024, many companies working on AI solutions will move from the initial training of neural networks to their deployment, known as AI inference. Because AI inference can require ten times more computing power than training, the ability to deploy AI at scale will become increasingly important. Achieving this required scale will be limited by performance, cost and availability, so organizations will look for alternatives to GPUs as they enter this next phase. CPUs, and especially low-power, high-performance CPUs like Ampere offers, will become an increasingly attractive choice given their ability to enable more efficient and cost-effective execution of AI inference models. GPUs will still be important for certain aspects of AI, but we expect the hype to die down.
Second, sustainability and energy efficiency will become even more important in the context of AI next year. Data centers today often struggle to meet their energy needs. Increasing AI use will lead to even greater demand for computing power in 2024, and some AI workloads could require up to 20x more power. As a result, sustainability and efficiency will become challenges for expansion. Data center operators will place a high priority on efficiency in the new year to avoid jeopardizing growth.
How does Ampere deal with these new opportunities in the AI market with its products?
For many AI applications, GPUs are compute overkill and consume far more power and money than necessary. This is true for most inference, particularly when running AI workloads alongside other workloads such as databases or web services. In these cases, replacing the GPU with a CPU saves energy, space and costs.
We’re already seeing this come to life for real workloads, and the benefit of using Ampere processors is significant. For example, running the popular generative AI model Whisper on our 128-core Altra CPU versus Nvidia’s A10 GPU card, we use 3.6 times less power per inference. Compared to Nvidia Tesla T4 cards, we consume 5.6 times less.
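The "times less power per inference" figures above are ratios of energy per inference, i.e. average power draw divided by inference throughput. A minimal sketch of that arithmetic, using made-up placeholder wattage and throughput numbers (not Ampere's or Nvidia's measured data) chosen only so the ratio lands on the cited 3.6x:

```python
def energy_per_inference(avg_power_watts: float, inferences_per_second: float) -> float:
    """Joules consumed per inference: power (W) divided by throughput (inf/s)."""
    return avg_power_watts / inferences_per_second

# Placeholder figures for illustration only; not measured benchmark data.
cpu_joules = energy_per_inference(200.0, 10.0)  # hypothetical CPU: 20 J/inference
gpu_joules = energy_per_inference(300.0, 4.2)   # hypothetical GPU: ~71 J/inference

print(f"ratio: {gpu_joules / cpu_joules:.1f}x")  # prints "ratio: 3.6x"
```

The point of the metric is that a card with a higher peak wattage can still come out ahead or behind depending on throughput; energy per inference folds both into one number.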
As a result, we are seeing a substantial increase in demand for Ampere processors for AI inference, and we expect this to become a huge market for our products. Just a few weeks ago, Scaleway, one of Europe’s leading cloud providers, announced the upcoming general availability of new AI inference instances powered by Ampere. Additionally, we have seen a sevenfold increase in usage of our AI software library over the past six months. All of this speaks to the growing acceptance of our products as a powerful, low-power alternative to AI inference.