Yet another tech startup wants to topple Nvidia with an ‘order of magnitude’ better power efficiency; Sagence AI Leverages In-Memory Analog Computing Power to Deliver 666K Tokens/s on Llama2-70B
- Sagence brings analog computing power to memory to redefine AI inference
- Claims 10x lower power and 20x lower costs
- Also offers integration with PyTorch and TensorFlow
Sagence AI has introduced an advanced analog in-memory computing architecture designed to address power, cost and scalability issues in AI inference.
Using an analog-based approach, the architecture improves energy efficiency and cost-effectiveness while delivering performance comparable to existing high-end GPU and CPU systems.
This bold move positions Sagence AI as a potential disruptor in a market dominated by Nvidia.
Efficiency and performance
The Sagence architecture offers advantages when processing large language models such as Llama2-70B. When performance is normalized to 666,000 tokens per second, Sagence says its technology delivers those results with 10x lower power consumption, 20x lower cost, and 20x less rack space than leading GPU-based solutions.
This design prioritizes the demands of inference over training, reflecting the shift in AI compute focus within data centers. With its efficiency and affordability, Sagence addresses the growing challenge of ensuring return on investment (ROI) as AI applications expand to large-scale deployment.
At the heart of Sagence’s innovation is its analog in-memory computing technology, which merges storage and computation within memory cells. By eliminating the need for discrete storage and scheduled multiply-accumulate circuitry, this approach simplifies chip designs, reduces costs and improves energy efficiency.
Sagence also uses deep subthreshold computing in multi-level memory cells – an industry first – to achieve the efficiency gains needed for scalable AI inference.
Traditional CPU- and GPU-based systems rely on complex dynamic scheduling, which drives up hardware requirements and power consumption. Sagence’s statically scheduled architecture simplifies these processes and more closely mirrors how biological neural networks operate.
The system is also designed to integrate with existing AI development frameworks such as PyTorch, ONNX and TensorFlow. Once trained neural networks are imported, Sagence’s architecture eliminates the need for further GPU-based processing, simplifying deployment and reducing costs.
“A fundamental advance in AI inference hardware is critical to the future of AI. The use of large language models (LLMs) and generative AI is driving demand for rapid and large-scale change at the core of computing, requiring an unprecedented combination of the highest performance at the lowest power and an economic return that matches costs with the created value,” said Vishal Sarin, CEO & Founder, Sagence AI.
“Current computing equipment capable of extremely powerful AI inference costs too much to be economically viable and consumes too much energy to be environmentally sustainable. Our mission is to break these performance and economic limitations in an environmentally responsible manner,” said Sarin.
Via IEEE Spectrum