This startup wants to take on Nvidia with a server-on-a-chip, aiming to fix what it calls an already flawed system: a faster GPU, CPU, LPU, TPU or NIC won’t deliver the leap that many companies are striving for

According to Israeli startup NeuReality, many AI capabilities are not fully realized due to the cost and complexity of building and scaling AI systems.

Current solutions are not optimized for inference and rely on general-purpose CPUs, which were never designed for AI. CPU-centric architectures also require multiple hardware components, and the resulting CPU bottlenecks leave Deep Learning Accelerators (DLAs) underutilized.
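To illustrate the bottleneck NeuReality is describing, here is a deliberately simplified sketch of an inference pipeline in which CPU-bound pre- and post-processing cap the utilization of an otherwise fast accelerator. All stage timings are invented for illustration and are not measurements of any real system.

```python
# Toy model of why CPU-side work can starve an accelerator (DLA).
# The stage durations below are invented numbers, not real measurements.
import time

CPU_PREPROCESS_S = 0.008   # e.g. image decode or tokenization on the host CPU
DLA_INFERENCE_S = 0.002    # the accelerator itself is fast
CPU_POSTPROCESS_S = 0.006  # e.g. formatting the response on the host CPU

def serve(requests: int) -> None:
    dla_busy = 0.0
    start = time.perf_counter()
    for _ in range(requests):
        time.sleep(CPU_PREPROCESS_S)    # CPU-bound stage
        time.sleep(DLA_INFERENCE_S)     # accelerator-bound stage
        dla_busy += DLA_INFERENCE_S
        time.sleep(CPU_POSTPROCESS_S)   # CPU-bound stage
    wall = time.perf_counter() - start
    print(f"DLA utilization: {dla_busy / wall:.0%}")  # roughly 12% in this toy pipeline

serve(100)
```

In this toy example the accelerator is idle most of the time because it spends each cycle waiting on the CPU stages around it; buying a faster accelerator would barely change the result, which is the inefficiency NeuReality says it is targeting.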

NeuReality’s answer to this problem is the NR1 AI Inference Solution, a combination of purpose-built software and a unique network-addressable inference server-on-a-chip. NeuReality says this will deliver improved performance and scalability at lower cost, along with lower power consumption.

An express lane for major AI pipelines

“Our disruptive AI Inference technology is not tied to conventional CPUs, GPUs and NICs,” said Moshe Tanach, CEO of NeuReality. “We didn’t just try to improve an already flawed system. Instead, we unpacked and redefined the ideal AI Inference system from top to bottom, end to end, to deliver breakthrough performance, cost savings and energy efficiency.”

Key to NeuReality’s solution is the Network Addressable Processing Unit (NAPU), a new architectural design that leverages the power of DLAs. The NeuReality NR1, a network-addressable inference server-on-a-chip, has a built-in Neural Network Engine and a NAPU.

This new architecture enables hardware-based AI-over-Fabric networking, an AI hypervisor, and AI pipeline offload.
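As a rough illustration of what "network-addressable" means in practice, the sketch below shows a client sending an inference request straight to a device on the network rather than through a host server's CPU. The endpoint address, URL path and payload schema are all hypothetical; NeuReality has not published its client API in this article.

```python
# Hypothetical sketch of a client talking to a network-addressable inference device.
# The address, path and payload schema are invented for illustration; this is not
# NeuReality's published API.
import requests

NAPU_ENDPOINT = "http://10.0.0.42:8080/v1/infer"  # device reachable directly on the network

payload = {
    "model": "resnet50",                                    # assumed model name
    "inputs": {"image_url": "https://example.com/cat.jpg"}, # assumed input format
}

# No host-server CPU sits in the serving path: the request goes to the device itself.
response = requests.post(NAPU_ENDPOINT, json=payload, timeout=5)
response.raise_for_status()
print(response.json())
```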

The company has two products that use the server-on-a-chip: the NR1-M AI Inference Module and the NR1-S AI Inference Appliance. The former is a full-height, double-width PCIe card containing one NR1 NAPU system-on-a-chip; it acts as a network-addressable inference server and can connect to an external DLA. The latter is an AI-centric inference server populated with NR1-M modules and their NR1 NAPUs. NeuReality claims the appliance delivers up to a 50x improvement in cost and energy efficiency while requiring no special IT deployment by the end user.

“Investing in more and more DLAs, GPUs, LPUs, TPUs… will not solve the core problem of system inefficiency,” Tanach said. “It’s like installing a faster engine in your car to navigate traffic jams and dead ends; it simply won’t get you to your destination any faster. NeuReality, on the other hand, provides an express lane for large AI pipelines, seamlessly routing tasks to purpose-built AI devices and quickly delivering answers to your customers, while saving both resources and capital.”

NeuReality recently secured $20 million in funding from the European Innovation Council (EIC) Fund, Varana Capital, Cleveland Avenue, XT Hi-Tech and OurCrowd.
