WEKA Debuts NeuralMesh Axon For Exascale AI Deployments

PR Newswire

Mon, Jul 7, 2025, 10:00 PM

New Offering Delivers a Unique Fusion Architecture That's Being Leveraged by Industry-Leading AI Pioneers Like Cohere, CoreWeave, and NVIDIA to Deliver Breakthrough Performance Gains and Reduce Infrastructure Requirements For Massive AI Training and Inference Workloads

/PRNewswire/ -- From RAISE SUMMIT 2025: WEKA unveiled NeuralMesh Axon, a breakthrough storage system that leverages an innovative fusion architecture designed to address the fundamental challenges of running exascale AI applications and workloads. NeuralMesh Axon seamlessly fuses with GPU servers and AI factories to streamline deployments, reduce costs, and significantly enhance AI workload responsiveness and performance, transforming underutilized GPU resources into a unified, high-performance infrastructure layer.

WEKA's NeuralMesh Axon delivers an innovative fusion architecture designed to address the fundamental challenges of running exascale AI applications and workloads.


Building on the company's recently announced NeuralMesh storage system, the new offering extends its containerized microservices architecture with embedded functionality, enabling AI pioneers, AI cloud providers, and neocloud service providers to accelerate AI model development at extreme scale, particularly when combined with NVIDIA AI Enterprise software stacks for advanced model training and inference optimization. NeuralMesh Axon also supports real-time reasoning, with significantly improved time-to-first-token and overall token throughput, enabling customers to bring innovations to market faster.
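Time-to-first-token and token throughput can be measured against any streaming inference endpoint. The sketch below is a minimal, generic illustration of how those two metrics are computed; the `dummy_stream` generator is a hypothetical stand-in for a real model's streaming response, not a WEKA or NVIDIA interface.

```python
import time
from typing import Iterable

def measure_streaming_metrics(token_stream: Iterable[str]) -> dict:
    """Measure time-to-first-token (TTFT) and overall token throughput
    for any iterable that yields tokens as they are generated.
    Illustrative sketch only, not a vendor API."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        count += 1
    end = time.perf_counter()
    return {
        "ttft_s": (first_token_at - start) if first_token_at else None,
        "tokens": count,
        "tokens_per_s": count / (end - start) if count else 0.0,
    }

# Dummy generator standing in for a real streaming inference response.
def dummy_stream():
    time.sleep(0.05)          # simulated prefill before the first token
    for token in ["Hello", ",", " world"]:
        time.sleep(0.01)      # simulated per-token decode latency
        yield token

print(measure_streaming_metrics(dummy_stream()))
```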

AI Infrastructure Obstacles Compound at Exascale

Performance is make-or-break for large language model (LLM) training and inference workloads, especially when running at extreme scale. Organizations that run massive AI workloads on traditional storage architectures, which rely on replication-heavy approaches, waste NVMe capacity, face significant inefficiencies, and struggle with unpredictable performance and resource allocation.
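To see why replication-heavy designs waste raw NVMe capacity, consider a back-of-the-envelope comparison: three-way replication stores every byte three times, while an 8+2 erasure-coded layout adds only 25% overhead. These are generic storage-theory numbers used for illustration, not WEKA benchmarks.

```python
# Illustrative capacity arithmetic: usable fraction of raw NVMe
# under 3-way replication vs. an 8+2 erasure-coded layout.
# Generic storage-theory numbers, not vendor figures.

raw_tb = 1000  # raw NVMe capacity in TB (hypothetical cluster)

# 3-way replication: every byte is written three times.
usable_replication = raw_tb / 3          # ~333 TB usable (~33%)

# 8+2 erasure coding: 8 data strips + 2 parity strips per stripe.
usable_erasure = raw_tb * (8 / (8 + 2))  # 800 TB usable (80%)

print(f"replication usable: {usable_replication:.0f} TB "
      f"({usable_replication / raw_tb:.0%})")
print(f"erasure-coded usable: {usable_erasure:.0f} TB "
      f"({usable_erasure / raw_tb:.0%})")
```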

The reason? Traditional architectures weren't designed to process and store massive volumes of data in real time. They create latency and bottlenecks in data pipelines and AI workflows that can cripple exascale AI deployments, while underutilized GPU servers and outdated data architectures turn premium hardware into idle capital, resulting in costly downtime for training workloads.

Inference workloads face memory-bound barriers of their own, including key-value (KV) caches and hot data, which reduce throughput and increase infrastructure strain. Limited KV cache offload capacity creates data access bottlenecks and complicates resource allocation for incoming prompts, directly impacting operational expenses and time-to-insight.

Many organizations are transitioning to NVIDIA accelerated compute servers, paired with NVIDIA AI Enterprise software, to address these challenges. However, without modern storage integration, they still encounter significant limitations in pipeline efficiency and overall GPU utilization.
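The KV cache pressure described above can be pictured with a toy tiering model: when GPU memory for cached key/value blocks fills up, blocks spill to a slower offload tier, and that tier's capacity bounds how many prompt contexts can stay warm. The sketch below is a simplified illustration of that idea under stated assumptions; the `TieredKVCache` class and its parameters are hypothetical, not any vendor's API.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy model of KV-cache tiering: a small, fast GPU tier backed by a
    larger offload tier. Illustrates why limited offload capacity forces
    cold contexts to be recomputed. Hypothetical sketch, not a real API."""

    def __init__(self, gpu_blocks: int, offload_blocks: int):
        self.gpu = OrderedDict()       # block_id -> data, LRU order
        self.offload = OrderedDict()   # spilled blocks
        self.gpu_blocks = gpu_blocks
        self.offload_blocks = offload_blocks

    def put(self, block_id, data):
        if len(self.gpu) >= self.gpu_blocks:
            # Evict the least-recently-used block from GPU memory.
            victim_id, victim = self.gpu.popitem(last=False)
            if len(self.offload) >= self.offload_blocks:
                # Offload tier full: the coldest block is dropped and
                # must be recomputed later -- the bottleneck at issue.
                self.offload.popitem(last=False)
            self.offload[victim_id] = victim
        self.gpu[block_id] = data

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)      # fast hit in GPU memory
            return self.gpu[block_id], "gpu"
        if block_id in self.offload:
            data = self.offload.pop(block_id)   # slow fetch, then promote
            self.put(block_id, data)
            return data, "offload"
        return None, "recompute"                # miss: redo the prefill

cache = TieredKVCache(gpu_blocks=2, offload_blocks=2)
for i in range(5):
    cache.put(i, f"kv-block-{i}")
print(cache.get(0))  # oldest block was dropped -> (None, 'recompute')
```

With more offload capacity (or faster promotion), fewer contexts fall through to the "recompute" path, which is the intuition behind the time-to-first-token gains the release claims.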

