AWS launches new instances to turbocharge AI training


Amazon Web Services (AWS) has launched EC2 instances that it claims are optimized specifically for deep learning training.

The new Amazon EC2 Trn1 instances are powered by AWS Trainium, the second-generation machine learning (ML) chip designed by AWS, following AWS Inferentia.

The cloud giant claims the new instances are well suited to large-scale distributed training of complex deep learning models for tasks such as natural language processing and image recognition.

What do users get?

Trn1 instances are available in two configurations, with up to 16 AWS Trainium chips and 128 vCPUs.

According to AWS, the instances offer up to 512 GB of high-bandwidth memory and up to 3.4 petaFLOPS of TF32/FP16/BF16 compute. The chips are connected by NeuronLink, an interconnect intended to avoid communication bottlenecks when scaling workloads across multiple Trainium chips.
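To put those headline figures in per-chip terms, here is a rough back-of-the-envelope sketch. It assumes the "up to" numbers all describe the largest 16-chip configuration and that memory and compute are split evenly across chips; neither assumption is stated in the announcement, and the variable names are illustrative only.

```python
# Headline figures quoted for the largest Trn1 configuration.
CHIPS = 16            # up to 16 Trainium chips
HBM_GB = 512          # up to 512 GB of high-bandwidth memory
PEAK_TFLOPS = 3400    # up to 3.4 petaFLOPS, expressed in teraFLOPS

def per_chip(total, chips=CHIPS):
    """Divide an instance-wide figure evenly across chips (an assumption)."""
    return total / chips

hbm_per_chip_gb = per_chip(HBM_GB)
tflops_per_chip = per_chip(PEAK_TFLOPS)

print(f"HBM per chip: {hbm_per_chip_gb} GB")
print(f"Peak compute per chip: {tflops_per_chip} TFLOPS")
```

Under those assumptions, each chip would account for 32 GB of memory and roughly 212.5 teraFLOPS of peak compute.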

In addition, Amazon says Trn1 instances are the first EC2 instances to offer up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth for high-throughput network communication. They also come with up to 8 TB of local NVMe SSD storage for fast access to large data sets.

AWS also said its Trainium chips contain dedicated scalar, vector and tensor engines built specifically for deep learning algorithms.

Other features of the Trainium chips include support for a wide range of data types (FP32, TF32, BF16, FP16 and UINT8), stochastic rounding, dynamic tensor shapes, and custom operators written in C++.

AWS Trainium uses the same AWS Neuron SDK as AWS Inferentia, which could ease the transition for teams already working with Inferentia.
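For readers unfamiliar with stochastic rounding, the idea can be sketched in a few lines of plain Python. This is a generic illustration of the technique, not a description of how Trainium implements it in hardware: the coarse grid (`STEP`), the update size and the helper names are all hypothetical stand-ins for a low-precision number format.

```python
import math
import random

def stochastic_round(x, step, rng):
    """Round x to a multiple of step, rounding up with probability
    equal to how far x sits above the lower grid point."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step

def nearest_round(x, step):
    """Conventional round-to-nearest on the same grid."""
    return round(x / step) * step

def accumulate(round_fn, update, steps):
    """Add a small update repeatedly, rounding after every addition."""
    total = 0.0
    for _ in range(steps):
        total = round_fn(total + update)
    return total

rng = random.Random(0)   # seeded so the run is repeatable
STEP = 1.0               # coarse grid standing in for a low-precision format
UPDATE = 0.1             # each update is smaller than the grid spacing
N = 10_000

nearest_total = accumulate(lambda v: nearest_round(v, STEP), UPDATE, N)
stochastic_total = accumulate(lambda v: stochastic_round(v, STEP, rng), UPDATE, N)

print("round-to-nearest:", nearest_total)
print("stochastic:      ", stochastic_total)
```

With round-to-nearest, every small update is rounded away and the total never leaves zero. With stochastic rounding, the total lands close to the true sum of 1,000 on average, which is why the technique is attractive when training with low-precision arithmetic.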

Where are they available?

Trn1 instances can be launched today in select AWS regions, including US East (N. Virginia) and US West (Oregon).

The instances can be deployed using AWS Deep Learning AMIs, and container images are available through services such as Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and AWS ParallelCluster.

For more information, visit Amazon EC2’s Trn1 instances page.
