Timestamp Analysis Behind Alibaba Cloud Server Error Prediction Strength
Alibaba Cloud has shared more information about a technology it uses to improve error prediction and detection for its servers, which the company claims offers a 10% improvement over existing models.
The Chinese company’s latest tool, Time-Aware Attention-Based Transformer (TAAT), addresses the limitations of existing machine learning tools that overlook the importance of log timestamps.
Described in a new research paper co-authored by Alibaba Cloud staff and a researcher from Huazhong University of Science and Technology in Wuhan, TAAT uses timestamps to make outage predictions more accurate.
Alibaba Cloud Improves Server Outage Prediction Accuracy by 10%
The authors of the paper highlight growing concerns about server reliability and stability in light of the “widespread adoption of cloud computing,” which impacts the availability of virtual machines.
Because future outages can be predicted from past ones, the company opted to use log timestamps to improve prediction accuracy.
TAAT integrates semantic and temporal data using Google's Bidirectional Encoder Representations from Transformers (BERT) language model, which Alibaba says is well suited to analyzing log data. On top of BERT, TAAT adds a time-aware attention mechanism.
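The article does not spell out how TAAT's time-aware attention works, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes a simple variant in which standard scaled dot-product attention scores are penalized in proportion to the time gap between two log entries, so that logs close together in time attend to each other more strongly. The function name, the `decay` parameter, and the linear penalty are all hypothetical choices made for this example.

```python
import math


def time_aware_attention(queries, keys, values, timestamps, decay=0.01):
    """Illustrative time-aware attention (hypothetical, not TAAT's formulation).

    queries, keys, values: lists of equal-length float vectors, one per log entry.
    timestamps: one timestamp (in seconds) per log entry.
    decay: assumed penalty per second of time gap between two entries.
    Returns (outputs, weights).
    """
    dim = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            # Standard scaled dot-product similarity between entries i and j.
            semantic = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            # Assumed temporal term: the larger the gap, the lower the score.
            gap = abs(timestamps[i] - timestamps[j])
            scores.append(semantic - decay * gap)
        # Numerically stable softmax over the combined scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three log entries with identical embeddings but different timestamps.
embeddings = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
times = [0.0, 10.0, 1000.0]
_, weights = time_aware_attention(embeddings, embeddings, embeddings, times)
print(weights[0])  # the first entry weights nearby logs more than distant ones
```

With `decay=0` the temporal term vanishes and the sketch reduces to ordinary attention, which is one way to see what the timestamps contribute.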
Alibaba Cloud is now using TAAT in daily operations to improve predictions. The company has also released the real-world cloud computing failure prediction dataset used in its study to support further community development. The dataset contains approximately 2.7 billion logs from around 300,000 servers, collected over a period of four months, and is considered the largest dataset of its kind.
With TAAT, Alibaba hopes to build a more reliable cloud infrastructure. The tool is not yet publicly available for download, but it points toward more dependable outage prediction as more workloads move to the cloud.