ByteDance’s Doubao Large Model team yesterday introduced UltraMem, a new architecture designed to address the high memory access costs that arise during inference in Mixture of Experts (MoE) models. According to the team, UltraMem boosts inference speed by two to six times and reduces inference costs by up to 83%. As large models grow in size, inference cost and memory efficiency have become critical bottlenecks. UltraMem, a sparse architecture that decouples computation from parameters, aims to tackle these challenges while maintaining model performance. The work has been accepted for presentation at ICLR 2025 (International Conference on Learning Representations, a major AI industry event), with ByteDance saying it offers a novel approach to improving the efficiency and scalability of large models. [Doubao Large Model team WeChat account]