Shanghai Academy of Agricultural Sciences Institutional Repository

Collage: Light-Weight Low-Precision Strategy for LLM Training

Document type: Conference paper

First author: Tao Yu

Authors: Tao Yu ¹; Gaurav Gupta ²; Karthick Gopalswamy ³; Amith Mamidala ³; Hao Zhou ⁴; Jeffrey Huynh ³; Youngsuk Park ⁵; Ron Diamant ³; Anoop Deoras ²; Luke Huan ²

Author affiliations: 1. Cornell University

2. AWS AI Labs

3. AWS Annapurna Labs

4. AWS SageMaker

5. AWS AI Research and Education

Conference: International Conference on Machine Learning

Organizer:

Pages: 57459-57479

Abstract: Training large models is plagued by intense compute cost and limited hardware memory. A practical solution is low-precision representation, but it suffers from loss of numerical accuracy and unstable training, which renders the model less useful. We argue that low-precision floating point can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage, which uses a multi-component float representation in low precision to perform operations accurately while accounting for numerical errors. To understand the impact of imprecision on training, we propose a simple and novel metric that tracks the information lost during training and differentiates between precision strategies. Our method works with commonly used low-precision formats such as half precision (16-bit floating point) and extends naturally to even lower precision such as 8-bit. Experimental results show that pre-training with Collage removes the need for 32-bit floating-point copies of the model and attains similar or better training performance compared to the (16, 32)-bit mixed-precision strategy, with up to 3.7× speedup and roughly 15% to 23% less memory usage in practice. The code is available at https://github.com/amazon-science/collage.
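The idea of compensating rounding error with a multi-component float can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked in the abstract for that). It assumes NumPy's `float16` as a stand-in for whatever low-precision format is used in training, and the helper names `two_sum`, `mcf_add`, and `accumulate` are hypothetical. A value is held as an unevaluated sum of two half-precision numbers, `hi + lo`, and the classic error-free TwoSum transformation routes the rounding error of each addition into `lo` instead of discarding it.

```python
import numpy as np

F16 = np.float16


def two_sum(a, b):
    """Knuth's TwoSum: return (s, err) with s = fl(a + b) and a + b = s + err exactly.

    All arithmetic is done in float16 (barring overflow).
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    err = (a - a_virtual) + (b - b_virtual)
    return s, err


def mcf_add(hi, lo, x):
    """Add x to the two-component value (hi, lo), keeping the rounding error."""
    s, err = two_sum(hi, x)   # err is what plain float16 addition would lose
    lo = lo + err             # fold the lost part into the low component
    return two_sum(s, lo)     # renormalize so hi carries the leading bits


def accumulate(steps, start=1.0, update=1e-4):
    """Apply the same small update `steps` times, naively and with (hi, lo)."""
    x = F16(update)
    naive = F16(start)
    hi, lo = F16(start), F16(0.0)
    for _ in range(steps):
        naive = naive + x            # update is below half an ulp of 1.0
        hi, lo = mcf_add(hi, lo, x)
    return naive, hi, lo


if __name__ == "__main__":
    naive, hi, lo = accumulate(1000)
    print("naive float16:     ", float(naive))
    print("two-component sum: ", float(hi) + float(lo))
```

With a starting value of 1.0 and updates of about 1e-4, the plain float16 accumulator never moves, because each update is smaller than half the spacing between adjacent float16 values near 1.0. The two-component accumulator ends close to 1.1 while still storing only 16-bit numbers, which is the general effect the abstract attributes to compensating error at critical locations.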

Classification number: tp181-53
