Abstract: Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multimodal learning models ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results