[NAVER Cloud] Multimodal LLM (Experienced)
About the Team
Our team builds on HyperCLOVA X, designing architectures and producing models that extend its capabilities into multimodal domains such as images and video. Our ultimate goal is a natively multimodal model with any-to-any extension on the input/output side.
In September 2024 we launched a service applying Vision LLM capabilities to HyperCLOVA X, a first in Korea, and in April 2025 we released Korea's first commercially usable open-source AI model to support the domestic AI ecosystem (related link: Click). To compete with global frontier big tech, we take on a wide range of projects, drawing on NAVER's data assets, long-accumulated technical experience, and outstanding talent.
Large-scale compute and high-quality data are the core of machine learning. Because compute cost accrues with every production run, precise data filtering and curation are essential for training efficiency and final performance. To that end, we apply a range of (model-driven) methodologies to the data to shorten the training pipeline, study the interaction between data and models, and target SOTA-level performance gains.
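The model-driven curation idea above can be sketched as a toy example. Everything here (the `curate` helper, the threshold, the caption-length heuristic) is illustrative and hypothetical, not the team's actual pipeline: a scoring model rates each sample, and only samples clearing a quality threshold enter the training corpus.

```python
# Illustrative sketch of model-driven data curation (not NAVER's actual
# pipeline): a scoring model rates each sample, and only samples whose
# quality score clears a threshold survive into the training corpus.

def curate(samples, score_fn, threshold=0.5):
    """Keep samples whose model-assigned quality score >= threshold."""
    kept = [s for s in samples if score_fn(s) >= threshold]
    retention = len(kept) / max(len(samples), 1)  # fraction of data kept
    return kept, retention

# Stand-in scorer: a real pipeline might use a CLIP-style image-text
# similarity model; here a toy heuristic on caption length suffices.
def toy_score(sample):
    return min(len(sample["caption"].split()) / 10, 1.0)

data = [
    {"caption": "a cat"},                               # score 0.2 -> dropped
    {"caption": "a tabby cat sleeping on a red sofa"},  # score 0.8 -> kept
]
kept, rate = curate(data, toy_score, threshold=0.5)
```

In practice the scorer itself is a model, which is what makes the curation "model-driven": the data that trains the next model is selected by the previous one.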
Responsibilities
Vision Language Model development: complete an effective architecture for multimodal extension and outperform competing models trained at a comparable scale across the full range of benchmarks
• Empirically driven architecture design for multimodal (vision) encoding, targeting real-time streaming and efficient multimodal understanding
• Apply and develop techniques that improve VLM training efficiency, such as distributed training (FSDP, ZeRO), sequence packing, and sequence parallelism
• Coordinate with related teams to extend the model to additional modalities (audio I/O, image/video)
• Run VLM ablations, training, and production on hyperscale GPU resources (InfiniBand-based GPU clusters)
• Conduct small-scale ablations and analysis toward SOTA results on public benchmarks
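The small-scale ablation workflow mentioned above can be sketched as a toy sweep: rank cheap variants before committing hyperscale GPU budget. The config axes (`connector`, `vision_tokens`) and the mock scorer are hypothetical stand-ins for a real train-and-benchmark loop.

```python
# Hypothetical sketch of a small-scale ablation sweep: evaluate every
# architecture variant cheaply and rank them before committing hyperscale
# GPU budget. The config axes and mock scorer below are illustrative only.
import itertools

def run_ablation(eval_fn, grid):
    """Cross every config axis, evaluate each combination, rank best-first."""
    axes, values = zip(*grid.items())
    results = [(dict(zip(axes, combo)), eval_fn(dict(zip(axes, combo))))
               for combo in itertools.product(*values)]
    return sorted(results, key=lambda r: -r[1])

# Toy stand-in for a benchmark score; a real sweep would train and
# evaluate a small model for each configuration.
def mock_eval(cfg):
    bonus = 2.0 if cfg["connector"] == "mlp" else 1.0
    return (cfg["vision_tokens"] / 1024) * bonus

grid = {"connector": ["linear", "mlp"], "vision_tokens": [256, 576, 1024]}
ranked = run_ablation(mock_eval, grid)
best_cfg, best_score = ranked[0]  # highest-scoring variant
```

The point of the pattern is that ranking at small scale transfers (imperfectly) to large scale, so the expensive full-scale run is reserved for the winning configuration.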
Multimodal Pretraining: secure a vision backbone trained at a scale matching competitors, improve final model performance, and extend toward omni-modality
• Train models that add vision capability to the multimodal backbone on hyperscale GPU resources (InfiniBand-based GPU clusters)
• Engineering around adopting frameworks for backbone production on hyperscale GPU resources
• Explore how different multimodal backbones affect final model performance
• Explore effective pretraining recipes and curate/filter pretraining data
• Apply and develop techniques that improve VLM training efficiency, such as distributed training (FSDP, ZeRO, Megatron), sequence packing, and sequence parallelism
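Sequence packing, named in the bullets above, cuts padding waste by filling each fixed context window with several variable-length samples. A minimal first-fit-decreasing sketch, illustrative only and not the team's implementation:

```python
# Illustrative sketch of sequence packing: greedily pack variable-length
# samples into fixed-size context windows (first-fit decreasing) so that
# padding tokens, and the FLOPs spent on them, are minimized in training.

def pack_sequences(lengths, max_len):
    """Pack sample lengths into bins of capacity max_len.

    Returns a list of bins; each bin is a list of (sample_index, length).
    """
    bins = []  # each bin: [remaining_capacity, [(idx, length), ...]]
    # Sort longest-first so large samples claim fresh bins early.
    for idx, length in sorted(enumerate(lengths), key=lambda x: -x[1]):
        if length > max_len:
            raise ValueError(f"sample {idx} ({length} tokens) exceeds window")
        for b in bins:
            if b[0] >= length:          # first bin with enough room wins
                b[0] -= length
                b[1].append((idx, length))
                break
        else:                           # no bin fits: open a new window
            bins.append([max_len - length, [(idx, length)]])
    return [b[1] for b in bins]

# Example: five samples packed into 8-token windows (23 of 24 slots used).
packed = pack_sequences([5, 3, 7, 2, 6], max_len=8)
```

A real training stack would additionally build block-diagonal attention masks (or position-ID resets) so that packed samples cannot attend to each other within a window.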