Posts for 2025/02

List: 2025/02 (2)

헬창 개발자

Paper Review: s1: Simple test-time scaling

arXiv 2025. [Paper] [Github]
Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto
Stanford University | University of Washington | Allen Institute for AI | Contextual AI
31 Jan 2025
Overview
This paper explores how test-time scaling can be used to improve the performance of language models. OpenAI's o1 model recently demonstrated strong performance using this technique, but the specific ..

Study Room 2025. 2. 18. 10:08

Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

🤔 Background before we begin
1. DeepSeek-V3 overview
DeepSeek-V3 is a Mixture-of-Experts (MoE) large language model with 671B (671 billion) total parameters. Only 37B of those parameters are activated for each token prediction, which keeps inference efficient.
Key features:
Multi-Head Latent Attention (MLA): a new attention mechanism designed to save memory and speed up inference
DeepSeekMoE with Auxiliary-Loss-Free Load Balancing: a new MoE architecture that keeps expert load balanced without an auxiliary loss
Multi-Token Prediction (MTP): multiple tokens in one ..

Study Room 2025. 2. 5. 16:23
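
The DeepSeek-V3 excerpt above says that only 37B of 671B parameters are active per token. A rough way to see why an MoE model behaves this way is the sketch below. It is a generic top-k softmax router, not DeepSeekMoE's actual routing scheme; the function names, the 8 experts, and k=2 are all hypothetical choices for illustration. Only the 671B and 37B figures come from the excerpt.

```python
import math
import random

TOTAL_PARAMS_B = 671   # total parameters, from the excerpt
ACTIVE_PARAMS_B = 37   # parameters activated per token, from the excerpt


def active_fraction(total_b, active_b):
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts (hypothetical generic router).

    Returns (expert_index, weight) pairs, with weights renormalized
    so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"active fraction per token: {frac:.1%}")

    rng = random.Random(0)
    scores = [rng.gauss(0, 1) for _ in range(8)]  # 8 hypothetical experts
    print(route_top_k(scores, k=2))
```

Because each token is sent to only a few experts, the remaining experts' weights are never touched for that token, which is how the total parameter count can be much larger than the per-token compute cost.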
