I'm interested in Large-scale Engineering, Data Engineering, Representation Learning, Multi-modal Understanding, Training Optimization, and Data Curation.
🛠️ LLM Data Engineer (now) - 42dot
Research Intern - Kakaobrain
Research Intern @kakaobrain
🇺🇸 UI Developer Intern - Wavity
🇰🇷 Bachelor's degree in Computer Science Engineering, Sogang University (2012 - 2019)
🇰🇷 Master's degree in Computer Science Engineering, Sogang University (2020 - 2022)
🥈 2020 Korea Health Datathon, 2nd Prize (Binary Classification on Breast Cancer Pathology Images)
🥇 2020 Naver AI Rush Challenge, 1st Prize in 3 Areas (Auto Tagging on Naver Shopping Images, Mood Classification on Music, Genre Classification on Japanese Music)
coyo-700M Dataset: A large-scale dataset aimed at enhancing data curation and multi-modal understanding, publicly released for the research community. Check it out here: coyo-700M.
ViT Alignment Blog Post on Hugging Face: Based on the coyo-700M dataset, this blog post discusses the reproduction of Vision Transformer (ViT) models. Read the blog post: vit-align.