This repository organizes a timeline of key events (products, services, papers, GitHub, blog posts and news) that occurred before and after the ChatGPT announcement.
The timeline curates a variety of information, with a particular focus on LLMs and generative AI.
This may be one of the most eventful periods in the field's history, so I thought it was important to preserve a record of it and organized these events here.
These diagrams were generated by ChatGPT's Code Interpreter.
Issues and Pull Requests are greatly appreciated. If you've never contributed to an open source project before, I'm more than happy to walk you through how to create a pull request.
You can start by opening an issue describing the problem that you're looking to resolve and we'll go from there.
arXiv, PDF, arxiv-vanity, paper page, papers with code, GitHub
This document is licensed under the MIT license © Jonghong Jeon (전종홍)
- 05/17 - OpenAI strikes Reddit deal to train its AI on your posts (News)
- 05/17 - OpenAI dissolves team focused on long-term AI risks, less than one year after announcing it (News)
- 05/17 - International Scientific Report on the Safety of Advanced AI (Blog)
- 05/16 - TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
- 05/16 - Toon3D: Seeing Cartoons from a New Perspective
- 05/16 - Testing the reliability of an AI-based large language model to extract ecological information from the scientific literature (News)
- 05/16 - Many-Shot In-Context Learning in Multimodal Foundation Models
- 05/16 - How to Hit Pause on AI Before It's Too Late (News)
- 05/16 - Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
- 05/16 - GPT Store Mining and Analysis
- 05/16 - Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
- 05/16 - Chameleon: Mixed-Modal Early-Fusion Foundation Models
- 05/16 - CAT3D: Create Anything in 3D with Multi-View Diffusion Models
- 05/15 - Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
- 05/15 - LoRA Learns Less and Forgets Less
- 05/15 - Google's invisible AI watermark will help identify generative text and video (News)
- 05/15 - Google I/O 2024: everything announced (Blog)
- 05/15 - BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
- 05/15 - ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
- 05/14 - Understanding the performance gap between online and offline alignment algorithms
- 05/14 - SpeechVerse: A Large-scale Generalizable Audio Language Model
- 05/14 - SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
- 05/14 - No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
- 05/14 - Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
- 05/14 - Compositional Text-to-Image Generation with Dense Blob Representations
- 05/14 - Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
- 05/13 - SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
- 05/13 - RLHF Workflow: From Reward Modeling to Online RLHF
- 05/13 - Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
- 05/13 - OpenAI unveils newest AI model, GPT-4o (News)
- 05/13 - MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
- 05/13 - How Much Research Is Being Written by Large Language Models? (Blog)
- 05/13 - Hello GPT-4o (Blog)
- 05/13 - Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
- 05/11 - Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
- 05/11 - LogoMotion: Visually Grounded Code Generation for Content-Aware Animation
- 05/10 - INSPECT - An open-source framework for large language model evaluations (Blog)
- 05/10 - AI Safety Institute releases new AI safety evaluations platform (News)
- 05/07 - SUTRA: Scalable Multilingual Language Model Architecture
- 05/07 - Meta Releases Llama 3 Open-Source LLM (News)
- 05/03 - What matters when building vision-language models?
- 05/02 - WildChat: 1M ChatGPT Interaction Logs in the Wild
- 05/02 - StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
- 05/02 - Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
- 05/02 - NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
- 05/02 - LLM-AD: Large Language Model based Audio Description System
- 05/02 - FLAME: Factuality-Aware Alignment for Large Language Models
- 05/02 - Customizing Text-to-Image Models with a Single Image Pair
- 05/01 - Spectrally Pruned Gaussian Fields with Neural Compensation
- 05/01 - Self-Play Preference Optimization for Language Model Alignment
- 05/01 - Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
- 05/01 - Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
- 05/01 - A Careful Examination of Large Language Model Performance on Grade School Arithmetic
- 04/30 - Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
- 04/30 - STT: Stateful Tracking with Transformers for Autonomous Driving
- 04/30 - SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
- 04/30 - Octopus v4: Graph of language models
- 04/30 - MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
- 04/30 - MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction
- 04/30 - Lightplane: Highly-Scalable Components for Neural 3D Fields
- 04/30 - KAN: Kolmogorov-Arnold Networks
- 04/30 - Iterative Reasoning Preference Optimization
- 04/30 - Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting
- 04/30 - InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
- 04/30 - GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
- 04/30 - Extending Llama-3's Context Ten-Fold Overnight
- 04/30 - DOCCI: Descriptions of Connected and Contrasting Images
- 04/30 - Better & Faster Large Language Models via Multi-token Prediction
- 04/29 - Stylus: Automatic Adapter Selection for Diffusion Models
- 04/29 - SAGS: Structure-Aware 3D Gaussian Splatting
- 04/29 - Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
- 04/29 - NIST AI RMF Generative AI Profile (News)
- 04/29 - LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
- 04/29 - Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
- 04/29 - Capabilities of Gemini Models in Medicine
- 04/28 - Paint by Inpaint: Learning to Add Image Objects by Removing Them First
- 04/28 - LEGENT: Open Platform for Embodied Agents
- 04/27 - Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
- 04/26 - MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
- 04/26 - BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
- 04/25 - Tele-FLM Technical Report
- 04/25 - SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
- 04/25 - Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
- 04/25 - PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
- 04/25 - Make Your LLM Fully Utilize the Context
- 04/25 - List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
- 04/25 - Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
- 04/25 - Interactive3D: Create What You Want by Interactive 3D Generation
- 04/25 - How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
- 04/25 - ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
- 04/24 - XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
- 04/24 - The Ethics of Advanced AI Assistants
- 04/24 - PuLID: Pure and Lightning ID Customization via Contrastive Alignment
- 04/24 - NeRF-XL: Scaling NeRFs with Multiple GPUs
- 04/24 - MotionMaster: Training-free Camera Motion Transfer For Video Generation
- 04/24 - MoDE: CLIP Data Experts via Clustering
- 04/24 - MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
- 04/24 - MaGGIe: Masked Guided Gradual Human Instance Matting
- 04/24 - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
- 04/24 - Editable Image Elements for Controllable Synthesis
- 04/24 - CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
- 04/24 - BASS: Batched Attention-optimized Speculative Sampling
- 04/23 - Transformers Can Represent n-gram Language Models
- 04/23 - Pegasus-v1 Technical Report
- 04/23 - Multi-Head Mixture-of-Experts
- 04/23 - FlashSpeech: Efficient Zero-Shot Speech Synthesis
- 04/22 - SnapKV: LLM Knows What You are Looking for Before Generation
- 04/22 - SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
- 04/22 - Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
- 04/22 - Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- 04/22 - OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
- 04/22 - MultiBooth: Towards Generating All Your Concepts in an Image from Text
- 04/22 - Learning H-Infinity Locomotion Control
- 04/22 - How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
- 04/22 - Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
- 04/22 - A Multimodal Automated Interpretability Agent
- 04/21 - Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
- 04/21 - AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
- 04/20 - Music Consistency Models
- 04/19 - The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
- 04/19 - TextSquare: Scaling up Text-Centric Visual Instruction Tuning
- 04/19 - PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
- 04/19 - LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
- 04/19 - How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples
- 04/19 - How Far Can We Go with Practical Function-Level Program Repair?
- 04/19 - Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
- 04/19 - Does Gaussian Splatting need SFM Initialization?
- 04/19 - AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
- 04/18 - TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
- 04/18 - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- 04/18 - Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
- 04/18 - Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
- 04/18 - OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
- 04/18 - MeshLRM: Large Reconstruction Model for High-Quality Mesh
- 04/18 - Introducing v0.5 of the AI Safety Benchmark from MLCommons
- 04/18 - Introducing Meta Llama 3: The most capable openly available LLM to date (Blog)
- 04/18 - EdgeFusion: On-Device Text-to-Image Generation
- 04/18 - BLINK: Multimodal Large Language Models Can See but Not Perceive
- 04/18 - AniClipart: Clipart Animation with Text-to-Video Priors
- 04/17 - MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
- 04/17 - FlowMind: Automatic Workflow Generation with LLMs
- 04/17 - Dynamic Typography: Bringing Words to Life
- 04/17 - Stable Diffusion 3 API Now Available (twitter), (Blog), (Demo)
- 04/16 - VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
- 04/16 - U.S. Commerce Secretary Gina Raimondo Announces Expansion of U.S. AI Safety Institute Leadership Team (News)
- 04/16 - Long-form music generation with latent diffusion
- 04/15 - LLM Evaluators Recognize and Favor Their Own Generations
- 04/15 - Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
- 04/15 - Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
- 04/15 - Taming Latent Diffusion Model for Neural Radiance Field Inpainting
- 04/15 - Opus can operate as a Turing machine (twitter)
- 04/15 - MathGPT: Leveraging Llama 2 to create a platform for highly personalized learning
- 04/15 - HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
- 04/15 - Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
- 04/15 - Compression Represents Intelligence Linearly
- 04/15 - CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting
- 04/14 - TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
- 04/13 - Cathie Wood Muscles Into ChatGPT Boom With New OpenAI Stake (News)
- 04/12 - Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
- 04/12 - Probing the 3D Awareness of Visual Foundation Models
- 04/12 - Pre-training Small Base LMs with Fewer Tokens
- 04/12 - On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
- 04/12 - MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
- 04/12 - Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
- 04/12 - Is ChatGPT Transforming Academics' Writing Style?
- 04/12 - COCONut: Modernizing COCO Segmentation
- 04/12 - AI Chip Trims Energy Budget Back by 99+ Percent (News)
- 04/12 - AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
- 04/12 - Grok-1.5 Vision Preview (Demo)
- 04/12 - The good, the bad, and the Humane Pin (News)
- 04/12 - Paid ChatGPT users can now access GPT-4 Turbo (twitter), (News)
- 04/11 - The Necessity of AI Audit Standards Boards
- 04/11 - Remembering Transformer for Continual Learning
- 04/11 - Amazon adds Andrew Ng, a leading voice in artificial intelligence, to its board of directors (News)
- 04/11 - Adobe Is Buying Videos for $3 Per Minute to Build AI Model (News)
- 04/11 - UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs
- 04/11 - Transferable and Principled Efficiency for Open-Vocabulary Segmentation
- 04/11 - SWE-agent (twitter), (Demo)
- 04/11 - Sparse Laneformer
- 04/11 - Rho-1: Not All Tokens Are What You Need
- 04/11 - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
- 04/11 - RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
- 04/11 - OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- 04/11 - LLoCO: Learning Long Contexts Offline
- 04/11 - Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation
- 04/11 - JetMoE: Reaching Llama2 Performance with 0.1M Dollars (Project), (twitter)
- 04/11 - HGRN2: Gated Linear RNNs with State Expansion
- 04/11 - From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
- 04/11 - Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
- 04/11 - ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
- 04/11 - Context-aware Video Anomaly Detection in Long-Term Datasets
- 04/11 - ChatGPT-3.5, Claude 3 kick pixelated butt in Street Fighter III tournament for LLMs (News)
- 04/11 - ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past
- 04/11 - Best Practices and Lessons Learned on Synthetic Data for Language Models
- 04/11 - Benchmark LLMs by fighting in Street Fighter 3 (Demo)
- 04/11 - Audio Dialogues: Dialogues dataset for audio and music understanding
- 04/11 - Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
- 04/11 - AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
- 04/10 - LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
- 04/10 - Gemini 1.5 Pro now understands audio (twitter)
- 04/10 - Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
- 04/10 - Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior
- 04/10 - RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
- 04/10 - OpenAI and Meta are on the verge of releasing AI models capable of reasoning like humans, report says (News)
- 04/10 - MetaCheckGPT -- A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models
- 04/10 - Meta confirms that its Llama 3 open source LLM is coming in the next month (News)
- 04/10 - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
- 04/10 - Incremental XAI: Memorable Understanding of AI with Incremental Explanations
- 04/10 - DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
- 04/10 - Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge
- 04/10 - BRAVE: Broadening the visual encoding of vision-language models
- 04/10 - AI startup Mistral launches a 281GB AI model to rival OpenAI, Meta, and Google (News)
- 04/10 - Agent-driven Generative Semantic Communication for Remote Surveillance
- 04/10 - Adapting LLaMA Decoder to Vision Transformer
- 04/10 - A Survey on the Integration of Generative AI for Critical Thinking in Mobile Networks
- 04/09 - Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak
- 04/09 - RULER: What's the Real Context Size of Your Long-Context Language Models?
- 04/09 - Revising Densification in Gaussian Splatting
- 04/09 - Reconstructing Hand-Held Objects in 3D
- 04/09 - RAR-b: Reasoning as Retrieval Benchmark
- 04/09 - Privacy Preserving Prompt Engineering: A Survey
- 04/09 - On Evaluating the Efficiency of Source Code Generated by LLMs
- 04/09 - OmniFusion Technical Report
- 04/09 - MuPT: A Generative Symbolic Music Pretrained Transformer
- 04/09 - MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
- 04/09 - Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion
- 04/09 - LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
- 04/09 - InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
- 04/09 - Hash3D: Training-free Acceleration for 3D Generation
- 04/09 - Google unveils open source projects for generative AI (News)
- 04/09 - Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
- 04/09 - Apple just unveiled new Ferret-UI LLM - this AI can read your iPhone screen (News)
- 04/09 - AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
- 04/08 - YaART: Yet Another ART Rendering Technology
- 04/08 - WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
- 04/08 - UniFL: Improve Stable Diffusion via Unified Feedback Learning
- 04/08 - Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security
- 04/08 - The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
- 04/08 - The Fact Selection Problem in LLM-Based Program Repair
- 04/08 - SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
- 04/08 - SambaLingo: Teaching Large Language Models New Languages
- 04/08 - Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- 04/08 - Naver debuts multilingual HyperCLOVA X LLM it will use to build sovereign AI for Asia (News)
- 04/08 - MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
- 04/08 - MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
- 04/08 - MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
- 04/08 - LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
- 04/08 - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
- 04/08 - Evaluating Interventional Reasoning Capabilities of Large Language Models
- 04/08 - Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
- 04/08 - CodecLM: Aligning Language Models with Tailored Synthetic Data
- 04/08 - AutoCodeRover: Autonomous Program Improvement
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/07 - TimeGPT in Load Forecasting: A Large Time Series Model Perspective
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/07 - OpenAI transcribed over a million hours of YouTube videos to train GPT-4
(News), - 04/07 - MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/07 - ByteEdit: Boost, Comply and Accelerate Generative Image Editing
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/06 - Majority Voting of Doctors Improves Appropriateness of AI Reliance in Pathology
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 04/06 - Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/06 - DATENeRF: Depth-Aware Text-based Editing of NeRFs
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/06 - BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/06 - Aligning Diffusion Models by Optimizing Human Utility
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/06 - The Case for Developing a Foundation Model for Planning-like Tasks from Scratch
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - Increased LLM Vulnerabilities from Fine-tuning and Quantization
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - SpatialTracker: Tracking Any 2D Pixels in 3D Space
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - Social Skill Training with Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/05 - Robust Gaussian Splatting
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - Koala: Key frame-conditioned long video-LLM
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - CLUE: A Clinical Language Understanding Evaluation for LLMs
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/05 - Assisting humans in complex comparisons: automated information comparison at scale
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 04/04 - Language Model Evolution: An Iterated Learning Perspective
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) (twitter), - 04/04 - No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/04 - Evaluating LLMs at Detecting Errors in LLM Responses
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/04 - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/04 - Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/04 - Training LLMs over Neurally Compressed Text
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - ReFT: Representation Finetuning for Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/04 - Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - PointInfinity: Resolution-Invariant Point Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/04 - CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/03 - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/03 - On the Scalability of Diffusion-based Text-to-Image Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/03 - Many-shot jailbreaking
(β) - 04/03 - LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/03 - Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/03 - InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/03 - Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/03 - Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/03 - ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/02 - UK & United States announce partnership on science of AI safety
(News), - 04/02 - Large Language Models as Planning Domain Generators
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 04/02 - Poro 34B and the Blessing of Multilinguality
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/02 - Octopus v2: On-device language model for super agent
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/02 - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/02 - Long-context LLMs Struggle with Long In-context Learning
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/02 - LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/02 - Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
(β) - 04/02 - HyperCLOVA X Technical Report
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/02 - CameraCtrl: Enabling Camera Control for Text-to-Video Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/02 - Advancing LLM Reasoning Generalists with Preference Trees
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - Stream of Search (SoS): Learning to Search in Language
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/01 - The Rise and Rise of A.I. Large Language Models (LLMs)
(Blog), - 04/01 - Streaming Dense Video Captioning
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - Measuring Style Similarity in Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - Getting it Right: Improving Spatial Consistency in Text-to-Image Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - For Data-Guzzling AI Companies, the Internet Is Too Small
(News), - 04/01 - FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/01 - Evalverse: Unified and Accessible Library for Large Language Model Evaluation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 04/01 - DBRX, Continual Pretraining, RewardBench, Faster Inference, and More
(Blog), - 04/01 - CosmicMan: A Text-to-Image Foundation Model for Humans
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/01 - Condition-Aware Neural Network for Controlled Image Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/01 - Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 04/01 - Are large language models superhuman chemists?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/31 - WavLLM: Towards Robust and Adaptive Speech Large Language Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/31 - Tired of Plugins? Large Language Models Can Be End-To-End Recommenders
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/30 - Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/30 - ST-LLM: Large Language Models Are Effective Temporal Learners
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/30 - Noise-Aware Training of Layout-Aware Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/30 - MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/30 - Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - ReALM: Reference Resolution As Language Modeling
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - NVIDIA H200 GPUs Crush MLPerf's LLM Inferencing Benchmark
(News), - 03/29 - MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - Gecko: Versatile Text Embeddings Distilled from Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - DiJiang: Efficient Large Language Models through Compact Kernelization
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/29 - DeepMind develops SAFE, an AI-based app that can fact-check LLMs
(News), - 03/29 - CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/29 - Are We on the Right Way for Evaluating Large Vision-Language Models?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/28 - sDPO: Don't Use Your Data All at Once
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/28 - Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/28 - Localizing Paragraph Memorization in Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/28 - Jamba: A Hybrid Transformer-Mamba Language Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/28 - GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/28 - Claude 3 overtakes GPT-4 in the duel of the AI bots. Here's how to get in on the action
(News), - 03/28 - Announcing Grok-1.5
(Blog), (Demo), - 03/27 - A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - ViTAR: Vision Transformer with Any Resolution
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - Towards a World-English Language Model for On-Device Virtual Assistants
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - TextCraftor: Your Text Encoder Can be Image Quality Controller
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/27 - Long-form factuality in large language models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/27 - LITA: Language Instructed Temporal-Localization Assistant
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/27 - Garment3DGen: 3D Garment Stylization and Texture Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/27 - BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/26 - MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/26 - The Unreasonable Ineffectiveness of the Deeper Layers
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/26 - TC4D: Trajectory-Conditioned Text-to-4D Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/26 - Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/26 - Introducing DBRX: A New State-of-the-Art Open LLM
(Blog), - 03/26 - InternLM2 Technical Report
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/26 - Improving Text-to-Image Consistency via Automatic Prompt Optimization
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/26 - Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/26 - EgoLifter: Open-world 3D Segmentation for Egocentric Perception
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/26 - AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/26 - 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/25 - Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/25 - RepairAgent: An Autonomous, LLM-Based Agent for Program Repair
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/25 - RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/25 - VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/25 - TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/25 - SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/25 - LLM Agent Operating System
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/25 - FlashFace: Human Image Personalization with High-fidelity Identity Preservation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/25 - DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/25 - Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/23 - When LLM-based Code Generation Meets the Software Development Process
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/22 - ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/22 - SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/22 - LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/22 - LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/22 - InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/22 - FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/22 - DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/22 - Can large language models explore in-context?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/22 - AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/21 - PeerGPT: Probing the Roles of LLM-based Peer Agents as Team Moderators and Participants in Children's Collaborative Learning
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/21 - StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/21 - ReNoise: Real Image Inversion Through Iterative Noising
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - Recourse for reclamation: Chatting with generative language models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - RakutenAI-7B: Extending Large Language Models for Japanese
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - MyVLM: Personalizing VLMs for User-Specific Queries
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/21 - General Assembly adopts landmark resolution on artificial intelligence
(News), - 03/21 - Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - Explorative Inbetweening of Time and Space
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - DreamReward: Text-to-3D Generation with Human Preference
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/21 - Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/21 - Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/21 - AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - ZigMa: Zigzag Mamba Diffusion Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/20 - VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - RewardBench: Evaluating Reward Models for Language Modeling
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/20 - Reverse Training to Nurse the Reversal Curse
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/20 - LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/20 - IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/20 - Evaluating Frontier Models for Dangerous Capabilities
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - DepthFM: Fast Monocular Depth Estimation with Flow Matching
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/20 - Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - When Do We Not Need Larger Vision Models?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - Towards a general-purpose foundation model for computational pathology
(β) - 03/19 - TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - GVGEN: Text-to-3D Generation with Volumetric Representation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - Evolutionary Optimization of Model Merging Recipes
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ([:octocat:](https://github.com/sakanaai/evolutionary-model-merge)![GitHub Repo stars](https://img.shields.io/github/stars/sakanaai/evolutionary-model-merge?style=social)) - 03/19 - ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - Apple's MM1: A multimodal large language model capable of interpreting both images and text data
(News), - 03/19 - AnimateDiff-Lightning: Cross-Model Diffusion Distillation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/19 - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/19 - A visual-language foundation model for computational pathology
(β), (β³οΈ) - 03/19 - Characteristic AI Agents via Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), (![GitHub Repo stars](https://img.shields.io/github/stars/nuaa-nlp/character100?style=social)) - 03/18 - How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/18 - VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - TnT-LLM: Text Mining at Scale with Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - ROUTERBENCH: A Benchmark for Multi-LLM Routing System
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), (SS) - 03/18 - Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/18 - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/18 - Larimar: Large Language Models with Episodic Memory Control
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - GPT-4 as Evaluator: Evaluating Large Language Models on Pest Management in Agriculture
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/18 - From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/18 - Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/18 - Compiler generated feedback for Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/17 - PhD: A Prompted Visual Hallucination Evaluation Dataset
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/17 - MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/16 - VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/16 - Do Large Language Models understand Medical Codes?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - VideoAgent: Long-form Video Understanding with Large Language Model as Agent
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - Uni-SMART: Universal Science Multimodal Analysis and Research Transformer
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - Trusting the Search: Unraveling Human Trust in Health Information from Google and ChatGPT
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/15 - RAFT: Adapting Language Model to Domain Specific RAG
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - PERL: Parameter Efficient Reinforcement Learning from Human Feedback
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/15 - MusicHiFi: Fast High-Fidelity Stereo Vocoding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/15 - LightIt: Illumination Modeling and Control for Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/15 - FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - DiPaCo: Distributed Path Composition
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/15 - Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - WavCraft: Audio Editing and Generation with Natural Language Prompts
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - Video Editing via Factorized Diffusion Distillation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - Scaling Instructable Agents Across Many Simulated Worlds
(twitter), (Blog), - 03/14 - Recurrent Drafter for Fast Speculative Decoding in Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - LocalMamba: Visual State Space Model with Windowed Selective Scan
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - GPT on a Quantum Computer
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/14 - Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - GiT: Towards Generalist Vision Transformer through Universal Language Interface
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - Exploring the Capabilities and Limitations of Large Language Models in the Electric Energy Sector
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/14 - BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/14 - 3D-VLA: A 3D Vision-Language-Action Generative World Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - Scaling Instructable Agents Across Many Simulated Worlds
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/13 - VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - The Human Factor in Detecting Errors of Large Language Models: A Systematic Literature Review and Future Research Directions
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/13 - Simple and Scalable Strategies to Continually Pre-train Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/13 - Scaling Up Dynamic Human-Scene Interaction Modeling
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - Language-based game theory in the age of artificial intelligence
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - Language models scale reliably with over-training and on downstream tasks
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/13 - Knowledge Conflicts for LLMs: A Survey
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - Gemma: Open Models Based on Gemini Research and Technology
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/13 - Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/13 - Cultural evolution in populations of Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/13 - Bugs in Large Language Models Generated Code: An Empirical Study
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/12 - Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/12 - Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/12 - MoAI: Mixture of All Intelligence for Large Language and Vision Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/12 - Learning Generalizable Feature Fields for Mobile Manipulation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/12 - DragAnything: Motion Control for Anything using Entity Representation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/12 - Chronos: Learning the Language of Time Series
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/12 - Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/11 - Transparent AI Disclosure Obligations: Who, What, When, Where, Why, How
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/11 - HILL: A Hallucination Identifier for Large Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/11 - FAX: Scalable and Differentiable Federated Primitives in JAX
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/11 - FashionReGen: LLM-Empowered Fashion Report Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/11 - VideoMamba: State Space Model for Efficient Video Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/11 - V3D: Video Diffusion Models are Effective 3D Generators
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/11 - Stealing Part of a Production Language Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/11 - Multistep Consistency Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/11 - FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/11 - Chain-of-table: Evolving tables in the reasoning chain for table understanding (Blog),
- 03/11 - An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/11 - Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/10 - VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/09 - Algorithmic progress in language models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 03/08 - VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - DeepSeek-VL: Towards Real-World Vision-Language Understanding
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/08 - CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/08 - Now available on Poe: Claude 3 (Demo),
- 03/08 - Google - Health-specific embedding tools for dermatology and pathology (Blog),
- 03/07 - Yi: Open Foundation Models by 01.AI
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/07 - Teaching Large Language Models to Reason with Reinforcement Learning
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/07 - StableDrag: Stable Dragging for Point-based Image Editing
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/07 - Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/07 - PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/07 - Pix2Gif: Motion-Guided Diffusion for GIF Generation
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/07 - Meet 'Liberated Qwen', an uncensored LLM that strictly adheres to system prompts (News),
- 03/07 - LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/07 - KAIST develops next-generation ultra-low power LLM accelerator (News),
- 03/07 - Inflection-2.5: meet the world's best personal AI (News),
- 03/07 - How Far Are We from Intelligent Visual Deductive Reasoning?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/07 - GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/07 - Evaluating LLM models at scale (Blog),
- 03/07 - Common 7B Language Models Already Possess Strong Math Capabilities
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/07 - Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/06 - Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/06 - ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/06 - SaulLM-7B: A pioneering Large Language Model for Law
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/06 - NY hospital exec: Multimodal LLM assistants will create a "paradigm shift" in patient care (News),
- 03/06 - Learning to Decode Collaboratively with Multiple Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/06 - Enhancing Vision-Language Pre-training with Rich Supervisions
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/06 - Backtracing: Retrieving the Cause of the Query
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/06 - AI Prompt Engineering Is Dead (News),
- 03/06 - 3D Diffusion Policy
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 03/05 - OpenAI and Elon Musk (Blog),
- 03/05 - Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/05 - WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ()
- 03/05 - Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS)
- 03/05 - Revisiting Meta-evaluation for Grammatical Error Correction (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - MathScale: Scaling Instruction Tuning for Mathematical Reasoning (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS)
- 03/05 - KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ()
- 03/05 - Interactive Continual Learning: Fast and Slow Thinking (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - In Search of Truth: An Interrogation Approach to Hallucination Detection (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - ImgTrojan: Jailbreaking Vision-Language Models with ONE Image (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Generative Software Engineering (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS)
- 03/05 - Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Exploring the Limitations of Large Language Models in Compositional Relation Reasoning (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Design2Code: How Far Are We From Automating Front-End Engineering? (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/05 - OpenAI - ChatGPT can now read responses to you. (twitter),
- 03/04 - The Claude 3 Model Family: Opus, Sonnet, Haiku
(β), (twitter), (β³οΈ) - 03/04 - Wukong: Towards a Scaling Law for Large-Scale Recommendation (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/04 - Large language models surpass human experts in predicting neuroscience results
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 03/04 - NoteLLM: A Retrievable Large Language Model for Note Recommendation (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/04 - MagicClay: Sculpting Meshes With Generative Neural Fields (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS)
- 03/04 - Enhancing LLM Safety via Constrained Direct Preference Optimization (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/04 - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ()
- 03/04 - CODE-ACCORD: A Corpus of Building Regulatory Data for Rule Generation towards Automatic Compliance Checking (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ()
- 03/04 - Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ)
- 03/04 - adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ()
- 03/04 - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 03/04 - TripoSR: Fast 3D Object Reconstruction from a Single Image (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 03/04 - RT-H: Action Hierarchies Using Language (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 03/04 - ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 03/04 - OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 03/04 - Build AI for a Better Future (twitter), (News),
- 03/04 - AtomoVideo: High Fidelity Image-to-Video Generation (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 03/03 - Research Papers in February 2024: A LoRA Successor, Small Finetuned LLMs Vs Generalist LLMs, and Transparent LLM Research (Blog),
- 03/03 - Nvidia CEO Jensen Huang says AI could pass most human tests in 5 years (News)
- 03/03 - MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 03/03 - InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 03/03 - Could this be bigger than OpenAI? Microsoft invests billions in French startup - Mistral AI is a multilingual maestro that's almost as good as ChatGPT 4 (News),
- 03/03 - 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 03/02 - Nvidia CEO says AI could pass human tests in five years (News)
- 03/01 - Elon Musk sues OpenAI and CEO Sam Altman over contract breach (News)
- 03/01 - AtP*: An efficient and scalable method for localizing LLM behaviour to components (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS)
- 03/01 - VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS)
- 03/01 - Learning and Leveraging World Models in Visual Representation Learning (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS)
- 03/01 - RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS)
- 03/01 - Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS)
- 03/01 - Resonance RoPE: Improving Context Length Generalization of Large Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 02/29 - OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 02/29 - Retrieval-Augmented Generation for AI-Generated Content: A Survey (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), ()
- 2.29 - DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - Humanoid Locomotion as Next Token Prediction (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - StarCoder 2 and The Stack v2: The Next Generation (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - Trajectory Consistency Distillation (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.29 - Beyond Language Models: Byte Models are Digital World Simulators (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - Syntactic Ghost: An Imperceptible General-purpose Backdoor Attacks on Pre-trained Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.29 - ViewFusion: Towards Multi-View Consistency via Interpolated Denoising (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.29 - MOSAIC: A Modular System for Assistive and Interactive Cooking (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS)
- 02/28 - Automatic Creative Selection with Cross-Modal Matching
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS) - 2.28 - Priority Sampling of Large Language Models for Compilers (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.28 - Simple linear attention language models balance the recall-throughput tradeoff (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.28 - Approaching Human-Level Forecasting with Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.28 - Datasets for Large Language Models: A Comprehensive Survey (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.28 - A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 02/27 - A High Level Guide to LLM Evaluation Metrics (Blog),
- 2/27 - Users Say Microsoft's AI Has Alternate Personality as Godlike AGI That Demands to Be Worshipped (News)
- 2/27 - Google DeepMind CEO on AGI, OpenAI and Beyond - MWC 2024 (News)
- 2.27 - Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.27 - Towards Optimal Learning of Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - Evaluating Very Long-Term Conversational Memory of LLM Agents (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - Training-Free Long-Context Scaling of Large Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.27 - VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - Sora Generates Videos with Stunning Geometrical Consistency (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.27 - Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.27 - Video as the New Language for Real-World Decision Making (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 02/27 - On the Societal Impact of Open Foundation Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 02/26 - Set the Clock: Temporal Alignment of Pretrained Language Models
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 2/26 - DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 02/26 - Mistral Large is our flagship model, with top-tier reasoning capacities (News)
- 2.26 - Disentangled 3D Scene Generation with Layout Learning (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.26 - Multi-LoRA Composition for Image Generation (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.26 - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.26 - Do Large Language Models Latently Perform Multi-Hop Reasoning? (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.26 - Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.26 - Nemotron-4 15B Technical Report (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.26 - StructLM: Towards Building Generalist Models for Structured Knowledge Grounding (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.26 - Towards Open-ended Visual Quality Comparison (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.25 - ChatMusician: Understanding and Generating Music Intrinsically with LLM (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.25 - FuseChat: Knowledge Fusion of Chat Models (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 02/24 - Divide-or-Conquer? Which Part Should You Distill Your LLM?
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ) - 02/24 - Perplexity.ai Revamps Google SEO Model For LLM Era (News)
- 02/24 - Data Interpreter: An LLM Agent For Data Science
(β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS), (β³οΈ), () - 2.24 - Empowering Large Language Model Agents through Action Learning (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.23 - MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.23 - Seamless Human Motion Composition with Blended Positional Encodings (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.23 - AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.23 - Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ), ()
- 2.23 - API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.23 - Genie: Generative Interactive Environments (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.23 - GPTVQ: The Blessing of Dimensionality for LLM Quantization (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.23 - ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition (β), (π), (π), (π), (π ), (HTML), (SP), (GS), (SS), (β³οΈ)
- 2.22 - CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models (β), (π), (π), (π), (π ), (HTML), (AS), (GS), (β³οΈ), ()
- 02/22 - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models (β), (π), (π), (π), (π ), (HTML), (SL), (SP), (GS), (SS)
- 2.22 - Divide-or-Conquer? Which Part Should You Distill Your LLM? (β), (π), (π), (π), (π ), (HTML), (AS), (GS), (β³οΈ)
- 2.22 - MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (β), (π), (π), (π), (π ), (HTML), (AS), (GS), (β³οΈ)
- 2.22 - Watermarking Makes Language Models Radioactive (β), (π), (π), (π), (π ), (HTML), (AS), (GS), (β³οΈ)
- 2.22 - AutoPrompt - prompt optimization framework ()
- 2.22 - Announcing Stable Diffusion 3 (tweet), (blog)
- 2.22 - DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models (β), (π), (π), (π), (π ), (HTML), (β³οΈ), ()
- 2.22 - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation (β), (π), (π), (π), (π ), (HTML), (β³οΈ)
- 2.22 - LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey (β), (π), (π), (π), (π ), (HTML), (β³οΈ)
- 2.22 - Vision-Language Navigation with Embodied Intelligence: A Survey (β), (π), (π), (π), (π ), (HTML), (β³οΈ)
- 2.22 - Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models (β), (π), (π), (π), (π ), (HTML), (β³οΈ)
- 2.22 - Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization (β), (π), (π), (π), (π ), (HTML), (β³οΈ)
- 2.22 - PALO: A Polyglot Large Multimodal Model for 5B People (β), (π), (π), (π), (π ), (HTML), (β³οΈ), ()
- 2.22 - GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion (β), (π), ([:paperclip:](https://arxiv.org/pdf/2402.148