PKU-Alignment

We love sharing and open source, and we work to make AI safer.

PKU-Alignment Team

Large language models (LLMs) hold immense potential for general intelligence, but they also carry significant risks. As a research team at Peking University, we focus on alignment techniques for LLMs, such as safety alignment, to improve model safety and reduce toxicity.

We welcome you to follow our AI safety projects:

Pinned repositories

  1. omnisafe Public

    JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.

    Python · 975 stars · 136 forks

  2. safety-gymnasium Public

    NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

    Python · 474 stars · 66 forks

  3. safe-rlhf Public

    Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

    Python · 1.5k stars · 123 forks

  4. Safe-Policy-Optimization Public

    NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms

    Python · 373 stars · 54 forks

Repositories

Showing 10 of 23 repositories
  • llms-resist-alignment Public

    [ACL 2025 Oral & Panel Discussion] Language Models Resist Alignment

    Python · 9 stars · 1 fork · Updated Jun 11, 2025
  • eval-anything Public

    Python · 20 stars · 16 forks · Apache-2.0 license · Updated Jun 8, 2025
  • SAE-V Public

    [ICML 2025 Poster] SAE-V: Interpreting Multimodal Models for Enhanced Alignment

    1 star · 0 forks · Updated Jun 5, 2025
  • SafeVLA Public

    Python · 52 stars · 2 forks · Updated Jun 4, 2025
  • align-anything Public

    Align Anything: Training All-modality Model with Feedback

    Jupyter Notebook · 4,298 stars · 503 forks · Apache-2.0 license · Updated May 28, 2025
  • ProgressGym Public

    Alignment with a millennium of moral progress. Spotlight @ NeurIPS 2024 Track on Datasets and Benchmarks.

    Python · 23 stars · 4 forks · MIT license · Updated Mar 30, 2025
  • s1-m Public (forked from PKU-Alignment/align-anything)

    S1-M: Simple Test-time Scaling in Multimodal Reasoning

    Python · 3 stars · 512 forks · Apache-2.0 license · Updated Mar 25, 2025
  • omnisafe Public

    JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.

    Python · 975 stars · 136 forks · Apache-2.0 license · Updated Mar 17, 2025
  • ProAgent Public

    AAAI 2024 (Oral): ProAgent: Building Proactive Cooperative Agents with Large Language Models

    JavaScript · 89 stars · 10 forks · MIT license · Updated Mar 4, 2025
  • safety-gymnasium Public

    NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

    Python · 474 stars · 66 forks · Apache-2.0 license · Updated Feb 27, 2025