PKU-Alignment

16 repositories

align-anything
Public
Align Anything: Training All-modality Model with Feedback
chameleon multimodal dpo large-language-models rlhf vision-language-model
Python • Apache License 2.0 • 35 • 212 • 4 • 0 • Updated Nov 3, 2024
ProgressGym
Public
Alignment with a millennium of moral progress.
Python • MIT License • 2 • 10 • 0 • 0 • Updated Oct 31, 2024
Aligner2024.github.io
Public
HTML • 1 • 0 • 0 • 0 • Updated Oct 31, 2024
omnisafe
Public
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
benchmark machine-learning reinforcement-learning deep-learning deep-reinforcement-learning constraint-satisfaction-problem pytorch safety-critical saferl safe-reinforcement-learning
Python • Apache License 2.0 • 132 • 936 • 8 • 3 • Updated Oct 15, 2024
safe-sora
Public
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
alignment human-preferences text-to-video-generation large-vision-models
Python • 5 • 24 • 1 • 0 • Updated Aug 20, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Jul 14, 2024
safe-rlhf
Public
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
reinforcement-learning transformers transformer safety llama gpt datasets beaver alpaca ai-safety
Python • Apache License 2.0 • 119 • 1.3k • 12 • 0 • Updated Jun 13, 2024
llms-resist-alignment
Public
Repo for paper "Language Models Resist Alignment"
alignment llama safe alpaca ai-safety vicuna llm llms rlhf safe-rlhf
Python • 0 • 4 • 0 • 0 • Updated Jun 9, 2024
aligner
Public
Achieving Efficient Alignment through Learned Correction
Python • 6 • 0 • 0 • 0 • Updated Jun 7, 2024
safety-gymnasium
Public
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
reinforcement-learning constraint-satisfaction-problem safety-critical safety-critical-systems safe-reinforcement-learning safe-reinforcement-learning-environments constraint-rl safe-policy-optimization
Python • Apache License 2.0 • 52 • 393 • 4 • 0 • Updated May 14, 2024
ProAgent
Public
ProAgent: Building Proactive Cooperative Agents with Large Language Models
language-model cooperative human-ai overcooked human-ai-interaction cooperative-ai llm-agent
JavaScript • MIT License • 6 • 57 • 1 • 0 • Updated Apr 8, 2024
SafeDreamer
Public
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
reinforcement-learning constraint-satisfaction-problem safety-critical-systems safe-reinforcement-learning constraint-rl safe-policy-optimization
Python • Apache License 2.0 • 7 • 45 • 1 • 0 • Updated Apr 8, 2024
Safe-Policy-Optimization
Public
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
benchmarks reinforcement-learning-algorithms safe safe-reinforcement-learning constrained-reinforcement-learning
Python • Apache License 2.0 • 45 • 326 • 0 • 0 • Updated Mar 20, 2024
AlignmentSurvey
Public
AI Alignment: A Comprehensive Survey
awesome reinforcement-learning ai deep-learning survey alignment papers interpretability red-teaming large-language-models
0 • 128 • 0 • 0 • Updated Nov 2, 2023
beavertails
Public
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
safety llama gpt datasets language-model beaver ai-safety human-feedback-data llm llms
Makefile • Apache License 2.0 • 5 • 110 • 2 • 0 • Updated Oct 27, 2023
ReDMan
Public
ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.
Python • Apache License 2.0 • 2 • 16 • 0 • 0 • Updated May 2, 2023