Large language model alignment: A survey.
T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo, X Wu, Y Liu, D Xiong.
arXiv:2309.15025, 2023.
[ArXiv]
AI alignment: A comprehensive survey.
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, Z Zhang, F Zeng, KY Ng, et al.
arXiv:2310.19852, 2023.
[ArXiv]
[Homepage]
Aligning large language models with human: A survey.
Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang, L Shang, X Jiang, Q Liu.
arXiv:2307.12966, 2023.
[ArXiv]
[Homepage]
Making large language models better reasoners with alignment.
P Wang, L Li, L Chen, F Song, B Lin, Y Cao, T Liu, Z Sui.
arXiv:2309.02144, 2023.
[Paper]
Preference ranking optimization for human alignment.
F Song, B Yu, M Li, H Yu, F Huang, Y Li, et al.
AAAI, 2024.
[Paper]
Aligner: Achieving efficient alignment through weak-to-strong correction.
J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, et al.
arXiv, 2024.
[ArXiv]
[Homepage]
Knowledgeable preference alignment for LLMs in domain-specific question answering.
Y Zhang, Z Chen, Y Fang, L Cheng, Y Lu, F Li, W Zhang, H Chen.
arXiv:2311.06503, 2023.
[ArXiv]
[Github]
Aligning AI with shared human values.
D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt.
ICLR, 2021.
[ArXiv]
Safe RLHF: Safe reinforcement learning from human feedback.
J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang, et al.
arXiv, 2023.
[ArXiv]
[Github]
A moral imperative: The need for continual superalignment of large language models.
G Puthumanaillam, M Vora, P Thangeda, M Ornik.
arXiv:2403.14683, 2024.
[ArXiv]
A survey of safety and trustworthiness of large language models through the lens of verification and validation.
X Huang, W Ruan, W Huang, G Jin, Y Dong, C Wu, S Bensalem, R Mu, Y Qi, X Zhao, K Cai, et al.
arXiv:2305.11391, 2023.
[ArXiv]
A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly.
Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang.
High-Confidence Computing, 2024.
[Paper]
Safeguarding large language models: A survey.
Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu, G Jin, Y Qi, J Hu, J Meng, S Bensalem, et al.
arXiv:2406.02622, 2024.
[ArXiv]
Parameter-efficient fine-tuning of large-scale pre-trained language models.
N Ding, Y Qin, G Yang, F Wei, Z Yang, Y Su, S Hu, Y Chen, CM Chan, W Chen, J Yi, W Zhao, et al.
Nature Machine Intelligence, 2023.
[Paper]
LoRA: Low-rank adaptation of large language models.
EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen.
arXiv:2106.09685, 2021.
[Paper]
LongLoRA: Efficient fine-tuning of long-context large language models.
Y Chen, S Qian, H Tang, X Lai, Z Liu, S Han, J Jia.
arXiv:2309.12307, 2023.
[ArXiv]
[Github]
OpenAssistant conversations: Democratizing large language model alignment.
A Köpf, Y Kilcher, D von Rütte, S Anagnostidis, ZR Tam, K Stevens, A Barhoum, D Nguyen, et al.
Advances in Neural Information Processing Systems, 2024.
[Paper]
[Homepage]
NeMo-Aligner: Scalable toolkit for efficient model alignment.
G Shen, Z Wang, O Delalleau, J Zeng, Y Dong, D Egert, S Sun, J Zhang, S Jain, et al.
arXiv:2405.01481, 2024.
[ArXiv]
[Github]