- San Diego
- (UTC -08:00)
- https://www.yi-zeng.com/
- @EasonZeng623
Highlights
- Pro
Pinned
- LLM-Tuning-Safety/LLMs-Finetuning-Safety (Public)
  We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
- CHATS-lab/persuasive_jailbreaker (Public)
  Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
- reds-lab/Narcissus (Public)
  The official implementation of the CCS'23 paper on the Narcissus clean-label backdoor attack -- it takes only THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% att…
- frequency-backdoor (Public)
  ICCV 2021. We find that most existing triggers of backdoor attacks in deep learning contain severe artifacts in the frequency domain. This repo explores how we can use these artifacts to develop strong…
- reds-lab/Meta-Sift (Public)
  The official implementation of the USENIX Security'23 paper "Meta-Sift" -- ten minutes or less to find a clean subset of 1,000 or more samples in a poisoned dataset.