Benchmark and Empirical Study
- LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. S&P 2024, Link
- Vulnerability Detection with Code Language Models: How Far Are We? arXiv 2024, Link
- A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection, arXiv 2024, Link
- How Far Have We Gone in Vulnerability Detection Using Large Language Models, ICLR 2024, Link
- Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities, arXiv 2023, Link
- Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection, arXiv, Link
- DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection, RAID 2023, Link
- SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models, Link
General Analysis
- A Learning-Based Approach to Static Program Slicing. OOPSLA 2024, Link
- Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. ICSE 2024, Link
- E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. arXiv, Link
Domain-Specific Bug Detection (Domain-Specific Program & Bug Type)
- SMARTINV: Multimodal Learning for Smart Contract Invariant Inference, S&P 2024, Link
- LLM-based Resource-Oriented Intention Inference for Static Resource Detection, arXiv, Link
- The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models, OOPSLA 2024, Link
- Do you still need a manual smart contract audit? Link
- Harnessing the Power of LLM to Support Binary Taint Analysis, arXiv, Link
- Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. arXiv, Link
- GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. ICSE 2024, Link
- Continuous Learning for Android Malware Detection, USENIX Security 2023, Link
- Beware of the Unexpected: Bimodal Taint Analysis, ISSTA 2023, Link
- Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification, CAV 2024, Link
- SpecGen: Automated Generation of Formal Program Specifications via Large Language Models, Link
- Lemur: Integrating Large Language Models in Automated Program Verification, ICLR 2024, Link
- Zero and Few-shot Semantic Parsing with Ambiguous Inputs, ICLR 2024, Link
- Finding Inductive Loop Invariants using Large Language Models, Link
- Can ChatGPT support software verification? arXiv, Link
- Impact of Large Language Models on Generating Software Specifications, Link
- Can Large Language Models Reason about Program Invariants?, ICML 2023, Link
- Ranking LLM-Generated Loop Invariants for Program Verification, Link
- Towards AI-Assisted Synthesis of Verified Dafny Methods, FSE 2024, Link
- Enabling Memory Safety of C Programs using LLMs, arXiv, Link
- CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, ICLR 2024, Link
- Is Self-Repair a Silver Bullet for Code Generation? ICLR 2024, Link
- Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search, Link
- Hypothesis Search: Inductive Reasoning with Language Models, ICLR 2024, Link
- CodePlan: Repository-level Coding using LLMs and Planning, FMDM@NeurIPS 2023, Link
- Repository-Level Prompt Generation for Large Language Models of Code. ICML 2023, Link
- Refactoring Programs Using Large Language Models with Few-Shot Examples. arXiv, Link
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Link
- Teaching Large Language Models to Self-Debug, ICLR 2024, Link
- Guess & Sketch: Language Model Guided Transpilation, ICLR 2024, Link
- Optimal Neural Program Synthesis from Multimodal Specifications, EMNLP 2021, Link
- CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation, ICLR 2022, Link
- Sporq: An Interactive Environment for Exploring Code Using Query-by-Example, UIST 2021, Link
- Data Extraction via Semantic Regular Expression Synthesis, OOPSLA 2023, Link
- Web Question Answering with Neurosymbolic Program Synthesis, PLDI 2021, Link
- Active Inductive Logic Programming for Code Search, ICSE 2019, Link
- Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer. ICSE 2024. Link
- LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models, Link
- Large Language Model guided Protocol Fuzzing, NDSS 2024, Link
- Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models, ISSTA 2023, Link
- Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag, MASEC@NeurIPS 2023, Link
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs, arXiv, Link
- CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking, FSE 2024, Link
- FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations, ICSE 2024, Link
- Symmetry-Preserving Program Representations for Learning Code Semantics, Link
- LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis, Link
- When Do Program-of-Thoughts Work for Reasoning? AAAI 2024, Link
- Grounded Copilot: How Programmers Interact with Code-Generating Models, OOPSLA 2023, Link
- Extracting Training Data from Large Language Models, USENIX Security 2021, Link
- Using an LLM to Help With Code Understanding, ICSE 2024, Link
- Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024, Link
- Self-Evaluation Guided Beam Search for Reasoning, NeurIPS 2023, Link
- Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR 2023, Link
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023, Link
- Cumulative Reasoning With Large Language Models, Link
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting, EMNLP 2023, Link
- Complementary Explanations for Effective In-Context Learning, ACL 2023, Link
- WeChat Post: The Path of Large Language Models to Mathematics (大语言模型的数学之路), Link
- Blog: Prompt Engineering, Link
- Hallucination: Survey, Link
- Natural Language Commanding via Program Synthesis, Microsoft, Link
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Feifei Li, Google, Link
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Link
- Real-world practices of AI Agents, Link
- Cognitive Architectures for Language Agents, Link
- The Rise and Potential of Large Language Model Based Agents: A Survey, Link
- ReAct: Synergizing Reasoning and Acting in Language Models, Link
- Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, Link
- WeChat Post: AutoGen, Link
- SATLM: Satisfiability-Aided Language Models Using Declarative Prompting, NeurIPS 2023, Link
- Awesome things about LLM-powered agents: Papers, Repos, and Blogs, Link
- ChatDev: Mastering the Virtual Social Realm, Shaping the Future of Intelligent Interactions. Link
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Link
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. Link
- codellama: Inference code for CodeLlama models, Link
- CodeFuse: LLM for Code from Ant Group, Link
- Owl-LM: Large Language Model for Blockchain, Link
- Tao YU, The University of Hong Kong (Training)
- Shunyu YAO, Princeton University (Reasoning, Agent)
- Xi YE, Isil Dillig, UT Austin (Prompting)
- Lingming ZHANG, UIUC (Application: Testing, Repair)
- Zhiyun QIAN, UC Riverside (Application: Analysis)
- Yizheng CHEN, University of Maryland (Application: Analysis)
- Baishakhi Ray, Columbia University (Application: Repair, Analysis)
- Martin Vechev, ETH Zurich (Data, Hallucination)