diff --git "a/_notes/assets/Screenshot 2024-05-31 at 10.39.08\342\200\257AM.png" "b/_notes/assets/Screenshot 2024-05-31 at 10.39.08\342\200\257AM.png" new file mode 100644 index 00000000..23c688ad Binary files /dev/null and "b/_notes/assets/Screenshot 2024-05-31 at 10.39.08\342\200\257AM.png" differ diff --git "a/_notes/assets/Screenshot 2024-05-31 at 10.39.15\342\200\257AM.png" "b/_notes/assets/Screenshot 2024-05-31 at 10.39.15\342\200\257AM.png" new file mode 100644 index 00000000..677873f1 Binary files /dev/null and "b/_notes/assets/Screenshot 2024-05-31 at 10.39.15\342\200\257AM.png" differ diff --git a/_notes/neuro/comp_neuro.md b/_notes/neuro/comp_neuro.md index 5534c93b..0e853a13 100755 --- a/_notes/neuro/comp_neuro.md +++ b/_notes/neuro/comp_neuro.md @@ -1056,6 +1056,7 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne - Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding ([chen et al. 2022](https://arxiv.org/pdf/2211.06956.pdf)) - Aligning brain functions boosts the decoding of visual semantics in novel subjects ([thual...king, 2023](https://arxiv.org/abs/2312.06467)) - align across subjects before doing decoding - A variational autoencoder provides novel, data-driven features that explain functional brain representations in a naturalistic navigation task ([cho, zhang, & gallant, 2023](https://jov.arvojournals.org/article.aspx?articleid=2792546)) +- What's the Opposite of a Face? Finding Shared Decodable Concepts and their Negations in the Brain ([efird...fyshe, 2024](https://arxiv.org/abs/2405.17663)) - build clustering shared across subjects in CLIP space # advanced topics diff --git a/_notes/research_ovws/ovw_llms.md b/_notes/research_ovws/ovw_llms.md index 5fae96e1..9e0945bd 100644 --- a/_notes/research_ovws/ovw_llms.md +++ b/_notes/research_ovws/ovw_llms.md @@ -560,7 +560,7 @@ Editing is generally very similar to just adaptation/finetuning. One distinction - T-Patcher (Huang et al., 2023) and CaliNET (Dong et al., 2022) introduce extra trainable parameters into the feed- forward module of PLMs - weight updates - Knowledge Neurons in Pretrained Transformers ([dai et al. 2021](https://arxiv.org/abs/2104.08696)) - integrated gradients wrt to each neuron in BERT, then selectively udpate these neurons - - ROME: Locating and Editing Factual Associations in GPT ([meng, bau et al. 2022](https://arxiv.org/abs/2202.05262) ) + - ROME: Locating and Editing Factual Associations in GPT ([meng, bau et al. 2022](https://arxiv.org/abs/2202.05262)) - *localize factual associations* - causal intervention for identifying neuron activations that are decisive in a model’s factual predictions - "causal traces" - run net multiple times, introducing corruptions and then restore states from original non-corrupted forward pass to see which states can restore the original results - a small number of states contain info that can flip the model from one state to another @@ -642,7 +642,7 @@ Editing is generally very similar to just adaptation/finetuning. One distinction - builds on DAS ([geiger, ...goodman, 2023](https://arxiv.org/abs/2303.02536)) - N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in LLMs ([foote, nanda, ..., barez, 2023](https://arxiv.org/abs/2304.12918)) - explain each neuron in a graph - Finding Skill Neurons in Pre-trained Transformer-based Language Models ([wang et al. 
@@ -749,6 +749,11 @@ Editing is generally very similar to just adaptation/finetuning. One distinction
- Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT) ([sabir, babar, & abuadbba, 2023](https://arxiv.org/pdf/2307.01225.pdf)) - leverages techniques such as attention maps, integrated gradients, and model feedback to detect and then change adversarial inputs
+- generation-time defenses
+ - Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves ([deng...gu, 2023](https://arxiv.org/abs/2311.04205))
+ - SafeDecoding ([xu…poovendran, 2024](https://arxiv.org/pdf/2402.08983#page=3.89))
+ - Hierarchical instruction following ([wallace...beutel, 2024](https://arxiv.org/abs/2404.13208))
+
**Attacks**
@@ -925,7 +930,7 @@ mixture of experts models have become popular because of the need for (1) fast s
- Training
  - Nomic 235M curated text pairs (mostly filtered from [here](https://huggingface.co/datasets/sentence-transformers/embedding-training-data))
  - Followed by supervised contrastive fine-tuning on datasets like MSMarco, NQ, NLI, HotpotQA, Fever, WikiAnswers, etc.
-
+ - MEDI (from Instructor paper): combines 300 datasets from Super-NaturalInstructions with 30 datasets from existing collections designed for embedding training
- customization - e.g. add prompt or prefixes like *search query*, *search document*, *classification*, *clustering* before embedding so model knows how to match things
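To make the customization bullet concrete, a small sketch of task-prefix embedding (the checkpoint name and prefix strings follow Nomic Embed's documented usage as I understand it; treat them as assumptions and substitute whatever prefix-aware model you actually use):

```python
from sentence_transformers import SentenceTransformer

# assumed checkpoint; any embedding model trained with task prefixes works the same way
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# the same text is embedded differently depending on its role,
# signalled by a short prefix prepended to the raw string
query = "search_query: how do language models store factual associations?"
docs = [
    "search_document: ROME locates factual associations in mid-layer MLP weights.",
    "search_document: Matryoshka embeddings stay usable when truncated to a prefix.",
]

q_emb = model.encode(query, normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)
print(d_emb @ q_emb)  # cosine similarities, since the embeddings are normalized
```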
@@ -936,9 +941,9 @@ mixture of experts models have become popular because of the need for (1) fast s
- GRIT: Generative Representational Instruction Tuning ([muennighoff...kiela, 2024](https://arxiv.org/abs/2402.09906)) - train a single model that, given different instructions, can produce either generations or embeddings
- EchoEmbeddings: Repetition Improves Language Model Embeddings ([springer, kotha, fried, neubig, & raghunathan, 2024](https://arxiv.org/pdf/2402.15449.pdf))
  - Feed a prompt such as “Rewrite the sentence: x, rewritten sentence: x” to the language model and pool the contextualized embeddings of the 2nd occurrence of x
-
+ - include task-specific prefix like in E5-mistral-instruct
-
+ - E5-mistral-instruct: Improving Text Embeddings with LLMs ([wang...wei, 2023](https://arxiv.org/abs/2401.00368)) - finetune embeddings on synthetic data
  - first prompt GPT-4 to brainstorm a list of potential retrieval tasks, and then generate *(query, positive, hard negative)* triplets for each task (GPT writes entire documents)
  - builds on E5 ([wang...wei, 2022](https://arxiv.org/abs/2212.03533))
@@ -949,7 +954,6 @@ mixture of experts models have become popular because of the need for (1) fast s
- BGE ([github](https://github.com/FlagOpen/FlagEmbedding))
- Nomic Embed ([nussbaum, morris, duderstadt, & mulyar, 2024](https://static.nomic.ai/reports/2024_Nomic_Embed_Text_Technical_Report.pdf)), ([blog post](https://blog.nomic.ai/posts/nomic-embed-text-v1))
- Older: [SBERT](https://arxiv.org/abs/1908.10084), [SIMCSE](https://arxiv.org/abs/2104.08821), [SGPT](https://arxiv.org/abs/2202.08904)
-
- embedding approaches [overview](https://github.com/caiyinqiong/Semantic-Retrieval-Models)
  - 3 levels of interaction
    - bi-encoder: separately encode query & doc
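The hunk context cuts off after the first interaction level; for contrast, here is a toy sketch of the three scoring patterns usually meant (bi-encoder, late interaction, and cross-encoder; the latter two are a standard completion, not shown in the excerpt), with stand-in encoders instead of real models:

```python
import numpy as np

d = 8

def encode(text: str) -> np.ndarray:
    """Stand-in for a real sentence encoder: one unit-norm vector per text."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=d)
    return v / np.linalg.norm(v)

def encode_tokens(text: str) -> np.ndarray:
    """Stand-in for a token-level encoder: one vector per token."""
    return np.stack([encode(tok) for tok in text.split()])

query = "what is causal tracing"
doc = "causal tracing locates factual associations in GPT"

# 1) bi-encoder: encode query and doc separately, compare with a cheap dot product;
#    doc vectors can be precomputed and stored in an ANN index
bi_score = float(encode(query) @ encode(doc))

# 2) late interaction (ColBERT-style): keep one vector per token;
#    score = sum over query tokens of their max similarity to any doc token
sim = encode_tokens(query) @ encode_tokens(doc).T   # (n_query_tokens, n_doc_tokens)
late_score = float(sim.max(axis=1).sum())

# 3) cross-encoder: run query and doc jointly through one model that outputs a
#    relevance score directly (most expressive, nothing precomputable) -- stand-in here
def cross_score(q: str, d_text: str) -> float:
    return float(encode(q + " [SEP] " + d_text)[0])

print(bi_score, late_score, cross_score(query, doc))
```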
@@ -969,12 +973,19 @@ mixture of experts models have become popular because of the need for (1) fast s
- Active Retrieval Augmented Generation ([jiang...neubig, 2023](https://arxiv.org/abs/2305.06983)) - introduce FLARE, a method that iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens
- Matryoshka Representation Learning ([kusupati...kakade, jain, & farhadi, 2022](https://arxiv.org/abs/2205.13147)) - in training given an embedding of full dimensionality M (e.g. 2048), learn N different distance functions for each prefix of the embedding (e.g. l2_norm(embedding[:32]), l2_norm(embedding[:64]), l2_norm(embedding[:128]), etc). (see sketch after this hunk)
- AGRAME: Any-Granularity Ranking with Multi-Vector Embeddings ([reddy...potdar, 2024](https://arxiv.org/pdf/2405.15028)) - rank at varying levels of granularity while maintaining encoding at a single (coarser) level
-
- Hypothetical Document Embeddings ([gao…callan, 2022](https://arxiv.org/pdf/2212.10496.pdf)) - generate hypothetical document from query + instruction using GPT and find match for that doc
- Probing embeddings
  - Uncovering Meanings of Embeddings via Partial Orthogonality ([jiang, aragam, & veitch, 2023](https://arxiv.org/abs/2310.17611))
- - The Linear Representation Hypothesis and the Geometry of LLMs ([park...veitch, 2023](https://arxiv.org/abs/2311.03658)) - concepts can be decoded linearly from representations
- - vec2text: Text Embeddings Reveal (Almost) As Much As Text ([morris et al. 2023](https://arxiv.org/abs/2310.06816)) - invert embeddings to text without using gradients
+ - The Linear Representation Hypothesis and the Geometry of LLMs ([park...veitch, 2023](https://arxiv.org/abs/2311.03658)) - concepts can be decoded linearly from representations
+- Embedding inversions
+ - Generative Embedding Inversion Attack to Recover the Whole Sentence ([li...song, 2023](https://arxiv.org/pdf/2305.03010)) - train projection to LM jointly to reconstruct input
+ - Information Leakage from Embedding in Large Language Models ([wan...wang, 2024](https://arxiv.org/abs/2405.11916))
+ - base embed inversion - directly pass hidden states to the LM head for generation
+ - hotmap embed inversion - find input which yields embedding with greatest cosine similarity
+ - embed parrot - learn a linear mapping to embedding states that is then used to reconstruct the original input
+ - vec2text ([morris et al. 2023](https://arxiv.org/abs/2310.06816)) - invert embeddings to text without using gradients
+ - logit2prompt ([morris, ..., rush, 2024](https://arxiv.org/pdf/2311.13647)) - recover prompt from output logits
+ - output2prompt ([zhang, morris, & shmatikov, 2024](https://arxiv.org/pdf/2405.15012)) - recover prompt from long text outputs (by building a model of the sparse encodings of the outputs)
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval ([sarthi...manning](https://arxiv.org/abs/2401.18059)) - retrieve many docs and cluster/summarize before using
- Seven Failure Points When Engineering a Retrieval Augmented Generation System ([barnet...abdelrazek, 2024](https://arxiv.org/abs/2401.05856))
- Retrieve to Explain: Evidence-driven Predictions with Language Models ([patel...corneil, 2024](https://arxiv.org/pdf/2402.04068.pdf))
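A minimal sketch of the nested-prefix idea referenced in the Matryoshka bullet above: the same embedding is supervised at several truncation lengths so that any prefix remains usable on its own. The pairwise loss below is a generic placeholder, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def matryoshka_loss(emb_a, emb_b, labels, dims=(32, 64, 128, 256)):
    """emb_a, emb_b: (batch, full_dim) paired embeddings; labels: (batch,), 1 = similar pair, 0 = not.
    Apply the same similarity loss at every nested prefix length so truncated embeddings stay useful."""
    total = 0.0
    for m in dims:
        a = F.normalize(emb_a[:, :m], dim=-1)   # l2-normalized embedding[:m], as in the bullet above
        b = F.normalize(emb_b[:, :m], dim=-1)
        sim = (a * b).sum(-1)                   # cosine similarity at truncation m
        total = total + F.binary_cross_entropy_with_logits(sim, labels.float())
    return total / len(dims)

# toy usage
emb_a = torch.randn(8, 256, requires_grad=True)
emb_b = torch.randn(8, 256)
labels = torch.randint(0, 2, (8,))
loss = matryoshka_loss(emb_a, emb_b, labels)
loss.backward()
print(loss.item())
```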
@@ -998,7 +1009,6 @@ mixture of experts models have become popular because of the need for (1) fast s
- Why do These Match? Explaining the Behavior of Image Similarity Models ([plummer…saenko, forsyth, 2020](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123560630.pdf)) - generate saliency map + an attribute based on the salient region
- Towards Visually Explaining Similarity Models ([zheng…wu, 2020](https://arxiv.org/abs/2008.06035)) - similarity of cnn embeddings
- Interpretable entity representations through large-scale typing ([onoe & durrett, 2020](https://arxiv.org/abs/2005.00147)) - embedding is interpretable predictions for different entities
-
- Explaining similarity with different outputs
  - Analogies and Feature Attributions for Model Agnostic Explanation of Similarity Learners ([ramamurthy…tariq, 2022](https://arxiv.org/pdf/2202.01153.pdf)) - returned explanation is an analogy (pair from the training set) rather than a saliency map
  - Sim2Word: Explaining Similarity with Representative Attribute Words via Counterfactual Explanations ([chen…cao, 2023](https://dl.acm.org/doi/full/10.1145/3563039)) - give both saliency map + counterfactual explanation
@@ -1306,6 +1316,7 @@ mixture of experts models have become popular because of the need for (1) fast s
- Task Ambiguity in Humans and Language Models ([tamkin, ..., goodman, 2023](https://arxiv.org/abs/2212.10711))
- Bayesian Preference Elicitation with Language Models ([handa, gal, pavlick, goodman, tamkin, andreas, & li, 2024](https://arxiv.org/pdf/2403.05534v1.pdf))
- STaR-GATE: Teaching Language Models to Ask Clarifying Questions ([andukuri...goodman, 2024](https://arxiv.org/abs/2403.19154))
+ - Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves ([deng...gu, 2024](https://arxiv.org/abs/2311.04205))
- Loose LIPS Sink Ships: Asking Questions in *Battleship* with Language-Informed Program Sampling ([grand, pepe, andreas, & tenenbaum, 2024](https://arxiv.org/pdf/2402.19471.pdf)) - language-informed program sampling (LIPS) model uses large language models (LLMs) to generate NL questions, translate them into symbolic programs, and evaluate their expected info gain
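The "expected info gain" criterion in the Loose LIPS bullet can be computed exactly whenever the hypothesis space and possible answers are enumerable; a toy sketch with a uniform prior and deterministic answers (the hypothesis/answer setup is invented for illustration):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_info_gain(prior, answer_fn, question):
    """prior: dict hypothesis -> probability; answer_fn(question, hypothesis) -> answer.
    EIG(question) = H(prior) - E_over_answers[ H(posterior | answer) ]."""
    hyps = list(prior)
    p = np.array([prior[h] for h in hyps])
    h_prior = entropy(p)
    by_answer = {}                      # group hypotheses by the answer they imply
    for h, ph in zip(hyps, p):
        by_answer.setdefault(answer_fn(question, h), []).append(ph)
    h_post = 0.0
    for probs in by_answer.values():
        probs = np.array(probs)
        p_ans = probs.sum()             # probability of observing this answer
        h_post += p_ans * entropy(probs / p_ans)
    return h_prior - h_post

# toy example: the hidden hypothesis is a number 0-7; question q asks for bit q
prior = {n: 1 / 8 for n in range(8)}
print(expected_info_gain(prior, lambda q, n: (n >> q) & 1, question=0))  # 1.0 bit
```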
@@ -1321,6 +1332,11 @@ mixture of experts models have become popular because of the need for (1) fast s
- see also things in [imodelsX](https://github.com/csinva/imodelsX)
- Can Foundation Models Wrangle Your Data? ([narayan...re, 2022](https://arxiv.org/abs/2205.09911))
- Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning ([vos, dohmen, & schelter, 2024](https://openreview.net/pdf?id=8kyYJs2YkFH))
+ - llms for reading charts
+ - ChartLlama: A Multimodal LLM for Chart Understanding and Generation ([han...zhang, 2023](https://arxiv.org/abs/2311.16483))
+ - Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots ([wu...luo, 2024](https://arxiv.org/abs/2405.07990))
+ - MathVista: Evaluating Math Reasoning in Visual Contexts ([lu...galley, gao, 2024](https://mathvista.github.io/))
+ - Evaluating Task-based Effectiveness of MLLMs on Charts ([wu...tang, 2024](https://arxiv.org/abs/2405.07001)) - evals + chain-of-charts prompting
- modeling
  - TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations ([slack, krishna, lakkaraju, & singh, 2023](https://arxiv.org/abs/2207.04154)) - train model to translate human queries into API calls (~30 calls, things like feature importance, filter data, counterfactual explanation)
  - TalkToEBM: LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs ([lengerich...caruana, 2023](https://arxiv.org/abs/2308.01157)) - use LLMs to analyze tabular data and make suggestions for EBMs
diff --git a/_notes/stat/causal_inference.md b/_notes/stat/causal_inference.md
index 7a6a23cc..047e0a44 100755
--- a/_notes/stat/causal_inference.md
+++ b/_notes/stat/causal_inference.md
@@ -477,6 +477,10 @@ M --> Y
*The emphasis in this section is on ATE estimation, as an example of the considerations required for making causal conclusions. Observational analysis focuses on adjusting for observed confounding.*
+![Screenshot 2024-05-31 at 10.39.08 AM](../assets/Screenshot%202024-05-31%20at%2010.39.08%E2%80%AFAM.png)
+
+![Screenshot 2024-05-31 at 10.39.15 AM](../assets/Screenshot%202024-05-31%20at%2010.39.15%E2%80%AFAM.png)
+
## ATE estimation basics
- assume we are given iid samples of $\{ X_i, T_i, Y_i^{T=1}, Y_i^{T=0} \}$, and drop the index $i$
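For reference, the estimand this section builds toward, written in the potential-outcome notation just introduced (standard definitions, nothing specific to these notes):

$$
\tau := \mathbb E\left[Y^{T=1} - Y^{T=0}\right]
$$

Under randomization (or, in observational data, under unconfoundedness $(Y^{T=1}, Y^{T=0}) \perp T \mid X$ plus overlap $0 < P(T=1 \mid X) < 1$), the ATE is identified from observed data; in a fully randomized experiment the difference-in-means estimator

$$
\hat\tau = \frac{1}{n_1}\sum_{i: T_i = 1} Y_i - \frac{1}{n_0}\sum_{i: T_i = 0} Y_i
$$

is unbiased for $\tau$.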