Language Modeling Reading List (to Start Your Paper Club)

This post provides a reading list for language modeling, covering key topics such as attention mechanisms, the Transformer architecture, pre-training methods, model scale and efficiency, fine-tuning techniques, and retrieval-augmented generation. The entries are drawn from top NLP papers of recent years, and the list suits researchers who want to follow the latest progress in language models.

📚 This post compiles a language modeling reading list covering key topics such as attention mechanisms, the Transformer architecture, pre-training methods (e.g., BERT, GPT), model scale and efficiency (e.g., Scaling Laws, Chinchilla), fine-tuning techniques (e.g., LoRA, QLoRA), and retrieval-augmented generation (e.g., RAG, DPR).

🔍 The entries are selected from leading NLP papers of recent years, such as Attention Is All You Need, BERT, the GPT series, T5, Chinchilla, and LLaMA, and aim to help readers grasp the latest progress and core ideas in language modeling.

📖 The list is organized by topic and gives a brief introduction and the core contribution of each paper, so readers can quickly grasp the key information and use it as a reference for writing papers or doing related research.

    Attention Is All You Need: Query, Key, and Value are all you need* (*Also position embeddings, multiple heads, feed-forward layers, skip-connections, etc.; a minimal single-head sketch follows this list)

    GPT: Improving Language Understanding by Generative Pre-Training: Decoder is all you need* (*Also, pre-training + finetuning)

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: Encoder is all you need*. Left-to-right language modeling is NOT all you need. (*Also, pre-training + finetuning)

    T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer: Encoder-only or decoder-only is NOT all you need, though text-to-text is all you need* (*Also, pre-training + finetuning)

    GPT2: Language Models are Unsupervised Multitask Learners: Unsupervised pre-training is all you need?!

    GPT3: Language Models are Few-Shot Learners: Unsupervised pre-training + a few* examples is all you need. (*From 5 examples, in Conversational QA, to 50 examples in Winogrande, PhysicalQA, and TriviaQA)

    Scaling Laws for Neural Language Models: Larger models trained on less data* are what you need. (*10x more compute should be spent on a 5.5x larger model and 1.8x more tokens; the arithmetic behind this split, and Chinchilla's below, is sketched after this list)

    Chinchilla: Training Compute-Optimal Large Language Models: Smaller models trained on more data* are what you need. (*10x more compute should be spent on a 3.2x larger model and 3.2x more tokens)

    LLaMA: Open and Efficient Foundation Language Models: Smoler models trained longer—on public data—is all you need

    InstructGPT: Training language models to follow instructions with human feedback: 40 labelers are all you need* (*Plus supervised fine-tuning, reward modeling, and PPO)

    LoRA: Low-Rank Adaptation of Large Language Models: One rank is all you need (a minimal LoRA adapter is sketched after this list)

    QLoRA: Efficient Finetuning of Quantized LLMs: 4-bit is all you need* (*Plus double quantization and paged optimizers)

    DPR: Dense Passage Retrieval for Open-Domain Question Answering: Dense embeddings are all you need* (*Also, high precision retrieval)

    RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: Semi-parametric models* are all you need (*Dense vector retrieval as non-parametric component; pre-trained LLM as parametric component)

    RETRO: Improving language models by retrieving from trillions of tokens: Retrieving based on input chunks and chunked cross attention are all you need

    Internet-augmented language models through few-shot prompting for open-domain question answering: Google Search as retrieval is all you need

    HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels: LLM-generated, hypothetical documents are all you need

    FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness: For-loops in SRAM are all you need (the online-softmax loop behind this is sketched after this list)

    ALiBi; Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation: Constant bias on the query-key dot-product is all you need* (*Also hyperparameter m and cached Q, K, V representations; the bias term is sketched after this list)

    Codex: Evaluating Large Language Models Trained on Code: Finetuning on code is all you need

    Layer Normalization: Consistent mean and variance at each layer is all you need

    On Layer Normalization in the Transformer Architecture: Pre-layer norm, instead of post-layer norm, is all you need (both arrangements are sketched after this list)

    PPO: Proximal Policy Optimization Algorithms: Clipping your surrogate function is all you need (the clipped objective is sketched after this list)

    WizardCoder: Empowering Code Large Language Models with Evol-Instruct: Asking the model to make the question harder is all you need* (*Where do they get the responses to these harder questions though?!)

    Llama 2: Open Foundation and Fine-Tuned Chat Models: Iterative finetuning, PPO, rejection sampling, and ghost attention is all you need* (*Also, 27,540 SFT annotations and more than 1 million binary comparison preference data)

    RWKV: Reinventing RNNs for the Transformer Era: Linear attention during inference, via RNNs, is what you need

    RLAIF - Constitutional AI: Harmlessness from AI Feedback: A natural language constitution* and model feedback on harmlessness is all you need (*16 different variants of harmlessness principles)

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer: Noise in your softmax and expert regularization are all you need

    CLIP: Learning Transferable Visual Models From Natural Language Supervision: A projection layer between text and image embeddings is all you need* (*Also, 400 million image-text pairs)

    ViT; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: Flattened 2D patches are all you need

    Generative Agents: Interactive Simulacra of Human Behavior: Reflection, memory, and retrieval are all you need

    Out-of-Domain Finetuning to Bootstrap Hallucination Detection: Open-source, permissive-use data is what you need

    DPO; Direct Preference Optimization: Your Language Model is Secretly a Reward Model: A separate reward model is NOT what you need (the DPO loss is sketched after this list)

    Consistency Models: Mapping to how diffusion adds Gaussian noise to images is all you need

    LCM; Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference: Consistency modeling in latent space is all you need* (*Also, a diffusion model to distill from)

    LCM-LoRA: A Universal Stable-Diffusion Acceleration Module: Combining LoRAs is all you need

    Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models: Asking the LLM to reflect on retrieved documents is all you need

    Emergent Abilities of Large Language Models: The Bitter Lesson is all you need

    Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions: The Bellman equation and replay buffers are all you need

    Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations: Classification guidelines and the multiple-choice response are all you need

    \(\text{REST}^{EM}\); Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models: Synthetic data and a reward function are all you need

    Mixture of Experts Explained: Conditional computation and sparsity are all you need

    SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models: Generator and discriminator are all you need.

    Self-Instruct: Aligning Language Models with Self-Generated Instructions: 54% valid instruction-input-output tuples is all you need.

    Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling: Well documented, publicly available model checkpoints are all you need.

    Self-Rewarding Language Models: Asking the model to evaluate itself is all you need.

    Building Your Own Product Copilot - Challenges, Opportunities, and Needs: Prompt engineering LLMs is NOT all you need.

    Matryoshka Representation Learning: Aggregated losses across \(2^n\)-dim embeddings is all you need (a nested-prefix loss is sketched after this list).

    Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems: Bigger GPUs is not all you need.

    How to Generate and Use Synthetic Data for Finetuning: Synthetic data is almost all you need.

    Whisper: Robust Speech Recognition via Large-Scale Weak Supervision: 680k hrs of audio and multitask formulated as a sequence is all you need.
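
A few of the entries above name a mechanism compact enough to sketch in code; the sketches below follow the list order and are illustrative only. First, for the Attention Is All You Need entry: minimal single-head scaled dot-product attention, \(\text{softmax}(QK^\top/\sqrt{d})\,V\), with multi-head projections, masking, and position embeddings left out; shapes and values are arbitrary.

```python
# Minimal single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
# Multi-head projections, masking, and position embeddings are left out.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (n_queries, n_keys) similarity matrix
    return softmax(scores) @ V      # weighted average of the values

Q = np.random.randn(4, 8)   # 4 query positions, dim 8
K = np.random.randn(6, 8)   # 6 key positions
V = np.random.randn(6, 8)
print(attention(Q, K, V).shape)    # (4, 8)
```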
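
For the Scaling Laws and Chinchilla entries: the quoted multipliers follow from the fitted exponents for how compute-optimal model size \(N\) and training tokens \(D\) scale with compute \(C\), roughly \(N \propto C^{0.73}, D \propto C^{0.27}\) in Kaplan et al. versus \(N \propto C^{0.5}, D \propto C^{0.5}\) in Chinchilla (exponent values here are approximate). A quick check of the 10x-compute case:

```python
# How the "10x more compute" splits in the two entries fall out of the fitted
# exponents N_opt ∝ C^a, D_opt ∝ C^b (exponent values are approximate).
kaplan = dict(a=0.73, b=0.27)      # Kaplan et al. (2020)
chinchilla = dict(a=0.50, b=0.50)  # Hoffmann et al. (2022)

for name, fit in [("Kaplan", kaplan), ("Chinchilla", chinchilla)]:
    model_x = 10 ** fit["a"]   # how much larger the model should get
    tokens_x = 10 ** fit["b"]  # how many more tokens it should see
    print(f"{name}: {model_x:.1f}x larger model, {tokens_x:.1f}x more tokens")
# Kaplan:     5.4x larger model, 1.9x more tokens  (~5.5x / 1.8x as quoted above)
# Chinchilla: 3.2x larger model, 3.2x more tokens
```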
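
For the LoRA entry: a minimal low-rank adapter around a frozen linear layer, computing \(Wx + \frac{\alpha}{r}BAx\) with \(B\) initialized to zero so the adapted layer starts out identical to the pretrained one. A sketch, not the reference implementation; layer sizes are arbitrary.

```python
# Minimal LoRA adapter: y = W x + (alpha / r) * B A x, with W frozen and only
# the rank-r matrices A and B trained. B starts at zero, so the adapted layer
# initially matches the pretrained one.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 1, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), r=1)         # "one rank"
print(layer(torch.randn(2, 512)).shape)              # torch.Size([2, 512])
```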
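
For the FlashAttention entry: the "for-loop" is a blocked pass over keys and values with an online softmax, so the full score matrix never has to be materialized. A single-query NumPy sketch of that accumulation (the real kernel also tiles queries, runs in SRAM, and handles masking and the backward pass):

```python
# Online-softmax attention for a single query: process keys/values in blocks,
# keeping only a running max (m), denominator (l), and un-normalized output (o).
import numpy as np

def online_attention(q, K, V, block=4):
    m = -np.inf                      # running max of the scores seen so far
    l = 0.0                          # running softmax denominator
    o = np.zeros(V.shape[1])         # running un-normalized output
    for start in range(0, K.shape[0], block):
        k, v = K[start:start + block], V[start:start + block]
        s = k @ q / np.sqrt(q.shape[0])   # scores for this block of keys
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)         # rescale previously accumulated stats
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        o = o * scale + p @ v
        m = m_new
    return o / l                          # equals softmax(K q / sqrt(d)) @ V

q = np.random.randn(8)
K, V = np.random.randn(16, 8), np.random.randn(16, 8)
s = K @ q / np.sqrt(8)
weights = np.exp(s - s.max()) / np.exp(s - s.max()).sum()
print(np.allclose(online_attention(q, K, V), weights @ V))   # True
```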
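
For the ALiBi entry: a head-specific slope \(m\) scales a linear distance penalty that is added directly to the query-key scores, in place of position embeddings. A causal, single-head sketch; the slope value here is arbitrary (the paper uses a geometric sequence of slopes across heads):

```python
# ALiBi: add a linear, distance-proportional bias to the attention scores
# instead of using position embeddings. Causal, single-head sketch.
import numpy as np

def alibi_scores(Q, K, slope=0.25):
    n, d = Q.shape                          # causal self-attention: Q, K are (n, d)
    scores = Q @ K.T / np.sqrt(d)
    i = np.arange(n)[:, None]               # query positions
    j = np.arange(n)[None, :]               # key positions
    scores = scores - slope * (i - j)       # 0 on the diagonal, more negative further back
    scores[j > i] = -np.inf                 # causal mask: never attend to the future
    return scores                           # softmax over the last axis follows as usual

X = np.random.randn(5, 8)
print(alibi_scores(X, X).round(2))
```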
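
For the two layer-normalization entries: the difference is only where the LayerNorm sits relative to the residual connection. Post-LN (the original Transformer) normalizes after the residual add; pre-LN normalizes the sub-layer input, which tends to train more stably. A sketch with a stand-in sub-layer:

```python
# Post-LN vs pre-LN residual blocks; `sublayer` stands in for attention or the FFN.
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):          # original Transformer arrangement
    def __init__(self, d, sublayer):
        super().__init__()
        self.sublayer, self.norm = sublayer, nn.LayerNorm(d)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))    # normalize after the residual add

class PreLNBlock(nn.Module):           # normalize the sub-layer input instead
    def __init__(self, d, sublayer):
        super().__init__()
        self.sublayer, self.norm = sublayer, nn.LayerNorm(d)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))    # residual path stays un-normalized

block = PreLNBlock(64, nn.Linear(64, 64))
print(block(torch.randn(2, 64)).shape)            # torch.Size([2, 64])
```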
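
For the PPO entry: the clipped surrogate objective \(\mathbb{E}[\min(rA,\ \text{clip}(r, 1-\epsilon, 1+\epsilon)A)]\), where \(r\) is the new-to-old policy probability ratio and \(A\) the advantage estimate. A NumPy sketch of the per-sample term with toy values:

```python
# PPO clipped surrogate: E[ min(r * A, clip(r, 1 - eps, 1 + eps) * A) ],
# where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate.
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()   # maximized; negate to use as a loss

ratio = np.array([0.7, 1.0, 1.5])        # toy new/old policy probability ratios
advantage = np.array([1.0, -0.5, 2.0])   # toy advantage estimates
print(ppo_clip_objective(ratio, advantage))
```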
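
For the DPO entry: the loss is computed directly from the policy's and a frozen reference model's log-probabilities of the chosen (\(y_w\)) and rejected (\(y_l\)) responses, with no separate reward model. A sketch with toy log-probabilities:

```python
# DPO loss: -log sigmoid( beta * [ (logp_w - ref_logp_w) - (logp_l - ref_logp_l) ] ),
# computed from sequence log-probs of the chosen (w) and rejected (l) responses
# under the policy and a frozen reference model.
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))   # -log sigmoid(beta * margin)

# toy log-probs: the policy already slightly prefers the chosen response
print(dpo_loss(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-13.0, ref_logp_l=-14.0))
```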
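
For the Matryoshka Representation Learning entry: the task loss is applied to nested \(2^n\)-dim prefixes of the embedding and the per-prefix losses are aggregated (the paper also allows per-prefix weights), so truncated embeddings remain usable. A PyTorch sketch with cross-entropy as the task loss; prefix sizes and the classifier-per-prefix setup are illustrative.

```python
# Matryoshka-style training: apply the task loss to nested prefixes of the
# embedding (first 8, 16, ..., 256 dims here) and sum, so truncated embeddings
# stay usable. One linear classifier per prefix size; uniform weights for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

dims = [8, 16, 32, 64, 128, 256]                           # 2^3 .. 2^8 prefixes
heads = nn.ModuleList([nn.Linear(d, 10) for d in dims])    # one classifier per prefix

def matryoshka_loss(embedding, labels):
    loss = 0.0
    for d, head in zip(dims, heads):
        logits = head(embedding[:, :d])                    # use only the first d dims
        loss = loss + F.cross_entropy(logits, labels)
    return loss

emb = torch.randn(4, 256)                                  # batch of full-size embeddings
labels = torch.randint(0, 10, (4,))
print(matryoshka_loss(emb, labels))
```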

@article{yan2024default,
  title   = {Language Modeling Reading List (to Start Your Paper Club)},
  author  = {Yan, Ziyou},
  journal = {eugeneyan.com},
  year    = {2024},
  month   = {Jan},
  url     = {https://eugeneyan.com/writing/llm-reading-list/}
}
