其乐无穷 LV
Posted on 2025-4-9 16:03:59
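Once you have the basics, read more papers and write more code! Follow the GitHub repo for more timely updates; it collects NLP papers and reproduction code (continuously updated). Let's learn and improve together! (Some closely read papers come with reading notes.)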
nlp-paper-code
Contents
- Surveys
- Pretraining
- Models
- Dialogue systems
- Speech systems
- Datasets
- Evaluation
- Text similarity (matching)
- Deep learning
- Machine learning
Surveys
- A Survey on Dialogue Systems: Recent Advances and New Frontiers: recent research and directions in dialogue systems | Chen et al, 2017
- Recent Advances and Challenges in Task-oriented Dialog Systems | Reading notes: recent research and directions in task-oriented dialogue systems | Zhang et al, 2020
- Pre-trained Models for Natural Language Processing: A Survey | Reading notes: a very detailed summary checklist of pre-trained language models for NLP | Xipeng Qiu et al, 2020
- Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey: a survey of dialogue systems covering new advances and new frontiers | JinJie Ni et al, 2021
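Pretraining
- Character-Aware Neural Language Models: a powerful language model that encodes subword relatedness, addresses the rare-word problem of earlier models, and reaches comparable performance with fewer parameters | Yoon et al, 2015
- Neural Machine Translation of Rare Words with Subword Units: the well-known Byte Pair Encoding (BPE), which repeatedly merges the most frequent pair of adjacent symbols into a new symbol | Sennrich et al, 2015
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models: an excellent framework that translates mainly at the word level but can conveniently fall back to character-level input when needed | Luong et al, 2016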
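- Learning Character-level Representations for Part-of-Speech Tagging: builds word-level representations from the character level; the network convolves over characters to produce word embeddings and applies a fixed window over those embeddings for PoS tagging | dos Santos et al, 2014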
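- A Joint Model for Word Embedding and Word Morphology: shares the goal of word2vec but takes character-level input, using a bidirectional LSTM to capture morphology and infer word roots | Kris et al, 2016
- Enriching Word Vectors with Subword Information: an upgrade of word2vec that gives better representations for rare words and morphologically rich languages; it can be seen as the word2vec skip-gram model extended with character n-grams | Piotr et al, 2016
- Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation: WordPiece, the tokenization used by BERT, builds its vocabulary much like BPE; the difference is that BPE merges the most frequent adjacent symbol pair, whereas WordPiece chooses merges by probability (likelihood) | Yonghui et al, 2016
- Fully Character-Level Neural Machine Translation without Explicit Segmentation: a classic character-level model among subword approaches | Jason et al, 2016
- Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates: the unigram approach, which, given a vocabulary and its associated probabilities, builds the whole vocabulary directly with the objective of maximizing sentence likelihood | Kudo et al, 2018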
- How to Fine-Tune BERT for Text Classification? | Reading notes: fine-tuning experiments with BERT on text classification | Chi Sun et al, 2019
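- Pretraining Methods for Dialog Context Representation Learning | Reading notes: the authors examine four pretraining methods for dialogue context representation, two of which are newly proposed | Shikib et al, 2019
- Pre-trained Models for Natural Language Processing: A Survey | Reading notes: a very detailed summary checklist of pre-trained language models for NLP | Xipeng Qiu et al, 2020
- TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue | Reading notes: a pre-trained natural language understanding model for task-oriented dialogue | Chien-Sheng Wu et al, 2020
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning | Reading notes: LogME, a general and fast scoring method for assessing and selecting pre-trained models suited to a downstream task | Kaichao You et al, 2021
- Are Pre-trained Convolutions Better than Pre-trained Transformers? | Reading notes: replaces the Transformer's attention with convolutions, exploring a new way to build pre-trained models | Yi Tay et al, 2021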
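
To make the Byte Pair Encoding entry above more concrete, here is a minimal sketch of the merge-learning loop in plain Python. It is not the authors' reference implementation: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`), the toy corpus, and the first-seen tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one concatenated symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Ties are broken by first occurrence (an assumption of this sketch).
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Hypothetical toy corpus, used only for illustration.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_corpus, num_merges=3)
print(merges)
```

On this toy corpus the three learned merges build up the shared suffix `est</w>` step by step, which is the kind of frequent subword unit BPE is meant to discover.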
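Models
- Convolutional Neural Networks for Sentence Classification: the classic TextCNN, with static/non-static and other ways of learning feature vectors | Yoon Kim et al, 2014
- Language Modeling with Gated Convolutional Networks | Reading notes: inspired by the LSTM gating mechanism, applies a linear gating mechanism to convolutional architectures; the paper compares the performance of GLU, GTU and other variants | Yann N. Dauphin et al, 2016
- A SIMPLE BUT TOUGH-TO-BEAT BASELINE FOR SENTENCE EMBEDDINGS: Smooth Inverse Frequency (SIF), a simple yet effective sentence embedding method | Sanjeev Arora et al, 2017
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data: InferSent, which obtains sentence embeddings with different encoders, then computes the difference and element-wise product of the two to form an interaction vector, from which similarity is derived | Alexis Conneau et al, 2017
- Attention Is All You Need | Reading notes: the founding work of the Transformer, worth a close read | Ashish et al, 2017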
- Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline: Unsupervised Smooth Inverse Frequency (uSIF), which addresses SIF's sensitivity to vector length and improves substantially on similarity tasks | Kawin Ethayarajh, 2018
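- Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction | Reading notes: a holistic architecture for general sequence-pair modeling that combines multiple attention mechanisms for feature augmentation | Yi Tay et al, 2018
- Sliced Recurrent Neural Networks: Sliced RNN, a model that attempts to break through the sequential constraint of RNNs | Zeping Yu et al, 2018
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Reading notes: the famous BERT, which uses the bidirectional architecture of the Transformer encoder | Devlin et al, 2018
- Pay Less Attention With Lightweight And Dynamic Convolutions | Reading notes: studies lightweight and dynamic convolutions, showing that convolutional architectures can achieve results comparable to self-attention | Felix Wu et al, 2019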
- XLNet: Generalized Autoregressive Pretraining for Language Understanding | Reading notes: XLNet, the revival of autoregressive language models, outperforming BERT on 20 tasks | Zhilin Yang et al, 2019
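- Synthesizer: Rethinking Self-Attention for Transformer Models | Reading notes: explores how self-attention is computed within the Transformer architecture; reading it gives a fresh view of self-attention | Yi Tay et al, 2020
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting | Reading notes: a long-sequence forecasting model reported to outperform the vanilla Transformer, with improvements targeted at the long sequence time-series forecasting (LSTF) problem | Haoyi Zhou et al, 2020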
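
The gated convolution entry above compares GLU and GTU. As a rough illustration, the two gates can be written as element-wise functions of two pre-computed activations. In this sketch `a` and `b` stand for the outputs of two separate convolutions over the same input; those names, and the use of plain Python lists instead of tensors, are assumptions for readability.

```python
import math


def sigmoid(x):
    """Logistic function, used as the gate in both units."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(a, b):
    """Gated Linear Unit: a linear path `a` scaled by the gate sigmoid(b)."""
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def gtu(a, b):
    """Gated Tanh Unit: like GLU, but the gated path goes through tanh."""
    return [math.tanh(ai) * sigmoid(bi) for ai, bi in zip(a, b)]


# Hypothetical activations from the two convolution branches.
a = [2.0, -1.0, 0.5]
b = [0.0, 3.0, -3.0]
print(glu(a, b))
print(gtu(a, b))
```

The only difference is the `tanh` on the gated path: GLU keeps that path linear, so the gradient can flow through it without being squashed, which is the property the paper highlights.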
Dialogue Systems
- The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management: a paper on dialogue state management, useful for background knowledge | Young et al, 2010
- Context Sensitive Spoken Language Understanding Using Role Dependent LSTM Layers: work on SLU using LSTMs; by separating agent and client roles, it can resolve ambiguity in multi-turn dialogue | Hori et al, 2015
- A Neural Conversational Model: a dialogue model with the Seq2Seq architecture | Oriol et al, 2015
- A Network-based End-to-End Trainable Task-oriented Dialogue System | Reading notes: a task-oriented dialogue model architecture well worth reading | Wen et al, 2016
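- Neural Belief Tracker: Data-Driven Dialogue State Tracking | Reading notes: the NBT framework; a good paper for understanding belief state and tracking | Mrkšić et al, 2016
- Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots | Reading notes: SMN, a retrieval-based dialogue model that extracts information at multiple levels and granularities | Yu Wu et al, 2016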
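- Latent Intention Dialogue Models | Reading notes: a framework that uses discrete latent variable models to learn dialogue intentions | Wen et al, 2017
- An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog | Reading notes: a novel end-to-end trainable neural network model for task-oriented dialogue systems | Liu et al, 2017
- Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network | Reading notes: DAM, a retrieval-based dialogue model that extracts multi-level, multi-granularity information entirely with attention | Xiangyang et al, 2018
- Global-Locally Self-Attentive Dialogue State Tracker | Reading notes: global-locally self-attentive state tracking | Zhong et al, 2018
- Dense Passage Retrieval for Open-Domain Question Answering | Reading notes: DPR, an efficient retrieval technique for open-domain question answering that uses BERT for encoding | Karpukhin et al, 2020
- TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue | Reading notes: a pre-trained natural language understanding model for task-oriented dialogue | Chien-Sheng Wu et al, 2020
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering: Fusion-in-Decoder, a generative reading comprehension model | Izacard et al, 2020
- DISTILLING KNOWLEDGE FROM READER TO RETRIEVER FOR QUESTION ANSWERING | Reading notes: an open-domain question answering method in which one model trains another | Izacard et al, 2021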
- Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features: uses controllable features to increase the faithfulness of knowledge-grounded dialogue systems | Rashkin et al, 2021
Speech Systems
- Attention-Based Models for Speech Recognition: the Location Sensitive Attention used by Tacotron2 | Chorowski et al, 2015
- Tacotron: A Fully End-To-End Text-To-Speech Synthesis Model | Reading notes: Tacotron, an end-to-end speech synthesis system | Yuxuan et al, 2017
- Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions | Reading notes: Tacotron2, which performs better than Tacotron and uses WaveNet as the vocoder | Jonathan et al, 2017
- Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese: applies the Transformer to Mandarin speech recognition, using the HKUST dataset | Shiyu et al, 2018
- Neural Speech Synthesis with Transformer Network | Reading notes: inspired by the Transformer, replaces the RNN structures and original attention mechanism in Tacotron2 with multi-head self-attention | Naihan et al, 2018
- A Comparative Study on Transformer vs RNN in Speech Applications | Reading notes: a paper comparing the Transformer with RNNs in speech applications, with model code open-sourced in ESPnet | Karita et al, 2019
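Datasets
- The Second Dialog State Tracking Challenge: the DSTC corpora are dedicated to dialogue state tracking and are classics, though the official site appears to be down | Henderson et al, 2014
- The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems: the Ubuntu unstructured multi-turn dialogue dataset | Ryan Lowe et al, 2015
- DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset | [Dataset link](DailyDialog.zip): a multi-turn dialogue dataset with dialogue intent and emotion labels | Yanran Li et al, 2017
- CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset | Reading notes: the first large-scale Chinese cross-domain task-oriented dialogue dataset | Qi Zhu et al, 2020
- Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining | Dataset link: an upgraded version of DailyDialog with 11K multi-turn dialogue contexts, each with five standard reference responses, five irrelevant responses, and five randomly chosen responses | Ananya B. Sai et al, 2020
- MuTual: A Dataset for Multi-Turn Dialogue Reasoning | Reading notes: the MuTual dataset, for targeted evaluation of models' reasoning ability in multi-turn dialogue | L Cui et al, 2020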
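- [MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines](https://arxiv.org/pdf/2007.12720.pdf) | Reading notes (DengBoCong: paper reading notes: MultiWOZ 2.2): MultiWOZ is a well-known task-oriented dialogue dataset widely used as a benchmark for dialogue state tracking; MultiWOZ 2.2 was the latest version at the time of writing | Zang et al, 2020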
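Evaluation
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning | Reading notes: LogME, a general and fast scoring method for assessing and selecting pre-trained models suited to a downstream task | Kaichao You et al, 2021
- Towards Quantifiable Dialogue Coherence Evaluation: QuantiDCE, a metric model for quantifiable dialogue coherence evaluation | Zheng Ye et al, 2021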
Text Similarity (Matching)
- Siamese Recurrent Architectures for Learning Sentence Similarity: Siamese LSTM, a model for computing the similarity of sentence pairs | Jonas Mueller et al, 2016
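
For the Siamese LSTM entry above, the similarity between two sentences is computed from the final hidden states of the two weight-sharing LSTM branches as exp(-||h_a - h_b||_1). The sketch below shows only that scoring function; the LSTM encoder itself is omitted, and the example vectors are made-up placeholders rather than real hidden states.

```python
import math


def manhattan_similarity(h_a, h_b):
    """Similarity in (0, 1]: exp of the negative L1 distance between states."""
    if len(h_a) != len(h_b):
        raise ValueError("hidden states must have the same dimension")
    l1_distance = sum(abs(x - y) for x, y in zip(h_a, h_b))
    return math.exp(-l1_distance)


# Placeholder vectors standing in for the two branches' final hidden states.
h_left = [0.2, -0.1, 0.7]
h_right = [0.1, -0.1, 0.9]
print(manhattan_similarity(h_left, h_right))
```

Identical states give a score of exactly 1, and the score decays toward 0 as the states move apart, so it can be rescaled and regressed directly against human similarity labels.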
Deep Learning
- NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE: the original Bahdanau Attention paper | Bahdanau et al, 2014
- Convolutional Neural Networks at Constrained Time Cost: a good overview, for convolutional networks, of the trade-offs among computational cost, depth, and filter size | Kaiming He et al, 2014
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | Reading notes: the classic original Batch Normalization paper | Sergey et al, 2015
- Learning both Weights and Connections for Efficient Neural Networks: includes a table listing the relative costs of computation and memory access, and also discusses how to prune neural networks | Song Han et al, 2015
- Effective Approaches to Attention-based Neural Machine Translation: the original Luong Attention paper | Luong et al, 2015
- Strategies for Training Large Vocabulary Neural Language Models | Reading notes: mainly summarizes the softmax and sampling approaches of the time, and also proposes the Differentiated Softmax method | Wenlin Chen et al, 2015
- Exploring the Limits of Language Modeling: the CNN Softmax method; it still relies on the original softmax but takes a different angle and works well | Rafal Jozefowicz et al, 2016
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks: Weight Normalization normalizes along the weight dimension | Tim Salimans et al, 2016
- Layer Normalization | Reading notes: layer normalization, an improvement on Batch Normalization | Jimmy et al, 2016
- Instance Normalization: The Missing Ingredient for Fast Stylization: Instance Normalization is an algorithm not constrained by batch size, designed specifically for the generator network in Texture Networks | Dmitry Ulyanov et al, 2016
- Efficient softmax approximation for GPUs | Reading notes: Adaptive Softmax, tailored to GPU matrix computation, achieving a severalfold speedup over ordinary softmax; worth a look | Edouard Grave et al, 2016
- Large-Margin Softmax Loss for Convolutional Neural Networks | Reading notes: L-Softmax adds a control coefficient m to the original softmax so that intra-class distances become as small as possible and inter-class distances as large as possible | Weiyang Liu et al, 2016
- An empirical analysis of the optimization of deep network loss surfaces: the paper concludes that Batch Normalization is more favorable for gradient descent | Im et al, 2016
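- Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks: Cosine Normalization replaces the unbounded dot product of vectors with the cosine of the angle between them, thereby normalizing | Luo Chunjie et al, 2017
- Massive Exploration of Neural Machine Translation Architectures | Reading notes: presents the first large-scale analysis, taking NMT architecture hyperparameters as the example; the experiments yield novel insights and practical advice for building and extending NMT architectures | Denny et al, 2017
- SphereFace: Deep Hypersphere Embedding for Face Recognition | Reading notes: A-Softmax, similar in idea to L-Softmax, the difference being that the weights are normalized | Weiyang Liu et al, 2017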
- ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections | Reading notes: a joint framework called ProjectionNet that can train lightweight on-device models for different machine learning model architectures | Sujith Ravi, 2017
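- Additive Margin Softmax for Face Verification | Reading notes: the main difference between AM-Softmax and A-Softmax is that AM applies an additive margin to the cosine (cos θ - m), whereas A applies a multiplicative margin to the angle (cos mθ) | Feng Wang et al, 2018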
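- Self-Attention with Relative Position Representations | Reading notes: discusses the position encoding used in the Transformer and modifies self-attention to use relative position representations instead of absolute position encodings | Shaw et al, 2018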
- Group Normalization: Group Normalization divides the input channels into smaller groups and normalizes the values using each group's mean and variance | Yuxin Wu et al, 2018
- How Does Batch Normalization Help Optimization?: discusses how Batch Normalization helps the optimizer; the main conclusion is that BN layers make the loss landscape smoother | Shibani et al, 2018
- Scheduled Sampling for Transformers | Reading notes: applies Scheduled Sampling to the Transformer | Mihaylova et al, 2019
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding | Reading notes: discusses why decoding in Seq2Seq models may fail to terminate | Sean Welleck et al, 2020
- PowerNorm: Rethinking Batch Normalization in Transformers: offers empirical and theoretical analysis of why BN performs poorly in Transformers | Sheng Shen et al, 2020
- A Theoretical Analysis of the Repetition Problem in Text Generation | Reading notes: discusses why Seq2Seq models generate repetitions during decoding | Zihao Fu et al, 2020
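
Several entries above concern normalization layers. As a small illustration of the Group Normalization idea (split the channels into groups, then normalize each group by its own mean and variance), here is a simplified sketch. It assumes a single sample at a single spatial position, so the input is just a flat list of channel values, and it omits the learnable scale and shift; the function name and `eps` default are choices made for this example.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize a flat list of channel values group by group."""
    if len(x) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = len(x) // num_groups
    out = []
    for g in range(num_groups):
        chunk = x[g * size:(g + 1) * size]
        mean = sum(chunk) / size
        var = sum((v - mean) ** 2 for v in chunk) / size
        # eps keeps the division stable when a group has near-zero variance.
        out.extend((v - mean) / math.sqrt(var + eps) for v in chunk)
    return out


# Four hypothetical channel activations on very different scales.
channels = [1.0, 3.0, 10.0, 30.0]
print(group_norm(channels, num_groups=2))  # each pair normalized separately
print(group_norm(channels, num_groups=1))  # one group over all channels
```

With `num_groups=1` all channels share one set of statistics, which in this single-position setting matches normalizing over the whole feature vector as Layer Normalization does. None of the statistics depend on other samples in the batch, which is the property that separates these methods from Batch Normalization.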
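Machine Learning
- Optimal Whitening and Decorrelation: provides mathematical proofs for five whitening methods | Agnan Kessy et al, 2015
- An overview of gradient descent optimization algorithms | Reading notes: an overview of the mainstream gradient descent algorithms | Sebastian Ruder et al, 2016
- Covariate Shift: A Review and Analysis on Classifiers | Reading notes: uses several classification algorithms on four different datasets to analyze performance after applying several methods for handling covariate shift | Geeta et al, 2019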
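
As a companion to the gradient descent overview entry above, the following sketch contrasts plain gradient descent with the momentum update (v = γ·v + η·∇f(w), then w = w - v) on a one-dimensional quadratic. The objective f(w) = w², the starting point, and the hyperparameter values are all assumptions picked to keep the example tiny; they are not taken from the paper.

```python
def sgd(grad, w, lr=0.1, steps=50):
    """Plain gradient descent: step against the gradient each iteration."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def momentum(grad, w, lr=0.1, gamma=0.9, steps=50):
    """Gradient descent with momentum: accumulate a decaying velocity."""
    v = 0.0
    for _ in range(steps):
        v = gamma * v + lr * grad(w)
        w -= v
    return w


def grad_quadratic(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


print(sgd(grad_quadratic, 5.0))
print(momentum(grad_quadratic, 5.0))
```

On a well-conditioned problem like this one, plain descent already converges quickly and momentum mostly adds oscillation; the benefit of the velocity term shows up on ill-conditioned, ravine-like loss surfaces, which a one-dimensional toy cannot reproduce.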