What are the must-read classic papers in NLP?

5 answers

漏网之鱼

Posted on 2025-4-9 13:41:23

Let me recommend the 8 most important papers in NLP (a list of 8 selected according to the standard evaluation system of 学术范 / xueshufan.com):

1. Deep contextualized word representations
Authors: Matthew E. Peters / Mark Neumann / Mohit Iyyer / Matt Gardner / Christopher Clark / ... / Luke Zettlemoyer
Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
Full text: 学术范 (xueshufan.com)
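As a hedged aside (my own summary, not part of the paper text above; notation approximate): the task-specific ELMo vector is a learned scalar mixture of the biLM's layer states:

```latex
% Hedged sketch of the ELMo mixture (notation approximate).
% h_{k,j}^{LM}: j-th biLM layer state for token k; s^{task}: softmax-normalized
% layer weights; gamma^{task}: a learned scalar that scales the whole vector.
\mathrm{ELMo}_{k}^{task} \;=\; \gamma^{task} \sum_{j=0}^{L} s_{j}^{task}\, \mathbf{h}_{k,j}^{LM}
```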

2. GloVe: Global Vectors for Word Representation
Authors: Jeffrey Pennington / Richard Socher / Christopher D. Manning
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
Full text: 学术范 (xueshufan.com)

3. SQuAD: 100,000+ Questions for Machine Comprehension of Text
Authors: Pranav Rajpurkar / Jian Zhang / Konstantin Lopyrev / Percy Liang
Abstract: We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at this https URL
Full text: 学术范 (xueshufan.com)

4. GloVe: Global Vectors for Word Representation
Authors: Jeffrey Pennington / Richard Socher / Christopher D. Manning
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
Full text: 学术范
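As a hedged aside (my own summary, not part of the answer above; notation approximate): the weighted least-squares objective that GloVe fits over the nonzero co-occurrence counts looks roughly like this:

```latex
% Hedged sketch of the GloVe objective (notation approximate).
% X_{ij}: word-word co-occurrence count; f: weighting function that caps the
% influence of very frequent pairs; w, \tilde{w}, b, \tilde{b}: learned parameters.
J \;=\; \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_{i}^{\top}\tilde{w}_{j} + b_{i} + \tilde{b}_{j} - \log X_{ij}\bigr)^{2}
```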

5. Sequence to Sequence Learning with Neural Networks
Authors: Ilya Sutskever / Oriol Vinyals / Quoc V. Le
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Full text: 学术范
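As a hedged aside (my own paraphrase; notation approximate): the encoder LSTM compresses the source sentence into a fixed-dimensional vector v, and the decoder LSTM factorizes the target probability autoregressively:

```latex
% Hedged sketch of the seq2seq factorization (notation approximate).
% v: fixed-dimensional encoding of the input sequence x_1..x_T.
p(y_{1},\dots,y_{T'} \mid x_{1},\dots,x_{T}) \;=\; \prod_{t=1}^{T'} p\bigl(y_{t} \mid v,\, y_{1},\dots,y_{t-1}\bigr)
```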

6. The Stanford CoreNLP Natural Language Processing Toolkit
Authors: Christopher D. Manning / Mihai Surdeanu / John Bauer / Jenny Finkel / Steven J. Bethard / David McClosky
Abstract: We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
Full text: 学术范
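A minimal, hedged usage sketch (my own addition; the paper itself documents the Java API): this assumes the Python stanza package's CoreNLPClient wrapper and a local CoreNLP installation pointed to by CORENLP_HOME; parameter and field names may differ across versions, so treat it as an approximation and check the current docs.

```python
# Hedged sketch: driving Stanford CoreNLP from Python via stanza's client wrapper.
# Assumes CoreNLP is installed locally and CORENLP_HOME points at it.
from stanza.server import CoreNLPClient

text = "Stanford CoreNLP provides an extensible pipeline of annotators."

# Start a CoreNLP server, run a few annotators, and shut it down on exit.
with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "lemma", "ner"],
                   timeout=30000, memory="4G") as client:
    ann = client.annotate(text)        # returns a protobuf Document
    for sentence in ann.sentence:      # field names follow CoreNLP's protobuf schema
        for token in sentence.token:
            print(token.word, token.pos, token.ner)
```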

7. Distributed Representations of Words and Phrases and their Compositionality
Authors: Tomas Mikolov / Ilya Sutskever / Kai Chen / Greg Corrado / Jeffrey Dean
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
Full text: 学术范
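As a hedged aside (my own summary; notation approximate): the negative-sampling objective mentioned in the abstract replaces the full softmax with k sampled noise words per observed (input, output) word pair:

```latex
% Hedged sketch of the negative-sampling objective for one (w_I, w_O) pair.
% sigma: logistic sigmoid; P_n(w): noise distribution; k: number of negative samples.
\log \sigma\!\bigl({v'_{w_O}}^{\!\top} v_{w_I}\bigr)
  \;+\; \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\!\Bigl[\log \sigma\!\bigl(-{v'_{w_i}}^{\!\top} v_{w_I}\bigr)\Bigr]
```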

8. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Authors: Richard Socher / Alex Perelygin / Jean Y. Wu / Jason Chuang / Christopher D. Manning / Andrew Y. Ng / Christopher Potts
Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
Full text: 学术范
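As a hedged aside (my recollection of the paper's composition function; notation approximate): the Recursive Neural Tensor Network combines two child vectors a and b with a tensor term on top of the usual affine term:

```latex
% Hedged sketch of the RNTN composition (notation approximate).
% V^{[1:d]}: a third-order tensor whose d slices each produce one output dimension;
% W: a standard weight matrix; f: an elementwise nonlinearity such as tanh.
h \;=\; f\!\left(
  \begin{bmatrix} a \\ b \end{bmatrix}^{\!\top} V^{[1:d]} \begin{bmatrix} a \\ b \end{bmatrix}
  \;+\; W \begin{bmatrix} a \\ b \end{bmatrix}
\right)
```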

Hope this helps!

固执的蛮牛

Posted on 2025-4-9 13:51:11

I keep adding the papers I read to this repo, and for the ones I read closely I also write up reading notes!
https://github.com/DengBoCong/nlp-paper

baiyuting

Posted on 2025-4-9 14:00:37

Rather than list papers, let me explain how to find the important ones yourself: teach a person to fish, as they say.
First, the backward approach: start from new papers and search backwards.
Look at the most recent NLP conferences, ACL 2019 for example, and within the area you are interested in pick out the award-winning papers together with those that have accumulated large citation counts, then read them. That is round one.
Next, from those papers choose the ones you found most helpful and go through their reference lists. Papers cited by many of the round-one papers become your round-two papers, and so on.
Second, the forward approach: start from a classic paper and search forward for the important papers that build on it.
A search engine's citation counts already tell you which papers matter, and important papers form a network, so given one node you can quickly cover the whole graph. For example, look up BERT as in Figure 1, click its "Cited by" link to get Figure 2, and use the citation counts again to find the important papers. Iterate like this. A rough code sketch of this forward search is given after the figures below.

[Figure 1: BERT search result in Google Scholar]

[Figure 2: papers citing BERT in Google Scholar]
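As a hedged illustration of the forward search described above (my own sketch, not from the answer): fetch_citing_papers is a hypothetical stand-in for whatever citation source you actually query; only the traversal logic, i.e. the iterative citation-count-ranked expansion of the graph, is the point.

```python
# Hedged sketch of the "forward search" over a citation graph.
# fetch_citing_papers() is a hypothetical placeholder for a real citation API;
# only the round-by-round expansion logic is meant literally.
from collections import deque
from typing import Dict, List


def fetch_citing_papers(paper_id: str) -> List[Dict]:
    """Hypothetical: return papers citing `paper_id`, each as
    {"id": ..., "title": ..., "citations": ...}."""
    raise NotImplementedError("plug in your citation source here")


def forward_search(seed_id: str, min_citations: int = 500, rounds: int = 2) -> List[Dict]:
    """Expand outward from a classic paper, keeping only highly cited descendants."""
    seen = {seed_id}
    frontier = deque([seed_id])
    important: List[Dict] = []
    for _ in range(rounds):                  # each round is one hop in the citation graph
        next_frontier: deque = deque()
        while frontier:
            pid = frontier.popleft()
            for paper in fetch_citing_papers(pid):
                if paper["id"] in seen:
                    continue
                seen.add(paper["id"])
                if paper["citations"] >= min_citations:   # citation count as importance filter
                    important.append(paper)
                    next_frontier.append(paper["id"])     # only expand from important papers
        frontier = next_frontier
    return sorted(important, key=lambda p: p["citations"], reverse=True)


# Usage idea: forward_search("<BERT paper id>") would return highly cited papers
# reachable within two citation hops of BERT, sorted by citation count.
```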

shower

Posted on 2025-4-9 14:13:53

Update (Aug 2021): two survey write-ups I wrote earlier on interesting directions:
prompt: Prompt Based Task Reformulation in NLP (survey)
edit-based generation: a survey of edit-based text generation

Below are some general NLP topics; for anything earlier than these, reading the textbook is more comprehensive: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Beyond that, you should first settle on a subfield and then look for papers.

  • The deep-learning-for-NLP framework had largely taken shape by 2008: A unified architecture for natural language processing: deep neural networks with multitask learning
  • Topic models, LDA: Latent dirichlet allocation
  • Conditional random fields: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data; Bidirectional LSTM-CRF Models for Sequence Tagging
  • Word vectors, word2vec: Efficient Estimation of Word Representations in Vector Space; GloVe: Global Vectors for Word Representation (see the short demo after this list)
  • Gated RNNs: Long short-term memory; On the Properties of Neural Machine Translation: Encoder-Decoder Approaches; Empirical evaluation of gated recurrent neural networks on sequence modeling
  • CNNs: Convolutional Neural Networks for Sentence Classification
  • RNN-based seq2seq (most text generation follows the seq2seq template): Sequence to Sequence Learning with Neural Networks; plus the attention mechanism that later made NMT famous: Neural Machine Translation by Jointly Learning to Align and Translate
  • Two non-RNN seq2seq architectures: Convolutional Sequence to Sequence Learning; Attention Is All You Need
  • Pretrained language models: Deep Contextualized Word Representations; Language Models are Unsupervised Multitask Learners; BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
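Referring back to the word-vectors item above, a minimal hedged demo (my own addition; it assumes gensim and a pretrained embedding file you have downloaded yourself, e.g. the GoogleNews word2vec binary, whose path below is just a placeholder):

```python
# Hedged demo of the word-vector arithmetic the word2vec/GloVe papers describe.
# The embedding file is NOT bundled with gensim; download it separately.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# The classic analogy: king - man + woman should land near "queen".
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Nearest neighbours illustrate the "fine-grained semantic regularities"
# that the abstracts above talk about.
print(vectors.most_similar("Paris", topn=5))
```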

璀璨千阳

Posted on 2025-4-9 14:27:28

BERT is a must-read, of course, so I won't say more about it.
I mainly work on textual entailment and reading comprehension, so here are a few papers I consider classics.
Earlier work was mostly built on CNN and LSTM architectures:
ESIM [1], BiMPM [2], DIIN [3], CAFE [4], QANet [5], DrQA [6], etc.
Later work mostly builds on and modifies the Transformer:
GPT [7], BERT [8], UniLM [9], etc.
These are just a few papers that I like and that have been influential; there are many more to add.
[1] Enhanced LSTM for Natural Language Inference
[2] Bilateral Multi-Perspective Matching for Natural Language Sentences
[3] Natural Language Inference over Interaction Space
[4] Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference
[5] QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
[6] Reading Wikipedia to Answer Open-Domain Questions
[7] Improving Language Understanding by Generative Pre-Training
[8] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[9] Unified Language Model Pre-training for Natural Language Understanding and Generation
