MS Marco Data from "Leading Conversational Search by Suggesting Useful Questions" (2020) | Microsoft Machine Reading Comprehension (MS MARCO) is a new ... | dataset | Full Record |
Reuters-21578 | Currently the most widely used test collection for text ... | dataset | Full Record |
The EMILLE Corpus | The EMILLE Corpus has been constructed as part of ... | dataset | Full Record |
English Language Stop Words | This list of stop words is more extensive than ... | dataset | Full Record |
MS Marco Keyphrase Extraction Dataset | Keyphrase extraction on open domain document is an up ... | dataset | Full Record |
MS Marco Optimal Crawling Dataset | The dataset used for Optimal Freshness Crawl Under Politeness ... | dataset | Full Record |
Eighteenth-Century Poetry Archive | The Eighteenth-Century Poetry Archive (ECPA) — eighteenthcenturypoetry.org — is ... | dataset | Full Record |
Thomas Gray Archive | The Thomas Gray Archive (TGA) — thomasgray.org — is ... | dataset | Full Record |
Plaintext Jokes | Approximately 208,000 jokes scraped from various websites ... | dataset | Full Record |
Data from "The Dative Alternation Revisited: Fresh Insights from Contemporary British Spoken Data" | ... | dataset | Full Record |