EarlyPrint Corpus

Description

EarlyPrint is a collaborative effort, based jointly at Northwestern University and Washington University in St. Louis, to transform the early English print record, from 1473 to the early 1700s, into a linguistically annotated and deeply searchable text archive. The project has been led by Joseph Loewenstein and Martin Mueller. Faculty, librarians, IT professionals, and students at Amherst, Northwestern, Notre Dame, Nebraska-Lincoln, Tübingen, and Washington University in St. Louis have contributed to the project, notably Anupam Basu, Craig Berry, John Ladd, Philip Burns, Douglas Knox, Stephen Pentecost, Kate Needham, Elisabeth Chaghafi, Peter Berek, Tracy Bergstrom, Daniel Johnson, Eric Lease Morgan, Hannah Bredar, Brian Pytlik Zillig, and Lydia Zoells. The corpus on Bitbucket currently contains approximately 52,000 EEBO-TCP-derived texts with linguistic tagging and other enhancements. The link leads to a combined repository that uses one git submodule for each subdirectory of the texts directory. Each submodule is named after the first three characters of the TCP identifiers of the texts it contains. Consult the Bitbucket README for more information, including instructions for downloading the corpus.
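
The directory layout described above can be illustrated with a minimal sketch. It assumes TCP identifiers of the form `A00001`, a top-level directory named `texts`, and one `.xml` file per text named after its identifier. The function names, the directory name, and the file suffix are hypothetical and should be checked against the Bitbucket README.

```python
from pathlib import PurePosixPath


def submodule_name(tcp_id: str) -> str:
    """Return the submodule name: the first three characters of a TCP identifier."""
    tcp_id = tcp_id.strip()
    if len(tcp_id) < 3:
        raise ValueError(f"TCP identifier too short: {tcp_id!r}")
    return tcp_id[:3]


def text_path(tcp_id: str, texts_dir: str = "texts", suffix: str = ".xml") -> PurePosixPath:
    """Build the assumed repository-relative path of a text.

    The 'texts' directory name and '.xml' suffix are assumptions for
    illustration, not confirmed details of the repository.
    """
    tcp_id = tcp_id.strip()
    return PurePosixPath(texts_dir) / submodule_name(tcp_id) / f"{tcp_id}{suffix}"


if __name__ == "__main__":
    print(text_path("A00001"))  # texts/A00/A00001.xml
```

Under these assumptions, a reader who needs only a few texts could work out which submodules to fetch instead of cloning all of them.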

Resource Fields

Resource Type:

dataset

Submitted By:

Matt Lavin

Date Submitted:

2023-03-29 14:22:14


Project Open Data Required Fields (version 1.1)

Modified

2023-03-15

Publisher

[No Data]

Contact Name

Martin Mueller; Joseph Loewenstein

Unique Identifier

[No Data]

Public Access Level

[No Data]

Project Open Data Additional Fields (version 1.0)

Contact email

martinmueller [at] northwestern.edu; jfloewen [at] wustl.edu

Endpoint

https://earlyprint.org/download/

Format

xml, csv

Project Open Data Required-if-Applicable Fields (version 1.1)

Access Level Comment

[No Data]

Bureau Code

[No Data]

Program Code

[No Data]

License

https://creativecommons.org/licenses/by-nc/3.0/legalcode

Rights

[No Data]

Spatial

[No Data]

Temporal

[No Data]