HTRC Extracted Features Dataset

Description

The HTRC Extracted Features Dataset v.1.0 comprises page-level features for 13.7 million volumes in the HathiTrust Digital Library. This version contains non-consumptive features for both public-domain and in-copyright books. Features include part-of-speech tagged term token counts, header/footer identification, and marginal character counts, among others. A full explanation of the dataset's features, motivation, and creation is available at the EF Dataset documentation page.
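
As a rough illustration of working with page-level, part-of-speech tagged token counts stored as bzip2-compressed JSON, the Python sketch below reads one file and totals the counts across pages. The field names (`features`, `pages`, `body`, `tokenPosCount`), the helper functions, and the sample volume are assumptions made for this example; the EF Dataset documentation page remains the authority on the actual schema.

```python
import bz2
import json
import os
import tempfile
from collections import Counter


def load_volume(path):
    """Read one bzip2-compressed JSON file and return the parsed object."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def term_counts(volume, section="body"):
    """Sum part-of-speech tagged token counts over all pages of a volume.

    Assumption: each page stores counts as {token: {pos_tag: count}}
    under page[section]["tokenPosCount"]. Check the official schema.
    """
    totals = Counter()
    for page in volume["features"]["pages"]:
        tagged = page.get(section, {}).get("tokenPosCount", {})
        for token, by_tag in tagged.items():
            for tag, count in by_tag.items():
                totals[(token, tag)] += count
    return totals


# Tiny synthetic volume, invented for illustration; not real dataset content.
sample = {
    "id": "example.0001",
    "features": {
        "pages": [
            {"body": {"tokenPosCount": {"whale": {"NN": 2}, "the": {"DT": 3}}}},
            {"body": {"tokenPosCount": {"whale": {"NN": 1}}}},
        ]
    },
}

# Write the sample to a temporary .json.bz2 file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "example.json.bz2")
    with bz2.open(sample_path, "wt", encoding="utf-8") as handle:
        json.dump(sample, handle)
    counts = term_counts(load_volume(sample_path))

print(counts[("whale", "NN")])
```

Because only counts are stored, a reader of this kind can aggregate term frequencies without ever reconstructing page text, which is the sense in which the features are non-consumptive.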

Resource Fields

Resource Type:

dataset

Submitted By:

Matt Lavin

Date Submitted:

2016-12-15 16:01:05


Project Open Data Required Fields (version 1.1)

Modified

November 2016

Publisher

HTRC

Contact Name

4357604871

Unique Identifier

http://dx.doi.org/10.13012/J8X63JT3

Public Access Level

public

Project Open Data Additional Fields (version 1.0)

Contact email

htrc-help@hathitrust.org

Endpoint

[No Data]

Format

zip, bzip2, json

Project Open Data Required-if-Applicable Fields (version 1.1)

Access Level Comment

[No Data]

Bureau Code

[No Data]

Program Code

[No Data]

License

[No Data]

Rights

Boris Capitanu, Ted Underwood, Peter Organisciak, Timothy Cole, Maria Janina Sarol, J. Stephen Downie (2016). The HathiTrust Res

Spatial

[No Data]

Temporal

[No Data]