Humanities Data

Humanitiesdata.com seeks to help collect and disseminate information about publicly available data of particular interest to digital humanities and humanities computing. It is founded on the premise that open data will be crucial for the future of digital humanities. A culture of openness and collective access to shared digital objects of study will enable digital humanists to:

Collaborate more effectively
Interrogate and invalidate insufficiently rigorous scholarship
Verify and build upon excellent scholarship
Avoid duplicating the intense labor of data creation and normalization
Learn new methods and approaches at a pace that’s consistent with the speed at which DH moves

Currently, the open data movement in digital humanities is growing but not yet dominant. It’s all too common to publish data-driven humanities scholarship without making one’s data available to the public. Concerns about proprietary data, copyright, vendors’ terms and conditions, and long-term data curation are significant roadblocks and are not to be dismissed out of hand. In turn, an overall reticence about data for the humanities (or using the term data when discussing DH) creates a climate where discussions about open data are more difficult to have.

There are numerous places to find data of relevance to the humanities, including the Corpora listserv and the digital humanities Slack channel, but a more consolidated web-based clearinghouse for web-hosted resources can only increase visibility and help newcomers to digital humanities find their way. As a result, the mission of humanitiesdata.com is not to replace any existing resources but, rather, to increase the overall number of digital pathways to humanities data, and to make it easier to search for data by its relevance to specific subfields.

Frequently Asked Questions

Do you host datasets?

No, I do not. At the moment, the site only lists links to datasets and "recipes" that demonstrate some kind of data fetch operation (scraping, API, etc.). I am considering several expansions to the website and welcome anyone interested in contributing to its growth. (See below for more information on getting involved.)

Then how can I host a dataset?

You can use CKAN services like datahub.io, host your own data via GitHub or Git Large File Storage, and share recipes using Gist. If hosting your own data, you can use our metadata schema to make sure you've filled in the required and required-if-applicable data fields associated with Project Open Data (POD) version 1.1. These fields are specifically designed to map to Data Catalog Vocabulary (DCAT) and CKAN sites. Some POD schema version 1.0 fields are also included in order to ensure mappings.
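As a rough illustration of checking a record before you publish it, here is a minimal Python sketch. It assumes the dataset-level field names from the Project Open Data v1.1 schema; the site's own metadata schema may name or group fields differently, and the example record, its URLs, and the helper function names are hypothetical.

```python
# Dataset-level fields that POD v1.1 lists as required for every record.
# Assumption: the humanitiesdata.com schema uses these same names.
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)

# Fields that POD v1.1 lists as required only when they apply to the dataset.
REQUIRED_IF_APPLICABLE = (
    "license", "rights", "spatial", "temporal", "distribution",
)


def missing_required(record):
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def absent_if_applicable(record):
    """Return the required-if-applicable field names that are absent or empty.

    These are not errors by themselves; they are prompts to double-check.
    """
    return [name for name in REQUIRED_IF_APPLICABLE if not record.get(name)]


# Hypothetical record, for illustration only.
EXAMPLE_RECORD = {
    "title": "Example Letters Corpus",
    "description": "Transcriptions of letters, one plain-text file per letter.",
    "keyword": ["correspondence", "corpus"],
    "modified": "2017-01-15",
    "publisher": {"name": "Example Project"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane@example.org"},
    "identifier": "https://example.org/datasets/letters",
    "accessLevel": "public",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "distribution": [
        {"downloadURL": "https://example.org/letters.zip",
         "mediaType": "application/zip"}
    ],
}

if __name__ == "__main__":
    print("Missing required fields:", missing_required(EXAMPLE_RECORD))
    print("Worth double-checking:", absent_if_applicable(EXAMPLE_RECORD))
```

The check only tests for presence. It does not confirm that values are well formed (for example, that "modified" is an ISO 8601 date), so treat it as a first pass rather than full validation.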

How can I help with this project?

I am always looking for volunteers to help with this site. I promise not to spam your email address. To be a part of humanitiesdata.com, just send me an email (listed below) with your name, a little about yourself, and some thoughts on how you think you might contribute.

Other forms of support include adding datasets and recipes, letting me know if anything seems to be broken, and spreading the word about this site via social media, word-of-mouth, or messenger bird.

Some Additional Details

Correspondence Address

Matthew J. Lavin
Data Analytics Program
100 West College Street
Denison University
Granville, OH 43023

Email

lavinm [at] denison.edu

Under the Hood

Humanitiesdata.com is one component of the webserver, which is represented by this GitHub repository. The webserver uses Docker to containerize a proxy server (nginx) and several other Docker containers. Humanitiesdata.com is a container running Flask (Python for the web) with WSGI. The page design uses Bootstrap 3 to achieve responsiveness (i.e., mobile website compatibility). As the site grows in size, it would be ideal to add Elasticsearch to power a search engine but, for now, a lightweight JavaScript search (over JSON) seems to perform adequately.
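To give a sense of what a lightweight search over JSON involves, here is a minimal sketch of the general idea. It is written in Python for readability rather than in the JavaScript the site actually uses, and it is not the site's code: the record fields ("title", "description", "keyword"), the sample entries, and the search function are all assumptions made for illustration.

```python
import json

# Hypothetical index, standing in for the JSON file a browser would load.
SAMPLE_INDEX = json.loads("""
[
  {"title": "Example Novel Corpus",
   "description": "Plain-text novels for text analysis",
   "keyword": ["literature", "corpus"]},
  {"title": "Example Census Tables",
   "description": "Historical population counts",
   "keyword": ["history", "demography"]}
]
""")


def search(records, query):
    """Return the records whose text contains every term in the query.

    Matching is a case-insensitive substring test over the title,
    description, and keywords. An empty query matches every record.
    """
    terms = query.lower().split()
    hits = []
    for record in records:
        haystack = " ".join([
            record.get("title", ""),
            record.get("description", ""),
            " ".join(record.get("keyword", [])),
        ]).lower()
        if all(term in haystack for term in terms):
            hits.append(record)
    return hits


if __name__ == "__main__":
    for hit in search(SAMPLE_INDEX, "corpus"):
        print(hit["title"])
```

A linear scan like this has no ranking, stemming, or typo tolerance, and its cost grows with the size of the index. Those are the kinds of limits that a dedicated engine such as Elasticsearch addresses once a catalog gets large.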
