Humanitiesdata.com seeks to help collect and disseminate information about publicly available data of particular interest to digital humanities and humanities computing. It is founded on the premise that open data will be crucial for the future of digital humanities. A culture of openness and collective access to shared digital objects of study will enable digital humanists to:

  1. Collaborate more effectively
  2. Interrogate and invalidate insufficiently rigorous scholarship
  3. Verify and build upon excellent scholarship
  4. Avoid duplicating the intense labor of data creation and normalization
  5. Learn new methods and approaches at a pace that’s consistent with the speed at which DH moves

Currently, the open data movement in digital humanities is growing but not yet dominant. It’s all too common to publish data-driven humanities scholarship without making one’s data available to the public. Concerns about proprietary data, copyright, vendors’ terms and conditions, and long-term data curation are significant roadblocks and are not to be dismissed out of hand. In turn, an overall reticence about data for the humanities (or using the term data when discussing DH) creates a climate where discussions about open data are more difficult to have.

There are numerous places to find data of relevance to the humanities, including the Corpora listserv and the digital humanities Slack channel, but a more consolidated web-based clearinghouse for web-hosted resources can only increase visibility and help newcomers to digital humanities find their way. Accordingly, the mission of humanitiesdata.com is not to replace any existing resource but to increase the overall number of digital pathways to humanities data, and to make it easier to search for data by its relevance to specific subfields.

Frequently Asked Questions

Do you host datasets?

No, we do not. We only collect links to datasets and "recipes" that demonstrate some kind of data fetch operation (scraping, API, etc.).
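To give a sense of what a recipe might look like, here is a minimal Python sketch that queries a CKAN catalog through CKAN's action API (the `package_search` endpoint). The host name `ckan.example.org`, the helper function names, and the fields pulled from each result are illustrative assumptions, not a format the site requires.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN action-API URL for the package_search endpoint."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def parse_search_response(payload: str) -> list[dict]:
    """Extract name, title, and resource URLs from a package_search response body."""
    data = json.loads(payload)
    if not data.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    records = []
    for pkg in data["result"]["results"]:
        records.append({
            "name": pkg.get("name"),
            "title": pkg.get("title"),
            # Each CKAN resource carries a "url" pointing at the actual file or API.
            "urls": [r.get("url") for r in pkg.get("resources", [])],
        })
    return records


def fetch_datasets(base_url: str, query: str, rows: int = 10) -> list[dict]:
    """Run the search against a live CKAN instance (requires network access)."""
    with urlopen(build_search_url(base_url, query, rows), timeout=30) as resp:
        return parse_search_response(resp.read().decode("utf-8"))
```

With network access, `fetch_datasets("https://ckan.example.org", "poetry")` would return a list of titles and resource URLs from that (hypothetical) catalog. Keeping `parse_search_response` separate from the network call makes the parsing step easy to test offline.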

Then how can I host a dataset?

You can use CKAN-based services such as datahub.io, host your own data on GitHub (using Git Large File Storage for large files), and share recipes as GitHub Gists. If hosting your own data, you can use our metadata schema to make sure you've filled in the required and required-if-applicable fields defined by Project Open Data (POD) version 1.1. These fields are specifically designed to map to the Data Catalog Vocabulary (DCAT) and to CKAN sites. Some POD schema version 1.0 fields are also included to ensure those mappings work.
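One quick way to check a record before publishing is to compare it against the field lists. The Python sketch below assumes a subset of the POD 1.1 dataset fields (US federal-only fields such as `bureauCode` and `programCode` are left out); please verify the lists against the official schema before relying on them. The function name `missing_fields` is hypothetical.

```python
# Assumed subset of Project Open Data v1.1 dataset fields; check the official schema.
REQUIRED = ("title", "description", "keyword", "modified", "publisher",
            "contactPoint", "identifier", "accessLevel")
REQUIRED_IF_APPLICABLE = ("license", "rights", "spatial", "temporal", "distribution")


def missing_fields(record: dict) -> dict[str, list[str]]:
    """List required fields that are absent or empty, plus unfilled conditional fields."""

    def empty(key: str) -> bool:
        # Treat a missing key, None, "", [], or {} as "not filled in".
        value = record.get(key)
        return value is None or value in ("", [], {})

    return {
        "required": [k for k in REQUIRED if empty(k)],
        # Conditional fields are only flagged for human review: whether they apply
        # (e.g. rights for non-public data) depends on the dataset itself.
        "if_applicable": [k for k in REQUIRED_IF_APPLICABLE if empty(k)],
    }
```

Calling `missing_fields({"title": "Example"})` would list every required field except `title` under `"required"`, and all five conditional fields under `"if_applicable"`.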

How can I help with this project?

We are looking for volunteers to be part of our project evaluation period. Your contributions might involve peer reviewing the website, testing its functionality, or filling out a short user survey. We promise not to spam your email address. To be a part of humanitiesdata.com, visit the Sign Up page and provide your name and email.

Other forms of support include adding datasets and recipes, letting us know if anything seems to be broken, and spreading the word about this site via social media, word of mouth, or messenger bird.

We are also accepting small financial contributions to pay for web hosting so that we can move humanitiesdata.com to its own virtual host.

Some Additional Details

Correspondence Address

Matthew J. Lavin
Department of English
526 Cathedral of Learning
4200 Fifth Ave.
Pittsburgh, PA 15260


lavin [at] pitt.edu

Under the Hood

Humanitiesdata.com is one component of a webserver whose code lives in this GitHub repository. The webserver uses Docker to containerize an nginx proxy server along with several other containers. Humanitiesdata.com itself is a container running Flask (a Python web framework) served via WSGI. The page design uses Bootstrap 3 for responsiveness (i.e., compatibility with mobile devices). As the site grows, it would be ideal to add Elasticsearch to power a search engine; for now, a lightweight JavaScript search over JSON seems to be performing adequately.
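The site's lightweight search runs in JavaScript in the browser; the Python sketch below only illustrates the same idea of filtering a JSON list of records by keyword. The record shape (`title`, `description`, `keyword`) is an assumption borrowed from the POD field names, not the site's actual index format.

```python
import json


def load_index(json_text: str) -> list[dict]:
    """Parse a JSON array of dataset records (hypothetical shape)."""
    return json.loads(json_text)


def search(records: list[dict], query: str) -> list[dict]:
    """Return records whose title, description, or keywords contain every query term.

    An empty query matches everything, since all() over no terms is True.
    """
    terms = query.lower().split()
    hits = []
    for rec in records:
        # Concatenate the searchable fields into one lowercase string.
        haystack = " ".join([
            rec.get("title", ""),
            rec.get("description", ""),
            " ".join(rec.get("keyword", [])),
        ]).lower()
        if all(term in haystack for term in terms):
            hits.append(rec)
    return hits
```

Plain substring matching like this (where "art" also matches "part") and the lack of relevance ranking are the kinds of limitations a dedicated engine such as Elasticsearch would address as the collection grows.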