The Blog Authorship Corpus

Description

The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus comprises 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per person. Each blog is presented as a separate file, the name of which indicates a blogger ID number and the blogger's self-provided gender, age, industry, and astrological sign.

Cite as: J. Schler, M. Koppel, S. Argamon and J. Pennebaker (2006). Effects of Age and Gender on Blogging in _Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs_.
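
Because each file name carries the blogger's metadata, it can be read without opening the file. The sketch below is a minimal illustration, not the corpus authors' tooling. It assumes the file names are dot-separated in the order id, gender, age, industry, sign, followed by an `.xml` extension; this record does not spell out that layout, so check it against the downloaded files. The `BloggerInfo` type, the `parse_blog_filename` function, and the sample file name are all hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class BloggerInfo(NamedTuple):
    """Metadata fields the description says are encoded in each file name."""
    blogger_id: str
    gender: str
    age: int
    industry: str
    sign: str


def parse_blog_filename(filename: str) -> BloggerInfo:
    """Parse an ASSUMED '<id>.<gender>.<age>.<industry>.<sign>.xml' file name."""
    name = Path(filename).name  # drop any directory components
    if name.lower().endswith(".xml"):
        name = name[: -len(".xml")]
    parts = name.split(".")
    if len(parts) != 5:
        # The assumed layout does not match; fail loudly rather than guess.
        raise ValueError(f"unexpected file name layout: {filename!r}")
    blogger_id, gender, age, industry, sign = parts
    return BloggerInfo(blogger_id, gender, int(age), industry, sign)


# Hypothetical file name, used only to illustrate the assumed layout.
info = parse_blog_filename("blogs/1234567.female.25.Student.Leo.xml")
print(info.blogger_id, info.gender, info.age, info.industry, info.sign)
```

If the real file names use a different separator or field order, only the `split` call and the unpacking line need to change.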

Resource Fields

Resource Type:

dataset

Submitted By:

Matt Lavin

Date Submitted:

2017-01-01 18:22:17


Project Open Data Required Fields (version 1.1)

Modified

[No data]

Publisher

[No data]

Contact Name

Moshe Koppel

Unique Identifier

[No data]

Public Access Level

[No data]

Project Open Data Additional Fields (version 1.0)

Contact email

koppel@cs.biu.ac.il

Endpoint

[No Data]

Format

xml

Project Open Data Required-if-Applicable Fields (version 1.1)

Access Level Comment

[No Data]

Bureau Code

[No Data]

Program Code

[No Data]

License

Non-commercial research use

Rights

J. Schler, M. Koppel, S. Argamon and J. Pennebaker (2006). Effects of Age and Gender on Blogging in Proceedings of 2006 AAAI Spr

Spatial

[No Data]

Temporal

August 2004