Colonia Corpus of Historical Portuguese

Description

Portuguese is a Romance language and the native language of over 215 million speakers worldwide. Like Spanish, English and French, it was the language of both its country of origin and that country’s colonial possessions. This corpus contains complete historical Portuguese manuscripts written and published between 1500 and 1936 in both Portugal and Brazil. It is divided into five sub-corpora, one per century (summarized in the table below). Words in the corpus were tagged for part of speech (POS) using TreeTagger. More information on the corpus is available on the Colonia homepage.
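
For readers who want to work with the POS annotations, the following is a minimal sketch of reading TreeTagger-style output in Python. It assumes the common TreeTagger layout of one token per line with three tab-separated columns (token, POS tag, lemma); the actual layout of the Colonia files should be checked before relying on this. The function name, the sample lines and the tag labels are made up for illustration and are not taken from the corpus.

```python
from collections import Counter


def parse_tagged_lines(lines):
    """Yield (token, pos, lemma) tuples from TreeTagger-style lines.

    Assumes three tab-separated columns per token line; any other
    line (blank lines, markup such as <s>) is skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # not a token line under the assumed layout
        yield tuple(parts)


# Hypothetical sample, NOT taken from the corpus; tag labels are illustrative.
sample = [
    "a\tDET\to\n",
    "casa\tNOM\tcasa\n",
    "era\tV\tser\n",
    "<s>\n",
    "grande\tADJ\tgrande\n",
]

rows = list(parse_tagged_lines(sample))

# Count how often each POS tag occurs in the sample.
pos_counts = Counter(pos for _, pos, _ in rows)
print(pos_counts.most_common())
```

To apply the same function to a downloaded file, pass an open file handle instead of the sample list, for example `parse_tagged_lines(open(path, encoding="utf-8"))`, where `path` points to one of the corpus text files.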

Resource Fields

Resource Type:

dataset

Submitted By:

Eva Bacas and Matt Lavin

Date Submitted:

2020-04-24 14:54:12


Project Open Data Required Fields (version 1.1)

Modified

[No data]

Publisher

[No data]

Contact Name

[No data]

Unique Identifier

[No data]

Public Access Level

[No data]

Project Open Data Additional Fields (version 1.0)

Contact email

[No Data]

Endpoint

[No Data]

Format

csv,txt

Project Open Data Required-if-Applicable Fields (version 1.1)

Access Level Comment

[No Data]

Bureau Code

[No Data]

Program Code

[No Data]

License

[No Data]

Rights

Zampieri, M. and Becker, M. (2013) Colonia: Corpus of Historical Portuguese. In: ZSM Studien, Special Volume on Non-Standard Dat

Spatial

[No Data]

Temporal

[No Data]