Title:Academia Sinica Tagged Corpus of Early Mandarin Chinese
Contributor:Academia Sinica
Institute of History and Philology, Academia Sinica
Chiang Ching-Kuo Foundation for International Scholarly Exchange
Contributor (editor):Chinese Knowledge Information Processing Group of Institute of Information Science of Academia Sinica
Corpus Research Group in Institute of Linguistics, Academia Sinica
Contributor (sponsor):Academia Sinica Computing Centre
Chu-Ren Huang
Cheng-hui Liu
Keh-jiann Chen
Pei-chuan Wei
P.M. Thompson
Coverage:Early (After Tang and Five Dynasties)
Primitive (Early Qin to Western Han)
Medieval (Eastern Han to Wei-Jin-Southern and Northern Dynasties)
Zhonghua (nation)
Chung-hua Min-kuo (nation)
Creator:Academia Sinica Computing Centre
Chinese Knowledge Information Processing Group of Institute of Information Science of Academia Sinica
Corpus Research Group in Institute of Linguistics, Academia Sinica
Date Available:2001/12, December, 2001
Date Created:1990
Date Modified:1997
Description:Institute of Information Science, Institute of Linguistics, ASCC. All Rights Reserved.
The web version of the Sinica Early Chinese Corpus was opened to the research community in November 2001. The literature available for searching is "Hong-lou-mong" and "San-sui-ping-yao-zhuan". Although the search functions and the criteria for segmentation and tagging are mostly the same as those of the Sinica Corpus for modern Chinese, the corpus has its own features as well. For example, search results give the source of each cited sentence in addition to the part of speech and the lemma, which helps researchers with citation. In addition, some changes were made to the segmentation and tagging standard because its focus differs from that of the standard for analyzing the modern language. For instance, the verb-complement structure is labelled in more detail in the Sinica Early Chinese Corpus.
Scholarly Exchange and Institute of History and Philology in Academia Sinica. The objective at that time was only to collect the raw text. Since then, the collection of raw text has never stopped. The collected data has extended from Primitive Chinese to Medieval Chinese and Early Chinese. The collection work is mainly managed by Prof. Pei-Chuang Wei and is funded by Academia Sinica. The tagging of Primitive Chinese data started in 1995. For Early Chinese, the tagging system was designed in 1997 and applied immediately. The Early Chinese project was led by Prof. Pei-Chuang Wei and Prof. Cheng-hui Liu (Hsing-Hua University). Financial support came from Academia Sinica and the National Science Council; technical support for the tagging system and computing was provided by Prof. Chu-Ren Huang, Prof. Ker-Jiann Chen, and ASCC.
Format:4.5 million words, 19044 KB, saved in text file and presented in HTML format
Criteria for POS and Feature tagging in Academia Sinica Balanced Corpus of Early Chinese
Is Part Of:Academia Sinica Ancient Chinese Corpus
Language:Mandarin Chinese
Language (ISO639):cmn
Publisher:Academia Sinica
Institute of Linguistics, Preparatory Office Academia Sinica
Institute of Information Science Academia Sinica
References:Academia Sinica Formosan Language Archive
Academia Sinica Balanced Corpus of Modern Chinese
Rights:This notice regulates your use of this web site and its associated services, including the interface, corpus data, segmentation and tagging standard, etc. All rights are reserved by Academia Sinica. In your research you may use the data resulting from searches made through our interface systems. However, you are prohibited from abstracting, altering, or publishing any search results voluntarily. The copyright of the corpus data remains with the original author or source; the data cannot be reproduced or copied, nor used in any way that violates intellectual property rights.
Subject:English language
Mandarin Chinese language
Subject (ISO639):eng
