OLAC Record
oai:www.ldc.upenn.edu:LDC2020T23

Metadata
Title:Corpus of Law, Academic, and News
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Mohammadi, Ariana Negar. Corpus of Law, Academic, and News LDC2020T23. Web Download. Philadelphia: Linguistic Data Consortium, 2020
Contributor:Mohammadi, Ariana Negar
Date (W3CDTF):2020
Date Issued (W3CDTF):2020-10-15
Description:*Introduction* Corpus of Law, Academic, and News consists of 400 Persian documents divided into three genres: legal, academic, and news. The legal section contains texts from official publications, including the civil penal code, the criminal penal code, and the constitution of the Islamic Republic of Iran. The academic sub-corpus comprises published academic abstracts in various disciplinary areas, such as Art and Humanities, Social Sciences, and Natural Sciences. The news sub-corpus was extracted from an archive of ten Iranian news outlets spanning the period 2010-2020. *Data* The document and token counts are as follows: 48 legal documents, 88,170 tokens; 274 academic documents, 85,765 tokens; and 78 news documents, 101,055 tokens. Each document contains metadata in the file's header with information such as specific text type, dates, and source, and also contains annotations marking title and body paragraphs. All documents are presented as UTF-8 encoded XML with internal DTDs. *Samples* Please view this sample (XML). *Updates* None at this time.
Extent:Corpus size: 4780 KB
Identifier:LDC2020T23
https://catalog.ldc.upenn.edu/LDC2020T23
ISBN: 1-58563-947-8
ISLRN: 903-821-836-195-4
DOI: 10.35111/wcbv-pj21
Language:Persian
Language (ISO639):fas
License:Corpus of Law, Academic, and News Agreement: https://catalog.ldc.upenn.edu/license/corpus-of-law-academic-and-news-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2020T23
Rights Holder: Portions © 2020 Ariana N. Mohammadi, © 2020 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
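
The per-genre figures in the Description field can be cross-checked against the stated total of 400 documents. The following is a minimal sketch that tallies them; the dictionary layout and variable names are illustrative only and are not part of the corpus or its documentation.

```python
# Per-genre counts copied from the Description field of this record.
# The structure and names below are illustrative, not an LDC format.
counts = {
    "legal": {"documents": 48, "tokens": 88_170},
    "academic": {"documents": 274, "tokens": 85_765},
    "news": {"documents": 78, "tokens": 101_055},
}

# Totals across the three genres.
total_documents = sum(genre["documents"] for genre in counts.values())
total_tokens = sum(genre["tokens"] for genre in counts.values())

# Average document length per genre, in tokens.
mean_tokens = {
    name: genre["tokens"] / genre["documents"]
    for name, genre in counts.items()
}

print(f"documents: {total_documents}")  # 400, matching the Description
print(f"tokens:    {total_tokens}")
for name, mean in mean_tokens.items():
    print(f"{name:>9}: {mean:.1f} tokens per document on average")
```

The three genres are close in token count but differ widely in document count, so the academic abstracts are on average much shorter than the legal and news documents.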

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2020T23
DateStamp:  2021-01-01
GetRecord:  OAI-PMH request for simple DC format
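
The GetRecord entries above refer to OAI-PMH requests, which take three query arguments: `verb`, `identifier`, and `metadataPrefix`. The following is a minimal sketch of how such a request URL could be assembled. The base URL is a placeholder, since this record does not state the repository endpoint. `oai_dc` is the prefix OAI-PMH defines for simple Dublin Core; `olac` is assumed here to be the prefix for OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: this record does not give the repository's base URL,
# so substitute the real one before sending any request.
BASE_URL = "https://example.org/oai"

# Identifier as listed under "OaiIdentifier" in this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2020T23"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple Dublin Core uses the protocol-defined prefix "oai_dc".
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

# "olac" is assumed to be the metadata prefix for OLAC format.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")

print(dc_url)
print(olac_url)
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string valid regardless of how the server parses reserved characters.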

Search Info

Citation: Mohammadi, Ariana Negar. 2020. Linguistic Data Consortium.
Terms: dcmi_Text iso639_fas olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2020T23
Up-to-date as of: Mon Jan 4 8:40:24 EST 2021