OLAC Record: Global Yoruba Lexical Database v. 1.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2008L03

Metadata

Title: Global Yoruba Lexical Database v. 1.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Awoyale, Yiwola. Global Yoruba Lexical Database v. 1.0 LDC2008L03. Web Download. Philadelphia: Linguistic Data Consortium, 2008

Contributor: Awoyale, Yiwola

Date (W3CDTF): 2008

Date Issued (W3CDTF): 2008-12-19

Description: *Introduction*

The Global Yoruba Lexical Database v. 1.0 is a set of related dictionaries providing definitions and translations for over 450,000 words from the Yoruba language and its variants: Standard Yoruba (over 368,000 words), Gullah (over 3,600 words), Lucumí (over 8,000 words) and Trinidadian (over 1,000 words).

Yoruba is a Niger-Congo language (subclassification: Kwa > Yoruboid) spoken natively by nearly 20 million people, the vast majority of them in southwestern Nigeria. There are also approximately half a million Yoruba speakers in Benin, as well as speakers in Togo and Ghana and among the emigrant populations in the United States and the United Kingdom. In addition, roughly two million people in Nigeria speak Yoruba as a second language.

The Yoruba language diaspora is wide, stretching from southwestern Nigeria and Benin westward to the Caribbean and the islands along the southeastern United States coast. Yoruba and other African dialects arrived in the Americas and the Caribbean as a consequence of the Atlantic slave trade. Throughout the region, Yoruba dialects blended with each other and with languages like Spanish and French to form a variety of creoles such as Gullah in the United States and Nagô in Brazil. Many of those creoles have become the language of liturgy and music in Cuba, Brazil, Argentina, Trinidad, Jamaica and parts of the United States and Canada.

The ultimate goal of this dictionary is to provide coverage for all Yoruba dialects across the globe. For that reason, it will continue to be a work in progress.

The current standard orthography is tone-driven. Yoruba has three tones: a high tone, a middle tone and a low tone. Each syllable in a Yoruban word must have at least one tone, and long vowels may have two tones. While there are no explicit rising or falling tones, combinations of the language's three basic tones may produce the same effect. Grammatically, Yoruba is a Subject-Verb-Object (SVO) language.
Verbs have no infinitive, past-tense or present-tense forms and typically have only a single syllable. Discrete auxiliary words provide information on the verb tense. Nor do Yoruba nouns have plural or singular forms; their number derives from the context in which the word occurs.

The Yoruba dialect continuum consists of over fifteen varieties, with considerable phonological and lexical differences among them and some grammatical ones as well. Peripheral areas of dialectal regions often have some similarities to adjoining dialects.

Standard Yoruba is a koine used for education, writing, broadcasting, and contact between speakers of different dialects. It is also called Literary Yoruba, common Yoruba, or simply Yoruba without qualification. Though in large part based on the Ọ̀yọ́ and Ibadan dialects, it incorporates several features from other dialects and has a simplified vowel harmony system and some other features not found in other Yoruba dialects.

*Data*

This release encompasses the following languages and dialects (languages, number of words, description):

* Yoruba->English (142,389 words): This dictionary of Standard Yoruba contains detailed lexicographic entries which include the part of speech, the English definition of the Yoruba headword, cross references, examples in English and the morphemic decomposition of the Yoruba headword.
* English->Yoruba (226,585 words): This dictionary maps the English headword back to Standard Yoruba and includes the part of speech, Yoruba definition, and morphemic decomposition of the Yoruba word.
* Gullah->English and Yoruba (3,636 words): Gullah is a creole spoken in the coastal Low Country of South Carolina and Georgia in the United States. Although the language is no longer spoken to a great extent, its words are still commonly used for personal names and nicknames. The dictionary translates from Gullah headwords to English and to Standard Yoruba.
* Lucumí->Spanish, English and Yoruba (8,075 words): Lucumí is the ritual language of the Santeria religion practiced in Cuba. The Lucumí dictionary translates from a Lucumí headword to Cuban Spanish to English to Standard Yoruba. At the time of this publication in 2008, some entries do not have complete translations and only map from Lucumí to Cuban Spanish.
* Trinidadian->English and Yoruba (1,187 words): Trinidadian is a creole which blends English, French, Spanish and African languages. The Trinidadian dictionary presents those words that have Yoruban roots and maps from the Trinidadian headword to English and Standard Yoruba.

The dictionaries in this publication are presented in two formats, Toolbox databases and XML. Short for The Field Linguist's Toolbox, Toolbox is a lexicographical database system published by SIL. SIL makes Toolbox freely available for download. In order to use the Global Yoruba Lexical Database v. 1.0, Toolbox must first be installed on the user's local computer.

The orthography of the text in the databases conforms to that presented to students in the Nigerian school system. The basic Yoruba alphabet is:

a b d e ẹ f g gb h i j k l m n o ọ p r s ṣ t u w y

The letter gb is a digraph, two letters that combine to form a single phoneme. In written Yoruba, gb functions as a single letter. In the Toolbox presentation, this has been taken into account and the software sorts the words accordingly in all functions. The XML presentation has been sorted according to the above alphabet but is a static, flat file. For that reason, developers creating applications from the XML files will need to take the digraph into account when writing searching and reporting functions.

As Yoruba is a tonal language, the written language uses additional diacritic marks to denote tones. The orthography uses three tones:

* Low: denoted with a grave (`) symbol, as in à
* Mid: plain letter without diacritics
* High: denoted with an acute (´) symbol, as in á

Both the Toolbox and XML presentations encode the text in Unicode UTF-8 using Normalization Form C (NFC).
Unicode normalization forms govern the order in which letters and characters are composed and processed by software systems. Normalization Form C is the form used by most web systems and is the form the W3C recommends for the web. The Toolbox presentation uses the Arial Unicode MS font for display. The Tahoma and Lucida Grande fonts will also display the Yoruba alphabet under UTF-8 encoding. Since XML only provides information about document structure, fonts are not specified in the XML versions of the dictionaries.

Displaying non-Western letters: Windows users will need to install and configure their computers for Extended Language Support. To do this, open the Windows Control Panel and click the Regional and Language Options icon. In the Regional and Language Options window that opens, select the Languages pane. Under the Supplemental Language Support section, check both check boxes and click OK. Windows will ask for your install disc and will install the modules needed to properly display complex and non-Western letters. If users do not have their Windows install disc, they should contact their local system administrator to install Extended Language Support.

*Samples*

For an example of the data in this database, please review this sample entry (jpg) from the Yoruba-English Lexicon.

*Contact Information*

All questions and inquiries should be directed to the author, Yiwola Awoyale, at awoyale@ldc.upenn.edu
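
The description notes that developers working from the XML files must handle the gb digraph themselves, and that the text is encoded in Unicode Normalization Form C. The following Python fragment is a minimal sketch of one way to do that, not the method used by Toolbox or by the database author. The function names (strip_tones, letters, sort_key) are hypothetical, and the sketch assumes that every "g" followed by "b" is the digraph, that tone marks should be ignored for primary ordering, and that characters outside the listed alphabet sort last.

```python
import unicodedata

# Alphabet order as listed in the description; "gb" counts as one letter.
ALPHABET = ["a", "b", "d", "e", "ẹ", "f", "g", "gb", "h", "i", "j", "k", "l",
            "m", "n", "o", "ọ", "p", "r", "s", "ṣ", "t", "u", "w", "y"]
RANK = {unicodedata.normalize("NFC", letter): i
        for i, letter in enumerate(ALPHABET)}

# Combining grave (low tone) and combining acute (high tone).
TONE_MARKS = {"\u0300", "\u0301"}


def strip_tones(word):
    """Remove tone marks but keep the dot below that distinguishes ẹ, ọ, ṣ."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def letters(word):
    """Split a word into Yoruba letters, treating 'gb' as a single letter."""
    base = strip_tones(word.lower())
    result = []
    i = 0
    while i < len(base):
        if base.startswith("gb", i):
            result.append("gb")
            i += 2
        else:
            result.append(base[i])
            i += 1
    return result


def sort_key(word):
    """Primary key: alphabet rank per letter. Tie-break: the NFC string."""
    ranks = [RANK.get(letter, len(ALPHABET)) for letter in letters(word)]
    return (ranks, unicodedata.normalize("NFC", word))


words = ["gbà", "gó", "ga", "hó"]
print(sorted(words, key=sort_key))  # → ['ga', 'gó', 'gbà', 'hó']
```

A plain code-point sort would place "gbà" before "gó" because "b" precedes "ó"; with the digraph treated as the letter after "g", every word beginning with "g" plus a vowel comes first.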

Extent: Corpus size: 176128 KB

Identifier: LDC2008L03

https://catalog.ldc.upenn.edu/LDC2008L03

ISBN: 1-58563-500-6

ISLRN: 973-344-578-516-8

DOI: 10.35111/6sp6-8p36

Language: Yoruba

Trinidadian Creole English

Lucumi

Sea Island Creole English

English

Spanish

Language (ISO639): yor

trf

luq

gul

eng

spa

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2008L03

Rights Holder: Portions © 1990-2008 Trustees of the University of Pennsylvania

Subject: Sea Island Creole English language

Lucumi language

Trinidadian Creole English language

Yoruba language

Subject (ISO639): gul

luq

trf

yor

Type (DCMI): Text

Type (OLAC): lexicon

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2008L03

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
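
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is an HTTP GET with the verb, identifier and metadataPrefix query parameters defined by the OAI-PMH protocol. The base URL below is a placeholder and not the archive's actual endpoint; the prefix values "olac" and "oai_dc" are assumed to correspond to the OLAC and simple DC formats named in this record, and get_record_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (no network access)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2008L03"

olac_url = get_record_url(BASE_URL, IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```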

Search Info
Citation: Awoyale, Yiwola. 2008. Linguistic Data Consortium.
Terms: area_Africa area_Americas area_Europe country_CU country_ES country_GB country_NG country_TT country_US dcmi_Text iso639_eng iso639_gul iso639_luq iso639_spa iso639_trf iso639_yor olac_lexicon

Inferred Metadata
Country: Cuba Nigeria Trinidad and Tobago United States
Area: Africa Americas

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2008L03
Up-to-date as of: Wed Oct 29 7:01:03 EDT 2025
