OLAC Record
oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-6133-9

Metadata
Title: W2C – Web to Corpus – Corpora
Bibliographic Citation: http://hdl.handle.net/11858/00-097C-0000-0022-6133-9
Creator: Majliš, Martin
Date (W3CDTF): 2013-06-25T15:08:15Z
Date Available: 2013-06-25T15:08:15Z
Description: A set of corpora for 120 languages, automatically collected from Wikipedia and the web. Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Identifier (URI): http://hdl.handle.net/11858/00-097C-0000-0022-6133-9
Language: Afrikaans
Tosk Albanian
Amharic
Arabic
Aragonese
Egyptian Arabic
Asturian
Azerbaijani
Belarusian
Bengali
Bosnian
Bishnupriya
Breton
Buginese
Bulgarian
Catalan
Cebuano
Czech
Chuvash
Corsican
Welsh
Danish
German
Dimli (individual language)
Modern Greek (1453-)
English
Esperanto
Estonian
Basque
Faroese
Persian
Finnish
French
Western Frisian
Gan Chinese
Scottish Gaelic
Irish
Galician
Gilaki
Gujarati
Haitian
Serbo-Croatian
Hebrew
Fiji Hindi
Hindi
Croatian
Upper Sorbian
Hungarian
Armenian
Ido
Interlingua (International Auxiliary Language Association)
Indonesian
Icelandic
Italian
Javanese
Japanese
Kannada
Georgian
Kazakh
Korean
Kurdish
Latin
Latvian
Limburgan
Lithuanian
Lombard
Luxembourgish
Malayalam
Marathi
Macedonian
Malagasy
Mongolian
Maori
Malay (macrolanguage)
Burmese
Neapolitan
Low German
Nepali (macrolanguage)
Newari
Dutch
Norwegian Nynorsk
Norwegian
Occitan (post 1500)
Ossetian
Pampanga
Piemontese
Polish
Portuguese
Quechua
Romanian
Russian
Yakut
Sicilian
Scots
Slovak
Slovenian
Spanish
Albanian
Serbian
Sundanese
Swahili (macrolanguage)
Swedish
Tamil
Tatar
Telugu
Tajik
Tagalog
Thai
Turkish
Ukrainian
Urdu
Uzbek
Venetian
Vietnamese
Volapük
Waray (Philippines)
Walloon
Yiddish
Yoruba
Chinese
Language (ISO639): afr
als
amh
ara
arg
arz
ast
aze
bel
ben
bos
bpy
bre
bug
bul
cat
ceb
ces
chv
cos
cym
dan
deu
diq
ell
eng
epo
est
eus
fao
fas
fin
fra
fry
gan
gla
gle
glg
glk
guj
hat
hbs
heb
hif
hin
hrv
hsb
hun
hye
ido
ina
ind
isl
ita
jav
jpn
kan
kat
kaz
kor
kur
lat
lav
lim
lit
lmo
ltz
mal
mar
mkd
mlg
mon
mri
msa
mya
nap
nds
nep
new
nld
nno
nor
oci
oss
pam
pms
pol
por
que
ron
rus
sah
scn
sco
slk
slv
spa
sqi
srp
sun
swa
swe
tam
tat
tel
tgk
tgl
tha
tur
ukr
urd
uzb
vec
vie
vol
war
wln
yid
yor
zho
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
http://creativecommons.org/licenses/by-sa/3.0/
Subject: multilingual corpora
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-6133-9
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format
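
The GetRecord entries above correspond to OAI-PMH requests for this record in the OLAC and simple DC formats. A minimal sketch of how such request URLs can be assembled follows. The base URL is a hypothetical placeholder, since this record does not state the repository's OAI-PMH endpoint, and the metadata prefixes "olac" and "oai_dc" are assumed to be the ones the repository exposes for those two formats.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# OAI identifier copied from the record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-6133-9"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

The identifier is percent-encoded by urlencode, so the colons and slashes in it do not interfere with the query string.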

Search Info

Citation: Majliš, Martin. 2013. W2C – Web to Corpus – Corpora. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Africa area_Americas area_Asia area_Europe area_Pacific country_AL country_AM country_BA country_BD country_BE country_BG country_BY country_CN country_CZ country_DE country_DK country_EG country_ES country_ET country_FI country_FJ country_FR country_GB country_GE country_GR country_HR country_HT country_HU country_ID country_IE country_IL country_IN country_IR country_IS country_IT country_JP country_KR country_KZ country_LT country_LU country_MK country_MM country_NG country_NL country_NO country_NP country_NZ country_PH country_PK country_PL country_PT country_RO country_RS country_RU country_SE country_SI country_SK country_TH country_TJ country_TR country_UA country_VA country_VN country_ZA dcmi_Text iso639_afr iso639_als iso639_amh iso639_ara iso639_arg iso639_arz iso639_ast iso639_aze iso639_bel iso639_ben iso639_bos iso639_bpy iso639_bre iso639_bug iso639_bul iso639_cat iso639_ceb iso639_ces iso639_chv iso639_cos iso639_cym iso639_dan iso639_deu iso639_diq iso639_ell iso639_eng iso639_epo iso639_est iso639_eus iso639_fao iso639_fas iso639_fin iso639_fra iso639_fry iso639_gan iso639_gla iso639_gle iso639_glg iso639_glk iso639_guj iso639_hat iso639_hbs iso639_heb iso639_hif iso639_hin iso639_hrv iso639_hsb iso639_hun iso639_hye iso639_ido iso639_ina iso639_ind iso639_isl iso639_ita iso639_jav iso639_jpn iso639_kan iso639_kat iso639_kaz iso639_kor iso639_kur iso639_lat iso639_lav iso639_lim iso639_lit iso639_lmo iso639_ltz iso639_mal iso639_mar iso639_mkd iso639_mlg iso639_mon iso639_mri iso639_msa iso639_mya iso639_nap iso639_nds iso639_nep iso639_new iso639_nld iso639_nno iso639_nor iso639_oci iso639_oss iso639_pam iso639_pms iso639_pol iso639_por iso639_que iso639_ron iso639_rus iso639_sah iso639_scn iso639_sco iso639_slk iso639_slv iso639_spa iso639_sqi iso639_srp iso639_sun iso639_swa iso639_swe iso639_tam iso639_tat iso639_tel iso639_tgk iso639_tgl iso639_tha iso639_tur iso639_ukr iso639_urd iso639_uzb iso639_vec iso639_vie iso639_vol iso639_war iso639_wln iso639_yid iso639_yor iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-6133-9
Up-to-date as of: Thu Oct 5 0:38:54 EDT 2023