OLAC Record: Service - Weka noun signatures creator Web Service

OLAC Record
oai:iula.upf.edu:226

Metadata

Title: Service - Weka noun signatures creator Web Service

Alternative Title: Weka noun signatures creator Web Service

Creator: Universitat Pompeu Fabra (UPF)

Date: 2012-04-20T08:45:34Z

Description: This web service creates a Weka file containing context information for a list of nouns in a given corpus. The context information for each noun is extracted with a set of regular expressions (REs) and encoded as one vector (one line per noun in the Weka file). Each slot in the vector holds the number of times the RE in that position has been observed with the given noun.

Inputs:
- corpusId: ID of an already indexed CQP corpus from which to extract the signatures. You can index your PoS-tagged corpus with the cqp_index web service.
- regularExpressions: list of REs to apply, separated by line breaks. The order of the REs in this list determines the order of the slots in the Weka vectors.

Optional parameters:
- className: name of the class to include in the Weka file.
- indicators: indicators file stating whether each noun belongs to the studied class. Format: one word per line, followed by a tab and a binary value (belongs / does not belong to the class). Encoding: UTF-8. Example:
- lemmas: if class-membership information (indicators) is not available, you can supply a list of nouns to process instead. Format: one lemma per line, in UTF-8. If both this field and indicators are empty, all nouns in the corpus are processed, which may take a long time.
- minOccurrences: minimum number of times a noun must be seen in the corpus to be included in the output file. If a list of lemmas is given, minOccurrences defaults to 1.
- vector_type: type of vector desired in the output.

Outputs:
- weka: Weka file with the vectors of the nouns found in the given corpus.
- notFoundLemmas: list of lemmas that did not appear in the corpus more often than the minOccurrences threshold.
- concordances: sentences of the corpus in which the selected nouns appear, with information about which REs matched in each sentence. Useful for developing and testing the REs.
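The signature construction described above can be illustrated with a small local sketch. It is not the service's implementation: it assumes the noun contexts are already available as plain sentences (the real service queries an indexed CQP corpus), that each RE is applied to every sentence of a noun, and that the "Weka file" is an ARFF file. The helper names (`parse_regex_list`, `parse_indicators`, `noun_signatures`, `to_arff`) are hypothetical.

```python
import re


def parse_regex_list(text: str) -> list[re.Pattern]:
    """One RE per line; blank lines are skipped. Order is preserved because
    it fixes the slot order of every vector."""
    return [re.compile(line) for line in text.splitlines() if line.strip()]


def parse_indicators(text: str) -> dict[str, int]:
    """Parse 'word<TAB>0|1' lines into {word: label}."""
    labels: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, value = line.split("\t", 1)
        labels[word] = int(value)
    return labels


def noun_signatures(contexts, patterns, min_occurrences=1):
    """contexts maps each noun to the sentences it occurs in (hypothetical
    input; the real service reads them from an indexed CQP corpus).
    Returns ({noun: vector}, [nouns below the threshold])."""
    vectors = {}
    not_found = []
    for noun, sentences in contexts.items():
        # Assumption: a noun is kept if it occurs at least min_occurrences times.
        if len(sentences) < min_occurrences:
            not_found.append(noun)
            continue
        # Slot i = total number of matches of pattern i over the noun's sentences.
        vectors[noun] = [
            sum(1 for s in sentences for _ in p.finditer(s)) for p in patterns
        ]
    return vectors, not_found


def to_arff(vectors, patterns, labels=None, class_name="class"):
    """Serialise the vectors as ARFF (assumed to be what 'Weka file' means).
    Nouns are single-quoted; quotes inside nouns are not escaped here."""
    lines = ["@relation noun_signatures", "@attribute noun string"]
    lines += [f"@attribute re{i} numeric" for i in range(len(patterns))]
    if labels is not None:
        lines.append(f"@attribute {class_name} {{0,1}}")
    lines.append("@data")
    for noun, vector in vectors.items():
        row = [f"'{noun}'"] + [str(v) for v in vector]
        if labels is not None:
            # '?' is ARFF's missing-value marker for nouns without an indicator.
            row.append(str(labels.get(noun, "?")))
        lines.append(",".join(row))
    return "\n".join(lines)
```

With an indicators file, each data row ends with the noun's 0/1 label, mirroring the optional className and indicators parameters; without one, the rows contain only the noun and its RE counts.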

End point: http://ws03.iula.upf.edu/soaplab2-axis/services/lexicon_terminology_extraction.create_weka_noun_signatures
WSDL file: http://ws03.iula.upf.edu/soaplab2-axis/services/lexicon_terminology_extraction.create_weka_noun_signatures?wsdl
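Since the record does not list the SOAP operations, a client would typically start by inspecting the WSDL. The sketch below downloads it with the Python standard library and lists the operation names; it assumes the document is WSDL 1.1 (usual for Axis-based Soaplab2 services) and that the 2012 host is still reachable, which may no longer be the case. `fetch_wsdl` and `list_operations` are hypothetical helper names.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint and WSDL URLs as given in this record.
ENDPOINT = ("http://ws03.iula.upf.edu/soaplab2-axis/services/"
            "lexicon_terminology_extraction.create_weka_noun_signatures")
WSDL_URL = ENDPOINT + "?wsdl"

# WSDL 1.1 namespace (assumed; Axis-based services normally publish WSDL 1.1).
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def fetch_wsdl(url: str = WSDL_URL, timeout: float = 30.0) -> bytes:
    """Download the raw WSDL document (requires the host to still be online)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_operations(wsdl: bytes) -> list[str]:
    """Return the operation names declared in every wsdl:portType."""
    root = ET.fromstring(wsdl)
    return [
        op.get("name")
        for port_type in root.iter(f"{{{WSDL_NS}}}portType")
        for op in port_type.findall(f"{{{WSDL_NS}}}operation")
    ]
```

Calling `list_operations(fetch_wsdl())` prints whatever operations the service actually declares; parsing is kept separate from downloading so it can be tried on a saved copy of the WSDL.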

Identifier: http://services.iula.upf.edu/services/226

Publisher: Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)

Subject: NLP service, Lexicon/Terminology Extraction

Subject: SOAP, Soaplab

Type (DCMI): Service

OLAC Info

Archive: IULA UPF OAI Archive

Description: http://www.language-archives.org/archive/iula.upf.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:iula.upf.edu:226

DateStamp: 2019-09-25

GetRecord: OAI-PMH request for simple DC format
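The GetRecord entries above are OAI-PMH requests for this record. As a sketch, such a request combines the `GetRecord` verb with the record's OAI identifier and a metadataPrefix (`olac` for the OLAC format, `oai_dc` for simple Dublin Core). The base URL below is a hypothetical placeholder, since the record does not state the provider's OAI-PMH endpoint.

```python
from urllib.parse import urlencode

RECORD_ID = "oai:iula.upf.edu:226"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request: verb, identifier and
    metadataPrefix are the three required arguments of this verb."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical base URL: substitute the real OAI-PMH endpoint of the provider.
BASE_URL = "https://oai.example.org/provider"

olac_request = get_record_url(BASE_URL, RECORD_ID, "olac")   # OLAC format
dc_request = get_record_url(BASE_URL, RECORD_ID, "oai_dc")   # simple DC format
```

`urlencode` percent-encodes the colons in the identifier (`oai%3Aiula.upf.edu%3A226`), which OAI-PMH servers decode as usual.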

Search Info
Citation: Universitat Pompeu Fabra (UPF). 2012-04-20T08:45:34Z. Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA).
Terms: dcmi_Service

http://www.language-archives.org/item.php/oai:iula.upf.edu:226
Up-to-date as of: Tue Oct 15 14:18:09 EDT 2019
