Dissertation/Thesis Online
Infoling 12.25 (2012)

Author: Pareja Lora, Antonio
Date of defense: 26 July 2012
Thesis title: OntoTag. A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
Thesis supervisor: Guadalupe Aguado de Cea
Co-supervisor: Asunción Gómez Pérez
University/College: Universidad Politécnica de Madrid (UPM)
Department: Departamento de Inteligencia Artificial (DIA)
Country: Spain
Thesis description: The main subject of this thesis is the OntoTag annotation model, which is useful for both Linguistics and the Semantic Web. The OntoTag model was conceived when (1) no official standard for linguistic annotation had yet been published, (2) it was still unclear what an ontology is or can really be considered to be, (3) no linguistic ontology had been developed, (4) the first W3C recommendations for RDF and OWL did not yet exist, and (5) the Semantic Web was still a utopia. Nevertheless, from the outset, this annotation model already incorporated all of these then-incipient technologies, [A] to combine, integrate and interconnect annotations of several types and/or levels, and also [B] to make a set of linguistic tools and resources interoperate effectively. Both goals were achieved thanks to the set of linguistic ontologies that form the core of the model. OntoTag also provides a methodology and a set of best practices for the linguistic and ontological annotation of texts, in accordance with the standards, recommendations and guidelines for linguistic annotation and for the Semantic Web published up to the completion of this research. Finally, this thesis also presents the OntoTagger platform. OntoTagger was developed by applying the aforementioned methodology and set of best practices in order to evaluate the OntoTag model. The experimental results obtained with this platform made it possible to confirm and/or refute the various hypotheses associated with the annotation model. All of these results are also reported in the thesis.
Subject Area(s): Computational linguistics, Corpus linguistics, Semantics
Full thesis in the Infoling Archive: http://www.infoling.org/repository/ID/95
Table of Contents
1. Introduction
1.1. Linguistic tools and annotations: their lights and shadows
1.1.1. The lights: linguistic tools and annotations are very useful…
1.1.2. The shadows of linguistic tools and annotations
1.1.2.1 Linguistic tools are usually expensive
1.1.2.2 Sometimes, linguistic tools and annotations are not accurate
1.1.2.3 Linguistic tools and linguistic annotations hardly interoperate
1.2. The problem of linguistic tools and/or annotations interoperation in detail
1.3. Solving the problem: an outline of our proposal
1.3.1. The role of ontologies
1.3.2. The role of standardisation
1.3.3. A desideratum: minimise cascaded errors
1.4. Our proposal: main pillars, contributions and results
1.5. Structure of this dissertation

2. Work objectives
2.1. Open research problems
2.2. Goals of the present work
2.3. Expected contributions and results
2.4. Work assumptions, hypotheses and restrictions
2.4.1. Assumptions
2.4.2. Hypotheses
2.4.3. Restrictions

3. State of the art
3.1. Annotation: a historical approach and basic terminology
3.2. Annotation: current approaches
3.2.1. The linguistic approach to annotation
3.2.1.1 Morphosyntactic annotations
3.2.1.2 Syntactic annotations
3.2.1.3 Semantic annotations
3.2.1.3.1 Semantic annotation layers
3.2.1.3.2 Semantic annotation-related projects, guidelines and standardisation initiatives
3.2.1.3.2.1 Regarding the Sense Tagging Layer
3.2.1.3.2.2 Regarding the Semantic Domain Annotation Layer
3.2.1.3.2.3 Regarding the Semantic Field Annotation Layer
3.2.1.3.2.4 Regarding the Semantic Role Labelling Layer
3.2.1.3.3 Semantic annotations – concluding remarks
3.2.1.4 Linguistic annotation: level-independent approaches
3.2.2. The computational approach to annotation
3.2.2.1 Computational representation of annotations: annotation languages
3.2.2.2 The semantic web and semantic (web) annotations
3.2.2.2.1 Definition of ontological tagsets and metadata: RDFS and OWL
3.3. Concluding remarks: guidelines for a hybrid approach to annotation

4. OntoTag: the hybrid annotation model
4.1. OntoTag’s linguistic ontologies
4.1.1. Building ontologies with METHONTOLOGY
4.1.2. The OntoTag Integration Ontology (OIO)
4.1.2.1 OIO glossary of terms
4.1.2.2 OIO concept taxonomies
4.1.2.3 OIO ad hoc relationships
4.1.2.4 OIO concept dictionary
4.1.2.5 OIO detailed tables
4.1.2.5.1 OIO ad hoc binary relation table
4.1.2.5.2 OIO instance attribute table
4.1.2.5.3 OIO class attribute table
4.1.2.5.4 OIO constant table
4.1.2.6 OIO formal axioms and rules
4.1.2.6.1 OIO rule table
4.1.2.6.2 OIO formal axiom table
4.1.2.7 OIO instance table
4.1.2.8 The OIO statistics
4.1.3. The linguistic unit ontology (LUO)
4.1.3.1 The concept linguistic unit, its subclasses and its attributes
4.1.3.2 The syntactic module
4.1.3.2.1 The syntactic module concepts and taxonomy
4.1.3.2.2 The syntactic module attributes
4.1.3.2.3 The syntactic module ad hoc relations
4.1.3.2.4 The syntactic module rules and axioms
4.1.3.3 The semantic module
4.1.3.3.1 The semantic module concepts and taxonomy
4.1.3.3.2 The semantic module attributes
4.1.3.3.3 The semantic module ad hoc relations
4.1.3.3.4 The semantic module rules and axioms
4.1.3.4 The LUO statistics
4.1.4. The linguistic attribute ontology (LAO)
4.1.4.1 The LAO concepts, taxonomy and instances
4.1.4.1.1 Top-level concepts and taxonomy in the LAO
4.1.4.1.2 Syntactic concepts, taxonomy and instances in the LAO
4.1.4.1.3 Semantic concepts, taxonomy and instances in the LAO
4.1.4.2 The LAO attributes
4.1.4.3 The LAO ad hoc relations
4.1.4.4 The LAO rules and axioms
4.1.4.5 The LAO statistics
4.1.5. The linguistic value ontology (LVO)
4.1.5.1 The LVO concepts, taxonomy and instances
4.1.5.1.1 Top-level concepts and taxonomy in the LVO
4.1.5.1.2 Syntactic concepts, taxonomy and instances in the LVO
4.1.5.1.3 Semantic concepts, taxonomy and instances in the LVO
4.1.5.2 The LVO attributes
4.1.5.3 The LVO ad hoc relations
4.1.5.4 The LVO rules and axioms
4.1.5.5 The LVO statistics
4.2. OntoTag’s abstract annotation architecture
4.2.1. Phase 1 – Distillation
4.2.2. Phase 2 – Tagging
4.2.3. Phase 3 – Standardisation
4.2.4. Phase 4 – Decanting
4.2.5. Phase 5 – Merging
4.2.5.1 Sub-Phase 5.1 – Combination (Intra-Level Merging)
4.2.5.1.1 L+POS Combination
4.2.5.1.2 POS+M Combination
4.2.5.1.3 Syntactic Combination
4.2.5.1.4 Semantic Combination
4.2.5.2 Sub-Phase 5.2 – Integration (Inter-Level Merging)
4.2.5.3 Putting combination and integration together
4.3. OntoTag’s abstract annotation scheme
4.3.1.1 Morphosyntactic annotations
4.3.1.2 Other annotations at the Syntactic Level
4.3.1.3 Annotations at the Semantic Level
4.3.1.3.1 Annotations at the Concept Semantic Annotation Layer
4.3.1.3.2 Annotations at the Instance Semantic Annotation Layer

5. OntoTagger: an instance of OntoTag
5.1. Linguistic annotation tools integrated into OntoTagger
5.1.1. Connexor’s FDG (Functional Dependency Grammar) Parser for Spanish
5.1.2. Bitext’s Datalexica
5.1.3. LACELL’s POS Tagger
5.2. OntoTagger’s configuration components
5.3. The Combination Module (MMACM) in detail
5.3.1. Syntactic structure combination
5.3.2. Morphosyntactic category and lemma combination
5.3.2.1 Morphosyntactic category combination
5.3.2.1.1 Mathematical terms applied in the notation
5.3.2.1.2 Description of the notation
5.3.2.1.3 Morphosyntactic category combination rules
5.3.2.2 Lemma combination
5.3.2.2.1 Mathematical terms applied in the notation
5.3.2.2.2 Description of the notation
5.3.2.2.3 Lemma combination rules
5.3.3. Morphological combination
5.3.3.1 Mathematical terms applied in the notation
5.3.3.2 Description of the notation
5.3.3.3 Morphological combination rules
5.4. The Semantic Annotation Manager Module (SAMM) in detail
5.4.1. OntoTaggerLex – the semantic lexicon
5.4.2. The Instance Semantic Annotation Layer
5.4.2.1 Named entity-related linguistic heuristics
5.4.2.1.1 Description of the notation
5.4.2.1.2 Rules for named entity recognition
5.4.2.1.2.1 Rules for simple named entity identification
5.4.2.1.2.2 Rules for multiword named entity aggregation
5.4.2.1.3 Rules for named entity (sub)classification
5.4.3. The concept semantic annotation layer
5.4.3.1 Domain–Independent Sense Tagging: EuroWordNet integration
5.4.3.2 Domain–Dependent Sense Tagging: CNEO integration
5.4.4. Putting semantic annotations together: the Semantic Intra-Level Merger (SILM)
5.4.5. The semantic learning process
5.5. OntoTagger’s annotation schemas
5.5.1. XML implementation: an example
5.5.2. OWL implementation: an example

6. Results and evaluation
6.1. Evaluation of the combination sub-phase
6.1.1. Statistical analysis of the morphosyntactic category combination results
6.1.1.1 Precision-related (generic) statistical indicators
6.1.1.2 Recall-related (particular) statistical indicators
6.1.2. Statistical analysis of the lemma combination results
6.1.3. Statistical analysis of the morphological attribute combination results
6.2. Evaluation of the named entity recognition and classification subsystem

7. Conclusions
7.1. Main outcomes
7.2. Other practical outcomes
7.2.1. Recommendations, best practices and lessons learned for annotation standardisation, interoperation and merge
7.3. Other theoretical outcomes
7.3.1. Concerning standardisation
7.3.2. Concerning annotation interoperation and merge
7.4. OntoTag’s hypotheses assessment

8. Further work
9. Acknowledgements
10. References
11. Appendix A: OntoTagger’s annotation XML schema
Number of pages: 365


Date of publication in Infoling: 15 December 2012
Submitted by:
Antonio Pareja Lora
Universidad Complutense de Madrid (UCM)
<aparejasip.ucm.es>