Introduction to Linguistic Annotation and Text Analytics

Morgan & Claypool Publishers, 2009 - Computers - 149 pages

Linguistic annotation and text analytics are active areas of research and development, with academic conferences and industry events such as the Linguistic Annotation Workshops and the annual Text Analytics Summits. This book provides a basic introduction to both fields, and aims to show that good linguistic annotations are the essential foundation for good text analytics. After briefly reviewing the basics of XML, with practical exercises illustrating in-line and stand-off annotations, a chapter is devoted to explaining the different levels of linguistic annotations. The reader is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next chapter shows how annotations can be created automatically using statistical NLP tools, and compares two sets of tools, the OpenNLP and Stanford NLP tools. The second half of the book describes different annotation formats and gives practical examples of how to interchange annotations between different formats using XSLT transformations. The two main text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises showing how to configure and customize them. The final chapter is an introduction to text analytics, describing the main applications and functions including named entity recognition, coreference resolution and information extraction, with practical examples using both open source and commercial tools. Copies of the example files, scripts, and stylesheets used in the book are available from the companion website, located at http://sites.morganclaypool.com/wilcock.

Table of Contents: Working with XML / Linguistic Annotation / Using Statistical NLP Tools / Annotation Interchange / Annotation Architectures / Text Analytics

Working with XML	1

Linguistic Annotation	19

Using Statistical NLP Tools	45

Annotation Interchange	63

Annotation Architectures	95

Text Analytics	119

Bibliography	147

Bibliographic information

Title	Introduction to Linguistic Annotation and Text Analytics
Series	Synthesis Digital Library of Engineering and Computer Science; Volume 3 of Synthesis Lectures on Human Language Technologies
Author	Graham Wilcock
Publisher	Morgan & Claypool Publishers, 2009
ISBN	1598297384, 9781598297386
Length	149 pages
Subjects	Computers / Artificial Intelligence / General; Computers / Artificial Intelligence / Natural Language Processing; Computers / Languages / General; Computers / Languages / XML; Language Arts & Disciplines / Linguistics / General
