November 21, Wednesday
12:00 – 14:00
Text to Text Generation: Review of the Field and Open Issues
Bio-Informatics seminar
Lecturer : Dr. Michael Elhadad
Lecturer homepage : http://www.cs.bgu.ac.il/~elhadad/
Affiliation : CS, BGU
Location : 202/37
Host : Student Seminar
In text-to-text (T2T) generation, textual units (sentences, clauses, phrases) are extracted from one context and recombined into a new text. As in natural language generation (NLG) in general, the overall task can be split into several steps:
- Content selection
- Content organization
- Realization: the rendering of the selected content into fluent language.
In T2T, "content" is encoded as textual units, generally called "SCU" (Shared Content Unit). The input to a T2T system is a collection of texts related in topic, for example a collection of news reports describing the same event (from different sources or published at different times).
In data-to-text (D2T) applications, content selection and organization are generally related to knowledge representation and inference. In T2T, they are related to Information Extraction, which includes named entity recognition, coreference resolution, entity identification, relation identification and scenario identification.
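The three steps above can be illustrated with a minimal Python sketch. It is not the method presented in the talk: the scoring rule (average cross-document word frequency), the ordering rule, the stopword list, and all function names are assumptions chosen only to keep the example small. A real system would rely on Information Extraction rather than word counts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "was", "is", "at"}


def split_units(text):
    """Split a text into sentence-level units (crude stand-in for real segmentation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(unit):
    """Lowercased alphabetic tokens of a unit, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", unit.lower()) if w not in STOPWORDS}


def select_content(documents, k=2):
    """Content selection: keep the k units whose words recur in the most documents."""
    doc_freq = Counter()
    for doc in documents:
        words = set()
        for unit in split_units(doc):
            words |= content_words(unit)
        doc_freq.update(words)  # each word counted once per document

    scored = []
    for doc_idx, doc in enumerate(documents):
        for pos, unit in enumerate(split_units(doc)):
            words = content_words(unit)
            score = sum(doc_freq[w] for w in words) / max(len(words), 1)
            scored.append((score, doc_idx, pos, unit))
    # Highest score first; ties broken by document index, then position.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


def organize_content(selected):
    """Content organization: order units by their position in the source texts."""
    return sorted(selected, key=lambda t: (t[2], t[1]))


def realize(ordered):
    """Realization: here simply concatenation of the extracted units."""
    return " ".join(unit for _, _, _, unit in ordered)


def generate(documents, k=2):
    return realize(organize_content(select_content(documents, k)))
```

Calling `generate` on two short reports of the same event with `k=2` tends to return two near-duplicate sentences, which shows why extraction alone is not enough and why the realization steps matter.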
Realization is the linguistic component of generation, and is organized around the following steps:
- Lexicalization (selection of the words)
- Aggregation
- Referring expression generation
- Rhetorical structuring and ordering
- Centering and salience
- Syntactic realization (ordering of the words, morphological inflection)
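Two of the realization steps listed above, aggregation and referring expression generation, can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: clauses are given as hypothetical (subject, predicate) pairs, aggregation only merges adjacent clauses with an identical subject, and a hand-written pronoun table replaces any real salience or centering model.

```python
def aggregate(clauses):
    """Aggregation: merge consecutive clauses that share a subject."""
    merged = []
    for subject, predicate in clauses:
        if merged and merged[-1][0] == subject:
            # Same subject as the previous clause: conjoin the predicates.
            merged[-1] = (subject, merged[-1][1] + " and " + predicate)
        else:
            merged.append((subject, predicate))
    return merged


def refer(subject, mentioned, pronouns):
    """Referring expression generation: full form first, pronoun afterwards."""
    if subject in mentioned:
        return pronouns.get(subject, subject)
    mentioned.add(subject)
    return subject


def realize_clauses(clauses, pronouns):
    """Produce sentences from (subject, predicate) pairs."""
    mentioned = set()
    sentences = []
    for subject, predicate in aggregate(clauses):
        noun_phrase = refer(subject, mentioned, pronouns)
        sentence = f"{noun_phrase} {predicate}."
        sentences.append(sentence[0].upper() + sentence[1:])
    return " ".join(sentences)
```

The pronoun choice here ignores intervening entities, so a pronoun may be emitted even when another entity has been mentioned in between. Deciding when that is acceptable is the kind of question centering and salience address.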