AUTOMATIC LABELING OF CONTRASTIVE WORD PAIRS FROM SPONTANEOUS SPOKEN ENGLISH
The concept of contrast plays an important role in many spoken language technologies, ranging from spoken language understanding to speech synthesis. Depending on the point of view from which it is observed, contrast can be seen as: a) a discourse relation that ties discourse elements together; b) a concept of information structure that makes a word (or a phrase) salient by comparing it with other word(s) available from the discourse context; c) a linguistic concept that is often prosodically marked. Given the broad meaning of contrast, the different discourse scenarios invoking it, the poor availability of corpora annotated with categories of contrast, and our main research interest in investigating the role of contrast in prosodic prominence modeling for text-to-speech applications, we decided to focus on only one aspect/category of contrast: an information structure relation that links two semantically related words that explicitly contrast with each other.
Before merging the syntactic and the information structure annotations, we converted the constituent format of the Penn Treebank into dependency trees using the Penn2Malt converter ([6]). Since the Penn Treebank constituent annotation for Switchboard uses conventions that are slightly different from (and not yet as standardly established as) those presupposed by the Penn2Malt converter, we had to support the converter with some additional scripts. However, because of problems encountered in the conversion process, we had to remove 54 (out of 146) dialogues. For each remaining dialogue, all word senses were disambiguated (against the WordNet sense inventory) using the WordNet::SenseRelate Perl module ([7]).
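The disambiguation step itself was carried out with the WordNet::SenseRelate Perl module; purely as an illustrative analogue, the minimal Python sketch below shows the same kind of per-dialogue, context-based disambiguation against WordNet using NLTK's Lesk implementation. The function name and the choice of the Lesk algorithm are assumptions made for this sketch, not the authors' actual setup.

```python
# Minimal sketch of per-dialogue word sense disambiguation against WordNet.
# NOTE: the paper uses the WordNet::SenseRelate Perl module; this NLTK-based
# Lesk variant is only an illustrative stand-in, not the authors' pipeline.
# Requires the WordNet data: nltk.download("wordnet")
from nltk.wsd import lesk


def disambiguate_dialogue(turns):
    """Assign a WordNet synset to every word of every turn.

    `turns` is assumed to be a list of token lists (one list per utterance).
    Returns, for each turn, a list of (token, synset-or-None) pairs.
    """
    labeled_turns = []
    for tokens in turns:
        labeled = []
        for tok in tokens:
            # lesk() picks the synset whose gloss overlaps most with the
            # surrounding context; it returns None for words not in WordNet.
            synset = lesk(tokens, tok)
            labeled.append((tok, synset))
        labeled_turns.append(labeled)
    return labeled_turns


if __name__ == "__main__":
    example = [["i", "take", "the", "bus"], ["you", "take", "the", "train"]]
    for turn in disambiguate_dialogue(example):
        print(turn)
```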
The syntactic features are POS tags, dependency relations (subject of, object of, etc.), and features derived from both of them. Examples of features derived from POS are the feature indicating whether W1 is the only word in the sentence having the same broad POS as W2, and the feature indicating whether W1 is the closest word (in terms of the number of words between them) preceding W2 and having the same broad POS. The use of syntactic information deeper than POS, such as syntactic dependencies (and information related to them), is motivated by the need to identify syntactic patterns of contrastiveness that cannot be identified using POS and lexical features alone. For example, knowing that W1 and W2 have the same type of dependency with their heads, as in example (3) (both "you" and the first "I" have a "subject of" dependency with "take" and "do" respectively), or that their heads refer to the same item, as in example (6), seems to be necessary (but often not sufficient) information for identifying contrast.
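To make these feature definitions concrete, the following minimal sketch computes the four features just described over a simple dependency-parsed sentence representation. The `Token` structure and all function names are assumptions introduced for this example only; the paper does not specify the implementation.

```python
# Illustrative sketch of the POS- and dependency-based features described above.
# The token representation (word, broad POS, head index, dependency relation)
# and all names here are assumptions made for this example, not the authors' code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    index: int           # position in the sentence
    word: str
    broad_pos: str       # e.g. "N", "V", "PRP"
    head: Optional[int]  # index of the syntactic head, None for the root
    deprel: str          # e.g. "subject_of", "object_of"


def only_word_with_same_pos(sent: List[Token], w1: Token, w2: Token) -> bool:
    """True if W1 is the only word in the sentence (besides W2) with W2's broad POS."""
    same_pos = [t for t in sent
                if t.broad_pos == w2.broad_pos and t.index != w2.index]
    return len(same_pos) == 1 and same_pos[0].index == w1.index


def closest_preceding_same_pos(sent: List[Token], w1: Token, w2: Token) -> bool:
    """True if W1 is the closest word preceding W2 that shares W2's broad POS."""
    preceding = [t for t in sent
                 if t.index < w2.index and t.broad_pos == w2.broad_pos]
    return bool(preceding) and max(preceding, key=lambda t: t.index).index == w1.index


def same_deprel_to_head(w1: Token, w2: Token) -> bool:
    """True if W1 and W2 bear the same dependency relation to their heads,
    as in example (3), where "you" and "I" are both subjects of their verbs."""
    return w1.deprel == w2.deprel


def same_head(w1: Token, w2: Token) -> bool:
    """True if W1 and W2 depend on the same head word (cf. example (6))."""
    return w1.head is not None and w1.head == w2.head
```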
