A Workshop on Machine Learning in Natural Language Processing

 

Organizers: Shalom Lappin and Ido Dagan

 

Workshop Abstracts

 

Boosting unsupervised language acquisition with ADIOS



Eytan Ruppin
Tel Aviv University

 


This talk will review our recent work on improving the performance of ADIOS, an algorithm that, given a corpus of strings, recursively distills from it an assembly of hierarchically structured constituents and equivalence classes in an unsupervised manner. We aim to cover a few directions of development, including:
  1. Enhancing the capabilities of the algorithm by splitting complex sentences at function words, which simplifies the training data and prevents the acquisition of "across-clauses" patterns, that is, patterns that span clause boundaries. This involves a method for the unsupervised identification of conjunctions, based solely on particle statistics and distributional clustering.
  2. Combining the search for constituents with the search for equivalence classes, by defining a constituent only when doing so is necessary to improve the generalization ability of the inferred grammar. This approach is based on an extension of distributional clustering that clusters sub-sentences, and on a novel alignment procedure used to detect interchangeable words and expressions.
Overall, these approaches improve the quality of the inferred grammar relative to the grammar inferred by the unmodified ADIOS algorithm.
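
As a rough illustration of the idea behind the first direction, the sketch below compares words by the distribution of their immediate neighbours and then splits a sentence at a given set of function words. It is a toy example only, not the ADIOS algorithm and not the method presented in the talk: the function names (`context_vectors`, `cosine`, `split_on`), the two-sentence corpus, and the use of cosine similarity over adjacent-word counts are all assumptions made for illustration, and no actual clustering step is performed.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its left and right neighbours (hypothetical helper)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so every real token has two neighbours.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_on(sentence, function_words):
    """Split a sentence into clauses at the given function words, dropping them."""
    clauses, current = [], []
    for token in sentence.split():
        if token in function_words:
            if current:
                clauses.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(" ".join(current))
    return clauses


# Toy corpus (invented for this example).
corpus = [
    "the cat sleeps and the dog barks",
    "the dog sleeps but the cat runs",
]
vectors = context_vectors(corpus)
similarity_and_but = cosine(vectors["and"], vectors["but"])
similarity_and_cat = cosine(vectors["and"], vectors["cat"])
clauses = split_on(corpus[0], {"and", "but"})
```

In this toy corpus "and" and "but" occur in identical contexts, so their similarity is high, while "and" and "cat" share no contexts. A clustering step over such similarities could group conjunction-like words without supervision, and the resulting set could then be passed to `split_on` to simplify the training sentences.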
 
Joint work with Ben Sandbank, Jonathan Berant and Shimon Edelman

 

 
