This talk reviews our recent investigations into improving
the performance of ADIOS, an algorithm that, given a corpus
of strings, recursively distills from it an assembly of hierarchically
structured constituents and equivalence classes in an unsupervised manner.
We cover several directions of development, including:
- Enhancing the capabilities of the algorithm by
splitting complex sentences on function words,
thus simplifying the training data and preventing acquisition of
"across-clauses" patterns. This involves a method
for the unsupervised identification of conjunctions, based
solely on particle statistics and using distributional clustering.
- Combining the search for constituents with the search for equivalence
classes, by only defining a constituent when it is necessary to improve
the generalization ability of the inferred grammar. This approach is based
on an extension of distributional clustering which clusters sub-sentences,
and on a novel alignment procedure utilized to detect interchangeable
words and expressions.
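
Both directions above rely on distributional clustering, that is, grouping items that occur in similar contexts. The Python sketch below is only a toy illustration of that general idea and is not the procedure presented in the talk or any part of ADIOS: the function names, the left/right-neighbour context features, the cosine measure, the greedy grouping, the 0.8 threshold and the four-sentence corpus are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left/right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so every word has two neighbours.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.8):
    """Greedy one-pass grouping: a word joins the first cluster whose
    first member has a sufficiently similar context vector."""
    clusters = []
    for word in sorted(vectors):
        for group in clusters:
            if cosine(vectors[word], vectors[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Hypothetical toy corpus, not data from the talk.
corpus = [
    "the cat sleeps and the dog barks",
    "the dog sleeps or the cat barks",
    "a cat runs and a dog sleeps",
    "a dog runs or a cat barks",
]
clusters = cluster(context_vectors(corpus))
for group in clusters:
    print(group)
```

In this toy corpus "and" and "or" occur in exactly the same contexts, so they fall into one group without any labels being supplied. A candidate conjunction class of this kind is what a clause-splitting step could then use as split points.
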
Overall, these approaches improve the quality of the inferred
grammar compared with ADIOS.
Joint work with Ben Sandbank, Jonathan Berant and Shimon Edelman.