Querying Parse Trees of Stochastic Context-Free Grammars
Authors
- Sara Cohen (The Hebrew University, Israel)
- Benny Kimelfeld (IBM Almaden Research Center, USA)
Abstract
Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech recognition, information extraction, Web-page wrapping and even analysis of RNA. A string and an SCFG jointly represent a probabilistic interpretation of the meaning of the string, in the form of a (possibly infinite) probability space of parse trees. The problem of evaluating a query over this probability space is considered under the conventional semantics of querying a probabilistic database. For general SCFGs, extremely simple queries may have results that include irrational probabilities. But, for a large subclass of SCFGs (that includes all the standard studied subclasses of SCFGs) and the language of tree-pattern queries with projection (and child/descendant edges), it is shown that query results have rational probabilities with a polynomial-size bit representation and, more importantly, an efficient query-evaluation algorithm is presented.
Session
ICDT Research Session 2: Probabilistic Databases (Tuesday, March 23, 14:00—15:30)

