java - How to extract derivation rules from a bracketed parse tree? -
i have lot of parse trees this:
( s ( np-sbj ( prp ) ) ( inode@s ( vp ( vbp have ) ( np ( dt ) ( inode@np ( nn savings ) ( nn account ) ) ) ) ( . . ) ) )
for sentence this: "i have savings account ."
i need extract derivation rules these trees. derivation rules like:
s -> np-sbj inode@s np-sbj -> prp prp -> inode@s -> vp np , on.
is there prepared code (preferably in java) or pseudo code purpose?
edit:
i think problem general , common in many areas. simplified problem find each parent , it's children parenthesis tree.
i wrote python. believe can read pseudocode. i edit post java later. added java implementation later.
import re # grammar repository grammar_repo = [] s = "( s ( np-sbj ( prp ) ) ( inode@s ( vp ( vbp have ) ( np ( dt ) ( inode@np ( nn savings ) ( nn account ) ) ) ) ( . . ) ) )" # clean string (make sure there no double space in it) s = s.replace(" ", " ").replace(" ", " ") # find inner parenthesis (no-need-to-parse-grammar) or (lhs rhs) format simple_grammars = re.findall("\([^\(\)]*\)", s) # repeat until cannot find ( lhs rhs ) grammar while len(simple_grammars) > 0: # add simple grammar grammar repository # replace them head simple_grammar in simple_grammars: grammar = simple_grammar.split(" ") # '(' == grammar[0] , ')' == grammar[-1] lhs = grammar[1] rhs = grammar[2:-1] grammar_repo.append((lhs, rhs)) s = s.replace(simple_grammar, lhs) simple_grammars = re.findall("\([^\(\)]*\)", s)
in short, start simplest grammar can find , replace them left-hand-side , continue. e.g. find (prp i)
save it, replace prp
. repeat until find syntax.
update: java implementation bit different it's same idea. full code here: http://ideone.com/0ee8bd
printstream ps = new printstream(system.out); arraylist grammarrepo = new arraylist(); string item, grammar; string str = "( s ( np-sbj ( prp ) ) ( inode@s ( vp ( vbp have ) ( np ( dt ) ( inode@np ( nn savings ) ( nn account ) ) ) ) ( . . ) ) )"; // cleanup double spaces while (str.contains(" ")){ str = str.replaceall(" ", " "); } // find no parenthesis zones! matcher m = pattern.compile("\\([^\\(\\)]*\\)").matcher(str); // loop until nothing left: while (m.find()) { item = m.group(); // make grammar: grammar = item.substring(1, item.length()-1).trim().replacefirst(" ", " -> "); if (!grammarrepo.contains(grammar)) { grammarrepo.add(grammar); ps.print(grammar + "\n"); } str = str.replace(item, grammar.split(" -> ")[0]); m = pattern.compile("\\([^\\(\\)]*\\)").matcher(str); }
the output:
prp -> np-sbj -> prp vbp -> have dt -> nn -> savings nn -> account inode@np -> nn nn np -> dt inode@np vp -> vbp np . -> . inode@s -> vp . s -> np-sbj inode@s
Comments
Post a Comment