java - How to extract derivation rules from a bracketed parse tree? -


i have lot of parse trees this:

( s ( np-sbj ( prp  )  )  ( inode@s ( vp ( vbp have  )  ( np ( dt  )  ( inode@np ( nn savings  )  ( nn account  )  )  )  )  ( . .  )  )  ) 

for sentence this: "i have savings account ."

i need extract derivation rules these trees. derivation rules like:

s -> np-sbj inode@s np-sbj -> prp  prp -> inode@s -> vp np , on. 

is there prepared code (preferably in java) or pseudo code purpose?

edit:

i think problem general , common in many areas. simplified problem find each parent , it's children parenthesis tree.

i wrote python. believe can read pseudocode. i edit post java later. added java implementation later.

import re  # grammar repository  grammar_repo = []  s = "( s ( np-sbj ( prp  )  )  ( inode@s ( vp ( vbp have  )  ( np ( dt  )  ( inode@np ( nn savings  )  ( nn account  )  )  )  )  ( . . )  )  )" # clean string (make sure there no double space in it) s = s.replace("   ", " ").replace("  ", " ")  # find inner parenthesis (no-need-to-parse-grammar) or (lhs rhs) format simple_grammars = re.findall("\([^\(\)]*\)", s) # repeat until cannot find ( lhs rhs ) grammar while len(simple_grammars) > 0:     # add simple grammar grammar repository     # replace them head     simple_grammar in simple_grammars:         grammar = simple_grammar.split(" ")         # '(' == grammar[0] , ')' == grammar[-1]         lhs = grammar[1]         rhs = grammar[2:-1]         grammar_repo.append((lhs, rhs))          s = s.replace(simple_grammar, lhs)      simple_grammars = re.findall("\([^\(\)]*\)", s) 

in short, start simplest grammar can find , replace them left-hand-side , continue. e.g. find (prp i) save it, replace prp. repeat until find syntax.

update: java implementation bit different it's same idea. full code here: http://ideone.com/0ee8bd

printstream ps = new printstream(system.out); arraylist grammarrepo = new arraylist(); string item, grammar; string str = "( s ( np-sbj ( prp  )  )  ( inode@s ( vp ( vbp have  )  ( np ( dt  )  ( inode@np ( nn savings  )  ( nn account  )  )  )  )  ( . . )  )  )"; // cleanup double spaces while (str.contains("  ")){     str = str.replaceall("  ", " "); } // find no parenthesis zones! matcher m = pattern.compile("\\([^\\(\\)]*\\)").matcher(str);  // loop until nothing left: while (m.find()) {     item = m.group();     // make grammar:     grammar = item.substring(1, item.length()-1).trim().replacefirst(" ", " -> ");      if (!grammarrepo.contains(grammar)) {         grammarrepo.add(grammar);         ps.print(grammar + "\n");     }      str = str.replace(item, grammar.split(" -> ")[0]);     m = pattern.compile("\\([^\\(\\)]*\\)").matcher(str); } 

the output:

prp -> np-sbj -> prp vbp -> have dt -> nn -> savings nn -> account inode@np -> nn nn np -> dt inode@np vp -> vbp np . -> . inode@s -> vp . s -> np-sbj inode@s 

Comments

Popular posts from this blog

facebook - android ACTION_SEND to share with specific application only -

python - Creating a new virtualenv gives a permissions error -

javascript - cocos2d-js draw circle not instantly -