Mark Chang's Blog

Machine Learning, Deep Learning and Python

NLTK Tree

Today let's look at how to use nltk.Tree.

First, import the module:

>>> from nltk import Tree

1. Build a Syntax Tree

As an example, take this sentence:

Gary plays baseball

The syntax tree (parse tree) of this sentence looks like this:

nltk1.png

Use nltk.Tree to build the tree:

>>> tree=Tree('S',[Tree('NP',['Gary']),
...           Tree('VP',[Tree('VT',['plays']),
...                     Tree('NP',['baseball'])])])

>>> tree.draw()

This draws the parse tree in a window.

If python-tk is not installed, you can print the tree as text instead:

>>> tree.pprint()
(S (NP Gary) (VP (VT plays) (NP baseball)))
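If you want the bracketed form as a string rather than printed output, the sketch below may help. It assumes NLTK 3, where pformat() returns the string and pretty_print() renders a text-only diagram; the margin value is only an illustrative choice.

```python
from nltk import Tree

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

# pformat() returns the bracketed string instead of printing it.
flat = tree.pformat()
print(flat)

# A small margin (illustrative value) forces the string to wrap
# across several indented lines.
wrapped = tree.pformat(margin=20)
print(wrapped)

# pretty_print() draws the tree as plain text, so no Tk window is needed.
tree.pretty_print()
```

str(tree) gives the same string as pformat() with default arguments, which is handy for logging.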

2. Subtrees, Nodes and Leaves

Next, we can operate on the tree.

For example, we can get its subtrees, node labels, and leaves:

>>> tree[1]
Tree('VP', [Tree('VT', ['plays']), Tree('NP', ['baseball'])])

>>> tree[1].label()  # NLTK 3; older 2.x releases used the .node attribute
'VP'

>>> tree[1,1]
Tree('NP', ['baseball'])

>>> tree[1,1].label()
'NP'

>>> tree[1,1,0]
'baseball'

>>> tree.leaves()
['Gary', 'plays', 'baseball']
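Beyond indexing, the tree can also be walked as a whole. The following is a minimal sketch, assuming NLTK 3, that uses subtrees(), pos(), treepositions() and height() on the same example tree; the lambda filter is just one illustrative way to select nodes.

```python
from nltk import Tree

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

# subtrees() walks every subtree in pre-order, including the root.
labels = [t.label() for t in tree.subtrees()]

# An optional filter keeps only the subtrees you care about.
noun_phrases = [t.leaves() for t in tree.subtrees(lambda t: t.label() == 'NP')]

# pos() pairs each leaf with the label of the node directly above it.
tagged = tree.pos()

# treepositions('leaves') gives the index tuple of every leaf,
# usable in the same way as tree[1, 1, 0].
leaf_positions = tree.treepositions('leaves')

print(labels)
print(noun_phrases)
print(tagged)
print(leaf_positions)
print(tree.height())
```

A position tuple can be passed straight back to the tree, so tree[leaf_positions[2]] returns the last leaf.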

3. Grammar, Chomsky Normal Form

We can also see which grammar rules generated this tree.

Calling productions() returns those rules:

>>> tree.productions()
[S -> NP VP, NP -> 'Gary', VP -> VT NP, VT -> 'plays', NP -> 'baseball']
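The productions can be inspected one by one or collected into a grammar object. The sketch below assumes NLTK 3 and assumes that 'S' is the start symbol (it matches the root label of the example tree); it builds a CFG from the tree's productions and parses the original sentence with a ChartParser.

```python
from nltk import Tree
from nltk.grammar import CFG, Nonterminal
from nltk.parse import ChartParser

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

# Each production exposes its left-hand side and right-hand side.
for prod in tree.productions():
    kind = 'lexical' if prod.is_lexical() else 'phrasal'
    print(prod.lhs(), '->', prod.rhs(), kind)

# Assumption: 'S' is the start symbol, matching the root of the tree.
grammar = CFG(Nonterminal('S'), tree.productions())

# Parsing the original sentence with this grammar should recover
# the same structure the productions were read from.
parser = ChartParser(grammar)
parses = list(parser.parse(['Gary', 'plays', 'baseball']))
print(parses[0])
```

A grammar read off a single tree only covers that tree's words, so any other sentence would need more productions.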

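Next, we can convert the grammar to Chomsky normal form (CNF).

(If you are not sure what CNF is, consult a textbook on the theory of computation.)

First, let's look at an example that does not conform to CNF: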

>>> tree2= Tree('S', [ Tree('NP', ['Gary']),
...                     Tree('VT', ['plays']),
...                     Tree('NP', ['baseball'])])

>>> tree2.productions()
[S -> NP VT NP, NP -> 'Gary', VT -> 'plays', NP -> 'baseball']

>>> tree2.draw()

nltk2.png

S -> NP VT NP does not conform to Chomsky normal form, because its right-hand side has three nonterminals instead of two.

We can convert the tree with chomsky_normal_form():

>>> tree2.chomsky_normal_form()
>>> tree2.productions()
[S -> NP S|<VT-NP>, NP -> 'Gary', S|<VT-NP> -> VT NP, VT -> 'plays', NP -> 'baseball']

>>> tree2.draw()

nltk3.png

The conversion introduces a new node, S|<VT-NP>, so the tree now conforms to CNF.
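To check the result programmatically, something like the sketch below could be used. The helper is_cnf_production is hypothetical (it is not part of NLTK) and tests only the two common CNF rule shapes, A -> B C and A -> 'a', ignoring the empty-string case. The sketch also relies on chomsky_normal_form() modifying the tree in place, which is why it converts a deep copy.

```python
from nltk import Tree
from nltk.grammar import Nonterminal


def is_cnf_production(prod):
    """Hypothetical helper: True for A -> B C or A -> 'a'."""
    rhs = prod.rhs()
    if len(rhs) == 1:
        # A single symbol must be a terminal (a plain string).
        return not isinstance(rhs[0], Nonterminal)
    # Otherwise exactly two nonterminals are required.
    return len(rhs) == 2 and all(isinstance(sym, Nonterminal) for sym in rhs)


flat = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VT', ['plays']),
                  Tree('NP', ['baseball'])])

# chomsky_normal_form() works in place, so convert a deep copy
# and keep the original tree untouched.
binarized = flat.copy(deep=True)
binarized.chomsky_normal_form()

before = all(is_cnf_production(p) for p in flat.productions())
after = all(is_cnf_production(p) for p in binarized.productions())
print(before, after)

# un_chomsky_normal_form() undoes the binarization, also in place.
binarized.un_chomsky_normal_form()
print(binarized == flat)
```

Working on a copy is a design choice: it keeps the original tree available for display or comparison after binarization.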

4. Parse Tree from String

Suppose we have a string like this:

>>> s=r"(S (NP Gary) (VP (VT plays) (NP baseball)))"

To build a tree from this string, use the following:

>>> tree3=Tree.fromstring(s)  # NLTK 3; older 2.x releases used Tree.parse(s)
>>> tree3.pprint()
(S (NP Gary) (VP (VT plays) (NP baseball)))
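Bracketed strings often arrive spread over several lines, for example from a treebank file. The sketch below, assuming NLTK 3's Tree.fromstring, shows that the layout of the string does not matter and that a parsed tree compares equal to one built by hand.

```python
from nltk import Tree

# Line breaks and indentation inside the bracketed string are not significant.
s = """(S
  (NP Gary)
  (VP (VT plays) (NP baseball)))"""
tree3 = Tree.fromstring(s)

# Building the same tree by hand gives an equal object.
manual = Tree('S', [Tree('NP', ['Gary']),
                    Tree('VP', [Tree('VT', ['plays']),
                                Tree('NP', ['baseball'])])])
print(tree3 == manual)

# Converting to a string and parsing it again round-trips this tree.
round_trip = Tree.fromstring(str(tree3))
print(round_trip == tree3)
```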

Conclusion:

For more reference material, see:

tutorial:

https://nltk.googlecode.com/svn/trunk/doc/howto/tree.html

http://www.mit.edu/~6.863/spring2011/labs/nltk-tree-pages.pdf

api documentation:

http://nltk.googlecode.com/svn/trunk/doc/api/nltk.tree.Tree-class.html#productions

source code:

http://www.nltk.org/_modules/nltk/tree.html
