I have a tree, specifically a parse tree with tags at the internal nodes and strings/words at the leaves. I want to pass this tree as input to a neural network while preserving its structure.
**Current approach**

Assume we have some dictionary of words $w_1, w_2, \dots, w_n$. Encode the words that appear in the parse tree as $n$-dimensional binary vectors, with a 1 in the $i$th position whenever the word in the parse tree is $w_i$ (a one-hot encoding).
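For concreteness, a quick sketch of that one-hot encoding (the tiny vocabulary here is made up for illustration):

```python
import numpy as np

def one_hot(word, vocabulary):
    """Return an n-dimensional binary vector with a 1 at the word's index."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(word)] = 1.0
    return vec

vocabulary = ["the", "cat", "sat"]   # hypothetical dictionary w1..wn
print(one_hot("cat", vocabulary))    # [0. 1. 0.]
```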
Now what about the tree structure? The number of possible trees over $n$ leaf words grows exponentially (on the order of $2^n$ or more), so we can't fix a maximum input length and just brute-force enumerate all possible trees.
Right now all I can think of is to approximate the tree by taking only the direct parent of each leaf. This can also be represented as a binary vector, with dimension equal to the number of distinct tag types (on the order of ~100, I suppose). My input then has two parts: the first is the vector representation of a word, and the second is the vector representation of its parent tag.
Except this will lose a lot of the structure in the sentence. Is there a standard/better way of solving this problem?
You need a recursive neural network. Please see this repository for an example implementation: https://github.com/erickrf/treernn
The principle of a recursive (not recurrent) neural network is that it learns a representation for each leaf, then works up through the parents, combining child representations, to finally construct a representation of the whole structure.
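To make that bottom-up composition concrete, here is a minimal PyTorch sketch (my own illustration, not the code from the linked repository; the `TreeNode` class, the dimensions, and the binary-tree assumption are all simplifications):

```python
import torch
import torch.nn as nn

class TreeNode:
    def __init__(self, children=(), word_vec=None):
        self.children = children      # child TreeNodes (empty for leaves)
        self.word_vec = word_vec      # leaf embedding, shape (dim,)

class TreeRNN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # one shared composition function, applied at every internal node
        self.compose = nn.Linear(2 * dim, dim)

    def encode(self, node):
        if not node.children:                 # leaf: use its embedding directly
            return node.word_vec
        left = self.encode(node.children[0])  # recurse into children first
        right = self.encode(node.children[1])
        # combine the two child representations into the parent's
        return torch.tanh(self.compose(torch.cat([left, right])))

dim = 4
model = TreeRNN(dim)
# tiny binary tree with two leaves
tree = TreeNode(children=(TreeNode(word_vec=torch.randn(dim)),
                          TreeNode(word_vec=torch.randn(dim))))
root_repr = model.encode(tree)                # representation of the whole tree
print(root_repr.shape)                        # torch.Size([4])
```

Because the same `compose` weights are reused at every node, the network can handle trees of any shape and size.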
**Encode the tree structure:** Think of a recurrent neural network, where you have a single chain that can be constructed with a for loop. Here, however, you have a tree, so you need some kind of loop with branching. A recursive function call works, at the cost of some Python overhead. I suggest you build the network with a define-by-run framework (like Chainer or PyTorch) to reduce that overhead, because the tree may be different for each data sample, which requires rebuilding the computation graph every time. Read Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, with the original Torch7 implementation here and a PyTorch implementation; you may get some ideas.
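As a minimal sketch of that define-by-run style, with the recursion doing the "loop with branching" (the weight shape and the nested-tuple tree format are made up for illustration):

```python
import torch

W = torch.randn(4, 8, requires_grad=True)   # shared composition weight (dim x 2*dim)

def encode(tree):
    """Recursively build the computation graph for one tree.

    `tree` is either a leaf tensor of shape (4,) or a (left, right) tuple.
    """
    if isinstance(tree, torch.Tensor):       # leaf: already a vector
        return tree
    left, right = tree                       # internal node: branch and recurse
    return torch.tanh(W @ torch.cat([encode(left), encode(right)]))

# the graph is rebuilt per sample, since each tree can have a different shape
sample = ((torch.randn(4), torch.randn(4)), torch.randn(4))
root = encode(sample)
root.sum().backward()                        # gradients flow through the whole tree
```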
For encoding the tag at a node, I think the easiest way is to encode it the same way you encode words. For example, a node's data is [word vector][tag vector]. If the node is a leaf, you have a word but may not have a tag (you did not say whether there is a tag at leaf nodes), so the leaf representation is [word vector][zero vector] (or [word vector][tag vector] if leaves do carry tags). An inner node that has no word becomes [zero vector][tag vector]. Then inner nodes and leaves have the same data-representation dimension, and you can treat them equally (or not :3)
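A small sketch of that [word vector][tag vector] layout with zero padding for whichever part is absent (the dimensions here are arbitrary):

```python
import numpy as np

word_dim, tag_dim = 5, 3

def node_vector(word_vec=None, tag_vec=None):
    """Concatenate word and tag parts, zero-padding whichever is missing."""
    w = word_vec if word_vec is not None else np.zeros(word_dim)
    t = tag_vec if tag_vec is not None else np.zeros(tag_dim)
    return np.concatenate([w, t])

leaf = node_vector(word_vec=np.ones(word_dim))    # [word][zero vector]
inner = node_vector(tag_vec=np.ones(tag_dim))     # [zero vector][tag]
assert leaf.shape == inner.shape == (word_dim + tag_dim,)
```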