As mentioned in the introduction, this section is divided into three parts:
There are also some uncommon links you might need, but they are not required for the seminar: Uncommon links.
A < B (A is the parent of B)
This is the simplest and one of the most common links.
A is the parent node of B and immediately dominates it.
Let us take a very simple structure: a noun phrase consisting of a determiner and a noun. It will look like this:
A query that would match this structure would be: NP < DT (NP is the parent of DT). It would find all sentence trees that contain a determiner inside a noun phrase.
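To make the parent relation concrete, here is a minimal Python sketch of what NP < DT checks. It is not how the query tool itself is implemented; the tuple-based tree encoding and the helper names (`label`, `children`, `subtrees`, `matches_parent`) are assumptions made for illustration only.

```python
# A tree is a tuple (label, child, child, ...); a word (leaf) is a plain string.
np_tree = ("NP", ("DT", "the"), ("NN", "cat"))


def label(node):
    """Return the node's name: the word itself for a leaf, else the first item."""
    return node if isinstance(node, str) else node[0]


def children(node):
    """Return the immediate children of a node (none for a leaf)."""
    return [] if isinstance(node, str) else list(node[1:])


def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)


def matches_parent(tree, a, b):
    """True if some node named `a` immediately dominates a node named `b`."""
    return any(
        label(node) == a and any(label(child) == b for child in children(node))
        for node in subtrees(tree)
    )


print(matches_parent(np_tree, "NP", "DT"))  # NP < DT -> True
print(matches_parent(np_tree, "DT", "NP"))  # DT < NP -> False
```

Note that "the" is a child of DT, not of NP, so a check for NP < the would fail: the link only looks one level down.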
A > B (A is the child of B)
Sometimes it is easier and more convenient to use this link (>) instead of B < A (B is the parent of A), especially if it is the child node you are querying for, because you can easily combine queries for a single node.
We take that simple structure again:
To match this structure we simply turn around the query from the last example: DT > NP (DT is the child of NP).
Advanced example (combining < and >)
Now we want to find sentences which have a PP inside of a VP.
To make the query more realistic, we want the preposition on (query: IN < /^on$/) as a child of PP. A matching structure looks like this:
In the query we combine > and < to form the more complex query:
PP > VP < (IN < /^on$/)
(PP is a child of VP and the parent of IN, which is the parent of /^on$/).
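The point of the combined query is that both links are conditions on the same first node, PP. The following Python sketch spells this out; the example tree and all helper names are hypothetical and only illustrate the logic, not the actual query engine.

```python
import re

# Hypothetical tree: (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat))))
vp_tree = ("VP", ("VBD", "sat"),
           ("PP", ("IN", "on"), ("NP", ("DT", "the"), ("NN", "mat"))))


def label(node):
    """Return the node's name: the word itself for a leaf, else the first item."""
    return node if isinstance(node, str) else node[0]


def children(node):
    """Return the immediate children of a node (none for a leaf)."""
    return [] if isinstance(node, str) else list(node[1:])


def with_parents(node, parent=None):
    """Yield (node, parent) pairs for the whole tree."""
    yield node, parent
    for child in children(node):
        yield from with_parents(child, node)


def is_in_on(node):
    """IN < /^on$/: an IN node with a child whose name matches the regex."""
    return label(node) == "IN" and any(
        re.search(r"^on$", label(child)) for child in children(node)
    )


def find_pp(tree):
    """PP > VP < (IN < /^on$/): both links are tested on the PP node."""
    return [
        node
        for node, parent in with_parents(tree)
        if label(node) == "PP"
        and parent is not None
        and label(parent) == "VP"                              # PP > VP
        and any(is_in_on(child) for child in children(node))   # PP < (IN < /^on$/)
    ]


print(len(find_pp(vp_tree)))  # 1
```

The parentheses around IN < /^on$/ matter: they make the whole sub-query a description of the child of PP, rather than attaching /^on$/ to PP directly.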
A $ B (A is a sister of B)
When you search for two different nodes with the same parent (that is, immediately dominated by the same node), you can use this link. It is represented by a dollar sign ($).
In this tree DT and NN are sister nodes because they are both dominated by an NP, so the query DT $ NN (DT is a sister of NN) would match this tree.
As you are only searching for a sister-relation you can also use NN $ DT (NN is a sister of DT) because in this case the node names are interchangeable: DT is a sister of NN and NN is a sister of DT.
The only condition is that the two nodes have to be different, so the query DT $ DT (DT is a sister of DT) would not match this tree, even though both symbols are immediately dominated by NP: the tree contains only one DT, and a node cannot be its own sister.
But it would match the following tree because it contains two determiners:
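The sister check, including the requirement that the two nodes be distinct, can be sketched in Python as follows. The two example trees and the helper names are assumptions for illustration; the second tree is just one possible noun phrase with two determiners.

```python
one_dt = ("NP", ("DT", "the"), ("NN", "cat"))
two_dt = ("NP", ("DT", "all"), ("DT", "the"), ("NNS", "cats"))


def label(node):
    """Return the node's name: the word itself for a leaf, else the first item."""
    return node if isinstance(node, str) else node[0]


def children(node):
    """Return the immediate children of a node (none for a leaf)."""
    return [] if isinstance(node, str) else list(node[1:])


def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)


def matches_sister(tree, a, b):
    """True if two *different* children of one parent are named `a` and `b`."""
    for node in subtrees(tree):
        kids = children(node)
        for i, x in enumerate(kids):
            for j, y in enumerate(kids):
                # i != j enforces that a node is never its own sister.
                if i != j and label(x) == a and label(y) == b:
                    return True
    return False


print(matches_sister(one_dt, "DT", "NN"))  # True
print(matches_sister(one_dt, "NN", "DT"))  # True (the relation is symmetric)
print(matches_sister(one_dt, "DT", "DT"))  # False (only one DT)
print(matches_sister(two_dt, "DT", "DT"))  # True (two distinct DT nodes)
```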
A $.. B (A is a sister of and precedes B)
Sometimes you want to be a little more precise than just querying for sister nodes. If you know the relative order of your sister nodes, you should use this basic link, a dollar sign followed by two dots ($..).
In our example it is very easy: the determiner always precedes its noun. A query that would match this structure is: DT $.. NN (DT is a sister of and precedes NN).
But when we use this query, DT does not have to immediately precede NN. The following structure would also be matched because DT and NN are sisters and DT does in fact precede NN.
A $. B (A is a sister of and immediately precedes B)
If you want to have more exact results and search for a node that is a sister of and immediately precedes another you will need to use this link, a dollar sign followed by one dot ($.).
The query DT $. NN (DT is a sister of and immediately precedes NN) will not match this structure:
But it will of course match this stricter structure, as the determiner immediately precedes the noun.
This basic link ($.) is more restrictive than $.. and you should thus prefer it to $.. wherever possible.
The less precise basic link might overgenerate, that is, give you too many results that you do not want.
A $,, B (A is a sister of and follows B)
This basic link is similar to A $.. B: it looks for a node A which has a sister node B and follows B.
We take a simple noun phrase:
A query that uses $,, and matches this structure is NN $,, DT (NN is a sister of and follows DT).
A $, B (A is a sister of and immediately follows B)
This basic link is more restrictive than $,, because A has to follow B immediately.
The query from the previous link will not work:
If we use the query NN $, DT (NN is a sister of and immediately follows DT), we will not get the structure above as a result because there is a JJ between these nodes.
The query JJ $, DT (JJ is a sister of and immediately follows DT) will match, though.
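The four ordered sister links differ only in the distance they allow between the two sisters. A minimal Python sketch, assuming a noun phrase with an adjective between determiner and noun and using hypothetical helper names, makes the comparison explicit:

```python
np_tree = ("NP", ("DT", "the"), ("JJ", "black"), ("NN", "cat"))


def label(node):
    """Return the node's name: the word itself for a leaf, else the first item."""
    return node if isinstance(node, str) else node[0]


def children(node):
    """Return the immediate children of a node (none for a leaf)."""
    return [] if isinstance(node, str) else list(node[1:])


def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)


def sister_offsets(tree, a, b):
    """For every sister pair (A, B), collect position(B) - position(A)."""
    offsets = []
    for node in subtrees(tree):
        kids = children(node)
        for i, x in enumerate(kids):
            for j, y in enumerate(kids):
                if i != j and label(x) == a and label(y) == b:
                    offsets.append(j - i)
    return offsets


def sister_precedes(tree, a, b):              # A $.. B
    return any(d > 0 for d in sister_offsets(tree, a, b))


def sister_immediately_precedes(tree, a, b):  # A $. B
    return any(d == 1 for d in sister_offsets(tree, a, b))


def sister_follows(tree, a, b):               # A $,, B
    return any(d < 0 for d in sister_offsets(tree, a, b))


def sister_immediately_follows(tree, a, b):   # A $, B
    return any(d == -1 for d in sister_offsets(tree, a, b))


print(sister_precedes(np_tree, "DT", "NN"))              # True
print(sister_immediately_precedes(np_tree, "DT", "NN"))  # False (JJ intervenes)
print(sister_immediately_follows(np_tree, "JJ", "DT"))   # True
```

Every match of an immediate link is also a match of its non-immediate counterpart, which is why the non-immediate links return at least as many results.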
With these basic links you do not query for hierarchical structures but for the order in which the nodes and the words are stored in the treebank. They are useful as additional links that can narrow down your search, but usually you will start with hierarchical links.
A . B (A immediately precedes B)
We once again take our simple structure.
It is apparent that the determiner the immediately precedes the noun cat. So two possible queries matching this structure would be:
- DT . NN (DT immediately precedes NN)
- /^the$/ . /^cat$/ (/^the$/ immediately precedes /^cat$/)
These two queries are basically the same because when you look at this structure in Labelled Bracketing you will see that DT and the are in the same brackets and so are NN and cat.
(NP (DT the) (NN cat))
This means that this structure will also match such queries as NP . DT (NP immediately precedes DT) and NP . /^the$/ (NP immediately precedes the).
NP . NN will not match because NP does not immediately precede NN: there is a determiner between them.
A .. B (A precedes B)
Querying with the restrictive linear basic links . and , will in most cases undergenerate, that is, give you too few results.
It is quite obvious that when you query for /^the$/ . /^cat$/ (the immediately precedes cat) you will miss many possible results:
the black cat, the sleeping cat, the very large but really, really cute cat
When you are searching for a verb phrase like were sitting (query: /^were$/ . /^sitting$/) you might want to include verb phrases like were not sitting or were finally sitting.
For this you will need to use this less restrictive basic link (..).
The query /^the$/ .. /^cat$/ (the precedes cat) will match the above tree, the sleeping cat, the very large but really, really cute cat and every other sentence in which the determiner the precedes cat.
So the sentence The woman walking in the garden saw a cat. will be among the results, even though we do not want this sentence because the refers to woman and not to cat.
This is where our query will overgenerate, that is, give you results you do not need.
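The trade-off between . (undergenerating) and .. (overgenerating) can be tried out on plain word sequences. The following Python sketch only looks at the words of a sentence, not at tree nodes, and the function name `precedes` and its `immediate` parameter are assumptions for illustration:

```python
import re


def precedes(words, a, b, immediate=False):
    """Check whether a word matching regex `a` comes before one matching `b`.

    With immediate=True this mimics `a . b`, otherwise `a .. b`.
    """
    a_positions = [i for i, w in enumerate(words) if re.search(a, w)]
    b_positions = [i for i, w in enumerate(words) if re.search(b, w)]
    if immediate:
        return any(j - i == 1 for i in a_positions for j in b_positions)
    return any(i < j for i in a_positions for j in b_positions)


simple = "the cat".split()
longer = "the black cat".split()
unwanted = "The woman walking in the garden saw a cat".split()

print(precedes(simple, r"^the$", r"^cat$", immediate=True))   # True
print(precedes(longer, r"^the$", r"^cat$", immediate=True))   # False: missed by .
print(precedes(longer, r"^the$", r"^cat$"))                   # True: found by ..
# Also True: a "the" occurs somewhere before "cat" without belonging to it.
print(precedes(unwanted, r"^the$", r"^cat$"))
```

The same applies to the verb phrase example: `precedes("were not sitting".split(), r"^were$", r"^sitting$")` succeeds only without `immediate=True`.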
A , B (A immediately follows B)
This link is the opposite of ., where A immediately precedes B.
Its usage is very similar.
The query NN , DT (NN immediately follows DT) will match this structure.
It will not match the following structure:
A ,, B (A follows B)
This basic link is less restrictive than , and thus can overgenerate (cf. A .. B).
The query NN ,, DT (NN follows DT) will match all of the following structure trees:
These are uncommon basic links, which you will probably not need for your queries. For the sake of completeness they are briefly introduced here.
- A <N B (B is the Nth child of A)
- A >N B (A is the Nth child of B)
- A <, B (B is the first child of A)
- A >, B (A is the first child of B)
- A <-N B (B is the Nth-to-last child of A)
- A >-N B (A is the Nth-to-last child of B)
- A <- B (B is the last child of A)
- A >- B (A is the last child of B)
- A <` B (B is the last child of A)
- A >` B (A is the last child of B)
- A <: B (B is the only child of A)
- A >: B (A is the only child of B)
- A << B (A dominates B (A is an ancestor of B))
- A >> B (B dominates A (B is an ancestor of A))
- A <<, B (B is a left-most descendant of A)
- A >>, B (A is a left-most descendant of B)
- A <<` B (B is a right-most descendant of A)
- A >>` B (A is a right-most descendant of B)
- A <<: B (There is a single path of descent from A and B is on it)
- A >>: B (There is a single path of descent from B and A is on it)
- A = B (A is the same node as B)
- A ~ B (A has the same name as B)
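For readers who want to see what a few of these links test, here is a rough Python sketch of the Nth-child, only-child and dominance relations. The tuple-based tree and the helper names (`nth_child`, `is_only_child`, `dominates`) are hypothetical and serve only as an illustration of the definitions in the list above.

```python
np_tree = ("NP", ("DT", "the"), ("JJ", "black"), ("NN", "cat"))


def label(node):
    """Return the node's name: the word itself for a leaf, else the first item."""
    return node if isinstance(node, str) else node[0]


def children(node):
    """Return the immediate children of a node (none for a leaf)."""
    return [] if isinstance(node, str) else list(node[1:])


def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)


def nth_child(node, n):
    """Child selected by A <N B (n > 0, from the left) or A <-N B (n < 0)."""
    kids = children(node)
    if n == 0 or abs(n) > len(kids):
        return None
    return kids[n - 1] if n > 0 else kids[n]


def is_only_child(parent, name):
    """A <: B: `parent` has exactly one child, and it is named `name`."""
    kids = children(parent)
    return len(kids) == 1 and label(kids[0]) == name


def dominates(node, name):
    """A << B: some node strictly below `node` is named `name`."""
    return any(
        label(d) == name for child in children(node) for d in subtrees(child)
    )


print(label(nth_child(np_tree, 1)))    # DT  (NP <1 DT, same as NP <, DT)
print(label(nth_child(np_tree, -1)))   # NN  (NP <-1 NN, same as NP <- NN)
print(is_only_child(np_tree[1], "the"))  # True  (DT <: the)
print(dominates(np_tree, "cat"))         # True  (NP << cat)
```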