Like the SUSANNE Corpus Tagset, the Penn Treebank Tagset consists of two main parts: the syntactic tagset and the POS (part-of-speech) tagset.
The syntactic tagset
ADJP | Adjective phrase |
ADVP | Adverb phrase |
NP | Noun phrase |
PP | Prepositional phrase |
S | Simple declarative clause |
SBAR | Clause introduced by subordinating conjunction or 0 (see below) |
SBARQ | Direct question introduced by wh-word or wh-phrase |
SINV | Declarative sentence with subject-aux inversion |
SQ | Subconstituent of SBARQ excluding wh-word or wh-phrase |
VP | Verb phrase |
WHADVP | Wh-adverb phrase |
WHNP | Wh-noun phrase |
WHPP | Wh-prepositional phrase |
X | Constituent of unknown or uncertain category |
Null elements | |
* | "Understood" subject of infinitive or imperative |
0 | Zero variant of that in subordinate clauses |
T | Trace: marks position where moved wh-constituent is interpreted |
NIL | Marks position where preposition is interpreted in pied-piping contexts |
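The syntactic labels above attach to whole constituents rather than to single words. As a minimal sketch, assuming trees are written in a Lisp-style bracketed notation such as `(NP (DT the) (NN cat))` (the notation, the example sentence, and the helper names `parse` and `phrase_labels` are illustrative assumptions, not taken from the list above), the following Python code reads such a string and collects the phrase-level labels it contains:

```python
import re

# Phrase-level labels copied from the syntactic tagset above.
SYNTACTIC_TAGS = {
    "ADJP", "ADVP", "NP", "PP", "S", "SBAR", "SBARQ", "SINV",
    "SQ", "VP", "WHADVP", "WHNP", "WHPP", "X",
}

def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    # Tokens are '(' or ')' or any run of non-space, non-bracket characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return node()

def phrase_labels(tree):
    """List the syntactic (phrase-level) labels of a tree in pre-order."""
    label, children = tree
    found = [label] if label in SYNTACTIC_TAGS else []
    for child in children:
        if isinstance(child, tuple):
            found.extend(phrase_labels(child))
    return found

# Hypothetical example sentence, bracketed by hand for illustration.
tree = parse("(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")
print(phrase_labels(tree))
```

Labels that are not in `SYNTACTIC_TAGS` (here `DT`, `NN`, `VBD`, `IN`) are word-level tags and are skipped by `phrase_labels`.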
The POS tagset
CC | Coordinating Conjunction |
CD | Cardinal Number |
DT | Determiner |
EX | Existential there |
FW | Foreign word |
IN | Preposition/subordinating conjunction |
JJ | Adjective |
JJR | Adjective, comparative |
JJS | Adjective, superlative |
LS | List item marker |
MD | Modal |
NN | Noun, singular or mass |
NNS | Noun, plural |
NNP | Proper noun, singular |
NNPS | Proper noun, plural |
PDT | Predeterminer |
POS | Possessive ending |
PRP | Personal pronoun |
PRP$ | Possessive pronoun |
RB | Adverb |
RBR | Adverb, comparative |
RBS | Adverb, superlative |
RP | Particle |
SYM | Symbol (mathematical or scientific) |
TO | to |
UH | Interjection |
VB | Verb, base form |
VBD | Verb, past tense |
VBG | Verb, gerund/present participle |
VBN | Verb, past participle |
VBP | Verb, non-3rd person singular present |
VBZ | Verb, 3rd person singular present |
WDT | wh-determiner |
WP | wh-pronoun |
WP$ | Possessive wh-pronoun |
WRB | wh-adverb |
# | Pound sign |
$ | Dollar sign |
. | Sentence-final punctuation |
, | Comma |
: | Colon, semi-colon |
( | Left bracket character |
) | Right bracket character |
" | Straight double quote |
` | Left open single quote |
`` | Left open double quote |
' | Right closed single quote |
'' | Right closed double quote |
This list is taken from the HTML version of "Building a Large Annotated Corpus of English: The Penn Treebank" by Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz, which also contains much more information about the Penn Treebank.
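To show how the POS tagset might be used in practice, here is a minimal Python sketch that looks up the description of each tag in a tagged sentence. It assumes tokens are written as `word/TAG` separated by spaces; that input format, the example sentence, and the names `POS_TAGS` and `describe` are assumptions made for illustration, and only a subset of the tags listed above is included.

```python
# Subset of the POS tagset listed above, copied from the table.
POS_TAGS = {
    "DT": "Determiner",
    "IN": "Preposition/subordinating conjunction",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "VBD": "Verb, past tense",
    "VBZ": "Verb, 3rd person singular present",
    ".": "Sentence-final punctuation",
}

def describe(tagged_sentence):
    """Return (word, tag, description) triples for a 'word/TAG' string."""
    result = []
    for token in tagged_sentence.split():
        # Split on the last slash so that words containing '/' stay intact.
        word, _, tag = token.rpartition("/")
        result.append((word, tag, POS_TAGS.get(tag, "unknown tag")))
    return result

# Hypothetical hand-tagged sentence, for illustration only.
example = "The/DT cats/NNS sat/VBD on/IN the/DT mat/NN ./."
for word, tag, description in describe(example):
    print(f"{word:5} {tag:4} {description}")
```

Tags missing from the dictionary are reported as "unknown tag" rather than raising an error, which makes it easy to extend `POS_TAGS` step by step.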