Conlang/Advanced/Grammar/Government/Constituency, trees, rules


Constituency, Trees, Rules

A constituent is any group of items in a sentence that are linked to one another to form a unit. So far the only constituents we’ve seen are words, but there is another sort of constituent too: the phrase. A phrase is a group of words or smaller phrases that are connected more closely to one another than to the words and phrases outside the group. As a general rule, every phrase has a head (a word) and every head projects a phrase, though this isn’t always the case. A head of a particular syntactic category projects a phrase of that category: a noun (N) projects a noun phrase (NP), a verb (V) projects a verb phrase (VP), and so on. A phrase can also be composed of two conjoined phrases of the same type, without having a head of its own.

We can see constituents in sentences just by asking whether two words are more closely related to one another than to something else. For example, in “the dog bit the man”, we would say that the first “the” and “dog” form a constituent with one another, the second “the” and “man” form a constituent, and so on. Which kind of phrase is formed depends on which word is more central to the meaning of the constituent. “The dog” and “the man” are more about “dog” and “man”, respectively, and not as much about “the”, so each of those constituents is an NP.

Showing the Constituency Hierarchy of a Sentence


There are two ways of showing constituency. The first method is bracketing. This is the easiest to do in text, because it doesn’t involve any drawing. The general way of showing constituency is to put brackets around a constituent and write the type of the constituent as a subscript right after the opening bracket (shown here as a plain label). The two constituents identified in the previous example sentence would then be:

1) [NP [D the] [N dog]]
2) [NP [D the] [N man]]

Sometimes you’ll see some constituents incompletely bracketed, such as

3) [NP the dog]
4) [NP the man]

This occurs when the omitted structure is irrelevant to whatever is being demonstrated. You might also see bracketing without labels (e.g. [the dog] for [NP the dog]) when the labels are irrelevant or recoverable from context. When you’re beginning with constituency it’s better to indicate the structure fully, regardless of relevance.
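Labeled bracketing is regular enough to be read mechanically. The following is a minimal sketch, not part of the original tutorial, that turns a fully labeled bracketing into nested (label, children) pairs. The function names parse_brackets and words are made up for illustration, and the sketch assumes every bracket carries a label, so unlabeled bracketing such as [the dog] is not handled.

```python
import re

def parse_brackets(text):
    """Parse labeled bracketing such as "[NP [D the] [N dog]]" into nested
    (label, children) tuples; children are sub-tuples or plain word strings."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        label = tokens[pos]              # first token after '[' is the label
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())
            else:
                children.append(tokens[pos])   # a plain word
                pos += 1
        pos += 1                         # consume the closing ']'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the outermost bracket")
    return tree

def words(tree):
    """Return the words of a tree in left-to-right order."""
    label, children = tree
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out

full = parse_brackets("[NP [D the] [N dog]]")      # example (1)
partial = parse_brackets("[NP the dog]")           # example (3)
print(full)      # ('NP', [('D', ['the']), ('N', ['dog'])])
print(partial)   # ('NP', ['the', 'dog'])
```

Both bracketings cover the same words; the incomplete one simply records less structure inside the NP.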


More common for showing the structure of a sentence is the tree diagram. In trees, constituents are represented by nodes connected by lines. Lines connect up from one constituent to the constituent that encloses it. Each node is labeled with its type (N, NP, V, VP, etc.). Heads (N, V, etc.) often have the actual word written below them. The tree versions of [NP [D the] [N dog]] and [NP [D the] [N man]] are shown below.

5 & 6) Intermediate-Syntax-Trees-5&6.jpg

 Triangles are used to indicate that the constituency isn’t fully shown.

7 & 8) Intermediate-Syntax-Trees-7&8.jpg

 One general rule for drawing tree structures is that lines never cross. In other words, a constituent must be formed by adjacent items. In the advanced syntax tutorial we’ll explore how non-adjacent constituency is handled and why it arises.
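The no-crossing rule can be restated in terms of word positions: every constituent covers one unbroken stretch of the sentence. As a rough sketch (the nested-tuple representation and the name spans are assumptions made here, not something the tutorial defines), the stretch covered by each node can be computed like this:

```python
def spans(tree, start=0):
    """Return (end, found) for a (label, children) tree, where found lists
    (label, start, end) for every constituent. Positions count words from 0
    and end is exclusive."""
    label, children = tree
    pos = start
    found = []
    for child in children:
        if isinstance(child, str):
            pos += 1                       # a word occupies one position
        else:
            pos, sub = spans(child, pos)   # a sub-constituent starts here
            found.extend(sub)
    return pos, [(label, start, pos)] + found

the_dog = ("NP", [("D", ["the"]), ("N", ["dog"])])
end, found = spans(the_dog)
for label, lo, hi in found:
    print(label, lo, hi)
```

Because each node's stretch is built from its children's adjacent stretches, a tree stored this way cannot express a constituent made of non-adjacent words.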

Describing Allowable Constituency Hierarchies

Showing constituency via bracketing and trees is useful for describing what a particular sentence looks like, but syntax is about describing what all sentences look like, and what rules those sentences follow. What we need, then, is some way of describing which constituents are acceptable, and what they contain. Looking at the simple examples “the dog” and “the man”, we can derive some basic rules for English NPs. In each we have a determiner followed by a noun. If we were to reverse these, giving “dog the” and “man the”, we get ungrammatical phrases, so we can say that an NP consists of a D followed by an N. We can represent this more succinctly as

9) NP → D N

English, of course, also allows adjective phrases, AdjP, before a noun but after the determiner, as in “the big dog” (but not as in “big the dog” or “the dog big”), and prepositional phrases, PP, after the noun, as in “the dog in the house” (but not as in “in the house the dog” or “the in the house dog”). So English NPs can be described more fully like so:

10) NP → D AdjP N PP

We can use X, Y, Z, etc. as metavariables indicating an arbitrary head type. If something is optional, we put it inside parentheses, as in (XP). Alternating choices are put inside braces and separated by forward slashes, as in {XP/YP}. If you need to allow any number of an item, for example “XP or XP XP or XP XP XP or ...”, you can write the item once and append a “+” after it, as in XP+. These can be combined. (XP+) gives multiple optional items: no XP, one XP, two XPs, and so on. {XP/YP/ZP}+ gives multiple instances of a collection of items: one or more phrases, each of which can be an XP, a YP, or a ZP.
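This notation lines up closely with regular expressions, which gives a quick way to experiment with it. The sketch below is one possible encoding, not part of the tutorial: the names rhs_to_regex and licenses are invented here, and it assumes the right-hand side of a rule is typed in exactly the notation just described.

```python
import re

# How each piece of the notation is written as a regular expression.
PIECES = {"(": "(?:", ")": ")?", "{": "(?:", "}": ")", "/": "|", "+": "+"}

def rhs_to_regex(rhs):
    """Compile the right-hand side of a rule into a regular expression over a
    sequence of category labels, each label followed by one space."""
    pattern = ""
    for token in re.findall(r"[(){}/+]|[^\s(){}/+]+", rhs):
        # Punctuation maps to regex syntax; anything else is a category label.
        pattern += PIECES.get(token, "(?:" + re.escape(token) + " )")
    return re.compile(pattern)

def licenses(rhs, categories):
    """True if the list of category labels fits the rule's right-hand side."""
    text = "".join(label + " " for label in categories)
    return rhs_to_regex(rhs).fullmatch(text) is not None

print(licenses("D N", ["D", "N"]))             # True  ("the dog")
print(licenses("D N", ["N", "D"]))             # False ("dog the")
print(licenses("(XP+)", []))                   # True  (no XP at all)
print(licenses("{XP/YP/ZP}+", ["XP", "ZP"]))   # True
```

Following each label with a space keeps N from matching the start of NP.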

These rules, with a phrase on the left of an arrow and the contents of that phrase on the right of the arrow, are called Phrase Structure rules, or PS rules. For English, we can describe a large number of sentences with the following set of PS rules:

11) CP → (C) TP
12) TP → {NP/CP} (T) VP
13) VP → (AdvP+) V (NP) ({NP/CP}) (AdvP+) (PP+) (AdvP+)
14) NP → (D) (AdjP+) N (PP+) (CP)
15) PP → P (NP)
16) AdjP → (AdvP) Adj
17) AdvP → (AdvP) Adv
18) XP → XP conj XP
19) X → X conj X

So if we want to describe the sentence “the dog bit the man”, we can write this bracketed sentence:

20) [TP [NP [D the] [N dog]] [VP [V bit] [NP [D the] [N man]]]]

And as a tree:

21) Intermediate-Syntax-Trees-21.jpg
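To check that a bracketed sentence such as (20) really follows the PS rules, each phrase’s daughters can be compared against the rule for that phrase. The following is a minimal sketch under some assumptions: the rule table copies (11) to (17) as printed, the schematic conjunction rules (18) and (19) are left out, and the helper names (RULES, rhs_to_regex, parse, violations) are made up for this illustration.

```python
import re

# Rules (11)-(17) as given above. The schematic conjunction rules (18)-(19)
# are left out, since X stands for any category.
RULES = {
    "CP": "(C) TP",
    "TP": "{NP/CP} (T) VP",
    "VP": "(AdvP+) V (NP) ({NP/CP}) (AdvP+) (PP+) (AdvP+)",
    "NP": "(D) (AdjP+) N (PP+) (CP)",
    "PP": "P (NP)",
    "AdjP": "(AdvP) Adj",
    "AdvP": "(AdvP) Adv",
}
PIECES = {"(": "(?:", ")": ")?", "{": "(?:", "}": ")", "/": "|", "+": "+"}

def rhs_to_regex(rhs):
    """Turn a rule's right-hand side into a regex over 'LABEL ' sequences."""
    tokens = re.findall(r"[(){}/+]|[^\s(){}/+]+", rhs)
    return re.compile("".join(
        PIECES.get(t, "(?:" + re.escape(t) + " )") for t in tokens))

def parse(text):
    """Parse fully labeled bracketing into nested (label, children) tuples."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)

    def node(i):
        label, children, i = tokens[i + 1], [], i + 2
        while tokens[i] != "]":
            if tokens[i] == "[":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    return node(0)[0]

def violations(tree):
    """List the phrases whose daughters are not licensed by RULES."""
    label, children = tree
    daughters = [c for c in children if isinstance(c, tuple)]
    if not daughters:          # a head dominating only a word: nothing to check
        return []
    bad = []
    sequence = "".join(d[0] + " " for d in daughters)
    if label not in RULES or not rhs_to_regex(RULES[label]).fullmatch(sequence):
        bad.append(label + " -> " + " ".join(d[0] for d in daughters))
    for d in daughters:
        bad.extend(violations(d))
    return bad

sentence = parse("[TP [NP [D the] [N dog]] [VP [V bit] [NP [D the] [N man]]]]")
print(violations(sentence))                        # []
print(violations(parse("[NP [N dog] [D the]]")))   # ['NP -> N D']
```

An empty list means every phrase in the tree matches its rule; the second call reports the ungrammatical order “dog the”.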

Tests for Constituency

To develop a list of PS rules for a language, it’s necessary to determine what forms constituents. To some extent, it’s possible to do this merely by intuition for your native language, but there are many cases where it’s hard to tell what is a constituent and what is not. To make it easier to determine constituency, there are numerous tests that can be performed on any potential constituent. In the advanced syntax tutorial, we’ll see how these tests give us reason to revise the PS rules, reformulating them as a whole new theory.


One way to test for constituency is to see whether the potential constituent can be replaced by a single word without affecting the meaning of the sentence.

22a) The dog bit the man.
22b) The dog bit him.
23a) The dog bit the man.
23b) It bit him.

In these examples we see pronouns replacing strings of words of the form D N, which suggests that D N forms a constituent. This of course is an English NP. We can then say that anything replaceable by a pronoun is of the same kind of constituent, which allows us to take any collection of sentences and identify examples of pronoun-replaceable constituents to identify what can go into them. This is what gives us the rule for what constitutes an NP.
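The mechanical half of the replacement test, swapping a candidate constituent for a single word, is easy to sketch. The code below is only an illustration: the tuple-based tree and the name substitute are assumptions made here, and judging whether the result is grammatical and keeps its meaning is still up to a speaker.

```python
def words(tree):
    """Return the words of a (label, children) tree in order."""
    label, children = tree
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out

def substitute(tree, path, pronoun):
    """Return the sentence's words with the constituent at `path` replaced by
    `pronoun`. `path` is a list of child indices leading down from the root."""
    if not path:
        return [pronoun]
    label, children = tree
    out = []
    for i, child in enumerate(children):
        if i == path[0] and isinstance(child, tuple):
            out.extend(substitute(child, path[1:], pronoun))
        elif isinstance(child, tuple):
            out.extend(words(child))
        else:
            out.append(child)
    return out

sentence = ("TP", [
    ("NP", [("D", ["the"]), ("N", ["dog"])]),
    ("VP", [("V", ["bit"]),
            ("NP", [("D", ["the"]), ("N", ["man"])])]),
])
print(" ".join(substitute(sentence, [1, 1], "him")))   # the dog bit him
print(" ".join(substitute(sentence, [0], "it")))       # it bit the man
```

Since only whole nodes can be addressed by a path, a string like “bit the” cannot be replaced at all, which matches the idea that it is not a constituent.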

Stand Alone

Another kind of constituency test is the stand alone test (a.k.a. the sentence fragment test). If a group of words can stand alone as a meaningful utterance, for example as the answer to a question, we can say that it’s a constituent.

24a) The dog [bit the man].
24b) The dog [bit the] man.
What did the dog do?
25a) Bit the man.
25b) *Bit the.

These show that “bit the man” forms some constituent, while “bit the” doesn’t form a constituent in English.* By continuing with tests involving Vs we arrive at a definition for VPs.

* The situation is less clear in languages where "the" and "him/her" have the same form. For example, Spanish la "her" resembles la "the (feminine singular)".


Movement tests show constituency by moving a potential constituent without making the sentence ungrammatical. Clefting involves inserting It is or It was before the potential constituent and that after it (26). Preposing/pseudoclefting involves inserting is/are what/who after the potential constituent (27). Making a sentence passive, by swapping subject and object, inserting by before the former subject, and turning the verb into its passive form (bit becomes was bitten, etc.), will also indicate constituency (28).

26) It was the dog that bit the man.
27) The dog is what bit the man.
28) The man was bitten by the dog.


The last test for constituency involves taking a potential constituent and conjoining it with something else that’s similar. If the result is grammatical, the potential constituent passes the test.

29) The dog bit the man.
30) The dog and the cat bit the man.
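Example (30) has the shape that rule (18), XP → XP conj XP, describes: two phrases of the same type joined by a conjunction, forming a larger phrase of that type. As a small sketch (the tuple representation and the name is_coordination are assumptions made for illustration), that shape can be checked like this:

```python
def is_coordination(tree):
    """True if the node has exactly the shape XP -> XP conj XP, with both
    conjuncts carrying the same label as the phrase they form."""
    label, children = tree
    if len(children) != 3 or not all(isinstance(c, tuple) for c in children):
        return False
    left, middle, right = (c[0] for c in children)
    return middle == "conj" and left == right == label

the_dog = ("NP", [("D", ["the"]), ("N", ["dog"])])
the_cat = ("NP", [("D", ["the"]), ("N", ["cat"])])
subject = ("NP", [the_dog, ("conj", ["and"]), the_cat])   # "the dog and the cat"

print(is_coordination(subject))   # True
print(is_coordination(the_dog))   # False
```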

Constituency Test Failures

Constituency tests are not always conclusive. There are situations where a test seems to identify constituents that conflict with constituents already established.

31) The cat saw and the dog bit the man.

(31) would indicate to us that “the cat saw” and “the dog bit” each form a constituent, because they seem to satisfy the conjunction test. That result violates the constituency rules we determined earlier. Situations like this require further inquiry. One way we can resolve this, which is taken in the Principles and Parameters framework, is by positing that “the cat saw” and “the dog bit” aren’t in fact constituents, but that instead there’s an unspoken pronoun (i.e. one that has lexical content but no phonological content), called pro, which refers to “the man”. This would allow us to reanalyze (31) as a conjunction of two full sentences.

32) [The cat saw [pro]i] and [the dog bit [the man]i]

Here a subscript “i” is used to indicate that pro and the man have the same referent. In this analysis there’s no violation of the PS rules we’ve previously found for English. Other frameworks handle this sort of situation differently. Lexical Functional Grammar, for example, doesn’t use pro and instead allows a much looser variety of constituents. This sort of test failure shows us that there’s more going on in the sentence than there first seems to be, and we’ll explore various ways of addressing these apparent oddities in the advanced tutorial.
