Linguistics/Syntax

The field of syntax^[1] looks at the mental 'rules' that we have for forming sentences and phrases. In English, for instance, it is grammatical to say 'I speak English', but ungrammatical to say 'English speak I'. This is because of a rule which says that the subject normally precedes the verb, which in turn precedes the object.

Academic syntacticians often study either the grammar of a particular language or one of the various theories of generative grammar, which claim that there is a universal underlying grammar in our heads that different languages activate in different ways. The main concern of generative grammar is discovering the grammatical rules that apply to all languages, and determining how the manifest differences between the world's languages can be accounted for.

However, there are many competing theories which do not make such a strong claim regarding a universal grammar, or at least remain neutral on it; they include construction grammar and cognitive linguistics. A number of grammatical theories stand more or less in opposition to the perspectives of generative grammar, such as systemic functional grammar and word grammar.

At the level of syntax, it is important to understand the distinction between grammaticality and semantic soundness. Sentences such as "Me Tarzan, you Jane" clearly make sense even though they sound 'wrong'. Conversely, the sentence "Colorless green ideas sleep furiously", famously created by Noam Chomsky, does not sound 'wrong' in the way the previous sentence does, but it is hard to imagine anything it could describe. We can say that these two aspects of acceptability vary independently: grammaticality, or well-formedness, gives the rules for how sentences must be constructed but does not restrict their content, while semantic acceptability may be present even in ill-formed sentences. When we say that something may occur grammatically, we mean that well-formedness is preserved, without any regard for semantics.

Units

The basic units of syntax are words and clitics. A word is the smallest free form in language that can be spoken in isolation or in varying positions within a sentence and retain its semantic and pragmatic content, which we can informally refer to as its meaning. As we've seen, a clitic is syntactically indistinguishable from a word, but, unlike a word, it is phonologically dependent upon some other constituent of a sentence.

Lexical categories

In general, words and clitics can be divided into groups called lexical categories. This grouping is accomplished by observing similarities between words and clitics according to several factors, such as their distribution within a sentence, the types of affixes that can be attached to them, and the type of meaning they express. All of these factors can be taken into account together, because words and clitics that pattern alike according to one factor generally also pattern alike according to the others. While classifications according to meaning or inflection can sometimes be significantly ambiguous, tests of distribution are generally very reliable. Words of certain categories can only occur, or can never occur, alongside words of other specific categories, and these restrictions often make it simple to determine a word's category from its distribution.

While lexical categories can vary somewhat from language to language, most languages nonetheless possess a similar set of categories. A typical list of such categories, with their most common linguistic abbreviations, is as follows:

Noun (N)
- Proper noun (PN)
- Pronoun (Pr)
Verb (V)
Adjective (A)
Adposition (P)
Adverb (Adv)
Degree word (Deg)
Auxiliary (Aux)
- Modal auxiliary (Mod)
Interjection (I)
Determiner (D or Det)
Conjunction (Con)
- Coordinate conjunction (Co)
- Subordinate conjunction (Sub)

The indented subcategories can often be treated simply as part of the categories they fall under; treating them separately is unnecessary for the purposes of most basic syntactic investigation.

The last several categories in the list, from degree words onward, are referred to as functional categories in contrast with other lexical categories because their words have little lexical meaning but instead express such things as grammatical relationships and functions between other words in a sentence. Words that fit these criteria but cannot be comfortably placed in any group are sometimes given the generic label of particle (Par), though the term 'particle' generally refers to a much broader set of words and clitics than this, also encompassing some conjunctions, interjections, determiners, etc.

Constituent structure

Theories of syntax generally view sentences as being made up of constituents, which are either made up of constituents themselves or are irreducible units. Constituents which are made up of multiple words are traditionally known as phrases. For now, we provisionally use this traditional definition. However, some refinements of the theory of constituent structure give a different and more specific meaning to the term "phrase", which permits the existence of certain types of one-word phrases; we will encounter this meaning later.

Constituent Categories

For each lexical category, there is a corresponding constituent or phrasal category, and taken together, these are referred to as syntactic categories. Each constituent is assigned to one of these categories according to much the same criteria as those used for assigning words and clitics to lexical categories. Because distribution is the most reliable of these criteria, constituents of the same category can generally appear in the same positions in a sentence. For instance, consider the following sentence:

(1a) I see _____

Constituents of the same category as the big man (e.g. the doll, a fast, red car, this quickly-moving snowflake, Barack Obama) may occur in the gap, whereas constituents similar to "eats pie" (e.g. "drives fast", "writes a letter") cannot. Note that certain phrases similar to the acceptable constituents, such as doll or happy camper, cannot occur there either. The particular structure that can fit into our blank will, for the purposes of this book, be called a noun phrase (NP).^[2]
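The substitution test just described can be sketched in a few lines of Python. This is only a toy under stated assumptions: the lexicon is a small hypothetical one, each word is given a single category, and the pattern `Det A* N | PN | Pr` is an assumed approximation of the NP slot (close to the rewrite rule for NP given later in this chapter), not a real grammar of English.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): one category per word,
# using the abbreviations from the list of lexical categories.
LEXICON = {
    "a": "Det", "the": "Det", "this": "Det", "every": "Det",
    "big": "A", "fast": "A", "red": "A", "happy": "A",
    "man": "N", "doll": "N", "car": "N", "camper": "N", "pie": "N",
    "him": "Pr", "her": "Pr",
    "eats": "V", "drives": "V",
}

# Assumed NP shapes: a proper noun, a pronoun, or Det + any number of adjectives + N.
NP_PATTERN = re.compile(r"(?:PN|Pr|Det(?: A)* N)")


def tag(phrase):
    """Replace each word by its category; unknown words become '?' so they never match."""
    words = phrase.replace(",", " ").split()
    return " ".join(LEXICON.get(word, "?") for word in words)


def fits_np_slot(phrase):
    """Substitution test: can the phrase fill the gap in 'I see _____'?"""
    return NP_PATTERN.fullmatch(tag(phrase)) is not None


for candidate in ["the big man", "a fast, red car", "him", "doll", "happy camper", "eats pie"]:
    verdict = "fits" if fits_np_slot(candidate) else "does not fit"
    print(f"I see {candidate}: {verdict}")
```

Multi-word names such as Barack Obama are left out to keep the pattern simple. A bare noun such as doll fails here because the pattern demands a determiner before N.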

Now, let's alter (1a) a bit and blank out the word see as well:

(1b) I _________

There are various types of phrases that can fit into this sentence, including fly, am completely fed up with this, read a book of linguistics and have been struggling for years to understand linguistics in vain. Note that the final phrase is still acceptable, even though it is very long. These are called verb phrases (VP) in this book.

There are two other basic types of phrases. They fit into (1c) and (1d) respectively:

(1c) I feel ______.
(1d) There is a book ______.

(1c) can be filled by happy, foolish or a little squirmy. These are adjective phrases (AP). (1d) can be filled by on the table, in the drawer or at school. These are prepositional phrases (PP).

The type of analysis we are doing here is structural analysis: we are trying to determine what sort of constituent can be plugged into a certain part of our sentence. However, this approach examines only one slot at a time and does not show how the parts of a sentence fit together as a whole; we risk not seeing the forest for the trees.

Constituent Analysis

Since our previous analysis has proven to be insufficient, we can take another approach, known as constituent analysis.

Hierarchical Organisation of Constituents

Yet constituent analysis is still not enough. In constituent analysis, we viewed our sentences as simple blocks of constituents that can be lumped together in various ways, but actual sentences in language do not work that way. Consider this sentence:

(2a) I hit the big man

Reflecting upon the meaning of this sentence, we see that there are three semantic components of what is being stated:

There is an action occurring, which is hitting.
The speaker is the person performing the action.
The big man is the person undergoing the action.

Thus it seems natural that "I", "hit", and "the big man" should function as units, while "I hit the" and "hit the big" should not. This was not clear in our previous analysis.

In addition, there seems to be a hierarchical organisation in the sentence. The phrase hit the big man likely makes sense to you, since it describes a real action: hitting a big man. However, I hit does not make sense: hit is a transitive verb which needs an object. This seems to tell us that I and hit the big man are two larger constituents, known as subject and predicate in traditional grammar. For now, sentence (2a) can be separated into constituents like this:

(2b) [I [hit [the big man] ] ]

We can fill up our bracketed sentence with our lexical and constituent categories to produce this:

(2c) S[ Pr[I] VP[ V[hit] NP[ Det[the] Adj[big] N[man] ] ] ]
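To make the bracket notation concrete, here is a minimal Python sketch, assuming a tree is stored as nested tuples of the form (label, child, ...), with a lexical node written as (category, word). The function names are hypothetical; the sketch prints the tree of (2c) back in the same labelled-bracket style and recovers the original word string.

```python
def is_lexical(node):
    """A lexical node has exactly one child, and that child is a word (a string)."""
    return len(node) == 2 and isinstance(node[1], str)


def labelled_brackets(node):
    """Render a tree in the labelled-bracket style of (2c)."""
    if is_lexical(node):
        category, word = node
        return f"{category}[{word}]"
    label, *children = node
    return f"{label}[ " + " ".join(labelled_brackets(c) for c in children) + " ]"


def words(node):
    """Read the string of words back off the tree, left to right."""
    if is_lexical(node):
        return node[1]
    return " ".join(words(c) for c in node[1:])


sentence_2c = ("S",
               ("Pr", "I"),
               ("VP",
                ("V", "hit"),
                ("NP", ("Det", "the"), ("Adj", "big"), ("N", "man"))))

print(labelled_brackets(sentence_2c))
# S[ Pr[I] VP[ V[hit] NP[ Det[the] Adj[big] N[man] ] ] ]
print(words(sentence_2c))
# I hit the big man
```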

Rewrite rules

Going back to our previous example, let's look at a larger set of phrases which can be inserted into (1a):

a cat; a dog; a fast, red car; a paper; a small toy
the big man; the doll; the house; the zebra
this Argentinian hyena; this umpire; this quickly-moving snowflake
that big-scale event; that wolf
every impressively large elephant; every person
Barack Obama; Nelson Mandela; Mao Zedong
him; her; it

What can we infer from these phrases? Well, we find that they have one of three structures:

Proper noun
Pronoun
Determiner + Noun, sometimes with an adjective in between

We can thus produce a rewrite rule:

(3a) NP → {PN, Pr, Det (Adj) N}

Here, the braces indicate 'choose one from the set' and the round brackets indicate that an element is optional.
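The brace-and-bracket notation can be unpacked mechanically. The following Python sketch assumes a rule is written as a list of alternatives (the braces), each alternative being a list of (symbol, optional) pairs, where optional marks a symbol in round brackets; it lists every concrete expansion the rule allows. The names used are illustrative only.

```python
from itertools import product

# Rule (3a): NP -> {PN, Pr, Det (Adj) N}
NP_RULE = [
    [("PN", False)],
    [("Pr", False)],
    [("Det", False), ("Adj", True), ("N", False)],
]
# Rule (3b): VP -> V (NP) (Adv), a single alternative with two optional symbols
VP_RULE = [
    [("V", False), ("NP", True), ("Adv", True)],
]


def expansions(rule):
    """List every concrete right-hand side that a rule in brace/bracket notation allows."""
    results = []
    for alternative in rule:
        # An optional symbol may be present or absent; an obligatory one is always present.
        choices = [([symbol], []) if optional else ([symbol],)
                   for symbol, optional in alternative]
        for combination in product(*choices):
            results.append([symbol for part in combination for symbol in part])
    return results


for rhs in expansions(NP_RULE):
    print("NP ->", " ".join(rhs))
```

Applied to (3a), this yields four NP shapes (PN, Pr, Det Adj N and Det N); applied to (3b), it yields V NP Adv, V NP, V Adv and V.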

Similarly, we can produce a rewrite rule for verb phrases and sentences:

(3b) VP → V (NP) (Adv)
(3c) S → NP VP

All of the rewrite rules so far are phrase structure rules, which deal with phrases. We also need lexical rules to insert words from the lexicon into our grammar. Let's add some lexical rules:

(3d) Pr → {he, she, it, I}
(3e) PN → {Chomsky, Jackendoff, Pinker}
(3f) Det → {this, that, the, a, my, some}
(3g) Adj → {happy, green, lucky, colourless}
(3h) N → {computer, book, homework, idea}
(3i) V → {defended, attacked, do, eat, slept, poisoned}
(3j) Adv → {furiously, happily, noisily}

Using these rules, we can produce sentences:

We start with a simple S.
We rewrite S as NP VP.
We rewrite NP as Pr and VP as V.
We rewrite Pr as he and V as slept.

This gives us our final sentence: He slept. Here are some other possible sentences:

Chomsky defended a green idea furiously.
This green computer attacked Jackendoff.
She poisoned Pinker noisily.
It slept furiously.

Obviously, our grammar is not perfect. There are many things that we did not take into account when we produced this very simplified grammar. However, this should give us a general idea of how rewrite rules work, and that will suffice in our beginner's book.
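The derivation procedure above can be automated. This is a minimal Python sketch, not a standard implementation: it rewrites S top-down using the phrase structure rules (3a) to (3c), with each optional element spelled out as a separate alternative, and then picks words with the lexical rules (3d) to (3j), treating homework and idea as two separate nouns. The names RULES, LEXICON, generate and sentence are illustrative.

```python
import random

# Phrase structure rules (3a)-(3c), optional elements spelled out as separate alternatives.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["PN"], ["Pr"], ["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V"], ["V", "NP"], ["V", "Adv"], ["V", "NP", "Adv"]],
}

# Lexical rules (3d)-(3j); "homework" and "idea" are two separate nouns.
LEXICON = {
    "Pr": ["he", "she", "it", "I"],
    "PN": ["Chomsky", "Jackendoff", "Pinker"],
    "Det": ["this", "that", "the", "a", "my", "some"],
    "Adj": ["happy", "green", "lucky", "colourless"],
    "N": ["computer", "book", "homework", "idea"],
    "V": ["defended", "attacked", "do", "eat", "slept", "poisoned"],
    "Adv": ["furiously", "happily", "noisily"],
}


def generate(symbol, rng):
    """Rewrite `symbol` top-down, choosing rules at random, until only words remain."""
    if symbol in LEXICON:  # lexical rule: pick a word of this category
        return [rng.choice(LEXICON[symbol])]
    words = []
    for child in rng.choice(RULES[symbol]):  # phrase structure rule: pick one right-hand side
        words.extend(generate(child, rng))
    return words


def sentence(rng):
    """Generate an S and format it with an initial capital and a full stop."""
    text = " ".join(generate("S", rng))
    return text[0].upper() + text[1:] + "."


rng = random.Random(2024)
for _ in range(5):
    print(sentence(rng))
```

Because the rules know nothing about agreement or meaning, some outputs will be ungrammatical or odd (for example, a singular subject with eat), which illustrates the limitations just mentioned.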

Syntax Trees

Bracketed sentences can get fairly ugly, as you've no doubt seen above. Most syntacticians now use another technique known as syntax trees to map sentences in a schematic manner.

Each constituent is known as a node. Nodes are connected by branches. Lexical categories are terminal nodes, as they do not branch out further.

Phrases are usually made up of two constituents; this is known as the binary branching condition.^[3] The head is defined as the obligatory element which gives the phrase its defining properties, generally including the constituent category of the phrase. For verb phrases, for example, the head is the verb.

For instance, this is the syntax tree of the sentence I hit the man:

You may be wondering how we could fit big in the sentence without violating the binary branching condition. In fact, there are more layers involved than meets the eye, but the other layers will not be presented here.
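The tree vocabulary above (nodes, terminal nodes, binary branching, heads) can be made concrete with a small Python sketch. The trees are assumed to be nested tuples mirroring the bracketing in (2c), and the head function uses a simple assumed heuristic (the head of an XP is its daughter of category X) rather than any particular theory's definition. With big included, the flat NP has three daughters, which is exactly the problem raised above.

```python
# A tree is a tuple (label, child, ...); a terminal (lexical) node is (category, word).


def is_terminal(node):
    """Lexical categories are terminal nodes: their only child is a word, not another node."""
    return len(node) == 2 and isinstance(node[1], str)


def nodes(tree):
    """Yield every node in the tree, top-down and left to right."""
    yield tree
    if not is_terminal(tree):
        for child in tree[1:]:
            yield from nodes(child)


def binary_violations(tree):
    """Labels of non-terminal nodes with more than two daughters."""
    return [n[0] for n in nodes(tree) if not is_terminal(n) and len(n) - 1 > 2]


def head(phrase):
    """Assumed heuristic: the head of an XP is its daughter of category X (e.g. V in VP)."""
    label = phrase[0]
    if not label.endswith("P"):
        return None  # e.g. S has no head under this simple heuristic
    for child in phrase[1:]:
        if child[0] == label[:-1]:
            return child
    return None


i_hit_the_man = ("S", ("Pr", "I"),
                 ("VP", ("V", "hit"), ("NP", ("Det", "the"), ("N", "man"))))
i_hit_the_big_man = ("S", ("Pr", "I"),
                     ("VP", ("V", "hit"),
                      ("NP", ("Det", "the"), ("Adj", "big"), ("N", "man"))))

print([n[0] for n in nodes(i_hit_the_man) if is_terminal(n)])  # ['Pr', 'V', 'Det', 'N']
print(binary_violations(i_hit_the_man))      # []
print(binary_violations(i_hit_the_big_man))  # ['NP']
print(head(i_hit_the_man[2]))                # ('V', 'hit')
```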

Movement

Recall from the introduction that the structure a sentence has in our minds can differ from the structure that is actually produced. The process of deriving the surface structure from the deep structure is called movement, since it involves moving constituents around. English has many types of movement that apply in a wide variety of situations. Compare:

(4a) It seems that Wikibooks is useful.
(4b) Wikibooks seems to be useful.
(5a) You will try to steal what?
(5b) What will you try to steal?
(6a) Volunteers wrote the Linguistics textbook.
(6b) The Linguistics textbook was written by volunteers.

In the above set, the a-sentence is the deep structure, and the b-sentence is the surface structure. Note that (5a) is not normally produced except in the case of echo questions, such as the following situation:

(7a) A: The best website on the Internet is Wikipedia.
(7b) B: The best website on the Internet is what?

B has echoed A's statement out of disbelief, perhaps because he considers Wikibooks to be the single best website.

Let's look at one specific type of movement: inversion in interrogative sentences. Before we can do that, let's change our rewrite rules a bit:

(8a) S → NP (Aux) VP
(8b) Aux → {will, may, might, do}

This caters for sentences like this, which readers are likely intimately familiar with:

(9a) I will do my homework.

Now, let's turn it into a question:

(9b) Will I do my homework?

Can you generalise a rule for this type of inversion? Here it is:

(10) NP Aux VP ⇒ Aux NP VP
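Rule (10) can be sketched as a function over a flat list of constituents. This Python version is a simplification under stated assumptions: it works on a list of (category, words) pairs rather than a full tree, applies only when the sequence is exactly NP Aux VP, and expects lower-case words apart from I. The function names are hypothetical.

```python
def invert(clause):
    """Rule (10): NP Aux VP => Aux NP VP.

    `clause` is a list of (category, words) pairs; only that exact sequence is accepted.
    """
    if [category for category, _ in clause] != ["NP", "Aux", "VP"]:
        raise ValueError("rule (10) applies only to NP Aux VP")
    np, aux, vp = clause
    return [aux, np, vp]


def render(clause, punctuation):
    """Join the constituents and capitalise the first letter (input assumed lower case, except I)."""
    text = " ".join(words for _, words in clause)
    return text[0].upper() + text[1:] + punctuation


statement = [("NP", "I"), ("Aux", "will"), ("VP", "do my homework")]
print(render(statement, "."))          # I will do my homework.
print(render(invert(statement), "?"))  # Will I do my homework?
```

Clauses without an auxiliary are rejected, since rule (10) as stated requires an Aux to move.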

Let's wrap up by looking at the two trees.

[DIAGRAM HERE: tree of (9a) ⇒ tree of (9b)]

Again, this does not satisfy the binary branching condition, but how to alter the tree to satisfy it is not within the scope of our book.

Recursion

Adjuncts

One of the most important features of human language is the ability to process recursion, which is why it must be introduced in our book, even though it is a more advanced concept. When we have an NP a book, we can expand the NP several times: a book on the desk, a book on the desk in the classroom, etc. Each time, it remains a noun phrase. This is known as recursion.

[DIAGRAM HERE]
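The recursion can be sketched by applying an assumed rule NP → NP PP (this chapter does not state the rule explicitly) repeatedly in Python. Each application wraps the previous noun phrase inside a new one, so the result is still an NP however many times the rule applies. Plain strings stand in for unanalysed groups of words to keep the sketch short, and the function names are illustrative.

```python
# A node is (label, child, ...); a string child stands for words whose structure is not shown.


def attach_pp(np, pp_words):
    """Assumed recursive rule NP -> NP PP: wrap the old NP and a PP inside a new NP."""
    return ("NP", np, ("PP", pp_words))


def words(node):
    """Flatten a tree back into its string of words."""
    if isinstance(node, str):
        return node
    return " ".join(words(child) for child in node[1:])


def count(node, label):
    """Count how many nodes with the given label the tree contains."""
    if isinstance(node, str):
        return 0
    return (node[0] == label) + sum(count(child, label) for child in node[1:])


np = ("NP", "a book")
for pp in ["on the desk", "in the classroom"]:
    np = attach_pp(np, pp)
    print(np[0], "->", words(np))
# NP -> a book on the desk
# NP -> a book on the desk in the classroom
```

This sketch always attaches the new PP to the whole previous NP; in the classroom could equally attach to the desk instead, a structure the sketch does not build.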

PPs are not the only phrases that can be added to another phrase recursively. Here are some other examples:

Adverbials: Colourless green ideas sleep furiously all day long.
Participle phrases: He was standing by the door impatiently, waiting for John.

Complement clauses

Another notable type of recursion in language involves complement clauses. Consider the following:

I think John is a genius. John is happy that I think he is a genius. It is surprising that John is happy that I think he is a genius. I believe that it is surprising that John is happy that I think he is a genius.

The word 'that' is known as a complementiser (C) in English, as it introduces complement clauses. With it, we can generate ever longer sentences in a recursive manner. Other common complementisers appear below:

I was asking whether John was happy or not.
He is wondering if I think he is a genius.
It would be strange for him to consider me a genius.

The clauses introduced by a complementiser are called complement clauses. Note that complementisers were erroneously classified as subordinating conjunctions in traditional grammar!
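The recursive build-up of the example sentences above can be sketched in Python. The function embed is hypothetical: it places an existing clause after a complementiser inside a new matrix clause. Here 'that' is written out at every level, although English often allows it to be omitted, as in I think John is a genius.

```python
def embed(matrix, complementiser, clause):
    """Build a bigger clause whose complement is `clause`, introduced by a complementiser (C)."""
    return f"{matrix} {complementiser} {clause}"


clause = "he is a genius"
for matrix in ["I think", "John is happy", "it is surprising", "I believe"]:
    clause = embed(matrix, "that", clause)  # each pass adds one more level of embedding
    print(clause[0].upper() + clause[1:] + ".")
# I think that he is a genius.
# John is happy that I think that he is a genius.
# It is surprising that John is happy that I think that he is a genius.
# I believe that it is surprising that John is happy that I think that he is a genius.

print(embed("I was asking", "whether", "John was happy or not") + ".")
```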

Grammatical relations

Constituents bear grammatical relations (or grammatical functions), which determine how they behave syntactically within a proposition.

In a simplified analysis, the sentence may be viewed as generally consisting of three major constituents, which play the grammatical roles of subject, object, and verb.
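One common simplification, assumed here rather than taken from this chapter, defines these relations structurally: the subject is the NP directly under S, the verb is the V under VP, and the object is the NP directly under VP. The Python sketch below applies that definition to trees built with rule (3c), S → NP VP, so the subject I sits inside an NP node rather than directly under S as in (2c). The function names are illustrative.

```python
# Trees are tuples (label, child, ...); a lexical node is (category, word).


def words(node):
    """Flatten a tree back into its string of words."""
    if len(node) == 2 and isinstance(node[1], str):
        return node[1]
    return " ".join(words(child) for child in node[1:])


def daughter(node, label):
    """Return the first child of `node` with the given label, or None."""
    for child in node[1:]:
        if isinstance(child, tuple) and child[0] == label:
            return child
    return None


def relations(sentence):
    """Assumed configurational definitions: subject = NP under S,
    verb = V under VP, object = NP under VP."""
    vp = daughter(sentence, "VP")
    found = {
        "subject": daughter(sentence, "NP"),
        "verb": daughter(vp, "V"),
        "object": daughter(vp, "NP"),
    }
    return {role: words(node) if node else None for role, node in found.items()}


transitive = ("S", ("NP", ("Pr", "I")),
              ("VP", ("V", "hit"), ("NP", ("Det", "the"), ("Adj", "big"), ("N", "man"))))
intransitive = ("S", ("NP", ("Pr", "he")), ("VP", ("V", "slept")))

print(relations(transitive))    # {'subject': 'I', 'verb': 'hit', 'object': 'the big man'}
print(relations(intransitive))  # {'subject': 'he', 'verb': 'slept', 'object': None}
```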

Notes

[1] Colloquially this may be referred to as 'grammar', but this is technically incorrect, as 'grammar' also includes morphology (as well as other aspects of linguistics in some usages).

[2] Modern syntacticians have taken a different approach, believing that there is a structure above the noun phrase, but this is outside the scope of an introductory book on linguistics.

[3] Whether this is always true is very controversial. Noam Chomsky's Minimalist theory states that this is always the case, but phrases such as "Andrew and Bob and Carol" create a dilemma.