The field of syntax looks at the mental 'rules' that we have for forming sentences and phrases. In English, for instance, it is grammatical to say 'I speak English', but ungrammatical to say 'English speak I'. This is because of a rule which says that subjects normally precede verbs, which in turn precede objects.
Academic syntacticians often study either the grammar of a particular language (e.g. English grammar) or the various theories of Generative Grammar, an approach which claims that there is a universal underlying grammar in our heads, which different languages activate in different ways. The main concern of Generative Grammar is discovering the grammatical rules which apply to all languages, and determining how the manifest differences among the world's languages can be accounted for.
However, there are many competing theories which do not make such a strong claim regarding a universal grammar, or which at least remain neutral on it. They include construction grammar and cognitive linguistics. A number of grammatical theories also stand more or less in opposition to the perspectives of generative grammar.
At the level of syntax, it is important to understand the distinction between grammaticality and semantic soundness. Sentences such as "Me Tarzan, you Jane" clearly make sense even though they sound "wrong". Conversely, the sentence "Colorless green ideas sleep furiously" (famously created by Noam Chomsky) doesn't sound "wrong" in the way the previous sentence did, but it is hard to imagine anything which it would describe. We can say that these two aspects of acceptability vary independently: grammaticality, or well-formedness, gives the rules for how sentences must be constructed, but does not restrict their content, while semantic acceptability may occur even in ill-formed sentences. When we say that something may occur grammatically, we mean that well-formedness is preserved, with complete disregard for semantics.
The basic units of syntax are words and clitics. A word is the smallest free form in language that can be spoken in isolation or in varying positions within a sentence and retain its semantic and pragmatic content, which we can informally refer to as its meaning. A clitic is syntactically indistinguishable from a word, but, unlike a word, it is phonologically dependent upon some other constituent of a sentence.
In general, words and clitics can be divided into groups called lexical categories. This grouping is accomplished by observing similarities in words and clitics according to several factors, such as their distribution within a sentence, the types of affixes that can be attached to them, and the type of meaning they express. All of these factors can be taken into account together because words and clitics that correlate well with each other according to one factor generally also correlate well according to the others. While classifications according to meaning or inflection can sometimes remain subject to significant ambiguity, tests of distribution are generally very reliable. Words of certain categories are restricted to occurring or not occurring alongside words of other specific categories, and these restrictions often allow the category of a word to be determined simply from its distribution.
While syntactic categories can vary somewhat from language to language, most languages nonetheless possess a similar set of categories. A typical list of such categories with their most common linguistic abbreviations is given as follows:
- Noun (N)
  - Pronoun (Pr)
- Verb (V)
- Adjective (A)
- Adposition (P)
- Adverb (Adv)
- Degree word (Deg)
- Auxiliary (Aux)
  - Modal auxiliary (Mod)
- Interjection (I)
- Determiner (D or Det)
- Conjunction (Con)
  - Coordinate conjunction (C)
  - Subordinate conjunction (Sub)
The twice-indented categories can often be treated as simply a part of the categories they fall under; treating them separately is unnecessary for the purposes of most basic syntactic investigation.
The last several categories in the list, from degree words onward, are referred to as functional categories in contrast with other lexical categories because their words have little lexical meaning but instead express such things as grammatical relationships and functions between other words in a sentence. Words that fit these criteria but cannot be comfortably placed in any group are sometimes given the generic label of "particle" (Par), though the term "particle" generally refers to a much broader set of words and clitics than this, also encompassing some conjunctions, interjections, determiners, etc.
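As a rough illustration, the list and the lexical/functional split described above can be written down as a small lookup table. This is only a sketch: the category names and abbreviations come from the list, while the dictionary layout and the helper name `is_functional` are our own assumptions, and the cut-off simply follows the statement that everything "from degree words onward" counts as functional.

```python
# Category names and abbreviations are taken from the list above.
# The dictionary layout and the helper name are illustrative assumptions.
CATEGORIES = {
    "Noun": "N",
    "Pronoun": "Pr",
    "Verb": "V",
    "Adjective": "A",
    "Adposition": "P",
    "Adverb": "Adv",
    "Degree word": "Deg",
    "Auxiliary": "Aux",
    "Modal auxiliary": "Mod",
    "Interjection": "I",
    "Determiner": "D",
    "Conjunction": "Con",
    "Coordinate conjunction": "C",
    "Subordinate conjunction": "Sub",
}

# Dictionaries keep insertion order, so "from degree words onward"
# is just a slice of the list of names.
_names = list(CATEGORIES)
FUNCTIONAL = set(_names[_names.index("Degree word"):])


def is_functional(category: str) -> bool:
    """Return True if the category falls on the functional side of the split."""
    if category not in CATEGORIES:
        raise KeyError(category)
    return category in FUNCTIONAL
```

For example, `is_functional("Determiner")` is true while `is_functional("Verb")` is false. The generic "particle" label is deliberately left out, since the text notes that its scope is much less clear-cut.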
Theories of syntax generally view sentences as being made up of constituents, which are either made up of constituents themselves or are irreducible units. Constituents which are made up of multiple words are traditionally known as phrases. For now, we provisionally use this traditional definition. However, some refinements of the theory of constituent structure give a different and more specific meaning to the term "phrase", which permits the existence of certain types of one-word phrases; we will encounter this meaning later.
To understand the concept of a constituent, consider the sentence:
(1a) I hit the big man
Reflecting upon the meaning of this sentence, we see that there are three semantic components of what is being stated:
- There is an action occurring, which is hitting.
- The speaker is the person performing the action.
- The big man is the person undergoing the action.
Thus it seems natural that "I", "hit", and "the big man" should function as units, while "I hit the" and "hit the big" should not. The meaning of these semantic relations will be explored further in the following chapter, but suffice it to say that you probably already have some innate intuition about what a constituent is in some sense.
According to the most common analysis, sentence (1a) can be separated into constituents like this:
(1b) (I (hit (the (big man) ) ) )
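The bracketing in (1b) can be read as a tree, and it may help to see it spelled out as a data structure. The sketch below encodes (1b) as nested Python tuples (an encoding we assume purely for illustration) and lists every group of words that the bracketing treats as a unit. The function names are hypothetical, and the string comparison is only adequate for a sentence like this one in which no word is repeated.

```python
# Nested tuples mirror the bracketing in (1b); this encoding is an assumption.
TREE = ("I", ("hit", ("the", ("big", "man"))))


def words(node):
    """Return the words under a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(words(child))
    return result


def constituents(node):
    """Return the word string of every constituent, single words included."""
    found = [" ".join(words(node))]
    if not isinstance(node, str):
        for child in node:
            found.extend(constituents(child))
    return found


def is_constituent(tree, phrase):
    """Check whether a string of words forms a unit under this bracketing."""
    return phrase in constituents(tree)


def is_binary(node):
    """Check that every non-word node has exactly two parts."""
    if isinstance(node, str):
        return True
    return len(node) == 2 and all(is_binary(child) for child in node)
```

Under this bracketing, `is_constituent(TREE, "the big man")` holds, while `is_constituent(TREE, "hit the big")` and `is_constituent(TREE, "I hit the")` do not, which matches the intuition described above.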
For each lexical category, there is a corresponding constituent or phrasal category, and taken together, these are referred to as syntactic categories. Each constituent is assigned to one of these categories according to much the same criteria as those used for assigning words and clitics to lexical categories. Consequently, because distribution is such a reliable factor, constituents of the same category can generally appear in the same positions in a sentence. For instance, in the sentence
(2a) I see _____
Constituents of the same category as "the big man" (e.g. "the doll", "a fast, red car", "this quickly-moving snowflake") may occur in the gap, whereas constituents similar to "eats pie" (e.g. "drives fast", "writes a letter") cannot.
We will soon explore how constituent structures can be discovered, but we first must make note of their internal structure. Phrases are generally binary-branching, with one of the elements making up the phrase being known as the head and the other as the complement. The head is defined to be the obligatory element which gives the phrase its defining properties, generally including the constituent category of the phrase. Going back to our previous example, let's look at a larger set of phrases which can be inserted into (2a):
- a cat; a dog; a fast, red car; a paper; a small toy
- the big man; the doll; the house; the zebra
- this Argentinian hyena; this umpire; this quickly-moving snowflake
- that big-scale event; that wolf
- every impressively large elephant; every person
What do these phrases all have in common? We might be inclined to say that the defining property is the presence of a noun, but this fails for our purpose. Note that the following sentence is ungrammatical:
(2b) *I see cat
The only other common element is what is known as a determiner (a, the, this, that, every, etc.). According to this analysis, these constituents are headed by a determiner and have complements of the form cat; dog; fast, red car; etc. As it is conventional to name phrases after their heads, we will call these determiner phrases, or DPs for short.
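The head-and-complement analysis just described can be sketched in a few lines of code. The example below is only a toy, not a parser: the determiner list is the short one given in the text, the names `DP` and `analyze_dp` are hypothetical, and the function checks nothing except that a determiner comes first and that something follows it. It makes no attempt to verify that the complement is itself well-formed.

```python
from dataclasses import dataclass
from typing import Optional

# The determiners mentioned in the text; the real set is larger.
DETERMINERS = {"a", "the", "this", "that", "every"}


@dataclass(frozen=True)
class DP:
    head: str        # the determiner, which names the phrase
    complement: str  # everything that follows the determiner


def analyze_dp(text: str) -> Optional[DP]:
    """Split a candidate phrase into determiner head and complement.

    Returns None when there is no leading determiner or no complement.
    """
    first, _, rest = text.strip().partition(" ")
    if first.lower() in DETERMINERS and rest:
        return DP(head=first.lower(), complement=rest)
    return None
```

With this sketch, `analyze_dp("the big man")` gives a `DP` whose head is "the" and whose complement is "big man", whereas `analyze_dp("cat")` gives `None`, mirroring the contrast between (2a) with "the big man" and the ungrammatical (2b).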
There are a number of tests we can use to get an idea whether a group of words in a sentence is a constituent or not. Many of these are largely language-dependent, even though the concept of a constituent is thought to be universal.
Constituents bear grammatical relations (or grammatical functions), which determine how they behave syntactically within a proposition.
In a simplified analysis, the sentence may be viewed as generally consisting of three major constituents, which play the grammatical roles of subject, object, and verb.
- Colloquially, syntax may be referred to as 'grammar', but this is technically incorrect, as 'grammar' also includes morphology (as well as other aspects of linguistics in some usages).
- Whether phrases are always binary-branching is very controversial. Noam Chomsky's Minimalist theory states that this is always the case, but phrases such as "Andrew and Bob and Carol" create a dilemma.
- Other analyses are possible, but beyond our scope at the moment.