Child Language Acquisition and Universal Grammar

Child Language Acquisition: Why Universal Grammar Doesn’t Help

Introduction

  • Many theories suggest innate knowledge of universal grammar (UG) aids language acquisition.
  • These include syntactic categories (noun, verb), constraints (structure dependence, subjacency), and parameters (head direction).
  • Arguments for UG are often based on learnability and evolutionary plausibility.
  • This article assesses whether specific components of UG knowledge help language learners.
  • It focuses on areas with apparent learnability issues where innate knowledge is proposed as a solution:
    • Identifying syntactic categories
    • Acquiring basic morphosyntax
    • Structure dependence
    • Subjacency
    • Binding principles
  • The article does not compare UG accounts with constructivist or usage-based accounts but evaluates if adding UG constraints simplifies learning.
  • 'Universal grammar' is defined as innate categories, constraints/principles, and parameters that are genetically encoded and not learned.
  • The goal is to evaluate specific proposals for innate knowledge components that are claimed to solve learnability problems.
  • The authors identify three problems with innate knowledge proposals:
    • Linking: How innate knowledge connects to the input language.
    • Inadequate data coverage: Leading to incorrect conclusions in some languages.
    • Redundancy: Learning procedures explain learning without innate principles.
  • The article argues no current proposal for innate knowledge is useful to language learners due to these problems.
  • UG-based accounts are controversial but remain a current hypothesis with researchers arguing for innate knowledge of specific components.

Identifying Syntactic Categories

  • One basic task is grouping words into syntactic categories (noun, verb, adjective).
  • Definitions of these categories are circular; categories are defined by their participation in the system.
  • For example, a word is a noun if it occurs in similar syntactic contexts to other nouns.
  • The traditional solution posits these categories as part of UG to avoid circularity.
  • If children know in advance that there will be a class of nouns and can assign just a few words to this category, they can then add new words to the category on the basis of semantic and/or distributional similarity to existing members.
  • This section considers three approaches to how children break into these syntactic categories to begin with:
    • Distributional analysis
    • Prosodic bootstrapping
    • Semantic bootstrapping
Distributional Analysis
  • Syntactic categories are defined distributionally in adult grammar.
  • Accounts of syntactic category acquisition must include distributional analysis.
    • Chomsky advocated a probabilistic approach to words and categories through the analysis of clustering and distributional distance.
    • Pinker argues children can use the syntactic distribution of a newly heard word to induce its linguistic properties.
    • Mintz advocates frequent frames and assumes a pre-given set of syntactic category labels.
    • Valian, Solt, and Stewart suggest children use pattern learning based on distributional regularities.
  • Learners form clusters that roughly correspond to syntactic categories.
  • The question is whether learners are helped by innate prespecified categories to which they could be linked.
  • A better strategy for learners is to use the distributionally defined clusters directly.
  • Few accounts that assume innate syntactic categories include a mechanism for linking the two.
  • Mintz suggests children could assign the label noun to the category that contains words for concrete objects, using an innate linking rule. The label verb would then be assigned either to the next largest category or to the category that takes nouns as arguments.
  • Pinker’s semantic bootstrapping account assumes innate rules linking 'name of person or thing' to noun, 'action or change of state' to verb, and 'attribute' to adjective.
  • Once the child has used these linking rules to break into the system, distributional analysis largely takes over.
  • A problem facing Mintz’s and Pinker’s proposals is that they include no mechanisms for linking distributionally defined clusters to other innate categories that are generally assumed as a necessary part of UG, such as determiner, wh-word, auxiliary, and pronoun.
  • Proposals offer no account of linking for categories other than noun, verb, and adjective.
  • There exist no proposals at all for how instances of these categories can be recognized in the input - an example of the linking problem.
  • There are no viable candidates for crosslinguistic syntactic categories other than a noun category containing at least names and concrete objects.
  • Mandarin Chinese has property words that are similar to adjectives in some respects, and verbs in others.
  • Haspelmath characterizes Japanese as having two distinct adjective-like parts of speech, one a little more noun-like, the other a little more verb-like.
  • Even the noun/verb distinction has been disputed for languages such as Salish, Samoan, and Makah.
  • Maratsos concluded that the only candidate for a universal lexical category distinction is between “noun and Other”.
  • Pinker argues that the nonuniversality of syntactic categories is not fatal for his theory, provided that different crosslinguistic instances of the same category share at least a “family resemblance structure”.
  • Crosslinguistic variation means that it is almost certainly impossible in principle to build in innate rules for identifying other commonly assumed UG categories (the problem of data coverage).
  • In summary, Pinker’s and Mintz’s proposals capture the insight that, in order to form syntactic categories, learners will have to make use of both semantic and distributional information.
  • Where they falter is in their assumption that these distributional clusters must be linked to innate syntactic categories.
  • Syntactic categories are language-specific, and children must acquire them on the basis of semantic and distributional regularities.
  • Even categories as (relatively) uncontroversial as English noun and verb are made up of semantically and distributionally coherent subcategories.
  • Even if a learner could instantaneously assign every noun or verb that is heard into the relevant category, this would not obviate the need for a considerable degree of clustering based on semantic and distributional similarity.
  • Innate categories are redundant given that such clustering yields useful syntactic categories.
  • Distributional analysis must take place at the level of the word, as opposed to other levels such as the phone, phoneme, syllable, etc.
  • The child will have to conduct distributional analysis at many of these levels simultaneously to solve other problems such as speech segmentation, constructing an inventory of phonemes, and learning the phonotactic constraints and stress patterns of her language.
  • Units of a certain size occur more often than would be expected if speakers produced random sequences of phones (and, crucially, cooccur with concrete or functional referents in the world).
  • These units share certain distributional regularities with respect to one another, the type of distributional analysis required for syntactic-class formation.
  • There is no need to build in innate constraints to rule out every theoretically possible distributional-learning strategy.
  • The question of how the child knows to perform distributional analysis at the word level, as opposed to some other level, is equally problematic for accounts that do and do not posit innate syntactic categories.
  • None of the distributional-analysis algorithms outlined above are unequivocally successful in grouping words into categories, but this is no argument for innate syntactic categories.
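The kind of distributional analysis described above can be illustrated with a toy sketch of Mintz-style 'frequent frames': words occurring between the same (preceding, following) word pair are grouped together. The corpus, the frequency threshold, and the function name here are illustrative assumptions, not part of any published model.

```python
from collections import defaultdict

def frequent_frames(corpus, min_count=2):
    """Group intervening words by the (preceding, following) frames
    they occur in; a toy sketch of frame-based distributional analysis."""
    frames = defaultdict(set)   # frame -> set of intervening words
    counts = defaultdict(int)   # frame -> number of occurrences
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            frame = (tokens[i - 1], tokens[i + 1])
            frames[frame].add(tokens[i])
            counts[frame] += 1
    # Keep only frames frequent enough to be informative
    return {f: words for f, words in frames.items() if counts[f] >= min_count}

corpus = [
    "the dog can run",
    "the cat can jump",
    "the dog will jump",
    "the cat will run",
]
clusters = frequent_frames(corpus)
# The frames ('the', 'can') and ('the', 'will') each collect {dog, cat}:
# a noun-like cluster emerges with no prespecified category labels.
```

Note that the resulting clusters are defined purely distributionally; nothing in the procedure itself links them to an innate label such as noun.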
Prosodic Bootstrapping
  • The prosodic bootstrapping hypothesis suggests learners use prosodic information to split clauses into syntactic phrases.
  • Prosodic cues (e.g. final-syllable lengthening, pitch drop) signal phrase boundaries; the child then labels each phrase to assign its items to categories.
  • This avoids circularity and the problem of linking distributional clusters to UG categories.
  • Six-month-old infants are sensitive to prosodic properties.
  • Infants can discriminate between strings differing in NP/VP boundary, marked by final-syllable lengthening and pitch drop.
  • The proposal would probably lead to incorrect segmentation in the majority of cases, even looking only at the case of the NP/VP boundary in a single language (i.e. English).
  • For sentences with unstressed pronoun subjects (e.g. He kissed the dog) as opposed to full NPs (e.g. The boy kissed the dog), prosodic cues place the NP/VP boundary in the wrong place (e.g. *[NP He kissed] [VP the dog]).
  • 84% of sentences were of this type in an analysis of spontaneous speech to a child aged 1;0.
  • The nonexistence of universal syntactic categories also constitutes a problem for the Christophe et al. approach.
  • Even if it were somehow possible to come up with a list of universal categories (as well as reliable prosodic cues to phrase boundaries), the proposal would still fail unless it were possible to identify a “flag” for every category in every language.
  • Given that there exists no proposal for a universal set of flags, the Christophe et al. account suffers from the linking problem.
  • It also suffers from an additional problem that is common to many UG approaches.
Interim Conclusion
  • Learners acquire syntactic categories present in the language using distributional and semantic similarities.
  • Learners use prosodic/phonological cues to category membership.
  • Theories falter by trying to fit language-specific categories into a rigid framework of innate categories derived from Indo-European languages.
  • There are essentially no proposals for how children could identify instances of them, other than by using distributional and semantics-based learning, a procedure that yields the target categories in any case.
  • Nativist proposals for syntactic category acquisition suffer from problems of data coverage, linking, and redundancy.

Acquiring Basic Morphosyntax

  • Children learn how their language marks ‘who did what to whom’.
  • For English, this involves subject, verb, and object order; for other languages, morphological marking.
  • Subject, verb, and object are abstract notions.
  • A basic semantic Agent-Action-Patient schema does not work for nonactional sentences.
  • Syntactic roles cannot be defined in terms of semantics and are defined instead in terms of their place within the grammatical system of which they form a part.
Semantic Bootstrapping
  • Pinker’s semantic bootstrapping account assumes that UG contains not only syntactic roles (e.g. subject, verb, and object), but also innate rules that link each to a particular semantic role (e.g. Agent → subject, Action → verb, Patient → object).
  • Pinker actually posits a hierarchy of linking rules, but since the first pass involves linking Agent and Patient to subject and object, the facts as they relate to the discussion here are unchanged.
  • Assume, for example, that the child hears an utterance such as The dog bit the cat and is able to infer (for example, by observing an ongoing scene) that the dog is the Agent (the biter), bit the Action, and the cat the Patient (the one bitten).
  • By observing in this way that English uses Agent-Action-Patient order, and using the innate rules linking these semantic categories to syntactic roles, the child will discover (in principle from a single exposure) that English uses subject-verb-object word order.
  • An important aspect of Pinker’s proposal is that once basic word order has been acquired in this way, the linking rules are abandoned in favor of (i) the recently acquired word-order rules and (ii) distributional analysis.
  • The advantage of Pinker’s account is that it avoids the problems inherent in the circularity of syntactic roles by using nonsyntactic (i.e. semantic) information to break into the system.
  • Sentences that do not conform to the necessary pattern do not present a problem, since this semantic information is used only as a bootstrap and then discarded.
  • One basic problem facing Pinker’s proposal is that it is unclear how the child can identify which elements of the utterance are the semantic arguments of the verb (Agent and Patient), and hence are available for linking to subject and object, given the way that the particular target language carves up the perceptual world.
  • It has been argued that the existence of morphologically ergative(-absolutive) languages (e.g. Dyirbal) constitutes a problem for Pinker’s proposal.
  • Accusative languages (e.g. English) use one type of case marking (nominative) for A and S, and a different type of case marking (accusative) for P.
  • Ergative languages use one type of case marking (ergative) for A and another (absolutive) for P and S.
  • Split-ergative languages have no mapping between semantic and syntactic categories that is consistent across the entire grammar.
  • Also argued to be problematic for semantic bootstrapping are languages that exhibit true syntactic ergativity (e.g. Dixon 1972, Woodbury 1977, Pye 1990).
  • Since the mapping between semantic roles and morphological/syntactic marking changes depending on animacy, tense, aspect, and so on, there is no alternative but for children to learn the particular mapping that applies in each part of the system, using whatever probabilistic semantic or distributional regularities hold in that domain.
  • Let us conclude this section by examining which parts of Pinker’s account succeed and which fail.
  • Its first key strength is the assumption that children exploit probabilistic, though imperfect, correlations between semantic roles (e.g. Agent) and morphosyntactic marking, whether realized by word order or morphology.
  • Its second key strength (as noted by Braine 1992) is the principle that ‘old rules analyze new material’, which allows the initial semantically based categories (e.g. Agent) to expand into syntactic categories via distributional analysis.
  • The problem for Pinker’s proposal is that these learning procedures are so powerful that they obviate the need for innate linking rules (as indeed they must, given that there can be no set of rules that is viable crosslinguistically).
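The linking-rule logic described above can be sketched as follows. Everything here is a hypothetical illustration: the rule table and function name are assumptions, and the word-to-role mapping stands in for the child's inference from an observed scene.

```python
# Hypothetical sketch of semantic bootstrapping: innate rules link
# semantic roles to syntactic roles, so a single parsed scene can
# (in principle) reveal the language's basic word order.
LINKING_RULES = {"Agent": "subject", "Action": "verb", "Patient": "object"}

def infer_word_order(utterance, scene_roles):
    """utterance: words in the order heard.
    scene_roles: word -> semantic role, inferred from the ongoing scene."""
    return [LINKING_RULES[scene_roles[word]] for word in utterance]

order = infer_word_order(
    ["dog", "bit", "cat"],
    {"dog": "Agent", "bit": "Action", "cat": "Patient"},
)
# order == ["subject", "verb", "object"]: one exposure suffices to
# conclude that English is SVO, after which the rules are discarded
```

The linking problem is visible even in this sketch: the scene_roles mapping, which the rules require as input, is exactly what the child has no language-independent way to compute.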
Parameter Setting
  • An alternative UG-based approach to the acquisition of basic word order is parameter setting (Chomsky 1981b).
  • Parameter-setting accounts assume that learners acquire the word order of their language by setting parameters on the basis of input utterances.
  • The specifier-head parameter determines whether a language uses SV (e.g. English) or VS (e.g. Hawaiian) order.
  • The complement-head parameter determines whether a language uses VO (e.g. English) or OV (e.g. Turkish) order.
  • The V2 parameter determines whether a language additionally stipulates that a tensed verb must always be the second constituent of all declarative main clauses.
  • A potential problem facing parameter-setting approaches is parametric ambiguity: certain parameters cannot be set unless the child has previously set another parameter, and knows this setting to be correct.
  • Recent work, however, has largely solved this problem.
  • The first solution is to propose that each parameter has a default initial state and/or to relax the restrictions that (i) only changes that allow for a parse of the current sentence are retained (greediness) and (ii) only one parameter may be changed at a time (the single-value constraint).
  • The second, much more successful strategy is to have the parser detect ambiguous sentences.
  • The third possible solution rejects triggering (or transformational learning) altogether in favor of variational learning.
  • The variational learning model enjoys the advantages of being robust to noise (i.e. noncanonical or ungrammatical utterances) and avoiding having children lurch between various incorrect grammars as they flip parameter settings (as opposed to gradually increasing/decreasing their strength).
  • The success of all of these solutions depends crucially on the assumption that the learner is able to parse input sentences as sequences of syntactic roles (e.g. subject-verb-object).
  • The problem is that there are no successful accounts of how this knowledge could be obtained.
  • Semantic bootstrapping, distributional learning linked to innate syntactic categories, and prosodic bootstrapping do not work.
  • Children could set the head-direction (VO/OV) parameter on the basis of a crosslinguistic correlation between head direction and branching direction.
  • Christophe and colleagues propose that children set the head-direction parameter using a correlation with phonological prominence.
  • Gervain and colleagues provided some preliminary evidence for the prosodic bootstrapping approach by demonstrating that Italian and Japanese eight-month-olds prefer prosodic phrases with frequent items phrase-initially and phrase-finally respectively.
  • To our knowledge, no study has provided evidence (i) that there exists a sufficiently robust crosslinguistic correlation between the presence of this cue and the setting of a particular parameter (e.g. VO) and (ii) that children are aware of this correlation.
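The variational learning model mentioned above can be sketched as a linear reward-penalty scheme over competing grammars. This is a toy illustration, not Yang's implementation: the grammars, the parses predicate, and all parameter values are assumptions. Note that it presupposes exactly what this section questions, namely that the learner can already parse input as sequences of syntactic roles.

```python
import random

def variational_learner(data, grammars, parses, gamma=0.05, steps=2000, seed=1):
    """Toy variational learner: sample a grammar in proportion to its
    weight, reward it if it parses the input, punish it otherwise."""
    rng = random.Random(seed)
    n = len(grammars)
    w = {g: 1.0 / n for g in grammars}          # start with uniform weights
    for _ in range(steps):
        s = rng.choice(data)
        g = rng.choices(grammars, weights=[w[x] for x in grammars])[0]
        if parses(g, s):
            # linear reward: shift weight toward the successful grammar
            for h in grammars:
                w[h] = w[h] + gamma * (1 - w[h]) if h == g else (1 - gamma) * w[h]
        else:
            # linear penalty: shift weight away from the failing grammar
            for h in grammars:
                w[h] = (1 - gamma) * w[h] if h == g else gamma / (n - 1) + (1 - gamma) * w[h]
    return w

# Input resembling English: every sentence is a subject-verb-object sequence
data = [("S", "V", "O")] * 10
parses = lambda g, s: s == (("S", "V", "O") if g == "SVO" else ("S", "O", "V"))
weights = variational_learner(data, ["SVO", "SOV"], parses)
# weights["SVO"] ends up close to 1: learning is gradual and robust to noise,
# with no abrupt flips between grammars
```

Because weights change gradually, the learner never lurches between incorrect grammars, which is the advantage claimed for variational over triggering models.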
Interim Conclusion
  • Given the problems with prosodic bootstrapping, parameter-setting accounts have never adequately addressed the linking problem.
  • This leaves only Pinker’s semantic bootstrapping account, which suffers from the linking problem unless one largely abandons the role of innate semantics-syntax linking rules in favor of some form of a probabilistic input-based learning mechanism.
  • As in §2 (syntactic categories), we end this section by considering the objection that, by invoking semantic and distributional analysis, we are bringing in innate knowledge by the back door.
  • But even if it does turn out to be necessary to build in a bias for children to care especially about, for example, causation, this is a very different type of innate knowledge from that assumed under UG theories, in particular, innate semantics-syntax linking rules and word-order parameters.

Structure Dependence

  • Structure dependence has been called the ‘parade case’ of an innate principle.
  • Chomsky argued it is impossible for children to acquire the structure of complex yes-no questions from the input since they are virtually absent.
  • Complex questions are those that contain both a main clause and a relative clause.
  • Adult rule: move the auxiliary in the main clause to the front of the sentence. The correct rule is structure-dependent.
  • Chomsky claims children cannot learn that the structure-dependent rule is the correct one, since a person might go through much or all of his life without ever having been exposed to relevant evidence.
  • Children’s knowledge of UG contains the principle of structure dependence.
Complex Yes-No Questions
  • There are two questions:
    • How children avoid structure-dependence errors and acquire the correct generalization in the particular case of complex yes-no questions in English.
    • How children know that all linguistic generalizations are structure-dependent.
  • Potential solutions that do not assume an innate principle:
    • Posit that questions are not formed by movement rules at all
    • Learners are sensitive to the pragmatic principle that one cannot extract elements of an utterance that are not asserted, but constitute background information
    • Children make use of bi/trigram statistics in their input.
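The bi/trigram proposal in the last bullet can be illustrated with a toy bigram scorer, loosely in the spirit of such input-based accounts. The corpus, smoothing constant, and scoring function are illustrative assumptions.

```python
from collections import Counter
from math import log

def bigram_score(sentence, counts, alpha=0.1):
    """Sum of smoothed log bigram counts: higher means the word
    sequence looks more like sequences attested in the input."""
    words = sentence.split()
    return sum(log(counts[(a, b)] + alpha) for a, b in zip(words, words[1:]))

corpus = [
    "the boy who is tall is nice",
    "the girl who is small is kind",
    "the dog who is big is loud",
    "is the boy happy",
    "is the girl happy",
]
counts = Counter(
    (a, b) for s in corpus for a, b in zip(s.split(), s.split()[1:])
)

correct = "is the boy who is tall happy"  # structure-dependent movement
error = "is the boy who tall is happy"    # move-the-first-auxiliary error
# The correct question contains the well-attested bigram 'who is', while
# the error form contains unattested bigrams such as 'who tall', so the
# correct form receives the higher score.
```

Even this crude statistic favors the structure-dependent form over the structure-independent error, without any innate principle of structure dependence.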
Structure Dependence In General
  • Children know that syntactic rules are structure-dependent in general.
  • There is abundant evidence for the general principle of structure dependence not only in the language that children hear, but also in the conceptual world.
  • Discourse exchanges in the input constitute evidence that strings of arbitrary length that share distributional similarities can be substituted for one another (i.e. evidence for the structure-dependent nature of syntax).
  • Computer models that use distribution in this way can simulate many structure-dependent phenomena, including the specific example of complex yes-no questions in English, at least to some extent.
  • The reason that John, the boy, the tall boy, and the boy who is tall can be substituted for one another is that all refer to entities in the world upon which the same kinds of semantic operations can be performed.
  • Thus, to acquire a structure-dependent grammar, all a learner has to do is to recognize that strings such as the boy, the tall boy, war, and happiness share both certain functional and distributional similarities.
  • Whatever else one does or does not build into a theory of language acquisition, some kind of prelinguistic conceptual structure that groups together functionally similar concepts is presumably inevitable.
  • That assumption does not constitute bringing in innate knowledge by the back door, since prelinguistic conceptual structure is not specifically linguistic.

Subjacency

  • Newmeyer and Pinker and Bloom cite subjacency, another constraint on syntactic movement, as a prime example of an arbitrary linguistic constraint that is part of children’s knowledge of UG.
  • The standard UG assumption is that wh-questions are formed from an underlying declarative (or similar) by movement of the auxiliary and, more relevant for subjacency, the wh-word.
  • The phenomenon to be explained here is as follows. Wh-words can be extracted from both simple main clauses and object complements.
  • Many other syntactic phrases are ‘islands’ in that wh-words (and other constituents) cannot be extracted from them.
  • Since Chomsky 1973 (though see Ross 1967 for an earlier formulation), the standard account has been the subjacency constraint, which specifies that movement may not cross more than one ‘bounding node’.
  • Although this proposal has undergone some modifications, the claim remains that some form of an innate UG island constraint aids learners by allowing them to avoid the production of ungrammatical sentences.
  • We argue, however, that an innate subjacency constraint is redundant: island constraints can be explained by discourse-pragmatic principles that apply to all sentence types, and hence that will have to be learned anyway.
  • Most utterances have a topic (or theme) about which some new information (the focus, comment, or rheme) is asserted.
  • Children will have to learn about information structure in order to formulate even the most basic utterances.
  • Returning to questions, it is clear that the questioned element is the focus of both a question and the equivalent declarative.
  • What all island constructions have in common is that they contain information that is old, incidental, presupposed, or otherwise backgrounded in some way.
  • Do such cases mean that an innate subjacency principle could be actively harmful?
  • It seems that the only way to prevent an innate subjacency principle from being harmful to learners would be to allow the discourse-pragmatic principles discussed here to override it, rendering subjacency redundant.
  • The discourse-based proposal is successful precisely because its primitives correspond to the primitives of discourse structure.

Binding Principles

  • Languages exhibit certain constraints on coreference; that is, they appear to block certain pronouns from referring to particular noun phrases.
  • The standard assumption of UG-based approaches is that such principles are unlearnable (e.g. Guasti & Chierchia 1999/2000:140) and must instead be specified by innate binding principles that are part of UG.
  • The formal definition of ‘binding’ is that X binds Y if (i) X c-commands Y and (ii) X and Y are coindexed (i.e. refer to the same entity).
Principle C
  • Principle C rules that an R(eferring)-expression must be free everywhere.
  • The functional explanation is as follows. As we saw in the previous section, the topic/theme is the NP that the sentence is ‘about’, and about which some assertion is made (the comment/focus/rheme).
  • When a particular referent is already topical, it is most natural to use a pronoun as topic.
  • If I (as speaker) am sufficiently confident that you (as listener) know who I am talking about to use a pronoun as the topic of my main assertion (She listens to music), I should be just as happy to use pronouns in the part of the sentence that constitutes only background information (when she reads poetry).
  • For single-clause sentences, the discourse-functional explanation is even simpler (though, of course, there is no backgrounded clause).
  • In general, it makes pragmatic sense to use a lexical NP (including quantified NPs like everyone) as the topic about which some assertion is made, and a pronoun in a part of the sentence containing information that is secondary to that assertion, but not vice versa.
  • Harris and Bates demonstrated that if a principle-C-violating sentence is manipulated such that the subordinate clause contains new information and the main clause background information, participants accepted a coreferential reading on a substantial majority of trials (75%).
  • The exception to this backgrounding account occurs in cases of forward anaphora from a subordinate into a main clause.
  • That add-on to the principle C account makes reference to the same notion of information structure on which the functional account is based. In order to produce even simple single-clause sentences, children need to know (and, indeed, by age three do know; Matthews et al. 2006) certain discourse-functional principles (here, when to use a lexical NP vs. a pronoun).
Principles A and B
  • Principles A and B govern the use of reflexive (e.g. herself) vs. nonreflexive (e.g. her) pronouns.
  • Principle A states that a reflexive pronoun (e.g. herself) must be bound in its local domain.
  • Principle B states that a nonreflexive pronoun must be free (i.e. not bound) in its local domain. Effectively, it is the converse of principle A: in a context where a reflexive pronoun (e.g. herself) must be used, one cannot substitute a nonreflexive pronoun (e.g. her) without changing the meaning.
  • Indeed, this is incorporated into UG accounts of binding (Grodzinsky & Reinhart 1993:79).
  • Kuno states that reflexive pronouns are used in English if and only if they are direct recipients or targets of the actions represented by the sentences.
  • A very similar formulation is that reflexive pronouns denote a referent as seen from his or her own point of view, nonreflexive pronouns from a more objective viewpoint (Cantrall 1974).
  • Indeed, UG-based accounts propose essentially this very solution. For example, Thornton and Wexler’s (1999) guise-creation hypothesis argues that listeners create two separate guises for the referents (e.g. a person who may be John, and a person who is John).
Interim Conclusion
  • For all three binding principles, there exist phenomena that, under any account (UG-based or otherwise), can be explained only by recourse to discourse-functional principles.
  • The proposed syntactic principle is successful only to the extent that it restates the discourse-based account, and fails where it does not (e.g. for both intersentential and Evans-style contexts).

Conclusion

  • The present article has argued that, even if no restrictions are placed on the type of innate knowledge that may be posited, there are no proposals for components of innate knowledge that would simplify the learning process for the domains considered.
  • Each component of innate knowledge proposed suffers from at least one of the problems of linking, data coverage, and redundancy; in some cases, all three.
  • The cues and mechanisms that actually solve the learning problem are ones that are not related to UG, and that must be assumed by all accounts, whether or not they additionally assume innate knowledge.
  • Our challenge to advocates of UG is this: rather than presenting abstract learnability arguments of the form ‘X is not learnable given the input that a child receives’, explain precisely how a particular type of innate knowledge would help children to acquire X. In short, ‘You can’t learn X without innate knowledge’ is no argument for innate knowledge, unless it is followed by ‘ … but you can learn X with innate knowledge, and here’s one way that a child could do so’.