- Prompt 3: Domain-specificity in Language Learning
- Is it necessary to possess language-specific innate biases in order to acquire a language within a limited timeframe from limited data? Noam Chomsky answered affirmatively and presented a series of arguments, now known as the Poverty of the Stimulus (PoS) arguments, in support of his answer. These arguments have drawn both prominent supporters and harsh critics. Laurence and Margolis (L & M) [1] presented a strong defense of the force of PoS-style arguments while countering many of the critics. This paper is primarily a critique of L & M’s presentation and defense of the strength of the PoS arguments. I argue that PoS arguments are largely insufficient to settle the opening question.
- 1. Heart of the Debate
- At the two ends of the debate over the initial question are the nativists and the empiricists. One might distinguish the two positions by their attitude toward “innate” biases for language learning: nativists believe in such biases and empiricists do not. However, any such distinction would be problematic. First, it is hard to pin down a legitimate meaning of “innate”. Second, whatever is meant by “innate”, neither side generally denies that there are innate biases priming language learning. Even empiricists can hold that some innate structures and biases make our brains more capable of learning languages than, say, rocks. The real disagreement concerns what kind of innate biases we have. In particular, the nativists of interest believe that at the beginning of first-language acquisition we already have language-specific biases that guide our language learning, whereas empiricists believe that we do not. Empiricists can still allow language-specific biases to develop later, from linguistic data, through a learning process that starts from domain-general biases. Note that the way I am distinguishing empiricists and nativists here is not necessarily consistent with the orthodox distinction. The division here focuses solely on what I believe is truly at the heart of the debate. If need be, the empiricists and nativists as defined here can be treated as new positions, empiricism* and nativism*.
- In this paper, I define language-specific biases as those that are useful or applicable only to learning linguistic skills, and not to learning tasks and skills from other domains such as face recognition or object cognition. Empiricists, as I have defined them, can allow any biases (at the beginning of language learning) that are not specific to language, or they can even do without biases altogether. However, I think completely unbiased learning is implausible; even an appeal to simplicity is a form of bias. Moreover, biases that are specific to non-linguistic domains would, by definition, be irrelevant to the linguistic domain under discussion. So the only biases of concern here, for our empiricists, are domain-general biases, which I define as biases that are useful and applicable across multiple (if not all) domains.
- Note that by “learning” I do not mean only explicit, rational, and intentional learning; I also include sub-personal levels of learning. I doubt that realistic cases of the former can be cleanly separated from the latter. Thus, I believe, restricting the term “learning” to the former kind alone (as Fodor tries to do) can be problematic. Below I present the standard PoS argument, and I then discuss it from the perspective of the two opposing sides of interest in this debate.
- 2. The Poverty of Stimulus (PoS) Argument
- The PoS argument is meant to support the nativist claim that there must be some language-specific biases as a helping hand to guide us in learning the correct grammatical/linguistic principles. The standard PoS argument as presented by L & M can be reconstructed as follows:
- P1: Either we have language-specific biases at the beginning of learning or not.
- P2: There are indefinitely many (linguistic) principles that are compatible with the primary linguistic data, but only (comparatively) few are correct.
- P3: The correct principles can be selected either a priori or a posteriori (based on data).
- P4: The correct set of principles cannot be a priori selected without initial language-specific bias (because the principles are not the simplest or most natural in any pre-theoretic sense).
- P5: There are certain classes of finite data series D from which it is impossible to select, a posteriori, the correct linguistic principles (or hypothesis) P without language-specific biases.
- P6: Human children learn the correct principles P from data of class D.
- C: Human children (and by extension, humans in general) possess language-specific biases at the beginning of language learning.
- The crux of the argument is the impoverishment of the data we receive compared with what we learn from that data. Essentially, indefinitely many principles and hypotheses are consistent with the primary linguistic data, yet children almost invariably select roughly the correct ones. The selection cannot be made a priori on the basis of domain-general biases, such as a bias toward simplicity or naturalness, because the correct principles are neither the simplest nor the most natural in any obvious sense. Thus, for an empiricist, the only possible explanation of children’s acquisition of the correct linguistic principles would be that those principles can be learned from experiential data with the help of domain-general biases. However, the data that children have access to are purportedly utterly insufficient for selecting the correct principles on the basis of domain-general biases alone. This suggests that the empiricists are mistaken and that some language-specific biases are helping us.
- Much of the heavy lifting in the above argument is done by P5. However, there is a relatively trivial way to refute P5 as stated: there is always a domain-general way to select the correct principle from any data, namely random selection. A random-selection model can pick the correct principle by chance even when no data are given. However, it is implausible that children use random selection to arrive at the correct principles P, particularly since different children arrive at roughly the same principles reliably and almost invariably. So we can add a qualifier to P5 that excludes random selection. Even so, it is not prima facie obvious that P5 and P6 can both be true at the same time. The premises need support. Below, I analyze how well they can be supported.
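The claim that random selection cannot explain convergence across children can be made quantitative with a toy calculation. Assume, purely for illustration, that there are K candidate sets of principles compatible with the data and that each of n children independently picks one uniformly at random. The probability that all n children land on the same set is then:

```latex
P(\text{all } n \text{ children agree})
  = \sum_{k=1}^{K} \left(\frac{1}{K}\right)^{n}
  = K \cdot K^{-n}
  = K^{1-n}
```

With K = 1000 and n = 3, this is already 10^{-6}. Since P2 allows indefinitely many compatible principles (K growing without bound), the probability tends to zero for any n ≥ 2. K, n, and the uniform-selection assumption are hypothetical simplifications introduced only for this illustration.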
- 2.1 Analyzing the Support for P5 and P6: One of the strongest ways to support P5 and P6 is to provide concrete examples backed by experimental evidence, and many such examples have been offered. They usually attempt to demonstrate that children can learn some correct principles P from certain classes of data distribution D even when D lacks the crucial evidence that would be indispensable for learning P from domain-general biases alone. This suggests that some language-specific biases compensate for the missing evidence.
- In practice, however, as Pullum and Scholz [2] showed, most of these experiments fall short of establishing either the correctness of P or the absence (inaccessibility) of the relevant crucial evidence from D. Moreover, the indispensability of the purported crucial evidence often remains doubtful. A general problem with PoS defenses is that they say very little about what nativists take “domain-general biases” to include. L & M, for example, often limit “domain-general biases” to a bias toward simplicity, elegance, or naturalness (all of which are vague). However, a domain-general bias can be much more.

It can be a bias toward approximate simplicity. For example, local adjustments may at times be made to the hypothesis at hand to fit new data, instead of recomputing the globally simplest hypothesis. It can be a bias toward seeking circumstance-invariant principles associated with stable correlations. It can be a bias toward building predictive models that constantly estimate the next sensory signals and are updated to minimize long-term prediction error (the so-called prediction error minimization principle from the predictive processing paradigm). It can be a bias toward minimizing the mutual information between sensory inputs and intermediate representations, without severely compromising the aforementioned prediction error. Such a bias extracts the information useful for predictive success from raw sensory signals and develops stimulus-independence. And so on.

Even compositional thinking and recursive processing can be considered “domain-general” biases, because besides language they could also apply to visual intelligence and related tasks. Recursive processing, for example, can be useful for tracking the motion of objects through time, or temporal phenomena in general. There can be extremely powerful combinations of these domain-general biases that plausibly address many of the difficulties L & M pose for empiricist learners (learners starting with no language-specific biases).
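To make the prediction error minimization bias more concrete, here is a minimal sketch in a drastically simplified setting. A learner receives a one-dimensional signal and predicts each next value as a linear function of the previous `order` values. After every step it nudges its weights to reduce its squared prediction error, which is the least-mean-squares (LMS, or delta) rule. The function name `predict_and_update`, the learning rate, and the sine-wave "sensory stream" are illustrative assumptions. LMS is a simple stand-in for, not a model of, the hierarchical generative models discussed in predictive processing. Note that nothing in the learner is specific to the kind of signal it receives.

```python
import math
from typing import List, Tuple


def predict_and_update(signal: List[float], order: int = 2,
                       lr: float = 0.5) -> Tuple[List[float], List[float]]:
    """Online next-value prediction trained by the LMS (delta) rule.

    Returns the final weights and the squared prediction error at each step.
    """
    weights = [0.0] * order          # start with no knowledge of the signal
    squared_errors = []
    for t in range(order, len(signal)):
        context = signal[t - order:t]                  # the last `order` values
        prediction = sum(w * x for w, x in zip(weights, context))
        error = signal[t] - prediction                 # prediction error
        squared_errors.append(error ** 2)
        # Gradient step on the squared error: move the weights so that the
        # same context would have produced a smaller error.
        weights = [w + lr * error * x for w, x in zip(weights, context)]
    return weights, squared_errors


# Hypothetical sensory stream: a noiseless sine wave.
signal = [math.sin(0.3 * t) for t in range(1000)]
weights, squared_errors = predict_and_update(signal)

early = sum(squared_errors[:20]) / 20
late = sum(squared_errors[-20:]) / 20
print(f"mean squared error: first 20 steps {early:.4f}, last 20 steps {late:.2e}")
print("learned weights:", [round(w, 3) for w in weights])
```

For a pure sinusoid, the next value obeys x_t = 2cos(ω)x_{t-1} - x_{t-2}. The learned weights should therefore approach roughly [-1, 2cos(0.3)], and the late prediction error should be far smaller than the early one. The learner recovers this regularity from the data alone, guided only by the generic drive to reduce prediction error.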
- In any case, even if we focus only on a bias toward simplicity, it is not clear how PoS-style arguments and experiments really succeed. Nativists generally do not develop a formal notion of simplicity (or use an existing one, such as Kolmogorov complexity). Without one, they cannot show that the simplest set of principles consistent with the data available to a typical child fails to include or imply the correct principles that typical children in fact demonstrate knowledge of. A rule that is prima facie simple for a few local examples may not belong to the set of simplest rules that an empiricist learner derives to model a much broader corpus of accessible data that includes those examples. Yet most concrete experimental instances of PoS arguments rely on precisely these kinds of prima facie, locally simple “incorrect” rules as what an empiricist learner should be expected to infer. It is quite plausible that a simplicity-biased learner would go beyond modeling construction-specific rules. It could instead opt for a smaller set of context-invariant, powerful, and general principles of language with maximal explanatory power (high fit to the data). According to L & M, even modern linguists themselves have gone beyond construction-specific rules and instead rely on “a smaller number of powerful general principles that only work in interaction with one another” [1, p. 229]. L & M offer several arguments as to why an empiricist learner cannot learn those powerful general principles (PGPs), but all of them can be countered:
- Argument (i): It took linguists decades to discover these PGPs empirically, so they are not likely to be simple. Moreover, many of the rules are unintuitive.
- Counter (i): What is at stake here is the degree of correctness of an empirically derived “simplest” model that fits the data, not just the “absolute simplest” model. Even if the former is among the simplest models that fit the data, it need not be intuitive, easy to comprehend, or easy to discover when the search is carried out explicitly, at the level of intentional reasoning.
- Argument (ii): A simplicity-biased learner has to model both the correct principles and their correct interactions. It will often be indeterminate whether a false prediction stems from an incorrect interaction or an incorrect principle, and hence which to reject and which to keep (reminiscent of Quine’s indeterminacy arguments).
- Counter (ii): In principle, where there is indeterminacy, the learner can retain multiple hypotheses weighted by some credence. Its prediction can then be a credence-weighted sum over them (this is the principle behind model ensembling and Bayesian marginalization). As further data (or experience) arrive, hypotheses that do not fit the new data can be rejected. If this learning occurs at a sub-personal level, we cannot infer how capable it is from common-sense phenomenology; the sub-personal mechanisms operating beneath conscious awareness may even maintain thousands of models. While biases toward retaining multiple models may go beyond a basic simplicity bias, they are all still, by and large, domain-general. With them, we can also plausibly avoid getting stuck on “dead-end” hypotheses.
- Argument (iii): Linguists have taken decades to learn PGPs empirically. Mere children cannot just learn such difficult concepts without initial language-specific biases.
- Counter (iii): Linguists are trying to arrive at an explicit rational formulation. That can be hard and time-consuming, but it does not follow that a primarily sub-personal process would take as long, even without help from language-specific biases. Empirical learning can be primarily sub-personal, and such a process continually learns these principles implicitly from data.
- Argument (iv): Linguists utilize a lot more data including cross-lingual data to get to PGPs. Children typically don’t have that much data.
- Counter (iv): Even if linguists need cross-lingual data, it is not clear that such data are necessary, in principle, for a domain-general learning process to learn PGPs. If the PGPs apply at least to a single language, they are still a data-compatible set of principles for that language, and it is not clear that they would not also be the simplest such set. Moreover, it is not clear how correct the derived PGPs are, or whether children really need to derive the exact PGPs and their interaction rules. What is at stake here is whether an empiricist learner, given a reasonable timeframe and a reasonable series of data, can derive a reasonable set of principles (implicitly or explicitly). It must also exhibit the linguistic behavior typically observed in a human given a similar timeframe and data. The empiricist learner may be able to do that without deriving the exact PGPs or all the correct principles and interaction rules.
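Counter (ii) describes a learner that keeps several hypotheses alive, weights them by credence, and predicts with a credence-weighted sum. New data then demote the hypotheses that fit poorly. Below is a minimal sketch of that idea as Bayesian model averaging, set in a deliberately non-linguistic toy domain. The hypotheses are three hypothetical rules about integers, each assigned a made-up description length in bits. The prior is proportional to 2^(-bits), a crude stand-in for a simplicity bias. The rule names, bit counts, and the 5% noise rate are illustrative assumptions only. Nothing in the learning procedure is specific to language or to integers.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[int], bool]

# Hypothetical candidate rules with made-up description lengths (in bits).
HYPOTHESES: Dict[str, Tuple[Rule, int]] = {
    "even": (lambda x: x % 2 == 0, 3),
    "multiple_of_4": (lambda x: x % 4 == 0, 4),
    "less_than_10": (lambda x: x < 10, 5),
}
NOISE = 0.05  # assumed chance that an observed label is wrong


def likelihood(rule: Rule, x: int, label: bool) -> float:
    """P(label | x, rule): the rule's verdict, corrupted by label noise."""
    return (1 - NOISE) if rule(x) == label else NOISE


def posterior(data: List[Tuple[int, bool]]) -> Dict[str, float]:
    """Credence in each rule: simplicity prior times likelihood, normalized."""
    scores = {}
    for name, (rule, bits) in HYPOTHESES.items():
        score = 2.0 ** -bits  # shorter rules get higher prior credence
        for x, label in data:
            score *= likelihood(rule, x, label)
        scores[name] = score
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def predict(credences: Dict[str, float], x: int) -> float:
    """Model averaging: P(label is True | x), marginalizing over the rules."""
    return sum(p * likelihood(HYPOTHESES[name][0], x, True)
               for name, p in credences.items())


data = [(4, True), (8, True), (3, False)]
before = posterior(data)               # "even" and "multiple_of_4" both fit
after = posterior(data + [(6, True)])  # 6 is even but not a multiple of 4
print({k: round(v, 3) for k, v in before.items()})
print({k: round(v, 3) for k, v in after.items()})
print("P(2 is labeled True):", round(predict(before, 2), 3))
```

On the first three observations, "even" and "multiple_of_4" both fit perfectly. The learner keeps both, at a 2:1 ratio set by the prior, rather than committing to one, and its prediction for 2 is correspondingly hedged. The single observation (6, True) then shifts almost all credence to "even". Because rejection here is soft (governed by the small NOISE probability), a demoted hypothesis can in principle recover if later data favor it. This is one way of avoiding the "dead-end" hypotheses mentioned in Counter (ii).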
- While L & M present many other arguments to support P5 and P6, they are mostly similar in spirit to those discussed so far, and all of them suffer from similar issues. They either severely underestimate what a domain-general learning process can achieve or rely on unsubstantiated claims.
- 3. Conclusion
- I am not necessarily siding with the empiricists, nor do I find it particularly implausible that language-specific biases exist before the start of language acquisition. Such biases could well be encoded in our biological structures thanks to natural selection. Even initially random, brute biases and quirks that developed for reasons unrelated to language can become “language-specific”. This can happen if they are roughly invariant (universal across humans) and at the same time have some influence over the emergence of natural languages, even if that influence is not something the bias was selected for. Whatever the case, I do not believe that the PoS arguments themselves, as presented by L & M, are persuasive enough to lead a reasonable agnostic to believe that initial language-specific biases are necessary for language acquisition.
- References
- [1] Laurence and Margolis, ‘The Poverty of the Stimulus Argument’
- [2] Pullum and Scholz, ‘Empirical Assessment of Stimulus Poverty Arguments’