Are Lexical Subjects Deviant?
Hartwell S. Francis, Michelle L. Gregory and Laura A. Michaelis
University of Colorado at Boulder
0 Introduction
The canonical word order of English is taken to be SVO, where S and O are assumed to be lexical (cf. Lambrecht 1987), as in (1) below.
(1) The news coverage showed all the, you know, the guys who didn't get hurt coming home.
In (1), the lexical NP the news coverage is the subject of the sentence. While this sentence looks like a typical English sentence (cf. Sapir 1921), the SVO assumption for conversation has been challenged in the literature. Lambrecht (1987:218) suggests that SVO may not be the predominant pattern of spoken discourse in any language. Similarly, Du Bois (1987) suggests that lexical transitive arguments are highly constrained in conversation across languages. These arguments rest on a large body of data concerning the distribution of lexical subjects, both in English and cross-linguistically. The general finding is that lexical subjects in English conversation are rare (Lambrecht 1994, Givón 1983b, Du Bois 1987).
The rarity of lexical subjects in English conversation, coupled with the pronounced asymmetry we find in a corpus of English telephone conversations, leads us to consider lexical subjects a marked linguistic choice. Given the large asymmetry between pronominal and lexical subjects, we find that subject position in English conversation is constrained by the Principle of Separation of Reference and Role (Lambrecht 1994). Examining the morphosyntactic distribution of the small number of lexical subjects found in the corpus, we conclude that the use of a lexical subject is motivated by one of two competing principles and restrained by the other: the speaker's economy versus the hearer's economy (Horn 1984).
We begin in §1 with a review of the function of subjects in English and the distribution of lexical versus pronominal NPs. In §2 we discuss the nature of lexical subjects. In §3 we consider the Principle of Separation of Reference and Role as a constraint on subject position in English and propose that speakers who violate this principle do so to conserve effort. In §4 we discuss the morphosyntactic coding of the small class of lexical subjects as evidence of this competition. We conclude in §5 that although lexical subjects in conversation are new to the discourse, the speaker ensures their recoverability through morphosyntactic coding.
1 Subjects denote topics
There is general agreement among researchers in functional syntax that the grammatical role of subject is the syntactic expression of the discourse role of topic (Foley & Van Valin 1984, Givón 1990, Lambrecht 1994). Mithun (1991:160) states the correlation explicitly: "the function of subjects is clear: They are essentially grammaticized clause topics."
Gundel (1988:210) provides a particularly clear definition of topic status:
(3) Topic. An entity E is the topic of a sentence, S, iff in using S the speaker intends to increase the addressee's knowledge about, request information about, or otherwise get the addressee to act with respect to E.
This definition describes the topic-comment relationship between the subjects and predicates in the corpus under investigation.
(4) She lives, it's a, it's a fairly large community. She got real lucky, though. She had a boss who, uh, moved into a larger office.
In (4), the speaker intends to increase the hearer's knowledge about the referent coded as the pronoun she in subject position in a series of clauses. This example illustrates two trends in the data: subjects are pronouns, and subjects are topics.
As "the peg on which the message is hung" (Halliday 1970:161), a topical referent represents the argument in a predication whose appearance is relatively predictable (Lambrecht & Michaelis 1998). Predictable arguments are referents that either have been previously evoked in the discourse or are readily recoverable from the prior context (Prince 1981, Walker & Prince 1996). Evoked referents make the best topics (Lambrecht 1994, Givón 1983b), which is consistent with their pronominal coding. This accounts for the strong cross-linguistic tendency for subjects to be pronouns (Kuno 1972, Du Bois 1986, Prince 1996). Example (5) is typical of the corpus under discussion:
(5) We used to see a husband and wife in there together and they were in the same room which not all husband and wives were.
A new referent, a husband and wife, is introduced in object position. In the following clause, where the couple is the clause topic, it is referred to with the pronoun they in subject position. We find that this tendency toward pronominal subjects is extremely strong in English conversation.
1.1 Distribution of subjects and objects in the corpus
For this study, we examined subjects from a subset of the Switchboard corpus of English telephone conversations (Godfrey, Holliman, & McDaniel 1992). The Switchboard corpus comprises approximately 2,400 telephone conversations between unacquainted adults. The participants range in age and represent all major dialect groups. From this corpus, we used the 400 conversations that were syntactically parsed as part of the Penn Treebank (Marcus, Santorini, & Marcinkiewicz 1993). We collected a total of 31,021 subjects of declarative sentences. Of these, 91% are pronouns and only 9% are lexical NPs:
Subject type | Number | Percentage
Lexical subjects | 2,858 | 9%
Pronominal subjects | 28,163 | 91%
Table 1. Subject type distribution for 31,021 declarative sentences.
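The paper does not spell out how each subject was sorted into one of the two classes of Table 1. As a minimal sketch, assuming each subject NP is available as its sequence of Penn Treebank part-of-speech tags, the hypothetical rule below treats a subject as pronominal when it is a single personal pronoun (tag PRP) and as lexical otherwise. The function names, and the decision not to count demonstrative that or existential there as pronominal, are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter

# Hypothetical rule (not the authors' documented procedure): a subject NP is
# pronominal if it is a single token tagged PRP (personal pronoun) in the
# Penn Treebank tag set. Demonstrative "that" (DT) and existential "there"
# (EX) are NOT treated as pronominal here; that choice is an assumption.
PRONOMINAL_TAGS = {"PRP"}


def subject_type(pos_tags: list[str]) -> str:
    """Classify a subject NP, given its POS tags, as 'pronominal' or 'lexical'."""
    if len(pos_tags) == 1 and pos_tags[0] in PRONOMINAL_TAGS:
        return "pronominal"
    return "lexical"


def tally(subjects: list[list[str]]) -> Counter:
    """Count pronominal and lexical subjects in a list of subject NPs."""
    return Counter(subject_type(tags) for tags in subjects)


if __name__ == "__main__":
    # Toy subjects: "she", "the news coverage", "my brother", "he".
    toy = [["PRP"], ["DT", "NN", "NN"], ["PRP$", "NN"], ["PRP"]]
    counts = tally(toy)
    print(counts["pronominal"], counts["lexical"])  # 2 2
```

Note that a possessive-determined NP such as my brother (PRP$ NN) is classified as lexical, which matches its treatment as a lexical subject in the examples discussed below.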
In contrast, the asymmetry between lexical and pronominal objects is nearly the reverse of that for subjects and nearly as pronounced:
Object type | Number | Percentage
Lexical objects | 4,921 | 66%
Pronominal objects | 2,568 | 34%
Table 2. Object type distribution for 7,489 transitive sentences.
A comparison of Table 1 with Table 2 suggests subject position is dispreferred for lexical coding. This conforms to the various given-before-new and topic-comment proposals that have been made in the literature. Example (6) provides some insight into the difference.
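The percentages in Tables 1 and 2 can be recomputed from the raw counts. The short sketch below (the names are illustrative) also prints the ratio of lexical to pronominal NPs in each position, which makes the reversal between subjects and objects explicit: roughly one lexical subject per ten pronominal subjects, but nearly two lexical objects per pronominal object.

```python
# Counts copied from Table 1 (subjects) and Table 2 (objects).
SUBJECTS = {"lexical": 2858, "pronominal": 28163}
OBJECTS = {"lexical": 4921, "pronominal": 2568}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}


subj = shares(SUBJECTS)
obj = shares(OBJECTS)
for name, table in (("subjects", subj), ("objects", obj)):
    print(name, {k: round(v, 1) for k, v in table.items()})
# subjects {'lexical': 9.2, 'pronominal': 90.8}
# objects {'lexical': 65.7, 'pronominal': 34.3}

# Ratio of lexical to pronominal coding in each position.
print(round(SUBJECTS["lexical"] / SUBJECTS["pronominal"], 2))  # 0.1
print(round(OBJECTS["lexical"] / OBJECTS["pronominal"], 2))  # 1.92
```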
(6) My sister has a, she just had a baby. He's about five months old, and she was worrying about going back to work and what she was going to do with him.
A baby is introduced as an indefinite referential lexical NP in object position and then reference to the baby is continued with pronouns, beginning with he as a clause topic in subject position.
In the English conversation data we examine, lexical subjects are rare in comparison to both pronominal subjects and lexical objects. This tendency is less pronounced in other genres. In a Wall Street Journal corpus, 80% of the subjects are lexical NPs (Roland, p.c.). In the ZPG fund-raising text studied by Prince (1992), 60% of the subjects are lexical NPs. Givón (1990) finds that 25.6% of the subjects in spoken English narrative are lexical NPs. Although the proportion of lexical NP subjects varies with genre, in the conversational corpus under investigation here there is a clear relationship between subject position and pronominal coding (see Roland & Jurafsky, in press, for a discussion of genre effects in corpus studies).
Given the large asymmetry between lexical and pronominal subjects, lexical subjects in conversation appear to be deviant. They occur in a position in which lexical NPs are dispreferred; subject position is largely reserved for established topic entities. In what follows, we explore the small class of lexical subjects. Specifically, we are interested in whether lexical subjects denote (a) topical referents and (b) old or new referents.
2 The nature of lexical subjects
Given the small number of lexical NPs in subject position, one must consider whether the general tendencies of subject position (that subject referents tend to be topics and to be evoked) apply to the class of lexical subjects. An examination of sentences with lexical subjects shows that, in general, they are topical, but deviant in the sense that they denote referents new to the discourse.
2.1 Lexical subjects are topical
Many researchers note that there is no one-to-one mapping between grammatical position and topic (Givón 1983a, Gundel 1988, Lambrecht 1994). Sentence types that are not consistent with topical subjects include argument focus and sentence focus (Lambrecht 1994). When we examine the lexical subjects in the data, however, we find little evidence that argument focus or sentence focus is a strong factor. Argument focus constructions, as in (7), have old, or given, predicates.
(7) I was the only one who did not catch a single fish. My daughter caught fish, his daughter caught fish, he caught fish.
In (7), the speaker contrasts himself with the people who did catch fish. The speaker intends to increase the hearer's knowledge about who caught fish, not about the fact that fish were caught. According to the definition in (3) above, the information that fish were caught is the topic, not the referents in subject position in the three clauses of the second sentence of (7). Although such examples occur in the data, argument focus constructions do not account for a large percentage of the lexical subjects.
If sentence focus constructions were a major source of lexical subjects, we would expect lexical subjects to be accompanied by high intransitivity. Ocampo (1993) discusses the occurrence of lexical subjects of intransitive verbs in non-canonical post-verbal position in Spanish focus constructions, and the contrast between these lexical subjects and pronominal subjects of transitive verbs. This post-verbal introduction strategy is not available in English, so we would expect such referents to appear in canonical subject position. Such lexical subjects occur with stative or unaccusative verbs, as in (8).
(8) The muscle builds [uh-huh] very rapidly.
In (8), from a conversation about exercising on a bike, the entire sentence is new information. There is no presupposition; the topic is not represented in the sentence. The subject in (8) is a patient, and the verb builds is used unaccusatively in this context. Such sentence types are relatively rare in our data, as indicated by the identical distribution of transitive sentences for pronominal and lexical subjects. Nor do we find that lexical NP subjects are less likely than pronominal NP subjects to be agentive. Given the lack of focus constructions involving subjects and the general tendency for subjects to be topics, we conclude that lexical subjects in conversation tend to be topics.
2.2 Lexical subjects are new to the discourse
Although lexical subjects are topical, they are not necessarily discourse-old. Prince (1992) shows that topical referents can in fact be new referents. In the analysis of Prince (1981), the referent my brother in (9) is inferrable on the basis of a family frame: hearers assume that speakers have family members.
(9) Context: Conversation about drug testing.
We, that's been an, a, an issue, uh, in our company even though we don't have the random or even regular drug screening. In fact, they'll have these little parties, and people will just get, I mean I've, my brother lives where I work, and I have many a time called him to come get me, you know.
Prince (1992:305) finds that such referents resemble hearer-new, and therefore discourse-new, referents, because my brother has not yet been introduced in the discourse. On the other hand, Prince claims that these referents also exhibit characteristics of hearer-old and discourse-old referents, in that there must be some antecedent entity in the discourse model (the speaker) that triggers an inference, together with assumptions about what the hearer knows (the family frame), thus rendering my brother inferrable. Givón (1983:10) proposes that some topics, such as family members, "are in the file permanently, and are thus always accessible to speakers/hearers as part of their generic firmament" (emphasis in original). Birner and Ward (1998) take a stronger position, building on the distinction between hearer-old and discourse-old established by Prince (1992). In their analysis of word order inversion, they claim that both "inferrable elements and explicitly evoked elements behave as a single class of discourse-old information for the purpose of word order inversion" (1998:178).
Lambrecht (1994:114) argues that the speaker exploits the potential for easy activation of the family member referent and "conveys a request to the hearer to act as if the referent of the NP were already pragmatically available". While this licenses definite coding, as in (9), it does not license the coding reserved for discourse-old entities, as (10) shows. Although inferrable referents share some characteristics of discourse-old entities, in analyzing our data we maintain a strict definition of discourse-old: a referent is discourse-old if it has been previously mentioned in the discourse. We adhere to this definition because inferrable referents differ from discourse-old referents in one important respect: they cannot be coded pronominally.
(10) Context: Conversation about drug testing.
We, that's been an, a, an issue, uh, in our company even though we don't have the random or even regular drug screening. In fact, they'll have these little parties, and people will just get, I mean I've, #He lives where I work, and I have many a time called him to come get me, you know.
In (10), repeated from (9) above, we see that when a pronoun is used in place of the lexical NP my brother, the sentence becomes infelicitous. While it is clear that some entities, especially those denoted by kinship terms, are always part of the discourse model and thus inferrable, they are not always discourse-old. In this study we take a referent to be discourse-new if it has not been previously mentioned in the discourse.
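The strict definition of discourse-old adopted here can be stated as a procedure over a sequence of mentions. The sketch below assumes, hypothetically, that each mention has already been hand-annotated with a coreference identifier (the class and field names are illustrative). It labels a mention old only if the same referent occurred earlier, so an inferrable referent such as my brother in (9) counts as new at its first mention. The sketch sees only the mentions it is given, so earlier context outside the excerpt is ignored.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    referent: str  # hypothetical hand-annotated coreference ID
    text: str      # surface form, kept for readability only


def discourse_status(mentions: list[Mention]) -> list[str]:
    """Label each mention 'old' if its referent was mentioned earlier, else 'new'.

    Inferrable referents get no special treatment: they are 'new' until
    they have actually been mentioned, as in the strict definition.
    """
    seen: set[str] = set()
    labels = []
    for m in mentions:
        labels.append("old" if m.referent in seen else "new")
        seen.add(m.referent)
    return labels


# Mentions excerpted from (9): "I've, my brother lives where I work,
# and I have many a time called him".
mentions = [
    Mention("speaker", "I"),
    Mention("brother", "my brother"),
    Mention("speaker", "I"),
    Mention("speaker", "I"),
    Mention("brother", "him"),
]
print(discourse_status(mentions))  # ['new', 'new', 'old', 'old', 'old']
```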
An examination of a sample of the lexical subjects indicates that 85% of them denote referents that have not been previously mentioned. In this sense, these lexical subjects are new to the discourse. Although we do find lexical NP subjects that denote evoked referents and whose use is motivated by ambiguity avoidance, as in (11), most lexical NP subjects are new in the sense discussed above, as exemplified in (9).
(11) Context: Conversation about the merits of two highly rated American cars.
What - what attracts you to the Saturns? Or - or of course, we've already talked, you know, the Taurus is safe.
In (11), the use of a pronoun to refer to the Taurus is presumably preempted by the presence of a competitor, the Saturns, to which the pronoun it might refer. The definite NP the Taurus functions as a return pop in the sense of Fox (1987): the reactivation of a topic for which there are competitors in the intervening discourse segments. In this case, the Taurus was last mentioned 19 turns before its mention in (11). Although a small number of lexical subjects are used in this way for ambiguity resolution, most (85%) of the lexical subjects in our sample from the Switchboard corpus are new to the discourse.
In the data under consideration, lexical subjects are topics. We base this conclusion on the rarity of argument focus and sentence focus constructions in the data. Furthermore, lexical subjects in the data are new to the discourse. This conclusion is based on the finding that most lexical subject referents (85% of the sample) have not been previously mentioned. Lexical subjects in the data considered here are therefore unestablished topics. These conclusions, however, cannot be generalized to other genres.
3 Constraints on subject position
The pronominal status of the majority of subjects in the data leads us to propose that subject position is pragmatically constrained. In her study of the status of entities in subject position, Prince (1992) found that subjects represent discourse-old information. In the small written corpus she examined, inferrable referents in subject position were not a significant factor. The preponderance in subject position of pronouns, which Prince (1992:304) considers categorically discourse-old, is compatible with this conclusion for spoken conversation as well. Given the asymmetry between pronouns and lexical NPs in subject position, we believe the data here warrant a constraint on which referents may occur in subject position in English conversation.
Several candidate constraints have been proposed in the literature. Chafe (1987) proposes one new piece of information per intonation unit, coupled with a light starting point. Du Bois (1987) proposes one new argument per clause and a given transitive subject. Lambrecht (1994) proposes a principle of separation of reference and role (PSRR), stated as a maxim: "Do not introduce a referent and talk about it in the same clause" (p. 185). For the purposes of this paper, we adopt Lambrecht's PSRR as the constraint on our data, since the PSRR specifically addresses topic and, thereby, subject position.
(12) The, the procedure is utterly humiliating. You go in there with the doctor, he makes you take off all your clothes.
In (12), a referent, the doctor, is introduced before any propositional information about it is given. The two tasks, introducing the referent and talking about it, are kept separate, so the hearer is not required to process information about an unknown referent. The principle of separation of reference and role accounts for the majority of the data: we take the 91% of subjects that are pronominal (see Table 1) to be a result of this constraint on conversation.
3.1 The Principle of Separation of Reference and Role
The PSRR motivates "a conspiracy of syntactic constructions resulting in the nonoccurrence of NPs low on the [familiarity] scale in subject position" (Prince 1981:247). New referents are kept out of canonical subject position through the use of special constructions. Through these constructions propositional information about a referent occurs apart from introduction of the referent.
A wide range of such syntactic constructions has been discussed in the literature (cf. Birner & Ward 1998). Presentational and existential 'there' (Birner & Ward 1998) and the French 'il y a' construction (Lambrecht 1994) are used to introduce a referent before talking about it. Left dislocation (Prince 1981, Ziv 1994, Birner & Ward 1998, Gregory & Michaelis 1999) is used likewise, as in example (13):
(13) I like classical, but I can't deal with opera at all. And heavy metal, uh, it's noisy. I'm into some industrial music that's, a bit even harder than that.
A post-posing strategy is used in Spanish when a new referent is coupled with an intransitive verb (Ocampo 1993); in Ocampo's Spanish conversation corpus, brand-new entities are not introduced pre-verbally. In Lambrecht's (1988) study of French conversation, lexical NPs do not occur in canonical subject position.
The PSRR applies cross-linguistically. In a number of languages there are special constructions for introducing new referents. English itself has a range of constructions serving this function. Nevertheless, our data indicate the PSRR can be violated. Entities are introduced and talked about in the same clause:
(14) As soon as he went there, the teacher took one look at him and he threw up again.
Here in (14), the teacher is introduced as the subject and topic of a clause. What would drive a speaker to override the PSRR? And are violations of the PSRR constrained?
3.2 Lexical subjects as PSRR violations
Given the constraint on introducing and talking about a referent in the same clause, and given the constructions available for avoiding a violation of it, the lexical NP subjects in the corpus under investigation pose a problem.
(15) I mean, the, uh, documentary, the THIN BLUE LINE, pretty much demonstrated that. You know, I don't know for, if you're familiar with that or not.
The lexical subject the documentary constitutes a PSRR violation: it denotes a discourse-new referent introduced as a clause topic. Such discourse-new entities serving as clause topics in subject position clearly violate the maxim "Do not introduce a referent and talk about it in the same clause."
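Combining the strict definition of discourse-new with lexical coding gives a simple diagnostic for PSRR violations of the kind in (14) and (15). The following is a sketch only, assuming that clauses have been hand-annotated with a coreference ID for the subject, a lexical/pronominal flag, and the other referents mentioned in the clause (all hypothetical annotation fields). It flags a clause when its lexical subject denotes a referent not mentioned in any earlier clause, i.e. a referent introduced and talked about in the same clause.

```python
from typing import NamedTuple


class Clause(NamedTuple):
    subject_referent: str                   # hypothetical coreference ID of the subject
    subject_is_lexical: bool                # True for a lexical NP, False for a pronoun
    other_referents: tuple[str, ...] = ()   # other referents mentioned in the clause


def psrr_violations(clauses: list[Clause]) -> list[int]:
    """Return the indices of clauses whose lexical subject denotes a referent
    not mentioned in any earlier clause."""
    mentioned: set[str] = set()
    flagged = []
    for i, clause in enumerate(clauses):
        if clause.subject_is_lexical and clause.subject_referent not in mentioned:
            flagged.append(i)
        mentioned.add(clause.subject_referent)
        mentioned.update(clause.other_referents)
    return flagged


# (14): "As soon as he went there, the teacher took one look at him
# and he threw up again."
example_14 = [
    Clause("child", False, ("kindergarten",)),
    Clause("teacher", True, ("child",)),
    Clause("child", False),
]
print(psrr_violations(example_14))  # [1]
```

A lexical subject whose referent is evoked, such as the Taurus in (11), is not flagged by this diagnostic, which keeps return pops for ambiguity avoidance distinct from PSRR violations.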
We suggest that the Gricean competition between the speaker's needs and the hearer's needs both leads the speaker to violate the principle of separation of reference and role and ensures that the violation does not compromise understanding. We adopt Horn's (1984) interpretation of this Gricean competition. Based on Zipf's economic model, Horn (1984) reduces the Gricean maxims to two opposing principles (his Q and R principles), presented here as Q1 and Q2 respectively:
Q1. Hearer-based lower-bound on information
Say as much as you can.
Q2. Speaker-based upper-bound on information
Say no more than you must.
Q2, say no more than you must, leads the speaker to conflate introducing a referent and talking about it: two constructions are replaced by one. Q1, say as much as you can, sets the lower bound on information that prevents Q2 from operating unrestrained. Q1 is similar to Clark and Haviland's (1977:4) given-new contract, in which "the speaker tries, to the best of his ability, to make the structure of his utterances congruent with his knowledge of the listener's mental world". We propose that the introduction of discourse-new referents as topics in subject position is motivated by the speaker's economy, Q2, and constrained by the speaker's adherence to the hearer's economy, Q1.
(16) I have a opportunity to go to, uh, Paris, France, uh, with my friend in April. She is her family, you know, lives there
Example (16) illustrates both the violation and the constraint that holds on it: her family is introduced and talked about in the same clause, but it is anchored by the possessive her to the previously mentioned friend. The hearer-based lower bound on information, 'say as much as you can', keeps the speaker from introducing just any referent in subject position. Speakers who choose to override the PSRR produce referents that are accessible and anchored.
In this study we measure accessibility and anchoring by morphosyntactic coding. In the section that follows, we look at definite determination, possessive determination, and relative clauses with pronominal subjects as measures of accessibility and anchoring.
4 Morphosyntactic coding of lexical subjects
The morphosyntactic coding of lexical subjects in the data suggests that they are, on the whole, definite. Table 3 compares the determiners of lexical subjects and lexical objects for the morphosyntactic categories under consideration in this study.
Lexical NP | A/An | The | Possessive | Other
Subjects | 65 (2%) | 1,070 (37%) | 715 (25%) | 1,008 (36%)
Objects | 1,419 (29%) | 784 (16%) | 346 (7%) | 2,372 (48%)
Table 3. Distribution of determiners for lexical subjects and objects.
Within the small class of lexical subjects, the majority are formally definite. Table 3 shows that a total of 62% of the lexical subjects are determined by a definite article or a possessive determiner, while only 2% are determined by an indefinite article. In contrast, 29% of the lexical objects are determined by an indefinite article, while only 23% are determined by a definite article or a possessive determiner.
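The figures cited here follow from the counts in Table 3. The short sketch below, with illustrative names, recomputes the share of definite (the plus possessive) and indefinite (a/an) determiners for each grammatical role. The row totals also match the lexical counts in Tables 1 and 2 (2,858 and 4,921), which serves as a consistency check.

```python
# Determiner counts for lexical NPs, copied from Table 3.
TABLE3 = {
    "subjects": {"a/an": 65, "the": 1070, "possessive": 715, "other": 1008},
    "objects": {"a/an": 1419, "the": 784, "possessive": 346, "other": 2372},
}


def percent(part: int, whole: int) -> float:
    return 100 * part / whole


def definite_share(counts: dict[str, int]) -> float:
    """Percentage of NPs determined by 'the' or by a possessive determiner."""
    return percent(counts["the"] + counts["possessive"], sum(counts.values()))


def indefinite_share(counts: dict[str, int]) -> float:
    """Percentage of NPs determined by 'a' or 'an'."""
    return percent(counts["a/an"], sum(counts.values()))


for role, counts in TABLE3.items():
    print(role, sum(counts.values()),
          round(definite_share(counts)), round(indefinite_share(counts)))
# subjects 2858 62 2
# objects 4921 23 29
```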
4.1 Accessibility
Lexical NPs in subject position are more likely than those in object position to be definite, that is, to be determined by the definite article 'the' or by a possessive determiner such as 'my' (62% versus 23%; see Table 3) rather than by the indefinite article 'a' or 'an'.
Because they are definite, lexical subjects are more accessible. According to the Givenness Hierarchy of Gundel et al. (1993), definite referring expressions signal that the referent is at least uniquely identifiable: the hearer can identify the referent on the basis of the NP alone. In (17), the referent is uniquely identifiable.
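The phrase "at least uniquely identifiable" reflects the implicational structure of the Givenness Hierarchy: each form signals a cognitive status, and each status entails all the statuses below it. The sketch below encodes the six statuses of Gundel et al. (1993) with a simplified set of English forms. The form labels and function names are our own illustrative choices, and "it" stands for unstressed personal pronouns generally. The check is a necessary condition on form choice, not a sufficient one.

```python
from enum import IntEnum


class Status(IntEnum):
    """Cognitive statuses of the Givenness Hierarchy (Gundel et al. 1993),
    numbered from least to most restrictive."""
    TYPE_IDENTIFIABLE = 1
    REFERENTIAL = 2
    UNIQUELY_IDENTIFIABLE = 3
    FAMILIAR = 4
    ACTIVATED = 5
    IN_FOCUS = 6


# Simplified English forms and the status each one signals.
FORM_STATUS = {
    "a N": Status.TYPE_IDENTIFIABLE,
    "indefinite this N": Status.REFERENTIAL,
    "the N": Status.UNIQUELY_IDENTIFIABLE,
    "that N": Status.FAMILIAR,
    "that": Status.ACTIVATED,
    "this": Status.ACTIVATED,
    "this N": Status.ACTIVATED,
    "it": Status.IN_FOCUS,
}


def entailed_statuses(form: str) -> list[Status]:
    """The status a form signals plus every status below it."""
    return [s for s in Status if s <= FORM_STATUS[form]]


def meets_status(form: str, referent_status: Status) -> bool:
    """Necessary condition: the referent has at least the status the form signals."""
    return referent_status >= FORM_STATUS[form]


print([s.name for s in entailed_statuses("the N")])
# ['TYPE_IDENTIFIABLE', 'REFERENTIAL', 'UNIQUELY_IDENTIFIABLE']
print(meets_status("the N", Status.UNIQUELY_IDENTIFIABLE),
      meets_status("it", Status.UNIQUELY_IDENTIFIABLE))
# True False
```

On this encoding, a referent that is only uniquely identifiable, like the inferrable brother in (9), meets the condition for a definite NP but not for a personal pronoun, which mirrors the infelicity of (10).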
(17) The, uh, Governor, you know, has been trying to decide whether he's going to commute it or not.
In the data, lexical subjects tend to be at least uniquely identifiable.
The definite lexical subjects in our data fall into two classes. The first comprises NPs that denote evoked referents and whose use is motivated by ambiguity avoidance; this class is discussed in §2.2 above. The majority of the definite lexical subjects in the data, however, 'top out' at inferrable status: they are no more than inferrable from the context of their use, and they are discourse-new according to our definition.
(18) She sent him to kindergarten. As soon as he went there, the teacher took one look at him and he threw up again.
In (18), the definite lexical subject the teacher is inferrable from the kindergarten background. The lexical subject triggers what Clark and Haviland (1977) call a bridging inference: "the speaker must enable the listener to compute an antecedent that is unique" (p. 9). In this case, the referent is identifiable by virtue of belonging to a semantic frame that is currently active. The passage in (19) provides another example.
(19) uh, actually I lived over in Europe for a couple of years, I lived in Germany and in Germany they don't have the jury system. What they do is they have, uh, three judges, basically, and you get up there and the prosecuting attorney presents his evidence ...
In (19), the NP the prosecuting attorney denotes an entity that, although new to the discourse, is nevertheless highly recoverable by virtue of its relationship to the previously evoked court frame.
In these examples the lexical subjects cannot be described as established in the discourse. These lexical subjects share definite morphology and they refer to uniquely identifiable referents. They are recoverable from the context in which they are used.
4.2 Anchoring
This section involves referents that are rendered recoverable by virtue of a link to a discourse-active entity, in particular the speaker. As Prince (1981:236) puts it, "A discourse entity is anchored if the NP representing it is linked by means of another NP or anchor properly contained in it to some other discourse entity." We discuss two anchors here: possessive determiners and relative clauses.
As Table 3 shows, possessive determiners such as my or her are more frequently associated with lexical subjects than with lexical objects: 25% of lexical subjects but only 7% of lexical objects are determined by a possessive.
(20) A: I'm a single mother. I have three children.
B: Oh, I see, uh-huh.
A: So, uh, right now, we're on, we get, you know, aid from the state at this point because there's no other way to do it. And my ex-husband just sort of took off and doesn't pay child support.
B: Oh dear.
In example (20), the discourse new ex-husband is anchored to the speaker through her use of my. The frame is deictically established in this case.
Table 4 shows the distribution of object-trace and subject-trace relative clauses in the data.
Head NP | Subject relativization | Object relativization
Lexical subject | 102 (29%) | 244 (71%)
Lexical object | 249 (60%) | 164 (40%)
Table 4. Distribution of relative clause types for lexical subjects and objects.
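The percentages in Table 4 are row percentages over the relativized NPs of each type. The short sketch below (names are illustrative) recomputes the share of object relativization for lexical subjects and lexical objects from the raw counts.

```python
# Relative-clause counts copied from Table 4 (heads post-modified by a relative clause).
TABLE4 = {
    "lexical subject": {"subject_rel": 102, "object_rel": 244},
    "lexical object": {"subject_rel": 249, "object_rel": 164},
}


def object_rel_share(counts: dict[str, int]) -> float:
    """Percentage of relative clauses on these heads that are object-trace relatives."""
    return 100 * counts["object_rel"] / (counts["subject_rel"] + counts["object_rel"])


for head, counts in TABLE4.items():
    print(head, round(object_rel_share(counts)))
# lexical subject 71
# lexical object 40
```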
Object relativization occurs in 71% of the lexical subjects that are post-modified with a relative clause. This type of relative clause anchors the discourse new referent to some discourse active frame (Fox & Thompson 1990), as in example (21).
(21) Our friend, the President, right now, says no new taxes. We should and especially, if anything, be cutting taxes now because of the recession and at the same time, the budget he sent to Congress has tax and fee increases, so, uh, I know the politicians, uh, aren't straightforward.
The discourse-new referent the budget is anchored to the President: the pronominal reference to the President (he) in the relative clause guides the hearer to relate the budget to an entity already in the discourse.
In contrast, Table 4 shows that the majority of the lexical objects in the data that are post-modified with relative clauses take subject-trace relative clauses.
(22) We do oil well services. So, a lot of our clients are oil companies, big oil companies, and they go out to, we have engineers who, uh, go out to the oil well, to the client's oil well, and work with a lot of heavy equipment and put tools down the oil well and stuff.
In (22), the discourse-new referent engineers is relativized from subject position. The new referent is introduced as the object of have, so there is no need to anchor it to the discourse, as there is for the budget in (21). The difference is that (21) is a violation of the PSRR and (22) is not.
Lexical subjects in general denote more recoverable referents than lexical objects do. Measured in terms of possessive determiners and the type of relative clause modifier, lexical subjects are recoverable referents.
5 Conclusions
References
Birner, Betty J., and Gregory Ward. 1998. Information Status and Noncanonical Word Order in English. Amsterdam: John Benjamins.
Chafe, Wallace. 1987. Cognitive constraints on information flow. In Russell Tomlin, ed., Coherence and Grounding in Discourse. Amsterdam: John Benjamins.
Clark, Herbert H., and S. E. Haviland. 1977. Comprehension and the given-new contract. In R. Freedle, ed., Discourse Production and Comprehension. Hillsdale, NJ: Lawrence Erlbaum.
Du Bois, John W. 1987. The discourse basis of ergativity. Language 63: 805-855.
Fox, Barbara A., and Sandra A. Thompson. 1990. A discourse explanation of the grammar of relative clauses in English conversation. Language 66: 297-316.
Givón, Talmy. 1983a. Topic continuity in discourse: An introduction. In Talmy Givón, ed., Topic Continuity in Discourse. Amsterdam: John Benjamins.
Givón, Talmy. 1990. Syntax: A Functional-Typological Introduction. Amsterdam: John Benjamins.
Godfrey, J., E. Holliman, and J. McDaniel. 1992. SWITCHBOARD: Telephone speech corpus for research and development. Proceedings of ICASSP-92, San Francisco, 517-520.
Gregory, Michelle L., and Laura A. Michaelis. 1999.
Gundel, Jeanette K., Nancy Hedberg, and Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language 69: 274-307.
Halliday, M. A. K. 1970. Language structure and language function. In John Lyons, ed., New Horizons in Linguistics. Baltimore: Penguin Books.
Horn, Laurence R. 1984. Toward a new taxonomy for pragmatic inference: Q-based and R-based implicature. In Deborah Schiffrin, ed., Meaning, Form and Use in Context: Linguistic Applications. Washington, DC: Georgetown University Press.
Lambrecht, Knud. 1994. Information Structure and Sentence Form: Topic, Focus, and the Mental Representations of Discourse Referents. Cambridge: Cambridge University Press.
Marcus, Mitchell, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2): 313-330.
Mithun, Marianne. 1991. The role of motivation in the emergence of grammatical categories: The grammaticization of subjects. In Elizabeth C. Traugott and Bernd Heine, eds., Approaches to Grammaticalization, Volume 2. Amsterdam: John Benjamins.
Prince, Ellen F. 1981. Toward a taxonomy of given-new information. In Peter Cole, ed., Radical Pragmatics. New York: Academic Press.
Prince, Ellen F. 1992. The ZPG letter: Subjects, definiteness, and information-status. In William C. Mann and Sandra A. Thompson, eds., Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text. Philadelphia: John Benjamins.
Roland, Douglas, and Daniel Jurafsky. To appear. Verb sense and verb subcategorization probabilities. In Suzanne Stevenson and Paola Merlo, eds., Papers from the 1998 CUNY Sentence Processing Conference. Philadelphia: John Benjamins.
Walker, Marilyn A., and Ellen F. Prince. 1996. A bilateral approach to givenness: A hearer-status algorithm and a centering algorithm. In Thorstein Fretheim and Jeanette K. Gundel, eds., Reference and Referent Accessibility. Philadelphia: John Benjamins.
Ziv, Yael. 1994. Left and right dislocations: Discourse functions and anaphora. Journal of Pragmatics 22: 629-645.