The Switchboard Corpus

We used tgrep, a grep-based tool for searching for syntactic patterns in syntactically parsed corpora, to extract subjects from a subset of the Switchboard corpus of English telephone conversations (Godfrey et al. 1992). The Switchboard corpus comprises approximately 2,400 telephone conversations between unacquainted adults. The participants vary in age and represent all major US dialect groups. From the corpus we used the 400 conversations that had been syntactically parsed (Marcus et al. 1993). In total, we collected 31,021 subjects of declarative sentences.
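
To make the extraction step concrete, here is a minimal Python sketch of the kind of tree search involved. It is not the tgrep query used for this study, which is not reproduced here. It assumes Penn Treebank-style bracketed parses in which subjects carry the -SBJ function tag, and it approximates "subject of a declarative sentence" as an NP-SBJ node that is an immediate child of an S node. The function names (parse_tree, leaves, subjects) and the example sentences are hypothetical, and the parser assumes every tree has a labeled root.

```python
def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Assumes the root bracket has a label.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def subjects(node, found=None):
    """Collect the text of NP-SBJ nodes that are immediate children of S.

    Assumption: this stands in for "subject of a declarative sentence";
    a real query would also need to handle traces and other clause types.
    """
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    label, children = node
    for child in children:
        if not isinstance(child, str):
            if label == "S" and child[0].startswith("NP-SBJ"):
                found.append(" ".join(leaves(child)))
            subjects(child, found)
    return found


# Hypothetical example parses, not taken from the corpus.
simple = "(S (NP-SBJ (PRP we)) (VP (VBD used) (NP (DT the) (NN corpus))) (. .))"
embedded = ("(S (NP-SBJ (PRP I)) (VP (VBP think) "
            "(SBAR (S (NP-SBJ (PRP she)) (VP (VBD left))))))")

print(subjects(parse_tree(simple)))
print(subjects(parse_tree(embedded)))
```

Because the search recurses through the whole tree, subjects of embedded clauses are collected along with main-clause subjects; whether the counts reported here include embedded subjects is not stated above.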
