CCCC 2014 - Rhetory.com

FYC corpus: an introduction and
overview, with preliminary findings
Exploring the question of ‘orality’ empirically
with a controlled data set
Daniel Kies
Department of English
College of DuPage
CCCC 2014
Indianapolis 20 March 2014
The Genesis of the Project
We noted several items related to the question of orality:
Growing concern for a “shift to orality” and consequently a
degeneration, degradation, and overall diminishment of the
English language

For example, consider the next slide.
Comments in the popular media
Students use texting language in papers at university. Help!
I am not an English teacher, but I just started teaching at an
American college and I have found that several students sometimes
substitute a single number or letter for a word. One student used
"4" instead of "for" throughout his entire paper. Another wrote "U"
instead of "you." It was the kind of writing that you would expect to
see in a text message. These students are still required to take
English no matter what subjects they choose to major in, so it is
hard for me to understand why they make mistakes like these. I
have to assume that it is intentional laziness rather than a real
error, but this makes it harder to correct.
Source: http://www.usingenglish.com/forum/threads/137474Students-use-texting-language-in-papers-at-university-Help

The Genesis of the Project
We noted a second trend in this line of thought:
The blame is usually attributed to the wide-spread adoption of
communications technology by the millennial generation

For example, see the next slide
Comments in the popular media (2)

Teenagers who frequently use 'techspeak' when they text
performed poorly on a grammar test, said Drew Cingel, a former
undergraduate student in communications at Penn State.

When tweens write in techspeak, they often use shortcuts, such as
homophones, acronyms and omissions of non-essential letters such
as 'wud' for 'would.'

Source:
http://www.telegraph.co.uk/education/educationnews/9432222/Texting-isfostering-bad-grammar-and-spelling-researchers-claim.html
The Genesis of the Project
Finally, we observed:
Parallels between Birkerts’ and Ong’s explorations of linguistic
change triggered by technological innovation: the shift
from pre-literate (what Ong called “primary orality”) to
literate cultures is echoed in the “secondary orality” (Ong
1982) that some researchers believe exists in the contemporary
technological, cultural, and linguistic environment

And similar remarks are found in the professional literature
The Genesis of the Project
For example, Clive Thompson, summarizing the L&L
Stanford Writing Project study:
“Technology isn’t killing our ability to write. It’s reviving it—
and pushing our literacy in bold new directions.”
Bauerlein in the CHE concludes of the same study:
“I think we can say that instead of dispelling fears about the
impact of technology on student writing, the Lunsford study
raises them to a new level.”
Mark Bauerlein, 2008, “The Lunsfords on Student Writing” Chronicle of Higher Education.
http://chronicle.com.cod.idm.oclc.org/blogs/brainstorm/the-lunsfords-on-studentwriting/6148
The Genesis of the Project
Fears for the future of writing (1)

Experts say that children write more these days than they did 20
years ago, because of texting and social media. Most of that writing,
however, is in text-speak, and that form of language becomes a bad
habit. Students are now so used to writing in text-speak that they
can’t easily remember (or apply) proper language rules.

Communication is becoming more global in scope and more
electronic in form. By the time these children finish school and enter
the workforce, this decline in the spoken (sic) word will become
greater. Written communication, in a formal report, an email, or even
a text, isn’t just happening on the colloquial level anymore, and
children need to be educated on how to use technology in formal,
professional contexts.

Source: http://www.telegraph.co.uk/education/educationopinion/9966117/Text-speaklanguage-evolution-or-just-laziness.html
The Genesis of the Project
Fears for the future of writing (2)

Rebecca Gemkow, a Lyons Township High School English teacher,
said she believes it is crucial for teenagers to recognize the
difference between social and academic writing in order to be
successful in the real world.

“I feel that all of the online opportunities and the time spent with
such opportunities puts students at a deficit when it comes to
producing sophisticated writing,” she said. “In result, there is a much
greater responsibility put on teachers to help rectify the situation so
that students will be prepared for the rest of high school, as well as
post-high school writing.”

Source: http://www.suntimes.com/news/education/4600849-418/teachers-students-seetexting-lingo-popping-up-in-school-writing.html
The Genesis of the Project
Background of our research corpus (1):

First-year composition (FYC) corpus, over 7 million words drawn from the
academic writing of the general population of students in first-year writing
classes at a community college in America’s Midwest.

The corpus spans the period 1989-2013 and thus allows for a comparison
of student writing over a period running from the adoption of the
World Wide Web and search engines by the general population to the
present, when electronic texts are pervasive.
The Genesis of the Project
Background of our research corpus (2):

The FYC corpus is from the same composition courses taught by the same
instructor over the period. This stability produces highly comparable data in
terms of writing topics, and reduces variability that might have been due to
different instructors’ pedagogical styles or abilities.

The writing prompts were intended to elicit essays in different academic
genres such as summary, review of an article, argumentative/persuasive
essay, descriptive/comparative response, analysis of persuasive writing, and
definition. Major topics were the future of books, The Gutenberg Elegies,
and literacy; in the second semester, students typically wrote academic
research essays on topics related to Orwell’s 1984.
The Genesis of the Project
Background of our student writers:

All students have similar backgrounds:
cultural,
linguistic, and
socio-economic

Most students come from the western suburbs of Chicago
that surround the college

All students have similar educational achievements
WRAB III
Paris, France 19 February 2014
Research questions
What are the general features of first-year composition
students’ writing?
What are the principal markers of orality?
Is there any evidence of a shift to orality in first-year
students’ writing over time?



Methodology (1)
1. Review of previous research on differences between oral and written text
(e.g. Ong; Halliday & Matthiessen, 2004; O’Donnell, 1974; Chafe; Tannen).
2. Selection of comparable written texts (student essays on Orwell’s 1984 and
Birkerts’ Gutenberg Elegies).
3. Conversion of Word files to machine-readable Unicode text files.
4. Parsing of the 1984 texts using UAMCorpusTool (O’Donnell).
5. Analysis of general linguistic features using UAM and generation of
descriptive statistics.
6. General comparison with Biber’s (1988) mean frequencies for academic
prose and face-to-face conversation. (Not all categories are easily
comparable.)
7. Finer analysis of wordlists using WordSmith Tools 6 (Scott).
8. Concordancing of specific features using WordSmith Tools 6.
Future research: more fine-grained analyses; factor analysis (Biber, 1988,
2006).
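As a rough illustration of the concordancing step, the sketch below builds keyword-in-context (KWIC) lines with plain Python. It is not how WordSmith Tools works internally; the function name `kwic`, the `width` parameter, and the sample sentence are all invented for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of keyword."""
    lines = []
    # Whole-word, case-insensitive match so "it" does not match inside "Winston".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, standing in for a student essay.
sample = ("Winston does not trust the Party. It is the Party that "
          "controls the past, and it controls the future.")

for line in kwic(sample, "it", width=20):
    print(line)
```

Sorting the returned lines by their right-hand context is a common next step when looking for recurring patterns around a feature.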
Methodology (2)
Tools:
WordSmith Tools (Mike Scott)
UAMCorpusTool (Mick O’Donnell)
AntConc (Laurence Anthony)
Materials:
The pronoun study corpus: 100,000 words on Birkerts’ Gutenberg Elegies.
The verb study corpus: student research essays on George Orwell’s 1984.
 Sub-corpus 1: 1998-99 (449,706 words)
 Sub-corpus 2: 2012-13 (363,157 words)
Methodology (3)
Techniques:
Establishing sets of metrics from earlier research to
provide a means to measure the orality of the students’
texts:

Biber et al. (2006) examined a range of university registers, both spoken
and written (T2K-SWAL corpus).

Includes a wide range of spoken registers such as classroom instruction,
office hours, and service encounters, and written academic registers such
as textbooks and administrative texts, but no student writing.

The T2K-SWAL corpus provides a useful backdrop against which to
compare student writing, but it does not examine the texts of novice
writers.
Methodology (4)
Techniques:

Compare student corpora against the academic registers in
corpora.byu.edu (Mark Davies)

That corpus focuses largely on cross-disciplinary, academic
journal articles.
Claims for writing (1)
Writing has been claimed to be:
More structurally complex and elaborate
More explicit
More decontextualized/autonomous
Less personally involved / more detached or abstract
Higher concentration of new information
More deliberately organized
(Biber, 1988, p. 47).
Claims for writing (2)
The theoretical notion of register (field, mode, and
tenor) from systemic functional linguistics postulates
a number of features that distinguish orality, i.e. “very
spoken” or conversational English, from “very
written” genres such as academic texts.
(Halliday & Matthiessen, 2004)
Claims for writing (3)
Some markers of orality are:
in terms of field,
 a tendency to focus on subjective experience;
in tenor,
 reduction in social distance between interlocutors;
in mode,
 lower lexical density,
 higher grammatical intricacy, and
 the predominance of generalized “hypernymic” lexical
items over more abstract or obscure meanings (e.g. went
rather than walked or staggered).
(Halliday & Matthiessen, 2004)
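Lexical density, one of the markers listed above, can be approximated as the share of content words among all words. The sketch below is only a rough proxy under that assumption: the `FUNCTION_WORDS` list is a small hypothetical stoplist, the two sample sentences are invented, and a real study would rely on a part-of-speech tagger rather than a word list.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption,
# not a list taken from the study).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "if", "that", "this", "these", "those",
    "i", "you", "he", "she", "it", "we", "they", "me", "my", "your", "his", "her",
    "is", "am", "are", "was", "were", "be", "been", "do", "does", "did", "have",
    "has", "had", "will", "would", "can", "could", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "not", "so", "just",
}

def lexical_density(text):
    """Proportion of tokens that are not in the function-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

# Invented examples of a "very spoken" and a "very written" sentence.
spoken = "I think that it was just so good and you would like it"
written = "Totalitarian surveillance eliminates individual privacy"

print(f"spoken:  {lexical_density(spoken):.2f}")
print(f"written: {lexical_density(written):.2f}")
```

On these two invented sentences the conversational one scores lower, which is the direction the slide describes for oral text.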
Claims for writing (4)

Corpus-based research by Biber, Johansson, Leech, Conrad, &
Finegan (1999) showed significant differences between
academic and spoken text.

For example, 45% of the lexical verbs in spoken texts were
represented by just 12 key words (words like say, make,
think, and get).

First and second person pronouns were much more common
in spoken than academic texts.
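The "12 verbs cover 45% of lexical-verb tokens" kind of finding can be checked on any tagged word list by measuring how much of the total the most frequent types account for. The sketch below assumes the lexical verbs have already been extracted and lemmatized; the function name `top_n_coverage` and the toy verb list are invented for illustration.

```python
from collections import Counter

def top_n_coverage(tokens, n=12):
    """Share of all tokens accounted for by the n most frequent types."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / len(tokens)

# Invented toy list of lexical-verb tokens, standing in for tagged corpus output.
verbs = ["say", "get", "say", "think", "make", "get", "say",
         "argue", "illustrate", "get"]

print(f"top 3 verbs cover {top_n_coverage(verbs, n=3):.0%} of verb tokens")
```

A high coverage figure for a small `n` points to reliance on a few general-purpose verbs, which the slide associates with spoken text.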
More recent research since Biber 1988

Spoken/written dichotomy is inadequate.

Biber et al. (2006) proposed seven dimensions that cut across
academic discourse in the university context:
 “a fundamental oral/literate opposition” … holds between
spoken and written modes “regardless of purpose,
interactiveness, or other pre-planning considerations.”
(Biber, 2006, p. 186).
Characteristics of spoken vs written text in
academic contexts

Some key findings of Biber (2006):

Present tense is the most common tense in academic texts,
both spoken and written. Humanities have the greatest
proportion of past tense at 40%. However, these tend to be in
connection with historical events rather than personal
narratives.

95% of written and 90% of spoken academic registers use
simple aspect.

Active voice is much more common than passive (80% active
in written academic registers and 90% in spoken registers.)
Spoken vs written registers

Biber et al. (2002) found “strong polarization between
spoken and written registers,” described along five
“dimensions.”
Written text (regardless of purpose) is
 informationally dense (Dimension 1),
 non-narrative in focus (Dimension 2),
 elaborated in reference (Dimension 3),
 low in overt persuasion (Dimension 4), and
 impersonal (Dimension 5).
University registers
(Biber, Conrad, Reppen, Byrd, and Helt, 2002)
Dimension  Written (e.g. textbooks, syllabi,   Spoken (e.g. lectures, labs, study
           administrative info.)               groups, office hrs)
D1         Information-dense                   Involvement and interaction
D2         Non-narrative focus                 Non-narrative focus
D3         Elaborated reference                Situated reference
D4         Little overt persuasion             More overt persuasion
D5         Impersonal style                    Less impersonal in style
To study ‘orality,’ I concentrated on the syntactic patterns that mark Dimension
1 in the table above.
Oral and literate discourse compared on
Dimension 1:
Positive features for orality:
“interactiveness and personal involvement (1st and 2nd person
pronouns, WH questions), personal stance (e.g., mental verbs, that-clauses
with likelihood verbs and factual verbs, factual adverbials,
hedges), and structural reduction and formulaic language (e.g.,
contractions, that-omission, common vocabulary, lexical bundles)”
(p. 186)
These features contrast with literate discourse:
“informational density and complex noun phrase structures
(frequent nouns and nominalizations, prepositional phrases,
adjectives, and relative clauses) as well as passive constructions”
(p. 186)
Dimension 1: Oral vs Literate discourse
(Biber et al. 2004)
POSITIVE LOADING
contractions, pronouns, verbs, adverbials
Contractions
Pronouns: demonstrative
Pronouns: it
Pronouns:1st person
Verbs: present tense
Adverbials: time
Adverbs: common
Pronouns: indefinite
That-omission
NEGATIVE LOADING
nouns, adjectives, passives
Nouns: nominalizations
Word length
Prepositional phrases
Adjectives: attributive
Passives: agentless
Passives: postnominal
Type/token ratio
Common adjectives: relational
Relative clauses
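A few of the Dimension 1 features listed above can be counted with simple pattern matching. The sketch below is a minimal illustration, not the tagging used in the study: the `FIRST_PERSON` set, the `CONTRACTION` pattern, and the function `dimension1_profile` are all assumptions. The pattern deliberately skips 's so that possessives are not miscounted, which means forms like "it's" are missed.

```python
import re

# Assumed word list; contracted forms such as "I'm" are not counted as pronouns here.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}

# Matches n't, 're, 've, 'll, 'd, 'm; omits 's to avoid counting possessives.
CONTRACTION = re.compile(r"\b\w+(?:n't|'(?:re|ve|ll|d|m))\b", re.IGNORECASE)

def dimension1_profile(text):
    """Return rough rates for a few oral/literate markers in text."""
    text = text.replace("\u2019", "'")  # normalize typographic apostrophes
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n = len(tokens)
    if n == 0:
        return {}
    lowered = [t.lower() for t in tokens]
    return {
        # Positive (oral) loadings
        "contractions_per_1000": 1000 * len(CONTRACTION.findall(text)) / n,
        "first_person_per_1000": 1000 * sum(t in FIRST_PERSON for t in lowered) / n,
        # Negative (literate) loadings
        "mean_word_length": sum(len(t.replace("'", "")) for t in tokens) / n,
        "type_token_ratio": len(set(lowered)) / n,
    }

# Invented sample sentence.
for name, value in dimension1_profile("I don't think we can trust it.").items():
    print(f"{name}: {value:.2f}")
```

Type/token ratio depends heavily on text length, so comparisons are only meaningful between samples of similar size.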
Results:
Common Assertions 1 (Smileys & Emoji)
No emoticons appeared in the corpus, except in one
paper about internet-related language change
 For example, see the opening of the student’s paper:
____________________________________________

Textspeak Has the Sustenance Teenagers Want
Textspeak; Netspeak; Chatspeak; these names are given
to the “language” of text messaging and instant
messaging, but these terms all have the same origin:
Newspeak. …
Results:
Common Assertions 2 (txtng abbr)
No abbreviations related to text (SMS) messages appeared
in the corpus, e.g.:
AAMOF, ADN, AFAIA, AFAIC, AFAIK, BTW, CU, CUL, DEB, gf, GMTA, HTH,
IC, IIRC, ITSFWI, IMO, IMCO, IMHO, LOL, NBD, NOYL, NTYMI, OIC, OOTQ,
PITA, PTB, POV, RO(T)FL, ROFLMAO, RTFM, SEP, SNAFU, STFU, TIA, TOBG,
TPTB, TTFN, TTUL, TYVM, WB, WRT, WYSIWYG, WTG, YGLT, YMMV
Marker of Orality 1: Contractions
The number of contractions fell roughly sixfold
between 1998 and 2013:
Total contractions 1998-99 subcorpus: 1183
Total contractions 2012-13 subcorpus: 193
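Because the two sub-corpora differ in size, raw totals are easier to compare once normalized per 1,000 words. The sketch below does that arithmetic for the contraction counts above. It assumes the counts were drawn from the sub-corpus sizes given on the Materials slide (449,706 and 363,157 words); the function name `per_thousand` is invented.

```python
def per_thousand(count, corpus_size):
    """Normalize a raw count to a frequency per 1,000 words."""
    return 1000 * count / corpus_size

# Sub-corpus sizes as reported on the Materials slide (assumed to apply here).
SIZE_1998 = 449_706
SIZE_2012 = 363_157

early = per_thousand(1183, SIZE_1998)
late = per_thousand(193, SIZE_2012)

print(f"1998-99: {early:.2f} contractions per 1,000 words")
print(f"2012-13: {late:.2f} contractions per 1,000 words")
print(f"raw ratio:        {1183 / 193:.2f}")
print(f"normalized ratio: {early / late:.2f}")
```

Under that assumption the raw counts differ by a little over six to one, and the normalized rates by roughly five to one. The same function applies to the other marker totals reported in the results.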
Marker of Orality 2: Pronoun it
The number of instances decreased between 1998 and
2013:


Total in 1998-99 subcorpus: 4240
Total in 2012-13 subcorpus: 3262
Marker of Orality 3: Demonstrative Pronouns
The number of instances decreased between 1998 and
2013:


Total in 1998-99 sub-corpus: 13723
Total in 2012-13 sub-corpus: 9451
Marker of Orality 4: Pro-verb do
The number of instances decreased between 1998 and
2013:


Total in 1998-99 sub-corpus: 2377
Total in 2012-13 sub-corpus: 1363
Marker of Orality 5: First person pronouns
The number of instances decreased between 1998 and
2013:


Total in 1998-99 sub-corpus: 6345
Total in 2012-13 sub-corpus: 3889
Marker of “very written” Text 1:
Nominalization
The number of instances increased between 1998 and
2013:


Total in 1998-99 sub-corpus: 3796
Total in 2012-13 sub-corpus: 5851
Marker of “very written” Text 2: Word length
Average word length increased between 1998
and 2013:
Average in 1998-99 sub-corpus: 4.730 characters/word
(504320 total word count)
Average in 2012-13 sub-corpus: 4.934 characters/word
(406281 total word count)
Marker of “very written” Text 3: Prepositional
phrases
The difference between 1998 and 2013 is insignificant:
Total in 1998-99 sub-corpus: 87079
(0.172 prep/total word count; 504320 total word count)
Total in 2012-13 sub-corpus: 57114
(0.141 prep/total word count; 406281 total word count)
Marker of “very written” Text 4: Passives
The number of instances increased between 1998 and
2013:


Total in 1998-99 sub-corpus: 7740
Total in 2012-13 sub-corpus: 12983
Marker of “very written” Text 5: Attributive
Adjectives
The number of instances increased between 1998 and
2013:


Total in 1998-99 sub-corpus: 1537
Total in 2012-13 sub-corpus: 3847
Comparison of lexemes across word classes
Values are % of total (Frequency/1,000 words)
Feature       1998-9 research papers   2012-13 research papers
              (449,706 words)          (363,157 words)
Noun          32.72% (327)             32.60% (326)
Verb          16.23% (162)             16.53% (165)
Adjective     0.37% (4)                0.37% (4)
Pronoun       4.75% (47)               4.52% (45)
Adverb        3.99% (40)               4.06% (41)
Preposition   10.71% (107)             11.09% (111)
Conjunction   3.19% (32)               3.29% (33)
Comparison of lexemes with “Orwell” corpora
Frequency/1,000 words
Feature             Biber 1988       Biber 1988                  FYC 1998-99 (2012-13)
                    Academic prose   Face-to-face conversation
Noun                188              137.4                       207 (205)
Adjective attrib.   76.9             40.8                        3.7 (4) (Comp. and super)
Preposition         139.5            85.0                        107 (111)
Conjunction         3.0              0.3                         32 (33)
Verb (past)         21.9             37.4                        27.5 (25)
Verb (pres)         63.7             128.4                       49.1 (49.7)
Pronoun (pers.)     5.8              39.3                        (45)
Adverb              51.8             86.0                        40 (41)
Results: Interjections

corpus.byu.edu (Academic prose [journal articles])
SECTION     # TOKENS   SIZE          PER MILLION
2005-2009   6031       102,046,528   59.10
Results: Interjections
Orwell research papers 1998-99

Length:
- Number of segments: 109
- Words in segments: 128
Text Complexity:
- Av. Word Length: 4.04
- Av. Segment Length: 1.17
Lexical Density:
- Lexemes per segment: 0.72
- Lexemes % of text: 61.72%
Results: Interjections
Orwell research papers 2012-13

Length:
- Number of segments: 60
- Words in segments: 56
Text Complexity:
- Av. Word Length: 3.93
- Av. Segment Length: 0.93
Lexical Density:
- Lexemes per segment: 0.67
- Lexemes % of text: 71.43%
Conclusions (1)



Uses of texting and emoji as the return of the Rebus principle
(representing language by means of a symbol).
The rebus marks the intellectual leap that every literate culture and individual
makes when moving from pre-literate to literate states (Ong’s “primary
literacy”).
Above: a message from a child, age 4, incorporating rebuses as she is just
beginning to learn that symbols can represent words, letters of the
alphabet, and the sounds of speech.
Conclusions (2)


The return of the Rebus principle as a commonly used shortcut in
communication systems, which we use every day whenever we use
technology-mediated communication (smart phones and browsers).
Here, we see modern examples of Ong’s “secondary literacy.”
Selected References
Biber, D. (1988). Variation across speech and writing. Cambridge NY: Cambridge
University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of
Spoken and Written English. Harlow, England: Pearson Education.
Biber, D. (2006). University Language: A corpus-based study of spoken and written registers.
Amsterdam: John Benjamins.
Fellbaum, C., & Miller, G. A. (1990). Folk psychology or semantic entailment? Comment
on Rips and Conrad (1989). Psychological Review, 97(4), 565-570.
Freeman, Y. S., & Freeman, D. (2009). Academic Language for English language learners and
struggling readers: How to help students succeed across content areas. Portsmouth, NH:
Heinemann.
Halliday, M. A. K., & Matthiessen, C. (2004). An introduction to functional grammar (3rd ed.).
London: Arnold.
Ong, W. J. (1982). Orality and literacy: The technologizing of the word. London: Routledge.
Partridge, M. (2011). A comparison of lexical specificity in the communication verbs of
L1 English and TE student writing. Southern African Linguistics and Applied Language
Studies, 29(2), 135-147.
Scott, M. (2012). WordSmith Tools version 6. Liverpool: Lexical Analysis Software.
Contact Information
Daniel Kies
Department of English
College of DuPage
425 Fawell Boulevard
Glen Ellyn, Illinois 60137, USA
[email protected]