Habeas Corpus in Your Classroom
An InterACTIVE Workshop
Dr. Rob Troyer, Western Oregon University
IALLT, June 11, 2013

Outline
• Introduction
  – What’s a Corpus?
  – What have we learned from corpus studies?
  – What can I do to and with a corpus?
• Google me n-grams
  – Learning to play and playing to learn
• COCA me crazy
  – Using COCA
  – Designing corpus-based lessons
  – Activity
• Getting on MICASE (and MICUSP)
  – Activity
• Frankencorpus
  – AntConc and free PoS tagging
  – Activity
• Conclusion

What’s a corpus?
• a body of text
  – referring to “a corpus” is like referring to “a dictionary”
• SEU (Survey of English Usage), University College London
  – began 1955 on note-cards, completed by 1985 on computer
  – 1 million words: 200 texts of 5,000 words each, written + spoken (monologue and dialogue)
  – Comprehensive Grammar of the English Language (1985): Svartvik, Crystal, Greenbaum, Leech, and Quirk

Types of Corpora
• University produced, for purchase only
• University produced, freely available via www
• Corporate produced for their own use only
• The www is a corpus (& Google n-grams)
• Self-compiled corpora

What can I do to a Corpus?
• annotation (auto, manual, or both)
  – metadata
  – textual markup
  – linguistic annotation
    • POS, parsing, semantic
• annotation software
  – (free online)
    • CLAWS
    • Stanford Parser
  – (for purchase) WordSmith, WMATRIX, etc.

What can I do with a Corpus?
• Descriptive statistics (software/interface)
  – frequency
  – type-token ratio
  – keywords
  – collocations
  – (factor and cluster analysis)
• Concordancing (software/interface)
  – KWIC lines

What have we learned from Corpus Studies (a few examples)?
• emergent modals/grammaticalization
  – going to, need to, had better (will, must, should)
  – Leech et al., from 1971 to 2004, continually rev.
• changes to the perfect aspect construction
  – ‘be’ as auxiliary with certain verbs where today we have only ‘have’

What have we learned from Corpus Studies?
• preferences for grammatical structures
  – participle vs. infinitive
• ELL teaching
  – recall Doug Biber’s Tri-TESOL presentation
    • progressive aspect in conversation

Google me n-grams
Google n-gram viewer: http://books.google.com/ngrams
• Highlights
  – Google Books, 20 million books as of Oct 2012
    • Includes English, Spanish, French, German, Russian, Italian, Chinese, and Hebrew
    • In 2010, Google estimated 130 million books have been printed since Gutenberg (all languages); thus, GB=14%
    • English portion is 500 billion words (500,000,000,000)
  – n-gram viewer searches subset of 8 million books

Google me n-grams
Perform the following searches
• flapper, hippie, yuppie
• vodka,whiskey,gin,rum
• Plato,Aristotle
• cocaine,heroin,amphetamine,LSD,marijuana
• werewolf,zombie
  – add vampire
  – add pedant
• the whole hog,cold turkey

Google me n-grams
Perform the following searches
The latest release (Oct 2012) added some options
• telephone_VERB, phone_VERB
• call_VERB, call_NOUN
• contact_VERB,impact_VERB,access_VERB
• telephone,radio,television
  – add Internet
  – add computer
  – change the last two to (Internet+computer)

Google me n-grams
Activity
• You have 5 minutes to come up with your coolest Google n-gram search and graph. I will circulate to pick a winner.

Concordance: a list of every use of a certain word in a corpus
KWIC lines: Key Word In Context

Activity
• Analyze the context of “is arrived” on the handout.
• Note the date of publication and the genre
  – (Magazine, Non-Fiction, Newspaper, Fiction)
• FYI: COHA = Corpus of Historical American English
• At what point in history does the pattern change?
• What happened?
  – hint: present tense passive voice of transitive ‘arrive’ vs. present perfect of intransitive ‘arrive’
• Where did I get this data?
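
For anyone curious what a concordancer does to produce KWIC lines like those on the handout, here is a minimal sketch in Python. It is not how COHA or any other tool is implemented; the function name `kwic`, the `width` parameter, and the sample text are all invented for illustration.

```python
import re

def kwic(text, phrase, width=30):
    """Return one KWIC line per hit, with the key phrase aligned in a fixed column."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]   # context before the hit
        right = text[m.end():m.end() + width]              # context after the hit
        # Right-align the left context so every key phrase starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text (not taken from COHA).
sample = ("The hour is arrived when we must part. "
          "The mail has arrived, and the coach is arrived at last.")

for line in kwic(sample, "is arrived"):
    print(line)
```

Aligning the key phrase in one column is what makes patterns to its left and right easy to scan by eye.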

COCA me crazy
The BYU Suite of Corpora and Tools: http://corpus.byu.edu
• Highlights
  – Corpora
    • Corpus of Contemporary American English (COCA): 450 mil
    • Corpus of Historical American English (COHA): 400 mil
    • TIME magazine Corpus of American English (Time): 100 mil
    • Corpus of American Soap Operas (SOAP): 100 mil
    • BYU-BNC: British National Corpus (BNC): 100 mil
    • 5 of the Google Book Corpora: 34 – 155 bil
  – Tools
    • online concordance interface
    • WordAndPhrase.info

COCA me crazy
The BYU Suite of Corpora and Tools
• Go to the site (hint: just Google “coca”)
• register
• use the “start” navigator to go to COHA
• type [np*] is arrived in the search box
• make a note of the number of occurrences in different years
• type [np*] has arrived in the search box
• What can we conclude about changes to auxiliary use with the present perfect of some intransitive verbs (arrive)?

COCA me crazy
Vocabulary instruction 1
• Goal: teach students to correctly use evident/evidence
• use the “start” navigator to go to Word and Phrase
• type evident in the search box
• examine the information
• Go to COCA and type evident [i*]
• choose the “Academic” corpus and click “search”
• scan down the lines and look for patterns
• click on “evident” in a KWIC line
• let’s make a class handout

COCA me crazy
Vocabulary instruction 2
• select 10 KWIC lines for “evident” and copy
• open MS Word and change orientation and margins
• paste the lines, select and delete unnecessary columns
• change font to Courier, 8 pt
• delete characters so “evident” is in the middle
• rearrange lines so that patterns are more obvious
• delete difficult lines and/or redundant patterns
• follow the same steps for “evidence”
• delete the key words to make a gap-fill

COCA me crazy
Vocabulary instruction 3
• lesson design
  – on board, write a frame for each word from your examples
  – ask students to fill in words that could fit
  – teach the new items: evident/evidence
  – give students the handout to fill the gaps
  – have them identify patterns of use
  – engage in authentic writing in which the new words can be used

COCA me crazy
Vocabulary instruction 4
• many potential variations
  – multiple words
  – single word, multiple patterns
  – transition words instead of content words
  – different register (spoken, fiction, news, academic)
  – comparison of use in different registers
• teach students to search for a word in passive vocabulary; examples will help them use it confidently
• try WordAndPhrase.Info: http://www.wordandphrase.info

COCA me crazy
Teaching Reporting Clauses and Phrases
• Example summaries from last year (pre-corpus-based lesson)
• Example summaries from this term (post-corpus-based lesson)
• Activity: do the student handout
  – Answers and “What did you learn?”

COCA me crazy
Teaching Reporting Clauses and Phrases
• How did I make this handout using COCA?
  – identified target phrases, searched academic register
  – clicked selected lines for context, copied entire sentence
  – pasted to Word, arranged so that target moves away from sentence-initial position
  – identified target verbs, searched with [np1] before verbs, selected for context, copied, pasted
  – selected additional lines for page 2 gap-fill and arranged
  – wrote questions that emphasize form

Getting on MICASE http://micase.elicorpora.info and MICUSP http://micusp.elicorpora.info
• Highlights
  – Corpora
    • Michigan Corpus of Academic Spoken English (MICASE)
      – 152 transcripts; 1,848,364 words
    • Michigan Corpus of Upper-level Student Papers (MICUSP)
      – 830 “A-grade” papers; 2,600,000 words
      – variety of disciplines
  – Tools
    • online concordance interfaces

Getting on MICASE and MICUSP
• Goal: raise awareness of what’s in lectures
  – Ss are probably familiar with what’s + NP, Verb, or Adj for a question
    • so with that in mind what's the next word after, avaritiae?
    • now what's physically going on here?
    • but what's wrong? why did… isn't this useful?
  – But less aware of its frequent use in complement clauses and wh-clefts
    • there is never any sense of what's going on behind
    • but what's usually happening is that some victim, of raw political oppression, is unjustly imprisoned.

Getting on MICASE and MICUSP
• Go to MICASE
  – In the “Transcript Attributes” menu, select “Lecture-small”
  – In the search box type what’s and click “submit”
  – Sort results by “2 right”
  – Which frequent followers typically lead to questions?
    » a, an, the, this, wrong
  – Which frequent followers typically lead to statements?
    » called, going on, gonna, important

Getting on MICASE and MICUSP
• Make an awareness-raising handout
  – what’s in questions
    » copy 2 lines each of what’s + a, an, the, this, wrong
    » paste in a pre-formatted Word doc
    » edit lines to bring key word to the center
  – what’s in statements
    » copy 4 lines of what’s + called, important, going on, and gonna
    » paste and edit

Getting on MICASE and MICUSP
• Make an awareness-raising handout
  – Students analyze first set of 10
    » What is the function of what’s?
  – Students analyze what’s + called
    » can you remove “what’s called”?
    » what is the purpose of adding “what’s called”?
  – Students analyze what’s + going on
    » read what’s before the key words: what do most lines have in common?

Getting on MICASE and MICUSP
• Make an awareness-raising handout
  – For each pattern, allow students to form generalizations about meanings associated with the pattern.
• Copy and paste additional lines to make a gap-fill with random patterns.
• Follow up with authentic listening/speaking practice that uses at least some of the what’s patterns.
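
The sorting and gap-fill steps above can also be sketched in a few lines of Python, which may help when a handout needs more lines than are comfortable to edit by hand. This is only an illustration, not the MICASE interface: the helper names `right_word`, `sort_by_right`, and `gap_fill` are invented, and the sample lines are the what's examples quoted on the earlier slides.

```python
import re

def right_word(line, keyword, n=1):
    """Return the n-th word to the right of the first keyword hit, or '' if none."""
    words = re.findall(r"[\w']+", line.lower())
    if keyword not in words:
        return ""
    i = words.index(keyword) + n
    return words[i] if i < len(words) else ""

def sort_by_right(lines, keyword, n=1):
    """Sort concordance lines alphabetically by the n-th word right of the keyword."""
    return sorted(lines, key=lambda ln: right_word(ln, keyword, n))

def gap_fill(line, keyword):
    """Replace the keyword with a blank for students to fill in."""
    return re.sub(re.escape(keyword), "_" * 8, line, flags=re.IGNORECASE)

lines = [
    "so with that in mind what's the next word after, avaritiae?",
    "now what's physically going on here?",
    "but what's wrong? why did",
    "there is never any sense of what's going on behind",
    "but what's usually happening is that some victim",
]

for ln in sort_by_right(lines, "what's", n=1):
    print(gap_fill(ln, "what's"))
```

Changing `n` to 2 sorts on the second word to the right, which groups longer patterns such as "going on" differently from a sort on the first word.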

Frankencorpus: Build your own body
• Research Questions
• prepare texts
• PoS tagging
• Download Concordancer
• Basic Analysis

Frankencorpus: Build your own body
• Choose a news topic: find at least three articles in different online news sources
• copy and paste the text of the articles into one Notepad text file
• copy all of the text and tag it (CLAWS, C7 tag set)
• copy tagged text and paste into Notepad and save
• download AntConc
• open AntConc

Frankencorpus: Build your own body
• AntConc global settings: hide tags but allow search
• AntConc tool settings: word count and concordancer (select “treat all text as lowercase”)
• load text file
• perform word count
• go to concordancer tab
• search for a keyword or phrase
• search for a part of speech or combination
• sort lines alphabetically by keyword and/or left-right

Conclusions
Corpus tools and corpus-based materials are not magic.
Playing with the tools in your free time will help you build skills used for efficient materials production.
Authenticity is authentic.
Editing requires conscious attention to form.
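
As a footnote to the Frankencorpus word count step, the same frequency list and type-token ratio can be roughly reproduced outside AntConc. The sketch below assumes the tagged file uses a word_TAG layout (one underscore between each word and its tag), which is an assumption about how the tagger output was saved; the function names and the sample sentence are invented, and the tags in it are only illustrative.

```python
import re
from collections import Counter

def strip_tags(tagged):
    """Drop the trailing _TAG from each token and lowercase the word."""
    return [tok.rsplit("_", 1)[0].lower() for tok in tagged.split()]

def word_stats(tagged):
    """Return (frequency Counter, type-token ratio), ignoring pure punctuation."""
    words = [w for w in strip_tags(tagged) if re.search(r"\w", w)]
    freq = Counter(words)
    ttr = len(freq) / len(words) if words else 0.0   # types divided by tokens
    return freq, ttr

# Invented sample in the assumed word_TAG layout.
sample = "The_AT dog_NN1 chased_VVD the_AT cat_NN1 ._."

freq, ttr = word_stats(sample)
print(freq.most_common(3))
print(round(ttr, 2))
```

Lowercasing before counting mirrors the “treat all text as lowercase” setting, so that "The" and "the" count as one type.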