Statistical Methods for the Social Sciences
Rahul Mukherjee

REVIEW SESSION 01
TA: Marcio Cruz • marcio.cruz@graduateinstitute.ch
Office hours • Wednesdays 09:00-11:00 • Rigot 10

Basic data management

1. Where do I find interesting data?

GOOGLE!!! ;-)

Some interesting links for MDEV and MIA students:

MACRO (aggregate variables/different countries)
• World Development Indicators (World Bank)
  http://data.worldbank.org/data-catalog/world-development-indicators
• World Economic Outlook Database (IMF)
  http://www.imf.org/external/pubs/ft/weo/2011/02/weodata/index.aspx
• WTO Statistics Database (WTO)
  http://stat.wto.org/Home/WSDBHome.aspx

By region:
• US economy: Federal Reserve Economic Data
  http://research.stlouisfed.org/fred2/
• European Union economy: ECB statistics
  http://sdw.ecb.europa.eu/
• China: National Bureau of Statistics of China
  http://www.stats.gov.cn/english/statisticaldata/
• Mexico: Banco de México
  http://www.banxico.org.mx/estadisticas/index.html

MACRO DATA: You can find macro datasets for most countries on the webpages of their central banks and national statistics bureaus.
• Central banks
  http://www.bis.org/cbanks.htm
• Official national bureaus of statistics

MICRO (household surveys, firm-level data, etc.)
• Official national bureaus of statistics
  http://www.census.gov/acs/www/data_documentation/public_use_microdata_sample/
  http://epp.eurostat.ec.europa.eu/portal/page/portal/microdata/introduction
  http://www.esds.ac.uk/international/access/micro.asp
• International organizations
  http://microdata.worldbank.org/index.php/home
• Some blogs provide good links:
  https://sites.google.com/site/medevecon/development-economics/devecondata/micro
  http://openmicrodata.wordpress.com/
• Faculty webpages
  http://dvn.iq.harvard.edu/dvn/dv/JAngrist

Basic Excel

2. How should I download this data?

Let us start with an example using MACRO data (from WDI).
• .csv, .txt or .xls? What is the difference?
• How to manage this data in Excel?
• How to sort this data?
• How to do basic math operations in Excel?
• How to get basic descriptive statistics in Excel?
• How to generate a graph?

Statistical packages

Why should I manage data using a statistical package?
It gives you more flexibility, and you can keep a record of everything you did in your research!

Some examples of statistical packages (http://en.wikipedia.org/wiki/List_of_statistical_packages):
• SPSS: comprehensive statistics package
• EViews: for econometric analysis
• Stata: comprehensive statistics package
• SAS: comprehensive statistical package
• MATLAB: programming language with statistical features
• R: a free implementation of the S language
• S-PLUS: general statistics package

Basic STATA

3. Where can I find resources and tips for learning STATA?

GOOGLE!!! ;-)
• Stata webpage, university webpages, etc.
• Resources for learning Stata
  http://www.stata.com/links/resources1.html
• Stata Starter Kit: Learning Modules
  http://www.ats.ucla.edu/stat/stata/sk/modules_sk.htm
• Getting Started in Data Analysis
  http://dss.princeton.edu/training/

This link provides some exercises from the course's textbook, Statistical Methods for the Social Sciences, 3rd edition, by Alan Agresti & Barbara Finlay:
http://www.ats.ucla.edu/stat/examples/smss/default.htm

Textbook examples: Introduction to the Practice of Statistics by David Moore and George McCabe
http://www.ats.ucla.edu/stat/examples/mm/default.htm

• How to start in STATA?
• .do, .dta, .log files?
• USE .do FILES!!! Why? You can keep a record of everything you have done!
• If you need to manage data: use a .do file!

.do FILE

• How to use a .do file?
1. Open STATA.
2. Open a new .do file in the Do-file Editor.
3. Set memory. This can improve the performance of STATA, but it depends on the capacity of your computer. So, if it does not work, you should request less memory. (You don't need to use this command; Stata 12 and later manage memory automatically.)
   ex: set memory 1200m
4. Define the directory you will work in:
   cd "C:\Users\My Documents… "

See example: "rs01_example01.do"

Importing data to STATA

4. How to import data from Excel to STATA?

Importing data from Excel:
Source: http://www.stata.com/support/faqs/data/newexcel.html

1. A rule to remember
Stata expects one matrix or table of data from one sheet, with at most one line of text at the start defining the contents of the columns.

2. How to get information from Excel into Stata
• Start Excel.
• Enter data in rows and columns or read in a previously saved file.
• Highlight the data of interest, and then select Edit and click Copy.
• Start Stata and open the Data Editor (type edit at the Stata dot prompt).
• Paste the data into the editor by selecting Edit and clicking Paste.

You can do this (2), but better avoid it! Why???

INSHEET COMMAND: THE BEST WAY TO IMPORT DATA FROM EXCEL!!!

3.1 insheet command
• Launch Excel and read in your Excel file.
• Save as a text file (tab delimited or comma delimited) by selecting File and clicking Save As. If the original filename is filename.xls, then save the file under the name filename.txt or filename.csv. (Use the Save as type list; specifying an extension such as .txt is not sufficient to produce a text file.)
• Quit Excel if you wish.
• Launch Stata if it is not already running. (If Stata is already running, then either save or clear your current data.)
• In Stata, type insheet using filename.ext, where filename.ext is the name of the file that you just saved in Excel. Give the complete filename, including the extension.
• In Stata, type compress.
• Save the data as a Stata dataset using the save command.

Importing data to STATA: Common problems

5.1 Nonnumeric characters
• One cell containing a nonnumeric character, such as a letter, within a column of data is enough for Stata to make that variable a string variable.

5.2 Spaces
• What appear to be purely numeric data in Excel are often treated by Stata as string variables because they include spaces.

5.3 Cell formats
• Much formatting within Excel interferes with Stata's ability to interpret the data reasonably. Just before saving the data as a text file, make sure that all formatting is turned off, at least temporarily. You can do this by highlighting the entire spreadsheet, selecting Format, and then Cells, and clicking General.

5.4 Variable names
• Stata limits variable names to 32 characters and does not allow within such names any characters that it uses as operators or delimiters. Also, variable names should start with a letter.

5.5 Missing rows and columns
• Completely empty rows in a spreadsheet are ignored by Stata, but completely empty columns are not. A completely empty column gets read in as a variable with missing values for every observation.

5.6 Leading zeros
• With integer-like codes, such as ICD-9 codes or U.S. Social Security numbers, that do not contain a dash, leading zeros will get dropped when pasted into Stata from Excel. One solution is to flag within the first line that the variable is string: add a nonnumeric character in Excel on that line, and then remove it in Stata.

5.7 Filename and folder
• Confirm the filename and location of the file you are trying to read. Use Explorer or its equivalent to check.

STATA - data types

• Numeric variables
• String variables
• What is a 'STRING' variable? How to deal with them?
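
To make the import steps and the string-variable problems above concrete, here is a minimal sketch of a .do file. It is only an illustration under stated assumptions: the file students.csv, the log name, and the variable income are hypothetical, and the file is assumed to be comma delimited with variable names in the first row. In Stata 13 and later, insheet has been superseded by import delimited; the sketch keeps insheet to match the slides.

```stata
* rs01_import_sketch.do  (hypothetical file name, for illustration only)
* Assumes students.csv (hypothetical) was saved from Excel as comma delimited
* and sits in the current working directory.

clear
capture log close                    // close any log left open by an earlier run
log using "rs01_import_sketch.log", replace text

* Set your own working directory here with cd before running the lines below.

* Read the comma-delimited file; the first row holds the variable names.
insheet using "students.csv", comma names clear

* Storage types that start with "str" are string variables.
describe

* Suppose the column income (hypothetical) was read as a string because of
* stray spaces or commas: ignore those characters and convert it to numeric.
destring income, replace ignore(" ,")

compress                             // store each variable in the smallest type that fits
save "students.dta", replace         // save as a Stata dataset

log close
```

Because every step is written in the .do file, the whole import can be rerun and checked later, which is the reason the slides recommend this route over copy and paste.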

Some basic commands

• Summary: sum
• Conditions: if, &, |
• Sort observations by a variable: sort
• Order variables: order
• Generate variables: gen var
• Drop variables (columns): drop
• Drop rows: drop in
• Concatenate variables: concat() (used with egen)
• Destring variables: destring var, replace
• Generate numerical (dummy) variables from a categorical or string variable: tab var, gen(newvar)
• Basic math operations: /, *, -, + or rsum(var1 var2 … varn) (used with egen; called rowtotal() in newer versions of Stata)
• Replace: replace var
• Collapse: collapse (sum) var, by(var) (see help collapse)

Linking with class notes…

How to generate a quantitative variable from a categorical variable?
For example: favorite music type (rock, jazz, folk, classical).
Command in STATA: tab var, gen(name), where name is the prefix for the new variables.
For example, if the variable is called music: tab music, gen(music)
This creates one 0/1 variable per category (music1, music2, …).

EXERCISE

The slide on page 30 of the first class notes is the following:
www.stat.ufl.edu/~aa/social/data.html

Access this webpage (www.stat.ufl.edu/~aa/social/data.html) and do the following:
1. Download the data in Excel.
2. Plot a graph showing the age of students (on the x-axis) and the time they spend watching TV (on the y-axis).
3. Plot a pie graph showing the number of males and females.
4. Save this data as .csv.
5. Transfer this data to STATA.
6. Identify which variables are numerical and which ones are string.
7. Plot a graph showing the age of students (on the x-axis) and the time they spend watching TV (on the y-axis).
8. Plot a pie graph showing the number of males and females.
9. How many of these students are D = Democrat, R = Republican, I = Independent?
10. Generate a variable called average_gpa that is:
    average_gpa = (high school GPA (on a four-point scale) + college GPA)/2

I have a problem in STATA…

• If you are unsure how to use a specific procedure in STATA, how should you deal with this?
• 1. Google!!! ;-) … If this doesn't work:
• 2. Google!!! Try again, maybe you haven't searched properly… but, if this doesn't work:
• 3. Google!!! Try once more, just in case.
• 4. Command HELP in STATA.
• 5. Send your questions to Statalist: http://www.stata.com/statalist/
• 6. Talk to your TA.
• 7. Talk to your Professor.
• You can talk to your TA whenever you want, but try at least the first 4 steps. This will be important for developing your skills in Stata! ;-)
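
Going back to the basic commands and the EXERCISE above, steps 6 to 10 might be sketched in a .do file along the following lines. This is a hedged sketch, not the official solution: the dataset name students.dta and the variable names age, tv, gender, party, hsgpa and cogpa are hypothetical placeholders, so replace them with the names that actually appear in the downloaded data.

```stata
* Hypothetical names throughout: students.dta, age, tv, gender, party, hsgpa, cogpa.
use "students.dta", clear

* Step 6: the storage type column shows which variables are numeric or string.
describe

* Basic descriptive statistics for two numeric variables.
summarize age tv

* Step 7: scatter plot with TV time on the y-axis and age on the x-axis.
twoway scatter tv age

* Step 8: pie graph with one slice per gender, sized by number of students.
graph pie, over(gender)

* Step 9: frequency table with the number of students in each party category.
tabulate party

* Step 10: average of high school GPA and college GPA.
generate average_gpa = (hsgpa + cogpa) / 2

* Linking with class notes: one 0/1 indicator variable per party category
* (party_1, party_2, ...).
tabulate party, generate(party_)
```

If one of these commands fails, typing help followed by the command name (for example, help tabulate) is step 4 of the list above.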