Designing Monte Carlo Simulation Studies
Xitao Fan, Ph.D.
Chair Professor & Dean
Faculty of Education
University of Macau

Getting Involved in Monte Carlo Simulation

Fan, X., Felsovalyi, A., Sivo, S. A., & Keenan, S. (2002). SAS for Monte Carlo studies: A guide for quantitative researchers. Cary, NC: SAS Institute, Inc.
Fan, X. (2012). Designing simulation studies. In H. Cooper (Ed.), Handbook of Research Methods in Psychology, Vol. 2 (pp. 427-444). Washington, DC: American Psychological Association.
Peugh, J., & Fan, X. (In press). Enumeration index performance in generalized growth mixture models: A Monte Carlo test of Muthén’s (2003) hypothesis. Structural Equation Modeling.
Peugh, J., & Fan, X. (In press). Modeling unobserved heterogeneity using latent profile analysis: A Monte Carlo simulation. Structural Equation Modeling.
Peugh, J., & Fan, X. (2012). How well does growth mixture modeling identify heterogeneous growth trajectories? A simulation study examining GMM’s performance characteristics. Structural Equation Modeling, 19, 204-226.
Fan, X., & Sivo, S. A. (2009). Using goodness-of-fit indices in assessing mean structure invariance. Structural Equation Modeling, 16, 1-16.
Fan, X., & Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42, 509-529.
Sivo, S. A., Fan, X., Witta, E. L., & Willse, J. T. (2006). The search for "optimal" cutoff properties: Fit index criteria in structural equation modeling. Journal of Experimental Education, 74, 267-288.
Fan, Xitao, & Fan, Xiaotao. (2005). Power of latent growth modeling for detecting linear growth: Number of measurements and comparison with other analytic approaches. Journal of Experimental Education, 73, 121-139.
Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indices to misspecified structural or measurement model components: Rationale of two-index strategy revisited. Structural Equation Modeling, 12, 343-367.
Fan, Xitao, & Fan, Xiaotao. (2005). Using SAS for Monte Carlo simulation research in structural equation modeling. Structural Equation Modeling, 12, 299-333.
Sivo, S., Fan, X., & Witta, L. (2005). The biasing effects of unmodeled ARMA time series processes on latent growth curve model estimates. Structural Equation Modeling, 12, 215-231.
Fan, X. (2003). Two approaches for correcting correlation attenuation caused by measurement error: Implications for research practice. Educational and Psychological Measurement, 63(6), 915-930.
Fan, X. (2003). Power of latent growth modeling for detecting group differences in linear growth trajectory parameters. Structural Equation Modeling, 10, 380-400.
Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of different analytical methods. Journal of Experimental Education, 69, 203-224.
Fan, X., & Wang, L. (1999). Comparing logistic regression with linear discriminant analysis in their classification accuracy. Journal of Experimental Education, 67, 265-286.
Fan, X., Thompson, B., & Wang, L. (1999). The effects of sample size, estimation methods, and model specification on SEM fit indices. Structural Equation Modeling: A Multidisciplinary Journal, 6, 56-83.
Fan, X., & Wang, L. (1998). Effects of potential confounding factors on fit indices and parameter estimates for true and misspecified SEM models. Educational and Psychological Measurement, 58, 699-733.
Fan, X., & Wang, L. (1996). Comparability of jackknife and bootstrap results: An investigation for a case of canonical analysis. Journal of Experimental Education, 64, 173-189.

What Is a Monte Carlo Simulation Study?
“the use of random sampling techniques and often the use of computer simulation to obtain approximate solutions to mathematical or physical problems especially in terms of a range of values each of which has a calculated probability of being the solution” (Merriam-Webster OnLine).
* An empirical alternative to a theoretical approach (i.e., a solution based on statistical/mathematical theory)
* Increasingly possible because of advances in computing technology

Situations Where Simulation Is Useful

Consequences of Assumption Violations
* Statistical theory stipulates what the conditions should be, but it does not say what happens when those conditions are not satisfied in the data
Understanding a Sample Statistic That May Not Have a Theoretical Distribution
Many Other Situations
* Retaining the optimal number of factors in EFA
* Evaluating the performance of mixture modeling in identifying the latent groups
* Assessing the consequences of failure to model correlated error structure in latent growth modeling

Basic Steps in a Simulation Study

Asking Questions Suitable for a Simulation Study
* Questions for which no (or no trustworthy) analytical/theoretical solutions exist
Simulation Study Design (Example)
* Include / manipulate the major factors that potentially affect the outcome
Data Generation
* Sample data generation & transformation
Analysis (Model Fitting) for Sample Data
Accumulation and Analysis of the Statistic(s) of Interest
Presentation and Drawing Conclusions
* Conclusions limited to the design conditions

An Example: Independent t-test (group variance homogeneity)

Data Generation in a Simulation Study

Common Random Number Generators
* binomial, Cauchy, exponential, gamma, Poisson, normal, uniform, etc.
* All of these distributions are generated from the uniform distribution
Simulating Univariate Sample Data
* Normally-Distributed Sample Data: $X \sim N(\mu, \sigma^2)$
* Non-Normal Distribution (Fleishman, 1978): $Y = a + bZ + cZ^2 + dZ^3$, where $Z$ is a unit normal variate and a, b, c, d are the coefficients needed for transforming it to a nonnormal variable with specified degrees of population skewness and kurtosis

Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521-531.

Sample Data from a Multivariate Normal Distribution
* Matrix decomposition procedure (Kaiser & Dickman, 1962)
* F: the k × k matrix containing principal component factor pattern coefficients, obtained by applying principal component factorization to the given population inter-correlation matrix R
Sample Data from a Multivariate Non-Normal Distribution
* Interaction between non-normality and inter-variable correlations
* Intermediate correlations using Fleishman coefficients (Vale & Maurelli, 1983)
* Matrix decomposition procedure applied to the intermediate correlation matrix

Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27, 179-182.
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.
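
The matrix decomposition idea for multivariate normal data can be sketched in Python as follows. This is a minimal illustration, not the SAS code used in the cited work: it assumes NumPy is available, forms the factor pattern matrix F from the eigenvalues and eigenvectors of R, and then computes correlated scores as Z = F X, with X holding k uncorrelated unit normal variables. The population matrix `R`, the sample size, the seed, and the function names `factor_pattern` and `simulate_mvn` are all hypothetical choices made for the example. The last lines compare the sample correlations with R as a basic check on the generated data.

```python
import numpy as np

def factor_pattern(R):
    """Return the k x k principal component factor pattern matrix F (F @ F.T == R)."""
    eigvals, eigvecs = np.linalg.eigh(R)   # R is symmetric, so eigh applies
    return eigvecs * np.sqrt(eigvals)      # scale each eigenvector column by sqrt(eigenvalue)

def simulate_mvn(R, n, rng):
    """Generate n cases on k standardized normal variables with population correlations R."""
    k = R.shape[0]
    F = factor_pattern(R)
    X = rng.standard_normal((k, n))        # k uncorrelated unit normal variables
    Z = F @ X                              # impose the correlation structure
    return Z.T                             # n x k data matrix

# Hypothetical population correlation matrix (illustration only)
R = np.array([[1.0, 0.7, 0.2],
              [0.7, 1.0, 0.4],
              [0.2, 0.4, 1.0]])

rng = np.random.default_rng(12345)
sample = simulate_mvn(R, 50_000, rng)

# Validity check: sample correlations should be close to the population values
R_hat = np.corrcoef(sample, rowvar=False)
print(np.round(R_hat, 3))
print("largest absolute deviation from R:", np.abs(R_hat - R).max())
```

To obtain variables with other means and standard deviations, each column can be rescaled afterwards as σX + μ.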
Checking the Validity of Data Generation Procedures
* Example: Multivariate non-normal sample data (three correlated variables)

From Simulation Design to Population Data Parameters
* It may take much effort to obtain population parameters
* t-test example
* Latent growth model example

Accumulation and Analysis of the Statistic(s) of Interest

Accumulation: Straightforward or Complicated
* Typically, not an automated process
* Statistical software used
* Analytical techniques involved
* Type of statistic(s) of interest, etc.
Analysis
* Follow-up data analysis may be simple or complicated
* Not different from many other data analysis situations

Presentation and Drawing Conclusions

Presentation
* Representativeness & Exceptions
* Graphic Presentations
* Typical: table after table of results. No one has the time to read the tables!
Drawing Conclusions
* Validity & generalizability depend on the adequacy & appropriateness of simulation design
* Conclusions must be limited by the design conditions and levels.
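
The basic steps (design, data generation, analysis, accumulation, presentation) can be illustrated with the independent t-test example. The sketch below is a hedged illustration in standard-library Python, not the design from the slides: the four design conditions, the group sizes, the 2,000 replications, the seed, and the function names are all assumptions chosen for the example. Every condition uses a total sample size of 60, so one hard-coded two-sided .05 critical value for df = 58 applies throughout. The null hypothesis of equal means is true in every condition, so each rejection rate is an empirical Type I error rate.

```python
import math
import random

def mean_var(x):
    """Sample mean and unbiased sample variance."""
    n = len(x)
    m = sum(x) / n
    return m, sum((v - m) ** 2 for v in x) / (n - 1)

def pooled_t(x, y):
    """Independent-samples t statistic using the pooled variance."""
    n1, n2 = len(x), len(y)
    m1, v1 = mean_var(x)
    m2, v2 = mean_var(y)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def type1_rate(n1, n2, sd1, sd2, t_crit, reps, rng):
    """Proportion of replications that reject a true null of equal means."""
    rejections = 0
    for _ in range(reps):
        # Data generation: both groups share a population mean of 0
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        # Analysis and accumulation of the statistic of interest
        if abs(pooled_t(x, y)) > t_crit:
            rejections += 1
    return rejections / reps

# Two-sided .05 critical value of t for df = 58 (hard-coded to stay stdlib-only)
T_CRIT_DF58 = 2.0017

# Hypothetical design: (n1, n2, sd1, sd2); an SD ratio of 2 is a variance ratio of 4
design = {
    "equal n, equal variances": (30, 30, 1.0, 1.0),
    "equal n, variance ratio 4": (30, 30, 2.0, 1.0),
    "small group has larger variance": (10, 50, 2.0, 1.0),
    "large group has larger variance": (50, 10, 2.0, 1.0),
}

rng = random.Random(2024)
results = {label: type1_rate(*cond, T_CRIT_DF58, 2000, rng)
           for label, cond in design.items()}

# Presentation: one line per design condition
for label, rate in results.items():
    print(f"{label:35s} {rate:.3f}")
```

Any conclusion drawn from such a run applies only to the conditions listed in `design`; other sample sizes, variance ratios, or distribution shapes would need their own cells.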