Korean Journal of Psychology: General
[ Article ]
The Korean Journal of Psychology: General - Vol. 40, No. 4, pp. 389-413
ISSN: 1229-067X (Print)
Print publication date 25 Dec 2021
Received 06 Nov 2021; Accepted 26 Nov 2021
DOI: https://doi.org/10.22257/kjp.2021.

Principles and methods for model assessment in psychological research in the era of big-data and machine learning

Taehun Lee
Department of Psychology, Chung-Ang University

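Correspondence to: Taehun Lee (이태헌), Associate Professor, Department of Psychology, College of Social Sciences, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul (155-756). Tel: 02-820-5124, E-mail: lee0267@cau.ac.kr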


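This article aims to introduce psychology researchers to the principles of model estimation and assessment that have been discussed steadily in quantitative psychology over the past several decades. The core of that discussion is that 1) candidate models are not the true model but approximating models, 2) the discrepancy between the true model and an approximating model does not vanish even as the data size grows without bound, and therefore 3) it is desirable to select, among the candidate models, the approximating model estimated to have the smallest discrepancy from the true model. The article explains that this principle of model selection is identical to the model assessment principle adopted in machine learning, a field that is extending its reach across many disciplines in the era of the Fourth Industrial Revolution. That is, in machine learning a model is selected by estimating the generalization or prediction error, which is the model's performance on new cases not exposed during training; the article explains that this is the same principle as the one in quantitative psychology, under which a model should be selected by estimating the overall discrepancy, an estimate of the discrepancy between the approximating model and the true model. The second aim of the article, building on this understanding of model selection, is to discuss the overfitting problem brought about by the current practice in psychology of analyzing a given data set "thoroughly", and ways to resolve it. In particular, cross-validation, which is the most widely used method in machine learning and has long been discussed in quantitative psychology as well (Mosier, 1951), is introduced from the perspective of an estimator of the generalization error, and its use is encouraged.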


The first objective of this article is to explain the principles of estimation and assessment for statistical models in psychological research. These principles have been actively discussed in mathematical and quantitative psychology over the past few decades. The essence of the discussion is as follows: 1) candidate models should be regarded not as the true model but as approximating models, 2) the discrepancy between a candidate model and the true model does not disappear even in the population, and therefore 3) it is best to select the approximating model that exhibits the smallest discrepancy from the true model. In quantitative psychology, the discrepancy between the true model and a candidate model estimated from a sample is referred to as the overall discrepancy. In machine learning, models are assessed by the extent to which their performance generalizes beyond the training sample to new, unseen samples; this performance on unseen samples is quantified by the generalization error, or prediction error. The article shows that the principle of model assessment based on overall discrepancy advocated in quantitative psychology is identical to the principle based on generalization/prediction error adopted in machine learning. The second objective is to help readers appreciate that questionable data-analytic practices widely tolerated in psychology, such as HARKing (Kerr, 1998) and questionable research practices (QRPs; Simmons et al., 2011), are likely causes of overfitting in individual studies, which in turn has collectively fueled the recent debates over the replication crisis in psychology. As a remedy, this article reintroduces cross-validation methods, whose discussion in psychology dates back at least to the 1950s (Mosier, 1951), by framing them as estimators of the generalization/prediction error, in the hope of reducing overfitting in psychological research.
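
The contrast between training error and cross-validated error described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the article's own procedure: it assumes simulated data, polynomial regression as the family of candidate (approximating) models, squared-error loss, and 5-fold cross-validation. The function names `training_mse` and `kfold_mse`, the sample size, and the noise level are all hypothetical choices made for illustration. The data-generating function is a sine curve, so none of the polynomial candidates is the true model.

```python
import numpy as np


def training_mse(x, y, degree):
    """Mean squared error on the same data used to fit the model (training error)."""
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))


def kfold_mse(x, y, degree, k=5, seed=0):
    """k-fold cross-validation estimate of the generalization (prediction) error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    sq_err = []
    for held_out in folds:
        # Fit on every case except the held-out fold.
        train = np.setdiff1d(idx, held_out)
        coefs = np.polyfit(x[train], y[train], degree)
        # Score only on cases the fitted model has not seen.
        pred = np.polyval(coefs, x[held_out])
        sq_err.append((y[held_out] - pred) ** 2)
    return float(np.mean(np.concatenate(sq_err)))


# Simulated data (an assumption for illustration): the true function is a sine
# curve, so every polynomial candidate is only an approximating model.
rng = np.random.default_rng(2021)
n = 60
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.3, n)

degrees = list(range(1, 11))
train_err = [training_mse(x, y, d) for d in degrees]
cv_err = [kfold_mse(x, y, d) for d in degrees]
best_degree = degrees[int(np.argmin(cv_err))]

for d, tr, cv in zip(degrees, train_err, cv_err):
    print(f"degree={d:2d}  training MSE={tr:.3f}  5-fold CV MSE={cv:.3f}")
print("degree selected by cross-validation:", best_degree)
```

Because the candidate models are nested, the training error cannot increase as the polynomial degree grows, so selecting a model by training error always favors the most complex candidate. The cross-validated error is computed only on held-out cases, which is why it can serve as an estimate of the generalization error when comparing candidates.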


overfitting, generalization error, training error, cross-validation, bias-variance tradeoff




This work was supported by a National Research Foundation of Korea grant funded by the Korean government (Ministry of Science and ICT) (No. 2020R1H1A1102581).


  • Agler, R. A., & De Boeck, P. (2020). Factors associated with sensitive regression weights: A fungible parameter approach. Behavior Research Methods, 52(1), 207-22. https://doi.org/10.3758/s13428-019-01220-6
  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), 2nd International Symposium on Information Theory. Budapest: Akademia Kiado.
  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. https://doi.org/10.1109/TAC.1974.1100705
  • Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16(1), 125-127. https://doi.org/10.1080/00401706.1974.10489157
  • Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40-79. https://doi.org/10.1214/09-SS054
  • Bozdogan, H. (2000). Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, 44(1), 62-91. https://doi.org/10.1006/jmps.1999.1277
  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth Statistics/Probability Series. Belmont, CA: Wadsworth Advanced Books and Software.
  • Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44(1), 108-132. https://doi.org/10.1006/jmps.1999.1279
  • Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24(4), 445-455. https://doi.org/10.1207/s15327906mbr2404_4
  • Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
  • Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A practical information-theoretic approach. New York: Springer. https://doi.org/10.1007/978-1-4757-2917-7
  • Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261-304. https://doi.org/10.1177/0049124104268644
  • Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276. https://doi.org/10.1207/s15327906mbr0102_10
  • Chapman, B. P., Weiss, A., & Duberstein, P. R. (2016). Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development. Psychological Methods, 21(4), 603. https://doi.org/10.1037/met0000088
  • Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3), 419-444. https://doi.org/10.2307/2983440
  • Chung, H. Y., Lee, K. W., & Koo, J. Y. (1996). A note on bootstrap model selection criterion. Statistics & Probability Letters, 26(1), 35-41. https://doi.org/10.1016/0167-7152(94)00249-5
  • Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research, 18(2), 147-167. https://doi.org/10.1207/s15327906mbr1802_2
  • Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structures analysis and the "problem" of sample size: A clarification. Psychological Bulletin, 109(3), 512-519. https://doi.org/10.1037/0033-2909.109.3.512
  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
  • Enders, C. K., & Mansolf, M. (2018). Assessing the fit of structural equation models with multiply imputed data. Psychological Methods, 23(1), 76-93. https://doi.org/10.1037/met0000102
  • Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350), 320-328. https://doi.org/10.1080/01621459.1979.10481632
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York, NY: Springer. https://doi.org/10.1007/978-0-387-84858-7
  • Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55. https://doi.org/10.1080/10705519909540118
  • Hurvich, C. M., & Tsai, C. L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297-307. https://doi.org/10.1093/biomet/76.2.297
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer. https://doi.org/10.1007/978-1-4614-7138-7
  • Jones, J. A., & Waller, N. G. (2016). Fungible weights in logistic regression. Psychological Methods, 21(2), 241-260. https://doi.org/10.1037/met0000060
  • Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217. https://doi.org/10.1207/s15327957pspr0203_4
  • Kim, C. (2019). Studying psychology using big data. Korean Journal of Psychology: General, 38(4), 519-548. https://doi.org/10.22257/kjp.2019.
  • Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams Jr, R. B., Alper, S., ... & Sowden, W. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490. https://doi.org/10.1177/2515245918810225
  • Klein, R., Ratliff, K., Vianello, M., Adams Jr, R., Bahník, S., Bernstein, M., ... & Nosek, B. (2014). Data from investigating variation in replicability: A “many labs” replication project. Journal of Open Psychology Data, 2(1). https://doi.org/10.5334/jopd.ad
  • Krstajic, D., Buturovic, L. J., Leahy, D. E., & Thomas, S. (2014). Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics, 6(1), 1-15. https://doi.org/10.1186/1758-2946-6-10
  • Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance. Sociological Methods & Research, 33(2), 188-229. https://doi.org/10.1177/0049124103262065
  • Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. New York: Springer. https://doi.org/10.1007/978-1-4614-6849-3
  • Lee, T., & MacCallum, R. C. (2015). Parameter influence in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 102-114. https://doi.org/10.1080/10705511.2014.935255
  • Lee, T., MacCallum, R. C., & Browne, M. W. (2018). Fungible parameter estimates in structural equation modeling. Psychological Methods, 23(1), 58-75. https://doi.org/10.1037/met0000130
  • Lee, T., & Shi, D. (2021). A comparison of full information maximum likelihood and multiple imputation in structural equation modeling with missing data. Psychological Methods, 26(4), 466-485. https://doi.org/10.1037/met0000381
  • Linhart, H., & Zucchini, W. (1986). Model selection. New York: Wiley.
  • Lubke, G. H., & Campbell, I. (2016). Inference based on the best-fitting model can contribute to the replication crisis: Assessing model selection uncertainty using a bootstrap approach. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 479-490. https://doi.org/10.1080/10705511.2016.1141355
  • MacCallum, R. C., & Tucker, L. R. (1991). Representing sources of error in the common-factor model: Implications for theory and practice. Psychological Bulletin, 109(3), 502-511. https://doi.org/10.1037/0033-2909.109.3.502
  • Mallows, C. L. (1973). Some comments on Cp. Technometrics, 28, 313-319. https://doi.org/10.2307/1268980
  • Miller, P. J., Lubke, G. H., McArtor, D. B., & Bergeman, C. S. (2016). Finding structure in data using multivariate tree boosting. Psychological Methods, 21(4), 583-602. https://doi.org/10.1037/met0000087
  • Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
  • Mosier, C. I. (1951). Symposium: The need and means of cross-validation. I. Problems and designs of cross-validation. Educational and Psychological Measurement, 11(1), 5-11. https://doi.org/10.1177/001316445101100101
  • Myung, I. J., Forster, M. R., & Browne, M. W. (2000). Guest editors' introduction: Special issue on model selection. Journal of Mathematical Psychology, 44(1), 1-2. https://doi.org/10.1006/jmps.1999.1273
  • Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535-569. https://doi.org/10.1080/10705510701575396
  • Pek, J., & Wu, H. (2018). Parameter uncertainty in structural equation models: Confidence sets and fungible estimates. Psychological Methods, 23(4), 635-653. https://doi.org/10.1037/met0000163
  • Preacher, K. J., & Merkle, E. C. (2012). The problem of model selection uncertainty in structural equation modeling. Psychological Methods, 17(1), 1. https://doi.org/10.1037/a0026804
  • Prendez, J. Y., & Harring, J. R. (2019). Measuring parameter uncertainty by identifying fungible estimates in SEM. Structural Equation Modeling: A Multidisciplinary Journal, 26(6), 893-904. https://doi.org/10.1080/10705511.2019.1608550
  • Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: The art of scientific computing (3rd ed.). Cambridge University Press.
  • Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-163. https://doi.org/10.2307/271063
  • Rocca, R., & Yarkoni, T. (2020, November 12). Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction. https://doi.org/10.31234/osf.io/e437b
  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464. https://doi.org/10.1214/aos/1176344136
  • Shi, D., Lee, T., & Maydeu-Olivares, A. (2019). Understanding the model size effect on SEM fit indices. Educational and Psychological Measurement, 79(2), 310-334. https://doi.org/10.1177/0013164418783530
  • Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69, 487-510. https://doi.org/10.1146/annurev-psych-122216-011845
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. https://doi.org/10.1177/0956797611417632
  • Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 111-133. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x
  • Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 44-47. https://doi.org/10.1111/j.2517-6161.1977.tb01603.x
  • Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting. Suri-Kagaku (Mathematical Sciences), 153, 12-18.
  • Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
  • Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7(1), 1-8. https://doi.org/10.1186/1471-2105-7-91
  • Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological Methods, 17(2), 228-243. https://doi.org/10.1037/a0027127
  • Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73(4), 691-703. https://doi.org/10.1007/s11336-008-9066-z
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(sup1), 1-19. https://doi.org/10.1080/00031305.2019.1583913
  • Wherry, R. J. (1951). IV. Comparison of cross-validation with statistical inference of betas and multiple R from a single sample. Educational and Psychological Measurement, 11(1), 23-28. https://doi.org/10.1177/001316445101100104
  • Wherry, R. J. (1975). Underprediction from overfitting: 45 years of shrinkage. Personnel Psychology, 28(1), 1-18. https://doi.org/10.1111/j.1744-6570.1975.tb00387.x
  • Wiggins, B. J., & Christopherson, C. D. (2019). The replication crisis in psychology: An overview for theoretical and philosophical psychology. Journal of Theoretical and Philosophical Psychology, 39(4), 202-217. https://doi.org/10.1037/teo0000137
  • Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122. https://doi.org/10.1177/1745691617693393
  • Yuan, K.-H., & Zhong, X. (2008). Outliers, leverage observations, and influential cases in factor analysis: Using robust procedures to minimize their effect. Sociological Methodology, 38, 329-368. https://doi.org/10.1111/j.1467-9531.2008.00198.x
  • Zucchini, W. (2000). An introduction to model selection. Journal of Mathematical Psychology, 44(1), 41-61. https://doi.org/10.1006/jmps.1999.1276