Korean Journal of Psychology : General



[ Article ]
The Korean Journal of Psychology: General - Vol. 40, No. 4, pp. 389-413
ISSN: 1229-067X (Print)
Print publication date 25 Dec 2021
Received 06 Nov 2021 Accepted 26 Nov 2021
DOI: https://doi.org/10.22257/kjp.2021.

Principles and methods for model assessment in psychological research in the era of big-data and machine learning
Taehun Lee
Department of Psychology, Chung-Ang University
Correspondence to: Taehun Lee, Associate Professor, Department of Psychology, College of Social Sciences, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 155-756. Tel: 02-820-5124, E-mail: lee0267@cau.ac.kr



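This article aims to introduce psychology researchers to the principles of model estimation and assessment that have been discussed steadily in quantitative psychology over the past several decades. The core of that discussion is that 1) candidate models are not the true model but approximating models; 2) the discrepancy between the true model and an approximating model does not disappear even as the sample size grows without bound; and therefore 3) among several candidate models, one should select the approximating model whose estimated discrepancy from the true model is smallest. We explain that this model selection principle is identical to the principle of model assessment adopted in machine learning, a field that, in the era of the Fourth Industrial Revolution, is extending its reach across many disciplines. Specifically, machine learning selects models by estimating the generalization or prediction error, that is, a model's performance on new cases not seen during training; this is the same principle as the one advocated in quantitative psychology, namely that models should be selected by estimating the overall discrepancy between the approximating model and the true model. The second aim of this article is to build on this understanding of model selection to discuss the overfitting problem produced by the current practice in psychology of analyzing a given data set "thoroughly," together with remedies for it. In particular, we introduce cross-validation, which is the most widely used method in machine learning and has long been discussed in quantitative psychology (Mosier, 1951), as an estimator of generalization error, and we encourage its use.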
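
The claim that an approximating model's discrepancy from the true model does not vanish as the sample grows can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the author's analysis: it posits a hypothetical true model y = 1 + 2x + 1.5x^2 + e with x ~ Uniform(-1, 1) and e ~ Normal(0, 0.5^2), fits a straight line (the approximating model) by ordinary least squares, and estimates the line's mean squared error on fresh data drawn from the true model. All function names are hypothetical. Under these assumptions the best line in the population is y = 1.5 + 2x, whose residual 1.5(x^2 - 1/3) has variance 0.2, so the line's error should level off near 0.25 + 0.2 = 0.45 instead of approaching the true model's own error of 0.25.

```python
import random

# Hypothetical "true" model, chosen only for illustration:
# y = 1 + 2x + 1.5x^2 + e,  x ~ Uniform(-1, 1),  e ~ Normal(0, 0.5^2)
SIGMA = 0.5


def draw(n, rng):
    """Draw n (x, y) pairs from the hypothetical true model."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + 1.5 * x * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for the approximating model y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b  # (intercept, slope)


def mse(params, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    a, b = params
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def overall_discrepancy_curve(sizes, n_test=50_000, seed=1):
    """Monte Carlo estimate of the fitted line's error on fresh data from
    the true model, one estimate per calibration sample size."""
    rng = random.Random(seed)
    test_x, test_y = draw(n_test, rng)  # large sample standing in for the population
    out = {}
    for n in sizes:
        xs, ys = draw(n, rng)
        out[n] = mse(fit_line(xs, ys), test_x, test_y)
    return out


if __name__ == "__main__":
    for n, err in overall_discrepancy_curve([20, 200, 2_000, 20_000]).items():
        # The true model's own error is SIGMA**2 = 0.25; for large n the line
        # levels off near 0.45 because the approximation part (0.2) remains.
        print(f"n = {n:>6}: test MSE of straight line = {err:.3f}")
```

Read under squared-error loss, each estimate plays the role of the overall discrepancy: the estimation component shrinks as n grows, but the approximation component (0.2 under these assumptions) does not.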


The objective of the present article is to explain principles of estimation and assessment for statistical models in psychological research. These principles have been actively discussed over the past few decades in mathematical and quantitative psychology. The essence of the discussion is as follows: 1) candidate models should be regarded as approximating models rather than as the true model; 2) the discrepancy between a candidate model and the true model does not disappear even in the population; and therefore 3) it is best to select the approximating model exhibiting the smallest discrepancy from the true model. In quantitative psychology, the discrepancy between the true model and a candidate model fitted to the sample has been referred to as the overall discrepancy. In machine learning, models are assessed by the extent to which their performance generalizes to new, unseen samples beyond the training sample; performance on unseen data is quantified by the generalization error, also called the prediction error. The present article shows that the principle of model assessment based on overall discrepancy, advocated in quantitative psychology, is identical to the principle based on generalization/prediction error firmly adopted in machine learning. Another objective is to help readers appreciate that questionable data-analytic practices widely tolerated in psychology, such as HARKing (Kerr, 1998) and questionable research practices (QRPs; Simmons et al., 2011), are likely causes of overfitting in individual studies, which in turn have collectively given rise to the recent debates over the replication crisis in psychology.
As a remedy for these practices, this article reintroduces cross-validation methods, whose discussion in psychology dates back at least to the 1950s (Mosier, 1951), framing them as estimators of the generalization/prediction error in the hope of reducing overfitting in psychological research.
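
To make the contrast between training error and a cross-validation estimate of generalization error concrete, the following sketch (a hypothetical illustration, not code from the article) fits polynomials of increasing degree to one simulated sample of n = 30 from a hypothetical quadratic model and reports both the training MSE and a 5-fold cross-validation MSE. The data-generating model, the fold count, the seeds, and all function names are assumptions. Polynomials are fitted through the normal equations in plain Python to keep the block self-contained; a QR-based least-squares solver would be preferable in real use.

```python
import random


def poly_fit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations, solved by
    Gaussian elimination with partial pivoting. Adequate for low degrees
    with x in [-1, 1]; not a numerically robust general-purpose solver."""
    p = degree + 1
    # Normal equations A c = r, A[j][k] = sum x^(j+k), r[j] = sum y * x^j
    a = [[sum(x ** (j + k) for x in xs) for k in range(p)] for j in range(p)]
    r = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, p):
            f = a[row][col] / a[col][col]
            for k in range(col, p):
                a[row][k] -= f * a[col][k]
            r[row] -= f * r[col]
    coef = [0.0] * p
    for j in reversed(range(p)):  # back substitution
        s = r[j] - sum(a[j][k] * coef[k] for k in range(j + 1, p))
        coef[j] = s / a[j][j]
    return coef


def predict(coef, x):
    return sum(c * x ** j for j, c in enumerate(coef))


def mse(coef, xs, ys):
    return sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def kfold_cv_mse(xs, ys, degree, k=5, seed=0):
    """k-fold CV estimate of generalization (prediction) error: every case
    is predicted by a model fitted to the other k - 1 folds only."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    sse = 0.0
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        coef = poly_fit([xs[i] for i in tr], [ys[i] for i in tr], degree)
        sse += sum((ys[i] - predict(coef, xs[i])) ** 2 for i in fold)
    return sse / len(xs)


def compare(max_degree=6, n=30, seed=2):
    """Training vs. 5-fold CV error for polynomial degrees 1..max_degree."""
    # Hypothetical data-generating model: y = 1 + 2x + 1.5x^2 + N(0, 0.5^2)
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + 1.5 * x * x + rng.gauss(0.0, 0.5) for x in xs]
    rows = []
    for d in range(1, max_degree + 1):
        train = mse(poly_fit(xs, ys, d), xs, ys)
        rows.append((d, train, kfold_cv_mse(xs, ys, d)))
    return rows


if __name__ == "__main__":
    for d, train, cv in compare():
        print(f"degree {d}: training MSE = {train:.3f}, 5-fold CV MSE = {cv:.3f}")
```

Training MSE can only fall as terms are added to a nested least-squares model, so choosing whichever model fits the given data best rewards overfitting. The cross-validation estimate, by contrast, would be expected to turn upward once the extra terms begin fitting noise, which is what makes it usable as an estimator of prediction error for model selection.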

Keywords: overfitting, generalization error, training error, cross-validation, bias-variance tradeoff


This research was supported by the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) (No. 2020R1H1A1102581).

1. Agler, R. A., & De Boeck, P. (2020). Factors associated with sensitive regression weights: A fungible parameter approach. Behavior Research Methods, 52(1), 207-22. https://doi.org/10.3758/s13428-019-01220-6
2. Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), 2nd International Symposium on Information Theory. Budapest: Akadémiai Kiadó.
3. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. https://doi.org/10.1109/TAC.1974.1100705
4. Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16(1), 125-127. https://doi.org/10.1080/00401706.1974.10489157
5. Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40-79. https://doi.org/10.1214/09-SS054
6. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees (Wadsworth Statistics/Probability Series). Belmont, CA: Wadsworth Advanced Books and Software.
7. Bozdogan, H. (2000). Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, 44(1), 62-91. https://doi.org/10.1006/jmps.1999.1277
8. Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44(1), 108-132. https://doi.org/10.1006/jmps.1999.1279
9. Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24(4), 445-455. https://doi.org/10.1207/s15327906mbr2404_4
10. Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
11. Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A practical information-theoretic approach. New York: Springer.
12. Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261-304. https://doi.org/10.1177/0049124104268644
13. Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276. https://doi.org/10.1207/s15327906mbr0102_10
14. Chapman, B. P., Weiss, A., & Duberstein, P. R. (2016). Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development. Psychological Methods, 21(4), 603. https://doi.org/10.1037/met0000088
15. Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3), 419-444. https://doi.org/10.2307/2983440
16. Chung, H. Y., Lee, K. W., & Koo, J. Y. (1996). A note on bootstrap model selection criterion. Statistics & Probability Letters, 26(1), 35-41. https://doi.org/10.1016/0167-7152(94)00249-5
17. Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research, 18(2), 147-167. https://doi.org/10.1207/s15327906mbr1802_2
18. Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structures analysis and the "problem" of sample size: A clarification. Psychological Bulletin, 109(3), 512-519. https://doi.org/10.1037/0033-2909.109.3.512
19. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
20. Enders, C. K., & Mansolf, M. (2018). Assessing the fit of structural equation models with multiply imputed data. Psychological Methods, 23(1), 76-93. https://doi.org/10.1037/met0000102
21. Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350), 320-328. https://doi.org/10.1080/01621459.1979.10481632
22. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York, NY: Springer.
23. Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55. https://doi.org/10.1080/10705519909540118
24. Hurvich, C. M., & Tsai, C. L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297-307. https://doi.org/10.1093/biomet/76.2.297
25. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
26. Jones, J. A., & Waller, N. G. (2016). Fungible weights in logistic regression. Psychological Methods, 21(2), 241-260. https://doi.org/10.1037/met0000060
27. Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217. https://doi.org/10.1207/s15327957pspr0203_4
28. Kim, C. (2019). Studying psychology using big data. Korean Journal of Psychology: General, 38(4), 519-548. http://dx.doi.org/10.22257/kjp.2019.
29. Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams Jr, R. B., Alper, S., ... & Sowden, W. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490. https://doi.org/10.1177/2515245918810225
30. Klein, R., Ratliff, K., Vianello, M., Adams Jr, R., Bahník, S., Bernstein, M., ... & Nosek, B. (2014). Data from investigating variation in replicability: A “many labs” replication project. Journal of Open Psychology Data, 2(1). https://doi.org/10.5334/jopd.ad
31. Krstajic, D., Buturovic, L. J., Leahy, D. E., & Thomas, S. (2014). Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics, 6(1), 1-15. https://doi.org/10.1186/1758-2946-6-10
32. Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance. Sociological Methods & Research, 33(2), 188-229. https://doi.org/10.1177/0049124103262065
33. Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. New York: Springer.
34. Lee, T., & MacCallum, R. C. (2015). Parameter influence in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 102-114. https://doi.org/10.1080/10705511.2014.935255
35. Lee, T., MacCallum, R. C., & Browne, M. W. (2018). Fungible parameter estimates in structural equation modeling. Psychological Methods, 23(1), 58-75. https://doi.org/10.1037/met0000130
36. Lee, T., & Shi, D. (2021). A comparison of full information maximum likelihood and multiple imputation in structural equation modeling with missing data. Psychological Methods, 26(4), 466-485. https://doi.org/10.1037/met0000381
37. Lubke, G. H., & Campbell, I. (2016). Inference based on the best-fitting model can contribute to the replication crisis: Assessing model selection uncertainty using a bootstrap approach. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 479-490. https://doi.org/10.1080/10705511.2016.1141355
38. MacCallum, R. C., & Tucker, L. R. (1991). Representing sources of error in the common-factor model: Implications for theory and practice. Psychological Bulletin, 109(3), 502-511. https://doi.org/10.1037/0033-2909.109.3.502
39. Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15(4), 661-675.
40. Miller, P. J., Lubke, G. H., McArtor, D. B., & Bergeman, C. S. (2016). Finding structure in data using multivariate tree boosting. Psychological Methods, 21(4), 583-602. https://doi.org/10.1037/met0000087
41. Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
42. Mosier, C. I. (1951). Symposium: The need and means of cross-validation. I. Problems and designs of cross-validation. Educational and Psychological Measurement, 11(1), 5-11. https://doi.org/10.1177/001316445101100101
43. Linhart, H., & Zucchini, W. (1986). Model selection. New York: Wiley.
44. Myung, I. J., Forster, M. R., & Browne, M. W. (2000). Guest editors' introduction: Special issue on model selection. Journal of Mathematical Psychology, 44(1), 1-2. https://doi.org/10.1006/jmps.1999.1273
45. Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535-569. https://doi.org/10.1080/10705510701575396
46. Pek, J., & Wu, H. (2018). Parameter uncertainty in structural equation models: Confidence sets and fungible estimates. Psychological Methods, 23(4), 635-653. https://doi.org/10.1037/met0000163
47. Preacher, K. J., & Merkle, E. C. (2012). The problem of model selection uncertainty in structural equation modeling. Psychological Methods, 17(1), 1.
48. Prendez, J. Y., & Harring, J. R. (2019). Measuring parameter uncertainty by identifying fungible estimates in SEM. Structural Equation Modeling: A Multidisciplinary Journal, 26(6), 893-904. https://doi.org/10.1080/10705511.2019.1608550
49. Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: The art of scientific computing (3rd ed.). Cambridge: Cambridge University Press.
50. Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-163. https://doi.org/10.2307/271063
51. Rocca, R., & Yarkoni, T. (2020, November 12). Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction. PsyArXiv. https://doi.org/10.31234/osf.io/e437b
52. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464. http://www.jstor.org/stable/2958889
53. Shi, D., Lee, T., & Maydeu-Olivares, A. (2019). Understanding the model size effect on SEM fit indices. Educational and Psychological Measurement, 79(2), 310-334. https://doi.org/10.1177/0013164418783530
54. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. https://doi.org/10.1177/0956797611417632
55. Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69, 487-510. https://doi.org/10.1146/annurev-psych-122216-011845
56. Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 111-133. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x
57. Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 44-47. https://doi.org/10.1111/j.2517-6161.1977.tb01603.x
58. Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting. Suri-Kagaku (Mathematical Sciences), 153, 12-18.
59. Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
60. Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7(1), 1-8. https://doi.org/10.1186/1471-2105-7-91
61. Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological Methods, 17(2), 228-243. https://doi.org/10.1037/a0027127
62. Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73(4), 691-703. https://doi.org/10.1007/S11336-008-9066-Z
63. Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(sup1), 1-19. https://doi.org/10.1080/00031305.2019.1583913
64. Wiggins, B. J., & Christopherson, C. D. (2019). The replication crisis in psychology: An overview for theoretical and philosophical psychology. Journal of Theoretical and Philosophical Psychology, 39(4), 202-217. https://doi.org/10.1037/teo0000137
65. Wherry, R. J. (1951). IV. Comparison of cross-validation with statistical inference of betas and multiple R from a single sample. Educational and Psychological Measurement, 11(1), 23-28. https://doi.org/10.1177/001316445101100104
66. Wherry, R. J. (1975). Underprediction from overfitting: 45 years of shrinkage. Personnel Psychology, 28(1), 1-18. https://doi.org/10.1111/j.1744-6570.1975.tb00387.x
67. Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122. https://doi.org/10.1177/1745691617693393
68. Yuan, K.-H., & Zhong, X. (2008). Outliers, leverage observations, and influential cases in factor analysis: Using robust procedures to minimize their effect. Sociological Methodology, 38, 329-368. https://doi.org/10.1111/j.1467-9531.2008.00198.x
69. Zucchini, W. (2000). An introduction to model selection. Journal of Mathematical Psychology, 44(1), 41-61. https://doi.org/10.1006/jmps.1999.1276