Korean Journal of Psychology : General

Korean Journal of Psychology : General - Vol. 38, No. 4

[ Special issue: Psychology and Fourth Industrial Revolution 1 ]
The Korean Journal of Psychology: General - Vol. 38, No. 4, pp.519-548
ISSN: 1229-067X (Print)
Print publication date 25 Dec 2019
Received 04 Nov 2019 Accepted 24 Dec 2019
DOI: https://doi.org/10.22257/kjp.2019.12.38.4.519

Studying Psychology using Big Data
Cheongtag Kim
Department of Psychology, Interdisciplinary Program in Cognitive Science, Seoul National University
Correspondence to: Cheongtag Kim, Department of Psychology, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul (08827). Tel: 02-880-6076, E-mail: ctkim@snu.ac.kr


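Abstract (Korean version)

The development of new technologies such as big data, machine learning, and AI changes how people think and behave, and makes it possible to observe a variety of human activities that were previously hard to access. As people use the internet extensively, individual behavior is also being stored on the internet. Because these data are very broad and diverse, analyzing them appropriately could widen the scope of our understanding of human psychology. This paper explores how these newly developed technologies can be used in psychological research. In particular, it discusses the characteristics of big data, a new kind of data made possible by technological development, and how they can be used in psychology. First, the paper examines the characteristics of big data and the role big data can play in psychology. It discusses the problems of the data-driven approach of big data analysis, which differs from the model-driven approach of psychology, and how such analysis can be applied to psychological research. Second, it examines data-analytic methodology. In conventional psychological research, data are collected under carefully constructed research designs, so analysis is relatively less important; in big data analysis, the role of data analysis becomes very important. Methods must be able to handle vast, unstructured data and to analyze non-numeric data such as language. In particular, the paper introduces the principles of topic modeling, ridge regression and lasso regression, support vector machines, neural networks, and deep learning, and discusses how they are applied in psychological research. Third, it examines the limitations of applying big data analysis in psychology, and finally it proposes ways of applying big data to psychological research.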

Abstract

The development of new technologies such as big data, machine learning, and artificial intelligence changes human behavior and thought. Increased use of the internet makes it possible to observe various human activities that were not observable before. Huge amounts of data about various types of human activities are being stored on the internet. Analyzing this information will help extend the scope of our understanding of human behavior and psychology. The present paper explores ways of applying these new technologies to psychological studies. Specifically, it focuses on what big data are like and how they can be used for psychological research. First, the paper reviews the characteristics of big data and their role in psychological research. In this context, it discusses the problems of the data-driven analytic approach adopted in big data analysis and the possibility of applying such methods to psychological research. Second, it reviews the data-analytic techniques used in big data analyses. These techniques should be able to deal with large, unorganized data sets and unstructured data such as pictures, video clips, and texts. Specifically, it reviews the basic principles of topic modeling, ridge and lasso regression, support vector machines, neural networks, and deep learning, and their application to psychological data. Third, it discusses the limitations of using big data in psychological research. Finally, it proposes ways of applying big data technology to psychological research.


Keywords: Big Data, Artificial Intelligence, Machine Learning, Topic Modeling, Deep Learning, Data-driven Analysis, Model-driven Analysis
