Social Media Data Predicts Diverse Personality Traits

Summary: Personality traits—such as empathy and extraversion—can be inferred from specific social network features, while a broader set of traits, lifestyle indicators, and mental health signals can be predicted from language use on social media.

Source: NICT

Researchers Masahiko Haruno and Kazuma Mori at the Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology (NICT), applied machine learning to Twitter data to predict a wide range of personality traits and attributes, including intelligence and extraversion. Using component-wise gradient boosting, the study shows that distinct types of Twitter information—network metrics (such as number of tweets and likes), temporal posting patterns, word statistics, and actual word usage—have different strengths for predicting interpersonal and mental health-related traits. Network features were particularly informative for social traits like extraversion, while natural language features better predicted mental health-related attributes such as anxiety.

The peer-reviewed study was published online in the Journal of Personality on August 20, 2020. It contributes to growing interest in how digital footprints from social networking services (SNS) reflect human personality and behavior, and how different data modalities from SNS may be used for psychological assessment or personalized interventions.

Although previous research established links between SNS activity and the Big Five personality dimensions, this study asked a more detailed question: which specific types of SNS-derived information are useful for predicting a wider array of personality traits and attributes beyond the Big Five? To answer this, the team collected intensive behavioral and self-report data and trained machine learning models to compare the predictive power of four distinct SNS data types.

The dataset included 239 participants (156 men and 83 women; mean age 22.4 years) who completed personality assessments covering 24 traits with a total of 52 subscales. From each participant’s Twitter account, the researchers extracted four classes of features: network metrics, time-based activity patterns, aggregated word statistics, and bag-of-words usage. Models were trained with component-wise gradient boosting and evaluated with repeated 10-fold cross-validation on their ability to predict each measured trait and subscale.
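For readers unfamiliar with component-wise gradient boosting, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes a squared-error loss and one linear base learner per feature. The function names, step count, and learning rate are hypothetical choices made for illustration. At each step, only the single feature that best explains the current residuals is updated, so uninformative features tend to keep a coefficient of zero.

```python
import numpy as np


def fit_componentwise_boosting(X, y, n_steps=200, learning_rate=0.1):
    """Component-wise L2 boosting with one linear base learner per feature.

    Illustrative sketch only; hyperparameters are assumed, not from the study.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize features so the per-feature fits are comparable.
    mean = X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0  # constant columns stay at zero after centering
    Z = (X - mean) / scale

    intercept = y.mean()
    coef = np.zeros(Z.shape[1])
    residual = y - intercept

    sq_norm = (Z ** 2).sum(axis=0)
    sq_norm[sq_norm == 0] = 1.0  # avoid division by zero for constant columns

    for _ in range(n_steps):
        # Least-squares slope of the residual on each feature separately.
        beta = Z.T @ residual / sq_norm
        # Reduction in squared error if that feature alone were fitted.
        gain = beta ** 2 * sq_norm
        best = int(np.argmax(gain))
        # Take a small step along the best single feature only.
        coef[best] += learning_rate * beta[best]
        residual -= learning_rate * beta[best] * Z[:, best]

    return intercept, coef, mean, scale


def predict(model, X):
    """Predict scores from a model returned by fit_componentwise_boosting."""
    intercept, coef, mean, scale = model
    Z = (np.asarray(X, dtype=float) - mean) / scale
    return intercept + Z @ coef
```

In a setting like the one described, each row of `X` would hold one participant's Twitter-derived features and `y` would hold that participant's score on one subscale, with a separate model fitted per subscale.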

Overall, the four types of Twitter information collectively produced reliable predictions for 23 of the 52 subscales. Correlations between measured and predicted scores averaged about 0.25, indicating modest prediction accuracy for individuals but meaningful relationships at the group level. Extraversion showed a stronger relationship: predicted and measured Big Five extraversion scores correlated at r = 0.44 (calculated using 10-fold cross-validation repeated 10 times; Bonferroni-corrected significance threshold of 0.05/52, about 0.00096).
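The evaluation described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the study's code. Ordinary least squares stands in for the actual model, and averaging the correlation over repeats is one plausible reading of "repeated 10 times". All function names are hypothetical. The Bonferroni step simply divides the 0.05 significance level by the 52 subscales tested.

```python
import numpy as np


def repeated_kfold_correlation(X, y, fit, predict, n_splits=10, n_repeats=10, seed=0):
    """Mean Pearson r between measured scores and out-of-fold predictions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    correlations = []
    for _ in range(n_repeats):
        # Reshuffle participants for each repeat, then split into folds.
        order = rng.permutation(len(y))
        folds = np.array_split(order, n_splits)
        predicted = np.empty(len(y))
        for test_idx in folds:
            train_mask = np.ones(len(y), dtype=bool)
            train_mask[test_idx] = False
            # Fit on the training folds, predict only the held-out fold.
            model = fit(X[train_mask], y[train_mask])
            predicted[test_idx] = predict(model, X[test_idx])
        correlations.append(np.corrcoef(predicted, y)[0, 1])
    return float(np.mean(correlations))


def fit_ols(X, y):
    """Stand-in model: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def predict_ols(weights, X):
    """Predict with weights returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ weights


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Threshold implied by testing 52 subscales at an overall level of 0.05.
threshold = bonferroni_threshold(0.05, 52)
```

Because every prediction comes from a model that never saw that participant during training, the resulting correlation estimates how well the approach generalizes to new users rather than how well it fits the sample.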

Figure caption: The National Institute of Information and Communications Technology reports using machine learning to analyze Twitter behavior and predict a range of personality traits and attributes. The study found that network features (e.g., number of tweets and likes) and word usage on Twitter are predictive of social traits like extraversion and mental health traits such as anxiety. Image credit: National Institute of Information and Communications Technology (NICT).

Detailed findings highlight how different data types specialize in predicting different traits. Network features were effective for interpersonal and social characteristics — for example, extraversion, empathy, and autism-related traits. In contrast, natural language features (word statistics and bag-of-words) were better predictors of socioeconomic indicators and health-relevant behaviors such as smoking and drinking, and also showed predictive power for depression and schizophrenia-related measures. Time-based features were generally less predictive but did show statistically significant associations with cognitive measures such as intelligence and with social value orientation.

While current prediction accuracy is not sufficient to make definitive clinical or diagnostic decisions for individuals, the authors emphasize the value of these methods at scale: with larger populations, the models can yield informative trends useful for mental health screening, population-level research, and possibly tailored behavioral nudges. The approach may also provide insights into neural and behavioral mechanisms underlying individual differences in personality.

About this psychology research article

Source: NICT
Contacts: HIROTA Sachiko – NICT
Image Source: Image credited to National Institute of Information and Communications Technology (NICT).

Original Research: Open access. “Differential ability of network and natural language information on social media to predict interpersonal and mental health traits” by Kazuma Mori and Masahiko Haruno. Journal of Personality. DOI: 10.1111/jopy.12578

Abstract

Differential ability of network and natural language information on social media to predict interpersonal and mental health traits

Objective
This study investigates how different categories of social media information predict a broader set of interpersonal and mental health traits beyond the traditional Big Five dimensions. Prior work has shown that SNS footprints can reflect personality, but the relative strengths of various SNS data types remained unclear.

Method
We collected comprehensive assessments covering 24 traits (52 subscales) from N = 239 participants and extracted four types of Twitter-derived information: network features, temporal patterns, word statistics, and bag-of-words. Machine learning models based on component-wise gradient boosting were trained and tested to predict each trait from these feature sets.

Results
Collectively, the four SNS data types reliably predicted 23 subscales. Network metrics and word statistics demonstrated complementary strengths: network features were more informative for interpersonal traits such as autism-related measures, while natural language features predicted mental health-related traits including schizophrenia and anxiety. Intelligence showed predictive signal across all four information types.

Conclusions
Different types of SNS information can together predict a wider range of human traits and attributes than previously recognized. Each information type brings unique predictive value for specific traits, suggesting that integrated SNS-based analyses are a promising tool for personality research, mental health monitoring, and applications in information technology.