TY - JOUR
T1 - The illusion of data validity
T2 - Why numbers about people are likely wrong
AU - Jansen, Bernard J.
AU - Salminen, Joni
AU - Jung, Soon-gyo
AU - Almerekhi, Hind
N1 - Publisher Copyright: © 2022 Wuhan University
PY - 2022/10
Y1 - 2022/10
AB - This reflection article addresses a difficulty faced by scholars and practitioners working with numbers about people, which is that those who study people want numerical data about these people. Unfortunately, time and time again, this numerical data about people is wrong. Addressing the potential causes of this wrongness, we present examples of analyzing people numbers, i.e., numbers derived from digital data by or about people, and discuss the comforting illusion of data validity. We first lay a foundation by highlighting potential inaccuracies in collecting people data, such as selection bias. Then, we discuss inaccuracies in analyzing people data, such as the flaw of averages, followed by a discussion of errors that are made when trying to make sense of people data through techniques such as posterior labeling. Finally, we discuss a root cause of people data often being wrong – the conceptual conundrum of thinking the numbers are counts when they are actually measures. Practical solutions to address this illusion of data validity are proposed. The implications for theories derived from people data are also highlighted, namely that these people theories are generally wrong as they are often derived from people numbers that are wrong.
KW - Measurement
KW - People data
KW - Quantitative paradigm
KW - Statistics
UR - http://www.scopus.com/inward/record.url?scp=85144362127&partnerID=8YFLogxK
U2 - 10.1016/j.dim.2022.100020
DO - 10.1016/j.dim.2022.100020
M3 - Article
AN - SCOPUS:85144362127
SN - 2543-9251
VL - 6
JO - Data and Information Management
JF - Data and Information Management
IS - 4
M1 - 100020
ER -