The problem with studies conducted by gathering opinions is that the measurement system is inherently flawed. Before examining the flaws, take a quick look at a sample Big Five personality test:
http://personality-testing.info/printable/big-five-personality-test.pdf
This test uses what social scientists call the "Likert scale". Usually, 4 or 5 choices are available, where each choice represents the feelings of the survey taker. The scale should be familiar, though its name might not be.
Likert scales are useful for simple surveys designed to summarize the general opinion of the public. The problem starts when social scientists try to use Likert scales to conduct statistical research.
Obviously, the Likert scale is limited by human irrationality. Respondents might avoid the extremes. They might respond in ways that make themselves look good. They might choose an extreme answer because they think the survey did not offer enough extreme options.
While all of these complications are worth recognizing, the real problem is that even if people were perfectly honest when answering surveys, and could accurately represent their opinions with a number, the survey would still have problems: problems caused by poor design. Unfortunately, most surveys seem to have poor design.
These are the instructions for the Big Five personality test from the link above: "[F]or each statement 1-50 mark how much you agree with on the scale 1-5, where 1=disagree, 2=slightly disagree, 3=neutral, 4=slightly agree and 5=agree. . . ." Two problems of design in this test are worth discussing.
The first problem is that the five points are treated as equidistant. But there is no justification for claiming that the difference between "agree" and "slightly agree" is the same as the difference between "neutral" and "slightly disagree". If anything, isn't the former pair of choices more similar than the latter pair? "Slightly agreeing" and "agreeing" people are on the same side. Strictly speaking, "slightly disagreeing" and "neutral" people are not.
To fix this problem, perhaps the set of scores {5, 4, 3, 2, 1} should be replaced with the set {5, 3, 0, -3, -5}. The idea is to accentuate the "differences of position"; exact values can be figured out by statisticians. (In addition, it is more intuitive to give negative scores for disagreement. Double negatives could mean something positive, but five "disagrees" should not equal one "agree", as the Likert scale seems to imply: five answers scored 1 add up to the same 5 as a single "agree".)
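To make the arithmetic concrete, here is a minimal Python sketch that compares the two scorings. The names `LIKERT`, `PROPOSED`, `total`, and `gap` are made up for illustration, and the alternative values are only the tentative ones suggested above, not a validated scale.

```python
# Standard scoring from the test instructions: 1=disagree ... 5=agree.
LIKERT = {"disagree": 1, "slightly disagree": 2, "neutral": 3,
          "slightly agree": 4, "agree": 5}

# Tentative alternative from this post (illustrative values only).
PROPOSED = {"disagree": -5, "slightly disagree": -3, "neutral": 0,
            "slightly agree": 3, "agree": 5}


def total(answers, scale):
    """Sum the numeric scores of a list of answer labels under a scale."""
    return sum(scale[a] for a in answers)


def gap(scale, a, b):
    """Numeric distance between two answer labels under a scale."""
    return scale[a] - scale[b]


five_disagrees = ["disagree"] * 5
one_agree = ["agree"]

# Under the standard scoring, five "disagrees" total the same as one "agree".
print(total(five_disagrees, LIKERT), total(one_agree, LIKERT))      # 5 5
# Under the alternative, disagreement pulls the total below zero.
print(total(five_disagrees, PROPOSED), total(one_agree, PROPOSED))  # -25 5

# Same-side answers sit closer together than answers that cross into neutral.
print(gap(PROPOSED, "agree", "slightly agree"))       # 2
print(gap(PROPOSED, "neutral", "slightly disagree"))  # 3
```

Under the standard scoring both of those gaps are exactly 1, which is the equidistance assumption being questioned here.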
The second problem is that the statements are all given the same weight. After the respondent completes marking the 50 questions, the results are added up for each category.
A few sample statements from the personality test illustrate the point.
Statements 25 and 50, one about having excellent ideas and the other about being full of ideas, both belong to the category called openness. This means that the scores from these two statements will be added up. But isn't it harder to have excellent ideas than it is to be full of regular ideas?
Statements 5 and 40, which concern having a rich vocabulary and using difficult words, are also based on openness. Again, these unequal statements are treated equally. (One must possess a rich vocabulary before being able to use difficult words correctly.) Furthermore, the definition of openness from the text (Openness to experience is the personality trait of seeking new experience and intellectual pursuits) suggests that statements concerning ideas should count more than statements about vocabulary size.
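The difference between equal and unequal weighting can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration: the sample answers are invented, the weights are arbitrary, and the simple sum ignores any other details of how the actual test is scored.

```python
# Hypothetical answers (1-5) to four openness statements, keyed by statement number.
answers = {5: 4, 25: 2, 40: 4, 50: 5}

# Equal weighting, as the test does: every statement counts the same.
equal_weights = {5: 1.0, 25: 1.0, 40: 1.0, 50: 1.0}

# Illustrative weights only: the idea statements (25, 50) count double
# compared with the vocabulary statements (5, 40).
idea_weights = {5: 1.0, 25: 2.0, 40: 1.0, 50: 2.0}


def weighted_score(answers, weights):
    """Add up each answer multiplied by the weight of its statement."""
    return sum(weights[q] * score for q, score in answers.items())


print(weighted_score(answers, equal_weights))  # 15.0
print(weighted_score(answers, idea_weights))   # 22.0
```

Choosing defensible weights is the hard part; the sketch only shows that the category total depends on them.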
This personality test has many more problems that could be discussed. For one thing, the test, which attempts to measure conscientiousness, has 50 statements. An unconscientious person is not going to meticulously answer all 50 statements. And statement 31 rests on a hidden assumption.
This statement, which is supposed to measure extroversion, presupposes that respondents go to parties. This post, however, was not written to criticize all the problems of the personality test; it was written to criticize the scale that the test uses.
The fundamental problem is the problem of measurement. The Likert scale fails to assign accurate numbers that truly represent the respondents' feelings. Thus any study that relies on a value obtained through the Likert scale is subject to criticism. The results from any such study are statistically meaningless unless the questions that measure the variables are carefully evaluated. Likert scales are convenient, but without fixing the flaws in measurement, convenient is all they will be.
1. Have I told you how much I hate personality tests?
2. And you didn't even address the fundamental issue of "neutral" on a Likert scale. Like does it mean you're stuck in the middle? Half/half? 50% I agree, 50% I disagree? Or it means you have no opinion? Is that even possible? To have no thoughts on it to not have an opinion? Or does it mean not applicable? Lol. So if neutral is 50/50 then is slightly agree 60% agree and 40% disagree? And the fallacies go on and on and on.
3. The logical fallacies in the questions themselves are, I think, mainly because they're simply trying to ask the same question again in different wording: "Have excellent ideas" & "Full of ideas", and "Have a rich vocabulary" & "Use difficult words". Obviously, if you look at the meaning, they're not saying the same thing. But this is just personality tests trying to ask the same question several times to remove room for error (but they're just creating more, aren't they?). Ask the same question 10 times, and even if you answer that you're not very open on two of them, your true personality is supposed to come out in the other 8. But obviously, the examples you used are far worse than relatively well-made personality tests. But yeah, you're not here to cut down those tests. But I am. HAHA...