Reliability, Validity, Utility

Assessment • Professional Development • Practice Problem • Easy

Created by Niña Montebon

141 Slides • 57 Questions

2

Multiple Choice

Which statement about reliability is TRUE?

A) A test must be highly reliable to be valid.
B) A test can be valid even if it has low reliability.
C) A test can be reliable but not valid.
D) Reliability and validity are the same concept.

4

Multiple Choice

Which of the following is NOT a definition of reliability?

A) Reliability refers to the consistency of test scores obtained by the same persons when they are re-examined with the same test on different occasions, or with different sets of equivalent items, or under varying examining conditions.

B) Reliability is the extent to which a score or measure is free from measurement error. Theoretically, reliability is the ratio of true score variance to observed score variance.

C) Reliability refers to the consistency in measurement; the extent to which measurements differ from occasion to occasion as a function of measurement error.

D) Reliability is the degree to which a test measures what it intends to measure across different conditions and populations.
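
The ratio definition in option B can be illustrated with a short simulation; a minimal Python sketch, with hypothetical distribution parameters:

```python
import numpy as np

# Illustration of reliability as true-score variance over observed-score
# variance. All distribution parameters here are hypothetical.
rng = np.random.default_rng(0)
n = 10_000

true_scores = rng.normal(loc=100, scale=15, size=n)  # latent true scores T
error = rng.normal(loc=0, scale=5, size=n)           # random measurement error E
observed = true_scores + error                       # observed scores X = T + E

# Theoretical value: 15**2 / (15**2 + 5**2) = 0.90
reliability = true_scores.var() / observed.var()
```

With a large sample the estimated ratio lands close to the theoretical 0.90; shrinking the error standard deviation pushes it toward 1.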

6

Multiple Choice

In psychological testing, what does the term "error" primarily refer to?
A) Mistakes made by test administrators
B) Random inaccuracies inherent in measurement
C) Poorly designed test items
D) Differences in individual intelligence levels

8

Multiple Choice

Which variance should be higher in psychological measurement for a test to be considered reliable?
A) Error variance
B) Environmental variance
C) True variance
D) Test-taker variance

10

Multiple Choice

Which of the following best defines measurement error?
A) A systematic flaw in a test’s design
B) The difference between an individual’s true score and observed score
C) The extent to which a test measures what it claims to measure
D) A type of error caused by examiner-related variables

11

Multiple Choice

Which factor is LEAST likely to introduce random error in psychological testing?
A) Fatigue of the test taker
B) Variations in test administration
C) A misprinted question in the test
D) Test-takers’ varying levels of motivation

13

Multiple Choice

Item sampling or content sampling is considered a source of error variance because:
A) Test-takers may perform differently depending on which items are selected for the test.
B) It is impossible to create test items that measure the same construct.
C) Errors in scoring contribute more significantly to reliability issues than item sampling.
D) Test-takers are always equally familiar with all test items.

18

Multiple Choice

A test-taker's score is composed of a true score and an error component. What equation best represents this concept?
A) X = T - E
B) X = T / E
C) X = T + E
D) X = E - T
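
The classical true-score model behind the correct equation can be checked numerically; a minimal sketch (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(50, 10, size=50_000)  # true scores
E = rng.normal(0, 4, size=50_000)    # random error, independent of T
X = T + E                            # the classical model: X = T + E

# When error is independent of the true score, observed variance splits
# into true variance plus error variance: close to 10**2 + 4**2 = 116.
observed_variance = X.var()
```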

21

Multiple Choice

Which of the following best defines the Domain Sampling Model in measurement theory?
A) The concept that a test score is based on a limited sample of items from a larger domain.
B) The idea that a test should include all possible items from a domain to be valid.
C) The assumption that all test items should be identical to maintain reliability.
D) The belief that measurement error can be completely eliminated with enough test items.

26

Multiple Choice

In the Domain Sampling Model, which factor primarily affects the reliability of a test?
A) The total number of test-takers who complete the assessment.
B) The selection of items representing the broader domain of content.
C) The specific order in which test items are presented.
D) The speed at which test-takers complete the assessment.

46

Multiple Choice

Which type of reliability measures the consistency of scores obtained by the same individuals when tested at different points in time?
A) Internal consistency reliability
B) Parallel forms reliability
C) Inter-rater reliability
D) Test-retest reliability

47

Multiple Choice

A researcher develops two versions of a psychological test to ensure the consistency of results across different test forms. Which type of reliability is being assessed?
A) Internal consistency reliability
B) Inter-rater reliability
C) Parallel forms reliability
D) Test-retest reliability

48

Multiple Choice

A psychologist splits a test into two halves and measures the correlation between them to determine reliability. What is this method called?
A) Parallel forms reliability
B) Test-retest reliability
C) Inter-rater reliability
D) Split-half reliability

49

Multiple Choice

What does internal consistency reliability primarily measure?
A) The degree to which test scores remain stable over time
B) The extent to which test items measure the same underlying construct
C) The similarity of results obtained from two equivalent forms of a test
D) The consistency of scores given by different raters

50

Multiple Choice

A student takes a personality test twice, six weeks apart, and receives significantly different scores. What does this suggest about the test?
A) It has low test-retest reliability
B) It has strong parallel forms reliability
C) It has high internal consistency
D) It has excellent inter-rater reliability

51

Multiple Choice

Which statistical method is most commonly used to estimate internal consistency reliability?
A) Pearson correlation coefficient
B) Spearman’s rank correlation
C) Cronbach’s alpha
D) Cohen’s kappa

52

Multiple Choice

A researcher wants to assess the reliability of scores given by multiple raters. Which statistic should be used?
A) Cronbach’s alpha
B) Cohen’s kappa
C) Spearman-Brown formula
D) Standard error of measurement
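
Cohen's kappa corrects raw agreement for agreement expected by chance; a minimal sketch with hypothetical ratings from two raters:

```python
from collections import Counter

# Hypothetical categorical ratings from two raters on the same 10 cases.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement

# Chance agreement from each rater's marginal category proportions.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a)

# Kappa: agreement beyond chance, scaled by the maximum possible improvement.
kappa = (observed - expected) / (1 - expected)
```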

53

Multiple Choice

A test developer wants to determine how much the reliability of a test would change if its length were increased. Which statistical formula should be applied?
A) Spearman-Brown formula
B) Standard error of measurement
C) Cohen’s kappa
D) Intraclass correlation coefficient
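
The Spearman-Brown prophecy formula itself is compact; a minimal sketch:

```python
def spearman_brown(r: float, length_factor: float) -> float:
    """Predicted reliability if test length is multiplied by length_factor."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Doubling a test with reliability .70 raises the estimate to about .82;
# halving the same test drops it to about .54.
doubled = spearman_brown(0.70, 2)
halved = spearman_brown(0.70, 0.5)
```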

54

Multiple Choice

Which statistical method is used to estimate the internal consistency of a test with dichotomous (right/wrong) items?
A) Cronbach’s alpha
B) Cohen’s kappa
C) KR-20
D) Intraclass correlation coefficient
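
KR-20 is the dichotomous-item special case of coefficient alpha; a minimal sketch with a hypothetical 0/1 answer matrix:

```python
import numpy as np

# Hypothetical right/wrong (1/0) answers: rows = examinees, columns = items.
answers = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
])

k = answers.shape[1]
p = answers.mean(axis=0)   # proportion correct per item
q = 1 - p                  # p * q is the variance of a 0/1 item

# Population variance of total scores, to match the population item variances.
total_var = answers.sum(axis=1).var()

# KR-20: same form as alpha, with sum(p*q) in place of summed item variances.
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
```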

55

Multiple Choice

When would the Spearman-Brown formula be more appropriate to use than the KR-20 formula?
A) When estimating how reliability would change if a test were shortened or lengthened.
B) When measuring the internal consistency of a test with dichotomous items.
C) When evaluating the inter-rater reliability of a scoring system.
D) When determining the impact of random error on test performance.

68

Multiple Choice

Which type of reliability is best assessed using test-retest procedures?
A) Internal consistency reliability
B) Inter-scorer reliability
C) Stability reliability
D) Criterion-referenced reliability

69

Multiple Choice

Which measure of reliability assesses the degree to which test items measure the same construct?
A) Test-retest reliability
B) Inter-rater reliability
C) Internal consistency reliability
D) Stability reliability

70

Multiple Choice

If a researcher wants to measure internal consistency for a test with items scored in different ways (e.g., Likert scale), which statistic is most appropriate?
A) Spearman-Brown formula
B) Coefficient Alpha (Cronbach’s Alpha)
C) Kuder-Richardson 20 (KR-20)
D) Inter-scorer reliability coefficient

71

Multiple Choice

How does the nature of a psychological trait influence reliability estimates?
A) Dynamic traits tend to yield lower reliability due to their variability over time.
B) Static traits are always measured with lower reliability than dynamic traits.
C) Traits that change frequently are more reliable in longitudinal studies.
D) Reliability estimates remain the same regardless of whether a trait is dynamic or static.

72

Multiple Choice

Why do criterion-referenced tests sometimes yield lower reliability coefficients than norm-referenced tests?
A) Criterion-referenced tests aim to classify test-takers rather than differentiate their scores widely.
B) Criterion-referenced tests use items with equal difficulty levels, leading to unstable scores.
C) Criterion-referenced tests are not designed to measure the same construct across different groups.
D) Criterion-referenced tests always contain more measurement error than norm-referenced tests.

73

Multiple Choice

When evaluating the reliability of a test, which reliability coefficient is considered acceptable for high-stakes decisions such as hiring or clinical diagnosis?
A) 0.50 or higher
B) 0.60 or higher
C) 0.70 or higher
D) 0.90 or higher

74

Multiple Choice

What is the best course of action if a test has a low reliability coefficient?
A) Increase the number of items measuring the same construct.
B) Reduce the number of items to avoid redundancy.
C) Assume the test is valid despite the low reliability.
D) Use the test only for research purposes without interpreting individual scores.

77

Open Ended

When do we say that a test is valid?

78

Open Ended

Is a valid test also reliable?

86

Multiple Choice

Which type of validity is most concerned with whether a test covers the relevant subject matter?
A) Construct validity
B) Criterion-related validity
C) Content validity
D) Face validity

87

Multiple Choice

What distinguishes construct validity from other forms of validity?
A) It focuses on whether the test looks appropriate to test-takers.
B) It evaluates the relationship between test scores and future performance.
C) It assesses whether the test fairly represents all groups taking it.
D) It involves a comprehensive analysis of how test scores fit into a theoretical framework.

88

Multiple Choice

Why is construct validity often referred to as "umbrella validity"?
A) It covers all other forms of validity by integrating multiple types of evidence.
B) It is the easiest type of validity to establish statistically.
C) It applies to tests measuring abstract psychological traits.
D) It ensures a test remains valid regardless of the population being tested.

89

Multiple Choice

When validating a test for hiring decisions, which type of validity is most relevant?
A) Face validity
B) Criterion-related validity
C) Content validity
D) Internal consistency validity

90

Multiple Choice

If a test measures a psychological trait that changes over time, what impact might this have on its validity?
A) It may have low reliability but strong validity.
B) Its validity may be limited to specific timeframes or conditions.
C) The test’s validity will not be affected as long as it measures consistently.
D) The validity of the test will automatically increase over time.

91

Multiple Choice

A test developer is responsible for which aspect of validity?
A) Providing evidence to support the test’s validity in the test manual
B) Ensuring that all test-takers achieve similar scores
C) Making sure the test is widely used before proving its validity
D) Guaranteeing the test remains valid for all populations and purposes

117

Multiple Choice

How is content validity typically assessed?
A) By comparing test scores with an external criterion
B) By computing a validity coefficient
C) By expert judgment and systematic examination of test content
D) By conducting factor analysis on the test items

118

Multiple Choice

When a new test correlates moderately with an already validated test measuring the same construct, this is an example of:

A. Discriminant validity
B. Predictive validity
C. Convergent validity
D. Reliability

119

Multiple Choice

If subscales within a test do not correlate well with the total score, this suggests issues with:

A. Evidence from distinct groups
B. Homogeneity
C. Convergent validity
D. Discriminant validity

120

Multiple Choice

A spelling test that only assesses the ability to recognize misspelled words but is used to claim that students have strong overall spelling skills lacks:
A) Construct validity
B) Predictive validity
C) Content validity
D) Incremental validity

121

Multiple Choice

Which of the following is NOT an example of a psychological construct?

A. Self-esteem
B. Job satisfaction
C. Blood pressure
D. Leadership ability

122

Multiple Choice

Which of the following best illustrates discriminant validity?

A. A self-esteem test correlates highly with an established self-worth test.
B. A new leadership ability test correlates moderately with a validated leadership test.
C. A personality test produces consistent results over time.
D. A test measuring anxiety shows no significant correlation with a test measuring extraversion.

123

Multiple Choice

Which of the following is an example of predictive validity?
A) A personality inventory aligns with expert psychiatric diagnoses
B) A math aptitude test correlates with students' final math grades months later
C) A reading comprehension test correlates highly with another reading test administered at the same time
D) A vocabulary test consistently produces similar scores over multiple administrations

124

Multiple Choice

If a test incorrectly identifies a student as highly skilled in logical reasoning when they are not, this is an example of:
A) False negative
B) False positive
C) Incremental validity
D) Base rate error

125

Multiple Choice

If a newly created anxiety test correlates too highly (e.g., r = 0.95) with an existing anxiety measure, this suggests that:

A. The new test lacks construct validity.
B. The new test is unnecessarily duplicating the existing measure.
C. The test demonstrates strong discriminant validity.
D. The test lacks predictive validity.

128

Open Ended

Can you think of some synonyms of "bias" or "biased"?

131

Open Ended

When do we say that bias exists in testing?

154

Multiple Choice

A rater who avoids giving extremely high or low ratings, instead placing all scores in the middle, is exhibiting:

A. Halo effect
B. Central tendency error
C. Severity error
D. Leniency error

155

Multiple Choice

If a judge gives a gymnast a much lower score than deserved after witnessing an exceptional performance by the previous competitor, this demonstrates:

A. Contrast effect
B. Leniency error
C. Central tendency error
D. Test bias

156

Multiple Choice

A teacher who gives a student consistently high ratings in all subjects simply because the student excels in one subject is displaying which rating error?

A. Severity error
B. Leniency error
C. Halo effect
D. Central tendency error

157

Multiple Choice

A recruiter is extremely strict and gives all job applicants low scores on an interview assessment. This is an example of:

A. Leniency error
B. Severity error
C. Central tendency error
D. Halo effect

158

Multiple Choice

How can raters minimize rating errors?
A. By considering both subjective impressions and structured criteria when making judgments
B. By focusing primarily on their past experiences rather than standardized guidelines
C. By using well-defined rating scales and participating in training to recognize and reduce bias
D. By aligning their ratings with the average scores given by other raters to maintain consistency

192

Multiple Choice

Which factor most directly influences the utility of a psychological test?
A. The test’s reliability, validity, and cost-effectiveness in decision-making
B. The length of the test and its number of items
C. The number of people taking the test annually
D. The ease with which test-takers understand the questions

193

Multiple Choice

A psychological test with high validity but low utility is likely to be:
A. Accurate but not cost-effective in real-world applications
B. Useless in measuring the intended construct
C. A test that has high face validity but low reliability
D. The most preferred assessment tool in applied settings

194

Multiple Choice

In a utility analysis, which of the following is most important in determining the cost-effectiveness of a test?
A. The time taken to administer the test
B. The test’s ability to improve decision-making outcomes
C. The ease of interpreting the test results
D. The test-takers' subjective satisfaction with the assessment

195

Multiple Choice

Which of the following is a method used to estimate the financial impact of using a test for selection decisions?
A. Content analysis
B. Brogden-Cronbach-Gleser (BCG) Model
C. Parallel forms reliability analysis
D. Thematic coding
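
The core Brogden-Cronbach-Gleser utility estimate can be sketched in a few lines; every figure below is hypothetical:

```python
# Hedged sketch of the Brogden-Cronbach-Gleser utility estimate.
# All figures below are hypothetical.
n_hired = 20            # people selected per year
tenure_years = 2        # average years selectees stay on the job
validity = 0.40         # criterion-related validity of the test (r)
sdy = 12_000            # SD of job performance expressed in dollars
mean_z = 1.0            # average standard test score of those selected
cost_per_applicant = 50
n_applicants = 100

# Utility gain = N * T * r * SDy * mean z of selectees - total testing cost
utility = (n_hired * tenure_years * validity * sdy * mean_z
           - n_applicants * cost_per_applicant)
```

The model translates a validity coefficient into an estimated dollar gain from better selection, net of testing costs.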

196

Multiple Choice

Which method of setting cut scores involves expert judgment to classify test-takers into performance categories?
A. The Angoff Method
B. The Test-Retest Method
C. The Item-Response Theory (IRT) Approach
D. The Split-Half Method
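
A minimal sketch of the Angoff procedure, with hypothetical expert ratings:

```python
# Angoff method: each expert judges, for every item, the probability that a
# minimally competent examinee answers it correctly. Ratings are hypothetical.
expert_ratings = [
    [0.8, 0.6, 0.9, 0.5, 0.7],   # expert 1, one probability per item
    [0.7, 0.6, 0.8, 0.6, 0.6],   # expert 2
    [0.9, 0.5, 0.9, 0.5, 0.8],   # expert 3
]

# Average the experts' probabilities item by item, then sum across items:
# the sum is the expected raw score of a borderline examinee = the cut score.
item_means = [sum(col) / len(expert_ratings) for col in zip(*expert_ratings)]
cut_score = sum(item_means)
```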

198

Open Ended

Psychological assessments are evaluated based on three key psychometric properties: reliability, validity, and utility. Explain how these concepts interrelate in the context of psychological testing. Provide examples of situations where a test may be reliable but not valid, valid but not useful, and useful but not highly reliable. Discuss the potential consequences of using a test that lacks one or more of these properties in real-world settings such as education, employment, or clinical diagnosis.
