We systematically assessed the relationships between growth in four components of verbal ability (the Information, Similarities, Vocabulary, and Comprehension subtests of the Wechsler Intelligence Scale for Children–Revised [WISC-R]) and longitudinal growth from Grades 1 to 9 on the Woodcock–Johnson Psycho-Educational Battery Passage Comprehension subtest, controlling for Word Identification and Word Attack, using multilevel growth models on a sample of 414 children. Growth was assessed over all grades (1-9), and separately for early grades (1-5) and later grades (5-9). Over all grades, growth in Word Identification had a substantial standardized loading on Passage Comprehension, and all four verbal abilities had smaller but significant standardized loadings on Passage Comprehension (p < .05), with Information and Vocabulary having slightly higher loadings than Similarities and Comprehension. For early grades, results were similar to the overall results, with the exception of Vocabulary, which had a nonsignificant loading on Passage Comprehension. For later grades, Word Identification again had the largest standardized loading on Passage Comprehension, although it was substantially smaller than in the previous analyses, and the standardized loadings of all four verbal abilities were statistically significant, with Vocabulary and WISC-R Comprehension having appreciably higher loadings than in the previous analyses. Conversation- and interaction-based intervention and instruction in oral language in general, and vocabulary in particular, throughout early childhood and continuing throughout the school years, combined with evidence-based instruction that systematically develops the skills of phonological awareness, decoding, word reading, fluency, and comprehension in school, may provide a pathway to reducing the achievement gap in reading.
This study examines the factor structures of Personal and Classroom Achievement Goals and the relationships between them. Multilevel structural equation modeling was used to examine data from a sample of 3,544 Italian 10th-grade students (184 classrooms) who completed the Patterns of Adaptive Learning Scales (PALS). Findings about the factor structure of personal goals were consistent with studies in other cultural contexts. The scales showed measurement invariance both across gender and across various immigrant backgrounds. Boys showed lower levels of mastery and higher levels of performance-approach than girls. Immigrant students scored higher than the native students on all Performance scales. At the group level, a measurement model including mastery and performance-approach goal structures showed good fit indices. In classrooms more oriented toward mastery, students’ personal goals tend to be in the same direction. Classroom performance-approach goal structures were related to performance-avoidance personal orientations but not to performance-approach personal orientations.
The present studies report on the initial development and validation of the Youth Internalizing Problems Screener (YIPS), which is a 10-item self-report rating scale for assessing general internalizing problems and identifying depression and anxiety caseness within the context of school mental health screening. Results from Study 1 (N = 177) demonstrated that responses to the YIPS yielded a single-factor latent structure, that scores derived from the scale had concurrent validity with scores from measures of student subjective well-being and problem behavior, and showed that scores derived from the YIPS demonstrated incremental validity in comparison with scores from another common internalizing problems screener for predicting self-reports of broad student functioning. Findings from Study 2 (N = 219) confirmed the latent structure and internal reliability of responses to the YIPS, demonstrated that scores derived from this scale had strong associations with scores from criterion measures of depression and anxiety, and showed that YIPS scores had good-to-excellent power for accurately discriminating between youth scoring at or above the clinical caseness thresholds on criterion measures of depression and anxiety. Taken together, results suggest the YIPS shows promise as a technically adequate instrument for measuring general internalizing problems and identifying depression and anxiety caseness among secondary students. Implications for future research and practice are discussed.
The selection and interpretation of individually administered norm-referenced cognitive tests that are administered to culturally and linguistically diverse (CLD) students continue to be an important consideration within the psychoeducational assessment process. Understanding test directions during the assessment of cognitive abilities is important, considering the high-stakes nature of these assessments. Therefore, the linguistic demand of spoken test directions from the following commonly used cognitive test batteries was examined and compared: Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V), Woodcock–Johnson IV Tests of Cognitive Abilities (WJ IV COG), Cognitive Assessment System, Second Edition (CAS2), and Kaufman Assessment Battery for Children, Second Edition (KABC-II). On average, the linguistic demand of the standard test directions was greater than the linguistic demand of the supplementary test directions. When examining individual test characteristics, very few individual tests were identified as outliers with respect to the linguistic demand of their test directions. This finding differs from previous research and suggests that the linguistic demand of the required directions for most tests included in commonly used cognitive batteries is similar. Implications for future research and test development are discussed.
The literature contains a variety of assessment tools for measuring the skills of individuals with autism or other developmental delays, but most lack adequate empirical evidence supporting their reliability and validity. The current pilot study sought to examine the reliability of scores obtained from the Assessment of Basic Language and Learning Skills–Revised (ABLLS-R). Two forms of reliability were measured: internal consistency and test–retest reliability. Analyses using data obtained from neurotypical children (N = 50) yielded strong evidence of internal consistency and test–retest reliability. These preliminary findings suggest that the ABLLS-R can yield reliable scores.
The psychometric properties of a new, multidimensional measure of test anxiety, the Test Anxiety Measure for College Students (TAM-C), were examined in a sample of 720 undergraduate students. Results of confirmatory factor analyses provided support for a six-factor (Cognitive Interference, Physiological Hyperarousal, Social Concerns, Task-Irrelevant Behaviors, Worry, and Facilitating Anxiety) model. Cronbach’s coefficient alphas ranged from .75 to .95 for the TAM-C scores. Gender differences were found on four of the TAM-C scales, with females reporting higher levels of test anxiety than males. Convergent and discriminant evidence of validity for the TAM-C scores was found. Implications of the findings for mental health professionals who work with college students are discussed.
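As an illustration of the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's coefficient alpha computed from a respondents-by-items score matrix. The function name and the toy data are hypothetical and are not taken from the TAM-C study; the sketch uses the standard formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; assumes at least two items,
    at least two respondents, and non-zero variance in the total scores.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (invented): three items that rank respondents identically.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data (invented): two items that are unrelated to each other.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

alpha_consistent = cronbach_alpha(consistent)
alpha_unrelated = cronbach_alpha(unrelated)
```

Items that order respondents identically push alpha toward 1, whereas unrelated items push it toward 0, which is why values of .75 to .95 are read as evidence of acceptable to high internal consistency.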
The 3 x 2 achievement goal framework comprising task-approach, task-avoidance, self-approach, self-avoidance, other-approach, and other-avoidance goals is the most recent conceptual development in goal orientation research. The purpose of this study was to extend the current literature by investigating the validity of the 3 x 2 Achievement Goal Questionnaire in a non-Western educational setting. Based on a sample of 384 first-year undergraduate students from a university in Hong Kong, confirmatory factor analyses demonstrated that the six-factor model of achievement goals provided better fit to the data than 10 alternative model structures tested. The six goal subscales have high internal consistency reliability, and gender invariance of the instrument was established via multigroup analysis. Results from multiple regression analyses also revealed that the six achievement goals were differentially predictive of students’ academic performance, deep and surface learning strategies, and instrumental help-seeking.
The Social Competence and Behavior Evaluation (SCBE), originally developed for assessing preschoolers, was adapted for adolescents. The instrument taps social competence and externalizing and internalizing problems. In the adolescent SCBE, more than 65% of the items (54 items) remained practically the same as in the preschool version, 24 items were modified slightly, and two items were rewritten completely. The instrument was tested on 342 adolescents (M = 14.4 years, SD = .6). The summary scales showed high reliability. Using exploratory structural equation modeling (ESEM), acceptable support for the three-factor model based on 16-item clusters was found, indicating that minimal adjustments to the items of the preschool version allow for the assessment of the same constructs in adolescence. The adolescent version of the SCBE can be a valid and reliable instrument for describing social adjustment in adolescents, making the SCBE interesting from an international perspective.
This study explored the factor structure and psychometric properties of the Perceived Sense of School Membership (PSSM-18) Scale in two samples of South African adolescents. Principal components analysis (n = 1,052; males = 50.86%, mean age = 14.89, SD = 1.68) supported the retention of 15 items across a revised, three-factor structure of acceptance, belonging, and inclusion (PSSM–South African version [PSSM-SA]). Confirmatory factor analysis (n = 1,418; males = 49.86%, mean age = 14.93, SD = 1.70) provided an acceptable level of fit for the PSSM-SA. The structure was found to be invariant across sex, age, and poverty quintile groupings. Follow-up group comparisons showed selected scales were able to discriminate between groups and predicted alcohol and substance use, and the mean inter-item correlations indicated each scale possessed an appropriate level of internal consistency. The findings suggest the PSSM-SA is a valid and reliable measure of school belonging among South African high school–age children.
This report presents further validation evidence for the Student Subjective Wellbeing Questionnaire (SSWQ). Analyses conducted with a sample of urban middle-school students (Grades 5-8, N = 335) targeted two limitations from previous validation studies: the lack of convergent validity evidence linking responses to the SSWQ with actual school outcomes and the lack of comparative validity evidence demonstrating the relative contributions of the SSWQ’s first-order and second-order factors for predicting criterion variables. Results from the present study confirmed the SSWQ’s higher-order measurement model and then demonstrated that both first-order and second-order factors had substantive effects on several school-reported outcomes, although first-order factors were more robust predictors overall. Implications for theory, practice, and future research are briefly discussed.
W scores are used in a number of commercially available tests. Due to their complex nature, it can be hard for applied researchers and practitioners to understand them or even acquire information about them beyond what is provided in technical manuals. In this article, we provide information regarding the background and derivation of W scores that can aid in understanding and appropriate utilization of these scores.
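To make the idea concrete, here is a minimal sketch of the relationship between a Rasch logit and a W score. It assumes the commonly cited transformation W = 9.1024 × logit + 500, where 9.1024 is 20 / ln(9); the function names are hypothetical, and the article's own derivation may present the scale in more detail or with different notation.

```python
import math

# Assumed scaling constants for the W scale (commonly cited, not taken
# from this article): 20 / ln(9) is about 9.1024 W units per logit,
# and the scale is centered at 500.
W_UNITS_PER_LOGIT = 20 / math.log(9)
W_CENTER = 500.0


def logit_to_w(theta):
    """Convert a Rasch ability or difficulty (in logits) to the W scale."""
    return W_UNITS_PER_LOGIT * theta + W_CENTER


def success_probability(person_w, item_w):
    """Rasch probability of success given person and item W scores."""
    diff_logits = (person_w - item_w) / W_UNITS_PER_LOGIT
    return 1.0 / (1.0 + math.exp(-diff_logits))


p_equal = success_probability(500, 500)    # person ability equals item difficulty
p_plus_10 = success_probability(510, 500)  # person is 10 W units above the item
p_plus_20 = success_probability(520, 500)  # person is 20 W units above the item
```

Under this assumed scaling, a person whose W score equals the item difficulty has a 50% chance of success, and advantages of 10 and 20 W units correspond to roughly 75% and 90%, which is the practical appeal of choosing 20 / ln(9) as the multiplier.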
The Emotional and Behavioral Screener (EBS) is a recently developed teacher-reported brief screening instrument for identifying students who are at risk of an emotional or behavioral disorder (EBD). Although prior research supports the technical adequacy of scores from the EBS, there is a gap in the literature regarding strong evidence of the factor structure underlying EBS scores. This study investigated the latent structure of scores from the EBS in a sample of 646 elementary students who were rated by their teachers in a 2-week screening period. Single-factor confirmatory factor analysis (CFA) and bifactor models were used to test the hypothesis that EBS scores are a measure of both overall emotional and behavioral risk and students’ externalizing and internalizing behaviors. Results supported a bifactor structure, in that scores from the EBS can be considered to represent a general factor (i.e., risk of EBD) and two group factors (i.e., externalizing and internalizing domains). Findings have implications for interpreting scores when using the EBS as a universal screener for the risk of EBD.
Oral language and word reading skills have important effects on reading comprehension. The Wechsler Individual Achievement Test–Third Edition (WIAT-III) measures both skill sets, but little is known about their specific effects on reading comprehension within this battery. Path analysis was used to evaluate the collective effects of reading and oral language measures on reading comprehension in a total of 456 students referred for special education evaluations. Students were split randomly into two groups (calibration and validation) for model construction and testing. Results indicate that these measures demonstrate a number of effects on reading measures that go beyond the expressive/receptive distinction in the test manual and add to the validity evidence for the WIAT-III. Implications for practice and study limitations are discussed.
This study explored motivation and engagement among 585 Jamaican middle and high school students. Motivation and engagement were assessed via students’ responses to the Motivation and Engagement Scale. Confirmatory factor analysis (CFA) found satisfactory fit, and by most measures, multigroup CFA demonstrated comparable factor structure for males and females, younger and older students, lower and higher socio-economic groups, and Jamaican students compared with a randomly selected sample of 585 Australian students from a normative archive data set. Correlations with a set of validational factors (e.g., engagement, achievement) were also in line with previous research. Taken together, findings suggest that motivation and engagement instrumentation that has received psychometric support in other national and regional contexts also generalizes to students in an emerging regional context (Jamaica).
The study analyzed the factorial and concurrent validity of the Student–Teacher Relationship Scale (STRS) using an exploratory structural equation modeling (ESEM) approach. Participants were 368 Italian children aged 3 to 6 (M = 4.60, SD = 0.98). The three-factor ESEM solution fit the data better than the classical confirmatory factor analysis (CFA) model, and the measurement invariance of the scale was confirmed across sex and age (3-4 vs. 5-6 years) groups. The concurrent validity of the STRS was investigated within the ESEM approach using children’s social behaviors as validity criteria. Findings supported the advantage of ESEM over CFA and attested to the validity of the STRS for understanding the quality of the teacher–child relationship in young children.
This study sought to validate the Short Grit Scale (Grit-S), an instrument that measures perseverance and passion for long-term goals, among Chinese high school students. Confirmatory factor analyses revealed that the scale retains the two-factor structure of the original scale. The scale demonstrated satisfactory internal consistency and test–retest reliability. Evidence for construct validity was found in relation to the Big Five personality traits, self-control, and IQ. Evidence for criterion validity was found via the observation that grit explained unique variance in academic performance. Together, the Grit-S is a sound measure of grit among Chinese adolescents.
This study aims to construct and validate the Career and Educational Decision Self-Efficacy Inventory for Secondary Students (CEDSIS) by using a sample of 2,631 students in Hong Kong. Principal component analysis yielded a three-factor structure, which demonstrated good model fit in confirmatory factor analysis. High reliability was found for the whole scale and each subscale, and construct validity was exhibited by the positive correlations with general self-esteem. Overall, evidence indicates that CEDSIS is a valid and reliable tool for effectively and efficiently assessing the educational and career decision-making self-efficacy of secondary students. Finally, implications and limitations of this study are discussed.
The current study examined the psychometric validity and gender invariance of the Academic Buoyancy Scale in the Philippines through a construct validation approach. In terms of within-network construct validity, our results demonstrated that the unidimensional model of academic buoyancy significantly fit the current sample and was invariant across gender. Male students scored significantly higher than female students on academic buoyancy. Regarding between-network construct validity, our results revealed that academic buoyancy was positively associated with behavioral and emotional engagement. Implications of the findings of the study are discussed.
The present study examined the validity of a newly developed instrument, the Mental Toughness Scale for Adolescents, which examines the attributes of challenge, commitment, confidence (abilities and interpersonal), and control (life and emotion). The six-factor model was supported using exploratory factor analysis (n = 373) and confirmatory factor analysis (n = 372). In addition, the mental toughness attributes correlated with adolescents’ academic motivation and engagement (n = 439), well-being (depression and anxiety; n = 279), and test anxiety (n = 279), indicating relations with a number of affective, cognitive, and behavioral dispositions, and demonstrating relevance in education and potentially mental health contexts.
Through the use of excerpts from one of our own case studies, this commentary applied concepts inherent in, but not limited to, the neuropsychological literature to the interpretation of performance on the Kaufman Test of Educational Achievement–Third Edition (KTEA-3), particularly at the level of error analysis. The approach to KTEA-3 test interpretation advocated here parallels the cognitive process-oriented approach used by McCloskey and colleagues in their interpretation of the Wechsler scales. This approach is also advocated by Hale and Fiorello as part of their cognitive hypothesis testing model and is inherent in the neuropsychological assessment and interpretation frameworks proposed by Miller and Dehn. For the purpose of this commentary, we describe how this approach to KTEA-3 test interpretation fits within our own Cattell-Horn-Carroll (CHC)-based approach to specific learning disabilities (SLD) identification. To derive maximum benefit from error analysis, practitioners must pay careful attention to the manner in which students respond to test items and copiously document their observations during test administration.
The kinds of errors that children and adolescents make on phonological processing tasks were studied with a large sample between ages 4 and 19 (N = 3,842) who were tested on the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Principal component analysis identified two phonological processing factors: Basic Phonological Awareness and Advanced Phonological Processing. Canonical analysis and correlation analysis were conducted to determine how each factor related to reading, writing, and oral language across the wide age range. Results of canonical correlation analysis indicated that the advanced error factor was more responsible for reading, writing, and oral language skills than the basic error factor. However, in the correlation analysis, both the basic and advanced factors related about equally to different aspects of achievement—including reading fluency and rapid naming—and there were few age differences.
We reviewed 13 studies that focused on analyzing student errors on achievement tests from the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). The intent was to determine what instructional implications could be derived from in-depth error analysis. As we reviewed these studies, several themes emerged. We explain how a careful analysis of errors is key to planning the most appropriate instructional interventions.
An understanding of the strengths, weaknesses, and achievement profiles of students with giftedness and learning disabilities (G&LD) is needed to address their asynchronous development. This study examines the subtests and error factors in the Kaufman Test of Educational Achievement–Third Edition (KTEA-3) for strength and weakness patterns of students with G&LD in higher and lower level thinking skills by comparing G&LD students (n = 196) with academically gifted (GT; n = 69) and specific learning disability (SLD) students (n = 90). Several one-way MANCOVAs were conducted with subtest error factor scores as dependent variables and the grouping variable (G&LD, GT, or SLD) as the independent variable. The G&LD mean scores across subtests fell between those of the two control groups. On many higher level thinking tasks, the G&LD group scored similarly to the gifted group. The results support the use of error analysis to gain further insight into the profile of students with G&LD.
Children with a specific learning disability in reading/writing (LDRW) and/or language impairment (LI) are likely to have difficulties across all areas of academic achievement, as a great deal of teaching and learning depends on intact reading skill and linguistic communication. Despite a large number of studies examining academic difficulties among these groups, there has been minimal research investigating types of errors made on tests of academic achievement. The present study compared academic error types of children with LDRW (Group 1) and children with LI (Group 3) to two distinct demographically matched control groups (Groups 2 and 4) using the Kaufman Test of Educational Achievement–Third Edition (KTEA-3) error analysis system. Findings indicate that children in the LDRW group or LI group, on average, made a greater number of errors than their matched counterparts. Statistically significant differences, with moderate effect sizes, were found between examinees in the clinical groups and their respective matched control groups across several error categories. Some of the largest differences were found in the Written Expression and Oral Expression subtests. Most importantly, the patterns of errors made by LDRW and LI samples differed notably on the various tasks, providing new insights about these clinical samples.
This study investigated the relationship between specific cognitive patterns of strengths and weaknesses and the errors children make on oral language, reading, writing, spelling, and math subtests from the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Participants with scores from the KTEA-3 and either the Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V), Differential Ability Scales–Second Edition (DAS-II), or Kaufman Assessment Battery for Children–Second Edition (KABC-II) were selected based on their profile of scores. Error factor scores for the oral and written language tests were compared for three groups: High Gc paired with low processing speed, long-term memory, and/or reasoning abilities; Low Gc paired with high speed, memory, and/or reasoning; and Low orthographic and/or phonological processing. Error factor scores for the math tests were compared for three groups: High Gc profile; High Gf paired with low processing speed and/or long-term memory; and Low Gf paired with high processing speed and/or long-term memory. Results indicated a difference in Oral Expression and Written Expression error factor scores between the group with High Gc paired with low processing speed, long-term memory, and/or reasoning abilities; and the group with Low Gc paired with high speed, memory, and/or reasoning.
Although word reading has traditionally been viewed as a foundational skill for development of reading fluency and comprehension, some children demonstrate "specific" reading comprehension problems, in the context of intact word reading. The purpose of this study was to identify specific patterns of errors associated with reading profiles—basic reading difficulties (BRD), reading fluency difficulties (RFD), reading comprehension difficulties (RCD), and typical readers (total n = 821). Results indicated significant differences between the groups on most error factors. Post hoc analyses indicated there were no significant differences between the RFD and RCD groups, but these groups demonstrated different patterns of significant weakness relative to typical readers. The RFD group was weaker in spelling and oral expression whereas the RCD group demonstrated difficulties in writing mechanics and listening comprehension. These findings indicate that comprehension deficits cannot be attributed only to fluency difficulties and that specific reading difficulties may translate to other aspects of achievement.
This study investigated cognitive patterns of strengths and weaknesses (PSW) and their relationship to patterns of math errors on the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Participants, ages 5 to 18, were selected from the KTEA-3 standardization sample if they met one of two PSW profiles: high crystallized ability (Gc) paired with low processing speed/long-term retrieval (Gs/Glr; n = 375) or high Gs/Glr paired with low Gc (n = 309). Estimates of Gc and Gs/Glr were based on five KTEA-3 subtests that measure either Gc (e.g., Listening Comprehension) or Gs/Glr (e.g., Object Naming Facility). The two groups were then compared on math error factors. Significant differences favored the High-Gc group for factors that measure math calculation, basic math concepts, and complex computation. However, the two groups did not differ in their errors on factors that measure geometry/measurement or simple addition. Results indicated that students with different PSW profiles also differed in the kinds of errors they made on math tests.
This commentary will take a historical perspective on the Kaufman Test of Educational Achievement (KTEA) error analysis, discussing where it started, where it is today, and where it may be headed in the future. In addition, the commentary will compare and contrast the KTEA error analysis procedures, which are rooted in psychometric methodology, and the process approach to error analysis, which is derived primarily from cognitive neuropsychology.
Norm-referenced error analysis is useful for understanding individual differences in students’ academic skill development and for identifying areas of skill strength and weakness. The purpose of the present study was to identify underlying connections between error categories across five language and math subtests of the Kaufman Test of Educational Achievement–Third Edition (KTEA-3) through exploratory factor analyses (EFAs). The EFA results were supportive of models with two or three factors for each of the five subtests. Significant inter-factor correlations within subtests were identified in all subtests, except between two factors within the Math Concepts & Applications (MCA) subtest. There was also consistency in the covariance patterns of some error categories across subtests, particularly within the Nonsense Word Decoding (NWD) and Spelling (SP) subtests. This consistency was supportive of the proposed factor structures. The factor structures yielded by these analyses were used as the bases for the other articles in this special issue.
An attention-deficit/hyperactivity disorder (ADHD) diagnosis requires symptoms to be present across two or more settings, thus requiring information from multiple informants. Research consistently shows low to moderate agreement between parents and teachers; however, the mechanisms underlying these discrepancies remain unclear. This study examined (a) agreement between parents and teachers, (b) effects of using different combination rules in assigning diagnoses, and (c) the role of contextual influences and/or personal biases in informants’ reports. Fifty-five children, their parents, and teachers participated. Parent and teacher ratings on the Attention-Deficit/Hyperactivity Disorder Rating Scale–Fourth Edition (ADHD-RS-IV) and clinician ratings on the Behavioral Observation of Students in Schools (BOSS) were obtained. Results indicated moderate agreement between parent and teacher ratings on the ADHD-RS-IV. Diagnostically, the rule for combining information from multiple informants dramatically altered the ADHD classification assigned to the child. With regard to rater differences, the clinician-rated school observation gave some support for the notion that ratings are person specific rather than context specific.
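The effect of a combination rule can be illustrated with a small sketch. The abstract does not state which rules or cutoffs the study used, so everything below is an assumption for illustration only: a hypothetical symptom-count cutoff of 6, an "or" rule (either informant meets the cutoff), and an "and" rule (both informants must meet it).

```python
# Hypothetical cutoff for illustration only; not taken from the study.
SYMPTOM_CUTOFF = 6


def meets_cutoff(symptom_count, cutoff=SYMPTOM_CUTOFF):
    """True when an informant endorses at least `cutoff` symptoms."""
    return symptom_count >= cutoff


def classify(parent_count, teacher_count, rule):
    """Combine parent and teacher reports under an 'or' or 'and' rule."""
    parent = meets_cutoff(parent_count)
    teacher = meets_cutoff(teacher_count)
    if rule == "or":
        return parent or teacher
    if rule == "and":
        return parent and teacher
    raise ValueError(f"unknown rule: {rule!r}")


# Invented example: the parent endorses 7 symptoms, the teacher endorses 3.
flagged_by_or_rule = classify(7, 3, "or")
flagged_by_and_rule = classify(7, 3, "and")
```

In this invented case the same child is classified positively under the "or" rule and negatively under the "and" rule, which shows how informant disagreement makes the choice of rule consequential.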
Children’s oral language skills typically begin to develop sooner than their written language skills; however, the four language systems (listening, speaking, reading, and writing) then develop concurrently as integrated strands that influence one another. This research explored relationships between students’ errors in language comprehension of passages across oral and written modalities (listening and reading) and in language expression across oral and written modalities (speaking and writing). The data for this study were acquired during the standardization of the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Correlational analyses from the total sample (n = 2,443-3,552) and within grade bands revealed low to moderate correlations (.26-.50). No evidence of convergent or divergent validity was found when comparing correlations of "same-name" error types (e.g., inferential errors across modalities) with correlations of "different-name" error types. These results support previous research findings and hypotheses that language by ear, eye, hand, and mouth are separable but interacting systems that differ in more ways than modality of input/output.
This study investigated the differences in error factor scores on the Kaufman Test of Educational Achievement–Third Edition between individuals with mild intellectual disabilities (Mild IDs), those with low achievement scores but average intelligence, and those with low intelligence but without a Mild ID diagnosis. The two control groups were matched with the Mild ID clinical cases on demographic variables including age, gender, and parental education. Results showed significant differences between the groups on several error factors, particularly between the Mild ID group and the two control groups, and no significant differences between all three groups on six error factors. In addition, the two control groups differed significantly on four error factors. Implications for intervention selection, diagnostic considerations, and future directions for achievement test creation are discussed.
The purpose of this study was to understand and compare the types of errors that students with a specific learning disability in reading and/or writing (SLD-R/W) and those with a specific learning disability in math (SLD-M) made in the areas of reading, writing, language, and mathematics. Clinical samples and matched controls were selected from the norming population of the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Although the authors expected to find overall differences between the groups in their areas of difficulty, the study revealed that the two clinical samples were more similar than different. In particular, the SLD-M clinical group performed lower on some errors that were not related to their area of disability compared with the SLD-R/W group. The findings underscore the importance of error analysis, especially when creating goals for individual education plans. Although a student may have an SLD-R/W, he or she may still need support in certain mathematics areas, and vice versa.
This study investigated developmental gender differences in mathematics achievement, using the child and adolescent portion (ages 6-19 years) of the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Participants were divided into two age categories: 6 to 11 and 12 to 19. Error categories within the Math Concepts & Applications and Math Computation subtests of the KTEA-3 were factor analyzed and revealed five error factors. Multiple ANOVA of the error factor scores showed that, across both age categories, female and male mean scores were not significantly different across four error factors: math calculation, geometric concepts, basic math concepts, and addition. They were significantly different on the complex math problems error factor, with males performing better at the p < .05 significance level for the 6 to 11 age group and at the p < .001 significance level for the 12 to 19 age group. Implications in light of gender stereotype threat are discussed.
This special issue focuses on an array of studies conducted using the Kaufman Test of Educational Achievement–Third Edition (KTEA-3) error analysis system. These studies, based on KTEA-3 standardization and validation data with normal and clinical samples, were conducted to provide greater understanding of the kinds of errors students make in reading, writing, math, and oral language. This introduction provides a brief history of the error analysis system and outlines the organization of the special issue, which features commentaries on the articles by experts in the field. The themes throughout the special issue are patterns of errors made by students and the educational implications of these patterns.
This study investigated the relationship between specific cognitive patterns of strengths and weaknesses (PSWs) and the errors children make in reading, writing, and spelling tests from the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Participants were selected from the KTEA-3 standardization sample based on five cognitive profiles: High Crystallized Ability paired with Low Processing Speed and Long-Term Retrieval (High Gc), Low Crystallized Ability paired with High Processing Speed and Long-Term Retrieval (High Gs/Glr), Low Orthographic Processing (Low OP), Low Phonological Processing (Low PP), and Low Phonological Processing paired with Low Orthographic Processing (Low PP_OP). Error factor scores for all five groups were compared on Reading Comprehension and Written Expression; the first four groups were compared on Letter & Word Recognition, Nonsense Word Decoding, and Spelling, and the first three groups were compared on Phonological Processing. Significant differences were noted among the patterns of errors demonstrated by the five groups. Findings support the notion that students with diverse cognitive PSWs display different patterns of errors on tests of academic achievement.
A large body of research has documented the relationship between attention-deficit hyperactivity disorder (ADHD) and reading difficulties in children; however, there have been no studies to date that have examined errors made by students with ADHD and reading difficulties. The present study sought to determine whether the kinds of achievement errors made by students diagnosed with ADHD vary as a function of their reading ability. The participants in this study were 91 students in the ADHD clinical validity standardization sample of the Kaufman Test of Educational Achievement–Third Edition (KTEA-3), as well as a control group of 63 students selected from the larger standardization sample. Students with ADHD and reading difficulties made a statistically significantly greater number of errors across tests of academic achievement. Findings from the study are discussed within the context of past research, as well as implications for the field of school psychology and practitioners.
The current study examined the factor structure of the Schutte Self-Report Emotional Intelligence (SSREI) scale with an American college sample (n = 404, 322 females, 88.9% White). Data were collected through an online survey, and confirmatory factor analyses were conducted to test several proposed factor models from previous studies. The results showed that the Ng et al. two-level factor model fit the current data best. Implications of the study and the usefulness of the SSREI scale among American students are discussed.
The articles presented in this Special Issue provide evidence for many statistically significant relationships among error scores obtained from the Kaufman Test of Educational Achievement–Third Edition (KTEA-3) between various groups of students with and without disabilities. The data reinforce the importance of examiners looking beyond the standard scores when analyzing results. Although the data in these articles are powerful by themselves, this commentary explores the potential advantages of considering additional information to increase the practicality of these results. Although statistical significance may provide evidential validity of the results, and the articles inform clinical practice and offer valuable leads for further research that should be pursued, the present authors question whether the data as presented provide sufficient information to determine the predictive and practical utility of these initial results. The next step, we believe, is to extend these novel approaches to data from larger, carefully defined samples of students with specific learning challenges and disabilities.
Researchers examined the concurrent and predictive validity of a brief (12-item) teacher-rated school readiness screener, the Kindergarten Student Entrance Profile (KSEP), using receiver operating characteristic (ROC) curve analysis to examine associations of children’s (N = 78) social-emotional (SE) and cognitive (COG) readiness with measures of behavioral/emotional risk and early literacy skills throughout kindergarten. Results indicated statistically significant associations between both subscales of the KSEP (SE and COG) and all outcome variables. Findings provide validity evidence in support of the KSEP as an initial gate in the universal screening process to inform educators on the readiness of incoming kindergarteners.
Claims abound in the research literature regarding multicultural teacher dispositions, including how to foster them in teacher preparation programs. However, measures of multicultural dispositions of teachers that (a) capture the range of conceptually rich constructs and (b) demonstrate strong psychometric properties are not represented in the literature. In this article, we discuss the iterative development and psychometric properties of the Multicultural Teacher Dispositions Scale (MTDS), a survey of 15 items designed to assess three dispositions/factors: Meekness, Social Awareness, and Advocacy. We analyze responses from 372 preservice teachers in three samples and analytic phases, and discuss factor and item analytic results from the final phase. Results demonstrate strong support for Meekness, though moderate support for Social Awareness and Advocacy. We discuss limitations, implications for measure refinement, and eventual use for research and practice improvement.
The present study analyzed the factorial validity of the Patterns of Adaptive Learning Scales (PALS) to assess students’ perceptions of mathematics classroom goal structures. Participants were 7,773 Italian students aged 11 to 15 years (M = 11.97, SD = 0.50). The confirmatory factor analysis replicated a three-factor structure (i.e., mastery, performance-avoidance, and performance-approach goals) of the scale. Multigroup confirmatory factor analyses supported configural, metric, and scalar measurement invariance of the scale across gender. Moreover, the students’ mathematics achievement was positively related to mastery goals and negatively associated with performance-avoidance goals. The use of the scale may help teachers to understand the relations between classroom goal structures and mathematics achievement during middle school.
This study sought to better understand the prevalence of concurrent and specific difficulties in reading fluency and vocabulary among adolescents with low reading comprehension. Latent class analysis (LCA) was used to identify a sample of 180 students in sixth through eighth grades with reading comprehension difficulties. A subsequent LCA identified subgroups of students with common patterns of strengths and weaknesses in reading fluency and vocabulary. Results indicated that more than 96% of the students demonstrated deficits in at least one area, with the largest subgroup exhibiting co-occurring difficulties in fluency and vocabulary. Difficulties in fluency were more common than difficulties in vocabulary. Students with low reading comprehension but adequate scores in reading fluency or vocabulary represented only a very small portion of the sample. Coupled with findings from prior studies, results indicate that large numbers of adolescents with reading comprehension difficulties are likely in need of intervention in foundational skill and knowledge areas, which may not be viewed as instructional priorities among secondary educators.
This research aimed to construct and validate the School Conflict Negotiation Effectiveness Questionnaire (SCNEQ). This objective is based both on the increasing relevance of constructive conflict management in schools and on the scarcity of instruments that try to measure these dimensions in the educational context. We used two samples of middle and high school students from two urban public schools, one with 622 students and another with 505, the latter used to confirm the validation. The results for the two samples show Cronbach’s alpha values of .84 and .87, respectively. The data suggest the feasibility and validity of the SCNEQ to assess the construct under study. We consider it relevant to continue the psychometric studies of the scale, so future research should address this topic in depth. Concerning the findings, results of the present study reveal that affective groups differ statistically in their self-reported conflict management styles.
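For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach’s alpha is conventionally computed from item-level responses. The function name and the toy data are hypothetical and are not taken from the SCNEQ study; sample variances are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `scores` is a list of respondents, each a list of k item scores.
    Uses alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent needs a score on every item")
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: 5 respondents answering 4 items on a 1-5 scale.
toy_scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(toy_scores)
```

Values closer to 1 indicate that the items covary strongly relative to their individual variances.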
It has been suggested that tacit knowledge may be a good predictor of performance in college. The purpose of this study was to investigate the extent to which a situational judgment test developed to measure tacit knowledge correlates with predictors and indicators of college performance. This situational judgment test includes eight situations relevant to the life of college (undergraduate) students and comprises 211 behavioral strategies. Four hundred forty-eight college students participated in the study. The results of this study suggest that tacit knowledge has small, statistically nonsignificant correlations with cumulative grade point average (GPA), the percentage of the academic requirements passed on the first attempt, cognitive abilities, achievement motivation, and attention. However, tacit knowledge was found to correlate moderately with the personality factor of agreeableness. The findings do not support claims about the importance of tacit knowledge in academic settings and question what tacit knowledge really is and whether it is a useful construct for performance prediction.
This study investigated the heterogeneity of depressive symptom trajectories and the roles of school-related factors in predicting the membership of different trajectories in a sample of early adolescents in Taiwan. In all, 870 junior high school students were followed for 3 years. Using growth mixture modeling, the study identified four distinct trajectories: stable-low depression, stable-moderate depression, steadily increasing depression, and early elevated but later decreasing depression. Female and private school students tended to belong to the high-risk group. Students with negative academic self-concept, low self-esteem, or poor peer relationships tended to follow the two high-risk trajectories (stable-moderate depression and steadily increasing depression). The findings suggested that these school-related factors could be used to target the high-risk depressive symptom groups for receiving further counseling, especially in the East Asian context.
The conceptualization of giftedness continues to be a widely debated topic within the field. Recently, there has been a shift from a psychometric view of giftedness to inclusion of conative and contextual factors. How one defines and conceptualizes "gifted" drives assessment and identification practices. Conceptualization also guides the development of measures used in gifted assessment; however, the perceptions of giftedness held by test authors have not yet been explored empirically. The aim of this study was to investigate the perspectives of giftedness held by authors of the leading tests in gifted assessment. Test authors provided their views on topics, including gifted identification, use of cutoff scores, and how their tests serve the needs of this unique population. Findings indicate that there are varying opinions and views of giftedness, although most test authors in the sample embrace a multi-dimensional approach to gifted assessment. This article includes a discussion of clinical implications and future directions.
This study examined associations between broad cognitive abilities (Fluid Reasoning [Gf], Short-Term Working Memory [Gwm], Long-Term Storage and Retrieval [Glr], Processing Speed [Gs], Comprehension-Knowledge [Gc], Visual Processing [Gv], and Auditory Processing [Ga]) and reading achievement (Basic Reading Skills, Reading Rate, Reading Fluency, and Reading Comprehension) in a nationally representative school-age sample. Findings indicate that some cognitive abilities were stronger predictors of reading achievement than previously found (e.g., Gf, Ga, and Gs). Most notably, the Woodcock-Johnson–IV Gf cluster was found to be the strongest and most consistent predictor of reading achievement. A secondary analysis suggests that this effect was likely due to the new Number Series test. The results of the study suggest revisions to previous conceptualizations of the associations between the broad Cattell-Horn-Carroll abilities and areas of reading achievement.
This article describes the development and validation of a newly designed instrument for measuring the spatial ability of middle school students (11-13 years old). The design of the Spatial Reasoning Instrument (SRI) is based on three constructs (mental rotation, spatial orientation, and spatial visualization) and is aligned to the type of spatial maneuvers and task representations that middle school students may encounter in mathematics and Science, Technology, Engineering and Mathematics (STEM)-related subjects. The instrument was administered to 430 students. Initially, a set of 15 items was devised for each of the three spatial constructs, and the 45 items were eventually reduced to 30 items on the basis of factor analysis. The three underpinning factors accounted for 43% of variance. An internal reliability value of .845 was obtained. Subsequent Rasch analysis revealed appropriate item difficulty fit across each of the constructs. The three constructs of the SRI correlated significantly with existing well-established psychological instruments: for mental rotation (.71), spatial orientation (.41), and spatial visualization (.66). The psychometric characteristics of the SRI substantiate the use of this measurement tool for research and pedagogical purposes.
We used existing reading (n = 1,498) and math (n = 2,260) data to evaluate state test scores for screening middle school students. In Phase 1, state test data were used to create a research-derived cut score that was optimal for predicting state test performance the following year. In Phase 2, those cut scores were applied with future cohorts. Diagnostic accuracy of the research-derived cut scores was compared with the state’s proficiency benchmark from the previous year. Across grades and content areas, research-derived cut scores yielded higher sensitivity and lower specificity values relative to state-defined cut scores. Marked decreases in sensitivity and specificity were not observed in subsequent years. Results provide evidence for procedures in which previous state test data are repurposed for screening decisions.
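As an illustration of the diagnostic accuracy terms used above, the following sketch computes sensitivity and specificity for a given cut score and then picks a cut score from prior-year data. The abstract does not state how the "optimal" cut score was derived; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made here purely for illustration, and all names and data are hypothetical.

```python
def diagnostic_accuracy(scores, at_risk, cut):
    """Return (sensitivity, specificity) when scores below `cut` are flagged.

    `at_risk[i]` is True if student i did not reach proficiency the following
    year (the outcome the screener is meant to predict).
    """
    tp = fn = tn = fp = 0
    for score, risk in zip(scores, at_risk):
        flagged = score < cut
        if risk and flagged:
            tp += 1
        elif risk and not flagged:
            fn += 1
        elif not risk and flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one at-risk and one not-at-risk student")
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_youden(scores, at_risk):
    """Pick the observed score that maximizes Youden's J (an assumed criterion)."""
    def youden_j(cut):
        sensitivity, specificity = diagnostic_accuracy(scores, at_risk, cut)
        return sensitivity + specificity - 1
    return max(sorted(set(scores)), key=youden_j)


# Hypothetical prior-year scale scores and next-year outcomes.
prior_scores = [10, 20, 30, 40, 50, 60]
next_year_at_risk = [True, True, True, False, False, False]
derived_cut = best_cut_youden(prior_scores, next_year_at_risk)
```

In a two-phase design like the one described, a cut score derived this way from one cohort would then be applied unchanged to later cohorts to check whether sensitivity and specificity hold up.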
Measuring human motivation requires understanding the outcomes individuals value and the strategies they prefer to employ to attain them. Knowledge of promotion and prevention, two pivotal motivation orientations, provides key information regarding these aspects. The Regulatory Focus Questionnaire, which measures these two independent constructs, was validated using data provided by university students and alumni of an elite U.S. university. Thus, little is known about whether this instrument provides reliable and valid measures of promotion and prevention in a population of younger respondents from a different culture. To bridge this gap, the study employed data collected from three independent large samples of New Zealand secondary school students and used the jigsaw piecewise technique in combination with confirmatory factor analyses. Findings show that, in this population, items in the Regulatory Focus Questionnaire measure promotion and prevention as theoretically distinct constructs.
R. Goodman’s Strengths and Difficulties Questionnaire (SDQ) is widely used to measure emotional and behavioral difficulties in childhood and adolescence. In the present study, we examined whether the SDQ measures the same construct across time when used for longitudinal research. A nationally representative sample of parents (N = 3,375) provided data on their children at ages 4, 5, and 6 years. Using confirmatory factor analysis (CFA) for ordinal data, two competing models (three-factor model vs. five-factor model) were tested to establish equivalence across time. Results showed that the five-factor model had a superior fit to the data compared with the alternative three-factor model, which achieved only an adequate fit at a configural level. Strong longitudinal factorial invariance was established for the five-factor parent version of the SDQ. Our findings support the use of the SDQ in longitudinal studies and provide the important psychometric information required for basing educational, clinical, and policy decisions on outcomes of the SDQ.
Behavioral and emotional problems among children and adolescents can lead to numerous negative outcomes without intervention. From a prevention standpoint, screening for behavioral and emotional risk is an important step toward identifying such problems before the point of diagnosis or referral. The present study conducted a k-means cluster analysis to determine the subtypes of risk captured by one such screening instrument, the Behavioral and Emotional Screening System (BESS). The final solution produced four clusters: Well-Adapted, Internalizing/Adjustment Problems, Mild Externalizing Problems, and General Problems-Severe; these results were similar to those found with the full Behavioral Assessment System for Children, Second Edition (BASC-2), suggesting that the BESS assesses similar constructs. Predictive validity evidence suggested that cluster membership was associated with standard achievement scores and in-school disciplinary incidents.
The Flourishing Scale (FS) is a brief eight-item inventory used to measure psychological well-being. This study evaluated the psychometric properties of the FS in a sample of 766 Chinese adolescents. The paper-and-pencil method was adopted. Confirmatory factor analysis was conducted to examine the factor structure of the FS items. The Expanded Satisfaction With Life Scale and the Hospital Anxiety and Depression Scale were used to examine the criterion-related and incremental validities. Results showed good internal consistency reliability, a one-factor structure, strong convergent validity, and incremental validity of the FS in the current sample. We conclude that the FS is suitable for use in the Chinese adolescent context.
Career goal feedback provides information about career goal suitability, adequacy of goal progress, and whether changes are needed to reach the goals. Feedback comes from external (e.g., parents, peers) and internal sources (e.g., self-reflection), and plays an important role in the career development of young people. As there is no existing measure that adequately captures this construct, we devised and validated a 24-item inventory for use with young adults. In Study 1, initial items were developed, expert reviewed, and administered to a sample of Chinese university students (N = 1,055; mean age = 19 years). We used exploratory factor analysis to test the factor structure and confirmatory factor analysis on a holdout sample to validate a third-order solution (one third-order factor manifested by three second-order factors). In addition, we provided evidence for convergent and incremental validity. In Study 2, we confirmed the factor structure on Australian university students (N = 184; mean age = 19 years).
The Emotional Intelligence Scale (EIS) is a popular EI measure. Yet, it has been criticized for an unclear factor structure, and its psychometric properties have mainly been examined in the Western context. This study evaluated its psychometric properties based on 1,724 Hong Kong undergraduate students, including its (a) factor structure, (b) internal consistency, and (c) criterion validity. We compared different factor structures reported in the literature. The confirmatory factor analysis (CFA) results supported a six-factor structure, which tallies with Salovey and Mayer’s EI conceptualization. A multigroup CFA also showed the structure to be gender invariant. The scale was internally consistent, with high McDonald’s omega coefficients. A significant association between EI and grade point average (GPA) was revealed in the faculties with people-oriented studies. Furthermore, EI was correlated with social, cognitive, and self-growth outcomes and satisfaction with university experience. The study helps clarify the factor structure and provides new reliability and validity evidence for the EIS in the Eastern context.
In the current research, we illustrate the impact that item wording has on the content of personality scales and how differences in item wording influence empirical results. We present evidence indicating that items in certain scales used to measure "adaptive" perfectionism fail to capture the disabling all-or-nothing approach that is synonymous with the individual who is driven to attain perfection. Original and modified versions of two perfectionism measures of high personal standards and modified perfectionistic standards versions of these scales were administered to three samples of participants. A series of analyses established that item wording does indeed matter. In particular, our results differed for a modified version of the Almost Perfect Scale–Revised when the focus was on a conceptualization and assessment of perfectionism that is fundamentally different from conscientious striving. The current findings are discussed in terms of their implications for scale construction and item wording in general and for the measurement of perfectionism in particular. The specific implications of these findings are examined in terms of understanding dysfunctional perfectionism and the current debate about whether certain aspects of perfectionism are adaptive versus maladaptive.
There is growing interest in perfectionism among children and adolescents as well as growing interest in the measures designed to assess perfectionism in young people. The current article describes the development and psychometric characteristics of the Child–Adolescent Perfectionism Scale (CAPS), a measure that assesses self-oriented perfectionism and socially prescribed perfectionism. The results of three studies involving multiple samples are reported. The psychometric features of this measure are summarized, including extensive data that attest to the reliability and validity of the CAPS subscales. Normative data are also provided in Study 1. The results of Study 2 suggest that the academic behavior of perfectionistic students is motivated by a complex blend of factors that include a strong emphasis on introjected regulation in both self-oriented and socially prescribed perfectionism; however, there are key motivational differences between these perfectionism dimensions. Finally, Study 3 confirmed that self-oriented and socially prescribed perfectionism are associated with various indices of stress, distress, and maladjustment. Collectively, our results support the use of the CAPS and the notion that vulnerable children and adolescents who are perfectionistic are under substantial pressure to meet expectations. The assessment and theoretical implications of these results are discussed.
The Cattell–Horn–Carroll (CHC) model is a comprehensive model of the major dimensions of individual differences that underlie performance on cognitive tests. Studies evaluating the generality of the CHC model across test batteries, age, gender, and culture were reviewed and found to be overwhelmingly supportive. However, less research is available to evaluate the CHC model for clinical assessment. The CHC model was shown to provide good to excellent fit in nine high-quality data sets involving popular neuropsychological tests, across a range of clinically relevant populations. Executive function tests were found to be well represented by the CHC constructs, and a discrete executive function factor was found not to be necessary. The CHC model could not be simplified without significant loss of fit. The CHC model was supported as a paradigm for cognitive assessment, across both healthy and clinical populations and across both nonclinical and neuropsychological tests. The results have important implications for theoretical modeling of cognitive abilities, providing further evidence for the value of the CHC model as a basis for a common taxonomy across test batteries and across areas of assessment.
The 2 x 2 model of perfectionism conceptualizes perfectionism as the within-person combinations of self-oriented and socially prescribed perfectionism to define four subtypes of perfectionism. This model posits that each subtype is distinctively associated with self-determined motivation and psychological adjustment. Results of a latent moderated structural equation model, estimated with data from a sample of 559 university students and using our newly developed MPLUS syntax codes to estimate simple slopes and their statistical significance, supported this hypothesis. As expected, pure self-oriented perfectionism was associated with higher academic self-determination and academic satisfaction relative to mixed perfectionism. Mixed perfectionism was also associated with higher academic self-determination and satisfaction than was pure socially prescribed perfectionism. Results of a latent mediated moderation structural equation model also showed that academic self-determined motivation significantly mediated the relationships between perfectionism subtypes and academic satisfaction. The indirect effects of the four simple slopes, tested with our newly developed MPLUS syntax codes, all reached statistical significance. On substantive grounds, the different amounts of autonomy or self-determination associated with each of the four subtypes of perfectionism of the 2 x 2 model explicate why they are distinctively associated with academic satisfaction. On methodological grounds, this study offered a roadmap to examine the hypotheses of the 2 x 2 model of perfectionism with latent moderated structural equation modeling.
The purpose of this study was to evaluate the construct validity of Ryff’s Scales of Psychological Well-Being (SPWB) using exploratory structural equation modeling (ESEM). The data were drawn from the national survey of Midlife in the United States conducted during 1994 and 1995. Measurement models assuming different numbers of factors (1-6 factors) and considering the effect of negatively worded items were specified and compared to determine the optimal number of underlying factors. The discriminant validity was assessed following Farrell’s suggestions. The results showed that the discriminant validity was questionable due to five indicators with considerable cross-loadings.
Research on perfectionism with the Almost Perfect Scale–Revised (APS-R) distinguishes adaptive perfectionists from maladaptive perfectionists based primarily on their responses to the 12-item unidimensional APS-R Discrepancy subscale, which assesses the sense of falling short of standards. People described as adaptive perfectionists have high standards but low levels of discrepancy (i.e., relatively close to attaining these standards). Maladaptive perfectionists have perfectionistic high standards and high levels of discrepancy. In the current work, we re-examine the psychometric properties of the APS-R Discrepancy subscale and illustrate that this supposedly unidimensional discrepancy measure may actually consist of more than one factor. Psychometric analyses of data from student and community samples distinguished a pure five-item discrepancy factor and a second four-item factor measuring dissatisfaction. The five-item factor is recommended as a brief measure of discrepancy from perfection and the four-item factor is recommended as a measure of dissatisfaction with being imperfect. Overall, our results confirm past suggestions that most people with maladaptive perfectionism are characterized jointly by chronic dissatisfaction as well as a sense of being discrepant due to having fallen short of expectations. These findings are discussed in terms of their implications for the assessment of perfectionism, as well as the implications for research and practice.
This article introduces a new measure of dispositional perfectionism: the Big Three Perfectionism Scale (BTPS). The BTPS assesses three higher-order global factors (rigid perfectionism, self-critical perfectionism, narcissistic perfectionism) via 10 lower-order perfectionism facets (self-oriented perfectionism, self-worth contingencies, concern over mistakes, doubts about actions, self-criticism, socially prescribed perfectionism, other-oriented perfectionism, hypercriticism, grandiosity, entitlement). The present investigation examined the structure of the BTPS using exploratory factor analysis in Study 1 (288 undergraduates) and confirmatory factor analyses in Study 2 (352 community adults) and Study 3 (290 undergraduates). Additionally, in Study 3 the relationships among the BTPS, other measures of perfectionism, and the five-factor model of personality were investigated. Overall, findings provide initial evidence for the reliability and validity of the BTPS as a multidimensional measure of perfectionism.
As part of universal screening efforts in schools, validated measures that identify internalizing distress are needed. One promising available measure, the Depression, Anxiety, and Stress Scales–21 (DASS–21), has yet to be thoroughly investigated with adolescents in the United States. This study investigated the underlying factor structure of the DASS–21 in a sample of U.S. adolescents (N = 2,454) by using confirmatory factor analytic techniques to test several alternate models. A bifactor model specifying general Negative Affectivity and three specific factors of Depression, Anxiety, and Stress yielded the best fit. Results from this study suggest that (a) the DASS–21 scales reflect a common factor, indicating that a total score of the DASS–21 can be derived as a measure of general negative affectivity, and (b) the DASS–21 may not adequately differentiate between the experiences of negative affectivity, anxiety, and stress in U.S. adolescents.
The Test of Early Mathematics Ability–Third Edition (TEMA-3) is a commonly used measure of early mathematics knowledge for children aged 3 years to 8 years 11 months. In spite of its wide use, research on the psychometric properties of TEMA-3 remains limited. This study applied the Rasch model to investigate the psychometric properties of TEMA-3 from three aspects: technical qualities, internal structure, and convergent evidence. Data were collected from 971 K1 children in Singapore. Item fit statistics suggested a reasonable model-data fit. The TEMA-3 items were found to demonstrate generally good technical qualities, interpretable internal structure, and reasonable convergent evidence. Implications for test development, test use, and future research are further discussed.
Twenty-five years ago, one of the first empirically validated measures of perfectionism, the Frost et al. Multidimensional Perfectionism Scale (F-MPS), was published. Since that time, psychometric studies of the original F-MPS have provided a plethora of evidence to support the potential development of a shorter yet still psychometrically robust version of the measure. Using confirmatory factor analyses across community and clinical samples, the current study identifies an eight-item F-MPS-Brief with two dimensions (i.e., striving and evaluative concerns) that evidences good internal consistency, measurement equivalence across ethnicities, and concurrent and convergent validity. This new, short version of the F-MPS captures well the bidimensional model of perfectionism that has emerged across studies over the past two decades and is suggested for use when a short yet high-performing assessment tool for this model is desired.
Valid and reliable instruments are required to appropriately study perfectionism. With this in mind, three studies are presented that describe the development and initial validation of a new instrument designed to measure multidimensional performance perfectionism for use in sport (Performance Perfectionism Scale–Sport [PPS-S]). The instrument is based on Hewitt and Flett’s (1991) model of perfectionism and includes self-oriented, socially prescribed, and other-oriented performance perfectionism. These dimensions encapsulate the features of Hewitt and Flett’s dimensions but are focused on athletic performance rather than life generally. The three studies outline item generation and refinement, exploratory, confirmatory, and exploratory-confirmatory examination of factor structure, and initial assessment of construct validity in multiple samples of adolescent and young adult athletes. Findings suggest that the PPS-S is likely to be a reliable and valid measure of performance perfectionism in youth sport. As validation continues, we expect the instrument to have wider applicability for use in adults and other performance contexts (e.g., education and work).
The perfectionism field has advanced considerably over the past 25 years, but researchers typically focus on substantive findings, and there has been comparatively little systematic emphasis on measurement issues. This special issue introduces new perfectionism measures and examines several important measurement topics. This special issue advances the theme that how constructs are conceptualized and measured has a direct impact on the findings that emerge in empirical research. We provide an overview of specific topics addressed in this special issue, including the importance of distinguishing between perfectionism versus conscientiousness and the role of assessment in documenting the heterogeneity that exists among people who all describe themselves as perfectionists. It is evident from the papers in this special issue that the complexities inherent in the perfectionism construct require an equally complex and sophisticated measurement approach. Further advances in the perfectionism field depend largely on implementing a programmatic approach to measurement and assessment.
This study investigated the cross-cultural validity of the Preschool Learning Behavior Scale (PLBS) in the Chinese cultural context. Multiple approaches were used for this purpose, including exploratory factor analysis, confirmatory factor analysis, criterion-related validity evidence, and internal consistency reliability estimates. The findings generally supported the PLBS’ three-factor structure (Competence Motivation, Learning Strategy, and Attention/Persistence) as used in the Chinese cultural context, and with minor adaptations, PLBS could be a psychometrically sound measure for assessing the learning behaviors of Chinese children.
Numerous studies have identified differences between males and females in academic performance across the areas of reading, writing, and mathematics. The current study examined whether or not gender differences exist when math curriculum–based measures (M-CBMs) are used to assess basic math computation skills in a sample of third- through eighth-grade students. Participants included 1,626 general and special education students from five schools in a rural southeastern school district. Two-way repeated measures ANOVAs were used to determine significance across genders at each grade level. Statistically significant differences in favor of females were found in Grades 5, 7, and 8. The discussion highlights applied and theoretical implications of these findings.
Fluency is an important construct in clinical assessment and in cognitive taxonomies. In the Cattell–Horn–Carroll (CHC) model, Fluency is represented by several narrow factors that form a subset of the long-term memory encoding and retrieval (Glr) broad factor. The CHC broad classification of Fluency was evaluated in five data sets, and the CHC narrow classification was evaluated in an additional two data sets. The results suggest that Fluency tests are more strongly related to processing speed (Gs) and acquired knowledge (Gc) than to Glr, but Fluency may also be represented as a distinct broad factor. In the two additional data sets with a large number of Fluency tests, the CHC Fluency narrow factors failed to replicate with confirmatory factor analysis. An alternative and simpler narrow structure of Fluency was found, supporting the factorial distinction of semantic versus orthographic Fluency. The results have important implications for the factorial structure of memory, the classification of Fluency tests, and the assessment of Fluency.
When educational or psychological instruments are adopted in new environments, psychometric investigations should first be conducted to ensure that the instrument measures the same constructs. Exploratory structural equation modeling and confirmatory factor analysis methods were used to examine the utility of the short form of the Pediatric Symptoms Checklist (PSC-17) in the school setting. Using a sample of 836 preschool children rated by teachers, three factors were identified across both techniques, with factors matching the hypothesized structure of the instrument. The PSC-17 may be an option for use in preschool settings when conducting behavioral and emotional screening.
The present study reports on the initial validation of the eight-item version of the Avoidance and Fusion Questionnaire for Youth (AFQ-Y8) as a school mental health screener for identifying clinical-level depression and anxiety caseness within a sample of urban high school students (N = 219). Results indicated that responses to the AFQ-Y8 yielded better data–model fit and comparable internal consistency and convergent validity in relation to responses to the longer, 17-item version of the measure. Findings from receiver operating characteristic (ROC) curve analyses showed that scores derived from the AFQ-Y8 had excellent discrimination ability for correctly classifying students with and without clinical-level depression (area under the curve [AUC] = .91) and anxiety (AUC = .92), and that a cutoff score of ≥15 yielded optimal sensitivity (.86, .92) and specificity (.88, .87) for accomplishing these purposes. Taken together, findings suggest the AFQ-Y8 is a technically adequate instrument for both measuring psychological inflexibility and classifying students with clinical-level internalizing problems. Implications for future research and practice are discussed.
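To make the classification statistics in the preceding abstract concrete, the sketch below shows one common way sensitivity, specificity, and AUC can be computed for a screener with a "score ≥ cutoff" decision rule. It is only an illustration: the function names and the toy scores and labels are hypothetical, and nothing here reproduces the study's data or its actual analysis software.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Flag score >= cutoff as a positive screen; return (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)  # cases correctly flagged
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)   # cases missed
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)   # non-cases correctly passed
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)  # non-cases wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): screener totals and case status (1 = clinical-level).
scores = [22, 18, 15, 12, 16, 9, 14, 7, 11, 5]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, cutoff=15)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={auc(scores, labels):.3f}")
```

Moving the cutoff up or down trades sensitivity against specificity, which is why a single "optimal" cutoff is reported alongside the threshold-free AUC.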
The Resiliency Scale for Young Adults (RSYA) is presented as an upward extension of the Resiliency Scales for Children and Adolescents (RSCA). The RSYA is based on the three-factor model of personal resiliency including mastery, relatedness, and emotional reactivity. Several stages of scale development and studies leading to the current RSYA are described that provide construct validity (i.e., internal consistency, confirmatory factor analyses, and convergent–divergent validity) support for the three-factor structure and 10 subscales of this measure for young adults who are attending college. This work is a step in a longer-term project of translating the constructs of personal resiliency for application across the life span.
The current study attempted to answer whether a specific executive functioning profile for individuals with test anxiety exists and whether deficits in working memory are associated with an earlier onset of test anxiety. Two hundred eighty-four undergraduate students completed a survey on test anxiety and self-report measures of test anxiety and executive functioning. Executive functioning profiles were compared between test anxiety groups (below average, average, and above average) for differences in severity and pattern. Onset of test anxiety was analyzed in relation to working memory. Executive functioning profiles were found to significantly vary in severity and pattern based on level of test anxiety. Working memory varied significantly based on onset of test anxiety. These results suggest that deficits across multiple areas of executive functioning may provide a more robust etiology for test anxiety. In addition, working memory deficits may be an early indicator for the development of test anxiety.
We examined the structure of the new Block Patterns (BP) test from the Shipley Institute of Living Scale–Second Edition in a sample of Jamaican young adults. To date, very little has been published on the properties of this subtest’s items and scores. The BP test is similar in design to the Block Design subtest found in many cognitive ability assessments but uses a matching format that minimizes the need for excess materials and time. We analyzed the BP items using item response theory (IRT) methods. Although designed to measure a single construct, the analyses from this study found that the BP subtest is likely measuring more than a single construct, which confounds the interpretation of the instrument’s scores. Before the subtest is used clinically, more research should be done to purposefully investigate the effects of ancillary variables on its scores.
The No Child Left Behind Act requires that 95% of students in all public elementary and secondary schools are assessed in mathematics. Unfortunately, direct assessments of young students can be time-consuming, costly, and challenging to administer. Therefore, policy makers have looked to indirect forms of assessment, such as teachers’ ratings of student skills, as a substitute. However, prekindergarten teachers’ ratings of students’ mathematical knowledge and skills are only correlated with direct assessments at the .50 level. Little is known about factors that influence accuracy in teacher ratings. In this study, we examine the influence of student and teacher characteristics on prekindergarten teachers’ ratings of students’ mathematical skills, controlling for direct assessment of these skills. Results indicate that students’ race/ethnicity and social competency, as well as teachers’ self-efficacy, are significantly related to prekindergarten teachers’ ratings of students’ mathematical skills.
This study examined the psychometric properties of the school engagement measure (SEM) in Singapore. The sample consisted of 1,027 students from a multi-ethnic Singapore adolescent community. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) confirmed that the two-factor solution, namely, (a) Emotional and (b) Cognitive Engagement, was the best fit for the data. Reliability for the 11-item SEM as well as its Emotional and Cognitive subscales was good. Concurrent validity was assessed using correlations between Emotional Engagement, Cognitive Engagement, SEM Total scores, and the measure of aggression and delinquency. Statistically significant negative correlations were found between SEM scores and both aggression and delinquency. Taken together, the findings suggest that the SEM is a useful instrument for assessing school engagement among Asian students, in particular in Singapore.
Students often feel time pressure when taking tests, and students with disabilities are sometimes given extended time testing accommodations, but little research has been done on the factors that affect students’ test-taking speed. In the present study, 253 students at two colleges completed measures of processing speed, reading fluency, and self-reports of their reading and test-taking skills, as well as a standardized paper-and-pencil reading comprehension task. The time taken to complete the reading comprehension task was not significantly related to students’ accuracy on the task, but it was predicted by students’ reading fluency and by their self-reports of problems with timed reading/test-taking. Students’ processing speed did not significantly predict comprehension task completion time or accuracy when reading fluency and self-reports were held constant. We discuss the implications of these and other results for making determinations about extended time testing accommodations, as well as for future research studies.
The proliferation of Internet usage has motivated Internet researchers and practitioners to study possible gratifications underlying Internet use. Despite the fact that research examining Internet gratification is more than two decades old, no attempt has been made in the last decade to develop an instrument that has known reliability of scores and validity of inferences to examine the various Internet gratifications. To bridge this gap, the present study has estimated the reliability of scores and validity of inferences of a 27-item instrument, examining different Internet uses and gratification (U&G) among 1,914 adolescent Internet users. The development and validation process involved exploratory and confirmatory factor analyses, examination of convergent and discriminant validity, and other measures of construct validity and reliability. The factor analyses revealed a six-factor structure, representing six Internet gratifications, namely, information seeking, exposure, connection, coordination, entertainment, and social influence. This instrument exhibits excellent internal reliability. The practical and theoretical contributions of this instrument are also presented.
The current investigation studied psychometric properties of the Homework Emotion Regulation Scale (HERS) for math homework, with 915 tenth graders from China. Confirmatory factor analyses (CFAs) supported the presence of two separate yet related subscales for the HERS: Emotion Management and Cognitive Reappraisal. The latent factor means for both subscales were shown to be invariant across gender. Furthermore, both subscales were positively related to homework purposes and behaviors (effort and completion) in the theoretically expected directions. Meanwhile, math performance was positively related to emotion management, but not cognitive reappraisal.
Researchers investigated the diagnostic utility of the Social Skills Improvement System: Performance Screening Guide (SSIS-PSG). Correlational, regression, receiver operating characteristic (ROC), and conditional probability analyses were run to compare ratings on the SSIS-PSG subscales of Prosocial Behavior, Reading Skills, and Math Skills with report card grades for conduct, reading, and math, respectively. Respective subscales were all statistically significantly correlated with one another. In addition, all regressions indicated that SSIS-PSG ratings significantly predicted the respective report card grades. ROC analyses for SSIS-PSG Math with math grades and SSIS-PSG Reading with reading grades were statistically significant and described as fair. ROC analysis for SSIS-PSG Prosocial Behavior with conduct grades was not significant and described as poor. In the conditional probability analysis, the main concern for screeners is the false negative ratio; all estimates for this fell within the targeted range.
Several methods of assessing executive function (EF), self-regulation, language development, and social development in young children have been developed over previous decades. Yet new technologies make available methods of assessment not previously considered. In resolving conceptual and pragmatic limitations of existing tools, the Early Years Toolbox (EYT) offers substantial advantages for early assessment of language, EF, self-regulation, and social development. In the current study, results of our large-scale administration of this toolbox to 1,764 preschool and early primary school students indicated very good reliability, convergent validity with existing measures, and developmental sensitivity. Results were also suggestive of better capture of children’s emerging abilities relative to comparison measures. Preliminary norms are presented, showing a clear developmental trajectory across half-year age groups. The accessibility of the EYT, as well as its advantages over existing measures, offers considerably enhanced opportunities for objective measurement of young children’s abilities to enable research and educational applications.
Classroom observations increasingly inform high-stakes decisions and research in education, including the allocation of school funding and the evaluation of school-based interventions. However, trends in rater scoring tendencies over time may undermine the reliability of classroom observations. Accordingly, the present investigations, grounded in social psychology research on emotion and judgment, propose that state emotion may constitute a source of psychological bias in raters’ classroom observations. In two studies, employing independent sets of raters and approximately 5,000 videotaped fifth- and sixth-grade classroom interactions, within-rater state positive emotion was associated with favorable ratings of classroom quality using the Classroom Assessment Scoring System (CLASS). Despite various protections enacted to secure reliable and valid observations in the face of rater trends—including professional training, certification testing, and routine calibration meetings—emotional bias still emerged. Study limitations and implications for classroom observation methodology are considered.
This study examined the psychometric properties of the Achievement Goal Questionnaire–Revised (AGQ-R) in a sample of Singapore secondary students (N = 1,171). Confirmatory factor analyses provided support for the AGQ-R in measuring the four achievement goals delineated in the 2 × 2 framework. Measurement invariance across ethnic groups was supported via multigroup analysis. Multidimensional Rasch analysis revealed that only one item on the instrument showed a slight misfit, and the distribution of items also matched reasonably well with students’ achievement goal levels, though there were some students on the upper end of the continuum whose positions were not well covered. Examination of the 5-point rating scale showed that while the response categories demonstrated monotonicity, two of the lower categories were not clearly differentiated. These findings suggest that the AGQ-R has adequate psychometric properties for use with school-aged students in Singapore, but a rating scale with fewer categories should be considered.
U.S. public education systems are required to provide free appropriate public education to students with disabilities in least restrictive environments that are appropriate to meet their individual needs. The practice of educating students with disabilities in neighborhood schools in age-appropriate general education classrooms and other school settings to meet this requirement has come to be known as "inclusive education." The long-standing interest in keeping students with disabilities in the same classrooms with their neighbors and peers has created a need for reform to establish equity in America’s schools. Schoolwide Integrated Framework for Transformation (SWIFT) is a whole-system school reform model provided through a national technical assistance center that addresses core features of inclusive education support for elementary and middle schools, particularly those that are chronically low performing and those serving students with the most extensive needs. We describe the development and preliminary technical adequacy of SWIFT Fidelity of Implementation Tool (SWIFT-FIT) as a means to document the extent to which schools are implementing inclusive education. Findings provide preliminary support for trained assessors using SWIFT-FIT as a valid and reliable instrument to produce evidence that describes the extent to which schools install, implement, and sustain these evidence-based practices. Researchers and other school personnel can use these data to evaluate the impact of implementation on progress as well as important student and other outcomes.
When conducting classroom observations, researchers are often confronted with the decision of whether to conduct observations live or by using pre-recorded video. The present study focuses on comparing and contrasting observations of live and video administrations of the Classroom Assessment Scoring System–PreK (CLASS-PreK). Associations between versions, mean differences, reliability, and predictive validity were examined. Results generally indicated high correlations between versions. Video codes were slightly lower on average than live codes. Reliability was generally acceptable in terms of Cronbach’s alpha, but multigroup confirmatory factor models suggested some differences between observation types. Finally, CLASS scores based on each observation type indicated some predictive validity of children’s academic achievement, but no observation type was uniformly better. The discussion focuses on why the codes might differ and the implications of those differences.
The reliance on self-reports in detecting noncredible symptom report of attention-deficit/hyperactivity disorder in adulthood (aADHD) has been questioned due to findings showing that symptoms can easily be feigned on self-report scales. In response, Suhr and colleagues developed an infrequency index for the Conners’ Adult ADHD Rating Scale (CII) and provided initial validation for its utility in detecting noncredible symptom report. The aim of this study was to evaluate the utility of the CII in detecting noncredible aADHD symptom report by using a simulation design. Data did not support the validity of the CII for the detection of noncredible aADHD symptoms, as it failed to differentiate instructed malingerers from genuine patients with sufficient accuracy. It is concluded that there is a need for infrequency scales composed of items that were specifically developed to be endorsed infrequently and embedded within valid self-report scales.
Preliminary findings indicate that positive relations between parents and teachers are associated with successful school outcomes for children. However, measures available to assess parent–teacher relations are scant. The current study examined validity evidence for the Parent–Teacher Relationship Scale–II (PTRS). Specifically, the internal structure of the PTRS and the test–criterion relationships between the PTRS and several important child-level variables were examined. Primary school teachers (n = 120) completed the PTRS referencing two different parents of children in their classroom, as well as outcome measures about both of these parents’ children (i.e., academic competence, student–teacher relationship, and behavior). Confirmatory factor analyses supported the two-factor solution originally proposed by the PTRS authors. Associations between the PTRS and child outcome variables provided further evidence in support of test–criterion relationships. School mental health professionals and researchers seeking to assess the contributions of parent–teacher relations to academic and behavioral outcomes of children should consider administering the PTRS.
The present study reports on an investigation of the generalizability of the technical adequacy of the Positive Experience at School Scale (PEASS) with a sample of students (N = 1,002) who differed substantially in age/grade level (i.e., adolescents in middle school as opposed to children in elementary school) and ethnic identity (i.e., majority Black/African American as opposed to majority Latino/a) in comparison with the measure’s primary development sample. Findings from confirmatory factor analyses indicated the original latent structure of the PEASS was tenable in the current sample and that the measure was invariant across gender and grade level, with some small demographic differences identified via latent means testing. Additional psychometric findings regarding the technical adequacy of the PEASS with this sample, including its observed scale characteristics and simulated classification utility with criterion measures of academic self-efficacy and school connectedness, are also presented. Implications for future research and practice are discussed.
As recommended by Carroll, the present study examined the factor structure of the Wechsler Intelligence Scale for Children–Fourth Edition Spanish (WISC-IV Spanish) normative sample using higher order exploratory factor analytic techniques not included in the WISC-IV Spanish Technical Manual. Results indicated that the WISC-IV Spanish subtests were properly aligned with theoretically proposed factors; however, application of the Schmid and Leiman procedure found that the g factor accounted for large portions of total and common variance, whereas the four first-order factors accounted for small portions of total and common variance. Implications for clinical interpretation of the measurement instrument are discussed.
The factor structure of the Teacher–Child Rating Scale (T-CRS 2.1) was examined using confirmatory factor analysis (CFA). A cross-sectional study was carried out on 68,497 children in prekindergarten through Grade 10. Item reduction was carried out based on modification indices, standardized residual covariance, and standardized factor loadings. A higher order model with a general super-ordinate factor fit the data well, and is consistent with the notion of a unidimensional non-cognitive set of learning-related skills. Model-based reliability estimates are provided.
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy progression and understanding of analogical reasoning in 5- to 6-year-olds (N = 111). A pretest-training-posttest control group design with randomized blocking was utilized, where two experimental groups were trained according to the graduated prompts method. Results show that both training conditions improved more during dynamic testing compared with untrained controls. As expected, children in the CR condition required more prompting during training and showed different strategy-use patterns compared with the MC group. However, the quality of solution explanations was significantly better for children in the CR condition. It appears that possible performance advantages of training with CR items are most apparent when active processing is required. In the future, we advise including items such as CR or analogy construction in dynamic testing that allow for fine-grained analysis of strategy-use to further discern differences in children’s analogical reasoning understanding.
Research has indicated declining empathy within specific professions and social structures. Few psychometric instruments have addressed empathy within the context of psychological distance/relatedness to other individuals and even to other species, relationships that can be important contributors to psychological well-being and health. We developed and tested the Empathy Gradient Questionnaire (EGQ), which contains five subscales (i.e., Family, Friend, Peer, Distant Other, and Species Empathy) representing increasing psycho-spatial distances. We used LISREL to validate the five-factor structure of the EGQ, and we evaluated levels of empathy among a sample of 161 individuals, aged 18 to 60+. The EGQ was shown to have high subscale (0.80-0.89) and overall (0.94) internal consistencies. The factor pattern and structural equation models showed five latent factors explaining 69.8% of variance for all variables (goodness-of-fit index [GFI] = 0.98, adjusted goodness-of-fit index [AGFI] = 0.98, standardized root mean square residual [SRMR] = 0.06, comparative fit index [CFI] = 0.92). There were no significant effects for age, gender, or race on overall empathy or for each of the five subscales. A decreasing gradient was noted for Friend to Species Empathies.
This study assessed the test-taking skills of 776 high school students, 35 of whom were diagnosed with learning disabilities (LD). Students completed a computerized battery of timed reading tests as well as scales that assess test anxiety and test-taking perceptions. Students with LD obtained lower scores than the nondisabled group on all of the reading tasks (speed, comprehension, vocabulary, and decoding), spent more time reviewing comprehension questions, and were less active in looking for answers in the passages. Both groups favored the same comprehension strategy of reading the entire passage and then answering questions. The groups did not differ in their levels of test anxiety or confidence in taking tests under timed conditions. Vocabulary score best discriminated between groups and best predicted reading comprehension performance, suggesting a potential target for intervention.
General cognitive diagnostic models (CDM) such as the generalized deterministic input, noisy, "and" gate (G-DINA) model are flexible in that they allow for both compensatory and noncompensatory relationships among the subskills within the same test. Most of the previous CDM applications in the literature have been add-ons to simulation studies. Although there are some applications of CDMs such as the Fusion Model and the Rule Space Model to educational assessment data in general and second-language data in particular, there are few studies applying general models such as the G-DINA. The purpose of the present study was to demonstrate the application of the G-DINA to the reading comprehension data of a high-stakes test. To this end, an initial Q-matrix was developed, validated, and cross-validated. The skill profiles of the test takers were estimated using the "CDM" package in R. Throughout, the process of constructing and validating a Q-matrix was elaborated on, the benefits of general models were emphasized, and implications for research investigating inter-skill relationships were discussed. Finally, suggestions for further research, to better take advantage of the flexibilities of general diagnostic models, were presented.
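For readers unfamiliar with the model named in the preceding abstract, the block below sketches the identity-link form of the G-DINA item response function as it is commonly written in the CDM literature. The notation is an assumption introduced here, not taken from the abstract: $K_j^*$ is the number of attributes item $j$ requires, $\boldsymbol{\alpha}_{lj}^*$ is the reduced attribute pattern of latent class $l$ on those attributes, and the $\delta$ terms are item parameters.

```latex
P\left(X_j = 1 \mid \boldsymbol{\alpha}_{lj}^{*}\right)
  = \delta_{j0}
  + \sum_{k=1}^{K_j^{*}} \delta_{jk}\,\alpha_{lk}
  + \sum_{k'=k+1}^{K_j^{*}} \sum_{k=1}^{K_j^{*}-1} \delta_{jkk'}\,\alpha_{lk}\,\alpha_{lk'}
  + \cdots
  + \delta_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk}
```

Under this formulation, constraining every term except the intercept and the highest-order interaction to zero yields the noncompensatory DINA model, whereas retaining only the intercept and main effects yields an additive form in which mastered attributes contribute independently. This is the sense in which a general model can accommodate both kinds of relationships within one test.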
The purpose of this study was to develop a scale that measures adolescents’ attitudes toward classroom incivility and determine whether items would reveal subscales. A sample of 549 adolescents between ages 11 and 18 (53.1% boys; mean age = 13.90, SD = 1.41) completed items written to measure attitudes toward classroom incivility. An exploratory factor analysis (EFA) was used on one half of the randomly split sample and a confirmatory factor analysis (CFA) on the remainder. Results from both analyses suggested that two factors representing unintentional and intentional incivility might be the best factor solution. In addition, evidence for concurrent validity was found in correlations with four additional scales. Results suggest that attitudes toward classroom incivility are heterogeneous and that adolescence may be an important developmental period to address this construct. Future studies should continue psychometrically developing this scale and exploring this measure with additional antisocial beliefs and behaviors.
The underlying structure of the Behavioral and Emotional Screening System, Teacher Rating Scale–Preschool was investigated with a replication sample. Ratings from more than 3,000 students were used and four alternative models were investigated. As with prior research, a bifactor model with four factors was identified. The results supported an (Emerging) School Problems factor and yielded similar patterns of relationships with specific and general factors identified with prior research.
The purpose of this study was to determine the extent to which early literacy measures administered in kindergarten and Oral Reading Fluency (ORF) measures administered in Grade 1 are related to and predict future state reading assessment performances up to 7 years later. Results indicated that early literacy and ORF performances were significantly and moderately related to performances in Grades 3, 5, and 7. Grade 3 achievement was best predicted by ORF, followed by Phoneme Segmentation Fluency (PSF), and then Initial Sound Fluency (ISF). After controlling for the effects of previous state assessment scores in Grade 3, additional significant variance in Grade 5 performance was accounted for by ORF. Finally, after controlling for the effects of Grades 5 and 3 state assessment performances, early literacy and ORF measures did not significantly predict Grade 7 achievement. Discussion focuses on the implications of these findings for theory and practice, as well as limitations and directions for future research.
Many Flynn effect (FE) studies compare scores across different editions of Wechsler’s IQ tests. When construct changes are introduced by the test developers in the new edition, however, the presumed generational effects are difficult to untangle from changes due to test content. To remove this confound, we use the same edition of Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV) across an 11-year period. Whereas previous research has reported the FE to be less than half the theoretical rate when comparing WISC-IV with Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V), we find the rate of gain to be nearly identical to Flynn’s prediction when comparing only WISC-IV scores over the same time period. The FE is shown to vary significantly across the domains of cognitive ability, and thus changes to the construct coverage of the WISC-V Full Scale IQ (FSIQ) composite between editions significantly affect FE findings. Implications for future FE research are discussed.
Psychometric properties of the Chinese version of the Self-Regulation Scale (C-SRS) were examined in a sample of 1,458 third- to eighth-grade students in China. Children completed self-reports of self-regulation, loneliness, depression, and self-esteem, and teachers rated children’s school adjustment. Results showed a stable three-factor model that demonstrated a reasonable fit to the C-SRS items, and the scale demonstrated adequate internal consistency, reliability, and convergent validity. Results of measurement invariance tests indicated metric and scalar invariance across gender and grade. Findings from this study suggest that the C-SRS can be used with Chinese primary and junior high school students.
This study examined the factor structure and the psychometric properties of the Satisfaction With Life Scale (SWLS) in a sample of 1,515 Italian (females = 60.26%, males = 39.74%) adolescents and young adults (M age = 17.6 years, SD = 1.21). Results confirmed the unidimensionality, good reliability, and concurrent validity of the Italian version of the SWLS, supporting its use in the Italian context.
Data from the standardization sample of the Woodcock–Johnson Psychoeducational Battery–Third Edition (WJ III) Cognitive standard battery and Test Session Observation Checklist items were analyzed to understand the relationship between g (general mental ability) and test session behavior (TSB; n = 5,769). Latent variable modeling methods were used to construct both the g and TSB factors, and measurement invariance of the two latent factors was tested across five age groupings (ages 6-8, 9-13, 14-19, 20-39, and 40+). Results indicated partial scalar invariance across age groups for both the g and TSB factors. Correlations between the g and TSB factors were moderately strong and statistically significant across all groups. Suggestions for future research that is most likely to advance theory development and scale development related to the relationship between g and TSB are discussed.
Although school climate has long been recognized as an important factor in the school improvement process, there are few psychometrically supported measures based on teacher perspectives. The current study replicated and extended the factor structure, concurrent validity, and test–retest reliability of the teacher version of the Authoritative School Climate Survey (ASCS) using a statewide sample of high school teachers. Multilevel confirmatory factor analyses based on surveys completed by 12,808 high school teachers from 302 schools found that factors of disciplinary structure and student support were associated to varying degrees with the teacher reports of the prevalence of student teasing and bullying and student engagement. These findings provide some empirical support for the use of the teacher version of the ASCS in high schools.
The goal of the current investigation was to evaluate psychometric properties of the Homework Distraction Scale (HDS) using 796 middle school students. Results from confirmatory factor analyses (CFAs) supported the presence of two distinct yet related subscales for the HDS: Conventional Distraction and Tech-Related Distraction. Results of measurement invariance tests further revealed that factor loadings were invariant across gender groups. Finally, correlation coefficients between the HDS and other external measures (goal orientations, homework behaviors, and homework interest) were consistent with theoretical expectations.
Differential granting of extra-examination time (EET) is commonly based on learning disabilities (LD) status: EET is granted to LD examinees and is denied to nondisabled examinees. We argue that LD serves as a proxy for the extent to which time limitation affects the examinee’s test score (e). Hence, the validity of the LD-based EET granting policy depends on how well LD status serves as a proxy for e. Reanalysis of 11 comparative experimental studies of the effect of EET shows that LD status is a poor proxy for e. The proportion of nondisabled examinees who benefit from EET roughly equals the corresponding proportion among LD students. Implications of these results for the validity and fairness of this policy are discussed.
Patterns of maintenance of ability across the life span have been documented on tests of knowledge (Gc), as have patterns of steady decline on measures of reasoning (Gf/Gv), working memory (Gsm), and speed (Gs). Whether these patterns occur at the same rate for adults from different educational backgrounds has been debated. In addition, age-related research is needed to study global IQs, especially in view of the increased reliance on IQ in capital punishment court cases. In this study, large representative samples of adults tested during the standardizations of three versions of the Wechsler Adult Intelligence Scale (WAIS) served as subjects: WAIS-R (N = 1,480, ages 20-74), WAIS-III (N = 2,093, ages 20-90), and WAIS-IV (N = 1,800, ages 20-90). Based on regression analysis, patterns of aging on Full Scale IQ (FSIQ) and the four abilities (a) were essentially the same for males versus females and (b) characterized all levels of education across three generations of Wechsler’s adult scales.
Strength-based assessment of behaviors in preschool children provides evidence of emotional and behavioral skills in children, rather than focusing primarily on weaknesses identified by deficit-based assessments. The Preschool Behavioral and Emotional Rating Scales (PreBERS) is a normative assessment of emotional and behavioral strengths in preschool children. The PreBERS has well-established reliability and validity for typically developing children as well as children with identified special education needs, but this has not yet been established for children in Head Start programs, who tend to be at high risk for development of emotional and behavioral concerns. This study explores the factorial validity of the PreBERS scores for a large sample of children participating in Head Start programs around the United States. Results not only confirm the fit of the four-factor model of the PreBERS for this population, but also demonstrate the application of a bifactor model to the structure of the PreBERS which, in turn, allows for the computation of model-based reliability estimates for the four subscales (Emotional Regulation, School Readiness, Social Confidence, Family Involvement) and overall strength index score. The implications suggest that the PreBERS items are reliable scores that can be used to identify behavioral strengths in preschool children in Head Start, and support planning of interventions to selectively address component skills to promote child social and academic success.
The present study examines the psychometric properties of the Social and Emotional Health Survey (SEHS), which is a 32-item self-report behavior rating scale for assessing youths’ social–emotional competencies, with a small sample (N = 77) of academically at-risk students attending a limited-residency charter school. This study is the first to explore the technical adequacy of the SEHS with a concentrated sample of at-risk youth in an alternative school context, located in a different geographic locale compared with the SEHS’s original development samples. Findings indicate that the SEHS composite scales were internally reliable and demonstrated internal convergent validity with each other as well as external discriminant validity with indicators of teacher-reported internalizing and externalizing symptoms. Results also indicate that several of the SEHS’s subscales had poor internal reliability in the present sample, and thus the usefulness of the subscales for applied purposes seems questionable. Limitations of the present study and implications for future research and practice are discussed.
The class climate is acknowledged as being related to student learning. Students learn more in classrooms that are supportive and caring. However, there are few class climate instruments at the elementary school level. The aim of the current study was to assess the measurement invariance of a recently developed scale in a different context (New Zealand) from where it was developed (the United States) and across different ethnic groups. A total of 1,924 elementary school students (963 males and 961 females) participated. Students completed the Student Personal Perception of Classroom Climate (SPPCC). Results of the invariance tests of the SPPCC across four ethnic samples (New Zealand European, Māori, Pasifika, and Asian) indicated that the SPPCC represented the same four factors in classroom climate for each of these groups (configural invariance). Results also revealed that full metric invariance was supported although only partial scalar invariance was achieved because of a lack of invariance in the thresholds for five items. Therefore, this study provided empirical support for the SPPCC when used within a new context and with different ethnic groups. Future studies to enhance the usability of the SPPCC are discussed.
Following the call to ensure the validity of instruments used to assess users’ level of Internet usage, this study examined the factor structure of the Internet Addiction Test–Adolescence version (IAT-A) when applied to a sample of young children in a multicultural society and assessed whether the items in the IAT-A were invariant by gender and whether the factor mean scores differed significantly by gender. The IAT-A is a revised version of the original IAT, with very minor changes in item wording for use with adolescents and older children. A total of 325 primary and secondary students (140 males and 185 females) participated in this study. Exploratory and confirmatory factor analyses generated three factors (Loss of Control, Dereliction of Duty, and Excessive Use), all of which were subsumed under a second-order factor of overall Internet addiction. The results also revealed that the factor loadings of the IAT-A were invariant by gender and that, although males had higher factor mean scores than females, the differences were very small. Future studies to enhance the usability of the IAT and IAT-A are discussed.
The purpose of this study was to evaluate and compare two behavior screening instruments: the Behavioral and Emotional Screening System and the Behavior Screening Checklist. The sample consisted of 492 elementary school children from the southeastern United States. The psychometric properties of the screening instruments were evaluated in terms of intra-rater agreement, concurrent validity, and predictive validity of academic and behavioral outcomes. Results revealed that both screeners were significantly correlated with behavioral and academic variables. Consumers of screening instruments are encouraged to select an instrument that has sound psychometric properties and is practical for use in applied settings.
Two exploratory bifactor methods (e.g., Schmid–Leiman [SL] and exploratory bifactor analysis [EBFA]) were used to investigate the structure of the Woodcock–Johnson III (WJ-III) Cognitive in early school age (age 6-8). The SL procedure is recognized by factor analysts as a preferred method for EBFA. Jennrich and Bentler recently developed an alternative EBFA procedure. They claim that EBFA more readily produces independent cluster structure and overcomes the proportionality constraint experienced by the SL. The results of both analyses support the preeminence of the g factor at age 6 to 8. Examination of omega coefficients, the divergent factor structure, and the small amount of variance accounted for by the lower order factors suggests caution when interpreting beyond the higher order factor. Implications for interpretation of the WJ-III Cognitive at age 6 to 8 are discussed.
Content validation is a crucial, but often neglected, component of good test development. In the present study, content validity evidence was collected to determine the degree to which elements (e.g., grammatical structures, items, picture responses, administration, and scoring instructions) of the Comprehension of Written Grammar (CWG) test are representative of the construct of interest and appropriate for deaf and/or hard of hearing (DHH) students. Using 10 subject matter experts (SMEs) and the Content Validity Index, the quantitative findings showed general support for the content validity of elements of the CWG in assessing the written grammar comprehension of DHH students. Suggested revisions to the test, based on the feedback provided by the SMEs, are discussed.
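As a rough illustration of how a Content Validity Index can be computed from expert ratings, the sketch below uses a common convention that the abstract itself does not specify: each expert rates an element on a 4-point relevance scale, and ratings of 3 or 4 count as "relevant". The function names, the cutoff, and the ratings are all hypothetical and are not taken from the CWG study.

```python
from typing import List


def item_cvi(ratings: List[int], relevant_min: int = 3) -> float:
    """Item-level CVI: share of experts whose rating is >= relevant_min.

    The 4-point scale and the cutoff of 3 are assumed conventions,
    not details reported in the abstract.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_average(all_ratings: List[List[int]], relevant_min: int = 3) -> float:
    """Scale-level CVI computed as the mean of the item-level CVIs."""
    if not all_ratings:
        raise ValueError("all_ratings must not be empty")
    return sum(item_cvi(r, relevant_min) for r in all_ratings) / len(all_ratings)


# Invented ratings from 10 hypothetical experts for two test elements.
example_ratings = [
    [4, 4, 3, 4, 3, 4, 4, 3, 4, 4],  # every expert rates the element relevant
    [4, 3, 2, 4, 3, 4, 2, 3, 4, 4],  # two experts rate it below the cutoff
]

for i, ratings in enumerate(example_ratings, start=1):
    print(f"element {i}: I-CVI = {item_cvi(ratings):.2f}")
print(f"S-CVI/Ave = {scale_cvi_average(example_ratings):.2f}")
```

Elements with a low item-level index would be the natural candidates for the kind of revision the experts' feedback suggests.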
Independently learning from informative texts becomes increasingly important from the age of 11. Little information is available, however, on (a) how and to what extent late elementary education students already apply specific text-learning strategies, and (b) whether different learner profiles can already be distinguished. In this study, a 37-item Text-Learning Strategies Inventory (TLSI) was developed and validated by means of exploratory (Sample 1; 896 students) and confirmatory factor analysis (Sample 2; 644 students). The TLSI contains nine subscales reflecting overt, covert, surface- and deep-level cognitive and metacognitive text-learning strategies. Based on these subscales, four learner profiles (i.e., integrated strategy users, information organizers, mental learners, and memorizers) were identified and validated by means of hierarchical and k-means cluster analysis and study traces. No differences in text free recall score between profiles were found. More girls were profiled as integrated strategy users, whereas more boys were identified as mental learners or memorizers.
The development and validation of the Brazilian Temperament Scale for Students (BTSS) are examined through the use of data from 1,258 children and adolescents, ages 10 through 21 (M = 15.0, SD = 2.1, 56% females). Three psychometric properties of BTSS are reported: its internal structure (e.g., validity), its reliability, and cut points to best distinguish between the bipolar styles for each of the scale’s four constructs (extroversion–introversion, practical–imaginative, thinking–feeling, and organized–flexible styles). Rasch analyses were used to estimate item intensity and students’ latent score parameters. The use of construct maps to help establish norms is described. The results of exploratory factor analysis using varimax rotation confirm the four scales’ intended bipolar factors are composed of items that theoretically represent the desired constructs. The confirmatory factor analysis for a four-factor structure generally displays good fit indexes. Infit and outfit values reach acceptable ranges (i.e., from 0.68 to 1.32). Results are interpreted using cut points.
This study assesses the ability of a brief screening form, the Behavioral and Emotional Screening System–Student Form (BESS-SF), to predict scores on the much longer form from which it was derived: the Behavior Assessment System for Children–Second Edition Self-Report of Personality–Child Form (BASC-2-SRP-C). The present study replicates a former study included in the BESS manual with an entirely new sample. Participants included 252 students from a large, urban, Southwestern U.S. city school district in the third through fifth grades. The sample’s ethnic majority was Hispanic (81.7%). Results revealed high specificity and negative predictive values between the screener and omnibus form, suggesting a child who identifies as not "at-risk" on the BESS-SF will likely identify as not "at-risk" on the BASC-2-SRP-C domains. These results effectively replicate the previous findings with a new sample of largely Hispanic (Latino/a) students from a large urban school district.
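To make the reported indices concrete, the following sketch shows how specificity and negative predictive value are derived from a 2 x 2 cross-classification of screener results against the longer form. The counts are invented for illustration and are not the study's data; the function name is likewise hypothetical.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Classification indices for a screener judged against a criterion.

    tp: at-risk on both screener and criterion
    fp: at-risk on screener only
    tn: not at-risk on both
    fn: at-risk on criterion only
    """
    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a margin is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts for 100 students (not taken from the study).
metrics = screening_metrics(tp=8, fp=5, tn=80, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

High specificity and negative predictive value together mean that a "not at-risk" screening result is rarely contradicted by the criterion measure, which is the pattern the abstract describes.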
Through an examination of measurement invariance, this study investigated whether attachment-related dimensions (i.e., secure base, safe haven, and negative interactions as measured with the Network of Relationships Inventory—Behavioral Systems Version) have the same psychological meaning for early adolescents in their relationships with parents and teachers. Data were gathered for a sample of 297 families with an adolescent in Grade 7 (M age = 11.40; 62% boys). The results indicated that perceived attachment-related dimensions have a similar meaning in parent–child and teacher–child relationships (weak metric invariance), but that no direct comparison of observed means should be made (lack of strong metric invariance). In addition, it seemed that teachers fulfill the function of secure base rather than safe haven in early adolescence.
To determine whether children and adolescents (7-17 years old) who had experienced physical, sexual, or both types of abuse reflected distinct profiles of personal resiliency, we administered the Resiliency Scales for Children and Adolescents (RSCA) to 250 youth. We performed cluster analyses with T scores for the RSCA Self-Mastery, Relatedness, and Emotional Reactivity scales, and four profiles of resiliency representing high (20%), average (28%), and slightly below-average (30%) resiliency, and high vulnerability (22%) were identified. The youth with the high vulnerability profile described themselves as more depressed and were rated by their parents as having more internalizing and externalizing problems than the youth with high resiliency. We propose different treatment approaches that might be used with youth representing each of the four different profiles.
The psychometric properties of the Revised Children’s Manifest Anxiety Scale–Second Edition (RCMAS-2) were examined in a sample of 1,003 U.S. elementary and secondary students in Grades 2 to 12. Confirmatory factor analyses (CFAs) were performed comparing the five-factor (target) model consisting of three anxiety (Physiological Anxiety, Social Anxiety, and Worry) factors and two defensiveness (Defensiveness 1 and 2) factors with a three-factor model (one anxiety factor and two defensiveness factors). The results of the CFAs conducted indicated that the five-factor model provided a better fit to the data than the three-factor model. Tests of measurement invariance were also performed and the results provided support for configural, metric, and partial scalar invariance of the RCMAS-2 scores across gender. Latent mean analyses were also conducted and the results of these analyses indicated that females scored significantly higher than males on the three anxiety factors. These findings provide support for the construct validity of the RCMAS-2 scores. Implications of the findings for mental health professionals who work with elementary and secondary school students are discussed.
The Student Engagement Instrument (SEI) is a self-report measure of cognitive and affective engagement with school. Prior SEI validation studies have focused primarily on construct validity through analyses of internal consistency, factor analysis, and measurement invariance. Results are presented here from a two-pronged study of the criterion validity of SEI scores. Using a middle school sample (N = 35,900), concurrent validity was assessed through analyses of group differences in SEI scores across student subgroups expected to differ in cognitive and affective engagement levels: behaviorally disengaged versus non-disengaged, high-risk versus low-risk disability status, and high versus low academic achievement. Next, through multiple logistic regression analyses, the 4-year predictive validity of SEI scores for on-time graduation and dropout was assessed in a cohort of first-time ninth graders (N = 11,588). Nearly all SEI factors demonstrated directionally consistent associations with each criterion, including considerable long-term predictive associations with both dropout and on-time graduation.
The factor structure of the Phonological Awareness Literacy Screening for Grades 1 through 3 (PALS 1-3), a widely used early literacy screener in the Commonwealth of Virginia, was investigated using a large sample of public-school second-grade students (n = 14,993). Three alternative factor models (i.e., a one-factor, two-correlated factors, and a bifactor model) were tested and explored using an exploratory sample consisting of a randomly selected half of the overall sample. Model fit indices using confirmatory factor analyses indicated that a bifactor model fit the best and supported the scoring methods used with PALS 1-3, which is largely measured by a general factor of orthographic knowledge. The model was found to replicate in the randomly selected hold-out sample as well and exhibited adequate measurement precision (ωh = .88).
In this study, a computerized measure, Interactive Analogical Measure (IAM), was developed and used to assess young children’s ability to reason analogically. The IAM was equipped to provide corrective feedback and the effects of that feedback were tested for experimental and control groups. A group of 5-year-olds (N = 80) participated in the study. Children were randomly assigned to the experimental or control group and the IAM was similarly implemented with the exception of the feedback indicators. There were significant differences between the experimental and control groups for overall performance. Data suggested that the effect of immediate, corrective feedback emerged very early during the test and manifested a cumulative effect across the entire test. It seems that an interactive measure provides meaningful problem-solving experience for young children.
Research on the effectiveness of competence-based education (CB-education) across educational contexts and levels requires a new evaluation measurement. This study explores the face validity, construct validity, and robustness of a competency self-report instrument that is aligned with contemporary competence theory and with current educational practice based on CB-qualification frameworks. A pilot study showed face validity of the competency constructs and indicators according to students from various levels in tertiary education. The results of the principal components analyses and parallel analyses, using data from 351 secondary vocational education and academic students, show more construct validity and robustness for competency constructs that are concrete and easy to relate to specific situations (e.g., "applying expertise") compared with the abstract competencies (e.g., "deciding and initiating"). This article sets out implications for designing and administrating uniform competency self-reports across educational levels and discusses suggestions for subsequent research.
Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates the long-standing issue of identifying the optimal number of response categories. Special emphasis is given to categorical data, which were generated using the Item Response Theory (IRT) Graded Response Model (GRM). Along with the number of categories (from 2 to 6), two scale characteristics were examined: scale length (n = 5, 10, and 20 items) and item discrimination (high/medium/low). Results show virtually no difference in the psychometric properties of scales using 4, 5, or 6 categories. The largest deterioration, across all six psychometric measures, was observed when the number of response categories was reduced from 3 to 2. Small moderating effects of scale length and item discrimination seem to be present; that is, changing the number of response categories has a slightly larger impact on the psychometric properties of a shorter and/or highly discriminating scale. The study concludes that caution is warranted when a scale has only 2 response categories, but that this limitation may be overcome by manipulating other scale features, namely, scale length or item discrimination.
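For readers unfamiliar with how categorical responses can be generated under a graded response model, here is a minimal sketch. The discrimination value, the thresholds, and the function names are assumptions chosen for illustration; the abstract does not report the generating parameters, so this is not the study's simulation design.

```python
import math
import random
from typing import List


def grm_category_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Category probabilities under Samejima's graded response model.

    thresholds must be in increasing order and a must be positive;
    K - 1 thresholds yield K response categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def simulate_response(theta: float, a: float, thresholds: List[float],
                      rng: random.Random) -> int:
    """Draw one response (coded 0 .. K-1) from the category distribution."""
    probs = grm_category_probs(theta, a, thresholds)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Assumed values: a 4-category item with discrimination 1.5.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])

rng = random.Random(2024)  # fixed seed so the draws are reproducible
responses = [simulate_response(rng.gauss(0.0, 1.0), 1.5, [-1.0, 0.0, 1.0], rng)
             for _ in range(10)]
print(responses)
```

Varying the number of thresholds changes the number of response categories, and varying `a` changes item discrimination, which are two of the conditions the abstract says were manipulated.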
Test anxiety was examined in college students with and without attention deficit hyperactivity disorder (ADHD). Results indicated that, relative to college students without ADHD, college students with ADHD reported higher total test anxiety as well as specific aspects of test anxiety, including worry (i.e., cognitive aspects of test anxiety) and emotionality (i.e., physiological aspects of test anxiety). Effect sizes were large for total test anxiety and the worry aspect of test anxiety. Nearly half of college students with ADHD reported clinically significant levels of the worry aspect of test anxiety. Females with ADHD reported higher levels of the emotionality aspect of test anxiety than did males with ADHD. Those with combined type and inattentive type ADHD did not differ on any aspect of test anxiety. Implications for assessment and intervention are discussed.
The purpose of this study is to provide evidence of reliability and validity of the Self-Efficacy for Teaching Mathematics Instrument (SETMI). Self-efficacy, as defined by Bandura, was the theoretical framework for the development of the instrument. The complex belief systems of mathematics teachers, as touted by Ernest, provided insights into the elements of mathematics beliefs that could be related to a teacher’s self-efficacy beliefs. The SETMI was developed in July 2010 and has undergone revisions to the original version through processes defined in this study. Evidence of reliability and validity was collected to determine whether the SETMI is an adequate instrument to measure self-efficacy of elementary mathematics teachers. Construct validity of the revised SETMI was tested using confirmatory factor analysis. Findings indicate that the SETMI is a valid and reliable measure of two aspects of self-efficacy: pedagogy in mathematics and teaching mathematics content.
Cognitive assessment of young children contributes to high-stakes decisions because results are often used to determine eligibility for early intervention and special education. Previous reviews of cognitive measures for young children highlighted concerns regarding adequacy of standardization samples, steep item gradients, and insufficient floors for young children functioning at lower levels. The present report extends previous reviews by including measures recently published or revised, nonverbal cognitive assessment tools, and issues specific to assessing bilingual or non-English-speaking children. Sixteen tests were reviewed, including all available measures of cognitive functioning for 2- to 4-year-old children normed in the United States. Test characteristics evaluated included (a) representativeness and recency of standardization data, (b) item bias analysis, (c) psychometric characteristics, and (d) appropriateness for assessing young children with developmental delays and non-English-speaking children. Implications are discussed for clinicians, researchers, and test developers.
A new multidimensional measure of test anxiety, the Test Anxiety Measure for Adolescents (TAMA), specifically designed for U.S. adolescents in Grades 6 to 12 was developed and its psychometric properties were examined. The TAMA consists of five scales (Cognitive Interference, Physiological Hyperarousal, Social Concerns, Task Irrelevant Behavior, and Worry). The results of confirmatory factor analyses on the responses of a sample of middle and high school students to the TAMA indicated that a five-factor (target) model provided a better fit to the data than a one-factor model. Results also indicated that the TAMA scores had adequate internal consistency reliability. Evidence supporting the convergent and discriminant validity of the TAMA scores was found. Implications of the findings for school personnel who work with adolescent students are discussed.
This study was designed to examine the factor structure and psychometric properties of the English as a Foreign Language Reading Anxiety Inventory (EFLRAI). A total of 939 non-English major students responded to the EFLRAI. Exploratory and confirmatory factor analyses were performed using a principal component analysis and structural equation modeling. Reliability analysis was also conducted to provide an indication of the internal consistency (reliability) of the measurement instrument. The findings of the study confirmed the adequacy of the three-factor model for the EFLRAI and also indicated decent reliability through internal consistency for the measure. The results not only support the EFLRAI’s multidimensionality, but also indicate the usefulness of the EFLRAI in reading anxiety research among non-English major students. The limitations of the study are discussed and recommendations for further research are provided.
Technical adequacy and usability are important considerations in selecting early childhood social-emotional (SE) screening and assessment measures. As identification of difficulties can be tied to programming, intervention, accountability, and funding, it is imperative that practitioners and decision makers select appropriate and quality measures from the plethora of measures available. This study systematically reviewed and evaluated the technical adequacy and usability of 10 commonly used SE assessment and screening measures, using a framework for evaluating selected properties of measures (e.g., reliability, validity). Through this review, it was found that there are inadequacies in many commonly used SE measures, deserving the attention of both users and developers.
There have been few research studies to examine the positive mental health of Asian adolescents. The aim here is to examine the factorial structure, internal consistency, test-retest reliability, and convergent/discriminant validity of a Korean version of the Mental Health Continuum–short form (K-MHC-SF), a newly developed self-report scale for positive mental health assessment, in a sample of South Korean adolescents. The Korean sample comprised 547 high school students (57% were female), ranging in age from 14 to 17 years (mean age = 16.08 years, SD = 0.34). Confirmatory factor analysis revealed that the K-MHC-SF replicated the three-factor structure of emotional, psychological, and social well-being found in earlier studies. Another confirmatory factor analysis supported the correlated two-factor model of mental health and mental disorder. The internal consistency of the overall K-MHC-SF was .91. The total score on the K-MHC-SF significantly correlated with a measure of life satisfaction (r = .58) and a measure of self-esteem (r = .57). In addition, the attempt at categorical classification revealed that 11.7% were in the category of positive mental health, characterized as flourishing, and 13.0% were in the category of absence of positive mental health, described as languishing. The results of the present study suggest that the K-MHC-SF is a psychometrically sound instrument for measuring the three lower-order dimensions of subjective well-being.
Cognitive assessments are used for a variety of research and clinical purposes in children with autism spectrum disorder (ASD). This study establishes concurrent validity of the Wechsler Intelligence Scales for Children–fourth edition (WISC-IV) and Differential Ability Scales–second edition (DAS-II) in a sample of children with ASD with a broad range of cognitive abilities. Participants achieved significantly higher overall scores on the DAS-II and nearly half the sample achieved a higher classification label on the DAS-II. The difference between overall scores is suggested to be attributable to a relative weakness in processing speed, which is assessed on the WISC-IV but not the DAS-II. Autistic symptomatology was not associated with cognitive scores, while adaptive behavior was positively associated. Neither was associated with the magnitude of difference between overall scores. Choice of assessment should be considered carefully given the systematic differences in overall scores produced in this population.
The Woodcock–Johnson-III cognitive in the adult time period (age 20 to 90 plus) was analyzed using exploratory bifactor analysis via the Schmid–Leiman orthogonalization procedure. The results of this study suggested possible overfactoring, a different factor structure from that posited in the Technical Manual and a lack of invariance across both age ranges under study. Even when forcing the seven-factor fit, the structure was problematic. The results from the 20 to 39 age group displayed patterns of convergence with and divergence from the Technical Manual’s structure. The results from the 40 and above age group were generally consistent with the Technical Manual’s structure except for retrieval fluency. This study is consistent with the body of exploratory factor analysis structural validity evidence suggesting that contemporary tests of cognitive ability, particularly those based on Cattell–Horn–Carroll theory, are overfactored and lack alignment with their respective Technical Manual’s presented structure.
This study examined whether the Single-Item Math Anxiety Scale (SIMA), based on the item suggested by Ashcraft, provided valid and reliable scores of mathematical anxiety. A large sample of university students (n = 279) was administered the SIMA and the 25-item Shortened Math Anxiety Rating Scale (sMARS) to evaluate the relation between the scores of the two measures. The university students were also administered other tests to provide validity evidence for the SIMA scores. The temporal stability of the SIMA scores was also evaluated over a 7-week test-retest interval. The findings of the study demonstrated that the SIMA scores showed evidence of validity and strong test-retest reliability. We advocate for the use of the SIMA as a quick and useful means of assessing math anxiety, particularly in research and educational settings when large samples have to be assessed.
This study examined the predictive validity of a teacher rating scale called the Self-Regulation Strategy Inventory–Teacher Rating Scale (SRSI-TRS) and its level of convergence with several student self-report measures of self-regulated learning (SRL). Eighty-seven high school students enrolled in one of four sections of a mathematics course in an urban high school and one mathematics teacher participated in the study. Correlation analyses revealed moderate correlations between the SRSI-TRS and self-report questionnaires targeting students’ motivation beliefs (i.e., mathematics interest) and regulatory behaviors in mathematics. Students’ self-efficacy perceptions correlated with all SRL and achievement measures, but not with the SRSI-TRS. Hierarchical regression analyses showed that the SRSI-TRS emerged as the primary SRL predictor of achievement, although student reports of their maladaptive SRL behaviors were a significant predictor in the final model.
Vocational interest inventories are commonly analyzed using a unidimensional approach, that is, each subscale is analyzed separately. However, the theories on which these inventories are based often postulate specific relationships between the interest traits. This article presents a multidimensional approach to the analysis of vocational interest data, which takes these relationships into account. Models in the framework of Multidimensional Item Response Theory (MIRT) are explained and applied to a widely used German vocational interest inventory based on the RIASEC model, the AIST-R. MIRT models were more appropriate to describe the data than unidimensional models. It follows that responses to some items were not only influenced by the interest type they were designed to measure but also by another dimension. The advantages of MIRT models are discussed.
There is empirical evidence to suggest that oral language and vocabulary on entering kindergarten are the best predictors of later reading success. Identifying skills that are predictive of later achievement using psychometrically sound measurement methods is a necessary component of early intervention efforts. Currently, there are limited methods for measuring early vocabulary acquisition. The Dynamic Indicators of Vocabulary Skills (DIVS) were designed to measure the vocabulary of preschool and kindergarten students. The purpose of this article is to contribute to the psychometric evidence supporting the use of the DIVS as effective measures of early vocabulary acquisition. This article presents an array of validity evidence for the DIVS, including predictive validity and the construct validity evaluated by both convergent validity and discriminant validity estimates.
The aim of the present study was to investigate the reliability and validity of a brief standardized assessment of children’s working memory, Lucid Recall. Although there are many established assessments of working memory, Lucid Recall is fully automated and can therefore be administered in a group setting. It is therefore ideally suited to large-scale screening or research purposes. The findings indicated suitable test–retest reliability. Scores were also correlated with children’s scores on the Wechsler Intelligence Scale for Children–IV working memory subtests, scholastic attainment, and ratings of children’s working memory behaviors. Working memory scores also distinguished between children with and without special educational needs. The findings are discussed in terms of practical implications for practitioners.
Traditional resources for ascertaining risk and protection for disengaged youth are often unsuitable, due to the stamina and skill required to complete them. Many of these tools assess risk without considering participants’ potential for personal growth. The present study outlines the development and initial validation of a tool titled the Contextualized Assessment Tool for Risk and Protection Management (CAT-RPM), which was administered to 499 participants across a range of high school settings. Six factors emerged that were highly correlated and had good internal consistency. Multivariate tests strongly suggest that the CAT-RPM is a valid and psychometrically sound assessment tool for differentiating groups across sex, age, and antisocial behavior. The research reveals a reliable measure of risk and protection that can assist young people to recognize and build on their strengths and adds a positive dimension to traditional risk assessment tools.
This study explored the Junior Brixton Test (JBT), an executive function (EF) measure for children, in comparison to the Wisconsin Card Sorting Test (WCST) in a sample of 6- to 8-year-olds, all attending the first 2 years of elementary school. Factor analyses indicated two main domains in both measures, namely concept formation and cognitive flexibility. However, within the cognitive flexibility domain of the JBT, perseveration scores reflected qualitatively different perseverative errors. More specifically, perseveration of previous rule and same stimulus scores loaded on the same subcomponent, whereas perseveration of same response loaded on another. The latter score was also negatively correlated both with a measure of general reasoning ability and a memory span task. The authors argue that the JBT is a promising tool to explore individual variations behind seemingly one type of executive function error, namely perseveration.
The authors contrasted Bracken Basic Concept Scale: Receptive, Third Edition (BBCS: R-3) test performance between 57 children with intellectual disability (ID) and 76 children with autism spectrum disorder (ASD) and ID. BBCS: R-3 School Readiness Composite (SRC) and Self-/Social Awareness subtests were analyzed. Multivariate analysis of covariance revealed no differences between groups on SRC performance; however, children with ID demonstrated better mastery of self-/social awareness concepts when compared to children with ASD. Within the group of children with ASD, mastery of school-based concepts exceeded mastery of self-/social awareness concepts. Findings suggest relatively greater delays in mastery of self-/social awareness concepts for young children with ASD when compared to mastery of other concepts.
Letter name knowledge in the preschool ages is a strong predictor of later reading ability, but little is known about the psychometric characteristics of uppercase and lowercase letters considered together. Data from 1,113 preschoolers from diverse backgrounds on both uppercase and lowercase letter name knowledge were analyzed using Item Response Theory. Results indicated that uppercase and lowercase form a single dimension. Uppercase letters tended to be easier and more discriminating but had a narrow range of difficulty. Visual confusability (e.g., b vs. d) was an important aspect of both discrimination and difficulty. Including lowercase letters in the assessment of letter name knowledge increases its range to enable effective measurement of children with higher ability. A practical implication is that assessments of letter name knowledge can have fewer items and measure an extended range of ability while maintaining high levels of reliability.
The Polish Temperament Styles Questionnaire (PTSQ), derived from the Student Style Questionnaire (SSQ), was developed to measure four bipolar temperament styles: extroverted versus introverted, practical versus imaginative, thinking versus feeling, and organized versus flexible. The study focuses on factorial validity and measurement invariance (configural, metric, and scalar) across gender and age groups using data from 1,022 students ages 8 to 19. Confirmatory factor analysis (CFA) supports the four-factor model, and multigroup confirmatory factor analysis (MGCFA) supports measurement invariance for both age and gender groups.
An important consideration in determining the validity of an observational assessment measure for young children is the variability attributed to the child versus that ascribed to the assessor or to some other factor such as classroom context. The Teaching Strategies GOLD® assessment system was used to elicit teacher ratings of a national sample of 21,592 children (age 12-51 months). Teacher ratings of child development and learning were associated in expected directions with both child demographic characteristics and classroom composition variables. Children with disabilities started behind their typically developing peers and grew more slowly, girls showed an advantage over boys in some areas, and English language learners (ELLs) were rated lower at the beginning of the year and showed somewhat faster rates of growth than their native English-speaking peers.
This study provided an independent examination of the Teacher Student Relationship Inventory (TSRI), a teacher report measure developed in Singapore. A total of 500 American high school students were rated by 84 teachers. Exploratory factor analysis supported the existence of three factors representing instrumental help, satisfaction, and conflict; 11 of 14 items emerged as relatively pure indicators. Evidence of concurrent validity was provided through correlations, in the expected directions, between students’ ratings of their overall relationships with their teachers (teacher support and negative attitudes towards teachers) and TSRI satisfaction and conflict scores; instrumental help was unrelated to student perceptions of general teacher–student relations. Criterion-related validity was established through significant correlations in the expected directions between TSRI satisfaction and conflict scores and multiple indicators of students’ psychological and academic functioning. Instrumental help co-occurred with greater academic achievement but also with more teacher-observed symptoms of psychopathology. Findings provide initial support for use of the TSRI with American adolescents and suggest teacher-rated satisfaction as particularly relevant to students’ academic and psychological functioning.
Students who exhibit substantial behavior and emotional problems in school often have shown less severe problems earlier. Screening for such problems can suggest which students need extra support and help educators to direct support to students who are more likely to benefit. The present study explored predictive validity of a very brief behavior problem screening procedure as applied to 2,253 students ages 5 to 17 years. About half were special education students identified with emotional disturbance; the rest were students with no identified disabilities. Teachers rated them on the 10 items of the Emotional and Behavioral Screener. Any student whose sum of ratings exceeded a norm-based cutoff score was designated as at-risk; otherwise the student was not at-risk. Binary classification analyses of four age-level by gender subgroups of students showed that the instrument validly identifies at-risk students. Study method limitations and directions for research to clarify some remaining questions about this screening procedure are presented.
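The cutoff rule and binary classification analysis described in the preceding abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the cutoff value, and the example ratings are all hypothetical, and the abstract does not report the actual norm-based cutoff.

```python
def is_at_risk(item_ratings, cutoff):
    """Flag a student as at-risk when the sum of item ratings exceeds the cutoff."""
    return sum(item_ratings) > cutoff


def sensitivity_specificity(ratings_by_student, has_identified_problem, cutoff):
    """Compare screener decisions with known status; return (sensitivity, specificity)."""
    tp = fp = tn = fn = 0
    for ratings, truth in zip(ratings_by_student, has_identified_problem):
        flagged = is_at_risk(ratings, cutoff)
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and truth:
            fn += 1
        else:
            tn += 1
    # Guard against empty groups so the ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical data: four students, ten items each, cutoff chosen for illustration only.
ratings = [[3] * 10, [0] * 10, [2] * 10, [1] * 10]
status = [True, False, False, True]
sens, spec = sensitivity_specificity(ratings, status, cutoff=15)
```

Sensitivity is the share of students with identified problems whom the screener flags; specificity is the share of students without identified problems whom it does not flag.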
The influential Common Core State Standards for Mathematics (CCSSM) expect students to start statistics learning during middle grades. Thus teacher education and professional development programs are advised to help preservice and in-service teachers increase their knowledge and confidence to teach statistics. Although existing self-efficacy instruments used in statistics education focus on students, the Self-Efficacy to Teach Statistics (SETS) instrument measures a teacher’s efficacy to teach key CCSSM statistical topics. Using the results from a sample of n = 309 participants enrolled in a mathematics education or introductory statistics course, SETS scores were validated for use with middle grades preservice teachers to differentiate levels of self-efficacy to teach statistics. Confirmatory factor analysis using the Multidimensional Random Coefficient Multinomial Logit Model supports the use of two dimensions, which exhibit adequate reliabilities and correspond to the first two levels of the Guidelines for Assessment and Instruction in Statistics Education adopted by the American Statistical Association. Item and rating scale analyses indicate that the items and the six-category scale perform as intended. These indicators suggest that the SETS instrument may be appropriate for measuring preservice teacher levels of self-efficacy to teach statistics.
Little extant research attempts to understand why rural African Americans engage in social relationships with peers in school. This is somewhat surprising as rural students’ peer interactions often affect their scholastic desires, and peers can alter African Americans’ academic performance. Hence, the current study examined both the presence and psychometric validity of social achievement goals among rural African American high school students. Results suggest the presence of three reasons for engaging in social relationships in school: social development (desire to increase friendship quality), social demonstration-approach (wanting to appear "cool" among friends), and social demonstration-avoid (fear of appearing socially inferior). Confirmatory factor analysis and Rasch analysis provide support for both the presence and valid measurement of social achievement goals among rural African American adolescents.
Confirmatory factor analysis was used to determine the factor structure of the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV) scores of 297 children referred to a children’s hospital in the Southwestern United States. Results support previous findings that indicate the WISC-IV is best represented by a direct hierarchical (bifactor) model including four first-order factors and one general intelligence factor. In this sample, the general intelligence factor accounted for 50% of the total variance and 76% of the common variance. Of the first-order factors, the Verbal Comprehension factor accounted for the most total (5.6%) and common (8.4%) variance. Furthermore, the general intelligence factor accounted for more variance in each subtest than did its respective first-order factor and it was the only factor that exhibited adequate measurement precision (ωh = .87). These findings support previous recommendations that WISC-IV interpretations should focus on the general intelligence factor over first-order factors.
The Social Skills Rating System (SSRS) developed by Gresham and Elliott (1990) is a multirater, norm-referenced instrument measuring social skills and adaptive behavior in preschool children. The aims of the present study were (a) to test the factorial structure of the Parent Form of the SSRS for the first time with a German preschool sample and (b) to present a modified version appropriate for 3- to 6-year-olds in German preschools. The sample consisted of 391 children (187 males, 204 females) from German preschools and their parents. Confirmatory factor analyses (CFAs) revealed a poor overall fit for the original version. For the extended age range (3-6 years), a revised version of the SSRS with a reduced and newly composed item pool is proposed based on exploratory factor analyses. Results are discussed with respect to practicability and revision of the SSRS for German preschool populations.
This study employed a newly developed measure, the Social Skills Q-Sort (SSQ), to assess paraprofessionals’ and teachers’ reports of social skills for children with and without ASD. Paraprofessionals and teachers showed good rater-agreement on the SSQ. ROC curve analyses yielded an excellent profile of sensitivity and specificity for discriminating between children with ASD and typically developing children. The paraprofessional SSQ converged with objective ratings of playground social behavior; however, there was little evidence of convergence between SSQ scores and parent and teacher ratings on questionnaire measures. The SSQ may be effective in screening for ASD and the severity of ASD-related social communication challenges.
Research on secular trends in mean intelligence test scores shows smaller gains in vocabulary skills than in nonverbal reasoning. One possible explanation is that vocabulary test items become outdated faster compared to nonverbal tasks. The history of the usage frequency of the words on five popular vocabulary tests, the GSS Wordsum, Wechsler Adult Intelligence Scale (WAIS), Wechsler Adult Intelligence Scale–Revised (WAIS-R), Wechsler Intelligence Scale for Children (WISC), and Wechsler Intelligence Scale for Children–Revised (WISC-R) IQ tests, was analyzed by means of the Google ngram viewer. Usage frequency had a 0.38 to 0.73 correlation with item difficulty. In the period between test standardizations, the median change in usage frequency was –17% for WISC words, –8% for Wordsum, –5% for WISC-R, –4% for WAIS, and 0% for WAIS-R words. The correlation between median change in usage frequency and gain in vocabulary score was 0.33. Further studies with a larger set of vocabulary tests are needed to analyze in more detail the magnitude of the effect of changing word usage frequencies.
Problem solving is a key component of weight loss programs. The Social Problem Solving Inventory–Revised (SPSI-R) has not been evaluated in weight loss studies. The purpose of this study was to evaluate the psychometrics of the SPSI-R. Cronbach’s α (.95 for total score; .67-.92 for subscales) confirmed internal consistency reliability. The SPSI-R score was significantly associated (ps < .05) with decreased eating barriers and binge eating, increased self-efficacy in following a cholesterol-lowering diet, consumption of fewer calories and fat grams, more frequent exercise, lower psychological distress, and higher mental quality of life, all suggesting concurrent validity with other instruments used in weight loss studies. However, confirmatory factor analysis indicated that the hypothesized five-factor structure did not fit the data well (χ² = 350, p < .001).
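For readers unfamiliar with the internal consistency coefficient reported in the preceding abstract, the standard formula for Cronbach's α can be sketched as below. The function name and the toy response matrix are illustrative assumptions; the sketch uses sample variances and is unrelated to the SPSI-R data themselves.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose rows (respondents) into columns (items).
    item_columns = list(zip(*scores))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: three respondents, two items.
toy_scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_scores)
```

Values closer to 1 indicate that the items covary strongly relative to their individual variances.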
This study examined empirical evidence for clinical utility of the Wechsler Intelligence Scale for Children, fourth edition (WISC-IV) cancellation subtest by comparing data from 597 clinical and 597 matched control children. The results of dependent t tests and sequential logistic regression analyses demonstrated that (a) children with intellectual disabilities, motor impairments, head injuries, Autistic/Asperger’s disorder, ADHD and learning disabilities, and mathematics disorder showed significant deficits on the cancellation subtest; (b) children with intellectual disabilities and Asperger’s disorder benefited when stimuli were randomly aligned, but children with ADHD benefited from structured conditions; (c) beyond the Full Scale IQ (FSIQ) and General Ability Index (GAI)–Cognitive Proficiency Index (CPI) discrepancy scores, the cancellation subtest added unique diagnostic power to identify children with reading disorders, mild intellectual disabilities, closed head injuries, and motor impairments. These results suggest the utility of the cancellation subtest in clinical assessment.
The M5-50 is a five-factor theory instrument based on the International Personality Item Pool (IPIP) that has had difficulties with the five-factor model fitting well. The openness domain’s factor structure has a history of concerns that might relate to the connected yet distinguishable facets of openness/intellect. This study explored the factor structure and interdomain correlations within the openness domain of the M5-50 in 255 college students. Results indicate that no significant interdomain correlations exist between Openness and the other M5-50 domains. In addition, results suggest that having one factor convey the aspects of the openness domain construct does not explain the structure of the domain as well as a three-factor solution; the three-factor solution in the M5-50 includes commonalities to the distinct IPIP facets of artistic interests and intellect while the third factor demonstrates less emergent facets of the domain. Implications of the findings include a suggested review of the openness domain in the M5-50 and interpretations of openness/intellect in vocational settings.
This study examined the longitudinal factor structure of general self-concept and locus of control among high school students over a 4-year period, with data from the National Educational Longitudinal Study of 1988. Measurement invariance was tested over time and across gender and ethnic groups; second-order piecewise latent growth models were applied to study changes. In all analyses, Likert-type scale items were correctly treated as ordered-categorical variables and methodology was used accordingly. Results suggested that the measurement structure of general self-concept and locus of control was stable over time and across groups. In addition, both constructs decreased and then increased during the 4-year period. The female group and the White group followed the pattern of changes of the total sample. The male group and the three ethnic minority groups (Asian/Pacific Islander, Hispanic, and Black) differed from the total sample in their change patterns. Further, group differences were observed in the two constructs at the base year.
Since facilitating the fullest development of each student in terms of enriching their intellectual development as well as their personal, social, and emotional development has become an important objective, social and emotional guidance has become an integral part of education. This recent acknowledgment has stimulated research interest into how to measure, evaluate, and optimize guidance activities in schools. This interest, however, is contingent on a valid assessment of integrated socioemotional guidance, which despite the growing attention still remains a problem. This study therefore aims to investigate the validity and generalizability of the Socio-Emotional Guidance Questionnaire (SEG-Q). Measurement invariance across three groups of teachers teaching in different stages of secondary education (Total n = 3,336) was tested, by means of multigroup confirmatory factor analyses (MGCFA) in Mplus. The results show partial invariance of the SEG-Q across the teacher groups, confirming that the SEG-Q is a psychometrically sound self-report instrument for secondary education teachers which can be used by researchers and practitioners to measure, map, describe, or evaluate integrated socioemotional guidance.
The preschool years are a critical time to identify and treat early emotional or behavioral problems. Universal screening can be used to identify emotional and behavioral risk in preschoolers and fits well within current service delivery frameworks. This criterion-related validity study examined the use of a brief teacher-rated screener, the Behavioral and Emotional Screening System (BESS Preschool) in a sample of 65 preschool-age students from a predominately Latino/a background. Findings suggest that screening results from the BESS Preschool are highly correlated with important outcomes, including kindergarten readiness, receptive vocabulary, and social emotional development.
The authors demonstrate the increment of clinical validity in early childhood assessment of physical impairment (PI), developmental delay (DD), and autism (AUT) using multiple standardized developmental screening measures such as performance measures and parent and teacher rating scales. Hierarchical regression and sensitivity/specificity analyses were used to identify the differential impact of each domain the scales measure. Significant findings include (a) self-help domains in either parent or teacher questionnaires are more significant contributors than social-emotional domains to early detection, (b) performance measures are stronger predictors than parent or teacher questionnaires in detecting physical impairment or developmental delay, and (c) parent questionnaires measuring self-help skills are a stronger predictor of autism than performance measures. These results support the combined use of parent and teacher rating scales and provide important implications in choosing instruments for different developmental disorders when time and resources are limited.
A pilot study was conducted to examine the psychometric properties of the Environmental School Transition Anxiety Scale (E-STAS) with a sample of 220 fourth- to sixth-grade students who were about to or had completed their school transition. The results of an exploratory factor analysis (EFA) of the students’ responses on the E-STAS produced a two-factor (Academic and School Organization) structure. The E-STAS scores demonstrated adequate internal consistency reliability and 1- to 3-week test score stability. Gender differences were also found on the E-STAS, with females outscoring males. In addition, the results of correlational analyses provided support for the convergent and discriminant validity of the E-STAS scores. Implications of the findings for school professionals are discussed.
The Strengths and Difficulties Questionnaire (SDQ) is an instrument developed by Goodman for screening child and adolescent psychopathology. The aim of this study is to contribute to the analysis and validation of the internal structure of the Italian SDQ teacher version (SDQ-T). The SDQ-T was completed by 301 teachers, evaluating 3,302 children aged 3 to 15 years. Exploratory factor analysis (EFA) performed on a portion of the sample (n = 1,000) led to five interpretable factors, partially different from the original structure: Items 2 and 10 loaded on the Conduct Problems scale instead of on the Hyperactivity/Inattention scale. Confirmatory factor analyses (CFAs) were performed to compare three different latent structures: the original five-factor structure; a second-order model recently proposed by Goodman, Lamping, and Ploubidis; and the structure obtained with the EFA. The last of these showed the best fit. Significant differences by gender and school grade were found. The Italian SDQ-T’s internal structure differed in part from the original instrument; possible causes are discussed.
Measures of adaptive behaviors provide an important tool in the repertoire of clinical and school/educational psychologists. Measures that assess adaptive behaviors typically have been built in Western cultures and developed in light of behaviors common to them. Nevertheless, these measures are used elsewhere despite a paucity of data that examine their cross-national transportability. The issue of test transportability of such measures is important because they are used in cultures that differ from those in which they initially were developed as well as with immigrants. This present article describes the Romanian and Taiwan adaptation process of the U.S.–developed Adaptive Behavior Assessment System, Second Edition (ABAS-II) Parent Form for ages 5 to 21. Steps taken to help ensure a valid translation and cultural adaptation process are described. Data from more than 3,000 parents who completed the ABAS-II are examined across these three test versions, focusing on score differences, internal consistency, intercorrelations, and factor structure equivalence. Data from the three versions display considerable similarity. Score differences are infrequent between the three versions yet display some differences mainly at lower ages. Scale reliabilities are high and comparable for all the three versions. Correlation patterns are sufficiently similar between the three versions. Confirmatory factor analyses show a similar fit for all three versions.
This study examined reliability and validity of the Devereux Early Childhood Assessment (DECA), based on samples of parents’ and teachers’ ratings of 1,145 entering kindergartners in the Southwest. Confirmatory factor analysis showed that DECA presented good reliability and validity for manifest variables, corroborating previous findings. Three latent variables (initiative, self-control, and attachment) substantiated reliability estimates but showed insufficient discriminant validity because of multicollinearity among latent variables. We recommended caution in interpreting and applying results of DECA assessments in practice because of a lack of discriminant validity. Thus, these findings addressed the need for researchers, educators, and policy makers to consider alternative instruments for early identification of social and emotional problems in young children until new research and a revised DECA model show evidence for the validity of outcomes.
The goal of this study was to investigate reliable cognitive change in epilepsy by developing computational procedures to determine reliable change index scores (RCIs) for the Dutch Wechsler Intelligence Scales for Children. First, RCIs were calculated based on stability coefficients from a reference sample. Then, these RCIs were applied to a sample of 73 children with refractory epilepsy who were tested twice with the WISC-RNNL/WISC-IIINL after a mean interval of 2.3 years. Results indicated that children with refractory epilepsy are at risk for cognitive decline over time: 26.0% of the children showed reliable losses on Verbal IQ and 16.4% on the full scale IQ (expected rate = 5%). Declines on performance IQ were within expected limits.
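A reliable change index of the kind described in the preceding abstract can be illustrated with the commonly used Jacobson-Truax form, RCI = (X2 - X1) / (SD * sqrt(2) * sqrt(1 - r)), where r is the stability coefficient. This is an assumption for illustration: the abstract does not state which RCI formula the authors used, and the function names, the critical value of 1.645, and the example scores are hypothetical.

```python
import math


def reliable_change_index(score_t1, score_t2, sd, stability):
    """Jacobson-Truax RCI: observed change divided by the standard error of the difference."""
    sem = sd * math.sqrt(1.0 - stability)   # standard error of measurement
    s_diff = math.sqrt(2.0) * sem           # standard error of the difference score
    return (score_t2 - score_t1) / s_diff


def classify_change(rci, critical=1.645):
    """Label a change as reliable when it exceeds the (assumed) critical value."""
    if rci <= -critical:
        return "reliable loss"
    if rci >= critical:
        return "reliable gain"
    return "no reliable change"


# Hypothetical IQ-metric example: SD = 15, stability coefficient = .90.
rci_large_drop = reliable_change_index(100, 85, sd=15, stability=0.90)
rci_small_drop = reliable_change_index(100, 95, sd=15, stability=0.90)
```

With a critical value of 1.645, about 5% of individuals would be expected to show a reliable loss by chance alone, which is consistent with the expected rate quoted in the abstract, though the authors' exact criterion is not given there.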
The aims of the study were (a) to develop a scale to measure university students’ task value and (b) to use confirmatory factor analytic techniques to investigate the construct validity of the scale. The questionnaire items were developed based on theoretical considerations and the final version contained 38 items divided into 4 subscales. Analyses were conducted on 2 samples of university students (n1 = 430, n2 = 430). The results of confirmatory factor analysis suggested a modified version of the 4-factor model. The scale may have construct validity for the current sample of university students. Finally, possible applications for this scale, including for the early identification and prevention of problematic task values among university students, and implications for further psychometric research are highlighted.