Selected Current Research Projects for RMME Faculty | Research Methods, Measurement, and Evaluation

Psychometrics

Scale linking in multidimensional bifactor item response models

The bifactor model provides a viable representation of the structure of items and their relationships in multidimensional item response models. The linking of different forms of test items that show bifactor structure poses a challenging problem. In this project, methods for linking multidimensional items with a bifactor structure are provided and evaluated.

A Bayesian approach to assessing differential item functioning

A Bayesian framework for assessing DIF is provided in this project. The complete posterior distribution of the difference in the parameters of interest in the reference and focal groups is obtained. A more detailed analysis of DIF than is possible with current procedures can be carried out by examining the complete posterior distribution of the parameters in the groups of interest.
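
As a rough illustration of what examining the complete posterior distribution can offer, the sketch below summarizes the posterior of the difference in an item's difficulty between the focal and reference groups. It assumes posterior draws for each group are already available; here they are simulated from normal distributions purely for illustration. The function name `summarize_dif` and the 0.3 threshold are hypothetical and not taken from the project.

```python
import numpy as np

def summarize_dif(b_ref, b_focal, threshold=0.3):
    """Summarize the posterior of the focal-minus-reference difficulty difference."""
    diff = np.asarray(b_focal, dtype=float) - np.asarray(b_ref, dtype=float)
    lower, upper = np.percentile(diff, [2.5, 97.5])
    return {
        "mean": float(diff.mean()),                    # posterior mean difference
        "ci95": (float(lower), float(upper)),          # 95% credible interval
        # posterior probability that the difference is practically meaningful
        "prob_exceeds": float(np.mean(np.abs(diff) > threshold)),
    }

# Simulated posterior draws (an assumption standing in for real MCMC output).
rng = np.random.default_rng(0)
b_ref = rng.normal(0.0, 0.1, size=5000)
b_focal = rng.normal(0.5, 0.1, size=5000)

summary = summarize_dif(b_ref, b_focal)
print(summary)
```

Because the whole distribution of the difference is retained, any summary of interest (an interval, a tail probability, a probability of exceeding a practical threshold) can be read off the same set of draws.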

Assessing invariance in structural equation models: A Bayesian approach

A general Bayesian framework for examining invariance of parameters in populations of interest in the context of structural equation models is provided. The framework encompasses continuous and discrete endogenous and exogenous variables. An MCMC procedure is employed to obtain the posterior distributions of pairwise differences of the parameters in the populations.

Using response time information to improve item parameter estimation in IRT models.

With the advent of computer-based and computer adaptive testing, information regarding the time taken for an examinee to respond to an item can be routinely collected. Such auxiliary information can be used to improve the estimation of item and ability parameters. The use of response time as auxiliary information for improving item and ability parameter estimation in polytomous response models is evaluated.

Determining standard errors of linking using a bootstrap resampling approach.

Determining standard errors when test forms are linked is critical in the measurement context. While standard errors using IRT can be obtained, the information function approach may not be appropriate when the tests are short or if the model does not fit the data well. Bootstrap sampling approaches may provide better estimates. The bootstrap and information function approaches for computing standard errors of linking are compared.
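
A minimal sketch of the bootstrap idea follows, under several assumptions that are not stated in the project description: linking uses the mean-sigma method on difficulty estimates of common items, and the resampling unit is the common item. Resampling examinees and re-estimating item parameters is another possible scheme. The function names and simulated difficulties are hypothetical.

```python
import numpy as np

def mean_sigma(b_x, b_y):
    """Mean-sigma linking coefficients (A, B) so that b_y is approximately A * b_x + B."""
    A = np.std(b_y, ddof=1) / np.std(b_x, ddof=1)
    B = np.mean(b_y) - A * np.mean(b_x)
    return A, B

def bootstrap_linking_se(b_x, b_y, n_boot=2000, seed=0):
    """Bootstrap standard errors of (A, B) by resampling common items with replacement."""
    rng = np.random.default_rng(seed)
    b_x = np.asarray(b_x, dtype=float)
    b_y = np.asarray(b_y, dtype=float)
    n_items = len(b_x)
    coefs = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n_items, size=n_items)  # resampled item indices
        coefs[i] = mean_sigma(b_x[idx], b_y[idx])
    return coefs.std(axis=0, ddof=1)

# Simulated common-item difficulties on two forms (illustrative values only).
rng = np.random.default_rng(1)
b_x = rng.normal(0.0, 1.0, size=20)
b_y = 1.2 * b_x + 0.5 + rng.normal(0.0, 0.1, size=20)

A, B = mean_sigma(b_x, b_y)
se = bootstrap_linking_se(b_x, b_y)
print(A, B, se)
```

The bootstrap standard errors are simply the standard deviations of the linking coefficients across resamples, so no information function or model-based variance formula is needed.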

Comparison of item response theory linking procedures for vertical scaling of test forms.

The development of vertical scales is crucial for the assessment of growth in children. However, several methodological issues must be resolved before vertical scales can be developed. Scale linking is one such issue. Several methods for scale linking are compared.

Smoothing procedures for the tails of vertical scales.

A problem that faces the development of scales is their behavior at the extremes. Vertical scales, in general, behave well in the middle of the scale but show instability at the low and high ends of the scales. Methods for smoothing the scale so that the scale shows stability at the extremes are investigated.

Assessment of growth and prediction of future performance of students using vertical scales.

Given that the impetus for the development of vertical scales is to assess student growth, it is important to develop methods for projecting student performance level/growth at a future grade so that students at risk can be identified and remedial action taken. In this project, several methods for projecting student growth/proficiency level, including growth models as well as regression models that do not require a vertical scale, are compared using statewide assessment data.

The effect of multidimensionality on the classification of students into proficiency categories.

Unidimensional latent proficiency must be assumed in using common item response models to estimate examinee ability and classify examinees into proficiency categories. The consequences of violation of this assumption are examined in this project. Since it is impractical to provide and use a score on each dimension measured by the test to classify students, (a) a multidimensional model is fitted to the response data, and (b) a unitary score is derived by weighting the dimensions using the test characteristic curve. The effect on classification accuracy of using this score compared to fitting a unidimensional model is examined.

Assessing the dimensionality of a set of dichotomous and polytomous test items.

While several procedures have been developed for assessing the fit of item response models, very few directly address the issue of dimensionality. A direct method for assessing dimensionality that takes into account the nonlinearity in the item responses is developed in this project and compared with available methods.

Scoring and combining multiple choice and constructed response items in mixed format tests.

Composite scores based on differentially weighting examination sections are often used for assigning grades in large-scale achievement, licensure, and certification testing programs. To the extent that the sections measure different constructs, different weighting schemes may change the relative standing of an examinee and alter the obtained grade. In this project, different weighting schemes based on classical and IRT-based methods are compared and evaluated with respect to accuracy of classification of examinees into proficiency categories.

Evaluation of the feasibility of computer adaptive testing and multi-stage testing in low incidence fields.

The advantages of computer adaptive tests over traditional paper and pencil tests have been well documented. However, computer adaptive tests require large item banks, a requirement that cannot be met in low incidence fields. Multistage testing may provide a solution to this problem. In this project, multistage testing is compared with fully adaptive testing through simulation studies with respect to the number of items required, the accuracy of the estimate of an examinee’s ability, and the accuracy of classification into proficiency categories.

Goodness of fit statistics for polytomously scored items.

In fitting any model to data, the evaluation of the fit of the model to the data is critical. While several procedures exist for assessing the goodness of fit for dichotomous data, very little information is available for polytomously scored items. In this project, classical and Bayesian goodness of fit measures suitable for polytomous models are developed and evaluated.

The effect of errors in item parameter estimates on the estimation of population characteristics.

The validity of large-scale assessment results used to describe and compare populations is predicated upon the accuracy with which the parameters of the population proficiency distributions are estimated. Ignoring errors in the estimation of item parameters can lead to serious problems in estimating population parameters. The effect of errors in item parameter estimation on the estimation of population characteristics is examined in this project. A joint estimation procedure employing MCMC methods is compared with the plausible value approach with respect to accuracy and bias of estimation.

Quantitative Research Methods

Optimal design for regression discontinuity (RD) studies

RD studies have become an increasingly popular tool for researchers in recent years. Recent work has clarified the conditions necessary to design RD studies with sufficient statistical power in education policy studies. Although existing work describes the conditions for optimal design of multi-level studies employing random assignment, these optimality results have not been extended to RD studies. This project derives results that will inform the optimal design of RD studies.

Improving experimental designs in the presence of contamination

A critical issue in designing experiments to estimate the causal effects of interventions is to account for the possibility that features of the intervention intended for the treatment group may unintentionally be experienced by the control group. This phenomenon is variously referred to as “contamination” or “treatment diffusion”. Contamination can bias estimates of treatment effects. Existing work has clarified that, even when contamination is substantial, the variance penalty incurred by opting for a cluster randomized design often outweighs the biasing effects of contamination. Current work involves extending existing results to more complex multi-level designs and to situations where outcomes are dichotomous, and applying the idea of pseudo-cluster randomization, originally proposed in the context of health studies, to educational contexts.

Using prior information about the ICC to improve power in research designs with clustering

Research designs that randomly assign entire clusters of individuals (such as schools) to treatments are common in studies of educational policy and practice. A major problem for these studies is that it is difficult to obtain a sufficient number of schools to conduct studies with adequate statistical power. This project involves deriving a new method of utilizing prior information about the intracluster correlation coefficient to improve power.

Statistical models for implementation fidelity and the implications of fidelity for statistical power.

One important aspect of interpreting the results of experiments is understanding the role played by fidelity of treatment implementation. Recently, evaluation researchers have created a model for formalizing and quantifying treatment fidelity. This project involves applying this model to understand the impact of treatment fidelity on statistical power.

The consequences of a mismatch between analytic and data generating models in education research.

A frequent mistake in the analysis of cluster randomized trials is made when the data are analyzed as if assignment was carried out at the level of individuals. It is useful to understand how to interpret the results of such analyses. This project derives actual (as opposed to nominal) type I and type II error rates under a variety of scenarios for a mismatch between the true data generating model and the model used for data analysis.
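
To give a sense of the gap between nominal and actual error rates, the sketch below uses a standard large-sample approximation: when clustering is ignored in a balanced cluster randomized trial, the test statistic is inflated by roughly the square root of the design effect, 1 + (n - 1) * ICC, where n is the cluster size. This is an illustrative approximation under the assumptions of equal cluster sizes and a two-sided normal test, and is not necessarily the derivation used in the project. The function name is hypothetical.

```python
from statistics import NormalDist

def actual_type1_error(cluster_size, icc, alpha=0.05):
    """Approximate actual type I error when clustering is ignored.

    Assumes equal cluster sizes, a two-sided test, and a large-sample
    normal approximation (illustrative assumptions only).
    """
    design_effect = 1.0 + (cluster_size - 1) * icc
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # nominal critical value
    # The naive statistic is too large by about sqrt(design_effect), so the
    # nominal critical value is effectively shrunk by that factor.
    return 2 * (1 - NormalDist().cdf(z_crit / design_effect ** 0.5))

no_clustering = actual_type1_error(cluster_size=20, icc=0.0)
with_clustering = actual_type1_error(cluster_size=20, icc=0.2)
print(no_clustering, with_clustering)
```

With an intracluster correlation of zero the actual rate equals the nominal rate; as the correlation or the cluster size grows, the actual rate rises well above the nominal level.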

School structure science success: Organization and leadership influences on student achievement.

The emerging consensus is that school organization and leadership have quantifiable influences on student achievement. This mixed methods project seeks to understand the complex interrelationships between school climate and culture, administrator and teacher perceptions, values, and beliefs, and students’ science achievement. The project involves the development and administration of a large teacher questionnaire, which measures over a dozen attitudinal and climate variables, as well as the pairing of the survey data with achievement data. The major quantitative analyses consist of a series of two-level structural equation models, which will allow for the exploration of mediational pathways of school climate and organizational influences on student achievement, as well as a series of school-level growth curve analyses.

Early Vocabulary Intervention Project

The Early Vocabulary Intervention project is a multi-site, randomized cluster design study that looks at the effects of a tiered vocabulary intervention on kindergarten students. Primarily, the research team will use hierarchical linear models to answer questions about the effectiveness of the intervention across schools and growth curve models to explore the long-term effects of the intervention on outcomes in reading and language arts. A graduate student in RMME serves as the data specialist on the team, under the supervision of an RMME faculty member.

Measurement

Assessment of Teacher Effectiveness

This research explores the instruments used to measure teacher effectiveness. A review of the research on the underlying construct, classroom observation protocols, and value-added models has been carried out. The present focus is on individualized goal-based measures of student growth, currently being implemented in places such as Rhode Island and New Haven, CT. Goal-based measures of effectiveness, such as Goal Attainment Scaling, have been used successfully in a variety of health care settings, but there is little research on their use in an educational context. We are in the process of validating a measure that uses individualized goals set by cooperating teachers as a measure of a student teacher’s contribution to student growth. Access to the Measures of Effective Teaching (MET) Project database, funded by the Bill and Melinda Gates Foundation, is in progress as well. This is the largest database of its kind, and would provide opportunities to examine many different aspects of teacher effectiveness.

Development and Validation of the Challenges to Scholastic Achievement Scale (CSAS)

This ongoing research project involves the development and validation of the Challenges to Scholastic Achievement Scale (CSAS), which is designed to identify negative manifestations of underachievement among high school students. The CSAS measures five constructs related to underachievement: alienation, negative academic self-perception, negative attitudes toward teachers and school, low motivation/self-regulation, and low goal valuation. Our goal is that educators, researchers, and clinicians will be able to use the CSAS to identify the students who are at the greatest risk of underachieving and to understand the reasons that certain able students underachieve so that they can target appropriate interventions.

Development and Validation of a Science Content Knowledge for Teaching Assessment

This ongoing research project involves the development and validation of a Science Content Knowledge for Teaching (S-CKT) assessment. The S-CKT is designed to measure specialized knowledge of content, which includes the deeper understanding that one develops through acquisition of an undergraduate major or minor, and the specialized disciplinary understanding that one develops through acquisition of a teaching degree and certification. It also includes pedagogical content knowledge, which is the subject-specific, practical knowledge of how to teach the content and of how students learn the content that is used by teachers to guide their actions in highly contextualized classroom settings. Our goal is that educators and researchers will be able to use the S-CKT to inform science teaching and learning.

Evaluating Content Alignment Methods

There are many ways to examine whether test content matches the intended objectives; most depend on test reviews by subject matter experts (SMEs). This series of studies compared how the results of content validity analyses differed depending on the instructions provided to one group of SMEs. Results of asking SMEs to rate items as aligned or not aligned to test objectives were compared to results generated when SMEs were allowed to rate alignment across a continuum. The studies also compared the grouping of items into subtests by test developers with the groupings generated by SMEs when they were asked to group items in ways that made sense to them.

Instructional Sensitivity of Large Scale Tests

Instructional sensitivity refers to the ability of tests to detect instructional efforts. For school accountability to work, it is vital that tests reflect instructional efforts instead of student aptitude or other non-school factors that affect achievement. However, most state assessment and accountability systems do not evaluate the ability of tests to capture instructional efforts. This study examined the instructional sensitivity of one state test by collecting detailed information about the way that teachers operationalized state standards, rating teachers according to their alignment to and emphasis on tested topics, and using this information to predict student performance using multilevel models.

Teaching to the Test Under Standards-based Reform

Although teaching to the test is a ubiquitous term, there is no commonly adopted definition of what constitutes teaching to the test in a standards-based environment, nor is there a commonly accepted list of appropriate and inappropriate test preparation practices. This study proposes both a definition and a list, reviews a group of third and fifth grade teachers’ stated practices, and uses multilevel modeling to determine whether teaching to the test affects test performance after controlling for prior performance.

Validation of Standards-based Report Cards

Many school districts require that teachers complete report cards using the same performance level categories as reported on the state test instead of assigning letter grades, in an attempt to increase teacher focus on state standards and to improve the quality of information provided to parents on report cards. This study compares report card grades and test scores to determine their level of concurrence. Teachers’ explanations of their grading methods are used to determine whether the use of reliable grading methods results in greater consistency with state test scores. Finally, the study examines the contribution of teacher, student, and content area to inconsistencies between report card grades and test scores.

CSDE/UConn Measurement, Evaluation, and Assessment Partnership

The Connecticut State Department of Education (CSDE)/UConn partnership was established in 2003. The purpose of the partnership is to provide additional technical resources to the CSDE student assessment office to develop, administer, and report results from statewide measures of student achievement. Support services are provided for the main assessment program, which includes the Connecticut Mastery Test (CMT) and the Connecticut Academic Performance Test (CAPT), as well as for smaller scale assessment initiatives such as the CMT/CAPT Skills Checklist, the modified assessment program, the kindergarten inventories, and formative assessment programs. Examples of services provided include: (1) independent analysis of testing data to confirm analyses performed by the CSDE and/or its contractors to ensure data accuracy and program quality, as well as to resolve technical issues; (2) item review and test form review for developing instrumentation; (3) research to monitor the effectiveness of the student assessment programs; (4) research on current assessment issues such as the Core Content Standards, growth modeling, closing the achievement gap, the use of technology to enhance student learning, and teacher quality.

Project VIABLE-II: Unified validation of Direct Behavior Rating (DBR)

This project involves validating an 11-point behavior rating scale that teachers complete at the end of instructional periods using teacher ratings of approximately 2000 students in grades 1-2, 4-5, and 7-8 across three states. Teachers complete three separate behavioral measures for each student, reflecting on the student’s behavior over the course of a week, in the Fall, Winter and Spring of four school years. Analyses include setting cut scores using ROC analyses, analysis of multitrait-multimethod matrices, and examining the predictive validity of the measures in determining behavioral risk in later school years. In addition, we will use verbal protocol analyses to determine what information teachers consider in rating students and will examine the efficiency and usability of the instrument based on survey data.
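
One common way to turn an ROC analysis into a cut score is to pick the rating that maximizes Youden's J (sensitivity + specificity - 1). The sketch below illustrates that criterion only; the project description does not say which criterion is used. The toy ratings, the assumption that higher ratings indicate greater risk, and the function name `youden_cut_score` are all hypothetical.

```python
import numpy as np

def youden_cut_score(scores, at_risk):
    """Return the cut score maximizing Youden's J, with students flagged when score >= cut.

    Assumes both at-risk and not-at-risk students are present in the data.
    """
    scores = np.asarray(scores, dtype=float)
    at_risk = np.asarray(at_risk, dtype=bool)
    best_cut, best_j = None, -np.inf
    for cut in np.unique(scores):
        flagged = scores >= cut
        sensitivity = np.mean(flagged[at_risk])     # at-risk students flagged
        specificity = np.mean(~flagged[~at_risk])   # other students not flagged
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy ratings on a 0-10 scale with a known risk status (illustrative only).
scores = [0, 1, 2, 2, 3, 6, 7, 8, 9, 10]
at_risk = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

cut, j = youden_cut_score(scores, at_risk)
print(cut, j)
```

Other criteria, such as fixing a minimum sensitivity and then maximizing specificity, would slot into the same loop by changing how candidate cut scores are compared.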

Evaluations

Connecticut Prekindergarten Impact Evaluation

With a large volume of literature pointing to the beneficial impact of pre-kindergarten, researchers, educators, and policymakers are now raising questions about what works for whom, and under what conditions. However, few studies exist on the differential and contextual effects of these programs, creating opportunities for new research in this area. This evaluation project involves using a regression discontinuity design to examine the average causal impact of attending prekindergarten, both overall and by race and income, across four different outcomes (reading, oral language, vocabulary, and mathematics).

Connecticut Prekindergarten Cost-Benefit Analysis

Cost-benefit analysis (CBA) goes beyond traditional measures of effect size to compare costs with program benefits. This evaluation project involves assessing the impact of attending prekindergarten relative to monetary and non-monetary costs. Key evaluation questions include: (1) What are the comprehensive costs associated with offering enhanced prekindergarten slots through the state-funded programs in Connecticut?; (2) How much do prekindergarten expenditures vary across the state?; (3) What is the relationship between prekindergarten expenditures and student outcomes?; and (4) What is the estimated net benefit of the prekindergarten expansion pilot on student learning, teacher retention, and family engagement?

Applied Educational Research

Methodological support for the National Center for Research on Gifted Education at the University of Connecticut

With funding authorized through the Jacob K. Javits Gifted and Talented Students Education Act, the Institute of Education Sciences, U.S. Department of Education (PR/Award #R305C140018) launched the National Center for Research on Gifted Education at the University of Connecticut to address these issues. During the first three years (Phase 1), the Center examined the extent of gifted programming and student participation in three states; identified districts and schools that showed high achievement growth rates among gifted students, including those from underserved groups; and explored how these sites successfully identified, served, and retained students from underrepresented groups in gifted programs. The exploratory Phase 1 work focused on identifying gifted and talented programs that had a strong commitment to identifying and serving students from underrepresented groups and that showed promise for improving student outcomes. In Phase 2 (Years 4 and 5), we are examining the effect of attending dedicated gifted classes in core content areas on students’ academic achievement in reading/language arts and mathematics in a large, ethnically, economically, and linguistically diverse urban school district. We do so by comparing the reading/language arts and mathematics achievement of gifted students in three different settings: schools offering a full-time gifted-only program with gifted classes in all subject areas, schools offering a part-time gifted-only program with gifted classes in mathematics, and schools offering a part-time gifted-only program with gifted classes in reading/language arts. The Center’s work extends over a total of 5 years (approximately 3 years for Phase 1 and 2 years for Phase 2).

Co-PI and internal evaluation of Science of Learning and Art of Communication NSF NRT grant

The National Science Foundation has awarded our interdisciplinary group of UConn researchers a five-year grant, “The science of learning, from neurobiology to real-world application: A problem-based approach.” We aim to develop transformative models for graduate education in science, technology, engineering, and mathematics (STEM) fields, training 50 students (including 25 Ph.D. fellows). This grant follows our highly productive Integrative Graduate Education and Research Traineeship grant in the Neurobiology of Language.

The “Science of Learning and Art of Communication,” or SLAC, draws on subfields of cognitive science and neuroscience: genetics, behavioral neuroscience, linguistics, education, psychology, and speech-language-hearing sciences. “SLACer” graduate students, and their mentors, will develop new, team-based, interdisciplinary approaches to learning. SLAC teams will also learn how to communicate effectively with a wide range of audiences, including not only specialist peers but also the general public. SLACers will face the challenge of how to clearly and effectively share ideas without assuming prior knowledge or relying on technical jargon. This skill not only enables excellence in research, but empowers trainees to become ambassadors for science to society as a whole.

Students will complete a one-year graduate seminar on the science of learning and the challenge of communicating their research to audiences ranging from specialist peers to school children, drawing on techniques from the performing arts and new approaches using digital media. In a hands-on practicum, they will develop skills crucial to academic and nonacademic careers that are often absent in graduate education, such as project design and management, budgeting and resource allocation, and external communications. The program will also promote diversity in academic and industry careers that require advanced training by prioritizing recruitment and retention.