In this article, we study a simple mathematical model of a bilingual community in which all agents are fluent in the majority language but only a fraction of the population has some degree of proficiency in the minority language. We investigate how different distributions of proficiency, combined with the speakers’ attitudes in favor of or against the minority language, may influence its use in pair conversations.
Two approaches to the statistical analysis of social network generation are widely used: the tie-oriented exponential random graph model (ERGM) and the stochastic actor-oriented model (SAOM), or Siena model. While the choice between these models by empirical researchers often seems arbitrary, there are important differences between them that the current literature tends to overlook. First, the ERGM is defined on the graph level, while the SAOM is defined on the transition level. This allows the SAOM to model asymmetric or one-sided tie transition dependence. Second, network statistics in the ERGM are defined globally but are nested in actors in the SAOM. Consequently, dependence assumptions in the SAOM are generally stronger than in the ERGM. As a result of both differences, the meso- and macro-level properties of networks that can be represented by either model differ substantively, and analyzing the same network with ERGMs and SAOMs can lead to distinct results. Guidelines for theoretically founded model choice are suggested.
Fuzzy-set qualitative comparative analysis (fsQCA) has become one of the most prominent methods in the social sciences for capturing causal complexity, especially for scholars with small- and medium-N data sets. This research note explores two key assumptions in fsQCA’s methodology for testing for necessary and sufficient conditions—the cumulation assumption and the triangular data assumption—and argues that, in combination, they produce a form of aggregation bias that has not been recognized in the fsQCA literature. It also offers a straightforward test to help researchers answer the question of whether their findings are plausibly the result of aggregation bias.
Respondent-driven sampling (RDS), a link-tracing sampling and inference method for studying hard-to-reach populations, has been shown to produce asymptotically unbiased population estimates when its assumptions are satisfied. However, some of the assumptions are prohibitively difficult to satisfy in the field, and the violation of a crucial assumption can produce biased estimates. We compare two different inference approaches: design-based inference, which relies on the known probability of selection in sampling, and model-based inference, which is based on models of human recruitment behavior and the social context within which sampling is conducted. The advantage of the latter approach is that when the violation of an assumption has been shown to produce biased population estimates, the model can be adjusted to more accurately reflect actual recruitment behavior and thereby control for the source of bias. To illustrate this process, we focus on three sources of bias: differential effectiveness of recruitment; a form of nonresponse bias; and bias resulting from status differentials that produce asymmetries in recruitment behavior. We first present diagnostics for identifying each type of bias and then present new forms of a model-based RDS estimator that control for each type. In this way, we show the unique advantages of a model-based estimator.
In this article, a numerical method integrated with a statistical data simulation technique is introduced to solve a nonlinear system of ordinary differential equations with multiple random variable coefficients. Monte Carlo simulation is combined with the central divided difference formula of the finite difference (FD) method and repeated n times, so that the values of the variable coefficients are drawn by random sampling rather than being fixed as single real values over time. The mean of the n final solutions from this integrated technique, abbreviated as the mean Monte Carlo finite difference (MMCFD) method, represents the final solution of the system. This method is proposed for the first time to calculate the numerical solution obtained for each subpopulation as a vector distribution. The numerical outputs are tabulated, graphed, and compared with previous statistical estimations for 2013, 2015, and 2030, respectively. The solutions of FD and MMCFD are found to be in good agreement, with small standard deviations of the means and a small measure of difference. The new MMCFD method is useful for predicting intervals of random distributions for the numerical solutions of this epidemiology model, with better approximation and agreement between existing statistical estimations and FD numerical solutions.
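The MMCFD idea — repeat an FD solve with randomly drawn coefficients and average the n final solutions — can be sketched briefly. This is a minimal illustration, not the authors' implementation: it uses a hypothetical two-equation subpopulation system, a simple forward-difference step in place of the central divided difference formula, and made-up coefficient distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fd_solve(beta, gamma, s0=0.9, i0=0.1, t_end=10.0, h=0.01):
    """Finite-difference march of a toy two-subpopulation system
    s' = -beta*s*i, i' = beta*s*i - gamma*i (hypothetical model)."""
    s, i = s0, i0
    for _ in range(int(t_end / h)):
        s, i = s + h * (-beta * s * i), i + h * (beta * s * i - gamma * i)
    return s, i

def mmcfd(n=1000, beta_mean=0.5, beta_sd=0.05, gamma_mean=0.1, gamma_sd=0.01):
    """Mean Monte Carlo finite difference: repeat the FD solve n times with
    coefficients drawn at random, then average the n final solutions."""
    sols = np.array([
        fd_solve(rng.normal(beta_mean, beta_sd), rng.normal(gamma_mean, gamma_sd))
        for _ in range(n)
    ])
    # mean vector is the MMCFD solution; the spread gives a prediction interval
    return sols.mean(axis=0), sols.std(axis=0)
```

The standard deviation of the n final solutions is what allows the method to report intervals rather than point values for each subpopulation.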
The National Study of Protest Events (NSPE) employed hypernetwork sampling to generate the first-ever nationally representative sample of protest events. Nearly complete information about various event characteristics was collected from participants in 1,037 unique protests across the United States from 2010 to 2011. The first part of this article reviews extant methodologies in protest-event research and discusses how the NSPE overcomes their recognized limitations. Next, we detail how the NSPE was conducted and present descriptive statistics for a number of important event characteristics. The hypernetwork sample is then compared to newspaper reports of protests. As expected, we find many differences in the types of events these sources capture. At the same time, the overall number and magnitude of the differences are likely to be surprising. By contrast, little variation is observed in how protesters and journalists described features of the same events. NSPE data have many potential applications in the field of contentious politics and social movements, and several possibilities for future research are outlined.
This article analyzes the effect of interviewers’ physical attractiveness on cooperation rates in face-to-face interviews and survey responses (self-reports on physical appearance, weight, and health). This article includes four aspects of physical attractiveness (facial attractiveness, voice attractiveness, body mass index [BMI], and height) and reports that (1) interviewers with more attractive faces and lower BMI have higher cooperation rates, (2) differences in interviewers’ personality (Big Five, Rosenberg self-esteem) account for about one third of the total effect of facial attractiveness on cooperation rates, and (3) being interviewed by a more attractive interviewer leads to more positive self-reports on physical appearance, weight, and health (but does not affect self-reports unrelated to physical appearance).
It is necessary to test for equivalence of measurements across groups to guarantee that comparisons of regression coefficients or mean scores of a latent factor are meaningful. Unfortunately, when tested, many scales display nonequivalence. Several researchers have suggested that nonequivalence may itself be a useful source of information about why measurements are biased and have proposed a multilevel structural equation modeling (MLSEM) approach to explain why equivalence does not hold. This method can consider a latent between-level factor and/or single contextual variables and use them to explain items’ nonequivalence. In the current study, we show that this method may also be useful for social science studies in general, and for survey research and sociological comparative studies in particular, when one fails to establish cross-group equivalence. We utilize data from the International Social Survey Program national identity module (2003) to test for the cross-country equivalence of a scale measuring attitudes toward granting citizenship rights to immigrants. As expected, the scale fails to achieve scalar equivalence. However, drawing on group threat theory, we explain a significant part of the most nonequivalent intercept by a latent between-level factor and one contextual variable, namely, the percentage of foreigners in the country. We show that the method does not necessarily rectify nonequivalence, but it can help to explain why it is absent.
Information quality deficiencies have been detected in occupational safety and health surveys in Europe, which typically gather self-reported data provided by employers or their representatives. For instance, their low response rates and informant profiles make estimations on establishments with safety representatives (SRs) unreliable. We tested the mode of administration and the informants as sources of error regarding establishments with SRs in Catalonia, Spain. Two sources of information were compared: the Second Catalan Survey of Working Conditions 2011 (IICSWC)—with a methodology similar to surveys conducted at the state and European level—and the Program on Prevention of Risks Management in Companies (PPRMC)—in which the labor authority collected data through documentary verification in another sample of establishments. The percentage of establishments with SRs was estimated using data from the PPRMC, as were the differences in percentage between sources and informant profiles (with 95 percent confidence intervals). Results show that the IICSWC overestimates the percentage of establishments with SRs.
In this article, alternative randomized response models are proposed, which make use of the sum of quantitative scores generated from two decks of cards used in a survey. The proposed methods are compared to the Odumade and Singh model and the Singh and Grewal model through a simulation study. It is shown that the modified methods can be used more efficiently than both of those models.
This article distinguishes three measures of intergenerational economic mobility that emerge when the population is divided into groups: overall individual mobility, within-group mobility, and between-group mobility. We clarify their properties and the relationship between them. We then evaluate Clark’s use of surname between-group persistence as a preferred measure of intergenerational mobility in the book The Son Also Rises: Surnames and the History of Social Mobility. We show that aggregate surname-level intergenerational persistence cannot be compared with individual persistence because group-level income averages capture diverse individual-level and group-level factors that are impossible to disentangle without additional identifying information. Furthermore, measures of group persistence do not address the problem of measurement error leading to attenuation bias, which is Clark’s rationale for studying surname mobility. An empirical example partitioning the population into groups based on racial/ethnic origins and a simulation clarify the relationship between these different measures of mobility.
Development and refinement of self-report measures generally involves selecting a subset of indicators from a larger set. Despite the importance of this task, methods applied to accomplish this are often idiosyncratic and ad hoc, or based on incomplete statistical criteria. We describe a structural equation modeling (SEM)-based technique, based on the standardized residual variance–covariance matrix, which subsumes multiple traditional psychometric criteria: item homogeneity, reliability, and convergent and discriminant validity. SEMs with a fixed structure, but with substituted candidate items, can be used to evaluate the relative performance of those items. Using simulated data sets, we demonstrate a simple progressive elimination algorithm, which demonstrably optimizes item choice across multiple psychometric criteria. This method is then applied to the task of short-form development of the multidimensional "4Es" (Excitement, Esteem, Escape, Excess) scale, which are understood as indicators of psychological vulnerability to gambling problems. It is concluded that the proposed SEM-based algorithm provides an automatic and efficient approach to the item-reduction stage of scale development and should be similarly useful for the development of short forms of preexisting scales. Broader use of such an algorithm would promote more transparent, consistent, and replicable scale development.
This article explores the ontological politics of research in the field of community studies. Focusing on a migrant community in London, UK, it shows how the community is (re)assembled in different ways through the different research practices of academics and practitioners. Guided by a framework based on material semiotics, this article compares the agendas, methods, and representational texts that inform the different research practices. It argues that community studies researchers have an ethical responsibility to acknowledge the particular enactments of communities that emerge through their research and the role that agendas, methods, and texts play in constructing those enactments.
Between-subject design surveys are a powerful means of gauging public opinion, but critics rightly charge that closed-ended questions only provide slices of insight into issues that are considerably more complex. Qualitative research enables richer accounts but inevitably includes coder bias and subjective interpretations. To mitigate these issues, we have developed a sequential mixed-methods approach in which content analysis is quantitized and then compared in a contrastive fashion to provide data that capitalize upon the features of qualitative research while reducing the impact of coder bias in analysis of the data. This article describes the method and demonstrates the advantages of the technique by providing an example of insights into public attitudes that have not been revealed using other methods.
Data collected in psychological studies are mainly characterized by containing a large number of variables (multidimensional data sets). Analyzing multidimensional data can be a difficult task, especially if only classical approaches are used (hypothesis tests, analyses of variance, linear models, etc.). Regarding multidimensional models, visual techniques play an important role because they can show the relationships among variables in a data set. Parallel coordinates and Chernoff faces are good examples of this. This article presents self-organizing maps (SOM), a multivariate visual data mining technique used to provide global visualizations of all the data. This technique is presented as a tutorial with the aim of showing its capabilities, how it works, and how to interpret its results. Specifically, SOM analysis has been applied to analyze the data collected in a study on the efficacy of a cognitive and behavioral treatment (CBT) for childhood obesity. The objective of the CBT was to modify the eating habits and level of physical activity in a sample of children with overweight and obesity. Children were randomized into two treatment conditions: CBT traditional procedure (face-to-face sessions) and CBT supported by a web platform. In order to analyze their progress in the acquisition of healthier habits, self-register techniques were used to record dietary behavior and physical activity. In the traditional CBT condition, children completed the self-register using a paper-and-pencil procedure, while in the web platform condition, participants completed the self-register using an electronic personal digital assistant. Results showed the potential of SOM for analyzing the large amount of data necessary to study the acquisition of new habits in a childhood obesity treatment. 
Currently, the high prevalence of childhood obesity points to the need to develop strategies to manage a large number of data in order to design procedures adapted to personal characteristics and increase treatment efficacy.
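The SOM technique presented in the tutorial above can be sketched in a few lines. This is a minimal, illustrative implementation under assumed defaults (grid size, Gaussian neighborhood, exponentially decaying learning rate and radius); an applied study would typically rely on dedicated SOM software.

```python
import numpy as np

def train_som(data, grid=(6, 6), epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal self-organizing map: each grid node holds a weight vector;
    the best-matching unit (BMU) and its grid neighbors are pulled toward
    each training sample."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # grid coordinates of each node, for the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-epoch / epochs)  # shrinking neighborhood radius
        for x in data[rng.permutation(len(data))]:
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), (h, w))
            grid_d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
            nbh = np.exp(-grid_d2 / (2 * sigma**2))  # Gaussian neighborhood
            weights += lr * nbh[..., None] * (x - weights)
    return weights
```

After training, each case (here, each child's self-register profile) is mapped to its BMU, and cases landing on nearby nodes have similar profiles — this is what yields the global visualization of all the data.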
The article outlines a methodology for systematically observing collective violence (and public order policing in relation to it). Specific attention is given to matters of sampling and measurement and to the way in which observational challenges have been met in comparison with participant observational studies of demonstrations and football matches. The article shows that it is possible to conduct meaningful systematic observations of episodes of collective violence in a reliable way (more complete and more detailed than police records or newspaper reports) without compromising the physical safety of the observer. Even though violence at these types of events is relatively rare, it is also possible specifically to sample events with an increased likelihood for collective violence. Direct systematic observation of collective violence yields data that cannot be obtained by other means (surveys, interviews, participant observation) and that are crucial to an understanding of the initiation and escalation of collective violence.
Qualitative and multimethod scholars face a wide and often confusing array of alternatives for case selection using the results of a prior regression analysis. Methodologists have recommended alternatives including selection of typical cases, deviant cases, extreme cases on the independent variable, extreme cases on the dependent variable, influential cases, most similar cases, most different cases, pathway cases, and randomly sampled cases, among others. Yet this literature leaves it substantially unclear which of these approaches is best for any particular goal. Via statistical modeling and simulation, I argue that the rarely considered approach of selecting cases with extreme values on the main independent variable, as well as the more commonly discussed deviant case design, are the best alternatives for a broad range of discovery-related goals. By contrast, the widely discussed and advocated typical case, extreme-on-Y, and most similar cases approaches to case selection are much less valuable than scholars in the qualitative and multimethods research traditions have recognized to date.
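The two designs recommended above — extreme values on the main independent variable and deviant cases — reduce to simple sorting rules once a regression has been fitted. A minimal sketch with hypothetical variable names:

```python
import numpy as np

def select_cases(x, y, y_hat, k=3):
    """Illustrative regression-based case selection:
    'extreme_on_x' picks the k cases farthest from the mean of the main
    independent variable; 'deviant' picks the k cases with the largest
    absolute residuals from the fitted values y_hat."""
    x, y, y_hat = (np.asarray(a, dtype=float) for a in (x, y, y_hat))
    resid = y - y_hat
    extreme_on_x = np.argsort(-np.abs(x - x.mean()))[:k]
    deviant = np.argsort(-np.abs(resid))[:k]
    return {"extreme_on_x": extreme_on_x.tolist(), "deviant": deviant.tolist()}
```

Given a fitted model, `select_cases(x, y, y_hat, k=2)` returns the row indices of candidate cases for in-depth study under each design.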
Case studies are usually considered a qualitative method. However, some aspects of case study research—notably, the selection of cases—may be viewed through a quantitative template. In this symposium, we invite authors to contemplate the ways in which case study research might be conceived, and improved, by applying lessons from large-n cross-case research.
The combination of Qualitative Comparative Analysis (QCA) with process tracing, which we call set-theoretic multimethod research (MMR), is steadily becoming more popular in empirical research. Despite the fact that both methods have an elective affinity based on set theory, it is not obvious how a within-case method operating in a single case and a cross-case method operating on a population of cases are compatible and can be combined in empirical research. There is a need to reflect on whether and how set-theoretic MMR is internally coherent and how QCA and process tracing can be integrated in causal analysis. We develop a unifying foundation for causal analysis in set-theoretic MMR that highlights the roles and interplay of QCA and process tracing. We argue that causal inference via counterfactuals on the level of single cases integrates QCA and process tracing and assigns proper and equally valuable roles to both methods.
The literature proposes numerous so-called pseudo-R^{2} measures for evaluating "goodness of fit" in regression models with categorical dependent variables. Unlike the ordinary least squares R^{2}, log-likelihood-based pseudo-R^{2}s do not represent the proportion of explained variance but rather the improvement in model likelihood over a null model. The multitude of available pseudo-R^{2} measures and the absence of benchmarks often lead to confusing interpretations and unclear reporting. Drawing on a meta-analysis of 274 published logistic regression models as well as simulated data, this study investigates fundamental differences among distinct pseudo-R^{2} measures, focusing on their dependence on basic study design characteristics. Results indicate that almost all pseudo-R^{2}s are influenced to some extent by sample size, the number of predictor variables, the number of categories of the dependent variable, and the asymmetry of its distribution. Hence, an interpretation against goodness-of-fit benchmark values must explicitly consider these characteristics. The authors derive a set of goodness-of-fit benchmark values for this measure with respect to ranges of sample size and distribution of observations. This study raises awareness of fundamental differences in the characteristics of pseudo-R^{2}s and the need for greater precision in reporting these measures.
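Three of the most common log-likelihood-based pseudo-R^{2} measures can be computed directly from the fitted and null log-likelihoods, which makes the "improvement over a null model" interpretation concrete. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def pseudo_r2(ll_full, ll_null, n):
    """McFadden, Cox-Snell, and Nagelkerke pseudo-R^2 for a model with
    log-likelihood ll_full, against an intercept-only null model with
    log-likelihood ll_null, fitted on n observations."""
    mcfadden = 1.0 - ll_full / ll_null
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_full) / n)
    nagelkerke = cox_snell / (1.0 - np.exp(2.0 * ll_null / n))
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

def null_loglik(y):
    """Log-likelihood of the intercept-only logistic model, which just
    predicts the base rate of a 0/1 outcome vector y."""
    y = np.asarray(y, dtype=float)
    p = y.mean()
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

Note how the same pair of log-likelihoods yields three different values — the measures move apart as the outcome distribution and sample size change, which is why benchmark values cannot be applied across measures interchangeably.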
This study revisits the task of case selection in case study research, proposing a new typology of strategies that is explicit, disaggregated, and relatively comprehensive. A secondary goal is to explore the prospects for case selection by algorithm, aka ex ante, automatic, quantitative, systematic, or model-based case selection. We lay out a suggested protocol and then discuss its viability. Our conclusion is that it is a valuable tool in certain circumstances, but should probably not determine the final choice of cases unless the chosen sample is medium-sized. Our third goal is to discuss the viability of medium-n samples for case study research, an approach closely linked to algorithmic case selection and occasionally practiced by case study researchers. We argue that medium-n samples occupy an unstable methodological position, lacking the advantages of efficiency promised by traditional, small-n case studies but also lacking the advantages of representativeness promised by large-n samples.
Since the 1970s, catalogs of protest events have been at the heart of research on social movements. To measure how protest changes over time or varies across space, sociologists usually count the frequency of events, as either the dependent variable or a key independent variable. An alternative is to count the number of participants in protest. This article investigates demonstrations, strikes, and riots. Their size distributions manifest enormous variation. Most events are small, but a few large events contribute the majority of protesters. When events are aggregated by year or by city, the correlation between total participation and event frequency is low or modest. The choice of how to quantify protest is therefore vital; findings from one measure are unlikely to apply to another. The fact that the bulk of participation comes from large events has positive implications for the compilation of event catalogs. Rather than worrying about the underreporting of small events, concentrate on recording large ones accurately.
This research empirically evaluates data sets from the National Center for Education Statistics (NCES) for the design effects of ignoring the sampling design in weighted two-level analyses. Currently, researchers may ignore the sampling design beyond the levels that they model, which might result in incorrect inferences regarding hypotheses due to biased standard error estimates; the degree of bias depends on the informativeness of any ignored stratification and clustering in the sampling design. Some multilevel software packages accommodate first-stage sampling design information for two-level models, but not all. For five example public-release data sets from the NCES, design effects of ignoring the sampling design in unconditional and conditional two-level models are presented for 15 dependent variables selected based on a review of published research using these five data sets. Empirical findings suggest that there are only minor effects of ignoring the additional sampling design and that no differences in inference would have been made had the first-stage sampling design been ignored. Strategically, researchers without access to multilevel software that can accommodate the sampling design might consider including stratification variables as independent variables at level 2 of their model.
Analyzing relationships of necessity is important for both scholarly and applied research questions in the social sciences. An often-used technique for identifying such relationships—fuzzy set Qualitative Comparative Analysis (fsQCA)—has limited ability to make the most out of the data used. The set-theoretical technique fsQCA makes statements in kind (e.g., "a condition or configuration is necessary or not for an outcome"), thereby ignoring the variation in degree. We propose to apply a recently developed technique for identifying relationships of necessity that can make both statements in kind and in degree, thus making full use of variation in the data: Necessary Condition Analysis (NCA). With its ability to also make statements in degree ("a specific level of a condition is necessary or not for a specific level of the outcome"), NCA can complement the in-kind analysis of necessity with fsQCA.
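The contrast drawn above can be made concrete. fsQCA's necessity test reduces to a single consistency ratio over the fuzzy-set memberships, while NCA (here in its CE-FDH variant) measures how much of the upper-left corner of the condition-outcome scatter is empty. The following is a minimal sketch of these standard formulas, not the full procedures; variable names are illustrative:

```python
import numpy as np

def necessity_consistency(x, y):
    """fsQCA consistency of condition X as necessary for outcome Y:
    sum(min(x_i, y_i)) / sum(y_i) over fuzzy-set membership scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.minimum(x, y).sum() / y.sum())

def nca_ce_fdh(x, y):
    """NCA effect size with a CE-FDH (step-function) ceiling: the share of
    the scope (bounding box of the data) left empty above the ceiling in
    the upper-left corner of the x-y scatter."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    scope = (xs[-1] - xs[0]) * (ys.max() - ys.min())
    ceiling = np.maximum.accumulate(ys)   # running max of y defines the ceiling
    widths = np.diff(xs)
    empty = np.sum((ys.max() - ceiling[:-1]) * widths)  # area above the ceiling
    return float(empty / scope)
```

The consistency ratio yields only an in-kind verdict (necessary or not), whereas the ceiling line behind the NCA effect size also tells us which level of X is required for each level of Y — the statement in degree.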
It is widely recognized that segregation processes are often the result of complex nonlinear dynamics. Empirical analyses of complex dynamics are however rare, because there is a lack of appropriate empirical modeling techniques that are capable of capturing complex patterns and nonlinearities. At the same time, we know that many social phenomena display nonlinearities. In this article, we introduce a new modeling tool in order to partly fill this void in the literature. Using data on all secondary schools in Stockholm County during the years 1990 to 2002, we demonstrate how the methodology can be applied to identify complex dynamic patterns like tipping points and multiple phase transitions with respect to segregation. We establish critical thresholds in schools’ ethnic compositions, in general, and in relation to various factors such as school quality and parents’ income, at which the schools are likely to tip and become increasingly segregated.
Many poverty measures are estimated by using sample data collected from social surveys. Two examples are the poverty gap and the poverty severity indices. A novel method for the estimation of these poverty indicators is described. Social surveys usually contain different variables, some of which can be used to improve the estimation of poverty measures. The proposed estimation methodology is based on this idea. The variance estimation and the construction of confidence intervals are also topics addressed in this article. Real survey data, extracted from the European Union Survey on Income and Living Conditions and based on various countries, are used to investigate some empirical properties of our estimators via Monte Carlo simulation studies. Empirical results indicate that the suggested methods can be more accurate than the customary estimator. Desirable results are also obtained for the proposed variances and confidence intervals. Various populations generated from the Gamma distribution also support our findings.
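The poverty gap and poverty severity indices mentioned above are the alpha = 1 and alpha = 2 members of the Foster-Greer-Thorbecke (FGT) family. The sketch below shows the customary design-unadjusted estimator with optional survey weights; it is a baseline illustration, not the improved auxiliary-variable estimator the article proposes:

```python
import numpy as np

def fgt(income, z, alpha, weights=None):
    """Foster-Greer-Thorbecke poverty index P_alpha for poverty line z:
    alpha=0 is the headcount ratio, alpha=1 the poverty gap index,
    alpha=2 the poverty severity index. Survey weights are optional."""
    income = np.asarray(income, dtype=float)
    w = np.ones_like(income) if weights is None else np.asarray(weights, dtype=float)
    gap = np.clip((z - income) / z, 0.0, None)   # relative shortfall, 0 if not poor
    poor = income < z
    terms = np.where(poor, gap ** alpha, 0.0)    # mask so alpha=0 counts only the poor
    return float(np.sum(w * terms) / np.sum(w))
```

For example, with incomes (5, 15, 25) and poverty line z = 20, two of three units are poor, so the headcount is 2/3 and the poverty gap index averages the shortfalls 0.75 and 0.25 over all three units.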
The item wording (or keying) effect is respondents’ differential response style to positively and negatively worded items. Despite decades of research, the nature of the effect is still unclear. This article proposes a potential reason; namely, that the item wording effect is scale-specific, and thus findings are applicable only to a particular measure involved in an investigation. Using multiple scales and several methods, the present study provides strong and converging support for this hypothesis. In light of the results, we reinterpret major findings of the item wording effect and propose possible future directions for research.
This article makes the case for shadowing as ethnographic methodology: focusing attention on what occurs as interlocutors move among settings and situations. Whereas ethnographers often zoom in on one principal set of situations or site, we argue that intersituational variation broadens and deepens the researcher’s ethnographic account as well as affording important correctives to some common inferential pitfalls. We provide four warrants for shadowing: (a) buttressing intersituational claims, (b) deepening ethnographers’ ability to trace meaning making by showing how meanings shift as they travel and how such shifts may affect interlocutors’ understandings, (c) gaining leverage on the structure of subjects’ social worlds, and (d) helping the ethnographer make larger causal arguments. We show the use value of these considerations through an analysis of violence and informal networks in an ethnography of immigrant Latinos who met to socialize and play soccer in a Los Angeles park.
Surveys provide a critical source of data for scholars, yet declining response rates are threatening the quality of data being collected. This threat is particularly acute among organizational studies that use key informants—the mean response rate for published studies is 34 percent. This article describes several response enhancing strategies and explains how they were implemented in a national study of organizations that achieved a 94 percent response rate. Data from this study are used to examine the relationship between survey response patterns and nonresponse bias by conducting nonresponse analyses on several important individual and organizational characteristics. The analyses indicate that nonresponse bias is associated with the mean/proportion and variance of these variables and their correlations with relevant organizational outcomes. After identifying the variables most susceptible to nonresponse bias, a final analysis calculates the minimum response rate those variables needed to ensure that they do not contain significant nonresponse bias. Heuristic versions of these analyses can be used by survey researchers during data collection (and by scholars retrospectively) to assess the representativeness of respondents and the degree of nonresponse bias variables contain. This study has implications for survey researchers, scholars who analyze survey data, and those who review their research.
Mobile technologies, specifically smartphones, offer social scientists a potentially powerful approach to examine the social world. They enable researchers to collect information that was previously unobservable or difficult to measure, expanding the realm of empirical investigation. For research that concerns resource-poor and hard-to-reach groups, smartphones may be particularly advantageous by lessening sample selection and attrition and by improving measurement quality of irregular and unstable experiences. At the same time, smartphones are nascent social science tools, particularly with less advantaged populations that may have different phone usage patterns and privacy concerns. Using findings from a smartphone study of men recently released from prison, this article discusses the strengths and challenges of smartphones as data collection tools among disadvantaged and hard-to-reach groups.
The last decade has witnessed a resurgence of interest in studying the causal mechanisms linking causes and effects. This article works through the methodological consequences that adopting a systems understanding of mechanisms has for what types of cases we should select when using in-depth case study methods like process tracing. The article proceeds in three steps. We first expose the assumptions underpinning the study of causal mechanisms as systems that have methodological implications for case selection. In particular, we take as our point of departure the case-based position, where causation is viewed in deterministic and asymmetric terms; where the focus is on ensuring causal homogeneity in case-based research so that cross-case inferences can be made; and where mechanisms are understood as more than just intervening variables, but rather as systems of interacting parts that transfer causal forces from causes to outcomes. We then develop a set of case selection guidelines that are in methodological alignment with these underlying assumptions. For research where the mechanism is the primary focus, we contend that only typical cases in which X, Y, and the requisite contextual conditions are all present should be selected. Comparing our guidelines with existing practice, we find that practices like selecting most/least-likely cases are not compatible with the underlying assumptions of tracing mechanisms. Finally, we present guidelines for deviant cases, focusing on tracing mechanisms until they break down as a tool to shed light on omitted contextual and/or causal conditions.
Social scientists routinely address temporal dependence by adopting a simple technical fix. However, the correct identification strategy for a causal effect depends on causal assumptions. These need to be explicated and justified; almost no studies do so. This article addresses this shortcoming by offering a precise general statement of the (nonparametric) causal assumptions required to identify causal effects under temporal dependence. In particular, this article clarifies when one should or should not condition on lagged dependent variables (LDVs) to identify causal effects: one should not condition on LDVs if there is no reverse causation and no outcome autocausation; one should condition on LDVs if there are no unobserved common causes of treatment and the lagged outcome, or no unobserved persistent causes of the outcome. When only one of these is true (with one exception), the incorrect decision will induce bias. Absent a well-justified identification strategy, inferences should be appropriately qualified.
Survey data are usually of mixed type (quantitative, multistate categorical, and/or binary variables). Multidimensional scaling (MDS) is one of the most widely used methodologies to visualize the profile structure of such data. MDS methods have been introduced in the literature since the late 1960s, initially in publications in the psychometrics area. Nevertheless, the sensitivity and robustness of MDS configurations have scarcely been addressed in the specialized literature. In this work, we are interested in the construction of robust profiles for mixed-type data using a proper MDS configuration. To this end, we propose comparing different MDS configurations (coming from different metrics) through a combination of sensitivity and robustness analysis. In particular, as an alternative to the classical Gower metric, we propose a robust joint metric combining different distance matrices while avoiding redundant information, via related metric scaling. The search for robustness and the identification of outliers are carried out through a distance-based procedure related to notions of geometric variability. In this sense, we propose a statistic for detecting multivariate outliers in the context of mixed-type data and evaluate its performance through a simulation study. Finally, we apply these techniques to a real data set provided by the largest humanitarian organization involved in social programs in Spain, where we are able to identify, in a robust way, the most relevant factors defining the profiles of people who were at risk of social exclusion at the beginning of the 2008 economic crisis.
Reliability of nonprobability online volunteer panels for epidemiological purposes has rarely been studied.
To assess the quality of a questionnaire on sexual and reproductive health (SRH) administered in a nonprobability Web panel and in a random telephone survey (n = 8,992; n = 8,437, age 16–49 years). In particular, we were interested in possible differences in the associations between sociodemographic variables and outcome variables across the two surveys, that is, in the reliability of analytical epidemiological studies conducted in such panels.
Interventions to increase response rates were used in both surveys (four e-mail reminders, a high number of call attempts, and callbacks to refusals). Both were calibrated on the census population. Sociodemographic composition, effects of reminders, and prevalences were compared with their telephone counterparts. In addition, the associations between sociodemographic variables and sexual behaviors were compared across the two samples in multivariate logistic regressions.
The online survey had a lower response rate (20.0 percent vs. 44.8 percent) and a more distorted sociodemographic structure, although the reminders improved representativeness, as did the analogous interventions in the telephone survey. Prevalences of SRH variables were similar for common behaviors but higher online for stigmatized behaviors, depending on gender. Overall, 29 percent of the 63 interactions studied were significant for men and 11 percent for women, although opposite effects of sociodemographic variables were rare (5 percent of the 171 tested for each gender).
Nonprobability online panels are to be used with caution to monitor SRH and conduct analytical epidemiological studies, especially among men.
Many randomized experiments in the social sciences allocate subjects to treatment arms at the time the subjects enroll. Desirable features of the mechanism used to assign subjects to treatment arms are often (1) equal numbers of subjects in intervention and control arms, (2) balanced allocation for population subgroups and across covariates, (3) ease of use, and (4) inability for a site worker to predict the treatment arm for a subject before he or she has been assigned. In general, a trade-off must be made among these features: Many mechanisms that achieve high balance do so at the cost of high predictability. In this article, we review methods for randomized assignment of individuals that have been discussed in the literature, evaluating the performance of each with respect to the desirable design features. We propose a method for controlling the amount of predictability in a study while achieving high balance across subgroups and covariates. The method is applicable when a database containing the subgroup membership and covariates of each potential participant is available in advance. We use simple simulation and graphical methods to evaluate the balance and predictability of randomization mechanisms when planning the study and describe a computer program implemented in the R statistical software package that prospectively evaluates candidate randomization methods.
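As a concrete illustration of the balance/predictability trade-off, the sketch below implements permuted-block randomization within strata, one of the classic assignment mechanisms reviewed in this literature. It is a generic textbook device, not the specific method proposed in the article, and all names (`blocked_assignment`, the stratum labels) are illustrative.

```python
import random

def blocked_assignment(strata, block_size=4, seed=7):
    """Assign subjects to 'T'/'C' using permuted blocks within each stratum.

    `strata` maps a stratum label to a list of subject ids. Within each
    stratum, every full block of `block_size` subjects contains exactly
    half treatment and half control, so arm sizes stay balanced, while
    shuffling inside the block keeps the next assignment hard to predict.
    """
    rng = random.Random(seed)
    assignment = {}
    for stratum, subjects in strata.items():
        for start in range(0, len(subjects), block_size):
            block = subjects[start:start + block_size]
            arms = ['T', 'C'] * (block_size // 2)
            rng.shuffle(arms)  # random order within the block
            for subject, arm in zip(block, arms):
                assignment[subject] = arm
    return assignment

strata = {'urban': list(range(0, 8)), 'rural': list(range(8, 16))}
assignment = blocked_assignment(strata)
```

Smaller blocks give tighter balance but more predictability near the end of a block; larger blocks reverse the trade-off, which is exactly the tension the article's evaluation framework is designed to quantify.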
In recent years, there has been increasing interest in combining methods on the basis of set theory. In our introduction to this special issue, we focus on two variants of cross-case set-theoretic methods—qualitative comparative analysis (QCA) and typological theory (TT)—and their combination with process tracing (PT). Our goal is to broaden and deepen set-theoretic empirical research and to equip scholars with guidance on how to implement it in multimethod research (MMR). At first glance, set-theoretic cross-case methods and PT seem highly compatible when causal relationships are conceptualized in terms of set theory. However, multiple issues have so far not been thoroughly addressed. Our article builds on the emerging MMR literature and seeks to enhance it in four ways. First, we offer a comprehensive and coherent elaboration of the two sequences in which case studies can be combined with a cross-case method. Second, we expand the perspective and discuss QCA and TT as two alternative methods for the cross-case analysis. Third, based on the idea of analytical priority, we introduce the distinction between a condition-centered and a mechanism-centered variant of set-theoretic MMR. Fourth, we draw attention to the challenges of theorizing and analyzing arrangements of conditions and mechanisms associated with sufficient conjunctions.
Areal data have been used to good effect in a wide range of sociological research. One of the most persistent problems associated with this type of data, however, is the need to combine data sets with incongruous boundaries. To help address this problem, we introduce a new method for identifying common geographies. We show that identifying common geographies is equivalent to identifying components within a k-uniform k-partite hypergraph. This approach can be easily implemented using a geographic information system in conjunction with a simple search algorithm.
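The component-finding step can be sketched with a simple union-find search over pairs of overlapping units, which is one standard way to extract connected components; the GIS overlay that produces the overlap pairs is assumed to have been done separately, and the function and unit names below are illustrative.

```python
def common_geographies(overlaps):
    """Group areal units into common geographies.

    `overlaps` is a list of (unit_a, unit_b) pairs whose boundaries
    intersect (e.g., produced by a GIS overlay of two boundary systems).
    Units connected through any chain of overlaps end up in the same
    component, the smallest geography congruent with both systems.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in overlaps:
        union(a, b)

    components = {}
    for unit in parent:
        components.setdefault(find(unit), set()).add(unit)
    return list(components.values())

# Tracts t1, t2 overlap zip code z1; tract t3 overlaps zip code z2.
parts = common_geographies([('t1', 'z1'), ('t2', 'z1'), ('t3', 'z2')])
```

Each returned set mixes units from both boundary systems; aggregating each source data set up to these sets yields congruent geographies.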
Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all combinations of possible controls as well as specified functional form issues, variable definitions, standard error calculations, and estimation commands. This allows analysts to present their core, preferred estimate in the context of a distribution of plausible estimates. Second, we develop a model influence analysis showing how each model ingredient affects the coefficient of interest. This shows which model assumptions, if any, are critical to obtaining an empirical result. We demonstrate the architecture and interpretation of multimodel analysis using data on the union wage premium, gender dynamics in mortgage lending, and tax flight migration among U.S. states. These illustrate how initial results can be strongly robust to alternative model specifications or remarkably dependent on a knife-edge specification.
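A minimal version of the first step, estimating the modeling distribution across all combinations of controls, can be sketched as follows. This is a bare-bones illustration with plain OLS, not the authors' software; the variable names and the data-generating process are invented for the example.

```python
from itertools import combinations
import numpy as np

def modeling_distribution(y, focal, controls):
    """OLS coefficient of `focal` across all 2^k subsets of `controls`.

    Returns one focal-variable estimate per model, i.e. a small version
    of the modeling distribution of estimates.
    """
    names = list(controls)
    estimates = []
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            X = np.column_stack(
                [np.ones_like(focal), focal] + [controls[c] for c in subset]
            )
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            estimates.append(beta[1])  # coefficient on the focal variable
    return estimates

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
c1 = rng.normal(size=n)            # irrelevant control
c2 = x + rng.normal(size=n)        # control correlated with x
y = 2.0 * x + 1.0 * c2 + rng.normal(size=n)
dist = modeling_distribution(y, x, {'c1': c1, 'c2': c2})
# 4 models: {}, {c1}, {c2}, {c1, c2}
```

Here the spread of `dist` immediately shows which control (c2) the focal estimate hinges on, which is the intuition behind the model influence analysis in the second step.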
Despite the increasing spread of standardized assessments of student learning, longitudinal data on achievement are still lacking in many countries. This article raises the following question: Can we exploit cross-sectional assessments held at different schooling stages to evaluate how achievement inequalities related to individual ascribed characteristics develop over time? This is a highly policy-relevant issue, as achievement inequalities may develop in substantially different ways across educational systems. We discuss the issues involved in estimating dynamic models from repeated cross-sectional surveys in this context; consistent with a simple learning accumulation model, we propose an imputed regression strategy that allows us to "link" two surveys and deliver consistent estimates of the parameters of interest. We then apply the method to Italian achievement data on fifth and sixth graders and investigate how inequalities develop between primary and lower secondary school.
We offer a new conceptualization and measurement models for constructs at the group level of analysis in small group research. The conceptualization starts with classical notions of group behavior proposed by Tönnies, Simmel, and Weber and then draws upon plural subject theory by philosophers Gilbert and Tuomela to frame a new perspective applicable to many forms of small group behavior. In the proposed measurement model, a collective property is operationalized as shared interpersonal action in a way that explicitly allows us to control for systematic (method) error and random error. Group members act as key informants of group properties and processes and are treated as methods in a multitrait-multimethod setting to validate our models. The models are applied to data on 277 three-person groups to develop and illustrate new procedures for ascertaining variation in measures due to hypothesized construct(s), method error, and random error. Implications and guidelines for small group research are discussed.
Adjacent category logit models are ordered regression models that focus on comparisons of adjacent categories. These models are particularly useful for ordinal response variables with categories that are of substantive interest. In this article, we consider unconstrained and constrained versions of the partial adjacent category logit model, which is an extension of the traditional model that relaxes the proportional odds assumption for a subset of independent variables. In the unconstrained partial model, the variables without proportional odds have coefficients that freely vary across cutpoint equations, whereas in the constrained partial model two or more of these variables have coefficients that vary by common factors. We improve upon an earlier formulation of the constrained partial adjacent category model by introducing a new estimation method and conceptual justification for the model. Additionally, we discuss the connections between partial adjacent category models and other models within the adjacent approach, including stereotype logit and multinomial logit. We show that the constrained and unconstrained partial models differ only in the number of dimensions required to describe the effects of variables with nonproportional odds. Finally, we illustrate the partial adjacent category logit models with empirical examples using data from the International Social Survey Programme and the General Social Survey.
Large-scale surveys typically exhibit data structures characterized by rich mutual dependencies between surveyed variables and individual-specific skip patterns. Despite high efforts in fieldwork and questionnaire design, missing values inevitably occur. One approach for handling missing values is to provide multiply imputed data sets, thus enhancing the analytical potential of the surveyed data. To preserve possible nonlinear relationships among variables and incorporate skip patterns that make the full conditional distributions individual specific, we adapt a full conditional multiple imputation approach based on sequential classification and regression trees. Individual-specific skip patterns and constraints are handled within imputation in a way ensuring the consistency of the sequence of full conditional distributions. The suggested approach is illustrated in the context of income imputation in the adult cohort of the National Educational Panel Study.
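The fully conditional (chained) imputation logic can be sketched in a few lines. For brevity the sketch uses linear conditional models rather than the classification and regression trees of the article, and it ignores skip patterns and constraints; function and variable names are illustrative.

```python
import numpy as np

def chained_imputation(X, n_iter=10, seed=1):
    """Fill missing values (np.nan) by iterated conditional regressions.

    A stripped-down fully conditional scheme: each variable with missing
    entries is regressed (OLS) on all other variables using rows where it
    is observed, and its missing entries are replaced by the prediction
    plus a random draw from the residual spread. Iterating lets the
    conditional models feed on each other's updated imputations.
    """
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):           # crude starting values
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            Z = np.column_stack([np.ones(len(X)), others])
            obs = ~miss[:, j]
            beta, *_ = np.linalg.lstsq(Z[obs], X[obs, j], rcond=None)
            sigma = (X[obs, j] - Z[obs] @ beta).std()
            X[miss[:, j], j] = Z[miss[:, j]] @ beta + rng.normal(
                0, sigma, miss[:, j].sum()
            )
    return X

rng = np.random.default_rng(0)
full = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3) + 0.5, size=200)
data = full.copy()
data[rng.random(data.shape) < 0.2] = np.nan   # knock out 20% of cells
imputed = chained_imputation(data)
```

Replacing the OLS step with a fitted tree, and restricting each conditional to the individual's admissible values, is the direction the article takes to handle nonlinearity and skip patterns.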
Explanatory typologies have recently experienced a renaissance as a research strategy for constructing and assessing causal explanations. However, both the new methodological works on explanatory typologies and the way such typologies have been used in practice have been affected by two shortcomings. First, no elaborate procedures for assessing the general explanatory power of a typological theory on the cross-case level have been devised. Second, rigorous selection procedures for within-case analysis are lacking. Against this background, we introduce a systematic measure that helps researchers assess the explanatory power on the cross-case level, first, within the scope set by a particular typological theory and, second, by investigating the transferability of the theory beyond these scope conditions via an increase in the number of cases. Drawing on recent methodological works on nested analysis, we show how researchers can identify key cases for process tracing based on the cross-case explanatory fit of the typological theory. We illustrate the purchase of our procedures by revisiting seminal studies from the field of comparative historical analysis.
We consider an ordinal regression model with latent variables to investigate the effects of observable and latent explanatory variables on ordinal responses of interest. Each latent variable is characterized by correlated observed variables through a confirmatory factor analysis model. We develop a Bayesian adaptive lasso procedure to conduct simultaneous estimation and variable selection. The desirable features and empirical performance of the proposed methodology are demonstrated through simulation studies. The model is applied to a study on happiness and its potential determinants using data from the Inter-university Consortium for Political and Social Research.
Recently, sociologists have expended much effort in attempts to define social mechanisms. We intervene in these debates by proposing that sociologists in fact have a choice to make between three standards of what constitutes a good mechanistic explanation: substantial, formal, and metaphorical mechanistic explanation. All three standards are active in the field, and we suggest that a more complete theory of mechanistic explanation in sociology must parse these three approaches to draw out the implicit evaluative criteria appropriate to each. Doing so will reveal quite different preferences for explanatory scope and nuance hidden under the ubiquitous term "social mechanism." Finally, moving beyond extensive debates about realism and antirealism, we argue prescriptively against "mechanistic fundamentalism" for sociology and advocate for a more pluralistic understanding of social causality.
For many years, sociologists, political scientists, and management scholars have readily relied on Qualitative Comparative Analysis (QCA) for the purpose of configurational causal modeling. However, this article reveals that a severe problem in the application of QCA has gone unnoticed so far: model ambiguities. These arise when multiple causal models fare equally well in accounting for configurational data. Mainly due to the uncritical import of an algorithm that is unsuitable for causal modeling, researchers have typically been unaware of the whole model space. As a result, there exists an indeterminable risk for practically all QCA studies published in the last quarter century to have presented findings that their data did not warrant. Using hypothetical data, we first identify the algorithmic source of ambiguities and discuss to what extent they affect different methodological aspects of QCA. By reanalyzing a published QCA study from rural sociology, we then show that model ambiguities are not a mere theoretical possibility but a reality in applied research, which can assume such extreme proportions that no causal conclusions whatsoever are possible. Finally, the prevalence of model ambiguities is examined by performing a comprehensive analysis of 192 truth tables across 28 QCA studies published in applied sociology. In conclusion, we urge that future QCA practice ensures full transparency with respect to model ambiguities, both by informing readers of QCA-based research about their extent and by employing algorithms capable of revealing them.
Obtaining predictions from regression models fit to multiply imputed data can be challenging because treatments of multiple imputation seldom give clear guidance on how predictions can be calculated, and because available software often does not have built-in routines for performing the necessary calculations. This research note reviews how predictions can be obtained using Rubin’s rules, that is, by being estimated separately in each imputed data set and then combined. It then demonstrates that predictions can also be calculated directly from the final analysis model. Both approaches yield identical results when predictions rely solely on linear transformations of the coefficients and calculate standard errors using the delta method and diverge only slightly when using nonlinear transformations. However, calculation from the final model is faster, easier to implement, and generates predictions with a clearer relationship to model coefficients. These principles are illustrated using data from the General Social Survey and with a simulation.
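Pooling a prediction across imputed data sets with Rubin's rules can be sketched as follows; the function name is illustrative, but the combining formula (average the estimates, add the within- and between-imputation variances) is the standard one.

```python
import numpy as np

def pool_predictions(preds, variances):
    """Combine a prediction across M imputed data sets via Rubin's rules.

    `preds` are the point predictions from each imputed-data model and
    `variances` their squared standard errors. The pooled variance is the
    within-imputation variance plus the between-imputation variance
    inflated by (1 + 1/M).
    """
    preds = np.asarray(preds, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(preds)
    qbar = preds.mean()              # pooled point prediction
    within = variances.mean()        # average sampling variance
    between = preds.var(ddof=1)      # variance across imputations
    total = within + (1 + 1 / m) * between
    return qbar, np.sqrt(total)      # prediction and its standard error

est, se = pool_predictions([2.1, 1.9, 2.0], [0.04, 0.05, 0.045])
```

For predictions that are linear in the coefficients with delta-method standard errors, this pooled result coincides with computing the prediction directly from the pooled coefficients, which is the note's faster second approach.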
This study proposes using Internet search data from search engines like Google to produce state-level metrics that are useful in social science research. Generally, state-level research relies on demographic statistics, official statistics produced by government agencies, or aggregated survey data. However, each of these data sources has serious limitations in terms of both the availability of the data and its ability to capture important concepts. This study demonstrates how state-level Google search measures can be produced and then demonstrates the effectiveness of such measures in an empirical case: predicting state-level Tea Party movement mobilization. Drawing on existing studies of the Tea Party movement and theories of right-wing and conservative mobilization, state-level Google search measures of anti-immigrant sentiment and economic distress are developed and compared with traditional metrics typically used to measure these concepts, such as the unemployment rate and the international immigration rate, in their ability to predict Tea Party event counts. The results show that the Google search measures are effective in predicting Tea Party mobilization in a way that is consistent with existing theory, while the traditional measures are not.
Despite recent and growing interest in using Twitter to examine human behavior and attitudes, there is still significant room for growth regarding the ability to leverage Twitter data for social science research. In particular, gleaning demographic information about Twitter users—a key component of much social science research—remains a challenge. This article develops an accurate and reliable data processing approach for social science researchers interested in using Twitter data to examine behaviors and attitudes, as well as the demographic characteristics of the populations expressing or engaging in them. Using information gathered from Twitter users who state an intention to not vote in the 2012 presidential election, we describe and evaluate a method for processing data to retrieve demographic information reported by users that is not encoded as text (e.g., details of images) and evaluate the reliability of these techniques. We end by assessing the challenges of this data collection strategy and discussing how large-scale social media data may benefit demographic researchers.
Survey data are often subject to various types of errors such as misclassification. In this article, we consider a model where interest is simultaneously in two correlated response variables and one is potentially subject to misclassification. A motivating example of a recent study of the impact of a sexual education course for adolescents is considered. A simulation-based sample size determination scheme is applied to illustrate the impact of misclassification on power and bias for the parameters of interest.
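A minimal simulation-based power calculation under outcome misclassification, in the spirit of the scheme described, might look like the following; the design values (prevalences, flip probability, sample size) are invented for illustration.

```python
import numpy as np

def power(p_ctrl, p_trt, n, flip, sims=400, seed=3):
    """Monte Carlo power of a two-proportion z-test when the binary
    outcome is misclassified: each recorded value flips with prob `flip`.
    """
    rng = np.random.default_rng(seed)
    z_crit = 1.96  # two-sided test at alpha = .05
    hits = 0
    for _ in range(sims):
        yc = rng.random(n) < p_ctrl          # true control outcomes
        yt = rng.random(n) < p_trt           # true treatment outcomes
        yc = yc ^ (rng.random(n) < flip)     # misclassify recorded values
        yt = yt ^ (rng.random(n) < flip)
        p1, p2 = yc.mean(), yt.mean()
        pbar = (p1 + p2) / 2
        se = np.sqrt(2 * pbar * (1 - pbar) / n)
        hits += abs(p2 - p1) / se > z_crit
    return hits / sims

clean = power(0.30, 0.45, n=500, flip=0.0)
noisy = power(0.30, 0.45, n=500, flip=0.15)
```

Because random flips pull both observed prevalences toward 0.5, the observed treatment contrast attenuates and the simulated power drops, which is the kind of impact the sample size determination scheme quantifies.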
This article introduces a number of methods that can be useful for examining the emergence of large-scale structures in collaboration networks. The study contributes to sociological research by investigating how clusters of research collaborators evolve and sometimes percolate in a collaboration network. Typically, we find that in our networks, one cluster among the leading ones eventually wins the growth race by percolating through the network, spanning it and rapidly filling up a significant volume of it. We show how this process is governed by the dynamics of cluster growth in the network. When operating in a percolating regime, this class of networks possesses many useful functional properties, which have important sociological implications. We first develop the methodological tools to perform a study of the intrinsic clustering process. Then, to understand the actual large-scale structure formation process in the network, we apply the theoretical methods to simulate a number of realistic scenarios, including one based on actual data on the collaboration behavior of a sample of researchers. From the perspective of social science research, our methods can be adapted to suit the application domains of many other types of real social processes.
That sample surveys produce rates of normative behaviors higher than actual behavior warrants is well evidenced in the research literature. Less well understood is the source of this error. Twenty-five cognitive interviews were conducted to probe responses to a set of common, conventional survey questions about one such normative behavior: religious service attendance. Answers to the survey questions and cognitive probes are compared both quantitatively and qualitatively. Half of the respondents amended their answers during cognitive probing, all amendments indicating a lower rate of attendance than originally reported and yielding a statistically significant reduction in reported attendance. Narrative responses shed light on the source of bias: respondents pragmatically interpreted the survey question to allow themselves to include other types of religious behavior, to report on a more religious past, and to discount current constraints on their religious behavior, in order to report aspirational or normative religious identities.
The growing use of scales in survey questionnaires warrants attention to how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent samples t-test on observed total scale scores. A simulation study was conducted, focusing on variables potentially related to DIF in polytomous items, such as DIF pattern, sample size, magnitude, and percentage of DIF items. The results showed that DIF patterns and the number of DIF items affected the Type I error rates and effect sizes of the t-test. The results highlight the need to analyze DIF before making comparative group interpretations.
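The core of such a simulation, checking how uniform DIF on a few items inflates the Type I error of a total-score t-test, can be sketched as follows; the item model here (independent uniform categories with a constant shift) is a deliberate simplification of the designs studied, and all parameter values are illustrative.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / len(a) + vb / len(b))

def rejection_rate(n_dif_items, sims=300, n=100, n_items=10, seed=5):
    """Share of simulated total-score t-tests rejecting at the .05 level
    when the two groups have identical latent standing but `n_dif_items`
    items show uniform DIF (the focal group scores one category higher).
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(sims):
        ref = rng.integers(0, 5, size=(n, n_items)).astype(float)
        foc = rng.integers(0, 5, size=(n, n_items)).astype(float)
        foc[:, :n_dif_items] += 1.0  # uniform DIF shift on some items
        t = welch_t(ref.sum(axis=1), foc.sum(axis=1))
        rejections += abs(t) > 1.98  # approx. two-sided .05 critical value
    return rejections / sims

no_dif = rejection_rate(0)    # should sit near the nominal .05
with_dif = rejection_rate(2)  # spuriously inflated rejection rate
```

Even two biased items out of ten drive the rejection rate far above the nominal level despite equal group means on the trait, illustrating why DIF screening should precede group comparisons.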
This article addresses the problem of estimating the proportion π_S of the population belonging to a sensitive group using an optional randomized response technique in stratified sampling, based on the Mangat model under proportional and Neyman allocation, which yields a larger gain in efficiency. Numerically, the suggested model is found to be more efficient than the Kim and Warde stratified randomized response model and the Mangat model.
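For background, the classic Warner randomized response estimator, on which stratified variants such as the Mangat and Kim-Warde models build, can be sketched as follows; this is the basic unstratified device, not the optional stratified estimator proposed in the article, and all names are illustrative.

```python
import numpy as np

def warner_estimate(answers, p):
    """Estimate the sensitive proportion pi from Warner-model answers.

    Each respondent privately spins a device: with probability `p` they
    answer "are you in the sensitive group?" and otherwise the complement
    question. Then Pr(yes) = p*pi + (1 - p)*(1 - pi), so pi is recovered
    from the observed yes-rate without revealing any individual's status.
    """
    lam = np.mean(answers)
    return (lam - (1 - p)) / (2 * p - 1)   # requires p != 0.5

rng = np.random.default_rng(2)
n, pi_true, p = 20000, 0.30, 0.7
member = rng.random(n) < pi_true             # true (hidden) status
direct = rng.random(n) < p                   # which question the device picks
answers = np.where(direct, member, ~member)  # reported "yes" = True
pi_hat = warner_estimate(answers, p)
```

Stratified designs apply a device of this kind within each stratum and then allocate the sample (proportionally or by Neyman allocation) to reduce the variance of the combined estimator.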
Latent curve models have become a popular approach to the analysis of longitudinal data. At the individual level, the model expresses an individual’s response as a linear combination of what are called "basis functions" that are common to all members of a population and weights that may vary among individuals. This article uses differential calculus to define the basis functions of a latent curve model. This provides a meaningful interpretation of the unique and dynamic impact of each basis function on the individual-level response. Examples are provided to illustrate this sensitivity, as well as the sensitivity of the basis functions, to changes in the measure of time.
We address the task of determining, from statistical averages alone, whether a population under study consists of several subpopulations, unknown to the investigator, each responding to a given treatment markedly differently. We show that such determination is feasible in three cases: (1) randomized trials with binary treatments, (2) models where treatment effects can be identified by adjustment for covariates, and (3) models in which treatment effects can be identified by mediating instruments. In each of these cases, we provide an explicit condition which, if confirmed empirically, proves that the treatment effect is not uniform but varies appreciably across individuals.
In this article, we review popular parametric models for analyzing panel data and introduce the latest advances in matching methods for panel data analysis. To the extent that the parametric models and the matching methods offer distinct advantages for drawing causal inference, we suggest using both to cross-validate the evidence. We demonstrate how to use these methods by examining race-of-interviewer effects (ROIE) in the 2006 to 2010 panel data of the General Social Survey. We find that ROIE mostly concentrate on race-related outcomes and may vary by respondent’s race for some outcomes. But we find no statistically significant evidence that ROIE vary by the interview mode (i.e., in person vs. by phone). Our study has both methodological and substantive implications for future research.
The combined usage of qualitative comparative analysis (QCA) and process tracing (PT) in set-theoretic multi-method research (MMR) holds great potential for reaching valid inferences. Established views of case selection after QCA hold that studying negative cases provides lessons about the causes of an outcome in a limited set of circumstances. In particular, recommendations focus on negative cases only if they contradict the analysis or if suitably similar positive match cases exist to leverage comparisons. By contrast, I argue that set-theoretic MMR can gain from studying negative cases even when these conditions do not hold. First, negative cases can give insights into why an outcome fails to occur. Second, they can help guard against theoretical inconsistency between explanations for the outcome and its absence. Third, they can ensure that the mechanisms producing the outcome and its absence are not too similar to be logically capable of resulting in different outcomes. Following these arguments, I recommend that studies of negative cases in set-theoretic MMR focus on failure mechanisms in carefully bounded populations, search for theoretical inconsistency among mechanisms, and focus in part on the mechanism proposed to produce the outcome.
This article deals with a model for describing a sequence of events; for example, education is typically attained through a set of transitions from one level of education to the next. In particular, this article tries to reconcile measures describing the effect of a variable on each of these transitions with measures describing the effect of this variable on the final outcome of the process. Such a relationship has been known to exist within the sequential logit model, but it has hardly been used in empirical research, mainly because there was no practical way of giving it a substantive interpretation. This article provides such an interpretation by showing that the effect on the final outcome is a weighted sum of the effects on each transition, such that a transition receives more weight if more people are at risk of passing it, if passing it is more differentiating, and if people gain more from passing it.
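Schematically, and using illustrative notation rather than the article's own, the weighted-sum relationship can be written as:

```latex
\frac{\partial E(y)}{\partial x} \;=\; \sum_{k} \lambda_k \, \beta_k ,
\qquad
\lambda_k \;=\; \Pr(\text{at risk at transition } k)
\;\times\; p_k \, (1 - p_k)
\;\times\; \text{gain}_k ,
```

where β_k is the effect of the variable on transition k, p_k the probability of passing that transition, p_k(1 − p_k) a measure of how differentiating it is, and gain_k the expected gain in the final outcome from passing it; the three factors in λ_k correspond to the three weight components named in the abstract.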
This study evaluated three types of bias—total, measurement, and selection bias (SB)—in three sequential mixed-mode designs of the Dutch Crime Victimization Survey: telephone, mail, and web, where nonrespondents were followed up face-to-face (F2F). In the absence of true scores, all biases were estimated as mode effects against two different types of benchmarks. In the single-mode benchmark (SMB), effects were evaluated against an F2F reference survey. In an alternative analysis, a "hybrid-mode benchmark" (HMB) was used, where effects were evaluated against a mix of the measurements of a web survey and the SB of an F2F survey. A special reinterview design made additional auxiliary data available, which were exploited in estimation for a range of survey variables. Based on the empirical findings, the SMB perspective favors a telephone, mail, or web design with an F2F follow-up, whereas the HMB perspective favors a design involving only mail and/or web without an F2F follow-up.
In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of domains is interrelated and whether their joint analysis is justified. We also discuss the quality of the results obtained using joint sequence analysis. In particular, we focus on cluster analysis and propose criteria to assess whether clusters obtained using a joint approach satisfactorily describe each domain.
Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error and corrects for both.
To measure what determines people’s attitudes, definitions, or decisions, surveys increasingly ask respondents to judge vignettes. A vignette typically describes a hypothetical situation or object as having various attributes (dimensions). In factorial surveys, the values (levels) of dimensions are experimentally varied, so that their impact on respondents’ judgments can be estimated. Drawing on the literature in cognitive psychology and survey methodology, we examine two research questions: Does the order in which dimensions are presented affect vignette evaluations and change substantive conclusions? Under which conditions are order effects most likely to occur? Using data from a web survey of 300 students, we analyze several possible moderators: features of the vignette design, characteristics of respondents, and interactions between these features. Results show that strong order effects can occur, but only when the vignettes have a minimum level of complexity or respondents show low attitude certainty.
Often in sociology, researchers are confronted with nonnormal variables whose joint distribution they wish to explore. Yet, assumptions of common measures of dependence can fail or estimating such dependence is computationally intensive. This article presents the copula method for modeling the joint distribution of two random variables, including descriptions of the method, the most common copula distributions, and the nonparametric measures of association derived from the models. Copula models, which are estimated by standard maximum likelihood techniques, make no assumption about the form of the marginal distributions, allowing consideration of a variety of models and distributions in the margins and various shapes for the joint distribution. The modeling procedure is demonstrated via a simulated example of spousal mortality and empirical examples of (1) the association between unemployment and suicide rates with time series models and (2) the dependence between a count variable (days drinking alcohol) and a skewed, continuous variable (grade point average) while controlling for predictors of each using the National Longitudinal Survey of Youth 1997. Other uses for copulas in sociology are also described.
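A key property behind the copula method described above is that the dependence structure is invariant to monotone transformations of the margins. The sketch below (illustrative only, not code from the article) estimates a Gaussian copula parameter by inverting the known relation between Kendall's tau and the copula correlation, rho = sin(pi * tau / 2):

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 1500, 0.6

# Simulate a bivariate normal sample with correlation rho
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

def kendall_tau(a, b):
    """Pairwise-concordance estimate of Kendall's tau (O(n^2))."""
    s = np.sign(a[:, None] - a[None, :]) * np.sign(b[:, None] - b[None, :])
    return s.sum() / (len(a) * (len(a) - 1))

tau = kendall_tau(x, y)
# For a Gaussian copula, rho = sin(pi * tau / 2), whatever the margins are
rho_hat = np.sin(np.pi * tau / 2)

# Monotone transforms of the margins leave tau (hence the copula) unchanged
tau_transformed = kendall_tau(np.exp(x), y ** 3)
```

Because tau is rank-based, `tau_transformed` equals `tau` exactly, which is why the copula parameter can be estimated without modeling the marginal distributions.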
Measuring values in sociological research sometimes involves the use of ranking data. A disadvantage of a ranking assignment is that the order in which the items are presented might influence the choice preferences of respondents regardless of the content being measured. The standard procedure to rule out such effects is to randomize the order of items across respondents. However, implementing this design may be impractical and the biasing impact of a response order effect cannot be evaluated. We use a latent choice factor (LCF) model that allows statistically controlling for response order effects. Furthermore, the model adequately deals with the known issue of ipsativity of ranking data. Applying this model to a Dutch survey on work values, we show that a primacy effect accounts for response order bias in item preferences. Our findings demonstrate the usefulness of the LCF model in modeling ranking data while taking into account particular response biases.
Although much has been written about the process of party system institutionalization in different regions, the reasons why some party systems institutionalize while others do not still remain a mystery. Seeking to fill this lacuna in the literature, and using a mixed-methods research approach, this article constitutes a first attempt to answer simultaneously the following three questions: (1) What specific factors help party systems to institutionalize (or not)? (2) What are the links (in terms of time and degree) as well as the causal mechanisms behind such relationships? and (3) How do they affect a particular party system? In order to do so, this article focuses on the study of party system development and institutionalization in 13 postcommunist democracies between 1990 and 2010. Methodologically, the article innovates in five respects. First, it continues the debate on the importance of "mixed methods" when trying to answer different research questions. Second, it adds to the as yet brief literature on the combination of process tracing and qualitative comparative analysis. Third, it constitutes the first attempt to date to use a most similar different outcome/most different same outcome procedure in order to reduce causal complexity before undertaking a crisp-set qualitative comparative analysis. Fourth, it shows the merits of combining both congruence and process tracing in the same comparative study. Finally, it develops a novel "bipolar comparative method" to explain the extent to which opposite outcomes are determined by reverse conditions and conflicting intervening causal forces.
We extend a unified and easy-to-use approach to measurement error and missing data. In our companion article, Blackwell, Honaker, and King give an intuitive overview of the new technique, along with practical suggestions and empirical applications. Here, we offer more precise technical details, more sophisticated measurement error model specifications and estimation procedures, and analyses to assess the approach’s robustness to correlated measurement errors and to errors in categorical variables. These results support using the technique to reduce bias and increase efficiency in a wide variety of empirical research.
Age–period–cohort (APC) models are designed to estimate the independent effects of age, time periods, and cohort membership. However, APC models suffer from an identification problem: There are no unique estimates of the independent effects that fit the data best because of the exact linear dependency among age, period, and cohort. Among methods proposed to address this problem, using unequal-interval widths for age, period, and cohort categories appears to break the exact linear dependency and thus solve the identification problem. However, this article shows that the identification problem remains in these models; in fact, they just implicitly impose multiple block constraints on the age, period, and cohort effects to achieve identifiability. These constraints depend on an arbitrary choice of widths for the age, period, and cohort intervals and can have nontrivial effects on the estimates. Because these assumptions are extremely difficult, if not impossible, to verify in empirical research, they are qualitatively no different from the assumptions of other constrained estimators. Therefore, the unequal-intervals approach should not be used without an explicit rationale justifying its constraints.
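The exact linear dependency behind the APC identification problem is easy to exhibit numerically. This short sketch (not from the article; the age and period ranges are arbitrary) shows that a design matrix with linear age, period, and cohort terms is rank-deficient because cohort = period − age:

```python
import numpy as np

# Arbitrary grid of ages and periods for illustration
ages = np.arange(20, 60)
periods = np.arange(1990, 2011)
A, P = np.meshgrid(ages, periods)
C = P - A  # birth cohort is exactly period minus age

# Design matrix with intercept and linear A, P, C effects
X = np.column_stack([np.ones(A.size), A.ravel(), P.ravel(), C.ravel()])

# Four columns, but rank 3: the linear effects are not separately identified
rank = np.linalg.matrix_rank(X)
```

Any estimator that appears to produce unique APC estimates, including the unequal-interval approach the abstract discusses, must therefore be imposing some constraint, explicitly or implicitly.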
Social media websites such as Facebook and Twitter provide an unprecedented amount of qualitative data about organizations and collective behavior. Yet these new data sources lack critical information about the broader social context of collective behavior—or protect it behind strict privacy barriers. In this article, I introduce social media survey apps (SMSAs) that combine computational social science methods with conventional survey techniques in order to enable more comprehensive analysis of collective behavior online. SMSAs (1) request large amounts of public and non-public data from organizations that maintain social media pages, (2) survey these organizations to collect additional data of interest to a researcher, and (3) return the results of a scholarly analysis back to these organizations as incentive for them to participate in social science research. SMSAs thus provide a highly efficient, cost-effective, and secure method for extracting detailed data from very large samples of organizations that use social media sites. This article describes how to design and implement SMSAs and discusses an application of this new method to study how nonprofit organizations attract public attention to their cause on Facebook. I conclude by evaluating the quality of the sample derived from this application of SMSAs and discussing the potential of this new method to study non-organizational populations on social media sites as well.
Randomized controlled trials (RCTs) and quasi-experimental designs like regression discontinuity (RD) designs, instrumental variable (IV) designs, and matching and propensity score (PS) designs are frequently used for inferring causal effects. It is well known that the features of these designs facilitate the identification of a causal estimand and, thus, warrant a causal interpretation of the estimated effect. In this article, we discuss and compare the identifying assumptions of quasi-experiments using causal graphs. The increasing complexity of the causal graphs as one switches from an RCT to RD, IV, or PS designs reveals that the assumptions become stronger as the researcher’s control over treatment selection diminishes. We introduce limiting graphs for the RD design and conditional graphs for the latent subgroups of compliers, always takers, and never takers of the IV design, and argue that the PS is a collider that offsets confounding bias via collider bias.
One interesting idea in social network analysis is the directionality test that utilizes the directions of social ties to help identify peer effects. The null hypothesis of the test is that if contextual factors are the only force that affects peer outcomes, the estimated peer effects should not differ if the directions of social ties are reversed. In this article, I statistically formalize this test and investigate its properties under various scenarios. In particular, I point out that the validity of the test is contingent on the presence of peer selection, sampling error, and simultaneity bias. I also outline several methods that can help provide causal estimates of peer effects in social networks.
Effects of rating scale forms on cross-sectional reliability and measurement equivalence were investigated. A randomized experimental design was implemented, varying category labels and number of categories. The participants were 800 students at two German universities. In contrast to previous research, a reliability assessment method was used that relies on the congeneric measurement model. The experimental manipulation had differential effects on the reliability scores and measurement equivalence. Attitude strength seems to be a relevant moderator variable, which influences measurement equivalence. Overall, the results show that measurement quality is influenced by rating scale forms. Results are discussed in terms of their implications for the measurement of latent variables.
This article specifies a multilevel measurement model for survey response when data are nested. The model includes a test–retest model of reliability, a confirmatory factor model of interitem reliability with item-specific bias effects, an individual-level model of the biasing effects due to respondent characteristics, and a neighborhood-level model of construct validity. We apply this model for measuring informal social control within collective efficacy theory. Estimating the model on 3,260 respondents nested within 123 Seattle neighborhoods, we find that measures of informal control show reasonable test–retest and interitem reliability. We find support for the hypothesis that respondents’ assessments of whether their neighbors would intervene in specific child deviant acts are related to whether they have observed such acts in the past, which is consistent with a cognitive model of survey response. Finally, we find that, when proper measurement models are not used, the effects of some neighborhood covariates on informal control are biased upward and the effect of informal social control on violence is biased downward.
The factorial survey is an experimental design consisting of varying situations (vignettes) that have to be judged by respondents. For more complex research questions, it quickly becomes impossible for an individual respondent to judge all vignettes. To overcome this problem, random designs are recommended most of the time, whereas quota designs are not discussed at all. First comparisons of random designs with fractional factorial and D-efficient designs are based on fictitious data, and first comparisons with fractional factorial and confounded factorial designs are restricted to theoretical considerations. The aim of this contribution is to compare different designs regarding their reliability and their internal validity. The benchmark for the empirical comparison is established by the estimators from a parsimonious full factorial design, each answered by a sample of 132 students (real instead of fictitious data). Multilevel analyses confirm that, if they exist, balanced confounded factorial designs are ideal. A confounded D-efficient design, as proposed for the first time in this article, is also superior to simple random designs.
Contemporary case studies rely on verbal arguments and set theory to build or evaluate theoretical claims. While existing procedures excel in the use of qualitative information (information about kind), they ignore quantitative information (information about degree) at central points of the analysis. Effectively, contemporary case studies rely on crisp sets. In this article, I make the case for fuzzy-set case studies. I argue that the mechanisms that are the focal points of contemporary case study methods can be modeled as set-theoretic causal structures. I show how case study claims translate into sufficiency statements. And I show how these statements can be evaluated using fuzzy-set tools. This procedure permits the use of both qualitative and quantitative information throughout a case study. As a consequence, the analysis can determine whether one or more cases are both qualitatively and quantitatively consistent with its claims. Or whether some or all cases are consistent by kind but not by degree.
The use of controlled comparisons pervades comparative historical analysis. Heated debates have surrounded the methodological purchase of such comparisons. However, the quality and validity of the conceptual building blocks on which the comparisons are based have largely been ignored. This article discusses a particular problem pertaining to these issues, that is, the danger of creating false historical analogies that do not serve to control for relevant explanatory factors. It is argued that this danger increases when we use composite (thick) concepts that are aggregated via a (loose) family resemblance logic. It is demonstrated that this problem seriously affects the way the concept of feudalism has entered comparative historical analysis. On this basis, an alternative conceptual strategy—centered on teasing out the core attributes of thick and loose concepts—is proposed.
Given their capacity to identify causal relationships, experimental audit studies have grown increasingly popular in the social sciences. Typically, investigators send fictitious auditors who differ by a key factor (e.g., race) to particular experimental units (e.g., employers) and then compare treatment and control groups on a dichotomous outcome (e.g., hiring). In such scenarios, an important design consideration is the power to detect a certain magnitude difference between the groups. But power calculations are not straightforward in standard matched tests for dichotomous outcomes. Given the paired nature of the data, the number of pairs in the concordant cells (when neither or both auditors receive a positive response) contributes to the power, which is lower as the sum of the discordant proportions approaches one. Because these quantities are difficult to determine a priori, researchers must exercise particular care in experimental design. We here present sample size and power calculations for McNemar’s test using empirical data from an audit study on misdemeanor arrest records and employability. We then provide formulas and examples for cases involving more than two treatments (Cochran’s Q test) and nominal outcomes (Stuart–Maxwell test). We conclude with concrete recommendations concerning power and sample size for researchers designing and presenting matched audit studies.
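As a hedged sketch of the kind of sample-size calculation the abstract discusses, the function below uses a standard normal approximation for McNemar's test, driven by the two discordant-cell probabilities. The input values are illustrative, not figures from the audit study:

```python
from math import ceil, sqrt
from statistics import NormalDist

def mcnemar_pairs(p10, p01, alpha=0.05, power=0.80):
    """Pairs needed for a two-sided McNemar test, normal approximation.

    p10, p01: discordant-cell probabilities (illustrative assumptions here,
    e.g., treated auditor hired but control not, and vice versa).
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_b = NormalDist().inv_cdf(power)          # power quantile
    psi, delta = p10 + p01, p10 - p01          # discordant sum and difference
    n = (z_a * sqrt(psi) + z_b * sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    return ceil(n)

# Hypothetical design: 15% vs. 5% one-sided discordance
n_pairs = mcnemar_pairs(0.15, 0.05)
```

Only the discordant pairs carry information, so widening the gap between `p10` and `p01` (holding their sum fixed) reduces the required number of matched pairs.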
Despite their long trajectory in the social sciences, few systematic works analyze how often and for what purposes focus groups appear in published works. This study fills this gap by undertaking a meta-analysis of focus group use over the last 10 years. It makes several contributions to our understanding of when and why focus groups are used in the social sciences. First, the study explains that focus groups generate data at three units of analysis, namely, the individual, the group, and the interaction. Although most researchers rely upon the individual unit of analysis, the method’s comparative advantage lies in the group and interactive units. Second, it reveals strong affinities between each unit of analysis and the primary motivation for using focus groups as a data collection method. The individual unit of analysis is appropriate for triangulation; the group unit is appropriate as a pretest; and the interactive unit is appropriate for exploration. Finally, it offers a set of guidelines that researchers should adopt when presenting focus groups as part of their research design. Researchers should, first, state the main purpose of the focus group in a research design; second, identify the primary unit of analysis exploited; and finally, list the questions used to collect data in the focus group.
With one treated and one untreated period, difference in differences (DD) requires the untreated response changes to be the same across the treatment and control groups, if the treatment were withheld contrary to fact. A natural way to check the condition is to backtrack one period and examine the response changes in two pretreatment periods. If the condition does not hold in the pretreatment periods, then a modified DD takes the form of "generalized difference in differences (GDD)," which is a triple difference (TD) with one more time-wise difference than DD. GDD generalizes DD with a weaker identification condition in the sense that a time-constant, but not necessarily zero, time/selection effect is allowed. One more time-wise differencing (quadruple difference [QD]) than GDD allows for the time/selection effect even to change over time, which makes it possible to test for the GDD identification condition. Simple panel least squares estimators (LSEs)/tests for DD and GDD are proposed and an empirical illustration is presented.
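The DD/GDD logic can be seen with hypothetical group-by-period means (invented numbers, not the article's data): the control group trends by 1 per period, the treated group by 1.5 (a constant time/selection effect of 0.5), and the true treatment effect is 2. DD absorbs the 0.5 into its estimate; the extra time-wise difference in GDD removes it:

```python
# Hypothetical mean outcomes: two pretreatment periods (t = -1, 0)
# and one treated period (t = 1).
means = {
    "control": {-1: 0.0, 0: 1.0, 1: 2.0},
    "treated": {-1: 3.0, 0: 4.5, 1: 8.0},
}

def dd(m, t0, t1):
    """Difference in differences between periods t0 and t1."""
    return ((m["treated"][t1] - m["treated"][t0])
            - (m["control"][t1] - m["control"][t0]))

dd_est = dd(means, 0, 1)                      # true effect + time/selection effect
gdd_est = dd(means, 0, 1) - dd(means, -1, 0)  # triple difference cancels it
```

The pretreatment DD (here 0.5) directly measures the time/selection effect that a time-constant-effect assumption allows, which is why differencing it out recovers the treatment effect.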
This study compares the performance of two approaches in analysing four-point Likert rating scales with a factorial model: the classical factor analysis (FA) and the item factor analysis (IFA). For FA, maximum likelihood and weighted least squares estimations using Pearson correlation matrices among items are compared. For IFA, diagonally weighted least squares and unweighted least squares estimations using the items' polychoric correlation matrices are compared. Two hundred and ten conditions were simulated in a Monte Carlo study, considering: one to three factor structures (either independent or correlated, at two levels), medium or low quality of items, three different levels of item asymmetry, and five sample sizes. Results showed that IFA procedures achieve equivalent and accurate parameter estimates; in contrast, FA procedures yielded biased parameter estimates. Therefore, we do not recommend classical FA under the conditions considered. Minimum requirements for achieving accurate results using IFA procedures are discussed.
To explain the inequalities in access to a discrete good G across two populations, or across time in a single national context, it is necessary to distinguish, for each population or period of time, the effect of the diffusion of G from that of unequal outcomes of underlying micro-social processes. The inequality of outcomes of these micro-social processes is termed inequality within the selection process. We present an innovative measurement index that captures variations in this aspect of inequality of opportunity and is insensitive to margins. We applied this index to the analysis of inequality of educational opportunity by exploring the effects of the British 1944 Education Act, of which various accounts have been offered. The relationships between the proposed measure of inequality within a selection process and classical measures of inequality of opportunity are analyzed, as well as the benefits of using this index with regard to the insight it provides for interpreting data.
This article describes (1) the survey methodological and statistical characteristics of the nonrandomized method for surveying sensitive questions for both cross-sectional and panel survey data and (2) the way to use the incompletely observed variable obtained from this survey method in logistic regression and in loglinear and log-multiplicative association models. The nonrandomized method, which was introduced by Yu, Tian, and Tang and Tian et al. for surveying sensitive questions, is much more efficient than randomized response methods, and unlike the latter, the former can be included in a mail survey. The method also has unique advantages compared with the randomized response method for the analysis of panel survey data. A simulation analysis shows how the relative efficiency of statistics in the analysis of data collected with this method changes as a function of mixing probability, compared with a hypothetical situation of collecting responses to a sensitive question from every sample subject. The use of statistical software for the application of the logistic regression model, with an incompletely observed variable from cross-sectional and panel surveys, is also described.
In this article, we first derive the correct mean square error expression of Gupta and Shabbir's linear weighted estimator of the ratio of two population proportions. We then propose a general class of ratio estimators of two population proportions. The usual ratio estimator, the Wynn-type estimator, the Singh, Singh, and Kaur difference-type estimator, and the Gupta and Shabbir estimator are all shown to be members of the suggested class.
Extant research comparing survey self-reports of normative behavior to direct observations and time diary data has yielded evidence of extensive measurement bias. However, most of this research program has relied on observational data, comparing independent samples from the same target population, rather than comparing survey self-reports to a criterion measure for individual respondents. This research addresses the next step using data from two studies. In each study, respondents completed a conventional survey questionnaire, including questions about frequency of religious behavior. Respondents were then asked to participate in a text messaging (short message service) data collection procedure, reporting either (1) participation in religious behavior specifically or (2) all changes in major activity without explicitly specifying religious behavior. Findings suggest that directive measurement, priming the respondent to consider the focal behavior, is a cause of measurement bias.
Since the seminal introduction of the propensity score (PS) by Rosenbaum and Rubin, PS-based methods have been widely used for drawing causal inferences in the behavioral and social sciences. However, the PS approach depends on the ignorability assumption: there are no unobserved confounders once observed covariates are taken into account. For situations where this assumption may be violated, Heckman and his associates have recently developed a novel approach based on marginal treatment effects (MTEs). In this article, we (1) explicate the consequences for PS-based methods when aspects of the ignorability assumption are violated, (2) compare PS-based methods and MTE-based methods by making a close examination of their identification assumptions and estimation performances, (3) apply these two approaches in estimating the economic return to college using data from the National Longitudinal Survey of Youth (NLSY) of 1979 and discuss their discrepancies in results. When there is a sorting gain but no systematic baseline difference between treated and untreated units given observed covariates, PS-based methods can identify the treatment effect of the treated (TT). The MTE approach performs best when there is a valid and strong instrumental variable (IV). In addition, this article introduces the "smoothing-difference PS-based method," which enables us to uncover heterogeneity across people of different PSs in both counterfactual outcomes and treatment effects.
This article shows how statistical matching methods can be used to select "most similar" cases for qualitative analysis. I first offer a methodological justification for research designs based on selecting most similar cases. I then discuss the applicability of existing matching methods to the task of selecting most similar cases and propose adaptations to meet the unique requirements of qualitative analysis. Through several applications, I show that matching methods have advantages over traditional selection in "most similar" case designs: They ensure that most similar cases are in fact most similar; they make scope conditions, assumptions, and measurement explicit; and they make case selection transparent and replicable.
Case studies appear prominently in political science, sociology, and other social science fields. A scholar employing a case study research design in an effort to estimate causal effects must confront the question, how should cases be selected for analysis? This question is important because the results derived from a case study research program ultimately and unavoidably rely on the criteria used to select the cases. While the matter of case selection is at the forefront of research on case study design, an analytical framework that can address it in a comprehensive way has yet to be produced. We develop such a framework and use it to evaluate nine common case selection methods. Our simulation-based results show that the methods of simple random sampling, influential case selection, and diverse case selection generally outperform other common methods. And, when a research design mandates that only a very small number of cases, say one or two, be selected in the course of a research program, the very simple method of sampling from the largest cell of a 2 x 2 table is competitive with other, more complicated, case selection methods. We show as well that a number of common case selection strategies work well only in idiosyncratic situations, and we argue that these methods should be abandoned in favor of the more powerful and robust case selection methods that our analytical framework identifies.
The recent change in the General Social Survey (GSS) to a rotating panel design is a landmark development for social scientists. Sociological methodologists have argued that fixed-effects (FE) models are generally the best starting point for analyzing panel data because they allow analysts to control for unobserved time-constant heterogeneity. We review these treatments and demonstrate the advantages of FE models in the context of the GSS. We also show, however, that FE models have two rarely tested assumptions that can seriously bias parameter estimates when violated. We provide simple tests for these assumptions. We further demonstrate that FE models are extremely sensitive to the correct specification of temporal lags. We provide a simulation and a proof to show that the use of incorrect lags in FE models can lead to coefficients that are the opposite sign of the true parameter values.
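The advantage of FE models that the abstract invokes, removing bias from unobserved time-constant heterogeneity, can be shown in a minimal simulation (an illustration under invented data-generating assumptions, not the article's analysis). Pooled OLS is biased because the regressor is correlated with the individual effects; the within (demeaning) transformation removes them:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 500, 4, 2.0

alpha = rng.normal(size=N)                    # unobserved individual effects
x = alpha[:, None] + rng.normal(size=(N, T))  # regressor correlated with alpha
y = beta * x + alpha[:, None] + rng.normal(size=(N, T))

# Pooled OLS ignores the individual effects and is biased upward here
xf, yf = x.ravel(), y.ravel()
beta_pooled = np.cov(xf, yf)[0, 1] / np.var(xf, ddof=1)

# Within (FE) estimator: demean x and y within each individual
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd ** 2).sum()
```

With this setup the pooled estimate converges to about 2.5 while the FE estimate recovers the true 2.0; note, per the abstract, that this protection does not extend to misspecified temporal lags.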
The study of causal mechanisms interests scholars across the social sciences. Case studies can be a valuable tool in developing knowledge and hypotheses about how causal mechanisms function. The usefulness of case studies in the search for causal mechanisms depends on effective case selection, and there are few existing guidelines for selecting cases to study causal mechanisms. We outline a general approach for selecting cases for pathway analysis: a mode of qualitative research that is part of a mixed-method research agenda, which seeks to (1) understand the mechanisms or links underlying an association between some explanatory variable, X1, and an outcome, Y, in particular cases and (2) generate insights from these cases about mechanisms in the unstudied population of cases featuring the X1/Y relationship. The gist of our approach is that researchers should choose cases for comparison in light of two criteria. The first criterion is the expected relationship between X1 and Y, which is the degree to which cases are expected to feature the relationship of interest between X1 and Y. The second criterion is variation in case characteristics, or the extent to which the cases are likely to feature differences in characteristics that can facilitate hypothesis generation. We demonstrate how to apply our approach and compare it to a leading example of pathway analysis in the so-called resource curse literature, a prominent example of a correlation featuring a nonlinear relationship and multiple causal mechanisms.
Qualitative Comparative Analysis (QCA) is a method for cross-case analyses that works best when complemented with follow-up case studies focusing on the causal quality of the solution and its constitutive terms, the underlying causal mechanisms, and potentially omitted conditions. The anchorage of QCA in set theory demands criteria for follow-up case studies that are different from those known from regression-based multimethod research (MMR). Based on the evolving research on set-theoretic MMR, we introduce principles for formalized case selection and causal inference after a fuzzy-set QCA on sufficiency. Using an empirical example for illustration, we elaborate on the principles of counterfactuals for intelligible causal inference in the analysis of three different types of cases. Furthermore, we explain how case-based counterfactual inferences on the basis of QCA solutions are related to counterfactuals in the course of processing a truth table in order to produce a solution. We then flesh out two important functions that ideal types play for QCA-based case studies: First, they inform the development of formulas for the choice of the best available cases for with-case analysis and, second, establish the boundaries of generalization of the causal inferences.
Does participation in one wave of a survey have an effect on respondents’ answers to questions in subsequent waves? In this article, we investigate the presence and magnitude of "panel conditioning" effects in one of the most frequently used data sets in the social sciences: the General Social Survey (GSS). Using longitudinal records from the 2006, 2008, and 2010 surveys, we find convincing evidence that at least some GSS items suffer from this form of bias. To rule out the possibility of contamination due to selective attrition and/or unobserved heterogeneity, we strategically exploit a series of between-person comparisons across time-in-survey groups. This methodology, which can be implemented whenever researchers have access to at least three waves of rotating panel data, is described in some detail so as to facilitate future applications in data sets with similar design elements.
Storytelling has long been recognized as central to human cognition and communication. Here we explore a more active role of stories in social science research, not merely to illustrate concepts but also to develop new ideas and evaluate hypotheses, for example, in deciding that a research method is effective. We see stories as central to engagement with the development and evaluation of theories, and we argue that for a story to be useful in this way, it should be anomalous (representing aspects of life that are not well explained by existing models) and immutable (with details that are well-enough established that they have the potential to indicate problems with a new model). We develop these ideas through considering two well-known examples from the work of Karl Weick and Robert Axelrod, and we discuss why transparent sourcing (in the case of Axelrod) makes a story a more effective research tool, whereas improper sourcing (in the case of Weick) interferes with the key useful roles of stories in the scientific process.
Mixed-mode surveys are known to be susceptible to mode-dependent selection and measurement effects, collectively referred to as mode effects. The use of different data collection modes within the same survey may reduce selectivity of the overall response but is characterized by measurement errors differing across modes. Inference in sample surveys generally proceeds by correcting for selectivity—for example, by applying calibration estimators—and ignoring measurement error. When a survey is conducted repeatedly, such inferences are valid only if the measurement error remains constant between surveys. In sequential mixed-mode surveys, it is likely that the mode composition of the overall response differs between subsequent editions of the survey, leading to variations in the total measurement error and invalidating classical inferences. An approach to inference in these circumstances, which is based on calibrating the mode composition of the respondents toward fixed levels, is proposed. Assumptions and risks are discussed and explored in a simulation and applied to the Dutch crime victimization survey.
We delineate the underlying homogeneity assumption, procedural variants, and implications of the comparative method and distinguish this from Mill’s method of difference. We demonstrate that additional units can provide "placebo" tests for the comparative method even if the scope of inference is limited to the two units under comparison. Moreover, such tests may be available even when these units are the most similar pair of units on the control variables with differing values of the independent variable. Small-n analyses using this method should therefore, at a minimum, clearly define the dependent, independent, and control variables so they may be measured for additional units, and specify how the control variables are weighted in defining similarity between units. When these tasks are too difficult, process tracing of a single unit may be a more appropriate method. We illustrate these points with applications to two studies.
The positive relationship between family formation and regular weekly religious service attendance is well established, but cross-sectional data make it difficult to be confident that this relationship is causal. Moreover, if the relationship is causal, cross-sectional data make it difficult to disentangle the effects of three distinct family-formation events: marrying, having a child, and having a child who reaches school age. We use three waves of the new General Social Survey panel data to disentangle these separate potential effects. Using random-, fixed-, and hybrid-effect models, we show that, although in cross-section marriage and children predict attendance across individuals, neither leads to increased attendance when looking at individuals who change over time. Having a child who becomes school-aged is the only family-formation event that remains associated with increased attendance among individuals who change over time. This suggests that the relationships between marriage and attending and between having a first child (or, for that matter, having several children) and attending are spurious, causal in the other direction, or indirect (since marrying and having a first child make it more likely that one will eventually have a school-age child). Adding a school-age child to the household is the only family-formation event that directly leads to increased attendance.
This is a comment suggesting that Jerolmack and Khan’s article in this issue embodies news from "somewhere," in arguing that ethnography can emphasize interaction in concrete situations and what people do rather than what they say about what they do. However, their article also provides news from "nowhere," in that ethnography often claims to prioritize in situ organization while dipping into an unconstrained reservoir of distant structures that analytically can subsume and potentially eviscerate the local order. I elaborate on each of these somewhere/nowhere ideas. I also briefly point to the considerable ethnomethodological and conversation analytic research of the last several decades that addresses the structural issue. Such research, along with other traditions in ethnography, suggests that investigators can relate social or political contexts to concrete situations provided that there is, in the first place, preservation of the parameters of everyday life and the exactitude of the local order.
There are over three decades of largely unrebutted criticism of regression analysis as practiced in the social sciences. Yet, regression analysis broadly construed remains for many the method of choice for characterizing conditional relationships. One possible explanation is that the existing alternatives sometimes can be seen by researchers as unsatisfying. In this article, we provide a different formulation. We allow the regression model to be incorrect and consider what can be learned nevertheless. To this end, the search for a correct model is abandoned. We offer instead a rigorous way to learn from regression approximations. These approximations, not "the truth," are the estimation targets. There exist estimators that are asymptotically unbiased and standard errors that are asymptotically correct even when there are important specification errors. Both can be obtained easily from popular statistical packages.
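The claim that regression approximations admit asymptotically correct standard errors under misspecification is commonly operationalized with the Huber–White "sandwich" estimator. A minimal NumPy sketch of that general idea follows (an illustration under simulated data, not the authors' own derivation):

```python
import numpy as np

def ols_with_sandwich_se(X, y):
    """OLS coefficients with heteroskedasticity-robust (HC0 sandwich)
    standard errors. The coefficients target the best linear
    approximation to E[y|X], even if the linear model is misspecified."""
    XtX_inv = np.linalg.inv(X.T @ X)              # the "bread"
    beta = XtX_inv @ X.T @ y                      # ordinary least squares
    resid = y - X @ beta
    meat = X.T @ (X * (resid ** 2)[:, None])      # sum of e_i^2 x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv                # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# Misspecified toy model: the true regression is quadratic, but we fit
# a line; the slope estimates the best *linear* approximation (here 2).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 500)
y = x ** 2 + rng.normal(0, 0.3, 500)
X = np.column_stack([np.ones_like(x), x])
beta, se = ols_with_sandwich_se(X, y)
```

For x uniform on [0, 2], the best linear approximation to x² has slope cov(x, x²)/var(x) = 2, and that approximation, not the (wrong) linear "truth," is what the slope estimate and its sandwich standard error refer to. Popular packages expose the same estimator through robust or HC covariance options.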
This article presents an analysis of interviewer effects on the process leading to cooperation or refusal in face-to-face surveys. The focus is on the interaction between the householder and the interviewer on the doorstep, including initial reactions from the householder, and interviewer characteristics, behaviors, and skills. In contrast to most previous research on interviewer effects, which analyzed final response behavior, the focus here is on the analysis of the process that leads to cooperation or refusal. Multilevel multinomial discrete-time event history modeling is used to examine jointly the different outcomes at each call, taking account of the influence of interviewer characteristics, call histories, and sample member characteristics. The study benefits from a rich data set comprising call record data (paradata) from several face-to-face surveys linked to interviewer observations, detailed interviewer information, and census records. The models have implications for survey practice and may be used in responsive survey designs to inform effective interviewer calling strategies.
This article offers reflections on Jerolmack and Khan’s article "Talk is Cheap: Ethnography and the Attitudinal Fallacy." Specifically, I offer three suggestions aimed at moderating the authors’ critique. Since the sociology of culture and cognition is my area of expertise, I, like Jerolmack and Khan, use this literature to mine supporting examples.
This article examines the methodological implications of the fact that what people say is often a poor predictor of what they do. We argue that many interview and survey researchers routinely conflate self-reports with behavior and assume a consistency between attitudes and action. We call this erroneous inference of situated behavior from verbal accounts the attitudinal fallacy. Though interviewing and ethnography are often lumped together as "qualitative methods," by juxtaposing studies of "culture in action" based on verbal accounts with ethnographic investigations, we show that the latter routinely attempt to explain the "attitude–behavior problem" while the former regularly ignore it. Because meaning and action are collectively negotiated and context-dependent, we contend that self-reports of attitudes and behaviors are of limited value in explaining what people actually do because they are overly individualistic and abstracted from lived experience.
Random route samples are widely used in face-to-face surveys. Most previous studies of random route sample quality compare the data collected by random route samples with data from reliable sources, such as the German Microcensus. While these studies usually find few differences in the distributions of demographic variables, it is possible that other substantive variables of interest are biased if random route samples select households with unequal probabilities. This article takes a different approach to assessing the quality of random route samples. Since random routes are used when no complete list of respondents is available, it is assumed that all units have the same selection probability. This assumption is tested by simulating all possible random routes within a German city and calculating the probability of selection for each household. The simulation results show that all three sets of tested random route instructions lead to strong deviations from a uniform distribution and that two of them create systematic biases.
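The intuition behind unequal selection probabilities can be made concrete with a toy enumeration. The three-street "city" and the walking instruction below are hypothetical, chosen for transparency, not taken from the article's German-city simulation:

```python
from fractions import Fraction

# Toy city: three streets with differing numbers of households.
streets = {"A": 4, "B": 8, "C": 12}
k = 3   # households selected per route

# Hypothetical instruction: choose a street uniformly at random, choose a
# random starting household on it, then select k consecutive households
# (wrapping around the street). Enumerate every possible route exactly.
prob = {(s, h): Fraction(0) for s, n in streets.items() for h in range(n)}
for street, n in streets.items():
    p_route = Fraction(1, len(streets) * n)   # P(this street and this start)
    for start in range(n):
        for step in range(k):
            prob[(street, (start + step) % n)] += p_route

# Every household on a street shares that street's inclusion probability,
# but the probabilities differ across streets: 1/4 on A, 1/8 on B, 1/12 on C.
by_street = {s: prob[(s, 0)] for s in streets}
```

The probabilities sum to k (the expected sample size), yet a household on the short street is three times as likely to be selected as one on the long street, so treating the sample as equal-probability biases any estimate correlated with street length.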
Sociology is pluralist in subject matter, theory, and method, and thus a good place to entertain ideas about causation associated with their use under the law. I focus on two themes: (1) the legal lens on causation that "considers populations in order to make statements about individuals" and (2) the importance of distinguishing between effects of causes and causes of effects.
Attrition is the process of dropout from a panel study. Earlier studies of the determinants of attrition compare respondents still in the survey with those who attrited at any given wave of data collection. In many panel surveys, however, the process of attrition is more subtle than being either in or out of the study. Respondents often miss out on one or more waves but return after that, or start off responding infrequently and respond more often later in the course of the study. Using current analytical models, it is difficult to incorporate such response patterns in analyses of attrition. This article shows how to study attrition in a latent class framework, which allows the separation of different groups of respondents, each following a different and distinct process of attrition. Classifying attriting respondents enables us to formally test substantive theories of attrition and its effects on data accuracy more effectively.
In commenting on Fienberg, Faigman, and Dawid, the author contrasts the proposed parameter, the probability of causation, to other parameters in the causal inference literature, specifically the probability of necessity discussed by Pearl, and Robins and Greenland, and Pearl’s probability of sufficiency. This article closes with a few comments about the difficulties of estimation of parameters related to individual causation.
Life course perspectives focus on the variation in trajectories, generally to identify differences in variation dynamics and classify trajectories accordingly. Our goal here is to develop methods to gauge the discontinuity characteristics that trajectories exhibit and to demonstrate how these measures facilitate analyses aimed at evaluating, comparing, aggregating, and classifying behaviors based on the event discontinuity they manifest. We restrict ourselves here to binary event sequences, providing directions for extending the methods in future research. We illustrate our techniques with data on older drug users, although their application is not restricted to drug use; they can be applied to a wide range of trajectory types. We suggest that the innovative measures of discontinuity presented here can be further developed to provide additional analytical tools in social science research. Our novel discontinuity measure visualizations have the potential to be valuable assessment strategies for interventions, prevention efforts, and other social services utilizing life course data.
Law and science share many perspectives, but they also differ in important ways. While much of science is concerned with the effects of causes (EoC), relying upon evidence accumulated from randomized controlled experiments and observational studies, the problem of inferring the causes of effects (CoE) requires its own framing and possibly different data. Philosophers have written about the need to distinguish between the "EoC" and the "CoE" for hundreds of years, but their advice remains murky even today. The statistical literature is only of limited help here as well, focusing largely on the traditional problem of the "EoC." Through a series of examples, we review the two concepts, how they are related, and how they differ. We provide an alternative framing of the "CoE" that differs substantially from that found in the bulk of the scientific literature, and in legal cases and commentary on them. Although in these few pages we cannot fully resolve this issue, we hope to begin to sketch a blueprint for a solution. In so doing, we consider how causation is framed by courts and thought about by philosophers and scientists. We also endeavor to examine how law and science might better align their approaches to causation so that, in particular, courts can take better advantage of scientific expertise.
Readability formulas, such as the Flesch Reading Ease formula, the Flesch–Kincaid Grade Level Index, the Gunning Fog Index, and the Dale–Chall formula are often considered to be objective measures of language complexity. Not surprisingly, survey researchers have frequently used readability scores as indicators of question difficulty and it has been repeatedly suggested that the formulas be applied during the questionnaire design phase, to identify problematic items and to assist survey designers in revising flawed questions. At the same time, the formulas have faced severe criticism among reading researchers, particularly because they are predominantly based on only two variables (word length/frequency and sentence length) that may not be appropriate predictors of language difficulty. The present study examines whether the four readability formulas named above correctly identify problematic survey questions. Readability scores were calculated for 71 question pairs, each of which included a problematic (e.g., syntactically complex, vague, etc.) and an improved version of the question. The question pairs came from two sources: (1) existing literature on questionnaire design and (2) the Q-BANK database. The analyses revealed that the readability formulas often favored the problematic over the improved version. On average, the success rate of the formulas in identifying the difficult questions was below 50 percent and agreement between the various formulas varied considerably. Reasons for this poor performance, as well as implications for the use of readability formulas during questionnaire design and testing, are discussed.
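As the abstract notes, scores such as Flesch Reading Ease depend only on average sentence length and average syllables per word, which is exactly why they can rank a vague but short question as "easier" than its improved, slightly longer revision. A sketch of the formula follows; the syllable counter is a crude vowel-group heuristic of our own (production tools use pronunciation dictionaries):

```python
import re

def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))

def count_syllables(word):
    """Crude heuristic: count vowel groups (not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def score_text(text):
    """Score a text by splitting on sentence punctuation and word characters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)
```

Note that nothing in the computation sees vagueness, double-barreled wording, or syntactic ambiguity, which is consistent with the below-50-percent success rate reported above.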
Social surveys generally assume that a sample of units (students, individuals, employees,...) is observed by two-stage selection from a finite population, which is grouped into clusters (schools, households, companies,...). This design involves sampling from two different populations: the population of schools or primary stage units and the population of students or second-stage units. Calibration estimators for student statistics can be defined by using combined information based on school totals and student totals. Auxiliary information from the units at the two stages can be calibrated by integrated weighting, as proposed by Lemaître and Dufour or Estevao and Särndal. Two calibration estimators for the population total based on unit weights are defined. The first estimator satisfies a calibration equation at the unit level, and the second one, at the cluster level. The proposed estimator shrinks the unit estimator toward the cluster. A simulation study based on two real populations is carried out to study the empirical performance of this shrinkage estimator. The populations studied were obtained from the Programme for International Student Assessment database and from the Spanish Household Budget Survey.
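The calibration idea underlying such estimators, adjusting design weights so that weighted auxiliary totals hit known population totals, has a closed form in the linear (GREG) case. The sketch below illustrates generic linear calibration with made-up numbers; it is not the integrated weighting of Lemaître and Dufour, nor the proposed shrinkage estimator:

```python
import numpy as np

def linear_calibration(d, X, totals):
    """GREG/linear calibration: adjust design weights d so that the
    weighted column sums of the auxiliary matrix X equal known totals.
    Returns w = d * (1 + X @ lam), with lam solving the calibration
    equation (X' diag(d) X) lam = totals - X' d."""
    Xd = X * d[:, None]
    lam = np.linalg.solve(X.T @ Xd, totals - d @ X)
    return d * (1.0 + X @ lam)

# Hypothetical data: 6 sampled students with equal design weights,
# calibrated to a known population size (70) and auxiliary total (960).
d = np.full(6, 10.0)
X = np.column_stack([np.ones(6),                  # population-size constraint
                     [12, 15, 11, 14, 13, 16]])   # auxiliary variable
totals = np.array([70.0, 960.0])
w = linear_calibration(d, X, totals)
```

After calibration, `w @ X` reproduces the known totals exactly, which is the defining property the abstract's unit-level and cluster-level calibration equations generalize.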
Visualization is a potentially powerful tool for exploration and complexity reduction of categorical sequence data. This article discusses currently available sequence visualization against established criteria for graphical excellence in the visual display of quantitative information. Existing sequence graphs fall into two groups: They either represent categorical sequences or summarize them. The authors propose relative frequency sequence plots as an informative way of graphing sequence data and as a bridge between data representation graphs and data summarization graphs. The efficacy of the proposed plot is assessed by the R^2 and F statistics. The applicability of the proposed graphs is demonstrated using data from the German Life History Study on women’s family formation.
This special issue of Sociological Methods & Research contributes to recent trends in studies that exploit the availability of multiple measures in sample surveys in order to detect the level and patterning of measurement errors. Articles in this volume focus on topics in one (or some combination) of three areas: (1) those that develop and test theoretical hypotheses regarding the behavior of measurement errors under specific conditions of measurement, (2) those that focus on the methodological problems encountered in the design of data collection permitting the estimation of measurement models, and (3) those that focus on the evaluation of existing models for detecting and quantifying the nature of measurement errors. The designs employed in these investigations include follow-up probes, record-check studies, multitrait-multimethod designs, longitudinal designs, and latent class models for assessing measurement errors for categorical variables.
Noise is widely regarded as a residual category—the unexplained variance in a linear model or the random disturbance of a predictable pattern. Accordingly, formal models often impose the simplifying assumption that the world is noise-free and social dynamics are deterministic. Where noise is assigned causal importance, it is often assumed to be a source of inefficiency, unpredictability, or heterogeneity. We review recent sociological studies that are noteworthy for demonstrating the theoretical importance of noise for understanding the dynamics of a complex system. Contrary to widely held assumptions, these studies identify conditions in which noise can increase efficiency and predictability and reduce diversity. We conclude with a methodological warning that deterministic assumptions are not an innocent simplification.
Crisp-set Qualitative Comparative Analysis, fuzzy-set Qualitative Comparative Analysis (fsQCA), and multi-value Qualitative Comparative Analysis (mvQCA) have emerged as distinct variants of QCA, with the latter still being regarded as a technique of doubtful set-theoretic status. Textbooks on configurational comparative methods have emphasized differences rather than commonalities between these variants. This article has two consecutive objectives, both of which focus on commonalities. First, but secondary in importance, it demonstrates that all set types associated with each variant can be combined within the same analysis by introducing a standardized notational system. By implication, any doubts about the set-theoretic status of mvQCA vis-à-vis its two sister variants are removed. Second, but primary in importance and dependent on the first objective, this article introduces the concept of the multivalent fuzzy set variable. This variable type forms the basis of generalized-set Qualitative Comparative Analysis (gsQCA), an approach that integrates the features peculiar to mvQCA and fsQCA into a single framework while retaining routine truth table construction and minimization procedures. Under the concept of the multivalent fuzzy set variable, all existing QCA variants become special cases of gsQCA.
Survey methodologists worry about trade-offs between nonresponse and measurement error. Past findings indicate that respondents brought into the survey late provide low-quality data. The diminished data quality is often attributed to lack of motivation. Quality is often measured through internal indicators and rarely through true scores. Using administrative data for validation purposes, this article documents increased measurement error as a function of recruitment effort for a large-scale employment survey in Germany. In this case study, the reduction in measurement quality of an important target variable is largely caused by differential measurement error in subpopulations and respective shifts in sample composition, as well as increased cognitive burden through the increased length of recall periods among later respondents. Only small portions of the relationship could be attributed to a lack of motivation among late or reluctant respondents.
This article applies coincidence analysis (CNA), a Boolean method of causal analysis presented in Baumgartner (2009a), to configurational data on the Swiss minaret vote of 2009. CNA is related to qualitative comparative analysis (QCA) (Ragin 2008) but, contrary to the latter, does not minimize sufficient and necessary conditions by means of Quine–McCluskey optimization; instead, it relies on its own custom-built optimization algorithm. This algorithm greatly facilitates the analysis of data featuring chain-like causal dependencies among the conditions of an ultimate outcome, as can be found in the data on the Swiss minaret vote. Apart from providing a model of the causal structure behind the Swiss minaret vote, we show that a CNA of that data is preferable to a QCA.
Agent-based modeling has become increasingly popular in recent years, but there is still no codified set of recommendations or practices for how to use these models within a program of empirical research. This article provides ideas and practical guidelines drawn from sociology, biology, computer science, epidemiology, and statistics. We first discuss the motivations for using agent-based models in both basic science and policy-oriented social research. Next, we provide an overview of methods and strategies for incorporating data on behavior and populations into agent-based models, and review techniques for validating and testing the sensitivity of agent-based models. We close with suggested directions for future research.
For survey methodologists, latent class analysis (LCA) is a powerful tool for assessing the measurement error in survey questions, evaluating survey methods, and estimating the bias in estimates of population prevalence. LCA can be used when gold standard measurements are not available and applied to essentially any set of indicators that meet certain criteria for identifiability. LCA offers quality inference, provided the key threat to model validity—namely, local dependence—can be appropriately addressed either in the study design or in the model-building process. Three potential causes threaten local independence: bivocality, behaviorally correlated error, and latent heterogeneity. In this article, these threats are examined separately to obtain insights regarding (a) questionnaire designs that reduce local dependence, (b) the effects of local dependence on parameter estimation, and (c) modeling strategies to mitigate these effects in statistical inference. The article focuses primarily on the analysis of rare and sensitive outcomes and proposes a practical approach for diagnosing and mitigating model failures. The proposed approach is empirically tested using real data from a national survey of inmate sexual abuse where measurement errors are a serious concern. Our findings suggest that the proposed modeling strategy was successful in reducing local dependence bias in the estimates, but its success varied by the quality of the indicators available for analysis. With only three indicators, the biasing effects of local dependence can usually be reduced but not always to acceptable levels.
Causal inference via process tracing has received increasing attention during recent years. A 2 x 2 typology of hypothesis tests takes a central place in this debate. A discussion of the typology demonstrates that its role for causal inference can be improved further in three respects. First, I formulate case selection principles for each of the four tests. Second, focusing on the uniqueness dimension of the 2 x 2 typology, I show that it is important to distinguish between theoretical and empirical uniqueness when choosing cases and generating inferences via process tracing. Third, I demonstrate that the standard reading of the so-called doubly decisive test is misleading: it conflates unique implications of a hypothesis with contradictory implications between one hypothesis and another. In order to remedy the current ambiguity of the uniqueness dimension, I propose an expanded typology of hypothesis tests constituted by three dimensions.
Prepaid monetary incentives are used to address declining response rates in random-digit dial surveys. There is concern among researchers that some respondents will accept the prepayment but not complete the survey. There is little research to understand check cashing and survey completing behaviors among respondents who receive prepayment. Data from the International Tobacco Control Four-Country Study—a longitudinal survey of smokers in Canada, the United States, the United Kingdom, and Australia—were used to examine the impact of prepayment (in the form of checks, approximately US$10) on sample profile. Approximately 14 percent of respondents cashed their check, but did not complete the survey, while about 14 percent did not cash their checks, but completed the survey. Younger adults (Canada and United States), those of minority status (United States), and those who had been in the survey for only two waves or less (Canada and United States) were more likely to cash their checks and not complete the survey.
A multilevel regression model is proposed in which discrete individual-level variables are used as predictors of discrete group-level outcomes. It generalizes the model proposed by Croon and van Veldhoven for analyzing micro–macro relations with continuous variables by making use of a specific type of latent class model. A first simulation study shows that this approach performs better than more traditional aggregation and disaggregation procedures. A second simulation study shows that the proposed latent variable approach still works well in a more complex model, but that a larger number of level-2 units is needed to retain sufficient power. The more complex model is illustrated with an empirical example in which data from a personal network are used to analyze the interaction effect of being religious and surrounding yourself with married people on the probability of being married.
We propose a new multiple imputation technique for imputing squares. Current methods either yield unbiased regression estimates or preserve data relations; no method, however, seems to deliver both, which limits researchers when implementing regression analysis in the presence of missing data. Moreover, current methods work only under a missing completely at random (MCAR) mechanism. Our method imputes squares using a polynomial combination. It yields unbiased regression estimates while preserving the quadratic relations in the data, under both missing at random (MAR) and MCAR mechanisms.
Group-based trajectory models are used to investigate population differences in the developmental courses of behaviors or outcomes. This note introduces a new Stata command, traj, for fitting finite (discrete) mixture models to longitudinal data, designed to identify clusters of individuals following similar progressions of some behavior or outcome over age or time. The normal, censored normal, Poisson, zero-inflated Poisson, and logistic distributions are supported.
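As a stripped-down analogue of what such finite mixture models do, the sketch below fits a two-group Poisson mixture with constant (degree-zero) trajectories by EM on simulated data; traj itself fits polynomial trajectories over age or time and supports the distributions listed above:

```python
import numpy as np

def em_poisson_mixture(counts, n_iter=200):
    """EM for a two-group Poisson mixture with constant rates: each
    subject's total count over T periods is Poisson(lam_k * T) given
    membership in group k. Returns mixing proportions pi and rates lam."""
    totals = counts.sum(axis=1).astype(float)
    T = counts.shape[1]
    # Separated, positive starting values for the two group rates.
    lam = np.array([totals.min() / T + 0.1, totals.max() / T + 0.1])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: posterior membership probabilities (log scale for safety;
        # the total-count factorial cancels across the two groups).
        logp = totals[:, None] * np.log(lam * T) - lam * T + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: responsibility-weighted proportions and rates.
        pi = resp.mean(axis=0)
        lam = (resp * totals[:, None]).sum(axis=0) / (resp.sum(axis=0) * T)
    return pi, lam

# Simulated panel: 300 subjects, 4 periods, latent event rates 1 vs. 5.
rng = np.random.default_rng(1)
in_high = rng.random(300) < 0.4
rates = np.where(in_high, 5.0, 1.0)
counts = rng.poisson(rates[:, None], size=(300, 4))
pi, lam = em_poisson_mixture(counts)
```

With well-separated groups, the EM recovers the two rates and the 40/60 mixing proportions; real trajectory models replace the constant rate with a polynomial in age or time within each group.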
Many face-to-face surveys use field staff to create lists of housing units from which samples are selected. However, housing unit listing is vulnerable to errors of undercoverage: Some housing units are missed and have no chance to be selected. Such errors are not routinely measured and documented in survey reports. This study jointly investigates the rate of undercoverage, the correlates of undercoverage, and the bias in survey data due to undercoverage in listed housing unit frames. Working with the National Survey of Family Growth, we estimate an undercoverage rate for traditional listing efforts of 13.6 percent. We find that multiunit status, rural areas, and map difficulties strongly correlate with undercoverage. We find significant bias in estimates of variables such as birth control use, pregnancies, and income. The results have important implications for users of data from surveys based on traditionally listed housing unit frames.
Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based model. Like many other agent-based models, they are based on local rules for actor behavior. Unlike many other agent-based models, they incorporate elements of generalized linear statistical models in order to be realistic, detailed representations of network dynamics in empirical data sets. Statistical parallels to micro–macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and in the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by microspecifications of actor-based models.
Measurement theorists agree that one has measured well when one’s measurement scheme faithfully represents the concept under investigation. Yet, the conventional wisdom on "measurement validation" pays surprisingly little attention to conceptual meaning and instead emphasizes measurement error and the pursuit of true scores. Researchers are advised to adopt an empiricist stance; treat data as objective facts; and confer validity through predictive correlations. This article offers an alternative outlook on ascertaining goodness in measurement. First, researchers must measure a concept’s dimensional expanse. Second, they must contextualize their measures to ensure concept–measure congruence and categorial pertinence. Third, this approach hinges on dialogue among subject matter experts to craft disciplinary measurement norms. The article contrasts these dueling approaches through an extended example of how scholars measure the concept state capacity. Overall, this article argues that social scientists must reconceive what it means to have measured well.
This article is an empirical contribution to the evaluation of the randomized response technique (RRT), a prominent procedure to elicit more valid responses to sensitive questions in surveys. Based on individual validation data, we focus on two questions: First, does the RRT lead to higher prevalence estimates of sensitive behavior than direct questioning (DQ)? Second, are there differences in the effects of determinants of misreporting according to question mode? The data come from 552 face-to-face interviews with subjects who had been convicted by a court for minor criminal offences in a metropolitan area in Germany. For the first question, the answer is negative. For the second, it is positive, that is, effects of individual and situational determinants of misreporting differ between the two question modes. The effect of need for social approval, for example, tends to be stronger in RRT than in DQ mode. Interviewer experience turns out to be positively related to answer validity in DQ and negatively in RRT mode. Our findings support a skeptical position toward RRT, shed new light on long-standing debates within survey methodology, and stimulate theoretical reasoning about response behavior in surveys.
Bias in inferences from samples constitutes one of the main sources of error in macrosociological studies. The problem is typically understood as one of "selection bias." What constitutes the population available for selection, by contrast, is rarely thought of as being problematic. This article works to remedy this oversight, arguing that the categorization of cases constitutes another important source of biased inferences. Because the boundaries of most macrosociological concepts are fuzzy, samples formed according to the logic of crisp-set categorization might lead to "categorization bias." In order to avoid such bias, we argue that cases should be weighted depending on how strongly they resemble the prototypes of the categories for which one wants to arrive at generalizations. By weighting for prototypicality, the logic of fuzzy boundaries of concepts and comparative and statistical methods can be combined. We provide an example of how weighting by prototypicality can be applied in the area of ethnic studies.
Many social science studies are based on coded in-depth semistructured interview transcripts. But researchers rarely report or discuss coding reliability in this work. Nor is there much literature on the subject for this type of data. This article presents a procedure for developing coding schemes for such data. It involves standardizing the units of text on which coders work and then improving the coding scheme’s discriminant capability (i.e., reducing coding errors) to an acceptable point as indicated by measures of either intercoder reliability or intercoder agreement. This approach is especially useful for situations where a single knowledgeable coder will code all the transcripts once the coding scheme has been established. This approach can also be used with other types of qualitative data and in other circumstances.
The increasing availability of data from multisite randomized trials provides a potential opportunity to use instrumental variables (IV) methods to study the effects of multiple hypothesized mediators of the effect of a treatment. We derive nine assumptions needed to identify the effects of multiple mediators when using site-by-treatment interactions to generate multiple instruments. Three of these assumptions are unique to the multiple-site, multiple-mediator case: (1) the assumption that the mediators act in parallel (no mediator affects another mediator); (2) the assumption that the site-average effect of the treatment on each mediator is independent of the site-average effect of each mediator on the outcome; and (3) the assumption that the site-by-compliance matrix has sufficient rank. The first two of these assumptions are nontrivial and cannot be empirically verified, suggesting that multiple-site, multiple-mediator IV models must be justified by strong theory.
Widely used methods for analyzing missing data can be biased in small samples. To understand these biases, we evaluate in detail the situation where a small univariate normal sample, with values missing at random, is analyzed using either observed-data maximum likelihood (ML) or multiple imputation (MI). We evaluate two types of MI: the usual Bayesian approach, which we call posterior draw (PD) imputation, and a little-used alternative, which we call ML imputation, in which values are imputed conditionally on an ML estimate. We find that observed-data ML is more efficient and has lower mean squared error than either type of MI. Between the two types of MI, ML imputation is more efficient than PD imputation, and ML imputation also has less potential for bias in small samples. The bias and efficiency of PD imputation can be improved by a change of prior.
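The comparison can be sketched in a small Monte Carlo experiment (an illustrative setup, not the authors' study design): a univariate normal sample with values missing completely at random is analyzed by observed-data ML (the observed mean), by ML imputation (imputing from the ML-estimated distribution), and by PD imputation (drawing the parameters from a posterior before imputing).

```python
import numpy as np

rng = np.random.default_rng(1)
reps, n, n_mis, m = 3000, 15, 5, 5   # small sample, 5 values missing, m imputations
mu_true = 0.0
err = {"obs_ml": [], "ml_imp": [], "pd_imp": []}

for _ in range(reps):
    y = rng.normal(mu_true, 1.0, size=n)
    obs = y[n_mis:]                     # first n_mis values are missing (MCAR)
    n_obs = obs.size
    ybar, s2 = obs.mean(), obs.var(ddof=1)

    # Observed-data ML: the mean of the observed values.
    err["obs_ml"].append(ybar - mu_true)

    # ML imputation: impute conditionally on the ML estimates (ybar, ML variance).
    s2_ml = obs.var(ddof=0)
    ests = [(obs.sum() + rng.normal(ybar, np.sqrt(s2_ml), n_mis).sum()) / n
            for _ in range(m)]
    err["ml_imp"].append(np.mean(ests) - mu_true)

    # PD imputation: draw (sigma^2, mu) from the posterior, then impute.
    ests = []
    for _ in range(m):
        sigma2 = (n_obs - 1) * s2 / rng.chisquare(n_obs - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n_obs))
        ests.append((obs.sum() + rng.normal(mu, np.sqrt(sigma2), n_mis).sum()) / n)
    err["pd_imp"].append(np.mean(ests) - mu_true)

mse = {k: float(np.mean(np.square(v))) for k, v in err.items()}
print(mse)
```

Because imputation only adds noise around the observed mean in this univariate setting, the simulation reproduces the ordering the abstract reports: observed-data ML has the lowest MSE, and PD imputation's extra parameter-draw variability puts it behind ML imputation.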
Respondent-driven sampling (RDS) is a method for recruiting "hidden" populations through a network-based chain- and peer-referral process. RDS recruits hidden populations more effectively than other sampling methods and promises to generate unbiased estimates of their characteristics. RDS's faithful representation of hidden populations relies on the validity of core assumptions regarding the unobserved referral process. With empirical recruitment data from an RDS study of female sex workers (FSWs) in Shanghai, we assess the RDS assumption that participants recruit nonpreferentially from among their network alters. We also present a bootstrap method for constructing the confidence intervals around RDS estimates. This approach uniquely incorporates real-world features of the population under study (e.g., the sample's observed branching structure). We then extend this approach to approximate the distribution of RDS estimates under various peer recruitment scenarios consistent with the data as a means to quantify the impact of recruitment bias and of rejection bias on the RDS estimates. We find that the hierarchical social organization of FSWs leads to recruitment biases by constraining RDS recruitment across social classes and introducing bias in the RDS estimates.
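A bootstrap that respects the observed branching structure can be sketched, under simplifying assumptions, as a seed-chain resampler: recruitment chains are resampled whole, so within-chain branching is preserved. The toy recruitment forest and trait below are hypothetical, and the authors' procedure additionally models alternative recruitment scenarios.

```python
import random

# Hypothetical RDS recruitment forest: child -> parent (None marks a seed),
# plus a measured binary trait per participant.
parent = {"s1": None, "s2": None, "a": "s1", "b": "s1", "c": "a", "d": "s2"}
trait = {"s1": 1, "s2": 0, "a": 1, "b": 0, "c": 1, "d": 0}

children = {}
for node, par in parent.items():
    children.setdefault(par, []).append(node)

def subtree(node):
    """All participants recruited (directly or indirectly) by `node`."""
    out = [node]
    for ch in children.get(node, []):
        out.extend(subtree(ch))
    return out

def chain_bootstrap(n_boot=2000, seed=3):
    """Resample seeds with replacement, keeping each seed's entire
    recruitment subtree intact, so observed branching is preserved."""
    rng = random.Random(seed)
    seeds = children[None]
    stats = []
    for _ in range(n_boot):
        sample = []
        for _ in seeds:
            sample.extend(subtree(rng.choice(seeds)))
        stats.append(sum(trait[p] for p in sample) / len(sample))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]

lo, hi = chain_bootstrap()
print(lo, hi)   # 95% percentile interval for the trait prevalence
```

Resampling whole chains rather than individuals captures the clustering that recruitment induces; intervals built from independent resampling of participants would be misleadingly narrow.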
This article presents a method for estimating and interpreting total, direct, and indirect effects in logit or probit models. The method extends the decomposition properties of linear models to these models; it closes the much-discussed gap between results based on the "difference in coefficients" method and the "product of coefficients" method in mediation analysis involving nonlinear probability models; it reports effects measured on both the logit or probit scale and the probability scale; and it identifies causal mediation effects under the sequential ignorability assumption. We also show that while our method is computationally simpler than other methods, it always performs as well as, or better than, these methods. Further derivations suggest a hitherto unrecognized issue in identifying heterogeneous mediation effects in nonlinear probability models. We conclude the article with an application of our method to data from the National Educational Longitudinal Study of 1988.
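One way to see how such a decomposition can work on the logit scale is the residual-inclusion trick used in KHB-style methods (a minimal sketch on simulated data; the article's own estimator may differ in details): refitting the model with the mediator residualized on the treatment puts the total and direct effects on the same scale, so the indirect effect is simply their difference, reconciling the "difference in coefficients" and "product of coefficients" views.

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """Plain Newton-Raphson logistic regression (intercept column must be in X)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
m_ = 0.8 * x + rng.normal(size=n)          # mediator depends on treatment
eta = -0.2 + 0.5 * x + 0.7 * m_            # latent index
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

ones = np.ones(n)
# Full model: direct effect of x, controlling for the mediator.
direct = fit_logit(np.column_stack([ones, x, m_]), y)[1]
# Residualize the mediator on x, then refit: because the residual is
# orthogonal to x, the x coefficient now carries the total effect,
# expressed on the full model's scale.
resid = m_ - np.polyval(np.polyfit(x, m_, 1), x)
total = fit_logit(np.column_stack([ones, x, resid]), y)[1]
indirect = total - direct
print(direct, total, indirect)
```

The naive difference between the reduced-model and full-model coefficients would confound mediation with rescaling of the latent variance; the residual-inclusion refit avoids that by keeping the error scale of both models identical.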
Using population representative survey data from the German Socio-Economic Panel (SOEP) and administrative pension records from the Statutory Pension Insurance, the authors compare four statistical matching techniques to complement survey information on net worth with social security wealth (SSW) information from the administrative records. The unique properties of the linked data allow for a direct assessment of the quality of the matches produced by each technique. Based on various evaluation criteria, Mahalanobis distance matching performs best. Exploiting the advantages of the newly assembled data, the authors include SSW in a wealth inequality analysis. Despite its quantitative relevance, SSW has thus far been omitted from such analyses for lack of adequate microdata. The inclusion of SSW doubles the level of net worth and decreases inequality by almost 25 percent. Moreover, the results reveal striking differences along occupational lines.
Set-theoretic methods and Qualitative Comparative Analysis (QCA) in particular are case-based methods. There are, however, only a few guidelines on how to combine them with qualitative case studies. Contributing to the literature on multi-method research (MMR), we offer the first comprehensive elaboration of principles for the integration of QCA and case studies with a special focus on case selection. We show that QCA's reliance on set-relational causation in terms of necessity and sufficiency has important consequences for the choice of cases. Using real-world data for both crisp-set and fuzzy-set QCA, we show what typical and deviant cases are in QCA-based MMR. In addition, we demonstrate how to select cases for comparative case studies aiming to discern causal mechanisms and address the puzzles behind deviant cases. Finally, we detail the implications of modifying the set-theoretic cross-case model in the light of case-study evidence. Following the principles developed in this article should increase the inferential leverage of set-theoretic MMR.
This article examines the problem of response error in survey earnings data. Comparing workers' earnings reports in the U.S. Census Bureau's Survey of Income and Program Participation (SIPP) to their detailed W-2 earnings records from the Social Security Administration, we employ ordinary least squares (OLS) and quantile regression models to assess the effects of earnings determinants and demographic variables on measurement errors in 2004 SIPP earnings in terms of bias and variance. Results show that measurement errors in earnings are not classical, but mean-reverting. The directions of bias for subpopulations are not constant, but vary across levels of earnings. Highly educated workers report their earnings more accurately than less educated workers at higher earnings levels, but they tend to overreport at lower earnings levels. Black workers with high earnings underreport to a greater degree than comparable whites, while black workers with low earnings overreport to a greater degree. Some subpopulations exhibit higher variances of measurement errors than others. Blacks, Hispanics, high school dropouts, part-year employed workers, and occupation "switchers" tend to misreport their earnings in both directions—over- and underreporting—rather than erring consistently in one direction. The implications of our findings are discussed.
A persistent problem in the design of bipolar attitude questions is whether or not to include a middle response alternative. On the one hand, it is reasonable to assume that people might hold opinions which are 'neutral' with regard to issues of public controversy. On the other, question designers suspect that offering a mid-point may attract respondents with no opinion, or those who lean to one side of an issue but do not wish to incur the cognitive costs required to determine a directional response. Existing research into the effects of offering a middle response alternative has predominantly used a split-ballot design, in which respondents are assigned to conditions which offer or omit a midpoint. While this body of work has been useful in demonstrating that offering or excluding a mid-point substantially influences the answers respondents provide, it does not offer any clear resolution to the question of which format yields more accurate data. In this paper, we use a different approach. We use follow-up probes administered to respondents who initially select the mid-point to determine whether they selected this alternative in order to indicate opinion neutrality, or to indicate that they do not have an opinion on the issue. We find that the vast majority of these responses turn out to be what we term 'face-saving don't knows' and that reallocating these responses from the mid-point to the don't know category significantly alters descriptive and multivariate inferences. Counter to the survey-satisficing perspective, we find that this tendency is greatest amongst those who express more interest in the topic area.
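The reallocation step described here amounts to a simple recode: respondents who picked the scale midpoint but, at the follow-up probe, said they hold no opinion are moved to the don't-know category before analysis. The toy data and category labels below are illustrative assumptions, not the paper's coding scheme.

```python
from collections import Counter

# Hypothetical 5-point bipolar item (1-5, midpoint = 3) plus "DK".
# Each record: (initial response, probe answer for midpoint selectors).
records = [
    (1, None), (2, None), (3, "no_opinion"), (3, "neutral"),
    (3, "no_opinion"), (4, None), (5, None), (3, "no_opinion"), ("DK", None),
]

def reallocate(resp, probe):
    # Midpoint selectors who admit to having no opinion are treated as
    # "face-saving don't knows" and recoded to DK.
    if resp == 3 and probe == "no_opinion":
        return "DK"
    return resp

before = Counter(r for r, _ in records)
after = Counter(reallocate(r, p) for r, p in records)
print(before, after)
```

In this toy example three of the four midpoint selections move to DK, shrinking the apparent 'neutral' group from four respondents to one, which is the kind of shift that can alter descriptive and multivariate inferences.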
Autism prevalence has increased rapidly in the United States during the past two decades. We have previously shown that the diffusion of information about autism through spatially proximate social relations has contributed significantly to the epidemic. This study expands on this finding by identifying the focal points for interaction that drive the proximity effect on subsequent diagnoses. We then consider how diffusion dynamics through interaction at critical focal points, in tandem with exogenous shocks, could have shaped the spatial dynamics of autism in California. We achieve these goals through an empirically calibrated simulation model of the whole population of 3- to 9-year-olds in California. We show that in the absence of interaction at these foci—principally malls and schools—we would not observe an autism epidemic. We also explore the idea that epigenetic changes affecting one generation in the distal past could shape the precise spatial patterns we observe among the next generation.