Which of the Following URLs Would You Most Distrust in Writing a Scientific Paper?

The integration of research results into teachers' classroom practice is becoming ever more important (e.g., Southerland et al., 2016). However, several studies show that both preservice and in-service teachers tend to prefer conventional wisdom and practically derived knowledge sources over scientific evidence (Bråten & Ferguson, 2015; Cramer, 2013; Parr & Timperley, 2008). This has potentially far-reaching consequences: Teachers who do not know the science behind their profession may succumb to misconceptions about teaching and learning, which, in turn, may lead to dysfunctional educational practices. For example, they may waste time developing teaching materials tailored to individual students' learning styles (the "learning styles" myth has long been debunked; Kirschner, 2017; Pashler et al., 2008), or they may recommend grade retention for lower performing students (there is no evidence of academic benefit for retained students; Hughes et al., 2018; Jimerson, 2001).

Several authors have argued that this underreliance on research knowledge might be grounded in beliefs about the sources of educational knowledge (e.g., Bråten & Ferguson, 2015; Merk et al., 2017; Sjølie, 2014). This is because directly evaluating the validity and robustness of educational research is hard for teachers, as it often requires considerable background knowledge in research methodology (Hendriks et al., 2021). Therefore, to "acquire reliable and useable information about learning and teaching, (pre-service) teachers must instead be able to identify and evaluate knowledgeable and trustworthy sources of information—that is, to figure out whom to believe" (Hendriks et al., 2021, p. 2). Keeping in mind the importance of such "second-hand evaluations" (i.e., figuring out "whom to believe" instead of "what is true"; Bromme et al., 2015), we know from social psychology that members of one's in-group are perceived more positively than individuals from outside one's group (in-group bias; Mullen et al., 1992). Teachers might therefore rate fellow teachers as more trustworthy than researchers. Furthermore, considering the findings that teachers view researchers as competent but self-interested (Critchley, 2008), and that researchers are generally ascribed less warmth (i.e., benevolence) but similar competence (i.e., expertise) compared with teachers (Fiske & Dupree, 2014), one might suppose that this holds especially for researchers' perceived benevolence and integrity.

In line with such arguments, Merk and Rosman (2019) found what they subsequently called the "smart but evil" stereotype: In two experimental studies, they showed that preservice teachers rate educational researchers significantly higher on expertise but lower on integrity and benevolence than practitioners (i.e., in-service teachers). While it is still disputed whether comparably lower (but still high) integrity and benevolence ascriptions may indeed be labeled as "evil" (Hendriks et al., 2021), Hendriks et al. (2021) found additional (partial) support for a smart but evil pattern in preservice teachers: When pursuing the epistemic aim of acquiring theoretical explanations in a teaching context, participants saw researchers as having more expertise and integrity but less benevolence than practitioners. In contrast, when participants were looking to gain practical advice for everyday school life, they saw teachers as having altogether higher expertise, integrity, and benevolence than researchers. In other words, researchers were generally less trusted than teaching practitioners, with the exception of their expertise and integrity in providing theoretical explanations.

Taken together, these studies provide evidence for the existence of a smart but evil pattern. As impaired epistemic trust might lead teachers to disregard empirical evidence in their teaching, close attention to such findings is warranted. To date, however, the existing evidence does not allow us to specify the extent to which the pattern is specific to the domain of educational research (Merk & Rosman, 2019), especially since the aforementioned study by Fiske and Dupree (2014) found a similar pattern in a general population sample. Moreover, current studies on epistemic trust (including the works by Merk & Rosman, 2019, and Hendriks et al., 2021) do not allow trust and distrust to be delineated, since they simply operationalize distrust as the opposite of trust (which is not necessarily true; e.g., Lewicki et al., 1998). Furthermore, it is not yet clear whether the pattern only applies to comparisons between researchers and practitioners, or whether it also manifests itself in the mean differences between researchers' expertise, integrity, and benevolence. Finally, considering the role of teachers as multipliers of knowledge and beliefs, the studies' focus on preservice teachers narrows the generalizability of the findings. Hence, in the present article, we strive to replicate the smart but evil pattern and address these shortcomings by (1) analyzing reasons for trust and distrust in domains of differing granularity (i.e., educational research and research in general), (2) investigating trust and distrust as separate concepts, (3) employing a different methodological approach, and (4) using a larger and more heterogeneous sample (i.e., in-service teachers throughout Germany). The findings outlined here provide a foundation for future research focusing on the differentiated analysis of trust and distrust in (educational) researchers.

The Concept of Epistemic Trust

Epistemic trust is defined as the amount of trust that individuals ascribe to a specific knowledge source, such as educational researchers. As Hendriks et al. (2016) point out, this amount of epistemic trust depends on the epistemic trustworthiness of a source—in other words, on specific information features of the source that make it more or less trustworthy. For example, teachers will likely ascribe more trust to a renowned professor than to an undergraduate student in his or her first semester. From a psychological perspective, this is because individuals use specific source information features to gauge a source's epistemic trustworthiness (Hendriks et al., 2016; Landrum et al., 2015). Several researchers have proposed distinguishing three dimensions along which individuals evaluate this trustworthiness: expertise, integrity, and benevolence (e.g., Hendriks et al., 2015, 2016; Mayer et al., 1995). A source with high expertise (or ability; Mayer et al., 1995) is highly skilled and qualified within a particular domain (Hendriks et al., 2015, 2016). A source with high integrity is honest and adheres to recognized standards (e.g., transparency and openness) in his or her field (Hendriks et al., 2015). Finally, a benevolent source has good intentions and acts for the greater good of others—in contrast to someone who is only interested in his or her personal benefit (Hendriks et al., 2016; Mayer et al., 1995). Here, we must mention the similarities that expertise and benevolence share with the stereotype content dimensions of competence (e.g., competent, capable, intelligent) and warmth (e.g., sincere, friendly, well-intentioned) suggested by Fiske et al. (2002).

Established measurement instruments on epistemic trustworthiness (e.g., the Münster Epistemic Trustworthiness Inventory [METI]; Hendriks et al., 2015) often operationalize trust and distrust, along the dimensions outlined above, as opposite ends of three continuous variables (i.e., expertise, benevolence, and integrity; see also Saunders & Thornhill, 2004). However, a growing body of theories and empirical studies suggests (1) that trust and distrust may coexist or (2) that they may be separate constructs (Bigley & Pearce, 1998; Ou & Sia, 2010; Saunders et al., 2014; Saunders & Thornhill, 2004; Sitkin & Roth, 1993). With regard to the latter, distrust arises when "fundamental values are violated, and perceived trustworthiness is undermined across contexts" (Sitkin & Roth, 1993, p. 370). In contrast, violations of trust are seen as specific to a particular context, which is why distrust, because of its higher generality, would be more persistent and harder to change than reduced trust (Sitkin & Roth, 1993). Regarding the possible coexistence of trust and distrust, findings from social psychology suggest that humans may well express attitudes of positive and negative valence simultaneously (Cacioppo et al., 1997). More specifically, Cacioppo et al. (1997) argue that "a stimulus may vary in terms of the strength of positive evaluative activation and the strength of negative evaluative activation it evokes" (p. 3). Therefore, it is possible that certain stimuli evoke a strong activation of both positive and negative evaluative processes, thus resulting in attitude ambivalence (Cacioppo et al., 1997). While we are not aware of any corresponding studies, such arguments are easily transferred to trust in (educational) researchers.
For example, teachers may very well trust the Programme for International Student Assessment team's expertise in large-scale data analysis, while at the same time exhibiting a more abstract and general distrust in their ability to derive adequate conclusions on students' competencies.

In line with such arguments, it is not surprising that the 2017 and 2018 Science Barometer ("Wissenschaftsbarometer"), a representative survey of German citizens' attitudes toward science and research (Wissenschaft im Dialog/Kantar Emnid, 2017, 2018), not only included items assessing participants' individual trust in researchers but also items assessing why they distrusted researchers.

The Smart but Evil Pattern

In 2019, Merk and Rosman investigated whether preservice teachers differ in the amount of epistemic trustworthiness they ascribe to different sources of educational knowledge. Specifically, they confronted their participants with short texts from the educational domain (e.g., on the prevalence of bullying in schools) that were experimentally manipulated with regard to their alleged source (practitioner vs. scientific study) while remaining invariant in content. For example, in one text version, a practitioner (i.e., a teacher) reported on his or her experiences regarding the prevalence of bullying, whereas another text version contained the same information but was framed as a report of an empirical study written by researchers (Merk & Rosman, 2019). After reading each text, participants rated the epistemic trustworthiness of the texts' authors on the three METI dimensions (expertise, integrity, and benevolence; Hendriks et al., 2015). In two experimental studies, Merk and Rosman (2019) found what they later coined the smart but evil stereotype: Educational researchers were seen as having less integrity and benevolence but, at the same time, more expertise than practitioners. Furthermore, as outlined above, Hendriks et al. (2021) found that preservice teachers with the epistemic aim of receiving theoretical explanations regarded researchers as having more expertise and integrity but less benevolence than practitioners, thus lending further support to a smart but evil pattern (but note their diverging findings on the integrity dimension).
Nevertheless, one should keep in mind that such findings of researchers being perceived as more smart but evil compared with practitioners (e.g., Hendriks et al., 2021; Merk & Rosman, 2019) do not imply that educational researchers are seen as smart but evil per se, especially when considering the rather high means that were found for the METI dimensions across all experimental conditions.

The Generalizability of Epistemic Trust Across Different Scientific Domains

In their theoretical framework on learning to trust and trusting to learn, Landrum et al. (2015) suggest that learners will generalize their beliefs about whom to trust by referring to "an unfamiliar individual's domain of expertise to make inferences about what he or she is likely to know" (p. 110). As both educational researchers and researchers in general stem from the "science" domain, a certain amount of generalizability of epistemic trust from educational research to other scientific domains (and vice versa) is thus likely. Further support for this assumption comes from the field of epistemic beliefs. In fact, as evidenced by a recent study (Merk & Rosman, 2019), epistemic beliefs and epistemic trust share a certain conceptual overlap—while epistemic beliefs focus on how people think about knowledge itself (Hofer & Pintrich, 1997), epistemic trust denotes how individuals evaluate the expertise, benevolence, and integrity of a specific knowledge source (Hendriks et al., 2016). Therefore, the Theory of Integrated Domains in Personal Epistemology (TIDE; Merk et al., 2018; Muis et al., 2006) provides a foundation for the generalizability of epistemic trust across domains. In fact, the TIDE framework suggests that epistemic beliefs from one domain (e.g., biology-specific epistemic beliefs) influence both more general (e.g., academic) and more specific (e.g., topic-specific) epistemic beliefs, and several studies have found empirical evidence for such predictions (e.g., Merk et al., 2018; Muis et al., 2006).

The Present Study

The present study, which was preregistered at PsychArchives (Rosman & Merk, 2020), builds on Merk and Rosman's (2019) work and extends it with regard to the aspects outlined above. To do so, we use data from a cross-sectional survey asking German in-service teachers about three reasons (expertise, benevolence, and integrity) for their epistemic trust and distrust. Participating teachers were asked to provide two responses (on Likert-type scales)—one regarding researchers in general, and one regarding educational researchers.

In a first step, we investigate these reasons with regard to the latter group of educational researchers. Our underlying assumption is that if the smart but evil pattern is present in teachers, it will impact their explanations on why they trust or distrust educational researchers. More specifically, when justifying their trust in educational researchers, we expect that our participants will more strongly refer to expertise-related reasons rather than provide explanations focusing on educational researchers' benevolence and integrity. In contrast, to justify their distrust in educational researchers and in line with the smart but evil pattern, we expect our participants to more strongly refer to benevolence- and integrity-related reasons compared with expertise-related reasons. To conceptually replicate the smart but evil pattern in in-service teachers, we therefore posit the following confirmatory hypotheses:

  • Hypothesis 1: Concerning their reasons for trusting educational researchers, teachers will score higher on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons (H1).

  • Hypothesis 2: Concerning their reasons for distrusting educational researchers, teachers will score lower on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons (H2).

Second, we will test whether these hypothesized effects are specific to educational research or whether they apply to research in general. Tentatively, one may expect the latter. In fact, a generalizability of the reasons for trust and distrust across domains would be in line with the aforementioned trust research and epistemic beliefs frameworks (e.g., Landrum et al., 2015; Merk et al., 2018; Muis et al., 2006). The empirical results on the reasons for trust and distrust in the Science Barometer (Wissenschaft im Dialog/Kantar Emnid, 2017, 2018) further support this assumption: As outlined above, the Science Barometer included three trust items and three distrust items pertaining to researchers in general. The response patterns obtained for these items speak, at least descriptively, in favor of a preference for expertise-related reasons over reasons of benevolence and integrity regarding trust in researchers in general (and for the opposite pattern regarding distrust; Könneker, 2018, 2020; Wissenschaft im Dialog/Kantar Emnid, 2017). To further investigate this generalizability aspect, we will analyze whether the reasons for trust and distrust suggested in Hypotheses 1 and 2 differ when teachers are requested to provide their responses with regard to researchers in general. We justify the importance of these analyses as follows: If teachers' smart but evil patterns generalize from educational researchers to researchers in general, this would suggest that teachers are not specifically biased toward educational researchers, thus making it easier to adapt interventions for rebuilding trust in science (e.g., increased transparency; Bachmann et al., 2015) to the educational context. In line with our reasoning above, we suggest the following confirmatory hypotheses:

  • Hypothesis 3: Concerning their reasons for trusting researchers in general, teachers will score higher on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons (H3).

  • Hypothesis 4: Concerning their reasons for distrusting researchers in general, teachers will score lower on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons (H4).

Testing the similarity of and differences between belief configurations across domains has a certain tradition in epistemic beliefs research because of its implications for theory building (e.g., Merk et al., 2018). More specifically, the TIDE framework (see above; Merk et al., 2018) predicts that beliefs about knowledge in one domain generalize to more general beliefs about scientific knowledge. Thus, examining the magnitude of differences between smart but evil patterns for educational researchers and researchers in general might lend further support to such assumptions. Furthermore, if we were to show that such differences are rather small, this would make it even easier to adapt more general interventions on rebuilding trust to the context of educational research. Hence, if Hypotheses 1 and 3 or 2 and 4 are both significant, we will conduct additional exploratory analyses on a within-person level to examine the magnitude of differences between reasons for (dis)trusting educational researchers and researchers in general.

As a third set of hypotheses, we will investigate, using additional data (2018 Science Barometer; Wissenschaft im Dialog/Kantar Emnid, 2018), whether the general population's explanations for their trust and distrust in researchers also reflect the patterns outlined in the previous hypotheses. While this set of hypotheses does not directly relate to the teacher education context, it allows us to test whether the patterns suggested in Hypotheses 3 and 4 are specific to teachers or whether they generalize to a comparable population of German nonteachers. As outlined above, the 2017 and 2018 Science Barometer data support the latter assumption on a descriptive level (Wissenschaft im Dialog/Kantar Emnid, 2017, 2018). In the present study, we will investigate whether this still holds true when matching the 2018 Science Barometer sample to our teacher sample with regard to age and socioeconomic status (SES; e.g., education and interest in politics, science, and sports). Such analyses are important because they allow researchers to determine whether teachers are a special subgroup with distinct beliefs about researchers or whether their patterns of epistemic trust are largely similar to those of the general population. If the latter were true, this would, again, allow more general interventions on rebuilding trust to be adapted to teachers without much effort—which would certainly be good news. We thus posit the following hypotheses:

  • Hypothesis 5: Concerning their reasons for trusting researchers in general, a general population sample matched for age and SES will score higher on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons (H5).

  • Hypothesis 6: Concerning their reasons for distrusting researchers in general, a general population sample matched for age and SES will score lower on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons (H6).

For this third research question, estimating the magnitude of potential differences between teachers and the general population in their reasons for trust in researchers is particularly interesting: If we find distinct trust patterns for teachers as compared with the general population, it is important to uncover the extent of these differences. If both Hypotheses 3 and 5 are significant, we will therefore conduct corresponding exploratory analyses using a data set that combines our data with the matched 2018 Science Barometer data.

Method

Pilot Study

To draw valid comparisons between a teacher sample and the general population sample from the 2018 Science Barometer (Wissenschaft im Dialog/Kantar Emnid, 2018), it is important that the trust and distrust measurements are identical across both studies. However, the 2018 Science Barometer draws on nonvalidated single items, thus making it impossible to assess their psychometric properties. Moreover, it is important to note that all our hypotheses focus on differences between expertise, benevolence, and integrity as reasons for trust and distrust—which complicates making inferences about the objective (or absolute) levels of trust and distrust. To address these issues, we additionally conducted a pilot study aiming to test the validity of our items.

For this study, we recruited a German general population sample of N = 504 adults (252 female; 11% 18–24 years; 15% 25–31 years; 15% 32–38 years; 13% 39–45 years; 15% 46–52 years; 18% 53–59 years; 14% 60–66 years) using an online sample provider. In addition to a number of survey questions unrelated to the present study, participants were asked about their reasons for trusting and distrusting researchers in general, using the exact same items that were included in the 2017 and 2018 Science Barometer (Wissenschaft im Dialog/Kantar Emnid, 2017, 2018). Participants responded to two items per dimension (expertise, benevolence, integrity) on scales with five response categories (ranging from "do not agree" to "fully agree"). Specifically, for a given dimension, one item addressed reasons to trust researchers and the second item addressed reasons to distrust researchers (see Table 1). All scales included a "don't know" option which was treated as missing data and dealt with using casewise deletion.
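The handling of the "don't know" option can be sketched as a plain casewise (listwise) deletion step. This is a hedged illustration only: the item names and response values below are hypothetical, not taken from the survey data.

```python
# Hedged sketch: "don't know" responses are treated as missing, and cases
# with any missing response are dropped (casewise deletion), as described
# for the pilot-study items. Item names and values are hypothetical.

DONT_KNOW = "don't know"

def casewise_delete(responses):
    """Keep only participants who answered every item on the 1-5 scale."""
    return [r for r in responses if all(v != DONT_KNOW for v in r.values())]

pilot = [
    {"trust_expertise": 5, "trust_benevolence": 4, "trust_integrity": 4},
    {"trust_expertise": 3, "trust_benevolence": DONT_KNOW, "trust_integrity": 2},
    {"trust_expertise": 4, "trust_benevolence": 5, "trust_integrity": 5},
]

complete_cases = casewise_delete(pilot)
print(len(complete_cases))  # 2
```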

To validate these items, we additionally administered the METI (Hendriks et al., 2015) to assess researchers' expertise, integrity, and benevolence using a 14-item adjective-based semantic differential with five response categories (e.g., "incompetent-competent" as an indicator of expertise). Furthermore, we included six reworded items based on the "reasons for trust" indicators described above. These items, as well as their introduction, had been stripped of all content pertaining to reasons for trust, thus directly asking participants about their epistemic trust in researchers in general (e.g., "Because researchers are experts in their field" was replaced by "Researchers are experts in their field," and the statement "There are several reasons for trusting researchers" was removed from the introduction). The order of the survey pages containing the original and reworded items was counterbalanced, and the METI was administered between the two variants.

To investigate these data, we first tested the factorial validity of the METI. As expected, the theoretically proposed three-dimensional confirmatory factor analysis model outperformed competing models and showed good fit indices with reference to classical benchmarks such as those proposed by Hu and Bentler (1999). Subsequently, we conducted a multitrait-multimethod analysis (Eid, 2000) to assess the construct validity of the Science Barometer items. Specifically, we fitted a correlated trait/uncorrelated methods model, inspected the fit indices, and tested whether the standardized loadings of the Science Barometer items on the corresponding dimension were substantial (> .30) and largely equivalent (absolute difference of standardized loadings < .15) to the loadings of the reworded items, using approximate adjusted fractional Bayes factors for structural equation modeling (Gu et al., 2019). The theoretically expected correlated traits/uncorrelated methods model showed good fit for the Science Barometer items measuring trust, χ²(147) = 424.68, Tucker–Lewis index (TLI) = 0.951, comparative fit index (CFI) = 0.962, root-mean-square error of approximation (RMSEA) = 0.061, standardized root-mean-square residual (SRMR) = 0.029, as well as for those measuring distrust, χ²(147) = 471.01, TLI = 0.938, CFI = 0.952, RMSEA = 0.066, SRMR = 0.040, which we view as strong evidence for the convergent validity of these items. This was corroborated by the Bayes factors (i.e., 10 of the 12 standardized trait factor loadings of the single items were substantial, with BFu > 4; the two small factor loadings were lack of expertise as a reason for distrust and the corresponding rewording), and all Science Barometer items loaded equivalently to their corresponding rewordings (BFu > 4).
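The two loading criteria used in this analysis can be sketched as plain threshold checks. Note that the study itself evaluated these criteria via approximate adjusted fractional Bayes factors rather than hard cutoffs, and the numeric loadings below are invented for illustration.

```python
# Hedged sketch of the two loading criteria: a standardized loading counts
# as substantial if it exceeds .30, and a Science Barometer item loads
# (largely) equivalently to its reworded counterpart if the absolute
# difference of the standardized loadings is below .15.
# The loadings used here are made-up examples, not study results.

def is_substantial(loading, threshold=0.30):
    """Substantial loading: |loading| > .30."""
    return abs(loading) > threshold

def is_equivalent(loading_a, loading_b, tolerance=0.15):
    """Largely equivalent loadings: |difference| < .15."""
    return abs(loading_a - loading_b) < tolerance

original, reworded = 0.62, 0.55  # hypothetical standardized loadings
print(is_substantial(original))           # True
print(is_equivalent(original, reworded))  # True
```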

Participants and Procedure

All study procedures (except the pilot study) were preregistered at PsychArchives (Rosman & Merk, 2020). Hypotheses were tested in a sample of German in-service school teachers from schools all over Germany. Participants were recruited by means of a professional opinion research service (forsa). All participants who agreed to participate in the study were then directed to an online survey where the study data were collected. In total, N = 414 in-service teachers completed the data collection (67% female; age: M = 47.7, SD = 10.8; teaching experience: M = 17.4 years, SD = 11.1; 26.1% primary schools, 29.5% grammar schools; 79% former West Germany).

As the items analyzed for the present study were part of a larger survey with experimental elements, the sample size was already determined by the requirements of these elements. As we planned to test our hypotheses using approximate adjusted fractional Bayes factors for informative hypotheses (Gu et al., 2018; Hoijtink, 2011), we ran a Bayes factor design analysis (Schönbrodt & Wagenmakers, 2018) for this fixed N to estimate the "probability of achieving a research goal" (i.e., statistical power; Kruschke, 2010, p. 658) of our analyses. We thereby specified the smallest effect size of interest as d = .30. As can be seen in the Bayes factor design analysis documentation in our preregistration (Rosman & Merk, 2020), the planned decision procedure, according to our simulations, very rarely leads to "inconclusive" or "wrong" (false positive or false negative) results (see the preregistration for more details; Rosman & Merk, 2020).

Design and Materials

Even though later parts of the survey drew on an experimental design, all data used for the present study were collected before assignment to any experimental groups (Rosman & Merk, 2020). Each participant thus received an identical set of materials. All materials were administered in German; the examples below have been translated.

At the beginning of the survey, some covariates were measured (mostly single items; i.e., interest in science, politics, and sports; self-reported scientific literacy; beliefs about science; general trust in science). These covariates were taken from the 2018 Science Barometer, and they were required for the investigation of other research questions (which differed from those of the present article). Subsequently, participants were asked about their reasons for trusting and distrusting researchers in general using the Science Barometer questions also employed in our pilot study (see Table 1). After responding to other covariates (interest in educational research; self-reported scientific literacy regarding educational research; beliefs about educational research; general trust in educational research), participants were asked about their reasons for trusting and distrusting educational researchers. These questions were again identical to the Science Barometer questions with the exception that they related to educational researchers rather than researchers in general. Specifically, we exchanged the term "researchers" with "educational researchers." Furthermore, a brief definition of "educational research" was provided to reduce the risk of bias by participants conceptualizing educational researchers in different ways ("Educational research focuses on the theory and practice of education and pedagogy. Subdisciplines are, among others, pedagogy, educational psychology, educational economics, and educational sociology"). Again, participants responded on 5-point Likert-type scales with an additional "don't know" option.

Data Analysis

Our preregistered confirmatory hypotheses were tested using Bayesian informative hypothesis evaluation (i.e., the so-called bain framework; Hoijtink, 2011) by means of the R package bain (Gu et al., 2019). Through the estimation of Bayes factors, this approach quantifies how much more likely the current data are to be observed under a specific hypothesis Hi compared with another hypothesis Hj (BF_ij = p(Data | H_i) / p(Data | H_j)). Furthermore, the bain framework allows researchers to specify hypotheses that contain equality and inequality constraints as well as order constraints. For example, in an analysis of variance (ANOVA) context, one may be interested in comparing the means of a baseline group (μ_base), a group that received an intervention A (μ_A), a group that received an intervention B (μ_B), and a group that received a combined intervention AB (μ_AB). However, the classical frequentist procedure only gains evidence against the null hypothesis H0: μ_base = μ_A = μ_B = μ_AB. In contrast, the bain framework is able to provide relative evidence for or against much more general and therefore more informative (Schnell et al., 2008) hypotheses such as H: 0 = μ_base < μ_A = μ_B < μ_AB or H: 0 = μ_base < μ_A < μ_B < μ_AB.
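One common way to think about Bayes factors for inequality-constrained hypotheses (though not the exact computation bain performs, which uses approximate adjusted fractional Bayes factors) is the fit/complexity ratio: the proportion of posterior draws satisfying the constraint divided by the proportion of prior draws satisfying it. The following Monte Carlo sketch illustrates this for a hypothesis of the form "one mean exceeds the other two"; all distributions and numbers are invented for illustration.

```python
# Hedged sketch (not the bain implementation): for an inequality-constrained
# hypothesis such as H1: mu_exp > (mu_int, mu_ben), the Bayes factor against
# the unconstrained hypothesis can be approximated as fit / complexity,
# i.e., the share of posterior draws satisfying the constraint divided by
# the corresponding share of prior draws.
import random

def proportion_satisfying(draws):
    """Share of (exp, int, ben) triples in which the expertise mean is largest."""
    return sum(1 for e, i, b in draws if e > i and e > b) / len(draws)

rng = random.Random(1)
n = 50_000

# Exchangeable "prior" draws: all orderings equally likely, so the share
# with mu_exp largest should be close to 1/3 (the complexity of H1).
prior = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
complexity = proportion_satisfying(prior)

# Hypothetical "posterior" draws centered so that expertise is highest (fit).
post = [(rng.gauss(4.0, 0.1), rng.gauss(3.5, 0.1), rng.gauss(3.4, 0.1))
        for _ in range(n)]
fit = proportion_satisfying(post)

bf_1u = fit / complexity  # BF of H1 versus the unconstrained hypothesis
print(round(complexity, 2))  # close to 0.33
print(bf_1u > 1)             # True: these (invented) data favor H1
```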

In the present study, we hypothesized, for example, that expertise-related reasons would be more strongly endorsed by teachers than benevolence-related and integrity-related reasons (H1; see The Present Study section), which can be written as H1: μ_exp > (μ_int, μ_ben). To gain robust relative evidence for or against this hypothesis, we used the following preregistered decision procedure (Rosman & Merk, 2020): First, we computed approximate adjusted fractional Bayes factors (Gu et al., 2019; Hoijtink et al., 2019) for a Bayesian repeated measures ANOVA regarding the comparison of H1 with the more restricted hypothesis H1R: μ_exp > μ_int = μ_ben. This hypothesis states that, on average, teachers score higher on expertise-related reasons than on benevolence- and integrity-related reasons and, furthermore, that the scores on benevolence- and integrity-related reasons are the same. Second, we computed Bayes factors for the comparison of H1 with the null hypothesis H0, which states that all means are equal (H0: μ_exp = μ_int = μ_ben). If the Bayes factors provided evidence for H1 in both cases, we finally computed a Bayes factor comparing H1 with its complement H1C. This complement includes any ordering of the means μ_exp, μ_int, μ_ben that does not satisfy H1 (and hence includes, for example, H0 and H1R, but also mean configurations such as μ_exp < μ_int < μ_ben). We intended to label our results as "evidence for H1 in comparison to H1R, H0 and H1C" if, and only if, all three of these Bayes factors provided evidence for H1—in all other cases, we intended to use the label "inconclusive."
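The conjunction at the heart of this preregistered decision procedure can be sketched as a small function. The threshold of 3 mirrors the convention, described in the text, of treating Bayes factors between 1/3 and 3 as rather inconclusive; the numeric Bayes factors in the example calls are hypothetical.

```python
# Hedged sketch of the decision procedure: a result is labeled as evidence
# for H1 only if all three Bayes factors (H1 vs. H1R, H1 vs. H0, and
# H1 vs. H1C) favor H1; every other configuration is labeled "inconclusive".

def decide(bf_h1_vs_h1r, bf_h1_vs_h0, bf_h1_vs_h1c, threshold=3.0):
    """Return the preregistered label for a triple of Bayes factors."""
    if all(bf > threshold for bf in (bf_h1_vs_h1r, bf_h1_vs_h0, bf_h1_vs_h1c)):
        return "evidence for H1 in comparison to H1R, H0 and H1C"
    return "inconclusive"

# Hypothetical Bayes factors, purely for illustration:
print(decide(5.2, 12.0, 8.1))  # evidence for H1 in comparison to H1R, H0 and H1C
print(decide(5.2, 0.8, 8.1))   # inconclusive
```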

This decision procedure, which is exemplarily described here for H1, was also used for H2, H3, H4, H5, and H6. Within each step of the decision procedure, we specified Bayes factors greater than 1/3 but smaller than 3 as rather inconclusive (although we generally tried to refrain from a "dichotomous" interpretation of the resulting Bayes factors).
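The three-step decision rule, with Bayes factors between 1/3 and 3 treated as inconclusive, can be summarized as a small function; the threshold and labels follow the text above, while the function itself is merely our illustration:

```python
def label_hypothesis(bf_vs_restricted, bf_vs_null, bf_vs_complement, threshold=3.0):
    """Apply the preregistered three-step decision rule for one hypothesis.

    Each argument is a Bayes factor comparing H1 with one rival (H1R, H0, H1C).
    H1 counts as supported only if every comparison clearly favors it; Bayes
    factors between 1/threshold and threshold are treated as inconclusive.
    """
    bayes_factors = (bf_vs_restricted, bf_vs_null, bf_vs_complement)
    if all(bf > threshold for bf in bayes_factors):
        return "evidence for H1 in comparison to H1R, H0 and H1C"
    return "inconclusive"

# Pattern reported for Hypothesis 2: all Bayes factors exceeded 50.
print(label_hypothesis(55.0, 120.0, 80.0))  # evidence for H1 ...
# Pattern reported for Hypothesis 1: the comparison against H1C did not favor H1.
print(label_hypothesis(12.0, 40.0, 0.8))    # inconclusive
```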

Technically, the bain framework follows a tradition set by O'Hagan (1995) and uses a fraction of the information in the data to set the variance of the (normal) prior distribution, which, in the context of ANOVA, is µg ~ N(µB, (1/bg) · σ̂²/Ng). This means that the prior distribution of the mean of group g (µg) is located at the boundary of the hypotheses under consideration (µB), and its variance is defined by the residual variance of the ANOVA (σ̂²), the size of the group (Ng), and the fraction of the information regarding µg in the data (bg, which enters the variance as 1/bg). Although the idea of using a fraction of the data as a minimal training sample for the specification of the prior distribution is well established in the literature (Hoijtink et al., 2019), deciding which fraction to use remains somewhat subjective. Many authors suggest bg = (J/G) · (1/Ng), whereby G denotes the number of groups and J the number of constraints of H0 (see Gu et al., 2018, for further elaborations). To ensure the robustness of our results, we repeated each analysis with 2bg and 3bg, a procedure called "sensitivity analysis," which is recommended by several checklists for Bayesian analyses (Depaoli & van de Schoot, 2017; van Doorn et al., 2019).
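Numerically, the default fraction and the resulting prior standard deviation behave as follows; the values of J, G, Ng, and the residual variance are illustrative assumptions, not quantities from the study:

```python
import math

def prior_sd(sigma2_resid, n_g, b_g):
    """SD of the fractional prior N(mu_B, (1/b_g) * sigma2_resid / N_g)."""
    return math.sqrt((1.0 / b_g) * sigma2_resid / n_g)

# Illustrative values: J constraints, G groups, N_g observations per group,
# and a residual variance of 0.8 (all assumptions for this sketch).
J, G, n_g, sigma2 = 2, 3, 138, 0.8
b_g = (J / G) * (1 / n_g)  # default minimal fraction (Gu et al., 2018)

# Sensitivity analysis: larger fractions give tighter (more informative) priors.
sds = {mult: prior_sd(sigma2, n_g, mult * b_g) for mult in (1, 2, 3)}
```

Because doubling or tripling bg only tightens the prior, stable Bayes factors across bg, 2bg, and 3bg indicate that the conclusions do not hinge on the somewhat subjective choice of fraction.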

Results

After carefully checking our data with regard to the exclusion criteria specified in our preregistration (i.e., major protocol deviations), we decided not to exclude any cases and proceeded with the data analysis using the full data set. Table 2 includes descriptive statistics on the study variables from the in-service teacher sample. As model assumptions were not heavily violated (see Table 2; Bosman, 2018; Tabachnick & Fidell, 2014; van Rossum et al., 2013), we used standard (instead of robust) estimators for the mean parameters and their covariance matrices.

Table 2 Descriptive Statistics of Trust and Distrust Items in the In-Service Teacher Data Set

Hypotheses 1 and 2

Hypothesis 1 predicts that teachers would score higher on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons when asked about their reasons for trusting educational researchers. Moreover, concerning distrust in educational researchers, Hypothesis 2 suggests that teachers would score lower on expertise-related reasons compared with benevolence-related reasons and integrity-related reasons. Figure 1 and Table 2 provide an overview of these results and not only indicate a confirmation of Hypothesis 2 but also reveal surprisingly similar mean values for expertise and integrity as reasons for trust in educational researchers (thereby contradicting H1). In line with this, the Bayes factors (regarding reasons for trust) favored H1: µexp > (µint, µben) over H0: µexp = µint = µben and H1R: µexp > µint = µben, but not over its complement H1C (see Rosman & Merk, 2021a, for the code and Rosman & Merk, 2021b, for the corresponding data and markdown file). According to our preregistered decision procedure, this is an inconclusive result. However, considering our rather conservative hypothesis formulation in the preregistration and the strong descriptive differences between expertise- and benevolence-related reasons, we decided to further investigate this inconclusive result and set up two new exploratory hypotheses using the bain framework: H1a (µexp > µint) and H1b (µexp > µben). These additional (thus not preregistered and to be interpreted with caution) analyses resulted in strong evidence for µexp = µint (against µexp > µint; BF = 30.2) and for µexp > µben (against µexp = µben; BF = 2,380). Hypothesis 2, in contrast, was clearly supported according to the preregistered decision procedure (all BFs > 50). All effect sizes can be viewed in Table 3.

Figure 1. Reasons for trust and distrust in educational researchers: Product plot of the raw data and means ± 1*SD.

Hypotheses 3 and 4

Regarding the reasons for trust and distrust, Hypotheses 3 and 4 predict the same patterns as Hypotheses 1 and 2, but focus on how teachers view researchers in general instead of considering the specific group of educational researchers. Descriptive results for Hypotheses 3 and 4 are depicted in Figure 2 and Table 2. Descriptively, all effects are in line with the hypotheses and are, according to Cohen's (1988) benchmarks, "small" to "large" in magnitude (see Table 3). All derived Bayes factors for the decision procedure specified above provided very strong evidence for the preregistered hypotheses (all BFs > 1,000; see Rosman & Merk, 2021b).

Figure 2. Reasons for trust and distrust in researchers in general: Product plot of the raw data and means ± 1*SD.

As the investigation of Hypothesis 1 remained inconclusive, we did not, as outlined in the preregistration, explore differences between reasons for trusting educational researchers and researchers in general. To nevertheless gain insight into this exploratory research question, and given that Hypotheses 2 and 4 were both supported, we subsequently conducted these exploratory analyses with regard to the distrust variables. To do so, we tested the following hypotheses against each other and their complements (whereby µ indicates the group mean, "sig" denotes "science/research in general," and "es" denotes "educational science/research"):

|µexp_sig − µint_sig| > |µexp_es − µint_es| & |µexp_sig − µben_sig| > |µexp_es − µben_es|
|µexp_sig − µint_sig| = |µexp_es − µint_es| & |µexp_sig − µben_sig| = |µexp_es − µben_es|
|µexp_sig − µint_sig| < |µexp_es − µint_es| & |µexp_sig − µben_sig| < |µexp_es − µben_es|

In this analysis, the first hypothesis was strongly favored by the Bayes factors (against the other two and against its complement; all BFs > 6,000). This leads to the conclusion that there is strong evidence for the hypothesis of a relatively stronger smart but evil stereotype regarding researchers in general.
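The inequality-constrained parts of such a comparison can again be read as posterior proportions: the share of posterior draws in which both absolute differences are larger for researchers in general corresponds to the "fit" of the first hypothesis (the equality-constrained second hypothesis additionally requires bain's density-based treatment and is omitted here). A sketch with purely illustrative posterior values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative posterior draws for the six distrust means; columns are
# exp_sig, int_sig, ben_sig, exp_es, int_es, ben_es (values are made up).
means = [2.0, 3.0, 3.1, 2.4, 3.0, 3.0]
draws = rng.normal(means, 0.05, size=(50_000, 6))

diff_int_sig = np.abs(draws[:, 0] - draws[:, 1])  # |mu_exp_sig - mu_int_sig|
diff_ben_sig = np.abs(draws[:, 0] - draws[:, 2])
diff_int_es = np.abs(draws[:, 3] - draws[:, 4])
diff_ben_es = np.abs(draws[:, 3] - draws[:, 5])

# Posterior support for the ">" and "<" composite hypotheses.
greater = np.mean((diff_int_sig > diff_int_es) & (diff_ben_sig > diff_ben_es))
smaller = np.mean((diff_int_sig < diff_int_es) & (diff_ben_sig < diff_ben_es))
```

With the assumed means, nearly all posterior mass satisfies the first conjunction and almost none the third, mirroring the direction of the reported result.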

Hypotheses 5 and 6

Hypotheses 5 and 6 focus on how teachers and an age- and SES-matched general population sample compare in their trust and distrust of researchers in general. These comparisons were made using data from the 2018 Science Barometer, a study conducted in August 2018 by means of computer-assisted telephone interviewing. According to Kantar Emnid, who conducted the data collection, the sample is representative for the German general population aged 14 years and older (Wissenschaft im Dialog/Kantar Emnid, 2018). In total, it comprises N = 1,008 participants (51% female; 11% 14–19 years; 10% 20–29 years; 14% 30–39 years; 16% 40–49 years; 18% 50–59 years; 13% 60–69 years; 18% 70 years+; 83% former West Germany).

Before testing Hypotheses 5 and 6, we matched participants of our study to the 2018 Science Barometer participants. We thereby used genetic matching (Diamond & Sekhon, 2013; Sekhon, 2011) with replacement to achieve a general population sample with comparable joint distributions concerning age, education, and interest in politics, sports, and science. As can be seen in Table 3, this matching procedure resulted in very similar samples concerning central tendency (maximum difference in means corresponds to an absolute Cohen's d of 0.02) and shape of the distributions (maximum of differences in empirical cumulative distribution functions = .09).
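Genetic matching itself (as implemented for R by Sekhon, 2011) tunes covariate-specific distance weights via an evolutionary search; a simplified nearest-neighbor-with-replacement sketch conveys the core mechanics and the balance check, with all covariate values being illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Purely illustrative covariates (age, education, interest in politics/
# sports/science) for a teacher sample and a general-population pool.
teachers = rng.normal([45.0, 3.0, 3.5, 2.0, 3.8], 1.0, size=(400, 5))
pool = rng.normal([47.0, 2.8, 3.2, 2.2, 3.5], 1.5, size=(1000, 5))

scale = np.concatenate([teachers, pool]).std(axis=0)
smd_before = (teachers.mean(axis=0) - pool.mean(axis=0)) / scale

# Nearest-neighbor matching with replacement on standardized covariates;
# genetic matching additionally optimizes per-covariate weights, which
# this sketch omits.
dists = np.linalg.norm(
    teachers[:, None, :] / scale - pool[None, :, :] / scale, axis=2
)
match_idx = dists.argmin(axis=1)                       # pool members may recur
weights = np.bincount(match_idx, minlength=len(pool))  # frequency weights

matched = pool[match_idx]
# Balance check: standardized mean difference per covariate after matching.
smd_after = (teachers.mean(axis=0) - matched.mean(axis=0)) / scale
```

The standardized mean differences after matching play the same diagnostic role as the absolute Cohen's d of 0.02 reported above; matching with replacement also produces the frequency weights used in the subsequent weighted analyses.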

With the matched data set, we conceptually carried out the same analyses as in Hypotheses 3 and 4, but using an estimation method that takes the sample weights (stemming from the matching with replacement) into account. The corresponding descriptive statistics are illustrated in Figure 3. Our analyses resulted in Bayes factors favoring the hypothesis of higher expertise-related compared with benevolence- and integrity-related reasons for trust (all BFs > 1,000), and lower expertise-related compared with benevolence- and integrity-related reasons for distrust (all BFs > 1,000). Hypotheses 5 and 6 are thus supported.
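Matching with replacement implies frequency weights: a general-population respondent matched to k teachers enters subsequent estimates k times. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical trust ratings from four matched general-population members and
# the frequency weights produced by matching with replacement (the third
# person was never selected as a match, so it drops out of the estimate).
ratings = np.array([4.0, 3.0, 5.0, 2.0])
weights = np.array([1, 3, 0, 2])

weighted_mean = np.average(ratings, weights=weights)  # (4 + 3*3 + 2*2) / 6
```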

Figure 3. Reasons for trust and distrust in researchers in general: Product plot of the raw data and means ± 1*SD (matched general population).

The exploratory part of these analyses (see above) was tested by contrasting the reasons for trusting researchers in general between the teacher sample and the matched sample. This was done by comparing Bayes factors for the following hypotheses (whereby µ indicates the group mean and "sig" indicates "scientists/researchers in general"):

(µexp_sig_teachers − µben_sig_teachers) > (µexp_sig_gen.pop. − µben_sig_gen.pop.) & (µexp_sig_teachers − µint_sig_teachers) > (µexp_sig_gen.pop. − µint_sig_gen.pop.)
(µexp_sig_teachers − µben_sig_teachers) = (µexp_sig_gen.pop. − µben_sig_gen.pop.) & (µexp_sig_teachers − µint_sig_teachers) = (µexp_sig_gen.pop. − µint_sig_gen.pop.)
(µexp_sig_teachers − µben_sig_teachers) < (µexp_sig_gen.pop. − µben_sig_gen.pop.) & (µexp_sig_teachers − µint_sig_teachers) < (µexp_sig_gen.pop. − µint_sig_gen.pop.)

As the comparison of Figures 2 and 3 already suggests, the Bayes factors favored the second hypothesis against the two others and its complement (all BFs > 60), which can be interpreted as relative evidence for the equality of the magnitude of the pattern between both groups.

Discussion

The present study aimed to investigate in-service teachers' reasons for epistemic trust in educational researchers as well as in researchers in general. Previous research has suggested that educational researchers are seen as competent and qualified, but also as having comparably less integrity and benevolence. In line with such results, we expected that our participants would more strongly refer to expertise-related reasons compared with explanations focusing on benevolence and integrity to justify their trust in (educational) researchers, and that the contrary would be true when justifying their distrust. Furthermore, we expected that this pattern of results would generalize to research in general and that it would also be present in the general population. We formulated three corresponding sets of preregistered hypotheses (Rosman & Merk, 2020) that were subsequently tested in a sample of 414 German in-service teachers.

Hypotheses 1 and 2

With regard to distrust in educational researchers, our expectations were fully supported: Teachers rated expertise-related reasons for distrust in educational researchers lower than benevolence- and integrity-related reasons. Benevolence and integrity thus seem to play a comparably stronger role in teachers' justifications for distrust in educational researchers. With regard to trust in educational researchers, however, our evidence was rather mixed. First, our confirmatory analyses yielded an inconclusive result, which is why the following interpretations are solely based on exploratory analyses. In these analyses, and in line with our expectations formulated in the first part of Hypothesis 1, participants rated expertise-related reasons for trust in educational researchers considerably higher than benevolence-related reasons. However, we also found evidence that participants rated integrity-related reasons as high as expertise-related reasons, which clearly contradicts Hypothesis 1. Several explanations for this unexpected finding can be offered. First, a correlation of r = .39 between the corresponding expertise and integrity items (which is somewhat higher than the correlation between benevolence and integrity; see Rosman & Merk, 2021b) indicates that there may be some conceptual overlap between the items assessing expertise and integrity. This is plausible because "working according to rules and standards" (i.e., the integrity item; see Table 1) requires knowledge of, and thus expertise in, such rules and standards in the first place. Therefore, participants may not have differentiated enough between the two items, leading to the almost identical ratings we observed. Furthermore, the notion of "rules and standards" may have been too abstract and ambiguous.
In fact, the items from the adjective-based METI have a much stronger affective component (e.g., "dishonest," "insincere," "unfair"; Hendriks et al., 2015) than the simple notion of a low adherence to rules and standards, and this might well explain the discrepancies between our results and the findings by Merk and Rosman (2019). Nonetheless, it should also be noted that we found rather high correlations between our items and the METI items in our pilot study.

Hypotheses 3 and 4

Our second set of hypotheses predicted that the denoted pattern would generalize to research in general. More specifically, we expected that in-service teachers would, when asked about their reasons for trust in researchers in general, rate expertise-related reasons higher than benevolence- and integrity-related reasons, and that the opposite pattern would come to light when asked about their reasons for distrust in researchers in general. All confirmatory hypotheses were supported; hence, the denoted pattern indeed seems to generalize to higher order domains and is not specific to educational researchers. Interestingly, these results are in line with the predictions of the TIDE framework (Muis et al., 2006), suggesting that its assumptions may be generalized from epistemic beliefs research to the field of epistemic trust. If we were to speculate, it could thus well be that a more general distrust in researchers (and not only distrust in educational researchers) is responsible for teachers' preference for practically derived knowledge sources over scientific evidence (e.g., Bråten & Ferguson, 2015; Zeuch & Souvignier, 2016). Furthermore, in an additional exploratory analysis, we investigated the magnitude of differences between the smart but evil pattern regarding educational researchers and researchers in general. This yielded strong evidence for a relatively stronger smart but evil stereotype regarding distrust in researchers in general compared with educational researchers. Hence, teachers do not appear to be specifically biased toward educational researchers; in other words, the smart but evil pattern might not be specific to educational research, but instead reflect a more general belief pattern that is even stronger in other contexts. However, it should also be pointed out that mean ratings of trust in educational researchers were somewhat lower compared with trust in researchers in general (see Table 2).

Hypotheses 5 and 6

In our third and final set of hypotheses, we investigated whether the general population's explanations for their trust and distrust in researchers also reflect the smart but evil pattern. We did so by comparing our data with general population data from the 2018 Science Barometer (Wissenschaft im Dialog/Kantar Emnid, 2018). To draw valid comparisons between both data sets, we matched the Science Barometer data to our sample with regard to age, education, and interest in politics, sports, and science. Again, our confirmatory hypotheses were fully supported—the pattern does not seem to be limited to the educational domain and/or to a sample of in-service teachers. In addition, our exploratory analyses suggested that the magnitude of the smart but evil pattern (regarding research in general) is practically identical for teachers and for members of the general population. Hence, teachers do not seem to form some kind of special group regarding their epistemic trust; on the contrary, their belief patterns regarding researchers are largely in line with those of the general population.

Strengths, Limitations, and Directions for Future Research

We first want to point out that even though we used a different theoretical focus, measurement approach, methodology, and sample, our results are largely consistent with Merk and Rosman's (2019) findings. Therefore, one might be tempted to interpret our findings as a confirmation of the smart but evil pattern (Merk & Rosman, 2019). However, we want to emphasize that our items did not measure trust and distrust per se, but instead focused on individual reasons for trusting or distrusting researchers (but note the high correlations between the original and reworded items in our pilot study). Furthermore, in the past few years, several population-representative opinion polls have suggested that there is a fair amount of trust in science and scientific practice in general (Pew Research Center, 2019; Wissenschaft im Dialog/Kantar Emnid, 2017). This is also reflected in our data: Even though our pattern of results on the three trust dimensions indicates that the integrity and benevolence of researchers are questioned more often than their expertise, overall mean scores on the respective items were rather high for trust and low for distrust.

On a methodological level, one might also criticize our use of single items. This is especially problematic considering the mixed results found for Hypothesis 1, which might, as outlined above, have been caused by the ambiguous wording of the trust-integrity item. Besides pragmatic reasons (i.e., questionnaire length), we opted for a single-item measurement of trust and distrust because this allows comparisons with the Science Barometer data (Wissenschaft im Dialog/Kantar Emnid, 2017). This latter issue also relates to the possibility that our results might not be caused by the smart but evil pattern, but by specific properties of certain items that might evoke more agreement independently of our participants' underlying perceptions. What speaks against this interpretation, however, is the fact that our expectations regarding both trust and distrust were supported, and these were assessed using a substantially different set of items (see Table 1). Furthermore, our pilot study provided evidence for the validity of our items, and it even suggested that they may be well suited to directly measure expertise, benevolence, and integrity—in contrast to solely assessing individual reasons for epistemic trust. In sum, these results reduce the possibility that our effects were caused by specific properties of the items. Nevertheless, future research should strive to replicate our findings using the same as well as other methodological approaches, such as interviews or vignette-based experiments. For example, one might design vignettes that portray researchers as more or less benevolent, competent, or high in integrity, and subsequently compare these regarding general trust ratings.

It should also be noted that our findings for Hypotheses 3 and 4 are limited to comparisons between science in general and educational research. Future research might investigate whether the smart but evil pattern varies across different types of domains (e.g., well-structured vs. ill-structured disciplines), a question that is receiving increasing attention in the field of epistemic belief research, too (e.g., Rosman et al., 2020; Rowley et al., 2008). With regard to future research on Hypotheses 5 and 6, one might investigate samples across different cultural or socioeconomic contexts, such as collectivist versus individualistic cultures. In fact, our results are limited to a German context, and it might well be that the smart but evil pattern varies across cultures, for example, due to differences in the susceptibility to in-group bias (which might lead, as outlined in the introduction, to a preference for teachers over researchers). In this regard, however, it should be noted that studies on differences in in-group bias across cultures have yielded inconsistent results in the past (Brewer & Chen, 2007), which speaks, to a certain extent, against culture as a central determinant of the smart but evil pattern.

Conclusions and Implications for Practice

Taken together, our findings support a central claim recently made by Könneker (2020): At least in the German population, distrust in researchers seems to be mainly based on distrust regarding researchers' intentions and integrity, whereas their perceived expertise may play a less important role. We found this pattern to be rather general and not specifically directed toward educational researchers. Furthermore, teachers' patterns were practically identical to those of the general population. We consider this generalizability across samples and domains good news for practice, as it suggests that tools and methods aimed at increasing trust in science in general may also tackle teachers' smart but evil patterns regarding educational researchers. On the level of science communication, we (and others, e.g., Könneker, 2018, 2020) see it as crucial that (educational) researchers communicate their findings in an authentic and intelligible manner, focusing not only on fellow researchers but also on other relevant target groups such as preservice and in-service teachers. The emerging trend to complement research articles with so-called plain language summaries (Kerwer et al., 2021; Shailes, 2017) is encouraging in this regard. As outlined by Könneker (2018), it might also help if researchers talked more openly about their motivations for doing research (e.g., curiosity). Furthermore, transparency and openness are often suggested as key to increasing trust in science (e.g., Vazire, 2017; Wingen et al., 2020). Following these ideas, we would like to emphasize that the adherence to open science standards in the present study not only increases the "hardness" of our evidence; we also sincerely believe that open science practices are suited to increase the value and trustworthiness of science itself in the long term.

Acknowledgements

The publication of this article was funded by the Open Access Fund of the Leibniz Association. We thank Lisa Trierweiler for carefully proofreading the manuscript for spelling and grammar.

Notes

1.
As a deviation from the preregistration plan, the labeling of the hypotheses has been changed to better fit the preregistered analysis procedure (Hypotheses 1 to 6 instead of H1a, H1b, etc.). Moreover, for reasons of consistency, we replaced the term "scientists" with "researchers" in all hypotheses.

2.
Please note that this study was conducted as a result of a reviewer's suggestion and thus after collecting and analyzing the data from our main study.

3.
For reasons of article length, we only present a general overview of the pilot study's results here; our full analyses can be found in the reproducible documentation of our analyses (Rosman & Merk, 2021a, 2021b).

References

Bachmann, R., Gillespie, N., Priem, R. (2015). Repairing trust in organizations and institutions: Toward a conceptual framework. Organization Studies, 36(9), 11231142. https://doi.org/10.1177/0170840615599334
Google Scholar
Bigley, G. A., Pearce, J. L. (1998). Straining for shared meaning in organization science: Problems of trust and distrust. Academy of Management Review, 23(3), 405421. https://doi.org/10.2307/259286
Google Scholar
Bosman, M. L. (2018). Robust Bayes factors for Bayesian ANOVA: Overcoming adverse effects of non-normality and outliers [Master thesis, Utrecht University]. https://dspace.library.uu.nl/handle/1874/370612
Google Scholar
Bråten, I., Ferguson, L. E. (2015). Beliefs about sources of knowledge predict motivation for learning in teacher education. Teaching and Teacher Education, 50, 1323. https://doi.org/10.1016/j.tate.2015.04.003
Google Scholar
Brewer, M. B., Chen, Y.‑R. (2007). Where (who) are collectives in collectivism? Toward conceptual clarification of individualism and collectivism. Psychological Review, 114(1), 133151. https://doi.org/10.1037/0033-295X.114.1.133
Google Scholar
Bromme, R., Thomm, E., Wolf, V. (2015). From understanding to deference: Laypersons' and medical students' views on conflicts within medicine. International Journal of Science Education, Part B, 5(1), 6891. https://doi.org/10.1080/21548455.2013.849017
Google Scholar
Cacioppo, J. T., Gardner, W. L., Berntson, G. G. (1997). Beyond bipolar conceptualizations and measures: The case of attitudes and evaluative space. Personality and Social Psychology Review, 1(1), 325. https://doi.org/10.1207/s15327957pspr0101_2
Google Scholar
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.
Google Scholar
Cramer, C. (2013). Beurteilung des bildungswissenschaftlichen Studiums durch Lehramtsstudierende in der ersten Ausbildungsphase im Längsschnitt [Students' evaluation of educational science courses in the first phase of teacher training: A longitudinal study]. Zeitschrift Für Pädagogik, 59(1), 6682. https://doi.org/10.3262/ZP1301066
Google Scholar
Critchley, C. R. (2008). Public opinion and trust in scientists: The role of the research context, and the perceived motivation of stem cell researchers. Public Understanding of Science, 17(3), 309327. https://doi.org/10.1177/0963662506070162
Google Scholar
Depaoli, S., van de Schoot, R. (2017). Improving transparency and replication in Bayesian statistics: The WAMBS-Checklist. Psychological Methods, 22(2), 240261. https://doi.org/10.1037/met0000065
Google Scholar
Diamond, A., Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932945. https://doi.org/10.1162/rest_a_00318
Google Scholar
Eid, M. (2000). A multitrait-multimethod model with minimal assumptions. Psychometrika, 65(2), 241261. https://doi.org/10.1007/BF02294377
Google Scholar
Fiske, S. T., Cuddy, A. J. C., Glick, P., Xu, J. (2002). A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology, 82(6), 878902. https://doi.org/10.1037//0022-3514.82.6.878
Google Scholar
Fiske, S. T., Dupree, C. (2014). Gaining trust as well as respect in communicating to motivated audiences about science topics. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 4), 1359313597. https://doi.org/10.1073/pnas.1317505111
Google Scholar
Gu, X., Hoijtink, H., Mulder, J., van Lissa, C. J. (2019). bain: Bayes factors for informative hypotheses. https://cran.r-project.org/web/packages/bain/index.html
Google Scholar
Gu, X., Mulder, J., Hoijtink, H. (2018). Approximated adjusted fractional Bayes factors: A general method for testing informative hypotheses. British Journal of Mathematical and Statistical Psychology, 71(2), 229261. https://doi.org/10.1111/bmsp.12110
Google Scholar
Hendriks, F., Kienhues, D., Bromme, R. (2015). Measuring laypeople's trust in experts in a digital age: The Muenster Epistemic Trustworthiness Inventory (METI). PLOS ONE, 10(10), Article e0139309. https://doi.org/10.1371/journal.pone.0139309
Google Scholar
Hendriks, F., Kienhues, D., Bromme, R. (2016). Evoking vigilance: Would you (dis)trust a scientist who discusses ethical implications of research in a science blog? Public Understanding of Science, 25(8), 9921008. https://doi.org/10.1177/0963662516646048
Google Scholar
Hendriks, F., Seifried, E., Menz, C. (2021). Unraveling the "smart but evil" stereotype: Pre-service teachers' evaluations of educational psychology researchers versus teachers as sources of information. Zeitschrift Für Pädagogische Psychologie, 35(2–3), 115. https://doi.org/10.1024/1010-0652/a000300
Google Scholar
Hofer, B. K., Pintrich, P. R. (1997). The development of epistemological theories: Beliefs about knowledge and knowing and their relation to learning. Review of Educational Research, 67(1), 88140. https://doi.org/10.3102/00346543067001088
Google Scholar
Hoijtink, H. (2011). Informative hypotheses: Theory and practice for behavioral and social scientists. CRC Press.
Google Scholar | Crossref
Hoijtink, H., Mulder, J., van Lissa, C., Gu, X. (2019). A tutorial on testing hypotheses using the Bayes factor. Psychological Methods, 24(5), 539556. https://doi.org/10.1037/met0000201
Google Scholar
Hu, L., Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 155. https://doi.org/10.1080/10705519909540118
Google Scholar
Hughes, J. N., West, S. G., Kim, H., Bauer, S. S. (2018). Effect of early grade retention on school completion: A prospective study. Journal of Educational Psychology, 110(7), 974991. https://doi.org/10.1037/edu0000243
Google Scholar
Jimmerson, S. R. (2001). Meta-analysis of grade retention research: Implications for practice in the 21st century. School Psychology Review, 30(3), 420437. https://doi.org/10.1080/02796015.2001.12086124
Google Scholar
Kerwer, M., Chasiotis, A., Stricker, J., Günther, A., Rosman, T. (2021). Straight from the scientist's mouth: Plain Language Summaries promote laypeople's comprehension and knowledge acquisition when reading about individual research findings in psychology. Collabra: Psychology, 7(1), 18898. https://doi.org/10.1525/collabra.18898
Google Scholar
Kirschner, P. A. (2017). Stop propagating the learning styles myth. Computers & Education, 106, 166171. https://doi.org/10.1016/j.compedu.2016.12.006
Google Scholar
Könneker, C. (2018). Vertrauen, Misstrauen, Social Media: Schlüsse aus dem Wissenschaftsbarometer 2018 [Trust, mistrust, social media: Conclusions from the 2018 Science Barometer]. https://www.wissenschaftskommunikation.de/vertrauen-misstrauen-social-media-schluesse-aus-dem-wissenschaftsbarometer-2018-19243
Google Scholar
Könneker, C. (2020). Wissenschaftskommunikation und Social Media: Neue Akteure, Polarisierung und Vertrauen [Science communication and social media: New actors, polarization, and trust]. In Schnurr, J., Mäder, A. (Eds.), Wissenschaft und Gesellschaft: Ein vertrauensvoller Dialog [Science and the public: A trustworthy dialogue] (pp. 2547). Springer. https://doi.org/10.1007/978-3-662-59466-7_3
Google Scholar
Kruschke, J. K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews. Cognitive Science, 1(5), 658676. https://doi.org/10.1002/wcs.72
Google Scholar
Landrum, A. R., Eaves, B. S., Shafto, P. (2015). Learning to trust and trusting to learn: A theoretical framework. Trends in Cognitive Sciences, 19(3), 109111. https://doi.org/10.1016/j.tics.2014.12.007
Google Scholar
Lewicki, R. J., McAllister, D. J., Bies, R. J. (1998). Trust and distrust: New relationships and realities. Academy of Management Review, 23(3), 438458. https://doi.org/10.5465/amr.1998.926620
Google Scholar
Mayer, R. C., Davis, J. H., Schoorman, F. D. (1995). An integrative model of organizational trust. Academy of Management Review, 20(3), 709734. https://doi.org/10.5465/amr.1995.9508080335
Google Scholar
Merk, S., Rosman, T. (2019). Smart but evil? Student-teachers' perception of educational researchers' epistemic trustworthiness. AERA Open, 5(3), 233285841986815. https://doi.org/10.1177/2332858419868158
Google Scholar
Merk, S., Rosman, T., Muis, K. R., Kelava, A., Bohl, T. (2018). Topic specific epistemic beliefs: Extending the theory of integrated domains in personal epistemology. Learning and Instruction, 56, 8497. https://doi.org/10.1016/j.learninstruc.2018.04.008
Google Scholar
Merk, S., Rosman, T., Rueß, J., Syring, M., Schneider, J. (2017). Pre-service teachers' perceived value of general pedagogical knowledge for practice: Relations with epistemic beliefs and source beliefs. PLOS ONE, 12(9), Article e0184971. https://doi.org/10.1371/journal.pone.0184971
Muis, K. R., Bendixen, L. D., Haerle, F. C. (2006). Domain-generality and domain-specificity in personal epistemology research: Philosophical and empirical reflections in the development of a theoretical framework. Educational Psychology Review, 18(1), 3–54. https://doi.org/10.1007/s10648-006-9003-6
Mullen, B., Brown, R., Smith, C. (1992). Ingroup bias as a function of salience, relevance, and status: An integration. European Journal of Social Psychology, 22(2), 103–122. https://doi.org/10.1002/ejsp.2420220202
O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 99–138. https://doi.org/10.1111/j.2517-6161.1995.tb02017.x
Ou, C. X., Sia, C. L. (2010). Consumer trust and distrust: An issue of website design. International Journal of Human-Computer Studies, 68(12), 913–934. https://doi.org/10.1016/j.ijhcs.2010.08.003
Parr, J. M., Timperley, H. S. (2008). Teachers, schools and using evidence: Considerations of preparedness. Assessment in Education: Principles, Policy & Practice, 15(1), 57–71. https://doi.org/10.1080/09695940701876151
Pashler, H., McDaniel, M., Rohrer, D., Bjork, R. (2008). Learning styles: Concepts and evidence. Psychological Science in the Public Interest, 9(3), 105–119. https://doi.org/10.1111/j.1539-6053.2009.01038.x
Pew Research Center. (2019). Trust and mistrust in Americans' views of scientific experts. https://www.pewresearch.org/science/wp-content/uploads/sites/16/2019/08/PS_08.02.19_trust.in_.scientists_FULLREPORT_8.5.19.pdf
Rosman, T., Merk, S. (2020). Preregistration: Teacher's reasons for trust and mistrust in scientific evidence: Reflecting a "smart but evil" stereotype? PsychArchives. https://doi.org/10.23668/psycharchives.2691
Rosman, T., Merk, S. (2021a). Code for: Teacher's reasons for trust and distrust in scientific evidence: Reflecting a "smart but evil" pattern? PsychArchives. https://doi.org/10.23668/psycharchives.4892
Rosman, T., Merk, S. (2021b). Data for: Teacher's reasons for trust and distrust in scientific evidence: Reflecting a "smart but evil" pattern? PsychArchives. https://doi.org/10.23668/psycharchives.4891
Rosman, T., Seifried, E., Merk, S. (2020). Combining intra- and interindividual approaches in epistemic beliefs research. Frontiers in Psychology, 11, 570. https://doi.org/10.3389/fpsyg.2020.00570
Rowley, M., Hartley, J., Betts, L., Robinson, E. J. (2008). What makes a research domain more "scientific"? Undergraduate judgements on biology and psychology. Psychology Learning & Teaching, 7(2), 16–25. https://doi.org/10.2304/plat.2008.7.2.16
Saunders, M., Dietz, G., Thornhill, A. (2014). Trust and distrust: Polar opposites, or independent but co-existing? Human Relations, 67(6), 639–665. https://doi.org/10.1177/0018726713500831
Saunders, M., Thornhill, A. (2004). Trust and mistrust in organizations: An exploration using an organizational justice framework. European Journal of Work and Organizational Psychology, 13(4), 493–515. https://doi.org/10.1080/13594320444000182
Schnell, R., Hill, P. B., Esser, E. (2008). Methoden der empirischen Sozialforschung [Methods of empirical social research] (8th ed.). Oldenbourg.
Schönbrodt, F. D., Wagenmakers, E.‑J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142. https://doi.org/10.3758/s13423-017-1230-y
Sekhon, J. S. (2011). Multivariate and propensity score matching software with automated balance optimization: The matching package for R. Journal of Statistical Software, 42(7). https://doi.org/10.18637/jss.v042.i07
Shailes, S. (2017). Plain-language summaries of research: Something for everyone. eLife, 6, Article e25411. https://doi.org/10.7554/eLife.25411
Sitkin, S. B., Roth, N. L. (1993). Explaining the limited effectiveness of legalistic "remedies" for trust/distrust. Organization Science, 4(3), 367–392. https://doi.org/10.1287/orsc.4.3.367
Sjølie, E. (2014). The role of theory in teacher education: Reconsidered from a student teacher perspective. Journal of Curriculum Studies, 46(6), 729–750. https://doi.org/10.1080/00220272.2013.871754
Southerland, S. A., Granger, E. M., Hughes, R., Enderle, P., Ke, F., Roseler, K., Saka, Y., Tekkumru-Kisa, M. (2016). Essential aspects of science teacher professional development. AERA Open, 2(4). https://doi.org/10.1177/2332858416674200
Tabachnick, B. G., Fidell, L. S. (2014). Using multivariate statistics (6th ed.). Pearson Education.
van Doorn, J., van den Bergh, D., Bohm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Raj, A., Sarafoglou, A., Stefan, A., Voelkel, J. G., Wagenmakers, E.‑J. (2019). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review. Advance online publication. https://doi.org/10.31234/osf.io/yqxfr
van Rossum, M., van de Schoot, R., Hoijtink, H. (2013). "Is the hypothesis correct" or "is it not." Methodology, 9(1), 13–22. https://doi.org/10.1027/1614-2241/a000050
Vargha, A., Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329
Vazire, S. (2017). Quality uncertainty erodes trust in science. Collabra: Psychology, 3(1), Article 1. https://doi.org/10.1525/collabra.74
Wingen, T., Berkessel, J. B., Englich, B. (2020). No replication, no trust? How low replicability influences trust in psychology. Social Psychological and Personality Science, 11(4), 454–463. https://doi.org/10.1177/1948550619877412
Wissenschaft im Dialog/Kantar Emnid. (2017). Detaillierte Ergebnisse des Wissenschaftsbarometers 2017 nach Subgruppen [Detailed results of the 2017 Science Barometer in different subgroups]. https://www.wissenschaft-im-dialog.de/fileadmin/user_upload/Projekte/Wissenschaftsbarometer/Dokumente_17/Wissenschaftsbarometer2017_Tabellenband.pdf
Wissenschaft im Dialog/Kantar Emnid. (2018). Detaillierte Ergebnisse des Wissenschaftsbarometers 2018 nach Subgruppen [Detailed results of the 2018 Science Barometer in different subgroups]. https://www.wissenschaft-im-dialog.de/fileadmin/user_upload/Projekte/Wissenschaftsbarometer/Dokumente_18/Downloads_allgemein/Tabellenband_Wissenschaftsbarometer2018_final.pdf
Zeuch, N., Souvignier, E. (2016). Wissenschaftliches Denken bei Lehramts- und Psychologiestudierenden [Scientific thinking in teacher education and psychology students]. In M. Krämer, S. Preiser, K. Brusdeylins (Eds.), Berichte aus der Psychologie. Psychologiedidaktik und Evaluation XI [Reports from Psychology: Psychology didactics and evaluation XI] (1st ed., pp. 175–183). Shaker.

TOM ROSMAN is a senior researcher and head of the research literacy department at ZPID. His research focuses on epistemic beliefs, psychological processes involved in dealing with conflicting information, and teacher education.

SAMUEL MERK is a junior professor for empirical classroom research and school education at PH Karlsruhe, Germany. His research focuses on school education, professional beliefs, and research methods and statistics.

Source: https://journals.sagepub.com/doi/abs/10.1177/23328584211028599
