The Integrated Impact Indicator (I3), the top-10% Excellence Indicator, and the use of non-parametric statistics
Competitions generate skewed distributions. For example, a few papers are highly cited, but the majority are rarely or never cited. The skewness in bibliometric distributions is reinforced by mechanisms which have variously been called “the Matthew effect” (1), “cumulative advantages” (2) and “preferential attachment” (3). These mechanisms describe the “rich get richer” phenomenon in science. Skewed distributions should not be studied in terms of central tendency statistics such as arithmetic means (4). Instead, one can use non-parametric statistics based on percentile classes, such as the top-1%, top-10%, etc.
In Figure 1, for example, the 2009 citation distributions of citable items published in 2007 and 2008 in two journals from the field of nanotechnology (Nano Letters and Nature Nanotechnology) are compared using a logarithmic scale. The 2009 Impact Factor (IF) of the latter journal is almost three times as high as that of the former because the IF is a two-year average. Using the number of publications in the previous two years (N) in the respective denominators erroneously suggests that Nano Letters had less impact than Nature Nanotechnology. If one instead considers the citation distributions in terms of six classes (top-1%, top-5%, etc.; Figure 2), Nano Letters outperforms Nature Nanotechnology in all classes.
Figure 2: Frequency distribution of six percentile rank classes of publications in Nano Letters and Nature Nanotechnology, with reference to the 58 journals of the WoS Subject Category “nanoscience & nanotechnology.” Source: (5).
These six classes have been used by the US National Science Board (e.g., 6) for the Science and Engineering Indicators for a decade. By attributing a weight of six to each paper in the first class (top-1%), five to each paper in the second class, etc., the stepwise function of six so-called “percentile-rank classes” (PR6) in Figure 2 can be integrated using the following formula: $I3 = \sum_{i} x_i \cdot f(x_i)$. In this formula, $x_i$ represents the percentile value (or class weight) and $f(x_i)$ the frequency of this rank. The index i runs over the classes: from 1 to 6 in the case above, or from 1 to 100 when using 100 equal classes such as top-1%, top-2%, etc.
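The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `i3` and the example frequencies are hypothetical, and the weights 1 to 6 follow the weighting of the six classes given in the text.

```python
def i3(frequencies):
    """Integrate a class distribution: sum of x * f(x) over all classes.

    `frequencies` maps each class weight x to the number of papers f(x)
    that fall into that class.
    """
    return sum(x * f for x, f in frequencies.items())


# Hypothetical publication set (illustrative numbers, not data from the article):
# weight 6 = top-1% class, weight 5 = second class, ..., weight 1 = lowest class.
example = {6: 2, 5: 5, 4: 8, 3: 20, 2: 30, 1: 35}

print(i3(example))
```

The same function covers the case of 100 classes if the dictionary keys run from 1 to 100.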
Measuring “integrated impact” with I3 and/or PR6
Under the influence of impact factors, scientometricians have confused impact with average impact: a research team as a group has more impact than one leading researcher, but the leading researcher him/herself can be expected to have more average impact, that is, citations per publication (c/p). Existing bibliometric indicators such as IF and SNIP are based on central tendency statistics, with the exception of the excellence indicator of the top-10% most highly cited papers, which is increasingly used in university rankings (7,8; cf. 9,10). An excellence indicator can be considered as the specification of two classes: excellent papers are counted as ones and the others as zeros.
Leydesdorff & Bornmann called this scheme of percentile-based indicators I3 as an abbreviation of “integrated impact indicator” (11). I3 is extremely flexible because one can sum across journals and/or across nations by changing the systems of reference. Unlike the arithmetic mean used as a parameter, the percentile-normalized citation ranks can be tested using non-parametric statistics such as chi-square or Kruskal-Wallis because an expectation can also be specified. In the case of one hundred percentile rank classes, 50 is the expectation, but because of the non-linearity involved this expectation is 1.91 for the six classes used above (12). Various tests allow for comparing the resulting proportions with the expectation in terms of their statistical significance (e.g., 7,13).
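The expectation of 1.91 can be checked with a short calculation. The sketch below assumes that the six classes are bounded at the top-1%, top-5%, top-10%, top-25%, top-50% and bottom-50% levels; these boundaries are not spelled out in the text, but they reproduce the quoted value. The names `PR6_SHARES`, `expected_weight` and `chi_square`, and the observed counts, are hypothetical and serve only as illustration. The chi-square statistic is computed by hand so that no external library is needed.

```python
# Expected share of papers in each PR6 class, keyed by class weight.
# Assumed boundaries: top-1%, 95-99%, 90-95%, 75-90%, 50-75%, bottom-50%.
PR6_SHARES = {6: 0.01, 5: 0.04, 4: 0.05, 3: 0.15, 2: 0.25, 1: 0.50}


def expected_weight(shares):
    """Expected class weight per paper: sum of weight * expected share."""
    return sum(weight * share for weight, share in shares.items())


def chi_square(observed, shares):
    """Pearson chi-square statistic of observed class counts vs. expectation."""
    n = sum(observed.values())
    return sum(
        (observed.get(weight, 0) - n * share) ** 2 / (n * share)
        for weight, share in shares.items()
    )


print(round(expected_weight(PR6_SHARES), 2))  # 1.91

# Hypothetical set of 1,000 papers (illustrative counts, not from the article).
observed = {6: 20, 5: 55, 4: 60, 3: 165, 2: 250, 1: 450}
print(chi_square(observed, PR6_SHARES))
```

With six classes the statistic has five degrees of freedom, so a value above 11.07 would indicate a deviation from expectation at the 5% level.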
The outcome of evaluations using non-parametric statistics can be very different from that of evaluations using averages. Figure 3, for example, shows citation profiles of two Principal Investigators (PIs) of the Academic Medical Center of the University of Amsterdam (using the journals in which these authors published as the reference sets). In this academic hospital, the averaged c/p ratios are used in a model to allocate funding, which raises the stakes for methods of assessing impact and prompts researchers to question the exactness of the evaluation (15). The average impact (c/p ratio) of PI1, for example, is 70.96, but it is only 24.28 for PI2; the PR6 values as a measure of integrated impact, however, show a reverse ranking: 65 and 122, respectively (14). This difference is statistically significant.
I3 quantifies the skewed citation curves by normalizing the documents first in terms of percentiles (or the continuous equivalent: quantiles). The scheme used for the evaluation can be considered as the specification of an aggregation rule for the binning and weighting of these citation impacts; for example as above, in terms of six percentile rank classes. However, policy makers may also wish to consider quartiles or the top-10% as in the case of an excellence indicator. Bornmann & Leydesdorff, for example, used top-10% rates for showing cities with research excellence as overlays to Google Maps using green circles for cities ranked statistically significantly above and red circles for ones below expectation (9).
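The idea of an aggregation rule can be illustrated with a small sketch in which the same percentile-normalized papers are summed under three different schemes. All function names and the sample percentiles are hypothetical. The PR6 class boundaries (99, 95, 90, 75, 50) and the treatment of a paper that sits exactly on a boundary are assumptions made for this illustration.

```python
def pr6_weight(percentile):
    """Weight 1-6 for a percentile in [0, 100], using assumed class boundaries."""
    for lower_bound, weight in ((99, 6), (95, 5), (90, 4), (75, 3), (50, 2)):
        if percentile >= lower_bound:
            return weight
    return 1  # bottom-50% class


def top10_weight(percentile):
    """Excellence indicator: 1 for papers in the top-10%, 0 otherwise."""
    return 1 if percentile >= 90 else 0


def quartile_weight(percentile):
    """Quartile class 1-4 (the 100th percentile is kept in class 4)."""
    return min(int(percentile // 25) + 1, 4)


def aggregate(percentiles, scheme):
    """Sum the weights that `scheme` assigns to each paper's percentile."""
    return sum(scheme(p) for p in percentiles)


# Hypothetical percentile ranks of six papers (not data from the article).
papers = [12, 39, 55, 78, 91, 99.5]

for scheme in (pr6_weight, top10_weight, quartile_weight):
    print(scheme.__name__, aggregate(papers, scheme))
```

Only the weighting function changes between schemes; the percentile normalization of the documents stays the same.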
Conclusions and implications
The use of quantiles and percentile rank classes improves impact measurement when compared with using averages. First, one appreciates the skewness of the distribution. Second, the confusion between impact and average impact can be resolved: averages over skewed distributions are not informative and the error can be large. Using I3 with 100 percentiles, a paper in the 39th percentile can be counted as half the value of one in the 78th percentile. Using PR6, alternatively, one would rate the latter with a 3 and the former with a 1. Third, the use of I3 thus allows for the choice of normative evaluation schemes such as the six percentile ranks used by the NSF or the excellence indicator of the top-10%. Fourth, institutional and document-based evaluations (such as journal evaluations) can be brought into an encompassing framework (5). Finally, these indicators are well suited for significance testing, so that one can also assess whether “excellent” can be distinguished from “good” research, and indicate error bars. Different publication and citation profiles (such as those of PI1 and PI2 in Figure 3) can thus be compared and the uncertainty specified.
Loet Leydesdorff* & Lutz Bornmann**
*Amsterdam School of Communication Research, University of Amsterdam, Kloveniersburgwal 48, NL-1012 CX, Amsterdam, The Netherlands; email@example.com
**Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, D-80539 Munich, Germany; firstname.lastname@example.org