Since scientific literature is now published and distributed mainly online, a number of initiatives have attempted to measure scientific impact from download data. Such data would allow scientific activity to be observed immediately after publication, rather than having to wait for citations to accumulate. Shepherd (1) and Bollen et al. (2) propose a Download Impact Factor as a journal metric: the mean download rate of the articles published in a journal, analogous to the citation-based Journal Impact Factor (JIF). COUNTER (3) defines as a standard a Journal Usage Factor that uses the median rather than the mean. Bollen et al. (2, 4) have demonstrated the feasibility of a variety of social network metrics calculated from download networks extracted from the click sequences recorded in download logs.
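To illustrate the difference between these two journal-level metrics, here is a minimal sketch in Python; the per-article download counts are invented, and this is not either metric's official implementation:

```python
# Minimal sketch of the two journal-level download metrics mentioned above:
# a mean-based Download Impact Factor and a median-based Usage Factor.
# The per-article download counts are invented, purely for illustration.
from statistics import mean, median

downloads_per_article = [12, 40, 7, 230, 18, 55, 9, 14]  # one value per article

download_impact_factor = mean(downloads_per_article)  # mean, as in (1, 2)
usage_factor = median(downloads_per_article)          # median, as in the COUNTER standard (3)

print(f"Download Impact Factor (mean): {download_impact_factor:.1f}")
print(f"Usage Factor (median):         {usage_factor:.1f}")
```

The median is far less sensitive to a handful of very heavily downloaded articles, which is the motivation COUNTER gives for preferring it over the mean.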

Bollen et al. (5) conducted a principal component analysis of the rankings of journals produced by 39 measures of academic impact calculated from both citation and download log data. Their results indicate that the notion of scientific impact is multi-dimensional and cannot be adequately measured by a single indicator, although some indicators may be more suitable than others. In particular, they observed greater statistical significance for the indicators based on downloads, possibly because of the much larger volume of download data that can be collected.
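By way of illustration of the technique only (not a reproduction of the analysis in (5)), such a principal component analysis over a journals-by-measures matrix might be sketched as follows; the data here are synthetic random scores, so the printed dimensionality says nothing about the cited findings:

```python
# Sketch of a PCA over a journals x measures matrix of impact indicators,
# in the spirit of (5). The scores are synthetic, so the output only
# illustrates the mechanics, not the results of the cited study.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_journals, n_measures = 500, 39
scores = rng.normal(size=(n_journals, n_measures))  # standardized indicator scores

pca = PCA().fit(scores)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components_90 = int(np.searchsorted(cumulative, 0.90)) + 1
print(f"Components needed to explain 90% of the variance: {n_components_90}")
```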

Although Kurtz et al. (6) show that the citation obsolescence function (7) and readership follow similar trajectories over time, Schloegl & Gorraiz (8, 9) find that downloads and citations have different patterns of obsolescence. While Darmoni et al. (10) and Bollen et al. (5) report that a journal's download frequency corresponds only weakly with its impact factor, Schloegl & Gorraiz (9) calculate a strong correlation at the journal level between citation and download frequency when absolute values are used, and a moderate to strong correlation between the number of downloads and the journal impact factor. In the same vein, Wan et al. (11) define a journal download immediacy index.
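Assuming the download immediacy index mirrors the citation-based immediacy index (see (11) for the authors' exact definition), it relates the downloads made during the publication year to the articles published that year. A rough sketch under that assumption, with invented figures:

```python
# Rough sketch of a journal download immediacy index, assuming it mirrors the
# citation-based immediacy index; see (11) for the exact definition.
# The figures below are invented.
downloads_in_pub_year = 8_400  # downloads, during year Y, of articles published in year Y
articles_published_in_y = 240  # articles the journal published in year Y

dii = downloads_in_pub_year / articles_published_in_y
print(f"Download immediacy index: {dii:.1f}")  # 35.0
```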

 

Downloads as a predictor of citation

In recent papers (12, 13), we used data from Scopus (citations) and ScienceDirect (downloads) to study the relationship between downloads and citations and the influence of publication language. To that end, we studied these parameters for the non-English-language journals in ScienceDirect, specifically those with more than 95% of their articles in French, German, or Spanish in the period 2003-2011. We also defined a control group of English-language journals in order to establish the differences from the non-English-language journals. For each non-English journal, we selected as control at least one English-language journal that was present in both databases, belonged to the same Specific Subject Area, and had a similar number of published articles. To look deeper into the phenomenon, we compared the geographical origins of the downloads and of the citations of the two groups. It must be noted that the sets of German- and Spanish-language journals are too small to draw any significant separate conclusions.
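A minimal sketch of this matching rule, with hypothetical journal records and field names (the actual selection procedure is described in (12, 13)), might look like:

```python
# Minimal sketch of the control-journal matching described above: for each
# non-English journal, choose an English-language journal present in both
# databases, in the same Specific Subject Area, with the closest number of
# published articles. The records and field names here are hypothetical.
candidates = [  # English-language journals present in both Scopus and ScienceDirect
    {"title": "English Journal A", "subject_area": "Medicine",   "n_articles": 410},
    {"title": "English Journal B", "subject_area": "Medicine",   "n_articles": 1200},
    {"title": "English Journal C", "subject_area": "Psychology", "n_articles": 380},
]

def pick_control(non_english_journal):
    same_area = [j for j in candidates
                 if j["subject_area"] == non_english_journal["subject_area"]]
    # prefer the candidate whose publication volume is closest to the target's
    return min(same_area,
               key=lambda j: abs(j["n_articles"] - non_english_journal["n_articles"]))

french_journal = {"title": "Revue X", "subject_area": "Medicine", "n_articles": 450}
print(pick_control(french_journal)["title"])  # -> English Journal A
```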

Scopus and ScienceDirect cover different numbers of papers: the latter includes all papers, while the former excludes Conference/Meeting Abstracts and Book Reviews. The divergence between the two is mainly due to the Conference/Meeting Abstracts. The time-obsolescence curves of citations and downloads also differ (see Figure 1): the former reflects the time it takes for a paper to start being cited, the latter the novelty effect that drives early downloads. The proportional difference between Reviews and the other document types is greater for downloads than for citations.

Figure 1 - Left panel: mean primary citations for Scopus document types, by age of the document in years. Right panel: mean downloads of the main document types corresponding to Scopus, by age in years after the online publication date. Data for "excellent" papers (solid lines) are compared with those for other papers (dashed lines).

The "excellent" papers (those belonging to the top 10% cited in the corresponding Specific Subject Area, document type, and year) (14) showed a great difference in mean downloads with respect to the non-excellent papers throughout the period. The percentage difference was greater both at the end of the period and for the document types of medium or low download levels.

The order of the Subject Areas by mean citations does not coincide with their order by mean downloads: Psychology, for example, was always behind Medicine in citations but always ahead of it in downloads. This may reflect different habits in different areas, with some areas seeming to read proportionally more than they cite.

There were positive and statistically significant correlations between downloads and citations by journal and by age in years for the entire set of journals, both English and non-English (0.77 on average), but these were greatly reduced both in value and in statistical significance in the case of the non-English language journals.
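A sketch of how such journal-level correlations by age can be computed, with hypothetical data and invented column names (the actual analysis in (12, 13) uses the full Scopus/ScienceDirect journal sets):

```python
# Sketch of journal-level download-citation correlations by document age,
# computed on hypothetical data; column names are invented.
import pandas as pd
from scipy.stats import pearsonr

journal_year = pd.DataFrame({
    "journal":   ["J1", "J2", "J3", "J4"] * 2,
    "age_years": [1] * 4 + [2] * 4,
    "downloads": [1200, 300, 800, 150, 900, 250, 700, 120],
    "citations": [15, 4, 9, 2, 30, 6, 20, 3],
})

for age, grp in journal_year.groupby("age_years"):
    r, p = pearsonr(grp["downloads"], grp["citations"])
    print(f"age {age} years: r = {r:.2f} (p = {p:.3f})")
```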

In the control journals, there seems to be a novelty effect at the beginning, with many downloads that do not result in citations. This may be why the correlations are weakest in the first year after publication. Interestingly, the strongest correlations are found in the seventh year after publication, which may correspond to researchers looking for a specific paper, probably having been directed to it by a citation.

The correlations at the level of individual papers are considerably weaker (0.42 on average) than those at the journal level, but markedly more significant statistically because of the far greater sample size. Nonetheless, the relative weakness of the correlation (around 55% of the journal-level correlations) may indicate that the number of downloads, besides being a function of the quality of the paper (reflected in its citations), largely depends on the diffusion of the journal and on the novelty effect itself. Thus, articles published in journals of wide circulation and diffusion, with high mean impact, receive many downloads, even though for some papers this does not lead to many citations; likewise, works published in journals of lower mean impact receive fewer downloads, regardless of whether some of those papers later receive many citations.

All this means that the potential usefulness of download data as a predictor of citations is limited, especially since the significance is lowest in the early years, precisely when a prediction would be most valuable. This limitation was even more marked in the case of the non-English-language journals.

 

Origin of downloads/citations and language

Figure 2 shows that the control journals are downloaded proportionally slightly less than they are cited by the most productive countries. The non-English journals studied, by contrast, are downloaded proportionally more than twice as much as they are cited. This may reflect that part of the citation impact of these non-English-language journals is invisible to Scopus, because the authors who download those papers cite them in articles published in journals that are not indexed in Scopus. For example, Belgium's percentage of downloads from the control journals is 42% lower than its percentage of citations to the same journals, while its percentage of downloads from the non-English journals is 242% higher than its percentage of citations to those journals.

Figure 2 - For the 27 countries with the greatest scientific production, the ratio of the percentage of downloads from the control journals to the percentage of citations to those journals (vertical axis), plotted against the ratio of the percentage of downloads from the French, German, and Spanish journals to the percentage of citations to those non-English journals (horizontal axis). The area of each circle is proportional to the country's total number of downloads.
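The ratios plotted in Figure 2 can be sketched as follows, with invented per-country figures: each country's share of the downloads divided by its share of the citations, computed separately for each group of journals.

```python
# Sketch of the download-share / citation-share ratios plotted in Figure 2.
# The per-country figures are invented; a ratio below 1 means the country
# downloads proportionally less than it cites for that group of journals.
country_stats = {
    # country: {group: (downloads, citations)}
    "Belgium": {"control": (5_000, 120),   "non_english": (2_000, 10)},
    "France":  {"control": (40_000, 900),  "non_english": (60_000, 400)},
    "Germany": {"control": (55_000, 1300), "non_english": (15_000, 90)},
}

def shares(group, idx):
    """Each country's fraction of the total downloads (idx=0) or citations (idx=1)."""
    total = sum(stats[group][idx] for stats in country_stats.values())
    return {c: stats[group][idx] / total for c, stats in country_stats.items()}

for group in ("control", "non_english"):
    dl_share, cit_share = shares(group, 0), shares(group, 1)
    for country in country_stats:
        ratio = dl_share[country] / cit_share[country]
        print(f"{group:12s} {country:8s} download share / citation share = {ratio:.2f}")
```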

In the most productive countries, there is an association between the number of citations to or downloads from the control journals and a proportional increase of their downloads relative to their citations. That is, users who frequently download or cite the control journals download them proportionally more than they cite them. This effect is not observed for the non-English-language journals studied.

In francophone regions, downloads from the control journals decrease proportionally more than citations to those journals. In the German- and Spanish-language cases, the equivalent results have little significance because of the very few journals involved, some of which were loaded into ScienceDirect retrospectively.

In sum, part of the citation impact of non-English-language journals appears to be invisible to Scopus, which makes their number of downloads proportionately greater than their citations. This also contributes to the lack of correlation between downloads and citations in these non-English journals, meaning that download data will be of little use for predicting the citation rates of these titles.

Acknowledgments

This work was supported by Elsevier as part of the Elsevier Bibliometric Research Program (EBRP), and financed by the Junta de Extremadura, Consejería de Empleo, Empresa e Innovación and by the Fondo Social Europeo as part of the research group grant GR10019.

 

References

(1) Shepherd, P.T. (2007) “The feasibility of developing and implementing journal usage factors: a research project sponsored by UKSG”, Serials: The Journal for the Serials Community, Vol. 20, No. 2, pp. 117-123.
(2) Bollen, J., Van de Sompel, H. & Rodriguez, M.A. (2008) “Towards usage-based impact metrics: First results from the MESUR project”. In Joint Conference on Digital Libraries (JCDL 2008), Pittsburgh, PA, June 2008.
(3) COUNTER (2014) “Usage Factor: a COUNTER standard”. Available at: http://www.projectcounter.org/documents/Draft_UF_R1.pdf.
(4) Bollen, J., Van de Sompel, H., Smith, J. & Luce, R. (2005) “Toward alternative metrics of journal impact: a comparison of download and citation data”, Information Processing and Management, Vol. 41, No. 6, pp. 1419-1440.
(5) Bollen, J., Van de Sompel, H., Hagberg, A. & Chute, R. (2009) “A principal component analysis of 39 scientific impact measures”, PLOS ONE, Vol. 4, No. 6: e6022. doi:10.1371/journal.pone.0006022.
(6) Kurtz, M.J., Eichhorn, G., Accomazzi, A., Grant, C.S., Demleitner, M. & Murray, S.S. (2005) “The bibliometric properties of article readership information”, Journal of the American Society for Information Science and Technology, Vol. 56, pp. 111-128.
(7) Egghe, L. & Rousseau, R. (2000) “Aging, obsolescence, impact, growth, and utilization: Definitions and relations”, Journal of the American Society for Information Science, Vol. 51, No. 11, pp. 1004–1017.
(8) Schloegl, C. & Gorraiz, J. (2010) “Comparison of citation and usage indicators: The case of oncology journals”, Scientometrics, Vol. 82, No. 3, pp. 567–580.
(9) Schloegl, C. & Gorraiz, J. (2011) “Global Usage Versus Global Citation Metrics: The Case of Pharmacology Journals”, Journal of the American Society for Information Science and Technology, Vol. 62, No. 1, pp. 161–170.
(10) Darmoni, S.J., Roussel, F., Benichou, J., Faure, G.C., Thirion, B. & Pinhas, N. (2000) “Reading factor as a credible alternative to impact factor: a preliminary study”, Technol. Health Care, Vol. 8, No. 3-4, pp. 174–175.
(11) Wan, J.-K., Hua, P.-H., Rousseau, R. & Sun, X.-K. (2010) “The journal download immediacy index (DII): Experiences using a Chinese full-text database”, Scientometrics, Vol. 82, No. 3, pp. 555–566.
(12) Guerrero-Bote, V.P. & Moya-Anegón, F. (2013) “Relationship between Downloads and Citation and the influence of language”. In: J. Gorraiz, E. Schiebel, C. Gumpenberger, M. Hörlesberger & H. Moed (Eds.), Proceedings of the 14th International Conference on Scientometrics and Informetrics—ISSI 2013 (pp. 1469–1484). Vienna: Austrian Institute of Technology.
(13) Guerrero-Bote, V.P. & Moya-Anegón, F. (2014) “Relationship between Downloads and Citations at Journal and Paper Levels, and the Influence of Language”, Scientometrics (in press).
(14) Bornmann, L., Moya-Anegón, F. & Leydesdorff, L. (2012) “The new excellence indicator in the world report of the SCImago institutions rankings 2011”, Journal of Informetrics, Vol. 6, No. 2, pp. 333-335.