A comparison of citations, downloads and readership data for an information systems journal
In the past, citations were the prime source for measuring scholarly impact. With the advent of altmetrics, it is possible to detect the use and consumption of scholarly publishing on a much broader basis (1). According to Plum Analytics, besides citations, metrics can be provided on the basis of usage, captures, mentions, and social media (2). In this contribution we will elaborate on the similarities and differences between one example from each of the first three metrics types mentioned above: citations from Scopus; downloads from ScienceDirect; and readership counts from Mendeley. As a use case, we chose the Information Systems journal Information and Management, including all issues from 2002 to 2011.
Information and Management is one of the leading Information Systems journals. It usually publishes eight issues per year and has a geographical focus on Anglo-American and South East Asian countries with regard to authorship and associate editors. From the nearly 600 research articles in the period of analysis, half were published by authors from the U.S. and approximately one third by authors from Taiwan, China, South Korea and Singapore.
Citations and downloads were provided by Elsevier in the framework of the Elsevier Bibliometric Research Program (EBRP) (3). For the publications of the analyzed Information Systems journal, all monthly downloads were made available from ScienceDirect (4) and all monthly citations from Scopus (5). Furthermore, we received the readership counts from Mendeley (6). Mendeley is a social reference management system that helps users organize their personal research libraries. The articles, provided by users around the world, are crowd-sourced into a single collection called the Mendeley research catalogue. This makes it possible to calculate the readership frequency of an article, which indicates how many Mendeley users have added it to their personal research library. At the time of writing, this catalogue contains more than 110 million unique articles, crowd-sourced from over 2.5 million users, making it an interesting source of data for large-scale network analysis.
Relation between citations, downloads and readership counts
Figure 1 shows the relationship between downloads, citations and readership frequencies for all full-length articles (7) published between 2002 and 2011. Data were provided in mid-2012 for citations and downloads and in October 2012 for readership data. As can be seen, articles that are downloaded more often are in general cited more frequently. Furthermore, the more frequently an article is found in Mendeley user libraries (number of readers), the more often it is usually downloaded and cited.
Figure 1 - Downloads vs. cites vs. readers (publication year: 2002-2011, doc type: full-length article)
This is also reflected in the Spearman rank correlations among these three indicators, which are 0.76 between citations and downloads, 0.66 between downloads and readership counts, and 0.59 between citations and readership counts. Similar correlations were computed for another Information Systems journal (Journal of Strategic Information Systems) (8). The fact that the correlation between these three indicators is strong but not perfect gives a first indication that they measure partly different aspects of scholarly communication. Therefore, we will look deeper into each measure. In a first step, we will investigate possible differences in obsolescence characteristics. Since Mendeley started only in 2009 and has seen strong growth in its user base since then, we will perform the obsolescence analysis only for citations and downloads.
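To make the rank correlation concrete, the following is a minimal sketch of how a Spearman coefficient can be computed from per-article counts. It is not the authors' code: the function names and the five-article sample values are hypothetical and chosen only for illustration, and tied values are assumed to share the mean of their rank positions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five articles (not data from the study).
downloads = [1200, 450, 3100, 800, 2300]
citations = [15, 3, 40, 12, 22]
readers = [30, 5, 25, 12, 18]

rho_downloads_citations = spearman(downloads, citations)
rho_downloads_readers = spearman(downloads, readers)
```

Because only the ordering of the articles enters the calculation, a few very heavily downloaded or cited papers do not dominate the coefficient, which suits the skewed distributions typical of such counts.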
Obsolescence characteristics of citations and downloads
Figure 2 shows the year-wise citations and the year-wise downloads (for privacy reasons, the download numbers are not specified) for an article (9) published in Information and Management in 2004. Since the article was put online in ScienceDirect on October 14th, 2003, it was already downloaded before the print publication year. Typically, the download numbers peak in the (print) publication year. In the following years, the download volume normally decreases slowly. However, a renewed increase is possible, for instance, due to the citation impact of an article. The general rise of downloads (users) in ScienceDirect might also have some effect. In contrast, citations are low in the year of publication and reach their maximum several years later.
Figure 2 - Year-wise downloads and citations for the article by Amoako-Gyampah and Salam (2004) (9)
To give a more general picture, we show the year-wise downloads for all full-length articles published in Information and Management from 2002 to 2011 in Table 1. For privacy reasons, we only give relational numbers. The download numbers are roughly one order of magnitude higher than the citation counts. As can be seen, the download maximum (formatted in bold) occurs in the (print) publication year for every volume except 2002. However, for older volumes (publication years: 2002 - 2005), downloads rise again in 2008 and 2009 after declining in the previous years. Table 2 displays the year-wise citations for the corresponding document types in Scopus (article, proceedings paper, and review) and confirms the pattern described above: citations are low in the publication year and peak several years later.
Table 1 - Year-wise relation of downloads per print publication year (2002-2011), document type: full-length article - FLA (n=581)
Table 2 - Year-wise citations per publication year (2002-2011), document types: article, review, conference paper (n=533, only cited documents)
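The contrast between the two obsolescence patterns can be illustrated with a short sketch. The article does not state how the relational numbers in Table 1 were derived, so scaling each series to its peak (peak = 100) is an assumption made here purely for illustration; the function names and all yearly values are hypothetical.

```python
def relative_profile(yearly_counts):
    """Scale year-wise counts so that the peak year equals 100 (assumed scheme)."""
    peak = max(yearly_counts.values())
    return {year: round(100 * count / peak) for year, count in yearly_counts.items()}


def peak_year(yearly_counts):
    """Return the year with the highest count."""
    return max(yearly_counts, key=yearly_counts.get)


# Hypothetical series for articles printed in 2004 (not data from the study).
downloads_2004 = {2003: 900, 2004: 5200, 2005: 3100, 2006: 2400, 2007: 2000, 2008: 2600}
citations_2004 = {2004: 1, 2005: 6, 2006: 14, 2007: 21, 2008: 19}

download_profile = relative_profile(downloads_2004)
download_peak = peak_year(downloads_2004)   # falls in the print publication year
citation_peak = peak_year(citations_2004)   # falls several years later
```

In this made-up series, downloads peak in the print publication year and rise again in 2008, while citations reach their maximum three years after publication, mirroring the qualitative behaviour described for Tables 1 and 2.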
User analysis of Mendeley readers
Mendeley enables its users to create and maintain user profiles that include, among other information, their professional status. This makes it possible to analyze the user structure of Mendeley “readers”. As can be seen in Figure 3, more than two thirds of the readers of the analyzed journal are students (most of them PhD and master students). Professors, associate professors and assistant professors, who might account for a considerably higher proportion of the Scopus publications, make up only 15% of Mendeley users. These results are in line with those found when investigating another Information Systems journal (10).
Figure 3 - Readership structure of the articles in Mendeley (2002-2011) (data extraction: October 2012)
In our analysis we identified a high (though not perfect) correlation between citations and downloads. The correlation was slightly lower between downloads and readership frequencies, and lower again between citations and readership counts. The high correlations are mainly due to the fact that the data sources used are all related either to research or at least to teaching in higher education institutions. In the research process, papers are downloaded (for instance, from ScienceDirect) and, more or less frequently, their bibliographic data are entered into a reference management system (for instance, Mendeley). Later on, the very same papers may be cited by an article which, when accepted in a journal covered by a citation index such as Scopus, will increase their citation impact. Though they are used in a similar “context”, the three data sources differ in several respects, among them their contents and their user populations.
The Mendeley catalogue with its 110 million unique documents is the largest data source among the three. It includes articles not only from journals (also from journals not included in Scopus) but also grey literature, proceedings articles and monographs. Since an article must be entered by at least one user in Mendeley, not all of the journal articles from Scopus are necessarily covered by Mendeley. In particular, coverage varies between disciplines (11). ScienceDirect is a full-text service, providing a subset of Scopus articles (see Figure 4). All three are owned by Reed Elsevier.
Figure 4 - Coverage of ScienceDirect, Scopus and Mendeley (size of the ovals does not represent the real relations in size among the data sources; the rectangle represents the articles from the analyzed journal Information and Management)
Since the analyzed journal was almost fully covered by the three data sources (more than 95 per cent of ScienceDirect’s full-length articles published between 2002 and 2011 were covered by Mendeley in October 2012), one of the strongest remaining factors influencing the relation between citations, downloads and readership frequencies might be their user structure (see Figure 5).
Figure 5 - Size of user communities of ScienceDirect (downloading users), Scopus (publishing and citing authors) and Mendeley (readers) (size of the ovals does not represent the real relations in size among the user numbers)
As reported above, two thirds of the Mendeley users are students. Unlike bachelor and master students (approximately 25 per cent of all Mendeley users), PhD and doctoral students are often also engaged in publication activities, in particular in the Natural Sciences. Nevertheless, senior researchers might have the highest publication output in Scopus. ScienceDirect might have the broadest user base, also covering users who are not actively involved in scholarly publishing (for instance, university teachers). Because the user structures differ, the motives for downloading, reading and citing articles will differ too. Therefore, a perfect relation between the three indicators cannot be expected.
This report is based in part on the analysis of anonymous ScienceDirect usage data and Scopus citation data provided by Elsevier within the framework of the Elsevier Bibliometric Research Program (EBRP). Readership data were provided by Mendeley. The authors would like to thank both Elsevier and Mendeley for their great support and the reviewers from Research Trends for their useful comments.