Google Search performance compared for DSpace repositories
Discover Google Search Console (GSC) data for 25 DSpace repositories and contribute your own.
As illustrated in the recent webinar on Google Search Console (GSC), repository managers use GSC to gain insight into the performance of repository pages on Google. At the end of the webinar, an attendee asked what "normal" numbers are for DSpace repositories, so he could compare his own repository against this benchmark. Unable to address this question at the time, we set out to find an answer.
At the bottom of the article, you can find instructions for sending in your own data, if your repository is not yet part of the comparison.
A period of four months, January through April 2020, was chosen as the reference period for this comparison. An alternative reference period of one month was judged too short, because a positive spike or negative drop could skew the results. Extending the period further back into 2019 was judged too long: the further back you go, the higher the chance that past performance is no longer a good indicator of today's performance, or of what to expect in the future.
This is a substantial caveat in any case, especially since Google has now rolled out its "May 2020 Core Update" between May 4th and May 18th.
It is still too early to tell what effect this update has on DSpace repositories, but early reports from other industries seem to indicate very large changes.
So if you see your numbers go up or down in May 2020, this core update has to be taken into account as a potential factor.
Anonymised vs identified repositories
Repositories included in this comparison are listed either under their repository URL or under an anonymous number. We hope that more repositories will allow us to list their data with the URL. Identified listings potentially give insight into many more factors that can make comparisons more useful. Just to name one: the primary language of the repository content.
For the analysis below, it is important to know that Atmire does know which repositories are behind the anonymised numbers, so we could take language into account for those listings as well.
Data collected so far
The following table shows the collected data, on May 19th 2020.
You can also consult the data in a Google Spreadsheet shared at:
Clickthrough rate observations
Clickthrough rate (CTR) is the ratio of clicks to impressions: how many times does a link to your repository actually get clicked when it appears on a page shown to a user? A link gains a click when it is clicked, and it gains an impression when it appears on a page of search results that was shown to the user. If I search for something and go to the third page of search results, around 30 links will have gained an impression, as there are around 10 links per page.
Let's say your repository link was on the 5th position in search results, and it was clicked. In this case, the link gained both an impression and a click.
Imagine another scenario: your repository link was in the 25th position in search results, but the user never navigated to the search results page where the link appeared. Not only did the link fail to gain a click, it also did not gain an impression.
This is just to illustrate that CTR, impressions and average position are very much interconnected, and that the value of evaluating them individually is limited.
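To make the CTR definition above concrete: it is simply clicks divided by impressions. A minimal sketch in Python, using hypothetical numbers rather than data from any repository in this sample:

```python
# Hypothetical totals for one repository over the reference period.
# GSC reports clicks and impressions; CTR is derived from them.
clicks = 420
impressions = 18_000

ctr = clicks / impressions   # clickthrough rate as a fraction

print(f"CTR: {ctr:.2%}")     # prints: CTR: 2.33%
```

A repository with these numbers would fall inside the 1.5%–3.5% band that most repositories in this sample occupy.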
That said, 17 of the 25 repositories in the current sample, or 68%, have a clickthrough rate between 1.5% and 3.5%.
When comparing impressions and clicks, we set out to explore whether repositories with more items generally get more clicks. This hypothesis does not seem to be supported at all. When we divide the number of impressions in this four-month period by the repository item count, we get "impressions per item" figures in a very wide range, with the lowest below 10 and the highest over 1000.
This does seem to indicate that not all repository content is created equal, and that variations in language, metadata, presence of full text, ... are substantial.
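The "impressions per item" figure is just the impression total for the period divided by the repository's item count. A minimal sketch with hypothetical numbers, not taken from any repository in the sample:

```python
# Hypothetical figures for one repository over the four-month window.
impressions = 250_000   # total impressions reported by GSC
item_count = 12_000     # number of items in the repository

per_item = impressions / item_count

print(f"Impressions per item: {per_item:.1f}")  # prints: Impressions per item: 20.8
```

Comparing this ratio across repositories is what exposes the wide spread described above, independent of repository size.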
Probably the most counter-intuitive surprise, at least from our perspective, is the strong performance of content in languages other than English. The top performing repositories in this sample, across all metrics, all have substantial numbers of non-English items.
We had previously assumed that more English content would lead to more traffic, because it opens up your content to a potentially larger audience. On the contrary, this data seems to show that there is far more competition for English keywords, leading to a worse average position. That competition seems to weigh more heavily than the potential of the larger English-speaking audience.
An ideal position in search engine results is number 1, meaning that your link is at the top of the list of search results a user sees. So when it comes to average position, a lower number is better.
On the basis of this sample, repositories with non-English items seem to achieve better average positions than repositories whose contents are predominantly in English. Intuitively, one could also assume that a better average position automatically translates into more clicks, but the data doesn't allow us to draw firm conclusions here.
A mere 4 out of 25 repositories (16%) have an average position better (lower) than 20. All of these had a CTR of over 3%. How Google decides which position to give your link is not publicly disclosed, but is rumoured to involve many factors. We feel we don't have enough information to conclude whether a good position drives a good CTR, whether it is the other way around, or whether they are entirely independent variables.
13 out of 25 repositories (52%) have an average position between 28 and 35.
What can you do with these observations?
Right now, all data seems to indicate that non-English items should be embraced rather than avoided, and that more searchers might find this content useful than you may initially think.
If your average position is worse than 35 and/or your clickthrough rate is below 1.5%, technical improvements may help, assuming that your repository contents are not less relevant to searchers than content in other repositories.
If you are unsatisfied with the absolute volume of impressions and clicks, the first step is to make sure that your sitemap is registered in GSC and that all URLs in the sitemap are fully indexed.
How to contribute your data
To contribute your data, please send a screenshot of your "Performance" page in Google Search Console.
Make sure you have selected the time window of 2020-01-01 up to and including 2020-04-30.
Send this screenshot to firstname.lastname@example.org, together with a confirmation to include your data either under the repository URL, or under an anonymised number.
What happened to the March census?
Our earlier attempt at a comparison, the March census, failed because it was based on automated statistics report emails: one from Google Search Console and one from Google Analytics.
As far as we could see, the February 2020 email was the last email of this kind sent by Google Analytics. It is unknown at this point why these emails haven't been sent for March or April.
Atmire provides services for DSpace repositories, including analysis and troubleshooting for indexing issues and other challenges related to online exposure of your repository content. Contact us today.