Internet Archive – only in the reading room?
The World Wide Web of the past decades is not only exciting for nostalgic trips to the early web and historical research. Archived websites are also playing an increasingly important role in legal proceedings. The most prominent way to take such a trip into the digital past is the “Wayback Machine” of the US Internet Archive. This is not a state-run archive, but a non-profit organization based in the state of California. The data sets of archived websites date back to 1995 and currently comprise over 946 billion web pages.
Internet Archive: available 24/7
In addition to its large number of archived web pages, the Wayback Machine is particularly impressive due to its unrestricted accessibility. You don’t even need to register as a user to browse the archived web. The Wayback Machine’s bots tirelessly crawl the web, performing what is known as web harvesting: they copy the websites they visit and use them to create mementos in WARC file format. WARC stands for “Web ARChive” and is the relevant file format for web archiving.
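To give a rough idea of what such a memento looks like on disk, here is a minimal sketch of a single WARC record built by hand. It is a simplified illustration, not how the Wayback Machine's crawlers actually write their files: the function name `build_warc_resource_record` and the example URL are hypothetical, and real archives typically store full HTTP responses with additional header fields.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "text/html") -> bytes:
    """Serialize one simplified WARC 'resource' record as bytes.

    Hypothetical helper for illustration only; production crawlers add
    more fields (digests, IP address, warcinfo references, ...).
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line first, then one "Name: value" line per header field.
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    # A blank line separates the header from the content block;
    # two CRLFs mark the end of the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


page = b"<html><body>Hello</body></html>"
record = build_warc_resource_record("http://example.org/", page)
print(record.decode("utf-8"))
```

A WARC file is essentially a sequence of such records, one after the other, which is why a single file can hold many captured pages and their metadata.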
Users can surf through these mementos by following the hyperlinks. In addition, functions such as a term search and navigation along a timeline make it easier to find your way around. This convenience probably makes it easier for users to overlook that some websites are inadequately archived; missing photos, for example, are a common problem. The Wayback Machine has long since become a popular part of digital cultural memory; it is always just a click away, 24 hours a day, seven days a week.
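The timeline navigation is also visible in the addresses of the mementos themselves: Wayback Machine URLs usually carry a 14-digit timestamp (year, month, day, hour, minute, second) followed by the original address. The following sketch builds and parses such addresses. The helper names `wayback_url` and `parse_wayback_url` are hypothetical, and the parser deliberately ignores variants with extra modifiers after the timestamp.

```python
from datetime import datetime

# Common public prefix of Wayback Machine memento addresses.
WAYBACK_PREFIX = "https://web.archive.org/web"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # 14 digits: YYYYMMDDhhmmss


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a memento address for a given original URL and point in time."""
    return f"{WAYBACK_PREFIX}/{when.strftime(STAMP_FORMAT)}/{original_url}"


def parse_wayback_url(memento_url: str) -> tuple[datetime, str]:
    """Split a plain memento address into capture time and original URL."""
    if not memento_url.startswith(WAYBACK_PREFIX + "/"):
        raise ValueError("not a Wayback Machine address")
    rest = memento_url[len(WAYBACK_PREFIX) + 1:]
    stamp, _, original = rest.partition("/")
    return datetime.strptime(stamp, STAMP_FORMAT), original


url = wayback_url("http://example.org/", datetime(1999, 12, 31, 23, 59, 0))
print(url)
print(parse_wayback_url(url))
```

If no capture exists for exactly the requested moment, the service generally answers with a capture close to it, so the timestamp in the address is best read as "around this time" rather than as a guarantee.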
In Germany, access is only available in the reading room
In Germany, the German National Library (DNB) was given the legal mandate to archive websites in 2006. By November 2023, its web archive had grown to 60,000 mementos from nearly 8,000 websites. However, only a very small portion of these are freely accessible on the web. Access to most of the holdings is only possible on site in the DNB reading rooms in Frankfurt am Main and Leipzig. In addition, since 2018, the amended copyright law has led to cooperation between memory institutions at the federal, state, and regional levels. In states such as Thuringia and Hamburg, these collaborations with the DNB are already well advanced.
For users, the advantage is that the reading rooms of the participating libraries offer full access to the DNB’s web archive. The more institutions cooperate with the DNB on web archiving, the greater this advantage becomes. Even so, the hurdle of visiting a reading room during normal opening hours puts the DNB’s web archive, and also the web archives of other public memory institutions, at a significant disadvantage compared to the Wayback Machine. Anyone who wants to use the web archives provided by public archives in Germany therefore needs a lot of patience and commitment.
Why does it work for the Internet Archive, but not in Germany?
The difference between the free accessibility of the Wayback Machine and the reading room requirement of German archives and libraries results from the respective copyright laws. As a US-based institution, the Internet Archive invokes the fair use principle for its Wayback Machine.
What is the fair use principle?
The fair use principle is a legal doctrine from some common law countries. It allows the use of copyrighted material for purposes of public education or intellectual production without authorization. Classic areas of application for fair use include commentary, criticism, parody, news reporting, research, and science. Fair use also applies to the cache functionality of search engines and to web harvesting of web archives.
However, fair use does not stop website owners from taking steps to prevent their own web content from being archived. Technical measures can be taken in advance to prevent harvesting, and owners can also protect their content retrospectively: anyone who owns a website that has been archived by the Internet Archive can request its deletion from the archive. Observers assume that the majority of conflicts between website owners and the Internet Archive are resolved through such subsequent interventions.
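One widely used technical measure of this kind is a robots.txt file, in which a site states which crawlers may fetch which paths. The article does not name a specific mechanism, so the following is only an illustrative sketch: the crawler name `example-archive-bot` and the function `may_harvest` are hypothetical, and whether a given archive actually honors robots.txt is a matter of that archive's policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named archive crawler is excluded
# from the whole site, every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: example-archive-bot
Disallow: /

User-agent: *
Allow: /
"""


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_harvest(ROBOTS_TXT, "example-archive-bot", "https://example.org/page.html"))
print(may_harvest(ROBOTS_TXT, "some-other-bot", "https://example.org/page.html"))
```

Such a file is a request rather than an access control: it only works if the crawler chooses to respect it, which is one reason retrospective deletion requests remain important.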
“On its premises” – the terminal barrier in copyright law
German memory institutions, on the other hand, have little legal leeway under copyright law for collecting, storing, and making web content accessible. According to Section 16 of the German Copyright Act (UrhG), “harvesting,” i.e., storing websites from the current web in the inventory of the respective archive, is a process of reproduction. The resulting legal requirements and restrictions remain significant even after the 2018 amendment to copyright law.
For example, the terminal barrier in Section 60e (4) UrhG is limited to making content accessible “on its premises,” i.e., in the library or archive. Access from elsewhere, for example via VPN or a password-protected account, is not provided for. The obligation to use the reading room is thus cemented.
Those who collect a lot also collect many risks
Even before archived websites are made accessible, the comprehensive collection of websites is itself legally challenging. In this so-called “blanket harvesting,” neither a specific obligation to deliver nor the consent of the website owner can be assumed. This inevitably means that copyright-protected material from the web ends up in the archive. In addition, the images and texts embedded in an original website may themselves violate the licensing or personality rights of third parties. If such a problematic website is then copied in the course of extensive harvesting, stored on an archive server, and later reproduced for archive users, this means potential legal problems for the memory institution concerned.
Accessibility as a topic for the future?
Perhaps it is still too early to fundamentally redefine the accessibility of web archives. Both historical science and the cultural heritage sector are still gradually approaching the web of days gone by. The demand for content from web archives is therefore likely to increase. This will also increase the pressure to address the problem.
Florent Thouvenin, holder of the Chair of Information and Communication Law at the University of Zurich, emphasized the need to give greater consideration to private and public web archives in copyright law during a 2017 discussion between “science and practice” on the revision of Swiss copyright law. In his view, this is relevant given the volatility of the World Wide Web and the dangers of “abuse of power” by global private-sector players such as Google. However, according to the conference minutes, the first priority is rapid web archiving, while accessibility “could also be resolved in the future under certain circumstances.”
This thesis, which is certainly not uncontroversial, may be read as a call to promote the discourse on the accessibility of web archives more actively than before, whether at the level of legal practice, legal systematics, or politics. The clock is ticking to ensure that yesterday’s web can still be used tomorrow.
Would you like to support iRights.info?
iRights.info provides information and explanations on the subject of “Copyright and creativity in the digital world”. All texts are published free of charge and openly licensed.
If you like, you can support us via the donation platform Betterplace and receive a donation receipt. Betterplace accepts PayPal, direct debit, credit card, paydirekt or bank transfer.
We would be particularly pleased to receive a regular contribution, for example as a monthly standing order. The non-profit organization iRights e.V. would like to thank you for your support!