Authoring Webpages/Preventing link rot
If you maintain a website, or link to other websites (for example from a blog or a wiki), you may suffer from link rot. Link rot is the process by which links on a website gradually become irrelevant or broken over time, as the sites they point to disappear, change their content, or redirect to new locations. It particularly affects free web hosts, such as GeoCities, where people often lose interest in maintaining their sites.
Detecting link rot for a given URL is difficult to fully automate. If a URL is accessed and returns an HTTP 200 (OK) response, it may be considered accessible, but the content of the page may have changed and may no longer be relevant. Some web servers also return a soft 404: a page served with a 200 (OK) status instead of a 404, even though the requested resource is no longer available. In the end, the only reliable way to test that a link is still valid is to follow it and check the content yourself.
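The checks above can be sketched with a small script using only the Python standard library. The soft-404 heuristic (scanning the body for common "not found" phrases) and the phrase list are illustrative assumptions, not a definitive test:

```python
# Minimal link-checker sketch. Classifies a URL as "ok", "soft-404",
# or "broken". The soft-404 phrase list below is an assumed heuristic.
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

SOFT_404_PHRASES = ("page not found", "no longer available", "does not exist")

def looks_like_soft_404(body: str) -> bool:
    """Heuristic: a 200 response whose text reads like an error page."""
    lowered = body.lower()
    return any(phrase in lowered for phrase in SOFT_404_PHRASES)

def check_link(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and classify it; network failures count as broken."""
    try:
        with urlopen(Request(url), timeout=timeout) as resp:
            if resp.status != 200:
                return "broken"
            # Read only the first few KB; enough for the heuristic.
            body = resp.read(4096).decode("utf-8", errors="replace")
            return "soft-404" if looks_like_soft_404(body) else "ok"
    except (HTTPError, URLError, OSError):
        return "broken"
```

Note that, as the text says, even an "ok" result only means the page responded; it cannot prove the content is still the content you originally linked to.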
A number of basic rules can help webmasters reduce link rot:
- Do not keep a hyperlink collection unless you are willing to look after it.
- Design your site so that hyperlinks are easy to maintain, for example by keeping them in a central hyperlink collection.
- Do not link to sub-pages ("deep linking") unless you are confident that they will remain stable.
- Use hyperlink checker software or a Content Management System (CMS) with link checking included.
- Use permalinks.
- Put a working e-mail address or other contact information on the same page as the links, with a specific request ("Found a bad link? Contact firstname.lastname@example.org and we'll fix it.").
- When changing domains, help others fix their link pages by spreading the information well ahead of the migration, and use HTTP status codes to communicate that a page has moved (e.g. "301: Moved Permanently").
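The last rule above can be sketched as a tiny redirect server in Python's standard library. The new domain (example.net), the port, and the one-to-one path mapping are hypothetical assumptions for illustration:

```python
# Sketch of a redirect server for a domain migration. Every request to
# the old domain is answered with 301 Moved Permanently, pointing at the
# same path on the (hypothetical) new domain.
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_BASE = "https://example.net"  # hypothetical new home of the site

def redirect_target(path: str, new_base: str = NEW_BASE) -> str:
    """Map an old-site path onto the new domain, unchanged."""
    return new_base + path

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 301 tells browsers and search engines the move is permanent,
        # so they can update bookmarks, link pages, and indexes.
        self.send_response(301)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

# To run: HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

In practice the same effect is usually achieved with the web server's own configuration (for example a rewrite rule), but the status code is the important part: a 301, unlike a temporary redirect, signals that others should update their links.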
When linking, avoid citing "unstable" Internet references. There are several approaches you can take to avoid introducing link rot:
- Avoid using URLs that point to resources on a personal site.
- Use Persistent Uniform Resource Locators (PURLs) and digital object identifiers (DOIs) whenever possible.
- Use WebCite to permanently archive and retrieve cited Internet references.
There are a number of tools that can be used to combat link rot by archiving web resources:
- WebCite, a tool specifically for scholarly authors, journal editors and publishers to permanently archive "on-demand" and retrieve cited Internet references.
- Archive-It, a subscription service, allows institutions to build, manage and search their own web archive.
- hanzo:web is a personal web archiving service created by Hanzo Archives. It can archive a single web resource, a cluster of web resources, or an entire website, as a one-off collection, a scheduled/repeated collection, an RSS/Atom feed collection, or on demand via Hanzo's open API.
- Spurl.net is a free on-line bookmarking service and search engine that allows users to save important web resources.
On Wikipedia and other wiki-based websites, only external links still present a maintenance problem. Wikipedia uses a clear color system for internal links, so users can see whether the target page exists before clicking on it. When referencing an old website or dated information, editors can link externally to pages in the Internet Archive, providing a reliable permanent link.
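An Internet Archive permalink of this kind can be constructed from a URL and a timestamp. A short sketch, assuming the Wayback Machine's public snapshot URL pattern (https://web.archive.org/web/&lt;timestamp&gt;/&lt;url&gt;, where the archive serves the snapshot nearest the given time):

```python
# Build a Wayback Machine permalink for a URL. The timestamp is
# YYYY[MMDDhhmmss]; shorter prefixes are allowed, and the archive
# picks the snapshot closest to the requested time.
def wayback_url(url: str, timestamp: str) -> str:
    """Return an Internet Archive permalink for url at timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"
```

For example, `wayback_url("http://www.geocities.com/", "20050101")` yields a link to the archived GeoCities front page as it appeared around the start of 2005, which remains stable even though the live site is gone.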
- Gunther Eysenbach and Mathieu Trudel (2005). "Going, going, still there: using the WebCite service to permanently archive cited web pages". Journal of Medical Internet Research 7(5). http://www.jmir.org/2005/5/e60/.
- Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins (2004). "Sic transit gloria telae: towards an understanding of the Web's decay". Proceedings of the 13th International Conference on World Wide Web, pp. 328–337. http://www2004.org/proceedings/docs/1p328.pdf.