How Wikipedia Works/Chapter 1
Chapter 1. What's in Wikipedia?
Wikipedia is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. Even if you only read the titles of Wikipedia articles, it would take you most of a month, without a break, to scan all of them. If you tried the same with Microsoft Encarta, or any traditional encyclopedia, you could be done in about a day, with time left over to eat, shower, and take yourself to bed. Reading the full content of Wikipedia would take you well over two years, if you read continuously—and then you would have to start over, as most of the pages would have changed in the meantime.
There are well over five million articles in Wikipedia. And the site is still growing at an enormous rate, so this total will doubtless be much higher when you read this than it is as we write it (see Figure 1.1, “Wikipedia's growth over time”). By early 2008, the English-language Wikipedia was estimated to consist of over 960,000,000 words, which is equivalent to over 1,700 copies of War and Peace (itself about 560,000 words long in a standard English translation). On average, another 20 to 40 million words were being added each month, which amounts to 35 to 70 more copies of War and Peace, or roughly one copy every 12 hours, all day, every day, continuously.
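The War and Peace comparison is simple division, and it can be checked with a short sketch. The word counts below are the ones quoted in the text; the 30-day month and the names used are assumptions made only for this illustration.

```python
# Figures quoted in the text (early 2008 estimates).
WIKIPEDIA_WORDS = 960_000_000                      # English-language Wikipedia
WAR_AND_PEACE_WORDS = 560_000                      # one standard English translation
MONTHLY_GROWTH_WORDS = (20_000_000, 40_000_000)    # low and high estimates

# Assumption for illustration: an average month of 30 days.
HOURS_PER_MONTH = 30 * 24


def copies(words: int) -> float:
    """Express a word count as a number of copies of War and Peace."""
    return words / WAR_AND_PEACE_WORDS


total_copies = copies(WIKIPEDIA_WORDS)
low, high = (copies(words) for words in MONTHLY_GROWTH_WORDS)

print(f"Total size: about {total_copies:,.0f} copies")
print(f"Monthly growth: about {low:.0f} to {high:.0f} copies")
print(f"One new copy every {HOURS_PER_MONTH / high:.0f} to {HOURS_PER_MONTH / low:.0f} hours")
```

Under these assumptions, monthly growth comes to about 36 to 71 copies, in line with the rounded 35 to 70 quoted above, and the interval between copies falls between roughly 10 and 20 hours, which brackets the 12-hour figure.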
Figure 1.1. Wikipedia's growth over time
This enormous growth has been occurring since Wikipedia began. Some more statistics show that the site has grown most rapidly since 2005, as Wikipedia's mainstream popularity took off:
- The site launched on January 15, 2001.
- It ballooned to 250,000 articles by April 2004, on the English-language site alone.
- It passed 500,000 English-language articles in March 2005.
- A year later, on March 1, 2006, the English-language Wikipedia surpassed the 1,000,000-article milestone.
- By late 2006, there were over 1.5 million English-language articles, with around 1,700 new articles being added each day.
- The article total surpassed 2,000,000 in September 2007.
- By August 2008, there were over 2,500,000 articles. At this point, articles were being created at a rate of 10,000 articles per week.
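The daily growth rates behind these milestones can be estimated from the dates. The following sketch uses only the milestones for which this chapter gives an exact date (the launch, the millionth article, and the September 9, 2007 date reported for the two millionth article). It assumes that the site started with zero articles, and the function name is invented for this illustration.

```python
from datetime import date

# Milestones with exact dates given in this chapter: (date, article count).
# Assumption: the article count at launch was zero.
LAUNCH = (date(2001, 1, 15), 0)
ONE_MILLION = (date(2006, 3, 1), 1_000_000)
TWO_MILLION = (date(2007, 9, 9), 2_000_000)


def average_daily_growth(start, end):
    """Return (days elapsed, average new articles per day) between two milestones."""
    (start_date, start_count), (end_date, end_count) = start, end
    days = (end_date - start_date).days
    return days, (end_count - start_count) / days


for label, start, end in [
    ("launch to 1,000,000", LAUNCH, ONE_MILLION),
    ("1,000,000 to 2,000,000", ONE_MILLION, TWO_MILLION),
]:
    days, per_day = average_daily_growth(start, end)
    print(f"{label}: {days} days, about {per_day:,.0f} articles per day")
```

The second million took 557 days, an average of nearly 1,800 new articles per day. That is more than three times the average rate over the first five years, and it is close to the figure of around 1,700 per day quoted for late 2006.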
During this same period, Wikipedias in other languages were also experiencing tremendous growth; see Chapter 15, 200 Languages and Counting for more on these projects.
Wikipedia has never had a target number of articles; any contribution is kept in the encyclopedia as long as it meets Wikipedia's standards. The average Wikipedia article is still quite short, say 500 words, but articles also tend to grow over time.
With well over five million articles in the English-language Wikipedia, topics include almost everything imaginable: from detailed explanations of basic science topics to equally detailed expositions of episodes of popular television shows. There are articles on railway locomotives, programming languages, people of all types, abstract concepts, and cities and towns all around the world. Finding out what's in Wikipedia is one of the great joys of exploring the site.
This first chapter will offer an introduction to the encyclopedia through the following approaches:
- Describing the content found in Wikipedia. (If you're overwhelmed by Wikipedia's labyrinthine setup, Chapter 3, Finding Wikipedia's Content will discuss good ways to navigate around the site and explain how to find content by searching and browsing.)
- Explaining the types of content the encyclopedia aims to include by outlining the criteria for topic inclusion, the style in which topics are covered, and other content policies. Once you understand something about the policies and guidelines that govern content, you can start to get a feel for Wikipedia's house style—the telling details that indicate whether an article has been worked on by good editors. (Chapter 4, Understanding and Evaluating an Article will explain in greater detail how to evaluate an article's quality.)
- Summarizing the parts of Wikipedia that do not consist of encyclopedia articles and explaining how to tell the difference between articles and other types of pages.
What Is an Article?
An article, in this context, is defined as a Wikipedia page that contains encyclopedic information. Technically, the article count only measures pages of content that are not dead ends (which means they contain at least one internal link leading to another Wikipedia article) and are not redirects (pages that automatically take you to another article). The article count also ignores a great variety of other types of pages that are not devoted to content (administrative, internal, image description, and community pages, all described in detail in Section 3.1, “Types of Non-article Pages”). Counting all these other pages brought the total Wikipedia page count to over 13,000,000 by mid-2008.
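The counting rule described under "What Is an Article?" has three conditions, and a short sketch may make them easier to see. The Page class, its field names, and the sample pages are hypothetical. They illustrate the rule as stated in the text and are not meant to show how the wiki software actually stores or counts pages.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical, simplified stand-in for a wiki page."""
    title: str
    is_content_page: bool = True       # False for administrative, image, or community pages
    is_redirect: bool = False
    internal_links: list[str] = field(default_factory=list)


def counts_as_article(page: Page) -> bool:
    """Apply the three conditions described in the text."""
    return (
        page.is_content_page
        and not page.is_redirect
        and len(page.internal_links) > 0   # dead ends are excluded
    )


# Placeholder pages, one for each case.
pages = [
    Page("Example article", internal_links=["Another article"]),
    Page("Example redirect", is_redirect=True),
    Page("Example dead end"),
    Page("Example community page", is_content_page=False, internal_links=["Example article"]),
]

article_count = sum(counts_as_article(page) for page in pages)
print(f"{article_count} of {len(pages)} pages count as articles")
```

Only the first sample page meets all three conditions, so it is the only one counted.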
The basic information in this chapter will provide the foundation for understanding how to edit Wikipedia, described in Part II, and how to participate in the site's community, described in Part III.
Wikipedia covers every topic found in general encyclopedias, specialist encyclopedias, and almanacs, along with many topics not covered in any of these traditional references. This is possible in part because Wikipedia is not constrained by the economics of traditional publishing; it does not need to pay writers or spend money on paper. (Wikipedia is instead constrained by the judgment of its volunteers: It does not accept just any article. Several inclusion policies are enforced.)
- Note: The ultimate purpose of Wikipedia's community is to create and improve articles and to distribute them freely.
- There has always been interest in Wikipedia's milestones, the moments at which the number of Wikipedia articles surpasses certain round numbers. Friendly betting pools developed around guessing the milestone date for a half-million and then a million articles. At this writing, the five million and ten million article betting pools are open for guessing the exact date when Wikipedia will reach these milestones. (The prize is widespread recognition of your remarkable guessing skills.) See Wikipedia:Pools.
- The actual millionth article, created on March 1, 2006, was Jordanhill (railway station), an article about a railway station in Scotland. Hundreds of people counted down on the IRC channel and the wiki to see which of a flurry of new articles would be the one millionth article. Many editors waited anxiously for the opportunity to post; over one hundred articles were contributed during the same second. There was even major media coverage of the event. The two millionth article was created on September 9, 2007. Amid some confusion, the article El Hormiguero, about a Spanish TV comedy, was identified as probably being the two millionth article.
Audience and Level
All articles should be clearly worded and accessible to a general readership, but Wikipedia also welcomes specialist articles that require a background in the topic to be fully understood. These articles should include context for the lay reader, however.
On rare occasions, two articles about a topic exist: an uncompromising article that provides a full picture and a more accessible "introduction" article for nonspecialists (for example, Introduction to entropy). See Category:Introductions.
Articles vary widely in length, detail, and comprehensiveness. Most of Wikipedia's articles begin their lives as stubs (very short summaries) and are gradually built into more comprehensive treatments by several editors. Stubs are incomplete—by definition, they lack something vital—but they are often useful and well written. Approximately 70 percent of Wikipedia articles are still classified as stubs.
The remaining 30 percent of articles (perhaps numbering over half a million) are more in-depth, comprehensive treatments of a subject. These may rival or go beyond the best work in traditional encyclopedias. A high-quality article includes numerous sources and references, pictures or diagrams, and a complete and clear explanation of the topic.
Types of Articles
Are you wondering how Wikipedia found enough topics to fill two million articles? Here are some (but by no means all) of the types of content that are included:
- Traditional encyclopedia topics
You can find all the types of content that you might expect from a general encyclopedia such as Encyclopaedia Britannica. Articles about science, historical events, geography, the arts, and literature are all included.
- People
No occupations or groups are restricted or emphasized, although in order to qualify for an article, the person must be notable, that is, well known within his or her major field of endeavor. Once this criterion is met, you may write an article about anyone: artists, musicians, scientists, historical figures, authors, athletes, politicians, monarchs, and on and on. (People are discouraged from writing about themselves, however.) The Wikipedia biography project (Wikipedia:WikiProject Biography) keeps track of biographical articles; by the end of 2007, there were nearly 400,000 articles listed as biographies, or nearly 20 percent of Wikipedia (see Figure 1.2).
- Places
There are articles not just on countries, provinces, and major geographical features but also about cities and towns worldwide. For instance, there is an article about every city or hamlet in the United States (approximately 40,000 are recognized by the US Census Bureau).
Rambot: Most of the 40,000 articles about American towns were not created by hand; instead, they were created automatically with freely available census data. (The automated user account that created the pages is affectionately called Rambot.) For some time after Rambot made its initial efforts in 2002 and 2003, some community members complained that these census-based articles made up too much of the total article count. Now, however, it's not an issue because local residents and others have improved nearly all of the bot's articles, and the increase in other content means these articles now comprise only about 2 percent of the site.
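The percentages quoted for biographies and for the census-based town articles can be cross-checked against one another. This sketch uses the figures given in the text and in the caption of Figure 1.2; the variable and function names are invented for the illustration.

```python
# Figures quoted in the text and in the caption of Figure 1.2 (August 2007).
ARTICLES_PER_PERCENT = 20_000      # "Twenty thousand articles represent 1 percent"
SHARE_PERCENT = {
    "living people": 10.8,
    "other biographies": 8.9,
    "places": 7.2,
}
US_TOWN_ARTICLES = 40_000          # census-based articles created by Rambot

total_articles = 100 * ARTICLES_PER_PERCENT


def articles_for(percent: float) -> int:
    """Convert a percentage share into an approximate article count."""
    return round(percent * ARTICLES_PER_PERCENT)


biography_percent = SHARE_PERCENT["living people"] + SHARE_PERCENT["other biographies"]
town_percent = 100 * US_TOWN_ARTICLES / total_articles

print(f"Implied total: {total_articles:,} articles")
print(f"Biographies: {biography_percent:.1f}% = about {articles_for(biography_percent):,} articles")
print(f"Places: {SHARE_PERCENT['places']:.1f}% = about {articles_for(SHARE_PERCENT['places']):,} articles")
print(f"US town articles: about {town_percent:.0f}% of the total")
```

The two biography categories together come to 19.7 percent, or about 394,000 articles, which agrees with the "nearly 400,000" and "nearly 20 percent" quoted for the biography project. The 40,000 town articles come to 2 percent of the implied total of two million.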
There is still plenty to do in these conventional topic areas, but they don't crowd out other topics. Wikipedia includes many nontraditional subjects as well, including the following:
- Fictional characters
Want to read up on the personal history of Frodo or Darth Vader? While articles about real people are certainly included on Wikipedia, articles about well-known fictional characters are included as well.
- Media—movies, books, albums, songs, television shows (and their episodes), videogames, and more
Work in almost any medium can be considered for its own article.
- Companies and organizations
There are factual articles about most well-known corporations. The field of technology is covered particularly well. For example, the articles about Microsoft and Apple, Inc., are both comprehensive; these two articles reference roughly 100 outside sources apiece. Companies can be included in Wikipedia if there is enough reliable information and independent reporting available to support a useful article (simple existence of the company is not enough to qualify, and promotional material is not welcome). As with biographies, writing about your own organization or company is discouraged.
Figure 1.2. A representation of content in Wikipedia from August 2007: 7.2 percent of articles are about places; 3.4 percent about albums and singles; 3.0 percent about tree-of-life zoology; 1.6 percent about films; 10.8 percent about living people; and 8.9 percent about other biographies. Disambiguation (dab) pages comprise 4.2 percent of Wikipedia. Twenty thousand articles represent 1 percent of Wikipedia. These numbers were compiled by Dutch Wikipedian Eugene van der Pijll.
- Computer software and hardware
Considering the way Wikipedia is authored, you might expect a few articles about computers, and you'd be right—there are thousands of articles about programming languages, software, hardware, and computer science theory.
- Transportation
Wikipedia has been a hit with transportation enthusiasts. There are thousands of articles about railway stations, canals, airports, and other minutiae of transport networks. For instance, the article I-35W Mississippi River bridge, about the interstate highway bridge in Minnesota that collapsed on August 1, 2007, was created well over a year before that event.
- Current events
Though the site does not support original reporting, Wikipedia is updated rapidly when major stories break. Current events coverage has had a major profile ever since the up-to-the-minute coverage of the 2004 Indian Ocean earthquake and related tsunami (this article alone had well over 1,000 edits in its first 48 hours). Finding out more about current events on the site is described in Chapter 3, Finding Wikipedia's Content.
Some pages are primarily navigational. These pages exist to point the way toward other Wikipedia pages. Three types of navigational pages are well worth noting:
- Lists
Linked lists are a defining feature of Wikipedia. Want to find a list of songs about Elvis Presley? No problem: it's at List of songs about or referencing Elvis Presley. Lists can be about nearly any topic, though like any other content, they should ideally be referenced. In fact, List of female tennis players was one of the earliest pages created on Wikipedia. Lists are browsable; start from List of topics to find lists of … well, nearly anything. (See Chapter 3, Finding Wikipedia's Content for some of our favorites.)
- Disambiguation pages
These pages include a whole list of links to possible articles that have similar names. For example, the Wikipedia page Orange links to articles on the color orange, the fruit, the Orange Bowl, the Dutch royal house of Orange, and numerous other pages (see Figure 1.3). Because it is not possible to anticipate which meaning you may be searching for when different topics share a name, these disambiguation pages pull together all the possible options. These pages are especially useful for biographical names: If in the course of some research, you come across a surname only, try the Wikipedia page for that name. It may quickly offer you a range of individuals to choose from.
Figure 1.3. The disambiguation page Orange
- Redirects
These pages simply push you from one page title to another automatically. You won't actually see these pages directly, but they are used extensively for alternate spellings, variations on names, and any other situation where confusion might exist over the precise article title. Redirects are not included in the official article count, but lists and disambiguation pages certainly are.
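The difference between a redirect and a disambiguation page can be pictured as two lookup tables. The sketch below is only a mental model: the table contents and titles are placeholders (the Orange entries paraphrase the example above), and it does not describe how the wiki software is actually implemented.

```python
# Hypothetical lookup tables; the titles are placeholders, not real wiki data.
REDIRECTS = {
    "Example alternate spelling": "Example article",
}
DISAMBIGUATION = {
    "Orange": ["Orange (color)", "Orange (fruit)", "Orange Bowl", "House of Orange"],
}


def look_up(title: str) -> tuple[str, list[str]]:
    """Return the title the reader lands on and any choices offered there.

    A redirect silently swaps the requested title for its target, while a
    disambiguation page is shown under its own title with a list of candidates.
    """
    landed_on = REDIRECTS.get(title, title)
    return landed_on, DISAMBIGUATION.get(landed_on, [])


print(look_up("Example alternate spelling"))
print(look_up("Orange"))
```

In the first lookup the reader never sees the redirect itself, only its target. In the second, the reader stays on the Orange page and chooses among the listed articles.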
Further Reading
- http://en.wikipedia.org/wiki/Special:Statistics The auto-generated statistics page that gives the current article count
- http://en.wikipedia.org/wiki/Wikipedia:Statistics A page with other statistics and interpretations
- http://meta.wikimedia.org/wiki/Milestones A list of historical milestones for the projects
- http://en.wikipedia.org/wiki/Wikipedia:What_is_an_article? An FAQ page that describes what an article is
- http://en.wikipedia.org/wiki/Wikipedia:Wikipedia's_oldest_articles A list of some of Wikipedia's oldest articles
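The article count shown on the statistics page can also be read by a program. The sketch below assumes the MediaWiki web API endpoint at https://en.wikipedia.org/w/api.php and its siteinfo statistics query, neither of which is mentioned in this chapter. To stay self-contained, it parses a hand-written sample response built from the 2008 figures quoted earlier instead of fetching live data.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard MediaWiki API endpoint for the English-language site.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_statistics_url() -> str:
    """Build a siteinfo query URL that asks for site-wide statistics."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def parse_counts(response_text: str) -> tuple[int, int]:
    """Extract (article count, total page count) from a siteinfo response."""
    stats = json.loads(response_text)["query"]["statistics"]
    return stats["articles"], stats["pages"]


# Illustrative response using the 2008 figures quoted in this chapter, not live data.
sample = '{"query": {"statistics": {"articles": 2500000, "pages": 13000000}}}'
articles, pages = parse_counts(sample)

print(build_statistics_url())
print(f"{articles:,} articles out of {pages:,} pages")
```

To query the live site, the URL from build_statistics_url() would be fetched with any HTTP client and the response body passed to parse_counts().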
Article and Content Inclusion Policies
When people find out that anyone is allowed to add content to Wikipedia, they often assume that any type of content can be added and in any fashion. But in reality, editing and writing on Wikipedia is constrained by a kaleidoscopic array of rules, or policies (these are discussed fully in Chapter 13, Policy and Your Input).
Like a traditional encyclopedia, Wikipedia doesn't accept just anything, although its inclusion policies are clearly much broader than those for most encyclopedias. Articles are only kept on Wikipedia if they meet specific criteria.
Wikipedia has tried to filter out unencyclopedic material by codifying and abiding by general content policies, rather than by creating a list of approved topics ahead of time. What can be added to the encyclopedia is not laid down in advance, but is decided according to some basic principles worked out in the early days.
Policies determine both the kinds of topics that are acceptable and the way in which those topics are treated. If properly applied, the policies are designed to result in a fair treatment, no matter how contentious the topic. If policies cannot be conformed to—for example, if there are no reliable sources about a topic—then an attempt to create a good Wikipedia article for that particular topic may fail. Whether someone likes or dislikes the topic itself, however, should not have any bearing on whether an article is included. In other words, the only limit on what appears in Wikipedia is whether an article can be written that complies with all of the content policies.
No one in particular has the job of deciding whether an article is suitable for Wikipedia. Rather, contributors submit new pages to the site directly, and they go live immediately without intermediaries. Other contributors then review these articles. Large numbers of new articles are deleted every day, but new content that conforms to the content policies is kept. (See Chapter 6, Good Writing and Research for how to start a new article and Chapter 7, Cleanup, Projects, and Processes for how articles are deleted.) A new article may also be edited quite savagely to make it more suitable for keeping. An editor who inserts content that falls outside the policies, or removes content that is within them, is not furthering the aims of the project.
Although there is generally broad agreement on these policies, they rely (as with all things on Wikipedia) on editors actually applying them. If you find content that seems to violate these guidelines, it often means that no one has gotten around to fixing it yet.
Core Policies: V, NOR, and NPOV
Three policies are so central to Wikipedia's workings that the encyclopedia would be unrecognizable (or nonexistent) without them. These core policies are Verifiability (V), No Original Research (NOR), and Neutral Point of View (NPOV). In broad strokes, they form the framework in which content is created and edited on a daily basis with no top-down editorial control.
From the outset, Wikipedia was committed to a Neutral Point of View (NPOV). This policy is similar to what journalists mean by objectivity in reporting.
As time went by, contributors became more determined to keep out guesswork and rumors, so Wikipedia needed a policy that promoted fact-checking. This principle is now formulated as verifiability from reliable sources.
With Wikipedia's growing popularity, there was also a basic need to prevent Wikipedia from being used as a soapbox to spread new ideas that someone had just thought up (euphemistically referred to as original research). The No Original Research (NOR) policy says that ideas and facts must be previously published elsewhere by a third party before they are documented in Wikipedia.
Policies Are Important
Most of Wikipedia's policies began as temporary solutions to disputes or other problems. Because they worked well and proved robust in so many contentious areas, they became universal across the encyclopedia. The practical application of these policies is open to some interpretation, but if a Wikipedia contributor has major disagreements with these policies even in theory, that contributor will probably not be happy on Wikipedia.
Policies vs. Guidelines
There is a distinction between a policy, which is mandatory, and a guideline, which is advisory. Guidelines are more complex rules that help to keep Wikipedia's quality high. The three core content policies are supported by a host of associated guidelines, which will be discussed as we go along. These guidelines include the concept of notability and various principles defining the boundaries of Wikipedia's coverage.
In outline, each of the major policies is apparently simple enough. The unpacking of their implications is another matter. Imagine, if you can, an article about a rock band that is neutral about drug abuse and explicit lyrics, that only reports published documentation on trashed hotel rooms and the influence of The Smashing Pumpkins, and that cites its references in footnotes as assiduously as any doctoral dissertation. You are coming close to the distinctive Wikipedia voice.
Understanding the Policies
Verifiability (Wikipedia:Verifiability, shortcut WP:V) means that you should always be able to verify that the content of a Wikipedia article is factual, using reliable outside sources that are cited within the article. The Verifiability policy exists to make Wikipedia more accurate. Misremembered facts, casual writing, and gossip should not be included in articles.
In a perfect article, any major statement of fact is attributable to a source outside of Wikipedia, no matter which editor (anonymous or not, expert in the field or not) added the information. References in Wikipedia are explicitly cited, which is different from many traditional encyclopedias. Those works are written by small groups of experts, but because Wikipedia is open to everyone who wants to contribute, even anonymously, it is correspondingly important to be sure that an article's statements can be confirmed by reliable outside sources.
If a topic has never been discussed by any reliable, third-party sources, the Verifiability policy dictates that Wikipedia should not have an article about that topic. Writing the article should be put off until better sources have been published outside Wikipedia. (A lack of published sources might also indicate that the topic is only of interest to a few people; see Section 2.3, “Other Guidelines”.)
In practice, being able to verify information from other sources is very useful, even on apparently minor points. And when an article provides a list of sources, it becomes a convenient jumping-off point for further research.
Aside from benefiting readers, the Verifiability policy also simplifies things for Wikipedia editors by giving them a clear question to ask when evaluating an article's quality: Is this statement reflected in outside sources?
Though Verifiability is a core policy, it has yet to be fully implemented, and thousands of articles are tagged as being unreferenced (see Figure 1.4). Verifiability is applied as a general principle. In practice, the ability of editors to verify a statement may depend on, for example, having access to a good library (a major concern in many developing countries). A fact should only be included if checking its accuracy is at least possible in theory; for important true statements, sources can almost always be found with time.
Figure 1.4. This is the template message for articles that don't cite any sources, which is a key part of complying with the Verifiability policy. These messages are meant to warn readers and alert editors that the article is unfinished.
You will certainly see unreferenced content on Wikipedia. Some of this content remains unsourced simply because sourcing is hard work, and Wikipedia is a work in progress. But some content clearly violates the idea of verifiability (for example, anything that is contentious and badly referenced or that really couldn't be referenced, such as things said in a private conversation). This material may be challenged and ultimately removed. (For more discussion on referencing style and sourcing, see Chapter 6, Good Writing and Research.)
No Original Research (Wikipedia:No original research, shortcut WP:NOR) means that all concepts and theories in Wikipedia articles should be based on previously published accounts and ideas. Wikipedia articles shouldn't contain original ideas, conclusions, descriptions, or interpretations of facts. Nor should they contain editors' personal views, political opinions, or any unpublished analysis of published material.
If you have something innovative to say, Wikipedia is not the right place to present it to the public. In other words, if you have performed an experiment, thought of a philosophical argument, or developed a mathematical proof—good for you! But this content doesn't belong in the encyclopedia unless your work has already been published somewhere else (ideally in a peer-reviewed and scholarly source).
Inevitably, there is much debate within the project about what exactly a reliable source is; this debate has gradually produced a guideline called Reliable Sources (which clarifies the Verifiability policy). It lists a wide variety of possible types of sources and naturally includes traditional scholarly books and articles. Certain websites do qualify, but self-published sources such as blogs usually do not. While source criticism (the picking of holes in the reputation of sources) should mostly be left to experts in a particular area, the meaning of the guideline is evident enough: Wikipedia aims to produce accurate, serious reference material, and the sources upon which it bases its facts must, therefore, be as reputable as possible. See Wikipedia:Reliable sources (shortcut WP:RS).
The initial motivation for the No Original Research policy was to prevent people with unconventional personal theories from using Wikipedia to draw attention to their ideas. These days, No Original Research is consistently used against the inclusion of material that is in no sense crackpot but is simply too novel for Wikipedia. Articles may also be tagged as possibly containing original research if it is suspected that material in them comes from an editor's personal experience, rather than verifiable sources (see Figure 1.5, “Article template message indicating concerns over violations of the No Original Research policy”).
Figure 1.5. Article template message indicating concerns over violations of the No Original Research policy
NOR also means that editors should not be tempted to provide historical interpretations or draw conclusions, even if they seem self-evident, without citing supporting outside sources giving the same interpretations. One consequence is that historical articles tend not to end with overall summary assessments of people or events. Conclusions from historians can be cited, but if two historians disagree, there should be no authorial attempt to reconcile the views; both sides should be given and the readers left to draw their own conclusions. Some pattern may exist in the facts, but it is not for Wikipedia to break this to the world. If someone else points it out, it can be mentioned and attributed.
Verifiability, Reliable Sources, and No Original Research clearly have something in common. In Wikipedia, both facts and opinions must be based on and referenced to outside information and ideas that have already been published. There is ongoing discussion on whether these principles can be summarized together under the idea of attribution.
Neutral Point of View (Wikipedia:Neutral point of view, shortcut WP:NPOV) means that all points of view about a particular topic should be fairly represented. NPOV is one of the oldest, most respected, and most central policies on Wikipedia. A neutral article makes no case and concentrates on informing the reader by providing a good survey of its topic. It is fair-minded and accurate and deals with controversial matters by reporting the main points where there is disagreement.
From the reader's perspective, the effect of neutrality should be this: An article on a contentious topic, such as a historical event that is seen differently by various groups, should not reveal where the article author stands on the matter. In almost all cases, such an article will have been worked over by a group of editors, and their opinions should not come through. Although the example of a rock band was given previously, there are more serious topics where a neutral point of view is not easy to maintain. Consider a neutral treatment of slavery, communism, the history of Ireland, or abortion. Each of these has to be treated on a scrupulous basis, with proper weight given to all sides of the story. The discussion of rival opinions should be in a tone containing no sympathy or bias, regardless of the topic.
Neutral articles should also be comprehensive, though they don't have to be all-inclusive. All significant views should be provided or outlined, however. The reasons why a particular view is popular should be given in fair summary, but the overall expression in an article should not be slanted. NPOV doesn't mean that minority views must be written about with equal coverage to majority views, particularly when there is a wide disparity in their acceptance; points of view should be written up proportionately. Small minority views, such as "the Earth is flat," can be treated briefly, or in some cases omitted as being below Wikipedia's natural threshold of attention. There is no doctrine of equal time. In fact, to give all views equal coverage regardless of their outside acceptance is in itself an act of editorializing. The same goes for what facts or incidents are emphasized in an article; a scandal, rumor, or conspiracy theory may be included (if properly sourced), but shouldn't be given unwarranted headline status. Wikipedia is not tabloid journalism.
Using a neutral point of view, all sorts of controversies can be handled. An article should never directly include opinion within the text: "Coke is much better than Pepsi" is the wrong approach. Rather, the statement should be neutral, indirect, accurate, and specific. For example, it is acceptable to write "according to a 2006 Taste Tester's poll published in Taste Testers Monthly, 52 percent of taste testers found Coke to be better than Pepsi," with a full citation to the article being referred to. (This is a fabricated quote, by the way. See New Coke for some real quotes.) Of course, neutrality also rules out all sorts of propaganda tricks based on selective quotation.
NPOV also comes to the rescue where sources differ on the facts. Editors are often faced with contradictions in the historical record or factual matters; for example, whether person X was a nephew or a son of person Y. Both claims can be included. According to Verifiability and Neutral Point of View, this disputed factual point should appear as "Source A says X was the nephew of Y, whereas B says X was the son of Y," with references. According to the No Original Research policy, the matter should be left there, and if source C publishes some new evidence, this should then be added. Wikipedia is not a court in which verdicts are reached, and editors should not attempt to figure out the "right" answer themselves; an article may simply present the evidence, fairly and at adequate length, for the reader to consider.
Following NPOV means that advertisements, press releases, and other promotional materials aren't welcome on Wikipedia because these are inherently non-neutral. This may sound fairly obvious, but it affects the community's acceptance of other sources as well. For example, text from promotional websites for companies or schools, which are often used for sources, is often non-neutral and should be considered carefully before being cited.
In addition to making advertising unacceptable, NPOV is also a prime reason why editors are strongly discouraged from working on articles about themselves or their organizations. Except for basic factual corrections, it really is difficult to be neutral about yourself. (Also remember that any statement in an article, even if it's about a subject you know as intimately as your own life, needs to be backed up with a citation to an outside source because of Verifiability and No Original Research. Wikipedia should never be used for promotion.)
Some violations of the NPOV policy have been high profile; for instance, it was discovered that staffers for a politician were editing that politician's biography to be more favorable and removing uncomfortable facts. Naturally, this violated the Neutral Point of View policy. On January 27, 2006, the Lowell Sun reported on the Wikipedia article about an American politician, Representative Marty Meehan. It claimed that an anonymous editor, with an IP address traced to the House of Representatives offices, had been at work erasing mention of the congressman's broken term-limit promise. This then became a national news story.
All of the content policies, but particularly NPOV, affect Wikipedia's style and the way its text is worded. Disputes about NPOV often end up on the Talk Page of the article (discussed in Chapter 4, Understanding and Evaluating an Article); if a topic is evidently under heavy debate, an editor may flag the article as being involved in an NPOV dispute (see Figure 1.6, “Article template message indicating concern that the tagged article does not have a neutral point of view”).
Figure 1.6. Article template message indicating concern that the tagged article does not have a neutral point of view
Along with the three core policies discussed in the previous section, a handful of other guidelines help determine what content is included in Wikipedia.
Wikipedia should only cover topics considered noteworthy in the outside world, as determined by reliable, independent secondary sources. Notability helps set a baseline level for inclusion to prevent Wikipedia from becoming something other than an encyclopedia. In practice, the lack of notability is the most common reason why a topic is deemed unsuitable for a Wikipedia article.
This concept is distinct from "fame," "importance," or "popularity," but it does mean there shouldn't be articles about topics that are of interest only to a very few people or of such local interest that there are no publications about them. In other words, an article should not be about your pet or your house (unless either of these is particularly well known and has been written about previously).
Notability is easy to think about superficially but difficult to apply or cleanly define in the abstract. A feeling for notability requires a practical sense of the relative significance of topics in a field, and it also requires a scholarly sense of which types of sources determine notability. An encyclopedist has to wrestle with weighing the extent and quality of information available on a topic. To take one example, King Edward V of England, one of the princes in the Tower whose reign was cut short when his uncle, Richard III, took the throne, is clearly notable, even though much that has been written about him and his fate is speculative.
In part because of this ambiguity, Notability is much more controversial and open to debate than Verifiability, No Original Research, and Neutral Point of View, but it is also closely related to these policies. Arguments about it may be tortuous in the abstract, but in practical terms, non-notable articles are deleted from Wikipedia over time.
There are separate notability guidelines that have been set up for various controversial areas, such as actors and actresses, websites, companies, musical groups, videogames, and so on; these guidelines may be found through links on the main notability page. Many of these guidelines are in place to help reinforce the idea that Wikipedia is not a promotional service, and most of them fall back on whether any reliable secondary sources exist and how much documentation is available on a topic. For example, if Alice has a website that gets thousands of hits a day, but no one has written about it in any sort of publication, Bob will likely not be able to write a Wikipedia article about Alice's site that survives: other editors will probably delete it as non-notable, often with the short, dismissive comment nn.
Similarly, suppose Carla hopes to write about her favorite band, which is much beloved locally but has received no coverage in the major music press. Not only would writing a neutral article be difficult, but also there are no reliable published sources that Carla can use (even if she knows the band's history first-hand).
As in the previous example, notability is something that should be considered in relation to each individual article, rather than whole classes of topics. Some musical groups are certainly notable, as are some companies and some videogames; others are not. The notability guidelines help sort this out.
On the other hand, there are inherent problems with the idea of notability which have led to many ongoing debates over the years on how to phrase and apply the guidelines. Here are some caveats to keep in mind regarding notability:
- Notability may be perishable. Some topics are ephemeral in their interest, such as Internet memes and celebrities in the "famous for being famous" category.
- On Notability: Notability is something that is judged by the world at large, not by Wikipedia editors making personal judgments. If multiple people in the world at large who are independent of the subject have gone to the effort of creating and publishing nontrivial works of their own about the subject, then they clearly consider it to be notable. Wikipedia simply reflects this judgment. (Adapted from User:Uncle G/On notability)
- Notability is not the same as having a fan or someone taking time to research a topic in depth; there must be multiple independent sources.
- The availability of accessible literature in English on any given subject can distort perceptions of notability; biographical facts, in particular, are unevenly accessible, leading to systemic bias, which will be discussed in Chapter 12, Community and Communication.
- Notability is not distinction. It might arise from scandals or participation in controversies, as well as from recognized work such as writing a book.
- Notability in a field is not the same as reputation. Wikipedia will, for example, include cranks who are now discredited but became famous for some reason, but omit solid scientists who are simply not well known.
On that last point, it is obviously flawed to assume that if there's no Wikipedia article, the subject is not notable. Wikipedia is a work in progress, and many worthwhile potential articles have not yet been written.
To sum up, writing a verifiable article without good sources is a bricks-without-straw exercise, and the presence or absence of sources helps determine notability. Thinking about notability helps to keep the project encyclopedic. The notability guideline as applied probably still errs in the direction of inclusion, with a bias toward lesser topics that are well documented elsewhere. This is a natural consequence of a policy evolution that has made reliable sources ever more central.
What Not to Write
There are some article topics that are pretty much always bad ideas. For instance, you can safely assume that an article about any of the following is unnecessary:
- You or the organization you work for
- Your band, which has only sold 47 copies of its one album (even if you think it will sell 48—or maybe 49!)
- The religion or language that you made up with your friends in school one day
- The street you live on (unless it is on a Monopoly board)
- Any one of the 56 distinct regions in the Pokémon videogame series
- Your apartment building
- A stunt or trick only you have ever attempted, probably unsuccessfully
- Any movie you made yourself that has never been seen by more people at one time than can fit in your basement
As with other publications and organizations where writing is submitted, plagiarism is not allowed. In addition, any materials submitted to Wikipedia must be specifically licensed under the GNU Free Documentation License (GFDL), which is a "free license" (see Chapter 2, The World Gets a Free Encyclopedia) distinct from traditional copyright. This license means that anyone can reuse and redistribute Wikipedia's content for any purpose without asking permission, as long as they meet certain conditions; Wikipedia content can be used on other sites or even republished in print.
For these reasons, materials taken from other places generally shouldn't appear on Wikipedia. You shouldn't take text or photos from the Internet or elsewhere and reproduce them on Wikipedia without explicit permission; copying any work that is not in the public domain or explicitly licensed as being freely available is a copyright violation.
Additionally, material that was not originally written for Wikipedia (such as a term paper) typically doesn't meet the other content guidelines. It is best, in almost all cases, to simply write the article afresh.
Some non-encyclopedic content is inappropriate for Wikipedia but may be welcome on other sister Wikimedia projects. For instance, definitions of words (without supporting encyclopedic information) are outside of Wikipedia's scope. The jargon used to describe such articles is dicdef, short for dictionary definition. A dictionary definition alone isn't sufficient for a Wikipedia article. However, dictionary definitions are very welcome at Wiktionary, Wikimedia's free dictionary project.
Original reporting of events is also not a part of Wikipedia. You may have been an eyewitness to an event, but writing what you know you saw straight into the encyclopedia probably violates the No Original Research or Verifiability policy. Wikipedia must wait for the mainstream media to report the facts, which it can then collate. On the other hand, original reporting is part of the mission of Wikinews, which is a citizen journalism project.
Similarly, a "how-to" article may not be encyclopedic, but would be just fine over at Wikibooks, Wikimedia's project to write free textbooks.
Original source documents (for example, the text of Coleridge's "Rime of the Ancient Mariner") are not welcome on Wikipedia, but that is because primary sources belong on Wikisource.
These sister projects are fully described in Chapter 16, Wikimedia Commons and Other Sister Projects.
What Wikipedia Is Not
It's sometimes helpful to think about content inclusion guidelines in negative terms. Here is the basic consensus about what Wikipedia is not (adapted from Wikipedia:What Wikipedia is not, shortcut WP:NOT). Taken together, these statements usefully define boundaries applied to Wikipedia's content. They also exist as longer formulations spelled out in policies and guidelines.
- Wikipedia is not an indiscriminate collection of information, a directory, or a dictionary.
It's an encyclopedia (and preferably a well-rounded one) in which criteria such as notability are used to weed out entries. For example, an article titled List of bands beginning with the word "Lemon" was exactly what its title implied: a simple list, without analysis or context, that named the Lemonheads, Lemon Jelly, and a few other bands. It was quickly deleted. Articles on Wikipedia ought to serve some purpose. They should provide something recognizable as "information," concerning something recognizable as a "subject."
On a similar note, Wikipedia doesn't strive to be a Who's Who or a catalog of published works. Family trees and other family histories are not stored on Wikipedia, as much family history is considered "indiscriminate": Being related to someone notable doesn't make a person notable (with the exception of royal families and others where the hereditary principle matters).
- Wikipedia is not a paper encyclopedia.
In particular, Wikipedia does not need to worry about printing costs or physical unwieldiness. It doesn't need to shorten or triage articles to conserve space. As long as there is money to buy servers and bandwidth, there are no physical restrictions on growth.
The implications for coverage are major: "Not worth including" is a decision that need not be made quite as often. This is another reason Wikipedia's model is a dramatic change from earlier encyclopedias. As long as articles conform to the site's other guidelines, specialized or minor articles can be included. Wikipedia has no set restrictions on what branches of human knowledge should be included.
- Wikipedia is not a publisher of original thought, nor a soapbox.
This reiterates the policy of No Original Research: Wikipedia is not interested in personal essays. Indeed, it's a bad platform on which to air personal or political views. If you're looking for a way to get your name and opinions online, many free website and blog providers exist. Reviews of products, companies, and other personal opinions—whether positive or negative—are likewise unwelcome in Wikipedia articles. These are better placed on a website dedicated to reviews.
- Wikipedia is not a mirror, repository of files, a blog, webspace provider, or social networking site.
This might seem like a strange point to make as it is directed not at Wikipedia's articles but at its user pages, the pages editors create for their own working space. (We will cover user pages in Chapter 11, Becoming a Wikipedian.) Anyone can come along and create a user page, but Wikipedia only supplies this working space to allow editors to identify themselves and collaborate more effectively—not to back up unrelated files, publish a blog, or find a potential mate. Wikipedia is a project with a very specific purpose—to create and distribute an encyclopedia. It is not a helpful web application for storing other unrelated information.
- Wikipedia is not a crystal ball.
This is a warning about posting rumor and speculation about future events, such as gossip about films that are currently in production. If it hasn't happened yet, it isn't Wikipedia material (though as with all guidelines, this should be interpreted using common sense: It doesn't mean that the article on the 2012 Summer Olympics should be started only when the opening ceremony gets under way).
- Wikipedia is not censored.
Articles aim at a general and educated adult audience, and Wikipedia is neither simplified, nor is it compiled with regard to the needs or protection of children. While content is intended to be factual, it is also frank, and human sexuality is extensively covered. Religion is treated along the same lines as all other content. Some images in the encyclopedia may be disturbing or shocking.
Thus, some content may be considered offensive or inappropriate for young children. Understandably, this lack of censorship can cause distress—there are many hundreds of articles about topics that many people would prefer not to think about. Considering that the aim is to be a repository of all human information, written by a truly diverse group of people from all over the world, this is unavoidable. And given the policies of Neutral Point of View and Verifiability, Wikipedia is often an excellent source for information on controversial or potentially offensive topics.
Note: Wikipedia, however, should certainly not contain anything defamatory toward individuals. Wikipedia:Biographies of living persons (shortcut WP:BLP) sets down strict conditions of inclusion for articles about people. Verifiability and NPOV apply to all topics and are firmly enforced in cases where real lives may be affected. If, by misfortune, you do feel defamed, turn to Section 2.4.1, “Help, an Article About Me Is Incorrect!” for specific complaint advice.
- No Blue Pencil, No Free Speech
"No censorship of topics" does not mean that other inclusion policies and behavioral guidelines for onsite interactions can be ignored. Though broadmindedness is highly valued on Wikipedia, nowhere in the policies is there anything about free speech. The site is designed as an encyclopedia project, not as a general forum.
- Wikipedia is not static.
Articles are never set in stone. The encyclopedia is an open-ended work in progress, and Wikipedia articles are, by definition, always provisional. Even the best articles aren't considered off limits for further improvement. This attitude reflects a shared view of knowledge as something that by its nature is dynamic and expanding, rather than settled.
This final point is often left unspoken, but it is key. Changes can always be made, articles can always be improved, and there is always something else to do.
Further Reading
- http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view The NPOV policy
- http://en.wikipedia.org/wiki/Wikipedia:No_original_research The NOR policy
- http://en.wikipedia.org/wiki/Wikipedia:Verifiability The Verifiability policy
- http://en.wikipedia.org/wiki/Wikipedia:Notability The Notability guideline
- http://en.wikipedia.org/wiki/Category:Wikipedia_notability_guidelines Various notability guidelines for specific subjects
- http://en.wikipedia.org/wiki/Wikipedia:Reliable_sources Guideline for judging reliable sources
- http://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not The policy on what Wikipedia is not
All pages on Wikipedia are of two types: About two million articles constitute the encyclopedic content, but ten million project-related pages also exist. What are these pages? Will you see them if you just look something up? If you find them when using a search engine, should you ignore those hits?
Wikipedia's readers should recognize that some Wikipedia pages are not articles, but they do not need to have any particular understanding of the non-article pages and can ignore them freely. On the other hand, involved editors should understand the different types of pages—their purpose and the way they help grease Wikipedia's wheels. The project-related and administrative pages are not as glamorous as articles, but they're of no less importance when it comes to understanding what happens in practice on the site.
3.1. Types of Non-article Pages
These extra pages come in several varieties. Non-article pages are devoted to the administration of Wikipedia, discussion of article content, technical infrastructure, descriptions of images, and the Wikipedia community.
Although they are not as widely known as articles, two of these page types—discussion pages and user pages—are actually the easiest places to start participating on Wikipedia.
- Talk pages
Every article is coupled with a talk page (also called a discussion page), which is accessed by clicking the Discussion tab at the top of the screen. Here editors ask questions about the article's content, propose changes, display notices for other editors, and discuss technical matters (like the title of an article and whether an article should be split into pieces or combined with another).
Each discussion page is meant only for discussing the article it is linked to. Despite the name, discussion pages are not forums for general discussion of the article's subject.
A discussion page is attached to almost every non-article page as well. (Discussions about Wikipedia policy tend to range more widely than discussions about individual articles, but still remain somewhat tied to the topic of the attached page.)
For more on talk pages, see Chapter 4, Understanding and Evaluating an Article, Chapter 11, Becoming a Wikipedian, and Chapter 12, Community and Communication.
- User pages and user talk pages
User pages are for individual editors (users) to describe themselves in whatever detail they see fit. By custom, they are set aside as a private space where editors can work. Often, editors will list projects they're a part of and articles they've worked on.
User talk pages, like article discussion pages, can be reached by clicking a tab at the top of the screen. To communicate with each other, editors leave notes on user talk pages. Whenever someone leaves a note on your user talk page, Wikipedia's software notifies you. (You'll find more on setting up user pages and leaving messages in Chapter 11, Becoming a Wikipedian.)
The other kinds of pages are typically used as references and project coordination pages.
- Policy pages and guidelines
These pages provide guidance about editing content and interacting with other volunteers. Policies and guidelines lay out stylistic guidelines for editing, content inclusion policies, procedures to resolve disputes, and much more. Policies will be described further in Chapter 13, Policy and Your Input.
- Community discussion, procedural, and project pages
These pages are where the community discusses proposals and coordinates editing projects. Routine procedures, such as deletion discussions, are usually based on policies and are carried out on special procedure pages. These processes will be described more in Chapter 7, Cleanup, Projects, and Processes. On Wikipedia what the community means tends to vary according to context—after all, the site is open to all comers—but often enough, it implies those who take part in these open-forum discussions.
- Help pages
These pages include documentation of editing syntax, technical procedures, and best practices, and are referenced throughout this book.
- Image description pages
Each image is coupled with an image description page. These pages exist to provide the image with a textual description (metadata).
- MediaWiki-generated special pages and administrative pages
These are pages generated on the fly by the MediaWiki software and serve as utilities rather than editable pages. They are used for special lists and essential pages, such as the account creation pages.
Each type of page is distinguished from every other type (including from articles) by a prefix; for example, discussion pages are prefixed with Talk:. This prevents "collisions" between similarly named pages, for example, Sorting, which is an encyclopedia article about the process of arranging items, and Help:Sorting, which is not an encyclopedia article but instead offers technical assistance about the sortable tables found on some Wikipedia pages.
Each prefix is actually an indicator that the page is inside a particular namespace. (A namespace is a kind of container for different types of content.) For example, in this full Wikipedia URL
http://en.wikipedia.org/wiki/Talk:Benjamin_Franklin
Talk indicates the namespace where the page exists, whereas Benjamin Franklin, separated from the namespace with a colon (:), is the page's name. If you were internally linking to this page, you'd use the combination of the namespace and page name to properly indicate what page you meant: Talk:Benjamin Franklin.
Articles, which exist in the so-called main or article namespace, do not have prefixes:
http://en.wikipedia.org/wiki/Benjamin_Franklin
Benjamin Franklin is the full page name; the absence of a prefix tells you the page is an encyclopedia article.
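The relationship between a page's URL and its full page name can be sketched in a few lines of Python. This is only an illustration, not part of the MediaWiki software: the function names page_name_from_url and wikilink are hypothetical, the example URLs are assembled from the page names discussed above, and the sketch assumes the /wiki/ URL layout used by the links in this chapter, in which underscores in a URL stand for spaces in the page name.

```python
from urllib.parse import unquote, urlparse

WIKI_PATH = "/wiki/"  # assumed URL layout, as in the links cited in this chapter


def page_name_from_url(url):
    """Return the full page name (namespace prefix included) for a page URL."""
    path = urlparse(url).path
    if not path.startswith(WIKI_PATH):
        raise ValueError("not a /wiki/ page URL: " + url)
    # Undo percent-encoding, then turn URL underscores back into spaces.
    return unquote(path[len(WIKI_PATH):]).replace("_", " ")


def wikilink(page_name):
    """Wrap a full page name in the double brackets used for internal links."""
    return "[[" + page_name + "]]"


talk = page_name_from_url("http://en.wikipedia.org/wiki/Talk:Benjamin_Franklin")
article = page_name_from_url("http://en.wikipedia.org/wiki/Benjamin_Franklin")
print(talk)            # Talk:Benjamin Franklin
print(article)         # Benjamin Franklin
print(wikilink(talk))  # [[Talk:Benjamin Franklin]]
```

Note that the prefix travels with the page name in both directions: nothing in the URL marks the namespace apart from the text before the colon.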
All other types of content in Wikipedia exist in one of the other namespaces, which are indicated with one of 19 possible prefixes. Seeing a prefix before a title tells you that the page is likely part of the community or administration of the site (and therefore is not subject to the same content guidelines as articles).
The namespace also provides context and indicates the type of content that a page contains. For example, help pages contain technical documentation, rather than (say) encyclopedia articles or policies.
Although two pages in the same namespace cannot share a title, pages can exist under the same "name" in different namespaces. For example, the article Phoebe is about a personal name and is part of the encyclopedic content of the site. It is not the same thing at all as the page User:Phoebe, which exists in the User namespace and describes an editor who uses this name as a pseudonym.
The lines between encyclopedia content, on the one hand, and the Wikipedia community pages, on the other, are extremely clear and are delineated with the use of namespaces. As implemented on Wikipedia, however, community namespaces do not always exactly correlate with a single specific type of content. For instance, whereas only user pages are in the User namespace, you may find various pages such as technical documentation, community projects, and policies in the Wikipedia namespace. All of these pages, however, will have something to do with the running of Wikipedia.
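To make the prefix rule concrete, here is a minimal Python sketch that splits a full page name into its namespace and title. It is an illustration under stated assumptions rather than MediaWiki's own logic: the function name split_page_name is hypothetical, the prefix list is simply assembled from the namespaces named in this chapter, matching is case-sensitive for simplicity, and the last title in the demonstration is a made-up example.

```python
# Prefixes assembled from the namespaces named in this chapter. Every
# namespace in PAIRED also has a companion talk namespace.
PAIRED = ["User", "Wikipedia", "Help", "Category", "Image",
          "Template", "Portal", "MediaWiki"]
PREFIXES = {"Talk", "Special", "Media"}
for name in PAIRED:
    PREFIXES.update({name, name + " talk"})


def split_page_name(full_name):
    """Split a full page name into (namespace, title).

    The main (article) namespace is reported as the empty string.
    """
    prefix, colon, title = full_name.partition(":")
    candidate = prefix.strip().replace("_", " ")
    if colon and candidate in PREFIXES:
        return candidate, title.strip()
    # No colon, or the text before it is not a known prefix: an article.
    return "", full_name.strip()


print(len(PREFIXES))  # 19
for name in ["Phoebe", "User:Phoebe", "Sorting", "Help:Sorting",
             "User_talk:Phoebe", "Example: A Subtitle"]:
    print(split_page_name(name))
```

The design point worth noticing is the fallback: a colon only marks a namespace when the text before it is a recognized prefix, so an article whose title happens to contain a colon still lands in the main namespace.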
All Pages in a Namespace
To scan a list of all of the pages in a namespace, click Special Pages in the Toolbox menu on the left-hand sidebar. At the top of the list that appears is the entry All pages. Click that, and a pull-down menu (to select a namespace) and a search box appear. The namespace listing will start at whatever spelling you place in the search box, something very necessary because several namespaces contain millions of pages. (Adapted from Wikipedia:Tip of the day/October 25, 2006)
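The same listing can also be requested programmatically. The sketch below is an addition that goes beyond the point-and-click route described above: it assumes the MediaWiki web API's allpages list (with its apnamespace, apfrom, and aplimit parameters) and the numeric namespace IDs that MediaWiki assigns. It only builds the request URL and does not fetch anything; the function name all_pages_url is hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"  # assumed API location

# MediaWiki's numeric IDs for a few of the namespaces in this chapter.
NAMESPACE_IDS = {"": 0, "Talk": 1, "User": 2, "Wikipedia": 4, "Help": 12}


def all_pages_url(namespace, start_at="", limit=10):
    """Build a request URL listing pages in a namespace, starting at a spelling."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": NAMESPACE_IDS[namespace],  # the pull-down menu choice
        "apfrom": start_at,                       # the search box contents
        "aplimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(all_pages_url("Help", start_at="Sorting"))
```

As in the web form, the starting spelling matters: without it, a listing of a namespace holding millions of pages would begin at the very first title.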
List of Namespaces
Wikipedia has 20 built-in namespaces. These occur in pairs (for example, User and User_talk); there are nine such pairs, including the main namespace, where page names have no prefix, and two special namespaces, Special and Media. A namespace prefix must be kept when linking to a page. The prefix always comes before the page name and is separated from it with a colon.
Wikipedia runs using MediaWiki software, so all other wikis running on MediaWiki have these namespaces as well. Wikipedia adds two custom namespaces that do not exist on other wikis (Portal and Portal_talk) and has the Wikipedia and Wikipedia_talk namespaces, which may be appropriately renamed on other wikis.
For reference, the following namespaces exist:
- The main or article namespace has no special prefix. This namespace is where all regular articles (all the "encyclopedic" pieces of the encyclopedia) exist. Pages in this namespace can be linked to internally with simply their name: [[pagename]].
- The Wikipedia namespace is what could be called the project page namespace. It is for pages that are specifically about running Wikipedia and meta-level subjects related to the project. For example, the Community Portal can be found at Wikipedia:Community_portal and is meant as a place for the Wikipedia community to gather; Wikipedia:Statistics and its talk page, Wikipedia_talk:Statistics, are meant for describing and discussing the project's statistics. Policies, procedures, guidelines, community projects, and many help pages all exist within the Wikipedia namespace. The Wikipedia namespace may sometimes be abbreviated to WP, enabling shortcuts to be set up. For instance, WP:ARB redirects to Wikipedia:Arbitration_Committee.
- The User namespace refers to user pages or pages that have been set up by individual editors to describe themselves, for example, User:Jimbo Wales. By custom, your user page is available when you register a username.
- The Help namespace refers to basic documentation and help pages for using and editing Wikipedia. The prefix for these is simply Help:. Most of the project documentation pages are here or in the Wikipedia namespace.
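- The Category namespace is prefaced by Category: and holds the category pages that group related pages together. Using categories is a major part of expertly using Wikipedia; we discuss them at length in Chapter 3, Finding Wikipedia's Content and Chapter 8, Make and Mend Wikipedia's Web.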
- The Image namespace is prefaced by Image: and is used for describing and attributing images (for example, Image:White shark.jpg). If you upload any image or other media file to Wikipedia, one of these pages will be created. The Media namespace is prefaced by Media: and is used for a link directly to a media file, rather than its description page. Details are in Chapter 9, Images, Templates, and Special Characters.
- The Template namespace is prefaced by Template: and is used exclusively for templates that are transcluded or substituted into an article. You'll find more on templates in Chapter 9, Images, Templates, and Special Characters.
- The Portal namespace is for portal pages that collect articles on a particular topic; this is special to Wikipedia and not generally for MediaWiki. For more on portals, see Chapter 3, Finding Wikipedia's Content and Chapter 7, Cleanup, Projects, and Processes.
- The Talk namespaces contain all the discussion pages. Except for special pages, every namespace has an associated Talk namespace, designated by adding talk: after the normal namespace prefix. In this book, we write these compound names with an underscore to be clear, but you can always use a space. The Talk namespace associated with the main article namespace simply uses the prefix Talk:, for example, Talk:Mathematics. The Talk namespace associated with the User namespace, however, has the prefix User_talk:. Similarly, Wikipedia namespace discussion pages are in the Wikipedia_talk namespace, so the discussion page for Wikipedia:No original research is at Wikipedia_talk:No original research. Generally, pages in the Talk namespaces are used to discuss changes to their corresponding page; however, pages in the User_talk namespace are used to leave messages for a particular user. The User_talk namespace is special in that, whenever a user's talk page is edited, that user (if logged in) will immediately see a message informing them that they have new messages.
- The Special namespace refers to pages that are autocreated by the site's software on demand. These pages are not editable in the usual way and are generally either tools or automatically generated variable lists, such as a list of all pages on the site. See Help:Special page for a list.
- The MediaWiki namespace is used for certain site messages along with a few other areas to define shortcuts and other text strings used around Wikipedia (for example, MediaWiki:Disclaimers). These pages are not usually editable by users.
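The pairing of namespaces described in the list above can be sketched as follows. This is a hedged illustration, not MediaWiki's implementation: the names talk_namespace and talk_page are hypothetical, the namespace lists are taken from this chapter, and compound names are written with an underscore, as elsewhere in this book.

```python
# The nine subject namespaces listed above; "" stands for the main (article)
# namespace, whose pages carry no prefix.
SUBJECT_NAMESPACES = ["", "User", "Wikipedia", "Image", "MediaWiki",
                      "Template", "Help", "Category", "Portal"]
# Special and Media have no companion talk namespace.
UNPAIRED_NAMESPACES = ["Special", "Media"]


def talk_namespace(namespace):
    """Return the talk namespace paired with a subject namespace."""
    return "Talk" if namespace == "" else namespace + "_talk"


TALK_NAMESPACES = [talk_namespace(ns) for ns in SUBJECT_NAMESPACES]
ALL_NAMESPACES = SUBJECT_NAMESPACES + TALK_NAMESPACES + UNPAIRED_NAMESPACES


def talk_page(full_name):
    """Return the name of the discussion page attached to a page."""
    prefix, colon, title = full_name.partition(":")
    prefix = prefix.replace(" ", "_")  # "User talk" and "User_talk" are equivalent
    if colon and prefix in UNPAIRED_NAMESPACES:
        raise ValueError("no talk page exists for " + full_name)
    if colon and prefix in TALK_NAMESPACES:
        return full_name  # already a discussion page
    if colon and prefix in SUBJECT_NAMESPACES:
        return talk_namespace(prefix) + ":" + title
    # No recognized prefix: treat the whole name as an article title.
    return "Talk:" + full_name


print(len(ALL_NAMESPACES))                          # 20
print(talk_page("Mathematics"))                     # Talk:Mathematics
print(talk_page("Wikipedia:No original research"))  # Wikipedia_talk:No original research
print(talk_page("User:Jimbo Wales"))                # User_talk:Jimbo Wales
```

Counting the entries reproduces the arithmetic given earlier: nine subject namespaces, their nine talk counterparts, and the two unpaired namespaces make 20.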
Further Reading
- http://en.wikipedia.org/wiki/MediaWiki#Namespaces An article about MediaWiki with a good explanation of namespaces
- http://en.wikipedia.org/wiki/Wikipedia:Namespace The help page on namespaces
- http://en.wikipedia.org/wiki/Help:Special_page A description of each Special namespace page
Summary and What to Read Next
Wikipedia contains a staggering volume and remarkable variety of content, ranging from traditional encyclopedic subjects to articles about popular culture and technical topics.
Even so, every Wikipedia article must meet several criteria related to the site's mission. The most important criteria are the three core policies: Verifiability (V), No Original Research (NOR), and Neutral Point of View (NPOV). A number of further guidelines and corollaries to the major policies, particularly the notability guideline, help define what you should find in Wikipedia and what types of articles are acceptable.
Although there are now over two million articles in the English-language Wikipedia, there are even more pages devoted to the administration and community of the site. These pages, none of which are part of the Wikipedia encyclopedia, include discussion (or talk) pages; user and user talk pages; policy, procedure, and help pages; project administration and community discussion pages; image description pages; and MediaWiki-generated special site-related pages. All of these different kinds of pages are differentiated from each other by namespaces, which are indicated with prefixes that are separated from the page's name with a colon. Articles reside in the main or article namespace and have no special prefix.
In the next chapter, we'll discuss the origins of Wikipedia and how three disparate historical strands—wikis, encyclopedias, and free software—came together to influence the site's development. Skip to Chapter 3, Finding Wikipedia's Content to explore the structure of Wikipedia and learn better ways to search and browse the site or to Chapter 4, Understanding and Evaluating an Article to learn how to evaluate an individual article.