HyperText Markup Language/Introduction

The HyperText Markup Language (HTML) is a simple data format used to create hypertext documents that are portable from one platform to another. It is used in most pages of the World Wide Web. HTML files contain both the primary text content and additional formatting markup, i.e. sequences of characters that tell web browsers how to display and handle the main content. The markup can specify which parts of the text should be bold, where the headings are, or where tables, table rows, and table cells start and end. Though most commonly displayed by a visual web browser, HTML can also be used by browsers that generate audio from the text, by braille readers that convert pages to a braille format, and by other programs such as email clients.

Before we start

To author and test HTML pages, you will need an editor and a web browser. HTML can be edited in any plain text editor. Ideally, use one that highlights HTML markup with colors to make it easier to read. Common plain text editors include Notepad (or Notepad++) for Microsoft® Windows, TextEdit for Mac, and Kate, Gedit, Vim, and Emacs for Linux.

Many other editors exist with a wide range of features. Some offer WYSIWYG (what you see is what you get) functionality, which hides the markup and generates it automatically. Generated markup is rarely as clean or transparent as manually written code, and a WYSIWYG editor is far less useful for learning than a code-based text editor.

To preview your documents, you'll need a web browser. To make sure most viewers see good results, ideally test your documents in several browsers. Each browser renders pages slightly differently and has its own quirks.

The most common browsers include Microsoft Edge, Google Chrome, Mozilla Firefox, Safari, and Opera. To check that your documents are readable in a text-only environment, you can test with Lynx.

A simple document

Let's start with a simple document. Write this code in your editor (or copy and paste it), and save it as "index.html" or "index.htm". The file name must end in .html or .htm; otherwise the browser may show the file as plain text instead of rendering it.

<!DOCTYPE html>
<html>
  <head>
    <title>Simple document</title>
  </head>
  <body>
    <p>This is some text in a paragraph that will be seen by viewers.</p>
  </body>
</html>

Now open the document in your browser and look at the result. From the above example, we can deduce certain essentials of an HTML document:

  • The first line, <!DOCTYPE html>, declares the document type: it tells the browser that the file is an HTML document.
  • The HTML document begins with an <html> tag and ends with its counterpart, the </html> tag.
  • Within the <html></html> tags, there are two main pairs of tags, <head></head> and <body></body>.
  • Within the <head></head> tags, there are the <title></title> tags, which enclose the textual title to be shown in the title bar or tab of the web browser.
  • Within the <body></body> tags is a paragraph marked by a <p></p> tag pair.

General HTML tag code style

  • Most tags are written in pairs, an opening tag and a closing tag, and the effect of the tag applies to the content between them.
    • <em>This text is emphasized</em> – This text is emphasized
    • This text includes <code>computer code</code> – This text includes computer code
    • <em>This text is emphasized and has <code>computer code</code></em> – This text is emphasized and has computer code
  • HTML tag pairs must be properly nested: a pair opened inside another pair must also be closed inside it, for example:
    • <code><em>This text is both code and emphasized</em></code> – This text is both code and emphasized
    • A mistake: <em><code>This markup is erroneous</em></code> (the pairs overlap: <code> is opened inside <em> but closed outside it)
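
The nesting rule above can be sketched as a complete page. This is only an illustration: the title is made up, and the paragraph text is reused from the list items above.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Nesting example</title>
  </head>
  <body>
    <!-- <code> is opened inside <em>, so it is closed before </em> -->
    <p><em>This text is emphasized and has <code>computer code</code></em></p>
    <!-- <em> is opened inside <code>, so it is closed before </code> -->
    <p><code><em>This text is both code and emphasized</em></code></p>
  </body>
</html>
```

In both paragraphs the inner pair is closed before the outer pair, so no two pairs overlap.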

The <html> Tag

The <html> and </html> tags mark the beginning and end of an HTML document. The tag itself has no effect on the appearance of the document; it tells browsers and other programs that this is an HTML document.

Useful attributes:

dir attribute
This attribute specifies the direction in which the browser presents text within the entire document. It can have the value ltr (left to right) or rtl (right to left). By default this is set to ltr. Generally rtl is used for languages written from right to left, such as Persian, Arabic, Hebrew, and Urdu.

Example: <html dir="ltr">
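
As a minimal sketch, the page below sets dir="rtl" on the whole document. The title and the English paragraph are placeholders chosen for illustration; a real right-to-left page would contain text in a language such as Hebrew or Arabic.

```html
<!DOCTYPE html>
<!-- dir="rtl" asks the browser to lay out the whole document right to left -->
<html dir="rtl">
  <head>
    <title>Direction example</title>
  </head>
  <body>
    <p>This paragraph is a placeholder for right-to-left text.</p>
  </body>
</html>
```

Opened in a browser, the paragraph should start at the right-hand edge of the window. Changing the value to ltr, or removing the attribute, should move it back to the left.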

lang attribute
The lang attribute specifies the language of the document's content.

Languages are identified by short codes; the two-letter codes below come from ISO 639-1:
en - English
fa - Persian (Farsi)
fr - French
de - German
it - Italian
nl - Dutch
el - Greek
es - Spanish
pt - Portuguese
ar - Arabic
he - Hebrew
ru - Russian
zh - Chinese
ja - Japanese
hi - Hindi

Example: <html lang="en">
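
The two attributes can be used together. The sketch below is a variant of the simple document from earlier, marked as French. The French title and sentence are made up for illustration, and dir="ltr" is written out only to show where it goes, since it is already the default.

```html
<!DOCTYPE html>
<!-- lang="fr" marks the content as French; dir="ltr" repeats the default direction -->
<html lang="fr" dir="ltr">
  <head>
    <title>Document simple</title>
  </head>
  <body>
    <p>Bonjour tout le monde.</p>
  </body>
</html>
```

The page looks the same with or without lang in a visual browser, but programs that read the page aloud or process its text can use the attribute to pick the right language.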