HyperText Markup Language (HTML) is a markup language used to create the content and semantic structure of Web pages. A Web page is composed of a number of HTML elements, each of which has a particular meaning in the context of the page. Some elements are stand-alone, while others can be nested to create increasingly complex structure for your content. Web browsers interpret HTML to build the content of a page, and apply Cascading Style Sheets (CSS) to control the visual appearance of that content.
This article serves as an introduction to HTML, and will help you learn how it works and how to construct a basic web page. If you've wondered how clicking on one part of a page can lead you to another, or how to create bulleted lists, this article is the place to begin your journey through learning HTML.
Elements - the basic building blocks
The basic unit of information in HTML is the element. Elements answer the question "what kind of information is this?" and define the semantic meaning of their content. Some elements have a very precise meaning, as in "this is an image," "this is a heading," or "this is an ordered list," while some are less specific, as in "this is a part of the page" or "this is a part of the text," and yet others are used for technical reasons. In one way or another, though, they all have semantic value.
Most elements may contain other elements, forming a hierarchical tree structure that browsers represent as the DOM, the Document Object Model. HTML defines a little more than 100 elements.
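As a rough illustration of that tree, the sketch below nests a few elements inside one another. The tag syntax it uses is explained in the following paragraphs. The choice of elements and the text are made up for this example, and the indentation is only there to make the hierarchy visible.

```html
<!-- article is the parent; h1 and ul are its children -->
<article>
  <h1>Shopping list</h1>
  <ul>
    <!-- each li is a child of ul and a grandchild of article -->
    <li>Milk</li>
    <li>Fresh <em>wholemeal</em> bread</li>
  </ul>
</article>
```

In this sketch the em element sits four levels deep: article, ul, li, em.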
HTML uses plain text as a foundation and attaches special meaning to anything that starts with the less than sign (<) and ends with the greater than sign (>). Such markup is called a tag. Here is a simple example:
<p>This is text within a paragraph.</p>
In that example there is a start tag and a closing tag. Closing tags use the same tag name as the start tag, but also contain a forward slash immediately after the leading less than sign. Most elements in HTML are written using both start and closing tags. Start and closing tags should be properly nested; that is, closing tags should be written in the opposite order of the start tags. Proper nesting is one rule that must be obeyed in order to write valid code.
This is an example of valid code:
<em>I <strong>really</strong> mean that</em>.
This is an example of invalid code:
Invalid: <em>I <strong>really</em> mean that</strong>.
Until the adoption of the HTML5 parsing rules, browsers handled invalid code differently from one another. They were forgiving to Web authors, but unfortunately not all in the same way, so invalid HTML produced almost unpredictable results. Those days are over with the latest generation of browsers, such as Internet Explorer 10, Firefox 4, Opera 11.60, Chrome 18 or Safari 5, which implement the now-standard rules for parsing invalid code. Invalid code results in the same DOM tree in all modern browsers.
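To make this concrete, here is a sketch of how a browser that follows the standard parsing rules is expected to repair the invalid example shown earlier. The second snippet is hand-written markup that corresponds to the resulting DOM tree. It is an illustration of the outcome, not something the browser prints.

```html
<!-- What the author wrote (invalid: the tags overlap) -->
<em>I <strong>really</em> mean that</strong>.

<!-- Roughly what the parser builds: strong is closed together with em,
     then reopened for the remaining text -->
<em>I <strong>really</strong></em><strong> mean that</strong>.
```

The text still reads the same, but the tree now contains two strong elements instead of one, which can matter for styling and scripting.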
Some elements however have no text content nor contain any other elements. These are empty elements and need no closing tag. This is an example:
<meta charset="utf-8">
Some authors like to mark up empty elements with a trailing forward slash, which is mandatory in XHTML. In HTML this slash has no technical function, and using it is purely a stylistic choice.
<meta charset="utf-8" />
The start tag may contain additional information, as in the preceding example. Such information is called an attribute. Attributes usually consist of two parts:
- An attribute name.
- An attribute value.
A few attributes can only have one value. They are Boolean attributes and may be shortened to the attribute name alone or given an empty attribute value. Thus, the following three examples have the same meaning:
<input required="required">
<input required="">
<input required>
Attribute values that consist of a single word or number may be written as they are, but as soon as the value contains a space, that is, two or more strings of characters, it must be written within quote marks. Both single quotes (') and double quotes (") are OK. Many developers prefer to always use quotes to make the code less ambiguous to the eye and to avoid mistakes. The following is such a mistake:
<p class=foo bar> (Beware, this probably does not mean what you think it means.)
In this example the value was supposed to be "foo bar", but since there were no quotes the code is interpreted as if it had been written like this:
<p class="foo" bar="">
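The quoting rules above can be sketched as follows. The class names intro and highlight are made up for this illustration and do not come from any particular style sheet.

```html
<!-- A single word needs no quotes, although many authors add them anyway -->
<p class=intro>One class, unquoted.</p>

<!-- A value containing a space must be quoted; either quote style works -->
<p class="intro highlight">Two classes, double quotes.</p>
<p class='intro highlight'>Two classes, single quotes.</p>
```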
Named character references
Since the less than and greater than signs have special meaning, one must (usually) use a named character reference, often casually called an entity, to mark up those signs in plain text. Entities may be written using names, decimal numbers or hexadecimal numbers. (More on that later.) There are four basic named entities one must know:
- &gt; denotes the greater than sign (>).
- &lt; denotes the less than sign (<).
- &amp; denotes the ampersand (&).
- &quot; denotes the double quote sign (").
There are many more entities in the latest version of the standard, but these four are absolutely fundamental, since they represent characters that have a special meaning in HTML. Other entities can be used to represent characters that are not part of the character set of the Web document.
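As a small sketch of these entities in use, the snippet below shows the four named references in running text, plus the numeric forms mentioned above. The sample sentences are invented for the illustration.

```html
<!-- Displays as: 5 < 7 & 7 > 5 -->
<p>5 &lt; 7 &amp; 7 &gt; 5</p>

<!-- The same less than sign written three ways: by name, decimal, hexadecimal -->
<p>&lt; &#60; &#x3C;</p>

<!-- &quot; lets a double quote appear inside a double-quoted attribute value -->
<p title="She said &quot;hello&quot;">Hover over this text.</p>
```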
Doctype and comments
In addition to tags, text content and entities, a document should always contain a doctype at the very top. In modern HTML this is written like this:
<!DOCTYPE html>
The doctype has a long and intricate history, but for now all you need to know is that this doctype tells the browser to interpret the HTML and CSS code according to standards and not to pretend that it is Internet Explorer from the 1990s. (See quirks mode.)
To help you remember what you are doing while developing code or communicating with other developers, your code may also contain comments. Comments in HTML start with less than + exclamation mark + 2 minus signs (<!--) and end with 2 minus signs + greater than (-->).
<!-- This is comment text -->
A complete but small document
Putting this together, here is a tiny example of an HTML document. The document structure and the elements will be explained in a later article. Go ahead and write this code in a text editor, save it as myfirstdoc.html and load it in a browser. Make sure you save it using the UTF-8 character encoding. Since this document uses no styling it will look very plain, but it is only a small start.
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>A tiny document</title>
  </head>
  <body>
    <h1>Main heading in my document</h1>
    <!-- Note that it is "h" + "one", not "h" + the letter "l" -->
    <p>Loook ma, I am coding <abbr title="Hyper Text Markup Language">HTML</abbr>.</p>
  </body>
</html>
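The introduction mentioned links and bulleted lists, so here is one possible way to extend the tiny document with both. This is only a sketch: the headings and list items are invented, and https://example.com/ is a placeholder address, so substitute a page you actually want to link to.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>A tiny document with a list and a link</title>
  </head>
  <body>
    <h1>Things I am learning</h1>
    <!-- ul is an unordered (bulleted) list; each li is one item -->
    <ul>
      <li>Elements and tags</li>
      <li>Attributes</li>
      <li>Character references such as &amp;lt;</li>
    </ul>
    <!-- The href attribute holds the address the link leads to -->
    <p>Read more on <a href="https://example.com/">an example site</a>.</p>
  </body>
</html>
```

Saved under a name of your choice and opened in a browser, this should show a heading, three bullet points and a sentence containing a clickable link.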