HTML

The document that a web server sends to your browser is written in the HyperText Markup Language (HTML). Chrome allows you to view the HTML source of any page by typing view-source: in front of the page's address, before the http:// or https://.

You will notice many < and > signs in the document, as well as a lot of text which doesn't appear on the page in your browser. This is the HTML code. Luckily for developers, HTML is human-readable. That means it is reasonable for you to simply type HTML (once you know it); it does not use any very special characters, and the names of things are all based on English.

Still, like any computer language, HTML requires a very strict and highly annotated format in order for the computer to understand. Details matter—computer code is never ambiguous. If what is written seems ambiguous to a human, it is probably not right.

HTML's Format

Tags

Most elements require both an opening tag and a closing tag. For the most part, tags must come in pairs; the second one in the pair needs a slash as the first character after the <.

The name or type of a tag is the first word inside of its definition. The capitalization doesn't matter to the computer, but you should pick one style and use it throughout. Almost all websites use either full capitalization (TAG) or no capitalization (tag), and most use the lowercase style.

The Underline Tag The underline tag is U. It makes text underlined.^{Note that you are not supposed to use the U tag to underline text unless there is a non-style reason to do so, for example, to underline a book's name or web address in MLA style. For the purposes of testing and playing with HTML, feel free to use it.}

The Bold Tag The bold tag is B. It makes text bold.^{The same style suggestion for U applies to B}

The Italic Tag The italic tag is I. It makes text italicized.^{The same style suggestion for U and B applies to I}
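
As a rough illustration of the three tags above (the sentences are only made-up sample text), each tag wraps the words it affects between an opening tag and a closing tag:

```html
<!-- Sample text is invented for illustration -->
<u>This text is underlined.</u>
<b>This text is bold.</b>
<i>This text is italicized.</i>
```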

The Document

HTML elements may contain other HTML elements by enclosing them:


		<parent>
			<one>Hi</one>
			<two>Hello</two>
		</parent>

This can be represented in a way similar to a family tree:

"parent" is the parent of "one".
"one" and "two" are the children of "parent".
"one" and "two" are considered siblings (they share the same parent).

HTML tags must nest in the correct order. The following is NOT valid:

We <B> hold <U> these  </B> truths </U> to be

Before closing an element, all of its children MUST be closed. (It makes sense why you can't do this when you try to draw the "family tree" for the above)
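
One possible repair of the invalid line above, sketched here with lowercase tags, is to close the inner u before closing b, and then open a fresh u for the rest of the underlined text:

```html
<!-- Every child is closed before its parent, so this nests correctly -->
We <b>hold <u>these</u></b><u> truths</u> to be
```

The same words come out bold and underlined as before, but now each element sits entirely inside its parent.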

Reproduce the following with HTML:
We the people of the United States do ordain and establish this Constitution

Reproduce the following with HTML:
Plain All Underlined Bold Bold-Italic Italic Still Underlined

The whole HTML document continues in this way. It has several set "roots" that you must include. Here is the minimum page which is valid (correct) HTML:


	<!doctype html>
	<html>
		<head>
			<title>Title of Page</title>
		</head>
		<body>
			This is content that appears
			on your page.
		</body>
	</html>

The !DOCTYPE The !DOCTYPE tag is special. It must be the first thing in the document, and it has no closing tag (writing one here is wrong). Strictly speaking, it is a declaration rather than a tag. It must appear as written here at the beginning of a document to specify that the document is a modern HTML5 document.

The HTML Tag The HTML tag covers the whole HTML document. It includes a head and exactly one body.
Since nothing else is allowed to be over the whole document, the closing tag is actually optional.

The Head Tag The HEAD tag defines the "head" of the document (the stuff that's not the "body"). It holds information about the page and lists resources the page uses.
Since the only allowed sibling of the head tag is body, its closing tag is actually optional and implied by the opening tag of body.

The Title Tag The TITLE tag defines the title of a page. This appears in the title of the web browser, and in the tab. It is required.

The Body Tag The BODY tag defines the "body" (content) of the document. Anything visible must be part of the body tag. Since no elements are allowed to follow the body in a document, the closing tag is also optional.
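
To illustrate the optional closing tags described above, here is a sketch of the same minimal page with the closing head, body, and html tags left out. A browser should build the same document from it, although writing the closing tags out is usually easier to read:

```html
<!doctype html>
<html>
<head>
<title>Title of Page</title>
<!-- The opening body tag below implies the end of head -->
<body>
This is content that appears on your page.
<!-- The end of the file implies the end of body and html -->
```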

Notice that the !doctype does not have a corresponding closing tag. The same is true of void elements, which is the technical term for elements that never take a closing tag. It does not make sense for a void element to have any children (an obvious example is an image), so it has no closing tag that would give you the opportunity to add any.
^{This is one of the differences from XML, a language which HTML is similar to (HTML grew out of SGML). XHTML is a (correct) variant of HTML which is also valid XML and treats void tags, among other things, differently.}

The head tag does not actually appear on the page in your web browser. Instead, it defines information about the page and also links in other files like stylesheets and scripts.

For example, the head element contains the title element. The title element is not allowed to have any children, only text. It specifies what appears in the program's title or tab's title (HTML - Web Technology for this page). This is the only required element in HTML other than body and head.

The body tag contains all of the content of a page. Only things inside of the body tag can appear on screen.

Comments Comment tags are special HTML tags. The computer ignores them, and so they don't display. Most websites contain very few comments, but sometimes they are useful for remembering how or why something in particular is done. Comments look like this: <!-- This is a comment -->. The only limitation on the content of comments is that they can't contain two hyphens in a row (--) and shouldn't contain a closing >.

Attributes

Tags can have extra information supplied to them about what they are, how they look, and how they behave. For example, the URL a link takes you to is an attribute of the link.

The Anchor Tag (Links) The anchor tag, A, defines links in HTML. It has the HREF attribute which specifies where the link goes to. Note that HREF requires the http:// or https:// when referring absolutely to a place.
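
A minimal sketch of a link, assuming a placeholder address (example.com is a domain reserved for examples and is not taken from the original page):

```html
<!-- The address and link text are placeholders -->
<a href="https://example.com/">Visit the example site</a>
```

Everything between the opening and closing a tags becomes the clickable text.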

The href= in a link's opening tag specifies the attribute "href" ^{I'm pretty sure this means "hypertext reference".}. Attributes are specified after the tag's name, with spaces separating them and quotes around the values they specify. A handful of attributes are simply present or absent, and are written without the equals sign and quoted value. Here is what it looks like:


		
		<tag atrib="hi" novalue width='97'></tag>

If for some reason your attribute values need to contain quotes themselves, there are two options: 1) write each quote as a character entity (&quot; for a double quote, &#39; for a single quote; see HTML Entities below), or 2) use the opposite variety of quotes (single vs. double) around the value. It is much better to use a single style of quotes, so option 1) should be the default.


		<tag attribute="I love &quot;quotes&quot;">

Note that HTML does not use the backslash as an escape character. Inside an attribute value a backslash is just an ordinary character, so \" does not produce a quote, and a backslash never needs to be doubled.
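
HTML itself has no backslash escape, so in practice the two ways to put a quote inside an attribute value are a character entity or the opposite kind of quote. A sketch of both, using the title attribute of a link (the address and text are placeholders chosen for illustration):

```html
<!-- Option 1: double quotes outside, inner quotes written as entities -->
<a href="https://example.com/" title="I love &quot;quotes&quot;">one</a>

<!-- Option 2: single quotes outside, so plain double quotes are allowed inside -->
<a href="https://example.com/" title='I love "quotes"'>two</a>
```

Both links end up with the same title text.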

Make a link which goes to the (nonsense) website: I'm using \\\/// some "quotes" here

HTML Entities

A "character" is a single symbol. In English, we have spaces, punctuation marks, letters, numbers, and a bunch of other operators. Foreign-script characters (中国普通话 / العربية / עברית / मराठी / 日本の / 한국의 / русский / español / français) all complicate this. In fact, several of these listed here are actually showing wrong because they're missing important characters, which you can't see. The modern standardization of characters is accomplished by the Unicode standard. It is called Unicode because it is universal: any character you would ever need can be found in it. For instance, Egyptian hieroglyphs: 𓀀. Or Byzantine musical notation: 𝀀. Chances are your font doesn't support those, but the underlying characters are distinct.

Certain characters are somewhat special. You will find difficulty putting the < into an HTML document. Why? Because the browser has to assume you're trying to start writing a tag, even though you aren't! There is a fix for this issue, and it involves character entities.

To write a <, simply write &lt; instead. The ampersand (&) has a special purpose in HTML: it denotes character entities like &lt;. Here's a bunch with simple English names:

&lt; &gt; &amp; &quot; &apos; &nbsp; &shy;

These correspond to the characters < > & " ' non-breaking space and soft (or shy) hyphen, respectively. You may find these useful. There are others, but beyond these, few are frequently useful.

If you ever want to hyphenate a word to split it across two lines, it's preferable to use the soft-hyphen. The hyphen is displayed only when the word is actually split across two lines- this way, if the layout of the page changes and the word is no longer split, the hyphen won't be visible. That said, avoid doing this too, because the browser's text layout is usually good enough anyway.

Every character in Unicode is also assigned a unique numeric identifier. You may run into these occasionally. As HTML entities, they are written thus: &#1234; (which displays as Ӓ) or &#x89AF; (which displays as 覯), where the latter uses a hexadecimal number.
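
As a small sketch of entities in use (the sentences are invented sample text), here is how named and numeric entities look in the source of a page:

```html
<!-- Displays as: 5 < 7 && 7 > 5 -->
<p>5 &lt; 7 &amp;&amp; 7 &gt; 5</p>

<!-- &nbsp; keeps "10" and "am" together on one line -->
<p>She said &quot;hi&quot; at 10&nbsp;am.</p>

<!-- Decimal 1234 and hexadecimal 4D2 are the same number, so both show the same character -->
<p>&#1234; and &#x4D2; are the same character.</p>
```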

Reproduce the following with HTML entities:
&><& "
(note that the space is non-breaking)

Important Tags

You've already seen some tags. Here is a big list, and a short explanation of what they do.

<div>

The "div" tag, short for "division", divides the page into pieces. By default, a div looks no different from its contents. Essentially, divs are invisible containers that let you set the width and height of their contents, along with a few other features, when you style them. Divs are boxes for content, not paragraphs. A line break occurs after each one.

<p>

The "p" tag, short for "paragraph", is what it sounds like. A paragraph of text should go inside. p tags are very similar to div tags, with a few differences: a p tag should not contain other p tags, and each one should be limited to a single paragraph. There is a line break after each one.
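
A short sketch of how the two fit together, with a div acting as a box around two paragraphs (the text is made up for illustration):

```html
<div>
  <!-- Each p holds exactly one paragraph and does not contain other p tags -->
  <p>This is the first paragraph inside the box.</p>
  <p>This is the second paragraph. It starts on a new line.</p>
</div>
```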

<a>

The "a" tag, short for "anchor", functions primarily as a hyperlink. Anchors are also the targets behind the sub-titles you see on sites like Wikipedia when you follow a link to a particular spot on a page. Their most important attribute is href, which points to where they will send you. Any content inside of an a tag, such as text or images, will be clickable. They are completely inline.

<img>

The "img" tag, short for "image", shows an image, as you would expect. The location of the image file is given by the src attribute. You can also set width and height attributes (whose units are pixels), which do what you'd expect. If you set only one of them, the other dimension of the image will be scaled to match the image's ratio. It is typical, however, for width/height to be set in style information, and thus omitted from the HTML entirely.
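
A minimal sketch of an image, assuming a hypothetical file named photo.jpg sitting next to the page. The alt attribute is not discussed above; it is an addition here, commonly supplied to describe the image in words:

```html
<!-- photo.jpg is a hypothetical file name -->
<!-- Only width is given, so the height is scaled to keep the image's ratio -->
<img src="photo.jpg" alt="A description of the photo" width="300">
```

Note that img has no closing tag, since an image cannot have children.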

<b>

The b tag bolds its contents. Avoid use of this tag- it was originally the only effective way to bold text. To bold for emphasis, use the strong tag instead. You should use b when there is a typographic reason to bold specifically, such as following a particular citation style or brand standard.

<strong>

The strong tag, by default, makes text appear bold. However, it is also semantic, meaning that it carries meaning. strong is more accessible to the blind, for instance, because a semantic accessibility-enabled web-browser will know to increase the volume of text spoken aloud inside a strong tag. Do not use the strong tag when you are making text bold to match a brand or typographic standard.

<i>

The i tag italicizes its contents. Avoid use of this tag- it was originally the only effective way to italicize text. To italicize for emphasis, use the em tag instead. You should use i when there is a typographic reason to italicize specifically, such as following a particular citation style or brand standard.

<em>

The em tag, short for "emphasis", by default, makes text appear italicized. However, it is also semantic, meaning that it carries meaning. em is more accessible to the blind, for instance, because a semantic accessibility-enabled web-browser will know to change the tone of text spoken aloud inside an em tag. Do not use the em tag when you are making text italicized to match a brand or typographic standard.
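
To make the distinction concrete, here is a small sketch with invented sentences. strong and em mark words that matter, while i is used for a book title, which is italic purely by typographic convention:

```html
<!-- strong: the word is important -->
<p>You <strong>must</strong> save your work before closing the tab.</p>

<!-- em: spoken stress; i: a title set in italics by convention -->
<p>I <em>really</em> enjoyed <i>Moby-Dick</i>.</p>
```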

<span>

The "span" tag "spans" a piece of content. Spans are by default totally invisible and do not affect layout in any way. Their purpose is to allow you to style or select a piece of content; for example, a span can make a few words of text red. Spans are discouraged, however, because they divorce content and appearance. Prefer semantic elements, such as strong, em, b, i, or u, when one fits.
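
A sketch of a span colouring a few words red. It assumes the style attribute as the way of attaching the colour, which is only one of several ways to style a span:

```html
<!-- The span itself changes nothing; the style attribute supplies the colour -->
<p>Only <span style="color: red">these three words</span> are red.</p>
```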

<script>

The "script" tag encloses a script, which is a small program written (almost always) in JavaScript. Scripts perform almost all behavior on a page, in particular animation or interactivity. (Modern styling and forms allow for dynamic content without scripts). If you do not know how to program, do not modify the contents of the script in any way. Scripts are special in that there are no nested tags within them: until the "</script>" is found, HTML parsing is effectively ignored.

<ul> and <ol>

These stand for "unordered list" and "ordered list" respectively. They are supposed to contain only <li> ("list item") elements, which are the items in the list. Ordered lists are numbered, while unordered lists are bulleted. There are a couple of unordered lists on this page.
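
A brief sketch of both kinds of list, with invented items:

```html
<!-- Unordered: shown with bullets -->
<ul>
  <li>Milk</li>
  <li>Eggs</li>
</ul>

<!-- Ordered: shown with numbers 1, 2, ... -->
<ol>
  <li>Preheat the oven.</li>
  <li>Mix the batter.</li>
</ol>
```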

<br>

"br" is short for "break." br tags are line breaks. Inserted essentially anywhere, they cause text to move onto the next line. They are "self-closing", which means that they do not have a closing tag. In modern HTML5, that means that the complete line break can be written as either <br> or <br />.

<hr>

"hr" is short for "horizontal rule." They act like a line break, but also draw a line across the page, like this:

They are semantic, intended to divide sections of a document. We abuse them on the side to look like strips cut out of the background.
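
A small sketch showing a line break and a horizontal rule together (the text is sample content only):

```html
<!-- br moves the rest of the sentence onto the next line -->
<p>Roses are red,<br>Violets are blue.</p>

<!-- hr draws a line between the two sections -->
<hr>

<p>A new section starts here.</p>
```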

<table>

Tables are complicated. Getting them right is very, very hard. Historically, tables were used excessively to control the appearance of content on a page, because they were accidentally the easiest way to lay the page out. This practice is now severely frowned upon, and has mostly died out.

Fully describing the table rules takes far too much time. Here's an abridged version.

Tables contain a series of tr tags (short for "table row"), each of which contains the same number of td tags (short for "table data"). td tags contain whatever cell content you would like.

Occasionally, tables will also contain thead and tbody and tfoot tags. These are the head, body, and foot of the table, and are semantic. They are all optional, but occasionally necessary.
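
As a rough sketch of the abridged rules above, here is a two-column table with a head and a body. The names and roles are invented sample data:

```html
<table>
  <thead>
    <!-- Every row has the same number of cells: two -->
    <tr><td>Name</td><td>Role</td></tr>
  </thead>
  <tbody>
    <tr><td>Ada</td><td>Student</td></tr>
    <tr><td>Grace</td><td>Mentor</td></tr>
  </tbody>
</table>
```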

Avoid writing tables yourself. Get them written by a script for you. We will provide assistance with these scripts for the student / mentor pages, because writing the HTML is tedious and the WordPress editor really messes it up.