The document that a web server sends to your browser is written in the HyperText Markup Language (HTML). Chrome allows you to view the HTML source of any page by putting view-source: in front of its address (for example, view-source:http://google.com).
You can view the AdamBots' homepage source here.
You will notice many < and > signs in the document, as well as a lot of text which doesn't appear on the page in your browser. This is the HTML code. Luckily for developers, HTML is human-readable. That means that it is reasonable for you to simply type HTML (once you know it); it does not use any very special characters, and the names of things are all based on English.
Still, like any computer language, HTML requires a very strict and highly annotated format in order for the computer to understand. Details matter—computer code is never ambiguous. If what is written seems ambiguous to a human, it is probably not right.
Not enclosed.<tag>Enclosed content.</tag>Not enclosed.
The <tag> and </tag> portions here are the tags. The one without a slash is the opening tag and the one with the slash is the closing tag. We call the combination of the opening tag, closing tag, and enclosed content an HTML element. The "tag" element here runs from <tag> through </tag>. An element is closed if it has a closing tag.
Most elements require both an opening tag and a closing tag. For the most part, tags must come in pairs; the second one in the pair needs a slash as the first character after the <.
The name or type of a tag is the first word inside of its definition. The capitalization doesn't matter to the computer, but you should pick one style and use it throughout. Almost all websites use either full capitalization (TAG) or no capitalization (tag), and most use the lowercase style.
HTML elements may contain other HTML elements by enclosing them:
<parent>
<one>Hi</one>
<two>Hello</two>
</parent>
This can be represented in a way similar to a family tree:
      parent
      /    \
    one    two
    /        \
  "Hi"    "Hello"
We take the terms parent, sibling, and child to describe how elements are related in this tree.
"parent" is the parent of "one".
"one" and "two" are the children of "parent".
"one" and "two" are considered siblings (they share the same parent).
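We <B> hold <U> these </B> truths </U> to be
Before closing an element, all of its children MUST be closed. The line above breaks this rule: it closes B while U, which was opened inside B, is still open. (It becomes clear why you can't do this when you try to draw the "family tree" for the line above.)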
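For comparison, here is one way the same sentence could be written with properly nested tags. This is only a sketch: it closes the underline before closing the bold and then reopens it, so that no element is closed while one of its children is still open. It also uses the lowercase style.

```html
<!-- Every element is closed before its parent is closed. -->
We <b> hold <u> these </u></b><u> truths </u> to be
```

The words "hold these" are still bold and "these truths" are still underlined, but now a family tree can be drawn for the markup.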
The whole HTML document continues in this way. It has several set "roots" that you must include. Here is the minimum page which is valid (correct) HTML:
<!doctype html>
<html>
<head>
<title>Title of Page</title>
</head>
<body>
This is content that appears
on your page.
</body>
</html>
Notice that the !doctype line does not have a corresponding closing tag. Strictly speaking, it is a declaration that tells the browser which kind of HTML to expect, not an element. Some genuine elements also never have a closing tag; the technical term for these is void elements. What this means is that it does not make sense for the element to have any children (an obvious example is an image), so it has no closing tag that would give you the opportunity to add any. The closing tag is implied.
This is one of the differences from XML, a language which HTML is similar to (both have their roots in SGML). XHTML is a (correct) variant of HTML which is also valid XML and treats void tags, among other things, differently.
The head tag does not actually appear on the page in your web browser. Instead, it defines information about the page and also links in other files like stylesheets and scripts.
For example, the head element contains the title element. The title element is not allowed to have any children, only text. It specifies what appears in the program's title or tab's title (HTML - Web Technology for this page). This is the only required element in HTML other than body and head.
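As a rough illustration of what a head element might hold, the sketch below adds a stylesheet and a script to the minimal page. The file names style.css and script.js are made up for this example; they are not files this site is known to use.

```html
<!doctype html>
<html>
<head>
<!-- Shown in the window or tab title, not on the page itself. -->
<title>Title of Page</title>
<!-- Hypothetical file names, for illustration only. -->
<link rel="stylesheet" href="style.css">
<script src="script.js"></script>
</head>
<body>
Only this part appears on screen.
</body>
</html>
```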
The body tag contains all of the content of a page. Only things inside of the body tag can appear on screen.
Tags can have extra information supplied to them about what they are, how they look, and how they behave. For example, the URL a link takes you to is an attribute of the link.
<a href="http://google.com">Go to Google</a>
The href= specifies the attribute "href", which stands for "hypertext reference". Attributes are specified after the tag's name, with spaces separating them and quotes around the values they specify. There are a handful of attributes which are simply present or absent, and lack the equals sign and quotation marks. Here is what it looks like:
<tag atrib="hi" novalue width='97'></tag>
The order of attributes does not matter.
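If for some reason your attribute values need to contain quotes themselves, there are two options: 1) write each quote as a character entity (&quot; for a double quote, &apos; for a single quote), or 2) use the opposite variety of quotes (single vs double) around the value. It is much better to use a single style of quotes, so option 1) should be the default.
<tag attribute="I love &quot;quotes&quot;">
Note the &quot; entities standing in for the inner quotes. HTML does not use the backslash \ as an escape character, so backslashes need no special treatment inside attributes. The ampersand & is the special character, and a literal ampersand is written &amp;. For example, the value
I'm using \\\/// some "quotes" here
is written as attribute="I'm using \\\/// some &quot;quotes&quot; here".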
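A short sketch of the two ways to put a double quote inside an attribute value in HTML. The tag and attribute names are placeholders, as elsewhere on this page. HTML itself has no backslash escape, so the entity form is the one the browser understands.

```html
<!-- Option 1: write each inner double quote as the &quot; entity. -->
<tag attribute="I love &quot;quotes&quot;"></tag>

<!-- Option 2: wrap the value in single quotes instead. -->
<tag attribute='I love "quotes"'></tag>
```

Both lines give the attribute the same value: I love "quotes".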
A "character" is a single symbol. In English, we have spaces, punctuation marks, letters, numbers, and a bunch of other symbols. Foreign-script characters (中国普通话 / العربية / עברית / मराठी / 日本の / 한국의 / русский / español / français) all complicate this. In fact, several of those listed here may display wrongly because they're missing important characters which you can't see. The modern standardization of characters is accomplished by the Unicode standard. It is called Unicode because it is universal: any character you would ever need can be found in it. For instance, Egyptian hieroglyphs: 𓀀. Or Byzantine musical notation: 𝀀. Chances are your font doesn't support those, but the underlying characters are distinct.
Here are some characters which are important but which you may not think about.
Here we have an example sentence. Do not understate the importance of the non-breaking space, despite its subtle behavior.
If we want the phrase "non-breaking space" to remain on one line, we can replace the space in it with a non-breaking version:
Here we have an example sentence. Do not understate the importance of the non-breaking space, despite its subtle behavior.
We have introduced a new problem, putting "non-" on its own line. We'll replace the hyphen (-) with the non-breaking hyphen (‑) and try again:
Here we have an example sentence. Do not understate the importance of the non‑breaking space, despite its subtle behavior.
Certain characters are somewhat special. You will have difficulty putting a < into an HTML document. Why? Because the browser has to assume you're trying to start writing a tag, even though you aren't! There is a fix for this issue, and it involves character entities.
To write a <, simply write &lt; instead. The ampersand (&) has a special purpose in HTML: it denotes character entities like &lt;. Here's a bunch with simple English names:
&lt; &gt; &amp; &quot; &apos; &nbsp; &shy;
These correspond to the characters < > & " ' non-breaking space and soft (or shy) hyphen, respectively. You may find these useful. There are others, but beyond these, few are frequently useful.
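A small sketch of named character entities in use. Each entity begins with an ampersand and ends with a semicolon; the sentences themselves are invented for illustration.

```html
<!-- Displays: In HTML, a tag starts with < and ends with >. -->
<p>In HTML, a tag starts with &lt; and ends with &gt;.</p>

<!-- Displays: Fish & chips -->
<p>Fish &amp; chips</p>

<!-- The two words are kept together on one line. -->
<p>non-breaking&nbsp;space</p>
```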
If you ever want to hyphenate a word to split it across two lines, it's preferable to use the soft hyphen. The hyphen is displayed only when the word is actually split across two lines. This way, if the layout of the page changes and the word is no longer split, the hyphen won't be visible. That said, avoid doing this too, because the browser's text layout is usually good enough anyway.
Every character in Unicode is also assigned a unique numeric identifier. You may run into these occasionally. As HTML entities, they are written thus: &#1234; (which displays as Ӓ) or &#x89AF; (which displays as 覯), where the latter is a hexadecimal number.
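To make the numeric form concrete, here is a sketch that writes the same characters by number. A numeric entity is &# followed by a decimal number, or &#x followed by a hexadecimal number, and then a semicolon.

```html
<!-- 60 in decimal and 3C in hexadecimal are the same number, so both display as < -->
<p>&#60; and &#x3C;</p>

<!-- The non-breaking hyphen, by decimal and then by hexadecimal number. -->
<p>non&#8209;breaking and non&#x2011;breaking</p>
```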
You've already seen some tags. Here is a big list, and a short explanation of what they do.
The "div" tag, short for "division", divides the page into pieces. By default, a div looks no different from its contents. Essentially, divs are invisible containers capable of setting the width and height of their contents, along with a few other features when you style them. Divs are boxes for content, not paragraphs. A line break occurs after each one.
The "p" tag, short for "paragraph" is what it sounds like. A paragraph of text should go inside. They are very similar to div tags, with a few differences. p tags should not contain p tags. They should be limited to one paragraph. There is a line break after each one.
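A minimal sketch of how div and p tags are typically combined: the div acts as a box and each p holds exactly one paragraph. The text is placeholder content.

```html
<div>
<p>First paragraph of this section.</p>
<p>Second paragraph of this section.</p>
</div>
```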
The "a" tag, short for "anchor", functions primarily as a hyperlink. Anchors are also the targets you land on at sites like Wikipedia when you follow a link to a particular spot on a page, such as a sub-title. Their most important attribute is href, which points to where they will send you. Any content inside of an a tag, such as text or images, will be clickable. They are completely inline.
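The sketch below shows both uses of the a tag: a link to another site, and a link to a spot on the same page. The id value "tables" is a hypothetical name chosen for this example; a href beginning with # points at the element carrying that id.

```html
<!-- A link to another site. -->
<a href="http://google.com">Go to Google</a>

<!-- A link to a spot further down the same page. -->
<a href="#tables">Jump to the section on tables</a>

<!-- The spot being linked to. -->
<a id="tables">Tables</a>
```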
The "img" tag, short for "image" shows an image, as you would expect. The location of the image file is given by the src attribute. You can also set width and height attributes (whose units are pixels), which do what you'd expect. If you set only one of them, the other dimension of the image will be scaled to match the image's ratio. It is typical, however, for width/height to be set in style information, and thus omitted from the HTML entirely.
Images are self-closing. See the <br> tag.
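An illustrative img tag follows. The file name logo.png is made up, and the alt attribute (a text description used when the image cannot be shown) is an addition not discussed above. Only the width is set, so the height scales to match.

```html
<!-- No closing tag: an image cannot have children. -->
<img src="logo.png" alt="Team logo" width="200">
```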
The b tag bolds its contents. Avoid using this tag for emphasis; it was originally the only effective way to bold text. To bold for emphasis, use the strong tag instead. You should use b when there is a typographic reason to bold specifically, such as following a particular citation style or brand standard.
The strong tag, by default, makes text appear bold. However, it is also semantic, meaning that it carries meaning. strong is more accessible to the blind, for instance, because a semantic accessibility-enabled web-browser will know to increase the volume of text spoken aloud inside a strong tag. Do not use the strong tag when you are making text bold to match a brand or typographic standard.
The i tag italicizes its contents. Avoid using this tag for emphasis; it was originally the only effective way to italicize text. To italicize for emphasis, use the em tag instead. You should use i when there is a typographic reason to italicize specifically, such as following a particular citation style or brand standard.
The em tag, short for "emphasis", by default, makes text appear italicized. However, it is also semantic, meaning that it carries meaning. em is more accessible to the blind, for instance, because a semantic accessibility-enabled web-browser will know to change the tone of text spoken aloud inside an em tag. Do not use the em tag when you are making text italicized to match a brand or typographic standard.
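A sketch contrasting the four tags. The sentences are invented; the point is that strong and em mark emphasis, while b and i mark purely typographic bold and italics.

```html
<!-- Emphasis that carries meaning: strong and em. -->
<p><strong>Warning:</strong> do <em>not</em> unplug the robot.</p>

<!-- Typographic convention only: b for a brand name, i for a cited title. -->
<p>Read about <b>AdamBots</b> in <i>Title of Book</i>.</p>
```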
The "span" tag "spans" a piece of content. Spans are by default totally invisible and do not affect layout in any way. Their purpose is to allow you to style or select a piece of content. For example, a span can make a few words of text red. Spans are discouraged, however, because they divorce content and appearance. Prefer semantic elements, such as strong, em, b, i, or u, where one fits.
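A minimal sketch of a span used for styling. It relies on the style attribute, which is not covered above, to set the color directly; in practice the color would more likely come from a stylesheet.

```html
<p>Only <span style="color: red">these words</span> are red.</p>
```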
The "script" tag encloses a script, which is a small program written (almost always) in JavaScript. Scripts perform almost all behavior on a page, in particular animation or interactivity. (Modern styling and forms allow for dynamic content without scripts). If you do not know how to program, do not modify the contents of the script in any way. Scripts are special in that there are no nested tags within them: until the "</script>" is found, HTML parsing is effectively ignored.
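The sketch below is a tiny script, included only to show that its contents are not treated as HTML. The < inside it is part of the JavaScript and does not start a tag.

```html
<script>
// Runs when the browser reaches this point in the page.
// The < below is a JavaScript comparison, not the start of a tag.
console.log("1 < 2 is " + (1 < 2));
</script>
```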
The "ul" and "ol" tags stand for "unordered list" and "ordered list" respectively. They are supposed to contain only <li> ("list item") elements, which are the items in the list. Ordered lists are numbered, while unordered lists are bulleted. There are a couple of unordered lists on this page.
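For illustration, here is one list of each kind. The items are placeholders.

```html
<!-- Bulleted list. -->
<ul>
<li>Wheels</li>
<li>Motors</li>
</ul>

<!-- Numbered list: the browser supplies 1, 2, 3. -->
<ol>
<li>Design</li>
<li>Build</li>
<li>Test</li>
</ol>
```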
"br" is short for "break." They are linebreaks. Inserted essentially anywhere, they cause text to move onto the next line. They are "self-closing" which means that they do not have a closing tag. In modern HTML5, that means that the complete line break can be written as either
<br/>(a proper self closing tag) or simply
<br>(since they are not allowed to contain content anyway, the self-close is implied).
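A short sketch of br inside a paragraph, using the plain form without the slash. The text is a placeholder.

```html
<p>Roses are red,<br>
Violets are blue.</p>
```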
"hr" is short for "horizontal rule." They act like a line break, but also draw a line across the page, like this:
They are semantic, intended to divide sections of a document. We abuse them on the side to look like strips cut out of the background.
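A minimal sketch of hr separating two sections. Like br, it has no closing tag. The paragraph text is a placeholder.

```html
<p>End of the first section.</p>
<hr>
<p>Start of the second section.</p>
```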
Tables are complicated. Getting them right is very, very hard. Historically, tables were used excessively to control the appearance of content on a page, because they were accidentally the easiest way to lay the page out. This practice is now severely frowned upon, and has mostly died out.
Fully describing the table rules takes far too much time. Here's an abridged version.
Tables contain a series of tr tags (short for "table row"), each of which contains the same number of td tags (short for "table data"). td tags contain whatever cell content you would like.
Occasionally, tables will also contain thead and tbody and tfoot tags. These are the head, body, and foot of the table, and are semantic. They are all optional, but occasionally necessary.
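To give a feel for the structure, here is a small table with a head and a body. The names and roles are invented placeholders, and the sketch uses only the tags described above; it makes no claim about how the students page is actually built.

```html
<table>
<thead>
<!-- Every row has the same number of cells: two. -->
<tr><td>Name</td><td>Role</td></tr>
</thead>
<tbody>
<tr><td>Example Student</td><td>Programming</td></tr>
<tr><td>Example Mentor</td><td>Mechanical</td></tr>
</tbody>
</table>
```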
See our students page for an example of a table.
Avoid writing tables yourself. Get them written by a script for you. We will provide assistance with these scripts for the student / mentor pages, because writing the HTML is tedious and the WordPress editor really messes it up.