
\chapter{HTML and CSS}\label{s:htmlcss}

HTML is the standard way to represent documents for presentation in web browsers,
and CSS is the standard way to describe how those documents should look.
Both are more complicated than they should have been,
but in order to create web applications,
we need to understand a little of both.

\section{Formatting}\label{s:htmlcss-formatting}

An HTML \gref{g:document}{document} contains \gref{g:element}{elements} and text
(and possibly other things that we will ignore for now).
Elements are shown using \gref{g:tag}{tags}:
an opening tag \texttt{{\textless}tagname{\textgreater}{}} shows where the element begins,
and a corresponding closing tag \texttt{{\textless}/tagname{\textgreater}{}} (with a leading slash) shows where it ends.
If there's nothing between the two, we can write \texttt{{\textless}tagname/{\textgreater}{}} (with a trailing slash).

A document's elements must form a \gref{g:tree}{tree},
i.e.,
they must be strictly nested.
This means that if Y starts inside X,
Y must end before X ends,
so \texttt{{\textless}X{\textgreater}{}...{\textless}Y{\textgreater}{}...{\textless}/Y{\textgreater}{}{\textless}/X{\textgreater}{}} is legal,
but \texttt{{\textless}X{\textgreater}{}...{\textless}Y{\textgreater}{}...{\textless}/X{\textgreater}{}{\textless}/Y{\textgreater}{}} is not.
Finally,
every document should have a single \gref{g:root-element}{root element} that encloses everything else,
although browsers aren't strict about enforcing this.
In fact,
most browsers are pretty relaxed about enforcing any kind of rules at all,
since most people don't obey them anyway.
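
The nesting rule can be checked mechanically with a stack.
The sketch below is illustrative only:
the name \texttt{isStrictlyNested} is our own,
and it assumes the tags have already been separated from the text
and have no attributes or trailing slashes.

\begin{minted}{javascript}
// Check that a sequence of tags is strictly nested.
// Assumes tags look like '<X>' or '</X>' with no attributes (a simplification).
const isStrictlyNested = (tags) => {
  const open = []
  for (const tag of tags) {
    if (tag.startsWith('</')) {
      // A closing tag must match the most recently opened element.
      const name = tag.slice(2, -1)
      if (open.pop() !== name) {
        return false
      }
    } else {
      // An opening tag starts a new element inside the current one.
      open.push(tag.slice(1, -1))
    }
  }
  // Every element that was opened must also have been closed.
  return open.length === 0
}

console.log(isStrictlyNested(['<X>', '<Y>', '</Y>', '</X>']))
console.log(isStrictlyNested(['<X>', '<Y>', '</X>', '</Y>']))
\end{minted}

\noindent
The first call reports that the tags are properly nested and the second that they are not,
which matches the two cases described above.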

\section{Text}\label{s:htmlcss-text}

The text in an HTML page is normal printable text.
However,
since \texttt{{\textless}} and \texttt{{\textgreater}{}} are used to show where tags start and end,
we must use \gref{g:escape-sequence}{escape sequences} to represent them,
just as we use \texttt{\textbackslash{}"} to represent a literal double-quote character
inside a double-quoted string in JavaScript.
In HTML,
escape sequences are written \texttt{\&name;},
i.e.,
an ampersand, the name of the character, and a semi-colon.
A few common escape sequences are shown in \tblref{t:htmlcss-escapes}.


\begin{table}
\begin{longtable}{lll}
Name & Escape Sequence & Character\tabularnewline
Less than & \texttt{\&lt;} & {\textless}\tabularnewline
Greater than & \texttt{\&gt;} & {\textgreater}{}\tabularnewline
Ampersand & \texttt{\&amp;} & \&\tabularnewline
Copyright & \texttt{\&copy;} & ©\tabularnewline
Plus/minus & \texttt{\&plusmn;} & ±\tabularnewline
Micro & \texttt{\&micro;} & µ\tabularnewline
\end{longtable}
\caption{HTML Escapes}
\label{t:htmlcss-escapes}
\end{table}

The first two are self-explanatory,
and \texttt{\&amp;} is needed so that we can write a literal ampersand
(just as \texttt{\textbackslash{}\textbackslash{}} is needed in JavaScript strings so that we can write a literal backslash).
\texttt{\&copy;}, \texttt{\&plusmn;}, and \texttt{\&micro;} are usually not needed any longer,
since most editors will allow us to put non-ASCII characters directly into documents these days,
but occasionally we will run into older or stricter systems.
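
When text comes from data rather than being typed by hand,
the escaping is usually done by a small helper function.
The sketch below is one minimal way to do it;
the name \texttt{escapeHtml} is our own,
and it only handles the three special characters discussed above.

\begin{minted}{javascript}
// Replace the characters that HTML treats specially with escape sequences.
// The ampersand must be handled first so that the ampersands introduced
// by the other replacements are not escaped a second time.
const escapeHtml = (text) => {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

console.log(escapeHtml('if (a < b && b > c)'))
\end{minted}

\noindent
The order of the replacements matters:
if the ampersand were replaced last,
the ampersands in \texttt{\&lt;} and \texttt{\&gt;} would themselves be escaped.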

\section{Pages}\label{s:htmlcss-pages}

An HTML page should have:

\begin{itemize}
\item
  a single \texttt{html} element that encloses everything else
\item
  a single \texttt{head} element that contains information about the page
\item
  a single \texttt{body} element that contains the content to be displayed
\end{itemize}

It doesn't matter whether or how we indent the tags showing these elements and the content they contain,
but laying them out on separate lines
and indenting to show nesting
helps human readers.
Well-written pages also use comments, just like code:
these start with \texttt{{\textless}!-\/-} and end with \texttt{-\/-{\textgreater}{}}.
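Unfortunately,
comments cannot be nested:
a comment ends at the first \texttt{-\/-{\textgreater}{}} the browser finds,
so if you comment out a section of a page that already contains a comment,
everything after the end of the inner comment is treated as ordinary content again.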

Here's an empty HTML page with the structure described above:

\begin{minted}{html}
<html>
  <head>
    <!-- description of page goes here -->
  </head>
  <body>
    <!-- content of page goes here -->
  </body>
</html>
\end{minted}

Nothing shows up if we open this in a browser,
so let's add a little content:

\begin{minted}{html}
<html>
  <head>
    <title>This text is displayed in the browser bar</title>
  </head>
  <body>
    <h1>Displayed Content Starts Here</h1>
    <p>
      This course introduces core features of <em>JavaScript</em>
      and shows where and how to use them.
    </p>
    <!-- The word "JavaScript" is in italics (emphasis) in the preceding paragraph. -->
  </body>
</html>
\end{minted}

\begin{itemize}
\item
  The \texttt{title} element inside \texttt{head} gives the page a title.
  This is displayed in the browser bar when the page is open,
  but is \emph{not} displayed as part of the page itself.
\item
  The \texttt{h1} element is a level-1 heading;
  we can use \texttt{h2}, \texttt{h3}, and so on to create sub-headings.
\item
  The \texttt{p} element is a paragraph.
\item
  Inside a heading or a paragraph,
  we can use \texttt{em} to \emph{emphasize} text.
  We can also use \texttt{strong} to make text \textbf{stronger}.
  Tags like these are better than tags like \texttt{i} (for italics) or \texttt{b} (for bold)
  because they signal intention rather than forcing a particular layout.
  Someone who is visually impaired, or someone using a small-screen device,
  may want emphasis of various kinds displayed in different ways.
\end{itemize}

\section{Attributes}\label{s:htmlcss-attributes}

Elements can be customized by giving them \gref{g:attribute}{attributes},
which are written as \texttt{name="value"} pairs inside the element's opening tag.
For example:

\begin{minted}{html}
<h1 align="center">A Centered Heading</h1>
\end{minted}

\noindent
centers the \texttt{h1} heading on the page, while:

\begin{minted}{html}
<p class="disclaimer">This planet provided as-is.</p>
\end{minted}

\noindent
marks this paragraph as a disclaimer.
That doesn't mean anything special to HTML,
but as we'll see later,
we can define styles based on the \texttt{class} attributes of elements.

An attribute's name may appear at most once in any element,
just as a key can appear only once in any JavaScript object,
so \texttt{{\textless}p\ align="left"\ align="right"{\textgreater}{}...{\textless}/p{\textgreater}{}} is illegal.
If we want to give an attribute multiple values---for example,
if we want an element to have several classes---we put all the values in one string.
Unfortunately,
as the example below shows,
HTML is inconsistent about whether values should be separated by spaces or semi-colons:

\begin{minted}{html}
<p class="disclaimer optional" style="color: blue; font-size: 200%;">
\end{minted}

However they are separated,
values are supposed to be quoted,
but in practice we can often get away with \texttt{name=value}.
And for Boolean attributes whose values are just true or false,
we can even sometimes just get away with \texttt{name} on its own.

\section{Lists}\label{s:htmlcss-lists}

Headings and paragraphs are all very well,
but data scientists need more.
To create an unordered (bulleted) list,
we use a \texttt{ul} element,
and wrap each item inside the list in \texttt{li}.
To create an ordered (numbered) list,
we use \texttt{ol} instead of \texttt{ul},
but still use \texttt{li} for the list items.

\begin{minted}{html}
<ul>
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ul>
\end{minted}

\begin{itemize}
\item
  first
\item
  second
\item
  third
\end{itemize}

\begin{minted}{html}
<ol>
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ol>
\end{minted}

\begin{enumerate}
\item
  first
\item
  second
\item
  third
\end{enumerate}

Lists can be nested by putting the inner list's \texttt{ul} or \texttt{ol}
inside one of the outer list's \texttt{li} elements:

\begin{minted}{html}
<ol>
  <li>Major A
    <ol>
      <li>minor p</li>
      <li>minor q</li>
    </ol>
  </li>
  <li>Major B
    <ol>
      <li>minor r</li>
      <li>minor s</li>
    </ol>
  </li>
</ol>
\end{minted}

\begin{enumerate}
\item
  Major A

  \begin{enumerate}
  \item
    minor p
  \item
    minor q
  \end{enumerate}
\item
  Major B

  \begin{enumerate}
  \item
    minor r
  \item
    minor s
  \end{enumerate}
\end{enumerate}

\section{Tables}\label{s:htmlcss-tables}

Lists are a great way to get started,
but if we \emph{really} want to impress people with our data science skills,
we need tables.
Unsurprisingly,
we use the \texttt{table} element to create these.
Each row is a \texttt{tr} (for ``table row''),
and within rows,
column items are shown with \texttt{td} (for ``table data'')
or \texttt{th} (for ``table heading'').

\begin{minted}{html}
<table>
  <tr> <th>Alkali</th>   <th>Noble Gas</th> </tr>
  <tr> <td>Hydrogen</td> <td>Helium</td>    </tr>
  <tr> <td>Lithium</td>  <td>Neon</td>      </tr>
  <tr> <td>Sodium</td>   <td>Argon</td>     </tr>
</table>
\end{minted}

\begin{longtable}{ll}
Alkali & Noble Gas\tabularnewline
Hydrogen & Helium\tabularnewline
Lithium & Neon\tabularnewline
Sodium & Argon\tabularnewline
\end{longtable}
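
Since tables usually hold data,
their HTML is often generated by a program rather than typed by hand.
The function below is a minimal sketch of that idea:
the name \texttt{makeTable} is our own,
and it assumes the cell values contain no characters that need to be escaped.

\begin{minted}{javascript}
// Build the HTML for a table from a row of headings and rows of data.
// Assumes cell text contains no characters that need escaping.
const makeTable = (headings, rows) => {
  // Wrap each value in the given tag and put the cells side by side.
  const cells = (tag, values) =>
    values.map(value => `<${tag}>${value}</${tag}>`).join(' ')
  const lines = [
    `  <tr> ${cells('th', headings)} </tr>`,
    ...rows.map(row => `  <tr> ${cells('td', row)} </tr>`)
  ]
  return ['<table>', ...lines, '</table>'].join('\n')
}

console.log(makeTable(
  ['Alkali', 'Noble Gas'],
  [['Hydrogen', 'Helium'], ['Lithium', 'Neon'], ['Sodium', 'Argon']]
))
\end{minted}

\noindent
The output has the same structure as the hand-written table shown earlier,
with one \texttt{tr} per row and \texttt{th} used only for the headings.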

Do \emph{not} use tables to create multi-column layouts:
there's a better way,
as the Bootstrap example at the end of this chapter shows.

\section{Links}\label{s:htmlcss-links}

Links to other pages are what make HTML hypertext.
Confusingly,
the element used to show a link is called \texttt{a} (short for ``anchor'').
The text inside the element is displayed and (usually) highlighted for clicking.
Its \texttt{href} attribute specifies what the link is pointing at;
both local filenames and URLs are supported.
Oh,
and we can use \texttt{{\textless}br/{\textgreater}{}} to force a line break in text
(with a trailing slash inside the tag, since the \texttt{br} element doesn't contain any content):

\begin{minted}{html}
<a href="https://nodejs.org/">Node.js</a>
<br/>
<a href="https://facebook.github.io/react/">React</a>
<br/>
<a href="../index.html">home page (relative path)</a>
\end{minted}

This appears as:

\begin{minted}{text}
Node.js
React
home page (relative path)
\end{minted}

\noindent
with the usual clickability.

\section{Images}\label{s:htmlcss-images}

Images can be stored inside HTML pages in two ways:
by using SVG (which we will discuss in \chapref{s:vis})
or by encoding the image as text and including that text in the body of the page,
which is clever,
but makes the source of the pages very hard to read.
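
As a rough illustration of the second approach,
the sketch below builds a \texttt{data:} URL by encoding bytes as base64 text.
The function name \texttt{toDataUrl} is our own,
and the four bytes used here are only the start of a PNG file's signature rather than a complete image,
so the result would not actually display.

\begin{minted}{javascript}
// Encode raw bytes as a data: URL that can be used as the src of an img.
// Buffer is built into Node.js, so nothing needs to be imported.
const toDataUrl = (mimeType, bytes) => {
  const encoded = Buffer.from(bytes).toString('base64')
  return `data:${mimeType};base64,${encoded}`
}

// Placeholder bytes only: a real image would be read from a file.
const src = toDataUrl('image/png', [0x89, 0x50, 0x4e, 0x47])
console.log(`<img src="${src}" alt="placeholder image" />`)
\end{minted}

\noindent
A real image of even modest size produces thousands of characters of encoded text,
which is why pages built this way are hard to read.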

It is far more common to store each image in a separate file
and refer to that file using an \texttt{img} element
(which also allows us to use the image in many places without copying it).
The \texttt{src} attribute of the \texttt{img} tag specifies where to find the file;
as with the \texttt{href} attribute of an \texttt{a} element,
this can be either a URL or a local path.
Every \texttt{img} should also include a \texttt{title} attribute
(which most browsers show as a tooltip when the pointer hovers over the image)
and an \texttt{alt} attribute with some descriptive text to aid accessibility and search engines.
(We have wrapped and broken lines in the example below so that they will display nicely in the printed version.)

\begin{minted}{html}
    <img src="./html5.png" title="HTML5 Logo"
         alt="Displays the HTML5 logo using a local path" />
    <img src="https://github.com/software-tools-in-javascript/js-vs-rc/blob/master/src/htmlcss/html5.png"
         title="HTML5 Logo"
         alt="Displays the HTML5 logo using a URL" />
\end{minted}

Two things are worth noting here:

\begin{enumerate}
\item
  Since \texttt{img} elements don't contain any text,
  they are often written with the trailing-slash notation.
  However,
  they are also often written as \texttt{{\textless}img\ src="..."{\textgreater}{}} without any slashes at all.
  HTML5 allows this and browsers will understand it,
  but some stricter software packages will complain.
\item
  If an image file is referred to using a path rather than a URL,
  that path can be either \gref{g:relative-path}{relative} or \gref{g:absolute-path}{absolute}.
  If it's a relative path,
  it's interpreted starting from where the web page is located;
  if it's an absolute path,
  it's interpreted relative to wherever the web browser thinks the \gref{g:root-directory}{root directory} of the filesystem is.
  As we will see in \chapref{s:server},
  this can change from one installation to the next,
  so you should always try to use relative paths,
  except where you can't.
  It's all very confusing\ldots{}
\end{enumerate}
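
The way a browser combines a path with the page's own location can be imitated
with the \texttt{URL} class that is built into both browsers and Node.js.
In the sketch below,
\texttt{https://example.org/site/pages/index.html} is a made-up page address used only for illustration,
and \texttt{resolve} is our own helper name.

\begin{minted}{javascript}
// Address of the page that contains the img elements (hypothetical).
const page = 'https://example.org/site/pages/index.html'

// Resolve a path the way a browser would for an image on that page.
const resolve = (path) => new URL(path, page).href

console.log(resolve('./html5.png'))   // relative: same directory as the page
console.log(resolve('../html5.png'))  // relative: one directory up
console.log(resolve('/html5.png'))    // absolute: starts from the site's root
\end{minted}

\noindent
The two relative paths are resolved starting from the directory that holds the page,
while the path with a leading slash is resolved from the top of the site,
wherever the server has decided that is.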

\section{Cascading Style Sheets}\label{s:htmlcss-css}

When HTML first appeared, people styled elements by setting their attributes:

\begin{minted}{html}
<html>
  <body>
    <h1 align="center">Heading is Centered</h1>
    <p>
      <b>Text</b> can be highlighted
      or <font color="coral">colorized</font>.
    </p>
  </body>
</html>
\end{minted}

Many still do,
but a better way is to use \gref{g:css}{Cascading Style Sheets} (CSS).
These allow us to define a style once and use it many times,
which makes it much easier to maintain consistency.
(I was going to say ``\ldots{}and keep pages readable'',
but given how complex CSS can be,
that's not a claim I feel I can make.)
Here's a page that uses CSS instead of direct styling:

\begin{minted}{html}
<html>
  <head>
    <link rel="stylesheet" href="simple-style.css" />
  </head>
  <body>
    <h1 class="title">Heading is Centered</h1>
    <p>
      <span class="keyword">Text</span> can be highlighted
      or <span class="highlight">colorized</span>.
    </p>
  </body>
</html>
\end{minted}

The \texttt{head} contains a link to a CSS file stored in the same directory as the page itself;
we could use a URL here instead of a relative path,
but the \texttt{link} element \emph{must} have the \texttt{rel="stylesheet"} attribute.
Inside the page,
we then set the \texttt{class} attribute of each element we want to style.

The file \texttt{simple-style.css} looks like this:

\begin{minted}{css}
h1.title {
  text-align: center;
}
span.keyword {
  font-weight: bold;
}
.highlight {
  color: coral;
}
\end{minted}

Each entry has the form \texttt{tag.class} followed by a group of properties inside curly braces,
and each property is a key-value pair.
We can omit the class and just write (for example):

\begin{minted}{css}
p {
  font-style: italic;
}
\end{minted}

\noindent
in which case the style applies to every element with that tag.
If we do this,
we can override general rules with specific ones:
the style for a disclaimer paragraph is defined by \texttt{p} with overrides defined by \texttt{p.disclaimer}.
We can also omit the tag and simply use \texttt{.class},
in which case every element with that class has that style.

As suggested by the earlier discussion of separators,
elements may have multiple values for class,
as in \texttt{{\textless}span\ class="keyword\ highlight"{\textgreater}{}...{\textless}/span{\textgreater}{}}.
(The \texttt{span} element simply marks a region of text,
but has no effect unless it's styled.)

Together,
these two features are a common source of confusion with CSS
(though unfortunately not the only one):
if we can override general rules with specific ones
and also give an element several classes,
how do we keep track of which rules will apply to an element with multiple classes?
A detailed discussion of the order of precedence for CSS rules
is outside the scope of this tutorial.
We recommend that those likely to work often with stylesheets read (and consider bookmarking)
\href{https://www.w3schools.com/css/css_specificity.asp}{this W3Schools page}.
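
To give a flavor of how precedence works,
the sketch below scores the simple selectors used in this chapter.
It relies on two facts about CSS:
an ID counts for more than any number of classes,
which in turn count for more than a tag name,
and when two rules tie,
the one that appears later wins.
The function names are our own,
and the code deliberately ignores everything else that affects precedence,
such as inline \texttt{style} attributes and more complex selectors.

\begin{minted}{javascript}
// Score a simple selector such as 'p', 'p.disclaimer', or '#major'
// as [number of IDs, number of classes, number of tag names].
// Assumption: no combinators, attribute selectors, or pseudo-classes.
const specificity = (selector) => {
  const ids = (selector.match(/#[\w-]+/g) || []).length
  const classes = (selector.match(/\.[\w-]+/g) || []).length
  const tags = /^[a-zA-Z]/.test(selector) ? 1 : 0
  return [ids, classes, tags]
}

// Compare two scores position by position, most important first.
const compare = (left, right) => {
  for (let i = 0; i < left.length; i += 1) {
    if (left[i] !== right[i]) {
      return left[i] - right[i]
    }
  }
  return 0
}

// Given selectors in the order they appear in a stylesheet,
// return the one whose rule takes precedence (later rules win ties).
const winningSelector = (selectors) => {
  return selectors.reduce((best, current) => {
    return compare(specificity(current), specificity(best)) >= 0 ? current : best
  })
}

console.log(winningSelector(['p.disclaimer', 'p']))
console.log(winningSelector(['.keyword', '.highlight']))
\end{minted}

\noindent
In the first call \texttt{p.disclaimer} wins even though \texttt{p} comes later,
because it is more specific;
in the second the two selectors tie,
so the later one, \texttt{.highlight}, wins.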

One other thing CSS can do is match specific elements.
We can label particular elements uniquely within a page using the \texttt{id} attribute,
then refer to those elements using \texttt{\#name} as a \gref{g:selector}{selector}.
For example,
if we create a page that gives two spans unique IDs:

\begin{minted}{html}
<html>
  <head>
    <link rel="stylesheet" href="selector-style.css" />
  </head>
  <body>
    <p>
      First <span id="major">keyword</span>.
    </p>
    <p>
      Full <span id="minor">explanation</span>.
    </p>
  </body>
</html>
\end{minted}

\noindent
then we can style those spans like this:

\begin{minted}{css}
#major {
  text-decoration: underline red;
}
#minor {
  text-decoration: overline blue;
}
\end{minted}

\begin{aside}{Internal Links}
  We can also link directly to an element within a page using \texttt{\#name}
  inside the \texttt{href} attribute of a link.
  For example,
  \texttt{{\textless}a\ href="selectors.html\#major"{\textgreater}{}some\ text{\textless}/a{\textgreater}{}}
  refers to the \texttt{\#major} element in \texttt{selectors.html},
  while \texttt{{\textless}a\ href="selectors.html\#minor"{\textgreater}{}some\ text{\textless}/a{\textgreater}{}}
  refers to the \texttt{\#minor} element.
  This is particularly useful \emph{within} pages:
  \texttt{{\textless}a\ href="\#major"{\textgreater}{}jump{\textless}/a{\textgreater}{}}
  takes us straight to the \texttt{\#major} element within this page.
  Internal links like this are often used for cross-referencing and to create a table of contents.
\end{aside}

\section{Bootstrap}\label{s:htmlcss-bootstrap}

CSS can become very complicated very quickly,
so most people use a framework to take care of the details.
One of the most popular is \href{https://getbootstrap.com/}{Bootstrap}
(which is what we're using to style this website).
Here's the entire source of a page that uses Bootstrap
to create a two-column layout with a banner at the top:

\begin{minted}{html}
<html>
  <head>
    <link rel="stylesheet"
          href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css">
    <style>
      div {
        border: solid 1px;
      }
    </style>
  </head>
  <body>

    <div class="jumbotron text-center">
      <h1>Page Title</h1>
      <p>Resize this page to see the layout adjust dynamically.</p>
    </div>

    <div class="container">
      <div class="row">
        <div class="col-sm-4">
          <h2>First column is 4 wide</h2>
          <p>Text here goes</p>
          <p>in the column</p>
        </div>
        <div class="col-sm-8">
          <h2>Second column is 8 wide</h2>
          <p>Text over here goes</p>
          <p>in the other column</p>
        </div>
      </div>
    </div>

  </body>
</html>
\end{minted}

The page opens by loading Bootstrap from the web;
we can also download \texttt{bootstrap.min.css} and refer to it with a local path.
(The \texttt{.min} in the file's name signals that the file has been \gref{g:minimization}{minimized}
so that it will load more quickly.)

The page then uses a \texttt{style} element to create an \gref{g:internal-style-sheet}{internal style sheet}
to put a solid one-pixel border around every \texttt{div}
so that we can see the regions of the page more clearly.
Defining styles in the page header is generally a bad idea,
but it's a good way to test things quickly.
Oh,
and a \texttt{div} just marks a region of a page without doing anything to it,
just as a \texttt{span} marks a region of text without changing its appearance
unless we apply a style.

The first \texttt{div} creates a header box (called a ``jumbotron'') and centers its text.
The second \texttt{div} is a container,
which creates a bit of margin on the left and right sides of our content.
Inside that container is a row with two columns,
one 4/12 as wide as the row and the other 8/12 as wide.
(Bootstrap uses a 12-column system because 12 has lots of divisors.)

Bootstrap is \gref{g:responsive-design}{responsive}:
elements change size as the page grows narrower,
and are then stacked when the screen becomes too small to display them side by side.

We've left out many other aspects of HTML and CSS,
such as figure captions,
multi-column table cells,
and why it's so hard to center text vertically within a \texttt{div}.
One thing we will return to in \chapref{s:interactive} is
how to include interactive elements like buttons and forms in a page.
Handling those is part of why JavaScript was invented in the first place,
but we need more experience before tackling them.

\section{Exercises}\label{s:htmlcss-exercises}

\exercise{Cutting Corners}

What does your browser display if you forget to close a paragraph or list item tag
like this:

\begin{minted}{html}
<p>This paragraph starts but doesn't officially end.

<p>Another paragraph starts here but also doesn't end.

<ul>
  <li>First item in the list isn't closed.
  <li>Neither is the second.
</ul>
\end{minted}

\begin{enumerate}
\item
  What happens if you don't close a \texttt{ul} or \texttt{ol} list?
\item
  Is that behavior consistent with what happens when you omit \texttt{{\textless}/p{\textgreater}{}} or \texttt{{\textless}/li{\textgreater}{}}?
\end{enumerate}

\exercise{Mix and Match}

\begin{enumerate}
\item
  Create a page that contains a 2x2 table,
  each cell of which has a three-item bullet-point list.
  How can you reduce the indentation of the list items within their cells using CSS?
\item
  Open your page in a different browser (e.g., Firefox or Edge).
  Do the two browsers display your indented lists consistently?
\item
  Why do programs behave inconsistently?
  Why do programmers do this to us?
  Why?
  Why why why why why?
\end{enumerate}

\exercise{Naming}

What does the \texttt{sm} in Bootstrap's \texttt{col-sm-4} and \texttt{col-sm-8} stand for?
What other options could you use instead?
Why do web developers still use FORTRAN-style names in the 21st century?

\exercise{Color}

HTML and CSS define names for a small number of colors.
All other colors must be specified using \gref{g:rgb}{RGB} values.
Write a small JavaScript program that creates an HTML page
that displays the word \texttt{color} in 100 different randomly-generated colors.
Compare this to the color scheme used in your departmental website.
Which one hurts your eyes less?

\exercise{Units}

What different units can you use to specify text size in CSS?
What do they mean?
What does \emph{anything} mean, when you get right down to it?

\section*{Key Points}

\input{keypoints/htmlcss}