forked from marijnh/Eloquent-JavaScript
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path12_browser.txt
396 lines (325 loc) · 17.5 KB
/
12_browser.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
:chap_num: 12
:prev_link: 11_language
:next_link: 13_dom
= JavaScript and the Browser =
[chapterquote="true"]
[quote,Douglas Crockford,The JavaScript Programming Language (video lecture)]
____
The browser is a really hostile programming environment.
____
(((Crockford+++,+++ Douglas)))(((JavaScript,history of)))(((World Wide
Web)))The next part of this book will talk about web browsers. Without
web ((browser))s, there would be no JavaScript. And even if there
were, no one would ever have paid any attention to it.
(((decentralization)))(((compatibility)))Web technology has, from the
start, been decentralized, not just technically but also in the
way it has evolved. Various browser vendors have added new
functionality in ad hoc and sometimes poorly thought out ways, which
then sometimes ended up being adopted by others and finally set down
as a ((standard)).
This is both a blessing and a curse. On the one hand, it is empowering
to not have a central party control a system but have it be improved
by various parties working in loose ((collaboration)) (or,
occasionally, open hostility). On the other hand, the haphazard way in
which the Web was developed means that the resulting system is not
exactly a shining example of internal ((consistency)). In fact, some
parts of it are downright messy and confusing.
== Networks and the Internet ==
Computer ((network))s have been around since the 1950s. If you put
cables between two or more computers and allow them to send data back
and forth through these cables, you can do all kinds of wonderful
things.
If connecting two machines in the same building allows us to do
wonderful things, connecting machines all over the planet should be
even better. The technology to start implementing this vision was
developed in the 1980s, and the resulting network is called the
_((Internet))_. It has lived up to its promise.
A computer can use this network to spew bits at another computer. For
any effective ((communication)) to arise out of this bit-spewing, the
computers at both ends must know what the bits are supposed to
represent. The meaning of any given sequence of bits depends entirely
on the kind of thing that it is trying to express and on the
((encoding)) mechanism used.
A _network ((protocol))_ describes a style of communication over a
((network)). There are protocols for sending email, for fetching email,
for sharing files, or even for controlling computers that happen to be
infected by malicious software.
For example, a simple ((chat)) protocol might consist of one computer
sending the bits that represent the text “CHAT?” to another machine
and the other responding with “OK!” to confirm that it understands the
protocol. They can then proceed to send each other strings of text,
read the text sent by the other from the network, and display whatever
they receive on their screens.
(((layering)))(((stream)))(((ordering)))Most protocols are built on
top of other protocols. Our example chat protocol treats the network
as a streamlike device into which you can put bits and have them
arrive at the correct destination in the correct order. Ensuring those
things is already a rather difficult technical problem.
indexsee:[Transmission Control Protocol,TCP]
(((TCP)))The _Transmission Control Protocol_ (TCP) is a ((protocol)) that
solves this problem. All Internet-connected devices “speak” it, and
most communication on the ((Internet)) is built on top of it.
(((listening (TCP))))A TCP ((connection)) works as follows: one
computer must be waiting, or _listening_, for other computers to start
talking to it. To be able to listen for different kinds of
communication at the same time on a single machine, each listener has
a number (called a _((port))_) associated with it. Most ((protocol))s
specify which port should be used by default. For example, when we
want to send an email using the ((SMTP)) protocol, the machine through
which we send it is expected to be listening on port 25.
Another computer can then establish a ((connection)) by connecting to
the target machine using the correct port number. If the target
machine can be reached and is listening on that port, the connection
is successfully created. The listening computer is called the
_((server))_, and the connecting computer is called the _((client))_.
Such a connection acts as a two-way ((pipe)) through which bits can
flow—the machines on both ends can put data into it. Once the bits are
successfully transmitted, they can be read out again by the machine on
the other side. This is a convenient model. You could say that
((TCP)) provides an ((abstraction)) of the network.
[[web]]
== The Web ==
The _((World Wide Web))_ (not to be confused with the ((Internet)) as
a whole) is a set of ((protocol))s and formats that allow us to visit
web pages in a browser. The “Web” part in the name refers to the fact
that such pages can easily link to each other, thus connecting into a
huge ((mesh)) that users can move through.
indexsee:[Hypertext Transfer Prototol,HTTP]
(((HTTP)))To add content to the Web, all you need to do is connect a machine to
the ((Internet)), and have it listen on port 80, using the
_Hypertext Transfer Protocol_ (HTTP). This protocol allows other computers
to request documents over the ((network)).
indexsee:[Universal Resource Locator,URL]
(((URL)))Each ((document)) on the Web is named by a _Universal Resource
Locator_ (URL), which looks something like this:
----
http://eloquentjavascript.net/12_browser.html
| | | |
protocol server path
----
(((HTTPS)))The first part tells us that this URL uses the HTTP
((protocol)) (as opposed to, for example, encrypted HTTP, which would
be _https://_). Then comes the part that identifies which ((server))
we are requesting the document from. Last is a path string that
identifies the specific document (or _((resource))_) we are interested
in.
Each machine connected to the Internet gets a unique _((IP address))_,
which looks something like `37.187.37.82`. You can use these directly
as the server part of a ((URL)). But lists of more or less random
numbers are hard to remember and awkward to type, so you can instead
register a _((domain)) name_ to point toward a specific machine or
set of machines. I registered _eloquentjavascript.net_ to point at the
IP address of a machine I control and can thus use that domain name
to serve web pages.
(((browser)))If you type the previous URL into your browser's ((address
bar)), it will try to retrieve and display the ((document)) at that
URL. First, your browser has to find out what address
_eloquentjavascript.net_ refers to. Then, using the ((HTTP)) protocol,
it makes a connection to the server at that address and asks for the
resource _/12_browser.html_.
We will take a closer look at the HTTP protocol in
link:17_http.html#http[Chapter 17].
== HTML ==
indexsee:[Hypertext Markup Language,HTML]
(((HTML)))HTML, which stands for _Hypertext Markup Language_, is the
document format used for web pages. An HTML document contains
((text)), as well as _((tag))s_ that give structure to the text,
describing things such as links, paragraphs, and headings.
A simple HTML document looks like this:
[source,text/html]
----
<!doctype html>
<html>
<head>
<title>My home page</title>
</head>
<body>
<h1>My home page</h1>
<p>Hello, I am Marijn and this is my home page.</p>
<p>I also wrote a book! Read it
<a href="http://eloquentjavascript.net">here</a>.</p>
</body>
</html>
----
ifdef::book_target[]
This is what such a document would look like in the browser:
image::img/home-page.png[alt="My home page",width="6.3cm"]
endif::book_target[]
(((angle brackets)))The tags, wrapped in angle brackets (`<`
and `>`), provide information about the ((structure)) of the
document. The other ((text)) is just plain text.
(((doctype)))(((version)))The document starts with `<!doctype html>`,
which tells the browser to interpret it as _modern_ HTML, as opposed
to various dialects that were in use in the past.
(((head (HTML tag))))(((body (HTML tag))))(((title (HTML tag))))(((h1
(HTML tag))))(((p (HTML tag))))HTML documents have a head and a body.
The head contains information _about_ the document, and the body
contains the document itself. In this case, we first declared that the
title of this document is “My home page” and then gave a document
containing a heading (`<h1>`, meaning “heading 1”—++<h2>++ to `<h6>`
produce more minor headings) and two ((paragraph))s (`<p>`).
(((href attribute)))(((a (HTML tag))))Tags come in several forms. An
((element)), such as the body, a paragraph, or a link, is started by
an _((opening tag))_ like `<p>` and ended by a _((closing tag))_ like
`</p>`. Some opening tags, such as the one for the ((link)) (`<a>`),
contain extra information in the form of `name="value"` pairs. These
are called _((attribute))s_. In this case, the destination of the link
is indicated with `href="http://eloquentjavascript.net"`, where `href`
stands for “hypertext reference”.
(((src attribute)))(((self-closing tag)))(((img (HTML tag))))Some
kinds of ((tag))s do not enclose anything and thus do not need to be
closed. An example of this would be `<img
src="http://example.com/image.jpg">`, which will display the ((image))
found at the given source URL.
(((escaping,in HTML)))To be able to include ((angle brackets)) in
the text of a document, even though they have a special meaning in
HTML, yet another form of special notation has to be introduced. A
plain opening angle bracket is written as `<` (“less than”), and
a closing bracket is written as `>` (“greater than”). In HTML, an ampersand
(`&`) character followed by a word and a semicolon (`;`) is called an
_((entity))_, and will be replaced by the character it encodes.
(((backslash character)))(((ampersand character)))(((double-quote
character)))This is analogous to the way backslashes are used in
JavaScript strings. Since this mechanism gives ampersand characters a
special meaning, too, those need to be escaped as `&`. Inside an
attribute, which is wrapped in double quotes, `"` can be used to
insert an actual quote character.
(((error tolerance)))(((parsing)))HTML is parsed in a remarkably
error-tolerant way. When tags that should be there are missing, the
browser reconstructs them. The way in which this is done has been
standardized, and you can rely on all modern browsers to do it in the
same way.
The following document will be treated just like the one shown previously:
[source,text/html]
----
<!doctype html>
<title>My home page</title>
<h1>My home page</h1>
<p>Hello, I am Marijn and this is my home page.
<p>I also wrote a book! Read it
<a href=http://eloquentjavascript.net>here</a>.
----
(((title (HTML tag))))(((head (HTML tag))))(((body (HTML
tag))))(((html (HTML tag))))The `<html>`, `<head>`, and `<body>` tags
are gone completely. The browser knows that `<title>` belongs in a
head, and that `<h1>` in a body. Furthermore, I am no longer explicitly
closing the paragraphs since opening a new paragraph or ending the
document will close them implicitly. The quotes around the link target
are also gone.
This book will usually omit the `<html>`, `<head>`, and `<body>` tags
from examples to keep them short and free of clutter. But I _will_
close tags and include quotes around attributes.
(((browser)))I will also usually omit the ((doctype)). This is not to
be taken as an encouragement to omit doctype declarations. Browsers
will often do ridiculous things when you forget them. You should
consider doctypes implicitly present in examples, even when they are
not actually shown in the text.
[[script_tag]]
== HTML and JavaScript ==
(((JavaScript,in HTML)))(((script (HTML tag))))In the context of this
book, the most important ((HTML)) tag is `<script>`. This tag allows
us to include a piece of JavaScript in a document.
[source,text/html]
----
<h1>Testing alert</h1>
<script>alert("hello!");</script>
----
(((alert function)))(((timeline)))Such a script will run as soon as
its `<script>` tag is encountered as the browser reads the HTML. The
page shown earlier will pop up an `alert` dialog when opened.
(((src attribute)))Including large programs directly in HTML documents
is often impractical. The `<script>` tag can be given an `src`
attribute in order to fetch a script file (a text file containing a
JavaScript program) from a URL.
[source,text/html]
----
<h1>Testing alert</h1>
<script src="code/hello.js"></script>
----
The _code/hello.js_ file included here contains the same simple program,
`alert("hello!")`. When an HTML page references other URLs as part of
itself, for example an image file or a script—web browsers will
retrieve them immediately and include them in the page.
(((script (HTML tag))))(((closing tag)))A script tag must always be
closed with `</script>`, even if it refers to a script file and
doesn't contain any code. If you forget this, the rest of the page
will be interpreted as part of the script.
(((button (HTML tag))))(((onclick attribute)))Some attributes can also
contain a JavaScript program. The `<button>` tag shown next (which shows up
as a button) has an `onclick` attribute, whose content will be run
whenever the button is clicked.
[source,text/html]
----
<button onclick="alert('Boom!');">DO NOT PRESS</button>
----
(((single-quote character)))(((escaping,in HTML)))Note that I had to
use single quotes for the string in the `onclick` attribute because
double quotes are already used to quote the whole attribute. I could
also have used `"`, but that'd make the program harder to read.
== In the sandbox ==
(((malicious script)))(((World Wide
Web)))(((browser)))(((website)))(((security)))Running programs
downloaded from the ((Internet)) is potentially dangerous. You do not
know much about the people behind most sites you visit, and they do
not necessarily mean well. Running programs by people who do not mean
well is how you get your computer infected by ((virus))es, your data
stolen, and your accounts hacked.
Yet the attraction of the Web is that you can surf it without
necessarily ((trust))ing all the pages you visit. This is why browsers
severely limit the things a JavaScript program may do: it can't look
at the files on your computer or modify anything not related to the
web page it was embedded in.
(((isolation)))Isolating a programming environment in this way is
called _((sandbox))ing_, the idea being that the program is harmlessly
playing in a sandbox. But you should imagine this particular kind of
sandbox as having a cage of thick steel bars over it, which makes it
somewhat different from your typical playground sandbox.
The hard part of sandboxing is allowing the programs enough room to be
useful yet at the same time restricting them from doing anything
dangerous. Lots of useful functionality, such as communicating with
other servers or reading the content of the copy-paste ((clipboard)),
can also be used to do problematic, ((privacy))-invading things.
(((leak)))(((exploit)))(((security)))Every now and then, someone comes
up with a new way to circumvent the limitations of a ((browser)) and
do something harmful, ranging from leaking minor private information
to taking over the whole machine that the browser runs on. The browser
developers respond by fixing the hole, and all is well again—that is,
until the next problem is discovered, and hopefully publicized, rather
than secretly exploited by some government or ((mafia)).
== Compatibility and the browser wars ==
(((Microsoft)))(((World Wide Web)))In the early stages of the
Web, a browser called ((Mosaic)) dominated the market. After a few
years, the balance had shifted to ((Netscape)), which was then, in
turn, largely supplanted by Microsoft's ((Internet Explorer)). At any
point where a single ((browser)) was dominant, that browser's vendor
would feel entitled to unilaterally invent new features for the Web.
Since most users used the same browser, ((website))s would simply
start using those features—never mind the other browsers.
This was the dark age of ((compatibility)), often called the
_((browser wars))_. Web developers were left with not one unified Web
but two or three incompatible platforms. To make things worse, the
browsers in use around 2003 were all full of ((bug))s, and of course
the bugs were different for each ((browser)). Life was hard for people
writing web pages.
(((Apple)))(((Internet Explorer)))(((Mozilla)))Mozilla ((Firefox)), a
not-for-profit offshoot of ((Netscape)), challenged Internet
Explorer's hegemony in the late 2000s. Because ((Microsoft)) was not
particularly interested in staying competitive at the time, Firefox
took quite a chunk of market share away from it. Around the same
time, ((Google)) introduced its ((Chrome)) browser, and Apple's
((Safari)) browser gained popularity, leading to a situation where
there were four major players, rather than one.
(((compatibility)))The new players had a more serious attitude toward
((standards)) and better ((engineering)) practices, leading to less
incompatibility and fewer ((bug))s. Microsoft, seeing its market share
crumble, came around and adopted these attitudes. If you are starting
to learn web development today, consider yourself lucky. The latest
versions of the major browsers behave quite uniformly and have
relatively few bugs.
(((World Wide Web)))That is not to say that the situation is perfect
just yet. Some of the people using the Web are, for reasons of inertia
or corporate policy, stuck with very old ((browser))s. Until those
browsers die out entirely, writing websites that work for them will
require a lot of arcane knowledge about their shortcomings and quirks.
This book is not about those ((quirks)). Rather, it aims to present
the modern, sane style of ((web programming)).