Google Machine Learning Glossary.csv

Introduction (Google Machine Learning Glossary)|"

<p>This deck has taken definitions outlined in <a href=""https://developers.google.com/machine-learning/glossary/"">Google's Machine Learning Glossary</a> and put them into a form which can be easily learnt/revised using <a href=""https://apps.ankiweb.net/"">Anki</a> a cross platform app.</p>
<h2>Notes</h2>
<p>Please note the modifications which have been made &amp; where you can find updates.</p>
<ol align=""left"">
<li>Every card has ""(Google Machine Learning Glossary)"" appended to the end so that if you have any other Machine Learning words in your collection, the Google definition will still be added when importing it.</li>
<li>The original relative urls have been made into full urls so they are clickable within Anki.</li>
<li>All LaTeX formulae have been wrapped with text to ensure that it can be rendered correctly in Anki.</li>
<li>Any updates, translations or corrections to the deck will be available at <a href=""https://github.com/darigovresearch/Google-Machine-Learning-Glossary-Flashcards""></a><a href=""https://github.com/darigovresearch/Google-Machine-Learning-Glossary-Flashcards"">https://github.com/darigovresearch/Google-Machine-Learning-Glossary-Flashcards</a> so do return periodically to check if you have the latest version.</li>
</ol>
<p>Feel free to share the deck and give the repository a star so more people can get the most out of it.</p>
<h2>License</h2>
<p>Except as otherwise noted, the content of this deck is licensed under the <a href=""https://creativecommons.org/licenses/by/4.0/"">Creative Commons Attribution 4.0 License</a>, and code samples are licensed under the <a href=""https://www.apache.org/licenses/LICENSE-2.0"">Apache 2.0 License</a>. For details, see the <a href=""https://developers.google.com/site-policies"">Google Developers Site Policies</a>. Java is a registered trademark of Oracle and/or its affiliates.</p>
<p>To see this work in full go to <a href=""https://developers.google.com/machine-learning/glossary/"">https://developers.google.com/machine-learning/glossary/</a></p>

"
A/B testing (Google Machine Learning Glossary)|"

<p>A statistical way of comparing two (or more) techniques, typically an incumbent
against a new rival. A/B testing aims to determine not only which technique
performs better but also to understand whether the difference is
statistically significant. A/B testing usually considers only two techniques
using one measurement, but it can be applied to any finite number of techniques
and measures.</p>

"
accuracy (Google Machine Learning Glossary)|"

<p>The fraction of <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a> that a
<a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification model</strong></a> got right. In
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a>, accuracy
is defined as follows:</p>

<div>
[$$]\text{Accuracy} =
\frac{\text{Correct Predictions}} {\text{Total Number Of Examples}}[/$$]
</div>

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a>, accuracy has
the following definition:</p>

<div>
[$$]\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}
                         {\text{Total Number Of Examples}}[/$$]
</div>

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#TP""><strong>true positive</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#TN""><strong>true negative</strong></a>.</p>

"
action (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the mechanism by which the <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a>
transitions between <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>states</strong></a> of the
<a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environment</strong></a>. The agent chooses the action by using a
<a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a>.</p>

"
activation function (Google Machine Learning Glossary)|"

<p>A function (for example, <a href=""https://developers.google.com/machine-learning/glossary/#ReLU""><strong>ReLU</strong></a> or <a href=""https://developers.google.com/machine-learning/glossary/#sigmoid_function""><strong>sigmoid</strong></a>)
that takes in the weighted sum of all of the inputs from the previous layer
and then generates and passes an output value (typically nonlinear) to the next
layer.</p>

"
active learning (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>training</strong></a> approach in which the
algorithm <em>chooses</em> some of the data it learns from. Active learning
is particularly valuable when <a href=""https://developers.google.com/machine-learning/glossary/#labeled_example""><strong>labeled examples</strong></a>
are scarce or expensive to obtain. Instead of blindly seeking a diverse
range of labeled examples, an active learning algorithm selectively seeks
the particular range of examples it needs for learning.</p>

"
AdaGrad (Google Machine Learning Glossary)|"

<p>A sophisticated gradient descent algorithm that rescales the
gradients of each parameter, effectively giving each parameter
an independent <a href=""https://developers.google.com/machine-learning/glossary/#learning_rate""><strong>learning rate</strong></a>. For a full explanation, see
<a href=""http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf""
target=""T"">this paper</a>.</p>

"
agent (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the entity that uses a <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a>
to maximize expected <a href=""https://developers.google.com/machine-learning/glossary/#return""><strong>return</strong></a> gained from transitioning
between <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>states</strong></a> of the  <a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environment</strong></a>.</p>

"
agglomerative clustering (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#hierarchical_clustering""><strong>hierarchical clustering</strong></a>.</p>

"
AR (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#augmented_reality""><strong>augmented reality</strong></a>.</p>

"
area under the PR curve (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#PR_AUC""><strong>PR AUC (Area under the PR Curve)</strong></a>.</p>

"
area under the ROC curve (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#AUC""><strong>AUC (Area under the ROC curve)</strong></a>.</p>

"
artificial general intelligence (Google Machine Learning Glossary)|"

<p>A non-human mechanism that demonstrates a <em>broad range</em> of problem solving,
creativity, and adaptability. For example, a program demonstrating artificial
general intelligence could translate text, compose symphonies, <em>and</em> excel at
games that have not yet been invented.</p>

"
artificial intelligence (Google Machine Learning Glossary)|"

<p>A non-human program or model that can solve sophisticated tasks. For example,
a program or model that translates text or a program or model that identifies
diseases from radiologic images both exhibit artificial intelligence.</p>

<p>Formally, <a href=""https://developers.google.com/machine-learning/glossary/#machine_learning""><strong>machine learning</strong></a> is a sub-field of artificial
intelligence. However, in recent years, some organizations have begun using the
terms <em>artificial intelligence</em> and <em>machine learning</em> interchangeably.</p>

"
attribute (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a>. In fairness, attributes often refer to
characteristics pertaining to individuals.</p>

"
AUC (Area under the ROC Curve) (Google Machine Learning Glossary)|"

<p>An evaluation metric that considers all possible
<a href=""https://developers.google.com/machine-learning/glossary/#classification_threshold""><strong>classification thresholds</strong></a>.</p>

<p>The Area Under the <a href=""https://developers.google.com/machine-learning/glossary/#ROC""><strong>ROC curve</strong></a> is the probability that a classifier
will be more confident that a randomly chosen positive example is actually
positive than that a randomly chosen negative example is positive.</p>

"
augmented reality (Google Machine Learning Glossary)|"

<p>A technology that superimposes a computer-generated image on a user&#39;s view of
the real world, thus providing a composite view.</p>

"
automation bias (Google Machine Learning Glossary)|"

<p>When a human decision maker favors recommendations made by an automated
decision-making system over information made without automation, even
when the automated decision-making system makes errors.</p>

"
average precision (Google Machine Learning Glossary)|"

<p>A metric for summarizing the performance of a ranked sequence of results.
Average precision is calculated by taking the average of the
<a href=""https://developers.google.com/machine-learning/glossary/#precision""><strong>precision</strong></a> values for each relevant result (each result in
the ranked list where the recall increases relative to the previous result).</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#area_under_the_pr_curve""><strong>Area under the PR Curve</strong></a>.</p>


"
backpropagation (Google Machine Learning Glossary)|"

<p>The primary algorithm for performing
<a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> on
<a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural networks</strong></a>. First, the output values
of each node are calculated (and cached) in a forward pass.
Then, the <a href=""https://developers.google.com/machine-learning/glossary/#partial_derivative""><strong>partial derivative</strong></a>
of the error with respect to each parameter is calculated in a backward
pass through the graph.</p>

"
bag of words (Google Machine Learning Glossary)|"

<p>A representation of the words in a phrase or passage,
irrespective of order. For example, bag of words represents the
following three phrases identically:</p>

<ul align=""left"">
<li>the dog jumps</li>
<li>jumps the dog</li>
<li>dog jumps the</li>
</ul>

<p>Each word is mapped to an index in a <a href=""https://developers.google.com/machine-learning/glossary/#sparse_vector""><strong>sparse vector</strong></a>, where
the vector has an index for every word in the vocabulary.  For example,
the phrase <em>the dog jumps</em> is mapped into a feature vector with non-zero
values at the three indices corresponding to the words <em>the</em>, <em>dog</em>, and
<em>jumps</em>. The non-zero value can be any of the following:</p>

<ul align=""left"">
<li>A 1 to indicate the presence of a word.</li>
<li>A count of the number of times a word appears in the bag. For example,
if the phrase were <em>the maroon dog is a dog with maroon fur</em>, then both
<em>maroon</em> and <em>dog</em> would be represented as 2, while the other words would
be represented as 1.</li>
<li>Some other value, such as the logarithm of the count of the number of
times a word appears in the bag.</li>
</ul>

"
baseline (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> used as a reference point for comparing how well another
model (typically, a more complex one) is performing. For example, a
<a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression model</strong></a> might serve as a
good baseline for a <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep model</strong></a>.</p>

<p>For a particular problem, the baseline helps model developers quantify
the minimal expected performance that a new model must achieve for the new
model to be useful.</p>

"
batch (Google Machine Learning Glossary)|"

<p>The set of examples used in one <a href=""https://developers.google.com/machine-learning/glossary/#iteration""><strong>iteration</strong></a> (that is, one
<a href=""https://developers.google.com/machine-learning/glossary/#gradient""><strong>gradient</strong></a> update) of
<a href=""https://developers.google.com/machine-learning/glossary/#model_training""><strong>model training</strong></a>.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#batch_size""><strong>batch size</strong></a>.</p>

"
batch normalization (Google Machine Learning Glossary)|"

<p><a href=""https://developers.google.com/machine-learning/glossary/#normalization""><strong>Normalizing</strong></a> the input or output of the
<a href=""https://developers.google.com/machine-learning/glossary/#activation_function""><strong>activation functions</strong></a> in a
<a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layer</strong></a>. Batch normalization can
provide the following benefits:</p>

<ul align=""left"">
<li>Make <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural networks</strong></a> more stable by protecting
against <a href=""https://developers.google.com/machine-learning/glossary/#outliers""><strong>outlier</strong></a> weights.</li>
<li>Enable higher <a href=""https://developers.google.com/machine-learning/glossary/#learning_rate""><strong>learning rates</strong></a>.</li>
<li>Reduce <a href=""https://developers.google.com/machine-learning/glossary/#overfitting""><strong>overfitting</strong></a>.</li>
</ul>

"
batch size (Google Machine Learning Glossary)|"

<p>The number of examples in a <a href=""https://developers.google.com/machine-learning/glossary/#batch""><strong>batch</strong></a>. For example, the batch size
of <a href=""https://developers.google.com/machine-learning/glossary/#SGD""><strong>SGD</strong></a> is 1, while the batch size of
a <a href=""https://developers.google.com/machine-learning/glossary/#mini-batch""><strong>mini-batch</strong></a> is usually between 10 and 1000. Batch size is
usually fixed during <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>training</strong></a> and <a href=""https://developers.google.com/machine-learning/glossary/#inference""><strong>inference</strong></a>;
however, <a href=""https://developers.google.com/machine-learning/glossary/#TensorFlow""><strong>TensorFlow</strong></a> does permit dynamic batch sizes.</p>

"
Bayesian neural network (Google Machine Learning Glossary)|"

<p>A probabilistic <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> that accounts for
uncertainty in <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weights</strong></a> and outputs. A standard neural network
regression model typically <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predicts</strong></a> a scalar value;
for example, a model predicts a house price
of 853,000. By contrast, a Bayesian neural network predicts a distribution of
values; for example, a model predicts a house price of 853,000 with a standard
deviation of 67,200. A Bayesian neural network relies on
<a href=""https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/"" target=""T"">
Bayes&#39; Theorem</a>
to calculate uncertainties in weights and predictions. A Bayesian neural
network can be useful when it is important to quantify uncertainty, such as in
models related to pharmaceuticals. Bayesian neural networks can also help
prevent <a href=""https://developers.google.com/machine-learning/glossary/#overfitting""><strong>overfitting</strong></a>.</p>

"
Bellman equation (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the following identity satisfied by the optimal
<a href=""https://developers.google.com/machine-learning/glossary/#q-function""><strong>Q-function</strong></a>:</p>

<p>\[Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s&#39;|s,a} \max_{a&#39;} Q(s&#39;, a&#39;))\]</p>

<p><a href=""https://developers.google.com/machine-learning/glossary/#reinforcement_learning""><strong>Reinforcement learning</strong></a> algorithms apply this
identity to create <a href=""https://developers.google.com/machine-learning/glossary/#q-learning""><strong>Q-learning</strong></a> via the following update rule:</p>

<p>\[Q(s,a) \gets Q(s,a) + \alpha
  \left[r(s,a)
      + \gamma \displaystyle\max_{\substack{a_1}} Q(s’,a’)
    - Q(s,a) \right]
\]</p>

<p>Beyond reinforcement learning, the Bellman equation has applications to
dynamic programming. See the
<a href=""https://wikipedia.org/wiki/Bellman_equation"" target=""T"">
Wikipedia entry for Bellman Equation</a>.</p>

"
bias (ethics/fairness) (Google Machine Learning Glossary)|"

<p>
1. Stereotyping, prejudice or favoritism towards some things, people,
or groups over others. These biases can affect collection and
interpretation of data, the design of a system, and how users interact
with a system.  Forms of this type of bias include:
</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#automation_bias""><strong>automation bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#confirmation_bias""><strong>confirmation bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#confirmation_bias""><strong>experimenter’s bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#group_attribution_bias""><strong>group attribution bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#implicit_bias""><strong>implicit bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#in-group_bias""><strong>in-group bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#out-group_homogeneity_bias""><strong>out-group homogeneity bias</strong></a></li>
</ul>

<p>
2. Systematic error introduced by a sampling or reporting procedure.
Forms of this type of bias include:
</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>coverage bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>non-response bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#participation_bias""><strong>participation bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#reporting_bias""><strong>reporting bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>sampling bias</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>selection bias</strong></a></li>
</ul>

<p>Not to be confused with the <a href=""https://developers.google.com/machine-learning/glossary/#bias"">bias term</a> in machine learning models
or <a href=""https://developers.google.com/machine-learning/glossary/#prediction_bias""><strong>prediction bias</strong></a>.</p>

"
bias (math) (Google Machine Learning Glossary)|"

<p>An intercept or offset from an origin. Bias (also known as the
<strong>bias term</strong>) is referred to as <em>b</em> or <i>w<sub>0</sub></i> in
machine learning models.  For example, bias is the <em>b</em> in the
following formula:</p>

<div>
[$$]y' = b + w_1x_1 + w_2x_2 + … w_nx_n[/$$]
</div>

<p>Not to be confused with <a href=""https://developers.google.com/machine-learning/glossary/#bias_ethics""><strong>bias in ethics and fairness</strong></a>
or <a href=""https://developers.google.com/machine-learning/glossary/#prediction_bias""><strong>prediction bias</strong></a>.</p>

"
bigram (Google Machine Learning Glossary)|"

<p>An <a href=""https://developers.google.com/machine-learning/glossary/#N-gram""><strong>N-gram</strong></a> in which N=2.</p>

"
binary classification (Google Machine Learning Glossary)|"

<p>A type of <a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification</strong></a> task that outputs one
of two mutually exclusive <a href=""https://developers.google.com/machine-learning/glossary/#class""><strong>classes</strong></a>. For example, a machine
learning model that evaluates email messages and outputs either &quot;spam&quot; or
&quot;not spam&quot; is a <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classifier</strong></a>.</p>

"
binning (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#bucketing""><strong>bucketing</strong></a>.</p>

"
boosting (Google Machine Learning Glossary)|"

<p>A machine learning technique that iteratively combines a set of simple and
not very accurate classifiers (referred to as &quot;weak&quot; classifiers) into a
classifier with high accuracy (a &quot;strong&quot; classifier) by
<a href=""https://developers.google.com/machine-learning/glossary/#upweighting""><strong>upweighting</strong></a> the examples that the model is currently
misclassfying.</p>

"
bounding box (Google Machine Learning Glossary)|"

<p>In an image, the (<em>x</em>, <em>y</em>) coordinates of a rectangle around an area of
interest, such as the dog in the image below.</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/bounding_box.jpg""
     alt=""Photograph of a dog sitting on a sofa. A green bounding box
          with top-left coordinates of (275, 1271) and bottom-right
          coordinates of (2954, 2761) circumscribes the dog's body"">
</p>


"
broadcasting (Google Machine Learning Glossary)|"

<p>Expanding the shape of an operand in a matrix math operation to
<a href=""https://developers.google.com/machine-learning/glossary/#dimensions""><strong>dimensions</strong></a> compatible for that operation. For instance,
linear algebra requires that the two operands in a matrix addition operation
must have the same dimensions. Consequently, you can&#39;t add a matrix of shape
(m, n) to a vector of length n. Broadcasting enables this operation by
virtually expanding the vector of length n to a matrix of shape (m,n) by
replicating the same values down each column.</p>

<p>For example, given the following definitions, linear algebra prohibits
A+B because A and B have different dimensions:</p>
<pre class=""prettyprint"" translate=""no"" dir=""ltr""><code translate=""no"" dir=""ltr"">A = [[7, 10, 4],
     [13, 5, 9]]
B = [2]
</code></pre>
<p>However, broadcasting enables the operation A+B by virtually expanding B to:</p>
<pre class=""prettyprint"" translate=""no"" dir=""ltr""><code translate=""no"" dir=""ltr""> [[2, 2, 2],
  [2, 2, 2]]
</code></pre>
<p>Thus, A+B is now a valid operation:</p>
<pre class=""prettyprint"" translate=""no"" dir=""ltr""><code translate=""no"" dir=""ltr"">[[7, 10, 4],  +  [[2, 2, 2],  =  [[ 9, 12, 6],
 [13, 5, 9]]      [2, 2, 2]]      [15, 7, 11]]
</code></pre>
<p>See the following description of
<a href=""https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html""
target=""T"">broadcasting in NumPy</a> for more details.</p>

"
bucketing (Google Machine Learning Glossary)|"

<p>Converting a (usually <a href=""https://developers.google.com/machine-learning/glossary/#continuous_feature""><strong>continuous</strong></a>) feature into
multiple binary features called buckets or bins, typically based on value
range. For example, instead of representing temperature as a single
continuous floating-point feature, you could chop ranges of temperatures
into discrete bins. Given temperature data sensitive to a tenth of a degree,
all temperatures between 0.0 and 15.0 degrees could be put into one bin,
15.1 to 30.0 degrees could be a second bin, and 30.1 to 50.0 degrees could
be a third bin.</p>


"
calibration layer (Google Machine Learning Glossary)|"

<p>A post-prediction adjustment, typically to account for
<a href=""https://developers.google.com/machine-learning/glossary/#prediction_bias""><strong>prediction bias</strong></a>. The adjusted predictions and
probabilities should match the distribution of an observed set of labels.</p>

"
candidate generation (Google Machine Learning Glossary)|"

<p>The initial set of recommendations chosen by a recommendation system. For
example, consider a bookstore that offers 100,000 titles. The candidate
generation phase creates a much smaller list of suitable books for a
particular user, say 500. But even 500 books is way too many to recommend
to a user. Subsequent, more expensive, phases of a recommendation system
(such as <a href=""https://developers.google.com/machine-learning/glossary/#scoring""><strong>scoring</strong></a> and <a href=""https://developers.google.com/machine-learning/glossary/#re-ranking""><strong>re-ranking</strong></a>) whittle
down those 500 to a much smaller, more useful set of recommendations.</p>

"
candidate sampling (Google Machine Learning Glossary)|"

<p>A training-time optimization in which a probability is calculated for all the
positive labels, using, for example, <a href=""https://developers.google.com/machine-learning/glossary/#softmax""><strong>softmax</strong></a>,
but only for a random
sample of negative labels. For example, if we have an example labeled
<em>beagle</em> and <em>dog</em> candidate sampling computes the predicted probabilities
and corresponding loss terms for the <em>beagle</em> and <em>dog</em> class outputs
in addition to a random subset of the remaining classes
(<em>cat</em>, <em>lollipop</em>, <em>fence</em>). The idea is that the
<a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative classes</strong></a> can learn from less frequent
negative reinforcement as long as
<a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive classes</strong></a> always get proper positive
reinforcement, and this is indeed observed empirically. The motivation for
candidate sampling is a computational efficiency win from not computing
predictions for all negatives.</p>

"
categorical data (Google Machine Learning Glossary)|"

<p><a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>Features</strong></a> having a discrete set of possible values. For example,
consider a categorical feature named <code translate=""no"" dir=""ltr"">house style</code>, which has a discrete set of
three possible values: <code translate=""no"" dir=""ltr"">Tudor, ranch, colonial</code>. By representing <code translate=""no"" dir=""ltr"">house style</code>
as categorical data, the model can learn the separate impacts of <code translate=""no"" dir=""ltr"">Tudor</code>,
<code translate=""no"" dir=""ltr"">ranch</code>, and <code translate=""no"" dir=""ltr"">colonial</code> on house price.</p>

<p>Sometimes, values in the discrete set are mutually exclusive, and only one
value can be applied to a given example. For example, a <code translate=""no"" dir=""ltr"">car maker</code>
categorical feature would probably permit only a single value (<code translate=""no"" dir=""ltr"">Toyota</code>)
per example.  Other times, more than one value may be applicable. A single
car could be painted more than one different color, so a <code translate=""no"" dir=""ltr"">car color</code>
categorical feature would likely permit a single example to have multiple
values (for example, <code translate=""no"" dir=""ltr"">red</code> and <code translate=""no"" dir=""ltr"">white</code>).</p>

<p>Categorical features are sometimes called
<a href=""https://developers.google.com/machine-learning/glossary/#discrete_feature""><strong>discrete features</strong></a>.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#numerical_data""><strong>numerical data</strong></a>.</p>

"
centroid (Google Machine Learning Glossary)|"

<p>The center of a cluster as determined by a <a href=""https://developers.google.com/machine-learning/glossary/#k-means""><strong>k-means</strong></a> or
<a href=""https://developers.google.com/machine-learning/glossary/#k-median""><strong>k-median</strong></a> algorithm. For instance, if k is 3,
then the k-means or k-median algorithm finds 3 centroids.</p>

"
centroid-based clustering (Google Machine Learning Glossary)|"

<p>A category of <a href=""https://developers.google.com/machine-learning/glossary/#clustering""><strong>clustering</strong></a> algorithms that organizes data
into nonhierarchical clusters. <a href=""https://developers.google.com/machine-learning/glossary/#k-means""><strong>k-means</strong></a> is the most widely
used centroid-based clustering algorithm.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#hierarchical_clustering""><strong>hierarchical clustering</strong></a>
algorithms.</p>

"
checkpoint (Google Machine Learning Glossary)|"

<p>Data that captures the state of the variables of a model at a particular
time. Checkpoints enable exporting model <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weights</strong></a>, as well
as performing training across multiple sessions. Checkpoints also enable
training to continue past errors (for example, job preemption). Note that
the <a href=""https://developers.google.com/machine-learning/glossary/#graph""><strong>graph</strong></a> itself is not included in a checkpoint.</p>

"
class (Google Machine Learning Glossary)|"

<p>One of a set of enumerated target values for a label. For example, in a
<a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a> model that detects
spam, the two classes are <em>spam</em> and <em>not spam</em>.  In a
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a> model that
identifies dog breeds, the classes would be <em>poodle</em>, <em>beagle</em>, <em>pug</em>, and so
on.</p>

"
classification model (Google Machine Learning Glossary)|"

<p>A type of machine learning model for distinguishing among two or more
discrete classes. For example, a natural language processing classification
model could determine whether an input sentence was in French, Spanish,
or Italian. Compare with <a href=""https://developers.google.com/machine-learning/glossary/#regression_model""><strong>regression model</strong></a>.</p>

"
classification threshold (Google Machine Learning Glossary)|"

<p>A scalar-value criterion that is applied to a model&#39;s predicted score in order
to separate the <a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive class</strong></a> from the <a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative
class</strong></a>.  Used when mapping
<a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a> results to
<a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a>. For example, consider
a logistic regression model that determines the probability of a given email
message being spam. If the classification threshold is 0.9, then logistic
regression values above 0.9 are classified as <em>spam</em> and those below
0.9 are classified as <em>not spam</em>.</p>

"
class-imbalanced dataset (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a> problem in which the
<a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>labels</strong></a> for the two classes have significantly different
frequencies.  For example, a disease dataset in which 0.0001 of examples
have positive labels and 0.9999 have negative labels is a class-imbalanced
problem, but a football game predictor in which 0.51 of examples label one
team winning and 0.49 label the other team winning is <em>not</em> a
class-imbalanced problem.</p>

"
clipping (Google Machine Learning Glossary)|"

<p>A technique for handling <a href=""https://developers.google.com/machine-learning/glossary/#outliers""><strong>outliers</strong></a>. Specifically, reducing
feature values that are greater than a set maximum value down to that maximum
value. Also, increasing feature values that are less than a specific minimum
value up to that minimum value.</p>

<p>For example, suppose that only a few feature values fall outside the
range 40–60. In this case, you could do the following:</p>

<ul align=""left"">
<li>Clip all values over 60 to be exactly 60.</li>
<li>Clip all values under 40 to be exactly 40.</li>
</ul>

<p>In addition to bringing <em>input values</em> within a designated range, clipping
can also used to force <em>gradient values</em> within a designated range during
training.</p>

"
Cloud TPU (Google Machine Learning Glossary)|"

<p>A specialized hardware accelerator designed to speed up machine
learning workloads on Google Cloud Platform.</p>

"
clustering (Google Machine Learning Glossary)|"

<p>Grouping related <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a>, particularly during
<a href=""https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning""><strong>unsupervised learning</strong></a>. Once all the
examples are grouped, a human can optionally supply meaning to each cluster.</p>

<p>Many clustering algorithms exist.  For example, the <a href=""https://developers.google.com/machine-learning/glossary/#k-means""><strong>k-means</strong></a>
algorithm clusters examples based on their proximity to a
<a href=""https://developers.google.com/machine-learning/glossary/#centroid""><strong>centroid</strong></a>, as in the following diagram:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/Cluster.svg"">
</p>

<p>A human researcher could then review the clusters and, for example,
label cluster 1 as &quot;dwarf trees&quot; and cluster 2 as &quot;full-size trees.&quot;</p>

<p>As another example, consider a clustering algorithm based on an
example&#39;s distance from a center point, illustrated as follows:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/RingCluster.svg"">
</p>

"
co-adaptation (Google Machine Learning Glossary)|"

<p>When <a href=""https://developers.google.com/machine-learning/glossary/#neuron""><strong>neurons</strong></a> predict patterns in training data by relying
almost exclusively on outputs of specific other neurons instead of relying on
the network&#39;s behavior as a whole. When the patterns that cause co-adaption
are not present in validation data, then co-adaptation causes overfitting.
<a href=""https://developers.google.com/machine-learning/glossary/#dropout_regularization""><strong>Dropout regularization</strong></a> reduces co-adaptation
because dropout ensures neurons cannot rely solely on specific other neurons.</p>

"
collaborative filtering (Google Machine Learning Glossary)|"

<p>Making <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a> about the interests of one user
based on the interests of many other users.  Collaborative filtering
is often used in <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation systems</strong></a>.</p>

"
confirmation bias (Google Machine Learning Glossary)|"

<p>The tendency to search for, interpret, favor, and recall information in a
way that confirms one&#39;s preexisting beliefs or hypotheses.
Machine learning developers may inadvertently collect or label
data in ways that influence an outcome supporting their existing
beliefs.  Confirmation bias is a form of <a href=""https://developers.google.com/machine-learning/glossary/#implicit_bias""><strong>implicit bias</strong></a>.</p>

<p><strong>Experimenter&#39;s bias</strong> is a form of confirmation bias in which
an experimenter continues training models until a preexisting
hypothesis is confirmed.</p>

"
confusion matrix (Google Machine Learning Glossary)|"

<p>An NxN table that summarizes how successful a
<a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification model&#39;s</strong></a> predictions were; that is,
the correlation between the label and the model&#39;s classification. One axis of
a confusion matrix is the <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a> that the model predicted, and the
other axis is the actual label. N represents the number of
<a href=""https://developers.google.com/machine-learning/glossary/#class""><strong>classes</strong></a>. In a <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a>
problem, N=2. For example, here is a sample confusion matrix for a
binary classification problem:</p>

<table>
<thead>
<tr>
<th></th>
<th>Tumor (predicted)</th>
<th>Non-Tumor (predicted)</th>
</tr>
</thead>

<tbody>
<tr>
<td>Tumor (actual)</td>
<td>18</td>
<td>1</td>
</tr>
<tr>
<td>Non-Tumor (actual)</td>
<td>6</td>
<td>452</td>
</tr>
</tbody>
</table>

<p>The preceding confusion matrix shows that of the 19 samples that actually had
tumors, the model correctly classified 18 as having tumors
(18 <a href=""https://developers.google.com/machine-learning/glossary/#TP""><strong>true positives</strong></a>), and incorrectly classified 1 as
not having a tumor (1 <a href=""https://developers.google.com/machine-learning/glossary/#FN""><strong>false negative</strong></a>). Similarly, of
458 samples that actually did not have tumors, 452 were correctly classified
(452 <a href=""https://developers.google.com/machine-learning/glossary/#TN""><strong>true negatives</strong></a>) and 6 were
incorrectly classified (6 <a href=""https://developers.google.com/machine-learning/glossary/#FP""><strong>false positives</strong></a>).</p>

<p>The confusion matrix for a <a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a>
problem can help you determine mistake patterns. For example, a
confusion matrix could reveal that a model trained to recognize handwritten
digits tends to mistakenly predict 9 instead of 4, or 1 instead of 7.</p>

<p>Confusion matrices contain sufficient information to calculate a
variety of performance metrics, including <a href=""https://developers.google.com/machine-learning/glossary/#precision""><strong>precision</strong></a>
and <a href=""https://developers.google.com/machine-learning/glossary/#recall""><strong>recall</strong></a>.</p>

"
continuous feature (Google Machine Learning Glossary)|"

<p>A floating-point feature with an infinite range of possible values.
Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#discrete_feature""><strong>discrete feature</strong></a>.</p>

"
convenience sampling (Google Machine Learning Glossary)|"

<p>Using a dataset not gathered scientifically in order to run quick
experiments. Later on, it&#39;s essential to switch to a scientifically gathered
dataset.</p>

"
convergence (Google Machine Learning Glossary)|"

<p>Informally, often refers to a state reached during <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>training</strong></a>
in which training <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> and <a href=""https://developers.google.com/machine-learning/glossary/#validation""><strong>validation</strong></a> loss
change very little or not at all
with each iteration after a certain number of iterations. In other words, a
model reaches convergence when additional training on the current data will
not improve the model. In <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep learning</strong></a>, loss values
sometimes stay constant or nearly so for many iterations before finally
descending, temporarily producing a false sense of convergence.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#early_stopping""><strong>early stopping</strong></a>.</p>

<p>See also Boyd and Vandenberghe,
<a href=""https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf""
target=""T"">Convex Optimization</a>.</p>

"
convex function (Google Machine Learning Glossary)|"

<p>A function in which the region above the graph of the function is a
<a href=""https://developers.google.com/machine-learning/glossary/#convex_set""><strong>convex set</strong></a>.  The prototypical convex function is
shaped something like the letter <strong>U</strong>.  For example, the following
are all convex functions:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/convex_functions.png"" height=""300""
alt=""A typical convex function is shaped like the letter 'U'.""></img>
</p>

<p>By contrast, the following function is not convex.  Notice how the
region above the graph is not a convex set:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/nonconvex_function.svg"">
</p>

<p>A <strong>strictly convex function</strong> has exactly one local minimum point, which
is also the global minimum point. The classic U-shaped functions are
strictly convex functions.  However, some convex functions
(for example, straight lines) are not U-shaped.</p>

<p>A lot of the common <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss functions</strong></a>, including the
following, are convex functions:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#L2_loss""><strong>L<sub>2</sub> loss</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#Log_Loss""><strong>Log Loss</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#L1_regularization""><strong>L<sub>1</sub> regularization</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#L2_regularization""><strong>L<sub>2</sub> regularization</strong></a></li>
</ul>

<p>Many variations of <a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> are
guaranteed to find a point close to the minimum of a
strictly convex function.  Similarly, many variations of
<a href=""https://developers.google.com/machine-learning/glossary/#SGD""><strong>stochastic gradient descent</strong></a> have a high probability
(though, not a guarantee) of finding a point close to the minimum of a
strictly convex function.</p>

<p>The sum of two convex functions (for example,
L<sub>2</sub> loss + L<sub>1</sub> regularization) is a convex function.</p>

<p><a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>Deep models</strong></a> are never convex functions.
Remarkably, algorithms designed for
<a href=""https://developers.google.com/machine-learning/glossary/#convex_optimization""><strong>convex optimization</strong></a> tend to find
reasonably good solutions on deep networks anyway, even though
those solutions are not guaranteed to be a global minimum.</p>

"
convex optimization (Google Machine Learning Glossary)|"

<p>The process of using mathematical techniques such as
<a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> to find
the minimum of a <a href=""https://developers.google.com/machine-learning/glossary/#convex_function""><strong>convex function</strong></a>.
A great deal of research in machine learning has focused on formulating various
problems as convex optimization problems and in solving those problems more
efficiently.</p>

<p>For complete details, see Boyd and Vandenberghe,
<a href=""https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf"" target=""T"">Convex
Optimization</a>.</p>

"
convex set (Google Machine Learning Glossary)|"

<p>A subset of Euclidean space such that a line drawn between any two points in the
subset remains completely within the subset.  For instance, the following two
shapes are convex sets:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/convex_set.png"" alt=""A rectangle
and a semi-ellipse are both convex sets.""></img>
</p>

<p>By contrast, the following two shapes are not convex sets:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/nonconvex_set.png"" alt=""A pie-chart
with a missing slice and a firework are both nonconvex sets.""></img>
</p>

"
convolution (Google Machine Learning Glossary)|"

<p>In mathematics, casually speaking, a mixture of two functions. In machine
learning, a convolution mixes the convolutional filter and the input matrix
in order to train <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weights</strong></a>.</p>

<p>The term &quot;convolution&quot; in machine learning is often a shorthand way of
referring to either <a href=""https://developers.google.com/machine-learning/glossary/#convolutional_operation""><strong>convolutional operation</strong></a>
or <a href=""https://developers.google.com/machine-learning/glossary/#convolutional_layer""><strong>convolutional layer</strong></a>.</p>

<p>Without convolutions, a machine learning algorithm would have to learn
a separate weight for every cell in a large <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>tensor</strong></a>.  For example,
a machine learning algorithm training on 2K x 2K images would be forced to
find 4M separate weights. Thanks to convolutions, a machine learning
algorithm only has to find weights for every cell in the
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_filter""><strong>convolutional filter</strong></a>, dramatically reducing
the memory needed to train the model.  When the convolutional filter is
applied, it is simply replicated across cells such that each is multiplied
by the filter.</p>

"
convolutional filter (Google Machine Learning Glossary)|"

<p>One of the two actors in a
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_operation""><strong>convolutional operation</strong></a>. (The other actor
is a slice of an input matrix.) A convolutional filter is a matrix having
the same <a href=""https://developers.google.com/machine-learning/glossary/#rank""><strong>rank</strong></a> as the input matrix, but a smaller shape.
For example, given a 28x28 input matrix, the filter could be any 2D matrix
smaller than 28x28.</p>

<p>In photographic manipulation, all the cells in a convolutional filter are
typically set to a constant pattern of ones and zeroes. In machine learning,
convolutional filters are typically seeded with random numbers and then the
network <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>trains</strong></a> the ideal values.</p>

"
convolutional layer (Google Machine Learning Glossary)|"

<p>A layer of a <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep neural network</strong></a> in which a
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_filter""><strong>convolutional filter</strong></a> passes along an input
matrix.  For example, consider the following 3x3
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_filter""><strong>convolutional filter</strong></a>:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/ConvolutionalFilter33.svg"">
</p>

<p>The following animation shows a convolutional layer consisting of 9
convolutional operations involving the 5x5 input matrix. Notice that each
convolutional operation works on a different 3x3 slice of the input matrix.
The resulting 3x3 matrix (on the right) consists of the results of the 9
convolutional operations:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/AnimatedConvolution.gif"">
</p>

"
convolutional neural network (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> in which at least one layer is a
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_layer""><strong>convolutional layer</strong></a>. A typical convolutional
neural network consists of some combination of the following layers:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#convolutional_layer""><strong>convolutional layers</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#pooling""><strong>pooling layers</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#dense_layer""><strong>dense layers</strong></a></li>
</ul>

<p>Convolutional neural networks have had great success in certain kinds
of problems, such as image recognition.</p>

"
convolutional operation (Google Machine Learning Glossary)|"

<p>The following two-step mathematical operation:</p>

<ol align=""left"">
<li>Element-wise multiplication of the
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_filter""><strong>convolutional filter</strong></a> and a slice of an
input matrix. (The slice of the input matrix has the same rank and
size as the convolutional filter.)</li>
<li>Summation of all the values in the resulting product matrix.</li>
</ol>

<p>For example, consider the following 5x5 input matrix:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/ConvolutionalLayerInputMatrix.svg"">
</p>

<p>Now imagine the following 2x2 convolutional filter:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/ConvolutionalLayerFilter.svg"">
</p>

<p>Each convolutional operation involves a single 2x2 slice of the
input matrix. For instance, suppose we use the 2x2 slice at the
top-left of the input matrix.  So, the convolution operation on
this slice looks as follows:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/ConvolutionalLayerOperation.svg"">
</p>

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#convolutional_layer""><strong>convolutional layer</strong></a> consists of a
series of convolutional operations, each acting on a different slice
of the input matrix.</p>

"
cost (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a>.</p>

"
counterfactual fairness (Google Machine Learning Glossary)|"
A <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>fairness metric</strong></a> that checks whether a classifier
produces the same result for one individual as it does for another individual
who is identical to the first, except with respect to one or more
<a href=""https://developers.google.com/machine-learning/glossary/#sensitive_attribute""><strong>sensitive attributes</strong></a>. Evaluating a classifier for
counterfactual fairness is one method for surfacing potential sources of
bias in a model.</p>

<p>See
<a href=""https://papers.nips.cc/paper/7220-when-worlds-collide-integrating-different-counterfactual-assumptions-in-fairness.pdf"" target=""T"">&quot;When Worlds
Collide: Integrating Different Counterfactual Assumptions in Fairness&quot;</a>
for a more detailed discussion of counterfactual fairness.</p>

"
coverage bias (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>selection bias</strong></a>.</p>

"
crash blossom (Google Machine Learning Glossary)|"

<p>A sentence or phrase with an ambiguous meaning.
Crash blossoms present a significant problem in <a href=""https://developers.google.com/machine-learning/glossary/#natural_language_understanding""><strong>natural
language understanding</strong></a>.
For example, the headline <em>Red Tape Holds Up Skyscraper</em> is a
crash blossom because an NLU model could interpret the headline literally or
figuratively.</p>

"
critic (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#deep_q-network""><strong>Deep Q-Network</strong></a>.</p>

"
cross-entropy (Google Machine Learning Glossary)|"

<p>A generalization of <a href=""https://developers.google.com/machine-learning/glossary/#Log_Loss""><strong>Log Loss</strong></a> to
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification problems</strong></a>. Cross-entropy
quantifies the difference between two probability distributions.  See also
<a href=""https://developers.google.com/machine-learning/glossary/#perplexity""><strong>perplexity</strong></a>.</p>

"
cross-validation (Google Machine Learning Glossary)|"

<p>A mechanism for estimating how well a model will generalize to new data by
testing the model against one or more non-overlapping data subsets withheld
from the <a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training set</strong></a>.</p>

"
custom Estimator (Google Machine Learning Glossary)|"

<p>An <a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimator</strong></a> that you write yourself by following
<a href=""https://www.tensorflow.org/guide/custom_estimators""
target=""T"">these directions</a>.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#premade_Estimator""><strong>premade Estimators</strong></a>.</p>


"
data analysis (Google Machine Learning Glossary)|"

<p>Obtaining an understanding of data by considering samples, measurement,
and visualization. Data analysis can be particularly useful when a
dataset is first received, before one builds the first model. It is
also crucial in understanding experiments and debugging problems with
the system.</p>

"
data augmentation (Google Machine Learning Glossary)|"

<p>Artificially boosting the range and number of <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>training</strong></a> examples
by transforming existing examples to create additional examples. For example,
suppose images are one of your features, but your dataset doesn&#39;t contain
enough image examples for the model to learn useful associations. Ideally,
you&#39;d add enough <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>labeled</strong></a> images to your dataset to enable your
model to train properly. If that&#39;s not possible, data augmentation can rotate,
stretch, and reflect each image to produce many variants of the original
picture, possibly yielding enough labeled data to enable excellent training.</p>

"
DataFrame (Google Machine Learning Glossary)|"

<p>A popular datatype for representing datasets in <a href=""https://developers.google.com/machine-learning/glossary/#pandas""><strong>pandas</strong></a>. A
DataFrame is analogous to a table. Each column of the DataFrame has a name (a
header), and each row is identified by a number.</p>

"
data set or dataset (Google Machine Learning Glossary)|"

<p>A collection of <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a>.</p>

"
Dataset API (tf.data) (Google Machine Learning Glossary)|"

<p>A high-level <a href=""https://developers.google.com/machine-learning/glossary/#TensorFlow""><strong>TensorFlow</strong></a> API for reading data and
transforming it into a form that a machine learning algorithm requires.
A <code translate=""no"" dir=""ltr"">tf.data.Dataset</code> object represents a sequence of elements, in which
each element contains one or more <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensors</strong></a>. A <code translate=""no"" dir=""ltr"">tf.data.Iterator</code>
object provides access to the elements of a <code translate=""no"" dir=""ltr"">Dataset</code>.</p>

<p>For details about the Dataset API, see
<a href=""https://www.tensorflow.org/programmers_guide/datasets""
target=""T"">Importing Data</a>
in the TensorFlow Programmer&#39;s Guide.</p>

"
decision boundary (Google Machine Learning Glossary)|"

<p>The separator between classes learned by a model in a
<a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary class</strong></a> or
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification problems</strong></a>. For example,
in the following image representing a binary classification problem,
the decision boundary is the frontier between the orange class and
the blue class:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/decision_boundary.png"" alt=""A
well-defined boundary between one class and another.""></img>
</p>

"
decision threshold (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#classification_threshold""><strong>classification threshold</strong></a>.</p>

"
decision tree (Google Machine Learning Glossary)|"

<p>A model represented as a sequence of branching statements. For example, the
following over-simplified decision tree branches a few times to
predict the price of a house (in thousands of USD).  According to this
decision tree, a house larger than 160 square meters, having more than three
bedrooms, and built less than 10 years ago would have a predicted price of
510 thousand USD.</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/DecisionTree.svg"" alt=""A
tree three-levels deep whose branches predict house prices.""></img>
</p>

<p>Machine learning can generate deep decision trees.</p>

"
deep model (Google Machine Learning Glossary)|"

<p>A type of <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> containing multiple
<a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layers</strong></a>.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#wide_model""><strong>wide model</strong></a>.</p>

"
deep neural network (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep model</strong></a>.</p>


"
Deep Q-Network (DQN) (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#q-learning""><strong>Q-learning</strong></a>, a deep <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>
that predicts <a href=""https://developers.google.com/machine-learning/glossary/#q-function""><strong>Q-functions</strong></a>.</p>

<p><strong>Critic</strong> is a synonym for Deep Q-Network.</p>

"
demographic parity (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>fairness metric</strong></a> that is satisfied if
the results of a model&#39;s classification are not dependent on a
given <a href=""https://developers.google.com/machine-learning/glossary/#sensitive_attribute""><strong>sensitive attribute</strong></a>.</p>

<p>For example, if both Lilliputians and Brobdingnagians apply to
Glubbdubdrib University, demographic parity is achieved if the percentage
of Lilliputians admitted is the same as the percentage of Brobdingnagians
admitted, irrespective of whether one group is on average more qualified
than the other.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#equalized_odds""><strong>equalized odds</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#equality_of_opportunity""><strong>equality of opportunity</strong></a>, which permit
classification results in aggregate to depend on sensitive attributes,
but do not permit classification results for certain specified ground-truth
labels to depend on sensitive attributes.  See
<a href=""http://research.google.com/bigpicture/attacking-discrimination-in-ml/"" target=""T"">&quot;Attacking
discrimination with smarter machine learning&quot;</a> for a visualization
exploring the tradeoffs when optimizing for demographic parity.</p>

"
dense feature (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a> in which most values are non-zero, typically
a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a> of floating-point values. Contrast with
<a href=""https://developers.google.com/machine-learning/glossary/#sparse_features""><strong>sparse feature</strong></a>.</p>

"
dense layer (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#fully_connected_layer""><strong>fully connected layer</strong></a>.</p>

"
depth (Google Machine Learning Glossary)|"

<p>The number of <a href=""https://developers.google.com/machine-learning/glossary/#layer""><strong>layers</strong></a> (including any
<a href=""https://developers.google.com/machine-learning/glossary/#embeddings""><strong>embedding</strong></a> layers) in a <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>
that learn weights. For example, a neural network with 5
<a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layers</strong></a> and 1 output layer has a depth of 6.</p>

"
depthwise separable convolutional neural network (sepCNN) (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#convolutional_neural_network""><strong>convolutional neural network</strong></a>
architecture based on
<a href=""https://github.com/tensorflow/tpu/tree/master/models/experimental/inception"">Inception</a>,
but where Inception modules are replaced with depthwise separable
convolutions.  Also known as Xception.</p>

<p>A depthwise separable convolution (also abbreviated as separable convolution)
factors a standard 3-D convolution into two separate convolution operations
that are more computationally efficient: first, a depthwise convolution,
with a depth of 1 (n ✕ n ✕ 1), and then second, a pointwise convolution,
with length and width of 1 (1 ✕ 1 ✕ n).</p>

<p>To learn more, see <a href=""https://arxiv.org/pdf/1610.02357.pdf"">Xception: Deep Learning with Depthwise Separable
Convolutions</a>.</p>

"
device (Google Machine Learning Glossary)|"

<p>A category of hardware that can run a TensorFlow session, including
CPUs, GPUs, and <a href=""https://developers.google.com/machine-learning/glossary/#TPU""><strong>TPUs</strong></a>.</p>

"
dimension reduction (Google Machine Learning Glossary)|"

<p>Decreasing the number of dimensions used to represent a particular feature in
a feature vector, typically by converting to an <a href=""https://developers.google.com/machine-learning/glossary/#embeddings""><strong>embedding</strong></a>.</p>

"
dimensions (Google Machine Learning Glossary)|"

<p>Overloaded term having any of the following definitions:</p>

<ul align=""left"">
<li><p>The number of levels of coordinates in a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a>. For
example:</p>

<ul align=""left"">
<li>A scalar has zero dimensions; for example, <code translate=""no"" dir=""ltr"">[&quot;Hello&quot;]</code>.</li>
<li>A vector  has one dimension; for example, <code translate=""no"" dir=""ltr"">[3, 5, 7, 11]</code>.</li>
<li>A matrix has two dimensions; for example, <code translate=""no"" dir=""ltr"">[[2, 4, 18], [5, 7, 14]]</code>.</li>
</ul>

<p>You can uniquely specify a particular cell in a one-dimensional vector
with one coordinate; you need two coordinates to uniquely specify a
particular cell in a two-dimensional matrix.</p></li>
<li><p>The number of entries in a <a href=""https://developers.google.com/machine-learning/glossary/#feature_vector""><strong>feature vector</strong></a>.</p></li>
<li><p>The number of elements in an <a href=""https://developers.google.com/machine-learning/glossary/#embeddings""><strong>embedding</strong></a> layer.
</ul></p></li>
</ul>

"
discrete feature (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a> with a finite set of possible values. For example,
a feature whose values may only be <em>animal</em>, <em>vegetable</em>, or <em>mineral</em> is a
discrete (or categorical) feature. Contrast with
<a href=""https://developers.google.com/machine-learning/glossary/#continuous_feature""><strong>continuous feature</strong></a>.</p>

"
discriminative model (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> that predicts labels from a set of one or more
features. More formally, discriminative models define the conditional
probability of an output given the features and weights; that is:</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
p(output | features, weights)
</pre>

<p>For example, a model that predicts whether an email is spam from features
and weights is a discriminative model.</p>

<p>The vast majority of supervised learning models, including classification
and regression models, are discriminative models.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#generative_model""><strong>generative model</strong></a>.</p>

"
discriminator (Google Machine Learning Glossary)|"

<p>A system that determines whether examples are real or fake.</p>

<p>The subsystem within a <a href=""https://developers.google.com/machine-learning/glossary/#generative_adversarial_network""><strong>generative adversarial
network</strong></a> that determines whether
the examples created by the <a href=""https://developers.google.com/machine-learning/glossary/#generator""><strong>generator</strong></a> are real or fake.</p>

"
disparate impact (Google Machine Learning Glossary)|"

<p>Making decisions about people that impact different population
subgroups disproportionately. This usually refers to situations
where an algorithmic decision-making process harms or benefits
some subgroups more than others.</p>

<p>For example, suppose an algorithm that determines a Lilliputian&#39;s
eligibility for a miniature-home loan is more likely to classify
them as “ineligible” if their mailing address contains a certain
postal code. If Big-Endian Lilliputians  are more likely to have
mailing addresses with this postal code than Little-Endian Lilliputians,
then this algorithm may result in disparate impact.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#disparate_treatment""><strong>disparate treatment</strong></a>,
which focuses on disparities that result when subgroup characteristics
are explicit inputs to an algorithmic decision-making process.</p>

"
disparate treatment (Google Machine Learning Glossary)|"

<p>Factoring subjects&#39; <a href=""https://developers.google.com/machine-learning/glossary/#sensitive_attribute""><strong>sensitive attributes</strong></a>
into an algorithmic decision-making process such that different subgroups
of people are treated differently.</p>

<p>For example, consider an algorithm that
determines Lilliputians’ eligibility for a miniature-home loan based on the
data they provide in their loan application.  If the algorithm uses a
Lilliputian’s affiliation as Big-Endian or Little-Endian as an input, it
is enacting disparate treatment along that dimension.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#disparate_impact""><strong>disparate impact</strong></a>, which focuses
on disparities in the societal impacts of algorithmic decisions on subgroups,
irrespective of whether those subgroups are inputs to the model.</p>
<aside class=""warning""><strong>Warning:</strong><span> Because sensitive attributes are almost always correlated with
other features the data may have, explicitly removing sensitive attribute
information does not guarantee that subgroups will be treated equally.
For example, removing sensitive demographic attributes from a training
data set that still includes postal code as a feature may address disparate
treatment of subgroups, but there still might be
disparate impact upon these groups because
postal code might serve as a <a href=""https://developers.google.com/machine-learning/glossary/#proxy_sensitive_attributes""><strong>proxy</strong></a> for other
demographic information.</span></aside>
"
divisive clustering (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#hierarchical_clustering""><strong>hierarchical clustering</strong></a>.</p>

"
downsampling (Google Machine Learning Glossary)|"

<p>Overloaded term that can mean either of the following:</p>

<ul align=""left"">
<li>Reducing the amount of information in a feature in order to train
a model more efficiently. For example, before training an image
recognition model, downsampling high-resolution images to a
lower-resolution format.</li>
<li>Training on a disproportionately low percentage of over-represented class
examples in order to improve model training on under-represented classes.
For example, in a <a href=""https://developers.google.com/machine-learning/glossary/#class_imbalanced_data_set""><strong>class-imbalanced
dataset</strong></a>, models tend to learn a lot about the
<a href=""https://developers.google.com/machine-learning/glossary/#majority_class""><strong>majority class</strong></a> and not enough about the
<a href=""https://developers.google.com/machine-learning/glossary/#minority_class""><strong>minority class</strong></a>. Downsampling helps
balance the amount of training on the majority and minority classes.</li>
</ul>

"
DQN (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#deep_q-network""><strong>Deep Q-Network</strong></a>.</p>

"
dropout regularization (Google Machine Learning Glossary)|"

<p>A form of <a href=""https://developers.google.com/machine-learning/glossary/#regularization""><strong>regularization</strong></a> useful in training
<a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural networks</strong></a>. Dropout regularization works by
removing a random selection of a fixed number of the units in a network
layer for a single gradient step. The more units dropped out, the stronger
the regularization. This is analogous to training the network to emulate
an exponentially large ensemble of smaller networks. For full details, see
<a href=""http://jmlr.org/papers/volume15/srivastava14a.old/srivastava14a.pdf""
target=""T"">Dropout: A Simple Way to Prevent Neural Networks from
Overfitting</a>.</p>

"
dynamic model (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> that is trained online in a continuously
updating fashion.  That is, data is continuously entering the model.</p>


"
eager execution (Google Machine Learning Glossary)|"

<p>A TensorFlow programming environment in which <a href=""https://developers.google.com/machine-learning/glossary/#Operation""><strong>operations</strong></a>
run immediately. By contrast, operations called in
<a href=""https://developers.google.com/machine-learning/glossary/#graph_execution""><strong>graph execution</strong></a> don&#39;t run until they are explicitly
evaluated. Eager execution is an
<a href=""https://wikipedia.org/wiki/Imperative_programming""
target=""T"">imperative interface</a>, much
like the code in most programming languages. Eager execution programs are
generally far easier to debug than graph execution programs.</p>

"
early stopping (Google Machine Learning Glossary)|"

<p>A method for <a href=""https://developers.google.com/machine-learning/glossary/#regularization""><strong>regularization</strong></a> that involves ending
model training <em>before</em> training loss finishes decreasing. In early
stopping, you end model training when the loss on a
<a href=""https://developers.google.com/machine-learning/glossary/#validation_set""><strong>validation dataset</strong></a> starts to increase, that is, when
<a href=""https://developers.google.com/machine-learning/glossary/#generalization""><strong>generalization</strong></a> performance worsens.</p>

"
embeddings (Google Machine Learning Glossary)|"

<p>A categorical feature represented as a continuous-valued feature.
Typically, an embedding is a translation of a high-dimensional vector
into a low-dimensional space. For example, you can represent the words
in an English sentence in either of the following two ways:</p>

<ul align=""left"">
<li>As a million-element (high-dimensional)
<a href=""https://developers.google.com/machine-learning/glossary/#sparse_features""><strong>sparse vector</strong></a> in which all elements are integers.
Each cell in the vector represents a separate English word; the value in
a cell represents the number of times that word appears in a sentence.
Since a single English sentence is unlikely to contain more than 50 words,
nearly every cell in the vector will contain a 0. The few cells that
aren&#39;t 0 will contain a low integer (usually 1) representing the number of
times that word appeared in the sentence.</li>
<li>As a several-hundred-element (low-dimensional)
<a href=""https://developers.google.com/machine-learning/glossary/#dense_feature""><strong>dense vector</strong></a> in which each element holds a
floating-point value between 0 and 1.  This is an embedding.</li>
</ul>

<p>In TensorFlow, embeddings are trained by <a href=""https://developers.google.com/machine-learning/glossary/#backpropagation""><strong>backpropagating</strong></a>
<a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> just like any other parameter in a
<a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>.</p>

"
embedding space (Google Machine Learning Glossary)|"

<p>The d-dimensional vector space that features from a higher-dimensional
vector space are mapped to. Ideally, the embedding space contains a
structure that yields meaningful mathematical results; for example,
in an ideal embedding space, addition and subtraction of embeddings
can solve word analogy tasks.</p>

<p>The <a href=""https://wikipedia.org/wiki/Dot_product"" target=""T"">dot product</a>
of two embeddings is a measure of their similarity.</p>

"
empirical risk minimization (ERM) (Google Machine Learning Glossary)|"

<p>Choosing the function that minimizes loss on the training set. Contrast
with <a href=""https://developers.google.com/machine-learning/glossary/#SRM""><strong>structural risk minimization</strong></a>.</p>

"
ensemble (Google Machine Learning Glossary)|"

<p>A merger of the predictions of multiple <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>models</strong></a>. You can create an
ensemble via one or more of the following:</p>

<ul align=""left"">
<li>different initializations</li>
<li>different <a href=""https://developers.google.com/machine-learning/glossary/#hyperparameter""><strong>hyperparameters</strong></a></li>
<li>different overall structure</li>
</ul>

<p><a href=""https://www.tensorflow.org/tutorials/wide_and_deep"" target=""T"">Deep and
wide models</a>
are a kind of ensemble.

"
environment (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the world that contains the <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a>
and allows the agent to observe that world&#39;s <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a>. For example,
the represented world can be a game like chess, or a physical world like a
maze. When the agent applies an <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a> to the environment,
then the environment transitions between states.</p>

"
episode (Google Machine Learning Glossary)|"

<p>In reinforcement learning, each of the repeated attempts by the
<a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a> to learn an <a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environment</strong></a>.</p>

"
epoch (Google Machine Learning Glossary)|"

<p>A full training pass over the entire dataset such that each example has been
seen once.  Thus, an epoch represents <code translate=""no"" dir=""ltr"">N</code>/<a href=""https://developers.google.com/machine-learning/glossary/#batch_size""><strong>batch size</strong></a> training
<a href=""https://developers.google.com/machine-learning/glossary/#iteration""><strong>iterations</strong></a>, where <code translate=""no"" dir=""ltr"">N</code> is the total number of examples.</p>

"
epsilon greedy policy (Google Machine Learning Glossary)|"

<p>In reinforcement learning, a <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a> that either follows a
<a href=""https://developers.google.com/machine-learning/glossary/#random_policy""><strong>random policy</strong></a> with epsilon probability or a
<a href=""https://developers.google.com/machine-learning/glossary/#greedy_policy""><strong>greedy policy</strong></a> otherwise. For example, if epsilon is
0.9, then the policy follows a random policy 90% of the time and a greedy
policy 10% of the time.</p>

<p>Over successive episodes, the algorithm reduces epsilon’s value in order
to shift from following a random policy to following a greedy policy. By
shifting the policy, the agent first randomly explores the environment and
then greedily exploits the results of random exploration.</p>

"
equality (Google Machine Learning Glossary)|"
A <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>fairness metric</strong></a> that checks whether, for
a preferred <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a> (one that confers an advantage or
benefit to a person) and a given <a href=""https://developers.google.com/machine-learning/glossary/#attribute""><strong>attribute</strong></a>, a classifier
predicts that preferred label equally well for all values of that
attribute. In other words, equality of opportunity measures whether
the people who should qualify for an opportunity are equally likely
to do so regardless of their group membership.</p>

<p>For example, suppose Glubbdubdrib University admits both Lilliputians
and Brobdingnagians to a rigorous mathematics program. Lilliputians’
secondary schools offer a robust curriculum of math classes, and the
vast majority of students are qualified for the university program.
Brobdingnagians’ secondary schools don’t offer math classes at all,
and as a result, far fewer of their students are qualified. Equality
of opportunity is satisfied for the preferred label of &quot;admitted&quot; with
respect to nationality (Lilliputian or Brobdingnagian) if qualified
students are equally likely to be admitted irrespective of whether
they&#39;re a Lilliputian or a Brobdingnagian.</p>

<p>For example, let&#39;s say 100 Lilliputians and 100 Brobdingnagians apply
to Glubbdubdrib University, and admissions decisions are made as follows:</p>

<p><b>Table 1.</b> Lilliputian applicants (90% are qualified)</p>

<table>
  <tr> <th>&nbsp;</th>   <th>Qualified</th> <th>Unqualified</th> </tr>
  <tr> <th>Admitted</th> <td>45</td>        <td>3</td>           </tr>
  <tr> <th>Rejected</th> <td>45</td>        <td>7</td>           </tr>
  <tr> <th>Total</th>    <td>90</td>        <td>10</td>          </tr>
  <tr>
     <td colspan=""3"">
        Percentage of qualified students admitted: 45/90 = 50%<br/>
        Percentage of unqualified students rejected: 7/10 = 70%<br/>
        Total percentage of Lilliputian students admitted: (45+3)/100 = 48%
     </td>
  </tr>
</table>

<p>&nbsp;</p>

<p><b>Table 2.</b> Brobdingnagian applicants (10% are qualified):</p>

<table>
  <tr> <th>&nbsp;</th>   <th>Qualified</th> <th>Unqualified</th> </tr>
  <tr> <th>Admitted</th> <td>5</td>         <td>9</td>           </tr>
  <tr> <th>Rejected</th> <td>5</td>         <td>81</td>          </tr>
  <tr> <th>Total</th>    <td>10</td>        <td>90</td>          </tr>
  <tr>
     <td colspan=""3"">
        Percentage of qualified students admitted: 5/10 = 50%<br/>
        Percentage of unqualified students rejected: 81/90 = 90%<br/>
        Total percentage of Brobdingnagian students admitted: (5+9)/100 = 14%
     </td>
  </tr>
</table>

<p>The preceding examples satisfy equality of opportunity for acceptance
of qualified students because qualified Lilliputians and Brobdingnagians
both have a 50% chance of being admitted.</p>
<aside class=""note""><strong>Note:</strong><span> While equality of opportunity is satisfied, the following two
fairness metrics are not satisfied:
<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#demographic_parity""><strong>demographic parity</strong></a>: Lilliputians and
     Brobdingnagians are admitted to the university at different rates;
     48% of Lilliputians students are admitted, but only 14% of
     Brobdingnagian students are admitted.</li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#equalized_odds""><strong>equalized odds</strong></a>: While qualified Lilliputian
     and Brobdingnagian students both have the same chance of being admitted,
     the additional constraint that unqualified Lilliputians and
     Brobdingnagians both have the same chance of being rejected is not
     satisfied. Unqualified Lilliputians have a 70% rejection rate, whereas
     unqualified Brobdingnagians have a 90% rejection rate.</li>
</ul></span></aside>
<p>See
<a href=""https://arxiv.org/pdf/1610.02413.pdf"" target=""T"">&quot;Equality of
Opportunity in Supervised Learning&quot;</a> for a more detailed discussion
of equality of opportunity. Also see
<a href=""http://research.google.com/bigpicture/attacking-discrimination-in-ml/"" target=""T"">&quot;Attacking
discrimination with smarter machine learning&quot;</a> for a visualization
exploring the tradeoffs when optimizing for equality of opportunity.</p>

"
equalized odds (Google Machine Learning Glossary)|"
A <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>fairness metric</strong></a> that checks if, for any particular
label and attribute, a classifier predicts that label equally well for all
values of that attribute.</p>

<p>For example, suppose Glubbdubdrib University admits both Lilliputians and
Brobdingnagians to a rigorous mathematics program. Lilliputians&#39; secondary
schools offer a robust curriculum of math classes, and the vast majority of
students are qualified for the university program. Brobdingnagians&#39; secondary
schools don’t offer math classes at all, and as a result, far fewer of
their students are qualified.  Equalized odds is satisfied provided that no
matter whether an applicant is  a Lilliputian or a Brobdingnagian, if they
are qualified, they are equally as likely to get admitted to the program,
and if they are not qualified, they are equally as likely to get rejected.</p>

<p>Let’s say 100 Lilliputians and 100 Brobdingnagians apply to Glubbdubdrib
University, and admissions decisions are made as follows:</p>

<p><b>Table 3.</b> Lilliputian applicants (90% are qualified)</p>

<table>
  <tr> <th>&nbsp;</th>   <th>Qualified</th> <th>Unqualified</th> </tr>
  <tr> <th>Admitted</th> <td>45</td>        <td>2</td>           </tr>
  <tr> <th>Rejected</th> <td>45</td>        <td>8</td>           </tr>
  <tr> <th>Total</th>    <td>90</td>        <td>10</td>          </tr>
  <tr>
     <td colspan=""3"">
        Percentage of qualified students admitted: 45/90 = 50%<br/>
        Percentage of unqualified students rejected: 8/10 = 80%<br/>
        Total percentage of Lilliputian students admitted: (45+2)/100 = 47%
     </td>
  </tr>
</table>

<p>&nbsp;</p>

<p><b>Table 4.</b> Brobdingnagian applicants (10% are qualified):</p>

<table>
  <tr> <th>&nbsp;</th>   <th>Qualified</th> <th>Unqualified</th> </tr>
  <tr> <th>Admitted</th> <td>5</td>         <td>18</td>           </tr>
  <tr> <th>Rejected</th> <td>5</td>         <td>72</td>           </tr>
  <tr> <th>Total</th>    <td>10</td>        <td>90</td>          </tr>
  <tr>
     <td colspan=""3"">
        Percentage of qualified students admitted: 5/10 = 50%<br/>
        Percentage of unqualified students rejected: 72/90 = 80%<br/>
        Total percentage of Brobdingnagian students admitted: (5+18)/100 = 23%
     </td>
  </tr>
</table>

<p>Equalized odds is satisfied because qualified Lilliputian and Brobdingnagian
students both have a 50% chance of being admitted, and unqualified Lilliputian
and Brobdingnagian have an 80% chance of being rejected.</p>
<aside class=""note""><strong>Note:</strong><span> While equalized odds is satisfied here,
<a href=""https://developers.google.com/machine-learning/glossary/#demographic_parity""><strong>demographic parity</strong></a> is <em>not satisfied</em>. Lilliputian
and Brobdingnagian students are admitted to Glubbdubdrib University at
different rates; 47% of Lilliputian students are admitted, and 23% of
Brobdingnagian students are admitted.</span></aside>
<p>Equalized odds is formally defined in
<a href=""https://arxiv.org/pdf/1610.02413.pdf"" target=""T"">&quot;Equality of
Opportunity in Supervised Learning&quot;</a> as follows:
&quot;predictor Ŷ satisfies equalized odds with respect
to protected attribute A and outcome Y if Ŷ and A are independent,
conditional on Y.&quot;</p>
<aside class=""note""><strong>Note:</strong><span> Contrast equalized odds with the more relaxed
<a href=""https://developers.google.com/machine-learning/glossary/#equality_of_opportunity""><strong>equality of opportunity</strong></a> metric.</span></aside>
"
Estimator (Google Machine Learning Glossary)|"

<p>An instance of the <code translate=""no"" dir=""ltr"">tf.Estimator</code> class, which encapsulates logic that builds
a TensorFlow graph and runs a TensorFlow session. You may create your own
<a href=""https://developers.google.com/machine-learning/glossary/#custom_estimator""><strong>custom Estimators</strong></a> (as described
<a href=""https://www.tensorflow.org/extend/estimators"" target=""T"">here</a>)
or instantiate <a href=""https://developers.google.com/machine-learning/glossary/#premade_Estimator""><strong>premade Estimators</strong></a> created by
others.</p>

"
example (Google Machine Learning Glossary)|"

<p>One row of a dataset. An example contains one or more <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a>
and possibly a <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a>. See also
<a href=""https://developers.google.com/machine-learning/glossary/#labeled_example""><strong>labeled example</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#unlabeled_example""><strong>unlabeled example</strong></a>.</p>

"
experience replay (Google Machine Learning Glossary)|"

<p>In reinforcement learning, a <a href=""https://developers.google.com/machine-learning/glossary/#deep_q-network""><strong>DQN</strong></a> technique used to
reduce temporal correlations in training data. The <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a>
stores state transitions in a <a href=""https://developers.google.com/machine-learning/glossary/#replay_buffer""><strong>replay buffer</strong></a>, and then
samples transitions from the replay buffer to create training data.</p>

"
experimenter&#39;s bias (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#confirmation_bias""><strong>confirmation bias</strong></a>.</p>

"
exploding gradient problem (Google Machine Learning Glossary)|"

<p>The tendency for <a href=""https://developers.google.com/machine-learning/glossary/#gradient""><strong>gradients</strong></a> in a
<a href=""https://developers.google.com/machine-learning/glossary/#deep_neural_network""><strong>deep neural networks</strong></a> (especially
<a href=""https://developers.google.com/machine-learning/glossary/#recurrent_neural_network""><strong>recurrent neural networks</strong></a>) to become
surprisingly steep (high). Steep gradients result in very large updates
to the weights of each node in a deep neural network.</p>

<p>Models suffering from the exploding gradient problem become difficult
or impossible to train. <a href=""https://developers.google.com/machine-learning/glossary/#gradient_clipping""><strong>Gradient clipping</strong></a>
can mitigate this problem.</p>

<p>Compare to <a href=""https://developers.google.com/machine-learning/glossary/#vanishing_gradient_problem""><strong>vanishing gradient problem</strong></a>.</p>


"
fairness constraint (Google Machine Learning Glossary)|"
Applying a constraint to an algorithm to ensure one or more definitions
of fairness are satisfied. Examples of fairness constraints include:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#post-processing""><strong>Post-processing</strong></a> your model&#39;s output.</li>
<li>Altering the <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss function</strong></a> to incorporate a penalty
 for violating a <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>fairness metric</strong></a>.</li>
<li>Directly adding a mathematical constraint to an optimization problem.</li>
</ul>

"
fairness metric (Google Machine Learning Glossary)|"

<p>A mathematical definition of “fairness” that is measurable.
Some commonly used fairness metrics include:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#equalized_odds""><strong>equalized odds</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#predictive_parity""><strong>predictive parity</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#counterfactual_fairness""><strong>counterfactual fairness</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#demographic_parity""><strong>demographic parity</strong></a></li>
</ul>

<p>Many fairness metrics are mutually exclusive; see
<a href=""https://developers.google.com/machine-learning/glossary/#incompatibility_of_fairness_metrics""><strong>incompatibility of fairness metrics</strong></a>.</p>

"
false negative (FN) (Google Machine Learning Glossary)|"

<p>An example in which the model mistakenly predicted the
<a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative class</strong></a>. For example, the model
inferred that a particular email message was not spam
(the negative class), but that email message actually was spam.</p>

"
false positive (FP) (Google Machine Learning Glossary)|"

<p>An example in which the model mistakenly predicted the
<a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive class</strong></a>. For example, the model inferred
that a particular email message was spam (the positive class), but that
email message was actually not spam.</p>

"
false positive rate (FPR) (Google Machine Learning Glossary)|"

<p>The x-axis in an <a href=""https://developers.google.com/machine-learning/glossary/#ROC""><strong>ROC curve</strong></a>. The false positive rate is defined
as follows:</p>

<div>
[$$]\text{False Positive Rate} =
\frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}[/$$]
</div>

"
feature (Google Machine Learning Glossary)|"

<p>An input variable used in making <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a>.</p>

"
Feature column (tf.feature_column) (Google Machine Learning Glossary)|"

<p>A function that specifies how a model should interpret a particular feature. A
list that collects the output returned by calls to such functions is a required
parameter to all <a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimators</strong></a> constructors.</p>

<p>The <code translate=""no"" dir=""ltr"">tf.feature_column</code> functions enable models to easily experiment
with different representations of input features. For details, see the
<a href=""https://www.tensorflow.org/guide/feature_columns""
target=""T"">Feature Columns chapter</a>
in the TensorFlow Programmers Guide.</p>

<p>&quot;Feature column&quot; is Google-specific terminology.
A feature column is referred to as a &quot;namespace&quot; in the
<a href=""https://wikipedia.org/wiki/Vowpal_Wabbit"" target=""T"">VW</a>
system (at Yahoo/Microsoft), or a
<a href=""https://www.csie.ntu.edu.tw/~cjlin/libffm/"" target=""T"">field</a>.</p>

"
feature cross (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#synthetic_feature""><strong>synthetic feature</strong></a> formed by crossing (taking a
<a href=""https://wikipedia.org/wiki/Cartesian_product"" target=""T"">Cartesian
product</a> of) individual binary features obtained from
<a href=""https://developers.google.com/machine-learning/glossary/#categorical_data""><strong>categorical data</strong></a> or from
<a href=""https://developers.google.com/machine-learning/glossary/#continuous_feature""><strong>continuous features</strong></a> via <a href=""https://developers.google.com/machine-learning/glossary/#bucketing""><strong>bucketing</strong></a>.
Feature crosses help represent nonlinear relationships.</p>

"
feature engineering (Google Machine Learning Glossary)|"

<p>The process of determining which <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a> might be useful
in training a model, and then converting raw data from log files and other
sources into said features. In TensorFlow, feature engineering often means
converting raw log file entries to <a href=""https://developers.google.com/machine-learning/glossary/#tf.Example""><strong>tf.Example</strong></a>
protocol buffers.  See also
<a href=""https://github.com/tensorflow/transform"" target=""T"">tf.Transform</a>.</p>

<p>Feature engineering is sometimes called <strong>feature extraction</strong>.</p>

"
feature extraction (Google Machine Learning Glossary)|"

<p>Overloaded term having either of the following definitions:</p>

<ul align=""left"">
<li>Retrieving intermediate feature representations calculated by an
<a href=""https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning""><strong>unsupervised</strong></a> or pretrained model
(for example, <a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layer</strong></a> values in a
<a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>) for use in another model as input.</li>
<li>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#feature_engineering""><strong>feature engineering</strong></a>.</li>
</ul>

"
feature set (Google Machine Learning Glossary)|"

<p>The group of <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a> your machine learning model trains on.
For example, postal code, property size, and property condition might
comprise a simple feature set for a model that predicts housing prices.</p>

"
feature spec (Google Machine Learning Glossary)|"

<p>Describes the information required to extract <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a> data
from the <a href=""https://developers.google.com/machine-learning/glossary/#tf.Example""><strong>tf.Example</strong></a> protocol buffer. Because the
tf.Example protocol buffer is just a container for data, you must specify
the following:</p>

<ul align=""left"">
<li>the data to extract (that is, the keys for the features)</li>
<li>the data type (for example, float or int)</li>
<li>The length (fixed or variable)</li>
</ul>

<p>The <a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimator API</strong></a> provides facilities for producing a
feature spec from a list of
<a href=""https://developers.google.com/machine-learning/glossary/#feature_columns""><strong>FeatureColumns</strong></a>.</p>

"
feature vector (Google Machine Learning Glossary)|"

<p>The list of feature values representing an <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>example</strong></a>
passed into a model.</p>

"
federated learning (Google Machine Learning Glossary)|"

<p>A distributed machine learning approach that <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>trains</strong></a>
machine learning <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>models</strong></a> using decentralized
<a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a> residing on devices such as smartphones.
In federated learning, a subset of devices downloads the current model
from a central coordinating server. The devices use the examples stored
on the devices to make improvements to the model. The devices then upload
the model improvements (but not the training examples) to the coordinating
server, where they are aggregated with other updates to yield an improved
global model. After the aggregation, the model updates computed by devices
are no longer needed, and can be discarded.</p>

<p>Since the training examples are never uploaded, federated learning follows the
privacy principles of focused data collection and data minimization.</p>

<p>For more information about federated learning,
see <a href=""https://federated.withgoogle.com"" target=""T"">this tutorial</a>.</p>

"
feedback loop (Google Machine Learning Glossary)|"

<p>In machine learning, a situation in which a model&#39;s predictions influence the
training data for the same model or another model. For example, a model that
recommends movies will influence the movies that people see, which will then
influence subsequent movie recommendation models.</p>

"
feedforward neural network (FFN) (Google Machine Learning Glossary)|"

<p>A neural network without cyclic or recursive connections. For example,
traditional <a href=""https://developers.google.com/machine-learning/glossary/#deep_neural_network""><strong>deep neural networks</strong></a> are
feedforward neural networks. Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#recurrent_neural_network""><strong>recurrent neural
networks</strong></a>, which are cyclic.</p>

"
few-shot learning (Google Machine Learning Glossary)|"

<p>A machine learning approach, often used for object classification,
designed to learn effective classifiers from only a small number of
training examples.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#one-shot_learning""><strong>one-shot learning</strong></a>.</p>

"
fine tuning (Google Machine Learning Glossary)|"

<p>Perform a secondary optimization to adjust the parameters of an already
trained <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> to fit a new problem. Fine tuning often
refers to refitting the weights of a trained
<a href=""https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning""><strong>unsupervised</strong></a> model to a
<a href=""https://developers.google.com/machine-learning/glossary/#supervised_machine_learning""><strong>supervised</strong></a> model.</p>

"
forget gate (Google Machine Learning Glossary)|"

<p>The portion of a <a href=""https://developers.google.com/machine-learning/glossary/#Long_Short-Term_Memory""><strong>Long Short-Term Memory</strong></a>
cell that regulates the flow of information through the cell.
Forget gates maintain context by deciding which information to discard
from the cell state.</p>

"
full softmax (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#softmax""><strong>softmax</strong></a>. Contrast with
<a href=""https://developers.google.com/machine-learning/glossary/#candidate_sampling""><strong>candidate sampling</strong></a>.</p>

"
fully connected layer (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layer</strong></a> in which each <a href=""https://developers.google.com/machine-learning/glossary/#node""><strong>node</strong></a> is
connected to <em>every</em> node in the subsequent hidden layer.</p>

<p>A fully connected layer is also known as a <a href=""https://developers.google.com/machine-learning/glossary/#dense_layer""><strong>dense layer</strong></a>.</p>


"
GAN (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#generative_adversarial_network""><strong>generative adversarial
network</strong></a>.</p>

"
generalization (Google Machine Learning Glossary)|"

<p>Refers to your model&#39;s ability to make correct predictions on new,
previously unseen data as opposed to the data used to train the model.</p>

"
generalization curve (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#loss_curve""><strong>loss curve</strong></a> showing both the
<a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training set</strong></a> and the
<a href=""https://developers.google.com/machine-learning/glossary/#validation_set""><strong>validation set</strong></a>.
A generalization curve can help you detect possible
<a href=""https://developers.google.com/machine-learning/glossary/#overfitting""><strong>overfitting</strong></a>.  For example, the following
generalization curve suggests overfitting because loss for
the validation set ultimately becomes significantly higher
than for the training set.</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/GeneralizationCurve.svg"" height=""500""
alt=""A Cartesian plot in which the y-axis is labeled 'loss' and the x-axis
is labeled 'iterations'. Two graphs appear. One graph shows a loss curve
for a training set and the other graph shows a loss curve for a validation
set. The two curves start off similarly, but the curve for the training set
eventually dips far lower than the curve for the validation set.""></img>
</p>

"
generalized linear model (Google Machine Learning Glossary)|"

<p>A generalization of <a href=""https://developers.google.com/machine-learning/glossary/#least_squares_regression""><strong>least squares regression</strong></a>
models, which are based on
<a href=""https://wikipedia.org/wiki/Gaussian_noise"" target=""T"">Gaussian
noise</a>, to other
types of models based on other types of noise, such as
<a href=""https://wikipedia.org/wiki/Shot_noise"" target=""T"">Poisson noise</a>
or
categorical noise. Examples of generalized linear models include:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a></li>
<li>multi-class regression</li>
<li>least squares regression</li>
</ul>

<p>The parameters of a generalized linear model can be found through
<a href=""https://developers.google.com/machine-learning/glossary/#convex_optimization""><strong>convex optimization</strong></a>.</p>

<p>Generalized linear models exhibit the following properties:</p>

<ul align=""left"">
<li>The average prediction of the optimal least squares regression model is
equal to the average label on the training data.</li>
<li>The average probability predicted by the optimal logistic regression
model is equal to the average label on the training data.</li>
</ul>

<p>The power of a generalized linear model is limited by its features. Unlike
a deep model, a generalized linear model cannot &quot;learn new features.&quot;</p>

"
generative adversarial network (GAN) (Google Machine Learning Glossary)|"

<p>A system to create new data in which a <a href=""https://developers.google.com/machine-learning/glossary/#generator""><strong>generator</strong></a> creates
data and a <a href=""https://developers.google.com/machine-learning/glossary/#discriminator""><strong>discriminator</strong></a> determines whether that
created data is valid or invalid.</p>

"
generative model (Google Machine Learning Glossary)|"

<p>Practically speaking, a model that does either of the following:</p>

<ul align=""left"">
<li>Creates (generates) new examples from the training dataset.
For example, a generative model could create poetry after training
on a dataset of poems. The <a href=""https://developers.google.com/machine-learning/glossary/#generator""><strong>generator</strong></a> part of a
<a href=""https://developers.google.com/machine-learning/glossary/#generative_adversarial_network""><strong>generative adversarial network</strong></a>
falls into this category.</li>
<li>Determines the probability that a new example comes from the
training set, or was created from the same mechanism that created
the training set.  For example, after training on
a dataset consisting of English sentences, a generative model could
determine the probability that new input is a valid English sentence.</li>
</ul>

<p>A generative model can theoretically discern the distribution of examples
or particular features in a dataset. That is:</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
p(examples)
</pre>

<p>Unsupervised learning models are generative.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#discriminative_model""><strong>discriminative models</strong></a>.</p>

"
generator (Google Machine Learning Glossary)|"

<p>The subsystem within a <a href=""https://developers.google.com/machine-learning/glossary/#generative_adversarial_network""><strong>generative adversarial
network</strong></a>
that creates new <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a>.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#discriminative_model""><strong>discriminative model</strong></a>.</p>

"
gradient (Google Machine Learning Glossary)|"

<p>The vector of <a href=""https://developers.google.com/machine-learning/glossary/#partial_derivative""><strong>partial derivatives</strong></a> with respect to
all of the independent variables.  In machine learning, the gradient is
the vector of partial derivatives of the model function.  The gradient points
in the direction of steepest ascent.</p>

"
gradient clipping (Google Machine Learning Glossary)|"

<p>A commonly used mechanism to mitigate the
<a href=""https://developers.google.com/machine-learning/glossary/#exploding_gradient_problem""><strong>exploding gradient problem</strong></a> by artificially
limiting (clipping) the maximum value of gradients when using
<a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> to train a model.</p>

"
gradient descent (Google Machine Learning Glossary)|"

<p>A technique to minimize <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> by computing the gradients of
loss with respect to the model&#39;s parameters, conditioned on training data.
Informally, gradient descent iteratively adjusts parameters, gradually
finding the best combination of <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weights</strong></a> and bias to
minimize loss.</p>

"
graph (Google Machine Learning Glossary)|"

<p>In TensorFlow, a computation specification. Nodes in the graph
represent operations. Edges are directed and represent passing the result
of an operation (a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a>) as an
operand to another operation. Use
<a href=""https://developers.google.com/machine-learning/glossary/#TensorBoard""><strong>TensorBoard</strong></a> to visualize a graph.</p>

"
graph execution (Google Machine Learning Glossary)|"

<p>A TensorFlow programming environment in which the program first constructs
a <a href=""https://developers.google.com/machine-learning/glossary/#graph""><strong>graph</strong></a> and then executes all or part of that graph. Graph
execution is the default execution mode in TensorFlow 1.x.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#eager_execution""><strong>eager execution</strong></a>.</p>

"
greedy policy (Google Machine Learning Glossary)|"

<p>In reinforcement learning, a <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a> that always chooses the
action with the highest expected <a href=""https://developers.google.com/machine-learning/glossary/#return""><strong>return</strong></a>.</p>

"
ground truth (Google Machine Learning Glossary)|"

<p>The correct answer. Reality. Since reality is often subjective,
expert <a href=""https://developers.google.com/machine-learning/glossary/#rater""><strong>raters</strong></a> typically are the proxy for ground truth.</p>

"
group attribution bias (Google Machine Learning Glossary)|"

<p>Assuming that what is true for an individual is also true for everyone
in that group. The effects of group attribution bias can be exacerbated
if a <a href=""https://developers.google.com/machine-learning/glossary/#convenience_sampling""><strong>convenience sampling</strong></a>
is used for data collection. In a non-representative sample, attributions
may be made that do not reflect reality.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#out-group_homogeneity_bias""><strong>out-group homogeneity bias</strong></a>
and <a href=""https://developers.google.com/machine-learning/glossary/#in-group_bias""><strong>in-group bias</strong></a>.</p>


"
hashing (Google Machine Learning Glossary)|"

<p>In machine learning, a mechanism for bucketing
<a href=""https://developers.google.com/machine-learning/glossary/#categorical_data""><strong>categorical data</strong></a>, particularly when the number
of categories is large, but the number of categories actually appearing
in the dataset is comparatively small.</p>

<p>For example, Earth is home to about 60,000 tree species. You could
represent each of the 60,000 tree species in 60,000 separate categorical
buckets. Alternatively, if only 200 of those tree species actually appear
in a dataset, you could use hashing to divide tree species into
perhaps 500 buckets.</p>

<p>A single bucket could contain multiple tree species. For example, hashing
could place <em>baobab</em> and <em>red maple</em>&mdash;two genetically dissimilar
species&mdash;into the same bucket. Regardless, hashing is still a good way to
map large categorical sets into the desired number of buckets. Hashing turns a
categorical feature having a large number of possible values into a much
smaller number of values by grouping values in a
deterministic way.</p>

<p>For more information on hashing, see the
<a href=""https://www.tensorflow.org/guide/feature_columns"" target=""T"">Feature
Columns chapter</a> in the TensorFlow Programmers Guide.</p>

"
heuristic (Google Machine Learning Glossary)|"

<p>A quick solution to a problem, which may or may not be the best solution.
For example, &quot;With a heuristic, we achieved 86% accuracy. When we switched
to a deep neural network, accuracy went up to 98%.&quot;</p>

"
hidden layer (Google Machine Learning Glossary)|"

<p>A synthetic layer in a <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> between the
<a href=""https://developers.google.com/machine-learning/glossary/#input_layer""><strong>input layer</strong></a> (that is, the features) and the
<a href=""https://developers.google.com/machine-learning/glossary/#output_layer""><strong>output layer</strong></a> (the prediction). Hidden layers typically
contain an <a href=""https://developers.google.com/machine-learning/glossary/#activation_function""><strong>activation function</strong></a> (such as
<a href=""https://developers.google.com/machine-learning/glossary/#ReLU""><strong>ReLU</strong></a>) for training.  A <a href=""https://developers.google.com/machine-learning/glossary/#deep_neural_network""><strong>deep neural
network</strong></a> contains more than one
hidden layer.</p>

"
hierarchical clustering (Google Machine Learning Glossary)|"

<p>A category of <a href=""https://developers.google.com/machine-learning/glossary/#clustering""><strong>clustering</strong></a> algorithms that create a tree
of clusters. Hierarchical clustering is well-suited to hierarchical data,
such as botanical taxonomies. There are two types of hierarchical
clustering algorithms:</p>

<ul align=""left"">
<li><strong>Agglomerative clustering</strong> first assigns every example to its own cluster,
and iteratively merges the closest clusters to create a hierarchical
tree.</li>
<li><strong>Divisive clustering</strong> first groups all examples into one cluster and then
iteratively divides the cluster into a hierarchical tree.</li>
</ul>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#centroid_based_clustering""><strong>centroid-based clustering</strong></a>.</p>

"
hinge loss (Google Machine Learning Glossary)|"

<p>A family of <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> functions for
<a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification</strong></a> designed to find the
<a href=""https://developers.google.com/machine-learning/glossary/#decision_boundary""><strong>decision boundary</strong></a> as distant as possible
from each training example,
thus maximizing the margin between examples and the boundary.
<a href=""https://developers.google.com/machine-learning/glossary/#KSVMs""><strong>KSVMs</strong></a> use hinge loss (or a related function, such as
squared hinge loss). For binary classification, the hinge loss function
is defined as follows:</p>

<div>
[$$]\text{loss} = \text{max}(0, 1 - (y * y'))[/$$]
</div>

<p>where <em>y</em> is the true label, either -1 or +1, and <em>y&#39;</em> is the raw output
of the classifier model:</p>

<div>
[$$]y' = b + w_1x_1 + w_2x_2 + … w_nx_n[/$$]
</div>

<p>Consequently, a plot of hinge loss vs. (y * y&#39;) looks as follows:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/hinge-loss.svg"">
</p>

"
holdout data (Google Machine Learning Glossary)|"

<p><a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>Examples</strong></a> intentionally not used (&quot;held out&quot;) during training.
The <a href=""https://developers.google.com/machine-learning/glossary/#validation_set""><strong>validation dataset</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#test_set""><strong>test dataset</strong></a> are examples of holdout data. Holdout data
helps evaluate your model&#39;s ability to generalize to data other than the
data it was trained on. The loss on the holdout set provides a better
estimate of the loss on an unseen dataset than does the loss on the
training set.</p>

"
hyperparameter (Google Machine Learning Glossary)|"

<p>The &quot;knobs&quot; that you
tweak during successive runs of training a model. For example,
<a href=""https://developers.google.com/machine-learning/glossary/#learning_rate""><strong>learning rate</strong></a> is a hyperparameter.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#parameter""><strong>parameter</strong></a>.</p>

"
hyperplane (Google Machine Learning Glossary)|"

<p>A boundary that separates a space into two subspaces.  For example, a line is a
hyperplane in two dimensions and a plane is a hyperplane in three dimensions.
More typically in machine learning, a hyperplane is the boundary separating a
high-dimensional space.  <a href=""https://developers.google.com/machine-learning/glossary/#KSVMs""><strong>Kernel Support Vector Machines</strong></a> use
hyperplanes to separate positive classes from negative classes, often in a very
high-dimensional space.</p>


"
i.i.d. (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#iid""><strong>independently and identically distributed</strong></a>.</p>

"
image recognition (Google Machine Learning Glossary)|"

<p>A process that classifies object(s), pattern(s), or concept(s) in an image.
Image recognition is also known as <strong>image classification</strong>.</p>

<p>For more information, see
<a href=""/machine-learning/practica/image-classification/""
   target=""T""
   class=""gc-analytics-event""
   data-category=""launchImageClassificationPracticum""
   data-label=""ml-glossary""
   data-action=""click"">ML Practicum: Image Classification</a>.</p>

"
imbalanced dataset (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#class_imbalanced_data_set""><strong>class-imbalanced dataset</strong></a>.</p>

"
implicit bias (Google Machine Learning Glossary)|"

<p>Automatically making an association or assumption based on one’s mental
models and memories. Implicit bias can affect the following:</p>

<ul align=""left"">
<li>How data is collected and classified.</li>
<li>How machine learning systems are designed and developed.</li>
</ul>

<p>For example, when building a classifier to identify wedding photos,
an engineer may use the presence of a white dress in a photo as a feature.
However, white dresses have been customary only during certain eras and
in certain cultures.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#confirmation_bias""><strong>confirmation bias</strong></a>.</p>

"
incompatibility of fairness metrics (Google Machine Learning Glossary)|"

<p>The idea that some notions of fairness are mutually incompatible and
cannot be satisfied simultaneously. As a result, there is no single
universal <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>metric</strong></a> for quantifying fairness
that can be applied to all ML problems.</p>

<p>While this may seem discouraging, incompatibility of fairness metrics
doesn’t imply that fairness efforts are fruitless. Instead, it suggests
that fairness must be defined contextually for a given ML problem, with
the goal of preventing harms specific to its use cases.</p>

<p>See <a href=""https://arxiv.org/pdf/1609.07236.pdf"" target=""T"">&quot;On the
(im)possibility of fairness&quot;</a> for a more detailed discussion of this topic.</p>

"
independently and identically distributed (i.i.d) (Google Machine Learning Glossary)|"

<p>Data drawn from a distribution that doesn&#39;t change, and where each value
drawn doesn&#39;t depend on values that have been drawn previously. An i.i.d.
is the <a href=""https://wikipedia.org/wiki/Ideal_gas"" target=""T"">ideal gas</a>
of machine
learning—a useful mathematical construct but almost never exactly found
in the real world. For example, the distribution of visitors to a web page
may be i.i.d. over a brief window of time; that is, the distribution doesn&#39;t
change during that brief window and one person&#39;s visit is generally
independent of another&#39;s visit. However, if you expand that window of time,
seasonal differences in the web page&#39;s visitors may appear.</p>

"
individual fairness (Google Machine Learning Glossary)|"

<p>A fairness metric that checks whether similar individuals are classified
similarly. For example, Brobdingnagian Academy might want to satisfy
individual fairness by ensuring that two students with identical grades
and standardized test scores are equally likely to gain admission.</p>

<p>Note that individual fairness relies entirely on how you define &quot;similarity&quot;
(in this case, grades and test scores), and you can run the risk of
introducing new fairness problems if your similarity metric misses important
information (such as the rigor of a student’s curriculum).</p>

<p>See <a href=""https://arxiv.org/pdf/1104.3913.pdf"" target=""T"">&quot;Fairness Through
Awareness&quot;</a> for a more detailed discussion of individual fairness.</p>

"
inference (Google Machine Learning Glossary)|"

<p>In machine learning, often refers to the process of making predictions by
applying the trained model to <a href=""https://developers.google.com/machine-learning/glossary/#unlabeled_example""><strong>unlabeled examples</strong></a>.
In statistics, inference refers to the process of fitting the parameters
of a distribution conditioned on some observed data. (See the
<a href=""https://wikipedia.org/wiki/Statistical_inference"" target=""T"">
Wikipedia article on statistical inference</a>.)</p>

"
in-group bias (Google Machine Learning Glossary)|"

<p>Showing partiality to one&#39;s own group or own characteristics.
If testers or raters consist of the machine learning developer&#39;s friends,
family, or colleagues, then in-group bias may invalidate product testing
or the dataset.</p>

<p>In-group bias is a form of
<a href=""https://developers.google.com/machine-learning/glossary/#group_attribution_bias""><strong>group attribution bias</strong></a>.
See also <a href=""https://developers.google.com/machine-learning/glossary/#out-group_homogeneity_bias""><strong>out-group homogeneity bias</strong></a>.</p>

"
input function (Google Machine Learning Glossary)|"

<p>In TensorFlow, a function that returns input data to the training, evaluation,
or prediction method of an <a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimator</strong></a>.  For example,
the training input function returns a <a href=""https://developers.google.com/machine-learning/glossary/#batch""><strong>batch</strong></a> of features
and labels from the <a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training set</strong></a>.</p>

"
input layer (Google Machine Learning Glossary)|"

<p>The first layer (the one that receives the input data) in
a <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>.</p>

"
instance (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>example</strong></a>.</p>

"
interpretability (Google Machine Learning Glossary)|"

<p>The degree to which a model&#39;s predictions can be readily explained. Deep models
are often non-interpretable; that is, a deep model&#39;s different layers can be
hard to decipher. By contrast, linear regression models and <a href=""https://developers.google.com/machine-learning/glossary/#wide_model""><strong>wide
models</strong></a> are typically far more interpretable.</p>

"
inter-rater agreement (Google Machine Learning Glossary)|"

<p>A measurement of how often human raters agree when doing a task.
If raters disagree, the task instructions may need to be improved.
Also sometimes called <strong>inter-annotator agreement</strong> or
<strong>inter-rater reliability</strong>.  See also
<a href=""https://wikipedia.org/wiki/Cohen%27s_kappa"" target=""T"">Cohen&#39;s
kappa</a>,
which is one of the most popular inter-rater agreement measurements.</p>

"
intersection over union (IoU) (Google Machine Learning Glossary)|"

<p>The intersection of two sets divided by their union. In machine-learning
image-detection tasks, IoU is used to measure the accuracy of the model’s
predicted <a href=""https://developers.google.com/machine-learning/glossary/#bounding_box""><strong>bounding box</strong></a> with respect to the
<a href=""https://developers.google.com/machine-learning/glossary/#ground_truth""><strong>ground-truth</strong></a> bounding box. In this case, the IoU for the
two boxes is the ratio between the overlapping area and the total area, and
its value ranges from 0 (no overlap of predicted bounding box and ground-truth
bounding box) to 1 (predicted bounding box and ground-truth bounding box have
the exact same coordinates).</p>

<p>For example, in the image below:</p>

<ul align=""left"">
<li>The predicted bounding box (the coordinates delimiting where the model
predicts the night table in the painting is located) is outlined in purple.</li>
<li>The ground-truth bounding box (the coordinates delimiting where the night
table in the painting is actually located) is outlined in green.</li>
</ul>

<p><img src=""https://developers.google.com/machine-learning/glossary/images/iou_van_gogh_bounding_boxes.jpg""
     alt=""The Van Gogh painting 'Vincent's Bedroom in Arles', with two different
          bounding boxes around the night table beside the bed. The ground-truth
          bounding box (in green) perfectly circumscribes the night table. The
          predicted bounding box (in purple) is offset 50% down and to the right
          of the ground-truth bounding box; it encloses the bottom-right quarter
          of the night table, but misses the rest of the table.""></p>

<p>Here, the intersection of the bounding boxes for prediction and ground truth
(below left) is 1, and the union of the bounding boxes for prediction and
ground truth (below right) is 7, so the IoU is \(\frac{1}{7}\).</p>

<div id=""intersection-union-side-by-side"">
<img src=""https://developers.google.com/machine-learning/glossary/images/iou_van_gogh_intersection.jpg""
     alt=""Same image as above, but with each bounding box divided into four
          quadrants. There are seven quadrants total, as the bottom-right
          quadrant of the ground-truth bounding box and the top-left
          quadrant of the predicted bounding box overlap each other. This
          overlapping section (highlighted in green) represents the
          intersection, and has an area of 1."">

<img src=""https://developers.google.com/machine-learning/glossary/images/iou_van_gogh_union.jpg""
     alt=""Same image as above, but with each bounding box divided into four
          quadrants. There are seven quadrants total, as the bottom-right
          quadrant of the ground-truth bounding box and the top-left
          quadrant of the predicted bounding box overlap each other.
          The entire interior enclosed by both bounding boxes
          (highlighted in green) represents the union, and has
          an area of 7."">
</div>

"
IoU (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#intersection_over_union""><strong>intersection over union</strong></a>.</p>

"
item matrix (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation systems</strong></a>, a
matrix of <a href=""https://developers.google.com/machine-learning/glossary/#embeddings""><strong>embeddings</strong></a> generated by
<a href=""https://developers.google.com/machine-learning/glossary/#matrix_factorization""><strong>matrix factorization</strong></a>
that holds latent signals about each <a href=""https://developers.google.com/machine-learning/glossary/#items""><strong>item</strong></a>.
Each row of the item matrix holds the value of a single latent
feature for all items.
For example, consider a movie recommendation system. Each column
in the item matrix represents a single movie. The latent signals
might represent genres, or might be harder-to-interpret
signals that involve complex interactions among genre, stars,
movie age, or other factors.</p>

<p>The item matrix has the same number of columns as the target
matrix that is being factorized. For example, given a movie
recommendation system that evaluates 10,000 movie titles, the
item matrix will have 10,000 columns.</p>

"
items (Google Machine Learning Glossary)|"

<p>In a <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation system</strong></a>, the entities that
a system recommends. For example, videos are the items that a video store
recommends, while books are the items that a bookstore recommends.</p>

"
iteration (Google Machine Learning Glossary)|"

<p>A single update of a model&#39;s weights during training.  An iteration
consists of computing the gradients of the parameters with respect to the
loss on a single <a href=""https://developers.google.com/machine-learning/glossary/#batch""><strong>batch</strong></a> of data.</p>


"
Keras (Google Machine Learning Glossary)|"

<p>A popular Python machine learning API.
<a href=""https://keras.io"" target=""T"">Keras</a>
runs on
several deep learning frameworks, including TensorFlow, where it is made
available as
<a href=""https://www.tensorflow.org/api_docs/python/tf/keras""
target=""T"">tf.keras</a>.</p>

"
keypoints (Google Machine Learning Glossary)|"

<p>The coordinates of particular features in an image. For example, for an
<a href=""https://developers.google.com/machine-learning/glossary/#image_recognition""><strong>image recognition</strong></a> model that distinguishes
flower species, keypoints might be the center of each petal, the stem,
the stamen, and so on.</p>

"
Kernel Support Vector Machines (KSVMs) (Google Machine Learning Glossary)|"

<p>A classification algorithm that seeks to maximize the margin between
<a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative classes</strong></a> by mapping input data vectors
to a higher dimensional space.  For example, consider a classification
problem in which the input dataset
has a hundred features. To maximize the margin between
positive and negative classes, a KSVM could internally map those features into
a million-dimension space.  KSVMs uses a loss function called
<a href=""https://developers.google.com/machine-learning/glossary/#hinge-loss""><strong>hinge loss</strong></a>.</p>

"
k-means (Google Machine Learning Glossary)|"

<p>A popular <a href=""https://developers.google.com/machine-learning/glossary/#clustering""><strong>clustering</strong></a> algorithm that groups examples
in unsupervised learning. The k-means algorithm basically does the following:</p>

<ul align=""left"">
<li>Iteratively determines the best k center points (known
as <a href=""https://developers.google.com/machine-learning/glossary/#centroid""><strong>centroids</strong></a>).</li>
<li>Assigns each example to the closest centroid.  Those examples nearest
the same centroid belong to the same group.</li>
</ul>

<p>The k-means algorithm picks centroid locations to minimize the cumulative
<em>square</em> of the distances from each example to its closest centroid.</p>

<p>For example, consider the following plot of dog height to dog width:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/DogDimensions.svg"">
</p>

<p>If k=3, the k-means algorithm will determine three centroids.  Each example
is assigned to its closest centroid, yielding three groups:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/DogDimensionsKMeans.svg"">
</p>

<p>Imagine that a manufacturer wants to determine the ideal sizes for small,
medium, and large sweaters for dogs. The three centroids identify the mean
height and mean width of each dog in that cluster. So, the manufacturer
should probably base sweater sizes on those three centroids.  Note that
the centroid of a cluster is typically <em>not</em> an example in the cluster.</p>

<p>The preceding illustrations shows k-means for examples with only
two features (height and width). Note that k-means can group examples
across many features.</p>

"
k-median (Google Machine Learning Glossary)|"

<p>A clustering algorithm closely related to <a href=""https://developers.google.com/machine-learning/glossary/#k-means""><strong>k-means</strong></a>. The
practical difference between the two is as follows:</p>

<ul align=""left"">
<li>In k-means, centroids are determined by minimizing the sum of the
<em>squares</em> of the distance between a centroid candidate and each of
its examples.</li>
<li>In k-median, centroids are determined by minimizing the sum of the
distance between a centroid candidate and each of its examples.</li>
</ul>

<p>Note that the definitions of distance are also different:</p>

<ul align=""left"">
<li>k-means relies on the
<a href=""https://wikipedia.org/wiki/Euclidean_distance""
target=""T"">Euclidean distance</a> from
the centroid to an example.  (In two dimensions, the Euclidean
distance means using the Pythagorean theorem to calculate
the hypotenuse.)  For example, the k-means distance between (2,2)
and (5,-2) would be:</li>
</ul>

<div>
[$$]
{\text{Euclidean distance}} = {\sqrt {(2-5)^2 + (2--2)^2}} = 5
[/$$]
</div>

<ul align=""left"">
<li>k-median relies on the <a href=""https://wikipedia.org/wiki/Taxicab_geometry""
target=""T""> Manhattan distance</a>
from the centroid to an example.  This distance is the sum of the
absolute deltas in each dimension.  For example, the k-median
distance between (2,2) and (5,-2) would be:</li>
</ul>

<div>
[$$]
{\text{Manhattan distance}} = \lvert 2-5 \rvert + \lvert 2--2 \rvert = 7
[/$$]
</div>


"
L<sub>1</sub> loss (Google Machine Learning Glossary)|"

<p><a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>Loss</strong></a> function based on the absolute value of the difference
between the values that a model is predicting and the actual values of
the <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>labels</strong></a>. L<sub>1</sub> loss is less sensitive to outliers
than <a href=""https://developers.google.com/machine-learning/glossary/#squared_loss""><strong>L<sub>2</sub> loss</strong></a>.</p>

"
L<sub>1</sub> regularization (Google Machine Learning Glossary)|"

<p>A type of <a href=""https://developers.google.com/machine-learning/glossary/#regularization""><strong>regularization</strong></a> that penalizes weights
in proportion to the sum of the absolute values of the weights. In models
relying on <a href=""https://developers.google.com/machine-learning/glossary/#sparse_features""><strong>sparse features</strong></a>, L<sub>1</sub>
regularization helps drive the weights of irrelevant or barely relevant
features to exactly 0, which removes those features from the model.
Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#L2_regularization""><strong>L<sub>2</sub> regularization</strong></a>.</p>

"
L<sub>2</sub> loss (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#squared_loss""><strong>squared loss</strong></a>.</p>

"
L<sub>2</sub> regularization (Google Machine Learning Glossary)|"

<p>A type of <a href=""https://developers.google.com/machine-learning/glossary/#regularization""><strong>regularization</strong></a> that penalizes weights
in proportion to the sum of the <em>squares</em> of the weights.
L<sub>2</sub> regularization helps drive outlier weights (those with
high positive or low negative values) closer to 0 but not quite to 0.
(Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#L1_regularization""><strong>L1 regularization</strong></a>.)
L<sub>2</sub> regularization always improves generalization in linear models.</p>

"
label (Google Machine Learning Glossary)|"

<p>In supervised learning, the &quot;answer&quot; or &quot;result&quot; portion of an
<a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>example</strong></a>. Each example in a labeled dataset consists of one or
more features and a label. For instance, in a housing dataset, the features
might include the number of bedrooms, the number of bathrooms, and the age
of the house, while the label might be the house&#39;s price.
In a spam detection dataset, the features might include the subject line, the
sender, and the email message itself, while the label would probably be either
&quot;spam&quot; or &quot;not spam.&quot;</p>

"
labeled example (Google Machine Learning Glossary)|"

<p>An example that contains <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a> and a
<a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a>. In supervised training, models learn from labeled
examples.</p>

"
lambda (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#regularization_rate""><strong>regularization rate</strong></a>.</p>

<p>(This is an overloaded term. Here we&#39;re focusing on the term&#39;s
definition within <a href=""https://developers.google.com/machine-learning/glossary/#regularization""><strong>regularization</strong></a>.)</p>

"
landmarks (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#keypoints""><strong>keypoints</strong></a>.</p>

"
layer (Google Machine Learning Glossary)|"

<p>A set of <a href=""https://developers.google.com/machine-learning/glossary/#neuron""><strong>neurons</strong></a> in a
<a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> that process a set of input
features, or the output of those neurons.</p>

<p>Also, an abstraction in TensorFlow. Layers are Python
functions that take <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensors</strong></a> and configuration options
as input and produce other tensors as output. Once the necessary Tensors
have been composed, the user can convert the result into an
<a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimator</strong></a> via a <a href=""https://developers.google.com/machine-learning/glossary/#model_function""><strong>model function</strong></a>.</p>

"
Layers API (tf.layers) (Google Machine Learning Glossary)|"

<p>A TensorFlow API for constructing a <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep</strong></a> neural network
as a composition of layers. The Layers API enables you to build different
types of <a href=""https://developers.google.com/machine-learning/glossary/#layer""><strong>layers</strong></a>, such as:</p>

<ul align=""left"">
<li><code translate=""no"" dir=""ltr"">tf.layers.Dense</code> for a <a href=""https://developers.google.com/machine-learning/glossary/#fully_connected_layer""><strong>fully-connected layer</strong></a>.</li>
<li><code translate=""no"" dir=""ltr"">tf.layers.Conv2D</code> for a convolutional layer.</li>
</ul>

<p>When writing a <a href=""https://developers.google.com/machine-learning/glossary/#custom_estimator""><strong>custom Estimator</strong></a>, you compose Layers
objects to define the characteristics of all the
<a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layers</strong></a>.</p>

<p>The Layers API follows the <a href=""https://developers.google.com/machine-learning/glossary/#Keras""><strong>Keras</strong></a> layers API conventions.
That is, aside from a different prefix, all functions in the Layers API
have the same names and signatures as their counterparts in the Keras
layers API.</p>

"
learning rate (Google Machine Learning Glossary)|"

<p>A scalar used to train a model via gradient descent. During each iteration,
the <a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> algorithm multiplies the
learning rate by the gradient.  The resulting product is called the
<strong>gradient step</strong>.</p>

<p>Learning rate is a key <a href=""https://developers.google.com/machine-learning/glossary/#hyperparameter""><strong>hyperparameter</strong></a>.</p>

"
least squares regression (Google Machine Learning Glossary)|"

<p>A linear regression model trained by minimizing
<a href=""https://developers.google.com/machine-learning/glossary/#L2_loss""><strong>L<sub>2</sub> Loss</strong></a>.</p>

"
linear model (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> that assigns one <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weight</strong></a> per
<a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a> to make <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a>.
(Linear models also incorporate a <a href=""https://developers.google.com/machine-learning/glossary/#bias""><strong>bias</strong></a>.) By contrast,
the relationship of weights to features in  <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep models</strong></a>
is not one-to-one.</p>

<p>A linear model uses the following formula:</p>

<div>
[$$]y' = b + w_1x_1 + w_2x_2 + … w_nx_n[/$$]
</div>

<p>where:
<ul align=""left"">
<li>\(y&#39;\) is the raw prediction. (In certain kinds of linear models, this
raw prediction will be further modified.  For example, see
<a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression"">logistic regression</a>.)</li>
<li>\(b\) is the <a href=""https://developers.google.com/machine-learning/glossary/#bias""><strong>bias</strong></a>.</li>
<li>\(w\) is a <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weight</strong></a>, so \(w_1\) is the weight of
the first feature, \(w_2\) is the weight of the second feature,
and so on.</li>
<li>\(x\) is a <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a>, so \(x_1\) is the value of the
first feature, \(x_2\) is the value of the second feature, and so on.</li>
</ul></p>

<p>For example, suppose a linear model for three features learns the following
bias and weights:
<ul align=""left"">
<li>\(b\) = 7</li>
<li>\(w_1\) = -2.5</li>
<li>\(w_2\) = -1.2</li>
<li>\(w_3\) = 1.4</li>
</ul></p>

<p>Therefore, given three features (\(x_1\), \(x_2\), and \(x_3\)),
the linear model uses the following equation to generate each prediction:</p>

<div>
[$$]y' = 7 + (-2.5)(x_1) + (-1.2)(x_2) + (1.4)(x_3)[/$$]
</div>

<p>Suppose a particular example contains the following values:
<ul align=""left"">
<li>\(x_1\) = 4</li>
<li>\(x_2\) = -10</li>
<li>\(x_3\) = 5</li>
</ul></p>

<p>Plugging those values into the formula yields a prediction for this example:</p>

<div>
[$$]y' = 7 + (-2.5)(4) + (-1.2)(-10) + (1.4)(5)[/$$]
[$$]y' = 16[/$$]
</div>

<p>Linear models tend to be easier to analyze and train than deep models. However,
deep models can model complex relationships <em>between</em> features.</p>

<p><a href=""https://developers.google.com/machine-learning/glossary/#linear_regression""><strong>Linear regression</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a> are two types of linear models.
Linear models include not only models that use the linear equation but also a
broader set of models that use the linear equation as part of the formula.
For example, logistic regression post-processes the raw
prediction (\(y&#39;\)) to calculate the prediction.</p>

"
linear regression (Google Machine Learning Glossary)|"

<p>Using the raw output (\(y&#39;\)) of a <a href=""https://developers.google.com/machine-learning/glossary/#linear_model""><strong>linear model</strong></a> as the
actual prediction in a <a href=""https://developers.google.com/machine-learning/glossary/#regression_model""><strong>regression model</strong></a>. The goal of
a regression problem is to make a real-valued prediction. For example, if
the raw output (\(y&#39;\)) of a linear model is 8.37, then the prediction is
8.37.</p>

<p>Contrast linear regression with <a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a>.
Also, contrast regression with <a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification</strong></a>.</p>

"
logistic regression (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification model</strong></a> that uses a
<a href=""https://developers.google.com/machine-learning/glossary/#sigmoid_function""><strong>sigmoid function</strong></a> to convert
a <a href=""https://developers.google.com/machine-learning/glossary/#linear_model""><strong>linear model&#39;s</strong></a> raw prediction (\(y&#39;\)) into a
value between 0 and 1. You can interpret the value between 0 and 1 in either
of the following two ways:</p>

<ul align=""left"">
<li>As a probability that the example belongs to the
<a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive class</strong></a> in a binary classification problem.</li>
<li>As a value to be compared against a
<a href=""https://developers.google.com/machine-learning/glossary/#classification_threshold""><strong>classification threshold</strong></a>. If the value is
equal to or above the classification threshold, the system classifies the
example as the positive class. Conversely, if the value is below the given
threshold, the system classifies the example as the
<a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative class</strong></a>.  For example, suppose the
classification threshold is 0.82:
<ul align=""left"">
<li>Imagine an example that produces a raw prediction (\(y&#39;\)) of 2.6.
 The sigmoid of 2.6 is 0.93. Since 0.93 is greater than 0.82, the
 system classifies this example as the positive class.</li>
<li>Imagine a different example that produces a raw prediction of 1.3. The
 sigmoid of 1.3 is 0.79. Since 0.79 is less than 0.82, the system
 classifies that example as the negative class.</li>
</ul></li>
</ul>

<p>Although logistic regression is often used in
<a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a> problems, logistic
regression can also be used in
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a> problems
(where it becomes called <strong>multi-class logistic regression</strong> or
<strong>multinomial regression</strong>).</p>

"
logits (Google Machine Learning Glossary)|"

<p>The vector of raw (non-normalized) predictions that a classification
model generates, which is ordinarily then passed to a normalization function.
If the model is solving a <a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a>
problem, logits typically become an input to the
<a href=""https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2""
target=""T"">softmax function</a>.
The softmax function then generates a vector of (normalized)
probabilities with one value for each possible class.</p>

<p>In addition, logits sometimes refer to the element-wise inverse of the
<a href=""https://developers.google.com/machine-learning/glossary/#sigmoid_function""><strong>sigmoid function</strong></a>. For more information, see
<a href=""https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits""
target=""T"">tf.nn.sigmoid_cross_entropy_with_logits</a>.</p>

"
Log Loss (Google Machine Learning Glossary)|"

<p>The <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> function used in binary
<a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a>.</p>

"
log-odds (Google Machine Learning Glossary)|"

<p>The logarithm of the odds of some event.</p>

<p>If the event refers to a binary probability, then <strong>odds</strong> refers to
the ratio of the probability of success (p) to the probability of
failure (1-p).  For example, suppose that a given event has a 90%
probability of success and a 10% probability of failure. In this case,
odds is calculated as follows:</p>

<div>
[$$]
{\text{odds}} =
\frac{\text{p}} {\text{(1-p)}} =
\frac{.9} {.1} =
{\text{9}}
[/$$]
</div>

<p>The log-odds is simply the logarithm of the odds. By convention,
&quot;logarithm&quot; refers to natural logarithm, but logarithm could actually
be any base greater than 1.  Sticking to convention, the log-odds of
our example is therefore:</p>

<div>
[$$]
{\text{log-odds}} =
ln(9) ~= 2.2
[/$$]
</div>

<p>The log-odds are the inverse of the <a href=""https://developers.google.com/machine-learning/glossary/#sigmoid_function""><strong>sigmoid function</strong></a>.</p>

"
Long Short-Term Memory (LSTM) (Google Machine Learning Glossary)|"

<p>A type of cell in a
<a href=""https://developers.google.com/machine-learning/glossary/#recurrent_neural_network""><strong>recurrent neural network</strong></a> used to process
sequences of data in applications such as handwriting recognition, machine
translation, and image captioning. LSTMs address the
<a href=""https://developers.google.com/machine-learning/glossary/#vanishing_gradient_problem""><strong>vanishing gradient problem</strong></a> that occurs when
training RNNs due to long data sequences by maintaining history in an
internal memory state based on new input and context from previous cells
in the RNN.</p>

"
loss (Google Machine Learning Glossary)|"

<p>A measure of how far a model&#39;s <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a> are from its
<a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a>. Or, to phrase it more pessimistically, a measure of
how bad the model is. To determine this value, a model must define a loss
function. For example, linear regression models typically use
<a href=""https://developers.google.com/machine-learning/glossary/#MSE""><strong>mean squared error</strong></a> for a loss function,
while logistic regression models use <a href=""https://developers.google.com/machine-learning/glossary/#Log_Loss""><strong>Log Loss</strong></a>.</p>

"
loss curve (Google Machine Learning Glossary)|"

<p>A graph of <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> as a function of training
<a href=""https://developers.google.com/machine-learning/glossary/#iteration""><strong>iterations</strong></a>. For example:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/LossCurve.svg"" height=""300"" alt=""A
graph of loss versus training iterations, showing a steady drop as iterations
increase, but then a slight rise in loss at a high number of iterations.""></img>
</p>

<p>The loss curve can help you determine when your model is
<a href=""https://developers.google.com/machine-learning/glossary/#convergence""><strong>converging</strong></a>, <a href=""https://developers.google.com/machine-learning/glossary/#overfitting""><strong>overfitting</strong></a>,
or <a href=""https://developers.google.com/machine-learning/glossary/#underfitting""><strong>underfitting</strong></a>.</p>

"
loss surface (Google Machine Learning Glossary)|"

<p>A graph of weight(s) vs. loss. <a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>Gradient descent</strong></a> aims
to find the weight(s) for which the loss surface is at a local minimum.</p>

"
LSTM (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#Long_Short-Term_Memory""><strong>Long Short-Term Memory</strong></a>.</p>


"
machine learning (Google Machine Learning Glossary)|"

<p>A program or system that builds (trains) a predictive model from input data.
The system uses the learned model to make useful predictions from new
(never-before-seen) data drawn from the same distribution as the one used to
train the model. Machine learning also refers to the field of study concerned
with these programs or systems.</p>

"
majority class (Google Machine Learning Glossary)|"

<p>The more common label in a
<a href=""https://developers.google.com/machine-learning/glossary/#class_imbalanced_data_set""><strong>class-imbalanced dataset</strong></a>. For example,
given a dataset containing 99% non-spam labels and 1% spam labels, the
non-spam labels are the majority class.</p>

"
Markov decision process (MDP) (Google Machine Learning Glossary)|"

<p>A graph representing the decision-making model where decisions
(or <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>actions</strong></a>) are taken to navigate a sequence of
<a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>states</strong></a> under the assumption that the
<a href=""https://developers.google.com/machine-learning/glossary/#Markov_property""><strong>Markov property</strong></a> holds. In reinforcement learning,
these transitions between states return a numerical <a href=""https://developers.google.com/machine-learning/glossary/#reward""><strong>reward</strong></a>.</p>

"
Markov property (Google Machine Learning Glossary)|"

<p>A property of certain <a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environments</strong></a>, where state
transitions are entirely determined by information implicit in the
current <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a> and the agent’s <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a>.</p>

"
matplotlib (Google Machine Learning Glossary)|"

<p>An open-source Python 2D plotting library.
<a href=""https://matplotlib.org/"" target=""T"">matplotlib</a> helps you visualize
different aspects of machine learning.</p>

"
matrix factorization (Google Machine Learning Glossary)|"

<p>In math, a mechanism for finding the matrices whose dot product approximates a
target matrix.</p>

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation systems</strong></a>, the target matrix
often holds users&#39; ratings on <a href=""https://developers.google.com/machine-learning/glossary/#items""><strong>items</strong></a>. For example, the target
matrix for a movie recommendation system might look something like the
following, where the positive integers are user ratings and 0
means that the user didn&#39;t rate the movie:</p>

<table>
  <tr>
    <th>&nbsp;</th>
    <th>Casablanca</th>
    <th>The Philadelphia Story</th>
    <th>Black Panther</th>
    <th>Wonder Woman</th>
    <th>Pulp Fiction</th>
  </tr>

  <tr>
    <td>User 1</td>
    <td>5.0</td>
    <td>3.0</td>
    <td>0.0</td>
    <td>2.0</td>
    <td>0.0</td>
  </tr>
  <tr>
    <td>User 2</td>
    <td>4.0</td>
    <td>0.0</td>
    <td>0.0</td>
    <td>1.0</td>
    <td>5.0</td>
  </tr>
  <tr>
    <td>User 3</td>
    <td>3.0</td>
    <td>1.0</td>
    <td>4.0</td>
    <td>5.0</td>
    <td>0.0</td>
  </tr>
</table>

<p>The movie recommendation system aims to predict user ratings for
unrated movies.  For example, will User 1 like <em>Black Panther</em>?</p>

<p>One approach for recommendation systems is to use matrix
factorization to generate the following two matrices:</p>

<ul align=""left"">
<li>A <a href=""https://developers.google.com/machine-learning/glossary/#user_matrix""><strong>user matrix</strong></a>, shaped as the number of users X the
number of embedding dimensions.</li>
<li>An <a href=""https://developers.google.com/machine-learning/glossary/#item_matrix""><strong>item matrix</strong></a>, shaped as the number of embedding
dimensions X the number of items.</li>
</ul>

<p>For example, using matrix factorization on our three users and five items
could yield the following user matrix and item matrix:</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
User Matrix                 Item Matrix

1.1   2.3           0.9   0.2   1.4    2.0   1.2
0.6   2.0           1.7   1.2   1.2   -0.1   2.1
2.5   0.5                                       
</pre>

<p>The dot product of the user matrix and item matrix yields a recommendation
matrix that contains not only the original user ratings but also predictions
for the movies that each user hasn&#39;t seen.
For example, consider User 1&#39;s rating of <em>Casablanca</em>, which was 5.0. The dot
product corresponding to that cell in the recommendation matrix should
hopefully be around 5.0, and it is:</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
(1.1 * 0.9) + (2.3 * 1.7) = 4.9
</pre>

<p>More importantly, will User 1 like <em>Black Panther</em>? Taking the dot product
corresponding to the first row and the third column yields a predicted
rating of 4.3:</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
(1.1 * 1.4) + (2.3 * 1.2) = 4.3
</pre>

<p>Matrix factorization typically yields a user matrix and item matrix that,
together, are significantly more compact than the target matrix.</p>

"
Mean Absolute Error (MAE) (Google Machine Learning Glossary)|"

<p>An error metric calculated by taking an average of absolute errors.
In the context of evaluating a model’s accuracy, MAE is the average
absolute difference between the expected and predicted values across
all training examples. Specifically, for \(n\) examples, for each value
\(y\) and its prediction \(\hat{y}\), MAE is defined as follows:</p>

<p>\[\text{MAE} = \frac{1}{n}\sum_{i=0}^n | y_i - \hat{y}_i |\]</p>

"
Mean Squared Error (MSE) (Google Machine Learning Glossary)|"

<p>The average squared loss per example. MSE is calculated by dividing the
<a href=""https://developers.google.com/machine-learning/glossary/#squared_loss""><strong>squared loss</strong></a> by the number of
<a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a>. The values that
<a href=""https://developers.google.com/machine-learning/glossary/#TensorFlow_Playground""><strong>TensorFlow Playground</strong></a> displays for
&quot;Training loss&quot; and &quot;Test loss&quot; are MSE.</p>

"
metric (Google Machine Learning Glossary)|"

<p>A number that you care about. May or may not be directly optimized in a
machine-learning system. A metric that your system tries to optimize is
called an <a href=""https://developers.google.com/machine-learning/glossary/#objective""><strong>objective</strong></a>.</p>

"
Metrics API (tf.metrics) (Google Machine Learning Glossary)|"

<p>A TensorFlow API for evaluating models. For example, <code translate=""no"" dir=""ltr"">tf.metrics.accuracy</code>
determines how often a model&#39;s predictions match labels. When writing a
<a href=""https://developers.google.com/machine-learning/glossary/#custom_estimator""><strong>custom Estimator</strong></a>, you invoke Metrics API functions to
specify how your model should be evaluated.</p>

"
mini-batch (Google Machine Learning Glossary)|"

<p>A small, randomly selected subset of the entire batch of
<a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a> run together in a single iteration of training
or inference. The <a href=""https://developers.google.com/machine-learning/glossary/#batch_size""><strong>batch size</strong></a> of a mini-batch is usually
between 10 and 1,000. It is much more efficient to calculate the loss on a
mini-batch than on the full training data.</p>

"
mini-batch stochastic gradient descent (SGD) (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> algorithm that uses
<a href=""https://developers.google.com/machine-learning/glossary/#mini-batch""><strong>mini-batches</strong></a>. In other words, mini-batch SGD estimates the
gradient based on a small subset of the training data. <a href=""https://developers.google.com/machine-learning/glossary/#SGD""><strong>Vanilla SGD</strong></a>
uses a mini-batch of size 1.</p>

"
minimax loss (Google Machine Learning Glossary)|"

<p>A loss function for
<a href=""https://developers.google.com/machine-learning/glossary/#generative_adversarial_network""><strong>generative adversarial networks</strong></a>,
based on the <a href=""https://developers.google.com/machine-learning/glossary/#cross-entropy""><strong>cross-entropy</strong></a> between the distribution
of generated data and real data.</p>

<p>Minimax loss is used in the
<a href=""https://arxiv.org/pdf/1406.2661.pdf"">first paper</a> to describe
generative adversarial networks.</p>


"
minority class (Google Machine Learning Glossary)|"

<p>The less common label in a
<a href=""https://developers.google.com/machine-learning/glossary/#class_imbalanced_data_set""><strong>class-imbalanced dataset</strong></a>. For example,
given a dataset containing 99% non-spam labels and 1% spam labels, the
spam labels are the minority class.</p>

"
ML (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#machine_learning""><strong>machine learning</strong></a>.</p>

"
MNIST (Google Machine Learning Glossary)|"

<p>A public-domain dataset compiled by LeCun, Cortes, and Burges containing
60,000 images, each image showing how a human manually wrote a particular
digit from 0–9.  Each image is stored as a 28x28 array of integers, where
each integer is a grayscale value between 0 and 255, inclusive.</p>

<p>MNIST is a canonical dataset for machine learning, often used to test new
machine learning approaches. For details, see
<a href=""http://yann.lecun.com/exdb/mnist/"" target=""T"">
The MNIST Database of Handwritten Digits</a>.</p>

"
model (Google Machine Learning Glossary)|"

<p>The representation of what a machine learning system has learned from
the training data. Within TensorFlow, model is an overloaded term, which
can have either of the following two related meanings:</p>

<ul align=""left"">
<li>The <a href=""https://developers.google.com/machine-learning/glossary/#TensorFlow""><strong>TensorFlow</strong></a> graph that expresses the structure of
how a prediction will be computed.</li>
<li>The particular weights and biases of that TensorFlow graph, which are
determined by <a href=""https://developers.google.com/machine-learning/glossary/#model_training""><strong>training</strong></a>.</li>
</ul>

"
model capacity (Google Machine Learning Glossary)|"

<p>The complexity of problems that a model can learn. The more complex the
problems that a model can learn, the higher the model’s capacity. A model’s
capacity typically increases with the number of model parameters. For a
formal definition of classifier capacity, see
<a href=""https://wikipedia.org/wiki/VC_dimension""
target=""T"">VC dimension</a>.</p>

"
model function (Google Machine Learning Glossary)|"

<p>The function within an <a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimator</strong></a> that implements
machine learning training, evaluation, and inference. For example, the
training portion of a model function might handle tasks such as defining
the topology of a deep neural network and identifying its
<a href=""https://developers.google.com/machine-learning/glossary/#optimizer""><strong>optimizer</strong></a> function.  When using
<a href=""https://developers.google.com/machine-learning/glossary/#premade_Estimator""><strong>premade Estimators</strong></a>, someone has
already written the model function for you.  When using
<a href=""https://developers.google.com/machine-learning/glossary/#custom_estimator""><strong>custom Estimators</strong></a>, you must write the model
function yourself.</p>

<p>For details about writing a model function, see the
<a href=""https://www.tensorflow.org/guide/custom_estimators"" target=""T"">
Creating Custom Estimators chapter</a> in the TensorFlow Programmers Guide.</p>

"
model training (Google Machine Learning Glossary)|"

<p>The process of determining the best <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a>.</p>

"
Momentum (Google Machine Learning Glossary)|"

<p>A sophisticated gradient descent algorithm in which a learning step depends
not only on the derivative in the current step, but also on the derivatives
of the step(s) that immediately preceded it. Momentum involves computing an
exponentially weighted moving average of the gradients over time, analogous
to momentum in physics.  Momentum sometimes prevents learning from getting
stuck in local minima.</p>

"
multi-class classification (Google Machine Learning Glossary)|"

<p>Classification problems that distinguish among more than two classes. For
example, there are approximately 128 species of maple trees, so a model
that categorized maple tree species would be multi-class. Conversely, a
model that divided emails into only two categories (<em>spam</em> and <em>not spam</em>)
would be a <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification model</strong></a>.</p>

"
multi-class logistic regression (Google Machine Learning Glossary)|"

<p>Using <a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a> in
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a> problems.</p>

"
multinomial classification (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification</strong></a>.</p>


"
NaN trap (Google Machine Learning Glossary)|"

<p>When one number in your model becomes a
<a href=""https://wikipedia.org/wiki/NaN"" target=""T"">NaN</a>
during training, which causes
many or all other numbers in your model to eventually become a NaN.</p>

<p>NaN is an abbreviation for &quot;Not a Number.&quot;</p>

"
natural language understanding (Google Machine Learning Glossary)|"

<p>Determining a user&#39;s intentions based on what the user typed or said.
For example, a search engine uses natural language understanding to
determine what the user is searching for based on what the user typed or said.</p>

"
negative class (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a>, one class is
termed positive and the other is termed negative. The positive class is
the thing we&#39;re looking for and the negative class is the other possibility.
For example, the negative class in a medical test might be &quot;not tumor.&quot;
The negative class in an email classifier might be &quot;not spam.&quot;
See also <a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive class</strong></a>.</p>

"
neural network (Google Machine Learning Glossary)|"

<p>A model that, taking inspiration from the brain, is composed of layers
(at least one of which is <a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden</strong></a>) consisting of
simple connected units or <a href=""https://developers.google.com/machine-learning/glossary/#neuron""><strong>neurons</strong></a> followed by nonlinearities.</p>

"
neuron (Google Machine Learning Glossary)|"

<p>A node in a <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>, typically taking in
multiple input values and generating one output value. The neuron calculates
the output value by applying an
<a href=""https://developers.google.com/machine-learning/glossary/#activation_function""><strong>activation function</strong></a> (nonlinear transformation)
to a weighted sum of input values.</p>

"
N-gram (Google Machine Learning Glossary)|"

<p>An ordered sequence of N words.  For example, <em>truly madly</em> is a 2-gram. Because
order is relevant, <em>madly truly</em> is a different 2-gram than <em>truly madly</em>.</p>

<table>
  <tr>
    <th>N</th>
    <th>Name(s) for this kind of N-gram</th>
    <th>Examples</th>
  </tr>
  <tr>
    <td>2 </td>
    <td>bigram or 2-gram </td>
    <td><em>to go, go to, eat lunch, eat dinner</em> </td>
  </tr>
  <tr>
    <td>3 </td>
    <td>trigram or 3-gram </td>
    <td><em>ate too much, three blind mice, the bell tolls</em> </td>
  </tr>
  <tr>
    <td>4 </td>
    <td>4-gram </td>
    <td><em>walk in the park, dust in the wind, the boy ate lentils</em> </td>
  </tr>
</table>

<p>Many <a href=""https://developers.google.com/machine-learning/glossary/#natural_language_understanding""><strong>natural language understanding</strong></a>
models rely on N-grams to predict the next word that the user will type
or say. For example, suppose a user typed <em>three blind</em>.
An NLU model based on trigrams would likely predict that the
user will next type <em>mice</em>.</p>

<p>Contrast N-grams with <a href=""https://developers.google.com/machine-learning/glossary/#bag_of_words""><strong>bag of words</strong></a>, which are
unordered sets of words.</p>

"
NLU (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#natural_language_understanding""><strong>natural language
understanding</strong></a>.</p>

"
node (neural network) (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#neuron""><strong>neuron</strong></a> in a <a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layer</strong></a>.</p>

"
node (TensorFlow graph) (Google Machine Learning Glossary)|"

<p>An operation in a TensorFlow <a href=""https://developers.google.com/machine-learning/glossary/#graph""><strong>graph</strong></a>.</p>

"
noise (Google Machine Learning Glossary)|"

<p>Broadly speaking, anything that obscures the signal in a dataset. Noise
can be introduced into data in a variety of ways. For example:</p>

<ul align=""left"">
<li>Human raters make mistakes in labeling.</li>
<li>Humans and instruments mis-record or omit feature values.</li>
</ul>

"
non-response bias (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>selection bias</strong></a>.</p>

"
normalization (Google Machine Learning Glossary)|"

<p>The process of converting an actual range of values into a standard range
of values, typically -1 to +1 or 0 to 1. For example, suppose the natural
range of a certain feature is 800 to 6,000. Through subtraction and division,
you can normalize those values into the range -1 to +1.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#scaling""><strong>scaling</strong></a>.</p>

"
numerical data (Google Machine Learning Glossary)|"

<p><a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>Features</strong></a> represented as integers or real-valued numbers.
For example, in a real estate model, you would probably represent the size
of a house (in square feet or square meters) as numerical data.  Representing
a feature as numerical data indicates that the feature&#39;s values have
a <em>mathematical</em> relationship to each other and possibly to the label.
For example, representing the size of a house as numerical data indicates
that a 200 square-meter house is twice as large as a 100 square-meter house.
Furthermore, the number of square meters in a house probably has some
mathematical relationship to the price of the house.</p>

<p>Not all integer data should be represented as numerical data. For example,
postal codes in some parts of the world are integers; however, integer postal
codes should not be represented as numerical data in models. That&#39;s because a
postal code of <code translate=""no"" dir=""ltr"">20000</code> is not twice (or half) as potent as a postal code of
10000. Furthermore, although different postal codes <em>do</em> correlate to different
real estate values, we can&#39;t assume that real estate values at postal code
20000 are twice as valuable as real estate values at postal code 10000.
Postal codes should be represented as <a href=""https://developers.google.com/machine-learning/glossary/#categorical_data""><strong>categorical data</strong></a>
instead.</p>

<p>Numerical features are sometimes called
<a href=""https://developers.google.com/machine-learning/glossary/#continuous_feature""><strong>continuous features</strong></a>.</p>

"
NumPy (Google Machine Learning Glossary)|"

<p>An <a href=""http://www.numpy.org/"" target=""T"">
open-source math library</a>
that provides efficient array operations in Python.
<a href=""https://developers.google.com/machine-learning/glossary/#pandas""><strong>pandas</strong></a> is built on NumPy.</p>


"
objective (Google Machine Learning Glossary)|"

<p>A metric that your algorithm is trying to optimize.</p>

"
objective function (Google Machine Learning Glossary)|"

<p>The mathematical formula or metric that a model aims to optimize.
For example, the objective function for
<a href=""https://developers.google.com/machine-learning/glossary/#linear_regression""><strong>linear regression</strong></a> is usually
<a href=""https://developers.google.com/machine-learning/glossary/#squared_loss""><strong>squared loss</strong></a>. Therefore, when training a
linear regression model, the goal is to minimize squared loss.</p>

<p>In some cases, the goal is to maximize the objective function.
For example, if the objective function is accuracy, the goal is
to maximize accuracy.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a>.</p>

"
offline inference (Google Machine Learning Glossary)|"

<p>Generating a group of <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a>, storing those
predictions, and then retrieving those predictions on demand. Contrast
with <a href=""https://developers.google.com/machine-learning/glossary/#online_inference""><strong>online inference</strong></a>.</p>

"
one-hot encoding (Google Machine Learning Glossary)|"

<p>A sparse vector in which:</p>

<ul align=""left"">
<li>One element is set to 1.</li>
<li>All other elements are set to 0.</li>
</ul>

<p>One-hot encoding is commonly used to represent strings or identifiers that
have a finite set of possible values. For example, suppose a given botany
dataset chronicles 15,000 different species, each denoted with a unique
string identifier. As part of feature engineering, you&#39;ll probably encode
those string identifiers as one-hot vectors in which the vector has a size
of 15,000.</p>

"
one-shot learning (Google Machine Learning Glossary)|"

<p>A machine learning approach, often used for object classification,
designed to learn effective classifiers from a single training example.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#few-shot_learning""><strong>few-shot learning</strong></a>.</p>

"
one-vs.-all (Google Machine Learning Glossary)|"

<p>Given a classification problem with N possible solutions, a one-vs.-all
solution consists of N separate
<a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classifiers</strong></a>—one binary classifier for
each possible outcome. For example, given a model that classifies examples
as animal, vegetable, or mineral, a one-vs.-all solution would provide the
following three separate binary classifiers:</p>

<ul align=""left"">
<li>animal vs. not animal</li>
<li>vegetable vs. not vegetable</li>
<li>mineral vs. not mineral</li>
</ul>

"
online inference (Google Machine Learning Glossary)|"

<p>Generating <a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a> on demand. Contrast with
<a href=""https://developers.google.com/machine-learning/glossary/#offline_inference""><strong>offline inference</strong></a>.</p>

"
Operation (op) (Google Machine Learning Glossary)|"

<p>A node in the TensorFlow graph. In TensorFlow, any procedure that creates,
manipulates, or destroys a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a> is an operation. For
example, a matrix multiply is an operation that takes two Tensors as
input and generates one Tensor as output.</p>

"
optimizer (Google Machine Learning Glossary)|"

<p>A specific implementation of the <a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a>
algorithm. TensorFlow&#39;s base class for optimizers is
<a href=""https://www.tensorflow.org/api_docs/python/tf/train/Optimizer""
target=""T"">tf.train.Optimizer</a>. Popular optimizers include:</p>

<ul align=""left"">
<li><a href=""https://www.tensorflow.org/api_docs/python/tf/train/AdagradOptimizer"" target=""T"">AdaGrad</a>,
    which stands for ADAptive GRADient descent.</li>
<li><a href=""https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer"" target=""T"">Adam</a>,
    which stands for ADAptive with Momentum.</li>
</ul>

<p>Different optimizers may leverage one or more of the following concepts
to enhance the effectiveness of gradient descent on a
given <a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training set</strong></a>:</p>

<ul align=""left"">
<li><a href=""https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer""
target=""T"">momentum</a> (Momentum)</li>
<li>update frequency</li>
<li>sparsity/regularization
(<a href=""https://www.tensorflow.org/api_docs/python/tf/train/FtrlOptimizer""
target=""T"">Ftrl</a>)</li>
<li>more complex math
(<a href=""https://www.tensorflow.org/api_docs/python/tf/train/ProximalGradientDescentOptimizer""
target=""T"">Proximal</a>,
and others)</li>
</ul>

<p>You might even imagine an
<a href=""https://arxiv.org/abs/1606.04474"" target=""T"">NN-driven optimizer</a>.</p>

"
out-group homogeneity bias (Google Machine Learning Glossary)|"

<p>The tendency to see out-group members as more alike than in-group members
when comparing attitudes, values, personality traits, and other
characteristics. <strong>In-group</strong> refers to people you interact with regularly;
<strong>out-group</strong> refers to people you do not interact with regularly. If you
create a dataset by asking people to provide attributes about
out-groups, those attributes may be less nuanced and more stereotyped
than attributes that participants list for people in their in-group.</p>

<p>For example, Lilliputians might describe the houses of other Lilliputians
in great detail, citing small differences in architectural styles, windows,
doors, and sizes.  However, the same Lilliputians might simply declare that
Brobdingnagians all live in identical houses.</p>

<p>Out-group homogeneity bias is a form of
<a href=""https://developers.google.com/machine-learning/glossary/#group_attribution_bias""><strong>group attribution bias</strong></a>.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#in-group_bias""><strong>in-group bias</strong></a>.</p>

"
outliers (Google Machine Learning Glossary)|"

<p>Values distant from most other values. In machine learning, any of the
following are outliers:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>Weights</strong></a> with high absolute values.</li>
<li>Predicted values relatively far away from the actual values.</li>
<li>Input data whose values are more than roughly 3 standard deviations
from the mean.</li>
</ul>

<p>Outliers often cause problems in model training. <a href=""https://developers.google.com/machine-learning/glossary/#clipping""><strong>Clipping</strong></a>
is one way of managing outliers.</p>

"
output layer (Google Machine Learning Glossary)|"

<p>The &quot;final&quot; layer of a neural network. The layer containing the answer(s).</p>

"
overfitting (Google Machine Learning Glossary)|"

<p>Creating a model that matches the <a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training data</strong></a> so
closely that the model fails to make correct predictions on new data.</p>


"
pandas (Google Machine Learning Glossary)|"

<p>A column-oriented data analysis API. Many machine learning frameworks,
including TensorFlow, support pandas data structures as input. See the
<a href=""http://pandas.pydata.org/"" target=""T"">pandas documentation</a>
for details.</p>

"
parameter (Google Machine Learning Glossary)|"

<p>A variable of a model that the machine learning system trains on its own.
For example, <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weights</strong></a> are parameters whose values the
machine learning system gradually learns through successive training
iterations. Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#hyperparameter""><strong>hyperparameter</strong></a>.</p>

"
Parameter Server (PS) (Google Machine Learning Glossary)|"

<p>A job that keeps track of a model&#39;s <a href=""https://developers.google.com/machine-learning/glossary/#parameter""><strong>parameters</strong></a> in a
distributed setting.</p>

<p>See the <a href=""https://www.tensorflow.org/guide/extend/architecture""
target=""T"">TensorFlow Architecture chapter</a> in the TensorFlow Programmers
Guide for details.</p>

"
parameter update (Google Machine Learning Glossary)|"

<p>The operation of adjusting a model&#39;s <a href=""https://developers.google.com/machine-learning/glossary/#parameter""><strong>parameters</strong></a> during
training, typically within a single iteration of
<a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a>.</p>

"
partial derivative (Google Machine Learning Glossary)|"

<p>A derivative in which all but one of the variables is considered a constant.
For example, the partial derivative of <em>f(x, y)</em> with respect to <em>x</em> is the
derivative of <em>f</em> considered as a function of <em>x</em> alone (that is, keeping <em>y</em>
constant). The partial derivative of <em>f</em> with respect to <em>x</em> focuses only on
how <em>x</em> is changing and ignores all other variables in the equation.</p>

"
participation bias (Google Machine Learning Glossary)|"

<p>Synonym for non-response bias.  See <a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>selection bias</strong></a>.</p>

"
partitioning strategy (Google Machine Learning Glossary)|"

<p>The algorithm by which variables are divided across
<a href=""https://developers.google.com/machine-learning/glossary/#Parameter_Server""><strong>parameter servers</strong></a>.</p>


"
perceptron (Google Machine Learning Glossary)|"

<p>A system (either hardware or software) that takes in one or more input values,
runs a function on the weighted sum of the inputs, and computes a single
output value. In machine learning, the function is typically nonlinear, such as
<a href=""https://developers.google.com/machine-learning/glossary/#ReLU""><strong>ReLU</strong></a>, <a href=""https://developers.google.com/machine-learning/glossary/#sigmoid_function""><strong>sigmoid</strong></a>, or tanh.
For example, the following perceptron relies on the sigmoid function to process
three input values:</p>

<div>
[$$]f(x_1, x_2, x_3) = \text{sigmoid}(w_1 x_1 + w_2 x_2 + w_3 x_3)[/$$]
</div>

<p>In the following illustration, the perceptron takes three inputs, each of which
is itself modified by a weight before entering the perceptron:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/Perceptron.svg"" height=""300""
alt=""A perceptron that takes in 3 inputs, each multiplied by separate
weights. The perceptron outputs a single value.""></img>
</p>

<p>Perceptrons are the (<a href=""https://developers.google.com/machine-learning/glossary/#node""><strong>nodes</strong></a>) in <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep neural
networks</strong></a>. That is, a deep neural network consists of
multiple connected perceptrons, plus a
<a href=""https://developers.google.com/machine-learning/glossary/#backpropagation""><strong>backpropagation</strong></a> algorithm to introduce feedback.</p>

"
performance (Google Machine Learning Glossary)|"

<p>Overloaded term with the following meanings:</p>

<ul align=""left"">
<li>The traditional meaning within software engineering. Namely: How fast
(or efficiently) does this piece of software run?</li>
<li>The meaning within machine learning. Here, performance answers the
following question: How correct is this <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a>? That is,
how good are the model&#39;s predictions?</li>
</ul>

"
perplexity (Google Machine Learning Glossary)|"

<p>One measure of how well a <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> is accomplishing its task.
For example, suppose your task is to read the first few letters of a word
a user is typing on a smartphone keyboard, and to offer a list of possible
completion words. Perplexity, P, for this task is approximately the number
of guesses you need to offer in order for your list to contain the actual
word the user is trying to type.</p>

<p>Perplexity is related to <a href=""https://developers.google.com/machine-learning/glossary/#cross-entropy""><strong>cross-entropy</strong></a> as follows:</p>

<div>
[$$]P= 2^{-\text{cross entropy}}[/$$]
</div>

"
pipeline (Google Machine Learning Glossary)|"

<p>The infrastructure surrounding a machine learning algorithm. A pipeline
includes gathering the data, putting the data into training data files,
training one or more models, and exporting the models to production.</p>

"
policy (Google Machine Learning Glossary)|"

<p>In reinforcement learning, an <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent&#39;s</strong></a> probabilistic mapping
from <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>states</strong></a> to <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>actions</strong></a>.</p>

"
pooling (Google Machine Learning Glossary)|"

<p>Reducing a matrix (or matrices) created by an earlier
<a href=""https://developers.google.com/machine-learning/glossary/#convolutional_layer""><strong>convolutional layer</strong></a> to a smaller matrix.
Pooling usually involves taking either the maximum or average value
across the pooled area. For example, suppose we have the
following 3x3 matrix:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/PoolingStart.svg"">
</p>

<p>A pooling operation, just like a convolutional operation, divides that
matrix into slices and then slides that convolutional operation by
<a href=""https://developers.google.com/machine-learning/glossary/#stride""><strong>strides</strong></a>. For example, suppose the pooling operation
divides the convolutional matrix into 2x2 slices with a 1x1 stride.
As the following diagram illustrates, four pooling operations take place.
Imagine that each pooling operation picks the maximum value of the
four in that slice:</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/PoolingConvolution.svg"">
</p>

<p>Pooling helps enforce
<a href=""https://developers.google.com/machine-learning/glossary/#translational_invariance""><strong>translational invariance</strong></a> in the input matrix.</p>

<p>Pooling for vision applications is known more formally as <strong>spatial pooling</strong>.
Time-series applications usually refer to pooling as <strong>temporal pooling</strong>.
Less formally, pooling is often called <strong>subsampling</strong> or <strong>downsampling</strong>.</p>

"
positive class (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#binary_classification""><strong>binary classification</strong></a>, the two possible
classes are labeled as positive and negative. The positive outcome is the
thing we&#39;re testing for. (Admittedly, we&#39;re simultaneously testing for
both outcomes, but play along.) For example, the positive class in a
medical test might be &quot;tumor.&quot; The positive class in an email classifier
might be &quot;spam.&quot;</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative class</strong></a>.</p>

"
post-processing (Google Machine Learning Glossary)|"
Processing the output of a model <em>after</em> the model has been run.
Post-processing can be used to enforce fairness constraints without
modifying models themselves.</p>

<p>For example, one might apply post-processing to a binary classifier
by setting a classification threshold such that
<a href=""https://developers.google.com/machine-learning/glossary/#equality_of_opportunity""><strong>equality of opportunity</strong></a> is maintained
for some attribute by checking that the <a href=""https://developers.google.com/machine-learning/glossary/#TP_rate""><strong>true positive rate</strong></a>
is the same for all values of that attribute.</p>

"
PR AUC (area under the PR curve) (Google Machine Learning Glossary)|"

<p>Area under the interpolated
<a href=""https://developers.google.com/machine-learning/glossary/#precision-recall_curve""><strong>precision-recall curve</strong></a>, obtained by plotting
(recall, precision) points for different values of the
<a href=""https://developers.google.com/machine-learning/glossary/#classification_threshold""><strong>classification threshold</strong></a>. Depending on how
it&#39;s calculated, PR AUC may be equivalent to the
<a href=""https://developers.google.com/machine-learning/glossary/#average_precision""><strong>average precision</strong></a> of the model.</p>

"
precision (Google Machine Learning Glossary)|"

<p>A metric for <a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification models</strong></a>. Precision
identifies the frequency with which a model was correct when predicting the
<a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive class</strong></a>. That is:</p>

<div>
[$$]\text{Precision} =
\frac{\text{True Positives}} {\text{True Positives} + \text{False Positives}}[/$$]
</div>

"
precision-recall curve (Google Machine Learning Glossary)|"

<p>A curve of <a href=""https://developers.google.com/machine-learning/glossary/#precision""><strong>precision</strong></a> vs. <a href=""https://developers.google.com/machine-learning/glossary/#recall""><strong>recall</strong></a> at different
<a href=""https://developers.google.com/machine-learning/glossary/#classification_threshold""><strong>classification thresholds</strong></a>.</p>

"
prediction (Google Machine Learning Glossary)|"

<p>A model&#39;s output when provided with an input <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>example</strong></a>.</p>

"
prediction bias (Google Machine Learning Glossary)|"

<p>A value indicating how far apart the average of
<a href=""https://developers.google.com/machine-learning/glossary/#prediction""><strong>predictions</strong></a> is from the average of <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>labels</strong></a>
in the dataset.</p>

<p>Not to be confused with the <a href=""https://developers.google.com/machine-learning/glossary/#bias""><strong>bias term</strong></a> in machine learning models
or with <a href=""https://developers.google.com/machine-learning/glossary/#bias_ethics""><strong>bias in ethics and fairness</strong></a>.</p>

"
predictive parity (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#fairness_metric""><strong>fairness metric</strong></a> that checks whether,
for a given classifier, the <a href=""https://developers.google.com/machine-learning/glossary/#precision""><strong>precision</strong></a> rates
are equivalent for subgroups under consideration.</p>

<p>For example, a model that predicts college acceptance would satisfy
predictive parity for nationality if its precision rate is the same
for Lilliputians and Brobdingnagians.</p>

<p>Predictive parity is sometime also called <em>predictive rate parity</em>.</p>

<p>See <a href=""http://fairware.cs.umass.edu/papers/Verma.pdf"">&quot;Fairness Definitions
Explained&quot;</a> (section 3.2.1)
for a more detailed discussion of predictive parity.</p>

"
predictive rate parity (Google Machine Learning Glossary)|"

<p>Another name for <a href=""https://developers.google.com/machine-learning/glossary/#predictive_parity""><strong>predictive parity</strong></a>.</p>

"
premade Estimator (Google Machine Learning Glossary)|"

<p>An <a href=""https://developers.google.com/machine-learning/glossary/#Estimators""><strong>Estimator</strong></a> that someone has already built.
TensorFlow provides several premade Estimators, including <code translate=""no"" dir=""ltr"">DNNClassifier</code>,
<code translate=""no"" dir=""ltr"">DNNRegressor</code>, and <code translate=""no"" dir=""ltr"">LinearClassifier</code>.  To learn more about
premade Estimators, see the
<a href=""https://www.tensorflow.org/guide/premade_estimators""
target=""T"">Premade Estimators chapter</a> in the TensorFlow Programmers Guide.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#custom_estimator""><strong>custom estimators</strong></a>.</p>

" 
preprocessing (Google Machine Learning Glossary)|"
Processing data before it&#39;s used to train a model. Preprocessing could
be as simple as removing words from an English text corpus that don&#39;t
occur in the English dictionary, or could be as complex as re-expressing
data points in a way that eliminates as many attributes that are correlated
with  <a href=""https://developers.google.com/machine-learning/glossary/#sensitive_attribute""><strong>sensitive attributes</strong></a> as possible.
Preprocessing can help satisfy <a href=""https://developers.google.com/machine-learning/glossary/#fairness_constraint""><strong>fairness constraints</strong></a>.</p>

"
pre-trained model (Google Machine Learning Glossary)|"

<p>Models or model components (such as <a href=""https://developers.google.com/machine-learning/glossary/#embeddings""><strong>embeddings</strong></a>) that have
been already been trained. Sometimes, you&#39;ll feed pre-trained embeddings
into a <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>. Other times, your model will
train the embeddings itself rather than rely on the pre-trained embeddings.</p>

"
prior belief (Google Machine Learning Glossary)|"

<p>What you believe about the data before you begin training on it. For
example, <a href=""https://developers.google.com/machine-learning/glossary/#L2_regularization""><strong>L<sub>2</sub> regularization</strong></a> relies on
a prior belief that <a href=""https://developers.google.com/machine-learning/glossary/#weight""><strong>weights</strong></a> should be small and normally
distributed around zero.</p>

"
proxy (sensitive attributes) (Google Machine Learning Glossary)|"
An attribute used as a stand-in for a
<a href=""https://developers.google.com/machine-learning/glossary/#sensitive_attribute""><strong>sensitive attribute</strong></a>. For example, an
individual&#39;s postal code might be used as a proxy for their income,
race, or ethnicity.</p>

"
proxy labels (Google Machine Learning Glossary)|"

<p>Data used to approximate labels not directly available in a dataset.</p>

<p>For example, suppose you want <em>is it raining?</em> to be a Boolean label
for your dataset, but the dataset doesn&#39;t contain rain data. If
photographs are available, you might establish pictures of people
carrying umbrellas as a proxy label for <em>is it raining?</em>  However,
proxy labels may distort results. For example, in some places, it
may be more common to carry umbrellas to protect against sun than
the rain.</p>


"
Q-function (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the function that predicts the expected
<a href=""https://developers.google.com/machine-learning/glossary/#return""><strong>return</strong></a> from taking an <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a> in a
<a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a> and then following a given <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a>.</p>

<p>Q-function is also known as <strong>state-action value function</strong>.</p>

"
Q-learning (Google Machine Learning Glossary)|"

<p>In reinforcement learning, an algorithm that allows an <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a>
to learn the optimal <a href=""https://developers.google.com/machine-learning/glossary/#q-function""><strong>Q-function</strong></a> of a
<a href=""https://developers.google.com/machine-learning/glossary/#markov_decision_process""><strong>Markov decision process</strong></a> by applying the
<a href=""https://developers.google.com/machine-learning/glossary/#bellman_equation""><strong>Bellman equation</strong></a>. The Markov decision process models
an <a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environment</strong></a>.</p>

"
quantile (Google Machine Learning Glossary)|"

<p>Each bucket in <a href=""https://developers.google.com/machine-learning/glossary/#quantile_bucketing""><strong>quantile bucketing</strong></a>.</p>

"
quantile bucketing (Google Machine Learning Glossary)|"

<p>Distributing a feature&#39;s values into <a href=""https://developers.google.com/machine-learning/glossary/#bucketing""><strong>buckets</strong></a> so that each
bucket contains the same (or almost the same) number of examples.  For example,
the following figure divides 44 points into 4 buckets, each of which
contains 11 points.  In order for each bucket in the figure to contain the
same number of points, some buckets span a different width of x-values.</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/QuantileBucketing.svg"" alt=""40 data
points divided into 4 buckets of 11 points each. Some of the buckets contain
a wider range of feature values than others.""></img>
</p>

"
quantization (Google Machine Learning Glossary)|"

<p>An algorithm that implements <a href=""https://developers.google.com/machine-learning/glossary/#quantile_bucketing""><strong>quantile bucketing</strong></a> on
a particular <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a> in a <a href=""https://developers.google.com/machine-learning/glossary/#data_set""><strong>dataset</strong></a>.</p>

"
queue (Google Machine Learning Glossary)|"

<p>A TensorFlow <a href=""https://developers.google.com/machine-learning/glossary/#Operation""><strong>Operation</strong></a> that implements a queue data
structure. Typically
used in I/O.</p>


"
random forest (Google Machine Learning Glossary)|"

<p>An ensemble approach to finding the <a href=""https://developers.google.com/machine-learning/glossary/#decision_tree""><strong>decision tree</strong></a> that
best fits the training data by creating many decision trees and then
determining the &quot;average&quot; one. The &quot;random&quot; part of the term refers to
building each of the decision trees from a random selection of features;
the &quot;forest&quot; refers to the set of decision trees.</p>

"
random policy (Google Machine Learning Glossary)|"

<p>In reinforcement learning, a <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a> that chooses an
<a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a> at random.</p>

"
rank (ordinality) (Google Machine Learning Glossary)|"

<p>The ordinal position of a class in a machine learning problem that categorizes
classes from highest to lowest. For example, a behavior ranking
system could rank a dog&#39;s rewards from highest (a steak) to
lowest (wilted kale).</p>

"
rank (Tensor) (Google Machine Learning Glossary)|"

<p>The number of dimensions in a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a>. For instance,
a scalar has rank 0, a vector has rank 1, and a matrix has rank 2.</p>

<p>Not to be confused with <a href=""https://developers.google.com/machine-learning/glossary/#rank_ordinality""><strong>rank (ordinality)</strong></a>.</p>

"
rater (Google Machine Learning Glossary)|"

<p>A human who provides <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>labels</strong></a> in <a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>examples</strong></a>.
Sometimes called an &quot;annotator.&quot;</p>

"
recall (Google Machine Learning Glossary)|"

<p>A metric for <a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification models</strong></a> that answers
the following question: Out of all the possible positive labels, how many
did the model correctly identify? That is:</p>

<p>\[\text{Recall} =
\frac{\text{True Positives}} {\text{True Positives} + \text{False Negatives}}
\]</p>

"
recommendation system (Google Machine Learning Glossary)|"

<p>A system that selects for each user a relatively small set of desirable
<a href=""https://developers.google.com/machine-learning/glossary/#items""><strong>items</strong></a> from a large corpus.
For example, a video recommendation system might recommend two videos
from a corpus of 100,000 videos, selecting <em>Casablanca</em> and
<em>The Philadelphia Story</em> for one user, and <em>Wonder Woman</em> and
<em>Black Panther</em> for another. A video recommendation system might
base its recommendations on factors such as:</p>

<ul align=""left"">
<li>Movies that similar users have rated or watched.</li>
<li>Genre, directors, actors, target demographic...</li>
</ul>

"
Rectified Linear Unit (ReLU) (Google Machine Learning Glossary)|"

<p>An <a href=""https://developers.google.com/machine-learning/glossary/#activation_function""><strong>activation function</strong></a> with the following rules:</p>

<ul align=""left"">
<li>If input is negative or zero, output is 0.</li>
<li>If input is positive, output is equal to input.</li>
</ul>

"
recurrent neural network (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> that is intentionally run multiple
times, where parts of each run feed into the next run. Specifically,
hidden layers from the previous run provide part of the
input to the same hidden layer in the next run. Recurrent neural networks
are particularly useful for evaluating sequences, so that the hidden layers
can learn from previous runs of the neural network on earlier parts of
the sequence.</p>

<p>For example, the following figure shows a recurrent neural network that
runs four times. Notice that the values learned in the hidden layers from
the first run become part of the input to the same hidden layers in
the second run. Similarly, the values learned in the hidden layer on the
second run become part of the input to the same hidden layer in the
third run. In this way, the recurrent neural network gradually trains and
predicts the meaning of the entire sequence rather than just the meaning
of individual words.</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/RNN.svg"" height=""500""
alt=""An RNN that runs four times to process four input words.""></img>
</p>

"
regression model (Google Machine Learning Glossary)|"

<p>A type of model that outputs continuous (typically, floating-point) values.
Compare with <a href=""https://developers.google.com/machine-learning/glossary/#classification_model""><strong>classification models</strong></a>, which
output discrete values, such as &quot;day lily&quot; or &quot;tiger lily.&quot;</p>

"
regularization (Google Machine Learning Glossary)|"

<p>The penalty on a model&#39;s complexity. Regularization helps prevent
<a href=""https://developers.google.com/machine-learning/glossary/#overfitting""><strong>overfitting</strong></a>. Different kinds of regularization include:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#L1_regularization""><strong>L<sub>1</sub> regularization</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#L2_regularization""><strong>L<sub>2</sub> regularization</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#dropout_regularization""><strong>dropout regularization</strong></a></li>
<li><a href=""https://developers.google.com/machine-learning/glossary/#early_stopping""><strong>early stopping</strong></a> (this is not a formal
regularization method, but can effectively limit overfitting)</li>
</ul>

"
regularization rate (Google Machine Learning Glossary)|"

<p>A scalar value, represented as lambda, specifying the relative importance
of the regularization function. The following simplified <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a>
equation shows the regularization rate&#39;s influence:</p>

<div>
[$$]\text{minimize(loss function + }\lambda\text{(regularization function))}[/$$]
</div>

<p>Raising the regularization rate reduces <a href=""https://developers.google.com/machine-learning/glossary/#overfitting""><strong>overfitting</strong></a>
but may make the model less <a href=""https://developers.google.com/machine-learning/glossary/#accuracy""><strong>accurate</strong></a>.</p>

"
reinforcement learning (RL) (Google Machine Learning Glossary)|"

<p>A family of algorithms that learn an optimal <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a>, whose goal
is to maximize <a href=""https://developers.google.com/machine-learning/glossary/#return""><strong>return</strong></a> when interacting with
an <a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environment</strong></a>.
For example, the ultimate reward of most games is victory.
Reinforcement learning systems can become expert at playing complex
games by evaluating sequences of previous game moves that ultimately
led to wins and sequences that ultimately led to losses.</p>

"
replay buffer (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#deep_q-network""><strong>DQN</strong></a>-like algorithms, the memory used by the agent
to store state transitions for use in
<a href=""https://developers.google.com/machine-learning/glossary/#experience_replay""><strong>experience replay</strong></a>.</p>

"
reporting bias (Google Machine Learning Glossary)|"

<p>The fact that the frequency with which people write about actions,
outcomes, or properties is not a reflection of their real-world
frequencies or the degree to which a property is characteristic
of a class of individuals. Reporting bias can influence the composition
of data that machine learning systems learn from.</p>

<p>For example, in books, the word <em>laughed</em> is more prevalent than
<em>breathed</em>.  A machine learning model that estimates the relative frequency of
laughing and breathing from a book corpus would probably determine
that laughing is more common than breathing.</p>

"
representation (Google Machine Learning Glossary)|"

<p>The process of mapping data to useful <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a>.</p>

"
re-ranking (Google Machine Learning Glossary)|"

<p>The final stage of a <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation system</strong></a>,
during which scored items may be re-graded according to some other
(typically, non-ML) algorithm. Re-ranking evaluates the list of items
generated by the <a href=""https://developers.google.com/machine-learning/glossary/#scoring""><strong>scoring</strong></a> phase, taking actions such as:</p>

<ul align=""left"">
<li>Eliminating items that the user has already purchased.</li>
<li>Boosting the score of fresher items.</li>
</ul>

"
return (Google Machine Learning Glossary)|"

<p>In reinforcement learning, given a certain policy and a certain state, the
return is the sum of all <a href=""https://developers.google.com/machine-learning/glossary/#reward""><strong>rewards</strong></a> that the <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a>
expects to receive when following the <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a> from the
<a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a> to the end of the <a href=""https://developers.google.com/machine-learning/glossary/#episode""><strong>episode</strong></a>. The agent
accounts for the delayed nature of expected rewards by discounting rewards
according to the state transitions required to obtain the reward.</p>

<p>Therefore, if the discount factor is \(\gamma\), and \(r_0, \ldots, r_{N}\)
denote the rewards until the end of the episode, then the return calculation
is as follows:</p>

<div>
[$$]\text{Return} = r_0 + \gamma r_1 + \gamma^2 r_2 + \ldots + \gamma^{N-1} r_{N-1}[/$$]
</div>

"
reward (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the numerical result of taking an
<a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a> in a <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a>, as defined by
the <a href=""https://developers.google.com/machine-learning/glossary/#environment""><strong>environment</strong></a>.</p>

"
ridge regularization (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#L2_regularization""><strong>L<sub>2</sub> regularization</strong></a>. The term
<strong>ridge regularization</strong> is more frequently used in pure statistics
contexts, whereas <strong>L<sub>2</sub> regularization</strong> is used more often
in machine learning.</p>

"
RNN (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#recurrent_neural_network""><strong>recurrent neural networks</strong></a>.</p>

"
ROC (receiver operating characteristic) Curve (Google Machine Learning Glossary)|"

<p>A curve of <a href=""https://developers.google.com/machine-learning/glossary/#TP_rate""><strong>true positive rate</strong></a> vs.
<a href=""https://developers.google.com/machine-learning/glossary/#FP_rate""><strong>false positive rate</strong></a> at different
<a href=""https://developers.google.com/machine-learning/glossary/#classification_threshold""><strong>classification thresholds</strong></a>. See also
<a href=""https://developers.google.com/machine-learning/glossary/#AUC""><strong>AUC</strong></a>.</p>

"
root directory (Google Machine Learning Glossary)|"

<p>The directory you specify for hosting subdirectories of the TensorFlow
checkpoint and events files of multiple models.</p>

"
Root Mean Squared Error (RMSE) (Google Machine Learning Glossary)|"

<p>The square root of the <a href=""https://developers.google.com/machine-learning/glossary/#MSE""><strong>Mean Squared Error</strong></a>.</p>

"
rotational invariance (Google Machine Learning Glossary)|"

<p>In an image classification problem, an algorithm&#39;s ability to successfully
classify images even when the orientation of the image changes. For example,
the algorithm can still identify a tennis racket whether it is pointing up,
sideways, or down. Note that rotational invariance is not always desirable;
for example, an upside-down 9 should not be classified as a 9.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#translational_invariance""><strong>translational invariance</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#size_invariance""><strong>size invariance</strong></a>.</p>


"
sampling bias (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#selection_bias""><strong>selection bias</strong></a>.</p>

"
SavedModel (Google Machine Learning Glossary)|"

<p>The recommended format for saving and recovering TensorFlow models. SavedModel
is a language-neutral, recoverable serialization format, which enables
higher-level systems and tools to produce, consume, and transform TensorFlow
models.</p>

<p>See the <a href=""https://www.tensorflow.org/guide/saved_model""
target=""T"">Saving and Restoring chapter</a>
in the TensorFlow Programmer&#39;s Guide for complete details.</p>

"
Saver (Google Machine Learning Glossary)|"

<p>A <a href=""https://www.tensorflow.org/api_docs/python/tf/train/Saver""
target=""T"">TensorFlow object</a>
responsible for saving model checkpoints.</p>

"
scalar (Google Machine Learning Glossary)|"

<p>A single number or a single string that can be represented as a
<a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>tensor</strong></a> of <a href=""https://developers.google.com/machine-learning/glossary/#rank""><strong>rank</strong></a> 0. For example, the following
lines of code each create one scalar in TensorFlow:</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
breed = tf.Variable(""poodle"", tf.string)
temperature = tf.Variable(27, tf.int16)
precision = tf.Variable(0.982375101275, tf.float64)
</pre>

"
scaling (Google Machine Learning Glossary)|"

<p>A commonly used practice in <a href=""https://developers.google.com/machine-learning/glossary/#feature_engineering""><strong>feature engineering</strong></a>
to tame a feature&#39;s range of values to match the range of other features in
the dataset. For example, suppose that you want all floating-point features
in the dataset to have a range of 0 to 1. Given a particular feature&#39;s
range of 0 to 500, you could scale that feature by dividing each value
by 500.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#normalization""><strong>normalization</strong></a>.</p>

"
scikit-learn (Google Machine Learning Glossary)|"

<p>A popular open-source machine learning platform. See
<a href=""http://www.scikit-learn.org/""
target=""T"">www.scikit-learn.org</a>.</p>

"
scoring (Google Machine Learning Glossary)|"

<p>The part of a <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation system</strong></a> that
provides a value or ranking for each item produced by the
<a href=""https://developers.google.com/machine-learning/glossary/#candidate_generation""><strong>candidate generation</strong></a> phase.</p>

"
selection bias (Google Machine Learning Glossary)|"

<p>Errors in conclusions drawn from sampled data due to a selection process
that generates systematic differences between samples observed in the data
and those not observed.  The following forms of selection bias exist:</p>

<ul align=""left"">
<li><strong>coverage bias</strong>: The population represented in the dataset does not
match the population that the machine learning model is making
predictions about.</li>
<li><strong>sampling bias</strong>: Data is not collected randomly from the target group.</li>
<li><strong>non-response bias</strong> (also called <strong>participation bias</strong>): Users from
certain groups opt-out of surveys at different rates than users from
other groups.</li>
</ul>

<p>For example, suppose you are creating a machine learning model that predicts
people&#39;s enjoyment of a movie.  To collect training data,
you hand out a survey to everyone in the front row of a theater
showing the movie.  Offhand, this may sound like a reasonable way
to gather a dataset; however, this form of data collection may
introduce the following forms of selection bias:</p>

<ul align=""left"">
<li>coverage bias: By sampling from a population who chose to see
the movie, your model&#39;s predictions may not generalize to people
who did not already express that level of interest in the movie.</li>
<li>sampling bias: Rather than randomly sampling from the
intended population (all the people at the movie), you sampled only
the people in the front row. It is possible that the people sitting
in the front row were more interested in the movie than those in
other rows.</li>
<li>non-response bias: In general, people with strong opinions tend
to respond to optional surveys more frequently than people with mild
opinions.  Since the movie survey is optional, the responses
are more likely to form a
<a href=""https://wikipedia.org/wiki/Multimodal_distribution""
target=""T"">bimodal distribution</a>
than a normal (bell-shaped) distribution.</li>
</ul>

"
semi-supervised learning (Google Machine Learning Glossary)|"

<p>Training a model on data where some of the training examples have labels but
others don’t. One technique for semi-supervised learning is to infer labels for
the unlabeled examples, and then to train on the inferred labels to create a new
model. Semi-supervised learning can be useful if labels are expensive to obtain
but unlabeled examples are plentiful.</p>

"
sensitive attribute (Google Machine Learning Glossary)|"
A human attribute that may be given special consideration for legal,
ethical, social, or personal reasons.</p>

"
sentiment analysis (Google Machine Learning Glossary)|"

<p>Using statistical or machine learning algorithms to determine a group&#39;s
overall attitude—positive or negative—toward a service, product,
organization, or topic. For example, using
<a href=""https://developers.google.com/machine-learning/glossary/#natural_language_understanding""><strong>natural language understanding</strong></a>,
an algorithm could perform sentiment analysis on the textual feedback
from a university course to determine the degree to which students
generally liked or disliked the course.</p>

"
sequence model (Google Machine Learning Glossary)|"

<p>A model whose inputs have a sequential dependence. For example, predicting
the next video watched from a sequence of previously watched videos.</p>

"
serving (Google Machine Learning Glossary)|"

<p>A synonym for <a href=""https://developers.google.com/machine-learning/glossary/#inference""><strong>inferring</strong></a>.</p>


"
session (tf.session) (Google Machine Learning Glossary)|"

<p>An object that encapsulates the state of the TensorFlow runtime
and runs all or part of a <a href=""https://developers.google.com/machine-learning/glossary/#graph""><strong>graph</strong></a>. When using the
low-level TensorFlow APIs, you instantiate and manage one or more
<code translate=""no"" dir=""ltr"">tf.session</code> objects directly. When using the Estimators API,
Estimators instantiate session objects for you.</p>

"
shape (Tensor) (Google Machine Learning Glossary)|"

<p>
The number of elements in each <b><a href=""https://developers.google.com/machine-learning/glossary/#dimensions"">dimension</a></b> of a
tensor. The shape is represented as a list of integers. For example,
the following two-dimensional tensor has a shape of [3,4]:
</p>

<pre class=""prettyprint"" translate=""no"" dir=""ltr"">
[[5, 7, 6, 4],
 [2, 9, 4, 8],
 [3, 6, 5, 1]]
</pre>

<p>TensorFlow uses row-major (C-style) format to represent the order of
dimensions, which is why the shape in TensorFlow is [3,4] rather than
[4,3]. In other words, in a two-dimensional TensorFlow Tensor, the shape
is [<em>number of rows</em>, <em>number of columns</em>].</p>


"
sigmoid function (Google Machine Learning Glossary)|"

<p>A function that maps logistic or multinomial regression output (log odds) to
probabilities, returning a value between 0 and 1.  The sigmoid function has
the following formula:</p>

<div>
[$$]y = \frac{1}{1 + e^{-\sigma}}[/$$]
</div>

<p>where \(\sigma\) in <a href=""https://developers.google.com/machine-learning/glossary/#logistic_regression""><strong>logistic regression</strong></a> problems
is simply:</p>

<div>
[$$]\sigma = b + w_1x_1 + w_2x_2 + … w_nx_n[/$$]
</div>

<p>In other words, the sigmoid function converts \(\sigma\) into a probability
between 0 and 1.</p>

<p>In some <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural networks</strong></a>, the sigmoid function acts as
the <a href=""https://developers.google.com/machine-learning/glossary/#activation_function""><strong>activation function</strong></a>.</p>

"
similarity measure (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#clustering""><strong>clustering</strong></a> algorithms, the metric used to determine
how alike (how similar) any two examples are.</p>

"
size invariance (Google Machine Learning Glossary)|"

<p>In an image classification problem, an algorithm&#39;s ability to successfully
classify images even when the size of the image changes. For example,
the algorithm can still identify a
cat whether it consumes 2M pixels or 200K pixels. Note that even the best
image classification algorithms still have practical limits on size invariance.
For example, an algorithm (or human) is unlikely to correctly classify a
cat image consuming only 20 pixels.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#translational_invariance""><strong>translational invariance</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#rotational_invariance""><strong>rotational invariance</strong></a>.</p>

"
sketching (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning""><strong>unsupervised machine learning</strong></a>,
a category of algorithms that perform a preliminary similarity analysis
on examples. Sketching algorithms use a
<a href=""https://wikipedia.org/wiki/Locality-sensitive_hashing"" target=""T"">
locality-sensitive hash function</a>
to identify points that are likely to be similar, and then group
them into buckets.</p>

<p>Sketching decreases the computation required for similarity calculations
on large datasets. Instead of calculating similarity for every single
pair of examples in the dataset, we calculate similarity only for each
pair of points within each bucket.</p>

"
softmax (Google Machine Learning Glossary)|"

<p>A function that provides probabilities for each possible class in a
<a href=""https://developers.google.com/machine-learning/glossary/#multi-class""><strong>multi-class classification model</strong></a>. The probabilities add up
to exactly 1.0. For example, softmax might determine that the probability of a
particular image being a dog at 0.9, a cat at 0.08, and a horse at 0.02.
(Also called <strong>full softmax</strong>.)</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#candidate_sampling""><strong>candidate sampling</strong></a>.</p>

"
sparse feature (Google Machine Learning Glossary)|"

<p><a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>Feature</strong></a> vector whose values are predominately zero or empty.
For example, a vector containing a single 1 value and a million 0 values is
sparse. As another example, words in a search query could also be a
sparse feature—there are many possible words in a given language, but only a
few of them occur in a given query.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#dense_feature""><strong>dense feature</strong></a>.</p>

"
sparse representation (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#representation""><strong>representation</strong></a> of a tensor that only stores
nonzero elements.</p>

<p>For example, the English language consists of about a million words.
Consider two ways to represent a count of the words used in one English
sentence:</p>

<ul align=""left"">
<li>A <strong>dense representation</strong> of this sentence
must set an integer for all one million cells, placing a 0 in most
of them, and a low integer into a few of them.</li>
<li>A sparse representation of this sentence stores only those cells
symbolizing a word actually in the sentence. So, if the sentence
contained only 20 unique words, then the sparse representation for
the sentence would store an integer in only 20 cells.</li>
</ul>

<p>For example, consider two ways to represent the sentence, &quot;Dogs wag tails.&quot;
As the following tables show, the dense representation consumes
about a million cells; the sparse representation consumes only 3 cells:</p>

<div id=""sparse-dense-tables"">
<table id=""sparse-table"">
<caption>Dense Representation</caption>
<thead>
  <tr>
  <th>Cell Number</th>
  <th>Word</th>
  <th>Occurrence</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>0</td>
    <td>a</td>
    <td>0</td>
  </tr>
  <tr>
    <td>1</td>
    <td>aardvark</td>
    <td>0</td>
  </tr>
  <tr>
    <td>2</td>
    <td>aargh</td>
    <td>0</td>
  </tr>
  <tr>
    <td>3</td>
    <td>aarti</td>
    <td>0</td>
  </tr>
  <tr class=""elided-rows"">
    <td colspan=""3""><strong>&hellip; 140,391 more words with an occurrence of 0</strong></td>
  </tr>
  <tr>
    <td>140395</td>
    <td>dogs</td>
    <td>1</td>
  </tr>
  <tr class=""elided-rows"">
    <td colspan=""3""><strong>&hellip; 633,062 words with an occurrence of 0</strong></td>
  </tr>
  <tr>
    <td>773458</td>
    <td>tails</td>
    <td>1</td>
  </tr>
  <tr class=""elided-rows"">
    <td colspan=""3""><strong>&hellip; 189,136 words with an occurrence of 0</strong></td>
  </tr>
  <tr>
    <td>962594</td>
    <td>wag</td>
    <td>1</td>
  </tr>
  <tr class=""elided-rows"">
    <td colspan=""3""><strong>&hellip; many more words with an occurrence of 0</strong></td>
  </tr>
</tbody>
</table>

<table id=""dense-table"">
<caption>Sparse Representation</caption>
<thead>
  <tr>
  <th>Cell Number</th>
  <th>Word</th>
  <th>Occurrence</th>
  </tr>
</thead>
<tbody>
<tr>
  <td>140395</td>
  <td>dogs</td>
  <td>1</td>
</tr>
<tr>
  <td>773458</td>
  <td>tails</td>
  <td>1</td>
</tr>
<tr>
  <td>962594</td>
  <td>wag</td>
  <td>1</td>
</tr>
</tbody>
</table>
</div>

"
sparse vector (Google Machine Learning Glossary)|"

<p>A vector whose values are mostly zeroes. See also <a href=""https://developers.google.com/machine-learning/glossary/#sparse_features""><strong>sparse
feature</strong></a>.</p>

"
sparsity (Google Machine Learning Glossary)|"

<p>The number of elements set to zero (or null) in a vector or matrix divided
by the total number of entries in that vector or matrix. For example,
consider a 10x10 matrix in which 98 cells contain zero. The calculation of
sparsity is as follows:</p>

<div>
[$$]
{\text{sparsity}} =
\frac{\text{98}} {\text{100}} =
{\text{0.98}}
[/$$]
</div>

<p><strong>Feature sparsity</strong> refers to the sparsity of a feature vector;
<strong>model sparsity</strong> refers to the sparsity of the model weights.</p>

"
spatial pooling (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#pooling""><strong>pooling</strong></a>.</p>

"
squared hinge loss (Google Machine Learning Glossary)|"

<p>The square of the <a href=""https://developers.google.com/machine-learning/glossary/#hinge-loss""><strong>hinge loss</strong></a>.  Squared hinge loss penalizes
outliers more harshly than regular hinge loss.</p>

"
squared loss (Google Machine Learning Glossary)|"

<p>The <a href=""https://developers.google.com/machine-learning/glossary/#loss""><strong>loss</strong></a> function used in
<a href=""https://developers.google.com/machine-learning/glossary/#linear_regression""><strong>linear regression</strong></a>.  (Also known as
<strong>L<sub>2</sub> Loss</strong>.) This function calculates the squares of
the difference between a model&#39;s predicted value for a labeled
<a href=""https://developers.google.com/machine-learning/glossary/#example""><strong>example</strong></a> and the actual value of the <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a>.
Due to squaring, this loss function amplifies the influence of bad predictions.
That is, squared loss reacts more strongly to outliers
than <a href=""https://developers.google.com/machine-learning/glossary/#L1_loss""><strong>L<sub>1</sub> loss</strong></a>.</p>

"
state (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the parameter values that describe the current
configuration of the environment, which the <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a> uses to
choose an <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a>.</p>

"
state-action value function (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#q-function""><strong>Q-function</strong></a>.</p>

"
static model (Google Machine Learning Glossary)|"

<p>A model that is trained offline.</p>

"
stationarity (Google Machine Learning Glossary)|"

<p>A property of data in a dataset, in which the data distribution stays constant
across one or more dimensions. Most commonly, that dimension is time, meaning
that data exhibiting stationarity doesn&#39;t change over time. For example, data
that exhibits stationarity doesn&#39;t change from September to December.</p>

"
step (Google Machine Learning Glossary)|"

<p>A forward and backward evaluation of one <a href=""https://developers.google.com/machine-learning/glossary/#batch""><strong>batch</strong></a>.</p>

"
step size (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#learning_rate""><strong>learning rate</strong></a>.</p>

"
stochastic gradient descent (SGD) (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#gradient_descent""><strong>gradient descent</strong></a> algorithm in which the batch size
is one. In other words, SGD relies on a single example chosen uniformly at
random from a dataset to calculate an estimate of the gradient at each step.</p>

"
stride (Google Machine Learning Glossary)|"

<p>In a convolutional operation or pooling, the delta in each dimension of the
next series of input slices. For example, the following animation
demonstrates a (1,1) stride during a convolutional operation. Therefore,
the next input slice starts one position to the right of the previous input
slice. When the operation reaches the right edge, the next slice is all
the way over to the left but one position down.</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/AnimatedConvolution.gif"">
</p>

<p>The preceding example demonstrates a two-dimensional stride.  If the input
matrix is three-dimensional, the stride would also be three-dimensional.</p>

"
structural risk minimization (SRM) (Google Machine Learning Glossary)|"

<p>An algorithm that balances two goals:</p>

<ul align=""left"">
<li>The desire to build the most predictive model (for example, lowest loss).</li>
<li>The desire to keep the model as simple as possible (for example, strong
regularization).</li>
</ul>

<p>For example, a function that minimizes loss+regularization on the
training set is a structural risk minimization algorithm.</p>

<p>For more information, see
<a href=""http://www.svms.org/srm/"" target=""T"">http://www.svms.org/srm/</a>.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#ERM""><strong>empirical risk minimization</strong></a>.</p>

"
subsampling (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#pooling""><strong>pooling</strong></a>.</p>

"
summary (Google Machine Learning Glossary)|"

<p>In TensorFlow, a value or set of values calculated at a particular
<a href=""https://developers.google.com/machine-learning/glossary/#step""><strong>step</strong></a>, usually used for tracking model metrics during training.</p>

"
supervised machine learning (Google Machine Learning Glossary)|"

<p>Training a <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> from input data and its corresponding
<a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>labels</strong></a>. Supervised machine learning is analogous to a student
learning a subject by studying a set of questions and their corresponding
answers.  After mastering the mapping between questions and answers, the
student can then provide answers to new (never-before-seen) questions on
the same topic.  Compare with
<a href=""https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning""><strong>unsupervised machine learning</strong></a>.</p>

"
synthetic feature (Google Machine Learning Glossary)|"

<p>A <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a> not present among the input features, but
created from one or more of them. Kinds of synthetic features include:</p>

<ul align=""left"">
<li><a href=""https://developers.google.com/machine-learning/glossary/#bucketing""><strong>Bucketing</strong></a> a continuous feature into range bins.</li>
<li>Multiplying (or dividing) one feature value by other feature value(s)
or by itself.</li>
<li>Creating a <a href=""https://developers.google.com/machine-learning/glossary/#feature_cross""><strong>feature cross</strong></a>.</li>
</ul>

<p>Features created by <a href=""https://developers.google.com/machine-learning/glossary/#normalization""><strong>normalizing</strong></a> or <a href=""https://developers.google.com/machine-learning/glossary/#scaling""><strong>scaling</strong></a>
alone are not considered synthetic features.</p>


"
tabular Q-learning (Google Machine Learning Glossary)|"

<p>In reinforcement learning, implementing <a href=""https://developers.google.com/machine-learning/glossary/#q-learning""><strong>Q-learning</strong></a>
by using a table to store the <a href=""https://developers.google.com/machine-learning/glossary/#q-function""><strong>Q-functions</strong></a> for every
combination of <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a> and <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a>.</p>

"
target (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a>.</p>

"
target network (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#q-learning""><strong>Deep Q-learning</strong></a>, a neural network that is a stable
approximation of the main neural network, where the main neural network
implements either a <a href=""https://developers.google.com/machine-learning/glossary/#q-function""><strong>Q-function</strong></a> or a <a href=""https://developers.google.com/machine-learning/glossary/#policy""><strong>policy</strong></a>.
Then, you can train the main network on the Q-values predicted by the target
network. Therefore, you prevent the feedback loop that occurs when the main
network trains on Q-values predicted by itself. By avoiding this feedback,
training stability increases.</p>

"
temporal data (Google Machine Learning Glossary)|"

<p>Data recorded at different points in time. For example, winter coat sales
recorded for each day of the year would be temporal data.</p>

"
Tensor (Google Machine Learning Glossary)|"

<p>The primary data structure in TensorFlow programs. Tensors are N-dimensional
(where N could be very large) data structures, most commonly scalars, vectors,
or matrices. The elements of a Tensor can hold integer, floating-point,
or string values.</p>

"
TensorBoard (Google Machine Learning Glossary)|"

<p>The dashboard that displays the summaries saved during the execution of one or
more TensorFlow programs.</p>

"
TensorFlow (Google Machine Learning Glossary)|"

<p>A large-scale, distributed, machine learning platform. The term also refers to
the base API layer in the TensorFlow stack, which supports general computation
on dataflow graphs.</p>

<p>Although TensorFlow is primarily used for machine learning, you may also use
TensorFlow for non-ML tasks that require numerical computation using
dataflow graphs.</p>

"
TensorFlow Playground (Google Machine Learning Glossary)|"

<p>A program that visualizes how different
<a href=""https://developers.google.com/machine-learning/glossary/#hyperparameter""><strong>hyperparameters</strong></a> influence model
(primarily neural network) training.
Go to
<a href=""http://playground.tensorflow.org"" target=""T"">
http://playground.tensorflow.org</a>
to experiment with TensorFlow Playground.</p>

"
TensorFlow Serving (Google Machine Learning Glossary)|"

<p>A platform to deploy trained models in production.</p>

"
Tensor Processing Unit (TPU) (Google Machine Learning Glossary)|"

<p>An application-specific integrated circuit (ASIC) that optimizes the
performance of machine learning workloads. These ASICs are deployed as
multiple <a href=""https://developers.google.com/machine-learning/glossary/#TPU_chip""><strong>TPU chips</strong></a> on a <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU device</strong></a>.</p>

"
Tensor rank (Google Machine Learning Glossary)|"

<p>See <a href=""https://developers.google.com/machine-learning/glossary/#rank_Tensor""><strong>rank (Tensor)</strong></a>.</p>

"
Tensor shape (Google Machine Learning Glossary)|"

<p>The number of elements a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a> contains in various dimensions.
For example, a [5, 10] Tensor has a shape of 5 in one dimension and 10
in another.</p>

"
Tensor size (Google Machine Learning Glossary)|"

<p>The total number of scalars a <a href=""https://developers.google.com/machine-learning/glossary/#tensor""><strong>Tensor</strong></a> contains. For example, a
[5, 10] Tensor has a size of 50.</p>

"
termination condition (Google Machine Learning Glossary)|"

<p>In reinforcement learning, the conditions that determine when an
<a href=""https://developers.google.com/machine-learning/glossary/#episode""><strong>episode</strong></a> ends, such as when the agent reaches a certain state
or exceeds a threshold number of state transitions.
For example, in <a href=""https://wikipedia.org/wiki/Tic-tac-toe"">tic-tac-toe</a> (also
known as noughts and crosses), an episode terminates either when a player marks
three consecutive spaces or when all spaces are marked.</p>

"
test set (Google Machine Learning Glossary)|"

<p>The subset of the dataset that you use to test your <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a>
after the model has gone through initial vetting by the validation set.

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training set</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#validation_set""><strong>validation set</strong></a>.</p>

"
tf.Example (Google Machine Learning Glossary)|"

<p>A standard
<a href=""https://developers.google.com/protocol-buffers/"" target=""T"">
protocol buffer</a>
for describing input data for machine learning model training or inference.</p>

"
tf.keras (Google Machine Learning Glossary)|"

<p>An implementation of <a href=""https://developers.google.com/machine-learning/glossary/#Keras""><strong>Keras</strong></a> integrated into
<a href=""https://developers.google.com/machine-learning/glossary/#TensorFlow""><strong>TensorFlow</strong></a>.</p>


"
time series analysis (Google Machine Learning Glossary)|"

<p>A subfield of machine learning and statistics that analyzes
<a href=""https://developers.google.com/machine-learning/glossary/#temporal_data""><strong>temporal data</strong></a>.  Many types of machine learning
problems require time series analysis, including classification, clustering,
forecasting, and anomaly detection. For example, you could use
time series analysis to forecast the future sales of winter coats by month
based on historical sales data.</p>

"
timestep (Google Machine Learning Glossary)|"

<p>One &quot;unrolled&quot; cell within a
<a href=""https://developers.google.com/machine-learning/glossary/#recurrent_neural_network""><strong>recurrent neural network</strong></a>.
For example, the following figure shows three timesteps (labeled with
the subscripts t-1, t, and t+1):</p>

<p>
<img src=""https://developers.google.com/machine-learning/glossary/images/Simple_RNN.svg"">
</p>

"
tower (Google Machine Learning Glossary)|"

<p>A component of a <a href=""https://developers.google.com/machine-learning/glossary/#deep_neural_network""><strong>deep neural network</strong></a> that
is itself a deep neural network without an output layer. Typically,
each tower reads from an independent data source. Towers are independent
until their output is combined in a final layer.</p>

"
TPU (Google Machine Learning Glossary)|"

<p>Abbreviation for <a href=""https://developers.google.com/machine-learning/glossary/#TPU""><strong>Tensor Processing Unit</strong></a>.</p>

"
TPU chip (Google Machine Learning Glossary)|"

<p>A programmable linear algebra accelerator with on-chip high bandwidth memory
that is optimized for machine learning workloads.
Multiple TPU chips are deployed on a <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU device</strong></a>.</p>

"
TPU device (Google Machine Learning Glossary)|"

<p>A printed circuit board (PCB) with multiple <a href=""https://developers.google.com/machine-learning/glossary/#TPU_chip""><strong>TPU chips</strong></a>,
high bandwidth network interfaces, and system cooling hardware.</p>

"
TPU master (Google Machine Learning Glossary)|"

<p>The central coordination process running on a host machine that sends and
receives data, results, programs, performance, and system health information
to the <a href=""https://developers.google.com/machine-learning/glossary/#TPU_worker""><strong>TPU workers</strong></a>. The TPU master also manages the setup
and shutdown of <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU devices</strong></a>.</p>

"
TPU node (Google Machine Learning Glossary)|"

<p>A TPU resource on Google Cloud Platform with a specific
<a href=""https://developers.google.com/machine-learning/glossary/#TPU_type""><strong>TPU type</strong></a>. The TPU node connects to your
<a href=""https://cloud.google.com/vpc/docs/"">VPC Network</a> from a
<a href=""https://cloud.google.com/vpc/docs/vpc-peering"">peer VPC network</a>.
TPU nodes are a resource defined in the
<a href=""https://cloud.google.com/tpu/docs/reference/rest/v1/projects.locations.nodes"">Cloud TPU API</a>.</p>

"
TPU Pod (Google Machine Learning Glossary)|"

<p>A specific configuration of <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU devices</strong></a> in a Google
data center. All of the devices in a TPU pod are connected to one another
over a dedicated high-speed network. A TPU Pod is the largest configuration of
<a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU devices</strong></a> available for a specific TPU version.</p>

"
TPU resource (Google Machine Learning Glossary)|"

<p>A TPU entity on Google Cloud Platform that you create, manage, or consume. For
example, <a href=""https://developers.google.com/machine-learning/glossary/#TPU_node""><strong>TPU nodes</strong></a> and <a href=""https://developers.google.com/machine-learning/glossary/#TPU_type""><strong>TPU types</strong></a> are
TPU resources.</p>

"
TPU slice (Google Machine Learning Glossary)|"

<p>A TPU slice is a fractional portion of the <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU devices</strong></a> in
a <a href=""https://developers.google.com/machine-learning/glossary/#TPU_Pod""><strong>TPU Pod</strong></a>. All of the devices in a TPU slice are connected
to one another over a dedicated high-speed network.</p>

"
TPU type (Google Machine Learning Glossary)|"

<p>A configuration of one or more <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU devices</strong></a> with a specific
TPU hardware version. You select a TPU type when you create
a <a href=""https://developers.google.com/machine-learning/glossary/#TPU_node""><strong>TPU node</strong></a> on Google Cloud Platform. For example, a <code translate=""no"" dir=""ltr"">v2-8</code>
TPU type is a single TPU v2 device with 8 cores. A <code translate=""no"" dir=""ltr"">v3-2048</code> TPU type has 256
networked TPU v3 devices and a total of 2048 cores. TPU types are a resource
defined in the
<a href=""https://cloud.google.com/tpu/docs/reference/rest/v1/projects.locations.acceleratorTypes"">Cloud TPU API</a>.</p>

"
TPU worker (Google Machine Learning Glossary)|"

<p>A process that runs on a host machine and executes machine learning programs
on <a href=""https://developers.google.com/machine-learning/glossary/#TPU_device""><strong>TPU devices</strong></a>.</p>

"
training (Google Machine Learning Glossary)|"

<p>The process of determining the ideal <a href=""https://developers.google.com/machine-learning/glossary/#parameter""><strong>parameters</strong></a> comprising
a model.</p>

"
training set (Google Machine Learning Glossary)|"

<p>The subset of the dataset used to train a model.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#validation_set""><strong>validation set</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#test_set""><strong>test set</strong></a>.</p>

"
trajectory (Google Machine Learning Glossary)|"

<p>In reinforcement learning, a sequence of
<a href=""https://wikipedia.org/wiki/Tuple"" target=""T"">tuples</a> that represent
a sequence of <a href=""https://developers.google.com/machine-learning/glossary/#state""><strong>state</strong></a> transitions of the <a href=""https://developers.google.com/machine-learning/glossary/#agent""><strong>agent</strong></a>,
where each tuple corresponds to the state, <a href=""https://developers.google.com/machine-learning/glossary/#action""><strong>action</strong></a>,
<a href=""https://developers.google.com/machine-learning/glossary/#reward""><strong>reward</strong></a>, and next state for a given state transition.</p>

"
transfer learning (Google Machine Learning Glossary)|"

<p>Transferring information from one machine learning task to another.
For example, in multi-task learning, a single model solves multiple tasks,
such as a <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep model</strong></a> that has different output nodes for
different tasks.  Transfer learning might involve transferring knowledge
from the solution of a simpler task to a more complex one, or involve
transferring knowledge from a task where there is more data to one where
there is less data.</p>

<p>Most machine learning systems solve a <em>single</em> task. Transfer learning is a
baby step towards artificial intelligence in which a single program can solve
<em>multiple</em> tasks.</p>

"
translational invariance (Google Machine Learning Glossary)|"

<p>In an image classification problem, an algorithm&#39;s ability to successfully
classify images even when the position of objects within the image changes.
For example, the algorithm can still identify a dog, whether it is in the
center of the frame or at the left end of the frame.</p>

<p>See also <a href=""https://developers.google.com/machine-learning/glossary/#size_invariance""><strong>size invariance</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#rotational_invariance""><strong>rotational invariance</strong></a>.</p>

"
trigram (Google Machine Learning Glossary)|"

<p>An <a href=""https://developers.google.com/machine-learning/glossary/#N-gram""><strong>N-gram</strong></a> in which N=3.</p>

"
true negative (TN) (Google Machine Learning Glossary)|"

<p>An example in which the model <em>correctly</em> predicted the
<a href=""https://developers.google.com/machine-learning/glossary/#negative_class""><strong>negative class</strong></a>. For example, the model inferred that
a particular email message was not spam, and that email message really was
not spam.</p>

"
true positive (TP) (Google Machine Learning Glossary)|"

<p>An example in which the model <em>correctly</em> predicted the
<a href=""https://developers.google.com/machine-learning/glossary/#positive_class""><strong>positive class</strong></a>. For example, the model inferred that
a particular email message was spam, and that email message really was spam.</p>

"
true positive rate (TPR) (Google Machine Learning Glossary)|"

<p>Synonym for <a href=""https://developers.google.com/machine-learning/glossary/#recall""><strong>recall</strong></a>. That is:</p>

<div>
[$$]\text{True Positive Rate} = \frac{\text{True Positives}} {\text{True Positives} + \text{False Negatives}}[/$$]
</div>

<p>True positive rate is the y-axis in an <a href=""https://developers.google.com/machine-learning/glossary/#ROC""><strong>ROC curve</strong></a>.</p>


"
unawareness (to a sensitive attribute) (Google Machine Learning Glossary)|"

<p>A situation in which <a href=""https://developers.google.com/machine-learning/glossary/#sensitive_attribute""><strong>sensitive attributes</strong></a> are
present, but not included in the training data. Because sensitive attributes
are often correlated with other attributes of one’s data, a model trained
with unawareness about a sensitive attribute could still have
<a href=""https://developers.google.com/machine-learning/glossary/#disparate_impact""><strong>disparate impact</strong></a> with respect to that attribute,
or violate other <a href=""https://developers.google.com/machine-learning/glossary/#fairness_constraint""><strong>fairness constraints</strong></a>.</p>

"
underfitting (Google Machine Learning Glossary)|"

<p>Producing a model with poor predictive ability because the model
hasn&#39;t captured the complexity of the training data. Many problems
can cause underfitting, including:</p>

<ul align=""left"">
<li>Training on the wrong set of features.</li>
<li>Training for too few epochs or at too low a learning rate.</li>
<li>Training with too high a regularization rate.</li>
<li>Providing too few hidden layers in a deep neural network.</li>
</ul>

"
unlabeled example (Google Machine Learning Glossary)|"

<p>An example that contains <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>features</strong></a> but no <a href=""https://developers.google.com/machine-learning/glossary/#label""><strong>label</strong></a>.
Unlabeled examples are the input to <a href=""https://developers.google.com/machine-learning/glossary/#inference""><strong>inference</strong></a>. In
<a href=""https://developers.google.com/machine-learning/glossary/#semi-supervised_learning""><strong>semi-supervised</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning""><strong>unsupervised</strong></a> learning,
unlabeled examples are used during training.</p>

"
unsupervised machine learning (Google Machine Learning Glossary)|"

<p>Training a <a href=""https://developers.google.com/machine-learning/glossary/#model""><strong>model</strong></a> to find patterns in a dataset, typically an
unlabeled dataset.</p>

<p>The most common use of unsupervised machine learning is to cluster data
into groups of similar examples. For example, an unsupervised machine
learning algorithm can cluster songs together based on various properties
of the music. The resulting clusters can become an input to other machine
learning algorithms (for example, to a music recommendation service).
Clustering can be helpful in domains where true labels are hard to obtain.
For example, in domains such as anti-abuse and fraud, clusters can help
humans better understand the data.</p>

<p>Another example of unsupervised machine learning is
<a href=""https://wikipedia.org/wiki/Principal_component_analysis"" target=""T"">principal
component analysis (PCA)</a>.
For example, applying PCA on a
dataset containing the contents of millions of shopping carts might reveal
that shopping carts containing lemons frequently also contain antacids.</p>

<p>Compare with <a href=""https://developers.google.com/machine-learning/glossary/#supervised_machine_learning""><strong>supervised machine learning</strong></a>.</p>

"
upweighting (Google Machine Learning Glossary)|"

<p>Applying a weight to the <a href=""https://developers.google.com/machine-learning/glossary/#downsampling""><strong>downsampled</strong></a> class equal
to the factor by which you downsampled.</p>

"
user matrix (Google Machine Learning Glossary)|"

<p>In <a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation systems</strong></a>, an
<a href=""https://developers.google.com/machine-learning/glossary/#embeddings""><strong>embedding</strong></a> generated by
<a href=""https://developers.google.com/machine-learning/glossary/#matrix_factorization""><strong>matrix factorization</strong></a>
that holds latent signals about user preferences.
Each row of the user matrix holds information about the relative
strength of various latent signals for a single user.
For example, consider a movie recommendation system.  In this system,
the latent signals in the user matrix might represent each user&#39;s interest
in particular genres, or might be harder-to-interpret signals that involve
complex interactions across multiple factors.</p>

<p>The user matrix has a column for each latent feature and a row for each user.
That is, the user matrix has the same number of rows as the target
matrix that is being factorized. For example, given a movie
recommendation system for 1,000,000 users, the
user matrix will have 1,000,000 rows.</p>


"
validation (Google Machine Learning Glossary)|"

<p>A process used, as part of <a href=""https://developers.google.com/machine-learning/glossary/#training""><strong>training</strong></a>, to evaluate
the quality of a <a href=""https://developers.google.com/machine-learning/glossary/#machine_learning""><strong>machine learning</strong></a> model
using the <a href=""https://developers.google.com/machine-learning/glossary/#validation_set""><strong>validation set</strong></a>. Because the validation
set is disjoint from the training set, validation helps ensure that the
model’s performance generalizes beyond the training set.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#test_set""><strong>test set</strong></a>.</p>

"
validation set (Google Machine Learning Glossary)|"

<p>A subset of the dataset—disjoint from the training set—used
in <a href=""https://developers.google.com/machine-learning/glossary/#validation""><strong>validation</strong></a>.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#training_set""><strong>training set</strong></a>
and <a href=""https://developers.google.com/machine-learning/glossary/#test_set""><strong>test set</strong></a>.</p>

"
vanishing gradient problem (Google Machine Learning Glossary)|"

<p>The tendency for the gradients of early <a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layers</strong></a>
of some <a href=""https://developers.google.com/machine-learning/glossary/#deep_neural_network""><strong>deep neural networks</strong></a> to become
surprisingly flat (low). Increasingly lower gradients result in increasingly
smaller changes to the weights on nodes in a deep neural network, leading to
little or no learning. Models suffering from the vanishing gradient problem
become difficult or impossible to train.
<a href=""https://developers.google.com/machine-learning/glossary/#Long_Short-Term_Memory""><strong>Long Short-Term Memory</strong></a> cells address this issue.</p>

<p>Compare to <a href=""https://developers.google.com/machine-learning/glossary/#exploding_gradient_problem""><strong>exploding gradient problem</strong></a>.</p>


"
Wasserstein loss (Google Machine Learning Glossary)|"

<p>One of the loss functions commonly used in
<a href=""https://developers.google.com/machine-learning/glossary/#generative_adversarial_network""><strong>generative adversarial networks</strong></a>,
based on the <a href=""https://wikipedia.org/wiki/Earth_mover%27s_distance"">earth-mover&#39;s
distance</a> between
the distribution of generated data and real data.</p>

<p>Wasserstein Loss is the default loss function in
<a href=""https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/gan"">TF-GAN</a>.</p>


"
weight (Google Machine Learning Glossary)|"

<p>A coefficient for a <a href=""https://developers.google.com/machine-learning/glossary/#feature""><strong>feature</strong></a> in a linear model, or an edge
in a deep network. The goal of training a linear model is to determine
the ideal weight for each feature. If a weight is 0, then its corresponding
feature does not contribute to the model.</p>

"
Weighted Alternating Least Squares (WALS) (Google Machine Learning Glossary)|"

<p>An algorithm for minimizing the objective function during
<a href=""https://developers.google.com/machine-learning/glossary/#matrix_factorization""><strong>matrix factorization</strong></a> in
<a href=""https://developers.google.com/machine-learning/glossary/#recommendation_system""><strong>recommendation systems</strong></a>, which allows a
downweighting of the missing examples. WALS minimizes the weighted
squared error between the original matrix and the reconstruction by
alternating between fixing the row factorization and column factorization.
Each of these optimizations can be solved by least squares
<a href=""https://developers.google.com/machine-learning/glossary/#convex_optimization""><strong>convex optimization</strong></a>. For details, see the
<a href=""/machine-learning/recommendation/collaborative/matrix""
   target=""T""
   class=""gc-analytics-event""
   data-category=""launchRecommendationCourse""
   data-label=""ml-glossary""
   data-action=""click"">Recommendation Systems course</a></p>

"
wide model (Google Machine Learning Glossary)|"

<p>A linear model that typically has many
<a href=""https://developers.google.com/machine-learning/glossary/#sparse_features""><strong>sparse input features</strong></a>. We refer to it as &quot;wide&quot; since
such a model is a special type of <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a> with a
large number of inputs that connect directly to the output node. Wide models
are often easier to debug and inspect than deep models. Although wide models
cannot express nonlinearities through <a href=""https://developers.google.com/machine-learning/glossary/#hidden_layer""><strong>hidden layers</strong></a>,
they can use transformations such as
<a href=""https://developers.google.com/machine-learning/glossary/#feature_cross""><strong>feature crossing</strong></a> and
<a href=""https://developers.google.com/machine-learning/glossary/#bucketing""><strong>bucketization</strong></a> to model nonlinearities in different ways.</p>

<p>Contrast with <a href=""https://developers.google.com/machine-learning/glossary/#deep_model""><strong>deep model</strong></a>.</p>

"
width (Google Machine Learning Glossary)|"

<p>The number of <a href=""https://developers.google.com/machine-learning/glossary/#neuron""><strong>neurons</strong></a> in a particular <a href=""https://developers.google.com/machine-learning/glossary/#layer""><strong>layer</strong></a>
of a <a href=""https://developers.google.com/machine-learning/glossary/#neural_network""><strong>neural network</strong></a>.</p>
          </div>