Commit 3352bba

MarkyVamueller authored and committed
Type-os and added great links to learning more about Machine Learning
1 parent 85c0b45 commit 3352bba

10 files changed: +104 -102 lines changed

CONTRIBUTING.md (+9 -10)

@@ -4,9 +4,9 @@ Contributing code
 **Note: This document is just to get started, visit [**Contributing
 page**](http://scikit-learn.org/stable/developers/index.html#coding-guidelines)
-for the full contributor's guide. Make sure to read it carefully to make
+for the full contributor's guide. Please be sure to read it carefully to make
 the code review process go as smoothly as possible and maximize the
-likelihood of your contribution to get merged.**
+likelihood of your contribution being merged.**

 How to contribute
 -----------------
@@ -29,7 +29,7 @@ GitHub:

 and start making changes. Never work in the ``master`` branch!

-4. Work on this copy, on your computer, using Git to do the version
+4. Work on this copy on your computer using Git to do the version
   control. When you're done editing, do:

       $ git add modified_files
@@ -43,8 +43,8 @@ Finally, go to the web page of the your fork of the scikit-learn repo,
 and click 'Pull request' to send your changes to the maintainers for
 review. request. This will send an email to the committers.

-(If any of the above seems like magic to you, then look up the [Git documentation](http://git-scm.com/documentation)
-on the web.)
+(If any of the above seems like magic to you, then look up the
+[Git documentation](http://git-scm.com/documentation) on the web.)

 It is recommended to check that your contribution complies with the
 following rules before submitting a pull request:
@@ -64,7 +64,7 @@ following rules before submitting a pull request:
   to other methods available in scikit-learn.

 - At least one paragraph of narrative documentation with links to
-  references in the literature (with PDF links when possible) and
+  references in the literature (with PDF links when possible) and
   the example.

 The documentation should also include expected time and space
@@ -76,7 +76,7 @@ scale in dimensionality: n_features is expected to be lower than
 You can also check for common programming errors with the following
 tools:

-- Code with a good unittest coverage (at least 80%), check with:
+- Code with good unittest coverage (at least 80%), check with:

       $ pip install nose coverage
       $ nosetests --with-coverage path/to/tests_for_package
@@ -119,7 +119,7 @@ reStructuredText documents (like this one), tutorials, etc.
 reStructuredText documents live in the source code repository under the
 doc/ directory.

-You can edit the documentation using any text editor, and then generate
+You can edit the documentation using any text editor and then generate
 the HTML output by typing ``make html`` from the doc/ directory.
 Alternatively, ``make`` can be used to quickly generate the
 documentation without the example gallery. The resulting HTML files will
@@ -133,7 +133,7 @@ For building the documentation, you will need
 When you are writing documentation, it is important to keep a good
 compromise between mathematical and algorithmic details, and give
 intuition to the reader on what the algorithm does. It is best to always
-start with a small paragraph with a hand-waiving explanation of what the
+start with a small paragraph with a hand-waving explanation of what the
 method does to the data and a figure (coming from an example)
 illustrating it.

@@ -143,4 +143,3 @@ Further Information
 Visit the [Contributing Code](http://scikit-learn.org/stable/developers/index.html#coding-guidelines)
 section of the website for more information including conforming to the
 API spec and profiling contributed code.
-

doc/tutorial/basic/tutorial.rst (+13 -11)

@@ -16,11 +16,10 @@ Machine learning: the problem setting

 In general, a learning problem considers a set of n
 `samples <http://en.wikipedia.org/wiki/Sample_(statistics)>`_ of
-data and try to predict properties of unknown data. If each sample is
-more than a single number, and for instance a multi-dimensional entry
+data and then tries to predict properties of unknown data. If each sample is
+more than a single number and, for instance, a multi-dimensional entry
 (aka `multivariate <http://en.wikipedia.org/wiki/Multivariate_random_variable>`_
-data), is it said to have several attributes,
-or **features**.
+data), it is said to have several attributes or **features**.

 We can separate learning problems in a few large categories:

@@ -35,9 +34,12 @@ We can separate learning problems in a few large categories:
   samples belong to two or more classes and we
   want to learn from already labeled data how to predict the class
   of unlabeled data. An example of classification problem would
-  be the digit recognition example, in which the aim is to assign
-  each input vector to one of a finite number of discrete
-  categories.
+  be the handwritten digit recognition example, in which the aim is
+  to assign each input vector to one of a finite number of discrete
+  categories. Another way to think of classification is as a discrete
+  (as opposed to continuous) form of supervised learning where one has a
+  limited number of categories and, for each of the n samples provided,
+  one tries to label them with the correct category or class.

 * `regression <http://en.wikipedia.org/wiki/Regression_analysis>`_:
   if the desired output consists of one or more
@@ -52,7 +54,7 @@ We can separate learning problems in a few large categories:
 it is called `clustering <http://en.wikipedia.org/wiki/Cluster_analysis>`_,
 or to determine the distribution of data within the input space, known as
 `density estimation <http://en.wikipedia.org/wiki/Density_estimation>`_, or
-to project the data from a high-dimensional space down to two or thee
+to project the data from a high-dimensional space down to two or three
 dimensions for the purpose of *visualization*
 (:ref:`Click here <unsupervised-learning>`
 to go to the Scikit-Learn unsupervised learning page).
@@ -62,8 +64,8 @@ We can separate learning problems in a few large categories:
 Machine learning is about learning some properties of a data set
 and applying them to new data. This is why a common practice in
 machine learning to evaluate an algorithm is to split the data
-at hand in two sets, one that we call a **training set** on which
-we learn data properties, and one that we call a **testing set**,
+at hand into two sets, one that we call the **training set** on which
+we learn data properties and one that we call the **testing set**
 on which we test these properties.

 .. _loading_example_dataset:
@@ -142,7 +144,7 @@ the classes to which unseen samples belong.
 In `scikit-learn`, an estimator for classification is a Python object that
 implements the methods `fit(X, y)` and `predict(T)`.

-An example of estimator is the class ``sklearn.svm.SVC`` that
+An example of an estimator is the class ``sklearn.svm.SVC`` that
 implements `support vector classification
 <http://en.wikipedia.org/wiki/Support_vector_machine>`_. The
 constructor of an estimator takes as arguments the parameters of the
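The `fit(X, y)` / `predict(T)` interface described in the last hunk can be sketched in plain Python. The `OneNearestNeighbor` class below is a hypothetical stand-in for a real estimator such as ``sklearn.svm.SVC``, not scikit-learn code — just a minimal object that honors the same two-method contract:

```python
# A minimal sketch of the estimator API: an object exposing fit(X, y)
# and predict(T). A 1-nearest-neighbour rule stands in for a real model.

class OneNearestNeighbor:
    def fit(self, X, y):
        # Memorize the training samples and their labels.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, T):
        def dist(a, b):
            # Squared Euclidean distance between two feature vectors.
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        # Label each query point with the label of its closest training sample.
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: dist(self.X_[i], t))]
                for t in T]

clf = OneNearestNeighbor().fit([[0, 0], [5, 5]], ["a", "b"])
print(clf.predict([[1, 0], [4, 5]]))  # ['a', 'b']
```

The same call pattern — construct, `fit` on labeled data, `predict` on unseen data — carries over to every scikit-learn classifier.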

doc/tutorial/common_includes/info.txt (+3 -1)

@@ -1 +1,3 @@
-Meant to share common RST file snippets that we want to reuse by inclusion in the real tutorial to lower the maintenance burden of redundant sections.
+Meant to share common RST file snippets that we want to reuse by inclusion
+in the real tutorial in order to lower the maintenance burden
+of redundant sections.

doc/tutorial/statistical_inference/finding_help.rst (+5 -1)

@@ -10,7 +10,7 @@ clarification in the docstring or the online documentation, please feel free to
 ask on the `Mailing List <http://scikit-learn.sourceforge.net/support.html>`_


-Q&A communities with Machine Learning practictioners
+Q&A communities with Machine Learning practitioners
 ----------------------------------------------------

 :Metaoptimize/QA:
@@ -36,3 +36,7 @@ Q&A communities with Machine Learning practictioners
 .. _`good freely available textbooks on machine learning`: http://metaoptimize.com/qa/questions/186/good-freely-available-textbooks-on-machine-learning

 .. _`What are some good resources for learning about machine learning`: http://www.quora.com/What-are-some-good-resources-for-learning-about-machine-learning
+
+.. _`An excellent free online course for Machine Learning taught by Professor Andrew Ng of Stanford`: https://www.coursera.org/course/ml
+
+.. _`Another excellent free online course that takes a more general approach to Artificial Intelligence`: http://www.udacity.com/overview/Course/cs271/CourseRev/1

doc/tutorial/statistical_inference/model_selection.rst (+7 -7)

@@ -97,9 +97,9 @@ of the computer.

 *

-  - Split it K folds, train on K-1, test on left-out
+  - Split it into K folds, train on K-1 and then test on the left-out fold

-  - Make sure that all classes are even accross the folds
+  - Make sure that all classes are even across the folds

   - Leave one observation out

@@ -155,8 +155,8 @@ estimator during the construction and exposes an estimator API::
 0.94228356336260977


-By default the :class:`GridSearchCV` uses a 3-fold cross-validation. However, if
-it detects that a classifier is passed, rather than a regressor, it uses
+By default, the :class:`GridSearchCV` uses a 3-fold cross-validation. However,
+if it detects that a classifier is passed, rather than a regressor, it uses
 a stratified 3-fold.

 .. topic:: Nested cross-validation
@@ -167,7 +167,7 @@ a stratified 3-fold.
 array([ 0.97996661, 0.98163606, 0.98330551])

 Two cross-validation loops are performed in parallel: one by the
-:class:`GridSearchCV` estimator to set `gamma`, the other one by
+:class:`GridSearchCV` estimator to set `gamma` and the other one by
 `cross_val_score` to measure the prediction performance of the
 estimator. The resulting scores are unbiased estimates of the
 prediction score on new data.
@@ -183,8 +183,8 @@ Cross-validated estimators
 ----------------------------

 Cross-validation to set a parameter can be done more efficiently on an
-algorithm-by-algorithm basis. This is why, for certain estimators, the
-sklearn exposes :ref:`cross_validation` estimators, that set their parameter
+algorithm-by-algorithm basis. This is why, for certain estimators,
+sklearn exposes :ref:`cross_validation` estimators that set their parameter
 automatically by cross-validation::

 >>> from sklearn import linear_model, datasets
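The K-fold recipe in the first hunk (split into K folds, train on K-1, test on the left-out fold) can be sketched in plain Python. `kfold_indices` is a hypothetical helper for illustration, not the scikit-learn `KFold` implementation:

```python
# Generate the index splits for K-fold cross-validation: each sample
# appears in the test fold exactly once across the K rounds.

def kfold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for K-fold splitting."""
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # Train on everything outside the left-out (test) fold.
        train = [i for i in range(n_samples) if i not in test]
        yield train, test
        start += size

splits = list(kfold_indices(6, 3))
print(splits[0])  # ([2, 3, 4, 5], [0, 1])
```

Stratified K-fold, which the hunk about :class:`GridSearchCV` mentions for classifiers, additionally balances class proportions within each fold.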

doc/tutorial/statistical_inference/putting_together.rst (+4 -8)

@@ -8,8 +8,8 @@ Putting it all together
 Pipelining
 ============

-We have seen that some estimators can transform data, and some estimators
-can predict variables. We can create combined estimators:
+We have seen that some estimators can transform data and that some estimators
+can predict variables. We can also create combined estimators:

 .. image:: ../../auto_examples/images/plot_digits_pipe_1.png
   :target: ../../auto_examples/plot_digits_pipe.html
@@ -26,7 +26,7 @@ Face recognition with eigenfaces
 =================================

 The dataset used in this example is a preprocessed excerpt of the
-"Labeled Faces in the Wild", aka LFW_:
+"Labeled Faces in the Wild", also known as LFW_:

 http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

@@ -71,10 +71,6 @@ Expected results for the top 5 most represented people in the dataset::
 Open problem: Stock Market Structure
 =====================================

-Can we predict the variation in stock prices for Google?
+Can we predict the variation in stock prices for Google over a given time frame?

 :ref:`stock_market`
-
-
-
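As a rough illustration of the pipelining idea in the first hunk — transformers run first, the final estimator predicts — here is a toy composition in plain Python. `SimplePipeline`, `Scale` and `Sign` are invented names for this sketch, not ``sklearn.pipeline.Pipeline``:

```python
# A toy combined estimator: a transformer step feeding a predictor step.

class Scale:
    """Transformer: rescale features to the range [-1, 1]."""
    def fit(self, X, y=None):
        self.max_ = max(abs(v) for row in X for v in row) or 1.0
        return self
    def transform(self, X):
        return [[v / self.max_ for v in row] for row in X]

class Sign:
    """Predictor: label each sample by the sign of its first feature."""
    def fit(self, X, y):
        return self
    def predict(self, X):
        return [1 if row[0] >= 0 else -1 for row in X]

class SimplePipeline:
    """Chain transformers and end with a predictor, like a real Pipeline."""
    def __init__(self, steps):
        self.steps = steps
    def fit(self, X, y):
        # Fit-and-transform every step except the last, then fit the last.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self
    def predict(self, X):
        # Push new data through the same transformations before predicting.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)

pipe = SimplePipeline([Scale(), Sign()]).fit([[4], [-2]], [1, -1])
print(pipe.predict([[3], [-1]]))  # [1, -1]
```

The eigenfaces example combines steps in exactly this spirit: a PCA transformer feeding an SVM classifier.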

doc/tutorial/statistical_inference/settings.rst (+7 -8)

@@ -26,10 +26,10 @@ these arrays is the **samples** axis, while the second is the
 features: their sepal and petal length and width, as detailed in
 `iris.DESCR`.

-When the data is not intially in the `(n_samples, n_features)` shape, it
-needs to be preprocessed to be used by the scikit.
+When the data is not initially in the `(n_samples, n_features)` shape, it
+needs to be preprocessed in order to be used by the scikit.

-.. topic:: An example of reshaping data: the digits dataset
+.. topic:: An example of reshaping data would be the digits dataset

 .. image:: ../../auto_examples/datasets/images/plot_digits_last_image_1.png
   :target: ../../auto_examples/datasets/plot_digits_last_image.html
@@ -46,7 +46,7 @@ needs to be preprocessed to be used by the scikit.
 >>> pl.imshow(digits.images[-1], cmap=pl.cm.gray_r) #doctest: +SKIP
 <matplotlib.image.AxesImage object at ...>

-To use this dataset with the scikit, we transform each 8x8 image in a
+To use this dataset with the scikit, we transform each 8x8 image into a
 feature vector of length 64 ::

 >>> data = digits.images.reshape((digits.images.shape[0], -1))
@@ -68,16 +68,16 @@ Estimators objects

 **Fitting data**: the main API implemented by scikit-learn is that of the
 `estimator`. An estimator is any object that learns from data;
-it may a classification, regression or clustering algorithm or
+it may be a classification, regression or clustering algorithm or
 a `transformer` that extracts/filters useful features from raw data.

-All estimator objects expose a `fit` method, that takes a dataset
+All estimator objects expose a `fit` method that takes a dataset
 (usually a 2-d array):

 >>> estimator.fit(data)

 **Estimator parameters**: All the parameters of an estimator can be set
-when it is instantiated, or by modifying the corresponding attribute::
+when it is instantiated or by modifying the corresponding attribute::

 >>> estimator = Estimator(param1=1, param2=2)
 >>> estimator.param1
@@ -90,4 +90,3 @@ underscore::

 >>> estimator.estimated_param_ #doctest: +SKIP

-
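The reshaping step discussed above — each 8x8 image flattened into a feature vector of length 64, as `digits.images.reshape((digits.images.shape[0], -1))` does — can be mimicked without NumPy. The toy `images` data below is made up for illustration:

```python
# Flatten a stack of 8x8 "images" into (n_samples, n_features) form,
# mirroring what NumPy's reshape does for the digits dataset.

# Three toy 8x8 images with arbitrary pixel values.
images = [[[(r + c) % 16 for c in range(8)] for r in range(8)]
          for _ in range(3)]

# Concatenate each image's rows into one 64-long feature vector.
data = [[pixel for row in image for pixel in row] for image in images]

print(len(data), len(data[0]))  # 3 64
```

After this step every sample is a flat feature vector, which is the `(n_samples, n_features)` layout every estimator's `fit` method expects.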
