Unsupervised #9

brooksjessup · 2021-01-07T05:42:04Z

Explore the Data Using Pandas-
typo: "interpretation. <3 your data"

Why not apply some of the preprocessing techniques from the last lesson here on the music reviews data?

Creating the DTM using scikit-learn-
Explanation needed for why it's necessary to remove numbers.

Topic Modeling-
typo: "what the ext is about" -> "text"
The paragraph on the "theory" behind LDA is very dense and difficult to parse.

It is unnecessary to fit-transform with both tf-idf and CountVectorizer here; one or the other is fine.
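A minimal sketch of the single-vectorizer version, assuming CountVectorizer is the one kept (a common choice for LDA, since the model is defined over word counts). The `docs` list is a hypothetical stand-in for the music reviews text column, not the notebook's actual data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the music reviews text; not the real dataset.
docs = [
    "a bright guitar record with warm vocals",
    "dark electronic beats and heavy bass",
    "warm vocals over quiet guitar",
]

# Build the document-term matrix once, with a single vectorizer.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# One row per document, one column per vocabulary term.
print(dtm.shape)
```

The same `dtm` can then be passed straight to the topic model, so the second fit-transform is not needed.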

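Error message when fitting the LDA model. The n_topics argument is no longer accepted by current scikit-learn; it was renamed to n_components:
"LatentDirichletAllocation(n_topics=10...)" -> "LatentDirichletAllocation(n_components=10...)"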
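For reference, a small sketch of the corrected call. The toy `docs` corpus and `random_state=0` are assumptions for illustration only; the notebook would pass its own document-term matrix.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the notebook's reviews.
docs = [
    "a bright guitar record with warm vocals",
    "dark electronic beats and heavy bass",
    "warm vocals over quiet guitar",
    "heavy bass and loud drums",
]
dtm = CountVectorizer().fit_transform(docs)

# n_components replaces the old n_topics keyword.
lda = LatentDirichletAllocation(n_components=10, random_state=0)

# Rows are documents, columns are topic proportions.
doc_topics = lda.fit_transform(dtm)
print(doc_topics.shape)
```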

It might be nice to include an interpretation of the 10 topics identified by the model.
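One way to support that interpretation is to list the highest-weighted words for each topic. The helper below is a sketch: `top_words` is a hypothetical name, and the small `components` array and `feature_names` list are made-up values so the example runs on its own.

```python
import numpy as np

def top_words(components, feature_names, n_words=3):
    """Return the n_words highest-weighted terms for each topic row."""
    topics = []
    for weights in components:
        # Sort term indices by weight, largest first.
        top_idx = np.argsort(weights)[::-1][:n_words]
        topics.append([feature_names[i] for i in top_idx])
    return topics

# Made-up topic-term weights: 2 topics over a 4-word vocabulary.
components = np.array([
    [0.1, 5.0, 2.0, 0.3],
    [4.0, 0.2, 0.1, 3.0],
])
feature_names = ["bass", "guitar", "vocals", "synth"]

for k, words in enumerate(top_words(components, feature_names, n_words=2)):
    print(f"Topic {k}: {', '.join(words)}")
```

In the notebook, the fitted model's `components_` attribute and the vectorizer's vocabulary (`get_feature_names_out()` in recent scikit-learn releases) would take the place of the made-up inputs.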

Error message in cosine similarity example at end of notebook.

Further resources-
The link for the blog post is broken. Remove it?

EastBayEv · 2021-02-11T21:51:53Z

Hi @brooksjessup -- Can you commit and push these changes? Please close this comment when you are done. Let me know if you have any questions. Thanks!

EastBayEv assigned brooksjessup Jan 7, 2021

katherinerosewolf mentioned this issue Aug 25, 2021

dataset updates #22

Merged
