This repository was archived by the owner on Mar 22, 2022. It is now read-only.

Unsupervised #9

Open
brooksjessup opened this issue Jan 7, 2021 · 1 comment

@brooksjessup
Contributor

Explore the Data Using Pandas-
typo: "interpretation. <3 your data"

Why not apply some of the preprocessing techniques from the last lesson here on the music reviews data?

Creating the DTM using scikit-learn-
Explanation needed for why it's necessary to remove numbers.
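
One possible explanation the notebook could give (an assumption on my part, since the notebook's own reasoning isn't stated): with the default tokenizer, numbers such as years and track counts each become their own column in the DTM, which inflates the vocabulary without adding much topical signal. A minimal sketch with made-up review text and a letters-only `token_pattern`:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents standing in for the music reviews (hypothetical data).
docs = [
    "Released in 1997, this album has 12 tracks.",
    "The 2004 reissue adds 3 bonus tracks.",
]

# Default tokenizer: keeps any token of 2+ word characters, digits included.
default_vec = CountVectorizer().fit(docs)
default_vocab = sorted(default_vec.vocabulary_)

# Letters-only token pattern: purely numeric tokens never enter the vocabulary.
alpha_vec = CountVectorizer(token_pattern=r"(?u)\b[a-zA-Z]{2,}\b").fit(docs)
alpha_vocab = sorted(alpha_vec.vocabulary_)

print("default:", default_vocab)
print("letters only:", alpha_vocab)
```

Stripping digits in a preprocessing step before vectorizing would work just as well; the `token_pattern` route is only one way to do it.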

Topic Modeling-
typo: "what the ext is about" -> "text"
The paragraph on the "theory" behind LDA is very dense and difficult to parse.

It is unnecessary to fit-transform both the tf-idf vectorizer and the CountVectorizer here; one or the other is fine.

Error message when fitting the LDA model; the n_topics argument needs to be renamed to n_components:
"LatentDirichletAllocation(n_topics=10...)" -> "LatentDirichletAllocation(n_components=10...)"

It might be nice to include an interpretation of the 10 topics identified by the model.
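
A common starting point for that interpretation is to list the highest-weighted words per topic. The sketch below assumes a fitted `LatentDirichletAllocation` and its vectorizer; `top_words` is a hypothetical helper, and the documents are toy data rather than the music reviews:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents (hypothetical data).
docs = [
    "loud guitar riffs and heavy drums",
    "heavy guitar solos and loud drums",
    "quiet piano ballad with soft strings",
    "soft piano melody and gentle strings",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Vocabulary terms ordered by their column index in the DTM.
feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)


def top_words(model, feature_names, n_words=3):
    """Return the n_words highest-weighted terms for each topic."""
    topics = []
    for weights in model.components_:
        top_idx = weights.argsort()[::-1][:n_words]  # largest weights first
        topics.append([feature_names[i] for i in top_idx])
    return topics


topics = top_words(lda, feature_names)
for i, words in enumerate(topics):
    print(f"Topic {i}: {', '.join(words)}")
```

The notebook could then add a sentence or two per topic describing what the top words seem to have in common.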

Error message in cosine similarity example at end of notebook.
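
The failing cell isn't quoted here, so the following is not a fix for that specific error, only a small working reference for comparing documents with `cosine_similarity` on a tf-idf matrix. The documents are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy documents (hypothetical data).
docs = [
    "loud guitar riffs and heavy drums",
    "heavy guitar solos and loud drums",
    "quiet piano ballad with soft strings",
]

tfidf = TfidfVectorizer().fit_transform(docs)

# Pairwise similarities between all documents: an n_docs x n_docs array.
sims = cosine_similarity(tfidf)

print(sims.round(2))
```

Documents 0 and 1 share most of their words and so score higher with each other than either does with document 2.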

Further resources-
The link for the blog post is broken. Remove it?

@EastBayEv
Contributor

Hi @brooksjessup -- Can you commit and push these changes? Please close this issue when you are done. Let me know if you have any questions. Thanks!
