Normalize tags before saving #159

Open
WyohKnott opened this issue Dec 9, 2018 · 2 comments

@WyohKnott
Contributor

I have many posts tagged with real names, in which I sometimes used upper-case letters and sometimes did not. The problem is that on the tag index pages these tags are treated as different: for example, "Todd Hido" is different from "todd hido". Would it be possible to normalize every tag beforehand by making them all lower case?
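For illustration, here is a minimal sketch of what lower-casing tags before building the index could look like. It is not taken from tumblr_backup.py; the function `build_tag_index` and the `(post_id, tags)` input shape are assumptions made up for this example.

```python
from collections import defaultdict


def build_tag_index(posts):
    """Map each lower-cased tag to the ids of the posts that carry it.

    `posts` is a hypothetical list of (post_id, tags) pairs, not the
    data structure tumblr_backup.py actually uses.
    """
    index = defaultdict(list)
    for post_id, tags in posts:
        # The set keeps a post from being listed twice when it carries
        # both spellings of the same tag.
        for key in {tag.strip().lower() for tag in tags}:
            index[key].append(post_id)
    return dict(index)


posts = [
    (1, ["Todd Hido", "photography"]),
    (2, ["todd hido"]),
]
index = build_tag_index(posts)
```

With this input both posts end up under the single key `todd hido`, so one index page would cover both spellings.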

@aspensmonster

Fuzzy matching might help in this case. You could have any tags with similar enough strings get grouped, and still expose the underlying tags themselves.

  • toddhido
    • Todd Hido
    • todd hido
    • ToddHido
    • ToddHida
  • someOtherTagWithNoFuzzyMatches

Though I can't immediately think of a decent way to test all combinations within the tag set. Tags can be quite diverse on Tumblr.
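One possible way to sketch the grouping, using `difflib.SequenceMatcher` from the Python standard library. The helper names `fuzzy_key` and `group_tags` and the 0.85 threshold are assumptions chosen for this example, not anything from tumblr-utils. Each tag is compared only against one representative key per existing group rather than against every other tag, which keeps the number of comparisons down at the cost of making the result depend on input order.

```python
from difflib import SequenceMatcher


def fuzzy_key(tag):
    """Lower-case the tag and drop all whitespace."""
    return "".join(tag.lower().split())


def group_tags(tags, threshold=0.85):
    """Greedily group tags whose keys are similar enough.

    Returns a dict mapping a representative key to the original tags
    in that group. The threshold is an arbitrary value for illustration.
    """
    groups = {}
    for tag in tags:
        key = fuzzy_key(tag)
        for rep in groups:
            if SequenceMatcher(None, rep, key).ratio() >= threshold:
                groups[rep].append(tag)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups[key] = [tag]
    return groups


tags = [
    "Todd Hido",
    "todd hido",
    "ToddHido",
    "ToddHida",
    "someOtherTagWithNoFuzzyMatches",
]
groups = group_tags(tags)
```

For the tags listed above this yields two groups: the four name variants under `toddhido`, and the unrelated tag on its own. A lower threshold would merge more aggressively and could group tags that are genuinely different, so the value would need tuning against real tag sets.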

@thisismycontributionaccount

I was having a similar problem with "/" in tags as well as with upper- and lower-case tags, so I modified tumblr_backup.py to quote and lower-case tags. Let me run a few more tests and then I will try to add the code.

thisismycontributionaccount added a commit to thisismycontributionaccount/tumblr-utils that referenced this issue Dec 19, 2018
…iewing

This is for issue bbolli#159. I was having a similar issue with special characters as well as with tag upper/lower case.

I have added three new options and the code to implement the options.

--normalize-tags - converts tags to lower case and builds a unique set to remove duplicates
--escape-tags - uses urllib.quote_plus to escape special characters in the tags
--fix-for-disk - applies an extra urllib.quote_plus when the URLs are being built, to account for quirks when browsing from disk on Windows
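
A rough sketch of what the three options describe, not the code from the commit. The function names are hypothetical, and the sketch uses Python 3's `urllib.parse.quote_plus` where the commit message refers to Python 2's `urllib.quote_plus`. The reading of `--fix-for-disk` as double escaping is an assumption based on the description above: a browser decodes a link once, so a link to a file whose name already contains `%2F` would need `%252F`.

```python
from urllib.parse import quote_plus


def normalize_tags(tags):
    """Lower-case the tags and drop duplicates (as in --normalize-tags)."""
    return sorted({tag.lower() for tag in tags})


def escape_tag(tag):
    """Escape special characters such as "/" (as in --escape-tags)."""
    return quote_plus(tag)


def tag_href(tag, fix_for_disk=False):
    """Build the link target for a tag page.

    With fix_for_disk=True the escaped name is escaped a second time;
    this is an assumed interpretation of --fix-for-disk.
    """
    escaped = quote_plus(tag)
    if fix_for_disk:
        escaped = quote_plus(escaped)
    return escaped


unique = normalize_tags(["Todd Hido", "todd hido", "art/photo"])
names = [escape_tag(tag) for tag in unique]
hrefs = [tag_href(tag, fix_for_disk=True) for tag in unique]
```

Here `unique` holds two tags, `art/photo` and `todd hido`; the escaped names are `art%2Fphoto` and `todd+hido`, and the double-escaped link targets are `art%252Fphoto` and `todd%2Bhido`.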
thisismycontributionaccount added a commit to thisismycontributionaccount/tumblr-utils that referenced this issue Dec 19, 2018
I updated the documentation with the three new options I added for issue bbolli#159