Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.0 creates permission error on .suffix_cache #209

Closed
jaz2038 opened this issue Oct 20, 2020 · 21 comments
Closed

3.0 creates permission error on .suffix_cache #209

jaz2038 opened this issue Oct 20, 2020 · 21 comments
Assignees

Comments

@jaz2038
Copy link

jaz2038 commented Oct 20, 2020

We had a python dependency on a package that had a dependency on tldextract > 2.0. Our build pipeline has been pulling in tldextract 2.3, but today it pulled in 3.0 and we started getting this exception in our environment:

File "/usr/local/lib/python3.7/site-packages/tldextract/cache.py", line 104, in run_and_cache
cache_filepath = self._key_to_cachefile_path(namespace, key_args)
File "/usr/local/lib/python3.7/site-packages/tldextract/cache.py", line 95, in _key_to_cachefile_path
_make_dir(cache_path)
File "/usr/local/lib/python3.7/site-packages/tldextract/cache.py", line 155, in _make_dir
os.makedirs(os.path.dirname(filename))
File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/usr/local/lib/python3.7/os.py", line 223, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.7/site-packages/tldextract/.suffix_cache'

@john-kurkowski
Copy link
Owner

john-kurkowski commented Oct 22, 2020

This file/folder used to be called /path/to/tldextract/.tld_set. Before 3.0, did you do something special to allow .tld_set? Can you do the same with the new file, .suffix_cache? This package always assumed the folder containing these cache files was writable.*

*To much confusion. See this comment and related issues.

@john-kurkowski
Copy link
Owner

I suppose the stacktrace could log the same helpful hint as here.

"unable to cache %s.%s in %s. This could refresh the "
"Public Suffix List over HTTP every app startup. "
"Construct your `TLDExtract` with a writable `cache_dir` or "
"set `cache_dir=False` to silence this warning. %s"

@jaz2038
Copy link
Author

jaz2038 commented Oct 22, 2020

never did anything special w/ .tld_set/etc... we don't have a direct dependency on tldextract so we never did anything around it. we get tldextract from this code: https://github.com/codelucas/newspaper which has this dependency setup:

  • tldextract [required: >=2.0.1, installed: 2.2.3]
    (looks like maybe more of a bug in newspaper? since it should say required: >=2.0.1,<3 but unfortunately this library is obsolete/hasn't build for a long time)

@jaz2038
Copy link
Author

jaz2038 commented Oct 22, 2020

(for now I just manually added tldextract==2.2.3 dependency into my requirements.txt to get around this - please feel free to close this bug unless you think there are any fixes necessary - thx.

@aleprovencio
Copy link

Hi, I'm having this issue on the qute-pass script which is bundled with qutebrowser and uses python-tldextract as a dependency. Downgrading solved it for me.

@bastbnl
Copy link

bastbnl commented Oct 23, 2020

Same issue here. And it's a bit troubling as I'm using this module in a Django project with celery. The Django processes are running in a different account than the celery project and I can't run the project right now.

Maybe a more decoupled approach would be a better fit. I understand why you want to cache the file but I don't understand why you want the refresh the content each time the module is imported. It somehow feels more natural to use the cached downloads when it's there.

So here's what I did to get it working in my setup:

from tldextract import TLDExtract
from tldextract.cache import DiskCache


class CacheBeforeLiveDiskCache(DiskCache, ):
    """
    This cache loads from the cache instead of just downloading it

    """
    def run_and_cache(self, func, namespace, kwargs, hashed_argnames):

        # Unable to open the lockfile, so it seems
        key_args = {k: v for k, v in kwargs.items() if k in hashed_argnames}
        cache_filepath = self._key_to_cachefile_path("urls", key_args)
        if isfile(cache_filepath, ):
            return self.get(namespace="urls", key=key_args)

        return super().run_and_cache(func, namespace, kwargs, hashed_argnames)


class CustomTLDExtract(TLDExtract, ):
    def __init__(self, *args, **kwargs):
        cache_dir = abspath(
            join(
                BASE_DIR,
                '..',
                'run/tldextract-data/cache',
            ),
        )
        super().__init__(
            cache_dir=cache_dir,
            suffix_list_urls=[
                'https://publicsuffix.org/list/public_suffix_list.dat',
            ],
        )
        self._cache = CacheBeforeLiveDiskCache(
            cache_dir,
        )


extract = CustomTLDExtract()

@john-kurkowski
Copy link
Owner

I understand why you want to cache the file but I don't understand why you want the refresh the content each time the module is imported.

Right, that is unintended. It's supposed to log the permission error then work with what it has. I guess it's an uncaught error, so subsequent re-imports re-raise?

@john-kurkowski john-kurkowski self-assigned this Oct 24, 2020
john-kurkowski added a commit that referenced this issue Oct 24, 2020
Fixes #209.

* Move makedir side-effect from string building helper function into try block
* Duplicate warning from writing a single cache file
* Log this at most once per app
@brycedrennan
Copy link
Collaborator

brycedrennan commented Oct 24, 2020

@bastbnl @aleprovencio @jaz2038 @falonso-caa @ivomac @Dophin2009 @vgrabovets Apologies for this issue caused by my recent changes.

I'd like to be able to recreate this error so I can be confident we have addressed it. Would you be willing to share any of the following details of your build environment?

  • Operating system
  • dockerfile
  • python version
  • how python was installed
  • whether you're using a virtual environment
  • what user the script is running under when this error occurs
  • anything else you think might be relevant

@croyleje
Copy link

Just thought I would leave a comment letting you know I am also having the same issue had to downgrade to version 2.2.3-1 to resolve the issue when I get back I will post the information you requested.
OS arch
PYTHON 3.8.6
Installed via pacman and tried installing via pip same issue
Not a virtual environment

I don't have the info with me but the original permission error for .suffix_cache did not exist that location in the error.

If there is anything else I can supply to help please let me know.
Jason

@mirryi
Copy link

mirryi commented Oct 24, 2020

@brycedrennan I reinstalled with pip install --user to avoid permission issues. My environment before was:

  • Arch Linux
  • Python 3.8, installed via pacman
  • tldextract installed via pacman (python-tldextract package)
  • No virtual environment
  • Called in the qute-bitwarden userscript included with qutebrowser

Let me know if there's anything I can do to help.

@brycedrennan
Copy link
Collaborator

brycedrennan commented Oct 25, 2020

Thanks for the info. I setup an arch linux instance and recreated the error. It's caused by installing python and tldextract using sudo pacman which installs everything as root.

I don't understand how but there is a .tld_set file there that has global write permissions set. Perhaps something about the pacman python-tldextract package was altered to overcome this issue.

@brycedrennan
Copy link
Collaborator

@john-kurkowski yes I think using something like appdirs could help out here. Will investigate more.

I also wonder if we should change the default behavior to not update the list - make it something that you can manually call. That's what they did on archlinux.

@mirryi
Copy link

mirryi commented Oct 25, 2020

Confirmed archlinux previously added a workaround to the package

That makes sense.

yes I think using something like appdirs could help out here. Will investigate more.

I second using something like appdirs to follow the XDG Base Directory specification for caching. It would probably be best to not rely on having access to the installation directory.

john-kurkowski added a commit that referenced this issue Oct 25, 2020
…211)

Stanches #209.

* Move makedir side-effect from string building helper function into try block
* Duplicate warning from writing a single cache file
* Log this at most once per app
* Remove unused vars
@john-kurkowski
Copy link
Owner

I released #211 as 3.0.2 for now, to stanch the errors. That should put the library on par with its 2.x series. Let's leave this open to track not relying on access to the installation directory.

@brycedrennan
Copy link
Collaborator

Sounds good. Unfortunately appdirs sounds like it's not being maintained anymore. ActiveState/appdirs#79

We could probably still use XDG_CACHE_HOME. Not sure how I feel about a shared cache across virtualenvs.

@bastbnl
Copy link

bastbnl commented Oct 25, 2020

@brycedrennan do you still need the python version, how it was installed, etc?

I was considering my options using the caching approach and I think it would increase the ways I can use the module. It would be great if the caching engine would be pluggable, so I can write an engine for the project I'm working on and just cache the tlds. I could create one that uses the Django cache with the benefit of only downloading the tlds once for every engine working with the same cache instance

@lifenautjoe
Copy link

(for now I just manually added tldextract==2.2.3 dependency into my requirements.txt to get around this - please feel free to close this bug unless you think there are any fixes necessary - thx.

Thanks ! Was going crazy about this one. Will lock it up until there's a fix.

Also happy to sponsor whomever is taking care of this project a couple coffees :-)

@brycedrennan
Copy link
Collaborator

@bastbnl if you're not on arch linux then yes it would be helpful to know what environment you're having difficulty

@john-kurkowski
Copy link
Owner

Will lock it up until there's a fix.

3.0.2 should catch the error and log a warning, instead of the uncaught error.

This was referenced Oct 25, 2020
@john-kurkowski
Copy link
Owner

3.0.2 should catch the error and log a warning, instead of the uncaught error.

That should put the library on par with its 2.x series. Let's leave this open to track not relying on access to the installation directory.

On 2nd thought, the OP's bug should be fixed. People can use 3.x again, as they did 2.x. We can close this. Let's track redesign discussions in #212.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants