Refactor to align with rprojroot and here #20

Merged
merged 3 commits into from
Mar 29, 2021
60 changes: 47 additions & 13 deletions README.md
@@ -1,6 +1,12 @@
# Find relative paths from a project root directory
# Project-oriented workflow in Python

Finding project directories in Python (data science) projects, just like there R [`here`][here] and [`rprojroot`][rprojroot] packages.
Finding project directories in Python (data science) projects.

This library aims to provide both
the programmatic functionality from the R [`rprojroot`][rprojroot] package
and the interactive functionality from the R [`here`][here] package.

## Motivation

**Problem**: I have a project that has a specific folder structure,
for example, one mentioned in [Noble 2009][noble2009] or something similar to [this project template][project-template],
@@ -11,60 +17,86 @@ and I want to be able to:
3. Reference datasets from a root directory when using a jupyter notebook because every time I use a jupyter notebook,
the working directory changes to the location of the notebook, not where I launched the notebook server.

**Solution**: `pyprojroot` finds the root working directory for your project as a `pathlib` object.
**Solution**: `pyprojroot` finds the root working directory for your project as a `pathlib.Path` object.
You can now use the `here` function to pass in a relative path from the project root directory
(no matter what working directory you are in the project),
and you will get a full path to the specified file.
That is, in a jupyter notebook,
you can write something like `pandas.read_csv(here('./data/my_data.csv'))`
you can write something like `pandas.read_csv(here('data/my_data.csv'))`
instead of `pandas.read_csv('../data/my_data.csv')`.
This allows you to restructure the files in your project without having to worry about changing file paths.

Great for reading and writing datasets!

Further reading:

* [Project-oriented workflows](https://www.tidyverse.org/articles/2017/12/workflow-vs-script/)
* [Stop the working directory insanity](https://gist.github.com/jennybc/362f52446fe1ebc4c49f)
* [Ode to the here package](https://github.com/jennybc/here_here)

## Installation

### pip

```bash
pip install pyprojroot
python -m pip install pyprojroot
```

### conda

https://anaconda.org/conda-forge/pyprojroot

```bash
conda install -c conda-forge pyprojroot
conda install -c conda-forge pyprojroot
```

## Usage
## Example Usage

### Interactive

This is based on the R [`here`][here] library.

```python
from pyprojroot import here
from pyprojroot.here import here

here()
```

### Example
### Programmatic

This is based on the R [`rprojroot`][rprojroot] library.

```python
import pyprojroot

base_path = pyprojroot.find_root(pyprojroot.has_dir(".git"))
```

## Demonstration

Load the packages

```
In [1]: from pyprojroot import here
In [1]: from pyprojroot.here import here
In [2]: import pandas as pd
```

The current working directory is the "notebooks" folder

```
In [3]: !pwd
/home/dchen/git/hub/scipy-2019-pandas/notebooks
```

In the notebooks folder, I have all my notebooks

```
In [4]: !ls
01-intro.ipynb 02-tidy.ipynb 03-apply.ipynb 04-plots.ipynb 05-model.ipynb Untitled.ipynb
```

If I wanted to access data in my notebooks I'd have to use `../data`

```
In [5]: !ls ../data
billboard.csv country_timeseries.csv gapminder.tsv pew.csv table1.csv table2.csv table3.csv table4a.csv table4b.csv weather.csv
@@ -73,8 +105,9 @@ billboard.csv country_timeseries.csv gapminder.tsv pew.csv table1.csv table
However, with the `here` function, I can access my data all from the project root.
This means if I move the notebook to another folder or subfolder I don't have to change the path to my data.
Only if I move the data to another folder would I need to change the path in my notebook (or script).

```
In [6]: pd.read_csv(here('./data/gapminder.tsv'), sep='\t').head()
In [6]: pd.read_csv(here('data/gapminder.tsv'), sep='\t').head()
Out[6]:
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
@@ -84,9 +117,10 @@ Out[6]:
4 Afghanistan Asia 1972 36.088 13079460 739.981106
```

By the way, you get a `pathlib` object path back!
By the way, you get a `pathlib.Path` object path back!

```
In [7]: here('./data/gapminder.tsv')
In [7]: here('data/gapminder.tsv')
Out[7]: PosixPath('/home/dchen/git/hub/scipy-2019-pandas/data/gapminder.tsv')
```

7 changes: 3 additions & 4 deletions pyprojroot/__init__.py
@@ -1,4 +1,3 @@
from .pyprojroot import here, py_project_root # noqa:F401

__all__ = ["here", "py_project_root"]
__version__ = "0.2.0"
from .criterion import *
from .root import find_root, find_root_with_reason
from .here import here
81 changes: 81 additions & 0 deletions pyprojroot/criterion.py
@@ -0,0 +1,81 @@
"""
This module is inspired by the `rprojroot` library for R.
See https://github.com/r-lib/rprojroot.

It is intended for interactive or programmatic use only.
"""

import pathlib as _pathlib
import typing
from os import PathLike as _PathLike

# TODO: It would be nice to have a class that encapsulates these checks,
# so that we can implement methods like |, !, &, ^ operators

# TODO: Refactor in a way that allows creation of reasons


def as_root_criterion(criterion) -> typing.Callable:
    if callable(criterion):
        return criterion

    # criterion must be a Collection, rather than just Iterable
    if isinstance(criterion, _PathLike):
        criterion = [criterion]
    criterion = list(criterion)

    def f(path: _pathlib.Path) -> bool:
        for c in criterion:
            if isinstance(c, _PathLike):
                if (path / c).exists():
                    return True
            else:
                if c(path):
                    return True
        return False

    return f


def has_file(file: _PathLike) -> typing.Callable:
    """
    Check that specified file exists in path.

    Note that a directory with that name will not match.
    """

    def f(path: _pathlib.Path) -> bool:
        return (path / file).is_file()

    return f


def has_dir(file: _PathLike) -> typing.Callable:
    """
    Check that specified directory exists.

    Note that a regular file with that name will not match.
    """

    def f(path: _pathlib.Path) -> bool:
        return (path / file).is_dir()

    return f


def matches_glob(pat: str) -> typing.Callable:
    """
    Check that glob has at least one match.
    """

    def f(path: _pathlib.Path) -> bool:
        matches = path.glob(pat)
        try:
            # Only need to get one item from generator
            next(matches)
        except StopIteration:
            return False
        else:
            return True

    return f
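
For reviewers who want to see the closure pattern in isolation, here is a minimal, self-contained sketch. It does not import `pyprojroot`; `has_file` and `has_dir` are re-declared locally to mirror the functions in this diff, and `any_of` is a hypothetical helper (not part of this PR) that approximates the "match if any criterion matches" behaviour of `as_root_criterion`.

```python
import pathlib
import tempfile
import typing

Criterion = typing.Callable[[pathlib.Path], bool]


def has_file(name: str) -> Criterion:
    """Local copy of the idea in this diff: match a regular file only."""
    def check(path: pathlib.Path) -> bool:
        return (path / name).is_file()
    return check


def has_dir(name: str) -> Criterion:
    """Local copy of the idea in this diff: match a directory only."""
    def check(path: pathlib.Path) -> bool:
        return (path / name).is_dir()
    return check


def any_of(*criteria: Criterion) -> Criterion:
    """Hypothetical helper: match when at least one criterion matches."""
    def check(path: pathlib.Path) -> bool:
        return any(c(path) for c in criteria)
    return check


with tempfile.TemporaryDirectory() as tmp:
    project = pathlib.Path(tmp)
    (project / ".git").mkdir()      # a directory named .git
    (project / "setup.py").touch()  # a regular file

    git_as_dir = has_dir(".git")(project)
    git_as_file = has_file(".git")(project)
    either = any_of(has_file(".here"), has_dir(".git"))(project)
```

Each factory returns a plain function of one `Path`, so criteria can be stored in a list (as `CRITERIA` does in `here.py`) and evaluated lazily against each candidate directory.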
55 changes: 55 additions & 0 deletions pyprojroot/here.py
@@ -0,0 +1,55 @@
"""
This module is inspired by the `here` library for R.
See https://github.com/r-lib/here.

It is intended for interactive use only.
"""

import pathlib as _pathlib
import warnings as _warnings
from os import PathLike as _PathLike

from . import criterion
from .root import find_root, find_root_with_reason

CRITERIA = [
    criterion.has_file(".here"),
    criterion.has_dir(".git"),
    criterion.matches_glob("*.Rproj"),
    criterion.has_file("requirements.txt"),
    criterion.has_file("setup.py"),
    criterion.has_dir(".dvc"),
    criterion.has_dir(".spyproject"),
    criterion.has_file("pyproject.toml"),
    criterion.has_dir(".idea"),
    criterion.has_dir(".vscode"),
]


def get_here():
    # TODO: This should only find_root once per session
    start = _pathlib.Path.cwd()
    path, reason = find_root_with_reason(CRITERIA, start=start)
    return path, reason


# TODO: Implement set_here


def here(relative_project_path: _PathLike = "", warn_missing=False) -> _pathlib.Path:
    """
    Return the path relative to the project's root directory.
    :param relative_project_path: relative path from project root
    :param warn_missing: warn user if path does not exist (default=False)
    :return: pathlib path
    """
    path, reason = get_here()
    # TODO: Show reason when requested

    if relative_project_path:
        path = path / relative_project_path

    if warn_missing and not path.exists():
        _warnings.warn(f"Path doesn't exist: {path!s}")
    return path
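
The path-joining and `warn_missing` behaviour of `here` can be exercised without a real project. The sketch below is an illustration only: `here_sketch` is a hypothetical stand-in that takes the root as an explicit argument instead of calling `get_here()`, so it can run against a temporary directory.

```python
import pathlib
import tempfile
import warnings


def here_sketch(root: pathlib.Path, relative="", warn_missing=False) -> pathlib.Path:
    """Hypothetical stand-in for `here`, with the root passed in explicitly."""
    path = root / relative if relative else root
    if warn_missing and not path.exists():
        warnings.warn(f"Path doesn't exist: {path!s}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "gapminder.tsv").touch()

    # Existing file: no warning is emitted.
    existing = here_sketch(root, "data/gapminder.tsv", warn_missing=True)
    existing_found = existing.is_file()

    # Missing file: the path is still returned, but a warning is recorded.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        missing = here_sketch(root, "data/absent.csv", warn_missing=True)
    warning_count = len(caught)
```

Note that a missing path only warns; the function still returns the joined `Path`, which is what makes it usable for files that are about to be written.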
52 changes: 0 additions & 52 deletions pyprojroot/pyprojroot.py

This file was deleted.

66 changes: 66 additions & 0 deletions pyprojroot/root.py
@@ -0,0 +1,66 @@
"""
This module is inspired by the `rprojroot` library for R.
See https://github.com/r-lib/rprojroot.

It is intended for interactive or programmatic use only.
"""

import pathlib as _pathlib
import typing as _typing
from os import PathLike as _PathLike

from .criterion import as_root_criterion as _as_root_criterion


def as_start_path(start: _PathLike) -> _pathlib.Path:
    if start is None:
        return _pathlib.Path.cwd()
    if not isinstance(start, _pathlib.Path):
        start = _pathlib.Path(start)
    # TODO: consider `start = start.resolve()`
    return start


def find_root_with_reason(
    criterion, start: _PathLike = None
) -> _typing.Tuple[_pathlib.Path, str]:
    """
    Find directory matching root criterion with reason.

    Recursively search parents of start path for directory
    matching root criterion with reason.
    """
    # TODO: Implement reasons

    # Prepare inputs
    criterion = _as_root_criterion(criterion)
    start = as_start_path(start)

    # Check start
    if start.is_dir() and criterion(start):
        return start, "Pass"

    # Iterate over all parents
    # TODO: Consider adding maximum depth
    # TODO: Consider limiting depth to path (e.g. "if p == stop: raise")
    for p in start.parents:
        if criterion(p):
            return p, "Pass"

    # Not found
    raise RuntimeError("Project root not found.")


def find_root(criterion, start: _PathLike = None, **kwargs) -> _pathlib.Path:
    """
    Find directory matching root criterion.

    Recursively search parents of start path for directory
    matching root criterion.
    """
    try:
        root, _ = find_root_with_reason(criterion, start=start, **kwargs)
    except RuntimeError as ex:
        raise ex
    else:
        return root
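
The upward search in `find_root_with_reason` can be sketched in a few lines. This is a simplified illustration and not the code in this PR: `find_root_sketch` is a hypothetical name, it takes a single callable criterion, and it applies the `is_dir()` check to every candidate rather than only to the start path.

```python
import pathlib
import tempfile


def find_root_sketch(criterion, start: pathlib.Path) -> pathlib.Path:
    """Hypothetical, simplified version of the upward search in this diff."""
    # Check the start directory first, then each parent, nearest first.
    for candidate in (start, *start.parents):
        if candidate.is_dir() and criterion(candidate):
            return candidate
    raise RuntimeError("Project root not found.")


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp).resolve()
    (root / ".here").touch()  # marker file at the project root
    nested = root / "notebooks" / "drafts"
    nested.mkdir(parents=True)

    # Search starts two levels below the marker and walks upward.
    found = find_root_sketch(lambda p: (p / ".here").is_file(), nested)
    found_matches = found == root
```

Because `Path.parents` is ordered from the nearest ancestor outward, the first match is the closest enclosing directory that satisfies the criterion.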