Feather files #390

Merged
merged 34 commits into from
Jan 8, 2025
34 commits
9a1a9c2
add feather files and remove pint/libstempo
AaronDJohnson Aug 4, 2024
e996aef
remove to_pickle test, add to_feather test
AaronDJohnson Aug 4, 2024
ef37702
Merge branch 'dev' into feather_files
AaronDJohnson Aug 4, 2024
d047b59
reformat with black
AaronDJohnson Aug 5, 2024
a573638
reformat with OLDER VERSION of black
AaronDJohnson Aug 5, 2024
1ab8ad9
Merge pull request #386 from nanograv/dev
AaronDJohnson Aug 26, 2024
22706d7
Merge remote-tracking branch 'upstream/master' into feather_files
AaronDJohnson Aug 27, 2024
6171575
add feather data files for tests
AaronDJohnson Sep 13, 2024
6174a80
default to feather in tests, skip t2 and pint
AaronDJohnson Oct 16, 2024
f2e7065
lint
AaronDJohnson Oct 17, 2024
1847506
update tests with data, skip t2 pint tests
AaronDJohnson Oct 17, 2024
d83a97b
lint issue and remove libstempo install from CI
AaronDJohnson Oct 17, 2024
dba9c0e
uncomment some tests
AaronDJohnson Oct 17, 2024
a702e0e
UPDATE A SINGLE SPACE, NICE FIND LINTER
AaronDJohnson Oct 17, 2024
39bda59
only import if PINT is installed
AaronDJohnson Oct 17, 2024
6752e75
make 1909 reappear
AaronDJohnson Oct 17, 2024
8ae71e8
avoid testing wideband on feather files right now
AaronDJohnson Oct 17, 2024
ff8ba50
loosen tolerance by 1 OoM
AaronDJohnson Oct 17, 2024
0c8e867
remove codecov for tempo2/pint segments of Pulsar
AaronDJohnson Oct 17, 2024
fbf1bad
does this change codecov?
AaronDJohnson Oct 17, 2024
d95da54
fix spacing
AaronDJohnson Oct 17, 2024
5fbe2ce
lower required codecov to 70%
AaronDJohnson Oct 17, 2024
a9bb764
one more time...
AaronDJohnson Oct 17, 2024
6006e18
temporarily loosen codecov threshold
AaronDJohnson Oct 17, 2024
d0a0937
allow some metadata to not be set
AaronDJohnson Jan 2, 2025
0bf243a
add flags as U-type ndarrays
AaronDJohnson Jan 2, 2025
af2d014
Merge branch 'dev' into feather_files
AaronDJohnson Jan 2, 2025
8d06ebf
lint files
AaronDJohnson Jan 2, 2025
fd55e38
more linting
AaronDJohnson Jan 2, 2025
45d480a
try with an older version of black
AaronDJohnson Jan 2, 2025
8eaf4af
lint
AaronDJohnson Jan 2, 2025
c3e3ff3
lint again
AaronDJohnson Jan 2, 2025
02be683
temporarily increase % on codecov patch
AaronDJohnson Jan 2, 2025
6fa7e04
add usage example in usage.ipynb
AaronDJohnson Jan 4, 2025
4 changes: 4 additions & 0 deletions .coveragerc
Expand Up @@ -6,6 +6,10 @@ omit =
plugins =
coverage_conditional_plugin

[report]
exclude_also =
no cover: start(?s:.)*?no cover: stop

[coverage_conditional_plugin]
rules =
"sys_version_info >= (3, 8)": py-gte-38
Expand Down
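The `exclude_also` entry added to `.coveragerc` is a regular expression meant to exclude everything from a `no cover: start` comment up to the next `no cover: stop` comment. The sketch below only shows what that pattern matches when applied with Python's `re` module; it does not run coverage.py. The sample source text and the `find_excluded_blocks` helper are made up for illustration, and multi-line exclusion patterns like this one presumably need a sufficiently recent coverage.py release.

```python
import re

# Pattern copied from the .coveragerc hunk. (?s:.) lets "." match newlines
# inside that group only, and *? stops the match at the first "stop" marker.
PATTERN = re.compile(r"no cover: start(?s:.)*?no cover: stop")

# Hypothetical source text, used only for this illustration.
SOURCE = """\
x = 1
# no cover: start
y = 2
z = 3
# no cover: stop
w = 4
# no cover: start
v = 5
# no cover: stop
"""


def find_excluded_blocks(text):
    """Return each matched region as the list of lines it spans."""
    return [match.group(0).splitlines() for match in PATTERN.finditer(text)]


blocks = find_excluded_blocks(SOURCE)
for block in blocks:
    print(block)
```

Because the quantifier is lazy, the two marked regions are matched separately and the `w = 4` line between them stays outside both.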
4 changes: 2 additions & 2 deletions .github/.codecov.yml
Expand Up @@ -8,12 +8,12 @@ coverage:
status:
project:
default:
target: 85%
target: 70%
threshold: 6%
patch:
default:
target: auto
threshold: 6%
threshold: 25%

parsers:
gcov:
Expand Down
5 changes: 1 addition & 4 deletions .github/workflows/ci_test.yml
Expand Up @@ -16,7 +16,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-13]
os: [ubuntu-latest, macos-latest]
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']

steps:
Expand All @@ -31,12 +31,10 @@ jobs:
run: |
brew unlink gcc && brew link gcc
brew install automake suite-sparse
curl -sSL https://raw.githubusercontent.com/vallis/libstempo/master/install_tempo2.sh | sh
- name: Install non-python dependencies on linux
if: runner.os == 'Linux'
run: |
sudo apt-get install libsuitesparse-dev
curl -sSL https://raw.githubusercontent.com/vallis/libstempo/master/install_tempo2.sh | sh
- name: Install dependencies and package
env:
SUITESPARSE_INCLUDE_DIR: "/usr/local/opt/suite-sparse/include/suitesparse/"
Expand Down Expand Up @@ -76,7 +74,6 @@ jobs:
- name: Install non-python dependencies on linux
run: |
sudo apt-get install libsuitesparse-dev
curl -sSL https://raw.githubusercontent.com/vallis/libstempo/master/install_tempo2.sh | sh
- name: Build
run: |
python -m pip install --upgrade pip setuptools wheel
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Expand Up @@ -64,7 +64,7 @@ clean-test: ## remove test and coverage artifacts
rm -fr htmlcov/
rm -rf coverage.xml

COV_COVERAGE_PERCENT ?= 85
COV_COVERAGE_PERCENT ?= 70
test: lint ## run tests quickly with the default Python
pytest -v --durations=10 --full-trace --cov-report html --cov-report xml \
--cov-config .coveragerc --cov-fail-under=$(COV_COVERAGE_PERCENT) \
Expand Down
70 changes: 70 additions & 0 deletions docs/_static/notebooks/usage.ipynb
Expand Up @@ -71,6 +71,76 @@
"psr = Pulsar(parfiles, timfiles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `feather` files are now supported\n",
"\n",
"`enterprise` now supports storing `Pulsar` objects in `feather` files. These files are compressed, which makes them useful for saving and loading large pulsar datasets. Saving a `Pulsar` object this way requires the `pyarrow` package and either `libstempo` or `PINT`, because the object must first be created from `par` and `tim` files. Once the `feather` file exists, the `Pulsar` object can be loaded without `libstempo` or `PINT`.\n",
"\n",
"A `feather` file can also store a dictionary of noise parameters for its pulsar, for use in `enterprise` models. Below, we show how to save and load a `Pulsar` object with a noise dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"psr_name = 'J1909-3744'\n",
"\n",
"# Here is the noise dictionary for this pulsar\n",
"params = {'J1909-3744_Rcvr_800_GASP_efac': 0.985523,\n",
" 'J1909-3744_Rcvr1_2_GUPPI_efac': 1.03462,\n",
" 'J1909-3744_Rcvr1_2_GASP_efac': 0.986438,\n",
" 'J1909-3744_Rcvr_800_GUPPI_efac': 1.05208,\n",
" 'J1909-3744_Rcvr1_2_GASP_log10_ecorr': -8.00662,\n",
" 'J1909-3744_Rcvr1_2_GUPPI_log10_ecorr': -7.13828,\n",
" 'J1909-3744_Rcvr_800_GASP_log10_ecorr': -7.86032,\n",
" 'J1909-3744_Rcvr_800_GUPPI_log10_ecorr': -7.14764,\n",
" 'J1909-3744_Rcvr_800_GASP_log10_equad': -6.6358,\n",
" 'J1909-3744_Rcvr1_2_GUPPI_log10_equad': -8.31285,\n",
" 'J1909-3744_Rcvr1_2_GASP_log10_equad': -7.97229,\n",
" 'J1909-3744_Rcvr_800_GUPPI_log10_equad': -7.43842,\n",
" 'J1909-3744_log10_A': -15.1073,\n",
" 'J1909-3744_gamma': 2.88933}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save to feather file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"psr = Pulsar(datadir + f\"/{psr_name}_NANOGrav_9yv1.gls.par\", datadir + f\"/{psr_name}_NANOGrav_9yv1.tim\")\n",
"\n",
"psr.to_feather(datadir + f\"/{psr_name}_NANOGrav_9yv1.t2.feather\", noisedict=params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from feather file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"psr = Pulsar(datadir + f\"/{psr_name}_NANOGrav_9yv1.t2.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {
Expand Down
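According to the `save_feather` method in this PR's `enterprise/pulsar.py` diff, only the noise-dictionary entries whose key starts with the pulsar name are kept, and they are stored as JSON in the table metadata. The sketch below reproduces just that filtering and JSON round trip with the standard library; it does not touch `pyarrow`. The `build_metadata` helper and the `B1855+09` entry are hypothetical and exist only for this illustration.

```python
import json


def build_metadata(name, noisedict=None):
    """Hypothetical helper: keep only noise entries keyed by this pulsar's name."""
    meta = {"name": name}
    if noisedict:
        meta["noisedict"] = {par: val for par, val in noisedict.items() if par.startswith(name)}
    return meta


# A dictionary covering two pulsars. The J1909-3744 values come from the
# notebook cell; the B1855+09 value is a placeholder, not a real measurement.
noise = {
    "J1909-3744_log10_A": -15.1073,
    "J1909-3744_gamma": 2.88933,
    "B1855+09_gamma": 1.0,
}

encoded = json.dumps(build_metadata("J1909-3744", noise))  # what would be written
decoded = json.loads(encoded)  # what a reader would get back
print(decoded["noisedict"])
```

One consequence of the prefix filter is that a single array-wide noise dictionary can be passed when saving each pulsar, as long as every key begins with the pulsar name.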
133 changes: 127 additions & 6 deletions enterprise/pulsar.py
Expand Up @@ -7,6 +7,9 @@
import logging
import os
import pickle

from pyarrow import feather
from pyarrow import Table
from io import StringIO

import numpy as np
Expand All @@ -23,7 +26,9 @@
try:
import libstempo as t2
except ImportError:
logger.warning("libstempo not installed. Will use PINT instead.") # pragma: no cover
logger.warning(
"libstempo not installed. PINT or libstempo are required to use par and tim files."
) # pragma: no cover
t2 = None

try:
Expand All @@ -32,7 +37,7 @@
from pint.residuals import Residuals as resids
from pint.toa import TOAs
except ImportError:
logger.warning("PINT not installed. Will use libstempo instead.") # pragma: no cover
logger.warning("PINT not installed. PINT or libstempo are required to use par and tim files.") # pragma: no cover
pint = None

try:
Expand All @@ -42,10 +47,6 @@
const = None
u = None

if pint is None and t2 is None:
err_msg = "Must have either PINT or libstempo timing package installed"
raise ImportError(err_msg)


def get_maxobs(timfile):
"""Utility function to return number of lines in tim file.
Expand Down Expand Up @@ -161,6 +162,9 @@

self.sort_data()

def to_feather(self, filename, noisedict=None):
FeatherPulsar.save_feather(self, filename, noisedict=noisedict)

def drop_not_picklable(self):
"""Drop all attributes that cannot be pickled.

Expand Down Expand Up @@ -421,6 +425,8 @@

if dmx:
self._dmx = dmx
else:
self._dmx = None

def _get_radec(self, model):
if hasattr(model, "RAJ") and hasattr(model, "DECJ"):
Expand Down Expand Up @@ -565,6 +571,8 @@

if dmx:
self._dmx = dmx
else:
self._dmx = None

def _get_radec(self, t2pulsar):
if "RAJ" in np.concatenate((t2pulsar.pars(which="fit"), t2pulsar.pars(which="set"))):
Expand Down Expand Up @@ -655,7 +663,120 @@
psr._deflated = "destroyed"


class FeatherPulsar:
    columns = ["toas", "stoas", "toaerrs", "residuals", "freqs", "backend_flags", "telescope"]
    vector_columns = ["Mmat", "sunssb", "pos_t"]
    tensor_columns = ["planetssb"]
    # flags are done separately
    metadata = ["name", "dm", "dmx", "pdist", "pos", "phi", "theta"]
    # notes: currently ignores _isort/_iisort and gets sorted versions

    def __init__(self):
        pass

    def __str__(self):
        return f"<Pulsar {self.name}: {len(self.residuals)} res, {self.Mmat.shape[1]} pars>"

    def __repr__(self):
        return str(self)

    def sort_data(self):
        """Sort data by time. This function is defined so that tests will pass."""
        self._isort = np.argsort(self.toas, kind="mergesort")
        self._iisort = np.zeros(len(self._isort), dtype=int)
        for ii, p in enumerate(self._isort):
            self._iisort[p] = ii

    @classmethod
    def read_feather(cls, filename):
        f = feather.read_table(filename)
        self = FeatherPulsar()

        for array in FeatherPulsar.columns:
            if array in f.column_names:
                setattr(self, array, f[array].to_numpy())

        for array in FeatherPulsar.vector_columns:
            cols = [c for c in f.column_names if c.startswith(array)]
            setattr(self, array, np.array([f[col].to_numpy() for col in cols]).swapaxes(0, 1).copy())

        for array in FeatherPulsar.tensor_columns:
            rows = sorted(set(["_".join(c.split("_")[:-1]) for c in f.column_names if c.startswith(array)]))
            cols = [[c for c in f.column_names if c.startswith(row)] for row in rows]
            setattr(
                self,
                array,
                np.array([[f[col].to_numpy() for col in row] for row in cols]).swapaxes(0, 2).swapaxes(1, 2).copy(),
            )

        self.flags = {}
        for array in [c for c in f.column_names if c.startswith("flags_")]:
            self.flags["_".join(array.split("_")[1:])] = f[array].to_numpy().astype("U")

        meta = json.loads(f.schema.metadata[b"json"])
        for attr in FeatherPulsar.metadata:
            if attr in meta:
                setattr(self, attr, meta[attr])
            else:
                print(f"Pulsar.read_feather: cannot find {attr} in feather file {filename}.")

        if "noisedict" in meta:
            setattr(self, "noisedict", meta["noisedict"])

        self.sort_data()

        return self

    def to_list(a):
        return a.tolist() if isinstance(a, np.ndarray) else a

    def save_feather(self, filename, noisedict=None):
        self._toas = self._toas.astype(float)
        pydict = {array: getattr(self, array) for array in FeatherPulsar.columns}

        pydict.update(
            {
                f"{array}_{i}": getattr(self, array)[:, i]
                for array in FeatherPulsar.vector_columns
                for i in range(getattr(self, array).shape[1])
            }
        )

        pydict.update(
            {
                f"{array}_{i}_{j}": getattr(self, array)[:, i, j]
                for array in FeatherPulsar.tensor_columns
                for i in range(getattr(self, array).shape[1])
                for j in range(getattr(self, array).shape[2])
            }
        )

        pydict.update({f"flags_{flag}": self.flags[flag] for flag in self.flags})

        meta = {}
        # Pulsar is a factory function in this module, so these class attributes live on FeatherPulsar
        for attr in FeatherPulsar.metadata:
            if hasattr(self, attr):
                meta[attr] = FeatherPulsar.to_list(getattr(self, attr))
            else:
                print(f"Pulsar.save_feather: cannot find {attr} in Pulsar {self.name}.")

        # use attribute if present
        noisedict = getattr(self, "noisedict", None) if noisedict is None else noisedict
        if noisedict:
            # only keep noisedict entries that are for this pulsar (requires pulsar name to be first part of the key!)
            meta["noisedict"] = {par: val for par, val in noisedict.items() if par.startswith(self.name)}

        feather.write_feather(Table.from_pydict(pydict, metadata={"json": json.dumps(meta)}), filename)


def Pulsar(*args, **kwargs):
    featherfile = [x for x in args if isinstance(x, str) and x.endswith(".feather")]
    if featherfile:
        return FeatherPulsar.read_feather(featherfile[0])
    featherfile = kwargs.get("filepath", None)
    if featherfile:
        return FeatherPulsar.read_feather(featherfile)

    ephem = kwargs.get("ephem", None)
    clk = kwargs.get("clk", None)
    bipm_version = kwargs.get("bipm_version", None)
Expand Down
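The `Pulsar` factory in this diff decides whether to take the feather path before it looks at any par/tim arguments. The sketch below isolates that decision so it can be tried without `enterprise` or `pyarrow`; `pick_feather_path` is a hypothetical helper that mirrors the checks in the diff and returns the path instead of loading it.

```python
def pick_feather_path(*args, **kwargs):
    """Hypothetical helper: return the path the factory would hand to the
    feather reader, or None if it would fall through to the par/tim branch."""
    # First choice: any positional string argument ending in ".feather".
    candidates = [x for x in args if isinstance(x, str) and x.endswith(".feather")]
    if candidates:
        return candidates[0]
    # Second choice: a truthy "filepath" keyword argument.
    filepath = kwargs.get("filepath", None)
    if filepath:
        return filepath
    return None


print(pick_feather_path("J1909-3744.t2.feather"))
print(pick_feather_path("J1909-3744.par", "J1909-3744.tim"))
print(pick_feather_path(filepath="J1909-3744.t2.feather"))
```

Read this way, the positional check depends on the `.feather` suffix, while the `filepath` keyword is passed to the feather reader whatever its extension is, so a non-feather file given through `filepath` would presumably fail inside `read_feather` rather than fall back to another loader.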
1 change: 0 additions & 1 deletion enterprise/signals/white_signals.py
Expand Up @@ -81,7 +81,6 @@ def MeasurementNoise(
selection=Selection(selections.no_selection),
name="",
):

"""Class factory for EFAC+EQUAD measurement noise
(with tempo/tempo2/pint parameter convention, variance = efac^2 (toaerr^2 + t2equad^2)).
Leave out log10_t2equad to use EFAC noise only."""
Expand Down
3 changes: 1 addition & 2 deletions requirements.txt
Expand Up @@ -3,5 +3,4 @@ scipy>=1.2.0
ephem>=3.7.6.0
healpy>=1.14.0
scikit-sparse>=0.4.5
pint-pulsar>=0.8.3
libstempo>=2.4.4
pyarrow>=17.0.0
3 changes: 1 addition & 2 deletions setup.py
Expand Up @@ -12,8 +12,7 @@
"ephem>=3.7.6.0",
"healpy>=1.14.0",
"scikit-sparse>=0.4.5",
"pint-pulsar>=0.8.3",
"libstempo>=2.4.4",
"pyarrow>=17.0.0",
]

test_requirements = []
Expand Down
Binary file added tests/data/1713.Sep.t2.feather
Binary file not shown.
Binary file added tests/data/B1855+09_NANOGrav_9yv1.t2.feather
Binary file not shown.
Binary file added tests/data/B1937+21_NANOGrav_9yv1.t2.feather
Binary file not shown.
Binary file added tests/data/J1909-3744_NANOGrav_9yv1.t2.feather
Binary file not shown.
19 changes: 19 additions & 0 deletions tests/enterprise_test_data.py
Expand Up @@ -9,6 +9,25 @@

import os

# Are we on GitHub Actions?
ON_GITHUB = os.getenv("GITHUB_ACTIONS")

# Is libstempo installed?
try:
    import libstempo  # noqa

    LIBSTEMPO_INSTALLED = True
except ImportError:
    LIBSTEMPO_INSTALLED = False

# Is PINT installed?
try:
    import pint  # noqa

    PINT_INSTALLED = True
except ImportError:
    PINT_INSTALLED = False

# Location of this file and the test data scripts
testdir = os.path.dirname(os.path.abspath(__file__))
datadir = os.path.join(testdir, "data")
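The commit messages mention skipping the tempo2 and PINT tests, and flags like `LIBSTEMPO_INSTALLED` are the usual way to gate such tests. The sketch below shows one way a flag of this kind could be used, with the standard library's `unittest` rather than whatever the project's test suite actually does; the `ParTimTests` class and its test names are hypothetical.

```python
import unittest

# Same detection pattern as in tests/enterprise_test_data.py.
try:
    import libstempo  # noqa: F401

    LIBSTEMPO_INSTALLED = True
except ImportError:
    LIBSTEMPO_INSTALLED = False


class ParTimTests(unittest.TestCase):
    """Hypothetical test case: one test needs libstempo, one does not."""

    @unittest.skipUnless(LIBSTEMPO_INSTALLED, "libstempo is required to read par/tim files")
    def test_needs_libstempo(self):
        # Only runs when the optional dependency imported successfully.
        self.assertTrue(LIBSTEMPO_INSTALLED)

    def test_feather_only(self):
        # Stands in for a test that relies on feather files alone.
        self.assertIsInstance(LIBSTEMPO_INSTALLED, bool)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParTimTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("skipped:", len(result.skipped))
```

A skipped test is reported as skipped rather than failed, so the same suite passes both with and without the optional timing package installed.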