Commit 218e140

Merge pull request #406 from astronomy-commons/sean/docs-reformat

Reformat Docs Structure, Index page, and Getting Started Guide

smcguire-cmu committed Aug 15, 2024
2 parents 72889fa + 1e3f561 commit 218e140

Showing 17 changed files with 320 additions and 297 deletions.
8 changes: 6 additions & 2 deletions docs/_static/custom.css
@@ -1,3 +1,7 @@
.jupyter-widgets {
  color: var(--pst-color-text-base) !important;
}

#lsdb h1 {
  text-align: center;
}

.skip-link {
  background-color: transparent !important;
}
Binary file added docs/_static/lincc_logo.png
Binary file added docs/_static/ztf_catalog_lazy.png
Binary file added docs/_static/ztf_x_gaia.png
Binary file added docs/_static/ztf_x_gaia_tree.png
7 changes: 7 additions & 0 deletions docs/conf.py
@@ -30,6 +30,7 @@
"sphinx.ext.viewcode",
"sphinx.ext.intersphinx",
"sphinx.ext.graphviz",
"sphinx_design",
]

extensions.append("autoapi.extension")
@@ -67,6 +68,12 @@
html_static_path = ["_static"]
html_css_files = ["custom.css"]

html_logo = "_static/lincc_logo.png"
html_title = "LSDB"
html_context = {"default_mode": "light"}

pygments_style = "sphinx"

# Cross-link hipscat documentation from the API reference:
# https://docs.readthedocs.io/en/stable/guides/intersphinx.html
intersphinx_mapping = {
31 changes: 31 additions & 0 deletions docs/developer/contributing.rst
@@ -1,6 +1,37 @@
Contributing to LSDB
===============================================================================

Installation from Source
-------------------------------------------------------------------------------

To install the latest development version of LSDB you will need to build it from source. First, with your virtual environment activated, clone the repository in your terminal:

.. code-block:: bash

    git clone https://github.com/astronomy-commons/lsdb
    cd lsdb/

To install the package and a minimum set of dependencies, run:

.. code-block:: bash

    python -m pip install .
    python -m pip install pytest # to validate package installation

Alternatively, you can execute the `setup_dev` script, which installs all the additional requirements to set up a development environment:

.. code-block:: bash

    chmod +x .setup_dev.sh
    ./.setup_dev.sh

Finally, to check that the package has been correctly installed, run the package unit tests:

.. code-block:: bash

    python -m pytest

Find (or make) a new GitHub issue
-------------------------------------------------------------------------------

201 changes: 201 additions & 0 deletions docs/getting-started.rst
@@ -0,0 +1,201 @@
Getting Started with LSDB
===========================

Installation
--------------------------

The latest release version of LSDB is available to install with `pip <https://pypi.org/project/lsdb/>`_ or `conda <https://anaconda.org/conda-forge/lsdb/>`_.

.. code-block:: bash

    python -m pip install lsdb

.. code-block:: bash

    conda install -c conda-forge lsdb

.. hint::

    We recommend using a virtual environment. Before installing the package, create and activate a fresh
    environment. Here are some examples with different tools:

    .. tab-set::

        .. tab-item:: Conda

            .. code-block:: bash

                conda create -n lsdb_env python=3.11
                conda activate lsdb_env

        .. tab-item:: venv

            .. code-block:: bash

                python -m venv ./lsdb_env
                source ./lsdb_env/bin/activate

        .. tab-item:: pyenv

            With the pyenv-virtualenv plug-in:

            .. code-block:: bash

                pyenv virtualenv 3.11 lsdb_env
                pyenv local lsdb_env

We recommend Python versions **>=3.9, <=3.12**.

LSDB can also be installed from source on `GitHub <https://github.com/astronomy-commons/lsdb>`_. See our
advanced installation instructions in the :doc:`contribution guide </developer/contributing>`.

Quickstart
--------------------------

LSDB is built on top of `Dask DataFrame <https://docs.dask.org/en/stable/dataframe.html>`_, which allows workflows
to run in parallel on distributed environments and scale to large, out-of-memory datasets. For this to work,
Catalogs are loaded **lazily**, meaning that only the metadata is loaded at first. This way, LSDB can plan
how tasks will be executed in the future without actually doing any computation. See our :doc:`tutorials </tutorials>`
for more information.
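As a toy illustration of this lazy-evaluation pattern, consider the following minimal Python sketch. The names here (``LazyTable``, ``where``, ``compute``) are hypothetical, chosen only to show the idea of recording a plan and executing it later; this is not the LSDB or Dask API.

```python
# Minimal sketch of lazy evaluation: operations are recorded as a plan
# and nothing runs until compute() is called. Hypothetical names only;
# this is not the LSDB or Dask API.

class LazyTable:
    def __init__(self, rows):
        self.rows = rows
        self.plan = []  # recorded filter steps, none executed yet

    def where(self, predicate):
        self.plan.append(predicate)  # record the step, do no work
        return self

    def compute(self):
        # Execute the whole recorded plan now and return concrete rows.
        result = self.rows
        for predicate in self.plan:
            result = [row for row in result if predicate(row)]
        return result

rows = [{"mean_mag_r": 17.2}, {"mean_mag_r": 18.9}, {"mean_mag_r": 16.5}]
query = LazyTable(rows).where(lambda r: r["mean_mag_r"] < 18)
# No filtering has happened yet; compute() triggers the work.
print(query.compute())  # [{'mean_mag_r': 17.2}, {'mean_mag_r': 16.5}]
```

Building the query is cheap because it only records intent; all of the actual work is deferred to the single ``compute()`` call, which is what lets a planner batch and parallelize it.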


Loading a Catalog
~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's start by loading a HiPSCat formatted Catalog into LSDB. Use the :func:`lsdb.read_hipscat` function to
lazy load a catalog object. We'll pass in the URL to load the Zwicky Transient Facility Data Release 14
Catalog, and specify which columns we want to use from it.

.. code-block:: python

    import lsdb

    ztf = lsdb.read_hipscat(
        'https://data.lsdb.io/unstable/ztf/ztf_dr14/',
        columns=["ra", "dec", "ps1_objid", "nobs_r", "mean_mag_r"],
    )

    >> ztf

.. image:: _static/ztf_catalog_lazy.png
    :align: center
    :alt: The lazy LSDB representation of ZTF DR14


Here we can see the lazy representation of an LSDB catalog object, showing its metadata such as the column
names and their types without loading any data (see the ellipses in the table as placeholders where you
would usually see values).

.. important::

We've specified 5 columns to load here. It's important for performance to select only the columns you need
for your workflow. Without specifying any columns, all possible columns will be loaded when
the workflow is executed, making everything much slower and using much more memory.
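The effect of column selection can be seen in a small pandas analogy (the CSV contents and column names below are made up for illustration): requesting a subset of columns at read time means the unneeded columns never enter memory.

```python
import io
import pandas as pd

# A toy "catalog" serialized as CSV (hypothetical columns, illustration only).
raw = io.StringIO(
    "ra,dec,mean_mag_r,heavy_column\n"
    "10.1,-5.0,17.2,aaaa\n"
    "10.2,-5.1,18.9,bbbb\n"
)

# Reading only the needed columns keeps 'heavy_column' out of memory,
# analogous to passing columns=[...] to lsdb.read_hipscat.
subset = pd.read_csv(raw, usecols=["ra", "dec", "mean_mag_r"])
print(list(subset.columns))  # ['ra', 'dec', 'mean_mag_r']
```

For catalogs with hundreds of columns and billions of rows, this projection is the difference between a workflow that fits in memory and one that does not.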


Where to get Catalogs
~~~~~~~~~~~~~~~~~~~~~~~~~~
LSDB can load any catalog in the HiPSCat format, locally or from remote sources. There are a number of
catalogs publicly available in the cloud. You can browse them, along with the URLs to load them in LSDB, at
our website `data.lsdb.io <https://data.lsdb.io>`_.


If you have your own data that is not in this format, you can import it by following the instructions in our
:doc:`importing catalogs tutorial section </tutorials/import_catalogs>`.



Performing Filters
~~~~~~~~~~~~~~~~~~~~~~~~~~

LSDB can perform spatial filters quickly, taking advantage of HiPSCat's spatial partitioning. These optimized
filters have their own methods, such as :func:`cone_search <lsdb.catalog.Catalog.cone_search>`. For the list
of these methods see the full docs for the :func:`Catalog <lsdb.catalog.Catalog>` class.

.. code-block:: python

    ztf_cone = ztf.cone_search(ra=40, dec=30, radius_arcsec=100)

Other filters on columns can be performed in the same way that you would on a pandas DataFrame.

.. code-block:: python

    ztf_filtered = ztf_cone[ztf_cone["mean_mag_r"] < 18]
    ztf_filtered = ztf_filtered.query("nobs_r > 50")

Cross Matching
~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we've filtered our catalog, let's try cross-matching! We'll need to load another catalog first. For a
catalog on the right side of a cross-match, we need to make sure that we load it with a ``margin_cache`` to
get accurate results. This should be provided with the catalog by the catalog's data provider. See the
:doc:`margins tutorial section </tutorials/margins>` for more.

.. code-block:: python

    gaia = lsdb.read_hipscat(
        'https://data.lsdb.io/unstable/gaia_dr3/gaia/',
        columns=["ra", "dec", "phot_g_n_obs", "phot_g_mean_flux", "pm"],
        margin_cache="https://data.lsdb.io/unstable/gaia_dr3/gaia_10arcs/",
    )

Once we've got our other catalog, we can crossmatch the two together!

.. code-block:: python

    ztf_x_gaia = ztf_filtered.crossmatch(gaia, n_neighbors=1, radius_arcsec=3)

Computing
~~~~~~~~~~~~~~~~~~~~~~~~~~

We've now planned the crossmatch lazily, but it still hasn't been actually performed. To load the data and run
the workflow we'll call the ``compute()`` method, which will perform all the tasks and return the result as a
pandas DataFrame with all the computed values.

.. code-block:: python

    result_df = ztf_x_gaia.compute()

    >> result_df

.. image:: _static/ztf_x_gaia.png
    :align: center
    :alt: The result of cross-matching our filtered ZTF and Gaia catalogs


Saving the Result
~~~~~~~~~~~~~~~~~~~~~~~~~~

For large results, it may not be possible to ``compute()``, since the full result won't fit into memory.
Instead, we can run the computation and save the results directly to disk in HiPSCat format.

.. code-block:: python

    ztf_x_gaia.to_hipscat("./ztf_x_gaia")

This creates the following HiPSCat catalog on disk:

.. code-block::

    ztf_x_gaia/
    ├── Norder=4/
    │   ├── Dir=0/
    │   │   ├── Npix=57.parquet
    │   │   └── ...
    │   └── ...
    ├── _metadata
    ├── _common_metadata
    ├── catalog_info.json
    ├── partition_info.csv
    └── provenance_info.json

Creation of Jupyter Kernel
--------------------------

You may want to work with LSDB in Jupyter notebooks, in which case you will need a kernel where the
package is installed. To install a kernel for your environment, type:

.. code-block:: bash

    python -m ipykernel install --user --name lsdb_env --display-name "lsdb_kernel"

It should now be available for selection in your Jupyter dashboard!
59 changes: 41 additions & 18 deletions docs/index.rst
@@ -2,16 +2,17 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
LSDB - Large Survey DataBase
LSDB
========================================================================================

LSDB is a framework that facilitates and enables fast spatial analysis for extremely large astronomical catalogs
(i.e. querying and crossmatching O(1B) sources). It aims to address large-scale data processing challenges, in
particular those brought up by `LSST <https://www.lsst.org/about>`_.
LSDB (Large Survey DataBase) is a Python framework that enables simple, fast spatial analysis of extremely
large astronomical catalogs (e.g. querying and crossmatching O(1B) sources). It aims to address large-scale
data processing challenges, in particular those brought up by `LSST <https://www.lsst.org/about>`_.

Built on top of Dask to efficiently scale and parallelize operations across multiple workers, it leverages
the `HiPSCat <https://hipscat.readthedocs.io/en/stable/>`_ data format for surveys in a partitioned HEALPix
(Hierarchical Equal Area isoLatitude Pixelization) structure.
Built on top of Dask to efficiently scale and parallelize operations across multiple distributed workers, it
uses the `HiPSCat <https://hipscat.readthedocs.io/en/stable/>`_ data format to efficiently perform spatial
operations.

.. figure:: _static/gaia.png
:class: no-scaled-link
@@ -21,34 +22,56 @@ the `HiPSCat <https://hipscat.readthedocs.io/en/stable/>`_ data format for surve

A possible HEALPix distribution for Gaia DR3.

In this website you will find:

- Getting Started guides on how to :doc:`install <installation>` and run an :doc:`example workflow <tutorials/quickstart>`
- :doc:`Tutorials <tutorials>` with more advanced usage examples
- The detailed :doc:`API Reference <autoapi/index>` documentation
Using this Guide
-------------------------------------------------------------------------------

.. grid:: 1 1 2 2

.. grid-item-card:: Getting Started
:link: getting-started
:link-type: doc

Installation and QuickStart Guide

.. grid-item-card:: Tutorials
:link: tutorials
:link-type: doc

Learn the LSDB features by working through our guides

.. grid:: 1 1 2 2

.. grid-item-card:: API Reference
:link: autoapi/index
:link-type: doc

Learn more about contributing to this repository in our :doc:`Contribution Guide <developer/contributing>`.
The detailed API documentation

.. grid-item-card:: Contribution Guide
:link: developer/contributing
:link-type: doc

For developers, learn more about contributing to this repository

.. toctree::
:hidden:

Home page <self>
Installation <installation>
Getting Started <tutorials/quickstart>
Getting Started <getting-started>
Tutorials <tutorials>
Performance <performance>
API Reference <autoapi/index>

.. toctree::
:hidden:
:caption: Developer

API Reference <autoapi/index>
Contribution Guide <developer/contributing>
Contact us <contact>

.. toctree::
:hidden:
:caption: Developers

Contact us <contact>
Contribution Guide <developer/contributing>

Acknowledgements
-------------------------------------------------------------------------------