Major Changes

The "disk" region now refers to particles with r < 2 kpc instead of 2 kpc < r < 2 Rhalf. The latter is now referred to as the "outer" region.

Previously, each stage of the subhalo selection process (mass cut, photometric cut, spectral cut) was written out as its own pickled dictionary of dictionaries, and there were separate dictionaries for the parent sample. Much of this was handled by illustris_cuts.py.

Now, all particle-based data for all parent subhalos (e.g., SFR within 2 kpc) is written to one pickle file (parent_particle_data.pkl) by particle_info.py, and cuts can be applied during analysis, as demonstrated in the Sample Code section below. This cuts down on the overall amount of data written and makes the analysis simpler (only one dict of dicts has to be converted to numpy arrays).

Python Scripts

Command Line Arguments

All scripts use the same set of command line arguments (a sketch of the shared parser is given after this list):

  • z: redshift; currently either 0.0 or 0.5, or 0.1 if using TNG.
  • --no-inst: override default and don't include instantaneous SFR. Replaces a boolean set at the top of some of the scripts.
  • --no-dust: override default and don't include dust in the spectra. Replaces a boolean set at the top of some of the scripts.
  • --tng: use TNG instead of the original Illustris.
  • -l,--local [DIR]: use a local copy of the full snapshot. The default DIR is the snapshot location on Rusty and is adjusted accordingly when the --tng flag is given.
  • -m,--gen-mocks: use FSPS spectra to determine mock magnitudes instead of the FITS files generated by the Illustris team.
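
For orientation, the sketch below shows what the shared parser in utilities.py might look like. The argument names come from the list above; the defaults, help strings, and the placeholder for the Rusty snapshot path are illustrative, not the actual implementation.

import argparse

def parse_args():
    # A sketch only; the real parser lives in utilities.py and its defaults
    # (e.g. the snapshot path on Rusty used by --local) are not reproduced here.
    parser = argparse.ArgumentParser()
    parser.add_argument("z", type=float,
                        help="redshift: 0.0 or 0.5 (0.1 if using --tng)")
    parser.add_argument("--no-inst", action="store_true",
                        help="do not include the instantaneous SFR")
    parser.add_argument("--no-dust", action="store_true",
                        help="do not include dust in the spectra")
    parser.add_argument("--tng", action="store_true",
                        help="use TNG instead of the original Illustris")
    parser.add_argument("-l", "--local", nargs="?", const="RUSTY_SNAPSHOT_DIR",
                        metavar="DIR",
                        help="use a local copy of the full snapshot")
    parser.add_argument("-m", "--gen-mocks", action="store_true",
                        help="use FSPS spectra for mock magnitudes instead of "
                             "the Illustris FITS files")
    return parser.parse_args()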

Pipeline

The scripts should be run in the order listed below, and as demonstrated in pipeline.slurm (which is a batch job submission script for HPC resources using the SLURM queue system). All scripts should be run with the same command line arguments as set at the top of pipeline.slurm.

  1. download_cutouts and download_fits are for bulk downloading particle cutouts and mock FITS files, respectively. Both will exit if using local snapshot data (--local flag).
  2. particle_info will generate a CSV for all subhalos with 1e10 Msun < Mstar < 1e12 Msun, saved to parent_particle_data.csv. See the section on Parent Particle Data for more details.
  3. stellar_spectra generates the mock spectra with FSPS, and will either include or exclude the instantaneous SFR and dust based on the --no-inst and --no-dust flags, respectively. It will make spectra for multiple regions of the subhalo. TODO: create output folders if they don't exist instead of failing.
  4. disk_color calculates the color of the subhalo's disk, based either on FITS files or, if using the --local flag, on spectra from stellar_spectra.py. Uses functions from get_magnitudes.py. Outputs to disk_color.csv. NOTE: g-r from FITS uses the old disk definition (now called the "outer" region), while g-r from spectra uses the new, r > 2 kpc definition.
  5. get_d4000 post-processes all FSPS spectra of the inner 2 kpc to calculate the D4000 measure (using Tjitske's function; a rough sketch of the measurement is given after this list) and saves the results in the appropriate d4000 CSV file (depending on the inclusion of dust and instantaneous SFR).
  6. galaxy_density calculates the number and mass density of galaxies with Mstar > 1e10 Msun around each subhalo in the parent sample. Outputs to local_densities.csv.
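
For reference, D4000 measures the strength of the 4000 Å break as the ratio of the mean flux density just redward of the break to that just blueward of it. The sketch below uses the narrow bands of Balogh et al. (1999); the band definitions and flux convention in Tjitske's function may differ.

import numpy as np

def d4000_narrow(wave, f_lambda):
    # Narrow-band D4000: mean F_nu in 4000-4100 A divided by mean F_nu in
    # 3850-3950 A. F_nu is proportional to wave**2 * F_lambda (constants cancel).
    f_nu = wave**2 * f_lambda
    blue = (wave >= 3850) & (wave <= 3950)
    red = (wave >= 4000) & (wave <= 4100)
    return np.mean(f_nu[red]) / np.mean(f_nu[blue])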

Utilities

  • utilities contains the CLI argument parser and helper functions for downloading Illustris API data, splitting work among MPI tasks, and dealing with Illustris domain periodicity.
  • get_magnitudes contains functions for calculating magnitudes from either FITS files or from spectra, as sketched below. If calculating from spectra, the files SDSS_r_transmission.txt and SDSS_g_transmission.txt must be in the same directory as this script.
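
For context, computing a magnitude from a spectrum amounts to averaging the spectrum over the filter transmission curve and converting to the AB system. The sketch below is simplified; the actual routines in get_magnitudes.py (and the format of the transmission files) may differ, e.g. in unit handling and photon-counting weights.

import numpy as np

def ab_magnitude(wave, f_nu, filt_wave, filt_trans):
    # wave in Angstroms, f_nu in erg/s/cm^2/Hz.
    # <f_nu> = int(f_nu * T dlam/lam) / int(T dlam/lam), i.e. the mean f_nu
    # weighted by the transmission curve in frequency space.
    trans = np.interp(wave, filt_wave, filt_trans, left=0.0, right=0.0)
    mean_fnu = np.trapz(f_nu * trans / wave, wave) / np.trapz(trans / wave, wave)
    return -2.5 * np.log10(mean_fnu) - 48.60

# Assumed two-column (wavelength, transmission) format for the filter files
g_wave, g_trans = np.loadtxt("SDSS_g_transmission.txt", unpack=True)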

Using the Data

Parent Particle Data

The script particle_info.py produces "parent_particle_data.csv", which contains all subhalos with 1e10 Msun < Mstar < 1e12 Msun and half-mass radius > 2 kpc. Each subhalo ID is a row in the CSV with information derived from particles and a boolean for satellite status. Each column is named following the format region_quantity.

Regions

  • total: All particles in the subhalo
  • inner: Particles with r < 2 kpc
  • outer: Particles with 2 kpc < r < 2 Rhalf
  • far: Particles with r > 2 Rhalf
  • disk: Particles with r > 2 kpc
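
These regions correspond to simple radial cuts. The sketch below only illustrates the definitions above; it is not the code in particle_info.py.

import numpy as np

def region_masks(r, r_half):
    # r: particle radii in kpc; r_half: stellar half-mass radius in kpc
    return {
        "total": np.ones_like(r, dtype=bool),
        "inner": r < 2.0,
        "outer": (r > 2.0) & (r < 2.0 * r_half),
        "far":   r > 2.0 * r_half,
        "disk":  r > 2.0,
    }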

Quantities

  • gas: gas mass in Msun
  • SFgas: mass of gas above the density threshold for star formation (0.13 1/cc, which is the Illustris value)
  • SFR: instantaneous star formation rate (calculated by Illustris from the gas) in Msun/yr
  • SFE: star formation efficiency, calculated as SFR/SFgas, in 1/yr
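
As a quick example of working with these columns (the column names are assumed to follow the region_quantity convention, e.g. inner_SFgas; adjust if the actual header differs):

import pandas as pd

# The first column is assumed to hold the subhalo IDs
df = pd.read_csv("parent_particle_data.csv", index_col=0)

# Fraction of the parent sample with star-forming gas inside 2 kpc
print((df["inner_SFgas"] > 0).mean())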

CSV Files

Three scripts output CSV files: disk_color.py, get_d4000.py, and galaxy_density.py. These are sorted by subhalo ID. Each file has a one-line header describing the contents and units.

Sample Code

import pickle
import numpy as np

# Load pickle file
with open('parent_particle_data.pkl', 'rb') as f:
    particle_data = pickle.load(f)

# Load a CSV file
gr_subids, gr_data = np.genfromtxt('disk_color.csv', delimiter=',',
                                    skip_header=1, unpack=True)
gr_subids = gr_subids.astype(int)

# Make empty arrays to copy pickle data into
inner_gas = np.empty(len(gr_subids))
outer_gas = np.empty_like(inner_gas)

for i, subid in enumerate(gr_subids):
    try:
        inner_gas[i] = particle_data[subid]['inner_gas'].value
        outer_gas[i] = particle_data[subid]['outer_gas'].value
    except KeyError: # There is no gas
        inner_gas[i] = 0.0
        outer_gas[i] = 0.0

# Make a boolean array for the g-r cut. True entries are part of the cut.
gr_cut = gr_data > 0.655

# What is the average inner gas mass of g-r selected subhaloes?
print(np.average(inner_gas[gr_cut]))

Distributing Work Using MPI and scatter_work

import pickle
import numpy as np
from mpi4py import MPI
from utilities import *

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Assemble the subhalo IDs you want to operate on as a numpy array
    # This example assumes they live in a pickled dictionary
    with open("dict.pkl", "rb") as f:
        dictionary = pickle.load(f)
    subhalo_ids = np.array([k for k in dictionary.keys()])

    # Any secondary data to be broadcast should also be read in on the
    # root processor; for instance, supplementary gas data
    with open("gas_data.pkl", "rb") as f:
        secondary_data = pickle.load(f)
else:
    # Variable names need to be declared for any data you want to distribute
    subhalo_ids = None
    secondary_data = None

# This helper function from utilities.py pads and scatters the arrays
halo_subset = scatter_work(subhalo_ids, rank, size)

# Because scattered arrays have to be the same size, they are padded with -1
good_ids = np.where(halo_subset > -1)[0]

# Broadcast the secondary data normally
secondary_data = comm.bcast(secondary_data, root=0)

my_storage = {} # every rank needs its own way of storing results, to be combined later
for halo in halo_subset[good_ids]:
    # do stuff with this halo and record the result, e.g.
    my_storage[halo] = None  # placeholder for the real per-halo result

# Gather the individual results onto one process, stitch together, and save
result_lst = comm.gather(my_storage, root=0)
if rank==0:
    storage = {}
    for dic in result_lst:
        for k, v in dic.items():
            storage[k] = v
    with open("these_results.pkl", "wb") as f:
        pickle.dump(storage, f)
        
# If you want to broadcast the compiled data back out to all processes, add this
# else branch (every rank must reach the comm.bcast call below):
else:
    storage = None
storage = comm.bcast(storage, root=0)