Tutorial: Constraining Galacticus Parameters

Andrew Benson edited this page Sep 14, 2020 · 15 revisions

This tutorial guides you through constraining the parameters of a Galacticus model to achieve a good fit to an observational dataset. Galacticus has built-in MCMC functionality which can be used for this purpose.

Here we will use a very simple example - we'll constrain a single parameter of a Galacticus model to obtain a match to a single point in the stellar mass-halo mass relation of Leauthaud et al. (2012). Much more complex cases are possible (multiple parameters, coupled parameters, multiple target datasets etc.), but this simple example will illustrate the key features.

Running the simulation

For this tutorial we require Galacticus to be compiled with MPI parallelism. To do this:

make -j8 GALACTICUS_BUILD_OPTION=MPI Galacticus.exe

Note that you must have MPI installed for this to work.

Once compilation is completed, to run the tutorial model:

export OMP_NUM_THREADS=1
mpirun -np 4 Galacticus.exe parameters/tutorials/mcmcConfig.xml

The export OMP_NUM_THREADS=1 command effectively switches off OpenMP parallelism (which we don't want to use for this tutorial). The mpirun -np 4 prefix launches four parallel Galacticus processes, which communicate via MPI.

Expect this example to run for around - it should output something like this:


Understanding the input parameter file

You can view the complete input parameter file for this tutorial here. Here we'll focus on each section of the parameter file and understand what it does:

  <taskMethod value="posteriorSample">
    <initializeNodeClassHierarchy value="false"/>
  </taskMethod>

We begin by specifying the "task" to perform. We choose posteriorSample, which causes Galacticus to run a posterior sampling simulation that generates a set of parameters sampled from the posterior distribution given some constraining datasets.
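
The parameter file for this tutorial selects differentialEvolution as its simulation method. As a rough illustration of what one step of such a sampler does, here is a minimal Python sketch of the generic differential evolution MCMC update: a chain proposes a jump along the difference between two other chains, scaled by gamma, adds a small random perturbation, and accepts or rejects with the Metropolis rule. This is a sketch under stated assumptions, not Galacticus's implementation. The function names are hypothetical, and the values 0.5 and 1.0e-3 simply echo gammaInitial and the Cauchy perturber scale from the tutorial file.

```python
import math
import random


def cauchy_sample(rng, median, scale):
    """Draw from a Cauchy distribution by inverse-CDF sampling."""
    u = rng.random()
    return median + scale * math.tan(math.pi * (u - 0.5))


def de_propose(states, i, gamma, rng, perturb_scale=1.0e-3):
    """Propose a new state for chain i (hypothetical helper).

    states holds the current parameter value of every chain. The jump is
    gamma times the difference between two other, distinct chains, plus a
    small Cauchy-distributed perturbation.
    """
    others = [k for k in range(len(states)) if k != i]
    a, b = rng.sample(others, 2)
    jump = gamma * (states[a] - states[b])
    return states[i] + jump + cauchy_sample(rng, 0.0, perturb_scale)


def metropolis_accept(log_post_current, log_post_proposed, rng):
    """Accept with probability min(1, exp(proposed - current))."""
    delta = log_post_proposed - log_post_current
    return delta >= 0.0 or rng.random() < math.exp(delta)


if __name__ == "__main__":
    rng = random.Random(219)
    # Four illustrative chain states for velocityCharacteristic.
    states = [100.0, 200.0, 300.0, 400.0]
    print(de_propose(states, 0, gamma=0.5, rng=rng))
```

A proposal that falls outside a uniform prior has a log-posterior of minus infinity, so metropolis_accept always rejects it.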

  <posteriorSampleLikelihoodMethod value="galaxyPopulation">
    <baseParametersFileName   value="parameters/tutorials/mcmcBase.xml"      />
    <failedParametersFileName value="./failedParameters.xml"/>
    <randomize                value="false"                                    />
    <evolveForestsVerbosity   value="0"                                        />
  </posteriorSampleLikelihoodMethod>
  <!-- MCMC -->
  <posteriorSampleSimulationMethod value="differentialEvolution">
    <stepsMaximum           value="1000"/>
    <acceptanceAverageCount value="    10"/>
    <stateSwapCount         value="     100"/>
    <logFileRoot            value="mcmcChains"/>
    <reportCount            value="10"/>
    <sampleOutliers         value="false"/>
    <logFlushCount          value="      1"/>
    <posteriorSampleStateMethod value="correlation">
      <acceptedStateCount value="100"/>
    </posteriorSampleStateMethod>
    <posteriorSampleStateInitializeMethod value="latinHypercube">
      <maximinTrialCount value="100"/>
    </posteriorSampleStateInitializeMethod>
    <posteriorSampleConvergenceMethod value="gelmanRubin">
      <thresholdHatR              value=" 1.30"/>
      <burnCount                  value="10"   />
      <testCount                  value="10"   />
      <outlierCountMaximum        value=" 1"   />
      <outlierSignificance        value=" 0.95"/>
      <outlierLogLikelihoodOffset value="60"   />
      <reportCount                value=" 1"   />
      <logFileName                value="mcmcConvergence.log"/>
    </posteriorSampleConvergenceMethod>
    <posteriorSampleStoppingCriterionMethod value="stepCount">
      <stopAfterCount value="10"/>
    </posteriorSampleStoppingCriterionMethod>
    <posteriorSampleDffrntlEvltnRandomJumpMethod   value="adaptive"/>
    <posteriorSampleDffrntlEvltnProposalSizeMethod value="adaptive" >
      <gammaInitial          value="0.500e+0"/>
      <gammaAdjustFactor     value="1.100e+0"/>
      <gammaMinimum          value="1.000e-4"/>
      <gammaMaximum          value="3.000e+0"/>
      <acceptanceRateMinimum value="0.100e+0"/>
      <acceptanceRateMaximum value="0.900e+0"/>
      <updateCount           value="10"     />
    </posteriorSampleDffrntlEvltnProposalSizeMethod>
    <!-- Feedback -->
    <modelParameterMethod value="active">
      <name value="nodeOperatorMethod::nodeOperatorMethod[2]::stellarFeedbackOutflowsMethod::stellarFeedbackOutflowsMethod::velocityCharacteristic"/>
      <distributionFunction1DPrior value="uniform">
        <limitLower value="25.0"/>
        <limitUpper value="500.0"/>
      </distributionFunction1DPrior>
      <operatorUnaryMapper value="identity"/>
      <distributionFunction1DPerturber value="cauchy">
        <median value="0.0"/>
        <scale value="1.0e-3"/>
      </distributionFunction1DPerturber>
    </modelParameterMethod>
  </posteriorSampleSimulationMethod>
  <!-- Random seed -->
  <randomNumberGeneratorMethod value="GSL">
    <seed          value="219" />
    <mpiRankOffset value="true"/>
  </randomNumberGeneratorMethod>
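
The gelmanRubin convergence method configured above compares chains against a threshold, thresholdHatR, of 1.30. As an illustration of the quantity involved, here is a minimal Python sketch of the standard Gelman-Rubin statistic for a single parameter. It is not Galacticus's implementation (which, judging by the outlier parameters above, also deals with outlier chains), and the function name is hypothetical.

```python
import statistics


def gelman_rubin_rhat(chains):
    """Standard Gelman-Rubin R-hat for one parameter (hypothetical helper).

    chains is a list of equal-length lists of samples, one list per chain.
    Values close to 1 indicate that the chains agree with each other.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0.0:
        raise ValueError("within-chain variance is zero")
    # B/n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * within + between_over_n
    return (var_hat / within) ** 0.5


if __name__ == "__main__":
    agreeing = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
    separated = [[0.0, 1.0, 2.0, 3.0], [100.0, 101.0, 102.0, 103.0]]
    print(gelman_rubin_rhat(agreeing) < 1.30)   # chains agree
    print(gelman_rubin_rhat(separated) < 1.30)  # chains disagree
```

Chains that explore the same region give a value below the threshold, while chains stuck in different regions give a value far above it.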

The "Base" Parameter File

In addition to the mcmcConfig.xml parameter file examined above, this tutorial uses a second parameter file. The baseParametersFileName parameter in the posteriorSampleLikelihoodMethod section of mcmcConfig.xml names a file that defines the "base" model, whose parameters are then varied to find a good fit to the target dataset. In this example the base parameter file is parameters/tutorials/mcmcBase.xml. This is a normal Galacticus parameter file, using the evolveForests task, which you could run directly to generate a model if you wanted to. Its only essential feature is that it contains every parameter that mcmcConfig.xml marks to be varied. In this case it contains a section:

  <!-- Node evolution and physics -->
  <nodeOperatorMethod value="multi">
    <!-- Star formation options -->
    <nodeOperatorMethod value="starFormationDisks"    >
      <luminositiesStellarInactive value="true"/>
    </nodeOperatorMethod>
    <nodeOperatorMethod value="starFormationSpheroids">
      <luminositiesStellarInactive value="true"/>
    </nodeOperatorMethod>
    <!--Stellar feedback outflows-->
    <nodeOperatorMethod value="stellarFeedbackDisks">
      <stellarFeedbackOutflowsMethod value="rateLimit">
        <timescaleOutflowFractionalMinimum value="0.001"/>
        <stellarFeedbackOutflowsMethod value="powerLaw">
          <velocityCharacteristic value="175.0"/>  <!-- This is the parameter being varied -->
          <exponent               value="  3.5"/>
        </stellarFeedbackOutflowsMethod>
      </stellarFeedbackOutflowsMethod>
    </nodeOperatorMethod>
    <nodeOperatorMethod value="stellarFeedbackSpheroids">
      <stellarFeedbackOutflowsMethod value="rateLimit">
        <timescaleOutflowFractionalMinimum value="0.001"/>
        <stellarFeedbackOutflowsMethod value="powerLaw">
          <velocityCharacteristic value=" 50.0"/>
          <exponent               value="  3.5"/>
        </stellarFeedbackOutflowsMethod>
      </stellarFeedbackOutflowsMethod>
    </nodeOperatorMethod>
  </nodeOperatorMethod>

In the mcmcConfig.xml file we referred to a parameter "nodeOperatorMethod::nodeOperatorMethod[2]::stellarFeedbackOutflowsMethod::stellarFeedbackOutflowsMethod::velocityCharacteristic". This path identifies the parameter marked by a comment in the section above. To resolve it, we first find the nodeOperatorMethod element in mcmcBase.xml. Within that element we take the third nodeOperatorMethod element (indexing starts at 0, so nodeOperatorMethod[2] is the third). Inside that we find the stellarFeedbackOutflowsMethod element, then the stellarFeedbackOutflowsMethod element nested inside it, and finally the velocityCharacteristic element. It is the value of this parameter which will be varied.
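
To make the path lookup concrete, here is a minimal Python sketch that resolves such a path against a trimmed copy of the fragment above. It illustrates the zero-based indexing only and is not how Galacticus itself parses parameter files. The function name is hypothetical, the root element name "parameters" is an assumption (the fragment does not show the file's root), and treating a step without an index as index 0 is also an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the fragment above. The root element name "parameters"
# is an assumption made for this sketch.
BASE_XML = """
<parameters>
  <nodeOperatorMethod value="multi">
    <nodeOperatorMethod value="starFormationDisks"/>
    <nodeOperatorMethod value="starFormationSpheroids"/>
    <nodeOperatorMethod value="stellarFeedbackDisks">
      <stellarFeedbackOutflowsMethod value="rateLimit">
        <stellarFeedbackOutflowsMethod value="powerLaw">
          <velocityCharacteristic value="175.0"/>
        </stellarFeedbackOutflowsMethod>
      </stellarFeedbackOutflowsMethod>
    </nodeOperatorMethod>
    <nodeOperatorMethod value="stellarFeedbackSpheroids">
      <stellarFeedbackOutflowsMethod value="rateLimit">
        <stellarFeedbackOutflowsMethod value="powerLaw">
          <velocityCharacteristic value=" 50.0"/>
        </stellarFeedbackOutflowsMethod>
      </stellarFeedbackOutflowsMethod>
    </nodeOperatorMethod>
  </nodeOperatorMethod>
</parameters>
"""

PATH = ("nodeOperatorMethod::nodeOperatorMethod[2]::"
        "stellarFeedbackOutflowsMethod::stellarFeedbackOutflowsMethod::"
        "velocityCharacteristic")


def resolve_parameter(root, path):
    """Follow a '::'-separated path of element names (hypothetical helper).

    Each step may carry a zero-based [index] selecting among same-named
    direct children; a step without an index is assumed to mean index 0.
    """
    node = root
    for step in path.split("::"):
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", step)
        if match is None:
            raise ValueError(f"malformed path step: {step!r}")
        name = match.group(1)
        index = int(match.group(2) or 0)
        children = node.findall(name)  # direct children only
        if index >= len(children):
            raise KeyError(f"no element {name}[{index}]")
        node = children[index]
    return node


if __name__ == "__main__":
    target = resolve_parameter(ET.fromstring(BASE_XML), PATH)
    print(target.get("value"))  # prints 175.0
```

Changing nodeOperatorMethod[2] to nodeOperatorMethod[3] in the path would instead select the spheroid feedback parameter, whose value is 50.0.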

Of course, you can specify any other parameters in the base parameter file also - their values will be fixed throughout the MCMC simulation.

The other key role of the base parameter file is to define the target datasets used to constrain the model. These make use of Galacticus' ability to compute predictions for observables as it runs. The base parameter file contains a section:

  <!-- Analyses -->
  <outputAnalysisMethod value="stellarVsHaloMassRelationLeauthaud2012" >
    <redshiftInterval                     value="1"      />
    <computeScatter                       value="false"  />
    <systematicErrorPolynomialCoefficient value="0.0 0.0"/>
    <likelihoodBin                        value="11"     />
  </outputAnalysisMethod>

This tells Galacticus to compute its prediction for the stellar mass-halo mass relation of Leauthaud et al. (2012), specifically in the first redshift interval (Leauthaud et al. (2012) compute the relation in three redshift intervals). The likelihood of the model given this dataset is computed using only halo mass bin number 11, since in this simple tutorial only halos falling within that bin are simulated. The computeScatter option is set to false so that the mean of the relation is computed; if it were set to true, the scatter in the relation would be computed instead. The systematicErrorPolynomialCoefficient parameter allows a model of observational systematic errors to be included and constrained, but we ignore it for now by setting its values to zero.
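
To illustrate what restricting the likelihood to a single bin means, here is a minimal Python sketch that evaluates a Gaussian log-likelihood using only one bin of a binned relation. The Gaussian form with an independent error per bin is an assumption made for this sketch and may differ from what Galacticus actually computes. The function name is hypothetical, the numbers are illustrative and are not the Leauthaud et al. (2012) data, and bin_index is a zero-based Python index (how Galacticus numbers its bins is not stated here).

```python
import math


def log_likelihood_single_bin(model, observed, sigma, bin_index):
    """Gaussian log-likelihood from one bin only (hypothetical helper).

    model, observed and sigma are equal-length sequences holding the model
    prediction, the observed value and its uncertainty in each bin. All
    other bins are ignored.
    """
    residual = model[bin_index] - observed[bin_index]
    variance = sigma[bin_index] ** 2
    return -0.5 * (residual ** 2 / variance
                   + math.log(2.0 * math.pi * variance))


if __name__ == "__main__":
    # Illustrative values only, not real data.
    observed = [10.0, 10.5, 11.0]
    sigma = [0.1, 0.1, 0.1]
    good_model = [9.0, 10.5, 12.0]  # matches only in bin 1
    poor_model = [10.0, 10.9, 11.0]  # misses only in bin 1
    print(log_likelihood_single_bin(good_model, observed, sigma, 1))
    print(log_likelihood_single_bin(poor_model, observed, sigma, 1))
```

Because only the selected bin contributes, the first model scores higher than the second even though it is badly wrong in the other two bins.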

Understanding the output
