Quick start with experimental data
This tutorial walks through analyzing experimental data. We recommend completing at least one simulated reconstruction before this tutorial, but you can also start here for a quick reconstruction.
We will first get the data from the CXIDB, generate the configuration file, convert the data, and then run the reconstruction.
As in the simulation case, we recommend creating a reconstruction directory to keep things separated. You can do this easily with the following script from the root directory:
./dragonfly_init -t spi
This should create a folder called spi_0001 and compile the various executables. In the next few sections, we will create the configuration file, replacing the default config.ini.
We will work with data collected at the AMO end-station of the Linac Coherent Light Source (LCLS) as part of a Single Particle Imaging (SPI) Initiative experiment in July/August 2015. The data has been published as Reddy et al., Scientific Data 4, 170079 (2017).
As a first step, you will need to download the single hits from the CXIDB. Follow the hdf5 link to download the 13 HDF5 files into a folder. Within each of these files, the dataset photonConverter/pnccdBack/photonCount contains photon-converted data from the 4x4 down-sampled pnCCD detector.
The experimental parameters section of the configuration file is given below:
[parameters]
detd = 586
lambda = 7.75
detsize = 260 257
pixsize = 0.3
stoprad = 40
ewald_rad = 650.
polarization = x
Most of the parameters are self-explanatory. For units and other details, see the configuration file page. The only parameter not seen in the simulation examples is ewald_rad, the radius of curvature of the Ewald sphere in voxels. The size of the 3D grid is determined by the reciprocal-space distance (in voxels) of the highest-resolution detector pixel. The value of 650 was chosen to give reasonable oversampling, and it generates a 3D volume of 125x125x125 voxels.
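To see where the 125-voxel figure comes from, here is a rough back-of-the-envelope check using the parameters above. The 2*ceil(qmax) + 3 grid-size margin is an assumption for illustration, and we assume the beam hits the detector center; the exact convention used internally may differ slightly.

```python
import math

detd, pixsize, ewald_rad = 586.0, 0.3, 650.0   # mm, mm, voxels (from [parameters])
nx, ny = 260, 257                              # detsize

# Distance of the farthest corner pixel from the beam center, in mm
rmax = math.hypot(nx / 2 * pixsize, ny / 2 * pixsize)
theta = math.atan2(rmax, detd)                 # scattering angle at that pixel

# Reciprocal-space radius of that pixel in voxels:
# chord length on an Ewald sphere of radius ewald_rad
qmax = 2 * ewald_rad * math.sin(theta / 2)

size = 2 * math.ceil(qmax) + 3                 # assumed grid-size convention
print(size)                                    # -> 125
```

The corner pixel lands about 61 voxels from the center of reciprocal space, which is consistent with the 125x125x125 volume quoted above.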
Here is the make_detector section of the configuration file:
[make_detector]
in_mask_file = aux/mask_pnccd_back_260_257.byt
out_detector_file = data/det_pnccd_back.dat
To generate the detector, we will use a custom mask file representing the different pixel types. After adding this section to the configuration file, run ./utils/make_detector.py in the reconstruction folder.
To convert the HDF5 data to emc files, run the following command for each file:
./utils/convert/h5toemc.py -d photonConverter/pnccdBack/photonCount <HDF5_file>
This will create a file in the data folder with the same filename (except the extension) as the HDF5 file. To see other options, run ./utils/convert/h5toemc.py -h. Other data and geometry conversion utilities are available in the utils/convert/ folder, but they are not necessary for this data set. You can look at the data using the frame viewer (./utils/frameviewer.py). Click 'Random' a few times to see how things look.
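Since the same command must be run for all 13 files, a small driver loop can help. This is a sketch only; the cxidb_files/ folder name is a hypothetical stand-in for wherever you downloaded the HDF5 files.

```python
import glob
import subprocess

dataset = "photonConverter/pnccdBack/photonCount"
# "cxidb_files" is a hypothetical folder holding the 13 downloaded HDF5 files
for h5_file in sorted(glob.glob("cxidb_files/*.h5")):
    # Each call writes data/<same basename>.emc, as described above
    subprocess.run(["./utils/convert/h5toemc.py", "-d", dataset, h5_file],
                   check=True)
```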
The emc file header contains the total number of pixels in the detector, which the conversion script reads from the configuration file. To avoid warnings during the reconstruction, the [parameters] section of the configuration file must therefore be written before converting the data.
Here is the emc section of the configuration file:
[emc]
in_photons_list = amo86615.txt
in_detector_file = make_detector:::out_detector_file
output_folder = data/
log_file = EMC.log
num_div = 10
need_scaling = 1
beta = 0.001
beta_schedule = 1.41421356 10
First, the in_photons_list file needs to be created. This is just a text file listing all the emc file locations, i.e.
data/amo86615_182_PR772_single.emc
data/amo86615_183_PR772_single.emc
...
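Rather than typing the list by hand, you can generate it from the contents of the data folder. A minimal sketch, assuming all converted .emc files live in data/:

```python
import glob

# Collect the converted .emc files and write their paths, one per line,
# into the file referenced by in_photons_list in the [emc] section
emc_files = sorted(glob.glob("data/*.emc"))
with open("amo86615.txt", "w") as fptr:
    fptr.write("\n".join(emc_files) + "\n")
```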
The other new parameters are beta and beta_schedule. The β parameter is described in detail in the Dragonfly paper. In a nutshell, the orientation probability distribution is raised to the power of β, and the beta_schedule parameter specifies the deterministic-annealing schedule: here, β is multiplied by a factor of √2 every 10 iterations. These measures aid smooth convergence when the signal is high, as it very much is in this data set.
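The effect of the schedule can be illustrated with a few lines of Python. The exact iteration at which the factor kicks in is an assumption here; the sketch only shows the multiplicative growth implied by beta_schedule = 1.41421356 10.

```python
beta_start = 0.001                 # beta from the [emc] section
factor, period = 1.41421356, 10    # beta_schedule: multiply by sqrt(2) every 10 iterations

def beta_at(iteration):
    # Hedged reading of the schedule: the factor is applied once per
    # completed block of `period` iterations
    return beta_start * factor ** ((iteration - 1) // period)

for it in (1, 11, 51, 100):
    print(it, round(beta_at(it), 6))
```

So β roughly doubles every 20 iterations, sharpening the orientation distribution as the reconstruction converges.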
On a single computer, one can just run
./emc 100
However, since there is a lot of data, this may be quite slow. We recommend running it on your friendly neighborhood cluster with MPI. On four 32-thread nodes at CFEL, the time per iteration drops from around 800 s at the beginning to around 240 s by the 100th iteration.
You can monitor the reconstruction using the autoplot GUI described here.
Here is our output after 100 iterations: