myXCLASSFit

Fitting single spectra

The myXCLASSFit function offers the possibility to fit multiple frequency ranges in multiple files from multiple telescopes simultaneously. It can be used with different optimization algorithms to optimize the input parameters defined in an molfit file to achieve a good description of the observational data. Details of the myXCLASSFit are described in Sect. “myXCLASSFit”.

The myXCLASSFit function requires at least the following three input files

molfit file, see Sect. “The molfit file”, to define which molecules are taken into account and the corresponding components,
observational xml, see Sect. “Observational xml file” , to control the import of the observational data,
algorithm xml file, see Sect. “Algorithm xml file” to specify the optimization algorithm.

In addition to these three input files, the iso ratio file for describing the relationships between molecules and their isotopologues, see Sect. “The iso ratio file”, may also be required.

Example call of the myXCLASSFit function:

>>> from xclass import task_myXCLASSFit
>>> import os

# get path of current directory
>>> LocalPath = os.getcwd() + "/"

# set path and name of molfit file
>>> MolfitsFileName = LocalPath + "files/my_molecules.molfit"

# set path and name of observational (obs.) xml file
>>> ObsXMLFileName = LocalPath + "files/my_observation.xml"

# set path and name of algorithm (alg.) xml file
>>> AlgorithmXMLFileName = LocalPath + "files/my_algorithm__trf.xml"

# set optimization package
>>> Optimizer = "scipy"

# define that results of myXCLASSFit function are return as dictionary
>>> DictOutFlag = True

# call myXCLASSFit function
>>> OutDict = task_myXCLASSFit.myXCLASSFitCore( \
                            MolfitsFileName = MolfitsFileName, \
                            ObsXMLFileName = ObsXMLFileName, \
                            AlgorithmXMLFileName = AlgorithmXMLFileName, \
                            Optimizer = Optimizer, \
                            DictOutFlag = DictOutFlag)

Observational xml file

The observational xml file (obs. xml file) is used to describe the import of the observational data within an xml file. Here, each parameter is defined by so-called tags, i.e. <TagName>TagValue</TagName>.

The obs. xml file has to start with

<?xml version="1.0" encoding="UTF-8"?>

All tags in the obs. xml file has to be enclosed by the <ExpFiles> tag, i.e.

<ExpFiles>
..
</ExpFiles>

The tag <NumberExpFiles> defines the number of observational data files and has to be \(>=1\). All parameters for each observational data file has to be enclosed by the <file> tag, e.g.

<NumberExpFiles>2</NumberExpFiles>
<file>
..
</file>
<file>
..
</file>

The tag <FileNamesExpFiles> defines the path and name of the corresponding observational data file.
For each observational data file the tag <ImportFilter> describes the type of the observational data file. For ASCII files the tag has to be set to xclassASCII, for FITS files to xclassFITS, respectively.
The number of ranges <NumberExpRanges> must be defined for each file and must be \(>=1\). The tag must also be set if the whole file is used.
For each frequency range the tags <MinExpRange> and <MaxExpRange> describe the lowest and highest frequency, respectively.
The step frequency of the simulated spectrum has to be given by the tag <StepFrequency>.
For each frequency range, the user can define a simple phenomenological description of the background continuum

(52)\[I_{\rm bg} (\nu) = T_{\rm bg} \cdot \left(\frac{\nu}{\nu_{\rm min}} \right)^{T_{\rm slope}}\]

using the tags <BackgroundTemperature> (for the background temperature in~K) and <TemperatureSlope> (for the temperature slope, dimensionless). The tag <t_back_flag> indicates if the user defined background continuum (described by <BackgroundTemperature> and <TemperatureSlope>) describe the continuum contribution completely (True) or if continuum contributions defined in the molfit file are taken into account as well (False).
(optional) The user can specify path and name of an ASCII file describing the background intensity between the cosmic microwave background and the components at the largest distance as function of frequency (in~MHz) by using tag <BackgroundFileName>.
(optional) In order to specify global values for the parameters describing the dust contribution for each frequency range, see (7), the tags <HydrogenColumnDensity> (for the hydrogen column density in cm \(^{-2}\)), <DustBeta> (for the dust spectral index, dimensionless), and <Kappa> (in cm \(^{2}\) g \(^{-1}\)) can be used. Additionally, the user can specify path and name of an ASCII file describing the dust optical depth as function of frequency (in MHz) by using tag <DustFileName>.
(optional) In order to define global values for the advanced phenomenological description of the continuum for each frequency range, the user has to define the tags <ContPhenFuncID> (for selecting the continuum function, dimensionless), <ContPhenFuncParam1> (for the first continuum parameter, dimensionless), <ContPhenFuncParam2> (for the second continuum parameter, dimensionless), <ContPhenFuncParam3> (for the third continuum parameter, dimensionless), <ContPhenFuncParam4> (for the fourth continuum parameter, dimensionless), and <ContPhenFuncParam5> (for the fifth continuum parameter, dimensionless).
(optional, for myXCLASSMapFit and myXCLASSMapRedoFit function only) The tag <Threshold> defines a threshold (in~K) for each frequency range for a pixel. If the spectrum of a pixel has an max. intensity lower than the value defined by this parameter the pixel is not fitted (ignored).
(optional, for LineID function only) The tag <NoiseLevel> describes the noise level (in~K) for each frequency range. All parts of the spectrum with intensities lower than the noise level are ignored.
For each observation file the user has to specify the size of the telescope ( \(>0\)) by defining the tag <TelescopeSize> (for circle-shaped beams) or by tags <BMIN>, <BMAJ> and <BPA> for elliptical rotated beams. Additionally, the tag <Inter_Flag> indicates if single dish or interferometric observations are described. If the sub-beam description is deactivated and an elliptical beam is still used, XCLASS assumes a circular beam and calculates an average beam size from BMAJ and BMIN.
Using the tag <GlobalvLSR> the user can defined different local standard of rest velocities ( :math: v_{rm LSR}) for each obs. data file. Thereby, the value defined by the input parameter vLSR is ignored.
(optional) The tag <Redshift> indicates the red-shift for the corresponding obs. data file.
(optional) The tags <ErrorY>, <NumberHeaderLines>, and <SeparatorColumns> control the import of the ASCII file containing the observational data. Please note, that these tags are read only if the corresponding obs. data file is an ASCII file, i.e. that the tag <ImportFilter> is set to xclassASCII. If tag <ErrorY> is set to True, the ASCII file has to contain an additional 3rd column, describing the errors (or weigths) for each frequency channel. These weights are used in the computation of the \(\chi^2\) function

(53)\[\chi^2 = \sum_{i=1}^N \left[ \left(y_i^\mathsf{obs} - y_i^\mathsf{fit} \right)^2 \cdot \frac{1}{\left(\sigma_i^\mathsf{error}\right)^2} \right],\]

where \(\sigma_i^{\rm error}\) represents the error of the i th data point.
In order to use isotopologues, the user has to set the tag <iso_flag> to True and to specify path and name of an iso ratio file using tag <IsoTableFileName>. If no isotopologues are used, tag <iso_flag> has to be set to False.
(optional) Local-overlap is taken into account by setting tag <LocalOverlap_Flag> to True otherwise to False, see Sect. “Local-overlap of neighboring lines”.
(optional) The number of model pixels along x- and y-direction required for the sub-beam description are defined by tags <NumModelPixelXX> and <NumModelPixelYY>, respectively.
(optional) Deactivate sub-beam description, see Sect. “Sub-beam description”, by setting the <NoSubBeamFlag> to True otherwise to False.
(optional) In order to use pre-computed emission and absorption functions used in (5) for different layer distances the user can define the path of a directory, using tag <EmAbsPATH>, containing ASCII files describing both functions for different frequencies and distances.
(optional) By default XCLASS uses the SQLite3 database file cdms_sqlite.db located in the directory defined in the xclass/init.dat file. In order to use a different database file, the user has to define the path and name of another database file using the tag <dbFilename>.

Example observational xml file used by many XCLASS functions:

<?xml version="1.0" encoding="UTF-8"?>
<ExpFiles>


    <!-- define number of experimental data file(s) -->
    <NumberExpFiles>1</NumberExpFiles>


    <!-- **************************************************************** -->
    <!-- define parameters for first obs. data file -->
    <file>


        <!-- define path and name of obs. data file -->
        <FileNamesExpFiles>demo/myXCLASSFit/band1b.dat</FileNamesExpFiles>


        <!-- define format of obs. data file -->
        <ImportFilter>xclassASCII</ImportFilter>


        <!-- define number of frequency ranges in the current data file -->
        <NumberExpRanges>1</NumberExpRanges>


        <!-- define parameters for each frequency range -->
        <FrequencyRange>


            <!-- define min., max. and step frequency -->
            <MinExpRange>580102.0</MinExpRange>
            <MaxExpRange>580546.5</MaxExpRange>
            <StepFrequency>0.5</StepFrequency>


            <!-- define parameters describing the background -->
            <t_back_flag>False</t_back_flag>
            <BackgroundTemperature>0.88</BackgroundTemperature>
            <TemperatureSlope>3.0</TemperatureSlope>
            <BackgroundFileName>background.dat</BackgroundFileName>


            <!-- define parameters describing dust contribution -->
            <HydrogenColumnDensity>3.0e+24</HydrogenColumnDensity>
            <DustBeta>2.0</DustBeta>
            <Kappa>0.42</Kappa>
            <DustFileName>jena_thin_no__MHz.dat</DustFileName>
        </FrequencyRange>


        <!-- define source velocity (in km/s) -->
        <GlobalvLSR>0.0</GlobalvLSR>


        <!-- define redschift -->
        <Redshift>1.1</Redshift>


        <!-- define size of minor axis (in arcsec) -->
        <BMIN>2.098</BMIN>


        <!-- define size of major axis (in arcsec) -->
        <BMAJ>2.432</BMAJ>


        <!-- define degree of rotation (in degree) -->
        <BPA>20.0</BPA>


        <!-- define if interferometric observation is modeled -->
        <Inter_Flag>False</Inter_Flag>


        <!-- define parameters (for ASCII file import only) -->
        <ErrorY>no</ErrorY>
        <NumberHeaderLines>1</NumberHeaderLines>
        <SeparatorColumns> </SeparatorColumns>
    </file>


    <!-- ******************************************************* -->
    <!-- parameters for isotopologues -->
    <iso_flag>True</iso_flag>
    <IsoTableFileName>demo/myXCLASS/iso_names.txt</IsoTableFileName>


    <!-- **************************************************** -->
    <!-- define if local-overlap is taken into account or not -->
    <LocalOverlap_Flag>True</LocalOverlap_Flag>


    <!-- **************************************************** -->
    <!-- deactivate sub-beam description -->
    <NoSubBeamFlag>True</NoSubBeamFlag>


    <!-- ***************************************************** -->
    <!-- define number of model pixel along x- and y-direction -->
    <NumModelPixelXX>500</NumModelPixelXX>
    <NumModelPixelYY>500</NumModelPixelYY>


    <!-- *************************************************** -->
    <!-- define path and name of database file -->
    <dbFilename>Database/cdms_sqlite__2016-06-15.db</dbFilename>
</ExpFiles>

Algorithm xml file

The algorithm xml file (alg. xml file) is used to describe the optimization algorithms, which are used to fit the model parameters defined in the molfit and other input files (iso-ratio) to the observational data. Here, each parameter is defined by so-called tags, i.e. <TagName>TagValue</TagName>.

The alg. xml file has to start with

<?xml version="1.0" encoding="UTF-8"?>

All tags in the alg. xml file has to be enclosed by the <FitControl> tag, i.e.

<FitControl>
..
</FitControl>

The tag <NumberOfFitAlgorithms> defines the number of optimization algorithms and has to be \(>=1\). All parameters for each algorithm has to be enclosed by the <algorithm> tag, e.g.

<NumberExpFiles>2</NumberExpFiles>
<algorithm>
..
</algorithm>
<algorithm>
..
</algorithm>

The tag <FitAlgorithm> defines the name of the optimization algorithm. The following algorithms are available:
- “trf”: (Trust Region Reflective algorithm, only used with optimizer SCIPY or MapFit function, parallelized) a fast local optimization algorithm that takes into account the parameter limits, i.e. the algorithm guarantees that the fitting parameters remain within the user-defined parameter limits.
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “dogbox”: (dogleg algorithm with rectangular trust region, only used with optimizer SCIPY or MapFit function, parallelized), very similar to the aforementioned “trf” algorithm.
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “lm”: the Levenberg-Marquardt algorithm as implemented in MINPACK but without taking parameter limits into account! Please use “trf” algorithm instead.
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “nelder-mead”: (Nelder-Mead algorithm, only used with optimizer SCIPY or MapFit function, not parallelized) a local optimization algorithm
- “powell”: (modified Powell algorithm, only used with optimizer SCIPY or MapFit function, not parallelized) a local optimization algorithm
- “cobyla”: (Constrained Optimization BY Linear Approximation (COBYLA) algorithm, only used with optimizer SCIPY or MapFit function, not parallelized) a local optimization algorithm.
- “cg”: (conjugate gradient algorithm, only used with optimizer SCIPY or MapFit function, parallelized) a local optimization algorithm
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “bfgs”: (BFGS algorithm, only used with optimizer SCIPY or MapFit function, parallelized) a local optimization algorithm
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “l-bfgs-b”: (L-BFGS-B algorithm, only used with optimizer SCIPY or MapFit function, parallelized) a local optimization algorithm
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “slsqp”: (Sequential Least Squares Programming (SLSQP) Algorithm, only used with optimizer SCIPY or MapFit function, parallelized) a local optimization algorithm
  
  Additional tags required by this algorithm:
  - tag <VariationValue> sets the value of variation (in percent) for the calculation of the gradient of the \(\chi^2\) function.
- “basinhopping”: (Basin-Hopping Algorithm, only used with optimizer SCIPY or MapFit function, parallelized) a global optimization algorithm
- “brute”, “bruteforce”, “brute-force”: (brute force, only used with optimizer SCIPY or MapFit function, parallelized) a global optimization algorithm.
  
  Additional tags required by this algorithm:
  - tag <BruteSteps> can be used to define the number of grid points along each parameter axis as python list. (Each element indicates the number of grid points for the corresponding fit parameter).
- “differential_evolution”: (Differential Evolution, only used with optimizer SCIPY or MapFit function, parallelized) a global optimization algorithm.
- “dual_annealing”: (Dual Annealing, only used with optimizer SCIPY or MapFit function, parallelized) a global optimization algorithm.
- “error-estimation”, “errorestim_ins”: estimate error of model parameters using “mcmc” or “ultranest”, parallelized.
  
  Additional tags required by this algorithm:
  - tag <NumberSampler> sets the number of sampler,
  - tag <MultiplicitySigma> sets multiplicity of standard deviation (1-sigma, 2-sigma, 3-sigma, etc.),
  - tag <ErrorType> sets the type of error range: [“Gaussian”, “Percentile”, “HPD”, “UltraNest”].
- “mcmc”, “emcee”: (parallelized) The emcee 1 package [ForemanMackey_2013], implements the affine-invariant ensemble sampler of [Goodman_2010], to perform a full-parallelized MCMC algorithm,
  
  Additional tags required by this algorithm:
  - tag <ErrorMethod> sets the used algorithm to compute errors, i.e. mcmc, ultranest, or ins.
  - tag <NumberSampler> sets the number of sampler
  - tag <BackendFileName> describes the path and name of the backend file, which is used to store and resume an interrupted MCMC run. If the given file does not exists before the MCMC run starts, XCLASS stores all MCMC parameters into a HDF5 file. In order to resume an interrupted MCMC run, the path and name of the corresponding HDF5 file has to be defined by this tag.
  - tag <SampleMethod> sets the initial sampling method. If the tag is set to local, the initial starting values of the MCMC samplers are drawn in a small sphere around the initial parameter values. If the tag is set to global, the initial starting values are randomly distributed within the given parameter limits.
  - tag <NumberBurnInIter> sets the number of iterations for burn-in phase
- “ultranest”: (parallelized) UltraNest 2 is a general-purpose Bayesian inference package [Buchner_2016], [Buchner_2019] for parameter estimation and model comparison. It is especially useful for multi-modal or non-Gaussian parameter spaces, computational expensive models, in robust pipelines. Furthermore, UltraNest is intended for fitting complex physical models with slow likelihood evaluations, with one to hundreds of parameters. In addition, it intends to replace heuristic methods like multi-ellipsoid nested sampling and dynamic nested sampling with more rigorous methods.
  
  Additional tags required by this algorithm:
  - tag <NumberSampler> sets the number of sampler
  - tag <BackendFileName> describes the path and name of the backend file, which is used to store and resume an interrupted UltraNest run.
  - tag <dictionary> describes additional parameters for the UltraNest package as python dictionary
  - tag <NumberBurnInIter> sets the number of iterations for burn-in phase
- “genetic”: (Genetic algorithm, only used with optimizer MAGIX, parallelized) a global optimization algorithm
  - tag <BestSiteCounter> sets the number of best sites.
- “pso”: (Particle swarm optimization, only used with optimizer MAGIX, parallelized) a global optimization algorithm
  - tag <BestSiteCounter> sets the number of best sites.
- “ins”: (Interval nested sampling algorithm, only used with optimizer MAGIX, parallelized) a global optimization algorithm
  - tag <vol_bound> get critical bound for volume,
  - tag <delta_incl> defines the difference between maximal and minimal value of inclusion function,
- “ns”: (Nested sampling algorithm, only used with optimizer MAGIX, parallelized) a global optimization algorithm.
- “bees”: (Bees algorithm, only used with optimizer MAGIX, parallelized) a global optimization algorithm
  - tag <BestSiteCounter> sets the number of best sites,
  - tag <NumberBees> sets the number of bees.
- “blank”: (only used with optimizer SCIPY or MapFit or CubeFit function) using this method the fit parameters are not optimized, but the synthetic spectra based on the given parameter values are computed.
The tag <number_iterations> (or <NumberIterations>) defines the max. number of iterations for the corresponding optimization algorithm and has to be \(>=1\).
The tag <NumberProcessors> defines the max. number of cores used for the corresponding optimization algorithm and has to be \(>=1\).

Note that a value \(>1\) is considered only for parallelized algorithms.
The tag <limit_of_chi2> (or <LimitChi2>) defines the stopping criterion for the \(\chi^2\) function. If the \(\chi^2\) function drops below this threshold, the algorithm is stopped.
The tag <RenormalizedChi2> defines if a renormalized \(\chi^2\) function is used.

\[\left(\chi^2_{\rm limit} \right)_\mathsf{renom} = \left(\sum_{i=1}^{N_\mathsf{exp}} \cdot N_\mathsf{points}(i) - N_\mathsf{par}\right) \cdot \left(\chi^2_{\rm limit} \right)_\mathsf{orig},\]

where \(N_{\rm exp}\) is the number of observation files \(N_{\rm points}(i)\) indicates the number of observation data points in the observation file \(i\); \(N_{\rm par}\) is the total number of all fit parameters; \(\left(\chi^2_{\rm limit} \right)_\mathsf{\rm orig}\) is the original unmodified value of \(\chi^2\).
The tag <SaveChi2> indicates the storage of the \(\chi^2\) function.
With the optimizer MAGIX, see Sect. “myXCLASSFit” the following optimization algorithms are available as MPI-parallelized implementations

“Levenberg-Marquardt”, “genetic”, “pso”, “ins”, “ns”, and “bees”.
With optimizer MAGIX the tag <MPIHostFileName> defines path and name of a host file used for MPI parallelized algorithms.
With optimizer SCIPY the tag <LogChi2> indicates the usage of a log file containing all \(\chi^2\) values with the corresponding parameter vectors computed during the application of each optimization algorithm.
With optimizer SCIPY the tag <PlotFlag> indicates the creation of plots describing the obs. data together with the synthetic spectra and \(\chi^2\) function for each obs. data file.

Example of an algorithm xml file:

<?xml version="1.0" encoding="UTF-8"?>
<FitControl>
    <!-- settings for fit process -->


    <!-- set number of used algorithms -->
    <NumberOfFitAlgorithms>2</NumberOfFitAlgorithms>


    <algorithm>
        <!-- define algorithm -->
        <FitAlgorithm>bees</FitAlgorithm>


        <!-- set max. number of iterations -->
        <number_iterations>30</number_iterations>


        <!-- set max. number of processors -->
        <NumberProcessors>8</NumberProcessors>


        <!-- set path and name of host file -->
        <MPIHostFileName>hostfile.txt</MPIHostFileName>


        <!-- settings for chi^2 -->
        <limit_of_chi2>0.001</limit_of_chi2>
        <RenormalizedChi2>yes</RenormalizedChi2>
        <DeterminationChi2>default</DeterminationChi2>
        <SaveChi2>yes</SaveChi2>


        <!-- set plot options -->
        <PlotAxisX>Frequency [Hz]</PlotAxisX>
        <PlotAxisY>Intensity</PlotAxisY>
        <PlotIteration>yes</PlotIteration>
    </algorithm>

    <algorithm>
        <!-- define algorithm -->
        <FitAlgorithm>Levenberg-Marquardt</FitAlgorithm>


        <!-- set max. number of iterations -->
        <number_iterations>20</number_iterations>


        <!-- set max. number of processors -->
        <NumberProcessors>8</NumberProcessors>


        <!-- set path and name of host file -->
        <MPIHostFileName>hostfile.txt</MPIHostFileName>


        <!-- settings for chi^2 -->
        <limit_of_chi2>0.0008</limit_of_chi2>
        <RenormalizedChi2>yes</RenormalizedChi2>
        <DeterminationChi2>default</DeterminationChi2>
        <SaveChi2>yes</SaveChi2>


        <!-- set plot options -->
        <PlotAxisX>Frequency [Hz]</PlotAxisX>
        <PlotAxisY>Intensity</PlotAxisY>
        <PlotIteration>yes</PlotIteration>
    </algorithm>
</FitControl>

Footnotes

1: https://emcee.readthedocs.io/en/stable/
2: https://johannesbuchner.github.io/UltraNest/index.html

References

ForemanMackey_2013: Foreman-Mackey, Daniel, David W. Hogg, Dustin Lang, and Jonathan Goodman. 2013. “Emcee: The MCMC Hammer.” Publications of the Astronomical Society of the Pacific 125 (925): 306. https://doi.org/10.1086/670067.
Goodman_2010: Goodman, Jonathan, and Jonathan Weare. 2010. “Ensemble Samplers with Affine Invariance.” Communications in Applied Mathematics and Computational Science 5 (1): 65–80.
Buchner_2016: Buchner, Johannes. 2016. “A statistical test for Nested Sampling algorithms.” Statistics and Computing 26 (1-2): 383–92. https://doi.org/10.1007/s11222-014-9512-y.
Buchner_2019: ———. 2019. “Collaborative Nested Sampling: Big Data versus Complex Physical Models” 131 (1004): 108005. https://doi.org/10.1088/1538-3873/aae7fc.