.. _lineid-short: LineID ====== This function identifies molecules in a given spectrum and returns a quantitative description of the data. Details of the input parameters are described in Sect. ":ref:`api-lineid`". In general, the **LineIdentification** function (or LineID function) takes all molecules into account which have at least one transition within the user-defined frequency range(s) [1]_. These molecules are stored into a so-called *molecule list*. In order to determine the contribution of each molecule, the **LineIdentification** function performs so-called *single molecule fits* for each molecule. If a molecule covers a defined fraction of the spectrum the molecule is “for now” identified and the optimized molfit file is append to a so-called *overall molfit file* which describes the contribution of all identified molecules. After all single molecule fits are done, the **LineIdentification** function performs a *final fit*, using the overall molfit file created before. |br| |br| Here's an example how to use the **LineID** function of the *XCLASS* package: :: >>> from xclass import task_LineIdentification >>> import os # get path of current directory >>> LocalPath = os.getcwd() + "/" # define path and name of default molfit file >>> DefaultMolfitFile = LocalPath + "files/my_LineID__default.molfit" # define path and name of obs. xml file >>> ObsXMLFileName = LocalPath + "files/my_observation__LineID.xml" # define list of molecules with are analyzed >>> SelectedMolecules = ["HCCCN;v=0;", "CH3OH;v=0;", "C2H5OH;v=0;", \ "CH3CN;v=0;", "SO;v=0;", "SO2;v=0;"] # define upper limit of overestimation >>> MaxOverestimationHeight = 500.0 # define tolerance >>> Tolerance = 65.0 # define path and name of algorithm xml files >>> AlgorithmXMLFileSMF = LocalPath + "files/my_algorithm-settings.xml" >>> AlgorithmXMLFileOverAll = LocalPath + "files/my_algorithm-settings.xml" # define lower limit for column density of core components >>> MinColumnDensityEmis = 0.0 # define lower limit for column density of foreground components >>> MinColumnDensityAbs = 0.0 # define source name >>> SourceName = "" # define list of so-called strong molecules >>> StrongMoleculeList = [] ## define path and name of cluster file >>> clusterdef = "" # call LineID function >>> IdentifiedLines, JobDir = task_LineIdentification.LineIdentificationCore( MaxOverestimationHeight = MaxOverestimationHeight, \ SourceName = SourceName, \ DefaultMolfitFile = DefaultMolfitFile, \ Tolerance = Tolerance, \ SelectedMolecules = SelectedMolecules, \ StrongMoleculeList = StrongMoleculeList, \ MinColumnDensityEmis = MinColumnDensityEmis, \ MinColumnDensityAbs = MinColumnDensityAbs, \ AlgorithmXMLFileSMF = AlgorithmXMLFileSMF, \ AlgorithmXMLFileOverAll = AlgorithmXMLFileOverAll, \ experimentalData = ObsXMLFileName, \ clusterdef = clusterdef) If an iso ratio file is defined, the **LineIdentification** function determines the isotopologues of all molecules in the molecule list using the given iso ratio file. The identified isotopologues are removed from the list of molecules for whom a single molecule fit has to be done. All single molecule fits are done without using isotopologues. The user defined iso ratio file is used again for the final overall fit, where all iso-molecules and their definitions, which were not identified, are removed. Please note, the *XCLASS* package offers the possibility to optimize iso ratios as well, see Sect. ":ref:`myxclass-iso`". The **LineIdentification** function offers different possibilities to control the single molecule fits, e.g. by using a so-called *default molfit file* or a so-called *source molfit file*, see below. The user is free to use the Levenberg-Marquardt algorithm for all fits, or to define a MAGIX xml file, described in Sect. ":ref:`myxclassfit-alg-xml-file`", which describes settings for another algorithm or for an algorithm tree. So, the user is able to control the accurateness of the whole line identification process. Additionally, the **LineIdentification** function is able to identify molecules in more than one spectrum (or frequency range). Hereby, the **LineIdentification** function performs single molecule fits where all frequency ranges of all spectra are fitted simultaneously. If a molecule contributes significantly to at least one frequency range and does not lead to artificial features in the modeled spectrum, the optimized molfit file of the current molecule is append to an overall molfit file. At the end of the line identification process the **LineIdentification** function fits all frequency ranges simultaneously using the overall molfit file containing all identified molecules. Depending on the number of identified molecules and used velocity components, the **LineIdentification** function has to optimize more than a few thousand free parameters in the final overall fit. The **LineIdentification** function creates a subdirectory within the job directory called ``single-molecule_fits``. Within this single molecule fit directory, the **LineIdentification** function creates subdirectories for each molecule from the database (or template file etc.). All files produced by the single molecule fits are stored in these subdirectories. Please note, that the original xml- and data files are not modified. Only the copies of these files located in the single molecule fit directories are modified! In addition, the **LineIdentification** function converts the column density :math:`N_{\rm tot}` and (if the hydrogen column density :math:`N_H` is given for each component) the hydrogen column density :math:`N_H` as well to a log scale, i.e. these two densities are converted automatically to their log10 values to get a better fit. At the end of the fitting process, the log10 values are converted back to the linear values. Note, the **LineIdentification** function offers the possibility to perform more than one single molecule fit at the same time using a cluster consisting of at least two computers. The nodes (computers) of the cluster are defined in the “cluster configuration file”, see input parameter ``clusterdef`` described below. For example, a cluster consisting of three nodes with four cores respectively offers the possibility to perform 12 (:math:`3 \times 4`) single molecule fits simultaneously. Please note, that the total number of cores used by the **LineIdentification** function is mostly even higher, because each single molecule fit requires further cores defined in the algorithm xml file as well. For example, using the Levenberg-Marquardt algorithm with eight processors (cores) for a single molecule fit on a cluster described above requires 96 (:math:`8 \times 12`) cores altogether. .. _`LineID:StrongMoleculeFits`: Strong molecule fits -------------------- In order to take the contribution of one or more so-called strong (i.e. highly abundant) molecules for each single molecule fit into account, the user can define a list of strong molecules. The **LineIdentification** function performs the single molecule fits for these strong molecules first, where it starts with the first strong molecule. If this molecule contributes to the given spectra, the optimized molfit file is append to all molfit files of all other molecules. For example, if the first strong molecule contributes to the given spectra, the optimized molfit file is append to the molfit file related to the second strong molecule and so on. Please note, the (optimized) parameters describing the contribution of a strong molecules are kept constant for all other single molecule fits. These parameters will be optimized only in the final overall fit. .. _`LineID:Contribution`: Does a molecule contribute? --------------------------- After finishing a single molecule fit the **LineIdentification** function reads in the modeled spectra for all frequency ranges and checks, if the modeled spectra contains at least one peak with intensity above the user defined noise level(s). (All peaks in the modeled spectra below the noise level(s) are completely ignored.) Thereafter, the **LineIdentification** function searches for artificial peaks, i.e. peaks in the modeled spectrum above the noise level, which have no corresponding peak in the observational data. For that purpose, the user has to define the global overestimation factor (in %, input parameter ``MaxOverestimationHeight``) valid for all single molecule fits. The **LineIdentification** function compares the intensities of the modeled and the observed spectra at the doppler-shifted transition frequencies :math:`\nu_{\rm Doppler}^i = \nu_t^i \cdot \left(1 - \frac{v_{\rm offset}^{m,c}}{c_{\rm light}} \right)`, where :math:`\nu_t^i` represents the :math:`i`\ th non-Doppler shifted transition frequency taken from the database and :math:`v_{\rm offset}^c` the velocity offset of component :math:`c` taken from the molfit file for the current molecule :math:`m`. By calculating the fraction :math:`\eta_{\rm Peak}` of intensities of modeled and observed spectra at these frequencies, i.e. .. math:: \eta_{\rm Peak} = \frac{I_{\rm model} \left(\nu_{\rm Doppler}^i\right)}{I_{\rm observed}\left(\nu_{\rm Doppler}^i\right)} \times 100 the **LineIdentification** function decides if a molecule is included or not. A peak is overestimated if :math:`\eta_{\rm Peak}` is larger than the overestimation limit defined by the input parameter ``MaxOverestimationHeight`` + 100 %. If the number of overestimated lines compared to the total number of Doppler-shifted transition frequencies :math:`\nu_{\rm Doppler}^i` of the current molecule is higher than the user defined tolerance (given by the input parameter ``Tolerance``) the current molecule is NOT identified. If the modeled spectrum does not overestimate the observed spectrum the corresponding molecule is “for now” identified and the fitted molfit file is added to the overall molfit file which is used at the end of the line identification process to determine the final contribution of each molecule. Furthermore, the **LineIdentification** function writes a short summary about the result of each single molecule fit to a file called ``results.dat`` located in the current job subdirectory. The file contains the input parameters, the min. and max. frequency of each frequency range, a list with all molecules considered in the defined frequency range(s), the noise level for the defined frequency range(s) (i.e. the minimal intensity of a line), and information about the molecule identification process. Additionally, the **LineIdentification** function creates a file called ``Identified_Molecules.dat`` located in the same directory containing all identified molecules, i.e. all molecules which contributes significantly to the spectra (controlled by the input parameter ``MaxOverestimationHeight``, see below). Additionally, the **LineIdentification** function creates a further subdirectory within the job directory called ``Intermediate_identified_molecules`` containing a plot of the fitted (continuum subtracted) spectrum together with the optimized molfit file for each identified molecule. Hereby the name of the optimized molfit file and the name of the spectrum plot file contains the name of the corresponding molecule plus the name of the obs. data file plus the lowest and highest frequencies of the corresponding frequency range. For example, the file ``CH3OH_v=0___sgrb2m.dat__342282.0_-_345282.0_MHz.png`` describes the modeled spectrum of the molecule CH\ :math:`_3`\ OH\ :math:`_{v=0}` together with the observational data from file ``sgrb2m.dat.png`` for the frequency range between 342282.0 MHz and 345282.0 MHz. In addition, each plot contains one (or two) horizontal blue dotted line(s) indicating the noise level [2]_, and one or more vertical green dashed lines describing the non-Doppler shifted transition frequencies, see :numref:`fig-IdentifiedMoleculePlot`. In order to control the identification process, the current job directory contains another subdirectory, called ``Not_identified_molecules``, including the plots of the non-identified molecules. The plots contain the same information as the plots of the identified molecules. Please note, the directory ``Not_identified_molecules`` contains no molfit files. .. figure:: ../figures/CH3OH__SMF.png :width: 100% :name: fig-IdentifiedMoleculePlot Result of a single molecule fit for CH\ :math:`_3`\ OH (continuum subtracted) together with the observed data (continuum subtracted). The horizontal green dotted lines indicate the transition frequencies of CH\ :math:`_3`\ OH, respectively. .. _`LineID:OverallFit`: Overall fit ----------- After finishing all single molecule fits for all frequency ranges, the **LineIdentification** function performs an overall fit with all identified molecules, where all frequency ranges are fitted simultaneously. For this overall fit, the **LineIdentification** function creates a further subdirectory within the current job directory called ``all``, where all required MAGIX files are stored in. The molfit file for this overall molfit file is made up by merging all optimized molfit files of the identified molecules from the single molecule fits. After the overall fit is done, the **LineIdentification** function determines the contribution of each molecule described in the optimized overall molfit file using the *myXCLASS* function. For that purpose the **LineIdentification** function creates a subdirectory within the ``all`` subdirectory called ``final_fit``, which describe the spectrum of each molecule for each frequency range using the *myXCLASS* function, creates a plot showing the (continuum subtracted) observational data together with the (continuum subtracted) modeled spectrum of a certain molecule, and stores this plot file in a further subdirectory named with the name of the current molecule within the ``final_fit`` directory. The name of the plot file is made up of the name of the molecule, the name of the observational data file and the lowest frequency (in MHz) of the corresponding frequency range. The plots for each molecule contribution contain one or more vertical green dotted lines indicating the transition frequencies stored in the database for the corresponding molecule in the given frequency range, see :numref:`fig-MoleculeContriubitionPlot`. In addition the horizontal blue dotted line(s) indicate(s) the band of noise, as described above. .. figure:: ../figures/C-13-H3OH_v=0.png :width: 100.0% :name: fig-MoleculeContriubitionPlot Example of a plot showing the contribution of a single molecule (here \ :math:`^{13}`\ CH\ :math:`_3`\ OH\ :math:`_{v=0}`) after finishing the overall fit. The vertical green dotted lines indicate the transition frequencies in the database in the given frequency range. .. ---------------------------------------------------------------------------------------- Footnotes --------- .. Footnotes .. [1] By using the SQL parameter in the *default molfit file*, described in Sect. ":ref:`myxclass-molfit-sql`", the user can limit the number of transitions. .. [2] The second blue line is important for absorption: Only absorption lines below the second (lower) blue line are considered. .. ---------------------------------------------------------------------------------------- References ---------- .. citation reference .. ---------------------------------------------------------------------------------------- .. hack to get extra blank line .. |br| raw:: html