====================================================================== === QPACK === ====================================================================== ===== Status, bug reports, etc. ====================================== Qpack is being implemented by Patrick Eriksson and Carlos Jimenez. The package is under development and quick changes from the outline below can be made. The package is shared following the philosophy of the GNU public license, this including no warranty what so ever. Comments, suggestions and bug reports shall be sent to: patrick@rss.chalmers.se jimenez@rss.chalmers.se ===== General information ============================================ Qpack is a package of Matlab functions for retrieval work using ARTS as computational engine. Beside functions for the actual inversion and setting up the retrieval quantities, the package includes functions to characterize the retrieval and to optimize the calculation grids. The input to ARTS is provided in two ways, by control file templates and a structure with settings, denoted as Q. The final control files are created by the Qtool in AMI. All input not given by the template files must be specified by Q. The fields of Q are defined at the end of this file. For each retrieval/error quantity there exist a do-field, for example Q.TEMPERATURE_DO. If the do-field is set to zero, the quantity will be neglected totally and the other fields of the quantity can be left undefined. For retrieval/error quantities, the do-field can have a value between 0 and 3: 0 neglect the quantity totally 1 neglect during retrieval but calculate error for this quantity 2 treat as observation error 3 retrieve this quantity However, some quantities, such as calibration error, cannot be retrieved and the value 3 is not allowed. Measurement thermal is always turned on. Fields of Q ending with "_ON" shall be treated as booleans. For example, Q.CHBIN_ON=0 means that no binning of channels shall be performed. If a on-field is set to zero, the other fields of the quantity can be left undefined. The Q structure shall be defined in a function taking Q as only input and also returning Q, called a QDF. These functions shall not be called directly, the Q structure is instead obtained by the function qpack which calls the specified QDF. The function qpack starts with setting all do and on variables to zero. In this way it is not necessary to change all old files with definitions of Q when a new retrieval/error variable is introduced. The qpack function makes also some initial manipulations not handled elsewhere, such as setting the global varaible REPORT_LEVEL used by the out.m in AMI to the value of Q.QP_LEVEL. See further the help of qpack. Variables with a low influence of the atmospheric state are pre-calculated. Quantities such as H, Sx and Se are pre-calculated, while Kx is not. Most input files must be in the ARTS ascii format. This is true for all grids read directly by ARTS. Some files read by Qpack functions can either by ascii or binary (for example, sensor definitions), but to be safe, use ascii as this format always works. The file format for the different quantities is commented below the first corresponding variable. Please note that the 10-log (decades) is used as altitude coordinate when defining covariance matrices. All covariance matrices are read by the AMI function sFromFile (see on-line help for a general definition of the file format). Some quantities are expected to be calculated before doing an inversion or characterization of the retrieval performance. The following main quantities shall be pre-calculated: H, Sx and Se. Together with these main variables, some derived variables are created for higher flexibility. [!! Describe this !!] The standard order for the pre-calculations is Sx, H and Se, but it is only necessary to calculate Sx before H when eigenvector data reduction is applied. On the other hand, H must always be calculated before Se (to include a possible data reduction in Se by Hd). The pre-calculated quantities are stored using the Matlab format (.mat files). See Qpack strategy section for a bit more of info. ===== Package organization and structure ============================ Qpack is structured into several sub-folders: === ./ === The only function in the main folder is a function to init the Qpack. === Templates/ === The control file templates are placed here. The extension of the templates is ".tmplt". === Main/ === This folder contains the main functions of Qpack. The main functions (beside qpack.m) start with "qp_", while so called in-line functions evoked by the cfile templates starts with "qpi_". === CLS/ === The functions for doing inversions by constrained least squares, that is, OEM and Tikhonov regularization. Ordinary least squares is also handled by these functions for simplicity. These functions starts with "qpcls_", beside the function doing the actual inversions that is called just qpcls.m. === Setup/ === This folder contains various help functions to create and "optimize" the input files. (I am not sure how much stuff that will be implemented, but this is at least the general idea.) The functions in this directory starts with "qpmake_" and "qpopt_". === Samples/ === This folder contains a set of sample files to get started with Qpack. The example case corresponds to the 501.4 GHz band of Odin-SMR, but please note that the files are only provided as examples and do not make a perfect basis for simulating the Odin observations. These files form a full scale, but not extremely large, test case. In this directory you find a definition of the Q structure matching the files in the sub-folders. Copy this file (sample_q.m) to your test directory. Edit the path of the directories to match your environment. For example, if you want the pre-calculated quantities to be stored in the folder /home/xxx/Test, you set Q.OUT to this folder. After this, you should be ready to test Qpack. ===== To include new retrieval and error variables =================== To include a new retrieval variables the follwong steps are needed: 1. Introduce the needed Q-fields. 2. Modify qp_Sx. 3. Modify qp_kinfo. 4. Modify qpi_Kx or qp_Kx. 5. Modify qpoem: a. Include variable in map_x. b. Move retrieved values to X. 6. Check if qpoem_invchar must be modified. ===== The fields of Q ================================================ The Qpack fields are included here for completion, but check also Qpack/ChangeLog for an understanding of Qpack and possible changes not reflected here. 1 Report levels 2 Names and directories 3 Atmospheres 4 Tag groups 5 Spectroscopy 6 Hydrostatic equilibrium and RTE 7 Calculation grids 8 Sensor 9 Data Reduction 10 Thermal noise 11 Retrieval/error quantities 12 Conditional simulations 13 OEM retrievals === 1 Report levels === QP_LEVEL The report level for the Qpack functions (beside messages directly from ARTS). The number 0 means no reports, while the numbers 1-3 mean increasing level of verbosity. The function out.m in AMI is used for printing inside the Qpack functions. ARTS_LEVEL ARTS is called with this report level for the screen. === 2 Names and directories === ARTS The full path to the ARTS executable to run, e.g. /u/patrick/arts/src/arts OUT Name or full path for storing pre-calculated quantities. For example: OUT = 'out' OUT = '/u/patrick/SMR/out' TMP_AREA All the calculations are performed in temporary sub-folders to TMP_AREA. The name of the temporary folders is TmpXXXXXX where XXXXXX is a random number. The sub-folder is deleted when the calculations are finished. SPECTRO_DIR Full path of the directory for line files and definition of continua. CALCGRIDS_DIR Full path of the directory where the calculation grids are stored (f_mono, za_pencil and p_abs). SENSOR_DIR Full path of the directory for sensor files. RETRIEVDEF_DIR Full path of the directory for grids and definition of covariance matrices. === 3 Atmospheres === APRIORI_VMR Full path for the a priori VMR profiles. There shall exist a file for all needed species. This data are only used during the actual inversion or when generating rand data. APRIORI_PTZ Full path for a priori pressure, temperature and altitude file. The full file name, including extension, shall be given, this in contrast to APRIORI_VMR where only the main path shall be specified. SETUP_VMR SETUP_PTZ One or several atmospheres to be used during the pre-calculations. The format for the VMR profiles and the PTZ files follows the a priori atmosphere (APRIORI_VMR_DIR and APRIORI_PTZ). Each atmosphere corresponds to a string in an ARTS string array (see RETRIEVAL_TAGS). When setting up grids and similar operations, all given atmospheres are considered to avoid "over-fitting" to a special atmospheric condition. When only one atmospheric state can be considered, the first atmosphere is selected. See below for its use with rand data. ==== 4 Tag groups === RETRIEVAL_TAGS The tag groups to be retrieved. The tag groups shall be given as a Matlab string containing an ARTS string array (also called a string-string). For example: RETRIEVAL_TAGS = '"O3","ClO"' OTHER_TAGS The tag groups beside the retrieval tags. Format as above. === 5 Spectroscopy === LINEFILE Name on file defining which transitions to consider for the line- by-line calculations. The line file should be in the ARTS format. LINESHAPE LINESHAPE_FACTOR LINESHAPE_CUTOFF Line shape variables. Corresponds to the lineshape workspace variable (WSV) in ARTS. CONTINUA The name of the file defining continua. This file shall look like the part of an ARTS control file where the continua are described. The cont_descriptionInit command shall not be part of the file. === 6 Hydrostatic equilibrium and RTE === HSE_IN_ON Boolean to ensure that the input atmosphere fulfills hydrostatic equilibrium. The field is only used when reading the atmosphere for the first time HSE_RETRIEVAL_ON Boolean to tell if hydrostatic equilibrium shall be assumed for the retrieval. If this field is set to 1, temperature weighting functions are calculated with hydrostatic eq. and z_abs is recalculated for each iteration. Note that this field can be important for tropospheric water retrievals even if the temperature is not retrieved. HSE_PREF Pressure of the reference point for hydrostatic equlibrium. HSE_ZREF Pressure of the reference point for hydrostatic equlibrium. PLATFORM_ALTITUDE Corresponds to the ARTS WSV z_plat. STEPLENGTH_RTE Corresponds to the ARTS WSV l_step. GROUND_ALTITUDE GROUND_EMISSION Corresponds to the ARTS WSVs z_ground and e_ground, respectively. The ground temperature is set to the the first element of t_abs. REFRACTION_ON Boolean to turn on refraction. REFR_METHOD The method to use for calculation of refractive index. It is assumed that the method does not need any specific input. For example: REFR_METHOD = 'Boudouris' REFR_LFAC Corresponds to the ARTS WSV with the same name. EMISSION_ON Boolean to include emission in the calculations. === 7 Calculation grids === P_ABS F_MONO ZA_PENCIL The file (with extension) in GRID_DIR defining p_abs, f_mono and za_pencil, respectively. The files must be in ascii format. === 8 Sensor === Some information about the sensor calculations is found in the chapter "Sensor modeling" in AUG. F_ORDER ZA_ORDER These variables define if the spectra shall be treated as piece- wise linear or cubic when setting up the sensor H-matrix. The value 1 means piece-wise linear and the value 3 piece-wise cubic. ANTENNA_ON Boolean to consider the antenna pattern. ZA_SENSOR The zenith angles observed by the sensor (the middle points of the antenna pattern). This file is only read if an antenna is considered. If the antenna is neglected, ZA_SENSOR is set to ZA_PENCIL (in qp_H). ANTENNA_FILE The file in SENSOR_DIR defining the antenna pattern. See hAntennaFromFileAdv for file format. ANTENNA_ORDER Determines if the antenna pattern shall be treated to be either piece-wise linear (1) or cubic (3). See further F_ORDER. ANTENNA_MOVE Movement of the antenna (unit is degrees) during the integration. A constant scanning velocity is assumed. The value 0 means a fixed antenna during the integration. DSB_ON Boolean to consider the image sideband. DSB_FILE The file in SENSOR_DIR defining the response of the sideband filter. See hMixerFromFile for file format. DSB_ORDER Determines if the sideband filter response shall be treated to be either piece-wise linear (1) or cubic (3). See further F_ORDER. DSB_FPRIMARY Some frequency of the primary band (to indicate if the upper or lower band is the primary band). DSB_LO The LO frequency. BACKEND_ON Boolean to consider the backend (spectrometer). F_SENSOR The frequencies recorded by the sensor (the middle points of the backend channels). This file is only read if a backend is considered. If both mixer and backend are neglected, F_SENSOR is set to F_MONO (in qp_H). If backend is neglected but a mixer is included, F_SENSOR is set to the frequencies in the primary band (in qp_H). BACKEND_FILE The file in SENSOR_DIR defining the backend channel response. The response of all channels are assumed to be identical. See hBackendFromFile for file format. BACKEND_ORDER Determines if the channel response shall be treated to be either piece-wise linear (1) or cubic (3). See further F_ORDER. === 9 Data reduction === BINVIEW_ON Boolean to turn on binning of subsequent spectra. BINVIEW_DATA Corresponds to the input vector bins of hBinView. BINNING_ON Boolean to turn on binning of neighboring channels. BINNING_FILE Full path of file defining the binning pattern. See hBinFile for file format. KRED_ON Boolean to turn on data reduction by the Kx-matrix version of the Hotelling transformation. The basic idea is to do a Hotelling transformation of the spectral space. But as the spectral variability can be decomposed as: Sy = KxSxKx' + KbSbKb' + Se and we are interested in capture the variability of the parameters to be retrieved (x), the transformation is calculated only from that variability: KxSxKx' = E V E' where E are the eigenvectors and V the eigenvalues. The Hotteling transformation is derived as: 1. Notice that KxSxKx' = Kx sqrt(Sx) sqrt(Sx)' Kx' = Kx sqrt(Sx) (Kx sqrt(Sx))' where sqrt(Sx) is one of the possible square roots of Sx. 2. Then a SVD Kx sqrt(Sx) = U M V' where U and V are left and right singular vectors. 3. The eigenvectors are then U as from (1): E V E' = U M V' (U M V')' = U M M U' KRED_N Number of eigenvectors to keep in the data reduction. KRED_DEPTH How to do the SVD of Kx sqrt(Sx) during reduction, posibilities: Different assumptions are implemented for sqrt(Sx): 1.Q.KRED_DEPTH = 0 sqrt(Sx) = I i.e. Sx is assumed to be the identity matrix 2.Q.KRED_DEPTH = 1 sqrt(Sx) = sqrt(diag(Sx)) i.e. Sx is assumed to be diagonal 3.Q.KRED_DEPTH = 2 sqrt(Sx) = chol(Sx) i.e. full Sx is used and the sqrt is done by a Choleski decomposition. More info in: Eriksson, P., Jimenez, C., Búhler, S. and Murtagh, D. "A Hotteling Transformation approach for rapid inversion of atmospheric spectra", J. Quant Rad. Spectroscopy, pp 2001. LRED_ON Boolean to turn on data reduction by the limb version of the Hotelling transformation. By Limb it means a special reduction based on combining the reduction based on Kx with a prior optimization of the spectral input from a limb observation regarding each specific point of a retrieval grid. This reduction is useful for a retrieval method where each element of the retrieved species state vector is obtained independently, so the spectral input in the limb observation can be optimized for each element. So far we use it with the net retrievals from the package Npack. The basic idea is to decide which elements for the spectral input are left for each retrieval point, and then to proceed with a normal Hotelling transformation of the new spectral space. In practice is done by doing the Hot transf in a Kx padded with zeros to leave only the rows (part of the scan) and columns (the retrieval tag depending on LRED_INDTAGS) corresponding to each retrieval tag and point. LRED_HISTEP This parameter gives the upward distance from the retrieval point to seletc what upper part of the scan to include in the reduction. If a tangent altitude corresponding to a zenith angle is there, the corresponding spectrum is kept. LRED_LOSTEP As above but to determine the downward distance. LRED_INDTAGS Flag, 1 only the Kx corresponding to the retrieval tag is left, 0 the whole Kx is used. LRED_DEPTH As KRED_DEPTH LRED_KGRIDS This gives the retrieval grids where the reduction method (e.g neural nets) will be doing the retrieval. Notice that they can be different from SPECIES_KGRIDS in a Q structure use to generate rnd data (to allow the calculation of Kx with SPECIES_KGRIDS in a fine grid, independent of the retrieval grid). Q.LRED_N As KRED_N === 10 Thermal noise === Some information about the thermal noise is found in the chapter "Measurement errors" in AUG. MEASNOISE_DO Treatment of measurement thermal noise (0-2). MEASNOISE_COVMAT The file specyfying a covariance matrix to generate measurement noise. The file data shall have three columns, where the first column holds frequencies and the second column gives the standard deviations and the third one gives correlation length (in frequecny as frequency is the abcisa here. Linear interpolation is applied to get values for other frequencies (but no extrapolation). See further the AMI function sFromFile. CALINOISE_DO Treatment of calibration noise (0-2). This noise is correlated between subsequent viewing angles, in contrast to the measurement noise. CALINOISE_COVMAT As MEASNOISE_FILE but for the calibration noise. === 11 Retrieval/error/random quantities === SPECIES_KGRIDS The retrieval grid for each RETRIEVAL_TAGS as a string-string. For example: SPECIES_KGRIDS = '"grid1.aa","grid2.aa"' The grids shall be defined in files found in RETRIEVDEF_DIR. SPECIES_COVMATS Definition file of Sx for each retrieval tag as a string-string. The file format of these files are specified in the on-line help for the AMI method sFromFile. The altitude coordinate when specifying the covariance matrices is the 10-log of the pressure, or decades. A decade is about 16 km. The std is given in relative units e.g 0.3 means 30% of species apriori concentration. SPECIES_POSITIVE_ON Flag to include a positive constraint for the species retrieval (by retrieving the log of the species profiles). This feature is only activated for non-linear retrievals. RECALC_ABS_ON Flag to force re-calculation of absorption in each iteration of iterative inversions. Absorption will be automatically be re- calculated if temperature is retrieved. This flag can be used when, for example, the self-broadening of water can change notably between the a priori and true states. TEMPERATURE_DO Treatment of temperature (0-3, see the top of the file). TEMPERATURE_KGRID The file defining the retrieval/error grid for temperature. TEMPERATURE_COVMAT The file defining the temperature covariance matrix. Notice that the std is given in absolute units e.g. K and not in relative units as the species. POINTING_DO Treatment of pointing off-sets (0-3). POINTING_PDF Treatment of pointing offset pdf, 'gaussian' and 'uniform' can be used to fill this field. POINTING_STDV Standard deviation for the pointing off-set if 'gaussian' or half amplitude of pdf if 'uniform', realizations assumed to be centered around the nominal zenitth angle. POINTING_DELTA Perturbation to apply for calculation of the weighting function. Unit is degrees. POINTING_COR Set to 1, porinting error perfectly correlated, i.e. it's a pointing offset; set to 0, totally uncorrelated; so far no more refined schemes set (e.g linear drifts). CONTABS_DO Treatment of continuum absorption (0-3). The CONTABS_ORDER Polynomial order for the continuum fitting. CONTABS_LIMITS A vector of length 2 specifying the frequency limits of the fit. If this field is empty (but exists), the limits are set to the first and last value of F_MONO. CONTABS_KGRID The file defining the retrieval/error grid for continuum absorption. CONTABS_COVMAT The file defining the continuum absorption covariance matrix. The standard deviations shall be given in percentages with respect to the absorption of the species in CONTABS_REF_SPECIES. The given percentages are converted to absorption values [1/m] by calculating the absorption of the reference species and multiplicate the percentages with the absorption at the fitting points. CONTABS_REF_SPECIES Reference species when defining the standard deviations in the covriance matrix for continuum absorption. See further above for CONTABS_COVMAT. EGROUND_DO Treatment of ground emission (0-3). EGROUND_LIMITS A common value for the ground emission is used inside each specified frequency range. Values outside the range of F_MONO can be given. The values are given as a vector and gives the starting point of each range, beside the last value that gives the end point of the last range. Range 1 ends where range 2 starts etc. The ranges for EGROUND_LIMITS=[f1 f2 f3 f4] are: f1 <= f < f2 f2 <= f < f3 f3 <= f <= f4 An example: if the ground emission is assumed to be different in primary and image bands, but to be constant inside each band, EGROUND_LIMITS can be set to [0 f_lo 1e99] where f_lo is the LO frequency. If parts of F_MONO are outside the specified ranges, no retrieval of the ground emission will be perfomed for those parts. If EGROUND_LIMITS is set to be empty, this means that each frequency shall be treated seperately. EGROUND_COVMAT Definition file of Sx/Sb for the ground emission. Standard deviations and correlation lengths are given as functions of frequency. The mean of the first and last element of F_MONO inside each range given by EGROUND_LIMITS is used when interpolating the given values. EGROUND_MINMAX A vector with length 2 specifying the lower and upper limit for allowed values on the ground emission. If the vector is set to be empty, all values are allowed. These limits are applied only during the iterations of an inversion. When convergence has been reached, the retrieved values are returned even if they are outside the given min and max values. The min/max values can accordingly be used to keep the ground emission inside the range 0 and 1 during the iterations to avoid unphysical conditions. POLYFIT_DO Treatment of polynomial fit to baseline ripple (0-3). POLYFIT_ORDER Order of polynomial fit. POLYFIT_COVMATS Definition file of Sx/Sb for each polynomial coefficient, given as a string-string. The number of files shall be POLFIT_ORDER+1. The standard deviations are given in intensity unit. The correlation length refers to the zenith angles and the unit is accordingly degrees. PPOLYFIT_DO PPOLYFIT_ORDER PPOLYFIT_COVMATS PPOLYFIT_LIMITS Variables for piecewise polynomial baseline fit. The variables PPOLYFIT_DO, PPOLYFIT_ORDER and PPOLYFIT_COVMATS act as the corresponding POLYFIT variables. All the polynomials have the same order and a priori uncertainty. The variable PPOLYFIT_LIMITS is a vector with the limits between the different polynomials. The end limits must be specified and the length of PPOLYFIT_LIMITS is accordingly the number of polynomials + 1. === 12 Conditional simulations === NUMBER_DO If different from 0 a random set following the statistics given in the _DO variables will be calculated. Those _DO variables larger than 0 are included in the caluclations. NUMBER_DO gives the number of rnd realizations for each variable. The number of spectra created is also NUMBER_DO, i.e for spectrum i the i realizations for each variable defined a state i used to generate spectrum i. See below generating random sets. === 13 Retrievals === Retrievals by optimal estimation (OEM), Tikhonov regularization (TR) and ordinary (weighted) least squares (WLS) are here handled with the same functions as far as possible. The name constrained least squares is used as a common name for the methods (WLS is not a CLS method, but is included here for simplicity). NOTE that so far only OEM is working. RETRIEVAL METHOD String with the acronym of the retrieval method to use. Allowed options are: 'oem' CLS_NONLIN_ON Boolean to turn on nonlinear retrievals. The Marquardt- Levenberg scheme is used for the iterations. See qp_oem for more details. All CLS variables below are only needed for non-linear inversions. CLS_GA_START_VALUE Start value for gamma in the Marquardt-Levenberg iteration. CLS_GA_FAC_WHEN_OK The factor with which gamma is decreased when an iteration is succesful. CLS_GA_FAC_WHEN_NOT_OK The factor with which gamma is increased when an iteration is not successful. CLS_GA_UPPER_LIMIT When gamma equals or is above this value, the iterations are stopped. CLS_STOP Stop criterion for the iteration. The quantity used to check if convergence is reached is the change of the state vector from one iteration to next, scaled with Sx and the length of x. See qp_oem for details. CLS_MAX_ITER An upper limit on the number of iterations to be performed. ===== Qpack strategy =============================================== 1 Doing a measurement characterization 2 Doing an optimal inversion 3 Generating random sets === 1 Doing a meassurement characterization === The _DO fields from 1-11 are treated as: - 0 the variable is ignored, i.e. no error is calculated - 1 calculates error as Dy K S K' Dy' but it's not treated as observation error, i.e it was not included in the derivation of Se and Dy - 2 treats the variable as observation error, i.e it is included in Se and used in deriving Dy, and of course the error is calculated as 1. - 3 the variable will be retrieved, so it's part of the vector state x and the retrieval error will be calculated by adding the contribution from all the variables treated as 1 and 2. The statistics for the characterization are described by the _COVMATS and _KGRID fields. Use qp_invchar or qpcls_invchar. === 2 Doing an optimal inversion retrieval === The retrieval variables are given by _DO as 3 and the variables treated as observation errors by _DO as 2. Variables _DO 1 or 0 are ignored here. The retrieval grids are given by _KGRID fields, and the statistics to construct the Sx matrix are taken from _COVMATS. If given an observation Qc is a structure for characterization and Qo for OEM inversion, the normal procedure will be Qo = Qc + OEM fields, but in principle it is possible to have differences, e.g. _COVMATS, we characterize with the best information but we retrieve by constraining the solution with different statistics - authors of the document not very keen on this ,-) Use qpcls. === 3 Generating random sets ==== NUMBER_DO gives the number of realizations, variables with _DO fields > 0 are disturbed around the a priori state with the variability given by _COVMAT. _KGRID fields are only used in a reduction based on Kx is done, in which case the retieval grid is needed. APRIORI_VMR_DIR APRIORI_PTZ SETUP_VMR_DIRS SETUP_PTZ If no SETUP fields are given the APRIORI fields give the atmosphere where to take the species profiles to be randomly modified. If SETUP fields are given the APRIORI fields will be ignored. The idea is to generate sets of profiles-spectra where the profiles correspond to a more than one mean state. Thing can get a bit tricky, for instance if as part of the sensor a reduction based on Kx is done, the APRIORI fields will be used as a priori atmosphere when generating the Kx, so it might be a good idea if the APRIORI atmosphere is a mean state from the SETUP atmospheres. By qp_rnd_atm, qp_rnd_sensor and qp_rnd_atmxsensor. Files saved [Q.OUT'.ybatch'] with raw spectra, [Q.OUT'.measnoise'] and so on with rand fields from the sensor, and [Q.OUT'.ysensor'] with simulated spectra at the sensor output.