Merge branch 'master' of github.com:cschreib/fastpp

cschreib · cschreib · commit 1254950a5d59 · 2019-03-15T10:55:09.000Z
diff --git a/README.md b/README.md
@@ -295,7 +295,7 @@ To get the closest behavior to that of FAST-IDL, you should set ```C_INTERVAL=68
 
 ## Monte Carlo simulations
  * ```SAVE_SIM```: possible values are ```0``` or ```1```. The default is ```0```. If set to ```1```, the program will save the best fitting parameters in the Monte Carlo simulations of each galaxy in a FITS table located at ```best_fits/[catalog]_[source].sim.fits```. This table contains the values of all fitting parameters and the chi2. This will consume some more disk space, but will not slow down the program significantly. It can be useful to identify covariances and degeneracies that are not apparent from the confidence intervals printed in the output catalog.
- * ```BEST_FROM_SIM```: possible values are ```0``` or ```1```. The default is ```0```, and the best fitting solution will be chosen as the one providing the smallest chi2 value in the grid. If set to ```1```, the program will instead determine the best solution from the median of all the Monte Carlo simulations. This will ensure that the "best fit" values are more consistent with the confidence intervals (i.e., usually more centered), and erases large fluctuations when multiple solutions with very different fit parameters lead to very close chi2 values. This typically happens for galaxies with poor photometry: there are a large number of models which give similarly good chi2, but one of them has a chi2 better by a very small amount (say 0.001) and it thus picked as the "best fit".
+ * ```BEST_FROM_SIM```: possible values are ```0``` or ```1```. The default is ```0```, and the best fitting solution will be chosen as the one providing the smallest chi2 value in the grid. If set to ```1```, the program will instead determine the best solution from the median of all the Monte Carlo simulations. This will ensure that the "best fit" values are more consistent with the confidence intervals (i.e., usually more centered), and erases large fluctuations when multiple solutions with very different fit parameters lead to very close chi2 values. Such problematic cases typically happen for galaxies with poor photometry: there are a large number of models which give similarly good chi2, but one of them has a chi2 better by a very small amount (say 0.001) and it thus picked as the "best fit". Note that, when this option is set together with ```BEST_FIT=1```, the code will no longer output the model that best fits the catalog fluxes; instead it will output the "median" best-fit model from the Monte Carlo simulations (defined as the MC model with the smallest Euclidian distance to the other MC models). This median model may not provide the smallest possible chi2 to the original data, although by construction is should still be reasonably close to that optimal chi2.
 
 ## Confidence intervals from chi2 grid
  * ```INTERVAL_FROM_CHI2```: possible values are ```0``` or ```1```. The default is ```0```, in which case confidence intervals (error bars) are computed using a series of Monte Carlo simulations of the data. An alternative, much faster way to obtain similar results is to read off the range of parameter values covered by models in the grid, selecting only the models that have a chi2 within a certain threshold from the best value (see Avni 1976). This can be done by setting this value to ```1```. While in most cases the resulting confidence intervals will be almost the same as with the MC simulations, there are several advantages to using this approach. The first is that, by construction, the resulting confidence intervals will always include the best fit value; this is not guaranteed with MC simulation. The second is that this method behaves better when the best fit solution is close to the edge of the grid; intervals from MC simulations are computed from percentiles and therefore may under-represent edge values. The latter is particularly important for parameters that have a natural upper/lower bound, such as parameters that cannot be negative but for which zero is a perfectly fine value (e.g., Av, or some SFH parameters). There are some drawbacks however. To obtain precise confidence intervals, the grid must be relatively fine: if the true confidence interval of a parameter is 1 +/- 0.01, but the fitting grid for that parameter only has a step of 0.1, then the code will report 1 +/- 0, which is too optimistic. Finally, in terms of performance, for this option to work the program has to write part of the chi2 grid on the disk, and the amount of space required can be large if you have a very fine grid.
diff --git a/src/fast++-write_output.cpp b/src/fast++-write_output.cpp
@@ -221,18 +221,47 @@ void write_best_fits(const options_t& opts, const input_state_t& input, const gr
     vec1f lam, sed, sed_nodust, flx;
     auto pg = progress_start(input.id.size());
     for (uint_t is : range(input.id)) {
-        float scale = output.best_params(is,gridder.nparam+prop_id::scale,0);
+        float scale = finf;
+        uint_t model = npos;
+
+        if (!opts.best_from_sim) {
+            // Get the model with smallest chi2
+            scale = output.best_params(is,gridder.nparam+prop_id::scale,0);
+            model = output.best_model[is];
+        } else {
+            // Get the "median model" among MC simulations
+            // NB: the median is computed on the multidimensional space of the fitting grid,
+            // yet there is no unique definition of the median in N-dimensional space. Here
+            // we pick the model with the smallest distance to the other models
+            double best_dist = dinf;
+            for (uint_t imc : range(opts.n_sim)) {
+                vec1i idm = vec1i{gridder.grid_ids(output.mc_best_model(is,imc))};
+                double dist = 0.0;
+                for (uint_t imc2 : range(opts.n_sim)) {
+                    if (imc2 == imc) continue;
+
+                    vec1i idm2 = vec1i{gridder.grid_ids(output.mc_best_model(is,imc2))};
+                    dist += sqrt(total(sqr(idm - idm2)));
+                }
+
+                if (dist < best_dist) {
+                    best_dist = dist;
+                    scale = output.mc_best_props(is,prop_id::scale,imc);
+                    model = output.mc_best_model(is,imc);
+                }
+            }
+        }
 
         // Get model
         if (opts.intrinsic_best_fit) {
-            if (!gridder.build_template_nodust(output.best_model[is], lam, sed_nodust, flx)) {
+            if (!gridder.build_template_nodust(model, lam, sed_nodust, flx)) {
                 return;
             }
 
             sed_nodust *= scale;
         }
 
-        if (!gridder.build_template(output.best_model[is], lam, sed, flx)) {
+        if (!gridder.build_template(model, lam, sed, flx)) {
             return;
         }