Skip to content

Commit e9421a0

Browse files
committed
Added INTERVAL_FROM_CHI2
1 parent a460b9b commit e9421a0

File tree

5 files changed

+165
-45
lines changed

5 files changed

+165
-45
lines changed

README.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -295,6 +295,9 @@ To get the closest behavior to that of FAST-IDL, you should set ```C_INTERVAL=68
295295
* ```SAVE_SIM```: possible values are ```0``` or ```1```. The default is ```0```. If set to ```1```, the program will save the best fitting parameters in the Monte Carlo simulations of each galaxy in a FITS table located at ```best_fits/[catalog]_[source].sim.fits```. This table contains the values of all fitting parameters and the chi2. This will consume some more disk space, but will not slow down the program significantly. It can be useful to identify covariances and degeneracies that are not apparent from the confidence intervals printed in the output catalog.
296296
* ```BEST_FROM_SIM```: possible values are ```0``` or ```1```. The default is ```0```, and the best fitting solution will be chosen as the one providing the smallest chi2 value in the grid. If set to ```1```, the program will instead determine the best solution from the median of all the Monte Carlo simulations. This will ensure that the "best fit" values are more consistent with the confidence intervals (i.e., usually more centered), and erases large fluctuations when multiple solutions with very different fit parameters lead to very close chi2 values. This typically happens for galaxies with poor photometry: there are a large number of models which give similarly good chi2, but one of them has a chi2 better by a very small amount (say 0.001) and it thus picked as the "best fit".
297297

298+
## Confidence intervals from chi2 grid
299+
* ```INTERVAL_FROM_CHI2```: possible values are ```0``` or ```1```. The default is ```0```, in which case confidence intervals (error bars) are computed using a series of Monte Carlo simulations of the data. An alternative, much faster way to obtain similar results is to read off the range of parameter values covered by models in the grid, selecting only the models that have a chi2 within a certain threshold from the best value (see Avni 1976). This can be done by setting this value to ```1```. While in most cases the resulting confidence intervals will be almost the same as with the MC simulations, there are several advantages to using this approach. The first is that, by construction, the resulting confidence intervals will always include the best fit value; this is not guaranteed with MC simulation. The second is that this method behaves better when the best fit solution is close to the edge of the grid; intervals from MC simulations are computed from percentiles and therefore may under-represent edge values. The latter is particularly important for parameters that have a natural upper/lower bound, such as parameters that cannot be negative but for which zero is a perfectly fine value (e.g., Av, or some SFH parameters). There are some drawbacks however. To obtain precise confidence intervals, the grid must be relatively fine: if the true confidence interval of a parameter is 1 +/- 0.01, but the fitting grid for that parameter only has a step of 0.1, then the code will report 1 +/- 0, which is too optimistic. Finally, in terms of performance, for this option to work the program has to write part of the chi2 grid on the disk, and the amount of space required can be large if you have a very fine grid.
300+
298301
## Controlling the cache
299302
* ```NO_CACHE```: possible values are ```0``` or ```1```. The default is ```0```, and the program will read and/or create a cache file, storing the pre-computed model fluxes for reuse. If you are changing your grid often or if the grid is very large and you do not want to store it on the disk, you can set this value to ```1``` and the program will neither read from nor write to the cache. Because it avoids some IO operations, it may make the program faster when the grid has to be rebuilt.
300303

@@ -320,7 +323,7 @@ To get the closest behavior to that of FAST-IDL, you should set ```C_INTERVAL=68
320323
* ```BEST_SFHS```: possible values are ```0``` or ```1```. The default is ```0```. If set to ```1```, the program will output the best fit star formation history (SFH) to a file, in the ```best_fits``` directory (as for the best fit SEDs). If Monte Carlo simulations are enabled, the program will also output confidence intervals on the SFH for each time step, as well as the median SFH among all Monte Carlo simulations. This median may not correspond to any analytical form allowed by your chosen SFH model.
321324
* ```SFH_OUTPUT_STEP```: possible values are any strictly positive number, which defines the size of a time step in the output SFH (in Myr). The default is ```10``` Myr.
322325
* ```SFH_OUTPUT```: possible values are ```'sfr'``` or ```'mass'```. The default is ```'sfr'```, and the program outputs as "SFH" the evolution of the instantaneous SFR of each galaxy with time. If set to ```'mass'```, the program will output instead the evolution of the stellar mass with time (which is usually better behaved, see Glazebrook et al. 2017). Note that the evolution of the mass accounts for mass loss, so the mass slowly _decreases_ with time after a galaxy has quenched.
323-
* ```SAVE_BESTCHI```: FAST++ can save the entire chi2 grid on the disk with the ```SAVE_CHI_GRID``` option. However, if you have *huge* grids, this can require too much disk space (I have been in situations where the chi2 grid would be as large as several TB!). Usually, one is not interested in the chi2 of *all* models, but only those that match the data within some tolerance threshold. This option allows you to only save on the disk the models that are worst than the best chi2 by some amount ```chi2 - best_chi2 < SAVE_BESTCHI``` (where ```SAVE_BESTCHI=1``` if you are interested in standard 68% confidence intervals, or ```2.71``` for 90% confidence, etc., see Avni 1976). These "good" models are saved in a separate ".grid" file for each galaxy of the input catalog, inside the ```best_chi2``` folder. The format is similar to the ".grid" file for the full chi2 grid (which is described above), but not identical. The ``fast++-grid2fits`` tool can also convert these files into FITS tables. The binary format is the following:
326+
* ```SAVE_BESTCHI```: FAST++ can save the entire chi2 grid on the disk with the ```SAVE_CHI_GRID``` option. However, if you have *huge* grids, this can require too much disk space (I have been in situations where the chi2 grid would be as large as several TB!). Usually, one is not interested in the chi2 of *all* models, but only those that match the data within some tolerance threshold. This option allows you to only save on the disk the models that are worst than the best chi2 by some amount ```chi2 - best_chi2 < SAVE_BESTCHI``` (where ```SAVE_BESTCHI=1``` if you are interested in standard 68% confidence intervals, or ```2.71``` for 90% confidence, etc., see Avni 1976). If you set the ```INTERVAL_FROM_CHI2``` option, the program will use these saved grids to automatically compute parameter confidence intervals. The parameters of all these "good" models are saved in a separate ".grid" file for each galaxy of the input catalog, inside the ```best_chi2``` folder. The format is similar to the ".grid" file for the full chi2 grid (which is described above), but not identical. The ``fast++-grid2fits`` tool can also convert these files into FITS tables. The binary format is the following:
324327
```
325328
# Begin header
326329
# ------------

example/fast.param

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -208,6 +208,11 @@ APPLY_VDISP = 0 # km/s
208208
# best-fit instead of the model with the smallest chi squared on
209209
# the original (unperturbed) photometry.
210210
#
211+
# o INTERVAL_FROM_CHI2: use the chi2 grid directly to compute confidence
212+
# intervals on the fit parameters, instead of using Monte Carlo
213+
# simulation. This will force setting 'SAVE_BESTCHI' to a value large
214+
# enough to encompass the chosen confidence intervals.
215+
#
211216
# o SAVE_SIM: save the best-fit parameters for each Monte Carlo
212217
# simulation for all sources in the "best_fits" directory.
213218
#
@@ -272,6 +277,7 @@ N_SIM = 100
272277
C_INTERVAL = 68 # 68 / 95 / 99 or [68,95] etc
273278
BEST_FIT = 0 # 0 / 1
274279
BEST_FROM_SIM = 0 # 0 / 1
280+
INTERVAL_FROM_CHI2 = 0 # 0 / 1
275281
SAVE_SIM = 0 # 0 / 1
276282
SFR_AVG = 0 # 0, 100 Myr, 300 Myr etc
277283
INTRINSIC_BEST_FIT = 0 # 0 / 1

src/fast++-fitter.cpp

Lines changed: 134 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -315,12 +315,6 @@ void fitter_t::write_chi2(uint_t igrid, const vec1f& chi2, const vec2f& props, u
315315
if (chi2.safe[cis] < best_chi2.safe[is]) {
316316
best_chi2.safe[is] = chi2.safe[cis];
317317

318-
struct datum {
319-
uint32_t id;
320-
float chi2;
321-
vec1f p;
322-
};
323-
324318
// Read the saved data and write simultaneously
325319
file::move(chi2_filename.safe[is], chi2_filename.safe[is]+".old");
326320
in.open(chi2_filename.safe[is]+".old", std::ios::binary | std::ios::in);
@@ -723,6 +717,47 @@ vec2d make_grid_bins(const vec1d& grid) {
723717
return bins;
724718
}
725719

720+
double get_chi2_from_conf_interval(double conf) {
721+
if (1.0 - conf < 1e-6) return 24.0;
722+
723+
double eps = 1e-3;
724+
725+
double chi2 = 0.0;
726+
double prev_chi2;
727+
double delta = 1.0;
728+
bool last_increase = true;
729+
730+
// This is a naive iterative inversion of the error function.
731+
// It yields the corresponding chi2 value with a relative accuracy of 'eps'.
732+
// Not the fastest implementation, but we don't care much about speed here.
733+
734+
do {
735+
// Compute confidence interval for this chi2
736+
double p = erf(sqrt(chi2/2.0));
737+
738+
// Move chi2
739+
prev_chi2 = chi2;
740+
if (p < conf) {
741+
if (!last_increase) {
742+
delta *= 0.5;
743+
last_increase = true;
744+
}
745+
746+
chi2 += delta;
747+
} else {
748+
if (last_increase) {
749+
delta *= 0.5;
750+
last_increase = false;
751+
}
752+
753+
chi2 -= delta;
754+
}
755+
756+
} while (abs(chi2/prev_chi2 - 1.0) > eps);
757+
758+
return chi2;
759+
}
760+
726761
void fitter_t::find_best_fits() {
727762
if (opts.parallel == parallel_choice::models) {
728763
if (opts.verbose) note("waiting for all models to finish...");
@@ -743,6 +778,17 @@ void fitter_t::find_best_fits() {
743778
}
744779
}
745780

781+
vec1f delta_chi2;
782+
if (opts.interval_from_chi2) {
783+
for (auto& c : input.conf_interval) {
784+
if (c < 0.5) {
785+
delta_chi2.push_back(-get_chi2_from_conf_interval(1.0 - 2*c));
786+
} else {
787+
delta_chi2.push_back(+get_chi2_from_conf_interval(2*c - 1.0));
788+
}
789+
}
790+
}
791+
746792
if (opts.verbose) note("finding best fits...");
747793
for (uint_t is : range(input.id)) {
748794
if (!silence_invalid_chi2 && !is_finite(output.best_chi2[is])) {
@@ -759,6 +805,8 @@ void fitter_t::find_best_fits() {
759805
}
760806
}
761807

808+
// Deal with Monte Carlo Simulations
809+
762810
if (opts.n_sim > 0) {
763811
vec1u bmodel = output.mc_best_model(is,_);
764812

@@ -788,57 +836,100 @@ void fitter_t::find_best_fits() {
788836
}
789837
}
790838

791-
// For grid parameters, use cumulative distribution
792-
for (uint_t ip : range(gridder.nparam)) {
793-
vec1d grid = sorted_grid[ip];
839+
if (!opts.interval_from_chi2) {
840+
// For grid parameters, use cumulative distribution
841+
for (uint_t ip : range(gridder.nparam)) {
842+
vec1d grid = sorted_grid[ip];
794843

795-
if (grid.size() == 1) {
796-
if (opts.best_from_sim) {
797-
output.best_params(is,ip,0) = grid[0];
798-
}
844+
if (grid.size() == 1) {
845+
if (opts.best_from_sim) {
846+
output.best_params(is,ip,0) = grid[0];
847+
}
799848

800-
for (uint_t ic : range(input.conf_interval)) {
801-
output.best_params(is,ip,1+ic) = grid[0];
849+
for (uint_t ic : range(input.conf_interval)) {
850+
output.best_params(is,ip,1+ic) = grid[0];
851+
}
852+
} else {
853+
// Build cumulative histogram of binned values
854+
vec2d bins = make_grid_bins(grid);
855+
vec1d hist = histogram(bparams.safe(ip,_), bins);
856+
vec1d cnt = cumul(hist);
857+
cnt /= cnt.back();
858+
859+
// Treat the edges in a special way to avoid extrapolation beyond the grid
860+
prepend(cnt, {0.0});
861+
prepend(grid, {grid.front()});
862+
append(cnt, {1.0});
863+
append(grid, {grid.back()});
864+
865+
// Compute percentiles by interpolating the cumulative PDF
866+
auto get_percentile = [&](double p) {
867+
return interpolate(grid, cnt, p);
868+
};
869+
870+
if (opts.best_from_sim) {
871+
output.best_params(is,ip,0) = get_percentile(0.5);
872+
}
873+
874+
for (uint_t ic : range(input.conf_interval)) {
875+
output.best_params(is,ip,1+ic) =
876+
get_percentile(input.conf_interval[ic]);
877+
}
802878
}
803-
} else {
804-
// Build cumulative histogram of binned values
805-
uint_t ng = grid.size();
806-
vec2d bins = make_grid_bins(grid);
807-
vec1d hist = histogram(bparams.safe(ip,_), bins);
808-
vec1d cnt = cumul(hist);
809-
cnt /= cnt.back();
810-
811-
// Treat the edges in a special way to avoid extrapolation beyond the grid
812-
prepend(cnt, {0.0});
813-
prepend(grid, {grid.front()});
814-
append(cnt, {1.0});
815-
append(grid, {grid.back()});
816-
817-
// Compute percentiles by interpolating the cumulative PDF
818-
auto get_percentile = [&](double p) {
819-
return interpolate(grid, cnt, p);
820-
};
879+
}
880+
881+
// For properties, use percentiles
882+
for (uint_t ip : range(gridder.nparam, bparams.dims[0])) {
883+
vec1d bp = bparams.safe(ip,_);
821884

822885
if (opts.best_from_sim) {
823-
output.best_params(is,ip,0) = get_percentile(0.5);
886+
output.best_params(is,ip,0) = inplace_median(bp);
824887
}
825888

826889
for (uint_t ic : range(input.conf_interval)) {
827-
output.best_params(is,ip,1+ic) = get_percentile(input.conf_interval[ic]);
890+
output.best_params(is,ip,1+ic) =
891+
inplace_percentile(bp, input.conf_interval[ic]);
828892
}
829893
}
830894
}
895+
}
831896

832-
// For properties, use percentiles
833-
for (uint_t ip : range(gridder.nparam, bparams.dims[0])) {
834-
vec1d bp = bparams.safe(ip,_);
897+
// If asked, obtain confidence intervals from chi2 grid saved on disk
835898

836-
if (opts.best_from_sim) {
837-
output.best_params(is,ip,0) = inplace_median(bp);
838-
}
899+
if (opts.interval_from_chi2) {
900+
std::ifstream in(chi2_filename[is], std::ios::binary);
901+
in.seekg(obchi2.hpos);
839902

840-
for (uint_t ic : range(input.conf_interval)) {
841-
output.best_params(is,ip,1+ic) = inplace_percentile(bp, input.conf_interval[ic]);
903+
while (in) {
904+
uint32_t id;
905+
float chi2;
906+
vec1f p(gridder.nprop);
907+
908+
if (file::read(in, id) && file::read(in, chi2) && file::read(in, p)) {
909+
vec1u ids = gridder.grid_ids(id);
910+
for (uint_t ip : range(gridder.nparam+gridder.nprop)) {
911+
double v;
912+
if (ip < gridder.nparam) {
913+
v = output.grid[ip][ids[ip]];
914+
} else {
915+
v = p[ip - gridder.nparam];
916+
}
917+
918+
for (uint_t ic : range(input.conf_interval)) {
919+
if (chi2 - best_chi2.safe[is] < abs(delta_chi2[ic])) {
920+
float& saved = output.best_params(is,ip,1+ic);
921+
if (delta_chi2[ic] < 0.0) {
922+
if (saved > v || !is_finite(saved)) {
923+
saved = v;
924+
}
925+
} else {
926+
if (saved < v || !is_finite(saved)) {
927+
saved = v;
928+
}
929+
}
930+
}
931+
}
932+
}
842933
}
843934
}
844935
}

src/fast++-read_input.cpp

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -164,6 +164,7 @@ bool read_params(options_t& opts, input_state_t& state, const std::string& filen
164164
PARSE_OPTION(rest_mag)
165165
PARSE_OPTION(continuum_indices)
166166
PARSE_OPTION(sfh_quantities)
167+
PARSE_OPTION(interval_from_chi2)
167168

168169
#undef PARSE_OPTION
169170
#undef PARSE_OPTION_RENAME
@@ -321,6 +322,21 @@ bool read_params(options_t& opts, input_state_t& state, const std::string& filen
321322
return false;
322323
}
323324

325+
if (opts.interval_from_chi2) {
326+
double max_interval = max(opts.c_interval);
327+
double max_chi2 = get_chi2_from_conf_interval(max_interval/100.0);
328+
if (max_chi2 > opts.save_bestchi) {
329+
error("with 'INTERVAL_FROM_CHI2=1', ", max_interval, "% confidence interval "
330+
"requires 'SAVE_BESTCHI>=", max_chi2, "'");
331+
return false;
332+
}
333+
}
334+
335+
if (!opts.best_from_sim && opts.interval_from_chi2 && !opts.save_sim && opts.n_sim != 0) {
336+
warning("with the current setup, Monte Carlo simulations are not used; setting 'N_SIM=0'");
337+
opts.n_sim = 0;
338+
}
339+
324340
if (!opts.my_sfh.empty()) {
325341
opts.sfh = sfh_type::single;
326342
} else if (!opts.custom_sfh.empty()) {
@@ -355,7 +371,7 @@ bool read_params(options_t& opts, input_state_t& state, const std::string& filen
355371
}
356372
}
357373

358-
if (opts.n_sim != 0) {
374+
if (opts.n_sim != 0 || opts.interval_from_chi2) {
359375
state.conf_interval = 0.5*(1.0 - opts.c_interval/100.0);
360376
inplace_sort(opts.c_interval);
361377
vec1f cint = state.conf_interval;

src/fast++.hpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -121,6 +121,7 @@ struct options_t {
121121
float save_bestchi = 0.0;
122122
vec1u rest_mag;
123123
std::string continuum_indices;
124+
bool interval_from_chi2 = false;
124125

125126
// Custom SFH
126127
std::string custom_sfh;
@@ -538,4 +539,7 @@ namespace file {
538539
}
539540
}
540541

542+
// defined in fast++-fitter.cpp
543+
double get_chi2_from_conf_interval(double conf);
544+
541545
#endif

0 commit comments

Comments
 (0)