|
When to Use Which? An Investigation of Search Methods on Expensive Black-box Optimisation Problems |
Soundness: 1: poor
Presentation: 3: good
Contribution: 1: poor
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully. |
This paper addresses the problem of choosing the right algorithm for black-box optimisation problems. To do so, the authors consider a series of algorithms belonging to different paradigms (evolutionary algorithms, Bayesian optimisation, local search, etc.) and evaluate their performance on 24 BBOB functions.
S1. The paper addresses a legitimate but not particularly novel question in the world of optimisation: given a black-box optimisation problem, which algorithm should a practitioner use?
S2. The paper is very well written and easy to follow; I enjoyed reading it.
W1. The authors address the question through a fairly extensive experimental comparison and draw conclusions about which algorithm to use based on the evaluation budget. However, this approach has a serious problem, in my view: how well do the conclusions drawn on the 24 BBOB functions extrapolate to other real-world problems? Can the authors really guarantee that the recommendations they provide are accurate?
W2. (Continuing the previous weakness) The authors are correct when they say, in line 137, ‘... the circumstances under which one algorithm surpasses another remain unclear’, and this paper does not clarify anything in that regard. Although I am no expert on the subject, I understand that other approaches, such as runtime analysis or problem decomposition (Walsh/Fourier decompositions, elementary landscapes), are usually much more effective (although perhaps not applicable here).
W3. I suspect that the statistical analysis used to construct Table 1 is flawed (a sketch of the kind of analysis I have in mind follows this list):
(1) Wilcoxon is for pairwise comparisons, whereas in this case there are multiple comparisons, which should be tested with a Friedman test.
(2) It is not verified that the assumptions for applying Wilcoxon hold (the distributions being compared should have the same shape).
(3) The way in which the test is applied is not clearly explained. Are fitness values from different instances mixed in the test? How is this implemented?
(4) We have no information on the magnitude of the performance differences (e.g., effect sizes). This analysis is highly relevant and has been omitted.
(5) Did you apply p-value corrections when performing multiple comparisons?
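To make points (1)-(5) concrete, here is a minimal sketch of the kind of analysis I have in mind, in Python with SciPy and statsmodels. The algorithm names and the data layout (`results[alg]` holding one best-found fitness value per instance, aligned across algorithms) are hypothetical placeholders, not the paper's actual data:

```python
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
algs = ["BO", "CMA-ES", "NM"]
# Placeholder results: one best-found fitness per instance, per algorithm,
# aligned across algorithms so that blocked/paired tests are valid.
results = {a: rng.normal(loc=0.1 * i, size=30) for i, a in enumerate(algs)}

# (1) Omnibus Friedman test across all algorithms, blocked by instance.
stat, p_omnibus = friedmanchisquare(*(results[a] for a in algs))
print(f"Friedman: chi2={stat:.3f}, p={p_omnibus:.4f}")

# (3)+(5) Post-hoc paired Wilcoxon comparisons with Holm correction,
# plus (4) a simple effect-size report (median paired difference).
pairs = list(combinations(algs, 2))
pvals, effects = [], []
for a, b in pairs:
    _, p = wilcoxon(results[a], results[b])
    pvals.append(p)
    effects.append(np.median(results[a] - results[b]))
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for (a, b), p, r, e in zip(pairs, p_adj, reject, effects):
    print(f"{a} vs {b}: adj. p={p:.4f}, significant={r}, median diff={e:.3f}")
```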
Please answer my questions in W1. |
Fully human-written |
|
When to Use Which? An Investigation of Search Methods on Expensive Black-box Optimisation Problems |
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper compares a range of standard black-box optimization algorithms on a group of standard test problems to investigate which algorithm is best applied under what circumstances, including the evaluation budget, problem dimensionality, and objective characteristics.
The comparisons appear fair and objective. While there is nothing revolutionary here, I feel that this paper is helpful in providing a high-level overview of the field for users. Results are fairly thorough, and more or less reflect what I would expect (caveat: this may be confirmation bias, as I work in BO research).
One fairly important weakness of this paper is a lack of detail on the BO algorithm settings (and, presumably, the other algorithms, though I am less familiar with these). I assume, based on the sparsity of information, that the authors have simply used whatever settings (acquisition function, GP kernel, etc.) are provided out of the box. While this is not necessarily unreasonable (many, perhaps even most, users will apply BO precisely like this, for better or worse), I feel it is important to at least describe these settings. For example, is "vanilla BO" using EI (expected improvement)? GP-UCB (the GP upper confidence bound in one of its many variants)? Thompson sampling? I am also guessing the kernel (covariance) is an SE kernel with lengthscales tuned by maximising the log-likelihood, based both on experience and on the comments on line 362, but this really needs to be defined explicitly.
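To illustrate the kind of specification I am asking for, here is a minimal sketch of one plausible "vanilla BO" configuration: an SE/RBF kernel with lengthscales tuned by maximising the marginal log-likelihood, and EI as the acquisition function. The function names (`vanilla_bo_step`, `expected_improvement`) and the candidate-set design are my own illustrative choices, not taken from the paper:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def expected_improvement(mu, sigma, best_f, xi=0.01):
    """Expected improvement for minimisation; larger values are better."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (best_f - mu - xi) / sigma
    return (best_f - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def vanilla_bo_step(X, y, candidates):
    """Fit a GP to (X, y) and pick the next candidate point by maximising EI."""
    # SE (RBF) kernel with ARD lengthscales; hyperparameters are tuned by
    # maximising the marginal log-likelihood inside GaussianProcessRegressor.
    kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(X.shape[1]))
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                  n_restarts_optimizer=5)
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(expected_improvement(mu, sigma, best_f=y.min()))]
```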
One other (related) point of weakness is Section 4.3. In BO, knowledge of problem characteristics is often used to achieve a substantial speedup (or simply to avoid a slowdown). In particular, kernel selection should reflect knowledge of the problem domain: e.g., smooth functions are well suited to an SE kernel, whereas if you have "sharp edges" then a (lower-order) Matern kernel may be preferred; separable variables may lead you to prefer a Laplacian kernel or a product of kernels on different axes; and awareness of local structure may lead you to use some variation of kernel localization. Some discussion of these issues may be helpful.
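For concreteness, a brief illustration of the kernel alternatives mentioned above, written with scikit-learn kernel objects (again, illustrative only; not the paper's actual setup):

```python
from sklearn.gaussian_process.kernels import RBF, Matern

smooth_kernel = RBF(length_scale=1.0)             # smooth objectives (SE kernel)
rough_kernel = Matern(length_scale=1.0, nu=1.5)   # "sharp edges" / lower smoothness
composite_kernel = RBF(length_scale=1.0) * Matern(length_scale=1.0, nu=2.5)
# Per-axis products (kernels restricted to subsets of coordinates, useful for
# separable variables) need a library with "active dims" support, e.g. GPy or GPyTorch.
```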
See weaknesses. |
Fully human-written |
|
When to Use Which? An Investigation of Search Methods on Expensive Black-box Optimisation Problems |
Soundness: 4: excellent
Presentation: 4: excellent
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
This paper studies the problem of determining which black-box optimization method to use for a given black-box optimization problem. There are a wide variety of black-box optimization methods developed under different sets of assumptions and for different evaluation budgets. For instance, evolutionary-algorithm-based approaches typically work well in settings with a large budget, while Bayesian optimization methods work well in more budget-constrained settings.
The paper provides guidance on which methods work best in which settings. Overall, BO-based methods work best for lower budgets, while CMA-ES works best for higher budgets.
- This paper presents an extensive study of the effectiveness of BO and evolutionary algorithms for black-box optimization across various budget ranges.
- Various aspects of the black-box optimization problem are considered, such as dimensionality, multi-modality, and other problem characteristics that can help in judging which method is best.
- Through experiments on benchmark functions, the best-performing methods for each setting are determined in a data-driven manner.
- The main weakness of this paper is that all the runs are on BBOB functions, and no real-world functions were considered. The findings from the study have not been applied to any real-world black-box optimization problems.
- From a practical viewpoint, it has been known for a long time that evolutionary algorithms perform well at higher budgets and Bayesian-optimization-based approaches at lower budgets. The novelty of this study is low.
- While this paper empirically quantifies that BO works better at lower budgets and evolutionary methods at higher budgets, it is unclear what practical benefits this can bring. Can this approach be used to build a meta-algorithm that selects the optimization method given a problem? How much better would that algorithm be compared to any individual method? This has not been evaluated on any practical real-world problems.
(See the comments above for details.) Can this approach be used to build a meta-algorithm that selects the optimization method given a problem? How much better would that algorithm be compared to any individual method?
How can one use the findings from this study to make BBO better? |
Fully human-written |
|
When to Use Which? An Investigation of Search Methods on Expensive Black-box Optimisation Problems |
Soundness: 3: good
Presentation: 4: excellent
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully. |
This paper investigates which optimization algorithm is most appropriate for expensive black-box optimization (BBO) problems under varying evaluation budget constraints. The authors consider common families of search methods – including Bayesian optimization (BO) with Gaussian processes, an advanced BO variant (TuRBO), evolutionary algorithms (CMA-ES, Genetic Algorithm, Differential Evolution), a local search method (Nelder–Mead), and random search – and compare their performance across different budget regimes and problem characteristics. The main contribution is an extensive empirical study that provides practical guidelines on “when to use which” optimizer, depending on how tight the evaluation budget is and what is known about the problem’s landscape.
- The study is exceptionally thorough in its empirical scope. It tests seven distinct algorithms (random search, Nelder–Mead, GA, DE, CMA-ES, vanilla BO, and TuRBO) across a wide range of conditions. Experiments cover 24 benchmark functions with diverse properties, multiple budget levels (200, 800, 2,000, and 8,000 evaluations, plus continuous trajectories up to 10,000), and different dimensionalities (2D, 10D, 40D).
- The paper employs appropriate statistical tests to support its claims. For each benchmark function and budget, it identifies the best algorithm using Wilcoxon rank-sum tests with Holm–Bonferroni correction.
- Practical guidance and clear takeaways: for example, they conclude that vanilla BO is the default choice under very tight evaluation budgets, CMA-ES is best when the budget is sufficiently large, and methods like NM or TuRBO become preferable in certain niche scenarios.
- All experiments are conducted on the BBOB artificial test functions, which, while standard in optimization research, are still simplified representations of real-world tasks. IMHO, the more interesting question is the actual computation time of each optimization method, e.g., wall-clock time, when the *actual* computational cost of each function evaluation is taken into account. For instance, if each function evaluation is cheaper than the O(n^3) cost of GP inference in BO, then BO's advantage in the tight-budget setting is questionable (see the rough cost sketch after these weakness points).
- Overall, the paper is well written, but a few minor points could be improved. For example, some of the terminology, like "weak global structure" or "adequate global structure" (from the BBOB categorization), might be unclear to readers unfamiliar with those benchmarks.
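To make the wall-clock argument in the first weakness point concrete, here is a rough back-of-the-envelope estimate (my own, not taken from the paper), assuming exact GP inference and writing N for the total evaluation budget and c_f for the cost of one function evaluation:

```latex
% Iteration n of vanilla BO conditions a GP on n observations at roughly O(n^3) cost,
% so the cumulative modelling overhead over the whole run is
\sum_{n=1}^{N} \mathcal{O}(n^3) \;=\; \mathcal{O}(N^4)
% BO's sample efficiency therefore only buys wall-clock time when the total
% evaluation cost N \cdot c_f clearly dominates this overhead, i.e. when each
% evaluation is genuinely expensive.
```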
- Could you clarify why certain popular black-box optimizers were not included in the study? For instance, Particle Swarm Optimization (PSO) is a well-known heuristic for continuous optimization – was it omitted for any specific reason?
- How confident are the authors that the findings will generalize to real-world expensive optimization tasks? The benchmarks used are synthetic and noise-free. For a practical problem, say, optimizing hyperparameters of a machine learning model, which might have noise and unknown constraints, would the recommendation still be “use BO for ≤200 evaluations, CMA-ES for ≥1000,” etc.?
- Section 4.3 groups problems by characteristics (separability, conditioning, multi-modality) and gives algorithm recommendations per category. In a real scenario, a user may not know these properties of their objective function upfront. What do the authors suggest for a practitioner to identify, for example, that their problem has a “weak global structure” or is “highly conditioned”? This is important if the user wants to apply the paper's advice. Any guidance on how to assess problem features in practice would enhance the utility of those recommendations. |
Fully AI-generated |