|
Guided Domain Solver: Structured Exploration of Domain-Specific Tasks with Large Language Models |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 0:
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper introduces an innovative framework that integrates MCTS, Knowledge Graphs, and LLMs to solve complex domain-specific problems. The approach leverages MCTS for exploration, a knowledge graph for structured domain representation, and LLM reasoning for informed decision-making during search expansion. Validated on the Sokoban puzzle game, the system achieves improved search efficiency and interpretability without retraining the LLM.
I really like the core idea of this paper, i.e., combining symbolic reasoning (MCTS and KG) with LLM-based semantic understanding. The framework bridges structured search and flexible reasoning, showing how rule-based exploration can be guided by high-level language reasoning.
The paper feels rather rushed and incomplete. The related work section is shallow and does not sufficiently position this method within existing literature on LLM-based planning or neuro-symbolic systems. The experimental setting is narrow and limited to a single environment (Sokoban) and the baseline comparisons are weak. Moreover, the paper lacks ablation studies to disentangle the contribution of each component (MCTS, KG, LLM).
- The prompt template in Figure 3 is helpful but overly simplistic; it doesn’t show how the LLM integrates graph-based reasoning.
- The results section contains only one plot (Figure 5) comparing normalized branching factors; no quantitative metrics are reported. |
Fully AI-generated |
|
Guided Domain Solver: Structured Exploration of Domain-Specific Tasks with Large Language Models |
Soundness: 1: poor
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
This paper presents a method that combines Monte Carlo Tree Search (MCTS), Knowledge Graphs, and Large Language Models
(LLMs). The approach uses MCTS to explore the solution space, while a domain-specific Knowledge Graph encodes domain knowledge.
The method is demonstrated on the Sokoban puzzle game, where it achieves optimal solutions by combining error-free simulation with LLM-guided expansion. Experiments show that LLM agents, when guided by the Knowledge Graph, explore fewer nodes than random baselines, indicating improved search efficiency.
The method can be split into four steps: Selection (choose the most promising node), Expansion (extend the tree with a new node), Simulation (evaluate the new node), and Backpropagation (update node statistics).
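The four-step loop described above can be sketched in simplified, generic form. This is an illustrative skeleton only, not the paper's implementation; `Node`, `ucb1`, `expand_fn`, and `simulate_fn` are names introduced here for illustration:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # domain-specific state (e.g. a board position)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(node, c=1.4):
    # Upper Confidence Bound used during Selection; unvisited nodes win ties.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts_iteration(root, expand_fn, simulate_fn):
    # 1. Selection: descend to the most promising leaf
    node = root
    while node.children:
        node = max(node.children, key=ucb1)
    # 2. Expansion: extend the tree with new child nodes
    for child_state in expand_fn(node.state):
        node.children.append(Node(child_state, parent=node))
    leaf = random.choice(node.children) if node.children else node
    # 3. Simulation: evaluate the new node (e.g. via a rollout)
    reward = simulate_fn(leaf.state)
    # 4. Backpropagation: update statistics from the leaf up to the root
    while leaf is not None:
        leaf.visits += 1
        leaf.value += reward
        leaf = leaf.parent
```

In the paper's setting, the LLM (guided by the knowledge graph) would replace the random choice at the expansion step, picking which child to pursue.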
The method incorporates structured domain knowledge without retraining the LLM.
- The text needs improvements to justify the proposed methodology, its novelty, and its contributions. The first few paragraphs are quite vague about what exactly the method is trying to solve and what the main contributions are.
- Experiments show improved efficiency, but the term is never formally defined, and the experiments are very limited.
- The abstract mentions 'creativity', but the term is never referred to again in the text or quantified in any way.
- Strong conclusions from limited experimentation: the claim that MCTS is efficient and targeted rests on a single set of experiments in one domain (Sokoban).
- Lack of baselines: other methods take similar approaches (combining graphs and LLMs), yet no baseline besides random sampling is included for comparison.
- Overall, the paper is very short, and much more could have been said about the methodology and contributions.
- Your conclusions about efficiency and targeted search are based solely on Sokoban. How do you expect the method to perform in less structured, partially observable, or continuous domains? Have you considered or attempted any experiments outside Sokoban to support your claims of generality?
- Why were no strong baselines included, such as other LLM+graph or LLM+search approaches? How does your method compare to recent work?
- How scalable is the knowledge graph construction and querying process as domain complexity grows? |
Moderately AI-edited |
|
Guided Domain Solver: Structured Exploration of Domain-Specific Tasks with Large Language Models |
Soundness: 2: fair
Presentation: 3: good
Contribution: 1: poor
Rating: 0:
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully. |
The paper proposes an approach to solve the Sokoban game by integrating LLMs, MCTS, and a Knowledge Graph that essentially maintains the current state of the game. The search tree is expanded using MCTS, where the LLM selects which action to perform among the child nodes of the node chosen for expansion. Experiments show improved performance compared to random sampling.
Sokoban is a nice problem to work on and integrating MCTS and LLMs to solve puzzles is interesting.
The motivation for the paper is not clear to me. If we know the rules and dynamics of the game, why use an LLM? One can simply use classical planning to solve this problem, i.e., encode it in PDDL and run a planner. This has been done for Sokoban for many years.
I recommend the authors look more deeply into the automated planning community's work on this puzzle and explain how it relates to their own, since it appears to solve the same problem.
Only solving Sokoban is, I believe, too limited for a top conference paper.
The baseline of random sampling is too weak.
1. What is a “Cypher query”? (line 84)
2. What is the advantage of using your approach compared to classical planning for this domain?
3. When you say that you always return optimal solutions, what is the optimality criterion you refer to? Do you mean the minimal number of steps? If so, I am not convinced you can really guarantee this. Can you?
4. Fig. 5 shows the normalized branching factor, which is an odd choice. Why not show either the number of nodes expanded or the solution length, which are more common for this problem? |
Fully human-written |
|
Guided Domain Solver: Structured Exploration of Domain-Specific Tasks with Large Language Models |
Soundness: 1: poor
Presentation: 1: poor
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper solves domain-specific problems by leveraging MCTS, knowledge graphs, and LLM agents. An initial experiment was conducted on Sokoban to show the method helps find the solution more quickly.
The direction of solving complex planning tasks has high potential.
The paper is still at a preliminary stage. Here are some major suggestions to improve the work in the future:
Include other tasks, metrics, and more reasonable baselines to make the experiments solid. The authors can refer to the relevant literature for some design choices.
I think the paper would benefit from better motivating the method choices. For example, why are knowledge graphs or MCTS needed in the first place?
See those in the weakness section. |
Fully human-written |
|
Guided Domain Solver: Structured Exploration of Domain-Specific Tasks with Large Language Models |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
The paper introduces a novel approach to AI planning that combines Monte Carlo Tree Search (MCTS), Knowledge Graphs (KG), and Large Language Models (LLM). The Monte Carlo Tree Search explores the solution space, which is expanded via the concepts/relationships/constraints from a domain-specific KG.
The paper's main idea appears to be original. The presentation can be significantly improved, and so can the empirical validation. It is difficult to judge the paper's significance and impact given the relatively small, easy problem instances that it solves.
The paper has two main weaknesses: presentation and empirical validation.
With respect to the presentation, the four pages of the current version of Section 3 should be replaced with an illustrative running example (2-3 pages with MCTS, KG, Cypher queries, and LLM answers) followed by a summary of the current version of the section. All other details could be moved to the appendices. For example, Fig. 1 is too generic/high-level to help the reader; Figs. 3 & 4 can be fully appreciated only if additional figures are added for the MCTS and the KG.
In terms of the empirical evaluation, it is unclear how difficult the problem instances are. The examples shown in the paper seem to have "a floor plan" of at most 50 cells, which is extremely small. Can GDS tackle floor plans of, say, 1K or 10K or 100K cells? How about ones with various "obstacles to be removed to create a passage" and "tricky straights to be navigated"? Ideally, the empirical validation should be expanded into a more comprehensive section with increasingly larger "floor spaces" and harder navigation problems. For example, see the [Muslea, 1997] paper below, which is also evaluated on a Sokoban-like world.
Muslea, Ion. "SINERGY: A linear planner based on genetic programming." In European Conference on Planning, pp. 312-324. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997.
1. line 250 - what do you mean when you claim "the optimal solution is always achieved in this domain"? What guarantees the optimality of GDS's solutions? Per lines 181-183, optimality does NOT seem to be guaranteed.
2. What happens if a problem instance has no solution? Will GDS search "forever," or does it have any way to detect such situations? |
Fully human-written |