Autonomous Urban Region Representation with LLM-informed Reinforcement Learning
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper presents SubUrban, a framework that combines submodular rewards with reinforcement learning and incorporates LLM-based semantic guidance in preprocessing and parameter search. The system first uses LLM-generated keywords and clustering to semantically pre-filter large POI sets, then treats the filtered POIs (by category) as actions and defines a reward balancing coverage, saturation, and buffer to train a modular policy that selects the most informative POIs under a budget to expand hypernodes. Category-weight search is accelerated with an LLM-guided Cross-Entropy Method (CEM). The authors evaluate on Beijing, Shanghai, Singapore, and New York across downstream regression tasks (population density, house price, GDP), including sparse-data settings (e.g., using only 10% of POIs), and report that SubUrban outperforms several strong baselines in many settings while offering data efficiency and interpretability. Implementation details and appendices are provided, and the authors commit to open-sourcing the code.
1. The idea is clear and practical: using submodularity to model diminishing marginal returns of POI selection and learning policies under budget constraints via RL is intuitive and engineering-ready.
2. The LLM-in-the-loop engineering attempt is valuable: using LLMs for semantic prefiltering and for guiding CEM reduces manual heuristics and has practical appeal.
3. Broad empirical coverage: comparisons and ablations across four cities, several regression tasks, and sparse-data scenarios (e.g., 10% POIs) demonstrate applicability in varied settings.
4. Interpretability and intuitive design: the Coverage/Saturation/Buffer components help explain why certain POIs are selected, aiding qualitative analysis and visualization.
1. Different baselines in the paper use embeddings of varying dimensionalities (e.g., BERT 768, OpenAI 1536, HGI 64, CityFM 1024), which can significantly influence downstream Random Forest performance and lead to unfair comparisons.
2. The study relies solely on Random Forest (with a 4:1 train–test split) as the downstream evaluator, without demonstrating results from stronger or more diverse supervised learners (e.g., MLP, GBDT/XGBoost, or a linear regression baseline). This may overestimate or underestimate the measured embedding quality; a sketch of the control experiment we have in mind follows this list.
3. Although LLM prompts and templates are provided in Appendix C.1/C.2, the main text omits crucial operational statistics, such as the number of LLM calls, average query size, total token cost, and whether any manual filtering of outputs was performed.
4. While the paper frequently refers to “submodular gains” and “marginal utilities” to motivate its selection strategy, it does not provide a formal proof or sufficient conditions showing that the designed reward function or policy is truly submodular. If submodularity does not hold, the approximation guarantees of the greedy policy become invalid.
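To make weaknesses 1 and 2 concrete, below is a minimal sketch of the control experiment we have in mind, using synthetic stand-ins for the paper's embedding matrices and targets: project every baseline to a shared width with PCA, then score each with several downstream regressors.

```python
# Hedged sketch: unify embedding widths, then evaluate with multiple
# downstream regressors. All data here is a synthetic stand-in.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_regions = 200
embeddings = {                      # baseline -> (n_regions, dim) matrix
    "BERT": rng.normal(size=(n_regions, 768)),
    "OpenAI": rng.normal(size=(n_regions, 1536)),
    "HGI": rng.normal(size=(n_regions, 64)),
    "CityFM": rng.normal(size=(n_regions, 1024)),
}
y = rng.normal(size=n_regions)      # e.g., log population density

COMMON_DIM = 64                     # no wider than the narrowest baseline
regressors = {
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
    "ridge": Ridge(alpha=1.0),
    "mlp": MLPRegressor(hidden_layer_sizes=(128,), max_iter=1000,
                        random_state=0),
}

for name, X in embeddings.items():
    X_proj = PCA(n_components=COMMON_DIM).fit_transform(X)  # unify width
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_proj, y, test_size=0.2, random_state=0)           # 4:1 split
    scores = {k: round(r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te)), 3)
              for k, m in regressors.items()}
    print(name, scores)
```

If the ranking of baselines is stable across the shared dimension and across regressors, the current comparison is likely fair; if not, the unified-dimension numbers are the ones that should appear in the tables.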
1. The appendix indicates substantial differences in embedding dimensionality across baselines (e.g., BERT 768, OpenAI 1536, HGI 64, CityFM 1024), which could affect the fairness of downstream comparisons. Have the authors attempted to unify or project these embeddings to a common dimension? If not, please consider adding unified-dimension experiments or a sensitivity ablation, and specify this in the tables.
2. The current experiments primarily rely on Random Forest as the downstream evaluator. To provide a more comprehensive assessment of embedding quality, it would be valuable to include results from other evaluators (e.g., linear regression, MLP, XGBoost/LightGBM, or end-to-end fine-tuning) and indicate whether the main conclusions hold consistently across these setups.
3. The paper mentions using different LLMs at various stages (e.g., prefiltering and CEM optimization), yet the corresponding operational statistics remain abstract. It would be helpful if the authors could provide a systematic summary of the LLM models/versions used at each stage, along with call counts, average token counts, total runtime, and estimated cost. Including ablation results such as no-LLM / small-LLM / GPT-4 in the appendix would further clarify the performance–cost trade-off introduced by LLM integration.
4. The discussion of submodularity in the reward function is mostly intuitive and lacks explicit theoretical assumptions or validation. Under what conditions can the reward be guaranteed to be submodular? If a formal proof is challenging, please consider providing marginal-gain curves or statistics for representative regions to demonstrate approximate submodularity; a diagnostic along these lines is sketched below.
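The diagnostic requires no theory: run a greedy selection under the paper's reward, record the marginal gain at every step, and check that the sequence never increases, since the greedy gains of a submodular set function are provably non-increasing. A minimal sketch, where `reward_fn` is a hypothetical stand-in for the coverage/saturation/buffer reward:

```python
# Hedged sketch: empirical marginal-gain diagnostic for approximate
# submodularity. `reward_fn` and `candidates` are hypothetical
# stand-ins for the paper's reward and POI pool.
from typing import Callable, FrozenSet, Iterable, List, Tuple

def greedy_gain_curve(
    candidates: Iterable[str],
    reward_fn: Callable[[FrozenSet[str]], float],
    budget: int,
) -> List[float]:
    """Greedily add POIs, recording each step's marginal gain."""
    selected: FrozenSet[str] = frozenset()
    pool = set(candidates)
    gains: List[float] = []
    for _ in range(min(budget, len(pool))):
        base = reward_fn(selected)
        best_p, best_gain = max(
            ((p, reward_fn(selected | {p}) - base) for p in pool),
            key=lambda t: t[1],
        )
        gains.append(best_gain)
        selected = selected | {best_p}
        pool.remove(best_p)
    return gains

def violations(gains: List[float], tol: float = 1e-9) -> List[Tuple[int, float]]:
    """Steps where the greedy gain increases: each one is a certificate
    that the reward is not submodular on this trajectory."""
    return [(i, gains[i] - gains[i - 1])
            for i in range(1, len(gains))
            if gains[i] > gains[i - 1] + tol]
```

Reporting these curves (or the fraction of violating steps) for a few representative regions would substantiate, or bound, the paper's appeal to greedy approximation guarantees.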
Moderately AI-edited
Autonomous Urban Region Representation with LLM-informed Reinforcement Learning
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper proposes an urban representation learning method named SubUrban, aiming to reduce human effort in feature selection and engineering. The core idea is to represent an urban region with a set of POIs within or near the region. A reinforcement learning-based approach is presented to automatically learn the representative POIs for each region (and hence the representation/embeddings). LLMs are applied to help pre-select a subset of the POIs within a region and to guide the optimization of POI category weighting. Experimental results using data from four cities (Beijing, Shanghai, Singapore, and NYC) across three downstream tasks (population density, house price, and GDP density prediction) show the effectiveness of the proposed method.
1. The proposed method uses POI data only and helps avoid manual feature selection.
2. Experiments on datasets from different cities (and countries) and different downstream tasks show the effectiveness of the proposed method.
3. Source code has been made available.
1. Motivation:
- The motivation of using POIs within a region and its $\delta$-neighborhood to represent the region needs further discussion and justification. Also, how is $\delta$ determined?
- Using LLMs to generate keywords for each region to serve as POI filters seems quite restrictive (especially for less known or small regions). The LLM prompt template shown in Appendix C.1 treats each borough of NYC as a region, which does not match the number of NYC regions shown in Table 1. It is unclear how exactly the POI pre-selection prompt is designed for each city or region. Both the motivation and the implementation need further discussion.
2. Technical details:
- More details are needed on how k-means is applied to prune the POIs and why this helps "regulate spatial density and ensure more uniform coverage across the regions" (one plausible reading is sketched after this list).
- What are $q_c$ and $C$ in Equation 3?
- Where do the candidate $p_i$'s in Equation 4 come from?
- What does the prompt look like for the LLM-instructed CEM tuning process?
3. Experiments:
- The choice of baselines in the experiments needs further justification. Only two baselines are on urban representation learning. More baselines are needed:
Li et al. Urban region representation learning with OpenStreetMap building footprints. In KDD 2023.
Yan et al. UrbanCLIP: Learning text-enhanced urban region profiling with contrastive language-image pretraining from the web. In WWW 2024.
Jin et al. Urban region pre-training and prompting: A graph-based approach. In KDD 2025.
Hao et al. UrbanVLP: Multi-granularity vision-language pretraining for urban socioeconomic indicator prediction. In AAAI 2025.
While these methods may use more input features, a POI-only approach that trails them by substantial margins would not fully justify the advantage of the proposed solution.
- The population density prediction results reported in Table 2 for CityFM are close to those in Table 7 of the CityFM paper for Singapore but quite different for NYC. Clarification is needed.
- How are the LLMs and prompts chosen for the implementation? How do these choices impact overall model performance?
- It is also a bit odd to use Random Forest as the downstream task prediction model given that the downstream tasks are regression tasks.
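On the k-means question under point 2, below is a minimal sketch of one plausible reading, in which k-means keeps a single representative POI per spatial cluster; the medoid rule and all names are our assumptions, not the paper's:

```python
# Hedged sketch: one plausible reading of k-means POI pruning. Cluster
# POI coordinates and keep only the POI nearest each centroid, which
# thins dense downtown clusters while leaving sparse outskirts one
# representative each. `pois` is a hypothetical (n, 2) lon/lat array.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def prune_pois(pois: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Return indices of k representative POIs, one per k-means cluster."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pois)
    # index of the POI closest to each centroid (an approximate medoid)
    return pairwise_distances_argmin(km.cluster_centers_, pois)

rng = np.random.default_rng(0)
dense = rng.normal([0.0, 0.0], 0.01, size=(500, 2))   # crowded core
sparse = rng.normal([1.0, 1.0], 0.20, size=(50, 2))   # outskirts
kept = prune_pois(np.vstack([dense, sparse]), k=40)
print(kept.shape)  # (40,) -> density is pushed toward uniform coverage
```

If something like this is what the authors do, stating the choice of k per region and the tie-breaking rule would answer the question.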
See the Weaknesses section.
Fully human-written
Autonomous Urban Region Representation with LLM-informed Reinforcement Learning
Soundness: 3: good
Presentation: 2: fair
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper proposes a self-supervised learning paradigm based on submodular functions and reinforcement learning, which models POI selection as a sequential decision-making process. By defining states such as Coverage, Saturation, and Buffer, and incorporating reward signals that combine downstream task performance with improvements in local states, it autonomously learns an expansion strategy, thereby reducing reliance on manual feature engineering and heuristic design. It introduces LLMs to provide semantic guidance in the urban domain, including generating representative keywords during the preprocessing stage to filter the initial POI candidate set, and guiding the Cross-Entropy Method during the optimization stage to adjust the attention weights for POI categories, consequently accelerating convergence and enhancing cross-city transferability. Experiments across multiple cities and downstream tasks demonstrate that SubUrban outperforms existing state-of-the-art methods using only 10% of the data, exhibiting exceptional data efficiency, robustness across cities and tasks, and interpretability.
1. This paper innovatively combines submodular functions with the sequential decision-making capability of RL for autonomous construction of urban hypernodes. This approach offers a novel and automated perspective on a long-standing pain point in urban computing: reliance on manual heuristics and city-specific tuning. The utilization of LLMs to inject domain knowledge for guiding data selection and optimization is also an interesting methodology.
2. The proposed framework in this paper demonstrates high practical value, as it can significantly reduce the costs associated with data processing and model tuning for urban AI applications, while enhancing the model's generalization capability across cities with varying data distributions. This is crucial for the scalable deployment of smart city applications.
1. The study primarily compares POI-encoding-based representation learning methods, which is reasonable given its core focus on processing POIs. However, incorporating powerful multimodal fusion methods (such as UrbanCLIP or UrbanVLP) that also generate high-quality regional representations as baselines, or comparing/combining SubUrban's learned representations with those from such models, could yield more compelling evidence.
2. The entire system integrates multiple complex components including RL, submodular rewards, LLM preprocessing, and LLM-instructed CEM. Although ablation studies were conducted, it remains unclear, for instance, to what extent the LLM contributes. How much would performance degrade if the LLM were replaced with a simple statistics-based method (e.g., using information gain) to generate initial keywords? Such analysis would help determine whether the LLM truly provides irreplaceable semantic understanding or merely offers a decent initialization.
1. The paper designs multiple reward signals. In practical training, how are these reward terms (e.g., $R_{GAT}$, $R_{MHA}$, $R_{buf}$) balanced during optimization to prevent any single component from dominating the entire training process? (One common balancing pattern is sketched after this list.)
2. The case study in Section D.5 is insightful. Could you briefly comment on whether the expansion strategies learned by SubUrban demonstrate consistent and interpretable patterns across the multiple regions you observed? For instance, are there systematic differences in the focused POI categories and spatial expansion patterns for different functional area types, such as residential versus commercial zones?
3. Are complete results for House Price and GDP Density prediction available for Singapore and NYC?
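For question 1, one standard pattern (offered here as a hedged sketch, not as the authors' actual scheme) is to z-normalize each reward term against running statistics before the weighted sum, so that no term dominates by scale alone; the term names below mirror the paper's notation:

```python
# Hedged sketch: keep several reward terms on a comparable scale by
# z-normalizing each against a running window before a weighted sum.
# The combination scheme is our assumption, not the authors' method.
from collections import deque
import math

class BalancedReward:
    def __init__(self, weights: dict, window: int = 500):
        self.weights = weights
        self.history = {k: deque(maxlen=window) for k in weights}

    def __call__(self, terms: dict) -> float:
        total = 0.0
        for name, value in terms.items():
            h = self.history[name]
            h.append(value)
            mean = sum(h) / len(h)
            std = math.sqrt(sum((v - mean) ** 2 for v in h) / len(h)) or 1.0
            total += self.weights[name] * (value - mean) / std  # z-score
        return total

reward = BalancedReward({"R_GAT": 1.0, "R_MHA": 1.0, "R_buf": 0.5})
print(reward({"R_GAT": 0.8, "R_MHA": 12.3, "R_buf": 0.05}))
```

Whether the paper uses fixed weights, normalization, or something else entirely is exactly what we would like the authors to state.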
Fully human-written
Autonomous Urban Region Representation with LLM-informed Reinforcement Learning
Soundness: 1: poor
Presentation: 1: poor
Contribution: 1: poor
Rating: 0
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
**IMPORTANT: The anonymous code repo was last updated on Oct. 1, 2025, which is a few days after the paper and supplementary material submission deadline. I believe this violates the ICLR code of conduct and the paper should be desk-rejected. The review below is only for reference.**
This paper proposes SubUrban, an RL-based framework for urban region representation learning, aiming to reduce reliance on manual feature engineering and city-specific heuristics. The proposed approach includes 1) LLM-guided POI preprocessing to filter redundant or low-value urban features, 2) a submodular-aware hypernode expansion mechanism to adaptively construct expressive regional representations, and 3) an LLM-instructed CEM optimization strategy to calibrate category-wise attention weights. Experiments conducted on four cities (Beijing, Shanghai, Singapore, NYC) and three prediction tasks (population, house price, GDP density) demonstrate improved performance and robustness.
1. Overall, the paper is well-organized, and the motivation is clearly stated from the standpoint of reducing human-designed heuristics.
2. Experimental results are extensive, involving multiple cities and tasks, and the reported data-efficiency improvements seem promising.
1. While the model claims to reduce heuristic dependency, involving an LLM naturally introduces a new form of heuristic (e.g., manually designed prompt templates, assumed semantic priors of regions). Clarification is needed to demonstrate how stable and reproducible these LLM-based components are across different language models or prompt variations (a simple stability probe is sketched after this list).
2. The description of hypernode expansion (soft/hard selection alternation) is technically detailed, but the intuitions behind key parameters are relatively under-explained. It is generally more important to explain "why" rather than simply introducing "how". Besides, it is unclear how sensitive the performance is to these hyperparameters.
3. Although experiments are conducted on four cities, the source and partitioning methods differ (GADM vs. OSM vs. NYC planning). It could be helpful to study and clarify whether these differences influence evaluation comparability.
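On weakness 1, the stability probe we have in mind is cheap to run: issue paraphrased prompt variants (or swap the underlying model), collect the returned keyword sets, and report their mean pairwise Jaccard overlap. A minimal sketch, where `query_llm` is a hypothetical stand-in for whatever client the authors use:

```python
# Hedged sketch: stability probe for the LLM keyword-generation step.
# `query_llm` is a hypothetical callable that maps a prompt string to
# a list of keywords; it stands in for the authors' actual client.
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 1.0

def keyword_stability(prompts: list, query_llm) -> float:
    """Mean pairwise Jaccard overlap of keyword sets across variants."""
    keyword_sets = [set(query_llm(p)) for p in prompts]
    pairs = list(combinations(keyword_sets, 2))
    return (sum(jaccard(a, b) for a, b in pairs) / len(pairs)
            if pairs else 1.0)
```

An overlap near 1.0 would support reproducibility of the prefiltering stage; a low overlap would confirm that the LLM step is itself a new, unstable heuristic.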
Please refer to the Weaknesses section for my questions.
Fully human-written |