Neuro-Symbolic VAEs for Temporal Point Processes: Logic-Guided Controllable Generation
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 1: You are unable to assess this paper and have alerted the ACs to seek an opinion from different reviewers.
This paper proposes NS-VAE-TPP, a neuro-symbolic VAE framework that integrates a differentiable reasoning layer into a variational autoencoder for temporal point processes (TPPs). The model combines generative sequence modeling with symbolic domain knowledge, using forward-chaining rules encoded as predicate embeddings and soft logical operators. It aims to generate irregularly sampled time series that are logically consistent with expert knowledge. Experiments on clinical datasets show improved interpretability and prediction performance over baselines.
Note: As my expertise is not aligned with this paper's area, I may not be able to give a useful review. If I have misunderstood the work, I ask the ACs to disregard my review.
1. The paper extends differentiable neuro-symbolic reasoning to irregularly-sampled sequence data.
2. The forward-chaining operator is well-defined and mathematically clear, allowing logic-based reasoning to be fully differentiable within the VAE framework (a minimal illustration follows this list).
3. By grounding domain knowledge rules from medical knowledge bases, the model can provide interpretable reasoning chains (e.g., “renal dysfunction ← cardiovascular instability ∧ electrolyte imbalance”), which is valuable for healthcare applications.
4. Experimental results suggest that the reasoning layer contributes to more consistent and accurate temporal modeling.
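To make strength 2 concrete, here is a minimal sketch of one possible differentiable forward-chaining step, using a product t-norm for AND and an elementwise max for OR. The rule encoding and update scheme are my own illustrative assumptions, not the paper's actual operator.

```python
import torch

def soft_forward_chain(facts, rules, hops=3):
    """One possible soft forward chaining: fire all rules for a fixed number of hops.

    facts: (P,) tensor of soft truth values in [0, 1], one entry per predicate.
    rules: list of (head_index, body_indices) pairs.
    This encoding is an illustrative assumption, not the paper's implementation.
    """
    for _ in range(hops):
        updated = facts.clone()
        for head, body in rules:
            # Differentiable AND over the body predicates (product t-norm).
            body_score = facts[list(body)].prod()
            # Soft OR with the current head value (elementwise max).
            updated[head] = torch.maximum(updated[head], body_score)
        facts = updated
    return facts

# Example rule from strength 3: renal_dysfunction <- cardio_instability AND electrolyte_imbalance
facts = torch.tensor([0.9, 0.8, 0.1])  # [cardio_instability, electrolyte_imbalance, renal_dysfunction]
rules = [(2, (0, 1))]
print(soft_forward_chain(facts, rules))  # renal_dysfunction rises to 0.9 * 0.8 = 0.72
```

Every operation here (product, max, clone) admits gradients, which is what would let such a reasoning layer sit inside the VAE's ELBO objective.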
1. I am confused about the technical contribution.
It seems that the differentiable reasoning layer has already been proposed in existing works. This study mainly combines prior neuro-symbolic reasoning with temporal sequence modeling rather than introducing a fundamentally new reasoning method.
2. The reasoning layer introduces extra computation proportional to the number of rules and the number of hops (H). It would be helpful to analyze the computational complexity of this framework (a rough estimate is sketched below).
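As a rough sketch of what such an analysis could look like (the symbols below are my own, under the assumption that each hop scores every rule against its body predicate embeddings):

$$
\text{extra cost per sequence} = \mathcal{O}(H \cdot R \cdot B \cdot d),
$$

where $H$ is the number of hops, $R$ the number of rules, $B$ the maximum rule body size, and $d$ the predicate embedding dimension. Reporting an estimate of this form, together with measured wall-clock overhead, would address the concern.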
1. How are the symbolic rules obtained (e.g., from knowledge graphs or text corpora)?
Fully human-written
Neuro-Symbolic VAEs for Temporal Point Processes: Logic-Guided Controllable Generation
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The paper proposes the Neuro-Symbolic Variational Autoencoder with Temporal Point Processes (NS-VAE-TPP), a framework for logic-aware sequence generation in continuous time. Specifically, the proposed framework combines a temporal point process backbone for modeling event times and types with a novel reasoning layer in the latent space. The authors also conduct experiments to evaluate the performance of the proposed method.
[+] The paper proposes a reasoning-before-generation architecture to embed symbolic rules as generative priors in the latent space.
[+] The proposed method enforces logical consistency and imputes missing structures.
[+] The paper conducts experiments on synthetic, semi-synthetic, and real-world datasets.
[-] The scope and definitions of the symbolic constraints are unclear. The paper says that synthetic sequences must also satisfy symbolic constraints (such as eligibility, exclusions, ordering, and timing) that encode dependencies beyond surface correlations, but a discussion of the full space of potential symbolic constraints is not provided. Additionally, the mathematical definitions of these constraints are not explicitly listed, which makes them hard to understand. Further, could the authors discuss all the potential dependencies and how to distinguish dependencies from surface correlations?
[-] The paper injects a few example sequences and contextual instructions to supply domain knowledge, but how they are selected and generated is not described. Additionally, how many example sequences and contextual instructions are needed in the experiments? How is the quality of these example sequences and contextual instructions ensured? What is the cost of generating them?
[-] In the proposed method, the authors employ an LLM as the knowledge initializer. However, large language models are known to suffer from hallucination, which may lead to inaccurate or misleading knowledge. Additionally, the confidence or uncertainty of the LLM's outputs can vary significantly across instances, potentially introducing inconsistency into the initialization process. The authors do not discuss these issues. Further, it is unclear how predicates are extracted from the LLM's internal representations.
[-] In the proposed method, evidence from the body predicates is aggregated into a rule score using a differentiable approximation of logical AND. However, the aggregation does not account for the varying reliability of different pieces of evidence. Additionally, it is unclear why a differentiable approximation is needed, and there is no discussion of the approximation error it introduces (see the sketch after this list).
[-] The authors adopt a fixed number of hops as an approximation to full multi-hop reasoning. It is unclear why this is an effective approximation and whether there are other solutions. What are the potential advantages and disadvantages of this approximation? Additionally, the paper approximates the posterior with a factorized Bernoulli distribution. Could the authors discuss the potential advantages and disadvantages of this choice?
[-] The paper does not provide a complexity analysis of the proposed method. Given that the proposed framework involves multiple sequential steps and components, it is important to analyze both the computational and memory complexity to assess its scalability and practicality.
[-] The proposed method appears to perform poorly during the initial stage of a sequence, where events are sparse or missing (the first 0-25%). This suggests that the method may rely heavily on long-range event history. A discussion of potential strategies to enhance robustness when early event information is unavailable would strengthen the paper.
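To illustrate the approximation-error point in the AND-aggregation weakness above, here is a tiny comparison of a hard Boolean AND against a product t-norm soft AND. The operator choice is my assumption for illustration; the paper's exact relaxation may differ.

```python
import torch

def hard_and(x):
    # Boolean AND after thresholding each truth value at 0.5.
    return (x > 0.5).all().float()

def soft_and(x):
    # Product t-norm: differentiable, and exact on crisp {0, 1} inputs.
    return x.prod()

# The two agree on crisp evidence but diverge as evidence becomes uncertain.
for vals in ([1.0, 1.0], [0.9, 0.9], [0.6, 0.6]):
    x = torch.tensor(vals)
    print(vals, "hard:", hard_and(x).item(), "soft:", round(soft_and(x).item(), 2))
# [0.6, 0.6] -> hard 1.0 vs soft 0.36: the gap grows with uncertainty,
# which is exactly the unquantified approximation error noted above.
```

Differentiability is needed because a hard AND has zero gradient almost everywhere, so rule scores could not be trained end-to-end through the ELBO; the price is the soft-versus-hard gap shown in the final comment.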
[1] In Eqn. (2), the forward chaining process iteratively adds new facts until no additional facts can be derived. Could the authors clarify the potential maximal number of newly generated facts in this process? What happens if new facts continue to emerge indefinitely? Moreover, for the generated facts, is there a filtering mechanism to remove low-quality or noisy facts? If so, what criteria are used for filtering? Finally, could the authors elaborate on the concept of multi-hop reasoning and how it is integrated within the forward chaining process?
[2] For the dynamics of the TPP, are there other potential ways to describe them? The paper relies on the conditional intensity function, but there is no discussion of why this parameterization alone is used to describe the dynamics.
[3] For the generative perspective, could the authors clarify the survival function and its role in Eqn. (3)? Additionally, could the authors clarify the closed-form inverse? Further, in which cases does the integrated intensity fail to have a closed-form inverse? Do the experiments in the paper always admit closed-form inverses? If not, could the authors provide ablation studies for this? (A standard derivation is sketched after this list for context.)
[4] Could the authors list the potential hyperparameters in the different steps of the proposed method, and briefly discuss the sensitivity of the proposed method to these hyperparameters?
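For context on question [3], the standard TPP sampling derivation (textbook material, not specific to this paper): with conditional intensity $\lambda(t)$ and compensator $\Lambda(t) = \int_0^t \lambda(s)\,ds$, the survival function of the next event time and the inverse-transform sampler are

$$
S(t \mid \mathcal{H}_{t_{i-1}}) = \exp\big(-(\Lambda(t) - \Lambda(t_{i-1}))\big), \qquad u \sim \mathrm{Uniform}(0,1), \quad t_i = \Lambda^{-1}\big(\Lambda(t_{i-1}) - \log u\big).
$$

For a constant intensity $\lambda$ the inverse is closed-form, $t_i = t_{i-1} - \log u / \lambda$; for general neural intensities $\Lambda^{-1}$ must be found numerically, which is presumably the case the question is probing.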
Fully human-written |
Neuro-Symbolic VAEs for Temporal Point Processes: Logic-Guided Controllable Generation
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper proposes the Neuro-Symbolic Variational Autoencoder with Temporal Point Processes (NS-VAE-TPP), a generative model for continuous-time event sequences designed for safety-critical domains.
The proposed model combines a TPP backbone with a novel neuro-symbolic reasoning layer in the latent space. An encoder maps event sequences to high-level predicate variables. A "Symbolic Prior Bank" (SPB), initialized by querying a Large Language Model (LLM), stores predicate embeddings and a set of symbolic rules. Before generation, a differentiable forward-chaining reasoning module refines the latent predicate state, enforcing logical consistency and imputing missing information. A decoder then generates event times and types conditioned on both the temporal history and this reasoning-augmented latent state. The model is trained end-to-end as a VAE by maximizing the ELBO.
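For readers unfamiliar with the design, the summary above corresponds to a pipeline of roughly the following shape; every module name, shape, and the three-hop refinement below are hypothetical placeholders of mine, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class ReasoningBeforeGeneration(nn.Module):
    """Hypothetical skeleton: encode events -> reason over latent predicates -> decode."""

    def __init__(self, num_event_types, num_predicates, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(num_event_types, hidden, batch_first=True)
        self.to_predicates = nn.Linear(hidden, num_predicates)     # latent predicate logits
        self.reasoner = nn.Linear(num_predicates, num_predicates)  # stand-in for forward chaining
        self.decoder = nn.Linear(num_predicates + hidden, num_event_types + 1)

    def forward(self, events):                       # events: (B, T, num_event_types)
        h, _ = self.encoder(events)
        last = h[:, -1]                              # summary of the temporal history
        z = torch.sigmoid(self.to_predicates(last))  # soft predicate state in [0, 1]
        for _ in range(3):                           # fixed number of reasoning hops (H = 3)
            z = torch.sigmoid(self.reasoner(z))      # placeholder for differentiable rule firing
        out = self.decoder(torch.cat([z, last], dim=-1))
        type_logits = out[:, :-1]                    # next event type
        inter_time = nn.functional.softplus(out[:, -1])  # positive next inter-event time
        return type_logits, inter_time

model = ReasoningBeforeGeneration(num_event_types=5, num_predicates=8)
logits, dt = model(torch.randn(2, 10, 5))            # batch of 2 sequences of length 10
```

The key point of the summary is that the latent predicate state is refined by reasoning before decoding, so logical consistency is enforced in the latent space rather than filtered afterward.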
- The core architectural idea of "reasoning-before-generation" is interesting. Placing a symbolic reasoning layer directly within the latent space of the VAE, rather than using rules as a post-hoc filter, is a promising approach to ensuring that generated sequences are internally coherent.
- The model's demonstrated strength in few-shot and zero-shot scenarios is a practical advantage. This supports the hypothesis that the symbolic priors provide a valuable inductive bias, making the model less reliant on large, complete datasets.
- The central claim of SOTA performance on synthetic and semi-synthetic data is confusing. The paper states its "advantage stems from our approach's explicit utilization of the complete set of ground-truth logic rules", yet the baselines (AVAE, GNTPP, etc.) are not given this ground-truth information, which makes the comparison uneven.
- The evaluation methodology for rule-conditioned generation is also confusing. For real-world datasets, symbolic rules are extracted by querying LLMs; then, to evaluate generation under these rules, the paper uses LLM judges to produce an "R-Score" (rule adherence) and a "C-Score" (plausibility). Using an LLM to judge adherence to rules that were themselves generated by an LLM is not rigorous, even if the authors acknowledge this as a limitation.
See weaknesses.
Fully human-written |