|
Welfarist Formulations for Diverse Similarity Search |
Soundness: 4: excellent
Presentation: 2: fair
Contribution: 4: excellent
Rating: 10: strong accept, should be highlighted at the conference
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper proposes a new problem setting and algorithms for the problem of similarity search with diversity. An interesting formulation is given where similarity and diversity are combined in one objective. The objective is tunable so as to cover some extreme cases: only relevance, only diversity, and mixing the two using hyperparameters. Diversity is measured with respect to a set of predefined attributes. The paper makes a distinction when an item has a single attribute and when it has many attributes. In the former case an exact algorithm is given. In the latter case the problem is shown to be NP-hard and the greedy method is shown to give a (1-1/e) approximation. Also approximate similarity search can be incorporate to trade efficiency with solution quality.
S1. The problem formulation is quite elegant, providing a way to introduce diversity into the classic similarity search problem.
S2. Rigorous analysis and strong theoretical results.
S3. Good motivation and discussion.
S4. Thorough experiments, evaluating different aspects of the problem setting and the algorithms.
W1. The paper contains lots of discussion and motivation. On the one hand this is a strength, on the other hand, it gets quite repetitive and tiring. I would prefer a shorter discussion, avoiding excessive repetition, and presenting more technical results in the main paper instead of the appendix.
W2. In the intro, the use of geometric mean is motivated by the argument that the geometric mean is larger than the min value of an item and smaller than the arithmetic mean. This fact alone is not a strong motivation, as the fact that the values are different does not necessarily imply that the solutions of the respective problems will be different.
W3. The formulation is elegant, however, it needs to assume the presence of attributes, which might not always be the case. Another simpler notion of diversity, not covered by the setting of the paper, can be defined by employing the same similarity function used to compare the query with the data items. Here, we would require k items that are similar enough to the query but not similar to each other.
Please answer the points mentioned above. Additionally, I noticed that the author(s) use the terms "diversity" and "fairness" almost interchangeably. I would like to know the views of the author(s) on this aspect. |
Fully human-written |
|
Welfarist Formulations for Diverse Similarity Search |
Soundness: 3: good
Presentation: 4: excellent
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper investigates variants of ANNs with new constraints on Nash social welfare measures (NSW) and discusses single-attribute and multi-attribute settings. The problems are verified to be intractable -- and efficient algorithms with provable approximation ratios are provided. Benchmark tests have verified the advantages of the proposed algorithms in efficiency and quality.
S1. It's an interesting perspective to consider NSW for ANNs for fairness and diversity measures.
S2. Problems and Algorithms are justified with hardness analysis, matching provable guarantees, and cost analysis.
S3. Solutions with generality on oracles and ANN algorithms have been experimentally verified.
W1. The impact of correlated or contradictory utilities may warrant discussion.
W2. The discussion on connecting the solution to machine learning/representation learning could be discussed. Examples of other ML issues, or real-world scenarios that may benefit from the proposed algorithms, can be provided and tested.
D1. The utilities may have a positive or negative correlation. Will it be ensured that a single optimal solution always exists under the NSW setting? How will the problem compare with other options for multi-criteria queries that seek Pareto-optimality, with dominance but not necessarily optimizing a single value?
D2. Consider providing examples of learning problems that could benefit from algorithmic and theoretical analysis.
D3. NN queries have been extensively studied; how the proposed solution may be improved with existing techniques such as indexing, sampling, etc, for NaNNS and p-NNS would also be interesting. |
Fully human-written |
|
Welfarist Formulations for Diverse Similarity Search |
Soundness: 2: fair
Presentation: 1: poor
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
This paper introduces a framework for incorporating diversity into Nearest Neighbor Search (NNS) by drawing on welfare theory from mathematical economics. Traditional NNS methods mainly focus on maximizing relevance, measured by similarity between a query and data points, while in more recent approaches, diversity requirements are specified using constraints. The main difference of the approach in the paper is that welfare requirements are proposed using welfare-based objectives grounded in Nash Social Welfare. More specifically, attributes of data points (such as color, brand, or seller) are modeled as “agents” in an economic system, where each agent’s utility depends on how well its associated attribute is represented among the selected neighbors. The collective welfare of the result set is then measured using a welfare function, such as the geometric mean in the case of NSW. Empirical evaluations on real and synthetic datasets provide evidence that the approach may have practical applications.
The main strength is that the chosen approach enforces fairness (diversity across attributes) without requiring ad hoc parameters or fixed quotas, and it adapts to the intent expressed in each query—for example, selecting more homogeneous results when the query is specific and more diverse ones when it is broad.
The main weakness of the paper is the presentation. The introduction is way too long. Key justifications of correctness are totally relegated to the appendix (all proofs of theorems). Additionally, currently very basic information such as the very formal definition of diversity being used by the authors is not explicitly highlighted. My recommendation is to
1) Shorten the introduction significantly to end at page 2. An introduction is usually expected to provide a brief summary of the results, a brief discussion of the main applications and appropriate connections with related work.
2) Provide formal definitions in such a way that the newly introduced concepts are highlighted. This makes the paper much easier to read because it clearly separates your original contribution from contributions from the literature. Since you are proposing a new way of quantifying diversity I would expect something like "Below we define our main notion of diversity..." or something similar and then a clear statement "Definition X (Diversity Mesure)".
With saved space, move some key insights necessary to prove the theorems to the main body of the paper. Currently all arguments are relegated to the appendix.
In summary, In my opinion, the current organisation of the paper is suboptimal and makes the evaluation of the contributions of the paper more difficult than necessary.
No questions. |
Fully human-written |
|
Welfarist Formulations for Diverse Similarity Search |
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
This paper proposes a novel approach to nearest neighbor search (NNS) that balances relevance and diversity. It introduces welfare-based formulations based on Nash social welfare and p-means, which provide an adaptive, query-based balance between similarity and diversity. The effectiveness of these formulations is evaluated through experiments on several datasets, showing improved diversity while maintaining high relevance. The proposed algorithms, Nash-ANN and p-Mean-ANN, provide efficient solutions with provable guarantees.
**S1:** The paper provides a theoretical foundation by leveraging welfare functions from mathematical economics, particularly Nash social welfare, to address the challenge of diversity in NNS.
**S2:** The paper provides practical and efficient algorithms for solving the proposed welfare-based NNS problems.
**S3:** The paper's approach offers flexibility and adaptability by allowing the trade-off between relevance and diversity to be controlled through the parameter p in the p-mean welfare function. This allows practitioners to tailor the search results to specific task requirements, making the proposed methods versatile for different applications and user preferences.
**W1:** The proposed algorithms, both single-attribute and multi-attribute settings, are simple greedy-based algorithms. Although they are easy to implement and provide theoretical guarantees, they require multiple passes of linear scans over the dataset and thus become inefficient on large-scale datasets. Therefore, improving the efficiency of the proposed algorithms using ANN index structures is a critical issue.
**W2:** The proposed query formulation relies on parameter tuning, particularly the smoothing parameter $\eta$ and the exponent parameter $p$. Finding appropriate values for these parameters for specific queries can be challenging and may require extensive experimentation, which could be a practical limitation when used in practice.
**W3:** The paper primarily compares its methods against a hard-constrained approach and a standard NNS method. In the literature, many other formulations and methods for diversity-aware (as well as fairness-aware) NNS have been proposed. Further comparison with them is necessary to validate the effectiveness of the proposed formulation.
See weaknesses. |
Heavily AI-edited |