cs.AI updates on arXiv.org 09月23日
决策树冗余问题与QM方法局限性研究
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文深入探讨了决策树(DTs)在分类任务中的重要应用,并揭示了预测等价决策树(predictive equivalent DTs)带来的冗余问题,这种冗余会影响特征重要性的准确性。虽然McTavish等人提出的MBDSR方法利用Quine-McCluskey(QM)算法来解决预测等价性判定、模型解释和缺失数据处理等问题,但本文指出QM方法在最坏情况下可能出现指数级的运行时间和空间复杂度。研究不仅证明了存在会触发QM方法最坏情况的决策树,还发现MBDSR方法在判定预测等价性时可能产生错误结果。最终,本文提出了一种能在多项式时间内解决这些问题的算法,并在实验中验证了其相对于MBDSR方法的显著效率提升。

📊 **决策树的冗余问题与预测等价性**:文章指出,在决策树(DTs)的“Rashomon集”中,存在大量预测功能相同的决策树,即预测等价决策树。这种冗余会削弱模型的可信度,例如使得基于Rashomon集的特征重要性分析变得不准确,因为不同的决策树可能给出相同的预测结果,但其内部逻辑却可能截然不同。

🧪 **MBDSR方法及其局限性**:McTavish等人提出的MBDSR方法,通过应用Quine-McCluskey(QM)算法寻找决策树的最小尺寸DNF(析取范式)表示,旨在解决预测等价性判定、模型解释和缺失数据处理等问题。然而,QM算法本身在处理复杂公式时,其运行时间和空间复杂度可能呈指数级增长,并且该方法在特定情况下可能无法正确判定预测等价性。

🚀 **高效多项式时间算法的提出**:本文的核心贡献之一是证明了所有MBDSR方法所解决的问题,包括预测等价性判定,实际上都可以在决策树大小的多项式时间内解决。这不仅克服了QM方法在最坏情况下的性能瓶颈,还避免了其潜在的错误结果,为决策树的分析和应用提供了更可靠、更高效的工具。

📈 **实验验证与性能提升**:通过实验,研究人员证实了其提出的算法在面对MBDSR方法中最坏情况的决策树时,性能远超McTavish等人提出的算法,速度提升可达数量级。这表明新算法在处理大规模和复杂决策树时具有显著的优势,进一步印证了其理论上的高效性。

arXiv:2509.17774v1 Announce Type: new Abstract: The Rashomon set of decision trees (DTs) finds importance uses. Recent work showed that DTs computing the same classification function, i.e. predictive equivalent DTs, can represent a significant fraction of the Rashomon set. Such redundancy is undesirable. For example, feature importance based on the Rashomon set becomes inaccurate due the existence of predictive equivalent DTs, i.e. DTs with the same prediction for every possible input. In recent work, McTavish et al. proposed solutions for several computational problems related with DTs, including that of deciding predictive equivalent DTs. This approach, which this paper refers to as MBDSR, consists of applying the well-known method of Quine-McCluskey (QM) for obtaining minimum-size DNF (disjunctive normal form) representations of DTs, which are then used for comparing DTs for predictive equivalence. Furthermore, the minimum-size DNF representation was also applied to computing explanations for the predictions made by DTs, and to finding predictions in the presence of missing data. However, the problem of formula minimization is hard for the second level of the polynomial hierarchy, and the QM method may exhibit worst-case exponential running time and space. This paper first demonstrates that there exist decision trees that trigger the worst-case exponential running time and space of the QM method. Second, the paper shows that the MBDSR approach can produce incorrect results for the problem of deciding predictive equivalence. Third, the paper shows that any of the problems to which the minimum-size DNF representation has been applied to can in fact be solved in polynomial time, in the size of the DT. The experiments confirm that, for DTs for which the the worst-case of the QM method is triggered, the algorithms proposed in this paper are orders of magnitude faster than the ones proposed by McTavish et al.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

决策树 预测等价性 QM算法 Rashomon集 特征重要性 多项式时间算法 Decision Trees Predictive Equivalence QM Algorithm Rashomon Set Feature Importance Polynomial-Time Algorithm
相关文章