Machined Learnings, September 12
AI Research: Honest Self-Assessment and System Progress

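The article discusses the importance of honest self-assessment in AI research, using examples to show that setting concrete tasks for evaluating system performance helps drive technical progress and avoid another AI winter.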

Recently Yoav Goldberg had a famous blog rant. I appreciate his concern, because the situation is game-theoretically dangerous: any individual researcher benefits from aggressively positioning their work (as early as possible), but the field as a whole risks another AI winter as rhetoric and reality become increasingly divergent. Yoav's solution is to incorporate public shaming in order to align local incentives with aggregate outcomes (cf. reward shaping).

I feel there is a better way, as exemplified by a recent paper by Jia and Liang. In this paper the authors corrupt the SQuAD dataset with distractor sentences which have no effect on human performance, but which radically degrade the performance of the systems on the leaderboard. This reminds me of work by Paperno et al. on a paragraph completion task which humans perform with high skill and on which all state-of-the-art NLP approaches fail miserably. Both of these works clearly indicate that our current automatic systems bear only a superficial (albeit economically valuable) resemblance to humans.
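To make the shape of such an evaluation concrete, here is a minimal sketch of the clean-versus-distractor comparison. It is not Jia and Liang's code or data: the word-overlap "reader", the two examples, and the names `overlap_reader`, `add_distractor`, and `EXAMPLES` are all hypothetical stand-ins, chosen only to show how a system that matches surface patterns can be measured before and after a distractor sentence is appended.

```python
import re

def tokens(text):
    """Return the lowercase alphanumeric word tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_reader(question, passage):
    """Toy stand-in for a QA system (an assumption, not a real model):
    return the passage sentence sharing the most distinct words with the question."""
    q = set(tokens(question))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
    return max(sentences, key=lambda s: len(q & set(tokens(s))))

def accuracy(reader, examples):
    """Fraction of examples whose gold answer appears in the reader's output."""
    hits = sum(
        ex["answer"].lower() in reader(ex["question"], ex["passage"]).lower()
        for ex in examples
    )
    return hits / len(examples)

def add_distractor(example):
    """Append the distractor sentence; the correct answer is left unchanged."""
    return {**example, "passage": example["passage"] + " " + example["distractor"]}

# Invented toy data. Each distractor resembles the question but describes
# a different event, so a careful human reader would not be misled by it.
EXAMPLES = [
    {
        "question": "What city hosted the robotics fair in 1998?",
        "passage": "The fair drew large crowds. Lyon hosted the robotics fair in 1998.",
        "answer": "Lyon",
        "distractor": "The city that hosted the junior robotics fair in 1998 was Turin.",
    },
    {
        "question": "Who designed the bridge over the river?",
        "passage": "Mara Voss designed the bridge over the river. It opened in spring.",
        "answer": "Mara Voss",
        "distractor": "The one who designed the tunnel under the bridge over the river was Ivo Kane.",
    },
]

if __name__ == "__main__":
    clean = accuracy(overlap_reader, EXAMPLES)
    adversarial = accuracy(overlap_reader, [add_distractor(ex) for ex in EXAMPLES])
    print(f"clean accuracy: {clean:.2f}, with distractors: {adversarial:.2f}")
```

The gap between the two numbers is the quantity of interest. On this toy data it is extreme by construction and says nothing about any real leaderboard system.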

This approach to honest self-assessment of our capabilities is not only more scholarly, but also more productive, as it provides concrete tasks to consider. At minimum, this will result in improved technological artifacts. Furthermore, iterating this kind of goal-setting-and-goal-solving procedure many, many times might eventually lead to something worthy of the moniker Artificial Intelligence.

(You might argue that the Yoav Goldberg strategy is more entertaining, but the high from the Yoav Goldberg way is a "quick hit", whereas having a hard task to think about has a lot of "replay value".)
