Image Captioning_Fishai

Articles related to "Image Captioning"

Generating Accurate and Detailed Captions for High-Resolution Images

cs.AI updates on arXiv.org 2025-11-03T05:19:12.000000Z

DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts

cs.AI updates on arXiv.org 2025-10-30T04:15:36.000000Z

Compact 3B Image Captioning Powerhouse Arrives, Rivaling Qwen2.5-VL-72B in Performance

机器之心 2025-10-28T07:08:52.000000Z

Top-Down Semantic Refinement for Image Captioning

cs.AI updates on arXiv.org 2025-10-28T04:14:32.000000Z

AFRICAPTION: Establishing a New Paradigm for Image Captioning in African Languages

cs.AI updates on arXiv.org 2025-10-21T04:28:21.000000Z

Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation

cs.AI updates on arXiv.org 2025-10-15T05:12:59.000000Z

Measuring directional bias amplification in image captions using predictability

cs.AI updates on arXiv.org 2025-10-13T04:15:06.000000Z

Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion

cs.AI updates on arXiv.org 2025-10-13T04:14:51.000000Z

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

cs.AI updates on arXiv.org 2025-09-29T04:16:48.000000Z

Understanding Multimodal LLMs

Ahead of AI 2025-09-25T10:01:35.000000Z

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

cs.AI updates on arXiv.org 2025-09-23T06:02:51.000000Z

RACap: Relation-Aware Prompting for Lightweight Retrieval-Augmented Image Captioning

cs.AI updates on arXiv.org 2025-09-22T04:44:15.000000Z

MITS: A Large-Scale Multimodal Benchmark Dataset for Intelligent Traffic Surveillance

cs.AI updates on arXiv.org 2025-09-15T08:20:37.000000Z

Compositional Concept Generalization with Variational Quantum Circuits

cs.AI updates on arXiv.org 2025-09-12T04:19:01.000000Z

Image Embedding Sampling Method for Diverse Captioning

cs.AI updates on arXiv.org 2025-09-05T04:45:34.000000Z

Multimodal RAG Enhanced Visual Description

cs.AI updates on arXiv.org 2025-08-14T04:18:49.000000Z

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance

cs.AI updates on arXiv.org 2025-08-12T04:39:06.000000Z

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

cs.AI updates on arXiv.org 2025-07-28T04:42:59.000000Z

Tell Me What You See: An Iterative Deep Learning Framework for Image Captioning

cs.AI updates on arXiv.org 2025-07-28T04:42:47.000000Z

Copyright © 2019 FISHAI. All Rights Reserved