Vision-Language Tasks_Fishai

Articles related to "Vision-Language Tasks"

Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering

cs.AI updates on arXiv.org 2025-11-05T05:30:50.000000Z

Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI

cs.AI updates on arXiv.org 2025-10-28T04:14:35.000000Z

Mitigating Coordinate Prediction Bias from Positional Encoding Failures

cs.AI updates on arXiv.org 2025-10-28T04:12:48.000000Z

Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes

cs.AI updates on arXiv.org 2025-10-28T04:04:02.000000Z

GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs

cs.AI updates on arXiv.org 2025-10-27T06:26:55.000000Z

EMRRG: Efficient Fine-Tuning Pre-trained X-ray Mamba Networks for Radiology Report Generation

cs.AI updates on arXiv.org 2025-10-21T04:27:09.000000Z

FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks

cs.AI updates on arXiv.org 2025-10-14T04:20:06.000000Z

LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models

cs.AI updates on arXiv.org 2025-09-30T04:05:19.000000Z

Revolutionizing Vision-Language Tasks with Sparse Attention Vectors: A Lightweight Approach to Discriminative Classification

MarkTechPost@AI 2025-01-15T22:49:52.000000Z

Google AI Introduces LAuReL (Learned Augmented Residual Layer): Revolutionizing Neural Networks with Enhanced Residual Connections for Efficient Model Performance

MarkTechPost@AI 2024-11-17T07:50:15.000000Z

Leopard: A Multimodal Large Language Model (MLLM) Designed Specifically for Handling Vision-Language Tasks Involving Multiple Text-Rich Images

MarkTechPost@AI 2024-11-02T22:20:30.000000Z

Copyright © 2019 FISHAI. All Rights Reserved