News

Abstract: Unmanned Aerial Vehicles (UAVs) have proliferated across diverse domains. However, optimal UAV operations necessitate precise and reliable navigation systems. UAVs predominantly rely on the ...
Abstract: Despite the extensive research on RGBT object tracking, there are still several challenges and issues in practical applications, such as modality differences, lighting variations and ...
We release Mono-InternVL, a monolithic multimodal large language model (MLLM) that integrates visual encoding and textual decoding into a single LLM. In Mono-InternVL, a set of visual experts is ...
This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite significant progress in ...
Google is rolling out a new AI-powered experimental feature in Google Translate designed to help people practice and learn a new language, the company announced on Tuesday. Translate is also gaining ...