- Huazhong University of Science and Technology
- Wuhan, Hubei
- https://blog.csdn.net/qq_25737169
Starred repositories
ParseBench - A Document Parsing Benchmark for AI Agents
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
[AAAI 2026 Oral] The official GitHub page of "PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography"
UniScientist is designed to advance universal scientific research intelligence through a unified paradigm
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful re…
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
Multilingual Document Layout Parsing in a Single Vision-Language Model
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Let your Claude be able to think
This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.
XiaomiMiMo / lmms-eval
Forked from EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A hot-tempered AI that unexpectedly tells the truth, produced by DPO fine-tuning Qwen-2.5-1.5B
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[CVPR 2025] This is the official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
Solve Visual Understanding with Reinforced VLMs
Fully open reproduction of DeepSeek-R1
OCR, layout analysis, reading order, table recognition in 90+ languages
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Apache ECharts is a powerful, interactive charting and data visualization library for the browser
Simulation platform for general-purpose robotics & embodied AI learning.





