Starred repositories

ParseBench - A Document Parsing Benchmark for AI Agents

Python 475 60 Updated Jun 1, 2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

TypeScript 243 7 Updated Jun 4, 2026

[AAAI 2026 Oral] The official GitHub page of "PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography"

Python 70 1 Updated Apr 19, 2026

A collection of AI Skills open-sourced by 数字生命卡兹克

Python 13,663 1,728 Updated Jun 4, 2026

UniScientist is designed to advance universal scientific research intelligence through a unified paradigm

Python 163 12 Updated Mar 14, 2026

OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful re…

Python 1,359 129 Updated May 20, 2026

VCode: SVG as Symbolic Visual Representation

Python 133 6 Updated Feb 21, 2026

[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

Python 213 7 Updated May 28, 2026

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 44,965 5,192 Updated May 30, 2026

A Scientific Multimodal Foundation Model

808 46 Updated May 19, 2026

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 8,903 801 Updated Mar 24, 2026

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,472 45 Updated Mar 9, 2026

Let your Claude be able to think

TypeScript 17,056 1,976 Updated Apr 7, 2026

This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.

91 2 Updated Jun 14, 2025
Python 244 4 Updated Apr 19, 2026

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 72 5 Updated Aug 8, 2025
Python 18 1 Updated Jul 24, 2025

The official repo of the paper "MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"

Python 172 9 Updated Apr 9, 2026

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

Python 125 6 Updated Nov 25, 2024

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 9,005 764 Updated Mar 25, 2026

A hot-tempered AI that unexpectedly tells the truth, produced by DPO fine-tuning of Qwen-2.5-1.5B

Jupyter Notebook 72 10 Updated Jan 18, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,577 66 Updated Jun 14, 2025

[CVPR 2025] This is the official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/

Python 305 40 Updated Apr 5, 2025

Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations

Python 147 6 Updated Sep 28, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,965 380 Updated Mar 12, 2026

Fully open reproduction of DeepSeek-R1

Python 26,032 2,423 Updated Apr 2, 2026

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 20,617 1,464 Updated Jun 2, 2026

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 7,061 784 Updated Jun 4, 2026

Apache ECharts is a powerful, interactive charting and data visualization library for the browser

TypeScript 66,498 19,793 Updated Jun 2, 2026

Simulation platform for general-purpose robotics & embodied AI learning.

Python 29,243 2,762 Updated Jun 4, 2026