ViOcean Research

Perception, reasoning, and interaction: The world reveals itself not through pixels, but through the symbols and structures underneath. Vision speaks its own language, directing where to look and what to understand. To live in the world is to engage with the knowledge it offers, absorbing it and transforming experience into capability rather than just caching isolated data.

Our Research

Publication

2025

Artemis: Structured Visual Reasoning for Perception Policy Learning

A framework that enables structured visual reasoning in spatial and object-centric space, improving visual perception tasks through reinforcement learning with verifiable reasoning chains.

Visual Reasoning, Reinforcement Learning, Grounding

Publication

2025

Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs

Current MLLMs perform poorly on basic diagram perception tasks, relying on textual shortcuts rather than visual understanding, a failure mode we call being "math blind". Representing diagrams as graphs of primitives is crucial; our results show that strong low-level perception drives faithful high-level mathematical reasoning.

MLLM, Math, Perception

Publication

2025

SymVAE: Hierarchical Process Reward Models are Symbolic Vision Learners

A self-supervised symbolic auto-encoder that encodes diagrams into structured primitives and their interrelationships, achieving a 98.2% reduction in MSE on geometric diagram reconstruction, with gains of +13% on the diagram perception benchmark and +3% on the MathVerse and GeoQA reasoning benchmarks.

Symbolic Vision, Geometry, Process Rewards
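
For readers unfamiliar with the metric, a relative MSE reduction such as the one quoted above is usually computed against a baseline reconstruction error. The sketch below shows that arithmetic only; the function name and the sample numbers are hypothetical, and the baseline actually used for SymVAE is not stated here.

```python
def relative_reduction(baseline_mse: float, model_mse: float) -> float:
    """Fraction by which model_mse improves on baseline_mse (1.0 = perfect)."""
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return (baseline_mse - model_mse) / baseline_mse


# Hypothetical values chosen for illustration, not taken from the paper.
example = relative_reduction(baseline_mse=0.5, model_mse=0.009)
print(f"{example:.1%}")
```

A value near 1.0 means almost all of the baseline error has been removed; a negative value would mean the model reconstructs worse than the baseline.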

Publication

2025

ViLoMem: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

An agentic learning framework that enables progressive improvement through multimodal semantic memory, integrating visual and logical memory to refine both perception and reasoning for lifelong and cross-domain agentic learning.

Memory Systems, Multimodal, Agentic Learning

Model

2025

Artemis Models

Pre-trained Artemis models for structured visual reasoning and perception policy learning across various visual tasks.

HuggingFace, Visual Reasoning, Perception

Model

2025

Math MLLM Models

An MLLM trained on the GEOMETRIC dataset for enhanced geometric diagram understanding and mathematical reasoning.

HuggingFace, MLLMs, Math

Benchmark

2025

MATHEMETRIC Benchmark

A benchmark that isolates diagram perception from reasoning in MLLMs, featuring 1.2K diagrams and 1.6K curated questions across four tasks: shape classification, counting, relationship identification, and grounding.

GitHub, Evaluation, Perception

Dataset

2025

GEOMETRIC Dataset

A structure-aware geometric diagram-description dataset encoding shapes, attributes, and interrelationships as graphs with fine-grained spatial annotations for model training.

HuggingFace, Geometry, Captions
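
To make the idea of a diagram encoded as a graph concrete, here is a minimal sketch in which shapes are nodes, attributes hang off the nodes, and interrelationships are labeled edges. The class names, fields, and relation strings are hypothetical illustrations and are not the GEOMETRIC dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Primitive:
    """One shape in a diagram (hypothetical schema, not the dataset's)."""
    name: str               # identifier, e.g. "O" or "AB"
    kind: str               # shape type, e.g. "circle" or "line"
    attributes: tuple = ()  # (key, value) pairs, e.g. (("radius", 2.0),)


@dataclass
class DiagramGraph:
    """Primitives as nodes, labeled relations as directed edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add(self, primitive: Primitive) -> None:
        self.nodes[primitive.name] = primitive

    def relate(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append((source, relation, target))

    def describe(self) -> list:
        """Render each edge as a short caption-like sentence."""
        return [
            f"{self.nodes[s].kind} {s} {r} {self.nodes[t].kind} {t}"
            for s, r, t in self.edges
        ]


def build_example() -> DiagramGraph:
    """A circle with one tangent line, as an illustrative diagram."""
    graph = DiagramGraph()
    graph.add(Primitive("O", "circle", (("radius", 2.0),)))
    graph.add(Primitive("AB", "line"))
    graph.relate("AB", "is tangent to", "O")
    return graph


print(build_example().describe())
```

Keeping relations as explicit edges, rather than folding them into free text, is what lets a description be generated from the structure or checked against it.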

Code

2025

Artemis Implementation

Official implementation of the Artemis framework for structured visual reasoning and perception policy learning with reinforcement learning.

GitHub, PyTorch, Open Source

Code

2025

ViLoMem Implementation

Official implementation of the ViLoMem framework, featuring multimodal semantic memory architecture and agentic learning algorithms.

GitHub, PyTorch, Open Source