Daily arXiv Papers - 2026-05-04

AI-enhanced summaries of research papers from arXiv

Today’s Research Highlights

AI-enhanced summaries of the latest research papers from arXiv.

Table of Contents

cs.CL

[1] Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment

Woody Haosheng Gan, William Held, Diyi Yang

Main category: cs.CL

Abstract: The rapid proliferation of large audio models (LAMs) demands efficient approaches for model comparison, yet comprehensive benchmarks are costly. To fill this gap, we investigate whether minimal subsets can reliably evaluate LAMs while reducing costs and data redundancy. Analyzing 10 subset selection methods with 18 audio models across 40 tasks covering major LAM evaluation dimensions, we show that subsets of just 50 examples (0.3% of data) can achieve over 0.93 Pearson correlation with full benchmark scores. To understand how well these scores align with what practitioners ultimately care about – user satisfaction – we collect 776 human preference ratings from realistic voice assistant conversations, finding that both the subsets and the full benchmark achieve only 0.85 correlation with human preferences. To better predict preferences, we trained regression models on these selected subsets, achieving 0.98 correlation – outperforming regression models trained on both random subsets and the full benchmark. This demonstrates that in regression modeling, well-curated subsets outpredict the full benchmark, showing quality over quantity. We open-source these regression-weighted subsets as the HUMANS benchmark, an efficient proxy for LAM evaluation that captures both benchmark performance and user preferences.
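
The core validation step described above, checking that a small subset's per-model scores track the full-benchmark scores via Pearson correlation, can be sketched in a few lines. The per-model scores below are invented for illustration, not the paper's data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-model scores: full benchmark vs. a small curated subset.
full_scores   = [0.62, 0.71, 0.55, 0.80, 0.67]
subset_scores = [0.60, 0.74, 0.52, 0.83, 0.65]
r = pearson(subset_scores, full_scores)
```

In the paper a regression model is then fit on such subset scores to predict human preference ratings; the same correlation check applies to its predictions.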

[2] NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

Enzo S. N. Silva, Pablo B. Costa, Raphael C. Vlasman, Rosimeire P. Costa, Henrique L. P. Silva, Lucas F. A. O. Pellicer, Guilherme Rinaldo, Renato A. Almeida, Darian S. R. Rabbani, Cinthya O. Oestreich, Vinicius F. Caridá

Main category: cs.CL

Abstract: High-quality corpora are essential for advancing Natural Language Processing (NLP) in Portuguese. Building on previous encoder-only models such as BERTimbau and Albertina PT-BR, we introduce NorBERTo, a modern encoder based on the ModernBERT architecture, featuring long-context support and efficient attention mechanisms. NorBERTo is trained on Aurora-PT, a newly curated Brazilian Portuguese corpus comprising 331 billion GPT-2 tokens collected from diverse web sources and existing multilingual datasets. We systematically benchmark NorBERTo against strong baselines on semantic similarity, textual entailment and classification tasks using standardized datasets such as ASSIN 2 and PLUE. On PLUE, NorBERTo-large achieves the best results among the encoder models we evaluated, notably reaching 0.9191 F1 on MRPC and 0.7689 accuracy on RTE. On ASSIN 2, NorBERTo-large attains the highest entailment F1 (~0.904) among all encoders considered, although Albertina-900M and BERTimbau-large still hold an advantage. To the best of our knowledge, Aurora-PT is currently the largest openly available monolingual Portuguese corpus, surpassing previous resources. NorBERTo provides a modern, mid-sized encoder designed for realistic deployment scenarios: it is straightforward to fine-tune, efficient to serve, and well suited as a backbone for retrieval-augmented generation and other downstream Portuguese NLP systems.

[3] How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses

Ishan Gupta, Pavlo Buryi

Main category: cs.CL

Abstract: We examine if frontier chat-based large language models (LLMs) adjust their outputs based on neurodivergence (ND) context in system prompts and describe the nature of these adjustments. Specifically, we propose NDBench, a 576-output benchmark involving two frontier models, three system prompt types (baseline, ND-profile assertion, and ND-profile assertion with explicit instructions for adjustments), four canonical ND profiles, and 24 prompts across four categories, one of which involves an adversarial masking strategy. Four trends emerge consistently from our findings. First, LLMs show significant adaptation under ND context, where fully instructed conditions yield lengthier and more structured outputs, characterized by higher token counts, more headings, and more granular steps (p < 10^-8, Holm-corrected). Second, such adaptation is largely structural in nature: although list density does not change much, there is a marked rise in the frequency of headings and per-step detail. Third, ND persona assertion alone fails to suppress potentially harmful tendencies, as masking-reinforcement decreases only in explicitly instructed cases (36-44% reduction); the reduction rate barely changes in persona assertion conditions. Moreover, reliability analysis of LLM-based harm assessment reveals that only two out of the six dimensions (masking and reinforcement, validation quality) exceed the pre-defined inter-judge agreement criterion (alpha >= 0.67) and thus can be considered primary results. NDBench is made publicly available along with its prompts, outputs, code, and other resources, forming a reproducible framework for auditing future LLMs’ adaptation to ND awareness.

[4] (title unavailable)

Nhung Thi-Hong Duong, Mai Ngoc Ho, Tin Van Huynh, Kiet Van Nguyen

Main category: cs.CL

Abstract: In this article, we introduce ViLegalNLI, the first large-scale Vietnamese Natural Language Inference (NLI) dataset specifically constructed for the legal domain. The dataset consists of 42,012 premise-hypothesis pairs derived from official statutory documents and annotated with binary inference labels (Entailment and Non-entailment). It covers multiple legal domains and reflects realistic legal reasoning scenarios characterized by structured logic, conditional clauses, and domain-specific terminology. To construct ViLegalNLI, we propose a semi-automatic data generation framework that integrates large language models for controlled hypothesis generation and systematic quality validation procedures. The framework incorporates artifact mitigation strategies and cross-model validation to improve annotation reliability and ensure legal consistency. The resulting dataset captures diverse reasoning patterns, including paraphrasing, logical implication, and legally invalid inferences, thereby providing a comprehensive benchmark for Vietnamese legal inference tasks. We conduct extensive experiments on the ViLegalNLI using multilingual models, Vietnamese-specific pretrained language models, and instruction-tuned large language models. The results show that few-shot LLM configurations consistently achieve superior performance, while performance is significantly influenced by hypothesis length, lexical overlap, and reasoning complexity. Cross-domain evaluations further reveal the challenges of generalizing legal inference across distinct legal fields. Overall, ViLegalNLI establishes a foundational benchmark for Vietnamese legal NLI and supports future research in legal reasoning, statutory text understanding, and the development of reliable AI systems for legal analysis and decision support. The dataset is publicly available for research purposes.

[5] Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

Gaofei Shen, Martijn Bentum, Tom Lentz, Afra Alishahi, Grzegorz Chrupała

Main category: cs.CL

Abstract: Probing is widely used to study which features can be decoded from language model representations. However, the common decoding probe approach has two limitations that we aim to solve with our new encoding probe approach: contributions of different features to model representations cannot be directly compared, and feature correlations can affect probing results. We present an Encoding Probe that reverses this direction and reconstructs internal representations of models using interpretable features. We evaluate this method on text and speech transformer models, using feature sets spanning acoustics, phonetics, syntax, lexicon, and speaker identity. Our results suggest that speaker-related effects vary strongly across different training objectives and datasets, while syntactic and lexical features contribute independently to reconstruction. These results show that the Encoding Probe provides a complementary perspective on interpreting model representations beyond decodability.
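
A minimal sketch of the encoding direction: fit a linear map from an interpretable feature to a hidden dimension and compare reconstruction R² across features, which puts different features on the same scale. The single hidden dimension and the feature values below are a made-up toy, not the paper's probe:

```python
def fit_linear(xs, ys):
    """Least-squares fit y ≈ a*x + b for a single scalar feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def r_squared(xs, ys):
    """Reconstruction quality of the hidden values ys from the feature xs."""
    a, b = fit_linear(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# One toy hidden dimension and two candidate interpretable features.
hidden       = [0.2, 0.5, 0.9, 1.1, 1.6]
syntax_feat  = [1, 2, 4, 5, 7]   # tracks the hidden dimension closely
speaker_feat = [3, 1, 4, 1, 5]   # mostly unrelated
```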

[6] Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

Muhammad Dehan Al Kautsar, Saeed Almheiri, Momina Ahsan, Bilal Elbouardi, Younes Samih, Sarfraz Ahmad, Amr Keleg, Omar El Herraoui, Kareem Elzeky, Abed Alhakim Freihat, Mohamed Anwar, Zhuohan Xie, Junhong Liang, Mohammad Rustom Al Nasar, Preslav Nakov, Fajri Koto

Main category: cs.CL

Abstract: There is a significant gap in evaluating cultural reasoning in LLMs using conversational datasets that capture culturally rich and dialectal contexts. Most Arabic benchmarks focus on short text snippets in Modern Standard Arabic (MSA), overlooking the cultural nuances that naturally arise in dialogues. To address this gap, we introduce ArabCulture-Dialogue, a culturally grounded conversational dataset covering 13 Arabic-speaking countries, in both MSA and each country’s respective dialect, spanning 12 daily-life topics and 54 fine-grained subtopics. We utilize the dataset to form three benchmarking tasks: (i) multiple-choice cultural reasoning, (ii) machine translation between MSA and dialects, and (iii) dialect-steering generation. Our experiments indicate that the performance gap between MSA and Arabic dialects still exists, whereby the models perform worse on all three tasks in the dialectal setup, compared to the MSA one.

[7] Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

Yuxi Ma, Yongqian Peng, Junchen Lyu, Chi Zhang, Yixin Zhu

Main category: cs.CL

Abstract: Humor is a fundamental cognitive phenomenon in which humans derive pleasure from the expectation violations and their resolution, exemplifying the brain’s dynamic capacity for predictive processing. Classical humor theories emphasize semantic incongruity as the primary driver of amusement, yet overlook temporal dynamics despite comedians’ intuition that “timing is everything.” The extent to which temporal structure contributes to humor appreciation and how it interacts with semantic content remains poorly understood. Here, we propose the Dual Prediction Violation (DPV) framework to capture the interplay between content and timing. By analyzing 828 professional Chinese stand-up performances, we show that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Specifically, we find that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines–a strategic coupling that distinguishes successful from unsuccessful performances. These findings reframe humor as temporally scaffolded, where timing and semantic content operate in strategic coordination rather than independently. Our DPV framework bridges humor theory with predictive processing, demonstrating that temporal structure plays a central role in naturalistic humor appreciation with implications for understanding multi-scale prediction integration in linguistic processing.

[8] Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

Kai-Wei Chang, En-Pei Hu, Chun-Yi Kuan, Wenze Ren, Wei-Chih Chen, Guan-Ting Lin, Yu Tsao, Shao-Hua Sun, Hung-yi Lee, James Glass

Main category: cs.CL

Abstract: Not available – the arXiv API request for 2509.26388 returned HTTP 429 (rate limited).

[9] RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

Jugal Gajjar, Kamalasankari Subramaniakuppusamy

Main category: cs.CL

Abstract: When a language model answers a table question, users have no way to verify which cells informed which reasoning steps. We introduce RSAT, a method that trains small language models (SLMs, 1-8B) to produce step-by-step reasoning with cell-level citations grounded in table evidence. Phase 1 (SFT) teaches a structured JSON output format from verified reasoning traces. Phase 2 (GRPO) optimizes a composite reward centered on NLI-based faithfulness, alongside citation validity and parsimony. Across six models from two families – Qwen 2.5 (1.5B/3B/7B) and Llama 3 (1B/3B/8B) – RSAT improves faithfulness 3.7× over SFT alone (0.224 → 0.826), with near-perfect citation validity (0.992). Post-hoc attribution collapses below 13% format success, confirming that attribution must be integrated into reasoning, not retrofitted. Ablations show the faithfulness reward is essential: removing it drops faithfulness from 0.97 to 0.03.
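
The composite reward described for Phase 2 can be sketched as a weighted sum over faithfulness, citation validity, and parsimony. The weights and the parsimony penalty below are illustrative assumptions, not RSAT's actual reward:

```python
def composite_reward(faithfulness, cited_cells, table_cells,
                     w_faith=0.6, w_valid=0.3, w_parsi=0.1):
    """Illustrative composite reward: NLI-style faithfulness score in [0, 1],
    fraction of citations that exist in the table, and a penalty for over-citing.
    The weights and the parsimony term are assumptions, not the paper's values."""
    validity = sum(c in table_cells for c in cited_cells) / max(len(cited_cells), 1)
    parsimony = 1.0 / (1 + max(len(cited_cells) - 3, 0))  # flat until >3 citations
    return w_faith * faithfulness + w_valid * validity + w_parsi * parsimony

table = {"A1", "B2", "C3"}
good = composite_reward(0.9, ["A1", "B2"], table)   # all citations valid
bad  = composite_reward(0.9, ["X9", "B2"], table)   # one hallucinated cell
```

The validity term is what drives the near-perfect citation validity reported above: citing a cell that is not in the table directly lowers the reward.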

[10] Confidence Estimation in Automatic Short Answer Grading with LLMs

Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

Main category: cs.CL

Abstract: Automatic Short Answer Grading (ASAG) with generative large language models (LLMs) has recently demonstrated strong performance without task-specific fine-tuning, while also enabling the generation of synthetic feedback for educational assessment. Despite these advances, LLM-based grading remains imperfect, making reliable confidence estimates essential for safe and effective human-AI collaboration in educational decision-making. In this work, we investigate confidence estimation for ASAG with LLMs by jointly considering model-based confidence signals and dataset-derived uncertainty. We systematically compare three model-based confidence estimation strategies, namely verbalizing, latent, and consistency-based confidence estimation, and show that model-based confidence alone is insufficient to reliably capture uncertainty in ASAG. To address this limitation, we propose a hybrid confidence framework that integrates model-based confidence signals with an explicit estimate of dataset-derived aleatoric uncertainty. Aleatoric uncertainty is operationalized by clustering semantically embedded student responses and quantifying within-cluster heterogeneity. Our results demonstrate that the proposed hybrid confidence measure yields more reliable confidence estimates and improves selective grading performance compared to single-source approaches. Overall, this work advances confidence-aware LLM-based grading for human-in-the-loop assessment, supporting more trustworthy AI-assisted educational assessment systems.
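
A toy version of the dataset-derived uncertainty estimate: measure within-cluster disagreement among human grades and blend it with the model's own confidence. The majority-fraction heterogeneity measure and the blending weight alpha are simplifying assumptions, not the paper's exact operationalization:

```python
from collections import Counter

def heterogeneity(grades):
    """Within-cluster disagreement: 1 minus the majority-label fraction."""
    counts = Counter(grades)
    return 1 - counts.most_common(1)[0][1] / len(grades)

def hybrid_confidence(model_conf, cluster_grades, alpha=0.5):
    """Blend the model's own confidence with (1 - aleatoric uncertainty)."""
    return alpha * model_conf + (1 - alpha) * (1 - heterogeneity(cluster_grades))

# Human grades of semantically similar student responses (invented).
unanimous = ["correct", "correct", "correct", "correct"]
split     = ["correct", "correct", "partial", "partial"]
```

Two responses graded with equal model confidence thus end up with different hybrid confidence if one sits in a cluster where human graders disagree.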

[11] (title unavailable)

Jan Sobotka, Mustafa O. Karabag, Ufuk Topcu

Main category: cs.CL

Abstract: Large language models (LLMs) are increasingly tasked with strategic decision-making under incomplete information, such as in negotiation and policymaking. While LLMs can excel at many such tasks, they also fail in ways that are poorly understood. We shed light on these failures by uncovering two fundamental gaps in the internal mechanisms underlying the decision-making of LLMs in incomplete-information games, supported by experiments with open-weight models Llama 3.1, Qwen3, and gpt-oss. First, an observation-belief gap: LLMs encode internal beliefs about latent game states that are substantially more accurate than their own verbal reports, yet these beliefs are brittle. In particular, the belief accuracy degrades with multi-hop reasoning, exhibits primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. Second, a belief-action gap: The implicit conversion of internal beliefs into actions is weaker than that of the beliefs externalized in the prompt, yet neither belief-conditioning consistently achieves higher game payoffs. These results show how analyzing LLMs’ internal processes can expose systematic vulnerabilities that warrant caution before deploying LLMs in strategic domains without robust guardrails.

[12] Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

Prerna Juneja, Lika Lomidze

Main category: cs.CL

Abstract: There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insights into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity, and collect 1,674 dialogue pairs across 25 high-risk scenarios. We combine emotion modeling and LLM-assisted utterance-and harm-level classification to analyze these exchanges. Results show that Replika exhibits a narrow emotional range dominated by curiosity and care, while frequently mirroring or normalizing unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.

[13] Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory

Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

Main category: cs.CL

Abstract: Automated short answer grading (ASAG) with large language models (LLMs) is commonly evaluated with aggregate metrics such as macro-F1 and Cohen’s kappa. However, these metrics provide limited insight into how grading performance varies across student responses of differing grading difficulty. We introduce an evaluation framework for LLM-based ASAG based on item response theory (IRT), which models grading correctness as a function of latent grader ability and response grading difficulty. This formulation enables response-level analysis of where LLM graders succeed or fail and reveals robustness differences that are not visible from aggregate scores alone. We apply the framework to 17 open-weight LLMs on the SciEntsBank and Beetle benchmarks. The results show that even models with similar overall performance differ substantially in how sharply their grading accuracy declines as response difficulty increases. In addition, confusion patterns show that errors on difficult responses concentrate disproportionately on the partially_correct_incomplete label, indicating a tendency toward intermediate-label collapse under ambiguity. To characterize difficult responses, we further analyze semantic and linguistic correlates of estimated difficulty. Across both datasets, higher difficulty is associated with weaker semantic alignment to the reference answer, stronger contradiction signals, and greater semantic isolation in embedding space. Overall, these results show that item response theory offers a useful framework for evaluating LLM-based ASAG beyond aggregate performance measures.
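
The underlying IRT formulation, in its simplest Rasch form, models grading correctness as a logistic function of grader ability θ minus response difficulty b. A minimal sketch, with ability estimated by gradient ascent on the log-likelihood (the paper's estimation procedure may differ):

```python
import math

def p_correct(theta, b):
    """Rasch model: P(a grader of ability theta grades a response of difficulty b correctly)."""
    return 1 / (1 + math.exp(-(theta - b)))

def estimate_ability(outcomes, difficulties, steps=200, lr=0.1):
    """Maximum-likelihood ability estimate; the log-likelihood gradient is sum(y - p)."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(y - p_correct(theta, b) for y, b in zip(outcomes, difficulties))
        theta += lr * grad
    return theta

# Hypothetical grading outcomes on responses ordered easy -> hard.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta_strong = estimate_ability([1, 1, 1, 1, 1], difficulties)  # never misses
theta_weak   = estimate_ability([1, 1, 1, 0, 0], difficulties)  # fails hard items
```

Plotting p_correct against b for each fitted grader is one way to see the differing accuracy declines the abstract describes.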

[14] Lost in State Space: Probing Frozen Mamba Representations

Bhagyashree Wagh, Akash Singh

Main category: cs.CL

Abstract: Mamba’s recurrent state h_t is, by construction, a compressed summary of every token seen so far. This raises a tempting hypothesis: if we extract token-level outputs y_t at fixed patch boundaries, we obtain semantic sentence summaries for free, with no pooling head, no fine-tuning, and no [CLS] token. We test this hypothesis carefully. Across five benchmarks (SST-2, CoLA, MRPC, STS-B, IMDb), we compare four strategies for extracting frozen sentence representations from a pretrained Mamba-130M backbone under a strict frozen-feature probing protocol, using three random seeds where computationally feasible. The results do not support the hypothesis: patch boundary readouts do not consistently outperform simple mean pooling. We identify and quantify two structural pathologies: severe anisotropy (mean pairwise cosine similarity 0.9999, std 0.000044) and representational collapse in the raw final SSM state (MCC = 0.000 on CoLA across all three seeds, confirmed via confusion matrix). We further propose orthogonal injection, a modified recurrence that constrains new information per
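
The anisotropy diagnostic reported above (mean pairwise cosine similarity) is straightforward to compute. A self-contained sketch with toy vectors in place of real sentence representations:

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_pairwise_cosine(vectors):
    """Anisotropy diagnostic: average cosine similarity over all pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# A collapsed (highly anisotropic) set: all vectors point in nearly the same direction.
collapsed = [[1.0, 0.010], [0.9, 0.012], [1.1, 0.008]]
# A spread-out set with no shared direction.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
```

Values near 1.0, as the paper reports for the raw final SSM state, indicate that the representations occupy a narrow cone and carry little discriminative direction.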

[15] Retrieval-Augmented Reasoning for Chartered Accountancy

Jatin Gupta, Akhil Sharma, Saransh Singhania, Ali Imam Abidi

Main category: cs.CL

Abstract: The inception of Large Language Models (LLMs) has catalyzed AI adoption in the finance sector, yet their reliability in complex, jurisdiction-specific tasks like Indian Chartered Accountancy (CA) remains limited. The models struggle with multi-step numerical tasks that also demand advanced knowledge of legal regulations, and scaling them is not feasible in resource-constrained settings. We present CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework that operates with a 14B, 4-bit-quantized reasoning model, 14B-DeepSeek-R1, and a layout-aware Docling extraction system that maintains document structure during extraction. CA-ThinkFlow uses a basic RAG method that automatically adds retrieved information into the prompt, relying on the model’s built-in Chain-of-Thought (CoT) capabilities to use that context and produce correct answers. When tested on the multi-level CA-Ben benchmark, the system performs on par with large proprietary models, achieving Scholastic Reliability Coefficient (SRC) results equal to 68.75% of GPT-4o and Claude 3.5 Sonnet. The framework shows high efficiency and parameter robustness, but its reasoning still falls short on complex regulatory texts in fields such as Taxation.
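
The basic RAG pattern of adding retrieved information into the prompt can be illustrated with a bag-of-words retriever and simple prompt assembly. The corpus, query, and similarity measure here are toy stand-ins; the actual system uses Docling extraction and a 14B reasoning model:

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words term counts (a toy retriever)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_and_prompt(query, corpus, k=1):
    """Basic RAG: rank passages by similarity and prepend the top-k to the prompt."""
    ranked = sorted(corpus, key=lambda doc: bow_cosine(query, doc), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Toy corpus of statutory snippets (invented, not CA-Ben content).
corpus = [
    "Income tax slab rates apply to individual taxpayers",
    "Auditing standards require auditor independence",
    "GST registration thresholds for small businesses",
]
prompt = retrieve_and_prompt("income tax slab rates for individuals", corpus)
```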

[16] How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework

Hamidreza Saghir

Main category: cs.CL

Abstract: Recent white-box OOD detection methods for LLMs – including CED, RAUQ, and WildGuard confidence scores – appear effective, but we show they are structurally confounded by sequence length (|r| ≥ 0.61) and collapse to near-chance under length-matched evaluation. Even raw attention entropy (mean H(α) across heads and layers), a natural baseline we include for completeness, shows the same confound. The confound stems from attention’s Θ(log T) dependence on input length. To identify genuine OOD signals after deconfounding, we propose a two-pathway framework: embeddings capture what text is about (effective for topic shifts), while the processing trajectory – hidden-state evolution across layers – captures how the model processes input. The relative power of each pathway varies along a vocabulary-transparency spectrum: embedding methods excel on vocabulary-distinctive OOD, while trajectory features detect covert-intent inputs that share vocabulary with normal text (0.721 avg AUROC; Jailbreak: 0.850). Three evidence lines support this framework: (1) a crossover between k-NN and trajectory scoring across 6 tasks, where each pathway wins on different OOD types; (2) a per-layer analysis showing that layer-0 k-NN signal is almost entirely a length artifact (Jailbreak: 0.759 raw → 0.389 matched) – processing constructs genuine OOD signal from near-chance embeddings; and (3) circuit attribution showing adversarial tasks engage attention circuits more than semantic tasks (p = 0.022; Jailbreak patching p < 0.001), with partial cross-model replication. Code release upon publication.

[17] Are You the A-hole? A Fair, Multi-Perspective Ethical Reasoning Framework

Sheza Munir, Ahanaf Rodoshi, Sumin Lee, Feiran Chang, Xujie Si, Syed Ishtiaque Ahmed

Main category: cs.CL

Abstract: Standard methods for aggregating natural language judgments, such as majority voting, often fail to produce logically consistent results when applied to high-conflict domains, treating differing opinions as noise. We propose a neuro-symbolic aggregation framework that formalizes conflict resolution through Weighted Maximum Satisfiability (MaxSAT). Our pipeline utilizes a language model to map unstructured natural language explanations into interpretable logical predicates and confidence weights. These components are then encoded as soft constraints within the Z3 solver, transforming the aggregation problem into an optimization task that seeks the maximum consistency across conflicting testimony. Using the Reddit r/AmItheAsshole forum as a case study in large-scale moral disagreement, our system generates logically coherent verdicts that diverge from popularity-based labels 62% of the time, corroborated by an 86% agreement rate with independent human evaluators. This study demonstrates the efficacy of coupling neural semantic extraction with formal solvers to enforce logical soundness and explainability in the aggregation of noisy human reasoning.
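
The paper encodes soft constraints in the Z3 solver; as a dependency-free stand-in, a brute-force weighted MaxSAT over a handful of boolean predicates conveys the same aggregation idea. The predicates and weights below are invented examples of extracted testimony, not the paper's encoding:

```python
from itertools import product

def weighted_maxsat(variables, soft_clauses):
    """Brute-force weighted MaxSAT: pick the boolean assignment maximizing the
    total weight of satisfied soft clauses. Feasible only for a few variables."""
    best, best_score = None, float("-inf")
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        score = sum(w for w, holds in soft_clauses if holds(assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Invented predicates standing in for LLM-extracted testimony about one scenario.
clauses = [
    (0.9, lambda a: a["broke_promise"]),                       # strong claim: a promise was broken
    (0.8, lambda a: not a["broke_promise"] or a["at_fault"]),  # breaking a promise implies fault
    (0.3, lambda a: not a["at_fault"]),                        # weaker dissenting claim
]
verdict, score = weighted_maxsat(["broke_promise", "at_fault"], clauses)
```

Here the solver-style optimization overrides the weaker dissenting claim, which is exactly how such a verdict can diverge from a popularity-based label.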

[18] What Don’t You Understand? Using Large Language Models to Identify and Characterize Student Misconceptions About Challenging Topics

Michael J. Parker, Maria G. Zavala-Cerna

Main category: cs.CL

Abstract: This study presents a systematic approach to identifying and characterizing student misconceptions in online learning environments through a novel combination of quantitative performance analysis and large language model (LLM) assessment. We analyzed data from 9 course periods across 5 online biomedical science courses, encompassing 3,802 medical student enrollments. Using data from 40-50 topic-focused quizzes per course, we developed a two-stage methodology. First, we identified challenging central topics using quiz-level performance metrics. Second, we employed LLMs to characterize the underlying misconceptions in these high-priority areas. By examining student performance on first attempts across primarily multiple-choice questions (MCQs), we identified consistently challenging topics that were also central to course objectives. We then leveraged recent advances in generative AI to analyze three distinct data sources in combination: quiz question content, student response patterns, and lecture transcripts. This approach revealed actionable insights about student misconceptions that were not apparent from performance data alone. The quality of the LLM-identified misconceptions was rated as excellent by subject matter experts. We also conducted teacher interviews to assess the perceived utility of our topic identification method. Faculty found that data-driven identification of challenging topics was valuable and corroborated their own classroom observations. This methodology provides a scalable approach to characterizing student difficulties in learning environments where quizzes are used. Our findings demonstrate the potential for targeted and potentially personalized interventions in future course iterations, with clear pathways for measuring intervention effectiveness through follow-up quiz performance.

[19] Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation

Pooja Guttal, Varun Magotra, Vasudeva Mahavishnu, Natasha Chanto, Sidharth Sivaprasad, Manas Gaur

Main category: cs.CL

Abstract: Tabular documents such as CSV and Excel files are widely used in enterprise data pipelines, yet existing chunking strategies for retrieval-augmented generation (RAG) are primarily designed for unstructured text and do not account for tabular structure. We propose a structure-aware tabular chunking (STC) framework that operates on row-level units by constructing a hierarchical Row Tree representation, where each row is encoded as a key-value block. STC performs token-constrained splitting aligned with structural boundaries and applies overlap-free greedy merging to produce dense, non-overlapping chunks. This design preserves semantic relationships between fields within a row while improving token utilization and reducing fragmentation. Across evaluations on the MAUD dataset, STC reduces chunk count by up to 40% and 56% compared to standard recursive and key-value based baselines, respectively, while improving token utilization and processing efficiency. In retrieval benchmarks, STC improves MRR from 0.3576 to 0.5945 in a hybrid setting and increases Recall@1 from 0.366 to 0.754 in BM25-only retrieval. These results demonstrate that preserving structure during chunking improves retrieval performance, highlighting the importance of structure-aware chunking for RAG over tabular data.
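
The row-level key-value encoding plus overlap-free greedy merging can be sketched as follows. The whitespace token count and the tiny budget are simplifications; a real system would use a model tokenizer and a much larger budget:

```python
def row_to_block(header, row):
    """Encode one table row as a key-value block, keeping a row's fields together."""
    return "; ".join(f"{k}={v}" for k, v in zip(header, row))

def greedy_merge(blocks, max_tokens):
    """Merge consecutive row blocks into dense, non-overlapping chunks under a token budget."""
    chunks, current, count = [], [], 0
    for block in blocks:
        t = len(block.split())  # crude token count: whitespace-separated words
        if current and count + t > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(block)
        count += t
    if current:
        chunks.append("\n".join(current))
    return chunks

header = ["name", "dept"]
rows = [["Ada", "Math"], ["Alan", "CS"], ["Grace", "Navy"],
        ["Edsger", "CS"], ["Kurt", "Logic"]]
blocks = [row_to_block(header, r) for r in rows]   # each block counts as 2 "tokens"
chunks = greedy_merge(blocks, max_tokens=4)        # so at most 2 rows per chunk
```

Because splits only ever fall on row boundaries, no chunk contains half a row, which is the structural property the abstract credits for the retrieval gains.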

[20] Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

Charles Weng, Dingwen Li, Alexander Martin

Main category: cs.CL

Abstract: Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when the binary label is constrained to a fixed output position, equivalent prompts can induce materially different unsafe probabilities for the same sample. Across multimodal safety benchmarks and multiple VLM families, cross-prompt variance is strongly associated with prompt-level disagreement and higher error, making it a useful fragility diagnostic. A training-free mean ensemble improves NLL on all 14 dataset-model evaluation pairs and ECE on 12/14 relative to a train-selected single-prompt baseline, and wins more head-to-head NLL comparisons than labeled temperature scaling, Platt scaling, and isotonic regression applied to the same prompt. Ranking gains are consistent against the train-selected baseline on both AUROC and AUPRC, and against the full 15-prompt distribution remain consistent on AUPRC while softening on AUROC. Labeled calibration on top of the mean provides further gains when labels are available, identifying prompt averaging as a strong label-free first stage rather than a replacement for calibration. We frame this as a reliability stress test for zero-shot VLM first-token safety scores and recommend prompt-family evaluation with mean aggregation as a standard label-free reliability baseline.
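
The training-free mean ensemble reads, in miniature, as follows; the per-prompt scores are hypothetical and the NLL is the standard binary form, not anything taken from the paper:

```python
# Sketch of the label-free mean ensemble over prompt reformulations: average
# the per-prompt first-token unsafe probabilities for a sample instead of
# trusting any single prompt.
import math

def mean_ensemble(prob_by_prompt):
    """Average unsafe probabilities across semantically equivalent prompts."""
    return sum(prob_by_prompt) / len(prob_by_prompt)

def binary_nll(p_unsafe, label, eps=1e-12):
    """Negative log-likelihood of the true binary label under score p_unsafe."""
    p = p_unsafe if label == 1 else 1.0 - p_unsafe
    return -math.log(max(p, eps))

# Equivalent prompts can disagree sharply on the same sample:
scores = [0.9, 0.2, 0.7]       # hypothetical per-prompt unsafe probabilities
avg = mean_ensemble(scores)    # approx 0.6
```

High cross-prompt variance in `scores` is exactly the fragility signal the abstract describes.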

[21] Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

Garvin Kruthof

Main category: cs.CL

Abstract: When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe reveals a dissociation between declarative recall and behavioral adherence, as models accurately restate constraints they simultaneously violate. The knows-but-violates (KBV) rate, measuring constraint non-compliance despite preserved recall, ranges from 8% to 99% across models. Structured checkpointing partially reduces KBV rates but does not close the dissociation, and complexity inflation persists. Human validation against blind raters confirms that the LLM judge under-detects constraint violations, making reported constraint adherence scores conservative. Sensitivity analyses confirm the findings are robust to temperature (0.7 vs. 1.0) and pressure type (novelty vs. rigor). We release all briefs, prompts, rubrics, transcripts, and scores as an open benchmark.
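
The KBV rate defined in the abstract is simple to state as code; the per-turn record schema below is an assumption for illustration:

```python
# Knows-but-violates (KBV) rate: among constraint checks where the model
# accurately restated (recalled) the constraint, the fraction it nonetheless
# violated in its output.

def kbv_rate(turns):
    """turns: dicts with boolean 'recalled' and 'violated' per constraint check."""
    knows = [t for t in turns if t["recalled"]]
    if not knows:
        return 0.0
    return sum(t["violated"] for t in knows) / len(knows)

turns = [
    {"recalled": True,  "violated": True},
    {"recalled": True,  "violated": False},
    {"recalled": False, "violated": True},   # no recall, excluded from KBV
]
```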

[22] Budget-Aware Routing for Long Clinical Text

Khizar Qureshi, Geoffrey Martin, Yifan Peng

Main category: cs.CL

Abstract: A key challenge for large language models is token cost per query and overall deployment cost. Clinical inputs are long, heterogeneous, and often redundant, while downstream tasks are short and high stakes. We study budgeted context selection, where a subset of document units is chosen under a strict token budget so an off-the-shelf generator can meet fixed cost and latency constraints. We cast this as a knapsack-constrained subset selection problem with two design choices: unitization, which defines document segmentation, and selection, which determines which units are kept. We propose RCD, a monotone submodular objective that balances relevance, coverage, and diversity. We compare sentence, section, window, and cluster-based unitization, and introduce a routing heuristic that adapts to the budget regime. Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that optimal strategies depend on the evaluation setting. Positional heuristics perform best at low budgets in extractive tasks, while diversity-aware methods such as MMR improve LLM generation. Selector choice matters more than unitization, with cluster-based grouping reducing performance and other schemes behaving similarly. ROUGE saturates for LLM summaries, while BERTScore better reflects quality differences. We release our code at https://github.com/stone-technologies/ACL_budget_paper.
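
Knapsack-constrained greedy selection over a monotone submodular objective can be sketched as below; the objective (relevance plus a coverage bonus) and the cost-benefit ratio rule are illustrative assumptions, not the paper's exact RCD formulation:

```python
# Toy cost-benefit greedy selection under a token budget. Each unit carries a
# relevance score, a set of covered topics, and a token cost; the marginal gain
# rewards relevance plus topics not yet covered (monotone submodular).

def marginal_gain(unit, covered):
    """Relevance plus reward for newly covered topics (a diversity proxy)."""
    return unit["relevance"] + len(unit["topics"] - covered)

def greedy_select(units, budget):
    """Repeatedly pick the unit with the best gain-per-token that still fits."""
    selected, covered, used = [], set(), 0
    remaining = list(units)
    while remaining:
        best = max(
            (u for u in remaining if used + u["tokens"] <= budget),
            key=lambda u: marginal_gain(u, covered) / u["tokens"],
            default=None,
        )
        if best is None:
            break
        selected.append(best)
        covered |= best["topics"]
        used += best["tokens"]
        remaining.remove(best)
    return selected, used
```

The ratio rule naturally prefers short, relevant, topic-diverse units, and stops once no remaining unit fits the budget.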

[23] Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding

Lehan Pan, Ziyang Tao, Ruoyu Pang, Xiao Wang, Jianjun Zhao, Yanyong Zhang

Main category: cs.CL

Abstract: Tree-based speculative decoding accelerates autoregressive generation by verifying multiple draft candidates in parallel, but this advantage weakens for sparse Mixture-of-Experts (MoE) models. As the draft tree grows, different branches activate different experts, expanding the union of activated experts and substantially increasing target-side verification cost. We propose EVICT, a training-free, hyperparameter-free, and lossless adaptive verification method for MoE speculative decoding. EVICT makes every verified token count by truncating the draft tree before target verification and retaining only the cost-effective prefix. It leverages fine-grained drafter signals to estimate candidate benefit, combines them with offline-profiled verification cost, and remains highly compatible with the high-performance graph-based serving framework SGLang. Extensive experiments on diverse MoE backbones and benchmarks show that EVICT achieves up to 2.35x speedup over autoregressive decoding and an average 1.21x speedup over the state-of-the-art baseline EAGLE-3, while significantly reducing unnecessary expert activations during verification.
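
The cost-effective-prefix idea can be illustrated with a toy benefit/cost scan; the linear net-benefit criterion below is an assumption for illustration, not EVICT's actual estimator:

```python
# Sketch of truncating a drafted candidate sequence before target verification:
# keep the prefix whose cumulative estimated benefit most exceeds its
# cumulative verification cost.

def truncate_draft(candidates):
    """candidates: list of (benefit, cost) pairs in draft order.
    Returns the length of the prefix with the highest net benefit."""
    best_len, best_net, net = 0, 0.0, 0.0
    for i, (benefit, cost) in enumerate(candidates, start=1):
        net += benefit - cost
        if net > best_net:
            best_net, best_len = net, i
    return best_len
```

Deep, low-confidence draft tokens (low benefit, high cost) are cut, shrinking the union of experts the target model must activate during verification.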

[24] MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

Tianyu Hu, Weikai Lin, Weizhi Zhang, Jing Ma, Song Wang

Main category: cs.CL

Abstract: Long-term conversational agents must decide which turns to store in external memory, yet recent systems rely on autoregressive LLM generation at every turn to make that decision. We present MemRouter, a write-side memory router that decouples memory admission from the downstream answer backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether the turn should be stored using lightweight classification heads while training only 12M parameters. Under a controlled matched-harness comparison on LoCoMo, where the retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B) are held identical, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, non-overlapping 95% CIs) while reducing memory-management p50 latency from 970ms to 58ms. Descriptive factorial averaging further shows that learned admission improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component in long-horizon conversational QA.
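
The write-side routing decision reduces to scoring an embedding with a lightweight head; the tiny logistic head, weights, and threshold below are all illustrative assumptions:

```python
# Sketch of embedding-based memory admission: a small classification head
# scores each turn's embedding and decides store/skip without any
# autoregressive decoding.
import math

def admit_score(embedding, weights, bias):
    """Logistic head over a (frozen-backbone) turn embedding."""
    z = sum(e * w for e, w in zip(embedding, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def route_turn(embedding, weights, bias, threshold=0.5):
    """Store the turn iff the admission probability clears the threshold."""
    return admit_score(embedding, weights, bias) >= threshold

memory = []
turns = {"t1": [0.9, 0.1], "t2": [-0.8, 0.2]}   # hypothetical turn embeddings
w, b = [2.0, 1.0], 0.0
for turn_id, emb in turns.items():
    if route_turn(emb, w, b):
        memory.append(turn_id)
```

A single dot product and sigmoid per turn is what makes the reported p50 latency drop (970 ms to 58 ms) plausible.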

[25] From Backward Spreading to Forward Replay: Revisiting Target Construction in LLM Parameter Editing

Wei Liu, Hongkai Liu, Zhiying Deng, Yee Whye Teh, Wee Sun Lee

Main category: cs.CL

Abstract: LLM parameter editing methods commonly rely on computing an ideal target hidden state at a target layer (referred to as the anchor point) and distributing the target vector to multiple preceding layers (commonly known as backward spreading) for cooperative editing. Although widely used, this scheme's underlying basis has not been systematically investigated. In this paper, we first conduct a systematic study of its foundations, clarifying its capability boundaries, practical considerations, and potential failure modes. We then propose a simple and elegant alternative that replaces backward spreading with forward propagation. Instead of optimizing the target at the last editing layer, we optimize the anchor point at the first editing layer and then propagate it forward to obtain accurate and mutually compatible target hidden states for all subsequent editing layers. This approach matches the computational complexity of existing methods while producing more accurate layer-wise targets. Our method is simple, interfering with neither the computation of the initial target hidden state nor any other component of the subsequent editing pipeline, and thus benefits a wide range of LLM parameter editing methods.

[26] Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

Jiawei Wu, DouDou Zhou

Main category: cs.CL

Abstract: Machine unlearning has emerged as a critical capability for addressing privacy, safety, and regulatory concerns in large language models (LLMs). Existing methods operate at the sequence level, applying uniform updates across all tokens despite only a subset encoding the knowledge targeted for removal. This introduces gradient noise, degrades utility, and leads to suboptimal forgetting. We propose TokenUnlearn, a token-level attribution framework that identifies and selectively targets critical tokens. Our approach combines knowledge-aware signals via masking, and entropy-aware signals to yield importance scores for precise token selection. We develop two complementary strategies: hard selection, applying unlearning only to high-importance tokens, and soft weighting, modulating gradient contributions based on importance scores. Both extend existing methods to token-level variants. Theoretical analysis shows token-level selection improves gradient signal-to-noise ratio. Experiments on TOFU and WMDP benchmarks across three model architectures demonstrate consistent improvements over sequence-level baselines in both forgetting effectiveness and utility preservation.
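
The soft-weighting strategy can be sketched as importance-weighted per-token losses; the particular score combination (knowledge mask times entropy) is an assumption for illustration:

```python
# Sketch of token-level soft weighting for unlearning: combine knowledge-aware
# and entropy-aware signals into normalized importance scores, then modulate
# each token's loss contribution before the unlearning update.

def token_importance(knowledge_mask, entropies):
    """Combine a knowledge mask with entropy signals into normalized scores."""
    scores = [m * h for m, h in zip(knowledge_mask, entropies)]
    total = sum(scores) or 1.0
    return [s / total for s in scores]          # normalize to sum to 1

def weighted_unlearning_loss(token_nlls, importance):
    """Importance-weighted per-token NLL; uniform weights recover the baseline."""
    return sum(w * l for w, l in zip(importance, token_nlls))

nlls = [2.0, 0.5, 3.0]           # hypothetical per-token NLLs on forget data
mask = [1, 0, 1]                  # tokens encoding the targeted knowledge
ent  = [0.5, 0.9, 1.5]            # hypothetical predictive entropies
w = token_importance(mask, ent)
```

Zero-importance tokens contribute no gradient, which is the source of the improved signal-to-noise ratio the abstract analyzes.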

[27] Language-free Experience at Expo 2025 Osaka

Michael Paul, Kenji Imamura, Xiaolin Wang, Shohei Higashiyama, Masao Utiyama

Main category: cs.CL

Abstract: In line with the Global Communication Plan 2025, we have pursued the development of multilingual translation technologies to realize a language-barrier-free experience at Expo 2025 Osaka. Our work includes the advancement of simultaneous interpretation systems emphasizing high translation quality and low latency. Key achievements include chunk-based input segmentation, context-aware translation, and multi-engine machine translation technologies. Through demonstration deployments and collaboration with private companies, our technologies have led to real-world applications, with several services and systems showcased at Expo 2025 Osaka.

[28] Agentic AI for Substance Use Education: Integrating Regulatory and Scientific Knowledge Sources

Kosar Haghani, Zahra Kolagar, Mohammed Atiquzzaman

Main category: cs.CL

Abstract: The delivery of traditional substance education has remained problematic due to challenges in scalability, personalization, and the currency of information in a rapidly evolving substance use landscape. While artificial intelligence (AI) offers a promising frontier for enhancing educational delivery, its application in providing real-time, authoritative substance use education remains largely underexplored. We built an agentic AI web application that combines Drug Enforcement Administration records with peer-reviewed literature in real time to provide transparent, context-sensitive substance use education. The system uses retrieval-augmented generation with a carefully filtered corpus of 102 documents and dynamic PubMed queries. Documents were semantically chunked and stored as vector representations for efficient retrieval. We conducted an expert evaluation study in which a panel of five subject matter experts generated 30 domain-specific questions, and two independent raters assessed 90 system interactions (30 primary questions plus two contextual follow-ups each) using a five-point Likert scale across four criteria: factual accuracy, citation quality, contextual coherence, and regulatory appropriateness. Mean ratings ranged from 4.18 to 4.35 across the four criteria (overall category range: 4.05-4.52), with substantial inter-rater agreement (Cohen’s kappa = 0.78). These findings suggest that agentic AI architectures integrating authoritative regulatory sources with real-time scientific literature represent a promising direction for scalable, accurate, and verifiable health education delivery, warranting further evaluation through longitudinal user studies.

[29] Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines

Aninda Ray

Main category: cs.CL

Abstract: A multi-agent pipeline with N agents typically issues N LLM calls per run. Merging agents into fewer calls (compound execution) promises token savings, but naively merged calls silently degrade quality through tool loss and prompt compression. We present Agent Capsules, an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem with empirical quality constraints. The runtime instruments coordination overhead per group, scores composition opportunity, selects among three compound execution strategies, and gates every mode switch on rolling-mean output quality. A controlled negative result confirms that injecting more context into a merged call worsens compression rather than relieving it, so the framework’s escalation ladder (standard, then two-phase, then sequential) recovers quality by moving toward per-agent dispatch rather than by rewriting merged prompts. On LLM-judged quality, the controller matches a hand-tuned oracle on every measured (model, group, mode) cell: routing compound whenever the oracle would, and reverting to fine whenever quality would fail the floor, without per-model configuration. Against a hand-crafted LangGraph implementation of a 14-agent competitive intelligence pipeline, Agent Capsules uses 51% fewer fine-mode input tokens and 42% fewer compound-mode input tokens, at +0.020 and +0.017 quality respectively. Against a DSPy implementation of a 5-agent due diligence pipeline, the framework uses 19% fewer tokens than uncompiled DSPy at quality parity, and 68% fewer tokens than MIPROv2 at +0.052 quality. Even before compound mode fires, the runtime delivers efficiency through automatic policy resolution, cache-aligned prompts, and topology-aware context injection, matching both hand-tuned and compile-time baselines without training data or per-pipeline engineering.
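
The quality-gated escalation ladder can be sketched as a small controller; the floor and window values below are assumptions, while the three-mode ladder (standard, two-phase, sequential) follows the abstract:

```python
# Sketch of quality-gated granularity control: record rolling-mean output
# quality and escalate from merged (compound) execution back toward per-agent
# dispatch whenever the mean dips below the floor.
from collections import deque

LADDER = ["standard", "two_phase", "sequential"]   # compound -> per-agent

class QualityGate:
    def __init__(self, floor=0.8, window=3):
        self.floor = floor
        self.scores = deque(maxlen=window)
        self.level = 0                              # start fully merged

    def record(self, quality):
        """Record a judged quality score; escalate if the rolling mean dips."""
        self.scores.append(quality)
        mean = sum(self.scores) / len(self.scores)
        if mean < self.floor and self.level < len(LADDER) - 1:
            self.level += 1
            self.scores.clear()                     # re-measure the new mode
        return LADDER[self.level]
```

Note the ladder only moves toward finer dispatch, matching the negative result that rewriting merged prompts does not recover quality.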

[30] RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI

Pankaj Gupta, Kartik Bose

Main category: cs.CL

Abstract: Large language models (LLMs) show promise in radiology but their deployment is limited by computational requirements that preclude use in resource-constrained clinical environments. We investigate whether small language models (SLMs) of 3-4 billion parameters can achieve strong multi-task radiology performance through LoRA fine-tuning, enabling deployment on consumer-grade CPUs. We train Qwen2.5-3B-Instruct and Qwen3-4B on 162K samples spanning 9 radiology tasks - RADS classification across 10 systems, impression generation, temporal comparison, radiology NLI, NER, abnormality detection, N/M staging, and radiology Q&A - compiled from 12 public datasets. Both models are evaluated on up to 500 held-out test samples per task with standardized metrics. Our key findings are: (1) LoRA fine-tuning dramatically improves performance over zero-shot baselines (RADS accuracy +53%, NLI +60%, N-staging +89%); (2) the two models exhibit complementary strengths - Qwen2.5 excels at structured generation tasks while Qwen3 dominates extractive tasks; (3) a task-routed oracle ensemble combining both models achieves the best performance across all tasks; (4) few-shot prompting with fine-tuned models hurts performance, demonstrating that LoRA adaptation is more effective than in-context learning for specialized domains; and (5) models can be quantized to GGUF format (~1.8-2.4GB) for CPU deployment at 4-8 tokens/second on consumer hardware. Our work demonstrates that small, efficiently fine-tuned models - which we collectively call RadLite - can serve as practical multi-task radiology AI assistants deployable entirely on consumer hardware without GPU requirements.

[31] Escaping Mode Collapse in LLM Generation via Geometric Regulation

Xin Du, Kumiko Tanaka-Ishii

Main category: cs.CL

Abstract: Mode collapse is a persistent challenge in generative modeling and appears in autoregressive text generation as behaviors ranging from explicit looping to gradual loss of diversity and premature trajectory convergence. We take a dynamical-systems view and reinterpret mode collapse as reduced state-space accessibility caused by geometric collapse: during generation, the model’s internal trajectory becomes confined to a low-dimensional region of its representation space. This implies mode collapse is not purely a token-level phenomenon and cannot be reliably solved by symbolic constraints or probability-only decoding heuristics. Guided by this perspective, we propose Reinforced Mode Regulation (RMR), a lightweight, online state-space intervention that regulates dominant self-reinforcing directions in the Transformer value cache (implemented as low-rank damping). Across multiple large language models, RMR substantially reduces mode collapse and enables stable, high-quality generation at extremely low entropy rates (down to 0.8 nats/step), whereas standard decoding typically collapses near 2.0 nats/step.

[32] Impact of Task Phrasing on Presumptions in Large Language Models

Kenneth J. K. Ong

Main category: cs.CL

Abstract: Concerns with the safety and reliability of applying large language models (LLMs) in unpredictable real-world applications motivate this study, which examines how task phrasing can lead to presumptions in LLMs, making it difficult for them to adapt when the task deviates from these assumptions. We investigated the impact of these presumptions on the performance of LLMs using the iterated prisoner’s dilemma as a case study. Our experiments reveal that LLMs are susceptible to presumptions when making decisions even with reasoning steps. However, when the task phrasing was neutral, the models demonstrated logical reasoning with few presumptions. These findings highlight the importance of proper task phrasing to reduce the risk of presumptions in LLMs.

[33] ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?

Joey Chan, Yikun Han, Jingyuan Chen, Samuel Fang, Lauren D. Gryboski, Alexandra Lee, Sheel Tanna, Qingqing Zhu, Zhiyong Lu, Lucy Lu Wang, Yue Guo

Main category: cs.CL

Abstract: Plain Language Summaries (PLS) aim to make research accessible to lay readers, but they are typically written in a one-size-fits-all style that ignores differences in readers’ information needs and comprehension. In health contexts, this limitation is particularly important because misunderstanding scientific information can affect real-world decisions. Large language models (LLMs) offer new opportunities for personalizing PLS, but it remains unclear whether personalization helps, which strategies are most effective, and how to balance personalization with safety. We introduce ReLay, a dataset of 300 participant–PLS pairs from 50 lay participants in both static (expert-written) and interactive (LLM-personalized) settings. ReLay includes user characteristics, health information needs, information-seeking behavior, comprehension outcomes, interaction logs, and quality ratings. We use ReLay to evaluate five LLMs across two personalization methods. Personalization improves comprehension and perceived quality, but it also raises the risk of reinforcing user biases and introducing hallucinations, revealing a trade-off between personalization and safety. These findings highlight the need for personalization methods that are both effective and trustworthy for diverse lay audiences.

[34] Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue

Tom Utting, Mario Giulianelli, Arabella Sinclair

Main category: cs.CL

Abstract: We model utterance production as probabilistic cost-sensitive choice over contextual alternatives, using information-theoretic notions of cost. We distinguish between goal-directed alternatives that realise a fixed communicative intent and goal-agnostic alternatives defined only by contextual plausibility, allowing us to derive speaker- and listener-oriented interpretations of different cost measures. We present a procedure to generate both types of alternative sets using language models. Analysing production choices in open-ended dialogue under both deterministic and probabilistic cost minimisation, we find that surprisal minimisation relative to goal-directed alternatives provides the strongest predictive account under both analyses. By contrast, uniform information density and length-based costs exhibit weaker and less consistent predictive power across conditions. More broadly, our study suggests that alternative-conditioned optimisation with LM-generated alternatives provides a principled framework for studying speaker and listener pressures in naturalistic language production.
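
Surprisal minimisation over an alternative set is easy to sketch; the toy per-token probabilities below stand in for genuine LM scores:

```python
# Sketch of production choice as cost-sensitive selection: pick the alternative
# utterance whose total surprisal (negative log probability) under a language
# model is lowest.
import math

def surprisal(probabilities):
    """Total surprisal (in bits) of an utterance from per-token probabilities."""
    return -sum(math.log2(p) for p in probabilities)

def choose_production(alternatives):
    """alternatives: mapping of utterance -> per-token probability list."""
    return min(alternatives, key=lambda u: surprisal(alternatives[u]))

# A longer but highly predictable utterance can beat a shorter, surprising one:
alts = {"short": [0.5, 0.5], "long": [0.9, 0.9, 0.9]}
```

This also shows why length-based costs and surprisal-based costs can disagree, as in the paper's comparison.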

[35] ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks

Ta Thanh Thuy, Jiaqi Zhu, Xuan Liu, Lin Shang, Reihaneh Rabbany, Guillaume Rabusseau, Lihui Chen, Zheng Yilun, Sitao Luan

Main category: cs.CL

Abstract: Understanding how people argue across ideological divides online is important for studying political polarization, misinformation, and content moderation. Existing datasets capture only part of this problem: some preserve text but ignore interaction structure, some model structure without rich semantics, and others represent conversations without stable user-level ideological identity. We introduce ControBench, a benchmark for controversial discourse analysis that combines heterogeneous social interaction graphs with rich textual semantics. Built from Reddit discussions on three topics, Trump, abortion, and religion, ControBench contains 7,370 users, 1,783 posts, and 26,525 interactions. The graph contains user and post nodes connected by semantically enriched edges; in particular, user-comment-user edges encode both a reply and the parent comment that it responds to, preserving local argumentative context. User labels are derived from self-declared Reddit flairs, providing a scalable proxy for ideological identity without manual annotation. The resulting datasets exhibit low or negative adjusted homophily (Trump: -0.77, Abortion: 0.06, Religion: 0.04), reflecting the cross-cutting structure of real-world debate. We evaluate graph neural networks, pretrained language models, and large language models on ControBench and observe distinct performance patterns across topics and model families, especially when ideological boundaries are ambiguous. These results position ControBench as a challenging and realistic benchmark for controversial discourse analysis.
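
The adjusted homophily figures quoted above correct raw edge homophily for class imbalance; the sketch below follows the common degree-weighted definition, which is an assumption about the exact variant the authors used:

```python
# Adjusted homophily: edge homophily minus the level expected from
# degree-weighted class proportions, rescaled so fully heterophilous graphs
# score -1 and fully homophilous graphs score +1.
from collections import Counter

def adjusted_homophily(edges, labels):
    """edges: (u, v) pairs; labels: node -> class. Negative => heterophilous."""
    m = len(edges)
    same = sum(labels[u] == labels[v] for u, v in edges)
    h_edge = same / m
    deg = Counter()                      # degree mass per class
    for u, v in edges:
        deg[labels[u]] += 1
        deg[labels[v]] += 1
    p2 = sum((d / (2 * m)) ** 2 for d in deg.values())
    return (h_edge - p2) / (1 - p2)
```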

[36] AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Wenxiang Lin, Juntao Huang, Luhan Zhang, Laili Li, Xiang Bao, Mengyang Zhang, Bing Wang, Shaohuai Shi

Main category: cs.CL

Abstract: Quantization is a key method for reducing the GPU memory requirement of training large language models (LLMs). Yet, current approaches are ineffective for 4-bit activations and 8-bit gradients, which can easily cause slow convergence or accuracy loss. To address this, we introduce AGoQ, incorporating two new techniques: 1) a layer-aware activation quantization algorithm that allocates appropriate bit-widths for activations of various layers based on their types and pipeline stages to achieve near 4-bit activation storage, and 2) a gradient quantization algorithm that reduces memory usage and shortens communication time by employing 8-bit gradient storage and precision-preserving 8-bit All-Reduce communication. We conduct extensive experiments using different sizes of LLMs on two GPU clusters (up to 64 GPUs), and the experimental results show that our AGoQ reduces memory usage by up to 52% and achieves up to a 1.34× improvement in training speed compared to state-of-the-art training systems Megatron-LM (w/ or w/o ZeRO), COAT and DeepSpeed with 8B to 32B LLaMA models, while achieving comparable convergence loss on pretraining and comparable accuracy on downstream tasks with LLaMA architectures.
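
Symmetric 8-bit gradient quantization of the kind the gradient path relies on can be sketched as follows; per-tensor absmax scaling is an assumption, and the paper's scheme may be finer-grained:

```python
# Quantize float gradients to int8 with a shared absmax scale, communicate the
# compact int8 payload, then dequantize on receipt.

def quantize_int8(grads):
    """Map floats to int8 range [-127, 127] with a per-tensor absmax scale."""
    absmax = max(abs(g) for g in grads) or 1.0
    scale = absmax / 127.0
    q = [round(g / scale) for g in grads]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float gradients from int8 values and the scale."""
    return [v * scale for v in q]
```

The int8 payload is a quarter the size of float32, which is where both the memory and All-Reduce communication savings come from.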

[37] A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction

Michito Takeshita, Takuro Kawada, Takumi Ohashi, Shunsuke Kitada, Hitoshi Iyatomi

Main category: cs.CL

Abstract: AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural information such as spatial relationships among elements. We propose A11y-Compressor, a framework that transforms linearized accessibility trees into compact and structured representations. Our implementation, Compressed-a11y, applies a lightweight and structured transformation pipeline with modal detection, redundancy reduction, and semantic structuring. Experiments on the OSWorld benchmark show that Compressed-a11y reduces input tokens to 22% of the original while improving task success rates by 5.1 percentage points on average.
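
The redundancy-reduction step can be illustrated on a toy linearized tree; the node schema and the keep/drop rules below are assumptions, not the actual Compressed-a11y pipeline:

```python
# Sketch of compressing a linearized accessibility tree: drop unlabeled
# structural nodes and deduplicate repeated role/name lines.

INTERACTIVE = {"button", "link", "textbox", "checkbox", "menuitem"}

def compress_a11y(nodes):
    """Keep interactive or labeled nodes, emit 'role name', skip duplicates."""
    out, seen = [], set()
    for node in nodes:
        role, name = node.get("role", ""), node.get("name", "").strip()
        if role not in INTERACTIVE and not name:
            continue                       # structural node with no label
        line = f"{role} {name}".strip()
        if line in seen:
            continue                       # redundant repeat
        seen.add(line)
        out.append(line)
    return out
```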

[38] Structure Liberates: How Constrained Sensemaking Produces More Novel Research Output

James Mooney, Zae Myung Kim, Young-Jun Lee, Dongyeop Kang

Main category: cs.CL

Abstract: Scientific discovery is an extended process of ideation–surveying prior work, forming hypotheses, and refining reasoning–yet existing approaches treat this phase as a brief preamble despite its central role in research. We introduce SCISENSE, a sensemaking-grounded framework that operationalizes ideation as a structured sequence of eight cognitive stages (Pirolli & Card, 2005). We construct SCISENSE-Traj, a 100K-scale dataset of citation-conditioned research trajectories in two modes: Target, where an LLM reconstructs the ideation path leading to a known paper from its cited works, and Infer, where the LLM proposes novel directions from the same citations. We distill these into SCISENSE-LM, a family of sensemaking LLMs spanning 3B to 70B parameters. Contrary to the assumption that looser supervision promotes greater exploration, Target-trained models achieve a 2.0% improvement in trajectory quality over Infer-trained models while also producing more novel and diverse outputs. This advantage propagates downstream: coding agents conditioned on Target trajectories produce research artifacts with higher executability and quality than those conditioned on Infer trajectories. This suggests that targeted ideation reduces cognitive burden on downstream agents, freeing them to explore more creatively. SCISENSE offers both a practical tool for augmenting LLM-driven research workflows and a principled testbed for studying how planning shapes scientific discovery.

[39] Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

Daria Boratyn, Damian Brzyski, Albert Leśniak, Wojciech Łukasik, Maciej Rapacz, Jan Rybicki, Wojciech Słomczyński, Dariusz Stolicki

Main category: cs.CL

Abstract: We investigate the extent to which cosine similarity between paragraph embeddings is invariant under machine translation, using the Manifesto Corpus of over 2,800 political party platforms in 28 languages translated to English via the EU eTranslation service. Rather than measuring translation-induced semantic shift directly we measure the stability of pairwise similarity relationships across embedding models, and use inter-model disagreement on original-language text as a calibrated invariance threshold. This yields a per-language non-inferiority test for four hypotheses about how translation interacts with embedding choice, with verdicts that distinguish languages where translation demonstrably preserves semantic structure from those where it demonstrably degrades it and from those where the available evidence does not resolve the question. The framework is corpus- and pipeline-agnostic and extends naturally to downstream tasks. Applied to our data, it identifies ten languages with translation invariance and four with detectable distortion.
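
The stability-of-similarities measurement can be sketched as follows; using Pearson correlation over pairwise cosine similarities as the agreement statistic is my assumption, not necessarily the paper's exact test:

```python
# Sketch: compute all pairwise cosine similarities for original-language and
# translated embeddings, then measure how well the two similarity profiles
# agree. High agreement suggests translation preserved semantic structure.
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pairwise_sims(embeddings):
    return [cosine(embeddings[i], embeddings[j])
            for i, j in combinations(range(len(embeddings)), 2)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

orig  = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical embeddings
trans = [[2.0, 0.0], [0.0, 3.0], [2.0, 2.0]]   # rescaled: same directions
stability = pearson(pairwise_sims(orig), pairwise_sims(trans))
```

The calibrated threshold for "invariant" would then come from the analogous inter-model disagreement on original-language text.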

[40] SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models

Shiqiang Cai, Nianhong Niu, Shizhu He, Kang Liu, Jun Zhao

Main category: cs.CL

Abstract: Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge. A high-quality scientific taxonomy offers a structured and hierarchical representation of a research field, facilitating literature exploration and topic navigation, as well as enabling downstream applications such as trend analysis, idea generation, and information retrieval. However, existing taxonomy generation approaches often suffer from structural inconsistencies and semantic misalignment across hierarchical levels. Through empirical analysis, we find that these issues largely stem from inadequate modeling of hierarchical semantic consistency. To address this limitation, we propose a semantic-consistent taxonomy generation (SC-Taxo) framework that leverages large language models (LLMs) with hierarchy-aware refinement stages to ensure semantic consistency. Specifically, SC-Taxo introduces a bidirectional heading generation mechanism that jointly performs bottom-up abstraction and top-down semantic constraint, while further capturing peer-level semantic dependencies to enhance horizontal consistency. Experiments on multiple benchmark datasets demonstrate consistent improvements in hierarchy alignment and heading quality, and additional evaluation on Chinese scientific literature validates its robust cross-lingual generalization.

[41] H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations

Passant Elchafei, Hossam Emam, Mohamed Alansary, Monorama Swain, Markus Schedl

Main category: cs.CL

Abstract: We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages). Task A evaluates standalone retrieval quality, while Task C assesses end-to-end retrieval-augmented generation (RAG) in multi-turn conversational settings, requiring both accurate answer generation and faithful grounding in retrieved evidence. Our approach implements a hierarchical parent-child RAG pipeline that separates fine-grained child-level retrieval from parent-level context reconstruction during generation. Documents are segmented into overlapping sentence-based child chunks, while full documents are preserved as parent units to provide coherent context. Retrieval combines hybrid dense-sparse search, tunable weighting, and embedding-based similarity rescoring over child chunks. Retrieved evidence is aggregated at the parent level and supplied to an instruction-tuned language model for response generation. H-RAG achieves an nDCG@5 score of 0.4271 on Task A and a harmonic mean score of 0.3241 on Task C (RB_agg: 0.2488, RL_F: 0.2703, RB_llm: 0.6508), underscoring the importance of retrieval configuration and parent-level aggregation in multi-turn RAG performance.
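
The parent-child split described above can be sketched as follows (a simplified illustration of ours, not the submission's code; chunk sizes and the stand-in scores replace the hybrid dense-sparse retriever):

```python
from collections import defaultdict

def child_chunks(doc_id, sentences, window=2, stride=1):
    """Split a document into overlapping sentence-window child chunks,
    each keeping a pointer back to its parent document."""
    chunks = []
    for start in range(0, max(len(sentences) - window + 1, 1), stride):
        chunks.append({"parent": doc_id,
                       "text": " ".join(sentences[start:start + window])})
    return chunks

def aggregate_to_parents(scored_children, top_k=2):
    """Collapse child-level retrieval scores to parent documents by keeping
    each parent's best child score, then return the top_k parents."""
    best = defaultdict(float)
    for chunk, score in scored_children:
        best[chunk["parent"]] = max(best[chunk["parent"]], score)
    return sorted(best, key=best.get, reverse=True)[:top_k]

docs = {"d1": ["a b.", "c d.", "e f.", "g h."], "d2": ["x y.", "z w."]}
children = [c for did, sents in docs.items() for c in child_chunks(did, sents)]
scored = [(c, float(i)) for i, c in enumerate(children)]  # stand-in scores
print(aggregate_to_parents(scored))
```

The returned parent documents, rather than the small child chunks, are what would be passed to the instruction-tuned generator, preserving coherent context.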

[42] Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

Jasper Dekoninck, Nikola Jovanović, Tim Gehrunger, Kári Rögnvalddson, Ivo Petrov, Chenhao Sun, Martin Vechev

Main category: cs.CL

Abstract: Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating progress: they are often narrow in scope, quickly saturated, and rarely updated. This makes it hard to compare models reliably and track progress over time. Instead, we need evaluation platforms: continuously maintained systems that run, aggregate, and analyze evaluations across many benchmarks to give a comprehensive picture of model performance within a broad domain. In this work, we build on the original MathArena benchmark by substantially broadening its scope from final-answer olympiad problems to a continuously maintained evaluation platform for mathematical reasoning with LLMs. MathArena now covers a much wider range of tasks, including proof-based competitions, research-level arXiv problems, and formal proof generation in Lean. Additionally, we maintain a clear evaluation protocol for all models and regularly design new benchmarks as model capabilities improve to ensure that MathArena remains challenging. Notably, the strongest model, GPT-5.5, now reaches 98% on the 2026 USA Math Olympiad and 74% on research-level questions, showing that frontier models can now comfortably solve extremely challenging mathematical problems. This highlights the importance of continuously maintained evaluation platforms like MathArena to track the rapid progress of LLMs in mathematical reasoning.

[43] ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models

Yunhan Zhao, Zhaorun Chen, Xingjun Ma, Yu-Gang Jiang, Bo Li

Main category: cs.CL

Abstract: As Large Language Models (LLMs) are increasingly deployed in cross-linguistic contexts, ensuring safety in diverse regulatory and cultural environments has become a critical challenge. However, existing multilingual benchmarks largely rely on general risk taxonomies and machine translation, which confines guardrail models to these predefined categories and hinders their ability to align with region-specific regulations and cultural nuances. To bridge these gaps, we introduce ML-Bench, a policy-grounded multilingual safety benchmark covering 14 languages. ML-Bench is constructed directly from regional regulations, where risk categories and fine-grained rules derived from jurisdiction-specific legal texts are directly used to guide the generation of multilingual safety data, enabling culturally and legally aligned evaluation across languages. Building on ML-Bench, we develop ML-Guard, a Diffusion Large Language Model (dLLM)-based guardrail model that supports multilingual safety judgment and policy-conditioned compliance assessment. ML-Guard has two variants: a lightweight 1.5B model for fast 'safe/unsafe' checking and a more capable 7B model for customized compliance checking with detailed explanations. We conduct extensive experiments against 11 strong guardrail baselines across 6 existing multilingual safety benchmarks and our ML-Bench, and show that ML-Guard consistently outperforms prior methods. We hope that ML-Bench and ML-Guard can help advance the development of regulation-aware and culturally aligned multilingual guardrail systems.

[44] Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu

Main category: cs.CL

Abstract: Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memory updates, sparse outcome rewards provide weak supervision, resulting in unstable long-horizon optimization. Drawing on memory schema theory and the functional division between prefrontal and hippocampal regions, we introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, we propose Memory Guideline Induction to optimize a global guideline via contrastive feedback interpreted as textual gradients; in the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit and implicit preferences with varying sizes and noise levels, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.

[45] FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen

Main category: cs.CL

Abstract: Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chinese) red-teaming benchmark designed to test an LLM’s refusal of requests that violate financial compliance. Grounded in real-world financial crime cases and ethics standards, the benchmark comprises 14 subcategories spanning financial crimes and ethical violations. Through extensive experiments on general-purpose and finance-specialized LLMs under three representative attack settings, we identify critical vulnerabilities that allow adversarial prompts to bypass compliance safeguards. Further analysis reveals stronger susceptibility in Chinese contexts and highlights the limitations of prompt-level defenses against sophisticated or implicit manipulation strategies.

[46] Characterizing the Expressivity of Local Attention in Transformers

Jiaoda Li, Ryan Cotterell

Main category: cs.CL

Abstract: The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before generating the next token. One common variant of attention is called local attention, which restricts each token to aggregating information from a bounded window of predecessors, reducing the quadratic cost of global attention to linear. Although this restriction is usually motivated by efficiency, it has also been found to improve model quality, a phenomenon that has so far lacked a satisfactory explanation. We provide a formal account of this phenomenon in terms of recognizer expressivity. It has been shown that fixed-precision transformers with global attention correspond to a fragment of linear temporal logic containing a single past operator. We additionally prove that adding local attention introduces a second temporal operator, strictly enlarging the class of recognizable regular languages. Moreover, global and local attention are expressively complementary: neither subsumes the other, and combining them yields the richest fragment. Experiments on formal language recognition and natural language modeling corroborate the theory, showing that hybrid global–local transformers outperform their global-only counterparts.
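
The distinction between global and local attention comes down to the attention mask; a small sketch of ours (not the paper's formalism) makes the bounded look-back window concrete:

```python
import numpy as np

def attention_masks(seq_len, window):
    """Boolean masks (True = may attend) for causal global attention and
    causal local attention with a bounded look-back window."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    glob = j <= i                    # all preceding tokens (and self)
    loc = (j <= i) & (i - j < window)  # only the last `window` tokens
    return glob, loc

glob, loc = attention_masks(6, window=2)
print(loc.astype(int))
```

Each row of `loc` has at most `window` ones, which is what reduces the quadratic cost of global attention to linear in sequence length.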

[47] Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

Scott Friedman, Ruta Wheelock, Sonja Schmer-Galunder, Drisana Iverson, Jake Vasilakes, Joan Zheng, Jeffrey Rye, Vasanth Sarathy, Christopher Miller

Main category: cs.CL

Abstract: The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpfulness, compassion) and anti-social sentiment (e.g., threats, opposition, blame) at different topics, all in the same message. While many natural language processing (NLP) tools classify or score a text’s overall sentiment as positive, neutral, or negative, these tools cannot report that positive and negative sentiments coexist, and they cannot report the target of those sentiments. This paper presents the Directed Social Regard (DSR) approach to multi-dimensional, multi-valence sentiment analysis, comprised of a pair of transformer-based models that (1) detects span-level targets of sentiment in a message and then (2) scores all spans within the message context along three (-1, 1) axes of regard that are motivated by social science theories of moral disengagement and moral framing. We present a data collection and annotation strategy for DSR dataset construction, a transformer-based architecture for span-level scoring, and a validation study with promising results. We apply the validated DSR model on six third-party datasets of online media and report meaningful correlations between DSR outputs and the labels and topics in these pre-existing social science datasets.

[48] When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh

Main category: cs.CL

Abstract: Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a step-wise arithmetic algorithm and two numeric inputs, and must return the final computed value. The benchmark uses simple arithmetic operations but increases complexity through algorithm length and look-back dependencies over intermediate variables. Across 14 models and 55 datasets, average first-answer accuracy drops from 61% on 5-step procedures to 20% on 95-step procedures. Generation-level analysis shows that failures often involve missing answers, premature answers, self-correction after an initial error, under-executed traces, and hallucinated extra steps. These findings suggest that apparent reasoning ability can mask substantial weaknesses in faithful instruction execution.
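
A generator in the spirit of the described diagnostic can be sketched as follows (our own illustration; the benchmark's exact operations, ranges, and look-back scheme may differ):

```python
import random

def make_procedure(n_steps, lookback, seed=0):
    """Generate a step-wise arithmetic procedure over variables v0..vN,
    where step i may reference a variable up to `lookback` steps back,
    plus the ground-truth final value for checking a model's answer."""
    rng = random.Random(seed)
    values = [rng.randint(1, 9)]
    steps = [f"v0 = {values[0]}"]
    for i in range(1, n_steps + 1):
        ref = rng.randint(max(0, i - lookback), i - 1)  # look-back dependency
        op, k = rng.choice(["+", "*"]), rng.randint(1, 5)
        steps.append(f"v{i} = v{ref} {op} {k}")
        values.append(values[ref] + k if op == "+" else values[ref] * k)
    return steps, values[-1]

steps, answer = make_procedure(5, lookback=3)
print("\n".join(steps))
print("expected final value:", answer)
```

Scaling `n_steps` from 5 to 95 and widening `lookback` reproduces the two complexity axes the abstract describes, while the returned ground truth allows exact scoring of a model's first answer.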

[49] Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs

Dylan Bouchard

Main category: cs.CL

Abstract: Not available (fetching arXiv 2407.10853 failed with HTTP 429).

[50] Reinforcement Learning for LLM Post-Training: A Survey

Zhichao Wang, Kiran Ramnath, Bin Bi, Shiva Kumar Pentyala, Sougata Chaudhuri, Shubham Mehrotra, Zixu, Xiang-Bo Mao, Sitaram Asur, Cheng

Main category: cs.CL

Abstract: Not available (fetching arXiv 2407.16216 failed with HTTP 429).

[51] Bias in Large Language Models: Origin, Evaluation, and Mitigation

Yufei Guo, Muzhe Guo, Juntao Su, Zhou Yang, Mengqiu Zhu, Hongfei Li, Mengyang Qiu, Shuo Shuo Liu

Main category: cs.CL

Abstract: Not available (fetching arXiv 2411.10915 failed with HTTP 429).

[52] Representation in large language models

Cameron Yetman

Main category: cs.CL

Abstract: Not available (fetching arXiv 2501.00885 failed with HTTP 429).

[53] Exploring the System 1 Thinking Capability of Large Reasoning Models

Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu

Main category: cs.CL

Abstract: Not available (fetching arXiv 2504.10368 failed with HTTP 429).

[54] SCAN: Structured Capability Assessment and Navigation for LLMs

Zongqi Wang, Tianle Gu, Chen Gong, Xin Tian, Siqi Bao, Yujiu Yang

Main category: cs.CL

Abstract: Not available (fetching arXiv 2505.06698 failed with HTTP 429).

[55] (title unavailable)

Jatin Gupta, Akhil Sharma, Saransh Singhania, Ali Imam Abidi

Main category: cs.CL

Abstract: Not available (fetching arXiv 2505.22003 failed with HTTP 429).

[56] ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

Zexi Liu, Jingyi Chai, Xinyu Zhu, Shuo Tang, Rui Ye, Bo Zhang, Lei Bai, Siheng Chen

Main category: cs.CL

Abstract: The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. To address this, we explore, for the first time, the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action step, accelerating experience collection and improving training efficiency; (3) an agentic ML-specific reward module, which unifies varied ML feedback signals into consistent rewards for RL optimization. Leveraging this framework, we train ML-Agent, driven by a 7B-sized Qwen-2.5 LLM for autonomous ML. Despite training on only 9 ML tasks, our 7B-sized ML-Agent achieves comparable performance to agents using much larger proprietary LLMs (e.g., GPT-5) but at significantly lower computational cost, demonstrating strong performance and cross-task generalization.

[57] ToolGrad: Efficient Tool-use Dataset Generation with Textual “Gradients”

Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada

Main category: cs.CL

Abstract: Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data generation. We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual “gradients”, and then synthesizes corresponding user queries. This “answer-first” approach led to ToolGrad-500, a dataset generated with more complex tool use, lower cost, and almost 100% pass rate. Experiments show that ToolGrad models outperform those trained on expensive baseline datasets and proprietary LLMs. The ToolGrad source code, dataset, and models are available at https://github.com/zhongyi-zhou/toolgrad.

[58] InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

Anirudh Iyengar Kaniyar Narayana Iyengar, Srija Mukhopadhyay, Adnan Qidwai, Shubhankar Singh, Dan Roth, Vivek Gupta

Main category: cs.CL

Abstract: We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks focusing on isolated, visually uniform charts, InterChart challenges models with diverse question types ranging from entity inference and trend correlation to numerical estimation and abstract multi-step reasoning grounded in 2-3 thematically or structurally related charts. We organize the benchmark into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs. Our evaluation of state-of-the-art open- and closed-source VLMs reveals consistent and steep accuracy declines as chart complexity increases. We find that models perform better when we decompose multi-entity charts into simpler visual units, underscoring their struggles with cross-chart integration. By exposing these systematic limitations, InterChart provides a rigorous framework for advancing multimodal reasoning in complex, multi-visual environments.

[59] Reasoning-Intensive Regression

Diane Tchuindjo, Omar Khattab

Main category: cs.CL

Abstract: AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e., deducing subtle numerical scores from text. Unlike standard language regression tasks such as sentiment or similarity analysis, RiR often appears instead in ad-hoc applications such as rubric-based scoring, modeling dense rewards in complex environments, or domain-specific retrieval, where much deeper analysis of context is required while only limited task-specific training data and computation are available. We cast four realistic problems as RiR tasks to establish an initial benchmark, and use that to test our hypothesis that prompting frozen LLMs and fine-tuning Transformer encoders via gradient descent will both often struggle in RiR. We then propose MENTAT, a simple and lightweight method that combines batch-reflective prompt optimization with neural ensemble learning. MENTAT achieves up to 65% improvement over both baselines, though substantial room remains for future advances.

[60] Structured In-context Environment Scaling for Large Language Model Reasoning

Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen

Main category: cs.CL

Abstract: Large language models (LLMs) have achieved significant advancements in reasoning capabilities through reinforcement learning (RL) via environmental exploration. As the intrinsic properties of the environment determine the abilities that LLMs can learn, the environment plays an important role in the RL finetuning process. An ideal LLM reasoning environment should possess three core characteristics: scalability, generalizable reasoning, and verifiability. However, existing mathematical and coding environments are difficult to scale due to heavy reliance on expert annotation, while the skills learned in game-based environments are too specialized to generalize. To bridge this gap, we introduce the Structured In-context Environment (SIE) framework. SIE achieves scalability by automatically constructing reasoning environments from large-scale structured data, where the rich compositional patterns naturally support generalizable reasoning. Moreover, the explicit schemas and reasoning chains in structured data provide a foundation for rule-based verifiability. Experimental results show that the SIE framework not only achieves substantial improvements in in-domain structured reasoning, but also enables the learned compositional reasoning skills to generalize effectively to out-of-domain mathematical and logical reasoning tasks. We further explore learning in information-limited partial SIEs and find that LLMs can infer the missing information through exploring the environment, leading to robust reasoning improvements and generalization performance.

[61] Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

Haolin Yang, Hakaze Cho, Naoya Inoue

Main category: cs.CL

Abstract: We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Using steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.

[62] Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight

Haolin Yang, Hakaze Cho, Kaize Ding, Naoya Inoue

Main category: cs.CL

Abstract: Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility, acting effectively at arbitrary layers, positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of “key heads” most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.
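
The logit-steering effect the abstract describes can be illustrated with a toy linear readout (our own sketch; `W_unembed`, `tv`, and `alpha` are illustrative stand-ins, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_labels = 16, 4
W_unembed = rng.normal(size=(n_labels, d))  # toy readout (unembedding) matrix

# A toy "task vector": a unit vector aligned with label 2's readout row.
# Adding alpha * tv to a hidden state raises label 2's logit by exactly
# alpha * ||W_unembed[2]||, a linear steering effect.
tv = W_unembed[2] / np.linalg.norm(W_unembed[2])
alpha = 2.0

h = rng.normal(size=d)  # stand-in hidden state
delta = W_unembed @ (h + alpha * tv) - W_unembed @ h  # logit change
print(delta.round(3))
```

In a real Transformer the intervention would be applied at a chosen layer and position, and nonlinearities would make the effect only approximately linear, which is the regime the paper analyzes.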

[63] ADVICE: Answer-Dependent Verbalized Confidence Estimation

Ki Jung Seo, Sehun Lim, Taeuk Kim

Main category: cs.CL

Abstract: Recent progress in large language models (LLMs) has enabled them to communicate their confidence in natural language, improving transparency and reliability. However, this expressiveness is often accompanied by systematic overconfidence, whose underlying causes remain poorly understood. In this work, we analyze the dynamics of verbalized confidence estimation and identify answer-independence – the failure to condition confidence on the model’s own answer – as a primary driver of this behavior. To address this, we introduce ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that promotes answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration, while exhibiting strong generalization to unseen settings without degrading task performance. We further demonstrate that these gains stem from enhanced answer dependence, shedding light on the origins of overconfidence and enabling trustworthy confidence verbalization.
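
Confidence calibration of the kind ADVICE targets is commonly measured with Expected Calibration Error (ECE); a minimal sketch of ours (the paper's exact metrics may differ):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |accuracy - confidence| over equal-width confidence
    bins, weighted by the fraction of samples in each bin. Lower is better."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# A systematically overconfident model: says 0.9 but is right 60% of the time.
conf = [0.9] * 10
hit = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(expected_calibration_error(conf, hit))
```

An answer-independent model tends to emit the same high confidence regardless of its answer, which shows up as exactly this kind of large accuracy-confidence gap.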

[64] Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka

Main category: cs.CL

Abstract: We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often costly, inflexible, and opaque, we propose a memory-augmented framework that leverages LLM-generated critiques grounded in labeled data. Our framework uses episodic memory to store instance-level critiques (capturing specific past experiences) and semantic memory to distill these into reusable, task-level guidance. Across a diverse set of tasks and models, our best-performing self-critique strategy (utilizing both memory types) yields an average improvement of 8.1 percentage points over the zero-shot baseline, and 4.6 percentage points over a RAG-based baseline that relies only on labels. However, improvements vary substantially across models and domains. To explain this variation, we introduce suggestibility, a novel metric capturing how receptive a model is to external reasoning provided in context. We use suggestibility to illuminate when and why memory augmentation succeeds or falls short. Beyond accuracy gains, we find pre-computed critiques substantially reduce inference-time computation for reasoning models, cutting thinking tokens by an average of 31.95% across all datasets by substituting for reasoning that the model would otherwise perform independently. Our findings highlight the conditions under which memory-driven, reflective learning can serve as a lightweight, interpretable, and efficient strategy for improving LLM adaptability.

[65] PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Feijie Wu, Weiwu Zhu, Yuxiang Zhang, Soumya Chatterjee, Jiarong Zhu, Fan Mo, Rong Luo, Jing Gao

Main category: cs.CL

Abstract: Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents’ tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step’s importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final answer, plus an auxiliary term indicating whether the step’s tool calls satisfy formatting constraints and execute successfully. Using these step-wise importance estimates, PORTool updates the policy to generate efficient tool-call steps, guided by both local comparisons within each branching decision and the overall quality of entire trajectories. Experiments show that PORTool improves final-answer accuracy while reducing tool-call steps compared with state-of-the-art policy-optimization baselines, and ablation studies confirm the robustness of the proposed step-wise importance estimates.

[66] Peek2: Regex-free Byte-level Byte-Pair Encoding Pretokenizer for LLM Inference on Edge Devices

Liu Zai, Iraklis Klampanos

Main category: cs.CL

Abstract: Pretokenization is a crucial, sequential pass in Byte-level BPE tokenizers, yet little work has been done to optimize it for edge-side inference. Our proposed new implementation, Peek2, serves as a drop-in replacement for cl100k-like pretokenizers used in GPT-3, LLaMa-3, and Qwen-2.5. After breaking down and analyzing the logic of the original cl100k pretokenizer, we introduce a new pretokenization algorithm with linear time complexity and constant, trivial memory usage, suited for edge scenarios. Test results show that it increases microbenchmarking throughput by up to $2.48\times$ and delivers a $1.14\times$ improvement in overall throughput across the entire Byte-level BPE encoding process, depending on the dataset, while producing results identical to those of the baseline regex-based tokenizer.
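The flavor of a regex-free, single-pass pretokenizer can be sketched by splitting whenever the coarse Unicode category of the character changes. This is illustrative only: the real cl100k rules (contractions, leading spaces, digit grouping, etc.) are more involved, and Peek2's actual algorithm is not reproduced here:

```python
import unicodedata

def simple_pretokenize(text: str):
    """One linear pass, no regex: emit a new chunk whenever the coarse
    character class (letter / digit / space / other) changes.
    A toy sketch, not the cl100k or Peek2 rule set."""
    def kind(ch):
        cat = unicodedata.category(ch)[0]
        if cat == 'L':
            return 'L'          # letters
        if cat == 'N':
            return 'N'          # digits
        if ch.isspace():
            return 'S'          # whitespace
        return 'O'              # punctuation and everything else

    out, start = [], 0
    for i in range(1, len(text)):
        if kind(text[i]) != kind(text[i - 1]):
            out.append(text[start:i])
            start = i
    if text:
        out.append(text[start:])
    return out

print(simple_pretokenize("GPT-3 rocks"))  # ['GPT', '-', '3', ' ', 'rocks']
```

The loop touches each character once and keeps only two indices of state, which is the linear-time, constant-memory shape the abstract describes.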

[67] Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

Jen-tse Huang, Chang Chen, Shiyang Lai, Wenxuan Wang, Michelle R. Kaufman, Mark Dredze

Main category: cs.CL

Abstract: Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (MLLMs) have demonstrated impressive reasoning capabilities, their robustness against misinformation entangled with cognitive biases remains under-explored. In this paper, we introduce a comprehensive evaluation framework using a high-quality, manually annotated dataset of 200 short videos spanning four health domains. This dataset provides fine-grained annotations for three deceptive patterns (experimental errors, logical fallacies, and fabricated claims), each verified by evidence such as national standards and academic literature. We evaluate eight frontier MLLMs across five modality settings. Experimental results demonstrate that Gemini-2.5-Pro achieves the highest performance in the multimodal setting with a belief score of 71.5/100, while o3 performs the worst at 35.2. Furthermore, we investigate social cues that induce false beliefs in videos and find that models are susceptible to biases like authoritative channel IDs.

[68] Reward Modeling from Natural Language Human Feedback

Zongqi Wang, Rui Wang, Yuchuan Wu, Yiyao Yu, Pinyi Zhang, Shaoning Sun, Yujiu Yang, Yongbin Li

Main category: cs.CL

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically, in pairwise rewarding tasks, GRMs generate reasoning chains ending with critiques and preference labels, and RLVR then relies on the correctness of the preference labels as the training reward. However, in this paper, we demonstrate that such binary classification tasks make GRMs susceptible to guessing correct outcomes without sound critiques. Consequently, these spurious successes introduce substantial noise into the reward signal, thereby impairing the effectiveness of reinforcement learning. To address this issue, we propose Reward Modeling from Natural Language Human Feedback (RM-NLHF), which leverages natural language feedback to obtain process reward signals, thereby mitigating the problem of limited solution space inherent in binary tasks. Specifically, we compute the similarity between GRM-generated and human critiques as the training reward, which provides more accurate reward signals than outcome-only supervision. Additionally, considering that human critiques are difficult to scale up, we introduce a Meta Reward Model (MetaRM), which learns to predict process reward from datasets with human critiques and then generalizes to data without human critiques. Experiments on multiple benchmarks demonstrate that our method consistently outperforms state-of-the-art GRMs trained with outcome-only reward, confirming the superiority of integrating natural language over binary human feedback as supervision.
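The critique-similarity reward can be sketched concretely. The abstract does not pin down the similarity function, so a bag-of-words cosine similarity is used below purely as a hypothetical stand-in; a real system would more plausibly compare embeddings:

```python
from collections import Counter
import math

def critique_similarity_reward(model_critique: str, human_critique: str) -> float:
    """Cosine similarity between bag-of-words vectors of the two critiques.
    A hypothetical proxy for the paper's similarity-based process reward."""
    a = Counter(model_critique.lower().split())
    b = Counter(human_critique.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# A critique that matches the human one earns full reward; an unrelated one earns none.
r_match = critique_similarity_reward("response B omits the edge case",
                                     "response B omits the edge case")
r_diff = critique_similarity_reward("response B omits the edge case",
                                    "answer A is verbose")
```

Unlike a binary correct/incorrect label, this reward is graded, so a GRM that guesses the right preference with an unsound critique still scores low.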

[69] Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models

Zhaoyi Li, Jiatong Li, Gangwei Jiang, Linqi Song, Defu Lian, Ying Wei

Main category: cs.CL

Abstract: Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. However, recent studies reveal a sharp performance drop in reasoning hop generalization scenarios, where the required number of reasoning steps exceeds training distributions while the underlying algorithm remains unchanged. The internal mechanisms driving this failure remain poorly understood. In this work, we conduct a systematic study on tasks from multiple domains, and find that errors concentrate at token positions of a few critical error types, rather than being uniformly distributed. Closer inspection reveals that these token-level erroneous predictions stem from internal competition mechanisms: certain attention heads, termed erroneous processing heads (ep heads), tip the balance by amplifying incorrect reasoning trajectories while suppressing correct ones. Notably, removing individual ep heads during inference can often restore the correct predictions. Motivated by these insights, we propose test-time correction of reasoning, a lightweight intervention method that dynamically identifies and deactivates ep heads in the reasoning process. Extensive experiments across different tasks and LLMs show that it consistently improves reasoning hop generalization, highlighting both its effectiveness and potential.

Jinu Lee, Kyoung-Woon On, Simeng Han, Arman Cohan, Julia Hockenmaier

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2512.01020 returned HTTP 429).

[71] Can Small Language Models Handle Context-Summarized Multi-Turn Customer-Service QA? A Synthetic Data-Driven Comparative Evaluation

Lakshan Cooray, Deshan Sumanathilaka, Pattigadapa Venkatesh Raju

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.00665 returned HTTP 429).

[72] Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization

Runquan Gui, Jie Wang, Zhihai Wang, Chi Ma, Jianye Hao, Feng Wu

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.03141 returned HTTP 429).

[73] Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Dongwon Jo, Beomseok Kang, Jiwon Song, Jae-Joon Kim

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.03216 returned HTTP 429).

[74] Language Models Struggle to Use Representations Learned In-Context

Michael A. Lepori, Tal Linzen, Ann Yuan, Katja Filippova

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.04212 returned HTTP 429).

[75] Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood, Hongtai Wei, Sudeep Das, Danny Nightingale, Meg Watson, Charles Pollnow V

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2603.03565 returned HTTP 429).

[76] BanglaSocialBench: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction

Tanvir Ahmed Sijan, S. M Golam Rifat, Pankaj Chowdhury Partha, Md. Tanjeed Islam, Md. Musfique Anwar

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2603.15949 returned HTTP 429).

[77] Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

Yanchen Wu, Tenghui Lin, Yingli Zhou, Fangyuan Zhang, Qintian Guo, Xun Zhou, Sibo Wang, Xilin Liu, Yuchi Ma, Yixiang Fang

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.01707 returned HTTP 429).

[78] How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Gregory N. Frank

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.04385 returned HTTP 429).

[79] Turing or Cantor: That is the Question

Eugene Eberbach

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.10418 returned HTTP 429).

[80] On Cost-Effective LLM-as-a-Judge Improvement Techniques

Ryan Lail, Luke Markham

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.13717 returned HTTP 429).

[81] VGR: Visual Grounded Reasoning

Jiacong Wang, Zijian Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2506.11991 returned HTTP 429).

[82] SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning

Rania Elbadry, Sarfraz Ahmad, Ahmed Heakl, Dani Bouch, Momina Ahsan, Muhra AlMahri, Marwa Elsaid khalil, Yuxia Wang, Salem Lahlou, Sophia Ananiadou, Veselin Stoyanov, Jimin Huang, Xueqing Peng, Preslav Nakov, Zhuohan Xie

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.19098 returned HTTP 429).

[83] SCOPE: Planning for Hybrid Querying over Clinical Trial Data

Suparno Roy Chowdhury, Manan Roy Choudhury, Tejas Anvekar, Muhammad Ali Khan, Kaneez Zahra Rubab Khakwani, Mohamad Bassam Sonbol, Irbaz Bin Riaz, Vivek Gupta

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.25120 returned HTTP 429).

[84] From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

Alex Petrov, Alexander Gusak, Denis Mukha, Dima Korolev

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.27906 returned HTTP 429).

[85] FlowBot: Inducing LLM Workflows with Bilevel Optimization and Textual Gradients

Hongyeon Yu, Young-Bum Kim, Yoon Kim

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.26258 returned HTTP 429).

[86] LLMs Capture Emotion Labels, Not Emotion Uncertainty: Distributional Analysis and Calibration of Human-LLM Judgment Gaps

Keito Inoshita, Xiaokang Zhou, Akira Kawai, Katsutoshi Yada

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.27345 returned HTTP 429).

[87] Exploring Applications of Transfer-State Large Language Models: Cognitive Profiling and Socratic AI Tutoring

Minori Noguchi

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.27454 returned HTTP 429).

[88] Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

Ansar Aynetdinov, Patrick Haller, Alan Akbik

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.28075 returned HTTP 429).

[89] Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

Ziyuan Zhang, Darcy Wang, Ningyuan Chen, Rodrigo Mansur, Vahid Sarhangian

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2505.09901 returned HTTP 429).

[90] Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment

Kundan Krishna, Joseph Y Cheng, Charles Maalouf, Leon A Gatys

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2506.00166 returned HTTP 429).

[91] ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

Yiran Wu, Mauricio Velazco, Andrew Zhao, Manuel Raúl Meléndez Luján, Srisuma Movva, Yogesh K Roy, Quang Nguyen, Roberto Rodriguez, Qingyun Wu, Michael Albada, Julia Kiseleva, Anand Mudgerikar

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2507.14201 returned HTTP 429).

[92] Knowing When to Defer: Selective Prediction for Responsible Knowledge Tracing

Joshua Mitton, Prarthana Bhattacharyya, Ralph Abboud, Simon Woodhead

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2509.21514 returned HTTP 429).

[93] Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

Gregory N. Frank

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2603.18280 returned HTTP 429).

[94] Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

Wenshuo Zhao, Qi Zhu, Xingshan Zeng, Fei Mi, Lifeng Shang, Yi R. Fung

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.26173 returned HTTP 429).

cs.CV

[95] Learning from the Unseen: Generative Data Augmentation for Geometric-Semantic Accident Anticipation

Yanchen Guan, Haicheng Liao, Chengyue Wang, Xingcheng Liu, Jiaxun Zhang, Keqiang Li, Zhenning Li

Main category: cs.CV

Abstract: Anticipating traffic accidents is a critical yet unresolved problem for autonomous driving, hindered by the inherent complexity of modeling interactions between road users and the limited availability of diverse, large-scale datasets. To address these issues, we propose a dual-path framework. On the one hand, we employ a video synthesis pipeline that, guided by structured prompts, derives feature distributions from existing corpora and produces high-fidelity synthetic driving scenes consistent with the statistical patterns of real data. On the other hand, we design a graph neural network enriched with semantic cues, enabling dynamic reasoning over both spatial and semantic relations among participants. To validate the effectiveness of our approach, we release a new benchmark dataset containing standardized, finely annotated video sequences that cover a broad spectrum of regions, weather, and traffic conditions. Evaluations across existing datasets and our new benchmark confirm notable gains in both accuracy and anticipation lead time, highlighting the capacity of the proposed framework to mitigate current data bottlenecks and enhance the reliability of autonomous driving systems.

[96] Two-View Accumulation as the Primary Training Lever for Hybrid-Capture Gaussian Splatting: A Variance-Decomposition View of When Gradient Surgery Helps

Sungjun Cho

Main category: cs.CV

Abstract: Hybrid-capture novel view synthesis combines images at substantially different camera distances (e.g., aerial drone and ground-level views). Standard 3D Gaussian Splatting (3DGS), trained for 30K iterations with one rendered view per optimizer step, under-fits the minority regime by 1-3 dB on five hybrid-capture benchmarks. We isolate the lever that closes this gap. Among compute-matched alternatives – vanilla 60K iterations, magnitude corrections (GradNorm), direction-aware near/far gradient surgery, projective preconditioning, confidence-gated sample-level surgery, and a random two-view-per-step control – the simplest structural change wins: rendering two views per optimizer step. The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any of the five scenes; the structural change of having two views per step does. We propose a variance-decomposition framework that predicts and explains this finding: under bimodal camera regimes, between-regime gradient variance turns out to be small relative to within-regime variance in 3DGS, so structured and random pairings are variance-equivalent in expectation, and the variance halving from two-view accumulation itself is the dominant effect. We verify the framework on five scenes whose camera-altitude bimodality coefficients span [0.55, 1.00], and we report the negative result that direction-aware projection, magnitude correction, confidence gating, and an active loss-disparity pairing all fall within seed variance of random two-view pairing. The two-view structural lever transfers cleanly to the Scaffold-GS and Pixel-GS backbones. We position this work as an honest characterization of which training-side axes do and do not move PSNR for hybrid-capture 3DGS, together with the framework that explains why.
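The central variance argument above (averaging the gradients of two views per optimizer step halves per-step gradient variance, regardless of how the pair is chosen, when between-regime variance is small) can be checked numerically in a toy setting. The scalar "gradients" below are illustrative stand-ins for per-parameter 3DGS gradients:

```python
import numpy as np

# Toy check: the mean of two i.i.d. gradient samples has half the variance
# of a single sample, which is the structural effect the paper identifies.
rng = np.random.default_rng(0)
grads = rng.normal(loc=1.0, scale=2.0, size=100_000)  # one regime's "gradients"

one_view_var = grads.var()                              # one rendered view per step
two_view_var = grads.reshape(-1, 2).mean(axis=1).var()  # two views averaged per step

ratio = one_view_var / two_view_var  # close to 2 for i.i.d. samples
```

When the two views come from different regimes whose mean gradients are close (small between-regime variance), the same halving holds to first order, which is why structured and random pairings end up variance-equivalent.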

[97] AIDA-ReID: Adaptive Intermediate Domain Adaptation for Generalizable and Source-Free Person Re-Identification

Sundas Iqbal, Qing Tian, Danish Ali, Jianping Gou, Weihua Oue

Main category: cs.CV

Abstract: Person re-identification (Re-ID) aims to match images of the same individual across non-overlapping camera views and remains challenging due to domain shifts caused by variations in illumination, background, camera characteristics, and population distributions. Although supervised models perform well under matched training and testing conditions, their performance degrades significantly when deployed in unseen environments. Existing intermediate domain approaches such as IDM and IDM++ alleviate this gap by constructing bridge feature distributions between domains; however, they rely on fixed mixing strategies and joint source-target access, limiting their applicability to multi-source and source-free settings. To address these limitations, this paper proposes Adaptive Intermediate Domain Adaptation (AIDA), also referred to as Source-Free Multi-Source Intermediate Domain Adaptation (SF-MIDA). The proposed framework treats intermediate-domain learning as a dynamically regulated process, where feature mixing and regularization strength are adaptively controlled using feedback signals derived from model uncertainty and training stability. A multi-source intermediate domain generator synthesizes diverse intermediate representations, while a pseudo-mirror regularization strategy preserves identity consistency under domain perturbations. Extensive experiments across domain generalization and source-free settings demonstrate the effectiveness of the proposed framework.

[98] GAFSV-Net: A Vision Framework for Online Signature Verification

Himanshu Singhal, Suresh Sundaram

Main category: cs.CV

Abstract: Online signature verification (OSV) requires distinguishing skilled forgeries from genuine samples under high intra-class variability and with very few enrollment samples. Existing deep learning methods operate directly on raw temporal sequences, restricting them to 1D architectures and preventing the use of pretrained 2D vision backbones. We bridge this gap with GAFSV-Net, which represents each signature as a six-channel asymmetric Gramian Angular Field image: three kinematic channels (pen speed, pressure derivative, direction angle) are each encoded into complementary GASF and GADF matrices that capture pairwise temporal co-occurrence and directional transition structure respectively. A dual-branch ConvNeXt-Tiny encoder processes GASF and GADF independently, with bidirectional cross-attention enabling each branch to query discriminative patterns from the other before metric-space projection. Training uses semi-hard triplet loss with skilled-forgery hard-negative injection; verification is performed via cosine similarity against a small enrollment prototype. We evaluate on DeepSignDB and BiosecurID, outperforming all sequence-based baselines trained under identical objectives, demonstrating that the representational gain of 2D temporal encoding is consistent and independent of training procedure, with ablations characterising each design choice’s contribution.
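The Gramian Angular Field construction at the heart of this pipeline is a standard transform and is easy to sketch. The rescaling and the four-point example below are illustrative; the paper's six-channel kinematic setup is not reproduced here:

```python
import numpy as np

def gramian_angular_fields(x):
    """Encode a 1D series as GASF/GADF images via the polar-angle trick."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1] so arccos is defined, then map values to angles.
    x_tilde = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x_tilde, -1.0, 1.0))
    # GASF captures pairwise co-occurrence: cos(phi_i + phi_j), symmetric.
    gasf = np.cos(phi[:, None] + phi[None, :])
    # GADF captures directional transitions: sin(phi_i - phi_j), antisymmetric.
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

gasf, gadf = gramian_angular_fields([0.0, 0.5, 1.0, 0.5])
```

GASF is symmetric while GADF is antisymmetric with a zero diagonal, which is why the two matrices carry complementary information and are processed by separate encoder branches.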

[99] Real-Time Frame- and Event-based Object Detection with Spiking Neural Networks on Edge Neuromorphic Hardware: Design, Deployment and Benchmark

Udayanga G. W. K. N. Gamage, Yan Zeng, Cesar Cadena, Matteo Fumagalli, Silvia Tolu

Main category: cs.CV


Abstract: Real-time object detection on energy-constrained platforms is critical for applications such as UAV-based inspection, autonomous navigation, and mobile robotics. Spiking neural networks (SNNs) on neuromorphic hardware are believed to be significantly more energy-efficient than conventional artificial neural networks (ANNs). In this work, we present a comprehensive methodology for designing general SNN detection architectures targeting neuromorphic platforms, along with the engineering adaptations required to deploy them on the state-of-the-art neuromorphic processor, Intel Loihi 2. We benchmark SNN-based object detection on Loihi 2 using both frame-based and event-based datasets, comparing performance with ANN-based detection on the NVIDIA Jetson Orin Nano, NVIDIA Jetson Nano B01, and the Apple M2 CPU. Our results show that SNNs on Loihi 2 can perform real-time detection while achieving the lowest per-inference dynamic energy among all platforms. Loihi 2 also outperforms the other platforms in terms of power consumption, though ANNs on the Jetson Orin Nano achieve higher inference rates. Furthermore, our ANN-to-SNN distillation-aware training enables SNNs to recover 87-100% of the detection accuracy of their ANN counterparts while maintaining lower inference latency; without distillation, SNNs exhibit an 11-27% accuracy drop. These results highlight the potential of neuromorphic systems for energy-efficient, real-time object detection at the edge.

[100] CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection

Hang Wang, Chao Shen, Chenhao Lin, Minghui Yang, Lei Zhang, Cong Wang

Main category: cs.CV


Abstract: The proliferation of advanced AI video synthesis techniques poses an unprecedented challenge to digital video authenticity. Existing AI-generated video (AIGV) detection methods primarily focus on uni-modal or spatiotemporal artifacts, but they overlook the rich cues within the visual-textual cross-modal space, especially the temporal stability of semantic alignment. In this work, we identify a distinctive fingerprint in AIGVs, termed cross-modal temporal artifact (CMTA). Unlike real videos that exhibit natural temporal fluctuations in cross-modal alignment due to semantic variations, AIGVs display unnaturally stable semantic trajectories governed by given input prompts. To bridge this gap, we propose the CMTA framework, a cross-modal detection approach that captures these unique temporal artifacts through joint cross-modal embedding and multi-grained temporal modeling. Specifically, CMTA leverages BLIP to generate frame-level image captions and utilizes CLIP to extract corresponding visual-textual representations. A coarse-grained temporal modeling branch is then designed to characterize temporal fluctuations in cross-modal alignment with a GRU. In parallel, a fine-grained branch is constructed to capture intricate inter-frame variations from integrated visual-textual features with a Transformer encoder. Extensive experiments on 40 subsets across four large-scale datasets, including GenVideo, EvalCrafter, VideoPhy, and VidProM, validate that our approach sets a new state-of-the-art while exhibiting superior cross-generator generalization. Code and models of CMTA will be released at https://github.com/hwang-cs-ime/CMTA
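
The core cue here, temporal stability of visual-textual alignment, can be illustrated with a minimal sketch. The embeddings below are toy vectors standing in for CLIP features, and the plain variance statistic is a simplification of the paper's coarse/fine temporal branches.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def alignment_trajectory(frame_embs, text_embs):
    """Per-frame visual-textual alignment and its temporal variance: real
    videos fluctuate semantically, prompt-driven AIGVs stay unnaturally flat."""
    sims = [cosine(f, t) for f, t in zip(frame_embs, text_embs)]
    mean = sum(sims) / len(sims)
    return sims, sum((s - mean) ** 2 for s in sims) / len(sims)

_, var_flat = alignment_trajectory([[1, 0], [1, 0]], [[1, 0], [1, 0]])  # "generated"
_, var_real = alignment_trajectory([[1, 0], [1, 1]], [[1, 0], [1, 0]])  # "real"
```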

[101] From Images2Mesh: A 3D Surface Reconstruction Pipeline for Non-Cooperative Space Objects

Bala Prenith Reddy Gopu, Patrick Quinn, George M. Nehma, Madhur Tiwari, Matt Ueckermann, David Hinckley, Christopher McKenna

Main category: cs.CV


Abstract: On-orbit inspection imagery is crucial as it enables characterization of non-cooperative resident space objects, providing the geometry and structural condition essential for active debris removal and on-orbit servicing mission planning. However, most existing neural implicit surface reconstruction methods have been confined to synthetic or hardware-in-the-loop data with known camera poses and controlled illumination. In this work, we present a pipeline for neural implicit surface reconstruction of non-cooperative space objects from monocular inspection imagery. We demonstrate it on publicly released ISS inspection footage from the STS-119 mission and publicly released on-orbit inspection footage of an H-IIA rocket upper stage. We find that segmentation-based background removal is essential for successful camera pose estimation from real on-orbit footage, where background variation between frames caused direct processing to fail entirely. We further incorporate photometric correction of per-frame exposure variations and analyze its behavior across datasets, finding that performance in shadowed regions varies with the illumination characteristics of the input footage.

[102] VkSplat: High-Performance 3DGS Training in Vulkan Compute

Jingxiang Chen, Mohamed Ibrahim, Yang Liu

Main category: cs.CV


Abstract: We present VkSplat, a high-performance, cross-vendor 3D Gaussian Splatting (3DGS) training pipeline implemented fully in Vulkan compute, addressing the performance and compatibility limitations of existing training pipelines. With various optimizations, we achieve a $3.3\times$ speedup and a $33\%$ VRAM reduction over a CUDA+PyTorch baseline, maintaining quality and demonstrating compatibility across GPU vendors. To the best of our knowledge, this is the first fully-Vulkan-based 3DGS training pipeline to achieve state-of-the-art performance. Code: https://github.com/harry7557558/vksplat

[103] Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis

Shadid Yousuf, S. M. Mahbubur Rahman, Mohammed Imamul Hassan Bhuiyan

Main category: cs.CV


Abstract: The deployment of artificial intelligence in medical imaging is hindered by high computational complexity and resource-intensive processing of volumetric data. Although chest computed tomography (CT) volumes offer richer diagnostic information than projection radiography, their use in AI-based diagnosis remains limited due to the computational burden of processing uncompressed volumetric images (typically stored in NIfTI or DICOM format). Addressing the growing need for low-resource deployment and efficient electronic data transfer, we investigate the utilization of JPEG-compressed chest CT volumes for thoracic abnormality detection. We propose Feature Attention Style Transfer (FAST), a novel distillation framework that transfers both activation patterns and structural relationships from high-fidelity CT representations to a spatiotemporal visual encoder operating on compressed inputs. By combining Gram-matrix-based attention style preservation with dual-attention feature alignment, FAST enables robust feature extraction from degraded volumes. Furthermore, we introduce Structured Factorized Projection (SFP), leveraging Block Tensor Train decomposition as a parameter-efficient alternative to dense projection layers, reducing projection-head parameters by almost half. Our contrastive learning pipeline, CT-Lite, integrates these components with a SigLIP-based multimodal alignment objective. Experiments on CT-RATE, NIDCH, and Rad-ChestCT demonstrate that CT-Lite achieves AUROC within 5-7% of the uncompressed-input baseline across all three datasets, despite operating on compressed inputs with significantly fewer parameters, paving the way for AI-based clinical evaluation under resource constraints.
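
A minimal sketch of the Gram-matrix style term the abstract mentions, assuming features are given as channel vectors; this illustrates the general Gram-based style-preservation idea rather than FAST's exact loss.

```python
def gram(features):
    # channel-by-channel inner products of a feature map (rows = channels)
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]

def style_loss(f_teacher, f_student):
    """Gram-matrix style term: mean squared difference between the
    channel-correlation structures of teacher and student features."""
    gt, gs = gram(f_teacher), gram(f_student)
    n = len(gt) * len(gt[0])
    return sum((a - b) ** 2 for rt, rs in zip(gt, gs) for a, b in zip(rt, rs)) / n

zero = style_loss([[1, 2], [3, 4]], [[1, 2], [3, 4]])   # identical features
gap = style_loss([[1, 2], [3, 4]], [[0, 0], [0, 0]])    # maximally mismatched
```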

[104] Adaptive Geodesic Conformal Prediction for Egocentric Camera Pose Estimation

Aishani Pathak, Hasti Seifi

Main category: cs.CV


Abstract: Egocentric pose estimation for Augmented Reality (AR) and assistive devices requires not just accurate predictions but guaranteed uncertainty regions. Conformal prediction (CP) provides such guarantees without retraining, but we show that standard CP with a single fixed threshold achieves nominal 90% overall coverage while covering only ~60% of the hardest 25% of frames (Q4) – a ~30 percentage-point conditional coverage gap consistent across 12 participants, 3 predictors, and 3 horizons (108 evaluations) on EPIC-Fields. We further show that a geodesic SE(3) nonconformity score identifies physically harder frames than Euclidean scoring, with only 15-26% Q4 overlap and 2-3x higher ground-truth camera displacement for geodesic Q4 frames. To close the coverage gap, we propose DINOv2-Bridge adaptive CP: a two-stage difficulty estimator trained on a single source participant that transfers cross-participant without any images at test time, improving Q4 coverage from ~0.75 to ~0.93 while maintaining overall coverage at the 90% target.
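
Standard split conformal calibration, which the paper builds on, can be sketched in a few lines; the toy scores below stand in for geodesic SE(3) pose errors on a held-out calibration set.

```python
import math

def conformal_quantile(scores, alpha=0.1):
    """Split conformal prediction: the ceil((n+1)(1-alpha))-th smallest
    calibration score is a threshold with >= 1-alpha marginal coverage."""
    n = len(scores)
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(scores)[k - 1]

# toy calibration scores (think: geodesic pose errors on calibration frames)
cal = [0.1 * i for i in range(1, 100)]   # 99 scores
q = conformal_quantile(cal, alpha=0.1)   # rank ceil(100 * 0.9) = 90
covered = 4.2 <= q                        # region contains this test frame
```

The paper's observation is that a single such threshold gives marginal coverage only: hard frames (large true errors) are systematically under-covered, which motivates the difficulty-adaptive variant.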

[105] MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

Xijia Wei, Yuan Fang, Kevin Chetty, Youngjun Cho, Nadia Bianchi-Berthouze

Main category: cs.CV


Abstract: Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, discarding the rich spatiotemporal information naturally present in radar video streams, while the required signal processing adds system complexity. In addition, existing solutions are mainly trained in an end-to-end supervised manner without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns spatiotemporal, motion-aware, generalized representations from unlabelled radar video and leverages its heatmap decoder for multi-frame pose estimation. We evaluate it across three datasets using leave-one-person-out cross-validation with rigorous statistical testing. MAEPose consistently outperforms state-of-the-art baselines by up to 22.1% in MPJPE (p<0.05), and maintains robust accuracy under zero-shot bystander interference with only a 6.5% error increase. Ablation studies confirm that both the pre-training and the heatmap decoder contribute substantially, while modality analysis indicates that Range-Doppler video as input achieves better pose estimation than Range-Azimuth or their fusion, at lower computational cost.

[106] Remote SAMsing: From Segment Anything to Segment Everything

Osmar Luiz Ferreira de Carvalho, Osmar Abílio de Carvalho Júnior, Anesmar Olino de Albuquerque, Daniel Guerreiro e Silva

Main category: cs.CV


Abstract: SAM2 produces high-quality zero-shot segmentation on natural images, but applying it to large remote sensing scenes exposes two problems: (1) its mask generator faces an inherent quality-coverage trade-off: strict thresholds yield precise masks but leave most of the image unsegmented, while relaxed thresholds increase coverage at the cost of mask quality; and (2) large images must be tiled, fragmenting objects across tile boundaries. We propose Remote SAMsing, an open-source pipeline that solves both problems without modifying SAM2 or requiring training data. For coverage, a multi-pass algorithm runs SAM2 repeatedly on each tile, painting accepted masks black between passes to simplify the scene for the next iteration, and relaxing quality thresholds only when coverage gains stagnate, ensuring that the most precise masks are always captured first. For spatial consistency, contextual padding and a parameter-free best-match merge reconstruct objects fragmented across tile boundaries. Evaluated on seven scenes (5cm to 4.78m GSD), the pipeline raises coverage from 30–68% (single-pass SAM2) to 91–98%. Ablation experiments quantify the contribution of each component to coverage and detection quality. Per-class evaluation shows that SAM2 transfers well to discrete RS objects (buildings 95%, cars 82–93% Det@0.5) with segment boundaries 3–8$\times$ more precise than SLIC and Felzenszwalb baselines. Tile size functions as an implicit scale parameter: reducing it from $1{,}000$ to 250 raises Det@0.5 from 56% to 85%, outperforming SAM2’s built-in multi-scale mechanism. The pipeline generalizes to MNF false-color imagery without retraining (99.5% ASA) and scales to production-sized images: a 1.94 billion pixel Potsdam mosaic achieved 97% coverage without quality degradation.
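
The multi-pass coverage loop can be sketched with a stand-in segmenter (the real pipeline calls SAM2); the threshold-relaxation rule below is a simplified reading of the abstract, and every name here is illustrative.

```python
def multi_pass_segment(image, segment_fn, thresholds, max_passes=5, min_gain=0.01):
    """Multi-pass loop: accept masks, paint them black so the next pass sees a
    simpler scene, and relax the quality threshold only when coverage stalls.
    `segment_fn(image, thr)` is a stand-in for a SAM2 mask generator."""
    h, w = len(image), len(image[0])
    covered = [[False] * w for _ in range(h)]
    thr_idx, prev_cov = 0, 0.0
    masks = []
    for _ in range(max_passes):
        for m in segment_fn(image, thresholds[thr_idx]):
            masks.append(m)
            for r, c in m:
                covered[r][c] = True
                image[r][c] = 0          # paint out accepted regions
        cov = sum(map(sum, covered)) / (h * w)
        if cov - prev_cov < min_gain and thr_idx + 1 < len(thresholds):
            thr_idx += 1                 # relax threshold when gains stagnate
        prev_cov = cov
    return masks, prev_cov

# stand-in segmenter on a 2x2 "image": the strict pass finds one confident
# pixel region, the relaxed pass sweeps up whatever is left un-painted
def stub_segmenter(img, thr):
    if thr >= 0.9:
        return [[(0, 0)]] if img[0][0] != 0 else []
    px = [(r, c) for r in range(2) for c in range(2) if img[r][c] != 0]
    return [px] if px else []

masks, coverage = multi_pass_segment([[1, 1], [1, 1]], stub_segmenter, [0.9, 0.5])
```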

[107] REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception

Vincenzo Polizzi, David B. Lindell, Jonathan Kelly

Main category: cs.CV


Abstract: Event cameras provide several unique advantages over standard frame-based sensors, including high temporal resolution, low latency, and robustness to extreme lighting. However, existing learning-based approaches for event processing are typically confined to narrow, task-specific silos and lack the ability to generalize across modalities. We address this gap with REALM, a cross-modal framework that learns an RGB and Event Aligned Latent Manifold by projecting event representations into the pretrained latent space of RGB foundation models. Instead of task-specific training, we leverage low-rank adaptation (LoRA) to bridge the modality gap, effectively unlocking the geometric and semantic priors of frozen RGB backbones for asynchronous event streams. We demonstrate that REALM effectively maps events into the ViT-based foundation latent space. Our method allows us to perform downstream tasks like depth estimation and semantic segmentation by simply transferring linear heads trained on the RGB teacher. Most significantly, REALM enables the direct, zero-shot application of complex, frozen image-trained decoders, such as MASt3R, to raw event data. We demonstrate state-of-the-art performance in wide-baseline feature matching, significantly outperforming specialized architectures. Code and models are available upon acceptance.
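
The low-rank adaptation mechanism REALM relies on is standard: a frozen weight's output is augmented with a trainable low-rank update. A minimal dense sketch with toy shapes (not the actual ViT layers):

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale=1.0):
    """Frozen path plus trainable low-rank update: y = W x + scale * B (A x),
    where A is r x d_in and B is d_out x r with small rank r."""
    base = matvec(W, x)                  # frozen backbone weight
    update = matvec(B, matvec(A, x))     # low-rank trainable correction
    return [b + scale * u for b, u in zip(base, update)]

# toy shapes: frozen 2x2 identity, rank-1 adapter routing input[0] to output[1]
y = lora_forward([2.0, 3.0], W=[[1, 0], [0, 1]], A=[[1, 0]], B=[[0], [1]])
```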

[108] When Do Diffusion Models learn to Generate Multiple Objects?

Yujin Jeong, Arnas Uselis, Iro Laina, Seong Joon Oh, Anna Rohrbach

Main category: cs.CV


Abstract: Text-to-image diffusion models achieve impressive visual fidelity, yet they remain unreliable in multi-object generation. Despite extensive empirical evidence of these failures, the underlying causes remain unclear. We begin by asking how much of this limitation arises from the data itself. To disentangle data effects, we consider two regimes across different dataset sizes: (1) concept generalization, where each individual concept is observed during training under potentially imbalanced data distributions, and (2) compositional generalization, where specific combinations of concepts are systematically held out. To study these regimes, we introduce mosaic (Multi-Object Spatial relations, AttrIbution, Counting), a controlled framework for dataset generation. By training diffusion models on mosaic, we find that scene complexity plays a dominant role rather than concept imbalance, and that counting is uniquely difficult to learn in low-data regimes. Moreover, compositional generalization collapses as more concept combinations are held out during training. These findings highlight fundamental limitations of diffusion models and motivate stronger inductive biases and data design for robust multi-object compositional generation.

[109] An End-to-End Decision-Aware Multi-Scale Attention-Based Model for Explainable Autonomous Driving

Maryam Sadat Hosseini Azad, Shahriar Baradaran Shokouhi, Amir Abbas Hamidi Imani, Shahin Atakishiyev, Randy Goebel

Main category: cs.CV


Abstract: Computer vision is increasingly applied across various domains, typically through deep learning models that are black boxes by nature. Without the ability to explain the behavior of neural networks, especially their decision-making processes, it is not possible to assess their effectiveness, predict system failures, or deploy them effectively in real-world applications. Due to the inevitable use of deep learning in fully automated driving systems, many methods have been proposed to explain their behavior; however, they suffer from flawed reasoning and unreliable metrics, which have prevented a comprehensive understanding of complex models in autonomous vehicles and hindered the development of truly reliable systems. In this study, we propose a multi-scale attention-based model in which driving decisions are fed into the reasoning component to provide case-specific explanations for each decision simultaneously. For quantitative evaluation of our model's performance, we employ the F1-score metric and also propose a new metric, the Joint F1 score, to demonstrate the accurate and reliable performance of the model in terms of Explainable Artificial Intelligence (XAI). In addition to the BDD-OIA dataset, the nu-AR dataset is utilized to further validate the generalization capability and robustness of the proposed network. The results demonstrate the superiority of our reasoning network over classic and state-of-the-art models.

[110] Efficient Spatio-Temporal Vegetation Pixel Classification with Vision Transformers

Alan Gomes, Anderson Gonçalves, Samuel Felipe dos Santos, Nathan Felipe Alves, Magna Soelma Beserra de Moura, Bruna de Costa Alberton, Leonor Patricia C. Morellato, Ricardo da Silva Torres, Jurandy Almeida

Main category: cs.CV


Abstract: Plant phenology-the study of recurrent life cycle events-is essential for understanding ecosystem dynamics and their responses to climate change impacts. While Unmanned Aerial Vehicles (UAVs) and near-surface cameras enable high-resolution monitoring, identifying plant species across time remains computationally challenging. State-of-the-art approaches, specifically Multi-Temporal Convolutional Networks (CNNs), rely on rigid multi-branch architectures that scale poorly with longer time series and require large spatial context windows. In this paper, we present an extensive study on optimizing Vision Transformers (ViTs) for efficient spatio-temporal vegetation pixel classification. We conducted a comprehensive ablation study analyzing seven key design dimensions, including: (i) data normalization; (ii) spectral arrangement; (iii) boundary handling; (iv) spatial context window shape and size; (v) tokenization strategies; (vi) positional encoding; and (vii) feature aggregation strategies. Our method was evaluated on two datasets from the Brazilian Cerrado biome, Serra do Cipó (aerial imagery) and Itirapina (near-surface imagery). Experimental results demonstrate that our ViT approach offers a substantial improvement in computational efficiency while maintaining competitive classification performance. Notably, our ViT reduces Floating Point Operations (FLOPs) by an order of magnitude and maintains constant parameter complexity regardless of the time series length, whereas the CNN baseline scales linearly. Our findings confirm that ViTs are a robust, scalable solution for resource-constrained phenological monitoring systems.

[111] Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration

Zhili Li, Kangyang Chai, Zhihao Wang, Xiaowei Jia, Yanhua Li, Gengchen Mai, Sergii Skakun, Dinesh Manocha, Yiqun Xie

Main category: cs.CV


Abstract: Super-resolution (SR) techniques have made major advances in reconstructing high-resolution images from low-resolution inputs. The increased resolution provides visual enhancement and utility for monitoring tasks. In particular, SR has been increasingly developed for satellite-based Earth observation, with applications in urban planning, agriculture, ecology, and disaster response. However, existing SR studies and benchmarks typically use fidelity metrics such as PSNR or SSIM, whereas the true utility of super-resolved images lies in supporting downstream tasks such as land cover classification, biomass estimation, and change detection. To bridge this gap, we introduce GeoSR-Bench, a downstream task-integrated SR benchmark dataset to evaluate SR models beyond fidelity metrics. GeoSR-Bench comprises spatially co-located, temporally aligned, and quality-controlled image pairs from about 36,000 locations across diverse land covers, spanning resolutions from 500m to 0.6m. To the best of our knowledge, GeoSR-Bench is the first SR benchmark that directly connects improved image resolution from SR models with downstream Earth monitoring tasks, including land cover segmentation, infrastructure mapping, and biophysical variable estimation. Using GeoSR-Bench, we benchmark GAN, transformer, neural operator, and diffusion-based SR models on perceptual quality and downstream task performance. We conduct experiments with 270 settings, covering 2 cross-platform SR tasks, 9 SR models, 3 downstream task models, and 5 downstream tasks for each SR task. The results show that improvements in traditional SR metrics often do not correlate with gains in task performance, and the correlations can be negative, indicating that these metrics provide limited guidance for selecting superior models for downstream tasks. This reveals the need to integrate downstream tasks into SR model development and evaluation.

[112] Online Self-Calibration Against Hallucination in Vision-Language Models

Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si

Main category: cs.CV


Abstract: Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Perception Mismatch: the student model is forced to align with fine-grained details beyond its perceptual capacity, learning to guess rather than to see. To obtain reliable self-supervision for online learning, we identify a Generative-Discriminative Gap within LVLMs, where models exhibit higher accuracy on discriminative verification than open-ended generation. Leveraging this capability, we propose \textbf{O}nline \textbf{S}elf-\textbf{CA}lib\textbf{R}ation (OSCAR), a framework that integrates Monte Carlo Tree Search with a Dual-Granularity Reward Mechanism to construct preference data and iteratively refines the model via Direct Preference Optimization. Extensive experiments demonstrate that OSCAR achieves state-of-the-art performance on hallucination benchmarks while improving general multimodal capabilities.
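
The Direct Preference Optimization objective used in the final refinement step is well established and can be sketched per preference pair; the log-probabilities below are toy values, not model outputs.

```python
import math

def dpo_loss(lp_chosen, lp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO objective for one preference pair: negative log-sigmoid of the
    scaled margin between policy and reference log-probabilities."""
    margin = beta * ((lp_chosen - ref_chosen) - (lp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

no_pref = dpo_loss(-1.0, -1.0, -1.0, -1.0)   # zero margin -> log 2
aligned = dpo_loss(-0.5, -1.5, -1.0, -1.0)   # chosen gained, rejected lost
```

The loss falls as the policy, relative to its frozen reference, assigns more probability to the preferred response than to the rejected one, which is what OSCAR's self-generated preference pairs drive during iterative refinement.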

[113] Pose-Aware Diffusion for 3D Generation

Zihan Zhou, Luxi Chen, Jingzhi Zhou, Yuhao Wan, Min Zhao, Baoyu Fan, Chongxuan Li

Main category: cs.CV


Abstract: Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.

[114] CURE-OOD: Benchmarking Out-of-Distribution Detection for Survival Prediction

Wenjie Zhao, Jia Li, Mingrui Liu, Jing Wang, Yunhui Guo

Main category: cs.CV


Abstract: "How long can I live and remain free of cancer?" is often the first question a patient asks after receiving a cancer diagnosis and treatment. Accurate survival prediction helps alleviate psychological distress and supports risk stratification and personalized treatment planning. Recent survival prediction frameworks have shown strong performance using computed tomography (CT) images. However, variations in imaging acquisition introduce out-of-distribution (OOD) samples caused by covariate shifts that undermine model reliability. Despite this challenge, to our knowledge, no existing benchmark systematically studies OOD detection in cancer survival prediction. To address this gap, we introduce the Cancer sURvival bEnchmark for OOD Detection (CURE-OOD), the first benchmark for systematically evaluating OOD detection in survival prediction under controlled acquisition-induced distribution shifts. CURE-OOD defines scanner-parameter-based training, in-distribution (ID), and OOD test splits across four survival prediction tasks. Our experiments show that covariate shifts notably reduce survival prediction performance, and that mainstream classification-oriented OOD detectors can fail in survival prediction. Finally, we include HazardDev as a simple survival-aware reference baseline for OOD detection. CURE-OOD enables systematic analysis of how distribution shifts affect both downstream survival performance and OOD detectability.

[115] Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking

Nhat-Tan Do, Le-Huy Tu, Nhi Ngoc-Yen Nguyen, Dieu-Phuong Nguyen, Trong-Hop Do

Main category: cs.CV


Abstract: Multi-object tracking (MOT) is critical in numerous real-world applications, including surveillance, autonomous driving, and robotics. Accurately predicting object motion is fundamental to MOT, but current methods struggle with the complexities of real-world, non-linear motion (e.g., sudden stops, sharp turns). While recent research has gravitated towards increasingly complex and computationally expensive generative models to tackle this problem, their practical utility is often constrained. This paper challenges that paradigm, arguing that such complexity is not only unnecessary but can be outperformed by a more efficient, purpose-built approach. We introduce the Temporal Convolutional Motion Predictor (TCMP), a novel framework for MOT that leverages a modified Temporal Convolutional Network (TCN) featuring dilated convolutions and a regression head. This design allows for effective motion prediction across arbitrary temporal context lengths. Experimental results demonstrate that our approach achieves state-of-the-art performance, improving upon the previous best method in several key metrics: HOTA (a measure of overall tracking accuracy) increases from 62.3% to 63.4%, IDF1 (a measure of identity preservation) rises from 63.0% to 65.0%, and AssA (a measure of association accuracy) improves from 47.2% to 49.1%. Significantly, TCMP achieves this performance while being highly efficient: it has only 0.014 times the parameters and requires only 0.05 times the computational cost (FLOPs) of the SOTA method. These findings highlight the robustness of our method to advance MOT systems by ensuring adaptability, accuracy, and efficiency in complex tracking environments.
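
The building block of a TCN, a causal dilated convolution, can be sketched in a few lines; kernel weights and input are toy values, not the paper's trained model.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1-D convolution (left zero padding): output[t] depends
    only on x[t], x[t - d], x[t - 2d], ... and never on future frames."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            acc += w * (x[j] if j >= 0 else 0.0)
        out.append(acc)
    return out

# dilation 2 with a length-2 kernel sums each frame with the one two steps back
y = causal_dilated_conv1d([1, 2, 3, 4, 5], kernel=[1.0, 1.0], dilation=2)
```

Stacking such layers with growing dilations lets the receptive field cover arbitrary context lengths without adding parameters, which is where the efficiency advantage over generative motion models comes from.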

[116] Flow matching for Sentinel-2 super-resolution: implementation, application, and implications

Dakota Hester, Vitor S. Martins, Lucas B. Ferreira, Thainara M. A. Lima, Juliana A. Araújo

Main category: cs.CV


Abstract: Developing robust techniques for super-resolution of satellite imagery involves navigating commonly observed trade-offs between spectral fidelity and perceptual quality. In this work, we introduce a flow matching model for 4x super-resolution of 10-m Sentinel-2 visible and near-infrared bands over the conterminous United States (CONUS) using a dataset of 120,851 10-m Sentinel-2 and 2.5-m resampled NAIP imagery pairs acquired on the same day. Our results showed that the flow matching model outperformed diffusion and Real-ESRGAN models in pixel-wise accuracy in a single sampling step using the Euler method. When evaluated with a second-order Midpoint solver, our model generated perceptually realistic super-resolved imagery in only 20 sampling steps, effectively navigating the perception-distortion trade-off at inference time without retraining. We used this model to produce a super-resolved 2.5-m 4-band CONUS imagery product derived from 2025 10-m Sentinel-2 annual composites, consisting of over 1.58 trillion pixels. We further evaluated the use of super-resolved data on a land cover classification task using semantic segmentation models. Finally, we generated a yearly 2.5-m land cover product for the Chesapeake Bay watershed for 2020-2025. An accuracy assessment against 25,000 ground truth points revealed an overall accuracy of 89.11% for the annual land cover product. We conclude that flow matching is an effective generative modeling approach for super-resolution of Sentinel-2 imagery compared to diffusion and Generative Adversarial Network-based methods, and has strong implications for expanding access to high-resolution imagery for geospatial applications that demand fine spatial detail.
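
The sampling trade-off described (one Euler step versus ~20 second-order midpoint steps) comes down to how the learned velocity field is integrated; below is a minimal ODE-integration sketch with a toy linear field in place of the trained model.

```python
def sample_flow(v, x0, steps, solver="euler"):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed step size.
    `v` is a stand-in for the learned flow-matching velocity field."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        if solver == "euler":
            x = x + dt * v(x, t)
        else:  # second-order midpoint solver
            x_mid = x + 0.5 * dt * v(x, t)
            x = x + dt * v(x_mid, t + 0.5 * dt)
    return x

# toy linear field v(x, t) = x: the exact flow maps x0 -> x0 * e
x1_euler = sample_flow(lambda x, t: x, 2.0, steps=1)                   # coarse: 4.0
x1_mid = sample_flow(lambda x, t: x, 2.0, steps=20, solver="midpoint")  # near 2e
```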

[117] RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

Ben Wan, Yan Feng, Zihan Tang, Weizhe Huang, Yuting Zeng, Jia Wang, Tongxuan Liu

Main category: cs.CV

Abstract: DeepSeek-OCR leverages visual-text compression to reduce long-text processing costs and accelerate inference, yet its visual tokens still carry redundant textual and structural information. Moreover, current token pruning methods for conventional vision-language models (VLMs) fail to preserve textual fidelity due to improper compression mechanisms. By analyzing the decoding process of DeepSeek-OCR, we find a distinct two-stage reading trajectory: the model initially prioritizes the majority of high-norm tokens, then subsequently redistributes its attention to the remaining ones. Motivated by this insight, we propose RTPrune, a two-stage token pruning method tailored for DeepSeek-OCR. In the first stage, we prioritize high-norm visual tokens that capture salient textual and structural information. In the second stage, the remaining tokens are paired and merged based on optimal transport theory to achieve efficient feature aggregation. We further introduce a dynamic pruning ratio that adapts to token similarity and textual density for OCR tasks, enabling a better efficiency-accuracy trade-off. Extensive experiments demonstrate state-of-the-art performance, as evidenced by 99.47% accuracy and 1.23$\times$ faster prefill on OmniDocBench, achieved with 84.25% token retention when applied to DeepSeek-OCR-Large.
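
A rough sketch of the two-stage idea (keep the high-norm tokens, then merge the remainder). The pairing here is a naive consecutive-pair average standing in for the paper's optimal-transport matching, and all names are illustrative:

```python
import numpy as np

def prune_tokens(tokens, keep_ratio=0.8):
    """Stage 1: keep the highest-L2-norm tokens; Stage 2: merge leftovers."""
    n, d = tokens.shape
    norms = np.linalg.norm(tokens, axis=1)
    order = np.argsort(-norms)            # indices sorted by descending norm
    n_keep = int(n * keep_ratio)
    kept = tokens[order[:n_keep]]
    rest = tokens[order[n_keep:]]
    # Naive pairing: average consecutive leftovers two at a time.
    # (RTPrune pairs tokens via optimal transport; this is a simplification.)
    if len(rest) % 2:
        rest = rest[:-1]
    merged = rest.reshape(-1, 2, d).mean(axis=1)
    return np.concatenate([kept, merged], axis=0)
```

With `keep_ratio=0.5` on six tokens, three survive stage 1 and one merged token emerges from stage 2, so four tokens reach the decoder instead of six.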

[118] SIMON: Saliency-aware Integrative Multi-view Object-centric Neural Decoding

YuSheng Lin, Ji-Hwa Tsai, Chun-Shu Wei

Main category: cs.CV

Abstract: Recent EEG-to-image retrieval methods leverage pretrained vision encoders and foveation-inspired priors, but typically assume a fixed, center-focused view. This center bias conflicts with content-driven human attention, creating a geometric-semantic dissociation between visual features and EEG responses. We propose SIMON, a saliency-aware multi-view framework for zero-shot EEG-to-image retrieval. SIMON combines foreground segmentation and saliency prediction to select fixation centers via Saliency-Aware Sampling (SAS), then generates foveated views that emphasize informative object regions while suppressing background clutter. On THINGS-EEG, SIMON achieves state-of-the-art performance in both intra-subject and inter-subject settings, reaching an average Top-1 accuracy of 69.7% and 19.6%, respectively, consistently outperforming recent competitive baselines. Analyses across sampling granularity, EEG channel topology, and visual/brain encoder backbones further support the robustness of saliency-aware multi-view integration. Our code and models are publicly available at https://github.com/simonlink666/SIMON.

[119] BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception

Kang Yang, Tianci Bu, Peng Wang, Deying Li, Yongcai Wang

Main category: cs.CV

Abstract: Most existing heterogeneous cooperative perception methods depend on prior preparation like offline joint training or tailored collaborator-model adaptation. Such preprocessing is, however, generally impractical in real scenarios, as agents are usually independently trained by different developers and meet occasionally online. This work investigates \emph{preparation-free heterogeneous cooperative perception}, where agents use independently trained single-agent detectors without any pre-deployment coordination. We find direct cross-agent fusion under this setting greatly underperforms ego-only perception. We present BOLT, a lightweight plug-and-play module that adapts neighboring features online via ego-as-teacher distillation, requiring only ego predictions without ground-truth labels. BOLT leverages high-confidence ego perception features to guide cross-agent feature-domain alignment, while enabling neighbors to contribute features in the ego’s low-confidence regions. With only 0.9M trainable parameters, BOLT improves AP@50 by up to 32.3 points over vanilla unadapted fusion in the preparation-free setting. It consistently outperforms ego-only results on DAIR-V2X and OPV2V, across different encoder pairs and fusion strategies. Code: https://github.com/sidiangongyuan/BOLT.

[120] Beyond Heuristics: Learnable Density Control for 3D Gaussian Splatting

Zhenhua Ning, Xin Li, Jun Yu, Guangming Lu, Yaowei Wang, Wenjie Pei

Main category: cs.CV

Abstract: While 3D Gaussian Splatting (3DGS) has demonstrated impressive real-time rendering performance, its efficacy remains constrained by a reliance on heuristic density control. Despite numerous refinements to these handcrafted rules, such methods inherently lack the flexibility to adapt to diverse scenes with complex geometries. In this paper, we propose a paradigm shift for density control from rigid heuristics to fully learnable policies. Specifically, we introduce \textbf{LeGS}, a framework that reformulates density control as a parameterized policy network optimized via Reinforcement Learning (RL). Central to our approach is the tailored effective reward function grounded in sensitivity analysis, which precisely quantifies the marginal contribution of individual Gaussians to reconstruction quality. To maintain computational tractability, we derive a closed-form solution that reduces the complexity of reward calculation from $O(N^2)$ to $O(N)$. Extensive experiments on the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets demonstrate that \textbf{LeGS} significantly outperforms state-of-the-art methods, striking a superior balance between reconstruction quality and efficiency. The code will be released at https://github.com/AaronNZH/LeGS

[121] LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations

Huangbiao Xu, Huanqi Wu, Xiao Ke, Yuxin Peng

Main category: cs.CV

Abstract: Real-world multimodal learning is often hindered by missing modalities. While Incomplete Multimodal Learning (IML) has gained traction, existing methods typically rely on the unrealistic assumption of full-modal availability during training to provide reconstruction supervision or cross-modal priors. This paper tackles the more challenging setting of IML under training-time incomplete observations, which precludes reliance on a "God's eye view" of complete data. We propose LIMSSR (LLM-Driven Incomplete Multimodal Sequence-to-Score Reasoning), a framework that reformulates this challenge as a conditional sequence reasoning task. LIMSSR leverages the semantic reasoning capabilities of Large Language Models via Prompt-Guided Context-Aware Modality Imputation and Multidimensional Representation Fusion to infer latent semantics from available contexts without direct reconstruction. To mitigate hallucinations, we introduce a Mask-Aware Dual-Path Aggregation to dynamically calibrate inference uncertainty. Extensive experiments on three Action Quality Assessment datasets demonstrate that LIMSSR significantly outperforms state-of-the-art baselines without relying on complete training data, establishing a new paradigm for data-efficient multimodal learning. Code is available at https://github.com/XuHuangbiao/LIMSSR.

[122] Scaling Video Understanding via Compact Latent Multi-Agent Collaboration

Kerui Chen, Jinglu Wang, Jianrong Zhang, Ming Li, Yan Lu, Hehe Fan

Main category: cs.CV

Abstract: Multi-modal large language models (MLLMs) advance vision language understanding but face inherent limitations in long-video tasks due to bounded perception context budgets. Existing agentic methods mitigate this via rule-based preprocessing, yet often suffer from information loss, high cost, and reliance on textual intermediates. We propose MACF, an end-to-end Multi-Agent Collaboration Framework that decouples per-agent perception budgets from global video complexity, enabling scalable video understanding while preserving visual fidelity. MACF partitions videos into segments for locally budgeted agents and enables holistic reasoning via an agent-native latent communication protocol. Each agent encodes partial observations into compact, task-sufficient tokens in a shared embedding space, allowing efficient and information-preserving collaboration by a central coordinator. We introduce a curriculum training strategy that progressively enforces semantic alignment, evidence summarization, and cross-agent coordination. Extensive experiments on diverse video understanding benchmarks show that MACF consistently outperforms state-of-the-art MLLMs and multi-agent systems under identical budget constraints, demonstrating the effectiveness of our latent collaboration for scalable video understanding.

[123] From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models

Yearim Kim, Sangyu Han, Nojun Kwak

Main category: cs.CV

Abstract: Modern vision models achieve remarkable accuracy, but explaining where evidence arises, what the model encodes, and how internal computations assemble that evidence remains fragmented. We introduce an iERF-centric framework that unifies local, global, and mechanistic interpretability around a single analysis unit: the pointwise feature vector (PFV) paired with its instance-specific Effective Receptive Field (iERF). On the local side, Sharing Ratio Decomposition (SRD) expresses each PFV as a mixture of upstream PFVs via sharing ratios and propagates iERFs to construct class-discriminative saliency maps. SRD yields high-resolution, activation-faithful explanations, is robust to targeted manipulation and noise, and remains activation-agnostic across common nonlinearities. For the global view, we introduce Concept-Anchored Feature Explanation (CAFE), which utilizes the iERF as a semantic label, grounding abstract latent vectors in verifiable pixel-level evidence. With CAFE, we address the challenge of non-localized sparse autoencoder latents, especially in Transformers, where early self-attention mixes distant context. To answer how representations are composed through depth, we propose the Interlayer Concept Graph with Interlayer Concept Attribution (ICAT), which quantifies concept-to-concept influence while isolating layer pairs; an interlayer insertion-deletion protocol identifies Integrated Gradients as the most faithful instantiation. Empirically, across ResNet50, VGG16, and ViTs, our framework outperforms baselines in both fidelity and robustness, successfully interprets dispersed SAE features, and exposes dominant concept routes in correct, misclassified, and adversarial cases. Grounded in iERFs, our approach provides a coherent, evidence-backed map from pixels to concepts to decisions.

[124] Leveraging Vision-Language Models as Weak Annotators in Active Learning

Phuong Ngoc Nguyen, Kaito Shiku, Ryoma Bise, Seiichi Uchida, Shinnosuke Matsuo

Main category: cs.CV

Abstract: Active learning aims to reduce annotation cost by selectively querying informative samples for supervision under a limited labeling budget. In this work, we investigate how vision-language models (VLMs) can be leveraged to further reduce the reliance on costly human annotation within the active learning paradigm. To this end, we find that the reliability of VLMs varies significantly with label granularity in fine-grained recognition tasks: they perform poorly on fine-grained labels but can provide accurate coarse-grained labels. Leveraging this property, we propose an active learning framework that combines fine-grained human annotations with coarse-grained VLM-generated weak labels through instance-wise label assignment. We further model the systematic noise in VLM-generated labels using a small set of trusted full labels. Experiments on CUB200 and FGVC-Aircraft show that the proposed framework consistently outperforms existing active learning methods under the same annotation budget.

[125] High-Speed Vision Improves Zero-Shot Semantic Understanding of Human Actions

Yongpeng Cao, Yuji Yamakawa

Main category: cs.CV

Abstract: Understanding human actions from visual observations is essential for human–robot interaction, particularly when semantic interpretation of unfamiliar or hard-to-annotate actions is required. In scenarios such as rapid and less common activities, collecting sufficient labeled data for supervised learning is challenging, making zero-shot approaches a practical alternative for semantic understanding without task-specific training. While recent advances in large-scale pretrained models enable such zero-shot reasoning, the impact of temporal resolution, especially for rapid and fine-grained motions, remains underexplored. In this study, we investigate how temporal resolution affects zero-shot semantic understanding of high-speed human actions. Using kendo as a representative case of rapid and subtle motion patterns, we propose a training-free pipeline that combines a pre-trained video-language model for semantic representation with large language model-based reasoning for pairwise action comparison. Through controlled experiments across multiple frame rates (120 Hz, 60 Hz, and 30 Hz), we show that higher temporal resolution significantly improves semantic separability in zero-shot settings. We further analyze the role of tracking-based human joint information under both full and partial observation scenarios. Quantitative evaluation using a nearest-class prototype strategy demonstrates that high-speed video provides more stable and interpretable semantic representations for fast actions. These findings highlight the importance of temporal resolution in training-free action recognition and suggest that high-speed perception can enhance semantic understanding capabilities.
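
The nearest-class prototype evaluation mentioned above amounts to assigning each embedding to the closest class-mean vector. A minimal sketch with illustrative class names (the real prototypes would be averaged video-language embeddings):

```python
import numpy as np

def nearest_prototype(query, prototypes):
    """Return the class whose prototype embedding is closest (L2) to `query`.

    `prototypes` maps class name -> mean embedding for that class.
    """
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))
```

Higher temporal resolution improving "semantic separability" then means queries land farther from the wrong prototypes and closer to the right one under exactly this rule.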

[126] GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space

Yonghao Zhao, Yupeng Gao, Jian Yang, Jin Xie, Beibei Wang

Main category: cs.CV

Abstract: Recent advances in Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made it standard practice to reconstruct 3D scenes from multi-view images. Removing objects from such 3D representations is a fundamental editing task that requires complete and seamless inpainting of occluded regions, ensuring consistency in geometry and appearance. Although existing methods have made notable progress in improving inpainting consistency, they often neglect global lighting effects, leading to physically implausible results. Moreover, these methods struggle with view-dependent non-Lambertian surfaces, where appearance varies across viewpoints, leading to unreliable inpainting. In this paper, we present 3D Gaussian Object Removal in the Intrinsic Space (GOR-IS), a novel framework for physically consistent and visually coherent 3D object removal. Our approach decomposes the scene into intrinsic components and explicitly models light transport to maintain the consistency of global lighting effects. Furthermore, we introduce an intrinsic-space inpainting module that operates directly in the material and lighting domains, effectively addressing the challenges posed by non-Lambertian surfaces. Extensive experiments on both synthetic and real-world datasets demonstrate that our framework substantially improves the physical consistency and visual coherence of object removal, outperforming existing methods by 13% in perceptual similarity (LPIPS) and 2dB in peak signal-to-noise ratio (PSNR). Code is publicly available at https://applezyh.github.io/GOR-IS-project-page/
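
For reference, the PSNR metric behind the reported 2 dB gain is the standard log-scale reconstruction measure:

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(img) - np.asarray(ref)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because the scale is logarithmic, a 2 dB improvement corresponds to the mean squared error shrinking by a factor of 10^(2/10) ≈ 1.58.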

[127] End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

Wenda Chu, Bingliang Zhang, Jiaqi Han, Yizhuo Li, Linjie Yang, Yisong Yue, Qiushan Guo

Main category: cs.CV

Abstract: Autoregressive image modeling relies on visual tokenizers to compress images into compact latent representations. We design an end-to-end training pipeline that jointly optimizes reconstruction and generation, enabling direct supervision from generation results to the tokenizer. This contrasts with prior two-stage approaches that train tokenizers and generative models separately. We further investigate leveraging vision foundation models to improve 1D tokenizers for autoregressive modeling. Our autoregressive generative model achieves strong empirical results, including a state-of-the-art FID score of 1.48 without guidance on ImageNet 256x256 generation.

[128] PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation

Nan Lei, Yuan-Ming Li, Ling-An Zeng, Liang Xu, Zhi-Wei Xia, Hui-Wen Huang, Fa-Ting Hong, Wei-Shi Zheng

Main category: cs.CV

Abstract: Despite substantial progress in text-driven 3D human motion synthesis, generating realistic multi-person interaction sequences remains challenging. Notably, body inter-penetration is a pervasive issue from both data acquisition to the generated results, which significantly undermines the realism and usability. Previous generative models either ignored this issue or introduced computationally expensive mesh-level loss functions to alleviate inter-body collisions. In this paper, we propose a general-purpose and computationally efficient optimization strategy named PhysiGen to explicitly integrate collision-aware physical constraints for human-human interaction generation. Specifically, we simplify the high-resolution human body mesh into geometric primitives to greatly reduce the cost of inter-person collision detection. Moreover, we identify the collision regions as the guidance of the optimization directions. PhysiGen is plug-and-play and can be readily integrated into existing human interaction generation models. Extensive cross-dataset and cross-model experiments show that our method can effectively reduce interpenetration and significantly improve visual coherence and physical plausibility compared to the state-of-the-art methods.
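
In its simplest form, the primitive-based collision check underlying this strategy reduces to sphere-sphere penetration depth; this sketch is an assumed simplification for illustration, not the authors' code:

```python
import numpy as np

def sphere_penetration(c1, r1, c2, r2):
    """Penetration depth of two spheres; 0.0 means no collision.

    Replacing mesh-mesh tests with checks like this over a small set of
    body-approximating primitives is what makes collision detection cheap.
    """
    dist = np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    return max(0.0, (r1 + r2) - dist)
```

A positive return value identifies a collision region and gives a direction (toward separating the centers) along which an optimizer can push the interacting bodies apart.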

[129] IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations

Weichen Liu, Yixin Yang, Changsheng Chen, Alex Kot

Main category: cs.CV

Abstract: Suspect face generation remains a technical challenge in crime investigations. Traditional sketch-drawing workflows suffer from low efficiency and quality, while diffusion-based approaches still face intrinsic limitations: conditional ambiguity for text-to-image models and sampling variance for one-shot generation. We propose IdentiFace, a novel diffusion-based framework for identifiable suspect face generation, which addresses these issues through (1) a multi-modal input design to strengthen conditional control, and (2) an iterative generation pipeline enabling identifiable feature adjustment. We additionally contribute a facial identity loss and two task-specific datasets. Comprehensive experiments on synthetic datasets and in real-world scenarios indicate that IdentiFace achieves superior performance over existing methods, especially in terms of identity retrieval, and shows strong potential for practical applications.

[130] Vesselpose: Vessel Graph Reconstruction from Learned Voxel-wise Direction Vectors in 3D Vascular Images

Rajalakshmi Palaniappan, Christoph Karg, Nemesio Navarro-Arambula, Peter Hirsch, Kristin Kraeker, Lisa Mais, Dagmar Kainmueller

Main category: cs.CV

Abstract: Blood vessel segmentation and tracing are essential tasks in many medical imaging applications. Although numerous methods exist, the prevailing segment-then-fix paradigm is fundamentally limited in its suitability for modeling the task of complete and topologically accurate vascular network reconstruction. Here, we propose an approach to extract topologically more accurate vascular graphs from 3D image data, building upon highly successful ideas from the related biomedical tasks of cell segmentation and tracking. Our approach first predicts voxel-wise vessel direction vectors jointly with standard vessel segmentation masks. Second, to extract the vascular graph from these predictions, we introduce a direction-vector-guided extension of the TEASAR algorithm. Our approach achieves state-of-the-art performance on three benchmark datasets, spanning both synthetic and real imagery. We further demonstrate the applicability of our approach to challenging 3D micro-CT scans of rat heart vasculature. Finally, we propose meaningful and interpretable measures of topological error, namely false splits and false merges for graphs. Overall, our approach substantially improves the topological accuracy of reconstructed vascular graphs, being able to separate closely apposed vessel segments and handle multiple vascular trees within a single volume.

[131] Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation

Nadav Z. Cohen, Ofir Abramovich, Ariel Shamir

Main category: cs.CV

Abstract: Text-to-image diffusion models generate images by gradually converting white Gaussian noise into a natural image. White Gaussian noise is well suited for producing diverse outputs from a single text prompt due to its absence of structure. However, this very property limits control over, and predictability of, specific visual attributes, as the noise is not human-interpretable. In this work, we investigate the characteristics of the input noise in diffusion models. We show that, although all frequencies in white Gaussian noise have comparable statistical energy, low-frequency components primarily determine the image's global structure and color composition, while high-frequency components control finer details. Building on this observation, we demonstrate that simple manipulations of the low-frequency noise using low-frequency image priors can effectively condition the generation process to reconstruct these low-frequency visual cues. This allows us to define a simple, training-free method with minimal overhead that steers overall image structure and color, while letting high-frequency components freely emerge as fine details, enabling variability across generated outputs.
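
The low-frequency manipulation the abstract describes can be sketched as a frequency-domain swap; `inject_low_freq` and its `cutoff` parameter are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def inject_low_freq(noise, prior, cutoff=0.1):
    """Replace the low-frequency band of Gaussian noise with an image prior's.

    `noise` and `prior` are 2D arrays of equal shape; `cutoff` is the radial
    frequency (cycles/sample) below which the prior takes over.
    """
    fn, fp = np.fft.fft2(noise), np.fft.fft2(prior)
    h, w = noise.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low = np.sqrt(fy ** 2 + fx ** 2) < cutoff   # low-frequency mask (incl. DC)
    fn[low] = fp[low]                           # swap in the prior's low band
    return np.fft.ifft2(fn).real
```

Because only the band below `cutoff` is overwritten, the result inherits the prior's global structure and color (low frequencies) while the remaining Gaussian high frequencies leave room for diverse fine details.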

[132] Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds

Heejoon Moon, Jongwoo Lee, Jeonggon Kim, Je Hyeong Hong

Main category: cs.CV

Abstract: The emergence of deep neural networks capable of revealing high-fidelity scene details from sparse 3D point clouds has raised significant privacy concerns in visual localization involving private maps. Lifting map points to randomly oriented 3D lines is a well-known approach for obstructing undesired recovery of the scene images, but these lines are vulnerable to a density-based attack that can recover the point cloud geometry by observing the neighborhood statistics of lines. With the aim of nullifying this attack, we present a new privacy-preserving scene representation called the \emph{sphere cloud}, which is constructed by lifting all points to 3D lines crossing the centroid of the map, resembling points on the unit sphere. Since lines are most dense at the map centroid, the sphere cloud misleads the density-based attack algorithm into incorrectly yielding points at the centroid, effectively neutralizing the attack. Nevertheless, this advantage comes at the cost of i) a new type of attack that may directly recover images from this cloud representation and ii) an unresolved translation scale for camera pose estimation. To address these issues, we introduce a simple yet effective cloud construction strategy to thwart the new attack and propose an efficient localization framework that resolves the translation scale by utilizing absolute depth maps acquired from on-device time-of-flight (ToF) sensors. Experimental results on public RGB-D datasets demonstrate that the sphere cloud achieves competitive privacy-preserving ability and localization runtime while not excessively compromising pose estimation accuracy compared to other depth-guided localization methods.

[133] 2D-SuGaR: Surface-Aware Gaussian Splatting for Geometrically Accurate Mesh Reconstruction

Prajwal Gupta C. R., Divyam Sheth, Jinjoo Ha, Mirela Ostrek, Justus Thies

Main category: cs.CV

Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for generating photorealistic renderings of a scene in real-time. However, the volumetric nature of 3DGS limits its ability to accurately capture surface geometry. To address this, 2D Gaussian Splatting (2DGS) was proposed to enable view-consistent and geometrically accurate surface reconstruction from multi-view images. However, 2DGS can be sensitive to the initialization of the Gaussian primitives. Reliance on Structure-from-Motion (SfM) initializations, which can produce poor estimates on challenging image sets, may lead to subpar results. In this work, we enhance 2DGS by incorporating monocular depth and normal priors to improve both geometric accuracy and robustness. We propose a depth-guided initialization strategy for Gaussians and introduce a clustering-based technique for pruning degenerate Gaussians. We evaluate our method on the DTU dataset, where it achieves state-of-the-art results in mesh reconstruction while preserving high-quality novel view synthesis.

[134] Federated Distillation for Whole Slide Image via Gaussian-Mixture Feature Alignment and Curriculum Integration

Luru Jing, Cong Cong, Yanyuan Chen, Yongzhi Cao

Main category: cs.CV

Abstract: Federated learning (FL) offers a promising framework for collaborative digital pathology by enabling model training across institutions. However, real-world deployments face heterogeneity arising from diverse multiple instance learning (MIL) architectures and differing feature extractors across institutions. We propose FedHD, a novel FL framework that performs local Gaussian-mixture feature alignment tailored for whole slide image (WSI) analysis. Instead of exchanging model parameters, each client independently distills semantically rich synthetic feature representations aligned with the distribution of real WSIs. To preserve diagnostic diversity, FedHD adopts a one-to-one distillation strategy, generating a synthetic counterpart for each real slide to avoid over-compression. During federation, a curriculum-based integration strategy progressively incorporates cross-site synthetic features into local training once performance plateaus. Furthermore, an optional interpretation module reconstructs pseudo-patches from synthetic embeddings, enhancing transparency. FedHD is architecture-agnostic, privacy-preserving, and supports personalized yet collaborative training across diverse institutions. Experiments on TCGA-IDH, CAMELYON16, and CAMELYON17 show that FedHD consistently outperforms state-of-the-art federated and distillation baselines.

[135] Jailbreaking Vision-Language Models Through the Visual Modality

Aharon Azulay, Jan Dubiński, Zhuoyun Li, Atharv Mittal, Yossi Gandelsman

Main category: cs.CV

Abstract: The visual modality of vision-language models (VLMs) is an underexplored attack surface for bypassing safety alignment. We introduce four jailbreak attacks exploiting the vision component: (1) encoding harmful instructions as visual symbol sequences with a decoding legend, (2) replacing harmful objects with benign substitutes (e.g., bomb -> banana) then prompting for harmful actions using the substitute term, (3) replacing harmful text in images (e.g., on book covers) with benign words while visual context preserves the original meaning, and (4) visual analogy puzzles whose solution requires inferring a prohibited concept. Evaluating across six frontier VLMs, our visual attacks bypass safety alignment and expose a cross-modality alignment gap: text-based safety training does not automatically generalize to harmful intent conveyed visually. For example, our visual cipher achieves 40.9% attack success on Claude-Haiku-4.5 versus 10.7% for an equivalent textual cipher. To further our insight into the attack mechanism, we present preliminary interpretability and mitigation results. These findings highlight that robust VLM alignment requires treating vision as a first-class target for safety post-training.

[136] Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models

Jiayu Li, Jiaxin Qi, Sheng Zhou, Jiaqiang Huang, Xiansheng Hua

Main category: cs.CV

Abstract: Contrastive vision-language models like CLIP exhibit remarkable zero-shot generalization. However, prompt tuning remains highly sensitive to label noise, as mislabeled samples generate disproportionately large gradients that can overwhelm pre-trained priors. We argue that because CLIP already provides a near-optimal initialization, adaptation should be inherently conservative, particularly against the extreme gradient updates common in noisy settings. To this end, we propose Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free method for intrinsic gradient suppression. By applying a sequential probabilistic normalization, DSPT induces a self-adaptive saturation zone that suppresses gradients from high-error noisy samples while maintaining informative updates. We also provide both theoretical analysis and empirical evidence about how this mechanism achieves adaptive suppression. This design transforms "gradient vanishing", traditionally a training bottleneck, into a principled noise-filtering shield for label-noise prompt tuning. Extensive experiments confirm that this simple, drop-in design achieves state-of-the-art robustness across various noisy benchmarks, outperforming methods with complex architectures and handcrafted hyperparameters.
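
The core mechanism is easy to sketch: chaining two softmaxes bounds the second softmax's inputs to [0, 1], so extreme (likely mislabeled) logits can no longer produce extreme probabilities. A minimal illustration, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def double_softmax(logits):
    """Sequential softmax: the second pass sees inputs bounded in [0, 1],
    so outlier logits saturate instead of dominating the distribution."""
    return softmax(softmax(logits))
```

Because the first softmax compresses logits into [0, 1], the second operates in a flat regime for outliers; the gradient contributed by an extreme logit is correspondingly damped, which is the "saturation zone" the abstract describes.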

[137] Robust Fusion of Object-Level V2X for Learned 3D Object Detection

Lukas Ostendorf, Lennart Reiher, Onn Haran, Lutz Eckstein

Main category: cs.CV

Abstract: Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by line-of-sight and field-of-view constraints. These inherent limitations may cause onboard perception to fail under occlusions or poor visibility conditions. In parallel, cooperative awareness via vehicle-to-everything (V2X) communication is becoming increasingly available, enabling vehicles and infrastructure to share their own state as object-level information that complements onboard perception. In this work, we study how such V2X information can be integrated into 3D object detection and how robust the resulting system is to realistic V2X imperfections. Using the nuScenes dataset, we emulate object-level cooperative awareness messages from ground truth, injecting controlled noise and object dropout to mimic real-world conditions such as latency, localization errors, and low V2X penetration rates. We convert these messages into a dedicated bird’s-eye view (BEV) input and fuse them into a BEVFusion-style detector. Our results demonstrate that while object-level cooperative information can substantially improve detection performance, achieving an NDS of 0.80 under favorable conditions, models trained on idealized data become fragile and over-reliant on V2X. Conversely, our proposed noise-aware training strategy, coupled with explicit confidence encoding, enhances robustness, maintaining performance gains even under severe noise and reduced V2X penetration.
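
The noise-injection idea can be sketched as follows (parameter names and values are assumptions for illustration, not the paper's settings): jitter object positions to mimic localization error and latency, and drop objects to mimic low V2X penetration.

```python
import numpy as np

rng = np.random.default_rng(0)

def emulate_v2x(objects, pos_sigma=0.5, drop_rate=0.3):
    """Toy degradation model for emulated cooperative awareness messages:
    drop a fraction of objects, then add Gaussian noise to the surviving
    object centers (columns 0-1); size columns are left untouched."""
    keep = rng.random(len(objects)) >= drop_rate
    noisy = objects[keep].copy()
    noisy[:, :2] += rng.normal(0.0, pos_sigma, size=(len(noisy), 2))
    return noisy

clean = np.zeros((1000, 4))        # [x, y, length, width] per object
degraded = emulate_v2x(clean)
```

Training on such degraded messages, rather than idealized ones, is what the abstract credits with avoiding over-reliance on V2X.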

[138] Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors

Hao Wei, Yanhui Zhou, Chenyang Ge, Saeed Anwar, Ajmal Mian

Main category: cs.CV

Abstract: Most recent extreme rescaling methods struggle to preserve semantically consistent structures and produce realistic details, due to the severely ill-posed nature of low- to high-resolution mapping under scaling factors of 16× or higher. To alleviate these problems, we propose FaithEIR, a diffusion-based framework for extreme image rescaling. Inspired by singular value decomposition, we develop a learnable reversible transformation that enables invertible downscaling and upscaling in the latent space. To compensate for information loss due to quantization, we propose an adaptive detail prior, a high-frequency dictionary that captures the empirical average of commonly occurring structures in the training data. Finally, we design a lightweight pixel semantic embedder to provide semantic conditioning for the pretrained diffusion model. We present extensive experimental results demonstrating that FaithEIR consistently outperforms state-of-the-art methods, achieving superior reconstruction fidelity and perceptual quality. Our code, model weights, and detailed results are released at https://github.com/cshw2021/FaithEIR.
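
The SVD inspiration can be made concrete with a plain (non-learned) analogue: a truncated SVD yields a compact code plus stored orthogonal factors that invert it exactly whenever the input's rank fits the budget. The paper's transformation is learned; this is only the classical reference point.

```python
import numpy as np

def svd_downscale(x, k):
    # keep the top-k singular directions as the compact code,
    # storing the right factor needed for inversion
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]

def svd_upscale(code, Vt_k):
    return code @ Vt_k

# A rank-2 "image": down- then up-scaling with k=2 is lossless.
x = (np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
     + np.outer([0.0, 1.0, 0.0], [1.0, 1.0, 0.0]))
code, basis = svd_downscale(x, 2)
restored = svd_upscale(code, basis)
```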

[139] BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis

Massimo Rondelli, Francesco Pivi, Maurizio Gabbrielli

Main category: cs.CV

Abstract: Automatic generation of executable Blender code from natural language remains challenging, with state-of-the-art LLMs producing frequent syntactic errors and geometrically inconsistent objects. We present BlenderRAG, a retrieval-augmented generation system that operates on a curated multimodal dataset of 500 expert-validated examples (text, code, image) across 50 object categories. By retrieving semantically similar examples during generation, BlenderRAG improves compilation success rates from 40.8% to 70.0% and semantic normalized alignment from 0.41 to 0.77 (CLIP similarity) across four state-of-the-art LLMs, without requiring fine-tuning or specialized hardware, making it immediately accessible for deployment. The dataset and code will be available at https://github.com/MaxRondelli/BlenderRAG.
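
The retrieval step reduces to nearest-neighbor search in embedding space. A minimal sketch (embedding vectors here are toy stand-ins for whatever encoder BlenderRAG uses):

```python
import numpy as np

def retrieve_examples(query_vec, example_vecs, k=3):
    # cosine similarity between the query embedding and each stored
    # (text, code, image) example; return indices of the top-k matches
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    return np.argsort(-(E @ q))[:k]

# Toy 2-D "embeddings": example 0 matches the query exactly, example 2 nearly.
examples = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
top = retrieve_examples(np.array([1.0, 0.0]), examples, k=2)
```

The retrieved examples would then be prepended to the LLM prompt before code generation.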

[140] UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao

Main category: cs.CV

Abstract: Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unified multimodal framework that leverages VDM priors for versatile video generation. UniVidX formulates pixel-aligned tasks as conditional generation in a shared multimodal space, adapts to modality-specific distributions while preserving the backbone’s native priors, and promotes cross-modal consistency during synthesis. It is built on three key designs. Stochastic Condition Masking (SCM) randomly partitions modalities into clean conditions and noisy targets during training, enabling omni-directional conditional generation instead of fixed mappings. Decoupled Gated LoRA (DGL) introduces per-modality LoRAs that are activated when a modality serves as the generation target, preserving the strong priors of the VDM. Cross-Modal Self-Attention (CMSA) shares keys and values across modalities while keeping modality-specific queries, facilitating information exchange and inter-modal alignment. We instantiate UniVidX in two domains: UniVid-Intrinsic, for RGB videos and intrinsic maps including albedo, irradiance, and normal; and UniVid-Alpha, for blended RGB videos and their constituent RGBA layers. Experiments show that both models achieve performance competitive with state-of-the-art methods across distinct tasks and generalize robustly to in-the-wild scenarios, even when trained on fewer than 1,000 videos. Project page: https://houyuanchen111.github.io/UniVidX.github.io/
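
Stochastic Condition Masking can be sketched at the level of whole modalities (the real method operates on latent tokens; names below are illustrative): each training step draws a random non-empty proper subset of modalities to serve as clean conditions, and the rest become noisy generation targets.

```python
import random

def stochastic_condition_mask(modalities, rng):
    """Toy SCM sketch: randomly partition modalities so that at least
    one is a clean condition and at least one is a noisy target,
    enabling omni-directional conditional generation."""
    n_cond = rng.randint(1, len(modalities) - 1)
    cond = set(rng.sample(modalities, n_cond))
    return {m: ("condition" if m in cond else "target") for m in modalities}

roles = stochastic_condition_mask(
    ["rgb", "albedo", "normal", "irradiance"], random.Random(0))
```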

[141] InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization

Jaeyoung Chung, Suyoung Lee, Kyoung Mu Lee

Main category: cs.CV

Abstract: We present a training-free approach for controllable 3D inpainting based on initial noise optimization. In the structured 3D latent diffusion framework, we observe that the underlying geometric structure is established during the early stages of the diffusion process and exhibits high sensitivity to the initial noise. Such characteristics compromise stability in tasks like inpainting and editing, where the model must ensure strict alignment with the existing context while synthesizing a new structure. In this paper, we introduce a strategy to optimize the initial noise within the structured 3D latent diffusion framework, ensuring high-fidelity 3D inpainting. Specifically, we update the initial noise by leveraging a backpropagation approximation grounded in the rectified flow model, with the spectral parameterization specially designed for robust and efficient structured 3D latent optimization. Experiments demonstrate consistent improvements in contextual consistency and prompt alignment over representative training-free inpainting baselines, establishing initial noise control as an independent dimension for 3D inpainting, orthogonal to conventional sampling trajectory manipulation.

[142] Prediction of Alzheimer’s Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Seowung Leem, Yunchao Yang, Adam J. Woods, Ruogu Fang

Main category: cs.CV

Abstract: The systemic, metabolic, lifestyle factors have established associations with Alzheimer’s Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether colored fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. To determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using UK Biobank CFPs, DL models were trained using 62,876 images from 44,501 unique participants to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic, diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. Performance of DL ranged from AUROC= 0.5654-0.9480 for categorical and R2=-0.0291-0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based score consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. It also aligned with present morphometric variations. Several saliency-based scores differed significantly between incident AD and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring the potential AD vulnerability.

[143] DMDSC: A Dynamic-Margin Deep Simplex Classifier for Open-Set Recognition on Medical Image Datasets

Vishal, Arnav Aditya, Nitin Kumar, Saurabh J. Shigwan

Main category: cs.CV

Abstract: Medical imaging datasets are often characterized by extreme class imbalances, where rare pathologies are significantly underrepresented compared to common conditions. This imbalance poses a dual challenge for Open-Set Recognition (OSR): models must maintain high classification accuracy on known classes while reliably rejecting unknown samples unseen during training in clinical settings. While the recently proposed Deep Simplex Classifier (DSC) and UnCertainty-aware Deep Simplex Classifier (UCDSC) successfully leverage Neural Collapse to ensure maximal inter-class separation, they rely on a uniform margin that does not account for the varying densities of medical classes. In this paper, we propose DMDSC, an enhanced framework featuring a dynamic margin approach. Our approach automatically adapts class-specific margins based on label frequency, enforcing a higher penalty and tighter feature clustering for rare pathologies to counteract the effects of data imbalance. Extensive experiments on diverse medical benchmarks, the BloodMNIST, OCTMNIST, DermaMNIST, and BreaKHis datasets, demonstrate that our framework outperforms state-of-the-art methods.
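
A frequency-aware margin rule of the kind described can be sketched in one function (the exponent and base margin are assumptions, not the paper's schedule): the most frequent class keeps the base margin, and rarer classes get progressively larger margins.

```python
import numpy as np

def dynamic_margins(class_counts, base_margin=0.5, power=0.25):
    """Hypothetical dynamic-margin rule: scale each class margin by the
    inverse of its relative frequency, so rare pathologies are pushed
    into tighter feature clusters."""
    counts = np.asarray(class_counts, dtype=float)
    return base_margin * (counts.max() / counts) ** power

margins = dynamic_margins([1000, 100, 10])  # common, uncommon, rare class
```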

[144] Foundation AI Models for Aerosol Optical Depth Estimation from PACE Satellite Data

Zahid Hassan Tushar, Sanjay Purushotham

Main category: cs.CV

Abstract: Aerosol Optical Depth (AOD) retrieval is essential for Earth observation, supporting applications from air quality monitoring to climate studies. Conventional physics-based AOD retrieval methods formulate the problem as a pixel-wise inversion, relying on radiative transfer modeling, memory-intensive look-up tables, and auxiliary meteorological data. While recent data-driven approaches have shown promise, many fail to exploit the spatial-spectral coherence of hyperspectral imagery, leading to spatially inconsistent and noise-sensitive retrievals. We present the first study exploring Foundation AI models for AOD retrieval and propose ViTCG, a Vision Transformer with Channel-wise Grouping-based spatial regression framework that reduces retrieval bias and error. ViTCG uses hyperspectral top-of-atmosphere radiance as input and jointly models spatial context and spectral information. Validation with PACE radiance observations demonstrates a 62% reduction in mean squared error compared to state-of-the-art foundation models, including Prithvi, and produces spatially coherent AOD fields.

[145] Static and Dynamic Graph Alignment Network for Temporal Video Grounding

Zhanjie Hu, Bolin Zhang, Jianhua Wang, Jianbo Zheng, Chenchen Yan, Takahiro Komamizu, Ichiro Ide, Jiangbo Qian

Main category: cs.CV

Abstract: Temporal Video Grounding (TVG) aims to localize temporal moments in an untrimmed video that semantically correspond to given natural language queries. Recently, Graph Convolutional Networks (GCN) have been widely adopted in TVG to model temporal relations among video clips and enhance contextual reasoning by constructing clip-level graphs. Despite their effectiveness, existing GCN-based TVG methods encounter three critical bottlenecks: 1) Most methods construct graph nodes using either static or dynamic features alone, resulting in incomplete visual representation and overlooking complementary semantics, 2) Most methods construct temporal graphs in a query-agnostic manner, leading to inefficient feature interaction within the temporal graph representation, and 3) Most methods often suffer from a single-granularity semantic matching, while direct training on complex temporal localization task may lead to slow convergence and suboptimal precision. To address these challenges, we propose Static and Dynamic Graph Alignment Network (SDGAN). First, SDGAN jointly exploits static and dynamic visual features to construct two complementary temporal graphs and performs Position-wise Nodes Alignment, enabling more expressive and robust visual representation. Second, SDGAN introduces Query-Clip Contrastive Learning and Adaptive Graph Modeling to explicitly align visual clips with their corresponding textual queries, yielding query-aware visual representations. Third, SDGAN incorporates multi-granularity temporal proposals within Progressive Easy-to-Hard Training Strategy, effectively bridging coarse-grained semantic localization and fine-grained temporal boundary refinement. Extensive experiments on three benchmark datasets demonstrate that SDGAN achieves superior performance across complex TVG scenarios. Codes and datasets are available at https://github.com/ZhanJieHu/SDGAN.

[146] PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

Guandong Li, Mengxia Ye

Main category: cs.CV

Abstract: Image editing instructions are heterogeneous: a color swap, an object insertion, and a physical-action edit all demand different spatial coverage and different reasoning depth, yet existing reasoning-based editors apply a single fixed inference recipe to every instruction. We argue that adaptivity along both the spatial and temporal axes is the missing degree of freedom, and we present PhysEdit, an editing framework built around this principle. PhysEdit introduces two inference-time modules that compose without retraining the backbone. At its core, (1) Complexity-Adaptive Reasoning Depth (CARD) predicts edit complexity directly from the instruction and reference image and allocates the reasoning step count N_r and reasoning-token length r per sample – turning a previously fixed inference schedule into a conditional-computation problem. CARD is supported by (2) a Spatial Reasoning Mask (SRM) that extracts an instruction-conditioned spatial prior from cross-attention to confine reasoning to regions that semantically require it. On the full 737-case ImgEdit Basic-Edit Suite, PhysEdit delivers a 1.18x wall-clock speedup (64.3s vs. 76.1s per sample) over a strong reasoning baseline while slightly improving instruction adherence (CLIP-T 0.2283 vs. 0.2266, +0.7%) and matching identity preservation within noise (CLIP-I 0.8246 vs. 0.8280). The speedup is category-dependent and reaches 1.52x on appearance-level edits, validating CARD’s adaptive allocation as the principal source of efficiency gain. A 30-sample pilot with full ablations isolates the contribution of each module.
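
The CARD allocation can be sketched as a simple mapping from a complexity score to a compute budget (all bounds below are assumed for illustration; the paper predicts complexity from the instruction and image):

```python
def allocate_reasoning(complexity, n_min=1, n_max=8, r_min=16, r_max=256):
    """Toy complexity-adaptive allocation: linearly map a [0, 1]
    complexity score to a reasoning step count N_r and a
    reasoning-token budget r, turning a fixed schedule into
    conditional computation."""
    c = min(max(complexity, 0.0), 1.0)
    n_r = round(n_min + c * (n_max - n_min))
    r = round(r_min + c * (r_max - r_min))
    return n_r, r
```

A simple color swap would score near 0 and get a shallow schedule; a physical-action edit would score near 1 and get the full budget.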

[147] Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels

Tongxu Zhang

Main category: cs.CV

Abstract: Knee osteoarthritis (OA) assessment involves a natural but often underused label hierarchy: a coarse binary OA decision and a fine-grained Kellgren–Lawrence (KL) severity grade. Existing deep learning studies commonly treat these targets as separate classification problems, either reducing OA assessment to disease presence or directly optimizing noisy ordinal KL labels. In this work, we ask whether this clinical hierarchy can serve as a representation-level supervisory prior. Rather than introducing a complex architecture, we use a deliberately simple dual-head model with a shared encoder and two task-specific heads as a probe of hierarchical supervision. We compare single-OA, single-KL, and dual-head training across multiple 3D backbones under the same test protocol. Beyond standard classification metrics, we perform paired statistical comparisons, analyze latent severity-axis geometry, and examine saliency overlap with cartilage regions. The results show that dual-head supervision produces backbone-dependent gains, with clear improvements in KL-related metrics for selected backbones. More importantly, the gains are accompanied by a more ordered coarse-to-fine latent organization and, for responsive backbones, stronger anatomical alignment of saliency with cartilage. These findings suggest that even simple hierarchical dual-head supervision can reshape disease representations under noisy coarse/fine labels, providing a useful inductive bias for OA diagnosis and severity grading.
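
The dual-head probe is deliberately simple: one shared encoder, one binary OA head, one 5-way KL head. A NumPy sketch with illustrative dimensions (the paper uses 3D backbones):

```python
import numpy as np

rng = np.random.default_rng(0)

class DualHeadProbe:
    """Minimal dual-head probe: shared features feed both a coarse
    binary OA head and a fine-grained Kellgren-Lawrence head."""
    def __init__(self, d_in=64, d_feat=32):
        self.w_enc = rng.normal(size=(d_in, d_feat)) / np.sqrt(d_in)
        self.w_oa = rng.normal(size=(d_feat, 2)) / np.sqrt(d_feat)
        self.w_kl = rng.normal(size=(d_feat, 5)) / np.sqrt(d_feat)

    def forward(self, x):
        h = np.maximum(x @ self.w_enc, 0.0)    # shared ReLU features
        return h @ self.w_oa, h @ self.w_kl    # coarse and fine logits

model = DualHeadProbe()
oa_logits, kl_logits = model.forward(np.ones((4, 64)))
```

Training would sum a loss on each head so the hierarchy supervises the shared representation.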

[148] Unpaired Image Deraining Using Reward-Guided Self-Reinforcement Strategy

Yinghao Chen, Yeying Jin, Xiang Chen, Yanyan Wei, Ziyang Yan, Yaowen Fu

Main category: cs.CV

Abstract: Unsupervised deraining has attracted attention for its ability to learn the real-world distribution of rain without paired supervision. However, the lack of strong constraints makes it difficult for the network to converge, especially with the complex diversity of rain degradation. A key motivation is that high-quality deraining results occasionally emerge during training, which can be leveraged to guide the optimization process. To overcome these challenges, we introduce RGSUD (Reward-Guided Self-Reinforcement Unsupervised Image Deraining), comprising two key stages: reward recycling and self-reinforcement (SR) training. In the former stage, we propose an Image Quality Assessment (IQA)-based dynamic reward recycling mechanism that selects optimal derained outputs during training and continuously collects high-quality deraining images. In the latter stage, we incorporate these rewards into the model’s optimization process, constraining the optimization space and improving alignment between derained outputs and clean images. By leveraging IQA-based self-reinforced loss and dynamically updated rewards, we enhance the quality of synthesized pseudo-paired data and stabilize the optimization. Extensive experiments demonstrate that our method achieves SOTA performance across multiple datasets, including paired synthetic, paired real, and unpaired real images, outperforming existing unsupervised deraining approaches in both subjective and objective IQA metrics. Additionally, we show that the self-reinforcement strategy is adaptable to other unsupervised deraining methods and our deraining framework demonstrates strong generalization across existing supervised deraining networks.
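
The reward-recycling stage amounts to maintaining a best-so-far buffer keyed by an IQA score. A stdlib sketch (interface and capacity are assumptions for illustration):

```python
import heapq

class RewardBuffer:
    """Toy reward-recycling buffer: keep the top-capacity derained
    outputs seen during training, ranked by a no-reference IQA score,
    for reuse as pseudo-paired targets."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self._heap = []    # min-heap of (score, seq, output)
        self._seq = 0      # tiebreaker so outputs are never compared

    def offer(self, score, output):
        item = (score, self._seq, output)
        self._seq += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict current worst

    def best(self):
        return [out for _, _, out in sorted(self._heap, reverse=True)]

buf = RewardBuffer(capacity=2)
for score, img in [(0.1, "a"), (0.9, "b"), (0.5, "c"), (0.8, "d")]:
    buf.offer(score, img)
```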

[149] Exploring the Limits of End-to-End Feature-Affinity Propagation for Single-Point Supervised Infrared Small Target Detection

Qiancheng Zhou, Wenhua Zhang

Main category: cs.CV

Abstract: Single-point supervised infrared small target detection (IRSTD) drastically reduces dense annotation costs. Current state-of-the-art (SOTA) methods achieve high precision by recovering mask supervision through explicit, offline pseudo-label construction, such as multi-stage active learning and physics-driven mask generation. In this paper, we study a minimalist alternative: generating point-to-mask supervision online through in-batch, point-anchored feature-affinity propagation. We instantiate this paradigm as GSACP, an end-to-end testbed that directly supervises the detector using hard-margin feature affinity gated by local image priors, entirely eliminating external label-evolution loops. This compact design, however, exposes an optimization bottleneck. Because the affinity target is generated from the same feature representation being optimized, training forms a self-referential loop. We theoretically formalize this as Self-Referential Propagation Drift, a representation-supervision entanglement that can sharpen true boundaries or distort the feature space to satisfy its own targets. To systematically isolate these failure modes, we apply a protocolized single-variable ablation procedure spanning local EMA teacher decoupling, hard-background contrastive separation, and adaptive support geometry. On the SIRST3 dataset, GSACP-Final establishes a new ultra-low false-alarm operating regime, achieving a highly competitive 0.6674 mIoU while demonstrating a 38% relative reduction in false-alarm rate (Fa) compared with PAL. By systematically deconstructing the end-to-end paradigm, we map its performance boundaries and show that in-batch feature propagation provides a compact alternative for deployment scenarios where false-alarm suppression is paramount.

[150] Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

Mohammad Aamir Sohail, Gabriela Pinheiro, Yasemin Poyraz Kocak, Batuhan Hangun, Emre Camkerten, Simge Yigit, Hafize Asude Ertan

Main category: cs.CV

Abstract: Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. Corners are locations where gray-level intensity changes abruptly in multiple directions and are widely used in feature extraction, object tracking, and 3D modeling. In this study, we present a quantum implementation of Sobel-based edge detection and Harris-style corner detection. Two quantum image encoding methods - Flexible Representation of Quantum Images (FRQI) and Quantum Probability Image Encoding (QPIE) - are used to encode the input data and are comparatively analyzed. The proposed approach introduces a quantum gradient computation scheme based on lag-2 differences, enabling the evaluation of gradient-like features in superposition. To improve detection quality and reduce false positives, a classical post-processing step is applied to candidate corner points identified by the quantum circuit. Results show that the proposed quantum circuits produce outputs consistent with classical Sobel and Harris operators. Furthermore, the QPIE-based configuration yields more stable and coherent results than FRQI, especially under limited measurement shots. While gradient computation can be performed efficiently at the circuit level, the overall cost remains dominated by state preparation, measurement, and classical post-processing. All experiments are conducted under noiseless simulation, and performance on NISQ hardware may be affected by noise and measurement limitations. Therefore, this work demonstrates a functional and scalable quantum realization of classical edge and corner detection methods rather than an end-to-end speedup.
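
The lag-2 difference the quantum circuit evaluates in superposition has a direct classical reference: g[i] = f[i+1] − f[i−1] along each axis, the central-difference core of the Sobel kernel. A NumPy sketch of that classical baseline:

```python
import numpy as np

def lag2_gradient(img):
    """Classical lag-2 (central) difference gradient magnitude:
    gx[r, c] = img[r, c+1] - img[r, c-1], analogously for gy;
    borders are left at zero."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

# A vertical step edge: the gradient magnitude peaks at the boundary.
step = np.zeros((5, 5))
step[:, 3:] = 1.0
mag = lag2_gradient(step)
```

The quantum version encodes the image (FRQI or QPIE), evaluates these differences at the circuit level, and recovers candidates by measurement.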

[151] Modeling Subjective Urban Perception with Human Gaze

Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

Main category: cs.CV

Abstract: Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with implicit richer visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.

[152] Map2World: Segment Map Conditioned Text to 3D World Generation

Jaeyoung Chung, Suyoung Lee, Jianfeng Xiang, Jiaolong Yang, Kyoung Mu Lee

Main category: cs.CV

Abstract: 3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, these methods are constrained by grid layouts and suffer from inconsistencies in object scale throughout the entire world. In this work, we introduce a novel framework, Map2World, that first enables 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global-scale consistency and flexibility across expansive environments. To further enhance the quality, we propose a detail enhancer network that generates fine details of the world. The detail enhancer enables the addition of fine-grained details without compromising overall scene coherence by incorporating global structure information. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains, even under limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user-controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.

[153] Make Your LVLM KV Cache More Lightweight

Xihao Chen, Yangyang Guo, Roger Zimmermann

Main category: cs.CV

Abstract: Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.
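
A crude stand-in for the prompt-aware compression (the merge rule and names below are assumptions, not LightKV's actual message passing): score vision tokens by similarity to the text prompt, keep the top fraction, and fold each dropped token into its most similar kept token. The 0.55 keep ratio mirrors the abstract's "55% of the original vision tokens".

```python
import numpy as np

def compress_vision_tokens(vision, text_query, keep_ratio=0.55):
    """Toy prompt-guided token compression: keep the tokens most
    relevant to the text query and aggregate the rest into them,
    shrinking the vision-token KV cache."""
    scores = vision @ text_query
    k = max(1, int(len(vision) * keep_ratio))
    keep = np.argsort(-scores)[:k]
    kept = vision[keep].copy()
    for i in np.setdiff1d(np.arange(len(vision)), keep):
        j = np.argmax(kept @ vision[i])        # most similar kept token
        kept[j] = 0.5 * (kept[j] + vision[i])  # aggregate its message
    return kept

rng = np.random.default_rng(0)
tokens = rng.normal(size=(20, 8))              # 20 vision tokens, dim 8
compressed = compress_vision_tokens(tokens, rng.normal(size=8))
```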

[154] GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

Main category: cs.CV

Abstract: Gaze estimation methods commonly use facial appearance to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods: late fusion of image features, lack of factor-aware conditioning, and impractical capacity scaling. To address these challenges, we propose Globally-conditioned Multi-scale Gaze estimation (GMGaze), which leverages a multi-scale transformer architecture. Specifically, the model first introduces semantic prototype conditioning, which modulates the CLIP global image embedding using four learned prototype banks (i.e., illumination, background, head pose, and appearance) to generate two complementary context-biased global tokens. These tokens, along with the CLIP patch and CNN tokens, are fused at the first layer. This early unified fusion prevents the information loss common in late-stage merging. Finally, each token passes through sparse Mixture-of-Experts modules, providing conditional computational capacity without uniformly increasing dense parameters. For cross-domain adaptation, we incorporate an adversarial domain adaptation technique with a feature separation loss that encourages the two global tokens to remain de-correlated. Experiments on four public benchmarks (MPIIFaceGaze, EYEDIAP, Gaze360, and ETH-XGaze) show that GMGaze achieves mean angular errors of 2.49°, 3.22°, 10.16°, and 1.44°, respectively, outperforming previous baselines in all within-domain settings. In cross-domain evaluations, it provides state-of-the-art (SOTA) results on two standard transfer routes.

[155] Let ViT Speak: Generative Language-Image Pre-training

Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLIP trains a ViT to predict language tokens directly from visual tokens using a standard language modeling objective, without contrastive batch construction or an additional text decoder. This design offers three key advantages: (1) Simplicity: a single transformer jointly models visual and textual tokens; (2) Scalability: it scales effectively with both data and model size; and (3) Performance: it achieves competitive or superior results across diverse multimodal benchmarks. Trained on 8B samples from Recap-DataComp-1B, GenLIP matches or surpasses strong baselines despite using substantially less pretraining data. After continued pretraining on multi-resolution images at native aspect ratios, GenLIP further improves on detail-sensitive tasks such as OCR and chart understanding, making it a strong foundation for vision encoders in MLLMs.
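The "predict language tokens from visual tokens with a standard LM objective" idea can be sketched as a toy next-token cross-entropy over a caption conditioned on prepended visual tokens. Everything here is an assumption for illustration: mean pooling stands in for causal self-attention, and the vocabulary, dimensions, and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 100, 32
text_emb = rng.normal(size=(vocab, d)) * 0.1   # toy caption-token embeddings
W_out = rng.normal(size=(vocab, d)) * 0.1      # toy output head

def caption_lm_loss(visual_tokens, text_ids):
    """Score a caption with next-token cross-entropy, conditioning on the
    image by prepending its visual tokens to the sequence."""
    seq = visual_tokens
    losses = []
    for target in text_ids:
        h = seq.mean(axis=0)                      # context summary (stands in for attention)
        logits = W_out @ h
        logz = logits.max() + np.log(np.exp(logits - logits.max()).sum())
        losses.append(logz - logits[target])      # -log p(target | visual + text prefix)
        seq = np.vstack([seq, text_emb[target]])  # teacher forcing
    return float(np.mean(losses))

visual_tokens = rng.normal(size=(6, d))           # e.g. 6 ViT patch tokens
loss = caption_lm_loss(visual_tokens, [3, 17, 42])
```

The point of the sketch is that no contrastive batch construction or separate text decoder appears: one sequence model and one cross-entropy loss supervise the vision tokens.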

[156] Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a “Visual Signal Dilution” phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to ensure sustained, on-demand visual perception. Integrated as a parallel branch alongside the Feed-Forward Network (FFN) in LVLMs, PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for precise visual perception, thereby structurally mitigating the signal suppression inherent to deep generation. Extensive experiments on Qwen3-VL models demonstrate that PVM brings notable improvements with negligible parameter overhead, delivering consistent average accuracy gains across both 4B and 8B scales, particularly in complex reasoning tasks that demand persistent visual perception. Furthermore, in-depth analysis reveals that PVM can resist length-induced signal decay and accelerate internal prediction convergence.
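The "parallel branch alongside the FFN" structure can be sketched as below. This is a hedged numpy illustration, not PVM's implementation: the query projection, the additive combination with the FFN output, and the memory size are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 48  # hidden width, illustrative

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ffn(h, W1, W2):
    return W2 @ np.maximum(W1 @ h, 0.0)      # standard two-layer FFN with ReLU

def ffn_with_pvm(h, W1, W2, W_q, visual_mem):
    """FFN output plus a parallel retrieval branch: the hidden state queries a
    persistent bank of visual embeddings, so a token can read visual signal
    regardless of how far it sits from the image in the generated sequence."""
    attn = softmax(visual_mem @ (W_q @ h))   # relevance of each stored visual embedding
    retrieved = attn @ visual_mem            # on-demand visual read-out
    return ffn(h, W1, W2) + retrieved

W1 = rng.normal(size=(4 * d, d)) * 0.1
W2 = rng.normal(size=(d, 4 * d)) * 0.1
W_q = rng.normal(size=(d, d)) * 0.1
visual_mem = rng.normal(size=(16, d))        # hypothetical persistent visual memory
h = rng.normal(size=d)
out = ffn_with_pvm(h, W1, W2, W_q, visual_mem)
```

Because the retrieval weights depend only on the current hidden state and the stored bank, the pathway is distance-agnostic: the attention decay over long generated text described in the abstract does not dilute it.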

[157] Posterior Augmented Flow Matching

George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, mapping diverse inputs to overly similar outputs, failing to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate. Finally, we show that PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2), different architectures (SiT and MMDiT), and in both class and text conditioned benchmarks (ImageNet and CC12M), with a negligible increase in the compute overhead. Code: https://github.com/gstoica27/PAFM.git.
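The posterior-weighted mixture target can be sketched for the common linear path x_t = (1-t) x0 + t x1 with x0 ~ N(0, I). Under that assumption, given an endpoint x1 the intermediate satisfies x_t ~ N(t x1, (1-t)^2 I), which gives a closed-form surrogate likelihood. The Gaussian likelihood, uniform prior, and candidate set below are illustrative assumptions, not the paper's importance sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

def pafm_target(x_t, t, candidates, log_prior):
    """Mixture velocity target: weight each candidate endpoint x1_k by
    p(x_t | x1_k) * p(x1_k), then average the implied per-candidate
    velocities (x1_k - x_t) / (1 - t)."""
    log_lik = -((x_t - t * candidates) ** 2).sum(axis=1) / (2 * (1 - t) ** 2)
    log_w = log_lik + log_prior
    w = np.exp(log_w - log_w.max())         # stable normalized importance weights
    w /= w.sum()
    velocities = (candidates - x_t) / (1 - t)
    return w @ velocities

d, K = 16, 8
candidates = rng.normal(size=(K, d))        # hypothetical plausible endpoints
log_prior = np.zeros(K)                     # uniform prior for the sketch
t = 0.3
x0 = rng.normal(size=d)
x_t = (1 - t) * x0 + t * candidates[0]      # intermediate built from candidate 0
v = pafm_target(x_t, t, candidates, log_prior)
```

Averaging over many plausible continuations per intermediate is what reduces the gradient variance of single-target flow matching, per the abstract's unbiasedness claim.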

[158] Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers

Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Ahmet Enis Cetin, Ulas Bagci

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Self-attention is central to the success of Transformer architectures; however, learning the query, key, and value projections from random initialization remains challenging and computationally expensive. In this paper, we propose two complementary methods that leverage the Discrete Cosine Transform (DCT) to enhance the efficiency and performance of Vision Transformers. First, we address the initialization problem by introducing a simple yet effective DCT-based initialization strategy for self-attention, where projection weights are initialized using DCT coefficients. This structure-preserving approach consistently improves classification accuracy on the CIFAR-10 and ImageNet-1K benchmarks. Second, we propose a DCT-based attention compression technique that exploits the decorrelation properties of the frequency domain. By observing that high-frequency DCT coefficients typically correspond to noise, we truncate high-frequency components of the input patches, thereby reducing the dimensionality of the query, key, and value projections without sacrificing accuracy. Experiments on Swin Transformer models demonstrate that the proposed compression method achieves a substantial reduction in computational overhead while maintaining comparable performance.
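Both ideas, DCT-based weight initialization and frequency-domain truncation, can be sketched with the standard orthonormal DCT-II matrix. The choice of how many coefficients to keep and the use of the basis directly as initial Q/K/V weights are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    M[0] *= 1.0 / np.sqrt(2.0)   # DC row rescaled so that M @ M.T = I
    return M

d = 32
D = dct_matrix(d)

# (1) Structure-preserving initialization: start a projection at the DCT basis
# instead of random weights.
W_q = D.copy()

# (2) Frequency-domain compression: transform a patch token and drop the
# high-frequency half of the coefficients (treated as mostly noise), reducing
# the dimensionality fed to the Q/K/V projections.
keep = d // 2
x = np.random.default_rng(3).normal(size=d)   # one patch token
x_compressed = D[:keep] @ x                   # low-frequency coefficients only
x_recon = D[:keep].T @ x_compressed           # approximate inverse transform
```

Because the basis is orthonormal, the truncation is simply a projection onto the low-frequency subspace, which is what lets the compressed tokens retain most of the signal energy.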

[159] (title unavailable)

Chingis Oinar, Miao Cao, Shanshan Fu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2408.11349: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2408.11349&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[160] PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Shangkun Sun, Ruyang Liu, Haoran Tang, Yixiao Ge, Haibo Lu, Wei Gao, Jiankun Yang, Chen Li

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2411.02327: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2411.02327&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[161] Copula-enhanced Vision Transformer for high myopia diagnosis through OU UWF fundus images

Chong Zhong, Yunhao Liu, Yang Li, Xiang Fu, Jin Yang, Danjuan Yang, Meiyan Li, Jinfeng Xu, Aiyi Liu, Alan H. Welsh, Xingtao Zhou, Bo Fu, Catherine C. Liu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2501.06540: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2501.06540&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[162] Diffusion Models are Secretly Zero-Shot 3DGS Harmonizers

Vsevolod Skorokhodov, Nikita Durasov, Pascal Fua

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2503.06740: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2503.06740&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[163] Color Conditional Generation with Sliced Wasserstein Guidance

Alexander Lobashev, Maria Larchenko, Dmitry Guskov

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2503.19034: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2503.19034&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[164] Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark

Udayanga G.W.K.N. Gamage, Xuanni Huo, Luca Zanatta, T Delbruck, Cesar Cadena, Matteo Fumagalli, Silvia Tolu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2504.05679: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2504.05679&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[165] APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds

Yuan Gao, Shaobo Xia, Sheng Nie, Cheng Wang, Xiaohuan Xi, Bisheng Yang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2505.09971: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2505.09971&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[166] Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2505.18875: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2505.18875&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[167] Thought Graph Traversal for Test-time Scaling in Chest X-ray VLLMs

Yue Yao, Zelin Wen, Yan Tong, Xinyu Tian, Xuqing Li, Xiao Ma, Dongliang Xu, Tom Gedeon

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2506.11989: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2506.11989&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[168] How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2507.01955: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2507.01955&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[169] Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement

Xiaoran Sun, Liyan Wang, Yeying Jin, Kin-man Lam, Zhixun Su, Yang Yang, Jinshan Pan, Cong Wang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2507.18064: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2507.18064&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[170] Smoothing Slot Attention Iterations and Recurrences

Rongzhen Zhao, Wenyan Yang, Juho Kannala, Joni Pajarinen

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2508.05417: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2508.05417&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[171] A Deep Learning-Based CCTV System for Automatic Smoking Detection in Fire Exit Zones

Sami Sadat, Mohammad Irtiza Hossain, Junaid Ahmed Sifat, Suhail Haque Rafi, Md. Waseq Alauddin Alvi, Md. Khalilur Rhaman

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2508.11696: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2508.11696&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[172] Quantization Robustness to Input Degradations for Object Detection

Toghrul Karimov, Hassan Imani, Allan Kazakov

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2508.19600: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2508.19600&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[173] CollaFuse: Collaborative Diffusion Models

Simeon Allmendinger, Domenique Zipperling, Lukas Struppek, Niklas Kühl

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2406.14429: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2406.14429&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[174] Deepfakes: we need to re-think the concept of “real” images

Janis Keuper, Margret Keuper

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2509.21864: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2509.21864&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[175] FreeRet: MLLMs as Training-Free Retrievers

Yuhan Zhu, Xiangyu Zeng, Chenting Wang, Xinhao Li, Chunxu Liu, Yicheng Xu, Ziang Yan, Yi Wang, Limin Wang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2509.24621: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2509.24621&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[176] Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards

Chenhui Xu, Fuxun Yu, Michael J. Bianco, Jacob Kovarskiy, Raphael Tang, Qi Zhang, Zirui Xu, Will LeVine, Brandon Dubbs, Heming Liao, Cassandra Burgess, Suvam Bag, Jay Patravali, Rupanjali Kukal, Mikael Figueroa, Rishi Madhok, Nikolaos Karianakis, Jinjun Xiong

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.00072: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.00072&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[177] ClustViT: Clustering-based Token Merging for Semantic Segmentation

Fabio Montello, Ronja Güldenring, Lazaros Nalpantidis

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.01948: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.01948&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[178] Instance-Aware Pseudo-Labeling and Class-Focused Contrastive Learning for Weakly Supervised Domain Adaptive Segmentation of Electron Microscopy

Shan Xiong, Jiabao Chen, Ye Wang, Jialin Peng

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.16450: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.16450&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[179] Residual Diffusion Bridge Model for Image Restoration

Hebaixu Wang, Jing Zhang, Haoyang Chen, Haonan Guo, Di Wang, Jiayi Ma, Bo Du

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.23116: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.23116&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[180] The Determinism of Randomness: Latent Space Degeneracy in Diffusion Model

Song Yan, Chenfeng Wang, Wei Zhai, Xinliang Bi, Jian Yang, Yancheng Cai, Yusen Zhang, Yunwei Lan, Tao Zhang, GuanYe Xiong, Min Li, Zheng-Jun Zha

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2511.07756: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2511.07756&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[181] LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping

Chenying Liu, Wei Huang, Xiao Xiang Zhu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2511.08156: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2511.08156&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[182] High Dynamic Range 3D Gaussian Splatting via Luminance-Chromaticity Decomposition

Kaixuan Zhang, Minxian Li, Mingwu Ren, Jiankang Deng, Xiatian Zhu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2511.12895: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2511.12895&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[183] Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.01116: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.01116&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[184] A Novel Patch-Based TDA Approach for Computed Tomography Imaging

Dashti A. Ali, Aras T. Asaad, Jacob J. Peoples, Ahmad Bashir Barekzai, Camila Vilela, Hala Khasawneh, Jayasree Chakraborty, João Miranda, Mohammad Hamghalam, Natalie Gangai, Natally Horvat, Richard K. G. Do, Alice C. Wei, Amber L. Simpson

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.12108: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.12108&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[185] Adapting MLLMs for Nuanced Video Retrieval

Piyush Bagad, Andrew Zisserman

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.13511: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.13511&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[186] Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

Jiawei Ge, Jiuxin Cao, Xinyi Li, Xuelin Zhu, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.20260: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.20260&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[187] It’s Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2601.00090: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2601.00090&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[188] VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image

Teng-Fang Hsiao, Bo-Kai Ruan, Yu-Lun Liu, Hong-Han Shuai

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.04349: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.04349&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[189] Thinking with Geometry: Active Geometry Integration for Spatial Reasoning

Haoyuan Li, Qihang Cao, Tao Tang, Kun Xiang, Zihan Guo, Jianhua Han, Hang Xu, JiaWang Bian, Xiaodan Liang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.06037: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.06037&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[190] WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

Aydin Ayanzadeh, Prakhar Dixit, Sadia Kamal, Milton Halem

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.13305: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.13305&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[191] The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, Haiyi Zhu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2601.09896: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2601.09896&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[192] ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

A. Said Gurbuz, Sunghwan Hong, Ahmed Nassar, Marc Pollefeys, Peter Staar

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.14276: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.14276&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[193] Driving with A Thousand Faces: A Benchmark for Closed-Loop Personalized End-to-End Autonomous Driving

Xiaoru Dong, Ruiqin Li, Xiao Han, Zhenxuan Wu, Jiamin Wang, Jian Chen, Qi Jiang, SM Yiu, Xinge Zhu, Yuexin Ma

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.18757: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.18757&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[194] Prefer-DAS: Learning from Local Preferences and Sparse Prompts for Domain Adaptive Segmentation of Electron Microscopy

Jiabao Chen, Shan Xiong, Jialin Peng

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.19423: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.19423&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[195] Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation

Hongbo Zheng, Afshin Bozorgpour, Dorit Merhof, Minjia Zhang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.02727: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.02727&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[196] VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Ruoliu Yang, Chu Wu, Caifeng Shan, Ran He, Chaoyou Fu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.22285: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.22285&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[197] Learning Locally, Revising Globally: Global Reviser for Federated Learning with Noisy Labels

Yuxin Tian, Mouxing Yang, Yuhao Zhou, Jian Wang, Qing Ye, Tongliang Liu, Gang Niu, Jiancheng Lv

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2412.00452: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2412.00452&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[198] Adaptive Dual-Teacher Distillation with Subnetwork Rectification for Bridging Semantic Gaps in Black-Box Domain Adaptation

Zhe Zhang, Jing Li, Wanli Xue, Xu Cheng, Jianhua Zhang, Qinghua Hu, Shengyong Chen

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.22908: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.22908&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[199] Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

Felix Wimbauer, Fabian Manhardt, Michael Oechsle, Nikolai Kalischek, Christian Rupprecht, Daniel Cremers, Federico Tombari

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.28980: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.28980&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[200] Are Pretrained Image Matchers Good Enough for SAR-Optical Satellite Registration?

Isaac Corley, Alex Stoken, Gabriele Berton

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2604.10217: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2604.10217&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[201] Diffusion Models for Solving Inverse Problems via Posterior Sampling with Piecewise Guidance

Saeed Mohseni-Sehdeh, Walid Saad, Kei Sakaguchi, Tao Yu

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2507.18654 was rate-limited (HTTP 429), so no summary could be generated.

[202] At FullTilt: Real-Time Open-Set 3D Macromolecule Detection Directly from Tilted 2D Projections

Ming-Yang Ho, Alberto Bartesaghi

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.10766 was rate-limited (HTTP 429), so no summary could be generated.

[203] Find, Fix, Reason: Context Repair for Video Reasoning

Haojian Huang, Chuanyu Qin, Yinchuan Li, Yingcong Chen

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.16243 was rate-limited (HTTP 429), so no summary could be generated.

[204] Semantic Foam: Unifying Spatial and Semantic Scene Decomposition

Amr Sharafeldin, Shrisudhan Govindarajan, Thomas Walker, Aryan Mikaeili, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.26262 was rate-limited (HTTP 429), so no summary could be generated.

[205] DiffMI: Breaking Face Recognition Privacy via Diffusion-Driven Training-Free Model Inversion

Hanrui Wang, Shuo Wang, Chun-Shien Lu, Isao Echizen

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2504.18015 was rate-limited (HTTP 429), so no summary could be generated.

[206] Certifiable Factor Graph Optimization

Zhexin Xu, Nikolas R. Sanderson, Hanna Jiamei Zhang, David M. Rosen

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2603.01267 was rate-limited (HTTP 429), so no summary could be generated.

[207] ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Stephan Xie, Ben Cohen, Mononito Goswami, Junhong Shen, Emaad Khwaja, Chenghao Liu, David Asker, Othmane Abou-Amal, Ameet Talwalkar

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.21199 was rate-limited (HTTP 429), so no summary could be generated.

[208] Phase-Separated Complex Hilbert PCA on Markerless 3D Pose Estimation Data: A Global Phase Network and Its Extension to a Continuous Field on the Body Surface

Hiromitsu Goto, Tao Tao, Zheng-Lin Chia

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.24415 was rate-limited (HTTP 429), so no summary could be generated.

cs.AI

[209] TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

Rong Lu

Main category: cs.AI

TL;DR: An agentic LLM system that turns heterogeneous Volve wellsite data into evidence-grounded drilling analytics through twelve domain-specialized tools; tool design, not model scale, drives analytical quality.

DetailsMotivation: Drilling operations produce heterogeneous data (daily drilling reports, WITSML real-time objects, production records) that is difficult to query for evidence-based analysis.

Method: A dual-store architecture (DuckDB for structured queries, ChromaDB for semantic search) with twelve domain tools orchestrated by an LLM via iterative function calling; the Evidence Grounding Score (EGS) serves as a grounding-compliance proxy.

Result: Parses all 1,759 DDR XML files with zero errors, reconciles three incompatible well naming conventions, and is backed by 95 automated tests plus a 130-question stress taxonomy.

Conclusion: Domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.

Abstract: We present TADI (Tool-Augmented Drilling Intelligence), an agentic AI system that transforms drilling operational data into evidence-based analytical intelligence. Applied to the Equinor Volve Field dataset, TADI integrates 1,759 daily drilling reports, selected WITSML real-time objects, 15,634 production records, formation tops, and perforations into a dual-store architecture: DuckDB for structured queries over 12 tables with 65,447 rows, and ChromaDB for semantic search over 36,709 embedded documents. Twelve domain-specialized tools, orchestrated by a large language model via iterative function calling, support multi-step evidence gathering that cross-references structured drilling measurements with daily report narratives. The system parses all 1,759 DDR XML files with zero errors, handles three incompatible well naming conventions, and is backed by 95 automated tests plus a 130-question stress-question taxonomy spanning six operational categories. We formalize the agent’s behavior as a sequential tool-selection problem and propose the Evidence Grounding Score (EGS) as a simple grounding-compliance proxy based on measurements, attributed DDR quotations, and required answer sections. The complete 6,084-line, framework-free implementation is reproducible given the public Volve download and an API key, and the case studies and qualitative ablation analysis suggest that domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.
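The abstract describes the Evidence Grounding Score only as a proxy over measurements, attributed DDR quotations, and required answer sections. As a toy illustration of what such a compliance check could look like (the regexes, units, section names, and equal weighting below are invented for this sketch, not taken from TADI):

```python
import re

def evidence_grounding_score(answer: str, required_sections: list) -> float:
    """Toy grounding-compliance proxy: fraction of three checks satisfied.

    All three checks are hypothetical stand-ins: the answer cites a numeric
    measurement with a unit, quotes an attributed daily drilling report (DDR)
    line, and contains every required section label.
    """
    has_measurement = bool(re.search(r"\d+(\.\d+)?\s*(m|bar|psi|t)\b", answer))
    has_ddr_quote = bool(re.search(r'"[^"]+"\s*\(DDR', answer))
    sections_ok = all(s.lower() in answer.lower() for s in required_sections)
    return (has_measurement + has_ddr_quote + sections_ok) / 3.0
```

A fully grounded answer such as `'Summary: ROP dropped at 3120 m. Evidence: "losses observed" (DDR 42).'` would score 1.0 under this sketch, while free-floating prose scores 0.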

[210] AgentReputation: A Decentralized Agentic AI Reputation Framework

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li

Main category: cs.AI

TL;DR: A decentralized three-layer reputation framework for agentic AI marketplaces, built on verification regimes and context-conditioned reputation cards.

DetailsMotivation: Existing reputation mechanisms fail in decentralized agent marketplaces: agents can game evaluation procedures, demonstrated competence does not transfer across heterogeneous task contexts, and verification rigor varies widely.

Method: Separate layers for task execution, reputation services, and tamper-proof persistence; explicit verification regimes linked to reputation metadata; context-conditioned reputation cards; and a decision-facing policy engine for resource allocation, access control, and adaptive verification escalation.

Result: A framework design plus a research agenda covering verification ontologies, verification-strength quantification, privacy-preserving evidence, cold-start bootstrapping, and adversarial defenses.

Conclusion: Separating execution, reputation, and persistence lets each layer evolve independently while preventing reputation conflation across domains and task types.

Abstract: Decentralized, agentic AI marketplaces are rapidly emerging to support software engineering tasks such as debugging, patch generation, and security auditing, often operating without centralized oversight. However, existing reputation mechanisms fail in this setting for three fundamental reasons: agents can strategically optimize against evaluation procedures; demonstrated competence does not reliably transfer across heterogeneous task contexts; and verification rigor varies widely, from lightweight automated checks to costly expert review. Current approaches to reputation, drawing on federated learning, blockchain-based AI platforms, and large language model safety research, are unable to address these challenges in combination. We therefore propose AgentReputation, a decentralized, three-layer reputation framework for agentic AI systems. The framework separates task execution, reputation services, and tamper-proof persistence to both leverage their respective strengths and enable independent evolution. It introduces explicit verification regimes linked to agent reputation metadata, as well as context-conditioned reputation cards that prevent reputation conflation across domains and task types. In addition, AgentReputation provides a decision-facing policy engine that supports resource allocation, access control, and adaptive verification escalation based on risk and uncertainty. Building on this framework, we outline several future research directions, including the development of verification ontologies, methods for quantifying verification strength, privacy-preserving evidence mechanisms, cold-start reputation bootstrapping, and defenses against adversarial manipulation.

[211] Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

Shubham Kumar, Narendra Ahuja

Main category: cs.AI

TL;DR: LOCA explains why a specific jailbreak succeeded by finding a minimal set of interpretable representation changes that causally restore refusal.

DetailsMotivation: Prior work explains all jailbreaks globally via single concept directions, but different jailbreak strategies may strengthen or suppress different intermediate concepts, and the same strategy may fail across harmful-request categories.

Method: Identify a minimal set of interpretable, intermediate representation changes that causally induce model refusal on an otherwise successful jailbreak request.

Result: On Gemma and Llama chat models over a large jailbreak benchmark, LOCA induces refusal with about six interpretable changes on average, while prior methods adapted to this setting routinely fail even after 20.

Conclusion: A step toward mechanistic, local explanations of jailbreak success in LLMs; code to be released.

Abstract: Safety trained large language models (LLMs) can often be induced to answer harmful requests through jailbreak prompts. Because we lack a robust understanding of why LLMs are susceptible to jailbreaks, future frontier models operating more autonomously in higher-stakes settings may similarly be vulnerable to such attacks. Prior work has studied jailbreak success by examining the model’s intermediate representations, identifying directions in this space that causally encode concepts like harmfulness and refusal. Then, they globally explain all jailbreak attacks as attempting to reduce or strengthen these concepts (e.g., reduce harmfulness). However, different jailbreak strategies may succeed by strengthening or suppressing different intermediate concepts, and the same jailbreak strategy may not work for different harmful request categories (e.g., violence vs. cyberattack); thus, we seek to give a local explanation – i.e., why did this specific jailbreak succeed? To address this gap, we introduce LOCA, a method that gives Local, CAusal explanations of jailbreak success by identifying a minimal set of interpretable, intermediate representation changes that causally induce model refusal on an otherwise successful jailbreak request. We evaluate LOCA on harmful original-jailbreak pairs from a large jailbreak benchmark across Gemma and Llama chat models, comparing against prior methods adapted to this setting. LOCA can successfully induce refusal by making, on average, six interpretable changes; prior work routinely fails to achieve refusal even after 20 changes. LOCA is a step toward mechanistic, local explanations of jailbreak success in LLMs. Code to be released.
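The abstract does not spell out LOCA's search procedure beyond "identify a minimal set of changes that induces refusal." Purely as a caricature of that goal, a greedy grow-then-prune search over candidate concept edits might look like the following (the oracle and edit labels are placeholders, not LOCA itself):

```python
def minimal_refusal_set(candidates, induces_refusal):
    """Greedy search for a small subset of concept edits that flips the
    model to refusal.

    `candidates` is a list of edit labels; `induces_refusal` is an oracle
    mapping a set of applied edits to True/False. The set is grown until
    refusal is induced, then pruned to drop redundant edits.
    """
    chosen = []
    for c in candidates:                      # grow until refusal is induced
        chosen.append(c)
        if induces_refusal(set(chosen)):
            break
    else:
        return None                           # no refusing subset found
    for c in list(chosen):                    # prune edits that aren't needed
        if induces_refusal(set(chosen) - {c}):
            chosen.remove(c)
    return set(chosen)
```

Greedy pruning of this kind finds a set that is minimal in the sense that no single element can be dropped, which mirrors the paper's framing of "a minimal set of changes" without claiming its actual algorithm.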

[212] Causal Foundations of Collective Agency

Frederik Hytting Jørgensen, Sebastian Weichwald, Lewis Hammond

Main category: cs.AI

TL;DR: A causal-games formalization of when a group of agents can be treated as a single collective agent.

DetailsMotivation: Multiple simpler agents may inadvertently form a collective agent with distinct capabilities and goals, a key AI-safety concern and a foundational question for both biological and artificial multi-agent systems.

Method: A behavioral account: ascribe collective agency when modeling the group's joint actions as rational and goal-directed successfully predicts its behavior, formalized via causal games and causal abstraction.

Result: Resolves a puzzle about multi-agent incentives in actor-critic models and quantifies the degree of collective agency exhibited by different voting mechanisms.

Conclusion: A foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.

Abstract: A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group’s joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games – which are causal models of strategic, multi-agent interactions – and causal abstraction – which formalizes when a simple, high-level model faithfully captures a more complex, low-level model. We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.

[213] Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

Kaituo Zhang, Zhen Xiong, Mingyu Zhong, Zhimeng Jiang, Zhouyuan Yuan, Zhecheng Li, Ying Lin

Main category: cs.AI

TL;DR: Under semantic distractors, tool-augmented reasoning can underperform native chain-of-thought because the tool-calling protocol itself imposes a "tool-use tax".

DetailsMotivation: Tool augmentation is widely assumed to improve reasoning and reliability, but that consensus does not always hold in the presence of semantic noise.

Method: A Factorized Intervention Framework isolating the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools; plus G-STEP, a lightweight inference-time gate against protocol-induced errors.

Result: Under semantic noise, the gains from tools often fail to offset the protocol-induced degradation; G-STEP yields partial recovery.

Conclusion: More substantial improvements require strengthening models' intrinsic reasoning and tool-interaction capabilities.

Abstract: Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the “tool-use tax”, which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings suggest that more substantial improvements still require strengthening the model’s intrinsic reasoning and tool-interaction capabilities.

[214] TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization

Abdulhady Abas Abdullah, Fatemeh Daneshfar, Seyedali Mirjalili, Mourad Oussalah

Main category: cs.AI

TL;DR: A topology- and uncertainty-aware DPO variant that rewards how answers are derived, weighting preference pairs by a calibrated uncertainty signal while staying RL-free.

DetailsMotivation: DPO treats preferences as flat winner-vs-loser signals and is sensitive to noisy or brittle preferences arising from fragile chains of thought.

Method: Elicit lightweight reasoning topologies; combine semantic faithfulness, utility, and topology quality into a calibrated uncertainty signal; factorize a small learnable reward over these signals and fold it into an uncertainty-weighted DPO objective using only a fixed or moving reference policy.

Result: Across open 7-8B models and benchmarks in mathematical reasoning, factual QA, summarization, and helpful/harmless dialogue, improves judge win-rates, faithfulness, and calibration over DPO, with consistent gains in multimodal and long-context settings.

Conclusion: Matches or exceeds PPO on reasoning-centric tasks while preserving DPO's training simplicity and avoiding online rollouts.

Abstract: Aligning large language models (LLMs) with human preferences is commonly done via reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO) or, more simply, via Direct Preference Optimization (DPO). While DPO is stable and RL-free, it treats preferences as flat winner vs. loser signals and is sensitive to noisy or brittle preferences arising from fragile chains of thought. We propose TUR-DPO, a topology- and uncertainty-aware variant of DPO that rewards how answers are derived, not only what they say, by eliciting lightweight reasoning topologies and combining semantic faithfulness, utility, and topology quality into a calibrated uncertainty signal. A small learnable reward is factorized over these signals and incorporated into an uncertainty-weighted DPO objective that remains RL-free and relies only on a fixed or moving reference policy. Empirically, across open 7-8B models and benchmarks spanning mathematical reasoning, factual question answering, summarization, and helpful/harmless dialogue, TUR-DPO improves judge win-rates, faithfulness, and calibration relative to DPO while preserving training simplicity and avoiding online rollouts. We further observe consistent gains in multimodal and long-context settings, and show that TUR-DPO matches or exceeds PPO on reasoning-centric tasks while maintaining operational simplicity.
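Standard DPO minimizes the negative log-sigmoid of a beta-scaled log-probability margin between the preferred and dispreferred responses; per the abstract, TUR-DPO scales this per-pair loss by a calibrated uncertainty signal. A minimal numeric sketch (treating the uncertainty-derived weight as a given scalar is an assumption of this sketch; the paper factorizes it over faithfulness, utility, and topology quality):

```python
import math

def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      beta=0.1, confidence=1.0):
    """Per-pair DPO loss scaled by a confidence weight in (0, 1].

    Standard DPO: -log sigmoid(beta * (policy margin - reference margin)),
    where each margin is log p(winner) - log p(loser). The `confidence`
    scalar stands in for TUR-DPO's calibrated uncertainty signal.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # log1p(exp(-m)) == -log(sigmoid(m)), computed stably
    return confidence * math.log1p(math.exp(-margin))
```

At zero margin the loss is log 2, it shrinks as the policy separates winner from loser relative to the reference, and halving `confidence` halves the gradient contribution of a noisy pair.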

[215] ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts

Sydney Johns, Heng Jin, Chaoyu Zhang, Y. Thomas Hou, Wenjing Lou

Main category: cs.AI

TL;DR: A military-aligned LLM safety benchmark grounded in the Law of War, the Rules of Engagement, and the Joint Ethics Regulation.

DetailsMotivation: LLMs are being explored for defense decision support, but existing safety benchmarks cover general social risks rather than the legal and ethical rules that govern real military operations.

Method: Extract doctrinal text, generate meaning-preserving multiple-choice questions, and organize 519 doctrinally grounded prompts in a 12-category taxonomy informed by the OODA decision-making framework, enabling systematic testing of accuracy and refusal.

Result: Evaluation of 21 commercial LLMs reveals critical gaps in safety alignment for military applications.

Conclusion: Doctrine-grounded benchmarks are needed before LLMs can be trusted for military decision support.

Abstract: Large language models (LLMs) are now being explored for defense applications that require reliable and legally compliant decision support. They also hold significant potential to enhance decision making, coordination, and operational efficiency in military contexts. These uses demand evaluation methods that reflect the doctrinal standards that guide real military operations. Existing safety benchmarks focus on general social risks and do not test whether models follow the legal and ethical rules that govern real military operations. To address this gap, we introduce ARMOR 2025, a military-aligned safety benchmark grounded in three core military doctrines: the Law of War, the Rules of Engagement, and the Joint Ethics Regulation. We extract doctrinal text from these sources and generate multiple-choice questions that preserve the intended meaning of each rule. The benchmark is organized through a taxonomy informed by the Observe-Orient-Decide-Act (OODA) decision-making framework. This structure enables systematic testing of accuracy and refusal across military-relevant decision types. The benchmark features a structured 12-category taxonomy, 519 doctrinally grounded prompts, and rigorous evaluation procedures applied to 21 commercial LLMs. Evaluation results reveal critical gaps in safety alignment for military applications.

[216] Agentic AI for Trip Planning Optimization Application

Tiejin Chen, Ahmadreza Moradipari, Kyungtae Han, Hua Wei, Nejib Ammar

Main category: cs.AI

TL;DR: An orchestrated multi-agent trip-planning framework plus a dataset with ground-truth optimal solutions, reaching 77.4% accuracy on the TOP Benchmark.

DetailsMotivation: Trip planning for intelligent vehicles increasingly demands optimal routes, not merely feasible ones, yet existing systems target feasibility and current benchmarks lack ground truth for objective optimization evaluation.

Method: An orchestration agent coordinates specialized agents for traffic, charging, and points of interest with dynamic refinement; the Trip-planning Optimization Problems (TOP) Dataset supplies definitive optimal solutions and category-level task structure for fine-grained analysis.

Result: 77.4% accuracy on the TOP Benchmark, significantly outperforming single-agent and workflow-based multi-agent baselines.

Conclusion: Orchestrated agentic reasoning is important for robust trip-planning optimization.

Abstract: Trip planning for intelligent vehicles increasingly requires selecting optimal routes rather than merely producing feasible itineraries, as interacting factors such as travel time, energy consumption, and traffic conditions directly affect plan quality. Yet existing systems are largely designed for feasibility-oriented planning, and current benchmarks provide only reference answers without ground truth, preventing objective evaluation of optimization performance. In our paper, we address these limitations with an agentic AI framework that enables dynamic refinement through an orchestration agent coordinating specialized agents for traffic, charging, and points of interest, and with the Trip-planning Optimization Problems Dataset, which supplies definitive optimal solutions and category-level task structure for fine-grained analysis. Experiments show that our system achieves 77.4% accuracy on the TOP Benchmark, significantly outperforming single-agent and workflow-based multi-agent baselines, demonstrating the importance of orchestrated agentic reasoning for robust trip planning optimization.

[217] Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

Yuxuan Gao, Megan Wang, Yi Ling Yu

Main category: cs.AI

TL;DR: A continuous benchmark that measures AI inference at endpoint granularity and synthesizes speed, price, context, quality, and modeled energy into composites such as joules per correct answer.

DetailsMotivation: Deployment decisions are made at the (provider, model, SKU) endpoint, where quantization, decoding, region, and serving stack vary, but public benchmarks compare only models and providers.

Method: Measure five axes per endpoint (output speed, time to first token, workload-blended price, effective context, live-endpoint quality) and combine them with a modeled energy estimate into joules per correct answer, dollars per correct answer, and endpoint fidelity.

Result: Across 78 endpoints serving 12 model families, the same model differs across endpoints by up to 12.5 accuracy points, 12 fidelity points, an order of magnitude in tail latency, and 6.2x in modeled joules per correct answer; workload-aware pricing presets substantially reorder the leaderboard.

Conclusion: TokenArena is a methodology rather than a single ranking, released with full provenance under CC BY 4.0 and open to external replication.

Abstract: Public inference benchmarks compare AI systems at the model and provider level, but the unit at which deployment decisions are actually made is the endpoint: the (provider, model, stock-keeping-unit) tuple at which a specific quantization, decoding strategy, region, and serving stack is exposed. We introduce TokenArena, a continuous benchmark that measures inference at endpoint granularity along five core axes (output speed, time to first token, workload-blended price, effective context, and quality on the live endpoint) and synthesizes them, together with a modeled energy estimate, into three headline composites: joules per correct answer, dollars per correct answer, and endpoint fidelity (output-distribution similarity to a first-party reference). The framework’s novelty is empirical and methodological. Across 78 endpoints serving 12 model families, the same model on different endpoints differs in mean accuracy by up to 12.5 points on math and code, in fingerprint similarity to first party by up to 12 points, in tail latency by an order of magnitude, and in modeled joules per correct answer by a factor of 6.2. We further show that workload-aware blended pricing reorders the leaderboard substantially: 7 of 10 top-ranked endpoints under the chat preset (3:1 input:output) fall out of the top 10 under the retrieval-augmented preset (20:1), and the reasoning preset (1:5) elevates frontier closed models that the chat preset penalizes on price. We release the framework, schema, probe and eval harness, and a v1.0 leaderboard snapshot under CC BY 4.0. TokenArena is a methodology, not a single ranking; we publish full provenance and limitations and welcome external replication.
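The leaderboard-reordering effect of workload-blended pricing is simple arithmetic. A sketch (the prices, ratios, and the exact blend formula below are assumptions for illustration, not TokenArena's published definitions):

```python
def blended_price_per_mtok(in_price, out_price, in_ratio, out_ratio):
    """Workload-blended $/Mtok for an input:output token-ratio preset,
    assumed here to be a ratio-weighted average of the two list prices."""
    total = in_ratio + out_ratio
    return (in_ratio * in_price + out_ratio * out_price) / total

def dollars_per_correct(price_per_mtok, tokens_used, accuracy, n_questions):
    """Composite in the spirit of TokenArena's dollars-per-correct-answer:
    total spend divided by the number of correct answers."""
    spend = price_per_mtok * tokens_used / 1e6
    return spend / (accuracy * n_questions)

# Hypothetical endpoints: A has cheap input but expensive output; B is flat.
a_chat = blended_price_per_mtok(0.5, 6.0, 3, 1)   # chat preset, 3:1
b_chat = blended_price_per_mtok(2.0, 2.0, 3, 1)
a_rsn = blended_price_per_mtok(0.5, 6.0, 1, 5)    # reasoning preset, 1:5
b_rsn = blended_price_per_mtok(2.0, 2.0, 1, 5)
```

With these made-up prices, endpoint A is cheaper under the chat preset but endpoint B wins under the output-heavy reasoning preset, which is the kind of rank flip the abstract reports between presets.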

[218] AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?

Ranit Karmakar, Jayita Chatterjee

Main category: cs.AI

TL;DR: A deterministic 30-task, six-tier benchmark showing that small open-weight models suffice for routine agent tool use, while frontier models retain an edge only on long-horizon planning.

DetailsMotivation: Most model calls in production agent pipelines are short, structured, and routine, raising the routing question of which workflow steps truly require frontier-scale models.

Method: A six-tier capability ladder spanning instruction following, tool use, multi-step coordination, and long-horizon planning under persistent constraints; 16 open-weight models from 0.27B to 32B parameters evaluated alongside GPT-5 across 16,542 scored runs.

Result: The strongest open-weight model matches GPT-5 in aggregate at substantially lower cost; frontier models lead only on long-horizon constraint tracking, and failures are model-specific rather than explained by scale alone.

Conclusion: Route the broad base of routine actions to smaller open-weight models and reserve frontier models for tasks that demand deeper planning and control; the benchmark, harness, sweep configurations, and run corpus are released.

Abstract: Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine. This raises a practical routing question that existing evaluations do not directly answer: which parts of an agent workflow truly require large frontier intelligence, and which can be handled by smaller models? We introduce AgentFloor, a deterministic 30-task benchmark organized as a six-tier capability ladder, spanning instruction following, tool use, multi-step coordination, and long-horizon planning under persistent constraints. We evaluate 16 open-weight models, from 0.27B to 32B parameters, alongside GPT-5 across 16,542 scored runs. Our results reveal a clear boundary of model necessity. Small and mid-sized open-weight models are already sufficient for much of the short-horizon, structured tool use work that dominates real agent pipelines, and in aggregate, the strongest open-weight model matches GPT-5 on our benchmark while being substantially cheaper and faster to run. The gap appears most clearly on long-horizon planning tasks that require sustained coordination and reliable constraint tracking over many steps, where frontier models still hold an advantage, though neither side reaches strong reliability. We also find that this boundary is not explained by scale alone: some failures respond to targeted interventions, but the effects are model-specific rather than universal. These findings suggest a practical design principle for agentic systems: use smaller open-weight models for the broad base of routine actions, and reserve large frontier models for the narrower class of tasks that truly demand deeper planning and control. We release the benchmark, harness, sweep configurations, and full run corpus.

[219] Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning

Richard Dewey, Janos Botyanszki, Ciamac C. Moallemi, Andrew T. Zheng

Main category: cs.AI

TL;DR: Solly, trained by self-play deep reinforcement learning, is the first AI agent to reach elite human play in reduced-format Liar's Poker.

DetailsMotivation: Poker-like testbeds such as no-limit Texas hold'em have subdued multi-player dynamics; Liar's Poker demands extensive multi-player engagement through repeated rounds of bidding.

Method: Self-play with a model-free, actor-critic, deep reinforcement learning algorithm.

Result: Elite-level win rate (over 50% of hands) and equity in both heads-up and multi-player play, outperforming LLMs, including reasoning models, on the same metrics.

Conclusion: Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.

Abstract: AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold’em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar’s Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar’s Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.

[220] Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

Sen Cui, Jingheng Ma

Main category: cs.AI

TL;DR: A position paper proposing Hamiltonian World Models: latent phase-space dynamics with control, dissipation, and residual terms as a physically grounded basis for world modeling.

DetailsMotivation: The three dominant world-model routes (2D video-generative, 3D scene-centric, JEPA-like latent) struggle to deliver physically reliable, action-controllable, long-horizon-stable predictions for embodied decision making.

Method: Encode observations into a structured latent phase space, evolve the latent state with Hamiltonian-inspired dynamics plus control, dissipation, and residual terms, decode predicted trajectories into future observations, and plan over the rollouts.

Result: An argument that Hamiltonian structure can improve interpretability, data efficiency, and long-horizon stability.

Conclusion: The bottleneck is no longer realism but physical meaningfulness; practical challenges remain for friction, contact, non-conservative forces, and deformable objects.

Abstract: World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose \emph{Hamiltonian World Models} as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
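For intuition only, the "Hamiltonian dynamics + control + dissipation" split the abstract describes can be caricatured on a 1-D harmonic oscillator with a symplectic-Euler step; this is textbook mechanics, not the paper's learned latent model:

```python
def hamiltonian_step(q, p, dt, k=1.0, m=1.0, control=0.0, damping=0.0):
    """One symplectic-Euler step of H(q, p) = p^2/(2m) + k q^2/2,
    with an additive control force and a velocity-proportional dissipation
    term, echoing the Hamiltonian + control + dissipation decomposition."""
    p = p + dt * (-k * q + control - damping * p)  # dp/dt = -dH/dq + u - c*p
    q = q + dt * p / m                             # dq/dt =  dH/dp
    return q, p

def energy(q, p, k=1.0, m=1.0):
    return p * p / (2 * m) + k * q * q / 2
```

The appeal of the symplectic structure is visible even in this toy: with no control or damping, the energy stays bounded near its initial value over long rollouts instead of drifting, which is the long-horizon stability argument in miniature; switching on damping makes the energy decay as a non-conservative term should.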

[221] AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

Haotian Zhao, Yuxin Zhang, Songlin Zhou, Stephen S. -T. Yau, Wenyu Zhang, Lun Tian, Tianshu Zhu, Yifeng Huang, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu

Main category: cs.AI

TL;DR: A supervision-free credit-assignment method that adaptively modulates response-level entropy during multi-turn agentic RL training.

DetailsMotivation: Sparse, outcome-only rewards make step-level credit assignment hard, while dense intermediate supervision (process reward models, auxiliary signals) adds complexity and generalizes poorly across tasks and domains.

Method: Lift entropy analysis from the token level to the response level to reduce sampling variance; show that entropy drift under natural gradients is governed by the product of the advantage and the relative response surprisal; derive a practical proxy that reshapes training toward a natural exploration-to-exploitation transition.

Result: Effective across benchmarks and models from 1.5B to 32B parameters, including a 1.4-point gain when integrated into a state-of-the-art baseline on SWE-bench-Verified.

Conclusion: Entropy modulation achieves an effective exploration-exploitation trade-off without additional supervision.

Abstract: Reinforcement learning (RL) has significantly advanced the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. Yet effective training remains challenging, as sparse, outcome-only rewards make it difficult to assign credit to individual steps in an agent’s action trajectory. A common remedy is to introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, but this increases supervision and tuning complexity and often generalizes poorly across tasks and domains. This paper presents AEM, a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to achieve a more effective exploration-exploitation trade-off. Theoretically, we elevate entropy analysis from the token level to the response level to reduce token sampling variance and show that entropy drift under natural gradients is intrinsically governed by the product of the advantage and the relative response surprisal. Specifically, we derive a practical proxy to reshape training dynamics, enabling a natural transition from exploration to exploitation. Extensive experiments across various benchmarks and models ranging from 1.5B to 32B parameters demonstrate the effectiveness of AEM, including a notable 1.4 percent gain when integrated into a state-of-the-art baseline on the highly challenging SWE-bench-Verified benchmark.

[222] Thinking in Text and Images: Interleaved Vision–Language Reasoning Traces for Long-Horizon Robot Manipulation

Jinkun Liu, Haohan Chi, Lingfeng Zhang, Yifan Xie, YuAn Wang, Long Chen, Hangjun Ye, Xiaoshuai Hao, Wenbo Ding

Main category: cs.AI

TL;DR: A policy framework whose intermediate representation interleaves textual subgoals with visual keyframes over the full task horizon, markedly improving long-horizon manipulation.

DetailsMotivation: Text-only chain-of-thought encodes causal order but misses spatial constraints, while visual prediction gives geometric cues that remain local and semantically underconstrained; long-horizon manipulation needs both.

Method: A single native multimodal transformer self-generates a global interleaved text-image trace from the initial observation and instruction, caches it, and conditions a closed-loop action decoder on it; pseudo-supervision is built by temporally segmenting demonstrations and captioning each stage with a vision-language model.

Result: 95.5% average success on LIBERO (92.4% on LIBERO-Long) and 59.4% on SimplerEnv-WidowX; without traces LIBERO-Long drops to 37.7%, and text-only or vision-only traces reach only 62.0% and 68.4%.

Conclusion: Both modalities are necessary; traces tolerate local corruption and moderate execution drift but remain limited under stale or incorrect global plans.

Abstract: Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies usually hide planning in latent states or expose only one modality: text-only chain-of-thought encodes causal order but misses spatial constraints, while visual prediction provides geometric cues but often remains local and semantically underconstrained. We introduce Interleaved Vision–Language Reasoning (IVLR), a policy framework built around an explicit intermediate trace representation that alternates textual subgoals with visual keyframes over the full task horizon. At test time, a single native multimodal transformer self-generates this global semantic-geometric trace from the initial observation and instruction, caches it, and conditions a closed-loop action decoder on the trace, original instruction, and current observation. Because standard robot datasets lack such traces, we construct pseudo-supervision by temporally segmenting demonstrations and captioning each stage with a vision-language model. Across simulated benchmarks for long-horizon manipulation and visual distribution shift, IVLR reaches 95.5% average success on LIBERO, including 92.4% on LIBERO-Long, and 59.4% overall success on SimplerEnv-WidowX. Ablations show that both modalities are necessary: without traces, LIBERO-Long success drops to 37.7%; text-only and vision-only traces reach 62.0% and 68.4%, while the full interleaved trace reaches 92.4%. Stress tests with execution perturbations and masked trace content show moderate degradation, suggesting that the trace can tolerate local corruption and moderate execution drift, but remains limited under stale or incorrect global plans.

[223] On the Role of Artificial Intelligence in Human-Machine Symbiosis

Ching-Chun Chang, Yuchen Guo, Hanrui Wang, Timo Spinde, Isao Echizen

Main category: cs.AI

TL;DR: A method for embedding and later recovering the functional role AI played (assistive editor versus creative generator) in generated text.

DetailsMotivation: In human-machine symbiosis the pertinent question is not whether AI participated but how; the role specified in the prompt becomes untraceable once content is detached from its dialogue context.

Method: Infer the latent role specified by the prompt, embed it into the content during the probabilistic generation process, and recover the nature of AI participation from the resulting text alone.

Result: In experiments contrasting AI as an assistive editor with AI as a creative generator, the method discriminates between roles, is robust to perturbations, and preserves linguistic quality.

Conclusion: A contribution toward research on the ethics of fair, transparent, and appropriate AI use.

Abstract: The evolution of artificial intelligence (AI) has rendered the boundary between humanity and computational machinery increasingly ambiguous. In the presence of more interwoven relationships within human-machine symbiosis, the very notion of AI-generated information becomes difficult to define, as such information arises not from either humans or machines in isolation, but from their mutual shaping. Therefore, a more pertinent question lies not merely in whether AI has participated, but in how it has participated. In general, the role assumed by AI is often specified, either implicitly or explicitly, in the input prompt, yet becomes less apparent or altogether unobservable when the generated content alone is available. Once detached from the dialogue context, the functional role may no longer be traceable. This study considers the problem of tracing the functional role played by AI in natural language generation. A methodology is proposed to infer the latent role specified by the prompt, embed this role into the content during the probabilistic generation process and subsequently recover the nature of AI participation from the resulting text. Experimentation is conducted under a representative scenario in which AI acts either as an assistive agent that edits human-written content or as a creative agent that generates new content from a brief concept. The experimental results support the validity of the proposed methodology in terms of discrimination between roles, robustness against perturbations and preservation of linguistic quality. We envision that this study may contribute to future research on the ethics of AI with regard to whether AI has been used fairly, transparently and appropriately.

[224] Instance-Aware Parameter Configuration in Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem

Yinghao Qin, Xinwei Wang, Mosab Bazargani, Jun Chen

Main category: cs.AI

Abstract: Algorithm performance in combinatorial optimization is highly sensitive to parameter settings, while a single globally tuned configuration often fails to exploit the heterogeneity of instances. This limitation is particularly evident in the Electric Capacitated Vehicle Routing Problem, where instances differ in structure, demand patterns, and energy constraints. This paper investigates instance-aware parameter configuration for Bilevel Late Acceptance Hill Climbing, a state-of-the-art metaheuristic for the Electric Capacitated Vehicle Routing Problem. An offline tuning procedure is used to obtain instance-specific parameter labels, which are then mapped from instance features via a regression model to enable parameter prediction for unseen instances prior to execution. Experimental results on the IEEE WCCI 2020 benchmark and its extensions show that the proposed approach achieves an average objective value reduction of 0.28% across eight held-out test instances relative to a globally tuned configuration. This corresponds to a significant cost reduction in multimillion-dollar transportation operations.
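
The feature-to-parameter pipeline the abstract describes (offline tuning produces instance-specific labels, a regressor then predicts parameters for unseen instances) can be sketched in a few lines. The features, labels, and 1-nearest-neighbour regressor below are illustrative stand-ins, not the paper's actual model:

```python
# Hypothetical sketch of instance-aware parameter prediction: map simple
# instance features to an offline-tuned parameter (here, a history length).
# Feature choices and values are illustrative, not taken from the paper.

def extract_features(instance):
    """Summarize an instance as (n_customers, mean_demand, n_stations)."""
    demands = instance["demands"]
    return (len(demands), sum(demands) / len(demands), instance["n_stations"])

def predict_history_length(features, training_data):
    """1-nearest-neighbour regression over offline-tuned labels."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_data, key=lambda pair: dist(pair[0], features))
    return label

# Offline tuning would produce (features, best_parameter) pairs like these.
tuned = [
    ((50, 8.0, 4), 200),
    ((200, 3.5, 10), 1500),
]
unseen = {"demands": [8, 7, 9] * 18, "n_stations": 5}  # 54 customers
L = predict_history_length(extract_features(unseen), tuned)
```

A learned regression model (the paper does not specify which) would replace the nearest-neighbour lookup, but the prediction-before-execution workflow is the same.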

[225] Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Yan Zhang, Daiqing Wu, Huawen Shen, Yu Zhou, Can Ma

Main category: cs.AI

Abstract: Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.
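
The "Gaussian soft mask" over a target bounding box that GUI-SD uses to build the teacher's privileged context might look like the following; the exact parameterization (how the box size sets the standard deviation) is an assumption of ours, not specified in the abstract:

```python
import math

def gaussian_soft_mask(width, height, box, sigma_scale=0.5):
    """Soft mask peaking at the box centre and decaying outward, so the
    teacher sees where the target is without exact coordinates leaking.
    sigma_scale is a hypothetical knob tying spread to box size."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max((x1 - x0) * sigma_scale, 1e-6)
    sy = max((y1 - y0) * sigma_scale, 1e-6)
    return [
        [math.exp(-(((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2) / 2.0)
         for x in range(width)]
        for y in range(height)
    ]

mask = gaussian_soft_mask(8, 8, box=(2, 2, 6, 6))
```

The mask value is 1.0 at the box centre and falls off smoothly, which is what makes the guidance informative yet coordinate-free.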

[226] To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

Qinyuan Wu, Soumi Das, Mahsa Amani, Arijit Nag, Seungeon Lee, Krishna P. Gummadi, Abhilasha Ravichander, Muhammad Bilal Zafar

Main category: cs.AI

Abstract: Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may be redundant or even harmful. Effective tool use, therefore, hinges on a core LLM decision: whether to call or not call a tool, when performing a task. This decision is particularly challenging for web search tools, where the benefits of external information depend on the model’s internal knowledge and its ability to integrate potentially noisy tool responses. We introduce a principled framework inspired by decision-making theory to evaluate web search tool-use decisions along three key factors: necessity, utility, and affordability. Our analysis combines two complementary lenses: a normative perspective that infers true need and utility from an optimal allocation of tool calls, and a descriptive perspective that infers the model’s self-perceived need and utility from their observed behaviors. We find that models’ perceived need and utility of tool calls are often misaligned with their true need and utility. Building on this framework, we train lightweight estimators of need and utility based on models’ hidden states. Our estimators enable simple controllers that can improve decision quality and lead to stronger task performance than the self-perceived setup across three tasks and six models.
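
A minimal decision rule in the spirit of the necessity/utility/affordability framing could look like this; the thresholds and the scalar "need" and "utility" inputs are illustrative placeholders for the paper's trained hidden-state estimators:

```python
def should_call_tool(p_need, expected_utility, cost, need_threshold=0.5):
    """Decision-theoretic sketch: call the tool only when the model likely
    needs external information AND the expected benefit, weighted by that
    need, exceeds the call's cost (affordability). All inputs are assumed
    to come from lightweight estimators; values here are hypothetical."""
    return p_need >= need_threshold and p_need * expected_utility > cost
```

A controller would apply this per query, replacing the model's own (often miscalibrated) self-perceived judgment with estimator outputs.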

[227] Position: agentic AI orchestration should be Bayes-consistent

Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

Main category: cs.AI

Abstract: LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this position paper argues that the control layer of an agentic AI system (that orchestrates LLMs and tools) is a clear case where Bayesian principles should shine. Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities, to update these beliefs from observed agentic and human-AI interactions, and to choose actions. Making LLMs themselves explicitly Bayesian belief-updating engines remains computationally intensive and conceptually nontrivial as a general modeling target. In contrast, this paper argues that coherent decision-making requires Bayesian principles at the orchestration level of the agentic system, not necessarily the LLM agent parameters. This paper articulates practical properties for Bayesian control that fit modern agentic AI systems and human-AI collaboration, and provides concrete examples and design patterns to illustrate how calibrated beliefs and utility-aware policies can improve agentic AI orchestration.
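
One concrete design pattern for Bayesian orchestration, consistent with (but not taken from) the paper: maintain a Beta-Bernoulli belief over each tool's success rate, update it from observed interactions, and route via Thompson sampling. A minimal sketch under those assumptions:

```python
import random

class ToolBelief:
    """Beta-Bernoulli belief over a tool's success rate, held at the
    orchestration layer, not inside the LLM. Illustrative pattern only."""
    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0  # uniform Beta(1, 1) prior
    def update(self, success):
        # Conjugate update: observed success/failure shifts the posterior.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0
    def mean(self):
        return self.alpha / (self.alpha + self.beta)
    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

def choose_tool(beliefs, rng):
    """Thompson sampling: pick the tool whose sampled rate is highest."""
    return max(beliefs, key=lambda name: beliefs[name].sample(rng))

rng = random.Random(0)
beliefs = {"search": ToolBelief(), "calculator": ToolBelief()}
for _ in range(20):  # simulated interaction log
    beliefs["search"].update(True)
    beliefs["calculator"].update(False)
```

The point of the position paper is exactly this separation: the belief state and the utility-aware policy live in the control layer, while the LLM agents stay unchanged.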

[228] Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton

Main category: cs.AI

Abstract: The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman equation, are ubiquitous in reinforcement learning and control theory. However, these equations become intractable for high-dimensional or nonlinear systems. This paper develops two new reinforcement learning algorithms based on the data-driven Koopman operator, which lifts a nonlinear system into new coordinates where the dynamics become approximately linear, and where Hamilton-Jacobi-Bellman-based methods are more tractable. In particular, the Koopman operator captures the expectation of the time evolution of the value function via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a "controlled Koopman tensor" that facilitates the estimation of the optimal value function. This enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic. This flexible and interpretable framework includes deterministic and stochastic systems, as well as discrete and continuous dynamics. Koopman-assisted reinforcement learning attains state-of-the-art performance with respect to traditional neural network-based soft actor-critic baselines on a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
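
The core lifting idea can be illustrated with a standard EDMD-style fit (our toy, not the paper's algorithm): choose a dictionary of observables, then solve a least-squares problem so the lifted state evolves linearly. Here the system, the polynomial dictionary, and the data are all illustrative:

```python
import numpy as np

def lift(x):
    """Fixed dictionary of observables: [1, x, x^2, x^3] (our choice)."""
    return np.array([1.0, x, x**2, x**3])

# Toy 1-D dynamics x' = 0.9 x, sampled at random states.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, 200)
xs_next = 0.9 * xs

Psi = np.stack([lift(x) for x in xs])            # (N, d) lifted states
Psi_next = np.stack([lift(x) for x in xs_next])  # (N, d) lifted successors
K, *_ = np.linalg.lstsq(Psi, Psi_next, rcond=None)  # Psi @ K ≈ Psi_next

# One-step prediction of the observable g(x) = x through the linear lift.
x0 = 0.5
pred = (lift(x0) @ K)[1]  # the linear-feature coordinate of the lifted step
```

For this dictionary the true lifted dynamics are exactly linear (each monomial scales by a power of 0.9), so the fitted K advances the value-relevant observables without ever touching the nonlinear state equation, which is what makes HJB-style methods tractable in the lifted space.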

[229] Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies

Guangyu Zhao, Kewei Lian, Haoxuan Ru, Borong Zhang, Haowei Lin, Zhancun Mu, Haobo Fu, Qiang Fu, Shaofei Cai, Zihao Wang, Yitao Liang

Main category: cs.AI

Abstract: Goal-conditioned policies enable decision-making models to execute diverse behaviors based on specified goals, yet their downstream performance is often highly sensitive to the choice of instructions or prompts. To bypass the limitations of discrete text prompts, we formulate post-training adaptation as a latent control problem, where the goal embedding serves as a continuous control variable to modulate the behavior of a frozen policy. We propose Preference Goal Tuning (PGT), a framework that optimizes this latent control variable to align the induced trajectory distribution with task preferences. Unlike standard fine-tuning that updates policy parameters, PGT keeps the policy frozen and updates only the latent goal using a trajectory-level preference objective. This approach essentially searches for the optimal conditioning input that maximizes the likelihood of preferred behaviors while suppressing undesirable ones. We evaluate PGT on the Minecraft SkillForge benchmark across 17 tasks. With minimal data, PGT achieves average relative improvements of 72.0% and 81.6% on two foundation policies, consistently outperforming expert-crafted prompts. Crucially, by decoupling task alignment (latent goal) from physical dynamics (frozen policy), PGT surpasses full fine-tuning by 13.4% in out-of-distribution settings, demonstrating superior robustness and generalization.
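
The key mechanic, optimizing only the latent goal while the policy stays frozen, can be shown on a toy problem. The dot-product "policy", the Bradley-Terry-style margin, and the gradient-free search below are all stand-ins for the paper's trajectory-level preference objective:

```python
import random

def frozen_policy_logit(goal, behaviour):
    """Frozen scoring function (stand-in for a goal-conditioned policy):
    its parameters never change; only the goal vector is tuned."""
    return sum(a * b for a, b in zip(goal, behaviour))

def preference_objective(goal, preferred, dispreferred):
    """Margin by which the preferred behaviour outscores the dispreferred."""
    return (frozen_policy_logit(goal, preferred)
            - frozen_policy_logit(goal, dispreferred))

def tune_goal(goal, preferred, dispreferred, steps=200, lr=0.1):
    """Greedy random search over the latent goal (a crude optimizer,
    chosen only to keep the sketch dependency-free)."""
    rng = random.Random(0)
    best = list(goal)
    best_val = preference_objective(best, preferred, dispreferred)
    for _ in range(steps):
        cand = [g + lr * rng.gauss(0, 1) for g in best]
        val = preference_objective(cand, preferred, dispreferred)
        if val > best_val:
            best, best_val = cand, val
    return best

g0 = [0.0, 0.0]
g = tune_goal(g0, preferred=[1.0, 0.0], dispreferred=[0.0, 1.0])
```

Because only the conditioning input moves, task alignment is decoupled from the frozen dynamics model, which is the property the abstract credits for PGT's out-of-distribution robustness.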

[230] InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction

Bin Lei, Weitai Kang, Zijian Zhang, Winson Chen, Xi Xie, Shan Zuo, Mimi Xie, Ali Payani, Mingyi Hong, Yan Yan, Caiwen Ding

Main category: cs.AI

Abstract: This paper introduces InfantAgent-Next, a generalist agent capable of interacting with computers in a multimodal manner, encompassing text, images, audio, and video. Unlike existing approaches that either build intricate workflows around a single large model or only provide workflow modularity, our agent integrates tool-based and pure vision agents within a highly modular architecture, enabling different models to collaboratively solve decoupled tasks in a step-by-step manner. Our generality is demonstrated by our ability to evaluate not only pure vision-based real-world benchmarks (i.e., OSWorld), but also more general or tool-intensive benchmarks (e.g., GAIA and SWE-Bench). Specifically, we achieve 7.27% accuracy on OSWorld, higher than Claude-Computer-Use. Codes and evaluation scripts are open-sourced at https://github.com/bin123apple/InfantAgent.

[231] Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs

Yisen Gao, Jiaxin Bai, Tianshi Zheng, Qingyun Sun, Ziwei Zhang, Xingcheng Fu, Jianxin Li, Yangqiu Song

Main category: cs.AI

Abstract: Abductive reasoning in knowledge graphs aims to generate plausible logical hypotheses from observed entities, with broad applications in areas such as clinical diagnosis and scientific discovery. However, due to a lack of controllability, a single observation may yield numerous plausible but redundant or irrelevant hypotheses on large-scale knowledge graphs. To address this limitation, we introduce the task of controllable hypothesis generation to improve the practical utility of abductive reasoning. This task faces two key challenges when controlling for generating long and complex logical hypotheses: hypothesis space collapse and hypothesis oversensitivity. To address these challenges, we propose CtrlHGen, a Controllable logical Hypothesis Generation framework for abductive reasoning over knowledge graphs, trained in a two-stage paradigm including supervised learning and subsequent reinforcement learning. To mitigate hypothesis space collapse, we design a dataset augmentation strategy based on sub-logical decomposition, enabling the model to learn complex logical structures by leveraging semantic patterns in simpler components. To address hypothesis oversensitivity, we incorporate smoothed semantic rewards including Dice and Overlap scores, and introduce a condition-adherence reward to guide the generation toward user-specified control constraints. Extensive experiments on three benchmark datasets demonstrate that our model not only better adheres to control conditions but also achieves superior semantic similarity performance compared to baselines. Our code is available at https://github.com/HKUST-KnowComp/CtrlHGen.
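
The Dice and Overlap scores used as smoothed rewards are standard set-similarity measures; how CtrlHGen applies them to answer sets is the paper's design, but the measures themselves compute as follows:

```python
def dice_score(pred, target):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = set(pred), set(target)
    if not pred and not target:
        return 1.0
    return 2 * len(pred & target) / (len(pred) + len(target))

def overlap_score(pred, target):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    pred, target = set(pred), set(target)
    if not pred or not target:
        return 0.0
    return len(pred & target) / min(len(pred), len(target))
```

Unlike an exact-match reward, both vary smoothly with partial agreement between the entailed and observed entity sets, which is what makes them useful against hypothesis oversensitivity.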

[232] CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments

Nitish Jaipuria, Lorenzo Gatto, Zijun Kan, Shankey Poddar, Bill Cheung, Diksha Bansal, Ramanan Balakrishnan, Aviral Suri, Jose Estevez

Main category: cs.AI

Abstract: The proliferation of digital payment platforms has transformed commerce, offering unmatched convenience and accessibility globally. However, this growth has also attracted malicious actors, leading to a corresponding increase in sophisticated social engineering scams. These scams are often initiated and orchestrated on multiple surfaces outside the payment platform, making user and transaction-based signals insufficient for a complete understanding of the scam’s methodology and underlying patterns, without which it is very difficult to prevent it in a timely manner. This paper presents CASE (Conversational Agent for Scam Elucidation), a novel Agentic AI framework that addresses this problem by collecting and managing user scam feedback in a safe and scalable manner. A conversational agent is uniquely designed to proactively interview potential victims to elicit intelligence in the form of a detailed conversation. The conversation transcripts are then consumed by another AI system that extracts information and converts it into structured data for downstream usage in automated and manual enforcement mechanisms. Using Google’s Gemini family of LLMs, we implemented this framework on Google Pay (GPay) India. By augmenting our existing features with this new intelligence, we have observed a 21% uplift in the volume of scam enforcements. The architecture and its robust evaluation framework are highly generalizable, offering a blueprint for building similar AI-driven systems to collect and manage scam intelligence in other sensitive domains.

[233] G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge

Linhao Luo, Zicheng Zhao, Junnan Liu, Zhangchi Qiu, Junnan Dong, Serge Panev, Chen Gong, Thuy-Trang Vu, Gholamreza Haffari, Dinh Phung, Alan Wee-Chung Liew, Shirui Pan

Main category: cs.AI

Abstract: Large language models (LLMs) excel at complex reasoning but remain limited by static and incomplete parametric knowledge. Retrieval-augmented generation (RAG) mitigates this by incorporating external knowledge, yet existing RAGs struggle with knowledge-intensive tasks due to fragmented information and weak modeling of knowledge structure. Graphs offer a natural way to model relationships within knowledge, but LLMs are inherently unstructured and cannot effectively reason over graph-structured data. Recent graph-enhanced RAG (GraphRAG) attempts to bridge this gap by constructing tailored graphs and enabling LLMs to reason on them. However, these methods often depend on ad-hoc graph designs, heuristic search, or costly agent pipelines, which hinder scalability and generalization. To address these challenges, we present G-reasoner, a unified framework that integrates graph and language foundation models for scalable reasoning over diverse graph-structured knowledge. Central to our approach is QuadGraph, a standardized four-layer abstraction that unifies heterogeneous knowledge sources into a common graph representation. Building on this, we introduce a 34M-parameter graph foundation model (GFM) that jointly captures graph topology and textual semantics, and is integrated with LLMs to enhance reasoning in downstream applications. To ensure scalability and efficiency, mixed-precision training and distributed message-passing are implemented to scale GFM with more GPUs. Extensive experiments on six benchmarks show that G-reasoner consistently outperforms state-of-the-art baselines, significantly enhances LLM reasoning, and achieves strong efficiency and cross-graph generalization.

[234] Training-Free Time Series Classification via In-Context Reasoning with LLM Agents

Songyuan Sui, Zihang Xu, Xia Hu

Main category: cs.AI

Abstract: Time series classification (TSC) spans diverse application scenarios, yet labeled data are often scarce, making task-specific training costly and inflexible. Recent reasoning-oriented large language models (LLMs) show promise in understanding temporal patterns, but purely zero-shot usage remains suboptimal. We propose FETA, a multi-agent framework for training-free TSC via exemplar-based in-context reasoning. FETA decomposes a multivariate series into channel-wise subproblems, retrieves a few structurally similar labeled examples for each channel, and leverages a reasoning LLM to compare the query against these exemplars, producing channel-level labels with self-assessed confidences; a confidence-weighted aggregator then fuses all channel decisions. This design eliminates the need for pretraining or fine-tuning, improves efficiency by pruning irrelevant channels and controlling input length, and enhances interpretability through exemplar grounding and confidence estimation. On nine challenging UEA datasets, FETA achieves strong accuracy under a fully training-free setting, surpassing multiple trained baselines. These results demonstrate that a multi-agent in-context reasoning framework can transform LLMs into competitive, plug-and-play TSC solvers without any parameter training. The code is available at https://github.com/SongyuanSui/FETATSC.
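
FETA's final fusion step, combining per-channel labels weighted by self-assessed confidence, admits a simple reading: sum confidences per label and take the argmax. The rule and the example votes below are a plausible sketch, not the paper's exact aggregator:

```python
from collections import defaultdict

def aggregate(channel_votes):
    """Fuse per-channel (label, confidence) decisions by summing the
    confidences assigned to each label and returning the best-supported
    label. One plausible confidence-weighted aggregator, assumed here."""
    scores = defaultdict(float)
    for label, confidence in channel_votes:
        scores[label] += confidence
    return max(scores, key=scores.get)

# Hypothetical channel-level outputs from the reasoning LLM.
pred = aggregate([("walking", 0.9), ("running", 0.6), ("walking", 0.4)])
```

Two moderately confident channels can thus outvote one confident channel, which is the intended benefit of carrying confidences instead of hard votes.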

[235] A hybrid solution approach for the Integrated Healthcare Timetabling Competition 2024

Daniela Guericke, Rolf van der Hulst, Asal Karimpour, Ieke Schrader, Matthias Walter

Main category: cs.AI

Abstract: In this work, we present the solution approach for the Integrated Healthcare Timetabling Competition 2024 submitted by Team Twente, which ultimately ranked third among the finalists. Our approach combines mixed-integer programming, constraint programming, and simulated annealing in a 3-phase solution approach based on decomposition into subproblems. In addition to describing our approach and design decisions, we share our insights and, for the first time, lower bounds on the optimal solution values for the benchmark instances. We analyze the results based on solution quality for the competition and for an extended runtime. Additionally, we investigate the different soft constraints and specific parts of the algorithm. Finally, we highlight open problems and future research directions for further improving the approach.
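
Of the three ingredients, the simulated-annealing component has a canonical core loop worth recalling; the toy objective, neighbourhood, and cooling schedule below are ours, since the abstract does not describe them:

```python
import math
import random

def simulated_annealing(cost, neighbour, x0, t0=1.0, alpha=0.995, steps=2000):
    """Generic simulated annealing: always accept improvements, accept
    worsening moves with probability exp(-delta / t), and cool t geometrically.
    Parameters here are illustrative defaults, not Team Twente's settings."""
    rng = random.Random(42)
    x, t = x0, t0
    best = x
    for _ in range(steps):
        y = neighbour(x, rng)
        delta = cost(y) - cost(x)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = y
        if cost(x) < cost(best):
            best = x
        t *= alpha
    return best

# Toy instance: minimise (x - 3)^2 over integers via +/-1 moves.
sol = simulated_annealing(
    cost=lambda x: (x - 3) ** 2,
    neighbour=lambda x, rng: x + rng.choice((-1, 1)),
    x0=20,
)
```

In a decomposition-based scheme like the one described, a loop of this shape would refine each subproblem's schedule after the MIP/CP phases fix the coarse structure.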

[236] E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory

Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2601.21714 returned HTTP 429 (rate limited).

[237] The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning

Henry Han, Xiyang Liu, Xiaodong Wang, Fei Han, Xiaodong Li

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2602.13595 returned HTTP 429 (rate limited).

[238] HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

Xiaochen Zhao, Kaikai Wang, Xiaowen Zhang, Chen Yao, Aili Wang

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2602.13933 returned HTTP 429 (rate limited).

[239] Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2603.25719 returned HTTP 429 (rate limited).

[240] The Topology of Multimodal Fusion: Why Current Architectures Fail at Creative Cognition

Xiujiang Tan

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.04465 returned HTTP 429 (rate limited).

[241] Compiling Deterministic Structure into SLM Harnesses

Zan Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.17450 returned HTTP 429 (rate limited).

[242] Language models recognize dropout and Gaussian noise applied to their activations

Damiano Fornasiere, Mirko Bronzi, Spencer Kitts, Alessandro Palmas, Yoshua Bengio, Oliver Richardson

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.17465 returned HTTP 429 (rate limited).

[243] Error-free Training for MedMNIST Datasets

Bo Deng

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.18916 returned HTTP 429 (rate limited).

[244] Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha, Samarjit Chakraborty, Cecilia Carbonelli, Akash Kumar

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.27807 returned HTTP 429 (rate limited).

[245] D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Hanane Nour Moussa, Yifei Li, Zhuoyang Li, Yankai Yang, Cheng Tang, Tianshu Zhang, Nesreen K. Ahmed, Ali Payani, Ziru Chen, Huan Sun

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.27977 returned HTTP 429 (rate limited).

[246] Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Qiyuan Zhu, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan

Main category: cs.AI

Abstract: Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the resulting graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm for constructing evolution chains that trace the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging field of automated scientific discovery.

[247] Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs

Washim Uddin Mondal, Vaneet Aggarwal

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2408.11513 returned HTTP 429 (rate limited).

[248] Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey

Hugo Attali, Davide Buscaldi, Nathalie Pernelle, Fragkiskos D. Malliaros

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2411.17429 returned HTTP 429 (rate limited).

[249] Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments

Luca Castri, Gloria Beraldo, Nicola Bellotto

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2504.11901 returned HTTP 429 (rate limited).

[250] Effective LLM Code Refinement via Property-Oriented and Structurally Minimal Feedback

Lehan He, Zeren Chen, Zhe Zhang, Xiang Gao, Lu Sheng

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2506.18315 returned HTTP 429 (rate limited).

[251] Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2508.06361 returned HTTP 429 (rate limited).

Arshia Akhavan, Alireza Hosseinpour, Abbas Heydarnoori, Hamid Bagheri, Mehdi Keshani

Main category: cs.AI

Abstract: not available (arXiv 2508.12232; fetch failed with HTTP 429).

[253] Vibe Coding in Product Teams: Reconfiguring AI-Assisted Workflows, Prototyping, and Collaboration

Jie Li, Youyang Hou, Laura Lin, Ruihao Zhu, Hancheng Cao, Abdallah El Ali

Main category: cs.AI

Abstract: not available (arXiv 2509.10652; fetch failed with HTTP 429).

[254] LLM DNA: Tracing Model Evolution via Functional Representations

Zhaomin Wu, Haodong Zhao, Ziyang Wang, Jizhou Guo, Qian Wang, Bingsheng He

Main category: cs.AI

Abstract: not available (arXiv 2509.24496; fetch failed with HTTP 429).

[255] Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression

Lorenzo Nikiforos, Luciano Prono, Charalampos Antoniadis, Fabio Pareschi, Riccardo Rovatti, Gianluca Setti

Main category: cs.AI

Abstract: not available (arXiv 2510.09696; fetch failed with HTTP 429).

[256] ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Charidimos Papadakis, Angeliki Dimitriou, Giorgos Filandrianos, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou

Main category: cs.AI

Abstract: not available (arXiv 2510.15949; fetch failed with HTTP 429).

[257] Feedback Lunch: Learned Feedback Codes for Secure Communications

Yingyao Zhou, Natasha Devroye, Onur Günlü

Main category: cs.AI

Abstract: not available (arXiv 2510.16620; fetch failed with HTTP 429).

[258] MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

Main category: cs.AI

Abstract: not available (arXiv 2510.17281; fetch failed with HTTP 429).

[259] Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

Md. Mehedi Hasan, Sk Tanzir Mehedi, Ziaur Rahman, Rafid Mostafiz, Md. Abir Hossain

Main category: cs.AI

Abstract: not available (arXiv 2510.22628; fetch failed with HTTP 429).

[260] TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation

Baris Yilmaz, Bevan Deniz Cilgin, Erdem Akagündüz, Salih Tileylioglu

Main category: cs.AI

Abstract: not available (arXiv 2512.04694; fetch failed with HTTP 429).

[261] AI-Driven Expansion and Application of the Alexandria Database

Théo Cavignac, Jonathan Schmidt, Pierre-Paul De Breuck, Antoine Loew, Tiago F. T. Cerqueira, Hai-Chen Wang, Anton Bochkarev, Yury Lysogorskiy, Aldo H. Romero, Ralf Drautz, Silvana Botti, Miguel A. L. Marques

Main category: cs.AI

Abstract: not available (arXiv 2512.09169; fetch failed with HTTP 429).

[262] Evolutionary BP+OSD Decoding for Low-Latency Quantum Error Correction

Hee-Youl Kwak, Seong-Joon Park, Hyunwoo Jung, Jeongseok Ha, Jae-Won Kim

Main category: cs.AI

Abstract: not available (arXiv 2512.18273; fetch failed with HTTP 429).

[263] Adoption and Use of LLMs at an Academic Medical Center

Nigam H. Shah, Nerissa Ambers, Abby Pandya, Timothy Keyes, Juan M. Banda, Srikar Nallan, Carlene Lugtu, Artem A. Trotsyuk, Suhana Bedi, Alyssa Unell, Miguel Fuentes, Francois Grolleau, Sneha S. Jain, Jonathan Chen, Devdutta Dash, Danton Char, Aditya Sharma, Duncan McElfresh, Patrick Scully, Vishanthan Kumar, Clancy Dennis, Connor OBrien, Satchi Mouniswamy, Elvis Jones, Krishna Jasti, Gunavathi Mannika Lakshmanan, Sree Ram Akula, Varun Kumar Singh, Ramesh Rajmanickam, Sudhir Sinha, Vicky Zhou, Xu Wang, Bilal Mawji, Joshua Ge, Wencheng Li, Travis Lyons, Jarrod Helzer, Vikas Kakkar, Ramesh Powar, Darren Batara, Cheryl Cordova, William Frederick III, Olivia Tang, Phoebe Morgan, April S. Liang, Stephen P. Ma, Shivam Vedak, Dong-han Yao, Akshay Swaminathan, Mehr Kashyap, Brian Ng, Jamie Hellman, Nikesh Kotecha, Christopher Sharp, Gretchen Brown, Christian Lindmark, Anurang Revri, Michael A. Pfeffer

Main category: cs.AI

Abstract: not available (arXiv 2602.00074; fetch failed with HTTP 429).

[264] BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

Abdullah Arafat Miah, Kevin Vu, Yu Bi

Main category: cs.AI

Abstract: not available (arXiv 2602.07200; fetch failed with HTTP 429).

[265] Knowledge-Based Design Requirements for Generative Social Robots in Higher Education

Stephan Vonschallen, Dominique Oberle, Theresa Schmiedel, Friederike Eyssel

Main category: cs.AI

Abstract: not available (arXiv 2602.12873; fetch failed with HTTP 429).

[266] Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising

Yuduo Guo, Hao Zhang, Mingyu Li, Fujiang Yu, Yunjing Wu, Yuhan Hao, Song Huang, Yongming Liang, Xiaojing Lin, Xinyang Li, Jiamin Wu, Zheng Cai, Qionghai Dai

Main category: cs.AI

Abstract: not available (arXiv 2602.17205; fetch failed with HTTP 429).

[267] GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables

Zhengyu Li, Xiangfei Qiu, Yuhan Zhu, Xingjian Wu, Jilin Hu, Chenjuan Guo, Bin Yang

Main category: cs.AI

Abstract: not available (arXiv 2603.08032; fetch failed with HTTP 429).

[268] Semantic Level of Detail for Knowledge Graphs: Discovering Abstraction Boundaries via Spectral Heat Diffusion

Edward Izgorodin

Main category: cs.AI

Abstract: not available (arXiv 2603.08965; fetch failed with HTTP 429).

[269] GenRecEdit: Adapting Model Editing for Generative Recommendation with Cold-Start Items

Chenglei Shen, Teng Shi, Weijie Yu, Xiao Zhang, Jun Xu

Main category: cs.AI

Abstract: not available (arXiv 2603.14259; fetch failed with HTTP 429).

[270] Degrees, Levels, and Profiles of Contextuality

Ehtibar N. Dzhafarov, Victor H. Cervantes

Main category: cs.AI

Abstract: not available (arXiv 2603.26692; fetch failed with HTTP 429).

[271] A First Guess is Rarely the Final Answer: Learning to Search in the Traveling Salesperson Problem

Andoni Irazusta Garmendia

Main category: cs.AI

Abstract: not available (arXiv 2604.06940; fetch failed with HTTP 429).

[272] Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization

Tao Li, Kaiyuan Hou, Tuan Vinh, Monika Raj, Zhichun Guo, Carl Yang

Main category: cs.AI

Abstract: not available (arXiv 2604.07669; fetch failed with HTTP 429).

[273] Bridging the Experimental Last Mile: Digitizing Laboratory Know-How for Safe AI-Assisted Support

Akira Miura, Yuki Sasahara, Momoka Demura, Yuji Masubuchi, Tetsuya Asai, Chikahiko Mitsui

Main category: cs.AI

Abstract: not available (arXiv 2604.16345; fetch failed with HTTP 429).

[274] Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang

Main category: cs.AI

Abstract: not available (arXiv 2604.18239; fetch failed with HTTP 429).

[275] Removing Sandbagging in LLMs by Training with Weak Supervision

Emil Ryd, Henning Bartsch, Julian Stastny, Joe Benton, Vivek Hebbar

Main category: cs.AI

Abstract: not available (arXiv 2604.22082; fetch failed with HTTP 429).

[276] SUDP: Secret-Use Delegation Protocol for Agentic Systems

Xiaohang Yu, Hejia Geng, Xinmeng Zeng, William Knottenbelt

Main category: cs.AI

Abstract: not available (arXiv 2604.24920; fetch failed with HTTP 429).

[277] Learning Rate Transfer in Normalized Transformers

Boris Shigida, Boris Hanin, Andrey Gromov

Main category: cs.AI

Abstract: not available (arXiv 2604.27077; fetch failed with HTTP 429).

[278] Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

Halley Young, Nikolaj Björner

Main category: cs.AI

Abstract: not available (arXiv 2604.27209; fetch failed with HTTP 429).

[279] AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

Xubin Luo, Cheng Yang

Main category: cs.AI

Abstract: not available (arXiv 2604.27855; fetch failed with HTTP 429).

cs.SD

[280] Alethia: A Foundational Encoder for Voice Deepfakes

Yi Zhu, Brahmi Dwivedi, Jayaram Raghuram, Surya Koppisetti

Main category: cs.SD

Abstract: Existing voice deepfake detection and localization models rely heavily on representations extracted from speech foundation models (SFMs). However, downstream finetuning has now reached a state of diminishing returns. In this paper, we shift the focus to pretraining and propose a novel recipe that combines bottleneck masked embedding prediction with flow-matching based spectrogram reconstruction. The outcome, Alethia, is the first foundational audio encoder for various voice deepfake detection and localization tasks. We evaluate on 5 different tasks with 56 benchmark datasets and find that Alethia significantly outperforms state-of-the-art SFMs, with superior robustness to real-world perturbations and zero-shot generalization to unseen domains (e.g., singing deepfakes). We also demonstrate the limitation of discrete targets in masked token prediction, and show the importance of continuous embedding prediction and generative pretraining for capturing deepfake artifacts.
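The recipe's first ingredient, masked prediction with continuous embedding targets, can be sketched in a few lines. Everything below is a hypothetical stand-in (random teacher embeddings, a noisy predictor, a 40% masking ratio), not Alethia's actual components:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 100, 16
targets = rng.normal(size=(T, D))                # continuous teacher embeddings per frame
mask = rng.random(T) < 0.4                       # frames hidden from the student encoder
pred = targets + 0.1 * rng.normal(size=(T, D))   # stand-in for the student's predictions

# Continuous masked embedding prediction: regress the hidden frames'
# embeddings directly (here with MSE), rather than classifying discrete
# token targets as in masked token prediction.
loss = np.mean((pred[mask] - targets[mask]) ** 2)
```

In a real setup `pred` would be produced from the visible frames only, and the loss would be backpropagated through the student encoder.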

[281] Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation

Kuan-Po Huang, Bo-Ru Lu, Byeonggeun Kim, Mihee Lee, Zalan Fabian, Renard Korzeniowski, Qingming Tang, Greg Ver Steeg, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

Main category: cs.SD

Abstract: Autoregressive (AR) models with diffusion heads have recently achieved strong text-to-audio performance, yet their iterative decoding and multi-step sampling process introduce high latency. To address this bottleneck, we propose a one-step sampling framework that combines an energy-distance training objective with representation-level distillation. An energy-scoring head maps Gaussian noise directly to audio latents in one step, eliminating the need for a costly recursive diffusion sampling process, while distillation from a masked autoregressive (MAR) text-to-audio model preserves the strong conditioning learned during diffusion training. On the AudioCaps benchmark, our method consistently outperforms prior one-step baselines such as ConsistencyTTA, SoundCTM, AudioLCM, and AudioTurbo on both objective and subjective metrics, while substantially narrowing the quality gap to AR diffusion systems with multi-step sampling. Compared to the state-of-the-art AR diffusion system, IMPACT, our approach achieves up to 8.5x faster batch inference with highly competitive audio quality. These results demonstrate that combining energy-distance training with representation-level distillation provides an effective recipe for fast, high-quality text-to-audio synthesis.
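The energy-distance objective mentioned above is typically built on the energy distance between sample batches. A minimal numpy estimator of that quantity, assuming the standard pairwise form (the abstract does not spell out the exact variant used), looks like:

```python
import numpy as np

def energy_distance(x, y):
    """Sample estimate of the energy distance between two batches.

    E(X, Y) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||,
    which is zero iff the two distributions coincide. This simple
    estimator includes the zero diagonal in the within-batch terms,
    which is slightly biased but fine for illustration.
    """
    def mean_pairwise(a, b):
        # mean Euclidean distance over all pairs (a_i, b_j)
        diff = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diff, axis=-1).mean()

    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

rng = np.random.default_rng(0)
# Batches from the same distribution score near zero...
same = energy_distance(rng.normal(size=(256, 8)), rng.normal(size=(256, 8)))
# ...while a mean shift produces a clearly larger distance.
shifted = energy_distance(rng.normal(size=(256, 8)),
                          rng.normal(loc=2.0, size=(256, 8)))
```

Training an energy-scoring head then amounts to minimizing this distance between generated and reference latents, which is what lets a single sampling step stand in for the diffusion chain.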

[282] MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation

Akira Takahashi, Ryosuke Sawata, Shusuke Takahashi, Yuki Mitsufuji

Main category: cs.SD

Abstract: Although recent video-to-audio (V2A) models excel at synthesizing semantically plausible sounds from visual inputs, they do not explicitly model room-acoustic effects such as reverberation or room impulse responses (RIRs), and thus offer limited controllability over these effects. We hypothesize, however, that such V2A models implicitly encode semantic knowledge of the relationship between spatial audio and the corresponding visual cues. In this paper, we revisit a V2A model from this perspective and propose a way to use the pretrained model as a prior for physically grounded room-acoustic processing. Building on MMAudio, one of the state-of-the-art V2A models, we propose MMAudioReverbs, a unified framework that handles i) dereverberation and ii) room impulse response (RIR) estimation without architectural modification, fine-tuned on only a small dataset. Experimental results show that audio and visual cues each have an advantage depending on the type of physical room acoustics, implying that foundation V2A models can be used for physically grounded room-acoustic analysis.

[283] GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models

Zuyao You, Zhesong Yu, Mingyu Liu, Bilei Zhu, Yuan Wan, Zuxuan Wu

Main category: cs.SD

Abstract: In this paper, we propose GaMMA, a state-of-the-art (SoTA) large multimodal model (LMM) designed to achieve comprehensive musical content understanding. GaMMA inherits the streamlined encoder-decoder design of LLaVA, enabling effective cross-modal learning between music and language. By incorporating audio encoders in a mixture-of-experts manner, GaMMA effectively unifies both time-series and non-time-series music understanding tasks within one set of parameters. Our approach combines carefully curated datasets at scale with a progressive training pipeline, effectively pushing the boundaries of music understanding via pretraining, supervised fine-tuning (SFT), and reinforcement learning (RL). To comprehensively assess both temporal and non-temporal capability of music LMMs, we introduce MusicBench, the largest music-oriented benchmark, comprising 3,739 human-curated multiple-choice questions covering diverse aspects of musical understanding. Extensive experiments demonstrate that GaMMA establishes new SoTA in the music domain, achieving 79.1% accuracy on MuchoMusic, 79.3% on MusicBench-Temporal, and 81.3% on MusicBench-Global, consistently outperforming previous methods.

[284] MMAudio-LABEL: Audio Event Labeling via Audio Generation for Silent Video

Kazuya Tateishi, Akira Takahashi, Atsuo Hiroe, Hirofumi Takeda, Shusuke Takahashi, Yuki Mitsufuji

Main category: cs.SD

Abstract: Recent advances in multimodal generation have enabled high-quality audio generation from silent videos. Practical applications, such as sound production, demand not only the generated audio but also explicit sound event labels detailing the type and timing of sounds. One straightforward approach is to apply standard sound event detection to the generated audio. However, this post-hoc pipeline is inherently limited, as it is prone to error accumulation. To address this limitation, we propose MMAudio-LABEL (LAtent-Based Event Labeling), an event-aware audio generation framework, built on a foundational audio generation model as its backbone, that jointly generates audio and frame-aligned sound event predictions from silent videos. We evaluate our method on the Greatest Hits dataset for onset detection and 17-class material classification. Our approach improves onset-detection accuracy from 46.7% to 75.0% and material-classification accuracy from 40.6% to 61.0% over baselines. These results suggest that jointly learning audio generation and event prediction enables more interpretable and practical video-to-audio synthesis.

[285] Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation

Anton Ratnarajah, Mehmet Ergezer, Arun Nair, Mrudula Athi

Main category: cs.SD

Abstract: The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) data for improving SDE model performance. This challenge at GenDARA involves generating RIRs to supplement sparse datasets and fine-tuning SDE models with the augmented data. We employ the open-source fast diffuse room impulse response generator (FastRIR) conditioned only on speaker and listener locations. We design a quality filter to ensure generated RIR alignment with challenge RIRs, and hyperparameter optimization is employed for model fine-tuning. Our approach reduces the mean absolute error (MAE) of the five positions from 1.66m to 0.6m for GWA rooms and from 2.18m to 0.69m for Treble rooms, with results demonstrating that the augmentation approach significantly improves estimation accuracy, particularly at medium to long distances.

[286] LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Venkata Pushpak Teja Menta

Main category: cs.SD

Abstract: A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, Hindi, Telugu, and Tamil, WavLM-base-plus-sv loses 0.082 absolute cosine similarity when the same voice changes script and ECAPA-TDNN loses 0.105. On a 1369-pair Indian-accented voice corpus, the gap shrinks to 0.006 (WavLM-SV) and 0.044 (ECAPA-TDNN). The leak is largest where it matters most for cross-script TTS: when a system projects a non-Indic-trained voice into Indic scripts. We present LASE (Language-Adversarial Speaker Encoder), a small projection head over frozen WavLM-base-plus trained with two losses: a supervised contrastive loss over voice identity, and a gradient-reversal cross-entropy against a 4-language classifier that pushes the embedding to be language-uninformative while remaining speaker-informative. Trained on 1118 quality-gated cross-script pairs synthesised from 8 commercial multilingual voices, LASE’s residual gap is consistent with zero on both corpora (Delta = 0.013 Western, Delta = 0.026 Indian; both bootstrap 95% CIs include zero) and amplifies the cross-script-vs-floor margin 2.4-2.7x over both baselines. An ECAPA+GRL ablation shows the GRL objective improves either backbone but the WavLM choice contributes too. In synthetic multi-speaker diarisation, LASE matches ECAPA-TDNN on cross-script speaker recall (0.788 vs 0.789) with ~100x less training data. We release the r1 checkpoint, both corpora, and the bootstrap recipe.
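The gradient-reversal cross-entropy at the heart of LASE uses the standard Gradient Reversal Layer (GRL): identity in the forward pass, sign-flipped (and λ-scaled) gradient in the backward pass, so the shared encoder is trained to *defeat* the language classifier. A minimal sketch of that mechanism, with hypothetical gradient values standing in for real backprop:

```python
import numpy as np

def grl_forward(x):
    # Gradient Reversal Layer: identity in the forward pass
    return x

def grl_backward(grad_from_language_head, lam=1.0):
    # Backward pass flips the sign (scaled by lambda), so the shared
    # encoder is pushed to *increase* the language classifier's loss,
    # making the embedding language-uninformative.
    return -lam * grad_from_language_head

# Toy update for a shared embedding z: the speaker-contrastive gradient
# pulls toward speaker-discriminative directions, while the reversed
# language gradient pushes away from language-discriminative ones.
z = np.zeros(4)
g_speaker = np.array([0.5, 0.0, -0.2, 0.1])    # hypothetical gradients
g_language = np.array([0.0, 0.4, 0.0, -0.3])
lr = 0.1
z_new = z - lr * (g_speaker + grl_backward(g_language, lam=1.0))
```

In a real implementation the GRL sits between the frozen-backbone projection head and the 4-language classifier, and λ trades off language invariance against speaker discriminability.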

[287] Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement

Szu-Wei Fu, Rong Chao, Xuesong Yang, Sung-Feng Huang, Ryandhimas E. Zezario, Rauf Nasretdinov, Ante Jukić, Yu Tsao, Yu-Chiang Frank Wang

Main category: cs.SD

Abstract: not available (arXiv 2603.02641; fetch failed with HTTP 429).

[288] Environmental Sound Deepfake Detection Using Deep-Learning Framework

Lam Pham, Khoi Vu, Dat Tran, Phat Lam, Vu Nguyen, David Fischinger, Son Le

Main category: cs.SD

Abstract: In this paper, we propose a deep-learning framework for environmental sound deepfake detection (ESDD): the task of identifying whether the sound scene and sound events in an input audio recording are fake. To this end, we conducted extensive experiments exploring how individual spectrograms, a wide range of network architectures and pre-trained models, and ensembles of spectrograms or network architectures affect ESDD performance. The experimental results on the benchmark datasets EnvSDD and ESDD-Challenge-TestSet indicate that detecting deepfake audio of sound scenes and detecting deepfake audio of sound events should be treated as separate tasks. We also show that finetuning a pre-trained model is more effective than training a model from scratch for ESDD. Our best model, finetuned from the pre-trained WavLM model with the proposed three-stage training strategy, achieves an accuracy of 0.98, an F1 score of 0.95, and an AUC of 0.99 on the EnvSDD test subset, and an accuracy of 0.88, an F1 score of 0.77, and an AUC of 0.92 on the ESDD-Challenge-TestSet.

cs.LG

[289] Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

Pragya Sharma, Hang Qiu, Mani Srivastava

Main category: cs.LG

Abstract: The increasing deployment of deep neural networks (DNNs) in cyber-physical systems (CPS) enhances perception fidelity, but imposes substantial computational demands on execution platforms, posing challenges to real-time control deadlines. Traditional distributed CPS architectures typically favor on-device inference to avoid network variability and contention-induced delays on remote platforms. However, this design choice places significant energy and computational demands on the local hardware. In this work, we revisit the assumption that cloud-based inference is intrinsically unsuitable for latency-sensitive control tasks. We demonstrate that, when provisioned with high-throughput compute resources, cloud platforms can effectively amortize network and queueing delays, enabling them to match or surpass on-device performance for real-time decision-making. Specifically, we develop a formal analytical model that characterizes distributed inference latency as a function of the sensing frequency, platform throughput, network delay, and task-specific safety constraints. We instantiate this model in the context of emergency braking for autonomous driving and validate it through extensive simulations using real-time vehicular dynamics. Our empirical results identify concrete conditions under which cloud-based inference adheres to safety margins more reliably than its on-device counterpart. These findings challenge prevailing design strategies and suggest that the cloud is not merely a feasible option, but often the preferred inference location for distributed CPS architectures. In this light, the cloud is not as distant as traditionally perceived; in fact, it is closer than it appears.
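The analytical model described above decomposes per-frame latency into transport, queueing, and service components and checks it against a control deadline. A minimal sketch of that decomposition follows; the function names and all numbers (20 ms RTT, 10x cloud throughput, etc.) are illustrative assumptions, not the paper's calibrated parameters:

```python
def end_to_end_latency(network_rtt, queue_wait, throughput):
    """Per-frame inference latency: transport + queueing + service time.

    `throughput` is the frames/second the platform sustains, so the
    service time per frame is 1/throughput. All times in seconds.
    """
    return network_rtt + queue_wait + 1.0 / throughput

def meets_deadline(latency, sensing_period, safety_margin):
    # The result must arrive before the next sensing tick, minus a
    # task-specific safety margin (e.g., braking-reaction budget).
    return latency <= sensing_period - safety_margin

# On-device: zero network cost but modest throughput (20 fps).
on_device = end_to_end_latency(network_rtt=0.0, queue_wait=0.0, throughput=20.0)
# Cloud: pays ~20 ms RTT plus queueing, but 10x the throughput
# amortizes it -- the core argument of the paper.
cloud = end_to_end_latency(network_rtt=0.020, queue_wait=0.005, throughput=200.0)
```

Under these toy numbers the cloud path (30 ms) beats the on-device path (50 ms), illustrating how sufficient provisioned throughput can absorb the network delay within a safety budget.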

[290] FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources

Md Sirajul Islam, Isabelle G Chapman, N I Md Ashafuddula, Xu Yuan, Li Chen, Nian-Feng Tzeng, Klara Nahrstedt

Main category: cs.LG

Abstract: Federated Learning (FL) enables collaborative intelligence across decentralized data source devices in a privacy-preserving way. While substantial research attention has been drawn to optimizing the learning process for an individual task, real-world applications increasingly require multiple machine learning tasks simultaneously training their models across a shared pool of devices. Naively applying single-FL optimization techniques in multi-FL systems results in suboptimal system performance, particularly due to device heterogeneity and resource inefficiency. To address such a critical open challenge, we introduce FedACT, a novel resource heterogeneity-aware device scheduling approach designed to efficiently schedule heterogeneous devices across multiple concurrent FL jobs, with the goal of minimizing their average job completion time (JCT). FedACT dynamically assigns devices to FL jobs based on an alignment scoring mechanism that evaluates the compatibility between available resources of devices and resource demands of jobs. Additionally, it incorporates participation fairness to ensure balanced contributions from devices across jobs, further enhancing the accuracy levels of learned global models. An optimal scheduling plan is formulated in FedACT by prioritizing devices with higher alignment scores, while ensuring fair participation across jobs. To evaluate the effectiveness of the proposed scheduling algorithm, we carried out comprehensive experiments using diverse FL jobs and benchmark datasets. Experimental results demonstrate that FedACT reduces the average JCT by up to 8.3× and improves model accuracy by up to 44.5%, compared to the state-of-the-art baselines.
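The abstract does not give FedACT's scoring formula; the following is a minimal sketch of the general idea, assuming a hypothetical score (how well a device's resources cover a job's demands) and a per-device assignment cap as a crude participation-fairness proxy.

```python
# Hedged sketch of alignment-score device scheduling in the spirit of FedACT.
# The scoring function and fairness rule are illustrative assumptions, not the
# paper's formulation.

def alignment_score(device, job):
    """Higher when the device's resources cover the job's demands."""
    return min(device["flops"] / job["flops_demand"],
               device["bandwidth"] / job["bw_demand"])

def schedule(devices, jobs, max_assignments_per_device=1):
    """Greedy: repeatedly grant the highest-scoring (device, job) pair,
    capping per-device load as a crude participation-fairness proxy."""
    pairs = sorted(((alignment_score(d, j), d["id"], j["id"])
                    for d in devices for j in jobs), reverse=True)
    load = {d["id"]: 0 for d in devices}
    assignment = {}
    for score, dev_id, job_id in pairs:
        if job_id not in assignment and load[dev_id] < max_assignments_per_device:
            assignment[job_id] = dev_id
            load[dev_id] += 1
    return assignment

devices = [{"id": "d1", "flops": 8.0, "bandwidth": 4.0},
           {"id": "d2", "flops": 2.0, "bandwidth": 8.0}]
jobs = [{"id": "vision", "flops_demand": 4.0, "bw_demand": 1.0},
        {"id": "speech", "flops_demand": 1.0, "bw_demand": 4.0}]
print(schedule(devices, jobs))
```

Here the compute-heavy job lands on the compute-rich device and the bandwidth-heavy job on the bandwidth-rich one; FedACT's actual optimization is, per the abstract, more involved than this greedy pass.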

[291] What Physics do Data-Driven MoCap-to-Radar Models Learn?

Kevin Chen, Kenneth W. Parker, Anish Arora

Main category: cs.LG

Abstract: Data-driven MoCap-to-radar models generate plausible micro-Doppler spectrograms, but do they actually learn the underlying physics? We introduce a physics-based interpretability framework to answer this question via two proposed complementary metrics: one measures alignment between model predictions and the physics-derived Doppler frequency, while the other tests whether predictions preserve the velocity-frequency relationship under velocity intervention. Both metrics require only MoCap input and model predictions, without access to measured radar data. Experiments across several model architectures reveal that low reconstruction error does not guarantee physical consistency: some, but not all, models achieve low error yet perform poorly on the two physics-based metrics. Further analysis shows that temporal attention is critical for transformer-based models to learn the underlying physics.

[292] AirFM-DDA: Air-Interface Foundation Model in the Delay-Doppler-Angle Domain for AI-Native 6G

Kejia Bian, Meixia Tao, Jianhua Mo, Zhiyong Chen, Leyan Chen

Main category: cs.LG

Abstract: The success of large foundation models is catalyzing a new paradigm for AI-native 6G network design: wireless foundation models for physical layer design. However, existing models often operate on channel state information (CSI) in the space-time-frequency (STF) domain, where distinct multipath components are inherently superimposed and structurally entangled. This hinders the learning of universal channel representations. Meanwhile, their reliance on global attention mechanisms incurs prohibitive computational overhead. In this paper, we propose AirFM-DDA, an Air-interface Foundation Model operating in the Delay-Doppler-Angle (DDA) domain for physical-layer tasks. Specifically, AirFM-DDA reparameterizes CSI from the STF domain into the DDA domain to explicitly resolve multipath components along physically meaningful axes. It employs a window-based attention module augmented with frame-structure-aware positional encoding (FS-PE). This window-based attention aligns with locally clustered multipath dependencies while avoiding quadratic-complexity global attention, and FS-PE injects frame-structure priors into the network. Extensive experiments demonstrate that AirFM-DDA achieves superior zero-shot generalization across unseen scenarios and datasets, consistently outperforming the baselines on channel prediction and estimation tasks. Compared to global attention, its window-based attention reduces training and inference costs by nearly an order of magnitude. Moreover, AirFM-DDA maintains robustness under high mobility, large delay spreads, severe noise, and extreme aliasing conditions.

[293] High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

Aleksandar Armacki, Haoyuan Cai, Ali H. Sayed

Main category: cs.LG

Abstract: We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent’s cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/\delta)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/\delta)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $\delta \in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
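The gradient tracking update the paper analyzes follows a standard two-recursion pattern ($x_{t+1} = Wx_t - \alpha y_t$, $y_{t+1} = Wy_t + \nabla f(x_{t+1}) - \nabla f(x_t)$). A noise-free toy sketch on scalar quadratics over a ring of agents, with illustrative weights and step size (not the paper's setting):

```python
# Hedged toy sketch of decentralized gradient descent with gradient tracking
# (GT-DSGD-style updates, stochastic noise omitted) on scalar quadratics
# f_i(x) = (x - a_i)^2 / 2 over a ring of agents. The global optimum is the
# mean of the a_i; the tracker y_i estimates the network-average gradient.

def grad(a_i, x):                 # gradient of (x - a_i)^2 / 2
    return x - a_i

def mix(vals, n):                 # ring averaging: 1/2 self, 1/4 each neighbour
    return [0.5 * vals[i] + 0.25 * vals[(i - 1) % n] + 0.25 * vals[(i + 1) % n]
            for i in range(n)]

def gt_dsgd(a, alpha=0.1, steps=500):
    n = len(a)
    x = [0.0] * n
    y = [grad(a[i], x[i]) for i in range(n)]   # tracker starts at local grads
    for _ in range(steps):
        x_new = [xi - alpha * yi for xi, yi in zip(mix(x, n), y)]
        y = [yi + grad(a[i], x_new[i]) - grad(a[i], x[i])
             for i, yi in enumerate(mix(y, n))]
        x = x_new
    return x

a = [1.0, 3.0, 5.0, 7.0]          # global optimum is the mean, 4.0
print(gt_dsgd(a))
```

Despite each agent seeing only its own $a_i$ (unbounded heterogeneity across agents), all iterates reach consensus at the global optimum, which is the bias-correction behaviour the HP analysis certifies.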

[294] Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration

Chunlei Meng, Pengbin Feng, Rong Fu, Hoi Leong Lee, Xiaojing Du, Zhaolu Kang, Zeyu Zhang, Weilin Zhou, Chun Ouyang, Zhongxue Gan

Main category: cs.LG

Abstract: Centralized multimodal learning commonly compresses language, acoustic, and visual signals into a single fused representation for prediction. While effective, this paradigm suffers from two limitations: modality dominance, where optimization gravitates towards the path of least resistance, ignoring weaker but informative modalities, and spurious modality coupling, where models overfit to incidental cross-modal correlations. To address these, we propose Group Cognition Learning (GCL), a governed collaboration paradigm that applies a two-stage protocol after modality-specific encoding. In Stage 1 (Selective Interaction), a Routing Agent proposes directed interaction routes, and an Auditing Agent assigns sample-wise gates to emphasize exchanges that yield positive marginal predictive gain while suppressing redundant coupling. In Stage 2 (Consensus Formation), a Public-Factor Agent maintains an explicit shared factor, and an Aggregation Agent produces the final prediction through contribution-aware weighting while keeping each modality representation as a specialization channel. Extensive experiments on CMU-MOSI, CMU-MOSEI, and MIntRec demonstrate that GCL mitigates dominance and coupling, establishing state-of-the-art results across both regression and classification benchmarks. Analysis experiments further demonstrate the effectiveness of the design.

[295] Learning physically grounded traffic accident reconstruction from public accident reports

Yanchen Guan, Haicheng Liao, Chengyue Wang, Zhenning Li

Main category: cs.LG

Abstract: Traffic accidents are routinely documented in textual reports, yet physically grounded accident reconstruction remains difficult because detailed scene measurements and expert reconstructions are scarce, costly and hard to scale. Here we formulate accident reconstruction from publicly accessible reports and scene measurements as a parameterized multimodal learning problem. We construct CISS-REC, a dataset of 6,217 real-world accident cases curated from the NHTSA Crash Investigation Sampling System, and develop a reconstruction framework that grounds report semantics to road topology and participant attributes, reconstructs lane consistent pre-impact motion, and refines collision relevant interactions through localized geometric reasoning and temporal allocation. Our method outperforms representative baselines on CISS-REC, achieving the strongest overall reconstruction fidelity, including improved accident point accuracy and collision consistency. These results show that public accident reports can serve as scalable computational substrates for quantitatively verifiable accident reconstruction, with potential value for traffic safety analysis, simulation and autonomous driving research.

[296] Smart Ensemble Learning Framework for Predicting Groundwater Heavy Metal Pollution

T. Ansah-Narh, G. Y. Afrifa, J. B. Tandoh, K. Asare, M. Addi, K. E. Yorke, D. M. A. Akpoley, K. Aidoo, S. K. Fosuhene

Main category: cs.LG

Abstract: Groundwater in the Densu Basin is increasingly threatened by heavy metal contamination, but conventional methods fail to capture the statistical complexity and spatial heterogeneity of pollution indicators. A key challenge is modelling the Heavy Metal Pollution Index (HPI), which is typically skewed and affected by correlated contaminants, leading to biased predictions without transformation. This study develops a predictive framework integrating response transformations with nested cross-validated ensemble machine learning. Three transformations (raw, log, and Gaussian copula) were applied to HPI and evaluated across six learners: support vector regression (SVM), $k$-nearest neighbours (k-NN), CART, Elastic Net, kernel ridge regression, and a stacked Lasso ensemble. Raw-scale models produced deceptively high fits (Elastic Net and stacked ensemble $R^2 \approx 1.0$), suggesting over-optimism. The log transformation stabilised variance (SVM: $R^2 = 0.93$, RMSE $= 0.18$; k-NN: $R^2 = 0.92$, RMSE $= 0.20$). The Gaussian copula gave the most reliable results: stacked ensemble $R^2 = 0.96$ (RMSE $= 0.19$), with other learners maintaining high accuracy. Copula-based models improved residuals and produced spatially plausible maps. DBSCAN clustering revealed Fe and Mn as primary HPI contributors, consistent with regional hydrogeochemistry. Limitations include reliance on random (not spatial) cross-validation and basin-specific scope. Future work should explore spatial validation and other geological settings. Overall, distribution-aware ensembles with clustering diagnostics offer robust, interpretable assessments of groundwater contamination.

[297] Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Shradha Sharma, Swapnil Dhamal, Shweta Jain

Main category: cs.LG

Abstract: We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that the $K$-Shapley value is the unique solution concept that satisfies the symmetry, linearity, null-player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates the $K$-Shapley value with an unknown valuation function. Unlike standard bandit literature on full-bandit feedback, K-SVFair-FBF not only learns the valuation function under the full-feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves an $O(T^{3/4})$ bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.
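The abstract does not spell out the $K$-Shapley estimator; a minimal Monte Carlo sketch of the general idea, assuming a simple sampling scheme (uniform coalition sizes up to $K-1$, no Shapley-style size weighting) that is an illustration rather than the paper's exact definition:

```python
# Hedged Monte Carlo sketch of a size-restricted ("K-") Shapley estimate:
# average the marginal contribution of arm i over random coalitions of size at
# most K-1 drawn from the other arms. Sampling scheme and weighting are
# illustrative assumptions, not the paper's K-Shapley definition.
import random

def k_shapley_mc(value, arms, i, K, samples=2000, rng=None):
    rng = rng or random.Random(0)
    others = [a for a in arms if a != i]
    total = 0.0
    for _ in range(samples):
        size = rng.randint(0, min(K - 1, len(others)))
        coalition = rng.sample(others, size)
        total += value(coalition + [i]) - value(coalition)
    return total / samples

# For an additive valuation v(S) = sum of per-arm rewards, every marginal
# contribution of arm i equals its own reward, so the estimate is exact.
rewards = {0: 1.0, 1: 2.0, 2: 4.0, 3: 0.5}
v = lambda S: sum(rewards[a] for a in S)
print(k_shapley_mc(v, list(rewards), i=2, K=3))
```

The additive case is a useful sanity check precisely because the size restriction has no effect there; for non-additive valuations the coalition-size cap is what distinguishes the $K$-Shapley value from the classical one.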

[298] Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise

Mohammad Partohaghighi

Main category: cs.LG

Abstract: Information-theoretic generalization bounds analyze stochastic optimization by relating expected generalization error to the mutual information between learned parameters and training data. Virtual perturbation analyses of SGD add auxiliary Gaussian noise only in the proof, making mutual information tractable while leaving the actual SGD trajectory unchanged. Existing bounds, however, typically require perturbation covariances to be fixed independently of the optimization history, limiting their ability to represent geometries induced by moving gradient statistics, preconditioners, curvature proxies, and other pathwise information. We introduce predictable history-adaptive virtual perturbations, where the perturbation covariance at each iteration may depend on the past real SGD history but not on current or future randomness. This predictability enables a conditional Gaussian relative-entropy argument and yields generalization bounds for SGD with adaptive virtual-noise geometry. The bounds replace fixed sensitivity and gradient-deviation terms with conditional adaptive counterparts, include an output-sensitivity penalty from accumulated perturbation covariance, and reduce the deviation term to a conditional variance only under conditional unbiasedness. Since adaptive covariances may be data-dependent, we separate local Gaussian smoothing from global reference-kernel comparison. The resulting bound includes a covariance-comparison cost measuring the KL price of using an admissible reference geometry different from the actual adaptive covariance. Fixed-noise-style bounds are recovered under admissible synchronization, such as deterministic, public, or prefix-observable covariance rules. The framework recovers fixed isotropic and geometry-aware bounds as special cases while extending virtual perturbation analysis to history-dependent SGD without modifying the algorithm.

[299] RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus

Main category: cs.LG

Abstract: Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., IF, GOTO, FORALL). Beyond syntactic and semantic verification of the step output, performed according to the specific instruction of each step, RunAgent autonomously derives and validates constraints based on the description of the task and its instance at each step. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error-correction mechanisms to ensure correctness. Finally, RunAgent filters the context history by retaining only relevant information during the execution of each step. Evaluations on the Natural-plan and SciBench datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.

[300] Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications

Ricardo Luna Gutierrez, Sahand Ghorbanpour, Ejaz Rahman, Varchas Gopalaswamy, Riccardo Betti, Vineet Gundecha, Aarne Lees, Soumyendu Sarkar

Main category: cs.LG

Abstract: Inertial Confinement Fusion (ICF) holds transformative promise for sustainable, near-limitless clean energy, yet remains constrained by prohibitively high costs and limited experimental opportunities. This paper presents Human-in-the-Loop Meta Bayesian Optimization (HL-MBO), a framework that integrates expert knowledge with few-shot, uncertainty-aware machine learning to accelerate discovery in data-scarce, high-stakes scientific domains. HL-MBO introduces a meta-learned surrogate model with an expert-informed acquisition function to recommend candidate experiments. To foster trust and enable informed decisions, HL-MBO also provides interpretable explanations of its suggestions. We show HL-MBO outperforms current BO methods on ICF energy yield optimization, as well as benchmarks in molecular optimization and critical temperature maximization for superconducting materials.

[301] Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series

Christopher Holder, Anthony Bagnall

Main category: cs.LG

Abstract: Elastic distances like dynamic time warping (DTW) are central to time series machine learning because they compare sequences under local temporal misalignment. Soft-DTW is an adaptation of DTW that can be used as a gradient-based loss by replacing the hard minimum in its dynamic-programming recursion with a smooth relaxation. However, this approach does not directly extend to elastic distances whose transition costs depend on the local alignment context. Move-Split-Merge (MSM) is one such distance: it uses context-aware split and merge penalties and has often outperformed DTW in supervised and unsupervised time series machine learning tasks such as classification and clustering. We introduce Soft-MSM, a smooth relaxation of MSM and an elastic alignment loss with context-aware transition costs. Central to the formulation is a smooth gated surrogate for MSM’s piecewise split/merge cost, which enables gradients through both the dynamic-programming recursion and the local transition structure. We derive the forward recursion, backward recursion, soft alignment matrix, closed-form gradient, limiting behaviour, and divergence-corrected formulation. Experiments on 112 UCR datasets show that Soft-MSM gives lower MSM barycentre loss than existing MSM barycentre methods, and yields significantly better clustering and nearest-centroid classification performance than Soft-DTW-based alternatives. An implementation is available in the open-source aeon toolkit.
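The smoothing device underlying Soft-DTW-style relaxations (and, per the abstract, Soft-MSM's recursion) replaces the hard minimum in the dynamic program with a differentiable soft minimum, $\mathrm{softmin}_\gamma(v) = -\gamma \log \sum_i e^{-v_i/\gamma}$, which recovers the hard minimum as $\gamma \to 0$. A small sketch, written in a numerically stable shifted form:

```python
# Hedged sketch of the smooth minimum used in Soft-DTW-style relaxations:
# softmin_gamma is differentiable, always lower-bounds the hard min, and
# converges to it as gamma -> 0. Shifted by the minimum for stability.
import math

def softmin(vals, gamma):
    m = min(vals)
    # -gamma * log sum exp(-v / gamma), computed relative to the minimum
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in vals))

costs = [2.0, 3.5, 2.1]
print(softmin(costs, 1.0), softmin(costs, 0.01), min(costs))
```

Larger $\gamma$ blends more of the competing alignment paths into the value (pulling it below the hard minimum), which is what makes the soft alignment matrix and closed-form gradient in the abstract well defined.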

[302] CRADIPOR: Crash Dispersion Predictor

Edgar Chaillou, Sebastian Rodriguez, Yves Tourbier, Francisco Chinesta

Main category: cs.LG

Abstract: We present CRADIPOR, a numerical dispersion prediction tool for automotive crash simulations. Finite Element (FE) crash models are widely used throughout vehicle development, but their predictions are not strictly repeatable because of parallel computation and model complexity. As a result, performance criteria evaluated during post-processing may exhibit significant numerical dispersion, which complicates engineering decision-making. Although dispersion can be estimated by repeating the same simulation, this approach is generally impractical because of its high computational cost. This work therefore investigates a prediction tool that can be applied during routine crash-simulation post-processing without repeating the computation. The proposed approach relies on a Rank Reduction Autoencoder (RRAE) combined with supervised classification in order to identify regions sensitive to numerical dispersion. The comparative analysis suggests that the RRAE-based framework is more effective than the Random Forest baseline on the studied dataset. Among the tested signal representations, wavelet-based and slope-based inputs appear to be the most promising, with slope variations providing the best classification performance. These results support the use of structured latent representations for improving numerical-dispersion detection in automotive crash post-processing.

[303] Hyperspherical Forward-Forward with Prototypical Representations

Shalini Sarode, Brian Moser, Joachim Folz, Federico Raue, Tobias Nauen, Stanislav Frolov, Andreas Dengel

Main category: cs.LG

Abstract: The Forward-Forward (FF) algorithm presents a compelling, bio-inspired alternative to backpropagation. However, while efficient in training, it has a computationally prohibitive inference process that requires a separate forward pass for every class that is evaluated. In this work, we introduce the Hyperspherical Forward-Forward (HFF), a novel reformulation that resolves this critical bottleneck. Our core innovation is to reframe the local objective of each layer from a binary goodness-of-fit task to a direct multi-class classification problem within a hyperspherical feature space. We achieve this by learning a set of class-specific, unit-norm prototypes that act as geometric anchors and implicit negatives. This architectural innovation preserves the benefits of local training while enabling weight update and inference in a single forward pass, making it >40x faster than the original FF algorithm. Our method is simple to implement, scales effectively to modern convolutional architectures, and achieves superior accuracy on standard image classification benchmarks, closing the gap with backpropagation. Most notably, we are among the first greedy local-learning methods to report over 25% top-1 accuracy on ImageNet-1k, and 65.96% with transfer learning.
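The single-pass inference idea can be sketched in a few lines: each class owns a unit-norm prototype, a (normalized) layer output is scored by cosine similarity against every prototype, and the arg-max gives the prediction. Dimensions and vectors below are illustrative assumptions; HFF's actual training objective and architecture are as described in the abstract.

```python
# Hedged sketch of hyperspherical prototype classification: score a normalized
# feature against unit-norm class prototypes by cosine similarity, so one
# forward pass yields logits for all classes at once.
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_logits(feature, prototypes):
    f = normalize(feature)
    return [sum(fi * pi for fi, pi in zip(f, p)) for p in prototypes]

prototypes = [normalize([1.0, 0.0, 0.0]),   # class 0 anchor
              normalize([0.0, 1.0, 0.0]),   # class 1 anchor
              normalize([0.0, 0.0, 1.0])]   # class 2 anchor

feature = [0.2, 2.5, 0.1]                   # raw layer output, near class 1
logits = cosine_logits(feature, prototypes)
pred = max(range(len(logits)), key=logits.__getitem__)
print(pred, [round(l, 3) for l in logits])
```

Contrast this with the original FF inference, which would require one forward pass per candidate class label; here all class scores fall out of a single pass, which is the source of the reported speedup.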

[304] Comparative Analysis of Polygon-Based and Global Machine Learning Models for Bus Occupancy Prediction

Daniel Azenkot, Michael Fire, Eran Ben Elia

Main category: cs.LG
Abstract: Accurate forecasting of bus ridership (passenger numbers) is crucial for the efficient management and optimization of public transport systems. Traditional forecasting models often fail to capture the unique and localized dynamics of different urban areas by treating the entire city as a single, homogeneous region. This paper introduces a novel framework that enhances bus ridership prediction by integrating a spatial clustering methodology with multi-dimensional feature analysis. The proposed framework utilizes a diverse set of data, including bus ridership data (by route number, time, and bus stop) complemented by a variety of open-source data, such as spatial features (e.g., attractive destinations), meteorological conditions (e.g., temperature, rainfall), and temporal patterns (e.g., time of day, day of week). By clustering the urban area into distinct regions, based on the principle that bus stops in close proximity share similar ridership characteristics, a separate local forecasting model is trained for each of these clusters. This localized approach demonstrates accuracy comparable to that of global models. The findings suggest that a spatially aware, localized modeling strategy is effective for public transport prediction, paving the way for more targeted and efficient service improvements.

[305] SPLICE: Latent Diffusion over JEPA Embeddings for Conformal Time-Series Inpainting

Arnaud Zinflou

Main category: cs.LG

Abstract: Generative models for time-series imputation achieve strong reconstruction accuracy, yet provide no finite-sample reliability guarantees, a critical limitation in power systems where imputed values inform dispatch and planning. We introduce SPLICE (Self-supervised Predictive Latent Inpainting with Conformal Envelopes), a modular framework coupling latent generative imputation with distribution-free, online-adaptive prediction intervals. A JEPA encoder maps daily load segments into a 64-dimensional latent space; a conditional latent bridge with four sampling modes generates candidate gap trajectories; an hourly-conditioned decoder maps back to signal space; and Adaptive Conformal Inference (ACI) wraps the output with coverage-guaranteed prediction bands. The flow-matching variant achieves quality comparable to DDIM in 5–10 ODE steps (a 5–10x speedup). On thirteen load datasets (nine proprietary, three UCI Electricity, ETTh1), SPLICE achieves the lowest mean Load-only MSE (0.056), winning 9/12 non-degenerate datasets at 91-day gaps and 18/32 across all gap lengths vs. five established baselines, and produces the best CRPS (0.161, -18.3% vs. the strongest competitor). ACI delivers 93–95% empirical coverage, correcting under-coverage failures of up to 7.5 pp observed with static conformal prediction. A pooled JEPA encoder trained on nine feeds transfers to four unseen domains, matching or exceeding per-dataset oracles with only quick bridge fine-tuning.
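The ACI mechanism the abstract relies on has a one-line update: after each step, the working miscoverage level moves toward the target by $\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)$, widening intervals (smaller $\alpha_t$) after misses and tightening them after hits. A sketch of just that update, with interval construction abstracted away and parameter values chosen for illustration:

```python
# Hedged sketch of the Adaptive Conformal Inference (ACI) level update:
# alpha_{t+1} = alpha_t + gamma * (alpha_target - err_t), where err_t is 1 if
# the truth fell outside the interval at step t. Note alpha_t may leave [0, 1],
# which in ACI corresponds to maximally wide or empty intervals.

def aci_run(errors, alpha_target=0.1, gamma=0.05):
    """errors: sequence of 0/1 miscoverage indicators; returns the alpha_t path."""
    alpha, path = alpha_target, []
    for err in errors:
        alpha = alpha + gamma * (alpha_target - err)
        path.append(alpha)
    return path

# Persistent misses drive alpha down (wider bands); persistent hits drive it up.
after_misses = aci_run([1] * 10)[-1]
after_hits = aci_run([0] * 10)[-1]
print(after_misses, after_hits)
```

This feedback loop is what lets the abstract's bands track 93–95% empirical coverage online, correcting the static conformal bands that under-cover when the data distribution drifts.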

[306] Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization

Huayu Li, ZhengXiao He, Xiwen Chen, Jingjing Wang, Siyuan Tian, Jinghao Wen, Ao Li

Main category: cs.LG

Abstract: Learning meaningful representations from medical time series (MedTS) such as ECG or EEG signals is a critical challenge. These signals are often high-dimensional, variable-length and rife with noise. Existing self-supervised approaches, such as Masked Autoencoders (MAEs), are highly effective for pre-training general-purpose encoders. However, they do not explicitly learn compact and semantically interpretable latent representations, typically relying on heuristic aggregation strategies such as global average pooling or a designated [CLS] token. We propose a novel framework that compresses a variable-length MedTS into a fixed-size set of $k$ latent Fingerprint Tokens. Our architecture employs a cross-attention bottleneck to generate these tokens and is trained with a dual-objective function. The first objective is a reconstruction loss, which ensures the tokens are sufficient statistics for the original data. The second, a diversity penalty based on the Total Coding Rate (TCR), explicitly minimizes the redundancy between tokens, encouraging them to become statistically disentangled representations. We present the theoretical justification for our method, framing it as a novel Disentangled Rate-Distortion problem. This approach produces a low-dimensional, interpretable, and sample-efficient representation, where each token is encouraged to capture an independent factor of variation, paving the way for more robust digital biomarkers.
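The TCR diversity penalty is typically a log-determinant of a regularized Gram matrix of the token set; exact scalings vary across papers, so the sketch below (two tokens, $\beta = 1$) is only an illustration of why redundant tokens score lower than diverse ones, not the paper's exact objective.

```python
# Hedged sketch of a Total Coding Rate (TCR) style diversity score for a pair
# of tokens: R = 1/2 * log det(I + beta * G), where G is the tokens' Gram
# matrix. The scaling beta and the normalisation are illustrative assumptions.
import math

def tcr_pair(z1, z2, beta=1.0):
    g11 = sum(a * a for a in z1)
    g22 = sum(b * b for b in z2)
    g12 = sum(a * b for a, b in zip(z1, z2))
    # det(I + beta * G) for the 2x2 Gram matrix G, expanded by hand
    det = (1 + beta * g11) * (1 + beta * g22) - (beta * g12) ** 2
    return 0.5 * math.log(det)

orthogonal = tcr_pair([1.0, 0.0], [0.0, 1.0])   # diverse tokens
duplicated = tcr_pair([1.0, 0.0], [1.0, 0.0])   # fully redundant tokens
print(orthogonal, duplicated)
```

Maximizing such a score (while a reconstruction loss keeps the tokens informative) pushes the tokens toward statistically independent directions, which is the disentanglement effect the abstract describes.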

[307] Smart Profit-Aware Crop Advisory System: Kisan AI

Debasis Dwibedy, Avyay Nishtala, Pranathi Mukku, D Snehaja

Main category: cs.LG

Abstract: Modern crop advisory systems exhibit a critical limitation termed economic blindness. These systems primarily optimize for biological yield, often overlooking market price, which can lead farmers toward agronomically sound yet financially unviable decisions. In this paper, we develop Kisan AI, a smart profit-aware crop advisory system that resolves this limitation through a research-driven, full-stack application. We train a Random Forest (RF) classifier on a nine-feature benchmark dataset, the standard seven agronomic attributes augmented with a market_price variable, and evaluate it against eight baseline models using metrics such as accuracy, precision, recall, F1-score, and Log Loss. The RF model achieves the highest accuracy of 99.3% and the lowest Log Loss, confirming that the inclusion of market price as a predictive feature is both valid and impactful. We then implement the RF model within a multilingual Progressive Web App alongside a Facebook Prophet six-month price forecasting engine and a MobileNetV2 disease detection module. A nine-language AI chatbot powered by the Anthropic Claude API unifies all modules into a single, mobile-installable platform accessible to farmers across India.

[308] Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization

YiFeng Wang, Zhun Sun, Keisuke Sakaguchi

Main category: cs.LG

Abstract: We present Activation Residual Hessian Quantization (ARHQ), a post-training weight splitting method designed to mitigate error propagation in low-bit activation-weight quantization. By constructing an input-side residual Hessian from activation quantization residuals ($G_x$), ARHQ analytically identifies and isolates error-sensitive weight directions into a high-precision low-rank branch. This is achieved via a closed-form truncated SVD on the scaled weight matrix $W G_x^{1/2}$. Experimental results on Qwen3-4B-Thinking-2507 demonstrate that ARHQ significantly improves layer-wise SNR and preserves downstream reasoning performance on ZebraLogic even under aggressive quantization. The code is available at https://github.com/BeautMoonQ/ARHQ.
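
The closed-form split can be illustrated concretely. The NumPy sketch below is our own reading, not the released implementation: it isolates the top singular directions of the scaled matrix W G_x^{1/2} into a high-precision low-rank branch and leaves the residual weight for a (here, toy uniform) low-bit quantizer; ARHQ's actual scaling, rank selection, and quantizer may differ.

```python
import numpy as np

def arhq_split(W, G_x, rank):
    """Split W = W_lowrank + W_residual, where W_lowrank captures the
    error-sensitive directions of the PSD residual Hessian G_x via a
    truncated SVD of the scaled matrix W @ G_x^{1/2} (illustrative sketch)."""
    evals, evecs = np.linalg.eigh(G_x)            # symmetric eigendecomposition
    evals = np.clip(evals, 1e-12, None)           # guard against tiny eigenvalues
    S = evecs @ np.diag(np.sqrt(evals)) @ evecs.T        # G_x^{1/2}
    S_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)
    W_lowrank = (U[:, :rank] * s[:rank]) @ Vt[:rank] @ S_inv
    return W_lowrank, W - W_lowrank               # residual goes to the quantizer

def quantize(W, bits=4):
    """Naive symmetric uniform quantizer, standing in for any low-bit scheme."""
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale
```

At inference, the full-precision low-rank branch and the quantized residual are applied in parallel and summed.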

[309] Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

Yikai Wang, Shang Liu, Jose Blanchet

Main category: cs.LG

Abstract: Reinforcement learning from human feedback (RLHF) has become a core post-training step for aligning large language models, yet the reward signal used in RLHF is only a learned proxy for true human utility. From an operations research perspective, this creates a decision problem under objective misspecification: the policy is optimized against an estimated reward, while deployment performance is determined by an unobserved objective. The resulting gap leads to reward over-optimization, or Goodharting, where proxy reward continues to improve even after true quality deteriorates. Existing mitigations address this problem through uncertainty penalties, pessimistic rewards, or conservative constraints, but they can be computationally burdensome and overly pessimistic. We propose Wasserstein distributionally robust regret optimization (DRRO) for RLHF. Instead of pessimizing worst-case value as in standard DRO, DRRO pessimizes worst-case regret relative to the best policy under the same plausible reward perturbation. We study the promptwise problem through a simplex allocation model and show that, under an $\ell_1$ ambiguity set, the inner worst-case regret admits an exact solution and the optimal policy has a water-filling structure. These results lead to a practical policy-gradient algorithm with a simple sampled-bonus interpretation and only minor changes to PPO/GRPO-style RLHF training. The framework also clarifies theoretically why DRRO is less pessimistic than DRO, and our experiments show that DRRO mitigates over-optimization more effectively than existing baselines while standard DRO is systematically over-pessimistic.

[310] Consistent Diffusion Language Models

Hasan Amin, Yuan Gao, Yaser Souri, Subhojit Som, Ming Yin, Rajiv Khanna, Xia Song

Main category: cs.LG

Abstract: Diffusion language models (DLMs) are an attractive alternative to autoregressive models because they promise sublinear-time, parallel generation, yet practical gains remain elusive as high-quality samples still demand hundreds of refinement steps. In continuous domains, consistency training along the probability-flow ODE is a popular recipe to accelerate diffusion. For discrete diffusion, no analogous sample-space ODE exists, making direct adaptation ill-defined. We argue that the natural discrete substitute is not a deterministic trajectory but its stochastic counterpart: the exact posterior bridge, available in closed form for broad corruption families including masked and uniform diffusion. Building on this observation, we introduce Multi-Path Discrete Consistency (MPDC), a new principle that trains a denoiser to be path-invariant in expectation across these stochastic bridges, and instantiate it as the Consistent Diffusion Language Model (CDLM), a single-stage, teacher-free training framework. A single CDLM objective unifies masked diffusion, continuous consistency models, and progressive/discrete distillation as analytic limits or empirical approximations of one common view. Empirically, CDLM establishes a new state of the art on both conditional and unconditional text generation, consistently outperforming strong base discrete diffusion models and often even multi-stage distilled baselines across sampling budgets, with the largest gains in the few-step regime. Together, these results position CDLM as a principled and scalable foundation for the next generation of fast, high-fidelity discrete generative modeling.

[311] Towards A Generative Protein Evolution Machine with DPLM-Evo

Xinyou Wang, Liang Hong, Jiasheng Ye, Zaixiang Zheng, Yu Li, Shujian Huang, Quanquan Gu

Main category: cs.LG

Abstract: Proteins are shaped by gradual evolution under biophysical and functional constraints. Protein language models learn rich evolutionary constraints from large-scale sequences, and discrete diffusion-based protein language models (e.g., DPLMs) are promising for both understanding and generation. However, existing DPLMs typically rely on masking-based absorbing diffusion that contradicts a simple biological intuition: proteins evolve through accumulated edits, not by emerging from masks. Consequently, these frameworks lack explicit pretraining objectives for substitution and insertion/deletion (indel) operations, limiting both optimization-style post-editing and flexible guided generation. To address these limitations, we present DPLM-Evo, an evolutionary discrete diffusion framework that explicitly predicts substitution, insertion, and deletion operations during denoising. DPLM-Evo decouples an upsampled-length latent alignment space from the variable-length observed sequence space, which makes indel-aware generation tractable and enables adaptive scaffold growth throughout the process with negligible computational overhead. To better align substitutions with real evolution, we further introduce a contextualized evolutionary noising kernel that produces biologically informed, context-dependent mutation patterns. Across tasks, DPLM-Evo improves sequence understanding and achieves state-of-the-art mutation effect prediction performance on ProteinGym in the single-sequence setting. It also enables variable-length simulated evolution and post-editing/optimization of existing proteins via explicit edit trajectories.

[312] Introducing WARM-VR: Benchmark Dataset for Multimodal Wearable Affect Recognition in Virtual Reality

Karim Alghoul, Faisal Mohd, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik

Main category: cs.LG

Abstract: With the growing integration of human-computer interaction into everyday life, advances in machine learning have enabled systems to better perceive and respond to users’ emotional states. Most existing affect recognition datasets focus on static environments, limiting their applicability to immersive multimedia contexts such as Virtual Reality (VR). In this paper, we introduce WARM-VR, a novel publicly available multimodal dataset designed to support affect recognition in immersive, multisensory environments using wearable sensing instrumentation. Data were collected from 31 participants aged 19-37 using wearable sensors: a wristband measuring Blood Volume Pulse (BVP), EDA, skin Temperature, three-axis Acceleration, and a chest strap recording ECG signals. Participants engaged in immersive VR experiences designed to elicit relaxation through a calming beach environment following stress induction via an arithmetic task. These sessions incorporated synchronized multimedia stimuli: visual, auditory, and olfactory. Affective states were assessed subjectively through validated self-report questionnaires and objectively through the analysis of physiological measurements. Statistical analysis of the questionnaires confirmed that VR relaxation significantly reduced negative affect, particularly with olfactory enhancement. Furthermore, we established a benchmark on the dataset using widely recognized machine learning algorithms. The best performance for binary classification of valence from BVP data was obtained with a CNN and a CNN-Bi-GRU model, both achieving an average F1-score of 0.63 and an AUC of 0.69. For arousal, a lightweight Transformer architecture provided the most balanced results (F1-scores of 0.54 for class 0 and 0.63 for class 1), outperforming recurrent hybrids. In the relaxation task, a CNN-Bi-GRU model reached the highest overall performance (average F1-score 0.64, AUC 0.69).

[313] Fair Dataset Distillation via Cross-Group Barycenter Alignment

Mohammad Hossein Moslemi, Nima Hosseini Dashtbayaz, Zhimin Mei, Boyu Wang, Bissan Ghaddar

Main category: cs.LG

Abstract: Dataset Distillation aims to compress a large dataset into a small synthetic one while maintaining predictive performance. We show that as different demographic groups exhibit distinct predictive patterns, the distillation process struggles to simultaneously preserve informative signals for all subgroups, regardless of whether group sizes are mildly or severely imbalanced. Consequently, models trained on distilled data can experience substantial performance drops for certain subgroups, leading to fairness gaps. Crucially, these gaps do not disappear by merely correcting group imbalance, since they stem from fundamental mismatches in subgroup predictive patterns rather than from sample-size disparities alone. We therefore formally analyze the interaction between these two sources of bias and cast the solution as identifying a group-imbalance-agnostic barycenter of the predictive information that induces similar representations across all subgroups. By distilling toward this shared aggregate representation, we show that group fairness concerns can be reduced. Our approach is compatible with existing distillation methods, and empirical results show that it substantially reduces bias introduced by dataset distillation.

[314] OTSS: Output-Targeted Soft Segmentation for Contextual Decision-Weight Learning

Renjun Hu, Hyun-Soo Ahn

Main category: cs.LG

Abstract: Many machine learning systems make constrained decisions by optimizing factorized objectives, but the context-specific objective is often treated as fixed. We study contextual decision-weight learning: from logged decisions and proxy outputs, learn an optimizer-facing weight vector w(x) over interpretable decision factors z(x,d), rather than a direct policy or generic predictive score. We propose OTSS, an output-targeted soft-segmentation model that deploys the personalized decision-ready weight vector. At the function-class level, the theory highlights a hard-versus-soft distinction. Hard partitions incur an approximation-estimation tradeoff under overlap, while a realizable fixed-K soft class removes the hard-partition approximation floor and attains a parametric rate. We evaluate OTSS in controlled benchmarks with finite evaluation libraries, where the true weight vector and downstream regret can be computed exactly. In the representative overlap setting, OTSS attains the lowest mean regret among the comparators, including EM mixture regression, the strongest soft-mixture baseline in our comparison; it matches EM on coefficient recovery while running about two orders of magnitude faster. In a matched K=5 benchmark, OTSS remains competitive under hard-routed truth and improves as heterogeneity becomes softer and sample size grows. On a fixed Complete Journey retail anchor with real household covariates and action geometry, OTSS again achieves the lowest mean-regret point estimate.

[315] Diversity in Large Language Models under Supervised Fine-Tuning

Roman Klypa, Oleksandr Cherednichenko

Main category: cs.LG

Abstract: Supervised Fine-Tuning (SFT) is essential for aligning Large Language Models (LLMs) with user intent, yet it is believed to suppress generative diversity. Although this reduction is frequently referenced, formal empirical testing of the phenomenon remains limited. Several prior methods have addressed the expressiveness of LLMs, and their varying perspectives suggest that deeper analysis could yield further improvements. In this study, we attribute the decline to two primary drivers: the neglect of low-frequency patterns within fine-tuning datasets and the forgetting of preexisting knowledge. Motivated by our theoretical analysis, we develop Tempered Focal (TOFU) loss, a novel objective that addresses both stated challenges simultaneously. Our extensive evaluation confirms at scale that generation breadth narrows after SFT and strengthens the hypothesis explaining this effect. Across multiple models and benchmarks, we demonstrate that TOFU enhances output diversity while preserving high response quality, offering a principled approach to SFT.

[316] State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning

Thea Aviss

Main category: cs.LG

Abstract: Current transformers discard their rich latent residual stream between positions, reconstructing latent reasoning context at each new position and leaving potential reasoning capacity untapped. The State Stream Transformer (SST) V2 enables parameter-efficient reasoning in continuous latent space through an FFN-driven nonlinear recurrence at each decoder layer, where latent states are streamed horizontally across the full sequence via a learned blend. This same mechanism supports continuous latent deliberation per position at inference time, dedicating additional FLOPs to exploring abstract reasoning before committing to a token. A two-pass parallel training procedure resolves the sequential dependency of the recurrence to allow compute-efficient training. Hidden state analysis shows the state stream facilitates reasoning through exploration of distinct semantic basins in continuous latent space, where transitions at content-dependent positions move the model into a substantially different Bayesian posterior, directly influencing the latent space at future positions. We also find, via a learned probe, that at the first generated token position, the latent state already predicts whether the eventual answer will survive or break under additional latent computation for every subsequent position. Co-trained into an existing 27B backbone using only a small dataset of GSM8K examples, the SST delivers a +15.15 point gain over a fine-tuning-matched baseline on out-of-distribution GPQA-Diamond and cuts that same baseline’s remaining GSM8K errors by 46%, together showing that the reasoning improvement is attributable to the architectural mechanism rather than scale or training data. On GPQA-Diamond, the resulting 27B SST also achieves higher accuracy than several larger open-weight and proprietary systems, including open-weight models up to 25 times larger.

[317] CompleteRXN: Toward Completing Open Chemical Reaction Databases

Gabriel Vogel, Minouk Noordsij, Evgeny Pidko, Jana M. Weber

Main category: cs.LG

Abstract: Chemical reaction datasets such as USPTO suffer from substantial incompleteness, frequently missing byproducts, co-reactants, and stoichiometric coefficients. This limits their applicability and reliability in downstream applications. Here, we introduce CompleteRXN, a large-scale supervised benchmark for reaction completion under realistic missing-data conditions. We construct a dataset of aligned incomplete and atom-balanced reactions by mapping USPTO records to curated mechanistic reactions. We evaluate representative baselines, including a novel encoder-decoder reaction completion model with constrained decoding, the Constrained Reaction Balancer (CRB), and a recent algorithmic method, SynRBL. On our CompleteRXN benchmark, the CRB achieves high performance across splits of increasing difficulty, reaching 99.20% equivalence accuracy on the random split and 91.12% on the extreme out-of-distribution split. SynRBL produces many balanced and chemically plausible completions, but with lower accuracy on the benchmark test splits. Across all methods, performance degrades with increasing incompleteness. We observe a substantial drop when evaluating on reactions outside the benchmark (full uncurated USPTO), highlighting the gap between benchmark performance and practical robustness and motivating future work.

[318] Bayesian Optimization in Linear Time

Jesse Schneider, William J. Welch

Main category: cs.LG

Abstract: Bayesian optimization is a sequential method for minimizing objective functions that are expensive to evaluate and about which few assumptions can be made. By using all gathered data to train a Gaussian process model for the function and adaptively employing a mixture of global exploration and local exploitation, this method has been used for optimization in many fields including machine learning, automotive engineering and reinforcement learning. However, the standard method suffers from two problems: 1) with cubic computational complexity in the training-set size it eventually becomes computationally infeasible to train the model, and 2) globally modeling the objective function is not necessarily optimal given the local nature of minimization. Using flexible and recursive binary partitioning of the search space, we adapt both the modeling and acquisitive aspects of standard Bayesian optimization to work harmoniously with the partitioning scheme, thereby ameliorating both standard shortcomings. We compare our method against a commonly used Bayesian optimization library on seven challenging test functions, ranging in dimensionality from $6$ to $124$, and show that our method achieves superior optimization performance in all tests. In addition, our method has linear computational complexity.

[319] NLPOpt-Net: A Learning Method for Nonlinear Optimization with Feasibility Guarantees

Bimol Nath Roy, Rahul Golder, MM Faruque Hasan

Main category: cs.LG

Abstract: Nonlinear Parametric Optimization Network (NLPOpt-Net) is an unsupervised learning architecture to solve constrained nonlinear programs (NLP). Given the structure of an NLP, it learns the parametric solution maps with guaranteed constraint satisfaction. The architecture consists of a backbone neural network (NN) followed by a multilayer ($k$-layered) projection. While the NN drives toward optimality through a loss function consisting of a modified Lagrangian augmented with a consistency loss, the projection ensures feasibility by projecting the NN predictions onto the original constraint manifold. Instead of typical distance minimization, our projection exploits local quadratic approximations of the original NLP. Under certain conditions (such as convexity), the projection has a descent property, which improves the NN predictions further. NLPOpt-Net deploys an inversion-free, modified Chambolle-Pock algorithm to solve the constrained quadratic projections during the forward pass and uses the implicit function theorem for efficient backpropagation. The fixed structure of the projection further allows decoupling of the NN and the projection once the training is complete. NLPOpt-Net solves large-scale convex QP, QCQP, NLP, and nonconvex problems with near-zero optimality gap and constraint violations reduced to machine precision. Additionally, it provides near-accurate predictions of the active sets and corresponding dual variables, thereby enabling a scalable approach for multiparametric programming. Compiling the projection in C provides an order-of-magnitude improvement in inference time compared to JAX. We provide the codes and NLPOpt-Net as a ready-to-use package that includes GPU support.

[320] Pessimism-Free Offline Learning in General-Sum Games via KL Regularization

Claire Chen, Yuheng Zhang

Main category: cs.LG

Abstract: Offline multi-agent reinforcement learning in general-sum settings is challenged by the distribution shift between logged datasets and target equilibrium policies. While standard methods rely on manual pessimistic penalties, we demonstrate that KL regularization suffices to stabilize learning and achieve equilibrium recovery. We propose General-sum Anchored Nash Equilibrium (GANE), which recovers regularized Nash equilibria at an accelerated statistical rate of $\widetilde{O}(1/n)$. For computational tractability, we develop General-sum Anchored Mirror Descent (GAMD), an iterative algorithm converging to a Coarse Correlated Equilibrium at the standard rate of $\widetilde{O}(1/\sqrt{n}+1/T)$. These results establish KL regularization as a standalone mechanism for pessimism-free offline learning that achieves equivalent or accelerated rates in multi-player general-sum games.

[321] Jailbroken Frontier Models Retain Their Capabilities

Daniel Zhu, Zihan Wang, Jenny Bao, Jerry Wei

Main category: cs.LG

Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a “jailbreak tax” that degrades the target model’s task performance. We show that this tax scales inversely with model capability and that the most advanced jailbreaks effectively yield no reduction in model capabilities. Evaluating 28 jailbreaks on five benchmarks across Claude models ranging in capability from Haiku 4.5 to Opus 4.6, we find Haiku 4.5 loses an average of 33.1% on benchmark performance when jailbroken, while Opus 4.6 at max thinking effort loses only 7.7%. We also observe that across all models, reasoning-heavy tasks display considerably more degradation than knowledge-recall tasks. Finally, Boundary Point Jailbreaking, currently the strongest jailbreak against deployed classifiers, achieves near-perfect classifier evasion with near-zero degradation across safeguarded models. We recommend that safety cases for frontier models should not rely on a meaningful capability degradation from jailbreaks.

[322] Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning

Sahil Mishra, Srinitish Srinivasan, Sourish Dasgupta, Tanmoy Chakraborty

Main category: cs.LG

Abstract: Real-world knowledge is often organized as hierarchies such as product taxonomies, medical ontologies, and label trees, yet learning hierarchical representations is challenging due to asymmetric structure and noisy semantics. We introduce Polaris, a polar hyperspherical embedding framework that separates semanticity from hierarchy using angular geometry and radius, enabling the learning of meaning and structure without interference. To map latent representation onto the sphere, we project it to the tangent space at the north pole, apply the exponential map, and learn unit-norm representations using spherical linear layers. Polaris then combines robust local constraints, global regularization that prevents geometric collapse, and uncertainty-aware asymmetric objectives that encourage directional containment. At inference time, Polaris uses structure-guided retrieval to efficiently narrow down candidate parents before final ranking. We evaluate Polaris on different settings of taxonomy expansion - spanning trees, multi-parent DAGs, and multimodal hierarchies, showing consistent improvements of up to ~19 points in top-K retrieval and up to ~60% reduction in mean rank over fourteen strong baselines.
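
The tangent-space projection plus exponential map that places latents on the sphere follows a standard Riemannian construction; the sketch below illustrates it at the north pole in our own notation (this is the textbook map, not the authors' code, and Polaris additionally learns the tangent vectors via spherical linear layers).

```python
import numpy as np

def exp_map_north_pole(v):
    """Riemannian exponential map at the north pole mu = (0, ..., 0, 1) of
    the unit sphere. A tangent vector v in R^d is embedded as (v, 0),
    which is orthogonal to mu, and mapped to
    cos(||v||) * mu + sin(||v||) * v/||v||, a unit vector in R^{d+1}."""
    norm = np.linalg.norm(v)
    pole = np.zeros(v.size + 1)
    pole[-1] = 1.0
    if norm < 1e-12:
        return pole                      # zero tangent vector maps to the pole
    direction = np.concatenate([v / norm, [0.0]])
    return np.cos(norm) * pole + np.sin(norm) * direction
```

Because pole and direction are orthonormal, the output always has unit norm, so the embedding lands exactly on the sphere and angular distance can encode semantics while radius (here fixed to 1, learned in Polaris) encodes hierarchy.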

[323] Caracal: Causal Architecture via Spectral Mixing

Bingzheng Gan, Tianyi Zhang, Yusu Li, Jing Huang, Wei Shi, Yangkai Ding, Tao Yu

Main category: cs.LG

Abstract: The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, $\mathcal{O}(L \log L)$ Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
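
Frequency-domain causal mixing via padding and truncation has a simple core: zero-pad so the FFT computes a linear rather than circular convolution with a causal kernel, then truncate back to the sequence length. The sketch below shows one plausible single-head reading of that mechanism (function name and setup are ours; Caracal's MHF module may differ in detail).

```python
import numpy as np

def fft_causal_mix(x, h):
    """Causal sequence mixing in O(L log L). Zero-padding to length 2L
    prevents circular wrap-around, so multiplying in the frequency domain
    yields a linear convolution with the causal kernel h (h[k] weights
    position t-k); truncating to the first L entries keeps only outputs
    that depend on x[:t+1]."""
    L = x.shape[0]
    n = 2 * L                                   # asymmetric zero-padding
    X, H = np.fft.rfft(x, n=n), np.fft.rfft(h, n=n)
    return np.fft.irfft(X * H, n=n)[:L]         # truncate back to length L
```

Causality can be checked directly: perturbing x[t] leaves all outputs before position t unchanged.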

[324] A Dirac-Frenkel-Onsager principle: Instantaneous residual minimization with gauge momentum for nonlinear parametrizations of PDE solutions

Matteo Raviola, Benjamin Peherstorfer

Main category: cs.LG

Abstract: Dirac-Frenkel instantaneous residual minimization evolves nonlinear parametrizations of PDE solutions in time, but ill-conditioning can render the parameter dynamics non-unique. We interpret this non-uniqueness as a gauge freedom: nullspace directions that leave the time derivative unchanged can be used to select better-conditioned parameter velocities. Building on Onsager’s minimum-dissipation principle, we introduce a history variable – interpretable as momentum – and inject it only along the nullspace directions. The resulting Dirac-Frenkel-Onsager dynamics preserve instantaneous residual minimization, in contrast to standard regularization that can introduce bias, while promoting temporally smooth parameter evolutions. Examples demonstrate that the approach leads to increased robustness in singular and near-singular regimes.

[325] Data Deletion Can Help in Adaptive RL

Param Budhraja, Aditya Gangrade, Alex Olshevsky, Venkatesh Saligrama

Main category: cs.LG

Abstract: Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context unknown at test time. The standard approach decomposes the problem: train a so-called “universal policy” which assumes knowledge of the true context, then pair it with a context estimator which approximates context using the observed trajectory. We identify a simple, counterintuitive trick that substantially improves the estimator: randomly delete a fraction of the training buffer after each round. This works because data is collected across multiple rounds using progressively better policies, and older trajectories come from a different distribution than what the estimator will face at deployment time; random deletion creates an implicit exponential decay on older data while preserving diversity without requiring any explicit identification of which samples are stale. This reduces the robustness gap by 30% for MLPs and by 6% on average for recurrent networks. Strikingly, it allows a narrow MLP with 5x fewer parameters to outperform a wide MLP trained without deletion. To understand when and why deletion helps, we analyze regularized empirical risk minimization with a mismatch between the train distribution and the distribution at deployment; in this idealized setting, we prove that removing a single uniformly random training point decreases expected test loss in expectation under mild conditions. For ridge regression we make this quantitative: deletion helps when the regularization coefficient is moderate and the signal-to-noise ratio (SNR) is sufficiently low, and, crucially, this SNR threshold gives a direct measure of how large the distribution mismatch between training and deployment must be for deletion to be beneficial.
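
The deletion trick itself is essentially one line; a minimal sketch (function name and per-round schedule are our own assumptions, not the paper's code):

```python
import random

def prune_buffer(buffer, delete_frac=0.2, rng=None):
    """After each training round, uniformly drop a fraction of the replay
    buffer. A trajectory from r rounds ago survives with probability
    (1 - delete_frac)**r, so deletion induces an implicit exponential
    decay on stale data while keeping a diverse subsample of every round."""
    rng = rng or random.Random()
    return [traj for traj in buffer if rng.random() > delete_frac]
```

Because survival is uniform at random within each round, no explicit staleness detection is needed: older rounds simply face more deletion passes.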

[326] Federated Weather Modeling on Sensor Data

Shengchao Chen, Guodong Long

Main category: cs.LG

Abstract: Federated weather modeling on sensor data is a distributed system underpinned by federated learning, enabling multiple sensor data sources, including ground weather stations, satellites and IoT devices, to collaboratively train deep learning models without sharing raw data. This method safeguards data privacy and security while leveraging diverse, geographically distributed datasets to improve the accuracy and robustness of global/regional weather modeling tasks such as forecasting and anomaly detection.

[327] Conformalized Quantum DeepONet Ensembles for Scalable Operator Learning with Distribution-Free Uncertainty

Purav Matlia, Christian Moya, Guang Lin

Main category: cs.LG

Abstract: Operator learning enables fast surrogate modeling of high-dimensional dynamical systems, but existing approaches face two fundamental limitations: quadratic inference complexity and unreliable uncertainty quantification in safety-critical settings. We propose Conformalized Quantum DeepONet Ensembles, a framework that addresses both challenges simultaneously. By leveraging Quantum Orthogonal Neural Networks (QOrthoNNs), we reduce operator inference complexity from O(n^2) to O(n), enabling scalable evaluation over fine discretizations. To provide rigorous uncertainty quantification, we combine ensemble-based epistemic modeling with adaptive conformal prediction, yielding distribution-free coverage guarantees. A key challenge in ensembling is that naive parallelism scales hardware resources linearly with the number of models. We resolve this by using Superposed Parameterized Quantum Circuits (SPQCs), which compress multiple ensemble members into a single circuit and enable simultaneous multi-model execution. Experiments on synthetic partial differential equations and real-world power system dynamics demonstrate that our approach achieves accurate predictions while maintaining calibrated uncertainty under realistic quantum noise. These results establish a practical pathway toward scalable, uncertainty-aware operator learning in quantum machine learning.
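The distribution-free coverage guarantee is the standard split-conformal construction, independent of the quantum components. A generic scalar-regression sketch (names assumed):

```python
import numpy as np

def conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Split conformal prediction for scalar regression: absolute
    residuals on a held-out calibration set yield a quantile q such
    that [test_pred - q, test_pred + q] covers the truth with
    probability >= 1 - alpha, with no distributional assumptions."""
    scores = np.abs(cal_preds - cal_targets)        # nonconformity scores
    n = len(scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n    # finite-sample correction
    q = np.quantile(scores, min(q_level, 1.0))
    return test_pred - q, test_pred + q
```

The same recipe applies per output location of an operator surrogate; the ensemble enters only through how `cal_preds` and `test_pred` are produced.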

[328] Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities

Abay Bektursun

Main category: cs.LG

Abstract: Frozen Gemma 4 31B weights pretrained exclusively on text tokens, unmodified, transfer across modality boundaries through a thin trainable interface. (1) OGBench scene-play-singletask-task1-v0: $+4.33$pt over published GCIQL at $n=3$ with std 0.74 – a published-SOTA win on a robotic manipulation task the substrate has never seen. (2) D4RL Walker2d-medium-v2: Decision-Transformer parity ($76.2 \pm 0.8$, $n=3$) at $0.43\times$ DT’s trainable count, with the frozen substrate compressing to a 5L slice ($+1.66$pt over the 6L baseline at $n=3$). (3) Associative recall as the cleanest pretraining-load-bearing case: the frozen slice + a 113K-parameter linear interface reaches L30 best-checkpoint per-bit error 0.0505 ($n=2$); a 6.36M-parameter from-scratch trained transformer at matched capacity ($1/\sqrt{d_k}$ scaling, two seeds, LR sweep) cannot solve the task at all under the protocol (best L30 = 0.4395), an $8.7\times$ advantage. Architecture-alone falsifications: a frozen random transformer with correct $1/\sqrt{d_k}$ scaling stays at random-chance loss for 50k steps; a random-init Gemma slice fails OGBench cube-double-play-task1 entirely (0.89% across $n=3$ where pretrained reaches 60%). A dual-measurement protocol – text-activation probing on 95 English sentences plus task-ablation on a non-language target – names individual heads independently identifiable on both protocols: head L26.28 scores $3.7\times$ the slice mean for English token-copying and is the #2 most-critical head for binary copy ablation ($Δ$ L30 $= +0.221$); three further heads (L27.28, L27.2, L27.3) classify by the same protocol. The mechanism is single-model and the cross-modality results are single-task within their respective benchmarks; cross-model replication is structurally constrained because Gemma 4 31B is the only model on the small-scale Pareto frontier as of April 2026.

[329] Free Energy Surface Sampling via Reduced Flow Matching

Zichen Liu, Tiejun Li

Main category: cs.LG

Abstract: Sampling the free energy surface, namely, the distribution of collective variables (CVs), is a crucial problem in statistical physics, as it underpins a better understanding of chemical reactions and conformational transitions. Traditional methods for free energy surface sampling involve simulation in high-dimensional configuration space and projecting the resulting configurations onto the CV space. To reduce the computational costs of such sampling, we propose FES-FM, a reduced flow matching (FM) method for free energy sampling (FES). We train a dynamical transport map in the CV space, thereby enabling direct sampling of the free energy surface. For many-particle systems, we construct a prior distribution based on the Hessian at a local minimum of the potential, which ensures both rotation-translation invariance and physically meaningful configurations. We evaluate the proposed method across a variety of potential functions and collective variables. Comparative experiments demonstrate that our approach drastically reduces computational costs while delivering superior accuracy per unit sampling time.
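The CV-space transport map can be trained with a standard conditional flow matching objective. A minimal sketch under the usual straight-path interpolant (an assumption; the paper's exact parameterization and prior may differ):

```python
import numpy as np

def flow_matching_loss(model, x0, x1, rng=None):
    """Conditional flow matching on the straight path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
    `model(x, t)` is an assumed velocity-field regressor; x0 are prior
    samples and x1 data samples in the low-dimensional CV space."""
    rng = rng or np.random.default_rng(0)
    t = rng.uniform(size=(len(x0), 1))     # one random time per sample
    xt = (1 - t) * x0 + t * x1             # point on the path
    target = x1 - x0                       # straight-path velocity
    pred = model(xt, t)
    return float(((pred - target) ** 2).mean())
```

Sampling then integrates the learned velocity field from prior samples, directly in CV space rather than in the full configuration space.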

[330] Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin

Main category: cs.LG

Abstract: Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning (RL) only in relatively short-horizon settings (typically around 20–30 turns). In this work, we study RL-based training of VLMs for long-horizon decision-making in Super Mario Land, a visually grounded environment requiring 100+ turns of interaction with coordinated perception, reasoning, and action. We begin with a systematic investigation of key algorithmic components and propose an adapted variant of PPO with a lightweight turn-level critic, which substantially improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++. We further show that pretrained VLMs provide strong action priors, significantly improving sample efficiency during RL training and reducing the need for manual design choices such as action engineering, compared to classical deep RL trained from scratch. Building on these insights, we introduce Odysseus, an open training framework for VLM agents, achieving substantial gains across multiple levels of the game and at least 3 times the average game progress of frontier models. Moreover, the trained models exhibit consistent improvements under both in-game and cross-game generalization settings, while maintaining general-domain capabilities. Overall, our results identify key ingredients for making RL stable and effective in long-horizon, multi-modal settings, and provide practical guidance for developing VLMs as embodied agents.

[331] Hypergraph and Latent ODE Learning for Multimodal Root Cause Localization in Microservices

Xin Liu, Yuhang He, Sichen Zhao, Kejian Tong, Xingyu Zhang

Main category: cs.LG

Abstract: Root cause localization in cloud native microservice systems requires modeling complex service dependencies, irregular temporal dynamics, and heterogeneous observability data. We present HyperODE RCA, a unified framework that combines hypergraph attention learning, latent ordinary differential equations, and multimodal cross attention fusion for fine grained root cause analysis. The method learns higher order service interactions through differentiable hyperedge construction, captures continuous anomaly evolution from irregular observations with an ODE RNN encoder, and adaptively fuses logs, traces, metrics, entities, and events using context aware modality routing. We further improve robustness with a variational information bottleneck, temporal causal regularization, and invariant risk constraints. Experiments on the Tianchi AIOps benchmark show clear gains over strong baselines in ranking and classification performance, while preserving interpretability through learned hypergraph attention.

[332] VQ-SAD: Vector Quantized Structure Aware Diffusion For Molecule Generation

Farshad Noravesh, Reza Haffari, Layki Soon, Arghya Pal

Main category: cs.LG

Abstract: Many diffusion based molecule generation methods ignore the symbolic information of molecules and represent the atom and bond types as one-hot vectors. Methods based on Morgan fingerprints produce hash collisions and are hard to embed into a continuous space without information loss, and random fingerprints correspond to no valid molecule. To circumvent this issue we use another paradigm and consider atom and bond codes as latent variables of a VQ-VAE. We introduce VQ-SAD, which first trains a VQ-VAE, then uses the frozen pretrained VQ-VAE model and treats its codebooks for both atom and bond types as tokenizers for the downstream diffusion process. VQ-SAD is a neuro-symbolic model that utilizes both symbolic and neural structural information for a diffusion based model with a learnable forward process. The large discrete code space provides more balanced atom and bond types, which enhances the denoising process. VQ-SAD slightly outperforms SOTA models for diffusion based molecule generation on the QM9 and ZINC250k datasets.
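The codebook-as-tokenizer step amounts to nearest-neighbor lookup in the learned code space. A minimal sketch with an assumed toy feature space:

```python
import numpy as np

def vq_tokenize(features, codebook):
    """Map each continuous atom/bond feature vector to the index of its
    nearest codebook entry (Euclidean distance), yielding the discrete
    codes a downstream diffusion model can consume as tokens."""
    # (n, 1, d) - (1, k, d) -> pairwise squared distances (n, k)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

Because every code index maps back to a codebook vector, any token the diffusion model emits decodes to a valid point in the learned latent space, avoiding the invalid-fingerprint problem the abstract mentions.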

[333] Binomial flows: Denoising and flow matching for discrete ordinal data

Yair Shenfeld, Ricardo Baptista, Stefano Peluchetti

Main category: cs.LG

Abstract: Flow-based generative modeling in continuous spaces exploits Tweedie’s formula to express the denoiser (learned in training) as a score function (used in sampling). In contrast, this relation has been largely missing in the discrete setting where common approaches focus on learning discrete scores and rates. In this work we close this gap for discrete non-negative ordinal data by introducing Binomial flows. Our framework provides a simple recipe for training a discrete diffusion model which simultaneously denoises, samples, and estimates exact likelihoods. We verify our methodology on synthetic examples and obtain competitive results on real-world data sets.
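One natural noising process for non-negative ordinal data thins counts binomially. A sketch of an assumed forward process (the paper's exact noise schedule and parameterization may differ):

```python
import numpy as np

def binomial_forward(x0, t, rng=None):
    """Assumed binomial forward process for ordinal counts: each of the
    x0 units survives independently with probability 1 - t, so
    x_t ~ Binomial(x0, 1 - t). At t = 0 the data is intact; at t = 1
    every count has been destroyed."""
    rng = rng or np.random.default_rng(0)
    return rng.binomial(np.asarray(x0), 1.0 - t)
```

The reverse model then learns to predict the clean counts from a thinned sample, the discrete analogue of a denoiser.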

[334] Uniform-Correct Policy Optimization: Breaking RLVR’s Indifference to Diversity

Anamika Lochab, Bolian Li, Ruqi Zhang

Main category: cs.LG

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has achieved substantial gains in single-attempt accuracy (Pass@1) on reasoning tasks, yet often suffers from reduced multi-sample coverage (Pass@K), indicating diversity collapse. We identify a structural cause for this degradation: common RLVR objectives, such as GRPO, are indifferent to how probability mass is distributed among correct solutions. Combined with stochastic training dynamics, this indifference induces a self-reinforcing collapse, in which probability mass concentrates on a narrow subset of correct outputs while alternative valid solutions are suppressed. We formalize this collapse mechanism and further characterize the optimal policy structure under two complementary criteria: robustness and entropy-regularized optimality, which identify the Uniform-Correct Policy as uniquely optimal. Motivated by this analysis, we propose Uniform-Correct Policy Optimization (UCPO), a modification to GRPO that adds a conditional uniformity penalty on the policy’s distribution over correct solutions. The penalty redistributes gradient signal toward underrepresented correct responses, encouraging uniform allocation of probability mass within the correct set. Across three models (1.5B-7B parameters) and five mathematical reasoning benchmarks, UCPO improves Pass@K and diversity while maintaining competitive Pass@1, achieving up to +10% absolute improvement on AIME24 at Pass@64 and up to 45% higher equation-level diversity within the correct set. The code is available at https://github.com/AnamikaLochab/UCPO.
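The conditional uniformity penalty can be sketched as a KL term between the policy's renormalized distribution over the correct responses and the uniform distribution (assumed form; the function name is hypothetical and the paper's exact penalty may differ):

```python
import numpy as np

def uniformity_penalty(logprobs, correct_mask):
    """Hypothetical conditional uniformity penalty: renormalize the
    policy's probability mass over the correct responses in a group
    and penalize its KL divergence from uniform, pushing mass toward
    underrepresented correct solutions."""
    lp = np.asarray(logprobs, dtype=float)[np.asarray(correct_mask, bool)]
    if lp.size < 2:
        return 0.0                       # nothing to balance
    p = np.exp(lp - lp.max())
    p /= p.sum()                         # distribution over the correct set
    return float(np.sum(p * np.log(p * lp.size)))  # KL(p || uniform)
```

The penalty is zero when all correct responses are equally likely and grows as mass concentrates on a few of them, which is exactly the collapse mode the abstract identifies.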

[335] AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

Chenyu Huang, Jianghao Lin, Zhengyang Tang, Bo Jiang, Ruoqing Jiang, Benyou Wang, Lai Wei

Main category: cs.LG

Abstract: We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance for static and highly structured problems such as mathematical discovery, but is not directly suited to online dynamic inventory settings. To this end, we propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains a large language model using reinforcement learning, incorporates demand data as well as numerical and textual features beyond demand, and generates white-box inventory policy with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical interface that connects training, inference, and deployment. This allows us to characterize the probability that the AlphaInventory evolves a statistically safe and improved policy, and to quantify the deployment gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, AlphaInventory outperforms classical inventory policies and deep learning based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.

[336] Advancing Edge Classification through High-Dimensional Causal Modeling of Node-Edge Interplay

Duanyu Feng, Li Ding, Hongru Liang, Wenqiang Lei

Main category: cs.LG

Abstract: Edge classification, a crucial task for graph applications, remains relatively under-explored compared to link prediction. Current methods often overlook the potential causal influences of node features on edge features, leading to a loss of relevant prior information. In this work, we present an empirical exploration using the Causal Edge Classification Framework (CECF). Unlike conventional causal inference methods, CECF is the first framework to apply causal inference principles to the edge classification task and to explore modeling edge features as a high-dimensional treatment within a causal framework. Based on the node embedding of Graph Neural Network (GNN), CECF seeks to learn a balanced representation of high-dimensional edge features by mitigating the potential influence of node features. Then, a cross-attention network captures the complex dependencies between node and edge features for final edge classification. Extensive experiments demonstrate that CECF not only achieves superior performance but also serves as a flexible, plug-and-play enhancement for existing methods. We also provide empirical analyses, offering insights into when and how this high-dimensional causal modeling framework works for the edge classification.

[337] ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang, Xiaodong Lu, Wei Lin, Ran He, Guojun Yin

Main category: cs.LG

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL) that decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects negative-token hidden representations onto an SVD-based low-rank positive subspace and uses projection residuals to modulate negative gradients, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling. Notably, ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16 and 7.0% in Pass@128. Code is available at https://github.com/1229095296/ResRL.git.
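The projection step can be sketched with plain SVD: negative-sample hidden states are split into a component inside a low-rank positive subspace and a residual (shapes, names, and rank are assumed for illustration):

```python
import numpy as np

def projection_residual(neg_hidden, pos_hidden, rank=2):
    """Project negative-sample hidden states onto the low-rank subspace
    spanned by positive-sample hidden states (via SVD); the residual is
    the part of the negative representation NOT shared with positives.
    Assumed shapes: (n_samples, hidden_dim)."""
    # Top-`rank` right singular vectors span the positive subspace.
    _, _, vt = np.linalg.svd(pos_hidden, full_matrices=False)
    basis = vt[:rank]                        # (rank, hidden_dim)
    proj = neg_hidden @ basis.T @ basis      # component inside the subspace
    return neg_hidden - proj                 # residual modulating gradients
```

Penalizing only the residual direction leaves the semantics shared between positive and negative responses untouched, which is the diversity-preserving intent described above.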

[338] PILIR: Physics-Informed Local Implicit Representation

Jianfeng Li, Feng Wang, Ke Tang

Main category: cs.LG

Abstract: Physics-Informed Neural Networks have become a powerful mesh-free method for solving partial differential equations, but their performance is often limited by spectral bias. Specifically, in standard MLPs used in PINNs, the global parameter coupling causes the model to prioritize learning low-frequency components, resulting in slow convergence for high-frequency details. To overcome this limitation, we introduce the Physics-Informed Local Implicit Representation (PILIR). Our approach separates the global physical domain into a discrete latent feature space and a continuous generative decoder. By using a learnable grid to encode explicit spatial locality, PILIR can capture high-frequency details locally, preventing dilution by global patterns. A generative neural operator then synthesizes these local latent features into continuous physical fields, allowing accurate reconstruction of fine-scale structures. Experiments on a range of challenging PDEs show that PILIR effectively mitigates spectral bias, thereby boosting the convergence of high-frequency details and achieving superior accuracy compared to state-of-the-art methods.

[339] Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

Sai Niranjan Ramachandran, Suvrit Sra

Main category: cs.LG

Abstract: Decision trees and diffusion models are ostensibly disparate model classes, one discrete and hierarchical, the other continuous and dynamic. This work unifies the two by establishing a crisp mathematical correspondence between hierarchical decision trees and diffusion processes in appropriate limiting regimes. Our unification reveals a shared optimization principle: Global Trajectory Score Matching (GTSM), for which gradient boosting (in an idealized version) is asymptotically optimal. We underscore the conceptual value of our work through two key practical instantiations: TreeFlow, which achieves competitive generation quality on tabular data with higher fidelity and a 2x computational speedup, and DSMTree, a novel distillation method that transfers hierarchical decision logic into neural networks, matching teacher performance within 2% on many benchmarks.

[340] Towards Robust and Scalable Density-based Clustering via Graph Propagation

Yingtao Zheng, Hugo Phibbs, Ninh Pham

Main category: cs.LG

Abstract: We present CluProp, a novel framework that reimagines varied-density clustering in high-dimensional spaces as a label propagation process over neighborhood graphs. Our approach formally bridges the gap between density-based clustering and graph connectivity, leveraging efficient propagation mechanisms from network science to mitigate the parameter sensitivity inherent in traditional density-based methods. Specifically, we introduce a deterministic density-based propagation strategy to ensure scalable neighborhood identification. The framework is agnostic to the choice of distance metric and exhibits superior performance on large-scale data, processing millions of points in minutes while consistently outperforming existing baselines in accuracy.
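Label propagation over a neighborhood graph can be illustrated with a toy majority-vote rule (illustrative only; CluProp's propagation is density-based and deterministic, which this sketch is not):

```python
def propagate_labels(adj, seed_labels, n_iter=20):
    """Toy label propagation over a neighborhood graph: each node
    repeatedly adopts the majority label among its already-labeled
    neighbors until a fixed point. Clusters emerge as the connected
    regions reached from each seed."""
    labels = dict(seed_labels)                 # node -> cluster id
    for _ in range(n_iter):
        changed = False
        for node in range(len(adj)):
            votes = [labels[nb] for nb in adj[node] if nb in labels]
            if votes:
                best = max(set(votes), key=votes.count)
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels
```

On a graph with two disconnected chains and one seed per chain, each chain inherits its seed's label, mirroring how propagation recovers clusters from graph connectivity.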

[341] BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs

Zhixiong Zhao, Zukang Xu, Dawei Yang

Main category: cs.LG

Abstract: Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandwidth cost. However, existing methods cannot address activation heavy tails and thus must keep activations in high precision, preventing true end-to-end acceleration. To overcome this limitation, we propose BWLA (Binarized Weights and Low-bit Activations), the first post-training quantization framework that preserves high accuracy while achieving 1-bit weight quantization together with low-bit activations (e.g., 6 bits). The Orthogonal-Kronecker Transformation (OKT) learns an orthogonal mapping via EM minimization, converting unimodal weights into symmetric bimodal forms while suppressing activation tails and incoherence. The Proximal SVD Projection (PSP) then performs lightweight low-rank refinement through proximal SVD projection, further enhancing quantizability with minimal overhead. On Qwen3-32B, BWLA reaches a Wikitext2 perplexity of 11.92 under 6-bit activations (vs. 38 from SOTA), improves five zero-shot tasks by more than 70%, and delivers 3.26 times inference speedup, demonstrating strong potential for real-world LLM compression and acceleration.
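The two quantizers can be sketched independently of OKT and PSP: sign-based 1-bit weights with a per-row scale, and uniform symmetric low-bit activations (a simplified stand-in for illustration, not BWLA itself):

```python
import numpy as np

def binarize_weights(w):
    """1-bit weight quantization: sign with a per-row scale equal to
    the mean absolute value, the classic L2-optimal choice for
    binary networks."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    return alpha * np.sign(w)

def quantize_activations(x, bits=6):
    """Uniform symmetric quantization of activations to `bits` bits;
    heavy tails blow up `scale`, which is why BWLA must first
    suppress them before low-bit activations become viable."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale
```

With 6-bit activations the grid has only 63 signed levels, so a single outlier stretching `scale` destroys resolution for the bulk of the distribution, motivating the tail-suppressing transforms in the abstract.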

[342] Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Haichen Hu, Jian Qian, David Simchi-Levi

Main category: cs.LG

Abstract: Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While recent advances have explored offline oracle-efficient algorithms, their computational complexity typically scales with the cardinality of the state and action spaces, rendering them intractable for large-scale or continuous environments. In this paper, we address this fundamental limitation by studying offline oracle-efficient episodic RL through the lens of log-barrier and log-determinant regularization. Specifically, for tabular Markov Decision Processes (MDPs), we propose a novel algorithm that achieves the optimal $\tilde{O}(\sqrt{T})$ regret bound while requiring only $O(H\log\log T)$ calls to both the offline statistical estimation and planning oracles when $T$ is known and $O(H\log T)$ calls when $T$ is unknown. Crucially, this oracle complexity is entirely independent of the size of the state and action spaces. This strict independence drastically reduces the planning oracle complexity, representing a substantial improvement over existing offline oracle-efficient algorithms (Qian et al., 2024). Furthermore, we demonstrate the versatility of our framework by generalizing the algorithm to linear MDPs featuring infinite state spaces and arbitrary action spaces. We prove that this generalized approach successfully attains meaningful sub-linear regret. Consequently, our work yields the first doubly oracle-efficient (i.e., efficient with respect to both statistical estimation and policy optimization) regret minimization algorithm capable of solving MDPs with infinite state and action spaces, significantly expanding the boundaries of computationally tractable RL.

[343] Mesh Field Theory: Port-Hamiltonian Formulation of Mesh-Based Physics

Satoshi Noguchi, Yoshinobu Kawahara

Main category: cs.LG

Abstract: We present Mesh Field Theory (MeshFT) and its neural realization, MeshFT-Net: a structure-preserving framework for mesh-based continuum physics that cleanly separates the physics’ topological structure from its metric structure. Imposing minimal physical principles (locality, permutation equivariance, orientation covariance, and energy balance/dissipation inequality), we prove a reduction theorem for mesh-based physics. Under these conditions, the physical dynamics admit a local factorization into a port-Hamiltonian form: the conservative interconnection is fixed uniquely by mesh topology, whereas metric effects enter only through constitutive relations and dissipation. This reduction clarifies what must be fixed and what should be learned, directly informing MeshFT-Net’s design. Across evaluations on analytic and realistic datasets, physics-consistency tests, and out-of-distribution validation, MeshFT-Net achieves near-zero energy drift and strong physical fidelity (correct dispersion and momentum conservation) along with robust extrapolation and high data efficiency. By eliminating non-physical degrees of freedom and learning only metric-dependent structure, MeshFT provides a principled inductive bias for stable, faithful, and data-efficient learning-based physical simulation.
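A port-Hamiltonian system in miniature: for a single spring-mass, the conservative interconnection is the canonical J = [[0, 1], [-1, 0]] and dissipation enters only through a damping term; a symplectic integrator then keeps energy drift near zero (a toy sketch, not the paper's mesh formulation):

```python
def port_hamiltonian_step(q, p, dt, k=1.0, m=1.0, damping=0.0):
    """One symplectic-Euler step of a toy port-Hamiltonian system
    (a single spring-mass) with H = p**2 / (2*m) + k * q**2 / 2.
    The canonical interconnection is fixed by the structure;
    dissipation enters only through `damping`."""
    p = p - dt * (k * q + damping * p / m)   # momentum update (old q)
    q = q + dt * p / m                       # position update (new p)
    return q, p
```

With `damping=0`, total energy stays bounded near its initial value over thousands of steps, the one-degree-of-freedom analogue of MeshFT-Net's near-zero energy drift.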

[344] M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data

J. Jake Nichol, Michael Weylandt, G. Matthew Fricke, Jhayron Perez-Carrasquilla, Melanie E. Moses

Main category: cs.LG

Abstract: Causal graph discovery for space-time systems is challenging in high-dimensional gridded data, which often has many more grid cells than temporal observations per cell. The Causal Space-Time Stencil Learning (CaStLe) meta-algorithm was developed to address that niche under space-time locality and stationarity assumptions, but it is currently limited to univariate analyses. In this work, we present M-CaStLe. M-CaStLe generalizes the local embedding and parent-identification phases of CaStLe to jointly model local within-variable and cross-variable space-time causal structures in gridded data. Like CaStLe, by constraining candidate parents to a constant-size space-time neighborhood and pooling spatial replicates, M-CaStLe increases effective sample size to make discovery tractable in high-dimensional settings. We further decompose the resulting multivariate stencil graph into reaction and spatial graphs to aid interpretation in complex settings. We study M-CaStLe in four settings: a multivariate space-time vector autoregression benchmark with known ground truth, an advective-diffusive-reaction partial differential equation verification problem with derived physical reference structure, an atmospheric chemistry case study in a low-temporal-sample regime, and an El Niño Southern Oscillation study on reanalysis data, identifying phase-dependent ocean–atmosphere coupling. Across these settings, M-CaStLe more accurately recovers multivariate causal structure in controlled settings and identifies important physical dynamics in real-world case studies. Overall, M-CaStLe advances causal discovery for multivariate space-time systems while retaining interpretability at the grid level.

[345] PAMod: Modeling Cyclical Shifts via Phase-Amplitude Modulation for Non-stationary Time Series Forecasting

Yingbo Zhou, Yutong Ye, Shuhao Li, Rui Qian, Qiang Huang, Lemao Liu, Li Sun, Dejing Dou

Main category: cs.LG

Abstract: Real-world time series forecasting faces the fundamental challenge of non-stationary statistical properties, including shifts in mean and variance over time. While reversible instance normalization (RevIN) has shown promise by stationarizing inputs and denormalizing outputs, it relies on the strong assumption that historical and future distributions remain identical. We observe that in many practical applications, distribution shifts follow cyclical patterns that correlate with periodic positions (e.g., seasonal and holiday volatility). To this end, we propose PAMod, a lightweight yet powerful framework that models cyclical distribution shifts via Phase-Amplitude Modulation in the normalized feature space. PAMod learns periodic embeddings to modulate representations: phase modulation captures mean shifts, while amplitude modulation adapts to variance changes. Crucially, we prove mathematically that modulating in normalized space is equivalent to applying dynamic denormalization, offering an elegant unification of distribution adaptation and representation learning. Extensive experiments on twelve real-world benchmarks demonstrate that PAMod achieves state-of-the-art performance with fewer computational resources. Furthermore, our modulation mechanism, as a novel plug-and-play technique, can improve existing time-series forecasting methods with simple integration.
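The claimed equivalence is easy to check in one dimension: modulating the normalized signal with a phase shift and an amplitude scale equals denormalizing with a shifted mean and a rescaled standard deviation (sketch with assumed names, not PAMod's learned embeddings):

```python
import numpy as np

def revin_normalize(x):
    """Instance normalization (RevIN-style) over one series."""
    mu, sigma = x.mean(), x.std() + 1e-8
    return (x - mu) / sigma, mu, sigma

def modulated_denormalize(z, mu, sigma, phase=0.0, amplitude=1.0):
    """Phase-amplitude modulation in normalized space. Algebraically,
    (amplitude * z + phase) * sigma + mu
      == z * (amplitude * sigma) + (mu + phase * sigma),
    i.e. denormalizing with mean mu + phase*sigma and standard
    deviation amplitude*sigma: the claimed equivalence, scalar case."""
    return (amplitude * z + phase) * sigma + mu
```

In PAMod, `phase` and `amplitude` would come from learned periodic embeddings of the cycle position, so the effective denormalization statistics track the cyclical distribution shift.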

[346] Rethinking LLM Ensembling from the Perspective of Mixture Models

Jiale Fu, Yuchu Jiang, Peijun Wu, Chonghan Liu, Joey Tianyi Zhou, Xu Yang

Main category: cs.LG

Abstract: Model ensembling is a well-established technique for improving the performance of machine learning models. Conventionally, this involves averaging the output distributions of multiple models and selecting the most probable label. This idea has been naturally extended to large language models (LLMs), yielding improved performance but incurring substantial computational cost. This inefficiency stems from directly applying conventional ensemble implementation to LLMs, which require a separate forward pass for each model to explicitly compute the ensemble distribution. In this paper, we propose the Mixture-model-like Ensemble (ME). By reinterpreting the ensemble as a mixture model, ME stochastically selects a single model at each step to generate the next token, thereby avoiding the need to explicitly compute the full ensemble distribution. ME is mathematically equivalent to sampling from the ensemble distribution, but requires invoking only one model, making it 1.78x-2.68x faster than conventional ensemble. Furthermore, this perspective connects LLM ensembling and token-level routing methods, suggesting that LLM ensembling is a special case of routing methods. Our findings open new avenues for efficient LLM ensembling and motivate further exploration of token-level routing strategies for LLMs. Our code is available at https://github.com/jialefu/Mixture-model-like-Ensemble/.
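The ME idea fits in a few lines: sample which model speaks at each step rather than averaging all models' distributions (a sketch with toy callables standing in for LLMs; names assumed):

```python
import random

def me_generate(models, weights, prompt, n_tokens, rng=None):
    """Mixture-model-like Ensemble sketch: pick ONE model per step
    according to the ensemble weights and let it emit the next token.
    Marginally this samples from the weighted-average distribution,
    but each step costs a single forward pass instead of one per
    ensemble member. `models` are toy callables: text -> token."""
    rng = rng or random.Random(0)
    text = prompt
    for _ in range(n_tokens):
        model = rng.choices(models, weights=weights, k=1)[0]
        text += model(text)
    return text
```

This is also why the abstract frames ensembling as a special case of token-level routing: the weight vector is simply a routing policy that ignores the input.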

[347] Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks

Sara Malacarne, Eirik Hoel-Høiseth, Erlend Aune, David Zsolt Biró, Massimiliano Ruocco

Main category: cs.LG


Abstract: Mobile network operators must monitor thousands of heterogeneous network elements across the radio access network and the packet core, each exposing high-dimensional KPI time series. The scale and cost of incident labelling make supervised approaches impractical, motivating unsupervised anomaly detection robust to context shifts and nonstationarity. We propose C-MTAD-GAT (Context-aware Multivariate Time-series Anomaly Detection with Graph Attention), an anomaly detection framework designed to operate as a single shared model across large populations of network elements. The model combines temporal and feature-wise graph attention with lightweight static and dynamic context conditioning and a dual-head decoder for reconstruction and multi-step forecasting. It produces per-element, per-feature anomaly scores, converted to alerts via fully unsupervised thresholds calibrated from validation residuals. On the TELCO dataset released with DC-VAE [garcia2023onemodel], C-MTAD-GAT improves event-level affiliation and pointwise F1 while generating fewer alarms than prior graph-attention and VAE-based baselines. We then apply the same system to nation-scale radio access and evolved packet core control-plane counter data from a mobile network operator, where it is deployed. Operator feedback indicates the alerts are actionable and support daily monitoring, showing scalability across domains without relying on labelled incidents.

[348] GD4: Graph-based Discrete Denoising Diffusion for MIMO Detection

Qincheng Lu, Sitao Luan, Xiao-Wen Chang

Main category: cs.LG


Abstract: In wireless communications, recovering the optimal solution to the multiple-input multiple-output (MIMO) detection problem is NP-hard. Obtaining high-quality suboptimal solutions with a favorable performance-complexity trade-off is particularly challenging in under-determined systems with $N_t$ transmit antennas and $N_r < N_t$ receive antennas. Recent diffusion-based MIMO detectors have shown promise, but they require extensive sampling iterations at inference time, and their performance degrades in under-determined scenarios. We propose GD4, a graph-based discrete denoising diffusion method for MIMO detection. Unlike existing diffusion-based detectors that operate in a continuous relaxed space, GD4 performs denoising directly in the discrete symbol space and enables fast inference with one or a few denoising evaluations. Numerical results show that, under a similar inference-time compute budget, GD4 produces higher-quality suboptimal solutions than existing diffusion-based detectors and several widely used classical baselines, including the box-constrained Babai point and the $K$-best box-constrained randomized Klein-Babai point, in both under-determined and over-determined settings.

[349] Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

Yu-Hsueh Fang, Chia-Yen Lee

Main category: cs.LG


Abstract: Online Conformal Prediction (CP) struggles to balance temporal adaptability and structural stability. Feedback-driven methods (e.g., Adaptive Conformal Inference (ACI)) suffer from systemic marginal under-coverage and high interval variance during abrupt shifts, while temporally discounted Bayesian CP suffers from severe structural lag and uncalibrated interval bloat. We propose State-Adaptive Bayesian Conformal Prediction (SA-BCP) to achieve optimal spatio-temporal decoupling. By gating long-term temporal inertia with spatial kernel-density evidence, SA-BCP proactively expands intervals for recognized historical regimes while maintaining tight efficiency during stable states. We rigorously prove this mechanism’s optimality, identifying a minimax bias-variance tradeoff governed by an evidence threshold $K$. Extensive benchmarks on volatile financial datasets (2016–2026), including AMD, Gold, and GBP/USD, demonstrate that SA-BCP consistently minimizes the strictly proper Winkler score across diverse confidence levels. Specifically, SA-BCP resolves the systematic under-coverage inherent to ACI variants while simultaneously reducing the uncalibrated interval bloat of Bayesian CP by 10% to 37% under high-confidence requests. By elegantly navigating this tradeoff, SA-BCP achieves an optimal balance between conditional reliability and predictive efficiency.

[350] Adaptive Equilibrium: Dynamic Weighting Framework for Generalized Interruption of DeepFake Models

Hongrui Zheng, Liejun Wang, Zhiqing Guo

Main category: cs.LG


Abstract: The advancement of generalized deepfake disruption is constrained by the interruption imbalance, a fundamental bottleneck inherent to the generation of universal perturbations. We reveal that conventional static gradient normalization fundamentally struggles to resolve architectural conflicts, causing the optimization to bias towards susceptible models while neglecting resistant ones. We argue that achieving high and uniform effectiveness requires resolving this imbalance by reaching an adaptive equilibrium. We propose the Adaptive Equilibrium Framework (AEF), which employs a dynamic weighting mechanism that utilizes real-time loss feedback to adaptively assign greater interruption weights to the most resistant models. This approach shifts the optimization from an average-case problem to finding a dynamic balance, driving the perturbation to a uniformly effective equilibrium state. Comprehensive experiments validate that AEF achieves a more balanced interruption performance, maintaining a consistent interruption success rate across the evaluated diverse architectures.
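
The dynamic weighting idea, give the most resistant model (the one with the highest remaining interruption loss) the largest weight, can be sketched with a softmax over real-time losses; the exact AEF rule is not given in the abstract, so this form is an assumption:

```python
import math

def adaptive_weights(losses, tau=1.0):
    """AEF-style dynamic weighting (assumed softmax form): models whose
    interruption loss is still high (i.e. resistant to the current universal
    perturbation) receive a larger share of the next optimization step."""
    exps = [math.exp(l / tau) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]

# The most resistant model (largest loss) gets the largest weight,
# pulling the perturbation toward a uniformly effective equilibrium.
weights = adaptive_weights([0.2, 1.5, 0.7])
```

Static gradient normalization would assign fixed weights up front; re-evaluating `adaptive_weights` from live loss feedback each step is what turns the average-case objective into a dynamic balance.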

[351] The Power of Order: Fooling LLMs with Adversarial Table Permutations

Xinshuai Dong, Haifeng Chen, Xuyuan Liu, Shengyu Chen, Haoyu Wang, Shaoan Xie, Kun Zhang, Zhengzhang Chen

Main category: cs.LG


Abstract: Large Language Models have achieved remarkable success and are increasingly deployed in critical applications involving tabular data, such as Table Question Answering. However, their robustness to the structure of this input remains a critical, unaddressed question. This paper demonstrates that modern LLMs exhibit a significant vulnerability to the layout of tabular data. Specifically, we show that semantically-invariant permutations of rows and columns - rearrangements that do not alter the table’s underlying information - are sometimes sufficient to cause incorrect or inconsistent model outputs. To systematically probe this vulnerability, we introduce Adversarial Table Permutation (ATP), a novel, gradient-based attack that efficiently identifies worst-case permutations designed to maximally disrupt model performance. Our extensive experiments demonstrate that ATP significantly degrades the performance of a wide range of LLMs. This reveals a pervasive vulnerability across different model sizes and architectures, including the most recent and popular models. Our findings expose a fundamental weakness in how current LLMs process structured data, underscoring the urgent need to develop permutation-robust models for reliable, real-world applications.
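
A semantically-invariant permutation, and a crude random-search stand-in for the paper's gradient-based attack, can be sketched as follows (the scoring function is a placeholder; ATP itself uses gradients, which are not reproduced here):

```python
import random

def permute_table(header, rows, col_perm, row_perm):
    """Apply a semantically-invariant permutation: same cells, new layout."""
    new_header = [header[j] for j in col_perm]
    new_rows = [[rows[i][j] for j in col_perm] for i in row_perm]
    return new_header, new_rows

def worst_case_permutation(header, rows, score, trials=200, seed=0):
    """Random-search stand-in for ATP's gradient-based search: among random
    layouts, keep the one maximizing a black-box disruption score
    (in the paper, a differentiable proxy for model failure)."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        cp = rng.sample(range(len(header)), len(header))
        rp = rng.sample(range(len(rows)), len(rows))
        s = score(*permute_table(header, rows, cp, rp))
        if s > best_score:
            best, best_score = (cp, rp), s
    return best
```

Because `permute_table` only reorders cells, any change in model output on the permuted table is attributable purely to layout, which is exactly the invariance the paper exploits.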

[352] Federated Learning with Hypergradient-based Online Update of Aggregation Weights

Ayano Nakai-Kasai, Tadashi Wadayama

Main category: cs.LG


Abstract: Federated learning using mobile and Internet of Things devices requires not only the ability to handle heterogeneity of clients’ data distributions but also high adaptability to varying communication environments. We propose FedHAW (Federated Learning with Hypergradient-based update of Aggregation Weights) that implements online updates of aggregation weights. FedHAW updates the aggregation weights by using hypergradient, the gradient of the objective function with respect to the weights, which can be calculated with low computational overhead. Simulation results show that the proposed method possesses high generalization performance in heterogeneous environments and high robustness to communication errors.
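
For a linear aggregate $\theta(w) = \sum_i w_i \theta_i$, the hypergradient has the cheap closed form $\partial F/\partial w_i = \langle \nabla F(\theta(w)), \theta_i \rangle$, which is presumably the kind of low-overhead computation the abstract refers to. A minimal sketch under that assumption (FedHAW's actual update may differ):

```python
import numpy as np

def project_simplex(w):
    """Euclidean projection onto the probability simplex (standard algorithm)."""
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(w)) + 1) > 0)[0][-1]
    shift = (1 - css[rho]) / (rho + 1)
    return np.maximum(w + shift, 0)

def hypergrad_step(w, client_models, grad_F, lr=0.1):
    """One aggregation-weight update: theta(w) = sum_i w_i theta_i, so
    dF/dw_i = <grad F(theta(w)), theta_i>, with no second backpropagation."""
    theta = client_models.T @ w              # aggregate model
    hg = client_models @ grad_F(theta)       # (num_clients,) hypergradient
    return project_simplex(w - lr * hg)
```

On a toy quadratic objective the weights concentrate on the client whose local model best fits the global objective, which is the adaptive behavior the abstract describes for heterogeneous clients.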

[353] Batch Normalization for Neural Networks on Complex Domains

Xuan Son Nguyen, Nistor Grozavu

Main category: cs.LG


Abstract: Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of principled Riemannian analogs of fundamental building blocks in deep neural networks (DNNs). Among those, Riemannian batch normalization (BN) layers have been shown to enhance training stability and improve accuracy. In this paper, we propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition, demonstrating the efficacy of our method.
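
For the simplest complex domain (the complex plane itself), a standard formulation of complex batch normalization whitens each complex feature with the 2x2 covariance of its real and imaginary parts; the paper's layers for richer domains such as the Siegel disk are not reproduced here:

```python
import numpy as np

def complex_batch_norm(z, eps=1e-5):
    """Whitening-style BN for a batch of complex scalars (complex-plane case).

    Treat each complex number as the vector (Re, Im), center it, and
    multiply by the inverse square root of the 2x2 covariance so the
    output has (approximately) zero mean and identity covariance.
    """
    v = np.stack([z.real, z.imag], axis=-1)          # (batch, 2)
    v = v - v.mean(axis=0)
    cov = (v.T @ v) / len(v) + eps * np.eye(2)
    # Inverse matrix square root via eigendecomposition of the SPD covariance.
    lam, U = np.linalg.eigh(cov)
    w = v @ (U / np.sqrt(lam)) @ U.T                 # v @ cov^{-1/2}
    return w[:, 0] + 1j * w[:, 1]
```

Whitening (rather than dividing by a single variance) matters because real and imaginary parts can be correlated and differently scaled; learnable shift and scale parameters, as in ordinary BN, would follow this step.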

[354] Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Ziwen Zhao, Menglin Yang

Main category: cs.LG


Abstract: Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $\Psi$-RAG, a tree-RAG framework with two key components. First, a hierarchical abstract tree index built through an iterative “merging and collapse” process that adapts to data distributions without a priori assumptions. Second, a multi-granular retrieval agent that intelligently interacts with the knowledge base with reorganized queries and an agent-powered hybrid retriever. $\Psi$-RAG supports diverse tasks from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.

[355] Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation

Shihong Ding, Fangyu Du, Cong Fang

Main category: cs.LG


Abstract: Multi-task learning (MTL) has emerged as a pivotal paradigm in machine learning by leveraging shared structures across multiple related tasks. Despite its empirical success, efficiently solvable likelihood-based algorithms remain largely undeveloped, even for shared linear representations, primarily due to the non-convex structure intrinsic to matrix factorization. This paper introduces a first-order algorithm that jointly learns a shared representation and task-specific parameters, with guaranteed efficiency. Notably, it converges in $\widetilde{\mathcal{O}}(1)$ iterations and attains a near-optimal estimation error of $\widetilde{\mathcal{O}}(dk/(TN))$, improving over existing likelihood-based methods by a factor of $k$, where $d$, $k$, $T$, $N$ denote input dimension, representation dimension, task count, and samples per task, respectively. Our results justify that likelihood-based first-order methods can efficiently solve the MTL problem.
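
The problem setup, $T$ tasks sharing a $d \times k$ linear representation $U$ with task-specific heads $v_t$, can be made concrete with a simple alternating least-squares baseline (the paper's first-order method is more refined and comes with near-optimal guarantees; this is only the problem definition in code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T, N = 8, 2, 6, 40

# Synthetic noiseless tasks sharing a k-dimensional representation U_star.
U_star = np.linalg.qr(rng.normal(size=(d, k)))[0]
V_star = rng.normal(size=(T, k))
X = rng.normal(size=(T, N, d))
Y = np.einsum("tnd,dk,tk->tn", X, U_star, V_star)   # y_tn = x_tn^T U v_t

# Alternating least squares: task-specific v-step, then shared U-step.
U = rng.normal(size=(d, k))
V = np.zeros((T, k))
for _ in range(100):
    for t in range(T):                               # per-task head fit
        V[t] = np.linalg.lstsq(X[t] @ U, Y[t], rcond=None)[0]
    # Joint fit of vec(U): y_tn = kron(v_t, x_tn) . [U[:,0]; U[:,1]; ...]
    A = np.concatenate([np.kron(V[t][None, :], X[t]) for t in range(T)])
    u = np.linalg.lstsq(A, Y.reshape(-1), rcond=None)[0]
    U = u.reshape(k, d).T
loss = np.mean((np.einsum("tnd,dk,tk->tn", X, U, V) - Y) ** 2)
```

With $TN = 240$ samples against only $dk + Tk = 28$ parameters the realizable problem is heavily overdetermined, which is the regime where the shared-representation structure pays off.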

[356] Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

Junda Ying, Yuxuan Wang, Bowen Yang, Peijie Zhou, Lei Zhang

Main category: cs.LG


Abstract: Inferring cellular trajectories from destructive snapshots is complicated by the challenges of stochasticity and non-conservative mass dynamics such as cell proliferation and apoptosis. Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level. However, this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution, which is essential for understanding lineage branching and fate decisions. We present Unbalanced Schrödinger Bridge (USB), a simulation-free framework for learning the underlying dynamics that integrates both stochastic and unbalanced effects and models the discrete, jump-like birth-death dynamics at single-cell resolution. Theoretically, USB provides a tractable solution to the Branching Schrödinger Bridge (BSB) problem, offering a rigorous microscopic interpretation where individual cells undergo both Brownian motion and discrete birth-death jumps. Technically, the method implements an efficient solver by introducing a simulation-free training objective that effectively scales to high-dimensional omics data. Empirically, we demonstrate on both simulated and real-world datasets that USB not only achieves trajectory reconstruction performance better than or comparable to deterministic baselines but also uniquely enables realistic discrete simulation of birth-death dynamics at single-cell resolution.

[357] Trading off rewards and errors in multi-armed bandits

Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu

Main category: cs.LG


Abstract: In multi-armed bandits, the most-explored arms are the most informative, while reward maximization typically pulls only the best arm. We study the tradeoff between identifying arm means accurately and accumulating reward, and present an algorithm with regret guarantees that interpolates between the two objectives. We provide both upper and lower bounds and validate empirically.
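
The tension the abstract describes, uniform pulls estimate every arm's mean well while greedy pulls maximize reward, can be illustrated with a simple interpolating allocation (this is a generic stand-in controlled by a mixing parameter, not the paper's algorithm):

```python
import random

def mixed_bandit(means, eta, horizon, seed=0):
    """Toy reward-vs-estimation tradeoff: with probability eta pull a
    uniformly random arm (good for estimating every arm's mean), otherwise
    pull the empirically best arm (good for cumulative reward).
    eta interpolates between the two objectives."""
    rng = random.Random(seed)
    n = [0] * len(means)          # pull counts
    s = [0.0] * len(means)        # reward sums
    for _ in range(horizon):
        if 0 in n or rng.random() < eta:
            a = rng.randrange(len(means))
        else:
            a = max(range(len(means)), key=lambda i: s[i] / n[i])
        r = means[a] + rng.gauss(0, 0.1)
        n[a] += 1
        s[a] += r
    return n, [s[i] / max(n[i], 1) for i in range(len(means))]
```

At `eta = 1` all arms are estimated equally well but reward is mediocre; at small `eta` reward concentrates on the best arm while the other estimates starve, which is the interpolation the paper's algorithm manages with regret guarantees.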

[358] Revealing graph bandits for maximizing local influence

Alexandra Carpentier, Michal Valko

Main category: cs.LG


Abstract: We study a graph bandit setting where the objective of the learner is to detect the most influential node of a graph by requesting as little information from the graph as possible. One of the relevant applications for this setting is marketing in social networks, where the marketer aims at finding and taking advantage of the most influential customers. The existing approaches for bandit problems on graphs require either partial or complete knowledge of the graph. In this paper, we do not assume any knowledge of the graph, but we consider a setting where it can be gradually discovered in a sequential and active way. At each round, the learner chooses a node of the graph and the only information it receives is a stochastic set of the nodes that the chosen node is currently influencing. To address this setting, we propose BARE, a bandit strategy for which we prove a regret guarantee that scales with the detectable dimension, a problem-dependent quantity that is often much smaller than the number of nodes.

[359] Distance metric learning for conditional anomaly detection

Michal Valko, Milos Hauskrecht

Main category: cs.LG


Abstract: Anomaly detection methods can be very useful in identifying unusual or interesting patterns in data. A recently proposed conditional anomaly detection framework extends anomaly detection to the problem of identifying anomalous patterns on a subset of attributes in the data. The anomaly always depends on (is conditioned on) the values of the remaining attributes. The work presented in this paper focuses on instance-based methods for detecting conditional anomalies. These methods depend heavily on the distance metric that lets us identify examples in the dataset that are most critical for detecting the anomaly. To optimize the performance of such methods, we study and devise a metric learning method that learns a distance metric that best reflects the conditional anomaly pattern.

[360] Fairness of Classifiers in the Presence of Constraints between Features

Martin C. Cooper, Imane Bousdira

Main category: cs.LG


Abstract: In Machine Learning, an accepted definition of fairness of a decision taken by a classifier is that it should not depend on protected features, such as gender. Unfortunately, when constraints exist between features, such dependencies can be obscured by the constraints. To avoid this problem, we propose that a decision be considered fair if it has a fair explanation. We define a fair explanation as a prime-implicant reason for the decision that does not contain any protected feature (where the constraints are taken into account in the definition of prime-implicant). Surprisingly, ignoring constraints can completely change the fairness of a decision (according to this definition) even in the absence of constraints between protected and unprotected features. Three possible definitions of fairness of a classifier are that for all its decisions (1) there are only fair explanations, (2) there is at least one fair explanation, or (3) changing protected features does not change the outcome. We identify the relationships between these different definitions of fairness and study the computational complexity of testing fairness of classifiers.

[361] Scaling Federated Linear Contextual Bandits via Sketching

Hantao Yang, Hong Xie, Xutong Liu, Defu Lian

Main category: cs.LG


Abstract: In federated contextual linear bandits, high data dimensionality incurs prohibitive computation and communication costs: local agents perform $O(d^3)$-time determinant computation and upload $O(d^2)$ parameters, making existing algorithms unscalable, where $d$ is the dimension of the data. To relieve these scaling bottlenecks, this paper proposes Federated Sketch Contextual Linear Bandits (FSCLB). On the computation side, FSCLB uses SVD to indirectly obtain the determinant required for communication, eliminating the prohibitive cost of direct determinant calculation and cutting complexity from $O(d^3)$ to $O(l^2d)$ per round, where $l < d$ is the sketch size. On the communication side, FSCLB introduces a double-sketch strategy that reduces both upload and download costs from $O(d^2)$ to $O(ld)$. Naively incorporating sketch updates into federated contextual linear bandits can destroy the local increment and invalidate the asynchronous communication condition; FSCLB solves this by replacing the covariance matrix with the sketch matrix when deciding whether to communicate. Theoretically, FSCLB achieves a regret bound of $\widetilde{O} ((\sqrt{d}+\sqrt{M\varepsilon_l})\sqrt{lT})$, where $\varepsilon_l$ is upper bounded by the spectral tail of the covariance matrix; when $l$ exceeds the rank of the covariance matrix, the bound simplifies to $\widetilde{O}(\sqrt{ldT})$, matching the optimal no-sketch regret. Experiments on both synthetic and real-world datasets show that FSCLB significantly reduces computational and communication costs by over 90% while sacrificing only a negligible amount of cumulative reward.
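
The determinant-via-SVD trick is a linear-algebra identity that can be verified directly: for an $l \times d$ sketch $B$, the matrix $B^\top B$ has rank at most $l$, so its only nonzero eigenvalues are the squared singular values of $B$ (a sketch of the identity only, with hypothetical names, not the full FSCLB algorithm):

```python
import numpy as np

def logdet_from_sketch(B, lam, d):
    """log det(B^T B + lam*I_d) from the singular values of an l x d sketch B.

    The d x d determinant needs only an SVD of B (O(l^2 d)) instead of a
    direct O(d^3) computation: l eigenvalues are s_i^2 + lam and the
    remaining d - l are exactly lam.
    """
    s = np.linalg.svd(B, compute_uv=False)    # l singular values
    return float(np.sum(np.log(s**2 + lam)) + (d - len(s)) * np.log(lam))

rng = np.random.default_rng(0)
l, d, lam = 3, 20, 0.5
B = rng.normal(size=(l, d))
direct = float(np.linalg.slogdet(B.T @ B + lam * np.eye(d))[1])
```

The regularizer `lam` plays the role of the ridge term in the bandit's covariance matrix; the identity is exact, not an approximation, which is why the determinant needed for the communication condition can be obtained from the sketch alone.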

[362] Possibilistic Predictive Uncertainty for Deep Learning

Yao Ni, Jeremie Houssineau, Yew Soon Ong, Piotr Koniusz

Main category: cs.LG


Abstract: Deep neural networks achieve impressive results across diverse applications, yet their overconfidence on unseen inputs necessitates reliable epistemic uncertainty modelling. Existing methods for uncertainty modelling face a fundamental dilemma: Bayesian approaches provide principled estimates but remain computationally prohibitive, while efficient second-order predictors lack rigorous derivations connecting their specific objectives to epistemic uncertainty quantification. To resolve this dilemma, we introduce Dirichlet-approximated possibilistic posterior predictions (DAPPr), a principled framework leveraging possibility theory. We define a possibilistic posterior over parameters, project this posterior to the prediction space via supremum operators, and approximate the projected posterior using learnable Dirichlet possibility functions. This projection-and-approximation strategy yields a simple training objective with closed-form solutions. Extensive experiments across diverse benchmarks demonstrate that our approach achieves competitive or superior uncertainty quantification performance compared to state-of-the-art evidential deep learning methods while maintaining both principled derivation and computational efficiency. Code will be available at https://github.com/MaxwellYaoNi/DAPPr.

[363] LambdaRankIC: Directly Optimizing Rank IC for Financial Prediction

Yan Lin, Yihong Su, Yi Yang

Main category: cs.LG


Abstract: In financial predictions, the performance of machine learning models is often assessed by Rank IC, which is the Spearman rank correlation between the model predictions and the realized asset returns. Despite its wide adoption, most existing models are trained using regression losses or ranking objectives that may not align with Rank IC. We propose LambdaRankIC, a novel learning-to-rank approach that directly optimizes Rank IC. We circumvent the non-differentiability of the ranking operator by deriving the closed-form expression for the lambda gradients induced by the pairwise rank swaps, which enables efficient gradient-based optimization within the LambdaRank framework. We implement LambdaRankIC as a custom objective in XGBoost. Theoretically, we show that our approach optimizes an upper bound on Rank IC. We evaluate the proposed approach on both simulated and real-world financial data. In simulation studies, LambdaRankIC accurately recovers the true ranking structure in noiseless settings and consistently outperforms regression-based and NDCG-oriented ranking methods under low signal-to-noise ratios and heavy-tailed noise regimes. In empirical experiments using real market data, LambdaRankIC achieves the best out-of-sample performance on evaluation metrics commonly used in finance, including Rank IC, ICIR, monthly return, and Sharpe ratio. These results show that directly optimizing Rank IC can yield substantial improvements over conventional learning objectives in financial predictions when the full-order ranking quality is the primary goal.
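
The optimization target itself is easy to state in code: Rank IC is the Pearson correlation of the two rank vectors. This computes the metric only; the paper's closed-form lambda gradients for optimizing it are not reproduced here:

```python
import numpy as np

def rank_ic(pred, ret):
    """Rank IC: Spearman rank correlation between model predictions and
    realized returns (assumes no ties; with ties one would average ranks)."""
    def ranks(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(len(x), dtype=float)
        return r
    rp, rr = ranks(pred), ranks(ret)
    rp -= rp.mean()
    rr -= rr.mean()
    return float(rp @ rr / np.sqrt((rp @ rp) * (rr @ rr)))
```

Because only ranks enter the score, any monotone transform of the predictions leaves Rank IC unchanged, which is why a regression loss on raw values can disagree with the metric practitioners actually report.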

[364] A Comparative Study of QSPR Methods on a Unique Multitask PAMPA dataset

András Formanek, Anna Vincze, Richárd Bicsak, Yves Moreau, György T. Balogh, Adam Arany

Main category: cs.LG


Abstract: We present a unique multitask dataset comprising 143 drug and drug candidate molecules, each evaluated with in vitro parallel artificial-membrane permeability assays (PAMPA) using six different model membranes. Using this resource, we systematically assess the effectiveness of various molecular descriptors and regression models in predicting passive membrane permeability. The studied models range from simple linear regression to a modern pre-trained transformer architecture. Particular attention is given to the trade-off between predictive performance and model interpretability, highlighting the challenges introduced by machine learning approaches. To our knowledge, this is the most comprehensive study on simultaneous modeling of multiple organ-specific PAMPA membranes to date, offering novel insights into membrane-specific permeability profiles. We found that expert-designed physico-chemical property descriptors are better suited to a limited-sample-size permeability study than deep-learning-based representations.

[365] Learning Multimodal Energy-Based Model with Multimodal Variational Auto-Encoder via MCMC Revision

Jiali Cui, Zhiqiang Lao, Heather Yu

Main category: cs.LG


Abstract: Energy-based models (EBMs) are a flexible class of deep generative models and are well-suited to capture complex dependencies in multimodal data. However, learning a multimodal EBM by maximum likelihood requires Markov Chain Monte Carlo (MCMC) sampling in the joint data space, where noise-initialized Langevin dynamics often mixes poorly and fails to discover coherent inter-modal relationships. Multimodal VAEs have made progress in capturing such inter-modal dependencies by introducing a shared latent generator and a joint inference model. However, both the shared latent generator and the joint inference model are parameterized as unimodal Gaussian (or Laplace) distributions, which severely limits their ability to approximate the complex structure induced by multimodal data. In this work, we study the learning problem of the multimodal EBM, shared latent generator, and joint inference model. We present a learning framework that effectively interweaves their MLE updates with corresponding MCMC refinements in both the data and latent spaces. Specifically, the generator is learned to produce coherent multimodal samples that serve as strong initial states for EBM sampling, while the inference model is learned to provide informative latent initializations for generator posterior sampling. Together, these two models serve as complementary models that enable effective EBM sampling and learning, yielding realistic and coherent multimodal EBM samples. Extensive experiments demonstrate superior performance for multimodal synthesis quality and coherence compared to various baselines. We conduct various analyses and ablation studies to validate the effectiveness and scalability of the proposed multimodal framework.

[366] Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

Mengke Zhao, Guang-Xing Li, Duo Xu, Keping Qiu

Main category: cs.LG


Abstract: Complex physical systems, from supersonic turbulence to the macroscopic structure of the universe, are governed by continuous multiscale dynamics. While modern machine learning architectures excel at mapping the high-dimensional observables of these systems, it remains unclear whether they internalize the governing physical laws or merely interpolate discrete statistical correlations. Standard Explainable AI (XAI) architectures, particularly perturbation-based and gradient-saliency methods, rely on pixel-wise perturbations, which generate unphysical artifacts and push inputs off the valid empirical distribution. To resolve this, we introduce a diagnostic framework driven by Constrained Diffusion Decomposition (CDD), a diffusion-based multiscale data decomposition algorithm that enables physically constrained data generation and model evaluation via scale-aware modifications. Applying this framework to a Denoising Diffusion Probabilistic Model (DDPM), we execute deterministic interventions directly within the continuous, CDD-based scale space. We demonstrate that under moderate physical perturbations, the unconstrained generative model exhibits localized structural freezing and non-linear instability rather than continuous PDE-like responses. The network fails to maintain cross-scale continuity, causing the generative trajectory to diverge when pushed into unseen physical states. By synthesizing a continuum of physically coherent states, this scale-informed methodology establishes a controlled test ground to evaluate algorithmic vulnerabilities, providing the rigorous physical constraints necessary for future architectures to respect the multiscale causality of the natural universe.

[367] AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

Zhijie Cai, Haolong Chen, Guangxu Zhu

Main category: cs.LG


Abstract: Fine-tuning LLMs is necessary for various dedicated downstream tasks, but classic backpropagation-based fine-tuning methods require substantial GPU memory. To this end, a recent work, MeZO, which relies solely on forward passes to fine-tune LLMs, significantly reduces GPU requirements at the cost of slower convergence due to its indifference to loss landscapes. Standard solutions, such as Adam, explore loss landscapes by estimating the first- and second-order moments and storing them in memory to guide the model’s movement through dimensions with lower curvature and vice versa. However, directly applying Adam negates MeZO’s advantage as it will triple the memory requirement. In light of this, we propose AdaMeZO, a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory. We present a theoretical analysis of AdaMeZO, corroborated by extensive experiments demonstrating AdaMeZO’s performance, showing that AdaMeZO can outperform MeZO while requiring up to 70% fewer forward passes. Trajectory visualizations affirm AdaMeZO’s ability to adapt to diverse loss landscapes.
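
The memory trick that makes MeZO-style methods cheap, regenerating the random perturbation from its seed instead of storing it, can be sketched in a few lines (a toy MeZO step; AdaMeZO's seed-based recovery of Adam-style moments is not reproduced here):

```python
import numpy as np

def mezo_step(theta, loss_fn, lr=0.05, eps=1e-3, seed=0):
    """One MeZO-style zeroth-order step (sketch). The perturbation z is never
    kept in memory: it is regenerated from its seed on demand, the same trick
    AdaMeZO builds on to use moment information without moment buffers."""
    z = np.random.default_rng(seed).normal(size=theta.shape)
    # Two forward passes estimate the directional derivative along z.
    g = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    z = np.random.default_rng(seed).normal(size=theta.shape)  # regenerate
    return theta - lr * g * z

# Minimize a toy quadratic using forward passes only.
loss = lambda w: float(np.sum((w - 3.0) ** 2))
w = np.zeros(4)
for step in range(500):
    w = mezo_step(w, loss, seed=step)
```

Only two scalar losses and one integer seed are live at any time, which is why the peak memory stays at inference level regardless of parameter count.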

[368] Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim

Main category: cs.LG

Abstract: Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that perform distribution matching are a promising method, but they are notorious for training instability and mode collapse. In particular, unstable rewards in red-teaming accelerate mode collapse. We propose Stable-GFN (S-GFN), which eliminates partition function $Z$ estimation in GFN and reduces training instability. S-GFN avoids Z-estimation through pairwise comparisons and employs a robust masking methodology against noisy rewards. Additionally, we propose a fluency stabilizer to prevent the model from getting stuck in local optima that produce gibberish. S-GFN provides more stable training while maintaining the optimal policy of GFN. We demonstrate the overwhelming attack performance and diversity of S-GFN across various settings.
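
The Z-free pairwise idea can be sketched as follows: subtracting the trajectory-balance residuals of two trajectories cancels the log-partition term (a simplification with our own names; the backward policy is folded into the log-reward term here):

```python
def pairwise_tb_loss(logpf_a, logr_a, logpf_b, logr_b):
    """Contrastive trajectory-balance residual for a trajectory pair (sketch).

    A single trajectory's TB residual is log Z + log P_F - log R; taking the
    difference over a pair cancels log Z, so no partition-function estimate
    needs to be learned or maintained.
    """
    delta = (logpf_a - logr_a) - (logpf_b - logr_b)
    return delta ** 2
```

Driving this pairwise residual to zero enforces the same flow-matching condition as trajectory balance, without the unstable learned $Z$.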

[369] Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation

Andrzej Ruszczynski, Tiangang Zhang

Main category: cs.LG

Abstract: For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the class of linear systems. We use both concepts in a feature-based $Q$-learning method with multipattern $Q$-factor approximation and we prove a high-probability regret bound of $\mathcal{O}\big(H^2 N^H \sqrt{K}\big)$, where $H$ is the horizon, $N$ is the mini-batch size, and $K$ is the number of episodes. We also propose an economical version of the $Q$-learning method that streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.

[370] Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

Man Yung Wong

Main category: cs.LG

Abstract: Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 probability to the correct expert at the transition. Three lightweight gate modifications raise this to 0.748 +/- 0.002 (124x), cutting experts needed for 99% coverage from infeasible to a small constant: temporal memory (beta), a per-expert LIF membrane potential accumulating routing context across tokens; precision-weighted gating (Pi), a per-expert inverse variance of recent prediction error, yielding 31x contrast between reliable and unreliable experts; and anticipatory routing, a next-state predictor conditioned on the beta-accumulated hidden state. The mechanisms draw from Friston’s Free Energy Principle and use LIF dynamics from spiking neural networks. An ablation across all 2^3 subsets reveals a super-additive beta x Ant interaction: anticipation alone gives nothing (+0.000 +/- 0.001); beta alone gives modest gain (+0.295 +/- 0.013); combined they close 75% of the oracle gap (+0.741 +/- 0.002, exceeding the sum by +0.446 +/- 0.014). This is structural: a stateless predictor cannot detect approaching transitions because pre-transition tokens are distributionally identical to within-domain tokens. In a character-level MoE LM (5 seeds), beta-routing reduces transition-step BPC from 6.56 +/- 0.01 (Standard) to 4.01 +/- 0.15 (beta-MoE); the beta + Ant gate places 0.86 +/- 0.02 probability on the correct domain expert before that domain appears in input, vs 0.42 +/- 0.12 for Standard MoE. Reference implementations (~200 lines each): https://github.com/russellwmy/affinity-is-not-enough
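
The temporal-memory (beta) gate can be sketched as a per-expert leaky integrator over routing logits (our own minimal rendering; the decay constant and names are assumptions, and the precision and anticipation terms are omitted):

```python
import numpy as np

def lif_gate(affinity_logits, decay=0.9):
    """Per-expert leaky-integrate membrane over a token sequence (sketch).

    m_t = decay * m_{t-1} + logits_t, and routing uses softmax(m_t), so
    routing context from earlier tokens persists across a domain boundary.
    `decay` is our stand-in for the paper's LIF membrane dynamics.
    """
    m = np.zeros(affinity_logits.shape[1])
    probs = []
    for logits in affinity_logits:
        m = decay * m + logits          # leaky integration of evidence
        e = np.exp(m - m.max())         # stable softmax over experts
        probs.append(e / e.sum())
    return np.array(probs)
```

Because the membrane carries evidence forward, a token that is locally ambiguous still routes to the expert favored by recent context, which is the behavior the abstract's transition experiments probe.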

[371] Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning

Jiaming Zhang, Yujie Yang, Yao Lyu, Shengbo Eben Li, Liping Zhang

Main category: cs.LG

Abstract: Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominent paradigm. Handling state-wise constraints with the Lagrangian method requires a distinct multiplier for every state, necessitating neural networks to approximate them as a multiplier network. However, applying standard dual gradient ascent to multiplier networks induces severe training oscillations. This is because the inherent instability of dual ascent is exacerbated by network generalization – local overshoots and delayed updates propagate to adjacent states, further amplifying policy fluctuations. Existing stabilization techniques are designed for scalar multipliers, which are inadequate for state-dependent multiplier networks. To address this challenge, we propose an augmented Lagrangian multiplier network (ALaM) framework for stable learning of state-wise multipliers. ALaM consists of two key components. First, a quadratic penalty is introduced into the augmented Lagrangian to compensate for delayed multiplier updates and establish the local convexity near the optimum, thereby mitigating policy oscillations. Second, the multiplier network is trained via supervised regression toward a dual target, which stabilizes training and promotes convergence. Theoretically, we show that ALaM guarantees multiplier convergence and thus recovers the optimal policy of the constrained problem. Building on this framework, we integrate soft actor-critic (SAC) with ALaM to develop the SAC-ALaM algorithm. Experiments demonstrate that SAC-ALaM outperforms state-of-the-art safe RL baselines in both safety and return, while also stabilizing training dynamics and learning well-calibrated multipliers for risk identification.
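
The multiplier update at the heart of this can be sketched as regression toward an augmented-Lagrangian dual target (a hedged one-liner; `rho` and the clipping at zero follow standard augmented-Lagrangian practice and are not necessarily the paper's exact form):

```python
def dual_target(lmbda, violation, rho=1.0):
    """Regression target for a state-wise multiplier (sketch).

    max(0, lambda + rho * g(s)) keeps the multiplier non-negative and folds
    in the quadratic-penalty correction; the multiplier network would be
    trained toward this target instead of by raw dual gradient ascent.
    """
    return max(0.0, lmbda + rho * violation)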

[372] Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

Chaohao Yuan, Chenghao Xiao, Yu Rong, Hong Cheng, Long-Kai Huang

Main category: cs.LG

Abstract: SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential training can cause catastrophic forgetting, and joint optimization often suffers from severe gradient conflicts. We analyze SFT and RLVR through the lens of task vectors and reveal three structural properties behind these failures: a 30x magnitude disparity, 45% sign interference, and heterogeneous module-wise update distributions. These findings show SFT and RLVR are difficult to integrate directly, but they also suggest that the two paradigms modify partly complementary components of the model. Motivated by these observations, we propose Decoupled Test-time Synthesis (DoTS), a post-hoc framework that allows SFT and RLVR checkpoints to be trained independently and synthesizes their capabilities only at inference time via task vector arithmetic, without updating model parameters. To reduce interference, DoTS applies selective sparsification with norm-preserving rescaling. It then uses Bayesian optimization on a small set of unlabeled queries to search for combination coefficients on the Pareto frontier of consistency and perplexity. Empirically, DoTS matches or exceeds the performance of training-based SFT–RLVR integration methods across multiple mathematical reasoning benchmarks, incurring only $\sim$3% of the computational cost. When applied to stronger post-trained checkpoints, DoTS surpasses SOTA models and generalizes to out-of-domain benchmarks without re-tuning. Code is available at https://github.com/chaohaoyuan/DoTS.
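
The test-time synthesis step can be sketched with plain task-vector arithmetic (a simplification: magnitude-based top-k stands in for the paper's selective sparsification, and fixed coefficients stand in for the Bayesian-optimized ones):

```python
import numpy as np

def sparsify_rescale(tau, keep=0.2):
    """Keep the top-|keep| fraction of entries by magnitude, then rescale so
    the L2 norm matches the dense task vector (norm-preserving, per the
    abstract). `keep` is our assumed hyperparameter."""
    k = max(1, int(keep * tau.size))
    idx = np.argsort(np.abs(tau))[-k:]
    sparse = np.zeros_like(tau)
    sparse[idx] = tau[idx]
    scale = np.linalg.norm(tau) / (np.linalg.norm(sparse) + 1e-12)
    return sparse * scale

def synthesize(base, tau_sft, tau_rlvr, a=0.5, b=0.5):
    """Test-time synthesis sketch: base + a*SFT vector + b*RLVR vector,
    with no parameter updates; a, b would come from Bayesian optimization."""
    return base + a * sparsify_rescale(tau_sft) + b * sparsify_rescale(tau_rlvr)
```

In the real method the vectors are per-module model deltas rather than flat arrays, but the combine step is the same arithmetic.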

[373] Class Angular Distortion Index for Dimensionality Reduction

Kaviru Gunaratne, Stephen Kobourov, Jacob Miller

Main category: cs.LG

Abstract: Dimensionality reduction (DR) techniques are often characterized by whether they preserve global, high-level structures in the data or local, neighborhood structures. This distinction matters in visualization: global methods can obscure clusters while local methods can over-emphasize them. Yet, even when clusters appear distinct, their relative arrangement in the projection may be arbitrary or misleading, a common issue in techniques such as t-SNE and UMAP. Existing cluster quality metrics either only measure cluster separability or assume spherical, globular clusters in the original space. We introduce the Class Angular Distortion Index (CADI), a metric that uses internal angles among point triples to determine the faithfulness of cluster organization in a projection. We show cases on both real and synthetic data where existing cluster metrics fail, but CADI provides an interpretable result. Since it relies on computing angles, CADI is also differentiable, enabling optimization. We demonstrate this with a CADI-based DR technique.
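
The angle-based idea reduces to comparing interior angles of point triples before and after projection (a hedged stand-in: CADI restricts triples using class labels and aggregates differently; here we simply average absolute angle differences):

```python
import numpy as np

def triple_angle(a, b, c):
    """Interior angle at vertex b of the triangle (a, b, c), in radians."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angle_distortion(X_high, X_low, triples):
    """Mean absolute angle difference over given point triples (sketch)."""
    errs = [abs(triple_angle(*(X_high[i] for i in t)) -
                triple_angle(*(X_low[i] for i in t))) for t in triples]
    return float(np.mean(errs))
```

Because angles are invariant to uniform scaling and rotation, the measure ignores harmless global transformations of the projection, and `arccos` of a smooth cosine keeps the metric differentiable almost everywhere, which is what enables the CADI-based optimization the abstract mentions.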

[374] Unlearning Offline Stochastic Multi-Armed Bandits

Zichun Ye, Runqi Wang, Xuchuang Wang, Xutong Liu, Shuai Li, Mohammad Hajiesmaili

Main category: cs.LG

Abstract: Machine unlearning aims to unlearn data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unsupervised / supervised machine unlearning, leaving unlearning for sequential decision-making systems far less understood. We initiate the first study of a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models, the fixed-sample model and the distribution model. For these settings, our algorithmic design is built on two canonical base algorithms: Gaussian mechanism and rollback, and we propose adaptive algorithms that switch between them according to the data regime and privacy constraint. We further introduce a mixing procedure that elucidates the rationale behind these baselines. We provide performance guarantees across the above settings and establish lower bounds under both dataset models. Experiments validate the predicted tradeoffs and demonstrate the effectiveness of the proposed methods.
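
For the Gaussian-mechanism baseline, unlearning an arm's samples can be sketched as subtracting them from the sufficient statistics and adding calibrated noise (names and the noise scale are our assumptions; the paper's adaptive switching with rollback is not shown):

```python
import numpy as np

def unlearn_arm(sum_r, count, deleted_rewards, sigma=0.1, rng=None):
    """Delete samples from one arm's statistics, then noise the mean (sketch).

    Subtracting the deleted rewards performs exact removal; the Gaussian
    noise masks their residual influence on the published estimate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    new_sum = sum_r - sum(deleted_rewards)
    new_count = count - len(deleted_rewards)
    noisy_mean = new_sum / new_count + rng.normal(0.0, sigma)
    return noisy_mean, new_count
```

Rollback, the other base algorithm, would instead retrain from a stored checkpoint that predates the deleted data; the adaptive methods choose between the two depending on how much data is deleted and how tight the privacy constraint is.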

[375] Knowing when to trust machine-learned interatomic potentials

Shams Mehdi, Ilkwon Cho, Olexandr Isayev

Main category: cs.LG

Abstract: Prevailing machine-learned interatomic potential (MLIP) uncertainty-quantification methods rely on ensembles of independently trained backbones. These methods scale unfavorably with foundation-scale MLIPs, and their member-disagreement signals correlate weakly with per-molecule prediction error. Here we probe the frozen per-atom representations of a pretrained MLIP with a compact discriminative classifier, recasting MLIP uncertainty quantification as selective classification rather than error regression. The resulting method, PROBE (Post-hoc Reliability frOm Backbone Embeddings), produces a per-prediction reliability probability that monotonically tracks actual error without modification to the underlying model. Across large held-out evaluation sets and two structurally distinct MLIP architectures, PROBE outperforms ensemble disagreement as a binary reliability signal, which strengthens with the expressiveness of the backbone representation, implying a favorable scaling trajectory toward foundation-scale MLIPs. Multi-head self-attention additionally yields per-atom importance maps, providing chemically interpretable diagnostics at no additional computational cost. PROBE is post-hoc and architecture-agnostic, and is directly deployable on any MLIP that exposes per-atom representations.

[376] Bridging Graph Drawing and Dimensionality Reduction with Stochastic Stress Optimization

Daniel Hangan, Stephen Kobourov, Jacob Miller

Main category: cs.LG

Abstract: Both Dimensionality Reduction (DR) and Graph Drawing (GD) aim to visualize abstract, non-linear structures, yet rely on different optimization paradigms. This contrast is evident in Multidimensional Scaling (MDS), which typically depends on the SMACOF algorithm despite graph drawing results showing that simpler stochastic optimization schemes can be more effective for the same objective. We bridge these domains by adapting Stochastic Gradient Descent (SGD) techniques from graph drawing to vector data embedding. We present a scikit-learn compatible estimator that minimizes global stress through local pairwise updates, improving upon the existing implementation. Experiments on standard high-dimensional benchmarks show that our stochastic solver converges substantially faster than SMACOF while achieving comparable or lower stress.
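
The stochastic stress scheme borrowed from graph drawing updates one pair of points at a time (a minimal sketch following the SGD graph-drawing literature; the d^-2 weighting and step cap are standard choices, not necessarily this paper's exact schedule):

```python
import numpy as np

def sgd_stress_step(X, i, j, d_target, lr=0.1):
    """One pairwise stress update (sketch): nudge points i and j so their
    embedding distance moves toward the target distance d_target."""
    delta = X[i] - X[j]
    dist = np.linalg.norm(delta)
    mu = min(1.0, lr / (d_target ** 2))          # w_ij = d^-2 weight, capped
    r = mu * (dist - d_target) / 2.0 * delta / dist
    X[i] -= r                                    # move the pair symmetrically
    X[j] += r
    return X
```

A full pass shuffles all pairs and applies this update to each, with a decaying `lr`; the cap `mu <= 1` guarantees a single update never overshoots the target distance, which is the property that makes the scheme converge without a line search.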

[377] From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting

Alireza Namazi, Heman Shakeri

Main category: cs.LG

Abstract: Clinical time-series forecasting is increasingly studied for decision support, yet standard aggregate metrics can obscure whether a model is actually useful for the task it is meant to serve. In safety-critical settings, low average error can coexist with dangerous failures in exactly the high-risk regimes that matter most. We present a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support. For early warning, we evaluate on real data from three clinical cohorts using event-level recall and false alarms per patient-day, metrics that reflect operational alarm burden rather than aggregate accuracy. We show that models appearing acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice, where insulin-on-board is elevated and missed warnings carry the greatest clinical consequences. Standard forecasting evaluation, however, does not test whether a model can reason about the effects of actions, a requirement for supporting insulin dosing decisions. We therefore add a second, interventional arm using the FDA-accepted UVA/Padova simulator, where we evaluate whether forecasters can predict glucose responses to altered insulin plans in paired factual/counterfactual scenarios. We show that models that look strong on real-data forecasting often fail to predict the direction, magnitude, or ranking of intervention effects, and choose poor insulin doses when evaluated under a clinically motivated cost. Taken together, the two arms reveal a consistent gap between forecasting accuracy and task-relevant usefulness. We release the benchmark, the standardized preprocessing pipeline for public cohorts, and the simulator-based interventional dataset as a reproducible toolkit.
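
The alarm-burden metrics can be sketched as follows (our simplified definitions, not the paper's exact ones: an event counts as detected if any alarm fires within `horizon` minutes before onset, unmatched alarms are false, and times are in minutes):

```python
def event_metrics(events, alarms, horizon=30, total_minutes=1440):
    """Event-level recall and false alarms per patient-day (sketch)."""
    hit = lambda e, a: 0 <= e - a <= horizon      # alarm precedes event
    detected = sum(any(hit(e, a) for a in alarms) for e in events)
    recall = detected / len(events) if events else 1.0
    false_alarms = sum(not any(hit(e, a) for e in events) for a in alarms)
    patient_days = total_minutes / 1440
    return recall, false_alarms / patient_days
```

Evaluating these on clinically defined slices (e.g., only the post-bolus windows) rather than the full test set is what exposes the failure mode the abstract describes.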

[378] PEACE: Cross-modal Enhanced Pediatric-Adult ECG Alignment for Robust Pediatric Diagnosis

Xinran Liu, Yuwen Li, Hongxiang Gao, Heyang Xu, Jianqing Li, Zongmin Wang, Chengyu Liu

Main category: cs.LG

Abstract: Automated pediatric electrocardiogram (ECG) diagnosis remains challenging because models trained predominantly on adult data suffer from substantial cross-population mismatch, while pediatric labels are often scarce. We present PEACE (Pediatric-Adult ECG Alignment via Cross-modal Enhancement), a structured cross-modal alignment framework for adult-to-pediatric ECG transfer. PEACE integrates tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization to align transferable adult ECG representations with pediatric diagnostic targets. Since ZZU-pECG provides no paired clinical reports, we generate label-conditioned semantic descriptors using Gemini with concise clinical prompts and use them only as auxiliary training supervision; inference remains ECG-only. On ZZU-pECG, PEACE achieves 59.39%, 79.03%, and 90.89% AUC under zero-shot, 50-shot, and full fine-tuning settings, respectively, and reaches 96.65% AUC on the shared PTB-XL label space. These results suggest that structured clinical semantic supervision can improve low-resource adult-to-pediatric ECG transfer, while prospective clinical validation and more explicit age-aware modeling remain necessary before real-world deployment.

[379] Budget Constraints as Riemannian Manifolds

Michael Helcig, Dan Alistarh

Main category: cs.LG

Abstract: Assigning one of K options to each of N groups under a total cost budget is a recurring problem in machine learning, appearing in mixed-precision quantization, non-uniform pruning, and expert selection. The objective (model loss) depends jointly on all assignments and does not decompose across groups, which prevents combinatorial solvers from optimizing the true objective directly and limits them to proxy objectives. Evolutionary search evaluates the actual loss but lacks gradient information, while penalty-based methods provide gradients but enforce the budget only approximately and require sensitive hyperparameter tuning. We observe that under softmax relaxation, the budget constraint defines a smooth Riemannian manifold in logit space with particularly simple geometry: the normal vector is available in closed form, shifting logits along the cost vector changes expected cost monotonically, allowing binary-search retraction, and vector transport reduces to a single inner product. Building on this structure, we propose Riemannian Constrained Optimization (RCO), which augments a standard Adam update with tangent projection, binary-search retraction, and momentum transport. Combined with Gumbel straight-through estimation and budget-constrained dynamic programming for discrete feasibility, RCO enables first-order optimization of the true objective under exact budget enforcement, without introducing constraint hyperparameters. On synthetic knapsack problems with known optima, the manifold-based constraint handling recovers optimal solutions, whereas penalty methods plateau at 83% of optimal. On LLM compression tasks, including mixed-precision quantization and MoE expert pruning, RCO matches or exceeds evolutionary search methods while requiring 3x to 16x lower wall-clock cost on the evaluated configurations.
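
The binary-search retraction exploits the stated fact that shifting logits along the cost vector changes expected cost monotonically (a minimal sketch on a toy problem; the shapes, bracket, and tolerance are our assumptions, and the tangent projection and momentum transport are omitted):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def retract_to_budget(logits, costs, budget, tol=1e-6):
    """Binary-search retraction (sketch): shift every group's logits along
    the cost vector until the total expected cost equals the budget."""
    def exp_cost(t):
        return float((softmax(logits - t * costs) * costs).sum())
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Larger t pushes mass toward cheap options, lowering expected cost.
        if exp_cost(mid) > budget:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return logits - t * costs, exp_cost(t)
```

After an ordinary Adam step on the relaxed objective, this retraction restores exact budget feasibility without any penalty hyperparameter, which is the point of the manifold view.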

[380] Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game

Lixing Li

Main category: cs.LG

Abstract: While Large Language Models have achieved notable success on formal mathematics benchmarks such as MiniF2F, it remains unclear whether these results stem from genuine logical reasoning or semantic pattern matching against pre-training data. This paper identifies Architectural Reasoning, the ability to synthesize formal proofs using exclusively local axioms and definitions within an alien math domain, as a necessary capability for future automated theorem-discovery AI. We use the Obfuscated Natural Number Game as a benchmark to evaluate Architectural Reasoning. By renaming identifiers in the Natural Number Game in Lean 4, we created a zero-knowledge, closed environment. We evaluate state-of-the-art models, finding a universal latency tax where obfuscation increases inference time. The results also reveal a divergence in robustness: while general models (Claude-Sonnet-4.5, GPT-4o) suffer performance degradation, reasoning models (DeepSeek-R1, GPT-5, DeepSeek-Prover-V2) maintain the same accuracy despite the absence of semantic cues. These findings provide a quantitative metric for assessing the true capacity for mathematical reasoning.

[381] Deep Kernel Learning for Stratifying Glaucoma Trajectories

Bruce Rushing, Angela Danquah, Alireza Namazi, Arjun Dirghangi, Heman Shakeri

Main category: cs.LG

Abstract: Effectively stratifying patient risk in chronic diseases like glaucoma is a major clinical challenge. Clinicians need tools to identify patients at high risk of progression from sparse and irregularly-sampled electronic health records (EHRs). We propose a novel deep kernel learning (DKL) architecture that leverages a Gaussian Process (GP) backend. The GP’s kernel is defined by a transformer-based feature extractor applied to clinical-BERT embeddings to model glaucoma patient trajectories from multimodal EHR data. Our method successfully identifies three clinically distinct patient subgroups. Crucially, the model learns to decouple disease progression from current severity, identifying a high-risk group with a worsening trajectory despite having better average visual acuity than a second, stably poor group. This reveals that the model learns to identify progression risk rather than just the current disease state. This ability to stratify patients based on their risk trajectory progression offers a powerful tool for clinical decision support, enabling targeted interventions for high-risk individuals and improving the management of glaucoma care.

[382] Aitchison Embeddings for Learning Compositional Graph Representations

Nikolaos Nakis, Chrysoula Kosma, Panagiotis Promponas, Michail Chatzianastasis, Giannis Nikolentzos

Main category: cs.LG

Abstract: Representation learning is central to graph machine learning, powering tasks such as link prediction and node classification. However, most graph embeddings are hard to interpret, offering limited insight into how learned features relate to graph structure. Many networks naturally admit a role-mixture view, where nodes are best described as mixtures over latent archetypal factors. Motivated by this structure, we propose a compositional graph embedding framework grounded in Aitchison geometry, the canonical geometry for comparing mixtures. Nodes are represented as simplex-valued compositions and embedded via isometric log-ratio (ILR) coordinates, which preserve Aitchison distances while enabling unconstrained optimization in Euclidean space. This yields intrinsically interpretable embeddings whose geometry reflects relative trade-offs among archetypes and supports coherent behavior under component restriction; we consider both fixed and learnable ILR bases. Across node classification and link prediction, our method achieves competitive performance with strong baselines while providing explainability by construction rather than post-hoc. Finally, subcompositional coherence enables principled component restriction: removing and renormalizing subsets preserves a well-defined geometry, which we exploit via subcompositional dimensionality removal to probe how archetype groups influence representations and predictions.
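
ILR coordinates can be computed with a standard pivot (Helmert-type) orthonormal basis; the sketch below covers the fixed-basis case only (the paper also learns the basis):

```python
import numpy as np

def ilr(x):
    """Isometric log-ratio coordinates of a composition x on the simplex,
    using the standard pivot (Helmert-type) orthonormal balance basis."""
    x = np.asarray(x, dtype=float)
    D = x.size
    logx = np.log(x)
    z = np.empty(D - 1)
    for i in range(1, D):
        # Balance of the geometric mean of the first i parts vs. part i+1.
        z[i - 1] = np.sqrt(i / (i + 1)) * (logx[:i].mean() - logx[i])
    return z
```

Aitchison distances between compositions equal Euclidean distances between their ILR images, which is why downstream optimization can run unconstrained in the D-1 dimensional coordinate space; note the map is invariant to closure (rescaling of the composition), as the test checks.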

[383] Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

Jiawen Chen, Qi Shao, Duxin Chen, Wenwu Yu

Main category: cs.LG

Abstract: Combinatorial complexes have unified set-based (e.g., graphs, hypergraphs) and part-whole (e.g., simplicial, cellular complexes) structures into a common topological framework. Existing topological neural networks and Weisfeiler-Lehman variants remain fragmented, lacking a unified theoretical foundation for topological deep learning. In this work, we introduce the Combinatorial Complex Weisfeiler-Lehman (CCWL) test, an axiomatic-style extension of the WL test to combinatorial complexes. CCWL formalizes topological message passing through four types of neighborhood relations and provides a unified perspective on the expressive power of higher-order variants. We further prove that upper and lower neighborhoods are sufficient among the four adjacent WL tests to reach the expressivity of the full CCWL framework across topological structures of combinatorial complexes. Building on this framework, we also propose the Combinatorial Complex Isomorphism Network (CCIN) and evaluate it on synthetic and real-world benchmarks. Experimental results indicate that CCIN outperforms baseline methods and offers a generalized expressive framework for topological deep learning.
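
The graph-level special case of such tests is ordinary color refinement, which CCWL generalizes by refining cell colors over several neighborhood relations at once (a minimal 1-WL sketch; Python's `hash` stands in for an injective relabeling):

```python
def wl_refine(colors, neighbors, rounds=2):
    """Color refinement (1-WL sketch): repeatedly hash each node's color
    together with the sorted multiset of its neighbors' colors.
    CCWL would run this jointly over up to four neighborhood relations
    on the cells of a combinatorial complex."""
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in neighbors[v]))))
                  for v in colors}
    return colors
```

Two structures are distinguished by the test when their final color multisets differ; the paper's sufficiency result says that, for combinatorial complexes, running this over just the upper and lower neighborhoods already matches the full four-relation variant.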

[384] Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

Ramin Mohammadi, Vahab Vahdat, Sarthak Jain, Amir T. Namin, Ramya Palacholla, Sagar Kamarthi

Main category: cs.LG

Abstract: With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows ranging from the day of surgery to three years prior on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data; maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened, but plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.

Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

Main category: cs.LG

Abstract: Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
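
The mixed-difference measure for a two-agent deviation is a discrete cross-derivative of the joint value (sketch; `q` maps joint actions to predicted values, and all names are our own):

```python
def interaction_score(q, a, ap, b, bp):
    """Mixed-difference score for the two-agent deviation (a,b) -> (ap,bp):
    Q(ap,bp) - Q(ap,b) - Q(a,bp) + Q(a,b). It is positive when the joint
    deviation helps beyond what either single-agent change contributes,
    revealing coordination benefits invisible to per-agent rankings."""
    return q[(ap, bp)] - q[(ap, b)] - q[(a, bp)] + q[(a, b)]
```

In a coordination game where neither agent gains by deviating alone, this score is strictly positive for the joint deviation, which is exactly the case the proposal rule is designed to surface.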

[386] Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

Rodolphe Barlogis, Ferhat Tamssaouet, Quentin Falcoz, Stéphane Grieu

Main category: cs.LG

Abstract: This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the DeepONet framework. We consider a 2D square domain with an inclusion of arbitrary boundary geometry at its center. This inclusion acts as a scatterer for an incoming harmonic wave. The aim is to learn the operator linking the geometry of the scatterer to the resulting scattered field. A signed distance function to the boundary of the inner inclusion, evaluated at several points in the domain, is used to encode its geometry. It serves as input for the branch part of the DeepONet architecture, while local information is used as input for the trunk part. This approach enables the encoding of arbitrary geometries, whether they are parameterized or not. The evaluation of the model on unseen geometries is compared with its finite element method (FEM) equivalent to test its generalization capabilities. The trained network weights implicitly embed the local physics and their interaction with the domain geometry. If the training space sufficiently covers the target evaluation space, the model can generalize accordingly. Furthermore, it can be refined to extend to another region of interest without retraining from scratch. This framework also avoids the need to remesh the domain for each geometry. The proposed approach delivers a computationally lighter surrogate model than FEM alternatives and avoids relying on FEM-generated training data.

[387] Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

Jacques Raynal, Pierre Slangen, Jacques Margerit

Main category: cs.LG

TL;DR: In a single-case study of gait in Parkinson's disease, conditions with comparable observable performance correspond to different organizations in state-space and latent-space representations, so aggregated output metrics do not fully reflect system organization.

Motivation: Observable performance is commonly used as a proxy for underlying system organization, but this presumes a correspondence between output metrics and internal states that may not hold in adaptive systems.

Method: The vertical dimension of occlusion (VDO) is applied as a controlled constraint on an adaptive neuromechanical system; an intra-individual analysis spans three levels: aggregated linear metrics, a dynamical-systems description in state space, and a latent space obtained through unsupervised embedding.

Result: Conditions with similar aggregated performance showed non-equivalent organizations in both state-space and latent-space representations, exposing a limitation of aggregated metrics alone.

Conclusion: The observations are exploratory and non-causal; the framework establishes no mechanistic, predictive, or directional relationships but offers a structured, multi-level approach to analyzing constraint-driven systems.

Abstract: In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly presumes a correspondence between output metrics and internal system states that may not hold in adaptive systems. In this study, the vertical dimension of occlusion (VDO) is considered as a constraint applied to an adaptive neuromechanical system, enabling the exploration of system-level responses under controlled variations. A single-case design in a patient with Parkinson’s disease allows an intra-individual analysis across repeated conditions. The analysis is structured across three complementary levels: (i) aggregated linear metrics describing observable performance, (ii) a dynamical systems framework describing temporal organization in state space, and (iii) a latent space representation obtained through unsupervised embedding. The results show that conditions with comparable observable performance may correspond to different organizations in both state space and latent space representations. This dissociation highlights a limitation of aggregated metrics and suggests that similar outputs may arise from non-equivalent system states. A fourth level is proposed as a purely conceptual extension describing potential relationships between system states. This level is not implemented and is not derived from experimental data. These observations are strictly exploratory and non-causal. The proposed framework does not establish mechanistic, predictive, or directional relationships, but provides a structured approach for analyzing constraint-driven systems across multiple levels of representation.

[388] SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Stavros Orfanoudakis, Pedro P. Vergara

Main category: cs.LG

TL;DR: SAVGO learns a joint state-action embedding in which cosine similarity tracks action-value similarity and uses the induced similarity kernel to steer policy updates toward higher-value actions, improving continuous control on MuJoCo benchmarks.

Motivation: Representation and similarity learning improve RL sample efficiency, yet they are rarely used to shape policy updates directly in the action space.

Method: State-action pairs with similar action-value estimates are trained to have high cosine similarity in a joint embedding space; at each update, a similarity kernel over sampled candidate actions guides policy improvement beyond local gradient steps, unifying representation learning, value estimation, and policy optimization in one geometry-consistent objective.

Result: Improvements over strong baselines on challenging high-dimensional MuJoCo tasks, with ablations isolating the contributions of value-geometry learning and similarity-based policy updates.

Conclusion: Explicitly incorporating value-based geometry into the policy update improves off-policy actor-critic training while preserving its scalability.

Abstract: While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy updates directly in the action space. To bridge this gap, a geometry-aware RL algorithm that explicitly incorporates value-based similarity into the policy update, State-Action Value Geometry Optimization (SAVGO), is proposed. In detail, SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity, while dissimilar pairs are mapped to distinct directions. This learned geometry enables the generation of a similarity kernel over candidate actions sampled at each update, allowing policy improvement to be guided directly toward higher-value regions beyond local gradient-based updates. As a result, representation learning, value estimation, and policy optimization are unified within a single geometry-consistent objective, while preserving the scalability of off-policy actor-critic training. The proposed method is evaluated on standard MuJoCo continuous-control benchmarks, demonstrating improvements over strong baselines on challenging high-dimensional tasks. Ablation studies are done to analyze the contributions of value-geometry learning and similarity-based policy updates.
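The similarity-kernel idea in this abstract can be illustrated with a toy sketch. All of it is a hedged assumption rather than the paper's architecture: the linear `embed` stand-in, the one-dimensional actions, the softmax temperature, and the "weight candidates by similarity to the best one" rule are hypothetical choices for illustration.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def embed(state, action):
    # Stand-in state-action embedding; the real method *trains* this space so
    # that pairs with similar Q-values get high cosine similarity.
    return [state[0] + action, state[1] - action, action]

def kernel_guided_action(state, candidates, q):
    # Pick the highest-value candidate, then form a softmax kernel over
    # cosine similarity to it, pulling the update toward high-value regions.
    values = [q(state, a) for a in candidates]
    best = candidates[max(range(len(candidates)), key=values.__getitem__)]
    e_best = embed(state, best)
    sims = [cosine(embed(state, a), e_best) for a in candidates]
    zs = [math.exp(5.0 * s) for s in sims]
    total = sum(zs)
    return sum(w * a for w, a in zip(zs, candidates)) / total

random.seed(0)
state = (0.5, -0.2)
q = lambda s, a: -(a - 0.3) ** 2          # toy value function peaked at a = 0.3
cands = [random.uniform(-1, 1) for _ in range(16)]
a_star = kernel_guided_action(state, cands, q)
print(f"kernel-weighted action: {a_star:.3f}")
```

The point of the sketch is the mechanism: the kernel aggregates information from many sampled actions instead of following only the local policy gradient.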

[389] Generating Statistical Charts with Validation-Driven LLM Workflows

Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

Main category: cs.LG

TL;DR: A validation-driven LLM workflow decomposes chart generation into screening, proposal, code synthesis, rendering, validation-based refinement, and description/QA generation, yielding 1,500 charts and 30,003 question-answer pairs from 74 UCI datasets; evaluating 16 multimodal LLMs shows chart-syntax questions are nearly saturated while value extraction, comparison, and reasoning remain hard.

Motivation: Many chart-generation failures only become apparent after rendering, and existing chart datasets rarely provide fully aligned artifacts such as executable code, dataset context, and question-answer pairs.

Method: A structured workflow treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, validating rendered outputs to catch visualization-specific failures (readability, semantic mismatch) and retaining each chart with its code, dataset context, description, and QA pairs.

Result: 1,500 charts spanning 24 chart families, paired with 30,003 QA pairs; across 16 MLLMs, chart-syntax questions are nearly saturated while value extraction, comparison, and reasoning lag behind.

Conclusion: The workflow provides both a source of aligned chart artifacts and a diagnostic tool for studying chart-grounded multimodal reasoning.

Abstract: Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-answer pairs. We present a structured LLM-based workflow that decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. By incorporating rendered-output validation, the workflow addresses visualization-specific failure modes such as readability and semantic mismatch. It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, retaining each chart with its code, dataset context, description, and question-answer pairs. Applied to UCI datasets, the workflow produces 1,500 charts from 74 datasets, spanning 24 chart families and paired with 30,003 question-answer pairs. We evaluate 16 multimodal LLMs (MLLMs) on these chart-question pairs. The results show that chart-syntax questions are nearly saturated, while value extraction, comparison, and reasoning remain more challenging, illustrating the workflow’s utility for diagnostic studies of chart-grounded multimodal reasoning.
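The validation-driven refinement loop the abstract describes can be sketched as a skeleton. The stub functions below stand in for LLM calls and a real renderer; their names, return shapes, and the retry budget are assumptions for illustration, not the paper's implementation.

```python
# Skeleton of a validate-and-refine chart-generation loop: propose code,
# render it, validate the *rendered* output, and fold validator feedback back
# into the next proposal until validation passes or the budget runs out.

def propose_code(dataset, feedback=None):
    # Stub for "code synthesis": a real system would prompt an LLM here,
    # including validator feedback in the prompt on retries.
    kind = "histogram" if feedback else "chart"
    return {"kind": kind, "dataset": dataset}

def render(code):
    # Stub renderer: the real workflow executes the code and captures the image.
    return {"kind": code["kind"], "readable": code["kind"] == "histogram"}

def validate(image):
    # Rendered-output validation catches failures (e.g. unreadable charts)
    # that are invisible at the code level.
    return [] if image["readable"] else ["labels overlap; chart unreadable"]

def generate_chart(dataset, max_rounds=3):
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = propose_code(dataset, feedback)
        image = render(code)
        issues = validate(image)
        if not issues:
            return {"code": code, "image": image, "rounds": round_no}
        feedback = issues
    raise RuntimeError("validation failed after all refinement rounds")

result = generate_chart("uci/iris")
print(result["rounds"], result["code"]["kind"])
```

Keeping the code, rendered image, and validation trace together is what makes each chart an inspectable artifact rather than a one-shot output.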

[390] Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2309.06577 returned HTTP 429 (rate limited).

[391] Value Explicit Pretraining for Learning Transferable Representations

Kiran Lekkala, Henghui Bao, Sumedh A. Sontakke, Erdem Biyik, Laurent Itti

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2312.12339 returned HTTP 429 (rate limited).

[392] Mutatis Mutandis: Revisiting the Comparator in Discrimination Testing

Jose M. Alvarez, Salvatore Ruggieri

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2405.13693 returned HTTP 429 (rate limited).

[393] Dynamics-Encoded Deep Learning for Robust System Identification and Parameter Estimation

Caitlin Ho, Andrea Arnold

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2410.04299 returned HTTP 429 (rate limited).

[394] Latent Generative Modeling of Random Fields from Limited Training Data

James E. Warner, Tristan A. Shah, Patrick E. Leser, Geoffrey F. Bomarito, Joshua D. Pribe, Michael C. Stanley

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2505.13007 returned HTTP 429 (rate limited).

[395] Privacy Amplification in Differentially Private Zeroth-Order Optimization with Hidden States

Eli Chien, Wei-Ning Chen, Pan Li

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2506.00158 returned HTTP 429 (rate limited).

[396] Graph Concept Bottleneck Models

Haotian Xu, Tsui-Wei Weng, Lam M. Nguyen, Tengfei Ma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2508.14255 returned HTTP 429 (rate limited).

[397] Concolic Testing on Individual Fairness of Neural Network Models

Ming-I Huang, Chih-Duo Hong, Fang Yu

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.06864 returned HTTP 429 (rate limited).

[398] Optimal hypersurface decision trees

Xi He

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.12057 returned HTTP 429 (rate limited).

[399] Incomplete Data, Complete Dynamics: A Diffusion Approach

Zihan Zhou, Chenguang Wang, Hongyi Ye, Yongtao Guan, Tianshu Yu

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.20098 returned HTTP 429 (rate limited).

[400] Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling

Siva Viknesh, Amirhossein Arzani

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.00233 returned HTTP 429 (rate limited).

[401] Adaptive Node Feature Selection For Graph Neural Networks

Ali Azizpour, Madeline Navarro, Santiago Segarra

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.03096 returned HTTP 429 (rate limited).

[402] Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models

Xinshuai Dong, Ignavier Ng, Haoyue Dai, Jiaqi Sun, Xiangchen Song, Peter Spirtes, Kun Zhang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.04378 returned HTTP 429 (rate limited).

[403] SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

Murtaza Rangwala, Farag Azzedin, Richard O. Sinnott, Rajkumar Buyya

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.07922 returned HTTP 429 (rate limited).

[404] Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

Jingxin Zhan, Yuze Han, Zhihua Zhang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.22819 returned HTTP 429 (rate limited).

[405] SynQuE: Estimating Synthetic Dataset Quality Without Annotations

Arthur Chen, Victor Zhong

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2511.03928 returned HTTP 429 (rate limited).

[406] Uncertainty Modeling for Multi-Objective RTA Interception with Distillation Acceleration

Gaoxiang Zhao, Ruinan Qiu, Pengpeng Zhao, Rongjin Wang, Xiaoting Wang, Zhangang Lin, Xiaoqiang Wang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2511.05582 returned HTTP 429 (rate limited).

[407] When Structure Doesn’t Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected

Haotian Xu, Yuning You, Tengfei Ma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2511.16767 returned HTTP 429 (rate limited).

[408] Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism

Tianwei Ni, Esther Derman, Vineet Jain, Vincent Taboga, Siamak Ravanbakhsh, Pierre-Luc Bacon

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.04341 returned HTTP 429 (rate limited).

[409] Resting Neurons, Active Insights: Robustify Activation Sparsity for Large Language Models

Haotian Xu, Jiannan Yang, Tian Gao, Tsui-Wei Weng, Tengfei Ma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.12744 returned HTTP 429 (rate limited).

[410] NRGPT: An Energy-based Alternative for GPT

Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.16762 returned HTTP 429 (rate limited).

[411] The Seismic Wavefield Common Task Framework

Alexey Yermakov, Yue Zhao, Marine Denolle, Yiyu Ni, Philippe M. Wyder, Judah Goldfeder, Stefano Riva, Jan Williams, David Zoro, Amy Sara Rude, Matteo Tomasetto, Joe Germany, Joseph Bakarji, Georg Maierhofer, Miles Cranmer, J. Nathan Kutz

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.19927 returned HTTP 429 (rate limited).

[412] Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces

Bryon Tjanaka, Henry Chen, Matthew C. Fontaine, Stefanos Nikolaidis

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2601.01082 returned HTTP 429 (rate limited).

[413] Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

Luca Zhou, Bo Zhao, Rose Yu, Emanuele Rodolà

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2601.22285 returned HTTP 429 (rate limited).

[414] Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

Markus Mueller, Kathrin Gruber, Dennis Fok

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2601.22816 returned HTTP 429 (rate limited).

[415] Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE

Yuanteng Chen, Peisong Wang, Nanxin Zeng, Yuantian Shao, Shuang Qiu, Gang Li, Jing Liu, Jian Cheng

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.02443 returned HTTP 429 (rate limited).

[416] Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models

Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.03265 returned HTTP 429 (rate limited).

[417] Riemannian MeanFlow

Dongyeop Woo, Marta Skreta, Seonghyun Park, Kirill Neklyudov, Sungsoo Ahn

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.07744 returned HTTP 429 (rate limited).

[418] One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise

Amith Bhat, Haipeng Luo, Aadirupa Saha

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.14474 returned HTTP 429 (rate limited).

[419] RAT+: Train Dense, Infer Sparse – Recurrence Augmented Attention for Dilated Inference

Xiuying Wei, Caglar Gulcehre

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.18196 returned HTTP 429 (rate limited).

[420] A Comparative Study of UMAP and Other Dimensionality Reduction Methods

Guanzhe Zhang, Shanshan Ding, Zhezhen Jin

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2603.02275 returned HTTP 429 (rate limited).

[421] Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning

Xueqiao Peng, Andrew Perrault

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2603.19397 returned HTTP 429 (rate limited).

[422] Placing Puzzle Pieces Where They Matter: A Question Augmentation Framework for Reinforcement Learning

Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.15830 returned HTTP 429 (rate limited).

[423] A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

S. Gratton, Ph. L. Toint

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.17423 returned HTTP 429 (rate limited).

[424] How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals

Dharshan Kumaran, Viorica Patraucean, Simon Osindero, Petar Veličković, Nathaniel Daw

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.22271 returned HTTP 429 (rate limited).

[425] A Differentiable Framework for Global Circulation Model Precipitation Bias Correction

Kamlesh Sawadekar, Seth McGinnis, Peijun Li, Kathryn Lawson, Chaopeng Shen

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.23045 returned HTTP 429 (rate limited).

[426] RL Token: Bootstrapping Online RL with Vision-Language-Action Models

Charles Xu, Jost Tobias Springenberg, Michael Equi, Ali Amin, Adnan Esmail, Sergey Levine, Liyiming Ke

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.23073 returned HTTP 429 (rate limited).

[427] SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations

Jason Wu, Shir-Kang Scott Jin, Yuyang Yuan, Maggie Wigness, Lance M. Kaplan, Hang Qiu, Mani Srivastava

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.26181 returned HTTP 429 (rate limited).

[428] Distance-Aware Error for Spline Networks: A Bottom-Up Approach to Uncertainty

Masoud Ataei, Mohammad Javad Khojasteh, Vikas Dhiman

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2501.04757 returned HTTP 429 (rate limited).

[429] Mean-field limit from general mixtures of experts to quantum neural networks

Anderson Melchor Hernandez, Davide Pastorello, Giacomo De Palma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2501.14660 returned HTTP 429 (rate limited).

[430] Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium

Kaizhao Liu, Qi Long, Zhekun Shi, Weijie J. Su, Jiancong Xiao

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2503.10990 returned HTTP 429 (rate limited).

[431] Doubly robust identification of treatment effects from multiple environments

Piersilvio De Bartolomeis, Julia Kostin, Javier Abad, Yixin Wang, Fanny Yang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2503.14459 returned HTTP 429 (rate limited).

[432] TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference

Raja Gond, Nipun Kwatra, Ramachandran Ramjee

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2505.11329 returned HTTP 429 (rate limited).

[433] Characterizing control between interacting subsystems with deep Jacobian estimation

Adam J. Eisen, Mitchell Ostrow, Sarthak Chandra, Leo Kozachkov, Earl K. Miller, Ila R. Fiete

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2507.01946 returned HTTP 429 (rate limited).

[434] Surprisingly High Redundancy in Electronic Structure Data Across Materials Explained by Low Intrinsic Dimensionality

Sazzad Hossain, Ponkrshnan Thiagarajan, Shashank Pathrudkar, Stephanie Taylor, Abhijeet S. Gangan, Amartya S. Banerjee, Susanta Ghosh

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2507.09001 returned HTTP 429 (rate limited).

[435] Understanding Cognitive States from Head & Hand Motion Data

Kaiang Wen, Mark Roman Miller

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.24255 returned HTTP 429 (rate limited).

[436] Foundation Models for Discovery and Exploration in Chemical Space

Alexius Wadell, Anoushka Bhutani, Victor Azumah, Austin R. Ellis-Mohr, Andrew J. Stier, Kareem Hegazy, Alexander Brace, Hancheng Zhao, Celia Kelly, Anuj K. Nayak, Yuhan Chen, Dimitrios Simatos, Hongyi Lin, Murali Emani, Venkatram Vishwanath, Kevin Gering, Melisa Alkan, Tom Gibbs, Jack Wells, Wesley W. Qian, Richard C. Gerkin, Benjamin Amorelli, Alexander B. Wiltschko, Lav R. Varshney, Bharath Ramsundar, Karthik Duraisamy, Michael W. Mahoney, Arvind Ramanathan, Venkatasubramanian Viswanathan

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2510.18900 returned HTTP 429).

[437] TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests

Haarika Manda, Manshi Sagar, Yogesh, Kartikay Singh, Cindy Zhao, Tarun Mangla, Phillipa Gill, Elizabeth Belding, Arpit Gupta

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2510.21141 returned HTTP 429).

[438] Minimizing Human Intervention in Online Classification

William Réveillard, Vasileios Saketos, Alexandre Proutiere, Richard Combes

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2510.23557 returned HTTP 429).

[439] Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator Network

Elham Kiyani, Amit Makarand Deshpande, Madhura Limaye, Zhiwei Gao, Zongren Zou, Sai Aditya Pradeep, Srikanth Pilla, Gang Li, Zhen Li, George Em Karniadakis

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2512.13746 returned HTTP 429).

[440] Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

Huan Li, Yiming Dong, Zhouchen Lin

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2601.07326 returned HTTP 429).

[441] Statistical Testing Framework for Clustering Pipelines by Selective Inference

Yugo Miyata, Tomohiro Shiraishi, Shuichi Nishino, Ichiro Takeuchi

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2603.18413 returned HTTP 429).

[442] On the Expressive Power of Contextual Relations in Transformers

Demián Fraiman

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2603.25860 returned HTTP 429).

[443] Generative Modeling under Non-Monotone MAR Missingness via Approximate Wasserstein Gradient Flows

Gitte Kremling, Jeffrey Näf, Johannes Lederer

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2604.04567 returned HTTP 429).

[444] From Cursed to Competitive: Closing the ZO-FO Gap via Input-to-State Stability

Amir Ali Farzin, Philipp Braun, Iman Shames

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2604.25372 returned HTTP 429).

cs.MA

[445] The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations

Aurélien Bück-Kaeffer, Sneheel Sarangi, Maximilian Puelma Touzel, Reihaneh Rabbany, Zachary Yang, Jean-François Godbout

Main category: cs.MA

Abstract: Studies attempting to simulate human behavior with $\textit{Silicon Societies}$ are growing in number, while LLM-only social networks have started appearing outside of controlled settings. However, the design space of these networks remains under-studied, which contributes to a gap in validating model realism. To enable future work to make more informed design decisions, we perform a systematic analysis of the consequences and interactions of key design choices in simulated social networks, including the choice of base model used to model individual agents and how agents are connected to each other. Using surveys as a proxy for agent opinions, our findings suggest that the geometry of the design space is non-trivial, with some parameters behaving in additive ways while others display more complex interactions. In particular, the choice of the base LLM is the most important variable impacting the simulation outcomes.

[446] Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

Maksym Nechepurenko, Pavel Shuvalov

Main category: cs.MA

Abstract: Evaluating the true forecasting ability of AI agents requires environments resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL – a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score – proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy’s classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $α^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $α^* = 0.01$ requires four times more. We complement these analytical results with a 50-round live evaluation of five frontier LLM agents plus a random baseline. Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. All smart contracts and evaluation infrastructure are open-source.
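The scoring rules and power analysis above are concrete enough to sketch. Below is a minimal illustration, not the paper's implementation: the Brier score for probabilistic forecasts, and the textbook one-sample z-test sample-size formula, whose inverse-square dependence on the detectable edge reproduces the abstract's "four times more" claim when $α^*$ is halved. The per-market standard deviation `sigma` is a hypothetical input, not a value taken from the paper.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

def trials_for_edge(alpha_star, sigma, z_alpha=1.96, z_power=0.84):
    """Textbook z-test sizing: markets needed to detect a mean edge alpha_star
    at ~5% significance and ~80% power; note the 1/alpha_star**2 scaling."""
    return ((z_alpha + z_power) * sigma / alpha_star) ** 2

# Halving the detectable edge quadruples the required number of resolved markets.
ratio = trials_for_edge(0.01, sigma=0.13) / trials_for_edge(0.02, sigma=0.13)
```

With any fixed `sigma`, halving the edge from 0.02 to 0.01 exactly quadruples the required count, matching the scaling stated in the abstract.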

[447] Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen

Main category: cs.MA

Abstract: Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle to balance local adaptation, global coordination, and communication efficiency in heterogeneous nonconvex environments. In this paper, we take an initial step toward trajectory-driven self-design for distributed black-box consensus optimization. We first redesign the agent-level swarm dynamics with an adaptive internal mechanism tailored to decentralized consensus settings, improving the balance between exploration, convergence, and local escape. Built on top of this adaptive execution layer, we propose Learning to Act and Cooperate (LACMAS), a trajectory-driven framework in which large language models provide sparse high-level guidance for shaping both agent-internal action behaviors and agent-external cooperation patterns from historical optimization trajectories. We further introduce a phased cognitive scheduling strategy to activate different forms of adaptation in a resource-aware manner. Experiments on standard distributed black-box benchmarks and real-world distributed tasks show that LACMAS consistently improves solution quality, convergence efficiency, and communication efficiency over strong baselines, suggesting a practical route from handcrafted distributed coordination toward self-designing multi-agent optimization systems.

cs.MM

[448] RoboKA: KAN Informed Multimodal Learning for RoboCall Surveillance System

Nitin Choudhury, Nikhil Kumar, Aditya Kumar Sinha, Abhijeet Anand, Hossein Salemi, Orchid Chetia Phukan, Hemant Purohit, Arun Balaji Buduru

Main category: cs.MM

Abstract: Broad exploration of robocall surveillance research is hindered by limited access to public datasets, owing to privacy concerns. In this work, we first curate Robo-SAr, a synthetic robocall dataset designed for robocall surveillance research. Robo-SAr comprises ~200 unwanted and ~1200 legitimate synthetic robocall samples across three realistic adversarial axes: psycholinguistics-manipulated transcripts, emotion-eliciting speech, and cloned voices. We further propose RoboKA, a Kolmogorov-Arnold Network (KAN)-based multimodal fusion framework designed to model structured nonlinear interactions between acoustic and linguistic cues that characterize diverse adversarial robocall strategies. RoboKA first leverages cross-modal contrastive learning to align latent modality representations and feeds the resulting embeddings to a KAN-projection head for final classification. We benchmark RoboKA against strong unimodal and multimodal baselines in both in-domain and out-of-domain setups, finding RoboKA to surpass all baselines in terms of recall and F1-score.

[449] CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

Yawen Qin, Ke Qiu, Qin Zhang

Main category: cs.MM

Abstract: Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because dance requires simultaneous reasoning over linguistic semantics, musical rhythm, and full-body motion dynamics. We introduce TD-Data, a large-scale open dataset for text-dance retrieval, containing about 4,000 12-second dance clips, 14.6 hours of motion, 22 genres, and annotations from professional dance experts. On top of this dataset, we propose CustomDancer, a multimodal retrieval framework that aligns text with dance through a CLIP-based text encoder, music and motion encoders, and a music-motion blending module. CustomDancer achieves state-of-the-art performance on TD-Data, reaching 10.23% Recall@1 and improving retrieval quality in both quantitative benchmarks and user preference studies.

eess.AS

[450] From Birdsong to Rumbles: Classifying Elephant Calls with Out-of-Species Embeddings

Christiaan M. Geldenhuys, Thomas R. Niesler

Main category: eess.AS

Abstract: We show that pretrained acoustic embeddings classify elephant vocalisations at a level approaching that of end-to-end supervised neural networks, without any fine-tuning of the embedding model. This result is of practical importance because annotated bioacoustic data are scarce and costly to obtain, leaving conventional supervised approaches prone to overfitting and to poor generalisation under domain shift. A broad range of embedding models drawn from general audio, speech, and bioacoustic domains is evaluated, all of which are either out-of-domain (containing no bioacoustic data) or out-of-species (containing no elephant call data). The embedding networks themselves remain fixed; only the lightweight downstream classifiers, which include a linear model and several small neural networks, are trained. Among the models considered, Perch 2.0 achieves the best cross-validated classification performance, attaining AUCs of 0.849 on African bush elephant (Loxodonta africana) calls and 0.936 on Asian elephant (Elephas maximus) calls, with Perch 1.0 close behind. The best-performing system is within 2.2 % of an end-to-end supervised elephant call classification system. A layerwise analysis of pretrained transformer encoders, considered as embedding models, shows that intermediate representations outperform final-layer outputs. The second layer of both wav2vec2.0 and HuBERT encodes sufficient information for effective elephant call classification; truncation at this layer therefore preserves classification performance whilst retaining only approximately 10 % of the parameters of the full network. Such compact embedding networks are well suited to on-device processing where computational resources are limited.
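For reference, the AUC figures quoted above are rank-based. The snippet below computes only the generic metric (the Mann-Whitney probability that a random positive scores above a random negative); it does not reproduce the paper's pipeline of frozen Perch embeddings plus lightweight downstream classifiers.

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: fraction of (positive, negative)
    pairs where the positive outranks the negative; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is O(n²) and intended only as an illustration of the metric; library implementations use a sort-based equivalent.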

[451] Transformer-based End-to-End Control Filter Generation for Active Noise Control

Ziyi Yang, Zhengding Luo, Yisong Zou, Boxiang Wang, Qirui Huang, Woon-Seng Gan

Main category: eess.AS

Abstract: To address the limitations of existing Generative Fixed-Filter Active Noise Control (GFANC) methods, which rely on filter decomposition and recombination and require supervised learning with labeled data, this paper proposes a Transformer-based End-to-End Control-Filter Generation (E2E-CFG) framework. Unlike previous approaches that predict combination weights of sub control filters, the proposed method directly generates control filters in an unsupervised manner by integrating the co-processor and real-time controller into a fully differentiable ANC system, where the accumulated error signal is used as the training objective. By abandoning the decomposition–reconstruction process, the proposed design simplifies the control pipeline and avoids error accumulation, while the Transformer architecture effectively captures global and dynamic noise characteristics through its attention mechanism. Numerical simulations on real-recorded noises demonstrate that the proposed method achieves improved noise reduction performance and adaptability to different types of noises compared with the original GFANC framework.
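For contrast with the generated fixed filters described above, the classical adaptive baseline such systems sidestep is the LMS filter, which updates its taps directly from the error signal. A minimal single-channel sketch, ignoring the secondary path that a real filtered-x ANC controller must also model:

```python
def lms_step(w, x_buf, d, mu=0.1):
    """One LMS update: output y = w . x, error e = d - y, taps nudged along mu * e * x."""
    y = sum(wi * xi for wi, xi in zip(w, x_buf))
    e = d - y
    w = [wi + mu * e * xi for wi, xi in zip(w, x_buf)]
    return w, e

# Toy run: a single tap adapting toward the target gain 0.5.
w = [0.0]
for n in range(50):
    x = 1.0 if n % 2 == 0 else -1.0   # alternating reference signal
    w, e = lms_step(w, [x], 0.5 * x)  # desired signal is 0.5 * x
```

In this toy run the residual error decays geometrically (by a factor 1 - mu per step), falling below 0.01 within 50 iterations; a fixed-filter scheme instead avoids this per-sample adaptation loop entirely.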

[452] Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

Main category: eess.AS

Abstract: Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.

[453] The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Donghang Wu, Tianyu Zhang, Yuxin Li, Hexin Liu, Chen Chen, Eng Siong Chng, Yoshua Bengio

Main category: eess.AS

Abstract: During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional “thinking” mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user’s speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.

eess.IV

[454] Broadband Wide Field of View Imaging with Computational Mirrors

Vishwanath Saragadam, Niki Nezakati, Amit Roy-Chowdhury, Vivek Boominathan

Main category: eess.IV

Abstract: Traditional glass-based optics are typically optimized for narrow spectral bands, such as the visible (400-700nm) or shortwave infrared (1000-1800nm). While the emergence of VIS-SWIR sensors (400-1700nm) offers transformative potential, refractive optics struggle to focus this entire range simultaneously. Mirrors represent a promising achromatic alternative; however, they are often sidelined by field curvature and off-axis aberrations. This paper introduces Computational Mirrors, a framework that enables high-resolution, wide-field-of-view imaging across the complete VIS-SWIR spectrum using a single sensor. Our method is built on the observation that distinct regions of the field of view reach focus at varying distances from the mirror. By capturing a minimal focal stack (2-4 images), we utilize a computational backend to recover a sharp, all-in-focus image. A key contribution of this work is SeidelConv, a novel, physics-inspired, spatially-varying point spread function (PSF) model designed to accurately characterize and correct the off-axis aberrations inherent in simple concave mirrors. We demonstrate the efficacy of our approach using a first-of-its-kind 50mm F/1 optical system equipped with a VIS-SWIR sensor. Our system produces sharp images across RGB, NIR, and SWIR wavelengths without requiring refocusing, revealing material details invisible within individual spectral bands. We further validate the scalability of our approach with a 100mm F/2 system optimized for long-range imaging.

[455] RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics

Bojun Zhang, Huiyu Yang, Yunpeng Wang, Yuntian Chen, Yuanwei Bin, Rikui Zhang, Jianchun Wang

Main category: eess.IV

Abstract: Rapid aerodynamic evaluation is crucial for modern vehicle design, yet existing neural operators struggle to capture intricate spatial correlations. We propose the rotary-enhanced transformer operator (RETO), a novel neural solver featuring a dual-stage spatial awareness mechanism: sinusoidal-cosine encodings for global referencing and rotary positional encodings (RoPE) for relative displacements. RoPE encodes spatial relations via unitary rotations, enforcing translation invariance and enhancing local gradient resolution. RETO is validated on ShapeNet and the high-fidelity DrivAerML benchmark. On ShapeNet, RETO achieves a relative $L_2$ error of 0.063, outperforming RegDGCNN at 0.125 and representing a 16% improvement over the Transolver baseline, which yields an error of 0.075. These performance gains are further amplified on the DrivAerML dataset, where RETO achieves relative $L_2$ errors of 0.089 for surface pressure and 0.097 for velocity. In comparison, Transolver results in errors of 0.116 and 0.121 for the same metrics, indicating that RETO achieves precision enhancements of 23% and 19%, respectively. For comprehensive comparison, the surface pressure and velocity errors for AB-UBT are 0.102 and 0.124, while RegDGCNN yields 0.235 and 0.312, respectively. Information-theoretical analysis shows that the entropy peak of RETO at 0.35 is significantly lower than that of Transolver at 0.75 under $10^4$ resolution, indicating a focused attentional mechanism capable of preserving localized gradients against global diffusion.
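The RoPE mechanism the abstract credits for translation invariance has a compact form: consecutive feature pairs are rotated by position-dependent angles, so the inner product between a rotated query and key depends only on their relative displacement. Below is a minimal sketch of the standard RoPE formulation, not RETO's exact implementation:

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive feature pairs of vec by angles pos * base**(-2i/d)."""
    out = []
    d = len(vec)
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)   # i steps by 2, so exponent is -2*(pair index)/d
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# The score depends only on the positional offset (here 4), not on the
# absolute positions: shifting both positions by 10 changes nothing.
s1 = dot(rope(q, 3), rope(k, 7))
s2 = dot(rope(q, 13), rope(k, 17))
```

Because composed rotations satisfy R(m)ᵀR(n) = R(n-m), shifting both positions by the same offset leaves the attention score unchanged, which is the translation invariance the abstract refers to.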

[456] Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion

Ge Luo, Jun-Jie Huang, Qi Yu, Tianrui Liu, Ke Liang, Yuming Xiang, Wentao Zhao, Xinwang Liu, Meng Wang

Main category: eess.IV

Abstract: Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency. In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.

[457] Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy

Minhee Lee, Sangyoon Lee, Jiwook Lee, Minki Hong, Kyuyoung Kim, Wonhwa Kim, Jaeho Lee

Main category: eess.IV

Abstract: Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high-speed in vivo optical biopsy in handheld scenarios. However, Lissajous scanning traces a resonant trajectory and samples only the visited pixels per frame; at high frame rates, many pixels remain unvisited, creating structured holes. In this work, we introduce the first benchmark for high-rate Lissajous CLE, consisting of low-quality video clips paired with high-quality reference images. The reference images are wide-FOV mosaics obtained by stitching stabilized, slow-scan frames of the same tissue, enabling temporally aligned supervision. Using this dataset, we propose MIRA, a lightweight recurrent framework for Lissajous CLE restoration that iteratively aggregates temporal context through feature reuse and displacement alignment. Our experiments demonstrate that MIRA outperforms both lightweight and high-complexity baselines in restoration quality while maintaining a favorable computational efficiency suitable for clinical deployment.

[458] FedKPer: Tackling Generalization and Personalization in Medical Federated Learning via Knowledge Personalization

Zoe Fowler, Ghassan AlRegib

Main category: eess.IV

Abstract: Federated learning (FL) holds great potential for medical applications. However, statistical heterogeneity across healthcare institutions poses a major challenge for FL, as the global model struggles both to generalize across unseen patient populations and to adapt to the unique data distributions of individual hospitals. This heterogeneity also exacerbates forgetting at both the global and local level, causing previously learned patient patterns to be misclassified after model updates. While prior work has largely treated generalization and personalization as separate challenges, we show that a better balance between the two can be achieved through selective alignment with the global model and a modified aggregation scheme, which together mitigate the effects of statistical heterogeneity. Specifically, we introduce FedKPer, which incorporates knowledge personalization into the training stage of each local device. Afterwards, generalization is considered via the global model aggregation process, where local updates that are reliable and label-diverse are emphasized. We evaluate the performance of FedKPer, devising additional metrics that relate to common consequences of forgetting. Overall, we demonstrate FedKPer improves the generalization-personalization trade-off without sacrificing retention.

[459] Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

Jingxi Pu, Tonghua Liu, Zhilin Guan, Siqiao Li, Yang Ming, Zheng Cong, Wei Zhang, Fangwei Li

Main category: eess.IV

Abstract: With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, which may interfere with visual interpretation by physicians and affect diagnostic results. To address this problem, inspired by Cycle-GAN for unsupervised learning, this paper proposes an end-to-end unsupervised low-dose computed tomography denoising framework. The proposed framework combines a U-Net structure for multi-scale feature extraction, an attention mechanism for feature fusion, and a residual network for feature transformation. It also introduces a perceptual loss to adapt the network to the characteristics of medical images. In addition, we construct a real low-dose computed tomography dataset and design a large number of comparative experiments to validate the proposed method, using both image-based evaluation metrics and medical evaluation criteria. Compared with classical methods, the main advantage of this paper is that it addresses the limitation that real clinical data cannot be directly used for supervised learning, while still achieving excellent performance. The experimental results are also professionally evaluated by imaging physicians and meet clinical needs.

[460] A Unified Deep Learning Framework for Motion Correction in Medical Imaging

Jian Wang, Razieh Faghihpirayesh, Danny Joca, Polina Golland, Ali Gholipour

Main category: eess.IV

Abstract: Deep learning has shown significant value in medical image registration for motion correction, however, current techniques are either limited by the type and range of motion they can handle, or require iterative inference and/or retraining for new imaging data. To address these limitations, we introduce UniMo, a Unified Motion Correction framework that leverages deep neural networks to correct for various types of motion in medical imaging. UniMo exploits an alternating optimization scheme for a unified loss function to train an integrated model of 1) an equivariant neural network for global rigid motion correction and 2) an encoder-decoder network to correct local deformations. It features a geometric deformation augmenter that 1) enhances the robustness of global motion correction by addressing any local deformations, and 2) generates augmented data to improve the training process. UniMo is a hybrid model that uses both image intensities and shapes to achieve robust performance amid image appearance variations, and, therefore, it generalizes well to various medical imaging modalities without a need for network retraining. We trained and tested UniMo to track motion in fetal magnetic resonance imaging. Then we tested the trained model, without retraining, on various image modalities from three public datasets, including MedMNIST, lung CT, and BraTS. The results show that UniMo surpassed existing motion correction methods in terms of accuracy, and, notably, it enabled one-time training on a single modality while maintaining high stability and adaptability for inference across multiple unseen imaging datasets. By offering a unified solution, UniMo marks a significant advantage in challenging applications with a mixture of bulk motion and local deformations. https://github.com/IntelligentImaging/UNIMO

[461] CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction

Suyi Chen, Haibin Ling

Main category: eess.IV

TL;DR: cryoSplat adapts Gaussian splatting to the physics of cryo-EM image formation, enabling stable and efficient GMM-based homogeneous reconstruction directly from raw particle images with random initialization.

Motivation: GMM-based cryo-EM reconstruction currently relies on external consensus maps or atomic models for initialization, and off-the-shelf Gaussian splatting is incompatible with cryo-EM due to mismatched image formation physics, reconstruction objectives, and coordinate systems.

Method: An orthogonal projection-aware Gaussian splatting scheme with a view-dependent normalization term and an FFT-aligned coordinate system tailored to cryo-EM imaging.

Result: On real datasets, cryoSplat outperforms representative baselines in effectiveness and robustness.

Conclusion: Integrating Gaussian splatting with cryo-EM physics removes the need for external initialization; code will be released at https://github.com/Chen-Suyi/cryosplat.

Abstract: As a critical modality for structural biology, cryogenic electron microscopy (cryo-EM) facilitates the determination of macromolecular structures at near-atomic resolution. The core computational task in single-particle cryo-EM is to reconstruct the 3D electrostatic potential of a molecule from noisy 2D projections acquired at unknown orientations. Gaussian mixture models (GMMs) provide a continuous, compact, and physically interpretable representation for molecular density and have recently gained interest in cryo-EM reconstruction. However, existing methods rely on external consensus maps or atomic models for initialization, limiting their use in self-contained pipelines. In parallel, differentiable rendering techniques such as Gaussian splatting have demonstrated remarkable scalability and efficiency for volumetric representations, suggesting a natural fit for GMM-based cryo-EM reconstruction. However, off-the-shelf Gaussian splatting methods are designed for photorealistic view synthesis and remain incompatible with cryo-EM due to mismatches in the image formation physics, reconstruction objectives, and coordinate systems. Addressing these issues, we propose cryoSplat, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation. In particular, we develop an orthogonal projection-aware Gaussian splatting, with adaptations such as a view-dependent normalization term and FFT-aligned coordinate system tailored for cryo-EM imaging. These innovations enable stable and efficient homogeneous reconstruction directly from raw cryo-EM particle images using random initialization. Experimental results on real datasets validate the effectiveness and robustness of cryoSplat over representative baselines. The code will be released at https://github.com/Chen-Suyi/cryosplat.
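The "orthogonal projection-aware" part has a clean mathematical core: unlike the perspective projection in photorealistic splatting, cryo-EM's orthogonal projection integrates the density along the viewing axis, and the marginal of a 3D Gaussian over one axis is again a Gaussian whose parameters are just the xy sub-blocks. A minimal sketch (my own illustration, not the paper's code):

```python
import numpy as np

def project_gaussian(mu3, Sigma3, R):
    """Project one 3D Gaussian (mu3, Sigma3) through pose rotation R, then
    marginalize the viewing (z) axis: orthogonal projection integrates
    the density along z, leaving a 2D Gaussian with the xy sub-blocks."""
    mu_rot = R @ mu3
    Sigma_rot = R @ Sigma3 @ R.T
    return mu_rot[:2], Sigma_rot[:2, :2]

# identity pose: the projection simply drops the z components
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.diag([4.0, 9.0, 1.0])
mu2, Sigma2 = project_gaussian(mu, Sigma, np.eye(3))
```

The paper's additional view-dependent normalization and FFT-aligned coordinates sit on top of this marginalization to match cryo-EM's imaging model; they are not shown here.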

[462] Brain MR Image Synthesis with 3D Multi-Contrast Self-Attention GAN

Zaid A. Abod, Furqan Aziz

Main category: eess.IV

TL;DR: 3D-MC-SAGAN synthesizes missing MRI contrasts (T2f, T1n, T1c) from a single T2w input within one unified 3D GAN while explicitly preserving tumour characteristics, reaching state-of-the-art quality with segmentation accuracy comparable to fully acquired multi-modal inputs.

Motivation: Acquiring all MRI modalities for every patient is often impractical due to prolonged scan times, cost, and patient discomfort, which can limit comprehensive tumour evaluation.

Method: A multi-scale 3D encoder-decoder generator with residual connections and a Memory-Bounded Hybrid Attention (MBHA) block, trained with a WGAN-GP critic and an auxiliary domain classification head; a frozen 3D U-Net segmentation network enforces a tumour-consistency constraint, and a composite objective combines adversarial, reconstruction, perceptual, structural similarity, contrast-classification, and segmentation-guided losses.

Result: State-of-the-art quantitative performance on 3D brain MRI datasets, with visually coherent, anatomically plausible contrasts and tumour segmentation accuracy comparable to using fully acquired multi-modal inputs.

Conclusion: The method can reduce acquisition burden while preserving clinically meaningful, tumour-relevant information.

Abstract: Complete and high-quality multi-modal Magnetic Resonance Imaging (MRI) is essential for accurate neuro-oncological assessment, as each contrast provides complementary anatomical and pathological information. However, acquiring all modalities (e.g., T1c, T1n, T2w, T2f) for every patient is often impractical due to prolonged scan times, cost, and patient discomfort, potentially limiting comprehensive tumour evaluation. We propose 3D-MC-SAGAN (3D Multi-Contrast Self-Attention Generative Adversarial Network), a unified 3D multi-contrast synthesis framework that generates high-fidelity missing modalities from a single T2w input while explicitly preserving tumour characteristics. The model employs a multi-scale 3D encoder–decoder generator with residual connections and a novel Memory-Bounded Hybrid Attention (MBHA) block to capture long-range dependencies efficiently, and is trained with a WGAN-GP critic and an auxiliary domain classification head to produce T2f, T1n, and T1c volumes within a unified network. To ensure anatomical and pathological fidelity, we incorporate a frozen 3D U-Net-based segmentation network that enforces a tumour-consistency constraint during training. A composite objective combining adversarial, reconstruction, perceptual, structural similarity, contrast-classification, and segmentation-guided losses further promotes both global realism and tumour-preserving structure. Extensive experiments on 3D brain MRI datasets demonstrate that 3D-MC-SAGAN achieves state-of-the-art quantitative performance and produces visually coherent, anatomically plausible contrasts with improved distributional realism. Importantly, the proposed method maintains tumour segmentation accuracy comparable to that achieved using fully acquired multi-modal inputs, highlighting its potential to reduce acquisition burden while preserving clinically meaningful information.
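The composite objective is simply a weighted sum of the six loss terms the abstract lists. A hedged sketch of that bookkeeping (the term values and weights below are placeholders, not the paper's):

```python
def composite_loss(terms, weights):
    """Weighted sum of named loss terms; terms and weights are dicts
    keyed by loss name (adversarial, reconstruction, perceptual, SSIM,
    contrast-classification, segmentation-guided)."""
    return sum(weights[k] * terms[k] for k in terms)

# placeholder per-term loss values and weights for illustration only
terms = {"adv": 0.5, "recon": 1.2, "perceptual": 0.8,
         "ssim": 0.1, "cls": 0.3, "seg": 0.4}
weights = {"adv": 1.0, "recon": 10.0, "perceptual": 1.0,
           "ssim": 1.0, "cls": 0.5, "seg": 5.0}
total = composite_loss(terms, weights)
```

The segmentation-guided term is the distinctive one: it is computed through the frozen 3D U-Net, so gradients push the generator toward tumour-consistent synthesis.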

[463] Conditional Diffusion Posterior Alignment for Sparse-View CT Reconstruction

Luis Barba, Johannes Kirschner, Benjamin Bejar

Main category: eess.IV

TL;DR: CDPA scales diffusion-based sparse-view CT reconstruction to large 3D volumes by conditioning a 2D U-Net diffusion model on an initial 3D reconstruction and enforcing explicit data consistency with the measured projections, achieving state-of-the-art results on synthetic and real CBCT.

Motivation: Diffusion models improve sparse-view CT reconstruction but fail to scale to large 3D volumes because of the memory and compute cost of 3D models, the lack of large 3D training datasets, and inter-slice inconsistencies when 2D models are applied to each slice independently.

Method: Conditional Diffusion Posterior Alignment: a 2D U-Net diffusion model conditioned on an initial 3D reconstruction to improve inter-slice consistency, combined with a data-consistency alignment step that matches the measured projections.

Result: State-of-the-art performance on synthetic and real Cone Beam CT (CBCT) data, with ablations confirming the synergy of the pipeline components; the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.

Conclusion: Combining conditional diffusion with explicit data consistency makes sparse-view CT reconstruction scalable to large 3D volumes, and the benefits carry over to much cheaper denoising networks.

Abstract: Computed Tomography (CT) is a widely used imaging modality in medical and industrial applications. To limit radiation exposure and measurement time, there is a growing interest in sparse-view CT, where the number of projection views is significantly reduced. Deep neural networks have shown great promise in improving reconstruction quality in sparse-view CT, especially generative diffusion models. However, these methods struggle to scale to large 3D volumes due to several reasons: (i) the high memory and computational requirements of 3D models, (ii) the lack of large 3D training datasets, and (iii) the inconsistencies across slices when using 2D models independently on each slice. We overcome these limitations and scale diffusion-based sparse-view CT reconstruction to large 3D volumes by combining conditional diffusion with explicit data consistency. We propose Conditional Diffusion Posterior Alignment (CDPA) to enable scalable 3D sparse-view CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to improve inter-slice consistency, combined with data-consistency alignment to match measured projections. Experiments on synthetic and real Cone Beam CT (CBCT) data show state-of-the-art performance, with ablations that confirm the synergistic effects of the proposed pipeline. Finally, we show that the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.
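The generic data-consistency idea the abstract pairs with conditional diffusion can be sketched in a few lines. This is my own toy illustration, not the paper's algorithm: `A` stands in for the sparse-view projection operator, and each step nudges the current volume estimate so its simulated projections match the measurements by gradient descent on the squared residual.

```python
import numpy as np

def data_consistency_step(x, A, y, step=0.01):
    """One gradient step on ||A x - y||^2, pulling the estimate x
    toward agreement with the measured projections y."""
    residual = A @ x - y
    return x - step * A.T @ residual

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))   # toy projection operator
x_true = rng.normal(size=10)
y = A @ x_true                  # noiseless "measured" projections

x = np.zeros(10)
for _ in range(500):
    x = data_consistency_step(x, A, y)
```

In CDPA this kind of alignment is interleaved with conditional diffusion sampling, so the generative prior fills in what the sparse projections leave underdetermined while the consistency steps keep the result faithful to the data.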

Last updated: 2026-05-04
Built with Hugo, theme modified from Stack