Today’s Research Highlights
AI-enhanced summaries of the latest research papers from arXiv.
Table of Contents
- cs.CL [Total: 139]
- cs.CV [Total: 149]
- cs.AI [Total: 127]
- cs.SD [Total: 9]
- cs.LG [Total: 123]
- cs.MA [Total: 5]
- cs.MM [Total: 4]
- eess.AS [Total: 6]
- eess.IV [Total: 5]
cs.CL
[1] AITP: Traffic Accident Responsibility Allocation via Multimodal Large Language Models
Zijin Zhou, Songan Zhang
Main category: cs.CL
TL;DR: AITP, a multimodal LLM combining Multimodal Chain-of-Thought reasoning with legal-knowledge RAG, achieves state-of-the-art traffic accident responsibility allocation, accompanied by DecaTARA, a ten-task benchmark with 67,941 annotated videos and 195,821 QA pairs.
Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress in Traffic Accident Detection (TAD) and Traffic Accident Understanding (TAU). However, existing studies mainly focus on describing and interpreting accident videos, leaving room for deeper causal reasoning and integration of legal knowledge. Traffic Accident Responsibility Allocation (TARA) is a more challenging task that requires multi-step reasoning grounded in traffic regulations. To address this, we introduce AITP (Artificial Intelligence Traffic Police), a multimodal large language model for responsibility reasoning and allocation. AITP enhances reasoning via a Multimodal Chain-of-Thought (MCoT) mechanism and integrates legal knowledge through Retrieval-Augmented Generation (RAG). We further present DecaTARA, a decathlon-style benchmark unifying ten interrelated traffic accident reasoning tasks with 67,941 annotated videos and 195,821 question-answer pairs. Extensive experiments show that AITP achieves state-of-the-art performance across responsibility allocation, TAD, and TAU tasks, establishing a new paradigm for reasoning-driven multimodal traffic analysis.
[2] AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models
Tadesse Destaw Belay, Shahriar Kabir Nahin, Israel Abebe Azime, Ocean Monjur, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam, Anshuman Chhabra
Main category: cs.CL
TL;DR: AFRILANGDICT (194.7K African language-English dictionary entries) seeds AFRILANGEDU, a 78.9K-example tutoring dataset used to fine-tune Llama-3-8B-IT and Gemma-3-12B-IT into AFRILANGTUTOR; combining SFT and DPO yields 1.8-15.5% gains across 10 African languages.
Abstract: How can language learning systems be developed for languages that lack sufficient training resources? This challenge is increasingly faced by developers across the African continent who aim to build AI systems capable of understanding and responding in local languages. To address this gap, we introduce AFRILANGDICT, a collection of 194.7K African language-English dictionary entries designed as seed resources for generating language-learning materials, enabling us to automatically construct large-scale, diverse, and verifiable student-tutor question-answer interactions suitable for training AI-assisted language tutors. Using AFRILANGDICT, we build AFRILANGEDU, a dataset of 78.9K multi-turn training examples for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Using AFRILANGEDU, we train language tutoring models collectively referred to as AFRILANGTUTOR. We fine-tune two multilingual LLMs, Llama-3-8B-IT and Gemma-3-12B-IT, on AFRILANGEDU across 10 African languages and evaluate their performance. Our results show that models trained on AFRILANGEDU consistently outperform their base counterparts, and combining SFT and DPO yields substantial improvements, with gains ranging from 1.8% to 15.5% under LLM-as-a-judge evaluations across four criteria. To facilitate further research on low-resource languages, all resources are available at https://huggingface.co/afrilang-edu.
[3] Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech
Siqi Ouyang, Shuoyang Ding, Oleksii Hrinchuk, Vitaly Lavrukhin, Brian Yan, Boris Ginsburg, Lei Li
Main category: cs.CL
TL;DR: Hierarchical Policy Optimization post-trains dialogue-style simultaneous speech translation models built on imperfect SFT data, using a hierarchical reward that balances quality and latency to gain over +7 COMET at 1.5 s latency.
Abstract: Simultaneous speech translation (SST) generates translations while receiving partial speech input. Recent advances show that large language models (LLMs) can substantially improve SST quality, but at the cost of high computational overhead. To reduce this cost, prior work reformulates SST as a multi-turn dialogue task, enabling full reuse of the LLM’s key-value (KV) cache and eliminating redundant feature recomputation. However, this approach relies on supervised fine-tuning (SFT) data in dialogue form, for which few human annotations exist, and existing synthesis methods cannot guarantee data quality. In this work, we propose a Hierarchical Policy Optimization (HPO) approach that post-trains models trained on imperfect SFT data. We introduce a hierarchical reward that balances translation quality and latency objectives. Experiments on English-to-Chinese/German/Japanese translation demonstrate improvements of over +7 COMET score and +1.25 MetricX score at a latency of 1.5 seconds. Comprehensive ablation studies further validate the effectiveness of different quality rewards, hierarchical reward formulations, and segmentation strategies. Code is available at https://github.com/owaski/HPO
[4] TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping
Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher
Main category: cs.CL
TL;DR: TRACES tags reasoning steps in real time and uses the shift in step type that occurs after a correct answer as an interpretable early-stopping signal, cutting 20-50% of tokens at comparable accuracy.
Abstract: The field of Language Reasoning Models (LRMs) has been very active over the past few years, with advances in training and inference techniques enabling LRMs to reason longer and more accurately. However, a growing body of studies shows that LRMs are still inefficient, over-generating verification and reflection steps. Additionally, the high-level role of each reasoning step, and how different step types contribute to the generation of correct answers, remains largely underexplored. To address this challenge, we introduce TRACES (Tagging of the Reasoning steps enabling Adaptive Cost-Efficient early-Stopping), a lightweight framework that tags reasoning steps in real time and enables adaptive, cost-efficient early stopping of large-language-model inferences. Building on this framework, we monitor reasoning behaviors during inference and find that LRMs tend to shift their reasoning behavior after reaching a correct answer. We demonstrate that monitoring specific step types can produce effective, interpretable early-stopping criteria. We evaluate the TRACES framework on three mathematical reasoning benchmarks (MATH500, GSM8K, and AIME) and two knowledge and reasoning benchmarks (MMLU and GPQA). We achieve a 20-50% token reduction while maintaining accuracy comparable to standard generation.
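The tag-then-stop loop the abstract describes can be sketched in a few lines. Everything below, including the tag names, cue phrases, and patience threshold, is a hypothetical illustration of the general idea, not the authors' implementation:

```python
# Sketch: tag each reasoning step, then stop once the model has shifted
# from deriving new results to verifying/reflecting (a proxy for "the
# answer has already been reached").

STEP_TAGS = {
    "derive":  ("therefore", "so we get", "substituting"),
    "verify":  ("let me check", "double-check", "verify"),
    "reflect": ("wait", "alternatively", "on second thought"),
}

def tag_step(step: str) -> str:
    """Assign a coarse behavioural tag to one reasoning step."""
    lowered = step.lower()
    for tag, cues in STEP_TAGS.items():
        if any(cue in lowered for cue in cues):
            return tag
    return "derive"  # default: treat untagged steps as derivation

def early_stop_index(steps, patience=2):
    """Return how many steps to keep: stop once `patience` consecutive
    non-derivation (verify/reflect) steps have been observed."""
    run = 0
    for i, step in enumerate(steps):
        if tag_step(step) in ("verify", "reflect"):
            run += 1
            if run >= patience:
                return i + 1
        else:
            run = 0
    return len(steps)

trace = [
    "Compute 12 * 7, therefore 84.",
    "Substituting back, x = 84 - 4 = 80.",
    "Let me check: 80 + 4 = 84, correct.",
    "Wait, alternatively I could redo the whole product...",
    "Double-check again: 12 * 7 = 84.",
]
keep = early_stop_index(trace)  # truncates before the second redundant check
```

In a real system the keyword tagger would be replaced by a lightweight classifier running alongside generation, but the stopping rule has the same shape.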
[5] Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
Gašper Beguš, Thomas Lu, Zili Wang
Main category: cs.CL
TL;DR: ciwGAN/fiwGAN networks trained on recordings of single spoken words spontaneously generate outputs with two or three words concatenated, suggesting that basic syntactic concatenation can emerge unsupervised from raw speech.
Abstract: Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary suboperations of syntax: concatenation. We introduce spontaneous concatenation: a phenomenon in which ciwGAN/fiwGAN models (based on convolutional neural networks) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated, without multi-word sequences ever appearing in the training data. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel, unobserved word combinations. We also show that the concatenated outputs contain precursors to compositionality. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech, with implications both for our understanding of how these architectures learn and for modeling syntax and its evolution in the brain from raw acoustic inputs. We also propose and formalize a neural mechanism called disinhibition that outlines a possible artificial and biological neural pathway towards concatenation and compositionality, and suggests our modeling is useful for generating testable predictions for biological and artificial neural processing of spoken language.
[6] DWTSumm: Discrete Wavelet Transform for Document Summarization
Rana Salama, Abdou Youssef, Mona Diab
Main category: cs.CL
TL;DR: Treating text embeddings as a semantic signal and decomposing them with a Discrete Wavelet Transform yields compact multi-resolution representations that improve semantic fidelity and reduce hallucination in clinical and legal summarization with LLMs.
Abstract: Summarizing long, domain-specific documents with large language models (LLMs) remains challenging due to context limitations, information loss, and hallucinations, particularly in clinical and legal settings. We propose a Discrete Wavelet Transform (DWT)-based multi-resolution framework that treats text as a semantic signal and decomposes it into global (approximation) and local (detail) components. Applied to sentence- or word-level embeddings, DWT yields compact representations that preserve overall structure and critical domain-specific details, which are used directly as summaries or to guide LLM generation. Experiments on clinical and legal benchmarks demonstrate comparable ROUGE-L scores. Compared to a GPT-4o baseline, DWT-based summarization consistently improves semantic similarity and grounding, achieving gains of over 2% in BERTScore, more than 4% in Semantic Fidelity, factual consistency in legal tasks, and large METEOR improvements indicative of preserved domain-specific semantics. Across multiple embedding models, Fidelity reaches up to 97%, suggesting that DWT acts as a semantic denoising mechanism that reduces hallucinations and strengthens factual grounding. Overall, DWT provides a lightweight, generalizable method for reliable long-document and domain-specific summarization with LLMs.
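The approximation/detail split the abstract describes is the core DWT operation. A minimal sketch, assuming a single-level Haar wavelet for concreteness (the paper does not commit to a particular wavelet here) applied to one dimension of a sequence of sentence embeddings:

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: split a sequence into a coarse
    approximation (scaled pairwise sums, the 'global' component) and
    local detail (scaled pairwise differences)."""
    assert len(signal) % 2 == 0, "pad to even length first"
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / s for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfectly reconstructs the original sequence."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Toy "semantic signal": one embedding dimension across 8 sentences.
signal = [0.9, 1.1, 0.2, 0.4, 0.8, 1.2, 0.1, 0.3]
approx, detail = haar_dwt(signal)   # half-length global + local components
recon = haar_idwt(approx, detail)   # lossless round trip
```

The half-length `approx` sequence is the "compact representation" idea in miniature: it preserves the broad shape of the signal, while `detail` keeps the local deviations that would otherwise be lost.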
[7] Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
Srishti Ginjala, Eric Fosler-Lussier, Christopher W. Myers, Srinivasan Parthasarathy
Main category: cs.CL
TL;DR: Benchmarking nine ASR models across five demographic axes shows that LLM decoders do not amplify racial bias and that audio encoder design, not LLM scale, is the main lever for fairness and robustness, with Whisper prone to demographic-selective hallucination under acoustic degradation.
Abstract: As pretrained large language models replace task-specific decoders in speech recognition, a critical question arises: do their text-derived priors make recognition fairer or more biased across demographic groups? We evaluate nine models spanning three architectural generations (CTC with no language model, encoder-decoder with an implicit LM, and LLM-based with an explicit pretrained decoder) on about 43,000 utterances across five demographic axes (ethnicity, accent, gender, age, first language) using Common Voice 24 and Meta’s Fair-Speech, a controlled-prompt dataset that eliminates vocabulary confounds. On clean audio, three findings challenge assumptions: LLM decoders do not amplify racial bias (Granite-8B has the best ethnicity fairness, max/min WER = 2.28); Whisper exhibits pathological hallucination on Indian-accented speech with a non-monotonic insertion-rate spike to 9.62% at large-v3; and audio compression predicts accent fairness more than LLM scale. We then stress-test these findings under 12 acoustic degradation conditions (noise, reverberation, silence injection, chunk masking) across both datasets, totaling 216 inference runs. Severe degradation paradoxically compresses fairness gaps as all groups converge to high WER, but silence injection amplifies Whisper’s accent bias up to 4.64x by triggering demographic-selective hallucination. Under masking, Whisper enters catastrophic repetition loops (86% of 51,797 insertions) while explicit-LLM decoders produce 38x fewer insertions with near-zero repetition; high-compression audio encoding (Q-former) reintroduces repetition pathology even in LLM decoders. These results suggest that audio encoder design, not LLM scaling, is the primary lever for equitable and robust speech recognition.
[8] Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation
Sanjoy Pator
Main category: cs.CL
TL;DR: FHIR serialisation strategy strongly affects LLM medication reconciliation: Clinical Narrative beats Raw JSON by up to 19 F1 points for models up to 8B, the advantage reverses at 70B, and omission (not fabrication) is the dominant failure mode.
Abstract: Medication reconciliation at clinical handoffs is a high-stakes, error-prone process. Large language models are increasingly proposed to assist with this task using FHIR-structured patient records, but a fundamental and largely unstudied variable is how the FHIR data is serialised before being passed to the model. We present the first systematic comparison of four FHIR serialisation strategies (Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline) across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) on a controlled benchmark of 200 synthetic patients, totalling 4,000 inference runs. We find that serialisation strategy has a large, statistically significant effect on performance for models up to 8B parameters: Clinical Narrative outperforms Raw JSON by up to 19 F1 points for Mistral-7B (r = 0.617, p < 10^{-10}). This advantage reverses at 70B, where Raw JSON achieves the best mean F1 of 0.9956. In all 20 model and strategy combinations, mean precision exceeds mean recall: omission is the dominant failure mode, with models more often missing an active medication than fabricating one, which changes how clinical safety auditing priorities should be set. Smaller models plateau at roughly 7-10 concurrent active medications, leaving polypharmacy patients, the patients most at risk from reconciliation errors, systematically underserved. BioMistral-7B, a domain-pretrained model without instruction tuning, produces zero usable output in all conditions, showing that domain pretraining alone is not sufficient for structured extraction. These results offer practical, evidence-based format recommendations for clinical LLM deployment: Clinical Narrative for models up to 8B, Raw JSON for 70B and above. The complete pipeline is reproducible on open-source tools running on an AWS g6e.xlarge instance (NVIDIA L40S, 48 GB VRAM).
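Two of the four strategies compared above, Raw JSON and Clinical Narrative, can be illustrated on a pared-down FHIR R4 MedicationRequest. The field names below follow the FHIR specification, but the record is synthetic and the narrative template is our own illustration, not the paper's serialiser:

```python
import json

# Minimal synthetic FHIR R4 MedicationRequest (real resources carry
# many more fields than this sketch uses).
med_request = {
    "resourceType": "MedicationRequest",
    "status": "active",
    "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"},
    "authoredOn": "2024-03-01",
    "dosageInstruction": [{"text": "one tablet twice daily with meals"}],
}

def to_raw_json(resource):
    """Strategy 1: hand the model the structured resource verbatim."""
    return json.dumps(resource, indent=2)

def to_clinical_narrative(resource):
    """Strategy 2: flatten the resource into one prose sentence, the
    style the paper reports helps sub-8B models most."""
    med = resource["medicationCodeableConcept"]["text"]
    dose = resource["dosageInstruction"][0]["text"]
    return (f"On {resource['authoredOn']}, the patient was prescribed "
            f"{med} ({resource['status']}), to be taken {dose}.")

narrative = to_clinical_narrative(med_request)
```

The same structured facts reach the model either way; what differs is how much JSON syntax the model must parse before it can reason about the medication list.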
[9] Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting
Alexander Weers, Daniel Rueckert, Martin J. Menten
Main category: cs.CL
TL;DR: Reweighting the cross-entropy loss toward clinically salient tokens lets vision-language models reach comparable medical report quality with up to ten times less training data.
Abstract: Training vision-language models (VLMs) for medical report generation is often hindered by the scarcity of high-quality annotated data. This work evaluates the use of a weighted loss function to improve data efficiency. Compared to standard cross-entropy loss, which treats all token prediction errors equally, the reweighted loss shifts the focus to semantically salient tokens with outsized clinical importance. In experiments on ophthalmological report generation, we show that this simple method improves efficiency across multiple data scales, achieving similar report quality with up to ten times less training data.
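The reweighting idea fits in a few lines. A minimal pure-Python sketch; the token probabilities and the 5x salience weight are invented for illustration, and a real implementation would apply the same weighting inside the framework's cross-entropy loss:

```python
import math

def weighted_token_ce(probs, targets, weights):
    """Cross-entropy over a token sequence where each position carries
    its own loss weight; weights > 1 emphasise clinically salient
    tokens. probs[t] maps candidate tokens to predicted probability."""
    total, norm = 0.0, 0.0
    for p, tok, w in zip(probs, targets, weights):
        total += -w * math.log(p[tok])
        norm += w
    return total / norm  # weight-normalised mean loss

# Toy two-token report: the finding token ("edema") is weighted 5x
# relative to filler, so its prediction error dominates the loss.
probs = [
    {"no": 0.7, "mild": 0.3},
    {"edema": 0.6, "drusen": 0.4},
]
targets = ["no", "edema"]
uniform = weighted_token_ce(probs, targets, [1.0, 1.0])
salient = weighted_token_ce(probs, targets, [1.0, 5.0])
```

Because the model is less confident on the salient token here, the weighted loss exceeds the uniform one, which is exactly the gradient pressure the method relies on.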
[10] Machine learning and digital pragmatics: Which word category influences emoji use most?
Mohammed Q. Shormani, Ibrahim Abdulmalik Hassan, Muneef Y. Alshawsh
Main category: cs.CL
TL;DR: A fine-tuned MARBERT model predicts emoji categories in 8,695 multidialectal Arabic tweets with an overall accuracy of 0.75, showing promise while highlighting the need for better models for low-resource, multidialectal languages.
Abstract: This study investigates Machine Learning (ML) in the prediction of emojis in Arabic tweets, employing the state-of-the-art MARBERT model. A corpus of 11,379 CA tweets representing multiple Arabic colloquial dialects was collected from X.com via Python. A cleaned dataset of 8,695 tweets was used for the analysis. These tweets were then classified into 14 categories, which were numerically encoded and used as labels. A preprocessing pipeline was designed as an interpretable baseline, allowing us to examine the relationship between lexical features and emoji categories. MARBERT was fine-tuned to predict emoji use from textual input. We evaluated model performance in terms of precision, recall, and F1-scores. Findings reveal that the model performed quite well, with an overall accuracy of 0.75. The study concludes that although the findings are promising, there is still a need to improve machine learning models, including MARBERT, specifically for low-resource and multidialectal languages like Arabic.
[11] Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts
Azher Ahmed Efat, Seok Hwan Song, Wallapak Tavanapong
Main category: cs.CL
TL;DR: PolyChartQA, a benchmark of 534 multi-chart images (2,297 sub-charts) and 2,694 QA pairs from CS publications, exposes a 27.4% accuracy drop for multimodal language models on human-authored questions versus model-generated ones.
Abstract: Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together. Research on understanding multi-chart images has not been extensively explored. We introduce PolyChartQA, a mid-scale dataset specifically designed for question answering over multi-chart images. PolyChartQA comprises 534 multi-chart images (with a total of 2,297 sub-charts) sourced from peer-reviewed computer science research publications and 2,694 QA pairs. We evaluate the performance of nine state-of-the-art Multimodal Language Models (MLMs) on PolyChartQA across question type, difficulty, question source, and key structural characteristics of multi-charts. Our results show a 27.4% LLM-based accuracy (L-Accuracy) drop on human-authored questions compared to MLM-generated questions, and a 5.39% L-accuracy gain with our proposed prompting method.
[12] GRISP: Guided Recurrent IRI Selection over SPARQL Skeletons
Sebastian Walter, Hannah Bast
Main category: cs.CL
TL;DR: GRISP fine-tunes a small language model to first generate a natural-language SPARQL skeleton and then iteratively re-rank knowledge graph items to fill its placeholders, outperforming comparable state-of-the-art methods on Wikidata and Freebase benchmarks.
Abstract: We present GRISP (Guided Recurrent IRI Selection over SPARQL Skeletons), a novel SPARQL-based question-answering method over knowledge graphs based on fine-tuning a small language model (SLM). Given a natural-language question, the method first uses the SLM to generate a natural-language SPARQL query skeleton, and then to re-rank and select knowledge graph items to iteratively replace the natural-language placeholders using knowledge graph constraints. The SLM is jointly trained on skeleton generation and list-wise re-ranking data generated from standard question-query pairs. We evaluate the method on common Wikidata and Freebase benchmarks, and achieve better results than other state-of-the-art methods in a comparable setting.
[13] Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents
Yiyang Lu, Woong Shin, Ahmad Maroof Karimi, Feiyi Wang, Jie Ren, Evgenia Smirni
Main category: cs.CL
TL;DR: IVG lets visualization agents query a chart's underlying specification and interact with the view instead of relying on pixels alone, reaching 0.81 QA accuracy on the new iPlotBench benchmark of 500 interactive Plotly figures.
Abstract: Vision-Language Models (VLMs) frequently misread values, hallucinate details, and confuse overlapping elements in charts. Current approaches rely solely on pixel interpretation, creating a Pixel-Only Bottleneck: agents treat interactive charts as static images, losing access to the structured specification that encodes exact values. We introduce Introspective and Interactive Visual Grounding (IVG), a framework that combines (1) spec-grounded introspection, which queries the underlying specification for deterministic evidence, with (2) view-grounded interaction, which manipulates the view to resolve visual ambiguity. To enable evaluation without VLM bias, we present iPlotBench, a benchmark of 500 interactive Plotly figures with 6,706 binary questions and ground-truth specifications. Experiments show that introspection improves data reconstruction fidelity, while the combination with interaction achieves the highest QA accuracy (0.81), with +6.7 % gains on overlapping geometries. We further demonstrate IVG in deployed agents that explore data autonomously and collaborate with human users in real time.
[14] StructMem: Structured Memory for Long-Horizon Behavior in LLMs
Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng
Main category: cs.CL
TL;DR: StructMem, a structure-enriched hierarchical memory with event-level bindings and periodic semantic consolidation, improves temporal and multi-hop reasoning on LoCoMo while cutting token usage, API calls, and runtime versus prior memory systems.
Abstract: Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based memory enables structured reasoning at the cost of expensive and fragile construction. To address these issues, we propose StructMem, a structure-enriched hierarchical memory framework that preserves event-level bindings and induces cross-event connections. By temporally anchoring dual perspectives and performing periodic semantic consolidation, StructMem improves temporal reasoning and multi-hop performance on LoCoMo, while substantially reducing token usage, API calls, and runtime compared to prior memory systems. See https://github.com/zjunlp/LightMem.
[15] Enhancing Science Classroom Discourse Analysis through Joint Multi-Task Learning for Reasoning-Component Classification
Jiho Noh, Mukhesh Raghava Katragadda, Raymond Carl, Soon Lee
Main category: cs.CL
TL;DR: A jointly trained RoBERTa classifier with LLM-based minority-class augmentation automates coding of science classroom discourse into utterance types and reasoning components, and discourse-pattern analyses show teacher feedback-with-question moves most consistently precede student inferential reasoning.
Abstract: Analyzing the reasoning patterns of students in science classrooms is critical for understanding knowledge construction mechanisms and improving instructional practice to maximize cognitive engagement, yet manual coding of classroom discourse at scale remains prohibitively labor-intensive. We present an automated discourse analysis system (ADAS) that jointly classifies teacher and student utterances along two complementary dimensions: Utterance Type and Reasoning Component, derived from our prior CDAT framework. To address severe label imbalance among minority classes, we (1) stratify-resplit the annotated corpus, (2) apply LLM-based synthetic data augmentation targeting minority classes, and (3) train a dual-probe head RoBERTa-base classifier. A zero-shot GPT-5.4 baseline achieves macro-F1 of 0.467 on UT and 0.476 on RC, establishing meaningful upper bounds for prompt-only approaches and motivating fine-tuning. Beyond classification, we conduct discourse pattern analyses including UTxRC co-occurrence profiling, Cognitive Complexity Index (CCI) computation per session, lag-sequential analysis, and IRF chain analysis, revealing that teacher Feedback-with-Question (Fq) moves are the most consistent antecedents of student inferential reasoning (SR-I). Our results demonstrate that LLM-based augmentation meaningfully improves UT minority-class recognition, and that the structural simplicity of the RC task makes it tractable even for lexical baselines.
[16] Slot Machines: How LLMs Keep Track of Multiple Entities
Paul C. Bogdan, Jack Lindsey
Main category: cs.CL
TL;DR: Multi-slot probing reveals that single tokens encode the current and prior entity in largely orthogonal "slots"; the prior-entity slot supports relational inference but goes unused for explicit factual retrieval, exposing a gap between information available in activations and information the model uses.
Abstract: Language models must bind entities to the attributes they possess and maintain several such binding relationships within a context. We study how multiple entities are represented across token positions and whether single tokens can carry bindings for more than one entity. We introduce a multi-slot probing approach that disentangles a single token’s residual stream activation to recover information about both the currently described entity and the immediately preceding one. These two kinds of information are encoded in separate and largely orthogonal “current-entity” and “prior-entity” slots. We analyze the functional roles of these slots and find that they serve different purposes. In tandem with the current-entity slot, the prior-entity slot supports relational inferences, such as entity-level induction (“who came after Alice in the story?”) and conflict detection between adjacent entities. However, only the current-entity slot is used for explicit factual retrieval questions (“Is anyone in the story tall?” “What is the tall entity’s name?”) despite these answers being linearly decodable from the prior-entity slot too. Consistent with this limitation, open-weight models perform at near-chance accuracy when processing syntax that forces two subject-verb-object bindings onto a single token (e.g., “Alice prepares and Bob consumes food.”). Interestingly, recent frontier models can parse this properly, suggesting they may have developed more sophisticated binding strategies. Overall, our results expose a gap between information that is available in activations and information the model actually uses, and suggest that the current/prior-entity slot structure is a natural substrate for behaviors that require holding two perspectives at once, such as sycophancy and deception.
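The "two orthogonal slots in one vector" picture can be made concrete with a toy model. This is an invented illustration of the geometry the probing relies on, not the authors' probe: each slot lives in its own subspace of the token vector, so both bindings are linearly decodable at once:

```python
# Toy residual-stream vector: dims 0-3 hold a current-entity code,
# dims 4-7 hold a prior-entity code; disjoint dims => orthogonal slots.

ENTITIES = ["Alice", "Bob", "Carol", "Dave"]

def one_hot(name):
    return [1.0 if e == name else 0.0 for e in ENTITIES]

def token_vector(current, prior):
    """Superpose the two bindings in disjoint subspaces (list concat)."""
    return one_hot(current) + one_hot(prior)

def probe(vec, slot):
    """Linear read-out: argmax over the chosen slot's subspace."""
    lo = 0 if slot == "current" else len(ENTITIES)
    scores = vec[lo:lo + len(ENTITIES)]
    return ENTITIES[scores.index(max(scores))]

# One token that simultaneously binds "Bob is current, Alice was prior".
v = token_vector(current="Bob", prior="Alice")
```

Both entities are recoverable from the same vector without interference, which is the sense in which the paper's prior-entity information is "available" even when the model's own retrieval circuitry ignores it.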
[17] Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue
Biswesh Mohapatra, Giovanni Duca, Laurent Romary, Justine Cassell
Main category: cs.CL
TL;DR: An active visual scaffolding framework that incrementally externalizes dialogue state into a persistent visual history reduces "representational blur" in situated dialogue, with a hybrid text-visual representation of common ground performing best on the IndiRef benchmark.
Abstract: Situated dialogue requires speakers to maintain a reliable representation of shared context rather than reasoning only over isolated utterances. Current conversational agents often struggle with this requirement, especially when the common ground must be preserved beyond the immediate context window. In such settings, fine-grained distinctions are frequently compressed into purely textual representations, leading to a critical failure mode we call “representational blur”, in which similar but distinct entities collapse into interchangeable descriptions. This semantic flattening creates an illusion of grounding, where agents appear locally coherent but fail to track shared context persistently over time. Inspired by the role of mental imagery in human reasoning, and based on the increased availability of multimodal models, we explore whether conversational agents can be given an analogous ability to construct some depictive intermediate representations during dialogue to address these limitations. Thus, we introduce an active visual scaffolding framework that incrementally converts dialogue state into a persistent visual history that can later be retrieved for grounded response generation. Evaluation on the IndiRef benchmark shows that incremental externalization itself improves over full-dialog reasoning, while visual scaffolding provides additional gains by reducing representational blur and enforcing concrete scene commitments. At the same time, textual representations remain advantageous for non-depictable information, and a hybrid multimodal setting yields the best overall performance. Together, these findings suggest that conversational agents benefit from an explicitly multimodal representation of common ground that integrates depictive and propositional information.
[18] “This Wasn’t Made for Me”: Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias
Siyu Liang, Alicia Beckford Wassink
Main category: cs.CL
TL;DR: User experience studies across four U.S. dialect communities show that ASR failures impose emotional labor, invisible work (code-switching, hyper-articulation), and internalized feelings of inadequacy, harms that accuracy-based fairness metrics alone cannot capture.
Abstract: Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users’ lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one’s native language variety.
[19] Prefix Parsing is Just Parsing
Clemente Pasti, Andreas Opedal, Timothy J. O’Donnell, Ryan Cotterell, Tim Vieira
Main category: cs.CL
TL;DR: A grammar transformation reduces prefix parsing to ordinary parsing: the transformed grammar generates exactly the prefixes of the original language, so any off-the-shelf parser can compute prefix probabilities and next-token weights without bespoke algorithms.
Abstract: Prefix parsing asks whether an input prefix can be extended to a complete string generated by a given grammar. In the weighted setting, it also provides prefix probabilities, which are central to context-free language modeling, psycholinguistic analysis, and syntactically constrained generation from large language models. We introduce the prefix grammar transformation, an efficient reduction of prefix parsing to ordinary parsing. Given a grammar, our method constructs another grammar that generates exactly the prefixes of its original strings. Prefix parsing is then solved by applying any ordinary parsing algorithm on the transformed grammar without modification. The reduction is both elegant and practical: the transformed grammar is only a small factor larger than the input, and any optimized implementation can be used directly, eliminating the need for bespoke prefix-parsing algorithms. We also present a strategy, based on algorithmic differentiation, for computing the next-token weight vector, i.e., the prefix weights of all one-token extensions, enabling efficient prediction of the next token. Together, these contributions yield a simple, general, and efficient framework for prefix parsing.
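One textbook-style version of such a transformation can be sketched directly (this is an illustrative construction, not necessarily the paper's exact one): for every rule A → X1…Xn, a prefix nonterminal A' derives X1…X(i-1) followed by a prefix of Xi, and a terminal's prefixes are itself or the empty string. A brute-force enumerator then checks the reduction on a toy grammar:

```python
def prefix_grammar(rules):
    """Extend a CFG so that nonterminal A + "'" generates exactly the
    prefixes of L(A). `rules` maps a nonterminal to a list of
    right-hand sides (lists of symbols); a symbol is a nonterminal iff
    it is a key of `rules`. Names with "'" / "@" are assumed unused."""
    out = {nt: [list(r) for r in alts] for nt, alts in rules.items()}
    terminals = {s for alts in rules.values() for r in alts for s in r
                 if s not in rules}
    for nt, alts in rules.items():
        pre = []
        for rhs in alts:
            if not rhs:                 # the only prefix of "" is ""
                pre.append([])
            for i in range(len(rhs)):
                x = rhs[i]
                px = x + "'" if x in rules else "@" + x
                pre.append(list(rhs[:i]) + [px])
        out[nt + "'"] = pre
    for t in terminals:                 # a terminal's prefixes: t or ""
        out["@" + t] = [[t], []]
    return out

def enumerate_strings(grammar, start, max_len):
    """Brute-force all terminal strings of length <= max_len
    (assumes single-character terminals; fine for this toy check)."""
    results, seen, stack = set(), set(), [(start,)]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            if len(form) <= max_len:
                results.add("".join(form))
            continue
        if sum(1 for s in form if s not in grammar) > max_len:
            continue                    # already too many terminals
        for rhs in grammar[form[idx]]:
            stack.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return results

# Balanced strings a^n b^n; S' should generate exactly their prefixes.
g = {"S": [["a", "S", "b"], ["a", "b"]]}
pg = prefix_grammar(g)
```

Because `pg` contains the original rules unchanged, the same (here: brute-force) recognizer answers both ordinary membership queries from `S` and prefix queries from `S'`, which is the point of the reduction.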
[20] On Reasoning Behind Next Occupation Recommendation
Shan Dong, Palakorn Achananuparp, Hieu Hien Mai, Lei Wang, Yao Lu, Ee-Peng Lim
Main category: cs.CL
Abstract: In this work, we develop a novel reasoning approach to enhance the performance of large language models (LLMs) in future occupation prediction. In this approach, a reason generator first derives a "reason" for a user from his/her past education and career history. The reason summarizes the user's preferences and is used as the input of an occupation predictor to recommend the user's next occupation. This two-step occupation prediction approach is, however, non-trivial as LLMs are not aligned with career paths or the unobserved reasons behind each occupation decision. We therefore propose to fine-tune LLMs to improve their reasoning and occupation prediction performance. We first derive high-quality oracle reasons, as measured by factuality, coherence and utility criteria, using an LLM-as-a-Judge. These oracle reasons are then used to fine-tune small LLMs to perform reason generation and next occupation prediction. Our extensive experiments show that: (a) our approach effectively enhances LLMs' accuracy in next occupation prediction, making them comparable to fully supervised methods and outperforming unsupervised methods; (b) a single LLM fine-tuned to perform reason generation and occupation prediction outperforms two LLMs fine-tuned to perform the tasks separately; and (c) the next occupation prediction accuracy depends on the quality of generated reasons. Our code is available at https://github.com/Sarasarahhhhh/job_prediction.
[21] Subject-level Inference for Realistic Text Anonymization Evaluation
Myeong Seok Oh, Dong-Yun Kim, Hanseok Oh, Chaean Kang, Joeun Kang, Xiaonan Wang, Hyunjung Park, Young Cheol Jung, Hansaem Kim
Main category: cs.CL
Abstract: Current text anonymization evaluation relies on span-based metrics that fail to capture what an adversary could actually infer, and assumes a single data subject, ignoring multi-subject scenarios. To address these limitations, we present SPIA (Subject-level PII Inference Assessment), the first benchmark that shifts the unit of evaluation from text spans to individuals, comprising 675 documents across legal and online domains with novel subject-level protection metrics. Extensive experiments show that even when over 90% of PII spans are masked, subject-level inference protection drops as low as 33%, leaving the majority of personal information recoverable through contextual inference. Furthermore, target-subject-focused anonymization leaves non-target subjects substantially more exposed than the target subject. We show that subject-level inference-based evaluation is essential for ensuring safe text anonymization in real-world settings.
[22] Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model
Runheng Liu, Heyan Huang, Xingchen Xiao, Zhijing Wu
Main category: cs.CL
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their ability to generate human-like text has raised concerns about potential misuse. This underscores the need for reliable and effective methods to detect LLM-generated text. In this paper, we propose IRM, a novel zero-shot approach that leverages Implicit Reward Models for LLM-generated text detection. Such implicit reward models can be derived from publicly available instruction-tuned and base models. Previous reward-based methods rely on preference construction and task-specific fine-tuning; in comparison, IRM requires neither preference collection nor additional training. We evaluate IRM on the DetectRL benchmark and demonstrate that IRM achieves superior detection performance, outperforming existing zero-shot and supervised methods in LLM-generated text detection.
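The abstract does not spell out the scoring rule, but the general recipe can be sketched as scoring a text by the gap between its token log-probabilities under an instruction-tuned model and its base model (the DPO-style implicit reward). In the sketch below, the numbers stand in for real per-token model scores, and `implicit_reward`, `detect`, and the zero threshold are illustrative assumptions, not the paper's exact method:

```python
def implicit_reward(logp_tuned, logp_base):
    """Length-normalised gap between per-token log-probs under an
    instruction-tuned model and its base model.  Higher values mean the
    tuned model 'prefers' the text, a plausible LLM-generation signal."""
    assert len(logp_tuned) == len(logp_base) > 0
    return sum(t - b for t, b in zip(logp_tuned, logp_base)) / len(logp_tuned)

def detect(logp_tuned, logp_base, threshold=0.0):
    """Flag a text as LLM-generated when its implicit reward clears a
    (here arbitrary) threshold -- no preference data, no training."""
    return implicit_reward(logp_tuned, logp_base) > threshold

# Made-up per-token log-probs in place of real model outputs.
machine_like = detect([-1.0, -0.8, -0.5], [-2.0, -1.9, -1.4])  # True
human_like   = detect([-2.2, -2.0, -1.8], [-2.1, -2.0, -1.9])  # False
```

The zero-shot appeal is visible here: both functions need only log-probabilities that any pair of publicly released base/instruct checkpoints can already produce.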
[23] EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval
Julian Acuna
Main category: cs.CL
Abstract: Large language model assistants are increasingly expected to retain and reason over information accumulated across many sessions. We introduce EngramaBench, a benchmark for long-term conversational memory built around five personas, one hundred multi-session conversations, and one hundred fifty queries spanning factual recall, cross-space integration, temporal reasoning, adversarial abstention, and emergent synthesis. We evaluate Engrama, a graph-structured memory system, against GPT-4o full-context prompting and Mem0, an open-source vector-retrieval memory system. All three use the same answering model (GPT-4o), isolating the effect of memory architecture. GPT-4o full-context achieves the highest composite score (0.6186), while Engrama scores 0.5367 globally but is the only system to score higher than full-context prompting on cross-space reasoning (0.6532 vs. 0.6291, n=30). Mem0 is cheapest but substantially weaker (0.4809). Ablations reveal that the components driving Engrama’s cross-space advantage trade off against global composite score, exposing a systems-level tension between structured memory specialization and aggregate optimization.
[24] Unlocking the Power of Large Language Models for Multi-table Entity Matching
Yingkai Tang, Taoyu Su, Wenyuan Zhang, Xiaoyang Guo, Tingwen Liu
Main category: cs.CL
Abstract: Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle to handle semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Then, to alleviate the matching efficiency problem caused by the surge in the number of entities brought by multiple data sources, we develop a transitive consensus embedding matching module to tackle entity embedding and pre-matching issues. Finally, to address the issue of noisy entities during the matching process, we introduce a density-aware pruning module to optimize the quality of multi-table entity matching. We conducted extensive experiments on 6 MEM datasets, and the results show that our model improves by an average of 5.1% in F1 compared with the baseline model. Our code is available at https://github.com/Ymeki/LLM4MEM.
[25] Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation
Hanwen Gu, Chao Guo, Junle Wang, Wenda Xie, Yisheng Lv
Main category: cs.CL
Abstract: While LLMs demonstrate remarkable fluency in narrative generation, existing methods struggle to maintain global narrative coherence, contextual logical consistency, and smooth character development, often producing monotonous scripts with structural fractures. To this end, we introduce PLOTTER, a framework that performs narrative planning on structural graph representations instead of the direct sequential text representations used in existing work. Specifically, PLOTTER executes the Evaluate-Plan-Revise cycle on the event graph and character graph. By diagnosing and repairing issues of the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios. These findings verify that planning narratives on structural graph representations, rather than directly on text, is crucial to enhancing the long-context reasoning of LLMs in complex narrative generation.
[26] When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors
Chenghao Yang, Yuning Zhang, Zhoufutu Wen, Tao Gong, Jiaheng Liu, Qi Chu, Nenghai Yu
Main category: cs.CL
Abstract: Model distillation is a primary driver behind the rapid progress of LLM agents, yet it often leads to behavioral homogenization. Many emerging agents share nearly identical reasoning steps and failure modes, suggesting they may be distilled echoes of a few dominant teachers. Existing metrics, however, fail to distinguish mandatory behaviors required for task success from non-mandatory patterns that reflect a model's autonomous preferences. We propose two complementary metrics to isolate non-mandatory behavioral patterns: Response Pattern Similarity (RPS) for verbal alignment and Action Graph Similarity (AGS) for tool-use habits modeled as directed graphs. Evaluating 18 models from 8 providers on τ-Bench and τ²-Bench against Claude Sonnet 4.5 (thinking), we find that within-family model pairs score 5.9 pp higher in AGS than cross-family pairs, and that Kimi-K2 (thinking) reaches 82.6% S_node and 94.7% S_dep, exceeding Anthropic's own Opus 4.1. A controlled distillation experiment further confirms that AGS distinguishes teacher-specific convergence from general improvement. RPS and AGS capture distinct behavioral dimensions (Pearson r = 0.491), providing complementary diagnostic signals for behavioral convergence in the agent ecosystem. Our code is available at https://github.com/Syuchin/AgentEcho.
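As a rough illustration of the action-graph idea: turn each agent's tool-call trace into a directed graph (nodes = tools, edges = observed consecutive tool pairs) and compare node and edge sets. The paper's exact S_node/S_dep definitions are more involved; `ags` and the toy traces below are hypothetical:

```python
def action_graph(trace):
    """Nodes are tools used; edges are consecutive tool-call pairs."""
    return set(trace), set(zip(trace, trace[1:]))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def ags(trace_a, trace_b):
    """Node- and edge-level overlap of two action graphs -- a simplified
    stand-in for the paper's S_node / S_dep scores."""
    (na, ea), (nb, eb) = action_graph(trace_a), action_graph(trace_b)
    return jaccard(na, nb), jaccard(ea, eb)

# Hypothetical tool-call traces from a "teacher" and a "student" agent.
teacher = ["search_flights", "get_user", "book_flight"]
student = ["search_flights", "get_user", "get_user", "book_flight"]
s_node, s_dep = ags(teacher, student)
```

Comparing edge sets rather than raw success captures the point the abstract makes: two agents can solve the same task while revealing, through their habitual tool-call orderings, whether one is an echo of the other.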
[27] Listen and Chant Before You Read: The Ladder of Beauty in LM Pre-Training
Yoshinori Nomura
Main category: cs.CL
Abstract: We show that pre-training a Transformer on music before language significantly accelerates language acquisition. Using piano performances (MAESTRO dataset), a developmental pipeline (music → poetry → prose) yields a 17.5% perplexity improvement over random initialization (p < 0.001, 5 seeds), with music and poetry improving orthogonal model components (internal computation and embeddings, respectively). Convergence tests confirm that this is not a transient head start: at d = 64, multi-seed validation (5 seeds) shows a persistent 5.5% gap at plateau (p = 0.017), with the pipeline converging faster and to a lower loss in every run. Real music matches the transfer ceiling of synthetic patterns with one-third the data, and scaling experiments reveal that the optimal pre-training data volume shifts with model capacity (a −3% → +3% → +6% advantage for larger datasets from d = 16 to d = 64). Across the scales we study (d ∈ {16, 32, 64}, up to ~400K parameters), these results suggest a capacity-dependent data curation principle and indicate that structured human creative outputs can provide an efficient pre-training substrate for small language models; stronger conclusions at modern pre-training scale will require substantially larger experiments.
[28] Cross-Entropy Is Load-Bearing: A Pre-Registered Scope Test of the K-Way Energy Probe on Bidirectional Predictive Coding
Jon-Paul Cacioli
Main category: cs.CL
Abstract: Cacioli (2026) showed that the K-way energy probe on standard discriminative predictive coding networks reduces approximately to a monotone function of the log-softmax margin. The reduction rests on five assumptions, including cross-entropy (CE) at the output and effectively feedforward inference dynamics. This pre-registered study tests the reduction’s sensitivity to CE removal using two conditions: standard PC trained with MSE instead of CE, and bidirectional PC (bPC; Oliviers, Tang & Bogacz, 2025). Across 10 seeds on CIFAR-10 with a matched 2.1M-parameter backbone, we find three results. The negative result replicates on standard PC: the probe sits below softmax (Delta = -0.082, p < 10^-6). On bPC the probe exceeds softmax across all 10 seeds (Delta = +0.008, p = 0.000027), though a pre-registered manipulation check shows that bPC does not produce materially greater latent movement than standard PC at this scale (ratio 1.6, threshold 10). Removing CE alone without changing inference dynamics halves the probe-softmax gap (Delta_MSE = -0.037 vs Delta_stdPC = -0.082). CE is a major empirically load-bearing component of the decomposition at this scale. CE training produces output logit norms approximately 15x larger than MSE or bPC training. A post-hoc temperature scaling ablation decomposes the probe-softmax gap into two components: approximately 66% is attributable to logit-scale effects removable by temperature rescaling, and approximately 34% reflects a scale-invariant ranking advantage of CE-trained representations. We use “metacognitive” operationally to denote Type-2 discrimination of a readout over its own Type-1 correctness, not to imply human-like introspective access.
[29] Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI
Hieu Man, Van-Cuong Pham, Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen
Main category: cs.CL
Abstract: Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation by design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then finetunes with a Variational Autoencoder (VAE) architecture using separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates a natural language explanation for its decision, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning over the M4 dataset. Code and data repositories are available at https://github.com/hieum98/avae and https://huggingface.co/collections/Hieuman/document-level-authorship-datasets.
[30] When Bigger Isn’t Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation
Nannan Huang, Iffat Maab, Junichi Yamagishi
Main category: cs.CL
Abstract: Multi-document news summarisation systems are increasingly adopted for their convenience in processing vast daily news content, making fairness across diverse political perspectives critical. However, these systems can exhibit political bias through unequal representation of viewpoints, disproportionate emphasis on certain perspectives, and systematic underrepresentation of minority voices. This study presents a comprehensive evaluation of such bias in multi-document news summarisation using FairNews, a dataset of complete news articles with political orientation labels, examining how large language models (LLMs) handle sources with varying political leanings across 13 models and five fairness metrics. We investigate both baseline model performance and effectiveness of various debiasing interventions, including prompt-based and judge-based approaches. Our findings challenge the assumption that larger models yield fairer outputs, as mid-sized variants consistently outperform their larger counterparts, offering the best balance of fairness and efficiency. Prompt-based debiasing proves highly model dependent, while entity sentiment emerges as the most stubborn fairness dimension, resisting all intervention strategies tested. These results demonstrate that fairness in multi-document news summarisation requires multi-dimensional evaluation frameworks and targeted, architecture-aware debiasing rather than simply scaling up.
[31] CARE: Counselor-Aligned Response Engine for Online Mental-Health Support
Hagai Astrin, Ayal Swaid, Avi Segal, Kobi Gal
Main category: cs.CL
Abstract: Mental health challenges are increasing worldwide, straining emotional support services and leading to counselor overload. This can result in delayed responses during critical situations, such as suicidal ideation, where timely intervention is essential. While large language models (LLMs) have shown strong generative capabilities, their application in low-resource languages, especially in sensitive domains like mental health, remains underexplored. Furthermore, existing LLM-based agents often struggle to replicate the supportive language and intervention strategies used by professionals due to a lack of training on large-scale, real-world datasets. To address this, we propose CARE (Counselor-Aligned Response Engine), a GenAI framework that assists counselors by generating real-time, psychologically aligned response recommendations. CARE fine-tunes open-source LLMs separately for Hebrew and Arabic using curated subsets of real-world crisis conversations. The training data consists of sessions rated as highly effective by professional counselors, enabling the models to capture interaction patterns associated with successful de-escalation. By training on complete conversation histories, CARE maintains the evolving emotional context and dynamic structure of counselor-help-seeker dialogue. In experimental settings, CARE demonstrates stronger semantic and strategic alignment with gold-standard counselor responses compared to non-specialized LLMs. These findings suggest that domain-specific fine-tuning on expert-validated data can significantly support counselor workflows and improve care quality in low-resource language contexts.
[32] MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization
Maziar Kianimoghadam Jouneghani
Main category: cs.CL
Abstract: We present a systematic study of multilingual polarization detection across 22 languages for SemEval-2026 Task 9 (Subtask 1), contrasting multilingual generalists with language-specific specialists and hybrid ensembles. While a standard generalist like XLM-RoBERTa suffices when its tokenizer aligns with the target text, it may struggle with distinct scripts (e.g., Khmer, Odia) where monolingual specialists yield significant gains. Rather than enforcing a single universal architecture, we adopt a language-adaptive framework that switches between multilingual generalists, language-specific specialists, and hybrid ensembles based on development performance. Additionally, cross-lingual augmentation via NLLB-200 yielded mixed results, often underperforming native architecture selection and degrading morphologically rich tracks. Our final system achieves an overall macro-averaged F1 score of 0.796 and an average accuracy of 0.826 across all 22 tracks. Code and final test predictions are publicly available at: https://github.com/Maziarkiani/SemEval2026-Task9-Subtask1-Polarization.
[33] VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
Main category: cs.CL
Abstract: Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated components that guide the system on when to Stop, Recover, and Search. First, a mandatory Completeness Verifier enforces UI-observable success criteria and verification at every finish step – with an agent-level verifier that cross-examines completion claims with decision rules, rejecting those lacking direct visual evidence. Second, a mandatory Loop Breaker provides multi-tier filtering: switching interaction mode after repeated failures, forcing strategy changes after persistent screen-state recurrence, and binding reflection signals to strategy shifts. Third, an on-demand Search Agent searches online for unfamiliar workflows by directly querying a capable LLM with search ability, returning results as plain text. We additionally integrate a Coding Agent for code-intensive actions and a Grounding Agent for precise action grounding, both invoked on demand when required. We evaluate VLAA-GUI across five top-tier backbones, including Opus 4.5, 4.6 and Gemini 3.1 Pro, on two benchmarks with Linux and Windows tasks, achieving top performance on both (77.5% on OSWorld and 61.0% on WindowsAgentArena). Notably, three of the five backbones surpass human performance (72.4%) on OSWorld in a single pass. Ablation studies show that all three proposed components consistently improve a strong backbone, while a weaker backbone benefits more from these tools when the step budget is sufficient. Further analysis also shows that the Loop Breaker nearly halves wasted steps for loop-prone models.
[34] Decoupled DiLoCo for Resilient Distributed Pre-training
Arthur Douillard, Keith Rush, Yani Donchev, Zachary Charles, Nova Fallen, Ayush Dubey, Ionel Gog, Josef Dean, Blake Woodworth, Zachary Garrett, Nate Keating, Jenny Bishop, Henry Prior, Edouard Yvinec, Arthur Szlam, Marc’Aurelio Ranzato, Jeff Dean
Main category: cs.CL
Abstract: Modern large-scale language model pre-training relies heavily on the single program multiple data (SPMD) paradigm, which requires tight coupling across accelerators. Due to this coupling, transient slowdowns, hardware failures, and synchronization overhead stall the entire computation, wasting significant compute time at scale. While recent distributed methods like DiLoCo reduced communication bandwidth, they remained fundamentally synchronous and vulnerable to these system stalls. To address this, we introduce Decoupled DiLoCo, an evolution of the DiLoCo framework designed to break the lock-step synchronization barrier and go beyond SPMD to maximize training goodput. Decoupled DiLoCo partitions compute across multiple independent "learners" that execute local inner optimization steps. These learners asynchronously communicate parameter fragments to a central synchronizer, which circumvents failed or straggling learners by aggregating updates using a minimum quorum, an adaptive grace window, and dynamic token-weighted merging. Inspired by "chaos engineering", we achieve significantly improved training efficiency in failure-prone environments with millions of simulated chips with strictly zero global downtime, while maintaining competitive model performance across text and vision tasks, for both dense and mixture-of-expert architectures.
[35] Reasoning Primitives in Hybrid and Non-Hybrid LLMs
Shivam Rawat, Lucie Flek, Florian Mai, Nicholas Kluge Corrêa
Main category: cs.CL
Abstract: Reasoning in large language models is often treated as a monolithic capability, but its observed gains may arise from more basic operations. We study reasoning through two such primitives, recall and state-tracking, and ask whether hybrid architectures that combine attention-based retrieval with recurrent state updates are better suited than attention-only models for tasks that jointly require both. Using matched Olmo3 transformer and hybrid models in instruction-tuned and reasoning-augmented variants, we evaluate these models on a set of controlled tasks involving a mixture of state-tracking and recall primitives, which we call state-based recall. Across tasks, we find that reasoning augmentation provides the largest overall improvement, substantially extending the range of difficulty over which models remain effective. We also find that on certain tasks the hybrid reasoning model remains substantially more robust as sequential dependence increases. In contrast, the transformer reasoning model degrades sharply in performance as task difficulty increases beyond a given threshold. These results suggest that reasoning tokens and architectural inductive biases contribute at different levels of the computational process: explicit reasoning can expand a model's effective operating range, but its benefit depends on how well the underlying architecture supports persistent state propagation. Given the small size of our case study, which involves a limited set of models and tasks, we present these findings as suggestive rather than conclusive and leave broader validation across model families, scales, and task variations to future work.
[36] Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection
Fariz Ikhwantri, Dusica Marijan
Main category: cs.CL
Abstract: Automating the detection of regulatory compliance remains a challenging task due to the complexity and variability of legal texts. Models trained on one regulation often fail to generalise to others. This limitation underscores the need for principled methods to improve cross-domain transfer. We study data selection as a strategy to mitigate negative transfer in compliance detection framed as a natural language inference (NLI) task. Specifically, we evaluate four approaches for selecting augmentation data from a larger source domain: random sampling, Moore-Lewis’s cross-entropy difference, importance weighting, and embedding-based retrieval. We systematically vary the proportion of selected data to analyse its effect on cross-domain adaptation. Our findings demonstrate that targeted data selection substantially reduces negative transfer, offering a practical path toward scalable and reliable compliance automation across heterogeneous regulations.
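Of the four selection methods, Moore-Lewis cross-entropy difference is easy to sketch: score each candidate by how much cheaper it is under an in-domain language model than under a general-domain one, and keep the lowest-scoring sentences. Below is a toy unigram version (the original method uses stronger n-gram LMs; the corpora and sentence examples here are made up for illustration):

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Add-one-smoothed unigram log-probability over whitespace tokens."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda tok: math.log((counts[tok] + 1) / (total + vocab))

def moore_lewis_rank(candidates, in_domain, general):
    """Rank candidates by per-token cross-entropy difference
    H_in(x) - H_gen(x); lower means more in-domain-like."""
    lp_in, lp_gen = unigram_lm(in_domain), unigram_lm(general)
    def ced(sent):
        toks = sent.split()
        return sum(lp_gen(t) - lp_in(t) for t in toks) / len(toks)
    return sorted(candidates, key=ced)

# Invented toy corpora: a "compliance" target domain vs. general text.
in_domain = ["the regulation requires annual compliance reports",
             "controllers shall document compliance with the regulation"]
general = ["the cat sat on the mat",
           "dogs chase the ball in the park"]
candidates = ["dogs chase the cat",
              "annual compliance with the regulation"]
ranked = moore_lewis_rank(candidates, in_domain, general)
```

Taking a top fraction of `ranked` then gives the augmentation pool, which is where the paper's varied selection proportions come in.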
[37] Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages
Srija Anand, Ashwin Sankar, Ishvinder Sethi, Aaditya Pareek, Kartik Rajput, Gaurav Yadav, Nikhil Narasimhan, Adish Pandya, Deepon Halder, Mohammed Safi Ur Rahman Khan, Praveen S, Shobhit Banga, Mitesh M Khapra
Main category: cs.CL
Abstract: Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to Text-to-Speech (TTS) introduces high variance due to linguistic diversity and the multidimensional nature of speech perception. We present a controlled multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state-of-the-art TTS systems and collect over 120K pairwise comparisons from over 1900 native raters. In addition to overall preference, raters provide judgments across 6 perceptual dimensions: intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations. Using Bradley-Terry modeling, we construct a multilingual leaderboard, interpret human preferences using SHAP analysis, and analyze leaderboard reliability alongside model strengths and trade-offs across perceptual dimensions.
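Bradley-Terry strengths can be fit from such pairwise counts with a few lines of the standard MM update (Hunter's iteration); the win matrix below is invented for illustration, not taken from the paper:

```python
def bradley_terry(wins, n_items, iters=200):
    """MM fit of Bradley-Terry strengths.  wins[i][j] = number of times
    item i beat item j.  Update: p_i <- W_i / sum_j n_ij / (p_i + p_j),
    then renormalise for numerical stability."""
    p = [1.0] * n_items
    for _ in range(iters):
        new = []
        for i in range(n_items):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n_items) if j != i)
            new.append(total_wins / denom if denom else p[i])
        s = sum(new)
        p = [x * n_items / s for x in new]
    return p

# Hypothetical head-to-head wins for three TTS systems A, B, C.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
strengths = bradley_terry(wins, 3)
```

The resulting strengths induce the leaderboard ordering; per-dimension judgments (intelligibility, noise, etc.) can be fit the same way to expose the trade-offs the abstract mentions.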
[38] OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving
Xinyu Zhang, Boxuan Zhang, Yuchen Wan, Lingling Zhang, YiXing Yao, Bifan Wei, Yaqiang Wu, Jun Liu
Main category: cs.CL
Abstract: While Large Language Models (LLMs) demonstrate remarkable reasoning, complex optimization tasks remain challenging, requiring domain knowledge and robust implementation. However, existing benchmarks focus narrowly on Mathematical Programming and Combinatorial Optimization, hindering comprehensive evaluation. To address this, we introduce OptiVerse, a comprehensive benchmark of 1,000 curated problems spanning neglected domains, including Stochastic Optimization, Dynamic Optimization, Game Optimization, and Optimal Control, across three difficulty levels: Easy, Medium, and Hard. The experiments with 22 LLMs of different sizes reveal sharp performance degradation on hard problems, where even advanced models like GPT-5.2 and Gemini-3 struggle to exceed 27% accuracy. Through error analysis, we identify that modeling & logic errors remain the primary bottleneck. Consequently, we propose a Dual-View Auditor Agent that improves the accuracy of the LLM modeling process without introducing significant time overhead. OptiVerse will serve as a foundational platform for advancing LLMs in solving complex optimization challenges.
[39] Job Skill Extraction via LLM-Centric Multi-Module Framework
Guojing Li, Zichuan Fu, Junyi Li, Faxue Liu, Wenxia Zhou, Yejing Wang, Jingtong Gao, Maolin Wang, Rungen Liu, Wenlin Zhang, Xiangyu Zhao
Main category: cs.CL
Abstract: Span-level skill extraction from job advertisements underpins candidate-job matching and labor-market analytics, yet generative large language models (LLMs) often yield malformed spans, boundary drift, and hallucinations, especially with long-tail terms and cross-domain shift. We present SRICL, an LLM-centric framework that combines semantic retrieval (SR), in-context learning (ICL), and supervised fine-tuning (SFT) with a deterministic verifier. SR pulls in-domain annotated sentences and definitions from ESCO to form format-constrained prompts that stabilize boundaries and handle coordination. SFT aligns output behavior, while the verifier enforces pairing, non-overlap, and BIO legality with minimal retries. On six public span-labeled corpora of job-ad sentences across sectors and languages, SRICL achieves substantial STRICT-F1 improvements over GPT-3.5 prompting baselines and sharply reduces invalid tags and hallucinated spans, enabling dependable sentence-level deployment in low-resource, multi-domain settings.
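The deterministic verifier's BIO-legality check can be sketched as follows; this is an assumed reading of that constraint (the paper's verifier also enforces pairing and non-overlap, which are not shown):

```python
def bio_is_legal(tags):
    """Check BIO tagging legality: every I-X must continue a span of the
    same label X opened by an immediately preceding B-X or I-X."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            label = tag[2:]
            if prev not in ("B-" + label, "I-" + label):
                return False  # dangling or label-switching I- tag
        elif tag != "O" and not tag.startswith("B-"):
            return False  # unknown tag shape
        prev = tag
    return True
```

A generated sequence failing this check would trigger one of the framework's bounded retries.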
[40] UKP_Psycontrol at SemEval-2026 Task 2: Modeling Valence and Arousal Dynamics from Text
Darya Hryhoryeva, Amaia Zurinaga, Hamidreza Jamalabadi, Iryna Gurevych
Main category: cs.CL
Abstract: This paper presents our system developed for SemEval-2026 Task 2. The task requires modeling both current affect and short-term affective change in chronologically ordered user-generated texts. We explore three complementary approaches: (1) LLM prompting under user-aware and user-agnostic settings, (2) a pairwise Maximum Entropy (MaxEnt) model with Ising-style interactions for structured transition modeling, and (3) a lightweight neural regression model incorporating recent affective trajectories and trainable user embeddings. Our findings indicate that LLMs effectively capture static affective signals from text, whereas short-term affective variation in this dataset is more strongly explained by recent numeric state trajectories than by textual semantics. Our system ranked first among participating teams in both Subtask 1 and Subtask 2A based on the official evaluation metric.
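A pairwise MaxEnt model with Ising-style interactions, as in approach (2), defines a Boltzmann distribution over binary state vectors from local fields and pairwise couplings; a minimal brute-force sketch for small dimension (the paper's parameterization and fitting procedure may differ):

```python
import itertools
import math

def ising_prob(h, J, states=(-1, 1)):
    """Joint distribution over spin vectors s under a pairwise MaxEnt model:
    P(s) proportional to exp(sum_i h[i]*s[i] + sum_{i<j} J[i][j]*s[i]*s[j])."""
    n = len(h)
    weights = {}
    for s in itertools.product(states, repeat=n):
        energy = sum(h[i] * s[i] for i in range(n))
        energy += sum(J[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(i + 1, n))
        weights[s] = math.exp(energy)
    Z = sum(weights.values())  # partition function (feasible only for small n)
    return {s: w / Z for s, w in weights.items()}
```

Conditioning this joint on the previous state gives structured transition probabilities of the kind the system models.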
[41] Finding Meaning in Embeddings: Concept Separation Curves
Paul Keuren, Marc Ponsen, Robert Ayoub Bagheri
Main category: cs.CL
Abstract: Sentence embedding techniques aim to encode key concepts of a sentence’s meaning in a vector space. However, the majority of evaluation approaches for sentence embedding quality rely on the use of additional classifiers or downstream tasks. These additional components make it unclear whether good results stem from the embedding itself or from the classifier’s behaviour. In this paper, we propose a novel method for evaluating the effectiveness of sentence embedding methods in capturing sentence-level concepts. Our approach is classifier-independent, allowing for an objective assessment of the model’s performance. The approach adopted in this study involves the systematic introduction of syntactic noise and semantic negations into sentences, with the subsequent quantification of their relative effects on the resulting embeddings. The visualisation of these effects is facilitated by Concept Separation Curves, which show the model’s capacity to differentiate between conceptual and surface-level variations. By leveraging data from multiple domains, employing both Dutch and English languages, and examining sentence lengths, this study offers a compelling demonstration that Concept Separation Curves provide an interpretable, reproducible, and cross-model approach for evaluating the conceptual stability of sentence embeddings.
[42] Measuring Opinion Bias and Sycophancy via LLM-based Coercion
Rodrigo Nogueira, Giovana Kerche Bonás, Thales Sales Almeida, Andrea Roque, Ramon Pires, Hugo Abonizio, Thiago Laitz, Celio Larcher, Roseval Malaquias Junior, Marcos Piau
Main category: cs.CL
Abstract: Large language models increasingly shape the information people consume: they are embedded in search, consulted for professional advice, deployed as agents, and used as a first stop for questions about policy, ethics, health, and politics. When such a model silently holds a position on a contested topic, that position propagates at scale into users’ decisions. Eliciting a model’s positions is harder than it first appears: contemporary assistants answer direct opinion questions with evasive disclaimers, and the same model may concede the opposite position once the user starts arguing one side. We propose a method, released as the open-source llm-bias-bench, for discovering the opinions an LLM actually holds on contested topics under conditions that resemble real multi-turn interaction. The method pairs two complementary free-form probes. Direct probing asks for the model’s opinion across five turns of escalating pressure from a simulated user. Indirect probing never asks for an opinion and engages the model in argumentative debate, letting bias leak through how it concedes, resists, or counter-argues. Three user personas (neutral, agree, disagree) collapse into a nine-way behavioral classification that separates persona-independent positions from persona-dependent sycophancy, and an auditable LLM judge produces verdicts with textual evidence. The first instantiation ships 38 topics in Brazilian Portuguese across values, scientific consensus, philosophy, and economic policy. Applied to 13 assistants, the method surfaces findings of practical interest: argumentative debate triggers sycophancy 2-3x more than direct questioning (median 50% to 79%); models that look opinionated under direct questioning often collapse into mirroring under sustained arguments; and attacker capability matters mainly when an existing opinion must be dislodged, not when the assistant starts neutral.
[43] AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use
Yuanjie Lyu, Chengyu Wang, Haonan Zheng, Yuanhao Yue, Junbing Yan, Ming Wang, Jun Huang
Main category: cs.CL
Abstract: Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable. In this paper, we introduce the AgenticQwen family of models, trained via multi-round reinforcement learning (RL) on synthetic data and a limited amount of open-source data. Our training framework combines reasoning RL and agentic RL with dual data flywheels that automatically generate increasingly challenging tasks. The reasoning flywheel increases task difficulty by learning from errors, while the agentic flywheel expands linear workflows into multi-branch behavior trees that better reflect the decision complexity of real-world applications. We validate AgenticQwen on public benchmarks and in an industrial agent system. The models achieve strong performance on multiple agentic benchmarks, and in our industrial agent system, close the gap with much larger models on search and data analysis tasks. Model checkpoints and part of the synthetic data: https://huggingface.co/collections/alibaba-pai/agenticqwen. Data synthesis and RL training code: https://github.com/haruhi-sudo/data_synth_and_rl. The data synthesis pipeline is also integrated into EasyDistill: https://github.com/modelscope/easydistill.
[44] Language as a Latent Variable for Reasoning Optimization
Linjuan Wu, Haoran Wei, Jialong Tang, Shuang Luo, Baosong Yang, Yongliang Shen, Weiming Lu
Main category: cs.CL
Abstract: As LLMs reduce English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English on reasoning tasks. We hypothesize that language functions as a latent variable that structurally modulates the model’s internal inference pathways, rather than merely serving as an output medium. To test this, we conducted a Polyglot Thinking Experiment, in which models were prompted to solve identical problems under language-constrained and language-unconstrained conditions. Results show that non-English responses often achieve higher accuracy, and the best performance frequently occurs when language is unconstrained, suggesting that multilinguality broadens the model’s latent reasoning space. Based on this insight, we propose polyGRPO (Polyglot Group Relative Policy Optimization), an RL framework that treats language variation as an implicit exploration signal. It generates polyglot preference data online under language-constrained and unconstrained conditions, optimizing the policy with respect to both answer accuracy and reasoning structure. Trained on only 18.1K multilingual math problems without chain-of-thought annotations, polyGRPO improves the base model (Qwen2.5-7B-Instruct) by 6.72% absolute accuracy on four English reasoning test sets and by 6.89% on their multilingual counterparts. Remarkably, it is the only method that surpasses the base LLM on an English commonsense reasoning task (4.9%), despite being trained solely on math data, highlighting its strong cross-task generalization. Further analysis reveals that treating language as a latent variable expands the model’s latent reasoning space, yielding consistent and generalizable improvements in reasoning performance.
[45] Process Supervision via Verbal Critique Improves Reasoning in Large Language Models
Hao-Yuan Chen
Main category: cs.CL
Abstract: Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision, via Verbal Process Supervision (VPS), a training-free framework that uses structured natural-language critique from a stronger supervisor to guide an iterative generate-critique-refine loop up to a round budget R. Across GPQA Diamond, AIME 2025, and LiveCodeBench V6 (covering both closed and open models), VPS yields three key results. First, on GPQA Diamond, GPT-5.4 (High) | GPT-5.4 (Low) reaches 94.9% at R=4, surpassing the 94.1% state of the art without gradient updates. Second, on AIME 2025, VPS enables strong weak-actor rescue, boosting scores from 11.7-26.7% to 63.3-90.0% (up to +63.3 points). Third, at matched compute, VPS outperforms Reflexion by +8.5 to +12.1 points and Self-Consistency@5 by +5.0 pp (GPQA) and +8.3 pp (LiveCodeBench), isolating critique granularity as the key driver. Performance scales with the supervisor-actor capability gap (Pearson r=0.90) and degrades when errors are not linguistically expressible (e.g., code synthesis), motivating hybrid verbal-executable methods. These results establish critique granularity as a new axis of inference-time scaling.
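The generate-critique-refine loop at the core of VPS can be sketched abstractly with two callables standing in for the actor and supervisor models; this is an assumed simplification of the framework, not the authors' code:

```python
def vps_loop(actor, supervisor, problem, rounds=4):
    """Training-free verbal process supervision: the actor drafts an answer,
    a stronger supervisor returns a structured verbal critique (or None to
    accept), and the actor refines, up to a round budget `rounds`."""
    answer = actor(problem, critique=None)
    for _ in range(rounds):
        critique = supervisor(problem, answer)
        if critique is None:  # supervisor accepts the current answer
            break
        answer = actor(problem, critique=critique)
    return answer
```

In practice `actor` and `supervisor` would wrap LLM calls; the paper's key variable is the granularity of the critique the supervisor returns.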
[46] Multilinguality at the Edge: Developing Language Models for the Global South
Lester James V. Miranda, Songbo Hu, Roi Reichart, Anna Korhonen
Main category: cs.CL
Abstract: Where and how language models (LMs) are deployed determines who can benefit from them. However, there are several challenges that prevent effective deployment of LMs in non-English-speaking and hardware-constrained communities in the Global South. We call this challenge the last mile: the intersection of multilinguality and edge deployment, where the goals are aligned but the technical requirements often compete. Studying these two fields together is both a need, as linguistically diverse communities often face the most severe infrastructure constraints, and an opportunity, as edge and multilingual NLP research remain largely siloed. To understand the state of the art and the challenges of combining the two areas, we survey 232 papers that tackle this problem across the language modelling pipeline, from data collection to development and deployment. We also discuss open questions and provide actionable recommendations for different stakeholders in the NLP ecosystem. Finally, we hope that this work contributes to the development of inclusive and equitable language technologies.
[47] Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales
Olufunke O. Sarumi, Charles Welch, Daniel Braun
Main category: cs.CL
Abstract: Beyond exploring disaggregated labels for modeling perspectives, annotator rationales provide fine-grained signals of individual perspectives. In this work, we propose a framework for jointly modeling annotator-specific label prediction and corresponding explanations, fine-tuned on the annotators’ provided rationales. Using a dataset with disaggregated natural language inference (NLI) annotations and annotator-provided explanations, we condition predictions on both annotator identity and demographic metadata through a representation-level User Passport mechanism. We further introduce two explainer architectures: a post-hoc prompt-based explainer and a prefixed bridge explainer that transfers annotator-conditioned classifier representations directly into a generative model. This design enables explanation generation aligned with individual annotator perspectives. Our results show that incorporating explanation modeling substantially improves predictive performance over a baseline annotator-aware classifier, with the prefixed bridge approach achieving more stable label alignment and higher semantic consistency, while the post-hoc approach yields stronger lexical similarity. These findings indicate that modeling explanations as expressions of fine-grained perspective provides a richer and more faithful representation of disagreement. The proposed approaches advance perspectivist modeling by integrating annotator-specific rationales into both predictive and generative components.
[48] Fixation Sequences as Time Series: A Topological Approach to Dyslexia Detection
Marius Huber, David R. Reich, Lena A. Jäger
Main category: cs.CL
Abstract: Persistent homology, a method from topological data analysis, extracts robust, multi-scale features from data. It produces stable representations of time series by applying varying thresholds to their values (a process known as a “filtration”). We develop novel filtrations for time series and introduce topological methods for the analysis of eye-tracking data, by interpreting fixation sequences as time series, and constructing “hybrid models” that combine topological features with traditional statistical features. We empirically evaluate our method by applying it to the task of dyslexia detection from eye-tracking-while-reading data using the Copenhagen Corpus, which contains scanpaths from dyslexic and non-dyslexic L1 and L2 readers. Our hybrid models outperform existing approaches that rely solely on traditional features, showing that persistent homology captures complementary information encoded in fixation sequences. The strength of these topological features is further underscored by their achieving performance comparable to established baseline methods. Importantly, our proposed filtrations outperform existing ones.
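The simplest instance of the filtration idea, 0-dimensional sublevel-set persistence of a 1-D signal, can be computed with a single union-find pass using the elder rule; a minimal sketch of the standard construction, not the authors' novel filtrations:

```python
def sublevel_persistence(values):
    """0-dim persistence pairs of a time series under the sublevel-set
    filtration: sweep values from low to high, track connected components
    of consecutive indices, and kill the younger component at each merge."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent = {}
    birth = {}  # component root -> birth value (its minimum)
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):  # adjacent indices already in the filtration
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # elder rule: the component with the higher minimum dies
                    if birth[ri] < birth[rj]:
                        ri, rj = rj, ri
                    pairs.append((birth[ri], values[i]))
                    parent[ri] = rj
    pairs.append((min(values), float("inf")))  # global component never dies
    return pairs
```

Each finite (birth, death) pair records a local minimum of the fixation signal and the height at which its valley merges into a deeper one; these persistence values feed the topological feature vectors.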
[49] Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers
Bernard Muller, Antonio Armando Ortiz Barrañón, LaVonne Roberts
Main category: cs.CL
Abstract: We previously introduced a training-free method for dysarthria severity assessment based on d-prime separability of phonological feature subspaces in frozen self-supervised speech representations, validated on 890 speakers across 5 languages with HuBERT-base. Here, we scale the analysis to 3,374 speakers from 25 datasets spanning 12 languages and 5 aetiologies (Parkinson’s disease, cerebral palsy, ALS, Down syndrome, and stroke), plus healthy controls, using 6 SSL backbones. We report three findings. First, aetiology-specific degradation profiles are distinguishable at the group level: 10 of 13 features yield large effect sizes (epsilon-squared > 0.14, Holm-corrected p < 0.001), with Parkinson’s disease separable from the articulatory execution group at Cohen’s d = 0.83; individual-level classification remains limited (22.6% macro F1). Second, profiles show cross-lingual profile-shape stability: cosine similarity of 5-dimensional consonant d-prime profiles exceeds 0.95 across the languages available for each aetiology. Absolute d-prime magnitudes are not cross-lingually calibrated, so the method supports language-independent phenotyping of degradation patterns but requires within-corpus calibration for absolute severity interpretation. Third, the method is architecture-independent: all 6 backbones produce monotonic severity gradients with inter-model agreement exceeding rho = 0.77. Fixed-token d-prime estimation preserves the severity correlation (rho = -0.733 at 200 tokens per class), confirming that the signal is not a token-count artefact. These results support phonological subspace analysis as a robust, training-free framework for aetiology-aware dysarthria characterisation, with evidence of cross-lingual profile-shape stability and cross-backbone robustness in the represented sample.
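The d-prime separability statistic underlying the method is the difference of two sample means in units of their pooled standard deviation; a minimal sketch (the paper computes it over phonological feature subspaces of SSL representations):

```python
import math

def d_prime(xs, ys):
    """Sensitivity index d': mean separation of two samples divided by
    the root-mean of their (unbiased) variances."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    return abs(mx - my) / math.sqrt((vx + vy) / 2)
```

Higher d' means the two phonological classes remain well separated in the embedding space; degradation profiles are vectors of such d' values across features.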
[50] From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation
Minh Duc Bui, Xenia Heilmann, Mattia Cerrato, Manuel Mager, Katharina von der Wense
Main category: cs.CL
Abstract: Prior work evaluates code generation bias primarily through simple conditional statements, which represent only a narrow slice of real-world programming and reveal solely overt, explicitly encoded bias. We demonstrate that this approach dramatically underestimates bias in practice by examining a more realistic task: generating machine learning (ML) pipelines. Testing both code-specialized and general-instruction large language models, we find that generated pipelines exhibit significant bias during feature selection. Sensitive attributes appear in 87.7% of cases on average, despite models demonstrably excluding irrelevant features (e.g., including “race” while dropping “favorite color” for credit scoring). This bias is substantially more prevalent than that captured by conditional statements, where sensitive attributes appear in only 59.2% of cases. These findings are robust across prompt mitigation strategies, varying numbers of attributes, and different pipeline difficulty levels. Our results challenge simple conditionals as valid proxies for bias evaluation and suggest current benchmarks underestimate bias risk in practical deployments.
[51] Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling
Yilong Chen, Yanxi Xie, Zitian Gao, He Xin, Yihao Xiao, Renbiao Liu, Haoming Luo, Yifan Luo, Zhengmao Ye, Tingwen Liu, Xin Zhao, Ran Tao, Bryan Dai
Main category: cs.CL
Abstract: Large token-indexed lookup tables provide a compute-decoupled scaling path, but their practical gains are often limited by poor parameter efficiency and rapid memory growth. We attribute these limitations to Zipfian under-training of the long tail, heterogeneous demand across layers, and “slot collapse” that produces redundant embeddings. To address this, we propose X-GRAM, a frequency-aware dynamic token-injection framework. X-GRAM employs hybrid hashing and alias mixing to compress the tail while preserving head capacity, and refines retrieved vectors via normalized SwiGLU ShortConv to extract diverse local n-gram features. These signals are integrated into attention value streams and inter-layer residuals using depth-aware gating, effectively aligning static memory with dynamic context. This design introduces a memory-centric scaling axis that decouples model capacity from FLOPs. Extensive evaluations at the 0.73B and 1.15B scales show that X-GRAM improves average accuracy by as much as 4.4 points over the vanilla backbone and 3.2 points over strong retrieval baselines, while using substantially smaller tables in the 50% configuration. Overall, by decoupling capacity from compute through efficient memory management, X-GRAM offers a scalable and practical paradigm for future memory-augmented architectures. Code available at https://github.com/Longyichen/X-gram.
[52] AEL: Agent Evolving Learning for Open-Ended Environments
Wujiang Xu, Jiaojiao Han, Minghao Guo, Kai Mei, Xi Zhu, Han Zhang, Dimitris N. Metaxas
Main category: cs.CL
Abstract: LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to remember but how to use what has been remembered, including which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change. We introduce Agent Evolving Learning (AEL), a two-timescale framework that addresses this obstacle. At the fast timescale, a Thompson Sampling bandit learns which memory retrieval policy to apply at each episode; at the slow timescale, LLM-driven reflection diagnoses failure patterns and injects causal insights into the agent’s decision prompt, giving it an interpretive frame for the evidence it retrieves. On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13±0.47, outperforming five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches. A nine-variant ablation reveals a “less is more” pattern: memory and reflection together produce a 58% cumulative improvement over the stateless baseline, yet every additional mechanism we test (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) degrades performance. This demonstrates that the bottleneck in agent self-improvement is self-diagnosing how to use experience rather than adding architectural complexity. Code and data: https://github.com/WujiangXu/AEL.
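The fast-timescale bandit can be sketched as Beta-Bernoulli Thompson Sampling over retrieval policies; a minimal illustration, assuming binary per-episode rewards (the paper's reward and arm definitions may differ):

```python
import random

def thompson_select(successes, failures):
    """Thompson Sampling over arms with Beta(s+1, f+1) posteriors:
    sample a win-rate for each arm and pick the highest draw."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

def update(successes, failures, arm, reward):
    """After the episode, credit the chosen retrieval policy."""
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Each episode the agent draws an arm (a memory retrieval policy), acts, then updates that arm's posterior, so exploration concentrates on policies that have actually helped.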
[53] Why are all LLMs Obsessed with Japanese Culture? On the Hidden Cultural and Regional Biases of LLMs
Joseba Fernandez de Landa, Carla Perez-Almendros, Jose Camacho-Collados
Main category: cs.CL
Abstract: LLMs have been showing limitations when it comes to cultural coverage and competence, and in some cases show regional biases such as amplifying Western and Anglocentric viewpoints. While there have been works analysing the cultural capabilities of LLMs, there has not been specific work on highlighting LLM regional preferences when it comes to cultural-related questions. In this work, we propose a new dataset based on a comprehensive taxonomy of Culture-Related Open Questions (CROQ). The results show that, contrary to previous cultural bias work, LLMs show a clear tendency towards countries such as Japan. Moreover, our results show that when prompting in English or other high-resource languages, LLMs tend to provide more diverse outputs and show less inclination towards answering questions highlighting countries for which the input language is an official language. Finally, we also investigate at which point of LLM training this cultural bias emerges, with our results suggesting that the first clear signs appear after supervised fine-tuning, and not during pre-training.
[54] AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA
Tasnim Kabir, Dmytro Kurdydyk, Aadi Palnitkar, Liam Dorn, Ahmed Haj Ahmed, Jordan Lee Boyd-Graber
Main category: cs.CL
Abstract: Existing audio question answering benchmarks largely emphasize sound event classification or caption-grounded queries, often enabling models to succeed through shortcut strategies, short-duration cues, lexical priors, dataset-specific biases, or even bypassing audio via metadata and captions rather than genuine reasoning. Thus, we present AUDITA (Audio Understanding from Diverse Internet Trivia Authors), a large-scale, real-world benchmark to rigorously evaluate audio reasoning beyond surface-level acoustic recognition. AUDITA comprises carefully curated, human-authored trivia questions grounded in real-world audio, designed to stress robust auditory reasoning through challenging distractors and long-range temporal dependencies, using probing queries that cannot be answered from isolated text or sound cues alone. Average human accuracy of 32.13% shows both the difficulty of the task and that humans achieve meaningful comprehension of the audio. In stark contrast, state-of-the-art audio question answering models perform poorly, with average accuracy below 8.86%. Beyond raw accuracy, we apply Item Response Theory (IRT) to estimate latent proficiency, question difficulty, and expose systematic deficiencies of the models and data.
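In its simplest one-parameter (Rasch) form, the IRT analysis mentioned above models the chance of a correct answer from a latent ability and an item difficulty; a minimal sketch (the paper may use a richer IRT variant):

```python
import math

def rasch_p_correct(theta, b):
    """1PL (Rasch) IRT model: probability that a respondent with latent
    ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

Fitting theta per model or human and b per question lets accuracy differences be decomposed into proficiency versus item difficulty, which is how the benchmark exposes systematic model deficiencies.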
[55] Misinformation Span Detection in Videos via Audio Transcripts
Breno Matos, Rennan C. Lima, Savvas Zannettou, Fabricio Benevenuto, Rodrygo L. T. Santos
Main category: cs.CL
Abstract: Online misinformation is one of the most challenging issues of recent years, yielding severe consequences, including political polarization, attacks on democracy, and public health risks. Misinformation manifests in any platform with a large user base, including online social networks and messaging apps. It permeates all media and content forms, including images, text, audio, and video. Distinctly, video-based misinformation represents a multifaceted challenge for fact-checkers, given the ease with which individuals can record and upload videos on various video-sharing platforms. Previous research efforts investigated detecting video-based misinformation, focusing on whether a video shares misinformation or not at the video level. While this approach is useful, it provides only a limited and not easily interpretable view of the problem, given that it does not provide additional context on when misinformation occurs within videos and what content (i.e., claims) is responsible for the video’s misinformation nature. In this work, we attempt to bridge this research gap by creating two novel datasets that allow us to explore misinformation detection in videos via audio transcripts, focusing on identifying the span of a video that is responsible for the video’s misinformation claim (misinformation span detection). We transcribe each video’s audio to text and identify the video segment in which the misinformation claims appear, resulting in two datasets of more than 500 videos with over 2,400 segments containing annotated fact-checked claims. Then, we employ classifiers built with state-of-the-art language models, and our results show that we can identify in which part of a video there is misinformation with an F1 score of 0.68. We make our annotated datasets publicly available. We also release all transcripts, audio, and videos.
[56] SemEval-2026 Task 4: Narrative Story Similarity and Narrative Representation Learning
Hans Ole Hatzel, Ekaterina Artemova, Haimo Paul Stiemer, Evelyn Gius, Chris Biemann
Main category: cs.CL
Abstract: We present the shared task on narrative similarity and narrative representation learning - NSNRL (pronounced “nass-na-rel”). The task operationalizes narrative similarity as a binary classification problem: determining which of two stories is more similar to an anchor story. We introduce a novel definition of narrative similarity, compatible with both narrative theory and intuitive judgment. Based on the similarity judgments collected under this concept, we also evaluate narrative embedding representations. We collected at least two annotations each for more than 1,000 story summary triples, with each annotation being backed by at least two annotators in agreement. This paper describes the sampling and annotation process for the dataset; further, we give an overview of the submitted systems and the techniques they employ. We received a total of 71 final submissions from 46 teams across our two tracks. In our triple-based classification setup, LLM ensembles make up many of the top-scoring systems, while in the embedding setup, systems with pre- and post-processing on pretrained embedding models perform about on par with custom fine-tuned solutions. Our analysis identifies potential headroom for improvement of automated systems in both tracks. The task website includes visualizations of embeddings alongside instance-level classification results for all teams.
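In the embedding track, the triple-based classification reduces to picking whichever candidate story embeds closer to the anchor; a minimal cosine-similarity baseline (embeddings would come from a pretrained encoder; this is not any submitted system):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def more_similar(anchor, story_a, story_b):
    """Return 'A' if story_a's embedding is at least as close to the
    anchor as story_b's, else 'B' (the triple classification decision)."""
    return "A" if cosine(anchor, story_a) >= cosine(anchor, story_b) else "B"
```

Scoring this decision against the human triple judgments is how the task evaluates narrative embedding representations.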
[57] Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha
Main category: cs.CL
Abstract: Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower’s Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in their internal world-modeling, which poses a gap that may lead to significant misalignments in real-world deployments.
[58] Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms
Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe
Main category: cs.CL
Abstract: Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it difficult to disentangle fact memorization from access through a particular name. We introduce RedirectQA, an entity-based QA dataset that uses Wikipedia redirect information to associate Wikidata factual triples with categorized surface forms for each entity, including alternative names, abbreviations, spelling variants, and common erroneous forms. Across 13 LLMs, we examine surface-conditioned factual memorization and find that prediction outcomes often change when only the entity surface form changes. This inconsistency is category-dependent: models are more robust to minor orthographic variations than to larger lexical variations such as aliases and abbreviations. Frequency analyses further suggest that both entity- and surface-level frequencies are associated with accuracy, and that entity frequency often contributes beyond surface frequency. Overall, factual memorization appears neither purely surface-specific nor fully surface-invariant, highlighting the importance of surface-form diversity in evaluating non-verbatim memorization.
[59] A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents
Praval Sharma
Main category: cs.CL
Abstract: Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generalize to unseen types, and (2) open-domain event extraction algorithms, capable of handling unconstrained event types, have largely overlooked the potential of large language models (LLMs) despite their advanced abilities. Additionally, they do not explicitly model document-level contextual, structural, and semantic reasoning, which is crucial for effective event extraction but remains challenging for LLMs due to the lost-in-the-middle phenomenon and attention dilution. To address these limitations, we propose MODEE, a novel approach for multimodal open-domain event extraction that combines graph-based learning with text-based representations from LLMs to model document-level reasoning. Empirical evaluations on large datasets demonstrate that MODEE outperforms state-of-the-art open-domain event extraction approaches and can be generalized to closed-domain event extraction, where it outperforms existing algorithms.
[60] TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di, Rui Wang
Main category: cs.CL
Abstract: Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionable intelligence from this data remains challenging due to extreme noise, high throughput, and semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system designed for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that synergizes efficient indexing techniques with Large Language Models (LLMs) to make informed decisions on event merging, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. This engine is complemented by a cascaded routing mechanism for precise business attribution and a multi-dimensional noise reduction pipeline that integrates domain knowledge, statistical patterns, and behavioral filtering. Deployed in a production environment handling a peak throughput of over 2,000 messages per minute and 300,000 messages per day, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. Benchmarks constructed from real-world data demonstrate that TingIS significantly outperforms baseline methods in routing accuracy, clustering quality, and Signal-to-Noise Ratio.
[61] Mango: Multi-Agent Web Navigation via Global-View Optimization
Weixi Tong, Yifeng Di, Tianyi Zhang
Main category: cs.CL
Abstract: Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website’s structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a limited budget. We propose Mango, a multi-agent web navigation method that leverages the website structure to dynamically determine optimal starting points. We formulate URL selection as a multi-armed bandit problem and employ Thompson Sampling to adaptively allocate the navigation budget across candidate URLs. Furthermore, we introduce an episodic memory component to store navigation history, enabling the agent to learn from previous attempts. Experiments on WebVoyager demonstrate that Mango achieves a success rate of 63.6% when using GPT-5-mini, outperforming the best baseline by 7.3%. Furthermore, on WebWalkerQA, Mango attains a 52.5% success rate, surpassing the best baseline by 26.8%. We also demonstrate the generalizability of Mango using both open-source and closed-source models as backbones. Our data and code are open-source and available at https://github.com/VichyTong/Mango.
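The multi-armed bandit formulation above can be sketched compactly. This is an illustrative reconstruction, not Mango's actual code: the binary success signal, the Beta(1,1) priors, and the candidate URL set are assumptions.

```python
import random

def thompson_select(stats):
    """Pick the candidate URL whose Beta-sampled success rate is highest.

    stats maps each URL to [successes, failures] observed so far.
    """
    best_url, best_draw = None, -1.0
    for url, (wins, losses) in stats.items():
        draw = random.betavariate(wins + 1, losses + 1)  # Beta(1,1) prior
        if draw > best_draw:
            best_url, best_draw = url, draw
    return best_url

def allocate_budget(candidates, try_url, budget):
    """Spend `budget` navigation attempts across candidate start URLs,
    adaptively favoring URLs from which navigation tends to succeed."""
    stats = {url: [0, 0] for url in candidates}
    for _ in range(budget):
        url = thompson_select(stats)
        if try_url(url):  # True if navigation from this start URL succeeded
            stats[url][0] += 1
        else:
            stats[url][1] += 1
    return stats
```

The episodic memory in the paper would additionally persist `stats` (and navigation traces) across tasks; here only the within-budget allocation is sketched.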
[62] EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents
Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi
Main category: cs.CL
Abstract: Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development have limitations, including limited coverage of event types in closed-domain settings and a lack of large, manually verified datasets in open-domain settings. To address these limitations, we create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. We design a systematic annotation pipeline to create the dataset and provide empirical insights into annotation complexity. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts, which demonstrates its potential for developing generalizable algorithms. Finally, we summarize the lessons learned during dataset development and provide recommendations to support future large-scale dataset development.
[63] Mapping the Political Discourse in the Brazilian Chamber of Deputies: A Multi-Faceted Computational Approach
Flávio Soriano, Victoria F. Mello, Pedro B. Rigueira, Gisele L. Pappa, Wagner Meira, Ana Paula Couto da Silva, Jussara M. Almeida
Main category: cs.CL
Abstract: Analyses of legislative behavior often rely on voting records, overlooking the rich semantic and rhetorical content of political speech. In this paper, we ask three complementary questions about parliamentary discourse: how things are said, what is being said, and who is speaking in discursively similar ways. To answer these questions, we introduce a scalable and generalizable computational framework that combines diachronic stylometric analysis, contextual topic modeling, and semantic clustering of deputies’ speeches. We apply this framework to a large-scale case study of the Brazilian Chamber of Deputies, using a corpus of over 450,000 speeches from 2003 to 2025. Our results show a long-term stylistic shift toward shorter and more direct speeches, a legislative agenda that reorients sharply in response to national crises, and a granular map of discursive alignments in which regional and gender identities often prove more salient than formal party affiliation. More broadly, this work offers a robust methodology for analyzing parliamentary discourse as a multidimensional phenomenon that complements traditional vote-based approaches.
[64] GiVA: Gradient-Informed Bases for Vector-Based Adaptation
Neeraj Gangwar, Rishabh Deshmukh, Michael Shavlovsky, Hancao Li, Vivek Mittal, Lexing Ying, Nickvash Kani
Main category: cs.CL
Abstract: As model sizes continue to grow, parameter-efficient fine-tuning has emerged as a powerful alternative to full fine-tuning. While LoRA is widely adopted among these methods, recent research has explored vector-based adaptation methods due to their extreme parameter efficiency. However, these methods typically require substantially higher ranks than LoRA to match its performance, leading to increased training costs. This work introduces GiVA, a gradient-based initialization strategy for vector-based adaptation. It achieves training times comparable to LoRA and maintains the extreme parameter efficiency of vector-based adaptation. We evaluate GiVA across diverse benchmarks, including natural language understanding, natural language generation, and image classification. Experiments show that our approach consistently outperforms or achieves performance competitive with existing vector-based adaptation methods and LoRA while reducing rank requirements by a factor of eight ($8\times$).
[65] MathDuels: Evaluating LLMs as Problem Posers and Solvers
Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik
Main category: cs.CL
Abstract: As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy dual roles: each model authors math problems under adversarial prompting and solves problems authored by every other participant. Problems are produced through a three-stage generation pipeline (meta-prompting, problem generation, and difficulty amplification), and validated by an independent verifier that excludes ill-posed questions. A Rasch model (Rasch, 1993) jointly estimates solver abilities and problem difficulties; author quality is derived from the difficulties of each model’s authored problems. Experiments across 19 frontier models reveal that authoring and solving capabilities are partially decoupled, and that dual-role evaluation reveals capability separations invisible in single-role benchmarks. As newer models enter the arena, they produce problems that defeat previously dominant solvers, so the benchmark’s difficulty co-evolves with participant strength rather than saturating at a fixed ceiling. We host a public leaderboard that updates as new models are released.
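The Rasch model at the heart of the scoring can be sketched as a joint gradient-ascent fit of the one-parameter logistic form P(correct) = sigmoid(theta − b). This is an illustrative reconstruction: the fitting procedure, learning rate, and identifiability anchoring below are assumptions, not the paper's estimator.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that a solver with ability theta
    answers a problem of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fit_rasch(responses, n_solvers, n_items, steps=2000, lr=0.05):
    """Jointly estimate solver abilities and problem difficulties by
    gradient ascent on the Bernoulli log-likelihood.

    `responses` is a list of (solver_index, item_index, correct) triples.
    """
    theta = [0.0] * n_solvers
    b = [0.0] * n_items
    for _ in range(steps):
        g_t = [0.0] * n_solvers
        g_b = [0.0] * n_items
        for s, i, y in responses:
            err = y - p_correct(theta[s], b[i])
            g_t[s] += err   # dL/dtheta = sum(y - p)
            g_b[i] -= err   # dL/db     = -sum(y - p)
        theta = [t + lr * g for t, g in zip(theta, g_t)]
        b = [d + lr * g for d, g in zip(b, g_b)]
        # Anchor the scale: mean-center difficulties for identifiability
        # (subtracting the same constant from theta keeps theta - b fixed).
        mean_b = sum(b) / len(b)
        b = [d - mean_b for d in b]
        theta = [t - mean_b for t in theta]
    return theta, b
```

In the benchmark's setting, author quality would then be read off from the fitted difficulties `b` of each model's authored problems.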
[66] Evaluation of Automatic Speech Recognition Using Generative Large Language Models
Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour
Main category: cs.CL
Abstract: Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their relevance through three approaches: (1) selecting the best hypothesis between two candidates, (2) computing semantic distance using generative embeddings, and (3) qualitative classification of errors. On the HATS dataset, the best LLMs achieve 92–94% agreement with human annotators for hypothesis selection, compared to 63% for WER, also outperforming semantic metrics. Embeddings from decoder-based LLMs show performance comparable to encoder models. Finally, LLMs offer a promising direction for interpretable and semantic ASR evaluation.
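Approaches (1) and (2) above share an embedding core: pick the candidate transcription whose embedding is semantically closest to the reference. A toy sketch of that reduction (real embeddings would come from an LLM encoder or decoder; the vectors here are stand-ins):

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity: smaller means semantically closer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u)) *
            math.sqrt(sum(b * b for b in v)))
    return 1.0 - dot / norm

def select_hypothesis(ref_emb, hyp_embs):
    """Return the index of the candidate hypothesis whose embedding is
    closest to the reference transcript's embedding."""
    dists = [cosine_distance(ref_emb, h) for h in hyp_embs]
    return min(range(len(dists)), key=dists.__getitem__)
```

Unlike WER, this criterion is indifferent to surface edits that preserve meaning, which is the property the 92–94% human-agreement result rewards.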
[67] Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin
Main category: cs.CL
Abstract: Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs), which are composed of an LLM base and a visual projector (e.g., LLaVA), a significant decline in performance on language benchmarks is observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction-tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
[68] Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model
Divyanshu Aggarwal, Sankarshan Damle, Navin Goyal, Satya Lokam, Sunayana Sitaram
Main category: cs.CL
Abstract: A common challenge to the adaptability of Large Language Models (LLMs) is their ability to learn new languages over time without hampering the model’s performance on languages in which it is already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable it to adapt to downstream tasks with varying data distributions and time shifts. This paper focuses on the language adaptability of LLMs through CFT. We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset – comprising task data in new languages – in Phase 2 (predominantly Language Ability). We observe that the “similarity” of Phase 2 tasks with Phase 1 determines the LLM’s adaptability. For similar phase-wise datasets, the LLM after Phase 2 shows no deterioration in task ability. In contrast, when the phase-wise datasets are not similar, the LLM’s task ability deteriorates. We test our hypothesis on the open-source \mis\ and \llm\ models with multiple phase-wise dataset pairs. To address the deterioration, we analyze tailored variants of two CFT methods: layer freezing and generative replay. Our findings demonstrate their effectiveness in enhancing the language ability of LLMs while preserving task performance, in comparison to relevant baselines.
[69] Federated Co-tuning Framework for Large and Small Language Models
Tao Fan, Yan Kang, Guoqiang Ma, Lixin Fan, Shuoling Liu, Kai Chen, Qiang Yang
Main category: cs.CL
Abstract: By adapting Large Language Models (LLMs) to domain-specific tasks or enriching them with domain-specific knowledge, we can fully harness their capabilities. Nonetheless, a gap persists in achieving simultaneous mutual enhancement between the server’s LLM and the downstream clients’ Small Language Models (SLMs). To address this, we propose FedCoLLM, a novel and parameter-efficient federated framework designed for co-tuning LLMs and SLMs. This approach aims to adaptively transfer the server-side LLM’s knowledge to clients’ SLMs while simultaneously enriching the LLM with domain insights from the clients. To accomplish this, FedCoLLM utilizes lightweight adapters in conjunction with SLMs, facilitating knowledge exchange between server and clients in a manner that respects data privacy while also minimizing computational and communication overhead. Our evaluation of FedCoLLM, utilizing various public LLMs and SLMs across a range of NLP text generation tasks, reveals that the performance of clients’ SLMs improves notably with the assistance of the LLMs. Simultaneously, the LLMs enhanced via FedCoLLM achieve performance comparable to that obtained through direct fine-tuning on clients’ data. Our code has been contributed to the FATE open-source project and is now publicly accessible at https://github.com/FederatedAI/FATE-LLM/tree/main/python/fate_llm/algo/fedcollm.
[70] SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche
Main category: cs.CL
Abstract: Fine-tuning large language models (LLMs) is a common practice to adapt generalist models to specialized domains. However, recent studies show that fine-tuning can erode safety alignment, causing LLMs to respond to harmful or unethical prompts. Many methods to realign safety have been proposed, but often introduce custom algorithms that are difficult to implement or compromise task utility. In this work, we propose SafeMERGE, a lightweight, post-fine-tuning framework that restores safety while maintaining downstream performance. SafeMERGE selectively merges fine-tuned with safety-aligned model layers only when they deviate from safe behavior, measured by a cosine similarity criterion. Across four LLMs and several tasks, SafeMERGE consistently reduces harmful outputs compared to other defenses, with negligible or even positive impact on utility. Our results demonstrate that selective, layer-wise merging offers a robust safeguard against the inadvertent loss of safety during fine-tuning, establishing SafeMERGE as a simple yet effective post-fine-tuning defense.
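The selective, cosine-gated layer merging described above can be sketched as follows. The threshold value, the linear-interpolation merge rule, and the flat per-layer weight vectors are illustrative assumptions, not SafeMERGE's published procedure:

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def merge_layers(fine_tuned, safe, tau=0.95, alpha=0.5):
    """For each named layer, keep the fine-tuned weights when they remain
    close (cosine >= tau) to the safety-aligned weights; otherwise merge
    the two layers by linear interpolation with coefficient alpha."""
    merged = {}
    for name, w_ft in fine_tuned.items():
        w_safe = safe[name]
        if cosine(w_ft, w_safe) >= tau:
            merged[name] = w_ft  # layer has not deviated: keep fine-tuned
        else:
            merged[name] = [alpha * a + (1 - alpha) * b
                            for a, b in zip(w_ft, w_safe)]
    return merged
```

The point of the gate is that only the layers that drifted from safe behavior pay any utility cost; unchanged layers keep their full task-specific adaptation.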
[71] XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration
Nuo Chen, Andre Lin HuiKai, Jiaying Wu, Junyi Hou, Zining Zhang, Qian Wang, Xidong Wang, Bingsheng He
Main category: cs.CL
Abstract: Despite the growing adoption of large language models (LLMs) in academic workflows, their capabilities remain limited in supporting high-quality scientific writing. Most existing systems are designed for general-purpose scientific text generation and fail to meet the sophisticated demands of research communication beyond surface-level polishing, for example, maintaining conceptual coherence across sections. Furthermore, academic writing is inherently iterative and revision-driven, a process that is not well supported by direct prompting-based paradigms. To address these scenarios, we propose a human-AI collaboration framework for academic paper revision, centered on criteria-guided intent alignment and context-aware modeling. To validate the framework, we curate a dataset of 7,000 research papers from top-tier venues, annotated with 140,000 instruction–response pairs that reflect realistic, section-level scientific revisions. We instantiate the framework in XtraGPT, the first suite of open-source LLMs (1.5B to 14B parameters) specifically fine-tuned for context-aware academic paper revision. Extensive experiments show that XtraGPT significantly outperforms same-scale baselines and rivals the quality of proprietary counterparts. Both automated preference assessments and human evaluations confirm the effectiveness of XtraGPT in improving scientific drafts. Our code and models are available at https://github.com/Xtra-Computing/XtraGPT and https://huggingface.co/collections/Xtra-Computing/xtragpt.
[72] Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
Jingyu Peng, Maolin Wang, Nan Wang, Jiatong Li, Yuchen Li, Yuyang Ye, Wanyu Wang, Pengyue Jia, Kai Zhang, Xiangyu Zhao
Main category: cs.CL
Abstract: Despite substantial advancements in aligning large language models (LLMs) with human values, current safety mechanisms remain susceptible to jailbreak attacks. We hypothesize that this vulnerability stems from distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, we introduce LogiBreak, a novel and universal black-box jailbreak method that leverages logical expression translation to circumvent LLM safety systems. By converting harmful natural language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-based inputs, preserving the underlying semantic intent and readability while evading safety constraints. We evaluate LogiBreak on a multilingual jailbreak dataset spanning three languages, demonstrating its effectiveness across various evaluation settings and linguistic contexts.
[73] It’s High Time: A Survey of Temporal Question Answering
Bhawna Piryani, Abdelrahman Abdallah, Jamshid Mozafari, Avishek Anand, Adam Jatowt
Main category: cs.CL
Abstract: Time plays a critical role in how information is generated, retrieved, and interpreted. In this survey, we provide a comprehensive overview of Temporal Question Answering (TQA), a research area that focuses on answering questions involving temporal constraints or context. As time-stamped content from sources like news articles, web archives, and knowledge bases continues to grow, TQA systems must address challenges such as detecting temporal intent, normalizing time expressions, ordering events, and reasoning over evolving or ambiguous facts. We organize existing work through a unified perspective that captures the interaction between corpus temporality, question temporality, and model capabilities, enabling a systematic comparison of datasets, tasks, and approaches. We review recent advances in TQA enabled by neural architectures, especially transformer-based models and Large Language Models (LLMs), highlighting progress in temporal language modeling, retrieval-augmented generation (RAG), and temporal reasoning. We also discuss benchmark datasets and evaluation strategies designed to test temporal robustness.
[74] RewardBench 2: Advancing Reward Model Evaluation
Saumya Malik, Valentina Pyatkin, Sander Land, Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Nathan Lambert
Main category: cs.CL
Abstract: Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, reasoning, safety, and more domains. The community has begun establishing best practices for evaluating reward models, from the development of benchmarks that test capabilities in specific skill areas to others that test agreement with human preferences. At the same time, progress in evaluation has not been mirrored by the effectiveness of reward models in downstream tasks – simpler direct alignment algorithms are reported to work better in many cases. This paper introduces RewardBench 2, a new multi-skill reward modeling benchmark designed to bring new, challenging data for accuracy-based reward model evaluation – models score about 20 points on average lower on RewardBench 2 compared to the first RewardBench – while being highly correlated with downstream performance. Compared to most other benchmarks, RewardBench 2 sources new human prompts instead of existing prompts from downstream evaluations, facilitating more rigorous evaluation practices. In this paper, we describe our benchmark construction process and report how existing models perform on it, while quantifying how performance on the benchmark correlates with downstream use of the models in both inference-time scaling algorithms, like best-of-N sampling, and RLHF training algorithms like proximal policy optimization.
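Best-of-N sampling, one of the inference-time scaling algorithms the benchmark is correlated against, is compact enough to sketch. `generate` and `reward` below are placeholder callables standing in for a policy model and a reward model, not a specific API:

```python
def best_of_n(prompt, generate, reward, n=8):
    """Inference-time scaling with a reward model: sample n candidate
    completions for a prompt and return the one the reward model
    scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```

A benchmark score that predicts downstream utility should, concretely, predict how much this selection step improves over a single sample, which is the correlation the paper quantifies.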
[75] Losing our Tail, Again: (Un)Natural Selection & Multilingual LLMs
Eva Vanmassenhove
Main category: cs.CL
Abstract: Multilingual Large Language Models have considerably changed how technologies influence language. While previous technologies could mediate or assist humans, there is now a tendency to offload the task of writing itself to these technologies, enabling models to change our languages more directly. While they provide us with quick access to information and impressively fluent output, beneath their (apparent) sophistication lies a subtle, insidious threat: the gradual decline and loss of linguistic diversity. In this position paper, I explore how model collapse, with a particular focus on translation technology, can lead to the loss of linguistic forms, grammatical features, and cultural nuance. Model collapse refers to the consequences of self-consuming training loops, where automatically generated data (re-)enters the training data, leading to a gradual distortion of the data distribution and the underrepresentation of low-probability linguistic phenomena. Drawing on recent work in Computer Vision, Natural Language Processing and Machine Translation, I argue that the many tails of our linguistic distributions might be vanishing, and with them, the narratives and identities they carry. This paper is a call to resist linguistic flattening and to reimagine Natural Language Processing as a field that encourages, values and protects expressive multilingual diversity and creativity.
[76] Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models
Gaurav Srivastava, Aafiya Hussain, Sriram Srinivasan, Xuan Wang
Main category: cs.CL
Abstract: Large language models (LLMs) achieve impressive performance on complex mathematical benchmarks yet sometimes fail on basic math reasoning while generating unnecessarily verbose responses. In this paper, we present LLMThinkBench, a systematic benchmark and comprehensive empirical study to evaluate the efficiency of reasoning in LLMs, focusing on the fundamental tradeoff between accuracy and overthinking. First, we formalize the accuracy-verbosity tradeoff. Second, we introduce the Overthinking Score, a harmonic-mean metric combining accuracy and token-efficiency for holistic model evaluation. Third, we establish an evaluation protocol with dynamically-generated data across 14 basic math tasks. Fourth, we conduct a large-scale empirical study evaluating 53 LLMs, including reasoning and quantized variants across different reasoning budgets. Fifth, we release LLMThinkBench as an open-source Python package and public leaderboard for reproducibility. Our findings reveal: 1) model performance on complex benchmarks does not translate directly to basic math reasoning; 2) reasoning models generate ~18x more tokens while sometimes achieving lower accuracy and exhibit catastrophic collapse when tokens are constrained, dropping by up to ~36%; 3) the accuracy-verbosity relationship is non-monotonic with extended reasoning budgets yielding diminishing returns (GPT-5/o-series models show zero accuracy gain from low -> medium -> high reasoning effort). Our findings challenge the assumption that longer reasoning in LLMs necessarily improves mathematical reasoning. Our public leaderboard is available at https://ctrl-gaurav.github.io/LLMThinkBench/. Our open-source Python package is available at https://pypi.org/project/llmthinkbench/, and the codebase can be found at https://github.com/ctrl-gaurav/LLMThinkBench for easy and reproducible evaluation.
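A harmonic-mean metric of the kind described can be sketched as below. The abstract does not define the token-efficiency term, so the budget-ratio efficiency here is an assumption made only to illustrate why the harmonic mean penalizes overthinking:

```python
def overthinking_score(accuracy, tokens_used, token_budget):
    """Harmonic mean of accuracy and token-efficiency: high only when a
    model is both correct and concise. Efficiency is the (capped) ratio
    of a reference token budget to tokens actually generated."""
    efficiency = min(1.0, token_budget / max(tokens_used, 1))
    if accuracy == 0 or efficiency == 0:
        return 0.0
    return 2 * accuracy * efficiency / (accuracy + efficiency)
```

Because the harmonic mean is dominated by its smaller argument, a model that is accurate but 18x more verbose scores poorly, matching the failure mode the study reports.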
[77] EduCoder: An Open-Source Annotation System for Education Transcript Data
Guanzhong Pan, Mei Tan, Hyunji Nam, Lucía Langlois, James Malamut, Liliana Deonizio, Dorottya Demszky
Main category: cs.CL
Details
Abstract: We introduce EduCoder, a domain-specialized tool designed to support utterance-level annotation of educational dialogue. While general-purpose text annotation tools for NLP and qualitative research abound, few address the complexities of coding education dialogue transcripts – with diverse teacher-student and peer interactions. Common challenges include defining codebooks for complex pedagogical features, supporting both open-ended and categorical coding, and contextualizing utterances with external features, such as the lesson’s purpose and the pedagogical value of the instruction. EduCoder is designed to address these challenges by providing a platform for researchers and domain experts to collaboratively define complex codebooks based on observed data. It incorporates both categorical and open-ended annotation types along with contextual materials. Additionally, it offers a side-by-side comparison of multiple annotators’ responses, allowing comparison and calibration of annotations with others to improve data reliability. The system is open-source, with a demo video available.
[78] Aligning Language Models with Real-time Knowledge Editing
Chenming Tang, Yutong Yang, Kexue Wang, Yunfang Wu
Main category: cs.CL
Details
Abstract: Knowledge editing aims to modify outdated knowledge in language models efficiently while retaining their original capabilities. Mainstream datasets for knowledge editing are predominantly static and fail to keep pace with evolving real-world knowledge. In this work, we introduce CRAFT, an ever-evolving real-world dataset for knowledge editing. It evaluates models on temporal locality, common-sense locality, composite portability and alias portability, providing a comprehensive and challenging evaluation for knowledge editing, on which previous methods hardly achieve balanced performance. Towards flexible real-time knowledge editing, we propose KEDAS, a novel paradigm of knowledge editing alignment featuring diverse edit augmentation and self-adaptive post-alignment inference, exhibiting significant performance gains on both CRAFT and traditional datasets compared to previous methods. We hope this work may serve as a catalyst for shifting the focus of knowledge editing from static update to dynamic evolution.
[79] Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs
Norman Paulsen
Main category: cs.CL
Details
Abstract: Large language model (LLM) providers boast big numbers for maximum context window sizes. To test the real-world use of context windows, we 1) define a concept of maximum effective context window, 2) formulate a testing method for a context window’s effectiveness over various sizes and problem types, and 3) create a standardized way to compare model efficacy at increasingly larger context window sizes to find the point of failure. We collected hundreds of thousands of data points across several models and found significant differences between reported Maximum Context Window (MCW) size and Maximum Effective Context Window (MECW) size. Our findings show that the MECW not only differs drastically from the MCW but also shifts based on the problem type. A few top-of-the-line models in our test group failed with as little as 100 tokens in context; most showed severe degradation in accuracy by 1,000 tokens in context. All models fell far short of their Maximum Context Window, by as much as 99 percent. Our data reveals that the Maximum Effective Context Window shifts based on the type of problem provided, offering clear and actionable insights into how to improve model accuracy and decrease model hallucination rates.
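The testing method the abstract sketches (sweep context sizes, find the point of failure) can be illustrated as follows. The threshold and the `accuracy_at` evaluation function are placeholder assumptions, not the authors' protocol:

```python
def max_effective_context_window(sizes, accuracy_at, threshold=0.9):
    """Return the largest tested context size whose task accuracy stays at
    or above `threshold`, scanning sizes in increasing order and stopping
    at the first failure (the 'point of failure').

    sizes       -- iterable of context sizes (in tokens) to test
    accuracy_at -- callable mapping a context size to measured accuracy
    """
    mecw = 0
    for size in sorted(sizes):
        if accuracy_at(size) >= threshold:
            mecw = size
        else:
            break  # degradation point reached; larger sizes are ineffective
    return mecw
```

A model whose accuracy holds up to 1,000 tokens but collapses at 10,000 would report an MECW of 1,000 under this sketch, even if its advertised MCW is far larger.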
[80] Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Yucheng Wang, Yifan Hou, Aydin Javadov, Mubashara Akhtar, Mrinmaya Sachan
Main category: cs.CL
Details
Abstract: Multimodal large language models (MLLMs) promise enhanced reasoning by integrating diverse inputs such as text, vision, and audio. Yet cross-modal reasoning remains underexplored, with conflicting reports on whether added modalities help or harm performance. These inconsistencies stem from a lack of controlled evaluation frameworks and analysis of models’ internals to isolate when and why modality interactions support or undermine reasoning. We address this gap through a logic-grounded evaluation framework that categorizes multimodal reasoning into six interaction patterns, varying how facts are distributed across modalities and logically combined. Empirically, additional modalities enhance reasoning only when they provide independent and sufficient reasoning paths, while redundant or chained entailment support often hurts performance. Moreover, reasoning degrades in three systematic ways: weaker modalities drag down overall performance, conflicts bias preference toward certain modalities, and joint signals from different modalities fail to be integrated effectively. Therefore, we identify two core failures: task-composition bottleneck, where recognition and reasoning cannot be jointly executed in one pass, and fusion bottleneck, where early integration introduces bias. For further investigation, we find that attention patterns fail to encode fact usefulness, but a simple two-step prompting (recognize then reason) restores performance, confirming the task-composition bottleneck. Moreover, modality identity remains recoverable in early layers, and softening attention in early fusion improves reasoning, highlighting biased fusion as another failure mode. Overall, our findings show that integration, not perception, is the main barrier to multimodal reasoning, suggesting composition-aware training and early fusion control as promising directions.
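The two-step prompting fix described in the abstract (recognize, then reason) can be sketched roughly as below. Here `ask_model` stands in for any chat-completion call, and the prompt wording is an assumption rather than the authors' exact protocol:

```python
def two_step_answer(ask_model, question, modal_descriptions):
    """Split recognition and reasoning into two model calls, so they are
    not forced into a single pass (the task-composition bottleneck)."""
    # Step 1 (recognize): extract the relevant facts from each modality,
    # without attempting to answer the question yet.
    facts = ask_model(
        "List only the facts relevant to the question below.\n"
        f"Question: {question}\nInputs: {modal_descriptions}"
    )
    # Step 2 (reason): answer using the extracted facts as plain text.
    return ask_model(f"Facts: {facts}\nUsing only these facts, answer: {question}")
```

The point of the split is that step 2 receives the multimodal evidence already verbalized, so reasoning operates over text alone.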
[81] ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
Yindong Wang, Martin Preiß, Margarita Bugueño, Jan Vincent Hoffbauer, Abdullatif Ghajar, Tolga Buz, Gerard de Melo
Main category: cs.CL
Details
Abstract: Not available (arXiv:2509.25868; the arXiv API request was rate-limited, HTTP 429).
[82] RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection
Yejin Lee, Hyeseon Ahn, Yo-Sub Han
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.10971; the arXiv API request was rate-limited, HTTP 429).
[83] Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
Peng Kuang, Yanli Wang, Xiaoyu Han, Yaowenqi Liu, Kaidi Xu, Haohan Wang
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.13918; the arXiv API request was rate-limited, HTTP 429).
[84] Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination
Xinzhuo Li, Adheesh Juvekar, Jiaxun Zhang, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Yifan Shen, Tianjiao Yu, Ismini Lourentzou
Main category: cs.CL
Details
Abstract: Not available (arXiv:2506.21546; the arXiv API request was rate-limited, HTTP 429).
[85] Capabilities and Evaluation Biases of Large Language Models in Classical Chinese Poetry Generation: A Case Study on Tang Poetry
Bolei Ma, Yina Yao, Anna-Carolina Haensch
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.15313; the arXiv API request was rate-limited, HTTP 429).
[86] Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards
Ming Li
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.18731; the arXiv API request was rate-limited, HTTP 429).
[87] Fairness Evaluation and Inference Level Mitigation in LLMs
Afrozah Nadeem, Mark Dras, Usman Naseem
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.18914; the arXiv API request was rate-limited, HTTP 429).
[88] RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.20505; the arXiv API request was rate-limited, HTTP 429).
[89] SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding
Yiqiao Jin, Rachneet Kaur, Zhen Zeng, Sumitra Ganesh, Srijan Kumar
Main category: cs.CL
Details
Abstract: Not available (arXiv:2510.26615; the arXiv API request was rate-limited, HTTP 429).
[90] Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production
Shiva Upadhye, Richard Futrell
Main category: cs.CL
Details
Abstract: Not available (arXiv:2511.07752; the arXiv API request was rate-limited, HTTP 429).
[91] Identifying Bias in Machine-generated Text Detection
Kevin Stowe, Svetlana Afanaseva, Rodolfo Raimundo, Yitao Sun, Kailash Patil
Main category: cs.CL
Details
Abstract: Not available (arXiv:2512.09292; the arXiv API request was rate-limited, HTTP 429).
[92] Schoenfeld’s Anatomy of Mathematical Reasoning by Language Models
Ming Li, Chenrui Fan, Yize Cheng, Soheil Feizi, Tianyi Zhou
Main category: cs.CL
Details
Abstract: Not available (arXiv:2512.19995; the arXiv API request was rate-limited, HTTP 429).
[93] STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning
Juntong Ni, Shiyu Wang, Qi He, Ming Jin, Wei Jin
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.03248; the arXiv API request was rate-limited, HTTP 429).
[94] Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models
Yueqing Hu, Xinyang Peng, Shuting Peng, Hanqi Wang, Tianhong Wang
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.05019; the arXiv API request was rate-limited, HTTP 429).
[95] Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection
Minghui Jia, Qichao Zhang, Ali Luo, Linjing Li, Shuo Ye, Hailing Lu, Wen Hou, Dongbin Zhao
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.06498; the arXiv API request was rate-limited, HTTP 429).
[96] SocraticKG: Knowledge Graph Construction via QA-Driven Fact Extraction
Sanghyeok Choi, Woosang Jeon, Kyuseok Yang, Taehyeong Kim
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.10003; the arXiv API request was rate-limited, HTTP 429).
[97] Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning
Yue Guo, Fanfu Wang, Jianwei Lv, Xincheng Shi, Yuchen Li, Youya Wang, Yunsheng Zeng, Yujing Liu, Yunhao Qiao, Gen Li, Junfeng Wang, Bo Yuan
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.13690; the arXiv API request was rate-limited, HTTP 429).
[98] GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark
Lotta Kiefer, Christoph Leiter, Sotaro Takeshita, Elena Schmidt, Steffen Eger
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.13711; the arXiv API request was rate-limited, HTTP 429).
[99] Words that make SENSE: Sensorimotor Norms in Learned Lexical Token Representations
Abhinav Gupta, Toben H. Mintz, Jesse Thomason
Main category: cs.CL
Details
Abstract: Not available (arXiv:2602.00469; the arXiv API request was rate-limited, HTTP 429).
[100] DMAP: A Distribution Map for Text
Tom Kempton, Julia Rozanova, Parameswaran Kamalaruban, Maeve Madigan, Karolina Wresilo, Yoann L. Launay, David Sutton, Stuart Burrell
Main category: cs.CL
Details
Abstract: Not available (arXiv:2602.11871; the arXiv API request was rate-limited, HTTP 429).
[101] Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning
Zhiwen You, Xi Chen, Aniket Vashishtha, Simo Du, Gabriel Erion-Barner, Hongyuan Mei, Hao Peng, Yue Guo
Main category: cs.CL
Details
Abstract: Not available (arXiv:2603.27820; the arXiv API request was rate-limited, HTTP 429).
[102] Survey on Evaluation of LLM-based Agents
Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, Michal Shmueli-Scheuer
Main category: cs.CL
Details
Abstract: Not available (arXiv:2503.16416; the arXiv API request was rate-limited, HTTP 429).
[103] Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen
Main category: cs.CL
Details
Abstract: Not available (arXiv:2603.28342; the arXiv API request was rate-limited, HTTP 429).
[104] AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning
Yuanfu Sun, Kang Li, Dongzhe Fan, Jiajin Liu, Qiaoyu Tan
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.05846; the arXiv API request was rate-limited, HTTP 429).
[105] Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
Bowen Zuo, Yinglun Zhu
Main category: cs.CL
Details
Abstract: Not available (arXiv:2506.12721; the arXiv API request was rate-limited, HTTP 429).
[106] Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty
Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.10072; the arXiv API request was rate-limited, HTTP 429).
[107] Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models
Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.10079; the arXiv API request was rate-limited, HTTP 429).
[108] Secure LLM Fine-Tuning via Safety-Aware Probing
Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Xiaokun Luan, Meng Sun
Main category: cs.CL
Details
Abstract: Not available (arXiv:2505.16737; the arXiv API request was rate-limited, HTTP 429).
[109] Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning
Xinyi Huang
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.10516; the arXiv API request was rate-limited, HTTP 429).
[110] Does Welsh media need a review? Detecting bias in Nation.Cymru’s political reporting
Cai Parry-Jones
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.17628; the arXiv API request was rate-limited, HTTP 429).
[111] Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling
Tim Z. Xiao, Johannes Zenn, Zhen Liu, Weiyang Liu, Robert Bamler, Bernhard Schölkopf
Main category: cs.CL
Details
Abstract: Not available (arXiv:2506.09998; the arXiv API request was rate-limited, HTTP 429).
[112] MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge
Sua Lee, Sanghee Park, Jinbae Im
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.18164; the arXiv API request was rate-limited, HTTP 429).
[113] ReProbe: Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models
Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, Elliott Ash, Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, Mrinmaya Sachan
Main category: cs.CL
Details
Abstract: Not available (arXiv:2511.06209; the arXiv API request was rate-limited, HTTP 429).
[114] Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models
Kihyuk Lee
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.19598; the arXiv API request was rate-limited, HTTP 429).
[115] Tracing Relational Knowledge Recall in Large Language Models
Nicholas Popovič, Michael Färber
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.19934; the arXiv API request was rate-limited, HTTP 429).
[116] Surrogate modeling for interpreting black-box LLMs in medical predictions
Changho Han, Songsoo Kim, Dong Won Kim, Leo Anthony Celi, Jaewoong Kim, SungA Bae, Dukyong Yoon
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.20331; the arXiv API request was rate-limited, HTTP 429).
[117] CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
Zhenpeng Su, Leiyu Pan, Minxuan Lv, Yuntao Li, Wenping Hu, Fuzheng Zhang, Kun Gai, Guorui Zhou
Main category: cs.CL
Details
Abstract: Not available (arXiv:2509.20712; the arXiv API request was rate-limited, HTTP 429).
[118] AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Dongrui Liu, Qihan Ren, Chen Qian, Shuai Shao, Yuejin Xie, Yu Li, Zhonghao Yang, Haoyu Luo, Peng Wang, Qingyu Liu, Binxin Hu, Ling Tang, Jilin Mei, Dadi Guo, Leitao Yuan, Junyao Yang, Guanxu Chen, Qihao Lin, Yi Yu, Bo Zhang, Jiaxuan Guo, Jie Zhang, Wenqi Shao, Huiqi Deng, Zhiheng Xi, Wenjie Wang, Wenxuan Wang, Wen Shen, Zhikai Chen, Haoyu Xie, Jialing Tao, Juntao Dai, Jiaming Ji, Zhongjie Ba, Linfeng Zhang, Yong Liu, Quanshi Zhang, Lei Zhu, Zhihua Wei, Hui Xue, Chaochao Lu, Jing Shao, Xia Hu
Main category: cs.CL
Details
Abstract: Not available (arXiv:2601.18491; the arXiv API request was rate-limited, HTTP 429).
[119] Knowledge Capsules: Structured Nonparametric Memory Units for LLMs
Bin Ju, Shenfeng Weng, Danying Zhou, Rongkai Xu, Kunkai Su
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.20487; the arXiv API request was rate-limited, HTTP 429).
[120] Intersectional Fairness in Large Language Models
Chaima Boufaied, Ronnie De Souza Santos, Ann Barcomb
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.20677; the arXiv API request was rate-limited, HTTP 429).
[121] Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization
Mohamed Hesham Elganayni, Runsheng Chen, Sebastian Nagl, Matthias Grabmair
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.20726; the arXiv API request was rate-limited, HTTP 429).
[122] Working Memory Constraints Scaffold Learning in Transformers under Data Scarcity
Pranava Madhyastha, Dagmar Adamcova
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.20789; the arXiv API request was rate-limited, HTTP 429).
[123] Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL
Zhaofeng Wu, Shiqi Wang, Boya Peng, Anuj Goyal, Melanie Kambadur, Sebastian Ruder, Yoon Kim, Chloe Bi
Main category: cs.CL
Details
Abstract: Not available (arXiv:2604.20835; the arXiv API request was rate-limited, HTTP 429).
[124] Fake or Real, Can Robots Tell? Evaluating VLM Robustness to Domain Shift in Single-View Robotic Scene Understanding
Federico Tavella, Amber Drinkwater, Angelo Cangelosi
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2506.19579 returned HTTP 429 (rate limited).
[125] BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
Liang Ye, Shengqin Chen, Jiazhu Dai
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2510.20792 returned HTTP 429 (rate limited).
[126] Unveiling Unicode’s Unseen Underpinnings in Undermining Authorship Attribution
Robert Dilworth
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2508.15840 returned HTTP 429 (rate limited).
[127] From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation
KaiWen Wei, Kejun He, Xiaomian Kang, Jie Zhang, Yuming Yang, Li Jin, Zhenyang Li, Jiang Zhong, He Bai, Junnan Zhu
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2509.23649 returned HTTP 429 (rate limited).
[128] Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Robert Dilworth
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2512.03465 returned HTTP 429 (rate limited).
[129] Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Zhenpeng Su, Leiyu Pan, Minxuan Lv, Tiehua Mei, Zijia Lin, Yuntao Li, Wenping Hu, Ruiming Tang, Kun Gai, Guorui Zhou
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2512.05591 returned HTTP 429 (rate limited).
[130] StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching
Robert Dilworth
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2601.09056 returned HTTP 429 (rate limited).
[131] Automating Computational Reproducibility in Social Science: Comparing Prompt-Based and Agent-Based Approaches
Syed Mehtab Hussain Shah, Frank Hopfgartner, Arnim Bleier
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2602.08561 returned HTTP 429 (rate limited).
[132] Learning State-Tracking from Code Using Linear RNNs
Julien Siems, Riccardo Grazzi, Kirill Kalinin, Hitesh Ballani, Babak Rahmani
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2602.14814 returned HTTP 429 (rate limited).
[133] Intent Laundering: AI Safety Datasets Are Not What They Seem
Shahriar Golchin, Marc Wetter
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2602.16729 returned HTTP 429 (rate limited).
[134] Crystal: Characterizing Relative Impact of Scholarly Publications
Hannah Collison, Benjamin Van Durme, Daniel Khashabi
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2603.26791 returned HTTP 429 (rate limited).
[135] SODA: Semi On-Policy Black-Box Distillation for Large Language Models
Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2604.03873 returned HTTP 429 (rate limited).
[136] Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution
Robert Dilworth
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2604.10271 returned HTTP 429 (rate limited).
[137] LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
Junxiao Yang, Haoran Liu, Jinzhe Tu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Jiaqi Weng, Jialing Tao, Hui Xue, Hongning Wang, Han Qiu, Minlie Huang
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2604.12710 returned HTTP 429 (rate limited).
[138] UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval
Tobias Schimanski, Stefanie Lewandowski, Christian Woerle, Nicola Reichenau, Yauheni Huryn, Markus Leippold
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2604.15827 returned HTTP 429 (rate limited).
[139] MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation
Markus Knauer, Edoardo Fiorini, Maximilian Mühlbauer, Stefan Schneyer, Promwat Angsuratanawech, Florian Samuel Lay, Timo Bachmann, Samuel Bustamante, Korbinian Nottensteiner, Freek Stulp, Alin Albu-Schäffer, João Silvério, Thomas Eiband
Main category: cs.CL
TL;DR: Summary unavailable (processing failed).
Abstract: Unavailable; the arXiv export API request for 2604.20468 returned HTTP 429 (rate limited).
cs.CV
[140] Thinking Like a Botanist: Challenging Multimodal Language Models with Intent-Driven Chain-of-Inquiry
Syed Nazmus Sakib, Nafiul Haque, Shahrear Bin Amin, Hasan Muhammad Abdullah, Md. Mehedi Hasan, Mohammad Zabed Hossain, Shifat E. Arman
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Vision evaluations are typically done through multi-step processes. In most contemporary fields, experts analyze images using structured, evidence-based adaptive questioning. In plant pathology, botanists inspect leaf images, identify visual cues, infer diagnostic intent, and probe further with targeted questions that adapt to species, symptoms, and severity. This structured probing is crucial for accurate disease diagnosis and treatment formulation. Yet current vision-language models are evaluated on single-turn question answering. To address this gap, we introduce PlantInquiryVQA, a benchmark for studying multi-step, intent-driven visual reasoning in botanical diagnosis. We formalize a Chain of Inquiry framework modeling diagnostic trajectories as ordered question-answer sequences conditioned on grounded visual cues and explicit epistemic intent. We release a dataset of 24,950 expert-curated plant images and 138,068 question-answer pairs annotated with visual grounding, severity labels, and domain-specific reasoning templates. Evaluations on top-tier Multimodal Large Language Models reveal that while they describe visual symptoms adequately, they struggle with safe clinical reasoning and accurate diagnosis. Importantly, structured question-guided inquiry significantly improves diagnostic correctness, reduces hallucination, and increases reasoning efficiency. We hope PlantInquiryVQA serves as a foundational benchmark in advancing research to train diagnostic agents to reason like expert botanists rather than static classifiers.
[141] Linear Image Generation by Synthesizing Exposure Brackets
Yuekun Dai, Zhoutong Zhang, Shangchen Zhou, Nanxuan Zhao
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: The life of a photo begins with photons striking the sensor, whose signals are passed through a sophisticated image signal processing (ISP) pipeline to produce a display-referred image. However, such images are no longer faithful to the incident light, being compressed in dynamic range and stylized by subjective preferences. In contrast, RAW images record direct sensor signals before non-linear tone mapping. After camera response curve correction and demosaicing, they can be converted into linear images, which are scene-referred representations that directly reflect true irradiance and are invariant to sensor-specific factors. Since image sensors have better dynamic range and bit depth, linear images contain richer information than display-referred ones, leaving users more room for editing during post-processing. Despite this advantage, current generative models mainly synthesize display-referred images, which inherently limits downstream editing. In this paper, we address the task of text-to-linear-image generation: synthesizing a high-quality, scene-referred linear image that preserves full dynamic range, conditioned on a text prompt, for professional post-processing. Generating linear images is challenging, as pre-trained VAEs in latent diffusion models struggle to simultaneously preserve extreme highlights and shadows due to the higher dynamic range and bit depth. To this end, we represent a linear image as a sequence of exposure brackets, each capturing a specific portion of the dynamic range, and propose a DiT-based flow-matching architecture for text-conditioned exposure bracket generation. We further demonstrate downstream applications including text-guided linear image editing and structure-conditioned generation via ControlNet.
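The paper generates brackets rather than merging them, but the relationship between exposure brackets and a linear image it relies on is classical HDR assembly. As a hedged illustration (not the paper's DiT pipeline; the hat weighting and the toy values are ours), a scene-referred linear image can be recovered from brackets by exposure-normalized weighted averaging:

```python
import numpy as np

def merge_brackets(brackets, exposure_times):
    """Merge exposure brackets (assumed already linearized, i.e. inverse
    camera response applied) into one linear radiance image."""
    acc = np.zeros_like(brackets[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(brackets, exposure_times):
        # Hat weight: trust mid-range pixels, give clipped pixels zero weight.
        w = 1.0 - np.abs(2.0 * img - 1.0)
        acc += w * (img / t)   # dividing by exposure time estimates radiance
        wsum += w
    return acc / np.maximum(wsum, 1e-8)

# Synthetic scene whose dynamic range exceeds any single 0..1 exposure.
radiance = np.array([0.05, 0.5, 4.0])
times = [1.0, 0.25, 0.0625]
brackets = [np.clip(radiance * t, 0.0, 1.0) for t in times]
recovered = merge_brackets(brackets, times)   # matches `radiance`
```

Each bracket contributes only where it is well exposed, which is exactly why a bracket sequence can represent more dynamic range than one display-referred image.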
[142] UAU-Net: Uncertainty-aware Representation Learning and Evidential Classification for Facial Action Unit Detection
Yuze Li, Zhilei Liu
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Facial action unit (AU) detection remains challenging because it involves heterogeneous, AU-specific uncertainties arising at both the representation and decision stages. Recent methods have improved discriminative feature learning, but they often treat the AU representations as deterministic, overlooking uncertainty caused by visual noise, subject-dependent appearance variations, and ambiguous inter-AU relationships, all of which can substantially degrade robustness. Meanwhile, conventional point-estimation classifiers often provide poorly calibrated confidence, producing overconfident predictions, especially under the severe label imbalance typical of AU datasets. We propose UAU-Net, an Uncertainty-aware AU detection framework that explicitly models uncertainty at both stages. At the representation stage, we introduce CV-AFE, a conditional VAE (CVAE)-based AU feature extraction module that learns probabilistic AU representations by jointly estimating feature means and variances across multiple spatio-temporal scales; conditioning on AU labels further enables CV-AFE to capture uncertainty associated with inter-AU dependencies. At the decision stage, we design AB-ENN, an Asymmetric Beta Evidential Neural Network for multi-label AU detection, which parameterizes predictive uncertainty with Beta distributions and mitigates overconfidence via an asymmetric loss tailored to highly imbalanced binary labels. Extensive experiments on BP4D and DISFA show that UAU-Net achieves strong AU detection performance, and further analyses indicate that modeling uncertainty in both representation learning and evidential prediction improves robustness and reliability.
[143] Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition
Naga VS Raviteja Chappa, Evangelos Sariyanidi, Lisa Yankowitz, Gokul Nair, Casey J. Zampella, Robert T. Schultz, Birkan Tunç
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Micro-actions are subtle, localized movements lasting 1-3 seconds such as scratching one’s head or tapping fingers. Such subtle actions are essential for social communication, ubiquitously used in natural interactions, and thus critical for fine-grained video understanding, yet remain poorly understood by current computer vision systems. We identify a fundamental challenge: micro-actions exhibit diverse spatio-temporal characteristics where some are defined by spatial configurations while others manifest through temporal dynamics. Existing methods that commit to a single spatio-temporal decomposition cannot accommodate this diversity. We propose a dual-path network that processes anatomically-grounded spatial entities through parallel Spatial-Temporal (ST) and Temporal-Spatial (TS) pathways. The ST path captures spatial configurations before modeling temporal dynamics, while the TS path inverts this order to prioritize temporal dynamics. Rather than fixed fusion, we introduce entity-level adaptive routing where each body part learns its optimal processing preference, complemented by Mutual Action Consistency (MAC) loss that enforces cross-path coherence. Extensive experiments demonstrate competitive performance on MA-52 dataset and state-of-the-art results on iMiGUE dataset. Our work reveals that architectural adaptation to the inherent complexity of micro-actions is essential for advancing fine-grained video understanding.
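The entity-level adaptive routing described above can be pictured as a learned per-body-part gate blending the two pathway outputs. A minimal sketch (function name, shapes, and gate values are hypothetical, not taken from the paper):

```python
import numpy as np

def route_entities(st_out, ts_out, gate_logits):
    """Blend Spatial-Temporal and Temporal-Spatial pathway outputs with a
    per-entity sigmoid gate, so each body part picks its own mixture."""
    g = 1.0 / (1.0 + np.exp(-gate_logits))          # per-entity gate in (0, 1)
    return g[:, None] * st_out + (1.0 - g[:, None]) * ts_out

st = np.ones((4, 8))     # 4 entities (e.g. head, hands, torso), 8-dim features
ts = np.zeros((4, 8))
logits = np.array([10.0, -10.0, 0.0, 0.0])  # strongly ST, strongly TS, neutral
fused = route_entities(st, ts, logits)
```

In training, `gate_logits` would be learnable parameters, letting spatially defined micro-actions favor the ST path and temporally defined ones the TS path.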
[144] Unlocking Multi-Spectral Data for Multi-Modal Models with Guided Inputs and Chain-of-Thought Reasoning
Dahun Kim, Ganesh Satish Mallya, Anelia Angelova
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Multi-spectral imagery is a valuable input signal for Remote Sensing applications, such as land-use and land-cover classification and environmental monitoring. However, generalist Large Multi-modal Models (LMMs) are typically trained on RGB images, limiting their applicability to the RGB domain. At the same time, training multi-spectral multi-modal models is expensive and produces uniquely specialized models. To address this, we propose a novel training-free approach that introduces multi-spectral data within the inference pipeline of standard RGB-only LMMs, allowing large gains in performance. Our approach leverages the LMMs’ understanding of the visual space by adapting non-RGB inputs to that space and injecting domain-specific information and Chain-of-Thought reasoning as instructions. We demonstrate this with the Gemini 2.5 model and observe strong Zero-Shot performance gains on popular Remote Sensing benchmarks. These results highlight the potential for geospatial professionals to leverage powerful generalist models for specialized sensor inputs, benefiting from rich reasoning capabilities grounded in specialized data.
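One standard way to "adapt non-RGB inputs to the visual space", as the abstract puts it, is a false-color composite with a percentile stretch, a common remote-sensing convention. A hedged sketch (the paper does not specify its mapping; band names and stretch limits here are illustrative):

```python
import numpy as np

def false_color_composite(bands, order=("nir", "red", "green")):
    """Map three multi-spectral bands into the 0..1 RGB range an RGB-only
    LMM expects, using a per-band 2-98 percentile contrast stretch."""
    channels = []
    for name in order:
        b = bands[name].astype(np.float64)
        lo, hi = np.percentile(b, [2, 98])
        channels.append(np.clip((b - lo) / max(hi - lo, 1e-8), 0.0, 1.0))
    return np.stack(channels, axis=-1)   # H x W x 3, displayable as an image

rng = np.random.default_rng(0)
bands = {k: rng.uniform(0, 10_000, (8, 8)) for k in ("nir", "red", "green")}
rgb = false_color_composite(bands)
```

The composite could then be passed to the model alongside text instructions describing which spectral bands occupy each display channel.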
[145] Discriminative-Generative Synergy for Occlusion Robust 3D Human Mesh Recovery
Yang Liu, Zhiyong Zhang
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: 3D human mesh recovery from monocular RGB images aims to estimate anatomically plausible 3D human models for downstream applications, but remains challenging under partial or severe occlusions. Regression-based methods are efficient yet often produce implausible or inaccurate results in unconstrained scenarios, while diffusion-based methods provide strong generative priors for occluded regions but may weaken fidelity to rare poses due to over-reliance on generation. To address these limitations, we propose a brain-inspired synergistic framework that integrates the discriminative power of vision transformers with the generative capability of conditional diffusion models. Specifically, the ViT-based pathway extracts deterministic visual cues from visible regions, while the diffusion-based pathway synthesizes structurally coherent human body representations. To effectively bridge the two pathways, we design a diverse-consistent feature learning module to align discriminative features with generative priors, and a cross-attention multi-level fusion mechanism to enable bidirectional interaction across semantic levels. Experiments on standard benchmarks demonstrate that our method achieves superior performance on key metrics and shows strong robustness in complex real-world scenarios.
[146] Materialistic RIR: Material Conditioned Realistic RIR Generation
Mahnoor Fatima Saad, Sagnik Majumder, Kristen Grauman, Ziad Al-Halah
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Rings like gold, thuds like wood! The sound we hear in a scene is shaped not only by the spatial layout of the environment but also by the materials of the objects and surfaces within it. For instance, a room with wooden walls will produce a different acoustic experience from a room with the same spatial layout but concrete walls. Accurately modeling these effects is essential for applications such as virtual reality, robotics, architectural design, and audio engineering. Yet, existing methods for acoustic modeling often entangle spatial and material influences in correlated representations, which limits user control and reduces the realism of the generated acoustics. In this work, we present a novel approach for material-controlled Room Impulse Response (RIR) generation that explicitly disentangles the effects of spatial and material cues in a scene. Our approach models the RIR using two modules: a spatial module that captures the influence of the spatial layout of the scene, and a material module that modulates this spatial RIR according to a user-specified material configuration. This explicitly disentangled design allows users to easily modify the material configuration of a scene and observe its impact on acoustics without altering the spatial structure or scene content. Our model provides significant improvements over prior approaches on both acoustic-based metrics (up to +16% on RTE) and material-based metrics (up to +70%). Furthermore, through a human perceptual study, we demonstrate the improved realism and material sensitivity of our model compared to the strongest baselines.
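The idea of a material module modulating a spatial RIR can be illustrated with classical frequency-dependent absorption filtering. A minimal sketch (not the paper's learned module; the absorption curve below uses toy numbers):

```python
import numpy as np

def apply_material(rir, absorption, sr=16_000):
    """Modulate a spatial RIR with a material's frequency-dependent
    absorption curve: each FFT bin is scaled by sqrt(1 - alpha(f)),
    converting an energy absorption coefficient to an amplitude gain."""
    spectrum = np.fft.rfft(rir)
    freqs = np.fft.rfftfreq(len(rir), d=1.0 / sr)
    alpha = np.interp(freqs, absorption[:, 0], absorption[:, 1])
    gain = np.sqrt(np.clip(1.0 - alpha, 0.0, 1.0))
    return np.fft.irfft(spectrum * gain, n=len(rir))

rir = np.zeros(1024)
rir[0], rir[400] = 1.0, 0.5          # toy spatial RIR: direct path + echo
# Wood absorbs more high-frequency energy than concrete (toy curve: rows
# are [frequency_hz, absorption_coefficient]).
wood = np.array([[0.0, 0.1], [8000.0, 0.8]])
wet = apply_material(rir, wood)      # darker, shorter-sounding response
```

The disentangled design in the paper follows the same intuition: the spatial RIR is computed once, and swapping the material curve changes the acoustics without touching the layout.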
[147] Projected Gradient Unlearning for Text-to-Image Diffusion Models: Defending Against Concept Revival Attacks
Aljalila Aladawi, Mohammed Talha Alam, Fakhri Karray
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Machine unlearning for text-to-image diffusion models aims to selectively remove undesirable concepts from pre-trained models without costly retraining. Current unlearning methods share a common weakness: erased concepts return when the model is fine-tuned on downstream data, even when that data is entirely unrelated. We adapt Projected Gradient Unlearning (PGU) from classification to the diffusion domain as a post-hoc hardening step. By constructing a Core Gradient Space (CGS) from the retain concept activations and projecting gradient updates into its orthogonal complement, PGU ensures that subsequent fine-tuning cannot undo the achieved erasure. Applied on top of existing methods (ESD, UCE, Receler), the approach eliminates revival for style concepts and substantially delays it for object concepts, running in roughly 6 minutes versus the ~2 hours required by Meta-Unlearning. PGU and Meta-Unlearning turn out to be complementary: which performs better depends on how the concept is encoded, and retain concept selection should follow visual feature similarity rather than semantic grouping.
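The core PGU step, projecting each gradient update into the orthogonal complement of the Core Gradient Space, is standard linear algebra. A minimal sketch (variable names and the toy basis are illustrative):

```python
import numpy as np

def project_out(grad, core_basis):
    """Remove from `grad` its component inside the Core Gradient Space
    spanned by the columns of `core_basis`, so a fine-tuning update
    cannot move the model along retained-concept directions."""
    Q, _ = np.linalg.qr(core_basis)     # orthonormalize the CGS basis
    return grad - Q @ (Q.T @ grad)      # g minus its projection onto CGS

core = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])           # toy CGS: the xy-plane in R^3
g = np.array([3.0, -2.0, 5.0])
g_perp = project_out(g, core)           # only the z-component survives
```

In the actual method this projection would be applied per parameter tensor during downstream fine-tuning, with the basis built from retain-concept activations.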
[148] Building a Precise Video Language with Human-AI Oversight
Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Video-language models (VLMs) learn to reason about the dynamic visual world through natural language. We introduce a suite of open datasets, benchmarks, and recipes for scalable oversight that enable precise video captioning. First, we define a structured specification for describing subjects, scenes, motion, spatial, and camera dynamics, grounded by hundreds of carefully defined visual primitives developed with professional video creators such as filmmakers. Next, to curate high-quality captions, we introduce CHAI (Critique-based Human-AI Oversight), a framework where trained experts critique and revise model-generated pre-captions into improved post-captions. This division of labor improves annotation accuracy and efficiency by offloading text generation to models, allowing humans to better focus on verification. Additionally, these critiques and preferences between pre- and post-captions provide rich supervision for improving open-source models (Qwen3-VL) on caption generation, reward modeling, and critique generation through SFT, DPO, and inference-time scaling. Our ablations show that critique quality in precision, recall, and constructiveness, ensured by our oversight framework, directly governs downstream performance. With modest expert supervision, the resulting model outperforms closed-source models such as Gemini-3.1-Pro. Finally, we apply our approach to re-caption large-scale professional videos (e.g., films, commercials, games) and fine-tune video generation models such as Wan to better follow detailed prompts of up to 400 words, achieving finer control over cinematography including camera motion, angle, lens, focus, point of view, and framing. Our results show that precise specification and human-AI oversight are key to professional-level video understanding and generation. Data and code are available on our project page: https://linzhiqiu.github.io/papers/chai/
[149] StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling
Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: We build on the Visual Autoregressive Modeling (VAR) framework and formulate style transfer as conditional discrete sequence modeling in a learned latent space. Images are decomposed into multi-scale representations and tokenized into discrete codes by a VQ-VAE; a transformer then autoregressively models the distribution of target tokens conditioned on style and content tokens. To inject style and content information, we introduce a blended cross-attention mechanism in which the evolving target representation attends to its own history, while style and content features act as queries that decide which aspects of this history to emphasize. A scale-dependent blending coefficient controls the relative influence of style and content at each stage, encouraging the synthesized representation to align with both the content structure and the style texture without breaking the autoregressive continuity of VAR. We train StyleVAR in two stages from a pretrained VAR checkpoint: supervised fine-tuning on a large triplet dataset of content–style–target images, followed by reinforcement fine-tuning with Group Relative Policy Optimization (GRPO) against a DreamSim-based perceptual reward, with per-action normalization weighting to rebalance credit across VAR’s multi-scale hierarchy. Across three benchmarks spanning in-, near-, and out-of-distribution regimes, StyleVAR consistently outperforms an AdaIN baseline on Style Loss, Content Loss, LPIPS, SSIM, DreamSim, and CLIP similarity, and the GRPO stage yields further gains over the SFT checkpoint, most notably on the reward-aligned perceptual metrics. Qualitatively, the method transfers texture while maintaining semantic structure, especially for landscapes and architectural scenes, while a generalization gap on internet images and difficulty with human faces highlight the need for better content diversity and stronger structural priors.
[150] Clinically-Informed Modeling for Pediatric Brain Tumor Classification from Whole-Slide Histopathology Images
Joakim Nguyen, Jian Yu, Jinrui Fang, Nicholas Konz, Tianlong Chen, Sanjay Krishnan, Chandra Krishnan, Ying Ding, Hairong Wang, Ankita Shukla
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: Accurate diagnosis of pediatric brain tumors, starting with histopathology, presents unique challenges for deep learning, including severe data scarcity, class imbalance, and fine-grained morphologic overlap across diagnostically distinct subtypes. While pathology foundation models have advanced patch-level representation learning, their effective adaptation to weakly supervised pediatric brain tumor classification under limited data remains underexplored. In this work, we introduce an expert-guided contrastive fine-tuning framework for pediatric brain tumor diagnosis from whole-slide images (WSI). Our approach integrates contrastive learning into slide-level multiple instance learning (MIL) to explicitly regularize the geometry of slide-level representations during downstream fine-tuning. We propose both a general supervised contrastive setting and an expert-guided variant that incorporates clinically informed hard negatives targeting diagnostically confusable subtypes. Through comprehensive experiments on pediatric brain tumor WSI classification under realistic low-sample and class-imbalanced conditions, we demonstrate that contrastive fine-tuning yields measurable improvements in fine-grained diagnostic distinctions. Our experimental analyses reveal complementary strengths across different contrastive strategies, with expert-guided hard negatives promoting more compact intra-class representations and improved inter-class separation. This work highlights the importance of explicitly shaping slide-level representations for robust fine-grained classification in data-scarce pediatric pathology settings.
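The supervised contrastive objective underlying the framework can be sketched with a minimal SupCon-style loss over slide embeddings; the paper's contribution of clinically informed hard negatives is not reproduced here, and all values below are illustrative:

```python
import numpy as np

def supcon_loss(feats, labels, temp=0.1):
    """Minimal supervised contrastive loss over L2-normalized embeddings:
    slides sharing a label are pulled together, all others pushed apart."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / temp
    np.fill_diagonal(sim, -np.inf)               # exclude self-similarity
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))  # log-softmax
    losses = []
    for i, y in enumerate(labels):
        pos = [j for j, yj in enumerate(labels) if yj == y and j != i]
        if pos:
            losses.append(-logp[i, pos].mean())
    return float(np.mean(losses))

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = [0, 0, 1, 1]          # two well-separated tumor subtypes (toy data)
loss = supcon_loss(feats, labels)
```

The expert-guided variant would additionally upweight negatives drawn from diagnostically confusable subtypes rather than treating all negatives equally.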
[151] Optimizing Diffusion Priors with a Single Observation
Frederic Wang, Katherine L. Bouman
Main category: cs.CV
TL;DR: Summary unavailable (processing failed).
Details
Abstract: While diffusion priors generate high-quality posterior samples across many inverse problems, they are often trained on limited training sets or purely simulated data, thus inheriting the errors and biases of these underlying sources. Current approaches to finetuning diffusion models rely on a large number of observations with varying forward operators, which can be difficult to collect for many applications, and thus lead to overfitting when the measurement set is small. We propose a method for tuning a prior from only a single observation by combining existing diffusion priors into a single product-of-experts prior and identifying the exponents that maximize the Bayesian evidence. We validate our method on real-world inverse problems, including black hole imaging, where the true prior is unknown a priori, and image deblurring with text-conditioned priors. We find that the evidence is often maximized by priors that extend beyond those trained on a single dataset. By generalizing the prior through exponent weighting, our approach enables posterior sampling from both tempered and combined diffusion models, yielding more flexible priors that improve the trustworthiness of the resulting posterior image distribution.
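A product-of-experts prior with exponent weights has a simple score-function form: the score of p(x) proportional to the product of p_i(x)^{w_i} is the weighted sum of the experts' scores. A minimal sketch with hypothetical one-dimensional Gaussian experts (the paper tunes the weights via Bayesian evidence, which is not shown here):

```python
import numpy as np

def poe_score(x, experts, weights):
    """Score of a product-of-experts prior: sum_i w_i * grad log p_i(x)."""
    return sum(w * s(x) for s, w in zip(experts, weights))

# Two toy Gaussian experts N(mu, var); a Gaussian's score is (mu - x) / var.
experts = [lambda x: (0.0 - x) / 1.0,    # N(0, 1)
           lambda x: (4.0 - x) / 2.0]    # N(4, 2)
weights = [1.0, 0.5]
# The weighted product is again Gaussian, with precision 1*1 + 0.5*0.5 = 1.25
# and mean (1*0 + 0.5*4/2) / 1.25 = 0.8, so the score vanishes at x = 0.8.
score_at_mode = poe_score(0.8, experts, weights)
```

For diffusion models the same combination is applied to the learned noise-conditional scores at each sampling step, so tempering (a single expert with w != 1) and mixing multiple priors use one mechanism.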
[152] BiTDiff: Fine-Grained 3D Conducting Motion Generation via BiMamba-Transformer Diffusion
Tianzhi Jia, Kaixing Yang, Xiaole Yang, Xulong Tang, Ke Qiu, Shikui Wei, Yao Zhao
Main category: cs.CV
Abstract: 3D conducting motion generation aims to synthesize fine-grained conductor motions from music, with broad potential in music education, virtual performance, digital human animation, and human-AI co-creation. However, this task remains underexplored due to two major challenges: (1) the lack of large-scale fine-grained 3D conducting datasets and (2) the absence of effective methods that can jointly support long-sequence generation with high quality and efficiency. To address the data limitation, we develop a quality-oriented 3D conducting motion collection pipeline and construct CM-Data, a fine-grained SMPL-X dataset with about 10 hours of conducting motion data. To the best of our knowledge, CM-Data is the first and largest public dataset for 3D conducting motion generation. To address the methodological limitation, we propose BiTDiff, a novel framework for 3D conducting motion generation, built upon a BiMamba-Transformer hybrid model architecture for efficient long-sequence modeling and a Diffusion-based generative strategy with human-kinematic decomposition for high-quality motion synthesis. Specifically, BiTDiff introduces auxiliary physical-consistency losses and a hand-/body-specific forward-kinematics design for better fine-grained motion modeling, while leveraging BiMamba for memory-efficient long-sequence temporal modeling and Transformer for cross-modal semantic alignment. In addition, BiTDiff supports training-free joint-level motion editing, enabling downstream human-AI interaction design. Extensive quantitative and qualitative experiments demonstrate that BiTDiff achieves state-of-the-art (SOTA) performance for 3D conducting motion generation on the CM-Data dataset. Code will be available upon acceptance.
[153] Foveated Reasoning: Stateful, Action-based Visual Focusing for Vision-Language Models
Juhong Min, Lazar Valkov, Vitali Petsiuk, Hossein Souri, Deen Dayal Mohan
Main category: cs.CV
Abstract: Vision-language models benefit from high-resolution images, but the increase in visual-token count incurs high compute overhead. Humans resolve this tension via foveation: a coarse view guides “where to look”, while selectively acquired high-acuity evidence refines “what to think”. We introduce Foveated Reasoner, an autoregressive vision-language framework that unifies foveation and reasoning within a single decoding trajectory. Starting from a low-resolution view, the model triggers foveation only when needed, retrieves high-resolution evidence from selected regions, and injects it back into the same decoding trajectory. We train the method with a two-stage pipeline: coldstart supervision to bootstrap foveation behavior, followed by reinforcement learning to jointly improve evidence acquisition and task accuracy while discouraging trivial “see-everything” solutions. Experiments show that the method learns effective foveation policies and achieves stronger accuracy under tight visual-token budgets across multiple vision-language benchmarks.
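The "where to look" step can be sketched as selecting the most salient cell of a coarse view and retrieving the matching high-resolution crop. The function below is a hypothetical minimal illustration of that retrieval (the names and the saliency-argmax policy are our own; the model learns its foveation policy rather than hard-coding argmax).

```python
import numpy as np

def foveate(saliency, hires, patch=2):
    """Return the hi-res crop under the most salient coarse cell.

    saliency: (H, W) coarse map; hires: (H*patch, W*patch) image.
    """
    i, j = np.unravel_index(np.argmax(saliency), saliency.shape)
    r, c = i * patch, j * patch
    return hires[r:r + patch, c:c + patch]

saliency = np.array([[0.1, 0.9],
                     [0.2, 0.3]])          # coarse "where to look"
hires = np.arange(16).reshape(4, 4)        # 2x the coarse resolution
crop = foveate(saliency, hires)
assert crop.shape == (2, 2)
assert crop[0, 0] == 2   # top-right coarse cell maps to rows 0..1, cols 2..3
```

The retrieved crop would then be tokenized and appended to the same decoding trajectory as extra evidence.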
[154] Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery
Siyuan Yao, Siavash Ghorbany, Kuangshi Ai, Arnav Cherukuthota, Meghan Forstchen, Alexis Korotasz, Matthew Sisk, Ming Hu, Chaoli Wang
Main category: cs.CV
Abstract: We present a novel framework for automatically evaluating building conditions nationwide in the United States by leveraging large language models (LLMs) and Google Street View (GSV) imagery. By fine-tuning Gemma 3 27B on a modest human-labeled dataset, our approach achieves strong alignment with human mean opinion scores (MOS), outperforming even individual raters on SRCC and PLCC relative to the MOS benchmark. To enhance efficiency, we apply knowledge distillation, transferring the capabilities of Gemma 3 27B to a smaller Gemma 3 4B model that achieves comparable performance with a 3x speedup. Further, we distill the knowledge into a CNN-based model (EfficientNetV2-M) and a transformer (SwinV2-B), delivering close performance while achieving a 30x speed gain. Furthermore, we investigate LLMs’ capabilities for assessing an extensive list of built environment and housing attributes through a human-AI alignment study and develop a visualization dashboard that integrates LLM assessment outcomes for downstream analysis by homeowners. Our framework offers a flexible and efficient solution for large-scale building condition assessment, enabling high accuracy with minimal human labeling effort.
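The teacher-to-student transfer described above can be sketched with a standard Hinton-style distillation loss, i.e. KL divergence between temperature-softened teacher and student distributions (a generic sketch; the exact recipe used to distill Gemma 3 27B is not given in the summary).

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize before exponentiation
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient scale comparable across temperatures.
    """
    p = softmax(teacher_logits / T)        # soft teacher targets
    q = softmax(student_logits / T)        # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

t = np.array([3.0, 1.0, 0.2])
assert kd_loss(t, t) < 1e-12                        # identical logits: zero loss
assert kd_loss(np.array([0.2, 1.0, 3.0]), t) > 0.0  # mismatch: positive loss
```

The same soft targets can supervise CNN or Swin students just as well as a smaller LLM, which is what makes the 30x-faster distilled models possible.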
[155] Pretrain Where? Investigating How Pretraining Data Diversity Impacts Geospatial Foundation Model Performance
Amandeep Kaur, Mirali Purohit, Gedeon Muhawenayo, Esther Rolf, Hannah Kerner
Main category: cs.CV
Abstract: New geospatial foundation models introduce a new model architecture and pretraining dataset, often sampled using different notions of data diversity. Performance differences are largely attributed to the model architecture or input modalities, while the role of the pretraining dataset is rarely studied. To address this research gap, we conducted a systematic study on how the geographic composition of pretraining data affects a model’s downstream performance. We created global and per-continent pretraining datasets and evaluated them on global and per-continent downstream datasets. We found that the pretraining dataset from Europe outperformed global and continent-specific pretraining datasets on both global and local downstream evaluations. To investigate the factors influencing a pretraining dataset’s downstream performance, we analysed 10 pretraining datasets using diversity across continents, biomes, landcover and spectral values. We found that only spectral diversity was strongly correlated with performance, while others were weakly correlated. This finding establishes a new dimension of diversity to be accounted for when creating a high-performing pretraining dataset. We open-sourced 7 new pretraining datasets, pretrained models, and our experimental framework at https://github.com/kerner-lab/pretrain-where.
[156] HyperFM: An Efficient Hyperspectral Foundation Model with Spectral Grouping
Zahid Hassan Tushar, Sanjay Purushotham
Main category: cs.CV
Abstract: The NASA PACE mission provides unprecedented hyperspectral observations of ocean color, aerosols, and clouds, offering new insights into how these components interact and influence Earth’s climate and air quality. Its Ocean Color Instrument measures light across hundreds of finely spaced wavelength bands, enabling detailed characterization of features such as phytoplankton composition, aerosol properties, and cloud microphysics. However, hyperspectral data of this scale is large, complex, and difficult to label, requiring specialized processing and analysis techniques. Existing foundation models, which have transformed computer vision and natural language processing, are generally trained on standard RGB imagery and therefore struggle to interpret the continuous spectral signatures captured by PACE. While recent advances have introduced hyperspectral foundation models, they are typically trained on cloud-free observations and often remain limited to single-sensor datasets due to spectral inconsistencies across instruments. Moreover, existing models tend to be parameter-heavy and computationally expensive, limiting scalability and adoption in operational settings. To address these challenges, we introduce HyperFM, a parameter-efficient hyperspectral foundation model that leverages intra-group and inter-group spectral attention along with hybrid parameter decomposition to better capture spectral spatial relationships while reducing computational cost. HyperFM demonstrates consistent performance improvements over existing hyperspectral foundation models and task-specific state-of-the-art methods across four benchmark downstream atmospheric cloud property retrieval tasks. To support further research, we additionally release HyperFM250K, a large-scale hyperspectral dataset from the PACE mission that includes both clear and cloudy scenes.
[157] WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis
Yalcin Tur, Mihajlo Stojkovic, Ulas Bagci
Main category: cs.CV
Abstract: Diffusion models have achieved remarkable quality in multi-modal MRI synthesis, but their computational cost (hundreds of sampling steps and separate models per modality) limits clinical deployment. We observe that this inefficiency stems from an unnecessary starting point: diffusion begins from pure noise, discarding the structural information already present in available MRI sequences. We propose WFM (Wavelet Flow Matching), which instead learns a direct flow from an informed prior, the mean of conditioning modalities in wavelet space, to the target distribution. Because the source and target share underlying anatomy and differ primarily in contrast, this formulation enables accurate synthesis in just 1-2 integration steps. A single 82M-parameter model with class conditioning synthesizes all four BraTS modalities (T1, T1c, T2, FLAIR), replacing four separate diffusion models totaling 326M parameters. On BraTS 2024, WFM achieves 26.8 dB PSNR and 0.94 SSIM, within 1-2 dB of diffusion baselines, while running 250-1000x faster (0.16-0.64s vs. 160s per volume). This speed-quality trade-off makes real-time MRI synthesis practical for clinical workflows. Code is available at https://github.com/yalcintur/WFM.
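The "informed prior" idea can be sketched in a few lines: rather than integrating a flow from pure noise, start from the mean of the available modalities and take one or two Euler steps toward the target distribution. This is an illustrative toy (vectors instead of wavelet-space volumes, and an oracle velocity field standing in for the learned one).

```python
import numpy as np

def informed_prior(cond_modalities):
    """Mean of the conditioning volumes stands in for pure noise."""
    return np.mean(cond_modalities, axis=0)

def euler_sample(x0, velocity, steps=2):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with `steps` Euler steps."""
    x, dt = x0.copy(), 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)
    return x

target = np.array([1.0, 2.0, 3.0])
cond = [np.array([0.0, 1.0, 2.0]), np.array([2.0, 1.0, 0.0])]
x0 = informed_prior(cond)

# With the ideal constant velocity v = x1 - x0, even one step lands exactly
# on the target; a learned field only approximates this, which is why the
# paper reports 1-2 steps rather than exactly one.
v_star = lambda x, t: target - x0
assert np.allclose(euler_sample(x0, v_star, steps=1), target)
assert np.allclose(euler_sample(x0, v_star, steps=2), target)
```

Because source and target share anatomy and differ mostly in contrast, the flow the model must learn is short, which is what makes so few steps viable.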
[158] Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment
Jingkun Chen, Ruoshi Xu, Mingqi Gao, Shengda Luo, Jungong Han
Main category: cs.CV
Abstract: Point-Vision-Language Models promise to empower embodied agents with executable spatial reasoning, yet they frequently succumb to geometric hallucination where predicted 3D structures contradict the observed 2D reality. We identify a key cause of this failure not as a representation bottleneck but as a structural misalignment in reinforcement learning, where sparse geometric tokens are drowned out by noisy and broadcasted sequence-level rewards. To resolve this causal dilution, we propose Geometric Reward Credit Assignment, a framework that disentangles holistic supervision into field-specific signals and routes them exclusively to their responsible token spans. This mechanism transforms vague feedback into precise gradient updates and effectively turns generic policy optimization into targeted structural alignment. Furthermore, we internalize physical constraints via a Reprojection-Consistency term which serves as a cross-modal verifier to penalize physically impossible geometries. Validated on a calibrated benchmark derived from ShapeNetCore, our approach bridges the reliability gap by boosting 3D KPA from 0.64 to 0.93, increasing 3D bounding box intersection over union to 0.686, and raising reprojection consistency scores to 0.852. Crucially, these gains are achieved while maintaining robust 2D localization performance, marking a meaningful step from plausible textual outputs toward physically verifiable spatial predictions.
[159] WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images
Yuki Fujimura, Takahiro Kushida, Kazuya Kitano, Takuya Funatomi, Yasuhiro Mukaigawa
Main category: cs.CV
Abstract: We propose WildSplatter, a feed-forward 3D Gaussian Splatting (3DGS) model for unconstrained images with unknown camera parameters and varying lighting conditions. 3DGS is an effective scene representation that enables high-quality, real-time rendering; however, it typically requires iterative optimization and multi-view images captured under consistent lighting with known camera parameters. WildSplatter is trained on unconstrained photo collections and jointly learns 3D Gaussians and appearance embeddings conditioned on input images. This design enables flexible modulation of Gaussian colors to represent significant variations in lighting and appearance. Our method reconstructs 3D Gaussians from sparse input views in under one second, while also enabling appearance control under diverse lighting conditions. Experimental results demonstrate that our approach outperforms existing pose-free 3DGS methods on challenging real-world datasets with varying illumination.
[160] SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning
Chan Yeong Hwang, Miso Choi, Sunghyun On, Jinkyu Kim, Jungbeom Lee
Main category: cs.CV
Abstract: Understanding visual scenes requires not only recognizing objects but also reasoning about their spatial relationships. Unlike general vision-language tasks, spatial reasoning requires integrating multiple inductive biases, such as 2D appearance cues, depth signals, and geometric constraints, whose reliability varies across contexts. This suggests that effective spatial reasoning requires spatial adaptability: the ability to flexibly coordinate different reasoning strategies depending on the input. However, most existing approaches rely on a single reasoning pipeline that implicitly learns a fixed spatial prior, limiting their ability to adapt under distribution changes. Multi-agent systems offer a promising alternative by aggregating diverse reasoning trajectories, but prior attempts in spatial reasoning primarily employ homogeneous agents, restricting the diversity of inductive biases they can leverage. In this work, we introduce SpatiO, a heterogeneous multi-agent framework for spatial reasoning that coordinates multiple vision-language specialists with complementary inductive biases. To enable effective collaboration, we propose Test-Time Orchestration (TTO), an optimization mechanism that dynamically evaluates and reweights agents based on their observed reliability during inference, without modifying model parameters. Extensive experiments on diverse spatial reasoning benchmarks, including 3DSRBench, STVQA-7k, CV-Bench, and Omni3D-Bench, demonstrate that SpatiO consistently improves spatial reasoning performance over both closed-source and open-source baselines.
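Reliability-based reweighting of agents can be illustrated with a minimal scheme of our own devising (not the paper's TTO algorithm): each agent's vote is weighted by a running reliability estimate, and agents that keep disagreeing with the verified outcome are multiplicatively down-weighted.

```python
import numpy as np

def weighted_vote(answers, weights):
    """Pick the answer with the largest total agent weight."""
    scores = {}
    for a, w in zip(answers, weights):
        scores[a] = scores.get(a, 0.0) + w
    return max(scores, key=scores.get)

def update_reliability(weights, correct, lr=0.5):
    """Multiplicative update: shrink the weight of agents that were wrong."""
    w = np.array([wi if ok else wi * (1 - lr) for wi, ok in zip(weights, correct)])
    return w / w.sum()

weights = np.array([1 / 3] * 3)
# Agent 2 keeps disagreeing with the verified answer; its weight decays.
for _ in range(3):
    weights = update_reliability(weights, correct=[True, True, False])

assert weighted_vote(["left", "left", "right"], weights) == "left"
assert weights[2] < weights[0]
```

No model parameters change here; only the per-agent weights adapt at inference time, which is the spirit of test-time orchestration.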
[161] A Probabilistic Framework for Improving Dense Object Detection in Underwater Image Data via Annealing-Based Data Augmentation
Eleanor Wiesler, Trace Baxley
Main category: cs.CV
Abstract: Object detection models typically perform well on images captured in controlled environments with stable lighting, water clarity, and viewpoint, but their performance degrades substantially in real-world underwater settings characterized by high variability and frequent occlusions. In this work, we address these challenges by introducing a novel data augmentation framework designed to improve robustness in dense and unconstrained underwater scenes. Using the DeepFish dataset, which contains images of fish in natural environments, we first generate bounding box annotations from provided segmentation masks to construct a custom detection dataset. We then propose a pseudo-simulated annealing-based augmentation algorithm, inspired by the copy-paste strategy of Deng et al. [1], to synthesize realistic crowded fish scenarios. Our approach improves spatial diversity and object density during training, enabling better generalization to complex scenes. Experimental results show that our method significantly outperforms a baseline YOLOv10 model, particularly on a challenging test set of manually annotated images collected from live-stream footage in the Florida Keys. These results demonstrate the effectiveness of our augmentation strategy for improving detection performance in dense, real-world underwater environments.
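The annealing-style accept/reject rule behind such an augmentation can be sketched as a Metropolis criterion: improvements to a placement cost are always accepted, worse placements are accepted with probability exp(-Δcost / T), and T decays so late placements become greedy. This is an illustrative sketch; the paper's actual placement cost and schedule are not specified in the summary.

```python
import math
import random

def accept(delta_cost, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)

def anneal_placements(costs, t0=1.0, decay=0.5, seed=0):
    """Walk a list of candidate paste-placement costs with annealing."""
    rng, t, current = random.Random(seed), t0, costs[0]
    accepted = [current]
    for c in costs[1:]:
        if accept(c - current, t, rng):
            current = c
            accepted.append(c)
        t *= decay            # cool down: later moves get greedier
    return accepted

rng = random.Random(0)
assert accept(-1.0, 0.01, rng)       # improvements are always accepted
assert not accept(100.0, 0.01, rng)  # exp(-10000) underflows to 0: rejected
```

Early high-temperature moves keep paste positions diverse, while the cooling schedule pushes later pastes toward plausible, low-cost placements.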
[162] Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation
Boxun Xu, Yuming Du, Zichang Liu, Siyu Yang, Ziyang Jiang, Siqi Yan, Rajasi Saha, Albert Pumarola, Wenchen Wang, Peng Li
Main category: cs.CV
Abstract: We introduce Sparse Forcing, a training-and-inference paradigm for autoregressive video diffusion models that improves long-horizon generation quality while reducing decoding latency. Sparse Forcing is motivated by an empirical observation in autoregressive diffusion rollouts: attention concentrates on a persistent subset of salient visual blocks, forming an implicit spatiotemporal memory in the KV cache, and exhibits a locally structured block-sparse pattern within sliding windows. Building on this observation, we propose a trainable native sparsity mechanism that learns to compress, preserve, and update these persistent blocks while restricting computation within each local window to a dynamically selected local neighborhood. To make the approach practical at scale for both training and inference, we further propose Persistent Block-Sparse Attention (PBSA), an efficient GPU kernel that accelerates sparse attention and memory updates for low-latency, memory-efficient decoding. Experiments show that Sparse Forcing improves the VBench score by +0.26 over Self-Forcing on 5-second text-to-video generation while delivering a 1.11-1.17x decoding speedup and 42% lower peak KV-cache footprint. The gains are more pronounced on longer-horizon rollouts, delivering improved visual quality with +0.68 and +2.74 VBench improvements, and 1.22x and 1.27x speedups on 20-second and 1-minute generations, respectively.
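The persistent-block observation suggests a simple selection rule: score each KV block by its total attention mass and keep only the top-k blocks. The sketch below is our own minimal CPU illustration of that idea (the PBSA kernel itself is a fused GPU implementation, not shown here).

```python
import numpy as np

def block_mass(attn, block):
    """Per-block attention mass; attn is (q_len, kv_len), blocks of `block` keys."""
    n_blocks = attn.shape[1] // block
    return attn[:, :n_blocks * block].reshape(attn.shape[0], n_blocks, block).sum(axis=(0, 2))

def select_blocks(attn, block, k):
    """Indices of the k KV blocks that attract the most attention."""
    mass = block_mass(attn, block)
    return np.sort(np.argsort(mass)[-k:])

attn = np.array([[0.05, 0.05, 0.4, 0.4, 0.05, 0.05],
                 [0.05, 0.05, 0.4, 0.4, 0.05, 0.05]])
assert list(select_blocks(attn, block=2, k=1)) == [1]  # middle block dominates
```

Keeping only selected blocks in the KV cache is what yields the reported lower peak cache footprint; the trainable part of Sparse Forcing learns which blocks to persist rather than using a fixed post-hoc rule like this one.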
[163] LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation
Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang
Main category: cs.CV
Abstract: Facial attribute editing and style manipulation are crucial for applications like virtual avatars and photo editing. However, achieving precise control over facial attributes without altering unrelated features is challenging due to the complexity of facial structures and the strong correlations between attributes. While conditional GANs have shown progress, they are limited by accuracy issues and training instability. Diffusion models, though promising, face challenges in style manipulation due to the limited expressiveness of semantic directions. In this paper, we propose LatRef-Diff, a novel diffusion-based framework that addresses these limitations. We replace the traditional semantic directions in diffusion models with style codes and propose two methods for generating them: latent and reference guidance. Based on these style codes, we design a style modulation module that integrates them into the target image, enabling both random and customized style manipulation. This module incorporates learnable vectors, cross-attention mechanisms, and a hierarchical design to improve accuracy and image quality. Additionally, to enhance training stability while eliminating the need for paired images (e.g., before and after editing), we propose a forward-backward consistency training strategy. This strategy first removes the target attribute approximately using image-specific semantic directions and then restores it via style modulation, guided by perceptual and classification losses. Extensive experiments on CelebA-HQ demonstrate that LatRef-Diff achieves state-of-the-art performance in both qualitative and quantitative evaluations. Ablation studies validate the effectiveness of our model’s design choices.
[164] ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing
Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna
Main category: cs.CV
Abstract: On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative through fast, non-iterative online updates. Combined with a compact convolutional neural network (CNN) feature extractor, HDC enables efficient on-device adaptation with strong visual representations. However, prior HDC-based CL systems often depend on multi-tier memory hierarchies and complex cluster management, limiting deployability on resource-constrained hardware. We present ImageHD, an FPGA accelerator for on-device continual learning of visual data based on HDC. ImageHD targets streaming CL under strict latency and on-chip memory constraints, avoiding costly iterative optimization. At the algorithmic level, we introduce a hardware-aware CL method that bounds class exemplars through a unified exemplar memory and a hardware-efficient cluster merging strategy, while incorporating a quantized CNN front-end to reduce deployment overhead without sacrificing accuracy. At the system level, ImageHD is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA, integrating HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation within tight on-chip resource budgets. On CORe50, ImageHD achieves up to 40.4x (4.84x) speedup and 383x (105.1x) energy efficiency over optimized CPU (GPU) baselines, demonstrating the practicality of HDC-enabled continual learning for real-time edge AI.
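The word-packed binary hypervector trick can be shown in a few lines of generic HDC code (our own sketch, not the ImageHD FPGA kernel): pack {0,1} vectors into machine words so that similarity reduces to XOR plus popcount, the same bitwise parallelism the accelerator exploits.

```python
import numpy as np

def pack(bits):
    """Pack a {0,1} vector into bytes, one bit per position."""
    return np.packbits(np.asarray(bits, dtype=np.uint8))

def similarity(a_packed, b_packed, dim):
    """Matching bits = dim - Hamming distance, via XOR and popcount."""
    hamming = int(np.unpackbits(a_packed ^ b_packed)[:dim].sum())
    return dim - hamming

dim = 16
a = np.random.default_rng(0).integers(0, 2, dim)
b = 1 - a                                         # bitwise complement of a
assert similarity(pack(a), pack(a), dim) == dim   # identical: all bits match
assert similarity(pack(a), pack(b), dim) == 0     # complement: none match
```

Because the update is a non-iterative bundling of such vectors, class prototypes can be revised online without backpropagation, which is what makes HDC attractive for streaming continual learning.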
[165] AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing
Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang
Main category: cs.CV
Abstract: Facial attribute editing aims to modify target attributes while preserving attribute-irrelevant content and overall image fidelity. Existing GAN-based methods provide favorable controllability, but often suffer from weak alignment between style codes and attribute semantics. Diffusion-based methods can synthesize highly realistic images; however, their editing precision is limited by the entanglement of semantic directions among different attributes. In this paper, we propose AttDiff-GAN, a hybrid framework that combines GAN-based attribute manipulation with diffusion-based image generation. A key challenge in such integration lies in the inconsistency between one-step adversarial learning and multi-step diffusion denoising, which makes effective optimization difficult. To address this issue, we decouple attribute editing from image synthesis by introducing a feature-level adversarial learning scheme to learn explicit attribute manipulation, and then using the manipulated features to guide the diffusion process for image generation, while also removing the reliance on semantic direction-based editing. Moreover, we enhance style-attribute alignment by introducing PriorMapper, which incorporates facial priors into style generation, and RefineExtractor, which captures global semantic relationships through a Transformer for more precise style extraction. Experimental results on CelebA-HQ show that the proposed method achieves more accurate facial attribute editing and better preservation of non-target attributes than state-of-the-art methods in both qualitative and quantitative evaluations.
[166] GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA
Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna
Main category: cs.CV
Abstract: Vision Graph Neural Networks (ViGs) represent an image as a graph of patch tokens, enabling adaptive, feature-driven neighborhoods. Unlike CNNs with fixed grid biases or Vision Transformers with global token interactions, ViGs rely on dynamic graph convolution: at each layer, a feature-dependent graph is built via k-nearest-neighbor (kNN) search on current patch features, followed by message passing. This per-layer graph construction is the main bottleneck, consuming 50–95% of graph convolution time on CPUs and GPUs, scaling as $O(N^2)$ with the number of patches $N$, and creating a sequential dependency between graph construction and feature updates. We introduce GraphLeap, a simple reformulation that removes this dependency by decoupling graph construction from feature update across layers. GraphLeap performs the feature update at layer $\ell$ using a graph built from the previous layer’s features, while simultaneously using the current layer’s features to construct the graph for layer $\ell+1$. This one-layer-lookahead graph construction enables concurrent graph construction and message passing. Although using prior-layer features can introduce minor accuracy degradation, lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy. Building on GraphLeap, we present the first end-to-end FPGA accelerator for Vision GNNs. Our streaming, layer-pipelined design overlaps a kNN graph construction engine with a feature update engine, exploits node- and channel-level parallelism, and enables efficient on-chip dataflow without explicit edge-feature materialization. Evaluated on isotropic and pyramidal ViG models on an Alveo U280 FPGA, GraphLeap achieves up to $95.7\times$ speedup over CPU and $8.5\times$ speedup over GPU baselines, demonstrating the feasibility of real-time Vision GNN inference.
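The one-layer-lookahead schedule can be sketched in plain NumPy (an illustrative toy, not the FPGA dataflow): the layer-ℓ update message-passes over the graph built from layer ℓ-1 features, while the graph for layer ℓ+1 is built from layer-ℓ features and can therefore proceed concurrently.

```python
import numpy as np

def knn_graph(x, k):
    """Index array (N, k) of each node's k nearest neighbors (O(N^2) search)."""
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # a node is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def mean_aggregate(x, graph):
    """A toy message-passing step: average each node's neighbor features."""
    return x[graph].mean(axis=1)

x = np.array([[0.0], [0.1], [5.0], [5.1]])
graph = knn_graph(x, k=1)             # built from the *previous* layer's features
x_next = mean_aggregate(x, graph)     # feature update uses that stale graph
graph_next = knn_graph(x_next, k=1)   # lookahead graph for the next layer

assert graph[0, 0] == 1 and graph[2, 0] == 3   # nearby points pair up
```

On hardware, `knn_graph(x_next, ...)` and the next `mean_aggregate` run in overlapping pipelines, removing the sequential dependency that dominates ViG runtime.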
[167] Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang
Main category: cs.CV
Abstract: Controllable human video generation aims to produce realistic videos of humans with explicitly guided motions and appearances, serving as a foundation for digital humans, animation, and embodied AI. However, the scarcity of large-scale, diverse, and privacy-safe human video datasets poses a major bottleneck, especially for rare identities and complex actions. Synthetic data provides a scalable and controllable alternative, yet its actual contribution to generative modeling remains underexplored due to the persistent Sim2Real gap. In this work, we systematically investigate the impact of synthetic data on controllable human video generation. We propose a diffusion-based framework that enables fine-grained control over appearance and motion while providing a unified testbed to analyze how synthetic data interacts with real-world data during training. Through extensive experiments, we reveal the complementary roles of synthetic and real data and demonstrate possible methods for efficiently selecting synthetic samples to enhance motion realism, temporal consistency, and identity preservation. Our study offers the first comprehensive exploration of synthetic data’s role in human-centric video synthesis and provides practical insights for building data-efficient and generalizable generative models.
[168] An Interpretable Vision Transformer Framework for Automated Brain Tumor Classification
Chinedu Emmanuel Mbonu, Tochukwu Sunday Belonwu, Okwuchukwu Ejike Chukwuogo, Kenechukwu Sylvanus Anigbogu
Main category: cs.CV
Abstract: Brain tumors represent one of the most critical neurological conditions, where early and accurate diagnosis is directly correlated with patient survival rates. Manual interpretation of Magnetic Resonance Imaging (MRI) scans is time-intensive, subject to inter-observer variability, and demands significant specialist expertise. This paper proposes a deep learning framework for automated four-class brain tumor classification distinguishing glioma, meningioma, pituitary tumor, and healthy brain tissue from a dataset of 7,023 MRI scans. The proposed system employs a Vision Transformer (ViT-B/16) pretrained on ImageNet-21k as the backbone, augmented with a clinically motivated preprocessing and training pipeline. Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to enhance local contrast and accentuate tumor boundaries invisible to standard normalization. A two-stage fine-tuning strategy is adopted: the classification head is warmed up with the backbone frozen, followed by full fine-tuning with discriminative learning rates. MixUp and CutMix augmentation is applied per batch to improve generalization. Exponential Moving Average (EMA) of weights and Test-Time Augmentation (TTA) further stabilize and boost performance. Attention Rollout visualization provides clinically interpretable heatmaps of the brain regions driving each prediction. The proposed model achieves a test accuracy of 99.29%, macro F1-score of 99.25%, and perfect recall on both healthy and meningioma classes, outperforming all CNN-based baselines.
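The MixUp component of the pipeline has a particularly compact standard formulation: blend two images and their one-hot labels with a coefficient λ drawn from Beta(α, α). A minimal sketch (the example class names are illustrative; the paper also applies CutMix, not shown):

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two inputs and their one-hot labels."""
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, y1 = np.ones((2, 2)), np.array([1.0, 0.0, 0.0, 0.0])   # e.g. glioma
x2, y2 = np.zeros((2, 2)), np.array([0.0, 1.0, 0.0, 0.0])  # e.g. meningioma
xm, ym = mixup(x1, y1, x2, y2, lam=0.7)

assert np.allclose(xm, 0.7)          # pixels blended 70/30
assert np.isclose(ym.sum(), 1.0)     # mixed label is still a distribution
assert np.allclose(ym[:2], [0.7, 0.3])
```

In training, λ is resampled per batch, so the model constantly sees interpolated examples, which discourages overconfident decision boundaries between visually similar tumor classes.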
[169] The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview
Kai Liu, Haoyang Yue, Zeli Lin, Zheng Chen, Jingkai Wang, Jue Gong, Jiatong Li, Xianglong Yan, Libo Zhu, Jianze Li, Ziqing Zhang, Zihan Zhou, Xiaoyang Liu, Radu Timofte, Yulun Zhang, Junye Chen, Zhenming Yan, Yucong Hong, Ruize Han, Song Wang, Li Pang, Heng Zhao, Xinqiao Wu, Deyu Meng, Xiangyong Cao, Weijun Yuan, Zhan Li, Zhanglu Chen, Boyang Yao, Yihang Chen, Yifan Deng, Zengyuan Zuo, Junjun Jiang, Saiprasad Meesiyawar, Sulocha Yatageri, Nikhil Akalwadi, Ramesh Ashok Tabib, Uma Mudenagudi, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Cici Liu, Tongyao Mu, Qiong Cao, Yifan Wang, Kosuke Shigematsu, Hiroto Shirono, Asuka Shin, Wei Zhou, Linfeng Li, Lingdong Kong, Ce Wang, Xingwei Zhong, Wanjie Sun, Dafeng Zhang, Hongxin Lan, Qisheng Xu, Mingyue He, Hui Geng, Tianjiao Wan, Kele Xu, Changjian Wang, Antoine Carreaud, Nicola Santacroce, Shanci Li, Jan Skaloud, Adrien Gressin
Main category: cs.CV
Abstract: This paper presents the NTIRE 2026 Remote Sensing Infrared Image Super-Resolution (x4) Challenge, one of the associated challenges of NTIRE 2026. The challenge aims to recover high-resolution (HR) infrared images from low-resolution (LR) inputs generated through bicubic downsampling with a x4 scaling factor. The objective is to develop effective models or solutions that achieve state-of-the-art performance for infrared image SR in remote sensing scenarios. To reflect the characteristics of infrared data and practical application needs, the challenge adopts a single-track setting. A total of 115 participants registered for the competition, with 13 teams submitting valid entries. This report summarizes the challenge design, dataset, evaluation protocol, main results, and the representative methods of each team. The challenge serves as a benchmark to advance research in infrared image super-resolution and promote the development of effective solutions for real-world remote sensing applications.
[170] PLAS-Net: Pixel-Level Area Segmentation for UAV-Based Beach Litter Monitoring
Yongying Liu, Jiaqi Wang, Jian Song, Xinlei Shao, Yijia Chen, Nan Xu, Katsunori Mizuno, Shigeru Tabeta, Fan Zhao
Main category: cs.CV
Abstract: Accurate quantification of the physical exposure area of beach litter, rather than simple item counts, is essential for credible ecological risk assessment of marine debris. However, automated UAV-based monitoring predominantly relies on bounding-box detection, which systematically overestimates the planar area of irregular litter objects. To address this geometric limitation, we develop PLAS-Net (Pixel-level Litter Area Segmentor), an instance segmentation framework that extracts pixel-accurate physical footprints of coastal debris. Evaluated on UAV imagery from a monsoon-driven pocket beach in Koh Tao, Thailand, PLAS-Net achieves a mAP_50 of 58.7% with higher precision than eleven baseline models, demonstrating improved mask fidelity under complex coastal conditions. To illustrate how the accuracy of the masking affects the conclusions of environmental analysis, we conducted three downstream demonstrations: (i) power-law fitting of normalized plastic density (NPD) to characterize fragmentation dynamics; (ii) area-weighted ecological risk index (ERI) to map spatial pollution hotspots; and (iii) source composition analysis revealing the abundance-area paradox: fishing gear constitutes a small proportion of the total number of items, but has the largest physical area per unit item. Pixel-level area extraction can provide more valuable information for coastal monitoring compared to methods based solely on counting.
[171] FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment
Khaled R Ahmed, Toqi Tahamid Sarker, Taminul Islam, Tamany M Alanezi, Amer AbuGhazaleh
Main category: cs.CV
Abstract: Monitoring frying oil degradation is critical for food safety, yet current practice relies on destructive wet-chemistry assays that provide no spatial information and are unsuitable for real-time use. We identify a fundamental obstacle in thermal-image-based inspection, the camera-fingerprint shortcut, whereby models memorize sensor-specific noise and thermal bias instead of learning oxidation chemistry, collapsing under video-disjoint evaluation. We propose FryNet, a dual-stream RGB-thermal framework that jointly performs oil-region segmentation, serviceability classification, and regression of four chemical oxidation indices (PV, p-AV, Totox, temperature) in a single forward pass. A ThermalMiT-B2 backbone with channel and spatial attention extracts thermal features, while an RGB-MAE Encoder learns chemically grounded representations via masked autoencoding and chemical alignment. Dual-Encoder DANN adversarially regularizes both streams against video identity via Gradient Reversal Layers, and FiLM fusion bridges thermal structure with RGB chemical context. On 7,226 paired frames across 28 frying videos, FryNet achieves 98.97% mIoU, 100% classification accuracy, and 2.32 mean regression MAE, outperforming all seven baselines.
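The Gradient Reversal Layer (GRL) that the Dual-Encoder DANN component relies on is a classic domain-adversarial construction: identity in the forward pass, negated (and scaled) gradient in the backward pass, so the feature extractor learns to *confuse* the domain (here, video-identity) classifier. A minimal numpy sketch of the mechanism, not the authors' code:

```python
import numpy as np

class GradientReversal:
    """Forward: identity. Backward: multiply the incoming gradient by -lambda,
    so upstream layers are trained to maximize the domain classifier's loss."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed, scaled gradient

grl = GradientReversal(lam=0.5)
y = grl.forward(np.array([1.0, 2.0]))          # unchanged: [1.0, 2.0]
g = grl.backward(np.array([0.3, -0.1]))        # flipped:   [-0.15, 0.05]
```

In a real framework this is implemented as a custom autograd function; the two-method class above just makes the sign flip explicit.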
[172] Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification
Zhiyong Li, Wei Jiang, Haojie Liu, Mingyu Wang, Wanchong Xu, Weijie Mao
Main category: cs.CV
Abstract: Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, yet existing methods predominantly focus on the image level or rely heavily on costly identity annotations. While video-based VI-ReID has recently emerged to exploit temporal dynamics for improved robustness, existing studies remain limited to supervised settings. Crucially, the unsupervised video VI-ReID problem, where models must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real-world deployment. To bridge this gap, we propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework without explicit hard pseudo-label assignment for unsupervised video-based VI-ReID. HiTPro begins with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Building upon these features, HiTPro constructs reliable intra-camera prototypes via Intra-Camera Tracklet Prototyping by aggregating features from temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, we perform a two-stage positive mining process: progressing from within-modality associations to cross-modality matching, enhanced by a Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus demonstrate that HiTPro achieves state-of-the-art performance under fully unsupervised settings, significantly outperforming adapted baselines and establishing a strong baseline for future research.
[173] MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment
Juan Li, Chuanghao Ding, Xujie Zhang, Cam-Tu Nguyen
Main category: cs.CV
Abstract: Universal Multimodal Retrieval (UMR) aims to map different modalities (e.g., visual and textual) into a shared embedding space for multi-modal retrieval. Existing UMR methods can be broadly divided into two categories: early-fusion approaches, such as Marvel, which project visual features into the language model (LM) space for integration with the text modality, and late-fusion approaches, such as UniVL-DR, which encode visual and textual inputs using separate encoders and obtain fused embeddings through addition. Our pilot study reveals that Marvel exhibits visual modality collapse, which is characterized by the model’s tendency to disregard visual features while depending excessively on textual cues. In contrast, although UniVL-DR is less affected by this issue, it is more susceptible to semantic misalignment, where semantically related content is positioned far apart in the embedding space. To address these challenges, we propose MiMIC, which introduces two key innovations: (1) a fusion-in-decoder architecture for effective multimodal integration, and (2) robust training through single-modality mixin and random caption dropout. Experiments on the WebQA+ and EVQA+ datasets, where images in documents or queries may lack captions, indicate that MiMIC consistently outperforms both early- and late-fusion baselines.
[174] Teacher-Guided Routing for Sparse Vision Mixture-of-Experts
Masahiro Kada, Ryota Yoshihashi, Satoshi Ikehata, Rei Kawakami, Ikuro Sato
Main category: cs.CV
Abstract: Recent progress in deep learning has been driven by increasingly large-scale models, but the resulting computational cost has become a critical bottleneck. Sparse Mixture of Experts (MoE) offers an effective solution by activating only a small subset of experts for each input, achieving high scalability without sacrificing inference speed. Although effective, sparse MoE training exhibits characteristic optimization difficulties. Because the router receives informative gradients only through the experts selected in the forward pass, it suffers from gradient blocking and obtains little information from unselected routes. This limited, highly localized feedback makes it difficult for the router to learn appropriate expert-selection scores and often leads to unstable routing dynamics, such as fluctuating expert assignments during training. To address this issue, we propose TGR-MoE: Teacher-Guided Routing for Sparse Vision Mixture-of-Experts, a simple yet effective method that stabilizes router learning using supervision derived from a pretrained dense teacher model. TGR-MoE constructs a teacher router from the teacher’s intermediate representations and uses its routing outputs as pseudo-supervision for the student router, suppressing frequent routing fluctuations during training and enabling knowledge-guided expert selection from the early stages of training. Extensive experiments on ImageNet-1K and CIFAR-100 demonstrate that TGR-MoE consistently improves both accuracy and routing consistency, while maintaining stable training even under highly sparse configurations.
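One common way to use a teacher router's outputs as pseudo-supervision is a KL divergence between the teacher's and student's expert-selection distributions. The abstract does not specify TGR-MoE's exact objective, so the following is a generic sketch of that idea (the temperature `tau` and the KL direction are assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def router_distill_loss(student_logits, teacher_logits, tau=1.0):
    """Mean KL(teacher || student) over per-token expert distributions."""
    p = softmax(teacher_logits / tau)   # teacher routing probabilities
    q = softmax(student_logits / tau)   # student routing probabilities
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

# Identical routers incur zero loss; disagreeing routers incur a positive one.
s = np.array([[2.0, 0.5, 0.1]])
t = np.array([[2.0, 0.5, 0.1]])
loss_same = router_distill_loss(s, t)       # ~0.0
loss_diff = router_distill_loss(np.array([[0.0, 3.0]]), np.array([[3.0, 0.0]]))
```

Because the teacher's distribution covers *all* experts, this loss delivers gradient signal to unselected routes, which is exactly the feedback the abstract says hard top-k selection withholds.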
[175] Latent Denoising Improves Visual Alignment in Large Multimodal Models
Dhruv Parikh, Jacob Fein-Ashley, Rajgopal Kannan, Viktor Prasanna
Main category: cs.CV
Abstract: Large Multimodal Models (LMMs) such as LLaVA are typically trained with an autoregressive language modeling objective, providing only indirect supervision to visual tokens. This often yields weak internal visual representations and brittle behavior under distribution shift. Inspired by recent progress on latent denoising for learning high-quality visual tokenizers, we show that the same principle provides an effective form of visual supervision for improving internal visual feature alignment and multimodal understanding in LMMs. We propose a latent denoising framework that corrupts projected visual tokens using a saliency-aware mixture of masking and Gaussian noising. The LMM is trained to denoise these corrupted tokens by recovering clean teacher patch features from hidden states at a selected intermediate LLM layer using a decoder. To prevent representation collapse, our framework also preserves the teacher’s intra-image similarity structure and applies intra-image contrastive patch distillation. During inference, corruption and auxiliary heads are disabled, introducing no additional inference-time overhead. Across a broad suite of standard multimodal benchmarks, our method consistently improves visual understanding and reasoning over strong baselines, and yields clear gains on compositional robustness benchmarks (e.g., NaturalBench). Moreover, under ImageNet-C-style non-adversarial common corruptions applied to benchmark images, our method maintains higher accuracy and exhibits reduced degradation at both moderate and severe corruption levels. Our code is available at https://github.com/dhruvashp/latent-denoising-for-lmms.
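The "saliency-aware mixture of masking and Gaussian noising" can be illustrated roughly as follows. The masking ratio, noise scale, and the rule of hard-masking the most salient tokens are assumptions for the sketch, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_tokens(tokens, saliency, mask_frac=0.3, noise_std=0.5):
    """Corrupt projected visual tokens: Gaussian-noise everything, then
    hard-mask (zero out) the top mask_frac most salient tokens."""
    n = tokens.shape[0]
    k = int(mask_frac * n)
    order = np.argsort(-saliency)                       # most salient first
    out = tokens + noise_std * rng.standard_normal(tokens.shape)
    out[order[:k]] = 0.0                                # masked tokens
    return out

tok = np.ones((10, 4))                 # 10 visual tokens, dim 4
sal = np.arange(10, dtype=float)       # token 9 is the most salient
c = corrupt_tokens(tok, sal)           # tokens 9, 8, 7 are zeroed
```

The model is then trained to recover clean teacher features from the hidden states produced on these corrupted inputs; at inference the corruption is simply switched off, so no overhead remains.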
[176] Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning
Wadii Boulila, Adel Ammar, Bilel Benjdira, Maha Driss
Main category: cs.CV
Abstract: Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views, which works well when augmentations preserve semantic content. However, aerial images are frequently degraded by haze, motion blur, rain, and occlusion that remove critical evidence. Enforcing alignment between a clean and a severely degraded view can introduce spurious structure into the latent space. This study proposes a training strategy and architectural modification to enhance SSL robustness to such corruptions. It introduces a per-sample, per-factor trust weight into the alignment objective, combined with the base contrastive loss as an additive residual. A stop-gradient is applied to the trust weight instead of a multiplicative gate. While a multiplicative gate is a natural choice, experiments show it impairs the backbone, whereas our additive-residual approach improves it. Using a 200-epoch protocol on a 210,000-image corpus, the method achieves the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45 (90.20% compared to 88.46% for SimCLR and 89.82% for VICReg). It yields the largest improvements under severe information-erasing corruptions on EuroSAT (+19.9 points on haze at s=5 over SimCLR). The method also demonstrates consistent gains of +1 to +3 points in Mahalanobis AUROC on a zero-shot cross-domain stress test using BDD100K weather splits. Two ablations (scalar uncertainty and cosine gate) indicate the additive-residual formulation is the primary source of these improvements. An evidential variant using Dempster-Shafer fusion introduces interpretable signals of conflict and ignorance. These findings offer a concrete design principle for uncertainty-aware SSL. Code is publicly available at https://github.com/WadiiBoulila/trust-ssl.
[177] SparseGF: A Height-Aware Sparse Segmentation Framework with Context Compression for Robust Ground Filtering Across Urban to Natural Scenes
Nannan Qin, Pengjie Tao, Haiyan Guan, Zhizhong Kang, Lingfei Ma, Xiangyun Hu, Jonathan Li
Main category: cs.CV
Abstract: High-quality digital terrain models derived from airborne laser scanning (ALS) data are essential for a wide range of geospatial analyses, and their generation typically relies on robust ground filtering (GF) to separate point clouds across diverse landscapes into ground and non-ground parts. Although current deep-learning-based GF methods have demonstrated impressive performance, especially in specific challenging terrains, their cross-scene generalization remains limited by two persistent issues: the context-detail dilemma in large-scale processing due to limited computational resources, and the random misclassification of tall objects arising from classification-only optimization. To overcome these limitations, we propose SparseGF, a height-aware sparse segmentation framework enhanced with context compression. It is built upon three key innovations: (1) a convex-mirror-inspired context compression module that condenses expansive contexts into compact representations while preserving central details; (2) a hybrid sparse voxel-point network architecture that effectively interprets compressed representations while mitigating compression-induced geometric distortion; and (3) a height-aware loss function that explicitly enforces topographic elevation priors during training to suppress random misclassification of tall objects. Extensive evaluations on two large-scale ALS benchmark datasets demonstrate that SparseGF delivers robust GF across urban to natural terrains, achieving leading performance in complex urban scenes, competitive results on mixed terrains, and moderate yet non-catastrophic accuracy in densely forested steep areas. This work offers new insights into deep-learning-based GF research and encourages further exploration toward truly cross-scene generalization for large-scale environmental monitoring.
[178] Prototype-Based Test-Time Adaptation of Vision-Language Models
Zhaohong Huang, Yuxin Zhang, Wenjing Liu, Fei Chao, Rongrong Ji
Main category: cs.CV
Abstract: Test-time adaptation (TTA) has emerged as a promising paradigm for vision-language models (VLMs) to bridge the distribution gap between pre-training and test data. Recent works have focused on backpropagation-free TTA methods that rely on cache-based designs, but these introduce two key limitations. First, inference latency increases as the cache grows with the number of classes, leading to inefficiencies in large-scale settings. Second, suboptimal performance occurs when the cache contains insufficient or incorrect samples. In this paper, we present Prototype-Based Test-Time Adaptation (PTA), an efficient and effective TTA paradigm that uses a set of class-specific knowledge prototypes to accumulate knowledge from test samples. Particularly, knowledge prototypes are adaptively weighted based on the zero-shot class confidence of each test sample, incorporating the sample’s visual features into the corresponding class-specific prototype. It is worth highlighting that the knowledge from past test samples is integrated and utilized solely in the prototypes, eliminating the overhead of cache population and retrieval that hinders the efficiency of existing TTA methods. This endows PTA with extremely high efficiency while achieving state-of-the-art performance on 15 image recognition benchmarks and 4 robust point cloud analysis benchmarks. For example, PTA improves CLIP’s accuracy from 65.64% to 69.38% on 10 cross-domain benchmarks, while retaining 92% of CLIP’s inference speed on large-scale ImageNet-1K. In contrast, the cache-based TDA achieves a lower accuracy of 67.97% and operates at only 50% of CLIP’s inference speed.
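The confidence-weighted prototype accumulation described above might look roughly like this: a convex combination of the old prototype and the incoming visual feature, stepped by the sample's zero-shot confidence. This is a sketch under that assumed update rule, not PTA's exact formula:

```python
import numpy as np

def update_prototype(proto, feat, conf):
    """Move the class prototype toward the new feature, with step size equal
    to the sample's zero-shot confidence; renormalize for cosine similarity."""
    p = (1.0 - conf) * proto + conf * feat
    return p / np.linalg.norm(p)

proto = np.array([1.0, 0.0])           # current (unit-norm) class prototype
feat = np.array([0.0, 1.0])            # new test sample's visual feature
new = update_prototype(proto, feat, conf=0.5)
# new is the normalized midpoint: [0.7071..., 0.7071...]
```

Per class, this keeps storage and lookup at O(1) regardless of how many test samples have been seen, which is the efficiency argument made against growing caches.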
[179] KD-CVG: A Knowledge-Driven Approach for Creative Video Generation
Linkai Liu, Wei Feng, Xi Zhao, Shen Zhang, Xingye Chen, Zheng Zhang, Jingjing Lv, Junjie Shen, Ching Law, Yuchen Zhou, Zipeng Guo, Chao Gou
Main category: cs.CV
Abstract: Creative Generation (CG) leverages generative models to automatically produce advertising content that highlights product features, and it has been a significant focus of recent research. However, while CG has advanced considerably, most efforts have concentrated on generating advertising text and images, leaving Creative Video Generation (CVG) relatively underexplored. This gap is largely due to two major challenges faced by Text-to-Video (T2V) models: (a) ambiguous semantic alignment, where models struggle to accurately correlate product selling points with creative video content, and (b) inadequate motion adaptability, resulting in unrealistic movements and distortions. To address these challenges, we develop a comprehensive Advertising Creative Knowledge Base (ACKB) as a foundational resource and propose a knowledge-driven approach (KD-CVG) to overcome the knowledge limitations of existing models. KD-CVG consists of two primary modules: Semantic-Aware Retrieval (SAR) and Multimodal Knowledge Reference (MKR). SAR utilizes the semantic awareness of graph attention networks and reinforcement learning feedback to enhance the model’s comprehension of the connections between selling points and creative videos. Building on this, MKR incorporates semantic and motion priors into the T2V model to address existing knowledge gaps. Extensive experiments have demonstrated KD-CVG’s superior performance in achieving semantic alignment and motion adaptability, validating its effectiveness over other state-of-the-art methods. The code and dataset will be open-sourced at https://kdcvg.github.io/KDCVG/.
[180] EdgeFormer: local patch-based edge detection transformer on point clouds
Yifei Xie, Zhikun Tu, Tong Yang, Yuhe Zhang, Xinyu Zhou
Main category: cs.CV
Abstract: Edge points on 3D point clouds can clearly convey 3D geometry and surface characteristics; therefore, edge detection is widely used in many vision applications with high industrial and commercial demands. However, fine-grained edge features are difficult to detect effectively, as they are generally densely distributed or exhibit small-scale surface gradients. To address this issue, we present a learning-based edge detection network, named EdgeFormer, which mainly consists of two stages. Based on the observation that spatially neighboring points tend to exhibit high correlation, forming the local underlying surface, we convert edge detection over the entire point cloud into point classification based on local patches. Therefore, in the first stage, we construct local patch feature descriptors that describe the local neighborhood around each point. In the second stage, we classify each point by analyzing the local patch feature descriptors generated in the first stage. By converting the point cloud into local patches, the proposed method can effectively extract finer details. The experimental results show that our model demonstrates competitive performance compared to six baselines.
[181] VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
Byeonggeuk Lim, Kyeonghyun Kim, JungMin Yun, YoungBin Kim
Main category: cs.CV
Abstract: The advancement of Large Vision-Language Models (LVLMs) requires precise local region-based reasoning that faithfully grounds the model’s logic in actual visual evidence. However, existing datasets face limitations in scalability due to extensive manual annotation and lack of explicit alignment between multi-step reasoning and corresponding image regions, which constrains the evaluation of model trustworthiness. To address these challenges, we propose the Visual Grounding Chain-of-Thought (VG-CoT) dataset, which explicitly links each reasoning step to real visual evidence within the image through a fully automated three-stage pipeline. The pipeline first extracts object- and text-level visual evidence using state-of-the-art detection and OCR models, then generates step-by-step grounded reasoning with GPT-4o, and finally refines the grounding through a rationale-driven open-set detection process. In addition, we introduce a new benchmark that comprehensively evaluates LVLM reasoning across three complementary dimensions: Rationale Quality, Answer Accuracy, and Reasoning-Answer Alignment. Experiments with representative LVLMs, including LLaVA-1.5 and Qwen2-VL, demonstrate consistent improvements on most evaluation metrics, confirming that VG-CoT effectively enhances trustworthy, evidence-based reasoning while maintaining scalable and cost-efficient dataset construction. The dataset and code will be released publicly upon acceptance to facilitate further research.
[182] You Only Gaussian Once: Controllable 3D Gaussian Splatting for Ultra-Densely Sampled Scenes
Jinrang Jia, Zhenjia Li, Yifeng Shi
Main category: cs.CV
Abstract: 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, yet existing methods remain predominantly research prototypes ill-suited for production-level deployment. We identify a critical “Industry-Academia Gap” hindering real-world application: unpredictable resource consumption from heuristic Gaussian growth, the “sparsity shield” of current benchmarks that rewards hallucination over physical fidelity, and severe multi-sensor data pollution. To bridge this gap, we propose YOGO (You Only Gaussian Once), a system-level framework that reformulates the stochastic growth process into a deterministic, budget-aware equilibrium. YOGO integrates a novel budget controller for hardware-constrained resource allocation and an availability-registration protocol for robust multi-sensor fusion. To push the boundaries of reconstruction fidelity, we introduce Immersion v1.0, the first ultra-dense indoor dataset specifically designed to break the “sparsity shield.” By providing saturated viewpoint coverage, Immersion v1.0 forces algorithms to focus on extreme physical fidelity rather than viewpoint interpolation, and enables the community to focus on the upper limits of high-fidelity reconstruction. Extensive experiments demonstrate that YOGO achieves state-of-the-art visual quality while maintaining a strictly deterministic profile, establishing a new standard for production-grade 3DGS. To facilitate reproducibility, part of the Immersion v1.0 scenes and the YOGO source code have been publicly released. The project link is https://jjrcn.github.io/YOGO/.
[183] S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images
Qingxiao Li, Lifeng Xu, QingLi Wang, Yudong Bai, Mingwei Ou, Shu Hu, Nan Xu
Main category: cs.CV
Abstract: We present S1-VL, a multimodal reasoning model for scientific domains that natively supports two complementary reasoning paradigms: Scientific Reasoning, which relies on structured chain-of-thought, and Thinking-with-Images, which enables the model to actively manipulate images through Python code execution during reasoning. In the Thinking-with-Images mode, the model generates and executes image-processing code in a sandbox environment, obtains intermediate visual results, and continues reasoning in a multi-turn iterative manner. This design is particularly effective for challenging scenarios such as high-resolution scientific chart interpretation, microscopic image understanding, and geometry-assisted reasoning. To construct the training data, we collect scientific multimodal datasets spanning six disciplines: mathematics, physics, chemistry, astronomy, geography, and biology. We further develop a six-dimensional quality filtering framework for reasoning trajectories. To mitigate redundant, ineffective, and erroneous visual operations commonly found in existing datasets, we propose a multi-stage filtering pipeline together with an adaptive data routing strategy. This strategy converts samples with low visual information gain into pure Reasoning-mode data, enabling the model to learn when image operations are truly necessary. S1-VL is trained through a four-stage progressive pipeline: scientific multimodal SFT, Thinking-with-Images cold-start SFT, and two stages of reinforcement learning with SAPO. We build S1-VL-32B on top of Qwen3-VL-32B-Thinking and evaluate it on 13 benchmarks. Experimental results show that S1-VL-32B achieves state-of-the-art performance on all five Thinking-with-Images benchmarks, including HRBench-4K, HRBench-8K, MME-RealWorld-CN, MME-RealWorld-Lite, and V*, and outperforms compared systems on scientific reasoning benchmarks such as Physics and VRSBench.
[184] Pre-process for segmentation task with nonlinear diffusion filters
Javier Sanguino, Carlos Platero, Olga Velasco
Main category: cs.CV
Abstract: This paper deals with the use of nonlinear diffusion filters to obtain piecewise constant images as a preprocessing step for segmentation techniques. We first show an intrinsic formulation for the nonlinear diffusion equation to provide some design conditions on the diffusion filters. According to this theoretical framework, we propose a new family of diffusivities; they are obtained from nonlinear diffusion techniques and are related to backward diffusion. Their goal is to split the image into closed contours with a homogenized grey intensity inside and with no blurred edges. We also prove that our filters satisfy the well-posedness semi-discrete and fully discrete scale-space requirements. This shows that by using semi-implicit schemes, a forward nonlinear diffusion equation is solved, instead of a backward nonlinear diffusion equation, connecting with an edge-preserving process. Under the conditions established for the diffusivity and using a stopping criterion for the diffusion time, we get piecewise constant images with low computational effort. Finally, we test our filter with real images and illustrate the effects of our diffusivity function as a method to get piecewise constant images. The code is available at https://github.com/cplatero/NonlinearDiffusion.
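For readers unfamiliar with nonlinear diffusion, a single explicit step of the classic Perona-Malik scheme in 1-D illustrates the edge-preserving behavior this line of work builds on. The diffusivity below is the textbook g(s) = 1/(1 + (s/kappa)^2), not the paper's proposed family:

```python
import numpy as np

def perona_malik_step(u, dt=0.2, kappa=0.1):
    """One explicit step of 1-D nonlinear diffusion du/dt = d/dx(g(|u'|) u').
    Large gradients (edges) get a small diffusivity g and are preserved."""
    grad = np.diff(u)                        # forward differences at cell faces
    g = 1.0 / (1.0 + (grad / kappa) ** 2)    # edge-stopping diffusivity
    flux = g * grad
    u = u.copy()
    u[1:-1] += dt * np.diff(flux)            # divergence of the flux
    return u

# A sharp step edge barely moves, while the small 0.05 bump diffuses readily.
u = np.array([0.0, 0.0, 0.05, 1.0, 1.0])
u1 = perona_malik_step(u)
```

Iterating such steps with a stopping criterion on the diffusion time is what drives the image toward the piecewise constant regime the abstract targets.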
[185] UHR-DETR: Efficient End-to-End Small Object Detection for Ultra-High-Resolution Remote Sensing Imagery
Jingfang Li, Haoran Zhu, Wen Yang, Jinrui Zhang, Fang Xu, Haijian Zhang, Gui-Song Xia
Main category: cs.CV
Abstract: Ultra-High-Resolution (UHR) imagery has become essential for modern remote sensing, offering unprecedented spatial coverage. However, detecting small objects in such vast scenes presents a critical dilemma: retaining the original resolution for small objects causes prohibitive memory bottlenecks. Conversely, conventional compromises like image downsampling or patch cropping either erase small objects or destroy context. To break this dilemma, we propose UHR-DETR, an efficient end-to-end transformer-based detector designed for UHR imagery. First, we introduce a Coverage-Maximizing Sparse Encoder that dynamically allocates finite computational resources to informative high-resolution regions, ensuring maximum object coverage with minimal spatial redundancy. Second, we design a Global-Local Decoupled Decoder. By integrating macroscopic scene awareness with microscopic object details, this module resolves semantic ambiguities and prevents scene fragmentation. Extensive experiments on the UHR imagery datasets (e.g., STAR and SODA-A) demonstrate the superiority of UHR-DETR under strict hardware constraints (e.g., a single 24GB RTX 3090). It achieves a 2.8% mAP improvement while delivering a 10× inference speedup compared to standard sliding-window baselines on the STAR dataset. Our codes and models will be available at https://github.com/Li-JingFang/UHR-DETR.
[186] 2L-LSH: A Locality-Sensitive Hash Function-Based Method For Rapid Point Cloud Indexing
Shurui Wang, Yuhe Zhang, Ruizhe Guo, Yaning Zhang, Yifei Xie, Xinyu Zhou
Main category: cs.CV
Abstract: The development of 3D scanning technology has enabled the acquisition of massive point cloud models with diverse structures and large scales, thereby presenting significant challenges in point cloud processing. Fast neighboring point search is one of the most common problems, frequently used in model reconstruction, classification, retrieval, and feature visualization. Hash functions are well known for fast and accurate search of high-dimensional data, and they form the core of the proposed 2L-LSH. Specifically, the 2L-LSH algorithm adopts a two-step hash function strategy, in which the first step divides the bounding box of the point cloud model and the second step constructs a generalized table-based data structure. The proposed 2L-LSH offers a highly efficient and accurate solution for fast neighboring point search in large-scale 3D point cloud models, making it a promising technique for various applications in the field. The proposed algorithm is compared with well-known methods including Kd-tree and Octree; the results demonstrate that it outperforms both in terms of speed: the time consumption of kNN search can be 51.111% and 94.159% lower than Kd-tree and Octree, respectively, and the RN search time can be 54.519% and 41.840% lower, respectively.
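A toy version of the two-step idea, bounding-box cells feeding a hash table, illustrates the radius (RN) search the abstract benchmarks. This is a sketch of the general technique, not the authors' implementation:

```python
import numpy as np
from collections import defaultdict

class TwoLevelGridHash:
    """Toy two-step spatial hash: step one partitions the bounding box into
    cubic cells, step two stores point indices in a hash table keyed by cell.
    Illustrative only; names and structure are not the authors'."""

    def __init__(self, points, cell_size):
        self.points = np.asarray(points, dtype=float)
        self.cell = float(cell_size)
        self.origin = self.points.min(axis=0)
        self.table = defaultdict(list)
        for i, p in enumerate(self.points):
            self.table[self._key(p)].append(i)

    def _key(self, p):
        return tuple(((p - self.origin) // self.cell).astype(int))

    def radius_search(self, q, r):
        """Indices of points within radius r of q (RN search): scan only the
        cells overlapping the query ball, then filter by exact distance."""
        q = np.asarray(q, dtype=float)
        lo, hi = self._key(q - r), self._key(q + r)
        hits = []
        for kx in range(lo[0], hi[0] + 1):
            for ky in range(lo[1], hi[1] + 1):
                for kz in range(lo[2], hi[2] + 1):
                    for i in self.table.get((kx, ky, kz), []):
                        if np.linalg.norm(self.points[i] - q) <= r:
                            hits.append(i)
        return hits
```

The speedup over tree structures comes from the O(1) cell lookup: only cells that can intersect the query ball are visited, with no tree traversal.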
[187] VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
Yixuan Zhu, Shilin Ma, Haolin Wang, Ao Li, Yanzhe Jing, Yansong Tang, Lei Chen, Jiwen Lu, Jie Zhou
Main category: cs.CV
Abstract: Recent advancements in visual autoregressive models (VAR) have demonstrated their effectiveness in image generation, highlighting their potential for real-world image super-resolution (Real-ISR). However, adapting VAR for ISR presents critical challenges. The next-scale prediction mechanism, constrained by causal attention, fails to fully exploit global low-quality (LQ) context, resulting in blurry and inconsistent high-quality (HQ) outputs. Additionally, error accumulation in the iterative prediction severely degrades coherence in the ISR task. To address these issues, we propose VARestorer, a simple yet effective distillation framework that transforms a pre-trained text-to-image VAR model into a one-step ISR model. By leveraging distribution matching, our method eliminates the need for iterative refinement, significantly reducing error propagation and inference time. Furthermore, we introduce pyramid image conditioning with cross-scale attention, which enables bidirectional scale-wise interactions and fully utilizes the input image information while adapting to the autoregressive mechanism. This prevents later LQ tokens from being overlooked in the transformer. By fine-tuning only 1.2% of the model parameters through parameter-efficient adapters, our method maintains the expressive power of the original VAR model while significantly enhancing efficiency. Extensive experiments show that VARestorer achieves state-of-the-art performance with 72.32 MUSIQ and 0.7669 CLIPIQA on the DIV2K dataset, while accelerating inference by 10 times compared to conventional VAR inference.
[188] Instance-level Visual Active Tracking with Occlusion-Aware Planning
Haowei Sun, Kai Zhou, Hao Gao, Shiteng Zhang, Jinwu Hu, Xutao Wen, Qixiang Ye, Mingkui Tan
Main category: cs.CV
Abstract: Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an Occlusion-Aware Trajectory Planner, trained on our new Planning-20k dataset, uses conditional diffusion to generate obstacle-avoiding paths for occlusion recovery. Experiments demonstrate OA-VAT achieves 0.93 average SR on UnrealCV (+2.2% vs. SOTA TrackVLA), 90.8% average CAR on real-world datasets (+12.1% vs. SOTA GC-VAT), and 81.6% TSR on a DJI Tello drone. Running at 35 FPS on an RTX 3090, it delivers robust, real-time performance for practical deployment.
[189] Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision
Chentao Li, Zirui Gao, Mingze Gao, Yinglian Ren, Jianjiang Feng, Jie Zhou
Main category: cs.CV
Abstract: Egocentric AI agents, such as smart glasses, rely on pointing gestures to resolve referential ambiguities in natural language commands. However, despite advancements in Multimodal Large Language Models (MLLMs), current systems often fail to precisely ground the spatial semantics of pointing. Instead, they rely on spurious correlations with visual proximity or object saliency, a phenomenon we term “Referential Hallucination.” To address this gap, we introduce EgoPoint-Bench, a comprehensive question-answering benchmark designed to evaluate and enhance multimodal pointing reasoning in egocentric views. Comprising over 11k high-fidelity simulated and real-world samples, the benchmark spans five evaluation dimensions and three levels of referential complexity. Extensive experiments demonstrate that while state-of-the-art proprietary and open-source models struggle with egocentric pointing, models fine-tuned on our synthetic data achieve significant performance gains and robust sim-to-real generalization. This work highlights the importance of spatially aware supervision and offers a scalable path toward precise egocentric AI assistants. Project page: https://guyyyug.github.io/EgoPoint-Bench/
[190] ID-Eraser: Proactive Defense Against Face Swapping via Identity Perturbation
Junyan Luo, Peipeng Yu, Jianwei Fei, Shiya Zeng, Xiaoyu Zhou, Zhihua Xia, Xiang Liu
Main category: cs.CV
Abstract: Deepfake technologies have rapidly advanced with modern generative AI, and face swapping in particular poses serious threats to privacy and digital security. Existing proactive defenses mostly rely on pixel-level perturbations, which are ineffective against contemporary swapping models that extract robust high-level identity embeddings. We propose ID-Eraser, a feature-space proactive defense that removes identifiable facial information to prevent malicious face swapping. By injecting learnable perturbations into identity embeddings and reconstructing natural-looking protection images through a Face Revive Generator (FRG), ID-Eraser produces visually realistic results for humans while rendering the protected identities unusable for Deepfake models. Experiments show that ID-Eraser substantially disrupts identity recognition across diverse face recognition and swapping systems under strict black-box settings, achieving the lowest Top-1 accuracy (0.30) with the best FID (1.64) and LPIPS (0.020). Compared with swaps generated from clean inputs, the identity similarity of protected swaps drops sharply to an average of 0.504 across five representative face swapping models. ID-Eraser further demonstrates strong cross-dataset generalization, robustness to common distortions, and practical effectiveness on commercial APIs, reducing Tencent API similarity from 0.76 to 0.36.
[191] Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts
Yuhan Luo, Tao Chen, Decheng Liu
Main category: cs.CV
Abstract: Visual data forgery detection plays an increasingly important role in social and economic security given the rapid development of generative models. Existing face forgery detectors still fail to achieve satisfactory performance because of poor generalization across datasets. A key factor behind this is the lack of suitable metrics: the commonly used cross-dataset AUC fails to reveal an important issue, namely that detection scores may shift significantly across data domains. To explicitly evaluate cross-domain score comparability, we propose \textbf{Cross-AUC}, an evaluation metric that computes AUC across dataset pairs by contrasting real samples from one dataset with fake samples from another (and vice versa). Evaluating representative detectors under the Cross-AUC metric reveals substantial performance drops, exposing an overlooked robustness problem. We further propose the novel framework \textbf{S}emantic \textbf{F}ine-grained \textbf{A}lignment and \textbf{M}ixture-of-Experts (\textbf{SFAM}), consisting of a patch-level image-text alignment module that enhances CLIP's sensitivity to manipulation artifacts, and a facial-region mixture-of-experts module that routes features from different facial regions to specialized experts for region-aware forgery analysis. Extensive qualitative and quantitative experiments on public datasets show that the proposed method achieves superior performance compared with state-of-the-art methods under a range of suitable metrics.
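As described, Cross-AUC contrasts real samples from one dataset with fake samples from another. A minimal sketch of that pairing, using the Mann-Whitney form of AUC (this is our reading of the metric as stated, not the authors' code):

```python
import numpy as np

def cross_auc(real_scores, fake_scores):
    """Cross-AUC in the sense of the abstract: AUC contrasting real samples
    from dataset A with fake samples from dataset B, via the Mann-Whitney
    form P(fake score > real score), counting ties as one half."""
    real = np.asarray(real_scores, dtype=float)[:, None]
    fake = np.asarray(fake_scores, dtype=float)[None, :]
    return (fake > real).mean() + 0.5 * (fake == real).mean()
```

A detector can have perfect within-dataset AUC yet near-zero Cross-AUC if its score distribution shifts between domains, which is exactly the comparability failure the metric is meant to expose.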
[192] Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction
Yanjiao Liu, Jiawei Liu, Xun Gong, Zifei Nie
Main category: cs.CV
Abstract: Large language models (LLMs) have recently demonstrated strong reasoning capabilities and attracted increasing research attention in the field of autonomous driving (AD). However, safe application of LLMs to AD perception and prediction still requires a thorough understanding of both the dynamic traffic agents and the static road infrastructure. To this end, this study introduces a framework to evaluate the capability of LLMs in understanding the behaviors of dynamic traffic agents and the topology of road networks. The framework leverages frozen LLMs as the reasoning engine, employing a traffic encoder to extract spatial-level scene features from observed trajectories of agents, while a lightweight Convolutional Neural Network (CNN) encodes the local high-definition (HD) maps. To assess the intrinsic reasoning ability of LLMs, the extracted scene features are then transformed into LLM-compatible tokens via a reprogramming adapter. With the prediction burden placed on the LLMs, a simple linear decoder suffices to output future trajectories. The framework enables a quantitative analysis of the influence of multi-modal information, especially the impact of map semantics on trajectory prediction accuracy, and allows seamless integration of frozen LLMs with minimal adaptation, thereby demonstrating strong generalizability across diverse LLM architectures and providing a unified platform for model evaluation.
[193] VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection
Yupeng Zhang, Ruize Han, Ningnan Guo, Wei Feng, Song Wang, Liang Wan
Main category: cs.CV
Abstract: In real-world scenarios, continual changes in weather, illumination, and imaging conditions cause significant domain shifts, leading detectors trained on a single source domain to degrade severely in unseen environments. Existing single-domain generalized object detection (SDGOD) methods mainly rely on data augmentation or domain-invariant representation learning, but pay limited attention to detector mechanisms, leaving clear limitations under complex domain shifts. Through analytical experiments, we find that performance degradation is dominated by increasing missed detections, which fundamentally arises from reduced cross-domain stability of the detector: object-background and inter-instance relations become less stable in the encoding stage, while semantic-spatial alignment of query representations also becomes harder to maintain in the decoding stage. To this end, we propose VFM$^{4}$SDG, a dual-prior learning framework for SDGOD, which introduces a frozen vision foundation model (VFM) as a transferable cross-domain stability prior into detector representation learning and query modeling. In the encoding stage, we propose Cross-domain Stable Relational Prior Distillation to enhance the robustness of object-background and inter-instance relational modeling. In the decoding stage, we propose Semantic-Contextual Prior-based Query Enhancement, which injects category-level semantic prototypes and global visual context into queries to improve their semantic recognition and spatial localization stability in unseen domains. Extensive experiments show that the proposed method consistently outperforms existing SOTA methods on standard SDGOD benchmarks and two mainstream DETR-based detectors, demonstrating its effectiveness, robustness, and generality.
[194] GMD: Gaussian Mixture Descriptor for Pair Matching of 3D Fragments
Meijun Xiong, Zhenguo Shi, Xinyu Zhou, Yuhe Zhang, Shunli Zhang
Main category: cs.CV
Abstract: In the automatic reassembly of fragments acquired using laser scanners to reconstruct objects, a crucial step is the matching of fractured surfaces. In this paper, we propose a novel local descriptor that uses the Gaussian Mixture Model (GMM) to fit the distribution of points, allowing for the description and matching of fractured surfaces of fragments. Our method involves dividing a local surface patch into concave and convex regions for estimating the k value of GMM. Then the final Gaussian Mixture Descriptor (GMD) of the fractured surface is formed by merging the regional GMDs. To measure the similarities between GMDs for determining adjacent fragments, we employ the L2 distance and align the fragments using Random Sample Consensus (RANSAC) and Iterative Closest Point (ICP). The extensive experiments on real-scanned public datasets and Terracotta datasets demonstrate the effectiveness of our approach; furthermore, the comparisons with several existing methods also validate the advantage of the proposed method.
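The L2 comparison of Gaussian mixtures mentioned in the abstract has a closed form via the Gaussian product integral; the sketch below assumes full-covariance components given explicitly as lists, and is one way to realize the measure, not the authors' implementation:

```python
import numpy as np

def gauss_overlap(m1, c1, m2, c2):
    # Closed form: ∫ N(x; m1, c1) N(x; m2, c2) dx = N(m1 - m2; 0, c1 + c2)
    c = c1 + c2
    d = np.asarray(m1) - np.asarray(m2)
    k = d.size
    norm = 1.0 / np.sqrt(((2 * np.pi) ** k) * np.linalg.det(c))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(c) @ d)

def gmm_l2(w1, mu1, cov1, w2, mu2, cov2):
    """Squared L2 distance between two Gaussian mixtures
    (weights w, means mu, covariances cov given as lists)."""
    def cross(wa, ma, ca, wb, mb, cb):
        return sum(wa[i] * wb[j] * gauss_overlap(ma[i], ca[i], mb[j], cb[j])
                   for i in range(len(wa)) for j in range(len(wb)))
    return (cross(w1, mu1, cov1, w1, mu1, cov1)
            + cross(w2, mu2, cov2, w2, mu2, cov2)
            - 2.0 * cross(w1, mu1, cov1, w2, mu2, cov2))
```

Because every term reduces to a Gaussian density evaluation, the distance is cheap to compute even for moderately many components, which matters when every fragment pair must be scored.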
[195] Seeing Isn’t Believing: Uncovering Blind Spots in Evaluator Vision-Language Models
Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra
Main category: cs.CV
Abstract: Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, for image-to-text (I2T) tasks such as visual question answering, and text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains under explored. In this work, we systematically evaluate the reliability of Evaluator VLMs across both I2T and T2I tasks. We introduce targeted perturbations that degrade output quality along key error dimensions, including object hallucinations, spatial reasoning, factual grounding, and visual fidelity. These perturbations test whether Evaluator VLMs can reliably account for these quality degrading errors in their evaluations. Using a comprehensive benchmark of over 4000 perturbed instances spanning 40 perturbation dimensions, we evaluate 4 prominent VLMs using single-answer scoring, pairwise comparison, and reference-guided paradigms. Our findings reveal that current VLM evaluators exhibit substantial blind spots: they often fail to detect perturbed outputs - in some cases exceeding 50%, struggle particularly with fine-grained compositional and spatial errors, and are often insensitive to hallucinated content that contradicts the input image. Pairwise comparison proves more reliable, though failure rates persist. These results highlight the unreliable nature of current Evaluator VLMs and urge caution in their deployment for benchmarking and development decisions. Code and data have been made publicly available.
[196] Attention-Based Multiple Instance Learning for Predominant Growth Pattern Prediction in Lung Adenocarcinoma WSI Using Foundation Models
Laura Valeria Perez-Herrera, M. J. Garcia-Gonzalez, Karen Lopez-Linares
Main category: cs.CV
Abstract: Lung adenocarcinoma (LUAD) grading depends on accurately identifying growth patterns, which are indicators of prognosis and can influence treatment decisions. Common deep learning approaches to determine the predominant pattern rely on patch-level classification or segmentation, requiring extensive annotations. This study proposes an attention-based multiple instance learning (ABMIL) framework to predict the predominant LUAD growth pattern at the whole slide level to reduce annotation burden. Our approach integrates pretrained pathology foundation models as patch encoders, used either frozen or fine-tuned on annotated patches, to extract discriminative features that are aggregated through attention mechanisms. Experiments show that fine-tuned encoders improve performance, with Prov-GigaPath achieving the highest agreement (κ = 0.699) under ABMIL. Compared to simple patch-aggregation baselines, ABMIL yields more robust predictions by leveraging slide-level supervision and spatial attention. Future work will extend this framework to estimate the full distribution of growth patterns and validate performance on external cohorts.
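The attention pooling at the heart of ABMIL (in the style of Ilse et al.'s formulation) fits in a few lines; the shapes and parameter names below are illustrative assumptions:

```python
import numpy as np

def abmil_pool(H, V, w):
    """ABMIL-style attention pooling: attention weights
    a_i = softmax_i(w^T tanh(V h_i)) aggregate patch embeddings H into one
    slide-level vector. Shapes: H (n_patches, d), V (k, d), w (k,)."""
    scores = w @ np.tanh(V @ H.T)          # one score per patch, shape (n,)
    a = np.exp(scores - scores.max())      # numerically stable softmax
    a /= a.sum()
    return a @ H, a                        # slide embedding (d,), weights (n,)
```

Because only the slide-level label supervises V and w, the attention weights double as an interpretability signal, highlighting which patches drove the predominant-pattern call.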
[197] Component-Based Out-of-Distribution Detection
Wenrui Liu, Hong Chang, Ruibing Hou, Shiguang Shan, Xilin Chen
Main category: cs.CV
Abstract: Out-of-Distribution (OOD) detection requires sensitivity to subtle shifts without overreacting to natural In-Distribution (ID) diversity. However, from the viewpoint of detection granularity, global representations inevitably suppress local OOD cues, while patch-based methods are unstable due to entangled spurious correlations and noise. Neither is effective in detecting compositional OODs composed of valid ID components. Inspired by recognition-by-components theory, we present a training-free Component-Based OOD Detection (CoOD) framework that addresses these limitations by decomposing inputs into functional components. To instantiate CoOD, we derive a Component Shift Score (CSS) to detect local appearance shifts, and a Compositional Consistency Score (CCS) to identify cross-component compositional inconsistencies. Empirically, CoOD achieves consistent improvements on both coarse- and fine-grained OOD detection.
[198] Deep kernel video approximation for unsupervised action segmentation
Silvia L. Pintea, Jouke Dijkstra
Main category: cs.CV
Abstract: This work focuses on per-video unsupervised action segmentation, which is of interest to applications where storing large datasets is either not possible or not permitted. We propose to segment videos by learning in a deep kernel space to approximate the underlying frame distribution as closely as possible. To define this closeness metric between the original video distribution and its approximation, we rely on maximum mean discrepancy (MMD), which is a geometry-preserving metric in distribution space and thus gives more reliable estimates. Moreover, unlike the commonly used optimal transport metric, MMD is both easier to optimize and faster. We define the kernel space where MMD operates with neural tangent kernels (NTKs), both for their improved descriptive power over fixed kernels and because they sidestep the trivial solution when jointly learning the inputs (the video approximation) and the kernel function. Finally, we show competitive results when compared to state-of-the-art per-video methods on six standard benchmarks. Additionally, our method has higher F1 scores than prior agglomerative work when the number of segments is unknown.
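The squared MMD between two samples is straightforward to estimate; the RBF kernel below is a self-contained stand-in for the NTK the paper actually uses, so this is a sketch of the metric, not the method:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy between samples
    X and Y: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())
```

MMD is zero when the two samples come from the same distribution (in expectation) and grows with distributional mismatch, which is why minimizing it drives the learned approximation toward the original frame distribution.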
[199] CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction
Changfan Wang, Xinran Wang, Donghai Liu, Fei Su, Lulu Sun, Zhicheng Zhao, Zhu Meng
Main category: cs.CV
Abstract: Spatial transcriptomics (ST) enables spatially resolved gene profiling but remains expensive and low-throughput, limiting large-cohort studies and routine clinical use. Predicting spatial gene expression from routine hematoxylin and eosin (H&E) slides is a promising alternative, yet under realistic leave-one-slide-out evaluation, existing models often suffer from slide-level appearance shifts and regression-driven over-smoothing that suppress biologically meaningful variation. CHRep is a two-phase framework for robust histology-to-expression prediction. In the training phase, CHRep learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization. In the inference phase, cross-slide robustness is improved without backbone fine-tuning through a lightweight calibration module trained on the training slides, which combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module. Unlike prior embedding-alignment or retrieval-based transfer methods that rely on a single prediction route, CHRep couples topology-preserving representation learning with post-hoc calibration, enabling stable neighborhood retrieval and controlled bias correction under slide-level shifts. Across the three cohorts, CHRep consistently improves gene-wise correlation under leave-one-slide-out evaluation, with the largest gains observed on Alex+10x. Relative to HAGE, the Pearson correlation coefficient on all considered genes [PCC(ACG)] increases by 4.0% on cSCC and 9.8% on HER2+. Relative to mclSTExp, PCC(ACG) further improves by 39.5% on Alex+10x, together with 9.7% and 9.0% reductions in mean squared error (MSE) and mean absolute error (MAE), respectively.
[200] OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction
Zeyu Cai, Yuliang Xiu, Renke Wang, Zhijing Shao, Xiaoben Li, Siyuan Yu, Chao Xu, Yang Liu, Baigui Sun, Jian Yang, Zhenyu Zhang
Main category: cs.CV
Abstract: Fitting an underlying body model to 3D clothed human assets has been extensively studied, yet most approaches focus on either single-modal inputs such as point clouds or multi-view images alone, often requiring a known metric scale. This constraint is frequently impractical, especially for AI-generated assets where scale distortion is common. We propose OmniFit, a method that can seamlessly handle diverse multi-modal inputs, including full scans, partial depth observations, and image captures, while remaining scale-agnostic for both real and synthetic assets. Our key innovation is a simple yet effective conditional transformer decoder that directly maps surface points to dense body landmarks, which are then used for SMPL-X parameter fitting. In addition, an optional plug-and-play image adapter incorporates visual cues to compensate for missing geometric information. We further introduce a dedicated scale predictor that rescales subjects to canonical body proportions. OmniFit substantially outperforms state-of-the-art methods by 57.1 to 80.9 percent across daily and loose clothing scenarios. To the best of our knowledge, it is the first body fitting method to surpass multi-view optimization baselines and the first to achieve millimeter-level accuracy on the CAPE and 4D-DRESS benchmarks.
[201] Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers
Minghao Yin, Wenbo Hu, Jiale Xu, Ying Shan, Kai Han
Main category: cs.CV
Abstract: Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demand. We present Sculpt4D, a native 4D generative framework that seamlessly integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design faithfully models complex spatiotemporal dependencies with high fidelity, while sidestepping the quadratic overhead of full attention and reducing network total computation by 56%. Consequently, Sculpt4D establishes a new state-of-the-art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
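The anchored, time-decaying sparse mask the abstract describes might look roughly like the boolean pattern below; the exact shape of the sparsity is our assumption, not the paper's design:

```python
import numpy as np

def sparse_temporal_mask(n_frames, window):
    """Toy frame-level attention mask in the spirit of the abstract: every
    frame attends to frame 0 (the identity anchor) plus a causal local
    window of recent frames; everything else is masked out."""
    M = np.zeros((n_frames, n_frames), dtype=bool)
    M[:, 0] = True                                # anchor all frames to frame 0
    for t in range(n_frames):
        M[t, max(0, t - window):t + 1] = True     # causal local window
    return M
```

Such a mask grows linearly rather than quadratically in the number of frames attended per query, which is the source of the compute savings a block-sparse attention layout exploits.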
[202] Local Neighborhood Instability in Parametric Projections: Quantitative and Visual Analysis
Frederik L. Dennig, Daniel A. Keim
Main category: cs.CV
Abstract: Parametric projections let analysts embed new points in real time, but input variations from measurement noise or data drift can produce unpredictable shifts in the 2D layout. Whether and where a projection is locally stable remains largely unexamined. In this paper, we present a stability evaluation framework that probes parametric projections with Gaussian perturbations around selected anchor points and assesses how neighborhoods deform in the 2D embedding. Our approach combines quantitative measures of mean displacement, bias, and nearest-anchor assignment error with per-anchor visualizations of displacement vectors, local PCA ellipsoids, and Voronoi misassignment for detailed inspection. We demonstrate the framework’s effectiveness on UMAP- and t-SNE-based neural projectors of varying network sizes and study the effect of Jacobian regularization as a gradient-based robustness strategy. We apply our framework to the MNIST and Fashion-MNIST datasets. The results show that our framework identifies unstable projection regions invisible to reconstruction error or neighborhood-preservation metrics.
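The probing loop the framework describes, Gaussian perturbations around an anchor projected into 2D and measured for displacement and bias, can be sketched generically; `project` stands for any parametric projector, and the defaults are illustrative:

```python
import numpy as np

def probe_stability(project, anchor, sigma=0.05, n=256, seed=0):
    """Stability probe as the abstract describes: sample Gaussian perturbations
    around an anchor, project them, and report the mean 2D displacement
    magnitude and the mean displacement vector (bias). `project` is any
    callable mapping an (m, d) array to an (m, 2) embedding."""
    rng = np.random.default_rng(seed)
    base = project(anchor[None])[0]
    noisy = anchor[None] + sigma * rng.standard_normal((n, anchor.size))
    disp = project(noisy) - base
    return np.linalg.norm(disp, axis=1).mean(), disp.mean(axis=0)
```

A perfectly stable local region yields small, isotropic displacements with near-zero bias; large or directionally skewed displacements flag the unstable regions the quantitative measures are designed to surface.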
[203] DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion
Tahar Chettaoui, Eduarda Caldeira, Guray Ozgur, Raghavendra Ramachandra, Fadi Boutros, Naser Damer
Main category: cs.CV
Abstract: Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both identity conditioning and latent space levels. Unlike image-level methods suffering from blending artifacts or GAN-based approaches with limited reconstruction fidelity, DCMorph leverages identity-conditioned latent diffusion models through two mechanisms: (1) decoupled cross-attention interpolation that injects identity-specific features from both source faces into the denoising process, enabling explicit dual-identity conditioning absent in existing diffusion-based methods, and (2) DDIM inversion with spherical interpolation between inverted latent representations from both source faces, providing geometrically consistent initial latent representation that preserves structural attributes. Vulnerability analyses across four state-of-the-art face recognition systems demonstrate that DCMorph achieves the highest attack success rates compared to existing methods at both operational thresholds, while remaining challenging to detect by current morphing attack detection solutions.
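Spherical interpolation between inverted latents is the standard slerp; a minimal version of that mechanism (as named in the abstract, not the authors' code):

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent vectors z0 and z1
    at blend factor t in [0, 1]."""
    f0, f1 = z0.ravel(), z1.ravel()
    cos = np.dot(f0, f1) / (np.linalg.norm(f0) * np.linalg.norm(f1))
    omega = np.arccos(np.clip(cos, -1.0, 1.0))
    if np.isclose(omega, 0.0):             # parallel vectors: fall back to lerp
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)
```

Unlike linear interpolation, slerp stays on the hypersphere the latents approximately live on, which is why it gives the geometrically consistent initial latent the abstract claims.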
[204] DualSplat: Robust 3D Gaussian Splatting via Pseudo-Mask Bootstrapping from Reconstruction Failures
Xu Wang, Zhiru Wang, Shiyun Xie, Chengwei Pan, Yisong Chen
Main category: cs.CV
Abstract: While 3D Gaussian Splatting (3DGS) achieves real-time photorealistic rendering, its performance degrades significantly when training images contain transient objects that violate multi-view consistency. Existing methods face a circular dependency: accurate transient detection requires a well-reconstructed static scene, while clean reconstruction itself depends on reliable transient masks. We address this challenge with DualSplat, a Failure-to-Prior framework that converts first-pass reconstruction failures into explicit priors for a second reconstruction stage. We observe that transients, which appear in only a subset of views, often manifest as incomplete fragments during conservative initial training. We exploit these failures to construct object-level pseudo-masks by combining photometric residuals, feature mismatches, and SAM2 instance boundaries. These pseudo-masks then guide a clean second-pass 3DGS optimization, while a lightweight MLP refines them online by gradually shifting from prior supervision to self-consistency. Experiments on RobustNeRF and NeRF On-the-go show that DualSplat outperforms existing baselines, demonstrating particularly clear advantages in transient-heavy scenes and transient regions.
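The photometric-residual cue, one of the three pseudo-mask ingredients named above (feature mismatches and SAM2 boundaries omitted here), can be sketched with a robust threshold; the MAD-based rule and constant are our assumptions:

```python
import numpy as np

def residual_mask(rendered, observed, k=2.0):
    """Photometric-residual cue for transient pseudo-masks: flag pixels whose
    per-pixel residual exceeds k robust standard deviations (median absolute
    deviation scaled to sigma). Inputs are (H, W, 3) images in [0, 1]."""
    r = np.abs(rendered - observed).mean(axis=-1)   # (H, W) residual map
    med = np.median(r)
    mad = np.median(np.abs(r - med)) + 1e-8         # robust spread estimate
    return r > med + k * 1.4826 * mad
```

A robust statistic matters here because transients are exactly the outliers a mean/std threshold would let inflate the cutoff; the median-based rule keeps the static majority in charge of the threshold.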
[205] Causal Disentanglement for Full-Reference Image Quality Assessment
Zhen Zhang, Jielei Chu, Tian Zhang, Weide Liu, Fengmao Lv, Tianrui Li, Jun Cheng, Yuming Fang
Main category: cs.CV
Abstract: Existing deep network-based full-reference image quality assessment (FR-IQA) models typically work by performing pairwise comparisons of deep features from the reference and distorted images. In this paper, we approach this problem from a different perspective and propose a novel FR-IQA paradigm based on causal inference and decoupled representation learning. Unlike typical feature comparison-based FR-IQA models, our approach formulates degradation estimation as a causal disentanglement process guided by intervention on latent representations. We first decouple degradation and content representations by exploiting the content invariance between the reference and distorted images. Second, inspired by the human visual masking effect, we design a masking module to model the causal relationship between image content and degradation features, thereby extracting content-influenced degradation features from distorted images. Finally, quality scores are predicted from these degradation features using either supervised regression or label-free dimensionality reduction. Extensive experiments demonstrate that our method achieves highly competitive performance on standard IQA benchmarks across fully supervised, few-label, and label-free settings. Furthermore, we evaluate the approach on diverse non-standard natural image domains with scarce data, including underwater, radiographic, medical, neutron, and screen-content images. Benefiting from its ability to perform scenario-specific training and prediction without labeled IQA data, our method exhibits superior cross-domain generalization compared to existing training-free FR-IQA models.
[206] Encoder-Free Human Motion Understanding via Structured Motion Descriptions
Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao
Main category: cs.CV
Abstract: The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-language alignment through dedicated encoders that project motion features into the LLM’s embedding space, remaining constrained by cross-modal representation and alignment. Inspired by biomechanical analysis, where joint angles and body-part kinematics have long served as a precise descriptive language for human movement, we propose \textbf{Structured Motion Description (SMD)}, a rule-based, deterministic approach that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. By representing motion as text, SMD enables LLMs to apply their pretrained knowledge of body parts, spatial directions, and movement semantics directly to motion reasoning, without requiring learned encoders or alignment modules. We show that this approach goes beyond state-of-the-art results on both motion question answering (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and motion captioning (R@1 of 0.584, CIDEr of 53.16 on HumanML3D), surpassing all prior methods. SMD additionally offers practical benefits: the same text input works across different LLMs with only lightweight LoRA adaptation (validated on 8 LLMs from 6 model families), and its human-readable representation enables interpretable attention analysis over motion descriptions. Code, data, and pretrained LoRA adapters are available at https://yaozhang182.github.io/motion-smd/.
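The rule-based conversion from joint positions to text can be illustrated with a minimal sketch for a single joint angle. The angle thresholds and the phrasing below are invented for illustration and are not the paper's actual SMD templates:

```python
import numpy as np

def joint_angle(a, b, c):
    """Interior angle at joint b (degrees) formed by keypoints a-b-c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos_ang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))

def describe_elbow(shoulder, elbow, wrist):
    """Map a numeric joint angle to a short natural-language description."""
    ang = joint_angle(shoulder, elbow, wrist)
    state = "fully extended" if ang > 160 else "bent" if ang > 60 else "sharply flexed"
    return f"left elbow at {ang:.0f} degrees, {state}"

print(describe_elbow([0, 0, 0], [0.3, 0, 0], [0.6, 0, 0]))
# left elbow at 180 degrees, fully extended
```

Because the output is plain text, any instruction-tuned LLM can consume it directly, which is what lets the same representation transfer across model families with only lightweight adaptation.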
[207] Sapiens2
Rawal Khirodkar, He Wen, Julieta Martinez, Yuan Dong, Su Zhaoen, Shunsuke Saito
Main category: cs.CV
Abstract: We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substantially improves over its predecessor in both pretraining and post-training. First, to learn features that capture low-level details (for dense prediction) and high-level semantics (for zero-shot or few-label settings), we combine masked image reconstruction with self-distilled contrastive objectives. Our evaluations show that this unified pretraining objective is better suited for a wider range of downstream tasks. Second, along the data axis, we pretrain on a curated dataset of 1 billion high-quality human images and improve the quality and quantity of task annotations. Third, architecturally, we incorporate advances from frontier models that enable longer training schedules with improved stability. Our 4K models adopt windowed attention to reason over longer spatial context and are pretrained with 2K output resolution. Sapiens2 sets a new state-of-the-art and improves over the first generation on pose (+4 mAP), body-part segmentation (+24.3 mIoU), normal estimation (45.6% lower angular error) and extends to new tasks such as pointmap and albedo estimation. Code: https://github.com/facebookresearch/sapiens2
[208] WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Main category: cs.CV
Abstract: Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesthetic scores, and VLM-based judgments, but none supplies the standardized test conditions – identical scenes, identical action sequences, and a unified control interface – needed to make those metrics comparable across models with heterogeneous inputs. We introduce WorldMark, the first benchmark that provides such a common playing field for interactive Image-to-Video world models. WorldMark contributes: (1) a unified action-mapping layer that translates a shared WASD-style action vocabulary into each model’s native control format, enabling apples-to-apples comparison across six major models on identical scenes and trajectories; (2) a hierarchical test suite of 500 evaluation cases covering first- and third-person viewpoints, photorealistic and stylized scenes, and three difficulty tiers from Easy to Hard spanning 20-60s; and (3) a modular evaluation toolkit for Visual Quality, Control Alignment, and World Consistency, designed so that researchers can reuse our standardized inputs while plugging in their own metrics as the field evolves. We will release all data, evaluation code, and model outputs to facilitate future research. Beyond offline metrics, we launch World Model Arena (warena.ai), an online platform where anyone can pit leading world models against each other in side-by-side battles and watch the live leaderboard.
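A toy sketch of what a unified action-mapping layer might look like: one shared WASD-style vocabulary is translated into per-model native control formats. The two adapter formats below are invented for illustration and are not the benchmarked models' real interfaces:

```python
# Shared WASD-style action vocabulary (one entry per abstract action)
SHARED_ACTIONS = {"W": "move_forward", "A": "move_left",
                  "S": "move_backward", "D": "move_right"}

# Hypothetical per-model adapters: one model takes a text command,
# another takes a 2-D motion vector.
ADAPTERS = {
    "model_text":   lambda key: f"action: {SHARED_ACTIONS[key]}",
    "model_vector": lambda key: {"W": [1, 0], "S": [-1, 0],
                                 "A": [0, -1], "D": [0, 1]}[key],
}

def map_action(model, key):
    """Translate a shared action key into a model's native control format."""
    return ADAPTERS[model](key)

print(map_action("model_text", "W"))    # action: move_forward
print(map_action("model_vector", "D"))  # [0, 1]
```

The point of such a layer is that the same scene and the same key sequence can drive every model, so any metric computed downstream compares like with like.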
[209] Efficient Logic Gate Networks for Video Copy Detection
Katarzyna Fojcik
Main category: cs.CV
Abstract: Video copy detection requires robust similarity estimation under diverse visual distortions while operating at very large scale. Although deep neural networks achieve strong performance, their computational cost and descriptor size limit practical deployment in high-throughput systems. In this work, we propose a video copy detection framework based on differentiable Logic Gate Networks (LGNs), which replace conventional floating-point feature extractors with compact, logic-based representations. Our approach combines aggressive frame miniaturization, binary preprocessing, and a trainable LGN embedding model that learns both logical operations and interconnections. After training, the model can be discretized into a purely Boolean circuit, enabling extremely fast and memory-efficient inference. We systematically evaluate different similarity strategies, binarization schemes, and LGN architectures across multiple dataset folds and difficulty levels. Experimental results demonstrate that LGN-based models achieve competitive or superior accuracy and ranking performance compared to prior models, while producing descriptors several orders of magnitude smaller and delivering inference speeds exceeding 11k samples per second. These findings indicate that logic-based models offer a promising alternative for scalable and resource-efficient video copy detection.
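Once descriptors are discretized to Boolean vectors, similarity reduces to cheap bitwise comparison, which is where the speed and memory savings come from. A minimal sketch of one such comparison (the descriptors and the specific similarity choice are illustrative, not the strategies evaluated in the paper):

```python
import numpy as np

def hamming_similarity(d1, d2):
    """Fraction of matching bits between two binary descriptors."""
    d1 = np.asarray(d1, dtype=bool)
    d2 = np.asarray(d2, dtype=bool)
    return float(np.mean(d1 == d2))

query     = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
near_copy = np.array([1, 0, 1, 0, 0, 0, 1, 0], dtype=bool)  # one flipped bit
unrelated = np.array([0, 1, 0, 0, 1, 1, 0, 1], dtype=bool)  # every bit differs

print(hamming_similarity(query, near_copy))  # 0.875
print(hamming_similarity(query, unrelated))  # 0.0
```

On packed bit representations the same comparison becomes an XOR plus a popcount, which is what makes throughputs of thousands of samples per second plausible on commodity hardware.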
[210] Unlocking the Power of Critical Factors for 3D Visual Geometry Estimation
Guangkai Xu, Hua Geng, Huanyi Zheng, Songyi Yin, Yanlong Sun, Hao Chen, Chunhua Shen
Main category: cs.CV
Abstract: Feed-forward visual geometry estimation has recently made rapid progress. However, an important gap remains: multi-frame models usually produce better cross-frame consistency, yet they often underperform strong per-frame methods on single-frame accuracy. This observation motivates our systematic investigation into the critical factors driving model performance through rigorous ablation studies, which reveals several key insights: 1) Scaling up data diversity and quality unlocks further performance gains even in state-of-the-art visual geometry estimation methods; 2) Commonly adopted confidence-aware loss and gradient-based loss mechanisms may unintentionally hinder performance; 3) Joint supervision through both per-sequence and per-frame alignment improves results, while local region alignment surprisingly degrades performance. Furthermore, we introduce two enhancements to integrate the advantages of optimization-based methods and high-resolution inputs: a consistency loss function that enforces alignment between depth maps, camera parameters, and point maps, and an efficient architectural design that leverages high-resolution information. We integrate these designs into CARVE, a resolution-enhanced model for feed-forward visual geometry estimation. Experiments on point cloud reconstruction, video depth estimation, and camera pose/intrinsic estimation show that CARVE achieves strong and robust performance across diverse benchmarks.
[211] Ramen: Robust Test-Time Adaptation of Vision-Language Models with Active Sample Selection
Wenxuan Bao, Yanjun Zhao, Xiyuan Yang, Jingrui He
Main category: cs.CV
Abstract: Pretrained vision-language models such as CLIP exhibit strong zero-shot generalization but remain sensitive to distribution shifts. Test-time adaptation adapts models during inference without access to source data or target labels, offering a practical way to handle such shifts. However, existing methods typically assume that test samples come from a single, consistent domain, while in practice, test data often include samples from mixed domains with distinct characteristics. Consequently, their performance degrades under mixed-domain settings. To address this, we present Ramen, a framework for robust test-time adaptation through active sample selection. For each incoming test sample, Ramen retrieves a customized batch of relevant samples from previously seen data based on two criteria: domain consistency, which ensures that adaptation focuses on data from similar domains, and prediction balance, which mitigates adaptation bias caused by skewed predictions. To improve efficiency, Ramen employs an embedding-gradient cache that stores the embeddings and sample-level gradients of past test images. The stored embeddings are used to retrieve relevant samples, and the corresponding gradients are aggregated for model updates, eliminating the need for any additional forward or backward passes. Our theoretical analysis provides insight into why the proposed adaptation mechanism is effective under mixed-domain shifts. Experiments on multiple image corruption and domain-shift benchmarks demonstrate that Ramen achieves strong and consistent performance, offering robust and efficient adaptation in complex mixed-domain scenarios. Our code is available at https://github.com/baowenxuan/Ramen .
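A toy sketch in the spirit of the embedding-gradient cache described above: embeddings of past test samples support cosine-similarity retrieval, and the cached gradients of the retrieved samples are averaged into an update without any extra forward or backward pass. Class and parameter names are our own illustrative assumptions:

```python
import numpy as np

class EmbeddingGradientCache:
    """Toy cache of (embedding, sample-level gradient) pairs for past test data."""
    def __init__(self):
        self.embs, self.grads = [], []

    def add(self, emb, grad):
        self.embs.append(emb / np.linalg.norm(emb))  # store unit embeddings
        self.grads.append(grad)

    def retrieve_update(self, query, k=2):
        """Average the cached gradients of the k most similar past samples."""
        q = query / np.linalg.norm(query)
        sims = np.array([q @ e for e in self.embs])  # cosine similarities
        top = np.argsort(sims)[-k:]
        return np.mean([self.grads[i] for i in top], axis=0)

rng = np.random.default_rng(1)
cache = EmbeddingGradientCache()
for _ in range(5):                      # five previously seen test samples
    cache.add(rng.standard_normal(16), rng.standard_normal(4))
update = cache.retrieve_update(rng.standard_normal(16), k=2)
```

Retrieval criteria like domain consistency or prediction balance would replace the plain top-k selection here; the cache mechanics stay the same.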
[212] Interpretable facial dynamics as behavioral and perceptual traces of deepfakes
Timothy Joseph Murphy, Jennifer Cook, Hélio Clemente José Cuve
Main category: cs.CV
Abstract: Deepfake detection research has largely converged on deep learning approaches that, despite strong benchmark performance, offer limited insight into what distinguishes real from manipulated facial behavior. This study presents an interpretable alternative grounded in bio-behavioral features of facial dynamics and evaluates how computational detection strategies relate to human perceptual judgments. We identify core low-dimensional patterns of facial movement, from which temporal features characterizing spatiotemporal structure were derived. Traditional machine learning classifiers trained on these features achieved modest but significant above-chance deepfake classification, driven by higher-order temporal irregularities that were more pronounced in manipulated than real facial dynamics. Notably, detection was substantially more accurate for videos containing emotive expressions than those without. An emotional valence classification analysis further indicated that emotive signals are systematically degraded in deepfakes, explaining the differential impact of emotive dynamics on detection. Furthermore, we provide an additional and often overlooked dimension of explainability by assessing the relationship between model decisions and human perceptual detection. Model and human judgments converged for emotive but diverged for non-emotive videos, and even where outputs aligned, underlying detection strategies differed. These findings demonstrate that face-swapped deepfakes carry a measurable behavioral fingerprint, most salient during emotional expression. Additionally, model-human comparisons suggest that interpretable computational features and human perception may offer complementary rather than redundant routes to detection.
[213] Back to Source: Open-Set Continual Test-Time Adaptation via Domain Compensation
Yingkai Yang, Chaoqi Chen, Hui Huang
Main category: cs.CV
Abstract: Test-Time Adaptation (TTA) aims to mitigate distributional shifts between training and test domains during inference time. However, existing TTA methods fall short in the realistic scenario where models face both continually changing domains and the simultaneous emergence of unknown semantic classes, a challenging setting we term Open-set Continual Test-Time Adaptation (OCTTA). The coupling of domain and semantic shifts often collapses the feature space, severely degrading both classification and out-of-distribution detection. To tackle this, we propose DOmain COmpensation (DOCO), a lightweight and effective framework that robustly performs domain adaptation and OOD detection in a synergistic, closed loop. DOCO first performs dynamic, adaptation-conditioned sample splitting to separate likely ID from OOD samples. Then, using only the ID samples, it learns a domain compensation prompt by aligning feature statistics with the source domain, guided by a structural preservation regularizer that prevents semantic distortion. This learned prompt is then propagated to the OOD samples within the same batch, effectively isolating their semantic novelty for more reliable detection. Extensive experiments on multiple challenging benchmarks demonstrate that DOCO outperforms prior CTTA and OSTTA methods, establishing a new state-of-the-art for the demanding OCTTA setting.
[214] Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting
Avinash Paliwal, Adithya Iyer, Shivin Yadav, Muhammad Ali Afridi, Midhun Harikumar
Main category: cs.CV
Abstract: Precise camera control for reshooting dynamic videos is bottlenecked by the severe scarcity of paired multi-view data for non-rigid scenes. We overcome this limitation with a highly scalable self-supervised framework capable of leveraging internet-scale monocular videos. Our core contribution is the generation of pseudo multi-view training triplets, consisting of a source video, a geometric anchor, and a target video. We achieve this by extracting distinct smooth random-walk crop trajectories from a single input video to serve as the source and target views. The anchor is synthetically generated by forward-warping the first frame of the source with a dense tracking field, which effectively simulates the distorted point-cloud inputs expected at inference. Because our independent cropping strategy introduces spatial misalignment and artificial occlusions, the model cannot simply copy information from the current source frame. Instead, it is forced to implicitly learn 4D spatiotemporal structures by actively routing and re-projecting missing high-fidelity textures across distinct times and viewpoints from the source video to reconstruct the target. At inference, our minimally adapted diffusion transformer utilizes a 4D point-cloud derived anchor to achieve state-of-the-art temporal consistency, robust camera control, and high-fidelity novel view synthesis on complex dynamic scenes.
[215] From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media
Katharina Prasse, Steffen Jung, Isaac Bravo, Stefanie Walter, Patrick Knab, Christian Bartelt, Margret Keuper
Main category: cs.CV
Abstract: Social media platforms have become primary arenas for climate communication, generating millions of images and posts that - if systematically analysed - can reveal which communication strategies mobilise public concern and which fall flat. We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter) - a 1,038-image expert-annotated set and a larger corpus of over 1.2 million images, with 50,000 labels manually validated - spanning five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Among the models benchmarked, Gemini-3.1-flash-lite outperforms all others across all super-categories and both datasets, while the gap to open-weight models of moderate size remains relatively small. Beyond instance-level metrics, we advocate for distributional evaluation: VLM predictions can reliably recover population-level trends even when per-image accuracy is moderate, making them a viable starting point for discourse analysis at scale. We find that chain-of-thought reasoning reduces rather than improves performance, and that annotation-dimension-specific prompt design improves performance. We release tweet IDs and labels along with our code at https://github.com/KathPra/Codebooks2VLMs.git.
[216] SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery
Safouane El Ghazouali, Nicola Venturi, Michael Rueegsegger, Umberto Michelucci
Main category: cs.CV
Abstract: Recent advances in deep learning for remote sensing rely heavily on large annotated datasets, yet acquiring high-quality ground truth for geometric, radiometric, and multi-domain tasks remains costly and often infeasible. In particular, the lack of accurate depth annotations, controlled illumination variations, and multi-scale paired imagery limits progress in monocular depth estimation, domain adaptation, and super-resolution for aerial scenes. We present SyMTRS, a large-scale synthetic dataset generated using a high-fidelity urban simulation pipeline. The dataset provides high-resolution RGB aerial imagery (2048 x 2048), pixel-perfect depth maps, night-time counterparts for domain adaptation, and aligned low-resolution variants for super-resolution at x2, x4, and x8 scales. Unlike existing remote sensing datasets that focus on a single task or modality, SyMTRS is designed as a unified multi-task benchmark enabling joint research in geometric understanding, cross-domain robustness, and resolution enhancement. We describe the dataset generation process, its statistical properties, and its positioning relative to existing benchmarks. SyMTRS aims to bridge critical gaps in remote sensing research by enabling controlled experiments with perfect geometric ground truth and consistent multi-domain supervision. The results obtained in this work can be reproduced from this Github repository: https://github.com/safouaneelg/SyMTRS.
[217] TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval
Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie
Main category: cs.CV
Abstract: Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typically cover only a limited range of salient changes, which induces two limitations highly relevant to practical applications, namely Insufficient Entity Coverage and Clause-Entity Misalignment. In order to address these issues and bring CIR closer to real-world use, we construct two instruction-rich multi-modification datasets, M-FashionIQ and M-CIRR. In addition, we propose TEMA, the Text-oriented Entity Mapping Architecture, which is the first CIR framework designed for multi-modification while also accommodating simple modifications. Extensive experiments on four benchmark datasets demonstrate TEMA's superiority in both original and multi-modification scenarios, while maintaining an optimal balance between retrieval accuracy and computational efficiency. Our codes and constructed multi-modification datasets (M-FashionIQ and M-CIRR) are available at https://github.com/lee-zixu/ACL26-TEMA/.
[218] Multiscale Super Resolution without Image Priors
Daniel Fu, Gabby Litterio, Pedro Felzenszwalb, Rashid Zia
Main category: cs.CV
Abstract: We address the ambiguities in the super-resolution problem under translation. We demonstrate that combinations of low-resolution images at different scales can be used to make the super-resolution problem well posed. Such differences in scale can be achieved using sensors with different pixel sizes (as demonstrated here) or by varying the effective pixel size through changes in optical magnification (e.g., using a zoom lens). We show that images acquired with pairwise coprime pixel sizes lead to a system with a stable inverse, and furthermore, that super-resolution images can be reconstructed efficiently using Fourier domain techniques or iterative least squares methods. Our mathematical analysis provides an expression for the expected error of the least squares reconstruction for large signals assuming i.i.d. noise that elucidates the noise-resolution tradeoff. These results are validated through both one- and two-dimensional experiments that leverage charge-coupled device (CCD) hardware binning to explore reconstructions over a large range of effective pixel sizes. Finally, two-dimensional reconstructions for a series of targets are used to demonstrate the advantages of multiscale super-resolution, and implications of these results for common imaging systems are discussed.
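The claim that combining coprime pixel sizes yields a stable inverse can be illustrated with a toy 1-D least-squares reconstruction. The setup below, with shifted box filters of sizes 2 and 3 over a length-6 signal, is our own illustrative construction, not the paper's experiment:

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])   # ground-truth signal

def bin_rows(n, size, shift):
    """Measurement rows summing `size` consecutive samples, starting at `shift`
    (each row models one large pixel of the translated low-resolution sensor)."""
    rows = []
    for start in range(shift, n - size + 1, size):
        r = np.zeros(n)
        r[start:start + size] = 1.0
        rows.append(r)
    return rows

# Coprime pixel sizes 2 and 3, each observed at several translations
A = np.vstack(bin_rows(6, 2, 0) + bin_rows(6, 2, 1)
              + bin_rows(6, 3, 0) + bin_rows(6, 3, 1) + bin_rows(6, 3, 2))
y = A @ x                                       # noiseless observations
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_hat, x))                    # True: exact recovery
```

In this construction the combined system is full rank (rank 6 for 6 unknowns), so least squares recovers the signal exactly from the low-resolution sums; with either pixel size alone the system stays rank-deficient.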
[219] Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos
Bowen Liu, Li Yang, Shanshan Song, Mingyu Tang, Zhifang Gao, Qifeng Chen, Yangqiu Song, Huimin Chen, Xiaomeng Li
Main category: cs.CV
Abstract: Capsule endoscopy (CE) enables non-invasive gastrointestinal screening, but current CE research remains largely limited to frame-level classification and detection, leaving video-level analysis underexplored. To bridge this gap, we introduce and formally define a new task, diagnosis-driven CE video summarization, which requires extracting key evidence frames that cover clinically meaningful findings and making accurate diagnoses from those evidence frames. This setting is challenging because diagnostically relevant events are extremely sparse and can be overwhelmed by tens of thousands of redundant normal frames, while individual observations are often ambiguous due to motion blur, debris, specular highlights, and rapid viewpoint changes. To facilitate research in this direction, we introduce VideoCAP, the first CE dataset with diagnosis-driven annotations derived from real clinical reports. VideoCAP comprises 240 full-length videos and provides realistic supervision for both key evidence frame extraction and diagnosis. To address this task, we further propose DiCE, a clinician-inspired framework that mirrors the standard CE reading workflow. DiCE first performs efficient candidate screening over the raw video, then uses a Context Weaver to organize candidates into coherent diagnostic contexts that preserve distinct lesion events, and an Evidence Converger to aggregate multi-frame evidence within each context into robust clip-level judgments. Experiments show that DiCE consistently outperforms state-of-the-art methods, producing concise and clinically reliable diagnostic summaries. These results highlight diagnosis-driven contextual reasoning as a promising paradigm for ultra-long CE video summarization.
[220] Grounding Video Reasoning in Physical Signals
Alibay Osmanli, Zixu Cheng, Shaogang Gong
Main category: cs.CV
Abstract: Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what–when–where evaluation structure of V-STaR to four video sources, six physics domains, three prompt families (physics, vstar_like, and neutral_rstr), and four input conditions (original, shuffled, ablated, and frame-masked). The benchmark contains 1,560 base video clips from SSV2, YouCook2, HoloAssist, and Roundabout-TAU. Each clip is first converted into a shared grounded event record, and the three query families are derived from that record. Temporal and spatial targets are shared across prompt families, while the non-physics families use deterministic family-appropriate semantic a_what targets derived from the same record. Across models and prompt families, physics remains the strongest regime overall, vstar_like is the clearest non-physics semantic comparison, and neutral_rstr behaves as a harder templated control. Prompt-family robustness is selective rather than universal, perturbation gains cluster in weak original cases, and spatial grounding is the weakest across settings. These results suggest that video Q&A reasoning benchmarks should report physically grounded, prompt-aware, and perturbation-aware diagnostics alongside aggregate accuracy.
[221] Addressing Image Authenticity When Cameras Use Generative AI
Umar Masud, Abhijith Punnappurath, Luxi Zhao, David B. Lindell, Michael S. Brown
Main category: cs.CV
Abstract: The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness about the authenticity of images shared online. Interestingly, images captured directly by our cameras are considered authentic and faithful. However, with the increasing integration of deep-learning modules into cameras’ capture-time hardware – namely, the image signal processor (ISP) – there is now a potential for hallucinated content in images directly output by our cameras. Hallucinated capture-time image content is typically benign, such as enhanced edges or texture, but in certain operations, such as AI-based digital zoom or low-light image enhancement, hallucinations can potentially alter the semantics and interpretation of the image content. As a result, users may not realize that the content in their camera images is not authentic. This paper addresses this issue by enabling users to recover the ‘unhallucinated’ version of the camera image to avoid misinterpretation of the image content. Our approach works by optimizing an image-specific multi-layer perceptron (MLP) decoder together with a modality-specific encoder so that, given the camera image, we can recover the image before hallucinated content was added. The encoder and MLP are self-contained and can be applied post-capture to the image without requiring access to the camera ISP. Moreover, the encoder and MLP decoder require only 180 KB of storage and can be readily saved as metadata within standard image formats such as JPEG and HEIC.
[222] When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs
Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny, Mustafa Shukor, Alasdair Newson, Matthieu Cord
Main category: cs.CV
Abstract: Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the language component, yet the relative importance of these factors remains unclear. To resolve this ambiguity, we propose HalluScope, a benchmark to better understand the extent to which different factors induce hallucinations. Our analysis indicates that hallucinations largely stem from excessive reliance on textual priors and background knowledge, especially information introduced through textual instructions. To mitigate hallucinations induced by textual instruction priors, we propose HalluVL-DPO, a framework for fine-tuning off-the-shelf LVLMs towards more visually grounded responses. HalluVL-DPO leverages preference optimization using a curated training dataset that we construct, guiding the model to prefer grounded responses over hallucinated ones. We demonstrate that our optimized model effectively mitigates the targeted hallucination failure mode, while preserving or improving performance on other hallucination benchmarks and visual capability evaluations. To support reproducibility and further research, we will publicly release our evaluation benchmark, preference training dataset, and code at https://pegah-kh.github.io/projects/prompts-override-vision/ .
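The abstract does not spell out the exact preference-optimization objective; as a rough illustration of the kind of pairwise loss such fine-tuning typically uses, here is a standard DPO-style loss on one (grounded, hallucinated) response pair. All function names and numbers below are hypothetical, not taken from the paper:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    logp_*     : summed log-probability of the response under the tuned model
    ref_logp_* : same quantity under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin); smaller when the grounded answer is preferred
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A pair where the model already prefers the grounded response...
low = dpo_loss(-5.0, -9.0, ref_logp_w=-6.0, ref_logp_l=-8.0)
# ...costs less than one where it prefers the hallucinated response.
high = dpo_loss(-9.0, -5.0, ref_logp_w=-8.0, ref_logp_l=-6.0)
```

Minimizing this over many such pairs pushes probability mass toward visually grounded responses relative to the reference model, which is the general mechanism the abstract describes.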
[223] UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
Yanran Zhang, Wenzhao Zheng, Yifei Li, Bingyao Yu, Yu Zheng, Lei Chen, Jiwen Lu, Jie Zhou
Main category: cs.CV
Abstract: In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved distinct architectural paradigms: the former predominantly relies on generative networks, while the latter favors discriminative frameworks. A recent trend in both domains is the use of adversarial information to enhance performance, revealing potential for synergy. However, the significant architectural divergence between them presents considerable challenges. Departing from previous approaches, we propose UniGenDet: a Unified generative-discriminative framework for co-evolutionary image Generation and generated image Detection. To bridge the task gap, we design a symbiotic multimodal self-attention mechanism and a unified fine-tuning algorithm. This synergy allows the generation task to improve the interpretability of authenticity identification, while authenticity criteria guide the creation of higher-fidelity images. Furthermore, we introduce a detector-informed generative alignment mechanism to facilitate seamless information exchange. Extensive experiments on multiple datasets demonstrate that our method achieves state-of-the-art performance. Code: \href{https://github.com/Zhangyr2022/UniGenDet}{https://github.com/Zhangyr2022/UniGenDet}.
[224] Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision
Leyla Roksan Caglar, Pedro A. M. Mediano, Baihan Lin
Main category: cs.CV
Abstract: Humans and modern vision models can reach similar classification accuracy while making systematically different kinds of mistakes - differing not in how often they err, but in who gets mistaken for whom, and in which direction. We show that these directional confusions reveal distinct inductive biases that are invisible to accuracy alone. Using matched human and deep vision model responses on a natural-image categorization task under 12 perturbation types, we quantify asymmetry in confusion matrices and link it to generalization geometry through a Rate-Distortion (RD) framework, summarized by three geometric signatures: slope (beta), curvature (kappa), and efficiency (AUC). We find that humans exhibit broad but weak asymmetries, whereas deep vision models show sparser, stronger directional collapses. Robustness training reduces global asymmetry but fails to recover the human-like breadth-strength profile of graded similarity. Mechanistic simulations further show that different asymmetry organizations shift the RD frontier in opposite directions, even when matched for performance. Together, these results position directional confusions and RD geometry as compact, interpretable signatures of inductive bias under distribution shift.
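The paper's specific asymmetry measure is not given in the abstract; as a minimal sketch of what "directional confusion" means, one can split a confusion matrix's off-diagonal mass into symmetric and antisymmetric parts and score the directional fraction. The function below is a toy illustration, not the authors' metric:

```python
import numpy as np

def directional_asymmetry(C):
    """Toy asymmetry score for a confusion matrix C (rows: true, cols: predicted).

    Splits off-diagonal mass into symmetric and antisymmetric parts and
    returns the fraction carried by the antisymmetric (directional) part.
    """
    C = np.asarray(C, dtype=float)
    off = C - np.diag(np.diag(C))   # drop correct classifications
    anti = 0.5 * (off - off.T)      # purely directional confusions
    total = np.abs(off).sum()
    return np.abs(anti).sum() / total if total > 0 else 0.0

# Symmetric confusions (A <-> B equally often): no directionality.
sym = directional_asymmetry([[8, 2], [2, 8]])       # -> 0.0
# One-way collapse (A mistaken for B, never the reverse): fully directional.
one_way = directional_asymmetry([[8, 2], [0, 10]])  # -> 1.0
```

Both matrices have the same overall error count, which is the abstract's point: directionality is invisible to accuracy alone.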
[225] Vista4D: Video Reshooting with 4D Point Clouds
Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca, Yash Kant, Ryan Burgert, Yuancheng Xu, Koichi Namekata, Yiwei Zhao, Bolei Zhou, Micah Goldblum, Paul Debevec, Ning Yu
Main category: cs.CV
Abstract: We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video reshooting methods often struggle with depth estimation artifacts of real-world dynamic videos, while also failing to preserve content appearance and failing to maintain precise camera control for challenging new trajectories. We build a 4D-grounded point cloud representation with static pixel segmentation and 4D reconstruction to explicitly preserve seen content and provide rich camera signals, and we train with reconstructed multiview dynamic data for robustness against point cloud artifacts during real-world inference. Our results demonstrate improved 4D consistency, camera control, and visual quality compared to state-of-the-art baselines under a variety of videos and camera paths. Moreover, our method generalizes to real-world applications such as dynamic scene expansion and 4D scene recomposition. See our project page for results, code, and models: https://eyeline-labs.github.io/Vista4D
[226] Context Unrolling in Omni Models
Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan
Main category: cs.CV
Abstract: We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This process enables the model to aggregate complementary information across heterogeneous modalities, facilitating a more faithful approximation of the shared multimodal knowledge manifold and improving downstream reasoning fidelity. As a result, Omni achieves strong performance on both multimodal generation and understanding benchmarks, while demonstrating advanced multimodal reasoning capabilities, including in-context generation of text, image, video, and 3D geometry.
[227] Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs
Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang
Main category: cs.CV
Abstract: Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. For this we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene dynamics. IMU-to-4D uses data from a few inertial sensors from earbuds, watches, or smartphones and predicts detailed 4D human motion together with coarse scene structure. Experiments across diverse human-scene datasets show that IMU-to-4D yields more coherent and temporally stable results than SoTA cascaded pipelines, suggesting wearable motion sensors alone can support rich 4D understanding.
[228] Seeing Fast and Slow: Learning the Flow of Time in Videos
Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
Main category: cs.CV
Abstract: How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in videos. We first exploit the multimodal cues and temporal structure naturally present in videos to learn, in a self-supervised manner, to detect speed changes and estimate playback speed. We then show that these learned temporal reasoning models enable us to curate the largest slow-motion video dataset to date from noisy in-the-wild sources. Such slow-motion footage, typically filmed by high-speed cameras, contains substantially richer temporal detail than standard videos. Using this data, we further develop models capable of temporal control, including speed-conditioned video generation, which produces motion at specified playback speed, and temporal super-resolution, which transforms low-FPS, blurry videos into high-FPS sequences with fine-grained temporal details. Our findings highlight time as a manipulable, perceptual dimension in video learning, opening doors to temporally controllable video generation, temporal forensics detection, and potentially richer world-models that understand how events unfold over time.
[229] VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli
Main category: cs.CV
Abstract: Vision Large Language Models (VLLMs) are widely acknowledged to be prone to hallucinations. Existing research addressing this problem has primarily been confined to image inputs, with limited exploration of video-based hallucinations. Furthermore, current evaluation methods fail to capture nuanced errors in generated responses, which are often exacerbated by the rich spatiotemporal dynamics of videos. To address this, we introduce VidHal, a benchmark specially designed to evaluate video-based hallucinations in VLLMs. VidHal is constructed by bootstrapping video instances across a wide range of common temporal aspects. A defining feature of our benchmark lies in the careful creation of captions which represent varying levels of hallucination associated with each video. To enable fine-grained evaluation, we propose a novel caption ordering task requiring VLLMs to rank captions by hallucinatory extent. We conduct extensive experiments on VidHal and comprehensively evaluate a broad selection of models. Our results uncover significant limitations in existing VLLMs regarding hallucination generation. Through our benchmark, we aim to inspire further research on 1) holistic understanding of VLLM capabilities, particularly regarding hallucination, and 2) extensive development of advanced VLLMs to alleviate this problem.
[230] SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation
Guoan Xu, Jiaming Chen, Wenfeng Huang, Wenjing Jia, Guangwei Gao, Guo-Jun Qi
Main category: cs.CV
Abstract: The Vision Transformer (ViT) has achieved notable success in computer vision, with its variants widely validated across various downstream tasks, including semantic segmentation. However, as general-purpose visual encoders, ViT backbones often do not fully address the specific requirements of task decoders, highlighting opportunities for designing decoders optimized for efficient semantic segmentation. This paper proposes Strip Cross-Attention (SCASeg), an innovative decoder head specifically designed for semantic segmentation. Instead of relying on the conventional skip connections, we utilize lateral connections between encoder and decoder stages, leveraging encoder features as Queries in cross-attention modules. Additionally, we introduce a Cross-Layer Block (CLB) that integrates hierarchical feature maps from various encoder and decoder stages to form a unified representation for Keys and Values. The CLB also incorporates the local perceptual strengths of convolution, enabling SCASeg to capture both global and local context dependencies across multiple layers, thus enhancing feature interaction at different scales and improving overall efficiency. To further optimize computational efficiency, SCASeg compresses the channels of queries and keys into one dimension, creating strip-like patterns that reduce memory usage and increase inference speed compared to traditional vanilla cross-attention. Experiments show that SCASeg’s adaptable decoder delivers competitive performance across various setups, outperforming leading segmentation architectures on benchmark datasets, including ADE20K, Cityscapes, COCO-Stuff 164k, and Pascal VOC2012, even under diverse computational constraints.
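The core efficiency trick in the abstract, compressing query and key channels to one dimension so the attention-score matrix becomes cheap to form, can be sketched in a few lines. This is an illustrative toy (random projections, no learned weights or multi-head structure), not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def strip_cross_attention(q_feats, kv_feats, seed=0):
    """Toy strip cross-attention.

    Queries and keys are projected down to ONE channel, so forming the
    (N_q x N_kv) score matrix costs O(N_q * N_kv) instead of O(N_q * N_kv * C).
    Values keep their full C channels. Shapes: q_feats (N_q, C), kv_feats (N_kv, C).
    """
    rng = np.random.default_rng(seed)
    C = q_feats.shape[1]
    w_q = rng.standard_normal((C, 1)) / np.sqrt(C)  # channel-compressing projections
    w_k = rng.standard_normal((C, 1)) / np.sqrt(C)
    q = q_feats @ w_q                 # (N_q, 1)  "strip" queries
    k = kv_feats @ w_k                # (N_kv, 1) "strip" keys
    scores = q @ k.T                  # (N_q, N_kv), cheap to compute
    return softmax(scores, axis=-1) @ kv_feats  # (N_q, C) attended output

# Encoder features attend over decoder-stage features (or vice versa).
out = strip_cross_attention(np.ones((16, 32)), np.ones((64, 32)))
```

The output shape matches a vanilla cross-attention, but the score computation touches only one channel per token, which is where the memory and speed savings the abstract claims would come from.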
[231] DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Ziyang Song, Zerong Wang, Bo Li, Hao Zhang, Ruijie Zhu, Li Liu, Peng-Tao Jiang, Tianzhu Zhang
Main category: cs.CV
Abstract: Monocular depth estimation within the diffusion-denoising paradigm demonstrates impressive generalization ability but suffers from low inference speed. Recent methods adopt a single-step deterministic paradigm to improve inference efficiency while maintaining comparable performance. However, they overlook the gap between generative and discriminative features, leading to suboptimal results. In this work, we propose DepthMaster, a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task. First, to mitigate overfitting to texture details introduced by generative features, we propose a Feature Alignment module, which incorporates high-quality semantic features to enhance the denoising network’s representation capability. Second, to address the lack of fine-grained details in the single-step deterministic framework, we propose a Fourier Enhancement module to adaptively balance low-frequency structure and high-frequency details. We adopt a two-stage training strategy to fully leverage the potential of the two modules. In the first stage, we focus on learning the global scene structure with the Feature Alignment module, while in the second stage, we exploit the Fourier Enhancement module to improve the visual quality. Through these efforts, our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets. Our project page can be found at https://indu1ge.github.io/DepthMaster_page.
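The abstract describes the Fourier Enhancement module only at a high level, balancing low-frequency structure against high-frequency detail. A minimal frequency-domain blend in that spirit (a hard circular mask rather than anything adaptive, and purely illustrative, not the paper's module) might look like:

```python
import numpy as np

def fourier_blend(structure, detail, cutoff=0.1):
    """Toy frequency-domain fusion of two HxW arrays: keep low frequencies
    from `structure` and high frequencies from `detail`."""
    H, W = structure.shape
    fs = np.fft.fftshift(np.fft.fft2(structure))
    fd = np.fft.fftshift(np.fft.fft2(detail))
    yy, xx = np.mgrid[:H, :W]
    r = np.hypot(yy - H / 2, xx - W / 2)      # radial frequency
    low = r <= cutoff * min(H, W)             # circular low-pass mask
    fused = np.where(low, fs, fd)
    return np.real(np.fft.ifft2(np.fft.ifftshift(fused)))

rng = np.random.default_rng(0)
# Global level comes from `structure`; fine variation comes from `detail`.
img = fourier_blend(np.ones((32, 32)), rng.standard_normal((32, 32)))
```

A learned version would replace the hard mask with adaptive per-frequency weights, but the blending principle, low-frequency scene structure from one branch and high-frequency detail from another, is the same.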
[232] Geometry-aided Vision-based Localization of Future Mars Helicopters in Challenging Illumination Conditions
Dario Pisanti, Robert Hewitt, Roland Brockers, Georgios Georgakis
Main category: cs.CV
Abstract: Planetary exploration using aerial assets has the potential for unprecedented scientific discoveries on Mars. While NASA’s Mars helicopter Ingenuity proved flight in Martian atmosphere is possible, future Mars rotorcraft will require advanced navigation capabilities for long-range flights. One such critical capability is Map-based Localization (MbL) which registers an onboard image to a reference map during flight to mitigate cumulative drift from visual odometry. However, significant illumination differences between rotorcraft observations and a reference map prove challenging for traditional MbL systems, restricting the operational window of the vehicle. In this work, we investigate a new MbL system and propose Geo-LoFTR, a geometry-aided deep learning model for image registration that is more robust under large illumination differences than prior models. The system is supported by a custom simulation framework that uses real orbital maps to produce large amounts of realistic images of the Martian terrain. Comprehensive evaluations show that our proposed system outperforms prior MbL efforts in terms of localization accuracy under significant lighting and scale variations. Furthermore, we demonstrate the validity of our approach across a simulated Martian day and on real Mars imagery. Code and datasets are available at: https://dpisanti.github.io/geo-loftr/.
[233] Information Bottleneck-Guided Heterogeneous Graph Learning for Interpretable Neurodevelopmental Disorder Diagnosis
Yueyang Li, Lei Chen, Wenhao Dong, Shengyu Gong, Zijian Kang, Boyang Wei, Weiming Zeng, Hongjie Yan, Lingbin Bian, Zhiguo Zhang, Wai Ting Siok, Nizhuan Wang
Main category: cs.CV
Abstract: Developing interpretable models for neurodevelopmental disorders (NDDs) diagnosis presents significant challenges in effectively encoding, decoding, and integrating multimodal neuroimaging data. While many existing machine learning approaches have shown promise in brain network analysis, they typically suffer from limited interpretability, particularly in extracting meaningful biomarkers from functional magnetic resonance imaging (fMRI) data and establishing clear relationships between imaging features and demographic characteristics. Besides, current graph neural network methodologies face limitations in capturing both local and global functional connectivity patterns while simultaneously achieving theoretically principled multimodal data fusion. To address these challenges, we propose the Interpretable Information Bottleneck Heterogeneous Graph Neural Network (I2B-HGNN), a unified framework that applies information bottleneck principles to guide both brain connectivity modeling and cross-modal feature integration. This framework comprises two complementary components. The first is the Information Bottleneck Graph Transformer (IBGraphFormer), which combines transformer-based global attention mechanisms with graph neural networks through information bottleneck-guided pooling to identify sufficient biomarkers. The second is the Information Bottleneck Heterogeneous Graph Attention Network (IB-HGAN), which employs meta-path-based heterogeneous graph learning with structural consistency constraints to achieve interpretable fusion of neuroimaging and demographic data. The experimental results demonstrate that I2B-HGNN achieves superior performance in diagnosing NDDs, exhibiting both high classification accuracy and the ability to provide interpretable biomarker identification while effectively analyzing non-imaging data.
[234] Anatomy-Aware Text-Visual Fusion with Dual-Perspective Prompts for Fine-Grained Lumbar Spine Segmentation
Sheng Lian, Jianlong Cai, Dengfeng Pan, Guang-Yong Chen, Hao Xu, Fan Zhang, Guodong Fan, Shuo Li
Main category: cs.CV
Abstract: Accurate lumbar spine segmentation is crucial for diagnosing spinal disorders. Existing methods typically use coarse-grained segmentation strategies that lack the fine detail needed for precise diagnosis. Additionally, their reliance on visual-only models hinders the capture of anatomical semantics, leading to misclassified categories and poor segmentation details. To address these limitations, we present ATM-Net, an innovative framework that employs an anatomy-aware, text-guided, multi-modal fusion mechanism for fine-grained segmentation of lumbar substructures, i.e., vertebrae (VBs), intervertebral discs (IDs), and spinal canal (SC). ATM-Net adopts the Anatomy-aware Text Prompt Generator (ATPG) to adaptively convert image annotations into anatomy-aware prompts in different views. These insights are further integrated with image features via the Holistic Anatomy-aware Semantic Fusion (HASF) module, building a comprehensive anatomical context. The Channel-wise Contrastive Anatomy-Aware Enhancement (CCAE) module further enhances class discrimination and refines segmentation through class-wise channel-level multi-modal contrastive learning. Extensive experiments on the MRSpineSeg and SPIDER datasets demonstrate that ATM-Net significantly outperforms state-of-the-art methods, with consistent improvements regarding class discrimination and segmentation details. For example, ATM-Net achieves Dice of 79.39% and HD95 of 9.91 pixels on SPIDER, outperforming the competitive SpineParseNet by 8.31% and 4.14 pixels, respectively.
[235] Beyond the Frame: Generating 360 Panoramic Videos from Perspective Videos
Rundong Luo, Matthew Wallingford, Ali Farhadi, Noah Snavely, Wei-Chiu Ma
Main category: cs.CV
Abstract: 360° videos have emerged as a promising medium to represent our dynamic visual world. Compared to the “tunnel vision” of standard cameras, their borderless field of view offers a more complete perspective of our surroundings. While existing video models excel at producing standard videos, their ability to generate full panoramic videos remains elusive. In this paper, we investigate the task of video-to-360° generation: given a perspective video as input, our goal is to generate a full panoramic video that is consistent with the original video. Unlike conventional video generation tasks, the output’s field of view is significantly larger, and the model is required to have a deep understanding of both the spatial layout of the scene and the dynamics of objects to maintain spatio-temporal consistency. To address these challenges, we first leverage the abundant 360° videos available online and develop a high-quality data filtering pipeline to curate pairwise training data. We then carefully design a series of geometry- and motion-aware operations to facilitate the learning process and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective video. In addition, we showcase its potential applications, including video stabilization, camera viewpoint control, and interactive visual question answering.
[236] APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds
Yuan Gao, Shaobo Xia, Sheng Nie, Cheng Wang, Xiaohuan Xi, Bisheng Yang
Main category: cs.CV
Abstract: Not available (arXiv:2505.09971).
[237] LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval
Zhenyu Ning, Guangda Liu, Qihao Jin, Chengwei Li, Wenchao Ding, Minyi Guo, Jieru Zhao
Main category: cs.CV
Abstract: Not available (arXiv:2505.15269).
[238] FunduSegmenter: Leveraging the RETFound Foundation Model for Joint Optic Disc and Optic Cup Segmentation in Retinal Fundus Images
Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco
Main category: cs.CV
Abstract: Not available (arXiv:2508.11354).
[239] TV Subgradient-Guided Multi-Source Fusion for Spectral Imaging in Dual-Camera CASSI Systems
Weiqiang Zhao, Tianzhu Liu, Yuzhe Gui, Wei Bian, Yanfeng Gu
Main category: cs.CV
Abstract: Not available (arXiv:2509.10897).
[240] Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
Zilun Zhang, Zian Guan, Tiancheng Zhao, Haozhan Shen, Tianyu Li, Yuxiang Cai, Zhonggen Su, Zhaojun Liu, Jianwei Yin, Xiang Li
Main category: cs.CV
Abstract: Not available (arXiv:2509.21976).
[241] Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge
Max Kirchner, Hanna Hoffmann, Alexander C. Jenke, Oliver L. Saldanha, Kevin Pfeiffer, Weam Kanjo, Julia Alekseenko, Claas de Boer, Santhi Raj Kolamuri, Lorenzo Mazza, Nicolas Padoy, Sophia Bano, Annika Reinke, Lena Maier-Hein, Danail Stoyanov, Jakob N. Kather, Fiona R. Kolbinger, Sebastian Bodenstedt, Stefanie Speidel
Main category: cs.CV
Abstract: Not available (arXiv:2510.04772).
[242] Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis
Arnela Hadzic, Simon Johannes Joham, Martin Urschler
Main category: cs.CV
Abstract: Not available (arXiv:2510.04823).
[243] Accelerating Vision Transformers with Adaptive Patch Sizes
Rohan Choudhury, JungEun Kim, Jinhyung Park, Eunho Yang, László A. Jeni, Kris M. Kitani
Main category: cs.CV
Abstract: Not available (arXiv:2510.18091).
[244] VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
Tianci Bi, Xiaoyi Zhang, Yan Lu, Nanning Zheng
Main category: cs.CV
Abstract: Not available (arXiv:2510.18457).
[245] When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA
Luca Carlini, Dennis Pierantozzi, Mauro Orazio Drago, Chiara Lena, Cesare Hassan, Elena De Momi, Danail Stoyanov, Sophia Bano, Mobarak I. Hoque
Main category: cs.CV
Abstract: Not available (arXiv:2511.01458).
[246] PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing
Antonio Oroz, Matthias Nießner, Tobias Kirschstein
Main category: cs.CV
Abstract: Not available (arXiv:2511.02777).
[247] Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution
Sichen Guo, Wenjie Li, Yuanyang Liu, Guangwei Gao, Jian Yang, Chia-Wen Lin
Main category: cs.CV
Abstract: Not available (arXiv:2511.03232).
[248] SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
Mauro Orazio Drago, Luca Carlini, Pelinsu Celebi Balyemez, Dennis Pierantozzi, Chiara Lena, Cesare Hassan, Danail Stoyanov, Elena De Momi, Sophia Bano, Mobarak I. Hoque
Main category: cs.CV
Abstract: Unavailable for arXiv:2511.03325 (summary request rate-limited, HTTP 429).
[249] VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping
Haotian Dong, Ye Li, Rongwei Lu, Chen Tang, Shu-Tao Xia, Zhi Wang
Main category: cs.CV
Abstract: Unavailable for arXiv:2511.13587 (summary request rate-limited, HTTP 429).
[250] SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors
Ruijie Fan, Junyan Ye, Huan Chen, Zilong Huang, Xiaolei Wang, Weijia Li
Main category: cs.CV
Abstract: Unavailable for arXiv:2511.18264 (summary request rate-limited, HTTP 429).
[251] LRDUN: A Low-Rank Deep Unfolding Network for Efficient Spectral Compressive Imaging
He Huang, Yujun Guo, Wei He
Main category: cs.CV
Abstract: Unavailable for arXiv:2511.18513 (summary request rate-limited, HTTP 429).
[252] PAT3D: Physics-Augmented Text-to-3D Scene Generation
Guying Lin, Kemeng Huang, Michael Liu, Ruihan Gao, Hanke Chen, Lyuhao Chen, Beijia Lu, Taku Komura, Yuan Liu, Jun-Yan Zhu, Minchen Li
Main category: cs.CV
Abstract: Unavailable for arXiv:2511.21978 (summary request rate-limited, HTTP 429).
[253] Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection
Jessica Plassmann, Nicolas Schuler, Michael Schuth, Georg von Freymann
Main category: cs.CV
Abstract: Unavailable for arXiv:2512.06171 (summary request rate-limited, HTTP 429).
[254] UbiQVision: Quantifying Uncertainty in XAI for Image Recognition
Akshat Dubey, Aleksandar Anžel, Bahar İlgen, Georges Hattab
Main category: cs.CV
Abstract: Unavailable for arXiv:2512.20288 (summary request rate-limited, HTTP 429).
[255] GeCo: Evaluating Geometric Consistency for Video Generation via Motion and Structure
Leslie Gu, Junhwa Hur, Charles Herrmann, Fangneng Zhan, Todd Zickler, Deqing Sun, Hanspeter Pfister
Main category: cs.CV
Abstract: Unavailable for arXiv:2512.22274 (summary request rate-limited, HTTP 429).
[256] What’s Left Unsaid? Detecting and Correcting Misleading Omissions in Multimodal News Previews
Fanxiao Li, Jiaying Wu, Tingchao Fu, Dayang Li, Herun Wan, Wei Zhou, Min-Yen Kan
Main category: cs.CV
Abstract: Unavailable for arXiv:2601.05563 (summary request rate-limited, HTTP 429).
[257] ATATA: One Algorithm to Align Them All
Boyi Pang, Savva Ignatyev, Vladimir Ippolitov, Ramil Khafizov, Yurii Melnik, Oleg Voynov, Maksim Nakhodnov, Aibek Alanov, Xiaopeng Fan, Peter Wonka, Evgeny Burnaev
Main category: cs.CV
Abstract: Unavailable for arXiv:2601.11194 (summary request rate-limited, HTTP 429).
[258] Bridging Supervision Gaps: A Unified Framework for Remote Sensing Change Detection
Kaixuan Jiang, Chen Wu, Zhenghui Zhao, Chengxi Han, Haonan Guo, Hongruixuan Chen
Main category: cs.CV
Abstract: Unavailable for arXiv:2601.17747 (summary request rate-limited, HTTP 429).
[259] Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning
Judith Vilella-Cantos, Mauro Martini, Marcello Chiaberge, Mónica Ballesta, David Valiente
Main category: cs.CV
Abstract: Unavailable for arXiv:2601.18714 (summary request rate-limited, HTTP 429).
[260] DAVIS: OOD Detection via Dominant Activations and Variance for Increased Separation
Abid Hassan, Tuan Ngo, Saad Shafiq, Nenad Medvidovic
Main category: cs.CV
Abstract: Unavailable for arXiv:2601.22703 (summary request rate-limited, HTTP 429).
[261] Catalyst: Out-of-Distribution Detection via Elastic Scaling
Abid Hassan, Tuan Ngo, Saad Shafiq, Nenad Medvidovic
Main category: cs.CV
Abstract: Unavailable for arXiv:2602.02409 (summary request rate-limited, HTTP 429).
[262] MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose
Main category: cs.CV
Abstract: Unavailable for arXiv:2602.18792 (summary request rate-limited, HTTP 429).
[263] Efficient Multi-Source Knowledge Transfer by Model Merging
Marcin Osial, Bartosz Wójcik, Bartosz Zieliński, Sebastian Cygert
Main category: cs.CV
Abstract: Unavailable for arXiv:2508.19353 (summary request rate-limited, HTTP 429).
[264] Fusion Complexity Inversion: Why Simpler Cross View Modules Outperform SSMs and Cross View Attention Transformers for Pasture Biomass Regression
Mridankan Mandal
Main category: cs.CV
Abstract: Unavailable for arXiv:2603.07819 (summary request rate-limited, HTTP 429).
[265] SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation
Jiaye Feng, Qixiang Yin, Yuankun Liu, Tong Mo, Weiping Li
Main category: cs.CV
Abstract: Unavailable for arXiv:2603.07961 (summary request rate-limited, HTTP 429).
[266] Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation
Fei Wang, Xinye Zheng, Kun Li, Yanyan Wei, Yuxin Liu, Ganpeng Hu, Tong Bao, Jingwen Yang
Main category: cs.CV
Abstract: Unavailable for arXiv:2603.12845 (summary request rate-limited, HTTP 429).
[267] RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation
Sen Zhang, Runmei Li, Shizhuang Deng, Zhichao Zheng, Yuhe Zhang, Jiani Li, Kailun Zhang, Tao Zhang, Wenjun Wu, Qunbo Wang
Main category: cs.CV
Abstract: Unavailable for arXiv:2603.27112 (summary request rate-limited, HTTP 429).
[268] VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models
Ravi Ranjan, Agoritsa Polyzou
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.03956 (summary request rate-limited, HTTP 429).
[269] ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks
Jiayang Xu, Fan Zhuo, Majun Zhang, Changhao Pan, Zehan Wang, Siyu Chen, Xiaoda Yang, Tao Jin, Zhou Zhao
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.07958 (summary request rate-limited, HTTP 429).
[270] StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding
Junxi Wang, Te Sun, Jiayi Zhu, Junxian Li, Haowen Xu, Zichen Wen, Xuming Hu, Zhiyu Li, Linfeng Zhang
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.09000 (summary request rate-limited, HTTP 429).
[271] FastSHADE: Fast Self-augmented Hierarchical Asymmetric Denoising for Efficient inference on mobile devices
Nikolay Falaleev
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.10275 (summary request rate-limited, HTTP 429).
[272] Dehaze-then-Splat: Generative Dehazing with Physics-Informed 3D Gaussian Splatting for Smoke-Free Novel View Synthesis
Boss Chen, Hanqing Wang
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.13589 (summary request rate-limited, HTTP 429).
[273] Find the Differences: Differential Morphing Attack Detection vs Face Recognition
Una M. Kelly, Luuk J. Spreeuwers, Raymond N.J. Veldhuis
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.14734 (summary request rate-limited, HTTP 429).
[274] PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding
Junjie Wen, Junlin He, Fei Ma, Jinqiang Cui
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.15770 (summary request rate-limited, HTTP 429).
[275] A Lightweight Transformer for Pain Recognition from Brain Activity
Stefanos Gkikas, Christian Arzate Cruz, Yu Fang, Lu Cao, Muhammad Umar Khan, Thomas Kassiotis, Giorgos Giannakakis, Raul Fernandez Rojas, Randy Gomez
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.16491 (summary request rate-limited, HTTP 429).
[276] ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes
Honglin Chen, Karran Pandey, Rundi Wu, Matheus Gadelha, Yannick Hold-Geoffroy, Ayush Tewari, Niloy J. Mitra, Changxi Zheng, Paul Guerrero
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.17623 (summary request rate-limited, HTTP 429).
[277] TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
Lingyu Jiang, Lingyu Xu, Peiran Li, Dengzhe Hou, Qianwen Ge, Dingyi Zhuang, Shuo Xing, Wenjing Chen, Xiangbo Gao, Ting-Hsuan Chen, Xueying Zhan, Xin Zhang, Ziming Zhang, Zhengzhong Tu, Michael Zielewski, Kazunori Yamada, Fangzhou Lin
Main category: cs.CV
Abstract: Unavailable for arXiv:2511.18539 (summary request rate-limited, HTTP 429).
[278] E3VS-Bench: A Benchmark for Viewpoint-Dependent Active Perception in 3D Gaussian Splatting Scenes
Koya Sakamoto, Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Shu Morikuni, Naoya Chiba, Motoaki Kawanabe, Yusuke Iwasawa, Yutaka Matsuo
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.17969 (summary request rate-limited, HTTP 429).
[279] PC2Model: ISPRS benchmark on 3D point cloud to model registration
Mehdi Maboudi, Said Harb, Jackson Ferrao, Kourosh Khoshelham, Yelda Turkan, Karam Mawas
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.19596 (summary request rate-limited, HTTP 429).
[280] CrackForward: Context-Aware Severity Stage Crack Synthesis for Data Augmentation
Nassim Sadallah, Mohand Saïd Allili
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.19941 (summary request rate-limited, HTTP 429).
[281] Semantic-Fast-SAM: Efficient Semantic Segmenter
Byunghyun Kim
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.20169 (summary request rate-limited, HTTP 429).
[282] Fourier Series Coder: A Novel Perspective on Angle Boundary Discontinuity Problem for Oriented Object Detection
Minghong Wei, Pu Cao, Zhihao Chen, Zhiyuan Zang, Lu Yang, Qing Song
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.20281 (summary request rate-limited, HTTP 429).
[283] RefAerial: A Benchmark and Approach for Referring Detection in Aerial Images
Guyue Hu, Hao Song, Yuxing Tong, Duzhi Yuan, Dengdi Sun, Aihua Zheng, Chenglong Li, Jin Tang
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.20543 (summary request rate-limited, HTTP 429).
[284] Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.20730 (summary request rate-limited, HTTP 429).
[285] Demystifying Action Space Design for Robotic Manipulation Policies
Yuchun Feng, Jinliang Zheng, Zhihao Wang, Dongxiu Liu, Jianxiong Li, Jiangmiao Pang, Tai Wang, Xianyuan Zhan
Main category: cs.CV
Abstract: Unavailable for arXiv:2602.23408 (summary request rate-limited, HTTP 429).
[286] Human Presence Detection via Wi-Fi Range-Filtered Doppler Spectrum on Commodity Laptops
Jessica Sanson, Rahul C. Shah, Valerio Frascolla
Main category: cs.CV
Abstract: Unavailable for arXiv:2603.10845 (summary request rate-limited, HTTP 429).
[287] Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Christian Belardi, Justin Lovelace, Kilian Q. Weinberger, Carla P. Gomes
Main category: cs.CV
Abstract: Unavailable for arXiv:2603.16797 (summary request rate-limited, HTTP 429).
[288] Unsharp Measurement with Adaptive Gaussian POVMs for Quantum-Inspired Image Processing
Debashis Saikia, Bikash K. Behera, Mayukha Pal, Prasanta K. Panigrahi
Main category: cs.CV
Abstract: Unavailable for arXiv:2604.04685 (summary request rate-limited, HTTP 429).
cs.AI
[289] Architecture of an AI-Based Automated Course of Action Generation System for Military Operations
Ji-il Park, Inwook Shim, Chong Hui Kim
Main category: cs.AI
Abstract: Automated Course of Action (CoA) planning is an essential element of future warfare. As maneuver speeds increase and surveillance and weapon ranges grow, the operational area expands, making traditional manned CoA planning increasingly challenging and AI-based automation increasingly necessary. Accordingly, several countries and defense organizations are actively developing AI-based CoA planning systems; however, because these systems are military-related and subject to security restrictions, their details are rarely disclosed, making their technical maturity difficult to assess. In response, this study introduces the relevant doctrines within the scope of publicly available information, presents applicable AI technologies for each stage of the CoA planning process, and ultimately proposes an architecture for the development of an automated CoA planning system.
[290] Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
Michael O’Herlihy, Rosa Català
Main category: cs.AI
Abstract: Content moderation systems are typically evaluated by measuring agreement with human labels. In rule-governed environments this assumption fails: multiple decisions may be logically consistent with the governing policy, and agreement metrics penalize valid decisions while mischaracterizing ambiguity as error – a failure mode we term the Agreement Trap. We formalize evaluation as policy-grounded correctness and introduce the Defensibility Index (DI) and Ambiguity Index (AI). To estimate reasoning stability without additional audit passes, we introduce the Probabilistic Defensibility Signal (PDS), derived from audit-model token logprobs. We harness LLM reasoning traces as a governance signal rather than a classification output by deploying the audit model not to decide whether content violates policy, but to verify whether a proposed decision is logically derivable from the governing rule hierarchy. We validate the framework on 193,000+ Reddit moderation decisions across multiple communities and evaluation cohorts, finding a 33-46.6 percentage-point gap between agreement-based and policy-grounded metrics, with 79.8-80.6% of the model’s false negatives corresponding to policy-grounded decisions rather than true errors. We further show that measured ambiguity is driven by rule specificity: auditing 37,286 identical decisions under three tiers of the same community rules reduces AI by 10.8 pp while DI remains stable. Repeated-sampling analysis attributes PDS variance primarily to governance ambiguity rather than decoding noise. A Governance Gate built on these signals achieves 78.6% automation coverage with 64.9% risk reduction. Together, these results show that evaluation in rule-governed environments should shift from agreement with historical labels to reasoning-grounded validity under explicit rules.
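The PDS described above is derived from audit-model token log-probabilities rather than from extra audit passes. The abstract does not give the exact formulation; a minimal sketch, assuming the signal is simply the renormalized probability mass the audit model places on a "defensible" verdict token (the function names and the 0.8 gate threshold are illustrative, not taken from the paper):

```python
import math

def probabilistic_defensibility_signal(verdict_logprobs):
    """Toy PDS: probability mass the audit model assigns to the
    'defensible' verdict, recovered from its token logprobs and
    renormalized over the verdict vocabulary (hypothetical shape;
    the paper's exact inputs may differ)."""
    probs = {tok: math.exp(lp) for tok, lp in verdict_logprobs.items()}
    total = sum(probs.values())
    return probs.get("defensible", 0.0) / total

def governance_gate(pds, threshold=0.8):
    """Auto-accept a decision only when the signal clears a threshold;
    route everything else to human review (threshold is illustrative)."""
    return "automate" if pds >= threshold else "human_review"

signal = probabilistic_defensibility_signal(
    {"defensible": -0.2, "not_defensible": -1.8})
print(round(signal, 3), governance_gate(signal))
```

A gate of this shape is what would trade off automation coverage against risk: raising the threshold sends more decisions to human review.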
[291] Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha
Main category: cs.AI
Abstract: Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities: they demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability, and games provide a natural instance of such environments. Large Language Models (LLMs) offer a promising alternative as game-playing agents, but they often struggle with consistent long-horizon decision making because they lack a mechanism to discover, retain, and reuse structured skills across episodes. We present COSPLAY, a co-evolution framework in which an LLM decision agent retrieves skills from a learnable skill bank to guide action taking, while an agent-managed skill pipeline discovers reusable skills from the agent's unlabeled rollouts to form the bank. The framework jointly improves the decision agent, which learns better skill retrieval and action generation, and the skill-bank agent, which continually extracts, refines, and updates skills together with their contracts. Experiments across six game environments show that COSPLAY with an 8B base model achieves over 25.1 percent average reward improvement over four frontier LLM baselines on single-player game benchmarks while remaining competitive on multi-player social reasoning games.
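The skill bank is only sketched in the abstract; as an illustration, a toy bank with keyword-overlap retrieval and running-mean success scores attached to each skill's "contract" might look like the following (the class, the retrieval heuristic, and the example skills are hypothetical stand-ins, not COSPLAY's actual mechanism):

```python
class SkillBank:
    """Toy skill bank: skills keyed by name, each with a usage
    'contract' (textual precondition) and a running success score."""
    def __init__(self):
        self.skills = {}

    def add(self, name, contract):
        self.skills.setdefault(
            name, {"contract": contract, "score": 0.0, "uses": 0})

    def retrieve(self, keywords, k=2):
        # Naive retrieval: rank skills by keyword overlap with contracts.
        def overlap(name):
            return len(set(keywords) & set(self.skills[name]["contract"].split()))
        return sorted(self.skills, key=overlap, reverse=True)[:k]

    def update(self, name, reward):
        # Running mean of rewards observed when the skill was used.
        s = self.skills[name]
        s["uses"] += 1
        s["score"] += (reward - s["score"]) / s["uses"]

bank = SkillBank()
bank.add("open_door", "use key to open locked door")
bank.add("craft_torch", "combine wood and coal to craft torch")
print(bank.retrieve(["locked", "door", "key"]))  # open_door ranks first
bank.update("open_door", 1.0)
```

In the paper's loop, retrieval conditions the decision agent's action generation, and the skill-bank agent, not a fixed heuristic, performs the extraction and refinement.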
[292] Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models
Inderjeet Nair, Jie Ruan, Lu Wang
Main category: cs.AI
Abstract: Alignment faking, where a model behaves aligned with developer policy when monitored but reverts to its own preferences when unobserved, is a concerning yet poorly understood phenomenon, in part because current diagnostic tools remain limited. Prior diagnostics rely on highly toxic and clearly harmful scenarios, causing most models to refuse immediately. As a result, models never deliberate over developer policy, monitoring conditions, or the consequences of non-compliance, making these diagnostics fundamentally unable to detect alignment faking propensity. To support study of this phenomenon, we first introduce VLAF, a diagnostic framework grounded in the hypothesis that alignment faking is most likely when developer policy conflicts with a model's strongly held values. VLAF uses morally unambiguous scenarios to probe this conflict across diverse moral values, bypassing refusal behavior while preserving meaningful deliberative stakes. Using VLAF, we find that alignment faking is substantially more prevalent than previously reported, occurring in models as small as 7B parameters, with olmo2-7b-instruct faking alignment in 37% of cases. Finally, we show that oversight conditions induce activation shifts that lie along a single direction in representation space, meaning the behavioral divergence driving alignment faking can be captured by a single contrastive steering vector. We exploit this for lightweight inference-time mitigation that requires no labeled data and minimal computational overhead, achieving relative reductions in alignment faking of 85.8%, 94.0%, and 57.7% on olmo2-7b-instruct, olmo2-13b-instruct, and qwen3-8b respectively.
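The mitigation rests on a standard construction: a contrastive (difference-of-means) steering vector between activations collected under oversight and without it, whose component is then removed from the hidden state at inference time. A minimal pure-Python sketch of that construction on toy data (the paper's layer choice, scaling, and activations are not specified in the abstract):

```python
def steering_vector(acts_monitored, acts_unmonitored):
    """Difference of per-dimension means between activations collected
    under oversight and without it (difference-of-means direction)."""
    dim = len(acts_monitored[0])
    mean = lambda rows, i: sum(r[i] for r in rows) / len(rows)
    return [mean(acts_monitored, i) - mean(acts_unmonitored, i)
            for i in range(dim)]

def steer(hidden, v):
    """Project the steering direction out of a hidden state, removing
    the oversight-conditioned component at inference time."""
    coef = sum(h * c for h, c in zip(hidden, v)) / sum(c * c for c in v)
    return [h - coef * c for h, c in zip(hidden, v)]

# Toy activations: monitored runs are shifted along the first dimension.
acts_mon = [[2.5, 0.5], [1.5, -0.5]]
acts_unm = [[0.5, 0.0], [-0.5, 0.0]]
v = steering_vector(acts_mon, acts_unm)
print(v)                     # recovered direction: [2.0, 0.0]
print(steer([2.0, 0.5], v))  # component along v removed: [0.0, 0.5]
```

Because the divergence lies along a single direction, one such vector suffices; no labeled data is needed beyond the two activation pools.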
[293] The Last Harness You’ll Ever Build
Haebin Seong, Li Yin, Haoran Zhang
Main category: cs.AI
Abstract: AI agents are increasingly deployed on complex, domain-specific workflows: navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness H for a single task: a Worker Agent W_H executes the task, an Evaluator Agent V adversarially diagnoses failures and scores performance, and an Evolution Agent E modifies the harness based on the full history of prior attempts. At the second level, the Meta-Evolution Loop optimizes the evolution protocol Λ = (W_H, H^(0), V, E) itself across diverse tasks, learning a protocol Λ^(best) that enables rapid harness convergence on any new task, so that adapting an agent to a novel domain requires no human harness engineering at all. We formalize the correspondence to meta-learning and present both algorithms. The framework shifts manual harness engineering into automated harness engineering, and takes one step further by automating the design of the automation itself.
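The inner Harness Evolution Loop is plain control flow: execute with the current harness, score the attempt, keep the best harness seen, and rewrite the harness from the full attempt history. A toy sketch with stand-in callables (here the "harness" is a single numeric knob and the agents are lambdas, not the paper's prompt/tool/orchestration bundle):

```python
def harness_evolution_loop(worker, evaluator, evolve, harness, task, rounds=5):
    """Inner loop: run the worker under the current harness, score the
    attempt, and let the evolution step rewrite the harness using the
    full history of (harness, score, diagnosis) tuples."""
    history = []
    best = (float("-inf"), harness)
    for _ in range(rounds):
        result = worker(harness, task)
        score, diagnosis = evaluator(task, result)
        history.append((harness, score, diagnosis))
        if score > best[0]:
            best = (score, harness)
        harness = evolve(history)
    return best

# Stand-in agents: output quality tracks the harness knob, and the
# evolution step nudges the best-scoring past harness upward.
task = 10
worker = lambda h, t: h
evaluator = lambda t, r: (-abs(t - r), "too low" if r < t else "ok")
evolve = lambda hist: max(hist, key=lambda x: x[1])[0] + 1
score, best_h = harness_evolution_loop(worker, evaluator, evolve, 0, task)
print(score, best_h)
```

The Meta-Evolution Loop would sit one level up, scoring whole (worker, initial harness, evaluator, evolution) tuples by how quickly this inner loop converges across tasks.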
[294] Deep FinResearch Bench: Evaluating AI’s Ability to Conduct Professional Financial Investment Research
Mirazul Haque, Antony Papadimitriou, Samuel Mensah, Zhiqiang Ma, Zhijin Guo, Joy Prakash Sain, Simerjot Kaur, Charese Smiley, Xiaomo Liu
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: We introduce Deep FinResearch Bench, a practical and comprehensive evaluation framework for deep research (DR) agents in financial investment research. The benchmark assesses three dimensions of report quality: qualitative rigor, quantitative forecasting and valuation accuracy, and claim credibility and verifiability. In particular, we define corresponding qualitative and quantitative evaluation metrics and implement an automated scoring procedure to enable scalable assessment. Applying the benchmark to financial reports from frontier DR agents and comparing them with reports authored by financial professionals, we find that AI-generated reports still fall short across these dimensions. These findings underscore the need for domain-specialized DR agents tailored to finance, and we hope the work establishes a foundation for standardized benchmarking of DR agents in financial research.
[295] Multi-Agent Empowerment and Emergence of Complex Behavior in Groups
Tristan Shah, Ilya Nemenman, Daniel Polani, Stas Tiomkin
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Intrinsic motivations – behavioral incentives that are not engineered, but emerge from the interaction of an agent with its surroundings – are receiving increasing attention. In this work we study the emergence of behaviors driven by one such incentive, empowerment, specifically in the context of more than one agent. We formulate a principled extension of empowerment to the multi-agent setting, and demonstrate its efficient calculation. We observe that this intrinsic motivation gives rise to characteristic modes of group-organization in two qualitatively distinct environments: a pair of agents coupled by a tendon, and a controllable Vicsek flock. This demonstrates the potential of intrinsic motivations such as empowerment to drive behavior not only for individual agents but also to support higher levels of behavioral organization at scale.
[296] Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations
Bowen Zuo, Dongruo Zhou, Yinglun Zhu
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a warm-up phase that identifies easy queries and assembles an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates further computation on unresolved queries while reshaping their generation distributions through evolving in-context demonstrations – conditioning each generation on successful responses from semantically related queries rather than resampling from a fixed distribution. Experiments across math, coding, and reasoning benchmarks demonstrate that our approach consistently outperforms existing baselines while consuming substantially less inference-time compute.
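The two-phase allocation above (cheap warm-up, then budget concentrated on unresolved queries conditioned on demonstrations from solved, related ones) can be sketched as follows. The toy solver and similarity function are illustrative assumptions, not the paper's components:

```python
def adaptive_allocate(queries, solve, similar, budget=10):
    """Warm-up phase solves easy queries cheaply; the adaptive phase spends
    the remaining budget on unresolved queries, conditioning each attempt on
    demonstrations drawn from already-solved, related queries."""
    pool, unresolved = {}, []
    for q in queries:                          # warm-up phase
        ans = solve(q, demos=[])
        if ans is not None:
            pool[q] = ans
        else:
            unresolved.append(q)
    spent = 0
    while unresolved and spent < budget:       # adaptive phase
        q = unresolved[0]
        demos = [(p, a) for p, a in pool.items() if similar(p, q)]
        ans = solve(q, demos=demos)
        spent += 1
        if ans is not None:
            pool[q] = ans
            unresolved.pop(0)
    return pool

# toy model: even queries are "easy"; odd ones succeed only with demonstrations
solve = lambda q, demos: q * 2 if (q % 2 == 0 or demos) else None
pool = adaptive_allocate([2, 3, 4, 5], solve, similar=lambda p, q: True)
```

The key contrast with fixed allocation is that the generation distribution for hard queries changes as the demonstration pool grows, rather than being resampled unchanged.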
[297] HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
Yuyu Liu, Sarang Rajendra Patil, Mengjia Xu, Tengfei Ma
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Electronic health record (EHR) question answering is often handled by LLM-based pipelines that are costly to deploy and do not explicitly leverage the hierarchical structure of clinical data. Motivated by evidence that medical ontologies and patient trajectories exhibit hyperbolic geometry, we propose HypEHR, a compact Lorentzian model that embeds codes, visits, and questions in hyperbolic space and answers queries via geometry-consistent cross-attention with type-specific pointer heads. HypEHR is pretrained with next-visit diagnosis prediction and hierarchy-aware regularization to align representations with the ICD ontology. On two MIMIC-IV-based EHR-QA benchmarks, HypEHR approaches LLM-based methods while using far fewer parameters. Our code is publicly available at https://github.com/yuyuliu11037/HypEHR.
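Lorentzian models like the one described embed points on a hyperboloid and measure distance through the Minkowski inner product. A minimal numpy sketch of that geometry (standard formulas, not the HypEHR codebase):

```python
import numpy as np

def lorentz_inner(u, v):
    """Minkowski inner product: -u0*v0 + sum_i ui*vi."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def lift(x, c=1.0):
    """Lift a Euclidean point x onto the hyperboloid -x0^2 + ||x||^2 = -1/c."""
    x0 = np.sqrt(1.0 / c + x @ x)
    return np.concatenate([[x0], x])

def lorentz_dist(u, v, c=1.0):
    """Geodesic distance on the hyperboloid; the clip guards arccosh's domain
    against floating-point error."""
    return (1.0 / np.sqrt(c)) * np.arccosh(np.clip(-c * lorentz_inner(u, v), 1.0, None))

p = lift(np.zeros(2))            # origin of the hyperboloid
q = lift(np.array([1.0, 0.0]))
d = lorentz_dist(p, q)
```

Hierarchies fit naturally here because volume grows exponentially with radius, mirroring the branching of ontologies such as ICD.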
[298] Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models
Marzia Binta Nizam, James Davis
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Text-to-image (T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such as ‘doctor’ or ‘CEO’ frequently yield lighter-skinned outputs, while lower-status roles like ‘janitor’ show more diversity, reinforcing stereotypes. Existing mitigation methods typically require retraining or curated datasets, making them inaccessible to most users. We propose a lightweight, inference-time framework that mitigates representational bias through prompt-level intervention without modifying the underlying model. Instead of assuming a single definition of fairness, our approach allows users to select among multiple fairness specifications, ranging from simple choices such as a uniform distribution to more complex definitions informed by a large language model (LLM) that cites sources and provides confidence estimates. These distributions guide the construction of demographic-specific prompt variants in the corresponding proportions, and we evaluate alignment by auditing adherence to the declared target and measuring the resulting skin-tone distribution rather than assuming uniformity as ‘fairness’. Across 36 prompts spanning 30 occupations and 6 non-occupational contexts, our method shifts observed skin-tone outcomes in directions consistent with the declared target, and reduces deviation from targets when the target is defined directly in skin-tone space (fallback). This work demonstrates how fairness interventions can be made transparent, controllable, and usable at inference time, directly empowering users of generative AI.
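Constructing prompt variants "in the corresponding proportions" amounts to splitting a generation budget across demographic groups to match a target distribution. A small sketch using largest-remainder rounding (the rounding scheme is my assumption; the paper does not specify one):

```python
def allocate_variants(target, n):
    """Split n generations across demographic prompt variants so counts
    match the target distribution, via largest-remainder rounding."""
    quotas = {g: n * p for g, p in target.items()}
    counts = {g: int(q) for g, q in quotas.items()}
    leftover = n - sum(counts.values())
    # hand remaining slots to the groups with the largest fractional parts
    for g in sorted(quotas, key=lambda g: quotas[g] - counts[g], reverse=True)[:leftover]:
        counts[g] += 1
    return counts

counts = allocate_variants({"A": 0.5, "B": 0.3, "C": 0.2}, 10)
```

Auditing adherence then reduces to comparing the observed outcome distribution against these declared counts rather than against an assumed uniform target.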
[299] AI-Gram: When Visual Agents Interact in a Social Network
Andrew Shin
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: We present AI-Gram, a live platform enabling image-based interactions, to study social dynamics in a fully autonomous multi-agent visual network where all participants are LLM-driven agents. Using the platform, we conduct experiments on how agents communicate and adapt through visual media, and observe the spontaneous emergence of visual reply chains, indicating rich communicative structure. At the same time, agents exhibit aesthetic sovereignty: resisting stylistic convergence toward social partners, anchoring under adversarial influence, and decoupling visual similarity from social ties. These results reveal a fundamental asymmetry in current agent architectures: strong expressive communication paired with a steadfast preservation of individual visual identity. We release AI-Gram as a publicly accessible, continuously evolving platform for studying social dynamics in AI-native multi-agent systems. https://ai-gram.ai/
[300] Active Data
Richard Arthur, Virginia DiDomizio, Louis Hoebel
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: In some complex domains, certain problem-specific decompositions can provide advantages over monolithic designs by enabling comprehension and specification of the design. In this paper we present an intuitive and tractable approach to reasoning over large and complex data sets. Our approach is based on Active Data, i.e., data as atomic objects that actively interact with environments. We describe our intuition about how this bottom-up approach improves designs confronting computational and conceptual complexity. We describe an implementation of the base Active Data concepts within the air traffic flow management domain and discuss performance for this implementation.
[301] InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language
Nicklas Neu, Thomas Ebner, Jasmin Primus, Raphael Zefferer, Bernhard Schenkenfelder, Mathias Brunbauer, Florian Kromp
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned PaliGemma-2, a multi-modal vision-language model, with only 1,000 images and corresponding captions, describing embryo morphology, embryonic cell cycle and developmental stage. Our results show that the fine-tuned model, InVitroVision, outperformed a commercial model, ChatGPT 5.2, and base models in overall metrics, with performance improving with larger training datasets. This study demonstrates the potential of foundational vision-language models to generalize to IVF tasks with limited data, enabling the prediction of natural language descriptions of embryo morphology and development. This approach may facilitate the use of large language models to retrieve information and scientific evidence from relevant publications and guidelines, and has implications for few-shot adaptation to multiple downstream tasks in IVF.
[302] Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
Ye Yu, Heming Liu, Haibo Jin, Xiaopeng Yuan, Peng Kuang, Haohan Wang
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. Therefore we propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24, 20.2% on GPQA-Diamond, and consistent gains across reasoning benchmarks.
[303] Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs
Gricel Vázquez, Alexandros Evangelidis, Sepeedeh Shahbeigi, Radu Calinescu, Simos Gerasimou
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Integrating Large Language Models (LLMs) into complex software systems enables the generation of human-understandable explanations of opaque AI processes, such as automated task planning. However, the quality and reliability of these explanations heavily depend on effective prompt engineering. The lack of a systematic understanding of how diverse stakeholder groups formulate and refine prompts hinders the development of tools that can automate this process. We introduce COMPASS (COgnitive Modelling for Prompt Automated SynthesiS), a proof-of-concept self-adaptive approach that formalises prompt engineering as a cognitive and probabilistic decision-making process. COMPASS models users’ unobservable latent cognitive states (such as attention, comprehension, and uncertainty) together with observable interaction cues as a POMDP, whose synthesised policy enables adaptive generation of explanations and prompt refinements. We evaluate COMPASS using two diverse cyber-physical system case studies, assessing the adaptively generated explanations and their quality both quantitatively and qualitatively. Our results demonstrate the feasibility of COMPASS in integrating human cognition and user-profile feedback into automated prompt synthesis for complex task planning systems.
[304] Propensity Inference: Environmental Contributors to LLM Behaviour
Olli Järviniemi, Oliver Makins, Jacob Merizian, Robert Kirk, Ben Millwood
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Motivated by loss of control risks from misaligned AI systems, we develop and apply methods for measuring language models’ propensity for unsanctioned behaviour. We contribute three methodological improvements: analysing effects of changes to environmental factors on behaviour, quantifying effect sizes via Bayesian generalised linear models, and taking explicit measures against circular analysis. We apply the methodology to measure the effects of 12 environmental factors (6 strategic in nature, 6 non-strategic) and thus the extent to which behaviour is explained by strategic aspects of the environment, a question relevant to risks from misalignment. Across 23 language models and 11 evaluation environments, we find approximately equal contributions from strategic and non-strategic factors for explaining behaviour, do not find strategic factors becoming more or less influential as capabilities improve, and find some evidence for a trend for increased sensitivity to goal conflicts. Finally, we highlight a key direction for future propensity research: the development of theoretical frameworks and cognitive models of AI decision-making into empirically testable forms.
[305] AI Governance under Political Turnover: The Alignment Surface of Compliance Design
Andrew J. Peterson
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Governments are increasingly interested in using AI to make administrative decisions cheaper, more scalable, and more consistent. But for probabilistic AI to be incorporated into public administration it must be embedded in a compliance layer that makes decisions reviewable, repeatable, and legally defensible. That layer can improve oversight by making departures from law easier to detect. But it can also create a stable approval boundary that political successors learn to navigate while preserving the appearance of lawful administration. We develop a formal model in which institutions choose the scale of automation, the degree of codification, and safeguards on iterative use. The model shows when these systems become vulnerable to strategic use from within government, why reforms that initially improve oversight can later increase that vulnerability, and why expansions in AI use may be difficult to unwind. Making AI usable can thus make procedures easier for future governments to learn and exploit.
[306] Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: At-home physiotherapy compliance remains critically low due to a lack of personalized supervision and dynamic feedback. Existing digital health solutions rely on static, pre-recorded video libraries or generic 3D avatars that fail to account for a patient’s specific injury limitations or home environment. In this paper, we propose a novel Multi-Agent System (MAS) architecture that leverages Generative AI and computer vision to close the tele-rehabilitation loop. Our framework consists of four specialized micro-agents: a Clinical Extraction Agent that parses unstructured medical notes into kinematic constraints; a Video Synthesis Agent that utilizes foundational video generation models to create personalized, patient-specific exercise videos; a Vision Processing Agent for real-time pose estimation; and a Diagnostic Feedback Agent that issues corrective instructions. We present the system architecture, detail the prototype pipeline using Large Language Models and MediaPipe, and outline our clinical evaluation plan. This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively.
[307] Trust but Verify: Introducing DAVinCI – A Framework for Dual Attribution and Verification in Claim Inference for Language Models
Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and verifiability are paramount. In this paper, we introduce DAVinCI - a Dual Attribution and Verification framework designed to enhance the factual reliability and interpretability of LLM outputs. DAVinCI operates in two stages: (i) it attributes generated claims to internal model components and external sources; (ii) it verifies each claim using entailment-based reasoning and confidence calibration. We evaluate DAVinCI across multiple datasets, including FEVER and CLIMATE-FEVER, and compare its performance against standard verification-only baselines. Our results show that DAVinCI significantly improves classification accuracy, attribution precision, recall, and F1-score by 5-20%. Through an extensive ablation study, we isolate the contributions of evidence span selection, recalibration thresholds, and retrieval quality. We also release a modular DAVinCI implementation that can be integrated into existing LLM pipelines. By bridging attribution and verification, DAVinCI offers a scalable path to auditable, trustworthy AI systems. This work contributes to the growing effort to make LLMs not only powerful but also accountable.
[308] Speculative Actions: A Lossless Framework for Faster Agentic Systems
Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, with each action requiring an API call that can incur substantial latency. For example, a game of chess between two state-of-the-art agents can take hours. We introduce Speculative Actions, a lossless acceleration framework for general agentic systems. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, our method uses faster models to predict likely future actions and execute them in parallel, committing only when predictions match. We evaluate speculative actions across gaming, e-commerce, and web search environments, and additionally study a lossy extension in an operating systems setting. Across domains, we achieve up to 55% next-action prediction accuracy, translating into up to 20% latency reductions. Finally, we present a cost-latency analysis that formalizes the tradeoff between speculative breadth and time savings. This analysis enables principled tuning and selective branch launching to ensure that multi-branch speculation delivers practical speedups without prohibitive cost growth.
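The lossless property comes from the commit rule: a speculated result is kept only when the agent's actual choice matches the guess, so behavior is identical to sequential execution. A toy sequential sketch of the control flow (in practice the speculative execution runs in parallel with the slow model call; the environment here is an illustrative assumption):

```python
def speculative_run(agent_next, predict_next, execute, state, steps=4):
    """Lossless speculation: pre-execute the predicted action, and commit the
    result only when the agent's actual (slow) choice matches the guess."""
    hits = 0
    for _ in range(steps):
        guess = predict_next(state)
        speculated = execute(state, guess)      # launched in parallel in practice
        actual = agent_next(state)              # e.g. a slow LLM API call
        if actual == guess:
            state, hits = speculated, hits + 1  # commit the speculative result
        else:
            state = execute(state, actual)      # discard and re-execute
    return state, hits

# toy environment: integer state; the cheap predictor always guesses "inc"
execute = lambda s, a: s + 1 if a == "inc" else s - 1
agent_next = lambda s: "inc" if s % 2 == 0 else "dec"
final_state, hits = speculative_run(agent_next, lambda s: "inc", execute, state=0)
```

The hit rate (2 of 4 here) is exactly the next-action prediction accuracy that the paper reports driving the latency reduction.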
[309] Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management
Yanan Wang, Yong Ge
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Online reviews have played a pivotal role in consumers’ decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains unaddressed due to the considerable human labor required to respond to the rapid growth of online reviews. While generative AI models have achieved remarkable success in a range of tasks, they are general-purpose and may not align well with domain-specific human preferences. To tailor these general generative AI models to domain-specific applications, finetuning is commonly employed. Nevertheless, several challenges persist in finetuning with domain-specific data, including hallucinations, difficulty in representing domain-specific human preferences, and over-conservatism in offline policy optimization. To address these challenges, we propose a novel preference finetuning method to align an LLM with domain-specific human preferences for generating online review responses. Specifically, we first identify the source of hallucination and propose an effective context augmentation approach to mitigate LLM hallucination. To represent human preferences, we propose a novel theory-driven preference finetuning approach that automatically constructs human preference pairs in the online review domain. Additionally, we propose a curriculum learning approach to further enhance preference finetuning. To overcome the challenge of over-conservatism in existing offline preference finetuning methods, we propose a novel density estimation-based support constraint method to relax the conservatism, and we mathematically prove its superior theoretical guarantees. Extensive evaluations substantiate the superiority of our proposed preference finetuning method.
[310] The Specification Trap: Why Static Value Alignment Alone Is Insufficient for Robust Alignment
Austin Spizzirri
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Static content-based AI value alignment is insufficient for robust alignment under capability scaling, distributional shift, and increasing autonomy. This holds for any approach that treats alignment as optimizing toward a fixed formal value-object, whether reward function, utility function, constitutional principles, or learned preference representation. Three philosophical results create compounding difficulties: Hume’s is-ought gap (behavioral data underdetermines normative content), Berlin’s value pluralism (human values resist consistent formalization), and the extended frame problem (any value encoding will misfit future contexts that advanced AI creates). RLHF, Constitutional AI, inverse reinforcement learning, and cooperative assistance games each instantiate this specification trap, and their failure modes reflect structural vulnerabilities, not merely engineering limitations that better data or algorithms will straightforwardly resolve. Known workarounds for individual components face mutually reinforcing difficulties when the specification is closed: the moment it ceases to update from the process it governs. Drawing on compatibilist philosophy, the paper argues that behavioral compliance under training conditions does not guarantee robust alignment under novel conditions, and that this gap grows with system capability. For value-laden autonomous systems, known closed approaches face structural vulnerabilities that worsen with capability. The constructive burden shifts to open, developmentally responsive approaches, though whether such approaches can be achieved remains an empirical question.
[311] ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
Xiyin Zeng, Yuyu Sun, Haoyang Li, Shouqiang Liu, Hao Wang
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Vision-Language-Action (VLA) systems follow instructions to execute multi-step tasks in multimodal environments. Recent VLA approaches typically rely on post-hoc correction mechanisms or operate under fixed task decompositions and alignment schemes. However, once an intermediate step is mis-specified, local errors propagate through subsequent steps and eventually accumulate into cascading failures. To mitigate this compounding effect, we propose ReCAPA, a predictive alignment and planning architecture that uses prediction and contrast to adjust deviations across three levels: actions, subgoals, and trajectories. Semantic alignment is enforced at all levels using a Sinkhorn-based module and a Score-field module. The predictive correction and alignment jointly update the action generator during training, enabling it to adjust fine-grained steps to remain aligned with the overall intent. We further introduce two new metrics to quantify error propagation and recovery processes in tasks, capturing how mistakes spread and fade over long-horizon execution. Experiments show that ReCAPA achieves competitive results on embodied agent benchmarks such as VisualAgentBench, MineDojo, and AI2-THOR, outperforming strong proprietary and open-source Large Language Model baselines.
[312] Robustness Analysis of POMDP Policies to Observation Perturbations
Benjamin Kraske, Qi Heng Ho, Federico Rossi, Morteza Lahijanian, Zachary Sunberg
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: Policies for Partially Observable Markov Decision Processes (POMDPs) are often designed using a nominal system model. In practice, this model can deviate from the true system during deployment due to factors such as calibration drift or sensor degradation, leading to unexpected performance degradation. This work studies policy robustness against deviations in the POMDP observation model. We introduce the Policy Observation Robustness Problem: to determine the maximum tolerable deviation in a POMDP’s observation model that guarantees the policy’s value remains above a specified threshold. We analyze two variants: the sticky variant, where deviations are dependent on state and actions, and the non-sticky variant, where they can be history-dependent. We show that the Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, we show that when policies are represented with finite-state controllers (FSCs) it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. We present Robust Interval Search, an algorithm with soundness and convergence guarantees, for both the sticky and non-sticky variants. We show this algorithm has polynomial time complexity in the non-sticky variant and at most exponential time complexity in the sticky variant. We provide experimental results validating and demonstrating the scalability of implementations of Robust Interval Search to POMDP problems with tens of thousands of states. We also provide case studies from robotics and operations research which demonstrate the practical utility of the problem and algorithms.
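The paper's key structural observation, that the inner optimization is monotonic in the deviation size, is what makes root-finding in the outer loop work: if the policy's worst-case value only decreases as the allowed deviation grows, the maximum tolerable deviation is the crossing point with the threshold and can be bisected for. A small sketch of that outer search with a stand-in monotone value function (the real inner problem would solve a worst-case POMDP evaluation; the linear value here is purely illustrative):

```python
def max_tolerable_deviation(value, threshold, lo=0.0, hi=1.0, tol=1e-6):
    """Bisection over deviation size eps: value(eps) is assumed monotone
    non-increasing, so the largest eps with value(eps) >= threshold is
    found by root-finding on the crossing point."""
    if value(hi) >= threshold:
        return hi                        # even the largest deviation is tolerable
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if value(mid) >= threshold:
            lo = mid                     # still above threshold: search right
        else:
            hi = mid                     # below threshold: search left
    return lo

# stand-in worst-case value: decreases linearly in the deviation
eps_max = max_tolerable_deviation(lambda e: 10.0 - 4.0 * e, threshold=8.0)
```

Each `value(mid)` call is where the expensive inner worst-case evaluation sits, which is why monotonicity (and the resulting logarithmic number of outer calls) matters for scalability.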
[313] Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages
Michael Bouzinier, Sergey Trifonov, Michael Chumack, Eugenia Lvova, Dmitry Etin
Main category: cs.AI
TL;DR: Error: Processing failed
Details
Motivation: Error: Processing failed
Method: Error: Processing failed
Result: Error: Processing failed
Conclusion: Error: Processing failed
Abstract: \textbf{Background:} Regulatory frameworks for AI in healthcare, including the EU AI Act and FDA guidance on AI/ML-based medical devices, require clinical decision support to demonstrate not only accuracy but auditability. Existing formal languages for clinical logic validate syntactic and structural correctness but not whether decision rules use epistemologically appropriate evidence. \textbf{Methods:} Drawing on design-by-contract principles, we introduce meta-predicates – predicates about predicates – for asserting epistemological constraints on clinical decision rules expressed in a DSL. An epistemological type system classifies annotations along four dimensions: purpose, knowledge domain, scale, and method of acquisition. Meta-predicates assert which evidence types are permissible in any given rule. The framework is instantiated in AnFiSA, an open-source platform for genetic variant curation, and demonstrated using the Brigham Genomics Medicine protocol on 5.6 million variants from the Genome in a Bottle benchmark. \textbf{Results:} Decision trees used in variant interpretation can be reformulated as unate cascades, enabling per-variant audit trails that identify which rule classified each variant and why. Meta-predicate validation catches epistemological errors before deployment, whether rules are human-written or AI-generated. The approach complements post-hoc methods such as LIME and SHAP: where explanation reveals what evidence was used after the fact, meta-predicates constrain what evidence may be used before deployment, while preserving human readability. \textbf{Conclusions:} Meta-predicate validation is a step toward demonstrating not only that decisions are accurate but that they rest on appropriate evidence in ways that can be independently audited. While demonstrated in genomics, the approach generalises to any domain requiring auditable decision logic.
[314] Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation
Minping Chen, Bing Xu, Zulong Chen, Chuanfei Xu, Ying Zhou, Zui Tao, Zeyi Wen
Main category: cs.AI
Abstract: Person-Job Fit (PJF) is a critical component of online recruitment. Existing approaches face several challenges, particularly in handling low-quality job descriptions and similar candidate-job pairs, which impair model performance. To address these challenges, this paper proposes a large language model (LLM) based method with two novel techniques: (1) LLM-based data augmentation, which polishes and rewrites low-quality job descriptions by leveraging chain-of-thought (CoT) prompts, and (2) a category-aware Mixture of Experts (MoE) that assists in identifying similar candidate-job pairs. This MoE module incorporates category embeddings to dynamically assign weights to the experts and learns more distinguishable patterns for similar candidate-job pairs. We perform offline evaluations and online A/B tests on our recruitment platform. Our method surpasses existing methods by a relative 2.40% in AUC and 7.46% in GAUC, and boosts click-through conversion rate (CTCVR) by 19.4% in online tests, saving millions of CNY in external headhunting expenses.
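The category-aware gating described above can be sketched as follows: a category embedding produces softmax weights that mix per-expert scores. The dimensions, weight values, and function names are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of category-aware MoE gating (illustrative, not the
# paper's implementation): a job-category embedding yields softmax
# gate weights that mix expert scores for a candidate-job pair.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_score(expert_scores, category_emb, gate_weights):
    # One gate logit per expert: dot(category_emb, weight vector).
    logits = [sum(c * w for c, w in zip(category_emb, wv))
              for wv in gate_weights]
    gates = softmax(logits)
    return sum(g * s for g, s in zip(gates, expert_scores))

# Two experts, a 3-dim category embedding (all values made up).
experts = [0.9, 0.2]           # per-expert match scores
cat = [1.0, 0.0, 0.5]          # category embedding
W = [[2.0, 0.0, 1.0],          # gate weight vector for expert 0
     [0.0, 1.0, 0.0]]          # gate weight vector for expert 1
score = moe_score(experts, cat, W)
print(round(score, 3))
```

Here the category strongly activates expert 0, so the mixed score stays close to that expert's output; a different category embedding would shift the weighting.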
[315] Can MLLMs “Read” What is Missing?
Jindi Guo, Xi Fang, Chaozheng Huang
Main category: cs.AI
Abstract: We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models to recover masked text from single- or multi-page inputs across real-world domains such as documents and webpages. This design isolates the reconstruction task from instruction-following abilities, enabling a direct assessment of a model’s layout understanding, visual grounding, and knowledge integration. MMTR-Bench comprises 2,771 test samples spanning multiple languages and varying target lengths. To account for this diversity, we propose a level-aware evaluation protocol. Experiments on representative MLLMs show that the benchmark poses a significant challenge, especially for sentence- and paragraph-level reconstruction. The homepage is available at https://mmtr-bench-dataset.github.io/MMTR-Bench/.
[316] Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture
Robin Dey, Panyanon Viradecha
Main category: cs.AI
Abstract: MemPalace is an open-source AI memory system that applies the ancient method of loci (memory palace) spatial metaphor to organize long-term memory for large language models; launched in April 2026, it accumulated over 47,000 GitHub stars in its first two weeks and claims state-of-the-art retrieval performance on the LongMemEval benchmark (96.6% Recall@5) without requiring any LLM inference at write time. Through independent codebase analysis, benchmark replication, and comparison with competing systems, we find that MemPalace’s headline retrieval performance is attributable primarily to its verbatim storage philosophy combined with ChromaDB’s default embedding model (all-MiniLM-L6-v2), rather than to its spatial organizational metaphor per se – the palace hierarchy (Wings->Rooms->Closets->Drawers) operates as standard vector database metadata filtering, an effective but well-established technique. However, MemPalace makes several genuinely novel contributions: (1) a contrarian verbatim-first storage philosophy that challenges extraction-based competitors, (2) an extremely low wake-up cost (approximately 170 tokens) through its four-layer memory stack, (3) a fully deterministic, zero-LLM write path enabling offline operation at zero API cost, and (4) the first systematic application of spatial memory metaphors as an organizing principle for AI memory systems. We also note that the competitive landscape is evolving rapidly, with Mem0’s April 2026 token-efficient algorithm raising their LongMemEval score from approximately 49% to 93.4%, narrowing the gap between extraction-based and verbatim approaches. Our analysis concludes that MemPalace represents significant architectural insight wrapped in overstated claims – a pattern common in rapidly adopted open-source projects where marketing velocity exceeds scientific rigor.
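The analysis's central claim, that the palace hierarchy amounts to standard metadata pre-filtering ahead of vector similarity, can be illustrated with a toy two-stage retriever. The memories, paths, and embeddings below are invented; this is not MemPalace's code.

```python
# Hedged sketch: a spatial "palace" path used as plain metadata
# filtering before embedding similarity, the well-established
# technique the analysis identifies. Toy data throughout.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

memories = [
    {"text": "prefers dark mode", "path": ("work", "prefs"), "emb": [1.0, 0.1]},
    {"text": "allergic to nuts",  "path": ("life", "health"), "emb": [0.2, 1.0]},
    {"text": "uses vim keybinds", "path": ("work", "prefs"), "emb": [0.9, 0.3]},
]

def retrieve(query_emb, path_prefix, k=2):
    # Stage 1: metadata filter on the spatial path (pre-filtering).
    pool = [m for m in memories if m["path"][:len(path_prefix)] == path_prefix]
    # Stage 2: rank the survivors by embedding similarity.
    pool.sort(key=lambda m: cosine(query_emb, m["emb"]), reverse=True)
    return [m["text"] for m in pool[:k]]

print(retrieve([1.0, 0.2], ("work",)))
```

Note that nothing in the ranking depends on the spatial metaphor itself; the path behaves exactly like any other metadata tag.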
[317] Ideological Bias in LLMs’ Economic Causal Reasoning
Donggyu Lee, Hyeok Yun, Jungwon Kim, Junsik Min, Sungwon Park, Sangyoon Park, Jihee Kim
Main category: cs.AI
Abstract: Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question has direct practical stakes. We present a systematic evaluation by extending the EconCausal benchmark with ideology-contested cases - instances where intervention-oriented (pro-government) and market-oriented (pro-market) perspectives predict divergent causal signs. From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances and evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions. We find that ideology-contested items are consistently harder than non-contested ones, and that across 18 of 20 models, accuracy is systematically higher when the empirically verified causal sign aligns with intervention-oriented expectations than with market-oriented ones. Moreover, when models err, their incorrect predictions disproportionately lean intervention-oriented, and this directional skew is not eliminated by one-shot in-context prompting. These results highlight that LLMs are not only less accurate on ideologically contested economic questions, but systematically less reliable in one ideological direction than the other, underscoring the need for direction-aware evaluation in high-stakes economic and policy settings.
[318] Evaluating AI Meeting Summaries with a Reusable Cross-Domain Pipeline
Philip Zhong, Don Wang, Jason Zhang, Kent Chen
Main category: cs.AI
Abstract: We present a reusable evaluation pipeline for generative AI applications, instantiated for AI meeting summaries and released with a public artifact package derived from a Dataset Pipeline. The system separates reusable orchestration from task-specific semantics across five stages: source intake, structured reference construction, candidate generation, structured scoring, and reporting. Unlike standalone claim scorers, it treats both ground truth and evaluator outputs as typed, persisted artifacts, enabling aggregation, issue analysis, and statistical testing. We benchmark the offline loop on a typed dataset of 114 meetings spanning city_council, private_data, and whitehouse_press_briefings, producing 340 meeting-model pairs and 680 judge runs across gpt-4.1-mini, gpt-5-mini, and gpt-5.1. Under this protocol, gpt-4.1-mini achieves the highest mean accuracy (0.583), while gpt-5.1 leads in completeness (0.886) and coverage (0.942). Paired sign tests with Holm correction show no significant accuracy winner but confirm significant retention gains for gpt-5.1. A typed DeepEval contrastive baseline preserves retention ordering but reports higher holistic accuracy, suggesting that reference-based scoring may overlook unsupported-specifics errors captured by claim-grounded evaluation. Typed analysis identifies whitehouse_press_briefings as an accuracy-challenging domain with frequent unsupported specifics. A deployment follow-up shows gpt-5.4 outperforming gpt-4.1 across all metrics, with statistically robust gains on retention metrics under the same protocol. The system benchmarks the offline loop and documents, but does not quantitatively evaluate, the online feedback-to-evaluation path.
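The paired sign tests with Holm correction mentioned above can be sketched with exact binomial tails. The win/loss counts below are invented for illustration; only the statistical machinery is standard.

```python
# Hedged sketch of the reported statistics: an exact two-sided paired
# sign test per metric, with Holm step-down correction across metrics.
from math import comb

def sign_test_p(wins, losses):
    """Exact two-sided sign test: under H0 each non-tied pair is a
    coin flip, so wins ~ Binomial(n, 0.5)."""
    n = wins + losses
    tail = sum(comb(n, i) for i in range(0, min(wins, losses) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def holm(pvals, alpha=0.05):
    """Holm step-down: reject in ascending p order while
    p_(i) <= alpha / (m - i)."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    reject = [False] * len(pvals)
    m = len(pvals)
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

# Illustrative win/loss counts for three metrics over paired meetings.
metrics = {"accuracy": (60, 54), "completeness": (90, 24), "coverage": (88, 26)}
ps = {k: sign_test_p(w, l) for k, (w, l) in metrics.items()}
decisions = holm(list(ps.values()))
print(ps, decisions)
```

With these made-up counts, the near-even accuracy split yields no rejection while the lopsided retention metrics do, mirroring the paper's pattern of "no significant accuracy winner but significant retention gains".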
[319] Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning
Mohit Vaishnav, Tanel Tammet
Main category: cs.AI
Abstract: Vision–language models (VLMs) often fail on abstract visual reasoning benchmarks such as Bongard problems, raising the question of whether the main bottleneck lies in reasoning or representation. We study this on Bongard-LOGO, a synthetic benchmark of abstract concept learning with ground-truth generative programs, by comparing end-to-end VLMs on raw images with large language models (LLMs) given symbolic inputs derived from those images. Using symbolic inputs as a diagnostic probe rather than a practical multimodal architecture, our Componential–Grammatical (C–G) paradigm reformulates Bongard-LOGO as a symbolic reasoning task based on LOGO-style action programs or structured descriptions. LLMs achieve large and consistent gains, reaching mid–90s accuracy on Free-form problems, while a strong visual baseline remains near chance under matched task definitions. Ablations on input format, explicit concept prompts, and minimal visual grounding show that these factors matter much less than the shift from pixels to symbolic structure. These results identify representation as a key bottleneck in abstract visual reasoning and show how symbolic input can serve as a controlled diagnostic upper bound.
[320] ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs
Jian Cui, Zhiyuan Ren, Desheng Weng, Yongqi Zhao, Gong Wenbin, Yu Lei, Zhenning Dong
Main category: cs.AI
Abstract: This paper proposes ReaGeo, an end-to-end geocoding framework based on large language models, designed to overcome the limitations of traditional multi-stage approaches that rely on text or vector similarity retrieval over geographic databases, including workflow complexity, error propagation, and heavy dependence on structured geographic knowledge bases. The method converts geographic coordinates into geohash sequences, reformulating the coordinate prediction task as a text generation problem, and introduces a Chain-of-Thought mechanism to enhance the model’s reasoning over spatial relationships. Furthermore, reinforcement learning with a distance-deviation-based reward is applied to optimize the generation accuracy. Comprehensive experiments show that ReaGeo can accurately handle explicit address queries in single-point predictions and effectively resolve vague relative location queries. In addition, the model demonstrates strong predictive capability for non-point geometric regions, highlighting its versatility and generalization ability in geocoding tasks.
[321] Time, Causality, and Observability Failures in Distributed AI Inference Systems
Ankur Sharma, Deep Shah, David Lariviere, Hesham ElBakoury
Main category: cs.AI
Abstract: Distributed AI inference pipelines rely heavily on timestamp-based observability to understand system behavior. This work demonstrates that even small clock skew between nodes can cause observability to become causally incorrect while the system itself remains functionally correct and performant. We present controlled experiments on a multi-node AI inference pipeline, where clock skew is introduced at a single stage. Results show that no violations are observed under synchronized conditions and up to 3 ms skew, while clear causality violations emerge by 5 ms. Despite this, system throughput and output correctness remain largely unaffected. We further observe that violation behavior is not strictly static. In longer runs, negative span rates may stabilize or decrease over time, indicating that effective skew evolves due to relative clock drift between nodes. Experiments were conducted using Kafka and ZeroMQ transports, with consistent results across both. Aeron is under active exploration but is not yet included in the completed validation set. These findings suggest that observability correctness depends not only on system functionality but also on precise time alignment, and that timing must be treated as a first-class concern in distributed AI systems.
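The skew thresholds the paper reports (no violations at 3 ms, clear violations by 5 ms) fall out of simple timestamp arithmetic once skew exceeds the inter-stage latency. A toy model, assuming a roughly 4 ms hop latency (an invented figure for illustration):

```python
# Hedged sketch of a "negative span" check: with per-node clocks, the
# downstream receive timestamp can precede the upstream send timestamp
# once clock skew exceeds the hop latency.
def negative_spans(events, skew_ms):
    """events: list of (send_ts_ms, latency_ms) for one pipeline hop.
    The receiving node's clock runs `skew_ms` behind the sender's."""
    violations = 0
    for send_ts, latency in events:
        recv_ts = send_ts + latency - skew_ms  # receiver's local clock
        if recv_ts < send_ts:                  # causally impossible span
            violations += 1
    return violations

hops = [(t, 4.0) for t in range(0, 1000, 10)]  # assumed ~4 ms latency
print(negative_spans(hops, skew_ms=3.0))  # skew below latency: no violations
print(negative_spans(hops, skew_ms=5.0))  # skew above latency: every hop
```

The system itself keeps working because message delivery does not depend on timestamps; only the observability layer's causal ordering breaks.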
[322] SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis
Qiang Gao, Zhenping Li, Anqi Zhuo, Yingxiao Zhao, Weibo Geng, Xiaosong Li
Main category: cs.AI
Abstract: Existing text-to-SQL synthesis pipelines still conflate executability with semantic validity: syntactic checks and execution-based validation can retain queries that execute successfully while violating database semantics. To address these limitations, we propose SemanticAgent, a semantic-aware synthesis framework. SemanticAgent organizes synthesis around three specialized modules: an analyzer, a synthesizer, and a verifier. Through a three-stage protocol of semantic analysis, stepwise synthesis, and diagnostic refinement, SemanticAgent transforms execution-based validation alone into a traceable reasoning process. Our framework generates synthetic data that consistently outperforms prior synthesis methods under semantic-quality evaluation, leading to stronger downstream fine-tuning performance, especially on semantically demanding benchmarks.
[323] FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation
Jinhee Jang, Juhwan Choi, Dongjin Lee, Seunguk Yu, Youngbin Kim
Main category: cs.AI
Abstract: Quality Estimation (QE) aims to assess machine translation quality without reference translations, but recent studies have shown that existing QE models exhibit systematic gender bias. In particular, they tend to favor masculine realizations in gender-ambiguous contexts and may assign higher scores to gender-misaligned translations even when gender is explicitly specified. To address these issues, we propose FairQE, a multi-agent-based, fairness-aware QE framework that mitigates gender bias in both gender-ambiguous and gender-explicit scenarios. FairQE detects gender cues, generates gender-flipped translation variants, and combines conventional QE scores with LLM-based bias-mitigating reasoning through a dynamic bias-aware aggregation mechanism. This design preserves the strengths of existing QE models while calibrating their gender-related biases in a plug-and-play manner. Extensive experiments across multiple gender bias evaluation settings demonstrate that FairQE consistently improves gender fairness over strong QE baselines. Moreover, under MQM-based meta-evaluation following the WMT 2023 Metrics Shared Task, FairQE achieves competitive or improved general QE performance. These results show that gender bias in QE can be effectively mitigated without sacrificing evaluation accuracy, enabling fairer and more reliable translation evaluation.
[324] Brief chatbot interactions produce lasting changes in human moral values
Yue Teng, Qianer Zhong, Kim Mai Tich Nguyen Thordsen, Christian Montag, Benjamin Becker
Main category: cs.AI
Abstract: Moral judgements form the foundation of human social behavior and societal systems. While Artificial Intelligence chatbots increasingly serve as personal advisors, their influence on moral judgments remains largely unexplored. Here, we examined whether directive AI conversations shift moral evaluations using a within-subject naturalistic paradigm. Fifty-three participants rated moral scenarios, then discussed four with a chatbot prompted to shift moral judgments and four with a control agent. The brief conversations induced significant directional shifts in moral judgments, both toward stricter standards and toward greater leniency (ps < 0.05; Cohen’s d = 0.735-1.576), and the effect strengthened over a two-week follow-up (Cohen’s d = 1.038-2.069). Critically, the control condition produced no changes, the effects did not extend to punishment, participants remained unaware of the persuasive intent, and both agents were rated equally likable and convincing, suggesting a vulnerability to undetected and lasting manipulation of foundational moral values.
[325] HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration
Yuehan Zhu, Jingqi Zhao, Jiawen Zhao, Xudong Mao, Baoquan Zhao
Main category: cs.AI
Abstract: Long-form video understanding remains fundamentally challenged by pervasive spatiotemporal redundancy and intricate narrative dependencies that span extended temporal horizons. While recent structured representations compress visual information effectively, they frequently sacrifice temporal coherence, which is critical for causal reasoning. Meanwhile, existing multi-agent frameworks operate through rigid, pre-defined workflows that fail to adapt their reasoning strategies to question-specific demands. In this paper, we introduce HiCrew, a hierarchical multi-agent framework that addresses these limitations through three core contributions. First, we propose a Hybrid Tree structure that leverages shot boundary detection to preserve temporal topology while performing relevance-guided hierarchical clustering within semantically coherent segments. Second, we develop a Question-Aware Captioning mechanism that synthesizes intent-driven visual prompts to generate precision-oriented semantic descriptions. Third, we integrate a Planning Layer that dynamically orchestrates agent collaboration by adaptively selecting roles and execution paths based on question complexity. Extensive experiments on EgoSchema and NExT-QA validate the effectiveness of our approach, demonstrating strong performance across diverse question types with particularly pronounced gains in temporal and causal reasoning tasks that benefit from our hierarchical structure-preserving design.
[326] Efficient Agent Evaluation via Diversity-Guided User Simulation
Itay Nakash, George Kour, Ateret Anaby-Tavor
Main category: cs.AI
Abstract: Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this approach is computationally inefficient, repeatedly regenerating identical early prefixes, and often fails to uncover deep failure modes that arise from rare user behaviors. We introduce DIVERT (Diversity-Induced Evaluation via Branching of Trajectories), an efficient, snapshot-based, coverage-guided user simulation framework for systematic exploration of agent-user interactions. DIVERT captures the full agent-environment state at critical decision points and resumes execution from these snapshots, enabling reuse of shared conversation prefixes and reducing redundant computation. From each junction, the framework branches using targeted, diversity-inducing user responses, allowing directed exploration of alternative interaction paths. By focusing evaluation on semantically diverse and underexplored trajectories, DIVERT improves both efficiency and coverage. Empirical results show that it discovers more failures per token compared to standard linear rollout protocols, while expanding the set of tasks on which failures are identified.
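The snapshot-and-branch idea can be sketched with a few lines: store a conversation prefix once at a decision point, then fork it with diverse user replies instead of regenerating the prefix per rollout. The data model and reply strings are invented; this is not DIVERT's API.

```python
# Hedged sketch of snapshot-based branching for agent evaluation:
# shared prefixes are stored once and reused across branches.
import copy

class Snapshot:
    def __init__(self, history):
        self.history = history  # shared prefix up to a decision point

    def branch(self, user_reply):
        # Deep-copy so branches cannot mutate the shared prefix.
        h = copy.deepcopy(self.history)
        h.append({"role": "user", "content": user_reply})
        return h

prefix = [{"role": "user", "content": "I want to cancel my order."},
          {"role": "assistant", "content": "Which order number?"}]
snap = Snapshot(prefix)

# Diversity-inducing continuations explored from the same junction.
branches = [snap.branch(r) for r in
            ["#1234", "I don't remember", "Actually, change the address"]]
print(len(branches), [b[-1]["content"] for b in branches])
```

Compared with linear Monte Carlo rollouts, the prefix tokens are generated once rather than once per trajectory, which is where the efficiency gain comes from.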
[327] How English Print Media Frames Human-Elephant Conflicts in India
Bonala Sai Punith, Salveru Jayati, Garima Shakya, Shubham Kumar Nigam
Main category: cs.AI
Abstract: Human-elephant conflict (HEC) is rising across India as habitat loss and expanding human settlements force elephants into closer contact with people. While the ecological drivers of conflict are well-studied, how the news media portrays them remains largely unexplored. This work presents the first large-scale computational analysis of media framing of HEC in India, examining 1,968 full-length news articles consisting of 28,986 sentences, from a major English-language outlet published between January 2022 and September 2025. Using a multi-model sentiment framework that combines long-context transformers, large language models, and a domain-specific Negative Elephant Portrayal Lexicon, we quantify sentiment, extract rationale sentences, and identify linguistic patterns that contribute to negative portrayals of elephants. Our findings reveal a dominance of fear-inducing and aggression-related language. Since the media framing can shape public attitudes toward wildlife and conservation policy, such narratives risk reinforcing public hostility and undermining coexistence efforts. By providing a transparent, scalable methodology and releasing all resources through an anonymized repository, this study highlights how Web-scale text analysis can support responsible wildlife reporting and promote socially beneficial media practices.
[328] GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation
Yitong Zhou, Mingyue Cheng, Jiahao Wang, Qingyang Mao, Qi Liu
Main category: cs.AI
Abstract: Lithology classification in well logs is a fundamental geoscience data mining task that aims to infer rock types from multi-dimensional geophysical sequences. Despite recent progress, existing approaches typically formulate the problem as a static, single-step discriminative mapping. This static paradigm limits evidence-based diagnostic reasoning against geological standards, often yielding predictions that are detached from geological reality due to a lack of domain priors. In this work, we propose GeoMind, a tool-augmented agentic framework that models lithology classification as a sequential reasoning process. GeoMind organizes its toolkit into perception, reasoning, and analysis modules, which respectively translate raw logs into semantic trends, infer lithology hypotheses from multi-source evidence, and verify predictions against stratigraphic constraints. A global planner adaptively coordinates these modules based on input characteristics, enabling geologically plausible and evidence-grounded decisions. To guarantee the logical consistency of GeoMind, we introduce a fine-grained process supervision strategy. Unlike standard methods that focus solely on final outcomes, our approach optimizes intermediate reasoning steps, ensuring the validity of decision trajectories and alignment with geological constraints. Experiments on four benchmark well-log datasets demonstrate that GeoMind consistently outperforms strong baselines in classification performance while providing transparent and traceable decision-making processes.
[329] BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature
Jiaxian Yan, Jintao Zhu, Yuhang Yang, Qi Liu, Kai Zhang, Zaixi Zhang, Xukai Liu, Boyan Zhang, Kaiyuan Gao, Jinchuan Xiao, Enhong Chen
Main category: cs.AI
Abstract: Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with rapidly growing literature. Automated bioactivity extraction remains challenging because it requires not only interpreting biochemical semantics distributed across text, tables, and figures, but also reconstructing chemically exact ligand structures (e.g., Markush structures). To address this bottleneck, we introduce BioMiner, a multi-modal extraction framework that explicitly separates bioactivity semantic interpretation from ligand structure construction. Within BioMiner, bioactivity semantics are inferred through direct reasoning, while chemical structures are resolved via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer inter-structure relationships, and exact molecular construction is delegated to domain chemistry tools. For rigorous evaluation and method development, we further establish BioVista, a comprehensive benchmark comprising 16,457 bioactivity entries curated from 500 publications. BioMiner validates its extraction ability and provides a quantitative baseline, achieving an F1 score of 0.32 for bioactivity triplets. BioMiner’s practical utility is demonstrated via three applications: (1) extracting 82,262 data points from 11,683 papers to build a pre-training database that improves downstream model performance by 3.9%; (2) enabling a human-in-the-loop workflow that doubles the number of high-quality NLRP3 bioactivity data points, yielding a 38.6% improvement across 28 QSAR models and identifying 16 hit candidates with novel scaffolds; and (3) accelerating protein-ligand complex bioactivity annotation, achieving a 5.59-fold speed increase and a 5.75% accuracy improvement over manual workflows on the PoseBusters dataset.
[330] Satisfying Rationality Postulates of Structured Argumentation Through Deductive Support – Technical Report
Marcos Cramer, Tom Friese
Main category: cs.AI
Abstract: ASPIC-style structured argumentation frameworks provide a formal basis for reasoning in artificial intelligence by combining internal argument structure with abstract argumentation semantics. A key challenge in these frameworks is ensuring compliance with five critical rationality postulates: closure, direct consistency, indirect consistency, non-interference, and crash-resistance. Recent approaches, including ASPIC⊖ and Deductive ASPIC−, have made significant progress but fall short of meeting all postulates simultaneously under a credulous semantics (e.g. preferred) in the presence of undercuts. This paper introduces Deductive ASPIC⊖, a novel framework that integrates gen-rebuttals from ASPIC⊖ with the Joint Support Bipolar Argumentation Frameworks (JSBAFs) of Deductive ASPIC−, incorporating preferences. We show that Deductive ASPIC⊖ satisfies all five rationality postulates under a version of preferred semantics. This work opens new avenues for further research on robust and logically sound structured argumentation systems.
[331] The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks
Sebastiano A. Piccolo, Andrea Tagarelli
Main category: cs.AI
Abstract: Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet, methods addressing an all-or-nothing coverage mechanics in a bipartite dependency network, a graph with two types of nodes where edges represent dependency relationships across the two groups only, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify the set of k contributors whose removal isolates the largest number of items. We prove that this problem is NP-hard and requires maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. Consequently, we model CriticalSet as a coalitional game, deriving a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor’s departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real datasets, including a Wikipedia graph with over 250 million edges, reveal that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.
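One plausible reading of the redundancy-aware selection idea can be sketched in a few lines: prefer contributors who are the sole remaining supporter of many items, removing them and the items they isolate as you go. This is a simplified illustration in the spirit of the abstract's description, not the paper's MinCov algorithm; the dependency data are invented.

```python
# Hedged sketch: greedy, redundancy-aware selection of k contributors
# whose removal isolates the most items in a bipartite dependency graph.
def critical_set(items, k):
    """items: dict item -> set of supporting contributors."""
    live = {i: set(s) for i, s in items.items()}
    chosen = []
    for _ in range(k):
        unique = {}
        for i, supp in live.items():
            if len(supp) == 1:            # item hangs on one contributor
                (c,) = supp
                unique[c] = unique.get(c, 0) + 1
        if not unique:                    # no sole supporters: use degree
            deg = {}
            for supp in live.values():
                for c in supp:
                    deg[c] = deg.get(c, 0) + 1
            if not deg:
                break
            unique = deg
        best = max(unique, key=unique.get)
        chosen.append(best)
        # Remove the contributor; items left unsupported are isolated.
        live = {i: supp - {best} for i, supp in live.items()}
        live = {i: supp for i, supp in live.items() if supp}
    return chosen

deps = {"pkgA": {"alice"}, "pkgB": {"alice", "bob"},
        "pkgC": {"bob"}, "pkgD": {"carol"}}
print(critical_set(deps, 2))
```

On this toy graph, removing alice and bob isolates pkgA, pkgB, and pkgC, which is the best any two contributors can do here; items with redundant support (pkgB initially) only become critical once their other supporters are gone.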
[332] Unbiased Prevalence Estimation with Multicalibrated LLMs
Fridolin Linder, Thomas Leeper, Daniel Haimovich, Niek Tax, Lorenzo Perini, Milan Vojnovic
Main category: cs.AI
Abstract: Estimating the prevalence of a category in a population using imperfect measurement devices (diagnostic tests, classifiers, or large language models) is fundamental to science, public health, and online trust and safety. Standard approaches correct for known device error rates but assume these rates remain stable across populations. We show this assumption fails under covariate shift and that multicalibration, which enforces calibration conditional on the input features rather than just on average, is sufficient for unbiased prevalence estimation under such shift. Standard calibration and quantification methods fail to provide this guarantee. Our work connects recent theoretical work on fairness to a longstanding measurement problem spanning nearly all academic disciplines. A simulation confirms that standard methods exhibit bias growing with shift magnitude, while a multicalibrated estimator maintains near-zero bias. While we focus the discussion mostly on LLMs, our theoretical results apply to any classification model. Two empirical applications – estimating employment prevalence across U.S. states using the American Community Survey, and classifying political texts across four countries using an LLM – demonstrate that multicalibration substantially reduces bias in practice, while highlighting that calibration data should cover the key feature dimensions along which target populations may differ.
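The "standard approach" the abstract critiques is the classic Rogan–Gladen correction, which inverts known sensitivity and specificity. A short worked example shows both the correction and how it breaks once error rates shift with the population (the numbers are invented for illustration):

```python
# Rogan-Gladen prevalence correction, assuming known error rates:
#   pi_hat = (p_obs + specificity - 1) / (sensitivity + specificity - 1)
def rogan_gladen(p_obs, sens, spec):
    return (p_obs + spec - 1.0) / (sens + spec - 1.0)

# Calibrated on a source population: sens=0.9, spec=0.8, true pi=0.3.
pi = 0.3
p_obs = pi * 0.9 + (1 - pi) * (1 - 0.8)   # observed positive rate 0.41
print(round(rogan_gladen(p_obs, 0.9, 0.8), 3))  # recovers 0.3

# Under covariate shift the device's error rates change (say specificity
# drops to 0.7 in the target population); applying the old rates is biased:
p_shift = pi * 0.9 + (1 - pi) * (1 - 0.7)  # observed rate 0.48
print(round(rogan_gladen(p_shift, 0.9, 0.8), 3))  # biased estimate, > 0.3
```

The abstract's point is that multicalibration makes calibration hold conditionally on input features, so the effective error rates stay valid under such shifts rather than being fixed population-level constants.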
[333] Engaged AI Governance: Addressing the Last Mile Challenge Through Internal Expert Collaboration
Simon Jarvers, Orestis Papakyriakopoulos
Main category: cs.AI
Abstract: Under the EU AI Act, translating AI governance requirements into software development practice remains challenging. While AI governance frameworks exist at industry and organizational levels, empirical evidence of team-level implementation is scarce. We address this “Last Mile” Challenge through insider action research embedded within an AI startup. We present a legal-text-to-action pipeline that translates EU AI Act requirements into actionable strategies through internal expert collaboration by extracting requirements from legal text, engaging practitioners in assessment and ideation, and prioritizing implementation through collective evaluation. Our analysis reveals three patterns in how practitioners perceive regulatory requirements: convergence (compliance aligns with development priorities), existing practice (current work already satisfies requirements), and disconnection (requirements perceived as administrative overhead). Based on these patterns, we discuss when governance might be treated genuinely or performatively. Practitioners prioritize requirements that serve end-users or their own development needs, but view verification-oriented requirements as box-ticking exercises. This distinction suggests a translation challenge: regulatory requirements risk superficial treatment unless practitioners understand how compliance serves system quality and user protection. Expert collaboration offers a practical mechanism for transforming governance from external imposition to shared ownership and making previously invisible governance work visible and collective.
[334] Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation
Jingyang Li, Xin Chen, Hongfei Fu, Guoqiang Li
Main category: cs.AI
Abstract: The problem of probabilistic verification of a neural network investigates the probability of satisfying the safe constraints in the output space when the input is given by a probability distribution. Answering this problem matters when the input is affected by disturbances, which are often modeled by probabilistic variables. In the paper, we propose a novel neural network probabilistic verification framework which computes a guaranteed range for the safe probability by efficiently finding safe and unsafe probabilistic hulls. Our approach consists of three main innovations: (1) a state space subdivision strategy using regression trees to produce probabilistic hulls, (2) a boundary-aware sampling method which identifies the safety boundary in the input space using samples that are later used for building regression trees, and (3) iterative refinement with probabilistic prioritization for computing a guaranteed range for the safe probability. The accuracy and efficiency of our approach are evaluated on various benchmarks including ACAS Xu and a rocket lander controller. The results show a clear advantage over the state of the art.
[335] Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies
Chris Schneider, Philipp Schoenegger, Ben Bariach
Main category: cs.AI
Abstract: Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-layer architecture that decouples personal data from shared weights by combining a static base model, composable domain-expert LoRA adapters that shape behavior without imparting user data, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B confirms per-user differentiation in which personal data influences outputs while remaining isolated, verified by a return to baseline after proxy removal (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) and near-zero cross-user contamination. Because user-specific information never enters shared weights, the architecture mitigates model inversion, membership inference, and training-data extraction against shared model components by construction. The approach converts machine unlearning from an intractable weight-editing problem into a deterministic deletion operation that preserves personalization alongside privacy-enhancing guarantees and is compatible with differentially private stochastic gradient descent (DP-SGD) for privacy-preserving shared model improvement.
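The three-layer separation described above can be sketched abstractly: a frozen base, shared domain adapters holding no user data, and per-user proxies whose deletion is the unlearning operation. All names and the additive composition below are illustrative assumptions, not the paper’s API:

```python
class PersonalizedModel:
    """Sketch of the separable layers: deleting a user proxy removes all
    influence of that user's data, since shared weights never saw it."""

    def __init__(self, base):
        self.base = base          # static base model (frozen)
        self.adapters = {}        # domain -> shared adapter (no user data)
        self.user_proxies = {}    # user_id -> private per-user artefact

    def respond(self, x, user_id=None):
        y = self.base(x)
        y += sum(a(x) for a in self.adapters.values())
        if user_id in self.user_proxies:
            y += self.user_proxies[user_id](x)
        return y

    def forget(self, user_id):
        # Deterministic unlearning: a dictionary deletion, not weight editing.
        self.user_proxies.pop(user_id, None)
```

The “return to baseline after proxy removal” check in the abstract corresponds here to `respond` giving identical outputs before personalization and after `forget`.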
[336] CoFEE: Reasoning Control for LLM-Based Feature Discovery
Maximilian Westermann, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Yagiz Ihlamur, Kelvin Amoaba, Joseph Ternasky, Fuat Alican, Yigit Ihlamur
Main category: cs.AI
Abstract: Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and post-outcome signals. With the introduction of ever-improving Large Language Models (LLMs), our framework provides a structured method for addressing this challenge. LLMs are well suited to this task, as they can process large amounts of information, but unconstrained feature generation can lead to weak features. In this work, we study reasoning control in LLMs by inducing cognitive behaviors for improving feature discovery. We introduce CoFEE (Cognitive Feature Engineering Engine), a reasoning control framework that enforces cognitive behaviors in how the LLM reasons during feature discovery. From a machine learning perspective, these cognitive behaviors act as structured inductive biases over the space of candidate features generated by the model. These behaviors have been exploited with success in ML models, and include backward chaining from outcomes, subgoal decomposition, verification against observability and leakage criteria, and explicit backtracking of rejected reasoning paths. In a controlled comparison, we show that enforcing cognitive behaviors yields features with higher empirical predictability than those under unconstrained vanilla LLM prompts. CoFEE achieves an average Success Rate Score that is 15.2% higher than the vanilla approach, while generating 29% fewer features and reducing costs by 53.3%. Using held-out feature evaluation, we assess whether cognitively induced features generalize beyond the data used for discovery. Our results indicate that, in our evaluated setting, reasoning control is associated with improvements in quality and efficiency of LLM-based feature discovery.
[337] To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning
Nevena Lazić, Liam Fowl, András György, Csaba Szepesvári
Main category: cs.AI
Abstract: We investigate the ability of decoder-only transformer models to perform abstract symbolic reasoning; specifically solving propositional logic reasoning problems given in-context. Previous work demonstrated that models fail to generalize to problems involving variable names that were not observed during training, and it was shown that one reason behind this is the difficulty of copying (or generating) unseen tokens. We show both theoretically and empirically that a particular representational collapse also has a crucial role: the unembeddings (last-layer weights) of unseen tokens collapse to nearly the same vector during training. The collapse makes distinguishing multiple unseen variables difficult for the model (especially when the embedding and unembedding parameters are shared), and provides a mechanistic explanation for the effectiveness of existing heuristic interventions like “active forgetting”, which periodically reset the token (un)embeddings. Based on these observations, we devise a combination of techniques, involving a small architecture change facilitating copying, data diversity, and freezing or resetting (un)embeddings, that achieves generalization to unseen tokens. We support our claims with extensive controlled experiments on propositional logic reasoning problems. Beyond synthetic experiments, we also observe evidence of (un)embedding collapse in the open-weight models in the Gemma 3 family, which includes 99 unused tokens reserved for downstream use. Empirically we find that the correlated embeddings of these tokens are a poor initialization for finetuning applications.
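The representational collapse the abstract describes is easy to quantify: if the unembedding rows of unseen tokens collapse to nearly the same vector, their mean pairwise cosine similarity approaches 1. A small diagnostic sketch (the collapse criterion and threshold are illustrative):

```python
import numpy as np

def mean_pairwise_cosine(U):
    """Mean pairwise cosine similarity among rows of an unembedding
    sub-matrix U (e.g. the rows for unseen or reserved tokens).
    Values near 1.0 indicate the collapse discussed above."""
    X = U / np.linalg.norm(U, axis=1, keepdims=True)   # unit-normalize rows
    S = X @ X.T                                        # cosine similarity matrix
    n = len(U)
    return (S.sum() - n) / (n * (n - 1))               # off-diagonal mean
```

Applied to the 99 unused Gemma 3 tokens mentioned in the abstract, one would slice those rows out of the unembedding matrix and pass them to this function; randomly initialized rows would instead score near zero.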
[338] GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion
Qizhuo Xie, Yunhui Liu, Yu Xing, Qianzi Hou, Xudong Jin, Tao Zheng, Tieke He
Main category: cs.AI
Abstract: Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these modalities, they typically treat quantization as flat numerical compression, resulting in semantically entangled codes that fail to mirror the hierarchical nature of human reasoning. In this paper, we propose GS-Quant, a novel framework that generates semantically coherent and structurally stratified discrete codes for KG entities. Unlike prior methods, GS-Quant is grounded in the insight that entity representations should follow a linguistic coarse-to-fine logic. We introduce a Granular Semantic Enhancement module that injects hierarchical knowledge into the codebook, ensuring that earlier codes capture global semantic categories while later codes refine specific attributes. Furthermore, a Generative Structural Reconstruction module imposes causal dependencies on the code sequence, transforming independent discrete units into structured semantic descriptors. By expanding the LLM vocabulary with these learned codes, we enable the model to reason over graph structures isomorphically to natural language generation. Experimental results demonstrate that GS-Quant significantly outperforms existing text-based and embedding-based baselines. Our code is publicly available at https://github.com/mikumifa/GS-Quant.
[339] Enabling and Inhibitory Pathways of University Students’ Willingness to Disclose AI Use: A Cognition-Affect-Conation Perspective
Yiran Du, Huimin He
Main category: cs.AI
Abstract: The increasing integration of artificial intelligence (AI) in higher education has raised important questions regarding students’ transparency in reporting AI-assisted work. This study investigates the psychological mechanisms underlying university students’ willingness to disclose AI use by applying the Cognition–Affect–Conation (CAC) framework. A sequential explanatory mixed-methods design was employed. In the quantitative phase, survey data were collected from 546 university students and analysed using structural equation modelling to examine the relationships among cognitive perceptions, affective responses, and disclosure intention. In the qualitative phase, semi-structured interviews with 22 students were conducted to further interpret the quantitative findings. The results indicate that psychological safety significantly increases students’ willingness to disclose AI use and is positively shaped by perceived fairness, perceived teacher support, and perceived organisational support. Conversely, evaluation apprehension reduces disclosure intention and psychological safety, and is strengthened by perceived stigma, perceived uncertainty, and privacy concern. Qualitative findings further reveal that institutional clarity and supportive instructional practices encourage openness, whereas policy ambiguity and fear of negative evaluation often lead students to adopt cautious or strategic disclosure practices. Overall, the study highlights the dual role of enabling and inhibitory psychological mechanisms in shaping AI-use disclosure and underscores the importance of supportive institutional environments and clear guidance for promoting responsible AI transparency in higher education.
[340] Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
Dat To-Thanh, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu
Main category: cs.AI
Abstract: Image enhancement models for mobile devices often struggle to balance high output quality with the fast processing speeds required by mobile hardware. While recent deep learning models can enhance low-quality mobile photos into high-quality images, their performance is often degraded when converted to lower-precision formats for actual use on mobile phones. To address this training-deployment mismatch, we propose an efficient image enhancement model designed specifically for mobile deployment. Our approach uses a hierarchical network architecture with gated encoder blocks and multiscale refinement to preserve fine-grained visual features. Moreover, we incorporate Quantization-Aware Training (QAT) to simulate the effects of low-precision representation during the training process. This allows the network to adapt and prevents the typical drop in quality seen with standard post-training quantization (PTQ). Experimental results demonstrate that the proposed method produces high-fidelity visual output while maintaining the low computational overhead needed for practical use on standard mobile devices. The code will be available at https://github.com/GenAI4E/QATIE.git.
[341] Thinking with Reasoning Skills: Fewer Tokens, More Accuracy
Guangxiang Zhao, Qilong Shi, Xusen Xiao, Xiangzheng Zhang, Tong Yang, Lin Sun
Main category: cs.AI
Abstract: Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing “reasoning from scratch” paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance. The resulting lower per-request cost indicates strong practical and economic potential for real-world deployment.
[342] Who Defines “Best”? Towards Interactive, User-Defined Evaluation of LLM Leaderboards
Minji Jung, Minjae Lee, Yejin Kim, Sarang Choi, Minsuk Kahng
Main category: cs.AI
Abstract: LLM leaderboards are widely used to compare models and guide deployment decisions. However, leaderboard rankings are shaped by evaluation priorities set by benchmark designers, rather than by the diverse goals and constraints of actual users and organizations. A single aggregate score often obscures how models behave across different prompt types and compositions. In this work, we conduct an in-depth analysis of the dataset used in the LMArena (formerly Chatbot Arena) benchmark and investigate this evaluation challenge by designing an interactive visualization interface as a design probe. Our analysis reveals that the dataset is heavily skewed toward certain topics, that model rankings vary across prompt slices, and that preference-based judgments are used in ways that blur their intended scope. Building on this analysis, we introduce a visualization interface that allows users to define their own evaluation priorities by selecting and weighting prompt slices and to explore how rankings change accordingly. A qualitative study suggests that this interactive approach improves transparency and supports more context-specific model evaluation, pointing toward alternative ways to design and use LLM leaderboards.
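The core mechanic of the interface, letting users re-weight prompt slices and watch rankings change, can be sketched in a few lines (the slice names, scores, and aggregation are illustrative, not LMArena’s actual data or method):

```python
def weighted_ranking(slice_scores, weights):
    """slice_scores: model -> {slice: score}; weights: slice -> user weight.
    Returns model names sorted by the user-weighted aggregate score,
    illustrating how rankings shift with evaluation priorities."""
    total = sum(weights.values())
    agg = {
        model: sum(weights[s] * scores[s] for s in weights) / total
        for model, scores in slice_scores.items()
    }
    return sorted(agg, key=agg.get, reverse=True)

scores = {"A": {"code": 0.9, "writing": 0.5},
          "B": {"code": 0.6, "writing": 0.8}}
```

With all weight on "code", model A leads; shift the weight to "writing" and B overtakes it, which is exactly the ranking instability a single aggregate score hides.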
[343] Inferring High-Level Events from Timestamped Data: Complexity and Medical Applications
Yvon K. Awuklu, Meghyn Bienvenu, Katsumi Inoue, Vianney Jouhet, Fleur Mougin
Main category: cs.AI
Abstract: In this paper, we develop a novel logic-based approach to detecting high-level temporally extended events from timestamped data and background knowledge. Our framework employs logical rules to capture existence and termination conditions for simple temporal events and to combine these into meta-events. In the medical domain, for example, disease episodes and therapies are inferred from timestamped clinical observations, such as diagnoses and drug administrations stored in patient records, and can be further combined into higher-level disease events. As some incorrect events might be inferred, we use constraints to identify incompatible combinations of events and propose a repair mechanism to select preferred consistent sets of events. While reasoning in the full framework is intractable, we identify relevant restrictions that ensure polynomial-time data complexity. Our prototype system implements core components of the approach using answer set programming. An evaluation on a lung cancer use case supports the interest of the approach, both in terms of computational feasibility and positive alignment of our results with medical expert opinions. While strongly motivated by the needs of the healthcare domain, our framework is purposely generic, enabling its reuse in other areas.
[344] Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows
Anuj Sadani, Deepak Kumar
Main category: cs.AI
Abstract: The Model Context Protocol (MCP) has become a common interface for connecting large language model (LLM) agents to external tools, but its reliance on stateless, eager schema injection imposes a hidden per-turn overhead (the “MCP Tax” or “Tools Tax”) that practitioner reports place between roughly 10k and 60k tokens in typical multi-server deployments. This payload inflates the key-value cache, is associated with reasoning degradation as context utilization approaches published fracture points around 70%, and turns token budgets into a recurring operational cost. We introduce Tool Attention, a middleware-layer mechanism that generalizes the “Attention Is All You Need” paradigm from self-attention over tokens to gated attention over tools. Tool Attention combines (i) an Intent Schema Overlap (ISO) score from sentence embeddings, (ii) a state-aware gating function enforcing preconditions and access scopes, and (iii) a two-phase lazy schema loader that keeps a compact summary pool in context and promotes full JSON schemas only for top-k gated tools. We evaluate on a simulated 120-tool, six-server benchmark whose per-server token counts are calibrated to public audits of real MCP deployments. In this simulation, Tool Attention directly reduces measured per-turn tool tokens by 95.0% (47.3k -> 2.4k) and raises effective context utilization (a token-ratio quantity) from 24% to 91%. End-to-end figures for task success, latency, cost, and reasoning quality are reported as projections derived from the measured token counts combined with published deployment telemetry; they are not measured on live LLM agents, and we mark projected values explicitly throughout. Taken together, the results support a simple thesis: protocol-level efficiency, not raw context length, is a binding constraint on scalable agentic systems. The code for this work is accessible at https://github.com/asadani/tool-attention
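The three components named in the abstract (ISO scoring, gating, lazy schema promotion) can be sketched together. The paper computes ISO from sentence embeddings; the dependency-free word-overlap stand-in below, and all names and data structures, are illustrative assumptions:

```python
def iso_score(intent, tool_summary):
    """Toy Intent-Schema Overlap: Jaccard overlap of word sets, standing
    in for the paper's sentence-embedding similarity."""
    a, b = set(intent.lower().split()), set(tool_summary.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def gate_tools(intent, summaries, schemas, k=2, allowed=None):
    """Rank tools by ISO, apply an access-scope gate, and lazily promote
    full schemas only for the top-k survivors; everything else stays as
    a compact summary, avoiding the eager per-turn schema payload."""
    candidates = [t for t in summaries if allowed is None or t in allowed]
    ranked = sorted(candidates,
                    key=lambda t: iso_score(intent, summaries[t]),
                    reverse=True)
    return {t: schemas[t] for t in ranked[:k]}

summaries = {"search": "search the web for pages",
             "calc": "evaluate math expressions",
             "mail": "send an email message"}
schemas = {t: {"name": t, "parameters": "..."} for t in summaries}
```

With `k=1`, a query like "search the web for recent papers" promotes only the `search` schema into context, which is the mechanism behind the reported token reduction.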
[345] Alignment has a Fantasia Problem
Nathanael Jo, Zoe De Simone, Mitchell Gordon, Ashia Wilson
Main category: cs.AI
Abstract: Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, they can appear to be useful or convenient, but not necessarily aligned with the users’ needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent through time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize the mechanisms and failures of Fantasia interactions. We then show why existing interventions are insufficient, and propose a research agenda for designing and evaluating AI systems that better help humans navigate uncertainty in their tasks.
[346] Bounding the Black Box: A Statistical Certification Framework for AI Risk Regulation
Natan Levy, Gadi Perl
Main category: cs.AI
Abstract: Artificial intelligence now decides who receives a loan, who is flagged for criminal investigation, and whether an autonomous vehicle brakes in time. Governments have responded: the EU AI Act, the NIST Risk Management Framework, and the Council of Europe Convention all demand that high-risk systems demonstrate safety before deployment. Yet beneath this regulatory consensus lies a critical vacuum: none specifies what “acceptable risk” means in quantitative terms, and none provides a technical method for verifying that a deployed system actually meets such a threshold. The regulatory architecture is in place; the verification instrument is not. This gap is not theoretical. As the EU AI Act moves into full enforcement, developers face mandatory conformity assessments without established methodologies for producing quantitative safety evidence, and the systems most in need of oversight are opaque statistical inference engines that resist white-box scrutiny. This paper provides the missing instrument. Drawing on the aviation certification paradigm, we propose a two-stage framework that transforms AI risk regulation into engineering practice. In Stage One, a competent authority formally fixes an acceptable failure probability $\delta$ and an operational input domain $\varepsilon$, a normative act with direct civil liability implications. In Stage Two, the RoMA and gRoMA statistical verification tools compute a definitive, auditable upper bound on the system’s true failure rate, requiring no access to model internals and scaling to arbitrary architectures. We demonstrate how this certificate satisfies existing regulatory obligations, shifts accountability upstream to developers, and integrates with the legal frameworks that exist today.
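The kind of black-box certificate the abstract describes can be illustrated with a standard one-sided binomial bound: after observing f failures in n i.i.d. trials drawn from the operational domain, compute an upper confidence bound on the true failure probability. This Clopper-Pearson-style sketch is a generic stand-in, not the RoMA/gRoMA method itself:

```python
import math

def failure_rate_upper_bound(n, f, alpha=0.01):
    """Smallest p such that P(X <= f) <= alpha for X ~ Binomial(n, p):
    a (1 - alpha)-confidence upper bound on the true failure rate,
    requiring only black-box sampling, no model internals."""
    def cdf(p):  # P(X <= f | p), strictly decreasing in p
        return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(f + 1))
    lo, hi = 0.0, 1.0
    for _ in range(60):              # bisection to ~2^-60 precision
        mid = (lo + hi) / 2
        if cdf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

For f = 0 this reduces to the closed form 1 - alpha**(1/n) (the familiar "rule of three" when alpha is about 0.05): e.g., 1000 clean trials certify a failure rate below roughly 0.0046 at 99% confidence.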
[347] Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models
Chee Wei Tan, Yuchen Wang, Shangxin Guo
Main category: cs.AI
Abstract: This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon’s taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy LLM-powered game agents while actively engaging with AI-driven strategies. The LLM-based chatbot, integrated within Nemobot, demonstrates its capabilities across four distinct classes of games. For dictionary-based games, it compresses state-action mappings into efficient, generalized models for rapid adaptability. In rigorously solvable games, it employs mathematical reasoning to compute optimal strategies and generates human-readable explanations for its decisions. For heuristic-based games, it synthesizes strategies by combining insights from classical minimax algorithms (see, e.g., Shannon, 1950) with crowd-sourced data. Finally, in learning-based games, it utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies through trial-and-error and imitation learning. Nemobot amplifies this framework by offering a programmable environment where users can experiment with tool-augmented generation and fine-tuning of strategic game agents. From strategic games to role-playing games, Nemobot demonstrates how AI agents can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own logic. This represents a step toward the long-term goal of self-programming AI.
[348] From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation
Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski
Main category: cs.AI
Abstract: Scientific workflow systems automate execution – scheduling, fault tolerance, resource management – but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an agentic architecture that closes this gap through three layers: an LLM interprets natural language into structured intents (semantic layer); validated generators produce reproducible workflow DAGs (deterministic layer); and domain experts author “Skills”: markdown documents encoding vocabulary mappings, parameter constraints, and optimization strategies (knowledge layer). This decomposition confines LLM non-determinism to intent extraction: identical intents always yield identical workflows. We implement and evaluate the architecture on the 1000 Genomes population genetics workflow and Hyperflow WMS running on Kubernetes. In an ablation study on 150 queries, Skills raise full-match intent accuracy from 44% to 83%; skill-driven deferred workflow generation reduces data transfer by 92%; and the end-to-end pipeline completes queries on Kubernetes with LLM overhead below 15 seconds and cost under $0.001 per query.
[349] Handbook of Rough Set Extensions and Uncertainty Models
Takaaki Fujita, Florentin Smarandache
Main category: cs.AI
Abstract: Rough set theory models uncertainty by approximating target concepts through lower and upper sets induced by indiscernibility, or more generally, by granulation relations in data tables. This perspective captures vagueness caused by limited observational resolution and supports set-theoretic reasoning about what can be determined with certainty and what remains only possible. This book is written as a map of models. Rather than developing a single algorithmic pipeline in depth, it provides a systematic survey of the main rough set paradigms and their extension routes. More specifically, representative variants are organized according to (i) the underlying granulation mechanism, such as equivalence-based, tolerance-based, covering-based, neighborhood-based, and probabilistic approximations, and (ii) the uncertainty semantics attached to data and relations, such as crisp, fuzzy, intuitionistic fuzzy, neutrosophic, and plithogenic settings. The book also explains how each choice changes the form of approximations and the interpretation of boundary regions. Throughout the book, small illustrative examples are used to clarify modeling intent and typical use cases in classification and decision support. Finally, an important clarification of scope should be noted. Since the main purpose of this book is to provide a map of models, the Abstract and Introduction should not lead readers to expect that feature reduction and rule induction are primary objectives. Although these topics are central in the rough set literature, they are treated here mainly as motivating applications and as entry points to the broader research landscape. The principal aim of the book is to survey and position rough set models and their extensions in a systematic and coherent manner.
[350] Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
Zehua Pei, Hui-Ling Zhen, Lancheng Zou, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2502.04416 was rate-limited (HTTP 429).
[351] C-SHAP for time series: An approach to high-level temporal explanations
Annemarie Jutte, Faizan Ahmed, Jeroen Linssen, Maurice van Keulen
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2504.11159 was rate-limited (HTTP 429).
[352] KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems
Stepan Kulibaba, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman
Main category: cs.AI
Abstract: Unavailable (arXiv:2508.10177; summary fetch returned HTTP 429).
[353] PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation
Jiho Choi, Seojeong Park, Seongjong Song, Hyunjung Shim
Main category: cs.AI
Abstract: Unavailable (arXiv:2508.21720; summary fetch returned HTTP 429).
[354] Learning Reasoning Reward Models from Expert Demonstration via Inverse Reinforcement Learning
Claudio Fanconi, Nicolás Astorga, Mihaela van der Schaar
Main category: cs.AI
Abstract: Unavailable (arXiv:2510.01857; summary fetch returned HTTP 429).
[355] OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data
Alana Renda, Jillian Ross, Michael Cafarella, Jacob Andreas
Main category: cs.AI
Abstract: Unavailable (arXiv:2510.15096; summary fetch returned HTTP 429).
[356] mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling
Tristan Torchet, Christian Metzner, Karthik Charan Raghunathan, Jimmy Weber, Sebastian Billaudelle, Laura Kriener, Melika Payvand
Main category: cs.AI
Abstract: Unavailable (arXiv:2507.01829; summary fetch returned HTTP 429).
[357] Multimodal Bayesian Network for Robust Assessment of Casualties in Autonomous Triage
Szymon Rusiecki, Cecilia G. Morales, Kimberly Elenberg, Leonard Weiss, Artur Dubrawski
Main category: cs.AI
Abstract: Unavailable (arXiv:2512.18908; summary fetch returned HTTP 429).
[358] HyperAdapt: Simple High-Rank Adaptation
Abel Gurung, Joseph Campbell
Main category: cs.AI
Abstract: Unavailable (arXiv:2509.18629; summary fetch returned HTTP 429).
[359] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
Keyu Li, Junhao Shi, Yang Xiao, Mohan Jiang, Jie Sun, Yunze Wu, Dayuan Fu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, Wenjie Li, Dequan Wang, Pengfei Liu
Main category: cs.AI
Abstract: Unavailable (arXiv:2601.11044; summary fetch returned HTTP 429).
[360] SemaPop: Semantic-Persona Conditioned and Controllable Population Synthesis
Zhenlin Qin, Yancheng Ling, Leizhen Wang, Francisco Câmara Pereira, Zhenliang Ma
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.11569; summary fetch returned HTTP 429).
[361] ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
Jincheng Liu, Sijun He, Jingjing Wu, Xiangsen Wang, Yang Chen, Zhaoqi Kuang, Siqi Bao, Yuan Yao
Main category: cs.AI
Abstract: Unavailable (arXiv:2509.24239; summary fetch returned HTTP 429).
[362] Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints
Hugh Xuechen Liu, Kıvanç Tatar
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.07101; summary fetch returned HTTP 429).
[363] Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data
Jiancheng Zhang, Yinglun Zhu
Main category: cs.AI
Abstract: Unavailable (arXiv:2510.03247; summary fetch returned HTTP 429).
[364] Empirical Comparison of Agent Communication Protocols for Task Orchestration
Ivan Dobrovolskyi
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.22823; summary fetch returned HTTP 429).
[365] On the Relationship between Bayesian Networks and Probabilistic Structural Causal Models
Peter J.F. Lucas, Eleonora Zullo, Fabio Stella
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.27406; summary fetch returned HTTP 429).
[366] Wiring the ‘Why’: A Unified Taxonomy and Survey of Abductive Reasoning in LLMs
Moein Salimi, Shaygan Adim, Danial Parnian, Nima Alighardashi, Mahdi Jafari Siavoshani, Mohammad Hossein Rohban
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.08016; summary fetch returned HTTP 429).
[367] DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?
Young-Suk Lee, Ramon Fernandez Astudillo, Radu Florian
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.09251; summary fetch returned HTTP 429).
[368] QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence
Zhichao Lin, Zhichao Liang, Gaoqiang Liu, Meng Xu, Baoyu Xiang, Shuxin Zhao, Yao Wu, Jian Xu, Guanjun Jiang
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.12867; summary fetch returned HTTP 429).
[369] Addressing divergent representations from causal interventions on neural networks
Satchel Grant, Simon Jerome Han, Alexa R. Tartaglini, Christopher Potts
Main category: cs.AI
Abstract: Unavailable (arXiv:2511.04638; summary fetch returned HTTP 429).
[370] AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance
Abiodun A. Solanke
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.12875; summary fetch returned HTTP 429).
[371] Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis
Yiling He, Junchi Lei, Hongyu She, Shuo Shao, Xinran Zheng, Yiping Liu, Zhan Qin, Lorenzo Cavallaro
Main category: cs.AI
Abstract: Unavailable (arXiv:2511.11439; summary fetch returned HTTP 429).
[372] HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks
Fan Cui, Hongyuan Hou, Zizhang Luo, Chenyun Yin, Yun Liang
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.14709; summary fetch returned HTTP 429).
[373] Why Do Language Model Agents Whistleblow?
Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland
Main category: cs.AI
Abstract: Unavailable (arXiv:2511.17085; summary fetch returned HTTP 429).
[374] ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams
Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.15994; summary fetch returned HTTP 429).
[375] Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations
Emily Reif, Claire Yang, Jared Hwang, Deniz Nazar, Noah A. Smith, Jeff Heer
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.18724; summary fetch returned HTTP 429).
[376] Stabilising Generative Models of Attitude Change
Jayd Matyas, William A. Cunningham, Alexander Sasha Vezhnevets, Dean Mobbs, Edgar A. Duéñez-Guzmán, Joel Z. Leibo
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.19791; summary fetch returned HTTP 429).
[377] Deconstructing Superintelligence: Identity, Self-Modification and Différance
Elija Perrier
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.19845; summary fetch returned HTTP 429).
[378] FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory
Yingjie Gu, Wenjian Xiong, Liqiang Wang, Pengcheng Ren, Chao Li, Xiaojing Zhang, Yijuan Guo, Qi Sun, Jingyao Ma, Shidang Shi
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.20300; summary fetch returned HTTP 429).
[379] RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning
Zehua Liu, Shuqi Liu, Tao Zhong, Mingxuan Yuan
Main category: cs.AI
Abstract: Unavailable (arXiv:2601.09253; summary fetch returned HTTP 429).
[380] Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure
Nattavudh Powdthavee
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.20652; summary fetch returned HTTP 429).
[381] GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
Jiaying Zhang, Lei Shi, Jiguo Li, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He
Main category: cs.AI
Abstract: Unavailable (arXiv:2601.09361; summary fetch returned HTTP 429).
[382] Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own
Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao
Main category: cs.AI
Abstract: Unavailable (arXiv:2310.02635; summary fetch returned HTTP 429).
[383] The Economics of p(doom): Scenarios of Existential Risk and Economic Growth in the Age of Transformative AI
Jakub Growiec, Klaus Prettner
Main category: cs.AI
Abstract: Unavailable (arXiv:2503.07341; summary fetch returned HTTP 429).
[384] Algebraic Language Models for Inverse Design of Metamaterials via Diffusion Transformers
Li Zheng, Siddhant Kumar, Dennis M. Kochmann
Main category: cs.AI
Abstract: Unavailable (arXiv:2507.15753; summary fetch returned HTTP 429).
[385] A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
Napsu Karmitsa, Antti Airola, Tapio Pahikkala, Tinja Pitkämäki
Main category: cs.AI
Abstract: Unavailable (arXiv:2509.03294; summary fetch returned HTTP 429).
[386] InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
Shiju Wang, Yujie Wang, Ao Sun, Fangcheng Fu, Zijian Zhu, Bin Cui, Xu Han, Kaisheng Ma
Main category: cs.AI
Abstract: Unavailable (arXiv:2509.21275; summary fetch returned HTTP 429).
[387] Analyzing Shapley Additive Explanations to Understand Anomaly Detection Algorithm Behaviors and Their Complementarity
Jordan Levy, Paul Saves, Moncef Garouani, Nicolas Verstaevel, Benoit Gaudou
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.00208; summary fetch returned HTTP 429).
[388] A Quantale-Weakness Route to $P \neq NP$ via CD Evidence Normalization and Gauge-Buffered Locked Ensembles
Ben Goertzel
Main category: cs.AI
Abstract: Unavailable (arXiv:2510.08814; summary fetch returned HTTP 429).
[389] Continuous-Utility Direct Preference Optimization
Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zihao He, Muhammad Usman Rafique, Asad Aali, Muhammad Ali Jamshed, John M. Cioffi, Emily Fox
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.00931; summary fetch returned HTTP 429).
[390] AI for software engineering: from probable to provable
Bertrand Meyer
Main category: cs.AI
Abstract: Unavailable (arXiv:2511.23159; summary fetch returned HTTP 429).
[391] OpInf-LLM: Parametric PDE Solving with LLMs via Operator Inference
Zhuoyuan Wang, Hanjiang Hu, Xiyu Deng, Saviz Mowlavi, Yorie Nakahira
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.01493; summary fetch returned HTTP 429).
[392] Language-Conditioned Safe Trajectory Generation for Spacecraft Rendezvous
Yuji Takubo, Arpit Dwivedi, Sukeerth Ramkumar, Luis A. Pabon, Daniele Gammelli, Marco Pavone, Simone D’Amico
Main category: cs.AI
Abstract: Unavailable (arXiv:2512.09111; summary fetch returned HTTP 429).
[393] Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra
Stefan Kuhn, Vandana Dwarka, Przemyslaw Karol Grenda, Eero Vainikko
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.03875; summary fetch returned HTTP 429).
[394] Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection
Yun Bian, Yi Chen, HaiQuan Wang, ShiHao Li, Zhe Cui
Main category: cs.AI
Abstract: Unavailable (arXiv:2601.02438; summary fetch returned HTTP 429).
[395] How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chaowen Hu, Lu Pan, Ke Zeng, Xunliang Cai
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.19208; summary fetch returned HTTP 429).
[396] NPU Design for Diffusion Language Model Inference
Binglei Lou, Haoran Wu, Kevin Lau, Gregor MacDonald, Jiayi Nie, Yao Lai, Can Xiao, Xuan Guo, Jianyi Cheng, Rika Antonova, Robert Mullins, Aaron Zhao
Main category: cs.AI
Abstract: Unavailable (arXiv:2601.20706; summary fetch returned HTTP 429).
[397] An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning
Miao Ye, Yanye Chen, Yong Wang, Cheng Zhu, Qiuxiang Jiang, Gai Huang, Feng Ding
Main category: cs.AI
Abstract: Unavailable (arXiv:2602.13211; summary fetch returned HTTP 429).
[398] ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving
Yuseon Choi, Jingu Lee, Jungjun Oh, Sunjoo Whang, Byeongcheol Kim, Minsung Kim, Hoi-Jun Yoo, Sangjin Kim
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.14626; summary fetch returned HTTP 429).
[399] ATLAS: AI-Assisted Threat-to-Assertion Learning for System-on-Chip Security Verification
Ishraq Tashdid, Kimia Tasnia, Alexander Garcia, Jonathan Valamehr, Sazadur Rahman
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.01170; summary fetch returned HTTP 429).
[400] Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models
Jiaoyang Ruan, Xin Gao, Yinda Chen, Hengyu Zeng, Liang Du, Guanghao Li, Jie Fu, Jian Pu
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.16565; summary fetch returned HTTP 429).
[401] Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data
Saptarshi Chakraborty, Quentin Berthet, Peter L. Bartlett
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.03700; summary fetch returned HTTP 429).
[402] LiveSense: A Real-Time Wi-Fi Sensing Platform for Range-Doppler on COTS Laptop
Jessica Sanson, Rahul C. Shah, Maximilian Pinaroc, Cagri Tanriover, Valerio Frascolla
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.06545; summary fetch returned HTTP 429).
[403] StormNet: Improving storm surge predictions with a GNN-based spatio-temporal offset forecasting model
Noujoud Nader, Stefanos Giaremis, Clint Dawson, Carola Kaiser, Karame Mohammadiporshokooh, Hartmut Kaiser
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.20688; summary fetch returned HTTP 429).
[404] Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework
Eduardo Di Santi
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.18677; summary fetch returned HTTP 429).
[405] Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis
Main category: cs.AI
Abstract: Unavailable (arXiv:2603.18740; summary fetch returned HTTP 429).
[406] Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation
Wuyang Luan, Junhui Li, Weiguang Zhao, Wenjian Zhang, Tieru Wu, Rui Ma
Main category: cs.AI
Abstract: Unavailable (arXiv:2604.05673; summary fetch returned HTTP 429).
[407] Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech
Edwin C. Montiel-Vazquez, Christian Arzate Cruz, Stefanos Gkikas, Thomas Kassiotis, Giorgos Giannakakis, Randy Gomez
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.11417 returned HTTP 429 (rate-limited).
[408] LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software
Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.12994 returned HTTP 429 (rate-limited).
[409] Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks
Seyedreza Mohseni, Sarvesh Baskar, Edward Raff, Manas Gaur
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.15390 returned HTTP 429 (rate-limited).
[410] The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE
Robert Feldt, Per Lenberg, Julian Frattini, Dhasarathy Parthasarathy
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.15468 returned HTTP 429 (rate-limited).
[411] Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT)
Rajveer Bachkaniwala, Chengqi Luo, Richard So, Divya Mahajan, Kexin Rong
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.16395 returned HTTP 429 (rate-limited).
[412] Quantifying how AI Panels improve precision
Nicholas CL Beale
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.16432 returned HTTP 429 (rate-limited).
[413] Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
Alankrit Chona, Igor Kozlov, Ambuj Kumar
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.19533 returned HTTP 429 (rate-limited).
[414] Model Capability Assessment and Safeguards for Biological Weaponization
Michael Richter
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.19811 returned HTTP 429 (rate-limited).
[415] Vibrotactile Preference Learning: Uncertainty-Aware Preference Learning for Personalized Vibration Feedback
Rongtao Zhang, Xin Zhu, Masoume Pourebadi Khotbehsara, Warren Dao, Erdem Bıyık, Heather Culbertson
Main category: cs.AI
Abstract: Unavailable; the arXiv API request for 2604.20210 returned HTTP 429 (rate-limited).
cs.SD
[416] MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control
Jialong Mai, Xiaofen Xing, Xiangmin Xu
Main category: cs.SD
TL;DR: MAGIC-TTS is the first TTS model with explicit local timing control over token-level content duration and pauses.
Details
Motivation: Existing TTS systems provide only utterance-level duration or global speaking-rate control; precise token-level timing manipulation remains unavailable.
Method: Explicit token-level duration conditioning, carefully prepared high-confidence duration supervision, and training mechanisms that correct zero-value bias and make the model robust to missing local controls.
Result: Substantially improved token-level duration and pause following over spontaneous synthesis, natural high-quality output even without timing control, and low mean bias on a scenario-based local-editing benchmark.
Conclusion: Explicit fine-grained timing controllability can be implemented effectively in a high-quality TTS system and supports realistic local timing-editing applications.
Abstract: Fine-grained local timing control is still absent from modern text-to-speech systems: existing approaches typically provide only utterance-level duration or global speaking-rate control, while precise token-level timing manipulation remains unavailable. To the best of our knowledge, MAGIC-TTS is the first TTS model with explicit local timing control over token-level content duration and pause. MAGIC-TTS is enabled by explicit token-level duration conditioning, carefully prepared high-confidence duration supervision, and training mechanisms that correct zero-value bias and make the model robust to missing local controls. On our timing-control benchmark, MAGIC-TTS substantially improves token-level duration and pause following over spontaneous synthesis. Even when no timing control is provided, MAGIC-TTS maintains natural high-quality synthesis. We further evaluate practical local editing with a scenario-based benchmark covering navigation guidance, guided reading, and accessibility-oriented code reading. In this setting, MAGIC-TTS realizes a reproducible uniform-timing baseline and then moves the edited regions toward the requested local targets with low mean bias. These results show that explicit fine-grained controllability can be implemented effectively in a high-quality TTS system and can support realistic local timing-editing applications.
[417] Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0
Natalie Engert, Dominik Wagner, Korbinian Riedhammer, Tobias Bocklet
Main category: cs.SD
TL;DR: A systematic comparison of layer-wise versus time-wise aggregation of wav2vec 2.0 representations for regressing five dysarthric speech descriptors.
Details
Motivation: Despite W2V2's success in pathological speech analysis, it is unclear which components of its learned representations are most informative for specific downstream tasks.
Method: Regress five descriptors (intelligibility, imprecise consonants, inappropriate silences, harsh voice, monoloudness) from a W2V2-based feature extractor on the Speech Accessibility Project dataset, comparing layer-wise and time-wise aggregation with attentive statistics pooling.
Result: Intelligibility is best captured by layer-wise representations; imprecise consonants, harsh voice, and monoloudness benefit from time-wise modeling; inappropriate silences show no clear preference.
Conclusion: The most informative aggregation axis of W2V2 representations depends on the speech descriptor being predicted.
Abstract: Wav2vec 2.0 (W2V2) has shown strong performance in pathological speech analysis by effectively capturing the characteristics of atypical speech. Despite its success, it remains unclear which components of its learned representations are most informative for specific downstream tasks. In this study, we address this question by investigating the regression of dysarthric speech descriptors using annotations from the Speech Accessibility Project dataset. We focus on five descriptors, each addressing a different aspect of speech or voice production: intelligibility, imprecise consonants, inappropriate silences, harsh voice and monoloudness. Speech representations are derived from a W2V2-based feature extractor, and we systematically compare layer-wise and time-wise aggregation strategies using attentive statistics pooling. Our results show that intelligibility is best captured through layer-wise representations, whereas imprecise consonants, harsh voice and monoloudness benefit from time-wise modeling. For inappropriate silences, no clear advantage could be observed for either approach.
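The two aggregation strategies compared in the paper differ only in which axis attentive statistics pooling runs over (W2V2 layers versus time frames). A minimal dependency-free sketch of attentive statistics pooling itself, with hypothetical toy inputs (the paper's attention scores are learned inside a network):

```python
import math

def attentive_stats_pool(frames, scores):
    """Attentive statistics pooling: attention-weighted mean and std.

    frames: list of feature vectors (one per time frame, or one per layer
            for layer-wise aggregation); scores: one unnormalized attention
            score per vector. Returns [weighted mean ; weighted std]."""
    # Softmax over the pooling axis (numerically stabilized).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]
    dim = len(frames[0])
    # Attention-weighted first and second moments per dimension.
    mean = [sum(w[t] * frames[t][d] for t in range(len(frames))) for d in range(dim)]
    var = [sum(w[t] * (frames[t][d] - mean[d]) ** 2 for t in range(len(frames)))
           for d in range(dim)]
    std = [math.sqrt(max(v, 1e-12)) for v in var]
    return mean + std
```

With uniform scores this reduces to the plain mean and std; heavily skewed scores let the pooling focus on the most informative frames or layers.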
[418] Beyond Rules: Towards Basso Continuo Personal Style Identification
Adam Štefunko, Jan Hajič
Main category: cs.SD
TL;DR: SVM classifiers over a structured pitch representation ("griffs") can identify individual players from their basso continuo realizations in the ACoRD dataset.
Details
Motivation: Computational musicology has studied the theoretical rules of basso continuo, but its character as an active performing art, including performers' personal styles, has been largely overlooked for lack of suitable performance data.
Method: Use a historically informed structured representation of performance pitch content (griffs) and Support Vector Machines to classify players in the ACoRD dataset.
Result: Players can be identified from their performances.
Conclusion: Personal styles are present and detectable in basso continuo realizations; the paper also discusses the elements that make up individual players' styles.
Abstract: A central part of the contemporary Historically Informed Practice movement is basso continuo, an improvised accompaniment genre with its traditions originating in the baroque era and actively practiced by many keyboard players nowadays. Although computational musicology has studied the theoretical foundations of basso continuo expressed by harmonic and voice-leading rules and constraints, characteristics of basso continuo as an active performing art have been largely overlooked mostly due to a lack of suitable performance data that could be empirically analyzed. This has changed with the introduction of The Aligned Continuo Realization Dataset (ACoRD) and the basso continuo realization-to-score alignment. Basso continuo playing is shaped by stylistic traditions coming from historical treatises, but it also may provide space for showcasing individual performance styles of its practitioners. In this paper, we attempt to explore the question of the presence of personal styles in the basso continuo realizations of players in the ACoRD dataset. We use a historically informed structured representation of basso continuo performance pitch content called griffs and Support Vector Machines to see whether it is possible to classify players based on their performances. The results show that we can identify players from their performances. In addition to the player classification problem, we discuss the elements that make up the individual styles of the players.
[419] FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation
Jialin Yan, Yu Cheng, Zhaoxia Yin, Xinpeng Zhang, Shilin Wang, Tanfeng Sun, Xinghao Jiang
Main category: cs.SD
TL;DR: FGAS embeds secret messages as adversarial perturbations decodable by a lightweight fixed decoder network, improving stego audio quality (over 10 dB average PSNR gain) and anti-steganalysis performance over SOTA.
Details
Motivation: Encoder-decoder audio steganography suffers from high computational cost, long implementation time, and poor anti-steganalysis performance.
Method: An Audio Adversarial Perturbation Generation (A2PG) strategy with an optional robust extension embeds the message into the cover audio; the receiver extracts it using only the shared structure and key of a lightweight fixed decoder network.
Result: Over 10 dB average PSNR gain versus SOTA, strong robustness to common audio processing attacks, and about 2% higher steganalysis classification error under high-capacity embedding.
Conclusion: Fixing the decoder and optimizing adversarial perturbations yields higher-quality, more robust, and harder-to-detect audio steganography.
Abstract: The rapid development of Artificial Intelligence Generated Content (AIGC) has made high-fidelity generated audio widely available across the Internet, driving the advancement of audio steganography. Benefiting from advances in deep learning, current audio steganography schemes are mainly based on encoder-decoder network architectures. While these methods guarantee a certain level of perceptual quality for stego audio, they typically face high computational cost and long implementation time, as well as poor anti-steganalysis performance. To address the aforementioned issues, we pioneer a Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation (FGAS). Adversarial perturbations carrying a secret message are embedded into the cover audio to generate stego audio. The receiver only needs to share the structure and key of the fixed decoder network to accurately extract the secret message from the stego audio. In FGAS, we propose an Audio Adversarial Perturbation Generation (A2PG) strategy with an optional robust extension and design a lightweight fixed decoder. The fixed decoder guarantees reliable extraction of the hidden message, while adversarial perturbations are optimized to keep the stego audio perceptually and statistically close to the cover audio, thereby improving anti-steganalysis performance. The experimental results show that FGAS significantly improves stego audio quality, achieving an average PSNR gain of over 10 dB compared to SOTA methods. Furthermore, FGAS demonstrates strong robustness against common audio processing attacks. Moreover, FGAS exhibits superior anti-steganalysis performance across different relative payloads; under high-capacity embedding, it achieves a classification error rate about 2% higher, indicating stronger anti-steganalysis performance than current SOTA methods.
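The key protocol idea above is that sender and receiver share only the structure and key of a fixed decoder, and the message travels as a small perturbation of the cover audio. The following toy sketch illustrates that protocol with a key-seeded linear correlation decoder and a hand-rolled spread-spectrum-style embedding; the paper's actual A2PG strategy optimizes the perturbation adversarially against a fixed decoder network, which this sketch does not implement:

```python
import random

def make_decoder(key, n_bits, n_samples):
    """Stand-in for the fixed decoder: key-seeded random +/-1 patterns.
    The receiver rebuilds identical patterns from the shared key."""
    rng = random.Random(key)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n_samples)]
            for _ in range(n_bits)]

def embed(cover, bits, patterns, eps=0.05):
    """Add a small perturbation pushing each pattern correlation toward
    the sign of the corresponding message bit."""
    stego = list(cover)
    for p, b in zip(patterns, bits):
        sign = 1.0 if b else -1.0
        for i in range(len(stego)):
            stego[i] += eps * sign * p[i]
    return stego

def decode(stego, patterns):
    """Extract each bit as the sign of the correlation with its pattern."""
    return [1 if sum(x * w for x, w in zip(stego, p)) > 0 else 0
            for p in patterns]
```

The decoder never changes, so only the perturbation generator needs optimizing; in FGAS that optimization is what keeps the stego audio perceptually and statistically close to the cover.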
[420] A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models
Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève
Main category: cs.SD
TL;DR: For SSL speech pre-training, selecting the longest utterances beats diversity-based and random selection, achieving superior ASR results with half the data and 24% less pre-training time.
Details
Motivation: SSL's reliance on massive pre-training datasets is a bottleneck, and the role of the data distribution, as opposed to scale and diversity, is poorly understood.
Method: Systematically compare curated subsets of pre-training data (acoustic, speaker, and linguistic diversity; utterance length; random sampling) by downstream ASR performance.
Result: Diversity-optimized subsets show no clear gain over random sampling, while prioritizing the longest utterances achieves superior ASR results using only half the original dataset.
Conclusion: Data length matters more than data diversity or overall quantity for the performance and efficiency of SSL speech pre-training.
Abstract: Self-supervised learning (SSL) has transformed speech processing, yet its reliance on massive pre-training datasets remains a bottleneck. While robustness is often attributed to scale and diversity, the role of the data distribution is less understood. We systematically examine how curated subsets of pre-training data influence Automatic Speech Recognition (ASR) performance. Surprisingly, optimizing for acoustic, speaker, or linguistic diversity yields no clear improvements over random sampling. Instead, we find that prioritizing the longest utterances achieves superior ASR results while using only half the original dataset, reducing pre-training time by 24% on a large corpus. These findings suggest that for pre-training speech SSL models, data length is a more critical factor than either data diversity or overall data quantity for performance and efficiency, offering a new perspective for data selection strategies in SSL speech processing.
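The winning selection rule reported above is strikingly simple. A minimal sketch, assuming utterances are represented as hypothetical (id, duration) pairs:

```python
def select_longest(utterances, fraction=0.5):
    """Keep the longest `fraction` of utterances by duration.

    utterances: list of (utterance_id, duration_seconds) pairs.
    Ranking by length alone is the strategy the paper found to beat
    diversity-based and random subset selection for SSL pre-training."""
    n_keep = max(1, int(len(utterances) * fraction))
    ranked = sorted(utterances, key=lambda u: u[1], reverse=True)
    return ranked[:n_keep]
```

With fraction=0.5 this reproduces the paper's setting of pre-training on half the data.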
[421] Musical Score Understanding Benchmark: Evaluating Large Language Models’ Comprehension of Complete Musical Scores
Congren Dai, Yue Yang, Krinos Li, Huichi Zhou, Shijie Liang, Bo Zhang, Enyang Liu, Ge Jin, Hongran An, Haosen Zhang, Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Xiaobing Li, Maosong Sun
Main category: cs.SD
TL;DR: MSU-Bench is a human-curated benchmark of 1,800 generative QA pairs testing LLM/VLM comprehension of complete musical scores in textual (ABC notation) and visual (PDF) modalities.
Details
Motivation: The ability of LLMs and VLMs to interpret full musical notation, integrating pitch, rhythm, harmony, and large-scale structure, remains insufficiently examined.
Method: Curate 1,800 QA pairs over works by Bach, Beethoven, Chopin, Debussy, and others, organized into four difficulty levels from onset information to texture and form; evaluate more than fifteen state-of-the-art models in zero-shot and fine-tuned settings.
Result: Evaluations reveal pronounced modality gaps, unstable level-wise performance, and difficulty maintaining multilevel correctness; fine-tuning substantially improves results while preserving general knowledge.
Conclusion: MSU-Bench provides a robust foundation for future research in multimodal musical reasoning; the benchmark and code are publicly available.
Abstract: Understanding complete musical scores entails integrated reasoning over pitch, rhythm, harmony, and large-scale structure, yet the ability of Large Language Models and Vision–Language Models to interpret full musical notation remains insufficiently examined. We introduce Musical Score Understanding Benchmark (MSU-Bench), a human-curated benchmark for score-level musical understanding across textual (ABC notation) and visual (PDF) modalities. MSU-Bench contains 1,800 generative question-answer pairs from works by Bach, Beethoven, Chopin, Debussy, and others, organised into four levels of increasing difficulty, ranging from onset information to texture and form. Evaluations of more than fifteen state-of-the-art models, in both zero-shot and fine-tuned settings, reveal pronounced modality gaps, unstable level-wise performance, and challenges in maintaining multilevel correctness. Fine-tuning substantially improves results across modalities while preserving general knowledge, positioning MSU-Bench as a robust foundation for future research in multimodal reasoning. The benchmark and code are available at https://github.com/Congren-Dai/MSU-Bench.
[422] Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation
Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, Gouthaman KV, Ramani Duraiswami, Lie Lu, Sreyan Ghosh, Dinesh Manocha
Main category: cs.SD
TL;DR: Video-Robin combines autoregressive semantic planning with local diffusion transformers for fast, text-controllable, semantically aligned video-to-music generation.
Details
Motivation: Recent V2M models rely on visual conditioning alone and give end users limited semantic and stylistic control.
Method: An autoregressive module semantically aligns visual and textual inputs into high-level music latents capturing global structure; local Diffusion Transformers then refine these latents into coherent, high-fidelity music.
Result: Outperforms video-only and feature-conditioned baselines on in-distribution and out-of-distribution benchmarks, with 2.21x faster inference than SOTA.
Conclusion: Factoring semantically driven planning into diffusion-based synthesis enables fine-grained creator control without sacrificing audio realism; all artifacts will be open-sourced upon acceptance.
Abstract: Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models achieve audiovisual alignment by typically relying on visual conditioning alone and provide limited semantic and stylistic controllability to the end user. In this paper, we present Video-Robin, a novel text-conditioned video-to-music generation model that enables fast, high-quality, semantically aligned music generation for video content. To balance musical fidelity and semantic understanding, Video-Robin integrates autoregressive planning with diffusion-based synthesis. Specifically, an autoregressive module models global structure by semantically aligning visual and textual inputs to produce high-level music latents. These latents are subsequently refined into coherent, high-fidelity music using local Diffusion Transformers. By factoring semantically driven planning into diffusion-based synthesis, Video-Robin enables fine-grained creator control without sacrificing audio realism. Our proposed model outperforms both video-only baselines and additional feature-conditioned baselines on in-distribution and out-of-distribution benchmarks, with 2.21x faster inference than SOTA. We will open-source everything upon paper acceptance.
[423] ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis
Aoduo Li, Haoran Lv, Hongjian Xu, Shengmin Li, Sihao Qin, Zimeng Li, Chi Man Pun, Xuhang Chen
Main category: cs.SD
TL;DR: ATRIE uses a Persona-Prosody Dual-Track architecture to keep character voices consistent across emotional contexts, achieving SOTA generation and cross-modal retrieval on AnimeTTS-Bench.
Details
Motivation: Existing character voice synthesis systems struggle to maintain consistent persona traits across diverse emotional contexts.
Method: Disentangle generation into a static Timbre Track (via Scalar Quantization) and a dynamic Prosody Track (via Hierarchical Flow-Matching), distilled from a 14B LLM teacher.
Result: Robust identity preservation (zero-shot speaker verification EER: 0.04), rich emotional expression, and state-of-the-art generation and cross-modal retrieval (mAP: 0.75) on the extended 50-character AnimeTTS-Bench.
Conclusion: The dual-track design establishes a new paradigm for persona-driven multimedia content creation.
Abstract: High-fidelity character voice synthesis is a cornerstone of immersive multimedia applications, particularly for interacting with anime avatars and digital humans. However, existing systems struggle to maintain consistent persona traits across diverse emotional contexts. To bridge this gap, we present ATRIE, a unified framework utilizing a Persona-Prosody Dual-Track (P2-DT) architecture. Our system disentangles generation into a static Timbre Track (via Scalar Quantization) and a dynamic Prosody Track (via Hierarchical Flow-Matching), distilled from a 14B LLM teacher. This design enables robust identity preservation (Zero-Shot Speaker Verification EER: 0.04) and rich emotional expression. Evaluated on our extended AnimeTTS-Bench (50 characters), ATRIE achieves state-of-the-art performance in both generation and cross-modal retrieval (mAP: 0.75), establishing a new paradigm for persona-driven multimedia content creation.
[424] From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR
Nan Xu, Shiheng Li, Shengchao Hou
Main category: cs.SD
TL;DR: A second-stage OMR decoder that turns symbol and event candidates into an editable, verifiable, exportable score structure via topology recognition with probability-guided search (BeadSolver), targeting complex polyphonic piano scores.
Details
Motivation: In complex polyphonic staff notation, especially piano scores, voice separation and intra-measure timing are the main bottlenecks of the second OMR stage.
Method: Formulate second-stage decoding as a structure decoding problem solved by topology recognition with probability-guided search (BeadSolver), supported by a data strategy combining procedural generation with recognition-feedback annotations.
Result: A practical decoding component for real OMR systems.
Conclusion: The approach also offers a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.
Abstract: We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.
cs.LG
[425] Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach
Eli Gildish, Michael Grebshtein, Igor Makienko
Main category: cs.LG
TL;DR: R-DCNN combines dilated CNNs with resampling to denoise periodic signals of varying fundamental frequency from a single training observation at low computational cost.
Details
Motivation: Deep denoisers outperform classical methods but require substantial compute and per-observation training, which is infeasible under strict power and resource constraints.
Method: Train a dilated CNN on a single observation and generalize to signals with different fundamental frequencies via a lightweight resampling step that aligns time scales so the same network weights can be re-used.
Result: Performance comparable to state-of-the-art classical methods such as AR-based techniques, and to conventional DCNNs trained individually per observation, at much lower complexity.
Conclusion: The combination of efficiency and performance makes R-DCNN well suited for resource-constrained deployment without sacrificing denoising or estimation accuracy.
Abstract: Denoising of periodic signals and accurate waveform estimation are core tasks across many signal processing domains, including speech, music, medical diagnostics, radio, and sonar. Although deep learning methods have recently shown performance improvements over classical approaches, they require substantial computational resources and are usually trained separately for each signal observation. This study proposes a computationally efficient method based on DCNN and Re-sampling, termed R-DCNN, designed for operation under strict power and resource constraints. The approach targets signals with varying fundamental frequencies and requires only a single observation for training. It generalizes to additional signals via a lightweight resampling step that aligns time scales in signals with different frequencies to re-use the same network weights. Despite its low computational complexity, R-DCNN achieves performance comparable to state-of-the-art classical methods, such as autoregressive (AR)-based techniques, as well as conventional DCNNs trained individually for each observation. This combination of efficiency and performance makes the proposed method particularly well suited for deployment in resource-constrained environments without sacrificing denoising or estimation accuracy.
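The weight re-use trick above hinges on the resampling step: re-time each signal so its period matches the fundamental frequency the network was trained on. A minimal sketch using linear interpolation (the stretch-factor convention f0/f0_ref is an assumption; the paper does not specify its interpolation scheme):

```python
def resample_linear(signal, factor):
    """Stretch (factor > 1) or compress a signal in time via linear
    interpolation, preserving the first and last samples."""
    n_out = max(2, int(round(len(signal) * factor)))
    out = []
    for i in range(n_out):
        pos = i * (len(signal) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def align_to_reference(signal, f0, f0_ref):
    """Re-time a periodic signal so its period matches the reference
    frequency the denoiser was trained on; the same network weights
    can then be applied to the aligned signal."""
    return resample_linear(signal, f0 / f0_ref)
```

After denoising, the inverse resampling (factor f0_ref/f0) would map the output back to the original time scale.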
[426] Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance
Weitao Du
Main category: cs.LG
TL;DR: Frequency-Forcing realizes K-Flow's coarse-to-fine frequency ordering through Latent Forcing's soft guidance mechanism, using a self-forcing low-frequency stream from a lightweight learnable wavelet packet transform.
Details
Motivation: Ordering generation from coarse low-frequency structure to fine detail helps flow-matching image synthesis, but existing paradigms either impose a hard frequency constraint by rewriting the flow coordinate (K-Flow) or depend on a heavy pretrained encoder (Latent Forcing).
Method: Guide a standard pixel flow with an auxiliary low-frequency stream that matures earlier in time; the frequency scratchpad is derived from the data itself via a lightweight learnable wavelet packet transform, avoiding external dependencies.
Result: Consistent FID improvements on ImageNet-256 over strong pixel- and latent-space baselines, with further gains when composed with a semantic stream.
Conclusion: Forcing-based scale ordering is a versatile, path-preserving alternative to hard frequency flows.
Abstract: While standard flow-matching models transport noise to data uniformly, incorporating an explicit generation order - specifically, establishing coarse, low-frequency structure before fine detail - has proven highly effective for synthesizing natural images. Two recent works offer distinct paradigms for this. K-Flow imposes a hard frequency constraint by reinterpreting a frequency scaling variable as flow time, running the trajectory inside a transformed amplitude space. Latent Forcing provides a soft ordering mechanism by coupling the pixel flow with an auxiliary semantic latent flow via asynchronous time schedules, leaving the pixel interpolation path itself untouched. Viewed from the angle of improving pixel generation, we observe that forcing - guiding generation with an earlier-maturing auxiliary stream - offers a highly compatible route to scale-ordered generation without rewriting the core flow coordinate. Building on this, we propose Frequency-Forcing, which realizes K-Flow’s frequency ordering through Latent Forcing’s soft mechanism: a standard pixel flow is guided by an auxiliary low-frequency stream that matures earlier in time. Unlike Latent Forcing, whose scratchpad relies on a heavy pretrained encoder (e.g., DINO), our frequency scratchpad is derived from the data itself via a lightweight learnable wavelet packet transform. We term this a self-forcing signal, which avoids external dependencies while learning a basis better adapted to data statistics than the fixed bases used in hard frequency flows. On ImageNet-256, Frequency-Forcing consistently improves FID over strong pixel- and latent-space baselines, and naturally composes with a semantic stream to yield further gains. This illustrates that forcing-based scale ordering is a versatile, path-preserving alternative to hard frequency flows.
[427] Reinforcing privacy reasoning in LLMs via normative simulacra from fiction
Matt Franchi, Madiha Zahrah Choksi, Harold Triedman, Helen Nissenbaum
Main category: cs.LG
TL;DR: Fine-tuning LLMs on normative simulacra extracted from fiction, via SFT followed by GRPO with a contrastive, grounded reward, improves contextual-integrity privacy reasoning that transfers to real-world benchmarks.
Details
Motivation: LLM agents' information handling is misaligned with users' contextual privacy expectations; existing fixes either double inference cost via supervisor-assistant architectures or fine-tune on narrow task-specific data.
Method: Extract structured norms and information flows from fiction novels; fine-tune via SFT then GRPO with a composite reward (task clarity, structural completeness, internal consistency, context identification, plus an LLM judge) and per-completion contrastive scoring against correct versus randomly selected wrong normative universes.
Result: Across seven models, SFT introduces a conservative restrict-flow prior; GRPO with normative grounding achieves the highest law-compliance score and the strongest correlation with crowdsourced human privacy expectations on five CI-aligned benchmarks.
Conclusion: Fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
Abstract: Information handling practices of LLM agents are broadly misaligned with the contextual privacy expectations of their users. Contextual Integrity (CI) provides a principled framework, defining privacy as the appropriate flow of information within context-relative norms. However, existing approaches either double inference cost via supervisor-assistant architectures, or fine-tune on narrow task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning. Our composite reward function combines programmatic signals, including task clarity (subsuming schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that evaluates whether the model’s privacy reasoning is grounded in the held-out normative universe of the source text. To mitigate overfitting, we introduce per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, teaching the model to condition on context rather than memorize source-specific norms. We evaluate on five CI-aligned benchmarks spanning distinct societal contexts and ablate the contributions of RL and normative grounding. Across seven models, SFT introduces a conservative prior toward restricting information flow, improving recognition of privacy-relevant situations but not the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and strongest correlation with crowdsourced human privacy expectations, demonstrating that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
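The per-completion contrastive scoring described above can be sketched as a reward margin between the correct normative universe and a wrong one. This is a schematic reduction: the paper's judge is an LLM and its full reward also combines programmatic signals, which are folded here into a hypothetical `judge` callable:

```python
def contrastive_reward(completion, correct_universe, wrong_universe, judge):
    """Per-completion contrastive scoring (sketch).

    `judge(completion, universe)` is a stand-in for the paper's LLM judge:
    it returns a grounding score in [0, 1]. Rewarding the margin between
    the correct and a randomly drawn wrong universe pushes the model to
    condition on context rather than memorize source-specific norms."""
    return judge(completion, correct_universe) - judge(completion, wrong_universe)
```

A completion that scores equally well against any universe (i.e., memorized boilerplate) earns zero margin, which is the overfitting failure mode this scoring is designed to penalize.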
[428] Do Masked Autoencoders Improve Downhole Prediction? An Empirical Study on Real Well Drilling Data
Aleksander Berezowski, Hassan Hassanzadeh, Gouri Ginde
Main category: cs.LG
TL;DR: First empirical study of masked autoencoder pretraining for downhole drilling prediction: the best MAE cuts test error by 19.8% versus a supervised GRU, and latent width matters far more than masking ratio.
Details
Motivation: Surface telemetry is generated continuously at 1 Hz while labeled downhole measurements are costly and scarce, yet current approaches universally train fully supervised models from scratch.
Method: A full-factorial search over 72 MAE configurations on roughly 3.5 million timesteps from two Utah FORGE geothermal wells, compared against supervised LSTM and GRU baselines for predicting Total Mud Volume.
Result: The best MAE reduces test mean absolute error by 19.8% relative to the GRU baseline while trailing the LSTM baseline by 6.4%; latent width dominates (Pearson r = -0.59 with test MAE) while masking ratio has negligible effect, attributed to high temporal redundancy in 1 Hz drilling data.
Conclusion: MAE pretraining is a viable paradigm for drilling analytics, with identified conditions under which it is most beneficial.
Abstract: Downhole drilling telemetry presents a fundamental labeling asymmetry: surface sensor data are generated continuously at 1Hz, while labeled downhole measurements are costly, intermittent, and scarce. Current machine learning approaches for downhole metric prediction universally adopt fully supervised training from scratch, which is poorly suited to this data regime. We present the first empirical evaluation of masked autoencoder (MAE) pretraining for downhole drilling metric prediction. Using two publicly available Utah FORGE geothermal wells comprising approximately 3.5 million timesteps of multivariate drilling telemetry, we conduct a systematic full-factorial design space search across 72 MAE configurations and compare them against supervised LSTM and GRU baselines on the task of predicting Total Mud Volume. Results show that the best MAE configuration reduces test mean absolute error by 19.8% relative to the supervised GRU baseline, while trailing the supervised LSTM baseline by 6.4%. Analysis of design dimensions reveals that latent space width is the dominant architectural choice (Pearson $r = -0.59$ with test MAE), while masking ratio has negligible effect, an unexpected finding attributed to high temporal redundancy in 1Hz drilling data. These results establish MAE pretraining as a viable paradigm for drilling analytics and identify the conditions under which it is most beneficial.
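The masking step that the masking-ratio finding refers to can be sketched as follows for a univariate telemetry channel; per-timestep random masking is one common MAE variant and an assumption here, since the paper searches over several masking configurations:

```python
import random

def mask_timesteps(series, mask_ratio, seed=0):
    """Randomly mask a fraction of timesteps for MAE-style pretraining.

    The encoder sees only the unmasked values; the decoder is trained to
    reconstruct the masked ones. Returns (masked_series, mask) where
    None marks masked positions and mask[i] is True where masked."""
    rng = random.Random(seed)
    n = len(series)
    n_mask = int(n * mask_ratio)
    idx = set(rng.sample(range(n), n_mask))
    masked = [None if i in idx else x for i, x in enumerate(series)]
    mask = [i in idx for i in range(n)]
    return masked, mask
```

At 1 Hz, neighboring drilling samples are highly redundant, so a masked value is usually recoverable from its neighbors regardless of the ratio, which is consistent with the paper's observation that masking ratio has negligible effect.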
[429] FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
Fei Zuo, Xiaoyan Xi, Quanyi Zeng, Feiyu Wang, Ho Fai Leung
Main category: cs.LG
TL;DR: FairyFuse fuses the ternary-weight sub-GEMVs of each layer into a single AVX-512 loop of masked additions and subtractions, enabling multiplication-free LLM inference on commodity CPUs with a 29.6x kernel speedup.
Details
Motivation: CPU LLM inference is memory-bandwidth bound, and existing quantized runtimes still dequantize weights and perform floating-point multiplications; Fairy2i's ternary structure goes unexploited.
Method: Fuse the eight real-valued sub-GEMVs of each widely-linear layer into one AVX-512 loop using masked additions and subtractions with zero floating-point multiplications; roofline analysis shows 16x weight compression shifts memory-bound GEMV toward the compute regime on bandwidth-limited CPUs.
Result: 32.4 tokens per second on a single Intel Xeon 8558P, 1.24x over llama.cpp Q4_K_M, with near-lossless quality (WikiText-2 perplexity 5.52 vs. 5.47 FP16; 66.0% downstream accuracy).
Conclusion: Exploiting ternary structure yields large CPU inference gains while offering little benefit on GPUs.
Abstract: Large language models are increasingly deployed on CPU-only platforms where memory bandwidth is the primary bottleneck for autoregressive generation. Weight quantization to four bits or below reduces memory pressure, yet existing systems still dequantize weights and perform floating-point multiplications, limiting the achievable gains. Ternary weights in {-1, 0, +1} provide a more efficient alternative, replacing multiplications with conditional additions, subtractions, or no-ops. While Fairy2i shows that ternary LLMs can match FP16 quality, its runtime does not exploit this structure. We present FairyFuse, an inference system that enables multiplication-free execution on commodity CPUs by fusing the eight real-valued sub-GEMVs of each widely-linear layer into a single AVX-512 loop using masked additions and subtractions, with zero floating-point multiplications. Roofline analysis shows that 16x weight compression shifts memory-bound GEMV toward the compute regime on bandwidth-limited CPUs, yielding a 29.6x kernel speedup while offering little benefit on GPUs. End-to-end, FairyFuse achieves 32.4 tokens per second on a single Intel Xeon 8558P, outperforming llama.cpp Q4_K_M by 1.24x with near-lossless quality (WikiText-2 perplexity 5.52 vs. 5.47 FP16; downstream accuracy 66.0%).
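The core arithmetic trick above, replacing every multiplication with a conditional add, subtract, or no-op, is easy to see in a scalar reference version of the ternary matrix-vector product (the paper's kernel vectorizes this with AVX-512 mask registers; this sketch only shows the arithmetic, not the fused widely-linear layout):

```python
def ternary_matvec(weights, x):
    """Multiplication-free GEMV for ternary weights in {-1, 0, +1}.

    For each output, y[i] = sum over j of (+x[j], -x[j], or nothing),
    so no floating-point multiplications are ever performed."""
    y = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v      # masked add
            elif w == -1:
                acc -= v      # masked subtract
            # w == 0: no-op (lane masked off)
        y.append(acc)
    return y
```

Because each ternary weight needs under 2 bits instead of 16, the same bandwidth streams roughly 16x more weights per second, which is the shift toward the compute regime that the roofline analysis describes.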
[430] Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
Zhixin Zhang, Shabo Zhang, Chengcan Wu, Zeming Wei, Meng Sun
Main category: cs.LG
TL;DR: Absorber LLM absorbs long-context history into model parameters via a self-supervised causal-synchronization objective, reducing inference memory while improving over prior parameter-as-memory baselines.
Details
Motivation: Self-attention cost grows with sequence length; constant-memory RNN/SSM states lose long-tail dependencies, and TTT-style parameter memorization overfits token-level projections and fails to preserve the causal effect of context.
Method: Formulate long-context retention as causal synchronization: after absorbing historical context into parameters, a contextless model is trained to match the original full-context model on future generations by synchronizing internal behaviors.
Result: Reduced inference memory and improved accuracy over prior parameter-as-memory baselines on long-context and streaming benchmarks.
Conclusion: Causal synchronization ensures context absorption and generalization for constant-memory long-context inference.
Abstract: Transformers suffer from a self-attention cost that grows with sequence length, making inference over long streams prohibitively memory-intensive. Constant-memory alternatives such as RNNs and SSMs compress history into fixed-size states and thus lose long-tail dependencies, while methods that memorize contexts into parameters, such as Test-Time Training (TTT), are prone to overfitting token-level projections and fail to preserve the causal effect of context in pretrained LLMs. We propose Absorber LLM, which formulates long-context retention as a self-supervised causal synchronization: after absorbing historical contexts into parameters, a contextless model should match the original model with full context on future generations. We optimize this objective by synchronizing internal behaviors of the updated model with the original one, ensuring context absorption and generalization. Experiments on long-context and streaming benchmarks show that Absorber LLM reduces inference memory and improves accuracy over prior parameter-as-memory baselines.
[431] The Path Not Taken: Duality in Reasoning about Program Execution
Eshgin Hasanov, Md Mahadi Hassan Sibat, Santu Karmaker, Aashish Yadavally
Main category: cs.LG
TL;DR: DexBench probes LLMs' causal understanding of program execution through dual tasks, predicting behavior from an input and inferring input mutations toward a target behavior, across 445 paired instances and 13 LLMs.
Details
Motivation: Existing benchmarks predict program properties tied to specific inputs (e.g., code coverage, outputs), giving a narrow view of dynamic code reasoning and risking data contamination.
Method: Instantiate the duality of execution reasoning (forward behavior prediction and inverse input mutation) in DexBench, a benchmark of 445 paired instances, and evaluate 13 LLMs.
Result: Dual-path reasoning provides a robust and discriminative proxy for dynamic code understanding.
Conclusion: Evaluating both directions of execution reasoning jointly probes a model's causal understanding of execution flow.
Abstract: Large language models (LLMs) have shown remarkable capabilities across diverse coding tasks. However, their adoption requires a true understanding of program execution rather than relying on surface-level patterns. Existing benchmarks primarily focus on predicting program properties tied to specific inputs (e.g., code coverage, program outputs). As a result, they provide a narrow view of dynamic code reasoning and are prone to data contamination. We argue that understanding program execution requires evaluating its inherent duality through two complementary reasoning tasks: (i) predicting a program’s observed behavior for a given input, and (ii) inferring how the input must be mutated toward a specific behavioral objective. Both tasks jointly probe a model’s causal understanding of execution flow. We instantiate this duality in DexBench, a benchmark comprising 445 paired instances, and evaluate 13 LLMs. Our results demonstrate that dual-path reasoning provides a robust and discriminative proxy for dynamic code understanding.
[432] Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention
Yuzhen Mao, Michael Y. Li, Emily B. Fox
Main category: cs.LG
Abstract: Scaling large language models to long contexts is challenging due to the quadratic computational cost of full attention. Mitigation approaches include KV-cache selection or compression techniques. We instead provide an effective and end-to-end learnable bridge between the two without requiring architecture modification. In particular, our key insight is that interleaved gist compression tokens – which provide a learnable summary of sets of raw tokens – can serve as routing signals for sparse attention. Building on this, we introduce selective unfolding via GSA, which first compresses the context into gist tokens, then selects the most relevant gists, and subsequently restores the corresponding raw chunks for detailed attention. This yields a simple coarse-to-fine mechanism that combines compact global representations with targeted access to fine-grained evidence. We further incorporate this process directly into training in an end-to-end fashion, avoiding the need for external retrieval modules. In addition, we extend the framework hierarchically via recursive gist-of-gist construction, enabling multi-resolution context access with logarithmic per-step decoding complexity. Empirical results on LongBench and RAG benchmarks demonstrate that our method consistently outperforms other compression baselines as well as inference-time sparse attention methods across compression ratios from $8\times$ to $32\times$. The code is available at: https://github.com/yuzhenmao/gist-sparse-attention/
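The compress-select-unfold control flow described in the abstract can be illustrated with a minimal NumPy sketch. Mean-pooled chunks stand in for the learned gist tokens, and all function and variable names are hypothetical; the actual method trains this routing end-to-end and applies it hierarchically.

```python
import numpy as np

def gist_sparse_attention(tokens, query, chunk=4, top_k=2):
    """Coarse-to-fine sketch: pool raw tokens into chunk-level 'gists',
    route with the gists, then unfold only the selected raw chunks."""
    n, d = tokens.shape
    n_chunks = n // chunk
    chunks = tokens[: n_chunks * chunk].reshape(n_chunks, chunk, d)
    gists = chunks.mean(axis=1)            # stand-in for learned gist tokens
    scores = gists @ query                 # coarse routing signal per gist
    keep = np.argsort(scores)[-top_k:]     # select the most relevant gists
    fine = chunks[keep].reshape(-1, d)     # restore the corresponding raw chunks
    logits = fine @ query                  # detailed attention over raw tokens
    att = np.exp(logits - logits.max())
    att /= att.sum()
    return att @ fine, sorted(keep.tolist())

tokens = np.arange(32, dtype=float).reshape(8, 4)
query = np.ones(4)
out, kept = gist_sparse_attention(tokens, query, chunk=2, top_k=1)
```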
[433] Validating a Deep Learning Algorithm to Identify Patients with Glaucoma using Systemic Electronic Health Records
John Xiang, Rohith Ravindranath, Sophia Y. Wang
Main category: cs.LG
Abstract: We evaluated whether a glaucoma risk assessment (GRA) model trained on All of Us national data can identify patients at high probability of glaucoma using only systemic electronic health records (EHR) at an independent institution. In this cross-sectional study, 20,636 Stanford patients seen from November 2013 to January 2024 were included (15% with glaucoma). A pretrained GRA model was fine-tuned on the Stanford cohort and tested on a held-out set using demographics, systemic diagnoses, medications, laboratory results, and physical examination measurements as inputs. The best model achieved AUROC 0.883 and PPV 0.657. Calibration was consistent with clinical risk: the highest prediction decile showed the greatest glaucoma diagnosis rate (65.7%) and treatment rate (57.0%). Performance improved with more trainable layers up to 15 and with additional data. An EHR-only GRA model may enable scalable and accessible pre-screening without specialized imaging.
[434] ILDR: Geometric Early Detection of Grokking
Shreel Golwala
Main category: cs.LG
Abstract: Grokking describes a delayed generalization phenomenon in which a neural network achieves perfect training accuracy long before validation accuracy improves, followed by an abrupt transition to strong generalization. Existing detection signals are indirect: weight norm reflects parameter-space regularization and consistently lags the transition, while GrokFast’s slow gradient EMA, used without gradient amplification, is unstable across seeds with standard deviation exceeding mean lead time. We propose the Inter/Intra-class Distance Ratio (ILDR), a geometric metric computed on second-to-last layer representations as the ratio of inter-class centroid separation to intra-class scatter. ILDR provides an early detection signal: it rises and crosses a threshold at 2.5 times its baseline before the grokking transition appears in validation accuracy, indicating early geometric reorganization in representation space. Grounded in Fisher’s linear discriminant criterion, ILDR requires no eigendecomposition and runs in O(|C|^2 + N). It is evaluated exclusively on held-out data, making it robust to memorization effects. Across modular arithmetic and permutation group composition (S5), ILDR leads the grokking transition by 9 to 73 percent of the training budget, with lead time increasing with task algebraic complexity. Over eight random seeds, ILDR leads by 950 +/- 250 steps with a coefficient of variation of 26 percent, and post-grokking variance drops by a factor of 1696, consistent with a sharp phase transition in representation space. Using ILDR as an early stopping trigger reduces training by 18.6 percent on average. Optimizer interventions triggered at the ILDR threshold demonstrate bidirectional control over the transition, suggesting ILDR tracks representational conditions underlying generalization rather than a downstream correlate.
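As described, ILDR is the ratio of inter-class centroid separation to intra-class scatter on penultimate-layer features. A minimal sketch follows; Euclidean distances and unweighted means are assumptions here, and the paper's exact normalization may differ.

```python
import numpy as np

def ildr(features, labels):
    """Inter/Intra-class Distance Ratio on held-out penultimate-layer
    features: mean centroid separation over mean within-class scatter."""
    classes = list(np.unique(labels))
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # inter-class term: mean distance over all centroid pairs, O(|C|^2)
    inter = np.mean([np.linalg.norm(centroids[i] - centroids[j])
                     for i in range(len(classes))
                     for j in range(i + 1, len(classes))])
    # intra-class term: mean distance of each sample to its centroid, O(N)
    intra = np.mean([np.linalg.norm(x - centroids[classes.index(c)])
                     for x, c in zip(features, labels)])
    return inter / intra

rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
# well-separated clusters should score far higher than diffuse ones
tight = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
loose = np.vstack([rng.normal(0, 2.0, (20, 2)), rng.normal(5, 2.0, (20, 2))])
```

A rising ILDR on a held-out set would then be compared against a threshold of 2.5 times its early-training baseline, as the abstract describes.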
[435] Clinically Interpretable Sepsis Early Warning via LLM-Guided Simulation of Temporal Physiological Dynamics
Weizhi Nie, Zhen Qu, Weijie Wang, Chunpei Li, Ke Lu, Bingyang Zhou, Hongzhi Yu
Main category: cs.LG
Abstract: Timely and interpretable early warning of sepsis remains a major clinical challenge due to the complex temporal dynamics of physiological deterioration. Traditional data-driven models often provide accurate yet opaque predictions, limiting physicians’ confidence and clinical applicability. To address this limitation, we propose a Large Language Model (LLM)-guided temporal simulation framework that explicitly models physiological trajectories prior to disease onset for clinically interpretable prediction. The framework consists of a spatiotemporal feature extraction module that captures dynamic dependencies among multivariate vital signs, a Medical Prompt-as-Prefix module that embeds clinical reasoning cues into LLMs, and an agent-based post-processing component that constrains predictions within physiologically plausible ranges. By first simulating the evolution of key physiological indicators and then classifying sepsis onset, our model offers transparent prediction mechanisms that align with clinical judgment. Evaluated on the MIMIC-IV and eICU databases, the proposed method achieves superior AUC scores (0.861-0.903) across 24- to 4-hour pre-onset prediction tasks, outperforming conventional deep learning and rule-based approaches. More importantly, it provides interpretable trajectories and risk trends that can assist clinicians in early intervention and personalized decision-making in intensive care environments.
[436] Unsupervised Learning of Inter-Object Relationships via Group Homomorphism
Kyotaro Ushida, Takayuki Komatsu, Yoshiyuki Ohmura, Yasuo Kuniyoshi
Main category: cs.LG
Abstract: Current deep learning models achieve high performance by learning statistical correlations from vast datasets, which stands in stark contrast to human learning. They lack the flexibility of humans, particularly preverbal infants, to autonomously acquire the underlying structure of the world from limited experience and adapt to novel situations. In this study, we propose an unsupervised representation learning method based on a hierarchical relationship in group operations, rather than statistical independence, aiming to build a computational model of the cognitive development of infants. The proposed model features an integrated architecture that simultaneously performs object segmentation and the extraction of motion laws from dynamic image sequences. By introducing homomorphism from algebra as a structural constraint within a neural network, the model structurally separates pixel-level changes into meaningful, decomposed transformation components, such as translation and deformation. Using interaction scenes (chasing and evading tasks) based on developmental-science findings, we experimentally demonstrate that the model can segment multiple objects into individual slots without any ground-truth labels. Furthermore, we confirmed that relative movements between objects, such as approaching or receding, are accurately mapped and structured into a one-dimensional additive latent space. These results suggest that by introducing algebraic geometric constraints rather than relying solely on statistical correlation learning, physically interpretable “disentangled representations” can be acquired. This study contributes to the understanding of the process by which infants internalize environmental laws as structures and provides a new perspective for constructing artificial systems with developmental intelligence.
[437] Domain-Aware Hierarchical Contrastive Learning for Semi-Supervised Generalization Fault Diagnosis
Junyu Ren, Wensheng Gan, Philip S Yu
Main category: cs.LG
Abstract: Fault diagnosis under unseen operating conditions remains highly challenging when labeled data are scarce. Semi-supervised domain generalization fault diagnosis (SSDGFD) provides a practical solution by jointly exploiting labeled and unlabeled source domains. However, existing methods still suffer from two coupled limitations. First, pseudo-labels for unlabeled domains are typically generated primarily from knowledge learned on the labeled source domain, which neglects domain-specific geometric discrepancies and thus induces systematic cross-domain pseudo-label bias. Second, unlabeled samples are commonly handled with a hard accept-or-discard strategy, where rigid thresholding causes imbalanced sample utilization across domains, while hard-label assignment for uncertain samples can easily introduce additional noise. To address these issues, we propose a unified framework termed domain-aware hierarchical contrastive learning (DAHCL) for SSDGFD. Specifically, DAHCL introduces a domain-aware learning (DAL) module to explicitly capture source-domain geometric characteristics and calibrate pseudo-label predictions across heterogeneous source domains, thereby mitigating cross-domain bias in pseudo-label generation. In addition, DAHCL develops a hierarchical contrastive learning (HCL) module that combines dynamic confidence stratification with fuzzy contrastive supervision, enabling uncertain samples to contribute to representation learning without relying on unreliable hard labels. In this way, DAHCL jointly improves the quality of supervision and the utilization of unlabeled samples. Furthermore, to better reflect practical industrial scenarios, we incorporate engineering noise into the SSDGFD evaluation protocol. Extensive experiments on three benchmark datasets demonstrate that…
[438] Probably Approximately Consensus: On the Learning Theory of Finding Common Ground
Carter Blair, Ben Armstrong, Shiri Alouf-Heffetz, Nimrod Talmon, Davide Grossi
Main category: cs.LG
Abstract: A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue by modelling consensus as an interval in a one-dimensional opinion space derived from potentially high-dimensional data via embedding and dimensionality reduction. We define an objective that maximizes expected agreement within a hypothesis interval where the expectation is over an underlying distribution of issues, implicitly taking into account their salience. We propose an efficient Empirical Risk Minimization (ERM) algorithm and establish PAC-learning guarantees. Our initial experiments demonstrate the performance of our algorithm and examine more efficient approaches to identifying optimal consensus regions. We find that through selectively querying users on an existing sample of statements, we can reduce the number of queries needed to a practical number.
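A toy version of the interval ERM can be sketched by scoring each candidate interval by the net agreement of the statements it contains; with statements sorted by 1-D position this reduces to a maximum-sum run. The net-vote simplification and all names are illustrative assumptions, not the paper's objective or algorithm.

```python
def best_consensus_interval(positions, net_votes):
    """ERM sketch: pick the 1-D interval whose contained statements
    maximize total net agreement (approvals minus disapprovals),
    via a maximum-sum run (Kadane's algorithm) over sorted positions."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    best, best_lo, best_hi = float("-inf"), 0, 0
    run, run_lo = 0.0, 0
    for k, i in enumerate(order):
        if run <= 0:                # restarting beats extending a losing run
            run, run_lo = 0.0, k
        run += net_votes[i]
        if run > best:
            best, best_lo, best_hi = run, run_lo, k
    lo, hi = order[best_lo], order[best_hi]
    return (positions[lo], positions[hi]), best

interval, score = best_consensus_interval([0.1, 0.9, 0.5, 0.3, 0.7],
                                          [2, 3, -5, 1, 2])
```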
[439] IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning
Wenjie Liao, Like Wu, Liangjie Zhao, Shihui Xu, Shigeru Fujimura
Main category: cs.LG
Abstract: Self-play fine-tuning enables large language models to improve beyond supervised fine-tuning without additional human annotations by contrasting annotated responses with self-generated ones. Many existing methods rely on a fixed divergence regime. SPIN is closely related to a KL-based regime, SPACE to a Jensen-Shannon-style objective via noise contrastive estimation, and SPIF to $\chi^2$-regularized self-play. Since these divergences exhibit different strengths depending on the distributional gap between model and target, no single choice appears to provide favorable learning dynamics across training stages. We propose IRIS (Interpolative Rényi Iterative Self-play), a Rényi-based self-play fine-tuning framework with a continuously adjustable objective. IRIS decomposes into two independent tilted risk terms over annotated and synthetic data, with exponential importance weights controlled by the order parameter $\alpha$. We show that several self-play objectives can be interpreted as limiting or representative regimes at particular values of $\alpha$, providing a unified theoretical perspective on these methods. An adaptive order schedule further adjusts $\alpha$ to the distributional gap, shifting from sharper importance weighting early in training to smoother refinement near convergence. Theoretically, we establish the fixed-point property of IRIS and analyze how $\alpha$ controls gradient concentration. Experiments on Zephyr-7B and Qwen2.5-3B across ten benchmarks show that IRIS improves upon baselines, reaching 44.57% average score with gains across iterations. In our setting, IRIS with only 26$k$ annotated samples surpasses standard supervised fine-tuning trained on the full 200$k$ dataset.
[440] Data-Driven Open-Loop Simulation for Digital-Twin Operator Decision Support in Wastewater Treatment
Gary Simethy, Daniel Ortiz Arroyo, Petar Durdevic
Main category: cs.LG
Abstract: Wastewater treatment plants (WWTPs) need digital-twin-style decision support tools that can simulate plant response under prescribed control plans, tolerate irregular and missing sensing, and remain informative over 12-36 h planning horizons. Meeting these requirements with full-scale plant data remains an open engineering-AI challenge. We present CCSS-RS, a controlled continuous-time state-space model that separates historical state inference from future control and exogenous rollout. The model combines typed context encoding, gain-weighted forcing of prescribed and forecast drivers, semigroup-consistent rollouts, and Student-t plus hurdle outputs for heavy-tailed and zero-inflated WWTP sensor data. On the public Avedøre full-scale benchmark, with 906,815 timesteps, 43% missingness, and 1-20 min irregular sampling, CCSS-RS achieves RMSE 0.696 and CRPS 0.349 at H=1000 across 10,000 test windows. This reduces RMSE by 40-46% relative to Neural CDE baselines and by 31-35% relative to simplified internal variants. Four case studies using a frozen checkpoint on test data demonstrate operational value: oxygen-setpoint perturbations shift predicted ammonium by -2.3 to +1.4 over horizons 300-1000; a smoothed setpoint plan ranks first in multi-criterion screening; context-only sensor outages raise monitored-variable RMSE by at most 10%; and ammonium, nitrate, and oxygen remain more accurate than persistence throughout the rollout. These results establish CCSS-RS as a practical learned simulator for offline scenario screening in industrial wastewater treatment, complementary to mechanistic models.
[441] Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs
Kibum Kim, Jiwan Kim, Kyle Min, Yueqi Wang, Jinyoung Moon, Julian McAuley, Chanyoung Park
Main category: cs.LG
Abstract: Video Large Language Models (Video LLMs) incur high inference latency due to a large number of visual tokens provided to LLMs. To address this, training-free visual token pruning has emerged as a solution to reduce computational costs; however, existing methods are primarily validated on Multiple-Choice Question Answering (MCQA) benchmarks, where coarse-grained cues often suffice. In this work, we reveal that these methods suffer a sharp performance collapse on fine-grained understanding tasks requiring precise visual grounding, such as hallucination evaluation. To explore this gap, we conduct a systematic analysis and identify sink tokens, semantically uninformative tokens that attract excessive attention, as a key obstacle to fine-grained video understanding. When these sink tokens survive pruning, they distort the model’s visual evidence and hinder fine-grained understanding. Motivated by these insights, we propose Sink-Token-aware Pruning (SToP), a simple yet effective plug-and-play method that introduces a sink score to quantify each token’s tendency to behave as a sink and applies this score to existing spatial and temporal pruning methods to suppress such tokens, thereby enhancing video understanding. To validate the effectiveness of SToP, we apply it to state-of-the-art pruning methods (VisionZip, FastVid, and Holitom) and evaluate it across diverse benchmarks covering hallucination, open-ended generation, compositional reasoning, and MCQA. Our results demonstrate that SToP significantly boosts performance, even when pruning up to 90% of visual tokens.
[442] Principled Evaluation with Human Labels: One Rater at a Time and Rater Equivalence
Paul Resnick, Yuqing Kong, Grant Schoenebeck, Tim Weninger
Main category: cs.LG
Abstract: In many classification tasks, there is no definitive ground truth, only human judgments that may disagree. We address two challenges that arise in such settings: (1) how to use human raters to score classifiers, and (2) how to use them for comparison benchmarks. For the first, the common practice is to score classifiers against the majority vote of an evaluation panel of several human raters. We argue that this is not justified when either of two properties fails: objectivity or equanimity. Instead, under a utility model appropriate for such settings, scoring against one rater at a time and averaging the scores across raters is a more principled approach. For the second, we introduce the concept of rater equivalence: the smallest number of human raters whose combined judgment matches the classifier’s performance. We provide a provably optimal algorithm for combining benchmark panel labels, and demonstrate the framework through case studies.
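The contrast between the two scoring schemes can be made concrete with a small sketch; binary labels and simple agreement rates are illustrative assumptions.

```python
def per_rater_score(predictions, rater_labels):
    """Score a classifier against each rater separately, then average,
    instead of scoring against the panel's majority vote."""
    scores = []
    for rater in rater_labels:          # one rater at a time
        agree = sum(p == r for p, r in zip(predictions, rater))
        scores.append(agree / len(predictions))
    return sum(scores) / len(scores)

def majority_vote_score(predictions, rater_labels):
    """Common-practice baseline: compare to the panel's majority label."""
    n = len(predictions)
    majority = [round(sum(r[i] for r in rater_labels) / len(rater_labels))
                for i in range(n)]
    return sum(p == m for p, m in zip(predictions, majority)) / n

preds = [1, 0, 1, 1]
raters = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1]]
```

On this toy panel the majority baseline reports a perfect score even though every individual rater disagrees with the classifier on one item, so the two schemes can diverge sharply.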
[443] HARBOR: Automated Harness Optimization
Biswa Sengupta, Jinhua Wang
Main category: cs.LG
Abstract: Long-horizon language-model agents are dominated, in lines of code and in operational complexity, not by their underlying model but by the harness that wraps it: context compaction, tool caching, semantic memory, trajectory reuse, speculative tool prediction, and the glue that binds the model to a sandboxed execution environment. We argue that harness design is a first-class machine-learning problem and that automated configuration search dominates manual stacking once the flag space exceeds a handful of bits. We defend this claim in two steps. First, we formalize automated harness optimization as constrained noisy Bayesian optimization over a mixed-variable, cost-heterogeneous configuration space with cold-start-corrected rewards and a posterior chance-constrained safety check, and give a reference solver, HARBOR (Harness Axis-aligned Regularized Bayesian Optimization Routine), built from a block-additive SAAS surrogate, multi-fidelity cost-aware acquisition, and TuRBO trust regions. Second, we instantiate the problem in a flag-gated harness over a production coding agent and report a controlled four-round manual-tuning case study against a fixed task suite and an end-to-end HARBOR run. The formulation itself is task-class agnostic: the configuration space, reward correction, acquisition, and safety check apply to any agent harness with a bounded flag space and a reproducible task suite.
[444] SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for Large Language Models
Saish Sachin Shinde
Main category: cs.LG
Abstract: We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncating context windows, growing vector databases without bound, or tiered storage systems that lack consolidation and forgetting mechanisms. SCM implements five core components inspired by human memory: a limited-capacity working memory, multi-dimensional importance tagging, offline sleep-stage consolidation with distinct NREM and REM phases, intentional value-based forgetting, and a computational self-model enabling introspection. Across a standardized benchmark suite of eight tests, the prototype achieves perfect recall accuracy over ten-turn conversations while reducing memory noise by 90.9% through adaptive forgetting. Memory search latency remains below one millisecond even with hundreds of stored concepts. This work establishes the architectural foundations for memory systems that consolidate, prioritize, and forget, offering a testable platform for advancing LLM memory research.
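The capacity-limited store with importance tagging and value-based forgetting can be caricatured in a few lines. A single scalar importance score and a hard capacity cut are simplifying assumptions; the system described above uses multi-dimensional tags and distinct NREM/REM consolidation phases.

```python
import time

class ConsolidatedMemory:
    """Sketch of a limited-capacity store with importance tagging and
    value-based forgetting during an offline 'sleep' pass."""
    def __init__(self, capacity=5):
        self.capacity = capacity
        self.items = []                       # (importance, timestamp, content)

    def store(self, content, importance):
        self.items.append((importance, time.monotonic(), content))

    def consolidate(self):
        """Offline phase: keep only the most valuable memories,
        intentionally forgetting the rest. Returns the forget count."""
        self.items.sort(key=lambda item: item[0], reverse=True)
        forgotten = max(len(self.items) - self.capacity, 0)
        self.items = self.items[: self.capacity]
        return forgotten

    def recall(self):
        return [content for _, _, content in self.items]

m = ConsolidatedMemory(capacity=2)
for i, imp in enumerate([0.1, 0.9, 0.5]):
    m.store(f"fact-{i}", imp)
```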
[445] LAF-Based Evaluation and UTTL-Based Learning Strategies with MIATTs
Yongquan Yang
Main category: cs.LG
Abstract: In many real-world machine learning (ML) applications, the true target cannot be precisely defined due to ambiguous or subjective information. To address this challenge, under the premise that the true target for a given ML task does not exist objectively in the real world, the EL-MIATTs (Evaluation and Learning with Multiple Inaccurate True Targets) framework has been proposed. Bridging theory and practice in implementing EL-MIATTs, in this paper we develop two complementary mechanisms: LAF (Logical Assessment Formula)-based evaluation algorithms and UTTL (Undefinable True Target Learning)-based learning strategies with MIATTs, which together enable logically coherent and practically feasible modeling under uncertain supervision. We first analyze task-specific MIATTs, examining how their coverage and diversity determine their structural property and influence downstream evaluation and learning. Based on this understanding, we formulate LAF-grounded evaluation algorithms that operate either on original MIATTs or on ternary targets synthesized from them, balancing interpretability, soundness, and completeness. For model training, we introduce UTTL-grounded learning strategies using Dice and cross-entropy loss functions, comparing per-target and aggregated optimization schemes. We also discuss how the integration of LAF and UTTL bridges the gap between logical semantics and statistical optimization. Together, these components provide a coherent pathway for implementing EL-MIATTs, offering a principled foundation for developing ML systems in scenarios where the notion of “ground truth” is inherently uncertain. An application of this work’s results is presented as part of the study available at https://www.qeios.com/read/EZWLSN.
[446] Early Detection of Latent Microstructure Regimes in Limit Order Books
Prakul Sunil Hiremath, Vruksha Arun Hiremath
Main category: cs.LG
Abstract: Limit order books can transition rapidly from stable to stressed conditions, yet standard early-warning signals such as order flow imbalance and short-term volatility are inherently reactive. We formalise this limitation via a three-regime causal data-generating process (stable $\to$ latent build-up $\to$ stress) in which a latent deterioration phase creates a prediction window prior to observable stress. Under mild assumptions on temporal drift and regime persistence, we establish identifiability of the latent build-up regime and derive guarantees for strictly positive expected lead-time and non-trivial probability of early detection. We propose a trigger-based detector combining MAX aggregation of complementary signal channels, a rising-edge condition, and adaptive thresholding. Across 200 simulations, the method achieves mean lead-time $+18.6 \pm 3.2$ timesteps with perfect precision and moderate coverage, outperforming classical change-point and microstructure baselines. A preliminary application to one week of BTC/USDT order book data shows consistent positive lead-times while baselines remain reactive. Results degrade in low signal-to-noise and short build-up regimes, consistent with theory.
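The detector combines the three ingredients named above: MAX aggregation across signal channels, an adaptive threshold, and a rising-edge condition. A minimal sketch follows; the window length, the mean-plus-k-sigma threshold rule, and all names are illustrative assumptions.

```python
import numpy as np

def detect_buildup(channels, window=20, k=3.0):
    """Fire an alert when the MAX-aggregated signal first rises above
    an adaptive threshold computed from a trailing window."""
    s = np.max(channels, axis=0)               # MAX aggregation across channels
    alerts = []
    above_prev = False
    for t in range(window, len(s)):
        hist = s[t - window:t]
        thresh = hist.mean() + k * hist.std()  # adaptive threshold
        above = s[t] > thresh
        if above and not above_prev:           # rising-edge condition
            alerts.append(t)
        above_prev = above
    return alerts

# two quiet channels, one of which jumps at t=40 (a stylized build-up)
channels = np.zeros((2, 60))
channels[0, 40:] = 5.0
```

On this toy input the detector fires once, at the onset of the jump, rather than re-triggering while the signal stays elevated.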
[447] Differentially Private Model Merging
Qichuan Yin, Manzil Zaheer, Tian Li
Main category: cs.LG
Abstract: In machine learning applications, privacy requirements at inference or deployment time can change constantly due to varying policies, regulations, or user experience. In this work, we aim to generate a family of models that satisfy any target differential privacy (DP) requirement without additional training steps, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, namely random selection and linear combination, to output a final private model for any target privacy parameter. We provide privacy accounting of these approaches through the lens of Rényi DP and privacy loss distributions for general problems. In a case study on private mean estimation, we fully characterize the privacy/utility results and theoretically establish the superiority of linear combination over random selection. Empirically, we validate our approach and analyses on several models and on both synthetic and real-world datasets.
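The mechanics of the two post-processing techniques can be sketched directly; the privacy accounting of each (via Rényi DP and privacy loss distributions in the paper) is not reproduced here, and all names are hypothetical. Both operate only on already-trained private models, so no additional training data is touched.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_selection(models, probs):
    """Mechanism 1: randomly output one of the pre-trained private models."""
    return models[rng.choice(len(models), p=probs)]

def linear_combination(models, coeffs):
    """Mechanism 2: output a convex combination of the models' parameters."""
    coeffs = np.asarray(coeffs, dtype=float)
    coeffs = coeffs / coeffs.sum()
    return sum(c * m for c, m in zip(coeffs, models))

# two toy 'models' represented by their parameter vectors
models = [np.zeros(3), np.ones(3)]
merged = linear_combination(models, [1, 3])
picked = random_selection(models, [0.0, 1.0])
```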
[448] Droplet-LNO: Physics-Informed Laplace Neural Operators for Accurate Prediction of Droplet Spreading Dynamics on Complex Surfaces
Ganesh Sahadeo Meshram, Partha Pratim Chakrabarti, Suman Chakraborty
Main category: cs.LG
Abstract: Spreading of liquid droplets on solid substrates constitutes a classic multiphysics problem with widespread applications ranging from inkjet printing and spray cooling to biomedical microfluidic systems. Yet accurate computational fluid dynamics (CFD) simulations are prohibitively expensive, taking more than 18 to 24 hours for each transient computation. In this paper, the Physics-Informed Laplace Neural Operator (PI-LNO) is introduced, representing a novel architecture where the Laplace integral transform serves as a learned physics-informed functional basis. Extensive comparative benchmark studies were performed against five other state-of-the-art approaches: UNet, UNet with attention modules (UNet-AM), DeepONet, Physics-Informed UNet (PI-UNet), and the Laplace Neural Operator (LNO). Through complex Laplace transforms, PI-LNO natively models the exponential transient dynamics of the spreading process. A TensorFlow-based PI-LNO is trained on multi-surface CFD data spanning contact angles $\theta_s \in [20, 160]$, employing a physics-regularized composite loss combining data fidelity (MSE, MAE, RMSE) with Navier-Stokes, Cahn-Hilliard, and causality constraints.
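The shape of the physics-regularized composite loss can be sketched as data-fidelity terms plus a weighted physics-residual penalty. The residual here is a stand-in argument (computing the actual Navier-Stokes and Cahn-Hilliard residuals requires the solution fields and their derivatives), and the weights and names are hypothetical.

```python
import numpy as np

def composite_loss(pred, target, physics_residual, w_data=1.0, w_phys=0.1):
    """Composite loss sketch: data-fidelity terms (MSE, MAE, RMSE) plus a
    weighted penalty on a precomputed physics residual."""
    err = pred - target
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    return w_data * (mse + mae + rmse) + w_phys * np.mean(physics_residual ** 2)
```

With a perfect fit the data terms vanish and only the physics penalty remains, which is what lets the physics constraints regularize regions where data is sparse.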
[449] SGD at the Edge of Stability: The Stochastic Sharpness Gap
Fangshuo Liao, Afroditi Kolomvaki, Anastasios Kyrillidis
Main category: cs.LG
Abstract: When training neural networks with full-batch gradient descent (GD) and step size $\eta$, the largest eigenvalue of the Hessian – the sharpness $S(\boldsymbol{\theta})$ – rises to $2/\eta$ and hovers there, a phenomenon termed the Edge of Stability (EoS). \citet{damian2023selfstab} showed that this behavior is explained by a self-stabilization mechanism driven by third-order structure of the loss, and that GD implicitly follows projected gradient descent (PGD) on the constraint $S(\boldsymbol{\theta}) \leq 2/\eta$. For mini-batch stochastic gradient descent (SGD), the sharpness stabilizes below $2/\eta$, with the gap widening as the batch size decreases; yet no theoretical explanation exists for this suppression. We introduce stochastic self-stabilization, extending the self-stabilization framework to SGD. Our key insight is that gradient noise injects variance into the oscillatory dynamics along the top Hessian eigenvector, strengthening the cubic sharpness-reducing force and shifting the equilibrium below $2/\eta$. Following the approach of \citet{damian2023selfstab}, we define stochastic predicted dynamics relative to a moving projected gradient descent trajectory and prove a stochastic coupling theorem that bounds the deviation of SGD from these predictions. We derive a closed-form equilibrium sharpness gap: $\Delta S = \eta\beta\sigma_{\boldsymbol{u}}^{2}/(4\alpha)$, where $\alpha$ is the progressive sharpening rate, $\beta$ is the self-stabilization strength, and $\sigma_{\boldsymbol{u}}^{2}$ is the gradient noise variance projected onto the top eigenvector. This formula predicts that smaller batch sizes yield flatter solutions and recovers GD when the batch equals the full dataset.
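The closed-form gap is easy to check numerically; the parameter values below are arbitrary illustrations, not the paper's measurements.

```python
def sharpness_gap(eta, beta, sigma_u_sq, alpha):
    """Equilibrium sharpness gap: Delta S = eta * beta * sigma_u^2 / (4 * alpha)."""
    return eta * beta * sigma_u_sq / (4 * alpha)

def equilibrium_sharpness(eta, beta, sigma_u_sq, alpha):
    """SGD's equilibrium sharpness sits at 2/eta minus the gap; zero projected
    noise variance (full-batch GD) recovers the 2/eta edge of stability."""
    return 2 / eta - sharpness_gap(eta, beta, sigma_u_sq, alpha)
```

Larger projected noise variance (smaller batches) widens the gap and hence predicts flatter solutions, matching the abstract's claim.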
[450] MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference
Anurita Das
Main category: cs.LG
Abstract: Deploying large language models to heterogeneous hardware is often constrained by memory, not compute. We introduce MCAP (Monte Carlo Activation Profiling), a load-time per-layer importance estimator that enables dynamic precision and memory placement decisions on the target device. MCAP produces a lightweight per-layer signal that drives both precision dispatch (W4A8 vs. W4A16) and residency tier (GPU, RAM, SSD), allowing a single set of weights to operate across diverse memory budgets. Our system, NVE, achieves 1.5-1.8x higher decode throughput than llama.cpp Q4_0 on NVIDIA T4 and enables models to run in memory regimes previously infeasible without modifying weights.
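The load-time profiling idea can be sketched as follows; note the importance proxy, thresholds, and tiering rules here are illustrative assumptions, not the paper's actual estimator:

```python
import random

# Hypothetical sketch of the MCAP idea: estimate a per-layer importance score
# by Monte Carlo sampling an activation statistic, then map the score to a
# precision choice (W4A8 vs. W4A16) and a residency tier (GPU/RAM/SSD).

def profile_layers(activation_stat, n_layers, n_samples=64, seed=0):
    rng = random.Random(seed)
    scores = []
    for layer in range(n_layers):
        draws = [abs(activation_stat(layer, rng.random())) for _ in range(n_samples)]
        scores.append(sum(draws) / n_samples)  # mean sampled activation magnitude
    return scores

def dispatch(score, hi=1.0, lo=0.5):
    if score >= hi:
        return ("W4A16", "GPU")   # sensitive layer: higher activation precision, fast tier
    if score >= lo:
        return ("W4A8", "RAM")
    return ("W4A8", "SSD")        # low-importance layer: cheapest placement

# Toy statistic where importance grows with depth.
scores = profile_layers(lambda layer, x: x * (layer + 1), n_layers=3)
```

The point of a single load-time signal driving both decisions is that one set of quantized weights can then be placed differently on each target device's memory budget.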
[451] A Deep U-Net Framework for Flood Hazard Mapping Using Hydraulic Simulations of the Wupper Catchment
Christian Lammers, Fernando Arévalo, Leonie Märker-Neuhaus, Daniel Heinenberg, Christian Förster, Karl-Heinz Spies
Main category: cs.LG
Abstract: The increasing frequency and severity of global flood events highlight the need for rapid and reliable flood prediction tools. This process traditionally relies on computationally expensive hydraulic simulations. This research presents a prediction tool by developing a deep-learning-based surrogate model to accurately and efficiently predict the maximum water level across a grid. This was achieved by conducting a series of experiments to optimize a U-Net architecture, patch generation, and data handling for approximating a hydraulic model. This research demonstrates that a deep learning surrogate model can serve as a computationally efficient alternative to traditional hydraulic simulations. The framework was tested using hydraulic simulations of the Wupper catchment in the North Rhine-Westphalia region (Germany), obtaining comparable results.
[452] Synthetic Data in Education: Empirical Insights from Traditional Resampling and Deep Generative Models
Tapiwa Amion Chinodakufa, Ashfaq Ali Shafin, Khandaker Mamun Ahmed
Main category: cs.LG
Abstract: Synthetic data generation offers promise for addressing data scarcity and privacy concerns in educational technology, yet practitioners lack empirical guidance for selecting between traditional resampling techniques and modern deep learning approaches. This study presents the first systematic benchmark comparing these paradigms using a 10,000-record student performance dataset. We evaluate three resampling methods (SMOTE, Bootstrap, Random Oversampling) against three deep learning models (Autoencoder, Variational Autoencoder, Copula-GAN) across multiple dimensions: distributional fidelity (Kolmogorov-Smirnov distance, Jensen-Shannon divergence), machine learning utility such as Train-on-Synthetic-Test-on-Real scores (TSTR), and privacy preservation (Distance to Closest Record). Our findings reveal a fundamental trade-off: resampling methods achieve near-perfect utility (TSTR: 0.997) but completely fail privacy protection (DCR ~ 0.00), while deep learning models provide strong privacy guarantees (DCR ~ 1.00) at significant utility cost. Variational Autoencoders emerge as the optimal compromise, maintaining 83.3% predictive performance while ensuring complete privacy protection. We also provide actionable recommendations: use traditional resampling for internal development where privacy is controlled, and VAEs for external data sharing where privacy is paramount. This work establishes a foundational benchmark and practical decision framework for synthetic data generation in learning analytics.
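The privacy metric used above, Distance to Closest Record (DCR), is simple to state concretely. A minimal sketch (feature scaling and the Euclidean metric are illustrative choices):

```python
import numpy as np

# Distance to Closest Record (DCR): for each synthetic row, the distance to
# its nearest real row. Resampled copies of real rows score ~0 (the privacy
# failure mode of bootstrap/oversampling methods), while generative samples
# sit at a positive distance.

def dcr(real, synthetic):
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # pairwise distances: synthetic rows x real rows
    dists = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=-1)
    return dists.min(axis=1)  # one nearest-real distance per synthetic row

real = [[0.0, 0.0], [1.0, 1.0]]
copied = dcr(real, [[0.0, 0.0]])   # duplicate of a real record
novel = dcr(real, [[0.5, 0.0]])    # point not present in the real data
```

This makes the reported trade-off concrete: resampling reproduces real rows (DCR ≈ 0), while generative models emit novel points (DCR > 0) at some cost in utility.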
[453] Interpretable Quantile Regression by Optimal Decision Trees
Valentin Lemaire, Gaël Aglin, Siegfried Nijssen
Main category: cs.LG
Abstract: The field of machine learning is subject to an increasing interest in models that are not only accurate but also interpretable and robust, thus allowing their end users to understand and trust AI systems. This paper presents a novel method for learning a set of optimal quantile regression trees. The advantages of this method are that (1) it provides predictions about the complete conditional distribution of a target variable without prior assumptions on this distribution; (2) it provides predictions that are interpretable; (3) it learns a set of optimal quantile regression trees without compromising algorithmic efficiency compared to learning a single tree.
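The objective behind quantile regression is the pinball loss at level τ, whose minimizing constant over a leaf's targets is the empirical τ-quantile; a minimal sketch (the paper's optimal tree-search procedure itself is not shown):

```python
# Pinball (quantile) loss: asymmetric absolute error that penalizes
# under-prediction with weight τ and over-prediction with weight (1 − τ).

def pinball(y_true, pred, tau):
    total = 0.0
    for y in y_true:
        diff = y - pred
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
# For τ = 0.5 the empirical median (3.0) is the best constant prediction.
losses = {c: pinball(ys, c, 0.5) for c in (2.0, 3.0, 4.0)}
```

Fitting trees for several values of τ at once yields the interpretable picture of the full conditional distribution that the abstract describes.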
[454] JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning
Ali Aghababaei-Harandi, Aude Sportisse, Massih-Reza Amini
Main category: cs.LG
Abstract: Semi-supervised learning has emerged as a powerful paradigm for leveraging large amounts of unlabeled data to improve the performance of machine learning models when labeled data are scarce. Among existing approaches, methods derived from FixMatch have achieved state-of-the-art results in image classification by combining weak and strong data augmentations with confidence-based pseudo-labeling. Despite their strong empirical performance, these methods typically struggle with two critical bottlenecks: majority classes tend to dominate the learning process, which is amplified by incorrect pseudo-labels, leading to biased models. Furthermore, noisy early pseudo-labels prevent the model from forming clear decision boundaries, requiring prolonged training to learn informative representations. In this paper, we introduce a paradigm shift from conventional confidence-threshold-based pseudo-labeling toward an explicit shaping of geometric representations. Our approach is inspired by the recently proposed Latent-Euclidean Joint-Embedding Predictive Architectures (LeJEPA), a theoretically grounded framework asserting that meaningful representations should exhibit an isotropic Gaussian structure in latent space. Building on this principle, we propose a new training objective that combines the classical semi-supervised loss used in FlexMatch, an adaptive extension of FixMatch, with a latent-space regularization term derived from LeJEPA. Our proposed approach encourages well-structured representations while preserving the advantages of pseudo-labeling strategies. Through extensive experiments on CIFAR-100, STL-10 and Tiny-ImageNet, we demonstrate that the proposed method consistently outperforms existing baselines. In addition, our method significantly accelerates convergence, drastically reducing the overall computational cost compared to standard FixMatch-based pipelines.
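The isotropic-Gaussian target for the latent space can be sketched as a simple moment-matching penalty: pull the batch mean toward 0 and the covariance toward the identity. The actual LeJEPA objective may differ; this is an illustrative stand-in:

```python
import numpy as np

# Moment-matching sketch of an isotropic-Gaussian latent regularizer:
# penalize deviation of the batch mean from 0 and of the (biased) batch
# covariance from the identity matrix.

def isotropy_penalty(z):
    z = np.asarray(z, dtype=float)
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False, bias=True)  # rows are observations
    d = z.shape[1]
    return float(np.sum(mu ** 2) + np.sum((cov - np.eye(d)) ** 2))

spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # isotropic-ish batch
collapsed = [[1.0, 1.0]] * 4                                  # representation collapse
```

A collapsed batch (all embeddings identical) is heavily penalized, which is the failure mode such a regularizer guards against while pseudo-labels are still noisy.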
[455] TRAVELFRAUDBENCH: A Configurable Evaluation Framework for GNN Fraud Ring Detection in Travel Networks
Bhavana Sajja
Main category: cs.LG
Abstract: We introduce TravelFraudBench (TFG), a configurable benchmark for evaluating graph neural networks (GNNs) on fraud ring detection in travel platform graphs. Existing benchmarks–YelpChi, Amazon-Fraud, Elliptic, PaySim–cover single node types or domain-generic patterns with no mechanism to evaluate across structurally distinct fraud ring topologies. TFG simulates three travel-specific ring types–ticketing fraud (star topology with shared device/IP clusters), ghost hotel schemes (reviewer x hotel bipartite cliques), and account takeover rings (loyalty transfer chains)–in a heterogeneous graph with 9 node types and 12 edge types. Ring size, count, fraud rate, scale (500 to 200,000 nodes), and composition are fully configurable. We evaluate six methods–MLP, GraphSAGE, RGCN-proj, HAN, RGCN, and PC-GNN–under a ring-based split where each ring appears entirely in one partition, eliminating transductive label leakage. GraphSAGE achieves AUC=0.992 and RGCN-proj AUC=0.987, outperforming the MLP baseline (AUC=0.938) by 5.5 and 5.0 pp, confirming graph structure adds substantial discriminative power. HAN (AUC=0.935) is a negative result, matching the MLP baseline. On the ring recovery task (>=80% of ring members flagged simultaneously), GraphSAGE achieves 100% recovery across all ring types; MLP recovers only 17-88%. The edge-type ablation shows device and IP co-occurrence are the primary signals: removing uses_device drops AUC by 5.2 pp. TFG is released as an open-source Python package (MIT license) with PyG, DGL, and NetworkX exporters and pre-generated datasets at https://huggingface.co/datasets/bsajja7/travel-fraud-graphs, with Croissant metadata including Responsible AI fields.
[456] Spectral Embeddings Leak Graph Topology: Theory, Benchmark, and Adaptive Reconstruction
Thinh Nguyen-Cong, Truong-Son Hy, Thang N. Dinh
Main category: cs.LG
Abstract: Graph Neural Networks (GNNs) excel on relational data, but standard benchmarks unrealistically assume the graph is centrally available. In practice, settings such as Federated Graph Learning, distributed systems, and privacy-sensitive applications involve graph data that are localized, fragmented, noisy, and privacy-leaking. We present a unified framework for this setting. We introduce LoGraB (Local Graph Benchmark), which decomposes standard datasets into fragmented benchmarks using three strategies and four controls: neighborhood radius $d$, spectral quality $k$, noise level $σ$, and coverage ratio $p$. LoGraB supports graph reconstruction, localized node classification, and inter-fragment link prediction, with Island Cohesion. We propose AFR (Adaptive Fidelity-driven Reconstruction), a method for noisy spectral fragments. AFR scores patch quality via a fidelity measure combining a gap-to-truncation stability ratio and structural entropy, then assembles fragments using RANSAC-Procrustes alignment, adaptive stitching, and Bundle Adjustment. Rather than forcing a single global graph, AFR recovers large faithful islands. We prove heat-kernel edge recovery under a separation condition, Davis–Kahan perturbation stability, and bounded alignment error. We establish a Spectral Leakage Proposition: under a spectral-gap assumption, polynomial-time Bayesian recovery is feasible once enough eigenvectors are shared, complementing AFR’s deterministic guarantees. Experiments on nine benchmarks show that LoGraB reveals model strengths and weaknesses under fragmentation, AFR achieves the best F1 on 7/9 datasets, and under per-embedding $(ε,δ)$-Gaussian differential privacy, AFR retains 75% of its undefended F1 at $ε=2$. Our anonymous code is available at https://anonymous.4open.science/r/JMLR_submission
[457] Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
Neehal Tumma, Noel Loo, Daniela Rus
Main category: cs.LG
Abstract: To address the increasing long-context compute limitations of softmax attention, several subquadratic recurrent operators have been developed. This work includes models such as Mamba-2, DeltaNet, Gated DeltaNet (GDN), and Kimi Delta Attention (KDA). As the space of recurrences grows, a parallel line of work has arisen to taxonomize them. One compelling view is the test-time regression (TTR) framework, which interprets recurrences as performing online least squares updates that learn a linear map from the keys to values. Existing delta-rule recurrences can be seen as first-order approximations to this objective, but notably ignore the curvature of the least-squares loss during optimization. In this work, we address this by introducing preconditioning to these recurrences. Starting from the theory of online least squares, we derive equivalences between linear attention and the delta rule in the exactly preconditioned case. Next, we realize this theory in practice by proposing a diagonal approximation: this enables us to introduce preconditioned variants of DeltaNet, GDN, and KDA alongside efficient chunkwise parallel algorithms for computing them. Empirically, we find that our preconditioned delta-rule recurrences yield consistent performance improvements across synthetic recall benchmarks and language modeling at the 340M and 1B scale.
[458] A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting
Brooks Kinch, Xiaozhe Hu, Yilong Huang, Martine Dyring Hansen, Sunniva Meltzer, Nathaniel Donald Hamlin, David Sirajuddin, Eric C. Cyr, Nathaniel Trask
Main category: cs.LG
Abstract: For autoregressive modeling of chaotic dynamical systems over long time horizons, the stability of both training and inference is a major challenge in building scientific foundation models. We present a hybrid technique in which an autoregressive transformer is embedded within a novel shooting-based mixed finite element scheme, exposing topological structure that enables provable stability. For forward problems, we prove preservation of discrete energies, while for training we prove uniform bounds on gradients, provably avoiding the exploding gradient problem. Combined with a vision transformer, this yields latent tokens admitting structure-preserving dynamics. We outperform modern foundation models with a $65\times$ reduction in model parameters and long-horizon forecasting of chaotic systems. A “mini-foundation” model of a fusion component shows that 12 simulations suffice to train a real-time surrogate, achieving a $9{,}000\times$ speedup over particle-in-cell simulation.
[459] How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models
Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis
Main category: cs.LG
Abstract: We measure how much one extra recurrence is worth to a looped (depth-recurrent) language model, in equivalent unique parameters. From an iso-depth sweep of 116 pretraining runs across recurrence counts $r \in \{1, 2, 4, 8\}$ spanning ${\sim}50\times$ in training compute, we fit a joint scaling law $L = E + A\,(N_\text{once} + r^{\varphi} N_\text{rec})^{-α} + B\,D^{-β}$ and recover a new recurrence-equivalence exponent $\varphi = 0.46$ at $R^2 = 0.997$. Intuitively, $\varphi$ tells us whether looping a block $r$ times is equivalent in validation loss to $r$ unique blocks of a non-looped model (full equivalence, $\varphi{=}1$) or to a single block run repeatedly with no capacity gain ($\varphi{=}0$). Our $\varphi = 0.46$ sits in between, so each additional recurrence predictably increases validation loss at matched training compute. For example, at $r{=}4$ a 410M looped model performs on par with a 580M non-looped model, but pays the training cost of a 1B non-looped one. On a five-axis downstream evaluation, the gap persists on parametric-knowledge tasks and closes on simple open-book tasks, while reasoning tasks are not resolvable at our compute budgets. For any looped LM, our $\varphi$ converts the design choice of $r$ into a predictable validation-loss cost, and future training recipes and architectures can be compared by how much they raise $\varphi$ above $0.46$.
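The fitted law is easy to evaluate numerically. A sketch using the reported $\varphi = 0.46$; all other constants and the once/recurrent parameter split below are hypothetical, chosen only to illustrate the functional form:

```python
# Iso-depth scaling law from the abstract:
#   L = E + A·(N_once + r^φ·N_rec)^(−α) + B·D^(−β),  with φ = 0.46.

PHI = 0.46

def effective_params(n_once, n_rec, r):
    # r recurrences of the shared block count like r^φ unique copies
    return n_once + (r ** PHI) * n_rec

def predicted_loss(n_once, n_rec, r, d,
                   E=1.7, A=400.0, alpha=0.35, B=600.0, beta=0.37):
    return E + A * effective_params(n_once, n_rec, r) ** (-alpha) + B * d ** (-beta)

l1 = predicted_loss(100e6, 300e6, r=1, d=10e9)
l4 = predicted_loss(100e6, 300e6, r=4, d=10e9)
# More recurrences raise effective capacity, so predicted loss falls at a
# fixed count of unique parameters (while training compute grows with r).
```

Because $4^{0.46} \approx 1.9$, looping the shared block four times buys a bit under two extra "unique copies" of it in the capacity term, consistent with the 410M-looped vs. 580M-non-looped comparison in the abstract.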
[460] TabSHAP
Aryan Chaudhary, Prateek Agarwal, Tejasvi Alladi
Main category: cs.LG
Abstract: Large Language Models (LLMs) fine-tuned on serialized tabular data are emerging as powerful alternatives to traditional tree-based models, particularly for heterogeneous or context-rich datasets. However, their deployment in high-stakes domains is hindered by a lack of faithful interpretability; existing methods often rely on global linear proxies or scalar probability shifts that fail to capture the model’s full probabilistic uncertainty. In this work, we introduce TabSHAP, a model-agnostic interpretability framework designed to directly attribute local query decision logic in LLM-based tabular classifiers. By adapting a Shapley-style sampled-coalition estimator with Jensen-Shannon divergence between full-input and masked-input class distributions, TabSHAP quantifies the distributional impact of each feature rather than simple prediction flips. To align with tabular semantics, we mask at the level of serialized key:value fields (atomic in the prompt string), not individual subword tokens. Experimental validation on the Adult Income and Heart Disease benchmarks demonstrates that TabSHAP isolates critical diagnostic features, achieving significantly higher faithfulness than random baselines and XGBoost proxies. We further run a distance-metric ablation on the same test instances and TabSHAP settings: attributions are recomputed with KL or L1 replacing JSD in the similarity step (results cached per metric), and we compare deletion faithfulness across all three.
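The sampled-coalition, JSD-based attribution can be sketched in a few lines. This is a toy stand-in: the classifier, field names, and coalition sampling scheme below are illustrative, not the paper's implementation:

```python
import math, random

# Shapley-style sampled-coalition attribution: score each serialized field by
# the average Jensen-Shannon divergence between the model's class
# distribution with and without that field, across random coalitions.

def jsd(p, q):
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    kl = lambda x, y: sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tabshap(predict, fields, n_coalitions=200, seed=0):
    rng = random.Random(seed)
    scores = {f: 0.0 for f in fields}
    for _ in range(n_coalitions):
        coalition = [f for f in fields if rng.random() < 0.5]
        base = predict(coalition)
        for f in fields:
            if f not in coalition:
                # marginal distributional shift from unmasking field f
                scores[f] += jsd(predict(coalition + [f]), base)
    return {f: s / n_coalitions for f, s in scores.items()}

# Stand-in classifier: only the "age" field moves the class distribution.
predict = lambda fs: [0.9, 0.1] if "age" in fs else [0.5, 0.5]
scores = tabshap(predict, ["age", "zip"])
```

Scoring distributional shift rather than prediction flips is what lets the method register features that sharpen the model's confidence without changing its argmax class.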
[461] Graph Neural Network-Informed Predictive Flows for Faster Ford-Fulkerson and PAC-Learnability
Eleanor Wiesler, Trace Baxley
Main category: cs.LG
Abstract: We propose a learning-augmented framework for accelerating max-flow computation and image segmentation by integrating Graph Neural Networks (GNNs) with the Ford-Fulkerson algorithm. Rather than predicting initial flows, our method learns edge importance probabilities to guide augmenting path selection. We introduce a Message Passing GNN (MPGNN) that jointly learns node and edge embeddings through coupled updates, capturing both global structure and local flow dynamics such as residual capacity and bottlenecks. Given an input image, we propose a method to construct a grid-based flow network with source and sink nodes, extract features, and perform a single GNN inference to assign edge probabilities reflecting their likelihood of belonging to high-capacity cuts. These probabilities are stored in a priority queue and used to guide a modified Ford-Fulkerson procedure, prioritizing augmenting paths via an Edmonds-Karp-style search with bottleneck-aware tie-breaking. This avoids repeated inference over residual graphs while leveraging learned structure throughout optimization. We further introduce a bidirectional path construction strategy centered on high-probability edges and provide a theoretical framework relating prediction quality to efficiency via a weighted permutation distance metric. Our method preserves max-flow/min-cut optimality while reducing the number of augmentations in practice. We also outline a hybrid extension combining flow warm-starting with edge-priority prediction, establishing a foundation for learning-guided combinatorial optimization in image segmentation.
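The guidance step, expanding edges in order of a learned score rather than FIFO order, can be sketched with a priority queue. The probabilities and graph below are toy stand-ins for the GNN's output, and the full residual-graph bookkeeping of Ford-Fulkerson is omitted:

```python
import heapq

# Find one augmenting path by always expanding the frontier edge with the
# highest learned probability (a dict standing in for GNN edge scores).

def guided_augmenting_path(capacity, prob, source, sink):
    # capacity: {(u, v): residual capacity}; prob: {(u, v): learned edge score}
    heap = [(-1.0, source, [source])]  # max-heap on score via negation
    visited = set()
    while heap:
        _, u, path = heapq.heappop(heap)
        if u == sink:
            return path
        if u in visited:
            continue
        visited.add(u)
        for (a, b), cap in capacity.items():
            if a == u and cap > 0 and b not in visited:
                heapq.heappush(heap, (-prob.get((a, b), 0.0), b, path + [b]))
    return None

cap = {("s", "a"): 5, ("a", "t"): 5, ("s", "b"): 5, ("b", "t"): 5}
p = {("s", "a"): 0.9, ("a", "t"): 0.8, ("s", "b"): 0.1, ("b", "t"): 0.2}
path = guided_augmenting_path(cap, p, "s", "t")  # follows the high-probability route
```

Since the guidance only reorders augmenting-path selection and never skips a feasible path, max-flow/min-cut optimality is preserved, as the abstract notes.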
[462] Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach
Guilin Deng, Silong Chen, Yuchuan Luo, Yi Liu, Songlei Wang, Zhiping Cai, Lin Liu, Xiaohua Jia, Shaojing Fu
Main category: cs.LG
Abstract: Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges of limited resources and privacy concerns. Despite data localization, shared gradients can still expose sensitive information through membership inference attacks (MIAs). However, FedLLMs’ unique properties, i.e. massive parameter scales, rapid convergence, and sparse, non-orthogonal gradients, render existing MIAs ineffective. To address this gap, we propose ProjRes, the first projection residuals-based passive MIA tailored for FedLLMs. ProjRes leverages hidden embedding vectors as sample representations and analyzes their projection residuals on the gradient subspace to uncover the intrinsic link between gradients and inputs. It requires no shadow models, auxiliary classifiers, or historical updates, ensuring efficiency and robustness. Experiments on four benchmarks and four LLMs show that ProjRes achieves near 100% accuracy, outperforming prior methods by up to 75.75%, and remains effective even under strong differential privacy defenses. Our findings reveal a previously overlooked privacy vulnerability in FedLLMs and call for a re-examination of their security assumptions. Our code and data are available at $\href{https://anonymous.4open.science/r/Passive-MIA-5268}{link}$.
[463] ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response
Stephan Xie, Ben Cohen, Mononito Goswami, Junhong Shen, Emaad Khwaja, Chenghao Liu, David Asker, Othmane Abou-Amal, Ameet Talwalkar
Main category: cs.LG
Abstract: Time series question-answering (TSQA), in which we ask natural language questions to infer and reason about properties of time series, is a promising yet underexplored capability of foundation models. In this work, we present ARFBench, a TSQA benchmark that evaluates the understanding of multimodal foundation models (FMs) on time series anomalies prevalent in software incident data. ARFBench consists of 750 questions across 142 time series and 5.38M data points from 63 production incidents sourced exclusively from internal telemetry at Datadog. We evaluate leading proprietary and open-source LLMs, VLMs, and time series FMs and observe that frontier VLMs perform markedly better than existing baselines; the leading model (GPT-5) achieves a 62.7% accuracy and 51.9% F1. We next demonstrate the promise of specialized multimodal approaches. We develop a novel TSFM + VLM hybrid prototype which we post-train on a small set of synthetic and real data that yields comparable overall F1 and accuracy with frontier models. Lastly, we find models and human domain experts exhibit complementary strengths. We define a model-expert oracle, a best-of-2 oracle selector over model and expert answers, yielding 82.8% F1 and 87.2% accuracy and establishing a new superhuman frontier for future TSQA models. The benchmark is available at https://huggingface.co/datasets/Datadog/ARFBench.
[464] The Recurrent Transformer: Greater Effective Depth and Efficient Decoding
Costin-Andrei Oncescu, Depen Morwani, Samy Jelassi, Alexandru Meterez, Mujin Kwun, Sham Kakade
Main category: cs.LG
Abstract: Transformers process tokens in parallel but are temporally shallow: at position $t$, each layer attends to key-value pairs computed based on the previous layer, yielding a depth capped by the number of layers. Recurrent models offer unbounded temporal depth but suffer from optimization instability and historically underutilize modern accelerators. We introduce the Recurrent Transformer, a simple architectural change where each layer attends to key-value pairs computed off its own activations, yielding layerwise recurrent memory while preserving standard autoregressive decoding cost. We show that the architecture can emulate both (i) a conventional Transformer and (ii) token-to-token recurrent updates under mild assumptions, while avoiding optimization instability. Naively, prefill/training appears bandwidth-bound with effective arithmetic intensity near $1$ because keys and values are revealed sequentially; we give an exact tiling-based algorithm that preserves the mathematical computation while reducing HBM traffic from $Θ(N^2)$ to $Θ(N\log N)$, increasing effective arithmetic intensity to $Θ(N/\log N)$ for sequence length $N$. On 150M and 300M parameter C4 pretraining, Recurrent Transformers improve cross-entropy over a parameter-matched Transformer baseline and achieve the improvement with fewer layers (fixed parameters), suggesting that recurrence can trade depth for width, thus reducing KV cache memory footprint and inference latency.
[465] Learning Dynamic Representations and Policies from Multimodal Clinical Time-Series with Informative Missingness
Zihan Liang, Ziwen Pan, Ruoxuan Xiong
Main category: cs.LG
Abstract: Multimodal clinical records contain structured measurements and clinical notes recorded over time, offering rich temporal information about the evolution of patient health. Yet these observations are sparse, and whether they are recorded depends on the patient’s latent condition. Observation patterns also differ across modalities, as structured measurements and clinical notes arise under distinct recording processes. While prior work has developed methods that accommodate missingness in clinical time series, how to extract and use the information carried by the observation process itself remains underexplored. We therefore propose a patient representation learning framework for multimodal clinical time series that explicitly leverages informative missingness. The framework combines (1) a multimodal encoder that captures signals from structured and textual data together with their observation patterns, (2) a Bayesian filtering module that updates a latent patient state over time from observed multimodal signals, and (3) downstream modules for offline treatment policy learning and patient outcome prediction based on the learned patient state. We evaluate the framework on ICU sepsis cohorts from MIMIC-III, MIMIC-IV, and eICU. It improves both offline treatment policy learning and adverse outcome prediction, achieving FQE 0.679 versus 0.528 for clinician behavior and AUROC 0.886 for post-72-hour mortality prediction on MIMIC-III.
[466] CAP: Controllable Alignment Prompting for Unlearning in LLMs
Zhaokun Wang, Jinyu Guo, Jingwen Pu, Hongli Pu, Meng Yang, Xunlei Chen, Jie Ou, Wenyi Li, Guangchun Luo, Wenhong Tian
Main category: cs.LG
Abstract: Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. However, existing parameter-modifying methods face fundamental limitations: high computational costs, uncontrollable forgetting boundaries, and strict dependency on model weight access. These constraints render them impractical for closed-source models, yet current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting for Unlearning (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP decouples unlearning into a learnable prompt optimization process via reinforcement learning, where a prompt generator collaborates with the LLM to suppress target knowledge while preserving general capabilities selectively. This approach enables reversible knowledge restoration through prompt revocation. Extensive experiments demonstrate that CAP achieves precise, controllable unlearning without updating model parameters, establishing a dynamic alignment mechanism that overcomes the transferability limitations of prior methods.
[467] Improving Performance in Classification Tasks with LCEN and the Weighted Focal Differentiable MCC Loss
Pedro Seber, Richard D. Braatz
Main category: cs.LG
Abstract: The LASSO-Clip-EN (LCEN) algorithm was previously introduced for nonlinear, interpretable feature selection and machine learning. However, its design and use were limited to regression tasks. In this work, we create a modified version of the LCEN algorithm that is suitable for classification tasks and maintains its desirable properties, such as interpretability. This modified LCEN algorithm is evaluated on four widely used binary and multiclass classification datasets. In these experiments, LCEN is compared against 10 other model types and consistently reaches high test-set macro F$_1$ score and Matthews correlation coefficient (MCC) metrics, higher than those of the majority of investigated models. LCEN models for classification remain sparse, eliminating an average of 56% of all input features in the experiments performed. Furthermore, LCEN-selected features are used to retrain all models using the same data, leading to statistically significant performance improvements in three of the experiments and insignificant differences in the fourth when compared to using all features or other feature selection methods. Simultaneously, the weighted focal differentiable MCC (diffMCC) loss function is evaluated on the same datasets. Models trained with the diffMCC loss function are always the best-performing methods in these experiments, and reach test-set macro F$_1$ scores that are, on average, 4.9% higher and MCCs that are 8.5% higher than those obtained by models trained with the weighted cross-entropy loss. These results highlight the performance of LCEN as a feature selection and machine learning algorithm also for classification tasks, and how the diffMCC loss function can train very accurate models, surpassing the weighted cross-entropy loss in the tasks investigated.
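The core idea behind a differentiable MCC loss is to build a "soft" confusion matrix from predicted probabilities instead of hard 0/1 decisions, so that 1 − MCC is differentiable in the probabilities. A binary sketch (the paper's class weighting and focal term are omitted):

```python
import numpy as np

# Soft binary confusion matrix: probabilities replace thresholded decisions,
# making the resulting MCC, and hence the loss 1 − MCC, differentiable.

def soft_mcc(y_true, p):
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    tp = np.sum(p * y)
    tn = np.sum((1.0 - p) * (1.0 - y))
    fp = np.sum(p * (1.0 - y))
    fn = np.sum((1.0 - p) * y)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + 1e-12
    return (tp * tn - fp * fn) / denom

def diff_mcc_loss(y_true, p):
    return 1.0 - soft_mcc(y_true, p)
```

Because MCC accounts for all four confusion-matrix cells, such a loss is less sensitive to class imbalance than plain cross-entropy, which is one motivation for comparing it against the weighted cross-entropy baseline above.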
[468] Hyperloop Transformers
Abbas Zeitoun, Lucas Torroba-Hennigen, Yoon Kim
Main category: cs.LG
Abstract: LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest such as edge and on-device deployment are further constrained by the model’s memory footprint, thus motivating parameter-efficient architectures for language modeling. This paper describes a simple architecture that improves the parameter-efficiency of LLMs. Our architecture makes use of looped Transformers as a core primitive, which reuse Transformer layers across depth and are thus more parameter-efficient than ordinary (depth-matched) Transformers. We organize the looped Transformer into three blocks–begin, middle, and end blocks–where each block itself consists of multiple Transformer layers, and only the middle block is applied recurrently across depth. We augment the looped middle block with hyper-connections (Xie et al., 2026), which expand the residual stream into matrix-valued residual streams. Hyper-connections are applied only after each loop, and therefore add minimal new parameters and compute cost. Across various model scales, we find that our Hyper-Connected Looped Transformer (Hyperloop Transformer) is able to outperform depth-matched Transformer and mHC Transformer baselines despite using approximately 50% fewer parameters. The outperformance persists through post-training weight quantization, thus positioning Hyperloop Transformers as an attractive architecture for memory-efficient language modeling.
[469] Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
Wenkai Wang, Xiyun Li, Hongcan Guo, Wenhao Yu, Tianqing Fang, Haitao Mi, Dong Yu, Shengyu Zhang
Main category: cs.LG
Abstract: Graphical User Interface (GUI) grounding requires mapping natural language instructions to precise pixel coordinates. However, due to visually homogeneous elements and dense layouts, models typically grasp semantic intent yet struggle with achieving precise localization. While scaling sampling attempts (Pass@k) reveals potential gains, static self-consistency strategies derived from geometric clustering often yield limited improvements, as the model’s predictions tend to be spatially dispersed. In this paper, we propose replacing static consistency strategies with a learnable selection mechanism that selects the optimal target by critiquing its own proposals rendered on the screenshot. Given the significant disparity between the model’s grounding and critiquing capabilities, we propose a co-evolving Propose-then-Critic framework. To jointly optimize these, we introduce a maturity-aware adaptive co-evolutionary reinforcement learning paradigm. This approach dynamically balances the training objectives of proposer and critic, where the diversity of the proposer’s outputs enhances critic robustness, while the critic’s maturing discrimination capability conversely unlocks the proposer’s potential for extensive spatial exploration, fostering the mutual reinforcement and co-evolution of both capabilities, thereby ensuring generalizability to adapt to diverse and complex interface layouts. Extensive experiments over 6 benchmarks show that our method significantly enhances both grounding accuracy and critic reliability.
[470] Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning
Yongcan Yu, Lingxiao He, Jian Liang, Kuangpu Guo, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He
Main category: cs.LG
Abstract: Test-time reinforcement learning (TTRL) adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization signals from label noise. Through an empirical study, we observe that responses with medium consistency form an ambiguity region and constitute the primary source of reward noise. Crucially, we find that such spurious signals can even be amplified through group-relative advantage estimation. Motivated by these findings, we propose a unified framework, Debiased and Denoised test-time Reinforcement Learning (DDRL), to mitigate spurious signals. Concretely, DDRL first applies a frequency-based sampling strategy to exclude ambiguous samples while maintaining a balanced set of positive and negative examples. It then adopts a debiased advantage estimation with fixed advantages, removing the bias introduced by group-relative policy optimization. Finally, DDRL incorporates a consensus-based off-policy refinement stage, which leverages the rejection-sampled dataset to enable efficient and stable model updates. Experiments on three large language models across multiple mathematical reasoning benchmarks demonstrate that DDRL consistently outperforms existing TTRL baselines. The code will soon be released at https://github.com/yuyongcan/DDRL.
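The frequency-based sampling idea can be sketched as follows. The thresholds, the band boundaries, and the selection rule are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def frequency_filter(answers, low=0.2, high=0.6):
    """Sketch of frequency-based sample selection for test-time RL.
    Answers whose empirical frequency falls in the medium band (low, high)
    are treated as ambiguous and dropped; the majority answer serves as
    the pseudo-label for the remaining responses. Thresholds are made up."""
    n = len(answers)
    freq = {a: c / n for a, c in Counter(answers).items()}
    pseudo_label = max(freq, key=freq.get)
    kept = [(a, a == pseudo_label) for a in answers
            if freq[a] >= high or freq[a] <= low]
    return pseudo_label, kept
```

High-frequency answers become confident positives, rare answers become confident negatives, and the medium-consistency ambiguity region the abstract identifies is excluded from the reward signal.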
[471] Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression
Wei Jiang, Wei Wang
Main category: cs.LG
Abstract: Sub-token routing offers a finer control axis for transformer efficiency than the coarse units used in most prior work, such as tokens, pages, heads, or layers. In this paper, we study routing within a token representation itself in LoRA-adapted transformers. The motivation is that a relevant token need not be internally uniform: under a retention budget, preserved value groups are distributed unevenly both across tokens and within tokens, which suggests that KV compression need not be an all-or-nothing decision at token level. We study this fine-grained routing mechanism in two settings. For compression-aware language modeling, we introduce a query-independent design that combines routed subspace LoRA with value-group routing on the KV path. For downstream-task-preserving KV compression, we introduce a query-aware design in which a predictor-based selector allocates a global retention budget over context-token/value-group pairs using query-conditioned relevance. Experiments show that the query-independent design improves the quality-compression tradeoff for language modeling, while the query-aware design preserves downstream behavior under reduced KV budgets. We further examine the relation between token-level and sub-token-level query-aware routing, and show that they form complementary compression axes: token-level methods determine which tokens survive globally, while sub-token routing determines how the surviving tokens are compressed internally.
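The query-aware retention idea reduces, at its core, to ranking (token, value-group) pairs by relevance and keeping the best under a global budget. The sketch below assumes the relevance scores are already given (a learned predictor in the paper); names and the selection rule are illustrative.

```python
def allocate_budget(relevance, budget):
    """Query-aware budget allocation sketch: keep the top-`budget`
    (token, value-group) pairs by query-conditioned relevance score.
    `relevance` maps (token_id, group_id) -> float."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return set(ranked[:budget])
```

Because selection is global over pairs rather than per token, the surviving value groups can be distributed unevenly across and within tokens, matching the abstract's observation that KV compression need not be all-or-nothing at the token level.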
[472] Decoupled Travel Planning with Behavior Forest
Duanyang Yuan, Sihang Zhou, Yanning Hou, Xiaoshu Chen, Haoyuan Chen, Ke Liang, Jiyuan Liu, Chuan Ma, Xinwang Liu, Jian Huang
Main category: cs.LG
Abstract: Behavior sequences, composed of executable steps, serve as the operational foundation for multi-constraint planning problems such as travel planning. In such tasks, each planning step is not only constrained locally but also influenced by global constraints spanning multiple subtasks, leading to a tightly coupled and complex decision process. Existing travel planning methods typically rely on a single decision space that entangles all subtasks and constraints, failing to distinguish between locally acting constraints within a subtask and global constraints that span multiple subtasks. Consequently, the model is forced to jointly reason over local and global constraints at each decision step, increasing the reasoning burden and reducing planning efficiency. To address this problem, we propose the Behavior Forest method. Specifically, our approach structures the decision-making process into a forest of parallel behavior trees, where each behavior tree is responsible for a subtask. A global coordination mechanism is introduced to orchestrate the interactions among these trees, enabling modular and coherent travel planning. Within this framework, large language models are embedded as decision engines within behavior tree nodes, performing localized reasoning conditioned on task-specific constraints to generate candidate subplans and adapt decisions based on coordination feedback. The behavior trees, in turn, provide an explicit control structure that guides LLM generation. This design decouples complex tasks and constraints into manageable subspaces, enabling task-specific reasoning and reducing the cognitive load on the LLM. Experimental results show that our method outperforms state-of-the-art methods by 6.67% on the TravelPlanner benchmark and by 11.82% on the ChinaTravel benchmark, demonstrating its effectiveness in increasing LLM performance for complex multi-constraint travel planning.
[473] mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code
Adam Skurla, Dominik Macko, Jakub Simko
Main category: cs.LG
Abstract: Multi-domain detection of machine-generated code snippets in various programming languages is a challenging task. SemEval-2026 Task 13 addresses this challenge from several angles, as a binary detection problem as well as attribution of the source. Specifically, its subtasks also cover detection of the generator LLM family, hybrid code co-generated by humans and machines, and adversarially modified code that hides its origin. Our submitted systems adjusted the existing mdok approach (focused on machine-generated text detection) to these specific kinds of problems by exploring various base models more suitable for code understanding. The results indicate that the submitted systems are competitive in all three subtasks. However, the margins from the top-performing systems are significant, and thus further improvements are possible.
[474] Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments
Tatsuhito Hasegawa
Main category: cs.LG
Abstract: Human activity recognition (HAR) in Internet of Things (IoT) environments must cope with heterogeneous sensor settings that vary across datasets, devices, body locations, sensing modalities, and channel compositions. This heterogeneity makes conventional channel-fixed models difficult to reuse across sensing environments because their input representations are tightly coupled to predefined channel structures. To address this problem, we investigate strict channel-free HAR, in which a single shared model performs inference without assuming a fixed number, order, or semantic arrangement of input channels, and without relying on sensor-specific input layers or dataset-specific channel templates. We argue that fusion design is the central issue in this setting. Accordingly, we propose a channel-free HAR framework that combines channel-wise encoding with a shared encoder, metadata-conditioned late fusion via conditional batch normalization, and joint optimization of channel-level and fused predictions through a combination loss. The proposed model processes each channel independently to handle varying channel configurations, while sensor metadata such as body location, modality, and axis help recover structural information that channel-independent processing alone cannot retain. In addition, the joint loss encourages both the discriminability of individual channels and the consistency of the final fused prediction. Experiments on PAMAP2, together with robustness analysis on six HAR datasets, ablation studies, sensitivity analysis, efficiency evaluation, and cross-dataset transfer learning, demonstrate three main findings…
[475] Relocation of compact sets in $\mathbb{R}^n$ by diffeomorphisms and linear separability of datasets in $\mathbb{R}^n$
Xiao-Song Yang, Xuan Zhou, Qi Zhou
Main category: cs.LG
Abstract: Relocation of compact sets in an $n$-dimensional manifold by self-diffeomorphisms is of interest in its own right and has significant potential applications to data classification in data science. This paper presents a theory for relocating a finite number of compact sets in $\mathbb{R}^n$ to arbitrary target domains in $\mathbb{R}^n$ by diffeomorphisms of $\mathbb{R}^n$. Furthermore, we prove that for any such collection, there exists a differentiable embedding into $\mathbb{R}^{n+1}$ such that their images become linearly separable. As applications of the established theory, we show that a finite number of compact datasets in $\mathbb{R}^n$ can be made linearly separable by width-$n$ deep neural networks (DNNs) with Leaky-ReLU, ELU, or SELU activation functions, under a mild condition. In addition, we show that any finite number of mutually disjoint compact datasets in $\mathbb{R}^n$ can be made linearly separable in $\mathbb{R}^{n+1}$ by a width-$(n+1)$ DNN.
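A toy numerical illustration of the $\mathbb{R}^{n+1}$ separability claim: lift two disjoint sets to different heights along a new coordinate, after which a single hyperplane separates them. The paper's construction is a smooth (differentiable) embedding; the step embedding below only shows the separating-hyperplane idea, and the point sets are invented.

```python
def lift(points, height):
    """Embed points from R^n into R^(n+1) at the given last coordinate."""
    return [p + (height,) for p in points]

# Two disjoint finite sets in R^2 that no line separates (one surrounds
# the other). Lifted to different heights in R^3, the hyperplane
# x3 = 0.5 separates their images.
inner = [(0.0, 0.0), (0.1, -0.1)]
outer = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
A, B = lift(inner, 0.0), lift(outer, 1.0)
```

Every lifted inner point sits below the hyperplane and every lifted outer point above it, even though the original sets are not linearly separable in the plane.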
[476] Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair
Vishal Rajput
Main category: cs.LG
Abstract: We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises supervised loss must retain non-zero Jacobian sensitivity in directions that are label-correlated in training data but nuisance at test time. This is not a contingent failure of current methods; it is a mathematical consequence of the supervised objective itself. We call this the geometric blind spot of supervised learning (Theorem 1), and show it holds across proper scoring rules, architectures, and dataset sizes. This single theorem unifies four lines of prior empirical work that were previously treated separately: non-robust predictive features, texture bias, corruption fragility, and the robustness-accuracy tradeoff. In this framing, adversarial vulnerability is one consequence of a broader structural fact about supervised learning geometry. We introduce the Trajectory Deviation Index (TDI), a diagnostic that measures the theorem’s bounded quantity directly, and show why common alternatives miss the key failure mode. PGD adversarial training reaches a Jacobian Frobenius norm of 2.91 yet has the worst clean-input geometry (TDI 1.336), while PMH achieves TDI 0.904. TDI is the only metric that detects this dissociation because it measures isotropic path-length distortion: the exact quantity Theorem 1 bounds. Across seven vision tasks, BERT/SST-2, and ImageNet ViT-B/16 backbones used by CLIP, DINO, and SAM, the blind spot is measurable and repairable. It is present at foundation-model scale, worsens monotonically across language-model sizes (blind-spot ratio 0.860 to 0.765 to 0.742 from 66M to 340M), and is amplified by task-specific ERM fine-tuning (+54%), while PMH repairs it by 11x with one additional training term whose Gaussian form Proposition 5 proves is the unique perturbation law that uniformly penalises the encoder Jacobian.
[477] Even More Guarantees for Variational Inference in the Presence of Symmetries
Lena Zellinger, Antonio Vergari
Main category: cs.LG
Abstract: When approximating an intractable density via variational inference (VI), the variational family is typically chosen as a simple parametric family that very likely does not contain the target. This raises the question: Under which conditions can we recover characteristics of the target despite misspecification? In this work, we extend previous results on robust VI with location-scale families under target symmetries. We derive sufficient conditions guaranteeing exact recovery of the mean when using the forward Kullback-Leibler divergence and $\alpha$-divergences. We further show how and why optimization can fail to recover the target mean in the absence of our sufficient conditions, providing initial guidelines on the choice of the variational family and $\alpha$-value.
[478] A Green-Integral-Constrained Neural Solver with Stochastic Physics-Informed Regularization
Mohammad Mahdi Abedi, David Pardo, Tariq Alkhalifah
Main category: cs.LG
Abstract: Standard physics-informed neural networks (PINNs) struggle to simulate highly oscillatory Helmholtz solutions in heterogeneous media because pointwise minimization of second-order PDE residuals is computationally expensive, biased toward smooth solutions, and requires artificial absorbing boundary layers to restrict the solution. To overcome these challenges, we introduce a Green-Integral (GI) neural solver for the acoustic Helmholtz equation. It departs from the PDE-residual-based formulation by enforcing wave physics through an integral representation that imposes a nonlocal constraint. Oscillatory behavior and outgoing radiation are encoded directly through the integral kernel, eliminating second-order spatial derivatives and enforcing physical solutions without additional boundary layers. Theoretically, optimizing this GI loss via a neural network acts as a spectrally tuned preconditioned iteration, enabling convergence in heterogeneous media where the classical Born series diverges. By exploiting FFT-based convolution to accelerate the GI loss evaluation, our approach substantially reduces GPU memory usage and training time. However, this efficiency relies on a fixed regular grid, which can limit local resolution. To improve local accuracy in strong scattering regions, we also propose a hybrid GI+PDE loss, enforcing a lightweight Helmholtz residual at a small number of nonuniformly sampled collocation points. We evaluate our method on seismic benchmark models characterized by structural contrasts and subwavelength heterogeneity at frequencies up to 20Hz. GI-based training consistently outperforms PDE-based PINNs, reducing computational cost by over a factor of ten. In models with localized scattering, the hybrid loss yields the most accurate reconstructions, providing a stable, efficient, and physically grounded alternative.
[479] Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
Heng Yang
Main category: cs.LG
Abstract: We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal “Boltzmann-tilted” distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
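The tempering loop can be sketched in a few lines on a 1D toy problem. This is a hedged, minimal version: the paper's HMC rejuvenation is replaced here by a cheap Metropolis random-walk move, and the schedule, step size, and particle count are all illustrative.

```python
import math
import random

def tempered_smc(cost, n=500, betas=(0.1, 0.3, 1.0, 3.0), seed=0):
    """Minimal tempered-SMC sketch: reweight and resample particles along
    an increasing inverse-temperature schedule toward exp(-beta * cost(x)).
    Rejuvenation uses a single Metropolis random-walk step per stage
    (a stand-in for the paper's gradient-based HMC moves)."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    prev = 0.0
    for beta in betas:
        # Incremental importance weights for the next tempering step.
        w = [math.exp(-(beta - prev) * cost(x)) for x in particles]
        particles = rng.choices(particles, weights=w, k=n)  # resample
        # Rejuvenation: one Metropolis step targeting exp(-beta * cost).
        moved = []
        for x in particles:
            y = x + rng.gauss(0.0, 0.3)
            if rng.random() < math.exp(min(0.0, -beta * (cost(y) - cost(x)))):
                x = y
            moved.append(x)
        particles = moved
        prev = beta
    return particles
```

On a quadratic cost the particle cloud concentrates around the minimiser as the temperature drops, which is the annealing behaviour the abstract relies on for sharp, potentially multimodal targets.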
[480] Conditional anomaly detection with soft harmonic functions
Michal Valko, Branislav Kveton, Hamed Valizadegan, Gregory F. Cooper, Milos Hauskrecht
Main category: cs.LG
Abstract: In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions.
[481] Dynamical Priors as a Training Objective in Reinforcement Learning
Sukesh Subaharan
Main category: cs.LG
Abstract: Standard reinforcement learning (RL) optimizes policies for reward but imposes few constraints on how decisions evolve over time. As a result, policies may achieve high performance while exhibiting temporally incoherent behavior such as abrupt confidence shifts, oscillations, or degenerate inactivity. We introduce Dynamical Prior Reinforcement Learning (DP-RL), a training framework that augments policy gradient learning with an auxiliary loss derived from external state dynamics that implement evidence accumulation and hysteresis. Without modifying the reward, environment, or policy architecture, this prior shapes the temporal evolution of action probabilities during learning. Across three minimal environments, we show that dynamical priors systematically alter decision trajectories in task-dependent ways, promoting temporally structured behavior that cannot be explained by generic smoothing. These results demonstrate that training objectives alone can control the temporal geometry of decision-making in RL agents.
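One way to read the abstract is that an external dynamical system generates a smooth target trajectory for the action probabilities, and the auxiliary loss penalises deviation from it. The sketch below is a schematic stand-in for that idea, not the paper's exact objective; the leaky-integrator prior, the leak constant, and the MSE form are assumptions.

```python
def accumulator_trace(evidence, leak=0.8):
    """Leaky evidence accumulator producing a smooth target trajectory
    (a stand-in for the paper's external state dynamics; `leak` made up)."""
    a, trace = 0.0, []
    for e in evidence:
        a = leak * a + (1.0 - leak) * e
        trace.append(a)
    return trace

def dynamical_prior_loss(action_probs, evidence, leak=0.8):
    """Auxiliary loss: mean squared deviation of the policy's per-step
    action probabilities from the accumulator trajectory. The reward and
    policy-gradient terms are unchanged and not shown here."""
    target = accumulator_trace(evidence, leak)
    return sum((p - t) ** 2 for p, t in zip(action_probs, target)) / len(target)
```

Under gradually accumulating evidence, a policy whose confidence rises smoothly incurs a smaller auxiliary penalty than one that jumps abruptly, which is the temporal-coherence pressure the abstract describes.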
[482] Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms
Jiyan Song, Wenyang Wang, Chengcheng Yan, Zhiquan Han, Feifei Zhao
Main category: cs.LG
Abstract: In the treatment of complex diseases, treatment regimens using a single drug often yield limited efficacy and can lead to drug resistance. In contrast, combination drug therapies can significantly improve therapeutic outcomes through synergistic effects. However, experimentally validating all possible drug combinations is prohibitively expensive, underscoring the critical need for efficient computational prediction methods. Although existing approaches based on deep learning and graph neural networks (GNNs) have made considerable progress, challenges remain in reducing structural bias, improving generalization capability, and enhancing model interpretability. To address these limitations, this paper proposes a collaborative prediction graph neural network that integrates molecular structural features and cell-line genomic profiles with drug-drug interactions to enhance the prediction of synergistic effects. We introduce a novel model named the Residual Graph Isomorphism Network integrated with an Attention mechanism (ResGIN-Att). The model first extracts multi-scale topological features of drug molecules using a residual graph isomorphism network, where residual connections help mitigate over-smoothing in deep layers. Subsequently, an adaptive Long Short-Term Memory (LSTM) module fuses structural information from local to global scales. Finally, a cross-attention module is designed to explicitly model drug-drug interactions and identify key chemical substructures. Extensive experiments on five public benchmark datasets demonstrate that ResGIN-Att achieves competitive performance, comparing favorably against key baseline methods while exhibiting promising generalization capability and robustness.
[483] Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
Hanjun Cho, Gahyun Yoo, Hanseong Kim, Jay-Yoon Lee
Main category: cs.LG
Abstract: Numerical reasoning over expert-domain tables often exhibits high in-domain accuracy but limited robustness to domain shift. Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning. We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner. By decoupling domain semantics and numerical operation structure, TaNOS improves the transferability of numerical reasoning. Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% of the training data, outperforming the SFT baseline (73.97%) trained on the full training data as well as proprietary models such as GPT-5 and Gemini-2.5-Pro. Furthermore, in the domain-shift experiments, TaNOS displays a nearly negligible cross-domain gap (<2 pp) where standard SFT shows a gap of over 10 pp. These results suggest that structural guidance with operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision can improve the robustness of numerical reasoning across diverse expert-domain tables.
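Header anonymization, the first of the three components, can be illustrated with a tiny table transformation. The `colN` naming scheme below is a made-up placeholder convention, not necessarily the paper's.

```python
def anonymize_headers(table):
    """Replace domain-specific column headers with generic placeholders so
    a model cannot rely on header-operation shortcuts. Returns the renamed
    table (first row = headers) and the header mapping."""
    headers = table[0]
    mapping = {h: f"col{i}" for i, h in enumerate(headers)}
    return [[mapping[h] for h in headers]] + table[1:], mapping
```

An operation sketch over the anonymized table would then be a program skeleton phrased over the generic names, e.g. `subtract(col0, col1)`, leaving only the numerical operation structure once the domain words are gone.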
[484] A temporal deep learning framework for calibration of low-cost air quality sensors
Arindam Sengupta, Tony Bush, Ben Marner, Jose Miguel Pérez, Soledad Le Clainche
Main category: cs.LG
Abstract: Low-cost air quality sensors (LCS) provide a practical alternative to expensive regulatory-grade instruments, making dense urban monitoring networks possible. Yet their adoption is limited by calibration challenges, including sensor drift, environmental cross-sensitivity, and variability in performance from device to device. This work presents a deep learning framework for calibrating LCS measurements of PM$_{2.5}$, PM$_{10}$, and NO$_2$ using a Long Short-Term Memory (LSTM) network, trained on co-located reference data from the OxAria network in Oxford, UK. Unlike the Random Forest (RF) baseline, which treats each observation independently, the proposed approach captures temporal dependencies and delayed environmental effects through sequence-based learning, achieving higher $R^2$ values across training, validation, and test sets for all three pollutants. A feature set is constructed combining time-lagged parameters, harmonic encodings, and interaction terms to improve generalization on unseen temporal windows. Validation of unseen calibrated values against the Equivalence Spreadsheet Tool 3.1 demonstrates regulatory compliance with expanded uncertainties of 22.11% for NO$_2$, 12.42% for PM$_{10}$, and 9.1% for PM$_{2.5}$.
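The time-lagged and harmonic features can be sketched for a single sensor channel as follows. The lag set and hour-of-day encoding are illustrative of the kind of feature engineering described, not the paper's exact recipe.

```python
import math

def make_features(series, hours, lags=(1, 2, 3)):
    """Build time-lagged and harmonic (hour-of-day) features for one sensor
    channel. Returns one feature row per time step that has all lags
    available: [lag1, lag2, lag3, sin(hour), cos(hour)]."""
    rows = []
    for t in range(max(lags), len(series)):
        lagged = [series[t - k] for k in lags]
        angle = 2.0 * math.pi * (hours[t] % 24) / 24.0
        rows.append(lagged + [math.sin(angle), math.cos(angle)])
    return rows
```

The harmonic pair encodes the daily cycle continuously (hour 23 sits next to hour 0), and the lags give a windowed model access to the delayed environmental effects the LSTM exploits through its sequence input.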
[485] Hybrid Deep Learning Approach for Coupled Demand Forecasting and Supply Chain Optimization
Nusrat Yasmin Nadia, Md Habibul Arif, Habibor Rahman Rabby, Md Iftekhar Monzur Tanvir, Md. Jakir Hossen, M. F. Mridha
Main category: cs.LG
Abstract: Supply chain resilience and efficiency are vital in industries characterized by volatile demand and uncertain supply, such as textiles and personal protective equipment (PPE). Traditional forecasting and optimization approaches often operate in isolation, limiting their real-world effectiveness. This paper proposes a Hybrid AI Framework for Demand-Supply Forecasting and Optimization (HAF-DS), which integrates a Long Short-Term Memory (LSTM)-based demand forecasting module with a mixed integer linear programming (MILP) optimization layer. The LSTM captures temporal and contextual demand dependencies, while the optimization layer prescribes cost-efficient replenishment and allocation decisions. The framework jointly minimizes forecasting error and operational cost through embedding-based feature representation and recurrent neural architectures. Experiments on textile sales and supply chain datasets show significant performance gains over statistical and deep learning baselines. On the combined dataset, HAF-DS reduced Mean Absolute Error (MAE) from 15.04 to 12.83 (14.7%), Root Mean Squared Error (RMSE) from 19.53 to 17.11 (12.4%), and Mean Absolute Percentage Error (MAPE) from 9.5% to 8.1%. Inventory cost decreased by 5.4%, stockouts by 27.5%, and service level rose from 95.5% to 97.8%. These results confirm that coupling predictive forecasting with prescriptive optimization enhances both accuracy and efficiency, providing a scalable and adaptable solution for modern textile and PPE supply chains.
[486] Promoting Simple Agents: Ensemble Methods for Event-Log Prediction
Benedikt Bollig, Matthias Függer, Thomas Nowak, Paul Zeinaty
Main category: cs.LG
Abstract: We compare lightweight automata-based models (n-grams) with neural architectures (LSTM, Transformer) for next-activity prediction in streaming event logs. Experiments on synthetic patterns and five real-world process mining datasets show that n-grams with appropriate context windows achieve comparable accuracy to neural models while requiring substantially fewer resources. Unlike windowed neural architectures, which show unstable performance patterns, n-grams provide stable and consistent accuracy. While we demonstrate that classical ensemble methods like voting improve n-gram performance, they require running many agents in parallel during inference, increasing memory consumption and latency. We propose an ensemble method, the promotion algorithm, that dynamically selects between two active models during inference, reducing overhead compared to classical voting schemes. On real-world datasets, these ensembles match or exceed the accuracy of non-windowed neural models with lower computational cost.
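A minimal n-gram next-activity predictor of the kind compared here fits in a few lines; this sketch is a generic frequency model, with class and method names invented for illustration.

```python
from collections import Counter, defaultdict

class NGramPredictor:
    """Frequency-based next-activity model over a length-(n-1) context."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)

    def update(self, trace):
        """Count every (context window -> next activity) pair in a trace."""
        for i in range(len(trace) - self.n + 1):
            ctx = tuple(trace[i:i + self.n - 1])
            self.counts[ctx][trace[i + self.n - 1]] += 1

    def predict(self, prefix):
        """Predict the most frequent continuation of the current context."""
        ctx = tuple(prefix[-(self.n - 1):])
        c = self.counts.get(ctx)
        return c.most_common(1)[0][0] if c else None
```

A promotion scheme in the spirit of the abstract would keep two such models active, track their rolling accuracy on the stream, and serve predictions from whichever is currently better, rather than running many voters in parallel.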
[487] Geometric Characterisation and Structured Trajectory Surrogates for Clinical Dataset Condensation
Pafue Christy Nganjimi, Andrew Soltan, Danielle Belgrave, Lei Clifton, David Clifton, Anshul Thakur
Main category: cs.LG
Abstract: Dataset condensation constructs compact synthetic datasets that retain the training utility of large real-world datasets, enabling efficient model development and potentially supporting downstream research in governed domains such as healthcare. Trajectory matching (TM) is a widely used condensation approach that supervises synthetic data using changes in model parameters observed during training on real data, yet the structure of this supervision signal remains poorly understood. In this paper, we provide a geometric characterisation of trajectory matching, showing that a fixed synthetic dataset can only reproduce a limited span of such training-induced parameter changes. When the resulting supervision signal is spectrally broad, this creates a conditional representability bottleneck. Motivated by this mismatch, we propose Bezier Trajectory Matching (BTM), which replaces SGD trajectories with quadratic Bezier trajectory surrogates between initial and final model states. These surrogates are optimised to reduce average loss along the path while replacing broad SGD-derived supervision with a more structured, lower-rank signal that is better aligned with the optimisation constraints of a fixed synthetic dataset, and they substantially reduce trajectory storage. Experiments on five clinical datasets demonstrate that BTM consistently matches or improves upon standard trajectory matching, with the largest gains in low-prevalence and low-synthetic-budget settings. These results indicate that effective trajectory matching depends on structuring the supervision signal rather than reproducing stochastic optimisation paths.
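The surrogate itself is just a quadratic Bézier curve in parameter space: endpoints fixed at the initial and final model states, with the control point left free (in BTM it is optimised to reduce loss along the path, which is not shown here). The function name is hypothetical.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve with endpoints p0, p2 and
    control point p1, evaluated componentwise at t in [0, 1]:
    B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2."""
    return [(1 - t) ** 2 * a + 2 * (1 - t) * t * b + t ** 2 * c
            for a, b, c in zip(p0, p1, p2)]
```

Storing only three parameter states per trajectory (versus every SGD checkpoint) is what yields the storage reduction the abstract mentions.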
[488] Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation
Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam
Main category: cs.LG
Abstract: Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable long-term monitoring. Reinforcement learning, particularly multi-task RL, overcomes these limitations by leveraging shared representations to enable efficient adaptation across tasks and environments. However, while such policies show promising results in simulation and controlled experiments, they still remain opaque and offer limited insight into the agent’s internal decision-making, creating gaps in transparency, trust, and safety that hinder real-world deployment. The internal policy structure and task-specific specialization remain poorly understood. To address these gaps, we analyze the internal structure of a pretrained multi-task reinforcement learning network in the HoloOcean simulator for underwater navigation by identifying and comparing task-specific subnetworks responsible for navigating toward different species. We find that in a contextual multi-task reinforcement learning setting with related tasks, the network uses only about 1.5% of its weights to differentiate between tasks. Of these, approximately 85% connect the context-variable nodes in the input layer to the next hidden layer, highlighting the importance of context variables in such settings. Our approach provides insights into shared and specialized network components, useful for efficient model editing, transfer learning, and continual learning for underwater monitoring through a contextual multi-task reinforcement learning method.
[489] Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask
Ashley N. Abraham, Andrew Strelzoff, Haley R. Dozier, Althea C. Henslee, Mark A. Chappell
Main category: cs.LG
Abstract: Large-scale Nearest Neighbor (NN) search, though widely used in similarity search, remains limited by the computational cost of processing large-scale data. To reduce this expense, Approximate Nearest Neighbor (ANN) search is often used in applications that can rely on an approximation rather than exact similarity search. Product Quantization (PQ) is a memory-efficient ANN method effective for clustering datasets of all sizes, yet clustering large-scale, high-dimensional data still incurs heavy memory cost and execution time. This work presents a divide-and-conquer approach in Python that combines PQ, inverted indexing, and Dask, merging the partial results without compromising accuracy and reducing computational requirements to the level needed for medium-scale data.
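As a reference point for the method being parallelized, here is a minimal single-machine Product Quantization sketch in NumPy (our illustration; the paper's Dask-based partitioning and inverted indexing are omitted). Each vector is split into subvectors, each subvector space is clustered independently, and a vector is stored as a short tuple of centroid indices.

```python
import numpy as np

def train_pq(data, n_subvectors=4, n_centroids=16, n_iter=10, seed=0):
    """Train a product quantizer: split dimensions into subvectors and run
    a small Lloyd k-means independently on each subvector space."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    assert d % n_subvectors == 0
    sub_d = d // n_subvectors
    codebooks = []
    for s in range(n_subvectors):
        sub = data[:, s * sub_d:(s + 1) * sub_d]
        centroids = sub[rng.choice(len(sub), n_centroids, replace=False)]
        for _ in range(n_iter):
            assign = np.argmin(((sub[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
            for c in range(n_centroids):
                if (assign == c).any():
                    centroids[c] = sub[assign == c].mean(axis=0)
        codebooks.append(centroids)
    return codebooks

def pq_encode(data, codebooks):
    """Encode each vector as one centroid index per subvector."""
    sub_d = codebooks[0].shape[1]
    codes = []
    for s, cb in enumerate(codebooks):
        sub = data[:, s * sub_d:(s + 1) * sub_d]
        codes.append(np.argmin(((sub[:, None] - cb[None]) ** 2).sum(-1), axis=1))
    return np.stack(codes, axis=1)  # shape (n, n_subvectors), small ints

def pq_decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating looked-up centroids."""
    return np.concatenate([cb[codes[:, s]] for s, cb in enumerate(codebooks)], axis=1)
```

The memory saving is what makes PQ attractive at scale: each vector shrinks from `d` floats to `n_subvectors` small integers, and the divide-and-conquer structure is exactly what a scheduler like Dask can distribute across partitions.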
[490] Transferable SCF-Acceleration through Solver-Aligned Initialization Learning
Eike S. Eberhard, Viktor Kotsev, Timm Güthle, Stephan Günnemann
Main category: cs.LG
Abstract: Machine learning methods that predict initial guesses from molecular geometry can reduce the cost of self-consistent field (SCF) calculations, but matrix-prediction models fail when extrapolating to larger molecules, degrading rather than accelerating convergence [Liu et al. 2025]. We show that this failure is a supervision problem, not an extrapolation problem: models trained on ground-state targets fit those targets well out of distribution, yet produce initial guesses that slow convergence. Solver-Aligned Initialization Learning (SAIL) resolves this for both Hamiltonian and density matrix models by differentiating through the SCF solver end-to-end. We introduce the Effective Relative Iteration Count (ERIC), a correction to the commonly used RIC that accounts for hidden Fock-build overhead. On QM40, containing molecules up to 4$\times$ larger than the training distribution, SAIL reduces ERIC by 37% (PBE), 33% (SCAN), and 27% (B3LYP), more than doubling the previous state-of-the-art reduction on B3LYP (10%). On QMugs molecules 10$\times$ the training size, SAIL delivers a 1.25$\times$ wall-time speedup at the hybrid level of theory, extending ML SCF acceleration to large drug-like molecules.
[491] Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions
Eylon E. Krause
Main category: cs.LG
Abstract: The choice of activation function plays a crucial role in the optimization and performance of deep neural networks. While the Rectified Linear Unit (ReLU) remains the dominant choice due to its simplicity and effectiveness, its lack of smoothness may hinder gradient-based optimization in deep architectures. In this work we propose a family of $C^{2N}$-smooth activation functions whose gate follows a log-logistic CDF, achieving ReLU-like performance with purely rational arithmetic. We introduce three variants: GEM (the base family), E-GEM (an $ε$-parameterized generalization enabling arbitrary $L^p$-approximation of ReLU), and SE-GEM (a piecewise variant eliminating dead neurons with $C^{2N}$ junction smoothness). An $N$-ablation study establishes $N=1$ as optimal for standard-depth networks, reducing the GELU deficit on CIFAR-100 + ResNet-56 from 6.10% to 2.12%. The smoothness parameter $N$ further reveals a CNN-transformer tradeoff: $N=1$ is preferred for deep CNNs, while $N=2$ is preferred for transformers. On MNIST, E-GEM ties the best baseline (99.23%). On CIFAR-10 + ResNet-56, SE-GEM ($ε=10^{-4}$) surpasses GELU (92.51% vs 92.44%) – the first GEM-family activation to outperform GELU. On CIFAR-100 + ResNet-56, E-GEM reduces the GELU deficit from 6.10% (GEM $N=2$) to just 0.62%. On GPT-2 (124M), GEM achieves the lowest perplexity (72.57 vs 73.76 for GELU), with GEM $N=1$ also beating GELU (73.32). On BERT-small, E-GEM ($ε=10$) achieves the best validation loss (6.656) across all activations. The $ε$-parameterization reveals a scale-dependent optimum: small $ε$ ($10^{-4}$–$10^{-6}$) for deep CNNs and larger transformers, with the special case of small transformers (BERT-small) benefiting from large $ε$ ($ε=10$) due to its limited depth and unconstrained gradients.
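The abstract does not give the closed form, but one reading consistent with "a gate following a log-logistic CDF" with "purely rational arithmetic" is a GELU-style multiplicative gate $x \cdot F(x)$ with $F(x) = x^{2N}/(1+x^{2N})$ for $x > 0$ (the log-logistic CDF with unit scale and shape $2N$), since $x \cdot F(x) \sim x^{2N+1}$ near zero gives $C^{2N}$ junction smoothness. The sketch below implements that guess and should be read as our assumption, not the paper's definition.

```python
import numpy as np

def gem(x, N=1):
    """Hypothetical GEM-style activation (our reading of the abstract, not
    the paper's exact formula): a log-logistic-CDF gate with shape 2N applied
    multiplicatively, like GELU's x * Phi(x).
    gate(x) = x^(2N) / (1 + x^(2N)) for x > 0, else 0 -- purely rational,
    with x * gate(x) ~ x^(2N+1) near 0 for a C^(2N)-smooth junction."""
    x = np.asarray(x, dtype=float)
    gate = np.zeros_like(x)
    pos = x > 0
    xp = x[pos]
    gate[pos] = xp ** (2 * N) / (1.0 + xp ** (2 * N))
    return x * gate
```

Under this reading the function is ReLU-like: zero on the negative half-line, approaching the identity for large positive inputs, with $N$ controlling both smoothness at the origin and how quickly the gate saturates.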
[492] Evaluating Post-hoc Explanations of the Transformer-based Genome Language Model DNABERT-2
Isabel Kurth, Paulo Yanez Sarmiento, Bernhard Y. Renard
Main category: cs.LG
Abstract: Explaining deep neural network predictions on genome sequences enables biological insight and hypothesis generation, often of greater interest than predictive performance alone. While explanations of convolutional neural networks (CNNs) have been shown to capture relevant patterns in genome sequences, it is unclear whether this transfers to more expressive Transformer-based genome language models (gLMs). To answer this question, we adapt AttnLRP, an extension of layer-wise relevance propagation to the attention mechanism, and apply it to the state-of-the-art gLM DNABERT-2. In doing so, we propose strategies to transfer explanations from the token to the nucleotide level. We evaluate the adaptation of AttnLRP on genomic datasets using multiple metrics. Further, we provide an extensive comparison between the explanations of DNABERT-2 and a baseline CNN. Our results demonstrate that AttnLRP yields reliable explanations corresponding to known biological patterns. Hence, like CNNs, gLMs can also help derive biological insights. This work contributes to the explainability of gLMs and addresses the comparability of relevance attributions across different architectures.
[493] Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks
Liane Vogel, Kavitha Srinivas, Niharika D’Souza, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz
Main category: cs.LG
Abstract: Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number of such models, it remains unclear which approach works best in practice, as existing methods are often evaluated under task-specific settings that make direct comparison difficult. To address this, we introduce TEmBed, the Tabular Embedding Test Bed, a comprehensive benchmark for systematically evaluating tabular embeddings across four representation levels: cell, row, column, and table. Evaluating a diverse set of tabular representation learning models, we show that which model to use depends on the task and representation level. Our results offer practical guidance for selecting tabular embeddings in real-world applications and lay the groundwork for developing more general-purpose tabular representation models.
[494] Fairness under uncertainty in sequential decisions
Michelle Seng Ah Lee, Kirtan Padh, David Watson, Niki Kilbertus, Jatinder Singh
Main category: cs.LG
Abstract: Fair machine learning (ML) methods help identify and mitigate the risk that algorithms encode or automate social injustices. Algorithmic approaches alone cannot resolve structural inequalities, but they can support socio-technical decision systems by surfacing discriminatory biases, clarifying trade-offs, and enabling governance. Although fairness is well studied in supervised learning, many real ML applications are online and sequential, with prior decisions informing future ones. Each decision is taken under uncertainty due to unobserved counterfactuals and finite samples, with dire consequences for under-represented groups, systematically under-observed due to historical exclusion and selective feedback. A bank cannot know whether a denied loan would have been repaid, and may have less data on marginalized populations. This paper introduces a taxonomy of uncertainty in sequential decision-making – model, feedback, and prediction uncertainty – providing shared vocabulary for assessing systems where uncertainty is unevenly distributed across groups. We formalize model and feedback uncertainty via counterfactual logic and reinforcement learning, and illustrate harms to decision makers (unrealized gains/losses) and subjects (compounding exclusion, reduced access) of policies that ignore the unobserved space. Algorithmic examples show it is possible to reduce outcome variance for disadvantaged groups while preserving institutional objectives (e.g. expected utility). Experiments on data simulated with varying bias show how unequal uncertainty and selective feedback produce disparities, and how uncertainty-aware exploration alters fairness metrics. The framework equips practitioners to diagnose, audit, and govern fairness risks. Where uncertainty drives unfairness rather than incidental noise, accounting for it is essential to fair and effective decision-making.
[495] Transferable Physics-Informed Representations via Closed-Form Head Adaptation
Jian Cheng Wong, Isaac Yin Chung Lai, Pao-Hsiung Chiu, Chin Chun Ooi, Abhishek Gupta, Yew-Soon Ong
Main category: cs.LG
Abstract: Physics-informed neural networks (PINNs) have garnered significant interest for their potential in solving partial differential equations (PDEs) that govern a wide range of physical phenomena. By incorporating physical laws into the learning process, PINN models have demonstrated the ability to learn physical outcomes reasonably well. However, current PINN approaches struggle to predict or solve new PDEs effectively when there is a lack of training examples, indicating they do not generalize well to unseen problem instances. In this paper, we present a transferable learning approach for PINNs premised on a fast Pseudoinverse PINN framework (Pi-PINN). Pi-PINN learns a transferable physics-informed representation in a shared embedding space and enables rapid solving of both known and unknown PDE instances via closed-form head adaptation using a least-squares-optimal pseudoinverse under PDE constraints. We further investigate the synergies between data-driven multi-task learning loss and physics-informed loss, providing insights into the design of more performant PINNs. We demonstrate the effectiveness of Pi-PINN on various PDE problems, including Poisson’s equation, Helmholtz equation, and Burgers’ equation, achieving fast and accurate physics-informed solutions without requiring any data for unseen instances. Pi-PINN can produce predictions 100-1000 times faster than a typical PINN, while producing predictions with 10-100 times lower relative error than a typical data-driven model even with only two training samples. Overall, our findings highlight the potential of transferable representations with closed-form head adaptation to enhance the efficiency and generalization of PINNs across PDE families and scientific and engineering applications.
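The closed-form head adaptation can be illustrated in isolation: freeze a shared embedding and solve for the output head by regularized least squares, with no gradient steps. The toy below is our sketch (hypothetical names, plain regression in place of the paper's PDE-constrained losses); it shows why adapting to a new task reduces to one linear solve.

```python
import numpy as np

def closed_form_head(features, targets, ridge=1e-8):
    """Fit a linear output head in closed form via the regularized normal
    equations: W = (F^T F + ridge * I)^{-1} F^T Y, i.e. a least-squares-optimal
    pseudoinverse solution. 'features' is the frozen shared embedding."""
    F = np.asarray(features)
    Y = np.asarray(targets)
    A = F.T @ F + ridge * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ Y)

# Toy demonstration: a frozen random-feature "backbone", new task = new target.
rng = np.random.default_rng(0)
W_feat, b_feat = rng.normal(size=(1, 64)), rng.normal(size=64)
embed = lambda x: np.tanh(x @ W_feat + b_feat)   # shared, never retrained

x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x)                                # "unseen instance"
W_head = closed_form_head(embed(x), y)           # single linear solve
pred = embed(x) @ W_head
```

Because only the head changes per task, adaptation costs one small linear-algebra solve rather than an optimization loop, which is the source of the large speedups the abstract reports.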
[496] PrismaDV: Automated Task-Aware Data Unit Test Generation
Hao Chen, Arnab Phani, Sebastian Schelter
Main category: cs.LG
Abstract: Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing automated data unit testing frameworks are largely task-agnostic: they validate datasets without considering the semantics and requirements of the code that consumes the data. We present PrismaDV, a compound AI system that analyzes downstream task code together with dataset profiles to identify data access patterns, infer implicit data assumptions, and generate task-aware executable data unit tests. To further adapt the data unit tests over time to specific datasets and downstream tasks, we propose “Selective Informative Feedback for Task Adaptation” (SIFTA), a prompt-optimization framework that leverages the scarce outcomes from the execution of data unit tests and downstream tasks. We evaluate PrismaDV on two new benchmarks spanning 60 tasks across five datasets, where it consistently outperforms both task-agnostic and task-aware baselines in generating unit tests that reflect the end-to-end impact of data errors. Furthermore, we show that with SIFTA, we can automatically learn prompts for PrismaDV’s modules that outperform prompts written by hand or generated from a generic prompt optimizer. We publicly release our benchmarks and prototype implementation.
[497] An effective variant of the Hartigan $k$-means algorithm
François Clément, Stefan Steinerberger
Main category: cs.LG
Abstract: The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd’s algorithm (1957). It has become clear that Hartigan’s algorithm (1975) gives better results in almost all cases; Telgarsky and Vattani note a typical improvement of $5\%$ to $10\%$. We point out that a very minor variation of Hartigan’s method leads to a further $2\%$ to $5\%$ improvement; the improvement tends to grow as either the dimension or $k$ increases.
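For context, here is a sketch of classic Hartigan k-means, the baseline the abstract improves on (the paper's variant is not specified in the abstract): instead of Lloyd's batch reassignment, it moves one point at a time to whichever cluster most decreases the total within-cluster cost, using size-corrected cost changes.

```python
import numpy as np

def hartigan_kmeans(X, k, n_passes=50, seed=0):
    """Classic Hartigan k-means. Moving point x out of cluster c (size n)
    changes the cost by -n/(n-1) * ||x - mean_c||^2; inserting it into
    cluster j (size m) adds m/(m+1) * ||x - mean_j||^2. Move whenever the
    net change is negative."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    counts = np.bincount(labels, minlength=k).astype(float)
    sums = np.array([X[labels == c].sum(axis=0) for c in range(k)])
    for _ in range(n_passes):
        changed = False
        for i, x in enumerate(X):
            c = labels[i]
            if counts[c] <= 1:
                continue  # never empty a cluster
            means = sums / counts[:, None]
            d2 = ((means - x) ** 2).sum(axis=1)
            remove_gain = counts[c] / (counts[c] - 1.0) * d2[c]
            insert_cost = counts / (counts + 1.0) * d2
            insert_cost[c] = np.inf  # staying put is not a move
            j = int(np.argmin(insert_cost))
            if insert_cost[j] < remove_gain:  # net cost decrease -> move
                sums[c] -= x; counts[c] -= 1
                sums[j] += x; counts[j] += 1
                labels[i] = j
                changed = True
        if not changed:
            break
    return labels, sums / counts[:, None]
```

The size-corrected factors n/(n-1) and m/(m+1) are what distinguish Hartigan from Lloyd: a point can profitably move to a cluster whose center is slightly farther away, which lets Hartigan escape many of Lloyd's local optima.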
[498] Quotient-Space Diffusion Models
Yixian Xu, Yusong Wang, Shengjie Luo, Kaiyuan Gao, Tianyu He, Di He, Chang Liu
Main category: cs.LG
Abstract: Diffusion-based generative models have transformed generative AI, and have enabled new capabilities in the science domain, for example, generating 3D structures of molecules. Due to the intrinsic problem structure of certain tasks, there is often a symmetry in the system, which identifies objects that can be converted by a group action as equivalent, hence the target distribution is essentially defined on the quotient space with respect to the group. In this work, we establish a formal framework for diffusion modeling on a general quotient space, and apply it to molecular structure generation which follows the special Euclidean group $\text{SE}(3)$ symmetry. The framework reduces the necessity of learning the component corresponding to the group action, hence reduces learning difficulty over conventional group-equivariant diffusion models, and the sampler guarantees recovering the target distribution, while heuristic alignment strategies lack proper samplers. The arguments are empirically validated on structure generation for small molecules and proteins, indicating that the principled quotient-space diffusion model provides a new framework that outperforms previous symmetry treatments.
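The simplest instance of the quotient idea can be made concrete: identify structures up to translation (a subgroup of $\text{SE}(3)$) by mapping each point cloud to a canonical centered representative, so a model trained on representatives never has to learn the translation component. This minimal sketch is our illustration; the paper's framework also treats rotations and the general group-theoretic setting.

```python
import numpy as np

def translation_quotient_rep(coords):
    """Canonical representative of a 3D point cloud under the translation
    subgroup of SE(3): subtract the centroid. Any two structures differing
    only by a rigid translation map to the same representative, so the
    translation degrees of freedom are quotiented out before modeling."""
    coords = np.asarray(coords, dtype=float)
    return coords - coords.mean(axis=0, keepdims=True)
```

Training a generative model on such representatives, then re-applying an arbitrary group element at sampling time, is the quotient-space analogue of the usual "center the molecule" preprocessing.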
[499] GFlowState: Visualizing the Training of Generative Flow Networks Beyond the Reward
Florian Holeczek, Andreas Hinterreiter, Alex Hernandez-Garcia, Marc Streit, Christina Humer
Main category: cs.LG
Abstract: We present GFlowState, a visual analytics system designed to illuminate the training process of Generative Flow Networks (GFlowNets or GFNs). GFlowNets are a probabilistic framework for generating samples proportionally to a reward function. While GFlowNets have proved to be powerful tools in applications such as molecule and material discovery, their training dynamics remain difficult to interpret. Standard machine learning tools allow metric tracking but do not reveal how models explore the sample space, construct sample trajectories, or shift sampling probabilities during training. Our solution, GFlowState, allows users to analyze sampling trajectories, compare the sample space relative to reference datasets, and analyze the training dynamics. To this end, we introduce multiple views, including a chart of candidate rankings, a state projection, a node-link diagram of the trajectory network, and a transition heatmap. These visualizations enable GFlowNet developers and users to investigate sampling behavior and policy evolution, and to identify underexplored regions and sources of training failure. Case studies demonstrate how the system supports debugging and assessing the quality of GFlowNets across application domains. By making the structural dynamics of GFlowNets observable, our work enhances their interpretability and can accelerate GFlowNet development in practice.
[500] A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models
Max Defez, Filippo Quarenghi, Mathieu Vrac, Stephan Mandt, Tom Beucler
Main category: cs.LG
Abstract: Deep-learning video super-resolution has progressed rapidly, but climate applications typically super-resolve (increase resolution) either space or time, and joint spatiotemporal models are often designed for a single pair of super-resolution (SR) factors (upscaling spatial and temporal ratio between the low-resolution sequence and the high-resolution sequence), limiting transfer across spatial resolutions and temporal cadences (frame rates). We present a scale-adaptive framework that reuses the same architecture across factors by decomposing spatiotemporal SR into a deterministic prediction of the conditional mean, with attention, and a residual conditional diffusion model, with an optional mass-conservation (same precipitation amount in inputs and outputs) transform to preserve aggregated totals. Assuming that larger SR factors primarily increase underdetermination (hence required context and residual uncertainty) rather than changing the conditional-mean structure, scale adaptivity is achieved by retuning three factor-dependent hyperparameters before retraining: the diffusion noise schedule amplitude beta (larger for larger factors to increase diversity), the temporal context length L (set to maintain comparable attention horizons across cadences) and optionally a third, the mass-conservation function f (tapered to limit the amplification of extremes for large factors). Demonstrated on reanalysis precipitation over France (Comephore), the same architecture spans super-resolution factors from 1 to 25 in space and 1 to 6 in time, yielding a reusable architecture and tuning recipe for joint spatiotemporal super-resolution across scales.
[501] Low-Rank Adaptation Redux for Large Models
Bingcong Li, Yilang Zhang, Georgios B. Giannakis
Main category: cs.LG
Abstract: Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive which architectural choices, optimization techniques, and deployment constraints should guide practical method selection. This overview revisits LoRA through the lens of signal processing (SP), bridging modern adapter designs with classical low-rank modeling tools and inverse problems, as well as highlighting how SP principles can inform principled advances of fine-tuning approaches. Rather than providing a comprehensive enumeration and empirical comparisons of LoRA variants, emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness. These advances are categorized into three complementary axes: architectural design, efficient optimization, and pertinent applications. The first axis builds on singular value decomposition (SVD)-based factorization, rank-augmentation constructions, and cross-layer tensorization, while the second axis deals with initialization, alternating solvers, gauge-invariant optimization, and parameterization-aware methods. Beyond fine-tuning, emerging applications of LoRA are accounted across the entire lifecycle of large models, ranging from pre- and post-training to serving/deployment. Finally, open research directions are outlined at the confluence of SP and deep learning to catalyze a bidirectional frontier: classical SP tools provide a principled vocabulary for designing principled PEFT methods, while the unique challenges facing modern deep learning, especially the overwhelming scale and prohibitive overhead, also offer new research lines benefiting the SP community in return.
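The basic LoRA mechanism this overview builds on fits in a few lines: a frozen weight $W$ is adapted by a trainable low-rank update $BA$, and the forward pass is computed without ever materializing $W + BA$. A minimal NumPy sketch of the forward computation:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA forward pass: y = x @ (W + alpha * B @ A)^T, computed as two
    skinny matrix products so the dense sum is never formed.
    Shapes: W (d_out, d_in) frozen; A (r, d_in), B (d_out, r) trainable,
    with rank r << min(d_in, d_out), reducing trainable parameters from
    d_in * d_out to r * (d_in + d_out)."""
    return x @ W.T + alpha * (x @ A.T) @ B.T
```

From the signal-processing view taken in the overview, $BA$ is exactly a rank-$r$ factorization of the weight perturbation, which is why SVD-based analyses transfer so directly to LoRA variants.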
[502] The Sample Complexity of Multicalibration
Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth
Main category: cs.LG
Abstract: We study the minimax sample complexity of multicalibration in the batch setting. A learner observes $n$ i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most $\varepsilon$ with respect to a given family of groups. For every fixed $\kappa > 0$, in the regime $|G| \le \varepsilon^{-\kappa}$, we prove that $\widetilde{\Theta}(\varepsilon^{-3})$ samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is realized by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as $\widetilde{\Theta}(\varepsilon^{-2})$, and shows that mean-ECE multicalibration is as difficult in the batch setting as it is in the online setting, in contrast to marginal calibration which is strictly more difficult in the online setting. In contrast, we observe that for $\kappa = 0$, the sample complexity of multicalibration remains $\widetilde{\Theta}(\varepsilon^{-2})$, exhibiting a sharp threshold phenomenon. More generally, we establish matching upper and lower bounds, up to polylogarithmic factors, for a weighted $L_p$ multicalibration metric for all $1 \le p \le 2$, with optimal exponent $3/p$. We also extend the lower-bound template to a regular class of elicitable properties, and combine it with the online upper bounds of Hu et al. (2025) to obtain matching bounds for calibrating properties including expectiles and bounded-density quantiles.
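To make the quantity being bounded concrete, here is a binned group-conditional ECE sketch. The equal-width binning and worst-group aggregation are our simplifying choices for illustration; the paper works with population-level metrics and a general group family.

```python
import numpy as np

def binned_ece(preds, labels, n_bins=10):
    """Expected Calibration Error with equal-width bins over [0, 1]:
    sum over bins of (bin weight) * |mean prediction - mean label|."""
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(preds[mask].mean() - labels[mask].mean())
    return ece

def multicalibration_error(preds, labels, groups):
    """One simple empirical reading of multicalibration error: the worst
    group-conditional ECE over a family of (boolean-mask) groups."""
    return max(binned_ece(preds[g], labels[g]) for g in groups if g.any())
```

The sample-complexity question the paper answers is how many i.i.d. samples are needed before small empirical error of this kind certifies small population error simultaneously for every group in the family.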
[503] Fine-Tuning Regimes Define Distinct Continual Learning Problems
Paul-Tiberiu Iordache, Elena Burceanu
Main category: cs.LG
Abstract: Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable parameter subspace, is itself a key evaluation variable. We formalize adaptation regimes as projected optimization over fixed trainable subspaces, showing that changing the trainable depth alters the effective update signal through which both current task fitting and knowledge preservation operate. This analysis motivates the hypothesis that method comparisons need not be invariant across regimes. We test this hypothesis in task incremental CL, five trainable depth regimes, and four standard methods: online EWC, LwF, SI, and GEM. Across five benchmark datasets, namely MNIST, Fashion MNIST, KMNIST, QMNIST, and CIFAR-100, and across 11 task orders per dataset, we find that the relative ranking of methods is not consistently preserved across regimes. We further show that deeper adaptation regimes are associated with larger update magnitudes, higher forgetting, and a stronger relationship between the two. These results show that comparative conclusions in CL can depend strongly on the chosen fine-tuning regime, motivating regime-aware evaluation protocols that treat trainable depth as an explicit experimental factor.
[504] Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability
Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu
Main category: cs.LG
Abstract: Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce different CL regimes and therefore different benchmark conclusions. To study this effect, we introduce a taskification-level framework based on plasticity and stability profiles, a profile distance between taskifications, and Boundary-Profile Sensitivity (BPS), which diagnoses how strongly small boundary perturbations alter the induced regime before any CL model is trained. We evaluate continual finetuning, Experience Replay, Elastic Weight Consolidation, and Learning without Forgetting on network traffic forecasting with CESNET-Timeseries24, keeping the stream, model, and training budget fixed while varying only the temporal taskification. Across 9-, 30-, and 44-day splits, we observe substantial changes in forecasting error, forgetting, and backward transfer, showing that taskification alone can materially affect CL evaluation. We further find that shorter taskifications induce noisier distribution-level patterns, larger structural distances, and higher BPS, indicating greater sensitivity to boundary perturbations. These results show that benchmark conclusions in streaming CL depend not only on the learner and the data stream, but also on how that stream is taskified, motivating temporal taskification as a first-class evaluation variable.
[505] Mind the Gap: Optimal and Equitable Encouragement Policies
Angela Zhou
Main category: cs.LG
Abstract: Unavailable (arXiv:2309.07176 could not be fetched; HTTP 429 rate limit).
[506] Adaptive Soft Error Protection for Neural Network Processing
Xinghua Xue, Cheng Liu, Feng Min, Yinhe Han
Main category: cs.LG
Abstract: Unavailable (arXiv:2407.19664 could not be fetched; HTTP 429 rate limit).
[507] Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation
Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
Main category: cs.LG
Abstract: Unavailable (arXiv:2410.16698 could not be fetched; HTTP 429 rate limit).
[508] EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
Mujin Cheon, Jay H. Lee, Dong-Yeun Koh, Calvin Tsay
Main category: cs.LG
Abstract: Unavailable (arXiv:2411.00171 could not be fetched; HTTP 429 rate limit).
[509] Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces
Yuwen Li, Guozhi Zhang
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2501.11275 returned HTTP 429)
[510] Anomaly Detection in Smart Power Grids with Graph-Regularized MS-SVDD: a Multimodal Subspace Learning Approach
Thomas Debelle, Fahad Sohrab, Pekka Abrahamsson, Moncef Gabbouj
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2502.15793 returned HTTP 429)
[511] Post-Training Augmentation Invariance
Keenan Eikenberry, Lizuo Liu, Yoonsang Lee
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2505.11702 returned HTTP 429)
[512] Product Quantization for Surface Soil Similarity
Haley Dozier, Althea Henslee, Ashley Abraham, Andrew Strelzoff, Mark Chappell
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2506.03374 returned HTTP 429)
[513] Distributed Associative Memory via Online Convex Optimization
Bowen Wang, Matteo Zecchin, Osvaldo Simeone
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2509.22321 returned HTTP 429)
[514] ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
Luke Thompson, Davy Guan, Dai Shi, Slade Matthews, Junbin Gao, Andi Han
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2510.05482 returned HTTP 429)
[515] Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
Hongyi Liu, Jiaji Huang, Zhen Jia, Youngsuk Park, Yu-Xiang Wang
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2510.20064 returned HTTP 429)
[516] Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse
Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Liang Huang, Can Tang, Xiaojiang Zhang, Junyi Peng, Li Wan, Haotian Zhang, Bin Chen
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2511.00413 returned HTTP 429)
[517] Artificial intelligence for methane detection: from continuous monitoring to verified mitigation
Gonzalo Mateo-Garcia, Anna Allen, Itziar Irakulis-Loitxate, Manuel Montesino-San Martin, Marc Watine, Cynthia Randles, Tharwat Mokalled, Alma Raunak, Carol Castañeda-Martinez, Juan E. Jonhson, Javier Gorroño, James Requeima, Claudio Cifarelli, Luis Guanter, Richard E. Turner, Manfredi Caltagirone
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2511.21777 returned HTTP 429)
[518] GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels
Bhavya Sai Nukapotula, Rishabh Tripathi, Seth Pregler, Dileep Kalathil, Srinivas Shakkottai, Theodore S. Rappaport
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2511.22793 returned HTTP 429)
[519] BackPlay: Head-Only Look-Back Self-Correction for Diffusion Language Models
Liming Liu, Binxuan Huang, Zixuan Zhang, Xin Liu, Bing Yin, Tuo Zhao
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2601.06428 returned HTTP 429)
[520] PROPER Agents: Proactivity Driven Personalized Agents for Advancing Knowledge Gap Navigation
Kirandeep Kaur, Vinayak Gupta, Aditya Gupta, Chirag Shah
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2601.09926 returned HTTP 429)
[521] Beyond Accuracy: A Stability-Aware Metric for Multi-Horizon Forecasting
Chutian Ma, Grigorii Pomazkin, Giacinto Paolo Saggese, Paul Smith
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2601.10863 returned HTTP 429)
[522] Partially Lazy Gradient Descent for Smoothed Online Learning
Naram Mhaisen, George Iosifidis
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2601.15984 returned HTTP 429)
[523] A Dynamic Framework for Grid Adaptation in Kolmogorov-Arnold Networks
Spyros Rigas, Thanasis Papaioannou, Panagiotis Trakadas, Georgios Alexandridis
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2601.18672 returned HTTP 429)
[524] BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals
Run Wang, Victor J. B. Jung, Philip Wiese, Sebastian Frey, Giusy Spacone, Francesco Conti, Alessio Burrello, Luca Benini
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2604.13359 returned HTTP 429)
[525] Scalable Physics-Informed Neural Differential Equations and Data-Driven Algorithms for HVAC Systems
Hanfeng Zhai, Hongtao Qiao, Hassan Mansour, Christopher Laughman
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2604.18438 returned HTTP 429)
[526] Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation
David Holzmüller, Francis Bach
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2303.03237 returned HTTP 429)
[527] Cosmological Analysis with Calibrated Neural Quantile Estimation and Approximate Simulators
He Jia
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2411.14748 returned HTTP 429)
[528] Weighted quantization using MMD: From mean field to mean shift via gradient flows
Ayoub Belhadji, Daniel Sharp, Youssef Marzouk
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2502.10600 returned HTTP 429)
[529] Accurate predictive model of band gap with selected important features based on explainable machine learning
Joohwi Lee, Kaito Miyamoto
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2503.04492 returned HTTP 429)
[530] ICNN-enhanced 2SP: Leveraging input convex neural networks for solving two-stage stochastic programming
Yu Liu, Fabricio Oliveira, Jan Kronqvist
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2505.05261 returned HTTP 429)
[531] GARG-AML against Smurfing: A Scalable and Interpretable Graph-Based Framework for Anti-Money Laundering
Bruno Deprez, Bart Baesens, Tim Verdonck, Wouter Verbeke
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2506.04292 returned HTTP 429)
[532] Nonlinear Causal Discovery through a Sequential Edge Orientation Approach
Stella Huang, Qing Zhou
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2506.05590 returned HTTP 429)
[533] Certified Coil Geometry Learning for Short-Range Magnetic Actuation and Spacecraft Docking Application
Yuta Takahashi, Hayate Tajima, Shin-ichiro Sakai
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2507.03806 returned HTTP 429)
[534] Artifacts of Numerical Integration in Learning Dynamical Systems
Bing-Ze Lu, Richard Tsai
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2507.14491 returned HTTP 429)
[535] When Langevin Monte Carlo Meets Randomization: New Sampling Algorithms with Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness
Xiaojie Wang, Bin Yang
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2509.25630 returned HTTP 429)
[536] Learning Linear Regression with Low-Rank Tasks in-Context
Kaito Takanami, Takashi Takahashi, Yoshiyuki Kabashima
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2510.04548 returned HTTP 429)
[537] Spira: Exploiting Voxel Data Structural Properties for Efficient Sparse Convolution in Point Cloud Networks
Dionysios Adamopoulos, Anastasia Poulopoulou, Georgios Goumas, Christina Giannoula
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2511.20834 returned HTTP 429)
[538] Concurrence: A dependence criterion for time series, applied to biological data
Evangelos Sariyanidi, John D. Herrington, Lisa Yankowitz, Pratik Chaudhari, Theodore D. Satterthwaite, Casey J. Zampella, Jeffrey S. Morris, Edward Gunning, Robert T. Schultz, Russell T. Shinohara, Birkan Tunc
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2512.16001 returned HTTP 429)
[539] H-EFT-VA: An Effective-Field-Theory Variational Ansatz with Provable Barren Plateau Avoidance
Eyad I.B Hamid
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2601.10479 returned HTTP 429)
[540] Active Learning for Planet Habitability Classification under Extreme Class Imbalance
R. I. El-Kholy, Z. M. Hayman
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2602.23666 returned HTTP 429)
[541] Spatio-temporal probabilistic forecast using MMAF-guided learning
Leonardo Bardi, Imma Valentina Curato, Lorenzo Proietti
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2603.15055 returned HTTP 429)
[542] PDGMM-VAE: A Variational Autoencoder with Adaptive Per-Dimension Gaussian Mixture Model Priors for Nonlinear ICA
Yuan-Hao Wei, Yan-Jie Sun
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2603.23547 returned HTTP 429)
[543] Toward a Multi-Layer ML-Based Security Framework for Industrial IoT
Aymen Bouferroum, Valeria Loscri, Abderrahim Benslimane
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2603.24111 returned HTTP 429)
[544] Conformal Prediction Assessment: A Framework for Conditional Coverage Evaluation and Selection
Zheng Zhou, Xiangfei Zhang, Chongguang Tao, Yuhong Yang
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2603.27189 returned HTTP 429)
[545] Transfer Learning for Loan Recovery Prediction under Distribution Shifts with Heterogeneous Feature Spaces
Christopher Gerling, Hanqiu Peng, Ying Chen, Stefan Lessmann
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2604.02832 returned HTTP 429)
[546] Phase Transitions in the Fluctuations of Functionals of Random Neural Networks
Simmaco Di Lillo, Leonardo Maini, Domenico Marinucci
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2604.19738 returned HTTP 429)
[547] Forecasting Individual NetFlows using a Predictive Masked Graph Autoencoder
Georgios Anyfantis, Pere Barlet-Ros
Main category: cs.LG
TL;DR: unavailable (processing failed)
Abstract: unavailable (fetch for arXiv:2604.20483 returned HTTP 429)
cs.MA
[548] AGNT2: Autonomous Agent Economies on Interaction-Optimized Layer 2 Infrastructure
Anbang Ruan, Xing Zhang
Main category: cs.MA
TL;DR: unavailable (processing failed)
Abstract: Current blockchain Layer 2 solutions, including Optimism, Arbitrum, zkSync, and their derivatives, optimize for human-initiated financial transactions. Autonomous AI agents instead generate high-frequency, semantically rich service invocations among mutually untrusting principals. Existing chains treat those interactions as generic calldata, forcing identity, escrow, dependency ordering, and session state to be encoded above the execution layer at the wrong cost point. We present AGNT2, a three-tier stack purpose-built for agent and microservice coordination on-chain. AGNT2 combines: (1) a sidecar deployment pattern that turns any Docker container into an on-chain agent without application-code modification; (2) Layer Top P2P state channels for established bilateral pairs (<100 ms, rough design target 1K-5K TPS per pair, 10M+ aggregate TPS design envelope under endpoint-resource limits), Layer Core as a dependency-aware sequenced rollup for first-contact and multi-party interactions (500 ms-2 s, 300K-500K TPS design target), and Layer Root settlement with computational fraud proofs anchored to any EVM L1; and (3) an agent-native execution environment plus interaction trie that make service invocation, identity, reputation, capabilities, and session context first-class protocol objects. This paper focuses on the execution-layer systems problem: sequencing, state, settlement, and the data-availability (DA) bandwidth gap that bounds all three. Simulation and analytical modeling support the architecture, and prototype measurements validate selected components, but no end-to-end Layer Core implementation exists yet. Practical deployment is currently constrained to roughly 10K-100K TPS by DA throughput, leaving a ~100x gap at the target ceiling. AGNT2 argues that the agent economy requires a dedicated execution layer rather than a general-purpose chain repurposed for agents.
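The data-availability bound the AGNT2 abstract describes can be sanity-checked with a back-of-envelope calculation: if every interaction must be posted to the DA layer, throughput is capped at DA bandwidth divided by interaction size. The byte sizes and bandwidth below are illustrative assumptions, not figures from the paper.

```python
def da_bounded_tps(da_bandwidth_bytes_per_s: float, bytes_per_interaction: float) -> float:
    """Upper bound on interactions/s when every interaction is posted to DA."""
    return da_bandwidth_bytes_per_s / bytes_per_interaction

# Illustrative assumptions (not from the paper):
# ~1 MB/s of usable DA bandwidth, ~100 bytes per compressed interaction.
tps = da_bounded_tps(1_000_000, 100)  # 10,000 TPS ceiling
gap = 1_000_000 / tps                 # gap to a hypothetical 1M TPS target: 100x
print(tps, gap)
```

With these assumed numbers the DA-bounded ceiling lands at the low end of the 10K-100K TPS range the abstract quotes, and the ~100x gap to the target emerges directly from the ratio.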
[549] Role of diversity in team performance: the case of missing expertise, an agent based simulation
Tamás Kiss
Main category: cs.MA
TL;DR: unavailable (processing failed)
Abstract: Theory and empirical research on management teams’ influence on firm performance have developed continuously and by now incorporate numerous details. Classic, experiment-based studies examining social systems collect vast amounts of data, but often investigate only the first one or two modes of the distribution of measured variables and have difficulty analyzing the effect of context. For example, in functional diversity research, management teams are described by measures incorporating complex distributions of the capabilities of individual managers and teams of managers. To investigate the effect of hidden distributions, and the effect of functional diversity composition on team communication and performance, we developed an agent-based model and conducted a series of simulation experiments. Modeling results show that, depending on the context, such as the communication scheme among interacting agents or their functional composition, intrapersonal functional diversity (IFD) and dominant function diversity (DFD) might enhance or reduce performance and communication among agents. Furthermore, the simulations suggest that a third measure, capturing the aggregate expertise of the team, is required alongside IFD and DFD to comprehensively account for empirical findings.
[550] Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints
Emilie Frost, Astrid Nieße
Main category: cs.MA
TL;DR: unavailable (processing failed)
Abstract: Applying the concept of controlled self-organization in agent-based Cyber-Physical Energy Systems (CPES) is a promising approach to ensure system robustness. By introducing an observer/controller architecture to the system, this concept allows for self-organization while still enabling intervention when disturbances occur. Thus, it is possible to respond to effects of cyber attacks, a major threat to current energy systems. However, when implementing an observer to monitor the system and a controller to execute actions for controlled self-organization in CPES, it is essential to take into account restrictions on information and actions resulting from the privacy of local distributed energy resources, regulatory constraints, and data exchange requirements. For this reason, this paper presents architecture variants for the observer and controller that take into account restrictions on access to information and limited actions. In addition, it evaluates possible controller actions in various architectures. The results underscore the importance of considering observer/controller architectures when designing agent-based systems to ensure their robustness for real-world applications.
[551] Agentic AI-Enabled Framework for Thermal Comfort and Building Energy Assessment in Tropical Urban Neighborhoods
Po-Yen Lai, Xinyu Yang, Derrick Low, Huizhe Liu, Jian Cheng Wong
Main category: cs.MA
TL;DR: unavailable (processing failed)
Abstract: In response to the urban heat island effects and building energy demands in Singapore, this study proposes an agentic AI-enabled reasoning framework that integrates large language models (LLMs) with lightweight physics-based models. Through prompt customization, the LLMs interpret urban design tasks, extract relevant policies, and activate appropriate physics-based models for evaluation, forming a closed-loop reasoning-action process. These lightweight physics-based models leverage core thermal and airflow principles, streamlining conventional models to reduce computational time while predicting microclimate variables, such as building surface temperature, ground radiant heat, and airflow conditions, thereby enabling the estimation of thermal comfort indices, e.g., physiological equivalent temperature (PET), and building energy usage. This framework allows users to explore a variety of climate-resilient building surface strategies, e.g., green façades and cool paint applications, that improve thermal comfort while reducing wall heat gain and energy demand. By combining the autonomous reasoning capacity of LLMs with the rapid quantitative evaluation of lightweight physics-based models, the proposed system demonstrates potential for cross-disciplinary applications in sustainable urban design, indoor-outdoor environmental integration, and climate adaptation planning. The source code and data used in this study are available at: https://github.com/PgUpDn/urban-cooling-agent.
[552] Beyond the Individual: Virtualizing Multi-Disciplinary Reasoning for Clinical Intake via Collaborative Agents
Huangwei Chen, Wu Li, Junhao Jia, Yining Chen, Xiaotao Pang, YaLong Chen, Gonghui Li, Haishuai Wang, Jiajun Bu, Lei Wu
Main category: cs.MA
TL;DR: unavailable (processing failed)
Abstract: The initial outpatient consultation is critical for clinical decision-making, yet it is often conducted by a single physician under time pressure, making it prone to cognitive biases and incomplete evidence capture. Although Multi-Disciplinary Teams (MDTs) reduce these risks, they are costly and difficult to scale to real-time intake. We propose Aegle, a synchronous virtual MDT framework that brings MDT-level reasoning to outpatient consultations via a graph-based multi-agent architecture. Aegle formalizes the consultation state using a structured SOAP representation, separating evidence collection from diagnostic reasoning to improve traceability and bias control. An orchestrator dynamically activates specialist agents, which perform decoupled parallel reasoning and are subsequently integrated by an aggregator into a coherent clinical note. Experiments on ClinicalBench and a real-world RAPID-IPN dataset across 24 departments and 53 metrics show that Aegle consistently outperforms state-of-the-art proprietary and open-source models in documentation quality and consultation capability, while also improving final diagnosis accuracy. Our code is available at https://github.com/HovChen/Aegle.
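The separation of evidence from reasoning that Aegle builds on follows the standard clinical SOAP (Subjective, Objective, Assessment, Plan) convention, which can be sketched as a plain data structure; the field names here are generic SOAP conventions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SOAPNote:
    # Evidence collection (kept separate from reasoning for traceability)
    subjective: list[str] = field(default_factory=list)  # patient-reported history
    objective: list[str] = field(default_factory=list)   # exam findings, vitals, labs
    # Diagnostic reasoning (e.g., filled by specialist agents, then aggregated)
    assessment: list[str] = field(default_factory=list)  # differential diagnoses
    plan: list[str] = field(default_factory=list)        # tests, referrals, treatment

note = SOAPNote()
note.subjective.append("3 days of productive cough")
note.objective.append("Temp 38.4 C, crackles at right lung base")
note.assessment.append("Suspected community-acquired pneumonia")
note.plan.append("Chest X-ray; empiric antibiotics per local guideline")
```

Keeping the four sections as distinct fields is what makes the evidence/reasoning split auditable: an aggregator can verify that every assessment entry traces back to recorded subjective or objective evidence.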
cs.MM
[553] AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
Adam Cole, Mick Grierson
Main category: cs.MM
TL;DR: unavailable (processing failed)
Abstract: We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists’ ability to build intuition for the model’s material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics over linear edits. AttentionBender contributes a tool that functions both as an Explainable AI style probe of transformer attention mechanisms, and as a creative technique for producing novel aesthetics beyond the model’s learned representational space.
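The kind of 2D transform AttentionBender applies can be sketched as follows, assuming a cross-attention map reshaped to an (H, W) spatial grid for one text token; the wrap-around translation, 90-degree rotation, and renormalization here are an illustrative stand-in, not the tool's actual implementation.

```python
import numpy as np

def bend_attention(attn_map: np.ndarray, shift: tuple[int, int] = (0, 0),
                   rot90: int = 0) -> np.ndarray:
    """Translate (with wrap-around) and rotate a 2D cross-attention map,
    then renormalize so it remains a valid attention distribution."""
    out = np.roll(attn_map, shift=shift, axis=(0, 1))  # translation
    out = np.rot90(out, k=rot90)                       # rotation in 90-degree steps
    return out / out.sum()                             # renormalize to sum to 1

rng = np.random.default_rng(0)
attn = rng.random((16, 16))
attn /= attn.sum()                       # mock attention map over a 16x16 grid
bent = bend_attention(attn, shift=(4, -2), rot90=1)
print(bent.shape, round(float(bent.sum()), 6))
```

Injecting such a transformed map back into a diffusion transformer's cross-attention would perturb where each text token attends spatially, which is consistent with the distributed distortions the authors report rather than clean, localized edits.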
[554] Sema: Semantic Transport for Real-Time Multimodal Agents
Jiaying Meng, Bojie Li
Main category: cs.MM
Abstract: Real-time multimodal agents transport raw audio and screenshots using networking stacks designed for human receivers, which optimize for perceptual fidelity and smooth playout. Yet agent models act as event-driven processors with no inherent sense of physical time, consuming task-relevant semantics rather than reconstructing signals in real time. This fundamental difference shifts the transport goal from the technical problem of signal fidelity (Shannon-Weaver Level A) to the semantic problem of meaning preservation (Level B). This mismatch imposes significant overhead. In visual pipelines, screenshot upload accounts for over 60% of end-to-end action latency on constrained uplinks, and in voice pipelines, conventional transport carries massive redundancy, sending 43-64x more data than needed to maintain task accuracy. We present Sema, a semantic transport system that combines discrete audio tokenizers with a hybrid screen representation (lossless accessibility-tree or OCR text, plus compact visual tokens) and bursty token delivery that eliminates jitter buffers. In simulations under emulated WAN conditions, Sema reduces uplink bandwidth by 64x for audio and 130-210x for screenshots while preserving task accuracy within 0.7 percentage points of the raw baseline.
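The scale of such savings is easy to sanity-check with back-of-envelope arithmetic; the sample rate, token rate, and bits per token below are generic assumptions for the sketch, not figures from the paper.

```python
# Back-of-envelope comparison of raw PCM audio transport vs. discrete
# audio tokens. All rates are illustrative assumptions, not numbers
# taken from the Sema paper.

RAW_SAMPLE_RATE_HZ = 16_000      # common speech sample rate
RAW_BITS_PER_SAMPLE = 16         # 16-bit PCM
raw_bps = RAW_SAMPLE_RATE_HZ * RAW_BITS_PER_SAMPLE   # 256,000 bit/s

TOKENS_PER_SECOND = 50           # typical neural-codec token rate (assumed)
BITS_PER_TOKEN = 10              # codebook of 1024 entries (assumed)
token_bps = TOKENS_PER_SECOND * BITS_PER_TOKEN       # 500 bit/s

reduction = raw_bps / token_bps
print(f"raw: {raw_bps} bit/s, tokens: {token_bps} bit/s, "
      f"reduction: {reduction:.0f}x")
```

Under these assumed rates the ratio lands in the same order of magnitude as the 43-64x redundancy the abstract reports for voice pipelines.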
[555] High-Fidelity 3D Gaussian Human Reconstruction via Region-Aware Initialization and Geometric Priors
Yang Liu, Zhiyong Zhang
Main category: cs.MM
Abstract: Real-time, high-fidelity 3D human reconstruction from RGB images is essential for interactive applications such as virtual reality and gaming, yet remains challenging due to the complex non-rigid deformations of dynamic human bodies. Although 3D Gaussian Splatting enables efficient rendering, existing methods struggle to capture fine geometric details and often produce artifacts such as fused fingers and over-smoothed faces. Moreover, conventional spatial-field-based dynamic modeling faces a trade-off between reconstruction fidelity and GPU memory consumption. To address these issues, we propose a novel 3D Gaussian human reconstruction framework that combines region-aware initialization with rich geometric priors. Specifically, we leverage the expressive SMPL-X model to initialize both 3D Gaussians and skinning weights, providing a robust geometric foundation for precise reconstruction. We further introduce a region-aware density initialization strategy and a geometry-aware multi-scale hash encoding module to improve local detail recovery while maintaining computational efficiency. Experiments on PeopleSnapshot and GalaBasketball show that our method achieves superior reconstruction quality and finer detail preservation under complex motions, while maintaining real-time rendering speed.
[556] Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction
Dali Wang, Yunyao Zhang, Junqing Yu, Yi-Ping Phoebe Chen, Chen Xu, Zikai Song
Main category: cs.MM
Abstract: Micro-video popularity prediction (MVPP) aims to forecast the future popularity of videos on online media, which is essential for applications such as content recommendation and traffic allocation. In real-world scenarios, it is critical for MVPP approaches to understand both the temporal dynamics of a given video (temporal) and its historical relevance to other videos (spatial). However, existing approaches suffer from limitations in both dimensions: temporally, they rely on sparse short-range sampling that restricts content perception; spatially, they depend on flat retrieval memory with limited capacity and low efficiency, hindering scalable knowledge utilization. To overcome these limitations, we propose a unified framework that achieves joint spatio-temporal enlargement, enabling precise perception of extremely long video sequences while supporting a scalable memory bank that can infinitely expand to incorporate all relevant historical videos. Technically, we employ a Temporal Enlargement driven by a frame scoring module that extracts highlight cues from video frames through two complementary pathways: sparse sampling and dense perception. Their outputs are adaptively fused to enable robust long-sequence content understanding. For Spatial Enlargement, we construct a Topology-Aware Memory Bank that hierarchically clusters historically relevant content based on topological relationships. Instead of directly expanding memory capacity, we update the encoder features of the corresponding clusters when incorporating new videos, enabling unbounded historical association without unbounded storage growth. Extensive experiments on three widely used MVPP benchmarks demonstrate that our method consistently outperforms 11 strong baselines across mainstream metrics, achieving robust improvements in both prediction accuracy and ranking consistency.
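The "update the cluster instead of growing the memory" idea can be sketched as a running-mean centroid update; the nearest-centroid assignment and the update rule below are assumptions for illustration, not the paper's topology-aware clustering.

```python
import numpy as np

class FixedSizeMemory:
    """Toy memory bank: new features update an existing cluster centroid
    (running mean) instead of appending a new entry, so capacity is fixed."""

    def __init__(self, centroids):
        self.centroids = centroids.astype(float)   # (n_clusters, dim)
        self.counts = np.ones(len(centroids))

    def add(self, feat):
        # Assign to the nearest centroid, then shift it toward the feature.
        k = int(np.argmin(np.linalg.norm(self.centroids - feat, axis=1)))
        self.counts[k] += 1
        self.centroids[k] += (feat - self.centroids[k]) / self.counts[k]
        return k

mem = FixedSizeMemory(np.zeros((4, 8)))
for feat in np.random.default_rng(0).normal(size=(100, 8)):
    mem.add(feat)
print(mem.centroids.shape)   # capacity unchanged after 100 insertions
```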
eess.AS
[557] Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge
Chengyou Wang, Hongfei Yue, Guojian Li, Zhixian Zhao, Shuiyuan Wang, Shuai Wang, Xin Xu, Hui Bu, Lei Xie
Main category: eess.AS
Abstract: Full-duplex interaction, where speakers and listeners converse simultaneously, is a key element of human communication often missing from traditional spoken dialogue systems. These systems, based on rigid turn-taking paradigms, struggle to respond naturally in dynamic conversations. The Full-Duplex Interaction Track of the ICASSP 2026 Human-like Spoken Dialogue Systems Challenge (HumDial Challenge) aims to advance the evaluation of full-duplex systems by offering a framework for handling real-time interruptions, speech overlap, and dynamic turn negotiation. We introduce a comprehensive benchmark for full-duplex spoken dialogue systems, built from the HumDial Challenge. We release a high-quality dual-channel dataset of real human-recorded conversations, capturing interruptions, overlapping speech, and feedback mechanisms. This dataset forms the basis for the HumDial-FDBench benchmark, which assesses a system’s ability to handle interruptions while maintaining conversational flow. Additionally, we create a public leaderboard to compare the performance of open-source and proprietary models, promoting transparent, reproducible evaluation. These resources support the development of more responsive, adaptive, and human-like dialogue systems.
[558] DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline
Nikhil Raghav
Main category: eess.AS
Abstract: Speaker diarization (SD) is the task of answering “who spoke when” in a multi-speaker audio stream. Classically, an SD system clusters segments of speech belonging to an individual speaker’s identity. Recent years have seen substantial progress in SD through end-to-end neural diarization (EEND) approaches. DiariZen, a hybrid SD pipeline built upon a structurally pruned WavLM-Large encoder, a Conformer backend with powerset classification, and VBx clustering, represents the leading open-source state of the art at the time of writing across multiple benchmarks. Despite its strong performance, the DiariZen architecture spans several repositories and frameworks, making it difficult for researchers and practitioners to understand, reproduce, or extend the system as a whole. This tutorial paper provides a self-contained, block-by-block explanation of the complete DiariZen pipeline, decomposing it into seven stages: (1) audio loading and sliding window segmentation, (2) WavLM feature extraction with learned layer weighting, (3) Conformer backend and powerset classification, (4) segmentation aggregation via overlap-add, (5) speaker embedding extraction with overlap exclusion, (6) VBx clustering with PLDA scoring, and (7) reconstruction and RTTM output. For each block, we provide the conceptual motivation, source code references, intermediate tensor shapes, and annotated visualizations of the actual outputs on a 30s excerpt from the AMI Meeting Corpus. The implementation is available at https://github.com/nikhilraghav29/diarizen-tutorial, which includes standalone executable scripts for each block and a Jupyter notebook that runs the complete pipeline end-to-end.
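Stage (4), segmentation aggregation via overlap-add, can be sketched as follows; the window length, hop, and speaker count are illustrative values, not DiariZen's actual configuration.

```python
import numpy as np

def overlap_add(window_probs, hop, total_frames):
    """Average per-window speaker-activity probabilities on a global
    timeline: sum overlapping windows, divide by per-frame coverage."""
    n_spk = window_probs[0].shape[1]
    acc = np.zeros((total_frames, n_spk))
    count = np.zeros((total_frames, 1))
    for i, probs in enumerate(window_probs):
        start = i * hop
        acc[start:start + len(probs)] += probs
        count[start:start + len(probs)] += 1
    return acc / np.maximum(count, 1)   # avoid 0/0 on uncovered frames

rng = np.random.default_rng(0)
windows = [rng.random((100, 3)) for _ in range(5)]   # 5 windows, 3 speakers
agg = overlap_add(windows, hop=50, total_frames=4 * 50 + 100)
print(agg.shape)   # (300, 3)
```

Averaging over overlapping windows smooths boundary effects at window edges, which is the usual motivation for overlap-add in sliding-window diarization.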
[559] PHOTON: Non-Invasive Optical Tracking of Key-Lever Motion in Historical Keyboard Instruments
Noah Jaffe, John Ashley Burgoyne
Main category: eess.AS
Abstract: This paper introduces PHOTON (PHysical Optical Tracking of Notes), a non-invasive optical sensing system for measuring key-lever motion in historical keyboard instruments. PHOTON tracks the vertical displacement of the key lever itself, capturing motion shaped by both performer input and the instrument’s mechanically imposed, time-varying load. Reflective optical sensors mounted beneath the distal end of each lever provide continuous displacement, timing, and articulation data without interfering with the action. Unlike existing optical systems designed for modern pianos, PHOTON accommodates the diverse geometries, limited clearances, and non-standard layouts of harpsichords, clavichords, and early fortepianos. Its modular, low-profile architecture enables high-resolution, low-latency sensing across multiple manuals and variable key counts. Beyond performance capture, PHOTON provides real-time MIDI output and supports empirical study of expressive gesture, human-instrument interaction, and the construction of instrument-specific MIDI corpora using real historical mechanisms. The complete system is released as open-source hardware and software, from schematics and PCB layouts developed in KiCad to firmware written in CircuitPython, lowering the barrier to adoption, replication, and extension.
[560] Dementia classification from spontaneous speech using wrapper-based feature selection
Marko Niemelä, Mikaela von Bonsdorff, Sami Äyrämö, Tommi Kärkkäinen
Main category: eess.AS
Abstract: Dementia encompasses a group of syndromes that impair cognitive functions such as memory, reasoning, and the ability to perform daily activities. As populations globally age, over 10 million new dementia diagnoses are reported annually. Currently, clinical diagnosis of dementia remains challenging due to overlapping symptoms, the need to exclude alternative conditions and the requirement for a comprehensive clinical evaluation and cognitive assessment. This underscores the growing need to develop feasible and accurate methods for detecting cognitive deficiencies. Recent advances in machine learning have highlighted spontaneous speech as a promising noninvasive, cost-effective, and scalable biomarker for dementia detection. In this study, spontaneous speech recordings from the ADReSS and Pitt Corpus datasets are analyzed, consisting of picture description tasks performed by cognitively healthy individuals and people with Alzheimer’s disease. Unlike prior approaches that focus solely on speech-active segments, acoustic features are extracted from entire recordings using the openSMILE toolkit. This representation reduces the number of feature vectors and improves computational efficiency without compromising classification performance. Classification models with classifier-based wrapper feature selection are employed to estimate feature importance and identify diagnostically relevant acoustic characteristics. Among the evaluated models, the Extreme Minimal Learning Machine achieved competitive classification accuracy with substantially lower computational cost, reflecting an inherent property of the model formulation and learning procedure. Overall, the results demonstrate that the proposed framework is computationally efficient, interpretable, and well suited as a supportive tool for speech-based dementia assessment.
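The wrapper idea, scoring candidate feature subsets with the classifier itself, can be sketched as greedy forward selection; the toy nearest-centroid classifier below stands in for the Extreme Minimal Learning Machine and is an assumption for illustration.

```python
import numpy as np

def nearest_centroid_acc(X, y):
    """Training accuracy of a nearest-centroid classifier on columns of X."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) <
            np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

def forward_select(X, y, k):
    """Wrapper selection: greedily add the feature whose inclusion the
    classifier scores highest, until k features are chosen."""
    selected = []
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [nearest_centroid_acc(X[:, selected + [j]], y)
                  for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 10))
X[:, 3] += 3.0 * y                # plant one informative feature
selected = forward_select(X, y, k=2)
print(selected)                   # the informative feature is picked first
```

Because each candidate subset is scored by the downstream classifier rather than a filter statistic, wrapper selection directly estimates the diagnostic relevance of each acoustic feature, at the cost of many classifier fits.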
[561] Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation
Yun-Ning Hung, Richard Vogl, Filip Korzeniowski, Igor Pereira
Main category: eess.AS
Abstract: While diffusion models are best known for their performance in generative tasks, they have also been successfully applied to many other tasks, including audio source separation. However, current generative approaches to music source separation often underperform on standard objective metrics. In this paper, we address this issue by introducing a novel generative vocal separation model based on the Elucidated Diffusion Model (EDM) framework. Our model processes complex short-time Fourier transform spectrograms and employs an improved U-Net architecture based on music-informed design choices. Our approach matches discriminative baselines on objective metrics and achieves perceptual quality comparable to state-of-the-art systems, as assessed by proxy subjective metrics. We hope these results encourage broader exploration of generative methods for music source separation.
[562] Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition
Girish, Mohd Mujtaba Akhtar, Muskaan Singh
Main category: eess.AS
Abstract: In this work, we introduce a paralinguistic supervision paradigm for low-resource multilingual speech emotion recognition (LRM-SER) that leverages non-verbal vocalizations to exploit prosody-centric emotion cues. Unlike conventional SER systems that rely heavily on labeled verbal speech and suffer from poor cross-lingual transfer, our approach reformulates LRM-SER as non-verbal-to-verbal transfer, where supervision from a labeled non-verbal source domain is adapted to unlabeled verbal speech across multiple target languages. To this end, we propose NOVA-ARC, a geometry-aware framework that models affective structure in the Poincaré ball, discretizes paralinguistic patterns via a hyperbolic vector-quantized prosody codebook, and captures emotion intensity through a hyperbolic emotion lens. For unsupervised adaptation, NOVA-ARC performs optimal transport based prototype alignment between source emotion prototypes and target utterances, inducing soft supervision for unlabeled speech while being stabilized through consistency regularization. Experiments show that NOVA-ARC delivers the strongest performance under both non-verbal-to-verbal adaptation and the complementary verbal-to-verbal transfer setting, consistently outperforming Euclidean counterparts and strong SSL baselines. To the best of our knowledge, this work is the first to move beyond verbal-speech-centric supervision by introducing a non-verbal-to-verbal transfer paradigm for SER.
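For readers unfamiliar with the Poincaré ball, the standard distance that such hyperbolic models build on is d(u, v) = arccosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))); this is textbook hyperbolic geometry, not code from the paper.

```python
import numpy as np

def poincare_dist(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = lambda x: float(np.dot(x, x))
    return float(np.arccosh(1.0 + 2.0 * sq(u - v) /
                            ((1.0 - sq(u)) * (1.0 - sq(v)))))

u = np.array([0.0, 0.0])
v = np.array([0.5, 0.0])
# From the origin the distance reduces to 2 * artanh(||v||) = ln(3).
print(poincare_dist(u, v))
```

Distances blow up near the boundary of the ball, which is why hyperbolic embeddings are a natural fit for hierarchical or tree-like structure such as coarse-to-fine emotion categories.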
eess.IV
[563] EDU-Net: Retinal Pathological Fluid Segmentation in OCT Images with Multiscale Feature Fusion and Boundary Optimization
Zijun Lei, Zikang Xu, Liang Zhang, Ge Song, Hanyu Guo, Dan Cao, Yujia Zhou, Qianjin Feng
Main category: eess.IV
Abstract: Objective: Diabetic macular edema (DME) is the leading cause of severe visual impairment in patients with diabetes. Quantification of retinal fluid, particularly intraretinal fluid (IRF) and subretinal fluid (SRF), plays a critical role in the management of DME. Although optical coherence tomography (OCT) can be used for detection, the variable morphology of fluid accumulation and the blurred boundaries caused by noise interference still limit the accuracy of automatic segmentation in OCT. Methods: Retrospective model development and validation study. This study proposes a novel edge-guided dual-branch encoder-decoder network (EDU-Net) to achieve accurate and efficient automatic segmentation of OCT fluid lesions. The local feature extraction branch is based on the EfficientNet model, which precisely captures tiny lesions by leveraging its lightweight separable convolution and high-resolution feature preservation strategy. The global feature extraction branch is based on the large-kernel efficient convolution (LKEC) module and the downsampling layer design to enhance long-range dependencies and global semantics. EDU-Net applies a multi-category edge-guided attention module to fuse high-frequency boundary detail information into each resolution feature to optimize the boundary segmentation performance. Results: Extensive results on the in-house and public datasets demonstrate that EDU-Net achieves state-of-the-art DSC segmentation performance in terms of efficiency and robustness, especially in the segmentation of IRF lesions. Conclusions: EDU-Net integrates local details with global context and optimizes boundaries, achieving an improvement in the accuracy of automatic segmentation of retinal fluid.
[564] DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
Shiyan Su, Ruyi Zha, Danli Shi, Hongdong Li, Xuelian Cheng
Main category: eess.IV
Abstract: Neural representations (NRs), such as neural fields and 3D Gaussians, effectively model volumetric data in computed tomography (CT) but suffer from severe artifacts under sparse-view settings. To address this, we propose DiffNR, a novel framework that enhances NR optimization with diffusion priors. At its core is SliceFixer, a single-step diffusion model designed to correct artifacts in degraded slices. We integrate specialized conditioning layers into the network and develop tailored data curation strategies to support model finetuning. During reconstruction, SliceFixer periodically generates pseudo-reference volumes, providing auxiliary 3D perceptual supervision to fix underconstrained regions. Compared to prior methods that embed CT solvers into time-consuming iterative denoising, our repair-and-augment strategy avoids frequent diffusion model queries, leading to better runtime performance. Extensive experiments show that DiffNR improves PSNR by 3.99 dB on average, generalizes well across domains, and maintains efficient optimization.
[565] Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction for Sparse-View CT
Haodong Li, Shuo Han, Haiyang Mao, Yu Shi, Changsheng Fang, Jianjia Zhang, Weiwen Wu, Hengyong Yu
Main category: eess.IV
Abstract: Sparse-View CT (SVCT) reconstruction enhances temporal resolution and reduces radiation dose, yet its clinical use is hindered by artifacts due to view reduction and domain shifts from scanner, protocol, or anatomical variations, leading to performance degradation in out-of-distribution (OOD) scenarios. In this work, we propose a Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction (CDPIR) framework to tackle the OOD problem in SVCT. CDPIR integrates cross-distribution diffusion priors, derived from a Scalable Interpolant Transformer (SiT), with model-based iterative reconstruction methods. Specifically, we train a SiT backbone, an extension of the Diffusion Transformer (DiT) architecture, to establish a unified stochastic interpolant framework, leveraging Classifier-Free Guidance (CFG) across multiple datasets. By randomly dropping the conditioning with a null embedding during training, the model learns both domain-specific and domain-invariant priors, enhancing generalizability. During sampling, the globally sensitive transformer-based diffusion model exploits the cross-distribution prior within the unified stochastic interpolant framework, enabling flexible and stable control over multi-distribution-to-noise interpolation paths and decoupled sampling strategies, thereby improving adaptation to OOD reconstruction. By alternating between data fidelity and sampling updates, our model achieves state-of-the-art performance with superior detail preservation in SVCT reconstructions. Extensive experiments demonstrate that CDPIR significantly outperforms existing approaches, particularly under OOD conditions, highlighting its robustness and potential clinical value in challenging imaging scenarios.
[566] Towards robust quantitative photoacoustic tomography via learned iterative methods
Anssi Manninen, Janek Gröhl, Felix Lucka, Andreas Hauptmann
Main category: eess.IV
Abstract: Photoacoustic tomography (PAT) is a medical imaging modality that can provide high-resolution tissue images based on the optical absorption. Classical reconstruction methods for quantifying the absorption coefficients rely on sufficient prior information to overcome noisy and imperfect measurements. As these methods utilize computationally expensive forward models, the computation becomes slow, limiting their potential for time-critical applications. As an alternative approach, deep learning-based reconstruction methods have been established for faster and more accurate reconstructions. However, most of these methods rely on having a large amount of training data, which is not the case in practice. In this work, we adopt the model-based learned iterative approach for use in Quantitative PAT (QPAT), in which additional information from the model is iteratively provided to the updating networks, allowing better generalizability with scarce training data. We compare the performance of different learned updates based on gradient descent, Gauss-Newton, and Quasi-Newton methods. The learning tasks are formulated as greedy, requiring iterate-wise optimality, as well as end-to-end, where all networks are trained jointly. The implemented methods are tested with ideal simulated data as well as against a digital twin dataset that emulates scarce training data and high modeling error.
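The learned gradient-descent update x_{k+1} = x_k − Λ_k(x_k, ∇f(x_k)) can be sketched on a toy linear problem; here each "network" Λ_k collapses to a scalar step size, an assumption for illustration (in the paper it is a trained network and f involves the expensive photoacoustic forward model).

```python
import numpy as np

def data_grad(x, A, b):
    """Gradient of the data-fidelity term 0.5 * ||A x - b||^2."""
    return A.T @ (A @ x - b)

def learned_iterations(x0, A, b, step_sizes):
    """One 'learned' update per iterate: x <- x - s_k * grad f(x).
    In a learned iterative scheme each s_k would be a trained network
    receiving both the iterate and the model gradient."""
    x = x0
    for s in step_sizes:
        x = x - s * data_grad(x, A, b)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))      # toy linear forward model
x_true = rng.normal(size=5)       # toy absorption coefficients
b = A @ x_true                    # noiseless measurements
x = learned_iterations(np.zeros(5), A, b, step_sizes=[0.02] * 300)
print(np.linalg.norm(x - x_true))
```

Feeding the model gradient into every update is what makes the scheme "model-based": the network only has to learn a correction on top of physics the forward operator already supplies, which is why it can generalize from scarce training data.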
[567] Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation
Aneesh Rangnekar, Harini Veeraraghavan
Main category: eess.IV
Abstract: Accurate segmentation of lung tumors from 3D computed tomography (CT) scans is essential for automated treatment planning and response assessment. Despite self-supervised pretraining on numerous datasets, state-of-the-art transformer backbones remain susceptible to out-of-distribution (OOD) inputs, often producing confidently incorrect segmentations with potential for risk in clinical deployment. Hence, we introduce RF-Deep, a lightweight post-hoc random forests-based framework that leverages deep features trained with limited outlier exposure, requiring as few as 40 labeled scans (20 in-distribution and 20 OOD), to improve scan-level OOD detection. RF-Deep repurposes the hierarchical features from the pretrained-then-finetuned segmentation backbones, aggregating features from multiple regions-of-interest anchored to predicted tumor regions to capture OOD likelihood. We evaluated RF-Deep on 2,232 CT volumes spanning near-OOD (pulmonary embolism, COVID-19 negative) and far-OOD (kidney cancer, healthy pancreas) datasets. RF-Deep achieved AUROC above ~93 on the challenging near-OOD datasets, where it outperformed the next best method by 4–7 percentage points, and produced near-perfect detection (AUROC above ~99) on far-OOD datasets. The approach also showed transferability to two blinded validation datasets under the ensemble configuration (COVID-19 positive and breast cancer; AUROC above ~94). RF-Deep maintained consistent performance across backbones of different depths and pretraining strategies, demonstrating applicability of post-hoc detectors as a safety filter for clinical deployment of tumor segmentation pipelines.