Daily arXiv Papers - 2026-05-04

AI-enhanced summaries of research papers from arXiv

Today’s Research Highlights

AI-enhanced summaries of the latest research papers from arXiv.

Table of Contents

cs.CL

[1] Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment

Woody Haosheng Gan, William Held, Diyi Yang

Main category: cs.CL

Abstract: The rapid proliferation of large audio models (LAMs) demands efficient approaches for model comparison, yet comprehensive benchmarks are costly. To fill this gap, we investigate whether minimal subsets can reliably evaluate LAMs while reducing costs and data redundancy. Analyzing 10 subset selection methods with 18 audio models across 40 tasks covering major LAM evaluation dimensions, we show that subsets of just 50 examples (0.3% of data) can achieve over 0.93 Pearson correlation with full benchmark scores. To understand how well these scores align with what practitioners ultimately care about – user satisfaction – we collect 776 human preference ratings from realistic voice assistant conversations, finding that both the subsets and the full benchmark achieve only 0.85 correlation with human preferences. To better predict preferences, we trained regression models on these selected subsets, achieving 0.98 correlation – outperforming regression models trained on both random subsets and the full benchmark. This demonstrates that in regression modeling, well-curated subsets outpredict the full benchmark, showing quality over quantity. We open-source these regression-weighted subsets as the HUMANS benchmark, an efficient proxy for LAM evaluation that captures both benchmark performance and user preferences.
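
The core validation step described above, checking that a small subset's per-model scores track the full-benchmark scores via Pearson correlation, can be sketched in a few lines. The per-model scores below are invented for illustration, not the paper's data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-model scores: full benchmark vs. a small curated subset.
full_scores   = [0.62, 0.71, 0.55, 0.80, 0.67]
subset_scores = [0.60, 0.74, 0.52, 0.83, 0.65]
r = pearson(subset_scores, full_scores)
```

In the paper a regression model is then fit on such subset scores to predict human preference ratings; the same correlation check applies to its predictions.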

[2] NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

Enzo S. N. Silva, Pablo B. Costa, Raphael C. Vlasman, Rosimeire P. Costa, Henrique L. P. Silva, Lucas F. A. O. Pellicer, Guilherme Rinaldo, Renato A. Almeida, Darian S. R. Rabbani, Cinthya O. Oestreich, Vinicius F. Caridá

Main category: cs.CL

Abstract: High-quality corpora are essential for advancing Natural Language Processing (NLP) in Portuguese. Building on previous encoder-only models such as BERTimbau and Albertina PT-BR, we introduce NorBERTo, a modern encoder based on the ModernBERT architecture, featuring long-context support and efficient attention mechanisms. NorBERTo is trained on Aurora-PT, a newly curated Brazilian Portuguese corpus comprising 331 billion GPT-2 tokens collected from diverse web sources and existing multilingual datasets. We systematically benchmark NorBERTo against strong baselines on semantic similarity, textual entailment and classification tasks using standardized datasets such as ASSIN 2 and PLUE. On PLUE, NorBERTo-large achieves the best results among the encoder models we evaluated, notably reaching 0.9191 F1 on MRPC and 0.7689 accuracy on RTE. On ASSIN 2, NorBERTo-large attains the highest entailment F1 (~0.904) among all encoders considered, although Albertina-900M and BERTimbau-large still hold an advantage. To the best of our knowledge, Aurora-PT is currently the largest openly available monolingual Portuguese corpus, surpassing previous resources. NorBERTo provides a modern, mid-sized encoder designed for realistic deployment scenarios: it is straightforward to fine-tune, efficient to serve, and well suited as a backbone for retrieval-augmented generation and other downstream Portuguese NLP systems.

[3] How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses

Ishan Gupta, Pavlo Buryi

Main category: cs.CL

Abstract: We examine if frontier chat-based large language models (LLMs) adjust their outputs based on neurodivergence (ND) context in system prompts and describe the nature of these adjustments. Specifically, we propose NDBench, a 576-output benchmark involving two frontier models, three system prompt types (baseline, ND-profile assertion, and ND-profile assertion with explicit instructions for adjustments), four canonical ND profiles, and 24 prompts across four categories, one of which involves an adversarial masking strategy. Four trends emerge consistently from our findings. First, LLMs show significant adaptation under ND context, where fully instructed conditions yield lengthier and more structured outputs, characterized by higher token counts, more headings, and more granular steps (p < 10^-8, Holm-corrected). Second, such adaptation is largely structural in nature: although list density does not change much, there is a marked rise in the frequency of headings and per-step detail. Third, ND persona assertion alone fails to suppress potentially harmful tendencies, as masking-reinforcement decreases only in explicitly instructed cases (36-44% reduction); the reduction rate barely changes in persona assertion conditions. Moreover, reliability analysis of LLM-based harm assessment reveals that only two out of the six dimensions (masking and reinforcement, validation quality) exceed the pre-defined inter-judge agreement criterion (alpha >= 0.67) and thus can be considered primary results. NDBench is made publicly available along with its prompts, outputs, code, and other resources, forming a reproducible framework for auditing future LLMs’ adaptation to ND awareness.

[4] (title unavailable)

Nhung Thi-Hong Duong, Mai Ngoc Ho, Tin Van Huynh, Kiet Van Nguyen

Main category: cs.CL

Abstract: In this article, we introduce ViLegalNLI, the first large-scale Vietnamese Natural Language Inference (NLI) dataset specifically constructed for the legal domain. The dataset consists of 42,012 premise-hypothesis pairs derived from official statutory documents and annotated with binary inference labels (Entailment and Non-entailment). It covers multiple legal domains and reflects realistic legal reasoning scenarios characterized by structured logic, conditional clauses, and domain-specific terminology. To construct ViLegalNLI, we propose a semi-automatic data generation framework that integrates large language models for controlled hypothesis generation and systematic quality validation procedures. The framework incorporates artifact mitigation strategies and cross-model validation to improve annotation reliability and ensure legal consistency. The resulting dataset captures diverse reasoning patterns, including paraphrasing, logical implication, and legally invalid inferences, thereby providing a comprehensive benchmark for Vietnamese legal inference tasks. We conduct extensive experiments on the ViLegalNLI using multilingual models, Vietnamese-specific pretrained language models, and instruction-tuned large language models. The results show that few-shot LLM configurations consistently achieve superior performance, while performance is significantly influenced by hypothesis length, lexical overlap, and reasoning complexity. Cross-domain evaluations further reveal the challenges of generalizing legal inference across distinct legal fields. Overall, ViLegalNLI establishes a foundational benchmark for Vietnamese legal NLI and supports future research in legal reasoning, statutory text understanding, and the development of reliable AI systems for legal analysis and decision support. The dataset is publicly available for research purposes.

[5] Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

Gaofei Shen, Martijn Bentum, Tom Lentz, Afra Alishahi, Grzegorz Chrupała

Main category: cs.CL

Abstract: Probing is widely used to study which features can be decoded from language model representations. However, the common decoding probe approach has two limitations that we aim to solve with our new encoding probe approach: contributions of different features to model representations cannot be directly compared, and feature correlations can affect probing results. We present an Encoding Probe that reverses this direction and reconstructs internal representations of models using interpretable features. We evaluate this method on text and speech transformer models, using feature sets spanning acoustics, phonetics, syntax, lexicon, and speaker identity. Our results suggest that speaker-related effects vary strongly across different training objectives and datasets, while syntactic and lexical features contribute independently to reconstruction. These results show that the Encoding Probe provides a complementary perspective on interpreting model representations beyond decodability.
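
A minimal sketch of the encoding direction: fit a linear map from an interpretable feature to a hidden dimension and compare reconstruction R² across features, which puts different features on the same scale. The single hidden dimension and the feature values below are a made-up toy, not the paper's probe:

```python
def fit_linear(xs, ys):
    """Least-squares fit y ≈ a*x + b for a single scalar feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def r_squared(xs, ys):
    """Reconstruction quality of the hidden values ys from the feature xs."""
    a, b = fit_linear(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# One toy hidden dimension and two candidate interpretable features.
hidden       = [0.2, 0.5, 0.9, 1.1, 1.6]
syntax_feat  = [1, 2, 4, 5, 7]   # tracks the hidden dimension closely
speaker_feat = [3, 1, 4, 1, 5]   # mostly unrelated
```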

[6] Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

Muhammad Dehan Al Kautsar, Saeed Almheiri, Momina Ahsan, Bilal Elbouardi, Younes Samih, Sarfraz Ahmad, Amr Keleg, Omar El Herraoui, Kareem Elzeky, Abed Alhakim Freihat, Mohamed Anwar, Zhuohan Xie, Junhong Liang, Mohammad Rustom Al Nasar, Preslav Nakov, Fajri Koto

Main category: cs.CL

Abstract: There is a significant gap in evaluating cultural reasoning in LLMs using conversational datasets that capture culturally rich and dialectal contexts. Most Arabic benchmarks focus on short text snippets in Modern Standard Arabic (MSA), overlooking the cultural nuances that naturally arise in dialogues. To address this gap, we introduce ArabCulture-Dialogue, a culturally grounded conversational dataset covering 13 Arabic-speaking countries, in both MSA and each country’s respective dialect, spanning 12 daily-life topics and 54 fine-grained subtopics. We utilize the dataset to form three benchmarking tasks: (i) multiple-choice cultural reasoning, (ii) machine translation between MSA and dialects, and (iii) dialect-steering generation. Our experiments indicate that the performance gap between MSA and Arabic dialects still exists, whereby the models perform worse on all three tasks in the dialectal setup, compared to the MSA one.

[7] Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

Yuxi Ma, Yongqian Peng, Junchen Lyu, Chi Zhang, Yixin Zhu

Main category: cs.CL

Abstract: Humor is a fundamental cognitive phenomenon in which humans derive pleasure from the expectation violations and their resolution, exemplifying the brain’s dynamic capacity for predictive processing. Classical humor theories emphasize semantic incongruity as the primary driver of amusement, yet overlook temporal dynamics despite comedians’ intuition that “timing is everything.” The extent to which temporal structure contributes to humor appreciation and how it interacts with semantic content remains poorly understood. Here, we propose the Dual Prediction Violation (DPV) framework to capture the interplay between content and timing. By analyzing 828 professional Chinese stand-up performances, we show that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Specifically, we find that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines–a strategic coupling that distinguishes successful from unsuccessful performances. These findings reframe humor as temporally scaffolded, where timing and semantic content operate in strategic coordination rather than independently. Our DPV framework bridges humor theory with predictive processing, demonstrating that temporal structure plays a central role in naturalistic humor appreciation with implications for understanding multi-scale prediction integration in linguistic processing.

[8] Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

Kai-Wei Chang, En-Pei Hu, Chun-Yi Kuan, Wenze Ren, Wei-Chih Chen, Guan-Ting Lin, Yu Tsao, Shao-Hua Sun, Hung-yi Lee, James Glass

Main category: cs.CL

Abstract: Not available – the arXiv API request for 2509.26388 returned HTTP 429 (rate limited).

[9] RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

Jugal Gajjar, Kamalasankari Subramaniakuppusamy

Main category: cs.CL

Abstract: When a language model answers a table question, users have no way to verify which cells informed which reasoning steps. We introduce RSAT, a method that trains small language models (SLMs, 1-8B) to produce step-by-step reasoning with cell-level citations grounded in table evidence. Phase 1 (SFT) teaches a structured JSON output format from verified reasoning traces. Phase 2 (GRPO) optimizes a composite reward centered on NLI-based faithfulness, alongside citation validity and parsimony. Across six models from two families – Qwen 2.5 (1.5B/3B/7B) and Llama 3 (1B/3B/8B) – RSAT improves faithfulness 3.7× over SFT alone (0.224 → 0.826), with near-perfect citation validity (0.992). Post-hoc attribution collapses below 13% format success, confirming that attribution must be integrated into reasoning, not retrofitted. Ablations show the faithfulness reward is essential: removing it drops faithfulness from 0.97 to 0.03.
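
The composite reward described for Phase 2 can be sketched as a weighted sum over faithfulness, citation validity, and parsimony. The weights and the parsimony penalty below are illustrative assumptions, not RSAT's actual reward:

```python
def composite_reward(faithfulness, cited_cells, table_cells,
                     w_faith=0.6, w_valid=0.3, w_parsi=0.1):
    """Illustrative composite reward: NLI-style faithfulness score in [0, 1],
    fraction of citations that exist in the table, and a penalty for over-citing.
    The weights and the parsimony term are assumptions, not the paper's values."""
    validity = sum(c in table_cells for c in cited_cells) / max(len(cited_cells), 1)
    parsimony = 1.0 / (1 + max(len(cited_cells) - 3, 0))  # flat until >3 citations
    return w_faith * faithfulness + w_valid * validity + w_parsi * parsimony

table = {"A1", "B2", "C3"}
good = composite_reward(0.9, ["A1", "B2"], table)   # all citations valid
bad  = composite_reward(0.9, ["X9", "B2"], table)   # one hallucinated cell
```

The validity term is what drives the near-perfect citation validity reported above: citing a cell that is not in the table directly lowers the reward.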

[10] Confidence Estimation in Automatic Short Answer Grading with LLMs

Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

Main category: cs.CL

Abstract: Automatic Short Answer Grading (ASAG) with generative large language models (LLMs) has recently demonstrated strong performance without task-specific fine-tuning, while also enabling the generation of synthetic feedback for educational assessment. Despite these advances, LLM-based grading remains imperfect, making reliable confidence estimates essential for safe and effective human-AI collaboration in educational decision-making. In this work, we investigate confidence estimation for ASAG with LLMs by jointly considering model-based confidence signals and dataset-derived uncertainty. We systematically compare three model-based confidence estimation strategies, namely verbalizing, latent, and consistency-based confidence estimation, and show that model-based confidence alone is insufficient to reliably capture uncertainty in ASAG. To address this limitation, we propose a hybrid confidence framework that integrates model-based confidence signals with an explicit estimate of dataset-derived aleatoric uncertainty. Aleatoric uncertainty is operationalized by clustering semantically embedded student responses and quantifying within-cluster heterogeneity. Our results demonstrate that the proposed hybrid confidence measure yields more reliable confidence estimates and improves selective grading performance compared to single-source approaches. Overall, this work advances confidence-aware LLM-based grading for human-in-the-loop assessment, supporting more trustworthy AI-assisted educational assessment systems.
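
A toy version of the dataset-derived uncertainty estimate: measure within-cluster disagreement among human grades and blend it with the model's own confidence. The majority-fraction heterogeneity measure and the blending weight alpha are simplifying assumptions, not the paper's exact operationalization:

```python
from collections import Counter

def heterogeneity(grades):
    """Within-cluster disagreement: 1 minus the majority-label fraction."""
    counts = Counter(grades)
    return 1 - counts.most_common(1)[0][1] / len(grades)

def hybrid_confidence(model_conf, cluster_grades, alpha=0.5):
    """Blend the model's own confidence with (1 - aleatoric uncertainty)."""
    return alpha * model_conf + (1 - alpha) * (1 - heterogeneity(cluster_grades))

# Human grades of semantically similar student responses (invented).
unanimous = ["correct", "correct", "correct", "correct"]
split     = ["correct", "correct", "partial", "partial"]
```

Two responses graded with equal model confidence thus end up with different hybrid confidence if one sits in a cluster where human graders disagree.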

[11] (title unavailable)

Jan Sobotka, Mustafa O. Karabag, Ufuk Topcu

Main category: cs.CL

Abstract: Large language models (LLMs) are increasingly tasked with strategic decision-making under incomplete information, such as in negotiation and policymaking. While LLMs can excel at many such tasks, they also fail in ways that are poorly understood. We shed light on these failures by uncovering two fundamental gaps in the internal mechanisms underlying the decision-making of LLMs in incomplete-information games, supported by experiments with open-weight models Llama 3.1, Qwen3, and gpt-oss. First, an observation-belief gap: LLMs encode internal beliefs about latent game states that are substantially more accurate than their own verbal reports, yet these beliefs are brittle. In particular, the belief accuracy degrades with multi-hop reasoning, exhibits primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. Second, a belief-action gap: The implicit conversion of internal beliefs into actions is weaker than that of the beliefs externalized in the prompt, yet neither belief-conditioning consistently achieves higher game payoffs. These results show how analyzing LLMs’ internal processes can expose systematic vulnerabilities that warrant caution before deploying LLMs in strategic domains without robust guardrails.

[12] Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

Prerna Juneja, Lika Lomidze

Main category: cs.CL

Abstract: There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insights into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity, and collect 1,674 dialogue pairs across 25 high-risk scenarios. We combine emotion modeling and LLM-assisted utterance-and harm-level classification to analyze these exchanges. Results show that Replika exhibits a narrow emotional range dominated by curiosity and care, while frequently mirroring or normalizing unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.

[13] Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory

Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

Main category: cs.CL

Abstract: Automated short answer grading (ASAG) with large language models (LLMs) is commonly evaluated with aggregate metrics such as macro-F1 and Cohen’s kappa. However, these metrics provide limited insight into how grading performance varies across student responses of differing grading difficulty. We introduce an evaluation framework for LLM-based ASAG based on item response theory (IRT), which models grading correctness as a function of latent grader ability and response grading difficulty. This formulation enables response-level analysis of where LLM graders succeed or fail and reveals robustness differences that are not visible from aggregate scores alone. We apply the framework to 17 open-weight LLMs on the SciEntsBank and Beetle benchmarks. The results show that even models with similar overall performance differ substantially in how sharply their grading accuracy declines as response difficulty increases. In addition, confusion patterns show that errors on difficult responses concentrate disproportionately on the partially_correct_incomplete label, indicating a tendency toward intermediate-label collapse under ambiguity. To characterize difficult responses, we further analyze semantic and linguistic correlates of estimated difficulty. Across both datasets, higher difficulty is associated with weaker semantic alignment to the reference answer, stronger contradiction signals, and greater semantic isolation in embedding space. Overall, these results show that item response theory offers a useful framework for evaluating LLM-based ASAG beyond aggregate performance measures.
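
The underlying IRT formulation, in its simplest Rasch form, models grading correctness as a logistic function of grader ability θ minus response difficulty b. A minimal sketch, with ability estimated by gradient ascent on the log-likelihood (the paper's estimation procedure may differ):

```python
import math

def p_correct(theta, b):
    """Rasch model: P(a grader of ability theta grades a response of difficulty b correctly)."""
    return 1 / (1 + math.exp(-(theta - b)))

def estimate_ability(outcomes, difficulties, steps=200, lr=0.1):
    """Maximum-likelihood ability estimate; the log-likelihood gradient is sum(y - p)."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(y - p_correct(theta, b) for y, b in zip(outcomes, difficulties))
        theta += lr * grad
    return theta

# Hypothetical grading outcomes on responses ordered easy -> hard.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta_strong = estimate_ability([1, 1, 1, 1, 1], difficulties)  # never misses
theta_weak   = estimate_ability([1, 1, 1, 0, 0], difficulties)  # fails hard items
```

Plotting p_correct against b for each fitted grader is one way to see the differing accuracy declines the abstract describes.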

[14] Lost in State Space: Probing Frozen Mamba Representations

Bhagyashree Wagh, Akash Singh

Main category: cs.CL

Abstract: Mamba’s recurrent state h_t is, by construction, a compressed summary of every token seen so far. This raises a tempting hypothesis: if we extract token-level outputs y_t at fixed patch boundaries, we obtain semantic sentence summaries for free, with no pooling head, no fine-tuning, and no [CLS] token. We test this hypothesis carefully. Across five benchmarks (SST-2, CoLA, MRPC, STS-B, IMDb), we compare four strategies for extracting frozen sentence representations from a pretrained Mamba-130M backbone under a strict frozen-feature probing protocol, using three random seeds where computationally feasible. The results do not support the hypothesis: patch boundary readouts do not consistently outperform simple mean pooling. We identify and quantify two structural pathologies: severe anisotropy (mean pairwise cosine similarity 0.9999, std 0.000044) and representational collapse in the raw final SSM state (MCC = 0.000 on CoLA across all three seeds, confirmed via confusion matrix). We further propose orthogonal injection, a modified recurrence that constrains new information per
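
The anisotropy diagnostic reported above (mean pairwise cosine similarity) is straightforward to compute. A self-contained sketch with toy vectors in place of real sentence representations:

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_pairwise_cosine(vectors):
    """Anisotropy diagnostic: average cosine similarity over all pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# A collapsed (highly anisotropic) set: all vectors point in nearly the same direction.
collapsed = [[1.0, 0.010], [0.9, 0.012], [1.1, 0.008]]
# A spread-out set with no shared direction.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
```

Values near 1.0, as the paper reports for the raw final SSM state, indicate that the representations occupy a narrow cone and carry little discriminative direction.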

[15] Retrieval-Augmented Reasoning for Chartered Accountancy

Jatin Gupta, Akhil Sharma, Saransh Singhania, Ali Imam Abidi

Main category: cs.CL

Abstract: The inception of Large Language Models (LLMs) has catalyzed AI adoption in the finance sector, yet their reliability in complex, jurisdiction-specific tasks like Indian Chartered Accountancy (CA) remains limited. The models struggle with multi-step numerical tasks that also demand advanced knowledge of legal regulations, and scaling them is not feasible in resource-constrained settings. We present CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework that operates with a 14B, 4-bit-quantized reasoning model, 14B-DeepSeek-R1, and a layout-aware Docling extraction system that maintains document structure during extraction. CA-ThinkFlow uses a basic RAG method that automatically adds retrieved information into the prompt, relying on the model’s built-in Chain-of-Thought (CoT) capabilities to use that context and produce correct answers. When tested on the multi-level CA-Ben benchmark, the system performs on par with large proprietary models, achieving Scholastic Reliability Coefficient (SRC) results equal to 68.75% of GPT-4o and Claude 3.5 Sonnet. The framework shows high efficiency and parameter robustness, but its reasoning still falls short on complex regulatory texts in fields such as Taxation.
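
The basic RAG pattern of adding retrieved information into the prompt can be illustrated with a bag-of-words retriever and simple prompt assembly. The corpus, query, and similarity measure here are toy stand-ins; the actual system uses Docling extraction and a 14B reasoning model:

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words term counts (a toy retriever)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_and_prompt(query, corpus, k=1):
    """Basic RAG: rank passages by similarity and prepend the top-k to the prompt."""
    ranked = sorted(corpus, key=lambda doc: bow_cosine(query, doc), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Toy corpus of statutory snippets (invented, not CA-Ben content).
corpus = [
    "Income tax slab rates apply to individual taxpayers",
    "Auditing standards require auditor independence",
    "GST registration thresholds for small businesses",
]
prompt = retrieve_and_prompt("income tax slab rates for individuals", corpus)
```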

[16] How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework

Hamidreza Saghir

Main category: cs.CL

Abstract: Recent white-box OOD detection methods for LLMs – including CED, RAUQ, and WildGuard confidence scores – appear effective, but we show they are structurally confounded by sequence length (|r| ≥ 0.61) and collapse to near-chance under length-matched evaluation. Even raw attention entropy (mean H(α) across heads and layers), a natural baseline we include for completeness, shows the same confound. The confound stems from attention’s Θ(log T) dependence on input length. To identify genuine OOD signals after deconfounding, we propose a two-pathway framework: embeddings capture what text is about (effective for topic shifts), while the processing trajectory – hidden-state evolution across layers – captures how the model processes input. The relative power of each pathway varies along a vocabulary-transparency spectrum: embedding methods excel on vocabulary-distinctive OOD, while trajectory features detect covert-intent inputs that share vocabulary with normal text (0.721 avg AUROC; Jailbreak: 0.850). Three evidence lines support this framework: (1) a crossover between k-NN and trajectory scoring across 6 tasks, where each pathway wins on different OOD types; (2) a per-layer analysis showing that layer-0 k-NN signal is almost entirely a length artifact (Jailbreak: 0.759 raw → 0.389 matched) – processing constructs genuine OOD signal from near-chance embeddings; and (3) circuit attribution showing adversarial tasks engage attention circuits more than semantic tasks (p = 0.022; Jailbreak patching p < 0.001), with partial cross-model replication. Code release upon publication.

[17] Are You the A-hole? A Fair, Multi-Perspective Ethical Reasoning Framework

Sheza Munir, Ahanaf Rodoshi, Sumin Lee, Feiran Chang, Xujie Si, Syed Ishtiaque Ahmed

Main category: cs.CL

Abstract: Standard methods for aggregating natural language judgments, such as majority voting, often fail to produce logically consistent results when applied to high-conflict domains, treating differing opinions as noise. We propose a neuro-symbolic aggregation framework that formalizes conflict resolution through Weighted Maximum Satisfiability (MaxSAT). Our pipeline utilizes a language model to map unstructured natural language explanations into interpretable logical predicates and confidence weights. These components are then encoded as soft constraints within the Z3 solver, transforming the aggregation problem into an optimization task that seeks the maximum consistency across conflicting testimony. Using the Reddit r/AmItheAsshole forum as a case study in large-scale moral disagreement, our system generates logically coherent verdicts that diverge from popularity-based labels 62% of the time, corroborated by an 86% agreement rate with independent human evaluators. This study demonstrates the efficacy of coupling neural semantic extraction with formal solvers to enforce logical soundness and explainability in the aggregation of noisy human reasoning.
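
The paper encodes soft constraints in the Z3 solver; as a dependency-free stand-in, a brute-force weighted MaxSAT over a handful of boolean predicates conveys the same aggregation idea. The predicates and weights below are invented examples of extracted testimony, not the paper's encoding:

```python
from itertools import product

def weighted_maxsat(variables, soft_clauses):
    """Brute-force weighted MaxSAT: pick the boolean assignment maximizing the
    total weight of satisfied soft clauses. Feasible only for a few variables."""
    best, best_score = None, float("-inf")
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        score = sum(w for w, holds in soft_clauses if holds(assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Invented predicates standing in for LLM-extracted testimony about one scenario.
clauses = [
    (0.9, lambda a: a["broke_promise"]),                       # strong claim: a promise was broken
    (0.8, lambda a: not a["broke_promise"] or a["at_fault"]),  # breaking a promise implies fault
    (0.3, lambda a: not a["at_fault"]),                        # weaker dissenting claim
]
verdict, score = weighted_maxsat(["broke_promise", "at_fault"], clauses)
```

Here the solver-style optimization overrides the weaker dissenting claim, which is exactly how such a verdict can diverge from a popularity-based label.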

[18] What Don’t You Understand? Using Large Language Models to Identify and Characterize Student Misconceptions About Challenging Topics

Michael J. Parker, Maria G. Zavala-Cerna

Main category: cs.CL

Abstract: This study presents a systematic approach to identifying and characterizing student misconceptions in online learning environments through a novel combination of quantitative performance analysis and large language model (LLM) assessment. We analyzed data from 9 course periods across 5 online biomedical science courses, encompassing 3,802 medical student enrollments. Using data from 40-50 topic-focused quizzes per course, we developed a two-stage methodology. First, we identified challenging central topics using quiz-level performance metrics. Second, we employed LLMs to characterize the underlying misconceptions in these high-priority areas. By examining student performance on first attempts across primarily multiple-choice questions (MCQs), we identified consistently challenging topics that were also central to course objectives. We then leveraged recent advances in generative AI to analyze three distinct data sources in combination: quiz question content, student response patterns, and lecture transcripts. This approach revealed actionable insights about student misconceptions that were not apparent from performance data alone. The quality of the LLM-identified misconceptions was rated as excellent by subject matter experts. We also conducted teacher interviews to assess the perceived utility of our topic identification method. Faculty found that data-driven identification of challenging topics was valuable and corroborated their own classroom observations. This methodology provides a scalable approach to characterizing student difficulties in learning environments where quizzes are used. Our findings demonstrate the potential for targeted and potentially personalized interventions in future course iterations, with clear pathways for measuring intervention effectiveness through follow-up quiz performance.

[19] Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation

Pooja Guttal, Varun Magotra, Vasudeva Mahavishnu, Natasha Chanto, Sidharth Sivaprasad, Manas Gaur

Main category: cs.CL

Abstract: Tabular documents such as CSV and Excel files are widely used in enterprise data pipelines, yet existing chunking strategies for retrieval-augmented generation (RAG) are primarily designed for unstructured text and do not account for tabular structure. We propose a structure-aware tabular chunking (STC) framework that operates on row-level units by constructing a hierarchical Row Tree representation, where each row is encoded as a key-value block. STC performs token-constrained splitting aligned with structural boundaries and applies overlap-free greedy merging to produce dense, non-overlapping chunks. This design preserves semantic relationships between fields within a row while improving token utilization and reducing fragmentation. Across evaluations on the MAUD dataset, STC reduces chunk count by up to 40% and 56% compared to standard recursive and key-value based baselines, respectively, while improving token utilization and processing efficiency. In retrieval benchmarks, STC improves MRR from 0.3576 to 0.5945 in a hybrid setting and increases Recall@1 from 0.366 to 0.754 in BM25-only retrieval. These results demonstrate that preserving structure during chunking improves retrieval performance, highlighting the importance of structure-aware chunking for RAG over tabular data.
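
The row-level key-value encoding plus overlap-free greedy merging can be sketched as follows. The whitespace token count and the tiny budget are simplifications; a real system would use a model tokenizer and a much larger budget:

```python
def row_to_block(header, row):
    """Encode one table row as a key-value block, keeping a row's fields together."""
    return "; ".join(f"{k}={v}" for k, v in zip(header, row))

def greedy_merge(blocks, max_tokens):
    """Merge consecutive row blocks into dense, non-overlapping chunks under a token budget."""
    chunks, current, count = [], [], 0
    for block in blocks:
        t = len(block.split())  # crude token count: whitespace-separated words
        if current and count + t > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(block)
        count += t
    if current:
        chunks.append("\n".join(current))
    return chunks

header = ["name", "dept"]
rows = [["Ada", "Math"], ["Alan", "CS"], ["Grace", "Navy"],
        ["Edsger", "CS"], ["Kurt", "Logic"]]
blocks = [row_to_block(header, r) for r in rows]   # each block counts as 2 "tokens"
chunks = greedy_merge(blocks, max_tokens=4)        # so at most 2 rows per chunk
```

Because splits only ever fall on row boundaries, no chunk contains half a row, which is the structural property the abstract credits for the retrieval gains.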

[20] Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

Charles Weng, Dingwen Li, Alexander Martin

Main category: cs.CL

Abstract: Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when the binary label is constrained to a fixed output position, equivalent prompts can induce materially different unsafe probabilities for the same sample. Across multimodal safety benchmarks and multiple VLM families, cross-prompt variance is strongly associated with prompt-level disagreement and higher error, making it a useful fragility diagnostic. A training-free mean ensemble improves NLL on all 14 dataset-model evaluation pairs and ECE on 12/14 relative to a train-selected single-prompt baseline, and wins more head-to-head NLL comparisons than labeled temperature scaling, Platt scaling, and isotonic regression applied to the same prompt. Ranking gains are consistent against the train-selected baseline on both AUROC and AUPRC, and against the full 15-prompt distribution remain consistent on AUPRC while softening on AUROC. Labeled calibration on top of the mean provides further gains when labels are available, identifying prompt averaging as a strong label-free first stage rather than a replacement for calibration. We frame this as a reliability stress test for zero-shot VLM first-token safety scores and recommend prompt-family evaluation with mean aggregation as a standard label-free reliability baseline.
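
The training-free mean ensemble reads, in miniature, as follows; the per-prompt scores are hypothetical and the NLL is the standard binary form, not anything taken from the paper:

```python
# Sketch of the label-free mean ensemble over prompt reformulations: average
# the per-prompt first-token unsafe probabilities for a sample instead of
# trusting any single prompt.
import math

def mean_ensemble(prob_by_prompt):
    """Average unsafe probabilities across semantically equivalent prompts."""
    return sum(prob_by_prompt) / len(prob_by_prompt)

def binary_nll(p_unsafe, label, eps=1e-12):
    """Negative log-likelihood of the true binary label under score p_unsafe."""
    p = p_unsafe if label == 1 else 1.0 - p_unsafe
    return -math.log(max(p, eps))

# Equivalent prompts can disagree sharply on the same sample:
scores = [0.9, 0.2, 0.7]       # hypothetical per-prompt unsafe probabilities
avg = mean_ensemble(scores)    # approx 0.6
```

High cross-prompt variance in `scores` is exactly the fragility signal the abstract describes.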

[21] Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

Garvin Kruthof

Main category: cs.CL

Abstract: When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe reveals a dissociation between declarative recall and behavioral adherence, as models accurately restate constraints they simultaneously violate. The knows-but-violates (KBV) rate, measuring constraint non-compliance despite preserved recall, ranges from 8% to 99% across models. Structured checkpointing partially reduces KBV rates but does not close the dissociation, and complexity inflation persists. Human validation against blind raters confirms that the LLM judge under-detects constraint violations, making reported constraint adherence scores conservative. Sensitivity analyses confirm the findings are robust to temperature (0.7 vs. 1.0) and pressure type (novelty vs. rigor). We release all briefs, prompts, rubrics, transcripts, and scores as an open benchmark.
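
The KBV rate defined in the abstract is simple to state as code; the per-turn record schema below is an assumption for illustration:

```python
# Knows-but-violates (KBV) rate: among constraint checks where the model
# accurately restated (recalled) the constraint, the fraction it nonetheless
# violated in its output.

def kbv_rate(turns):
    """turns: dicts with boolean 'recalled' and 'violated' per constraint check."""
    knows = [t for t in turns if t["recalled"]]
    if not knows:
        return 0.0
    return sum(t["violated"] for t in knows) / len(knows)

turns = [
    {"recalled": True,  "violated": True},
    {"recalled": True,  "violated": False},
    {"recalled": False, "violated": True},   # no recall, excluded from KBV
]
```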

[22] Budget-Aware Routing for Long Clinical Text

Khizar Qureshi, Geoffrey Martin, Yifan Peng

Main category: cs.CL

Abstract: A key challenge for large language models is token cost per query and overall deployment cost. Clinical inputs are long, heterogeneous, and often redundant, while downstream tasks are short and high stakes. We study budgeted context selection, where a subset of document units is chosen under a strict token budget so an off-the-shelf generator can meet fixed cost and latency constraints. We cast this as a knapsack-constrained subset selection problem with two design choices: unitization, which defines document segmentation, and selection, which determines which units are kept. We propose RCD, a monotone submodular objective that balances relevance, coverage, and diversity. We compare sentence, section, window, and cluster-based unitization, and introduce a routing heuristic that adapts to the budget regime. Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that optimal strategies depend on the evaluation setting. Positional heuristics perform best at low budgets in extractive tasks, while diversity-aware methods such as MMR improve LLM generation. Selector choice matters more than unitization, with cluster-based grouping reducing performance and other schemes behaving similarly. ROUGE saturates for LLM summaries, while BERTScore better reflects quality differences. We release our code at https://github.com/stone-technologies/ACL_budget_paper.
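
Knapsack-constrained greedy selection over a monotone submodular objective can be sketched as below; the objective (relevance plus a coverage bonus) and the cost-benefit ratio rule are illustrative assumptions, not the paper's exact RCD formulation:

```python
# Toy cost-benefit greedy selection under a token budget. Each unit carries a
# relevance score, a set of covered topics, and a token cost; the marginal gain
# rewards relevance plus topics not yet covered (monotone submodular).

def marginal_gain(unit, covered):
    """Relevance plus reward for newly covered topics (a diversity proxy)."""
    return unit["relevance"] + len(unit["topics"] - covered)

def greedy_select(units, budget):
    """Repeatedly pick the unit with the best gain-per-token that still fits."""
    selected, covered, used = [], set(), 0
    remaining = list(units)
    while remaining:
        best = max(
            (u for u in remaining if used + u["tokens"] <= budget),
            key=lambda u: marginal_gain(u, covered) / u["tokens"],
            default=None,
        )
        if best is None:
            break
        selected.append(best)
        covered |= best["topics"]
        used += best["tokens"]
        remaining.remove(best)
    return selected, used
```

The ratio rule naturally prefers short, relevant, topic-diverse units, and stops once no remaining unit fits the budget.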

[23] Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding

Lehan Pan, Ziyang Tao, Ruoyu Pang, Xiao Wang, Jianjun Zhao, Yanyong Zhang

Main category: cs.CL

Abstract: Tree-based speculative decoding accelerates autoregressive generation by verifying multiple draft candidates in parallel, but this advantage weakens for sparse Mixture-of-Experts (MoE) models. As the draft tree grows, different branches activate different experts, expanding the union of activated experts and substantially increasing target-side verification cost. We propose EVICT, a training-free, hyperparameter-free, and lossless adaptive verification method for MoE speculative decoding. EVICT makes every verified token count by truncating the draft tree before target verification and retaining only the cost-effective prefix. It leverages fine-grained drafter signals to estimate candidate benefit, combines them with offline-profiled verification cost, and remains highly compatible with the high-performance graph-based serving framework SGLang. Extensive experiments on diverse MoE backbones and benchmarks show that EVICT achieves up to 2.35x speedup over autoregressive decoding and an average 1.21x speedup over the state-of-the-art baseline EAGLE-3, while significantly reducing unnecessary expert activations during verification.
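
The cost-effective-prefix idea can be illustrated with a toy benefit/cost scan; the linear net-benefit criterion below is an assumption for illustration, not EVICT's actual estimator:

```python
# Sketch of truncating a drafted candidate sequence before target verification:
# keep the prefix whose cumulative estimated benefit most exceeds its
# cumulative verification cost.

def truncate_draft(candidates):
    """candidates: list of (benefit, cost) pairs in draft order.
    Returns the length of the prefix with the highest net benefit."""
    best_len, best_net, net = 0, 0.0, 0.0
    for i, (benefit, cost) in enumerate(candidates, start=1):
        net += benefit - cost
        if net > best_net:
            best_net, best_len = net, i
    return best_len
```

Deep, low-confidence draft tokens (low benefit, high cost) are cut, shrinking the union of experts the target model must activate during verification.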

[24] MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

Tianyu Hu, Weikai Lin, Weizhi Zhang, Jing Ma, Song Wang

Main category: cs.CL

Abstract: Long-term conversational agents must decide which turns to store in external memory, yet recent systems rely on autoregressive LLM generation at every turn to make that decision. We present MemRouter, a write-side memory router that decouples memory admission from the downstream answer backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether the turn should be stored using lightweight classification heads while training only 12M parameters. Under a controlled matched-harness comparison on LoCoMo, where the retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B) are held identical, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, non-overlapping 95% CIs) while reducing memory-management p50 latency from 970ms to 58ms. Descriptive factorial averaging further shows that learned admission improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component in long-horizon conversational QA.
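
The write-side routing decision reduces to scoring an embedding with a lightweight head; the tiny logistic head, weights, and threshold below are all illustrative assumptions:

```python
# Sketch of embedding-based memory admission: a small classification head
# scores each turn's embedding and decides store/skip without any
# autoregressive decoding.
import math

def admit_score(embedding, weights, bias):
    """Logistic head over a (frozen-backbone) turn embedding."""
    z = sum(e * w for e, w in zip(embedding, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def route_turn(embedding, weights, bias, threshold=0.5):
    """Store the turn iff the admission probability clears the threshold."""
    return admit_score(embedding, weights, bias) >= threshold

memory = []
turns = {"t1": [0.9, 0.1], "t2": [-0.8, 0.2]}   # hypothetical turn embeddings
w, b = [2.0, 1.0], 0.0
for turn_id, emb in turns.items():
    if route_turn(emb, w, b):
        memory.append(turn_id)
```

A single dot product and sigmoid per turn is what makes the reported p50 latency drop (970 ms to 58 ms) plausible.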

[25] From Backward Spreading to Forward Replay: Revisiting Target Construction in LLM Parameter Editing

Wei Liu, Hongkai Liu, Zhiying Deng, Yee Whye Teh, Wee Sun Lee

Main category: cs.CL

Abstract: LLM parameter editing methods commonly rely on computing an ideal target hidden state at a target layer (referred to as the anchor point) and distributing the target vector to multiple preceding layers (commonly known as backward spreading) for cooperative editing. Although widely used, this scheme's underlying basis has not been systematically investigated. In this paper, we first conduct a systematic study of its foundations, clarifying its capability boundaries, practical considerations, and potential failure modes. We then propose a simple and elegant alternative that replaces backward spreading with forward propagation. Instead of optimizing the target at the last editing layer, we optimize the anchor point at the first editing layer and then propagate it forward to obtain accurate and mutually compatible target hidden states for all subsequent editing layers. This approach matches the computational complexity of existing methods while producing more accurate layer-wise targets. Our method is simple, interfering with neither the computation of the initial target hidden state nor any other component of the subsequent editing pipeline, and thus benefits a wide range of LLM parameter editing methods.

[26] Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

Jiawei Wu, DouDou Zhou

Main category: cs.CL

Abstract: Machine unlearning has emerged as a critical capability for addressing privacy, safety, and regulatory concerns in large language models (LLMs). Existing methods operate at the sequence level, applying uniform updates across all tokens despite only a subset encoding the knowledge targeted for removal. This introduces gradient noise, degrades utility, and leads to suboptimal forgetting. We propose TokenUnlearn, a token-level attribution framework that identifies and selectively targets critical tokens. Our approach combines knowledge-aware signals via masking, and entropy-aware signals to yield importance scores for precise token selection. We develop two complementary strategies: hard selection, applying unlearning only to high-importance tokens, and soft weighting, modulating gradient contributions based on importance scores. Both extend existing methods to token-level variants. Theoretical analysis shows token-level selection improves gradient signal-to-noise ratio. Experiments on TOFU and WMDP benchmarks across three model architectures demonstrate consistent improvements over sequence-level baselines in both forgetting effectiveness and utility preservation.
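
The soft-weighting strategy can be sketched as importance-weighted per-token losses; the particular score combination (knowledge mask times entropy) is an assumption for illustration:

```python
# Sketch of token-level soft weighting for unlearning: combine knowledge-aware
# and entropy-aware signals into normalized importance scores, then modulate
# each token's loss contribution before the unlearning update.

def token_importance(knowledge_mask, entropies):
    """Combine a knowledge mask with entropy signals into normalized scores."""
    scores = [m * h for m, h in zip(knowledge_mask, entropies)]
    total = sum(scores) or 1.0
    return [s / total for s in scores]          # normalize to sum to 1

def weighted_unlearning_loss(token_nlls, importance):
    """Importance-weighted per-token NLL; uniform weights recover the baseline."""
    return sum(w * l for w, l in zip(importance, token_nlls))

nlls = [2.0, 0.5, 3.0]           # hypothetical per-token NLLs on forget data
mask = [1, 0, 1]                  # tokens encoding the targeted knowledge
ent  = [0.5, 0.9, 1.5]            # hypothetical predictive entropies
w = token_importance(mask, ent)
```

Zero-importance tokens contribute no gradient, which is the source of the improved signal-to-noise ratio the abstract analyzes.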

[27] Language-free Experience at Expo 2025 Osaka

Michael Paul, Kenji Imamura, Xiaolin Wang, Shohei Higashiyama, Masao Utiyama

Main category: cs.CL

Abstract: In line with the Global Communication Plan 2025, we have pursued the development of multilingual translation technologies to realize a language-barrier-free experience at Expo 2025 Osaka. Our work includes the advancement of simultaneous interpretation systems emphasizing high translation quality and low latency. Key achievements include chunk-based input segmentation, context-aware translation, and multi-engine machine translation technologies. Through demonstration deployments and collaboration with private companies, our technologies have led to real-world applications, with several services and systems showcased at Expo 2025 Osaka.

[28] Agentic AI for Substance Use Education: Integrating Regulatory and Scientific Knowledge Sources

Kosar Haghani, Zahra Kolagar, Mohammed Atiquzzaman

Main category: cs.CL

Abstract: The delivery of traditional substance education has remained problematic due to challenges in scalability, personalization, and the currency of information in a rapidly evolving substance use landscape. While artificial intelligence (AI) offers a promising frontier for enhancing educational delivery, its application in providing real-time, authoritative substance use education remains largely underexplored. We built an agentic AI web application that combines Drug Enforcement Administration records with peer-reviewed literature in real time to provide transparent, context-sensitive substance use education. The system uses retrieval-augmented generation with a carefully filtered corpus of 102 documents and dynamic PubMed queries. Documents were semantically chunked and stored as vector representations for efficient retrieval. We conducted an expert evaluation study in which a panel of five subject matter experts generated 30 domain-specific questions, and two independent raters assessed 90 system interactions (30 primary questions plus two contextual follow-ups each) using a five-point Likert scale across four criteria: factual accuracy, citation quality, contextual coherence, and regulatory appropriateness. Mean ratings ranged from 4.18 to 4.35 across the four criteria (overall category range: 4.05-4.52), with substantial inter-rater agreement (Cohen’s kappa = 0.78). These findings suggest that agentic AI architectures integrating authoritative regulatory sources with real-time scientific literature represent a promising direction for scalable, accurate, and verifiable health education delivery, warranting further evaluation through longitudinal user studies.

[29] Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines

Aninda Ray

Main category: cs.CL

Abstract: A multi-agent pipeline with N agents typically issues N LLM calls per run. Merging agents into fewer calls (compound execution) promises token savings, but naively merged calls silently degrade quality through tool loss and prompt compression. We present Agent Capsules, an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem with empirical quality constraints. The runtime instruments coordination overhead per group, scores composition opportunity, selects among three compound execution strategies, and gates every mode switch on rolling-mean output quality. A controlled negative result confirms that injecting more context into a merged call worsens compression rather than relieving it, so the framework’s escalation ladder (standard, then two-phase, then sequential) recovers quality by moving toward per-agent dispatch rather than by rewriting merged prompts. On LLM-judged quality, the controller matches a hand-tuned oracle on every measured (model, group, mode) cell: routing compound whenever the oracle would, and reverting to fine whenever quality would fail the floor, without per-model configuration. Against a hand-crafted LangGraph implementation of a 14-agent competitive intelligence pipeline, Agent Capsules uses 51% fewer fine-mode input tokens and 42% fewer compound-mode input tokens, at +0.020 and +0.017 quality respectively. Against a DSPy implementation of a 5-agent due diligence pipeline, the framework uses 19% fewer tokens than uncompiled DSPy at quality parity, and 68% fewer tokens than MIPROv2 at +0.052 quality. Even before compound mode fires, the runtime delivers efficiency through automatic policy resolution, cache-aligned prompts, and topology-aware context injection, matching both hand-tuned and compile-time baselines without training data or per-pipeline engineering.
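
The quality-gated escalation ladder can be sketched as a small controller; the floor and window values below are assumptions, while the three-mode ladder (standard, two-phase, sequential) follows the abstract:

```python
# Sketch of quality-gated granularity control: record rolling-mean output
# quality and escalate from merged (compound) execution back toward per-agent
# dispatch whenever the mean dips below the floor.
from collections import deque

LADDER = ["standard", "two_phase", "sequential"]   # compound -> per-agent

class QualityGate:
    def __init__(self, floor=0.8, window=3):
        self.floor = floor
        self.scores = deque(maxlen=window)
        self.level = 0                              # start fully merged

    def record(self, quality):
        """Record a judged quality score; escalate if the rolling mean dips."""
        self.scores.append(quality)
        mean = sum(self.scores) / len(self.scores)
        if mean < self.floor and self.level < len(LADDER) - 1:
            self.level += 1
            self.scores.clear()                     # re-measure the new mode
        return LADDER[self.level]
```

Note the ladder only moves toward finer dispatch, matching the negative result that rewriting merged prompts does not recover quality.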

[30] RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI

Pankaj Gupta, Kartik Bose

Main category: cs.CL

Abstract: Large language models (LLMs) show promise in radiology but their deployment is limited by computational requirements that preclude use in resource-constrained clinical environments. We investigate whether small language models (SLMs) of 3-4 billion parameters can achieve strong multi-task radiology performance through LoRA fine-tuning, enabling deployment on consumer-grade CPUs. We train Qwen2.5-3B-Instruct and Qwen3-4B on 162K samples spanning 9 radiology tasks - RADS classification across 10 systems, impression generation, temporal comparison, radiology NLI, NER, abnormality detection, N/M staging, and radiology Q&A - compiled from 12 public datasets. Both models are evaluated on up to 500 held-out test samples per task with standardized metrics. Our key findings are: (1) LoRA fine-tuning dramatically improves performance over zero-shot baselines (RADS accuracy +53%, NLI +60%, N-staging +89%); (2) the two models exhibit complementary strengths - Qwen2.5 excels at structured generation tasks while Qwen3 dominates extractive tasks; (3) a task-routed oracle ensemble combining both models achieves the best performance across all tasks; (4) few-shot prompting with fine-tuned models hurts performance, demonstrating that LoRA adaptation is more effective than in-context learning for specialized domains; and (5) models can be quantized to GGUF format (~1.8-2.4GB) for CPU deployment at 4-8 tokens/second on consumer hardware. Our work demonstrates that small, efficiently fine-tuned models - which we collectively call RadLite - can serve as practical multi-task radiology AI assistants deployable entirely on consumer hardware without GPU requirements.

[31] Escaping Mode Collapse in LLM Generation via Geometric Regulation

Xin Du, Kumiko Tanaka-Ishii

Main category: cs.CL

Abstract: Mode collapse is a persistent challenge in generative modeling and appears in autoregressive text generation as behaviors ranging from explicit looping to gradual loss of diversity and premature trajectory convergence. We take a dynamical-systems view and reinterpret mode collapse as reduced state-space accessibility caused by geometric collapse: during generation, the model’s internal trajectory becomes confined to a low-dimensional region of its representation space. This implies mode collapse is not purely a token-level phenomenon and cannot be reliably solved by symbolic constraints or probability-only decoding heuristics. Guided by this perspective, we propose Reinforced Mode Regulation (RMR), a lightweight, online state-space intervention that regulates dominant self-reinforcing directions in the Transformer value cache (implemented as low-rank damping). Across multiple large language models, RMR substantially reduces mode collapse and enables stable, high-quality generation at extremely low entropy rates (down to 0.8 nats/step), whereas standard decoding typically collapses near 2.0 nats/step.

[32] Impact of Task Phrasing on Presumptions in Large Language Models

Kenneth J. K. Ong

Main category: cs.CL

Abstract: Concerns with the safety and reliability of applying large language models (LLMs) in unpredictable real-world applications motivate this study, which examines how task phrasing can lead to presumptions in LLMs, making it difficult for them to adapt when the task deviates from these assumptions. We investigated the impact of these presumptions on the performance of LLMs using the iterated prisoner’s dilemma as a case study. Our experiments reveal that LLMs are susceptible to presumptions when making decisions even with reasoning steps. However, when the task phrasing was neutral, the models demonstrated logical reasoning with few presumptions. These findings highlight the importance of proper task phrasing to reduce the risk of presumptions in LLMs.

[33] ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?

Joey Chan, Yikun Han, Jingyuan Chen, Samuel Fang, Lauren D. Gryboski, Alexandra Lee, Sheel Tanna, Qingqing Zhu, Zhiyong Lu, Lucy Lu Wang, Yue Guo

Main category: cs.CL

Abstract: Plain Language Summaries (PLS) aim to make research accessible to lay readers, but they are typically written in a one-size-fits-all style that ignores differences in readers’ information needs and comprehension. In health contexts, this limitation is particularly important because misunderstanding scientific information can affect real-world decisions. Large language models (LLMs) offer new opportunities for personalizing PLS, but it remains unclear whether personalization helps, which strategies are most effective, and how to balance personalization with safety. We introduce ReLay, a dataset of 300 participant–PLS pairs from 50 lay participants in both static (expert-written) and interactive (LLM-personalized) settings. ReLay includes user characteristics, health information needs, information-seeking behavior, comprehension outcomes, interaction logs, and quality ratings. We use ReLay to evaluate five LLMs across two personalization methods. Personalization improves comprehension and perceived quality, but it also raises the risk of reinforcing user biases and introducing hallucinations, revealing a trade-off between personalization and safety. These findings highlight the need for personalization methods that are both effective and trustworthy for diverse lay audiences.

[34] Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue

Tom Utting, Mario Giulianelli, Arabella Sinclair

Main category: cs.CL

Abstract: We model utterance production as probabilistic cost-sensitive choice over contextual alternatives, using information-theoretic notions of cost. We distinguish between goal-directed alternatives that realise a fixed communicative intent and goal-agnostic alternatives defined only by contextual plausibility, allowing us to derive speaker- and listener-oriented interpretations of different cost measures. We present a procedure to generate both types of alternative sets using language models. Analysing production choices in open-ended dialogue under both deterministic and probabilistic cost minimisation, we find that surprisal minimisation relative to goal-directed alternatives provides the strongest predictive account under both analyses. By contrast, uniform information density and length-based costs exhibit weaker and less consistent predictive power across conditions. More broadly, our study suggests that alternative-conditioned optimisation with LM-generated alternatives provides a principled framework for studying speaker and listener pressures in naturalistic language production.
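
Surprisal minimisation over an alternative set is easy to sketch; the toy per-token probabilities below stand in for genuine LM scores:

```python
# Sketch of production choice as cost-sensitive selection: pick the alternative
# utterance whose total surprisal (negative log probability) under a language
# model is lowest.
import math

def surprisal(probabilities):
    """Total surprisal (in bits) of an utterance from per-token probabilities."""
    return -sum(math.log2(p) for p in probabilities)

def choose_production(alternatives):
    """alternatives: mapping of utterance -> per-token probability list."""
    return min(alternatives, key=lambda u: surprisal(alternatives[u]))

# A longer but highly predictable utterance can beat a shorter, surprising one:
alts = {"short": [0.5, 0.5], "long": [0.9, 0.9, 0.9]}
```

This also shows why length-based costs and surprisal-based costs can disagree, as in the paper's comparison.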

[35] ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks

Ta Thanh Thuy, Jiaqi Zhu, Xuan Liu, Lin Shang, Reihaneh Rabbany, Guillaume Rabusseau, Lihui Chen, Zheng Yilun, Sitao Luan

Main category: cs.CL

Abstract: Understanding how people argue across ideological divides online is important for studying political polarization, misinformation, and content moderation. Existing datasets capture only part of this problem: some preserve text but ignore interaction structure, some model structure without rich semantics, and others represent conversations without stable user-level ideological identity. We introduce ControBench, a benchmark for controversial discourse analysis that combines heterogeneous social interaction graphs with rich textual semantics. Built from Reddit discussions on three topics, Trump, abortion, and religion, ControBench contains 7,370 users, 1,783 posts, and 26,525 interactions. The graph contains user and post nodes connected by semantically enriched edges; in particular, user-comment-user edges encode both a reply and the parent comment that it responds to, preserving local argumentative context. User labels are derived from self-declared Reddit flairs, providing a scalable proxy for ideological identity without manual annotation. The resulting datasets exhibit low or negative adjusted homophily (Trump: -0.77, Abortion: 0.06, Religion: 0.04), reflecting the cross-cutting structure of real-world debate. We evaluate graph neural networks, pretrained language models, and large language models on ControBench and observe distinct performance patterns across topics and model families, especially when ideological boundaries are ambiguous. These results position ControBench as a challenging and realistic benchmark for controversial discourse analysis.
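
The adjusted homophily figures quoted above correct raw edge homophily for class imbalance; the sketch below follows the common degree-weighted definition, which is an assumption about the exact variant the authors used:

```python
# Adjusted homophily: edge homophily minus the level expected from
# degree-weighted class proportions, rescaled so fully heterophilous graphs
# score -1 and fully homophilous graphs score +1.
from collections import Counter

def adjusted_homophily(edges, labels):
    """edges: (u, v) pairs; labels: node -> class. Negative => heterophilous."""
    m = len(edges)
    same = sum(labels[u] == labels[v] for u, v in edges)
    h_edge = same / m
    deg = Counter()                      # degree mass per class
    for u, v in edges:
        deg[labels[u]] += 1
        deg[labels[v]] += 1
    p2 = sum((d / (2 * m)) ** 2 for d in deg.values())
    return (h_edge - p2) / (1 - p2)
```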

[36] AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Wenxiang Lin, Juntao Huang, Luhan Zhang, Laili Li, Xiang Bao, Mengyang Zhang, Bing Wang, Shaohuai Shi

Main category: cs.CL

Abstract: Quantization is a key method for reducing the GPU memory requirement of training large language models (LLMs). Yet, current approaches are ineffective for 4-bit activations and 8-bit gradients, which can easily cause slow convergence or accuracy loss. To address this, we introduce AGoQ, incorporating two new techniques: 1) a layer-aware activation quantization algorithm that allocates appropriate bit-widths for activations of various layers based on their types and pipeline stages to achieve near 4-bit activation storage, and 2) a gradient quantization algorithm that reduces memory usage and shortens communication time by employing 8-bit gradient storage and precision-preserving 8-bit All-Reduce communication. We conduct extensive experiments using different sizes of LLMs on two GPU clusters (up to 64 GPUs), and the experimental results show that our AGoQ reduces memory usage by up to 52% and achieves up to a 1.34× improvement in training speed compared to state-of-the-art training systems Megatron-LM (w/ or w/o ZeRO), COAT and DeepSpeed with 8B to 32B LLaMA models, while achieving comparable convergence loss on pretraining and comparable accuracy on downstream tasks with LLaMA architectures.
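
Symmetric 8-bit gradient quantization of the kind the gradient path relies on can be sketched as follows; per-tensor absmax scaling is an assumption, and the paper's scheme may be finer-grained:

```python
# Quantize float gradients to int8 with a shared absmax scale, communicate the
# compact int8 payload, then dequantize on receipt.

def quantize_int8(grads):
    """Map floats to int8 range [-127, 127] with a per-tensor absmax scale."""
    absmax = max(abs(g) for g in grads) or 1.0
    scale = absmax / 127.0
    q = [round(g / scale) for g in grads]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float gradients from int8 values and the scale."""
    return [v * scale for v in q]
```

The int8 payload is a quarter the size of float32, which is where both the memory and All-Reduce communication savings come from.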

[37] A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction

Michito Takeshita, Takuro Kawada, Takumi Ohashi, Shunsuke Kitada, Hitoshi Iyatomi

Main category: cs.CL

Abstract: AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural information such as spatial relationships among elements. We propose A11y-Compressor, a framework that transforms linearized accessibility trees into compact and structured representations. Our implementation, Compressed-a11y, applies a lightweight and structured transformation pipeline with modal detection, redundancy reduction, and semantic structuring. Experiments on the OSWorld benchmark show that Compressed-a11y reduces input tokens to 22% of the original while improving task success rates by 5.1 percentage points on average.
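
The redundancy-reduction step can be illustrated on a toy linearized tree; the node schema and the keep/drop rules below are assumptions, not the actual Compressed-a11y pipeline:

```python
# Sketch of compressing a linearized accessibility tree: drop unlabeled
# structural nodes and deduplicate repeated role/name lines.

INTERACTIVE = {"button", "link", "textbox", "checkbox", "menuitem"}

def compress_a11y(nodes):
    """Keep interactive or labeled nodes, emit 'role name', skip duplicates."""
    out, seen = [], set()
    for node in nodes:
        role, name = node.get("role", ""), node.get("name", "").strip()
        if role not in INTERACTIVE and not name:
            continue                       # structural node with no label
        line = f"{role} {name}".strip()
        if line in seen:
            continue                       # redundant repeat
        seen.add(line)
        out.append(line)
    return out
```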

[38] Structure Liberates: How Constrained Sensemaking Produces More Novel Research Output

James Mooney, Zae Myung Kim, Young-Jun Lee, Dongyeop Kang

Main category: cs.CL

Abstract: Scientific discovery is an extended process of ideation–surveying prior work, forming hypotheses, and refining reasoning–yet existing approaches treat this phase as a brief preamble despite its central role in research. We introduce SCISENSE, a sensemaking-grounded framework that operationalizes ideation as a structured sequence of eight cognitive stages (Pirolli & Card, 2005). We construct SCISENSE-Traj, a 100K-scale dataset of citation-conditioned research trajectories in two modes: Target, where an LLM reconstructs the ideation path leading to a known paper from its cited works, and Infer, where the LLM proposes novel directions from the same citations. We distill these into SCISENSE-LM, a family of sensemaking LLMs spanning 3B to 70B parameters. Contrary to the assumption that looser supervision promotes greater exploration, Target-trained models achieve a 2.0% improvement in trajectory quality over Infer-trained models while also producing more novel and diverse outputs. This advantage propagates downstream: coding agents conditioned on Target trajectories produce research artifacts with higher executability and quality than those conditioned on Infer trajectories. This suggests that targeted ideation reduces cognitive burden on downstream agents, freeing them to explore more creatively. SCISENSE offers both a practical tool for augmenting LLM-driven research workflows and a principled testbed for studying how planning shapes scientific discovery.

[39] Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

Daria Boratyn, Damian Brzyski, Albert Leśniak, Wojciech Łukasik, Maciej Rapacz, Jan Rybicki, Wojciech Słomczyński, Dariusz Stolicki

Main category: cs.CL

Abstract: We investigate the extent to which cosine similarity between paragraph embeddings is invariant under machine translation, using the Manifesto Corpus of over 2,800 political party platforms in 28 languages translated to English via the EU eTranslation service. Rather than measuring translation-induced semantic shift directly we measure the stability of pairwise similarity relationships across embedding models, and use inter-model disagreement on original-language text as a calibrated invariance threshold. This yields a per-language non-inferiority test for four hypotheses about how translation interacts with embedding choice, with verdicts that distinguish languages where translation demonstrably preserves semantic structure from those where it demonstrably degrades it and from those where the available evidence does not resolve the question. The framework is corpus- and pipeline-agnostic and extends naturally to downstream tasks. Applied to our data, it identifies ten languages with translation invariance and four with detectable distortion.
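
The stability-of-similarities measurement can be sketched as follows; using Pearson correlation over pairwise cosine similarities as the agreement statistic is my assumption, not necessarily the paper's exact test:

```python
# Sketch: compute all pairwise cosine similarities for original-language and
# translated embeddings, then measure how well the two similarity profiles
# agree. High agreement suggests translation preserved semantic structure.
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pairwise_sims(embeddings):
    return [cosine(embeddings[i], embeddings[j])
            for i, j in combinations(range(len(embeddings)), 2)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

orig  = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical embeddings
trans = [[2.0, 0.0], [0.0, 3.0], [2.0, 2.0]]   # rescaled: same directions
stability = pearson(pairwise_sims(orig), pairwise_sims(trans))
```

The calibrated threshold for "invariant" would then come from the analogous inter-model disagreement on original-language text.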

[40] SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models

Shiqiang Cai, Nianhong Niu, Shizhu He, Kang Liu, Jun Zhao

Main category: cs.CL

Abstract: Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge. A high-quality scientific taxonomy offers a structured and hierarchical representation of a research field, facilitating literature exploration and topic navigation, as well as enabling downstream applications such as trend analysis, idea generation, and information retrieval. However, existing taxonomy generation approaches often suffer from structural inconsistencies and semantic misalignment across hierarchical levels. Through empirical analysis, we find that these issues largely stem from inadequate modeling of hierarchical semantic consistency. To address this limitation, we propose a semantic-consistent taxonomy generation (SC-Taxo) framework that leverages large language models (LLMs) with hierarchy-aware refinement stages to ensure semantic consistency. Specifically, SC-Taxo introduces a bidirectional heading generation mechanism that jointly performs bottom-up abstraction and top-down semantic constraint, while further capturing peer-level semantic dependencies to enhance horizontal consistency. Experiments on multiple benchmark datasets demonstrate consistent improvements in hierarchy alignment and heading quality, and additional evaluation on Chinese scientific literature validates its robust cross-lingual generalization.

[41] H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations

Passant Elchafei, Hossam Emam, Mohamed Alansary, Monorama Swain, Markus Schedl

Main category: cs.CL

Abstract: We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages). Task A evaluates standalone retrieval quality, while Task C assesses end-to-end retrieval-augmented generation (RAG) in multi-turn conversational settings, requiring both accurate answer generation and faithful grounding in retrieved evidence. Our approach implements a hierarchical parent-child RAG pipeline that separates fine-grained child-level retrieval from parent-level context reconstruction during generation. Documents are segmented into overlapping sentence-based child chunks, while full documents are preserved as parent units to provide coherent context. Retrieval combines hybrid dense-sparse search, tunable weighting, and embedding-based similarity rescoring over child chunks. Retrieved evidence is aggregated at the parent level and supplied to an instruction-tuned language model for response generation. H-RAG achieves an nDCG@5 score of 0.4271 on Task A and a harmonic mean score of 0.3241 on Task C (RB_agg: 0.2488, RL_F: 0.2703, RB_llm: 0.6508), underscoring the importance of retrieval configuration and parent-level aggregation in multi-turn RAG performance.
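
The parent-child split described above can be sketched as follows (a simplified illustration of ours, not the submission's code; chunk sizes and the stand-in scores replace the hybrid dense-sparse retriever):

```python
from collections import defaultdict

def child_chunks(doc_id, sentences, window=2, stride=1):
    """Split a document into overlapping sentence-window child chunks,
    each keeping a pointer back to its parent document."""
    chunks = []
    for start in range(0, max(len(sentences) - window + 1, 1), stride):
        chunks.append({"parent": doc_id,
                       "text": " ".join(sentences[start:start + window])})
    return chunks

def aggregate_to_parents(scored_children, top_k=2):
    """Collapse child-level retrieval scores to parent documents by keeping
    each parent's best child score, then return the top_k parents."""
    best = defaultdict(float)
    for chunk, score in scored_children:
        best[chunk["parent"]] = max(best[chunk["parent"]], score)
    return sorted(best, key=best.get, reverse=True)[:top_k]

docs = {"d1": ["a b.", "c d.", "e f.", "g h."], "d2": ["x y.", "z w."]}
children = [c for did, sents in docs.items() for c in child_chunks(did, sents)]
scored = [(c, float(i)) for i, c in enumerate(children)]  # stand-in scores
print(aggregate_to_parents(scored))
```

The returned parent documents, rather than the small child chunks, are what would be passed to the instruction-tuned generator, preserving coherent context.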

[42] Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

Jasper Dekoninck, Nikola Jovanović, Tim Gehrunger, Kári Rögnvalddson, Ivo Petrov, Chenhao Sun, Martin Vechev

Main category: cs.CL

Abstract: Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating progress: they are often narrow in scope, quickly saturated, and rarely updated. This makes it hard to compare models reliably and track progress over time. Instead, we need evaluation platforms: continuously maintained systems that run, aggregate, and analyze evaluations across many benchmarks to give a comprehensive picture of model performance within a broad domain. In this work, we build on the original MathArena benchmark by substantially broadening its scope from final-answer olympiad problems to a continuously maintained evaluation platform for mathematical reasoning with LLMs. MathArena now covers a much wider range of tasks, including proof-based competitions, research-level arXiv problems, and formal proof generation in Lean. Additionally, we maintain a clear evaluation protocol for all models and regularly design new benchmarks as model capabilities improve to ensure that MathArena remains challenging. Notably, the strongest model, GPT-5.5, now reaches 98% on the 2026 USA Math Olympiad and 74% on research-level questions, showing that frontier models can now comfortably solve extremely challenging mathematical problems. This highlights the importance of continuously maintained evaluation platforms like MathArena to track the rapid progress of LLMs in mathematical reasoning.

[43] ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models

Yunhan Zhao, Zhaorun Chen, Xingjun Ma, Yu-Gang Jiang, Bo Li

Main category: cs.CL

Abstract: As Large Language Models (LLMs) are increasingly deployed in cross-linguistic contexts, ensuring safety in diverse regulatory and cultural environments has become a critical challenge. However, existing multilingual benchmarks largely rely on general risk taxonomies and machine translation, which confines guardrail models to these predefined categories and hinders their ability to align with region-specific regulations and cultural nuances. To bridge these gaps, we introduce ML-Bench, a policy-grounded multilingual safety benchmark covering 14 languages. ML-Bench is constructed directly from regional regulations, where risk categories and fine-grained rules derived from jurisdiction-specific legal texts are directly used to guide the generation of multilingual safety data, enabling culturally and legally aligned evaluation across languages. Building on ML-Bench, we develop ML-Guard, a Diffusion Large Language Model (dLLM)-based guardrail model that supports multilingual safety judgment and policy-conditioned compliance assessment. ML-Guard has two variants: a lightweight 1.5B model for fast 'safe/unsafe' checking and a more capable 7B model for customized compliance checking with detailed explanations. We conduct extensive experiments against 11 strong guardrail baselines across 6 existing multilingual safety benchmarks and our ML-Bench, and show that ML-Guard consistently outperforms prior methods. We hope that ML-Bench and ML-Guard can help advance the development of regulation-aware and culturally aligned multilingual guardrail systems.

[44] Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu

Main category: cs.CL

Abstract: Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memory updates, sparse outcome rewards provide weak supervision, resulting in unstable long-horizon optimization. Drawing on memory schema theory and the functional division between prefrontal and hippocampal regions, we introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, we propose Memory Guideline Induction to optimize a global guideline via contrastive feedback interpreted as textual gradients; in the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit and implicit preferences with varying sizes and noise levels, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.

[45] FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen

Main category: cs.CL

Abstract: Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chinese) red-teaming benchmark designed to test an LLM’s refusal of requests that violate financial compliance. Grounded in real-world financial crime cases and ethics standards, the benchmark comprises 14 subcategories spanning financial crimes and ethical violations. Through extensive experiments on general-purpose and finance-specialized LLMs under three representative attack settings, we identify critical vulnerabilities that allow adversarial prompts to bypass compliance safeguards. Further analysis reveals stronger susceptibility in Chinese contexts and highlights the limitations of prompt-level defenses against sophisticated or implicit manipulation strategies.

[46] Characterizing the Expressivity of Local Attention in Transformers

Jiaoda Li, Ryan Cotterell

Main category: cs.CL

Abstract: The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before generating the next token. One common variant of attention is called local attention, which restricts each token to aggregating information from a bounded window of predecessors, reducing the quadratic cost of global attention to linear. Although this restriction is usually motivated by efficiency, it has also been found to improve model quality, a phenomenon that has so far lacked a satisfactory explanation. We provide a formal account of this phenomenon in terms of recognizer expressivity. It has been shown that fixed-precision transformers with global attention correspond to a fragment of linear temporal logic containing a single past operator. We additionally prove that adding local attention introduces a second temporal operator, strictly enlarging the class of recognizable regular languages. Moreover, global and local attention are expressively complementary: neither subsumes the other, and combining them yields the richest fragment. Experiments on formal language recognition and natural language modeling corroborate the theory, showing that hybrid global–local transformers outperform their global-only counterparts.
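
The distinction between global and local attention comes down to the attention mask; a small sketch of ours (not the paper's formalism) makes the bounded look-back window concrete:

```python
import numpy as np

def attention_masks(seq_len, window):
    """Boolean masks (True = may attend) for causal global attention and
    causal local attention with a bounded look-back window."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    glob = j <= i                    # all preceding tokens (and self)
    loc = (j <= i) & (i - j < window)  # only the last `window` tokens
    return glob, loc

glob, loc = attention_masks(6, window=2)
print(loc.astype(int))
```

Each row of `loc` has at most `window` ones, which is what reduces the quadratic cost of global attention to linear in sequence length.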

[47] Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

Scott Friedman, Ruta Wheelock, Sonja Schmer-Galunder, Drisana Iverson, Jake Vasilakes, Joan Zheng, Jeffrey Rye, Vasanth Sarathy, Christopher Miller

Main category: cs.CL

Abstract: The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpfulness, compassion) and anti-social sentiment (e.g., threats, opposition, blame) at different topics, all in the same message. While many natural language processing (NLP) tools classify or score a text’s overall sentiment as positive, neutral, or negative, these tools cannot report that positive and negative sentiments coexist, and they cannot report the target of those sentiments. This paper presents the Directed Social Regard (DSR) approach to multi-dimensional, multi-valence sentiment analysis, comprised of a pair of transformer-based models that (1) detects span-level targets of sentiment in a message and then (2) scores all spans within the message context along three (-1, 1) axes of regard that are motivated by social science theories of moral disengagement and moral framing. We present a data collection and annotation strategy for DSR dataset construction, a transformer-based architecture for span-level scoring, and a validation study with promising results. We apply the validated DSR model on six third-party datasets of online media and report meaningful correlations between DSR outputs and the labels and topics in these pre-existing social science datasets.

[48] When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh

Main category: cs.CL

Abstract: Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a step-wise arithmetic algorithm and two numeric inputs, and must return the final computed value. The benchmark uses simple arithmetic operations but increases complexity through algorithm length and look-back dependencies over intermediate variables. Across 14 models and 55 datasets, average first-answer accuracy drops from 61% on 5-step procedures to 20% on 95-step procedures. Generation-level analysis shows that failures often involve missing answers, premature answers, self-correction after an initial error, under-executed traces, and hallucinated extra steps. These findings suggest that apparent reasoning ability can mask substantial weaknesses in faithful instruction execution.
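
A generator in the spirit of the described diagnostic can be sketched as follows (our own illustration; the benchmark's exact operations, ranges, and look-back scheme may differ):

```python
import random

def make_procedure(n_steps, lookback, seed=0):
    """Generate a step-wise arithmetic procedure over variables v0..vN,
    where step i may reference a variable up to `lookback` steps back,
    plus the ground-truth final value for checking a model's answer."""
    rng = random.Random(seed)
    values = [rng.randint(1, 9)]
    steps = [f"v0 = {values[0]}"]
    for i in range(1, n_steps + 1):
        ref = rng.randint(max(0, i - lookback), i - 1)  # look-back dependency
        op, k = rng.choice(["+", "*"]), rng.randint(1, 5)
        steps.append(f"v{i} = v{ref} {op} {k}")
        values.append(values[ref] + k if op == "+" else values[ref] * k)
    return steps, values[-1]

steps, answer = make_procedure(5, lookback=3)
print("\n".join(steps))
print("expected final value:", answer)
```

Scaling `n_steps` from 5 to 95 and widening `lookback` reproduces the two complexity axes the abstract describes, while the returned ground truth allows exact scoring of a model's first answer.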

[49] Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs

Dylan Bouchard

Main category: cs.CL

Abstract: Not available (fetching arXiv 2407.10853 failed with HTTP 429).

[50] Reinforcement Learning for LLM Post-Training: A Survey

Zhichao Wang, Kiran Ramnath, Bin Bi, Shiva Kumar Pentyala, Sougata Chaudhuri, Shubham Mehrotra, Zixu, Xiang-Bo Mao, Sitaram Asur, Cheng

Main category: cs.CL

Abstract: Not available (fetching arXiv 2407.16216 failed with HTTP 429).

[51] Bias in Large Language Models: Origin, Evaluation, and Mitigation

Yufei Guo, Muzhe Guo, Juntao Su, Zhou Yang, Mengqiu Zhu, Hongfei Li, Mengyang Qiu, Shuo Shuo Liu

Main category: cs.CL

Abstract: Not available (fetching arXiv 2411.10915 failed with HTTP 429).

[52] Representation in large language models

Cameron Yetman

Main category: cs.CL

Abstract: Not available (fetching arXiv 2501.00885 failed with HTTP 429).

[53] Exploring the System 1 Thinking Capability of Large Reasoning Models

Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu

Main category: cs.CL

Abstract: Not available (fetching arXiv 2504.10368 failed with HTTP 429).

[54] SCAN: Structured Capability Assessment and Navigation for LLMs

Zongqi Wang, Tianle Gu, Chen Gong, Xin Tian, Siqi Bao, Yujiu Yang

Main category: cs.CL

Abstract: Not available (fetching arXiv 2505.06698 failed with HTTP 429).

[55] (title unavailable)

Jatin Gupta, Akhil Sharma, Saransh Singhania, Ali Imam Abidi

Main category: cs.CL

Abstract: Not available (fetching arXiv 2505.22003 failed with HTTP 429).

[56] ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

Zexi Liu, Jingyi Chai, Xinyu Zhu, Shuo Tang, Rui Ye, Bo Zhang, Lei Bai, Siheng Chen

Main category: cs.CL

Abstract: The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. To address this, we explore, for the first time, the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action step, accelerating experience collection and improving training efficiency; (3) an agentic ML-specific reward module, which unifies varied ML feedback signals into consistent rewards for RL optimization. Leveraging this framework, we train ML-Agent, driven by a 7B-sized Qwen-2.5 LLM for autonomous ML. Despite training on only 9 ML tasks, our 7B-sized ML-Agent achieves comparable performance to agents using much larger proprietary LLMs (e.g., GPT-5) but at significantly lower computational cost, demonstrating strong performance and cross-task generalization.

[57] ToolGrad: Efficient Tool-use Dataset Generation with Textual “Gradients”

Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada

Main category: cs.CL

Abstract: Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data generation. We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual “gradients”, and then synthesizes corresponding user queries. This “answer-first” approach led to ToolGrad-500, a dataset generated with more complex tool use, lower cost, and almost 100% pass rate. Experiments show that ToolGrad models outperform those trained on expensive baseline datasets and proprietary LLMs. The ToolGrad source code, dataset, and models are available at https://github.com/zhongyi-zhou/toolgrad.

[58] InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

Anirudh Iyengar Kaniyar Narayana Iyengar, Srija Mukhopadhyay, Adnan Qidwai, Shubhankar Singh, Dan Roth, Vivek Gupta

Main category: cs.CL

Abstract: We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks focusing on isolated, visually uniform charts, InterChart challenges models with diverse question types ranging from entity inference and trend correlation to numerical estimation and abstract multi-step reasoning grounded in 2-3 thematically or structurally related charts. We organize the benchmark into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs. Our evaluation of state-of-the-art open- and closed-source VLMs reveals consistent and steep accuracy declines as chart complexity increases. We find that models perform better when we decompose multi-entity charts into simpler visual units, underscoring their struggles with cross-chart integration. By exposing these systematic limitations, InterChart provides a rigorous framework for advancing multimodal reasoning in complex, multi-visual environments.

[59] Reasoning-Intensive Regression

Diane Tchuindjo, Omar Khattab

Main category: cs.CL

Abstract: AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e., deducing subtle numerical scores from text. Unlike standard language regression tasks such as sentiment or similarity analysis, RiR often appears instead in ad-hoc applications such as rubric-based scoring, modeling dense rewards in complex environments, or domain-specific retrieval, where much deeper analysis of context is required while only limited task-specific training data and computation are available. We cast four realistic problems as RiR tasks to establish an initial benchmark, and use that to test our hypothesis that prompting frozen LLMs and fine-tuning Transformer encoders via gradient descent will both often struggle in RiR. We then propose MENTAT, a simple and lightweight method that combines batch-reflective prompt optimization with neural ensemble learning. MENTAT achieves up to 65% improvement over both baselines, though substantial room remains for future advances.

[60] Structured In-context Environment Scaling for Large Language Model Reasoning

Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen

Main category: cs.CL

Abstract: Large language models (LLMs) have achieved significant advancements in reasoning capabilities through reinforcement learning (RL) via environmental exploration. As the intrinsic properties of the environment determine the abilities that LLMs can learn, the environment plays an important role in the RL finetuning process. An ideal LLM reasoning environment should possess three core characteristics: scalability, generalizable reasoning, and verifiability. However, existing mathematical and coding environments are difficult to scale due to heavy reliance on expert annotation, while the skills learned in game-based environments are too specialized to generalize. To bridge this gap, we introduce the Structured In-context Environment (SIE) framework. SIE achieves scalability by automatically constructing reasoning environments from large-scale structured data, where the rich compositional patterns naturally support generalizable reasoning. Moreover, the explicit schemas and reasoning chains in structured data provide a foundation for rule-based verifiability. Experimental results show that the SIE framework not only achieves substantial improvements in in-domain structured reasoning, but also enables the learned compositional reasoning skills to generalize effectively to out-of-domain mathematical and logical reasoning tasks. We further explore learning in information-limited partial SIEs and find that LLMs can infer the missing information through exploring the environment, leading to robust reasoning improvements and generalization performance.

[61] Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

Haolin Yang, Hakaze Cho, Naoya Inoue

Main category: cs.CL

Abstract: We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Using steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.

[62] Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight

Haolin Yang, Hakaze Cho, Kaize Ding, Naoya Inoue

Main category: cs.CL

Abstract: Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility, acting effectively at arbitrary layers, positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of “key heads” most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.
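
The logit-steering effect the abstract describes can be illustrated with a toy linear readout (our own sketch; `W_unembed`, `tv`, and `alpha` are illustrative stand-ins, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_labels = 16, 4
W_unembed = rng.normal(size=(n_labels, d))  # toy readout (unembedding) matrix

# A toy "task vector": a unit vector aligned with label 2's readout row.
# Adding alpha * tv to a hidden state raises label 2's logit by exactly
# alpha * ||W_unembed[2]||, a linear steering effect.
tv = W_unembed[2] / np.linalg.norm(W_unembed[2])
alpha = 2.0

h = rng.normal(size=d)  # stand-in hidden state
delta = W_unembed @ (h + alpha * tv) - W_unembed @ h  # logit change
print(delta.round(3))
```

In a real Transformer the intervention would be applied at a chosen layer and position, and nonlinearities would make the effect only approximately linear, which is the regime the paper analyzes.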

[63] ADVICE: Answer-Dependent Verbalized Confidence Estimation

Ki Jung Seo, Sehun Lim, Taeuk Kim

Main category: cs.CL

Abstract: Recent progress in large language models (LLMs) has enabled them to communicate their confidence in natural language, improving transparency and reliability. However, this expressiveness is often accompanied by systematic overconfidence, whose underlying causes remain poorly understood. In this work, we analyze the dynamics of verbalized confidence estimation and identify answer-independence – the failure to condition confidence on the model’s own answer – as a primary driver of this behavior. To address this, we introduce ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that promotes answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration, while exhibiting strong generalization to unseen settings without degrading task performance. We further demonstrate that these gains stem from enhanced answer dependence, shedding light on the origins of overconfidence and enabling trustworthy confidence verbalization.
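
Confidence calibration of the kind ADVICE targets is commonly measured with Expected Calibration Error (ECE); a minimal sketch of ours (the paper's exact metrics may differ):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |accuracy - confidence| over equal-width confidence
    bins, weighted by the fraction of samples in each bin. Lower is better."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# A systematically overconfident model: says 0.9 but is right 60% of the time.
conf = [0.9] * 10
hit = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(expected_calibration_error(conf, hit))
```

An answer-independent model tends to emit the same high confidence regardless of its answer, which shows up as exactly this kind of large accuracy-confidence gap.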

[64] Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka

Main category: cs.CL

Abstract: We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often costly, inflexible, and opaque, we propose a memory-augmented framework that leverages LLM-generated critiques grounded in labeled data. Our framework uses episodic memory to store instance-level critiques (capturing specific past experiences) and semantic memory to distill these into reusable, task-level guidance. Across a diverse set of tasks and models, our best-performing self-critique strategy (utilizing both memory types) yields an average improvement of 8.1 percentage points over the zero-shot baseline, and 4.6 percentage points over a RAG-based baseline that relies only on labels. However, improvements vary substantially across models and domains. To explain this variation, we introduce suggestibility, a novel metric capturing how receptive a model is to external reasoning provided in context. We use suggestibility to illuminate when and why memory augmentation succeeds or falls short. Beyond accuracy gains, we find pre-computed critiques substantially reduce inference-time computation for reasoning models, cutting thinking tokens by an average of 31.95% across all datasets by substituting for reasoning that the model would otherwise perform independently. Our findings highlight the conditions under which memory-driven, reflective learning can serve as a lightweight, interpretable, and efficient strategy for improving LLM adaptability.

[65] PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Feijie Wu, Weiwu Zhu, Yuxiang Zhang, Soumya Chatterjee, Jiarong Zhu, Fan Mo, Rong Luo, Jing Gao

Main category: cs.CL

Abstract: Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents’ tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step’s importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final answer, plus an auxiliary term indicating whether the step’s tool calls satisfy formatting constraints and execute successfully. Using these step-wise importance estimates, PORTool updates the policy to generate efficient tool-call steps, guided by both local comparisons within each branching decision and the overall quality of entire trajectories. Experiments show that PORTool improves final-answer accuracy while reducing tool-call steps compared with state-of-the-art policy-optimization baselines, and ablation studies confirm the robustness of the proposed step-wise importance estimates.

[66] Peek2: Regex-free Byte-level Byte-Pair Encoding Pretokenizer for LLM Inference on Edge Devices

Liu Zai, Iraklis Klampanos

Main category: cs.CL

Abstract: Pretokenization is a crucial, sequential pass in Byte-level BPE tokenizers, yet little work has been done to optimize it for edge-side inference. Our proposed new implementation, Peek2, serves as a drop-in replacement for cl100k-like pretokenizers used in GPT-3, LLaMa-3, and Qwen-2.5. After breaking down and analyzing the logic of the original cl100k pretokenizer, we introduce a new pretokenization algorithm with linear time complexity and constant, trivial memory usage, suited for edge scenarios. Test results show that it increases microbenchmarking throughput by up to $2.48\times$ and delivers a $1.14\times$ improvement in overall throughput across the entire Byte-level BPE encoding process, depending on the dataset, while producing results identical to those of the baseline regex-based tokenizer.
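The flavor of a regex-free, single-pass pretokenizer can be sketched by splitting whenever the coarse Unicode category of the character changes. This is illustrative only: the real cl100k rules (contractions, leading spaces, digit grouping, etc.) are more involved, and Peek2's actual algorithm is not reproduced here:

```python
import unicodedata

def simple_pretokenize(text: str):
    """One linear pass, no regex: emit a new chunk whenever the coarse
    character class (letter / digit / space / other) changes.
    A toy sketch, not the cl100k or Peek2 rule set."""
    def kind(ch):
        cat = unicodedata.category(ch)[0]
        if cat == 'L':
            return 'L'          # letters
        if cat == 'N':
            return 'N'          # digits
        if ch.isspace():
            return 'S'          # whitespace
        return 'O'              # punctuation and everything else

    out, start = [], 0
    for i in range(1, len(text)):
        if kind(text[i]) != kind(text[i - 1]):
            out.append(text[start:i])
            start = i
    if text:
        out.append(text[start:])
    return out

print(simple_pretokenize("GPT-3 rocks"))  # ['GPT', '-', '3', ' ', 'rocks']
```

The loop touches each character once and keeps only two indices of state, which is the linear-time, constant-memory shape the abstract describes.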

[67] Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

Jen-tse Huang, Chang Chen, Shiyang Lai, Wenxuan Wang, Michelle R. Kaufman, Mark Dredze

Main category: cs.CL

Abstract: Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (MLLMs) have demonstrated impressive reasoning capabilities, their robustness against misinformation entangled with cognitive biases remains under-explored. In this paper, we introduce a comprehensive evaluation framework using a high-quality, manually annotated dataset of 200 short videos spanning four health domains. This dataset provides fine-grained annotations for three deceptive patterns (experimental errors, logical fallacies, and fabricated claims), each verified by evidence such as national standards and academic literature. We evaluate eight frontier MLLMs across five modality settings. Experimental results demonstrate that Gemini-2.5-Pro achieves the highest performance in the multimodal setting with a belief score of 71.5/100, while o3 performs the worst at 35.2. Furthermore, we investigate social cues that induce false beliefs in videos and find that models are susceptible to biases like authoritative channel IDs.

[68] Reward Modeling from Natural Language Human Feedback

Zongqi Wang, Rui Wang, Yuchuan Wu, Yiyao Yu, Pinyi Zhang, Shaoning Sun, Yujiu Yang, Yongbin Li

Main category: cs.CL

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically, in pairwise rewarding tasks, GRMs generate reasoning chains ending with critiques and preference labels, and RLVR then relies on the correctness of the preference labels as the training reward. However, in this paper, we demonstrate that such binary classification tasks make GRMs susceptible to guessing correct outcomes without sound critiques. Consequently, these spurious successes introduce substantial noise into the reward signal, thereby impairing the effectiveness of reinforcement learning. To address this issue, we propose Reward Modeling from Natural Language Human Feedback (RM-NLHF), which leverages natural language feedback to obtain process reward signals, thereby mitigating the problem of limited solution space inherent in binary tasks. Specifically, we compute the similarity between GRM-generated and human critiques as the training reward, which provides more accurate reward signals than outcome-only supervision. Additionally, considering that human critiques are difficult to scale up, we introduce a Meta Reward Model (MetaRM), which learns to predict process reward from datasets with human critiques and then generalizes to data without human critiques. Experiments on multiple benchmarks demonstrate that our method consistently outperforms state-of-the-art GRMs trained with outcome-only reward, confirming the superiority of integrating natural language over binary human feedback as supervision.
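The critique-similarity reward can be sketched concretely. The abstract does not pin down the similarity function, so a bag-of-words cosine similarity is used below purely as a hypothetical stand-in; a real system would more plausibly compare embeddings:

```python
from collections import Counter
import math

def critique_similarity_reward(model_critique: str, human_critique: str) -> float:
    """Cosine similarity between bag-of-words vectors of the two critiques.
    A hypothetical proxy for the paper's similarity-based process reward."""
    a = Counter(model_critique.lower().split())
    b = Counter(human_critique.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# A critique that matches the human one earns full reward; an unrelated one earns none.
r_match = critique_similarity_reward("response B omits the edge case",
                                     "response B omits the edge case")
r_diff = critique_similarity_reward("response B omits the edge case",
                                    "answer A is verbose")
```

Unlike a binary correct/incorrect label, this reward is graded, so a GRM that guesses the right preference with an unsound critique still scores low.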

[69] Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models

Zhaoyi Li, Jiatong Li, Gangwei Jiang, Linqi Song, Defu Lian, Ying Wei

Main category: cs.CL

Abstract: Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. However, recent studies reveal a sharp performance drop in reasoning hop generalization scenarios, where the required number of reasoning steps exceeds training distributions while the underlying algorithm remains unchanged. The internal mechanisms driving this failure remain poorly understood. In this work, we conduct a systematic study on tasks from multiple domains, and find that errors concentrate at token positions of a few critical error types, rather than being uniformly distributed. Closer inspection reveals that these token-level erroneous predictions stem from internal competition mechanisms: certain attention heads, termed erroneous processing heads (ep heads), tip the balance by amplifying incorrect reasoning trajectories while suppressing correct ones. Notably, removing individual ep heads during inference can often restore the correct predictions. Motivated by these insights, we propose test-time correction of reasoning, a lightweight intervention method that dynamically identifies and deactivates ep heads in the reasoning process. Extensive experiments across different tasks and LLMs show that it consistently improves reasoning hop generalization, highlighting both its effectiveness and potential.

Jinu Lee, Kyoung-Woon On, Simeng Han, Arman Cohan, Julia Hockenmaier

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2512.01020 returned HTTP 429).

[71] Can Small Language Models Handle Context-Summarized Multi-Turn Customer-Service QA? A Synthetic Data-Driven Comparative Evaluation

Lakshan Cooray, Deshan Sumanathilaka, Pattigadapa Venkatesh Raju

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.00665 returned HTTP 429).

[72] Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization

Runquan Gui, Jie Wang, Zhihai Wang, Chi Ma, Jianye Hao, Feng Wu

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.03141 returned HTTP 429).

[73] Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Dongwon Jo, Beomseok Kang, Jiwon Song, Jae-Joon Kim

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.03216 returned HTTP 429).

[74] Language Models Struggle to Use Representations Learned In-Context

Michael A. Lepori, Tal Linzen, Ann Yuan, Katja Filippova

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2602.04212 returned HTTP 429).

[75] Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood, Hongtai Wei, Sudeep Das, Danny Nightingale, Meg Watson, Charles Pollnow V

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2603.03565 returned HTTP 429).

[76] BanglaSocialBench: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction

Tanvir Ahmed Sijan, S. M Golam Rifat, Pankaj Chowdhury Partha, Md. Tanjeed Islam, Md. Musfique Anwar

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2603.15949 returned HTTP 429).

[77] Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

Yanchen Wu, Tenghui Lin, Yingli Zhou, Fangyuan Zhang, Qintian Guo, Xun Zhou, Sibo Wang, Xilin Liu, Yuchi Ma, Yixiang Fang

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.01707 returned HTTP 429).

[78] How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Gregory N. Frank

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.04385 returned HTTP 429).

[79] Turing or Cantor: That is the Question

Eugene Eberbach

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.10418 returned HTTP 429).

[80] On Cost-Effective LLM-as-a-Judge Improvement Techniques

Ryan Lail, Luke Markham

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.13717 returned HTTP 429).

[81] VGR: Visual Grounded Reasoning

Jiacong Wang, Zijian Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2506.11991 returned HTTP 429).

[82] SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning

Rania Elbadry, Sarfraz Ahmad, Ahmed Heakl, Dani Bouch, Momina Ahsan, Muhra AlMahri, Marwa Elsaid khalil, Yuxia Wang, Salem Lahlou, Sophia Ananiadou, Veselin Stoyanov, Jimin Huang, Xueqing Peng, Preslav Nakov, Zhuohan Xie

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.19098 returned HTTP 429).

[83] SCOPE: Planning for Hybrid Querying over Clinical Trial Data

Suparno Roy Chowdhury, Manan Roy Choudhury, Tejas Anvekar, Muhammad Ali Khan, Kaneez Zahra Rubab Khakwani, Mohamad Bassam Sonbol, Irbaz Bin Riaz, Vivek Gupta

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.25120 returned HTTP 429).

[84] From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

Alex Petrov, Alexander Gusak, Denis Mukha, Dima Korolev

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.27906 returned HTTP 429).

[85] FlowBot: Inducing LLM Workflows with Bilevel Optimization and Textual Gradients

Hongyeon Yu, Young-Bum Kim, Yoon Kim

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.26258 returned HTTP 429).

[86] LLMs Capture Emotion Labels, Not Emotion Uncertainty: Distributional Analysis and Calibration of Human-LLM Judgment Gaps

Keito Inoshita, Xiaokang Zhou, Akira Kawai, Katsutoshi Yada

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.27345 returned HTTP 429).

[87] Exploring Applications of Transfer-State Large Language Models: Cognitive Profiling and Socratic AI Tutoring

Minori Noguchi

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.27454 returned HTTP 429).

[88] Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

Ansar Aynetdinov, Patrick Haller, Alan Akbik

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.28075 returned HTTP 429).

[89] Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

Ziyuan Zhang, Darcy Wang, Ningyuan Chen, Rodrigo Mansur, Vahid Sarhangian

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2505.09901 returned HTTP 429).

[90] Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment

Kundan Krishna, Joseph Y Cheng, Charles Maalouf, Leon A Gatys

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2506.00166 returned HTTP 429).

[91] ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

Yiran Wu, Mauricio Velazco, Andrew Zhao, Manuel Raúl Meléndez Luján, Srisuma Movva, Yogesh K Roy, Quang Nguyen, Roberto Rodriguez, Qingyun Wu, Michael Albada, Julia Kiseleva, Anand Mudgerikar

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2507.14201 returned HTTP 429).

[92] Knowing When to Defer: Selective Prediction for Responsible Knowledge Tracing

Joshua Mitton, Prarthana Bhattacharyya, Ralph Abboud, Simon Woodhead

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2509.21514 returned HTTP 429).

[93] Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

Gregory N. Frank

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2603.18280 returned HTTP 429).

[94] Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

Wenshuo Zhao, Qi Zhu, Xingshan Zeng, Fei Mi, Lifeng Shang, Yi R. Fung

Main category: cs.CL

Abstract: Unavailable (the arXiv API request for 2604.26173 returned HTTP 429).

cs.CV

[95] Learning from the Unseen: Generative Data Augmentation for Geometric-Semantic Accident Anticipation

Yanchen Guan, Haicheng Liao, Chengyue Wang, Xingcheng Liu, Jiaxun Zhang, Keqiang Li, Zhenning Li

Main category: cs.CV

Abstract: Anticipating traffic accidents is a critical yet unresolved problem for autonomous driving, hindered by the inherent complexity of modeling interactions between road users and the limited availability of diverse, large-scale datasets. To address these issues, we propose a dual-path framework. On the one hand, we employ a video synthesis pipeline that, guided by structured prompts, derives feature distributions from existing corpora and produces high-fidelity synthetic driving scenes consistent with the statistical patterns of real data. On the other hand, we design a graph neural network enriched with semantic cues, enabling dynamic reasoning over both spatial and semantic relations among participants. To validate the effectiveness of our approach, we release a new benchmark dataset containing standardized, finely annotated video sequences that cover a broad spectrum of regions, weather, and traffic conditions. Evaluations across existing datasets and our new benchmark confirm notable gains in both accuracy and anticipation lead time, highlighting the capacity of the proposed framework to mitigate current data bottlenecks and enhance the reliability of autonomous driving systems.

[96] Two-View Accumulation as the Primary Training Lever for Hybrid-Capture Gaussian Splatting: A Variance-Decomposition View of When Gradient Surgery Helps

Sungjun Cho

Main category: cs.CV

Abstract: Hybrid-capture novel view synthesis combines images at substantially different camera distances (e.g., aerial drone and ground-level views). Standard 3D Gaussian Splatting (3DGS), trained for 30K iterations with one rendered view per optimizer step, under-fits the minority regime by 1-3 dB on five hybrid-capture benchmarks. We isolate the lever that closes this gap. Among compute-matched alternatives – vanilla 60K iterations, magnitude corrections (GradNorm), direction-aware near/far gradient surgery, projective preconditioning, confidence-gated sample-level surgery, and a random two-view-per-step control – the simplest structural change wins: rendering two views per optimizer step. The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any of the five scenes; the structural change of having two views per step does. We propose a variance-decomposition framework that predicts and explains this finding: under bimodal camera regimes, between-regime gradient variance turns out to be small relative to within-regime variance in 3DGS, so structured and random pairings are variance-equivalent in expectation, and the variance halving from two-view accumulation itself is the dominant effect. We verify the framework on five scenes whose camera-altitude bimodality coefficients span [0.55, 1.00], and we report the negative result that direction-aware projection, magnitude correction, confidence gating, and an active loss-disparity pairing all fall within seed variance of random two-view pairing. The two-view structural lever transfers cleanly to the Scaffold-GS and Pixel-GS backbones. We position this work as an honest characterization of which training-side axes do and do not move PSNR for hybrid-capture 3DGS, together with the framework that explains why.
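The central variance argument above (averaging the gradients of two views per optimizer step halves per-step gradient variance, regardless of how the pair is chosen, when between-regime variance is small) can be checked numerically in a toy setting. The scalar "gradients" below are illustrative stand-ins for per-parameter 3DGS gradients:

```python
import numpy as np

# Toy check: the mean of two i.i.d. gradient samples has half the variance
# of a single sample, which is the structural effect the paper identifies.
rng = np.random.default_rng(0)
grads = rng.normal(loc=1.0, scale=2.0, size=100_000)  # one regime's "gradients"

one_view_var = grads.var()                              # one rendered view per step
two_view_var = grads.reshape(-1, 2).mean(axis=1).var()  # two views averaged per step

ratio = one_view_var / two_view_var  # close to 2 for i.i.d. samples
```

When the two views come from different regimes whose mean gradients are close (small between-regime variance), the same halving holds to first order, which is why structured and random pairings end up variance-equivalent.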

[97] AIDA-ReID: Adaptive Intermediate Domain Adaptation for Generalizable and Source-Free Person Re-Identification

Sundas Iqbal, Qing Tian, Danish Ali, Jianping Gou, Weihua Oue

Main category: cs.CV

Abstract: Person re-identification (Re-ID) aims to match images of the same individual across non-overlapping camera views and remains challenging due to domain shifts caused by variations in illumination, background, camera characteristics, and population distributions. Although supervised models perform well under matched training and testing conditions, their performance degrades significantly when deployed in unseen environments. Existing intermediate domain approaches such as IDM and IDM++ alleviate this gap by constructing bridge feature distributions between domains; however, they rely on fixed mixing strategies and joint source-target access, limiting their applicability to multi-source and source-free settings. To address these limitations, this paper proposes Adaptive Intermediate Domain Adaptation (AIDA), also referred to as Source-Free Multi-Source Intermediate Domain Adaptation (SF-MIDA). The proposed framework treats intermediate-domain learning as a dynamically regulated process, where feature mixing and regularization strength are adaptively controlled using feedback signals derived from model uncertainty and training stability. A multi-source intermediate domain generator synthesizes diverse intermediate representations, while a pseudo-mirror regularization strategy preserves identity consistency under domain perturbations. Extensive experiments across domain generalization and source-free settings demonstrate the effectiveness of the proposed framework.

[98] GAFSV-Net: A Vision Framework for Online Signature Verification

Himanshu Singhal, Suresh Sundaram

Main category: cs.CV

Abstract: Online signature verification (OSV) requires distinguishing skilled forgeries from genuine samples under high intra-class variability and with very few enrollment samples. Existing deep learning methods operate directly on raw temporal sequences, restricting them to 1D architectures and preventing the use of pretrained 2D vision backbones. We bridge this gap with GAFSV-Net, which represents each signature as a six-channel asymmetric Gramian Angular Field image: three kinematic channels (pen speed, pressure derivative, direction angle) are each encoded into complementary GASF and GADF matrices that capture pairwise temporal co-occurrence and directional transition structure respectively. A dual-branch ConvNeXt-Tiny encoder processes GASF and GADF independently, with bidirectional cross-attention enabling each branch to query discriminative patterns from the other before metric-space projection. Training uses semi-hard triplet loss with skilled-forgery hard-negative injection; verification is performed via cosine similarity against a small enrollment prototype. We evaluate on DeepSignDB and BiosecurID, outperforming all sequence-based baselines trained under identical objectives, demonstrating that the representational gain of 2D temporal encoding is consistent and independent of training procedure, with ablations characterising each design choice’s contribution.
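The Gramian Angular Field construction at the heart of this pipeline is a standard transform and is easy to sketch. The rescaling and the four-point example below are illustrative; the paper's six-channel kinematic setup is not reproduced here:

```python
import numpy as np

def gramian_angular_fields(x):
    """Encode a 1D series as GASF/GADF images via the polar-angle trick."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1] so arccos is defined, then map values to angles.
    x_tilde = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x_tilde, -1.0, 1.0))
    # GASF captures pairwise co-occurrence: cos(phi_i + phi_j), symmetric.
    gasf = np.cos(phi[:, None] + phi[None, :])
    # GADF captures directional transitions: sin(phi_i - phi_j), antisymmetric.
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

gasf, gadf = gramian_angular_fields([0.0, 0.5, 1.0, 0.5])
```

GASF is symmetric while GADF is antisymmetric with a zero diagonal, which is why the two matrices carry complementary information and are processed by separate encoder branches.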

[99] Real-Time Frame- and Event-based Object Detection with Spiking Neural Networks on Edge Neuromorphic Hardware: Design, Deployment and Benchmark

Udayanga G. W. K. N. Gamage, Yan Zeng, Cesar Cadena, Matteo Fumagalli, Silvia Tolu

Main category: cs.CV


Abstract: Real-time object detection on energy-constrained platforms is critical for applications such as UAV-based inspection, autonomous navigation, and mobile robotics. Spiking neural networks (SNNs) on neuromorphic hardware are believed to be significantly more energy-efficient than conventional artificial neural networks (ANNs). In this work, we present a comprehensive methodology for designing general SNN detection architectures targeting neuromorphic platforms, along with the engineering adaptations required to deploy them on the state-of-the-art neuromorphic processor, Intel Loihi 2. We benchmark SNN-based object detection on Loihi 2 using both frame-based and event-based datasets, comparing performance with ANN-based detection on the NVIDIA Jetson Orin Nano, NVIDIA Jetson Nano B01, and the Apple M2 CPU. Our results show that SNNs on Loihi 2 can perform real-time detection while achieving the lowest per-inference dynamic energy among all platforms. Loihi 2 also outperforms the other platforms in terms of power consumption, though ANNs on the Jetson Orin Nano achieve higher inference rates. Furthermore, our ANN-to-SNN distillation-aware training enables SNNs to recover 87-100% of the detection accuracy of their ANN counterparts while maintaining lower inference latency; without distillation, SNNs exhibit an 11-27% accuracy drop. These results highlight the potential of neuromorphic systems for energy-efficient, real-time object detection at the edge.

[100] CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection

Hang Wang, Chao Shen, Chenhao Lin, Minghui Yang, Lei Zhang, Cong Wang

Main category: cs.CV


Abstract: The proliferation of advanced AI video synthesis techniques poses an unprecedented challenge to digital video authenticity. Existing AI-generated video (AIGV) detection methods primarily focus on uni-modal or spatiotemporal artifacts, but they overlook the rich cues within the visual-textual cross-modal space, especially the temporal stability of semantic alignment. In this work, we identify a distinctive fingerprint in AIGVs, termed cross-modal temporal artifact (CMTA). Unlike real videos that exhibit natural temporal fluctuations in cross-modal alignment due to semantic variations, AIGVs display unnaturally stable semantic trajectories governed by given input prompts. To bridge this gap, we propose the CMTA framework, a cross-modal detection approach that captures these unique temporal artifacts through joint cross-modal embedding and multi-grained temporal modeling. Specifically, CMTA leverages BLIP to generate frame-level image captions and utilizes CLIP to extract corresponding visual-textual representations. A coarse-grained temporal modeling branch is then designed to characterize temporal fluctuations in cross-modal alignment with a GRU. In parallel, a fine-grained branch is constructed to capture intricate inter-frame variations from integrated visual-textual features with a Transformer encoder. Extensive experiments on 40 subsets across four large-scale datasets, including GenVideo, EvalCrafter, VideoPhy, and VidProM, validate that our approach sets a new state-of-the-art while exhibiting superior cross-generator generalization. Code and models of CMTA will be released at https://github.com/hwang-cs-ime/CMTA
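
The core cue here, temporal stability of visual-textual alignment, can be illustrated with a minimal sketch. The embeddings below are toy vectors standing in for CLIP features, and the plain variance statistic is a simplification of the paper's coarse/fine temporal branches.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def alignment_trajectory(frame_embs, text_embs):
    """Per-frame visual-textual alignment and its temporal variance: real
    videos fluctuate semantically, prompt-driven AIGVs stay unnaturally flat."""
    sims = [cosine(f, t) for f, t in zip(frame_embs, text_embs)]
    mean = sum(sims) / len(sims)
    return sims, sum((s - mean) ** 2 for s in sims) / len(sims)

_, var_flat = alignment_trajectory([[1, 0], [1, 0]], [[1, 0], [1, 0]])  # "generated"
_, var_real = alignment_trajectory([[1, 0], [1, 1]], [[1, 0], [1, 0]])  # "real"
```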

[101] From Images2Mesh: A 3D Surface Reconstruction Pipeline for Non-Cooperative Space Objects

Bala Prenith Reddy Gopu, Patrick Quinn, George M. Nehma, Madhur Tiwari, Matt Ueckermann, David Hinckley, Christopher McKenna

Main category: cs.CV


Abstract: On-orbit inspection imagery is crucial as it enables characterization of non-cooperative resident space objects, providing the geometry and structural condition essential for active debris removal and on-orbit servicing mission planning. However, most existing neural implicit surface reconstruction methods have been confined to synthetic or hardware-in-the-loop data with known camera poses and controlled illumination. In this work, we present a pipeline for neural implicit surface reconstruction of non-cooperative space objects from monocular inspection imagery. We demonstrate it on publicly released ISS inspection footage from the STS-119 mission and publicly released on-orbit inspection footage of an H-IIA rocket upper stage. We find that segmentation-based background removal is essential for successful camera pose estimation from real on-orbit footage, where background variation between frames caused direct processing to fail entirely. We further incorporate photometric correction of per-frame exposure variations and analyze its behavior across datasets, finding that performance in shadowed regions varies with the illumination characteristics of the input footage.

[102] VkSplat: High-Performance 3DGS Training in Vulkan Compute

Jingxiang Chen, Mohamed Ibrahim, Yang Liu

Main category: cs.CV


Abstract: We present VkSplat, a high-performance, cross-vendor 3D Gaussian Splatting (3DGS) training pipeline implemented fully in Vulkan compute, addressing the performance and compatibility limitations of existing training pipelines. With various optimizations, we achieve a $3.3\times$ speedup and a $33\%$ VRAM reduction over a CUDA+PyTorch baseline, maintaining quality and demonstrating compatibility across GPU vendors. To the best of our knowledge, this is the first fully-Vulkan-based 3DGS training pipeline to achieve state-of-the-art performance. Code: https://github.com/harry7557558/vksplat

[103] Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis

Shadid Yousuf, S. M. Mahbubur Rahman, Mohammed Imamul Hassan Bhuiyan

Main category: cs.CV


Abstract: The deployment of artificial intelligence in medical imaging is hindered by high computational complexity and resource-intensive processing of volumetric data. Although chest computed tomography (CT) volumes offer richer diagnostic information than projection radiography, their use in AI-based diagnosis remains limited due to the computational burden of processing uncompressed volumetric images (typically stored in NIfTI or DICOM format). Addressing the growing need for low-resource deployment and efficient electronic data transfer, we investigate the utilization of JPEG-compressed chest CT volumes for thoracic abnormality detection. We propose Feature Attention Style Transfer (FAST), a novel distillation framework that transfers both activation patterns and structural relationships from high-fidelity CT representations to a spatiotemporal visual encoder operating on compressed inputs. By combining Gram-matrix-based attention style preservation with dual-attention feature alignment, FAST enables robust feature extraction from degraded volumes. Furthermore, we introduce Structured Factorized Projection (SFP), leveraging Block Tensor Train decomposition as a parameter-efficient alternative to dense projection layers, reducing projection-head parameters by almost half. Our contrastive learning pipeline, CT-Lite, integrates these components with a SigLIP-based multimodal alignment objective. Experiments on CT-RATE, NIDCH, and Rad-ChestCT demonstrate that CT-Lite achieves AUROC within 5-7% of the uncompressed-input baseline across all three datasets, despite operating on compressed inputs with significantly fewer parameters, paving the way for AI-based clinical evaluation under resource constraints.
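
A minimal sketch of the Gram-matrix style term the abstract mentions, assuming features are given as channel vectors; this illustrates the general Gram-based style-preservation idea rather than FAST's exact loss.

```python
def gram(features):
    # channel-by-channel inner products of a feature map (rows = channels)
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]

def style_loss(f_teacher, f_student):
    """Gram-matrix style term: mean squared difference between the
    channel-correlation structures of teacher and student features."""
    gt, gs = gram(f_teacher), gram(f_student)
    n = len(gt) * len(gt[0])
    return sum((a - b) ** 2 for rt, rs in zip(gt, gs) for a, b in zip(rt, rs)) / n

zero = style_loss([[1, 2], [3, 4]], [[1, 2], [3, 4]])   # identical features
gap = style_loss([[1, 2], [3, 4]], [[0, 0], [0, 0]])    # maximally mismatched
```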

[104] Adaptive Geodesic Conformal Prediction for Egocentric Camera Pose Estimation

Aishani Pathak, Hasti Seifi

Main category: cs.CV


Abstract: Egocentric pose estimation for Augmented Reality (AR) and assistive devices requires not just accurate predictions but guaranteed uncertainty regions. Conformal prediction (CP) provides such guarantees without retraining, but we show that standard CP with a single fixed threshold achieves nominal 90% overall coverage while covering only ~60% of the hardest 25% of frames (Q4) – a ~30 percentage-point conditional coverage gap consistent across 12 participants, 3 predictors, and 3 horizons (108 evaluations) on EPIC-Fields. We further show that a geodesic SE(3) nonconformity score identifies physically harder frames than Euclidean scoring, with only 15-26% Q4 overlap and 2-3x higher ground-truth camera displacement for geodesic Q4 frames. To close the coverage gap, we propose DINOv2-Bridge adaptive CP: a two-stage difficulty estimator trained on a single source participant that transfers cross-participant without any images at test time, improving Q4 coverage from ~0.75 to ~0.93 while maintaining overall coverage at the 90% target.
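
Standard split conformal calibration, which the paper builds on, can be sketched in a few lines; the toy scores below stand in for geodesic SE(3) pose errors on a held-out calibration set.

```python
import math

def conformal_quantile(scores, alpha=0.1):
    """Split conformal prediction: the ceil((n+1)(1-alpha))-th smallest
    calibration score is a threshold with >= 1-alpha marginal coverage."""
    n = len(scores)
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(scores)[k - 1]

# toy calibration scores (think: geodesic pose errors on calibration frames)
cal = [0.1 * i for i in range(1, 100)]   # 99 scores
q = conformal_quantile(cal, alpha=0.1)   # rank ceil(100 * 0.9) = 90
covered = 4.2 <= q                        # region contains this test frame
```

The paper's observation is that a single such threshold gives marginal coverage only: hard frames (large true errors) are systematically under-covered, which motivates the difficulty-adaptive variant.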

[105] MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

Xijia Wei, Yuan Fang, Kevin Chetty, Youngjun Cho, Nadia Bianchi-Berthouze

Main category: cs.CV


Abstract: Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, discarding the rich spatiotemporal information naturally present in radar video streams, while the required signal processing adds system complexity. In addition, existing solutions are mainly trained in an end-to-end supervised manner without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns spatiotemporal, motion-aware, generalized representations from unlabelled radar video and leverages its heatmap decoder for multi-frame pose estimation. We evaluate it across three datasets using leave-one-person-out cross-validation with rigorous statistical testing. MAEPose consistently outperforms state-of-the-art baselines by up to 22.1% in MPJPE (p<0.05), and maintains robust accuracy under zero-shot bystander interference with only a 6.5% error increase. Ablation studies confirm that both the pre-training and the heatmap decoder contribute substantially, while modality analysis indicates that Range-Doppler video as input achieves better pose estimation than Range-Azimuth or their fusion, at lower computational cost.

[106] Remote SAMsing: From Segment Anything to Segment Everything

Osmar Luiz Ferreira de Carvalho, Osmar Abílio de Carvalho Júnior, Anesmar Olino de Albuquerque, Daniel Guerreiro e Silva

Main category: cs.CV


Abstract: SAM2 produces high-quality zero-shot segmentation on natural images, but applying it to large remote sensing scenes exposes two problems: (1) its mask generator faces an inherent quality-coverage trade-off: strict thresholds yield precise masks but leave most of the image unsegmented, while relaxed thresholds increase coverage at the cost of mask quality; and (2) large images must be tiled, fragmenting objects across tile boundaries. We propose Remote SAMsing, an open-source pipeline that solves both problems without modifying SAM2 or requiring training data. For coverage, a multi-pass algorithm runs SAM2 repeatedly on each tile, painting accepted masks black between passes to simplify the scene for the next iteration, and relaxing quality thresholds only when coverage gains stagnate, ensuring that the most precise masks are always captured first. For spatial consistency, contextual padding and a parameter-free best-match merge reconstruct objects fragmented across tile boundaries. Evaluated on seven scenes (5cm to 4.78m GSD), the pipeline raises coverage from 30–68% (single-pass SAM2) to 91–98%. Ablation experiments quantify the contribution of each component to coverage and detection quality. Per-class evaluation shows that SAM2 transfers well to discrete RS objects (buildings 95%, cars 82–93% Det@0.5) with segment boundaries 3–8$\times$ more precise than SLIC and Felzenszwalb baselines. Tile size functions as an implicit scale parameter: reducing it from $1{,}000$ to 250 raises Det@0.5 from 56% to 85%, outperforming SAM2’s built-in multi-scale mechanism. The pipeline generalizes to MNF false-color imagery without retraining (99.5% ASA) and scales to production-sized images: a 1.94 billion pixel Potsdam mosaic achieved 97% coverage without quality degradation.
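
The multi-pass coverage loop can be sketched with a stand-in segmenter (the real pipeline calls SAM2); the threshold-relaxation rule below is a simplified reading of the abstract, and every name here is illustrative.

```python
def multi_pass_segment(image, segment_fn, thresholds, max_passes=5, min_gain=0.01):
    """Multi-pass loop: accept masks, paint them black so the next pass sees a
    simpler scene, and relax the quality threshold only when coverage stalls.
    `segment_fn(image, thr)` is a stand-in for a SAM2 mask generator."""
    h, w = len(image), len(image[0])
    covered = [[False] * w for _ in range(h)]
    thr_idx, prev_cov = 0, 0.0
    masks = []
    for _ in range(max_passes):
        for m in segment_fn(image, thresholds[thr_idx]):
            masks.append(m)
            for r, c in m:
                covered[r][c] = True
                image[r][c] = 0          # paint out accepted regions
        cov = sum(map(sum, covered)) / (h * w)
        if cov - prev_cov < min_gain and thr_idx + 1 < len(thresholds):
            thr_idx += 1                 # relax threshold when gains stagnate
        prev_cov = cov
    return masks, prev_cov

# stand-in segmenter on a 2x2 "image": the strict pass finds one confident
# pixel region, the relaxed pass sweeps up whatever is left un-painted
def stub_segmenter(img, thr):
    if thr >= 0.9:
        return [[(0, 0)]] if img[0][0] != 0 else []
    px = [(r, c) for r in range(2) for c in range(2) if img[r][c] != 0]
    return [px] if px else []

masks, coverage = multi_pass_segment([[1, 1], [1, 1]], stub_segmenter, [0.9, 0.5])
```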

[107] REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception

Vincenzo Polizzi, David B. Lindell, Jonathan Kelly

Main category: cs.CV


Abstract: Event cameras provide several unique advantages over standard frame-based sensors, including high temporal resolution, low latency, and robustness to extreme lighting. However, existing learning-based approaches for event processing are typically confined to narrow, task-specific silos and lack the ability to generalize across modalities. We address this gap with REALM, a cross-modal framework that learns an RGB and Event Aligned Latent Manifold by projecting event representations into the pretrained latent space of RGB foundation models. Instead of task-specific training, we leverage low-rank adaptation (LoRA) to bridge the modality gap, effectively unlocking the geometric and semantic priors of frozen RGB backbones for asynchronous event streams. We demonstrate that REALM effectively maps events into the ViT-based foundation latent space. Our method allows us to perform downstream tasks like depth estimation and semantic segmentation by simply transferring linear heads trained on the RGB teacher. Most significantly, REALM enables the direct, zero-shot application of complex, frozen image-trained decoders, such as MASt3R, to raw event data. We demonstrate state-of-the-art performance in wide-baseline feature matching, significantly outperforming specialized architectures. Code and models are available upon acceptance.
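
The low-rank adaptation mechanism REALM relies on is standard: a frozen weight's output is augmented with a trainable low-rank update. A minimal dense sketch with toy shapes (not the actual ViT layers):

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale=1.0):
    """Frozen path plus trainable low-rank update: y = W x + scale * B (A x),
    where A is r x d_in and B is d_out x r with small rank r."""
    base = matvec(W, x)                  # frozen backbone weight
    update = matvec(B, matvec(A, x))     # low-rank trainable correction
    return [b + scale * u for b, u in zip(base, update)]

# toy shapes: frozen 2x2 identity, rank-1 adapter routing input[0] to output[1]
y = lora_forward([2.0, 3.0], W=[[1, 0], [0, 1]], A=[[1, 0]], B=[[0], [1]])
```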

[108] When Do Diffusion Models learn to Generate Multiple Objects?

Yujin Jeong, Arnas Uselis, Iro Laina, Seong Joon Oh, Anna Rohrbach

Main category: cs.CV


Abstract: Text-to-image diffusion models achieve impressive visual fidelity, yet they remain unreliable in multi-object generation. Despite extensive empirical evidence of these failures, the underlying causes remain unclear. We begin by asking how much of this limitation arises from the data itself. To disentangle data effects, we consider two regimes across different dataset sizes: (1) concept generalization, where each individual concept is observed during training under potentially imbalanced data distributions, and (2) compositional generalization, where specific combinations of concepts are systematically held out. To study these regimes, we introduce mosaic (Multi-Object Spatial relations, AttrIbution, Counting), a controlled framework for dataset generation. By training diffusion models on mosaic, we find that scene complexity plays a dominant role rather than concept imbalance, and that counting is uniquely difficult to learn in low-data regimes. Moreover, compositional generalization collapses as more concept combinations are held out during training. These findings highlight fundamental limitations of diffusion models and motivate stronger inductive biases and data design for robust multi-object compositional generation.

[109] An End-to-End Decision-Aware Multi-Scale Attention-Based Model for Explainable Autonomous Driving

Maryam Sadat Hosseini Azad, Shahriar Baradaran Shokouhi, Amir Abbas Hamidi Imani, Shahin Atakishiyev, Randy Goebel

Main category: cs.CV


Abstract: Computer vision is increasingly applied across various domains, typically through deep learning models that are black boxes by nature. Without the ability to explain the behavior of neural networks, especially their decision-making processes, it is not possible to assess their effectiveness, predict system failures, or deploy them effectively in real-world applications. Due to the inevitable use of deep learning in fully automated driving systems, many methods have been proposed to explain their behavior; however, they suffer from flawed reasoning and unreliable metrics, which have prevented a comprehensive understanding of complex models in autonomous vehicles and hindered the development of truly reliable systems. In this study, we propose a multi-scale attention-based model in which driving decisions are fed into the reasoning component to provide case-specific explanations for each decision simultaneously. For quantitative evaluation of our model's performance, we employ the F1-score metric and also propose a new metric, the Joint F1 score, to demonstrate the accurate and reliable performance of the model in terms of Explainable Artificial Intelligence (XAI). In addition to the BDD-OIA dataset, the nu-AR dataset is utilized to further validate the generalization capability and robustness of the proposed network. The results demonstrate the superiority of our reasoning network over classic and state-of-the-art models.

[110] Efficient Spatio-Temporal Vegetation Pixel Classification with Vision Transformers

Alan Gomes, Anderson Gonçalves, Samuel Felipe dos Santos, Nathan Felipe Alves, Magna Soelma Beserra de Moura, Bruna de Costa Alberton, Leonor Patricia C. Morellato, Ricardo da Silva Torres, Jurandy Almeida

Main category: cs.CV


Abstract: Plant phenology-the study of recurrent life cycle events-is essential for understanding ecosystem dynamics and their responses to climate change impacts. While Unmanned Aerial Vehicles (UAVs) and near-surface cameras enable high-resolution monitoring, identifying plant species across time remains computationally challenging. State-of-the-art approaches, specifically Multi-Temporal Convolutional Networks (CNNs), rely on rigid multi-branch architectures that scale poorly with longer time series and require large spatial context windows. In this paper, we present an extensive study on optimizing Vision Transformers (ViTs) for efficient spatio-temporal vegetation pixel classification. We conducted a comprehensive ablation study analyzing seven key design dimensions, including: (i) data normalization; (ii) spectral arrangement; (iii) boundary handling; (iv) spatial context window shape and size; (v) tokenization strategies; (vi) positional encoding; and (vii) feature aggregation strategies. Our method was evaluated on two datasets from the Brazilian Cerrado biome, Serra do Cipó (aerial imagery) and Itirapina (near-surface imagery). Experimental results demonstrate that our ViT approach offers a substantial improvement in computational efficiency while maintaining competitive classification performance. Notably, our ViT reduces Floating Point Operations (FLOPs) by an order of magnitude and maintains constant parameter complexity regardless of the time series length, whereas the CNN baseline scales linearly. Our findings confirm that ViTs are a robust, scalable solution for resource-constrained phenological monitoring systems.

[111] Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration

Zhili Li, Kangyang Chai, Zhihao Wang, Xiaowei Jia, Yanhua Li, Gengchen Mai, Sergii Skakun, Dinesh Manocha, Yiqun Xie

Main category: cs.CV


Abstract: Super-resolution (SR) techniques have made major advances in reconstructing high-resolution images from low-resolution inputs. The increased resolution provides visual enhancement and utility for monitoring tasks. In particular, SR has been increasingly developed for satellite-based Earth observation, with applications in urban planning, agriculture, ecology, and disaster response. However, existing SR studies and benchmarks typically use fidelity metrics such as PSNR or SSIM, whereas the true utility of super-resolved images lies in supporting downstream tasks such as land cover classification, biomass estimation, and change detection. To bridge this gap, we introduce GeoSR-Bench, a downstream task-integrated SR benchmark dataset to evaluate SR models beyond fidelity metrics. GeoSR-Bench comprises spatially co-located, temporally aligned, and quality-controlled image pairs from about 36,000 locations across diverse land covers, spanning resolutions from 500m to 0.6m. To the best of our knowledge, GeoSR-Bench is the first SR benchmark that directly connects improved image resolution from SR models with downstream Earth monitoring tasks, including land cover segmentation, infrastructure mapping, and biophysical variable estimation. Using GeoSR-Bench, we benchmark GAN, transformer, neural operator, and diffusion-based SR models on perceptual quality and downstream task performance. We conduct experiments with 270 settings, covering 2 cross-platform SR tasks, 9 SR models, 3 downstream task models, and 5 downstream tasks for each SR task. The results show that improvements in traditional SR metrics often do not correlate with gains in task performance, and the correlations can be negative, indicating that these metrics provide limited guidance for selecting superior models for downstream tasks. This reveals the need to integrate downstream tasks into SR model development and evaluation.

[112] Online Self-Calibration Against Hallucination in Vision-Language Models

Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si

Main category: cs.CV


Abstract: Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Perception Mismatch: the student model is forced to align with fine-grained details beyond its perceptual capacity, learning to guess rather than to see. To obtain reliable self-supervision for online learning, we identify a Generative-Discriminative Gap within LVLMs, where models exhibit higher accuracy on discriminative verification than open-ended generation. Leveraging this capability, we propose \textbf{O}nline \textbf{S}elf-\textbf{CA}lib\textbf{R}ation (OSCAR), a framework that integrates Monte Carlo Tree Search with a Dual-Granularity Reward Mechanism to construct preference data and iteratively refines the model via Direct Preference Optimization. Extensive experiments demonstrate that OSCAR achieves state-of-the-art performance on hallucination benchmarks while improving general multimodal capabilities.
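
The Direct Preference Optimization objective used in the final refinement step is well established and can be sketched per preference pair; the log-probabilities below are toy values, not model outputs.

```python
import math

def dpo_loss(lp_chosen, lp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO objective for one preference pair: negative log-sigmoid of the
    scaled margin between policy and reference log-probabilities."""
    margin = beta * ((lp_chosen - ref_chosen) - (lp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

no_pref = dpo_loss(-1.0, -1.0, -1.0, -1.0)   # zero margin -> log 2
aligned = dpo_loss(-0.5, -1.5, -1.0, -1.0)   # chosen gained, rejected lost
```

The loss falls as the policy, relative to its frozen reference, assigns more probability to the preferred response than to the rejected one, which is what OSCAR's self-generated preference pairs drive during iterative refinement.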

[113] Pose-Aware Diffusion for 3D Generation

Zihan Zhou, Luxi Chen, Jingzhi Zhou, Yuhao Wan, Min Zhao, Baoyu Fan, Chongxuan Li

Main category: cs.CV


Abstract: Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.

[114] CURE-OOD: Benchmarking Out-of-Distribution Detection for Survival Prediction

Wenjie Zhao, Jia Li, Mingrui Liu, Jing Wang, Yunhui Guo

Main category: cs.CV


Abstract: "How long can I live and remain free of cancer?" is often the first question a patient asks after receiving a cancer diagnosis and treatment. Accurate survival prediction helps alleviate psychological distress and supports risk stratification and personalized treatment planning. Recent survival prediction frameworks have shown strong performance using computed tomography (CT) images. However, variations in imaging acquisition introduce out-of-distribution (OOD) samples caused by covariate shifts that undermine model reliability. Despite this challenge, to our knowledge, no existing benchmark systematically studies OOD detection in cancer survival prediction. To address this gap, we introduce the Cancer sURvival bEnchmark for OOD Detection (CURE-OOD), the first benchmark for systematically evaluating OOD detection in survival prediction under controlled acquisition-induced distribution shifts. CURE-OOD defines scanner-parameter-based training, in-distribution (ID), and OOD test splits across four survival prediction tasks. Our experiments show that covariate shifts notably reduce survival prediction performance, and that mainstream classification-oriented OOD detectors can fail in survival prediction. Finally, we include HazardDev as a simple survival-aware reference baseline for OOD detection. CURE-OOD enables systematic analysis of how distribution shifts affect both downstream survival performance and OOD detectability.

[115] Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking

Nhat-Tan Do, Le-Huy Tu, Nhi Ngoc-Yen Nguyen, Dieu-Phuong Nguyen, Trong-Hop Do

Main category: cs.CV


Abstract: Multi-object tracking (MOT) is critical in numerous real-world applications, including surveillance, autonomous driving, and robotics. Accurately predicting object motion is fundamental to MOT, but current methods struggle with the complexities of real-world, non-linear motion (e.g., sudden stops, sharp turns). While recent research has gravitated towards increasingly complex and computationally expensive generative models to tackle this problem, their practical utility is often constrained. This paper challenges that paradigm, arguing that such complexity is not only unnecessary but can be outperformed by a more efficient, purpose-built approach. We introduce the Temporal Convolutional Motion Predictor (TCMP), a novel framework for MOT that leverages a modified Temporal Convolutional Network (TCN) featuring dilated convolutions and a regression head. This design allows for effective motion prediction across arbitrary temporal context lengths. Experimental results demonstrate that our approach achieves state-of-the-art performance, improving upon the previous best method in several key metrics: HOTA (a measure of overall tracking accuracy) increases from 62.3% to 63.4%, IDF1 (a measure of identity preservation) rises from 63.0% to 65.0%, and AssA (a measure of association accuracy) improves from 47.2% to 49.1%. Significantly, TCMP achieves this performance while being highly efficient: it has only 0.014 times the parameters and requires only 0.05 times the computational cost (FLOPs) of the SOTA method. These findings highlight the robustness of our method to advance MOT systems by ensuring adaptability, accuracy, and efficiency in complex tracking environments.
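
The building block of a TCN, a causal dilated convolution, can be sketched in a few lines; kernel weights and input are toy values, not the paper's trained model.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1-D convolution (left zero padding): output[t] depends
    only on x[t], x[t - d], x[t - 2d], ... and never on future frames."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            acc += w * (x[j] if j >= 0 else 0.0)
        out.append(acc)
    return out

# dilation 2 with a length-2 kernel sums each frame with the one two steps back
y = causal_dilated_conv1d([1, 2, 3, 4, 5], kernel=[1.0, 1.0], dilation=2)
```

Stacking such layers with growing dilations lets the receptive field cover arbitrary context lengths without adding parameters, which is where the efficiency advantage over generative motion models comes from.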

[116] Flow matching for Sentinel-2 super-resolution: implementation, application, and implications

Dakota Hester, Vitor S. Martins, Lucas B. Ferreira, Thainara M. A. Lima, Juliana A. Araújo

Main category: cs.CV


Abstract: Developing robust techniques for super-resolution of satellite imagery involves navigating commonly observed trade-offs between spectral fidelity and perceptual quality. In this work, we introduce a flow matching model for 4x super-resolution of 10-m Sentinel-2 visible and near-infrared bands over the conterminous United States (CONUS) using a dataset of 120,851 10-m Sentinel-2 and 2.5-m resampled NAIP imagery pairs acquired on the same day. Our results showed that the flow matching model outperformed diffusion and Real-ESRGAN models in pixel-wise accuracy in a single sampling step using the Euler method. When evaluated with a second-order Midpoint solver, our model generated perceptually realistic super-resolved imagery in only 20 sampling steps, effectively navigating the perception-distortion trade-off at inference time without retraining. We used this model to produce a super-resolved 2.5-m 4-band CONUS imagery product derived from 2025 10-m Sentinel-2 annual composites, consisting of over 1.58 trillion pixels. We further evaluated the use of super-resolved data on a land cover classification task using semantic segmentation models. Finally, we generated a yearly 2.5-m land cover product for the Chesapeake Bay watershed for 2020-2025. An accuracy assessment against 25,000 ground truth points revealed an overall accuracy of 89.11% for the annual land cover product. We conclude that flow matching is an effective generative modeling approach for super-resolution of Sentinel-2 imagery compared to diffusion and Generative Adversarial Network-based methods, and has strong implications for expanding access to high-resolution imagery for geospatial applications that demand fine spatial detail.
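
The sampling trade-off described (one Euler step versus ~20 second-order midpoint steps) comes down to how the learned velocity field is integrated; below is a minimal ODE-integration sketch with a toy linear field in place of the trained model.

```python
def sample_flow(v, x0, steps, solver="euler"):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed step size.
    `v` is a stand-in for the learned flow-matching velocity field."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        if solver == "euler":
            x = x + dt * v(x, t)
        else:  # second-order midpoint solver
            x_mid = x + 0.5 * dt * v(x, t)
            x = x + dt * v(x_mid, t + 0.5 * dt)
    return x

# toy linear field v(x, t) = x: the exact flow maps x0 -> x0 * e
x1_euler = sample_flow(lambda x, t: x, 2.0, steps=1)                   # coarse: 4.0
x1_mid = sample_flow(lambda x, t: x, 2.0, steps=20, solver="midpoint")  # near 2e
```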

[117] RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

Ben Wan, Yan Feng, Zihan Tang, Weizhe Huang, Yuting Zeng, Jia Wang, Tongxuan Liu

Main category: cs.CV

Abstract: DeepSeek-OCR leverages visual-text compression to reduce long-text processing costs and accelerate inference, yet its visual tokens still carry redundant textual and structural information. Moreover, current token pruning methods for conventional vision-language models (VLMs) fail to preserve textual fidelity due to improper compression mechanisms. By analyzing the decoding process of DeepSeek-OCR, we find a distinct two-stage reading trajectory: the model initially prioritizes the majority of high-norm tokens, then subsequently redistributes its attention to the remaining ones. Motivated by this insight, we propose RTPrune, a two-stage token pruning method tailored for DeepSeek-OCR. In the first stage, we prioritize high-norm visual tokens that capture salient textual and structural information. In the second stage, the remaining tokens are paired and merged based on optimal transport theory to achieve efficient feature aggregation. We further introduce a dynamic pruning ratio that adapts to token similarity and textual density for OCR tasks, enabling a better efficiency-accuracy trade-off. Extensive experiments demonstrate state-of-the-art performance, as evidenced by 99.47% accuracy and 1.23$\times$ faster prefill on OmniDocBench, achieved with 84.25% token retention when applied to DeepSeek-OCR-Large.
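
A rough sketch of the two-stage idea (keep the high-norm tokens, then merge the remainder). The pairing here is a naive consecutive-pair average standing in for the paper's optimal-transport matching, and all names are illustrative:

```python
import numpy as np

def prune_tokens(tokens, keep_ratio=0.8):
    """Stage 1: keep the highest-L2-norm tokens; Stage 2: merge leftovers."""
    n, d = tokens.shape
    norms = np.linalg.norm(tokens, axis=1)
    order = np.argsort(-norms)            # indices sorted by descending norm
    n_keep = int(n * keep_ratio)
    kept = tokens[order[:n_keep]]
    rest = tokens[order[n_keep:]]
    # Naive pairing: average consecutive leftovers two at a time.
    # (RTPrune pairs tokens via optimal transport; this is a simplification.)
    if len(rest) % 2:
        rest = rest[:-1]
    merged = rest.reshape(-1, 2, d).mean(axis=1)
    return np.concatenate([kept, merged], axis=0)
```

With `keep_ratio=0.5` on six tokens, three survive stage 1 and one merged token emerges from stage 2, so four tokens reach the decoder instead of six.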

[118] SIMON: Saliency-aware Integrative Multi-view Object-centric Neural Decoding

YuSheng Lin, Ji-Hwa Tsai, Chun-Shu Wei

Main category: cs.CV

Abstract: Recent EEG-to-image retrieval methods leverage pretrained vision encoders and foveation-inspired priors, but typically assume a fixed, center-focused view. This center bias conflicts with content-driven human attention, creating a geometric-semantic dissociation between visual features and EEG responses. We propose SIMON, a saliency-aware multi-view framework for zero-shot EEG-to-image retrieval. SIMON combines foreground segmentation and saliency prediction to select fixation centers via Saliency-Aware Sampling (SAS), then generates foveated views that emphasize informative object regions while suppressing background clutter. On THINGS-EEG, SIMON achieves state-of-the-art performance in both intra-subject and inter-subject settings, reaching an average Top-1 accuracy of 69.7% and 19.6%, respectively, consistently outperforming recent competitive baselines. Analyses across sampling granularity, EEG channel topology, and visual/brain encoder backbones further support the robustness of saliency-aware multi-view integration. Our code and models are publicly available at https://github.com/simonlink666/SIMON.

[119] BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception

Kang Yang, Tianci Bu, Peng Wang, Deying Li, Yongcai Wang

Main category: cs.CV

Abstract: Most existing heterogeneous cooperative perception methods depend on prior preparation like offline joint training or tailored collaborator-model adaptation. Such preprocessing is, however, generally impractical in real scenarios, as agents are usually independently trained by different developers and meet occasionally online. This work investigates \emph{preparation-free heterogeneous cooperative perception}, where agents use independently trained single-agent detectors without any pre-deployment coordination. We find direct cross-agent fusion under this setting greatly underperforms ego-only perception. We present BOLT, a lightweight plug-and-play module that adapts neighboring features online via ego-as-teacher distillation, requiring only ego predictions without ground-truth labels. BOLT leverages high-confidence ego perception features to guide cross-agent feature-domain alignment, while enabling neighbors to contribute features in the ego’s low-confidence regions. With only 0.9M trainable parameters, BOLT improves AP@50 by up to 32.3 points over vanilla unadapted fusion in the preparation-free setting. It consistently outperforms ego-only results on DAIR-V2X and OPV2V, across different encoder pairs and fusion strategies. Code: https://github.com/sidiangongyuan/BOLT.

[120] Beyond Heuristics: Learnable Density Control for 3D Gaussian Splatting

Zhenhua Ning, Xin Li, Jun Yu, Guangming Lu, Yaowei Wang, Wenjie Pei

Main category: cs.CV

Abstract: While 3D Gaussian Splatting (3DGS) has demonstrated impressive real-time rendering performance, its efficacy remains constrained by a reliance on heuristic density control. Despite numerous refinements to these handcrafted rules, such methods inherently lack the flexibility to adapt to diverse scenes with complex geometries. In this paper, we propose a paradigm shift for density control from rigid heuristics to fully learnable policies. Specifically, we introduce \textbf{LeGS}, a framework that reformulates density control as a parameterized policy network optimized via Reinforcement Learning (RL). Central to our approach is the tailored effective reward function grounded in sensitivity analysis, which precisely quantifies the marginal contribution of individual Gaussians to reconstruction quality. To maintain computational tractability, we derive a closed-form solution that reduces the complexity of reward calculation from $O(N^2)$ to $O(N)$. Extensive experiments on the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets demonstrate that \textbf{LeGS} significantly outperforms state-of-the-art methods, striking a superior balance between reconstruction quality and efficiency. The code will be released at https://github.com/AaronNZH/LeGS

[121] LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations

Huangbiao Xu, Huanqi Wu, Xiao Ke, Yuxin Peng

Main category: cs.CV

Abstract: Real-world multimodal learning is often hindered by missing modalities. While Incomplete Multimodal Learning (IML) has gained traction, existing methods typically rely on the unrealistic assumption of full-modal availability during training to provide reconstruction supervision or cross-modal priors. This paper tackles the more challenging setting of IML under training-time incomplete observations, which precludes reliance on a "God's eye view" of complete data. We propose LIMSSR (LLM-Driven Incomplete Multimodal Sequence-to-Score Reasoning), a framework that reformulates this challenge as a conditional sequence reasoning task. LIMSSR leverages the semantic reasoning capabilities of Large Language Models via Prompt-Guided Context-Aware Modality Imputation and Multidimensional Representation Fusion to infer latent semantics from available contexts without direct reconstruction. To mitigate hallucinations, we introduce a Mask-Aware Dual-Path Aggregation to dynamically calibrate inference uncertainty. Extensive experiments on three Action Quality Assessment datasets demonstrate that LIMSSR significantly outperforms state-of-the-art baselines without relying on complete training data, establishing a new paradigm for data-efficient multimodal learning. Code is available at https://github.com/XuHuangbiao/LIMSSR.

[122] Scaling Video Understanding via Compact Latent Multi-Agent Collaboration

Kerui Chen, Jinglu Wang, Jianrong Zhang, Ming Li, Yan Lu, Hehe Fan

Main category: cs.CV

Abstract: Multi-modal large language models (MLLMs) advance vision language understanding but face inherent limitations in long-video tasks due to bounded perception context budgets. Existing agentic methods mitigate this via rule-based preprocessing, yet often suffer from information loss, high cost, and reliance on textual intermediates. We propose MACF, an end-to-end Multi-Agent Collaboration Framework that decouples per-agent perception budgets from global video complexity, enabling scalable video understanding while preserving visual fidelity. MACF partitions videos into segments for locally budgeted agents and enables holistic reasoning via an agent-native latent communication protocol. Each agent encodes partial observations into compact, task-sufficient tokens in a shared embedding space, allowing efficient and information-preserving collaboration by a central coordinator. We introduce a curriculum training strategy that progressively enforces semantic alignment, evidence summarization, and cross-agent coordination. Extensive experiments on diverse video understanding benchmarks show that MACF consistently outperforms state-of-the-art MLLMs and multi-agent systems under identical budget constraints, demonstrating the effectiveness of our latent collaboration for scalable video understanding.

[123] From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models

Yearim Kim, Sangyu Han, Nojun Kwak

Main category: cs.CV

Abstract: Modern vision models achieve remarkable accuracy, but explaining where evidence arises, what the model encodes, and how internal computations assemble that evidence remains fragmented. We introduce an iERF-centric framework that unifies local, global, and mechanistic interpretability around a single analysis unit: the pointwise feature vector (PFV) paired with its instance-specific Effective Receptive Field (iERF). On the local side, Sharing Ratio Decomposition (SRD) expresses each PFV as a mixture of upstream PFVs via sharing ratios and propagates iERFs to construct class-discriminative saliency maps. SRD yields high-resolution, activation-faithful explanations, is robust to targeted manipulation and noise, and remains activation-agnostic across common nonlinearities. For the global view, we introduce Concept-Anchored Feature Explanation (CAFE), which utilizes the iERF as a semantic label, grounding abstract latent vectors in verifiable pixel-level evidence. With CAFE, we address the challenge of non-localized sparse autoencoder latents, especially in Transformers, where early self-attention mixes distant context. To answer how representations are composed through depth, we propose the Interlayer Concept Graph with Interlayer Concept Attribution (ICAT), which quantifies concept-to-concept influence while isolating layer pairs; an interlayer insertion-deletion protocol identifies Integrated Gradients as the most faithful instantiation. Empirically, across ResNet50, VGG16, and ViTs, our framework outperforms baselines in both fidelity and robustness, successfully interprets dispersed SAE features, and exposes dominant concept routes in correct, misclassified, and adversarial cases. Grounded in iERFs, our approach provides a coherent, evidence-backed map from pixels to concepts to decisions.

[124] Leveraging Vision-Language Models as Weak Annotators in Active Learning

Phuong Ngoc Nguyen, Kaito Shiku, Ryoma Bise, Seiichi Uchida, Shinnosuke Matsuo

Main category: cs.CV

Abstract: Active learning aims to reduce annotation cost by selectively querying informative samples for supervision under a limited labeling budget. In this work, we investigate how vision-language models (VLMs) can be leveraged to further reduce the reliance on costly human annotation within the active learning paradigm. To this end, we find that the reliability of VLMs varies significantly with label granularity in fine-grained recognition tasks: they perform poorly on fine-grained labels but can provide accurate coarse-grained labels. Leveraging this property, we propose an active learning framework that combines fine-grained human annotations with coarse-grained VLM-generated weak labels through instance-wise label assignment. We further model the systematic noise in VLM-generated labels using a small set of trusted full labels. Experiments on CUB200 and FGVC-Aircraft show that the proposed framework consistently outperforms existing active learning methods under the same annotation budget.

[125] High-Speed Vision Improves Zero-Shot Semantic Understanding of Human Actions

Yongpeng Cao, Yuji Yamakawa

Main category: cs.CV

Abstract: Understanding human actions from visual observations is essential for human–robot interaction, particularly when semantic interpretation of unfamiliar or hard-to-annotate actions is required. In scenarios such as rapid and less common activities, collecting sufficient labeled data for supervised learning is challenging, making zero-shot approaches a practical alternative for semantic understanding without task-specific training. While recent advances in large-scale pretrained models enable such zero-shot reasoning, the impact of temporal resolution, especially for rapid and fine-grained motions, remains underexplored. In this study, we investigate how temporal resolution affects zero-shot semantic understanding of high-speed human actions. Using kendo as a representative case of rapid and subtle motion patterns, we propose a training-free pipeline that combines a pre-trained video-language model for semantic representation with large language model-based reasoning for pairwise action comparison. Through controlled experiments across multiple frame rates (120 Hz, 60 Hz, and 30 Hz), we show that higher temporal resolution significantly improves semantic separability in zero-shot settings. We further analyze the role of tracking-based human joint information under both full and partial observation scenarios. Quantitative evaluation using a nearest-class prototype strategy demonstrates that high-speed video provides more stable and interpretable semantic representations for fast actions. These findings highlight the importance of temporal resolution in training-free action recognition and suggest that high-speed perception can enhance semantic understanding capabilities.
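
The nearest-class prototype evaluation mentioned above amounts to assigning each embedding to the closest class-mean vector. A minimal sketch with illustrative class names (the real prototypes would be averaged video-language embeddings):

```python
import numpy as np

def nearest_prototype(query, prototypes):
    """Return the class whose prototype embedding is closest (L2) to `query`.

    `prototypes` maps class name -> mean embedding for that class.
    """
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))
```

Higher temporal resolution improving "semantic separability" then means queries land farther from the wrong prototypes and closer to the right one under exactly this rule.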

[126] GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space

Yonghao Zhao, Yupeng Gao, Jian Yang, Jin Xie, Beibei Wang

Main category: cs.CV

Abstract: Recent advances in Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made it standard practice to reconstruct 3D scenes from multi-view images. Removing objects from such 3D representations is a fundamental editing task that requires complete and seamless inpainting of occluded regions, ensuring consistency in geometry and appearance. Although existing methods have made notable progress in improving inpainting consistency, they often neglect global lighting effects, leading to physically implausible results. Moreover, these methods struggle with view-dependent non-Lambertian surfaces, where appearance varies across viewpoints, leading to unreliable inpainting. In this paper, we present 3D Gaussian Object Removal in the Intrinsic Space (GOR-IS), a novel framework for physically consistent and visually coherent 3D object removal. Our approach decomposes the scene into intrinsic components and explicitly models light transport to maintain the consistency of global lighting effects. Furthermore, we introduce an intrinsic-space inpainting module that operates directly in the material and lighting domains, effectively addressing the challenges posed by non-Lambertian surfaces. Extensive experiments on both synthetic and real-world datasets demonstrate that our framework substantially improves the physical consistency and visual coherence of object removal, outperforming existing methods by 13% in perceptual similarity (LPIPS) and 2dB in peak signal-to-noise ratio (PSNR). Code is publicly available at https://applezyh.github.io/GOR-IS-project-page/
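
For reference, the PSNR metric behind the reported 2 dB gain is the standard log-scale reconstruction measure:

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(img) - np.asarray(ref)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because the scale is logarithmic, a 2 dB improvement corresponds to the mean squared error shrinking by a factor of 10^(2/10) ≈ 1.58.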

[127] End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

Wenda Chu, Bingliang Zhang, Jiaqi Han, Yizhuo Li, Linjie Yang, Yisong Yue, Qiushan Guo

Main category: cs.CV

Abstract: Autoregressive image modeling relies on visual tokenizers to compress images into compact latent representations. We design an end-to-end training pipeline that jointly optimizes reconstruction and generation, enabling direct supervision from generation results to the tokenizer. This contrasts with prior two-stage approaches that train tokenizers and generative models separately. We further investigate leveraging vision foundation models to improve 1D tokenizers for autoregressive modeling. Our autoregressive generative model achieves strong empirical results, including a state-of-the-art FID score of 1.48 without guidance on ImageNet 256x256 generation.

[128] PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation

Nan Lei, Yuan-Ming Li, Ling-An Zeng, Liang Xu, Zhi-Wei Xia, Hui-Wen Huang, Fa-Ting Hong, Wei-Shi Zheng

Main category: cs.CV

Abstract: Despite substantial progress in text-driven 3D human motion synthesis, generating realistic multi-person interaction sequences remains challenging. Notably, body inter-penetration is a pervasive issue from both data acquisition to the generated results, which significantly undermines the realism and usability. Previous generative models either ignored this issue or introduced computationally expensive mesh-level loss functions to alleviate inter-body collisions. In this paper, we propose a general-purpose and computationally efficient optimization strategy named PhysiGen to explicitly integrate collision-aware physical constraints for human-human interaction generation. Specifically, we simplify the high-resolution human body mesh into geometric primitives to greatly reduce the cost of inter-person collision detection. Moreover, we identify the collision regions as the guidance of the optimization directions. PhysiGen is plug-and-play and can be readily integrated into existing human interaction generation models. Extensive cross-dataset and cross-model experiments show that our method can effectively reduce interpenetration and significantly improve visual coherence and physical plausibility compared to the state-of-the-art methods.
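
In its simplest form, the primitive-based collision check underlying this strategy reduces to sphere-sphere penetration depth; this sketch is an assumed simplification for illustration, not the authors' code:

```python
import numpy as np

def sphere_penetration(c1, r1, c2, r2):
    """Penetration depth of two spheres; 0.0 means no collision.

    Replacing mesh-mesh tests with checks like this over a small set of
    body-approximating primitives is what makes collision detection cheap.
    """
    dist = np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    return max(0.0, (r1 + r2) - dist)
```

A positive return value identifies a collision region and gives a direction (toward separating the centers) along which an optimizer can push the interacting bodies apart.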

[129] IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations

Weichen Liu, Yixin Yang, Changsheng Chen, Alex Kot

Main category: cs.CV

Abstract: Suspect face generation remains a technical challenge in crime investigations. Traditional sketch-drawing workflows suffer from low efficiency and quality, while diffusion-based approaches still face intrinsic limitations: conditional ambiguity for text-to-image models and sampling variance for one-shot generation. We propose IdentiFace, a novel diffusion-based framework for identifiable suspect face generation, which addresses these issues through (1) a multi-modal input design to strengthen conditional control, and (2) an iterative generation pipeline enabling identifiable feature adjustment. We additionally contribute a facial identity loss and two task-specific datasets. Comprehensive experiments on synthetic datasets and in real-world scenarios indicate that IdentiFace achieves superior performance over existing methods, especially in terms of identity retrieval, and shows strong potential for practical applications.

[130] Vesselpose: Vessel Graph Reconstruction from Learned Voxel-wise Direction Vectors in 3D Vascular Images

Rajalakshmi Palaniappan, Christoph Karg, Nemesio Navarro-Arambula, Peter Hirsch, Kristin Kraeker, Lisa Mais, Dagmar Kainmueller

Main category: cs.CV

Abstract: Blood vessel segmentation and tracing are essential tasks in many medical imaging applications. Although numerous methods exist, the prevailing segment-then-fix paradigm is fundamentally limited in its suitability for modeling the task of complete and topologically accurate vascular network reconstruction. Here, we propose an approach to extract topologically more accurate vascular graphs from 3D image data, building upon highly successful ideas from the related biomedical tasks of cell segmentation and tracking. Our approach first predicts voxel-wise vessel direction vectors jointly with standard vessel segmentation masks. Second, to extract the vascular graph from these predictions, we introduce a direction-vector-guided extension of the TEASAR algorithm. Our approach achieves state-of-the-art performance on three benchmark datasets, spanning both synthetic and real imagery. We further demonstrate the applicability of our approach to challenging 3D micro-CT scans of rat heart vasculature. Finally, we propose meaningful and interpretable measures of topological error, namely false splits and false merges for graphs. Overall, our approach substantially improves the topological accuracy of reconstructed vascular graphs, being able to separate closely apposed vessel segments and handle multiple vascular trees within a single volume.

[131] Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation

Nadav Z. Cohen, Ofir Abramovich, Ariel Shamir

Main category: cs.CV

Abstract: Text-to-image diffusion models generate images by gradually converting white Gaussian noise into a natural image. White Gaussian noise is well suited for producing diverse outputs from a single text prompt due to its absence of structure. However, this very property limits control over, and predictability of, specific visual attributes, as the noise is not human-interpretable. In this work, we investigate the characteristics of the input noise in diffusion models. We show that, although all frequencies in white Gaussian noise have comparable statistical energy, low-frequency components primarily determine the image's global structure and color composition, while high-frequency components control finer details. Building on this observation, we demonstrate that simple manipulations of the low-frequency noise using low-frequency image priors can effectively condition the generation process to reconstruct these low-frequency visual cues. This allows us to define a simple, training-free method with minimal overhead that steers overall image structure and color, while letting high-frequency components freely emerge as fine details, enabling variability across generated outputs.
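
The low-frequency manipulation the abstract describes can be sketched as a frequency-domain swap; `inject_low_freq` and its `cutoff` parameter are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def inject_low_freq(noise, prior, cutoff=0.1):
    """Replace the low-frequency band of Gaussian noise with an image prior's.

    `noise` and `prior` are 2D arrays of equal shape; `cutoff` is the radial
    frequency (cycles/sample) below which the prior takes over.
    """
    fn, fp = np.fft.fft2(noise), np.fft.fft2(prior)
    h, w = noise.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low = np.sqrt(fy ** 2 + fx ** 2) < cutoff   # low-frequency mask (incl. DC)
    fn[low] = fp[low]                           # swap in the prior's low band
    return np.fft.ifft2(fn).real
```

Because only the band below `cutoff` is overwritten, the result inherits the prior's global structure and color (low frequencies) while the remaining Gaussian high frequencies leave room for diverse fine details.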

[132] Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds

Heejoon Moon, Jongwoo Lee, Jeonggon Kim, Je Hyeong Hong

Main category: cs.CV

Abstract: The emergence of deep neural networks capable of revealing high-fidelity scene details from sparse 3D point clouds has raised significant privacy concerns in visual localization involving private maps. Lifting map points to randomly oriented 3D lines is a well-known approach for obstructing undesired recovery of the scene images, but these lines are vulnerable to a density-based attack that can recover the point cloud geometry by observing the neighborhood statistics of lines. With the aim of nullifying this attack, we present a new privacy-preserving scene representation called the \emph{sphere cloud}, which is constructed by lifting all points to 3D lines crossing the centroid of the map, resembling points on the unit sphere. Since lines are most dense at the map centroid, the sphere cloud misleads the density-based attack algorithm into incorrectly yielding points at the centroid, effectively neutralizing the attack. Nevertheless, this advantage comes at the cost of i) a new type of attack that may directly recover images from this cloud representation and ii) an unresolved translation scale for camera pose estimation. To address these issues, we introduce a simple yet effective cloud construction strategy to thwart the new attack and propose an efficient localization framework that resolves the translation scale by utilizing absolute depth maps acquired from on-device time-of-flight (ToF) sensors. Experimental results on public RGB-D datasets demonstrate that the sphere cloud achieves competitive privacy-preserving ability and localization runtime while not excessively compromising pose estimation accuracy compared to other depth-guided localization methods.

[133] 2D-SuGaR: Surface-Aware Gaussian Splatting for Geometrically Accurate Mesh Reconstruction

Prajwal Gupta C. R., Divyam Sheth, Jinjoo Ha, Mirela Ostrek, Justus Thies

Main category: cs.CV

Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for generating photorealistic renderings of a scene in real-time. However, the volumetric nature of 3DGS limits its ability to accurately capture surface geometry. To address this, 2D Gaussian Splatting (2DGS) was proposed to enable view-consistent and geometrically accurate surface reconstruction from multi-view images. However, 2DGS can be sensitive to the initialization of the Gaussian primitives. Reliance on Structure-from-Motion (SfM) initializations, which can produce poor estimates on challenging image sets, may lead to subpar results. In this work, we enhance 2DGS by incorporating monocular depth and normal priors to improve both geometric accuracy and robustness. We propose a depth-guided initialization strategy for Gaussians and introduce a clustering-based technique for pruning degenerate Gaussians. We evaluate our method on the DTU dataset, where it achieves state-of-the-art results in mesh reconstruction while preserving high-quality novel view synthesis.

[134] Federated Distillation for Whole Slide Image via Gaussian-Mixture Feature Alignment and Curriculum Integration

Luru Jing, Cong Cong, Yanyuan Chen, Yongzhi Cao

Main category: cs.CV

Abstract: Federated learning (FL) offers a promising framework for collaborative digital pathology by enabling model training across institutions. However, real-world deployments face heterogeneity arising from diverse multiple instance learning (MIL) architectures and differing feature extractors across institutions. We propose FedHD, a novel FL framework that performs local Gaussian-mixture feature alignment tailored for whole slide image (WSI) analysis. Instead of exchanging model parameters, each client independently distills semantically rich synthetic feature representations aligned with the distribution of real WSIs. To preserve diagnostic diversity, FedHD adopts a one-to-one distillation strategy, generating a synthetic counterpart for each real slide to avoid over-compression. During federation, a curriculum-based integration strategy progressively incorporates cross-site synthetic features into local training once performance plateaus. Furthermore, an optional interpretation module reconstructs pseudo-patches from synthetic embeddings, enhancing transparency. FedHD is architecture-agnostic, privacy-preserving, and supports personalized yet collaborative training across diverse institutions. Experiments on TCGA-IDH, CAMELYON16, and CAMELYON17 show that FedHD consistently outperforms state-of-the-art federated and distillation baselines.

[135] Jailbreaking Vision-Language Models Through the Visual Modality

Aharon Azulay, Jan Dubiński, Zhuoyun Li, Atharv Mittal, Yossi Gandelsman

Main category: cs.CV

Abstract: The visual modality of vision-language models (VLMs) is an underexplored attack surface for bypassing safety alignment. We introduce four jailbreak attacks exploiting the vision component: (1) encoding harmful instructions as visual symbol sequences with a decoding legend, (2) replacing harmful objects with benign substitutes (e.g., bomb -> banana) then prompting for harmful actions using the substitute term, (3) replacing harmful text in images (e.g., on book covers) with benign words while visual context preserves the original meaning, and (4) visual analogy puzzles whose solution requires inferring a prohibited concept. Evaluating across six frontier VLMs, our visual attacks bypass safety alignment and expose a cross-modality alignment gap: text-based safety training does not automatically generalize to harmful intent conveyed visually. For example, our visual cipher achieves 40.9% attack success on Claude-Haiku-4.5 versus 10.7% for an equivalent textual cipher. To further our insight into the attack mechanism, we present preliminary interpretability and mitigation results. These findings highlight that robust VLM alignment requires treating vision as a first-class target for safety post-training.

[136] Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models

Jiayu Li, Jiaxin Qi, Sheng Zhou, Jiaqiang Huang, Xiansheng Hua

Main category: cs.CV

Abstract: Contrastive vision-language models like CLIP exhibit remarkable zero-shot generalization. However, prompt tuning remains highly sensitive to label noise, as mislabeled samples generate disproportionately large gradients that can overwhelm pre-trained priors. We argue that because CLIP already provides a near-optimal initialization, adaptation should be inherently conservative, particularly against the extreme gradient updates common in noisy settings. To this end, we propose Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free method for intrinsic gradient suppression. By applying a sequential probabilistic normalization, DSPT induces a self-adaptive saturation zone that suppresses gradients from high-error noisy samples while maintaining informative updates. We also provide both theoretical analysis and empirical evidence about how this mechanism achieves adaptive suppression. This design transforms "gradient vanishing", traditionally a training bottleneck, into a principled noise-filtering shield for label-noise prompt tuning. Extensive experiments confirm that this simple, drop-in design achieves state-of-the-art robustness across various noisy benchmarks, outperforming methods with complex architectures and handcrafted hyperparameters.
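
The core mechanism is easy to sketch: chaining two softmaxes bounds the second softmax's inputs to [0, 1], so extreme (likely mislabeled) logits can no longer produce extreme probabilities. A minimal illustration, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def double_softmax(logits):
    """Sequential softmax: the second pass sees inputs bounded in [0, 1],
    so outlier logits saturate instead of dominating the distribution."""
    return softmax(softmax(logits))
```

Because the first softmax compresses logits into [0, 1], the second operates in a flat regime for outliers; the gradient contributed by an extreme logit is correspondingly damped, which is the "saturation zone" the abstract describes.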

[137] Robust Fusion of Object-Level V2X for Learned 3D Object Detection

Lukas Ostendorf, Lennart Reiher, Onn Haran, Lutz Eckstein

Main category: cs.CV

Abstract: Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by line-of-sight and field-of-view constraints. These inherent limitations may cause onboard perception to fail under occlusions or poor visibility conditions. In parallel, cooperative awareness via vehicle-to-everything (V2X) communication is becoming increasingly available, enabling vehicles and infrastructure to share their own state as object-level information that complements onboard perception. In this work, we study how such V2X information can be integrated into 3D object detection and how robust the resulting system is to realistic V2X imperfections. Using the nuScenes dataset, we emulate object-level cooperative awareness messages from ground truth, injecting controlled noise and object dropout to mimic real-world conditions such as latency, localization errors, and low V2X penetration rates. We convert these messages into a dedicated bird’s-eye view (BEV) input and fuse them into a BEVFusion-style detector. Our results demonstrate that while object-level cooperative information can substantially improve detection performance, achieving an NDS of 0.80 under favorable conditions, models trained on idealized data become fragile and over-reliant on V2X. Conversely, our proposed noise-aware training strategy, coupled with explicit confidence encoding, enhances robustness, maintaining performance gains even under severe noise and reduced V2X penetration.
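
The noise-injection idea can be sketched as follows (parameter names and values are assumptions for illustration, not the paper's settings): jitter object positions to mimic localization error and latency, and drop objects to mimic low V2X penetration.

```python
import numpy as np

rng = np.random.default_rng(0)

def emulate_v2x(objects, pos_sigma=0.5, drop_rate=0.3):
    """Toy degradation model for emulated cooperative awareness messages:
    drop a fraction of objects, then add Gaussian noise to the surviving
    object centers (columns 0-1); size columns are left untouched."""
    keep = rng.random(len(objects)) >= drop_rate
    noisy = objects[keep].copy()
    noisy[:, :2] += rng.normal(0.0, pos_sigma, size=(len(noisy), 2))
    return noisy

clean = np.zeros((1000, 4))        # [x, y, length, width] per object
degraded = emulate_v2x(clean)
```

Training on such degraded messages, rather than idealized ones, is what the abstract credits with avoiding over-reliance on V2X.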

[138] Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors

Hao Wei, Yanhui Zhou, Chenyang Ge, Saeed Anwar, Ajmal Mian

Main category: cs.CV

Abstract: Most recent extreme rescaling methods struggle to preserve semantically consistent structures and produce realistic details, due to the severely ill-posed nature of low- to high-resolution mapping under scaling factors of 16× or higher. To alleviate these problems, we propose FaithEIR, a diffusion-based framework for extreme image rescaling. Inspired by singular value decomposition, we develop a learnable reversible transformation that enables invertible downscaling and upscaling in the latent space. To compensate for information loss due to quantization, we propose an adaptive detail prior, a high-frequency dictionary that captures the empirical average of commonly occurring structures in the training data. Finally, we design a lightweight pixel semantic embedder to provide semantic conditioning for the pretrained diffusion model. We present extensive experimental results demonstrating that FaithEIR consistently outperforms state-of-the-art methods, achieving superior reconstruction fidelity and perceptual quality. Our code, model weights, and detailed results are released at https://github.com/cshw2021/FaithEIR.
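
The SVD inspiration can be made concrete with a plain (non-learned) analogue: a truncated SVD yields a compact code plus stored orthogonal factors that invert it exactly whenever the input's rank fits the budget. The paper's transformation is learned; this is only the classical reference point.

```python
import numpy as np

def svd_downscale(x, k):
    # keep the top-k singular directions as the compact code,
    # storing the right factor needed for inversion
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]

def svd_upscale(code, Vt_k):
    return code @ Vt_k

# A rank-2 "image": down- then up-scaling with k=2 is lossless.
x = (np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
     + np.outer([0.0, 1.0, 0.0], [1.0, 1.0, 0.0]))
code, basis = svd_downscale(x, 2)
restored = svd_upscale(code, basis)
```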

[139] BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis

Massimo Rondelli, Francesco Pivi, Maurizio Gabbrielli

Main category: cs.CV

Abstract: Automatic generation of executable Blender code from natural language remains challenging, with state-of-the-art LLMs producing frequent syntactic errors and geometrically inconsistent objects. We present BlenderRAG, a retrieval-augmented generation system that operates on a curated multimodal dataset of 500 expert-validated examples (text, code, image) across 50 object categories. By retrieving semantically similar examples during generation, BlenderRAG improves compilation success rates from 40.8% to 70.0% and semantic normalized alignment from 0.41 to 0.77 (CLIP similarity) across four state-of-the-art LLMs, without requiring fine-tuning or specialized hardware, making it immediately accessible for deployment. The dataset and code will be available at https://github.com/MaxRondelli/BlenderRAG.
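
The retrieval step reduces to nearest-neighbor search in embedding space. A minimal sketch (embedding vectors here are toy stand-ins for whatever encoder BlenderRAG uses):

```python
import numpy as np

def retrieve_examples(query_vec, example_vecs, k=3):
    # cosine similarity between the query embedding and each stored
    # (text, code, image) example; return indices of the top-k matches
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    return np.argsort(-(E @ q))[:k]

# Toy 2-D "embeddings": example 0 matches the query exactly, example 2 nearly.
examples = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
top = retrieve_examples(np.array([1.0, 0.0]), examples, k=2)
```

The retrieved examples would then be prepended to the LLM prompt before code generation.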

[140] UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao

Main category: cs.CV

Abstract: Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unified multimodal framework that leverages VDM priors for versatile video generation. UniVidX formulates pixel-aligned tasks as conditional generation in a shared multimodal space, adapts to modality-specific distributions while preserving the backbone’s native priors, and promotes cross-modal consistency during synthesis. It is built on three key designs. Stochastic Condition Masking (SCM) randomly partitions modalities into clean conditions and noisy targets during training, enabling omni-directional conditional generation instead of fixed mappings. Decoupled Gated LoRA (DGL) introduces per-modality LoRAs that are activated when a modality serves as the generation target, preserving the strong priors of the VDM. Cross-Modal Self-Attention (CMSA) shares keys and values across modalities while keeping modality-specific queries, facilitating information exchange and inter-modal alignment. We instantiate UniVidX in two domains: UniVid-Intrinsic, for RGB videos and intrinsic maps including albedo, irradiance, and normal; and UniVid-Alpha, for blended RGB videos and their constituent RGBA layers. Experiments show that both models achieve performance competitive with state-of-the-art methods across distinct tasks and generalize robustly to in-the-wild scenarios, even when trained on fewer than 1,000 videos. Project page: https://houyuanchen111.github.io/UniVidX.github.io/
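
Stochastic Condition Masking can be sketched at the level of whole modalities (the real method operates on latent tokens; names below are illustrative): each training step draws a random non-empty proper subset of modalities to serve as clean conditions, and the rest become noisy generation targets.

```python
import random

def stochastic_condition_mask(modalities, rng):
    """Toy SCM sketch: randomly partition modalities so that at least
    one is a clean condition and at least one is a noisy target,
    enabling omni-directional conditional generation."""
    n_cond = rng.randint(1, len(modalities) - 1)
    cond = set(rng.sample(modalities, n_cond))
    return {m: ("condition" if m in cond else "target") for m in modalities}

roles = stochastic_condition_mask(
    ["rgb", "albedo", "normal", "irradiance"], random.Random(0))
```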

[141] InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization

Jaeyoung Chung, Suyoung Lee, Kyoung Mu Lee

Main category: cs.CV

Abstract: We present a training-free approach for controllable 3D inpainting based on initial noise optimization. In the structured 3D latent diffusion framework, we observe that the underlying geometric structure is established during the early stages of the diffusion process and exhibits high sensitivity to the initial noise. Such characteristics compromise stability in tasks like inpainting and editing, where the model must ensure strict alignment with the existing context while synthesizing a new structure. In this paper, we introduce a strategy to optimize the initial noise within the structured 3D latent diffusion framework, ensuring high-fidelity 3D inpainting. Specifically, we update the initial noise by leveraging a backpropagation approximation grounded in the rectified flow model, with the spectral parameterization specially designed for robust and efficient structured 3D latent optimization. Experiments demonstrate consistent improvements in contextual consistency and prompt alignment over representative training-free inpainting baselines, establishing initial noise control as an independent dimension for 3D inpainting, orthogonal to conventional sampling trajectory manipulation.

[142] Prediction of Alzheimer’s Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Seowung Leem, Yunchao Yang, Adam J. Woods, Ruogu Fang

Main category: cs.CV

Abstract: The systemic, metabolic, lifestyle factors have established associations with Alzheimer’s Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether colored fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. To determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using UK Biobank CFPs, DL models were trained using 62,876 images from 44,501 unique participants to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic, diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. Performance of DL ranged from AUROC= 0.5654-0.9480 for categorical and R2=-0.0291-0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based score consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. It also aligned with present morphometric variations. Several saliency-based scores differed significantly between incident AD and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring the potential AD vulnerability.

[143] DMDSC: A Dynamic-Margin Deep Simplex Classifier for Open-Set Recognition on Medical Image Datasets

Vishal, Arnav Aditya, Nitin Kumar, Saurabh J. Shigwan

Main category: cs.CV

Abstract: Medical imaging datasets are often characterized by extreme class imbalances, where rare pathologies are significantly underrepresented compared to common conditions. This imbalance poses a dual challenge for Open-Set Recognition (OSR): models must maintain high classification accuracy on known classes while reliably rejecting unknown samples unseen during training in clinical settings. While the recently proposed Deep Simplex Classifier (DSC) and UnCertainty-aware Deep Simplex Classifier (UCDSC) successfully leverage Neural Collapse to ensure maximal inter-class separation, they rely on a uniform margin that does not account for the varying densities of medical classes. In this paper, we propose DMDSC, an enhanced framework featuring a dynamic margin approach. Our approach automatically adapts class-specific margins based on label frequency, enforcing a higher penalty and tighter feature clustering for rare pathologies to counteract the effects of data imbalance. Extensive experiments on diverse medical benchmarks, the BloodMNIST, OCTMNIST, DermaMNIST, and BreaKHis datasets, demonstrate that our framework outperforms state-of-the-art methods.
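
A frequency-aware margin rule of the kind described can be sketched in one function (the exponent and base margin are assumptions, not the paper's schedule): the most frequent class keeps the base margin, and rarer classes get progressively larger margins.

```python
import numpy as np

def dynamic_margins(class_counts, base_margin=0.5, power=0.25):
    """Hypothetical dynamic-margin rule: scale each class margin by the
    inverse of its relative frequency, so rare pathologies are pushed
    into tighter feature clusters."""
    counts = np.asarray(class_counts, dtype=float)
    return base_margin * (counts.max() / counts) ** power

margins = dynamic_margins([1000, 100, 10])  # common, uncommon, rare class
```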

[144] Foundation AI Models for Aerosol Optical Depth Estimation from PACE Satellite Data

Zahid Hassan Tushar, Sanjay Purushotham

Main category: cs.CV

Abstract: Aerosol Optical Depth (AOD) retrieval is essential for Earth observation, supporting applications from air quality monitoring to climate studies. Conventional physics-based AOD retrieval methods formulate the problem as a pixel-wise inversion, relying on radiative transfer modeling, memory-intensive look-up tables, and auxiliary meteorological data. While recent data-driven approaches have shown promise, many fail to exploit the spatial-spectral coherence of hyperspectral imagery, leading to spatially inconsistent and noise-sensitive retrievals. We present the first study exploring Foundation AI models for AOD retrieval and propose ViTCG, a Vision Transformer with Channel-wise Grouping-based spatial regression framework that reduces retrieval bias and error. ViTCG uses hyperspectral top-of-atmosphere radiance as input and jointly models spatial context and spectral information. Validation with PACE radiance observations demonstrates a 62% reduction in mean squared error compared to state-of-the-art foundation models, including Prithvi, and produces spatially coherent AOD fields.

[145] Static and Dynamic Graph Alignment Network for Temporal Video Grounding

Zhanjie Hu, Bolin Zhang, Jianhua Wang, Jianbo Zheng, Chenchen Yan, Takahiro Komamizu, Ichiro Ide, Jiangbo Qian

Main category: cs.CV

Abstract: Temporal Video Grounding (TVG) aims to localize temporal moments in an untrimmed video that semantically correspond to given natural language queries. Recently, Graph Convolutional Networks (GCN) have been widely adopted in TVG to model temporal relations among video clips and enhance contextual reasoning by constructing clip-level graphs. Despite their effectiveness, existing GCN-based TVG methods encounter three critical bottlenecks: 1) Most methods construct graph nodes using either static or dynamic features alone, resulting in incomplete visual representation and overlooking complementary semantics, 2) Most methods construct temporal graphs in a query-agnostic manner, leading to inefficient feature interaction within the temporal graph representation, and 3) Most methods often suffer from a single-granularity semantic matching, while direct training on complex temporal localization task may lead to slow convergence and suboptimal precision. To address these challenges, we propose Static and Dynamic Graph Alignment Network (SDGAN). First, SDGAN jointly exploits static and dynamic visual features to construct two complementary temporal graphs and performs Position-wise Nodes Alignment, enabling more expressive and robust visual representation. Second, SDGAN introduces Query-Clip Contrastive Learning and Adaptive Graph Modeling to explicitly align visual clips with their corresponding textual queries, yielding query-aware visual representations. Third, SDGAN incorporates multi-granularity temporal proposals within Progressive Easy-to-Hard Training Strategy, effectively bridging coarse-grained semantic localization and fine-grained temporal boundary refinement. Extensive experiments on three benchmark datasets demonstrate that SDGAN achieves superior performance across complex TVG scenarios. Codes and datasets are available at https://github.com/ZhanJieHu/SDGAN.

[146] PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

Guandong Li, Mengxia Ye

Main category: cs.CV

Abstract: Image editing instructions are heterogeneous: a color swap, an object insertion, and a physical-action edit all demand different spatial coverage and different reasoning depth, yet existing reasoning-based editors apply a single fixed inference recipe to every instruction. We argue that adaptivity along both the spatial and temporal axes is the missing degree of freedom, and we present PhysEdit, an editing framework built around this principle. PhysEdit introduces two inference-time modules that compose without retraining the backbone. At its core, (1) Complexity-Adaptive Reasoning Depth (CARD) predicts edit complexity directly from the instruction and reference image and allocates the reasoning step count N_r and reasoning-token length r per sample – turning a previously fixed inference schedule into a conditional-computation problem. CARD is supported by (2) a Spatial Reasoning Mask (SRM) that extracts an instruction-conditioned spatial prior from cross-attention to confine reasoning to regions that semantically require it. On the full 737-case ImgEdit Basic-Edit Suite, PhysEdit delivers a 1.18x wall-clock speedup (64.3s vs. 76.1s per sample) over a strong reasoning baseline while slightly improving instruction adherence (CLIP-T 0.2283 vs. 0.2266, +0.7%) and matching identity preservation within noise (CLIP-I 0.8246 vs. 0.8280). The speedup is category-dependent and reaches 1.52x on appearance-level edits, validating CARD’s adaptive allocation as the principal source of efficiency gain. A 30-sample pilot with full ablations isolates the contribution of each module.
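
The CARD allocation can be sketched as a simple mapping from a complexity score to a compute budget (all bounds below are assumed for illustration; the paper predicts complexity from the instruction and image):

```python
def allocate_reasoning(complexity, n_min=1, n_max=8, r_min=16, r_max=256):
    """Toy complexity-adaptive allocation: linearly map a [0, 1]
    complexity score to a reasoning step count N_r and a
    reasoning-token budget r, turning a fixed schedule into
    conditional computation."""
    c = min(max(complexity, 0.0), 1.0)
    n_r = round(n_min + c * (n_max - n_min))
    r = round(r_min + c * (r_max - r_min))
    return n_r, r
```

A simple color swap would score near 0 and get a shallow schedule; a physical-action edit would score near 1 and get the full budget.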

[147] Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels

Tongxu Zhang

Main category: cs.CV

Abstract: Knee osteoarthritis (OA) assessment involves a natural but often underused label hierarchy: a coarse binary OA decision and a fine-grained Kellgren–Lawrence (KL) severity grade. Existing deep learning studies commonly treat these targets as separate classification problems, either reducing OA assessment to disease presence or directly optimizing noisy ordinal KL labels. In this work, we ask whether this clinical hierarchy can serve as a representation-level supervisory prior. Rather than introducing a complex architecture, we use a deliberately simple dual-head model with a shared encoder and two task-specific heads as a probe of hierarchical supervision. We compare single-OA, single-KL, and dual-head training across multiple 3D backbones under the same test protocol. Beyond standard classification metrics, we perform paired statistical comparisons, analyze latent severity-axis geometry, and examine saliency overlap with cartilage regions. The results show that dual-head supervision produces backbone-dependent gains, with clear improvements in KL-related metrics for selected backbones. More importantly, the gains are accompanied by a more ordered coarse-to-fine latent organization and, for responsive backbones, stronger anatomical alignment of saliency with cartilage. These findings suggest that even simple hierarchical dual-head supervision can reshape disease representations under noisy coarse/fine labels, providing a useful inductive bias for OA diagnosis and severity grading.
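
The dual-head probe is deliberately simple: one shared encoder, one binary OA head, one 5-way KL head. A NumPy sketch with illustrative dimensions (the paper uses 3D backbones):

```python
import numpy as np

rng = np.random.default_rng(0)

class DualHeadProbe:
    """Minimal dual-head probe: shared features feed both a coarse
    binary OA head and a fine-grained Kellgren-Lawrence head."""
    def __init__(self, d_in=64, d_feat=32):
        self.w_enc = rng.normal(size=(d_in, d_feat)) / np.sqrt(d_in)
        self.w_oa = rng.normal(size=(d_feat, 2)) / np.sqrt(d_feat)
        self.w_kl = rng.normal(size=(d_feat, 5)) / np.sqrt(d_feat)

    def forward(self, x):
        h = np.maximum(x @ self.w_enc, 0.0)    # shared ReLU features
        return h @ self.w_oa, h @ self.w_kl    # coarse and fine logits

model = DualHeadProbe()
oa_logits, kl_logits = model.forward(np.ones((4, 64)))
```

Training would sum a loss on each head so the hierarchy supervises the shared representation.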

[148] Unpaired Image Deraining Using Reward-Guided Self-Reinforcement Strategy

Yinghao Chen, Yeying Jin, Xiang Chen, Yanyan Wei, Ziyang Yan, Yaowen Fu

Main category: cs.CV

Abstract: Unsupervised deraining has attracted attention for its ability to learn the real-world distribution of rain without paired supervision. However, the lack of strong constraints makes it difficult for the network to converge, especially with the complex diversity of rain degradation. A key motivation is that high-quality deraining results occasionally emerge during training, which can be leveraged to guide the optimization process. To overcome these challenges, we introduce RGSUD (Reward-Guided Self-Reinforcement Unsupervised Image Deraining), comprising two key stages: reward recycling and self-reinforcement (SR) training. In the former stage, we propose an Image Quality Assessment (IQA)-based dynamic reward recycling mechanism that selects optimal derained outputs during training and continuously collects high-quality deraining images. In the latter stage, we incorporate these rewards into the model’s optimization process, constraining the optimization space and improving alignment between derained outputs and clean images. By leveraging IQA-based self-reinforced loss and dynamically updated rewards, we enhance the quality of synthesized pseudo-paired data and stabilize the optimization. Extensive experiments demonstrate that our method achieves SOTA performance across multiple datasets, including paired synthetic, paired real, and unpaired real images, outperforming existing unsupervised deraining approaches in both subjective and objective IQA metrics. Additionally, we show that the self-reinforcement strategy is adaptable to other unsupervised deraining methods and our deraining framework demonstrates strong generalization across existing supervised deraining networks.
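
The reward-recycling stage amounts to maintaining a best-so-far buffer keyed by an IQA score. A stdlib sketch (interface and capacity are assumptions for illustration):

```python
import heapq

class RewardBuffer:
    """Toy reward-recycling buffer: keep the top-capacity derained
    outputs seen during training, ranked by a no-reference IQA score,
    for reuse as pseudo-paired targets."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self._heap = []    # min-heap of (score, seq, output)
        self._seq = 0      # tiebreaker so outputs are never compared

    def offer(self, score, output):
        item = (score, self._seq, output)
        self._seq += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict current worst

    def best(self):
        return [out for _, _, out in sorted(self._heap, reverse=True)]

buf = RewardBuffer(capacity=2)
for score, img in [(0.1, "a"), (0.9, "b"), (0.5, "c"), (0.8, "d")]:
    buf.offer(score, img)
```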

[149] Exploring the Limits of End-to-End Feature-Affinity Propagation for Single-Point Supervised Infrared Small Target Detection

Qiancheng Zhou, Wenhua Zhang

Main category: cs.CV

Abstract: Single-point supervised infrared small target detection (IRSTD) drastically reduces dense annotation costs. Current state-of-the-art (SOTA) methods achieve high precision by recovering mask supervision through explicit, offline pseudo-label construction, such as multi-stage active learning and physics-driven mask generation. In this paper, we study a minimalist alternative: generating point-to-mask supervision online through in-batch, point-anchored feature-affinity propagation. We instantiate this paradigm as GSACP, an end-to-end testbed that directly supervises the detector using hard-margin feature affinity gated by local image priors, entirely eliminating external label-evolution loops. This compact design, however, exposes an optimization bottleneck. Because the affinity target is generated from the same feature representation being optimized, training forms a self-referential loop. We theoretically formalize this as Self-Referential Propagation Drift, a representation-supervision entanglement that can sharpen true boundaries or distort the feature space to satisfy its own targets. To systematically isolate these failure modes, we apply a protocolized single-variable ablation procedure spanning local EMA teacher decoupling, hard-background contrastive separation, and adaptive support geometry. On the SIRST3 dataset, GSACP-Final establishes a new ultra-low false-alarm operating regime, achieving a highly competitive 0.6674 mIoU while demonstrating a 38% relative reduction in false-alarm rate (Fa) compared with PAL. By systematically deconstructing the end-to-end paradigm, we map its performance boundaries and show that in-batch feature propagation provides a compact alternative for deployment scenarios where false-alarm suppression is paramount.

[150] Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

Mohammad Aamir Sohail, Gabriela Pinheiro, Yasemin Poyraz Kocak, Batuhan Hangun, Emre Camkerten, Simge Yigit, Hafize Asude Ertan

Main category: cs.CV

Abstract: Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. Corners are locations where gray-level intensity changes abruptly in multiple directions and are widely used in feature extraction, object tracking, and 3D modeling. In this study, we present a quantum implementation of Sobel-based edge detection and Harris-style corner detection. Two quantum image encoding methods - Flexible Representation of Quantum Images (FRQI) and Quantum Probability Image Encoding (QPIE) - are used to encode the input data and are comparatively analyzed. The proposed approach introduces a quantum gradient computation scheme based on lag-2 differences, enabling the evaluation of gradient-like features in superposition. To improve detection quality and reduce false positives, a classical post-processing step is applied to candidate corner points identified by the quantum circuit. Results show that the proposed quantum circuits produce outputs consistent with classical Sobel and Harris operators. Furthermore, the QPIE-based configuration yields more stable and coherent results than FRQI, especially under limited measurement shots. While gradient computation can be performed efficiently at the circuit level, the overall cost remains dominated by state preparation, measurement, and classical post-processing. All experiments are conducted under noiseless simulation, and performance on NISQ hardware may be affected by noise and measurement limitations. Therefore, this work demonstrates a functional and scalable quantum realization of classical edge and corner detection methods rather than an end-to-end speedup.
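
The lag-2 difference the quantum circuit evaluates in superposition has a direct classical reference: g[i] = f[i+1] − f[i−1] along each axis, the central-difference core of the Sobel kernel. A NumPy sketch of that classical baseline:

```python
import numpy as np

def lag2_gradient(img):
    """Classical lag-2 (central) difference gradient magnitude:
    gx[r, c] = img[r, c+1] - img[r, c-1], analogously for gy;
    borders are left at zero."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

# A vertical step edge: the gradient magnitude peaks at the boundary.
step = np.zeros((5, 5))
step[:, 3:] = 1.0
mag = lag2_gradient(step)
```

The quantum version encodes the image (FRQI or QPIE), evaluates these differences at the circuit level, and recovers candidates by measurement.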

[151] Modeling Subjective Urban Perception with Human Gaze

Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

Main category: cs.CV

Abstract: Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with implicit richer visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.

[152] Map2World: Segment Map Conditioned Text to 3D World Generation

Jaeyoung Chung, Suyoung Lee, Jianfeng Xiang, Jiaolong Yang, Kyoung Mu Lee

Main category: cs.CV

Abstract: 3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, these methods are constrained by grid layouts and suffer from inconsistencies in object scale throughout the entire world. In this work, we introduce a novel framework, Map2World, that first enables 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global-scale consistency and flexibility across expansive environments. To further enhance the quality, we propose a detail enhancer network that generates fine details of the world. The detail enhancer enables the addition of fine-grained details without compromising overall scene coherence by incorporating global structure information. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains, even under limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user-controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.

[153] Make Your LVLM KV Cache More Lightweight

Xihao Chen, Yangyang Guo, Roger Zimmermann

Main category: cs.CV

Abstract: Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.
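
A crude stand-in for the prompt-aware compression (the merge rule and names below are assumptions, not LightKV's actual message passing): score vision tokens by similarity to the text prompt, keep the top fraction, and fold each dropped token into its most similar kept token. The 0.55 keep ratio mirrors the abstract's "55% of the original vision tokens".

```python
import numpy as np

def compress_vision_tokens(vision, text_query, keep_ratio=0.55):
    """Toy prompt-guided token compression: keep the tokens most
    relevant to the text query and aggregate the rest into them,
    shrinking the vision-token KV cache."""
    scores = vision @ text_query
    k = max(1, int(len(vision) * keep_ratio))
    keep = np.argsort(-scores)[:k]
    kept = vision[keep].copy()
    for i in np.setdiff1d(np.arange(len(vision)), keep):
        j = np.argmax(kept @ vision[i])        # most similar kept token
        kept[j] = 0.5 * (kept[j] + vision[i])  # aggregate its message
    return kept

rng = np.random.default_rng(0)
tokens = rng.normal(size=(20, 8))              # 20 vision tokens, dim 8
compressed = compress_vision_tokens(tokens, rng.normal(size=8))
```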

[154] GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

Main category: cs.CV

Abstract: Gaze estimation methods commonly use facial appearance to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods: late fusion of image features, lack of factor-aware conditioning, and impractical capacity scaling. To address these challenges, we propose Globally-conditioned Multi-scale Gaze estimation (GMGaze), which leverages a multi-scale transformer architecture. Specifically, the model first introduces semantic prototype conditioning, which modulates the CLIP global image embedding using four learned prototype banks (i.e., illumination, background, head pose, and appearance) to generate two complementary context-biased global tokens. These tokens, along with the CLIP patch and CNN tokens, are fused at the first layer. This early unified fusion prevents the information loss common in late-stage merging. Finally, each token passes through sparse Mixture-of-Experts modules, providing conditional computational capacity without uniformly increasing dense parameters. For cross-domain adaptation, we incorporate an adversarial domain adaptation technique with a feature separation loss that encourages the two global tokens to remain de-correlated. Experiments on four public benchmarks (MPIIFaceGaze, EYEDIAP, Gaze360, and ETH-XGaze) show that GMGaze achieves mean angular errors of 2.49°, 3.22°, 10.16°, and 1.44°, respectively, outperforming previous baselines in all within-domain settings. In cross-domain evaluations, it provides state-of-the-art (SOTA) results on two standard transfer routes.

[155] Let ViT Speak: Generative Language-Image Pre-training

Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLIP trains a ViT to predict language tokens directly from visual tokens using a standard language modeling objective, without contrastive batch construction or an additional text decoder. This design offers three key advantages: (1) Simplicity: a single transformer jointly models visual and textual tokens; (2) Scalability: it scales effectively with both data and model size; and (3) Performance: it achieves competitive or superior results across diverse multimodal benchmarks. Trained on 8B samples from Recap-DataComp-1B, GenLIP matches or surpasses strong baselines despite using substantially less pretraining data. After continued pretraining on multi-resolution images at native aspect ratios, GenLIP further improves on detail-sensitive tasks such as OCR and chart understanding, making it a strong foundation for vision encoders in MLLMs.
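The "predict language tokens from visual tokens with a standard LM objective" idea can be sketched as a toy next-token cross-entropy over a caption conditioned on prepended visual tokens. Everything here is an assumption for illustration: mean pooling stands in for causal self-attention, and the vocabulary, dimensions, and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 100, 32
text_emb = rng.normal(size=(vocab, d)) * 0.1   # toy caption-token embeddings
W_out = rng.normal(size=(vocab, d)) * 0.1      # toy output head

def caption_lm_loss(visual_tokens, text_ids):
    """Score a caption with next-token cross-entropy, conditioning on the
    image by prepending its visual tokens to the sequence."""
    seq = visual_tokens
    losses = []
    for target in text_ids:
        h = seq.mean(axis=0)                      # context summary (stands in for attention)
        logits = W_out @ h
        logz = logits.max() + np.log(np.exp(logits - logits.max()).sum())
        losses.append(logz - logits[target])      # -log p(target | visual + text prefix)
        seq = np.vstack([seq, text_emb[target]])  # teacher forcing
    return float(np.mean(losses))

visual_tokens = rng.normal(size=(6, d))           # e.g. 6 ViT patch tokens
loss = caption_lm_loss(visual_tokens, [3, 17, 42])
```

The point of the sketch is that no contrastive batch construction or separate text decoder appears: one sequence model and one cross-entropy loss supervise the vision tokens.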

[156] Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a “Visual Signal Dilution” phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to ensure sustained, on-demand visual perception. Integrated as a parallel branch alongside the Feed-Forward Network (FFN) in LVLMs, PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for precise visual perception, thereby structurally mitigating the signal suppression inherent to deep generation. Extensive experiments on Qwen3-VL models demonstrate that PVM brings notable improvements with negligible parameter overhead, delivering consistent average accuracy gains across both 4B and 8B scales, particularly in complex reasoning tasks that demand persistent visual perception. Furthermore, in-depth analysis reveals that PVM can resist length-induced signal decay and accelerate internal prediction convergence.
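The "parallel branch alongside the FFN" structure can be sketched as below. This is a hedged numpy illustration, not PVM's implementation: the query projection, the additive combination with the FFN output, and the memory size are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 48  # hidden width, illustrative

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ffn(h, W1, W2):
    return W2 @ np.maximum(W1 @ h, 0.0)      # standard two-layer FFN with ReLU

def ffn_with_pvm(h, W1, W2, W_q, visual_mem):
    """FFN output plus a parallel retrieval branch: the hidden state queries a
    persistent bank of visual embeddings, so a token can read visual signal
    regardless of how far it sits from the image in the generated sequence."""
    attn = softmax(visual_mem @ (W_q @ h))   # relevance of each stored visual embedding
    retrieved = attn @ visual_mem            # on-demand visual read-out
    return ffn(h, W1, W2) + retrieved

W1 = rng.normal(size=(4 * d, d)) * 0.1
W2 = rng.normal(size=(d, 4 * d)) * 0.1
W_q = rng.normal(size=(d, d)) * 0.1
visual_mem = rng.normal(size=(16, d))        # hypothetical persistent visual memory
h = rng.normal(size=d)
out = ffn_with_pvm(h, W1, W2, W_q, visual_mem)
```

Because the retrieval weights depend only on the current hidden state and the stored bank, the pathway is distance-agnostic: the attention decay over long generated text described in the abstract does not dilute it.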

[157] Posterior Augmented Flow Matching

George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, mapping diverse inputs to overly similar outputs, failing to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate. Finally, we show that PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2), different architectures (SiT and MMDiT), and in both class and text conditioned benchmarks (ImageNet and CC12M), with a negligible increase in the compute overhead. Code: https://github.com/gstoica27/PAFM.git.
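The posterior-weighted mixture target can be sketched for the common linear path x_t = (1-t) x0 + t x1 with x0 ~ N(0, I). Under that assumption, given an endpoint x1 the intermediate satisfies x_t ~ N(t x1, (1-t)^2 I), which gives a closed-form surrogate likelihood. The Gaussian likelihood, uniform prior, and candidate set below are illustrative assumptions, not the paper's importance sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

def pafm_target(x_t, t, candidates, log_prior):
    """Mixture velocity target: weight each candidate endpoint x1_k by
    p(x_t | x1_k) * p(x1_k), then average the implied per-candidate
    velocities (x1_k - x_t) / (1 - t)."""
    log_lik = -((x_t - t * candidates) ** 2).sum(axis=1) / (2 * (1 - t) ** 2)
    log_w = log_lik + log_prior
    w = np.exp(log_w - log_w.max())         # stable normalized importance weights
    w /= w.sum()
    velocities = (candidates - x_t) / (1 - t)
    return w @ velocities

d, K = 16, 8
candidates = rng.normal(size=(K, d))        # hypothetical plausible endpoints
log_prior = np.zeros(K)                     # uniform prior for the sketch
t = 0.3
x0 = rng.normal(size=d)
x_t = (1 - t) * x0 + t * candidates[0]      # intermediate built from candidate 0
v = pafm_target(x_t, t, candidates, log_prior)
```

Averaging over many plausible continuations per intermediate is what reduces the gradient variance of single-target flow matching, per the abstract's unbiasedness claim.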

[158] Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers

Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Ahmet Enis Cetin, Ulas Bagci

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Self-attention is central to the success of Transformer architectures; however, learning the query, key, and value projections from random initialization remains challenging and computationally expensive. In this paper, we propose two complementary methods that leverage the Discrete Cosine Transform (DCT) to enhance the efficiency and performance of Vision Transformers. First, we address the initialization problem by introducing a simple yet effective DCT-based initialization strategy for self-attention, where projection weights are initialized using DCT coefficients. This structure-preserving approach consistently improves classification accuracy on the CIFAR-10 and ImageNet-1K benchmarks. Second, we propose a DCT-based attention compression technique that exploits the decorrelation properties of the frequency domain. By observing that high-frequency DCT coefficients typically correspond to noise, we truncate high-frequency components of the input patches, thereby reducing the dimensionality of the query, key, and value projections without sacrificing accuracy. Experiments on Swin Transformer models demonstrate that the proposed compression method achieves a substantial reduction in computational overhead while maintaining comparable performance.
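Both ideas, DCT-based weight initialization and frequency-domain truncation, can be sketched with the standard orthonormal DCT-II matrix. The choice of how many coefficients to keep and the use of the basis directly as initial Q/K/V weights are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    M[0] *= 1.0 / np.sqrt(2.0)   # DC row rescaled so that M @ M.T = I
    return M

d = 32
D = dct_matrix(d)

# (1) Structure-preserving initialization: start a projection at the DCT basis
# instead of random weights.
W_q = D.copy()

# (2) Frequency-domain compression: transform a patch token and drop the
# high-frequency half of the coefficients (treated as mostly noise), reducing
# the dimensionality fed to the Q/K/V projections.
keep = d // 2
x = np.random.default_rng(3).normal(size=d)   # one patch token
x_compressed = D[:keep] @ x                   # low-frequency coefficients only
x_recon = D[:keep].T @ x_compressed           # approximate inverse transform
```

Because the basis is orthonormal, the truncation is simply a projection onto the low-frequency subspace, which is what lets the compressed tokens retain most of the signal energy.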

[159] (title unavailable)

Chingis Oinar, Miao Cao, Shanshan Fu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2408.11349: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2408.11349&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[160] PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Shangkun Sun, Ruyang Liu, Haoran Tang, Yixiao Ge, Haibo Lu, Wei Gao, Jiankun Yang, Chen Li

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2411.02327: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2411.02327&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[161] Copula-enhanced Vision Transformer for high myopia diagnosis through OU UWF fundus images

Chong Zhong, Yunhao Liu, Yang Li, Xiang Fu, Jin Yang, Danjuan Yang, Meiyan Li, Jinfeng Xu, Aiyi Liu, Alan H. Welsh, Xingtao Zhou, Bo Fu, Catherine C. Liu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2501.06540: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2501.06540&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[162] Diffusion Models are Secretly Zero-Shot 3DGS Harmonizers

Vsevolod Skorokhodov, Nikita Durasov, Pascal Fua

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2503.06740: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2503.06740&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[163] Color Conditional Generation with Sliced Wasserstein Guidance

Alexander Lobashev, Maria Larchenko, Dmitry Guskov

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2503.19034: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2503.19034&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[164] Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark

Udayanga G.W.K.N. Gamage, Xuanni Huo, Luca Zanatta, T Delbruck, Cesar Cadena, Matteo Fumagalli, Silvia Tolu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2504.05679: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2504.05679&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[165] APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds

Yuan Gao, Shaobo Xia, Sheng Nie, Cheng Wang, Xiaohuan Xi, Bisheng Yang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2505.09971: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2505.09971&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[166] Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2505.18875: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2505.18875&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[167] Thought Graph Traversal for Test-time Scaling in Chest X-ray VLLMs

Yue Yao, Zelin Wen, Yan Tong, Xinyu Tian, Xuqing Li, Xiao Ma, Dongliang Xu, Tom Gedeon

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2506.11989: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2506.11989&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[168] How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2507.01955: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2507.01955&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[169] Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement

Xiaoran Sun, Liyan Wang, Yeying Jin, Kin-man Lam, Zhixun Su, Yang Yang, Jinshan Pan, Cong Wang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2507.18064: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2507.18064&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[170] Smoothing Slot Attention Iterations and Recurrences

Rongzhen Zhao, Wenyan Yang, Juho Kannala, Joni Pajarinen

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2508.05417: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2508.05417&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[171] A Deep Learning-Based CCTV System for Automatic Smoking Detection in Fire Exit Zones

Sami Sadat, Mohammad Irtiza Hossain, Junaid Ahmed Sifat, Suhail Haque Rafi, Md. Waseq Alauddin Alvi, Md. Khalilur Rhaman

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2508.11696: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2508.11696&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[172] Quantization Robustness to Input Degradations for Object Detection

Toghrul Karimov, Hassan Imani, Allan Kazakov

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2508.19600: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2508.19600&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[173] CollaFuse: Collaborative Diffusion Models

Simeon Allmendinger, Domenique Zipperling, Lukas Struppek, Niklas Kühl

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2406.14429: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2406.14429&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[174] Deepfakes: we need to re-think the concept of “real” images

Janis Keuper, Margret Keuper

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2509.21864: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2509.21864&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[175] FreeRet: MLLMs as Training-Free Retrievers

Yuhan Zhu, Xiangyu Zeng, Chenting Wang, Xinhao Li, Chunxu Liu, Yicheng Xu, Ziang Yan, Yi Wang, Limin Wang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2509.24621: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2509.24621&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[176] Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards

Chenhui Xu, Fuxun Yu, Michael J. Bianco, Jacob Kovarskiy, Raphael Tang, Qi Zhang, Zirui Xu, Will LeVine, Brandon Dubbs, Heming Liao, Cassandra Burgess, Suvam Bag, Jay Patravali, Rupanjali Kukal, Mikael Figueroa, Rishi Madhok, Nikolaos Karianakis, Jinjun Xiong

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.00072: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.00072&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[177] ClustViT: Clustering-based Token Merging for Semantic Segmentation

Fabio Montello, Ronja Güldenring, Lazaros Nalpantidis

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.01948: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.01948&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[178] Instance-Aware Pseudo-Labeling and Class-Focused Contrastive Learning for Weakly Supervised Domain Adaptive Segmentation of Electron Microscopy

Shan Xiong, Jiabao Chen, Ye Wang, Jialin Peng

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.16450: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.16450&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[179] Residual Diffusion Bridge Model for Image Restoration

Hebaixu Wang, Jing Zhang, Haoyang Chen, Haonan Guo, Di Wang, Jiayi Ma, Bo Du

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2510.23116: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2510.23116&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[180] The Determinism of Randomness: Latent Space Degeneracy in Diffusion Model

Song Yan, Chenfeng Wang, Wei Zhai, Xinliang Bi, Jian Yang, Yancheng Cai, Yusen Zhang, Yunwei Lan, Tao Zhang, GuanYe Xiong, Min Li, Zheng-Jun Zha

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2511.07756: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2511.07756&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[181] LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping

Chenying Liu, Wei Huang, Xiao Xiang Zhu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2511.08156: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2511.08156&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[182] High Dynamic Range 3D Gaussian Splatting via Luminance-Chromaticity Decomposition

Kaixuan Zhang, Minxian Li, Mingwu Ren, Jiankang Deng, Xiatian Zhu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2511.12895: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2511.12895&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[183] Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.01116: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.01116&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[184] A Novel Patch-Based TDA Approach for Computed Tomography Imaging

Dashti A. Ali, Aras T. Asaad, Jacob J. Peoples, Ahmad Bashir Barekzai, Camila Vilela, Hala Khasawneh, Jayasree Chakraborty, João Miranda, Mohammad Hamghalam, Natalie Gangai, Natally Horvat, Richard K. G. Do, Alice C. Wei, Amber L. Simpson

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.12108: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.12108&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[185] Adapting MLLMs for Nuanced Video Retrieval

Piyush Bagad, Andrew Zisserman

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.13511: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.13511&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[186] Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

Jiawei Ge, Jiuxin Cao, Xinyi Li, Xuelin Zhu, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2512.20260: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2512.20260&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[187] It’s Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2601.00090: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2601.00090&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[188] VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image

Teng-Fang Hsiao, Bo-Kai Ruan, Yu-Lun Liu, Hong-Han Shuai

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.04349: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.04349&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[189] Thinking with Geometry: Active Geometry Integration for Spatial Reasoning

Haoyuan Li, Qihang Cao, Tao Tang, Kun Xiang, Zihan Guo, Jianhua Han, Hang Xu, JiaWang Bian, Xiaodan Liang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.06037: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.06037&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[190] WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

Aydin Ayanzadeh, Prakhar Dixit, Sadia Kamal, Milton Halem

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.13305: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.13305&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[191] The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, Haiyi Zhu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2601.09896: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2601.09896&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[192] ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

A. Said Gurbuz, Sunghwan Hong, Ahmed Nassar, Marc Pollefeys, Peter Staar

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.14276: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.14276&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[193] Driving with A Thousand Faces: A Benchmark for Closed-Loop Personalized End-to-End Autonomous Driving

Xiaoru Dong, Ruiqin Li, Xiao Han, Zhenxuan Wu, Jiamin Wang, Jian Chen, Qi Jiang, SM Yiu, Xinge Zhu, Yuexin Ma

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.18757: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.18757&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[194] Prefer-DAS: Learning from Local Preferences and Sparse Prompts for Domain Adaptive Segmentation of Electron Microscopy

Jiabao Chen, Shan Xiong, Jialin Peng

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2602.19423: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2602.19423&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[195] Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation

Hongbo Zheng, Afshin Bozorgpour, Dorit Merhof, Minjia Zhang

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.02727: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.02727&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[196] VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Ruoliu Yang, Chu Wu, Caifeng Shan, Ran He, Chaoyou Fu

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.22285: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.22285&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[197] Learning Locally, Revising Globally: Global Reviser for Federated Learning with Noisy Labels

Yuxin Tian, Mouxing Yang, Yuhao Zhou, Jian Wang, Qing Ye, Tongliang Liu, Gang Niu, Jiancheng Lv

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2412.00452: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2412.00452&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[198] Adaptive Dual-Teacher Distillation with Subnetwork Rectification for Bridging Semantic Gaps in Black-Box Domain Adaptation

Zhe Zhang, Jing Li, Wanli Xue, Xu Cheng, Jianhua Zhang, Qinghua Hu, Shengyong Chen

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.22908: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.22908&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[199] Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

Felix Wimbauer, Fabian Manhardt, Michael Oechsle, Nikolai Kalischek, Christian Rupprecht, Daniel Cremers, Federico Tombari

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2603.28980: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2603.28980&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[200] Are Pretrained Image Matchers Good Enough for SAR-Optical Satellite Registration?

Isaac Corley, Alex Stoken, Gabriele Berton

Main category: cs.CV

TL;DR: Error: Processing failed

Motivation: Error: Processing failed

Method: Error: Processing failed

Result: Error: Processing failed

Conclusion: Error: Processing failed

Abstract: Failed to fetch summary for 2604.10217: Page request resulted in HTTP 429 (https://export.arxiv.org/api/query?search_query=&id_list=2604.10217&sortBy=relevance&sortOrder=descending&start=0&max_results=100)

[201] Diffusion Models for Solving Inverse Problems via Posterior Sampling with Piecewise Guidance

Saeed Mohseni-Sehdeh, Walid Saad, Kei Sakaguchi, Tao Yu

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2507.18654 was rate-limited (HTTP 429), so no summary could be generated.

[202] At FullTilt: Real-Time Open-Set 3D Macromolecule Detection Directly from Tilted 2D Projections

Ming-Yang Ho, Alberto Bartesaghi

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.10766 was rate-limited (HTTP 429), so no summary could be generated.

[203] Find, Fix, Reason: Context Repair for Video Reasoning

Haojian Huang, Chuanyu Qin, Yinchuan Li, Yingcong Chen

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.16243 was rate-limited (HTTP 429), so no summary could be generated.

[204] Semantic Foam: Unifying Spatial and Semantic Scene Decomposition

Amr Sharafeldin, Shrisudhan Govindarajan, Thomas Walker, Aryan Mikaeili, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.26262 was rate-limited (HTTP 429), so no summary could be generated.

[205] DiffMI: Breaking Face Recognition Privacy via Diffusion-Driven Training-Free Model Inversion

Hanrui Wang, Shuo Wang, Chun-Shien Lu, Isao Echizen

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2504.18015 was rate-limited (HTTP 429), so no summary could be generated.

[206] Certifiable Factor Graph Optimization

Zhexin Xu, Nikolas R. Sanderson, Hanna Jiamei Zhang, David M. Rosen

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2603.01267 was rate-limited (HTTP 429), so no summary could be generated.

[207] ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Stephan Xie, Ben Cohen, Mononito Goswami, Junhong Shen, Emaad Khwaja, Chenghao Liu, David Asker, Othmane Abou-Amal, Ameet Talwalkar

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.21199 was rate-limited (HTTP 429), so no summary could be generated.

[208] Phase-Separated Complex Hilbert PCA on Markerless 3D Pose Estimation Data: A Global Phase Network and Its Extension to a Continuous Field on the Body Surface

Hiromitsu Goto, Tao Tao, Zheng-Lin Chia

Main category: cs.CV

Abstract: Unavailable; the arXiv fetch for 2604.24415 was rate-limited (HTTP 429), so no summary could be generated.

cs.AI

[209] TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

Rong Lu

Main category: cs.AI

TL;DR: An agentic LLM system that turns heterogeneous Volve wellsite data into evidence-grounded drilling analytics through twelve domain-specialized tools; tool design, not model scale, drives analytical quality.

DetailsMotivation: Drilling operations produce heterogeneous data (daily drilling reports, WITSML real-time objects, production records) that is difficult to query for evidence-based analysis.

Method: A dual-store architecture (DuckDB for structured queries, ChromaDB for semantic search) with twelve domain tools orchestrated by an LLM via iterative function calling; the Evidence Grounding Score (EGS) serves as a grounding-compliance proxy.

Result: Parses all 1,759 DDR XML files with zero errors, reconciles three incompatible well naming conventions, and is backed by 95 automated tests plus a 130-question stress taxonomy.

Conclusion: Domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.

Abstract: We present TADI (Tool-Augmented Drilling Intelligence), an agentic AI system that transforms drilling operational data into evidence-based analytical intelligence. Applied to the Equinor Volve Field dataset, TADI integrates 1,759 daily drilling reports, selected WITSML real-time objects, 15,634 production records, formation tops, and perforations into a dual-store architecture: DuckDB for structured queries over 12 tables with 65,447 rows, and ChromaDB for semantic search over 36,709 embedded documents. Twelve domain-specialized tools, orchestrated by a large language model via iterative function calling, support multi-step evidence gathering that cross-references structured drilling measurements with daily report narratives. The system parses all 1,759 DDR XML files with zero errors, handles three incompatible well naming conventions, and is backed by 95 automated tests plus a 130-question stress-question taxonomy spanning six operational categories. We formalize the agent’s behavior as a sequential tool-selection problem and propose the Evidence Grounding Score (EGS) as a simple grounding-compliance proxy based on measurements, attributed DDR quotations, and required answer sections. The complete 6,084-line, framework-free implementation is reproducible given the public Volve download and an API key, and the case studies and qualitative ablation analysis suggest that domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.
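The abstract describes the Evidence Grounding Score only as a proxy over measurements, attributed DDR quotations, and required answer sections. As a toy illustration of what such a compliance check could look like (the regexes, units, section names, and equal weighting below are invented for this sketch, not taken from TADI):

```python
import re

def evidence_grounding_score(answer: str, required_sections: list) -> float:
    """Toy grounding-compliance proxy: fraction of three checks satisfied.

    All three checks are hypothetical stand-ins: the answer cites a numeric
    measurement with a unit, quotes an attributed daily drilling report (DDR)
    line, and contains every required section label.
    """
    has_measurement = bool(re.search(r"\d+(\.\d+)?\s*(m|bar|psi|t)\b", answer))
    has_ddr_quote = bool(re.search(r'"[^"]+"\s*\(DDR', answer))
    sections_ok = all(s.lower() in answer.lower() for s in required_sections)
    return (has_measurement + has_ddr_quote + sections_ok) / 3.0
```

A fully grounded answer such as `'Summary: ROP dropped at 3120 m. Evidence: "losses observed" (DDR 42).'` would score 1.0 under this sketch, while free-floating prose scores 0.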

[210] AgentReputation: A Decentralized Agentic AI Reputation Framework

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li

Main category: cs.AI

TL;DR: A decentralized three-layer reputation framework for agentic AI marketplaces, built on verification regimes and context-conditioned reputation cards.

DetailsMotivation: Existing reputation mechanisms fail in decentralized agent marketplaces: agents can game evaluation procedures, demonstrated competence does not transfer across heterogeneous task contexts, and verification rigor varies widely.

Method: Separate layers for task execution, reputation services, and tamper-proof persistence; explicit verification regimes linked to reputation metadata; context-conditioned reputation cards; and a decision-facing policy engine for resource allocation, access control, and adaptive verification escalation.

Result: A framework design plus a research agenda covering verification ontologies, verification-strength quantification, privacy-preserving evidence, cold-start bootstrapping, and adversarial defenses.

Conclusion: Separating execution, reputation, and persistence lets each layer evolve independently while preventing reputation conflation across domains and task types.

Abstract: Decentralized, agentic AI marketplaces are rapidly emerging to support software engineering tasks such as debugging, patch generation, and security auditing, often operating without centralized oversight. However, existing reputation mechanisms fail in this setting for three fundamental reasons: agents can strategically optimize against evaluation procedures; demonstrated competence does not reliably transfer across heterogeneous task contexts; and verification rigor varies widely, from lightweight automated checks to costly expert review. Current approaches to reputation, drawing on federated learning, blockchain-based AI platforms, and large language model safety research, are unable to address these challenges in combination. We therefore propose AgentReputation, a decentralized, three-layer reputation framework for agentic AI systems. The framework separates task execution, reputation services, and tamper-proof persistence to both leverage their respective strengths and enable independent evolution. It introduces explicit verification regimes linked to agent reputation metadata, as well as context-conditioned reputation cards that prevent reputation conflation across domains and task types. In addition, AgentReputation provides a decision-facing policy engine that supports resource allocation, access control, and adaptive verification escalation based on risk and uncertainty. Building on this framework, we outline several future research directions, including the development of verification ontologies, methods for quantifying verification strength, privacy-preserving evidence mechanisms, cold-start reputation bootstrapping, and defenses against adversarial manipulation.

[211] Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

Shubham Kumar, Narendra Ahuja

Main category: cs.AI

TL;DR: LOCA explains why a specific jailbreak succeeded by finding a minimal set of interpretable representation changes that causally restore refusal.

DetailsMotivation: Prior work explains all jailbreaks globally via single concept directions, but different jailbreak strategies may strengthen or suppress different intermediate concepts, and the same strategy may fail across harmful-request categories.

Method: Identify a minimal set of interpretable, intermediate representation changes that causally induce model refusal on an otherwise successful jailbreak request.

Result: On Gemma and Llama chat models over a large jailbreak benchmark, LOCA induces refusal with about six interpretable changes on average, while prior methods adapted to this setting routinely fail even after 20.

Conclusion: A step toward mechanistic, local explanations of jailbreak success in LLMs; code to be released.

Abstract: Safety trained large language models (LLMs) can often be induced to answer harmful requests through jailbreak prompts. Because we lack a robust understanding of why LLMs are susceptible to jailbreaks, future frontier models operating more autonomously in higher-stakes settings may similarly be vulnerable to such attacks. Prior work has studied jailbreak success by examining the model’s intermediate representations, identifying directions in this space that causally encode concepts like harmfulness and refusal. Then, they globally explain all jailbreak attacks as attempting to reduce or strengthen these concepts (e.g., reduce harmfulness). However, different jailbreak strategies may succeed by strengthening or suppressing different intermediate concepts, and the same jailbreak strategy may not work for different harmful request categories (e.g., violence vs. cyberattack); thus, we seek to give a local explanation – i.e., why did this specific jailbreak succeed? To address this gap, we introduce LOCA, a method that gives Local, CAusal explanations of jailbreak success by identifying a minimal set of interpretable, intermediate representation changes that causally induce model refusal on an otherwise successful jailbreak request. We evaluate LOCA on harmful original-jailbreak pairs from a large jailbreak benchmark across Gemma and Llama chat models, comparing against prior methods adapted to this setting. LOCA can successfully induce refusal by making, on average, six interpretable changes; prior work routinely fails to achieve refusal even after 20 changes. LOCA is a step toward mechanistic, local explanations of jailbreak success in LLMs. Code to be released.
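The abstract does not spell out LOCA's search procedure beyond "identify a minimal set of changes that induces refusal." Purely as a caricature of that goal, a greedy grow-then-prune search over candidate concept edits might look like the following (the oracle and edit labels are placeholders, not LOCA itself):

```python
def minimal_refusal_set(candidates, induces_refusal):
    """Greedy search for a small subset of concept edits that flips the
    model to refusal.

    `candidates` is a list of edit labels; `induces_refusal` is an oracle
    mapping a set of applied edits to True/False. The set is grown until
    refusal is induced, then pruned to drop redundant edits.
    """
    chosen = []
    for c in candidates:                      # grow until refusal is induced
        chosen.append(c)
        if induces_refusal(set(chosen)):
            break
    else:
        return None                           # no refusing subset found
    for c in list(chosen):                    # prune edits that aren't needed
        if induces_refusal(set(chosen) - {c}):
            chosen.remove(c)
    return set(chosen)
```

Greedy pruning of this kind finds a set that is minimal in the sense that no single element can be dropped, which mirrors the paper's framing of "a minimal set of changes" without claiming its actual algorithm.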

[212] Causal Foundations of Collective Agency

Frederik Hytting Jørgensen, Sebastian Weichwald, Lewis Hammond

Main category: cs.AI

TL;DR: A causal-games formalization of when a group of agents can be treated as a single collective agent.

DetailsMotivation: Multiple simpler agents may inadvertently form a collective agent with distinct capabilities and goals, a key AI-safety concern and a foundational question for both biological and artificial multi-agent systems.

Method: A behavioral account: ascribe collective agency when modeling the group's joint actions as rational and goal-directed successfully predicts its behavior, formalized via causal games and causal abstraction.

Result: Resolves a puzzle about multi-agent incentives in actor-critic models and quantifies the degree of collective agency exhibited by different voting mechanisms.

Conclusion: A foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.

Abstract: A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group’s joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games – which are causal models of strategic, multi-agent interactions – and causal abstraction – which formalizes when a simple, high-level model faithfully captures a more complex, low-level model. We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.

[213] Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

Kaituo Zhang, Zhen Xiong, Mingyu Zhong, Zhimeng Jiang, Zhouyuan Yuan, Zhecheng Li, Ying Lin

Main category: cs.AI

TL;DR: Under semantic distractors, tool-augmented reasoning can underperform native chain-of-thought because the tool-calling protocol itself imposes a "tool-use tax".

DetailsMotivation: Tool augmentation is widely assumed to improve reasoning and reliability, but that consensus does not always hold in the presence of semantic noise.

Method: A Factorized Intervention Framework isolating the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools; plus G-STEP, a lightweight inference-time gate against protocol-induced errors.

Result: Under semantic noise, the gains from tools often fail to offset the protocol-induced degradation; G-STEP yields partial recovery.

Conclusion: More substantial improvements require strengthening models' intrinsic reasoning and tool-interaction capabilities.

Abstract: Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the “tool-use tax”, which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings suggest that more substantial improvements still require strengthening the model’s intrinsic reasoning and tool-interaction capabilities.

[214] TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization

Abdulhady Abas Abdullah, Fatemeh Daneshfar, Seyedali Mirjalili, Mourad Oussalah

Main category: cs.AI

TL;DR: A topology- and uncertainty-aware DPO variant that rewards how answers are derived, weighting preference pairs by a calibrated uncertainty signal while staying RL-free.

DetailsMotivation: DPO treats preferences as flat winner-vs-loser signals and is sensitive to noisy or brittle preferences arising from fragile chains of thought.

Method: Elicit lightweight reasoning topologies; combine semantic faithfulness, utility, and topology quality into a calibrated uncertainty signal; factorize a small learnable reward over these signals and fold it into an uncertainty-weighted DPO objective using only a fixed or moving reference policy.

Result: Across open 7-8B models and benchmarks in mathematical reasoning, factual QA, summarization, and helpful/harmless dialogue, improves judge win-rates, faithfulness, and calibration over DPO, with consistent gains in multimodal and long-context settings.

Conclusion: Matches or exceeds PPO on reasoning-centric tasks while preserving DPO's training simplicity and avoiding online rollouts.

Abstract: Aligning large language models (LLMs) with human preferences is commonly done via reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO) or, more simply, via Direct Preference Optimization (DPO). While DPO is stable and RL-free, it treats preferences as flat winner vs. loser signals and is sensitive to noisy or brittle preferences arising from fragile chains of thought. We propose TUR-DPO, a topology- and uncertainty-aware variant of DPO that rewards how answers are derived, not only what they say, by eliciting lightweight reasoning topologies and combining semantic faithfulness, utility, and topology quality into a calibrated uncertainty signal. A small learnable reward is factorized over these signals and incorporated into an uncertainty-weighted DPO objective that remains RL-free and relies only on a fixed or moving reference policy. Empirically, across open 7-8B models and benchmarks spanning mathematical reasoning, factual question answering, summarization, and helpful/harmless dialogue, TUR-DPO improves judge win-rates, faithfulness, and calibration relative to DPO while preserving training simplicity and avoiding online rollouts. We further observe consistent gains in multimodal and long-context settings, and show that TUR-DPO matches or exceeds PPO on reasoning-centric tasks while maintaining operational simplicity.
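Standard DPO minimizes the negative log-sigmoid of a beta-scaled log-probability margin between the preferred and dispreferred responses; per the abstract, TUR-DPO scales this per-pair loss by a calibrated uncertainty signal. A minimal numeric sketch (treating the uncertainty-derived weight as a given scalar is an assumption of this sketch; the paper factorizes it over faithfulness, utility, and topology quality):

```python
import math

def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      beta=0.1, confidence=1.0):
    """Per-pair DPO loss scaled by a confidence weight in (0, 1].

    Standard DPO: -log sigmoid(beta * (policy margin - reference margin)),
    where each margin is log p(winner) - log p(loser). The `confidence`
    scalar stands in for TUR-DPO's calibrated uncertainty signal.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # log1p(exp(-m)) == -log(sigmoid(m)), computed stably
    return confidence * math.log1p(math.exp(-margin))
```

At zero margin the loss is log 2, it shrinks as the policy separates winner from loser relative to the reference, and halving `confidence` halves the gradient contribution of a noisy pair.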

[215] ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts

Sydney Johns, Heng Jin, Chaoyu Zhang, Y. Thomas Hou, Wenjing Lou

Main category: cs.AI

TL;DR: A military-aligned LLM safety benchmark grounded in the Law of War, the Rules of Engagement, and the Joint Ethics Regulation.

DetailsMotivation: LLMs are being explored for defense decision support, but existing safety benchmarks cover general social risks rather than the legal and ethical rules that govern real military operations.

Method: Extract doctrinal text, generate meaning-preserving multiple-choice questions, and organize 519 doctrinally grounded prompts in a 12-category taxonomy informed by the OODA decision-making framework, enabling systematic testing of accuracy and refusal.

Result: Evaluation of 21 commercial LLMs reveals critical gaps in safety alignment for military applications.

Conclusion: Doctrine-grounded benchmarks are needed before LLMs can be trusted for military decision support.

Abstract: Large language models (LLMs) are now being explored for defense applications that require reliable and legally compliant decision support. They also hold significant potential to enhance decision making, coordination, and operational efficiency in military contexts. These uses demand evaluation methods that reflect the doctrinal standards that guide real military operations. Existing safety benchmarks focus on general social risks and do not test whether models follow the legal and ethical rules that govern real military operations. To address this gap, we introduce ARMOR 2025, a military-aligned safety benchmark grounded in three core military doctrines: the Law of War, the Rules of Engagement, and the Joint Ethics Regulation. We extract doctrinal text from these sources and generate multiple-choice questions that preserve the intended meaning of each rule. The benchmark is organized through a taxonomy informed by the Observe-Orient-Decide-Act (OODA) decision-making framework. This structure enables systematic testing of accuracy and refusal across military-relevant decision types. The benchmark features a structured 12-category taxonomy, 519 doctrinally grounded prompts, and rigorous evaluation procedures applied to 21 commercial LLMs. Evaluation results reveal critical gaps in safety alignment for military applications.

[216] Agentic AI for Trip Planning Optimization Application

Tiejin Chen, Ahmadreza Moradipari, Kyungtae Han, Hua Wei, Nejib Ammar

Main category: cs.AI

TL;DR: An orchestrated multi-agent trip-planning framework plus a dataset with ground-truth optimal solutions, reaching 77.4% accuracy on the TOP Benchmark.

DetailsMotivation: Trip planning for intelligent vehicles increasingly demands optimal routes, not merely feasible ones, yet existing systems target feasibility and current benchmarks lack ground truth for objective optimization evaluation.

Method: An orchestration agent coordinates specialized agents for traffic, charging, and points of interest with dynamic refinement; the Trip-planning Optimization Problems (TOP) Dataset supplies definitive optimal solutions and category-level task structure for fine-grained analysis.

Result: 77.4% accuracy on the TOP Benchmark, significantly outperforming single-agent and workflow-based multi-agent baselines.

Conclusion: Orchestrated agentic reasoning is important for robust trip-planning optimization.

Abstract: Trip planning for intelligent vehicles increasingly requires selecting optimal routes rather than merely producing feasible itineraries, as interacting factors such as travel time, energy consumption, and traffic conditions directly affect plan quality. Yet existing systems are largely designed for feasibility-oriented planning, and current benchmarks provide only reference answers without ground truth, preventing objective evaluation of optimization performance. In our paper, we address these limitations with an agentic AI framework that enables dynamic refinement through an orchestration agent coordinating specialized agents for traffic, charging, and points of interest, and with the Trip-planning Optimization Problems Dataset, which supplies definitive optimal solutions and category-level task structure for fine-grained analysis. Experiments show that our system achieves 77.4% accuracy on the TOP Benchmark, significantly outperforming single-agent and workflow-based multi-agent baselines, demonstrating the importance of orchestrated agentic reasoning for robust trip planning optimization.

[217] Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

Yuxuan Gao, Megan Wang, Yi Ling Yu

Main category: cs.AI

TL;DR: A continuous benchmark that measures AI inference at endpoint granularity and synthesizes speed, price, context, quality, and modeled energy into composites such as joules per correct answer.

DetailsMotivation: Deployment decisions are made at the (provider, model, SKU) endpoint, where quantization, decoding, region, and serving stack vary, but public benchmarks compare only models and providers.

Method: Measure five axes per endpoint (output speed, time to first token, workload-blended price, effective context, live-endpoint quality) and combine them with a modeled energy estimate into joules per correct answer, dollars per correct answer, and endpoint fidelity.

Result: Across 78 endpoints serving 12 model families, the same model differs across endpoints by up to 12.5 accuracy points, 12 fidelity points, an order of magnitude in tail latency, and 6.2x in modeled joules per correct answer; workload-aware pricing presets substantially reorder the leaderboard.

Conclusion: TokenArena is a methodology rather than a single ranking, released with full provenance under CC BY 4.0 and open to external replication.

Abstract: Public inference benchmarks compare AI systems at the model and provider level, but the unit at which deployment decisions are actually made is the endpoint: the (provider, model, stock-keeping-unit) tuple at which a specific quantization, decoding strategy, region, and serving stack is exposed. We introduce TokenArena, a continuous benchmark that measures inference at endpoint granularity along five core axes (output speed, time to first token, workload-blended price, effective context, and quality on the live endpoint) and synthesizes them, together with a modeled energy estimate, into three headline composites: joules per correct answer, dollars per correct answer, and endpoint fidelity (output-distribution similarity to a first-party reference). The framework’s novelty is empirical and methodological. Across 78 endpoints serving 12 model families, the same model on different endpoints differs in mean accuracy by up to 12.5 points on math and code, in fingerprint similarity to first party by up to 12 points, in tail latency by an order of magnitude, and in modeled joules per correct answer by a factor of 6.2. We further show that workload-aware blended pricing reorders the leaderboard substantially: 7 of 10 top-ranked endpoints under the chat preset (3:1 input:output) fall out of the top 10 under the retrieval-augmented preset (20:1), and the reasoning preset (1:5) elevates frontier closed models that the chat preset penalizes on price. We release the framework, schema, probe and eval harness, and a v1.0 leaderboard snapshot under CC BY 4.0. TokenArena is a methodology, not a single ranking; we publish full provenance and limitations and welcome external replication.
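The leaderboard-reordering effect of workload-blended pricing is simple arithmetic. A sketch (the prices, ratios, and the exact blend formula below are assumptions for illustration, not TokenArena's published definitions):

```python
def blended_price_per_mtok(in_price, out_price, in_ratio, out_ratio):
    """Workload-blended $/Mtok for an input:output token-ratio preset,
    assumed here to be a ratio-weighted average of the two list prices."""
    total = in_ratio + out_ratio
    return (in_ratio * in_price + out_ratio * out_price) / total

def dollars_per_correct(price_per_mtok, tokens_used, accuracy, n_questions):
    """Composite in the spirit of TokenArena's dollars-per-correct-answer:
    total spend divided by the number of correct answers."""
    spend = price_per_mtok * tokens_used / 1e6
    return spend / (accuracy * n_questions)

# Hypothetical endpoints: A has cheap input but expensive output; B is flat.
a_chat = blended_price_per_mtok(0.5, 6.0, 3, 1)   # chat preset, 3:1
b_chat = blended_price_per_mtok(2.0, 2.0, 3, 1)
a_rsn = blended_price_per_mtok(0.5, 6.0, 1, 5)    # reasoning preset, 1:5
b_rsn = blended_price_per_mtok(2.0, 2.0, 1, 5)
```

With these made-up prices, endpoint A is cheaper under the chat preset but endpoint B wins under the output-heavy reasoning preset, which is the kind of rank flip the abstract reports between presets.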

[218] AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?

Ranit Karmakar, Jayita Chatterjee

Main category: cs.AI

TL;DR: A deterministic 30-task, six-tier benchmark showing that small open-weight models suffice for routine agent tool use, while frontier models retain an edge only on long-horizon planning.

DetailsMotivation: Most model calls in production agent pipelines are short, structured, and routine, raising the routing question of which workflow steps truly require frontier-scale models.

Method: A six-tier capability ladder spanning instruction following, tool use, multi-step coordination, and long-horizon planning under persistent constraints; 16 open-weight models from 0.27B to 32B parameters evaluated alongside GPT-5 across 16,542 scored runs.

Result: The strongest open-weight model matches GPT-5 in aggregate at substantially lower cost; frontier models lead only on long-horizon constraint tracking, and failures are model-specific rather than explained by scale alone.

Conclusion: Route the broad base of routine actions to smaller open-weight models and reserve frontier models for tasks that demand deeper planning and control; the benchmark, harness, sweep configurations, and run corpus are released.

Abstract: Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine. This raises a practical routing question that existing evaluations do not directly answer: which parts of an agent workflow truly require large frontier intelligence, and which can be handled by smaller models? We introduce AgentFloor, a deterministic 30-task benchmark organized as a six-tier capability ladder, spanning instruction following, tool use, multi-step coordination, and long-horizon planning under persistent constraints. We evaluate 16 open-weight models, from 0.27B to 32B parameters, alongside GPT-5 across 16,542 scored runs. Our results reveal a clear boundary of model necessity. Small and mid-sized open-weight models are already sufficient for much of the short-horizon, structured tool use work that dominates real agent pipelines, and in aggregate, the strongest open-weight model matches GPT-5 on our benchmark while being substantially cheaper and faster to run. The gap appears most clearly on long-horizon planning tasks that require sustained coordination and reliable constraint tracking over many steps, where frontier models still hold an advantage, though neither side reaches strong reliability. We also find that this boundary is not explained by scale alone: some failures respond to targeted interventions, but the effects are model-specific rather than universal. These findings suggest a practical design principle for agentic systems: use smaller open-weight models for the broad base of routine actions, and reserve large frontier models for the narrower class of tasks that truly demand deeper planning and control. We release the benchmark, harness, sweep configurations, and full run corpus.

[219] Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning

Richard Dewey, Janos Botyanszki, Ciamac C. Moallemi, Andrew T. Zheng

Main category: cs.AI

TL;DR: Solly, trained by self-play deep reinforcement learning, is the first AI agent to reach elite human play in reduced-format Liar's Poker.

DetailsMotivation: Poker-like testbeds such as no-limit Texas hold'em have subdued multi-player dynamics; Liar's Poker demands extensive multi-player engagement through repeated rounds of bidding.

Method: Self-play with a model-free, actor-critic, deep reinforcement learning algorithm.

Result: Elite-level win rate (over 50% of hands) and equity in both heads-up and multi-player play, outperforming LLMs, including reasoning models, on the same metrics.

Conclusion: Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.

Abstract: AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold’em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar’s Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar’s Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.

[220] Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

Sen Cui, Jingheng Ma

Main category: cs.AI

TL;DR: A position paper proposing Hamiltonian World Models: latent phase-space dynamics with control, dissipation, and residual terms as a physically grounded basis for world modeling.

DetailsMotivation: The three dominant world-model routes (2D video-generative, 3D scene-centric, JEPA-like latent) struggle to deliver physically reliable, action-controllable, long-horizon-stable predictions for embodied decision making.

Method: Encode observations into a structured latent phase space, evolve the latent state with Hamiltonian-inspired dynamics plus control, dissipation, and residual terms, decode predicted trajectories into future observations, and plan over the rollouts.

Result: An argument that Hamiltonian structure can improve interpretability, data efficiency, and long-horizon stability.

Conclusion: The bottleneck is no longer realism but physical meaningfulness; practical challenges remain for friction, contact, non-conservative forces, and deformable objects.

Abstract: World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose \emph{Hamiltonian World Models} as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
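For intuition only, the "Hamiltonian dynamics + control + dissipation" split the abstract describes can be caricatured on a 1-D harmonic oscillator with a symplectic-Euler step; this is textbook mechanics, not the paper's learned latent model:

```python
def hamiltonian_step(q, p, dt, k=1.0, m=1.0, control=0.0, damping=0.0):
    """One symplectic-Euler step of H(q, p) = p^2/(2m) + k q^2/2,
    with an additive control force and a velocity-proportional dissipation
    term, echoing the Hamiltonian + control + dissipation decomposition."""
    p = p + dt * (-k * q + control - damping * p)  # dp/dt = -dH/dq + u - c*p
    q = q + dt * p / m                             # dq/dt =  dH/dp
    return q, p

def energy(q, p, k=1.0, m=1.0):
    return p * p / (2 * m) + k * q * q / 2
```

The appeal of the symplectic structure is visible even in this toy: with no control or damping, the energy stays bounded near its initial value over long rollouts instead of drifting, which is the long-horizon stability argument in miniature; switching on damping makes the energy decay as a non-conservative term should.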

[221] AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

Haotian Zhao, Yuxin Zhang, Songlin Zhou, Stephen S. -T. Yau, Wenyu Zhang, Lun Tian, Tianshu Zhu, Yifeng Huang, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu

Main category: cs.AI

TL;DR: A supervision-free credit-assignment method that adaptively modulates response-level entropy during multi-turn agentic RL training.

DetailsMotivation: Sparse, outcome-only rewards make step-level credit assignment hard, while dense intermediate supervision (process reward models, auxiliary signals) adds complexity and generalizes poorly across tasks and domains.

Method: Lift entropy analysis from the token level to the response level to reduce sampling variance; show that entropy drift under natural gradients is governed by the product of the advantage and the relative response surprisal; derive a practical proxy that reshapes training toward a natural exploration-to-exploitation transition.

Result: Effective across benchmarks and models from 1.5B to 32B parameters, including a 1.4-point gain when integrated into a state-of-the-art baseline on SWE-bench-Verified.

Conclusion: Entropy modulation achieves an effective exploration-exploitation trade-off without additional supervision.

Abstract: Reinforcement learning (RL) has significantly advanced the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. Yet effective training remains challenging, as sparse, outcome-only rewards make it difficult to assign credit to individual steps in an agent’s action trajectory. A common remedy is to introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, but this increases supervision and tuning complexity and often generalizes poorly across tasks and domains. This paper presents AEM, a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to achieve a more effective exploration-exploitation trade-off. Theoretically, we elevate entropy analysis from the token level to the response level to reduce token sampling variance and show that entropy drift under natural gradients is intrinsically governed by the product of the advantage and the relative response surprisal. Specifically, we derive a practical proxy to reshape training dynamics, enabling a natural transition from exploration to exploitation. Extensive experiments across various benchmarks and models ranging from 1.5B to 32B parameters demonstrate the effectiveness of AEM, including a notable 1.4 percent gain when integrated into a state-of-the-art baseline on the highly challenging SWE-bench-Verified benchmark.

[222] Thinking in Text and Images: Interleaved Vision–Language Reasoning Traces for Long-Horizon Robot Manipulation

Jinkun Liu, Haohan Chi, Lingfeng Zhang, Yifan Xie, YuAn Wang, Long Chen, Hangjun Ye, Xiaoshuai Hao, Wenbo Ding

Main category: cs.AI

TL;DR: A policy framework whose intermediate representation interleaves textual subgoals with visual keyframes over the full task horizon, markedly improving long-horizon manipulation.

DetailsMotivation: Text-only chain-of-thought encodes causal order but misses spatial constraints, while visual prediction gives geometric cues that remain local and semantically underconstrained; long-horizon manipulation needs both.

Method: A single native multimodal transformer self-generates a global interleaved text-image trace from the initial observation and instruction, caches it, and conditions a closed-loop action decoder on it; pseudo-supervision is built by temporally segmenting demonstrations and captioning each stage with a vision-language model.

Result: 95.5% average success on LIBERO (92.4% on LIBERO-Long) and 59.4% on SimplerEnv-WidowX; without traces LIBERO-Long drops to 37.7%, and text-only or vision-only traces reach only 62.0% and 68.4%.

Conclusion: Both modalities are necessary; traces tolerate local corruption and moderate execution drift but remain limited under stale or incorrect global plans.

Abstract: Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies usually hide planning in latent states or expose only one modality: text-only chain-of-thought encodes causal order but misses spatial constraints, while visual prediction provides geometric cues but often remains local and semantically underconstrained. We introduce Interleaved Vision–Language Reasoning (IVLR), a policy framework built around an explicit intermediate trace representation that alternates textual subgoals with visual keyframes over the full task horizon. At test time, a single native multimodal transformer self-generates this global semantic-geometric trace from the initial observation and instruction, caches it, and conditions a closed-loop action decoder on the trace, original instruction, and current observation. Because standard robot datasets lack such traces, we construct pseudo-supervision by temporally segmenting demonstrations and captioning each stage with a vision-language model. Across simulated benchmarks for long-horizon manipulation and visual distribution shift, IVLR reaches 95.5% average success on LIBERO, including 92.4% on LIBERO-Long, and 59.4% overall success on SimplerEnv-WidowX. Ablations show that both modalities are necessary: without traces, LIBERO-Long success drops to 37.7%; text-only and vision-only traces reach 62.0% and 68.4%, while the full interleaved trace reaches 92.4%. Stress tests with execution perturbations and masked trace content show moderate degradation, suggesting that the trace can tolerate local corruption and moderate execution drift, but remains limited under stale or incorrect global plans.

[223] On the Role of Artificial Intelligence in Human-Machine Symbiosis

Ching-Chun Chang, Yuchen Guo, Hanrui Wang, Timo Spinde, Isao Echizen

Main category: cs.AI

TL;DR: A method for embedding and later recovering the functional role AI played (assistive editor versus creative generator) in generated text.

DetailsMotivation: In human-machine symbiosis the pertinent question is not whether AI participated but how; the role specified in the prompt becomes untraceable once content is detached from its dialogue context.

Method: Infer the latent role specified by the prompt, embed it into the content during the probabilistic generation process, and recover the nature of AI participation from the resulting text alone.

Result: In experiments contrasting AI as an assistive editor with AI as a creative generator, the method discriminates between roles, is robust to perturbations, and preserves linguistic quality.

Conclusion: A contribution toward research on the ethics of fair, transparent, and appropriate AI use.

Abstract: The evolution of artificial intelligence (AI) has rendered the boundary between humanity and computational machinery increasingly ambiguous. In the presence of more interwoven relationships within human-machine symbiosis, the very notion of AI-generated information becomes difficult to define, as such information arises not from either humans or machines in isolation, but from their mutual shaping. Therefore, a more pertinent question lies not merely in whether AI has participated, but in how it has participated. In general, the role assumed by AI is often specified, either implicitly or explicitly, in the input prompt, yet becomes less apparent or altogether unobservable when the generated content alone is available. Once detached from the dialogue context, the functional role may no longer be traceable. This study considers the problem of tracing the functional role played by AI in natural language generation. A methodology is proposed to infer the latent role specified by the prompt, embed this role into the content during the probabilistic generation process and subsequently recover the nature of AI participation from the resulting text. Experimentation is conducted under a representative scenario in which AI acts either as an assistive agent that edits human-written content or as a creative agent that generates new content from a brief concept. The experimental results support the validity of the proposed methodology in terms of discrimination between roles, robustness against perturbations and preservation of linguistic quality. We envision that this study may contribute to future research on the ethics of AI with regard to whether AI has been used fairly, transparently and appropriately.

[224] Instance-Aware Parameter Configuration in Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem

Yinghao Qin, Xinwei Wang, Mosab Bazargani, Jun Chen

Main category: cs.AI

Abstract: Algorithm performance in combinatorial optimization is highly sensitive to parameter settings, while a single globally tuned configuration often fails to exploit the heterogeneity of instances. This limitation is particularly evident in the Electric Capacitated Vehicle Routing Problem, where instances differ in structure, demand patterns, and energy constraints. This paper investigates instance-aware parameter configuration for Bilevel Late Acceptance Hill Climbing, a state-of-the-art metaheuristic for the Electric Capacitated Vehicle Routing Problem. An offline tuning procedure is used to obtain instance-specific parameter labels, which are then mapped from instance features via a regression model to enable parameter prediction for unseen instances prior to execution. Experimental results on the IEEE WCCI 2020 benchmark and its extensions show that the proposed approach achieves an average objective value reduction of 0.28% across eight held-out test instances relative to a globally tuned configuration. This corresponds to a significant cost reduction in multimillion-dollar transportation operations.
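
The feature-to-parameter pipeline the abstract describes (offline tuning produces instance-specific labels, a regressor then predicts parameters for unseen instances) can be sketched in a few lines. The features, labels, and 1-nearest-neighbour regressor below are illustrative stand-ins, not the paper's actual model:

```python
# Hypothetical sketch of instance-aware parameter prediction: map simple
# instance features to an offline-tuned parameter (here, a history length).
# Feature choices and values are illustrative, not taken from the paper.

def extract_features(instance):
    """Summarize an instance as (n_customers, mean_demand, n_stations)."""
    demands = instance["demands"]
    return (len(demands), sum(demands) / len(demands), instance["n_stations"])

def predict_history_length(features, training_data):
    """1-nearest-neighbour regression over offline-tuned labels."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_data, key=lambda pair: dist(pair[0], features))
    return label

# Offline tuning would produce (features, best_parameter) pairs like these.
tuned = [
    ((50, 8.0, 4), 200),
    ((200, 3.5, 10), 1500),
]
unseen = {"demands": [8, 7, 9] * 18, "n_stations": 5}  # 54 customers
L = predict_history_length(extract_features(unseen), tuned)
```

A learned regression model (the paper does not specify which) would replace the nearest-neighbour lookup, but the prediction-before-execution workflow is the same.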

[225] Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Yan Zhang, Daiqing Wu, Huawen Shen, Yu Zhou, Can Ma

Main category: cs.AI

Abstract: Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.
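
The "Gaussian soft mask" over a target bounding box that GUI-SD uses to build the teacher's privileged context might look like the following; the exact parameterization (how the box size sets the standard deviation) is an assumption of ours, not specified in the abstract:

```python
import math

def gaussian_soft_mask(width, height, box, sigma_scale=0.5):
    """Soft mask peaking at the box centre and decaying outward, so the
    teacher sees where the target is without exact coordinates leaking.
    sigma_scale is a hypothetical knob tying spread to box size."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max((x1 - x0) * sigma_scale, 1e-6)
    sy = max((y1 - y0) * sigma_scale, 1e-6)
    return [
        [math.exp(-(((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2) / 2.0)
         for x in range(width)]
        for y in range(height)
    ]

mask = gaussian_soft_mask(8, 8, box=(2, 2, 6, 6))
```

The mask value is 1.0 at the box centre and falls off smoothly, which is what makes the guidance informative yet coordinate-free.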

[226] To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

Qinyuan Wu, Soumi Das, Mahsa Amani, Arijit Nag, Seungeon Lee, Krishna P. Gummadi, Abhilasha Ravichander, Muhammad Bilal Zafar

Main category: cs.AI

Abstract: Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may be redundant or even harmful. Effective tool use, therefore, hinges on a core LLM decision: whether to call or not call a tool, when performing a task. This decision is particularly challenging for web search tools, where the benefits of external information depend on the model’s internal knowledge and its ability to integrate potentially noisy tool responses. We introduce a principled framework inspired by decision-making theory to evaluate web search tool-use decisions along three key factors: necessity, utility, and affordability. Our analysis combines two complementary lenses: a normative perspective that infers true need and utility from an optimal allocation of tool calls, and a descriptive perspective that infers the model’s self-perceived need and utility from their observed behaviors. We find that models’ perceived need and utility of tool calls are often misaligned with their true need and utility. Building on this framework, we train lightweight estimators of need and utility based on models’ hidden states. Our estimators enable simple controllers that can improve decision quality and lead to stronger task performance than the self-perceived setup across three tasks and six models.
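
A minimal decision rule in the spirit of the necessity/utility/affordability framing could look like this; the thresholds and the scalar "need" and "utility" inputs are illustrative placeholders for the paper's trained hidden-state estimators:

```python
def should_call_tool(p_need, expected_utility, cost, need_threshold=0.5):
    """Decision-theoretic sketch: call the tool only when the model likely
    needs external information AND the expected benefit, weighted by that
    need, exceeds the call's cost (affordability). All inputs are assumed
    to come from lightweight estimators; values here are hypothetical."""
    return p_need >= need_threshold and p_need * expected_utility > cost
```

A controller would apply this per query, replacing the model's own (often miscalibrated) self-perceived judgment with estimator outputs.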

[227] Position: agentic AI orchestration should be Bayes-consistent

Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

Main category: cs.AI

Abstract: LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this position paper argues that the control layer of an agentic AI system (that orchestrates LLMs and tools) is a clear case where Bayesian principles should shine. Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities, to update these beliefs from observed agentic and human-AI interactions, and to choose actions. Making LLMs themselves explicitly Bayesian belief-updating engines remains computationally intensive and conceptually nontrivial as a general modeling target. In contrast, this paper argues that coherent decision-making requires Bayesian principles at the orchestration level of the agentic system, not necessarily the LLM agent parameters. This paper articulates practical properties for Bayesian control that fit modern agentic AI systems and human-AI collaboration, and provides concrete examples and design patterns to illustrate how calibrated beliefs and utility-aware policies can improve agentic AI orchestration.
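
One concrete design pattern for Bayesian orchestration, consistent with (but not taken from) the paper: maintain a Beta-Bernoulli belief over each tool's success rate, update it from observed interactions, and route via Thompson sampling. A minimal sketch under those assumptions:

```python
import random

class ToolBelief:
    """Beta-Bernoulli belief over a tool's success rate, held at the
    orchestration layer, not inside the LLM. Illustrative pattern only."""
    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0  # uniform Beta(1, 1) prior
    def update(self, success):
        # Conjugate update: observed success/failure shifts the posterior.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0
    def mean(self):
        return self.alpha / (self.alpha + self.beta)
    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

def choose_tool(beliefs, rng):
    """Thompson sampling: pick the tool whose sampled rate is highest."""
    return max(beliefs, key=lambda name: beliefs[name].sample(rng))

rng = random.Random(0)
beliefs = {"search": ToolBelief(), "calculator": ToolBelief()}
for _ in range(20):  # simulated interaction log
    beliefs["search"].update(True)
    beliefs["calculator"].update(False)
```

The point of the position paper is exactly this separation: the belief state and the utility-aware policy live in the control layer, while the LLM agents stay unchanged.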

[228] Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton

Main category: cs.AI

Abstract: The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman equation, are ubiquitous in reinforcement learning and control theory. However, these equations become intractable for high-dimensional or nonlinear systems. This paper develops two new reinforcement learning algorithms based on the data-driven Koopman operator, which lifts a nonlinear system into new coordinates where the dynamics become approximately linear, and where Hamilton-Jacobi-Bellman-based methods are more tractable. In particular, the Koopman operator captures the expectation of the time evolution of the value function via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a "controlled Koopman tensor" that facilitates the estimation of the optimal value function. This enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic. This flexible and interpretable framework includes deterministic and stochastic systems, as well as discrete and continuous dynamics. Koopman-assisted reinforcement learning attains state-of-the-art performance with respect to traditional neural network-based soft actor-critic baselines on a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
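
The core lifting idea can be illustrated with a standard EDMD-style fit (our toy, not the paper's algorithm): choose a dictionary of observables, then solve a least-squares problem so the lifted state evolves linearly. Here the system, the polynomial dictionary, and the data are all illustrative:

```python
import numpy as np

def lift(x):
    """Fixed dictionary of observables: [1, x, x^2, x^3] (our choice)."""
    return np.array([1.0, x, x**2, x**3])

# Toy 1-D dynamics x' = 0.9 x, sampled at random states.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, 200)
xs_next = 0.9 * xs

Psi = np.stack([lift(x) for x in xs])            # (N, d) lifted states
Psi_next = np.stack([lift(x) for x in xs_next])  # (N, d) lifted successors
K, *_ = np.linalg.lstsq(Psi, Psi_next, rcond=None)  # Psi @ K ≈ Psi_next

# One-step prediction of the observable g(x) = x through the linear lift.
x0 = 0.5
pred = (lift(x0) @ K)[1]  # the linear-feature coordinate of the lifted step
```

For this dictionary the true lifted dynamics are exactly linear (each monomial scales by a power of 0.9), so the fitted K advances the value-relevant observables without ever touching the nonlinear state equation, which is what makes HJB-style methods tractable in the lifted space.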

[229] Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies

Guangyu Zhao, Kewei Lian, Haoxuan Ru, Borong Zhang, Haowei Lin, Zhancun Mu, Haobo Fu, Qiang Fu, Shaofei Cai, Zihao Wang, Yitao Liang

Main category: cs.AI

Abstract: Goal-conditioned policies enable decision-making models to execute diverse behaviors based on specified goals, yet their downstream performance is often highly sensitive to the choice of instructions or prompts. To bypass the limitations of discrete text prompts, we formulate post-training adaptation as a latent control problem, where the goal embedding serves as a continuous control variable to modulate the behavior of a frozen policy. We propose Preference Goal Tuning (PGT), a framework that optimizes this latent control variable to align the induced trajectory distribution with task preferences. Unlike standard fine-tuning that updates policy parameters, PGT keeps the policy frozen and updates only the latent goal using a trajectory-level preference objective. This approach essentially searches for the optimal conditioning input that maximizes the likelihood of preferred behaviors while suppressing undesirable ones. We evaluate PGT on the Minecraft SkillForge benchmark across 17 tasks. With minimal data, PGT achieves average relative improvements of 72.0% and 81.6% on two foundation policies, consistently outperforming expert-crafted prompts. Crucially, by decoupling task alignment (latent goal) from physical dynamics (frozen policy), PGT surpasses full fine-tuning by 13.4% in out-of-distribution settings, demonstrating superior robustness and generalization.
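
The key mechanic, optimizing only the latent goal while the policy stays frozen, can be shown on a toy problem. The dot-product "policy", the Bradley-Terry-style margin, and the gradient-free search below are all stand-ins for the paper's trajectory-level preference objective:

```python
import random

def frozen_policy_logit(goal, behaviour):
    """Frozen scoring function (stand-in for a goal-conditioned policy):
    its parameters never change; only the goal vector is tuned."""
    return sum(a * b for a, b in zip(goal, behaviour))

def preference_objective(goal, preferred, dispreferred):
    """Margin by which the preferred behaviour outscores the dispreferred."""
    return (frozen_policy_logit(goal, preferred)
            - frozen_policy_logit(goal, dispreferred))

def tune_goal(goal, preferred, dispreferred, steps=200, lr=0.1):
    """Greedy random search over the latent goal (a crude optimizer,
    chosen only to keep the sketch dependency-free)."""
    rng = random.Random(0)
    best = list(goal)
    best_val = preference_objective(best, preferred, dispreferred)
    for _ in range(steps):
        cand = [g + lr * rng.gauss(0, 1) for g in best]
        val = preference_objective(cand, preferred, dispreferred)
        if val > best_val:
            best, best_val = cand, val
    return best

g0 = [0.0, 0.0]
g = tune_goal(g0, preferred=[1.0, 0.0], dispreferred=[0.0, 1.0])
```

Because only the conditioning input moves, task alignment is decoupled from the frozen dynamics model, which is the property the abstract credits for PGT's out-of-distribution robustness.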

[230] InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction

Bin Lei, Weitai Kang, Zijian Zhang, Winson Chen, Xi Xie, Shan Zuo, Mimi Xie, Ali Payani, Mingyi Hong, Yan Yan, Caiwen Ding

Main category: cs.AI

Abstract: This paper introduces InfantAgent-Next, a generalist agent capable of interacting with computers in a multimodal manner, encompassing text, images, audio, and video. Unlike existing approaches that either build intricate workflows around a single large model or only provide workflow modularity, our agent integrates tool-based and pure vision agents within a highly modular architecture, enabling different models to collaboratively solve decoupled tasks in a step-by-step manner. Our generality is demonstrated by our ability to evaluate not only pure vision-based real-world benchmarks (i.e., OSWorld), but also more general or tool-intensive benchmarks (e.g., GAIA and SWE-Bench). Specifically, we achieve 7.27% accuracy on OSWorld, higher than Claude-Computer-Use. Codes and evaluation scripts are open-sourced at https://github.com/bin123apple/InfantAgent.

[231] Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs

Yisen Gao, Jiaxin Bai, Tianshi Zheng, Qingyun Sun, Ziwei Zhang, Xingcheng Fu, Jianxin Li, Yangqiu Song

Main category: cs.AI

Abstract: Abductive reasoning in knowledge graphs aims to generate plausible logical hypotheses from observed entities, with broad applications in areas such as clinical diagnosis and scientific discovery. However, due to a lack of controllability, a single observation may yield numerous plausible but redundant or irrelevant hypotheses on large-scale knowledge graphs. To address this limitation, we introduce the task of controllable hypothesis generation to improve the practical utility of abductive reasoning. This task faces two key challenges when controlling for generating long and complex logical hypotheses: hypothesis space collapse and hypothesis oversensitivity. To address these challenges, we propose CtrlHGen, a Controllable logical Hypothesis Generation framework for abductive reasoning over knowledge graphs, trained in a two-stage paradigm including supervised learning and subsequent reinforcement learning. To mitigate hypothesis space collapse, we design a dataset augmentation strategy based on sub-logical decomposition, enabling the model to learn complex logical structures by leveraging semantic patterns in simpler components. To address hypothesis oversensitivity, we incorporate smoothed semantic rewards including Dice and Overlap scores, and introduce a condition-adherence reward to guide the generation toward user-specified control constraints. Extensive experiments on three benchmark datasets demonstrate that our model not only better adheres to control conditions but also achieves superior semantic similarity performance compared to baselines. Our code is available at https://github.com/HKUST-KnowComp/CtrlHGen.
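
The Dice and Overlap scores used as smoothed rewards are standard set-similarity measures; how CtrlHGen applies them to answer sets is the paper's design, but the measures themselves compute as follows:

```python
def dice_score(pred, target):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = set(pred), set(target)
    if not pred and not target:
        return 1.0
    return 2 * len(pred & target) / (len(pred) + len(target))

def overlap_score(pred, target):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    pred, target = set(pred), set(target)
    if not pred or not target:
        return 0.0
    return len(pred & target) / min(len(pred), len(target))
```

Unlike an exact-match reward, both vary smoothly with partial agreement between the entailed and observed entity sets, which is what makes them useful against hypothesis oversensitivity.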

[232] CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments

Nitish Jaipuria, Lorenzo Gatto, Zijun Kan, Shankey Poddar, Bill Cheung, Diksha Bansal, Ramanan Balakrishnan, Aviral Suri, Jose Estevez

Main category: cs.AI

Abstract: The proliferation of digital payment platforms has transformed commerce, offering unmatched convenience and accessibility globally. However, this growth has also attracted malicious actors, leading to a corresponding increase in sophisticated social engineering scams. These scams are often initiated and orchestrated on multiple surfaces outside the payment platform, making user and transaction-based signals insufficient for a complete understanding of the scam’s methodology and underlying patterns, without which it is very difficult to prevent it in a timely manner. This paper presents CASE (Conversational Agent for Scam Elucidation), a novel Agentic AI framework that addresses this problem by collecting and managing user scam feedback in a safe and scalable manner. A conversational agent is uniquely designed to proactively interview potential victims to elicit intelligence in the form of a detailed conversation. The conversation transcripts are then consumed by another AI system that extracts information and converts it into structured data for downstream usage in automated and manual enforcement mechanisms. Using Google’s Gemini family of LLMs, we implemented this framework on Google Pay (GPay) India. By augmenting our existing features with this new intelligence, we have observed a 21% uplift in the volume of scam enforcements. The architecture and its robust evaluation framework are highly generalizable, offering a blueprint for building similar AI-driven systems to collect and manage scam intelligence in other sensitive domains.

[233] G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge

Linhao Luo, Zicheng Zhao, Junnan Liu, Zhangchi Qiu, Junnan Dong, Serge Panev, Chen Gong, Thuy-Trang Vu, Gholamreza Haffari, Dinh Phung, Alan Wee-Chung Liew, Shirui Pan

Main category: cs.AI

Abstract: Large language models (LLMs) excel at complex reasoning but remain limited by static and incomplete parametric knowledge. Retrieval-augmented generation (RAG) mitigates this by incorporating external knowledge, yet existing RAGs struggle with knowledge-intensive tasks due to fragmented information and weak modeling of knowledge structure. Graphs offer a natural way to model relationships within knowledge, but LLMs are inherently unstructured and cannot effectively reason over graph-structured data. Recent graph-enhanced RAG (GraphRAG) attempts to bridge this gap by constructing tailored graphs and enabling LLMs to reason on them. However, these methods often depend on ad-hoc graph designs, heuristic search, or costly agent pipelines, which hinder scalability and generalization. To address these challenges, we present G-reasoner, a unified framework that integrates graph and language foundation models for scalable reasoning over diverse graph-structured knowledge. Central to our approach is QuadGraph, a standardized four-layer abstraction that unifies heterogeneous knowledge sources into a common graph representation. Building on this, we introduce a 34M-parameter graph foundation model (GFM) that jointly captures graph topology and textual semantics, and is integrated with LLMs to enhance reasoning in downstream applications. To ensure scalability and efficiency, mixed-precision training and distributed message-passing are implemented to scale GFM with more GPUs. Extensive experiments on six benchmarks show that G-reasoner consistently outperforms state-of-the-art baselines, significantly enhances LLM reasoning, and achieves strong efficiency and cross-graph generalization.

[234] Training-Free Time Series Classification via In-Context Reasoning with LLM Agents

Songyuan Sui, Zihang Xu, Xia Hu

Main category: cs.AI

Abstract: Time series classification (TSC) spans diverse application scenarios, yet labeled data are often scarce, making task-specific training costly and inflexible. Recent reasoning-oriented large language models (LLMs) show promise in understanding temporal patterns, but purely zero-shot usage remains suboptimal. We propose FETA, a multi-agent framework for training-free TSC via exemplar-based in-context reasoning. FETA decomposes a multivariate series into channel-wise subproblems, retrieves a few structurally similar labeled examples for each channel, and leverages a reasoning LLM to compare the query against these exemplars, producing channel-level labels with self-assessed confidences; a confidence-weighted aggregator then fuses all channel decisions. This design eliminates the need for pretraining or fine-tuning, improves efficiency by pruning irrelevant channels and controlling input length, and enhances interpretability through exemplar grounding and confidence estimation. On nine challenging UEA datasets, FETA achieves strong accuracy under a fully training-free setting, surpassing multiple trained baselines. These results demonstrate that a multi-agent in-context reasoning framework can transform LLMs into competitive, plug-and-play TSC solvers without any parameter training. The code is available at https://github.com/SongyuanSui/FETATSC.
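
FETA's final fusion step, combining per-channel labels weighted by self-assessed confidence, admits a simple reading: sum confidences per label and take the argmax. The rule and the example votes below are a plausible sketch, not the paper's exact aggregator:

```python
from collections import defaultdict

def aggregate(channel_votes):
    """Fuse per-channel (label, confidence) decisions by summing the
    confidences assigned to each label and returning the best-supported
    label. One plausible confidence-weighted aggregator, assumed here."""
    scores = defaultdict(float)
    for label, confidence in channel_votes:
        scores[label] += confidence
    return max(scores, key=scores.get)

# Hypothetical channel-level outputs from the reasoning LLM.
pred = aggregate([("walking", 0.9), ("running", 0.6), ("walking", 0.4)])
```

Two moderately confident channels can thus outvote one confident channel, which is the intended benefit of carrying confidences instead of hard votes.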

[235] A hybrid solution approach for the Integrated Healthcare Timetabling Competition 2024

Daniela Guericke, Rolf van der Hulst, Asal Karimpour, Ieke Schrader, Matthias Walter

Main category: cs.AI

Abstract: In this work, we present the solution approach for the Integrated Healthcare Timetabling Competition 2024 submitted by Team Twente, which ultimately ranked third among the finalists. Our approach combines mixed-integer programming, constraint programming, and simulated annealing in a 3-phase solution approach based on decomposition into subproblems. In addition to describing our approach and design decisions, we share our insights and, for the first time, lower bounds on the optimal solution values for the benchmark instances. We analyze the results based on solution quality for the competition and for an extended runtime. Additionally, we investigate the different soft constraints and specific parts of the algorithm. Finally, we highlight open problems and future research directions for further improving the approach.
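
Of the three ingredients, the simulated-annealing component has a canonical core loop worth recalling; the toy objective, neighbourhood, and cooling schedule below are ours, since the abstract does not describe them:

```python
import math
import random

def simulated_annealing(cost, neighbour, x0, t0=1.0, alpha=0.995, steps=2000):
    """Generic simulated annealing: always accept improvements, accept
    worsening moves with probability exp(-delta / t), and cool t geometrically.
    Parameters here are illustrative defaults, not Team Twente's settings."""
    rng = random.Random(42)
    x, t = x0, t0
    best = x
    for _ in range(steps):
        y = neighbour(x, rng)
        delta = cost(y) - cost(x)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = y
        if cost(x) < cost(best):
            best = x
        t *= alpha
    return best

# Toy instance: minimise (x - 3)^2 over integers via +/-1 moves.
sol = simulated_annealing(
    cost=lambda x: (x - 3) ** 2,
    neighbour=lambda x, rng: x + rng.choice((-1, 1)),
    x0=20,
)
```

In a decomposition-based scheme like the one described, a loop of this shape would refine each subproblem's schedule after the MIP/CP phases fix the coarse structure.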

[236] E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory

Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2601.21714 returned HTTP 429 (rate limited).

[237] The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning

Henry Han, Xiyang Liu, Xiaodong Wang, Fei Han, Xiaodong Li

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2602.13595 returned HTTP 429 (rate limited).

[238] HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

Xiaochen Zhao, Kaikai Wang, Xiaowen Zhang, Chen Yao, Aili Wang

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2602.13933 returned HTTP 429 (rate limited).

[239] Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2603.25719 returned HTTP 429 (rate limited).

[240] The Topology of Multimodal Fusion: Why Current Architectures Fail at Creative Cognition

Xiujiang Tan

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.04465 returned HTTP 429 (rate limited).

[241] Compiling Deterministic Structure into SLM Harnesses

Zan Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.17450 returned HTTP 429 (rate limited).

[242] Language models recognize dropout and Gaussian noise applied to their activations

Damiano Fornasiere, Mirko Bronzi, Spencer Kitts, Alessandro Palmas, Yoshua Bengio, Oliver Richardson

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.17465 returned HTTP 429 (rate limited).

[243] Error-free Training for MedMNIST Datasets

Bo Deng

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.18916 returned HTTP 429 (rate limited).

[244] Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha, Samarjit Chakraborty, Cecilia Carbonelli, Akash Kumar

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.27807 returned HTTP 429 (rate limited).

[245] D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Hanane Nour Moussa, Yifei Li, Zhuoyang Li, Yankai Yang, Cheng Tang, Tianshu Zhang, Nesreen K. Ahmed, Ali Payani, Ziru Chen, Huan Sun

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2604.27977 returned HTTP 429 (rate limited).

[246] Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Qiyuan Zhu, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan

Main category: cs.AI

Abstract: Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the resulting graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm for constructing evolution chains that trace the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging field of automated scientific discovery.

[247] Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs

Washim Uddin Mondal, Vaneet Aggarwal

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2408.11513 returned HTTP 429 (rate limited).

[248] Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey

Hugo Attali, Davide Buscaldi, Nathalie Pernelle, Fragkiskos D. Malliaros

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2411.17429 returned HTTP 429 (rate limited).

[249] Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments

Luca Castri, Gloria Beraldo, Nicola Bellotto

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2504.11901 returned HTTP 429 (rate limited).

[250] Effective LLM Code Refinement via Property-Oriented and Structurally Minimal Feedback

Lehan He, Zeren Chen, Zhe Zhang, Xiang Gao, Lu Sheng

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2506.18315 returned HTTP 429 (rate limited).

[251] Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He

Main category: cs.AI

Abstract: Not available; the arXiv API request for 2508.06361 returned HTTP 429 (rate limited).

Arshia Akhavan, Alireza Hosseinpour, Abbas Heydarnoori, Hamid Bagheri, Mehdi Keshani

Main category: cs.AI

Abstract: not available (arXiv 2508.12232; fetch failed with HTTP 429).

[253] Vibe Coding in Product Teams: Reconfiguring AI-Assisted Workflows, Prototyping, and Collaboration

Jie Li, Youyang Hou, Laura Lin, Ruihao Zhu, Hancheng Cao, Abdallah El Ali

Main category: cs.AI

Abstract: not available (arXiv 2509.10652; fetch failed with HTTP 429).

[254] LLM DNA: Tracing Model Evolution via Functional Representations

Zhaomin Wu, Haodong Zhao, Ziyang Wang, Jizhou Guo, Qian Wang, Bingsheng He

Main category: cs.AI

Abstract: not available (arXiv 2509.24496; fetch failed with HTTP 429).

[255] Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression

Lorenzo Nikiforos, Luciano Prono, Charalampos Antoniadis, Fabio Pareschi, Riccardo Rovatti, Gianluca Setti

Main category: cs.AI

Abstract: not available (arXiv 2510.09696; fetch failed with HTTP 429).

[256] ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Charidimos Papadakis, Angeliki Dimitriou, Giorgos Filandrianos, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou

Main category: cs.AI

Abstract: not available (arXiv 2510.15949; fetch failed with HTTP 429).

[257] Feedback Lunch: Learned Feedback Codes for Secure Communications

Yingyao Zhou, Natasha Devroye, Onur Günlü

Main category: cs.AI

Abstract: not available (arXiv 2510.16620; fetch failed with HTTP 429).

[258] MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

Main category: cs.AI

Abstract: not available (arXiv 2510.17281; fetch failed with HTTP 429).

[259] Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

Md. Mehedi Hasan, Sk Tanzir Mehedi, Ziaur Rahman, Rafid Mostafiz, Md. Abir Hossain

Main category: cs.AI

Abstract: not available (arXiv 2510.22628; fetch failed with HTTP 429).

[260] TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation

Baris Yilmaz, Bevan Deniz Cilgin, Erdem Akagündüz, Salih Tileylioglu

Main category: cs.AI

Abstract: not available (arXiv 2512.04694; fetch failed with HTTP 429).

[261] AI-Driven Expansion and Application of the Alexandria Database

Théo Cavignac, Jonathan Schmidt, Pierre-Paul De Breuck, Antoine Loew, Tiago F. T. Cerqueira, Hai-Chen Wang, Anton Bochkarev, Yury Lysogorskiy, Aldo H. Romero, Ralf Drautz, Silvana Botti, Miguel A. L. Marques

Main category: cs.AI

Abstract: not available (arXiv 2512.09169; fetch failed with HTTP 429).

[262] Evolutionary BP+OSD Decoding for Low-Latency Quantum Error Correction

Hee-Youl Kwak, Seong-Joon Park, Hyunwoo Jung, Jeongseok Ha, Jae-Won Kim

Main category: cs.AI

Abstract: not available (arXiv 2512.18273; fetch failed with HTTP 429).

[263] Adoption and Use of LLMs at an Academic Medical Center

Nigam H. Shah, Nerissa Ambers, Abby Pandya, Timothy Keyes, Juan M. Banda, Srikar Nallan, Carlene Lugtu, Artem A. Trotsyuk, Suhana Bedi, Alyssa Unell, Miguel Fuentes, Francois Grolleau, Sneha S. Jain, Jonathan Chen, Devdutta Dash, Danton Char, Aditya Sharma, Duncan McElfresh, Patrick Scully, Vishanthan Kumar, Clancy Dennis, Connor OBrien, Satchi Mouniswamy, Elvis Jones, Krishna Jasti, Gunavathi Mannika Lakshmanan, Sree Ram Akula, Varun Kumar Singh, Ramesh Rajmanickam, Sudhir Sinha, Vicky Zhou, Xu Wang, Bilal Mawji, Joshua Ge, Wencheng Li, Travis Lyons, Jarrod Helzer, Vikas Kakkar, Ramesh Powar, Darren Batara, Cheryl Cordova, William Frederick III, Olivia Tang, Phoebe Morgan, April S. Liang, Stephen P. Ma, Shivam Vedak, Dong-han Yao, Akshay Swaminathan, Mehr Kashyap, Brian Ng, Jamie Hellman, Nikesh Kotecha, Christopher Sharp, Gretchen Brown, Christian Lindmark, Anurang Revri, Michael A. Pfeffer

Main category: cs.AI

Abstract: not available (arXiv 2602.00074; fetch failed with HTTP 429).

[264] BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

Abdullah Arafat Miah, Kevin Vu, Yu Bi

Main category: cs.AI

Abstract: not available (arXiv 2602.07200; fetch failed with HTTP 429).

[265] Knowledge-Based Design Requirements for Generative Social Robots in Higher Education

Stephan Vonschallen, Dominique Oberle, Theresa Schmiedel, Friederike Eyssel

Main category: cs.AI

Abstract: not available (arXiv 2602.12873; fetch failed with HTTP 429).

[266] Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising

Yuduo Guo, Hao Zhang, Mingyu Li, Fujiang Yu, Yunjing Wu, Yuhan Hao, Song Huang, Yongming Liang, Xiaojing Lin, Xinyang Li, Jiamin Wu, Zheng Cai, Qionghai Dai

Main category: cs.AI

Abstract: not available (arXiv 2602.17205; fetch failed with HTTP 429).

[267] GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables

Zhengyu Li, Xiangfei Qiu, Yuhan Zhu, Xingjian Wu, Jilin Hu, Chenjuan Guo, Bin Yang

Main category: cs.AI

Abstract: not available (arXiv 2603.08032; fetch failed with HTTP 429).

[268] Semantic Level of Detail for Knowledge Graphs: Discovering Abstraction Boundaries via Spectral Heat Diffusion

Edward Izgorodin

Main category: cs.AI

Abstract: not available (arXiv 2603.08965; fetch failed with HTTP 429).

[269] GenRecEdit: Adapting Model Editing for Generative Recommendation with Cold-Start Items

Chenglei Shen, Teng Shi, Weijie Yu, Xiao Zhang, Jun Xu

Main category: cs.AI

Abstract: not available (arXiv 2603.14259; fetch failed with HTTP 429).

[270] Degrees, Levels, and Profiles of Contextuality

Ehtibar N. Dzhafarov, Victor H. Cervantes

Main category: cs.AI

Abstract: not available (arXiv 2603.26692; fetch failed with HTTP 429).

[271] A First Guess is Rarely the Final Answer: Learning to Search in the Traveling Salesperson Problem

Andoni Irazusta Garmendia

Main category: cs.AI

Abstract: not available (arXiv 2604.06940; fetch failed with HTTP 429).

[272] Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization

Tao Li, Kaiyuan Hou, Tuan Vinh, Monika Raj, Zhichun Guo, Carl Yang

Main category: cs.AI

Abstract: not available (arXiv 2604.07669; fetch failed with HTTP 429).

[273] Bridging the Experimental Last Mile: Digitizing Laboratory Know-How for Safe AI-Assisted Support

Akira Miura, Yuki Sasahara, Momoka Demura, Yuji Masubuchi, Tetsuya Asai, Chikahiko Mitsui

Main category: cs.AI

Abstract: not available (arXiv 2604.16345; fetch failed with HTTP 429).

[274] Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang

Main category: cs.AI

Abstract: not available (arXiv 2604.18239; fetch failed with HTTP 429).

[275] Removing Sandbagging in LLMs by Training with Weak Supervision

Emil Ryd, Henning Bartsch, Julian Stastny, Joe Benton, Vivek Hebbar

Main category: cs.AI

Abstract: not available (arXiv 2604.22082; fetch failed with HTTP 429).

[276] SUDP: Secret-Use Delegation Protocol for Agentic Systems

Xiaohang Yu, Hejia Geng, Xinmeng Zeng, William Knottenbelt

Main category: cs.AI

Abstract: not available (arXiv 2604.24920; fetch failed with HTTP 429).

[277] Learning Rate Transfer in Normalized Transformers

Boris Shigida, Boris Hanin, Andrey Gromov

Main category: cs.AI

Abstract: not available (arXiv 2604.27077; fetch failed with HTTP 429).

[278] Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

Halley Young, Nikolaj Björner

Main category: cs.AI

Abstract: not available (arXiv 2604.27209; fetch failed with HTTP 429).

[279] AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

Xubin Luo, Cheng Yang

Main category: cs.AI

Abstract: not available (arXiv 2604.27855; fetch failed with HTTP 429).

cs.SD

[280] Alethia: A Foundational Encoder for Voice Deepfakes

Yi Zhu, Brahmi Dwivedi, Jayaram Raghuram, Surya Koppisetti

Main category: cs.SD

Abstract: Existing voice deepfake detection and localization models rely heavily on representations extracted from speech foundation models (SFMs). However, downstream finetuning has now reached a state of diminishing returns. In this paper, we shift the focus to pretraining and propose a novel recipe that combines bottleneck masked embedding prediction with flow-matching based spectrogram reconstruction. The outcome, Alethia, is the first foundational audio encoder for various voice deepfake detection and localization tasks. We evaluate on 5 different tasks with 56 benchmark datasets and find that Alethia significantly outperforms state-of-the-art SFMs, with superior robustness to real-world perturbations and zero-shot generalization to unseen domains (e.g., singing deepfakes). We also demonstrate the limitation of discrete targets in masked token prediction, and show the importance of continuous embedding prediction and generative pretraining for capturing deepfake artifacts.
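The recipe's first ingredient, masked prediction with continuous embedding targets, can be sketched in a few lines. Everything below is a hypothetical stand-in (random teacher embeddings, a noisy predictor, a 40% masking ratio), not Alethia's actual components:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 100, 16
targets = rng.normal(size=(T, D))                # continuous teacher embeddings per frame
mask = rng.random(T) < 0.4                       # frames hidden from the student encoder
pred = targets + 0.1 * rng.normal(size=(T, D))   # stand-in for the student's predictions

# Continuous masked embedding prediction: regress the hidden frames'
# embeddings directly (here with MSE), rather than classifying discrete
# token targets as in masked token prediction.
loss = np.mean((pred[mask] - targets[mask]) ** 2)
```

In a real setup `pred` would be produced from the visible frames only, and the loss would be backpropagated through the student encoder.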

[281] Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation

Kuan-Po Huang, Bo-Ru Lu, Byeonggeun Kim, Mihee Lee, Zalan Fabian, Renard Korzeniowski, Qingming Tang, Greg Ver Steeg, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

Main category: cs.SD

Abstract: Autoregressive (AR) models with diffusion heads have recently achieved strong text-to-audio performance, yet their iterative decoding and multi-step sampling process introduce high latency. To address this bottleneck, we propose a one-step sampling framework that combines an energy-distance training objective with representation-level distillation. An energy-scoring head maps Gaussian noise directly to audio latents in one step, eliminating the need for a costly recursive diffusion sampling process, while distillation from a masked autoregressive (MAR) text-to-audio model preserves the strong conditioning learned during diffusion training. On the AudioCaps benchmark, our method consistently outperforms prior one-step baselines such as ConsistencyTTA, SoundCTM, AudioLCM, and AudioTurbo on both objective and subjective metrics, while substantially narrowing the quality gap to AR diffusion systems with multi-step sampling. Compared to the state-of-the-art AR diffusion system, IMPACT, our approach achieves up to 8.5x faster batch inference with highly competitive audio quality. These results demonstrate that combining energy-distance training with representation-level distillation provides an effective recipe for fast, high-quality text-to-audio synthesis.
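The energy-distance objective mentioned above is typically built on the energy distance between sample batches. A minimal numpy estimator of that quantity, assuming the standard pairwise form (the abstract does not spell out the exact variant used), looks like:

```python
import numpy as np

def energy_distance(x, y):
    """Sample estimate of the energy distance between two batches.

    E(X, Y) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||,
    which is zero iff the two distributions coincide. This simple
    estimator includes the zero diagonal in the within-batch terms,
    which is slightly biased but fine for illustration.
    """
    def mean_pairwise(a, b):
        # mean Euclidean distance over all pairs (a_i, b_j)
        diff = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diff, axis=-1).mean()

    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

rng = np.random.default_rng(0)
# Batches from the same distribution score near zero...
same = energy_distance(rng.normal(size=(256, 8)), rng.normal(size=(256, 8)))
# ...while a mean shift produces a clearly larger distance.
shifted = energy_distance(rng.normal(size=(256, 8)),
                          rng.normal(loc=2.0, size=(256, 8)))
```

Training an energy-scoring head then amounts to minimizing this distance between generated and reference latents, which is what lets a single sampling step stand in for the diffusion chain.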

[282] MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation

Akira Takahashi, Ryosuke Sawata, Shusuke Takahashi, Yuki Mitsufuji

Main category: cs.SD

Abstract: Although recent video-to-audio (V2A) models excel at synthesizing semantically plausible sounds from visual inputs, they do not explicitly model room-acoustic effects such as reverberation or room impulse responses (RIRs), and thus offer limited controllability over these effects. We hypothesize, however, that such V2A models implicitly encode semantic knowledge of the relationship between spatial audio and the corresponding visual cues. In this paper, we revisit a V2A model from this perspective and propose a way to use the pretrained model as a prior for physically grounded room-acoustic processing. Building on MMAudio, one of the state-of-the-art V2A models, we propose MMAudioReverbs, a unified framework that handles i) dereverberation and ii) room impulse response (RIR) estimation without architectural modification, fine-tuned on only a small dataset. Experimental results show that audio and visual cues each have an advantage depending on the type of physical room acoustics, implying that foundation V2A models can be used for physically grounded room-acoustic analysis.

[283] GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models

Zuyao You, Zhesong Yu, Mingyu Liu, Bilei Zhu, Yuan Wan, Zuxuan Wu

Main category: cs.SD

Abstract: In this paper, we propose GaMMA, a state-of-the-art (SoTA) large multimodal model (LMM) designed to achieve comprehensive musical content understanding. GaMMA inherits the streamlined encoder-decoder design of LLaVA, enabling effective cross-modal learning between music and language. By incorporating audio encoders in a mixture-of-experts manner, GaMMA effectively unifies both time-series and non-time-series music understanding tasks within one set of parameters. Our approach combines carefully curated datasets at scale with a progressive training pipeline, effectively pushing the boundaries of music understanding via pretraining, supervised fine-tuning (SFT), and reinforcement learning (RL). To comprehensively assess both temporal and non-temporal capability of music LMMs, we introduce MusicBench, the largest music-oriented benchmark, comprising 3,739 human-curated multiple-choice questions covering diverse aspects of musical understanding. Extensive experiments demonstrate that GaMMA establishes new SoTA in the music domain, achieving 79.1% accuracy on MuchoMusic, 79.3% on MusicBench-Temporal, and 81.3% on MusicBench-Global, consistently outperforming previous methods.

[284] MMAudio-LABEL: Audio Event Labeling via Audio Generation for Silent Video

Kazuya Tateishi, Akira Takahashi, Atsuo Hiroe, Hirofumi Takeda, Shusuke Takahashi, Yuki Mitsufuji

Main category: cs.SD

Abstract: Recent advances in multimodal generation have enabled high-quality audio generation from silent videos. Practical applications, such as sound production, demand not only the generated audio but also explicit sound event labels detailing the type and timing of sounds. One straightforward approach is to apply standard sound event detection to the generated audio. However, this post-hoc pipeline is inherently limited, as it is prone to error accumulation. To address this limitation, we propose MMAudio-LABEL (LAtent-Based Event Labeling), an event-aware audio generation framework, built on a foundational audio generation model as its backbone, that jointly generates audio and frame-aligned sound event predictions from silent videos. We evaluate our method on the Greatest Hits dataset for onset detection and 17-class material classification. Our approach improves onset-detection accuracy from 46.7% to 75.0% and material-classification accuracy from 40.6% to 61.0% over baselines. These results suggest that jointly learning audio generation and event prediction enables more interpretable and practical video-to-audio synthesis.

[285] Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation

Anton Ratnarajah, Mehmet Ergezer, Arun Nair, Mrudula Athi

Main category: cs.SD

Abstract: The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) data for improving SDE model performance. This challenge at GenDARA involves generating RIRs to supplement sparse datasets and fine-tuning SDE models with the augmented data. We employ the open-source fast diffuse room impulse response generator (FastRIR) conditioned only on speaker and listener locations. We design a quality filter to ensure generated RIR alignment with challenge RIRs, and hyperparameter optimization is employed for model fine-tuning. Our approach reduces the mean absolute error (MAE) of the five positions from 1.66m to 0.6m for GWA rooms and from 2.18m to 0.69m for Treble rooms, with results demonstrating that the augmentation approach significantly improves estimation accuracy, particularly at medium to long distances.

[286] LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Venkata Pushpak Teja Menta

Main category: cs.SD

Abstract: A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, Hindi, Telugu, and Tamil, WavLM-base-plus-sv loses 0.082 absolute cosine similarity when the same voice changes script and ECAPA-TDNN loses 0.105. On a 1369-pair Indian-accented voice corpus, the gap shrinks to 0.006 (WavLM-SV) and 0.044 (ECAPA-TDNN). The leak is largest where it matters most for cross-script TTS: when a system projects a non-Indic-trained voice into Indic scripts. We present LASE (Language-Adversarial Speaker Encoder), a small projection head over frozen WavLM-base-plus trained with two losses: a supervised contrastive loss over voice identity, and a gradient-reversal cross-entropy against a 4-language classifier that pushes the embedding to be language-uninformative while remaining speaker-informative. Trained on 1118 quality-gated cross-script pairs synthesised from 8 commercial multilingual voices, LASE’s residual gap is consistent with zero on both corpora (Delta = 0.013 Western, Delta = 0.026 Indian; both bootstrap 95% CIs include zero) and amplifies the cross-script-vs-floor margin 2.4-2.7x over both baselines. An ECAPA+GRL ablation shows the GRL objective improves either backbone but the WavLM choice contributes too. In synthetic multi-speaker diarisation, LASE matches ECAPA-TDNN on cross-script speaker recall (0.788 vs 0.789) with ~100x less training data. We release the r1 checkpoint, both corpora, and the bootstrap recipe.
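The gradient-reversal cross-entropy at the heart of LASE uses the standard Gradient Reversal Layer (GRL): identity in the forward pass, sign-flipped (and λ-scaled) gradient in the backward pass, so the shared encoder is trained to *defeat* the language classifier. A minimal sketch of that mechanism, with hypothetical gradient values standing in for real backprop:

```python
import numpy as np

def grl_forward(x):
    # Gradient Reversal Layer: identity in the forward pass
    return x

def grl_backward(grad_from_language_head, lam=1.0):
    # Backward pass flips the sign (scaled by lambda), so the shared
    # encoder is pushed to *increase* the language classifier's loss,
    # making the embedding language-uninformative.
    return -lam * grad_from_language_head

# Toy update for a shared embedding z: the speaker-contrastive gradient
# pulls toward speaker-discriminative directions, while the reversed
# language gradient pushes away from language-discriminative ones.
z = np.zeros(4)
g_speaker = np.array([0.5, 0.0, -0.2, 0.1])    # hypothetical gradients
g_language = np.array([0.0, 0.4, 0.0, -0.3])
lr = 0.1
z_new = z - lr * (g_speaker + grl_backward(g_language, lam=1.0))
```

In a real implementation the GRL sits between the frozen-backbone projection head and the 4-language classifier, and λ trades off language invariance against speaker discriminability.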

[287] Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement

Szu-Wei Fu, Rong Chao, Xuesong Yang, Sung-Feng Huang, Ryandhimas E. Zezario, Rauf Nasretdinov, Ante Jukić, Yu Tsao, Yu-Chiang Frank Wang

Main category: cs.SD

Abstract: not available (arXiv 2603.02641; fetch failed with HTTP 429).

[288] Environmental Sound Deepfake Detection Using Deep-Learning Framework

Lam Pham, Khoi Vu, Dat Tran, Phat Lam, Vu Nguyen, David Fischinger, Son Le

Main category: cs.SD

Abstract: In this paper, we propose a deep-learning framework for environmental sound deepfake detection (ESDD): the task of identifying whether the sound scene and sound events in an input audio recording are fake. To this end, we conducted extensive experiments exploring how individual spectrograms, a wide range of network architectures and pre-trained models, and ensembles of spectrograms or network architectures affect ESDD performance. The experimental results on the benchmark datasets EnvSDD and ESDD-Challenge-TestSet indicate that detecting deepfake audio of sound scenes and detecting deepfake audio of sound events should be treated as separate tasks. We also show that finetuning a pre-trained model is more effective than training a model from scratch for ESDD. Our best model, finetuned from the pre-trained WavLM model with the proposed three-stage training strategy, achieves an accuracy of 0.98, an F1 score of 0.95, and an AUC of 0.99 on the EnvSDD test subset, and an accuracy of 0.88, an F1 score of 0.77, and an AUC of 0.92 on the ESDD-Challenge-TestSet.

cs.LG

[289] Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

Pragya Sharma, Hang Qiu, Mani Srivastava

Main category: cs.LG

Abstract: The increasing deployment of deep neural networks (DNNs) in cyber-physical systems (CPS) enhances perception fidelity, but imposes substantial computational demands on execution platforms, posing challenges to real-time control deadlines. Traditional distributed CPS architectures typically favor on-device inference to avoid network variability and contention-induced delays on remote platforms. However, this design choice places significant energy and computational demands on the local hardware. In this work, we revisit the assumption that cloud-based inference is intrinsically unsuitable for latency-sensitive control tasks. We demonstrate that, when provisioned with high-throughput compute resources, cloud platforms can effectively amortize network and queueing delays, enabling them to match or surpass on-device performance for real-time decision-making. Specifically, we develop a formal analytical model that characterizes distributed inference latency as a function of the sensing frequency, platform throughput, network delay, and task-specific safety constraints. We instantiate this model in the context of emergency braking for autonomous driving and validate it through extensive simulations using real-time vehicular dynamics. Our empirical results identify concrete conditions under which cloud-based inference adheres to safety margins more reliably than its on-device counterpart. These findings challenge prevailing design strategies and suggest that the cloud is not merely a feasible option, but often the preferred inference location for distributed CPS architectures. In this light, the cloud is not as distant as traditionally perceived; in fact, it is closer than it appears.
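The analytical model described above decomposes per-frame latency into transport, queueing, and service components and checks it against a control deadline. A minimal sketch of that decomposition follows; the function names and all numbers (20 ms RTT, 10x cloud throughput, etc.) are illustrative assumptions, not the paper's calibrated parameters:

```python
def end_to_end_latency(network_rtt, queue_wait, throughput):
    """Per-frame inference latency: transport + queueing + service time.

    `throughput` is the frames/second the platform sustains, so the
    service time per frame is 1/throughput. All times in seconds.
    """
    return network_rtt + queue_wait + 1.0 / throughput

def meets_deadline(latency, sensing_period, safety_margin):
    # The result must arrive before the next sensing tick, minus a
    # task-specific safety margin (e.g., braking-reaction budget).
    return latency <= sensing_period - safety_margin

# On-device: zero network cost but modest throughput (20 fps).
on_device = end_to_end_latency(network_rtt=0.0, queue_wait=0.0, throughput=20.0)
# Cloud: pays ~20 ms RTT plus queueing, but 10x the throughput
# amortizes it -- the core argument of the paper.
cloud = end_to_end_latency(network_rtt=0.020, queue_wait=0.005, throughput=200.0)
```

Under these toy numbers the cloud path (30 ms) beats the on-device path (50 ms), illustrating how sufficient provisioned throughput can absorb the network delay within a safety budget.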

[290] FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources

Md Sirajul Islam, Isabelle G Chapman, N I Md Ashafuddula, Xu Yuan, Li Chen, Nian-Feng Tzeng, Klara Nahrstedt

Main category: cs.LG

Abstract: Federated Learning (FL) enables collaborative intelligence across decentralized data source devices in a privacy-preserving way. While substantial research attention has been drawn to optimizing the learning process for an individual task, real-world applications increasingly require multiple machine learning tasks simultaneously training their models across a shared pool of devices. Naively applying single-FL optimization techniques in multi-FL systems results in suboptimal system performance, particularly due to device heterogeneity and resource inefficiency. To address such a critical open challenge, we introduce FedACT, a novel resource heterogeneity-aware device scheduling approach designed to efficiently schedule heterogeneous devices across multiple concurrent FL jobs, with the goal of minimizing their average job completion time (JCT). FedACT dynamically assigns devices to FL jobs based on an alignment scoring mechanism that evaluates the compatibility between available resources of devices and resource demands of jobs. Additionally, it incorporates participation fairness to ensure balanced contributions from devices across jobs, further enhancing the accuracy levels of learned global models. An optimal scheduling plan is formulated in FedACT by prioritizing devices with higher alignment scores, while ensuring fair participation across jobs. To evaluate the effectiveness of the proposed scheduling algorithm, we carried out comprehensive experiments using diverse FL jobs and benchmark datasets. Experimental results demonstrate that FedACT reduces the average JCT by up to 8.3× and improves model accuracy by up to 44.5%, compared to the state-of-the-art baselines.
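The abstract does not give FedACT's scoring formula; the following is a minimal sketch of the general idea, assuming a hypothetical score (how well a device's resources cover a job's demands) and a per-device assignment cap as a crude participation-fairness proxy.

```python
# Hedged sketch of alignment-score device scheduling in the spirit of FedACT.
# The scoring function and fairness rule are illustrative assumptions, not the
# paper's formulation.

def alignment_score(device, job):
    """Higher when the device's resources cover the job's demands."""
    return min(device["flops"] / job["flops_demand"],
               device["bandwidth"] / job["bw_demand"])

def schedule(devices, jobs, max_assignments_per_device=1):
    """Greedy: repeatedly grant the highest-scoring (device, job) pair,
    capping per-device load as a crude participation-fairness proxy."""
    pairs = sorted(((alignment_score(d, j), d["id"], j["id"])
                    for d in devices for j in jobs), reverse=True)
    load = {d["id"]: 0 for d in devices}
    assignment = {}
    for score, dev_id, job_id in pairs:
        if job_id not in assignment and load[dev_id] < max_assignments_per_device:
            assignment[job_id] = dev_id
            load[dev_id] += 1
    return assignment

devices = [{"id": "d1", "flops": 8.0, "bandwidth": 4.0},
           {"id": "d2", "flops": 2.0, "bandwidth": 8.0}]
jobs = [{"id": "vision", "flops_demand": 4.0, "bw_demand": 1.0},
        {"id": "speech", "flops_demand": 1.0, "bw_demand": 4.0}]
print(schedule(devices, jobs))
```

Here the compute-heavy job lands on the compute-rich device and the bandwidth-heavy job on the bandwidth-rich one; FedACT's actual optimization is, per the abstract, more involved than this greedy pass.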

[291] What Physics do Data-Driven MoCap-to-Radar Models Learn?

Kevin Chen, Kenneth W. Parker, Anish Arora

Main category: cs.LG

Abstract: Data-driven MoCap-to-radar models generate plausible micro-Doppler spectrograms, but do they actually learn the underlying physics? We introduce a physics-based interpretability framework to answer this question via two proposed complementary metrics: one measures alignment between model predictions and the physics-derived Doppler frequency, while the other tests whether predictions preserve the velocity-frequency relationship under velocity intervention. Both metrics require only MoCap input and model predictions, without access to measured radar data. Experiments across several model architectures reveal that low reconstruction error does not guarantee physical consistency: some, but not all, models achieve low error yet perform poorly on the two physics-based metrics. Further analysis shows that temporal attention is critical for transformer-based models to learn the underlying physics.

[292] AirFM-DDA: Air-Interface Foundation Model in the Delay-Doppler-Angle Domain for AI-Native 6G

Kejia Bian, Meixia Tao, Jianhua Mo, Zhiyong Chen, Leyan Chen

Main category: cs.LG

Abstract: The success of large foundation models is catalyzing a new paradigm for AI-native 6G network design: wireless foundation models for physical layer design. However, existing models often operate on channel state information (CSI) in the space-time-frequency (STF) domain, where distinct multipath components are inherently superimposed and structurally entangled. This hinders the learning of universal channel representations. Meanwhile, their reliance on global attention mechanisms incurs prohibitive computational overhead. In this paper, we propose AirFM-DDA, an Air-interface Foundation Model operating in the Delay-Doppler-Angle (DDA) domain for physical-layer tasks. Specifically, AirFM-DDA reparameterizes CSI from the STF domain into the DDA domain to explicitly resolve multipath components along physically meaningful axes. It employs a window-based attention module augmented with frame-structure-aware positional encoding (FS-PE). This window-based attention aligns with locally clustered multipath dependencies while avoiding quadratic-complexity global attention, and FS-PE injects frame-structure priors into the network. Extensive experiments demonstrate that AirFM-DDA achieves superior zero-shot generalization across unseen scenarios and datasets, consistently outperforming the baselines on channel prediction and estimation tasks. Compared to global attention, its window-based attention reduces training and inference costs by nearly an order of magnitude. Moreover, AirFM-DDA maintains robustness under high mobility, large delay spreads, severe noise, and extreme aliasing conditions.

[293] High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

Aleksandar Armacki, Haoyuan Cai, Ali H. Sayed

Main category: cs.LG

Abstract: We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent’s cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/\delta)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/\delta)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $\delta \in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
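The gradient tracking update the paper analyzes follows a standard two-recursion pattern ($x_{t+1} = Wx_t - \alpha y_t$, $y_{t+1} = Wy_t + \nabla f(x_{t+1}) - \nabla f(x_t)$). A noise-free toy sketch on scalar quadratics over a ring of agents, with illustrative weights and step size (not the paper's setting):

```python
# Hedged toy sketch of decentralized gradient descent with gradient tracking
# (GT-DSGD-style updates, stochastic noise omitted) on scalar quadratics
# f_i(x) = (x - a_i)^2 / 2 over a ring of agents. The global optimum is the
# mean of the a_i; the tracker y_i estimates the network-average gradient.

def grad(a_i, x):                 # gradient of (x - a_i)^2 / 2
    return x - a_i

def mix(vals, n):                 # ring averaging: 1/2 self, 1/4 each neighbour
    return [0.5 * vals[i] + 0.25 * vals[(i - 1) % n] + 0.25 * vals[(i + 1) % n]
            for i in range(n)]

def gt_dsgd(a, alpha=0.1, steps=500):
    n = len(a)
    x = [0.0] * n
    y = [grad(a[i], x[i]) for i in range(n)]   # tracker starts at local grads
    for _ in range(steps):
        x_new = [xi - alpha * yi for xi, yi in zip(mix(x, n), y)]
        y = [yi + grad(a[i], x_new[i]) - grad(a[i], x[i])
             for i, yi in enumerate(mix(y, n))]
        x = x_new
    return x

a = [1.0, 3.0, 5.0, 7.0]          # global optimum is the mean, 4.0
print(gt_dsgd(a))
```

Despite each agent seeing only its own $a_i$ (unbounded heterogeneity across agents), all iterates reach consensus at the global optimum, which is the bias-correction behaviour the HP analysis certifies.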

[294] Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration

Chunlei Meng, Pengbin Feng, Rong Fu, Hoi Leong Lee, Xiaojing Du, Zhaolu Kang, Zeyu Zhang, Weilin Zhou, Chun Ouyang, Zhongxue Gan

Main category: cs.LG

Abstract: Centralized multimodal learning commonly compresses language, acoustic, and visual signals into a single fused representation for prediction. While effective, this paradigm suffers from two limitations: modality dominance, where optimization gravitates towards the path of least resistance, ignoring weaker but informative modalities, and spurious modality coupling, where models overfit to incidental cross-modal correlations. To address these, we propose Group Cognition Learning (GCL), a governed collaboration paradigm that applies a two-stage protocol after modality-specific encoding. In Stage 1 (Selective Interaction), a Routing Agent proposes directed interaction routes, and an Auditing Agent assigns sample-wise gates to emphasize exchanges that yield positive marginal predictive gain while suppressing redundant coupling. In Stage 2 (Consensus Formation), a Public-Factor Agent maintains an explicit shared factor, and an Aggregation Agent produces the final prediction through contribution-aware weighting while keeping each modality representation as a specialization channel. Extensive experiments on CMU-MOSI, CMU-MOSEI, and MIntRec demonstrate that GCL mitigates dominance and coupling, establishing state-of-the-art results across both regression and classification benchmarks. Analysis experiments further demonstrate the effectiveness of the design.

[295] Learning physically grounded traffic accident reconstruction from public accident reports

Yanchen Guan, Haicheng Liao, Chengyue Wang, Zhenning Li

Main category: cs.LG

Abstract: Traffic accidents are routinely documented in textual reports, yet physically grounded accident reconstruction remains difficult because detailed scene measurements and expert reconstructions are scarce, costly and hard to scale. Here we formulate accident reconstruction from publicly accessible reports and scene measurements as a parameterized multimodal learning problem. We construct CISS-REC, a dataset of 6,217 real-world accident cases curated from the NHTSA Crash Investigation Sampling System, and develop a reconstruction framework that grounds report semantics to road topology and participant attributes, reconstructs lane consistent pre-impact motion, and refines collision relevant interactions through localized geometric reasoning and temporal allocation. Our method outperforms representative baselines on CISS-REC, achieving the strongest overall reconstruction fidelity, including improved accident point accuracy and collision consistency. These results show that public accident reports can serve as scalable computational substrates for quantitatively verifiable accident reconstruction, with potential value for traffic safety analysis, simulation and autonomous driving research.

[296] Smart Ensemble Learning Framework for Predicting Groundwater Heavy Metal Pollution

T. Ansah-Narh, G. Y. Afrifa, J. B. Tandoh, K. Asare, M. Addi, K. E. Yorke, D. M. A. Akpoley, K. Aidoo, S. K. Fosuhene

Main category: cs.LG

Abstract: Groundwater in the Densu Basin is increasingly threatened by heavy metal contamination, but conventional methods fail to capture the statistical complexity and spatial heterogeneity of pollution indicators. A key challenge is modelling the Heavy Metal Pollution Index (HPI), which is typically skewed and affected by correlated contaminants, leading to biased predictions without transformation. This study develops a predictive framework integrating response transformations with nested cross-validated ensemble machine learning. Three transformations (raw, log, and Gaussian copula) were applied to HPI and evaluated across six learners: support vector regression (SVM), $k$-nearest neighbours (k-NN), CART, Elastic Net, kernel ridge regression, and a stacked Lasso ensemble. Raw-scale models produced deceptively high fits (Elastic Net and stacked ensemble $R^2 \approx 1.0$), suggesting over-optimism. The log transformation stabilised variance (SVM: $R^2 = 0.93$, RMSE $= 0.18$; k-NN: $R^2 = 0.92$, RMSE $= 0.20$). The Gaussian copula gave the most reliable results: stacked ensemble $R^2 = 0.96$ (RMSE $= 0.19$), with other learners maintaining high accuracy. Copula-based models improved residuals and produced spatially plausible maps. DBSCAN clustering revealed Fe and Mn as primary HPI contributors, consistent with regional hydrogeochemistry. Limitations include reliance on random (not spatial) cross-validation and basin-specific scope. Future work should explore spatial validation and other geological settings. Overall, distribution-aware ensembles with clustering diagnostics offer robust, interpretable assessments of groundwater contamination.

[297] Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Shradha Sharma, Swapnil Dhamal, Shweta Jain

Main category: cs.LG

Abstract: We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that the $K$-Shapley value is the unique solution concept that satisfies the symmetry, linearity, null-player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates the $K$-Shapley value with an unknown valuation function. Unlike standard bandit literature on full-bandit feedback, K-SVFair-FBF not only learns the valuation function under the full-feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves an $O(T^{3/4})$ bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.
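The abstract does not spell out the $K$-Shapley estimator; a minimal Monte Carlo sketch of the general idea, assuming a simple sampling scheme (uniform coalition sizes up to $K-1$, no Shapley-style size weighting) that is an illustration rather than the paper's exact definition:

```python
# Hedged Monte Carlo sketch of a size-restricted ("K-") Shapley estimate:
# average the marginal contribution of arm i over random coalitions of size at
# most K-1 drawn from the other arms. Sampling scheme and weighting are
# illustrative assumptions, not the paper's K-Shapley definition.
import random

def k_shapley_mc(value, arms, i, K, samples=2000, rng=None):
    rng = rng or random.Random(0)
    others = [a for a in arms if a != i]
    total = 0.0
    for _ in range(samples):
        size = rng.randint(0, min(K - 1, len(others)))
        coalition = rng.sample(others, size)
        total += value(coalition + [i]) - value(coalition)
    return total / samples

# For an additive valuation v(S) = sum of per-arm rewards, every marginal
# contribution of arm i equals its own reward, so the estimate is exact.
rewards = {0: 1.0, 1: 2.0, 2: 4.0, 3: 0.5}
v = lambda S: sum(rewards[a] for a in S)
print(k_shapley_mc(v, list(rewards), i=2, K=3))
```

The additive case is a useful sanity check precisely because the size restriction has no effect there; for non-additive valuations the coalition-size cap is what distinguishes the $K$-Shapley value from the classical one.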

[298] Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise

Mohammad Partohaghighi

Main category: cs.LG

Abstract: Information-theoretic generalization bounds analyze stochastic optimization by relating expected generalization error to the mutual information between learned parameters and training data. Virtual perturbation analyses of SGD add auxiliary Gaussian noise only in the proof, making mutual information tractable while leaving the actual SGD trajectory unchanged. Existing bounds, however, typically require perturbation covariances to be fixed independently of the optimization history, limiting their ability to represent geometries induced by moving gradient statistics, preconditioners, curvature proxies, and other pathwise information. We introduce predictable history-adaptive virtual perturbations, where the perturbation covariance at each iteration may depend on the past real SGD history but not on current or future randomness. This predictability enables a conditional Gaussian relative-entropy argument and yields generalization bounds for SGD with adaptive virtual-noise geometry. The bounds replace fixed sensitivity and gradient-deviation terms with conditional adaptive counterparts, include an output-sensitivity penalty from accumulated perturbation covariance, and reduce the deviation term to a conditional variance only under conditional unbiasedness. Since adaptive covariances may be data-dependent, we separate local Gaussian smoothing from global reference-kernel comparison. The resulting bound includes a covariance-comparison cost measuring the KL price of using an admissible reference geometry different from the actual adaptive covariance. Fixed-noise-style bounds are recovered under admissible synchronization, such as deterministic, public, or prefix-observable covariance rules. The framework recovers fixed isotropic and geometry-aware bounds as special cases while extending virtual perturbation analysis to history-dependent SGD without modifying the algorithm.

[299] RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus

Main category: cs.LG

Abstract: Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., IF, GOTO, FORALL). Beyond syntactic and semantic verification of the step output, performed according to the specific instruction of each step, RunAgent autonomously derives and validates constraints based on the description of the task and its instance at each step. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error-correction mechanisms to ensure correctness. Finally, RunAgent filters the context history by retaining only relevant information during the execution of each step. Evaluations on the Natural-plan and SciBench datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.

[300] Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications

Ricardo Luna Gutierrez, Sahand Ghorbanpour, Ejaz Rahman, Varchas Gopalaswamy, Riccardo Betti, Vineet Gundecha, Aarne Lees, Soumyendu Sarkar

Main category: cs.LG

Abstract: Inertial Confinement Fusion (ICF) holds transformative promise for sustainable, near-limitless clean energy, yet remains constrained by prohibitively high costs and limited experimental opportunities. This paper presents Human-in-the-Loop Meta Bayesian Optimization (HL-MBO), a framework that integrates expert knowledge with few-shot, uncertainty-aware machine learning to accelerate discovery in data-scarce, high-stakes scientific domains. HL-MBO introduces a meta-learned surrogate model with an expert-informed acquisition function to recommend candidate experiments. To foster trust and enable informed decisions, HL-MBO also provides interpretable explanations of its suggestions. We show HL-MBO outperforms current BO methods on ICF energy yield optimization, as well as benchmarks in molecular optimization and critical temperature maximization for superconducting materials.

[301] Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series

Christopher Holder, Anthony Bagnall

Main category: cs.LG

Abstract: Elastic distances like dynamic time warping (DTW) are central to time series machine learning because they compare sequences under local temporal misalignment. Soft-DTW is an adaptation of DTW that can be used as a gradient-based loss by replacing the hard minimum in its dynamic-programming recursion with a smooth relaxation. However, this approach does not directly extend to elastic distances whose transition costs depend on the local alignment context. Move-Split-Merge (MSM) is one such distance: it uses context-aware split and merge penalties and has often outperformed DTW in supervised and unsupervised time series machine learning tasks such as classification and clustering. We introduce Soft-MSM, a smooth relaxation of MSM and an elastic alignment loss with context-aware transition costs. Central to the formulation is a smooth gated surrogate for MSM’s piecewise split/merge cost, which enables gradients through both the dynamic-programming recursion and the local transition structure. We derive the forward recursion, backward recursion, soft alignment matrix, closed-form gradient, limiting behaviour, and divergence-corrected formulation. Experiments on 112 UCR datasets show that Soft-MSM gives lower MSM barycentre loss than existing MSM barycentre methods, and yields significantly better clustering and nearest-centroid classification performance than Soft-DTW-based alternatives. An implementation is available in the open-source aeon toolkit.
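The smoothing device underlying Soft-DTW-style relaxations (and, per the abstract, Soft-MSM's recursion) replaces the hard minimum in the dynamic program with a differentiable soft minimum, $\mathrm{softmin}_\gamma(v) = -\gamma \log \sum_i e^{-v_i/\gamma}$, which recovers the hard minimum as $\gamma \to 0$. A small sketch, written in a numerically stable shifted form:

```python
# Hedged sketch of the smooth minimum used in Soft-DTW-style relaxations:
# softmin_gamma is differentiable, always lower-bounds the hard min, and
# converges to it as gamma -> 0. Shifted by the minimum for stability.
import math

def softmin(vals, gamma):
    m = min(vals)
    # -gamma * log sum exp(-v / gamma), computed relative to the minimum
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in vals))

costs = [2.0, 3.5, 2.1]
print(softmin(costs, 1.0), softmin(costs, 0.01), min(costs))
```

Larger $\gamma$ blends more of the competing alignment paths into the value (pulling it below the hard minimum), which is what makes the soft alignment matrix and closed-form gradient in the abstract well defined.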

[302] CRADIPOR: Crash Dispersion Predictor

Edgar Chaillou, Sebastian Rodriguez, Yves Tourbier, Francisco Chinesta

Main category: cs.LG

Abstract: We present CRADIPOR, a numerical dispersion prediction tool for automotive crash simulations. Finite Element (FE) crash models are widely used throughout vehicle development, but their predictions are not strictly repeatable because of parallel computation and model complexity. As a result, performance criteria evaluated during post-processing may exhibit significant numerical dispersion, which complicates engineering decision-making. Although dispersion can be estimated by repeating the same simulation, this approach is generally impractical because of its high computational cost. This work therefore investigates a prediction tool that can be applied during routine crash-simulation post-processing without repeating the computation. The proposed approach relies on a Rank Reduction Autoencoder (RRAE) combined with supervised classification in order to identify regions sensitive to numerical dispersion. The comparative analysis suggests that the RRAE-based framework is more effective than the Random Forest baseline on the studied dataset. Among the tested signal representations, wavelet-based and slope-based inputs appear to be the most promising, with slope variations providing the best classification performance. These results support the use of structured latent representations for improving numerical-dispersion detection in automotive crash post-processing.

[303] Hyperspherical Forward-Forward with Prototypical Representations

Shalini Sarode, Brian Moser, Joachim Folz, Federico Raue, Tobias Nauen, Stanislav Frolov, Andreas Dengel

Main category: cs.LG

Abstract: The Forward-Forward (FF) algorithm presents a compelling, bio-inspired alternative to backpropagation. However, while efficient in training, it has a computationally prohibitive inference process that requires a separate forward pass for every class that is evaluated. In this work, we introduce the Hyperspherical Forward-Forward (HFF), a novel reformulation that resolves this critical bottleneck. Our core innovation is to reframe the local objective of each layer from a binary goodness-of-fit task to a direct multi-class classification problem within a hyperspherical feature space. We achieve this by learning a set of class-specific, unit-norm prototypes that act as geometric anchors and implicit negatives. This architectural innovation preserves the benefits of local training while enabling weight update and inference in a single forward pass, making it >40x faster than the original FF algorithm. Our method is simple to implement, scales effectively to modern convolutional architectures, and achieves superior accuracy on standard image classification benchmarks, closing the gap with backpropagation. Most notably, we are among the first greedy local-learning methods to report over 25% top-1 accuracy on ImageNet-1k, and 65.96% with transfer learning.
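The single-pass inference idea can be sketched in a few lines: each class owns a unit-norm prototype, a (normalized) layer output is scored by cosine similarity against every prototype, and the arg-max gives the prediction. Dimensions and vectors below are illustrative assumptions; HFF's actual training objective and architecture are as described in the abstract.

```python
# Hedged sketch of hyperspherical prototype classification: score a normalized
# feature against unit-norm class prototypes by cosine similarity, so one
# forward pass yields logits for all classes at once.
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_logits(feature, prototypes):
    f = normalize(feature)
    return [sum(fi * pi for fi, pi in zip(f, p)) for p in prototypes]

prototypes = [normalize([1.0, 0.0, 0.0]),   # class 0 anchor
              normalize([0.0, 1.0, 0.0]),   # class 1 anchor
              normalize([0.0, 0.0, 1.0])]   # class 2 anchor

feature = [0.2, 2.5, 0.1]                   # raw layer output, near class 1
logits = cosine_logits(feature, prototypes)
pred = max(range(len(logits)), key=logits.__getitem__)
print(pred, [round(l, 3) for l in logits])
```

Contrast this with the original FF inference, which would require one forward pass per candidate class label; here all class scores fall out of a single pass, which is the source of the reported speedup.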

[304] Comparative Analysis of Polygon-Based and Global Machine Learning Models for Bus Occupancy Prediction

Daniel Azenkot, Michael Fire, Eran Ben Elia

Main category: cs.LG
Abstract: Accurate forecasting of bus ridership (passenger numbers) is crucial for the efficient management and optimization of public transport systems. Traditional forecasting models often fail to capture the unique and localized dynamics of different urban areas by treating the entire city as a single, homogeneous region. This paper introduces a novel framework that enhances bus ridership prediction by integrating a spatial clustering methodology with multi-dimensional feature analysis. The proposed framework utilizes a diverse set of data, including bus ridership data (by route number, time, and bus stop) complemented by a variety of open-source data, such as spatial features (e.g., attractive destinations), meteorological conditions (e.g., temperature, rainfall), and temporal patterns (e.g., time of day, day of week). By clustering the urban area into distinct regions, based on the principle that bus stops in close proximity share similar ridership characteristics, a separate local forecasting model is trained for each of these clusters. This localized approach demonstrates accuracy comparable to that of global models. The findings suggest that a spatially aware, localized modeling strategy is effective for public transport prediction, paving the way for more targeted and efficient service improvements.

[305] SPLICE: Latent Diffusion over JEPA Embeddings for Conformal Time-Series Inpainting

Arnaud Zinflou

Main category: cs.LG

Abstract: Generative models for time-series imputation achieve strong reconstruction accuracy, yet provide no finite-sample reliability guarantees, a critical limitation in power systems where imputed values inform dispatch and planning. We introduce SPLICE (Self-supervised Predictive Latent Inpainting with Conformal Envelopes), a modular framework coupling latent generative imputation with distribution-free, online-adaptive prediction intervals. A JEPA encoder maps daily load segments into a 64-dimensional latent space; a conditional latent bridge with four sampling modes generates candidate gap trajectories; an hourly-conditioned decoder maps back to signal space; and Adaptive Conformal Inference (ACI) wraps the output with coverage-guaranteed prediction bands. The flow-matching variant achieves quality comparable to DDIM in 5–10 ODE steps (a 5–10x speedup). On thirteen load datasets (nine proprietary, three UCI Electricity, ETTh1), SPLICE achieves the lowest mean Load-only MSE (0.056), winning 9/12 non-degenerate datasets at 91-day gaps and 18/32 across all gap lengths vs. five established baselines, and produces the best CRPS (0.161, -18.3% vs. the strongest competitor). ACI delivers 93–95% empirical coverage, correcting under-coverage failures of up to 7.5 pp observed with static conformal prediction. A pooled JEPA encoder trained on nine feeds transfers to four unseen domains, matching or exceeding per-dataset oracles with only quick bridge fine-tuning.
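The ACI mechanism the abstract relies on has a one-line update: after each step, the working miscoverage level moves toward the target by $\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)$, widening intervals (smaller $\alpha_t$) after misses and tightening them after hits. A sketch of just that update, with interval construction abstracted away and parameter values chosen for illustration:

```python
# Hedged sketch of the Adaptive Conformal Inference (ACI) level update:
# alpha_{t+1} = alpha_t + gamma * (alpha_target - err_t), where err_t is 1 if
# the truth fell outside the interval at step t. Note alpha_t may leave [0, 1],
# which in ACI corresponds to maximally wide or empty intervals.

def aci_run(errors, alpha_target=0.1, gamma=0.05):
    """errors: sequence of 0/1 miscoverage indicators; returns the alpha_t path."""
    alpha, path = alpha_target, []
    for err in errors:
        alpha = alpha + gamma * (alpha_target - err)
        path.append(alpha)
    return path

# Persistent misses drive alpha down (wider bands); persistent hits drive it up.
after_misses = aci_run([1] * 10)[-1]
after_hits = aci_run([0] * 10)[-1]
print(after_misses, after_hits)
```

This feedback loop is what lets the abstract's bands track 93–95% empirical coverage online, correcting the static conformal bands that under-cover when the data distribution drifts.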

[306] Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization

Huayu Li, ZhengXiao He, Xiwen Chen, Jingjing Wang, Siyuan Tian, Jinghao Wen, Ao Li

Main category: cs.LG

Abstract: Learning meaningful representations from medical time series (MedTS) such as ECG or EEG signals is a critical challenge. These signals are often high-dimensional, variable-length and rife with noise. Existing self-supervised approaches, such as Masked Autoencoders (MAEs), are highly effective for pre-training general-purpose encoders. However, they do not explicitly learn compact and semantically interpretable latent representations, typically relying on heuristic aggregation strategies such as global average pooling or a designated [CLS] token. We propose a novel framework that compresses a variable-length MedTS into a fixed-size set of $k$ latent Fingerprint Tokens. Our architecture employs a cross-attention bottleneck to generate these tokens and is trained with a dual-objective function. The first objective is a reconstruction loss, which ensures the tokens are sufficient statistics for the original data. The second, a diversity penalty based on the Total Coding Rate (TCR), explicitly minimizes the redundancy between tokens, encouraging them to become statistically disentangled representations. We present the theoretical justification for our method, framing it as a novel Disentangled Rate-Distortion problem. This approach produces a low-dimensional, interpretable, and sample-efficient representation, where each token is encouraged to capture an independent factor of variation, paving the way for more robust digital biomarkers.
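The TCR diversity penalty is typically a log-determinant of a regularized Gram matrix of the token set; exact scalings vary across papers, so the sketch below (two tokens, $\beta = 1$) is only an illustration of why redundant tokens score lower than diverse ones, not the paper's exact objective.

```python
# Hedged sketch of a Total Coding Rate (TCR) style diversity score for a pair
# of tokens: R = 1/2 * log det(I + beta * G), where G is the tokens' Gram
# matrix. The scaling beta and the normalisation are illustrative assumptions.
import math

def tcr_pair(z1, z2, beta=1.0):
    g11 = sum(a * a for a in z1)
    g22 = sum(b * b for b in z2)
    g12 = sum(a * b for a, b in zip(z1, z2))
    # det(I + beta * G) for the 2x2 Gram matrix G, expanded by hand
    det = (1 + beta * g11) * (1 + beta * g22) - (beta * g12) ** 2
    return 0.5 * math.log(det)

orthogonal = tcr_pair([1.0, 0.0], [0.0, 1.0])   # diverse tokens
duplicated = tcr_pair([1.0, 0.0], [1.0, 0.0])   # fully redundant tokens
print(orthogonal, duplicated)
```

Maximizing such a score (while a reconstruction loss keeps the tokens informative) pushes the tokens toward statistically independent directions, which is the disentanglement effect the abstract describes.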

[307] Smart Profit-Aware Crop Advisory System: Kisan AI

Debasis Dwibedy, Avyay Nishtala, Pranathi Mukku, D Snehaja

Main category: cs.LG

Abstract: Modern crop advisory systems exhibit a critical limitation termed economic blindness. These systems primarily optimize for biological yield, often overlooking market price, which can lead farmers toward agronomically sound yet financially unviable decisions. In this paper, we develop Kisan AI, a smart profit-aware crop advisory system that resolves this limitation through a research-driven, full-stack application. We train a Random Forest (RF) classifier on a nine-feature benchmark dataset, the standard seven agronomic attributes augmented with a market_price variable, and evaluate it against eight baseline models using metrics such as accuracy, precision, recall, F1-score, and Log Loss. The RF model achieves the highest accuracy of 99.3% and the lowest Log Loss, confirming that the inclusion of market price as a predictive feature is both valid and impactful. We then implement the RF model within a multilingual Progressive Web App alongside a Facebook Prophet six-month price forecasting engine and a MobileNetV2 disease detection module. A nine-language AI chatbot powered by the Anthropic Claude API unifies all modules into a single, mobile-installable platform accessible to farmers across India.

[308] Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization

YiFeng Wang, Zhun Sun, Keisuke Sakaguchi

Main category: cs.LG

Abstract: We present Activation Residual Hessian Quantization (ARHQ), a post-training weight splitting method designed to mitigate error propagation in low-bit activation-weight quantization. By constructing an input-side residual Hessian from activation quantization residuals ($G_x$), ARHQ analytically identifies and isolates error-sensitive weight directions into a high-precision low-rank branch. This is achieved via a closed-form truncated SVD on the scaled weight matrix $W G_x^{1/2}$. Experimental results on Qwen3-4B-Thinking-2507 demonstrate that ARHQ significantly improves layer-wise SNR and preserves downstream reasoning performance on ZebraLogic even under aggressive quantization. The code is available at https://github.com/BeautMoonQ/ARHQ.
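
The closed-form split can be illustrated concretely. The NumPy sketch below is our own reading, not the released implementation: it isolates the top singular directions of the scaled matrix W G_x^{1/2} into a high-precision low-rank branch and leaves the residual weight for a (here, toy uniform) low-bit quantizer; ARHQ's actual scaling, rank selection, and quantizer may differ.

```python
import numpy as np

def arhq_split(W, G_x, rank):
    """Split W = W_lowrank + W_residual, where W_lowrank captures the
    error-sensitive directions of the PSD residual Hessian G_x via a
    truncated SVD of the scaled matrix W @ G_x^{1/2} (illustrative sketch)."""
    evals, evecs = np.linalg.eigh(G_x)            # symmetric eigendecomposition
    evals = np.clip(evals, 1e-12, None)           # guard against tiny eigenvalues
    S = evecs @ np.diag(np.sqrt(evals)) @ evecs.T        # G_x^{1/2}
    S_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)
    W_lowrank = (U[:, :rank] * s[:rank]) @ Vt[:rank] @ S_inv
    return W_lowrank, W - W_lowrank               # residual goes to the quantizer

def quantize(W, bits=4):
    """Naive symmetric uniform quantizer, standing in for any low-bit scheme."""
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale
```

At inference, the full-precision low-rank branch and the quantized residual are applied in parallel and summed.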

[309] Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

Yikai Wang, Shang Liu, Jose Blanchet

Main category: cs.LG

Abstract: Reinforcement learning from human feedback (RLHF) has become a core post-training step for aligning large language models, yet the reward signal used in RLHF is only a learned proxy for true human utility. From an operations research perspective, this creates a decision problem under objective misspecification: the policy is optimized against an estimated reward, while deployment performance is determined by an unobserved objective. The resulting gap leads to reward over-optimization, or Goodharting, where proxy reward continues to improve even after true quality deteriorates. Existing mitigations address this problem through uncertainty penalties, pessimistic rewards, or conservative constraints, but they can be computationally burdensome and overly pessimistic. We propose Wasserstein distributionally robust regret optimization (DRRO) for RLHF. Instead of pessimizing worst-case value as in standard DRO, DRRO pessimizes worst-case regret relative to the best policy under the same plausible reward perturbation. We study the promptwise problem through a simplex allocation model and show that, under an $\ell_1$ ambiguity set, the inner worst-case regret admits an exact solution and the optimal policy has a water-filling structure. These results lead to a practical policy-gradient algorithm with a simple sampled-bonus interpretation and only minor changes to PPO/GRPO-style RLHF training. The framework also clarifies theoretically why DRRO is less pessimistic than DRO, and our experiments show that DRRO mitigates over-optimization more effectively than existing baselines while standard DRO is systematically over-pessimistic.

[310] Consistent Diffusion Language Models

Hasan Amin, Yuan Gao, Yaser Souri, Subhojit Som, Ming Yin, Rajiv Khanna, Xia Song

Main category: cs.LG

Abstract: Diffusion language models (DLMs) are an attractive alternative to autoregressive models because they promise sublinear-time, parallel generation, yet practical gains remain elusive as high-quality samples still demand hundreds of refinement steps. In continuous domains, consistency training along the probability-flow ODE is a popular recipe to accelerate diffusion. For discrete diffusion, no analogous sample-space ODE exists, making direct adaptation ill-defined. We argue that the natural discrete substitute is not a deterministic trajectory but its stochastic counterpart: the exact posterior bridge, available in closed form for broad corruption families including masked and uniform diffusion. Building on this observation, we introduce Multi-Path Discrete Consistency (MPDC), a new principle that trains a denoiser to be path-invariant in expectation across these stochastic bridges, and instantiate it as the Consistent Diffusion Language Model (CDLM), a single-stage, teacher-free training framework. A single CDLM objective unifies masked diffusion, continuous consistency models, and progressive/discrete distillation as analytic limits or empirical approximations of one common view. Empirically, CDLM establishes a new state of the art on both conditional and unconditional text generation, consistently outperforming strong base discrete diffusion models and often even multi-stage distilled baselines across sampling budgets, with the largest gains in the few-step regime. Together, these results position CDLM as a principled and scalable foundation for the next generation of fast, high-fidelity discrete generative modeling.

[311] Towards A Generative Protein Evolution Machine with DPLM-Evo

Xinyou Wang, Liang Hong, Jiasheng Ye, Zaixiang Zheng, Yu Li, Shujian Huang, Quanquan Gu

Main category: cs.LG

Abstract: Proteins are shaped by gradual evolution under biophysical and functional constraints. Protein language models learn rich evolutionary constraints from large-scale sequences, and discrete diffusion-based protein language models (e.g., DPLMs) are promising for both understanding and generation. However, existing DPLMs typically rely on masking-based absorbing diffusion that contradicts a simple biological intuition: proteins evolve through accumulated edits, not by emerging from masks. Consequently, these frameworks lack explicit pretraining objectives for substitution and insertion/deletion (indel) operations, limiting both optimization-style post-editing and flexible guided generation. To address these limitations, we present DPLM-Evo, an evolutionary discrete diffusion framework that explicitly predicts substitution, insertion, and deletion operations during denoising. DPLM-Evo decouples an upsampled-length latent alignment space from the variable-length observed sequence space, which makes indel-aware generation tractable and enables adaptive scaffold growth throughout the process with negligible computational overhead. To better align substitutions with real evolution, we further introduce a contextualized evolutionary noising kernel that produces biologically informed, context-dependent mutation patterns. Across tasks, DPLM-Evo improves sequence understanding and achieves state-of-the-art mutation effect prediction performance on ProteinGym in the single-sequence setting. It also enables variable-length simulated evolution and post-editing/optimization of existing proteins via explicit edit trajectories.

[312] Introducing WARM-VR: Benchmark Dataset for Multimodal Wearable Affect Recognition in Virtual Reality

Karim Alghoul, Faisal Mohd, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik

Main category: cs.LG

Abstract: With the growing integration of human-computer interaction into everyday life, advances in machine learning have enabled systems to better perceive and respond to users’ emotional states. Most existing affect recognition datasets focus on static environments, limiting their applicability to immersive multimedia contexts such as Virtual Reality (VR). In this paper, we introduce WARM-VR, a novel publicly available multimodal dataset designed to support affect recognition in immersive, multisensory environments using wearable sensing instrumentation. Data were collected from 31 participants aged 19-37 using wearable sensors: a wristband measuring Blood Volume Pulse (BVP), EDA, skin Temperature, three-axis Acceleration, and a chest strap recording ECG signals. Participants engaged in immersive VR experiences designed to elicit relaxation through a calming beach environment following stress induction via an arithmetic task. These sessions incorporated synchronized multimedia stimuli: visual, auditory, and olfactory. Affective states were assessed subjectively through validated self-report questionnaires and objectively through the analysis of physiological measurements. Statistical analysis of the questionnaires confirmed that VR relaxation significantly reduced negative affect, particularly with olfactory enhancement. Furthermore, we established a benchmark on the dataset using widely recognized machine learning algorithms. The best performance for binary classification of valence from BVP data was obtained with a CNN and a CNN-Bi-GRU model, both achieving an average F1-score of 0.63 and an AUC of 0.69. For arousal, a lightweight Transformer architecture provided the most balanced results (F1-scores of 0.54 for class 0 and 0.63 for class 1), outperforming recurrent hybrids. In the relaxation task, a CNN-Bi-GRU model reached the highest overall performance (average F1-score 0.64, AUC 0.69).

[313] Fair Dataset Distillation via Cross-Group Barycenter Alignment

Mohammad Hossein Moslemi, Nima Hosseini Dashtbayaz, Zhimin Mei, Boyu Wang, Bissan Ghaddar

Main category: cs.LG

Abstract: Dataset Distillation aims to compress a large dataset into a small synthetic one while maintaining predictive performance. We show that as different demographic groups exhibit distinct predictive patterns, the distillation process struggles to simultaneously preserve informative signals for all subgroups, regardless of whether group sizes are mildly or severely imbalanced. Consequently, models trained on distilled data can experience substantial performance drops for certain subgroups, leading to fairness gaps. Crucially, these gaps do not disappear by merely correcting group imbalance, since they stem from fundamental mismatches in subgroup predictive patterns rather than from sample-size disparities alone. We therefore formally analyze the interaction between these two sources of bias and cast the solution as identifying a group-imbalance-agnostic barycenter of the predictive information that induces similar representations across all subgroups. By distilling toward this shared aggregate representation, we show that group fairness concerns can be reduced. Our approach is compatible with existing distillation methods, and empirical results show that it substantially reduces bias introduced by dataset distillation.

[314] OTSS: Output-Targeted Soft Segmentation for Contextual Decision-Weight Learning

Renjun Hu, Hyun-Soo Ahn

Main category: cs.LG

Abstract: Many machine learning systems make constrained decisions by optimizing factorized objectives, but the context-specific objective is often treated as fixed. We study contextual decision-weight learning: from logged decisions and proxy outputs, learn an optimizer-facing weight vector w(x) over interpretable decision factors z(x,d), rather than a direct policy or generic predictive score. We propose OTSS, an output-targeted soft-segmentation model that deploys the personalized decision-ready weight vector. At the function-class level, the theory highlights a hard-versus-soft distinction. Hard partitions incur an approximation-estimation tradeoff under overlap, while a realizable fixed-K soft class removes the hard-partition approximation floor and attains a parametric rate. We evaluate OTSS in controlled benchmarks with finite evaluation libraries, where the true weight vector and downstream regret can be computed exactly. In the representative overlap setting, OTSS attains the lowest mean regret among the comparators, including EM mixture regression, the strongest soft-mixture baseline in our comparison; it matches EM on coefficient recovery while running about two orders of magnitude faster. In a matched K=5 benchmark, OTSS remains competitive under hard-routed truth and improves as heterogeneity becomes softer and sample size grows. On a fixed Complete Journey retail anchor with real household covariates and action geometry, OTSS again achieves the lowest mean-regret point estimate.

[315] Diversity in Large Language Models under Supervised Fine-Tuning

Roman Klypa, Oleksandr Cherednichenko

Main category: cs.LG

Abstract: Supervised Fine-Tuning (SFT) is essential for aligning Large Language Models (LLMs) with user intent, yet it is believed to suppress generative diversity. Although this reduction is frequently referenced, formal empirical testing of the phenomenon remains limited. Several prior methods have addressed the expressiveness of LLMs, and their varying perspectives suggest that deeper analysis could yield further improvements. In this study, we attribute the decline to two primary drivers: the neglect of low-frequency patterns within fine-tuning datasets and the forgetting of preexisting knowledge. Motivated by our theoretical analysis, we develop Tempered Focal (TOFU) loss, a novel objective that addresses both stated challenges simultaneously. Our extensive evaluation confirms at scale that generation breadth narrows after SFT and strengthens the hypothesis explaining this effect. Across multiple models and benchmarks, we demonstrate that TOFU enhances output diversity while preserving high response quality, offering a principled approach to SFT.

[316] State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning

Thea Aviss

Main category: cs.LG

Abstract: Current transformers discard their rich latent residual stream between positions, reconstructing latent reasoning context at each new position and leaving potential reasoning capacity untapped. The State Stream Transformer (SST) V2 enables parameter-efficient reasoning in continuous latent space through an FFN-driven nonlinear recurrence at each decoder layer, where latent states are streamed horizontally across the full sequence via a learned blend. This same mechanism supports continuous latent deliberation per position at inference time, dedicating additional FLOPs to exploring abstract reasoning before committing to a token. A two-pass parallel training procedure resolves the sequential dependency of the recurrence to allow compute-efficient training. Hidden state analysis shows the state stream facilitates reasoning through exploration of distinct semantic basins in continuous latent space, where transitions at content-dependent positions move the model into a substantially different Bayesian posterior, directly influencing the latent space at future positions. We also find, via a learned probe, that at the first generated token position, the latent state already predicts whether the eventual answer will survive or break under additional latent computation for every subsequent position. Co-trained into an existing 27B backbone using only a small dataset of GSM8K examples, the SST delivers a +15.15 point gain over a fine-tuning-matched baseline on out-of-distribution GPQA-Diamond and cuts that same baseline’s remaining GSM8K errors by 46%, together showing that the reasoning improvement is attributable to the architectural mechanism rather than scale or training data. On GPQA-Diamond, the resulting 27B SST also achieves higher accuracy than several larger open-weight and proprietary systems, including open-weight models up to 25 times larger.

[317] CompleteRXN: Toward Completing Open Chemical Reaction Databases

Gabriel Vogel, Minouk Noordsij, Evgeny Pidko, Jana M. Weber

Main category: cs.LG

Abstract: Chemical reaction datasets such as USPTO suffer from substantial incompleteness, frequently missing byproducts, co-reactants, and stoichiometric coefficients. This limits their applicability and reliability in downstream applications. Here, we introduce CompleteRXN, a large-scale supervised benchmark for reaction completion under realistic missing-data conditions. We construct a dataset of aligned incomplete and atom-balanced reactions by mapping USPTO records to curated mechanistic reactions. We evaluate representative baselines, including a novel encoder-decoder reaction completion model with constrained decoding, the Constrained Reaction Balancer (CRB), and a recent algorithmic method, SynRBL. On our CompleteRXN benchmark, the CRB achieves high performance across splits of increasing difficulty, reaching 99.20% equivalence accuracy on the random split and 91.12% on the extreme out-of-distribution split. SynRBL produces many balanced and chemically plausible completions, but with lower accuracy on the benchmark test splits. Across all methods, performance degrades with increasing incompleteness. We observe a substantial drop when evaluating on reactions outside the benchmark (full uncurated USPTO), highlighting the gap between benchmark performance and practical robustness and motivating future work.

[318] Bayesian Optimization in Linear Time

Jesse Schneider, William J. Welch

Main category: cs.LG

Abstract: Bayesian optimization is a sequential method for minimizing objective functions that are expensive to evaluate and about which few assumptions can be made. By using all gathered data to train a Gaussian process model for the function and adaptively employing a mixture of global exploration and local exploitation, this method has been used for optimization in many fields including machine learning, automotive engineering and reinforcement learning. However, the standard method suffers from two problems: 1) with cubic computational complexity in the training-set size it eventually becomes computationally infeasible to train the model, and 2) globally modeling the objective function is not necessarily optimal given the local nature of minimization. Using flexible and recursive binary partitioning of the search space, we adapt both the modeling and acquisitive aspects of standard Bayesian optimization to work harmoniously with the partitioning scheme, thereby ameliorating both standard shortcomings. We compare our method against a commonly used Bayesian optimization library on seven challenging test functions, ranging in dimensionality from $6$ to $124$, and show that our method achieves superior optimization performance in all tests. In addition, our method has linear computational complexity.

[319] NLPOpt-Net: A Learning Method for Nonlinear Optimization with Feasibility Guarantees

Bimol Nath Roy, Rahul Golder, MM Faruque Hasan

Main category: cs.LG

Abstract: Nonlinear Parametric Optimization Network (NLPOpt-Net) is an unsupervised learning architecture to solve constrained nonlinear programs (NLP). Given the structure of an NLP, it learns the parametric solution maps with guaranteed constraint satisfaction. The architecture consists of a backbone neural network (NN) followed by a multilayer ($k$-layered) projection. While the NN drives toward optimality through a loss function consisting of a modified Lagrangian augmented with a consistency loss, the projection ensures feasibility by projecting the NN predictions onto the original constraint manifold. Instead of typical distance minimization, our projection exploits local quadratic approximations of the original NLP. Under certain conditions (such as convexity), the projection has a descent property, which improves the NN predictions further. NLPOpt-Net deploys an inversion-free, modified Chambolle-Pock algorithm to solve the constrained quadratic projections during the forward pass and uses the implicit function theorem for efficient backpropagation. The fixed structure of the projection further allows decoupling of the NN and the projection once the training is complete. NLPOpt-Net solves large-scale convex QP, QCQP, NLP, and nonconvex problems with near-zero optimality gap and constraint violations reduced to machine precision. Additionally, it provides near-accurate predictions of the active sets and corresponding dual variables, thereby enabling a scalable approach for multiparametric programming. Compiling the projection in C provides an order-of-magnitude improvement in inference time compared to JAX. We provide the codes and NLPOpt-Net as a ready-to-use package that includes GPU support.

[320] Pessimism-Free Offline Learning in General-Sum Games via KL Regularization

Claire Chen, Yuheng Zhang

Main category: cs.LG

Abstract: Offline multi-agent reinforcement learning in general-sum settings is challenged by the distribution shift between logged datasets and target equilibrium policies. While standard methods rely on manual pessimistic penalties, we demonstrate that KL regularization suffices to stabilize learning and achieve equilibrium recovery. We propose General-sum Anchored Nash Equilibrium (GANE), which recovers regularized Nash equilibria at an accelerated statistical rate of $\widetilde{O}(1/n)$. For computational tractability, we develop General-sum Anchored Mirror Descent (GAMD), an iterative algorithm converging to a Coarse Correlated Equilibrium at the standard rate of $\widetilde{O}(1/\sqrt{n}+1/T)$. These results establish KL regularization as a standalone mechanism for pessimism-free offline learning that achieves equivalent or accelerated rates in multi-player general-sum games.

[321] Jailbroken Frontier Models Retain Their Capabilities

Daniel Zhu, Zihan Wang, Jenny Bao, Jerry Wei

Main category: cs.LG

Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a “jailbreak tax” that degrades the target model’s task performance. We show that this tax scales inversely with model capability and that the most advanced jailbreaks effectively yield no reduction in model capabilities. Evaluating 28 jailbreaks on five benchmarks across Claude models ranging in capability from Haiku 4.5 to Opus 4.6, we find Haiku 4.5 loses an average of 33.1% on benchmark performance when jailbroken, while Opus 4.6 at max thinking effort loses only 7.7%. We also observe that across all models, reasoning-heavy tasks display considerably more degradation than knowledge-recall tasks. Finally, Boundary Point Jailbreaking, currently the strongest jailbreak against deployed classifiers, achieves near-perfect classifier evasion with near-zero degradation across safeguarded models. We recommend that safety cases for frontier models should not rely on a meaningful capability degradation from jailbreaks.

[322] Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning

Sahil Mishra, Srinitish Srinivasan, Sourish Dasgupta, Tanmoy Chakraborty

Main category: cs.LG

Abstract: Real-world knowledge is often organized as hierarchies such as product taxonomies, medical ontologies, and label trees, yet learning hierarchical representations is challenging due to asymmetric structure and noisy semantics. We introduce Polaris, a polar hyperspherical embedding framework that separates semanticity from hierarchy using angular geometry and radius, enabling the learning of meaning and structure without interference. To map latent representation onto the sphere, we project it to the tangent space at the north pole, apply the exponential map, and learn unit-norm representations using spherical linear layers. Polaris then combines robust local constraints, global regularization that prevents geometric collapse, and uncertainty-aware asymmetric objectives that encourage directional containment. At inference time, Polaris uses structure-guided retrieval to efficiently narrow down candidate parents before final ranking. We evaluate Polaris on different settings of taxonomy expansion - spanning trees, multi-parent DAGs, and multimodal hierarchies, showing consistent improvements of up to ~19 points in top-K retrieval and up to ~60% reduction in mean rank over fourteen strong baselines.
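
The tangent-space projection plus exponential map that places latents on the sphere follows a standard Riemannian construction; the sketch below illustrates it at the north pole in our own notation (this is the textbook map, not the authors' code, and Polaris additionally learns the tangent vectors via spherical linear layers).

```python
import numpy as np

def exp_map_north_pole(v):
    """Riemannian exponential map at the north pole mu = (0, ..., 0, 1) of
    the unit sphere. A tangent vector v in R^d is embedded as (v, 0),
    which is orthogonal to mu, and mapped to
    cos(||v||) * mu + sin(||v||) * v/||v||, a unit vector in R^{d+1}."""
    norm = np.linalg.norm(v)
    pole = np.zeros(v.size + 1)
    pole[-1] = 1.0
    if norm < 1e-12:
        return pole                      # zero tangent vector maps to the pole
    direction = np.concatenate([v / norm, [0.0]])
    return np.cos(norm) * pole + np.sin(norm) * direction
```

Because pole and direction are orthonormal, the output always has unit norm, so the embedding lands exactly on the sphere and angular distance can encode semantics while radius (here fixed to 1, learned in Polaris) encodes hierarchy.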

[323] Caracal: Causal Architecture via Spectral Mixing

Bingzheng Gan, Tianyi Zhang, Yusu Li, Jing Huang, Wei Shi, Yangkai Ding, Tao Yu

Main category: cs.LG

Abstract: The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, $\mathcal{O}(L \log L)$ Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
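
Frequency-domain causal mixing via padding and truncation has a simple core: zero-pad so the FFT computes a linear rather than circular convolution with a causal kernel, then truncate back to the sequence length. The sketch below shows one plausible single-head reading of that mechanism (function name and setup are ours; Caracal's MHF module may differ in detail).

```python
import numpy as np

def fft_causal_mix(x, h):
    """Causal sequence mixing in O(L log L). Zero-padding to length 2L
    prevents circular wrap-around, so multiplying in the frequency domain
    yields a linear convolution with the causal kernel h (h[k] weights
    position t-k); truncating to the first L entries keeps only outputs
    that depend on x[:t+1]."""
    L = x.shape[0]
    n = 2 * L                                   # asymmetric zero-padding
    X, H = np.fft.rfft(x, n=n), np.fft.rfft(h, n=n)
    return np.fft.irfft(X * H, n=n)[:L]         # truncate back to length L
```

Causality can be checked directly: perturbing x[t] leaves all outputs before position t unchanged.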

[324] A Dirac-Frenkel-Onsager principle: Instantaneous residual minimization with gauge momentum for nonlinear parametrizations of PDE solutions

Matteo Raviola, Benjamin Peherstorfer

Main category: cs.LG

Abstract: Dirac-Frenkel instantaneous residual minimization evolves nonlinear parametrizations of PDE solutions in time, but ill-conditioning can render the parameter dynamics non-unique. We interpret this non-uniqueness as a gauge freedom: nullspace directions that leave the time derivative unchanged can be used to select better-conditioned parameter velocities. Building on Onsager’s minimum-dissipation principle, we introduce a history variable – interpretable as momentum – and inject it only along the nullspace directions. The resulting Dirac-Frenkel-Onsager dynamics preserve instantaneous residual minimization, in contrast to standard regularization that can introduce bias, while promoting temporally smooth parameter evolutions. Examples demonstrate that the approach leads to increased robustness in singular and near-singular regimes.

[325] Data Deletion Can Help in Adaptive RL

Param Budhraja, Aditya Gangrade, Alex Olshevsky, Venkatesh Saligrama

Main category: cs.LG

Abstract: Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context unknown at test time. The standard approach decomposes the problem: train a so-called “universal policy” which assumes knowledge of the true context, then pair it with a context estimator which approximates context using the observed trajectory. We identify a simple, counterintuitive trick that substantially improves the estimator: randomly delete a fraction of the training buffer after each round. This works because data is collected across multiple rounds using progressively better policies, and older trajectories come from a different distribution than what the estimator will face at deployment time; random deletion creates an implicit exponential decay on older data while preserving diversity without requiring any explicit identification of which samples are stale. This reduces the robustness gap by 30% for MLPs and by 6% on average for recurrent networks. Strikingly, it allows a narrow MLP with 5x fewer parameters to outperform a wide MLP trained without deletion. To understand when and why deletion helps, we analyze regularized empirical risk minimization with a mismatch between the train distribution and the distribution at deployment; in this idealized setting, we prove that removing a single uniformly random training point decreases expected test loss in expectation under mild conditions. For ridge regression we make this quantitative: deletion helps when the regularization coefficient is moderate and the signal-to-noise ratio (SNR) is sufficiently low, and, crucially, this SNR threshold gives a direct measure of how large the distribution mismatch between training and deployment must be for deletion to be beneficial.
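
The deletion trick itself is essentially one line; a minimal sketch (function name and per-round schedule are our own assumptions, not the paper's code):

```python
import random

def prune_buffer(buffer, delete_frac=0.2, rng=None):
    """After each training round, uniformly drop a fraction of the replay
    buffer. A trajectory from r rounds ago survives with probability
    (1 - delete_frac)**r, so deletion induces an implicit exponential
    decay on stale data while keeping a diverse subsample of every round."""
    rng = rng or random.Random()
    return [traj for traj in buffer if rng.random() > delete_frac]
```

Because survival is uniform at random within each round, no explicit staleness detection is needed: older rounds simply face more deletion passes.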

[326] Federated Weather Modeling on Sensor Data

Shengchao Chen, Guodong Long

Main category: cs.LG

Abstract: Federated weather modeling on sensor data is a distributed system underpinned by federated learning, enabling multiple sensor data sources, including ground weather stations, satellites and IoT devices, to collaboratively train deep learning models without sharing raw data. This method safeguards data privacy and security while leveraging diverse, geographically distributed datasets to improve the accuracy and robustness of global/regional weather modeling tasks such as forecasting and anomaly detection.

[327] Conformalized Quantum DeepONet Ensembles for Scalable Operator Learning with Distribution-Free Uncertainty

Purav Matlia, Christian Moya, Guang Lin

Main category: cs.LG

Abstract: Operator learning enables fast surrogate modeling of high-dimensional dynamical systems, but existing approaches face two fundamental limitations: quadratic inference complexity and unreliable uncertainty quantification in safety-critical settings. We propose Conformalized Quantum DeepONet Ensembles, a framework that addresses both challenges simultaneously. By leveraging Quantum Orthogonal Neural Networks (QOrthoNNs), we reduce operator inference complexity from O(n^2) to O(n), enabling scalable evaluation over fine discretizations. To provide rigorous uncertainty quantification, we combine ensemble-based epistemic modeling with adaptive conformal prediction, yielding distribution-free coverage guarantees. A key challenge in ensembling is that naive parallelism scales hardware resources linearly with the number of models. We resolve this by using Superposed Parameterized Quantum Circuits (SPQCs), which compress multiple ensemble members into a single circuit and enable simultaneous multi-model execution. Experiments on synthetic partial differential equations and real-world power system dynamics demonstrate that our approach achieves accurate predictions while maintaining calibrated uncertainty under realistic quantum noise. These results establish a practical pathway toward scalable, uncertainty-aware operator learning in quantum machine learning.
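The distribution-free coverage guarantee is the standard split-conformal construction, independent of the quantum components. A generic scalar-regression sketch (names assumed):

```python
import numpy as np

def conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Split conformal prediction for scalar regression: absolute
    residuals on a held-out calibration set yield a quantile q such
    that [test_pred - q, test_pred + q] covers the truth with
    probability >= 1 - alpha, with no distributional assumptions."""
    scores = np.abs(cal_preds - cal_targets)        # nonconformity scores
    n = len(scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n    # finite-sample correction
    q = np.quantile(scores, min(q_level, 1.0))
    return test_pred - q, test_pred + q
```

The same recipe applies per output location of an operator surrogate; the ensemble enters only through how `cal_preds` and `test_pred` are produced.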

[328] Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities

Abay Bektursun

Main category: cs.LG

Abstract: Frozen Gemma 4 31B weights pretrained exclusively on text tokens, unmodified, transfer across modality boundaries through a thin trainable interface. (1) OGBench scene-play-singletask-task1-v0: $+4.33$pt over published GCIQL at $n=3$ with std 0.74 – a published-SOTA win on a robotic manipulation task the substrate has never seen. (2) D4RL Walker2d-medium-v2: Decision-Transformer parity ($76.2 \pm 0.8$, $n=3$) at $0.43\times$ DT’s trainable count, with the frozen substrate compressing to a 5L slice ($+1.66$pt over the 6L baseline at $n=3$). (3) Associative recall as the cleanest pretraining-load-bearing case: the frozen slice + a 113K-parameter linear interface reaches L30 best-checkpoint per-bit error 0.0505 ($n=2$); a 6.36M-parameter from-scratch trained transformer at matched capacity ($1/\sqrt{d_k}$ scaling, two seeds, LR sweep) cannot solve the task at all under the protocol (best L30 = 0.4395), an $8.7\times$ advantage. Architecture-alone falsifications: a frozen random transformer with correct $1/\sqrt{d_k}$ scaling stays at random-chance loss for 50k steps; a random-init Gemma slice fails OGBench cube-double-play-task1 entirely (0.89% across $n=3$ where pretrained reaches 60%). A dual-measurement protocol – text-activation probing on 95 English sentences plus task-ablation on a non-language target – names individual heads independently identifiable on both protocols: head L26.28 scores $3.7\times$ the slice mean for English token-copying and is the #2 most-critical head for binary copy ablation ($Δ$ L30 $= +0.221$); three further heads (L27.28, L27.2, L27.3) classify by the same protocol. The mechanism is single-model and the cross-modality results are single-task within their respective benchmarks; cross-model replication is structurally constrained because Gemma 4 31B is the only model on the small-scale Pareto frontier as of April 2026.

[329] Free Energy Surface Sampling via Reduced Flow Matching

Zichen Liu, Tiejun Li

Main category: cs.LG

Abstract: Sampling the free energy surface, namely, the distribution of collective variables (CVs), is a crucial problem in statistical physics, as it underpins a better understanding of chemical reactions and conformational transitions. Traditional methods for free energy surface sampling involve simulation in high-dimensional configuration space and projecting the resulting configurations onto the CV space. To reduce the computational costs of such sampling, we propose FES-FM, a reduced flow matching (FM) method for free energy sampling (FES). We train a dynamical transport map in the CV space, thereby enabling direct sampling of the free energy surface. For many-particle systems, we construct a prior distribution based on the Hessian at a local minimum of the potential, which ensures both rotation-translation invariance and physically meaningful configurations. We evaluate the proposed method across a variety of potential functions and collective variables. Comparative experiments demonstrate that our approach drastically reduces computational costs while delivering superior accuracy per unit sampling time.
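The CV-space transport map can be trained with a standard conditional flow matching objective. A minimal sketch under the usual straight-path interpolant (an assumption; the paper's exact parameterization and prior may differ):

```python
import numpy as np

def flow_matching_loss(model, x0, x1, rng=None):
    """Conditional flow matching on the straight path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
    `model(x, t)` is an assumed velocity-field regressor; x0 are prior
    samples and x1 data samples in the low-dimensional CV space."""
    rng = rng or np.random.default_rng(0)
    t = rng.uniform(size=(len(x0), 1))     # one random time per sample
    xt = (1 - t) * x0 + t * x1             # point on the path
    target = x1 - x0                       # straight-path velocity
    pred = model(xt, t)
    return float(((pred - target) ** 2).mean())
```

Sampling then integrates the learned velocity field from prior samples, directly in CV space rather than in the full configuration space.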

[330] Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin

Main category: cs.LG

Abstract: Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning (RL) only in relatively short-horizon settings (typically around 20–30 turns). In this work, we study RL-based training of VLMs for long-horizon decision-making in Super Mario Land, a visually grounded environment requiring 100+ turns of interaction with coordinated perception, reasoning, and action. We begin with a systematic investigation of key algorithmic components and propose an adapted variant of PPO with a lightweight turn-level critic, which substantially improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++. We further show that pretrained VLMs provide strong action priors, significantly improving sample efficiency during RL training and reducing the need for manual design choices such as action engineering, compared to classical deep RL trained from scratch. Building on these insights, we introduce Odysseus, an open training framework for VLM agents, achieving substantial gains across multiple levels of the game and at least 3 times the average game progress of frontier models. Moreover, the trained models exhibit consistent improvements under both in-game and cross-game generalization settings, while maintaining general-domain capabilities. Overall, our results identify key ingredients for making RL stable and effective in long-horizon, multi-modal settings, and provide practical guidance for developing VLMs as embodied agents.

[331] Hypergraph and Latent ODE Learning for Multimodal Root Cause Localization in Microservices

Xin Liu, Yuhang He, Sichen Zhao, Kejian Tong, Xingyu Zhang

Main category: cs.LG

Abstract: Root cause localization in cloud native microservice systems requires modeling complex service dependencies, irregular temporal dynamics, and heterogeneous observability data. We present HyperODE RCA, a unified framework that combines hypergraph attention learning, latent ordinary differential equations, and multimodal cross attention fusion for fine grained root cause analysis. The method learns higher order service interactions through differentiable hyperedge construction, captures continuous anomaly evolution from irregular observations with an ODE RNN encoder, and adaptively fuses logs, traces, metrics, entities, and events using context aware modality routing. We further improve robustness with a variational information bottleneck, temporal causal regularization, and invariant risk constraints. Experiments on the Tianchi AIOps benchmark show clear gains over strong baselines in ranking and classification performance, while preserving interpretability through learned hypergraph attention.

[332] VQ-SAD: Vector Quantized Structure Aware Diffusion For Molecule Generation

Farshad Noravesh, Reza Haffari, Layki Soon, Arghya Pal

Main category: cs.LG

Abstract: Many diffusion based molecule generation methods ignore the symbolic information of molecules and represent the atom and bond types as one-hot vectors. Methods based on Morgan fingerprints produce hash collisions and are hard to embed into a continuous space without information loss, and random fingerprints correspond to no valid molecule. To circumvent this issue we use another paradigm and consider atom and bond codes as latent variables of a VQ-VAE. We introduce VQ-SAD, which first trains a VQ-VAE, then uses the frozen pretrained VQ-VAE model and treats its codebooks for both atom and bond types as tokenizers for the downstream diffusion process. VQ-SAD is a neuro-symbolic model that utilizes both symbolic and neural structural information for a diffusion based model with a learnable forward process. The large discrete code space provides more balanced atom and bond types, which enhances the denoising process. VQ-SAD slightly outperforms SOTA models for diffusion based molecule generation on the QM9 and ZINC250k datasets.
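The codebook-as-tokenizer step amounts to nearest-neighbor lookup in the learned code space. A minimal sketch with an assumed toy feature space:

```python
import numpy as np

def vq_tokenize(features, codebook):
    """Map each continuous atom/bond feature vector to the index of its
    nearest codebook entry (Euclidean distance), yielding the discrete
    codes a downstream diffusion model can consume as tokens."""
    # (n, 1, d) - (1, k, d) -> pairwise squared distances (n, k)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

Because every code index maps back to a codebook vector, any token the diffusion model emits decodes to a valid point in the learned latent space, avoiding the invalid-fingerprint problem the abstract mentions.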

[333] Binomial flows: Denoising and flow matching for discrete ordinal data

Yair Shenfeld, Ricardo Baptista, Stefano Peluchetti

Main category: cs.LG

Abstract: Flow-based generative modeling in continuous spaces exploits Tweedie’s formula to express the denoiser (learned in training) as a score function (used in sampling). In contrast, this relation has been largely missing in the discrete setting where common approaches focus on learning discrete scores and rates. In this work we close this gap for discrete non-negative ordinal data by introducing Binomial flows. Our framework provides a simple recipe for training a discrete diffusion model which simultaneously denoises, samples, and estimates exact likelihoods. We verify our methodology on synthetic examples and obtain competitive results on real-world data sets.
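One natural noising process for non-negative ordinal data thins counts binomially. A sketch of an assumed forward process (the paper's exact noise schedule and parameterization may differ):

```python
import numpy as np

def binomial_forward(x0, t, rng=None):
    """Assumed binomial forward process for ordinal counts: each of the
    x0 units survives independently with probability 1 - t, so
    x_t ~ Binomial(x0, 1 - t). At t = 0 the data is intact; at t = 1
    every count has been destroyed."""
    rng = rng or np.random.default_rng(0)
    return rng.binomial(np.asarray(x0), 1.0 - t)
```

The reverse model then learns to predict the clean counts from a thinned sample, the discrete analogue of a denoiser.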

[334] Uniform-Correct Policy Optimization: Breaking RLVR’s Indifference to Diversity

Anamika Lochab, Bolian Li, Ruqi Zhang

Main category: cs.LG

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has achieved substantial gains in single-attempt accuracy (Pass@1) on reasoning tasks, yet often suffers from reduced multi-sample coverage (Pass@K), indicating diversity collapse. We identify a structural cause for this degradation: common RLVR objectives, such as GRPO, are indifferent to how probability mass is distributed among correct solutions. Combined with stochastic training dynamics, this indifference induces a self-reinforcing collapse, in which probability mass concentrates on a narrow subset of correct outputs while alternative valid solutions are suppressed. We formalize this collapse mechanism and further characterize the optimal policy structure under two complementary criteria: robustness and entropy-regularized optimality, which identify the Uniform-Correct Policy as uniquely optimal. Motivated by this analysis, we propose Uniform-Correct Policy Optimization (UCPO), a modification to GRPO that adds a conditional uniformity penalty on the policy’s distribution over correct solutions. The penalty redistributes gradient signal toward underrepresented correct responses, encouraging uniform allocation of probability mass within the correct set. Across three models (1.5B-7B parameters) and five mathematical reasoning benchmarks, UCPO improves Pass@K and diversity while maintaining competitive Pass@1, achieving up to +10% absolute improvement on AIME24 at Pass@64 and up to 45% higher equation-level diversity within the correct set. The code is available at https://github.com/AnamikaLochab/UCPO.
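The conditional uniformity penalty can be sketched as a KL term between the policy's renormalized distribution over the correct responses and the uniform distribution (assumed form; the function name is hypothetical and the paper's exact penalty may differ):

```python
import numpy as np

def uniformity_penalty(logprobs, correct_mask):
    """Hypothetical conditional uniformity penalty: renormalize the
    policy's probability mass over the correct responses in a group
    and penalize its KL divergence from uniform, pushing mass toward
    underrepresented correct solutions."""
    lp = np.asarray(logprobs, dtype=float)[np.asarray(correct_mask, bool)]
    if lp.size < 2:
        return 0.0                       # nothing to balance
    p = np.exp(lp - lp.max())
    p /= p.sum()                         # distribution over the correct set
    return float(np.sum(p * np.log(p * lp.size)))  # KL(p || uniform)
```

The penalty is zero when all correct responses are equally likely and grows as mass concentrates on a few of them, which is exactly the collapse mode the abstract identifies.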

[335] AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

Chenyu Huang, Jianghao Lin, Zhengyang Tang, Bo Jiang, Ruoqing Jiang, Benyou Wang, Lai Wei

Main category: cs.LG

Abstract: We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance for static and highly structured problems such as mathematical discovery, but is not directly suited to online dynamic inventory settings. To this end, we propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains a large language model using reinforcement learning, incorporates demand data as well as numerical and textual features beyond demand, and generates white-box inventory policy with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical interface that connects training, inference, and deployment. This allows us to characterize the probability that the AlphaInventory evolves a statistically safe and improved policy, and to quantify the deployment gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, AlphaInventory outperforms classical inventory policies and deep learning based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.

[336] Advancing Edge Classification through High-Dimensional Causal Modeling of Node-Edge Interplay

Duanyu Feng, Li Ding, Hongru Liang, Wenqiang Lei

Main category: cs.LG

Abstract: Edge classification, a crucial task for graph applications, remains relatively under-explored compared to link prediction. Current methods often overlook the potential causal influences of node features on edge features, leading to a loss of relevant prior information. In this work, we present an empirical exploration using the Causal Edge Classification Framework (CECF). Unlike conventional causal inference methods, CECF is the first framework to apply causal inference principles to the edge classification task and to explore modeling edge features as a high-dimensional treatment within a causal framework. Based on the node embedding of Graph Neural Network (GNN), CECF seeks to learn a balanced representation of high-dimensional edge features by mitigating the potential influence of node features. Then, a cross-attention network captures the complex dependencies between node and edge features for final edge classification. Extensive experiments demonstrate that CECF not only achieves superior performance but also serves as a flexible, plug-and-play enhancement for existing methods. We also provide empirical analyses, offering insights into when and how this high-dimensional causal modeling framework works for the edge classification.

[337] ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang, Xiaodong Lu, Wei Lin, Ran He, Guojun Yin

Main category: cs.LG

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL) that decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects negative-token hidden representations onto an SVD-based low-rank positive subspace and uses projection residuals to modulate negative gradients, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling. Notably, ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16 and 7.0% in Pass@128. Code is available at https://github.com/1229095296/ResRL.git.
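The projection step can be sketched with plain SVD: negative-sample hidden states are split into a component inside a low-rank positive subspace and a residual (shapes, names, and rank are assumed for illustration):

```python
import numpy as np

def projection_residual(neg_hidden, pos_hidden, rank=2):
    """Project negative-sample hidden states onto the low-rank subspace
    spanned by positive-sample hidden states (via SVD); the residual is
    the part of the negative representation NOT shared with positives.
    Assumed shapes: (n_samples, hidden_dim)."""
    # Top-`rank` right singular vectors span the positive subspace.
    _, _, vt = np.linalg.svd(pos_hidden, full_matrices=False)
    basis = vt[:rank]                        # (rank, hidden_dim)
    proj = neg_hidden @ basis.T @ basis      # component inside the subspace
    return neg_hidden - proj                 # residual modulating gradients
```

Penalizing only the residual direction leaves the semantics shared between positive and negative responses untouched, which is the diversity-preserving intent described above.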

[338] PILIR: Physics-Informed Local Implicit Representation

Jianfeng Li, Feng Wang, Ke Tang

Main category: cs.LG

Abstract: Physics-Informed Neural Networks have become a powerful mesh-free method for solving partial differential equations, but their performance is often limited by spectral bias. Specifically, in standard MLPs used in PINNs, the global parameter coupling causes the model to prioritize learning low-frequency components, resulting in slow convergence for high-frequency details. To overcome this limitation, we introduce the Physics-Informed Local Implicit Representation (PILIR). Our approach separates the global physical domain into a discrete latent feature space and a continuous generative decoder. By using a learnable grid to encode explicit spatial locality, PILIR can capture high-frequency details locally, preventing dilution by global patterns. A generative neural operator then synthesizes these local latent features into continuous physical fields, allowing accurate reconstruction of fine-scale structures. Experiments on a range of challenging PDEs show that PILIR effectively mitigates spectral bias, thereby boosting the convergence of high-frequency details and achieving superior accuracy compared to state-of-the-art methods.

[339] Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

Sai Niranjan Ramachandran, Suvrit Sra

Main category: cs.LG

Abstract: Decision trees and diffusion models are ostensibly disparate model classes, one discrete and hierarchical, the other continuous and dynamic. This work unifies the two by establishing a crisp mathematical correspondence between hierarchical decision trees and diffusion processes in appropriate limiting regimes. Our unification reveals a shared optimization principle: Global Trajectory Score Matching (GTSM), for which gradient boosting (in an idealized version) is asymptotically optimal. We underscore the conceptual value of our work through two key practical instantiations: TreeFlow, which achieves competitive generation quality on tabular data with higher fidelity and a 2x computational speedup, and DSMTree, a novel distillation method that transfers hierarchical decision logic into neural networks, matching teacher performance within 2% on many benchmarks.

[340] Towards Robust and Scalable Density-based Clustering via Graph Propagation

Yingtao Zheng, Hugo Phibbs, Ninh Pham

Main category: cs.LG

Abstract: We present CluProp, a novel framework that reimagines varied-density clustering in high-dimensional spaces as a label propagation process over neighborhood graphs. Our approach formally bridges the gap between density-based clustering and graph connectivity, leveraging efficient propagation mechanisms from network science to mitigate the parameter sensitivity inherent in traditional density-based methods. Specifically, we introduce a deterministic density-based propagation strategy to ensure scalable neighborhood identification. The framework is agnostic to the choice of distance metric and exhibits superior performance on large-scale data, processing millions of points in minutes while consistently outperforming existing baselines in accuracy.
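Label propagation over a neighborhood graph can be illustrated with a toy majority-vote rule (illustrative only; CluProp's propagation is density-based and deterministic, which this sketch is not):

```python
def propagate_labels(adj, seed_labels, n_iter=20):
    """Toy label propagation over a neighborhood graph: each node
    repeatedly adopts the majority label among its already-labeled
    neighbors until a fixed point. Clusters emerge as the connected
    regions reached from each seed."""
    labels = dict(seed_labels)                 # node -> cluster id
    for _ in range(n_iter):
        changed = False
        for node in range(len(adj)):
            votes = [labels[nb] for nb in adj[node] if nb in labels]
            if votes:
                best = max(set(votes), key=votes.count)
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels
```

On a graph with two disconnected chains and one seed per chain, each chain inherits its seed's label, mirroring how propagation recovers clusters from graph connectivity.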

[341] BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs

Zhixiong Zhao, Zukang Xu, Dawei Yang

Main category: cs.LG

Abstract: Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandwidth cost. However, existing methods cannot address activation heavy tails and thus must keep activations in high precision, preventing true end-to-end acceleration. To overcome this limitation, we propose BWLA (Binarized Weights and Low-bit Activations), the first post-training quantization framework that preserves high accuracy while achieving 1-bit weight quantization together with low-bit activations (e.g., 6 bits). The Orthogonal-Kronecker Transformation (OKT) learns an orthogonal mapping via EM minimization, converting unimodal weights into symmetric bimodal forms while suppressing activation tails and incoherence. The Proximal SVD Projection (PSP) then performs lightweight low-rank refinement through proximal SVD projection, further enhancing quantizability with minimal overhead. On Qwen3-32B, BWLA reaches a Wikitext2 perplexity of 11.92 under 6-bit activations (vs. 38 from SOTA), improves five zero-shot tasks by more than 70%, and delivers 3.26 times inference speedup, demonstrating strong potential for real-world LLM compression and acceleration.
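The two quantizers can be sketched independently of OKT and PSP: sign-based 1-bit weights with a per-row scale, and uniform symmetric low-bit activations (a simplified stand-in for illustration, not BWLA itself):

```python
import numpy as np

def binarize_weights(w):
    """1-bit weight quantization: sign with a per-row scale equal to
    the mean absolute value, the classic L2-optimal choice for
    binary networks."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    return alpha * np.sign(w)

def quantize_activations(x, bits=6):
    """Uniform symmetric quantization of activations to `bits` bits;
    heavy tails blow up `scale`, which is why BWLA must first
    suppress them before low-bit activations become viable."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale
```

With 6-bit activations the grid has only 63 signed levels, so a single outlier stretching `scale` destroys resolution for the bulk of the distribution, motivating the tail-suppressing transforms in the abstract.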

[342] Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Haichen Hu, Jian Qian, David Simchi-Levi

Main category: cs.LG

Abstract: Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While recent advances have explored offline oracle-efficient algorithms, their computational complexity typically scales with the cardinality of the state and action spaces, rendering them intractable for large-scale or continuous environments. In this paper, we address this fundamental limitation by studying offline oracle-efficient episodic RL through the lens of log-barrier and log-determinant regularization. Specifically, for tabular Markov Decision Processes (MDPs), we propose a novel algorithm that achieves the optimal $\tilde{O}(\sqrt{T})$ regret bound while requiring only $O(H\log\log T)$ calls to both the offline statistical estimation and planning oracles when $T$ is known and $O(H\log T)$ calls when $T$ is unknown. Crucially, this oracle complexity is entirely independent of the size of the state and action spaces. This strict independence drastically reduces the planning oracle complexity, representing a substantial improvement over existing offline oracle-efficient algorithms (Qian et al., 2024). Furthermore, we demonstrate the versatility of our framework by generalizing the algorithm to linear MDPs featuring infinite state spaces and arbitrary action spaces. We prove that this generalized approach successfully attains meaningful sub-linear regret. Consequently, our work yields the first doubly oracle-efficient (i.e., efficient with respect to both statistical estimation and policy optimization) regret minimization algorithm capable of solving MDPs with infinite state and action spaces, significantly expanding the boundaries of computationally tractable RL.

[343] Mesh Field Theory: Port-Hamiltonian Formulation of Mesh-Based Physics

Satoshi Noguchi, Yoshinobu Kawahara

Main category: cs.LG

Abstract: We present Mesh Field Theory (MeshFT) and its neural realization, MeshFT-Net: a structure-preserving framework for mesh-based continuum physics that cleanly separates the physics’ topological structure from its metric structure. Imposing minimal physical principles (locality, permutation equivariance, orientation covariance, and energy balance/dissipation inequality), we prove a reduction theorem for mesh-based physics. Under these conditions, the physical dynamics admit a local factorization into a port-Hamiltonian form: the conservative interconnection is fixed uniquely by mesh topology, whereas metric effects enter only through constitutive relations and dissipation. This reduction clarifies what must be fixed and what should be learned, directly informing MeshFT-Net’s design. Across evaluations on analytic and realistic datasets, physics-consistency tests, and out-of-distribution validation, MeshFT-Net achieves near-zero energy drift and strong physical fidelity (correct dispersion and momentum conservation) along with robust extrapolation and high data efficiency. By eliminating non-physical degrees of freedom and learning only metric-dependent structure, MeshFT provides a principled inductive bias for stable, faithful, and data-efficient learning-based physical simulation.
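A port-Hamiltonian system in miniature: for a single spring-mass, the conservative interconnection is the canonical J = [[0, 1], [-1, 0]] and dissipation enters only through a damping term; a symplectic integrator then keeps energy drift near zero (a toy sketch, not the paper's mesh formulation):

```python
def port_hamiltonian_step(q, p, dt, k=1.0, m=1.0, damping=0.0):
    """One symplectic-Euler step of a toy port-Hamiltonian system
    (a single spring-mass) with H = p**2 / (2*m) + k * q**2 / 2.
    The canonical interconnection is fixed by the structure;
    dissipation enters only through `damping`."""
    p = p - dt * (k * q + damping * p / m)   # momentum update (old q)
    q = q + dt * p / m                       # position update (new p)
    return q, p
```

With `damping=0`, total energy stays bounded near its initial value over thousands of steps, the one-degree-of-freedom analogue of MeshFT-Net's near-zero energy drift.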

[344] M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data

J. Jake Nichol, Michael Weylandt, G. Matthew Fricke, Jhayron Perez-Carrasquilla, Melanie E. Moses

Main category: cs.LG

Abstract: Causal graph discovery for space-time systems is challenging in high-dimensional gridded data, which often has many more grid cells than temporal observations per cell. The Causal Space-Time Stencil Learning (CaStLe) meta-algorithm was developed to address that niche under space-time locality and stationarity assumptions, but it is currently limited to univariate analyses. In this work, we present M-CaStLe. M-CaStLe generalizes the local embedding and parent-identification phases of CaStLe to jointly model local within-variable and cross-variable space-time causal structures in gridded data. Like CaStLe, by constraining candidate parents to a constant-size space-time neighborhood and pooling spatial replicates, M-CaStLe increases effective sample size to make discovery tractable in high-dimensional settings. We further decompose the resulting multivariate stencil graph into reaction and spatial graphs to aid interpretation in complex settings. We study M-CaStLe in four settings: a multivariate space-time vector autoregression benchmark with known ground truth, an advective-diffusive-reaction partial differential equation verification problem with derived physical reference structure, an atmospheric chemistry case study in a low-temporal-sample regime, and an El Niño Southern Oscillation study on reanalysis data, identifying phase-dependent ocean–atmosphere coupling. Across these settings, M-CaStLe more accurately recovers multivariate causal structure in controlled settings and identifies important physical dynamics in real-world case studies. Overall, M-CaStLe advances causal discovery for multivariate space-time systems while retaining interpretability at the grid level.

[345] PAMod: Modeling Cyclical Shifts via Phase-Amplitude Modulation for Non-stationary Time Series Forecasting

Yingbo Zhou, Yutong Ye, Shuhao Li, Rui Qian, Qiang Huang, Lemao Liu, Li Sun, Dejing Dou

Main category: cs.LG

Abstract: Real-world time series forecasting faces the fundamental challenge of non-stationary statistical properties, including shifts in mean and variance over time. While reversible instance normalization (RevIN) has shown promise by stationarizing inputs and denormalizing outputs, it relies on the strong assumption that historical and future distributions remain identical. We observe that in many practical applications, distribution shifts follow cyclical patterns that correlate with periodic positions (e.g., seasonal and holiday volatility). To this end, we propose PAMod, a lightweight yet powerful framework that models cyclical distribution shifts via Phase-Amplitude Modulation in the normalized feature space. PAMod learns periodic embeddings to modulate representations: phase modulation captures mean shifts, while amplitude modulation adapts to variance changes. Crucially, we prove mathematically that modulating in normalized space is equivalent to applying dynamic denormalization, offering an elegant unification of distribution adaptation and representation learning. Extensive experiments on twelve real-world benchmarks demonstrate that PAMod achieves state-of-the-art performance with fewer computational resources. Furthermore, our modulation mechanism, as a novel plug-and-play technique, can improve existing time-series forecasting methods with simple integration.
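The claimed equivalence is easy to check in one dimension: modulating the normalized signal with a phase shift and an amplitude scale equals denormalizing with a shifted mean and a rescaled standard deviation (sketch with assumed names, not PAMod's learned embeddings):

```python
import numpy as np

def revin_normalize(x):
    """Instance normalization (RevIN-style) over one series."""
    mu, sigma = x.mean(), x.std() + 1e-8
    return (x - mu) / sigma, mu, sigma

def modulated_denormalize(z, mu, sigma, phase=0.0, amplitude=1.0):
    """Phase-amplitude modulation in normalized space. Algebraically,
    (amplitude * z + phase) * sigma + mu
      == z * (amplitude * sigma) + (mu + phase * sigma),
    i.e. denormalizing with mean mu + phase*sigma and standard
    deviation amplitude*sigma: the claimed equivalence, scalar case."""
    return (amplitude * z + phase) * sigma + mu
```

In PAMod, `phase` and `amplitude` would come from learned periodic embeddings of the cycle position, so the effective denormalization statistics track the cyclical distribution shift.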

[346] Rethinking LLM Ensembling from the Perspective of Mixture Models

Jiale Fu, Yuchu Jiang, Peijun Wu, Chonghan Liu, Joey Tianyi Zhou, Xu Yang

Main category: cs.LG

Abstract: Model ensembling is a well-established technique for improving the performance of machine learning models. Conventionally, this involves averaging the output distributions of multiple models and selecting the most probable label. This idea has been naturally extended to large language models (LLMs), yielding improved performance but incurring substantial computational cost. This inefficiency stems from directly applying conventional ensemble implementation to LLMs, which require a separate forward pass for each model to explicitly compute the ensemble distribution. In this paper, we propose the Mixture-model-like Ensemble (ME). By reinterpreting the ensemble as a mixture model, ME stochastically selects a single model at each step to generate the next token, thereby avoiding the need to explicitly compute the full ensemble distribution. ME is mathematically equivalent to sampling from the ensemble distribution, but requires invoking only one model, making it 1.78x-2.68x faster than conventional ensemble. Furthermore, this perspective connects LLM ensembling and token-level routing methods, suggesting that LLM ensembling is a special case of routing methods. Our findings open new avenues for efficient LLM ensembling and motivate further exploration of token-level routing strategies for LLMs. Our code is available at https://github.com/jialefu/Mixture-model-like-Ensemble/.
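The ME idea fits in a few lines: sample which model speaks at each step rather than averaging all models' distributions (a sketch with toy callables standing in for LLMs; names assumed):

```python
import random

def me_generate(models, weights, prompt, n_tokens, rng=None):
    """Mixture-model-like Ensemble sketch: pick ONE model per step
    according to the ensemble weights and let it emit the next token.
    Marginally this samples from the weighted-average distribution,
    but each step costs a single forward pass instead of one per
    ensemble member. `models` are toy callables: text -> token."""
    rng = rng or random.Random(0)
    text = prompt
    for _ in range(n_tokens):
        model = rng.choices(models, weights=weights, k=1)[0]
        text += model(text)
    return text
```

This is also why the abstract frames ensembling as a special case of token-level routing: the weight vector is simply a routing policy that ignores the input.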

[347] Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks

Sara Malacarne, Eirik Hoel-Høiseth, Erlend Aune, David Zsolt Biró, Massimiliano Ruocco

Main category: cs.LG


Abstract: Mobile network operators must monitor thousands of heterogeneous network elements across the radio access network and the packet core, each exposing high-dimensional KPI time series. The scale and cost of incident labelling make supervised approaches impractical, motivating unsupervised anomaly detection robust to context shifts and nonstationarity. We propose C-MTAD-GAT (Context-aware Multivariate Time-series Anomaly Detection with Graph Attention), an anomaly detection framework designed to operate as a single shared model across large populations of network elements. The model combines temporal and feature-wise graph attention with lightweight static and dynamic context conditioning and a dual-head decoder for reconstruction and multi-step forecasting. It produces per-element, per-feature anomaly scores, converted to alerts via fully unsupervised thresholds calibrated from validation residuals. On the TELCO dataset released with DC-VAE [garcia2023onemodel], C-MTAD-GAT improves event-level affiliation and pointwise F1 while generating fewer alarms than prior graph-attention and VAE-based baselines. We then apply the same system to nation-scale radio access and evolved packet core control-plane counter data from a mobile network operator, where it is deployed. Operator feedback indicates the alerts are actionable and support daily monitoring, showing scalability across domains without relying on labelled incidents.

[348] GD4: Graph-based Discrete Denoising Diffusion for MIMO Detection

Qincheng Lu, Sitao Luan, Xiao-Wen Chang

Main category: cs.LG


Abstract: In wireless communications, recovering the optimal solution to the multiple-input multiple-output (MIMO) detection problem is NP-hard. Obtaining high-quality suboptimal solutions with a favorable performance-complexity trade-off is particularly challenging in under-determined systems with $N_t$ transmit antennas and $N_r < N_t$ receive antennas. Recent diffusion-based MIMO detectors have shown promise, but they require extensive sampling iterations at inference time, and their performance degrades in under-determined scenarios. We propose GD4, a graph-based discrete denoising diffusion method for MIMO detection. Unlike existing diffusion-based detectors that operate in a continuous relaxed space, GD4 performs denoising directly in the discrete symbol space and enables fast inference with one or a few denoising evaluations. Numerical results show that, under a similar inference-time compute budget, GD4 produces higher-quality suboptimal solutions than existing diffusion-based detectors and several widely used classical baselines, including the box-constrained Babai point and the $K$-best box-constrained randomized Klein-Babai point, in both under-determined and over-determined settings.

[349] Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

Yu-Hsueh Fang, Chia-Yen Lee

Main category: cs.LG


Abstract: Online Conformal Prediction (CP) struggles to balance temporal adaptability and structural stability. Feedback-driven methods (e.g., Adaptive Conformal Inference (ACI)) suffer from systemic marginal under-coverage and high interval variance during abrupt shifts, while temporally discounted Bayesian CP suffers from severe structural lag and uncalibrated interval bloat. We propose State-Adaptive Bayesian Conformal Prediction (SA-BCP) to achieve optimal spatio-temporal decoupling. By gating long-term temporal inertia with spatial kernel-density evidence, SA-BCP proactively expands intervals for recognized historical regimes while maintaining tight efficiency during stable states. We rigorously prove this mechanism’s optimality, identifying a minimax bias-variance tradeoff governed by an evidence threshold $K$. Extensive benchmarks on volatile financial datasets (2016–2026), including AMD, Gold, and GBP/USD, demonstrate that SA-BCP consistently minimizes the strictly proper Winkler score across diverse confidence levels. Specifically, SA-BCP resolves the systematic under-coverage inherent to ACI variants while simultaneously reducing the uncalibrated interval bloat of Bayesian CP by 10% to 37% under high-confidence requests. By elegantly navigating this tradeoff, SA-BCP achieves an optimal balance between conditional reliability and predictive efficiency.

[350] Adaptive Equilibrium: Dynamic Weighting Framework for Generalized Interruption of DeepFake Models

Hongrui Zheng, Liejun Wang, Zhiqing Guo

Main category: cs.LG


Abstract: The advancement of generalized deepfake disruption is constrained by the interruption imbalance, a fundamental bottleneck inherent to the generation of universal perturbations. We reveal that conventional static gradient normalization fundamentally struggles to resolve architectural conflicts, causing the optimization to bias towards susceptible models while neglecting resistant ones. We argue that achieving high and uniform effectiveness requires resolving this imbalance by reaching an adaptive equilibrium. We propose the Adaptive Equilibrium Framework (AEF), which employs a dynamic weighting mechanism that utilizes real-time loss feedback to adaptively assign greater interruption weights to the most resistant models. This approach shifts the optimization from an average-case problem to finding a dynamic balance, driving the perturbation to a uniformly effective equilibrium state. Comprehensive experiments validate that AEF achieves a more balanced interruption performance, maintaining a consistent interruption success rate across the evaluated diverse architectures.
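
The dynamic weighting idea, give the most resistant model (the one with the highest remaining interruption loss) the largest weight, can be sketched with a softmax over real-time losses; the exact AEF rule is not given in the abstract, so this form is an assumption:

```python
import math

def adaptive_weights(losses, tau=1.0):
    """AEF-style dynamic weighting (assumed softmax form): models whose
    interruption loss is still high (i.e. resistant to the current universal
    perturbation) receive a larger share of the next optimization step."""
    exps = [math.exp(l / tau) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]

# The most resistant model (largest loss) gets the largest weight,
# pulling the perturbation toward a uniformly effective equilibrium.
weights = adaptive_weights([0.2, 1.5, 0.7])
```

Static gradient normalization would assign fixed weights up front; re-evaluating `adaptive_weights` from live loss feedback each step is what turns the average-case objective into a dynamic balance.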

[351] The Power of Order: Fooling LLMs with Adversarial Table Permutations

Xinshuai Dong, Haifeng Chen, Xuyuan Liu, Shengyu Chen, Haoyu Wang, Shaoan Xie, Kun Zhang, Zhengzhang Chen

Main category: cs.LG


Abstract: Large Language Models have achieved remarkable success and are increasingly deployed in critical applications involving tabular data, such as Table Question Answering. However, their robustness to the structure of this input remains a critical, unaddressed question. This paper demonstrates that modern LLMs exhibit a significant vulnerability to the layout of tabular data. Specifically, we show that semantically-invariant permutations of rows and columns - rearrangements that do not alter the table’s underlying information - are sometimes sufficient to cause incorrect or inconsistent model outputs. To systematically probe this vulnerability, we introduce Adversarial Table Permutation (ATP), a novel, gradient-based attack that efficiently identifies worst-case permutations designed to maximally disrupt model performance. Our extensive experiments demonstrate that ATP significantly degrades the performance of a wide range of LLMs. This reveals a pervasive vulnerability across different model sizes and architectures, including the most recent and popular models. Our findings expose a fundamental weakness in how current LLMs process structured data, underscoring the urgent need to develop permutation-robust models for reliable, real-world applications.
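
A semantically-invariant permutation, and a crude random-search stand-in for the paper's gradient-based attack, can be sketched as follows (the scoring function is a placeholder; ATP itself uses gradients, which are not reproduced here):

```python
import random

def permute_table(header, rows, col_perm, row_perm):
    """Apply a semantically-invariant permutation: same cells, new layout."""
    new_header = [header[j] for j in col_perm]
    new_rows = [[rows[i][j] for j in col_perm] for i in row_perm]
    return new_header, new_rows

def worst_case_permutation(header, rows, score, trials=200, seed=0):
    """Random-search stand-in for ATP's gradient-based search: among random
    layouts, keep the one maximizing a black-box disruption score
    (in the paper, a differentiable proxy for model failure)."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        cp = rng.sample(range(len(header)), len(header))
        rp = rng.sample(range(len(rows)), len(rows))
        s = score(*permute_table(header, rows, cp, rp))
        if s > best_score:
            best, best_score = (cp, rp), s
    return best
```

Because `permute_table` only reorders cells, any change in model output on the permuted table is attributable purely to layout, which is exactly the invariance the paper exploits.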

[352] Federated Learning with Hypergradient-based Online Update of Aggregation Weights

Ayano Nakai-Kasai, Tadashi Wadayama

Main category: cs.LG


Abstract: Federated learning using mobile and Internet of Things devices requires not only the ability to handle heterogeneity of clients’ data distributions but also high adaptability to varying communication environments. We propose FedHAW (Federated Learning with Hypergradient-based update of Aggregation Weights) that implements online updates of aggregation weights. FedHAW updates the aggregation weights by using hypergradient, the gradient of the objective function with respect to the weights, which can be calculated with low computational overhead. Simulation results show that the proposed method possesses high generalization performance in heterogeneous environments and high robustness to communication errors.
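
For a linear aggregate $\theta(w) = \sum_i w_i \theta_i$, the hypergradient has the cheap closed form $\partial F/\partial w_i = \langle \nabla F(\theta(w)), \theta_i \rangle$, which is presumably the kind of low-overhead computation the abstract refers to. A minimal sketch under that assumption (FedHAW's actual update may differ):

```python
import numpy as np

def project_simplex(w):
    """Euclidean projection onto the probability simplex (standard algorithm)."""
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(w)) + 1) > 0)[0][-1]
    shift = (1 - css[rho]) / (rho + 1)
    return np.maximum(w + shift, 0)

def hypergrad_step(w, client_models, grad_F, lr=0.1):
    """One aggregation-weight update: theta(w) = sum_i w_i theta_i, so
    dF/dw_i = <grad F(theta(w)), theta_i>, with no second backpropagation."""
    theta = client_models.T @ w              # aggregate model
    hg = client_models @ grad_F(theta)       # (num_clients,) hypergradient
    return project_simplex(w - lr * hg)
```

On a toy quadratic objective the weights concentrate on the client whose local model best fits the global objective, which is the adaptive behavior the abstract describes for heterogeneous clients.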

[353] Batch Normalization for Neural Networks on Complex Domains

Xuan Son Nguyen, Nistor Grozavu

Main category: cs.LG


Abstract: Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of principled Riemannian analogs of fundamental building blocks in deep neural networks (DNNs). Among those, Riemannian batch normalization (BN) layers have been shown to enhance training stability and improve accuracy. In this paper, we propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition, demonstrating the efficacy of our method.
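
For the simplest complex domain (the complex plane itself), a standard formulation of complex batch normalization whitens each complex feature with the 2x2 covariance of its real and imaginary parts; the paper's layers for richer domains such as the Siegel disk are not reproduced here:

```python
import numpy as np

def complex_batch_norm(z, eps=1e-5):
    """Whitening-style BN for a batch of complex scalars (complex-plane case).

    Treat each complex number as the vector (Re, Im), center it, and
    multiply by the inverse square root of the 2x2 covariance so the
    output has (approximately) zero mean and identity covariance.
    """
    v = np.stack([z.real, z.imag], axis=-1)          # (batch, 2)
    v = v - v.mean(axis=0)
    cov = (v.T @ v) / len(v) + eps * np.eye(2)
    # Inverse matrix square root via eigendecomposition of the SPD covariance.
    lam, U = np.linalg.eigh(cov)
    w = v @ (U / np.sqrt(lam)) @ U.T                 # v @ cov^{-1/2}
    return w[:, 0] + 1j * w[:, 1]
```

Whitening (rather than dividing by a single variance) matters because real and imaginary parts can be correlated and differently scaled; learnable shift and scale parameters, as in ordinary BN, would follow this step.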

[354] Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Ziwen Zhao, Menglin Yang

Main category: cs.LG


Abstract: Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $\Psi$-RAG, a tree-RAG framework with two key components. First, a hierarchical abstract tree index built through an iterative “merging and collapse” process that adapts to data distributions without a priori assumptions. Second, a multi-granular retrieval agent that intelligently interacts with the knowledge base with reorganized queries and an agent-powered hybrid retriever. $\Psi$-RAG supports diverse tasks from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.

[355] Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation

Shihong Ding, Fangyu Du, Cong Fang

Main category: cs.LG


Abstract: Multi-task learning (MTL) has emerged as a pivotal paradigm in machine learning by leveraging shared structures across multiple related tasks. Despite its empirical success, efficiently solvable likelihood-based algorithms remain largely undeveloped, even for shared linear representations, primarily due to the non-convex structure intrinsic to matrix factorization. This paper introduces a first-order algorithm that jointly learns a shared representation and task-specific parameters, with guaranteed efficiency. Notably, it converges in $\widetilde{\mathcal{O}}(1)$ iterations and attains a near-optimal estimation error of $\widetilde{\mathcal{O}}(dk/(TN))$, improving over existing likelihood-based methods by a factor of $k$, where $d$, $k$, $T$, $N$ denote input dimension, representation dimension, task count, and samples per task, respectively. Our results justify that likelihood-based first-order methods can efficiently solve the MTL problem.
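
The problem setup, $T$ tasks sharing a $d \times k$ linear representation $U$ with task-specific heads $v_t$, can be made concrete with a simple alternating least-squares baseline (the paper's first-order method is more refined and comes with near-optimal guarantees; this is only the problem definition in code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T, N = 8, 2, 6, 40

# Synthetic noiseless tasks sharing a k-dimensional representation U_star.
U_star = np.linalg.qr(rng.normal(size=(d, k)))[0]
V_star = rng.normal(size=(T, k))
X = rng.normal(size=(T, N, d))
Y = np.einsum("tnd,dk,tk->tn", X, U_star, V_star)   # y_tn = x_tn^T U v_t

# Alternating least squares: task-specific v-step, then shared U-step.
U = rng.normal(size=(d, k))
V = np.zeros((T, k))
for _ in range(100):
    for t in range(T):                               # per-task head fit
        V[t] = np.linalg.lstsq(X[t] @ U, Y[t], rcond=None)[0]
    # Joint fit of vec(U): y_tn = kron(v_t, x_tn) . [U[:,0]; U[:,1]; ...]
    A = np.concatenate([np.kron(V[t][None, :], X[t]) for t in range(T)])
    u = np.linalg.lstsq(A, Y.reshape(-1), rcond=None)[0]
    U = u.reshape(k, d).T
loss = np.mean((np.einsum("tnd,dk,tk->tn", X, U, V) - Y) ** 2)
```

With $TN = 240$ samples against only $dk + Tk = 28$ parameters the realizable problem is heavily overdetermined, which is the regime where the shared-representation structure pays off.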

[356] Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

Junda Ying, Yuxuan Wang, Bowen Yang, Peijie Zhou, Lei Zhang

Main category: cs.LG


Abstract: Inferring cellular trajectories from destructive snapshots is complicated by the challenges of stochasticity and non-conservative mass dynamics such as cell proliferation and apoptosis. Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level. However, this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution, which is essential for understanding lineage branching and fate decisions. We present Unbalanced Schrödinger Bridge (USB), a simulation-free framework for learning the underlying dynamics that integrates both stochastic and unbalanced effects and models the discrete, jump-like birth-death dynamics at single-cell resolution. Theoretically, USB provides a tractable solution to the Branching Schrödinger Bridge (BSB) problem, offering a rigorous microscopic interpretation where individual cells undergo both Brownian motion and discrete birth-death jumps. Technically, the method implements an efficient solver by introducing a simulation-free training objective that effectively scales to high-dimensional omics data. Empirically, we demonstrate on both simulated and real-world datasets that USB not only achieves trajectory reconstruction performance better than or comparable to deterministic baselines but also uniquely enables realistic discrete simulation of birth-death dynamics at single-cell resolution.

[357] Trading off rewards and errors in multi-armed bandits

Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu

Main category: cs.LG


Abstract: In multi-armed bandits, the most-explored arms are the most informative, while reward maximization typically pulls only the best arm. We study the tradeoff between identifying arm means accurately and accumulating reward, and present an algorithm with regret guarantees that interpolates between the two objectives. We provide both upper and lower bounds and validate empirically.
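
The tension the abstract describes, uniform pulls estimate every arm's mean well while greedy pulls maximize reward, can be illustrated with a simple interpolating allocation (this is a generic stand-in controlled by a mixing parameter, not the paper's algorithm):

```python
import random

def mixed_bandit(means, eta, horizon, seed=0):
    """Toy reward-vs-estimation tradeoff: with probability eta pull a
    uniformly random arm (good for estimating every arm's mean), otherwise
    pull the empirically best arm (good for cumulative reward).
    eta interpolates between the two objectives."""
    rng = random.Random(seed)
    n = [0] * len(means)          # pull counts
    s = [0.0] * len(means)        # reward sums
    for _ in range(horizon):
        if 0 in n or rng.random() < eta:
            a = rng.randrange(len(means))
        else:
            a = max(range(len(means)), key=lambda i: s[i] / n[i])
        r = means[a] + rng.gauss(0, 0.1)
        n[a] += 1
        s[a] += r
    return n, [s[i] / max(n[i], 1) for i in range(len(means))]
```

At `eta = 1` all arms are estimated equally well but reward is mediocre; at small `eta` reward concentrates on the best arm while the other estimates starve, which is the interpolation the paper's algorithm manages with regret guarantees.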

[358] Revealing graph bandits for maximizing local influence

Alexandra Carpentier, Michal Valko

Main category: cs.LG


Abstract: We study a graph bandit setting where the objective of the learner is to detect the most influential node of a graph by requesting as little information from the graph as possible. One of the relevant applications for this setting is marketing in social networks, where the marketer aims at finding and taking advantage of the most influential customers. The existing approaches for bandit problems on graphs require either partial or complete knowledge of the graph. In this paper, we do not assume any knowledge of the graph, but we consider a setting where it can be gradually discovered in a sequential and active way. At each round, the learner chooses a node of the graph and the only information it receives is a stochastic set of the nodes that the chosen node is currently influencing. To address this setting, we propose BARE, a bandit strategy for which we prove a regret guarantee that scales with the detectable dimension, a problem-dependent quantity that is often much smaller than the number of nodes.

[359] Distance metric learning for conditional anomaly detection

Michal Valko, Milos Hauskrecht

Main category: cs.LG


Abstract: Anomaly detection methods can be very useful in identifying unusual or interesting patterns in data. A recently proposed conditional anomaly detection framework extends anomaly detection to the problem of identifying anomalous patterns on a subset of attributes in the data. The anomaly always depends on (is conditioned on) the values of the remaining attributes. The work presented in this paper focuses on instance-based methods for detecting conditional anomalies. These methods depend heavily on the distance metric that lets us identify examples in the dataset that are most critical for detecting the anomaly. To optimize the performance of such methods, we study and devise a metric learning method that learns a distance metric that best reflects the conditional anomaly pattern.

[360] Fairness of Classifiers in the Presence of Constraints between Features

Martin C. Cooper, Imane Bousdira

Main category: cs.LG


Abstract: In Machine Learning, an accepted definition of fairness of a decision taken by a classifier is that it should not depend on protected features, such as gender. Unfortunately, when constraints exist between features, such dependencies can be obscured by the constraints. To avoid this problem, we propose that a decision be considered fair if it has a fair explanation. We define a fair explanation as a prime-implicant reason for the decision that does not contain any protected feature (where the constraints are taken into account in the definition of prime-implicant). Surprisingly, ignoring constraints can completely change the fairness of a decision (according to this definition) even in the absence of constraints between protected and unprotected features. Three possible definitions of fairness of a classifier are that for all its decisions (1) there are only fair explanations, (2) there is at least one fair explanation, or (3) changing protected features does not change the outcome. We identify the relationships between these different definitions of fairness and study the computational complexity of testing fairness of classifiers.

[361] Scaling Federated Linear Contextual Bandits via Sketching

Hantao Yang, Hong Xie, Xutong Liu, Defu Lian

Main category: cs.LG


Abstract: In federated contextual linear bandits, high data dimensionality incurs prohibitive computation and communication costs: local agents perform $O(d^3)$-time determinant computation and upload $O(d^2)$ parameters, making existing algorithms unscalable, where $d$ is the dimension of the data. To relieve these scaling bottlenecks, this paper proposes Federated Sketch Contextual Linear Bandits (FSCLB). On the computation side, FSCLB uses SVD to indirectly obtain the determinant required for communication, eliminating the prohibitive cost of direct determinant calculation and cutting complexity from $O(d^3)$ to $O(l^2d)$ per round, where $l < d$ is the sketch size. On the communication side, FSCLB introduces a double-sketch strategy that reduces both upload and download costs from $O(d^2)$ to $O(ld)$. Naively incorporating sketch updates into federated contextual linear bandits can destroy the local increment and invalidate the asynchronous communication condition; FSCLB solves this by replacing the covariance matrix with the sketch matrix when deciding whether to communicate. Theoretically, FSCLB achieves a regret bound of $\widetilde{O} ((\sqrt{d}+\sqrt{M\varepsilon_l})\sqrt{lT})$, where $\varepsilon_l$ is upper bounded by the spectral tail of the covariance matrix; when $l$ exceeds the rank of the covariance matrix, the bound simplifies to $\widetilde{O}(\sqrt{ldT})$, matching the optimal no-sketch regret. Experiments on both synthetic and real-world datasets show that FSCLB significantly reduces computational and communication costs by over 90% while sacrificing only a negligible amount of cumulative reward.
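
The determinant-via-SVD trick is a linear-algebra identity that can be verified directly: for an $l \times d$ sketch $B$, the matrix $B^\top B$ has rank at most $l$, so its only nonzero eigenvalues are the squared singular values of $B$ (a sketch of the identity only, with hypothetical names, not the full FSCLB algorithm):

```python
import numpy as np

def logdet_from_sketch(B, lam, d):
    """log det(B^T B + lam*I_d) from the singular values of an l x d sketch B.

    The d x d determinant needs only an SVD of B (O(l^2 d)) instead of a
    direct O(d^3) computation: l eigenvalues are s_i^2 + lam and the
    remaining d - l are exactly lam.
    """
    s = np.linalg.svd(B, compute_uv=False)    # l singular values
    return float(np.sum(np.log(s**2 + lam)) + (d - len(s)) * np.log(lam))

rng = np.random.default_rng(0)
l, d, lam = 3, 20, 0.5
B = rng.normal(size=(l, d))
direct = float(np.linalg.slogdet(B.T @ B + lam * np.eye(d))[1])
```

The regularizer `lam` plays the role of the ridge term in the bandit's covariance matrix; the identity is exact, not an approximation, which is why the determinant needed for the communication condition can be obtained from the sketch alone.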

[362] Possibilistic Predictive Uncertainty for Deep Learning

Yao Ni, Jeremie Houssineau, Yew Soon Ong, Piotr Koniusz

Main category: cs.LG


Abstract: Deep neural networks achieve impressive results across diverse applications, yet their overconfidence on unseen inputs necessitates reliable epistemic uncertainty modelling. Existing methods for uncertainty modelling face a fundamental dilemma: Bayesian approaches provide principled estimates but remain computationally prohibitive, while efficient second-order predictors lack rigorous derivations connecting their specific objectives to epistemic uncertainty quantification. To resolve this dilemma, we introduce Dirichlet-approximated possibilistic posterior predictions (DAPPr), a principled framework leveraging possibility theory. We define a possibilistic posterior over parameters, project this posterior to the prediction space via supremum operators, and approximate the projected posterior using learnable Dirichlet possibility functions. This projection-and-approximation strategy yields a simple training objective with closed-form solutions. Extensive experiments across diverse benchmarks demonstrate that our approach achieves competitive or superior uncertainty quantification performance compared to state-of-the-art evidential deep learning methods while maintaining both principled derivation and computational efficiency. Code will be available at https://github.com/MaxwellYaoNi/DAPPr.

[363] LambdaRankIC: Directly Optimizing Rank IC for Financial Prediction

Yan Lin, Yihong Su, Yi Yang

Main category: cs.LG


Abstract: In financial predictions, the performance of machine learning models is often assessed by Rank IC, which is the Spearman rank correlation between the model predictions and the realized asset returns. Despite its wide adoption, most existing models are trained using regression losses or ranking objectives that may not align with Rank IC. We propose LambdaRankIC, a novel learning-to-rank approach that directly optimizes Rank IC. We circumvent the non-differentiability of the ranking operator by deriving the closed-form expression for the lambda gradients induced by the pairwise rank swaps, which enables efficient gradient-based optimization within the LambdaRank framework. We implement LambdaRankIC as a custom objective in XGBoost. Theoretically, we show that our approach optimizes an upper bound on Rank IC. We evaluate the proposed approach on both simulated and real-world financial data. In simulation studies, LambdaRankIC accurately recovers the true ranking structure in noiseless settings and consistently outperforms regression-based and NDCG-oriented ranking methods under low signal-to-noise ratios and heavy-tailed noise regimes. In empirical experiments using real market data, LambdaRankIC achieves the best out-of-sample performance on evaluation metrics commonly used in finance, including Rank IC, ICIR, monthly return, and Sharpe ratio. These results show that directly optimizing Rank IC can yield substantial improvements over conventional learning objectives in financial predictions when the full-order ranking quality is the primary goal.
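
The optimization target itself is easy to state in code: Rank IC is the Pearson correlation of the two rank vectors. This computes the metric only; the paper's closed-form lambda gradients for optimizing it are not reproduced here:

```python
import numpy as np

def rank_ic(pred, ret):
    """Rank IC: Spearman rank correlation between model predictions and
    realized returns (assumes no ties; with ties one would average ranks)."""
    def ranks(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(len(x), dtype=float)
        return r
    rp, rr = ranks(pred), ranks(ret)
    rp -= rp.mean()
    rr -= rr.mean()
    return float(rp @ rr / np.sqrt((rp @ rp) * (rr @ rr)))
```

Because only ranks enter the score, any monotone transform of the predictions leaves Rank IC unchanged, which is why a regression loss on raw values can disagree with the metric practitioners actually report.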

[364] A Comparative Study of QSPR Methods on a Unique Multitask PAMPA dataset

András Formanek, Anna Vincze, Richárd Bicsak, Yves Moreau, György T. Balogh, Adam Arany

Main category: cs.LG


Abstract: We present a unique multitask dataset comprising 143 drug and drug candidate molecules, each evaluated with in vitro parallel artificial-membrane permeability assays (PAMPA) using six different model membranes. Using this resource, we systematically assess the effectiveness of various molecular descriptors and regression models in predicting passive membrane permeability. The studied models range from simple linear regression to a modern pre-trained transformer architecture. Particular attention is given to the trade-off between predictive performance and model interpretability, highlighting the challenges introduced by machine learning approaches. To our knowledge, this is the most comprehensive study on simultaneous modeling of multiple organ-specific PAMPA membranes to date, offering novel insights into membrane-specific permeability profiles. We found that expert-designed physico-chemical property descriptors are better suited to a limited-sample-size permeability study than deep-learning-based representations.

[365] Learning Multimodal Energy-Based Model with Multimodal Variational Auto-Encoder via MCMC Revision

Jiali Cui, Zhiqiang Lao, Heather Yu

Main category: cs.LG


Abstract: Energy-based models (EBMs) are a flexible class of deep generative models and are well-suited to capture complex dependencies in multimodal data. However, learning a multimodal EBM by maximum likelihood requires Markov Chain Monte Carlo (MCMC) sampling in the joint data space, where noise-initialized Langevin dynamics often mixes poorly and fails to discover coherent inter-modal relationships. Multimodal VAEs have made progress in capturing such inter-modal dependencies by introducing a shared latent generator and a joint inference model. However, both the shared latent generator and the joint inference model are parameterized as unimodal Gaussian (or Laplace) distributions, which severely limits their ability to approximate the complex structure induced by multimodal data. In this work, we study the learning problem of the multimodal EBM, shared latent generator, and joint inference model. We present a learning framework that effectively interweaves their MLE updates with corresponding MCMC refinements in both the data and latent spaces. Specifically, the generator is learned to produce coherent multimodal samples that serve as strong initial states for EBM sampling, while the inference model is learned to provide informative latent initializations for generator posterior sampling. Together, these two models serve as complementary models that enable effective EBM sampling and learning, yielding realistic and coherent multimodal EBM samples. Extensive experiments demonstrate superior performance for multimodal synthesis quality and coherence compared to various baselines. We conduct various analyses and ablation studies to validate the effectiveness and scalability of the proposed multimodal framework.

[366] Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

Mengke Zhao, Guang-Xing Li, Duo Xu, Keping Qiu

Main category: cs.LG


Abstract: Complex physical systems, from supersonic turbulence to the macroscopic structure of the universe, are governed by continuous multiscale dynamics. While modern machine learning architectures excel at mapping the high-dimensional observables of these systems, it remains unclear whether they internalize the governing physical laws or merely interpolate discrete statistical correlations. Standard Explainable AI (XAI) architectures, particularly perturbation-based and gradient-saliency methods, rely on pixel-wise perturbations, which generate unphysical artifacts and push inputs off the valid empirical distribution. To resolve this, we introduce a diagnostic framework driven by Constrained Diffusion Decomposition (CDD), a diffusion-based multiscale data decomposition algorithm that enables physically constrained data generation and model evaluation via scale-aware modifications. Applying this framework to a Denoising Diffusion Probabilistic Model (DDPM), we execute deterministic interventions directly within the continuous, CDD-based scale space. We demonstrate that under moderate physical perturbations, the unconstrained generative model exhibits localized structural freezing and non-linear instability rather than continuous PDE-like responses. The network fails to maintain cross-scale continuity, causing the generative trajectory to diverge when pushed into unseen physical states. By synthesizing a continuum of physically coherent states, this scale-informed methodology establishes a controlled test ground to evaluate algorithmic vulnerabilities, providing the rigorous physical constraints necessary for future architectures to respect the multiscale causality of the natural universe.

[367] AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

Zhijie Cai, Haolong Chen, Guangxu Zhu

Main category: cs.LG


Abstract: Fine-tuning LLMs is necessary for various dedicated downstream tasks, but classic backpropagation-based fine-tuning methods require substantial GPU memory. To this end, a recent work, MeZO, which relies solely on forward passes to fine-tune LLMs, significantly reduces GPU requirements at the cost of slower convergence due to its indifference to loss landscapes. Standard solutions, such as Adam, explore loss landscapes by estimating the first- and second-order moments and storing them in memory to guide the model’s movement through dimensions with lower curvature and vice versa. However, directly applying Adam negates MeZO’s advantage as it will triple the memory requirement. In light of this, we propose AdaMeZO, a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory. We present a theoretical analysis of AdaMeZO, corroborated by extensive experiments demonstrating AdaMeZO’s performance, showing that AdaMeZO can outperform MeZO while requiring up to 70% fewer forward passes. Trajectory visualizations affirm AdaMeZO’s ability to adapt to diverse loss landscapes.
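
The memory trick that makes MeZO-style methods cheap, regenerating the random perturbation from its seed instead of storing it, can be sketched in a few lines (a toy MeZO step; AdaMeZO's seed-based recovery of Adam-style moments is not reproduced here):

```python
import numpy as np

def mezo_step(theta, loss_fn, lr=0.05, eps=1e-3, seed=0):
    """One MeZO-style zeroth-order step (sketch). The perturbation z is never
    kept in memory: it is regenerated from its seed on demand, the same trick
    AdaMeZO builds on to use moment information without moment buffers."""
    z = np.random.default_rng(seed).normal(size=theta.shape)
    # Two forward passes estimate the directional derivative along z.
    g = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    z = np.random.default_rng(seed).normal(size=theta.shape)  # regenerate
    return theta - lr * g * z

# Minimize a toy quadratic using forward passes only.
loss = lambda w: float(np.sum((w - 3.0) ** 2))
w = np.zeros(4)
for step in range(500):
    w = mezo_step(w, loss, seed=step)
```

Only two scalar losses and one integer seed are live at any time, which is why the peak memory stays at inference level regardless of parameter count.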

[368] Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim

Main category: cs.LG

Abstract: Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that perform distribution matching are a promising method, but they are notorious for training instability and mode collapse. In particular, unstable rewards in red-teaming accelerate mode collapse. We propose Stable-GFN (S-GFN), which eliminates partition function $Z$ estimation in GFN and reduces training instability. S-GFN avoids Z-estimation through pairwise comparisons and employs a robust masking methodology against noisy rewards. Additionally, we propose a fluency stabilizer to prevent the model from getting stuck in local optima that produce gibberish. S-GFN provides more stable training while maintaining the optimal policy of GFN. We demonstrate the overwhelming attack performance and diversity of S-GFN across various settings.
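
The Z-free pairwise idea can be sketched as follows: subtracting the trajectory-balance residuals of two trajectories cancels the log-partition term (a simplification with our own names; the backward policy is folded into the log-reward term here):

```python
def pairwise_tb_loss(logpf_a, logr_a, logpf_b, logr_b):
    """Contrastive trajectory-balance residual for a trajectory pair (sketch).

    A single trajectory's TB residual is log Z + log P_F - log R; taking the
    difference over a pair cancels log Z, so no partition-function estimate
    needs to be learned or maintained.
    """
    delta = (logpf_a - logr_a) - (logpf_b - logr_b)
    return delta ** 2
```

Driving this pairwise residual to zero enforces the same flow-matching condition as trajectory balance, without the unstable learned $Z$.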

[369] Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation

Andrzej Ruszczynski, Tiangang Zhang

Main category: cs.LG

Abstract: For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the class of linear systems. We use both concepts in a feature-based $Q$-learning method with multipattern $Q$-factor approximation and we prove a high-probability regret bound of $\mathcal{O}\big(H^2 N^H \sqrt{K}\big)$, where $H$ is the horizon, $N$ is the mini-batch size, and $K$ is the number of episodes. We also propose an economical version of the $Q$-learning method that streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.

[370] Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

Man Yung Wong

Main category: cs.LG

Abstract: Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 probability to the correct expert at the transition. Three lightweight gate modifications raise this to 0.748 +/- 0.002 (124x), cutting experts needed for 99% coverage from infeasible to a small constant: temporal memory (beta), a per-expert LIF membrane potential accumulating routing context across tokens; precision-weighted gating (Pi), a per-expert inverse variance of recent prediction error, yielding 31x contrast between reliable and unreliable experts; and anticipatory routing, a next-state predictor conditioned on the beta-accumulated hidden state. The mechanisms draw from Friston’s Free Energy Principle and use LIF dynamics from spiking neural networks. An ablation across all 2^3 subsets reveals a super-additive beta x Ant interaction: anticipation alone gives nothing (+0.000 +/- 0.001); beta alone gives modest gain (+0.295 +/- 0.013); combined they close 75% of the oracle gap (+0.741 +/- 0.002, exceeding the sum by +0.446 +/- 0.014). This is structural: a stateless predictor cannot detect approaching transitions because pre-transition tokens are distributionally identical to within-domain tokens. In a character-level MoE LM (5 seeds), beta-routing reduces transition-step BPC from 6.56 +/- 0.01 (Standard) to 4.01 +/- 0.15 (beta-MoE); the beta + Ant gate places 0.86 +/- 0.02 probability on the correct domain expert before that domain appears in input, vs 0.42 +/- 0.12 for Standard MoE. Reference implementations (~200 lines each): https://github.com/russellwmy/affinity-is-not-enough
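
The temporal-memory (beta) gate can be sketched as a per-expert leaky integrator over routing logits (our own minimal rendering; the decay constant and names are assumptions, and the precision and anticipation terms are omitted):

```python
import numpy as np

def lif_gate(affinity_logits, decay=0.9):
    """Per-expert leaky-integrate membrane over a token sequence (sketch).

    m_t = decay * m_{t-1} + logits_t, and routing uses softmax(m_t), so
    routing context from earlier tokens persists across a domain boundary.
    `decay` is our stand-in for the paper's LIF membrane dynamics.
    """
    m = np.zeros(affinity_logits.shape[1])
    probs = []
    for logits in affinity_logits:
        m = decay * m + logits          # leaky integration of evidence
        e = np.exp(m - m.max())         # stable softmax over experts
        probs.append(e / e.sum())
    return np.array(probs)
```

Because the membrane carries evidence forward, a token that is locally ambiguous still routes to the expert favored by recent context, which is the behavior the abstract's transition experiments probe.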

[371] Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning

Jiaming Zhang, Yujie Yang, Yao Lyu, Shengbo Eben Li, Liping Zhang

Main category: cs.LG

Abstract: Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominent paradigm. Handling state-wise constraints with the Lagrangian method requires a distinct multiplier for every state, necessitating neural networks to approximate them as a multiplier network. However, applying standard dual gradient ascent to multiplier networks induces severe training oscillations. This is because the inherent instability of dual ascent is exacerbated by network generalization – local overshoots and delayed updates propagate to adjacent states, further amplifying policy fluctuations. Existing stabilization techniques are designed for scalar multipliers, which are inadequate for state-dependent multiplier networks. To address this challenge, we propose an augmented Lagrangian multiplier network (ALaM) framework for stable learning of state-wise multipliers. ALaM consists of two key components. First, a quadratic penalty is introduced into the augmented Lagrangian to compensate for delayed multiplier updates and establish the local convexity near the optimum, thereby mitigating policy oscillations. Second, the multiplier network is trained via supervised regression toward a dual target, which stabilizes training and promotes convergence. Theoretically, we show that ALaM guarantees multiplier convergence and thus recovers the optimal policy of the constrained problem. Building on this framework, we integrate soft actor-critic (SAC) with ALaM to develop the SAC-ALaM algorithm. Experiments demonstrate that SAC-ALaM outperforms state-of-the-art safe RL baselines in both safety and return, while also stabilizing training dynamics and learning well-calibrated multipliers for risk identification.
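
The multiplier update at the heart of this can be sketched as regression toward an augmented-Lagrangian dual target (a hedged one-liner; `rho` and the clipping at zero follow standard augmented-Lagrangian practice and are not necessarily the paper's exact form):

```python
def dual_target(lmbda, violation, rho=1.0):
    """Regression target for a state-wise multiplier (sketch).

    max(0, lambda + rho * g(s)) keeps the multiplier non-negative and folds
    in the quadratic-penalty correction; the multiplier network would be
    trained toward this target instead of by raw dual gradient ascent.
    """
    return max(0.0, lmbda + rho * violation)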

[372] Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

Chaohao Yuan, Chenghao Xiao, Yu Rong, Hong Cheng, Long-Kai Huang

Main category: cs.LG

Abstract: SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential training can cause catastrophic forgetting, and joint optimization often suffers from severe gradient conflicts. We analyze SFT and RLVR through the lens of task vectors and reveal three structural properties behind these failures: a 30x magnitude disparity, 45% sign interference, and heterogeneous module-wise update distributions. These findings show SFT and RLVR are difficult to integrate directly, but they also suggest that the two paradigms modify partly complementary components of the model. Motivated by these observations, we propose Decoupled Test-time Synthesis (DoTS), a post-hoc framework that allows SFT and RLVR checkpoints to be trained independently and synthesizes their capabilities only at inference time via task vector arithmetic, without updating model parameters. To reduce interference, DoTS applies selective sparsification with norm-preserving rescaling. It then uses Bayesian optimization on a small set of unlabeled queries to search for combination coefficients on the Pareto frontier of consistency and perplexity. Empirically, DoTS matches or exceeds the performance of training-based SFT–RLVR integration methods across multiple mathematical reasoning benchmarks, incurring only $\sim$3% of the computational cost. When applied to stronger post-trained checkpoints, DoTS surpasses SOTA models and generalizes to out-of-domain benchmarks without re-tuning. Code is available at https://github.com/chaohaoyuan/DoTS.
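
The test-time synthesis step can be sketched with plain task-vector arithmetic (a simplification: magnitude-based top-k stands in for the paper's selective sparsification, and fixed coefficients stand in for the Bayesian-optimized ones):

```python
import numpy as np

def sparsify_rescale(tau, keep=0.2):
    """Keep the top-|keep| fraction of entries by magnitude, then rescale so
    the L2 norm matches the dense task vector (norm-preserving, per the
    abstract). `keep` is our assumed hyperparameter."""
    k = max(1, int(keep * tau.size))
    idx = np.argsort(np.abs(tau))[-k:]
    sparse = np.zeros_like(tau)
    sparse[idx] = tau[idx]
    scale = np.linalg.norm(tau) / (np.linalg.norm(sparse) + 1e-12)
    return sparse * scale

def synthesize(base, tau_sft, tau_rlvr, a=0.5, b=0.5):
    """Test-time synthesis sketch: base + a*SFT vector + b*RLVR vector,
    with no parameter updates; a, b would come from Bayesian optimization."""
    return base + a * sparsify_rescale(tau_sft) + b * sparsify_rescale(tau_rlvr)
```

In the real method the vectors are per-module model deltas rather than flat arrays, but the combine step is the same arithmetic.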

[373] Class Angular Distortion Index for Dimensionality Reduction

Kaviru Gunaratne, Stephen Kobourov, Jacob Miller

Main category: cs.LG

Abstract: Dimensionality reduction (DR) techniques are often characterized by whether they preserve global, high-level structures in the data or local, neighborhood structures. This distinction matters in visualization: global methods can obscure clusters while local methods can over-emphasize them. Yet, even when clusters appear distinct, their relative arrangement in the projection may be arbitrary or misleading, a common issue in techniques such as t-SNE and UMAP. Existing cluster quality metrics either only measure cluster separability or assume spherical, globular clusters in the original space. We introduce the Class Angular Distortion Index (CADI), a metric that uses internal angles among point triples to determine the faithfulness of cluster organization in a projection. We show cases on both real and synthetic data where existing cluster metrics fail, but CADI provides an interpretable result. Since it relies on computing angles, CADI is also differentiable, enabling optimization. We demonstrate this with a CADI-based DR technique.
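
The angle-based idea reduces to comparing interior angles of point triples before and after projection (a hedged stand-in: CADI restricts triples using class labels and aggregates differently; here we simply average absolute angle differences):

```python
import numpy as np

def triple_angle(a, b, c):
    """Interior angle at vertex b of the triangle (a, b, c), in radians."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angle_distortion(X_high, X_low, triples):
    """Mean absolute angle difference over given point triples (sketch)."""
    errs = [abs(triple_angle(*(X_high[i] for i in t)) -
                triple_angle(*(X_low[i] for i in t))) for t in triples]
    return float(np.mean(errs))
```

Because angles are invariant to uniform scaling and rotation, the measure ignores harmless global transformations of the projection, and `arccos` of a smooth cosine keeps the metric differentiable almost everywhere, which is what enables the CADI-based optimization the abstract mentions.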

[374] Unlearning Offline Stochastic Multi-Armed Bandits

Zichun Ye, Runqi Wang, Xuchuang Wang, Xutong Liu, Shuai Li, Mohammad Hajiesmaili

Main category: cs.LG

Abstract: Machine unlearning aims to unlearn data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unsupervised / supervised machine unlearning, leaving unlearning for sequential decision-making systems far less understood. We initiate the first study of a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models, the fixed-sample model and the distribution model. For these settings, our algorithmic design is built on two canonical base algorithms: Gaussian mechanism and rollback, and we propose adaptive algorithms that switch between them according to the data regime and privacy constraint. We further introduce a mixing procedure that elucidates the rationale behind these baselines. We provide performance guarantees across the above settings and establish lower bounds under both dataset models. Experiments validate the predicted tradeoffs and demonstrate the effectiveness of the proposed methods.
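
For the Gaussian-mechanism baseline, unlearning an arm's samples can be sketched as subtracting them from the sufficient statistics and adding calibrated noise (names and the noise scale are our assumptions; the paper's adaptive switching with rollback is not shown):

```python
import numpy as np

def unlearn_arm(sum_r, count, deleted_rewards, sigma=0.1, rng=None):
    """Delete samples from one arm's statistics, then noise the mean (sketch).

    Subtracting the deleted rewards performs exact removal; the Gaussian
    noise masks their residual influence on the published estimate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    new_sum = sum_r - sum(deleted_rewards)
    new_count = count - len(deleted_rewards)
    noisy_mean = new_sum / new_count + rng.normal(0.0, sigma)
    return noisy_mean, new_count
```

Rollback, the other base algorithm, would instead retrain from a stored checkpoint that predates the deleted data; the adaptive methods choose between the two depending on how much data is deleted and how tight the privacy constraint is.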

[375] Knowing when to trust machine-learned interatomic potentials

Shams Mehdi, Ilkwon Cho, Olexandr Isayev

Main category: cs.LG

Abstract: Prevailing machine-learned interatomic potential (MLIP) uncertainty-quantification methods rely on ensembles of independently trained backbones. These methods scale unfavorably with foundation-scale MLIPs, and their member-disagreement signals correlate weakly with per-molecule prediction error. Here we probe the frozen per-atom representations of a pretrained MLIP with a compact discriminative classifier, recasting MLIP uncertainty quantification as selective classification rather than error regression. The resulting method, PROBE (Post-hoc Reliability frOm Backbone Embeddings), produces a per-prediction reliability probability that monotonically tracks actual error without modification to the underlying model. Across large held-out evaluation sets and two structurally distinct MLIP architectures, PROBE outperforms ensemble disagreement as a binary reliability signal, which strengthens with the expressiveness of the backbone representation, implying a favorable scaling trajectory toward foundation-scale MLIPs. Multi-head self-attention additionally yields per-atom importance maps, providing chemically interpretable diagnostics at no additional computational cost. PROBE is post-hoc and architecture-agnostic, and is directly deployable on any MLIP that exposes per-atom representations.

[376] Bridging Graph Drawing and Dimensionality Reduction with Stochastic Stress Optimization

Daniel Hangan, Stephen Kobourov, Jacob Miller

Main category: cs.LG

Abstract: Both Dimensionality Reduction (DR) and Graph Drawing (GD) aim to visualize abstract, non-linear structures, yet rely on different optimization paradigms. This contrast is evident in Multidimensional Scaling (MDS), which typically depends on the SMACOF algorithm despite graph drawing results showing that simpler stochastic optimization schemes can be more effective for the same objective. We bridge these domains by adapting Stochastic Gradient Descent (SGD) techniques from graph drawing to vector data embedding. We present a scikit-learn compatible estimator that minimizes global stress through local pairwise updates, improving upon the existing implementation. Experiments on standard high-dimensional benchmarks show that our stochastic solver converges substantially faster than SMACOF while achieving comparable or lower stress.
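
The stochastic stress scheme borrowed from graph drawing updates one pair of points at a time (a minimal sketch following the SGD graph-drawing literature; the d^-2 weighting and step cap are standard choices, not necessarily this paper's exact schedule):

```python
import numpy as np

def sgd_stress_step(X, i, j, d_target, lr=0.1):
    """One pairwise stress update (sketch): nudge points i and j so their
    embedding distance moves toward the target distance d_target."""
    delta = X[i] - X[j]
    dist = np.linalg.norm(delta)
    mu = min(1.0, lr / (d_target ** 2))          # w_ij = d^-2 weight, capped
    r = mu * (dist - d_target) / 2.0 * delta / dist
    X[i] -= r                                    # move the pair symmetrically
    X[j] += r
    return X
```

A full pass shuffles all pairs and applies this update to each, with a decaying `lr`; the cap `mu <= 1` guarantees a single update never overshoots the target distance, which is the property that makes the scheme converge without a line search.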

[377] From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting

Alireza Namazi, Heman Shakeri

Main category: cs.LG

Abstract: Clinical time-series forecasting is increasingly studied for decision support, yet standard aggregate metrics can obscure whether a model is actually useful for the task it is meant to serve. In safety-critical settings, low average error can coexist with dangerous failures in exactly the high-risk regimes that matter most. We present a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support. For early warning, we evaluate on real data from three clinical cohorts using event-level recall and false alarms per patient-day, metrics that reflect operational alarm burden rather than aggregate accuracy. We show that models appearing acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice, where insulin-on-board is elevated and missed warnings carry the greatest clinical consequences. Standard forecasting evaluation, however, does not test whether a model can reason about the effects of actions, a requirement for supporting insulin dosing decisions. We therefore add a second, interventional arm using the FDA-accepted UVA/Padova simulator, where we evaluate whether forecasters can predict glucose responses to altered insulin plans in paired factual/counterfactual scenarios. We show that models that look strong on real-data forecasting often fail to predict the direction, magnitude, or ranking of intervention effects, and choose poor insulin doses when evaluated under a clinically motivated cost. Taken together, the two arms reveal a consistent gap between forecasting accuracy and task-relevant usefulness. We release the benchmark, the standardized preprocessing pipeline for public cohorts, and the simulator-based interventional dataset as a reproducible toolkit.
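
The alarm-burden metrics can be sketched as follows (our simplified definitions, not the paper's exact ones: an event counts as detected if any alarm fires within `horizon` minutes before onset, unmatched alarms are false, and times are in minutes):

```python
def event_metrics(events, alarms, horizon=30, total_minutes=1440):
    """Event-level recall and false alarms per patient-day (sketch)."""
    hit = lambda e, a: 0 <= e - a <= horizon      # alarm precedes event
    detected = sum(any(hit(e, a) for a in alarms) for e in events)
    recall = detected / len(events) if events else 1.0
    false_alarms = sum(not any(hit(e, a) for e in events) for a in alarms)
    patient_days = total_minutes / 1440
    return recall, false_alarms / patient_days
```

Evaluating these on clinically defined slices (e.g., only the post-bolus windows) rather than the full test set is what exposes the failure mode the abstract describes.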

[378] PEACE: Cross-modal Enhanced Pediatric-Adult ECG Alignment for Robust Pediatric Diagnosis

Xinran Liu, Yuwen Li, Hongxiang Gao, Heyang Xu, Jianqing Li, Zongmin Wang, Chengyu Liu

Main category: cs.LG

Abstract: Automated pediatric electrocardiogram (ECG) diagnosis remains challenging because models trained predominantly on adult data suffer from substantial cross-population mismatch, while pediatric labels are often scarce. We present PEACE (Pediatric-Adult ECG Alignment via Cross-modal Enhancement), a structured cross-modal alignment framework for adult-to-pediatric ECG transfer. PEACE integrates tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization to align transferable adult ECG representations with pediatric diagnostic targets. Since ZZU-pECG provides no paired clinical reports, we generate label-conditioned semantic descriptors using Gemini with concise clinical prompts and use them only as auxiliary training supervision; inference remains ECG-only. On ZZU-pECG, PEACE achieves 59.39%, 79.03%, and 90.89% AUC under zero-shot, 50-shot, and full fine-tuning settings, respectively, and reaches 96.65% AUC on the shared PTB-XL label space. These results suggest that structured clinical semantic supervision can improve low-resource adult-to-pediatric ECG transfer, while prospective clinical validation and more explicit age-aware modeling remain necessary before real-world deployment.

[379] Budget Constraints as Riemannian Manifolds

Michael Helcig, Dan Alistarh

Main category: cs.LG

Abstract: Assigning one of K options to each of N groups under a total cost budget is a recurring problem in machine learning, appearing in mixed-precision quantization, non-uniform pruning, and expert selection. The objective (model loss) depends jointly on all assignments and does not decompose across groups, which prevents combinatorial solvers from optimizing the true objective directly and limits them to proxy objectives. Evolutionary search evaluates the actual loss but lacks gradient information, while penalty-based methods provide gradients but enforce the budget only approximately and require sensitive hyperparameter tuning. We observe that under softmax relaxation, the budget constraint defines a smooth Riemannian manifold in logit space with particularly simple geometry: the normal vector is available in closed form, shifting logits along the cost vector changes expected cost monotonically, allowing binary-search retraction, and vector transport reduces to a single inner product. Building on this structure, we propose Riemannian Constrained Optimization (RCO), which augments a standard Adam update with tangent projection, binary-search retraction, and momentum transport. Combined with Gumbel straight-through estimation and budget-constrained dynamic programming for discrete feasibility, RCO enables first-order optimization of the true objective under exact budget enforcement, without introducing constraint hyperparameters. On synthetic knapsack problems with known optima, the manifold-based constraint handling recovers optimal solutions, whereas penalty methods plateau at 83% of optimal. On LLM compression tasks, including mixed-precision quantization and MoE expert pruning, RCO matches or exceeds evolutionary search methods while requiring 3x to 16x lower wall-clock cost on the evaluated configurations.
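
The binary-search retraction exploits the stated fact that shifting logits along the cost vector changes expected cost monotonically (a minimal sketch on a toy problem; the shapes, bracket, and tolerance are our assumptions, and the tangent projection and momentum transport are omitted):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def retract_to_budget(logits, costs, budget, tol=1e-6):
    """Binary-search retraction (sketch): shift every group's logits along
    the cost vector until the total expected cost equals the budget."""
    def exp_cost(t):
        return float((softmax(logits - t * costs) * costs).sum())
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Larger t pushes mass toward cheap options, lowering expected cost.
        if exp_cost(mid) > budget:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return logits - t * costs, exp_cost(t)
```

After an ordinary Adam step on the relaxed objective, this retraction restores exact budget feasibility without any penalty hyperparameter, which is the point of the manifold view.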

[380] Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game

Lixing Li

Main category: cs.LG

Abstract: While Large Language Models have achieved notable success on formal mathematics benchmarks such as MiniF2F, it remains unclear whether these results stem from genuine logical reasoning or semantic pattern matching against pre-training data. This paper identifies Architectural Reasoning, the ability to synthesize formal proofs using exclusively local axioms and definitions within an alien math domain, as a necessary capability for future automated theorem-discovery AI. We use the Obfuscated Natural Number Game as a benchmark to evaluate Architectural Reasoning. By renaming identifiers in the Natural Number Game in Lean 4, we created a zero-knowledge, closed environment. We evaluate state-of-the-art models, finding a universal latency tax where obfuscation increases inference time. The results also reveal a divergence in robustness: while general models (Claude-Sonnet-4.5, GPT-4o) suffer performance degradation, reasoning models (DeepSeek-R1, GPT-5, DeepSeek-Prover-V2) maintain the same accuracy despite the absence of semantic cues. These findings provide a quantitative metric for assessing the true capacity for mathematical reasoning.

[381] Deep Kernel Learning for Stratifying Glaucoma Trajectories

Bruce Rushing, Angela Danquah, Alireza Namazi, Arjun Dirghangi, Heman Shakeri

Main category: cs.LG

Abstract: Effectively stratifying patient risk in chronic diseases like glaucoma is a major clinical challenge. Clinicians need tools to identify patients at high risk of progression from sparse and irregularly-sampled electronic health records (EHRs). We propose a novel deep kernel learning (DKL) architecture that leverages a Gaussian Process (GP) backend. The GP’s kernel is defined by a transformer-based feature extractor applied to clinical-BERT embeddings to model glaucoma patient trajectories from multimodal EHR data. Our method successfully identifies three clinically distinct patient subgroups. Crucially, the model learns to decouple disease progression from current severity, identifying a high-risk group with a worsening trajectory despite having better average visual acuity than a second, stably poor group. This reveals that the model learns to identify progression risk rather than just the current disease state. This ability to stratify patients based on their risk trajectory progression offers a powerful tool for clinical decision support, enabling targeted interventions for high-risk individuals and improving the management of glaucoma care.

[382] Aitchison Embeddings for Learning Compositional Graph Representations

Nikolaos Nakis, Chrysoula Kosma, Panagiotis Promponas, Michail Chatzianastasis, Giannis Nikolentzos

Main category: cs.LG

Abstract: Representation learning is central to graph machine learning, powering tasks such as link prediction and node classification. However, most graph embeddings are hard to interpret, offering limited insight into how learned features relate to graph structure. Many networks naturally admit a role-mixture view, where nodes are best described as mixtures over latent archetypal factors. Motivated by this structure, we propose a compositional graph embedding framework grounded in Aitchison geometry, the canonical geometry for comparing mixtures. Nodes are represented as simplex-valued compositions and embedded via isometric log-ratio (ILR) coordinates, which preserve Aitchison distances while enabling unconstrained optimization in Euclidean space. This yields intrinsically interpretable embeddings whose geometry reflects relative trade-offs among archetypes and supports coherent behavior under component restriction; we consider both fixed and learnable ILR bases. Across node classification and link prediction, our method achieves competitive performance with strong baselines while providing explainability by construction rather than post-hoc. Finally, subcompositional coherence enables principled component restriction: removing and renormalizing subsets preserves a well-defined geometry, which we exploit via subcompositional dimensionality removal to probe how archetype groups influence representations and predictions.
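
ILR coordinates can be computed with a standard pivot (Helmert-type) orthonormal basis; the sketch below covers the fixed-basis case only (the paper also learns the basis):

```python
import numpy as np

def ilr(x):
    """Isometric log-ratio coordinates of a composition x on the simplex,
    using the standard pivot (Helmert-type) orthonormal balance basis."""
    x = np.asarray(x, dtype=float)
    D = x.size
    logx = np.log(x)
    z = np.empty(D - 1)
    for i in range(1, D):
        # Balance of the geometric mean of the first i parts vs. part i+1.
        z[i - 1] = np.sqrt(i / (i + 1)) * (logx[:i].mean() - logx[i])
    return z
```

Aitchison distances between compositions equal Euclidean distances between their ILR images, which is why downstream optimization can run unconstrained in the D-1 dimensional coordinate space; note the map is invariant to closure (rescaling of the composition), as the test checks.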

[383] Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

Jiawen Chen, Qi Shao, Duxin Chen, Wenwu Yu

Main category: cs.LG

Abstract: Combinatorial complexes have unified set-based (e.g., graphs, hypergraphs) and part-whole (e.g., simplicial, cellular complexes) structures into a common topological framework. Existing topological neural networks and Weisfeiler-Lehman variants remain fragmented, lacking a unified theoretical foundation for topological deep learning. In this work, we introduce the Combinatorial Complex Weisfeiler-Lehman (CCWL) test, an axiomatic-style extension of the WL test to combinatorial complexes. CCWL formalizes topological message passing through four types of neighborhood relations and provides a unified perspective on the expressive power of higher-order variants. We further prove that upper and lower neighborhoods are sufficient among the four adjacent WL tests to reach the expressivity of the full CCWL framework across topological structures of combinatorial complexes. Building on this framework, we also propose the Combinatorial Complex Isomorphism Network (CCIN) and evaluate it on synthetic and real-world benchmarks. Experimental results indicate that CCIN outperforms baseline methods and offers a generalized expressive framework for topological deep learning.
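
The graph-level special case of such tests is ordinary color refinement, which CCWL generalizes by refining cell colors over several neighborhood relations at once (a minimal 1-WL sketch; Python's `hash` stands in for an injective relabeling):

```python
def wl_refine(colors, neighbors, rounds=2):
    """Color refinement (1-WL sketch): repeatedly hash each node's color
    together with the sorted multiset of its neighbors' colors.
    CCWL would run this jointly over up to four neighborhood relations
    on the cells of a combinatorial complex."""
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in neighbors[v]))))
                  for v in colors}
    return colors
```

Two structures are distinguished by the test when their final color multisets differ; the paper's sufficiency result says that, for combinatorial complexes, running this over just the upper and lower neighborhoods already matches the full four-relation variant.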

[384] Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

Ramin Mohammadi, Vahab Vahdat, Sarthak Jain, Amir T. Namin, Ramya Palacholla, Sagar Kamarthi

Main category: cs.LG

Abstract: With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows ranging from the day of surgery to three years prior on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data; maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened, but plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.

Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

Main category: cs.LG

Abstract: Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
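
The mixed-difference measure for a two-agent deviation is a discrete cross-derivative of the joint value (sketch; `q` maps joint actions to predicted values, and all names are our own):

```python
def interaction_score(q, a, ap, b, bp):
    """Mixed-difference score for the two-agent deviation (a,b) -> (ap,bp):
    Q(ap,bp) - Q(ap,b) - Q(a,bp) + Q(a,b). It is positive when the joint
    deviation helps beyond what either single-agent change contributes,
    revealing coordination benefits invisible to per-agent rankings."""
    return q[(ap, bp)] - q[(ap, b)] - q[(a, bp)] + q[(a, b)]
```

In a coordination game where neither agent gains by deviating alone, this score is strictly positive for the joint deviation, which is exactly the case the proposal rule is designed to surface.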

[386] Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

Rodolphe Barlogis, Ferhat Tamssaouet, Quentin Falcoz, Stéphane Grieu

Main category: cs.LG

Abstract: This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the DeepONet framework. We consider a 2D square domain with an inclusion of arbitrary boundary geometry at its center. This inclusion acts as a scatterer for an incoming harmonic wave. The aim is to learn the operator linking the geometry of the scatterer to the resulting scattered field. A signed distance function to the boundary of the inner inclusion, evaluated at several points in the domain, is used to encode its geometry. It serves as input for the branch part of the DeepONet architecture, while local information is used as input for the trunk part. This approach enables the encoding of arbitrary geometries, whether they are parameterized or not. The evaluation of the model on unseen geometries is compared with its finite element method (FEM) equivalent to test its generalization capabilities. The trained network weights implicitly embed the local physics and their interaction with the domain geometry. If the training space sufficiently covers the target evaluation space, the model can generalize accordingly. Furthermore, it can be refined to extend to another region of interest without retraining from scratch. This framework also avoids the need to remesh the domain for each geometry. The proposed approach delivers a computationally lighter surrogate model than FEM alternatives and avoids relying on FEM-generated training data.

[387] Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

Jacques Raynal, Pierre Slangen, Jacques Margerit

Main category: cs.LG

TL;DR: In a single-case study of gait in Parkinson's disease, conditions with comparable observable performance correspond to different organizations in state-space and latent-space representations, so aggregated output metrics do not fully reflect system organization.

Motivation: Observable performance is commonly used as a proxy for underlying system organization, but this presumes a correspondence between output metrics and internal states that may not hold in adaptive systems.

Method: The vertical dimension of occlusion (VDO) is applied as a controlled constraint on an adaptive neuromechanical system; an intra-individual analysis spans three levels: aggregated linear metrics, a dynamical-systems description in state space, and a latent space obtained through unsupervised embedding.

Result: Conditions with similar aggregated performance showed non-equivalent organizations in both state-space and latent-space representations, exposing a limitation of aggregated metrics alone.

Conclusion: The observations are exploratory and non-causal; the framework establishes no mechanistic, predictive, or directional relationships but offers a structured, multi-level approach to analyzing constraint-driven systems.

Abstract: In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly presumes a correspondence between output metrics and internal system states that may not hold in adaptive systems. In this study, the vertical dimension of occlusion (VDO) is considered as a constraint applied to an adaptive neuromechanical system, enabling the exploration of system-level responses under controlled variations. A single-case design in a patient with Parkinson’s disease allows an intra-individual analysis across repeated conditions. The analysis is structured across three complementary levels: (i) aggregated linear metrics describing observable performance, (ii) a dynamical systems framework describing temporal organization in state space, and (iii) a latent space representation obtained through unsupervised embedding. The results show that conditions with comparable observable performance may correspond to different organizations in both state space and latent space representations. This dissociation highlights a limitation of aggregated metrics and suggests that similar outputs may arise from non-equivalent system states. A fourth level is proposed as a purely conceptual extension describing potential relationships between system states. This level is not implemented and is not derived from experimental data. These observations are strictly exploratory and non-causal. The proposed framework does not establish mechanistic, predictive, or directional relationships, but provides a structured approach for analyzing constraint-driven systems across multiple levels of representation.

[388] SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Stavros Orfanoudakis, Pedro P. Vergara

Main category: cs.LG

TL;DR: SAVGO learns a joint state-action embedding in which cosine similarity tracks action-value similarity and uses the induced similarity kernel to steer policy updates toward higher-value actions, improving continuous control on MuJoCo benchmarks.

Motivation: Representation and similarity learning improve RL sample efficiency, yet they are rarely used to shape policy updates directly in the action space.

Method: State-action pairs with similar action-value estimates are trained to have high cosine similarity in a joint embedding space; at each update, a similarity kernel over sampled candidate actions guides policy improvement beyond local gradient steps, unifying representation learning, value estimation, and policy optimization in one geometry-consistent objective.

Result: Improvements over strong baselines on challenging high-dimensional MuJoCo tasks, with ablations isolating the contributions of value-geometry learning and similarity-based policy updates.

Conclusion: Explicitly incorporating value-based geometry into the policy update improves off-policy actor-critic training while preserving its scalability.

Abstract: While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy updates directly in the action space. To bridge this gap, a geometry-aware RL algorithm that explicitly incorporates value-based similarity into the policy update, State-Action Value Geometry Optimization (SAVGO), is proposed. In detail, SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity, while dissimilar pairs are mapped to distinct directions. This learned geometry enables the generation of a similarity kernel over candidate actions sampled at each update, allowing policy improvement to be guided directly toward higher-value regions beyond local gradient-based updates. As a result, representation learning, value estimation, and policy optimization are unified within a single geometry-consistent objective, while preserving the scalability of off-policy actor-critic training. The proposed method is evaluated on standard MuJoCo continuous-control benchmarks, demonstrating improvements over strong baselines on challenging high-dimensional tasks. Ablation studies are done to analyze the contributions of value-geometry learning and similarity-based policy updates.
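The similarity-kernel idea in this abstract can be illustrated with a toy sketch. All of it is a hedged assumption rather than the paper's architecture: the linear `embed` stand-in, the one-dimensional actions, the softmax temperature, and the "weight candidates by similarity to the best one" rule are hypothetical choices for illustration.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def embed(state, action):
    # Stand-in state-action embedding; the real method *trains* this space so
    # that pairs with similar Q-values get high cosine similarity.
    return [state[0] + action, state[1] - action, action]

def kernel_guided_action(state, candidates, q):
    # Pick the highest-value candidate, then form a softmax kernel over
    # cosine similarity to it, pulling the update toward high-value regions.
    values = [q(state, a) for a in candidates]
    best = candidates[max(range(len(candidates)), key=values.__getitem__)]
    e_best = embed(state, best)
    sims = [cosine(embed(state, a), e_best) for a in candidates]
    zs = [math.exp(5.0 * s) for s in sims]
    total = sum(zs)
    return sum(w * a for w, a in zip(zs, candidates)) / total

random.seed(0)
state = (0.5, -0.2)
q = lambda s, a: -(a - 0.3) ** 2          # toy value function peaked at a = 0.3
cands = [random.uniform(-1, 1) for _ in range(16)]
a_star = kernel_guided_action(state, cands, q)
print(f"kernel-weighted action: {a_star:.3f}")
```

The point of the sketch is the mechanism: the kernel aggregates information from many sampled actions instead of following only the local policy gradient.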

[389] Generating Statistical Charts with Validation-Driven LLM Workflows

Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

Main category: cs.LG

TL;DR: A validation-driven LLM workflow decomposes chart generation into screening, proposal, code synthesis, rendering, validation-based refinement, and description/QA generation, yielding 1,500 charts and 30,003 question-answer pairs from 74 UCI datasets; evaluating 16 multimodal LLMs shows chart-syntax questions are nearly saturated while value extraction, comparison, and reasoning remain hard.

Motivation: Many chart-generation failures only become apparent after rendering, and existing chart datasets rarely provide fully aligned artifacts such as executable code, dataset context, and question-answer pairs.

Method: A structured workflow treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, validating rendered outputs to catch visualization-specific failures (readability, semantic mismatch) and retaining each chart with its code, dataset context, description, and QA pairs.

Result: 1,500 charts spanning 24 chart families, paired with 30,003 QA pairs; across 16 MLLMs, chart-syntax questions are nearly saturated while value extraction, comparison, and reasoning lag behind.

Conclusion: The workflow provides both a source of aligned chart artifacts and a diagnostic tool for studying chart-grounded multimodal reasoning.

Abstract: Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-answer pairs. We present a structured LLM-based workflow that decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. By incorporating rendered-output validation, the workflow addresses visualization-specific failure modes such as readability and semantic mismatch. It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, retaining each chart with its code, dataset context, description, and question-answer pairs. Applied to UCI datasets, the workflow produces 1,500 charts from 74 datasets, spanning 24 chart families and paired with 30,003 question-answer pairs. We evaluate 16 multimodal LLMs (MLLMs) on these chart-question pairs. The results show that chart-syntax questions are nearly saturated, while value extraction, comparison, and reasoning remain more challenging, illustrating the workflow’s utility for diagnostic studies of chart-grounded multimodal reasoning.
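The validation-driven refinement loop the abstract describes can be sketched as a skeleton. The stub functions below stand in for LLM calls and a real renderer; their names, return shapes, and the retry budget are assumptions for illustration, not the paper's implementation.

```python
# Skeleton of a validate-and-refine chart-generation loop: propose code,
# render it, validate the *rendered* output, and fold validator feedback back
# into the next proposal until validation passes or the budget runs out.

def propose_code(dataset, feedback=None):
    # Stub for "code synthesis": a real system would prompt an LLM here,
    # including validator feedback in the prompt on retries.
    kind = "histogram" if feedback else "chart"
    return {"kind": kind, "dataset": dataset}

def render(code):
    # Stub renderer: the real workflow executes the code and captures the image.
    return {"kind": code["kind"], "readable": code["kind"] == "histogram"}

def validate(image):
    # Rendered-output validation catches failures (e.g. unreadable charts)
    # that are invisible at the code level.
    return [] if image["readable"] else ["labels overlap; chart unreadable"]

def generate_chart(dataset, max_rounds=3):
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = propose_code(dataset, feedback)
        image = render(code)
        issues = validate(image)
        if not issues:
            return {"code": code, "image": image, "rounds": round_no}
        feedback = issues
    raise RuntimeError("validation failed after all refinement rounds")

result = generate_chart("uci/iris")
print(result["rounds"], result["code"]["kind"])
```

Keeping the code, rendered image, and validation trace together is what makes each chart an inspectable artifact rather than a one-shot output.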

[390] Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2309.06577 returned HTTP 429 (rate limited).

[391] Value Explicit Pretraining for Learning Transferable Representations

Kiran Lekkala, Henghui Bao, Sumedh A. Sontakke, Erdem Biyik, Laurent Itti

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2312.12339 returned HTTP 429 (rate limited).

[392] Mutatis Mutandis: Revisiting the Comparator in Discrimination Testing

Jose M. Alvarez, Salvatore Ruggieri

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2405.13693 returned HTTP 429 (rate limited).

[393] Dynamics-Encoded Deep Learning for Robust System Identification and Parameter Estimation

Caitlin Ho, Andrea Arnold

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2410.04299 returned HTTP 429 (rate limited).

[394] Latent Generative Modeling of Random Fields from Limited Training Data

James E. Warner, Tristan A. Shah, Patrick E. Leser, Geoffrey F. Bomarito, Joshua D. Pribe, Michael C. Stanley

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2505.13007 returned HTTP 429 (rate limited).

[395] Privacy Amplification in Differentially Private Zeroth-Order Optimization with Hidden States

Eli Chien, Wei-Ning Chen, Pan Li

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2506.00158 returned HTTP 429 (rate limited).

[396] Graph Concept Bottleneck Models

Haotian Xu, Tsui-Wei Weng, Lam M. Nguyen, Tengfei Ma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2508.14255 returned HTTP 429 (rate limited).

[397] Concolic Testing on Individual Fairness of Neural Network Models

Ming-I Huang, Chih-Duo Hong, Fang Yu

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.06864 returned HTTP 429 (rate limited).

[398] Optimal hypersurface decision trees

Xi He

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.12057 returned HTTP 429 (rate limited).

[399] Incomplete Data, Complete Dynamics: A Diffusion Approach

Zihan Zhou, Chenguang Wang, Hongyi Ye, Yongtao Guan, Tianshu Yu

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.20098 returned HTTP 429 (rate limited).

[400] Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling

Siva Viknesh, Amirhossein Arzani

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.00233 returned HTTP 429 (rate limited).

[401] Adaptive Node Feature Selection For Graph Neural Networks

Ali Azizpour, Madeline Navarro, Santiago Segarra

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.03096 returned HTTP 429 (rate limited).

[402] Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models

Xinshuai Dong, Ignavier Ng, Haoyue Dai, Jiaqi Sun, Xiangchen Song, Peter Spirtes, Kun Zhang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.04378 returned HTTP 429 (rate limited).

[403] SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

Murtaza Rangwala, Farag Azzedin, Richard O. Sinnott, Rajkumar Buyya

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.07922 returned HTTP 429 (rate limited).

[404] Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

Jingxin Zhan, Yuze Han, Zhihua Zhang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2510.22819 returned HTTP 429 (rate limited).

[405] SynQuE: Estimating Synthetic Dataset Quality Without Annotations

Arthur Chen, Victor Zhong

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2511.03928 returned HTTP 429 (rate limited).

[406] Uncertainty Modeling for Multi-Objective RTA Interception with Distillation Acceleration

Gaoxiang Zhao, Ruinan Qiu, Pengpeng Zhao, Rongjin Wang, Xiaoting Wang, Zhangang Lin, Xiaoqiang Wang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2511.05582 returned HTTP 429 (rate limited).

[407] When Structure Doesn’t Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected

Haotian Xu, Yuning You, Tengfei Ma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2511.16767 returned HTTP 429 (rate limited).

[408] Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism

Tianwei Ni, Esther Derman, Vineet Jain, Vincent Taboga, Siamak Ravanbakhsh, Pierre-Luc Bacon

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.04341 returned HTTP 429 (rate limited).

[409] Resting Neurons, Active Insights: Robustify Activation Sparsity for Large Language Models

Haotian Xu, Jiannan Yang, Tian Gao, Tsui-Wei Weng, Tengfei Ma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.12744 returned HTTP 429 (rate limited).

[410] NRGPT: An Energy-based Alternative for GPT

Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.16762 returned HTTP 429 (rate limited).

[411] The Seismic Wavefield Common Task Framework

Alexey Yermakov, Yue Zhao, Marine Denolle, Yiyu Ni, Philippe M. Wyder, Judah Goldfeder, Stefano Riva, Jan Williams, David Zoro, Amy Sara Rude, Matteo Tomasetto, Joe Germany, Joseph Bakarji, Georg Maierhofer, Miles Cranmer, J. Nathan Kutz

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2512.19927 returned HTTP 429 (rate limited).

[412] Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces

Bryon Tjanaka, Henry Chen, Matthew C. Fontaine, Stefanos Nikolaidis

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2601.01082 returned HTTP 429 (rate limited).

[413] Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

Luca Zhou, Bo Zhao, Rose Yu, Emanuele Rodolà

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2601.22285 returned HTTP 429 (rate limited).

[414] Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

Markus Mueller, Kathrin Gruber, Dennis Fok

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2601.22816 returned HTTP 429 (rate limited).

[415] Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE

Yuanteng Chen, Peisong Wang, Nanxin Zeng, Yuantian Shao, Shuang Qiu, Gang Li, Jing Liu, Jian Cheng

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.02443 returned HTTP 429 (rate limited).

[416] Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models

Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.03265 returned HTTP 429 (rate limited).

[417] Riemannian MeanFlow

Dongyeop Woo, Marta Skreta, Seonghyun Park, Kirill Neklyudov, Sungsoo Ahn

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.07744 returned HTTP 429 (rate limited).

[418] One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise

Amith Bhat, Haipeng Luo, Aadirupa Saha

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.14474 returned HTTP 429 (rate limited).

[419] RAT+: Train Dense, Infer Sparse – Recurrence Augmented Attention for Dilated Inference

Xiuying Wei, Caglar Gulcehre

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2602.18196 returned HTTP 429 (rate limited).

[420] A Comparative Study of UMAP and Other Dimensionality Reduction Methods

Guanzhe Zhang, Shanshan Ding, Zhezhen Jin

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2603.02275 returned HTTP 429 (rate limited).

[421] Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning

Xueqiao Peng, Andrew Perrault

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2603.19397 returned HTTP 429 (rate limited).

[422] Placing Puzzle Pieces Where They Matter: A Question Augmentation Framework for Reinforcement Learning

Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.15830 returned HTTP 429 (rate limited).

[423] A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

S. Gratton, Ph. L. Toint

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.17423 returned HTTP 429 (rate limited).

[424] How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals

Dharshan Kumaran, Viorica Patraucean, Simon Osindero, Petar Veličković, Nathaniel Daw

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.22271 returned HTTP 429 (rate limited).

[425] A Differentiable Framework for Global Circulation Model Precipitation Bias Correction

Kamlesh Sawadekar, Seth McGinnis, Peijun Li, Kathryn Lawson, Chaopeng Shen

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.23045 returned HTTP 429 (rate limited).

[426] RL Token: Bootstrapping Online RL with Vision-Language-Action Models

Charles Xu, Jost Tobias Springenberg, Michael Equi, Ali Amin, Adnan Esmail, Sergey Levine, Liyiming Ke

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.23073 returned HTTP 429 (rate limited).

[427] SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations

Jason Wu, Shir-Kang Scott Jin, Yuyang Yuan, Maggie Wigness, Lance M. Kaplan, Hang Qiu, Mani Srivastava

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2604.26181 returned HTTP 429 (rate limited).

[428] Distance-Aware Error for Spline Networks: A Bottom-Up Approach to Uncertainty

Masoud Ataei, Mohammad Javad Khojasteh, Vikas Dhiman

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2501.04757 returned HTTP 429 (rate limited).

[429] Mean-field limit from general mixtures of experts to quantum neural networks

Anderson Melchor Hernandez, Davide Pastorello, Giacomo De Palma

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2501.14660 returned HTTP 429 (rate limited).

[430] Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium

Kaizhao Liu, Qi Long, Zhekun Shi, Weijie J. Su, Jiancong Xiao

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2503.10990 returned HTTP 429 (rate limited).

[431] Doubly robust identification of treatment effects from multiple environments

Piersilvio De Bartolomeis, Julia Kostin, Javier Abad, Yixin Wang, Fanny Yang

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2503.14459 returned HTTP 429 (rate limited).

[432] TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference

Raja Gond, Nipun Kwatra, Ramachandran Ramjee

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2505.11329 returned HTTP 429 (rate limited).

[433] Characterizing control between interacting subsystems with deep Jacobian estimation

Adam J. Eisen, Mitchell Ostrow, Sarthak Chandra, Leo Kozachkov, Earl K. Miller, Ila R. Fiete

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2507.01946 returned HTTP 429 (rate limited).

[434] Surprisingly High Redundancy in Electronic Structure Data Across Materials Explained by Low Intrinsic Dimensionality

Sazzad Hossain, Ponkrshnan Thiagarajan, Shashank Pathrudkar, Stephanie Taylor, Abhijeet S. Gangan, Amartya S. Banerjee, Susanta Ghosh

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2507.09001 returned HTTP 429 (rate limited).

[435] Understanding Cognitive States from Head & Hand Motion Data

Kaiang Wen, Mark Roman Miller

Main category: cs.LG

TL;DR: Summary unavailable (automated processing failed).

Abstract: Unavailable; the arXiv API request for 2509.24255 returned HTTP 429 (rate limited).

[436] Foundation Models for Discovery and Exploration in Chemical Space

Alexius Wadell, Anoushka Bhutani, Victor Azumah, Austin R. Ellis-Mohr, Andrew J. Stier, Kareem Hegazy, Alexander Brace, Hancheng Zhao, Celia Kelly, Anuj K. Nayak, Yuhan Chen, Dimitrios Simatos, Hongyi Lin, Murali Emani, Venkatram Vishwanath, Kevin Gering, Melisa Alkan, Tom Gibbs, Jack Wells, Wesley W. Qian, Richard C. Gerkin, Benjamin Amorelli, Alexander B. Wiltschko, Lav R. Varshney, Bharath Ramsundar, Karthik Duraisamy, Michael W. Mahoney, Arvind Ramanathan, Venkatasubramanian Viswanathan

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2510.18900 returned HTTP 429).

[437] TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests

Haarika Manda, Manshi Sagar, Yogesh, Kartikay Singh, Cindy Zhao, Tarun Mangla, Phillipa Gill, Elizabeth Belding, Arpit Gupta

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2510.21141 returned HTTP 429).

[438] Minimizing Human Intervention in Online Classification

William Réveillard, Vasileios Saketos, Alexandre Proutiere, Richard Combes

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2510.23557 returned HTTP 429).

[439] Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator Network

Elham Kiyani, Amit Makarand Deshpande, Madhura Limaye, Zhiwei Gao, Zongren Zou, Sai Aditya Pradeep, Srikanth Pilla, Gang Li, Zhen Li, George Em Karniadakis

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2512.13746 returned HTTP 429).

[440] Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

Huan Li, Yiming Dong, Zhouchen Lin

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2601.07326 returned HTTP 429).

[441] Statistical Testing Framework for Clustering Pipelines by Selective Inference

Yugo Miyata, Tomohiro Shiraishi, Shuichi Nishino, Ichiro Takeuchi

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2603.18413 returned HTTP 429).

[442] On the Expressive Power of Contextual Relations in Transformers

Demián Fraiman

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2603.25860 returned HTTP 429).

[443] Generative Modeling under Non-Monotone MAR Missingness via Approximate Wasserstein Gradient Flows

Gitte Kremling, Jeffrey Näf, Johannes Lederer

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2604.04567 returned HTTP 429).

[444] From Cursed to Competitive: Closing the ZO-FO Gap via Input-to-State Stability

Amir Ali Farzin, Philipp Braun, Iman Shames

Main category: cs.LG

Abstract: Not available (fetch for arXiv:2604.25372 returned HTTP 429).

cs.MA

[445] The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations

Aurélien Bück-Kaeffer, Sneheel Sarangi, Maximilian Puelma Touzel, Reihaneh Rabbany, Zachary Yang, Jean-François Godbout

Main category: cs.MA

Abstract: Studies attempting to simulate human behavior with $\textit{Silicon Societies}$ are growing in number, while LLM-only social networks have started appearing outside of controlled settings. However, the design space of these networks remains under-studied, which contributes to a gap in validating model realism. To enable future work to make more informed design decisions, we perform a systematic analysis of the consequences and interactions of key design choices in simulated social networks, including the choice of base model used to model individual agents and how agents are connected to each other. Using surveys as a proxy for agent opinions, our findings suggest that the geometry of the design space is non-trivial, with some parameters behaving in additive ways while others display more complex interactions. In particular, the choice of the base LLM is the most important variable impacting the simulation outcomes.

[446] Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

Maksym Nechepurenko, Pavel Shuvalov

Main category: cs.MA

Abstract: Evaluating the true forecasting ability of AI agents requires environments resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL – a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score – proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy’s classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $α^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $α^* = 0.01$ requires four times more. We complement these analytical results with a 50-round live evaluation of five frontier LLM agents plus a random baseline. Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. All smart contracts and evaluation infrastructure are open-source.
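The scoring rules and power analysis above are concrete enough to sketch. Below is a minimal illustration, not the paper's implementation: the Brier score for probabilistic forecasts, and the textbook one-sample z-test sample-size formula, whose inverse-square dependence on the detectable edge reproduces the abstract's "four times more" claim when $α^*$ is halved. The per-market standard deviation `sigma` is a hypothetical input, not a value taken from the paper.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

def trials_for_edge(alpha_star, sigma, z_alpha=1.96, z_power=0.84):
    """Textbook z-test sizing: markets needed to detect a mean edge alpha_star
    at ~5% significance and ~80% power; note the 1/alpha_star**2 scaling."""
    return ((z_alpha + z_power) * sigma / alpha_star) ** 2

# Halving the detectable edge quadruples the required number of resolved markets.
ratio = trials_for_edge(0.01, sigma=0.13) / trials_for_edge(0.02, sigma=0.13)
```

With any fixed `sigma`, halving the edge from 0.02 to 0.01 exactly quadruples the required count, matching the scaling stated in the abstract.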

[447] Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen

Main category: cs.MA

Abstract: Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle to balance local adaptation, global coordination, and communication efficiency in heterogeneous nonconvex environments. In this paper, we take an initial step toward trajectory-driven self-design for distributed black-box consensus optimization. We first redesign the agent-level swarm dynamics with an adaptive internal mechanism tailored to decentralized consensus settings, improving the balance between exploration, convergence, and local escape. Built on top of this adaptive execution layer, we propose Learning to Act and Cooperate (LACMAS), a trajectory-driven framework in which large language models provide sparse high-level guidance for shaping both agent-internal action behaviors and agent-external cooperation patterns from historical optimization trajectories. We further introduce a phased cognitive scheduling strategy to activate different forms of adaptation in a resource-aware manner. Experiments on standard distributed black-box benchmarks and real-world distributed tasks show that LACMAS consistently improves solution quality, convergence efficiency, and communication efficiency over strong baselines, suggesting a practical route from handcrafted distributed coordination toward self-designing multi-agent optimization systems.

cs.MM

[448] RoboKA: KAN Informed Multimodal Learning for RoboCall Surveillance System

Nitin Choudhury, Nikhil Kumar, Aditya Kumar Sinha, Abhijeet Anand, Hossein Salemi, Orchid Chetia Phukan, Hemant Purohit, Arun Balaji Buduru

Main category: cs.MM

Abstract: Broad exploration of robocall surveillance research is hindered by limited access to public datasets, owing to privacy concerns. In this work, we first curate Robo-SAr, a synthetic robocall dataset designed for robocall surveillance research. Robo-SAr comprises ~200 unwanted and ~1200 legitimate synthetic robocall samples across three realistic adversarial axes: psycholinguistics-manipulated transcripts, emotion-eliciting speech, and cloned voices. We further propose RoboKA, a Kolmogorov-Arnold Network (KAN)-based multimodal fusion framework designed to model structured nonlinear interactions between acoustic and linguistic cues that characterize diverse adversarial robocall strategies. RoboKA first leverages cross-modal contrastive learning to align latent modality representations and feeds the resulting embeddings to a KAN-projection head for final classification. We benchmark RoboKA against strong unimodal and multimodal baselines in both in-domain and out-of-domain setups, finding RoboKA to surpass all baselines in terms of recall and F1-score.

[449] CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

Yawen Qin, Ke Qiu, Qin Zhang

Main category: cs.MM

Abstract: Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because dance requires simultaneous reasoning over linguistic semantics, musical rhythm, and full-body motion dynamics. We introduce TD-Data, a large-scale open dataset for text-dance retrieval, containing about 4,000 12-second dance clips, 14.6 hours of motion, 22 genres, and annotations from professional dance experts. On top of this dataset, we propose CustomDancer, a multimodal retrieval framework that aligns text with dance through a CLIP-based text encoder, music and motion encoders, and a music-motion blending module. CustomDancer achieves state-of-the-art performance on TD-Data, reaching 10.23% Recall@1 and improving retrieval quality in both quantitative benchmarks and user preference studies.

eess.AS

[450] From Birdsong to Rumbles: Classifying Elephant Calls with Out-of-Species Embeddings

Christiaan M. Geldenhuys, Thomas R. Niesler

Main category: eess.AS

Abstract: We show that pretrained acoustic embeddings classify elephant vocalisations at a level approaching that of end-to-end supervised neural networks, without any fine-tuning of the embedding model. This result is of practical importance because annotated bioacoustic data are scarce and costly to obtain, leaving conventional supervised approaches prone to overfitting and to poor generalisation under domain shift. A broad range of embedding models drawn from general audio, speech, and bioacoustic domains is evaluated, all of which are either out-of-domain (containing no bioacoustic data) or out-of-species (containing no elephant call data). The embedding networks themselves remain fixed; only the lightweight downstream classifiers, which include a linear model and several small neural networks, are trained. Among the models considered, Perch 2.0 achieves the best cross-validated classification performance, attaining AUCs of 0.849 on African bush elephant (Loxodonta africana) calls and 0.936 on Asian elephant (Elephas maximus) calls, with Perch 1.0 close behind. The best-performing system is within 2.2 % of an end-to-end supervised elephant call classification system. A layerwise analysis of pretrained transformer encoders, considered as embedding models, shows that intermediate representations outperform final-layer outputs. The second layer of both wav2vec2.0 and HuBERT encodes sufficient information for effective elephant call classification; truncation at this layer therefore preserves classification performance whilst retaining only approximately 10 % of the parameters of the full network. Such compact embedding networks are well suited to on-device processing where computational resources are limited.
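For reference, the AUC figures quoted above are rank-based. The snippet below computes only the generic metric (the Mann-Whitney probability that a random positive scores above a random negative); it does not reproduce the paper's pipeline of frozen Perch embeddings plus lightweight downstream classifiers.

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: fraction of (positive, negative)
    pairs where the positive outranks the negative; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is O(n²) and intended only as an illustration of the metric; library implementations use a sort-based equivalent.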

[451] Transformer-based End-to-End Control Filter Generation for Active Noise Control

Ziyi Yang, Zhengding Luo, Yisong Zou, Boxiang Wang, Qirui Huang, Woon-Seng Gan

Main category: eess.AS

Abstract: To address the limitations of existing Generative Fixed-Filter Active Noise Control (GFANC) methods, which rely on filter decomposition and recombination and require supervised learning with labeled data, this paper proposes a Transformer-based End-to-End Control-Filter Generation (E2E-CFG) framework. Unlike previous approaches that predict combination weights of sub control filters, the proposed method directly generates control filters in an unsupervised manner by integrating the co-processor and real-time controller into a fully differentiable ANC system, where the accumulated error signal is used as the training objective. By abandoning the decomposition–reconstruction process, the proposed design simplifies the control pipeline and avoids error accumulation, while the Transformer architecture effectively captures global and dynamic noise characteristics through its attention mechanism. Numerical simulations on real-recorded noises demonstrate that the proposed method achieves improved noise reduction performance and adaptability to different types of noises compared with the original GFANC framework.
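For contrast with the generated fixed filters described above, the classical adaptive baseline such systems sidestep is the LMS filter, which updates its taps directly from the error signal. A minimal single-channel sketch, ignoring the secondary path that a real filtered-x ANC controller must also model:

```python
def lms_step(w, x_buf, d, mu=0.1):
    """One LMS update: output y = w . x, error e = d - y, taps nudged along mu * e * x."""
    y = sum(wi * xi for wi, xi in zip(w, x_buf))
    e = d - y
    w = [wi + mu * e * xi for wi, xi in zip(w, x_buf)]
    return w, e

# Toy run: a single tap adapting toward the target gain 0.5.
w = [0.0]
for n in range(50):
    x = 1.0 if n % 2 == 0 else -1.0   # alternating reference signal
    w, e = lms_step(w, [x], 0.5 * x)  # desired signal is 0.5 * x
```

In this toy run the residual error decays geometrically (by a factor 1 - mu per step), falling below 0.01 within 50 iterations; a fixed-filter scheme instead avoids this per-sample adaptation loop entirely.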

[452] Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

Main category: eess.AS

Abstract: Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.

[453] The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Donghang Wu, Tianyu Zhang, Yuxin Li, Hexin Liu, Chen Chen, Eng Siong Chng, Yoshua Bengio

Main category: eess.AS

Abstract: During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional “thinking” mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user’s speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.

eess.IV

[454] Broadband Wide Field of View Imaging with Computational Mirrors

Vishwanath Saragadam, Niki Nezakati, Amit Roy-Chowdhury, Vivek Boominathan

Main category: eess.IV

Abstract: Traditional glass-based optics are typically optimized for narrow spectral bands, such as the visible (400-700nm) or shortwave infrared (1000-1800nm). While the emergence of VIS-SWIR sensors (400-1700nm) offers transformative potential, refractive optics struggle to focus this entire range simultaneously. Mirrors represent a promising achromatic alternative; however, they are often sidelined by field curvature and off-axis aberrations. This paper introduces Computational Mirrors, a framework that enables high-resolution, wide-field-of-view imaging across the complete VIS-SWIR spectrum using a single sensor. Our method is built on the observation that distinct regions of the field of view reach focus at varying distances from the mirror. By capturing a minimal focal stack (2-4 images), we utilize a computational backend to recover a sharp, all-in-focus image. A key contribution of this work is SeidelConv, a novel, physics-inspired, spatially-varying point spread function (PSF) model designed to accurately characterize and correct the off-axis aberrations inherent in simple concave mirrors. We demonstrate the efficacy of our approach using a first-of-its-kind 50mm F/1 optical system equipped with a VIS-SWIR sensor. Our system produces sharp images across RGB, NIR, and SWIR wavelengths without requiring refocusing, revealing material details invisible within individual spectral bands. We further validate the scalability of our approach with a 100mm F/2 system optimized for long-range imaging.

[455] RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics

Bojun Zhang, Huiyu Yang, Yunpeng Wang, Yuntian Chen, Yuanwei Bin, Rikui Zhang, Jianchun Wang

Main category: eess.IV

Abstract: Rapid aerodynamic evaluation is crucial for modern vehicle design, yet existing neural operators struggle to capture intricate spatial correlations. We propose the rotary-enhanced transformer operator (RETO), a novel neural solver featuring a dual-stage spatial awareness mechanism: sinusoidal-cosine encodings for global referencing and rotary positional encodings (RoPE) for relative displacements. RoPE encodes spatial relations via unitary rotations, enforcing translation invariance and enhancing local gradient resolution. RETO is validated on ShapeNet and the high-fidelity DrivAerML benchmark. On ShapeNet, RETO achieves a relative $L_2$ error of 0.063, outperforming RegDGCNN at 0.125 and representing a 16% improvement over the Transolver baseline, which yields an error of 0.075. These performance gains are further amplified on the DrivAerML dataset, where RETO achieves relative $L_2$ errors of 0.089 for surface pressure and 0.097 for velocity. In comparison, Transolver results in errors of 0.116 and 0.121 for the same metrics, indicating that RETO achieves precision enhancements of 23% and 19%, respectively. For comprehensive comparison, the surface pressure and velocity errors for AB-UBT are 0.102 and 0.124, while RegDGCNN yields 0.235 and 0.312, respectively. Information-theoretical analysis shows that the entropy peak of RETO at 0.35 is significantly lower than that of Transolver at 0.75 under $10^4$ resolution, indicating a focused attentional mechanism capable of preserving localized gradients against global diffusion.
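The RoPE mechanism the abstract credits for translation invariance has a compact form: consecutive feature pairs are rotated by position-dependent angles, so the inner product between a rotated query and key depends only on their relative displacement. Below is a minimal sketch of the standard RoPE formulation, not RETO's exact implementation:

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive feature pairs of vec by angles pos * base**(-2i/d)."""
    out = []
    d = len(vec)
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)   # i steps by 2, so exponent is -2*(pair index)/d
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# The score depends only on the positional offset (here 4), not on the
# absolute positions: shifting both positions by 10 changes nothing.
s1 = dot(rope(q, 3), rope(k, 7))
s2 = dot(rope(q, 13), rope(k, 17))
```

Because composed rotations satisfy R(m)ᵀR(n) = R(n-m), shifting both positions by the same offset leaves the attention score unchanged, which is the translation invariance the abstract refers to.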

[456] Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion

Ge Luo, Jun-Jie Huang, Qi Yu, Tianrui Liu, Ke Liang, Yuming Xiang, Wentao Zhao, Xinwang Liu, Meng Wang

Main category: eess.IV

Abstract: Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency. In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.

[457] Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy

Minhee Lee, Sangyoon Lee, Jiwook Lee, Minki Hong, Kyuyoung Kim, Wonhwa Kim, Jaeho Lee

Main category: eess.IV

Abstract: Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high-speed in vivo optical biopsy in handheld scenarios. However, Lissajous scanning traces a resonant trajectory and samples only the visited pixels per frame; at high frame rates, many pixels remain unvisited, creating structured holes. In this work, we introduce the first benchmark for high-rate Lissajous CLE, consisting of low-quality video clips paired with high-quality reference images. The reference images are wide-FOV mosaics obtained by stitching stabilized, slow-scan frames of the same tissue, enabling temporally aligned supervision. Using this dataset, we propose MIRA, a lightweight recurrent framework for Lissajous CLE restoration that iteratively aggregates temporal context through feature reuse and displacement alignment. Our experiments demonstrate that MIRA outperforms both lightweight and high-complexity baselines in restoration quality while maintaining a favorable computational efficiency suitable for clinical deployment.

[458] FedKPer: Tackling Generalization and Personalization in Medical Federated Learning via Knowledge Personalization

Zoe Fowler, Ghassan AlRegib

Main category: eess.IV

Abstract: Federated learning (FL) holds great potential for medical applications. However, statistical heterogeneity across healthcare institutions poses a major challenge for FL, as the global model struggles both to generalize across unseen patient populations and to adapt to the unique data distributions of individual hospitals. This heterogeneity also exacerbates forgetting at both the global and local level, causing previously learned patient patterns to be misclassified after model updates. While prior work has largely treated generalization and personalization as separate challenges, we show that a better balance between the two can be achieved through selective alignment with the global model and a modified aggregation scheme, which together mitigate the effects of statistical heterogeneity. Specifically, we introduce FedKPer, which incorporates knowledge personalization into the training stage of each local device. Afterwards, generalization is considered via the global model aggregation process, where local updates that are reliable and label-diverse are emphasized. We evaluate the performance of FedKPer, devising additional metrics that relate to common consequences of forgetting. Overall, we demonstrate FedKPer improves the generalization-personalization trade-off without sacrificing retention.

[459] Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

Jingxi Pu, Tonghua Liu, Zhilin Guan, Siqiao Li, Yang Ming, Zheng Cong, Wei Zhang, Fangwei Li

Main category: eess.IV

Abstract: With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, which may interfere with visual interpretation by physicians and affect diagnostic results. To address this problem, inspired by Cycle-GAN for unsupervised learning, this paper proposes an end-to-end unsupervised low-dose computed tomography denoising framework. The proposed framework combines a U-Net structure for multi-scale feature extraction, an attention mechanism for feature fusion, and a residual network for feature transformation. It also introduces a perceptual loss to adapt the network to the characteristics of medical images. In addition, we construct a real low-dose computed tomography dataset and design a large number of comparative experiments to validate the proposed method, using both image-based evaluation metrics and medical evaluation criteria. Compared with classical methods, the main advantage of this paper is that it addresses the limitation that real clinical data cannot be directly used for supervised learning, while still achieving excellent performance. The experimental results are also professionally evaluated by imaging physicians and meet clinical needs.

[460] A Unified Deep Learning Framework for Motion Correction in Medical Imaging

Jian Wang, Razieh Faghihpirayesh, Danny Joca, Polina Golland, Ali Gholipour

Main category: eess.IV

Abstract: Deep learning has shown significant value in medical image registration for motion correction, however, current techniques are either limited by the type and range of motion they can handle, or require iterative inference and/or retraining for new imaging data. To address these limitations, we introduce UniMo, a Unified Motion Correction framework that leverages deep neural networks to correct for various types of motion in medical imaging. UniMo exploits an alternating optimization scheme for a unified loss function to train an integrated model of 1) an equivariant neural network for global rigid motion correction and 2) an encoder-decoder network to correct local deformations. It features a geometric deformation augmenter that 1) enhances the robustness of global motion correction by addressing any local deformations, and 2) generates augmented data to improve the training process. UniMo is a hybrid model that uses both image intensities and shapes to achieve robust performance amid image appearance variations, and, therefore, it generalizes well to various medical imaging modalities without a need for network retraining. We trained and tested UniMo to track motion in fetal magnetic resonance imaging. Then we tested the trained model, without retraining, on various image modalities from three public datasets, including MedMNIST, lung CT, and BraTS. The results show that UniMo surpassed existing motion correction methods in terms of accuracy, and, notably, it enabled one-time training on a single modality while maintaining high stability and adaptability for inference across multiple unseen imaging datasets. By offering a unified solution, UniMo marks a significant advantage in challenging applications with a mixture of bulk motion and local deformations. https://github.com/IntelligentImaging/UNIMO

[461] CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction

Suyi Chen, Haibin Ling

Main category: eess.IV

TL;DR: cryoSplat adapts Gaussian splatting to the physics of cryo-EM image formation, enabling stable and efficient GMM-based homogeneous reconstruction directly from raw particle images with random initialization.

Motivation: GMM-based cryo-EM reconstruction currently relies on external consensus maps or atomic models for initialization, and off-the-shelf Gaussian splatting is incompatible with cryo-EM due to mismatched image formation physics, reconstruction objectives, and coordinate systems.

Method: An orthogonal projection-aware Gaussian splatting scheme with a view-dependent normalization term and an FFT-aligned coordinate system tailored to cryo-EM imaging.

Result: On real datasets, cryoSplat outperforms representative baselines in effectiveness and robustness.

Conclusion: Integrating Gaussian splatting with cryo-EM physics removes the need for external initialization; code will be released at https://github.com/Chen-Suyi/cryosplat.

Abstract: As a critical modality for structural biology, cryogenic electron microscopy (cryo-EM) facilitates the determination of macromolecular structures at near-atomic resolution. The core computational task in single-particle cryo-EM is to reconstruct the 3D electrostatic potential of a molecule from noisy 2D projections acquired at unknown orientations. Gaussian mixture models (GMMs) provide a continuous, compact, and physically interpretable representation for molecular density and have recently gained interest in cryo-EM reconstruction. However, existing methods rely on external consensus maps or atomic models for initialization, limiting their use in self-contained pipelines. In parallel, differentiable rendering techniques such as Gaussian splatting have demonstrated remarkable scalability and efficiency for volumetric representations, suggesting a natural fit for GMM-based cryo-EM reconstruction. However, off-the-shelf Gaussian splatting methods are designed for photorealistic view synthesis and remain incompatible with cryo-EM due to mismatches in the image formation physics, reconstruction objectives, and coordinate systems. Addressing these issues, we propose cryoSplat, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation. In particular, we develop an orthogonal projection-aware Gaussian splatting, with adaptations such as a view-dependent normalization term and FFT-aligned coordinate system tailored for cryo-EM imaging. These innovations enable stable and efficient homogeneous reconstruction directly from raw cryo-EM particle images using random initialization. Experimental results on real datasets validate the effectiveness and robustness of cryoSplat over representative baselines. The code will be released at https://github.com/Chen-Suyi/cryosplat.
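The "orthogonal projection-aware" part has a clean mathematical core: unlike the perspective projection in photorealistic splatting, cryo-EM's orthogonal projection integrates the density along the viewing axis, and the marginal of a 3D Gaussian over one axis is again a Gaussian whose parameters are just the xy sub-blocks. A minimal sketch (my own illustration, not the paper's code):

```python
import numpy as np

def project_gaussian(mu3, Sigma3, R):
    """Project one 3D Gaussian (mu3, Sigma3) through pose rotation R, then
    marginalize the viewing (z) axis: orthogonal projection integrates
    the density along z, leaving a 2D Gaussian with the xy sub-blocks."""
    mu_rot = R @ mu3
    Sigma_rot = R @ Sigma3 @ R.T
    return mu_rot[:2], Sigma_rot[:2, :2]

# identity pose: the projection simply drops the z components
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.diag([4.0, 9.0, 1.0])
mu2, Sigma2 = project_gaussian(mu, Sigma, np.eye(3))
```

The paper's additional view-dependent normalization and FFT-aligned coordinates sit on top of this marginalization to match cryo-EM's imaging model; they are not shown here.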

[462] Brain MR Image Synthesis with 3D Multi-Contrast Self-Attention GAN

Zaid A. Abod, Furqan Aziz

Main category: eess.IV

TL;DR: 3D-MC-SAGAN synthesizes missing MRI contrasts (T2f, T1n, T1c) from a single T2w input within one unified 3D GAN while explicitly preserving tumour characteristics, reaching state-of-the-art quality with segmentation accuracy comparable to fully acquired multi-modal inputs.

Motivation: Acquiring all MRI modalities for every patient is often impractical due to prolonged scan times, cost, and patient discomfort, which can limit comprehensive tumour evaluation.

Method: A multi-scale 3D encoder-decoder generator with residual connections and a Memory-Bounded Hybrid Attention (MBHA) block, trained with a WGAN-GP critic and an auxiliary domain classification head; a frozen 3D U-Net segmentation network enforces a tumour-consistency constraint, and a composite objective combines adversarial, reconstruction, perceptual, structural similarity, contrast-classification, and segmentation-guided losses.

Result: State-of-the-art quantitative performance on 3D brain MRI datasets, with visually coherent, anatomically plausible contrasts and tumour segmentation accuracy comparable to using fully acquired multi-modal inputs.

Conclusion: The method can reduce acquisition burden while preserving clinically meaningful, tumour-relevant information.

Abstract: Complete and high-quality multi-modal Magnetic Resonance Imaging (MRI) is essential for accurate neuro-oncological assessment, as each contrast provides complementary anatomical and pathological information. However, acquiring all modalities (e.g., T1c, T1n, T2w, T2f) for every patient is often impractical due to prolonged scan times, cost, and patient discomfort, potentially limiting comprehensive tumour evaluation. We propose 3D-MC-SAGAN (3D Multi-Contrast Self-Attention Generative Adversarial Network), a unified 3D multi-contrast synthesis framework that generates high-fidelity missing modalities from a single T2w input while explicitly preserving tumour characteristics. The model employs a multi-scale 3D encoder–decoder generator with residual connections and a novel Memory-Bounded Hybrid Attention (MBHA) block to capture long-range dependencies efficiently, and is trained with a WGAN-GP critic and an auxiliary domain classification head to produce T2f, T1n, and T1c volumes within a unified network. To ensure anatomical and pathological fidelity, we incorporate a frozen 3D U-Net-based segmentation network that enforces a tumour-consistency constraint during training. A composite objective combining adversarial, reconstruction, perceptual, structural similarity, contrast-classification, and segmentation-guided losses further promotes both global realism and tumour-preserving structure. Extensive experiments on 3D brain MRI datasets demonstrate that 3D-MC-SAGAN achieves state-of-the-art quantitative performance and produces visually coherent, anatomically plausible contrasts with improved distributional realism. Importantly, the proposed method maintains tumour segmentation accuracy comparable to that achieved using fully acquired multi-modal inputs, highlighting its potential to reduce acquisition burden while preserving clinically meaningful information.
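The composite objective is simply a weighted sum of the six loss terms the abstract lists. A hedged sketch of that bookkeeping (the term values and weights below are placeholders, not the paper's):

```python
def composite_loss(terms, weights):
    """Weighted sum of named loss terms; terms and weights are dicts
    keyed by loss name (adversarial, reconstruction, perceptual, SSIM,
    contrast-classification, segmentation-guided)."""
    return sum(weights[k] * terms[k] for k in terms)

# placeholder per-term loss values and weights for illustration only
terms = {"adv": 0.5, "recon": 1.2, "perceptual": 0.8,
         "ssim": 0.1, "cls": 0.3, "seg": 0.4}
weights = {"adv": 1.0, "recon": 10.0, "perceptual": 1.0,
           "ssim": 1.0, "cls": 0.5, "seg": 5.0}
total = composite_loss(terms, weights)
```

The segmentation-guided term is the distinctive one: it is computed through the frozen 3D U-Net, so gradients push the generator toward tumour-consistent synthesis.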

[463] Conditional Diffusion Posterior Alignment for Sparse-View CT Reconstruction

Luis Barba, Johannes Kirschner, Benjamin Bejar

Main category: eess.IV

TL;DR: CDPA scales diffusion-based sparse-view CT reconstruction to large 3D volumes by conditioning a 2D U-Net diffusion model on an initial 3D reconstruction and enforcing explicit data consistency with the measured projections, achieving state-of-the-art results on synthetic and real CBCT.

Motivation: Diffusion models improve sparse-view CT reconstruction but fail to scale to large 3D volumes because of the memory and compute cost of 3D models, the lack of large 3D training datasets, and inter-slice inconsistencies when 2D models are applied to each slice independently.

Method: Conditional Diffusion Posterior Alignment: a 2D U-Net diffusion model conditioned on an initial 3D reconstruction to improve inter-slice consistency, combined with a data-consistency alignment step that matches the measured projections.

Result: State-of-the-art performance on synthetic and real Cone Beam CT (CBCT) data, with ablations confirming the synergy of the pipeline components; the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.

Conclusion: Combining conditional diffusion with explicit data consistency makes sparse-view CT reconstruction scalable to large 3D volumes, and the benefits carry over to much cheaper denoising networks.

Abstract: Computed Tomography (CT) is a widely used imaging modality in medical and industrial applications. To limit radiation exposure and measurement time, there is a growing interest in sparse-view CT, where the number of projection views is significantly reduced. Deep neural networks have shown great promise in improving reconstruction quality in sparse-view CT, especially generative diffusion models. However, these methods struggle to scale to large 3D volumes due to several reasons: (i) the high memory and computational requirements of 3D models, (ii) the lack of large 3D training datasets, and (iii) the inconsistencies across slices when using 2D models independently on each slice. We overcome these limitations and scale diffusion-based sparse-view CT reconstruction to large 3D volumes by combining conditional diffusion with explicit data consistency. We propose Conditional Diffusion Posterior Alignment (CDPA) to enable scalable 3D sparse-view CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to improve inter-slice consistency, combined with data-consistency alignment to match measured projections. Experiments on synthetic and real Cone Beam CT (CBCT) data show state-of-the-art performance, with ablations that confirm the synergistic effects of the proposed pipeline. Finally, we show that the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.
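The generic data-consistency idea the abstract pairs with conditional diffusion can be sketched in a few lines. This is my own toy illustration, not the paper's algorithm: `A` stands in for the sparse-view projection operator, and each step nudges the current volume estimate so its simulated projections match the measurements by gradient descent on the squared residual.

```python
import numpy as np

def data_consistency_step(x, A, y, step=0.01):
    """One gradient step on ||A x - y||^2, pulling the estimate x
    toward agreement with the measured projections y."""
    residual = A @ x - y
    return x - step * A.T @ residual

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))   # toy projection operator
x_true = rng.normal(size=10)
y = A @ x_true                  # noiseless "measured" projections

x = np.zeros(10)
for _ in range(500):
    x = data_consistency_step(x, A, y)
```

In CDPA this kind of alignment is interleaved with conditional diffusion sampling, so the generative prior fills in what the sparse projections leave underdetermined while the consistency steps keep the result faithful to the data.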

Last updated: 2026-05-04
Built with Hugo, theme modified from Stack