AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

69 Papers today
8h Update frequency
7 Days of history
Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators
Andrew Bukowski, Aditya Kothari, Simba Shi, Ishir Rao
Generative Models Theory Robotics
  • Neural networks can predict motion but often violate conservation laws.
  • Different model designs significantly affect the recovery of conserved quantities.
  • Training duration and data quality are critical for the performance of polynomial CDNs.
  • The structured energy model's advantage decreases in the presence of noise.
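The first bullet can be illustrated with a toy check, which is not the paper's evaluation protocol. In this minimal sketch two hand-written integrators for a unit harmonic oscillator stand in for a learned simulator. Both track the motion closely over a single step, yet only one keeps the energy close to its initial value over a long rollout. All function names and the test system are illustrative assumptions.

```python
def energy(q, p):
    # Total energy of a unit-mass, unit-stiffness harmonic oscillator.
    return 0.5 * (p * p + q * q)


def euler_step(q, p, dt):
    # Explicit Euler: small one-step error, but energy grows at every step.
    return q + dt * p, p - dt * q


def symplectic_step(q, p, dt):
    # Symplectic (semi-implicit) Euler: update momentum first, then position.
    p_new = p - dt * q
    return q + dt * p_new, p_new


def relative_energy_drift(step, steps=1000, dt=0.01):
    # Roll the stepper forward and report |E_final - E_0| / E_0.
    q, p = 1.0, 0.0
    e0 = energy(q, p)
    for _ in range(steps):
        q, p = step(q, p, dt)
    return abs(energy(q, p) - e0) / e0


if __name__ == "__main__":
    print("explicit Euler drift:  ", relative_energy_drift(euler_step))
    print("symplectic Euler drift:", relative_energy_drift(symplectic_step))
```

A conservation metric of this kind can be reported next to trajectory error, since a low prediction error alone does not reveal the drift.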
Read more
When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System
Deemah H. Tashman, Soumaya Cherkaoui
Reinforcement Learning Optimization Theory
  • Introduction of Disagreement-Guided Reward Poisoning (DGRP) attack targeting SAC agents in CRNs.
  • DGRP exploits substantial disagreement between dual critics to corrupt rewards and misguide learning.
  • The attack significantly reduces the performance improvements from RIS, impacting transmission quality.
  • DGRP is shown to be more damaging than existing reward poisoning strategies based on periodic timing or exploration triggers.
Read more
Simply Stabilizing the Loop via Fully Looped Transformer
Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang
NLP Large Language Models Efficient ML
  • Introduction of Fully Looped Transformer (FLT) to stabilize training of looped models.
  • Identification of training instability issues: gradient oscillation and residual explosion.
  • Parameter-free architectural modifications improve training dynamics.
  • FLT achieves stable training with up to 12 loop iterations, unlike baseline models.
Read more
On-Device Continual Learning with Dual-Stage Buffer and Dynamic Loss for Point-of-Care Pneumonia Diagnosis
Danu Kim
Computer Vision Efficient ML
  • First application of domain-incremental learning to X-ray pneumonia detection.
  • Development of a dual-stage balanced buffer for maintaining class balance in replay.
  • Introduction of a dynamic class-weighted loss function to address intra-batch imbalances.
  • PneumoNet achieves high accuracy and low forgetting under resource constraints.
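One common way to build a class-weighted loss that reacts to intra-batch imbalance is to recompute inverse-frequency weights for every batch. The sketch below shows that generic idea for a binary task. It is an assumption about the general technique, not the loss defined in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def batch_class_weights(labels):
    # Inverse-frequency weights recomputed per batch.
    # With n samples and k classes present, the weights average to 1 over the batch.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels):
    # probs[i] is the predicted probability of class 1 and must lie strictly in (0, 1).
    weights = batch_class_weights(labels)
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(p_true)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]  # one positive among three negatives
    print(batch_class_weights(labels))
    print(weighted_cross_entropy([0.5, 0.5, 0.5, 0.5], labels))
```

Because the weights are derived from the batch itself, a minority class that appears only once in a replay batch still contributes as much total loss as the majority class.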
Read more
LLM Benchmark Datasets Should Be Contamination-Resistant
Ali Al-Lawati, Jason Lucas, Dongwon Lee, Suhang Wang
Large Language Models NLP Theory
  • Benchmark dataset contamination significantly undermines the reliability of LLM evaluations.
  • Contamination-resistant datasets should be unlearnable during training but usable for inference.
  • The asymmetry in Transformer architectures can be leveraged to support contamination resistance.
  • Mathematical advancements are necessary for ensuring interoperability across different LLM architectures.
Read more
Planner-Admissible Graph-PDE Value Extensions for Sparse Goal-Conditioned Planning
Shiheng Zhang
Reinforcement Learning Graph Learning Theory
  • Introduces a planner-admissibility criterion for sparse goal-conditioned planning.
  • Demonstrates that AMLE outperforms harmonic averaging in maintaining local greedy orderings.
  • Establishes a theoretical certificate linking local value errors to rollout success.
  • Empirical results validate the effectiveness of AMLE across various graph configurations.
Read more
D$^3$-Subsidy: Online and Sequential Driver Subsidy Decision-Making for Large-Scale Ride-Hailing Market
Taijie Chen, Rui Su, Siyuan Feng, Laoming Zhang, Hongyang Zhang, Haijiao Wang, Zhaofeng Ma, Jintao Ke
Reinforcement Learning Generative Models Optimization
  • D$^3$-Subsidy optimizes driver subsidies in ride-hailing markets under strict budget constraints.
  • The framework utilizes a diffusion-based model for generating future trajectories from historical data.
  • A context-conditioned inverse module translates high-level plans into actionable control signals.
  • Real-world testing shows a 1.59% increase in completed rides and a 2.06% increase in GMV.
Read more
Active Context Selection Improves Simple Regret in Contextual Bandits
Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash
Theory Optimization
  • Active context selection can significantly reduce simple regret in contextual bandits compared to passive sampling.
  • The proposed active sampling strategy achieves a tight regret rate that can improve by Θ(k^(1/4)), where k is the number of contexts.
  • The EETC algorithm optimally balances exploration and exploitation when the context distribution is unknown, matching known distribution rates for large horizons.
  • The analysis extends to budgeted active sampling, providing insights into the minimum budget required to achieve optimal performance.
Read more
INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification
Seongjun Lee, Seokhyun Lee, Changhee Lee
Time Series Interpretability
  • INSHAPE discovers instance-specific shapelets that improve classification performance and interpretability.
  • The framework models temporal dependencies among shapelets, addressing limitations of existing methods.
  • It provides both local and global interpretability by aggregating instance-level shapelets into population-level insights.
  • Extensive experiments show INSHAPE's superior performance on benchmark datasets compared to traditional shapelet methods.
Read more
LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models
Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi
Large Language Models Efficient ML Optimization
  • LEAP introduces a new parameterization for unstructured pruning that is scalable and tractable for large language models.
  • The method improves zero-shot accuracy significantly compared to existing layer-wise pruning methods.
  • LEAP operates on frozen pretrained weights, allowing for easier deployment and integration with fine-tuning.
  • The framework is validated across various LLM families and demonstrates consistent performance improvements at high sparsity levels.
Read more
Federated Martingale Posterior Sampling
Boning Zhang, Matteo Zecchin, Mingzhao Guo, Dongzhu Liu, Osvaldo Simeone
Federated Learning Theory Optimization
  • Introduces Federated Martingale Posterior (FMP) sampling for federated Bayesian neural networks.
  • FMP allows clients to upload compressed data embeddings, reducing the need for full dataset sharing.
  • Demonstrates improved calibration and predictive performance over consensus-style baselines.
  • Validates the method on multiple datasets, showing its effectiveness in both homogeneous and heterogeneous client scenarios.
Read more
Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew
Joan Serrà, Dipam Goswami, Fabio Morreale, Wei-Hsiang Liao, Yuki Mitsufuji
Generative Models Interpretability
  • Introduction of MUCS, a novel method for TDA in diffusion models.
  • MUCS combines mirrored unlearning with noise-consistent skew for improved reliability.
  • Demonstrated significant performance improvements over existing TDA methods.
  • Analysis of influential instance overlap and the effectiveness of ensemble TDA approaches.
Read more
Scale Determines Whether Language Models Organize Representation Geometry for Prediction
Weilun Xu
NLP Large Language Models Theory
  • Introduction of Subspace PGA, a metric for assessing the alignment of representation geometry with predictive functions.
  • Demonstration that predictive organization in language models is scale-dependent, with smaller models losing this organization in later training layers.
  • Identification of a capacity trade-off where dominant directions in smaller models deviate from the readout subspace, masking predictive structure.
  • Large models maintain predictive organization throughout their layers, contrasting with the detour observed in smaller models.
Read more
Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space
Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis
Generative Models Theory
  • Establishes the first convergence bounds independent of state space size S, applicable to masked distributions.
  • Unified derivation covering all integral probability metrics (IPMs) based on a single rate-matrix condition.
  • Introduces novel techniques, including adjoint equations in the space of observables and coupling arguments.
  • Framework extends beyond convergence bounds, offering a toolkit for broader theoretical analyses.
Read more
Learning over Positive and Negative Edges with Contrastive Message Passing
Peter Pao-Huang, Charilaos I. Kanatsoulis, Michael Bereket, Jure Leskovec
Graph Learning Theory
  • Negative edges can provide significant information gain under specific graph conditions.
  • Contrastive Message Passing (CMP) effectively integrates both positive and negative edges in GNNs.
  • CMP outperforms standard GNNs and contrastive learning methods in low-label scenarios.
  • Theoretical analysis provides guidance on when to utilize negative edges for improved performance.
Read more
Content-Style Identification via Differential Independence
Subash Timilsina, Hoang-Son Nguyen, Sagar Shrestha, Xiao Fu
Generative Models Computer Vision Theory
  • Introduces content-style differential independence (CSDI) for identifying content and style variables in unpaired multi-domain data.
  • Imposes a blockwise orthogonality constraint on the Jacobian to achieve identifiability without requiring statistical independence.
  • Develops a scalable implementation for high-dimensional data using a multi-domain GAN framework.
  • Demonstrates practical benefits in counterfactual generation and domain translation across various datasets.
Read more
A Bitter Lesson for Data Filtering
Christopher Mohri, John Duchi, Tatsunori Hashimoto
NLP Large Language Models Theory
  • Data filtering may not be necessary for large-scale model pretraining.
  • Sufficiently large models can benefit from low-quality or distractor data.
  • The full Common Crawl dataset outperforms filtered versions when models are adequately trained.
  • Scaling laws predict compute requirements for optimal performance without filtering.
Read more
BrainDyn: A Sheaf Neural ODE for Generative Brain Dynamics
Siddharth Viswanath, Panayiotis Ketonis, Chen Liu, Michael Perlmutter, Dhananjay Bhaskar, Smita Krishnaswamy
Generative Models Graph Learning Time Series
  • Introduction of BrainDyn, a sheaf-based neural ODE framework for modeling brain dynamics.
  • Utilization of learnable restriction maps for expressive interaction modeling between brain regions.
  • Capability to generate brain-like trajectories that reflect complex neural interactions.
  • Comprehensive evaluation across multiple modalities, achieving strong performance in modeling neural dynamics.
Read more
DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift
Kieran Wood, Stefan Zohren, Stephen J. Roberts
Time Series
  • DeRegiME separates latent uncertainty regimes from the underlying signal, improving interpretability.
  • Utilizes a sparse variational Gaussian process with a nonstationary regime-mixing kernel.
  • Demonstrates significant improvements in predictive performance over existing models across multiple datasets.
  • Captures complex distribution shifts in time-series data, including abrupt and gradual changes.
Read more
Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models
Mohammed Saidul Islam, Negin Baghbanzadeh, Farnaz Kohankhaki, Afshin Cheraghi, Ali Kore, Shayaan Mehdi, Elham Dolatabadi, Arash Afkanpour
Large Language Models Theory Efficient ML
  • Introduction of FLAME framework for automated benchmark generation.
  • Generation of benchmarks with broad coverage and rich metadata.
  • Expert-reviewed benchmarks demonstrate lower error rates than existing benchmarks.
  • FLAME reveals fine-grained performance differences across models.
Read more
Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
Madeline Celi Kitch, Nihar B. Shah
Theory Interpretability NLP
  • The paper introduces a minimal assumption framework for modeling evaluator preferences, focusing on non-decreasing functions.
  • It highlights the issues caused by common modeling assumptions that can lead to significant errors in preference learning.
  • A new algorithm is proposed that can learn evaluator preferences robustly, maintaining performance even under model mismatch.
  • The effectiveness of the algorithm is validated through both synthetic simulations and real-world applications.
Read more
Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing
Manal Benhamza, Marianne Clausel, Myriam Tami
Multimodal Theory Interpretability
  • Establishes identifiability guarantees for causal representations in multimodal settings with partially shared latent structures.
  • Proves identifiability under weaker assumptions than previous works, allowing for undercomplete cases.
  • Introduces a differentiable Wasserstein-based module for recovering latent structures, applicable across different architectures.
  • Demonstrates superior performance compared to state-of-the-art methods through extensive experiments.
Read more
FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics
Qiran Zou, Hou Hei Lam, Wenhao Zhao, Tingting Chen, Yiming Tang, Samson Yu, Yingtao Zhu, Srinivas Anumasa, Zufeng Zhang, Tianyi Zhang, Chang Liu, Zhengyao Jiang, Anirudh Goyal, Dianbo Liu
Optimization
  • FML-bench separates agent strategy from execution infrastructure, allowing for clearer performance attribution.
  • Complexity of agent strategies does not necessarily lead to better performance; simpler strategies can be equally effective.
  • Greedy search strategies excel in dense opportunity landscapes, while broader strategies are better for sparse environments.
  • An adaptive agent that changes exploration strategies based on performance shows improved results.
Read more
Fast and Featureless Node Representation Learning with Partial Pairwise Supervision
Sujan Chakraborty, Saptarshi Bej
Graph Learning Optimization Efficient ML
  • Introduction of Contrastive FUSE for node representation learning without node features.
  • Development of a signed, normalized contrastive Laplacian that enhances modularity-based learning.
  • Efficient optimization scheme that approximates modularity gradient for faster training.
  • Demonstrated competitive performance on benchmark datasets with lower runtime compared to existing methods.
Read more
Post-Trained MoE Can Skip Half Experts via Self-Distillation
Xingtai Lv, Li Sheng, Kaiyan Zhang, Yichen You, Siyan Gao, Xueheng Luo, Yuxin Zuo, Yuchen Fan, Junlin Yang, Ganqu Cui, Bingning Wang, Fan Yang, Youbang Sun, Ning Ding, Bowen Zhou
NLP Large Language Models Efficient ML
  • Introduces ZEDA, a framework for adapting post-trained MoE models to dynamic MoE models.
  • Utilizes zero-output experts to enhance routing efficiency without sacrificing performance.
  • Achieves over 50% reduction in expert FLOPs with marginal accuracy loss.
  • Outperforms existing dynamic MoE baselines by significant margins.
Read more
Graph Hierarchical Recurrence for Long-Range Generalization
Stefano Carotti, Marco Pacini, Alessio Gravina, Davide Bacciu, Bruno Lepri, Sebastiano Bontorin
Graph Learning
  • Introduction of Graph Hierarchical Recurrence (GHR) for improved long-range generalization in graph learning.
  • GHR combines dual-level recurrent architecture with hierarchical pooling to enhance information propagation.
  • Demonstrated strong performance on long-range benchmarks with significantly fewer parameters compared to state-of-the-art models.
  • GHR effectively addresses both in-range and out-of-range generalization challenges.
Read more
What Makes a Representation Good for Single-Cell Perturbation Prediction?
Wenkang Jiang, Yuhang Liu, Yichao Cai, Erdun Gao, Jiayi Dong, Ehsan Abbasnejad, Lina Yao, Javen Qinfeng Shi
Generative Models Theory Interpretability
  • Introduces the Perturbation Suppression Hypothesis, highlighting the dominance of invariant information in gene expression data.
  • Proposes PerturbedVAE, a framework that separates perturbation-specific signals from invariant structures.
  • Includes an identifiability analysis for reliable recovery of perturbation effects.
  • Achieves state-of-the-art performance on single-cell perturbation benchmarks.
Read more
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
Zunhai Su, Rui Yang, Chao Zhang, Yaxiu Liu, Yifan Zhang, Wei Wu, Jing Xiong, Dayou Du, Xialie Zhuang, Yulei Qian, Yuchen Xie, Yik-Chung Wu, Hongxia Yang, Ngai Wong
Large Language Models Multimodal Efficient ML
  • OScaR addresses the limitations of per-channel quantization by mitigating Token Norm Imbalance (TNI).
  • The framework employs Canalized Rotation and Omni-Token Scaling for efficient KV cache compression.
  • OScaR achieves significant improvements in decoding speed, memory efficiency, and throughput.
  • The methodology is lightweight and does not require complex quantization pipelines.
Read more
When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search
John T. Robertson, Jianing Zhu, Haris Vikalo, Zhangyang Wang
NLP Large Language Models Optimization
  • Prompt-boundary directional alignment can predict effective rank-1 steering directions, enabling efficient layer selection.
  • Rank-1 steering is framed as a budget-constrained optimization problem, allowing for geometry-guided search that reduces search costs.
  • Concept granularity measures directional heterogeneity and predicts optimization difficulty and steering performance.
  • GRACE framework provides a systematic approach to diagnose steering difficulties and allocate optimization efforts effectively.
Read more
Anytime and Difficulty-Adaptive PAC-Bayes for Constrained Density-Ratio Network with Continual Learning Guarantees
Paulo Akira F. Enabe
Theory
  • Introduces a constrained density-ratio network for learning under covariate shift.
  • Combines importance-weighted empirical risk with PAC-Bayes generalization guarantees.
  • Imposes structural constraints to ensure calibration and stability in learning.
  • Validated through controlled and real-data experiments, showing superior performance.
Read more
Online Market Making and the Value of Observing the Order Book
Davide Maran, Marcello Restelli
Theory Optimization
  • Introduction of an action-dependent feedback model for online market making.
  • Achieves O(√T) regret in stochastic settings without smoothness assumptions.
  • Extends results to mean-reverting price processes with relaxed conditions.
  • Proposes an explore-then-perturb algorithm for adversarial settings achieving O(T^(2/3)) regret.
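To make the gap between the two rates concrete, the sketch below converts a cumulative regret of order T^a into an average per-round regret. The hidden constants are set to 1, which is an assumption made purely for illustration, so only the scaling with T is meaningful.

```python
def avg_regret(exponent, horizon):
    # Per-round regret when cumulative regret grows as horizon ** exponent.
    # The leading constant is assumed to be 1 for illustration only.
    return horizon ** exponent / horizon


if __name__ == "__main__":
    for horizon in (10**3, 10**6):
        stochastic = avg_regret(0.5, horizon)       # O(sqrt(T)) setting
        adversarial = avg_regret(2.0 / 3.0, horizon)  # O(T^(2/3)) setting
        print(horizon, stochastic, adversarial)
```

Both averages vanish as T grows, but the adversarial rate decays as T^(-1/3) rather than T^(-1/2).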
Read more
TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting
Zeyu Zhang, Bradly C. Stadie
NLP Large Language Models Reinforcement Learning
  • Introduces TEMPO, a method to enforce temporal discipline in LLMs during backtesting.
  • Utilizes a two-mode reward system to eliminate knowledge leakage before optimizing performance.
  • Employs a GRPO-based training pipeline to help models learn valid reasoning strategies.
  • Achieves significant reductions in leakage rates and improvements in task performance.
Read more
A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization
Lei Dong
Theory Large Language Models Efficient ML
  • Establishes a precise asymmetry constant for sign-flip versus magnitude perturbations in ReLU + RMSNorm models.
  • Characterizes ternary quantization error and its impact on model performance.
  • Demonstrates that ReLU activation creates directional asymmetry affecting output energy.
  • Identifies the role of outlier features in amplifying sign sensitivity in real models.
Read more
Chessformer: A Unified Architecture for Chess Modeling
Daniel Monroe, George Eilender, Philip Chalmers, Zhenwei Tang, Ashton Anderson
Interpretability
  • Chessformer is a unified architecture that improves chess modeling across multiple objectives.
  • Achieved a state-of-the-art move-matching accuracy of 57.1% with a significantly smaller model.
  • Increased the playing strength of Leela Chess Zero by over 100 Elo points, defeating Stockfish.
  • Introduced Geometric Attention Bias (GAB) for better adaptation to chess geometry.
Read more
Take It or Leave It: Intent-Controlled Partial Optimal Transport
Salil Parth Tripathi, Bertrand Chapron, Fabrice Collard, Nicolas Courty, Ronan Fablet
Optimization Theory Multimodal
  • Introduces intent-controlled partial optimal transport (IC-POT) for structured rejection mechanisms.
  • Replaces global rejection with pointwise rejection costs based on local information.
  • Demonstrates practical relevance in positive-unlabeled learning and open-partial domain adaptation.
  • Shows improvements in performance through empirical experiments compared to traditional methods.
Read more
Protein Fold Classification at Scale: Benchmarking and Pretraining
Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt
Theory Generative Models Graph Learning
  • Introduction of TEDBench, a large-scale benchmark for protein fold classification.
  • Development of Masked Invariant Autoencoders (MiAE) for protein structure representation learning.
  • MiAE achieves significant performance improvements over existing models on TEDBench.
  • The benchmark and methods provide a foundation for future research in protein classification.
Read more
From Simple to Complex: Curriculum-Guided Physics-Informed Neural Networks via Gaussian Mixture Models
Jianan Yang, Yiran Wang, Shuai Li, Fujun Cao, Xuefei Yan, Junmin Liu
Theory Optimization
  • Introduction of CGMPINN, combining GMM with dynamic curriculum learning for improved PINN training.
  • The framework quantifies spatially varying learning difficulty and adapts training focus accordingly.
  • Theoretical guarantees established for convergence and generalization.
  • Experimental validation shows significant error reduction compared to standard PINNs.
Read more
EviTrack: Selection over Sampling for Delayed Disambiguation
Omer A. Haq
Time Series Theory Efficient ML
  • EviTrack introduces a test-time inference framework that operates over latent trajectories rather than marginal states.
  • The framework maintains competing trajectory hypotheses and delays commitment until sufficient evidence is available.
  • EviTrack outperforms traditional sampling-based methods in scenarios of delayed disambiguation.
  • The study emphasizes the importance of trajectory-level selection for effective sequential prediction.
Read more
In-context learning enables continental-scale subsurface temperature prediction from sparse local observations
Daniel O'Malley, Christopher W. Johnson, Javier E. Santos, Pablo Lara, Sandro Malusà, Bharat Srikishan, John Kath, Arnab Mazumder, Mohamed Mehana, David Coblentz, Nathan DeBardeleben, Earl Lawrence, Hari Viswanathan
Theory Interpretability Optimization
  • Introduces In-Context Earth, a transformer-based model for subsurface temperature prediction.
  • Achieves a mean absolute error of 4.7 °C, outperforming traditional models.
  • Maintains high accuracy in diverse geological regions without fine-tuning.
  • Learns internal representations of subsurface properties, enhancing interpretability.
Read more
Bridge: Retrieval-Augmented Spatiotemporal Modeling for Urban Delivery Demand
Yihong Tang, Tong Nie, Junlin He, Qianjun Huang, Dingyi Zhuang, Lijun Sun
Graph Learning Time Series Optimization
  • BRIDGE addresses urban delivery demand forecasting in cold-start regions with limited historical data.
  • The framework integrates a contextual graph backbone with a retrieval mechanism for future demand patterns.
  • A future-aware training objective enhances the retriever's effectiveness in aligning with forecasting needs.
  • Experiments demonstrate consistent performance improvements over traditional spatiotemporal forecasting models.
Read more
StampFormer: A Physics-Guided Material-Geometry-Coupled Multimodal Model for Rapid Prediction of Physical Fields in Sheet Metal Stamping
Jiajie Luo, Mohamed Mohamed, Osama Hassan, Haosu Zhou, Yingxue Zhao, Haoran Li, Xinrun Li, Zhutao Shao, Yang Long, Nan Li, Jichun Li
Multimodal
  • Introduces a physics-guided deep learning framework for sheet metal forming.
  • Integrates both geometric and material properties as multimodal inputs.
  • Achieves rapid predictions of physical fields in under a second.
  • Reduces simulation time from hours to subsecond AI inference.
Read more
Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality
Mohamed Bouadi, Nassim Bouarour, Varun Kulkarni, Shivam Dubey, Aditya Tanna, Vinay Kumar Sankarapu
Theory Generative Models Optimization
  • Introduction of O’PRIOR, a compositional realism prior for tabular model pretraining.
  • Establishment of a controlled evaluation protocol to isolate the effects of prior design.
  • Demonstration of substantial improvements in model performance due to enhanced synthetic task distributions.
  • Identification of independent contributions from various realism components in the prior.
Read more
CLIC: Contextual Language-Informed Cardiac Pathology Classification
Giovani D. Lucafo, Rafael da Costa Silva, João Lucas Luz Lima Sarcinelli, Andre Guarnier De Mitri, Diego Furtado Silva
Multimodal Time Series NLP
  • Introduction of CLIC, a multimodal framework for ECG-based cardiac pathology classification.
  • Demonstrates the importance of integrating contextual patient data with ECG signals.
  • Two configurations are explored: Data-to-Text and Prompt-guided approaches using LLMs.
  • Template-based contextual descriptions outperform LLM-generated texts in classification tasks.
Read more
D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting
Tianyu Wu, Yu Yao, Zhenting Qi, Han Zheng, Zhuohan Wang, Haoran Ma, Lawrence Liao, Himabindu Lakkaraju, Ju Li, Yilun Du
Large Language Models Efficient ML Generative Models
  • D-PACE introduces adaptive per-position weights for training, improving the efficiency of speculative decoding.
  • The method stabilizes training through asymmetric smoothing, preserving token-level gradients.
  • D-PACE shows significant improvements in decoding speed and emitted length across multiple benchmarks.
  • The approach does not require changes to existing drafter architectures or inference processes.
Read more
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
Wanghan Xu, Yuhao Zhou, Hengyuan Zhao, Shuo Li, Dianzhi Yu, Zhenfei Yin, Yaowen Hu, Fengli Xu, Wanli Ouyang, Wenlong Zhang, Lei Bai
NLP Large Language Models Reinforcement Learning
  • ReCrit addresses the instability of LLMs in multi-turn critic interactions.
  • The framework decomposes correctness transitions into four distinct quadrants.
  • Dynamic asynchronous rollout enhances training efficiency and scalability.
  • Significant improvements in critic accuracy were observed across multiple scientific reasoning benchmarks.
Read more
Efficient Conditioning Why Pseudo Observation Batch Bayesian Optimization Works When It Does not
Kumbha Nagaswetha, Rabi Pathak
Optimization Theory Efficient ML
  • Efficient conditioning is identified as the key property for batch diversity in Bayesian Optimization.
  • Gaussian Processes are proven to support efficient conditioning, unlike parametric models.
  • A unified framework for CL, KB, and fantasy models is established, linking them to Local Penalization and Determinantal Point Processes.
  • The Structural Diversity Diagnostic (SDD) is introduced to assess surrogate model compatibility.
Read more
Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence
Mridul Gupta, Samyak Jain, Vansh Ramani, Hariprasad Kodamana, Sayan Ranu
Graph Learning Efficient ML Optimization
  • Current graph condensation methods rely on full-dataset training, undermining efficiency.
  • Gradient matching introduces high computational overhead and poor generalization across GNNs.
  • Existing evaluation protocols do not accurately reflect resource savings and overhead.
  • A shift towards lightweight and architecture-agnostic methods is necessary for practical deployment.
Read more
MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning
Ankita Awasthi, Marco Apolinario, Kaushik Roy
Optimization Theory Efficient ML
  • MANGO effectively balances stability and plasticity in online continual learning.
  • The gradient-gating mechanism selectively scales parameter updates based on sensitivity.
  • Meta-learned regularization dynamically adapts stability coefficients to prevent catastrophic forgetting.
  • MANGO achieves state-of-the-art performance across multiple benchmark datasets.
Read more
Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction
Robert Jenkinson Alvarez
Theory
  • Enforcing Euclidean isotropy in JEPAs can misalign representations with structured task geometries.
  • The paper derives the minimax and maximum-entropy covariance under a Hamiltonian energy budget, highlighting the cost of isotropy.
  • HamJEPA introduces a phase-space representation that leverages Hamiltonian dynamics for improved predictive coupling.
  • Empirical results show significant performance improvements over existing methods on CIFAR-100 and ImageNet-100 datasets.
Read more
AR1-ZO: Topology-Aware Rank-1 Zeroth-Order Queries for High-Rank LoRA Fine-Tuning
Ziye Chen, Hongbin Lin, Chenyu Zhang, Xiangda Yan, Yongjie Yang, Yao Shu
NLP Large Language Models Optimization
  • Introduces AR1-ZO, a method that optimizes LoRA fine-tuning using zeroth-order optimization.
  • Identifies and resolves the measurement-topology problem that complicates high-rank LoRA optimization.
  • Proposes a topology-aware scaling mechanism that restores rank-invariant active signals.
  • Demonstrates the effectiveness of AR1-ZO through theoretical proofs and empirical validation on OPT and Qwen3 models.
Read more
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
Wenpu Liu, Yuqi Xu, Weichu Xie, Yongfu Zhu, Shuai Dong, Ziyue Wang, Wenqi Shao, Xiaoying Zhang, Tong Yang, Nan Duan, Jiaqi Wang
Reinforcement Learning Large Language Models Optimization
  • Error diversity within group rollouts is a critical factor for training success in RLVR.
  • EDAS is a lightweight, algorithm-agnostic method that reshapes advantage signals based on error diversity.
  • The method encourages exploration of diverse reasoning paths and discourages repetitive errors.
  • Extensive empirical validation shows significant performance improvements across various benchmarks.
Read more
Delta Attention Residuals
Cheng Luo, Zefan Cai, Junjie Hu
NLP Large Language Models Theory
  • Identifies the routing collapse problem in Attention Residuals due to source redundancy.
  • Proposes Delta Attention Residuals that route over deltas instead of cumulative states.
  • Demonstrates improved attention sharpness and model performance across various scales.
  • Enables easy conversion of pretrained models into Delta Attention Residuals via fine-tuning.
Read more
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
Feihu Huang, Yuning Luo, Songcan Chen
Optimization Theory Efficient ML
  • Establishes the generalization error of the Muon optimizer as O(1/(NκT)).
  • Introduces the MiMuon optimizer, which improves generalization error to O(1/N).
  • Proves that MiMuon maintains the same convergence rate as the Muon optimizer.
  • Demonstrates the effectiveness of MiMuon through numerical experiments on large models.
Read more
LoRA vs. Full Fine-Tuning: A Theoretical Perspective
Ali Zindari, Rotem Mulayoff, Sebastian U. Stich
Theory Efficient ML Large Language Models
  • LoRA can achieve lower excess risk than FFT under certain conditions, particularly when task differences are low-rank.
  • The choice of LoRA rank is crucial for generalization performance, with small ranks sometimes improving test accuracy despite limiting expressivity.
  • Theoretical bounds for LoRA's excess risk are established, providing a clearer understanding of its performance compared to FFT.
  • Empirical experiments support the theoretical findings, indicating broader applicability to LLM fine-tuning.
Read more
PhysioSeq2Seq: A Hybrid Physiological Digital Twin and Sequence-to-Sequence LSTM for Long-Horizon Glucose Forecasting in Type 1 Diabetes
Phat Tran, Neville Mehta, Clara Mosquera-Lopez, Robert H. Dodier, Lizhong Chen, Peter G. Jacobs
Time Series
  • PhysioSeq2Seq combines physiological modeling with Seq2Seq LSTM for improved glucose forecasting.
  • The twin matching approach allows for patient-specific adaptation without retraining.
  • Incorporating internal ODE state variables as covariates reduces long-horizon prediction bias.
  • The model significantly outperforms traditional LSTM and ODE-based approaches in accuracy.
Read more
Learning Variable-Length Tokenization for Generative Recommendation
Minhao Wang, Bowen Wu, Wei Zhang
Generative Models Optimization Theory
  • Introduction of the Popularity-Length Paradox in generative recommendation.
  • Development of VarLenRec framework for variable-length tokenization.
  • Use of PIBA for optimal identifier length allocation based on item popularity.
  • Implementation of Hyperbolic Residual Quantization to manage diverse code lengths.
Read more
Goal-Conditioned Supervised Learning for LLM Fine-Tuning
Shijun Li, Kaiwen Dong, Xiang Gao, Joydeep Ghosh
NLP Large Language Models Optimization
  • GCSL enables direct training from feedback signals without the need for external reward models or paired preference data.
  • The novel goal-achieving objective allows for consistent improvement beyond the average quality of selected training subsets.
  • Natural-language goal representations enhance the model's ability to utilize its semantic understanding.
  • The approach shows improved performance across multiple tasks compared to traditional offline fine-tuning methods.
Read more
Exact Linear Attention
Weinuo Ou
NLP Large Language Models Efficient ML
  • Introduces Exact Linear Attention (ELA) with linear computational complexity.
  • Addresses gradient explosion and token attention dilution through kernel constraints.
  • Proposes innovative structures like Hyper-Link and Memory Lobe for improved performance.
  • Demonstrates significant improvements in decoding speed and memory efficiency.
Read more
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference
Anay Chauhan, Gurucharan Marthi Krishna Kumar, Arion Das, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das
Large Language Models Efficient ML Optimization
  • Introduces Spherical KV, a new inference primitive for efficient long-context decoding.
  • Frames KV memory as a rate-distortion allocation problem, focusing on directional attention.
  • Achieves 1.55x to 1.72x higher throughput while reducing resident KV bytes/token by 24-42%.
  • Demonstrates improved performance in high-stress scenarios with long contexts.
Read more
ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction
Jinwoong Kim, Sangjin Park
Time Series
  • Introduces a unified token-sequence framework for irregular clinical time series that preserves observation structure and missingness patterns.
  • Develops a reliability-aware temporal aggregation mechanism that estimates observation validity based on missingness and elapsed time.
  • Utilizes Chronological Weaving for multi-scale sequence modeling, allowing integration of information from different temporal resolutions.
  • Implements budgeted token routing to manage sequence length while retaining informative summaries.
Read more
Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking
Qinwu Xu, Zhuoheng Li, Jessie Salas
Large Language Models Multimodal
  • Proposes a multi-stage framework for checkpoint selection that integrates real-world data and structured evaluations.
  • Introduces subsampling-based confidence estimation to enhance reliability in ranking checkpoints.
  • Highlights the critical role of data quality, particularly OCR readability, in evaluation validity.
  • Critiques existing evaluation methods for their lack of robustness and alignment with real-world performance.
Read more
DCFold: Efficient Protein Structure Generation with Single Forward Pass
Zhe Zhang, Yuanning Feng, Yuxuan Song, Keyue Qiu, Hao Zhou, Wei-Ying Ma
Generative Models Efficient ML
  • DCFold achieves AlphaFold3-level accuracy with a single-step generative model.
  • The Dual Consistency framework eliminates iterative overhead, enhancing efficiency.
  • Temporal Geodesic Matching (TGM) stabilizes training and improves performance for variable-length protein sequences.
  • DCFold demonstrates a 15× speedup in inference time compared to AlphaFold3.
Read more
An Integrated Forecasting Prototype for Emergency Department Boarding Time to Support Proactive Operational Decision Making
Orhun Vural, Abdulaziz Ahmed, Ferhat Zengul, James Booth, Bunyamin Ozaydin
Time Series
  • Developed a multi-horizon forecasting framework for ED boarding time.
  • Utilized real-world data and integrated external contextual factors for improved predictions.
  • Demonstrated superior performance of deep learning models in forecasting boarding times.
  • Created an MLOps web application to support practical implementation of the forecasting framework.
Read more
UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models
Van-Tuan Tran, Hong-Hanh Nguyen-Le, Marco Ruffini, Merim Dzaferagic
Federated Learning Efficient ML Optimization
  • UB-SMoE addresses expert utilization imbalance and non-differentiability issues in federated learning.
  • Dynamic Modulated Routing (DMR) and Universal Pseudo-Gradient (PG) are introduced to enhance expert viability.
  • The method significantly reduces computational costs for low-resource clients while improving their performance.
  • Experimental results show UB-SMoE outperforms existing heterogeneous LoRA-rank methods.
Read more
CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection
Junjun Pan, Yixin Liu, Yu Zheng, Lianhua Chi, Alan Wee-Chung Liew, Shirui Pan
Graph Learning
  • CAMERA addresses the issue of semantic camouflage in unsupervised TAGFD.
  • The framework utilizes a mixture-of-experts architecture to model diverse fraud-indicative cues.
  • A context-informed gating model allows for adaptive integration of cues from different experts.
  • CAMERA supports unsupervised one-class learning by focusing on benign patterns.
Read more
Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting
Liu Chong, Yingjie Zhou, Hao Li, Pengyang Wang, Qingsong Wen, Ce Zhu
Time Series
  • KUP-BI introduces a bidirectional forecasting approach that utilizes post-target continuation information.
  • The framework distills continuation-style auxiliary features from historical data to enhance forecasting models.
  • KUP-BI can be integrated into existing forecasting backbones with minimal overhead.
  • Experimental results show consistent improvements in forecasting performance across multiple datasets.
Read more
The Symmetries of Three-Layer ReLU Networks
Johanna Marie Gegenfurtner, Moritz Grillo, Guido Montúfar
Theory Optimization
  • Developed a systematic framework for analyzing symmetries in three-layer ReLU networks.
  • Characterized layerwise symmetries and described fibers using polynomial equations.
  • Identified new symmetries from layer composition that introduce additional redundancies.
  • Showed that symmetries in deep networks are not always localizable, impacting parameter redundancy.
Read more
DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data
Masahiro Suzuki, Bohui Xia, Hiroto Yamamoto, Masanori Miyahara
Time Series Generative Models Reinforcement Learning
  • DAD4TS optimizes data augmentation specifically for small-scale time-series data.
  • The framework introduces a Selector that evaluates and retains only the most informative generated samples.
  • Joint training of the TSF model, data generator, and Selector allows for dynamic adaptation to the forecasting task.
  • DAD4TS is architecture-agnostic, making it a versatile extension for various TSF models.
Read more
TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing
Pei Yang, Wanyi Chen, Tongyun Yang, Pengbin Feng, Jiarong Xing, Wentao Guo, Yuhang Yao, Yuhang Han, Hanchen Li, Xu Wang, Zeyu Wang, Jie Xiao, Anjie Yang, Liang Tian, Lynn Ai, Eric Yang, Tianyu Shi
Large Language Models NLP Efficient ML
  • TwinRouterBench provides a step-level evaluation framework for LLM routing, addressing limitations of existing benchmarks.
  • The static track offers 970 execution-verified labels for realistic routing supervision, facilitating rapid offline development.
  • The dynamic track validates routing decisions in real-time, measuring success based on official task resolution and API costs.
  • The benchmark supports a two-track development loop, allowing for both training and live execution validation.
Read more