AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

44 papers today
Updated every 8 hours
7 days of history
OpenG2G: A Simulation Platform for AI Datacenter-Grid Runtime Coordination
Jae-Won Chung, Zhirui Liang, Yanyong Mao, Jiasi Chen, Mosharaf Chowdhury, Vladimir Dvorkin
Optimization
  • OpenG2G is an open-source library for simulating AI datacenter-grid coordination.
  • The platform allows for the comparison of various control strategies and their impacts on both AI and grid performance.
  • Realistic simulations demonstrate the potential for AI datacenters to provide power flexibility to the grid.
  • The modular architecture supports easy integration of new AI workloads and grid configurations.
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
Majid Mohammadi, Grigory Reznikov, Pavel Sinitcyn, Krikamol Muandet, Siu Lun Chau
Efficient ML Interpretability Theory
  • QuadraSHAP provides a stable and scalable method for computing Shapley values in product games.
  • The method uses a Gauss-Legendre quadrature scheme to achieve high precision with fewer nodes (see the sketch below).
  • Numerical stability is enhanced through log-space evaluation, reducing overflow and underflow issues.
  • QuadraSHAP matches the performance of existing methods while significantly improving computational efficiency.
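A minimal sketch of the quadrature idea for the classic product game v(S) = ∏_{i∈S} p_i, using Owen's multilinear-extension formula with Gauss-Legendre nodes and a log-space product. The game form, node count, and integrand are assumptions for illustration, not QuadraSHAP's implementation.

```python
import numpy as np

def shapley_product_game(p, n_nodes=16):
    """Shapley values of v(S) = prod_{i in S} p_i via Owen's formula
    phi_i = (p_i - 1) * int_0^1 prod_{j != i} (1 - t + t * p_j) dt,
    integrated with Gauss-Legendre nodes, products kept in log-space."""
    p = np.asarray(p, dtype=float)
    x, w = np.polynomial.legendre.leggauss(n_nodes)   # nodes/weights on [-1, 1]
    t, w = 0.5 * (x + 1.0), 0.5 * w                   # map to [0, 1]
    log_factors = np.log(1.0 - t[:, None] + t[:, None] * p[None, :])
    total = log_factors.sum(axis=1)                   # log of the full product
    phi = np.empty(len(p))
    for i in range(len(p)):
        # remove player i's factor in log-space, exponentiate once
        integrand = (p[i] - 1.0) * np.exp(total - log_factors[:, i])
        phi[i] = np.dot(w, integrand)
    return phi
```

As a sanity check, the values satisfy efficiency by construction: they sum to v(N) - v(∅) = ∏ p_i - 1.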
FedFrozen: Two-Stage Federated Optimization via Attention Kernel Freezing
Junye Du, Zhenghao Li, Yushi Feng, Long Feng
Federated Learning Optimization Theory
  • Introduction of FedFrozen, a two-stage federated optimization framework.
  • First analysis of federated linear attention by decomposing it into query/key and value blocks.
  • The warm-up phase learns a stable attention kernel, while the frozen phase optimizes the value block (sketched below).
  • Demonstrated improvements in stability and effectiveness of Transformer models in federated learning.
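A hedged sketch of one round under that schedule, assuming FedAvg-style aggregation, a classification objective, and Hugging-Face-style parameter names ("q_proj", "k_proj"); all of these are assumptions, not the paper's code.

```python
import copy
import torch
import torch.nn.functional as F

def fedfrozen_round(global_model, client_loaders, stage, local_steps=10, lr=0.01):
    """One federated round: stage 1 (warm-up) trains all parameters; stage 2
    freezes the attention kernel (query/key blocks) and trains the rest."""
    client_states = []
    for loader in client_loaders:
        model = copy.deepcopy(global_model)
        for name, p in model.named_parameters():
            frozen = stage == 2 and ("q_proj" in name or "k_proj" in name)
            p.requires_grad_(not frozen)
        opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=lr)
        for _, (x, y) in zip(range(local_steps), loader):
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        client_states.append(model.state_dict())
    # FedAvg: uniform average of client weights (non-float buffers passed through)
    avg = {k: torch.stack([s[k] for s in client_states]).mean(0)
           if client_states[0][k].is_floating_point() else client_states[0][k]
           for k in client_states[0]}
    global_model.load_state_dict(avg)
```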
Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning
David Leeftink, Max Hinne, Marcel van Gerven
Reinforcement Learning Optimization Robotics
  • Establishes a formal link between recurrent policies and the Pontryagin minimum principle (PMP).
  • Introduces Neural Co-state Policies (NCP) to structure hidden states in RNNs.
  • Proposes a co-state loss to align training with optimal control dynamics (the underlying Pontryagin conditions are recalled below).
  • Demonstrates improved performance and robustness in partially observable tasks.
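For reference, the Pontryagin conditions the co-state loss targets are standard; with dynamics ẋ = f(x, u), running cost c(x, u), and Hamiltonian H, they read:

```latex
H(x, u, \lambda) = c(x, u) + \lambda^{\top} f(x, u), \qquad
\dot{x} = \frac{\partial H}{\partial \lambda}, \qquad
\dot{\lambda} = -\frac{\partial H}{\partial x}, \qquad
u^{*}(t) = \arg\min_{u} H\bigl(x(t), u, \lambda(t)\bigr).
```

The co-state loss presumably penalizes hidden-state trajectories that violate the λ-dynamics above; its exact form is the paper's contribution and is not reproduced here.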
Hybrid Quantum-Classical GANs for the Generation of Adversarial Network Flows
Prateek Paudel, Nitin Jha, Abhishek Parakh, Mahadevan Subramaniam
Generative Models
  • Introduction of a hybrid quantum-classical GAN framework for generating adversarial network traffic.
  • Utilization of variational quantum circuits to enhance the expressiveness of latent representations.
  • Evaluation of generated traffic against classical IDS models to test evasion capabilities.
  • Highlighting the implications of quantum computing in improving attack flow generation.
Two-Stage Learned Decomposition for Scalable Routing on Multigraphs
Filip Rydin, Morteza Haghir Chehreghani, Balázs Kulcsár
Optimization Reinforcement Learning Graph Learning
  • Introduces Node-Edge Policy Factorization (NEPF) for scalable routing on multigraphs.
  • Utilizes a pre-encoding edge aggregation scheme to reduce memory and computational costs.
  • Employs a non-autoregressive architecture for efficient edge selection.
  • Demonstrates superior performance in solution quality and speed compared to existing methods.
On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning
Pratik Deshmukh, Atirek Gupta
Theory Graph Learning Large Language Models
  • Identification of catastrophic model collapse in causal-reasoning fine-tuning, observed in 100% of runs.
  • Introduction of a semantic loss function with graph-based constraints to prevent model collapse.
  • Achieved significant performance improvements on causal reasoning tasks compared to collapsed baselines.
  • Comprehensive evaluation across 200,000+ samples validating the necessity of semantic loss for stable predictions.
Attribution-Guided Continual Learning for Large Language Models
Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong
Large Language Models NLP
  • Introduces an attribution-guided framework for continual learning in LLMs.
  • Estimates task-specific parameter importance to modulate gradient updates (see the sketch below).
  • Addresses the limitations of existing methods that lack semantic awareness.
  • Demonstrates superior performance in retaining knowledge from previous tasks.
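The modulation step itself is easy to sketch once importance scores exist; the scaling rule below is a common pattern assumed for illustration, and the paper's attribution-based importance estimator is not shown.

```python
import torch

@torch.no_grad()
def modulated_step(model, importance, lr=1e-3):
    """Shrink updates on parameters deemed important to earlier tasks.
    `importance` maps parameter names to non-negative tensors of matching shape."""
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        scale = 1.0 / (1.0 + importance.get(name, torch.zeros_like(p)))
        p.add_(-lr * scale * p.grad)
```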
Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs
Nicholas Potteiger, Ankita Samaddar, Taylor T. Johnson, Xenofon Koutsoukos
Reinforcement Learning Robotics Large Language Models
  • Introduction of the Masking Reward Behavior Tree (MRBT) for compositional tasks.
  • Automated design of reactive and modular rewards and action masks using LLMs.
  • Successful generation and refinement of MRBTs leading to improved training efficiency.
  • Demonstrated advantages of MRBTs in terms of transferability and modularity.
Normalized Architectures are Natively 4-Bit
Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry, Boris Ginsburg
Large Language Models Efficient ML Optimization
  • The nGPT architecture is natively robust to 4-bit quantization, eliminating the need for interventions that add overhead.
  • Robustness arises from effective signal accumulation rather than noise suppression, enhancing SNR per layer.
  • Training dynamics under the hypersphere constraint promote distributed alignments across dimensions, ensuring signal coherence.
  • Empirical validation shows nGPT maintains stability and lower relative error compared to standard transformers across diverse architectures.
A Robust Foundation Model for Conservation Laws: Injecting Context into Flux Neural Operators via Recurrent Vision Transformers
Taeyoung Kim, Joon-Hyuk Ko
Theory Efficient ML Time Series
  • Introduction of a context-conditioned Flux Neural Operator using recurrent Vision Transformers.
  • Formulation of an in-context flux-learning problem for parametric conservation laws.
  • Demonstration of improved autoregressive stability and out-of-distribution robustness.
  • Ability to infer latent numerical flux operators from short observed trajectories.
COPYCOP: Ownership Verification for Graph Neural Networks
Rahul Nandakumar, Deepayan Chakrabarti
Graph Learning
  • COPYCOP is the first fingerprinting method for GNNs that is robust against a wide range of adversarial transformations.
  • The algorithm uses stationary points of the embedding function as fingerprints, which are invariant to transformations.
  • COPYCOP is architecture-agnostic, allowing it to detect surrogates regardless of differences in model architecture or parameters.
  • Extensive experiments validate the effectiveness of COPYCOP across multiple datasets and GNN architectures.
Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning
William T. Redman, Erik C. Johnson, Brian Robinson
NLP Large Language Models Theory
  • BERT learns shortcut solutions that impair generalization and forward transfer in continual learning.
  • ALBERT exhibits a more effective algorithmic solution, leading to better performance in continual learning tasks.
  • Both models fail in tasks requiring compositional reasoning across experiences, but ALBERT can be improved with specific training strategies.
  • Architectural choices significantly influence the performance of Transformer models in continual learning settings.
Crafting Reversible SFT Behaviors in Large Language Models
Yuping Lin, Pengfei He, Yue Xing, Yingqian Cui, Jiayuan Ding, Subhabrata Mukherjee, Hui Liu, Zhen Xiang
NLP Large Language Models Optimization
  • Introduces the concept of sparse behavioral carriers for SFT-induced behaviors in LLMs.
  • Presents Loss-Constrained Dual Descent (LCDD) for constructing these carriers.
  • Demonstrates the effectiveness of SFT-Eraser for behavior reversal without modifying model weights.
  • Provides evidence that the sparse structure is crucial for causal necessity of behaviors.
CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification
Iason Ofeidis, Nikos Papadis, Randeep Bhatia, Leandros Tassiulas, TV Lakshman
Federated Learning
  • CLAD combines Clustered Federated Learning with a Dual-Mode Micro-Architecture for enhanced anomaly detection and attack classification.
  • The framework effectively utilizes both labeled and unlabeled data, maximizing the learning potential from diverse IoT devices.
  • Dynamic clustering of devices improves model accuracy by preserving distinct operational patterns.
  • CLAD demonstrates significant performance improvements over state-of-the-art methods, particularly in environments with high proportions of unlabeled data.
Training Transformers for KV Cache Compressibility
Yoav Gelberg, Yam Eitan, Michael Bronstein, Yarin Gal, Haggai Maron
NLP Large Language Models Efficient ML
  • Introduces the concept of KV compressibility as a property of learned representations.
  • Proposes KV-Compression Aware Training (KV-CAT) to guide transformers towards compressible representations during training (a toy penalty in this spirit is sketched below).
  • Demonstrates that KV-CAT improves the effectiveness of post-hoc KV cache compression methods.
  • Empirical evaluations show enhanced performance across various long-context tasks.
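One way to make compression awareness concrete is an auxiliary penalty on how badly keys and values are approximated at low rank; this toy construction is my reading of the summary, not KV-CAT's actual objective.

```python
import torch

def kv_compressibility_penalty(K, V, rank=8):
    """Penalize the residual of the best rank-`rank` approximation of the
    key/value matrices, nudging representations to be compressible."""
    def lowrank_residual(X):
        U, S, Vh = torch.linalg.svd(X, full_matrices=False)
        approx = (U[..., :rank] * S[..., :rank].unsqueeze(-2)) @ Vh[..., :rank, :]
        return (X - approx).pow(2).mean()
    return lowrank_residual(K) + lowrank_residual(V)
```

Added to the task loss with a small weight, a term like this would reward heads whose KV caches survive post-hoc low-rank compression.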
Do Neural Operators Forget Geometry? The Forgetting Hypothesis in Deep Operator Learning
Yanming Xia, Angelica I. Aviles-Rivero
Theory
  • Introduction of the Geometric Forgetting Hypothesis, highlighting the loss of geometric information in deep neural operators.
  • Demonstration of systematic geometric information decay through layer-wise geometric probing in spectral and attention-based operators.
  • Identification of a structural limitation in transformer-based models termed the Geometric Shortcut, which leads to feature collapse when geometry is injected too late.
  • Proposal of a Geometry Memory Injection mechanism that restores geometric information flow with minimal architectural changes.
SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation
Dmitri Goloubentsev, Natalija Karpichina
Reinforcement Learning Optimization Theory
  • SNAPO integrates a neural policy into a differentiable simulator for optimal control.
  • It computes exact gradients and sensitivities in a single adjoint pass, significantly improving efficiency.
  • Demonstrated effectiveness in three diverse domains, with rapid training and large speedups in sensitivity computation.
Perceive, Route and Modulate: Dynamic Pattern Recalibration for Time Series Forecasting
Siru Zhong, Zhao Meng, Haohuan Fu, Haoyang Li, Qingsong Wen, Yuxuan Liang
Time Series
  • Introduction of Dynamic Pattern Recalibration (DPR) to address static pattern responses in time series forecasting.
  • DPR is a backbone-agnostic mechanism that enhances various forecasting architectures with minimal parameter overhead.
  • The 'Perceive-Route-Modulate' pipeline allows for continuous token-level recalibration of features.
  • DPRNet, a minimalist model based on DPR, achieves competitive performance across multiple benchmarks.
Uncertainty Estimation via Hyperspherical Confidence Mapping
Eunseo Choi, Ho-Yeon Kim, Jaewon Lee, Taeyong Jo, Myungjun Lee, Heejin Ahn
Theory Efficient ML Interpretability
  • HCM provides a novel, geometric interpretation of uncertainty in neural network predictions.
  • The method is sampling-free and distribution-free, making it efficient for real-time applications.
  • HCM shows superior performance in calibration and confidence-error alignment compared to existing methods.
  • The framework is applicable to both regression and classification tasks, enhancing its versatility.
INEUS: Iterative Neural Solver for High-Dimensional PIDEs
Jean-Loup Dupret, Davide Gallon, Patrick Cheridito
Theory Efficient ML
  • INEUS effectively addresses the curse of dimensionality in high-dimensional PIDEs.
  • The method reformulates PIDE solving into recursive regression problems, enhancing efficiency.
  • INEUS combines the strengths of PINNs and Feynman-Kac methods for better handling of nonlocal terms.
  • A contraction-based convergence proof is established for linear PIDEs.
Structure-Preserving Gaussian Processes Via Discrete Euler-Lagrange Equations
Jan-Hendrik Ewering, Kathrin Flaßkamp, Niklas Wahlström, Thomas B. Schön, Thomas Seel
Robotics Efficient ML Theory
  • Introduction of Lagrangian Gaussian Processes (LGPs) for learning dynamics models.
  • Preservation of geometric structure of the Lagrange-d’Alembert principle for energy consistency.
  • Ability to learn from discrete position data without requiring velocity or momentum measurements.
  • Demonstrated data efficiency and generalization in synthetic and real-world applications.
When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy
April Chan, Davide D'Ascenzo, Sebastiano Cultrera di Montesano
Computer Vision
  • HACE is a drop-in replacement for standard cross-entropy that incorporates class hierarchy into the loss function.
  • The method combines prediction aggregation and ancestral label smoothing to effectively utilize hierarchical information (see the sketch below).
  • HACE outperforms standard cross-entropy in 15 out of 18 architecture-dataset pairs, with an average accuracy gain of 4.66%.
  • In linear probing, HACE achieves a mean improvement of 2.18% over competing methods across all datasets.
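A minimal sketch of a hierarchy-aware cross-entropy in this spirit, assuming a two-level hierarchy given as a leaf-to-parent index map; the smoothing scheme and weights are assumptions, and HACE's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def hace_like_loss(logits, target, parent_of, n_parents, alpha=0.1, lam=0.5):
    """logits: (B, n_leaves); target: (B,) leaf labels; parent_of: (n_leaves,) long."""
    probs = logits.softmax(dim=-1)
    # (1) prediction aggregation: sum leaf probabilities into parent classes
    parent_probs = torch.zeros(logits.size(0), n_parents, device=logits.device)
    parent_probs.scatter_add_(1, parent_of.expand_as(probs), probs)
    coarse = F.nll_loss(parent_probs.clamp_min(1e-12).log(), parent_of[target])
    # (2) ancestral label smoothing: spread alpha mass over the target's siblings
    siblings = (parent_of[None, :] == parent_of[target][:, None]).float()
    smooth = (1 - alpha) * F.one_hot(target, logits.size(-1)).float() \
             + alpha * siblings / siblings.sum(dim=1, keepdim=True)
    fine = -(smooth * logits.log_softmax(dim=-1)).sum(dim=1).mean()
    return fine + lam * coarse
```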
Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization
Abhijit Das, Sayantan Dutta
Optimization Theory Large Language Models
  • Weight decay is proven to be essential for satisfying Villani's differential growth conditions in Transformer models.
  • The paper introduces empirical diagnostics to visualize the relationship between weight decay and curvature at infinity.
  • Explicit convergence rates for Langevin-based optimizers are derived, linking them to weight decay practices.
  • A reproducible experimental suite is provided for evaluating functional-analytic properties in large Transformer models.
Towards Metric-Faithful Neural Graph Matching
Jyotirmaya Shivottam, Subhankar Mishra
Graph Learning Theory Optimization
  • Introduces a theoretical framework linking encoder geometry to GED estimation quality.
  • Demonstrates that bi-Lipschitz encoders yield improved GED surrogates and ranking stability.
  • Establishes that node-level bi-Lipschitz geometry affects downstream alignment objectives.
  • Empirically validates the framework using FSW-GNN in various neural GED architectures.
Verifier-Backed Hard Problem Generation for Mathematical Reasoning
Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao
Large Language Models Generative Models Theory
  • Introduction of a three-party self-play framework for problem generation that includes a verifier.
  • Demonstrated significant performance improvements in generating valid and challenging mathematical problems.
  • Evaluation of both Hard and Soft verifiers to validate problem correctness and difficulty.
  • VHG outperforms existing baseline methods, including state-of-the-art models, in various mathematical benchmarks.
PACE: Prune-And-Compress Ensemble Models
Fabian Akkerman, Julien Ferry, Théo Guyard, Thibaut Vidal
Efficient ML Interpretability Optimization
  • PACE combines pruning and compression techniques to enhance ensemble models.
  • The framework allows for active generation of new learners to improve diversity.
  • Pruning is performed on an enriched ensemble, allowing for better performance.
  • The method provides rigorous control over faithfulness to the original ensemble.
Distributionally-Robust Learning to Optimize
Vinit Ranjan, Jisun Park, Bartolomeo Stellato
Optimization Theory
  • Introduction of the DR-L2O framework that combines worst-case analysis with data-driven optimization.
  • Establishment of a continuous trade-off between classical L2O and worst-case optimal design via a Wasserstein radius.
  • Development of a scalable solution method using stochastic gradient descent with implicit differentiation.
  • Proof of out-of-sample guarantees for the learned algorithms, ensuring robustness and performance.
When Does ℓ2-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the ℓ1 Implicit Bias
Ye Su, Jian Li, Yong Liu
Theory Optimization
  • Benign overfitting in ℓ2-Boosting is characterized by a logarithmic decay of excess variance under isotropic noise.
  • The risk under spiked-isotropic designs converges to zero at a slower logarithmic rate compared to ℓ2 geometries.
  • A deterministic early stopping rule is proposed to prevent noise interpolation and achieve optimal prediction rates.
CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari
Large Language Models NLP Efficient ML
  • CRAFT is the first framework to perform routing, adaptation, and merging entirely in representation space without modifying model weights.
  • It employs deterministic routing based on KL divergence, eliminating the need for learned gating mechanisms (sketched below).
  • The framework effectively controls forgetting during adaptation by measuring divergence from existing knowledge.
  • Empirical results show significant improvements in performance and reduced forgetting rates compared to state-of-the-art methods.
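A sketch of deterministic KL routing under one plausible reading: each stored task keeps a prototype distribution over representation dimensions, and inputs go to the closest task. Prototype construction and the softmax normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def route_by_kl(hidden, task_prototypes):
    """hidden: (B, d) representations; task_prototypes: list of (d,) prob vectors.
    Returns the index of the nearest stored task (no learned gating involved)."""
    q = F.log_softmax(hidden.mean(dim=0), dim=-1)     # pooled batch rep as log-probs
    kls = torch.stack([F.kl_div(q, proto, reduction="sum")
                       for proto in task_prototypes])
    return int(torch.argmin(kls))
```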
Soft Deterministic Policy Gradient with Gaussian Smoothing
Hyunjun Na, Donghwan Lee
Reinforcement Learning Robotics Theory
  • Introduction of Soft-DPG to overcome limitations of standard DPG in non-smooth environments.
  • Development of a smoothed Bellman equation that ensures well-defined policy gradients (a Monte-Carlo sketch appears below).
  • Establishment of analytical upper bounds for approximation errors related to the smoothing parameter.
  • Implementation of Soft DDPG, a practical deep reinforcement learning algorithm.
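A Monte-Carlo sketch of the Gaussian-smoothed critic, approximating Q_σ(s, a) = E_{ε ~ N(0, σ²I)}[Q(s, a + ε)]; the paper's analytic smoothed Bellman operator and the Soft DDPG details are not reproduced.

```python
import torch

def smoothed_q(q_net, state, action, sigma=0.1, n_samples=8):
    """Average the critic over Gaussian action perturbations; gradients flow
    through `action`, keeping the policy gradient well defined even when the
    underlying Q-landscape is non-smooth."""
    noise = sigma * torch.randn(n_samples, *action.shape, device=action.device)
    qs = torch.stack([q_net(state, action + eps) for eps in noise])
    return qs.mean(dim=0)
```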
Directional Consistency as a Complementary Optimization Signal: The GONO Framework
Victor Daniel Gera
Optimization
  • Directional consistency and loss convergence can be decoupled, revealing limitations in existing optimizers.
  • GONO adapts the momentum coefficient based on directional alignment, improving optimization performance (sketched below).
  • The framework achieves perfect oscillation detection and competitive results on standard datasets.
  • The study introduces a theoretically grounded approach to enhance optimizer design by considering directional signals.
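A minimal sketch of alignment-adapted momentum in the spirit of the summary; the clamping, scaling, and hyperparameters are assumptions rather than the actual GONO update.

```python
import torch

class AlignedMomentum(torch.optim.Optimizer):
    """SGD whose momentum coefficient is scaled by the cosine alignment between
    the current gradient and the momentum buffer: oscillation (negative
    alignment) collapses the momentum instead of letting it overshoot."""
    def __init__(self, params, lr=1e-3, beta_max=0.9):
        super().__init__(params, dict(lr=lr, beta_max=beta_max))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                buf = self.state[p].setdefault("momentum", torch.zeros_like(p))
                align = torch.cosine_similarity(p.grad.flatten(), buf.flatten(), dim=0)
                beta = group["beta_max"] * align.clamp(min=0.0)   # in [0, beta_max]
                buf.mul_(beta).add_(p.grad)
                p.add_(buf, alpha=-group["lr"])
```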
Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs
Zixuan Chen, Hao Lin, Zizhe Chen, Yizhou Tian, Garry Yang, Depeng Wang, Ya Guo, Huijia Zhu, James Cheng
NLP Large Language Models
  • Correction suppression is a prevalent issue in LLMs, with suppression rates between 19% and 90%.
  • Models exhibit a 'knowing but not correcting' behavior, recognizing errors internally but failing to correct them due to task context.
  • Two effective training-free interventions, CDS and DPA, significantly enhance factual correction rates.
  • The study introduces 'factual strictness' as a new dimension of model reliability.
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Yun Qu, Qi Wang, Yixiu Mao, Heming Zou, Yuhang Jiang, Yingyue Li, Wutong Xu, Lizhou Cai, Weijie Liu, Clive Bai, Kai Yang, Yangkun Chen, Saiyong Yang, Xiangyang Ji
Reinforcement Learning Large Language Models Optimization
  • LPO provides a unified geometric perspective on group-based RL algorithms, revealing implicit target-projections.
  • The framework allows for explicit target-projection, leading to improved optimization stability and response diversity.
  • LPO demonstrates higher accuracy in training performance compared to traditional policy gradient baselines.
  • The decoupled structure of LPO supports flexible divergence selection, enhancing its applicability.
Attributions All the Way Down? The Metagame of Interpretability
Hubert Baniecki, Przemyslaw Biecek, Fabian Fumagalli
Interpretability NLP Multimodal
  • Introduction of the METAGAME framework for quantifying second-order interactions in model explanations.
  • Development of meta-attributions that generalize first-order attribution methods to capture directional influences.
  • Theoretical proof of hierarchical decomposition of attributions into directional interactions.
  • Empirical demonstration of METAGAME's effectiveness across a range of machine learning interpretability applications.
Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
Brett Barkley, Preston Culbertson, David Fridovich-Keil
Computer Vision NLP Efficient ML
  • Frozen pretrained representations contain sufficient geometric structure for effective label-free OOD detection.
  • Two complementary detection methods (global Mahalanobis and local ReSCOPED) were evaluated across diverse tasks; the Mahalanobis variant is sketched below.
  • Performance of both detection methods improves with better representation quality, reducing the importance of detector choice.
  • The study provides empirical evidence supporting the use of frozen models for OOD detection without fine-tuning.
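The global Mahalanobis detector is standard, label-free, and easy to sketch: fit a mean and covariance on frozen in-distribution features, then score test points by distance. ReSCOPED, the local method, is the paper's own and is not reproduced.

```python
import numpy as np

def fit_mahalanobis(feats, eps=1e-6):
    """feats: (N, d) frozen features from in-distribution data (no labels used)."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + eps * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def ood_score(x, mu, prec):
    """x: (B, d). Larger score = farther from the ID feature cloud = more OOD."""
    d = x - mu
    return np.einsum("bi,ij,bj->b", d, prec, d)
```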
Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion
Eugenio Lomurno, Filippo Balzarini, Francesco Benelle, Francesca Pia Panaccione, Matteo Matteucci
Generative Models
  • TARDIS framework refines synthetic tabular data at inference time using a pre-trained backbone.
  • Introduces Bidirectional Chamfer Refinement (BCR) to minimize the distance between synthetic and real samples (sketched below).
  • Achieves a median +8.6% improvement in downstream task performance over real data models across 15 datasets.
  • Demonstrates that inference-time refinement can effectively close the synthetic-real performance gap.
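A sketch of what bidirectional Chamfer refinement could look like when rows are treated as points and optimized directly; this simplification is mine, and the actual BCR works through the pre-trained diffusion backbone.

```python
import torch

def bidirectional_chamfer(synth, real):
    """Mean nearest-neighbor distance in both directions between row sets."""
    d = torch.cdist(synth, real)                      # (n_synth, n_real) distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def refine(synth, real, steps=100, lr=1e-2):
    """Inference-time refinement: nudge synthetic rows toward the real data."""
    synth = synth.clone().requires_grad_(True)
    opt = torch.optim.Adam([synth], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        bidirectional_chamfer(synth, real).backward()
        opt.step()
    return synth.detach()
```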
MinMax Recurrent Neural Cascades
Alessandro Ronca
Theory Efficient ML Time Series
  • MinMax RNCs can express all regular languages and have favorable theoretical properties.
  • They can be evaluated in parallel, with runtime logarithmic in the input length.
  • MinMax RNCs maintain bounded states and gradients, avoiding issues of vanishing or exploding gradients.
  • Empirical results show superior performance on synthetic tasks compared to state-of-the-art models.
Online Bayesian Calibration under Gradual and Abrupt System Changes
Yang Xu, Chiwoo Park
Theory Optimization Time Series
  • Introduces Bayesian Recursive Projected Calibration (BRPC) for online Bayesian calibration.
  • Separates parameter updates from discrepancy modeling to improve identifiability.
  • Integrates a restart mechanism to handle abrupt regime shifts effectively.
  • Demonstrates improved calibration accuracy and robustness through empirical evaluations.
Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models
Sicong Chang, Yidan Shen, Justina Varghese, Akshay R Prabhakar, Sebastian Guadarrama-Sistos-Vazquez, Jiefu Chen, Masayoshi Takashima, Omar G. Ahmed, Renjie Hu, Xin Fu
Interpretability
  • Utilized nationwide EHR data to enhance CRS diagnosis prediction.
  • Developed a hybrid feature-selection method to condense clinical codes.
  • Implemented demographic-stratified models to capture variations in disease presentation.
  • Achieved an AUC of 0.8461, demonstrating improved predictive performance.
Sparse Prefix Caching for Hybrid and Recurrent LLM Serving
Mikhail Shirokikh, Sergey Nikolenko
Large Language Models Optimization Efficient ML
  • Introduction of sparse prefix caching for optimizing LLM serving.
  • Formalization of the caching problem as a one-sided weighted k-median problem.
  • Demonstration of improved performance over existing heuristics on real-world datasets.
  • The method allows for exact output preservation without altering recurrent computations.
SPADE: Faster Drug Discovery by Learning from Sparse Data
Rahul Nandakumar, Ben Fauber, Deepayan Chakrabarti
Optimization Efficient ML
  • SPADE addresses the inefficiencies in drug discovery for novel proteins by focusing on sparse data.
  • The algorithm identifies 10 high-quality ligands with an average of only 40 tests.
  • SPADE outperforms traditional methods, achieving significant improvements in sample efficiency.
  • A new dataset of 1.5 million entries was created to support the evaluation of the proposed method.
In-Context Black-Box Optimization with Unreliable Feedback
Nicolas Samuel Blumer, Julien Martinelli, Samuel Kaski
Optimization
  • FICBO integrates auxiliary feedback into the optimization process, improving query selection.
  • The framework uses a structured feedback prior to model the reliability of feedback sources.
  • Empirical evaluations show FICBO's superiority over classical and amortized optimization baselines.
  • The model enhances interpretability by providing insights into how it assesses feedback reliability.
When Graph Language Models Go Beyond Memorization
Masatsugu Yamada, Mahito Sugiyama
Graph Learning Generative Models Large Language Models
  • Introduces a calibrated diagnostic protocol to evaluate GLMs, overcoming limitations of aggregate fidelity metrics.
  • Demonstrates scale-dependent structural learning, with models transitioning from memorization to structural alignment as dataset size increases.
  • Identifies a persistent deficit in learning rare graph patterns, indicating a critical gap in current autoregressive graph generation methods.
  • Empirical evidence supports that GLMs can implicitly learn structural regularities without explicit pattern enumeration.