AI-generated summaries

Today's ML research, without the noise.

Summaries of the latest machine learning papers from arXiv, refreshed every 8 hours.

24 papers today · Updated every 8 hours · 7 days of history
Retrieval Mechanisms Surpass Long-Context Scaling in Time Series Forecasting
Rishi Ahuja, Kumar Prateek, Simranjit Singh, Vijay Kumar
Time Series
  • Long-context scaling in time series foundation models (TSFMs) increases forecasting error because stochastic noise accumulates with context length.
  • RAFT outperforms traditional long-context models by using selective retrieval of relevant historical data.
  • The study reveals an inverse scaling law in time series forecasting, contradicting the scaling hypothesis from NLP.
  • Dynamic exogenous variables from retrieved segments enhance model performance without the noise penalties of longer contexts.
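A minimal sketch of the selective-retrieval idea, assuming cosine similarity over fixed-length windows (the paper's retriever and forecasting backbone may differ):

```python
import numpy as np

def retrieve_context(history, window=64, k=3):
    """Build model input from the k historical windows most similar
    to the current query window, instead of one ever-longer context."""
    query = history[-window:]
    starts = range(0, len(history) - 2 * window, window)
    candidates = [history[s:s + window] for s in starts]
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    scores = np.array([cos(query, c) for c in candidates])
    top = np.argsort(scores)[-k:]
    # Retrieved segments are prepended to the query as extra context.
    return np.concatenate([candidates[i] for i in top] + [query])

series = np.sin(np.linspace(0, 60, 2000)) + 0.1 * np.random.randn(2000)
print(retrieve_context(series).shape)  # (256,) = 3 retrieved windows + query
```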
Read more
On Uniform Error Bounds for Kernel Regression under Non-Gaussian Noise
Johannes Teutsch, Oleksii Molodchyk, Marion Leibold, Timm Faulwasser, Armin Lederer
Theory Robotics Reinforcement Learning
  • Introduction of non-asymptotic probabilistic uniform error bounds for kernel regression.
  • Extension of error bounds to a wide range of non-Gaussian noise distributions.
  • Separation of uncertainty into exploration and noise components for improved accuracy.
  • Demonstration of the tightness of proposed bounds through numerical examples.
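The general shape of such bounds is |f(x) − μ(x)| ≤ β·σ(x) uniformly over the domain. A toy illustration with kernel regression under Laplace (non-Gaussian) noise; the scaling β below is a hypothetical constant, whereas the paper derives it from the noise class:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 30)
y = np.sin(X) + rng.laplace(scale=0.1, size=30)      # non-Gaussian noise

K = rbf(X, X) + 0.01 * np.eye(30)
alpha = np.linalg.solve(K, y)
grid = np.linspace(-3, 3, 200)
Ks = rbf(grid, X)                                    # (200, 30)
mu = Ks @ alpha
var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
sigma = np.sqrt(np.maximum(var, 0))

beta = 2.0   # hypothetical; the paper ties beta to the noise distribution
print(f"uniform bound half-width, max over grid: {(beta * sigma).max():.3f}")
```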
Read more
Higher-Order Equilibrium Tracking for EM-Compressible Online Estimation
ZhiMing Li, Yue Song
Theory Optimization Efficient ML
  • Introduces an empirical-equilibrium formulation for online estimation, separating statistical fluctuation from tracking lag.
  • Develops higher-order equilibrium-jet tracking methods that improve convergence rates.
  • Defines EM-compressibility and EM-jet-compressibility as conditions for effective online estimation.
  • Establishes a batch-to-online transfer theorem linking online performance to batch properties.
Read more
Task-Aware Calibration: Provably Optimal Decoding in LLMs
Tim Tomov, Dominik Fuchsgruber, Rajeev Verma, Stephan Günnemann
NLP Large Language Models Theory
  • Introduces task calibration to improve LLM output quality by aligning predictive distributions with task-specific latent structures.
  • Demonstrates that MBR decoding on task-calibrated distributions is the optimal strategy for minimizing expected loss.
  • Presents Task Calibration Error (TCE) as a new metric for assessing miscalibration in LLMs, showing it is a strong predictor of performance improvements.
  • Empirical evaluations confirm that task calibration consistently enhances generation quality across diverse tasks.
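MBR decoding itself is easy to state: among sampled candidates, return the one with the highest expected utility under the (calibrated) predictive distribution. A toy sketch with unigram F1 as a stand-in utility; the paper's calibration step and task utilities are more elaborate:

```python
from collections import Counter

def utility(hyp, ref):
    """Toy utility: unigram F1 overlap (stand-in for BLEU, ROUGE, etc.)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_decode(candidates, weights):
    """Pick argmax_h E_{r ~ p}[utility(h, r)], where p is given by the
    (task-calibrated) sample weights."""
    scores = [sum(w * utility(h, r) for r, w in zip(candidates, weights))
              for h in candidates]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

samples = ["the cat sat", "a cat sat down", "dogs bark loudly"]
print(mbr_decode(samples, weights=[0.4, 0.4, 0.2]))  # consensus-like output
```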
Read more
RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction
Manuel Heurich, Maximilian Granz, Tim Landgraf
Time Series
  • RareCP enhances conformal prediction efficiency by addressing temporal drift and error regime structures.
  • The method utilizes a mixture of cosine-attention experts to capture distinct error regimes.
  • RareCP retrieves relevant calibration examples to form adaptive prediction intervals.
  • It achieves competitive performance against state-of-the-art quantile forecasters while being backbone-agnostic.
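A stripped-down version of the retrieve-then-calibrate step, using plain cosine retrieval in place of the paper's mixture of cosine-attention experts:

```python
import numpy as np

def retrieved_interval(x, calib_feats, calib_resid, k=50, alpha=0.1):
    """Half-width of a prediction interval from the conformal quantile
    of residuals retrieved at the k most similar calibration points."""
    sims = calib_feats @ x / (
        np.linalg.norm(calib_feats, axis=1) * np.linalg.norm(x) + 1e-9)
    nearest = np.argsort(sims)[-k:]
    r = np.abs(calib_resid[nearest])
    q = np.quantile(r, min(1.0, np.ceil((k + 1) * (1 - alpha)) / k))
    return q  # interval is [y_hat - q, y_hat + q]

rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 8))
resid = rng.normal(scale=1 + (feats[:, 0] > 0), size=500)  # two error regimes
print(retrieved_interval(rng.normal(size=8), feats, resid))
```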
Read more
Beyond Static Bias: Adaptive Multi-Fidelity Bandits with Improving Proxies
Muyun Lu, Haoyang Hong, Huazheng Wang, Ying Lin
Theory Optimization
  • Introduces a dynamic model for multi-fidelity multi-armed bandits (MF-MAB) in which low-fidelity sources improve with use.
  • Develops the TACC algorithm that balances low-fidelity sampling and high-fidelity escalation.
  • Establishes an instance-dependent regret bound that highlights the benefits of adaptive continuation.
  • Demonstrates the algorithm's effectiveness through synthetic simulations and LLM-based evaluations.
Read more
Machine Learning-Based Graph Simplification for Symbolic Accelerators
Tiffany Yu, Rye Stahle-Smith, Darssan Eswaramoorthi, Rasha Karakchi
Optimization Graph Learning Efficient ML
  • AutoSlim effectively reduces the size and complexity of automata graphs while preserving their semantics.
  • The framework utilizes a Random Forest classifier to identify redundant nodes and edges based on extracted features.
  • Implementation of AutoSlim on NAPOLY+ resulted in up to 40% reduction in FPGA resource usage.
  • The approach includes a verification step to ensure functional equivalence after graph pruning.
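The classify-prune-verify loop can be sketched in a few lines; the node features and redundancy labels below are hypothetical stand-ins for the paper's feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-node features: (in-degree, out-degree, fan-out ratio,
# on-critical-path flag). The paper's extracted features will differ.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 4))
y_train = (X_train[:, 0] < 0.2) & (X_train[:, 3] < 0.5)  # toy "redundant" label

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def prune(nodes, features, verify):
    """Drop nodes the classifier flags as redundant, then run a
    verification pass to confirm functional equivalence."""
    keep = [n for n, f in zip(nodes, features)
            if not clf.predict(f.reshape(1, -1))[0]]
    assert verify(keep), "pruned graph is not equivalent; revert"
    return keep

nodes, feats = list(range(50)), rng.random((50, 4))
print(len(prune(nodes, feats, verify=lambda g: True)))  # placeholder verifier
```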
Read more
Predicting Plasticity in Deep Continual Learning: A Theoretical Perspective
Jiuqi Wang, Jayanth Srinivasa, Claire Chen, Shuze Daniel Liu, Ali Payani, Shangtong Zhang
Theory Optimization
  • Existing diagnostics for plasticity in neural networks can be misleading and fail to predict trainability.
  • The authors propose a new metric, Optimization Readiness (OR), which effectively predicts a model's ability to adapt to new tasks.
  • Theoretical guarantees are provided for OR, establishing its predictive power in optimization contexts.
  • Empirical results demonstrate OR's superiority over traditional metrics in ranking model checkpoints by trainability.
Read more
When Adaptation Fails: A Gradient-Based Diagnosis of Collapsed Gating in Vision-Language Prompt Learning
Yunxuan Fang, Ziwei Zhang, Xinhe Wang
Multimodal Efficient ML Optimization
  • Adaptive prompting mechanisms in vision-language models (VLMs) often collapse, leading to ineffective adaptation.
  • Two main failure modes identified: gradient magnitude imbalance and gate degradation.
  • The study uses AdaptiveBiMaPLe as a controlled framework to analyze optimization dynamics.
  • Findings suggest that additional architectural complexity may not yield meaningful benefits.
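Both failure modes admit simple diagnostics. A sketch of the two signals on a generic gated module (AdaptiveBiMaPLe internals are not reproduced here):

```python
import torch

def gate_diagnostics(gate_logits, branch_grads):
    """Two collapse signals: (1) mean entropy of the gate distribution
    (near zero means the gate has degenerated onto one branch) and
    (2) the ratio of per-branch gradient norms (large means one
    branch's gradients dominate the updates)."""
    p = torch.softmax(gate_logits, dim=-1)
    entropy = -(p * p.clamp_min(1e-9).log()).sum(-1).mean().item()
    norms = [g.norm().item() for g in branch_grads]
    imbalance = max(norms) / (min(norms) + 1e-9)
    return entropy, imbalance

# Toy example: a near-collapsed gate and imbalanced branch gradients.
logits = torch.tensor([[8.0, -8.0], [7.5, -7.0]])
grads = [torch.randn(100) * 10, torch.randn(100) * 0.01]
print(gate_diagnostics(logits, grads))  # low entropy, large imbalance
```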
Read more
PRISM: Fast Online LLM Serving via Scheduling-Memory Co-design
Xingyu Qu, Tianhao Lin, Yiqi Li, Zhiyu Chen, Sheng Wang
Large Language Models Optimization Efficient ML
  • PRISM co-designs scheduling and KV-cache management to optimize online LLM serving.
  • The architecture leverages reusable segments in prompts to enhance cache hit rates.
  • Experimental results indicate significant reductions in time-to-first-token (TTFT) compared to baseline methods.
  • The approach addresses the inefficiencies of independent scheduling and KV-cache management.
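A toy illustration of the prefix-reuse idea behind the cache-hit gains; PRISM's actual scheduler and KV-cache manager are far more involved:

```python
class PrefixKVCache:
    """Prompts sharing a prefix reuse its cached 'KV state' instead of
    recomputing it, which lifts hit rates and cuts time-to-first-token."""
    def __init__(self):
        self.store = {}   # prefix tokens (tuple) -> cached state
        self.hits = 0

    def longest_prefix(self, tokens):
        for n in range(len(tokens), 0, -1):
            key = tuple(tokens[:n])
            if key in self.store:
                self.hits += 1
                return key, self.store[key]
        return (), None

    def insert(self, tokens, state):
        self.store[tuple(tokens)] = state

cache = PrefixKVCache()
cache.insert(["sys", "You", "are", "helpful"], state="kv-0:4")
hit, state = cache.longest_prefix(["sys", "You", "are", "helpful", "Hi"])
print(len(hit), state)  # 4 tokens reused; only "Hi" needs prefill
```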
Read more
Active Learning for Gaussian Process Regression Under Self-Induced Boltzmann Weights
Jixiang Qing, Henry Moss, Matthias Sachs
Theory Optimization Efficient ML
  • Introduction of the AB-SID-iVAR acquisition function for active learning under self-induced distributions.
  • Theoretical guarantees for convergence of prediction error in high-probability and average cases.
  • Demonstrated superior performance on synthetic and real-world datasets compared to existing methods.
  • Applicability to both discrete and continuous input domains without requiring partition function estimation.
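A rough sketch of the underlying idea, concentrating queries where predictive variance overlaps the model-induced Boltzmann density; the actual AB-SID-iVAR acquisition is more refined:

```python
import numpy as np

def boltzmann_weighted_variance(mu, var, temperature=1.0):
    """Illustrative acquisition: score candidates by predictive variance
    weighted by the self-induced Boltzmann density w(x) ~ exp(-mu(x)/T),
    so sampling focuses on uncertain points the target density visits."""
    w = np.exp(-(mu - mu.min()) / temperature)
    w /= w.sum()
    return w * var

mu = np.array([0.1, 2.0, 0.3, 5.0])    # posterior mean of the energy
var = np.array([1.0, 1.0, 0.5, 2.0])   # posterior variance
scores = boltzmann_weighted_variance(mu, var)
print(int(np.argmax(scores)))  # queries where variance AND weight are high
```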
Read more
Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems
Mieke Wilms, Christoph Heitz
Theory Optimization
  • Characterizes the Pareto frontier of binary prediction-based decision systems, highlighting the trade-off between fairness and performance.
  • Introduces a multi-objective optimization framework that incorporates various fairness metrics and justice-theoretic principles.
  • Demonstrates that the Pareto frontier includes both lower-bound and upper-bound threshold rules, depending on the fairness metric used.
  • Shows that the Pareto frontier's location is determined by population characteristics and utility functions, not by the algorithm's design.
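The frontier can be traced empirically by sweeping per-group thresholds; the sketch below uses accuracy versus demographic-parity gap as an illustrative metric pair:

```python
import numpy as np

rng = np.random.default_rng(0)
score = rng.random(2000)                      # model scores
group = rng.integers(0, 2, 2000)              # protected attribute
label = (score + 0.1 * group + 0.2 * rng.random(2000) > 0.6).astype(int)

points = []
for t0 in np.linspace(0, 1, 21):
    for t1 in np.linspace(0, 1, 21):
        pred = np.where(group == 0, score > t0, score > t1)
        acc = (pred == label).mean()
        gap = abs(pred[group == 0].mean() - pred[group == 1].mean())
        points.append((acc, gap))

# Keep non-dominated points: no other point is strictly better on one
# objective and at least as good on the other.
pareto = [p for p in points
          if not any((q[0] > p[0] and q[1] <= p[1]) or
                     (q[0] >= p[0] and q[1] < p[1]) for q in points)]
print(len(pareto), "frontier points out of", len(points))
```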
Read more
AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
Beshr IslamBouli, David Jin
Large Language Models Efficient ML Optimization
  • AAAC introduces learned scalar codebooks for improved weight quantization accuracy.
  • The method operates with negligible overhead, adding only 64 bytes per layer.
  • AAAC is a gradient-free approach, eliminating the need for backpropagation.
  • It significantly reduces quantization time compared to existing methods.
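A stand-in for the fitting step: activation-weighted Lloyd iterations for a 16-entry scalar codebook. The paper's gradient-free procedure differs, but the 4-bit codebook is the same object, and 16 fp32 codewords is exactly the quoted 64 bytes per layer:

```python
import numpy as np

def fit_codebook(W, act_scale, bits=4, iters=25):
    """Activation-weighted Lloyd iterations: codewords minimize the
    activation-weighted squared error, so weights that multiply large
    activations are quantized more carefully."""
    w = W.ravel()
    s = np.tile(act_scale, W.shape[0])            # per-element importance
    code = np.quantile(w, np.linspace(0, 1, 2 ** bits))
    for _ in range(iters):
        idx = np.abs(w[:, None] - code[None, :]).argmin(axis=1)
        for j in range(code.size):
            m = idx == j
            if m.any():
                code[j] = np.average(w[m], weights=s[m])
    return code, idx.reshape(W.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))                     # one layer's weight matrix
act = np.abs(rng.normal(size=32)) + 0.1           # per-channel activation scale
code, idx = fit_codebook(W, act)
print(code.size, "codewords,", code.size * 4, "bytes of fp32 overhead")  # 16, 64
```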
Read more
Selection of the Best Policy under Fairness Constraints for Subpopulations
Tingyu Zhu, Yuhang Wu, Zeyu Zheng
Theory Optimization Efficient ML
  • Introduction of the SBFC problem to address fairness in policy selection across subpopulations.
  • Development of the T-a-S-CS algorithm that achieves instance-specific lower bounds on sample complexity.
  • Extension of the framework to include general fairness specifications with matching guarantees.
  • Demonstration of substantial efficiency improvements over existing policy allocation baselines through numerical experiments.
Read more
Generalized Wasserstein Flow Matching: Transport Plans, Everywhere, All at Once
Moritz Piening, Richard Duong, Gabriele Steidl
Generative Models Theory Optimization
  • Introduction of Wasserstein-on-Wasserstein (WoW) formulation for flow matching.
  • Derivation of non-local velocity fields as minimizers of a specific loss function.
  • Development of efficient transport couplings using sliced and linear Wasserstein approximations.
  • Unification and extension of existing generative modeling methods for point clouds and sets.
Read more
AdamFLIP: Adaptive Momentum Feedback Linearization Optimization for Hard Constrained PINN Training
Binghang Lu, Runyu Zhang, Changhong Mou, Na Li, Guang Lin
Optimization Theory
  • AdamFLIP reformulates PINN training as an equality-constrained optimization problem.
  • The method utilizes feedback linearization to stabilize constraint violations.
  • AdamFLIP integrates adaptive moment estimation, enhancing convergence and robustness.
  • Empirical results show significant improvements in constraint satisfaction and solution accuracy.
Read more
Additive Atomic Forests for Symbolic Function and Antiderivative Discovery
Reda Belaiche
Theory Interpretability Optimization
  • Introduces a self-expanding library of function-derivative pairs for symbolic regression.
  • Employs two new primitives, EML and SOL, to efficiently generate elementary function atoms.
  • Demonstrates empirical success in classification and symbolic regression tasks, outperforming traditional methods like XGBoost.
  • Provides a framework that allows for the simultaneous recovery of functions and their antiderivatives without traditional integration methods.
Read more
Complex-Valued Phase-Coherent Transformer
Leona Hioki
Theory Computer Vision NLP
  • PCT introduces token-non-competing attention to preserve phase information in complex-valued computations.
  • The architecture consistently outperforms both real-valued and complex-valued Transformers under parameter-fair conditions.
  • PCT shows strong generalization across diverse tasks, including those traditionally difficult for complex-valued networks.
  • The design principles of PCT address long-standing concerns about depth scalability in complex-valued neural networks.
Read more
Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization
Jinping Wang, Qinhan Liu, Zhiwu Xie, Zhiqiang Gao
Optimization Theory
  • Identifies a mechanism-level objective mismatch in SAM, where first-order gradient signals dominate instead of second-order curvature.
  • Proposes LE-SAM, which fixes the loss budget and adapts the perturbation radius dynamically to focus on curvature.
  • Demonstrates that LE-SAM consistently outperforms SAM and its variants across diverse benchmarks.
  • Highlights the significance of curvature in achieving better generalization in deep neural networks.
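The core idea in one schematic step: instead of SAM's fixed perturbation radius, hold the first-order loss increase at a budget eps and let the radius adapt, shrinking where gradients are large. Illustrative only; the paper's LE-SAM includes further refinements:

```python
import torch

def le_sam_step(model, loss_fn, opt, x, y, eps=0.05):
    """Perturb along g scaled so the first-order loss increase equals
    eps (effective radius eps/||g||), take the gradient there, descend."""
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    gnorm2 = sum((g ** 2).sum() for g in grads) + 1e-12
    scale = eps / gnorm2                       # g . (scale * g) = eps
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(scale * g)                  # climb by ~eps in loss
    opt.zero_grad()
    loss_fn(model(x), y).backward()            # gradient at perturbed weights
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(scale * g)                  # undo the perturbation
    opt.step()                                 # descend with that gradient
    opt.zero_grad()

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
le_sam_step(model, torch.nn.functional.mse_loss, opt,
            torch.randn(8, 4), torch.randn(8, 1))
```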
Read more
Hierarchical Multi-Fidelity Learning for Predicting Three-Dimensional Flame Wrinkling and Turbulent Burning Velocity
Saghar Zolfaghari, Yu Xie, Junfeng Yang, Safa Jamali
Theory
  • Development of the MuFiNNs framework for predicting flame dynamics.
  • Integration of high-fidelity and low-fidelity data to enhance predictive accuracy.
  • Demonstrated effectiveness in data-limited regimes with sparse experimental data.
  • Ability to interpolate and extrapolate across various operating conditions.
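A minimal version of the multi-fidelity pattern, training a correction network on sparse high-fidelity residuals; the actual MuFiNNs architecture is hierarchical and domain-specific:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
f_lo = lambda x: np.sin(4 * x)                    # cheap, biased model
f_hi = lambda x: np.sin(4 * x) + 0.3 * x ** 2     # expensive ground truth

x_lo = rng.uniform(-1, 1, 200)                    # plentiful low-fidelity data
x_hi = rng.uniform(-1, 1, 12)                     # sparse high-fidelity data

lo_net = MLPRegressor((32, 32), max_iter=3000, random_state=0)
lo_net.fit(x_lo[:, None], f_lo(x_lo))

# The correction net learns the low-to-high discrepancy from few samples,
# using the low-fidelity prediction as an extra input feature.
feats = np.column_stack([x_hi, lo_net.predict(x_hi[:, None])])
hi_net = MLPRegressor((16,), max_iter=3000, random_state=0)
hi_net.fit(feats, f_hi(x_hi) - lo_net.predict(x_hi[:, None]))

def predict(x):
    lo = lo_net.predict(x[:, None])
    return lo + hi_net.predict(np.column_stack([x, lo]))

x_test = np.linspace(-1, 1, 5)
print(np.abs(predict(x_test) - f_hi(x_test)).max())  # small, with 12 HF points
```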
Read more
Identified-Set Geometry of Distributional Model Extraction under Top-K Censored API Access
Wenhua Nie, ZiCheng Zhu, Jianan Wu, Binhan Luo, Haoran Zheng, Jyh-Shing Roger Jang
NLP Large Language Models Theory
  • Introduces the first exact identified-set characterization for top-K censored model extraction.
  • Establishes computable bounds for KL recovery limits under different logit access models.
  • Demonstrates a layered extraction hierarchy with varying recovery rates based on access methods.
  • Shows that top-K censoring limits distribution recovery but does not fully prevent capability extraction.
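A simple illustration of what is (and is not) identified under top-K censoring: the tail mass is pinned down exactly, while each unseen token's probability is only bounded. The paper's exact identified-set characterization is richer than this sketch:

```python
import numpy as np

def tail_bounds(top_k_probs, vocab_size):
    """From a top-K censored API response: the total tail mass is
    identified exactly, and each unseen token's probability lies in
    [0, p_K], since it cannot exceed the K-th largest probability."""
    p = np.sort(np.asarray(top_k_probs))[::-1]
    tail_mass = 1.0 - p.sum()
    n_unseen = vocab_size - len(p)
    per_token_hi = min(p[-1], tail_mass)
    return tail_mass, (0.0, per_token_hi), n_unseen

mass, (lo, hi), n = tail_bounds([0.4, 0.2, 0.1, 0.05, 0.05], vocab_size=50_000)
print(f"tail mass {mass:.2f} spread over {n} tokens, each in [{lo}, {hi:.2f}]")
```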
Read more
Unsupervised Process Reward Models
Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo, Maria Brbic
NLP Large Language Models Reinforcement Learning
  • uPRMs eliminate the need for human annotations and ground-truth verification in training Process Reward Models.
  • The proposed method achieves up to 15% accuracy improvement in identifying first erroneous steps compared to LLM-as-a-Judge.
  • uPRMs perform comparably to supervised PRMs in test-time scaling and outperform majority voting baselines.
  • As a reward signal in reinforcement learning, uPRMs support more robust policy optimization than supervised PRMs.
Read more
Path-Dependent Denoising: A Non-Conservative Field Perspective on Order Collapse in Diffusion Language Models
Jeonseong Kim
NLP Large Language Models Generative Models
  • DLMs can theoretically generate tokens in arbitrary orders but often behave like autoregressive models in practice.
  • The paper introduces a compatibility formulation that connects local denoising conditionals to a common joint distribution.
  • A local circulation diagnostic is proposed to measure the order sensitivity of DLMs.
  • Path dependence in DLMs is characterized and separated from conditional total correlation and order-specific estimation errors.
Read more
A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability
Hamed Omidvar, Vahideh Akhlaghi
Large Language Models Theory Efficient ML
  • Introduces a unified communication-theoretic framework for LLM reliability techniques.
  • Derives analytical results linking agent behavior to classical decoding theory.
  • Presents a cost-aware router that optimizes technique selection based on empirical performance.
  • Empirical evaluations show significant improvements in quality and cost efficiency.
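A toy version of the cost-aware routing rule, with made-up technique statistics; the paper derives its router from a communication-theoretic model rather than this simple lookup:

```python
def route(techniques, budget):
    """Among reliability techniques whose expected cost fits the
    remaining budget, pick the one with the best empirical success
    rate; fall back to the cheapest option if nothing fits."""
    affordable = [t for t in techniques if t["cost"] <= budget]
    if not affordable:
        return min(techniques, key=lambda t: t["cost"])
    return max(affordable, key=lambda t: t["success_rate"])

techniques = [
    {"name": "single-shot",      "cost": 1,  "success_rate": 0.72},
    {"name": "self-consistency", "cost": 5,  "success_rate": 0.86},
    {"name": "debate",           "cost": 12, "success_rate": 0.90},
]
print(route(techniques, budget=6)["name"])  # -> self-consistency
```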
Read more