AI-generated summaries

Today's ML research,
without the noise.

Fresh summaries of the latest machine learning papers from arXiv, processed every 8 hours.

24 Papers today
8h Update frequency
7 Days of history
Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts
Reza Rastegar
Theory Optimization
  • Introduces the concept of boundary mass to analyze routing ties in MoE models.
  • Proves that the boundary mass is linear in slab width, impacting soft-to-hard risk bounds.
  • Establishes Γ-convergence of soft objectives to hard-routing objectives under specific conditions.
  • Demonstrates a conditional landscape-transfer principle in a teacher-student setting.
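The soft-to-hard picture can be made concrete with a toy two-expert gate. The gating direction `w` and the slab-based boundary-mass estimator below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 2))   # inputs to be routed
w = np.array([1.0, -0.5])           # hypothetical linear gating direction

def boundary_mass(delta):
    """Fraction of inputs in a slab of half-width delta around the routing boundary."""
    return np.mean(np.abs(X @ w) <= delta)

def soft_route(x, tau):
    """Soft sigmoid gate; as tau -> 0 this recovers hard argmax routing."""
    return 1.0 / (1.0 + np.exp(-(x @ w) / tau))

# For a smooth input density the boundary mass grows roughly linearly
# in the slab width, which is the linearity the summary refers to.
```

For the Gaussian inputs above, doubling `delta` roughly doubles the boundary mass.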
Topological Neural Tangent Kernel
Sanjukta Krishnagopal
Graph Learning Theory Interpretability
  • Introduction of TopoNTK, an infinite-width kernel for higher-order interactions in simplicial complexes.
  • Demonstration of how TopoNTK captures filled-simplex structures that are invisible to traditional graph kernels.
  • Establishment of exact Hodge preservation in propagation and conditions for kernel-level compatibility.
  • Development of spectral learning dynamics and proof of finite-depth stability under perturbations.
MSMixer: Learned Multi-Scale Temporal Mixing with Complementary Linear Shortcut for Long-Term Time Series Forecasting
Ahmed Cherif
Time Series
  • MSMixer employs a multi-scale architecture with three parallel branches for improved temporal pattern capture.
  • A learnable softmax gate allows dynamic weighting of outputs from different branches, enhancing adaptability.
  • The DLinear complementary shortcut provides global context, improving trend and seasonality modeling.
  • MSMixer outperforms existing lightweight models and Transformer-based models in forecasting accuracy.
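A minimal numpy sketch of the gated multi-branch idea: the moving-average branches, gate values, and scalar shortcut below are stand-ins for MSMixer's learned layers, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 8 * np.pi, 96)) + 0.1 * rng.normal(size=96)

def moving_avg(x, k):
    """Fixed smoother standing in for one temporal-scale branch."""
    return np.convolve(x, np.ones(k) / k, mode="same")

branches = [moving_avg(x, k) for k in (1, 4, 8)]   # three parallel scales
gate_logits = np.array([0.5, 1.0, 0.2])            # learnable in the real model
gate = np.exp(gate_logits) / np.exp(gate_logits).sum()   # softmax gate
shortcut = 0.9 * x                                  # stand-in for the linear shortcut
y = sum(g * b for g, b in zip(gate, branches)) + shortcut
```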
Combining Trained Models in Reinforcement Learning
Ujjwal Patil, Javad Ghofrani
Reinforcement Learning Federated Learning Robotics
  • High sample cost and weak transferability remain significant obstacles in deep reinforcement learning (DRL).
  • The review synthesizes findings from 15 empirical studies on pretrained knowledge reuse.
  • Positive results are more common when source and target tasks are structurally similar.
  • Evidence for ensemble and federated methods is limited and context-specific.
Bolek: A Multimodal Language Model for Molecular Reasoning
Frederic Grabowski, Jacek Szczerbiński, Maciej Jaśkowski, Kalina Jasińska-Kobus, Paweł Dąbrowski-Tumański, Tomasz Jetka, Bartosz Topolski
NLP Large Language Models Multimodal
  • BOLEK integrates molecular embeddings into a language model for enhanced reasoning.
  • The model outperforms larger, specialized systems in multiple classification tasks.
  • It provides auditable explanations that are grounded in molecular features.
  • BOLEK demonstrates generalization capabilities beyond its training data.
Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation
Sofianos Panagiotis Fotias, Vassilis Gaganis
Reinforcement Learning Optimization
  • Closed-loop CO2 storage management is formulated as a partially observable sequential decision problem.
  • History-conditioned policies can effectively utilize deployable well-level information to achieve high performance.
  • Latent model-based adaptation outperforms direct model-free retuning in abnormal operational scenarios.
  • The proposed framework reduces the computational burden associated with traditional history matching and re-optimization.
A dimensional R2 regression metric
Jaesung Yoo, Stefan Lemke, Jian Zhong Guo, Kanaka Rajan, Adam Hantman
Theory
  • Dim-R2 extends the R2 metric to handle arbitrary dimensionality in regression tasks.
  • It provides a multidimensional view of prediction accuracy, revealing patterns that traditional R2 cannot.
  • Dim-R2 is less sensitive to low-variance noise, yielding more interpretable results.
  • The metric was validated on synthetic and real-world multidimensional datasets.
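One way to make the dimension-wise idea concrete is a per-dimension R², sketched below; the paper's exact Dim-R2 definition may aggregate differently:

```python
import numpy as np

def r2_per_dim(y_true, y_pred):
    """R^2 computed separately per output dimension instead of pooled."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)
y = rng.normal(size=(200, 3))
# per-dimension noise levels: perfect, moderate, and very noisy predictions
pred = y + np.array([0.0, 0.5, 2.0]) * rng.normal(size=(200, 3))
scores = r2_per_dim(y, pred)   # one R^2 value per output dimension
```

A pooled scalar R² would average these away; the per-dimension view exposes which outputs are actually predicted well.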
Predicting Post Virality with Temporal Cross-Attention over Trend Signals
Sarvagya Somvanshi, Mohan Xu, Rakhi Chadalavada, Nathan Canera
NLP Time Series Multimodal
  • Introduction of a post-level trend alignment signal from Wikipedia pageview spikes.
  • Development of ViralityNet, a cross-attention architecture that incorporates temporal trend signals.
  • Systematic ablation study to analyze the contributions of temporal context and exogenous trends.
  • Demonstrated significant improvements in prediction accuracy over traditional text-only models.
Metric-Normalized Posterior Leakage (mPL): Attacker-Aligned Privacy for Joint Consumption
Gaoyi Chen, Minghao Li, Weishi Shi, Yan Huang, Yusheng Wei, Sourabh Yadav, Chenxi Qiu
Theory
  • Introduction of mPL as a measure of privacy leakage under joint consumption scenarios.
  • Establishment of the equivalence between bounding mPL and mDP for independent releases.
  • Development of PBmPL to control the probability of exceeding privacy budgets.
  • Implementation of AmPL, which adapts perturbation based on attacker feedback.
Bridging the Gap Between Average and Discounted TD Learning
Haoxing Tian, Zaiwei Chen, Ioannis Ch. Paschalidis, Alex Olshevsky
Reinforcement Learning Theory Optimization
  • Introduces a novel algorithm for average-reward TD learning that guarantees convergence to a unique solution.
  • Achieves quadratic scaling in sample complexity, improving upon previous quartic dependencies.
  • Applicable to both tabular and linear function approximation settings without requiring restrictive assumptions.
  • Convergence analysis is independent of the dimensionality of the parameter vector, enhancing general applicability.
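For context, a tabular differential (average-reward) TD(0) sketch on a two-state chain; the chain and step sizes are illustrative and do not reproduce the paper's algorithm or guarantees:

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])   # two-state Markov chain
r = np.array([1.0, 0.0])                 # state rewards; true average reward is 2/3
rng = np.random.default_rng(3)

V = np.zeros(2)   # differential (relative) value estimates
rho = 0.0         # running estimate of the average reward
s = 0
for _ in range(150_000):
    s2 = rng.choice(2, p=P[s])
    delta = r[s] - rho + V[s2] - V[s]   # differential TD error
    V[s] += 0.01 * delta
    rho += 0.001 * delta                # slower step size for the average-reward estimate
    s = s2
```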
Hierarchical Federated Learning for Networked AI: From Communication Saving to Architecture-Aware Design
Seyed Mohammad Azimi-Abarghouyi, Mehdi Bennis, Leandros Tassiulas
Federated Learning Optimization Theory
  • HFL should be viewed as an architecture-aware design framework for networked AI.
  • The framework is structured around three design axes: architectural parameters, optimization decomposition, and communication realization.
  • Convergence in HFL is influenced by the architecture, optimization roles, and communication mechanisms.
  • The paper provides a comparative analysis of flat FL, two-tier HFL, and deep HFL.
PepSpecBench: A Unified Evaluation Benchmark for Peptide Tandem Mass Spectrometry Prediction
Zhiwen Yang, Pan Liu, Yifan Li, Yunhua Zhong, Jun Xia
Theory
  • PepSpecBench standardizes data preprocessing and model evaluation for peptide MS/MS prediction.
  • The benchmark employs a strict backbone-disjoint splitting strategy to mitigate sequence leakage.
  • It introduces a comprehensive evaluation suite that includes cross-species testing and robustness assessments.
  • The framework reveals previously unrecognized performance discrepancies among existing models.
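The backbone-disjoint idea can be sketched as grouping peptides by their unmodified sequence before splitting, so modified forms of one backbone never straddle the train/test boundary. The toy peptides and bracketed-modification syntax below are illustrative, not the benchmark's format:

```python
import random

peptides = ["PEPTIDE", "PEPTIDE[+16]", "SEQVENCE", "SEQVENCE[+80]", "ANOTHER"]

def backbone(p):
    """Strip bracketed modifications to recover the bare backbone sequence."""
    out, depth = [], 0
    for ch in p:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

# Split at the backbone level, then assign every modified form with its backbone.
backbones = sorted({backbone(p) for p in peptides})
random.Random(0).shuffle(backbones)
train_bb = set(backbones[: int(0.6 * len(backbones))])
train = [p for p in peptides if backbone(p) in train_bb]
test = [p for p in peptides if backbone(p) not in train_bb]
```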
Polynomial-Time Optimal Group Selection via the Double-Commutator Eigenvalue Problem
Mitchell A. Thornton
Theory
  • Optimal group selection can be solved in polynomial time using a generalized eigenvalue problem.
  • The minimum eigenvalue of the double-commutator matrix indicates the existence of a perfectly commuting generator.
  • The proposed method links group theory, matrix analysis, and statistical estimation in a novel way.
  • The double-commutator formulation subsumes existing methods in independent component analysis and structured matrix nearness.
Skipping the Zeros in Diffusion Models for Sparse Data Generation
Phil Sidney Ostheimer, Mayank Nagda, Andriy Balinskyy, Gabriel Vicente Rodrigues, Jean Radig, Carl Herrmann, Stephan Mandt, Marius Kloft, Sophie Fellenz
Generative Models Efficient ML
  • SED preserves sparsity patterns by focusing on non-zero values, avoiding unnecessary computations.
  • The method achieves computational efficiency that scales with the number of non-zero entries.
  • SED outperforms traditional diffusion models and domain-specific baselines in generating high-fidelity sparse data.
  • The approach provides insights into the inefficiencies of dense models when applied to sparse data.
Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications
Sofianos Panagiotis Fotias, Vassilis Gaganis
Optimization
  • Introduction of GP-Perm, a permutation-invariant Gaussian Process kernel for Bayesian Optimization.
  • Development of a Deep Kernel Learning model (DKL-DS) for learning permutation-invariant embeddings.
  • Evaluation of the proposed methods across multiple synthetic and realistic CCS scenarios.
  • Demonstration of improved sample efficiency and optimization performance in the presence of permutation symmetries.
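One standard way to obtain permutation invariance, shown here for intuition, is to symmetrize a base kernel over permutations; whether GP-Perm uses this exact construction is not stated above, and explicit symmetrization is only tractable for small input dimension:

```python
import numpy as np
from itertools import permutations

def rbf(x, y, ell=1.0):
    """Base squared-exponential kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))

def perm_kernel(x, y):
    """Average the base kernel over all permutations of one argument."""
    return np.mean([rbf(x, np.asarray(p)) for p in permutations(y)])

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 1.0, 2.0])
k = perm_kernel(x, y)   # unchanged under any reordering of y
```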
Towards Systematic Generalization for Power Grid Optimization Problems
Zeeshan Memon, Yijiang Li, Hongwei Jin, Kibaek Kim, Liang Zhao
Optimization
  • Introduces a joint modeling framework for ACOPF and SCUC to enhance systematic generalization.
  • Utilizes a shared graph-based backbone to capture grid topology and physical interactions.
  • Incorporates solver supervision with physics-informed objectives for improved decision-making.
  • Demonstrates superior performance and transferability compared to existing learning-based approaches.
Sparse Regression under Correlation and Weak Signals: A Reproducible Benchmark of Classical and Bayesian Methods
Hao Xiao
Theory
  • Bayesian methods outperform classical methods in prediction accuracy, especially under high correlation.
  • The Horseshoe prior provides well-calibrated credible intervals, while Spike-and-Slab exhibits under-coverage.
  • Lasso is a strong contender for variable selection when posterior distributions are not required.
  • High correlation degrades Lasso's variable selection, while the Bayesian methods remain comparatively robust.
Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data
Ruiqi Xue, Lei Yuan, Kainuo Cheng, Jing-Wen Yang, Yang Yu
Reinforcement Learning Large Language Models Robotics
  • PROCO leverages large language models to incorporate natural language knowledge into offline safe RL.
  • The framework generates a conservative cost function to estimate risks without needing unsafe samples.
  • Model-based rollouts are used to synthesize counterfactual unsafe samples for better policy learning.
  • PROCO outperforms existing methods in safety-critical tasks, demonstrating significant improvements in safety performance.
Differentiable Kernel Ridge Regression for Deep Learning Pipelines
Jean-Marc Mercier, Gabriele Santin
Computer Vision Reinforcement Learning Efficient ML
  • Introduction of Sparse Kernels (SKs) as a differentiable, localized variant of kernel ridge regression.
  • Integration of SKs into PyTorch as modular layers that maintain end-to-end trainability.
  • Decomposition of learning into three components: feature representations, target values, and evaluation points.
  • Empirical validation shows competitive performance with reduced training requirements across various deep learning architectures.
Multi-Perspective Transformers in ARC-AGI-2 Challenge
Caleb Talley, Vedant Tibrewal, Seun Adekunle, Weiwen Dong, Xinyu Wu, Fariha Sheikh
Computer Vision Large Language Models Efficient ML
  • Introduces a multi-perspective approach to solving ARC-AGI-2 puzzles using TinyLM.
  • Utilizes data augmentation techniques to generate multiple views of puzzles for improved pattern recognition.
  • Employs Test-Time Training (TTT) and Products of Experts (POE) for fine-tuning during evaluation.
  • Achieves high training accuracy but lower evaluation accuracy, highlighting the challenge of generalization.
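The "multiple views" idea can be sketched with dihedral-group augmentation of a puzzle grid; the paper's actual augmentation set is not specified above:

```python
import numpy as np

def dihedral_views(grid):
    """Return the 8 rotated/reflected copies of a 2D grid."""
    views = []
    g = np.asarray(grid)
    for _ in range(4):
        views.append(g.copy())
        views.append(np.fliplr(g).copy())
        g = np.rot90(g)
    return views

grid = np.array([[1, 0], [2, 3]])
views = dihedral_views(grid)   # 8 views of the same puzzle
```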
Learning Koopman operators for coupled systems via information on governing equations of subsystems
Tatsuya Naoi, Jun Ohkubo
Theory Time Series
  • Introduces a method to learn Koopman operators for coupled systems using known governing equations of subsystems.
  • Addresses limitations of traditional data-driven methods like EDMD in terms of stability and accuracy.
  • Demonstrates the proposed method's effectiveness through numerical experiments on coupled oscillator systems.
  • Highlights the importance of incorporating prior knowledge into the learning process for improved model performance.
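For reference, plain EDMD on a linear system, where an identity-plus-constant dictionary is exact; the paper's contribution of injecting known subsystem equations into the estimate is not shown:

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # linear dynamics: Koopman is exact here
X = rng.normal(size=(500, 2))            # sampled states
Y = X @ A.T                              # one-step successors

def lift(X):
    """Dictionary: constant plus identity; richer choices for nonlinear systems."""
    return np.hstack([np.ones((len(X), 1)), X])

# Least-squares fit of the Koopman matrix on the lifted observables.
K = np.linalg.lstsq(lift(X), lift(Y), rcond=None)[0]
```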
Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks
Zongqian Li, Yixuan Su, Han Zhou, Zihao Fu, Nigel Collier
NLP Large Language Models Efficient ML
  • Flexi-LoRA is the first input-adaptive LoRA framework that adjusts ranks based on input complexity.
  • Dynamic rank allocation improves performance while using fewer parameters compared to static LoRA.
  • Maintaining consistency between training and inference dynamics is critical for effective adaptation.
  • Mathematical reasoning tasks show a higher dependency on rank dynamics than question answering tasks.
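The input-adaptive rank idea can be sketched as truncating a shared low-rank factorization per input; the norm-based complexity proxy below is invented for illustration and is not Flexi-LoRA's actual selection policy:

```python
import numpy as np

rng = np.random.default_rng(5)
d, r_max = 16, 8
W = rng.normal(size=(d, d))             # frozen base weight
B = 0.01 * rng.normal(size=(d, r_max))  # shared LoRA factors
A = 0.01 * rng.normal(size=(r_max, d))

def choose_rank(x):
    # hypothetical complexity proxy: larger-norm inputs get more rank
    return int(np.clip(np.linalg.norm(x), 1, r_max))

def forward(x):
    r = choose_rank(x)
    # use only the first r rank-1 terms of the low-rank update
    return W @ x + B[:, :r] @ (A[:r, :] @ x)

x = rng.normal(size=d)
y = forward(x)
```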
Federated Semi-Supervised Graph Neural Networks with Prototype-Guided Pseudo-Labeling for Privacy-Preserving Gestational Diabetes Mellitus Prediction
G. Victor Daniel, A. Mallikarjuna Reddy, Uday Kumar Addanki, Sridhar Reddy Gogu, Sravanth Kumar Ramakuri
Federated Learning Graph Learning
  • Introduces FedTGNN-SS, the first federated semi-supervised GNN framework for clinical tabular EHR data.
  • Combines prototype-guided pseudo-labeling with adaptive graph refinement to reduce error accumulation.
  • Implements privacy-safe prototype sharing to facilitate cross-silo pseudo-label refinement without data transfer.
  • Achieves strong AUROC scores even when up to 80% of labels are missing.
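Prototype-guided pseudo-labeling can be sketched as nearest-class-mean assignment; FedTGNN-SS's adaptive graph refinement and privacy-safe prototype sharing are not shown here:

```python
import numpy as np

rng = np.random.default_rng(6)
# labeled points from two well-separated classes
X_lab = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)

# class prototypes = per-class means of the labeled embeddings
prototypes = np.vstack([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])

def pseudo_label(X):
    """Assign each unlabeled point the label of its nearest prototype."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

X_unlab = np.array([[0.1, -0.2], [2.9, 3.1]])
labels = pseudo_label(X_unlab)
```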
Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients
Sejun Park, Yeachan Park, Geonho Hwang
Theory
  • Floating-point networks can represent almost all floating-point functions and their gradients using automatic differentiation.
  • Theoretical results extend to practical activation functions like ReLU, ELU, and Sigmoid.
  • The findings have implications for applications in scientific machine learning and adversarial attacks.
  • The paper establishes a formal theorem that guarantees the representability of both function values and gradients.