AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

72 Papers today
8h Update frequency
7 Days of history
Discovering Latent Groups for Robust Classification
Ankur Garg, Ulrich Aïvodji, Samira Ebrahimi Kahou, Vincent Michalski
Interpretability Optimization Theory
  • NCT framework encodes subgroup structure in a tree architecture for robust classification.
  • The model routes samples based on prediction correctness, preserving this structure for interpretability.
  • NCT achieves competitive performance on benchmark datasets while isolating minority subgroups.
  • The approach does not require subgroup annotations, making it more accessible for practical applications.
Read more
Causal Gaussian Processes for Robust Treatment Effect Evaluation with Unobserved Confounding
Junzhe Zhang, Jingyuan Chen, Elias Bareinboim
Theory
  • Introduces Causal Gaussian Processes (CGP) for evaluating treatment effects with unobserved confounding.
  • Develops a universal discretization method for approximating causal models in continuous domains.
  • Demonstrates the effectiveness of CGP in mitigating confounding bias in observational data.
  • Provides a framework that requires only basic temporal ordering between treatment and outcome.
Read more
A Gated Graph Neural Network Approach to Fast-Convergent Dynamic Average Estimation
Antonio Marino, Claudio Pacchierotti, Paolo Robuffo Giordano
Graph Learning Robotics Optimization
  • Introduction of a GGNN-based learning model for dynamic average estimation.
  • Formal analysis of stability properties and incorporation of a regularization term.
  • Development of an encoding-decoding mechanism to minimize communication overhead.
  • Demonstration of superior performance compared to traditional model-based estimators.
Read more
Unsupervised Disentanglement Without Compromises : How Functional Orthogonality Enforces Identifiability
Mathieu Cyrille Simon, Pascal Frossard, Christophe De Vleeschouwer
Theory Generative Models
  • Introduces functional orthogonality as a key property for unsupervised disentanglement.
  • Proves that orthogonality leads to identifiability in nonlinear generative models without statistical independence.
  • Empirical results confirm the effectiveness of orthogonality-regularized normalizing flows in recovering latent factors.
  • Challenges existing impossibility claims regarding unsupervised disentanglement.
Read more
PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection
Youji Zhu, Hongbing Wang, Wenchao Liu, Xiaodong Liu, Xiangguang Xiong
Time Series
  • Introduces PaAno+, a lightweight model for time series anomaly detection.
  • Utilizes multiscale feature extraction and cross-variable attention to improve anomaly detection accuracy.
  • Implements a novel self-supervised learning task for better feature representation.
  • Demonstrates state-of-the-art performance on the TSB-AD benchmark.
Read more
Learning by Shifting: Temporal View Construction for Time Series Contrastive Learning
Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor
Time Series
  • ShiFT introduces a deterministic view construction method that encodes temporal shift invariance.
  • The approach outperforms complex augmentation-based methods while reducing training time.
  • Empirical analysis reveals the impact of batch size and negative samples on representation quality.
  • ShiFT achieves state-of-the-art results on multiple large-scale time series datasets.
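The summary above does not say how ShiFT builds its views. As a rough illustration only, deterministic shift-based view construction can be sketched as pairing each window with a copy offset by a fixed number of steps. The function name `shifted_views` and the parameters `window` and `shift` are hypothetical and are not taken from the paper.

```python
def shifted_views(series, window, shift):
    """Return (anchor, positive) window pairs offset by `shift` time steps.

    Assumption: a positive view is simply the same-length window starting
    `shift` steps later; no random augmentation is applied.
    """
    pairs = []
    last_start = len(series) - window - shift
    for start in range(0, last_start + 1):
        anchor = series[start:start + window]
        positive = series[start + shift:start + shift + window]
        pairs.append((anchor, positive))
    return pairs


# Toy series of 8 samples, windows of length 4, shifted by 2 steps.
series = [0, 1, 2, 3, 4, 5, 6, 7]
pairs = shifted_views(series, window=4, shift=2)
```

Because the pairs are a pure function of the input, the same views are produced on every epoch, which is the sense in which such a construction is deterministic.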
Read more
Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting
Alireza Jafari, Judy Fox, Geoffrey C. Fox, Madhav Marathe, Aniruddha Adiga
Time Series
  • A systematic evaluation of forecasting models for influenza using ILI and hospitalization data.
  • Mixture-of-experts models outperform other architectures, indicating the benefit of diverse pretrained representations.
  • Numerical transformer-based models are reliable, especially with appropriate pretraining.
  • LLM-based forecasting methods are less effective compared to traditional numerical approaches.
Read more
MedTS-TTT: Test-Time Training for Medical Time Series Classification
Mingzhi Chen, Yiyu Gui, Guibo Luo
Time Series
  • MedTS-TTT enables online adaptation from unlabeled test samples, making it suitable for real-world clinical applications.
  • The framework utilizes CLSA-TTT for efficient single-step fast-weight updates, avoiding the computational overhead of iterative optimization.
  • MedTS-TTT achieved 11 top-1 rankings out of 12 evaluations across multiple metrics, showcasing its effectiveness.
  • The Gated Convolutional Backbone enhances the model's ability to manage local dynamics and information flow in medical time series data.
Read more
DevoTG: Temporal Graph Neural Networks for Modeling C. elegans Developmental Connectomics
Jayadratha Gayen, Bradly Alicea
Graph Learning Time Series
  • Introduction of DevoTG, a framework for analyzing C. elegans neural development using temporal graph methods.
  • Significant improvement in lineage prediction accuracy using TGNs compared to static GNNs.
  • Identification of three classes of synaptic connection stability, enhancing understanding of neural connectivity dynamics.
  • Provision of interactive visualizations to aid in biological hypothesis generation.
Read more
Short-Term Electricity Demand Forecasting for New England Using a Hybrid Transformer-XGBoost Framework with Weather, Calendar, and COVID-19 Indicators
Reza Ghanavati, Behrooz Mosallaei
Time Series
  • The hybrid Transformer-XGBoost framework significantly outperforms a tabular-only XGBoost model in short-term electricity demand forecasting.
  • COVID-19 indicators initially improved model accuracy but became less relevant as behavioral adaptations occurred post-pandemic.
  • Hyperparameter optimization using Optuna enhanced the model's performance through efficient search strategies.
  • The study emphasizes the importance of considering temporal validity decay in forecasting models affected by structural changes in demand patterns.
Read more
Distribution-Aware Diffusion-LLM for Robust Ultra-Long-Term Time Series Forecasting
Falguni Ghosh, Vahid Hashemi, Bernhard Kainz
Time Series Large Language Models Generative Models
  • Introduction of Diffusion-LLM framework that combines LLMs with conditional diffusion models for time series forecasting.
  • Improvement in multimodal alignment and probabilistic modeling through a shared latent space.
  • Significant performance gains in ultra-long-term and few-shot forecasting across multiple benchmarks.
  • Demonstration of DDPMs as effective regularizers for enhancing LLM robustness.
Read more
UltraQuant: 4-bit KV Caching for Context-Heavy Agents
Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao
Large Language Models Efficient ML NLP
  • UltraQuant improves 4-bit KV caching for context-heavy agents, addressing memory pressure and GPU utilization.
  • The method incorporates practical design choices, including asymmetric key/value treatment and optimized decode-attention kernels.
  • UltraQuant achieves a 3.47× reduction in time-to-first-token during late rounds and a 1.63× increase in output throughput over FP8 KV caching.
  • The approach emphasizes the importance of serving efficiency metrics in evaluating KV-cache performance.
Read more
Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs
Xiang Rao, Yuxuan Shen
Theory Efficient ML
  • Introduction of QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network for PDEs.
  • Theoretical proof of accelerated convergence rates and reduced numerical dispersion.
  • Validation across three seepage scenarios in porous media demonstrates superior performance.
  • Outperforms existing models in accuracy, error control, and dynamic tracking.
Read more
Reward-free Pretraining for Reinforcement Learning via Occupancy Coverage Maximization
Marco Pratticò, Pietro Novelli, Massimiliano Pontil, Carlo Ciliberto
Reinforcement Learning
  • ROVER maximizes state-space coverage for effective exploration in sparse-reward environments.
  • The method employs a learned resolvent world model to estimate occupancy, addressing common estimation challenges.
  • Introduction of a virtual 'sink' state stabilizes learning by managing unsupported state-action regions.
  • Empirical results show ROVER achieves superior coverage and initialization compared to traditional reward-free methods.
Read more
A Reward-Petri-Net Interpretation of Temporal Behavior Trees
Till Schmeil, Günther Waxenegger-Wilfing, Sebastian Schirmer
Reinforcement Learning Robotics Theory
  • Introduces a method to interpret Temporal Behavior Trees as Reward-Petri-Nets for reinforcement learning.
  • Demonstrates how TBTs can improve reward function design for complex robotic tasks with temporal constraints.
  • Shows that TBT-based rewards enhance sample efficiency and learning in challenging environments.
  • Provides a systematic way to assign rewards based on user-defined task importance and structure.
Read more
Parameterized Representations via Implicit Stochastic Modulation for High-Dimensional and High-Order Neural PDE Solvers
Zhangyong Liang, Huanhuan Gao
Theory Optimization Efficient ML
  • PRISM decouples parameter encoding from the spatial AD graph, addressing memory growth issues.
  • The architecture enables zero-shot extrapolation for parameterized PDEs without retraining.
  • PRISM supports efficient scaling of high-dimensional PDEs, achieving up to 100,000 dimensions on a single GPU.
  • Variance-aware Lipschitz damping is incorporated to enhance optimization stability.
Read more
Efficient Network Inference via Hardware-Aware Architecture Search, Model Pruning & Quantization
Lucas Heublein, Mark Deutel, Axel Plinge, Felix Ott
Efficient ML
  • Investigates efficient network inference for GNSS interference characterization under strict resource constraints.
  • Utilizes a deployment-oriented compression pipeline combining pruning and quantization with MCUNet as a baseline.
  • Applies hardware-aware zero-shot NAS to optimize network architecture and pruning configurations.
  • Demonstrates trade-offs between predictive performance and deployment efficiency through experimental evaluations.
Read more
Computational Identifiability
Lucius E.J. Bynum, Rajesh Ranganath, Kyunghyun Cho
Theory
  • Introduction of computational identifiability as a practical, computation-bound notion of identifiability.
  • Formalization of the relationship between causal effect estimation and meta-learning.
  • Empirical demonstration of computational identifiability in complex scenarios.
  • Provision of a framework for identifying causal effects with finite samples and error tolerances.
Read more
Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System
Zhiwen Yu, Derong Yang, Liujian Zhang, Kaixiang Yang, Peilin Zhan, Jianmin Lv, Jane You, C. L. Philip Chen
Efficient ML Theory Optimization
  • PIBLS is the first application of Broad Learning System (BLS) to solving PDEs, offering a backpropagation-free computational framework.
  • The framework reformulates PDE solving as a direct least-squares optimization, enhancing computational efficiency.
  • Rigorous mathematical proof establishes PIBLS's universal approximation property for PDE solutions.
  • Experimental results show PIBLS is significantly faster and more accurate than traditional PINNs.
Read more
VIMPO: Value-Implicit Policy Optimization for LLMs
Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, Xuandong Zhao
Reinforcement Learning Large Language Models Optimization
  • VIMPO is a critic-free policy optimization method that improves reasoning in LLMs.
  • It derives a policy-implied value function using KL-regularized reinforcement learning principles.
  • The method allows for token-level credit assignment without the instability of a learned critic.
  • Empirical results show VIMPO outperforms existing methods like GRPO, especially under noisy rewards.
Read more
When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting
Rupasree Dey, Abdul Matin, Nathan Orwick, Yao Zhang, Shrideep Pallickara, Sangmi Lee Pallickara
Time Series Efficient ML Theory
  • Introduction of Guard framework for dynamic multi-teacher knowledge distillation.
  • Adaptive mechanisms for selecting teacher models based on input statistics and uncertainty.
  • Significant RMSE reduction compared to traditional distillation methods.
  • Demonstrated effectiveness in four scientific domains despite distributional misalignment.
Read more
Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks
Leona Hennig, Marius Lindauer
Efficient ML Optimization Computer Vision
  • Introduces the first configuration space tailored for Deep Shift Neural Networks (DSNNs).
  • Combines multi-objective and multi-fidelity optimization techniques for efficient AutoML.
  • Demonstrates significant improvements in accuracy and reductions in emissions for optimized DSNNs.
  • Reveals model-specific trade-offs in quantization strategies that enhance energy efficiency.
Read more
EvoRubrics: Dynamic Rubrics as Rewards via Adversarial Co-Evolution for LLM Reinforcement Learning
Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Zheng Li, Jinyang Zhang, Zhijing Wu, Junfeng Zhao, Yasha Wang
Reinforcement Learning Large Language Models NLP
  • EvoRubrics enables real-time co-evolution of rubrics and policies, enhancing the effectiveness of reinforcement learning.
  • The framework uses adversarial interactions to ensure that evaluation standards adapt to the evolving capabilities of the model.
  • EvoRubrics consistently outperforms static and dynamic rubric baselines across multiple benchmarks.
  • A self-supervised variant of EvoRubrics achieves meaningful performance gains, highlighting the potential for unsupervised learning.
Read more
On the Position Bias of On-Policy Distillation
Yan Xie, Sijie Zhu, Tiansheng Wen, Bo Chen, Yifei Wang
Reinforcement Learning Optimization Efficient ML
  • Identifies the position bias phenomenon in OPD, where early tokens provide more valuable supervision than later ones.
  • Proposes IW-OPD, which adjusts token weights based on the accumulated discrepancy between student and teacher distributions.
  • Demonstrates that IW-OPD converges faster and achieves better performance than standard OPD.
  • Shows that the advantages of IW-OPD increase with the mismatch between teacher and student models.
Read more
Adversarial Bandit Optimization with Globally Bounded Perturbations to Convex Losses
Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
Optimization Theory
  • Introduces a model for bandit optimization with C-approximately convex and β-smooth function sequences.
  • Establishes expected regret guarantees that account for adversarial perturbations under a global budget.
  • Demonstrates that sublinear expected regret is achievable even with non-convex losses.
  • Modifies existing bandit algorithms to accommodate the new perturbation model.
Read more
An Empirical Study of OpenPangu Quantization on Ascend NPUs
Tong Shi, Jiacheng Wang, Hui Xie, Ying Li, Aishan Liu, Jinyang Guo, Xianglong Liu
NLP Large Language Models Efficient ML
  • 8-bit weight-only quantization is effectively lossless for OpenPangu models.
  • 4-bit quantization is practical for the 7B model but harmful for the 1B model.
  • Ultra-low precision quantization (2-bit and binary) often results in poor performance.
  • The study provides a comprehensive evaluation of various quantization methods on Ascend NPUs.
Read more
Right Knowledge, Wrong Answer: Test-Time Steering for Temporal Fact Conflicts in Open-Weight Language Models
Elias Hossain, Sourav Saha, Umesh Chandra Biswas, Sanjeda Sara Jennifer
NLP Large Language Models
  • Formalizes the concept of Parametric Temporal Conflict (PTC) in language models.
  • Introduces Temporal Attractor Steering (TAS) as a retrieval-free, inference-time intervention.
  • Demonstrates that TAS can effectively recover newer facts while preserving accuracy on non-conflict queries.
  • Evaluates TAS across multiple models and a comprehensive benchmark dataset.
Read more
On the Curse of Dimensionality in Private Sparse Covariance Estimation and PCA
Syamantak Kumar, Shourya Pandey, Purnamrita Sarkar, Kevin Tian
Theory
  • Demonstrates a significant curse of dimensionality in DP covariance estimation and PCA.
  • Establishes poly(k, log d) sample complexity for DP PCA under additional sparsity assumptions.
  • Provides poly(d) lower bounds for both sparse covariance estimation and PCA under DP.
  • First to show an exponential gap between private and non-private sample complexities in sparse estimation.
Read more
Geometric and Information Compression of Representations in Deep Learning
Linara Adilova, Henning Petzka, Asja Fischer, Bernhard C. Geiger
Theory
  • Low mutual information (MI) does not reliably indicate geometric compression in latent representations.
  • The relationship between MI and geometric compression is negative and nonlinear, influenced by training conditions.
  • Generalization may confound the connection between MI and geometric compression.
  • The study employs CEB networks and continuous dropout networks for robust MI estimation.
Read more
Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids
Farhin Farhad Riya, Shahinul Hoque, Yingyuan Yang, Jinyuan Sun, Kevin Tomsovic
Theory Efficient ML
  • Introduces a lightweight defense mechanism against FDIA in DNNs used in CPS.
  • Utilizes pseudo-feature padding to increase input dimensionality and complexity.
  • Model-agnostic approach requiring no modifications to existing DNN architectures.
  • Demonstrates significant improvements in robustness with minimal impact on performance.
Read more
Meta-Reinforcement Learning via Evolution for Multi-Objective Combinatorial Supply Chain Optimisation
Rifny Rachman, Bahrul Ilmi Nasution, Josh Tingey, Richard Allmendinger, Pradyumn Shukla, Wei Pan
Reinforcement Learning Optimization
  • MERLION combines population-based evolutionary search with gradient-based meta-learning for enhanced solution diversity.
  • The framework maintains multiple meta-policies, allowing for better exploration of the Pareto front in complex supply chain scenarios.
  • Empirical results show significant improvements in hypervolume and Pareto front approximation compared to traditional methods.
Read more
FLFL: Federated Latent Factor Learning for Private Recovery of Spatio-Temporal Signals
Chengjun Yu, Di Wu, Yi He, Jia Chen
Federated Learning Time Series Optimization
  • FLFL enables accurate recovery of missing data in WSNs while preserving privacy.
  • The model utilizes a federated learning framework that minimizes the need for raw data sharing.
  • Incorporation of spatio-temporal correlations improves recovery accuracy.
  • Extensive experiments show FLFL outperforms existing models in recovery tasks.
Read more
DCD-PFN: A Decoupling-Aware Foundation Model for Causal Discovery
Zhengkang Guan, Yikang Chen, Yi He, Yunze Tong, Zijing Hu, Haoyuan Qian, Fei Wu, Kun Kuang
Graph Learning Theory Efficient ML
  • DCD-PFN is specifically designed for explicit structural causal discovery, enabling efficient causal graph reconstruction.
  • The model employs a decoupling-based local-to-global approach, grounded in theoretical frameworks without restrictive assumptions.
  • DCD-PFN demonstrates strong robustness and zero-shot generalization capabilities across various datasets.
  • The model addresses computational bottlenecks associated with traditional causal discovery methods.
Read more
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen
NLP Large Language Models Theory
  • Latent Chain-of-Thought models face challenges due to weak learning signals from outcome supervision.
  • The dual collapse phenomenon involves gradient attenuation and representational drift in latent spaces.
  • Process supervision can be effectively decomposed into Trajectory and Space Supervision.
  • Generative reconstruction is more effective than geometric compression for preserving information capacity.
Read more
Breaking chains with trees: Deep learning with $O(\log N)$ parallel time complexity
Neeraj Mohan Sushma, Aditya Nagarsekar, Cabrel Teguemne Fokam, Robin Schiewer, Amit Kumar Pal, Anand Subramoney, David Kappel
Efficient ML Computer Vision NLP
  • HBLL allows training of deep neural networks without full backpropagation, improving scalability and parallelism.
  • The framework achieves O(log N) parallel time complexity, significantly enhancing computational efficiency.
  • HBLL demonstrates competitive performance on challenging benchmarks in vision classification and language modeling.
  • The method supports flexible inference by defining subnetworks based on hierarchical paths.
Read more
Bypassing Minimization Bias: A Shift-Invariant Variance Estimator for Off-Equilibrium Local Learning Coefficients
Yingjia Cai
Theory Optimization
  • Introduction of the Shift-Invariant Variance Estimator (SIVE) to bypass minimization bias in LLC estimation.
  • SIVE structurally eliminates the need for the local minimum by using variance and a noise-debiasing correction.
  • Controlled experiments validate SIVE's effectiveness in recovering geometric signals in off-equilibrium settings.
  • SIVE is scalable to deep neural networks, enabling real-time tracking of structural phase transitions during training.
Read more
PG-MAP: Joint MAP Optimization for Inference-Time Alignment of Diffusion and Flow-Matching Models
Ruolan Sun, Pawel Polak
Generative Models Optimization Multimodal
  • PG-MAP is the first framework to jointly optimize conditioning and latent variables during inference-time alignment.
  • The framework employs a forward-consistency coupling, allowing coordinated updates across modalities.
  • PG-MAP shows consistent improvements in alignment metrics across different diffusion models.
  • Human evaluations indicate a strong preference for outputs generated using PG-MAP compared to existing baselines.
Read more
Comparing Linear Probes with Mahalanobis Cosine Similarity
Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte
Interpretability Theory Large Language Models
  • MCS provides a near-perfect linear prediction of OOD AUROC across multiple models and datasets.
  • Theoretical proof establishes the linear relationship between MCS and OOD AUROC under specific conditions.
  • MCS outperforms ECS significantly in terms of correlation with probe performance.
  • The study identifies failure modes for the linearity of MCS and OOD AUROC, enhancing understanding of probe generalization.
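The summary does not define MCS. One plausible reading, sketched below, is cosine similarity taken under an inner product induced by a positive-definite matrix `M`. Whether the paper uses a feature covariance, its inverse, or something else for `M` is not stated here, so `M`, `quad`, and `mahalanobis_cosine` are illustrative assumptions.

```python
import math


def quad(u, M, v):
    """Bilinear form u^T M v for plain Python lists."""
    return sum(u[i] * M[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


def mahalanobis_cosine(u, v, M):
    """Cosine similarity of u and v under the inner product <a, b> = a^T M b.

    Assumption: M is symmetric positive definite, so both norms are positive.
    """
    return quad(u, M, v) / math.sqrt(quad(u, M, u) * quad(v, M, v))


# Two probe directions that are orthogonal in the Euclidean sense.
u = [1.0, 0.0]
v = [0.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[2.0, 1.0], [1.0, 2.0]]

euclidean = mahalanobis_cosine(u, v, identity)
weighted = mahalanobis_cosine(u, v, correlated)
```

With the identity matrix this reduces to ordinary (Euclidean) cosine similarity; with a non-diagonal `M`, directions that look orthogonal can still be similar once feature correlations are taken into account.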
Read more
Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET
Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis
Multimodal
  • Introduces a novel multimodal approach combining 3D MRI and PET for AD diagnosis.
  • Utilizes three fusion strategies and a sparsely gated Mixture-of-Experts classifier.
  • Achieves high classification accuracies across multiple diagnostic tasks.
  • Implements Grad-CAM for enhanced model interpretability.
Read more
DiT-Reward: Generative Representations for Text-to-Image Reward Modeling
Yuanming Yang, Guoqing Ma, Bo Wang, Yuan Zhang, Wei Tang, Chenyi Li, Haoyang Huang, Nan Duan
Generative Models Reinforcement Learning Multimodal
  • DiT-Reward effectively repurposes a pretrained text-to-image DiT as a reward model.
  • The method outperforms existing models like HPSv3 on multiple preference benchmarks.
  • A lightweight head can extract meaningful preference predictions even when the generative backbone is frozen.
  • Reward performance benefits from representations in the middle-to-late layers of the transformer.
Read more
When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage
Nafis Fuad Shahid
Federated Learning Computer Vision Theory
  • Quantifies the marginal-conditional coverage gap in federated CRC for medical segmentation.
  • Proposes a shrinkage-based federated CRC protocol that enhances prediction set efficiency.
  • Demonstrates that naive pooling of calibration scores can lead to critical failures in individual site coverage.
  • Identifies the necessity of finite-sample correction terms to avoid excessive violations.
Read more
One-Step Flow Matching for Generative Modeling of Path-Dependent Physical Fields
Yijing Zhou, Jasmin Jelovica
Generative Models
  • Introduction of a novel flow matching model for generating path-dependent stress fields.
  • Utilization of a transformer backbone for improved long-range dependency modeling.
  • Significant computational efficiency improvements over traditional finite element methods.
  • Ability to generate high-resolution fields in a single step without extensive sampling.
Read more
Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta
Theory Optimization
  • Introduces Riemannian sharpness as an invariant measure of flatness under reparametrization.
  • Establishes a connection between SGD's implicit bias and Riemannian flat minima through a derived SDE.
  • Demonstrates a PAC-Bayes generalization bound explicitly controlled by Riemannian sharpness.
  • Empirical validation shows Riemannian sharpness better predicts generalization than Euclidean sharpness.
Read more
VLA-FAIL: Efficient Task Failure Detection for Finetuned Vision-Language-Action Models
Florian Seligmann, Emiliyan Gospodinov, Enes Ulas Dincer, Gerhard Neumann
Robotics Efficient ML Multimodal
  • VLA-FAIL is a lightweight framework for detecting task failures in Vision-Language-Action models.
  • It combines two novel detection methods: LLMD for out-of-distribution state detection and ACC for action consistency monitoring.
  • The framework requires no failure data and incurs minimal computational overhead.
  • AUCPDT is introduced as a new metric to evaluate detection accuracy and latency.
Read more
The Cost Geometry of Belief: finite-resource inference under noisy observation
Laurent Caraffa
Theory
  • Introduces a cost geometry for beliefs based on optimal transport and Fisher information.
  • Establishes that certainty is an unattainable boundary in finite-resource inference.
  • Identifies three key results: a wall of certainty, an honesty condition, and a rigidity in belief geometries.
  • Demonstrates that the Gaussian distribution is the most hyperbolic belief in this framework.
Read more
Physiology-Aware CNN and Zero-Shot Multimodal LLMs for ECG Image Classification: A Comparative Study
Khalil Ahammad, Derek Abbott, Mohsen Dorraki
Computer Vision Large Language Models Multimodal
  • Physiology-aware CNN models outperform zero-shot multimodal LLMs in ECG image classification.
  • LeadGroupECG model effectively captures anatomical relationships among ECG leads.
  • CNN models achieved high ROC-AUC scores, indicating strong classification performance.
  • Zero-shot LLMs showed near-chance performance, highlighting limitations in ECG interpretation.
Read more
Spectral Retrieval-Augmented Time-Series Forecasting
Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le
Time Series
  • Introduction of SpecReTF, a novel retrieval-augmented forecasting architecture.
  • Combines frequency-domain analysis with recency-weighted pattern retrieval.
  • Unified similarity measure integrates Jensen–Shannon divergence and cosine similarity.
  • SpecReTF achieves state-of-the-art forecasting accuracy on benchmark datasets.
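How SpecReTF combines the two measures is not given in the summary. A minimal sketch, assuming a convex combination with a hypothetical weight `alpha`, and assuming the divergence is computed on normalized distributions (for example, normalized power spectra) while the cosine term is computed on raw vectors:

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combined_similarity(p, q, u, v, alpha=0.5):
    """Hypothetical blend: alpha * (1 - JSD) + (1 - alpha) * cosine.

    With base-2 logarithms the JSD lies in [0, 1], so (1 - JSD) is a
    similarity on the same scale.
    """
    return alpha * (1.0 - js_divergence(p, q)) + (1.0 - alpha) * cosine_similarity(u, v)


same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
score = combined_similarity([0.5, 0.5], [0.5, 0.5], [1.0, 2.0], [1.0, 2.0])
```

Identical inputs score close to 1 under this blend, and distributions with disjoint support reach the maximum divergence of 1 bit.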
Read more
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
Tianyi Li, Zhiqiang Shen
NLP Large Language Models Computer Vision
  • Introduces a scalable framework for linear mode connectivity in billion-parameter pretrained transformers.
  • Utilizes parameterized weight transformations and a dual learning procedure for effective model merging.
  • Achieves near-zero loss barriers on WikiText and maintains high accuracy on ImageNet during interpolation.
  • Demonstrates the importance of resolving parameter symmetries in enhancing model connectivity.
Read more
SamatNext v0.2-B: An Exploratory Study of RMS-Normalized Hybrid Decoders for Curriculum Retention in Small Code Models
Samat Zharassov
NLP Large Language Models
  • SamatNext v0.2-B demonstrates improved retention of prior capabilities compared to a standard Transformer baseline.
  • The hybrid architecture effectively balances retention and plasticity in curriculum learning settings.
  • Despite improvements, both models face challenges with catastrophic forgetting, particularly in early-stage syntax tasks.
  • The study emphasizes the importance of structured curriculum learning in training adaptive models.
Read more
Learning a Normal World Model for Few-Shot Boundary-Calibrated Abnormality Detection
Weizhi Nie, Weichao Liu, Weijie Wang, Yuting Su
Time Series
  • Introduces a normal world modeling framework for few-shot abnormality detection.
  • Develops an entropy-aware normal-world energy for quantitative evaluation of abnormality.
  • Demonstrates strong performance on the NASA C-MAPSS turbofan degradation benchmark.
  • Mechanistic validation tests confirm the model captures the structure of normal behavior.
Read more
ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation
Chen Lin, Kedi Chen, Wei Zhang
Large Language Models Reinforcement Learning NLP
  • Incorrect student-generated outputs can provide more valuable training signals than correct ones in OPD.
  • ReNIO introduces a prefix-computable reweighting method that emphasizes negative trajectories without needing final-answer labels.
  • The method leverages student-to-teacher probability ratios to identify and weight pivotal tokens leading to incorrect reasoning.
  • ReNIO shows substantial performance improvements in mathematical reasoning and code generation tasks.
Read more
Deep Learning for Soil Moisture Estimation: Fusing Satellite Data with Optimally-Lagged Meteorological Features
Adrian Canovas-Rodriguez, Aurora González Vidal, Antonio F. Skarmeta
Time Series
  • Optimal meteorological and inter-depth lags were identified using Cross-Correlation Function (CCF).
  • A per-pixel CNN model showed significant improvement in soil moisture prediction when combined with depth features.
  • The CNN-LSTM hybrid model achieved the best overall performance in held-out data evaluation.
  • Incorporating subsurface depth information was crucial for enhancing prediction accuracy.
Read more
Superhuman AI for Generals.io Using Self-Play Reinforcement Learning
Matej Straka, Viliam Lisý, Martin Schmid
Reinforcement Learning
  • Development of a high-speed JAX-native simulator for GENERALS.IO, enabling rapid training.
  • Creation of a superhuman AI agent that dominates the public leaderboard and defeats top human players.
  • Utilization of self-play reinforcement learning with a focus on sparse rewards and sample efficiency.
  • Identification of key training components, such as parameter EMA and top-advantage filtering, that enhance performance.
Read more
Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates
Lin Tang, Wei Zhang, Jing Li, Hongyu Chen, Ming Zhao, Yuxuan Wang
NLP Large Language Models Efficient ML
  • Formalizes adapter mergeability for LoRA, separating single-task utility from post-merge retention.
  • Introduces MergeProbe, a lightweight predictor that estimates mergeability based on early training signals.
  • Demonstrates improved retention in merging adapters across multiple domains compared to existing methods.
  • Shifts the merging process from a post-hoc evaluation to an anticipatory measurement problem.
Read more
Dynamic estimation of slowly varying sequences
Prashant Gokhale, Mikhail Khodak, Sandeep Silwal
Theory Efficient ML Optimization
  • Introduces a general framework for dynamic estimation of slowly varying sequences.
  • Develops an algorithm that adapts the estimation budget based on local changes, improving efficiency.
  • Achieves sharper estimation bounds compared to previous methods.
  • Demonstrates applicability to various mathematical problems, including matrix powers and PDEs.
Read more
Post-Training Speech Enhancement Language Models with Perceptual Rewards
Frédéric Berdoz, Luca A. Lanzendörfer, Antonis Asonitis, Roger Wattenhofer
Audio & Speech Reinforcement Learning Optimization
  • Introduction of a post-training stage for autoregressive speech enhancement models using GSPO.
  • Development of a composite reward system that combines multiple perceptual metrics to avoid reward hacking.
  • Achieves state-of-the-art performance on the DNS2020 and DNS5 benchmarks.
  • Human evaluations indicate a preference for multi-metric rewards over single-metric approaches.
Read more
Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression
Yong Yi Bay, Kathleen A. Yearick
Theory Efficient ML Optimization
  • Introduces KORE, a method for directly solving for the optimal hyperparameter in spline regression.
  • Establishes a closed-form relationship between resolution, bias, and variance based on classical approximation theory.
  • Demonstrates that KORE matches the accuracy of exhaustive search methods while significantly reducing computational costs.
  • Applies the method across multiple input dimensions and various datasets, showcasing its effectiveness in real-world scenarios.
Read more
How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural
Stuart Whipp
NLP Large Language Models Efficient ML
  • Introduces a novel measure of linear recoverability for Transformer FFN blocks using closed-form least squares.
  • Demonstrates that linear recoverability varies significantly across different FFN blocks and is a learned property rather than an architectural one.
  • Finds that residual nonlinearity is not well captured by low-order multiplicative models.
  • Highlights the potential for targeted compression of FFN blocks based on their linear recoverability profiles.
Read more
FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning
Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima
Robotics Computer Vision Efficient ML
  • Identification of a bottleneck trade-off in fixed-capacity LAMs affecting action alignment.
  • Introduction of retained-prefix training for variable-length latent actions.
  • FlexLAM outperforms traditional fixed-capacity LAMs across all evaluated token budgets.
  • Supports inference-time token-budget adjustments without retraining.
Read more
Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
Zewen Liu
Large Language Models NLP Theory
  • Introduces Contagion Networks to measure evaluator bias propagation in multi-agent LLM systems.
  • Establishes a Cross-Agent Contagion Matrix (ΓN) for quantifying bias spread across agents.
  • Identifies three propagation regimes and demonstrates that homogeneous agents have weaker contagion effects.
  • Finds that increasing evaluator committee size can reduce effective contagion by 72.4%.
Read more
Generative Robust Optimisation
Yuhui Yin, Vassilis M. Charitopoulos
Optimization Generative Models Theory
  • Introduces a framework for robust optimisation that uses deep generative models to define uncertainty sets.
  • Establishes a five-point evaluation framework for assessing neural network-based uncertainty sets.
  • Demonstrates the application of a Wasserstein Adversarial Autoencoder for generating uncertainty sets.
  • Shows that the proposed method can effectively handle complex data distributions in optimisation problems.
Read more
Protein contacts are already in the attention: a single-forward-pass alternative to the Categorical Jacobian
Rome Thorstenson
NLP Large Language Models Theory
  • The proposed method allows for protein contact prediction in a single forward pass, significantly reducing computational cost compared to the Categorical Jacobian.
  • Averaging a small subset of attention heads captures the relevant contact signal effectively, outperforming the CJ method on leakage-clean data.
  • The optimal number of attention heads to average varies by architecture and reflects how models distribute contact information.
  • The study introduces representation-CJ, extending the applicability of contact prediction methods to architectures without a masked-LM head.
Read more
From Handcrafted Features to Functional Edge Learning: Evolution of EEG Seizure Detection Frameworks
Sepideh Kheirollahi, Mohammad Rasoul Roshanshah
Time Series Interpretability Efficient ML
  • Deep Learning models for EEG analysis face significant challenges in clinical deployment due to their black-box nature and high data requirements.
  • Kolmogorov-Arnold Networks (KANs) offer a new paradigm by using learnable activation functions, enhancing interpretability and efficiency.
  • KANs are more robust to data scarcity and can facilitate cross-patient personalization without extensive retraining.
  • The paper provides a structured analysis of EEG seizure detection methodologies, highlighting the need for transparent and efficient models.
Read more
Asymptotic Signal Subspace Recovery in Softmax Attention Models
Lan V. Truong
Theory
  • Establishes a theoretical framework for understanding attention mechanisms in noisy environments.
  • Demonstrates that learned query vectors converge to the latent signal subspace under specific conditions.
  • Connects stochastic learning dynamics with deterministic limits using dynamical systems theory.
  • Provides insights into the positive-feedback mechanism of attention in identifying informative tokens.
Read more
Robustness Cannot be Reduced to Regularization: Studying Adversarial Training Beyond the Linear Case
David A. R. Robin, Rafael Pinot, Yann Chevaleyre
Theory Optimization Efficient ML
  • Adversarial training is effective but computationally expensive.
  • No equivalence between adversarial risk and regularized risk exists for two-layer networks.
  • The impossibility of reformulating adversarial risk extends to deeper architectures.
  • The study emphasizes the need for new methodologies in adversarial training beyond linear models.
Read more
Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States
Gihyun Lee, Thorben Menne, Simon Olma, Jakob Hilgert, Sangyoung Park
Time Series Efficient ML Theory
  • The study compares four neural network architectures for predicting internal battery states.
  • U-Net architecture shows superior performance with a 3% mean final-step nRMSE.
  • The proposed models significantly reduce inference latency, achieving a 5.38× speed-up over traditional numerical solvers.
  • Spatial inductive bias is identified as a critical factor influencing surrogate model performance.
Read more
Physics-Informed Discovery of Yield Functions in Plasticity via Convex Neural Representations
Hyeonbin Moon, Donghyuk Cho, Jecheon Yu, Jeong Whan Yoon, Seunghwa Ryu
Theory Interpretability Optimization
  • Introduces a physics-informed framework for yield function discovery from displacement and force data.
  • Utilizes a convex neural network to represent yield functions, ensuring convexity and symmetry.
  • Trains the neural yield function using force equilibrium residuals instead of direct stress supervision.
  • Validated against benchmark yield functions using finite element simulations.
Read more
Sakana Fugu Technical Report
Yujin Tang, Edoardo Cetin, Jinglue Xu, Qi Sun, Stefan Nielsen, Vincent Richard, Haruto Goda, Iaroslav Tymchenko, Nhan Nguyen, Hyunin Lee, Mari Ashiga, Shashank Kotyan, So Kuroki, Tarin Clanuwat
Large Language Models NLP Reinforcement Learning
  • Sakana Fugu combines the strengths of multiple LLMs to create a collectively intelligent system.
  • Two model variants are introduced: Fugu for speed and Fugu-Ultra for high-quality answers.
  • The training methodology includes fine-tuning, evolutionary algorithms, and reinforcement learning.
  • Fugu models achieve state-of-the-art performance on various benchmarks, outperforming many existing models.
Read more
Information Lattice Learning as Probabilistic Graphical Model Structure Learning
Haizi Yu, Lav R. Varshney
Theory Interpretability Graph Learning
  • ILL provides a framework for learning interpretable rules from signals, emphasizing low complexity.
  • The probabilistic rules learned through ILL can be interpreted as marginal constraints in PGMs.
  • The information lattice structure aids in understanding the relationships between different abstractions.
  • ILL distinguishes between general and special lifting, impacting the reconstruction of probability distributions.
Read more
When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents
Yanhang Li, Zhichao Fan, Zexin Zhuang
Multimodal
  • High AUC scores in probing do not guarantee effective detection of malicious content.
  • The authors propose a candidate control set to improve evaluation methodologies.
  • Two post-hoc diagnostics are introduced to differentiate between genuine and spurious detections.
  • The study highlights the risks of shortcut learning in model evaluations.
Read more
New Smooth Loss functions for Robust Regression that Closely Approximate Absolute Error and Provide Improved Performance on Datasets With Significant Outliers
Mathew Mithra Noel, Arindam Banerjee, Yug D. Oswal, Geraldine Bessie Amali D, Venkataraman Muthiah-Nakarajan
Optimization Theory
  • Introduction of two new loss functions (SRL and SMAE) for robust regression.
  • Both proposed loss functions are infinitely differentiable and closely approximate MAE.
  • Extensive empirical comparisons show superior performance of SRL and SMAE over traditional losses like Huber and Log-Cosh.
  • The paper presents new robust linear regression models utilizing the proposed loss functions.
Read more
Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection
Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki
Multimodal
  • cfDNA is a promising biomarker for non-invasive multi-cancer early detection.
  • The review categorizes computational methods into statistical, machine learning, and deep learning approaches.
  • Multimodal ensemble approaches are identified as having the highest readiness for clinical integration.
  • Standardization of evaluation protocols is crucial for future research and comparison.
Read more