AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

65 Papers today
8h Update frequency
7 Days of history
Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs
Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang
NLP Large Language Models Generative Models
  • Introduction of a masked data training paradigm for DLLMs that enhances reasoning capabilities.
  • Development of an Information Density Driven Smart Noise Scheduler that focuses on high-density information regions.
  • Implementation of Complementary Priority Masking to balance logical reasoning and syntactic structure in training.
  • Empirical results show a 4% accuracy improvement on reasoning benchmarks compared to traditional methods.
Read more
Cost Trade-offs in Matrix Inversion Updates for Streaming Outlier Detection
Florian Grivet, Louise Travé-Massuyès
Theory Efficient ML Optimization
  • Introduces the Christoffel function and its relevance to outlier detection.
  • Derives computational costs for three matrix inversion update methods: DI, ISM, and WMI.
  • Provides a simple rule for selecting the optimal update method based on matrix size and update rank.
  • Validates theoretical findings with comprehensive simulations.
Read more
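To make the rank-one trade-off concrete, here is a minimal NumPy sketch (our own illustration, not the paper's code; `sherman_morrison_update` is our name). The Sherman-Morrison identity refreshes an existing inverse after a rank-1 change in O(n²), versus O(n³) for re-inverting from scratch, which is the kind of cost comparison the paper's selection rule formalizes.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, via the Sherman-Morrison identity."""
    Au = A_inv @ u                      # O(n^2)
    vA = v @ A_inv                      # O(n^2)
    denom = 1.0 + v @ Au                # scalar; must be nonzero
    return A_inv - np.outer(Au, vA) / denom

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + 10 * np.eye(n)   # well-conditioned matrix
u, v = rng.standard_normal(n), rng.standard_normal(n)

A_inv = np.linalg.inv(A)
updated = sherman_morrison_update(A_inv, u, v)
direct = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(updated, direct))   # True: both give the same inverse
```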
Discovering the Hidden Role of Gini Index In Prompt-based Classification
Ruixi Lin
NLP Large Language Models Optimization
  • The Gini Index serves as a valuable tool for detecting and mitigating class-accuracy imbalances in classification tasks.
  • Significant relative accuracy imbalances exist in prompt-based classification for both text and image data.
  • A post-hoc bias mitigation method based on the Gini Index can effectively reduce accuracy disparities across classes.
  • The proposed method is model-agnostic, making it applicable to various classification scenarios without the need for retraining.
Read more
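As a quick illustration of the core quantity (our own formulation, not the paper's exact definition), the Gini coefficient of per-class accuracies measures how unevenly accuracy is spread across classes: 0 means perfectly even, larger values mean imbalance.

```python
import numpy as np

def gini(per_class_acc):
    """Gini coefficient of a list of per-class accuracies (0 = perfectly even)."""
    x = np.asarray(per_class_acc, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :]).sum()   # all ordered pairwise gaps
    return diffs / (2.0 * len(x) ** 2 * x.mean())

print(round(gini([0.9, 0.9, 0.9, 0.9]), 3))      # 0.0  -> balanced classes
print(round(gini([0.99, 0.95, 0.60, 0.40]), 3))  # 0.18 -> noticeable imbalance
```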
Simplex-to-Euclidean Bijection for Conjugate and Calibrated Multiclass Gaussian Process
Bernardo Williams, Harsha Vardhan Tetali, Arto Klami, Marcelo Hartmann
Theory Efficient ML
  • Introduces a conjugate and calibrated GP model for multi-class classification.
  • Utilizes Aitchison geometry to map class probabilities from simplex to Euclidean space.
  • Reduces the dimensionality of latent variables from K to D = K-1 for improved efficiency.
  • Compatible with sparse GP regression techniques for scalability.
Read more
On the (Generative) Linear Sketching Problem
Xinyu Yuan, Yan Qiao, Zonghui Wang, Wenzhi Chen
Generative Models Theory Efficient ML
  • Identifies orthogonal information loss as a key challenge in linear sketching.
  • Introduces FLORE, a generative sketching framework that enables high-quality recovery.
  • FLORE can be trained without ground-truth data, enhancing its applicability.
  • Demonstrates significant performance improvements over existing sketching methods.
Read more
In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks
Francesco Sovrano, Lidia Losavio, Giulia Vilone, Marc Langheinrich
Interpretability
  • Introduces in-context symbolic regression methods for improved operator extraction in KANs.
  • GSR and GMP methods enhance stability and robustness in symbolic regression.
  • GSR achieves up to 99.8% reduction in median OFAT test MSE.
  • GMP integrates operator selection into the training process, reducing computational costs.
Read more
Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models
Subina Khanal, Seshu Tirupathi, Merim Dzaferagic, Marco Ruffini, Torben Bach Pedersen
Time Series
  • Introduction of a millisecond-resolution dataset for high-frequency time series data from 5G networks.
  • Expansion of TSFM applicability to the wireless network domain.
  • Demonstration of poor performance of existing TSFMs on high-frequency data in both zero-shot and fine-tuned scenarios.
  • Highlighting the importance of high-frequency datasets for improving TSFM architectures and fine-tuning strategies.
Read more
Benchmarking Open-Source PPG Foundation Models for Biological Age Prediction
N. Brag
Time Series
  • AI-PPG Age model fails to generalize across different clinical populations.
  • Pulse-PPG outperforms AI-PPG Age in biological age prediction.
  • Fusing PPG embeddings with demographic data enhances prediction accuracy.
  • The PPG age gap correlates with cardiovascular risk factors.
Read more
PhasorFlow: A Python Library for Unit Circle Based Computing
Dibakar Sigdel, Namuna Panday
Theory Optimization Time Series
  • Introduction of PhasorFlow, a library for unit circle based computing.
  • Formalization of the Phasor Circuit model with a library of 22 gates.
  • Development of Variational Phasor Circuits for classical machine learning tasks.
  • Implementation of a DFT-based token mixing layer in the Phasor Transformer.
Read more
M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
Mayank Mishra, Shawn Tan, Ion Stoica, Joseph Gonzalez, Tri Dao
NLP Large Language Models Efficient ML
  • M2RNN introduces matrix-valued states and non-linear transitions for improved language modeling.
  • The architecture overcomes limitations of traditional non-linear RNNs by expanding state size efficiently.
  • Empirical results show significant performance gains in language modeling and state tracking.
  • Hybrid models incorporating M2RNN layers demonstrate superior accuracy with reduced state sizes.
Read more
Reconciling In-Context and In-Weight Learning via Dual Representation Space Encoding
Guanyu Chen, Ruichen Wang, Tianren Zhang, Feng Chen
NLP Large Language Models Theory
  • Introduces CoQE architecture to separate context and sample encoding.
  • Demonstrates that dual representation spaces can alleviate ICL-IWL conflict.
  • Provides theoretical and empirical validation of the proposed method.
  • Shows improved performance in both in-distribution and out-of-distribution scenarios.
Read more
Bootstrapped Physically-Primed Neural Networks for Robust T2 Distribution Estimation in Low-SNR Pancreatic MRI
Hadas Ben Atya, Nicole Abramenkov, Noa Mashiah, Luise Brock, Daphna Link Sourani, Ram Weiss, Moti Freiman
Theory
  • Introduces a bootstrap-based inference strategy for robust T2 distribution estimation.
  • Transforms deterministic T2 relaxometry networks into probabilistic ensemble predictors.
  • Demonstrates superior performance in low-SNR conditions compared to traditional methods.
  • Achieves significant improvements in clinical differentiation tasks, particularly for T1DM.
Read more
OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning
Hao Wu, Yongheng Zhang, Yuan Gao, Fan Xu, Fan Zhang, Ruobing Xie, Ruijian Gou, Yuxuan Liang, Xiaomeng Huang, Xian Wu
Multimodal Large Language Models Interpretability
  • OMNIFLOW is the first training-free framework for generalized fluid physical reasoning using LLMs.
  • It introduces a Semantic-Symbolic Alignment mechanism for better understanding of physical structures.
  • The Physics-Guided Chain-of-Thought workflow ensures adherence to physical laws during reasoning.
  • Empirical results show superior performance in zero-shot and few-shot tasks compared to traditional models.
Read more
Physics-integrated neural differentiable modeling for immersed boundary systems
Chenglin Li, Hang Xu, Jianting Chen, Yanfei Zhang
Theory Efficient ML Optimization
  • Introduces a physics-integrated framework for long-horizon prediction of immersed boundary flows.
  • Replaces traditional pressure projection with a learned implicit correction to reduce computational costs.
  • Employs a sub-iteration strategy to enhance stability during coarse-grid rollouts.
  • Achieves significant improvements in flow-field fidelity and long-horizon stability over existing models.
Read more
Generative Inverse Design with Abstention via Diagonal Flow Matching
Miguel de Campos, Werner Krebs, Hanno Gottschalk
Generative Models Optimization Theory
  • Introduction of Diagonal Flow Matching (Diag-CFM) to improve stability and accuracy in generative inverse design.
  • Development of two novel uncertainty metrics, Zero-Deviation and Self-Consistency, for assessing design reliability.
  • Demonstrated significant improvements in round-trip accuracy over existing methods across various design dimensions.
  • Practical capabilities include candidate selection, abstention from unreliable predictions, and out-of-distribution detection.
Read more
Behavioral Steering in a 35B MoE Language Model via SAE-Decoded Probe Vectors: One Agency Axis, Not Five Traits
Jia Qing Yap
Large Language Models Interpretability
  • Introduces SAE-decoded probe steering for behavioral intervention in a 35B MoE model.
  • Finds that all five behavioral traits primarily modulate a single agency axis.
  • Demonstrates a dissociation between correlation and causal efficacy in behavioral steering.
  • Establishes that behavioral commitments are computed during the prefill phase, not during autoregressive decoding.
Read more
Laya: A LeJEPA Approach to EEG via Latent Prediction over Reconstruction
Saarang Panchavati, Uddhav Panchavati, Corey Arnold, William Speier
Time Series
  • Laya is the first EEG foundation model utilizing the LeJEPA framework.
  • The model focuses on latent prediction instead of signal reconstruction to improve representation learning.
  • Laya shows improved performance over traditional reconstruction-based EEG models.
  • The use of explicit geometric regularization helps prevent representation collapse.
Read more
Grid-World Representations in Transformers Reflect Predictive Geometry
Sasha Brenner, Thomas R. Knösche, Nico Scherf
NLP Large Language Models Theory
  • Transformers develop internal representations that reflect the geometry of the latent world.
  • Optimal prediction in constrained random walks is based on a sufficient vector determined by position and time.
  • The study shows strong alignment between learned representations and ground-truth predictive vectors.
  • Low-dimensional representations emerge from training on stochastic processes.
Read more
Understanding Quantization of Optimizer States in LLM Pre-training: Dynamics of State Staleness and Effectiveness of State Resets
Kristi Topollai, Anna Choromanska
Optimization Large Language Models Efficient ML
  • Quantization of optimizer states can lead to state-update stalling, reducing responsiveness.
  • A predictive model for stalling helps understand when optimizer-state resets are beneficial.
  • Resetting stale optimizer states can recover performance in low-precision settings.
  • The timing of resets is crucial; applying them too early can discard useful averaging.
Read more
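State-update stalling is easy to reproduce in miniature (our own scalar toy, not the paper's low-precision Adam setup): keep an exponential moving average on a coarse quantization grid, and once the per-step change falls below half a grid step, requantization rounds every update away.

```python
import numpy as np

def quantize(x, step=0.1):
    """Round to the nearest multiple of `step` (a coarse, int-like grid)."""
    return float(np.round(x / step) * step)

# Track the same EMA twice: full precision vs. requantized after every update.
q_state, true_state = 1.0, 1.0
for _ in range(100):
    grad = 0.001                                 # persistently tiny gradient
    true_state = 0.9 * true_state + 0.1 * grad
    q_state = quantize(0.9 * q_state + 0.1 * grad)

# The quantized state decays 1.0 -> 0.9 -> ... -> 0.5 and then stalls: each
# further decrement rounds away, leaving it ~500x too large. Resetting it to
# 0.0 would let it re-converge from a representable value.
print(round(q_state, 3), round(true_state, 3))   # 0.5 0.001
```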
Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors
Peiyu Yang, Naveed Akhtar, Jiantong Jiang, Ajmal Mian
Interpretability Computer Vision Efficient ML
  • Introduces a rank-one model editing framework for rectifying unreliable neural network behaviors.
  • Develops an attribution-guided method for layer localization to identify the most affected layers.
  • Achieves effective model rectification with as few as one cleansed sample.
  • Demonstrates robustness against neural Trojans, spurious correlations, and feature leakage.
Read more
DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning
Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya
Reinforcement Learning Federated Learning Optimization
  • DeFRiS enables silo-cooperative scheduling while maintaining data privacy.
  • The framework employs an action-space-agnostic policy for seamless knowledge transfer.
  • It integrates a silo-optimized local learning mechanism to address sparse delayed rewards.
  • The Dual-Track Non-IID aggregation protocol enhances robustness against adversarial threats.
Read more
Trajectory-Optimized Time Reparameterization for Learning-Compatible Reduced-Order Modeling of Stiff Dynamical Systems
Joe Standridge, Daniel Livescu, Paul Cizmas
Optimization Theory Efficient ML
  • Introduces trajectory-optimized time reparameterization (TOTR) for ML-ROMs to address stiffness in dynamical systems.
  • TOTR formulates time reparameterization as an optimization problem, improving the learnability of neural ODEs.
  • Demonstrates significant improvements in training efficiency and prediction accuracy across multiple stiff benchmark problems.
  • Achieves loss reductions of one to two orders of magnitude compared to existing time reparameterization methods.
Read more
Trained Persistent Memory for Frozen Encoder–Decoder LLMs: Six Architectural Methods
Hong Jeong
NLP Large Language Models
  • Persistent memory can be integrated into frozen LLMs, allowing for information retention across sessions.
  • The study introduces six architectural methods for memory management, focusing on differentiable operations within the model's latent space.
  • Memory capacity significantly impacts the performance of the memory systems, with higher capacities leading to better recall.
  • The concept of conversational learning is introduced, where the model accumulates knowledge over multiple sessions.
Read more
MBD: A Model-Based Debiasing Framework Across User, Content, and Model Dimensions
Yuantong Li, Lei Yuan, Zhihao Zheng, Weimiao Wu, Songbin Liu, Jeong Min Lee, Ali Selman Aydin, Shaofeng Deng, Junbo Chen, Xinyi Zhang, Hongjing Xia, Sam Fieldman, Matthew Kosko, Wei Fu, Du Zhang, Peiyu Yang, Albert Jin Chung, Xianlei Qiu, Miao Yu, Zhongwei Teng, Hao Chen, Sunny Baek, Hui Tang, Yang Lv, Renze Wang, Qifan Wang, Zhan Li, Tiantian Xu, Peng Wu, Ji Liu
Theory Optimization Efficient ML
  • Introduction of a generalized debiasing framework that shifts from point-wise error minimization to distributional bias mitigation.
  • Utilization of a dual-prediction model architecture that integrates distribution modeling without requiring separate serving infrastructure.
  • Demonstrated effectiveness through large-scale deployment, improving long-term retention and engagement metrics.
  • Flexible definition of 'unbiasedness' allows for adaptation to various personalization objectives.
Read more
When Stability Fails: Hidden Failure Modes of LLMs in Data-Constrained Scientific Decision-Making
Nazia Riasat
Large Language Models NLP Theory
  • Stability in LLM outputs does not ensure correctness in scientific decision-making.
  • LLMs can produce consistent outputs while diverging from statistical ground truths.
  • Minor changes in prompt wording can significantly affect LLM outputs.
  • Relaxed statistical thresholds may lead to over-selection of gene candidates.
Read more
HIPO: Instruction Hierarchy via Constrained Reinforcement Learning
Keru Chen, Jun Luo, Sen Lin, Yingbin Liang, Alvaro Velasquez, Nathaniel Bastian, Shaofeng Zou
NLP Large Language Models Reinforcement Learning
  • HIPO formulates instruction hierarchy as a Constrained Markov Decision Process (CMDP), a novel approach in this domain.
  • The algorithm employs a primal-dual safe reinforcement learning method to ensure system prompt compliance while optimizing user utility.
  • Extensive experiments show that HIPO significantly enhances compliance and utility across various LLM architectures.
  • Attention analysis indicates that HIPO effectively reallocates model attention towards system instruction tokens.
Read more
Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards
Yuxuan Zhu, Daniel Kang
Reinforcement Learning Large Language Models
  • Noisy data significantly degrades the performance of RLVR models.
  • Existing algorithmic improvements fail to mitigate the impact of data noise.
  • Training on 100% incorrect annotations leads to performance similar to format-only rewards.
  • Real-world annotation errors in Text2SQL tasks further illustrate the destructive impact of noise.
Read more
The Importance of Being Smoothly Calibrated
Parikshit Gopalan, Konstantinos Stavropoulos, Kunal Talwar, Pranay Tankala
Theory
  • Introduces a new omniprediction guarantee for smoothly calibrated predictors.
  • Characterizes smooth calibration in terms of earth mover's distance to perfect calibration.
  • Demonstrates that upper distance to calibration cannot be estimated within a quadratic factor.
  • Unifies and extends previous results on omniprediction from smooth calibration.
Read more
Residual Stream Duality in Modern Transformer Architectures
Yifan Zhang
NLP Large Language Models Theory
  • Residual pathways in Transformers are crucial for representation, not just optimization.
  • A depth-wise residual attention read is equivalent to ShortSWA on the depth axis.
  • Existing models like ELC-BERT and DenseFormer illustrate the benefits of learned aggregation over depth.
  • Sequence-axis ShortSWA is generally more hardware-efficient than depth-axis aggregation.
Read more
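A minimal sketch of "learned aggregation over depth" (our own construction, loosely in the spirit of ELC-BERT/DenseFormer, not the paper's formulation): instead of the plain residual x_{l+1} = x_l + f_l(x_l), each layer reads a softmax-weighted combination of all earlier layer outputs.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, depth = 4, 3
W = [rng.standard_normal((d, d)) * 0.1 for _ in range(depth)]
# One learnable weight per earlier output, per layer (here: random stand-ins).
alpha = [rng.standard_normal(l + 1) for l in range(depth)]

x0 = rng.standard_normal(d)
outs = [x0]
for l in range(depth):
    # Depth-axis aggregation: weighted sum over all previous layer outputs.
    inp = sum(w * h for w, h in zip(softmax(alpha[l]), outs))
    outs.append(np.tanh(W[l] @ inp))
print(outs[-1].shape)   # (4,)
```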
Learning Lineage-guided Geodesics with Finsler Geometry
Aaron Zweig, Mingxuan Zhang, David A. Knowles, Elham Azizi
Time Series Theory Optimization
  • Introduction of a Finsler metric for trajectory inference that incorporates lineage information.
  • Formal proof of the well-defined local geometry induced by the proposed metric.
  • Demonstration of improved accuracy in trajectory interpolation tasks using the new metric.
  • Integration of discrete and directed priors enhances the modeling of biological systems.
Read more
Mechanistic Foundations of Goal-Directed Control
Alma Lago
Robotics Theory Interpretability
  • Extends mechanistic interpretability to embodied control systems using infant motor learning as a model.
  • Identifies critical parameters influencing the formation of causal control circuits.
  • Demonstrates a clean phase transition in arbitration mechanisms during learning.
  • Establishes a two-dimensional phase diagram for task-dependent route arbitration.
Read more
Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting
Ziqing Ma, Kai Ying, Xinyue Gu, Tian Zhou, Tianyu Zhu, Haifan Zhang, Peisong Niu, Wang Zheng, Cong Bai, Liang Sun
Multimodal Time Series Optimization
  • Introduces Baguan-solar, a two-stage multimodal framework for solar irradiance forecasting.
  • Combines global weather foundation model forecasts with high-resolution satellite imagery.
  • Achieves a 16.08% reduction in RMSE compared to strong baseline models.
  • Effectively resolves fine-scale cloud structures and improves long-term forecasting accuracy.
Read more
Grokking as a Variance-Limited Phase Transition: Spectral Gating and the Epsilon-Stability Threshold
Pratyush Acharya, Habish Dhakal
Optimization Theory
  • Introduction of the Spectral Gating mechanism that regulates the transition from memorization to generalization.
  • Identification of a stability condition that constrains grokking, requiring accumulated gradient variance to access the generalizing solution.
  • Categorization of three complexity regimes that describe different learning dynamics.
  • Refutation of the 'Flat Minima' hypothesis, emphasizing the role of anisotropic noise in achieving generalization.
Read more
Decoding the Critique Mechanism in Large Reasoning Models
Hoang Phan, Quang H. Nguyen, Hung T. Q. Le, Xiusi Chen, Heng Ji, Khoa D. Doan
Large Language Models Interpretability Theory
  • LRMs exhibit a hidden critique ability that allows for self-correction despite errors in intermediate reasoning.
  • Injecting arithmetic mistakes reveals that LRMs can still produce correct final answers, indicating internal error detection mechanisms.
  • A critique vector is identified that captures the model's ability to detect and correct errors without explicit verbalization.
  • Steering the critique vector improves the model's performance on error detection tasks without requiring extra training.
Read more
A federated learning framework with knowledge graph and temporal transformer for early sepsis prediction in multi-center ICUs
Yue Chang, Guangsen Lin, Jyun Jie Chuang, Shunqi Liu, Xinkui Li, Yaozheng Li
Federated Learning Graph Learning Time Series
  • Integration of federated learning with knowledge graphs and temporal transformers for sepsis prediction.
  • Preservation of patient privacy through decentralized model training without sharing raw data.
  • Significant performance improvements over traditional centralized and federated learning models.
  • Utilization of a knowledge graph to enhance model interpretability and capture complex clinical relationships.
Read more
PiGRAND: Physics-informed Graph Neural Diffusion for Intelligent Additive Manufacturing
Benjamin Uhrich, Tim Häntschel, Erhard Rahm
Graph Learning
  • Introduction of PiGRAND, a physics-informed graph neural diffusion framework for heat transport modeling.
  • Development of an efficient graph construction method for transforming thermal images into graph data.
  • Integration of physical principles and sub-learning models to enhance prediction accuracy.
  • Significant improvements in computational performance and accuracy over traditional methods.
Read more
SemRep: Generative Code Representation Learning with Code Transformations
Weichen Li, Jiamin Song, Bogdan Alexandru Stoica, Arav Dhoot, Gabriel Ryan, Shengyu Fu, Kexin Pei
Large Language Models Generative Models Optimization
  • SEMREP improves code transformation by learning generative code representations.
  • The framework utilizes semantics-preserving transformations as intermediate representations.
  • SEMREP outperforms existing methods in correctness, performance, generalization, and robustness.
  • The approach is particularly effective for evolutionary search in code optimization.
Read more
PLUME: Building a Network-Native Foundation Model for Wireless Traces via Protocol-Aware Tokenization
Swadhin Pradhan, Shazal Irshad, Jerome Henry
NLP Large Language Models Efficient ML
  • Plume is a 140M-parameter foundation model tailored for 802.11 wireless traces, emphasizing protocol-aware tokenization.
  • The model achieves higher accuracy and efficiency compared to larger generalist LLMs, with >600× fewer parameters.
  • A novel protocol- and timing-aware tokenizer enhances information density and reduces sequence length significantly.
  • Plume supports on-premises deployment, ensuring privacy and facilitating root cause analysis without reliance on external APIs.
Read more
Joint Routing and Model Pruning for Decentralized Federated Learning in Bandwidth-Constrained Multi-Hop Wireless Networks
Xiaoyu He, Weicai Li, Tiejun Lv, Xi Yu
Federated Learning Optimization Efficient ML
  • Introduction of a joint routing and model pruning framework for D-FL.
  • Establishment of convergence guarantees under communication constraints.
  • Development of a routing algorithm that improves multi-hop transmission efficiency.
  • Significant reductions in transmission latency and improvements in model accuracy demonstrated through simulations.
Read more
Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention
Jeffrey D. Varner
Generative Models
  • Introduces stochastic attention (SA) for training-free protein sequence generation.
  • SA effectively samples from a Boltzmann distribution derived from stored protein sequences.
  • Generates sequences with low KL divergence and high structural plausibility.
  • Outperforms traditional generative models in maintaining sequence identity to canonical family folds.
Read more
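A loose toy rendering of the idea (our own reading of the abstract; the actual scoring and sampling in the paper differ): generate a sequence position by position by Boltzmann-sampling a source sequence from a stored alignment, weighted by agreement with what has been generated so far, with no training involved.

```python
import numpy as np

rng = np.random.default_rng(0)
msa = ["MKTAY", "MKSAY", "MRTAF", "MKTAH"]          # toy family alignment
L = len(msa[0])
beta = 2.0                                          # inverse temperature

seq = []
for pos in range(L):
    # Score each stored sequence by agreement with the partial sample so far.
    scores = np.array(
        [sum(s[i] == c for i, c in enumerate(seq)) for s in msa], dtype=float
    )
    w = np.exp(beta * scores)
    w /= w.sum()                                    # Boltzmann weights
    k = rng.choice(len(msa), p=w)                   # stochastically attend to one source
    seq.append(msa[k][pos])
print("".join(seq))
```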
A Stability-Aware Frozen Euler Autoencoder for Physics-Informed Tracking in Continuum Mechanics (SAFE-PIT-CM)
Emil Hovad
Computer Vision Theory Interpretability
  • Introduces SAFE-PIT-CM, an autoencoder that integrates a frozen PDE operator for stability-aware latent-space transitions.
  • The SAFE operator enables accurate parameter recovery by addressing numerical stability issues in temporal data.
  • Supports zero-shot inference, allowing learning from a single simulation without ground-truth labels.
  • Demonstrates effectiveness on the heat equation and reverse heat equation, recovering transport coefficients accurately.
Read more
Adaptive regularization parameter selection for high-dimensional inverse problems: A Bayesian approach with Tucker low-rank constraints
Qing-Mei Yang, Da-Qing Zhang
Efficient ML Theory Computer Vision
  • Introduces a variational Bayesian method with Tucker decomposition for high-dimensional inverse problems.
  • Employs adaptive regularization through per-mode precision parameters for anisotropic structures.
  • Estimates noise levels from data, eliminating reliance on prior noise knowledge.
  • Demonstrates consistent performance improvements over traditional methods in various experimental settings.
Read more
Dynamic Meta-Layer Aggregation for Byzantine-Robust Federated Learning
Reek Das, Biplab Kanti Sen
Federated Learning
  • FedAOT introduces a meta-learning-based aggregation strategy that dynamically assigns client weights to enhance robustness against Byzantine attacks.
  • The adaptive optimization mechanism reduces the impact of unreliable updates without relying on prior attack assumptions.
  • FedAOT provides a unified defense against untargeted poisoning and label-flipping attacks in non-IID and heterogeneous data distributions.
  • Empirical evaluations show that FedAOT outperforms existing Byzantine-robust approaches in terms of resilience and convergence.
Read more
Generalization and Memorization in Rectified Flow
Mingxing Rao, Daniel Moyer
Generative Models Theory Efficient ML
  • Development of three test statistics for Membership Inference Attacks tailored for Rectified Flow models.
  • Significant performance improvements in MIA metrics, indicating enhanced understanding of memorization dynamics.
  • Identification of a peak susceptibility to MIA at the midpoint of integration during training.
  • Proposed substitution of uniform timestep sampling with a Symmetric Exponential distribution to mitigate memorization risks.
Read more
HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification
Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, Kalyan Ramakrishnan, Flaviu Cipcigan, Philip Torr, Alessandro Abate
Theory Optimization Large Language Models
  • HorizonMath provides a benchmark of over 100 unsolved mathematical problems, immune to data contamination.
  • The framework automates verification of solutions using high-precision numerical comparisons.
  • The benchmark aims to measure AI's capability for novel mathematical discovery rather than just problem-solving.
  • Initial results show that GPT 5.4 Pro proposed solutions that may improve upon existing mathematical results.
Read more
FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
Zhenghang Song, Tang Qian, Lu Chen, Yushuai Li, Zhengke Hu, Bingbing Fang, Yumeng Song, Junbo Zhao, Sheng Zhang, Tianyi Li
Efficient ML Theory Optimization
  • FEAT introduces a linear-complexity model for structured data, overcoming the limitations of quadratic self-attention.
  • The architecture combines local and global modeling strategies to maintain expressive representations.
  • Robustness is enhanced through a hybrid structural causal model and stable reconstruction objectives.
  • FEAT shows significant improvements in zero-shot performance and inference speed on real-world datasets.
Read more
Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning
Jello Zhou, Vudtiwat Ngampruetikorn, David J. Schwab
Reinforcement Learning Optimization Theory
  • Stochastic resetting accelerates policy convergence in reinforcement learning environments.
  • Resetting improves learning efficiency even when it does not reduce search time for a random walker.
  • The mechanism of resetting biases learning towards shorter trajectories and efficient reward propagation.
  • Stochastic resetting preserves the optimal policy while enhancing convergence speed.
Read more
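The mechanism can be sketched with tabular Q-learning on a tiny chain (our own toy, not the paper's experiments): at every step the agent is teleported back to the start with probability `reset_p`, yet the learned greedy policy is unaffected.

```python
import numpy as np

def q_learning_chain(reset_p, episodes=500, seed=0):
    """Q-learning on a 5-state chain (goal at state 4) with stochastic resets."""
    rng = np.random.default_rng(seed)
    n_states, goal = 5, 4
    Q = np.ones((n_states, 2))               # optimistic init; actions: 0=left, 1=right
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            # epsilon-greedy action selection
            a = int(rng.integers(2)) if rng.random() < 0.2 else int(Q[s].argmax())
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            Q[s, a] += 0.5 * (r + 0.9 * Q[s_next].max() - Q[s, a])
            if s_next == goal:
                break
            # Stochastic resetting: occasionally teleport back to the start.
            s = 0 if rng.random() < reset_p else s_next
    return Q

Q = q_learning_chain(reset_p=0.1)
print((Q[:4].argmax(axis=1) == 1).all())     # greedy policy: always move right
```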
RaDAR: Relation-aware Diffusion-Asymmetric Graph Contrastive Learning for Recommendation
Yixuan Huang, Jiawei Chen, Shengfan Zhang, Zongsheng Cao
Graph Learning
  • RaDAR addresses structural semantics degradation and limited relational expressiveness in recommendation systems.
  • The framework combines graph generative and relation-aware denoising models for enhanced robustness.
  • Innovations include asymmetric contrastive learning and diffusion-guided augmentation.
  • Extensive experiments show superior performance over existing methods in sparse and noisy conditions.
Read more
Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function
Amon Lahr, Anna Scampicchio, Johannes Köhler, Melanie N. Zeilinger
Theory Optimization Robotics
  • Introduces a tight, distribution-free uncertainty bound for multivariate kernel regression.
  • Addresses limitations of existing bounds that are either conservative or difficult to apply in multi-output cases.
  • Utilizes a Gaussian process-based dual function framework for deriving the uncertainty bounds.
  • Demonstrates the application of the proposed method through a quadrotor dynamics learning example.
Read more
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Christian Belardi, Justin Lovelace, Kilian Q. Weinberger, Carla P. Gomes
Generative Models Computer Vision Optimization
  • Adaptive moment estimation significantly stabilizes noisy likelihood scores in guided diffusion sampling.
  • The proposed method achieves state-of-the-art results in image restoration and class-conditional generation tasks.
  • Performance improves consistently across varying task difficulties, suggesting robustness.
  • The approach is simple and computationally efficient compared to more complex alternatives.
Read more
Manifold-Matching Autoencoders
Laurent Cheret, Vincent Létourneau, Isar Nejadgholi, Chris Drummond, Hussein Al Osman, Maia Fraser
Theory Efficient ML
  • Introduction of Manifold-Matching Autoencoder (MMAE) for unsupervised dimensionality reduction.
  • MMAE aligns pairwise distances in latent space with input data distances using mean squared error.
  • Demonstrated superior performance in preserving topological features compared to existing methods.
  • MMAE provides a scalable alternative to Multi-Dimensional Scaling (MDS).
Read more
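The distance-matching objective can be written in a few lines (our own minimal formulation, not the authors' code): penalize the mean squared mismatch between all pairwise distances in input space and in latent space. An isometry (e.g. a rotation) gives zero loss; a real encoder maps to fewer dimensions and can only approximate this.

```python
import numpy as np

def pairwise_dists(X):
    """All pairwise Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def manifold_matching_loss(X, Z):
    """Mean squared mismatch between input-space and latent-space distances."""
    return float(((pairwise_dists(X) - pairwise_dists(Z)) ** 2).mean())

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))
R, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random rotation
print(manifold_matching_loss(X, X @ R) < 1e-12)    # True: rotations preserve distances
```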
Novelty-Driven Target-Space Discovery in Automated Electron and Scanning Probe Microscopy
Utkarsh Pratiush, Kamyar Barakati, Boris N. Slautin, Catherine C. Bodinger, Christopher D. Lowe, Brandi M. Cossairt, Sergei V. Kalinin
Optimization Robotics Theory
  • Introduction of the BEACON framework for novelty-driven exploration in microscopy.
  • Benchmarking against classical acquisition strategies to evaluate exploration quality.
  • Successful transition from offline validation to real experimental implementation.
  • Provision of reproducible notebooks for community use and adaptation.
Read more
Unlearning-based sliding window for continual learning under concept drift
Michał Wozniak, Marek Klonowski, Maciej Maczynski, Bartosz Krawczyk
Computer Vision Theory Efficient ML
  • Introduces UIL, a framework that combines machine unlearning with continual learning to address concept drift.
  • Demonstrates that unlearning outdated data followed by incremental adaptation can be computationally efficient.
  • Empirical results show UIL's effectiveness in image classification tasks with concept drift.
  • Establishes a theoretical foundation connecting machine unlearning and concept drift mitigation.
Read more
SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds
Viktor Stein, Wuchen Li, Gabriele Steidl
NLP Large Language Models Optimization
  • Introduction of accelerated attention blocks based on inertial dynamics.
  • Tokens are represented with both spatial features and velocity variables.
  • Demonstrated faster convergence rates than classical attention mechanisms.
  • Preservation of elliptically contoured probability distributions.
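The position-plus-velocity token representation in the second bullet amounts to a heavy-ball (momentum) discretization of attention dynamics. The following is a hedged sketch of that idea, not SympFormer itself; `step` and `damping` are illustrative hyperparameters of our own:

```python
import numpy as np

def attention_drift(x, beta=1.0):
    """Softmax self-attention force: each token is pulled toward a
    similarity-weighted average of all tokens."""
    logits = beta * x @ x.T
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x - x

def inertial_attention_step(x, v, step=0.1, damping=0.5):
    """One heavy-ball update: the velocity variable accumulates the
    attention force, accelerating convergence of the token dynamics."""
    v = (1 - damping) * v + step * attention_drift(x)
    return x + v, v
```

Iterating this update contracts the tokens toward a consensus configuration faster than the plain (velocity-free) gradient flow would.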
Read more
More Test-Time Compute Can Hurt: Overestimation Bias in LLM Beam Search
Gal Dalal, Assaf Hallak, Gal Chechik, Yftach Ziser
NLP Large Language Models Theory
  • Wider beam search can introduce systematic overestimation bias that degrades output quality.
  • The maximum useful beam width (k̂) is determined by the signal-to-noise ratio of the scoring mechanism.
  • Perplexity scoring shows no benefit at any beam width, while PRM scoring can yield significant performance gains.
  • A principled approach for beam width selection is proposed, focusing on output quality rather than inference efficiency.
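The overestimation bias in the first bullet is a winner's-curse effect that a tiny simulation makes concrete. In the toy model below (ours, not the paper's setup), every candidate has the same true value, so the gap between the selected score and the truth is pure bias from taking a max over noisy estimates, and it grows with beam width:

```python
import random

def selection_bias(beam_width, noise_sd, trials=2000, seed=0):
    """Average (estimated score - true value) of the argmax candidate.
    All candidates share true value 0 and the scorer adds Gaussian
    noise, so any positive gap is overestimation from maximizing."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        scores = [rng.gauss(0.0, noise_sd) for _ in range(beam_width)]
        gap += max(scores)  # true value of every candidate is 0
    return gap / trials
```

A noisier scorer (lower signal-to-noise ratio) inflates this bias, which is why the useful beam width k̂ shrinks as scoring noise grows.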
Read more
FederatedFactory: Generative One-Shot Learning for Extremely Non-IID Distributed Scenarios
Andrea Moleri, Christian Internò, Ali Raza, Markus Olhofer, David Klindt, Fabio Stella, Barbara Hammer
Federated Learning Generative Models Computer Vision
  • FederatedFactory recovers centralized performance under extreme single-class silo conditions.
  • The framework operates without dependencies on external pretrained models, relying solely on localized generative priors.
  • It achieves one-shot communication efficiency, reducing the need for iterative updates.
  • The framework allows for exact modular unlearning, enhancing privacy and data management.
Read more
GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators
Mattia Rigotti, Nicholas Thumiger, Thomas Frick
Graph Learning Theory Efficient ML
  • GIST achieves end-to-end O(N) complexity using random projections.
  • The architecture preserves gauge invariance through inner-product-based attention.
  • GIST enables discretization-invariant learning, facilitating parameter transfer across different mesh resolutions.
  • Empirical results demonstrate state-of-the-art performance on both graph and mesh-based benchmarks.
Read more
Multimodal Deep Learning for Early Prediction of Patient Deterioration in the ICU: Integrating Time-Series EHR Data with Clinical Notes
Binesh Sadanandan
Multimodal Time Series NLP
  • Introduces a multimodal deep learning model that combines structured EHR data with clinical notes for predicting ICU patient deterioration.
  • Achieves a test AUROC of 0.7857, outperforming traditional models that rely solely on structured data.
  • Demonstrates that clinical notes significantly enhance predictive performance, improving AUROC by 2.5 percentage points.
  • Provides a systematic review of 31 studies, revealing gaps in the integration of clinical text in existing models.
Read more
Online Semi-infinite Linear Programming: Efficient Algorithms via Function Approximation
Yiming Zong, Jiashuo Jiang
Optimization Theory Efficient ML
  • Introduces a novel formulation for Online Semi-infinite Linear Programming (OSILP) using function approximation.
  • Establishes regret bounds that are independent of the number of constraints, enhancing scalability.
  • Develops a two-stage algorithm that achieves improved regret bounds under specific assumptions.
  • Demonstrates superior performance of the proposed algorithms in experiments compared to existing methods.
Read more
Age Predictors Through the Lens of Generalization, Bias Mitigation, and Interpretability: Reflections on Causal Implications
Debdas Paul, Elisa Ferrari, Irene Gravili, Alessandro Cellerino
Theory Interpretability
  • Chronological age predictors often struggle with out-of-distribution generalization due to bias from exogenous attributes.
  • The paper introduces a framework for learning invariant representations to mitigate bias and enhance fairness.
  • An interpretable neural network model based on adversarial representation learning is proposed and evaluated.
  • Results show consistency with existing studies, reinforcing the model's predictive capabilities.
Read more
Federated Learning for Privacy-Preserving Medical AI
Tin Hoang
Federated Learning
  • Proposes a site-aware data partitioning strategy for realistic federated learning scenarios.
  • Introduces an Adaptive Local Differential Privacy mechanism to enhance privacy-utility trade-off.
  • Demonstrates that FedProx can match or exceed centralized training performance while ensuring privacy.
  • Achieves up to 80.4% accuracy in Alzheimer's classification with improved training stability.
Read more
Self-Indexing KVCache: Predicting Sparse Attention from Compressed Keys
Xu Yang, Jiapeng Zhang, Dongyang Zhao, Guo Chen, Zhuo Tang
NLP Large Language Models Efficient ML
  • Introduces a unified optimization paradigm for KV cache management that integrates compression and sparsity.
  • Develops a sign-based 1-bit vector quantization scheme for efficient token retrieval in compressed domains.
  • Eliminates the need for external indices, reducing memory overhead and improving scalability.
  • Demonstrates compatibility with existing frameworks, ensuring low latency and high performance.
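The sign-based 1-bit quantization in the second bullet can be sketched as follows. This is an illustrative reading of the mechanism, not the paper's code: keys are compressed to one bit per dimension, and sign agreement with the query serves as a cheap proxy for the dot product when preselecting tokens for sparse attention:

```python
import numpy as np

def sign_quantize(keys):
    """1-bit code per dimension: keep only the sign of each key entry."""
    return (keys >= 0).astype(np.uint8)

def approx_top_tokens(query, key_codes, k):
    """Score cached tokens by sign agreement between the query and the
    1-bit key codes (a Hamming-style proxy for the dot product) and
    return the indices of the k best-matching tokens."""
    q_code = (query >= 0).astype(np.uint8)
    agreement = (key_codes == q_code).sum(axis=1)
    return np.argsort(-agreement)[:k]
```

Because the codes double as the retrieval index, no separate index structure is needed, which matches the third bullet's memory-overhead claim.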
Read more
Evidential Domain Adaptation for Remaining Useful Life Prediction with Incomplete Degradation
Yubo Hou, Mohamed Ragab, Yucheng Wang, Min Wu, Abdulla Alseiari, Chee-Keong Kwoh, Xiaoli Li, Zhenghua Chen
Time Series
  • EviAdapt addresses the limitations of existing domain adaptation methods in RUL prediction with incomplete degradation data.
  • The method segments data into distinct degradation stages for accurate stage-wise alignment.
  • Evidential uncertainty alignment is introduced to handle varying degradation patterns across domains.
  • Extensive experiments show EviAdapt significantly outperforms existing methods.
Read more
Introducing Feature-Based Trajectory Clustering, a clustering algorithm for longitudinal data
Marie-Pierre Sylvestre, Laurence Boulanger
Time Series
  • Introduction of Feature-Based Trajectory Clustering (FBTC) for longitudinal data.
  • Two-step methodology: feature extraction followed by clustering using Spectral Clustering.
  • Utilization of twenty trajectory measures to represent and cluster time-dependent variables.
  • Demonstration of FBTC's effectiveness on various datasets, showcasing its clustering capabilities.
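Step one of the methodology, extracting summary measures per trajectory, can be sketched with a small illustrative subset of the twenty measures (level, trend, and spread; the selection and names here are ours). In step two, these feature vectors would be fed to Spectral Clustering:

```python
import statistics

def trajectory_features(y):
    """A few per-trajectory summary measures of the kind FBTC clusters
    on: overall level, linear trend, and range (illustrative subset)."""
    n = len(y)
    xs = list(range(n))
    mx, my = statistics.fmean(xs), statistics.fmean(y)
    # Least-squares slope of y against time index.
    cov = sum((x - mx) * (v - my) for x, v in zip(xs, y))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var if var else 0.0
    return {"mean": my, "slope": slope, "range": max(y) - min(y)}
```

Two trajectories with similar shapes map to nearby feature vectors, so clustering in feature space groups time-dependent variables by their longitudinal behavior.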
Read more
Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism
Kaixuan Du, Meng Cao, Hang Zhang, Yukun Wang, Xiangzhou Huang, Ni Li
Reinforcement Learning Large Language Models NLP
  • DCRL mitigates spurious majority bias through a two-stage vote mechanism.
  • The method operates entirely without external models or supervision.
  • It generates more reliable learning signals by balancing dominant and diverse responses.
  • Extensive experiments show consistent performance improvements across multiple benchmarks.
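One plausible toy reading of a two-stage vote that escapes a spurious majority: stage one is a plain majority vote over sampled responses; stage two revotes with duplicate reasoning chains collapsed, so a bloc of near-identical generations counts once. This sketch is our illustration only; DCRL's actual mechanism is not specified in the summary above:

```python
from collections import Counter

def dual_consensus(responses):
    """Two-stage vote over (reasoning, answer) pairs. Stage 1: majority
    vote over all samples. Stage 2: revote after deduplicating identical
    reasoning chains, balancing dominant and diverse responses. Returns
    (final answer, whether the two stages agree)."""
    stage1 = Counter(ans for _, ans in responses).most_common(1)[0][0]
    deduped = {reasoning: ans for reasoning, ans in responses}
    stage2 = Counter(deduped.values()).most_common(1)[0][0]
    return stage2, stage1 == stage2
```

Disagreement between the stages flags an unreliable consensus, which could serve as the kind of filtered learning signal the third bullet describes.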
Read more