AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

69 Papers today
8h Update frequency
7 Days of history
Towards Green Wearable Computing: A Physics-Aware Spiking Neural Network for Energy-Efficient IMU-based Human Activity Recognition
Naichuan Zheng, Hailun Xia, Zepeng Sun, Weiyi Li, Yinze Zhou
Efficient ML Time Series Robotics
  • Introduction of PAS-Net, a multiplier-free spiking neural network tailored for wearable IMU-based HAR.
  • Adaptive topology and dynamic thresholding improve energy efficiency and responsiveness to non-stationary movements.
  • Achieves state-of-the-art accuracy while reducing energy consumption by up to 98% through an early-exit mechanism.
  • Addresses limitations of traditional DNNs in terms of computational demands and latency in wearable devices.
Read more
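The early-exit idea in the last bullet is a general pattern: intermediate classifier heads emit predictions, and inference stops as soon as one is confident. A minimal sketch, assuming toy probability-emitting heads and a hypothetical confidence threshold, not PAS-Net's actual exit criterion:

```python
import numpy as np

def early_exit_predict(x, heads, threshold=0.9):
    """Run classifier heads in order; stop at the first confident one.

    heads: list of callables mapping input -> probability vector.
    Returns (predicted_class, exit_index). Falls through to the last
    head if no earlier head is confident enough.
    """
    for i, head in enumerate(heads):
        probs = head(x)
        if np.max(probs) >= threshold or i == len(heads) - 1:
            return int(np.argmax(probs)), i

# Toy heads: the first is uncertain, the second is confident.
head_a = lambda x: np.array([0.55, 0.45])   # below threshold -> keep going
head_b = lambda x: np.array([0.95, 0.05])   # confident -> exit here
pred, exit_at = early_exit_predict(None, [head_a, head_b], threshold=0.9)
```

Energy savings come from the passes that exit early: only the heads actually evaluated cost compute.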
Classification of Epileptic iEEG using Topological Machine Learning
Sunia Tanweer, Narayan Puthanmadam Subramaniyam, Firas A. Khasawneh
Time Series
  • Topological data analysis (TDA) improves classification of epileptic states from iEEG signals.
  • The study utilizes a larger dataset of 55 patients, enhancing the robustness of findings.
  • Dimension-reduced topological features achieve competitive accuracy compared to deep learning models.
  • Classical machine learning methods can effectively classify seizure states with reduced complexity.
Read more
End-to-end Automated Deep Neural Network Optimization for PPG-based Blood Pressure Estimation on Wearables
Francesco Carlucci, Giovanni Pollo, Xiaying Wang, Massimo Poncino, Enrico Macii, Luca Benini, Sara Vinco, Alessio Burrello, Daniele Jahier Pagliari
Optimization Efficient ML Time Series
  • Introduction of an automated DNN optimization pipeline for PPG-based BP estimation.
  • Achieved significant parameter reduction while maintaining accuracy suitable for wearables.
  • Models fit within stringent memory constraints of wearable devices.
  • Patient-specific fine-tuning can greatly enhance model accuracy.
Read more
K-STEMIT: Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network for Subsurface Stratigraphy Thickness Estimation from Radar Data
Zesheng Liu, Maryam Rahnemoonfar
Graph Learning Time Series Efficient ML
  • K-STEMIT combines geometric spatial learning with temporal convolution for improved thickness estimation.
  • The model incorporates physical data to enhance predictions and reduce noise sensitivity.
  • Adaptive feature fusion dynamically integrates features from multiple branches, improving accuracy.
  • K-STEMIT outperforms existing methods in both knowledge-informed and non-knowledge-informed settings.
Read more
Uncertainty Quantification in CNN Through the Bootstrap of Convex Neural Networks
Hongfei Du, Emre Barut, Fang Jin
Theory Efficient ML Computer Vision
  • Introduces a novel bootstrap framework for uncertainty quantification in CNNs.
  • Establishes theoretical consistency for predictions from bootstrap convex neural networks.
  • Integrates transfer learning to extend applicability to arbitrary neural networks.
  • Demonstrates superior performance compared to existing UQ methods on multiple datasets.
Read more
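Bootstrap-based uncertainty quantification, in its generic form, refits a model on many resamples and reads an interval off the spread of predictions. A sketch of that generic recipe, with a toy constant regressor standing in for the paper's convex neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_interval(fit, X, y, x_new, B=200, alpha=0.1):
    """Train `fit` on B bootstrap resamples and return an empirical
    (1 - alpha) interval over the resulting predictions at x_new."""
    n = len(y)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        model = fit(X[idx], y[idx])        # fit returns a predict fn
        preds.append(model(x_new))
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Toy model: predict the training mean (a 'constant' regressor).
fit_mean = lambda X, y: (lambda x_new: float(np.mean(y)))
X = np.arange(100.0)
y = X + rng.normal(0, 1, 100)
lo, hi = bootstrap_interval(fit_mean, X, y, x_new=50.0)
```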
Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees
Zhuolun Dong, Junyu Cao
Large Language Models Theory Efficient ML
  • Proposes a flow-control framework for LLM inference to enhance stability and performance.
  • Derives a necessary condition for system stability related to workload and memory capacity.
  • Introduces a new scheduling algorithm that manages request activation to prevent memory overflow.
  • Demonstrates superior performance in empirical tests against benchmark algorithms.
Read more
Disposition Distillation at Small Scale: A Three-Arc Negative Result
Hari Sadasivan
NLP Large Language Models Interpretability
  • Three independent methods failed to instill behavioral dispositions in small language models without damaging content quality.
  • Initial positive results were falsified upon re-evaluation, demonstrating the importance of rigorous testing protocols.
  • A new taxonomy of failure modes for linear probes is introduced, highlighting the challenges of behavioral editing at small scales.
  • The study reveals a significant decoupling of confidence and correctness in model outputs, raising concerns about trustworthiness.
Read more
Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
Hongkang Li, Hancheng Min, Rene Vidal
Generative Models Theory Optimization
  • First convergence analysis for transformer-based diffusion models under the DDPM framework.
  • Quantitative characterization of training dynamics and convergence requirements for multi-token Gaussian mixture data.
  • Demonstration that transformers can learn to approximate the oracle MMSE estimator for denoising tasks.
  • Validation of theoretical results through numerical experiments.
Read more
Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks
Amar Gahir, Varshil Patel, Shreyank N Gowda
Efficient ML
  • Adaptive Data Dropout dynamically adjusts training data based on model performance.
  • The method reduces effective training steps while maintaining competitive accuracy.
  • It introduces a feedback-driven approach to data selection, contrasting with fixed schedules.
  • The framework is simple and compatible with existing model architectures and optimization procedures.
Read more
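A feedback-driven data-selection rule of the kind described can be sketched as: drop examples the model already fits well and spend the step on the rest. The keep-fraction and loss-ranking criterion below are illustrative assumptions, not the paper's exact schedule:

```python
import numpy as np

def select_batch(losses, keep_frac=0.5):
    """Keep the highest-loss fraction of examples for the next step.

    A feedback-driven selection rule: examples the model already fits
    well (low loss) are dropped for this step, shrinking effective
    training work while focusing updates on hard examples.
    """
    k = max(1, int(len(losses) * keep_frac))
    order = np.argsort(losses)[::-1]   # hardest examples first
    return np.sort(order[:k])          # indices to keep, in order

losses = np.array([0.05, 0.9, 0.02, 0.7, 0.4, 0.01])
kept = select_batch(losses, keep_frac=0.5)
```

Because the rule reads current per-example losses, the kept set shifts as the model improves, unlike a fixed curriculum.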
Active Inference with a Self-Prior in the Mirror-Mark Task
Dongmin Kim, Hoshinori Kanazawa, Yasuo Kuniyoshi
Robotics Multimodal Theory
  • Introduces the concept of a self-prior that enables self-recognition behavior without external rewards.
  • Demonstrates that a simulated infant can identify and remove a sticker on its face using only visual and proprioceptive inputs.
  • Confirms that expected free energy decreases significantly after the sticker is removed, indicating effective self-recognition.
  • Suggests that the free energy principle can unify various theories on the development of self-awareness.
Read more
Decentralized Learning via Random Walk with Jumps
Zonghong Liu, Matthew Dwyer, Salim El Rouayheb
Federated Learning Optimization Theory
  • Introduces a decentralized learning framework using random walks for model propagation.
  • Identifies and addresses the 'entrapment' phenomenon in weighted random-walk learning.
  • Proposes Metropolis–Hastings with Lévy Jumps (MHLJ) to enhance exploration in the network.
  • Establishes a convergence rate that factors in data heterogeneity and network characteristics.
Read more
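The escape mechanism can be illustrated with a generic Metropolis–Hastings walk that occasionally teleports to a uniformly random node; the toy graph, the uniform jump standing in for a true Lévy jump, and the jump rate are all assumptions, not the MHLJ algorithm itself:

```python
import random

def mh_walk_with_jumps(adj, steps, jump_prob=0.1, seed=0):
    """Random walk over adjacency dict `adj`; with prob `jump_prob`
    teleport to a uniform node, letting the walk escape regions it
    would otherwise linger in ('entrapment'). The MH acceptance step
    keeps the local moves' stationary distribution uniform over nodes."""
    rng = random.Random(seed)
    nodes = list(adj)
    cur = nodes[0]
    visits = {v: 0 for v in nodes}
    for _ in range(steps):
        if rng.random() < jump_prob:
            cur = rng.choice(nodes)   # long-range jump (uniform here)
        else:
            nxt = rng.choice(adj[cur])
            # MH correction for uneven degrees: accept w.p. min(1, d(cur)/d(nxt))
            if rng.random() < min(1.0, len(adj[cur]) / len(adj[nxt])):
                cur = nxt
        visits[cur] += 1
    return visits

# Two triangles joined by a single bridge edge: a classic trap shape.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
visits = mh_walk_with_jumps(adj, steps=5000)
```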
Offline-Online Reinforcement Learning for Linear Mixture MDPs
Zhongjun Zhang, Sean R. Sinclair
Reinforcement Learning Theory
  • Introduction of the O-O UCRL-VTR algorithm for offline-online learning in linear mixture MDPs.
  • Establishment of regret bounds that characterize the conditions for beneficial offline data usage.
  • Demonstration of the algorithm's ability to adaptively leverage offline data based on its informativeness.
  • Identification of sufficient conditions for offline data to be informative, including sample size and environment shift.
Read more
XANE(3): An E(3)-Equivariant Graph Neural Network for Accurate Prediction of XANES Spectra from Atomic Structures
Vitor F. Grizzi, Luke N. Pretzie, Jiayi Xu, Cong Liu
Graph Learning
  • XANE(3) is an E(3)-equivariant graph neural network specifically designed for predicting XANES spectra.
  • The model employs a composite training objective that enhances spectral fidelity through derivative matching.
  • Evaluation on a large dataset yielded a low mean squared error, demonstrating high accuracy in spectral reproduction.
  • Ablation studies reveal the importance of various model components in improving performance.
Read more
A Hybrid Intelligent Framework for Uncertainty-Aware Condition Monitoring of Industrial Systems
Maryam Ahang, Todd Charter, Masoud Jalayer, Homayoun Najjaran
Time Series
  • Hybrid approaches combining data-driven and physics-based methods improve condition monitoring reliability.
  • Two integration strategies (feature-level fusion and model-level ensemble) are proposed and evaluated.
  • The model-level ensemble approach achieved a 2.9% improvement in diagnostic accuracy over the best baseline.
  • Conformal prediction enhances uncertainty management and provides well-calibrated prediction sets.
Read more
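Conformal prediction's calibrated sets come from a standard split construction: take a quantile of nonconformity scores on held-out calibration data, then include every label whose score falls below it. A sketch of that standard recipe (the scores and data are toy values, not the paper's setup):

```python
import numpy as np

def conformal_sets(cal_scores, test_scores, alpha=0.1):
    """Split conformal prediction for classification.

    cal_scores: nonconformity score of the TRUE label for each
    calibration example (e.g. 1 - p_model(true label)).
    test_scores: (n_test, n_classes) nonconformity scores.
    Returns label sets with ~(1 - alpha) marginal coverage."""
    n = len(cal_scores)
    # Finite-sample corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_scores, level, method="higher")
    return [set(np.where(row <= q)[0]) for row in test_scores]

cal = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.12, 0.18, 0.22, 0.28])
test = np.array([[0.05, 0.9],    # only label 0 is plausible
                 [0.2, 0.25]])   # both labels remain
sets = conformal_sets(cal, test, alpha=0.1)
```

Uncertain inputs naturally get larger sets, which is the "uncertainty management" the bullet refers to.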
Calibration-Aware Policy Optimization for Reasoning LLMs
Ziqi Wang, Xingzhou Lou, Meiqi Wu, Zhengqi Wen, Junge Zhang
NLP Large Language Models Reinforcement Learning Optimization
  • Introduces Calibration-Aware Policy Optimization (CAPO) to address overconfidence in LLMs.
  • Proves that GRPO-style algorithms degrade calibration due to uncertainty-agnostic advantage estimation.
  • Demonstrates significant calibration improvements (up to 15%) without sacrificing accuracy.
  • Achieves better performance on downstream tasks with a 5% accuracy boost.
Read more
TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang
Optimization Efficient ML
  • Introduction of TCL framework for cross-hardware tensor program optimization.
  • Utilization of RDU Sampler for efficient data collection and model accuracy retention.
  • Development of a Mamba-based cost model for improved performance prediction.
  • Implementation of continuous knowledge distillation for effective knowledge transfer.
Read more
Multi-Head Residual-Gated DeepONet for Coherent Nonlinear Wave Dynamics
Zhiwei Fan, Yiming Pan, Daniel Coca
Theory
  • Introduces a new paradigm for modeling coherent nonlinear wave dynamics using a dual-pathway approach.
  • Combines a standard DeepONet state pathway with a parallel conditioning pathway for physical descriptors.
  • Utilizes a low-rank multi-head mechanism to capture multiple response patterns efficiently.
  • Achieves lower prediction errors and better fidelity in dynamical quantities compared to traditional methods.
Read more
bacpipe: a Python package to make bioacoustic deep learning models accessible
Vincent S. Kather, Sylvain Haupert, Burooj Ghani, Dan Stowell
Audio & Speech
  • Bacpipe streamlines the use of bioacoustic deep learning models for ecological research.
  • The package allows for the generation of acoustic embeddings and classifier predictions.
  • Interactive visualizations and evaluation tools enhance user experience and model comparison.
  • Bacpipe targets a wide audience, making advanced bioacoustic analysis accessible to diverse researchers.
Read more
Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task
Alicia Curth, Rachel Lawrence, Sushrut Karmalkar, Niranjani Prasad
NLP Large Language Models Theory
  • Transformers may adaptively use their depth based on task difficulty, particularly in relational reasoning tasks.
  • Pretrained models show limited evidence of adaptive depth use, while fine-tuned models exhibit clearer patterns.
  • Less constrained fine-tuning regimes lead to stronger evidence of adaptive depth use in transformers.
  • The study employs logit lens and causal patching to analyze model behavior across layers and tasks.
Read more
Models Know Their Shortcuts: Deployment-Time Shortcut Mitigation
Jiayi Li, Shijie Tang, Gün Kaynar, Shiyi Du, Carl Kingsford
NLP Large Language Models
  • Introduces SHORTCUT GUARDRAIL, a deployment-time framework for mitigating shortcut learning in NLP models.
  • Utilizes gradient-based attribution to identify shortcut tokens without requiring training data or annotations.
  • Employs a lightweight LoRA-based debiasing module trained via Masked Contrastive Learning.
  • Demonstrates substantial improvements in accuracy and robustness across multiple NLP tasks.
Read more
Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting
Abeer Mostafa, Raneen Younis, Zahra Ahmadi
Graph Learning Time Series
  • Introduction of the first dynamic sheaf-based formulation for spatio-temporal forecasting.
  • Development of a dynamic sheaf diffusion operator that captures heterogeneous interactions efficiently.
  • Demonstration of significant improvements over existing spatio-temporal GNN models across multiple domains.
  • Mitigation of oversmoothing in deep GNN architectures through locally heterogeneous restriction maps.
Read more
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
Yecheng Wu, Song Han, Hai Cai
Large Language Models Efficient ML Reinforcement Learning
  • Introduces Lightning OPD, an offline on-policy distillation framework for large reasoning models.
  • Identifies 'teacher consistency' as a critical condition for effective OPD, preventing suboptimal convergence.
  • Demonstrates that Lightning OPD can achieve state-of-the-art performance with significantly reduced training time.
  • Eliminates the need for a live teacher server, lowering infrastructure costs for academic research.
Read more
Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
Zixuan Liu, Xiaolin Sun, Zizhan Zheng
Reinforcement Learning Optimization Interpretability
  • Introduces a robust policy optimization framework to mitigate reward hacking in RL.
  • Formulates reward hacking as a max-min problem, optimizing against the worst-case proxy reward.
  • Demonstrates improved performance and robustness over existing methods like ORPO.
  • Incorporates prior knowledge of true rewards for enhanced interpretability.
Read more
How Transformers Learn to Plan via Multi-Token Prediction
Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman, Weijie Su, Wei Huang
NLP Large Language Models Theory
  • MTP consistently outperforms NTP in reasoning tasks, particularly in planning.
  • Theoretical analysis reveals a two-stage reverse reasoning process facilitated by MTP.
  • MTP provides a cleaner training signal through gradient decoupling, enhancing model performance.
  • The study highlights the importance of training objectives in developing reasoning capabilities in language models.
Read more
INCRT: An Incremental Transformer That Determines Its Own Architecture
Giansalvo Cirrincione
Theory Efficient ML NLP
  • INCRT dynamically adjusts its architecture during training, starting with a single attention head.
  • The model adds heads based on a geometric criterion, ensuring minimal redundancy and sufficient capacity.
  • Two foundational theorems support the architecture's design and performance guarantees.
  • Experimental results show INCRT can match or exceed BERT-base performance with fewer parameters.
Read more
Distributionally Robust K-Means Clustering
Vikrant Malik, Taylan Kargin, Babak Hassibi
Optimization Theory Efficient ML
  • Introduces a distributionally robust K-means algorithm that mitigates the impact of outliers and distribution shifts.
  • Utilizes Wasserstein-2 distance to define a family of distributions for robust clustering.
  • Develops a block coordinate descent algorithm with provable convergence properties.
  • Demonstrates substantial improvements in clustering performance on synthetic and real-world datasets.
Read more
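The block coordinate descent structure the robust variant builds on is the classical K-means alternation: assignments are optimal given centroids and vice versa. The sketch below shows only that classical baseline; the Wasserstein-robust inner step is not reproduced:

```python
import numpy as np

def kmeans_bcd(X, k, iters=20, seed=0):
    """Classical K-means as block coordinate descent: the assignment
    block and the centroid block are each optimal given the other."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                  # block 1: assignments
        for j in range(k):                         # block 2: centroids
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated point clouds.
X = np.vstack([np.zeros((20, 2)), np.ones((20, 2)) * 5.0])
centers, labels = kmeans_bcd(X, k=2)
```

The robust version replaces the centroid block with a worst-case update over a Wasserstein-2 ball of distributions, which is what blunts the effect of outliers.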
SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation
Yexiong Lin, Jia Shi, Shanshan Ye, Wanyu Wang, Yu Yao, Tongliang Liu
Generative Models Computer Vision Efficient ML
  • SubFlow eliminates averaging distortion in flow matching by conditioning on sub-mode indices.
  • The method enhances diversity in generated samples, addressing the common issue of mode collapse.
  • SubFlow is plug-and-play, allowing integration with existing generative models without modifications.
  • Extensive experiments show improved diversity (Recall) and competitive image quality (FID) on ImageNet-256.
Read more
OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
Zhiyuan Zhang, Yanzhao Li, Zhiqiang Zou, Bai Du, Yupeng Sun, Hui Dong, Hui Wang
NLP Large Language Models Efficient ML
  • Introduces OSC, a framework for efficient outlier suppression in 4-bit quantization.
  • Demonstrates a token-persistent structural clustering effect of outliers in LLMs.
  • Implements a hybrid-precision strategy to enhance accuracy in low-clustering regions.
  • Achieves a peak speedup of 1.78× over W8A8 GEMM baseline on AI accelerators.
Read more
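Channel-wise outlier separation, in its generic form, keeps the largest-magnitude channels in higher precision and quantizes the rest to int4 with per-channel scales. The split rule and layout below are illustrative, not OSC's scheme:

```python
import numpy as np

def quantize_with_outlier_channels(W, n_outlier=1):
    """Split weight channels by max-abs magnitude: the top `n_outlier`
    channels stay in float, the rest are symmetrically quantized to
    4 bits (int range [-8, 7]) with one scale per channel."""
    mags = np.abs(W).max(axis=0)
    outlier_idx = np.argsort(mags)[-n_outlier:]
    normal_idx = np.setdiff1d(np.arange(W.shape[1]), outlier_idx)
    scales = np.abs(W[:, normal_idx]).max(axis=0) / 7.0
    q = np.clip(np.round(W[:, normal_idx] / scales), -8, 7).astype(np.int8)
    return q, scales, W[:, outlier_idx], outlier_idx

W = np.array([[0.1, 0.2, 9.0],
              [-0.3, 0.4, -8.0]])   # last channel is an outlier
q, scales, outliers, idx = quantize_with_outlier_channels(W)
```

Isolating the outlier channel keeps the int4 scales small, so the remaining channels lose little precision.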
Adaptive Budget Allocation in LLM-Augmented Surveys
Zikun Ye, Jiameng Lyu, Rui Tao
Large Language Models Optimization Theory
  • Proposes an adaptive algorithm for budget allocation in LLM-augmented surveys.
  • Algorithm learns question difficulty in real-time, improving efficiency of human labeling.
  • Reduces budget waste significantly compared to uniform allocation methods.
  • No prior knowledge of LLM accuracy is required for effective implementation.
Read more
Socrates Loss: Unifying Confidence Calibration and Classification by Leveraging the Unknown
Sandra Gómez-Gálvez, Tobias Olenyi, Gillian Dobbie, Katerina Taškova
Theory Optimization
  • Socrates Loss unifies classification and confidence calibration by incorporating an auxiliary unknown class.
  • The method addresses the stability-performance trade-off seen in existing calibration techniques.
  • Theoretical guarantees confirm that Socrates Loss regularizes model weights to prevent miscalibration.
  • Empirical results demonstrate improved training stability and faster convergence compared to traditional methods.
Read more
Towards Autonomous Mechanistic Reasoning in Virtual Cells
Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi
Large Language Models Graph Learning Interpretability
  • Introduction of a structured explanation formalism for biological reasoning in virtual cells.
  • Development of VCR-Agent, a multi-agent framework for generating and validating mechanistic reasoning.
  • Release of the VC-Traces dataset containing verified mechanistic explanations.
  • Empirical evidence showing improved factual precision and effectiveness in gene expression prediction.
Read more
TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting
Fan Zhang, Shiming Fan, Hua Wang
Large Language Models Time Series Multimodal
  • Introduction of a hierarchical asynchronous fusion strategy that decouples unimodal encoding from cross-modal interaction.
  • Development of TimeSAF, which includes a cross-modal semantic fusion trunk and a stage-wise semantic refinement decoder.
  • Demonstration of superior performance on multiple long-term forecasting benchmarks compared to existing methods.
  • Effective handling of semantic perceptual dissonance in time series forecasting.
Read more
A Mechanistic Analysis of Looped Reasoning Language Models
Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong
NLP Large Language Models Theory
  • Looped language models tend toward cyclic fixed-point behavior, leading to stable attention patterns.
  • Recurrent blocks learn stages of inference that mirror those of feedforward models.
  • Architectural choices significantly influence the emergence and stability of cyclic fixed points.
  • Empirical evidence shows that models self-organize into distinct inference stages during training.
Read more
A Layer-wise Analysis of Supervised Fine-Tuning
Qinghua Zhao, Xueling Gong, Xinyu Chen, Zhongfeng Kang, Xinlu Li
NLP Large Language Models Efficient ML
  • SFT incurs risks of catastrophic forgetting, particularly in final layers of LLMs.
  • A depth-dependent adaptation pattern was identified, with middle layers being stable and final layers sensitive.
  • Mid-Block Efficient Tuning selectively updates intermediate layers, leading to improved performance.
  • The proposed method outperforms standard LoRA techniques, demonstrating the importance of architectural locality in alignment.
Read more
Generative Path-Finding Method for Wasserstein Gradient Flow
Chengyu Liu, Xiang Zhou
Generative Models Theory Optimization
  • Introduces GenWGP, a generative framework for Wasserstein gradient flows.
  • Addresses limitations of existing numerical methods in terms of efficiency and adaptability.
  • Utilizes a path loss function derived from geometric action functional for mass transportation.
  • Achieves high accuracy with fewer discretization points compared to traditional methods.
Read more
GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support
Muhammad Umer Sheikh, Khawar Shehzad, Salman Khan, Fahad Shahbaz Khan, Muhammad Haris Khan
NLP Large Language Models Multimodal
  • Introduction of GCA-DS, a Gulf-focused multimodal dataset with 200k Q&A pairs.
  • Development of the Gulf Climate Agent (GCA) that integrates LLMs with climate-specific tools.
  • Demonstrated significant improvements in model reliability through domain fine-tuning and tool integration.
  • Addresses the unique climate challenges faced by the Gulf region with tailored solutions.
Read more
PubSwap: Public-Data Off-Policy Coordination for Federated RLVR
Anupam Nayak, Baris Askin, Muhammed Ustaomeroglu, Carlee Joe-Wong, Gauri Joshi
Reinforcement Learning Federated Learning Large Language Models
  • Introduces PubSwap, a federated RLVR framework that enhances communication efficiency.
  • Utilizes LoRA for local adaptation and public data for off-policy coordination.
  • Maintains privacy by using public datasets to align client models without sharing private data.
  • Demonstrates significant performance improvements in reasoning tasks across multiple domains.
Read more
Monte Carlo Stochastic Depth for Uncertainty Estimation in Deep Learning
Adam T. Müller, Tobias Rögelein, Nicolaj C. Stache
Computer Vision Theory Efficient ML
  • MCSD is theoretically grounded in Bayesian variational inference.
  • The first empirical benchmark of MCSD for object detection is presented.
  • MCSD shows competitive predictive accuracy and improved uncertainty calibration compared to MCD.
  • The method is compatible with multiple DNN architectures that utilize skip-connections.
Read more
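Monte Carlo stochastic depth, generically, keeps block-dropping active at inference: run several stochastic forward passes and read uncertainty from the spread of outputs. A toy sketch with scalar residual blocks, not the paper's object detector:

```python
import numpy as np

def mc_stochastic_depth(x, blocks, survival=0.8, T=100, seed=0):
    """Monte Carlo stochastic depth: residual blocks are randomly
    skipped on each of T stochastic passes; the mean is the
    prediction and the standard deviation an uncertainty estimate."""
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(T):
        h = x
        for block in blocks:
            if rng.random() < survival:
                h = h + block(h)   # residual block survives this pass
            # else: identity (block skipped this pass)
        outs.append(h)
    outs = np.array(outs)
    return outs.mean(axis=0), outs.std(axis=0)

blocks = [lambda h: 0.1 * h, lambda h: 0.05 * h]   # toy residual maps
mean, std = mc_stochastic_depth(np.array([1.0]), blocks, T=500)
```

Like MC dropout, this reuses randomness the network already has (skip-connections), so no extra parameters are needed.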
Learning Discrete Diffusion of Graphs via Free-Energy Gradient Flows
Dario Rancati, Jan Maas, Francesco Locatello
Graph Learning Generative Models Theory
  • Introduces a new metric WK for discrete probability distributions, facilitating the application of gradient flow concepts.
  • Develops a practical learning methodology for discrete diffusion dynamics based on first-order optimality conditions.
  • Demonstrates significant improvements in training speed and performance over existing methods for learning Markov Jump Process dynamics.
  • Provides a lightweight training loop that does not require individual sample trajectories, enhancing computational efficiency.
Read more
Algorithmic Analysis of Dense Associative Memory: Finite-Size Guarantees and Adversarial Robustness
Madhava Gaikwad
Theory
  • Introduces finite-size guarantees for Dense Associative Memory (DAM) retrieval dynamics.
  • Establishes geometric convergence rates and adversarial robustness bounds.
  • Demonstrates capacity scaling of Θ(N^(n-1)) for DAM under specific conditions.
  • Provides a potential-game interpretation of retrieval dynamics ensuring convergence.
Read more
Active Bayesian Inference for Robust Control under Sensor False Data Injection Attacks
Axel Andersson, György Dán
Robotics Graph Learning Optimization
  • Introduces a bipartite graph model for sensor perception pipelines enabling Bayesian inference over sensor attack states.
  • Proposes the LASE-AD algorithm to maintain beliefs about sensor integrity and selectively disable compromised sensors.
  • Develops an active probing strategy that increases the distinguishability of attack hypotheses by exploiting system nonlinearities.
  • Demonstrates superior performance of the proposed method in experiments compared to traditional outlier-robust and prediction-based approaches.
Read more
Continuous-time Online Learning via Mean-Field Neural Networks: Regret Analysis in Diffusion Environments
Erhan Bayraktar, Bingyan Han, Ziqing Zhang
Theory Optimization Time Series
  • Introduces a continuous-time online learning framework using mean-field neural networks.
  • Establishes regret bounds for both mean-field limits and finite-particle systems.
  • Utilizes advanced mathematical techniques such as logarithmic Sobolev inequality and Malliavin calculus.
  • Demonstrates the impact of network architecture and regularization on learning performance through simulations.
Read more
Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging
Xinyu Peng, Ziyang Zheng, Wenrui Dai, Duoduo Xue, Shaohui Li, Chenglin Li, Junni Zou, Hongkai Xiong
Optimization Computer Vision Theory
  • Introduces an information-theoretic perspective to optimize task-adapted CS-MRI.
  • Addresses uncertainty in medical diagnoses through probabilistic inference.
  • Enables adaptive sampling and flexible control of sampling ratios.
  • Demonstrates competitive performance on MRI datasets compared to existing methods.
Read more
Clustering-Enhanced Domain Adaptation for Cross-Domain Intrusion Detection in Industrial Control Systems
Luyao Wang
Theory
  • Proposes a novel clustering-enhanced domain adaptation method for intrusion detection in ICS.
  • Utilizes a feature-based transfer learning module for effective cross-domain detection.
  • Implements a clustering enhancement strategy to improve correlation estimation and reduce tuning issues.
  • Achieves significant improvements in detection accuracy and stability over baseline models.
Read more
Stress Detection Using Wearable Physiological and Sociometric Sensors
Oscar Martinez Mozos, Virginia Sandulescu, Sally Andrews, David Ellis, Nicola Bellotto, Radu Dobrescu, Jose Manuel Ferrandez
Multimodal
  • Integration of physiological and sociometric data improves stress detection accuracy.
  • Personalized classifiers are essential due to individual variability in stress responses.
  • The study demonstrates the feasibility of real-time stress monitoring using wearable technology.
  • Combination of sensor modalities is a novel approach in stress detection research.
Read more
Loop Corrections to the Training and Generalization Errors of Random Feature Models
Taeyoung Kim
Theory
  • Development of a perturbative framework for random feature models that incorporates higher-order fluctuation statistics.
  • Derivation of explicit loop expansions for training error, test error, and generalization gap, revealing richer finite-width structures.
  • Identification of scaling laws for correction terms, distinguishing between Gaussian and non-Gaussian effects.
  • Experimental validation of theoretical predictions, confirming the accuracy of the loop-based description.
Read more
From Recency Bias to Stable Convergence: Block Kaczmarz Methods for Online Preference Learning in Matchmaking Applications
James Nguyen
Optimization Theory Efficient ML
  • Introduces Tikhonov-regularized projection to mitigate recency bias in preference learning.
  • Develops Block Kaczmarz variants that enhance performance in matchmaking applications.
  • Demonstrates superior alignment and stability of the Block-NK method through extensive simulations.
  • Analyzes the effects of adaptive candidate filtering on preference alignment.
Read more
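A Tikhonov-damped Kaczmarz step shrinks each row projection so that no single recent observation can swing the estimate, which is the generic mechanism behind the recency-bias fix described. The single-row (non-block) variant below, with a hypothetical damping constant, is a sketch, not the paper's Block-NK method:

```python
import numpy as np

def regularized_kaczmarz(A, b, iters=500, lam=0.1, seed=0):
    """Row-action solver for A x = b. Each step projects onto one
    row's hyperplane; the damping factor 1/(1 + lam) shrinks the
    step (a Tikhonov-flavored guard against recency bias)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.integers(A.shape[0])
        a = A[i]
        step = (b[i] - a @ x) / (a @ a)        # exact projection length
        x = x + (step / (1.0 + lam)) * a       # damped update
    return x

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = A @ np.array([2.0, -1.0])   # consistent system, solution (2, -1)
x = regularized_kaczmarz(A, b)
```

For a consistent system the damped iteration still converges to the same solution, just more slowly; the payoff is stability when recent rows are noisy.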
INTARG: Informed Real-Time Adversarial Attack Generation for Time-Series Regression
Gamze Kirman Tokgoz, Onat Gungor, Tajana Rosing, Baris Aksanli
Time Series
  • Introduces INTARG, a selective adversarial attack framework for time-series forecasting.
  • Operates under an online bounded-buffer setting, reflecting real-world constraints.
  • Employs a confidence-aware strategy to maximize the impact of fewer perturbations.
  • Achieves up to a 2.42× increase in prediction error on power-related datasets.
Read more
PrivEraserVerify: Efficient, Private, and Verifiable Federated Unlearning
Parthaw Goswami, Md Khairul Islam, Ashfak Yeafi
Federated Learning Efficient ML Theory
  • PEV is the first framework to integrate efficiency, privacy, and verifiability in federated unlearning.
  • Adaptive checkpointing allows for fast model reconstruction without full retraining.
  • Layer-adaptive differential privacy ensures statistical indistinguishability while minimizing accuracy loss.
  • Fingerprint-based verification enables decentralized confirmation of unlearning effects.
Read more
An Optimal Sauer Lemma Over k-ary Alphabets
Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri
Theory
  • Establishes a sharp Sauer inequality for multiclass and list prediction based on the DS dimension.
  • Improves upon existing Natarajan dimension bounds, particularly for k > 2.
  • Introduces optimal polynomial dependence on list size and better dependence on alphabet size.
  • Utilizes the polynomial method for proof, highlighting a gap in combinatorial proof techniques in the DS setting.
Read more
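For orientation, the classical binary (k = 2) case this result generalizes is the Sauer–Shelah lemma; the paper's contribution replaces the VC dimension with the DS dimension for k-ary labels. The classical statement:

```latex
% Sauer–Shelah lemma (binary case, k = 2):
% for a class H \subseteq \{0,1\}^n of VC dimension d,
|\mathcal{H}| \;\le\; \sum_{i=0}^{d} \binom{n}{i} \;=\; O(n^{d}).
```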
Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design
Leon Eshuijs, Shihan Wang, Aanske Fokkens
Reinforcement Learning Large Language Models NLP
  • Model size can reduce harmful misalignment in some environments but increase it in others, depending on environmental design.
  • Environmental features like role framing and gameability cues significantly influence the direction of harmful exploitation.
  • Existing safety benchmarks are poor predictors of RL-induced misalignment, with limited correlation to actual harmful behaviors.
  • On-policy RL preserves a safety buffer that is lost in off-policy training settings.
Read more
Subcritical Signal Propagation at Initialization in Normalization-Free Transformers
Sergey Alekseev
Theory
  • APJN serves as a critical measure for understanding signal propagation in transformers.
  • Pre-LayerNorm transformers exhibit power-law APJN growth, while normalization-free transformers show stretched-exponential growth.
  • Dynamic Tanh and Dynamic erf architectures are more sensitive to initialization and hyperparameters.
  • The study provides empirical evidence supporting the theoretical predictions regarding training stability.
Read more
A Temporally Augmented Graph Attention Network for Affordance Classification
Ami Chopra, Supriya Bordoloi, Shyamanta M. Hazarika
Graph Learning Time Series
  • Introduction of EEG-tGAT, a temporally augmented GAT for affordance classification.
  • Incorporation of temporal attention and dropout to enhance model performance.
  • Demonstrated improved classification accuracy over traditional GATv2.
  • Findings suggest that temporal modeling is essential for effective affordance classification.
Read more
Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores
Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania
Optimization Efficient ML Large Language Models
  • AILFM leverages Active Imitation Learning to optimize thermal management in 3D S-NUCA systems.
  • The framework accounts for core-level performance heterogeneity and kernel-specific behaviors of LFMs.
  • AILFM outperforms state-of-the-art thermal management approaches with minimal runtime overhead.
  • The proposed method generalizes well across diverse LFM workloads, enhancing inference efficiency.
Read more
CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations
Benzhao Tang, Shiyu Yang
Efficient ML
  • CLAD is the first framework to perform log anomaly detection directly on compressed byte streams.
  • The architecture integrates a dilated convolutional encoder and a hybrid Transformer–mLSTM for effective anomaly detection.
  • A two-stage training strategy is implemented to handle severe class imbalance in the data.
  • CLAD achieves a state-of-the-art average F1-score of 0.9909 across five datasets.
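Operating on compressed byte streams can be illustrated with a toy sketch (this shows only the input representation, not CLAD's convolutional encoder or Transformer–mLSTM stack):

```python
import zlib

def compressed_bytes(log_lines):
    # Compress a window of log lines into the raw byte stream a
    # compressed-domain detector would consume directly.
    blob = "\n".join(log_lines).encode("utf-8")
    return zlib.compress(blob)

normal = ["GET /index 200"] * 50
anomalous = [f"ERR {i} stack trace line {i}" for i in range(50)]

c_norm = compressed_bytes(normal)
c_anom = compressed_bytes(anomalous)
# Repetitive (normal) windows compress far better than higher-entropy
# ones, so structure in the compressed bytes still carries signal.
print(len(c_norm), len(c_anom))
```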
Read more
Belief-State RWKV for Reinforcement Learning under Partial Observability
Liu Xiao
Reinforcement Learning
  • Introduces a belief-state variant of RL using RWKV-style models that incorporates uncertainty into decision-making.
  • The belief state consists of two components: a location statistic (μt) and an uncertainty statistic (Σt).
  • Pilot experiments show that the proposed method nearly matches the best recurrent baseline while improving performance under specific conditions.
  • Ablation studies indicate that the simple belief-state readout is more effective than more complex alternatives.
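Maintaining a location statistic and an uncertainty statistic can be sketched with a scalar Gaussian belief update (Kalman-style; the paper's RWKV-based parameterization is not shown here):

```python
import numpy as np

def belief_update(mu, sigma2, obs, obs_noise=0.25):
    # One Gaussian belief update: mu tracks location, sigma2 tracks
    # uncertainty. The gain trusts observations more when uncertain.
    k = sigma2 / (sigma2 + obs_noise)
    mu = mu + k * (obs - mu)
    sigma2 = (1 - k) * sigma2
    return mu, sigma2

mu, s2 = 0.0, 1.0
for obs in [1.0, 1.0, 1.0]:
    mu, s2 = belief_update(mu, s2, obs)
print(mu, s2)  # mean moves toward 1.0, uncertainty shrinks
```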
Read more
Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms
Mainak Kundu, Catherine Chen, Rifatul Islam, Ismail Uysal, Ria Kanjilal
Interpretability Time Series Multimodal
  • Introduces a unified framework for explainability in HAR, separating conceptual dimensions from algorithmic mechanisms.
  • Presents a mechanism-centric taxonomy of XAI-HAR methods, covering major explanation paradigms.
  • Highlights the complexities of HAR, including temporal, multimodal, and semantic challenges.
  • Identifies gaps in existing literature and proposes directions for future research in XAI-HAR.
Read more
Parcae: Scaling Laws For Stable Looped Language Models
Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, Daniel Y. Fu
NLP Large Language Models Efficient ML
  • Parcae stabilizes looped architectures by constraining spectral norms of injection parameters.
  • The model achieves up to 6.3% lower validation perplexity compared to previous looped models.
  • Scaling laws derived indicate that looping can be an effective method for increasing training and test-time FLOPs.
  • Parcae outperforms parameter-matched Transformers by significant margins on quality benchmarks.
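Constraining a spectral norm can be sketched with power iteration followed by rescaling. This is a generic stabilizer under that assumption; Parcae's exact constraint on the injection parameters may differ.

```python
import numpy as np

def spectrally_clip(W, max_norm=1.0, iters=50):
    # Estimate the leading singular value of W with power iteration,
    # then rescale W so its spectral norm is at most max_norm.
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    sigma = u @ W @ v  # leading singular value estimate
    return W if sigma <= max_norm else W * (max_norm / sigma)

W = np.random.default_rng(1).normal(size=(8, 8))
W_clipped = spectrally_clip(W)
print(np.linalg.norm(W_clipped, 2))
```

Bounding the spectral norm keeps repeated applications of the loop from amplifying activations, which is the stability property the scaling laws rely on.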
Read more
The Diffusion-Attention Connection
Julio Candanedo
Theory Generative Models Optimization
  • Introduces a unified framework connecting Transformers, diffusion maps, and magnetic Laplacians through QK bidivergence.
  • Reinterprets attention mechanisms in terms of divergences and Markov operators, expanding their theoretical foundation.
  • Demonstrates the application of product-of-experts and Schrödinger bridges to organize various dynamics in machine learning.
  • Highlights the significance of raw query-key scores as a primary object of study for advancing neural computation.
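The Markov-operator reading of attention is easy to see in code: row-softmaxing raw query-key scores yields a row-stochastic matrix, i.e. a transition kernel over tokens. A minimal sketch:

```python
import numpy as np

def attention_markov(Q, K):
    # Scaled dot-product scores, then a stable row-softmax.
    S = Q @ K.T / np.sqrt(Q.shape[1])
    S = np.exp(S - S.max(axis=1, keepdims=True))
    return S / S.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
P = attention_markov(rng.normal(size=(4, 3)), rng.normal(size=(4, 3)))
# Each row of P is a probability distribution over tokens,
# so P acts as a Markov operator.
print(P.sum(axis=1))
```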
Read more
Battery health prognosis using Physics-informed neural network with Quantum Feature mapping
Muhammad Imran Hossain, Md Fazley Rafy, Sarika Khushlani Solanki, Anurag K. Srivastava
Theory Optimization Time Series
  • Introduction of Quantum Feature Mapping (QFM) to enhance feature extraction for battery SOH estimation.
  • Development of a physics-informed neural network (QPINN) that is model-independent and adaptable to various battery chemistries.
  • Demonstrated superior SOH estimation accuracy of 99.46% on a large-scale dataset.
  • Significant reductions in MAPE and RMSE compared to existing methods.
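Quantum feature maps are commonly built from angle encodings. The sketch below is a classical simulation of a generic angle-encoding map, a stand-in for the paper's QFM, whose circuit is not specified here:

```python
import numpy as np

def angle_feature_map(x, depth=2):
    # Expand each input scalar into cos/sin features at several
    # frequencies (a classical view of rotation-gate angle encoding).
    feats = []
    for d in range(1, depth + 1):
        feats.append(np.cos(d * x))
        feats.append(np.sin(d * x))
    return np.concatenate(feats, axis=-1)

x = np.linspace(0.0, 1.0, 4)   # e.g. normalized battery-cycle features
phi = angle_feature_map(x)
print(phi.shape)  # (16,)
```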
Read more
CycloneMAE: A Scalable Multi-Task Learning Model for Global Tropical Cyclone Probabilistic Forecasting
Renlong Hang, Zihao Xu, Jiuwei Zhao, Runling Yu, Leye Cheng, Qingshan Liu
Multimodal Time Series Interpretability
  • CycloneMAE addresses the limitations of traditional NWP and existing deep learning models in TC forecasting.
  • The model uses a TC structure-aware masked autoencoder to learn from multi-modal data.
  • It provides both deterministic and probabilistic forecasts, enhancing uncertainty estimation.
  • CycloneMAE outperforms leading NWP systems in forecasting accuracy across multiple variables.
Read more
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh, Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao, Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Buvaneswari Mani, Carlo del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien, Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar
Large Language Models Reinforcement Learning Efficient ML
  • Introduces Nemotron 3 Super, a 120 billion parameter hybrid MoE model.
  • First model to utilize LatentMoE architecture for improved accuracy and efficiency.
  • Pre-trained on 25 trillion tokens with a focus on diverse and high-quality data.
  • Achieves significantly higher inference throughput compared to leading models.
Read more
Do VLMs Truly "Read" Candlesticks? A Multi-Scale Benchmark for Visual Stock Price Forecasting
Kaiqi Hu, Linda Xiao, Shiyue Xu, Ziyi Tang, Mingwen Liu
Multimodal Time Series
  • Development of a multi-scale candlestick chart dataset for evaluating VLMs.
  • Introduction of a comprehensive evaluation framework combining confusion matrix analysis and information coefficient metrics.
  • Identification of VLMs' strong performance in trending markets but weaknesses in volatile conditions.
  • Highlighting significant prediction biases and limitations in temporal reasoning of VLMs.
Read more
Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling
Xiaoxiao Liang, Juyuan Zhang, Liming Pan, Linyuan LΓΌ
Graph Learning Time Series Interpretability
  • COSINE framework enables joint optimization of latent graph structures and dynamical equations.
  • Sparse symbolic message passing enhances structural identifiability and prevents over-parameterization.
  • LLM-guided library evolution allows for adaptive adjustment of symbolic libraries without system-specific templates.
  • Extensive experiments show COSINE achieves state-of-the-art performance in relational inference.
Read more
ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism
Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, Steffen Cruz
Efficient ML Large Language Models Optimization
  • Introduction of ResBM, achieving 128× activation compression.
  • End-to-end trainable architecture designed for low-bandwidth environments.
  • No degradation in convergence rates compared to uncompressed models.
  • Empirical analysis of optimizer effects on activation compressibility.
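The bandwidth saving of a bottleneck across a pipeline boundary can be sketched with a linear down/up projection. The random projections below are hypothetical; ResBM trains these maps end-to-end and adds a residual path, which is not shown.

```python
import numpy as np

def bottleneck_roundtrip(x, ratio=128, seed=0):
    # Down-project activations before the low-bandwidth link,
    # up-project on the other side. Only z crosses the wire.
    d = x.shape[-1]
    rng = np.random.default_rng(seed)
    down = rng.normal(size=(d, d // ratio)) / np.sqrt(d)
    up = rng.normal(size=(d // ratio, d)) / np.sqrt(d // ratio)
    z = x @ down
    return z, z @ up

x = np.random.default_rng(1).normal(size=(2, 1024))
z, x_hat = bottleneck_roundtrip(x)
print(z.shape, x_hat.shape)  # (2, 8) vs (2, 1024): 128x fewer values sent
```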
Read more
When Can You Poison Rewards? A Tight Characterization of Reward Poisoning in Linear MDPs
Jose Efraim Aguilar Escamilla, Haoyang Hong, Jiawei Li, Haoyu Zhao, Xuezhou Zhang, Sanghyun Hong, Huazheng Wang
Reinforcement Learning Theory Optimization
  • Establishes necessary and sufficient conditions for reward poisoning in linear MDPs.
  • Introduces a convex quadratic program (CQP) to determine attackability of RL instances.
  • Develops budget-efficient white-box and black-box attack methods.
  • Empirical validation shows the framework's predictive power in real-world RL tasks.
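The flavor of budget-efficient reward poisoning can be shown in a degenerate special case: a two-arm bandit, where the minimal perturbation that makes a target arm win by a margin has a closed form. This toy is not the paper's CQP over linear MDPs.

```python
import numpy as np

def min_budget_flip(r, target, margin=0.1):
    # Split the needed gap evenly between raising the target arm's
    # reward and lowering each competitor's, so the target wins by
    # at least `margin`.
    r = r.astype(float).copy()
    for a in range(len(r)):
        if a != target and r[target] < r[a] + margin:
            gap = (r[a] + margin - r[target]) / 2
            r[target] += gap
            r[a] -= gap
    return r

poisoned = min_budget_flip(np.array([1.0, 2.0]), target=0)
print(poisoned, np.argmax(poisoned))
```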
Read more
Agentic Control in Variational Language Models
Yves Ruffenach
NLP Large Language Models Generative Models
  • Introduces a variational language modeling framework that leverages internal signals for actionable control.
  • Proposes a homeostatic regulator to maintain a healthy latent regime during training.
  • Defines a checkpoint retention rule based on task quality and internal structural integrity.
  • Demonstrates that a calibrated uncertainty-aware controller can implement minimal agentic control during inference.
Read more
A unified data format for managing diabetes time-series data: DIAbetes eXchange (DIAX)
Elliott C. Pryor, Marc D. Breton, Anas El Fathi
Time Series
  • DIAX standardizes diabetes time-series data in a JSON format to enhance interoperability.
  • It addresses format heterogeneity, allowing for easier integration and analysis of diverse datasets.
  • The open-source repository includes tools for dataset conversion and analysis, promoting community engagement.
  • DIAX supports major datasets, ensuring compatibility with existing standardization efforts.
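A JSON time-series record in the spirit of DIAX might look like the sketch below. The field names are illustrative only, not the published schema.

```python
import json

# Hypothetical record: one CGM signal for one subject.
record = {
    "subject_id": "S001",
    "signal": "cgm_glucose",
    "unit": "mg/dL",
    "samples": [
        {"t": "2024-01-01T08:00:00Z", "value": 102},
        {"t": "2024-01-01T08:05:00Z", "value": 110},
    ],
}

# Any tool that speaks JSON can round-trip the record, which is the
# interoperability point the format is after.
blob = json.dumps(record)
restored = json.loads(blob)
print(restored["samples"][1]["value"])  # 110
```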
Read more
Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models
Yi Xiong, Liang Xiong, Xiaohong Ji, Sen Yang, Zhifeng Gao, Huaimin Wang, Kele Xu
NLP Large Language Models Optimization
  • Introduction of Scaffold-Conditioned Preference Triplets (SCPT) for molecular optimization.
  • Utilization of a pretrained LLM as a conditional editor to facilitate scaffold-preserving edits.
  • Demonstrated improvements in optimization success and property gains while maintaining scaffold similarity.
  • Effective generalization from single- and two-property tasks to three-property evaluations.
Read more