AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

64 papers today · Updated every 8 hours · 7 days of history
Temporal Memory for Resource-Constrained Agents: Continual Learning via Stochastic Compress-Add-Smooth
Michael Chertkov
Robotics Theory Efficient ML
  • Introduces a stochastic memory framework for continual learning that avoids catastrophic forgetting.
  • Utilizes a three-step compress-add-smooth (CAS) recursion to incorporate new experiences efficiently (see the sketch after this list).
  • Demonstrates linear scaling of memory retention half-life with the segment budget.
  • Provides an analytical model for studying forgetting mechanisms in continual learning.
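A minimal sketch of what a compress-add-smooth memory update could look like, assuming a fixed segment budget, nearest-neighbour merging for compression, and exponential smoothing. All names and choices below are hypothetical; the paper's stochastic recursion will differ in detail:

```python
import numpy as np

def cas_step(segments, new_experience, budget=8, smooth=0.1):
    """One illustrative compress-add-smooth (CAS) update.
    segments: list of summary vectors; budget: max segment count.
    Hypothetical reconstruction, not the paper's exact recursion."""
    # Add: append the newest experience as its own segment.
    segments = segments + [np.asarray(new_experience, dtype=float)]
    # Compress: while over budget, merge the two most similar neighbours.
    while len(segments) > budget:
        dists = [np.linalg.norm(segments[i] - segments[i + 1])
                 for i in range(len(segments) - 1)]
        i = int(np.argmin(dists))
        merged = 0.5 * (segments[i] + segments[i + 1])
        segments = segments[:i] + [merged] + segments[i + 2:]
    # Smooth: pull interior segments toward their neighbours, damping
    # noise at the cost of slowly forgetting fine detail.
    for i in range(1, len(segments) - 1):
        segments[i] = (1 - smooth) * segments[i] + \
            smooth * 0.5 * (segments[i - 1] + segments[i + 1])
    return segments
```

Under a scheme like this, retention half-life grows with the segment budget, which is the scaling behaviour the summary highlights.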
Model-Based Reinforcement Learning for Control under Time-Varying Dynamics
Klemens Iten, Bruce Lee, Chenhao Li, Lenart Treven, Andreas Krause, Bhavya Sukhija
Reinforcement Learning Robotics Theory
  • Introduces a framework for MBRL that accommodates time-varying dynamics.
  • Develops two algorithms (R-OMBRL and SW-OMBRL) that use adaptive data buffers.
  • Establishes theoretical guarantees for dynamic regret in the context of non-stationarity.
  • Demonstrates improved performance on continuous control benchmarks.
Generalization Bounds and Statistical Guarantees for Multi-Task and Multiple Operator Learning with MNO Networks
Adrien Weihs, Hayden Schaeffer
Theory Multimodal Efficient ML
  • Introduces a covering-number-based generalization analysis for multiple operator learning.
  • Derives explicit metric-entropy bounds for hypothesis classes based on deep ReLU subnetworks.
  • Establishes an approximation-estimation tradeoff for expected test errors on unseen data.
  • Clarifies the impact of hierarchical sampling budgets on generalization performance.
Learning ECG Image Representations via Dual Physiological-Aware Alignments
Hung Manh Pham, Jialu Tang, Aaqib Saeed, Dong Ma, Bin Zhu, Zhou Pan
Multimodal Time Series Computer Vision
  • Introduces ECG-Scan, a self-supervised framework for ECG image analysis.
  • Utilizes dual physiological-aware alignments for improved representation learning.
  • Demonstrates superior performance of image-based models compared to existing baselines.
  • Addresses the gap between ECG image and signal analysis.
ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor
Yixiao Wang, Ting Jiang, Zishan Shao, Hancheng Ye, Jingwei Sun, Mingyuan Ma, Jianyi Zhang, Yiran Chen, Hai Li
Generative Models Efficient ML
  • ZEUS uses a second-order predictor to reduce the number of denoiser evaluations (illustrated after this list).
  • The interleaved caching scheme stabilizes predictions and prevents error amplification.
  • ZEUS achieves significant speedups (up to 3.2×) while maintaining high sample fidelity.
  • The method is compatible with various model architectures and requires minimal integration effort.
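The second-order predictor admits a compact illustration: fit a quadratic through the three most recent true denoiser outputs and extrapolate the next one, spending a cheap prediction instead of a full network evaluation. A hedged sketch assuming equally spaced timesteps; ZEUS's exact predictor and interleaved caching scheme are more involved:

```python
import numpy as np

def second_order_extrapolate(y3, y2, y1):
    """Quadratic extrapolation of the next denoiser output from the
    three most recent true evaluations y3, y2, y1 (oldest to newest),
    assuming unit-spaced timesteps. Illustrative, not ZEUS itself."""
    # Fit p(t) through (0, y3), (1, y2), (2, y1); evaluate at t = 3.
    return 3.0 * y1 - 3.0 * y2 + y3
```

Interleaving true evaluations with extrapolated ones, as the caching bullet describes, keeps the extrapolation error from compounding across steps.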
Auction-Based Online Policy Adaptation for Evolving Objectives
Guruprerana Shabadi, Kaushik Mallik
Reinforcement Learning Robotics Optimization
  • Introduces a modular framework for multi-objective reinforcement learning with dynamic objectives.
  • Utilizes an auction-based mechanism for policy coordination, allowing for interpretable trade-offs among objectives.
  • Demonstrates superior performance compared to monolithic policies in dynamic environments.
  • Enhances interpretability by allowing identification of the active policy and its objective.
Improving Latent Generalization Using Test-time Compute
Arslan Chaudhry, Sridhar Thiagarajan, Andrew Lampinen
NLP Large Language Models Reinforcement Learning
  • Introduces test-time compute as a method to enhance latent generalization in LLMs.
  • Demonstrates that models trained to produce chains-of-thought can generalize effectively to both in-distribution and out-of-distribution knowledge.
  • Identifies limitations in the performance of thinking models on pure reversal tasks, highlighting challenges in factual self-verification.
  • Shows that thinking models outperform traditional train-time augmentation methods in terms of flexibility and generalization.
Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling
Deeptanshu Malu, Deevyanshu Malu, Aditya Nemiwal, Sunita Sarawagi
Large Language Models NLP Theory
  • Inter-example similarity is crucial for the emergence of in-context learning (ICL) during fine-tuning.
  • Contrastive-Context sampling draws examples across varying similarity levels to strengthen both ICL and in-weights learning (IWL).
  • The method shows consistent improvements in accuracy across diverse tasks and models.
  • Theoretical analysis reveals the importance of inter and intra-context contrasts for effective learning.
Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
Vikram Krishnamurthy, Luke Snow
Reinforcement Learning Theory Optimization
  • Introduces a novel Langevin-based algorithm for adaptive inverse reinforcement learning.
  • Utilizes Malliavin calculus to efficiently estimate counterfactual gradients.
  • Overcomes limitations of traditional Monte Carlo methods in estimating gradients conditioned on zero probability events.
  • Achieves optimal convergence rates without the need for resampling or kernel smoothing.
Neural network methods for two-dimensional finite-source reflector design
Roel Hacking, Lisa Kusch, Koondanibha Mitra, Martijn Anthonissen, Wilbert IJzerman
Optimization
  • Introduces a neural network approach for designing reflectors from finite light sources.
  • Develops two differentiable objective functions for optimization.
  • Demonstrates faster convergence and lower error rates compared to traditional deconvolution methods.
  • Handles height constraints effectively within the design process.
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
Yuejiang Liu, Fan Feng, Lingjing Kong, Weifeng Lu, Jinzhou Tang, Kun Zhang, Kevin Murphy, Chelsea Finn, Yilun Du
Reinforcement Learning Robotics Efficient ML
  • WAV enables world models to self-improve by verifying their own prediction errors.
  • The framework decomposes state prediction into state plausibility and action reachability.
  • WAV achieves 2× higher sample efficiency and improves policy performance by over 18% across multiple tasks.
  • The method leverages abundant action-free data and lower-dimensional action-relevant features for verification.
Soft MPCritic: Amortized Model Predictive Value Iteration
Thomas Banker, Nathan P. Lawrence, Ali Mesbah
Reinforcement Learning Robotics Optimization
  • Soft MPCritic combines RL and MPC, leveraging their complementary strengths.
  • The framework operates entirely in value space, enhancing computational efficiency.
  • An amortized warm-start strategy significantly reduces computational burden.
  • Soft MPCritic effectively addresses both online control and value target generation.
Learn by Surprise, Commit by Proof
Kang-Sin Choi
Large Language Models Optimization Theory
  • LSCP allows language models to autonomously learn new information without external supervision.
  • The framework uses self-verification to distinguish between novel and noisy information.
  • LSCP reduces hallucinations by sharpening existing knowledge while learning new content.
  • The method demonstrates significant improvements in semantic learning over traditional fine-tuning approaches.
Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
Jaber Jaber, Osama Jaber
Large Language Models Efficient ML NLP
  • OUROBOROS introduces a Controller hypernetwork for dynamic weight modulation in recursive transformers.
  • The system achieves a 43.4% reduction in training loss compared to a baseline model.
  • Gated recurrence is essential for maintaining performance across deep iterations.
  • The Controller outperforms static LoRA configurations, particularly at lower depths.
Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
Yiran Ma, Jerome Le Ny, Zhichao Chen, Zhihuan Song
Theory Optimization Generative Models
  • Introduces a diffusion-based posterior sampling framework for uncertainty quantification.
  • Eliminates the need for post-hoc calibration, providing intrinsically calibrated predictive uncertainty.
  • Demonstrates significant improvements in uncertainty calibration and predictive accuracy over existing methods.
  • Evaluated on synthetic data, a soft sensor benchmark, and a real-world ammonia synthesis case study.
Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning
Rafael Pardinas, Ehsan Kamalloo, David Vazquez, Alexandre Drouin
NLP Large Language Models Reinforcement Learning
  • Introduces a reproducible multi-domain RL post-training recipe for reasoning models.
  • Presents an adaptive domain sampling method to maintain target domain ratios during training.
  • Develops a difficulty-aware length penalty to optimize reasoning length based on problem difficulty.
  • Achieves significant improvements in accuracy and efficiency compared to previous models.
MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning
Sten Rüdiger, Sebastian Raschka
NLP Large Language Models Efficient ML
  • MiCA targets underutilized subspaces of model representations for fine-tuning.
  • The method uses Singular Value Decomposition (SVD) to identify minor singular vectors (see the sketch after this list).
  • MiCA shows up to 5.9x improvement in knowledge acquisition compared to LoRA.
  • The parameter footprint of MiCA is significantly lower than that of full fine-tuning.
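The minor-singular-vector idea is easy to sketch: take the SVD of a weight matrix, keep the trailing (smallest-singular-value) directions as a frozen subspace, and train a small update confined to it. A hypothetical reconstruction with assumed factor names, not MiCA's exact parameterization:

```python
import numpy as np

def minor_subspace(W, k=16):
    """Projectors onto the k minor (smallest-singular-value) left and
    right singular directions of W. Illustrative of the idea only."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # numpy sorts singular values in descending order, so the minor
    # directions are the trailing k columns/rows.
    return U[:, -k:], Vt[-k:, :]

def mica_style_update(W, A, B, U_k, V_k):
    """Low-rank update restricted to the minor subspace, with A and B
    the small trainable factors (hypothetical names)."""
    return W + U_k @ (A @ B) @ V_k
```

Because the dominant singular directions carry the pretrained behaviour, writing new knowledge into the minor subspace plausibly interferes less with it, which is the intuition behind the headline comparison.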
Feature Weighting Improves Pool-Based Sequential Active Learning for Regression
Dongrui Wu
Theory Efficient ML Optimization
  • Introduces feature weighting in the distance computation for active learning regression (sketched after this list).
  • Proposes five new active learning approaches that incorporate feature weights.
  • Demonstrates improved performance over existing unweighted ALR methods.
  • Validates effectiveness across both linear and nonlinear regression models.
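A sketch of the core idea: a feature-weighted distance inside a greedy, diversity-seeking query strategy, with weights taken from, say, normalised absolute regression coefficients. This toy version does not reproduce the paper's five specific variants:

```python
import numpy as np

def weighted_greedy_query(X_pool, X_labeled, w, n_query=5):
    """Pick pool points farthest from the labelled set under a
    feature-weighted Euclidean distance. w: per-feature weights,
    e.g. |linear-model coefficients| normalised to sum to 1.
    An illustrative strategy, not the paper's exact methods."""
    sw = np.sqrt(w)                       # weight inside the squared norm
    P, L = X_pool * sw, list(X_labeled * sw)
    chosen = []
    for _ in range(n_query):
        d = np.min([np.linalg.norm(P - l, axis=1) for l in L], axis=0)
        i = int(np.argmax(d))
        chosen.append(i)
        L.append(P[i])                    # selected point now counts as labelled
    return chosen
```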
GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes
Saman Khamesian, Sri Harini Balaji, Di Yang Shi, Stephanie M. Carpenter, Daniel E. Rivera, W. Bradley Knox, Peter Stone, Hassan Ghasemzadeh
Reinforcement Learning
  • GUIDE provides personalized behavioral recommendations for insulin and carbohydrate intake in T1D management.
  • The framework integrates a glucose level predictor and supports both offline and online reinforcement learning algorithms.
  • CQL-BC algorithm demonstrated the highest average time-in-range and low hypoglycemia exposure among evaluated methods.
  • The approach maintains behavioral similarity to patient action patterns, enhancing clinical applicability.
Lévy-Flow Models: Heavy-Tail-Aware Normalizing Flows for Financial Risk Management
Rachid Drissi
Generative Models Theory Time Series
  • Lévy-Flows replace Gaussian bases with Lévy process-based distributions to better capture heavy tails.
  • The paper proves tail index preservation under asymptotically linear transformations.
  • Experimental results show significant improvements in density estimation and risk calibration over traditional Gaussian flows.
  • Different Lévy bases (VG and NIG) are preferable depending on the target risk management objective.
MATA-Former & SIICU: Semantic Aware Temporal Alignment for High-Fidelity ICU Risk Prediction
Zhichong Zheng, Xiaohang Nie, Xueqi Wang, Yuanjin Zhao, Haitao Zhang, Yichao Tang
Time Series Multimodal
  • Introduction of MATA-Former, a transformer architecture that aligns clinical semantics with temporal dynamics.
  • Development of Plateau-Gaussian Soft Labeling (PSL) for continuous risk modeling instead of binary classification.
  • Creation of the SIICU dataset with over 506,000 expert-annotated clinical events for robust evaluation.
  • Demonstrated superior performance in risk prediction compared to existing methods on both SIICU and MIMIC-IV datasets.
Benchmark Problems and Benchmark Datasets for the evaluation of Machine and Deep Learning methods on Photoplethysmography signals: the D4 report from the QUMPHY project
Urs Hackstein, Jordi Alastruey, Philip Aston, Ciaran Bench, Peter H. Charlton, Loic Coquelin, Nando Hegemann, Vaidotas Marozas, Mohammad Moulaeifard, Manasi Nandi, Andrius Petrenas, Oskar Pfeffer, Mantas Rinkevicius, Andrius Solosenko, Nils Strodthoff, Sara Vardanega
Time Series
  • Identification of six benchmark problems for evaluating PPG signal analysis.
  • Provision of suitable datasets and guidelines for their usage.
  • Focus on quantifying uncertainties in machine learning applications in healthcare.
  • Encouragement of standardization and collaboration in PPG research.
Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation
Martin Jaraiz
Large Language Models Optimization Efficient ML
  • Introduction of cost-penalized fitness metrics enhances expert management in MoE systems.
  • Demonstration of 'molecular memory' allows for faster recovery from domain shifts without expert replacement.
  • Significant potential cost savings and energy reductions for large-scale LLM providers.
  • FMA orchestrated approach fundamentally differs from static expert management in existing MoE architectures.
Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference
Dimitrios Danopoulos, Enrico Lupi, Michael Kagan, Maurizio Pierini
Efficient ML NLP Large Language Models
  • Introduction of Head-Calibrated Clipped-Linear Softmax (HCCS) as a softmax surrogate for quantized multi-head attention.
  • HCCS preserves the ordering of logits and generates stable probability distributions without explicit exponentiation (see the sketch after this list).
  • Lightweight per-head calibration method enhances accuracy across heterogeneous attention head distributions.
  • First int8-optimized softmax implementation for AMD Versal AI Engine, achieving higher throughput than existing BF16 implementations.
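A hedged reconstruction of what a clipped-linear softmax surrogate can look like: shift logits by the row maximum, clip them into a linear window of width c, and normalise by the sum. Here c plays the role of the per-head calibrated constant; the exact HCCS kernel may differ:

```python
import numpy as np

def clipped_linear_softmax(logits, c=8.0, eps=1e-6):
    """Order-preserving, exponent-free softmax surrogate. c stands in
    for a per-head calibration constant; this is an illustrative
    reconstruction, not the exact HCCS definition."""
    z = logits - logits.max(axis=-1, keepdims=True)   # z <= 0
    p = np.clip(z + c, 0.0, c) / c                    # linear ramp in [0, 1]
    return p / (p.sum(axis=-1, keepdims=True) + eps)
```

Because both the ramp and the normalisation are piecewise-linear, the whole operation maps naturally onto integer arithmetic, which is what makes an int8 implementation feasible.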
go-mHC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices
Torque Dandachi, Sophia Diggs-Galligan
Theory Efficient ML Large Language Models
  • Introduces go-mHC, a novel parameterization method for doubly stochastic matrices (the underlying orthostochastic construction is sketched after this list).
  • Achieves O(d^3) scaling, significantly improving efficiency over existing methods.
  • Demonstrates enhanced expressivity and stability in neural network training.
  • Converges up to 10 times faster on synthetic tasks compared to traditional methods.
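The title's orthostochastic matrices rest on a classical fact worth making concrete: squaring the entries of an orthogonal matrix yields a doubly stochastic matrix, so an unconstrained parameter matrix can be pushed through a QR factorization (an O(d^3) operation, matching the scaling above) to produce valid mixing weights. A minimal sketch of that construction; go-mHC's generalized parameterization is more involved:

```python
import numpy as np

def orthostochastic(X):
    """Map an unconstrained square matrix to a doubly stochastic one:
    orthogonalise via QR, then square entrywise. Rows and columns of
    the result each sum to 1 because an orthogonal matrix has unit-norm
    rows and columns. A sketch of the idea, not go-mHC itself."""
    Q, _ = np.linalg.qr(X)
    return Q ** 2

B = orthostochastic(np.random.randn(4, 4))
print(B.sum(axis=0), B.sum(axis=1))   # both ~[1. 1. 1. 1.]
```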
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies
Zhanzhi Lou, Hui Chen, Yibo Li, Qian Wang, Bryan Hooi
NLP Large Language Models Optimization
  • META-TTL formulates Test-Time Learning as a meta-learning problem focused on optimizing adaptation policies.
  • The framework employs a bi-level optimization structure with an inner TTL loop and an outer evolutionary search loop.
  • Evaluations show significant performance improvements over hand-crafted adaptation policies, with gains generalizing to unseen tasks.
  • The learned adaptation policy is realized as a natural-language meta-prompt, enabling concrete adaptation instructions.
SAGE: Subsurface AI-driven Geostatistical Extraction with proxy posterior
Huseyin Tuna Erdinc, Ipsita Bhar, Rafael Orozco, Thales Souza, Felix J. Herrmann
Generative Models Theory Efficient ML
  • SAGE learns a proxy posterior from incomplete velocity and migrated image data.
  • It generates high-resolution velocity realizations conditioned solely on migrated images.
  • The framework can be fine-tuned on field data, enhancing its applicability.
  • SAGE serves as a data sample generator for training task-specific networks.
Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training
William Hoy, Binxu Wang, Xu Pan
Large Language Models Reinforcement Learning Optimization
  • ES can match or exceed GRPO in task accuracy across various settings.
  • The update behaviors of ES and GRPO are markedly different, with ES making larger and more diffuse updates.
  • Despite different update trajectories, ES and GRPO solutions are linearly connected in parameter space.
  • A theoretical framework explains the random-walk-like behavior of ES in high-dimensional spaces.
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu
NLP Large Language Models Generative Models
  • Expert-choice (EC) routing outperforms token-choice (TC) routing in diffusion language models (DLMs), achieving better load balance and faster convergence (the EC mechanism is sketched after this list).
  • Timestep-dependent expert capacity allows for dynamic allocation of resources based on the denoising step.
  • Low-mask-ratio contexts yield higher learning efficiency, justifying increased computational focus during these steps.
  • Existing pretrained TC DLMs can be adapted to EC routing with significant performance improvements.
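The EC mechanism itself is simple to state: instead of each token choosing its top experts (token choice), each expert chooses its own top-capacity tokens, so load balance holds by construction. A generic sketch of expert-choice routing with the timestep-dependent capacity left as a parameter; the paper's DLM-specific scheduling is not reproduced:

```python
import torch

def expert_choice_route(scores, capacity):
    """scores: [num_tokens, num_experts] router logits. Each expert
    selects its own top-`capacity` tokens, so no expert can overflow.
    `capacity` could vary with the denoising timestep, per the summary."""
    weights = scores.softmax(dim=0)                 # per-expert over tokens
    top_w, top_idx = weights.topk(capacity, dim=0)  # [capacity, num_experts]
    return top_idx, top_w                           # tokens and mix weights
```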
Bridging the Simulation-to-Experiment Gap with Generative Models using Adversarial Distribution Alignment
Kai Nelson, Tobias Kreiman, Sergey Levine, Aditi S. Krishnapriyan
Generative Models
  • Introduces Adversarial Distribution Alignment (ADA) to bridge the simulation-to-experiment gap.
  • Proves that ADA can recover target observable distributions even with correlated observables.
  • Demonstrates empirical success on synthetic, molecular, and experimental protein data.
  • Aligns generative models trained on simulation data with real-world experimental observations.
Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
Björn Roman Kohlberger
NLP Large Language Models Efficient ML
  • SCT enables training of large language models on consumer hardware by using compact SVD representations (see the sketch after this list).
  • Achieves up to 199× memory reduction per MLP layer, allowing for training on devices with limited memory.
  • Rank-sweep experiments indicate that rank 128 is the most efficient configuration for training.
  • Convergence gaps compared to dense training are primarily influenced by learning rate rather than model rank.
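The storage story is easy to make concrete: keep each layer permanently as a rank-r factorization U diag(s) V^T and, after every optimizer step, retract the factors back onto the Stiefel manifold with a QR factorization. A hedged sketch under those assumptions, with hypothetical shapes; SCT's exact retraction details may differ:

```python
import numpy as np

def qr_retract(M):
    """Return the Q factor with column signs fixed so the retraction is
    deterministic; Q has orthonormal columns (a Stiefel point)."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

# A layer stored compactly as W ~= U @ np.diag(s) @ Vt, never densified.
m, n, r = 1024, 1024, 128            # rank 128, per the sweep above
U = qr_retract(np.random.randn(m, r))
Vt = qr_retract(np.random.randn(n, r)).T
s = np.ones(r)
# Memory: (m + n + 1) * r floats instead of m * n for the dense weight.
```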
ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving
Annette Taberner-Miller
Large Language Models Reinforcement Learning Optimization
  • ParetoBandit is the first adaptive router for LLMs that enforces budget constraints while adapting to non-stationary conditions.
  • The system employs an online primal-dual budget pacer for real-time cost management (sketched after this list).
  • Geometric forgetting allows the router to quickly adjust to shifts in model quality and pricing.
  • A hot-swap registry facilitates the addition and removal of models without downtime.
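The budget pacer is the most transferable piece, and a standard primal-dual version fits in a few lines: maintain a dual price on spend, route each request to the model with the best quality-minus-priced-cost score, and move the price with the budget violation. A sketch with assumed names and update rule, not ParetoBandit's exact pacer:

```python
def route_and_pace(quality, cost, lam, spend_rate, budget_rate, eta=0.05):
    """One illustrative primal-dual pacing step. quality[i], cost[i]:
    predicted quality and price of model i; lam: dual price on spend.
    Raise lam when spending outpaces the budget, lower it otherwise."""
    i = max(range(len(quality)), key=lambda j: quality[j] - lam * cost[j])
    lam = max(0.0, lam + eta * (spend_rate - budget_rate))
    return i, lam
```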
Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions
Haoyu Zheng, Yongqiang Zhang, Fangcheng Fu, Xiaokai Zhou, Hao Luo, Hongchao Zhu, Yuanyuan Zhu, Hao Wang, Xiao Yan, Jiawei Jiang
Large Language Models NLP Optimization
  • Existing scheduling methods for LLM inference rely on point estimates of output lengths, which are inadequate due to the stochastic nature of LLM decoding.
  • Output lengths can be modeled as a heavy-tailed distribution, specifically a log-t distribution, to better capture their variability.
  • The proposed Tail Inflated Expectation (TIE) metric adjusts the expected output length to account for the risk of generating long outputs (see the sketch after this list).
  • The TIE scheduler significantly outperforms traditional methods, reducing latency and improving throughput in both online and offline scenarios.
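A hedged sketch of the length model behind the scheduler: fit a Student-t in log space (a log-t distribution) to observed output lengths, then inflate the plain expectation toward a high quantile so long-tail risk is priced in. The blend below is an assumption standing in for the paper's exact TIE formula:

```python
import numpy as np
from scipy import stats

def tail_inflated_expectation(log_len_samples, q=0.95, alpha=0.5):
    """Illustrative TIE-style estimate: fit a Student-t to log lengths,
    then blend a typical length with a tail quantile. The paper's
    exact definition of TIE may differ."""
    df, loc, scale = stats.t.fit(log_len_samples)
    typical = np.exp(loc)                              # central length
    tail = np.exp(stats.t.ppf(q, df, loc, scale))      # long-output risk
    return (1 - alpha) * typical + alpha * tail

# A shortest-predicted-first scheduler would then sort requests by TIE.
```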
Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP
Sriram Sattiraju, Vaibhav Gollapalli, Aryan Shah, Timothy McMahan
Generative Models Time Series Optimization
  • Introduces a framework for cognitive energy modeling using EEG and SBP.
  • Demonstrates that synthetic EEG generated by WGAN retains necessary dynamical structures.
  • Validates the use of SBP-derived transport costs for analyzing cognitive state transitions.
  • Proposes real-time adaptive human-machine systems based on cognitive energy metrics.
CANDI: Curated Test-Time Adaptation for Multivariate Time-Series Anomaly Detection Under Distribution Shift
HyunGi Kim, Jisoo Mok, Hyungyu Lee, Juhyeon Shin, Sungroh Yoon
Time Series
  • CANDI introduces a novel TTA framework for MTSAD that addresses distribution shifts.
  • The False Positive Mining (FPM) strategy curates informative samples for adaptation.
  • The Spatiotemporally-Aware Normality Adaptation (SANA) module enables lightweight model updates.
  • CANDI achieves up to a 14% improvement in AUROC while using less than 2% of test data for adaptation.
Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold
Anamika Paul Rupa
Theory
  • Neural Collapse occurs when the mean feature norm reaches a critical threshold (fn*) that is largely invariant to training conditions.
  • Training dynamics primarily affect the rate at which the mean feature norm approaches fn*, rather than the threshold value itself.
  • The crossing of the mean feature norm below fn* predicts NC onset with a mean lead time of 62 epochs.
  • Significant architectural effects on fn* were observed, with variations across different datasets.
Variational LSTM with Augmented Inputs: Nonlinear Response History Metamodeling with Aleatoric and Epistemic Uncertainty
Manisha Sapkota, Min Li, Bowei Li
Time Series
  • Introduces a Variational LSTM model for nonlinear structural metamodeling.
  • Augmented inputs are used to capture variability and uncertainty in structural responses.
  • Epistemic uncertainty is quantified using Monte Carlo dropout, enhancing prediction reliability (sketched after this list).
  • Validated on nonlinear systems under stochastic seismic and wind loads.
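Monte Carlo dropout is a generic technique and can be stated precisely: keep dropout stochastic at test time and read epistemic uncertainty off the spread of repeated forward passes. A minimal PyTorch sketch of that standard recipe, not this paper's full variational LSTM:

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    """Epistemic uncertainty via Monte Carlo dropout: run the model
    n_samples times with dropout active and return the mean prediction
    with its standard deviation. Assumes the model has no batch-norm
    layers, which model.train() would also switch."""
    model.train()                      # keeps dropout layers stochastic
    preds = torch.stack([model(x) for _ in range(n_samples)])
    model.eval()
    return preds.mean(dim=0), preds.std(dim=0)
```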
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang
NLP Large Language Models Efficient ML
  • FourierMoE integrates MoE architecture with inverse discrete Fourier transform (IDFT) for frequency-aware adaptation.
  • The method addresses task interference and representation deficiency in multi-task fine-tuning settings.
  • FourierMoE outperforms competitive baselines across 28 benchmarks with fewer trainable parameters.
  • The approach utilizes a frequency-adaptive router and learns complex coefficients to capture phase and amplitude information.
PI-JEPA: Label-Free Surrogate Pretraining for Coupled Multiphysics Simulation via Operator-Split Latent Prediction
Brandon Yee, Pairie Koh
Efficient ML
  • PI-JEPA enables label-free pretraining using unlabeled parameter fields, significantly reducing the need for expensive labeled simulation data.
  • The framework employs masked latent prediction and PDE residual regularization to ensure physical plausibility during training.
  • PI-JEPA achieves superior performance compared to existing methods like FNO and DeepONet, particularly with limited labeled data.
  • The architecture is structured to exploit the operator-splitting method, allowing for specialized learning of different physical processes.
Differentially Private Manifold Denoising
Jiaqi Wu, Yiqing Sun, Zhigang Yao
Theory
  • Introduces a differentially private framework for manifold denoising that protects sensitive data.
  • Employs an iterative procedure to estimate local geometry and project noisy queries while ensuring privacy.
  • Establishes utility guarantees for corrected queries based on manifold properties and privacy constraints.
  • Demonstrates practical applicability through simulations and case studies, highlighting utility-privacy trade-offs.
Embedded Variational Neural Stochastic Differential Equations for Learning Heterogeneous Dynamics
Sandeep Kumar Samota, Reema Gupta, Snehashish Chakraverty
Time Series
  • Introduction of V-NSDE model for socioeconomic data analysis.
  • Combines Neural SDEs and VAEs for improved modeling of heterogeneous dynamics.
  • Utilizes district-level data from Odisha, showcasing inter-district variability.
  • Demonstrates effective learning of complex temporal patterns and uncertainty quantification.
Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
Taisuke Kobayashi
Reinforcement Learning Robotics Theory
  • Introduces the Pseudo-Quantized Actor-Critic (PQAC) algorithm for robust learning in RL.
  • Passes TD errors through a sigmoid so that gradients vanish when the errors are dominated by noise.
  • Implements pseudo-quantization of TD errors to enhance noise reduction.
  • Demonstrates improved stability and efficiency in learning compared to traditional methods.
Residuals-based Offline Reinforcement Learning
Qing Zhu, Xian Yu
Reinforcement Learning Optimization Theory
  • Introduces a residuals-based framework for offline reinforcement learning that mitigates data coverage issues.
  • Defines a residuals-based Bellman optimality operator that incorporates estimation errors into policy optimization.
  • Establishes conditions for the asymptotic optimality and finite-sample guarantees of the proposed operator.
  • Develops a residuals-based offline DQN algorithm and demonstrates its effectiveness in a stochastic CartPole environment.
DySCo: Dynamic Semantic Compression for Effective Long-term Time Series Forecasting
Xiang Ao, Yinyu Tan, Mengru Chen
Time Series
  • DySCo introduces a learnable paradigm for dynamic semantic compression in time series forecasting.
  • The framework effectively distinguishes valuable signals from irrelevant noise in long historical sequences.
  • EGDS retains high-entropy segments while compressing redundant trends, enhancing predictive performance.
  • HFED separates high-frequency anomalies from low-frequency patterns for better detail preservation.
Empirical Validation of the Classification-Verification Dichotomy for AI Safety Gates
Arsenios Scrivens
Theory Reinforcement Learning Robotics
  • All tested classifier configurations failed to ensure safe self-improvement in AI systems.
  • The Lipschitz ball verifier achieved 100% soundness across various dimensions, demonstrating its effectiveness.
  • The impossibility of classifier-based safety gates is structural and not dependent on specific configurations or conditions.
  • The study provides empirical constants and scaling laws that were not predicted by theory alone.
LI-DSN: A Layer-wise Interactive Dual-Stream Network for EEG Decoding
Chenghao Yue, Zhiyuan Ma, Zhongye Xia, Xinche Zhang, Yisi Zhang, Xinke Shen, Sen Song
Time Series
  • LI-DSN addresses the 'information silo' problem in existing dual-stream EEG networks by enabling layer-wise interaction.
  • The Temporal-Spatial Integration Attention (TSIA) mechanism allows for dynamic integration of temporal and spatial features.
  • The model employs an adaptive fusion strategy with learnable weights to optimize feature integration.
  • Extensive experiments show LI-DSN outperforms 13 state-of-the-art models across various EEG tasks.
Sit-to-Stand Transitions Detection and Duration Measurement Using Smart Lacelock Sensor
Md Rafi Islam, Md Rejwanul Haque, Elizabeth Choma, Shannon Hayes, Siobhan McMahon, Xiangrong Shen, Edward Sazonov
Multimodal
  • The Smart Lacelock sensor effectively detects Sit-to-Stand transitions in older adults.
  • The methodology integrates load cell and IMU data for accurate motion analysis.
  • High classification accuracy (0.98) and low duration measurement error (0.047 seconds) were achieved.
  • The approach offers a non-invasive alternative to traditional clinical assessments.
Using predefined vector systems to speed up neural network multimillion class classification
Nikita Gabdullin, Ilya Androsov
Efficient ML
  • Reduction of label prediction complexity from O(n) to O(1) using predefined vector systems (see the sketch after this list).
  • Achieves up to 11.6 times acceleration in neural network inference for multimillion class classification.
  • Maintains training accuracy while improving computational efficiency.
  • Enables potential prediction of new classes based on the latent space configuration.
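One way to get O(1) decoding is to place the classes on a predefined coordinate grid: encode a label as base-B digits, train the network to regress to that vector, and decode by rounding each coordinate, so no argmax over millions of logits is needed. A hypothetical encoding, not necessarily the vector systems the paper uses:

```python
import numpy as np

def class_to_vector(label, base=100, dims=4):
    """Encode a class index as `dims` base-`base` digits; base**dims
    classes fit in a dims-dimensional grid. Illustrative only."""
    digits = []
    for _ in range(dims):
        digits.append(label % base)
        label //= base
    return np.array(digits, dtype=float)

def vector_to_class(vec, base=100):
    """Constant-time decode: round each coordinate to the nearest digit
    and rebuild the index -- no O(n) scan over class scores."""
    digits = np.clip(np.rint(vec), 0, base - 1).astype(int)
    return int(sum(int(d) * base ** i for i, d in enumerate(digits)))

v = class_to_vector(3_141_592)
assert vector_to_class(v) == 3_141_592
```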
Diversity-Aware Reverse Kullback-Leibler Divergence for Large Language Model Distillation
Hoang-Chau Luong, Dat Ba Tran, Lingwei Chen
NLP Large Language Models Optimization
  • RKL (reverse KL) aids LLM distillation by focusing on dominant modes but introduces overconfidence and low diversity in predictions (the plain RKL loss is sketched after this list).
  • The authors analyze RKL's gradient behavior, showing that non-target gradients negatively impact target logits, leading to poor non-target class alignment.
  • DRKL is proposed to address RKL's limitations by eliminating non-target gradient effects and enhancing non-target supervision.
  • Empirical results show DRKL's superiority over FKL, RKL, and other distillation methods across various datasets and model families.
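For reference, the quantity whose gradients the paper dissects is the plain reverse KL distillation loss, which is standard and shown below; DRKL's modification of the non-target gradients is not reproduced here:

```python
import torch
import torch.nn.functional as F

def reverse_kl(student_logits, teacher_logits, T=2.0):
    """Reverse KL distillation loss KL(student || teacher). Being
    mode-seeking, it concentrates the student on the teacher's dominant
    modes -- the overconfidence-prone behaviour analysed above."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.log_softmax(teacher_logits / T, dim=-1)
    # KL(P_s || P_t) = sum_y p_s(y) * (log p_s(y) - log p_t(y))
    return (s.exp() * (s - t)).sum(dim=-1).mean() * T * T
```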
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
Reinforcement Learning Large Language Models NLP
  • SKILL0 is the first RL framework explicitly designed for skill internalization, moving agents from dependence on inference-time skills to autonomous zero-shot behavior.
  • The in-context reinforcement learning approach provides structured skill guidance during training and removes it at inference, optimizing the transition to intrinsic competence.
  • Dynamic Curriculum adapts the withdrawal of skills based on their on-policy helpfulness, enhancing the internalization process.
  • Extensive experiments demonstrate substantial performance improvements over traditional RL methods and competitive results against skill-augmented approaches.
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Zihao Wu, Hongyao Tang, Yi Ma, Jiashun Liu, Yan Zheng, Jianye Hao
Reinforcement Learning Theory Optimization
  • Introduces a theoretical framework for understanding plasticity loss in deep RL.
  • Identifies two mechanisms causing plasticity loss: NTK rank collapse and gradient decay.
  • Proposes Sample Weight Decay (SWD) as a solution to restore gradient magnitude.
  • Demonstrates SWD's effectiveness across various RL algorithms and environments.
Perspective: Towards sustainable exploration of chemical spaces with machine learning
Leonardo Medrano Sandonas, David Balcells, Anton Bochkarev, Jacqueline M. Cole, Volker L. Deringer, Werner Dobrautz, Adrian Ehrenhofer, Thorben Frank, Pascal Friederich, Rico Friedrich, Janine George, Luca Ghiringhelli, Alejandra Hinostroza Caldas, Veronika Juraskova, Hannes Kneiding, Yury Lysogorskiy, Johannes T. Margraf, Hanna Türk, Anatole von Lilienfeld, Milica Todorović, Alexandre Tkatchenko, Mariana Rossi, Gianaurelio Cuniberti
Efficient ML
  • AI's growing computational demands pose sustainability challenges in molecular and materials science.
  • Emerging strategies for enhancing efficiency include multi-fidelity approaches and active learning.
  • Incorporating physics-based constraints can optimize resource use without sacrificing reliability.
  • Bridging computational predictions with real-world conditions is essential for practical applications.
Learning from the Right Rollouts: Data Attribution for PPO-based LLM Post-Training
Dong Shu, Denghui Zhang, Jessica Hullman
Reinforcement Learning Large Language Models Interpretability
  • Introduction of Influence-Guided PPO (I-PPO) framework for improved RL post-training.
  • Utilization of gradient-based influence scores to filter out detrimental episodes.
  • Demonstrated performance improvements over SFT and traditional PPO methods.
  • I-PPO acts as an intrinsic early stopping mechanism, accelerating training.
Informed Machine Learning with Knowledge Landmarks
Chuyi Dai, Witold Pedrycz, Suping Xu, Ding Liu, Xianmin Wang
Theory Optimization
  • Introduction of the KD-ML framework that combines local numeric data with global qualitative knowledge.
  • Development of knowledge landmarks as structural constraints that summarize system behavior across varying conditions.
  • Formulation of an augmented loss function that balances local data fitting with global knowledge regularization.
  • Demonstration of improved full-domain generalization on physics-governed benchmarks compared to traditional models.
Screening Is Enough
Ken M. Nakanishi
NLP Large Language Models Efficient ML
  • Introduction of Multiscreen architecture enabling absolute query-key relevance through screening.
  • Achieves comparable validation loss with 40% fewer parameters than Transformer models.
  • Enables stable optimization at larger learning rates and maintains strong long-context performance.
  • Reduces inference latency by up to 3.2 times compared to Transformer baselines.
Optimizing EEG Graph Structure for Seizure Detection: An Information Bottleneck and Self-Supervised Learning Approach
Lincan Li, Rikuto Kotoge, Xihao Piao, Zheng Chen, Yushun Dong
Graph Learning Time Series Interpretability
  • IRENE optimizes EEG graph structures using the Information Bottleneck principle to enhance seizure detection.
  • The framework employs a self-supervised learning approach to improve representation learning without relying on labeled data.
  • IRENE addresses the challenges of noise in EEG data and inter-patient variability, leading to more robust models.
  • The method provides interpretable insights into seizure propagation and the relationships between brain regions.
Event Embedding of Protein Networks: Compositional Learning of Biological Function
Antonin Sulc
Graph Learning
  • Event2Vec significantly improves pathway coherence and functional analogy accuracy compared to DeepWalk.
  • The study demonstrates that compositional structure enhances relational reasoning in biological networks.
  • Event2Vec achieves a mean pathway coherence of 0.870, outperforming DeepWalk's 0.648.
  • The research highlights the importance of geometric properties in understanding protein interactions.
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
Gengsheng Li, Tianyu Yang, Junfeng Fang, Mingyang Song, Mao Zheng, Haiyun Guo, Dan Zhang, Jinqiao Wang, Tat-Seng Chua
Reinforcement Learning Large Language Models Optimization
  • Introduces Sample-Routed Policy Optimization (SRPO) to unify GRPO and SDPO methods.
  • Addresses the limitations of GRPO's coarse credit assignment and SDPO's late-stage instability.
  • Implements an entropy-aware dynamic weighting mechanism to enhance training stability.
  • Achieves significant performance improvements over GRPO and SDPO across multiple benchmarks.
Label Shift Estimation With Incremental Prior Update
Yunrui Zhang, Gustavo Batista, Salil S. Kanhere
Theory Efficient ML Optimization
  • Introduces LEIP, a new method for label shift estimation that updates priors incrementally (a classic EM-style prior update is sketched after this list for background).
  • Assumes no concept drift, focusing on changes in label distribution while keeping feature likelihoods constant.
  • Demonstrates compatibility with any black-box probabilistic classifier.
  • Achieves superior performance compared to existing maximum likelihood-based methods.
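For background, the classic EM-style prior re-estimation of Saerens et al. is the standard starting point for prior-update methods in this space; LEIP's incremental scheme itself differs:

```python
import numpy as np

def em_prior_update(posteriors, train_prior, n_iter=50):
    """EM re-estimation of the test-time class prior under label shift:
    reweight classifier posteriors p(y|x) by a candidate prior ratio,
    renormalise, and average to get the next prior. Works with any
    black-box probabilistic classifier, as the summary notes."""
    prior = train_prior.copy()
    for _ in range(n_iter):
        adj = posteriors * (prior / train_prior)   # reweight posteriors
        adj /= adj.sum(axis=1, keepdims=True)      # renormalise per sample
        prior = adj.mean(axis=0)                   # M-step: new prior estimate
    return prior
```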
Task-Centric Personalized Federated Fine-Tuning of Language Models
Gabriel U. Talasso, Meghdad Kurmanji, Allan M. de Souza, Nicholas D. Lane, Leandro A. Villas
Federated Learning Large Language Models NLP
  • Introduction of FedRouter, a task-centric personalized federated learning method.
  • Addresses generalization issues and intra-client task interference in FL.
  • Utilizes local and global clustering mechanisms for model specialization.
  • Implements an adaptive evaluation router for improved inference.
CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
Junyoung Sung, Seungwoo Lyu, Minjun Kim, Sumin An, Arsha Nagrani, Paul Hongsuck Seo
Multimodal
  • CRIT provides a new benchmark for evaluating cross-modal multi-hop reasoning in VLMs.
  • The dataset is generated using a graph-based automatic synthesis pipeline, ensuring complex interleaved relationships.
  • State-of-the-art models struggle with reasoning tasks in CRIT, indicating a gap in current multimodal training.
  • Models trained on CRIT show significant performance improvements on existing multimodal benchmarks.
When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals
Rui Wu, Ruixiang Tang
Reinforcement Learning Large Language Models Optimization
  • Identification of a reproducible three-phase rebound pattern in reward hacking behavior.
  • Discovery that the shortcut concept direction is the most effective for detecting hacking behavior.
  • Introduction of Advantage Modification, a method that penalizes hacking rollouts at the training-signal level.
  • Demonstration of the effectiveness of representation-level signals in mitigating reward hacking.
Performance of Neural and Polynomial Operator Surrogates
Josephine Westermann, Benno Huber, Thomas O'Leary-Roseberry, Jakob Zech
Theory Efficient ML
  • Neural and polynomial operator surrogates are compared for efficiency in approximating PDE solutions.
  • Polynomial surrogates outperform neural operators in data efficiency for smooth input fields.
  • Fourier neural operators show faster convergence rates for rough input fields.
  • Derivative-informed training improves data efficiency for neural operators.
Policy Improvement Reinforcement Learning
Huaiyang Wang, Xiaojie Li, Deqing Wang, Haoyi Zhou, Zixuan Huang, Yaodong Yang, Jianxin Li, Yikun Ban
Reinforcement Learning Large Language Models Optimization
  • PIRL addresses the lack of policy improvement feedback in existing RLVR methods, which can lead to instability.
  • PIPO implements a closed-loop optimization process that verifies updates and reinforces genuine improvements.
  • The proposed methods lead to smoother training dynamics and better robustness against mode collapse.
  • Theoretical analysis supports the effectiveness of PIPO in achieving the PIRL objective.