AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

52 Papers today
8h Update frequency
7 Days of history
DySCo: Dynamic Semantic Compression for Effective Long-term Time Series Forecasting
Xiang Ao, Yinyu Tan, Mengru Chen
Time Series
  • DySCo addresses the paradox of increasing lookback windows in time series forecasting by reducing noise and redundancy.
  • The framework incorporates EGDS for dynamic sampling based on entropy, preserving valuable information.
  • HFED allows for multi-granularity modeling by separating high-frequency anomalies from low-frequency patterns.
  • CSIM enhances prediction accuracy by dynamically fusing global and local information.
Read more
AA-SVD: Anchored and Adaptive SVD for Large Language Model Compression
Atul Kumar Sinha, François Fleuret
NLP Large Language Models Efficient ML
  • AA-SVD allows for rapid compression of billion-parameter models without retraining.
  • The method addresses distribution shifts caused by upstream compression, improving accuracy.
  • AA-SVD minimizes block-level output distortion by refining all compressed layers jointly.
  • Experimental results show significant performance improvements over existing SVD-based baselines.
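As context for the SVD-based baselines, here is a minimal sketch of plain truncated-SVD weight compression. AA-SVD's anchoring and joint block-level refinement are the paper's contributions and are not reproduced; the matrix size and rank are illustrative assumptions:

```python
import numpy as np

def svd_compress(W, rank):
    """Replace a dense weight matrix W (m x n) with a rank-r factorization.

    Generic truncated-SVD compression only; AA-SVD additionally anchors
    and jointly refines the compressed layers.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # m x r, singular values folded in
    B = Vt[:rank, :]             # r x n
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = svd_compress(W, rank=64)
orig_params = W.size
comp_params = A.size + B.size    # 4x fewer parameters at rank 64
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(orig_params, comp_params, round(rel_err, 3))
```

Retraining-free methods like this trade reconstruction error for parameter count; raising the rank monotonically reduces the error.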
Read more
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
Zhongwei Yu, Rasul Tutunov, Alexandre Max Maraval, Zikai Xie, Zhenzhi Tan, Jiankang Wang, Zijing Li, Liangliang Xu, Qi Yang, Jun Jiang, Sanzhong Luo, Zhenxiao Guo, Haitham Bou-Ammar, Jun Wang
Optimization
  • Bayesian Optimization formalizes the scientific discovery process, reducing reliance on trial-and-error.
  • Surrogate models and acquisition functions are crucial for guiding experimental design and decision-making.
  • The tutorial provides practical coding examples and theoretical insights tailored for various audiences.
  • Real-world case studies demonstrate the effectiveness of BO in diverse scientific fields.
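The surrogate-plus-acquisition loop at the heart of BO fits in a few lines. This is a generic sketch, not code from the tutorial: the 1D toy objective, RBF-kernel Gaussian-process surrogate, length scale, and grid-based Expected Improvement maximization are all illustrative assumptions:

```python
import math
import numpy as np

def rbf(X1, X2, ls=0.2):
    """RBF kernel between two 1D point sets."""
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ls**2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-6):
    """Posterior mean and std of a zero-mean GP surrogate."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return Ks.T @ alpha, np.sqrt(var)

def expected_improvement(mu, sd, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sd
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    phi = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    return (best - mu) * Phi + sd * phi

f = lambda x: np.sin(6 * x) + 0.5 * x      # toy objective to minimize
grid = np.linspace(0, 1, 201)
X = np.array([0.1, 0.9]); y = f(X)         # two initial "experiments"
for _ in range(10):                        # BO loop: fit, acquire, evaluate
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, y.min()))]
    X = np.append(X, x_next); y = np.append(y, f(x_next))
print(round(float(X[np.argmin(y)]), 3), round(float(y.min()), 3))
```

Each iteration replaces one real experiment: the surrogate summarizes all evidence so far, and the acquisition function decides where the next evaluation is most informative.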
Read more
Soft MPCritic: Amortized Model Predictive Value Iteration
Thomas Banker, Nathan P. Lawrence, Ali Mesbah
Reinforcement Learning Robotics Optimization
  • Introduces a hybrid RL-MPC framework that operates entirely in value space.
  • Utilizes model predictive path integral control (MPPI) for online control and value target generation.
  • Implements an amortized warm-start strategy to enhance computational efficiency.
  • Demonstrates effectiveness on classic and complex control tasks.
Read more
Massively Parallel Exact Inference for Hawkes Processes
Ahmer Raza, Hudson Smith
Time Series Efficient ML Theory
  • Introduces a massively parallel algorithm for maximum likelihood estimation of linear exponential Hawkes processes.
  • Reduces computational complexity from O(N^2) to approximately O(N/P) with P parallel processors.
  • Utilizes a parallel prefix scan for efficient computation of event intensities.
  • Maintains exact likelihood computation without additional assumptions, preserving model interpretability.
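The sequential O(N) recursion that a prefix scan can parallelize is standard for exponential Hawkes processes; a sketch with assumed parameters, checked against the naive O(N²) sum:

```python
import math
import numpy as np

def hawkes_loglik(times, mu, alpha, beta, T):
    """Exact log-likelihood of a linear Hawkes process with kernel
    alpha * exp(-beta * t), via the O(N) recursion
    A_i = exp(-beta * dt_i) * (1 + A_{i-1}).  The recursion has the
    scan-like structure that admits parallel evaluation.
    """
    times = np.asarray(times, dtype=float)
    A, ll, prev = 0.0, 0.0, None
    for t in times:
        if prev is not None:
            A = math.exp(-beta * (t - prev)) * (1.0 + A)
        ll += math.log(mu + alpha * A)    # log intensity at each event
        prev = t
    # compensator: integral of the intensity over [0, T]
    ll -= mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
    return ll

def hawkes_loglik_naive(times, mu, alpha, beta, T):
    """Same likelihood via the direct O(N^2) double sum, for checking."""
    times = np.asarray(times, dtype=float)
    ll = 0.0
    for i, t in enumerate(times):
        ll += np.log(mu + alpha * np.sum(np.exp(-beta * (t - times[:i]))))
    ll -= mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
    return ll

rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 10.0, 50))
ll_fast = hawkes_loglik(times, 0.5, 0.8, 2.0, 10.0)
ll_naive = hawkes_loglik_naive(times, 0.5, 0.8, 2.0, 10.0)
print(round(ll_fast, 6), round(ll_naive, 6))
```

Because the recursion is exact, the recursive and quadratic forms agree to floating-point precision, which is the sense in which the parallel algorithm preserves the exact likelihood.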
Read more
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu
NLP Large Language Models Generative Models
  • Expert-choice (EC) routing outperforms token-choice (TC) routing in diffusion language models (DLMs), achieving better load balance and faster convergence.
  • Timestep-dependent expert capacity scheduling enhances performance by allocating more resources to low-mask-ratio steps.
  • Existing pretrained TC DLMs can be retrofitted to EC routing with significant improvements in accuracy and convergence speed.
  • The study provides a mechanistic explanation for the efficiency gains observed with EC routing.
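Expert-choice routing itself is easy to sketch: each expert selects its own top-capacity tokens by router score, so load balance holds by construction. This is generic EC routing, not the paper's timestep-dependent capacity schedule:

```python
import numpy as np

def expert_choice_route(scores, capacity):
    """Expert-choice (EC) routing: each expert picks its top-`capacity`
    tokens, guaranteeing equal per-expert load.  In token-choice (TC)
    routing each token would instead pick its top experts, which can
    overload popular experts.  A token may be chosen by several experts
    or by none.
    """
    n_tokens, n_experts = scores.shape
    assignment = np.zeros_like(scores, dtype=bool)
    for e in range(n_experts):
        top = np.argsort(scores[:, e])[-capacity:]
        assignment[top, e] = True
    return assignment

rng = np.random.default_rng(0)
S = rng.standard_normal((16, 4))        # router scores: tokens x experts
A = expert_choice_route(S, capacity=4)
print(A.sum(axis=0))                    # per-expert load is uniform
```

Making capacity a function of the diffusion timestep, as the paper proposes, amounts to passing a schedule of `capacity` values instead of a constant.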
Read more
CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
Junyoung Sung, Seungwoo Lyu, Minjun Kim, Sumin An, Arsha Nagrani, Paul Hongsuck Seo
Multimodal Graph Learning Computer Vision
  • CRIT introduces a new dataset specifically designed for cross-modal multi-hop reasoning.
  • The dataset is generated using a graph-based automatic data synthesis pipeline.
  • State-of-the-art VLMs show significant performance improvements when trained on CRIT.
  • Existing multimodal benchmarks often fail to test true cross-modal grounding.
Read more
MATA-Former & SIICU: Semantic Aware Temporal Alignment for High-Fidelity ICU Risk Prediction
Zhichong Zheng, Xiaohang Nie, Xueqi Wang, Yuanjin Zhao, Haitao Zhang, Yichao Tang
Time Series Multimodal
  • Introduction of MATA-Former, a transformer architecture that integrates semantic awareness into temporal attention for ICU risk prediction.
  • Development of Plateau-Gaussian Soft Labeling (PSL) to transition from binary classification to continuous regression, enhancing risk modeling granularity.
  • Creation of the SIICU dataset, featuring over 506,000 expert-annotated clinical events, addressing the lack of fine-grained clinical datasets.
  • Demonstration of improved efficacy and generalization in risk prediction using the proposed methods on both SIICU and MIMIC-IV datasets.
Read more
Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular Grids
William Howes, Jason Yoo, Kazuma Kobayashi, Subhankar Sarkar, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam
Graph Learning Efficient ML Theory
  • VIRSO offers a novel approach to virtual sensing by combining spectral and spatial analysis for accurate reconstruction.
  • The method significantly reduces energy consumption and latency, making it suitable for edge deployment.
  • VIRSO outperforms existing neural operators in terms of accuracy and efficiency on complex benchmarks.
  • The introduction of Variable KNN (V-KNN) enhances graph construction for irregular geometries.
Read more
Improving Latent Generalization Using Test-time Compute
Arslan Chaudhry, Sridhar Thiagarajan, Andrew Lampinen
NLP Large Language Models Reinforcement Learning
  • Introduces test-time compute as a method to improve latent generalization in LLMs.
  • Demonstrates that models trained to generate chains-of-thought can generalize effectively to both in-distribution and out-of-distribution tasks.
  • Finds that while thinking improves performance on many tasks, pure reversal tasks remain challenging.
  • Highlights the brittleness of factual self-verification in thinking models compared to in-context learning.
Read more
Variational LSTM with Augmented Inputs: Nonlinear Response History Metamodeling with Aleatoric and Epistemic Uncertainty
Manisha Sapkota, Min Li, Bowei Li
Time Series
  • Introduces a Variational LSTM model for nonlinear structural metamodeling.
  • Augmented inputs effectively capture record-to-record variability and system uncertainties.
  • Epistemic uncertainty is quantified with Monte Carlo dropout, making predictions more reliable by reporting how confident the model is.
  • Validated on nonlinear systems subjected to stochastic seismic and wind loads.
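Monte Carlo dropout for epistemic uncertainty can be sketched with a toy untrained network; the paper's variational LSTM, augmented inputs, and aleatoric component are not reproduced here, and all weights below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fixed two-layer network; weights are illustrative, not trained.
W1 = rng.standard_normal((1, 32))
W2 = rng.standard_normal((32, 1))

def forward_with_dropout(x, p=0.2):
    """One stochastic forward pass with dropout kept ON at test time,
    using inverted-dropout scaling so the expectation is unchanged."""
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) > p
    h = h * mask / (1.0 - p)
    return h @ W2

# MC dropout: repeat stochastic passes; the spread across passes is the
# epistemic (model) uncertainty estimate.
x = np.array([[0.5]])
samples = np.array([forward_with_dropout(x).item() for _ in range(200)])
mean, epistemic_sd = samples.mean(), samples.std()
print(round(float(mean), 3), round(float(epistemic_sd), 3))
```

Aleatoric uncertainty would come from a separate predicted noise term; MC dropout captures only the model's own uncertainty about its weights.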
Read more
Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
Jaber Jaber, Osama Jaber
Large Language Models Efficient ML NLP
  • OUROBOROS introduces a Controller hypernetwork for dynamic weight modulation in recursive transformers.
  • The system achieves a 43.4% reduction in training loss compared to a baseline model.
  • Gated recurrence is essential for maintaining performance during deep iterations.
  • The model outperforms static per-step LoRA methods, particularly at lower depths.
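A hedged sketch of input-conditioned LoRA modulation inside a weight-tied recursion, with a toy sigmoid gate standing in for the Controller hypernetwork; every shape, initialization, and the gate itself are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4
W = rng.standard_normal((d, d))            # shared base weights
A = rng.standard_normal((r, d)) * 0.1      # LoRA down-projection
B = rng.standard_normal((d, r)) * 0.1      # LoRA up-projection
w_gate = rng.standard_normal(d)            # toy stand-in for a Controller

def recursive_step(x):
    """One recursive-transformer step with an input-conditioned LoRA
    delta: the same base W is reused at every depth, but the low-rank
    correction is scaled by a gate computed from the current input."""
    g = 1.0 / (1.0 + np.exp(-(w_gate @ x)))    # sigmoid gate in (0, 1)
    return np.tanh((W + g * (B @ A)) @ x)

x = rng.standard_normal(d)
for _ in range(3):                         # weight-tied recursion, depth 3
    x = recursive_step(x)
print(x.shape)
```

The key contrast with static per-step LoRA is that the correction here depends on the running activation, so different inputs traverse effectively different weights at the same depth.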
Read more
Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring
Feiyu Zhou, Marios Impraimakis
Time Series Multimodal
  • Introduction of a transformer model for wind-excited structural response forecasting.
  • Multimodal learning from both wind features and vibration signals enhances prediction accuracy.
  • Validation with real-world data from the Hardanger Bridge under variable conditions.
  • Improves modal energy retention and reduces false alarms in structural health monitoring.
Read more
PI-JEPA: Label-Free Surrogate Pretraining for Coupled Multiphysics Simulation via Operator-Split Latent Prediction
Brandon Yee, Pairie Koh
Efficient ML
  • PI-JEPA enables label-free pretraining using abundant unlabeled parameter fields.
  • The framework employs masked latent prediction and PDE residual regularization.
  • Significant accuracy improvements over existing methods with fewer labeled runs.
  • Aligns with operator-splitting methods to handle different physical processes effectively.
Read more
Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics
Khai Banh Nghiep, Duc Nguyen Minh, Lan Hoang Thi
Time Series Optimization
  • The study presents a systematic comparison of deep learning models and statistical methods for demand forecasting.
  • N-BEATS outperformed MSTL and N-HiTS in forecasting accuracy, making it the best-performing model for this application.
  • The proposed framework effectively integrates forecasting with optimization, providing a practical solution for supply chain logistics.
  • The research highlights the importance of accurate demand forecasting in reducing operational costs and improving service levels.
Read more
DDCL-INCRT: A Self-Organising Transformer with Hierarchical Prototype Structure (Theoretical Foundations)
Giansalvo Cirrincione
Theory Efficient ML NLP
  • Introduces a self-organising transformer architecture that adapts its structure during training.
  • Utilizes Deep Dual Competitive Learning to replace traditional feedforward blocks with a prototype layer.
  • Implements Incremental Transformer to dynamically adjust the number of attention heads based on task requirements.
  • Proves that the resulting hierarchical structure is unique, minimal, and robust to pruning.
Read more
JetPrism: diagnosing convergence for generative simulation and inverse problems in nuclear physics
Zeyu Xia, Tyler Kim, Trevor Reed, Judy Fox, Geoffrey Fox, Adam Szczepaniak
Generative Models
  • JetPrism addresses the premature plateau of standard CFM loss metrics, providing a more reliable convergence diagnostic.
  • The framework incorporates a multi-metric evaluation protocol that includes χ² statistics, W1 distances, and correlation matrix distances.
  • Validation of JetPrism is performed using synthetic stress tests and a Jefferson Lab dataset relevant to the Electron-Ion Collider.
  • The proposed methodology ensures precise statistical agreement with ground-truth data without memorizing the training set.
Read more
Residuals-based Offline Reinforcement Learning
Qing Zhu, Xian Yu
Reinforcement Learning Optimization Theory
  • Introduces a residuals-based offline RL framework that mitigates data coverage limitations.
  • Defines a residuals-based Bellman optimality operator that incorporates estimation errors.
  • Develops a residuals-based offline DQN algorithm for practical implementation.
  • Demonstrates effectiveness in a stochastic CartPole environment, showing improved policy learning.
Read more
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
Reinforcement Learning Large Language Models Robotics
  • SKILL0 is the first RL framework that explicitly formulates skill internalization as a training objective.
  • In-context reinforcement learning allows for structured skill guidance during training while removing it at inference.
  • Dynamic Curriculum adapts the retention of skills based on their on-policy helpfulness, enhancing the internalization process.
  • SKILL0 achieves substantial performance improvements over traditional RL methods while maintaining a low token overhead.
Read more
Learning ECG Image Representations via Dual Physiological-Aware Alignments
Hung Manh Pham, Jialu Tang, Aaqib Saeed, Dong Ma, Bin Zhu, Zhou Pan
Multimodal Time Series Computer Vision
  • Introduces ECG-Scan, a self-supervised framework for ECG image representation learning.
  • Utilizes dual physiological-aware alignments for multimodal contrastive learning.
  • Incorporates soft-lead constraints to improve inter-lead signal consistency.
  • Demonstrates superior performance of the image-based model over existing baselines.
Read more
Learn by Surprise, Commit by Proof
Kang-Sin Choi
Large Language Models NLP Optimization
  • LSCP allows models to autonomously learn new information by verifying it against existing knowledge.
  • The framework uses a self-gating mechanism to adjust learning intensity based on the model's conviction in new information.
  • Experiments show that LSCP significantly reduces rote memorization compared to standard fine-tuning.
  • The approach mimics biological memory processes, consolidating temporary information into long-term memory.
Read more
LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications
Mayank Mayank, Bharanidhar Duraisamy, Florian Geiss
Graph Learning Multimodal Robotics
  • LEO integrates Graph Attention Networks for adaptive shape estimation in extended object tracking.
  • The framework utilizes a unique parallelogram-based ground-truth formulation to represent complex geometries.
  • A dual-attention mechanism enhances the robustness of sensor fusion by capturing temporal and spatial dependencies.
  • LEO demonstrates real-time performance suitable for production systems in autonomous driving.
Read more
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang
NLP Large Language Models Efficient ML
  • FourierMoE reformulates adaptation in the spectral domain to enhance multi-task fine-tuning.
  • The method employs a frequency-adaptive router to allocate tasks to experts based on frequency specialization.
  • FourierMoE achieves superior performance across 28 benchmarks with fewer trainable parameters compared to traditional methods.
  • The integration of complex coefficients allows for a complete representation of spectral information, improving adaptation efficiency.
Read more
ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor
Yixiao Wang, Ting Jiang, Zishan Shao, Hancheng Ye, Jingwei Sun, Mingyuan Ma, Jianyi Zhang, Yiran Chen, Hai Li
Generative Models Efficient ML
  • ZEUS uses a second-order predictor to effectively reduce denoiser evaluations, simplifying the acceleration process.
  • The method avoids the complexities of higher-order predictors that can degrade output quality under aggressive speedups.
  • ZEUS maintains compatibility with various model architectures and requires minimal code changes for integration.
  • The approach achieves significant speed improvements while preserving perceptual fidelity in generated outputs.
Read more
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Zihao Wu, Hongyao Tang, Yi Ma, Jiashun Liu, Yan Zheng, Jianye Hao
Reinforcement Learning Theory Optimization
  • Introduces a theoretical framework for understanding plasticity loss in deep RL.
  • Identifies two mechanisms contributing to plasticity loss: NTK rank collapse and gradient decay.
  • Proposes Sample Weight Decay (SWD) as a solution to restore gradient magnitude.
  • Demonstrates SWD's effectiveness across multiple RL algorithms and environments.
Read more
Learning from the Right Rollouts: Data Attribution for PPO-based LLM Post-Training
Dong Shu, Denghui Zhang, Jessica Hullman
Reinforcement Learning Large Language Models Interpretability
  • Introduction of Influence-Guided PPO (I-PPO) to filter out unfaithful episodes in RL training.
  • Demonstrated significant improvements in training efficiency and model performance compared to traditional PPO and SFT.
  • I-PPO acts as an intrinsic early stopping mechanism, dynamically reducing the rollout buffer volume.
  • The method provides a fine-grained analysis revealing its effectiveness in detecting unfaithful reasoning episodes.
Read more
Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling
Aleksei Khalin, Ekaterina Zaychenkova, Aleksandr Yugay, Andrey Goncharov, Sergey Korchagin, Alexey Zaytsev, Egor Ershov
Computer Vision Interpretability
  • Expert evaluations significantly improve the quality of uncertainty estimates in medical AI.
  • The proposed method separates uncertainty into epistemic and aleatoric components using expert-generated soft labels.
  • A two-ensemble approach effectively estimates both types of uncertainty, outperforming existing methods.
  • The method shows substantial improvements across multiple medical tasks, enhancing AI reliability.
Read more
Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference
Dimitrios Danopoulos, Enrico Lupi, Michael Kagan, Maurizio Pierini
Efficient ML NLP Large Language Models
  • Introduction of Head-Calibrated Clipped-Linear Softmax (HCCS) as a surrogate for softmax in quantized multi-head attention.
  • HCCS preserves the ordering of logits and generates stable probability distributions without explicit exponentiation.
  • Lightweight per-head calibration method enhances the accuracy of approximations across diverse attention heads.
  • First int8-optimized softmax implementation for AMD Versal AI Engine, achieving higher throughput than existing BF16 implementations.
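A generic clipped-linear softmax surrogate conveys the idea of exp-free, order-preserving normalization; the exact HCCS formulation and its per-head calibration may differ from this sketch, where the constant `c` is an assumed stand-in for the calibrated slope:

```python
import numpy as np

def clipped_linear_softmax(x, c=8.0):
    """Softmax surrogate without exponentiation: a shifted, clipped
    linear ramp of the logits, normalized to sum to one.  The map is
    monotone in x, so the ordering of the logits is preserved; `c`
    plays the role of a per-head calibration constant."""
    y = np.clip(1.0 + (x - x.max(axis=-1, keepdims=True)) / c, 0.0, 1.0)
    return y / y.sum(axis=-1, keepdims=True)

x = np.array([2.0, 1.0, -5.0, 0.5])
p = clipped_linear_softmax(x)
print(np.round(p, 3))
```

Everything here is additions, comparisons, and one division, which is what makes this family of surrogates attractive for integer-native hardware where `exp` is expensive.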
Read more
SECURE: Stable Early Collision Understanding via Robust Embeddings in Autonomous Driving
Wenjing Wang, Wenxuan Wang, Songning Lai
Computer Vision
  • SECURE framework enhances robustness in accident anticipation models.
  • Identifies significant instability in existing models like CRASH under perturbations.
  • Introduces a multi-objective loss function for fine-tuning model parameters.
  • Achieves state-of-the-art results on DAD and CCD datasets.
Read more
UQ-SHRED: uncertainty quantification of shallow recurrent decoder networks for sparse sensing via engression
Mars Liyao Gao, Yuxuan Bao, Amy S. Rude, Xinwei Shen, J. Nathan Kutz
Time Series Theory Efficient ML
  • UQ-SHRED provides valid uncertainty quantification for sparse sensing problems.
  • The framework utilizes noise injection and energy score minimization for efficient distributional learning.
  • The method is validated across multiple scientific datasets, showcasing its versatility.
  • Theoretical guarantees are established for the learned conditional distribution.
Read more
Model-Based Reinforcement Learning for Control under Time-Varying Dynamics
Klemens Iten, Bruce Lee, Chenhao Li, Lenart Treven, Andreas Krause, Bhavya Sukhija
Reinforcement Learning Robotics Theory
  • Introduces a framework for MBRL that accommodates time-varying dynamics.
  • Develops two algorithms (R-OMBRL and SW-OMBRL) that utilize adaptive data buffers.
  • Establishes theoretical guarantees for dynamic regret in the context of non-stationarity.
  • Demonstrates improved performance on continuous control benchmarks.
Read more
Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training
William Hoy, Binxu Wang, Xu Pan
Large Language Models Reinforcement Learning Optimization
  • ES can match or exceed GRPO in task accuracy across various settings.
  • The geometric properties of updates differ significantly between ES and GRPO.
  • ES exhibits a random-walk-like behavior in high-dimensional parameter spaces.
  • The study provides a theoretical framework for understanding ES's performance.
Read more
CANDI: Curated Test-Time Adaptation for Multivariate Time-Series Anomaly Detection Under Distribution Shift
HyunGi Kim, Jisoo Mok, Hyungyu Lee, Juhyeon Shin, Sungroh Yoon
Time Series
  • CANDI addresses the critical issue of distribution shifts in MTSAD, which can lead to significant false positives.
  • The framework employs False Positive Mining to curate informative samples for adaptation.
  • CANDI uses a lightweight Spatiotemporally-Aware Normality Adaptation module to update the model without overwriting pre-trained knowledge.
  • The proposed method shows substantial performance improvements over baseline methods, with a 14% increase in AUROC.
Read more
Robust Graph Representation Learning via Adaptive Spectral Contrast
Zhuolong Li, Boxue Yang, Haopeng Chen
Graph Learning Theory Optimization
  • Identifies a spectral dilemma in graph representation learning where high-frequency signals are crucial but sensitive to noise.
  • Proposes ASPECT, a framework that uses a reliability-aware spectral gating mechanism to enhance robustness.
  • Demonstrates that existing global spectral fusion strategies are suboptimal for mixed graphs.
  • Achieves state-of-the-art performance on 8 out of 9 benchmarks, particularly on heterophilic graphs.
Read more
Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
Samuel Bright-Thonney, Thomas R. Harvey, Andre Lukas, Jesse Thaler
Optimization Efficient ML Theory
  • Sven optimizes neural networks by treating each data point's residual as a separate condition.
  • It approximates the Moore-Penrose pseudoinverse using truncated SVD, allowing for efficient computation.
  • Sven outperforms traditional optimization methods like Adam in terms of convergence speed and final loss.
  • The method is particularly suited for over-parameterized models and can be applied to scientific computing tasks.
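The truncated-SVD pseudoinverse step can be sketched on a toy linear least-squares problem; Sven's per-residual conditioning and full training loop are not reproduced, and the problem size and truncation rank are illustrative:

```python
import numpy as np

def truncated_pinv(J, k, rcond=1e-10):
    """Moore-Penrose pseudoinverse approximated with the top-k singular
    triplets of J: V_k diag(1/s_k) U_k^T."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    k = min(k, int(np.sum(s > rcond * s[0])))   # drop near-zero modes
    return (Vt[:k].T / s[:k]) @ U[:, :k].T

# Toy: a Gauss-Newton-style step on a linear residual r = J @ theta - b.
rng = np.random.default_rng(0)
J = rng.standard_normal((100, 10))
b = rng.standard_normal(100)
theta = np.zeros(10)
r = J @ theta - b
theta = theta - truncated_pinv(J, k=10) @ r     # full-rank step
print(round(float(np.linalg.norm(J @ theta - b)), 3))
```

With the full rank retained, a single step lands on the least-squares solution; truncating to fewer singular directions trades accuracy per step for the cheaper SVD that makes the natural-gradient-style update affordable.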
Read more
Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
Taisuke Kobayashi
Reinforcement Learning Robotics Theory
  • Introduces PQAC, a novel algorithm for robust learning in RL against noisy TD errors.
  • Critiques existing heuristics for TD error stabilization, highlighting their computational inefficiencies.
  • Utilizes a sigmoid function and divergence measures to derive a robust learning rule.
  • Demonstrates stable learning performance in simulations, even with noisy rewards.
Read more
Care-Conditioned Neuromodulation for Autonomy-Preserving Supportive Dialogue Agents
Shalima Binta Manir, Tim Oates
NLP Large Language Models
  • Introduces Care-Conditioned Neuromodulation (CCN) to enhance autonomy in supportive dialogue agents.
  • Defines a utility function that balances helpfulness with the risks of dependency and coercion.
  • Constructs a benchmark for evaluating relational failure modes in multi-turn dialogues.
  • Demonstrates significant improvements in autonomy-preserving utility over existing alignment methods.
Read more
An Online Machine Learning Multi-resolution Optimization Framework for Energy System Design Limit of Performance Analysis
Oluwamayowa O. Amusat, Luka Grbcic, Remi Patureau, M. Jibran S. Zuberi, Dan Gunter, Michael Wetter
Optimization
  • Introduces a multi-resolution optimization framework that integrates machine learning for energy system design.
  • Addresses the performance gap caused by model mismatches across different fidelity levels.
  • Demonstrates a reduction in architecture-to-operation performance gap by up to 42% compared to traditional rule-based controllers.
  • Achieves a 34% reduction in high-fidelity model evaluations through ML guidance.
Read more
Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP
Sriram Sattiraju, Vaibhav Gollapalli, Aryan Shah, Timothy McMahan
Generative Models Time Series Optimization
  • Introduces a framework for modeling cognitive energy dynamics using EEG and SBP.
  • Validates the use of WGAN-generated synthetic EEG for cognitive state transition analysis.
  • Demonstrates strong agreement in transition energies between real and synthetic EEG.
  • Proposes a neuroadaptive system that adjusts behavior based on cognitive effort in real-time.
Read more
Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić, Jan-Willem van de Meent
Generative Models Graph Learning Efficient ML
  • Introduction of Crystalite, a lightweight diffusion Transformer for crystal modeling.
  • Utilization of Subatomic Tokenization for efficient atom representation.
  • Development of the Geometry Enhancement Module (GEM) to inject geometric biases into attention mechanisms.
  • Achievement of state-of-the-art results in crystal structure prediction and generation.
Read more
Test-Time Scaling Makes Overtraining Compute-Optimal
Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala
NLP Large Language Models Optimization
  • Introduces Train-to-Test (T2) scaling laws that optimize model size, training tokens, and inference samples under a fixed compute budget.
  • Demonstrates that optimal pretraining strategies shift towards overtraining when considering inference costs.
  • Validates the T2 scaling approach by showing improved performance of overtrained models compared to traditional scaling methods.
  • Findings persist even after post-training, indicating the relevance of T2 scaling in practical deployments.
Read more
Coupled Query-Key Dynamics for Attention
Barak Gahtan, Alex M. Bronstein
NLP Large Language Models
  • Introduces Coupled QK Dynamics, enhancing attention mechanisms by evolving queries and keys jointly.
  • Achieves significant reductions in perplexity on language modeling tasks with minimal additional parameters.
  • Demonstrates that coupling is crucial for performance, independent of the integrator type used.
  • Identifies corpus dependency of the method's effectiveness, with varying results across different datasets.
Read more
Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
Yiran Ma, Jerome Le Ny, Zhichao Chen, Zhihuan Song
Theory Optimization
  • Introduces a diffusion-based framework for uncertainty quantification in industrial models.
  • Eliminates the need for post-hoc calibration by providing intrinsically calibrated predictive uncertainty.
  • Demonstrates significant improvements in uncertainty calibration and predictive accuracy over existing methods.
  • Highlights the importance of reliable uncertainty measures for safety-critical industrial applications.
Read more
LI-DSN: A Layer-wise Interactive Dual-Stream Network for EEG Decoding
Chenghao Yue, Zhiyuan Ma, Zhongye Xia, Xinche Zhang, Yisi Zhang, Xinke Shen, Sen Song
Time Series
  • LI-DSN introduces a layer-wise interactive mechanism for EEG decoding, overcoming limitations of late-fusion strategies.
  • The Temporal-Spatial Integration Attention (TSIA) mechanism enables dynamic integration of spatial and temporal features.
  • Extensive experiments show that LI-DSN outperforms 13 state-of-the-art models across various EEG tasks.
  • The model addresses the 'information silo' problem by facilitating early-layer cross-stream communication.
Read more
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
Gengsheng Li, Tianyu Yang, Junfeng Fang, Mingyang Song, Mao Zheng, Haiyun Guo, Dan Zhang, Jinqiao Wang, Tat-Seng Chua
Reinforcement Learning Large Language Models Optimization
  • SRPO combines the strengths of GRPO and SDPO to improve reinforcement learning efficiency.
  • The framework routes samples based on their correctness, enhancing optimization focus.
  • An entropy-aware mechanism helps stabilize training by emphasizing reliable signals.
  • SRPO achieves significant performance gains over existing methods on multiple benchmarks.
Read more
go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices
Torque Dandachi, Sophia Diggs-Galligan
Theory Efficient ML Large Language Models
  • Introduces go-mHC, a novel parameterization method for doubly stochastic matrices.
  • Achieves a balance between expressivity and computational efficiency with a complexity of O(d³).
  • Demonstrates significant improvements in training stability and convergence speed.
  • Validates the approach on a large-scale GPT-style language model.
Read more
Forecasting Supply Chain Disruptions with Foresight Learning
Benjamin Turtel, Paul Wilczewski, Kris Skotheim
NLP Large Language Models Time Series
  • Introduces a novel forecasting task linking real-time news to future supply chain disruptions.
  • Develops an end-to-end modeling approach that trains LLMs directly on raw news inputs.
  • Achieves superior predictive performance compared to pretrained models and baselines.
  • Induces structured and decision-relevant reasoning behavior in the model.
Read more
Feature Weighting Improves Pool-Based Sequential Active Learning for Regression
Dongrui Wu
Theory Efficient ML Optimization
  • Introduces feature weighting in distance computation for active learning in regression.
  • Proposes five new active learning approaches that incorporate feature weights.
  • Demonstrates consistent performance improvements over existing unweighted methods.
  • Validates the effectiveness of feature weighting across both single-task and multi-task regression problems.
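Feature-weighted distances drop into diversity-based greedy sampling directly. This sketch uses arbitrary illustrative weights; the paper's five weighted approaches and its scheme for estimating the weights are not reproduced:

```python
import numpy as np

def weighted_dist(X, Y, w):
    """Pairwise Euclidean distance with per-feature weights w."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt(np.sum(w * diff**2, axis=-1))

def greedy_select(pool, w, n_select):
    """Diversity-based greedy sampling: repeatedly pick the pool point
    farthest (in weighted distance) from everything chosen so far."""
    chosen = [0]
    for _ in range(n_select - 1):
        d = weighted_dist(pool, pool[chosen], w).min(axis=1)
        chosen.append(int(np.argmax(d)))
    return chosen

rng = np.random.default_rng(0)
pool = rng.standard_normal((50, 3))
uniform = greedy_select(pool, np.ones(3), 5)
weighted = greedy_select(pool, np.array([10.0, 1.0, 0.01]), 5)
print(uniform, weighted)
```

With unequal weights, diversity is measured mostly along the features that matter for the regression target, which is the intuition behind weighting the distance computation.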
Read more
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
Yuejiang Liu, Fan Feng, Lingjing Kong, Weifeng Lu, Jinzhou Tang, Kun Zhang, Kevin Murphy, Chelsea Finn, Yilun Du
Reinforcement Learning Robotics Efficient ML
  • WAV enables world models to self-improve by verifying their own prediction errors.
  • The framework decomposes state prediction into state plausibility and action reachability.
  • WAV utilizes action-free data from videos to enhance verification processes.
  • Empirical results show a 2× increase in sample efficiency and an 18% boost in policy performance.
Read more
PAC-Bayesian Reward-Certified Outcome Weighted Learning
Yuya Ishikawa, Shu Tamano
Theory
  • Introduces PROWL, a framework that incorporates reward uncertainty into ITR estimation.
  • Establishes a certified reduction that transforms policy learning into a cost-sensitive classification problem.
  • Derives a nonasymptotic PAC-Bayes lower bound for randomized ITRs.
  • Proposes an automated calibration procedure for learning-rate selection.
Read more
MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning
Sten Rüdiger, Sebastian Raschka
NLP Large Language Models Efficient ML
  • MiCA focuses on adapting underutilized subspaces of model representations, unlike conventional methods that target dominant subspaces.
  • The method uses Singular Value Decomposition to identify minor singular vectors for parameter updates.
  • MiCA achieves up to 5.9x improvement in knowledge acquisition with a reduced parameter footprint of 6-60% compared to LoRA.
  • The approach minimizes catastrophic forgetting and enhances knowledge retention in large language models.
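Restricting an update to the minor singular subspace of a weight matrix can be sketched as follows; this only illustrates the idea of leaving dominant directions untouched, and MiCA's actual parameterization and update rule are not reproduced:

```python
import numpy as np

def minor_subspace_update(W, G, k):
    """Project an update G onto the span of the k minor (smallest
    singular value) left singular vectors of W, so the dominant
    directions of the pretrained weights are left untouched."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_minor = U[:, -k:]            # minor left singular vectors
    return U_minor @ (U_minor.T @ G)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # stand-in for a pretrained weight
G = rng.standard_normal((64, 64))  # stand-in for a raw gradient update
dW = minor_subspace_update(W, G, k=8)

# The projected update has no component along W's dominant direction,
# which is the mechanism that limits catastrophic forgetting.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
print(round(float(np.abs(U[:, 0] @ dW).max()), 6))
```

By orthogonality of the singular vectors, the overlap printed above is zero to numerical precision: the update lives entirely in directions the pretrained weights barely use.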
Read more
Dual-Attention Based 3D Channel Estimation
Xiangzhao Qin, Sha Hu
Theory Optimization Efficient ML
  • Introduction of a dual-attention mechanism for channel estimation in MIMO systems.
  • Theoretical foundation for optimal 3D channel estimation derived from 5G-NR systems.
  • 3DCENet outperforms traditional and existing deep learning-based channel estimation methods.
  • Significant reduction in mean-square-error (MSE) by leveraging channel correlations across multiple domains.
Read more