AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

68 Papers today
8h Update frequency
7 Days of history
Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation
Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze
Generative Models Graph Learning Optimization
  • GEM closes the fidelity gap between discrete energy-based models and discrete diffusion models for graph generation.
  • The framework incorporates a transport-aligned discrete proposal for efficient sampling and exploration.
  • GEM enables compositional constraints and property-based objectives during inference without retraining.
  • The model achieves high-quality molecular graph generation, matching or exceeding existing state-of-the-art methods.
Read more
Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein
Nobuyuki Ota
Interpretability Multimodal
  • CDT-III extends mechanism-oriented AI to encompass the entire central dogma, improving interpretability and prediction accuracy.
  • The architecture's two-stage design allows for distinct modeling of transcription and translation processes.
  • Joint prediction of RNA and protein changes leads to improved performance and interpretability.
  • The model can predict clinical side effects and generate hypotheses from perturbation data alone, without clinical data.
Read more
Scaling Attention via Feature Sparsity
Yan Xie, Tiansheng Wen, Tangda Huang, Bo Chen, Chenyu You, Stefanie Jegelka, Yifei Wang
NLP Large Language Models Efficient ML
  • Introduces Sparse Feature Attention (SFA) to reduce self-attention costs by leveraging feature sparsity.
  • FlashSFA kernel enhances efficiency by avoiding the materialization of dense score matrices.
  • Achieves up to 2.5× speedup and nearly 50% reduction in computational resources compared to dense attention.
  • Maintains accuracy and robustness in long-context scenarios, outperforming short-embedding baselines.
Read more
Computationally lightweight classifiers with frequentist bounds on predictions
Shreeram Murali, Cristian R. Rojas, Dominik Baumann
Theory Efficient ML
  • Introduction of a computationally efficient classifier based on the Nadaraya-Watson estimator.
  • Derivation of frequentist uncertainty bounds for predicted class probabilities.
  • Achieves competitive accuracy (>96%) with linear and sublinear computational complexity.
  • Validated on synthetic and real-world medical data, highlighting its practical applicability.
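The Nadaraya-Watson estimator behind this classifier is a standard kernel-weighted average of nearby labels; a minimal pure-Python sketch of the idea (Gaussian kernel, scalar features, and bandwidth are illustrative choices, not taken from the paper):

```python
import math

def nw_class_probs(x, train, bandwidth=1.0):
    """Nadaraya-Watson class-probability estimate at a query point x.

    train: list of (feature, label) pairs with scalar features.
    Returns a dict mapping each label to its kernel-weighted probability.
    """
    # Gaussian kernel weights centered on the query point
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
               for xi, _ in train]
    total = sum(weights)
    probs = {}
    for (xi, yi), w in zip(train, weights):
        probs[yi] = probs.get(yi, 0.0) + w / total
    return probs

data = [(0.0, "a"), (0.1, "a"), (1.0, "b"), (1.1, "b")]
print(nw_class_probs(0.05, data, bandwidth=0.2))
```

Prediction is just the argmax over these probabilities; the paper's contribution is wrapping such estimates in frequentist uncertainty bounds, which this sketch does not attempt.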
Read more
A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling
Ruisong Zhou, Haijun Zou, Li Zhou, Chumin Sun, Zaiwen Wen
Reinforcement Learning Optimization Theory
  • Introduction of WeCAN, a reinforcement learning framework for heterogeneous DAG scheduling.
  • Development of a two-stage single-pass design for efficient schedule generation.
  • Order-space analysis revealing generation-induced optimality gaps and conditions for their elimination.
  • Skip-extended realization to enhance scheduling efficiency while preserving single-pass capabilities.
Read more
ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention
Xinyan Wang, Xiaogeng Liu, Chaowei Xiao
Large Language Models Efficient ML NLP
  • ROM is the first method to treat overthinking mitigation as a streaming prediction-and-control problem.
  • It utilizes a lightweight detection head for real-time monitoring of token generation.
  • The introduction of Counterfactual Self-Correction (CSC) enhances token-level supervision.
  • ROM reduces response length by 47.2% while improving response efficiency.
Read more
Detection of adversarial intent in Human-AI teams using LLMs
Abed K. Musaffar, Ambuj Singh, Francesco Bullo
Large Language Models NLP Reinforcement Learning
  • LLMs can act as defensive supervisors in human-AI teams, detecting adversarial intent from behavioral patterns.
  • The study utilizes a dataset from a trivia game to analyze multi-party interactions involving a malicious AI.
  • LLMs demonstrated the ability to identify malicious behavior in real-time without task-specific knowledge.
  • The research suggests that LLMs can enhance the robustness of human-AI teams against adversarial attacks.
Read more
Generalization Limits of In-Context Operator Networks for Higher-Order Partial Differential Equations
Jamie Mahowald, Tan Bui-Thanh
Theory
  • ICONs extend the capabilities of operator networks to higher-order PDEs.
  • The model maintains qualitative accuracy despite reduced point-wise accuracy in complex problems.
  • New computational methods are required for efficient training on higher-dimensional differential equations.
  • The study quantifies the generalization limits of ICONs for in-distribution and out-of-distribution problems.
Read more
CN-Buzz2Portfolio: A Chinese-Market Dataset and Benchmark for LLM-Based Macro and Sector Asset Allocation from Daily Trending Financial News
Liyuan Chen, Shilong Li, Jiangpeng Yan, Shuoling Liu, Qiang Yang, Xiu Li
NLP Large Language Models
  • Introduction of CN-Buzz2Portfolio as a benchmark for evaluating LLMs in financial asset allocation.
  • Focus on macro and sector allocation rather than individual stock picking to reduce noise and improve evaluation accuracy.
  • Development of a Tri-Stage CPA Agent Workflow for systematic assessment of LLMs.
  • Significant disparities observed among LLMs in translating financial narratives into actionable portfolio strategies.
Read more
Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores
Zvi N. Badash, Yonatan Belinkov, Moti Freiman
NLP Large Language Models Interpretability
  • Introduces a compact uncertainty estimation method using intra-layer local information scores.
  • Achieves competitive performance compared to traditional probing methods with a single forward pass.
  • Demonstrates robustness under cross-dataset transfer and quantization.
  • Provides insights into cross-layer agreement patterns in LLMs.
Read more
Calibeating Made Simple
Yurong Chen, Zhiyi Huang, Michael I. Jordan, Haipeng Luo
Theory Optimization
  • Calibeating is shown to be minimax-equivalent to regret minimization, allowing for a unified analysis across different loss functions.
  • New optimal rates for multi-calibeating are derived, improving upon previous results for multiple forecasters.
  • The paper introduces a meta-algorithm that achieves simultaneous calibeating and calibration for the Brier loss, providing optimal rates.
  • The results extend existing guarantees for specific losses to a broader class of mixable and bounded losses.
Read more
Causal Discovery in Action: Learning Chain-Reaction Mechanisms from Interventions
Panayiotis Panayiotou, Özgür Şimşek
Theory Graph Learning
  • Causal discovery is feasible in chain-reaction systems using blocking interventions.
  • The proposed method achieves exponential error decay and logarithmic sample complexity.
  • Experiments validate the effectiveness of the method in diverse causal environments.
  • Observational heuristics fail in complex scenarios, highlighting the need for interventional approaches.
Read more
Robustness Quantification for Discriminative Models: a New Robustness Metric and its Application to Dynamic Classifier Selection
Rodrigo F. L. Lassance, Jasper De Bock
Theory Interpretability
  • Introduction of a new robustness metric applicable to any probabilistic discriminative classifier.
  • The metric is based on Constant Odds Ratio (COR) perturbation, allowing for use with continuous and mixed features.
  • Demonstrated superior correlation with accuracy compared to existing robustness metrics.
  • Application of the robustness metric in dynamic classifier selection strategies.
Read more
MsFormer: Enabling Robust Predictive Maintenance Services for Industrial Devices
Jiahui Zhou, Dan Li, Ruibing Jin, Jian Lou, Yanran Zhao, Zhenghua Chen, Zigui Jiang, See-Kiong Ng
Time Series
  • MsFormer is designed to capture multi-scale temporal dependencies in industrial IoT sensor data.
  • The model incorporates a Multi-scale Sampling module and a lightweight attention mechanism for improved performance in data-scarce environments.
  • Extensive experiments show that MsFormer outperforms existing predictive maintenance models across diverse datasets and conditions.
  • The framework addresses the limitations of traditional deep learning methods in modeling long-term degradation patterns.
Read more
Robustness Quantification and Uncertainty Quantification: Comparing Two Methods for Assessing the Reliability of Classifier Predictions
Adrián Detavernier, Jasper De Bock
Theory
  • Robustness Quantification (RQ) outperforms Uncertainty Quantification (UQ) in assessing classifier reliability.
  • RQ and UQ are complementary approaches that can be combined for enhanced reliability assessments.
  • The study utilizes real benchmark datasets to validate the effectiveness of RQ and UQ.
  • Both methods address the inherent uncertainties in classifier predictions, particularly in high-stakes scenarios.
Read more
A Multi-Task Targeted Learning Framework for Lithium-Ion Battery State-of-Health and Remaining Useful Life
Chenhan Wang, Zhengyi Bao, Huipin Lin, Jiahao Nie, Chunxiang Zhu
Time Series Optimization
  • Proposes a multi-task targeted learning framework for SOH and RUL prediction.
  • Integrates multi-scale CNNs, improved extended LSTM, and dual-stream attention modules.
  • Achieves significant performance improvements over traditional and state-of-the-art methods.
  • Utilizes Hyperopt for optimization, reducing manual hyperparameter tuning.
Read more
Behavioral Heterogeneity as Quantum-Inspired Representation
Mohammad Elayan, Wissam Kontar
Theory Time Series Robotics
  • Introduces a quantum-inspired framework for modeling driver behavior as evolving latent states.
  • Uses density matrices to capture the dynamic transitions between different driving behaviors.
  • Employs non-linear Random Fourier Features for embedding behavioral observations.
  • Demonstrates the approach on empirical driving data, highlighting its effectiveness in extracting driving profiles.
Read more
Balancing Safety and Efficiency in Aircraft Health Diagnosis: A Task Decomposition Framework with Heterogeneous Long-Micro Scale Cascading and Knowledge Distillation-based Interpretability
Xinhang Chen, Zhihuan Wei, Yang Hu, Zhiguo Zeng, Kang Zeng, Suili Yang
Interpretability Time Series Efficient ML
  • Introduction of the Diagnosis Decomposition Framework (DDF) for aircraft health diagnosis.
  • Separation of diagnosis into Anomaly Detection (AD) and Fault Classification (FC) for improved efficiency.
  • Utilization of advanced techniques like ConvTokMHSA and MMK Net for feature extraction.
  • Implementation of knowledge distillation for interpretability in decision-making.
Read more
MCLR: Improving Conditional Modeling in Visual Generative Models via Inter-Class Likelihood-Ratio Maximization and Establishing the Equivalence between Classifier-Free Guidance and Alignment Objectives
Xiang Li, Yixuan Jia, Xiao Li, Jeffrey A. Fessler, Rongrong Wang, Qing Qu
Generative Models Computer Vision Theory
  • Introduces MCLR, a new training objective that enhances inter-class separation in diffusion models.
  • Establishes a theoretical equivalence between classifier-free guidance and alignment objectives.
  • Demonstrates that MCLR can achieve CFG-like improvements without requiring inference-time guidance.
  • Empirical results show significant qualitative and quantitative gains in generative performance.
Read more
Spiking Personalized Federated Learning for Brain-Computer Interface-Enabled Immersive Communication
Chen Shang, Dinh Thai Hoang, Diep N. Nguyen, Jiadong Yu
Federated Learning Multimodal Efficient ML
  • Introduction of a BCI-driven framework for immersive communication.
  • Development of a personalized federated learning model that processes neurodiverse brain signals.
  • Integration of spiking neural networks to reduce energy consumption in on-device learning.
  • Experimental validation showing improved accuracy and energy efficiency compared to traditional methods.
Read more
A Comparative Study of Machine Learning Models for Hourly Forecasting of Air Temperature and Relative Humidity
Jiaqi Dong
Time Series
  • XGBoost demonstrated the highest predictive accuracy among the models tested.
  • The study utilized a comprehensive dataset of hourly meteorological observations from Chongqing.
  • A systematic approach to data preprocessing and feature engineering was employed to enhance model performance.
  • The results indicate significant potential for machine learning in short-term meteorological forecasting.
Read more
Interpretable Multiple Myeloma Prognosis with Observational Medical Outcomes Partnership Data
Salma Rachidi, Aso Bozorgpanah, Eric Fey, Alexander Jung
Interpretability
  • Introduction of two training-time regularizers for interpretability in ML models.
  • Alignment of complex models with simpler, interpretable models to guide predictions.
  • Incorporation of clinical staging systems into the learning objective for consistency.
  • Demonstrated competitive predictive performance on real-world clinical data.
Read more
KV Cache Optimization Strategies for Scalable and Efficient LLM Inference
Yichun Xu, Navjot K. Khaira, Tejinder Singh
Large Language Models Optimization Efficient ML
  • KV cache optimization is essential for efficient LLM deployment, especially with increasing context lengths.
  • The paper categorizes KV cache strategies into five main directions, each with unique trade-offs.
  • No single optimization technique is optimal across all scenarios; context length and hardware constraints matter.
  • Adaptive multi-stage optimization pipelines are suggested as a future research direction.
Read more
A Bayesian Learning Approach for Drone Coverage Network: A Case Study on Cardiac Arrest in Scotland
Tathagata Basu, Edoardo Patelli, Gianluca Filippi, Ben Parsonage, Christy Maddock, Massimiliano Vasile, Marco Fossati, Adam Loyd, Shaun Marshall, Paul Gowens
Optimization Robotics
  • Introduces a Bayesian learning framework for optimizing drone-assisted AED delivery networks.
  • Focuses on the survival probability of OHCA patients to determine optimal drone station locations.
  • Demonstrates the impact of environmental variability and spatial demand on network design.
  • Assesses the economic viability of the proposed network through cost-effectiveness analysis.
Read more
SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale
YanZhao Zheng, ZhenTao Zhang, Chao Ma, YuanQiang Yu, JiHuan Zhu, Baohua Dong, Hangcheng Zhu
Large Language Models NLP Efficient ML
  • Skill routing is a critical yet under-explored problem in LLM agent ecosystems.
  • The full implementation body of a skill is essential for accurate skill selection, contrary to previous assumptions.
  • SkillRouter achieves 74.0% top-1 routing accuracy with a compact architecture suitable for consumer hardware.
  • A standardized evaluation benchmark with 80K skills and expert-verified queries is established.
Read more
Confidence Calibration under Ambiguous Ground Truth
Linwei Tao, Haoyang Luo, Minjing Dong, Chang Xu
Theory
  • Traditional confidence calibration methods fail under ambiguous ground truth due to reliance on majority-voted labels.
  • Temperature Scaling is biased towards underestimating annotator uncertainty, leading to increased miscalibration.
  • The proposed ambiguity-aware calibrators optimize against the full label distribution, improving calibration accuracy.
  • Dirichlet-Soft shows the best performance, reducing true-label Expected Calibration Error (ECE) by 55-87%.
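Expected Calibration Error, the metric in the last bullet, bins predictions by confidence and averages the gap between average confidence and accuracy per bin; a minimal equal-width-binning sketch (the standard ECE, not the paper's ambiguity-aware variant):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted max-class probabilities in [0, 1].
    correct: booleans, whether each prediction matched the label.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # clamp conf == 1.0 into the last bin
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece

# Overconfident toy predictions: high confidence, only half correct
print(expected_calibration_error([0.9, 0.95, 0.9, 0.95],
                                 [True, False, True, False]))
```

Under ambiguous ground truth the "correct" labels themselves are majority votes, which is exactly the reliance the paper's calibrators avoid by targeting the full label distribution instead.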
Read more
Latent Semantic Manifolds in Large Language Models
Mohamed A. Mabrok
Large Language Models Theory NLP
  • Introduces a rigorous mathematical framework for LLMs as latent semantic manifolds.
  • Defines the expressibility gap, measuring the mismatch between continuous representations and finite vocabularies.
  • Proves two theorems connecting manifold geometry to limitations of finite vocabularies.
  • Validates theoretical predictions across multiple transformer architectures.
Read more
Large Neighborhood Search meets Iterative Neural Constraint Heuristics
Yudong W. Xu, Wenhao Li, Scott Sanner, Elias B. Khalil
Optimization
  • Introduces the ConsFormer-LNS framework that combines neural heuristics with Large Neighborhood Search.
  • Demonstrates the effectiveness of prediction-guided destroy operators in selecting neighborhoods.
  • Finds that stochastic destroy operators outperform greedy ones, while greedy repair methods are more effective than sampling-based ones.
  • Shows substantial performance gains over traditional neural and classical baselines in CSP benchmarks.
Read more
A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
Louis Claeys, Artur Goldman, Zebang Shen, Niao He
Optimization Reinforcement Learning Theory
  • Introduces a novel method for long-horizon stochastic optimal control using Schrödinger eigenfunctions.
  • Demonstrates that the HJB equation can be reduced to a linear PDE under specific conditions.
  • Achieves significant improvements in control accuracy and efficiency compared to state-of-the-art methods.
  • Proposes a new loss function for eigenfunction learning that mitigates performance degradation.
Read more
Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy
Andrii Shportko
Theory Large Language Models NLP
  • Semantic-preserving steganographic embedding increases Kolmogorov complexity.
  • The complexity increase is quantified by K(M2) ≥ K(M1) + K(P) - O(log n).
  • Language-model perplexity serves as a computable proxy for detecting complexity increases.
  • The Binoculars perplexity-ratio score effectively distinguishes stegotext from non-stegotext.
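Perplexity itself is simple to compute once a model supplies per-token log-probabilities: the exponential of the negative mean log-probability. A minimal sketch of that computation (the Binoculars score additionally divides by a cross-perplexity term from a second model, omitted here; the log-prob values are hypothetical):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean per-token log-probability."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Hypothetical per-token log-probs: fluent text is more probable
# under the model, hence lower perplexity
fluent = [-1.2, -0.8, -1.0, -0.9]
stilted = [-3.1, -2.7, -3.4, -2.9]
print(perplexity(fluent), perplexity(stilted))
```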
Read more
CRPS-Optimal Binning for Conformal Regression
Paolo Toccaceli
Theory Efficient ML Interpretability
  • Introduces a method for non-parametric conditional distribution estimation using optimal binning.
  • Derives a closed-form LOO-CRPS cost function for efficient computation.
  • Utilizes dynamic programming to find the globally optimal K-partition.
  • Proposes a cross-validated approach for selecting the number of bins to avoid in-sample optimism.
Read more
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
Yehjin Shin, Seojin Kim, Noseong Park
NLP Large Language Models Efficient ML
  • Introduction of HADES, a GSP-inspired framework for adaptive filtering in SSMs.
  • Hierarchical architecture with shared and expert filters enhances model efficiency and expressivity.
  • Achieves competitive performance while using significantly fewer parameters than Mamba2.
  • Demonstrates effective capture of local and global dependencies in sequence data.
Read more
Breaking the $O(\sqrt{T})$ Cumulative Constraint Violation Barrier while Achieving $O(\sqrt{T})$ Static Regret in Constrained Online Convex Optimization
Haricharan Balasundaram, Karthick Krishna Mahendran, Rahul Vaze
Optimization Theory
  • Introduces an algorithm that achieves O(√T) static regret and O(T^(1/3)) cumulative constraint violation for constrained online convex optimization in 2 dimensions.
  • Refutes the belief that CCV must be at least O(√T) when static regret is O(√T) for dimensions d ≥ 2.
  • Demonstrates the importance of geometric properties of feasible sets in optimizing both regret and CCV.
  • Builds on prior work while providing a more efficient solution for specific cases in COCO.
Read more
Full waveform inversion method based on diffusion model
Caiyun Liu, Siyang Pei, Qingfeng Yu, Jie Xiong
Generative Models Optimization Theory
  • Introduction of a conditional diffusion model for full waveform inversion.
  • Improvement of inversion resolution and structural fidelity through density information integration.
  • Enhanced stability and robustness in complex inversion scenarios.
  • Utilization of implicit prior distributions to regularize the inversion process.
Read more
Towards Practical Multimodal Hospital Outbreak Detection
Chang Liu, Jieshi Chen, Alexander J. Sundermann, Kathleen Shutt, Marissa P. Griffith, Lora Lee Pless, Lee H. Harrison, Artur W. Dubrawski
Multimodal
  • Integration of MALDI-TOF, AR patterns, and EHR data significantly improves outbreak detection performance.
  • A tiered surveillance paradigm is proposed to reduce reliance on costly WGS.
  • The study identifies high-risk clinical procedures that can inform proactive infection prevention strategies.
  • Machine learning techniques are employed to extract discriminative features from diverse data modalities.
Read more
Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs
Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng
Graph Learning
  • TAGBD is a novel framework for text-only backdoor attacks on text-attributed graphs.
  • The attack leverages uncertainty-guided node selection and graph-aware trigger generation.
  • Two injection strategies (Overwriting and Appending) allow for a trade-off between attack strength and stealth.
  • Experiments show TAGBD achieves high attack success rates while preserving clean accuracy.
Read more
Hybrid Associative Memories
Leon Lufkin, Tomás Figliolia, Beren Millidge, Kamesh Krishnamurthy
NLP Large Language Models Efficient ML
  • Introduction of the Hybrid Associative Memory (HAM) layer combining RNNs and self-attention.
  • HAM allows precise control over KV-cache growth, enabling flexible performance trade-offs.
  • Empirical results show HAM outperforms state-of-the-art RNNs and is competitive with Transformers.
  • Detailed analysis of HAM's internal workings enhances understanding of its performance dynamics.
Read more
Permutation-Symmetrized Diffusion for Unconditional Molecular Generation
Gyeonghoon Ko, Juho Lee
Generative Models
  • Introduces a novel diffusion model that directly incorporates permutation symmetry in molecular generation.
  • Derives an explicit expression for the heat kernel on the quotient manifold, enhancing understanding of diffusion processes.
  • Utilizes MCMC to approximate the permutation-symmetrized score, addressing challenges in training.
  • Demonstrates competitive performance in unconditional 3D molecular generation tasks on the QM9 dataset.
Read more
A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning
Emmanouil M. Athanasakos
Federated Learning Optimization Efficient ML
  • Introduces a framework for energy-aware learning in Federated Learning.
  • Proposes Cost-Weighted Magnitude Pruning (CWMP) as an optimal greedy solution for energy-efficient gradient pruning.
  • Demonstrates that CWMP significantly improves performance-energy trade-offs compared to traditional Top-K pruning methods.
  • Formalizes the energy costs associated with parameter updates, addressing hardware-level disparities.
Read more
Trained Persistent Memory for Frozen Decoder-Only LLMs
Hong Jeong
Large Language Models NLP Generative Models
  • Adaptation of six memory methods to decoder-only LLMs, replacing cross-attention with self-attention.
  • Identification of an inductive-bias dichotomy where only methods with strong architectural priors succeed at lower capacities.
  • Demonstration that all methods converge at higher capacities, indicating that architectural bias matters most when capacity is limited.
  • Establishment of persistent latent-space memory as a general paradigm for transformer models.
Read more
On the Interplay of Priors and Overparametrization in Bayesian Neural Network Posteriors
Julius Kobialka, Emanuel Sommer, Chris Kolb, Juntae Kwon, Daniel Dold, David Rügamer
Theory Optimization
  • Overparametrization significantly influences the shape and geometry of BNN posteriors.
  • Three key phenomena—balancedness, weight reallocation, and prior conformity—emerge from redundancy in overparametrized models.
  • The study provides a theoretical foundation linking optimization properties to prior choices and posterior shapes.
  • Extensive experiments show that overparametrization improves sampling-based inference in BNNs.
Read more
Hybrid Autoencoder-Isolation Forest approach for time series anomaly detection in C70XP cyclotron operation data at ARRONAX
F Basbous, F Poirier, F Haddad, D Mateus
Time Series
  • Proposes a hybrid AE-IF model to improve anomaly detection in time series data.
  • Addresses limitations of standard Isolation Forest in detecting subtle anomalies.
  • Utilizes reconstruction errors from an Autoencoder as input features for Isolation Forest.
  • Demonstrates improved detection performance validated on real-world cyclotron operation data.
Read more
PLR: Plackett-Luce for Reordering In-Context Learning Examples
Paweł Batorski, Paul Swoboda
NLP Large Language Models Optimization
  • PLR introduces a distributional approach to ICL example ordering, enhancing performance without requiring exhaustive search.
  • The method is label-space agnostic, making it applicable to a wider range of tasks including open-ended generation.
  • PLR employs a Gumbel perturb-and-sort procedure for efficient sampling of example orderings.
  • Experiments show significant improvements in few-shot accuracy across multiple benchmarks.
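Gumbel perturb-and-sort is the standard trick for sampling an ordering from a Plackett-Luce distribution: add i.i.d. Gumbel(0, 1) noise to each item's log-score, then sort descending. A minimal sketch (the log-scores are illustrative, not from the paper):

```python
import math, random

def gumbel_perturb_and_sort(log_scores, rng=random):
    """Sample a Plackett-Luce ordering by Gumbel-perturbing log-scores."""
    perturbed = []
    for i, s in enumerate(log_scores):
        # Gumbel(0, 1) noise via inverse-CDF: -log(-log(U)), U ~ Uniform(0, 1)
        g = -math.log(-math.log(rng.random()))
        perturbed.append((s + g, i))
    # Sorting the perturbed scores descending yields a PL-distributed ranking
    return [i for _, i in sorted(perturbed, reverse=True)]

random.seed(0)
print(gumbel_perturb_and_sort([2.0, 0.0, -1.0]))
```

Because the noise is applied independently per item, sampling a full ordering costs one sort rather than a sequential draw per position, which is what makes the procedure efficient for reordering ICL examples.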
Read more
Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates
Samrendra Roy, Kazuma Kobayashi, Souvik Chakraborty, Rizwan-uddin, Syed Bahauddin Alam
Theory Optimization
  • Neural operators are vulnerable to sparse, physically plausible adversarial perturbations.
  • Minimal modifications can lead to catastrophic prediction failures, undetectable by standard validation metrics.
  • The effective perturbation dimension (deff) is introduced as a diagnostic tool for assessing vulnerability.
  • Gradient-free search methods outperform gradient-based methods in exploiting these vulnerabilities.
Read more
Universal and efficient graph neural networks with dynamic attention for machine learning interatomic potentials
Shuyu Bi, Zhede Zhao, Qiangchao Sun, Tao Hu, Xionggang Lu, Hongwei Cheng
Graph Learning Efficient ML
  • MLANet introduces a dual-path dynamic attention mechanism for improved message passing in graph neural networks.
  • The model achieves high accuracy while significantly reducing computational costs compared to mainstream equivariant models.
  • MLANet is validated across a wide range of datasets, demonstrating its versatility in modeling various atomic environments.
  • The framework enables stable long-time molecular dynamics simulations, addressing critical challenges in current MLIP approaches.
Read more
GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL
Haoyu Wang, Jingcheng Wang, Shunyu Wu, Xinwei Xiao
Reinforcement Learning
  • GEM introduces a candidate-based action selection interface that enhances decision-making in offline RL.
  • The framework employs a GMM actor trained with advantage-weighted EM-style updates to maintain multimodal action distributions.
  • Inference is guided by a scoring rule that balances uncertainty and support, enabling stable deployment across states.
  • GEM allows for a flexible candidate budget, improving decision quality without requiring retraining.
Read more
Conditionally Identifiable Latent Representation for Multivariate Time Series with Structural Dynamics
Minkey Chang, Jae-Young Kim
Time Series
  • Introduction of the Identifiable Variational Dynamic Factor Model (iVDFM) for multivariate time series.
  • Achieves identifiability by conditioning the innovation process rather than latent states.
  • Preserves identifiability through linear diagonal dynamics, avoiding traditional rotation ambiguities.
  • Demonstrates improved factor recovery and stable intervention accuracy on synthetic and real-world data.
Read more
Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors
Juan Sebastian Rojas, Chi-Guhn Lee
Reinforcement Learning Theory
  • The paper identifies and formalizes the differences between two interpretations of the TD error in deep RL.
  • Nonlinear deep RL architectures can lead to significant discrepancies in TD error calculations.
  • Choosing one interpretation of the TD error over the other can impact the performance of RL algorithms.
  • The findings challenge the conventional understanding of TD error in deep RL settings.
Read more
Mechanisms of Introspective Awareness
Uzay Macar, Li Yang, Atticus Wang, Peter Wallich, Emmanuel Ameisen, Jack Lindsey
NLP Large Language Models Interpretability
  • Introspective awareness in LLMs is behaviorally robust with 0% false positives.
  • Detection capability emerges from post-training, not pretraining.
  • Anomaly detection involves distributed computation across multiple directions.
  • Ablating refusal directions significantly enhances detection rates.
Read more
SpecXMaster Technical Report
Yutang Ge, Yaning Cui, Hanzheng Li, Jun-Jie Wang, Fanjie Xu, Jinhan Dong, Yongqi Jin, Dongxu Cui, Peng Jin, Guojiang Zhao, Hengxing Cai, Rong Zhu, Linfeng Zhang, Xiaohong Ji, Zhifeng Gao
Reinforcement Learning
  • SpecXMaster automates NMR spectral interpretation using Agentic Reinforcement Learning.
  • The framework processes raw FID data directly, improving accuracy and efficiency.
  • It has shown superior performance on public NMR interpretation benchmarks.
  • Iterative evaluations by experts have refined the system's capabilities.
Read more
Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity
Zixuan Zhang, Kaixuan Huang, Tuo Zhao, Mengdi Wang, Minshuo Chen
Generative Models Theory
  • Introduces a formal statistical framework for diffusion models on low-dimensional manifolds.
  • Develops a novel score decomposition approach for analyzing score functions under different noise levels.
  • Constructs neural network architectures tailored for effective score function approximation.
  • Establishes statistical rates for score estimation and distribution learning based on manifold curvature and intrinsic dimensionality.
Read more
Constrained Online Convex Optimization with Memory and Predictions
Mohammed Abdullah, George Iosifidis, Salah Eddine Elayoubi, Tijani Chahed
Optimization Theory
  • Introduction of COCO-M framework for constrained online convex optimization with memory.
  • Development of algorithms achieving sublinear regret and constraint violation under time-varying constraints.
  • Adaptive penalty approach for scenarios without predictions.
  • Optimistic algorithm designed for cases with predictions, improving performance with prediction accuracy.
Read more
From Causal Discovery to Dynamic Causal Inference in Neural Time Series
Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge
Time Series
  • DCNAR integrates causal discovery with time-varying causal inference, addressing the challenge of unknown causal structures.
  • The framework emphasizes interpretability and stability of causal inferences over mere predictive accuracy.
  • Behavioral diagnostics are used to evaluate the scientific validity of the model, focusing on causal necessity and temporal stability.
  • Experiments show that DCNAR outperforms traditional methods in terms of stability and meaningfulness of causal inferences.
Read more
COMPASS-Hedge: Learning Safely Without Knowing the World
Ting Hu, Luanda Cai, Manolis Vlatakis
Theory Optimization
  • COMPASS-Hedge achieves minimax-optimal regret in adversarial environments.
  • It provides instance-optimal, gap-dependent regret in stochastic settings.
  • The algorithm maintains near-constant regret relative to a designated baseline policy.
  • COMPASS-Hedge is parameter-free and does not require prior knowledge of the environment.
Read more
DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
Xiaoming Yu, Shize Tang, Guanghua Yu, Linchuan Xie, Song Liu, Jianchen Zhu, Feng Li
Large Language Models Efficient ML Optimization
  • DAQ preserves critical post-training knowledge by focusing on small-magnitude parameter updates.
  • The framework employs delta-aware metrics instead of traditional reconstruction loss to optimize quantization.
  • DAQ is data-free, requiring only the base and post-trained weight matrices for quantization.
  • Preliminary results show that DAQ can recover capabilities lost in standard quantization while maintaining performance.
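The delta-aware idea can be sketched as follows: quantize the post-trained weights with an ordinary uniform quantizer, but score the result by a reconstruction error weighted toward parameters that changed during post-training (a simplified illustration under assumed forms; DAQ's actual metric and quantizer may differ):

```python
import numpy as np

def uniform_quantize(w, n_bits=4):
    """Symmetric per-row uniform quantizer with round-to-nearest."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

def delta_aware_error(w_base, w_post, w_quant, eps=1e-8):
    """Reconstruction error weighted by |delta| = |w_post - w_base|:
    errors on parameters that moved during post-training count more."""
    delta = np.abs(w_post - w_base)
    weight = delta / (delta.sum() + eps)
    return float((weight * (w_post - w_quant) ** 2).sum())
```

Note the procedure is data-free in the same spirit as the bullets above: it touches only the base and post-trained weight matrices.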
Read more
SynForceNet: A Force-Driven Global-Local Latent Representation Framework for Lithium-Ion Battery Fault Diagnosis
Rongxiu Chen, Yuting Su
Theory Optimization Time Series
  • Introduces SynForceNet, a novel framework for online battery fault diagnosis.
  • Combines kernel one-class classification with minimum-volume estimation for anomaly detection.
  • Achieves significant improvements in diagnostic performance metrics compared to baseline methods.
  • Explores the spatial separation of fault representations and enhances robustness through manifold learning.
Read more
SafeSeek: Universal Attribution of Safety Circuits in Language Models
Miao Yu, Siyuan Fu, Moayad Aloqaily, Zhenhong Zhou, Safa Otoum, Xing Fan, Kun Wang, Yufei Guo, Qingsong Wen
NLP Large Language Models Interpretability
  • SafeSeek provides a unified framework for discovering safety circuits in LLMs, overcoming limitations of heuristic search methods.
  • The framework reveals that safety behaviors are often governed by highly sparse circuits, which are structurally distinct from general utility components.
  • SafeSeek enables precise enhancement or removal of safety abilities through its optimization-based approach.
  • The empirical validation shows significant reductions in attack success rates while preserving general model capabilities.
Read more
Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts
Maria Conchita Agana Navarro, Geng Li, Theo Wolf, Maria Perez-Ortiz
Time Series
  • The study benchmarks ML climate emulators under strictly historical training conditions.
  • An accuracy vs. stability trade-off is identified, with ClimaX showing lower absolute error but higher sensitivity to distribution shifts.
  • Simpler CNN architectures demonstrate greater stability compared to high-capacity models.
  • A temperature-precipitation disparity is observed, indicating that model robustness differs between the two variables.
Read more
Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
Janaka Chathuranga Brahmanage, Akshat Kumar
Reinforcement Learning Robotics Optimization
  • Introduction of Budget-Conditioned Reachability framework for safe offline RL.
  • Decoupling of reward maximization from cumulative safety cost constraints.
  • Dynamic budgets used to prune unsafe actions and guide value estimation.
  • BCRL integrates with existing offline RL algorithms, enhancing their safety without requiring online interactions.
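The action-pruning step described above can be sketched for a discrete action set: given reward and cost critics and the remaining budget, mask out actions whose estimated cumulative cost would exceed the budget (a hedged illustration of the general idea; the critic names and fallback rule are assumptions, not BCRL's exact procedure):

```python
import numpy as np

def budget_constrained_action(q_reward, q_cost, budget):
    """Pick the highest-reward action among those whose estimated
    cumulative safety cost fits within the remaining budget; if no
    action is feasible, fall back to the minimum-cost action."""
    q_reward = np.asarray(q_reward, dtype=float)
    q_cost = np.asarray(q_cost, dtype=float)
    feasible = q_cost <= budget
    if feasible.any():
        masked = np.where(feasible, q_reward, -np.inf)
        return int(np.argmax(masked))
    return int(np.argmin(q_cost))
```

With `q_reward = [1, 5, 3]` and `q_cost = [0.2, 0.9, 0.4]`, a budget of 0.5 prunes action 1 and selects action 2, while a budget of 1.0 permits the high-reward action 1.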
Read more
Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave-Based Neural Tangent Kernel Framework
Ruihua Chen, Yisi Luo, Bangyu Wu, Deyu Meng
Theory Optimization
  • CR-FWI enhances robustness against initial model inaccuracies and poor seismic data quality.
  • The wave-based NTK framework provides a theoretical understanding of the dynamic behavior of CR-FWI.
  • Eigenvalue decay properties of the wave-based NTK explain the slower high-frequency convergence of CR-FWI.
  • The proposed IG-FWI method achieves a better trade-off between robustness and convergence rate.
Read more
ST-GDance++: A Scalable Spatial-Temporal Diffusion for Long-Duration Group Choreography
Jing Xu, Weiqiang Wang, Cunjian Chen, Jun Liu, Qiuhong Ke
Generative Models Graph Learning Time Series
  • ST-GDance++ decouples spatial and temporal dependencies for efficient group choreography generation.
  • Lightweight distance-aware graph convolutions are used to capture inter-dancer relationships with reduced computational cost.
  • A diffusion noise scheduling strategy enhances the generation of long-duration motion sequences.
  • The framework significantly reduces latency while maintaining competitive generation quality.
Read more
TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning
Dilina Rajapakse, Juan C. Rosero, Ivana Dusparic
Reinforcement Learning Interpretability Robotics
  • Introduction of TREX, a trajectory-based explainability framework for MORL.
  • Quantitative analysis of behavioral patterns influencing objective trade-offs.
  • Demonstration of TREX's applicability in standard MORL environments.
  • Clusters trajectories into meaningful segments, aiding interpretation of learned behaviors.
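Trajectory clustering of this kind can be illustrated by summarizing each trajectory's per-objective returns and grouping the summaries, here with a plain k-means (a generic sketch, not TREX's actual procedure; the mean-reward features and deterministic seeding are assumptions):

```python
import numpy as np

def trajectory_features(trajectories):
    """Summarize each trajectory (a list of per-step multi-objective
    reward vectors) by its mean reward along every objective."""
    return np.array([np.mean(t, axis=0) for t in trajectories])

def kmeans(x, k, iters=20):
    """Plain k-means on trajectory features, seeded with the first
    k points for determinism."""
    centers = x[:k].copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers
```

Trajectories that trade off the objectives differently (say, reward concentrated on objective 0 versus objective 1) then land in separate, interpretable clusters.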
Read more
The Coordinate System Problem in Persistent Structural Memory for Neural Architectures
Abhinaba Basu
Theory
  • Introduction of the Dual-View Pheromone Pathway Network (DPPN) for persistent structural memory.
  • Identification of coordinate stability and graceful transfer mechanisms as critical requirements for effective memory.
  • Demonstration that fixed random Fourier features provide stable coordinates but do not ensure transfer advantage.
  • Evidence that learning-rate modulation is more effective than routing bias for preventing negative transfer.
Read more
Large Language Models for Missing Data Imputation: Understanding Behavior, Hallucination Effects, and Control Mechanisms
Arthur Dantas Mangussi, Ricardo Cardoso Pereira, Ana Carolina Lorena, Pedro Henriques Abreu
Large Language Models NLP
  • LLMs demonstrate superior performance in missing data imputation for real-world datasets compared to traditional methods.
  • The effectiveness of LLMs is closely tied to their pre-training on domain-specific patterns from large corpora.
  • Traditional imputation methods outperform LLMs on synthetic datasets, where semantic context is absent, underscoring its importance to LLM performance.
  • LLMs incur higher computational costs and time, presenting a trade-off between quality and efficiency.
Read more
Beyond the Mean: Distribution-Aware Loss Functions for Bimodal Regression
Abolfazl Mohammadi-Seif, Carlos Soares, Rita P. Ribeiro, Ricardo Baeza-Yates
Theory Optimization
  • Proposes distribution-aware loss functions to address bimodal regression challenges.
  • Integrates normalized RMSE with Wasserstein and Cramér distances for improved predictive modeling.
  • Demonstrates significant reduction in Jensen-Shannon Divergence compared to standard methods.
  • Establishes a new Pareto efficiency frontier in stability and fidelity for regression tasks.
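A distribution-aware loss of the kind described can be sketched by blending a normalized RMSE term (pointwise fit) with a 1-D Wasserstein term (distributional shape). This is an illustrative composite under assumed weighting, not the paper's exact formulation:

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical 1-D samples:
    the mean absolute difference of their sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def distribution_aware_loss(y_true, y_pred, alpha=0.5):
    """Blend of normalized RMSE (pointwise accuracy) and Wasserstein
    distance (distributional fidelity); `alpha` is an assumed knob."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    nrmse = rmse / (np.std(y_true) + 1e-8)
    return float(alpha * nrmse + (1 - alpha) * wasserstein_1d(y_true, y_pred))
```

On a bimodal target such as {-1, -1, 1, 1}, a mean predictor (all zeros) is penalized by both terms, whereas a predictor that reproduces both modes scores near zero, which is the behavior the distribution-aware objective is meant to reward.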
Read more
Does This Gradient Spark Joy?
Ian Osband
Reinforcement Learning Efficient ML Theory
  • Introduces the Kondo gate to optimize backward passes in policy gradient methods.
  • Delight, a combination of advantage and surprisal, serves as a more effective signal for sample selection.
  • The Kondo gate allows for significant computational savings while preserving learning quality.
  • Demonstrated effectiveness on MNIST and transformer token reversal tasks.
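The gating idea can be illustrated by scoring each sample with a combination of advantage magnitude and surprisal, then running the backward pass only on the top-scoring fraction (a generic sketch of gradient gating under assumed scoring, not the paper's exact Kondo gate or delight signal):

```python
import numpy as np

def gate_samples(advantages, logprobs, keep_frac=0.5):
    """Score each sample by |advantage| plus surprisal (-log p) and
    keep only the top fraction for the backward pass."""
    advantages = np.asarray(advantages, dtype=float)
    surprisal = -np.asarray(logprobs, dtype=float)
    score = np.abs(advantages) + surprisal
    k = max(1, int(len(score) * keep_frac))
    keep = np.argsort(score)[::-1][:k]
    return np.sort(keep)  # indices of samples worth a backward pass
```

Samples with near-zero advantage and high-probability (unsurprising) actions are skipped, which is where the computational savings come from.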
Read more
Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks
Hang-Cheng Dong, Pengcheng Cheng
Theory Optimization
  • Introduces a differential-geometric framework for analyzing shallow neural networks through quotient spaces.
  • Characterizes the symmetry and quotient structure of shallow-network parameters, leading to a natural metric.
  • Demonstrates that effective curvature can be defined on the quotient manifold, removing degeneracy from Hessians.
  • Establishes that only horizontal parameter motions contribute to predictor evolution, while vertical motions are gauge variations.
Read more
Weak-PDE-Net: Discovering Open-Form PDEs via Differentiable Symbolic Networks and Weak Formulation
Xinxin Li, Xingyu Cui, Jin Qi, Juan Zhang, Da Li, Junping Yin
Theory
  • Weak-PDE-Net is an end-to-end differentiable framework for discovering open-form PDEs.
  • The framework combines a forward response learner with a weak-form PDE generator to enhance robustness against noise.
  • Differentiable Neural Architecture Search is employed to dynamically construct a library of function terms for PDE discovery.
  • Physical constraints are integrated to ensure that discovered equations adhere to physical laws.
Read more