AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

63 papers today · updated every 8 hours · 7 days of history
Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization
Yong Si, Mingfei Lu, Jing Li, Yang Hu, Guijiang Li, Yueheng Song, Zhaokui Wang
Reinforcement Learning Optimization
  • Introduction of a hierarchical framework for fleet-level PHM decision-making.
  • Two-tier architecture separating strategic and tactical decision-making.
  • Integration of layered reward shaping and planning-enhanced neural networks.
  • Demonstrated superior performance compared to conventional DRL and rule-based methods.
Read more
A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis
Sk Miraj Ahmed, Yuewei Lin, Chuntian Cao, Shinjae Yoo, Xinpei Wu, Won-Il Lee, Nikhil Tiwale, Dan N. Le, Thi Thu Huong Chu, Jiyoung Kim, Kevin G. Yager, Chang-Yong Nam
Computer Vision
  • Introduction of the first foundation model for SEM image analysis.
  • Utilizes a self-supervised transformer architecture with a Mixture of Experts mechanism.
  • Pretrained on a large dataset of SEM images to enhance generalization across diverse conditions.
  • Demonstrates superior performance in defocus-to-focus image translation tasks.
Read more
Learning to Query History: Nonstationary Classification via Learned Retrieval
Jimmy Gammell, Bishal Thapaliya, Yoon Jung, Riyasat Ohib, Bilel Fehri, Deepayan Chakrabarti
Time Series
  • Addresses nonstationarity in classification by leveraging historical labeled examples.
  • A learned retrieval mechanism samples relevant historical data, improving efficiency (a plain nearest-neighbor baseline is sketched below).
  • The approach allows for adaptation to distribution shifts without retraining.
  • Experiments show significant robustness improvements over standard classifiers.
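To ground the idea, here is a minimal retrieve-then-vote classifier over a labeled history. It uses plain nearest-neighbor search in place of the paper's learned retrieval mechanism, so treat it as a baseline sketch, not the proposed method.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def retrieve_and_classify(history_x, history_y, query, k=25):
    """Classify `query` by retrieving the k nearest historical labeled
    examples and taking a distance-weighted vote over their labels."""
    index = NearestNeighbors(n_neighbors=k).fit(history_x)
    dist, idx = index.kneighbors(np.asarray(query).reshape(1, -1))
    weights = 1.0 / (dist[0] + 1e-8)      # closer history counts more
    votes = {}
    for w, i in zip(weights, idx[0]):
        label = history_y[i]
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```

A learned retriever would replace the fixed Euclidean search with a trained relevance model, which is what lets the paper's approach adapt to distribution shift without retraining the classifier.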
Read more
Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems
Charbel Bou Chaaya, Mehdi Bennis
Reinforcement Learning Multimodal Graph Learning
  • Introduces a self-supervised multimodal learning framework for V2I systems.
  • Utilizes rotation symmetries to reduce the search space in decentralized MARL.
  • Implements a graph neural network for policy computation and coordination among RSUs.
  • Achieves significant accuracy and performance improvements over existing methods.
Read more
PD-SOVNet: A Physics-Driven Second-Order Vibration Operator Network for Estimating Wheel Polygonal Roughness from Axle-Box Vibrations
Xiancheng Wang, Lin Wang, Rui Wang, Zhibo Zhang, Minghang Zhao, Xiaoheng Zhang, Zhongyue Tan, Kaitai Mao
Time Series
  • Introduction of PD-SOVNet, a physics-guided framework for wheel roughness estimation.
  • Integration of multiple innovative components, including second-order vibration kernels and a MIMO coupling module.
  • Demonstrated competitive accuracy and stability across various datasets, especially under challenging conditions.
  • Highlights the importance of structured physical priors in improving regression stability.
Read more
Optimal-Transport-Guided Functional Flow Matching for Turbulent Field Generation in Hilbert Space
Li Kunpeng, Wan Chenguang, Qu Zhisong, Lim Kyungtak, Virginie Grandgirard, Xavier Garbet, Yu Hua, Ong Yew Soon
Generative Models
  • FOT-CFM generalizes Conditional Flow Matching to infinite-dimensional Hilbert spaces, enhancing turbulence modeling (the finite-dimensional objective it lifts is recalled below).
  • The integration of Optimal Transport theory allows for efficient and accurate generation of turbulent fields.
  • The method achieves high-quality sampling with fewer function evaluations compared to traditional diffusion-based approaches.
  • FOT-CFM demonstrates superior fidelity in reproducing turbulent statistics across complex chaotic systems.
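For orientation, the standard finite-dimensional conditional flow matching objective with an optimal-transport coupling, which FOT-CFM presumably lifts from R^d to a Hilbert space of fields, reads:

```latex
% Finite-dimensional OT-CFM objective (background, not the paper's
% functional formulation): pairs (x_0, x_1) are drawn from an
% optimal-transport coupling \pi^{\mathrm{OT}} between noise and data.
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; (x_0, x_1) \sim \pi^{\mathrm{OT}}}
    \bigl\| v_\theta(t, x_t) - (x_1 - x_0) \bigr\|^2,
\qquad x_t = (1 - t)\, x_0 + t\, x_1 .
```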
Read more
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
Neharika Jali, Anupam Nayak, Gauri Joshi
Large Language Models Reinforcement Learning Efficient ML
  • Introduces Turn-Adaptive Budgets (TAB) for efficient multi-turn reasoning in LLMs (an illustrative budget rule is sketched after this list).
  • Models multi-turn reasoning as a multi-objective Markov Decision Process (MDP).
  • Achieves up to 35% token savings while maintaining accuracy on benchmarks.
  • Proposes TAB All-SubQ for systems with prior knowledge of sub-questions, saving up to 40% tokens.
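TAB's budgets come out of RL training; purely as an illustration of the interface, here is a hypothetical rule that scales a base thinking budget by a per-turn difficulty score. The `estimate_difficulty` scorer is a placeholder, not part of the paper.

```python
def allocate_budget(turn_prompt, estimate_difficulty,
                    base_budget=256, max_budget=4096):
    """Hypothetical turn-adaptive rule: scale the thinking-token budget
    by an estimated difficulty score in [0, 1]."""
    difficulty = estimate_difficulty(turn_prompt)   # placeholder scorer
    budget = int(base_budget + difficulty * (max_budget - base_budget))
    return min(budget, max_budget)

# Easy turns stay near base_budget; hard turns approach max_budget.
budget = allocate_budget("What is 2 + 2?", lambda p: 0.05)
```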
Read more
Drifting Fields are not Conservative
Leonard Franz, Sebastian Hoffmann, Georg Martius
Generative Models Optimization Theory
  • Drift fields in generative models are generally non-conservative and cannot be expressed as gradients of scalar potentials (the defining condition is recalled below).
  • The Gaussian kernel is an exception where the drift field is conservative.
  • A new normalization method using the sharp kernel restores conservatism for radial kernels.
  • The drifting field matching objective is more general than scalar loss minimization but offers minimal practical advantages.
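For reference, a smooth field v : R^d → R^d is conservative exactly when it is the gradient of a scalar potential, which on a simply connected domain is equivalent to a symmetric Jacobian:

```latex
v = \nabla \varphi
\quad\Longleftrightarrow\quad
\frac{\partial v_i}{\partial x_j} = \frac{\partial v_j}{\partial x_i}
\quad \text{for all } i, j .
```

The paper's claim is that drift fields in generative models generically violate this symmetry, with the Gaussian kernel as the exception.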
Read more
Transformer See, Transformer Do: Copying as an Intermediate Step in Learning Analogical Reasoning
Philipp Hellwig, Willem Zuidema, Claire E. Stevenson, Martha Lewis
NLP Large Language Models Theory
  • Transformers can learn analogical reasoning through a meta-learning approach.
  • Incorporating copying tasks in training data improves generalization to new alphabets.
  • The proposed model outperforms many existing large language models on letter-string analogy tasks.
  • Interpretability analyses reveal the model's reasoning mechanisms.
Read more
VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts
Peigui Qi, Kunsheng Tang, Yanpu Yu, Jialin Wu, Yide Song, Wenbo Zhou, Zhicong Huang, Cheng Hong, Weiming Zhang, Nenghai Yu
Multimodal
  • Introduction of the MAFE framework for effective multimodal feature extraction.
  • Development of VLMShield as a lightweight and efficient safety detector.
  • Demonstration of superior performance compared to existing defense methods.
  • Identification of distinct patterns in benign vs. malicious prompts.
Read more
Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse
Gregory Magarshak
Generative Models Reinforcement Learning Efficient ML
  • Introduction of Probabilistic Language Tries (PLTs) as a unified representation for generative models.
  • PLTs enable optimal lossless compression, decision policy representation, and efficient execution reuse.
  • A prior-guided caching theorem shows PLTs outperform empirical-frequency caches in inference cost.
  • Hybrid compression architecture achieves description lengths below Shannon entropy when the model is accurate.
Read more
AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
Iva Mikuš, Boris Muha, Domagoj Vlah
Theory Efficient ML Optimization
  • AE-ViT integrates convolutional encoding, transformer-based latent evolution, and decoding for parametric PDE modeling.
  • The model employs multi-stage parameter injection and coordinate channel injection to improve conditioning and spatial awareness.
  • AE-ViT outperforms existing deep learning reduced order models and latent transformers in multi-field predictions.
  • The approach cuts relative rollout error roughly fivefold compared to traditional methods.
Read more
TwinLoop: Simulation-in-the-Loop Digital Twins for Online Multi-Agent Reinforcement Learning
Nan Zhang, Zishuo Wang, Shuyu Huang, Georgios Diamantopoulos, Nikos Tziritas, Panagiotis Oikonomou, Georgios Theodoropoulos
Reinforcement Learning
  • TwinLoop leverages digital twins for accelerated policy adaptation in multi-agent systems.
  • The framework enables cost-free exploration through simulation before applying changes in the real environment.
  • Evaluation in vehicular edge computing scenarios shows significant improvements in adaptation efficiency.
  • TwinLoop reduces the need for costly trial-and-error interactions in dynamic environments.
Read more
Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game
Tõnis Lees, Tambet Matiisen
Reinforcement Learning
  • Successful adaptation of AlphaZero's self-play framework to Tablut, an asymmetric board game.
  • Implementation of separate policy and value heads for each player to address the game's unique dynamics.
  • Challenges of catastrophic forgetting were mitigated through data augmentation and an increased replay buffer.
  • The model achieved a BayesElo rating of 1235, indicating steady improvement in performance over iterations.
Read more
Weighted Bayesian Conformal Prediction
Xiayin Lou, Peng Luo
Theory
  • WBCP generalizes BQ-CP to importance-weighted settings, addressing the limitations of i.i.d. assumptions.
  • The method replaces uniform Dirichlet distributions with weighted Dirichlet distributions for better threshold estimation (the frequentist weighted quantile it generalizes is sketched below).
  • Theoretical results confirm calibration consistency and improved posterior concentration rates.
  • WBCP is instantiated for spatial prediction, yielding interpretable diagnostics and effective sample size maps.
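As a point of comparison, the classical (non-Bayesian) importance-weighted conformal threshold is a weighted quantile of calibration nonconformity scores. WBCP replaces this frequentist step with a weighted-Dirichlet posterior, so the sketch below is background, not the paper's estimator.

```python
import numpy as np

def weighted_conformal_threshold(scores, weights, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores,
    with importance weights correcting for covariate shift."""
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cdf, 1.0 - alpha)
    return s[min(idx, len(s) - 1)]
```

A prediction set then contains every candidate label whose nonconformity score falls at or below this threshold.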
Read more
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model
Hongxu Zhou
Theory
  • Introduces the UNDO Flip-Flop task to evaluate reversible semantic state management.
  • Demonstrates that existing models struggle to learn stack-based rollback mechanisms.
  • Finds that models converge on a heuristic that fails under adversarial conditions.
  • Establishes a distinction between theoretical expressibility and practical learnability.
Read more
Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation
Umang Dobhal, Christina Garcia, Sozo Inoue
Generative Models Time Series
  • Introduces a temporal extension of TabDDPM for generating synthetic time-series data.
  • Incorporates lightweight temporal adapters and context-aware embeddings to model temporal dependencies.
  • Demonstrates improved temporal coherence and diversity in synthetic data compared to baseline methods.
  • Achieves competitive classification performance on the WISDM dataset, addressing data imbalance issues.
Read more
Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation
Guang Lin, Christian Moya, Di Qi, Xuda Ye
Generative Models Theory Efficient ML
  • Jeffreys Flow mitigates mode collapse in Boltzmann generators by using the symmetric Jeffreys divergence (defined below).
  • The framework distills empirical data from Parallel Tempering to enhance sampling accuracy.
  • Theoretical results confirm that Jeffreys Flow generates distributions closer to the target than empirical samples.
  • Demonstrated scalability and accuracy on high-dimensional benchmarks.
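The symmetric divergence at the core of the method is the classical Jeffreys divergence, the sum of the two KL directions:

```latex
J(p, q) = \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p)
        = \int \bigl(p(x) - q(x)\bigr)\, \log \frac{p(x)}{q(x)} \, dx .
```

Because it penalizes both mass that q misses and mass that q invents, it discourages the mode collapse that a single reverse-KL term permits.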
Read more
A Theory-guided Weighted $L^2$ Loss for solving the BGK model via Physics-informed neural networks
Gyounghun Ko, Sung-Jun Son, Seung Yeon Cho, Myeong-Su Lee
Theory Optimization
  • The standard L2 PINN loss is insufficient for ensuring accuracy in the BGK model.
  • A new velocity-weighted L2 loss function is proposed to penalize high-velocity errors effectively (its general shape is sketched below).
  • The paper establishes a rigorous stability estimate for the weighted loss, ensuring convergence.
  • Numerical experiments demonstrate superior performance of the weighted loss over the standard approach.
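Schematically, the proposal weights the squared PDE residual R_θ by a velocity-dependent factor so that high-velocity errors dominate the loss. The polynomial weight below is an assumed example, not necessarily the paper's exact choice:

```latex
\mathcal{L}_{w}(\theta)
  = \int \omega(v)\, \bigl| R_{\theta}(t, x, v) \bigr|^{2}\, dt\, dx\, dv ,
\qquad \omega(v) = \bigl(1 + |v|^{2}\bigr)^{k/2}, \quad k > 0 .
```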
Read more
$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models
Ahsan Bilal, Muhammad Ahmed Mohsin, Muhammad Umer, Asad Aali, Muhammad Usman Khanzada, Muhammad Usman Rafique, Zihao He, Emily Fox, Dean F. Hougen
NLP Large Language Models Generative Models
  • S3 improves upon naive best-of-K sampling by reallocating compute during the denoising process.
  • The method utilizes a verifier-guided search to enhance output quality without retraining the model.
  • S3 achieves significant performance gains on benchmarks like MATH-500 and GSM8K.
  • The approach maintains diversity in candidate outputs while favoring high-quality results.
Read more
BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning
Yi Yang, Ovidiu Daescu
Graph Learning
  • BiScale-GTR combines GNN-based local encoding with Transformer-based global reasoning for molecular representation learning.
  • The framework employs a graph-based BPE tokenizer to ensure consistent and chemically valid fragment tokenization.
  • It captures both atom-level and fragment-level structures, enhancing the model's ability to learn long-range dependencies.
  • Experiments show that BiScale-GTR achieves state-of-the-art performance on multiple molecular property prediction benchmarks.
Read more
FLeX: Fourier-based Low-rank EXpansion for multilingual transfer
Gaurav Narasimhan
Large Language Models Optimization Efficient ML
  • LoRA fine-tuning on a small dataset outperforms broader fine-tuning approaches.
  • Sophia optimizer provides faster convergence but with marginal final performance differences compared to Adam.
  • Fourier-based regularization significantly enhances cross-lingual transfer capabilities.
  • The approach demonstrates practical strategies for deploying multilingual code-generation models.
Read more
PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space
Asaf Buchnick, Aviv Shamsian, Aviv Navon, Ethan Fetaya
Generative Models Optimization Multimodal
  • PromptEvolver is the first text-level evolutionary framework for prompt inversion, using image-aware VLM operators.
  • The method operates without requiring access to model internals, making it applicable to both open-source and black-box models.
  • PromptEvolver achieves state-of-the-art results in prompt inversion, with up to 7.8% improvement in image reconstruction scores compared to existing baselines.
  • The genetic algorithm used in PromptEvolver promotes diversity in prompt generation, reducing the risk of local optima.
Read more
A machine learning framework for uncovering stochastic nonlinear dynamics from noisy data
Matteo Bosso, Giovanni Franzese, Kushal Swamy, Maarten Theulings, Alejandro M. Aragón, Farbod Alijani
Time Series Theory Interpretability
  • Introduces a hybrid framework combining symbolic regression and Gaussian processes.
  • Successfully identifies both symbolic and stochastic components of dynamical systems.
  • Demonstrates data efficiency, requiring only 10²–10³ data points.
  • Validates the approach on both numerical benchmarks and experimental biological systems.
Read more
Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering
Manar D. Samad, Yina Hou, Shrabani Ghosh
Theory Optimization Multimodal
  • Traditional clustering methods like K-means dominate EHR analysis but are limited in effectiveness.
  • An ensemble-based deep clustering approach is proposed to enhance clustering performance by aggregating multiple embeddings (a generic consensus-clustering sketch follows this list).
  • The study utilizes real EHR data from the All of Us Research Program to evaluate clustering methods.
  • The proposed method outperforms traditional and deep learning methods across various metrics and patient cohorts.
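As a generic stand-in for the paper's ensemble (whose details differ), consensus clustering aggregates per-embedding K-means runs through a co-association matrix. The sketch assumes scikit-learn ≥ 1.2 for the `metric="precomputed"` argument.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def consensus_cluster(embeddings, k, seed=0):
    """Cluster each embedding separately, then cluster the co-association
    matrix (fraction of runs in which two patients share a cluster)."""
    n = embeddings[0].shape[0]
    coassoc = np.zeros((n, n))
    for r, Z in enumerate(embeddings):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed + r).fit_predict(Z)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= len(embeddings)
    return AgglomerativeClustering(
        n_clusters=k, metric="precomputed", linkage="average"
    ).fit_predict(1.0 - coassoc)   # distance = 1 - agreement
```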
Read more
Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression
Longsheng Zhou, Yu Shen
Efficient ML
  • Proposes a three-stage pipeline for neural network compression: pruning, quantization, and distillation (a simplified PyTorch sketch of one ordering follows this list).
  • Demonstrates that the order of these stages significantly impacts the accuracy and efficiency of the model.
  • Shows that traditional metrics may not accurately reflect real-world performance, advocating for runtime-based evaluations.
  • Achieves competitive accuracy and low latency across multiple architectures and datasets.
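A minimal sketch of one possible ordering using stock PyTorch utilities: prune, then distill, then post-training dynamic quantization. Distilling after dynamic quantization would require quantization-aware training, which is beyond this sketch, and the paper's exact recipes and hyperparameters are not reproduced here; its point is precisely that the ordering matters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def prune_distill_quantize(student, teacher, loader, epochs=1, T=4.0):
    # 1) Prune: drop 50% of the smallest-magnitude weights per Linear layer.
    for m in student.modules():
        if isinstance(m, nn.Linear):
            prune.l1_unstructured(m, name="weight", amount=0.5)
            prune.remove(m, "weight")          # bake the mask in

    # 2) Distill: train the pruned student on the teacher's soft targets.
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                            F.softmax(t_logits / T, dim=-1),
                            reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()

    # 3) Quantize: post-training dynamic int8 quantization of Linear layers.
    return torch.ao.quantization.quantize_dynamic(
        student, {nn.Linear}, dtype=torch.qint8)
```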
Read more
Neural Computers
Mingchen Zhuge, Changsheng Zhao, Haozhe Liu, Zijian Zhou, Shuming Liu, Wenyi Wang, Ernie Chang, Gael Le Lan, Junjie Fei, Wenxuan Zhang, Yasheng Sun, Zhipeng Cai, Zechun Liu, Yunyang Xiong, Yining Yang, Yuandong Tian, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber
Generative Models Theory Multimodal
  • Introduction of Neural Computers (NCs) as a unified computing paradigm.
  • Demonstration of NC prototypes for command-line and GUI interactions.
  • Identification of early runtime primitives learned from raw I/O data.
  • Outline of challenges and roadmap towards Completely Neural Computers (CNCs).
Read more
Improving Sparse Memory Finetuning
Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta
NLP Large Language Models Efficient ML
  • Introduction of Sparse Memory Finetuning (SMF) to address catastrophic forgetting in LLMs.
  • Development of an open-source pipeline for retrofitting pretrained models with sparse memory layers.
  • Novel slot-selection mechanism based on KL divergence for prioritizing memory updates.
  • Empirical validation showing improved stability on held-out benchmarks while learning new tasks.
Read more
Asymptotic-Preserving Neural Networks for Viscoelastic Parameter Identification in Multiscale Blood Flow Modeling
Giulia Bertaglia, Raffaella Fiamma Cabini
Theory
  • Introduction of Asymptotic-Preserving Neural Networks for viscoelastic parameter identification.
  • Integration of physical principles into the neural network training process enhances model accuracy.
  • Utilization of non-invasive patient-specific data for pressure waveform estimation.
  • Demonstrated effectiveness through numerical simulations in synthetic and real-world scenarios.
Read more
Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series
Annita Vapsi, Penghang Liu, Saheed Obitayo, Aakriti, Manoj Cherukumalli, Prathamesh Patil, Amit Varshney, Nicolas Marchesotti, Elizabeth Fons, Vamsi K. Potluru, Manuela Veloso
Time Series Generative Models
  • DynLMC generates synthetic multivariate time series with realistic, nonstationary correlation structures.
  • The model incorporates time-varying correlations, regime-switching, and lagged dependencies.
  • Fine-tuning on DynLMC-generated data improves forecasting performance across multiple benchmarks.
  • The approach enhances the transferability of foundation models for time series analysis.
Read more
Learning $\text{AC}^0$ Under Graphical Models
Gautam Chandrasekaran, Jason Gaitonde, Ankur Moitra, Arsen Vasilyan
Theory
  • Introduces quasipolynomial-time algorithms for learning AC0 under graphical models with strong spatial mixing.
  • Overcomes the limitations of Fourier analysis in learning from correlated distributions.
  • Demonstrates the applicability of low-degree polynomial approximations beyond product structures.
  • Extends results to other function classes, enhancing the generality of the findings.
Read more
Busemann energy-based attention for emotion analysis in Poincaré discs
Zinaid Kapić, Vladimir Jaćimović
NLP Theory Efficient ML
  • EmBolic leverages hyperbolic geometry for emotion analysis, capturing hierarchical relationships between words and emotions.
  • The model operates in a continuous space of emotions, avoiding the limitations of categorical representations.
  • An attention mechanism based on Busemann energy is utilized to evaluate the alignment of textual messages with emotional classes (the underlying Busemann function is recalled below).
  • Experiments show strong generalization and prediction accuracy, even in small dimensions.
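For reference, the standard Busemann function on the Poincaré disc, for an ideal boundary point ξ with ‖ξ‖ = 1, is

```latex
B_{\xi}(x) = \log \frac{\lVert \xi - x \rVert^{2}}{1 - \lVert x \rVert^{2}},
\qquad \lVert x \rVert < 1 ,
```

which measures signed horospherical distance toward ξ. The paper's attention energy is presumably built from this quantity, scoring how strongly a token embedding x aligns with an emotion prototype placed at the ideal point ξ.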
Read more
Bridging Theory and Practice in Crafting Robust Spiking Reservoirs
Ruggero Freddi, Nicolas Seseri, Diana Nigrisoli, Alessio Basti
Theory Time Series Efficient ML
  • Introduction of the robustness interval as a measure for hyperparameter tuning in spiking reservoirs.
  • Identification of monotonic trends linking robustness interval width to presynaptic connection density and firing threshold.
  • Discovery of iso-performance manifolds in the hyperparameter space that maintain performance near the critical point.
  • Validation of the theoretical critical point as a robust starting coordinate for parameter search.
Read more
MO-RiskVAE: A Multi-Omics Variational Autoencoder for Survival Risk Modeling in Multiple Myeloma
Zixuan Chen, Heng Zhang, YuPeng Qin, WenPeng Xing, Qiang Wang, Da Wang, Changting Lin, Meng Han
Generative Models Multimodal
  • MO-RiskVAE improves survival risk modeling in multiple myeloma by addressing limitations in traditional VAE approaches.
  • The study highlights the importance of latent regularization scale and structure in survival-driven training.
  • Moderate relaxation of KL regularization consistently enhances survival discrimination.
  • The model integrates multimodal omics data effectively, improving risk stratification without added complexity.
Read more
AdaBoost Does Not Always Cycle: A Computer-Assisted Counterexample
Erik Y. Wang
Theory
  • The paper disproves the conjecture that exhaustive AdaBoost always converges to a finite cycle.
  • A specific counterexample is constructed using a block-product matrix that demonstrates non-periodic behavior.
  • The irrationality of the eigenvalue ratio in the linearized return maps is crucial to the findings.
  • The results are supported by rigorous mathematical proofs and computational certificates.
Read more
The Rhetoric of Machine Learning
Robert C. Williamson
Theory
  • Machine learning is inherently rhetorical, influencing perceptions and decisions.
  • The concept of 'manipulation as a service' highlights the persuasive use of machine learning in business.
  • Viewing machine learning through the lens of rhetoric can open new lines of inquiry and discussion.
  • The paper challenges traditional narratives about the objectivity of machine learning technologies.
Read more
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
Yi Xu, Philipp Jettkant, Laura Ruis
NLP Large Language Models Theory
  • LLMs discover latent planning strategies only to limited depth; small transformers achieve up to three steps.
  • Fine-tuned models like GPT-4o and Qwen3-32B reach a maximum of five latent planning steps.
  • GPT-5.4 demonstrates the ability to generalize strategies to eight steps during testing, despite training on fewer steps.
  • The study reveals a dissociation between the ability to discover strategies and the ability to execute them.
Read more
Extraction of linearized models from pre-trained networks via knowledge distillation
Fumito Kimura, Jun Ohkubo
Efficient ML Theory
  • Proposes a framework for extracting linearized models from pre-trained neural networks.
  • Integrates Koopman operator theory with knowledge distillation for improved classification tasks.
  • Demonstrates superior performance over conventional least-squares-based Koopman approximations.
  • Focuses on enhancing energy efficiency in machine learning architectures, particularly for optical devices.
Read more
SBBTS: A Unified SchrΓΆdinger-Bass Framework for Synthetic Financial Time Series
Alexandre Alouadi, Grégoire Loeper, Célian Marsala, Othmane Mazhar, Huyên Pham
Generative Models Time Series Optimization
  • Introduces SBBTS, a unified framework for generating synthetic financial time series.
  • Jointly models drift and stochastic volatility, overcoming limitations of existing methods.
  • Demonstrates improved forecasting performance and data augmentation capabilities.
  • Empirical evaluations show accurate recovery of volatility and correlation structures.
Read more
From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures
Srinidhi Madabhushi, Pranesh Vyas, Swathi Vaidyanathan, Mayur Kurup, Elliott Nash, Yegor Silyutin
Graph Learning
  • Introduces a graph-based anomaly detection system for microservices using unsupervised learning.
  • Utilizes GCN-GAE to learn structural representations from service interaction graphs.
  • Achieves high precision and low false positive rates in anomaly detection.
  • Addresses gaps in traditional load testing by focusing on real-world traffic patterns.
Read more
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
Wenyue Hua, Sripad Karne, Qian Xie, Armaan Agrawal, Nikos Pagonas, Kostis Kaffes, Tianyi Peng
Large Language Models Optimization Efficient ML
  • AGENTOPT is the first framework-agnostic tool for client-side optimization of LLM-based agents.
  • Model selection is identified as a critical factor, with significant cost differences between model combinations.
  • The paper presents eight search algorithms to efficiently navigate the model assignment space.
  • Empirical results show that Arm Elimination can reduce evaluation budgets significantly while maintaining accuracy.
Read more
Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations
Chris Choy
Computer Vision Robotics Theory
  • Removing orthogonalization during training improves rotation estimation in deep learning.
  • SVD orthogonalization introduces significant gradient distortions, particularly early in training (the projection itself is sketched below).
  • The SVD Jacobian has a rank of 3, indicating limited gradient information retention.
  • Gram-Schmidt orthogonalization results in asymmetric gradient signals, favoring 9D over 6D parameterization.
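The SVD orthogonalization analyzed here is the standard projection of a raw 3×3 network output onto SO(3), with a sign correction to keep the determinant +1; a minimal PyTorch version:

```python
import torch

def svd_orthogonalize(m: torch.Tensor) -> torch.Tensor:
    """Project raw (..., 3, 3) matrices onto SO(3) via SVD."""
    U, _, Vh = torch.linalg.svd(m)
    det = torch.det(U @ Vh)                        # +/-1 up to numerics
    s = torch.stack([torch.ones_like(det),
                     torch.ones_like(det), det], dim=-1)
    return U @ torch.diag_embed(s) @ Vh            # det(+1) rotation

R = svd_orthogonalize(torch.randn(8, 3, 3))        # batch of rotations
```

Per the title, the recommendation is to train on the raw 9D output and apply this projection only at inference.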
Read more
Improving Robustness In Sparse Autoencoders via Masked Regularization
Vivek Narayanaswamy, Kowshik Thopalli, Bhavya Kailkhura, Wesam Sakla
NLP Large Language Models Interpretability
  • Sparse autoencoders are prone to feature absorption, degrading interpretability despite high reconstruction fidelity.
  • The proposed masking-based regularization disrupts co-occurrence patterns, improving robustness and interpretability.
  • The method enhances performance across multiple SAE architectures and reduces the OOD gap.
  • Results indicate that stronger training objectives combined with architectural advances can mitigate shortcut learning in SAEs.
Read more
Time-Series Classification with Multivariate Statistical Dependence Features
Yao Sun, Bo Hu, Jose Principe
Time Series Audio & Speech Efficient ML
  • Introduces a framework for non-stationary time-series classification using multivariate statistical dependence features.
  • Utilizes the cross density ratio (CDR) for robust statistical dependence measurement independent of sample order.
  • Implements the functional maximal correlation algorithm (FMCA) to construct a projection space for feature extraction.
  • Achieves competitive recognition accuracy on the TI-46 digit speech corpus with a lightweight neural network architecture.
Read more
Limits of Difficulty Scaling: Hard Samples Yield Diminishing Returns in GRPO-Tuned SLMs
Suraj Yadav, Siddharth Yadav, Parth Goyal
NLP Large Language Models Optimization
  • Identification of a capacity boundary in SLMs, limiting performance on complex reasoning tasks.
  • Training on lower-difficulty samples yields competitive results with significantly reduced training effort.
  • Cross-dataset generalization shows that easier training distributions can enhance numeric reasoning performance.
  • GRPO's effectiveness is contingent on the base model's prior reasoning competence and dataset difficulty.
Read more
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu
NLP Large Language Models Efficient ML
  • Introduces the Master Key Hypothesis for capability transfer across models.
  • Presents Unlock, a training-free and label-free framework for capability transfer.
  • Demonstrates significant performance improvements in reasoning tasks without retraining.
  • Shows that capability transfer can match or exceed post-training performance.
Read more
Selective Neuron Amplification for Training-Free Task Enhancement
Ryyan Akhtar
NLP Large Language Models Efficient ML
  • Selective Neuron Amplification (SNA) enhances transformer model performance without changing learned parameters.
  • The method identifies and amplifies neurons with strong task-specific responses during inference (a forward-hook sketch follows this list).
  • SNA shows significant improvements in low-confidence scenarios, with a mean improvement of 27.85% in certain tasks.
  • The effectiveness of SNA varies across different performance zones, indicating a saturation effect.
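Mechanically, inference-time amplification can be implemented with a forward hook that rescales chosen output units. In this hedged sketch the layer and neuron indices are placeholders, not the paper's selection rule:

```python
import torch
import torch.nn as nn

def amplify_neurons(layer: nn.Module, idx, gain: float = 2.0):
    """Scale the selected output units of `layer` on every forward pass."""
    def hook(_module, _inputs, output):
        output = output.clone()
        output[..., idx] *= gain     # amplify task-relevant neurons only
        return output                # returned tensor replaces the output
    return layer.register_forward_hook(hook)

mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
handle = amplify_neurons(mlp[0], idx=[3, 7, 11], gain=2.0)
out = mlp(torch.randn(2, 16))
handle.remove()                      # restore the original behavior
```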
Read more
On the Geometry of Positional Encodings in Transformers
Giansalvo Cirrincione
NLP Large Language Models Theory
  • Positional information is essential for Transformers to perform order-sensitive tasks.
  • Distinct positional encodings are learned during training, leading to effective representation of sequence positions.
  • Optimal positional encodings can be approximated using MDS on Hellinger distances, although exact reproduction is unattainable.
  • The sinusoidal encoding is theoretically justified as optimal for certain types of corpora.
Read more
Information as Structural Alignment: A Dynamical Theory of Continual Learning
Radu Negulescu
Theory
  • Catastrophic forgetting is a structural issue, not merely an engineering failure.
  • The Informational Buildup Framework (IBF) redefines knowledge retention through structural alignment.
  • IBF demonstrates superior performance in continual learning tasks without relying on raw data storage.
  • The agency mechanism's effectiveness is context-dependent, yielding varying outcomes based on the learning environment.
Read more
LLMs Should Express Uncertainty Explicitly
Junyu Guo, Shangding Gu, Ming Jin, Costas Spanos, Javad Lavaei
Large Language Models NLP Interpretability
  • Uncertainty in LLMs should be explicitly communicated rather than inferred post-hoc.
  • Two interfaces for uncertainty are proposed: global (verbalized confidence) and local (reasoning-time markers).
  • The verbalized-confidence interface improves calibration and reduces overconfident errors.
  • The reasoning-time interface enhances visibility of failures and aids in retrieval control.
Read more
On the Price of Privacy for Language Identification and Generation
Xiaoyu Li, Andi Han, Jiaojiao Jiang, Junbin Gao
NLP Large Language Models Theory
  • Approximate DP incurs no statistical penalty for language identification and generation tasks.
  • Under pure DP, the degradation in performance is characterized by a factor of min{1, ε}.
  • Generation tasks achieve a tighter privacy-utility tradeoff compared to identification tasks.
  • The study provides a complete characterization of the price of privacy in language learning.
Read more
Top-K Retrieval with Fixed-Size Linear-Attention Completion: Backbone- and KV-Format-Preserving Attention for KV-Cache Read Reduction
Yasuto Hoshi, Daisuke Miyashita, Jun Deguchi
NLP Large Language Models Efficient ML
  • Introduces a hybrid attention module that combines exact anchors with Top-K retrieval and a fixed-size completion term.
  • Maintains the original backbone language model and KV-cache format, ensuring compatibility with existing systems.
  • Demonstrates that the proposed method improves performance in long-context benchmarks, particularly in high-entropy attention scenarios.
  • Reduces decode-time KV payload reads by estimating contributions from unretrieved tokens, thus minimizing memory traffic.
Read more
Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning
Lucas Dionisopoulos, Nicklas Majamaki, Prithviraj Ammanabrolu
NLP Large Language Models Reinforcement Learning
  • Fine-tuning on single best moves leads to effective RL but unfaithful reasoning.
  • Training on multi-move trajectories results in more stable RL and faithful reasoning.
  • Reinforcement learning improves move quality and reduces hallucination rates.
  • SFT-checkpoint metrics can predict final RL performance.
Read more
Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing
MD Shafikul Islam, Mahathir Mohammad Bappy, Saifur Rahman Tushar, Md Arifuzzaman
Graph Learning
  • Introduction of FI-LDP, a feature-importance-aware anisotropic local differential privacy mechanism.
  • Development of a stratified Hierarchical Graph Attention Network (HGAT) for capturing spatial and thermal dependencies in additive manufacturing.
  • Demonstrated significant improvements in utility recovery and defect detection accuracy while ensuring privacy.
  • Mechanistic analysis shows a strong correlation between feature importance and noise allocation, enhancing interpretability.
Read more
A comparative analysis of machine learning models in SHAP analysis
Justin Lin, Julia Fukuyama
Interpretability
  • SHAP analysis provides a framework for interpreting predictions from complex machine learning models.
  • The paper investigates the variability of SHAP values across different machine learning models.
  • A novel high-dimensional waterfall plot is introduced for visualizing SHAP values in multi-classification scenarios.
  • The study aims to enhance the understanding of model decision-making processes through SHAP analysis.
Read more
Bivariate Causal Discovery Using Rate-Distortion MDL: An Information Dimension Approach
Tiago Brogueira, MΓ‘rio A.T. Figueiredo
Theory Graph Learning Optimization
  • Introduces a new method for bivariate causal discovery called rate-distortion MDL (RDMDL).
  • Addresses limitations in existing MDL-based methods regarding the estimation of the cause variable's description length.
  • Utilizes rate-distortion theory and histogram-based density estimation for improved causal direction determination.
  • Demonstrates competitive performance of RDMDL on the TΓΌbingen dataset.
Read more
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Mohammed Nowaz Rabbani Chowdhury, Kaoutar El Maghraoui, Hsinyu Tsai, Naigang Wang, Geoffrey W. Burr, Liu Liu, Meng Wang
Efficient ML Large Language Models Theory
  • Introduces a theoretically grounded metric for expert-wise mixed-precision quantization based on router's L2 norm changes.
  • Demonstrates that experts capturing less prevalent features require higher precision to maintain model performance.
  • Empirical results show improved accuracy and reduced inference costs on large MoE models compared to existing methods.
  • The proposed method incurs negligible computational overhead for determining expert bit-widths.
Read more
Mixture Proportion Estimation and Weakly-supervised Kernel Test for Conditional Independence
Yushi Hirose, Akito Narahara, Takafumi Kanamori
Theory
  • Introduction of new assumptions for MPE based on conditional independence, enhancing identifiability.
  • Development of method of moments estimators with established asymptotic properties.
  • Creation of weakly-supervised kernel tests for validating CI assumptions using unlabeled data.
  • Empirical validation showing improved performance of proposed methods over existing approaches.
Read more
RAGEN-2: Reasoning Collapse in Agentic RL
Zihan Wang, Chi Gui, Xing Jin, Qineng Wang, Licheng Liu, Kangrui Wang, Shiqi Chen, Linjie Li, Zhengyuan Yang, Pingyue Zhang, Yiping Lu, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Reinforcement Learning Large Language Models NLP
  • Identification of template collapse in multi-turn agent RL, where reasoning appears diverse but is input-agnostic.
  • Introduction of a mutual information proxy for diagnosing reasoning quality, which outperforms traditional entropy measures.
  • Explanation of template collapse through a signal-to-noise ratio mechanism, highlighting the impact of low reward variance.
  • Development of SNR-Aware Filtering to enhance input dependence and task performance during training.
Read more
Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem
Hyukjun Yang, Han-Dong Lim, Donghwan Lee
Reinforcement Learning Optimization Theory
  • Introduces a soft Bellman residual minimization framework using weighted Lp-norms (written out below).
  • Establishes a connection between BRM and the contraction properties of the Bellman operator.
  • Derives performance error bounds that improve error control in reinforcement learning.
  • Demonstrates that the proposed method is compatible with gradient-based optimization.
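In standard notation, the soft Bellman operator with temperature τ and a weighted-Lp residual objective take the following form; the notation is assumed and the paper's exact weighting may differ:

```latex
(\mathcal{T}_{\tau} Q)(s, a)
  = r(s, a) + \gamma\, \mathbb{E}_{s'}\!\left[
      \tau \log \sum_{a'} \exp\!\bigl(Q(s', a') / \tau\bigr) \right],
\qquad
\mathcal{L}(\theta)
  = \bigl\| Q_{\theta} - \mathcal{T}_{\tau} Q_{\theta} \bigr\|_{p,\mu}^{p}
  = \sum_{s,a} \mu(s,a)\, \bigl| Q_{\theta}(s,a)
      - (\mathcal{T}_{\tau} Q_{\theta})(s,a) \bigr|^{p} .
```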
Read more
Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem
Shihong Huang, Shengjie Wang, Lei Gao, Hong Ma, Zhanluo Zhang, Feng Zhang, Weihua Zhou
Reinforcement Learning Optimization
  • Introduces a unified DRL framework for solving HFVRP and its variants.
  • Develops the Vehicle-as-Prompt mechanism for efficient decision-making.
  • Achieves superior performance compared to state-of-the-art DRL methods and traditional heuristics.
  • Demonstrates strong zero-shot generalization across diverse problem settings.
Read more
ReLU Networks for Exact Generation of Similar Graphs
Mamoona Ghafoor, Tatsuya Akutsu
Generative Models Graph Learning Theory
  • Introduces ReLU networks for exact graph generation within specified edit distances.
  • Eliminates reliance on training data, ensuring validity of generated graphs.
  • Demonstrates scalability with successful generation of graphs with up to 1400 vertices.
  • Outperforms existing models like GraphRNN and GraphGDP in meeting edit distance constraints.
Read more
Graph Topology Information Enhanced Heterogeneous Graph Representation Learning
He Zhao, Zhiwei Zeng, Yongwei Wang, Chunyan Miao
Graph Learning
  • ToGRL improves the quality of graph structures for heterogeneous graphs by utilizing task-relevant topology information.
  • The two-stage GSL approach separates adjacency matrix optimization from node representation learning, reducing memory usage.
  • ToGRL incorporates prompt tuning to enhance the adaptability of learned representations for downstream tasks.
  • Extensive experiments show ToGRL outperforms existing methods on five real-world datasets.
Read more