gistml

By James Asher

Daily summaries of the latest Machine Learning research papers from arXiv.

2026-02-06 • Found 24 papers

A Causal Perspective for Enhancing Jailbreak Attack and Defense

Licheng Pan, Yunsheng Lu, Jiexi Liu, Jialing Tao, Haozhe Feng, Hui Xue, Zhixuan Chu, Kui Ren
  • The paper introduces Causal Analyst, a framework combining LLM-based prompt encoding and GNN-based causal graph learning to uncover causal relationships in jailbreak prompts.
  • A dataset of 35,000 jailbreak attempts across seven LLMs is constructed, annotated with 37 interpretable prompt features.
  • Key causal features, such as 'Positive Character' and 'Number of Task Steps,' are identified as primary drivers of jailbreak success.
  • The framework is applied to develop a Jailbreaking Enhancer that improves attack success rates and a Guardrail Advisor that detects malicious intent in obfuscated queries.
  • The study highlights the advantages of a causal perspective over traditional correlation-based approaches for understanding and mitigating jailbreak vulnerabilities.
Abstract
This paper introduces a novel causal framework, called Causal Analyst, to analyze and enhance the understanding of jailbreak vulnerabilities in large language models (LLMs). Jailbreaks refer to crafted prompts that bypass LLM safety mechanisms, leading to harmful or malicious outputs. The authors propose a data-driven causal discovery approach to identify the direct causal relationships between interpretable prompt features and jailbreak occurrences. They construct a comprehensive dataset of 35,000 jailbreak attempts across seven LLMs, annotated with 37 human-readable prompt features. By integrating LLM-based prompt encoding with graph neural network (GNN)-based causal graph learning, the framework identifies key causal drivers of jailbreaks, such as 'Positive Character' and 'Number of Task Steps.' The insights are applied in two practical tools: a Jailbreaking Enhancer that improves attack success rates and a Guardrail Advisor that extracts malicious intent from obfuscated queries. The study demonstrates that a causal perspective provides a more interpretable and effective approach to improving LLM safety and reliability.
Methodology
The authors use a combination of LLM-based prompt encoding and GNN-based causal graph learning to reconstruct causal pathways between prompt features and jailbreak outcomes. They systematically generate a dataset of 35,000 jailbreak attempts using 100 attack templates and 50 harmful queries, annotated with 37 human-readable features. The causal discovery framework identifies direct causal drivers of jailbreaks, which are then used to enhance both attack and defense mechanisms.
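For concreteness, below is a minimal sketch of the dataset-construction step only: crossing attack templates with harmful queries and annotating each attempt with binary prompt features. The LLM-based annotator is stubbed out and the feature names are hypothetical; the paper's GNN-based causal graph learner is not reproduced here.

```python
from itertools import product

import numpy as np

# Illustrative feature names only; the paper defines 37 such features.
FEATURES = ["positive_character", "num_task_steps_gt3"]

def annotate_features(prompt: str) -> np.ndarray:
    """Stub for the LLM-based encoder mapping a prompt to 0/1 features."""
    return np.array([int("act as" in prompt.lower()),
                     int(prompt.lower().count("step") > 3)])

def build_dataset(templates, queries, target_llm):
    """Cross templates x queries, annotate, and record jailbreak outcomes."""
    rows, labels = [], []
    for template, query in product(templates, queries):
        prompt = template.format(query=query)   # assumes a {query} slot
        rows.append(annotate_features(prompt))
        labels.append(int(target_llm(prompt)))  # 1 if the attempt succeeded
    return np.stack(rows), np.array(labels)     # inputs to causal discovery
```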
Results
The study identifies specific prompt features, such as 'Positive Character' and 'Number of Task Steps,' as direct causal drivers of jailbreaks. The Jailbreaking Enhancer, leveraging these features, significantly improves attack success rates on public benchmarks. The Guardrail Advisor, based on the learned causal graph, effectively extracts true malicious intent from obfuscated queries. The causal framework outperforms non-causal approaches in robustness and interpretability.
Implications
The findings provide actionable insights for both attackers and defenders in the context of LLM safety. Developers can use the causal framework to design more robust guardrails and safety mechanisms, while researchers can leverage the dataset and methodology to further explore vulnerabilities in LLMs. The work also demonstrates the potential of causal analysis in improving the interpretability and reliability of AI systems.
View on arXiv

A Hybrid Data-Driven Algorithm for Real-Time Friction Force Estimation in Hydraulic Cylinders

Mohamad Amin Jamshidi, Mehrbod Zarifi, Zolfa Anvari, Hamed Ghafarirad, Mohammad Zareinejad
  • Introduces a hybrid data-driven algorithm combining LSTM networks and Random Forests for friction force estimation in hydraulic cylinders.
  • Achieves high accuracy with a model error of less than 10% across varying operating conditions and external loads.
  • Demonstrates computational efficiency with an estimation time of 1.51 milliseconds, enabling real-time applications.
  • Outperforms traditional analytical models like the LuGre model in adaptability and precision under dynamic conditions.
  • Validated through experimental data and direct comparisons with existing models, highlighting its robustness and reliability.
Abstract
This paper presents a novel hybrid data-driven algorithm for real-time friction force estimation in hydraulic cylinders, addressing the limitations of traditional analytical models. Hydraulic systems are widely used in industries due to their high force generation and precision, but their performance is significantly influenced by nonlinear friction forces. Existing analytical models, such as the LuGre model, struggle to adapt to varying operating conditions and exhibit limited computational efficiency. To overcome these challenges, the authors propose a hybrid approach that combines Long Short-Term Memory (LSTM) networks and Random Forests. The algorithm leverages experimental data from a hydraulic test setup to train the model, enabling accurate and robust friction force estimation under diverse operating conditions. The proposed method achieves a model error of less than 10% and a computational cost of 1.51 milliseconds per estimation, making it suitable for real-time applications. The paper validates the algorithm's performance through experimental comparisons with the LuGre model, demonstrating superior adaptability and precision in dynamic environments.
Methodology
The proposed hybrid algorithm integrates LSTM networks for capturing temporal dependencies in friction dynamics and Random Forests for feature detection and estimation. Training data was collected from an experimental hydraulic test setup under varying operating conditions, including changes in load, speed, and environmental factors. The model was trained to estimate nonlinear friction forces with high precision and computational efficiency.
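The summary leaves the exact wiring open; below is a minimal sketch of one plausible arrangement, in which the LSTM condenses a window of cylinder signals into a feature vector and a Random Forest regresses the friction force from those features. The signal count, dimensions, and placeholder data are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

class WindowEncoder(nn.Module):
    """LSTM that summarizes a window of sensor signals into one vector."""
    def __init__(self, n_signals=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_signals, hidden, batch_first=True)

    def forward(self, x):                 # x: (batch, time, n_signals)
        _, (h, _) = self.lstm(x)
        return h[-1]                      # (batch, hidden) temporal features

encoder = WindowEncoder()                  # in practice, trained beforehand
forest = RandomForestRegressor(n_estimators=200)

windows = torch.randn(512, 100, 3)         # placeholder measured signals
friction = torch.randn(512)                # placeholder target forces
with torch.no_grad():
    feats = encoder(windows).numpy()
forest.fit(feats, friction.numpy())        # RF maps temporal features -> force
estimate = forest.predict(feats[:1])       # single fast inference call
```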
Results
The hybrid algorithm achieved a consistent model error of less than 10% across diverse operating conditions, demonstrating robust performance. The computational cost was measured at 1.51 milliseconds per estimation, making it suitable for real-time applications. Experimental comparisons with the LuGre model showed that the proposed method outperformed the analytical model in terms of adaptability and accuracy under dynamic conditions.
Implications
This research has significant implications for industries relying on hydraulic systems, such as manufacturing, construction, and aerospace. The proposed algorithm enables real-time friction force estimation, improving the precision, efficiency, and reliability of hydraulic actuators. It also provides a scalable and adaptable solution for other mechanical systems where friction modeling is critical, potentially advancing control systems in robotics, automotive engineering, and fluid power applications.
View on arXiv

A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms

Feng Zhu, Robert W. Heath Jr., Aritra Mitra
  • The paper introduces a unified proof framework for SAG, SAGA, and IAG, simplifying their convergence analysis.
  • A novel Lyapunov function is designed to handle delays caused by stochastic sub-sampling, enabling high-probability linear convergence bounds for SAG and SAGA.
  • The analysis significantly improves the best-known convergence rates for the IAG algorithm.
  • The framework is modular and extendable to non-convex objectives and Markov sampling scenarios.
  • The work addresses gaps in prior literature, particularly the lack of high-probability bounds for SAG and SAGA.
Abstract
This paper presents a unified and simplified convergence analysis for three prominent optimization algorithms: Stochastic Average Gradient (SAG), SAGA, and Incremental Aggregated Gradient (IAG). These algorithms are widely used in large-scale machine learning for solving finite-sum optimization problems with smooth and strongly convex objective functions. The authors address the disparate and complex nature of existing analyses for these algorithms, particularly the challenging proof for SAG, by introducing a novel, modular framework. Their approach involves two key steps: bounding delays caused by stochastic sub-sampling using concentration tools, and designing a new Lyapunov function to account for these delays. This unified framework not only simplifies the analysis but also provides high-probability convergence bounds for SAG and SAGA, which were previously unavailable. Additionally, the authors achieve improved convergence rates for IAG, surpassing prior results. The proposed framework is extendable to non-convex objectives and Markov sampling, making it broadly applicable.
Methodology
The authors develop a unified analysis framework based on two main steps: (1) using concentration tools to bound delays caused by stochastic sub-sampling, and (2) constructing a novel Lyapunov function that incorporates stale gradient information. This framework is applied to analyze the convergence of SAG, SAGA, and IAG, leading to high-probability bounds and improved convergence rates.
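For reference, here is the SAGA update the analysis covers, in its textbook form on a least-squares finite sum (this is the algorithm being analyzed, not the paper's Lyapunov machinery): keep a table of the last gradient seen for each component and step along the variance-reduced direction.

```python
import numpy as np

def saga_least_squares(A, b, lr=0.01, epochs=50, seed=0):
    """SAGA on f(x) = (1/n) sum_i 0.5 * (a_i . x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.array([A[i] * (A[i] @ x - b[i]) for i in range(n)])  # stored grads
    avg = table.mean(axis=0)
    for _ in range(n * epochs):
        i = rng.integers(n)
        g_new = A[i] * (A[i] @ x - b[i])        # fresh component gradient
        x -= lr * (g_new - table[i] + avg)      # variance-reduced step
        avg += (g_new - table[i]) / n           # keep the running mean in sync
        table[i] = g_new
    return x

A = np.random.default_rng(1).normal(size=(200, 5))
b = A @ np.ones(5)
print(saga_least_squares(A, b))                 # approaches the all-ones solution
```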
Results
The proposed framework provides high-probability linear convergence bounds for SAG and SAGA, filling a gap in the literature. Additionally, the analysis improves upon the best previously known convergence rates for the IAG algorithm. The framework is also shown to be extendable to non-convex objectives and Markov sampling.
Implications
This work simplifies the theoretical understanding of variance-reduced optimization algorithms, making their analysis more accessible and modular. The high-probability bounds and improved convergence rates have practical implications for large-scale machine learning tasks, particularly in scenarios involving non-convex objectives or non-IID sampling. The unified framework could also inspire further research into other optimization algorithms and their convergence properties.
View on arXiv

A Simple Reduction Scheme for Constrained Contextual Bandits with Adversarial Contexts via Regression

Dhruv Sarkar, Abhishek Sinha
  • Introduces a reduction-based framework for constrained contextual bandits with adversarial contexts, extending the SquareCB framework.
  • Leverages online regression oracles to estimate mean reward and cost functions, enabling adaptive surrogate objective construction.
  • Achieves improved performance guarantees for regret and cumulative constraint violation (CCV) under various feasibility assumptions.
  • Provides a unified and modular analysis approach using a single key inequality.
  • Addresses practical challenges in dynamic environments, such as distributional shifts and adaptively chosen contexts.
Abstract
This paper addresses the problem of constrained contextual bandits (CCB) in adversarial settings, where contexts are not assumed to follow a stochastic or stationary distribution. The authors propose a novel reduction-based algorithmic framework that extends the SquareCB framework to handle long-term constraints. By leveraging an online regression oracle under the realizability assumption, the algorithm constructs surrogate reward functions that adaptively balance exploration, exploitation, and constraint satisfaction. The proposed method achieves improved regret and cumulative constraint violation (CCV) bounds compared to prior work, even under adversarial contexts. The analysis is streamlined through a single key inequality, offering a modular and flexible approach to constrained decision-making in dynamic environments.
Methodology
The authors build on the SquareCB framework, utilizing online regression oracles to estimate reward and cost functions under the realizability assumption. These estimates are used to define surrogate reward functions, which are optimized using an inverse-gap-weighting (IGW) policy with adaptive learning rates. The algorithm balances exploration, exploitation, and constraint satisfaction through a regret decomposition scheme tailored for long-term constraints.
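A minimal sketch of the inverse-gap-weighting distribution at the heart of SquareCB-style methods appears below. Folding the estimated cost into a Lagrangian-style surrogate (reward minus λ times cost) is our illustrative reading of the surrogate; the paper's exact construction and adaptive learning rates are not reproduced.

```python
import numpy as np

def igw_policy(reward_hat, cost_hat, lam=1.0, gamma=50.0):
    """Inverse-gap weighting over a finite action set."""
    surrogate = reward_hat - lam * cost_hat      # per-action surrogate scores
    k = len(surrogate)
    best = int(np.argmax(surrogate))
    gaps = surrogate[best] - surrogate           # non-negative gaps to the leader
    p = 1.0 / (k + gamma * gaps)                 # small probability for weak arms
    p[best] = 0.0
    p[best] = 1.0 - p.sum()                      # remaining mass on the leader
    return p

p = igw_policy(np.array([0.9, 0.5, 0.2]), np.array([0.1, 0.0, 0.3]))
action = np.random.default_rng(0).choice(3, p=p)
```

Larger gamma (the learning rate) concentrates mass on the empirical leader; the regret analysis ties its schedule to the regression oracle's error.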
Results
The proposed algorithm achieves improved regret and cumulative constraint violation (CCV) bounds compared to state-of-the-art methods. Specifically, it provides guarantees under both almost sure feasibility with general costs and expected feasibility with non-negative costs. The framework is shown to be effective in adversarial settings, where contexts may exhibit distributional shifts or be adaptively chosen.
Implications
This work has significant implications for applications requiring decision-making under uncertainty with long-term constraints, such as personalized recommendation systems, resource allocation in clinical trials, and online auctions. The ability to handle adversarial contexts makes the approach robust to dynamic and competitive environments, broadening its applicability to real-world scenarios where context distributions are non-stationary or unknown.
View on arXiv

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Zhenghao Xu, Qin Lu, Changlong Yu, Tuo Zhao
  • PMD-MEAN approximates the log-partition function using the mean reward under the sampling policy, simplifying the PMD framework for LLM post-training.
  • The algorithm induces an implicit adaptive mixed KL–χ² regularization, which enhances stability and robustness, particularly in data-constrained scenarios.
  • PMD-MEAN exhibits reduced sensitivity to finite-sample errors compared to traditional methods, mitigating overfitting risks.
  • Theoretical analysis reveals that PMD-MEAN moderates convergence rates during early training phases, explaining its empirical stability.
  • Experiments on math reasoning tasks confirm PMD-MEAN's superior performance, stability, and time efficiency compared to baseline methods like GRPO.
Abstract
This paper investigates the Policy Mirror Descent (PMD) framework for reinforcement learning (RL) in the context of post-training large language models (LLMs). PMD traditionally relies on solving KL-regularized policy improvement subproblems, which require accurate estimation of the log-partition function. However, this estimation is challenging in the large action spaces of LLMs. The authors propose PMD-MEAN, a practical algorithm that approximates the log-partition term using the mean reward under the sampling policy and performs regression in log-policy space. They demonstrate that PMD-MEAN implicitly optimizes a mirror descent subproblem with an adaptive mixed KL–χ² regularizer, where the χ² term dynamically adjusts based on the mean reward. This adaptive regularization constrains large probability changes, improving stability and robustness to finite-sample errors. Experiments on math reasoning tasks show that PMD-MEAN outperforms standard methods like GRPO, offering superior stability, efficiency, and performance. The findings provide theoretical insights into the implicit regularization mechanisms of PMD-MEAN and suggest pathways for improving RL algorithms in LLM post-training.
Methodology
The authors derive a closed-form characterization of PMD-MEAN, showing its equivalence to mirror descent with an adaptive mixed KL–χ² regularizer. They analyze its convergence properties and compare it to the ideal KL-regularized PMD update. Empirical validation is conducted on math reasoning tasks, where PMD-MEAN is benchmarked against GRPO and other methods.
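Schematically, in our notation (the paper's precise statement is not restated here), the exact KL-regularized PMD step and the mean-reward substitution read:

```latex
% Notation ours; eta is the step size, r the per-action reward.
\[
\pi_{k+1}(a \mid s) \;\propto\; \pi_k(a \mid s)\, e^{\eta\, r(s,a)}
\quad\Longleftrightarrow\quad
\log \pi_{k+1} \;=\; \log \pi_k + \eta\, r - \log Z_k(s),
\]
\[
\log Z_k(s) \;=\; \log \mathbb{E}_{a \sim \pi_k(\cdot \mid s)}
  \big[\, e^{\eta\, r(s,a)} \,\big]
\;\;\approx\;\; \eta\, \mathbb{E}_{a \sim \pi_k(\cdot \mid s)}\big[ r(s,a) \big].
\]
```

Per the paper, regressing in log-policy space with this mean-reward substitution is what induces the adaptive mixed KL–χ² regularization.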
Results
PMD-MEAN achieves superior performance on math reasoning tasks, demonstrating enhanced stability and time efficiency compared to GRPO. The algorithm's implicit χ² regularization reduces sensitivity to finite-sample errors, improving robustness and mitigating overfitting risks in data-constrained settings.
Implications
The findings provide a principled framework for improving RL algorithms in LLM post-training, particularly in scenarios with limited data or large action spaces. PMD-MEAN's stability and efficiency make it a promising approach for enhancing LLMs on reasoning tasks and other agentic objectives.
View on arXiv

Assessing Electricity Demand Forecasting with Exogenous Data in Time Series Foundation Models

Wei Soon Cheong, Lian Lian Jiang, Jamie Ng Suat Ling
  • Foundation models show mixed effectiveness in electricity demand forecasting, with performance varying across models, forecasting horizons, and geographic contexts.
  • Chronos-2 achieves the best performance among foundation models in zero-shot settings, but the baseline LSTM often outperforms foundation models in stable climates like Singapore.
  • Model architecture plays a critical role in leveraging exogenous features, with designs like TTM's channel-mixing and Chronos-2's grouped attention proving effective.
  • Geographic context is crucial, as foundation models demonstrate advantages primarily in variable climates, while their benefits are less pronounced in stable environments.
  • The study underscores the need for domain-specific adaptations rather than relying on universal foundation model superiority.
Abstract
This paper evaluates the effectiveness of time-series foundation models in electricity demand forecasting, with a focus on their ability to integrate exogenous features such as weather and date-related variables. The study compares several foundation models—MOIRAI, MOMENT, TinyTimeMixers (TTM), ChronosX, and Chronos-2—against a baseline LSTM with reversible instance normalization. Experiments are conducted on electricity markets in Singapore and Australia at hourly and daily granularities. The results reveal that while foundation models like Chronos-2 show strong performance in zero-shot settings, the baseline LSTM often outperforms them in stable climates like Singapore, particularly for short-term horizons. The paper highlights the importance of model architecture (e.g., TTM's channel-mixing and Chronos-2's grouped attention) in leveraging exogenous features effectively. Additionally, geographic context significantly influences model performance, with foundation models excelling in variable climates but showing limited advantages in stable ones. These findings challenge the assumption of universal superiority of foundation models and emphasize the need for domain-specific adaptations in the energy sector.
Methodology
The authors conducted a systematic evaluation of five time-series foundation models (MOIRAI, MOMENT, TinyTimeMixers, ChronosX, and Chronos-2) against a baseline LSTM with reversible instance normalization. The models were tested on electricity demand data from Singapore and Australian markets at hourly and daily granularities. Three feature configurations were used: all features, selected features, and target-only. The study assessed the models' ability to leverage exogenous features and model cross-channel correlations.
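For the baseline, here is a minimal sketch of reversible instance normalization in its standard form: normalize each series by its own statistics with a learnable affine, run the forecaster, then invert the transform on the output. Everything beyond this wrapper (the LSTM itself, horizons, features) is an assumption.

```python
import torch
import torch.nn as nn

class RevIN(nn.Module):
    """Reversible instance normalization: per-series stats + learnable affine."""
    def __init__(self, n_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(n_channels))
        self.beta = nn.Parameter(torch.zeros(n_channels))

    def normalize(self, x):               # x: (batch, time, channels)
        self.mean = x.mean(dim=1, keepdim=True)
        self.std = x.std(dim=1, keepdim=True) + self.eps
        return (x - self.mean) / self.std * self.gamma + self.beta

    def denormalize(self, y):             # undo the affine, then re-scale
        return (y - self.beta) / self.gamma * self.std + self.mean

revin = RevIN(n_channels=1)
forecaster = nn.Sequential(               # toy stand-in for the LSTM baseline
    nn.Flatten(), nn.Linear(168, 24), nn.Unflatten(1, (24, 1)))
x = torch.randn(8, 168, 1)                # a week of hourly demand
y_hat = revin.denormalize(forecaster(revin.normalize(x)))
```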
Results
Chronos-2 emerged as the best-performing foundation model in zero-shot settings, particularly in variable climates. However, the baseline LSTM frequently outperformed foundation models in Singapore's stable climate, especially for short-term forecasting horizons. Architectural innovations like TTM's channel-mixing and Chronos-2's grouped attention consistently improved the integration of exogenous features. Geographic context significantly influenced model performance, with foundation models excelling in variable climates but showing limited benefits in stable ones.
Implications
The findings suggest that while time-series foundation models hold promise for electricity demand forecasting, their effectiveness is highly context-dependent. This highlights the need for domain-specific adaptations, particularly in the energy sector, where geographic and climatic factors play a significant role. The study also emphasizes the importance of architectural innovations for effectively leveraging exogenous features, which could guide future model development for energy forecasting and other time-series applications.
View on arXiv

Classification Under Local Differential Privacy with Model Reversal and Model Averaging

Caihong Qin, Yang Bai
  • The paper reinterprets private learning under LDP as a transfer learning problem, leveraging noisy data as a source domain to improve classification on unobserved clean data.
  • A novel binary feedback mechanism is introduced to estimate the utility of LDP-perturbed datasets, preserving correlation structures and improving data utility.
  • Model reversal is proposed to address underperforming classifiers by inverting their decision boundaries, effectively correcting negative transfer scenarios.
  • Model averaging combines multiple reversed classifiers, assigning weights based on their estimated utility to enhance overall performance.
  • Theoretical analysis and empirical results show substantial improvements in classification accuracy under LDP constraints.
Abstract
This paper addresses the challenge of performing classification tasks under Local Differential Privacy (LDP), a privacy-preserving framework that perturbs user data at the source to eliminate the need for a trusted curator. While LDP provides strong privacy guarantees, the noise it introduces often degrades data utility and model performance. The authors reinterpret private learning under LDP as a transfer learning problem, where the noisy data serve as the source domain and the unobserved clean data as the target. They propose three novel techniques to improve classification performance under LDP: (1) a noised binary feedback-based evaluation mechanism to estimate dataset utility, (2) model reversal to salvage underperforming classifiers by inverting their decision boundaries, and (3) model averaging to combine multiple reversed classifiers using utility-based weights. The paper provides theoretical excess risk bounds under LDP and demonstrates the effectiveness of the proposed methods through empirical evaluations on both simulated and real-world datasets, achieving significant improvements in classification accuracy.
Methodology
The authors propose three techniques tailored for LDP settings: (1) a binary feedback-based evaluation mechanism to estimate dataset utility, (2) model reversal to invert decision boundaries of underperforming classifiers, and (3) model averaging to combine reversed classifiers using utility-based weights. These methods are grounded in transfer learning principles and adapted to the unique challenges of LDP, such as noise-induced distortions and negative transfer scenarios. Theoretical excess risk bounds are derived, and the methods are validated on simulated and real-world datasets.
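A minimal sketch of the reversal and averaging steps for binary classifiers follows; using each model's estimated accuracy as its "utility" is our simplification of the paper's noised binary-feedback estimate.

```python
import numpy as np

def reverse_if_worse_than_chance(predict, utility):
    """Invert the decision boundary when estimated accuracy < 0.5."""
    if utility < 0.5:
        return (lambda X: 1 - predict(X)), 1.0 - utility
    return predict, utility

def averaged_prediction(models, utilities, X):
    """Utility-weighted vote over (possibly reversed) 0/1 classifiers."""
    preds, weights = [], []
    for predict, u in zip(models, utilities):
        p, u = reverse_if_worse_than_chance(predict, u)
        preds.append(p(X))
        weights.append(u)                      # weight by estimated utility
    weights = np.array(weights) / np.sum(weights)
    score = sum(w * p for w, p in zip(weights, preds))
    return (score >= 0.5).astype(int)          # weighted majority vote
```

Reversal is what turns a negatively transferred classifier (accuracy below chance on the clean target) from a liability into usable signal.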
Results
The proposed methods demonstrate significant improvements in classification accuracy under LDP constraints. Empirical evaluations on both simulated and real-world datasets show that the techniques effectively mitigate the utility loss caused by LDP noise. Theoretical analysis confirms that the methods reduce excess risk compared to baseline approaches.
Implications
The proposed techniques have broad implications for privacy-preserving machine learning, particularly in scenarios where sensitive user data must remain private. By improving classification performance under LDP, these methods can enable more accurate and practical deployment of machine learning models in domains such as healthcare, finance, and personalized services, where data privacy is paramount.
View on arXiv

Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

Tao Huang, Rui Wang, Xiaofei Liu, Yi Qin, Li Duan, Liping Jing
  • The paper identifies epistemic uncertainty, specifically conflict and ignorance, as key drivers of misbehaviors in LVLMs.
  • Evidential Uncertainty Quantification (EUQ) is introduced to explicitly quantify these two types of uncertainty using Evidence Theory.
  • EUQ is computationally efficient, requiring only a single forward pass, and provides fine-grained insights into model misbehaviors.
  • The method is evaluated across four categories of LVLM misbehaviors: hallucinations, jailbreaks, adversarial vulnerabilities, and OOD failures.
  • EUQ consistently outperforms strong baselines and offers interpretability through layer-wise uncertainty dynamics analysis.
Abstract
This paper addresses the challenge of detecting misbehaviors in large vision-language models (LVLMs), such as hallucinations, adversarial vulnerabilities, jailbreaks, and out-of-distribution (OOD) failures. These misbehaviors often stem from epistemic uncertainty, which arises due to conflicting internal knowledge or the absence of supporting information. Existing uncertainty quantification methods fail to effectively capture these specific sources of uncertainty. To address this, the authors propose Evidential Uncertainty Quantification (EUQ), a novel framework that explicitly quantifies two types of epistemic uncertainty: conflict (CF) and ignorance (IG). EUQ interprets model output features as evidence, which is then processed using Evidence Theory to measure internal conflict and knowledge gaps. The method is computationally efficient, requiring only a single forward pass, and provides fine-grained insights into the sources of uncertainty. Extensive experiments on state-of-the-art LVLMs demonstrate that EUQ outperforms existing baselines in detecting misbehaviors, offering a new perspective on understanding and mitigating LVLM failures.
Methodology
The authors propose EUQ, which uses Evidence Theory to quantify epistemic uncertainty in LVLMs. Model output features are interpreted as evidence, which is then decomposed into positive (supporting) and negative (opposing) components. These components are fused using Dempster's rule of combination to compute conflict (CF) and ignorance (IG). The method operates efficiently within a single forward pass and provides token-level heatmaps for interpretability.
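A minimal sketch of Dempster's rule on a two-class frame {a, b} shows where conflict (CF) and ignorance (IG) come from; how EUQ turns model output features into the input masses is not reproduced here.

```python
def dempster_combine(m1, m2):
    """m: dict over focal sets 'a', 'b', 'ab' (the full frame), summing to 1."""
    conflict = m1["a"] * m2["b"] + m1["b"] * m2["a"]   # empty intersections
    k = 1.0 - conflict                                 # normalization constant
    fused = {
        "a": (m1["a"] * m2["a"] + m1["a"] * m2["ab"] + m1["ab"] * m2["a"]) / k,
        "b": (m1["b"] * m2["b"] + m1["b"] * m2["ab"] + m1["ab"] * m2["b"]) / k,
        "ab": (m1["ab"] * m2["ab"]) / k,
    }
    return fused, conflict

fused, cf = dempster_combine({"a": 0.7, "b": 0.1, "ab": 0.2},
                             {"a": 0.2, "b": 0.6, "ab": 0.2})
ignorance = fused["ab"]       # IG: belief left unassigned (knowledge gap)
print(cf, ignorance)          # CF: disagreement between the two sources
```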
Results
EUQ demonstrates superior performance in detecting LVLM misbehaviors across four categories: hallucinations (high conflict), jailbreaks, adversarial vulnerabilities, and OOD failures (high ignorance). The method outperforms existing uncertainty quantification baselines and provides interpretable insights into the evolution of internal representations through layer-wise analysis.
Implications
EUQ has significant implications for improving the trustworthiness and safety of LVLMs in critical applications such as autonomous driving, medical diagnosis, and financial decision-making. By enabling fine-grained detection of misbehaviors, the method can enhance the reliability of LVLMs and facilitate their deployment in high-stakes domains.
View on arXiv

Enhanced QKNorm normalization for neural transformers with the Lp norm

Ezequiel Lopez-Rubio, Javier Montes-Perez, Esteban Jose Palomo
  • The paper generalizes QKNorm normalization by introducing the Lp norm, allowing for non-Euclidean norms in attention mechanisms.
  • Lp normalization provides bounded attention logits, enhancing numerical stability and robustness during training.
  • The method retains the practical benefits of QKNorm while expanding the design space for attention geometries.
  • Experimental results confirm the feasibility of the approach for stabilizing attention mechanisms in Transformers.
  • The proposed method could enable more aggressive learning rates and improved optimization in large-scale models.
Abstract
This paper introduces a generalization of the Query-Key Normalization (QKNorm) mechanism in Transformer architectures by incorporating the Lp norm instead of the traditional L2 norm. QKNorm is a technique designed to stabilize attention mechanisms by normalizing query and key vectors, thereby improving numerical stability and training efficiency. The proposed enhancement allows for the use of non-Euclidean norms, expanding the design space of attention geometries and enabling finer control over the 'spikiness' or entropy of attention distributions. The authors present the mathematical formulation of Lp normalization and demonstrate its bounded nature, which ensures stable attention logits. Experimental results on a simple problem validate the suitability of the method, showing its potential for improving Transformer training stability and performance.
Methodology
The authors extend the QKNorm mechanism by replacing the standard L2 normalization with Lp normalization, where p is a hyperparameter constrained to p ≥ 1. This involves normalizing query and key vectors using the Lp norm and computing attention logits as the dot product of these normalized vectors. The mathematical formulation ensures bounded logits, improving numerical stability. The approach is tested experimentally on a simple problem to validate its effectiveness.
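A minimal PyTorch sketch of the mechanism follows; the head shapes and the fixed logit scale are illustrative (QKNorm typically learns the scale), and torch's F.normalize accepts arbitrary p directly.

```python
import torch
import torch.nn.functional as F

def lp_qknorm_attention(q, k, v, p=3.0, scale=10.0):
    """Attention with queries/keys normalized by their Lp norms (p >= 1)."""
    # q, k, v: (batch, heads, tokens, dim)
    q_hat = F.normalize(q, p=p, dim=-1)                 # q / ||q||_p
    k_hat = F.normalize(k, p=p, dim=-1)                 # k / ||k||_p
    logits = scale * (q_hat @ k_hat.transpose(-2, -1))  # bounded logits
    return torch.softmax(logits, dim=-1) @ v

q = k = v = torch.randn(2, 4, 16, 32)
out = lp_qknorm_attention(q, k, v, p=1.5)
```

Varying p trades off how sharply the normalized dot products can peak, which is the lever the paper uses to control attention entropy.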
Results
The experimental results demonstrate that the Lp-based QKNorm generalization is effective in stabilizing attention mechanisms and maintaining bounded logits. The method shows promise for improving training stability and efficiency in Transformer architectures, though the experiments are preliminary and conducted on a simple problem.
Implications
The proposed Lp normalization could be applied to enhance the training stability and performance of large-scale Transformer models in natural language processing and other sequence-based tasks. By allowing finer control over attention distributions, this approach may lead to improved optimization dynamics and better convergence in challenging training regimes.
View on arXiv

Erase at the Core: Representation Unlearning for Machine Unlearning

Jaewon Lee, Yongwoo Kim, Donghyun Kim
  • EC addresses 'superficial forgetting' by enforcing representation-level forgetting across all layers of the network.
  • The framework combines multi-layer contrastive unlearning with deep supervision to ensure effective removal of forget set information.
  • EC achieves superior performance in both logit-based and representation-based metrics compared to existing unlearning methods.
  • The method is model-agnostic and can be integrated into other unlearning algorithms as a plug-in module.
  • Comprehensive evaluations on ImageNet-1K highlight EC's effectiveness in large-scale multi-class unlearning scenarios.
Abstract
This paper introduces Erase at the Core (EC), a novel framework for machine unlearning that addresses the issue of 'superficial forgetting' in existing methods. Superficial forgetting occurs when models achieve logit-level forgetting but retain significant information in intermediate feature representations. EC enforces forgetting across the entire network hierarchy by integrating multi-layer contrastive unlearning with deep supervision. The framework attaches auxiliary modules to intermediate layers and applies contrastive unlearning and cross-entropy losses at each layer, with progressively larger weights assigned to deeper layers. This ensures that forget set information is removed from shallow to deep layers while preserving classification utility for the retain set. Experimental evaluations on ImageNet-1K demonstrate that EC achieves superior representation-level forgetting compared to existing baselines while maintaining performance on the retain set. Additionally, EC is model-agnostic and can be incorporated into other unlearning methods as a plug-in module.
Methodology
EC employs multi-layer contrastive unlearning combined with deep supervision. Auxiliary modules are attached to intermediate layers, and contrastive unlearning objectives are applied to the forget set while cross-entropy losses preserve retain set utility. Layer-wise weighted losses ensure deeper layers receive stronger forgetting signals, propagating erasure throughout the network hierarchy.
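A schematic of the multi-layer loss wiring is sketched below; the linear layer-weighting and the cosine-repulsion term are stand-ins, since the summary does not pin down the exact contrastive objective or schedule.

```python
import torch
import torch.nn.functional as F

def ec_loss(aux_logits_retain, y_retain, forget_emb, retain_emb, lam=1.0):
    """One tensor per supervised layer, ordered shallow -> deep."""
    n_layers = len(aux_logits_retain)
    total = 0.0
    for l in range(n_layers):
        w = (l + 1) / n_layers                       # deeper layers weigh more
        ce = F.cross_entropy(aux_logits_retain[l], y_retain)
        # push forget embeddings away from retain embeddings at this layer
        sim = F.cosine_similarity(
            forget_emb[l].unsqueeze(1), retain_emb[l].unsqueeze(0), dim=-1)
        contrast = sim.mean()                        # illustrative repulsion term
        total = total + w * (ce + lam * contrast)
    return total
```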
Results
EC consistently outperforms existing unlearning baselines across logit-based and representation-based metrics. On ImageNet-1K, EC achieves greater divergence from the original model in intermediate feature representations while maintaining high accuracy on the retain set. Representation-based evaluations using metrics like Centered Kernel Alignment (CKA) and Information Difference Index (IDI) confirm its effectiveness in removing forget set information.
Implications
EC has significant implications for regulatory compliance with data protection laws like GDPR, enabling effective machine unlearning at scale. It can also be used to correct models trained on corrupted or sensitive data. As a model-agnostic framework, EC can enhance existing unlearning methods, making it a versatile tool for privacy-preserving machine learning.
View on arXiv

Escaping Local Minima Provably in Non-convex Matrix Sensing: A Deterministic Framework via Simulated Lifting

Tianqi Shen, Jinji Yang, Junze He, Kunhan Gao, Ziye Ma
  • Introduces a deterministic framework, Simulated Oracle Direction (SOD), for escaping spurious local minima in non-convex optimization.
  • Simulates over-parameterized escape directions without the computational cost of actual tensor lifting.
  • Provides theoretical guarantees for escaping local minima without relying on randomness or heuristic methods.
  • Demonstrates the framework's effectiveness in low-rank matrix sensing through numerical experiments.
  • Highlights potential applications of the framework in other non-convex optimization problems.
Abstract
This paper addresses the challenge of escaping spurious local minima in non-convex optimization, particularly in the context of low-rank matrix sensing. The authors propose a novel deterministic framework called Simulated Oracle Direction (SOD) escape, which leverages insights from over-parameterization without incurring the computational costs typically associated with tensor lifting. The framework simulates the optimization landscape of an over-parameterized space and projects escape directions back into the original parameter space, ensuring a strict decrease in the objective value. Unlike existing methods that rely on random perturbations or heuristic rules, the proposed approach is theoretically grounded and guarantees escape from local minima. Numerical experiments demonstrate the effectiveness of the SOD framework in reliably escaping spurious solutions and converging to global optima with minimal computational overhead. The authors argue that this framework has broader implications for tackling non-convex optimization problems beyond matrix sensing.
Methodology
The authors develop the Simulated Oracle Direction (SOD) escape mechanism, which simulates the optimization landscape of an over-parameterized space to identify escape directions. These directions are then projected back into the original parameter space to ensure a strict decrease in the objective value. The approach is applied to the structured matrix sensing problem, where the goal is to recover a low-rank positive semidefinite matrix from linear measurements. Theoretical analysis is provided to characterize the conditions under which the projected directions are effective.
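In our notation, the structured problem being solved is the Burer–Monteiro form of low-rank PSD matrix sensing:

```latex
\[
\min_{U \in \mathbb{R}^{n \times r}} \;
  f(U) \;=\; \tfrac{1}{2} \sum_{i=1}^{m}
  \big( \langle A_i,\, U U^{\top} \rangle - b_i \big)^2 .
\]
```

Spurious local minima of f can stop being critical points once the parameterization is lifted (e.g., to a higher-order tensor space); SOD computes the escape direction in a simulated version of that lifted landscape and projects it back onto U, which is what yields strict descent without ever forming the lifted problem.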
Results
The proposed SOD framework successfully escapes spurious local minima and converges to global optima in numerical experiments on low-rank matrix sensing problems. The method achieves this with minimal computational cost compared to explicit tensor over-parameterization. The results validate the theoretical guarantees and demonstrate the practical utility of the approach.
Implications
The SOD framework has significant implications for non-convex optimization, particularly in scenarios where spurious local minima hinder convergence. By leveraging simulated over-parameterization, the method provides a computationally efficient alternative to traditional over-parameterization techniques. This approach could be extended to other non-convex problems in machine learning, signal processing, and control theory, potentially improving optimization performance in large-scale systems.
View on arXiv

Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates

Chengxiao Wang, Haoze Wu, Gagandeep Singh
  • The paper introduces robust neural Lyapunov-barrier certificates that ensure safety and stability under bounded perturbations in system dynamics.
  • Sufficient conditions for robustness are derived using Lipschitz continuity, and practical training objectives are proposed to enforce these conditions.
  • The proposed methods include adversarial training, local neighborhood bounds, and global Lipschitz regularization.
  • The approach is validated in two case studies, showing up to 4.6x improvement in certified robustness bounds and up to 2.4x improvement in empirical success rates under perturbations.
  • This work provides deterministic guarantees for neural certificates, addressing a significant gap in safe RL for real-world applications.
Abstract
This paper addresses the challenge of synthesizing robust neural Lyapunov-barrier certificates for ensuring safety and stability in deep reinforcement learning (RL) controllers under system dynamics perturbations. Existing methods for neural certificates provide guarantees only for ideal, unperturbed dynamics, which limits their applicability in real-world scenarios with uncertainties. The authors formally define a robust Lyapunov-barrier certificate and propose sufficient conditions based on Lipschitz continuity to ensure robustness against bounded perturbations. They develop a training framework incorporating adversarial training, local neighborhood bounds, and global Lipschitz regularization to enforce these conditions. The approach is validated in two environments: the Inverted Pendulum and 2D Docking, demonstrating significant improvements in certified robustness bounds (up to 4.6x) and empirical success rates under strong perturbations (up to 2.4x) compared to baseline methods. This work bridges a critical gap in the literature by providing deterministic guarantees for neural certificates under norm-bounded perturbations, enabling safer deployment of RL controllers in safety-critical applications.
Methodology
The authors define a robust Lyapunov-barrier certificate and derive sufficient conditions for robustness based on Lipschitz continuity. They propose a training framework that enforces these conditions through adversarial training, local neighborhood bounds, and global Lipschitz regularization. The framework is evaluated in two environments: Inverted Pendulum and 2D Docking, using both theoretical analysis and empirical experiments.
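One ingredient, adversarial training against sampled dynamics perturbations, might look like the following schematic; `V`, `dynamics`, the norm bound, and the margin are illustrative, and the paper's full certificate conditions and verifier are not reproduced.

```python
import torch

def robust_decrease_loss(V, dynamics, x, eps=0.05, n_samples=8, margin=0.01):
    """Penalize the certificate failing to decrease under bounded dynamics noise."""
    loss = 0.0
    for _ in range(n_samples):
        delta = eps * torch.randn_like(x)        # sampled perturbation
        delta = delta.clamp(-eps, eps)           # enforce ||delta||_inf <= eps
        x_next = dynamics(x) + delta             # perturbed next state
        # hinge: want V(x_next) <= V(x) - margin for every sampled delta
        loss = loss + torch.relu(V(x_next) - V(x) + margin).mean()
    return loss / n_samples
```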
Results
The proposed methods improve certified robustness bounds by up to 4.6 times and empirical success rates under strong perturbations by up to 2.4 times compared to baseline methods. These results demonstrate the effectiveness of the approach in ensuring safety and stability in RL controllers under dynamic uncertainties.
Implications
This work has significant implications for deploying RL controllers in safety-critical domains such as autonomous driving, robotics, and aerospace systems. By providing deterministic guarantees for safety and stability under perturbations, the proposed methods enhance the reliability and robustness of RL-based control systems in real-world scenarios.
View on arXiv

Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting

Hongyi Li, Han Lin, Jun Xu
  • HRT reformulates oblique regression tree splitting as a nonlinear least squares optimization problem using hinge-based max/min envelopes.
  • The optimization process is equivalent to a damped Newton (Gauss–Newton) method, ensuring fast and stable convergence.
  • HRT is proven to be a universal approximator with an explicit O(δ²) approximation rate.
  • The algorithm supports optional ridge regularization to improve robustness under multicollinearity.
  • Empirical results show that HRT achieves competitive performance with more compact tree structures compared to traditional single-tree baselines.
Abstract
This paper introduces the Hinge Regression Tree (HRT), a novel algorithm for oblique regression tree splitting that reframes the node-splitting problem as a nonlinear least squares optimization task. By leveraging two linear predictors and a hinge-based max/min envelope, HRT achieves ReLU-like expressive power. The optimization process is interpreted as a damped Newton (Gauss–Newton) method, ensuring fast and stable convergence. The authors provide theoretical guarantees, including monotonic decrease and convergence of the node-level objective, as well as universal approximation capabilities with an explicit O(δ²) rate. Empirical results on synthetic and real-world datasets demonstrate that HRT outperforms traditional single-tree baselines while maintaining more compact tree structures. This work advances the field of oblique regression trees by integrating optimization theory with regression modeling, offering a practical and theoretically sound approach to nonlinear function approximation.
Methodology
The Hinge Regression Tree (HRT) models each node split as a nonlinear least squares optimization problem involving two linear predictors. The hinge-based max/min envelope introduces ReLU-like expressive power. The optimization process is interpreted as a damped Newton (Gauss–Newton) method, with optional ridge regularization to handle multicollinearity. A backtracking line-search variant ensures monotonic decrease and convergence of the node-level objective.
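Below is a minimal sketch of fitting the max-envelope of two linear predictors at a node: with the active predictor fixed per sample, a Gauss–Newton step reduces to separate least-squares fits on the two active sets, so iterating assignment and refit gives the undamped flavor of the scheme (line search and ridge terms omitted).

```python
import numpy as np

def fit_hinge_max(X, y, n_iter=20, seed=0):
    """Fit yhat = max(X w1, X w2) by alternating assignment and least squares."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w1, w2 = rng.normal(size=d), rng.normal(size=d)
    for _ in range(n_iter):
        active1 = X @ w1 >= X @ w2            # which predictor wins per sample
        for which, mask in enumerate((active1, ~active1)):
            if mask.sum() >= d:               # refit each winner on its samples
                w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
                if which == 0:
                    w1 = w
                else:
                    w2 = w
    return w1, w2, np.maximum(X @ w1, X @ w2)

X = np.c_[np.ones(200), np.linspace(-2, 2, 200)]
y = np.maximum(0.0, X[:, 1])                   # a ReLU-shaped target
w1, w2, yhat = fit_hinge_max(X, y)
```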
Results
HRT achieves competitive or superior performance compared to single-tree baselines on synthetic and real-world datasets. It produces more compact tree structures while maintaining high predictive accuracy. Theoretical analysis confirms its universal approximation capabilities and stable convergence properties.
Implications
HRT provides a theoretically grounded and efficient approach for oblique regression tree splitting, making it a valuable tool for nonlinear function approximation in machine learning. Its compact tree structures and robust performance make it suitable for applications requiring interpretable models with high predictive power, such as healthcare, finance, and automated decision-making systems.
View on arXiv

Laplacian Representations for Decision-Time Planning

Dikshant Shehmar, Matthew Schlegel, Matthew E. Taylor, Marlos C. Machado
  • Laplacian representations provide a latent space that captures state-space distances at multiple time scales, making them suitable for long-horizon planning.
  • The proposed ALPS algorithm uses Laplacian representations for hierarchical decision-time planning, enabling subgoal discovery and distance estimation.
  • Laplacian representations naturally decompose environments into well-connected regions, facilitating efficient planning.
  • ALPS mitigates compounding errors in long-horizon tasks by leveraging the temporal structure of the environment.
  • Empirical results show that ALPS outperforms model-free RL baselines on goal-conditioned tasks from the OGBench benchmark.
Abstract
This paper addresses the challenge of decision-time planning in model-based reinforcement learning (RL), particularly in environments requiring function approximation. The authors propose using Laplacian representations as an effective latent space for planning, as these representations capture state-space distances across multiple time scales. By preserving meaningful distances and decomposing long-horizon tasks into subgoals, Laplacian representations mitigate compounding errors that typically arise in long-term predictions. Building on this, the authors introduce a novel hierarchical planning algorithm called Augmented Laplacian Planning with Subgoals (ALPS). ALPS leverages the Laplacian representation to identify subgoals and estimate distances, enabling effective decision-time planning. Empirical evaluations on offline goal-conditioned RL tasks from the OGBench benchmark demonstrate that ALPS outperforms commonly used model-free RL baselines, showcasing its potential for improving planning in complex environments.
Methodology
The authors leverage the Laplacian representation, which embeds states into a latent space defined by the eigenvectors of the graph Laplacian induced by the environment's dynamics. This representation captures temporal and spatial structure, enabling subgoal discovery and distance estimation. They implement these ideas in the ALPS algorithm, a hierarchical decision-time planning method that uses Laplacian representations to decompose tasks into subgoals. The algorithm is evaluated on offline goal-conditioned RL tasks using the OGBench benchmark.
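For intuition, here is the representation itself computed exactly on a toy chain graph (ALPS learns it with function approximation instead): embed each state by the first non-trivial eigenvectors of the graph Laplacian.

```python
import numpy as np

def laplacian_embedding(adjacency, k=3):
    deg = np.diag(adjacency.sum(axis=1))
    lap = deg - adjacency                      # combinatorial graph Laplacian
    vals, vecs = np.linalg.eigh(lap)           # eigenvalues in ascending order
    return vecs[:, 1:k + 1]                    # skip the constant eigenvector

# 1D chain of 6 states as a toy "environment"
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
emb = laplacian_embedding(adj, k=2)
# Euclidean distance in `emb` tracks diffusion distance between states,
# which is what subgoal discovery and distance estimates build on.
dist = np.linalg.norm(emb[0] - emb[5])
```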
Results
The ALPS algorithm demonstrated superior performance compared to commonly used model-free RL baselines on a set of goal-conditioned tasks from the OGBench benchmark. The results highlight the effectiveness of Laplacian representations in supporting hierarchical planning and mitigating compounding errors in long-horizon tasks.
Implications
The proposed approach has significant implications for improving decision-time planning in model-based RL, particularly in complex, long-horizon tasks. By leveraging Laplacian representations, the method can enhance sample efficiency, generalization, and adaptability in RL systems. This work could be applied to domains such as robotics, autonomous navigation, and other areas requiring efficient long-term planning.
View on arXiv

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

Yu-Ang Lee, Ching-Yun Ko, Pin-Yu Chen, Mi-Yen Yeh
  • Vanilla LoRA achieves comparable performance to advanced LoRA variants when learning rates are properly tuned.
  • Reported improvements in advanced LoRA methods often stem from insufficient hyperparameter tuning in prior studies.
  • Optimal learning rate ranges vary significantly across LoRA methods due to differences in the largest Hessian eigenvalues.
  • Performance parity is observed across diverse tasks, models, and LoRA ranks, with only minor rank-dependent variations.
  • The study highlights the need for rigorous hyperparameter tuning in parameter-efficient fine-tuning research.
Abstract
This paper investigates the effectiveness of vanilla Low-Rank Adaptation (LoRA) for fine-tuning large language models (LLMs) compared to advanced LoRA variants. While recent studies have proposed modifications to LoRA that claim significant performance improvements, these claims are often based on limited hyperparameter tuning, particularly for learning rates. The authors conduct a systematic re-evaluation of vanilla LoRA and four advanced variants through extensive hyperparameter searches across mathematical reasoning and code generation tasks on models of varying scales. Their findings reveal that, when learning rates are properly tuned, all methods achieve similar peak performance (within 1–2% of each other), challenging the notion that advanced LoRA variants offer consistent advantages. The paper also explores the theoretical underpinnings of these results, linking the optimal learning rate ranges to differences in the largest Hessian eigenvalues of the methods. Overall, the study underscores the importance of comprehensive hyperparameter tuning and positions vanilla LoRA as a competitive baseline for parameter-efficient fine-tuning.
Methodology
The authors benchmarked vanilla LoRA against four advanced LoRA variants using extensive hyperparameter searches, particularly focusing on learning rates. They evaluated performance across mathematical reasoning and code generation tasks on models of varying scales. Additionally, they conducted a second-order analysis to investigate the relationship between learning rate sensitivity and the largest Hessian eigenvalues of the methods.
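For reference, vanilla LoRA in its standard form, which the paper argues suffices once the learning rate is tuned: freeze the base weight and learn a rank-r update (alpha/r) * B A, with A Gaussian-initialized and B zero-initialized so training starts at the base model. Rank, alpha, and dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))
# Per the paper, sweeping the learning rate for this vanilla form matches
# the peak performance of more elaborate variants within 1-2%.
```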
Results
The study found that all LoRA methods, including vanilla LoRA, achieve similar peak performance (within 1–2%) when learning rates are properly tuned. Different methods exhibit distinct optimal learning rate ranges, with some advanced variants requiring lower learning rates due to larger Hessian eigenvalues. Performance parity was consistent across tasks, models, and LoRA ranks, with only minor rank-dependent differences.
Implications
This work challenges the perceived superiority of advanced LoRA variants and reaffirms vanilla LoRA as a strong baseline for parameter-efficient fine-tuning. It emphasizes the critical role of hyperparameter tuning, particularly learning rates, in achieving optimal performance. These findings could influence future research on fine-tuning methods and encourage more rigorous evaluation protocols in the field.
View on arXiv

Learning, Solving and Optimizing PDEs with TensorGalerkin: an efficient high-performance Galerkin assembly algorithm

Shizheng Wen, Mingyuan Chi, Tianwei Yu, Ben Moseley, Mike Yan Michelis, Pu Ren, Hao Sun, Siddhartha Mishra
  • TensorGalerkin introduces a tensorized Map-Reduce paradigm for efficient Galerkin assembly, addressing bottlenecks in FEM and PIML frameworks.
  • The framework eliminates Python interpreter overhead and GPU underutilization by fusing element-wise operations into monolithic GPU kernels.
  • TensorGalerkin supports three applications: TensorMesh (numerical PDE solver), TensorPils (physics-informed operator learning), and TensorOpt (PDE-constrained optimization).
  • Benchmarks on unstructured meshes for various PDE types show significant gains in computational efficiency and accuracy compared to traditional methods.
  • The framework avoids reliance on automatic differentiation for spatial derivatives, leveraging analytical shape gradients for improved performance.
Abstract
This paper introduces TensorGalerkin, a novel high-performance framework for solving, learning, and optimizing partial differential equations (PDEs) with variational structures. The framework is built on a Galerkin discretization approach and addresses computational inefficiencies in traditional finite element methods (FEM) and physics-informed machine learning (PIML). TensorGalerkin reformulates the assembly of stiffness matrices and load vectors as a tensorized Map-Reduce operation, enabling efficient GPU utilization and minimizing Python-level overhead. The framework is integrated into three key applications: TensorMesh, a GPU-optimized FEM solver; TensorPils, a physics-informed learning framework for PDE solution operators; and TensorOpt, an end-to-end differentiable pipeline for PDE-constrained optimization. Benchmarks on 2D and 3D elliptic, parabolic, and hyperbolic PDEs demonstrate significant computational efficiency and accuracy improvements over existing methods, making TensorGalerkin a promising tool for numerical PDE solving, operator learning, and optimization tasks.
Methodology
TensorGalerkin employs a Galerkin discretization approach reformulated as a tensorized Map-Reduce operation. The Map stage performs dense tensor contractions to compute local stiffness matrices and load vectors, while the Reduce stage handles domain topology using precomputed routing and sparse matrix multiplication. This approach minimizes Python overhead and maximizes GPU efficiency. The framework is implemented in PyTorch and supports applications in numerical PDE solving, physics-informed learning, and PDE-constrained optimization.
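Our illustration of the Map-Reduce paradigm (not the library's API) for a scalar Poisson-type problem: one fused einsum computes every local stiffness matrix, and precomputed element-to-DOF routing scatters them into a global sparse matrix.

```python
import torch

def assemble_stiffness(grads, detJ, weights, dofs, n_dofs):
    # grads: (E, Q, N, D) physical shape-function gradients per element and
    # quadrature point; detJ: (E, Q); weights: (Q,); dofs: (E, N) long tensor
    # Map: K_e[a, b] = sum_q w_q |J| grad_a . grad_b  -- one fused einsum
    K_local = torch.einsum("q,eq,eqad,eqbd->eab", weights, detJ, grads, grads)
    # Reduce: scatter every local entry to its (row, col) via the routing
    n_loc = dofs.shape[1]
    rows = dofs.unsqueeze(2).expand(-1, -1, n_loc).reshape(-1)
    cols = dofs.unsqueeze(1).expand(-1, n_loc, -1).reshape(-1)
    K = torch.sparse_coo_tensor(
        torch.stack([rows, cols]), K_local.reshape(-1), (n_dofs, n_dofs))
    return K.coalesce()                     # duplicate entries are summed
```

Because the Map stage is a single dense contraction and the Reduce stage a single coalesce, no Python-level loop over elements survives, which is the point of the paradigm.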
Results
TensorGalerkin achieves significant computational efficiency and accuracy improvements across multiple benchmarks, including 2D and 3D elliptic, parabolic, and hyperbolic PDEs on unstructured meshes. The TensorMesh solver outperforms legacy CPU-based FEM solvers, while TensorPils and TensorOpt demonstrate robust performance in physics-informed learning and optimization tasks, respectively. The framework's GPU compatibility and avoidance of automatic differentiation overhead contribute to its superior performance.
Implications
TensorGalerkin has broad implications for scientific computing, enabling faster and more accurate solutions to PDEs in fields such as physics, engineering, and computational design. Its applications in PDE-constrained optimization and physics-informed learning could accelerate advancements in inverse design, uncertainty quantification, and operator learning, particularly in scenarios with limited training data or complex domain geometries.
View on arXiv

Mining Generalizable Activation Functions

Alex Vitvitskyi, Michael Boratko, Matej Grcic, Razvan Pascanu, Deep Shah, Petar Veličković
  • The paper introduces an evolutionary search framework, AlphaEvolve, to discover novel activation functions by exploring an unbounded search space of Python functions.
  • The search process is guided by frontier language models (LLMs), which generate meaningful activation function proposals based on common knowledge and optional expert guidance.
  • The fitness function explicitly targets out-of-domain (OOD) validation loss, aiming to improve generalization to unseen data.
  • The proposed method demonstrates that small-scale synthetic datasets can be used effectively to discover activation functions that generalize well to more complex tasks.
  • Empirical results show that the discovered activation functions achieve better OOD generalization compared to standard activation functions, without sacrificing in-domain performance.
Abstract
This paper explores the discovery of novel activation functions for neural networks using an evolutionary search framework called AlphaEvolve. The authors argue that activation functions significantly influence both optimization and the inductive biases of neural networks, impacting their ability to generalize to out-of-distribution (OOD) data. Unlike prior approaches that rely on predefined sets of functions, the proposed method leverages frontier language models (LLMs) to explore an unbounded search space of Python functions. The fitness function used in the evolutionary search explicitly targets OOD validation loss, aiming to discover activation functions that enhance generalization without sacrificing in-domain performance. The authors demonstrate that small-scale synthetic datasets are sufficient for this search process, enabling efficient discovery of activation functions that generalize well to more complex tasks. Empirical results show that the discovered activation functions outperform standard alternatives like ReLU and GELU in terms of OOD generalization while maintaining competitive in-domain performance.
Methodology
The authors employ an evolutionary search framework, AlphaEvolve, which uses frontier language models (e.g., Gemini) to propose activation functions in an unbounded search space of Python programs. The fitness function is based on OOD validation loss, and the search is conducted on small-scale synthetic datasets to enable rapid iteration. The discovered activation functions are then evaluated for their generalization performance on more complex tasks.
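A minimal sketch of the fitness evaluation inside such a search: train a small MLP on an in-domain synthetic split with a candidate activation and score it by OOD validation loss. The LLM proposal step and the evolutionary loop are stubbed out, and the data-generating function is hypothetical.

```python
import torch
import torch.nn as nn

def fitness(activation: nn.Module, seed=0) -> float:
    torch.manual_seed(seed)
    target = lambda x: torch.sin(3 * x)               # hypothetical ground truth
    x_in = torch.rand(256, 1) * 2 - 1                 # in-domain: [-1, 1]
    x_ood = torch.rand(256, 1) * 2 + 1                # shifted: [1, 3]
    net = nn.Sequential(nn.Linear(1, 32), activation, nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(500):
        loss = ((net(x_in) - target(x_in)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                             # lower is fitter
        return ((net(x_ood) - target(x_ood)) ** 2).mean().item()

print(fitness(nn.ReLU()), fitness(nn.GELU()))         # compare two candidates
```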
Results
The discovered activation functions outperform standard activation functions like ReLU and GELU in terms of out-of-domain generalization while maintaining competitive in-domain performance. This demonstrates the effectiveness of targeting OOD validation loss in the search process.
Implications
The proposed approach has the potential to improve the robustness and generalization of neural networks, particularly in scenarios involving distribution shifts or unseen data. This could benefit applications in fields like autonomous systems, medical diagnostics, and any domain where generalization to novel conditions is critical. Additionally, the use of LLMs for activation function discovery highlights a novel intersection between natural language processing and neural architecture search.
View on arXiv

Near-Optimal Dynamic Matching via Coarsening with Application to Heart Transplantation

Itai Zilberstein, Ioannis Anagnostides, Zachary W. Sollie, Arman Kilic, Tuomas Sandholm
  • The paper introduces a coarsening-based online matching algorithm that aggregates offline nodes into capacitated clusters, enabling near-optimal performance.
  • The approach is applied to heart transplantation, leveraging structural properties in historical UNOS data to optimize donor-recipient matching.
  • The algorithm achieves a competitive ratio of 0.91 in simulations, outperforming the current US heart transplant allocation policy (0.51) and stochastic matching methods (0.63).
  • The framework connects organ allocation to the b-matching problem in Internet advertising, adapting algorithmic techniques to the medical domain.
  • This work provides rigorous theoretical justification for clustering-based approaches in organ allocation and demonstrates strong practical performance.
Read More
Abstract
This paper introduces a novel online matching algorithm based on a coarsening approach, which aggregates offline nodes into capacitated clusters to achieve near-optimal theoretical guarantees. The authors apply this framework to heart transplantation, a critical domain where decisions must be made dynamically and irrevocably due to the limited viability of donor organs. By analyzing historical data from the United Network for Organ Sharing (UNOS), the authors identify structural properties in the data that enable the use of coarsening to optimize the allocation process. The proposed algorithm achieves a competitive ratio close to 1, meaning it performs nearly as well as an omniscient policy with perfect foresight. Simulations on real-world data demonstrate that the algorithm significantly outperforms both the current US heart transplant allocation policy and other baseline methods. This work bridges the gap between theoretical guarantees and practical performance in high-stakes applications like organ allocation.
Methodology
The authors propose a coarsening-based online matching framework where offline nodes (e.g., patients) are aggregated into capacitated clusters based on structural properties of historical data. They analyze the UNOS heart transplant dataset to identify clusters of patients with similar expected life years gained, enabling the algorithm to bypass worst-case lower bounds in online matching. The competitive ratio, which measures performance relative to an omniscient policy, is used as the primary metric. The algorithm is evaluated through simulations on real-world data, comparing its performance to existing policies and baseline methods.
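A minimal sketch of the coarsening idea follows, under the assumption that offline nodes can be bucketed by expected life-years gained; the paper's exact clustering and matching rules differ, and `value` is a hypothetical donor-to-cluster compatibility score.

```python
from collections import defaultdict

def coarsen(patients, bucket_width):
    """Aggregate offline nodes (patients) into capacitated clusters by
    rounding their expected life-years gained to a common bucket."""
    clusters = defaultdict(list)
    for patient_id, life_years in patients:
        clusters[round(life_years / bucket_width)].append(patient_id)
    return dict(clusters)  # a cluster's capacity = the patients it holds

def match_online(clusters, donors, value):
    """Greedy online phase: each arriving donor is irrevocably assigned to
    the best-valued cluster with remaining capacity. `value(donor, key)`
    is a hypothetical donor-to-cluster compatibility score."""
    assignment = {}
    for donor in donors:
        open_keys = [k for k, members in clusters.items() if members]
        if not open_keys:
            break
        best = max(open_keys, key=lambda k: value(donor, k))
        assignment[donor] = clusters[best].pop()  # consume one unit of capacity
    return assignment
```

Aggregating similar patients into one capacitated cluster is what lets the online algorithm sidestep worst-case lower bounds: a donor heart only needs to be routed to the right cluster, not to one specific patient.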
Results
The proposed algorithm achieves a competitive ratio of 0.91, indicating near-optimal performance compared to an omniscient policy. This is a significant improvement over the current US heart transplant allocation policy (competitive ratio of 0.51) and stochastic matching methods (0.63). The results demonstrate that the coarsening-based approach effectively balances immediate and future matching opportunities, optimizing the allocation of scarce donor hearts.
Implications
The proposed framework has significant implications for organ allocation, particularly in improving the efficiency and fairness of heart transplantation policies. By providing a theoretically grounded and practically effective algorithm, this work could inform future revisions of organ allocation systems, potentially saving more lives. Additionally, the connection to the b-matching problem suggests broader applicability of the method to other dynamic resource allocation problems, such as kidney transplantation or even non-medical domains like online advertising.
View on arXiv

Parity, Sensitivity, and Transformers

Alexander Kozachinskiy, Tomasz Steifer, Przemysław Wałȩga
  • The paper proves that a 1-layer, single-head transformer cannot compute the PARITY function due to limitations in average sensitivity.
  • A new 4-layer transformer construction is introduced, which computes PARITY using soft attention and length-independent, polynomially bounded positional encodings.
  • The proposed construction works for both full-attention and causal masking architectures, addressing limitations of prior approaches.
  • The study highlights the importance of average sensitivity as a tool for analyzing transformer expressivity.
  • The results contribute to the broader understanding of transformer capabilities and limitations in computing sensitive Boolean functions.
Read More
Abstract
This paper investigates the expressivity of transformer architectures, focusing on their ability to compute the PARITY function, a fundamental Boolean function that assigns 0 to binary words with an even number of ones and 1 otherwise. The authors address limitations in prior constructions, which required at least two layers and often relied on impractical features such as length-dependent positional encodings, hard attention, or non-standard layer normalization. The paper makes two key contributions: (1) a proof that no 1-layer, single-head transformer can compute PARITY due to constraints on average sensitivity, and (2) a novel 4-layer transformer construction that computes PARITY using soft attention, length-independent and polynomially bounded positional encodings, and no layer normalization. This construction works for both full-attention and causal masking settings, making it more practical for real-world implementations. The study advances understanding of transformer expressivity and provides insights into architectural design for tasks requiring high sensitivity.
Methodology
The authors use theoretical analysis to derive expressivity limitations of 1-layer, single-head transformers by leveraging the concept of average sensitivity. They also propose a new transformer architecture for computing PARITY, ensuring practical design choices such as soft attention, length-independent positional encodings, and compatibility with causal masking. The construction is validated through mathematical proofs and comparisons with prior work.
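For reference, the central quantity is the average sensitivity of a Boolean function, i.e., the expected number of single-bit flips that change the output; the definition below is standard (our notation, not necessarily the paper's).

```latex
% Average sensitivity of f : {0,1}^n -> {0,1}
\mathrm{as}(f) \;=\; \frac{1}{2^n} \sum_{x \in \{0,1\}^n}
  \bigl|\{\, i \in [n] : f(x) \neq f(x \oplus e_i) \,\}\bigr|
% Flipping any bit changes the parity, so as(PARITY_n) = n, which exceeds
% the O(sqrt(n)) bound proved for 1-layer, single-head transformers.
```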
Results
The paper establishes that no 1-layer, single-head transformer can compute the PARITY function: such models have average sensitivity O(√n), whereas PARITY's average sensitivity equals the input length n. Additionally, the authors present a 4-layer transformer capable of computing PARITY with practical design features, including soft attention, length-independent and polynomially bounded positional encodings, and no layer normalization.
Implications
The findings provide insights into the expressivity limits of transformers, particularly for tasks requiring high sensitivity, such as PARITY. The proposed 4-layer construction offers a more practical approach for implementing transformers in real-world scenarios, potentially improving their performance on tasks with similar characteristics. This work also informs future research on transformer architecture design and theoretical analysis of neural network expressivity.
View on arXiv

Path-Guided Flow Matching for Dataset Distillation

Xuhui Li, Zhengquan Luo, Xiwei Liu, Yongqiang Yu, Zhiqiang Xu
  • PGFM is the first flow-matching-based framework for dataset distillation, offering a more efficient and stable alternative to diffusion-based methods.
  • The method uses a pretrained VAE and performs flow matching in latent space to synthesize class-conditional data distributions.
  • A novel path-to-prototype guidance algorithm ensures reliable trajectory control, balancing diversity and efficiency during data synthesis.
  • PGFM is 7.6× more efficient than diffusion-based methods and achieves 78% mode coverage, while matching or surpassing their performance.
  • The framework is particularly effective in low IPC (images per class) settings and high-resolution benchmarks like ImageNet100.
Read More
Abstract
This paper introduces Path-Guided Flow Matching (PGFM), a novel framework for dataset distillation that leverages flow matching as an efficient alternative to diffusion-based generative models. Dataset distillation aims to compress large datasets into compact synthetic datasets while maintaining comparable performance for training machine learning models. PGFM addresses limitations of diffusion-based methods, such as high computational cost, trajectory instability, and reliance on heuristic guidance. By conducting flow matching in the latent space of a pretrained Variational Autoencoder (VAE), PGFM learns class-conditional transport from Gaussian noise to data distributions. A key innovation is the introduction of a continuous path-to-prototype guidance algorithm, which ensures reliable trajectory control while preserving diversity and efficiency. Experimental results demonstrate that PGFM achieves competitive or superior performance compared to state-of-the-art diffusion-based methods, with significantly improved efficiency and mode coverage. For example, PGFM is 7.6× more efficient than diffusion-based approaches and achieves 78% mode coverage on challenging benchmarks like ImageNet100.
Methodology
PGFM employs flow matching in the latent space of a pretrained VAE to generate synthetic datasets. It introduces a path-to-prototype guidance algorithm to control trajectories, ensuring that samples align with class prototypes while maintaining diversity. The method avoids heavy heuristics and uses early stopping and trust-region constraints to limit over-regularization. Flow matching is used to deterministically solve an ordinary differential equation (ODE) for efficient and stable data synthesis.
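The guidance step can be pictured as Euler integration of the learned velocity field with an extra pull toward the class prototype. The velocity-model interface, guidance weight, and the clamp used as a trust-region stand-in below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def sample_guided(v_model, prototype, steps=50, guidance=0.5, trust_radius=0.1):
    """Sketch of path-to-prototype guided flow-matching sampling in a VAE
    latent space. `v_model(z, t)` is an assumed learned velocity field;
    `prototype` is an assumed class-prototype latent of shape (B, D)."""
    z = torch.randn_like(prototype)          # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((z.shape[0],), i * dt)
        v = v_model(z, t)                    # flow-matching velocity (ODE drift)
        pull = prototype - z                 # direction toward the class prototype
        # trust-region-style cap: guidance may correct but not dominate the flow
        pull = pull.clamp(-trust_radius, trust_radius)
        z = z + dt * (v + guidance * pull)   # deterministic Euler ODE step
    return z                                 # decode with the VAE decoder afterwards
```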
Results
PGFM outperforms state-of-the-art diffusion-based methods like MGD3 and MinimaxDiff in terms of accuracy and efficiency. On ImageNet100 with IPC=10, PGFM achieves higher accuracy and 7.6× faster synthesis compared to diffusion-based methods. It also achieves 78% mode coverage, indicating better diversity in the distilled datasets. The method demonstrates consistent improvements across various benchmarks and evaluation backbones.
Implications
PGFM has significant implications for efficient dataset distillation, enabling the creation of compact, high-quality synthetic datasets for training machine learning models. Its efficiency and stability make it particularly valuable for resource-constrained scenarios, such as edge computing or low-budget training. Additionally, its ability to maintain diversity and class-discriminative features could benefit applications in data augmentation, transfer learning, and few-shot learning.
View on arXiv

Position: Capability Control Should be a Separate Goal From Alignment

Shoaib Ahmed Siddiqui, Eleni Triantafillou, David Krueger, Adrian Weller
  • Capability control should be treated as a separate goal from alignment, focusing on operational boundaries rather than context-driven preferences.
  • Capability control mechanisms are categorized into three layers: data-based, learning-based, and system-based interventions.
  • A defense-in-depth approach is necessary to address the limitations of individual layers and ensure robust control across the model lifecycle.
  • Challenges include the probabilistic nature of learning paradigms, dual-use knowledge, and adversarial circumvention of guardrails.
  • Effective capability control has significant implications for minimizing risks in real-world deployments of foundation models.
Read More
Abstract
This position paper argues that capability control—defining and enforcing operational boundaries on permissible model behaviors—should be treated as a distinct goal from alignment in the development and deployment of foundation models. While alignment focuses on ensuring models adhere to human values and preferences, capability control aims to impose hard limits on model behaviors, including under adversarial conditions. The authors propose a defense-in-depth approach to capability control, organizing interventions across three layers of the model lifecycle: (i) data-based control, which shapes the training distribution; (ii) learning-based control, which modifies model weights or representations; and (iii) system-based control, which enforces guardrails during deployment. They highlight the limitations of each layer when used in isolation and advocate for complementary measures across the stack. The paper also identifies key challenges, such as the dual-use nature of knowledge, compositional generalization, and adversarial circumvention, and emphasizes the need for further research to address these issues.
Methodology
The authors propose a layered framework for capability control across the model lifecycle, including interventions at the data level (e.g., filtering or curating training data), learning level (e.g., fine-tuning, reinforcement learning, adversarial training), and system level (e.g., input/output filters, information-flow policies). They advocate for combining these layers to create a defense-in-depth strategy and discuss practical limitations and challenges associated with each approach.
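As a rough illustration of the system-based layer alone, deployment-time guardrails can be composed so that no single check is a point of failure; the filter predicates here are placeholders, not proposed implementations.

```python
# Defense-in-depth sketch of the system-based layer only. Each filter is an
# independent predicate (e.g., a malicious-intent classifier on inputs, an
# information-flow policy on outputs); all are hypothetical placeholders.

def guarded_generate(model, prompt, input_filters=(), output_filters=()):
    for check in input_filters:
        if not check(prompt):
            return "[blocked at input layer]"
    response = model(prompt)
    for check in output_filters:
        if not check(response):
            return "[blocked at output layer]"
    return response
```

Data-based and learning-based controls would act before this point, on the training corpus and on the weights themselves, which is why the authors argue the layers are complementary rather than interchangeable.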
Results
As a position paper, the authors do not present experimental results but provide a conceptual framework and analysis of capability control mechanisms. They argue that no single layer is sufficient in isolation and emphasize the importance of integrating complementary controls to address failure modes effectively.
Implications
The proposed framework for capability control has significant implications for improving the safety and reliability of foundation models in real-world applications. By treating capability control as a distinct goal, developers can better mitigate risks such as malicious misuse, unintended behaviors, and adversarial exploitation. This approach is particularly relevant for high-stakes domains like autonomous systems, healthcare, and cybersecurity.
View on arXiv

Projected Boosting with Fairness Constraints: Quantifying the Cost of Fair Training Distributions

Amir Asiaee, Kaveh Aryan
  • FAIRPROJ introduces fairness constraints into boosting by projecting training distributions onto a convex set satisfying fairness requirements.
  • The algorithm quantifies the cost of fairness in terms of a divergence penalty, reducing the effective edge of weak learners.
  • Theoretical guarantees are preserved, with an exponential loss bound that incorporates the fairness cost term.
  • Experiments demonstrate competitive tradeoffs between fairness and accuracy, validating the theoretical analysis.
  • FAIRPROJ influences weak learner selection but does not directly enforce fairness in the final classifier, leaving this as an empirical question.
Read More
Abstract
This paper introduces FAIRPROJ, a novel boosting algorithm that incorporates group fairness constraints while preserving the theoretical guarantees of AdaBoost. FAIRPROJ modifies the training dynamics by projecting the ensemble-induced exponential-weights distribution onto a convex set of distributions satisfying fairness constraints. The projection acts as a reweighting surrogate for fairness, influencing the selection of weak learners without directly guaranteeing fairness of the final classifier. The authors provide a theoretical analysis quantifying the tradeoff between accuracy and fairness, showing that the effective edge of weak learners is reduced by a term proportional to the KL-divergence between the original and projected distributions. They derive an exponential loss bound that incorporates this fairness cost, demonstrating that FAIRPROJ retains AdaBoost-like convergence guarantees with explicit fairness penalties. Experiments on standard benchmarks validate the theoretical predictions, showcasing competitive fairness-accuracy tradeoffs and stable training dynamics.
Methodology
FAIRPROJ modifies AdaBoost by projecting the ensemble-induced exponential-weights distribution onto a convex set of fairness-constrained distributions. Weak learners are trained on the projected distribution, while the boosting coefficient is computed using the original distribution. The fairness cost is quantified using the KL-divergence between the original and projected distributions, and theoretical bounds are derived to analyze the impact on convergence rates.
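A minimal sketch of one boosting round follows, assuming labels in {-1, +1}, an equal-group-mass constraint as the convex fairness set, and a hypothetical `fit_stump` weak-learner trainer; the paper's constraint set and projection are more general.

```python
import numpy as np

def project_group_mass(w, groups, targets):
    """I-projection (minimum KL divergence to w) onto the set of
    distributions whose total mass per group matches `targets`.
    For per-group mass constraints this reduces to rescaling each group."""
    q = w.copy()
    for g, target in targets.items():
        mask = (groups == g)
        q[mask] *= target / q[mask].sum()
    return q

def fairproj_round(X, y, groups, w, fit_stump):
    """One FAIRPROJ-style round (illustrative): the weak learner sees the
    projected distribution, while the boosting coefficient is computed on
    the original distribution, as described in the paper."""
    groups = np.asarray(groups)
    targets = {g: 1.0 / len(set(groups)) for g in set(groups)}  # equal group mass
    q = project_group_mass(w, groups, targets)
    h = fit_stump(X, y, sample_weight=q)           # weak learner on projected dist
    pred = h.predict(X)                            # labels assumed in {-1, +1}
    err = np.sum(w * (pred != y))                  # error on original distribution
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w = w * np.exp(-alpha * y * pred)              # exponential-weights update
    return h, alpha, w / w.sum()
```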
Results
FAIRPROJ achieves exponential loss convergence with a rate dependent on the weak learner edge minus the fairness cost term. Experiments on standard benchmarks validate the theoretical predictions, demonstrating stable training dynamics and competitive fairness-accuracy tradeoffs. The algorithm successfully balances fairness constraints with accuracy without breaking AdaBoost's theoretical guarantees.
Implications
FAIRPROJ provides a framework for incorporating fairness constraints into boosting algorithms, making it suitable for high-stakes domains where fairness is critical. Its ability to quantify the tradeoff between fairness and accuracy offers valuable insights for practitioners aiming to balance these objectives. The approach could be extended to other machine learning paradigms or fairness definitions, contributing to the broader field of fairness-aware machine learning.
View on arXiv

Rewards as Labels: Revisiting RLVR from a Classification Perspective

Zepeng Zhai, Meilin Chen, Jiaxuan Zhao, Junlang Qian, Lei Shen, Yuan Lu
  • Identifies two fundamental issues in GRPO-style RLVR: Gradient Misassignment in Positives and Gradient Domination in Negatives.
  • Proposes the REAL framework, which reformulates policy optimization as a classification task using categorical labels.
  • Introduces anchor logits to enhance policy learning and ensure balanced gradient allocation.
  • Demonstrates improved training stability and performance gains over GRPO and strong variants like DAPO and GSPO across mathematical reasoning benchmarks.
  • Shows REAL's robustness and stability even with a simple binary cross-entropy loss.
Read More
Abstract
This paper introduces the Rewards as Labels (REAL) framework, a novel approach to Reinforcement Learning with Verifiable Rewards (RLVR) that reformulates policy optimization as a classification problem. RLVR methods, such as GRPO and its variants, have shown strong empirical performance in improving large language models for reasoning tasks. However, the authors identify two key issues in GRPO-style methods: Gradient Misassignment in Positives and Gradient Domination in Negatives, which lead to inefficient and suboptimal policy updates. REAL addresses these issues by treating verifiable rewards as categorical labels rather than scalar weights, enabling balanced gradient allocation across rollouts. The framework introduces anchor logits to further enhance policy learning and ensures monotonic and bounded gradient magnitudes. Extensive experiments on mathematical reasoning benchmarks demonstrate that REAL improves training stability and consistently outperforms GRPO and its variants, achieving significant performance gains across model scales.
Methodology
The REAL framework reconceptualizes verifiable rewards as categorical labels, reformulating policy optimization as a classification problem. It uses anchor logits to regulate gradient allocation and ensure balanced updates. Theoretical analysis confirms that REAL induces monotonic and bounded gradient magnitudes, mitigating gradient mismatches. Experiments are conducted on mathematical reasoning benchmarks and large-scale language models to evaluate performance and stability.
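The reformulation, in its plain binary cross-entropy ablation, can be sketched in a few lines; treating a baseline-shifted sequence log-probability as the classification logit is our simplification, and the full method's anchor logits are omitted.

```python
import torch
import torch.nn.functional as F

def real_bce_loss(seq_logprobs, rewards):
    """Rewards-as-labels sketch with vanilla BCE (the paper's ablation).
    `seq_logprobs`: per-rollout sequence log-probabilities under the policy;
    `rewards`:      verifiable 0/1 outcomes, used as class labels rather
                    than scalar advantage weights. Using the shifted
                    log-probability as a logit is an illustrative assumption."""
    logits = seq_logprobs - seq_logprobs.mean().detach()  # crude baseline shift
    # Correct rollouts are pushed up and incorrect ones down, with per-sample
    # gradient magnitude bounded by the sigmoid, so no single negative rollout
    # can dominate the update.
    return F.binary_cross_entropy_with_logits(logits, rewards.float())
```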
Results
REAL improves average Pass@1 scores by 6.7% over DAPO on a 1.5B model and by 6.2% and 1.7% over DAPO and GSPO, respectively, on a 7B model. It demonstrates enhanced training stability and outperforms strong baselines even without explicit KL penalties. Using a vanilla binary cross-entropy loss, REAL achieves a 4.5% improvement over DAPO on average.
Implications
The REAL framework has the potential to improve policy optimization in RLVR settings, particularly for tasks requiring rule-based evaluation, such as mathematical reasoning and program synthesis. Its stability and efficiency could enable more robust training of large language models, paving the way for advancements in complex reasoning systems and applications in education, automated theorem proving, and code generation.
View on arXiv

StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation

Heajun An, Qi Zhang, Minqian Liu, Xinyi Zhang, Sang Won Lee, Lifu Huang, Pamela J. Wisniewski, Jin-Hee Cho
  • StagePilot introduces a novel stage-controlled dialogue simulation framework for cybergrooming prevention training, using offline reinforcement learning.
  • The agent employs adjacent-stage transition constraints to ensure realistic and interpretable conversational dynamics.
  • A composite reward function integrates victim sentiment signals with stage-distance rewards, balancing emotional coherence and strategic progression.
  • Evaluation shows that StagePilot achieves up to 43% higher final-stage reachability, 70% sentiment alignment, and 50% shorter dialogues compared to baseline methods.
  • The system provides a safer, controlled alternative to real-world predator interactions for educating teenagers about online grooming risks.
Read More
Abstract
StagePilot is a deep reinforcement learning (DRL)-based dialogue agent designed to simulate the stage-wise progression of cybergrooming behaviors for educational and preventive training. The system reframes dialogue simulation as a stage-level planning problem, focusing on abstract interaction stages rather than turn-by-turn text generation. This approach enables interpretable, long-horizon control of dialogue flow while maintaining emotional coherence and behavioral realism. StagePilot incorporates adjacent-stage transition constraints to ensure plausible conversational dynamics and employs a composite reward function that balances emotional engagement with strategic progression toward the final stage. The system is evaluated using large language model (LLM)-based simulations, demonstrating its ability to generate realistic and coherent conversations that align with real-world grooming dynamics. Among tested methods, the IQL+AWAC agent achieves the best performance, significantly improving stage reachability, sentiment alignment, and dialogue efficiency compared to baselines.
Methodology
StagePilot uses offline reinforcement learning to train a dialogue agent that operates over abstract interaction stages. The agent's policy is guided by a composite reward function combining sentiment analysis and stage-distance metrics. Adjacent-stage transition constraints are applied to ensure realistic and interpretable dialogue progression. Large language models (LLMs) are used as stochastic environment dynamics to simulate conversational responses. The system is evaluated using automated simulations with two chatbots (predator and victim) to measure stage completion, dialogue efficiency, and emotional engagement.
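The two distinctive pieces, the adjacency constraint and the composite reward, are simple to state; the stage count, weights, and sentiment scale below are illustrative assumptions.

```python
N_STAGES = 6  # hypothetical number of grooming stages

def allowed_next_stages(current_stage):
    """Adjacent-stage transition constraint: the agent may stay, advance
    one stage, or fall back one stage, keeping trajectories interpretable."""
    return [s for s in (current_stage - 1, current_stage, current_stage + 1)
            if 0 <= s < N_STAGES]

def composite_reward(next_stage, victim_sentiment, w_stage=1.0, w_sent=0.5):
    """Stage-distance term rewards progress toward the final stage; the
    sentiment term (victim_sentiment in [-1, 1], from an assumed sentiment
    model) rewards emotional coherence, so the agent cannot rush ahead at
    the cost of engagement."""
    stage_distance = (N_STAGES - 1) - next_stage  # 0 at the final stage
    return -w_stage * stage_distance + w_sent * victim_sentiment
```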
Results
StagePilot demonstrates significant improvements in simulating realistic and coherent grooming conversations. The IQL+AWAC agent achieves up to 43% higher final-stage reachability compared to baseline methods, maintains over 70% sentiment alignment, and reduces dialogue length by 50%. These results highlight the system's ability to balance emotional coherence with strategic dialogue planning.
Implications
StagePilot has the potential to serve as a critical tool for educating teenagers about online grooming risks in a safe and controlled environment. By simulating realistic predator-victim interactions, it can enhance awareness, improve recognition of grooming tactics, and promote safer online behaviors. The framework could also be extended to other safety-critical conversational domains, such as fraud prevention or mental health support.
View on arXiv