AI-generated summaries
Today's ML research,
without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
65 papers today · Updated every 8 hours · 7 days of history
Step-level Denoising-time Diffusion Alignment with Multiple Objectives
Reinforcement Learning
Generative Models
Optimization
- Introduces a retraining-free framework for aligning diffusion models with multiple objectives.
- Develops a step-level RL formulation that requires no reward gradients and eliminates approximation errors.
- Derives a closed-form solution for the optimal reverse denoising distribution based on single-objective models.
- Demonstrates superior performance compared to existing denoising-time approaches through extensive experiments.
Summary
This paper addresses the challenge of aligning diffusion models with human preferences in reinforcement learning (RL) settings, particularly when multiple objectives must be balanced. Traditional methods often optimize a single reward function, which does not reflect the pluralistic nature of human preferences. The authors propose a novel framework called Multi-objective Step-level Denoising-time Diffusion Alignment (MSDDA), which allows for the alignment of diffusion models without retraining and without requiring access to individual reward functions. By introducing a step-level RL formulation, the authors derive a closed-form solution for the optimal reverse denoising distribution, expressed in terms of single-objective base models. The method is shown to be equivalent to existing RL fine-tuning approaches, eliminating approximation errors. Extensive experiments using the Stable Diffusion model demonstrate that MSDDA outperforms existing denoising-time methods, providing a more efficient and accurate way to align models with multiple objectives.
Methodology
The authors propose a step-level RL fine-tuning approach that allows for the alignment of diffusion models with multiple objectives without retraining. They derive the Multi-objective Step-level Denoising-time Diffusion Alignment (MSDDA) framework, which computes the optimal reverse denoising distribution in closed form, using mean and variance derived from single-objective models. This approach does not require access to reward functions or gradients, thus avoiding approximation errors.
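The paper derives its own closed form for the fused reverse denoising distribution; as a hedged illustration of the general idea, one natural way to combine per-objective Gaussian denoising steps is a precision-weighted product of Gaussians. The combination rule below is an assumption for illustration, not the paper's exact formula:

```python
# Illustrative sketch (not the paper's derivation): fuse single-objective Gaussian
# reverse-denoising steps N(mu_i, var_i) via a precision-weighted product.

def fuse_gaussian_steps(mus, variances, weights):
    """Combine single-objective Gaussian denoising steps into one fused step.

    The fused precision is the weighted sum of per-objective precisions, and the
    fused mean is the precision-weighted average of the per-objective means.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "objective weights should sum to 1"
    fused_precision = sum(w / v for w, v in zip(weights, variances))
    fused_mean = sum(w * m / v for w, m, v in zip(weights, mus, variances)) / fused_precision
    return fused_mean, 1.0 / fused_precision

# Two single-objective models proposing different denoised means:
mean, var = fuse_gaussian_steps(mus=[0.0, 2.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(mean, var)  # equal weights and variances -> midpoint mean
```

With more objectives the same rule applies term by term, which is what makes a training-free, closed-form combination attractive.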
Results
The experimental results indicate that the proposed MSDDA method outperforms existing denoising-time alignment techniques, demonstrating its effectiveness in aligning diffusion models with multiple objectives while maintaining computational efficiency.
Implications
The findings suggest that MSDDA can be applied in various domains where diffusion models are used, particularly in tasks requiring alignment with complex human preferences, such as text-to-image generation and other generative modeling tasks.
FedIDM: Achieving Fast and Stable Convergence in Byzantine Federated Learning through Iterative Distribution Matching
Federated Learning
- FedIDM leverages iterative distribution matching for robust and efficient convergence in Byzantine FL.
- The framework minimizes the impact on model utility even with a high proportion of colluded malicious clients.
- Empirical evaluations show substantial improvements over existing Byzantine-robust methods.
- The attack-tolerant condensed data generation effectively counters label-flipping attacks.
Summary
The paper presents FedIDM, a novel framework designed to enhance the robustness and convergence speed of Byzantine federated learning (FL). Existing Byzantine-robust methods often struggle with slow convergence and compromised model utility, especially when facing colluded malicious clients. FedIDM addresses these challenges through two main components: attack-tolerant condensed data generation and robust aggregation with negative contribution-based rejection. The framework employs iterative distribution matching to create trustworthy condensed datasets that help identify and filter out abnormal client updates. By utilizing a contrastive learning approach combined with a Gaussian Mixture Model (GMM), FedIDM mitigates label-flipping attacks during the condensed data generation process. The method is empirically validated against multiple state-of-the-art Byzantine attacks across three benchmark datasets, demonstrating significant improvements in convergence speed and model utility compared to existing defenses.
Methodology
FedIDM is structured into two stages: the first stage involves generating condensed datasets using iterative distribution matching, which captures essential information for model training. The second stage focuses on adjusting local updates based on historical data and evaluating them against the condensed dataset. A robust aggregation method is employed, which rejects updates that deviate significantly from the expected direction or cause substantial loss on the condensed dataset.
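The rejection rule in the second stage can be sketched as two checks per client update: direction agreement with a reference update and non-increase of loss on the condensed dataset. The thresholds and function shapes below are illustrative assumptions, not FedIDM's exact criteria:

```python
# Hypothetical sketch of negative-contribution rejection: drop an update if it
# points against the reference direction or raises loss on the condensed data.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def robust_aggregate(updates, reference, loss_delta, cos_threshold=0.0):
    """Average only the client updates that pass both checks.

    updates: list of client update vectors
    reference: server update direction estimated on the condensed dataset
    loss_delta: per-client change in condensed-data loss if the update is applied
    """
    kept = [u for u, d in zip(updates, loss_delta)
            if cosine(u, reference) > cos_threshold and d <= 0.0]
    n = len(kept)
    return [sum(col) / n for col in zip(*kept)]

updates = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]   # third client is adversarial
agg = robust_aggregate(updates, reference=[1.0, 0.0], loss_delta=[-0.1, -0.2, -0.5])
print(agg)  # the adversarial update is rejected by the direction check
```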
Results
The results indicate that FedIDM achieves faster and more stable convergence compared to traditional Byzantine-robust methods, maintaining acceptable model utility even in scenarios with a significant number of malicious clients. The empirical evaluations across three benchmark datasets demonstrate its effectiveness against various state-of-the-art Byzantine attacks.
Implications
The findings suggest that FedIDM can be effectively applied in real-world federated learning scenarios where data privacy is critical and the threat of Byzantine attacks is high. This framework can enhance the reliability of federated learning systems in various applications, including healthcare, finance, and distributed AI.
Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection
Theory
- Model evaluation is often reduced to a few aggregate metrics, risking misleading conclusions.
- Common pitfalls in evaluation include data leakage, class imbalance, and inappropriate metric selection.
- Evaluation should be treated as a decision-oriented and context-dependent process.
- The paper emphasizes the importance of aligning evaluation methods with operational objectives.
Summary
This paper addresses the critical aspect of evaluating supervised machine learning models, emphasizing the need for rigorous assessment methods that reflect real-world performance. The authors argue that despite the availability of automated workflows, model evaluation often relies on a limited set of aggregate metrics, which can lead to misleading conclusions. The paper explores the principles and challenges of evaluating algorithms in both classification and regression tasks, highlighting how factors such as dataset characteristics, validation design, class imbalance, and asymmetric error costs influence evaluation outcomes. Through controlled experiments with diverse benchmark datasets, the authors identify common pitfalls, including the accuracy paradox, data leakage, and inappropriate metric selection. They advocate for a decision-oriented and context-sensitive approach to model evaluation, which aligns with the operational objectives of the task. The work provides a structured foundation for selecting metrics and validation protocols that enhance the reliability and trustworthiness of supervised machine learning systems.
Methodology
The authors conducted controlled experimental scenarios using diverse benchmark datasets to analyze the strengths and limitations of various evaluation strategies, including hold-out validation and cross-validation. They examined the impact of dataset characteristics and task-specific objectives on the behavior of different performance metrics.
Results
The study revealed that relying on a single metric can lead to misleading conclusions, particularly in cases of class imbalance and asymmetric error costs. It highlighted the sensitivity of regression measures to outliers and distribution shifts, and underscored the importance of rigorous evaluation protocols to ensure model reliability and trustworthiness.
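The accuracy paradox the paper highlights is easy to reproduce: on a skewed class distribution, a degenerate majority-class predictor scores well on accuracy while being useless on the minority class. A minimal, self-contained illustration:

```python
# Toy illustration of the accuracy paradox on a 95:5 imbalanced binary task:
# a classifier that always predicts the majority class scores 95% accuracy
# but has zero recall on the minority class.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0

y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100          # degenerate majority-class predictor

print(accuracy(y_true, y_pred))  # 0.95 -- looks strong
print(recall(y_true, y_pred))    # 0.0  -- never detects the minority class
```

This is exactly why the paper argues for metric selection driven by the operational cost of each error type rather than a single aggregate number.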
Implications
The findings suggest that practitioners should adopt more comprehensive and context-aware evaluation methods to improve the reliability of machine learning models in real-world applications. This approach can help mitigate risks associated with model deployment and enhance the interpretability and robustness of predictive systems.
Gating Enables Curvature: A Geometric Expressivity Gap in Attention
NLP
Large Language Models
Theory
- Gated attention mechanisms enable non-flat geometries, enhancing representational expressivity.
- Ungated attention is limited to flat statistical manifolds due to its affine structure.
- Multiplicative gating introduces nonlinear modulation, allowing for richer representation structures.
- Empirical results indicate gated models perform better on tasks with nonlinear decision boundaries.
Summary
This paper investigates the geometric properties of attention mechanisms in neural networks, particularly focusing on the impact of multiplicative gating on the expressivity of attention layers. The authors model the outputs of attention as mean parameters of Gaussian distributions and analyze the resulting statistical manifolds using Fisher–Rao geometry. They demonstrate that ungated attention is confined to intrinsically flat statistical manifolds due to its affine structure, while multiplicative gating allows for non-flat geometries, including positively curved manifolds. This establishes a geometric expressivity gap between ungated and gated attention mechanisms. Empirical results show that gated models exhibit higher representation curvature and improved performance on tasks requiring nonlinear decision boundaries, while showing no consistent advantage on linear tasks. The paper also identifies a regime where curvature accumulates under composition, leading to a depth amplification effect, thereby enhancing the representational power of gated attention models.
Methodology
The authors employ a geometric framework to analyze attention mechanisms by modeling outputs as parameters of Gaussian distributions and studying the induced statistical manifolds under the Fisher–Rao metric. They provide theoretical proofs regarding the geometric limitations of ungated attention and the advantages of multiplicative gating, complemented by synthetic experiments to illustrate the curvature differences between gated and ungated models.
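The mechanism at issue can be sketched concretely: ungated attention outputs a convex combination of values (affine in the values), while a multiplicative gate, itself computed from the input, rescales that output and makes the map nonlinear. The gate function and shapes below are illustrative assumptions, not the paper's architecture:

```python
# Toy sketch of multiplicative gating on an attention output. The sigmoid gate
# and scalar values are assumptions chosen for readability.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query_scores, values):
    """Ungated attention: a convex combination of the values (affine in values)."""
    w = softmax(query_scores)
    return sum(wi * vi for wi, vi in zip(w, values))

def gated_attend(query_scores, values, gate_input):
    """Multiplicative gating: an input-dependent rescaling of the attention output."""
    g = 1.0 / (1.0 + math.exp(-gate_input))      # sigmoid gate computed from the input
    return g * attend(query_scores, values)

scores, values = [1.0, 2.0], [0.0, 1.0]
print(round(attend(scores, values), 3))
print(round(gated_attend(scores, values, gate_input=0.0), 3))  # gate 0.5 halves the output
```

The paper's claim is that this multiplicative, input-dependent modulation is what lets the induced statistical manifolds acquire curvature.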
Results
The study reveals that ungated attention mechanisms are restricted to flat geometries, while multiplicative gating enables the realization of positively curved statistical manifolds. This geometric expressivity gap is supported by empirical evidence showing that gated models outperform ungated ones on tasks requiring nonlinear decision boundaries. Additionally, the research identifies a structured regime where curvature can accumulate, enhancing the expressivity of deeper models.
Implications
The findings suggest that incorporating multiplicative gating in attention mechanisms can significantly improve the ability of models to capture complex, nonlinear relationships in data. This has potential applications in various domains, particularly in natural language processing and other areas requiring sophisticated representation learning.
The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery
Computer Vision
Optimization
Theory
- Identifies Gradient Entanglement as a critical issue limiting GCD performance.
- Proposes EAGC, a plug-and-play module that effectively mitigates GE.
- Includes AGA and EEP components to enhance gradient optimization.
- Achieves new state-of-the-art results in GCD across various benchmarks.
Summary
This paper addresses the challenges in Generalized Category Discovery (GCD), which aims to categorize unlabeled samples using knowledge from labeled data. The authors identify a critical issue termed 'Gradient Entanglement' (GE), which arises from the interference between supervised and unsupervised gradient objectives, leading to distorted supervised gradients and reduced separability of novel categories. To mitigate this, they propose the Energy-Aware Gradient Coordinator (EAGC), a modular approach that includes two components: Anchor-based Gradient Alignment (AGA) and Energy-aware Elastic Projection (EEP). AGA preserves the discriminative structure of known classes by aligning gradients with a reference model, while EEP projects unlabeled gradients to reduce overlap with known-class representations. The EAGC can be integrated into existing GCD frameworks without altering their architecture or training objectives. Extensive experiments demonstrate that EAGC significantly enhances the performance of various GCD methods, achieving state-of-the-art results across multiple benchmarks.
Methodology
The authors introduce the Energy-Aware Gradient Coordinator (EAGC) which consists of two main components: Anchor-based Gradient Alignment (AGA) that aligns gradients of labeled samples with a reference model, and Energy-aware Elastic Projection (EEP) that projects unlabeled gradients onto the complement of the known-class subspace while adaptively scaling the projection based on the alignment of each sample.
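The EEP step can be sketched as an orthogonal projection with an elastic coefficient. The one-dimensional known-class subspace and the fixed alpha below are simplifying assumptions; the paper's energy-based choice of alpha and its subspace construction are richer:

```python
# Hypothetical sketch of elastic projection: remove (part of) the component of an
# unlabeled gradient lying in a one-dimensional known-class subspace, scaled by
# a per-sample coefficient alpha in [0, 1].

def project_out(g, basis, alpha=1.0):
    """Subtract alpha times the projection of g onto a unit basis vector."""
    coef = sum(a * b for a, b in zip(g, basis))          # <g, basis>
    return [gi - alpha * coef * bi for gi, bi in zip(g, basis)]

g = [3.0, 4.0]
known_dir = [1.0, 0.0]                        # unit vector spanning the known-class subspace
print(project_out(g, known_dir, alpha=1.0))   # full projection removes the overlap
print(project_out(g, known_dir, alpha=0.5))   # elastic: only partial removal
```

Adaptively shrinking alpha for well-aligned samples is what keeps the projection from over-correcting benign gradients.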
Results
The experimental results show that EAGC consistently improves the performance of existing GCD methods, leading to new state-of-the-art results on multiple datasets. The proposed method effectively reduces the negative impact of gradient entanglement, enhancing the discrimination of known classes and the separability of novel categories.
Implications
The findings suggest that addressing gradient entanglement can significantly enhance the robustness of category discovery methods, which is crucial for applications in open-world visual learning and other domains where labeled data is limited.
Graph-Based Fraud Detection with Dual-Path Graph Filtering
Graph Learning
- DPF-GFD introduces a dual-path filtering approach to enhance fraud detection in graph data.
- The model effectively addresses challenges such as relation camouflage and high heterophily.
- It employs a beta wavelet-based operator for structural pattern extraction and a similarity graph for feature representation.
- The method shows improved performance on real-world financial fraud detection datasets.
Summary
This paper presents a novel approach to financial fraud detection using a Graph-Based Fraud Detection Model with Dual-Path Graph Filtering (DPF-GFD). Traditional graph neural networks (GNNs) face challenges in fraud detection due to issues like relation camouflage, high heterophily, and class imbalance. DPF-GFD addresses these challenges by employing a beta wavelet-based operator to extract structural patterns from the original graph and constructing a similarity graph based on distance-based node representations. The model utilizes an improved low-pass filter to fuse embeddings from both graphs through supervised representation learning, ultimately feeding these features into an ensemble tree model for fraud risk assessment. This dual-path filtering paradigm allows for a clear separation between structural anomaly modeling and feature similarity modeling, resulting in more robust node representations in complex fraud scenarios. The effectiveness of DPF-GFD is validated through comprehensive experiments on four real-world financial fraud detection datasets, demonstrating its superior performance compared to existing methods.
Methodology
The DPF-GFD framework consists of a beta wavelet-based operator for capturing structural patterns, a similarity graph constructed from distance-based node representations, and an improved low-pass filter for embedding fusion. The final node features are assessed using an ensemble tree model to evaluate fraud risk.
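A minimal picture of the low-pass half of the pipeline: one smoothing step that averages each node's feature with its neighbours, damping high-frequency (heterophilous) signal. The paper's improved low-pass filter and beta wavelet operator are more elaborate; this only conveys the filtering idea:

```python
# Minimal low-pass graph filter sketch: x' = w * x + (1 - w) * (row-normalised A) x.
# The graph and feature values are illustrative.

def low_pass_step(adj, x, self_weight=0.5):
    out = []
    for i, row in enumerate(adj):
        nbrs = [j for j, a in enumerate(row) if a]
        nbr_mean = sum(x[j] for j in nbrs) / len(nbrs) if nbrs else x[i]
        out.append(self_weight * x[i] + (1 - self_weight) * nbr_mean)
    return out

# Path graph 0-1-2 with a spiky feature on node 1:
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [0.0, 10.0, 0.0]
print(low_pass_step(adj, x))  # the spike is smoothed toward its neighbours
```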
Results
Experiments conducted on four real-world financial fraud detection datasets demonstrate that DPF-GFD outperforms existing GNN-based methods, effectively addressing the issues of relation camouflage, high heterophily, and class imbalance, leading to more accurate fraud detection.
Implications
The proposed DPF-GFD model has significant implications for enhancing fraud detection systems in financial institutions, enabling better identification of fraudulent activities and improving the overall integrity of financial systems.
Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training
Optimization
- Introduces a teacher-student learning framework for portfolio optimization using CVaR as a supervisory signal.
- Utilizes Bayesian Neural Networks (BNNs) to provide uncertainty-aware predictions and mitigate overfitting in low-data settings.
- Demonstrates an implicit roughly 50% reduction in trading turnover relative to deterministic models, without any explicit turnover constraint.
- Shows that the learned policies generalize effectively across different market conditions and asset universes.
Summary
This paper introduces a machine learning framework for portfolio optimization that addresses challenges posed by limited data and market regime shifts. The proposed framework employs a teacher-student learning pipeline where a Conditional Value at Risk (CVaR) optimizer serves as the teacher, generating supervisory labels for training neural models. The student models, which include both Bayesian and deterministic approaches, are trained on a combination of real and synthetically generated data, the latter produced using a factor-based model with t-copula residuals. The study evaluates the performance of four student models through a structured experimental setup that includes controlled synthetic experiments, real-market evaluations, and cross-universe generalization. The results indicate that the student models can match or exceed the performance of the CVaR teacher in various scenarios, demonstrating enhanced robustness during regime shifts and reduced trading turnover. This research highlights the potential of hybrid optimization-learning strategies to improve portfolio construction in data-constrained environments.
Methodology
The methodology involves a semi-supervised sandwich training paradigm where a CVaR optimizer generates labels for a Bayesian student model. The student learns to approximate the teacher's risk-aware behavior through alternating supervised and unsupervised training phases. Synthetic data is generated to augment the limited real data, and models are evaluated through controlled experiments and real-market applications.
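The supervisory signal the teacher optimizes is Conditional Value at Risk: the expected loss in the worst alpha-tail of the return distribution. A minimal empirical CVaR on a sample of losses (sample values are illustrative):

```python
# Empirical CVaR: the mean of the worst ceil(alpha * n) sampled losses.
import math

def cvar(losses, alpha=0.05):
    """Average of the worst ceil(alpha * n) losses (losses given as positive numbers)."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

losses = [0.1, 0.2, -0.05, 0.4, 0.0, 0.3, 0.05, -0.1, 0.25, 0.15]
print(cvar(losses, alpha=0.2))   # mean of the two worst losses, 0.4 and 0.3
```

In the paper's pipeline a CVaR optimizer produces portfolio weights that serve as labels; the students then learn to imitate that risk-aware behavior directly.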
Results
The student models consistently matched or outperformed the CVaR teacher in various experimental settings. They demonstrated improved robustness under regime shifts and achieved a significant reduction in trading turnover, indicating effective generalization and stability in portfolio construction.
Implications
The findings suggest that integrating machine learning with traditional portfolio optimization can enhance decision-making in finance, particularly in environments with limited data and high uncertainty. The approach may lead to more robust investment strategies and lower transaction costs, making it valuable for practitioners in the financial sector.
Asynchronous Probability Ensembling for Federated Disaster Detection
Federated Learning
Computer Vision
Efficient ML
- Introduces an asynchronous probability-level aggregation framework for disaster detection.
- Reduces communication overhead by exchanging class-probability vectors instead of model weights.
- Enhances collaboration among heterogeneous CNN architectures without requiring synchronization.
- Integrates ensemble strategies and a knowledge distillation feedback loop for improved accuracy.
Summary
This paper addresses the challenges of timely and accurate disaster detection using Federated Learning (FL) in environments with limited connectivity and heterogeneous devices. The authors propose a novel decentralized ensembling framework that utilizes asynchronous probability aggregation and feedback distillation, shifting the focus from model weights to class-probability vectors. This approach significantly reduces communication costs, maintains data privacy, and allows diverse convolutional neural network (CNN) architectures to collaborate without the need for synchronization. The proposed method includes a lightweight Message Queuing Telemetry Transport (MQTT) broker for clients to publish class-probability outputs, which the server consumes asynchronously to create a stacking meta-classifier or optimized combination weights. The experimental results demonstrate that this method outperforms traditional individual CNN backbones and standard federated approaches, establishing a scalable solution for real-time disaster response.
Methodology
The authors developed a decentralized framework that aggregates softmax vectors from multiple clients asynchronously. Clients send their class-probability outputs to an MQTT broker, which the server uses to learn a stacking meta-classifier or optimized weights. This method allows for heterogeneous CNN designs to collaborate while avoiding global synchronization and reducing payload size. Additionally, a knowledge distillation feedback loop is implemented to refine local model predictions based on the aggregated ensemble distribution.
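The aggregation step itself is simple: the server fuses the class-probability vectors it has received, either uniformly or with learned combination weights, and predicts the argmax. The client outputs and backbone labels below are illustrative:

```python
# Sketch of probability-level aggregation: average class-probability vectors from
# heterogeneous clients instead of averaging model weights.

def ensemble_probs(client_probs, weights=None):
    n = len(client_probs)
    weights = weights or [1.0 / n] * n          # uniform unless learned weights are supplied
    k = len(client_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, client_probs)) for c in range(k)]

clients = [
    [0.7, 0.2, 0.1],   # client A (e.g. a MobileNet backbone)
    [0.5, 0.4, 0.1],   # client B (e.g. a ResNet backbone)
    [0.2, 0.6, 0.2],   # client C
]
fused = ensemble_probs(clients)
print(fused.index(max(fused)))   # predicted class after fusion
```

Because only k-dimensional probability vectors cross the network, payloads stay tiny regardless of how large each client's CNN is, which is the source of the communication savings.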
Results
The proposed method was evaluated using the Aerial Image Database for Emergency Response (AIDER) dataset, showing accuracy levels comparable to or exceeding those of centralized and traditional federated approaches. The method also demonstrated a significant reduction in communication overhead, making it suitable for resource-constrained environments.
Implications
This research has significant implications for disaster response systems, particularly in scenarios where network resources are limited and timely decision-making is critical. The proposed framework can enhance the effectiveness of emergency handling by enabling real-time collaboration among diverse devices, improving the accuracy of disaster detection.
MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models
Computer Vision
Large Language Models
Efficient ML
- MOONSHOT enhances one-shot pruning by optimizing multiple objectives simultaneously.
- The framework is scalable and efficient, suitable for billion-parameter models.
- Experimental results show significant improvements in performance and accuracy across various models.
- The study reveals that different pruning criteria can yield complementary insights into parameter importance.
Summary
The paper introduces MOONSHOT, a novel framework for multi-objective pruning of vision and large language models, addressing the limitations of existing one-shot pruning methods that typically optimize a single objective. The authors argue that neither layer-wise reconstruction loss nor second-order Taylor approximation of training loss alone consistently yields optimal results across different architectures and sparsity levels. MOONSHOT extends existing single-objective pruning methods by jointly optimizing both objectives, thus improving the performance-sparsity trade-off. The framework is designed to be scalable for billion-parameter models and incorporates an efficient procedure for computing the inverse Hessian. Experimental results demonstrate that MOONSHOT, when combined with state-of-the-art pruning methods, significantly reduces perplexity and improves accuracy across various benchmarks, showcasing its effectiveness in compressing large models without retraining.
Methodology
The authors propose a multi-objective optimization framework that combines layer-wise reconstruction error and second-order Taylor approximation of the training loss. MOONSHOT is designed to be a wrapper around existing pruning algorithms, allowing for efficient integration while maintaining scalability. The methodology includes modeling decisions and an efficient computation of the inverse Hessian to preserve the efficiency of state-of-the-art one-shot pruners.
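A hedged sketch of the core idea of combining two pruning criteria: normalize a reconstruction-based score and a Taylor-based score per weight, blend them, and prune the lowest-scoring fraction. MOONSHOT's joint optimization is considerably more sophisticated; the scores and blend below are illustrative assumptions:

```python
# Illustrative multi-objective importance scoring for one-shot pruning.

def blend_scores(recon, taylor, lam=0.5):
    """Min-max normalise each criterion, then take a convex combination."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    r, t = norm(recon), norm(taylor)
    return [lam * a + (1 - lam) * b for a, b in zip(r, t)]

def prune_mask(scores, sparsity):
    k = int(len(scores) * sparsity)                 # number of weights to remove
    cutoff = sorted(scores)[k - 1] if k else None
    return [k > 0 and s <= cutoff for s in scores]  # True = pruned

scores = blend_scores(recon=[0.9, 0.1, 0.5, 0.2], taylor=[0.8, 0.2, 0.4, 0.1])
print(prune_mask(scores, sparsity=0.5))  # the two least important weights are pruned
```

The point the paper makes is that the two criteria disagree on which weights matter, so a blended score preserves weights that either criterion alone would have discarded.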
Results
MOONSHOT achieves a reduction in C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy by up to 4.9 points across seven classification benchmarks. For Vision Transformers, accuracy on ImageNet-1k improves by over 5 points at 70% sparsity, and ResNet-50 shows a 4-point gain at 90% sparsity.
Implications
The findings suggest that MOONSHOT can be a powerful tool for efficiently compressing large neural networks, making it applicable in real-world scenarios where computational resources are limited. This framework can enhance the deployment of large models in resource-constrained environments, potentially leading to broader adoption of advanced AI technologies.
Mean Flow Policy Optimization
Reinforcement Learning
Generative Models
Optimization
- MFPO leverages MeanFlow models to enhance efficiency in online RL.
- The approach promotes exploration through maximum entropy RL and soft policy iteration.
- MFPO addresses challenges in action likelihood evaluation and soft policy improvement.
- Experimental results demonstrate superior performance with reduced computational overhead.
Summary
The paper introduces Mean Flow Policy Optimization (MFPO), a novel approach that utilizes MeanFlow models as policy representations in online reinforcement learning (RL). Traditional diffusion models, while effective in capturing complex action distributions, suffer from high computational costs due to their iterative generative processes. MFPO addresses this limitation by employing MeanFlow models, which require fewer sampling steps while maintaining the ability to model multi-modal action distributions. The authors optimize these policies within the maximum entropy RL framework, promoting exploration through soft policy iteration. They tackle two significant challenges: evaluating action likelihoods and improving soft policies. By developing an average divergence network and an adaptive instantaneous velocity estimation method, MFPO effectively integrates MeanFlow models into the MaxEnt RL paradigm. Experimental results on MuJoCo and DeepMind Control Suite benchmarks show that MFPO achieves performance on par with or exceeding existing diffusion-based methods, while significantly reducing training and inference times.
Methodology
The authors propose MeanFlow models as a representation for policies in online RL, optimizing them under the maximum entropy framework. They develop techniques to approximate action likelihoods and construct a tractable training objective, enabling efficient policy optimization.
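A toy illustration of why an average-velocity model can sample in one step: for a straight interpolation path z_s = (1 - s) * x + s * eps, the instantaneous velocity eps - x is constant, so the average velocity over [0, 1] equals it, and the data point is recovered as x = z_1 - u. A trained MeanFlow network approximates u; here we use the ground-truth value for a known (x, eps) pair, which is a deliberate simplification:

```python
# One-step MeanFlow sampling on a linear path, with the ground-truth average
# velocity standing in for the learned network.

def mean_velocity(x, eps):
    return eps - x          # average velocity of the linear path over [0, 1]

x_data, eps = 3.0, 1.0
z1 = eps                    # pure-noise endpoint of the path
u = mean_velocity(x_data, eps)
x_recovered = z1 - (1.0 - 0.0) * u
print(x_recovered)          # one-step generation recovers x_data
```

Diffusion policies need many iterative denoising steps to do the same, which is where MFPO's training and inference savings come from.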
Results
MFPO matches or surpasses the performance of existing diffusion-based RL algorithms while requiring significantly fewer sampling steps and less training and inference time, as demonstrated in experiments on MuJoCo and DeepMind Control Suite benchmarks.
Implications
The findings suggest that MFPO could be a viable alternative to traditional diffusion-based methods in reinforcement learning, particularly in scenarios where computational efficiency is critical. This approach may enhance the applicability of RL in real-time systems and complex environments.
Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis
Large Language Models
Reinforcement Learning
Theory
- Introduces PASS@(k, T), a two-dimensional evaluation metric for LLM agents.
- Demonstrates that RL expands the capability boundary of LLM agents in compositional tool-use tasks.
- Finds that supervised fine-tuning can regress capabilities on similar tasks, isolating exploration as a key factor.
- Mechanistic analysis reveals RL improves strategy selection and information integration.
Summary
This paper investigates whether reinforcement learning (RL) expands the capabilities of large language model (LLM) agents or merely enhances their reliability in existing tasks. The authors introduce a novel evaluation metric, PASS@(k, T), which assesses agent performance across two dimensions: the number of independent attempts (k) and the depth of interaction with the environment (T). This framework allows for a clear distinction between capability expansion and efficiency improvement. The findings reveal that, unlike static mathematical reasoning tasks where RL does not significantly enhance capabilities, RL does expand the capability boundary for tasks requiring compositional tool use. As the sampling budget increases, the performance of RL agents surpasses that of base models, indicating genuine capability expansion. The study also highlights that supervised fine-tuning can regress capabilities in similar tasks, emphasizing the role of self-directed exploration in capability enhancement. A mechanistic analysis explains that RL reweights existing strategies towards those that yield correct answers more frequently, particularly in how agents integrate retrieved information. Overall, the results reconcile differing views on RL's impact on LLMs, showing that RL can teach new capabilities in agentic settings where compositional solutions are necessary.
Methodology
The authors developed the PASS@(k, T) metric to evaluate LLM agents based on sampling attempts and interaction depth. They conducted empirical experiments comparing base models, supervised fine-tuning, and RL-trained agents across various tasks, particularly focusing on compositional tool use. A mechanistic analysis was also performed to understand the underlying factors contributing to capability expansion.
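For a fixed interaction depth T, the k-axis of PASS@(k, T) can be estimated with the standard unbiased pass@k estimator from n sampled attempts with c successes; sweeping k (and T) then traces out the capability boundary:

```python
# Unbiased pass@k estimator: probability of at least one success in k draws
# without replacement from n sampled attempts containing c successes.
from math import comb

def pass_at_k(n, c, k):
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 3 successes out of 20 attempts at some depth T:
print(round(pass_at_k(20, 3, 1), 4))   # ~ c / n = 0.15
print(round(pass_at_k(20, 3, 10), 4))  # larger sampling budgets push toward the boundary
```

Capability expansion, in the paper's terms, shows up when an RL-trained agent's curve stays above the base model's even as k grows large.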
Results
The study found that RL-trained agents significantly outperformed base models as the sampling budget increased, indicating a genuine expansion of capabilities in tasks requiring compositional strategies. In contrast, supervised fine-tuning regressed capabilities on similar tasks. The mechanistic analysis showed that RL effectively reweights strategies towards those that yield correct answers more frequently.
Implications
The findings suggest that investing in RL methods could lead to the development of more capable LLM agents, particularly in complex, compositional tasks. This has implications for future research and applications in AI, particularly in areas requiring advanced reasoning and tool use.
From Risk to Rescue: An Agentic Survival Analysis Framework for Liquidation Prevention
Optimization
Time Series
Theory
- The framework transitions from passive risk prediction to proactive intervention in liquidation prevention.
- A novel return period metric is introduced to normalize risk across transaction types.
- The counterfactual optimization loop simulates user actions to minimize required capital for risk mitigation.
- The system effectively differentiates between significant financial risks and minor 'dust' events.
Summary
This paper addresses the liquidation risks faced by users of decentralized finance (DeFi) lending protocols, particularly Aave v3, which often result from volatile market conditions and static risk management tools. The authors propose an autonomous agent that utilizes survival analysis to proactively prevent liquidations by simulating counterfactual futures and executing interventions. The framework introduces a 'return period' metric derived from a Cox proportional hazards model to normalize risk across different transaction types, along with a volatility-adjusted trend score to filter out transient market noise. A counterfactual optimization loop is implemented to determine the minimum capital required for risk mitigation. The proposed system is validated using a high-fidelity Aave v3 simulator on a dataset of 4,882 high-risk user profiles, demonstrating its ability to prevent liquidations in scenarios where static rules fail, achieving a zero worsening rate and optimizing capital efficiency by distinguishing between actionable risks and negligible events.
Methodology
The authors developed an agentic AI framework that employs survival analysis to assess liquidation risk. They created a high-fidelity simulator for Aave v3, allowing for causal replay-based evaluations of agent actions. The framework includes a counterfactual optimization loop to simulate potential user actions and determine the minimum capital needed for intervention.
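The 'return period' idea can be made concrete in a few lines: a Cox model scales a baseline hazard by exp(β·x), and the reciprocal of a hazard rate gives an expected time between events that is comparable across transaction types. A minimal sketch (the coefficients and covariates below are hypothetical, and the paper's exact normalization may differ):

```python
import math

def cox_hazard(baseline_hazard, betas, covariates):
    """Cox proportional hazard: h(t|x) = h0(t) * exp(beta . x)."""
    linear_predictor = sum(b * x for b, x in zip(betas, covariates))
    return baseline_hazard * math.exp(linear_predictor)

def return_period(hazard_rate):
    """Expected time between events under a constant hazard rate --
    the normalizing 'return period' idea, not the paper's exact formula."""
    return 1.0 / hazard_rate

# Hypothetical risk factors for one position (values are illustrative).
h = cox_hazard(baseline_hazard=0.02, betas=[0.8, -0.5], covariates=[1.2, 0.4])
T = return_period(h)  # larger T = rarer event = safer position
```

Because the return period is a time scale rather than a raw hazard, positions of different transaction types can be ranked on a common axis.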
Results
The proposed agent successfully prevented liquidations in imminent-risk scenarios for a cohort of 4,882 users, achieving a zero worsening rate while effectively ignoring economically irrelevant dust liquidation events. This demonstrates the agent's capability to save users from liquidation risks that static rules fail to address.
Implications
The findings suggest that integrating survival analysis with proactive intervention strategies can significantly enhance risk management in DeFi lending protocols. This approach could lead to more resilient financial systems and improved user experiences by reducing the likelihood of liquidation events.
Auxiliary Finite-Difference Residual-Gradient Regularization for PINNs
Theory
Optimization
- Introduces an auxiliary finite-difference regularizer for PINNs that maintains the governing PDE residual in AD form.
- Demonstrates a trade-off between accuracy of the field and cleanliness of the residual in a controlled Poisson problem study.
- Implements a body-fitted shell regularizer in a 3D heat-conduction benchmark, improving application-specific quantities.
- Identifies optimal configurations for regularization and learning rates that enhance model reliability.
Summary
This paper introduces a novel approach to enhance the performance of Physics-Informed Neural Networks (PINNs) by employing an auxiliary finite-difference (FD) regularization technique. Traditional PINNs often rely on a single scalar loss function, which may not adequately capture the specific quantities of interest, such as boundary conditions or fluxes. The proposed method retains the automatic differentiation (AD) based governing PDE residual while introducing an auxiliary FD term that penalizes the gradients of the sampled residual field. This hybrid design aims to improve the accuracy of the PINN model without compromising the underlying PDE formulation. The study is conducted in two stages: the first stage involves a controlled experiment on a two-dimensional Poisson problem, comparing the baseline PINN with the FD regularizer and an AD residual-gradient baseline. The results indicate a trade-off between field accuracy and residual cleanliness. The second stage applies the same logic to a three-dimensional annular heat-conduction benchmark, where the auxiliary grid is implemented as a body-fitted shell adjacent to a wavy outer wall. The findings demonstrate that the shell regularizer significantly improves the outer-wall flux and boundary-condition behavior, achieving a reduction in RMSE for both wall flux and boundary conditions compared to the baseline model.
Methodology
The methodology involves a two-stage empirical study. Stage 1 tests the auxiliary FD regularizer on a controlled two-dimensional Poisson problem, comparing it to a baseline PINN and an AD residual-gradient baseline. Stage 2 applies the same residual-field logic to a three-dimensional annular heat-conduction problem, utilizing a body-fitted shell for the auxiliary grid. The performance is evaluated based on RMSE metrics for outer-wall flux and boundary conditions.
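The auxiliary term can be sketched as a penalty on finite differences of the residual field sampled on a grid, added to the usual AD-based residual loss. A minimal sketch (the 5 × 10⁻⁴ weight follows the paper's reported shell-weight configuration; the function names are ours):

```python
import numpy as np

def fd_residual_gradient_penalty(residual_grid, h):
    """Penalty on finite-difference gradients of a sampled PDE-residual
    field -- a sketch of the auxiliary FD regularizer idea."""
    gx = (residual_grid[1:, :] - residual_grid[:-1, :]) / h  # d(residual)/dx
    gy = (residual_grid[:, 1:] - residual_grid[:, :-1]) / h  # d(residual)/dy
    return np.mean(gx**2) + np.mean(gy**2)

def total_loss(ad_residual_loss, residual_grid, h, weight=5e-4):
    """AD-based governing-PDE residual loss plus the auxiliary FD term."""
    return ad_residual_loss + weight * fd_residual_gradient_penalty(residual_grid, h)
```

The AD residual stays untouched; the FD term only discourages sharp spatial variation in the sampled residual field.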
Results
The auxiliary FD regularizer significantly improved the outer-wall flux RMSE from 9.21 × 10⁻³ to 9.63 × 10⁻⁴ and the boundary condition RMSE from 1.22 × 10⁻² to 9.29 × 10⁻⁴ across multiple seeds and epochs. The study also found that a fixed shell weight of 5 × 10⁻⁴ under the Kourkoutas-β optimizer regime yielded the most reliable results, while varying learning rates and optimizer settings affected the robustness of the improvements.
Implications
The findings suggest that incorporating auxiliary regularization aligned with specific physical quantities can enhance the performance of PINNs, particularly in complex geometries and applications where traditional loss functions may fall short. This approach could be beneficial in various engineering and scientific applications where accurate modeling of boundary conditions and fluxes is critical.
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
Time Series
- MambaSL achieves state-of-the-art performance in time series classification.
- The framework is guided by four TSC-specific hypotheses that refine Mamba's architecture.
- A unified benchmarking protocol is established, addressing issues of coverage, fairness, and reproducibility.
- MambaSL outperforms the second-best method by 1.41% in accuracy across UEA datasets.
Summary
This paper introduces MambaSL, a novel framework that enhances the application of the Mamba state space model (SSM) for time series classification (TSC). The authors identify critical gaps in existing TSC methodologies, particularly the underexplored potential of Mamba as a standalone architecture. They propose four TSC-specific hypotheses that guide the redesign of Mamba's selective SSM and projection layers. To address benchmarking limitations, the authors establish a unified evaluation protocol that re-assesses 20 strong baseline models across all 30 datasets from the University of East Anglia (UEA) multivariate time series archive. MambaSL demonstrates state-of-the-art performance, achieving statistically significant improvements over existing methods while ensuring reproducibility through public checkpoints. The findings underscore the viability of Mamba-based architectures as a robust backbone for TSC, supported by comprehensive evaluations and visualizations that validate the proposed hypotheses.
Methodology
The authors propose a framework called MambaSL, which involves redesigning selective SSM components based on four hypotheses tailored for TSC. They conduct extensive hyperparameter tuning and evaluate 20 models across 30 UEA datasets using a consistent benchmarking setup to ensure fair comparisons.
Results
MambaSL achieves state-of-the-art performance in TSC, surpassing the second-best method by 1.41% in accuracy. The framework's architectural refinements and the comprehensive benchmarking approach contribute to statistically significant improvements in classification performance.
Implications
The findings suggest that Mamba-based architectures can serve as effective backbones for time series classification tasks, potentially influencing future research and applications in this domain. The established benchmarking protocol may also lead to more reliable comparisons in TSC literature.
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
Computer Vision
NLP
Multimodal
- GUI grounding models show a significant accuracy drop (27-56 pp) when tasked with spatial reasoning.
- A 70% browser zoom leads to a notable performance degradation across all tested models.
- Standard training methods do not enhance model robustness and may worsen spatial reasoning abilities.
- The GUI-Perturbed framework allows for controlled evaluation of grounding robustness by varying visual and instructional conditions.
Summary
This paper addresses the limitations of current GUI grounding models, which achieve high accuracy on standard benchmarks but exhibit significant performance drops when faced with spatial reasoning tasks. The authors introduce GUI-Perturbed, a controlled perturbation framework that systematically varies visual scenes and instructions to evaluate the robustness of GUI grounding models. Through experiments with three 7B models, the study reveals that relational instructions lead to a substantial accuracy collapse, with drops of 27-56 percentage points. Additionally, a 70% browser zoom significantly degrades model performance, indicating that models rely on fixed visual heuristics rather than understanding spatial relationships. The findings suggest that traditional training methods, including rank-8 LoRA fine-tuning, do not improve performance and may even degrade spatial reasoning capabilities. The authors provide a dataset, augmentation pipeline, and a fine-tuned model to facilitate further research and reproducibility.
Methodology
The authors developed the GUI-Perturbed framework, which applies domain randomization techniques to GUI grounding evaluation. This involves perturbing visual scenes (e.g., style changes, zoom levels) and instructions (e.g., direct vs. spatial-relational) along independent axes. They evaluated three models from the same architecture lineage using this framework to quantify performance under varying conditions.
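Varying perturbations along independent axes amounts to taking a Cartesian product over axis values. A sketch of such an evaluation grid (the axis names and values below are illustrative, loosely mirroring the style/zoom/instruction axes described above):

```python
from itertools import product

# Hypothetical perturbation axes; values are illustrative, except that
# 0.7 mirrors the paper's 70%-zoom stress case.
VISUAL_AXES = {
    "style": ["default", "dark", "high_contrast"],
    "zoom": [1.0, 0.7],
}
INSTRUCTION_AXES = {
    "phrasing": ["direct", "spatial_relational"],
}

def perturbation_grid(*axis_groups):
    """Cartesian product over independent perturbation axes."""
    keys, values = [], []
    for group in axis_groups:
        for k, v in group.items():
            keys.append(k)
            values.append(v)
    return [dict(zip(keys, combo)) for combo in product(*values)]

conditions = perturbation_grid(VISUAL_AXES, INSTRUCTION_AXES)  # 3 * 2 * 2 = 12
```

Because the axes vary independently, a model's accuracy drop can be attributed to a specific axis (e.g. zoom alone vs. relational phrasing alone).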
Results
The study found that all models experienced a systematic accuracy collapse when faced with relational instructions. A 70% browser zoom resulted in a 3-8 percentage point drop in accuracy, particularly affecting relational queries. Additionally, rank-8 LoRA fine-tuning did not yield improvements and often led to performance degradation. The results indicate that models struggle with spatial reasoning and visual robustness, revealing critical weaknesses not captured by standard benchmarks.
Implications
The findings suggest that current GUI grounding models may not be reliable for real-world applications, particularly in dynamic environments. The GUI-Perturbed framework provides a valuable tool for diagnosing model weaknesses and guiding future improvements in GUI grounding systems. The released dataset and augmentation pipeline can facilitate further research in enhancing model robustness.
Soft Q(λ): A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces
Reinforcement Learning
- Introduces Soft Q(λ), a multi-step off-policy method for entropy-regularized reinforcement learning.
- Develops a novel Soft Tree Backup operator to handle entropy terms across multiple time steps.
- Eliminates the on-policy bias inherent in traditional n-step soft Q-learning methods.
- Demonstrates the ability to learn entropy-regularized value functions under arbitrary behavior policies.
Summary
This paper introduces Soft Q(λ), a novel framework for multi-step off-policy reinforcement learning that incorporates eligibility traces and entropy regularization. The authors begin by formalizing an n-step soft Q-learning formulation, addressing the limitations of existing methods that are constrained to on-policy action sampling. They propose a new Soft Tree Backup operator that effectively manages entropy terms over multiple time steps without requiring knowledge of the behavior policy, thus eliminating the on-policy bias present in traditional n-step soft backups. The resulting Soft Q(λ) framework allows for efficient online, off-policy credit assignment, enabling the learning of entropy-regularized value functions under arbitrary behavior policies. This work provides a theoretically grounded approach to reinforcement learning that enhances exploration and stability, paving the way for future empirical experiments.
Methodology
The authors formalize an n-step soft Q-learning formulation and introduce the Soft Tree Backup operator, which leverages the recursive relationship between state-value and action-value functions. This operator allows for the handling of entropy terms over multiple time steps without requiring knowledge of the behavior policy, thus facilitating off-policy learning.
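The key quantity is the entropy-regularized state value, which replaces a sampled next action with an expectation over the target policy, removing the on-policy sampling requirement. A one-step sketch (the paper's full λ-operator with eligibility traces is more involved):

```python
import math

def soft_state_value(q_values, policy, tau):
    """Entropy-regularized state value:
    V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s)),
    i.e. the expected Q plus tau times the policy entropy."""
    return sum(p * (q - tau * math.log(p))
               for q, p in zip(q_values, policy) if p > 0)

def soft_backup_target(reward, gamma, next_q, next_policy, tau):
    """One-step soft tree-backup target: an expectation over the target
    policy at the next state, so no next action needs to be sampled
    (a sketch of the off-policy idea, not the paper's full operator)."""
    return reward + gamma * soft_state_value(next_q, next_policy, tau)
```

Because the target is an expectation under the target policy, it is unbiased regardless of which behavior policy generated the transition.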
Results
The derivations in the paper show that Soft Q(λ) can learn entropy-regularized value functions stably under arbitrary behavior policies, without reliance on target networks or fixed exploration schedules. This indicates a significant advancement in the flexibility and robustness of reinforcement learning methods.
Implications
The Soft Q(λ) framework has the potential to improve exploration strategies in reinforcement learning applications, making it suitable for complex environments where traditional methods struggle. Its ability to operate off-policy could enhance the efficiency of learning in real-world scenarios, such as robotics and autonomous systems.
When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration
Large Language Models
NLP
Efficient ML
- Introduces Orthogonal Backfill (OBF) to enhance KV compression in multi-agent LLM communication.
- Achieves a significant reduction in communication costs (79.8%–89.4%) while maintaining competitive performance.
- Demonstrates that preserving useful information is more critical than merely relaying large amounts of data.
- Evaluates the method across nine diverse benchmarks, showing superior results in several cases.
Summary
This paper addresses the communication challenges in Large Language Model (LLM)-based multi-agent systems, particularly focusing on the inefficiencies of relaying full key-value (KV) caches between agents. The authors propose a novel approach called Orthogonal Backfill (OBF), which enhances eviction-style KV compression to mitigate information loss during the relay process. By injecting low-rank orthogonal residuals from discarded KV states into retained states, OBF aims to preserve the most useful information for downstream tasks. The authors evaluate their method against full KV relay across nine benchmarks, including mathematical reasoning, coding, and knowledge-intensive question answering. The results demonstrate that their approach achieves comparable performance to full KV relay while significantly reducing communication costs by 79.8% to 89.4%. Furthermore, OBF improves performance, achieving the best results on seven out of nine benchmarks. This suggests that effective communication in multi-agent systems relies more on the quality of preserved information rather than the quantity, highlighting the importance of targeted information retention in collaborative settings.
Methodology
The authors develop an eviction-style KV compression framework tailored for inter-agent relay in multi-agent systems. They introduce Orthogonal Backfill (OBF) to counteract information loss from hard eviction by incorporating residual information from discarded KV states into the retained states. The effectiveness of this approach is empirically validated through experiments on various benchmarks.
Results
The proposed method matches or outperforms full KV relay while achieving a drastic reduction in communication costs. OBF leads to improved performance on seven out of nine benchmarks tested, indicating that effective communication relies on the preservation of useful information rather than the sheer volume of data transmitted.
Implications
The findings suggest that optimizing communication strategies in multi-agent systems can enhance collaboration efficiency, particularly in complex tasks that require rich intermediate state exchanges. This work could inform future designs of multi-agent frameworks and improve the scalability of LLM applications.
Curvature-Aligned Probing for Local Loss-Landscape Stabilization
Theory
Optimization
Efficient ML
- Introduces a unified family of local stabilization criteria for loss-landscape analysis.
- Proposes a curvature-aligned criterion ∆(D)² that focuses on the top-D eigenspace of the Hessian.
- Demonstrates that the new criterion preserves the mean-squared rate of traditional methods while improving efficiency.
- Develops scalable estimators that significantly reduce computational costs compared to direct Monte Carlo methods.
Summary
This paper addresses the challenge of local loss-landscape stabilization in neural networks as the training sample size increases. The authors argue that traditional methods of measuring local loss geometry, which often rely on isotropic averaging or pointwise evaluations, fail to capture the dominant local deformations in strongly anisotropic landscapes. To overcome this limitation, they introduce a unified family of stabilization criteria parameterized by aggregation order and probing distribution. A key contribution is the curvature-aligned criterion ∆(D)², which focuses on probing the loss increment field in the top-D eigenspace of the empirical Hessian near a trained solution. The authors prove that this criterion maintains the O(k⁻²) mean-squared rate of the full-space criterion while reducing the dependence on ambient curvature to a subspace dimension D. They also develop scalable estimators based on Hessian-vector products and Monte Carlo methods, demonstrating that the curvature-aligned probe can effectively reproduce the full-space mean-squared signal with significantly improved computational efficiency. Empirical results on a decoder-only transformer validate the effectiveness of the proposed methods, showing that they can capture local loss landscape characteristics accurately and efficiently.
Methodology
The authors recast local loss-landscape stabilization as an observational problem, introducing a family of criteria based on aggregation order and probing distribution. They focus on a curvature-aligned criterion that restricts probing to the top-D eigenspace of the Hessian. Theoretical proofs establish the performance of this criterion under a local quadratic model, while scalable estimators are derived using Hessian-vector products and Monte Carlo techniques.
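On a quadratic model the pipeline reduces to two steps: find the top-D eigenspace of the Hessian, then evaluate loss increments for probes restricted to that subspace. A dense-matrix sketch (the paper uses Hessian-vector products precisely to avoid forming the Hessian; the eigendecomposition here is for illustration):

```python
import numpy as np

def top_d_eigenspace(hessian, d):
    """Columns of U span the top-d eigenspace of a symmetric Hessian."""
    vals, vecs = np.linalg.eigh(hessian)
    order = np.argsort(vals)[::-1]   # descending eigenvalues
    return vecs[:, order[:d]]

def curvature_aligned_increment(hessian, u, z):
    """Loss increment of the local quadratic model probed in the top-d
    subspace: delta = 0.5 * (U z)^T H (U z) for subspace coefficients z."""
    v = u @ z
    return 0.5 * v @ hessian @ v
```

Restricting probes to the top-D directions concentrates the signal where the dominant deformations live, which is the point of the ∆(D)² criterion.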
Results
The curvature-aligned criterion ∆(D)² preserves the O(k⁻²) mean-squared decay of the full-space criterion, with empirical results showing that it can reproduce the full-space mean-squared signal to within numerical noise. The closed-form estimator is orders of magnitude faster than traditional Monte Carlo methods after subspace construction.
Implications
The findings suggest that local loss-landscape stabilization can be effectively analyzed using targeted probing strategies, which may lead to more efficient training and optimization of neural networks. This approach could enhance understanding of model behavior as training data grows, potentially informing better practices in model design and evaluation.
RPS: Information Elicitation with Reinforcement Prompt Selection
NLP
Large Language Models
Reinforcement Learning
- Proposes Reinforcement Prompt Selection (RPS) for adaptive information elicitation in dialogues.
- Introduces IELegal, a benchmark dataset for evaluating information elicitation in legal contexts.
- RPS outperforms static prompt baselines, demonstrating the effectiveness of adaptive strategies.
- Addresses the limitations of existing prompt engineering methods by reducing reliance on static prompts.
Summary
This paper addresses the challenge of information elicitation in open-ended dialogues using large language models (LLMs). Despite their advanced capabilities in dialogue generation, LLMs struggle to extract concealed or uncertain information from users due to privacy concerns and social hesitations. The authors propose a novel framework called Reinforcement Prompt Selection (RPS), which formulates prompt selection as a sequential decision-making problem using reinforcement learning. RPS adapts its prompt strategy based on user feedback to effectively elicit concealed information. The paper introduces IELegal, a benchmark dataset derived from real legal case documents, designed to simulate dialogue-based information elicitation tasks. Experimental results demonstrate that RPS outperforms static prompt baselines in both synthetic and real-world settings, highlighting its effectiveness in uncovering critical information during interactions. The findings suggest that adaptive prompt selection can significantly enhance the performance of LLM-driven dialogue systems in various applications, including legal consultation and personal assistance.
Methodology
The authors define the problem of information elicitation in open-ended dialogues and propose RPS, a lightweight reinforcement learning framework. RPS formulates prompt selection as a sequential decision-making task, learning a policy over a pool of prompts to adaptively elicit concealed information. The methodology includes synthetic experiments using a Gaussian Mixture Model environment to validate the approach and the introduction of the IELegal dataset for real-world evaluation.
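The decision-making core can be sketched as a policy over a prompt pool whose reward is a downstream elicitation score. A stateless epsilon-greedy stand-in (the paper's RL formulation is sequential and richer than this bandit, and the prompt names are illustrative):

```python
import random

class PromptBandit:
    """Minimal epsilon-greedy policy over a prompt pool, sketching the
    'prompt selection as sequential decision-making' idea."""
    def __init__(self, prompts, epsilon=0.1):
        self.prompts = prompts
        self.epsilon = epsilon
        self.counts = [0] * len(prompts)
        self.values = [0.0] * len(prompts)

    def select(self):
        """Explore with probability epsilon, else pick the best prompt."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.prompts))
        return max(range(len(self.prompts)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        """Incremental mean of the elicitation reward for a chosen prompt."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

After each dialogue turn, the elicitation score of the user's reply is fed back via `update`, steering future prompt choices.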
Results
In controlled experiments, the reinforcement learning agent using RPS significantly outperformed a random query baseline. In the IELegal dataset, RPS demonstrated superior performance compared to static prompt baselines, effectively uncovering relevant and concealed information during legal consultations.
Implications
The findings suggest that RPS can enhance the capabilities of LLMs in various interactive AI applications, such as personal assistants, tutoring systems, and legal or clinical support, by improving their ability to elicit sensitive information from users. This could lead to more effective and context-aware conversational agents.
Metric-Aware Principal Component Analysis (MAPCA): A Unified Framework for Scale-Invariant Representation Learning
Theory
- MAPCA provides a unified framework for scale-invariant representation learning by utilizing a metric matrix.
- The β-family of metrics allows for interpolation between standard PCA and output whitening, addressing the trade-off between scale invariance and variance preservation.
- Invariant PCA (IPCA) is identified as a special case within the MAPCA framework, showcasing strict scale invariance.
- Connections to self-supervised learning methods are established, clarifying their underlying metric choices.
Summary
This paper introduces Metric-Aware Principal Component Analysis (MAPCA), a novel framework designed for scale-invariant representation learning. The framework is based on a generalized eigenproblem that incorporates a symmetric positive definite metric matrix, M, which determines the representation geometry. The canonical β-family of metrics allows for continuous control over spectral bias, bridging the gap between standard PCA and output whitening. The diagonal metric recovers Invariant PCA (IPCA), which exhibits strict scale invariance under diagonal rescaling. The MAPCA framework also connects various self-supervised learning objectives, revealing that methods like Barlow Twins and VICReg correspond to specific metric choices within MAPCA. A key finding is that W-MSE, often associated with whitening, corresponds to a metric outside the spectral compression range, highlighting the importance of the MAPCA framework in distinguishing between input and output whitening. The theoretical results are validated using the army cadets dataset, demonstrating the practical applicability of MAPCA in representation learning.
Methodology
The MAPCA framework formulates a generalized eigenproblem that incorporates a metric matrix, M, allowing for flexible representation geometry. The framework includes a canonical β-family of metrics that parameterizes the degree of spectral compression, enabling a continuous transition between standard PCA and output whitening. Theoretical properties are derived and validated through numerical experiments on a specific dataset.
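The generalized eigenproblem C v = λ M v with the β-family metric M = diag(C)^β can be solved by symmetrizing with M^(-1/2). A sketch of this reading of the framework (not the authors' implementation): β = 0 recovers standard PCA, while β = 1 gives a correlation-like, scale-invariant analysis.

```python
import numpy as np

def mapca(cov, beta):
    """Metric-aware PCA sketch: solve C v = lam * M v with the beta-family
    metric M = diag(C)**beta, via the symmetrized problem
    M^(-1/2) C M^(-1/2) u = lam u,  v = M^(-1/2) u."""
    m_inv_sqrt = np.diag(np.diag(cov) ** (-beta / 2.0))
    vals, vecs = np.linalg.eigh(m_inv_sqrt @ cov @ m_inv_sqrt)
    order = np.argsort(vals)[::-1]               # descending eigenvalues
    return vals[order], m_inv_sqrt @ vecs[:, order]
```

For β = 1, rescaling features by any diagonal matrix leaves the eigenvalues unchanged, which is the strict scale invariance attributed to the IPCA special case.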
Results
The theoretical framework of MAPCA is validated through numerical experiments on the army cadets dataset, demonstrating its effectiveness in achieving scale-invariant representations. The analysis reveals the distinct behaviors of various self-supervised learning methods when interpreted through the MAPCA lens, particularly highlighting the differences between input and output whitening.
Implications
The MAPCA framework has significant implications for dimensionality reduction and representation learning, particularly in scenarios where feature scaling is a concern. Its ability to unify various self-supervised learning objectives may lead to improved methodologies in machine learning applications, enhancing the robustness and interpretability of learned representations.
TOPCELL: Topology Optimization of Standard Cell via LLMs
Large Language Models
Optimization
Generative Models
- Introduction of TOPCELL, a framework leveraging LLMs for topology optimization in standard cell design.
- Utilization of Group Relative Policy Optimization (GRPO) for efficient and autonomous topology discovery.
- Demonstrated superior performance and zero-shot generalization in generating high-quality topologies.
- Achieved an average speedup of 85.91x compared to traditional exhaustive search methods.
Summary
The paper presents TOPCELL, a novel framework that utilizes Large Language Models (LLMs) for the topology optimization of standard cells in integrated circuit design. Traditional methods for topology optimization face significant challenges due to the exponential complexity of exhaustive search techniques, especially as transistor counts increase in advanced technology nodes. TOPCELL reformulates the topology exploration as a generative task, employing Group Relative Policy Optimization (GRPO) to fine-tune the model according to logical and spatial constraints. Experimental results demonstrate that TOPCELL significantly outperforms existing foundation models in generating routable and physically-aware topologies. When integrated into a state-of-the-art automation flow for a 7nm library generation task, TOPCELL achieves an impressive speedup of 85.91x while maintaining layout quality comparable to exhaustive solvers. This work highlights the potential of LLMs in automating complex design tasks within Electronic Design Automation (EDA), paving the way for more efficient standard cell design processes.
Methodology
TOPCELL reformulates topology optimization as a generative task using LLMs. It employs GRPO to fine-tune the model, aligning its optimization strategy with circuit and layout constraints. The framework takes a standard-cell netlist as input and autonomously proposes topology modifications, optimizing for design-technology co-optimization objectives.
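GRPO's core scoring step, rating each sampled candidate by its reward relative to the group, is simple to state. A sketch of the standard group-relative advantage (the routability and layout reward terms themselves are domain-specific and not shown):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: A_i = (r_i - mean(r)) / std(r) over a
    group of candidate topologies sampled for the same netlist."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are normalized within the group, no separate value model is needed; candidates better than their siblings are reinforced, worse ones suppressed.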
Results
TOPCELL outperformed larger foundation models in discovering routable topologies and demonstrated strong zero-shot generalization capabilities. In practical applications, it achieved a speedup of 85.91x in layout generation for a 7nm library, while maintaining comparable quality to traditional exhaustive solvers.
Implications
The findings suggest that LLMs can significantly enhance the efficiency of Electronic Design Automation processes, particularly in standard cell design. This could lead to faster design cycles, reduced manual intervention, and improved performance in advanced technology nodes.
Quantum-inspired tensor networks in machine learning models
Theory
Efficient ML
Interpretability
- Tensor networks offer a compressed representation of complex data dependencies, improving computational efficiency.
- They can enhance explainability and privacy in machine learning models compared to traditional neural networks.
- Two main approaches are discussed: using TNs as learning architectures and as compression strategies for existing models.
- TNs provide intrinsic measures of complexity and correlation, enabling better interpretability of model decisions.
Summary
This paper reviews the integration of tensor networks (TNs) into machine learning (ML), highlighting their potential to address challenges faced by traditional neural networks (NNs) such as high computational costs, opacity, and data leakage. TNs, originally developed for many-body quantum physics, provide a compressed representation of complex data dependencies, making them suitable for various ML tasks. The authors discuss two main approaches: using TNs as alternative learning architectures (Tensor Neural Networks) and employing them for structured compression of conventional NNs. The paper emphasizes the advantages of TNs in terms of computational efficiency, explainability, and privacy, while also addressing the challenges that need to be overcome for broader adoption in ML applications. The authors provide a critical assessment of the current state of research, applications in supervised and unsupervised learning, and the potential for quantum machine learning.
Methodology
The authors review theoretical concepts of tensor networks, including Penrose notation, tensor operations, and network architectures. They analyze applications of TNs in ML, focusing on their use in compression, explainability, and integration with deep learning models. The paper synthesizes existing literature and presents a critical assessment of the state of the art in quantum-inspired machine learning.
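The compression idea in its simplest form replaces a dense weight matrix by a truncated factorization; tensor trains extend this to chains of small cores. A rank-r SVD sketch of the parameter-count argument:

```python
import numpy as np

def low_rank_compress(weight, rank):
    """Simplest tensor-network-style compression of a dense layer:
    truncated SVD W ~= A @ B replaces m*n parameters with rank*(m+n)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (m x rank)
    b = vt[:rank, :]             # (rank x n)
    return a, b

def param_count(m, n, rank):
    """Parameters after factorization; compare against m * n."""
    return rank * (m + n)
```

When the weight matrix has rapidly decaying singular values, the truncation loses little accuracy while the parameter count drops from m·n to r·(m+n).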
Results
The review indicates that tensor networks have been successfully implemented in various ML tasks, demonstrating advantages in computational efficiency and interpretability. The authors find that TNs can effectively reduce parameter counts in neural networks without significant performance loss, and they provide insights into how TNs can be leveraged for better privacy and explainability.
Implications
The findings suggest that tensor networks could revolutionize machine learning by providing more efficient and interpretable models. Their application could lead to advancements in fields such as natural language processing, computer vision, and quantum machine learning, ultimately enhancing the capabilities and trustworthiness of AI systems.
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
Theory
Optimization
- Introduces a new algorithm achieving Õ(t⁻¹/⁴) last-iterate convergence in bandit settings.
- Utilizes log-barrier regularization and dual-focused analysis for improved convergence rates.
- Extends results to extensive-form games, maintaining the same convergence rate.
- Addresses the limitations of previous methods that did not achieve optimal rates.
Summary
This paper addresses the challenge of achieving last-iterate convergence in zero-sum matrix games under bandit feedback conditions. The authors build upon previous work that established a lower bound of Ω(t⁻¹/⁴) on the exploitability gap when players are uncoupled. They propose a novel algorithm that combines mirror descent with log-barrier regularization, achieving a convergence rate of Õ(t⁻¹/⁴) with high probability. This matches the lower bound, a rate that existing methods had not reached. Furthermore, the authors extend their findings to extensive-form games, demonstrating that the same convergence rate can be achieved in this broader context. The paper emphasizes the importance of dual-focused analysis in the proposed methodology, which allows for effective learning of minimax policies in a sequential learning framework where players only observe their own actions and outcomes.
Methodology
The authors propose an algorithm based on online mirror descent with log-barrier regularization. They analyze the convergence properties using a dual-focused approach, which allows for the derivation of bounds on the exploitability gap in the context of bandit feedback. The algorithm is designed to operate under conditions where players do not communicate their actions, thus addressing the challenges posed by uncoupled learning scenarios.
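The building block is a mirror descent step with the log-barrier mirror map Φ(x) = −Σᵢ log xᵢ on the probability simplex. The update 1/x'ᵢ = 1/xᵢ + η·gᵢ − μ has no closed form for the normalizing multiplier μ, so the sketch below finds it by bisection (a textbook version of the step, not the authors' full algorithm):

```python
def log_barrier_md_step(x, grad, eta, tol=1e-12):
    """One online-mirror-descent step with the log-barrier mirror map on
    the simplex. Solves 1/x'_i = 1/x_i + eta*g_i - mu, with mu chosen by
    bisection so that x' sums to 1 and stays strictly positive."""
    c = [1.0 / xi + eta * gi for xi, gi in zip(x, grad)]
    hi = min(c) - tol     # mu must stay below min(c) for positivity
    lo = min(c) - 1e6     # crude lower bracket: sum(x') -> 0 as mu -> -inf
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        total = sum(1.0 / (ci - mu) for ci in c)  # increasing in mu
        if total > 1.0:
            hi = mu
        else:
            lo = mu
    return [1.0 / (ci - mu) for ci in c]
```

The log-barrier keeps every iterate strictly inside the simplex, which is what permits the high-probability control of the exploitability gap under bandit feedback.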
Results
The proposed algorithm successfully achieves a last-iterate convergence rate of Õ(t^(-1/4)) with high probability, which is optimal under the given conditions. Additionally, the extension to extensive-form games demonstrates that similar convergence rates can be achieved, indicating the robustness of the proposed methodology across different game structures.
Implications
The findings have significant implications for the design of learning algorithms in competitive environments, particularly in scenarios where players have limited feedback. The results can enhance the development of strategies in various applications, including economic modeling, automated decision-making systems, and game-theoretic frameworks in AI.
Can LLMs Score Medical Diagnoses and Clinical Reasoning as well as Expert Panels?
Large Language Models
NLP
Multimodal
- LLM jury scores are systematically lower than expert clinician panel scores.
- LLM jury shows better concordance with primary expert panels than human re-scoring panels.
- The probability of severe errors is lower in LLM jury models compared to human experts.
- Calibration of LLM jury improves alignment with human expert evaluations.
Summary
This paper investigates the feasibility of using large language models (LLMs) as evaluators for medical diagnoses and clinical reasoning, comparing their performance to that of expert clinician panels. The study involved an LLM jury composed of three advanced AI models that scored 3,333 diagnoses from 300 real-world cases in a middle-income country context. The evaluation focused on four dimensions: diagnosis accuracy, differential diagnosis, clinical reasoning, and negative treatment risk. The findings revealed that while uncalibrated LLM scores were generally lower than those from clinician panels, the LLM jury demonstrated better ordinal agreement and concordance with primary expert evaluations than human re-scoring panels. Additionally, the LLM models exhibited a lower probability of severe errors compared to human experts. Calibration of the LLM jury using isotonic regression significantly improved its alignment with expert evaluations. Overall, the results suggest that a calibrated, multi-model LLM jury can serve as a reliable and efficient alternative to expert clinician evaluations in medical AI benchmarking.
Methodology
The study utilized a dataset of 539 medical cases, with expert panels providing primary, secondary, and differential diagnoses. The LLM jury evaluated these cases across four dimensions, and performance was compared against evaluations from expert clinician panels and independent human re-scoring panels. The LLM jury's scoring was calibrated using isotonic regression to enhance alignment with expert evaluations.
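Isotonic regression fits the best monotone (nondecreasing) mapping from raw scores to targets; its core is the pool-adjacent-violators (PAV) algorithm. The plain-Python sketch below shows PAV itself, as a stand-in for whatever calibration implementation the study actually used (scikit-learn's `IsotonicRegression` would be the usual choice):

```python
def pav(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y.

    Maintains a stack of (mean, weight) blocks; whenever a new value
    violates monotonicity, adjacent blocks are pooled into their
    weighted mean until the stack is nondecreasing again.
    """
    means, weights = [], []
    for v in y:
        means.append(float(v))
        weights.append(1.0)
        while len(means) > 1 and means[-2] > means[-1]:
            w = weights[-2] + weights[-1]
            m = (means[-2] * weights[-2] + means[-1] * weights[-1]) / w
            means[-2:] = [m]
            weights[-2:] = [w]
    out = []
    for m, w in zip(means, weights):
        out.extend([m] * int(w))
    return out
```

For example, `pav([1, 3, 2, 4])` pools the violating pair (3, 2) into two 2.5s, yielding a monotone calibration curve.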
Results
The LLM jury's uncalibrated scores were lower than those of expert panels, but it maintained ordinal agreement and showed better concordance with primary expert evaluations. The LLM models had a lower incidence of severe errors compared to human experts, and calibration improved their performance metrics, allowing them to match or outperform human re-score panels across all metrics.
Implications
The findings indicate that LLMs can effectively serve as a scalable and reliable alternative to traditional expert evaluations in medical AI applications, potentially improving the efficiency of medical diagnosis assessments and reducing the burden on human experts.
Catching Every Ripple: Enhanced Anomaly Awareness via Dynamic Concept Adaptation
Time Series
Theory
Efficient ML
- DyMETER integrates parameter shifting and dynamic thresholding for effective online anomaly detection.
- The framework adapts to new concepts without retraining, enhancing efficiency.
- Instance-level concept uncertainty is estimated for robust adaptation.
- Dynamic threshold optimization ensures continuous alignment with evolving data distributions.
Summary
This paper presents DyMETER, a novel framework for online anomaly detection (OAD) that addresses the challenges posed by concept drift in dynamic data environments. Traditional OAD methods often require costly retraining and have rigid decision boundaries, which limit their adaptability. DyMETER overcomes these limitations by integrating on-the-fly parameter shifting and dynamic thresholding into a single online paradigm. Initially, it learns a static detector from historical data to identify recurring concepts, then transitions to a dynamic mode that adapts to new concepts as they emerge. The framework employs a hypernetwork to generate instance-aware parameter shifts, allowing for efficient adaptation without the need for retraining. Additionally, a lightweight evolution controller estimates instance-level concept uncertainty, facilitating robust and interpretable updates. DyMETER also includes a dynamic threshold optimization module that recalibrates decision boundaries based on uncertain samples, ensuring alignment with evolving concepts. Extensive experiments demonstrate that DyMETER significantly outperforms existing OAD methods across various application scenarios, showcasing its effectiveness in real-time anomaly detection.
Methodology
DyMETER employs a two-phase approach: first, it learns a static anomaly detection model from historical data to capture central concepts. Then, it transitions to a dynamic mode where it uses a hypernetwork for instance-aware parameter shifts and an evolution controller for uncertainty estimation. A dynamic threshold optimization module recalibrates decision boundaries based on uncertain samples.
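One simple way to realize a dynamic decision boundary of this kind is a rolling-quantile threshold over recent anomaly scores. The sketch below is a generic stand-in for DyMETER's threshold-optimization module, not its actual mechanism; the window size, quantile, and warm-up count are illustrative assumptions:

```python
from collections import deque

import numpy as np

def make_dynamic_threshold(window=200, q=0.99, warmup=10):
    """Streaming anomaly detector whose threshold is the q-quantile of the
    last `window` scores, so the boundary tracks drifting distributions."""
    buf = deque(maxlen=window)

    def score(s):
        # Flag only after warm-up, when the empirical quantile is meaningful.
        flag = len(buf) >= warmup and float(s) > float(np.quantile(buf, q))
        buf.append(float(s))   # the score joins the window either way
        return bool(flag)

    return score
```

A stream of ordinary scores never trips the detector, a spike well above the recent quantile does, and the boundary re-adapts as the window rolls forward.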
Results
The experimental results indicate that DyMETER outperforms traditional OAD methods across a variety of scenarios, demonstrating its capability to effectively detect anomalies in the presence of concept drift.
Implications
DyMETER's approach to dynamic concept adaptation can be applied in various fields requiring real-time anomaly detection, such as finance, cybersecurity, and industrial monitoring, where data distributions frequently change.
AdaSplash-2: Faster Differentiable Sparse Attention
NLP
Large Language Models
Efficient ML
- ADASPLASH-2 reduces the number of iterations for computing the normalizer τ in α-entmax attention to 1-2.
- The method utilizes a histogram-based initialization stored in on-chip SRAM for efficient computation.
- It outperforms FlashAttention-2 in moderate-to-high sparsity regimes, improving training speed.
- Empirical results show that ADASPLASH-2 matches or exceeds the performance of softmax attention in various tasks.
Summary
The paper presents ADASPLASH-2, an innovative approach to differentiable sparse attention that addresses the computational inefficiencies associated with the α-entmax attention mechanism, particularly in long-context training scenarios. Traditional softmax attention suffers from quadratic time and memory complexity, which limits its scalability. ADASPLASH-2 introduces a histogram-based initialization method that significantly reduces the number of iterations required to compute the normalizer τ, typically to just 1-2 iterations. This is achieved by computing a coarse histogram of attention scores on-the-fly and storing it in on-chip SRAM, allowing for faster forward and backward computations. The implementation also includes a sparsity-aware GPU kernel that efficiently skips zero blocks, resulting in improved training times compared to existing methods like FlashAttention-2, especially in moderate-to-high block sparsity scenarios. Empirical results demonstrate that models trained with ADASPLASH-2 not only match softmax baselines in short-context settings but also achieve substantial performance gains in long-context tasks, highlighting the effectiveness of the proposed method.
Methodology
The authors developed a histogram-based initialization technique for the α-entmax normalizer τ, which allows for fast evaluation without the need for dense intermediate representations. This method is combined with a safeguarded hybrid solver for one-pass refinement and an optimized GPU kernel that exploits dynamic sparsity efficiently.
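The flavor of this initialization can be illustrated in the α = 2 case (sparsemax), where the normalizer τ solves Σ_i max(s_i − τ, 0) = 1: a coarse scan over bins of the score range brackets τ, and a few bisection steps refine it. This numpy sketch captures the "cheap bracket first, few refinement steps after" idea in spirit only; it is not the paper's fused GPU kernel:

```python
import numpy as np

def sparsemax(scores, bins=32):
    """Sparsemax (alpha-entmax with alpha = 2) via a coarse bracket of the
    normalizer tau over histogram-style bin edges, then bisection.
    f(t) = sum_i max(s_i - t, 0) - 1 is continuous and decreasing in t."""
    s = np.asarray(scores, dtype=float)
    f = lambda t: np.maximum(s - t, 0.0).sum() - 1.0
    edges = np.linspace(s.min() - 1.0, s.max(), bins + 1)
    lo, hi = edges[0], edges[-1]        # f(lo) >= 0 >= f(hi) by construction
    for j in range(bins):               # coarse pass: locate the sign change
        if f(edges[j]) >= 0.0 >= f(edges[j + 1]):
            lo, hi = edges[j], edges[j + 1]
            break
    for _ in range(50):                 # refinement (1-2 steps suffice in practice)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return np.maximum(s - 0.5 * (lo + hi), 0.0)
```

On scores `[3.0, 1.0, 0.2]` the threshold lands at τ = 2, so all mass concentrates on the top score and the other two entries are exactly zero, which is the sparsity that lets a kernel skip zero blocks.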
Results
ADASPLASH-2 demonstrated improved per-step training times compared to FlashAttention-2, especially in cases of moderate-to-high block sparsity. The method achieved competitive performance on downstream tasks, matching or outperforming standard softmax attention in both short- and long-context scenarios.
Implications
The advancements made in ADASPLASH-2 could lead to more efficient training of transformer models, particularly in applications requiring long-context processing, such as natural language processing and other sequence-based tasks. This could enable broader use of transformers in real-time applications where computational resources are limited.
Scouting By Reward: VLM-TO-IRL-Driven Player Selection For Esports
Reinforcement Learning
Multimodal
- Introduces a novel application of inverse reinforcement learning for style-based scouting in esports.
- Develops a two-branch architecture that combines telemetry data with tactical commentary for player evaluation.
- Demonstrates that the proposed system can match expert human analysts in player selection accuracy.
- Addresses the scalability issues of traditional scouting methods by automating the evaluation process.
Summary
This paper addresses the challenge of player scouting in esports, particularly in first-person shooters like Counter-Strike 2, where traditional methods rely heavily on qualitative assessments and manual video reviews. The authors propose a novel scouting system that utilizes inverse reinforcement learning (IRL) to derive pro-specific reward functions from gameplay data. This system ranks candidate players based on their fit to existing player archetypes, effectively transforming scouting from a subjective process into a data-driven one. The architecture consists of a two-branch intake that integrates structured telemetry data with tactical commentary from broadcast footage, allowing for a comprehensive evaluation of player styles. The authors validate their approach through empirical studies in collaboration with FNATIC Esports, demonstrating that their IRL-based selector can match expert analysts' judgments while scaling to evaluate a larger pool of players. This work represents a significant advancement in esports analytics by providing a systematic, scalable method for style-based player evaluation.
Methodology
The authors propose a two-branch intake architecture that fuses structured state-action trajectories from in-game telemetry with temporally aligned tactical commentary from broadcast footage. This integrated approach allows the system to learn pro-specific reward functions using inverse reinforcement learning, which are then used to rank candidate players based on their fit to specific player archetypes.
Results
The empirical validation of the proposed scouting system showed that it could effectively match the judgments of expert analysts and outperform simpler baselines that relied solely on telemetry or generic similarity metrics. The results indicate that the IRL-based selector is capable of accurately identifying players who fit desired styles, thus enhancing the scouting process.
Implications
This research has significant implications for esports organizations, as it provides a scalable, data-driven approach to player scouting that can enhance team composition and performance. By automating the evaluation process, teams can more efficiently identify players who fit specific tactical roles, ultimately improving roster construction and competitive success.
Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations
NLP
Large Language Models
Efficient ML
- Identifies the 'Dormant Expert' phenomenon in MoE models due to static Top-k routing.
- Proposes Counterfactual Routing (CoR) as a training-free framework for expert reallocation.
- Demonstrates a 3.1% average improvement in factual accuracy on hallucination benchmarks.
- Maintains the same total activation count during inference, ensuring computational efficiency.
Summary
This paper addresses the issue of hallucinations in Sparse Mixture-of-Experts (MoE) models, which are known for their scalability but struggle with factual accuracy, especially regarding long-tail knowledge. The authors identify that the static Top-k routing mechanism leads to a preference for high-frequency patterns, causing critical 'specialist experts' with long-tail knowledge to remain dormant and underutilized. To mitigate this problem, they propose a novel inference framework called Counterfactual Routing (CoR), which dynamically reallocates computational resources from syntax-dominant to knowledge-intensive layers without increasing the total activation count. CoR employs layer-wise perturbation analysis and the Counterfactual Expert Impact (CEI) metric to awaken dormant experts by focusing on causal necessity rather than correlation. The effectiveness of CoR is demonstrated through extensive experiments on various benchmarks, showing an average improvement of 3.1% in factual accuracy without additional inference costs, thus establishing a superior Pareto frontier compared to traditional static scaling strategies.
Methodology
The methodology involves a training-free inference framework called Counterfactual Routing (CoR), which utilizes layer-wise perturbation analysis to identify knowledge-intensive layers and the Counterfactual Expert Impact (CEI) metric to assess the causal necessity of experts. This allows for dynamic resource reallocation from syntax-dominant layers to those requiring factual knowledge, effectively activating dormant experts.
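The counterfactual idea can be illustrated on a toy MoE layer: an expert's impact is how far the layer output moves when that expert alone is ablated and the gate is renormalized. Everything below (a dense softmax gate, vector-valued experts) is a simplified assumption for illustration, not the paper's CEI formula:

```python
import numpy as np

def counterfactual_expert_impact(gate, experts, x):
    """Toy counterfactual probe for one MoE layer.

    The layer output is a gate-weighted sum of expert outputs; expert i's
    impact is the L2 shift in that output when expert i is removed and the
    remaining gate weights are renormalized (causal necessity, not just
    correlation with the gate score)."""
    w = np.exp(gate) / np.exp(gate).sum()        # softmax gate weights
    outs = np.stack([e(x) for e in experts])     # (n_experts, d)
    base = w @ outs
    impacts = []
    for i in range(len(experts)):
        w_cf = w.copy()
        w_cf[i] = 0.0                            # ablate expert i
        w_cf = w_cf / w_cf.sum()                 # renormalize the gate
        impacts.append(float(np.linalg.norm(base - w_cf @ outs)))
    return impacts
```

An expert the gate favors but whose removal barely moves the output would score low here, which is exactly the correlation-versus-necessity distinction the CEI metric is after.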
Results
The experiments conducted on datasets such as TruthfulQA, FACTOR, and TriviaQA show that CoR improves factual accuracy by an average of 3.1% without increasing the inference budget, outperforming static scaling strategies in terms of efficiency and accuracy.
Implications
The findings suggest that by addressing the routing bottleneck in MoE models, it is possible to enhance the factual accuracy of large language models significantly. This has potential applications in improving the reliability of AI systems in various domains, including information retrieval, conversational agents, and any application relying on accurate knowledge representation.
A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models
NLP
Large Language Models
Efficient ML
- Introduces a gradient-free sensitivity analysis framework for hybrid SSM-Transformer models.
- Demonstrates that KL divergence is a superior metric for quantization sensitivity in language models.
- Validates the proposed method through extensive experiments and real-world profiling.
- Achieves significant model compression with minimal accuracy loss, suitable for edge deployment.
Summary
This paper addresses the challenges of deploying Large Language Models (LLMs) on edge devices, which face significant computational and memory constraints. The authors propose a novel framework for sensitivity analysis that identifies components of hybrid Structured State Space Models (SSMs) and Transformer architectures most affected by quantization. Unlike traditional methods that rely on backpropagation, this approach uses a forward-pass sensitivity analysis, making it efficient and suitable for scenarios with limited access to in-domain data. The study demonstrates that the Kullback-Leibler (KL) divergence metric is a more effective measure of quantization sensitivity for language modeling tasks compared to conventional metrics like mean squared error (MSE) and signal-to-quantization-noise ratio (SQNR). Through extensive experiments, the authors validate that KL-based sensitivity rankings correlate well with performance degradation, enabling a practical deployment strategy for hybrid models on resource-constrained devices with minimal accuracy loss. The framework was tested on Intel Lunar Lake hardware, achieving near-FP16 perplexity while maintaining competitive model sizes and throughput against Uniform INT4 quantization.
Methodology
The authors developed a lightweight, backpropagation-free sensitivity analysis framework that operates solely on forward-pass metrics. This method identifies which components of hybrid SSM-Transformer architectures are most sensitive to quantization, allowing for targeted mixed-precision assignments. The effectiveness of the KL divergence metric was formally analyzed and compared against traditional metrics like MSE and SQNR.
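The forward-only recipe is straightforward to sketch: quantize one block at a time, rerun the forward pass, and rank blocks by the KL divergence between the full-precision and perturbed output distributions. The toy network, symmetric quantizer, and layer structure below are illustrative assumptions, not the paper's SSM-Transformer setup:

```python
import numpy as np

def quantize(w, bits=4):
    """Symmetric uniform quantizer (a toy stand-in for real INT4 kernels)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def kl_sensitivity(x, weights, bits=4):
    """Forward-only sensitivity ranking: quantize one layer at a time and
    score it by mean KL(p_ref || p_quant) over the output distribution.
    No backward pass is ever taken."""
    def forward(ws):
        h = x
        for w in ws:
            h = np.tanh(h @ w)
        z = h - h.max(axis=-1, keepdims=True)    # stable softmax logits
        p = np.exp(z)
        return p / p.sum(axis=-1, keepdims=True)

    p_ref = forward(weights)
    scores = []
    for i in range(len(weights)):
        ws = list(weights)
        ws[i] = quantize(ws[i], bits)            # perturb only layer i
        p_q = forward(ws)
        scores.append(float((p_ref * np.log(p_ref / p_q)).sum(axis=-1).mean()))
    return scores
```

Layers with the largest KL scores would then be kept at higher precision in a mixed-precision assignment, while low-score layers are safe to quantize aggressively.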
Results
The experiments confirmed that the KL-based sensitivity rankings aligned with observed performance drops in hybrid models. The proposed framework enabled the deployment of mixed-precision models that achieved near-FP16 perplexity while being competitive in size and throughput with Uniform INT4 quantization on both CPU and GPU platforms.
Implications
This research provides a pathway for deploying advanced language models on edge devices, addressing the critical need for efficient model compression techniques that maintain performance. The findings can influence future work in model quantization strategies, particularly in resource-constrained environments.
Improving Sparse Autoencoder with Dynamic Attention
Interpretability
Computer Vision
NLP
- Introduction of a transformer-based SAE framework that utilizes cross-attention for coherent concept learning.
- Development of a sparsemax function that dynamically determines the number of active concepts without requiring hyperparameter tuning.
- Demonstration of superior reconstruction results and coherent concept capture compared to traditional SAE methods.
- Validation across various image and text tasks, showcasing the flexibility and efficiency of the proposed approach.
Summary
This paper presents a novel approach to enhance Sparse Autoencoders (SAEs) by integrating dynamic attention mechanisms using sparsemax. The authors address the challenge of determining the optimal level of sparsity for neurons in SAEs, which is critical for balancing reconstruction quality and interpretability. Traditional activation functions like ReLU and TopK often require additional regularization or hyperparameter tuning, which can lead to suboptimal performance. The proposed method utilizes a cross-attention architecture where latent features serve as queries and a learnable dictionary acts as key and value matrices. By employing a sparsemax-based attention strategy, the model can dynamically infer a sparse set of concepts based on the complexity of each neuron, leading to improved activation functions. The authors validate their approach through comprehensive evaluations, demonstrating that it achieves lower reconstruction loss while generating high-quality concepts. Additionally, the sparsity level determined by the model can guide the tuning of existing SAEs, enhancing their performance.
Methodology
The authors propose a transformer-based SAE that employs a cross-attention mechanism, where latent features are treated as queries and a learnable dictionary serves as key and value matrices. The sparsemax function replaces the traditional softmax in the attention operation, allowing for dynamic determination of active concepts based on the complexity of each input sample.
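As a concrete reference point, sparsemax has an exact sort-based solution, and plugging it into a single cross-attention head where a learnable dictionary supplies both keys and values looks roughly like this. The shapes, scaling, and return values are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def sparsemax(z):
    """Closed-form sparsemax via sorting: project z onto the simplex so
    that entries below the threshold tau become exactly zero."""
    zs = np.sort(z)[::-1]
    css = np.cumsum(zs)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * zs > css           # holds for k = 1..k_star contiguously
    k_star = k[support][-1]
    tau = (css[k_star - 1] - 1) / k_star
    return np.maximum(z - tau, 0.0)

def dictionary_cross_attention(query, dictionary):
    """One attention head where the dictionary plays both K and V and
    softmax is replaced by sparsemax, so the number of active concepts
    is decided per input rather than fixed as in TopK."""
    d = query.shape[-1]
    scores = dictionary @ query / np.sqrt(d)   # (num_concepts,)
    weights = sparsemax(scores)                # sparse concept activations
    return weights @ dictionary, weights       # reconstruction, attributions
```

Because sparsemax zeroes low-scoring concepts exactly, the returned weights double as an interpretable attribution: only the concepts the input actually needs carry nonzero mass.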
Results
The proposed method shows significant improvements in reconstruction loss and concept quality compared to existing SAE models. The dynamic nature of the sparsemax function allows for a more accurate assignment of concepts, with the model adapting to the complexity of the input data.
Implications
This work has potential implications for enhancing the interpretability and performance of machine learning models, particularly in applications requiring sparse representations of data, such as image and text processing. The dynamic attention mechanism can be applied to various tasks where understanding the underlying concepts is crucial.
MAny: Merge Anything for Multimodal Continual Instruction Tuning
Multimodal
Large Language Models
Efficient ML
- Identification of a dual-forgetting phenomenon in MLLMs affecting both perception and reasoning.
- Introduction of Cross-modal Projection Merging (CPM) for adaptive merging of visual features.
- Development of Low-rank Parameter Merging (LPM) using Recursive Least Squares for optimal parameter merging.
- MAny achieves state-of-the-art performance on UCIT and MLLM-DCL benchmarks without GPU training.
Summary
The paper addresses the challenge of Multimodal Continual Instruction Tuning (MCIT) for Multimodal Large Language Models (MLLMs), which often suffer from catastrophic forgetting. The authors identify a dual-forgetting phenomenon that occurs in both the Cross-modal Projection Space and the Low-rank Parameter Space, which has been overlooked in existing literature. To tackle this issue, they propose a novel framework called MAny (Merge Anything) that employs two key strategies: Cross-modal Projection Merging (CPM) and Low-rank Parameter Merging (LPM). CPM focuses on recovering perceptual alignment by merging task-specific visual representations using visual-prototype guidance, while LPM minimizes interference among low-rank modules through a recursive merging approach. Notably, MAny operates without the need for additional training, relying instead on efficient CPU-based algebraic operations. The framework demonstrates superior performance across multiple benchmarks, achieving significant improvements in accuracy compared to state-of-the-art methods.
Methodology
The authors developed MAny, which includes Cross-modal Projection Merging (CPM) to adaptively merge visual representations and Low-rank Parameter Merging (LPM) to minimize interference among low-rank modules. CPM utilizes visual-prototype guidance for perceptual alignment, while LPM employs a recursive least squares algorithm to ensure optimal merging of parameters. The approach is designed to be training-free, relying on efficient CPU-based operations.
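The recursive least squares machinery that LPM builds on can be sketched in isolation: each new observation is folded into the running least-squares solution with a rank-one update, with no refitting from scratch, which is what makes a CPU-only merging pass cheap. This is the textbook RLS update, not the paper's merging procedure itself:

```python
import numpy as np

def rls_update(theta, P, x, y, lam=1.0):
    """One recursive-least-squares step.

    Folds the observation (x, y) into the running solution of
    min ||X theta - Y||^2 using the rank-one inverse update, instead of
    re-solving the normal equations. lam < 1 would add exponential
    forgetting; lam = 1 is plain least squares."""
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    theta = theta + k * (y - x @ theta)  # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam      # update the inverse Gram matrix
    return theta, P
```

Initializing `P` to a large multiple of the identity makes the recursion converge to the ordinary least-squares solution as observations stream in.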
Results
MAny demonstrated significant improvements in performance on the UCIT benchmark, achieving up to 8.57% and 2.85% higher final average accuracy compared to existing state-of-the-art methods across two different MLLMs.
Implications
The findings suggest that addressing both perceptual and reasoning aspects in MLLMs can enhance their adaptability to sequential tasks. The lightweight nature of MAny makes it suitable for deployment in real-world applications where computational resources are limited.
One-shot learning for the complex dynamical behaviors of weakly nonlinear forced oscillators
Theory
Efficient ML
- Introduction of a one-shot learning method for nonlinear frequency-response curves.
- Development of MEv-SINDy for non-autonomous multi-frequency dynamics.
- Utilization of Generalized Harmonic Balance for complex response decomposition.
- Validation on MEMS applications showing accurate predictions across excitation levels.
Summary
This paper presents a novel one-shot learning method aimed at identifying the global nonlinear frequency-response curves of weakly nonlinear forced oscillators using a single excitation time history. The authors introduce MEv-SINDy (Multi-frequency Evolutionary Sparse Identification of Nonlinear Dynamics), which enhances the equation learning framework from autonomous single-frequency dynamics to non-autonomous multi-frequency dynamics by integrating the Generalized Harmonic Balance (GHB) method. This approach allows for the decomposition of complex forced responses into slow-varying evolution equations, facilitating the extraction of governing equations. The methodology is validated through applications on two critical Micro-Electro-Mechanical Systems (MEMS): a nonlinear beam resonator and a MEMS micromirror. The results demonstrate that the model can accurately predict softening/hardening effects and jump phenomena across a wide range of excitation levels, significantly reducing the data acquisition burden for the characterization and design of nonlinear microsystems.
Methodology
The authors developed MEv-SINDy, which employs the Generalized Harmonic Balance method to analyze and decompose the dynamics of weakly nonlinear forced oscillators. This method allows for the inference of governing equations from a single excitation time history, facilitating the identification of frequency-response curves without extensive data collection.
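The sparse-regression engine underneath SINDy-style equation learning is sequentially thresholded least squares: fit, zero out small coefficients, and refit on the surviving library terms. A minimal sketch on a known one-dimensional system follows; the candidate library and threshold are illustrative, and MEv-SINDy's GHB decomposition layers on top of this core:

```python
import numpy as np

def stlsq(library, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares, the sparse solver behind
    SINDy: alternate a least-squares fit with hard-thresholding of small
    coefficients, refitting only on the surviving columns."""
    xi = np.linalg.lstsq(library, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(library[:, big], dxdt, rcond=None)[0]
    return xi
```

Given trajectory data from dx/dt = -2x and a library of monomials [x, x², x³], the solver recovers the single active term with coefficient -2 and zeroes the rest.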
Results
The proposed methodology was successfully validated on two MEMS devices, demonstrating its ability to accurately predict complex nonlinear behaviors such as softening and hardening effects, as well as jump phenomena, across various excitation levels. This indicates a significant improvement in the efficiency of data acquisition and modeling for nonlinear systems.
Implications
The findings suggest that the one-shot learning approach can greatly enhance the design and characterization processes of nonlinear microsystems, making it feasible to predict complex dynamical behaviors with minimal data. This could lead to advancements in various engineering applications, particularly in MEMS technology.
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
Reinforcement Learning
Optimization
Large Language Models
- Introduces STOMP, a novel offline RL algorithm for multi-objective optimization.
- Utilizes smooth Tchebysheff scalarization to effectively capture non-convex regions of the Pareto front.
- Demonstrates superior performance over existing methods in protein engineering tasks.
- Addresses the limitations of linear reward scalarization in multi-objective RL.
Summary
This paper addresses the challenge of aligning large language models with human preferences through offline reinforcement learning (RL) in multi-objective settings. Traditional approaches often rely on linear reward scalarization, which fails to capture non-convex regions of the Pareto front, crucial for optimizing conflicting objectives. The authors propose a novel algorithm, Smooth Tchebysheff Optimization of Multi-Objective Preferences (STOMP), which utilizes smooth Tchebysheff scalarization to frame multi-objective RL as an optimization problem. This method dynamically standardizes individual rewards based on their observed distributions, thus overcoming the limitations of linear scalarization. The effectiveness of STOMP is empirically validated through experiments on protein engineering tasks, where it aligns autoregressive protein language models with multiple fitness objectives. The results demonstrate that STOMP outperforms state-of-the-art baselines in achieving higher hypervolumes in various evaluation settings, indicating its robustness and potential for improving multi-attribute optimization tasks.
Methodology
The authors frame multi-objective RL as an optimization problem and apply smooth Tchebysheff scalarization to derive STOMP. This approach standardizes individual rewards based on their distributions in an offline dataset, allowing for effective optimization of multiple conflicting objectives without the need for manual scaling of rewards.
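The scalarization itself is compact: the hard Tchebysheff objective takes the worst weighted gap to a reference point, and smoothing replaces that max with a temperature-controlled log-sum-exp so the objective is differentiable. This is a minimal sketch of the scalarizer alone; the reference point, weights, and temperature are illustrative, and STOMP's reward standardization and RL training loop are not shown:

```python
import numpy as np

def smooth_tchebysheff(rewards, weights, ref_point, mu=0.1):
    """Smooth Tchebysheff scalarization of a reward vector.

    Hard Tchebysheff: max_i w_i * (z_i - r_i) for reference point z.
    The max is replaced by a log-sum-exp with temperature mu, computed
    in a numerically stable form (subtracting the max before exp)."""
    gaps = weights * (ref_point - rewards)    # per-objective shortfall
    m = gaps.max()
    return m + mu * np.log(np.exp((gaps - m) / mu).sum())
```

As mu shrinks the smooth value approaches the hard max from above (it always lies within mu·log(n) of it), which is why minimizing it can reach non-convex parts of the Pareto front that linear scalarization misses.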
Results
STOMP was empirically validated on protein engineering tasks, achieving the highest hypervolumes in eight out of nine evaluation settings compared to state-of-the-art baselines. This indicates that STOMP is effective in aligning models with multiple objectives, significantly improving performance in offline RL scenarios.
Implications
The proposed STOMP algorithm has significant implications for various applications requiring multi-objective optimization, such as protein engineering, chatbot development, and other domains where conflicting objectives must be balanced. It provides a robust framework for enhancing the alignment of large language models with human preferences.
CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization
Optimization
Theory
Efficient ML
- The generalization error of the Lion optimizer is established as O(1/(NτT)).
- CLion optimizer improves generalization error to O(1/N).
- CLion demonstrates a fast convergence rate of O(√d/T^(1/4)).
- The study provides a rigorous analysis of the generalization properties of learning-based optimizers.
Summary
The paper introduces the Cautious Lion (CLion) optimizer, an enhancement of the Lion optimizer, which is known for its effectiveness in training deep learning models. While the Lion optimizer has been previously studied for its convergence properties, its generalization capabilities were not well understood. This work fills that gap by analyzing the generalization error of the Lion optimizer, proving it to be O(1/(NτT)), where N is the training sample size, τ is the smallest non-zero element in the gradient estimator, and T is the total number of iterations. The authors also demonstrate that the SignSGD algorithm shares the same generalization error. To improve generalization, the CLion optimizer is proposed, which utilizes a cautious approach to the sign function, resulting in a lower generalization error of O(1/N). Additionally, the paper establishes that CLion achieves a fast convergence rate of O(√d/T^(1/4)) under the ℓ1-norm of the gradient for nonconvex stochastic optimization problems. Extensive numerical experiments validate the effectiveness of the CLion optimizer compared to existing methods.
Methodology
The authors utilize mathematical induction to analyze the generalization properties of the Lion optimizer, proving its generalization error. They then design the CLion optimizer by cautiously applying the sign function, enhancing its generalization capabilities. The convergence properties of CLion are also studied, leading to theoretical proofs regarding its performance in nonconvex stochastic optimization.
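The "cautious" modification can be sketched directly on a single parameter tensor: compute Lion's sign update, then zero it wherever it disagrees in sign with the current gradient. The hyperparameters and the exact masking rule below are illustrative assumptions about how the cautious sign function might look, not necessarily the paper's definition:

```python
import numpy as np

def clion_step(w, g, m, lr=1e-3, b1=0.9, b2=0.99, wd=0.0):
    """One cautious-Lion step on a parameter tensor w with gradient g
    and momentum m.

    Lion proposes the update direction sign(b1*m + (1-b1)*g); the cautious
    mask keeps only coordinates where that direction agrees in sign with
    the current gradient, suppressing updates that would move against it."""
    u = np.sign(b1 * m + (1 - b1) * g)     # Lion's sign update direction
    mask = (u * g > 0).astype(w.dtype)     # cautious: keep aligned coords only
    w_new = w - lr * (u * mask + wd * w)   # masked update plus weight decay
    m_new = b2 * m + (1 - b2) * g          # momentum update, as in Lion
    return w_new, m_new
```

In the test below the second coordinate's sign update opposes its gradient, so the cautious mask leaves that parameter untouched while the first coordinate takes a full Lion step.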
Results
The paper proves that the Lion optimizer has a generalization error of O(1/(NτT)) and that CLion achieves a lower generalization error of O(1/N). The convergence rate of CLion is shown to be O(√d/T^(1/4)), indicating improved efficiency in training. Numerical experiments demonstrate that CLion outperforms existing optimization algorithms.
Implications
The findings suggest that the CLion optimizer can be effectively used in training large-scale deep learning models, potentially leading to better generalization and faster convergence in various machine learning applications. This work also contributes to the theoretical understanding of learning-based optimization algorithms, paving the way for future research in this area.
Zeroth-Order Optimization at the Edge of Stability
Optimization
Theory
- Introduces a mean-square linear stability theory for Zeroth-Order optimization methods.
- Establishes that ZO methods' stability is influenced by the entire Hessian spectrum, unlike First-Order methods.
- Derives tractable stability bounds based on the largest eigenvalue and Hessian trace.
- Empirical evidence shows ZO methods operate at the edge of stability across various deep learning tasks.
Summary
This paper investigates the dynamics of Zeroth-Order (ZO) optimization methods, which are essential in scenarios where gradients are unavailable or costly to compute, such as black-box learning and fine-tuning large models. The authors establish a step size condition that captures the mean-square linear stability of ZO methods, revealing a significant difference from First-Order (FO) methods, where stability is influenced solely by the largest Hessian eigenvalue. In contrast, ZO methods' stability is determined by the entire Hessian spectrum. Given the impracticality of computing the full Hessian spectrum during neural network training, the authors derive stability bounds based on the largest eigenvalue and the Hessian trace. Empirical results demonstrate that full-batch ZO methods operate at the edge of stability, with ZO-GD, ZO-GDM, and ZO-Adam stabilizing near the predicted stability boundary across various deep learning tasks. The findings suggest an implicit regularization effect unique to ZO methods, where larger step sizes primarily regularize the Hessian trace, differing from FO methods that regularize the top eigenvalue.
Methodology
The authors develop a theoretical framework for analyzing the stability of ZO optimization methods, focusing on mean-square linear stability. They derive conditions for stability based on the Hessian spectrum and conduct empirical experiments to validate their theoretical findings using various ZO methods on deep learning tasks.
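The class of methods under study can be sketched with a two-point Gaussian-smoothing gradient estimate, a common ZO construction (the paper's exact estimator, smoothing scale, and number of directions are assumptions here):

```python
import numpy as np

def zo_grad(f, x, mu=1e-3, n_dirs=10, rng=None):
    """Two-point Gaussian-smoothing gradient estimate (sketch of a standard
    ZO estimator; not necessarily the paper's exact construction)."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)                  # random search direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

def zo_gd_step(f, x, lr, **kw):
    """One ZO-GD step using only function evaluations."""
    return x - lr * zo_grad(f, x, **kw)
```

On a quadratic the directional finite difference is exact, so with many directions the estimate concentrates around the true gradient.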
Results
The study finds that ZO methods, specifically ZO-GD, ZO-GDM, and ZO-Adam, consistently stabilize near the predicted stability boundary. The results indicate that the stability of ZO methods is governed by the Hessian trace rather than the largest eigenvalue, showcasing a different dynamic compared to FO methods.
Implications
The findings have significant implications for the design and application of ZO optimization methods in machine learning, particularly in scenarios where gradient computation is infeasible. Understanding the stability dynamics can lead to improved performance and efficiency in training large models.
Tight Sample Complexity Bounds for Best-Arm Identification Under Bounded Systematic Bias
Theory
Optimization
Robotics
- Introduces a localized Best-Arm Identification framework for node expansion under bounded systematic bias.
- Establishes tight sample complexity bounds for safe node elimination, highlighting the importance of the empirical reward gap.
- Presents the PAC-MCTS algorithm as a practical implementation of the theoretical findings.
- Demonstrates through experiments that the proposed method effectively preserves optimal paths while managing computational costs.
Summary
This paper addresses the challenges of Best-Arm Identification (BAI) in the context of bounded systematic bias, particularly relevant in autonomous reasoning and embodied planning where the action space expands significantly. The author frames the node expansion process as a localized BAI problem, introducing a sample complexity bound of O((∆ − 4L)^(−2)), which indicates that safe node elimination is feasible only when the empirical reward gap exceeds 4L. The paper also presents an information-theoretic lower bound of Ω((∆ − 2L)^(−2)) to establish the limits of biased search. The proposed PAC-MCTS algorithm incorporates these theoretical insights into a practical pruning mechanism that dynamically manages the active frontier of candidate nodes. Experimental evaluations demonstrate that adhering to the derived safety boundaries preserves optimal trajectories while maximizing sample allocation efficiency, validating the theoretical bounds established.
Methodology
The paper formulates the node expansion as a BAI problem and derives sample complexity bounds using the Lambert W function. It introduces the PAC-MCTS algorithm, which dynamically updates empirical means and confidence intervals to ensure safe pruning of suboptimal nodes while maintaining a localized PAC safety requirement.
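The elimination test implied by the bound can be sketched as follows, pairing the 4L safety margin with per-node confidence radii (Hoeffding-style radii and the `delta` parameter are illustrative choices, not the paper's exact rule):

```python
import math

def safe_to_eliminate(mu_best, mu_i, n_best, n_i, bias_bound, delta=0.05):
    """Eliminate node i only when the empirical gap exceeds 4L plus both
    confidence radii (sketch; Hoeffding-style radii are an assumption)."""
    rad = lambda n: math.sqrt(math.log(2 / delta) / (2 * n))  # confidence radius after n samples
    return (mu_best - mu_i) > 4 * bias_bound + rad(n_best) + rad(n_i)
```

With a large empirical gap the node is pruned; a gap below the 4L threshold can never trigger elimination, no matter how many samples are drawn.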
Results
The theoretical results confirm that safe pruning is possible only when the effective gap exceeds the bias threshold. The PAC-MCTS algorithm successfully identifies optimal nodes in synthetic environments, validating the proposed sample complexity bounds and demonstrating improved sample allocation efficiency.
Implications
The findings have significant implications for enhancing decision-making processes in autonomous systems, particularly in scenarios where systematic biases are present, such as in the deployment of large language models in complex reasoning tasks.
Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning
NLP
Large Language Models
Optimization
- Parameter importance in supervised fine-tuning is dynamic, not static.
- Evolving Parameter Isolation (EPI) adapts isolation masks based on online gradient estimates.
- EPI improves stability and generalization in multi-task learning scenarios.
- The framework effectively balances the retention of established knowledge with the acquisition of new capabilities.
Summary
This paper addresses the challenges of Supervised Fine-Tuning (SFT) in large language models, particularly the issues of task interference and catastrophic forgetting. Traditional methods of parameter isolation assume that the importance of parameters remains static, which is contradicted by the authors' empirical findings that parameter importance actually evolves over the course of training. To tackle this issue, the authors propose a novel framework called Evolving Parameter Isolation (EPI), which dynamically updates isolation decisions based on real-time estimates of parameter importance. EPI utilizes gradient-based signals to periodically adjust isolation masks, allowing the model to protect newly critical parameters while releasing those that have become redundant. The authors conduct extensive experiments across various multi-task benchmarks, demonstrating that EPI significantly reduces interference and forgetting compared to both static isolation methods and standard fine-tuning approaches, while also enhancing overall generalization. The findings underscore the necessity of aligning isolation strategies with the dynamic nature of learning in diverse task environments.
Methodology
The EPI framework employs an online importance estimation mechanism that continuously monitors gradient-based signals to track parameter sensitivity. It combines temporal smoothing with layer-wise normalization to dynamically update isolation masks, creating a 'moving shield' that adapts to the evolving importance of parameters throughout the training process.
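A minimal sketch of the two ingredients, assuming exponentially smoothed squared gradients as the importance signal and a fixed freeze fraction (both illustrative choices; the paper's exact estimator and thresholding may differ):

```python
import numpy as np

def update_importance(imp_ema, grad, beta=0.9):
    """Temporally smoothed (EMA) squared-gradient importance signal (sketch)."""
    return beta * imp_ema + (1 - beta) * grad ** 2

def isolation_mask(imp_by_layer, freeze_frac=0.2):
    """Per layer, normalize importance and freeze the top fraction (1 = frozen)."""
    masks = {}
    for name, imp in imp_by_layer.items():
        norm = imp / (imp.mean() + 1e-12)                # layer-wise normalization
        thresh = np.quantile(norm, 1 - freeze_frac)
        masks[name] = (norm >= thresh).astype(np.float32)
    return masks
```

Recomputing the mask periodically from the evolving EMA gives the "moving shield": parameters are frozen while critical and released when their importance decays.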
Results
Experiments show that EPI consistently outperforms standard SFT and static isolation baselines across diverse benchmarks, leading to reduced task interference and catastrophic forgetting, while improving overall model generalization.
Implications
The findings suggest that adapting isolation strategies in real-time can enhance the performance of large language models in multi-task settings, potentially leading to more effective applications in areas requiring dynamic learning capabilities.
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
Large Language Models
NLP
Theory
- LongCoT is a novel benchmark for evaluating long-horizon reasoning in language models.
- The benchmark consists of 2,500 expert-designed problems across multiple domains.
- Current top models achieve less than 10% accuracy on LongCoT, highlighting significant reasoning limitations.
- The problems require navigating complex interdependencies, emphasizing the need for planning and error management.
Summary
The paper introduces LongCoT, a benchmark designed to evaluate the long-horizon chain-of-thought (CoT) reasoning capabilities of advanced language models. As language models are increasingly utilized for complex tasks, their ability to reason over extended sequences of interdependent steps is crucial. LongCoT comprises 2,500 expert-designed problems across various domains, including chemistry, mathematics, computer science, chess, and logic. Each problem requires navigating a complex graph of interdependent reasoning steps, with the aim of isolating and measuring the models' long-horizon reasoning abilities. The benchmark reveals that even the best-performing models, such as GPT 5.2 and Gemini 3 Pro, achieve less than 10% accuracy, indicating significant limitations in their reasoning capabilities over long chains of thought. The authors emphasize the importance of this benchmark in assessing and improving the reasoning abilities of future models, as current benchmarks fail to adequately stress-test these capabilities.
Methodology
The authors developed LongCoT by designing a set of 2,500 problems that require long-horizon reasoning across various domains. Each problem consists of a short prompt with a verifiable answer, necessitating the navigation of a graph of interdependent reasoning steps. The problems are constructed using domain-specific templates that allow for scalable question generation while ensuring that each step is tractable in isolation. This design isolates failures in reasoning to long-horizon capabilities rather than single-step difficulties.
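The template idea can be illustrated with a toy generator in which every step consumes the previous result, so a single-step error corrupts the whole chain (an illustrative analogue of the design, not an actual LongCoT item or template):

```python
import random

def make_chain_problem(n_steps, seed=0):
    """Generate a toy long-horizon problem: each step applies a simple,
    individually easy operation to the previous value, so any single-step
    error propagates to the final answer (sketch, not a LongCoT template)."""
    rng = random.Random(seed)
    value, steps = rng.randint(1, 9), []
    for _ in range(n_steps):
        op, k = rng.choice(["+", "*"]), rng.randint(2, 5)
        value = value + k if op == "+" else value * k
        steps.append(f"{op}{k}")
    prompt = "Start with the initial value, then apply in order: " + ", ".join(steps)
    return prompt, value   # prompt plus a verifiable ground-truth answer
```

Each individual operation is trivial; only the length of the dependency chain makes the problem hard, which is the property the benchmark isolates.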
Results
The best-performing models, including GPT 5.2 and Gemini 3 Pro, achieved accuracies of 9.8% and 6.1%, respectively, on the LongCoT benchmark. These results indicate a substantial gap in the current capabilities of language models to perform long-horizon reasoning, as they struggle with the complexity and interdependencies of the tasks presented.
Implications
LongCoT provides a critical framework for assessing and improving the reasoning capabilities of language models, particularly as they are deployed in more complex and autonomous tasks. The benchmark can guide future research and development efforts aimed at enhancing the reasoning abilities of AI systems, potentially leading to more reliable and effective applications in various fields.
Drowsiness-Aware Adaptive Autonomous Braking System based on Deep Reinforcement Learning for Enhanced Road Safety
Reinforcement Learning
Robotics
Time Series
- Integration of real-time drowsiness detection into an autonomous braking system.
- Utilization of ECG signals for accurate drowsiness monitoring.
- Development of a Double Dual Deep Q-Network (DD-DQN) for adaptive braking policies.
- Achieved a 99.99% success rate in avoiding accidents in both drowsy and non-drowsy scenarios.
Summary
This paper presents a novel autonomous braking system that incorporates driver drowsiness detection using deep reinforcement learning (DRL) to enhance road safety. Recognizing that drowsiness significantly impairs a driver's ability to judge safe braking distances, the authors propose a system that integrates physiological data, specifically ECG signals, to detect drowsiness in real-time. The system utilizes a Double Dual Deep Q-Network (DD-DQN) agent that learns adaptive braking policies based on vehicle dynamics, traffic conditions, and the driver's cognitive state. The study includes an exhaustive benchmark analysis of different ECG window segmentation configurations to optimize drowsiness detection. The proposed RNN model effectively predicts drowsiness, which is then used to simulate delayed driver reactions in the DQN agent. The framework is evaluated in a high-fidelity simulation environment, demonstrating a remarkable 99.99% success rate in maintaining safe distances and avoiding collisions under both drowsy and alert conditions. This work represents a significant advancement in integrating physiological monitoring with autonomous driving systems, aiming to improve safety on the roads.
Methodology
The authors developed a drowsiness-aware braking system using a Double Dual Deep Q-Network (DD-DQN) agent. They employed a Recurrent Neural Network (RNN) to process ECG-derived features for real-time drowsiness detection. The system was trained in a simulation environment that mimicked real-world driving conditions, incorporating various configurations for ECG signal segmentation to optimize detection accuracy. The drowsiness state was integrated into the DQN's observable state space, simulating delayed actions to reflect impaired driver responses.
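The simulated delayed-reaction mechanism can be sketched as a wrapper that buffers chosen actions for k steps while the drowsiness flag is set (k, the no-op action, and the exact buffering rule are illustrative assumptions):

```python
from collections import deque

class DrowsyActionDelay:
    """Delay the agent's actions by k steps while the driver is drowsy,
    emitting a no-op until the buffer fills (sketch of the simulated
    impaired-reaction mechanism; parameters are illustrative)."""
    def __init__(self, k=2, noop=0):
        self.k, self.noop = k, noop
        self.buf = deque()

    def step(self, action, drowsy):
        if not drowsy:              # alert: act immediately, drop stale actions
            self.buf.clear()
            return action
        self.buf.append(action)     # drowsy: actions take effect k steps late
        return self.buf.popleft() if len(self.buf) > self.k else self.noop
```
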
Results
The DD-DQN agent achieved a 99.99% success rate in maintaining safe following distances and avoiding collisions during testing. Across more than 30,000 seconds of simulation, only 0.9 seconds of cumulative safe-distance violations occurred, all during drowsy states, indicating the agent's robustness when adapting to impaired driving conditions.
Implications
This research has significant implications for the development of intelligent driving systems that can enhance road safety by integrating physiological monitoring. The approach could lead to more adaptive and responsive vehicle control systems that account for driver states, potentially reducing accident rates caused by drowsiness.
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
NLP
Large Language Models
Reinforcement Learning
Optimization
- Introduction of Contribution-Weighted GRPO (CW-GRPO) framework for search agents.
- Reframing process supervision as advantage reallocation based on round contributions.
- Demonstrated significant performance improvements over standard GRPO.
- Empirical evidence shows concentrated contributions in informative search rounds.
Summary
This paper presents Contribution-Weighted Group Relative Policy Optimization (CW-GRPO), a novel framework designed to enhance the performance of Large Language Model (LLM)-based search agents. Traditional reinforcement learning methods for training search agents face significant challenges, particularly in terms of unstable value estimation and credit assignment due to sparse rewards. CW-GRPO addresses these issues by integrating process supervision into the optimization process. Instead of optimizing process rewards directly, CW-GRPO utilizes an LLM judge to evaluate the utility and correctness of each search round, generating contribution scores that inform the redistribution of outcome-based advantages. This approach allows for more precise credit assignment while maintaining optimization stability. The empirical results demonstrate that CW-GRPO outperforms standard Group Relative Policy Optimization (GRPO) by 5.0% on the Qwen3-8B benchmark and 6.3% on Qwen3-1.7B, indicating improved search behaviors. The findings also reveal that successful search trajectories tend to concentrate contributions in informative rounds, providing insights into effective search strategies.
Methodology
The CW-GRPO framework employs an LLM judge to assess the retrieval utility and reasoning correctness at each search round, producing per-round contribution scores. These scores are then used to rescale outcome-based advantages along the trajectory, allowing for fine-grained credit assignment without compromising optimization stability.
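The reallocation step can be sketched as follows, assuming contribution scores are normalized into weights and rescaled so the trajectory's mean advantage is preserved (the normalization scheme is an assumption; only the weighting-by-contributions idea comes from the summary):

```python
import numpy as np

def reallocate_advantage(outcome_adv, contrib_scores):
    """Spread a trajectory-level advantage over search rounds in proportion to
    judge-assigned contribution scores (sketch; normalization is an assumption)."""
    c = np.asarray(contrib_scores, dtype=float)
    w = c / c.sum() if c.sum() > 0 else np.full_like(c, 1.0 / len(c))
    return outcome_adv * w * len(c)   # rescaled so the mean per-round advantage is preserved
```

Rounds the judge rates as informative receive a larger share of the outcome advantage, while the overall optimization signal stays on the same scale as standard GRPO.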
Results
CW-GRPO achieved performance improvements of 5.0% on the Qwen3-8B benchmark and 6.3% on the Qwen3-1.7B benchmark compared to standard GRPO, indicating more effective search behaviors. The analysis of successful trajectories revealed that contributions to task success are highly concentrated in informative rounds.
Implications
The CW-GRPO framework could enhance the effectiveness of LLM-based search agents in various knowledge-intensive applications, improving their ability to retrieve and utilize real-time information. This could lead to advancements in fields such as information retrieval, question answering, and interactive AI systems.
Unsupervised domain transfer: Overcoming signal degradation in sleep monitoring by increasing scoring realism
Time Series
- The study investigates the potential of unsupervised domain transfer for sleep monitoring amidst signal degradation.
- A discriminator-guided approach is proposed to enhance the realism of hypnograms, which can improve scoring accuracy.
- The unsupervised method improves performance across a range of signal-distortion scenarios and does not degrade it in any tested case.
- Real-life application of the method revealed limited benefits, indicating the need for further refinement.
Summary
This paper explores the use of unsupervised domain transfer techniques to enhance sleep monitoring by addressing signal degradation issues. The authors propose a method that combines a pretrained 'u-sleep' model with a discriminator network to align features from a target domain with those learned during pretraining. The study investigates how 'realism' in hypnograms can guide the adaptation process to various types of signal distortions encountered in mobile sleep monitoring. The results indicate that the unsupervised approach can improve performance metrics, such as Cohen's kappa, by up to 0.29 depending on the distortion type, without decreasing performance in any case. However, the method does not achieve the theoretical optimal performance and shows limited benefits when applied to real-life domain mismatches. The findings suggest that 'discriminator-guided fine tuning' holds promise for improving sleep monitoring systems, although further development is needed before practical implementation.
Methodology
The authors utilized an adversarial learning framework that includes a sleep scorer and a discriminator network. The sleep scorer is trained to accurately classify sleep stages while simultaneously fooling the discriminator into believing that the hypnograms from the target domain are similar to those from the source domain. This approach allows the model to adapt to signal degradations without requiring ground truth labels from the target domain.
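The scorer's objective can be sketched as a supervised term plus an adversarial term that is small when the discriminator rates target-domain hypnograms as source-like (the log-loss form and the weight `lam` are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def scorer_loss(ce_source, d_target_probs, lam=0.1):
    """Supervised scoring loss on the source domain plus an adversarial term
    rewarding target hypnograms the discriminator rates as source-like
    (sketch; lam and the loss form are illustrative)."""
    d = np.clip(np.asarray(d_target_probs, dtype=float), 1e-7, 1 - 1e-7)
    return ce_source + lam * (-np.log(d)).mean()   # d = discriminator's P(source) on target hypnograms
```

Minimizing this pushes the scorer to produce target-domain hypnograms the discriminator cannot distinguish from source-domain ones, without any target labels.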
Results
The unsupervised domain transfer method improved Cohen's kappa scores by as little as 0.03 and up to 0.29, depending on the type of signal distortion. The method maintained performance across all tested scenarios but did not reach the estimated theoretical optimal performance. In real-life applications, the benefits were found to be insignificant.
Implications
The findings suggest that adversarial domain transfer techniques can be a viable approach for enhancing sleep monitoring systems, particularly in dealing with signal degradation in real-world settings. However, further research and development are necessary to optimize the method for practical use in clinical environments.
Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting
Time Series
- TSFMs generally outperform task-specific models in probabilistic electricity price forecasting.
- Well-configured task-specific models can achieve performance close to or surpassing TSFMs under certain conditions.
- The study emphasizes the importance of balancing computational efficiency with forecasting accuracy.
- Probabilistic forecasts are crucial for managing risks in electricity markets with high renewable energy integration.
Summary
This paper investigates the trade-off between performance and computational efficiency in probabilistic electricity price forecasting (PEPF) using foundation models, particularly in the context of European electricity markets characterized by high renewable energy integration. The authors compare four forecasting models: a deterministic NHITS backbone with Quantile-Regression Averaging (NHITS+QRA), a conditional Normalizing-Flow forecaster (NF), and two Time Series Foundation Models (TSFMs), Moirai and ChronosX. The study finds that TSFMs generally outperform task-specific models in terms of Continuous Ranked Probability Score (CRPS), Energy Score, and predictive interval calibration across various market conditions. However, well-configured task-specific models like NHITS+QRA can achieve comparable performance, and in certain scenarios, they can even surpass TSFMs when enhanced with additional features or adapted through few-shot learning. The findings suggest that while TSFMs provide advanced modeling capabilities, traditional models remain competitive, highlighting the importance of balancing computational costs with performance gains in PEPF.
Methodology
The authors benchmarked four forecasting models: NHITS+QRA, a conditional Normalizing-Flow forecaster, and two TSFMs (Moirai and ChronosX) in the context of day-ahead probabilistic electricity price forecasting in European markets. They assessed model performance using metrics such as CRPS, Energy Score, and predictive interval calibration under various market conditions.
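CRPS for quantile forecasts is commonly approximated by averaging the pinball (quantile) loss over the predicted quantile levels; a minimal sketch (the factor of 2 follows the standard pinball-based approximation; the paper's exact evaluation protocol is not specified here):

```python
import numpy as np

def crps_from_quantiles(y, q_levels, q_preds):
    """Approximate CRPS as twice the average pinball loss over quantile levels
    (standard approximation; not necessarily the paper's exact computation)."""
    q_levels = np.asarray(q_levels, dtype=float)
    q_preds = np.asarray(q_preds, dtype=float)
    diff = y - q_preds
    pinball = np.maximum(q_levels * diff, (q_levels - 1) * diff)  # quantile loss per level
    return 2.0 * pinball.mean()
```

A perfect forecast at every quantile yields zero; miscalibrated or wide predictive distributions increase the score.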
Results
The results indicate that TSFMs consistently outperform task-specific models in key performance metrics. However, NHITS+QRA, when properly configured and supplemented with additional informative features or adapted via few-shot learning, can achieve comparable or superior performance in certain scenarios.
Implications
The findings suggest that while foundation models offer robust forecasting capabilities, traditional models should not be overlooked, especially in terms of computational efficiency. This has implications for market participants in optimizing their forecasting strategies and managing risks associated with electricity price volatility.
Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
Reinforcement Learning
Large Language Models
Optimization
- Introduction of CoUR framework for efficient reward function design in RL.
- Integration of code uncertainty quantification to streamline reward component reuse.
- Utilization of Bayesian optimization for independent optimization of reward terms.
- Extensive evaluation showing CoUR outperforms traditional methods in performance and cost.
Summary
This paper addresses the challenges of designing effective reward functions in reinforcement learning (RL), which is often a labor-intensive and error-prone process. The author proposes a novel framework called Chain of Uncertain Rewards (CoUR) that integrates large language models (LLMs) to streamline the reward function design and evaluation process. CoUR introduces a mechanism for code uncertainty quantification that identifies and reuses relevant reward function components through textual and semantic analyses, thereby reducing redundancy and improving consistency. Additionally, it employs Bayesian optimization on decoupled reward terms to enhance the efficiency of reward feedback exploration. The effectiveness of CoUR is validated through comprehensive experiments across nine environments from IsaacGym and 20 tasks from the Bidexterous Manipulation benchmark, demonstrating superior performance and reduced evaluation costs compared to traditional methods. The paper highlights the potential of LLMs in automating and optimizing reward design in RL, addressing local uncertainties, and minimizing redundant efforts.
Methodology
The CoUR framework employs a code uncertainty quantification mechanism that combines textual and semantic analyses to identify relevant reward components. It also utilizes Bayesian optimization on decoupled reward terms to enhance the efficiency of the reward evaluation process.
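The decoupled search over reward terms can be sketched with random search standing in for the Bayesian-optimization component (a deliberate simplification for illustration; the paper uses Bayesian optimization on the decoupled terms):

```python
import numpy as np

def search_term_weights(eval_fn, n_terms, n_trials=32, rng=0):
    """Stand-in for the Bayesian-optimization step: sample weight vectors for
    the decoupled reward terms and keep the best under eval_fn (random search
    here; the paper uses Bayesian optimization)."""
    gen = np.random.default_rng(rng)
    best_w, best_score = None, -np.inf
    for _ in range(n_trials):
        w = gen.uniform(0, 1, size=n_terms)   # candidate weights for the reward terms
        score = eval_fn(w)                    # e.g., policy performance after short training
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

Because the terms are decoupled, each candidate evaluation reweights existing reward components rather than regenerating and retraining the full reward function.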
Results
The experimental results indicate that CoUR significantly improves performance across various benchmarks while reducing the cost and complexity associated with reward evaluations. The framework demonstrates faster convergence and less redundancy in reward design compared to traditional methods.
Implications
The proposed CoUR framework has the potential to revolutionize reward function design in RL by automating the process, addressing uncertainties, and minimizing redundant efforts. This could lead to more efficient training of RL agents in complex environments, ultimately enhancing their performance and adaptability.
Stability and Generalization in Looped Transformers
Theory
Large Language Models
NLP
- Introduces a fixed-point framework to analyze stability in looped transformers.
- Proves that recall and outer normalization are crucial for achieving meaningful predictions.
- Empirical results confirm that performance aligns with theoretical predictions across various tasks.
- Presents internal recall as a novel variant that enhances performance under specific conditions.
Summary
This paper investigates the architectural choices in looped transformers that enable them to generalize to harder problems at test time, rather than merely memorizing training-specific solutions. The author introduces a fixed-point based framework to analyze looped architectures along three axes of stability: reachability, input-dependence, and geometry. The theoretical findings demonstrate that looped networks without recall have countable fixed points and lack strong input-dependence across spectral regimes. In contrast, incorporating recall with outer normalization allows for a regime where fixed points are reachable, locally smooth, and supported by stable backpropagation. Empirical validation is conducted through training single-layer looped transformers on tasks like chess, sudoku, and prefix-sums, revealing that performance aligns with the framework's predictions. Additionally, a novel recall placement variant, termed internal recall, shows competitive performance, especially in sudoku, when outer normalization is applied. The paper provides a comprehensive understanding of the architectural elements necessary for stable computation in looped transformers, addressing inconsistencies in previous empirical results.
Methodology
The paper employs a theoretical analysis based on fixed-point stability to evaluate looped transformer architectures. It examines the impact of architectural choices such as recall and outer normalization on the stability axes. Empirical validation is performed by training single-layer looped transformers on multiple reasoning tasks and analyzing their performance against the theoretical framework.
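The two architectural knobs can be sketched as a weight-tied loop in which "recall" re-injects the original input each iteration and "outer normalization" renormalizes the state (the block internals and the RMS-style norm are illustrative assumptions):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMS-style normalization along the last axis."""
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

def looped_forward(block, x0, n_loops, recall=True, outer_norm=True):
    """Iterate one weight-tied block; 'recall' re-injects the input each loop
    and 'outer normalization' renormalizes the state (sketch of the two
    mechanisms studied; block internals are not specified here)."""
    x = x0
    for _ in range(n_loops):
        x = block(x) + (x0 if recall else 0.0)   # recall: input injection
        if outer_norm:
            x = rms_norm(x)                      # outer normalization
    return x
```

With both mechanisms on, a contractive block drives the iteration to a normalized, input-dependent fixed point, matching the stable regime the framework identifies.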
Results
The study finds that looped transformers with recall and outer normalization achieve a regime of stability where all three axes (reachability, input-dependence, geometry) are satisfied. The empirical results show that the performance of the models on chess, sudoku, and prefix-sums tasks aligns with the predictions made by the theoretical framework, with internal recall demonstrating competitive performance, particularly in sudoku.
Implications
The findings suggest that looped transformers can be effectively designed to generalize beyond their training data, potentially leading to advancements in reasoning tasks in NLP and other domains. The insights into architectural choices could inform future designs of transformer models, enhancing their ability to tackle complex problems.
Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence
Theory
Reinforcement Learning
Time Series
- Establishes a minimax lower bound for classification risk under Markov dependence.
- Demonstrates that uniform bagging is suboptimal, with a significant risk gap.
- Proposes adaptive spectral routing to achieve optimal performance in Markov settings.
- Validates theoretical predictions through extensive experiments on various datasets.
Summary
This paper addresses the performance degradation of majority-vote ensembles when trained on data exhibiting Markov dependence, which is common in time-series forecasting and reinforcement learning. The authors establish a minimax characterization of the classification risk for these ensembles in a fixed-dimensional Markov setting. They derive an information-theoretic lower bound indicating that no estimator can achieve better than Ω(√Tmix/n) excess classification risk. The paper also demonstrates that uniform bagging is suboptimal under these conditions, with a risk bounded below by Ω(Tmix/√n), revealing a significant gap. To overcome this, the authors propose an adaptive spectral routing algorithm that partitions training data based on the empirical Fiedler eigenvector of a dependency graph, achieving the minimax rate of O(√Tmix/n) up to a lower-order term. Experimental validation on synthetic Markov chains, spatial grids, and various datasets supports their theoretical findings, highlighting the implications for deep reinforcement learning and variance analysis.
Methodology
The authors utilize information-theoretic techniques to derive lower bounds for classification risk in Markov chains. They analyze the performance of uniform bagging and develop an adaptive spectral routing algorithm that partitions data based on the Fiedler eigenvector of a dependency graph. Theoretical results are supported by empirical experiments on synthetic and real-world datasets.
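The routing step itself can be sketched as a sign split on the Fiedler eigenvector of the graph Laplacian (only the partitioning is shown; how the dependency graph is built from the data follows the paper and is not reproduced here):

```python
import numpy as np

def fiedler_partition(adj):
    """Split nodes by the sign of the Fiedler eigenvector (eigenvector of the
    second-smallest Laplacian eigenvalue) of a dependency graph (sketch)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj      # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return fiedler >= 0                       # boolean group assignment per node
```

On a graph of two densely connected clusters joined by a weak link, the sign pattern recovers the two clusters, which is exactly the weakly dependent split the routing exploits.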
Results
The paper shows that the minimax lower bound for excess classification risk is Ω(√Tmix/n). Uniform bagging is proven to be suboptimal with a risk of Ω(Tmix/√n), while the proposed adaptive spectral routing achieves the optimal rate of O(√Tmix/n) on a graph-regular subclass, effectively closing the gap identified.
Implications
The findings suggest that traditional ensemble methods may need to be adapted for data with Markov dependence to avoid significant performance penalties. The proposed methods could enhance the effectiveness of ensemble learning in various applications, particularly in time-series analysis and reinforcement learning scenarios.
xFODE: An Explainable Fuzzy Additive ODE Framework for System Identification
Interpretability
Time Series
Theory
- xFODE enhances interpretability in system identification by defining states incrementally based on measurable outputs.
- The framework utilizes Fuzzy Additive Models to approximate state derivatives, allowing for clearer understanding of input contributions.
- Partitioning Strategies are introduced to simplify the antecedent space, improving interpretability and reducing inference complexity.
- xFODE achieves accuracy comparable to existing models like NODE, FODE, and NLARX while providing interpretable insights.
Summary
The paper introduces Explainable Fuzzy Additive Ordinary Differential Equations (xFODE), a novel framework for data-driven System Identification (SysID) that enhances interpretability while maintaining high accuracy. Traditional approaches like Neural Ordinary Differential Equations (NODE) and Fuzzy Ordinary Differential Equations (FODE) have shown promise in modeling nonlinear dynamics but often lack clear physical meanings and interpretability of state contributions. xFODE addresses these issues by defining states incrementally based on measurable outputs, thus preserving their physical significance. It employs Fuzzy Additive Models (FAMs) to approximate state derivatives, allowing for clearer input-wise contributions. Additionally, the framework introduces Partitioning Strategies (PSs) that simplify the antecedent space, ensuring that only two consecutive rules are activated for any given input, which reduces complexity and enhances interpretability. The training of xFODE is supported by a deep learning framework that optimizes parameterized membership functions. The authors validate xFODE against benchmark SysID datasets, demonstrating that it achieves comparable accuracy to NODE, FODE, and NonLinear AutoRegressive network with eXogenous inputs (NLARX) models while providing interpretable insights into system dynamics.
Methodology
The xFODE framework employs an incremental state definition based on measurable outputs, utilizes Fuzzy Additive Models for state derivative approximation, and introduces Partitioning Strategies to enhance interpretability. A deep learning framework is developed for end-to-end optimization of parameterized membership functions.
Results
xFODE demonstrates competitive accuracy on benchmark SysID datasets, matching the performance of NODE, FODE, and NLARX models while offering enhanced interpretability of the system dynamics.
Implications
The xFODE framework can be applied in various fields requiring system identification, such as control systems, robotics, and dynamic modeling, where interpretability is crucial for understanding system behavior and making informed decisions.
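The rule-activation property described above can be illustrated in a few lines. In this sketch the triangular membership shapes, the uniform grid, and the consequent values are our own assumptions, not taken from the paper; with a uniform triangular partition, at most two consecutive rules fire for any input and their degrees sum to one:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees for uniformly spaced triangular fuzzy sets.
    With this partitioning, at most two consecutive rules fire and their
    degrees sum to 1 (an illustrative stand-in for the paper's PSs)."""
    h = centers[1] - centers[0]  # grid spacing
    return np.clip(1.0 - np.abs(x - centers) / h, 0.0, None)

def fam_term(x, centers, consequents):
    """One additive term f_j(x): weighted average of rule consequents.
    Assumes x lies inside the partitioned range, so mu.sum() > 0."""
    mu = triangular_memberships(x, centers)
    return float(mu @ consequents / mu.sum())

centers = np.linspace(-1.0, 1.0, 5)
theta = np.array([0.0, 0.2, 0.5, 0.9, 1.0])  # hypothetical learned consequents
y = fam_term(0.3, centers, theta)
```

A full xFODE model would sum one such term per input to approximate each state derivative.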
Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments
Reinforcement Learning
Robotics
Theory
- Introduction of Adaptive Memory Crystallization (AMC) for continual reinforcement learning.
- Development of a three-phase memory hierarchy (Liquid, Glass, Crystal) to manage memory stability and plasticity.
- Rigorous mathematical proofs establishing the convergence and performance guarantees of the proposed SDE.
- Empirical results show substantial improvements in learning efficiency and memory management.
Summary
This paper introduces Adaptive Memory Crystallization (AMC), a novel memory architecture designed to enhance the learning capabilities of autonomous AI agents in dynamic environments. The primary challenge addressed is the stability-plasticity dilemma, where agents must acquire new skills without losing previously learned knowledge. AMC is inspired by synaptic tagging and capture (STC) theory, conceptualizing memory as a continuous crystallization process. The framework features a three-phase memory hierarchy (Liquid, Glass, Crystal) governed by an Itô stochastic differential equation (SDE). The authors provide rigorous proofs of the SDE's well-posedness, convergence properties, and establish links between SDE parameters and agent performance. Empirical evaluations demonstrate that AMC significantly improves forward transfer, reduces catastrophic forgetting, and decreases memory footprint across various benchmarks, including Meta-World MT50, Atari, and MuJoCo.
Methodology
The methodology involves formulating a memory architecture based on a continuous crystallization process modeled by an Itô stochastic differential equation (SDE). The experiences are categorized into three phases of memory (Liquid, Glass, Crystal), with each phase having distinct learning rates and eviction policies. The performance of AMC is validated through extensive empirical evaluations on multiple reinforcement learning benchmarks.
Results
The empirical evaluation of AMC on various tasks showed a 34-43% improvement in forward transfer compared to the strongest baseline, a 67-80% reduction in catastrophic forgetting, and a 62% decrease in memory footprint. The theoretical analysis confirmed the well-posedness and convergence of the crystallization process, linking SDE parameters to agent performance.
Implications
The findings suggest that AMC can significantly enhance the capabilities of autonomous AI agents in dynamic environments, making it applicable in fields such as robotics, adaptive software, and autonomous driving. The approach may lead to more efficient lifelong learning systems that can adapt to new tasks without losing previously acquired knowledge.
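As a toy illustration of the crystallization idea (the drift, noise schedule, and phase thresholds below are entirely our own; the paper's actual Itô SDE is not reproduced here), one can integrate a scalar "crystallization level" with Euler-Maruyama and map it onto the three-phase hierarchy:

```python
import numpy as np

def crystallize(replay_value, steps=1000, dt=0.01, k=1.5, sigma=0.3, seed=0):
    """Euler-Maruyama integration of a toy crystallization SDE:
        dc = k * (replay_value - c) dt + sigma * (1 - c) dW
    The noise term shrinks as c -> 1, so consolidated memories stabilize."""
    rng = np.random.default_rng(seed)
    c = 0.0
    for _ in range(steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        c += k * (replay_value - c) * dt + sigma * (1.0 - c) * dw
        c = float(np.clip(c, 0.0, 1.0))
    return c

def phase(c):
    """Map a crystallization level onto the three-phase hierarchy
    (thresholds are arbitrary illustrations)."""
    return "Liquid" if c < 0.3 else ("Glass" if c < 0.7 else "Crystal")
```

A high-value experience drifts toward a crystallized state whose residual noise, and hence plasticity, is small.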
Expressivity of Transformers: A Tropical Geometry Perspective
Theory
- Introduces a tropical geometry framework to analyze transformers' expressivity.
- Establishes that self-attention corresponds to a Power Voronoi Diagram in the zero-temperature limit.
- Demonstrates that Multi-Head Self-Attention expands complexity to O(N H).
- Derives the first tight bounds on the number of linear regions in transformers as Θ(N d_model L).
Summary
This paper introduces a tropical geometry framework to analyze the geometric expressivity of transformers, particularly focusing on their self-attention mechanism. By modeling self-attention as a vector-valued tropical rational map, the authors demonstrate that in the zero-temperature limit, it corresponds to a Power Voronoi Diagram. This equivalence allows for a combinatorial interpretation of Multi-Head Self-Attention (MHSA), revealing that the complexity of multi-head aggregation expands to O(N H), surpassing the O(N) limitation of single heads. The authors derive the first tight asymptotic bounds on the number of linear regions in transformers, expressed as Θ(N d_model L), where N is the sequence length, d_model is the embedding dimension, and L is the network depth. The study also ensures that these geometric structures remain stable under finite-temperature conditions, confirming that the theoretical models align with practical implementations. Overall, the paper provides a comprehensive mathematical framework that quantifies the expressivity of transformers through geometric analysis.
Methodology
The authors employ tropical geometry to model the self-attention mechanism as a vector-valued tropical rational map. They utilize log-lifting parameterization to bridge attention mechanisms with computational geometry, and apply Minkowski sums to analyze Multi-Head Self-Attention. The study also uses Voronoi diagrams to derive bounds on the number of linear regions and ensures geometric stability through differential approximation bounds.
Results
The paper establishes that the expressivity of transformers is quantifiable and driven by the interaction of sequence length, number of attention heads, and network depth. It provides tight asymptotic bounds on the number of linear regions, confirming that deep transformers can achieve a combinatorial explosion in expressivity while maintaining geometric stability.
Implications
The findings have significant implications for understanding the capabilities of transformer models in various applications, particularly in natural language processing and computer vision. By quantifying expressivity, this work lays the groundwork for optimizing transformer architectures and improving their performance in complex tasks.
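The zero-temperature limit that drives the Voronoi correspondence is easy to see numerically: as the softmax temperature shrinks, attention weights harden into the argmax selection of tropical (max-plus) algebra, and the input space decomposes into cells by which key wins. A small sketch (the scores and temperatures are arbitrary):

```python
import numpy as np

def attention_weights(scores, temperature):
    """Softmax over attention scores at a given temperature."""
    z = scores / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([1.0, 2.5, 0.3, 2.4])
soft = attention_weights(scores, 1.0)   # smooth mixing of values
cold = attention_weights(scores, 1e-3)  # near the tropical limit: one-hot
```

In the `cold` regime the weight vector is effectively one-hot at the maximal score, which is the piecewise-linear selection behavior the tropical analysis formalizes.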
Enhancing Confidence Estimation in Telco LLMs via Twin-Pass CoT-Ensembling
NLP
Large Language Models
- Identifies systematic overconfidence in LLM-generated confidence scores in telecommunications.
- Proposes a Twin-Pass CoT-Ensembling method to improve confidence estimation.
- Achieves up to 88% reduction in Expected Calibration Error (ECE) across benchmarks.
- Provides empirically validated confidence thresholds and recommendations for telecom applications.
Summary
This paper addresses the critical issue of confidence estimation in Large Language Models (LLMs) used in telecommunications, where reliable self-assessment is essential for operational tasks. The authors focus on the Gemma-3 model family, evaluating its performance on three benchmarks: TeleQnA, ORANBench, and srsRANBench. They identify that traditional single-pass confidence estimates often misrepresent the model's correctness, leading to overconfidence in incorrect predictions. To improve this, the authors propose a novel Twin-Pass Chain of Thought (CoT)-Ensembling methodology, which involves multiple independent reasoning evaluations to aggregate confidence scores. This approach significantly reduces Expected Calibration Error (ECE) by up to 88%, enhancing the reliability of LLM outputs in telecommunications. The study emphasizes the need for better confidence calibration methods tailored to the unique demands of the telecom domain and provides actionable recommendations for practitioners.
Methodology
The authors evaluate confidence calibration using the Gemma-3 model family on three telecom-specific benchmarks. They introduce a training-free Twin-Pass CoT-Ensembling method, where the model critiques its own reasoning through multiple stochastic samples, aggregating the self-assessed scores to produce calibrated confidence estimates.
Results
The proposed methodology results in a significant reduction of Expected Calibration Error (ECE) by up to 88.4% across the evaluated benchmarks, transforming unreliable confidence scores into actionable metrics. The study also finds that mean aggregation of confidence scores outperforms median aggregation in 55% of experimental conditions.
Implications
The findings suggest a practical path toward more trustworthy evaluation of LLM outputs in telecommunications, which is crucial for decision-critical applications. Improved confidence estimation can enhance operational reliability and reduce risks associated with automated network management.
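Expected Calibration Error, the headline metric here, is straightforward to compute. A minimal equal-width-bin version (the paper's exact binning scheme is not specified in this summary, so this is the common textbook form):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the bin-weighted mean absolute gap between
    average accuracy and average confidence in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

An overconfident model (high stated confidence, low accuracy) yields a large ECE, which is precisely the failure mode the twin-pass ensembling is designed to reduce.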
Wasserstein Formulation of Reinforcement Learning. An Optimal Transport Perspective on Policy Optimization
Reinforcement Learning
Optimization
Theory
- Introduces a geometric framework for RL using Wasserstein space.
- Establishes rigorous existence guarantees for stationary distributions.
- Utilizes Otto's calculus for second-order analysis of policy optimization.
- Demonstrates scalability of the method to high-dimensional problems.
Summary
This paper introduces a geometric framework for Reinforcement Learning (RL) that conceptualizes policies as mappings into the Wasserstein space of action probabilities. The author establishes a Riemannian structure induced by stationary distributions, ensuring rigorous existence guarantees in a general context. The tangent space of policies is defined, and geodesics are characterized, addressing the measurability of vector fields from the state space to the tangent space of probability measures over the action space. A general RL optimization problem is formulated, and a gradient flow is constructed using Otto's calculus, allowing for the computation of both the gradient and Hessian of the expected cumulative cost. The paper also provides numerical examples demonstrating the method's effectiveness in low-dimensional problems and its scalability to high-dimensional continuous control scenarios through neural network parameterization optimized via an ergodic approximation of the cost. The work bridges theoretical gaps in existing literature, offering a comprehensive second-order Wasserstein gradient flow framework for policy optimization in RL.
Methodology
The methodology involves defining a Riemannian structure for the policy space, characterizing the tangent space and geodesics, and formulating a gradient flow using Otto's calculus. The paper computes the gradient and Hessian of the expected cumulative cost, facilitating a second-order analysis. Numerical examples are provided for both low-dimensional and high-dimensional scenarios, with neural networks used for scalability.
Results
The paper successfully computes the gradient and Hessian of the energy in a rigorous manner, demonstrating the method's applicability to both low-dimensional and high-dimensional continuous control problems. The theoretical framework provides essential guarantees for the existence of invariant measures, enhancing the understanding of policy optimization dynamics.
Implications
This work has significant implications for the development of more efficient and robust reinforcement learning algorithms, particularly in environments with complex action spaces. The integration of optimal transport theory into RL could lead to improved policy optimization techniques that are more aligned with the underlying geometry of the action space.
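For readers unfamiliar with Otto's calculus, the object being constructed is a Wasserstein gradient flow. In its textbook form (the paper's state-indexed, stationary-distribution-weighted version is more involved), the flow of an energy $J$ over probability measures $\mu$ is the continuity equation

```latex
\partial_t \mu_t \;=\; \nabla \cdot \left( \mu_t \, \nabla \frac{\delta J}{\delta \mu}(\mu_t) \right),
\qquad
\operatorname{grad}_W J(\mu) \;=\; -\,\nabla \cdot \left( \mu \, \nabla \frac{\delta J}{\delta \mu} \right),
```

and the second-order analysis replaces the Euclidean Hessian with the Hessian of $J$ along Wasserstein geodesics, which is what enables the paper's second-order treatment of the expected cumulative cost.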
Some Theoretical Limitations of t-SNE
Theory
- t-SNE can lose important data features during dimensionality reduction.
- In high-dimensional spaces, t-SNE may map distinct points to the same location in lower dimensions.
- The paper provides mathematical propositions demonstrating the limitations of t-SNE in preserving data structure.
- The findings suggest that t-SNE may not be appropriate for all datasets, particularly those with high dimensionality.
Summary
This paper provides a mathematical framework to understand the theoretical limitations of t-distributed stochastic neighbor embedding (t-SNE), a popular technique for dimensionality reduction and data visualization. The authors highlight that while t-SNE is effective in many scenarios, it can lead to significant loss of important data features, particularly in high-dimensional spaces. They present several propositions and a theorem demonstrating that as the dimensionality increases, t-SNE may fail to preserve the structure of the data, often collapsing distinct points into a single point in the lower-dimensional representation. The paper discusses the implications of these findings, emphasizing that t-SNE may not be suitable for all datasets, especially those with high dimensionality where points are approximately equidistant. The authors also relate their results to prior theoretical analyses of t-SNE, underscoring the need for caution when interpreting t-SNE visualizations.
Methodology
The authors establish a mathematical framework to analyze the performance of t-SNE by formulating propositions and a theorem that illustrate how t-SNE fails to maintain the original data structure in high-dimensional spaces. They use asymptotic analysis and consider specific configurations of data points to demonstrate the limitations of the t-SNE algorithm.
Results
The paper presents several key results, including: (1) a proposition showing that for a set of points sampled uniformly from a high-dimensional sphere, t-SNE can lead to a situation where points that are far apart in high dimensions become close in the low-dimensional embedding; (2) a theorem indicating that in high-dimensional settings, t-SNE often collapses points into a small neighborhood, resulting in a loss of informative structure; and (3) a demonstration that the optimal embedding may result in all points coinciding at a single location, particularly when the data points are equidistant.
Implications
The findings of this paper have significant implications for researchers and practitioners using t-SNE for data visualization. It highlights the need for careful consideration of the dimensionality of the data and the potential for misleading visualizations. The results suggest that alternative dimensionality reduction techniques may be necessary for high-dimensional datasets to preserve important features.
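The distance-concentration premise behind these results is easy to reproduce: points sampled uniformly from a high-dimensional sphere become nearly equidistant, which is exactly the regime where the paper shows the embedding collapses. A quick numpy check (point counts and dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_distance_spread(n_points, dim):
    """Sample points uniformly on the unit sphere in R^dim and return the
    relative spread (std / mean) of their pairwise distances."""
    x = rng.normal(size=(n_points, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # project onto the sphere
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    d = d[np.triu_indices(n_points, k=1)]  # keep each pair once
    return d.std() / d.mean()

low = pairwise_distance_spread(40, 3)      # distances vary widely
high = pairwise_distance_spread(40, 2000)  # nearly equidistant
```

In the high-dimensional case every pairwise distance is close to the same value, so the neighbor probabilities t-SNE builds carry almost no structure to preserve.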
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
Multimodal
Optimization
Large Language Models
- MixAtlas provides a two-axis data decomposition for interpretable multimodal mixture optimization.
- The method utilizes small proxy models and Gaussian-process surrogates for efficient mixture search.
- Empirical results show significant performance improvements and faster convergence compared to existing baselines.
- Recipes developed on smaller models are transferable to larger models, enhancing practical optimization.
Summary
The paper introduces MixAtlas, a novel framework for optimizing data mixtures in multimodal large language model (MLLM) midtraining. Recognizing the limitations of current methods that typically optimize along a single dimension, MixAtlas decomposes the training corpus into two interpretable axes: image concepts and task supervision. This allows for a more systematic exploration of the mixture space. The authors utilize small proxy models paired with a Gaussian-process surrogate to efficiently search for optimal mixtures while quantifying uncertainty. The empirical results demonstrate that MixAtlas significantly enhances performance across various benchmarks, achieving improvements of 8.5%–17.6% on Qwen2-7B and 1.0%–3.3% on Qwen2.5-7B compared to baseline methods. Additionally, the optimized mixtures accelerate training, reaching target loss in up to 2× fewer steps, and the recipes discovered on smaller models transfer effectively to larger-scale training, preserving performance benefits.
Methodology
MixAtlas employs a two-axis decomposition of training data based on image concepts and task supervision. It uses small proxy models to explore the mixture space, combined with a Gaussian-process surrogate to predict performance and quantify uncertainty. This approach enables efficient optimization of data mixtures while maintaining interpretability.
Results
MixAtlas achieved performance improvements of 8.5%–17.6% on Qwen2-7B and 1.0%–3.3% on Qwen2.5-7B across 10 benchmarks. The optimized mixtures also reduced the number of training steps required to reach target loss by up to 2×. Furthermore, the mixtures discovered on 0.5B proxy models successfully transferred to 7B-scale training, maintaining both convergence and accuracy benefits.
Implications
The MixAtlas framework has the potential to enhance the efficiency and effectiveness of multimodal training processes in various applications, allowing for more targeted data collection and better performance in vision-language tasks. It could be particularly useful in scenarios where computational resources are limited, enabling researchers and practitioners to optimize their training data mixtures without extensive computational overhead.
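A numpy-only caricature of the surrogate loop can make the search concrete. The kernel, its hyperparameters, and the UCB acquisition rule below are our own assumptions (the paper fits its surrogate to proxy-model scores); mixtures live on the probability simplex over data groups:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, length=0.3):
    """Squared-exponential kernel between mixture vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """GP posterior mean and std: a minimal stand-in for the surrogate."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_query, x_train)
    mean = k_star @ np.linalg.solve(k, y_train)
    var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(k, k_star.T))
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def propose_mixture(x_train, y_train, n_candidates=500, beta=2.0):
    """Pick the next data mixture by UCB over random simplex candidates,
    so high-uncertainty regions of the mixture space get explored."""
    cand = rng.dirichlet(np.ones(x_train.shape[1]), size=n_candidates)
    mean, std = gp_posterior(x_train, y_train, cand)
    return cand[np.argmax(mean + beta * std)]
```

Each proposed mixture would then be scored with a small proxy model and fed back into the training set, shrinking the surrogate's uncertainty where it matters.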
Quantization of Spiking Neural Networks Beyond Accuracy
Efficient ML
- EMD is proposed as a diagnostic metric for evaluating firing distribution divergence in quantized SNNs.
- Quantization methods significantly affect firing distributions, even when accuracy is preserved.
- Learned quantization (e.g., LQ-Net) maintains firing behavior more effectively than uniform quantization.
- The study highlights the importance of considering firing dynamics in the deployment of quantized SNNs.
Summary
This paper addresses the quantization of Spiking Neural Networks (SNNs) and emphasizes the need to evaluate not only the accuracy of quantized models but also their firing behavior. The authors argue that traditional metrics focusing solely on accuracy overlook critical aspects of SNN performance, particularly how quantization affects firing distributions. They introduce Earth Mover's Distance (EMD) as a new diagnostic metric to measure the divergence in firing distributions between quantized and full-precision SNNs. The study systematically explores various quantization methods, clipping ranges, and bit-widths on SEW-ResNet architectures trained on CIFAR-10 and CIFAR-100 datasets. The findings reveal that uniform quantization can lead to significant distributional drift, even when accuracy is maintained, whereas learned quantization methods, such as LQ-Net, better preserve firing behavior. The authors advocate for the inclusion of behavior preservation as a key evaluation criterion alongside accuracy in SNN quantization studies.
Methodology
The authors systematically evaluated the effects of different quantization methods, clipping ranges, and bit-widths on SNNs using EMD to measure the divergence in firing distributions. They applied these methods to SEW-ResNet architectures trained on CIFAR-10 and CIFAR-100 datasets.
Results
The results indicate that uniform quantization leads to significant changes in firing distributions, while learned quantization methods like LQ-Net preserve firing behavior closer to that of full-precision models. The study demonstrates that behavior preservation should be a critical evaluation criterion in addition to accuracy.
Implications
The findings suggest that for the effective deployment of SNNs in resource-constrained environments, it is essential to consider both accuracy and firing behavior. This could influence future research and development in SNN quantization techniques, leading to more efficient and reliable models for edge computing applications.
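For one-dimensional firing-rate histograms, Earth Mover's Distance reduces to the L1 distance between the two cumulative distributions. A minimal version (how firing rates are binned is left to the caller; the paper's exact setup is not reproduced here):

```python
import numpy as np

def emd_1d(p, q, bin_width=1.0):
    """Earth Mover's Distance between two 1-D histograms, computed as
    the area between their normalized CDFs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize to probability mass
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum() * bin_width)
```

Applied to the firing-rate histograms of a quantized and a full-precision SNN, a large value flags distributional drift even when task accuracy is unchanged.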
First-See-Then-Design: A Multi-Stakeholder View for Optimal Performance-Fairness Trade-Offs
Theory
Optimization
- Introduces a multi-stakeholder framework for fair algorithmic decision-making.
- Shifts focus from prediction-centric fairness to utility-based fairness.
- Utilizes post-hoc multi-objective optimization to explore performance-fairness trade-offs.
- Demonstrates that stochastic policies can yield better outcomes than deterministic ones.
Summary
This paper addresses the limitations of traditional fairness assessments in algorithmic decision-making, which often rely on predictive metrics that do not account for the actual outcomes of decisions. The authors propose a multi-stakeholder framework that integrates welfare economics and distributive justice principles, focusing on the utilities of decision-makers (DM) and decision subjects (DS). By defining fairness through a social planner's utility that captures inequalities among DS, the framework allows for a more nuanced understanding of performance-fairness trade-offs. The authors formulate the problem as a post-hoc multi-objective optimization (MOO) task, enabling stakeholders to explore the trade-offs between DM utility and social planner utility under various decision policies. The findings indicate that simple stochastic decision policies can outperform deterministic ones in achieving better performance-fairness trade-offs by leveraging outcome uncertainty. This work advocates for a shift from prediction-centric fairness to a more transparent, justice-based approach that facilitates collaborative decision-making.
Methodology
The authors develop a multi-stakeholder framework that models the utilities of decision-makers and decision subjects, defining fairness through a social planner's utility. They employ post-hoc multi-objective optimization to characterize the trade-offs between performance and fairness, allowing stakeholders to evaluate different decision policies.
Results
The empirical analysis shows that under certain conditions, stochastic decision policies can achieve superior performance-fairness trade-offs compared to deterministic policies, highlighting the importance of considering outcome uncertainty in decision-making.
Implications
This framework can be applied in various domains where algorithmic fairness is critical, such as finance, healthcare, and criminal justice, promoting more equitable decision-making processes. It encourages stakeholders to engage in transparent discussions about the implications of different decision policies.
Calibrate-Then-Delegate: Safety Monitoring with Risk and Budget Guarantees via Model Cascades
NLP
Large Language Models
Theory
- CTD introduces a model-cascade approach with finite-sample guarantees on delegation rate and safety performance.
- The delegation value (DV) probe provides a more targeted delegation signal compared to traditional uncertainty measures.
- CTD consistently outperforms existing methods in safety monitoring across various budget levels.
- The method adapts budget allocation based on input difficulty, preventing harmful over-delegation.
Summary
The paper presents a novel approach called Calibrate-Then-Delegate (CTD) for safety monitoring in large language models (LLMs), addressing the challenges of balancing cost and accuracy in model cascades. Traditional methods rely on probe uncertainty for delegation decisions, which can lead to ineffective escalation to more capable experts. CTD introduces a delegation value (DV) probe that predicts the benefit of escalation, allowing for more informed decisions. The method calibrates thresholds on the DV signal using held-out data, ensuring that the delegation rate adheres to budget constraints while maximizing safety performance. Evaluations on four safety datasets demonstrate that CTD outperforms uncertainty-based delegation across all budget levels, effectively preventing over-delegation and adapting budget allocation based on input difficulty. This approach provides a more reliable framework for safety monitoring in LLM applications.
Methodology
CTD combines a cheap safety probe with an expensive expert model, using a DV probe to predict the benefit of escalation for each input. It calibrates a threshold on the DV signal using held-out data and applies a delegation policy that allows for instance-level decisions without batch context. This approach ensures compliance with budget constraints while maximizing safety performance.
Results
CTD was evaluated on four safety datasets, showing significant improvements over uncertainty-based delegation, with up to +11% AUC and +19% accuracy when the expert is weaker than the probe. The method effectively prevents over-delegation and allocates computational resources adaptively based on the difficulty of inputs.
Implications
The findings suggest that CTD can enhance the safety and reliability of LLM deployments in various applications, potentially leading to more responsible AI systems that can better manage risks associated with unsafe inputs.
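The budget side of the calibration can be sketched as a hold-out quantile rule (a simplification: the paper's finite-sample guarantees would involve a more careful correction, and the names and data here are hypothetical):

```python
import numpy as np

def calibrate_threshold(dv_holdout, budget):
    """Pick the DV threshold whose hold-out delegation rate matches the
    budget: inputs with DV above the (1 - budget) quantile are escalated."""
    return float(np.quantile(dv_holdout, 1.0 - budget))

def delegate(dv, tau):
    """Escalate to the expert only when the predicted delegation value
    exceeds the calibrated threshold."""
    return dv > tau

dv_holdout = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
tau = calibrate_threshold(dv_holdout, budget=0.2)
rate = delegate(dv_holdout, tau).mean()
```

Because the threshold is set on held-out DV scores rather than raw uncertainty, only inputs where escalation is predicted to help consume the expert budget.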
How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations
Graph Learning
- Establishes a unified framework for evaluating node embeddings in GNNs.
- Compares classical and quantum-oriented embeddings under controlled conditions.
- Demonstrates that quantum embeddings outperform classical ones on structure-driven datasets.
- Identifies the importance of embedding design in graph-level prediction.
Summary
This paper investigates the impact of different node embedding techniques on graph neural networks (GNNs) for graph classification tasks. The authors establish a controlled benchmarking framework to compare classical and quantum-oriented node representations, ensuring that all methods are evaluated under the same conditions, including a fixed GNN backbone, stratified data splits, and consistent optimization protocols. The study introduces various embedding methods, including classical fixed embeddings and trainable multi-layer perceptrons, alongside quantum-inspired embeddings derived from variational quantum circuits and graph dynamics. The experiments are conducted on five TU datasets and the QM9 dataset, revealing that quantum-oriented embeddings generally provide better performance on structure-driven benchmarks, while classical embeddings remain effective for social graphs with limited attributes. The findings highlight the trade-offs between inductive bias, trainability, and stability, offering a reproducible reference for selecting embeddings in graph learning tasks.
Methodology
The authors implemented a controlled benchmarking framework where various node embedding techniques were integrated into a fixed GNN pipeline. They evaluated classical embeddings (fixed and MLP-based) and quantum-inspired embeddings (variational circuits and graph dynamics) across multiple datasets, measuring performance using accuracy, Macro-F1, and Macro Precision/Recall metrics.
Results
The results indicated that quantum-oriented embeddings consistently yielded better performance on datasets that required structural understanding, while classical embeddings were sufficient for simpler social graph datasets. The study emphasized the dataset-dependent nature of embedding effectiveness and the significance of embedding design in influencing GNN performance.
Implications
The findings suggest that researchers and practitioners should carefully consider the choice of node embeddings based on the specific characteristics of the graph data and the tasks at hand. The established framework provides a basis for future research into embedding techniques and their integration into GNNs, particularly in applications requiring nuanced structural understanding.
Hardware-Efficient Neuro-Symbolic Networks with the Exp-Minus-Log Operator
Efficient ML
Interpretability
Theory
- Introduction of the Exp-Minus-Log (EML) operator as a unifying primitive for DNNs.
- Development of a DNN-EML hybrid architecture that enhances interpretability and reduces hardware complexity.
- Establishment of computational-cost bounds and analysis of inference and training acceleration.
- Identification of a literature gap in existing neuro-symbolic approaches that do not utilize a single hardware-realizable primitive.
Summary
This paper addresses the limitations of deep neural networks (DNNs) in safety-critical and resource-constrained environments, particularly their lack of interpretability and reliance on diverse activation functions that increase latency and hardware requirements. The author introduces the Exp-Minus-Log (EML) operator, which can express all standard elementary functions using a binary tree of identical nodes. By embedding EML primitives into conventional DNN architectures, the proposed DNN-EML hybrid model aims to enhance interpretability while maintaining approximation power. The paper details the forward equations of the DNN-EML architecture, establishes computational-cost bounds, and analyzes the potential for inference and training acceleration compared to traditional multilayer perceptrons (MLPs) and physics-informed neural networks (PINNs). The findings suggest that while EML may not accelerate training or inference on standard hardware, it could provide significant latency advantages on dedicated EML cells, such as FPGA or analog circuits, along with improved interpretability and formal verification capabilities.
Methodology
The paper formulates a DNN-EML hybrid architecture by embedding EML primitives into conventional DNNs. It derives forward equations, proves universal approximation properties, and analyzes computational complexity for inference and training. The author contrasts the EML approach with existing neuro-symbolic methods and evaluates performance on standard and dedicated hardware.
Results
The analysis indicates that EML does not accelerate training or inference on standard CPU/GPU hardware. However, on dedicated EML cells, such as FPGA or analog circuits, the DNN-EML hybrid can achieve latency advantages of an order of magnitude, along with gains in interpretability and formal verification.
Implications
The findings suggest that the DNN-EML architecture could be particularly beneficial for applications in safety-critical domains, such as automotive engineering, where interpretability and formal verification are essential. It may also enhance the deployment of AI models in edge computing environments with limited resources.
Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
Reinforcement Learning
Robotics
Optimization
- CMAT bridges MARL and SARL, addressing key challenges in cooperative multi-agent settings.
- The framework utilizes a Transformer encoder and a hierarchical decision-making mechanism for effective coordination.
- Simultaneous action generation based on a consensus vector reduces sensitivity to action order.
- CMAT shows superior performance on benchmark tasks compared to existing methods.
Summary
This paper introduces the Consensus Multi-Agent Transformer (CMAT), a novel framework that connects cooperative Multi-Agent Reinforcement Learning (MARL) with hierarchical Single-Agent Reinforcement Learning (SARL). The authors address the challenges of non-stationarity, unstable training, and weak coordination that arise in MARL due to the exponential growth of joint observation and action spaces. CMAT treats all agents as a unified entity and employs a Transformer encoder to process observations, while a hierarchical decision-making mechanism allows for the generation of a high-level consensus vector. This consensus enables simultaneous action generation by all agents, mitigating the sensitivity to action-generation order seen in conventional Multi-Agent Transformers. The framework is optimized using single-agent Proximal Policy Optimization (PPO), maintaining expressive coordination through latent consensus. Experimental evaluations on benchmark tasks from StarCraft II, Multi-Agent MuJoCo, and Google Research Football demonstrate that CMAT outperforms existing centralized solutions and conventional MARL methods, showcasing its effectiveness in cooperative settings.
Methodology
The authors developed the Consensus Multi-Agent Transformer (CMAT) by employing a Transformer encoder to process joint observations and a hierarchical decision-making mechanism that generates a consensus vector. This vector allows all agents to generate their actions simultaneously, thus avoiding the order sensitivity of traditional methods. The framework is optimized using single-agent Proximal Policy Optimization (PPO).
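The core mechanism described above — encode all agents, pool a shared consensus vector, then decode every agent's action in one pass — can be sketched in a few lines. Everything here (dimensions, the single linear "encoder" standing in for the paper's Transformer, greedy action selection) is illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, OBS_DIM, HID, N_ACTIONS = 3, 4, 8, 5

# Illustrative parameters; the paper uses a full Transformer encoder,
# for which a single tanh layer stands in here.
W_enc = rng.normal(size=(OBS_DIM, HID))
W_act = rng.normal(size=(2 * HID, N_ACTIONS))

def act(observations):
    """Encode all agents, pool a latent consensus vector, then decode
    actions for every agent simultaneously (no autoregressive order)."""
    h = np.tanh(observations @ W_enc)            # (N_AGENTS, HID) per-agent embeddings
    consensus = h.mean(axis=0)                   # (HID,) shared latent consensus
    joint = np.concatenate([h, np.tile(consensus, (N_AGENTS, 1))], axis=1)
    logits = joint @ W_act                       # (N_AGENTS, N_ACTIONS)
    return logits.argmax(axis=1)                 # greedy action per agent

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
actions = act(obs)
```

Because the consensus vector is a symmetric pool over agents, permuting the agents simply permutes the actions — the order-independence property the bullet points highlight.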
Results
CMAT was evaluated on several benchmark tasks, including StarCraft II, Multi-Agent MuJoCo, and Google Research Football. The results indicated that CMAT consistently outperformed strong baselines, including recent centralized solutions and conventional MARL approaches, highlighting its effectiveness in cooperative multi-agent scenarios.
Implications
The proposed CMAT framework has significant implications for real-world applications requiring coordinated decision-making among multiple agents, such as autonomous fleet management, traffic signal optimization, and robotic swarm control. Its ability to handle large joint observation and action spaces efficiently could lead to advancements in various domains where cooperation among agents is crucial.
Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate
Large Language Models
Efficient ML
Optimization
- DASH-Q improves robustness in ultra low-bit quantization by using diagonal Hessian approximations.
- The framework effectively filters out noise from calibration data, enhancing feature preservation.
- Achieves significant accuracy improvements over existing PTQ methods, particularly in low-bit regimes.
- Demonstrates strong performance with minimal calibration data, making it suitable for resource-limited environments.
Read more
Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate
Summary
This paper presents DASH-Q, a novel framework for Post-Training Quantization (PTQ) aimed at improving the deployment of Large Language Models (LLMs) in resource-constrained environments. Traditional PTQ methods, particularly those based on Hessian approximations, struggle with low-bit quantization due to noise in curvature estimates from limited calibration data. DASH-Q addresses this issue by utilizing a diagonal Hessian approximation and iterative weighted least squares, which effectively filters out noise-prone dependencies while preserving the importance of salient features. The proposed method demonstrates significant improvements in zero-shot accuracy across five baseline LLM models, achieving an average increase of 7.01% and up to 14.01% over existing state-of-the-art methods, even with minimal calibration data. This advancement highlights the potential of DASH-Q to enhance the robustness and efficiency of ultra low-bit quantization in practical applications.
Methodology
DASH-Q employs a diagonal Hessian approximation to decouple quantization into independent weighted least squares problems. This method iteratively optimizes quantization parameters while minimizing reconstruction error, effectively mitigating the impact of noise from limited calibration data.
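The decoupling described above can be illustrated on a single weight row: a diagonal curvature estimate turns quantization into independent scalar weighted-least-squares problems, with the scale refitted between rounding steps. This is a simplified sketch of the general idea under assumed details (the curvature proxy, grid, and iteration count are not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def diag_hessian(X):
    """Diagonal curvature proxy for a linear layer: H_ii ≈ E[x_i^2]
    over the calibration batch (an assumed, standard proxy)."""
    return (X ** 2).mean(axis=0)

def quantize_row(w, h, bits=3, iters=10):
    """Round a weight row onto a symmetric integer grid, refitting the
    scalar scale by weighted least squares between rounding steps."""
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(w).max() / qmax                      # naive max-abs scale
    for _ in range(iters):
        q = np.clip(np.round(w / s), -qmax, qmax)   # nearest grid point per weight
        num = (h * q * w).sum()                     # argmin_s sum_i h_i (w_i - s q_i)^2
        denom = (h * q * q).sum()
        if num > 0 and denom > 0:
            s = num / denom
    return s * np.clip(np.round(w / s), -qmax, qmax)

X = rng.normal(size=(64, 16))                       # calibration activations
w = rng.normal(size=16)                             # one weight row
h = diag_hessian(X)

s0 = np.abs(w).max() / 3                            # one-shot max-abs baseline
w_naive = s0 * np.clip(np.round(w / s0), -3, 3)
w_q = quantize_row(w, h, bits=3)

err_naive = h @ (w - w_naive) ** 2                  # curvature-weighted errors
err_dash = h @ (w - w_q) ** 2
```

Each iteration is a coordinate-descent step (round with the scale fixed, then refit the scale with the rounding fixed), so the curvature-weighted reconstruction error never exceeds the one-shot baseline's.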
Results
DASH-Q outperformed existing PTQ baselines in ultra low-bit quantization, achieving an average increase of 7.01% in zero-shot accuracy and up to 14.01% improvement over the strongest baselines across five LLM models, demonstrating robust performance even with very small calibration datasets.
Implications
The findings suggest that DASH-Q can significantly enhance the deployment of LLMs in environments with limited computational resources, making it a valuable tool for applications requiring efficient model performance without extensive retraining.
ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
NLP
Large Language Models
Efficient ML
- ConfLayers introduces a confidence-based mechanism for adaptive layer skipping in self-speculative decoding.
- The framework is training-free and offers a plug-and-play solution for constructing draft subnetworks.
- Empirical results show up to 1.4× speedup over standard autoregressive decoding while maintaining output quality.

- ConfLayers consistently outperforms existing heuristic and dynamic skipping methods.
Read more
ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
Summary
This paper introduces ConfLayers, a novel framework for self-speculative decoding in large language models (LLMs) that enhances inference speed without compromising output quality. Self-speculative decoding leverages a compact draft model for rapid token generation, followed by verification from a full target model. ConfLayers innovatively employs a confidence-based mechanism to dynamically select which layers to skip during inference, iteratively refining the draft model based on confidence scores and acceptance rates. This approach eliminates the need for auxiliary models or extensive training, providing a plug-and-play solution that adapts to various tasks and datasets. The empirical evaluation demonstrates that ConfLayers achieves up to 1.4× speedup over traditional autoregressive decoding methods while maintaining high-quality outputs, outperforming existing heuristic and dynamic layer-skipping strategies. The findings suggest that confidence-guided inference can significantly enhance the efficiency of LLM generation, making it a promising avenue for real-time applications.
Methodology
ConfLayers iteratively computes confidence scores for each layer during generation, selects layers to skip based on an adaptive threshold, and refines the draft model configuration based on the verifier's acceptance rates. This process is conducted without retraining or modifying the model architecture, allowing for a scalable and adaptable inference strategy.
Results
The performance evaluation across various models and datasets indicates that ConfLayers achieves up to 1.4× speedup compared to traditional LLM generation methods, while ensuring high output quality. It consistently outperforms previous layer-skipping and early-exiting baselines.
Implications
The findings suggest that ConfLayers can be effectively utilized in real-time applications requiring efficient LLM inference, potentially transforming how large language models are deployed in interactive settings. The confidence-guided approach may also inspire future research on adaptive inference techniques.
Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits
Theory
Optimization
Efficient ML
- Introduces SAT-CTS, a lightweight policy for beam and rate adaptation in mmWave systems.
- Establishes finite-time regret bounds for combinatorial semi-bandits with satisficing objectives.
- Demonstrates that SAT-CTS effectively reduces satisficing regret while maintaining fairness and throughput.
- Eliminates the need for explicit channel state information by relying on ACK/NACK feedback.
Read more
Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits
Summary
This paper addresses the challenge of downlink beam and rate adaptation in multi-user millimeter-wave (mmWave) MISO systems, where multiple base stations serve single-antenna user equipments using analog beamforming. The authors introduce a novel approach called SAT-CTS, which utilizes a satisficing throughput threshold to guide the adaptation process. By framing the problem as a combinatorial semi-bandit, SAT-CTS balances exploration and exploitation to efficiently learn beam-rate configurations that meet a predefined quality of service (QoS) target without requiring explicit channel state information (CSI). The theoretical contributions include finite-time regret bounds for the proposed algorithm, demonstrating its effectiveness in achieving satisfactory throughput while maintaining competitive standard regret and fairness across users. The experimental results indicate that SAT-CTS significantly reduces satisficing regret and enhances average throughput, showcasing its potential for practical applications in next-generation wireless networks.
Methodology
The authors model the joint beam and rate adaptation as a combinatorial semi-bandit problem, utilizing a satisficing approach that focuses on achieving a throughput threshold rather than maximizing throughput. The SAT-CTS algorithm combines conservative confidence estimates with posterior sampling to guide the learning process based on binary ACK/NACK feedback from user equipments. Theoretical analysis provides regret bounds under different realizability conditions of the throughput threshold.
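A minimal bandit loop makes the satisficing idea concrete: each (beam, rate) arm keeps a Beta posterior over its ACK probability, updated from binary feedback only, and the learner prefers the lowest-rate arm whose sampled throughput clears the QoS target. This sketch uses plain Thompson-style posterior sampling; SAT-CTS additionally combines it with conservative confidence estimates, and all numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

rates = np.array([1.0, 2.0, 4.0])        # Mb/s per beam-rate arm (hypothetical)
p_ack = np.array([0.95, 0.70, 0.20])     # true ACK probabilities, unknown to the learner
THRESHOLD = 1.2                          # satisficing throughput (QoS) target

alpha = np.ones(3)                       # Beta(alpha, beta) posterior per arm
beta = np.ones(3)
pulls = np.zeros(3, dtype=int)

for _ in range(2000):
    # Posterior sample of the expected throughput of each arm.
    sample = rng.beta(alpha, beta) * rates
    ok = np.flatnonzero(sample >= THRESHOLD)
    # Lowest rate believed to satisfice; otherwise explore the most promising arm.
    arm = int(ok[0]) if ok.size else int(sample.argmax())
    ack = rng.random() < p_ack[arm]      # binary ACK/NACK feedback, no CSI needed
    alpha[arm] += ack
    beta[arm] += 1 - ack
    pulls[arm] += 1

est = alpha / (alpha + beta) * rates     # posterior-mean throughput per arm
```

With these toy numbers only the middle arm's true throughput (2.0 × 0.7 = 1.4 Mb/s) meets the 1.2 Mb/s target, so the loop concentrates its pulls there rather than chasing the throughput-maximizing arm.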
Results
The SAT-CTS algorithm consistently reduces cumulative satisficing regret and maintains competitive standard regret across various time-varying sparse multipath channels. The results indicate that SAT-CTS achieves favorable average throughput and fairness among users, demonstrating its effectiveness in meeting QoS targets without requiring detailed channel state knowledge.
Implications
The findings suggest that SAT-CTS can be effectively applied in next-generation wireless networks, particularly in scenarios where channel state information is difficult to obtain. This approach can enhance communication reliability and throughput in multi-user environments, making it suitable for bandwidth-intensive applications such as interactive extended reality and ultra-high-definition streaming.
Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
Theory
Optimization
Efficient ML
- Introduction of top-k goodness function, significantly outperforming the traditional sum-of-squares method.
- Development of entmax-weighted energy for adaptive sparse weighting, leading to improved accuracy.
- Implementation of separate label–feature forwarding (FFCL) enhances performance across all goodness functions.
- Establishment of a unifying principle that emphasizes the importance of sparsity in goodness functions for FF networks.
Read more
Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
Summary
This paper investigates the Forward-Forward (FF) algorithm, a biologically plausible alternative to backpropagation for training neural networks. The authors challenge the conventional use of the sum-of-squares (SoS) goodness function, proposing a systematic exploration of the goodness function design space. They introduce 'top-k goodness,' which focuses on the k most active neurons, demonstrating a significant performance improvement over SoS on the Fashion-MNIST dataset. Additionally, they present 'entmax-weighted energy,' which utilizes a learnable sparse weighting mechanism to further enhance performance. The paper also adopts a novel approach called separate label–feature forwarding (FFCL), which injects class hypotheses at every layer. The combination of these innovations leads to a remarkable accuracy of 87.1% on Fashion-MNIST, representing a 30.7 percentage point improvement over the SoS baseline. The authors establish that sparsity in the goodness function is crucial for FF performance, with adaptive sparsity yielding the best results. Through extensive experiments, they reveal that the choice of goodness function significantly impacts the learning dynamics and overall effectiveness of FF networks.
Methodology
The authors conducted a systematic study of various goodness functions, focusing on the top-k goodness function that measures only the most active neurons. They also explored entmax-weighted energy for adaptive sparsity and implemented FFCL for label injection. The performance was evaluated through controlled experiments across multiple goodness functions and architectures, analyzing the effects of sparsity on model performance.
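The contrast between the two goodness functions is easy to see in code: sum-of-squares counts every neuron, while top-k goodness measures only the k most active ones, so diffuse low-level activity no longer inflates the score. A small numpy sketch (the activations are made up for illustration):

```python
import numpy as np

def sos_goodness(h):
    """Classic Forward-Forward goodness: sum of squared activations."""
    return (h ** 2).sum(axis=-1)

def topk_goodness(h, k):
    """Top-k goodness: sum the k largest squared activations only,
    ignoring the low-activity tail."""
    sq = np.sort(h ** 2, axis=-1)
    return sq[..., -k:].sum(axis=-1)

h = np.array([[3.0, 0.1, 0.2, 2.0],    # sparse, strongly driven layer
              [0.5, 0.5, 0.5, 0.5]])   # diffuse, weakly driven layer
g_sos = sos_goodness(h)      # both rows accumulate every neuron
g_top = topk_goodness(h, 2)  # only the two most active neurons count
```

Under top-k, the diffuse row's goodness collapses (0.5 vs. 1.0) while the sparse row's barely changes (13.0 vs. 13.05) — the selectivity the paper argues drives the performance gains.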
Results
The proposed top-k goodness function achieved a 22.6 percentage point improvement over the SoS baseline on Fashion-MNIST. The entmax-weighted energy further improved results, and the combination of these methods with FFCL led to an overall accuracy of 87.1%, a 30.7 percentage point increase over the SoS baseline.
Implications
The findings suggest that rethinking the design of goodness functions can lead to significant advancements in the performance of FF networks. This work may influence future research in biologically inspired learning algorithms and their applications in various domains, including computer vision and beyond.
Golden Handcuffs make safer AI agents
Reinforcement Learning
Theory
- Introduces the 'Golden Handcuffs' mechanism to enhance safety in AI agents.
- Expands the reward range to include a large negative value, promoting risk aversion.
- Proves that the agent can achieve sublinear regret against the best mentor.
- Ensures that unsafe actions are never taken by the optimizing policy before being flagged by a mentor.
Read more
Golden Handcuffs make safer AI agents
Summary
This paper addresses the safety concerns associated with reinforcement learning (RL) agents operating in general environments, where traditional assumptions do not hold. The authors propose a novel approach called the 'Golden Handcuffs' agent, which incorporates a pessimistic variant of the AIXI framework. By expanding the agent's subjective reward range to include a large negative value, the agent becomes risk-averse to strategies that could lead to significant negative outcomes. The agent employs a mentor-guided exploration mechanism, allowing it to defer to safer mentor policies when its confidence in achieving high rewards diminishes. The authors demonstrate that this approach leads to two main properties: (i) capability, where the agent achieves sublinear regret against the best mentor, and (ii) safety, ensuring that no low-complexity unsafe actions are taken before being flagged by a mentor. The paper discusses the implications of this method for improving the safety and robustness of AI agents in complex environments.
Methodology
The authors develop a Bayesian policy that incorporates a pessimistic approach to reward maximization. The agent's reward structure is modified to include a large negative value, which discourages exploration of potentially harmful strategies. The agent occasionally defers to mentor policies for exploration and safety, ensuring that it learns from safe actions while avoiding irrecoverable states.
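The two ingredients above — a pessimistic value that mixes expected reward with a subjective worst case, and deferral to the mentor when confidence is low — can be caricatured in a few lines. The constants, weights, and decision rule below are illustrative stand-ins, not the paper's Bayesian formulation:

```python
def pessimistic_value(rewards, floor=-100.0, pessimism=0.2):
    """Blend the empirical mean with the worst observed outcome, with the
    worst case capped at a large subjective floor. Constants are made up."""
    mean = sum(rewards) / len(rewards)
    worst = max(min(rewards), floor)
    return (1 - pessimism) * mean + pessimism * worst

def choose(agent_conf, agent_value, mentor_value, conf_threshold=0.6):
    """Defer to the mentor when confidence in achieving high reward is low
    or the mentor's policy looks better under the pessimistic value."""
    if agent_conf >= conf_threshold and agent_value >= mentor_value:
        return "agent"
    return "mentor"

risky = [1.0, 1.0, -500.0]   # occasionally catastrophic strategy
safe = [0.4, 0.5, 0.45]      # modest but reliable strategy
v_risky = pessimistic_value(risky)   # pulled far down by the worst case
v_safe = pessimistic_value(safe)
```

Even though the risky strategy has the higher best-case reward, its pessimistic value is dominated by the catastrophic outcome, so a risk-averse agent prefers the safe strategy or defers to the mentor.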
Results
The Golden Handcuffs agent achieves sublinear regret of order T^(2/3 + ε) against the best mentor policy over time T. Additionally, it guarantees that the agent takes no unsafe action before one has been flagged by a mentor, thus enhancing the overall safety of the agent's operations.
Implications
This approach has significant implications for the design of safer AI systems, particularly in environments where traditional safety guarantees are insufficient. It can be applied in various domains where AI agents interact with complex and unpredictable environments, ensuring that they remain aligned with human safety standards.
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
Reinforcement Learning
Robotics
Efficient ML
- Introduction of TrailBlazer, a sample-efficient Monte-Carlo planning algorithm.
- Focus on exploring near-optimal states to minimize oracle calls.
- Theoretical guarantees on sample complexity provided.
- Comparison with existing algorithms shows significant efficiency improvements.
Read more
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
Summary
This paper introduces TrailBlazer, a novel algorithm designed for sample-efficient Monte-Carlo planning in Markov Decision Processes (MDPs) utilizing a generative model. The authors focus on optimizing the planning process by exploring only a subset of states that can be reached through near-optimal policies, thereby reducing the number of oracle calls needed to approximate the value function. The paper provides a sample complexity analysis of TrailBlazer, demonstrating that it can achieve ε-accurate approximations of the value function with a significantly lower number of oracle calls compared to existing algorithms like UCT and uniform sampling methods. The authors emphasize the importance of exploiting the structure of the MDP to enhance efficiency, and they present theoretical guarantees on the sample complexity based on a new problem-dependent measure of near-optimal nodes. The approach is validated through a tree representation of the planning problem, which alternates between maximizing actions and averaging over possible next states.
Methodology
The authors employ a tree-based representation of the planning problem, where nodes alternate between maximizing actions and averaging over next states. They analyze the sample complexity of the TrailBlazer algorithm, ensuring that it can provide ε-accurate approximations of the value function with high probability. The algorithm's sampling strategy is designed to adaptively exploit the structure of the MDP, allowing for efficient exploration of near-optimal states.
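The alternating MAX/AVG tree can be sketched with a naive recursive estimator: MAX nodes take the best action, AVG nodes average rollouts drawn from the generative-model oracle. Note that this uniform-sampling sketch is exactly what TrailBlazer improves on — the algorithm itself allocates samples adaptively, concentrating them on near-optimal nodes:

```python
def estimate_value(state, depth, actions, step, n_samples=8, gamma=0.9):
    """Alternate MAX (over actions) and AVG (over sampled next states) nodes.
    `step(state, action)` is the generative-model oracle, returning
    (reward, next_state). Uniform sampling; illustrative only."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(n_samples):               # AVG node: sample transitions
            r, s2 = step(state, a)
            total += r + gamma * estimate_value(s2, depth - 1, actions, step,
                                                n_samples, gamma)
        best = max(best, total / n_samples)      # MAX node: best action
    return best

# Toy deterministic oracle: action 1 yields reward 1, action 0 yields 0,
# and the state never changes (hypothetical, for illustration).
def step(state, action):
    return (1.0 if action == 1 else 0.0), state

v = estimate_value("s0", depth=2, actions=[0, 1], step=step)
```

For this deterministic toy MDP the two-step optimal value is 1 + 0.9 = 1.9, which the estimator recovers exactly; the oracle-call count of the uniform scheme grows as (|actions| × n_samples)^depth, which is the cost TrailBlazer's adaptive sampling avoids.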
Results
TrailBlazer demonstrates improved sample complexity compared to traditional methods, achieving polynomial bounds in specific cases. The algorithm effectively reduces the number of oracle calls required to approximate the value function, making it computationally efficient while maintaining high accuracy in planning.
Implications
The findings suggest that TrailBlazer can be applied in various domains requiring efficient planning under uncertainty, such as robotics and automated decision-making systems. Its ability to leverage the structure of MDPs could lead to advancements in real-time planning applications and enhance the performance of AI systems in complex environments.
Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification
Theory
Interpretability
Efficient ML
- Introduces a physics-informed transfer learning framework for methane sorption prediction.
- Achieves a 227% improvement over classical isotherm models in predictive accuracy.
- Monte Carlo Dropout is identified as the best method for uncertainty quantification.
- Demonstrates the importance of moisture-volatile interactions in methane sorption.
Read more
Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification
Summary
This paper presents a novel physics-informed transfer learning framework designed to enhance methane sorption predictions across various coal ranks. The framework adapts a previously developed hydrogen sorption Physics-Informed Neural Network (PINN) to methane sorption using Elastic Weight Consolidation and coal-specific feature engineering. A three-phase curriculum is implemented to balance the preservation of transfer learning with thermodynamic fine-tuning. The model is trained on a dataset comprising 993 equilibrium measurements from 114 independent coal experiments, achieving an R² score of 0.932 on held-out samples, significantly outperforming traditional pressure-only isotherm models. The study also evaluates five Bayesian uncertainty quantification methods, revealing that Monte Carlo Dropout provides the most reliable uncertainty estimates with minimal computational overhead. The results indicate that moisture-volatile interactions are the most influential factors in sorption behavior, and the learned representations maintain physical interpretability. This work demonstrates the effectiveness of cross-gas transfer learning as a strategy for geological material modeling, particularly in data-scarce environments.
Methodology
The methodology involves adapting a hydrogen sorption PINN to methane sorption through Elastic Weight Consolidation and coal-specific feature engineering. A three-phase curriculum is utilized to progressively balance transfer preservation with thermodynamic fine-tuning. The model is trained on a comprehensive dataset of coal sorption measurements, and various Bayesian uncertainty quantification methods are compared to assess their performance under physics constraints.
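Of the compared uncertainty methods, Monte Carlo Dropout is the simplest to illustrate: keep dropout active at inference and read the epistemic uncertainty off the spread of repeated stochastic passes. The tiny network, dropout rate, and inputs below are invented placeholders, not the paper's PINN:

```python
import numpy as np

rng = np.random.default_rng(3)

W1 = rng.normal(size=(4, 32))     # illustrative two-layer network
W2 = rng.normal(size=(32, 1))

def predict(x, drop_p=0.2):
    """One stochastic forward pass with dropout kept active at inference."""
    h = np.maximum(x @ W1, 0.0)                  # ReLU hidden layer
    mask = rng.random(h.shape) >= drop_p         # Bernoulli dropout mask
    h = h * mask / (1.0 - drop_p)                # inverted-dropout scaling
    return (h @ W2).ravel()

def mc_dropout(x, n_passes=100):
    """Monte Carlo Dropout: mean prediction and per-sample epistemic
    standard deviation over repeated stochastic passes."""
    preds = np.stack([predict(x) for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

x = rng.normal(size=(5, 4))                      # e.g. coal-feature inputs
mean, std = mc_dropout(x)
```

The appeal the paper reports — reliable calibration at minimal overhead — follows from the structure: uncertainty comes from re-running the same trained network, with no ensemble of separately trained models to store or train.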
Results
The proposed framework achieved an R² score of 0.932 on held-out coal samples, indicating a significant improvement in prediction accuracy compared to traditional models. Monte Carlo Dropout provided well-calibrated uncertainty estimates with an expected calibration error of 0.101 and a correlation coefficient of 0.708, while deep ensembles showed performance degradation due to shared physics constraints.
Implications
The findings suggest that the proposed framework can significantly enhance methane sorption predictions in coal seams, which is critical for resource assessment and carbon storage. The effective use of cross-gas transfer learning could lead to more efficient modeling strategies in geological applications, particularly in scenarios with limited data.