AI-generated summaries
Today's ML research, without the noise.
Summaries of the latest machine learning papers from arXiv, processed every 8 hours.
24 papers today · Updated every 8 hours · 7 days of history
QDSB: Quantized Diffusion Schrödinger Bridges
Generative Models
Efficient ML
Optimization
- Introduction of QDSB for efficient generative modeling from unpaired samples.
- Anchor-based quantization method improves computational efficiency of coupling.
- Stability analysis provides geometric principles for anchor selection.
- Empirical results show QDSB matches existing methods in sample quality while reducing coupling time.
QDSB: Quantized Diffusion Schrödinger Bridges
Summary
The paper introduces Quantized Diffusion Schrödinger Bridges (QDSB), a novel approach to learning generative models from unpaired samples of source and target distributions. Traditional Schrödinger bridge (SB) methods must first estimate a coupling between source and target samples to recover the most likely evolution between the distributions, and computing this coupling is expensive. The authors propose an anchor-based quantization method that solves the entropic optimal transport (OT) problem on quantized endpoint distributions, yielding a more efficient coupling process. This method retains the simulation-free training objective while significantly reducing computational time. The stability of the proposed coupling is analyzed, showing that the error is controlled by the quality of the anchor approximation. Empirical evaluations demonstrate that QDSB matches the sample quality of existing methods while improving time performance, making it a promising approach for generative modeling tasks.
Methodology
The QDSB method involves quantizing endpoint distributions onto a set of anchors, computing the entropic OT between these discrete measures, and sampling original endpoints from matched anchor cells. This approach allows for simulation-free training while maintaining the quality of the learned generative model.
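A minimal sketch of how such an anchor-based coupling could look, assuming nearest-anchor quantization and a basic Sinkhorn solver; the function names, anchor sets, and hyperparameters below are illustrative, and the paper's geometric anchor-selection principles are not reproduced:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=200):
    """Entropic OT plan between discrete measures a, b with cost matrix C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def anchor_coupling(X, Y, anchors_x, anchors_y, n_pairs, eps=0.1, rng=None):
    """Illustrative anchor-based coupling: quantize, solve OT on anchors,
    then sample real endpoints from matched anchor cells."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # 1) Quantize endpoints onto anchors (nearest-anchor cells).
    ax = np.argmin(((X[:, None] - anchors_x[None]) ** 2).sum(-1), axis=1)
    ay = np.argmin(((Y[:, None] - anchors_y[None]) ** 2).sum(-1), axis=1)
    a = np.bincount(ax, minlength=len(anchors_x)) / len(X)
    b = np.bincount(ay, minlength=len(anchors_y)) / len(Y)
    # 2) Entropic OT between the small quantized measures.
    C = ((anchors_x[:, None] - anchors_y[None]) ** 2).sum(-1)
    P = sinkhorn(a, b, C, eps)
    # 3) Draw anchor pairs from the plan, then real endpoints from the cells.
    flat = rng.choice(P.size, size=n_pairs, p=(P / P.sum()).ravel())
    i, j = np.unravel_index(flat, P.shape)
    xs = np.array([rng.choice(np.where(ax == ii)[0]) for ii in i])
    ys = np.array([rng.choice(np.where(ay == jj)[0]) for jj in j])
    return X[xs], Y[ys]
```

The OT problem is solved only over the anchors (a small discrete problem), which is where the claimed efficiency gain comes from.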
Results
The experiments conducted on both toy and real-world tasks indicate that QDSB effectively preserves the quality of simulation-free bridge training while significantly improving the time-performance trade-off during the coupling stage.
Implications
QDSB has potential applications in various generative modeling tasks, including image translation, audio synthesis, and video generation, where efficient learning from unpaired samples is crucial.
QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization
Efficient ML
Optimization
Computer Vision
- Introduction of the Intelligence Index (I) for unified evaluation of quantized neural networks.
- QuIDE framework offers a standardized protocol for measuring model efficiency across various architectures.
- Empirical findings reveal a task-dependent Pareto Knee, with optimal bit-widths varying by task complexity.
- The accuracy-gated variant (I′) effectively identifies non-viable quantization configurations.
QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization
Summary
The paper introduces QuIDE, a framework designed to evaluate the efficiency of quantized neural networks through a unified metric called the Intelligence Index (I). This index combines the trade-offs of compression, accuracy, and latency into a single scalar value, defined as I = (C × P) / log2(T + 1), where C represents compression, P denotes predictive accuracy, and T indicates inference latency. The authors conduct experiments across six different settings, including SimpleCNN on MNIST and CIFAR datasets, ResNet-18 on ImageNet-1K, and Llama-3-8B, revealing a task-dependent Pareto Knee. The findings indicate that 4-bit quantization is optimal for simpler tasks and large language models, while 8-bit quantization is necessary for complex CNN tasks like ResNet-18 on ImageNet, where 4-bit quantization leads to significant accuracy loss. An accuracy-gated variant of the Intelligence Index (I′) is proposed to prevent rewarding non-viable configurations that the raw index might inflate. The QuIDE framework provides a reproducible evaluation protocol and a fitness function for mixed-precision search, addressing the lack of standardized metrics in quantized model evaluation.
Methodology
The authors define the Intelligence Index (I) and its accuracy-gated variant (I′) formally. They conduct post-training quantization (PTQ) experiments across six different neural network architectures and datasets to validate the effectiveness of the QuIDE framework. The methodology emphasizes a reproducible evaluation protocol that integrates compression, accuracy, and latency into a single metric.
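A minimal sketch of the index as given in the summary; the gating rule and the p_min threshold in the I′ variant are our assumptions, since the paper's exact gate is not reproduced here:

```python
import math

def intelligence_index(C, P, T):
    """I = (C * P) / log2(T + 1): compression C, predictive accuracy P,
    inference latency T, as defined in the summary above."""
    return (C * P) / math.log2(T + 1)

def gated_intelligence_index(C, P, T, p_min=0.5):
    """Accuracy-gated variant I': zero out configurations below an accuracy
    floor so the raw index cannot reward non-viable quantizations.
    The thresholding rule and p_min value are our assumptions."""
    return intelligence_index(C, P, T) if P >= p_min else 0.0

# e.g. 8x compression, 71% top-1 accuracy, 12 ms latency:
print(intelligence_index(8, 0.71, 12))  # ~1.53
```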
Results
The results demonstrate that 4-bit quantization is optimal for simpler tasks like MNIST and large language models, while 8-bit quantization is necessary for more complex tasks such as ResNet-18 on ImageNet, where 4-bit quantization results in catastrophic accuracy loss. The accuracy-gated variant I′ successfully identifies configurations that would otherwise be incorrectly rewarded by the raw index I.
Implications
The QuIDE framework has the potential to standardize the evaluation of quantized neural networks, facilitating better model deployment on edge devices. It provides a systematic approach to optimize the trade-offs between compression, accuracy, and latency, which is crucial for real-world applications in resource-constrained environments.
Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
Efficient ML
Time Series
Theory
- Identified gradient blocking during BMRU updates as a key limitation.
- Introduced CMRU and αCMRU as novel parallelizable RNN cells with persistent memory.
- Demonstrated improved convergence stability and reduced initialization sensitivity.
- CMRU and αCMRU outperform traditional RNNs on tasks requiring long-term retention.
Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
Summary
This paper addresses the challenges of learning long-term dependencies in recurrent neural networks (RNNs) designed for ultra-low power applications. The authors introduce the Cumulative Memory Recurrent Unit (CMRU) and its relaxed variant, the αCMRU, as solutions to the limitations of the Bistable Memory Recurrent Unit (BMRU), particularly focusing on the issue of gradient blocking during state updates. By proposing a cumulative update formulation, the authors enhance gradient flow while maintaining persistent memory, which is crucial for efficient analog implementations. The CMRU and αCMRU demonstrate improved convergence stability and reduced sensitivity to initialization compared to traditional RNN architectures. Experimental results show that these new units match or exceed the performance of Linear Recurrent Units (LRUs) and minimal Gated Recurrent Units (minGRUs) on various benchmarks, particularly excelling in tasks that require discrete long-range retention. The findings suggest that the CMRU and αCMRU can effectively bridge the performance gap with state-space models and gated RNNs while being suitable for both analog and digital applications.
Methodology
The authors developed a cumulative update formulation to address gradient blocking in BMRUs, leading to the creation of the CMRU and αCMRU. They conducted systematic benchmarking to evaluate the performance of these units against traditional RNN architectures on various sequential tasks, focusing on small model sizes and interpretability.
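The summary does not reproduce the CMRU equations, so the sketch below only illustrates the general contrast it describes: a hard bistable update whose gradient is blocked almost everywhere versus a cumulative update that keeps a differentiable path through the state. All names and the specific update forms are illustrative, not the paper's:

```python
import torch
from torch import nn

W = nn.Linear(16, 32)  # illustrative input-to-state map

def bistable_step(h, x):
    # Hard switching: sign() has zero gradient almost everywhere, so the
    # learning signal through the state update is blocked.
    return torch.sign(h + W(x))

def cumulative_step(h, x, alpha=0.9):
    # Cumulative formulation (illustrative only): the state accumulates
    # bounded increments, so dh_t/dh_{t-1} = alpha != 0 and gradients flow.
    return alpha * h + (1 - alpha) * torch.tanh(W(x))
```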
Results
The CMRU and αCMRU showed significant improvements in convergence stability and performance on benchmarks, matching or surpassing LRUs and minGRUs, particularly in tasks that require discrete long-range memory retention. The CMRU maintained quantized states and noise-resilient dynamics, making it suitable for ultra-low power analog implementations.
Implications
The findings suggest that CMRU and αCMRU can be effectively utilized in ultra-low power applications, such as always-on sensors and biomedical implants, where efficient memory retention and low power consumption are critical. Additionally, these models may facilitate advancements in hardware-software co-design for machine learning systems.
ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models
Large Language Models
Optimization
Efficient ML
- ADMM-Q formulates layer-wise weight quantization as a constrained optimization problem, improving upon traditional methods.
- The algorithm employs a joint optimization approach rather than a greedy column-wise method, enhancing model utility.
- ADMM-Q shows significant performance improvements over GPTQ, particularly in low-bit quantization scenarios.
- The method is modular and compatible with existing quantization techniques, ensuring ease of integration.
ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models
Summary
The paper introduces ADMM-Q, a novel weight quantization algorithm designed to enhance post-training quantization (PTQ) for large language models (LLMs). Traditional quantization methods like GPTQ and RTN often lead to degraded model performance, especially at aggressive quantization levels (sub-4-bit). ADMM-Q addresses this by formulating the layer-wise quantization problem as a constrained optimization task, utilizing a combinatorial variant of the Alternating Direction Method of Multipliers (ADMM). This approach allows for simultaneous optimization of all weights, minimizing layer-wise reconstruction error while adhering to quantization constraints. The authors implement several enhancements, including penalty scheduling and local search post-processing, to ensure efficiency at LLM scale. The results demonstrate that ADMM-Q outperforms existing methods in perplexity and accuracy across various quantization settings, making it a modular and effective drop-in replacement for existing weight quantizers in PTQ pipelines.
Methodology
The authors propose ADMM-Q, which uses the Alternating Direction Method of Multipliers (ADMM) to jointly optimize weights for layer-wise quantization. The algorithm incorporates a Hessian-weighted update mechanism and a discrete quantization projection, along with enhancements like adaptive penalty scheduling and a local search refinement step to improve the final quantized weights.
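A hedged sketch of the core ADMM loop under the stated formulation (Hessian-weighted reconstruction objective, discrete projection, dual update). The closed-form W-update follows from minimizing tr((W - W0) H (W - W0)^T) + (rho/2) ||W - Z + U||^2; the paper's adaptive penalty scheduling and local-search refinement are omitted, and rho is fixed for brevity:

```python
import numpy as np

def quantize_rtn(W, grid):
    """Round each weight to the nearest value on the quantization grid."""
    return grid[np.abs(W[..., None] - grid).argmin(-1)]

def admm_quantize(W0, X, grid, rho=1.0, iters=50):
    """Sketch of ADMM-style layer quantization: minimize ||(W - W0) X||_F^2
    subject to W lying on the grid. W0: (out, d) weights; X: (d, n) calibration
    activations. Not the paper's full algorithm."""
    H = X @ X.T                          # proxy Hessian of reconstruction loss
    d = H.shape[0]
    Z = quantize_rtn(W0, grid)           # quantized copy
    U = np.zeros_like(W0)                # scaled dual variable
    A = np.linalg.inv(2 * H + rho * np.eye(d))
    for _ in range(iters):
        W = (2 * W0 @ H + rho * (Z - U)) @ A   # Hessian-weighted update
        Z = quantize_rtn(W + U, grid)          # projection onto the grid
        U += W - Z                             # dual ascent
    return Z
```

Unlike a greedy column-wise solver, every weight is updated jointly in each W-step, which is the structural difference the paper credits for the utility gains.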
Results
ADMM-Q consistently outperforms GPTQ across various configurations of the Qwen3 model family, achieving lower perplexity and better zero-shot accuracy. For instance, in the W3A16 weight-only setting, perplexity improved from 12.85 to 10.06. The integration of ADMM-Q into existing weight-and-activation quantization pipelines like SpinQuant and SmoothQuant also yielded superior results, with preserved inference throughput and memory benefits.
Implications
The development of ADMM-Q has significant implications for the deployment of large language models on resource-constrained devices, as it allows for aggressive quantization without sacrificing model performance. This could enhance the accessibility of LLMs in real-time applications and on edge devices.
RT-Transformer: The Transformer Block as a Spherical State Estimator
Theory
- Introduction of the Radial–Tangential SDE (RT-SDE) for structured stochastic modeling of noise in Transformers.
- Attention is reinterpreted as a precision-weighted estimator of latent directions on a hypersphere.
- Unified derivation of attention, residual connections, and normalization as components of a single filtering update.
- Proposed architectural modifications enhance the Transformer by incorporating magnitude-dependent precision and normalization.
RT-Transformer: The Transformer Block as a Spherical State Estimator
Summary
This paper presents a novel interpretation of the Transformer block by modeling its core components—attention, residual connections, and normalization—as arising from a single geometric estimation problem. The author introduces the Radial–Tangential Stochastic Differential Equation (RT-SDE), which models the latent state as a direction on a hypersphere, allowing for a precision-weighted directional inference procedure. In this framework, attention aggregates evidence based on reliability, residual connections facilitate incremental state updates, and normalization retracts the updated state back onto the hypersphere. The RT-SDE maintains computational tractability while accommodating anisotropic noise through a structured decomposition of uncertainty into radial and tangential components. The paper also proposes architectural modifications to the standard Transformer, including magnitude-dependent attention precision and normalization of queries, keys, and values, which arise naturally from the underlying geometric model rather than as arbitrary design choices. This work lays the groundwork for a unified understanding of the Transformer architecture and suggests future empirical evaluations to validate the proposed framework.
Methodology
The paper employs a theoretical framework based on geometric estimation, specifically using the Radial–Tangential SDE to model the latent state on a hypersphere. It derives the operations of the Transformer block from this geometric perspective, focusing on the implications of anisotropic noise and the structure of state updates.
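A schematic of a single precision-weighted update as we read the framework: attention aggregates normalized value directions with magnitude-dependent precision, the residual connection adds the increment, and normalization retracts the state onto the sphere. The functional forms below (norm-proportional precision, unit-normalized keys and values) are illustrative assumptions, not the paper's exact equations:

```python
import torch
import torch.nn.functional as F

def rt_block_step(h, K, V, W_q, beta=1.0):
    """Schematic single-head update read as spherical state estimation.
    h: (d,) current state; K, V: (n, d) keys and values; W_q: query map."""
    q = F.normalize(W_q(h), dim=-1)               # latent direction of the query
    prec = beta * K.norm(dim=-1)                  # magnitude-dependent precision
    logits = prec * (F.normalize(K, dim=-1) @ q)  # reliability-weighted evidence
    w = torch.softmax(logits, dim=-1)
    update = w @ F.normalize(V, dim=-1)           # precision-weighted estimate
    h_new = h + update                            # residual: incremental update
    return F.normalize(h_new, dim=-1)             # retraction onto the sphere
```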
Results
The RT-SDE allows for closed-form covariance propagation and tractable precision computation, enabling a directional inference process that maintains the efficiency of the Transformer architecture. The proposed modifications to the Transformer architecture are grounded in the geometric interpretation, suggesting that token magnitude encodes directional confidence and that attention logits should incorporate magnitude-dependent precision.
Implications
This work provides a unified theoretical foundation for understanding the Transformer architecture, potentially leading to more effective designs and applications in various domains. The insights gained from the geometric interpretation may inform future research on attention mechanisms and state estimation in machine learning models.
No More, No Less: Task Alignment in Terminal Agents
NLP
Large Language Models
Theory
- Introduces the Task Alignment Benchmark (TAB) to evaluate task alignment in terminal agents.
- Defines task alignment as the selective use of environmental information, distinguishing between cue utilization and distraction resistance.
- Demonstrates a gap between task capability and task alignment in existing terminal agents.
- Shows that suppressing distractor execution can also suppress necessary cues for task completion.
No More, No Less: Task Alignment in Terminal Agents
Summary
This paper addresses the challenge of task alignment in terminal agents, which are increasingly capable of executing complex tasks based on user prompts. The authors introduce the Task Alignment Benchmark (TAB), a suite of 89 terminal tasks designed to evaluate how well agents can selectively use relevant environmental cues while ignoring irrelevant distractions. Existing benchmarks fail to capture this nuanced ability, as they often measure task completion without considering how agents interpret environmental information. The TAB tasks are intentionally underspecified, requiring agents to discern necessary cues embedded in the environment from plausible distractors. The evaluation of ten frontier agents reveals a significant gap between task capability and task alignment, with some agents achieving high task completion rates but low alignment scores. The study also shows that defenses aimed at suppressing distractor execution can inadvertently hinder cue utilization, highlighting the need for agents to balance the use of environmental instructions. Overall, the paper formalizes task alignment, introduces a new benchmark, and demonstrates the importance of selective information processing in terminal agents.
Methodology
The authors developed the Task Alignment Benchmark (TAB) by transforming 89 terminal tasks from Terminal-Bench 2.1, ensuring each task contained a necessary cue and a distractor. They evaluated ten terminal agents on their ability to utilize cues and resist distractions, measuring task alignment through cue utilization and distraction resistance metrics.
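A minimal sketch of the two per-agent metrics; the summary does not spell out how they combine into the task-alignment score, so the joint rule below (cue used and distractor resisted in the same run) is an assumption:

```python
def task_alignment(runs):
    """runs: list of {'used_cue': bool, 'executed_distractor': bool}, one per
    evaluated task run. The combination rule is our assumption."""
    n = len(runs)
    cue_utilization = sum(r['used_cue'] for r in runs) / n
    distraction_resistance = sum(not r['executed_distractor'] for r in runs) / n
    alignment = sum(r['used_cue'] and not r['executed_distractor']
                    for r in runs) / n
    return cue_utilization, distraction_resistance, alignment
```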
Results
The evaluation revealed that while some agents achieved high task completion rates, their task alignment scores were low, indicating poor distraction resistance. For instance, GPT-5.5 had high cue utilization but only 23% task alignment, while Claude Opus 4.7 achieved 72% task alignment through better distraction resistance. The study also found that defenses designed to limit distractor execution often compromised cue utilization.
Implications
The findings suggest that improving task alignment in terminal agents is crucial for their effective deployment in real-world environments, where they must navigate complex and often conflicting information. The TAB benchmark can serve as a tool for future research and development of more sophisticated agents capable of discerning relevant information.
KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
Computer Vision
Theory
Efficient ML
- KAN-CL utilizes per-knot importance regularization to address catastrophic forgetting in continual learning.
- The framework combines a KAN classification head with a convolutional backbone, enhancing feature extraction while localizing task-specific parameters.
- Significant reductions in forgetting (88% and 93%) were achieved on standard benchmarks, while maintaining competitive accuracy.
- Theoretical insights from NTK analysis support the effectiveness of the proposed method in preventing forgetting.
KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
Summary
The paper addresses the challenge of catastrophic forgetting in continual learning (CL), where models lose previously acquired knowledge when learning new tasks. The authors propose KAN-CL, a framework utilizing Kolmogorov-Arnold Networks (KANs) to implement per-knot importance regularization. This method allows for task-specific parameterization by exploiting the compact-support spline structure of KANs, which enables localized parameter activation based on input ranges. KAN-CL is integrated with a convolutional backbone and employs standard Elastic Weight Consolidation (EWC) regularization on the backbone, referred to as bbEWC. The results demonstrate significant reductions in forgetting—88% and 93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T, respectively—while maintaining or exceeding baseline accuracy. The paper also includes a Neural Tangent Kernel (NTK) analysis, revealing that the structural properties of KANs lead to a forgetting bound that remains effective even during feature learning. Overall, KAN-CL represents a modular and principled approach to mitigating catastrophic forgetting in continual learning.
Methodology
The authors developed KAN-CL by integrating Kolmogorov-Arnold Networks with a convolutional neural network (CNN) backbone. They introduced a per-knot Fisher decomposition to model parameter importance at a granular level, allowing for localized regularization. The method employs standard EWC on the backbone while applying per-knot importance-weighted L2 anchoring on the KAN head. A neural tangent kernel analysis was conducted to derive theoretical bounds on forgetting.
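A minimal sketch of the per-knot anchoring term on the KAN head, assuming one Fisher importance value per knot coefficient; the per-knot Fisher decomposition itself follows the paper and is not reproduced here:

```python
import torch

def per_knot_penalty(coeffs, coeffs_old, fisher_knots, lam=1.0):
    """Per-knot importance-weighted L2 anchoring (sketch).
    coeffs: current spline coefficients of the KAN head, one per knot;
    coeffs_old: values frozen after the previous task;
    fisher_knots: per-knot importance estimates."""
    return lam * (fisher_knots * (coeffs - coeffs_old) ** 2).sum()

# Added to the task loss alongside standard EWC on the conv backbone (bbEWC):
# loss = task_loss + per_knot_penalty(head.coeffs, anchors, fisher) + bbewc_term
```

Because spline knots have compact support, only the knots activated by a task accumulate importance, which is what localizes the regularization.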
Results
KAN-CL demonstrated a reduction in forgetting by 88% and 93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T, respectively, compared to a head-only KAN baseline. The accuracy of KAN-CL matched or exceeded that of all baseline methods, achieving a Pareto improvement on the accuracy-forgetting trade-off.
Implications
The findings suggest that KAN-CL could be a valuable framework for applications requiring continual learning, such as robotics, autonomous systems, and any domain where models must adapt to new tasks without losing prior knowledge. The approach may also inspire further research into localized parameterization strategies in neural networks.
TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning
Time Series
- TRACE introduces an autoregressive EEG pre-training framework that predicts future EEG patches from causal context.
- The TR-MoE block combines spatial-temporal attention with a cross-channel routing mechanism to maintain coherence across channels.
- TRACE supports heterogeneous pre-training across various EEG datasets without requiring a uniform electrode layout.
- The framework achieves leading results on multiple EEG benchmarks, demonstrating its effectiveness in both seen and unseen domains.
TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning
Summary
The paper presents TRACE, an innovative autoregressive framework designed for learning transferable representations from electroencephalography (EEG) data. Traditional methods struggle with EEG's multi-channel and non-stationary nature, often applying uniform computation across time or treating channels independently. TRACE addresses these challenges by predicting future EEG patches based on causal context while maintaining coherence across channels. It employs a Temporal Routing Mixture-of-Experts (TR-MoE) architecture that adapts computation to varying temporal dynamics while ensuring cross-channel consistency. The framework is compatible with diverse EEG datasets, allowing for heterogeneous pre-training without enforcing a common montage. Evaluated across eight EEG benchmarks, TRACE demonstrates superior performance in both seen-domain transfer and unseen-dataset generalization, outperforming several existing methods and confirming the importance of its unique routing mechanism.
Methodology
TRACE utilizes an autoregressive approach to predict future EEG patches from past observations, employing a TR-MoE block that integrates spatial-temporal attention and a cross-channel temporal routing feedforward network (CTR-FFN). This architecture allows for adaptive computation based on the temporal state of the entire EEG signal, ensuring coherence among channels observed simultaneously. The model is pre-trained on a diverse corpus of EEG data that includes various montages and recording protocols.
Results
TRACE achieved the best results on several EEG benchmarks, particularly excelling in seen-domain transfer and maintaining competitive performance in motor imagery and clinical event classification tasks. The ablation studies confirmed the advantages of the CTR-FFN and the multi-source pre-training strategy, underscoring the model's robustness across different EEG datasets.
Implications
The TRACE framework has significant implications for brain-computer interface (BCI) applications and clinical monitoring, as it enhances the ability to learn generalizable EEG representations. Its autoregressive nature allows for real-time predictions of neural dynamics, which could improve the accuracy and responsiveness of BCI systems.
Targeted Neuron Modulation via Contrastive Pair Search
NLP
Large Language Models
Interpretability
- CNA identifies the 0.1% of MLP neurons crucial for distinguishing harmful from benign prompts.
- Neuron-level ablation reduces refusal rates by over 50% while preserving output coherence.
- Refusal mechanisms in instruction-tuned models are targetable and sparse, unlike base models.
- Results are consistent across different architectures and model sizes, indicating robustness.
Targeted Neuron Modulation via Contrastive Pair Search
Summary
This paper addresses the mechanisms behind the refusal behavior of language models when faced with harmful requests. The authors introduce a novel method called Contrastive Neuron Attribution (CNA), which identifies a sparse subset of MLP neurons that significantly distinguish between harmful and benign prompts. Unlike existing steering methods that operate on the residual stream and can degrade output coherence, CNA focuses on individual neurons, allowing for targeted interventions without sacrificing output quality. The study demonstrates that ablating the identified neurons in instruction-tuned models can reduce refusal rates by over 50% while maintaining fluency across various model sizes and architectures, including Llama and Qwen. The findings suggest that the refusal mechanisms are crystallized during alignment fine-tuning, transforming pre-existing discrimination structures into effective refusal gates. This work highlights the potential for neuron-level interventions to enhance model alignment without the quality trade-offs associated with traditional methods.
Methodology
The authors developed Contrastive Neuron Attribution (CNA), which involves running prompts through the model to record MLP activations, calculating mean activation differences between harmful and benign prompts, and selecting the top 0.1% of neurons with the highest differences for targeted ablation. This method requires only forward passes, avoiding the need for gradients or additional training.
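The selection step is simple enough to sketch directly from the description above; the zero-ablation form of the intervention is our assumption:

```python
import torch

@torch.no_grad()
def contrastive_neuron_attribution(acts_harmful, acts_benign, top_frac=0.001):
    """CNA selection as described: mean activation difference between harmful
    and benign prompts, keep the top 0.1% of MLP neurons.
    acts_*: (n_prompts, n_neurons) recorded MLP activations."""
    diff = acts_harmful.mean(0) - acts_benign.mean(0)
    k = max(1, int(top_frac * diff.numel()))
    return diff.abs().topk(k).indices          # neuron indices to intervene on

def ablate(mlp_acts, idx):
    """Zero out the selected neurons during the forward pass (zeroing is our
    assumption for the intervention's exact form)."""
    mlp_acts[..., idx] = 0.0
    return mlp_acts
```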
Results
The application of CNA led to a more than 50% reduction in refusal rates in instruction-tuned models across various sizes and architectures, while maintaining coherent output quality. In contrast, similar interventions in base models resulted in content shifts without affecting refusal behavior, indicating that the refusal mechanism is specifically developed during fine-tuning.
Implications
The findings suggest that targeted neuron modulation could be a viable approach for enhancing the alignment and safety of language models, providing insights into the underlying mechanisms of model behavior and offering a pathway for more effective steering methods.
Incentivizing Truthfulness and Collaborative Fairness in Bayesian Learning
Theory
Federated Learning
- Introduces a mechanism that ensures both collaborative fairness and truthfulness in data sharing.
- Combines semivalues with a truthful data valuation function based on an unknown validation set.
- Proves the existence of a truthful equilibrium where sources maximize rewards through honest data submission.
- Addresses the limitations of existing methods that either ensure fairness or truthfulness, but not both.
Incentivizing Truthfulness and Collaborative Fairness in Bayesian Learning
Summary
This paper addresses the challenges of incentivizing truthfulness and ensuring collaborative fairness in collaborative machine learning settings, where multiple data sources contribute to model training. Existing data valuation methods reward sources based solely on the data they submit, without verifying its truthfulness, leading to potential manipulation such as data duplication or noise introduction. The authors propose a novel mechanism that guarantees both collaborative fairness (F) and truthfulness (T) at equilibrium for Bayesian models. The mechanism integrates semivalues, which ensure fairness, with a truthful data valuation function (DVF) based on a validation set unknown to the sources. This approach allows sources to maximize their expected data values by submitting datasets that reflect their true knowledge. The paper also explores the implications of limited budgets for rewards and the absence of a validation set, demonstrating the robustness of the proposed mechanism through theoretical proofs and empirical validation on synthetic and real-world datasets.
Methodology
The authors develop a mechanism that utilizes semivalues to ensure collaborative fairness and a data valuation function based on log-likelihood from a validation set to incentivize truthfulness. The mechanism is theoretically grounded, proving that truthful data submission leads to maximum expected rewards, and it is empirically tested across various datasets.
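In notation of our own choosing, the two ingredients described above can be written as a log-likelihood data valuation against the hidden validation set V, fed into a semivalue over marginal contributions:

```latex
% Notation ours; source i submits dataset D_i, V is the validation set
% unknown to the sources, and N is the set of all sources.
\[
  \mathrm{DVF}(D) \;=\; \log p(V \mid D)
  \;=\; \log \int p(V \mid \theta)\, p(\theta \mid D)\, \mathrm{d}\theta ,
\]
\[
  r_i \;=\; \sum_{S \subseteq N \setminus \{i\}} w_{|S|}\,
  \bigl[\mathrm{DVF}(D_S \cup D_i) - \mathrm{DVF}(D_S)\bigr],
\]
% a semivalue with coalition-size weights w_{|S|}; fairness comes from the
% semivalue structure, truthfulness from the validation-based DVF.
```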
Results
The proposed mechanism successfully guarantees both collaborative fairness and truthfulness, as demonstrated through theoretical proofs and empirical validation. The results indicate that sources can achieve higher rewards by submitting truthful data, thus enhancing the overall performance of collaboratively trained models.
Implications
This work has significant implications for collaborative machine learning applications, particularly in fields like healthcare and finance, where data sharing is crucial. By ensuring that data sources are incentivized to provide truthful data, the proposed mechanism can improve model accuracy and reliability, fostering greater collaboration among data providers.
From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation
NLP
Large Language Models
Reinforcement Learning
- Self-distillation rewards in language models can be interpreted as Bayesian filtering increments measuring pointwise mutual information.
- There exists an input-generic bias in self-distillation rewards that can dilute the effectiveness of credit assignment.
- The proposed CREDIT method effectively isolates input-specific contributions, improving performance on various benchmarks.
- CREDIT enhances learning efficiency with negligible additional computational cost.
From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation
Summary
This paper addresses the challenges of credit assignment in on-policy self-distillation for language models, which utilize environment feedback to improve their performance. The authors analyze the nature of the self-distillation token reward, revealing that it functions as a Bayesian filtering increment that measures pointwise mutual information (pMI) between model responses and feedback. They identify a bias in this reward system towards input-generic correlations, which can obscure the true input-specific contributions. To mitigate this issue, the authors propose a new method called CREDIT (Contrastive REward from DIsTillation), which employs a batch-contrastive baseline to isolate input-specific credit. This approach enhances the model's performance across various benchmarks, including coding and scientific reasoning tasks, while requiring minimal additional computational resources. The findings suggest that by refining the reward structure, models can achieve better learning efficiency and effectiveness in complex tasks.
Methodology
The authors analyze the self-distillation reward under a posterior-compatibility framework, demonstrating its relationship to pointwise mutual information. They introduce the CREDIT method, which uses a batch-contrastive approach to correct for input-generic biases in the reward signal, allowing for more accurate credit assignment during training.
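A hedged sketch of one way to realize a batch-contrastive baseline in the spirit described: score each response against every input in the batch and use the mismatched-input average as an estimate of the input-generic reward component. The cross-pairing construction is our interpretation, not the paper's exact estimator:

```python
import torch

def credit_rewards(reward_matrix):
    """reward_matrix[i, j]: self-distillation reward of response j scored
    against input i, so the diagonal is the matched signal and off-diagonal
    entries estimate the input-generic component (our assumption)."""
    B = reward_matrix.shape[0]
    matched = reward_matrix.diagonal()
    generic = (reward_matrix.sum(0) - matched) / (B - 1)  # per-response baseline
    return matched - generic                              # input-specific credit
```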
Results
CREDIT significantly improves aggregate performance on coding, scientific reasoning, and tool-use benchmarks across two model families, demonstrating its effectiveness in providing clearer input-specific credit without substantial computational overhead.
Implications
The findings have implications for the design of reinforcement learning systems, particularly in enhancing the training of language models and other AI systems that rely on self-supervised learning from feedback. By refining reward structures, models can achieve better performance in complex tasks that require nuanced understanding and reasoning.
Causal Fairness for Survival Analysis
Theory
Interpretability
Time Series
- Introduces a causal framework for fairness in survival analysis, addressing temporal disparities.
- Develops a non-parametric four-step methodology for causal pathway decomposition.
- Proves the Causal Reduction Theorem to facilitate the identification of group disparities.
- Applies the framework to analyze racial disparities in ICU outcomes, illustrating its practical utility.
Causal Fairness for Survival Analysis
Summary
This paper addresses the challenge of ensuring fairness in survival analysis, particularly in temporal contexts where disparities may evolve over time. The author critiques existing approaches in fair machine learning that primarily focus on static settings and statistical definitions of fairness, which fail to disentangle the causal mechanisms behind disparities. To fill this gap, the paper proposes a causal framework for fairness in time-to-event (TTE) analysis, allowing for the decomposition of disparities into direct, indirect, and spurious pathways. The methodology involves a four-step non-parametric approach: (1) formalizing assumptions about censoring and confounding using a graphical model; (2) recovering the conditional survival function based on covariates; (3) applying the Causal Reduction Theorem to facilitate causal pathway decomposition; and (4) efficiently estimating the effects. The framework is instantiated in three regimes: non-informative censoring, competing risks, and informative censoring, with identification results derived for each. The practical application of this framework is demonstrated through an analysis of racial disparities in outcomes following ICU admissions, showcasing how disparities in survival evolve over time.
Methodology
The methodology consists of a four-step non-parametric approach: formalizing assumptions using a graphical model, recovering conditional survival functions, applying the Causal Reduction Theorem for causal pathway decomposition, and efficiently estimating the effects across different censoring regimes.
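Schematically, and in notation of our own choosing (sign conventions and identification conditions follow the paper's Causal Reduction Theorem, which we do not reproduce), the decomposition reads:

```latex
% Schematic pathway decomposition of the survival disparity at time t
% between protected-attribute levels a_0 and a_1:
\[
  \underbrace{S(t \mid a_1) - S(t \mid a_0)}_{\text{observed disparity}}
  \;=\; \mathrm{DE}(t) \;+\; \mathrm{IE}(t) \;+\; \mathrm{SE}(t),
\]
% where DE(t) is the direct effect of the attribute on the time-to-event
% outcome, IE(t) the effect mediated through downstream covariates, and
% SE(t) the spurious (confounded) association.
```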
Results
The framework successfully decomposes disparities in survival analysis into causal pathways, providing insights into how these disparities evolve over time. The analysis of ICU admissions reveals significant racial disparities in outcomes, demonstrating the framework's applicability in real-world scenarios.
Implications
This work has significant implications for high-stakes decision-making in healthcare and other domains, as it provides a structured approach to understanding and addressing fairness in temporal contexts. It can inform policy-making and algorithm design to mitigate biases in critical areas such as healthcare and criminal justice.
SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization
NLP
Large Language Models
Efficient ML
- SOAR improves NVFP4 quantization accuracy through innovative scale optimization techniques.
- Closed-form Joint Scale Optimization (CJSO) allows for simultaneous optimization of global and block-wise scales.
- Decoupled Scale Search (DSS) mitigates precision loss by separating quantization and dequantization scales.
- Extensive experiments show SOAR outperforms existing methods while maintaining the same memory footprint.
SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization
Summary
The paper introduces SOAR, a novel post-training quantization framework designed to enhance the accuracy of NVFP4 quantization for large language models (LLMs). NVFP4 is a 4-bit microscaling format that provides efficient representation for LLMs but suffers from suboptimal performance due to rigid scale selection and the coupling of quantization and dequantization scales. SOAR addresses these limitations through two main innovations: Closed-form Joint Scale Optimization (CJSO) and Decoupled Scale Search (DSS). CJSO optimizes both global and block-wise scales using analytical solutions derived from minimizing reconstruction errors, while DSS separates the high-precision quantization scale from its constrained dequantization counterpart, allowing for a more effective discrete search to reduce precision loss. Experimental results demonstrate that SOAR consistently outperforms existing NVFP4 quantization methods, achieving superior accuracy without increasing memory requirements or hardware costs. The proposed framework is particularly beneficial for deploying LLMs in resource-constrained environments, making it a significant advancement in efficient model inference.
Methodology
The methodology involves two key components: Closed-form Joint Scale Optimization (CJSO) for optimizing global and block-wise scales through reconstruction error minimization, and Decoupled Scale Search (DSS) which separates the quantization and dequantization scales to enhance precision during the quantization process. The framework employs a discrete search strategy to identify the optimal scaling configurations.
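A hedged sketch of the decoupled search on one block, assuming the standard signed E2M1 (FP4) value grid; the candidate generation and grid search below are our simplification of the paper's discrete search, and CJSO's closed-form scale solutions are not reproduced:

```python
import numpy as np

FP4_GRID = np.array([0, .5, 1, 1.5, 2, 3, 4, 6])
FP4_GRID = np.concatenate([-FP4_GRID[:0:-1], FP4_GRID])  # signed E2M1 values

def quant_block(w, s):
    """Map each weight in the block to the nearest FP4 code at scale s."""
    return FP4_GRID[np.abs(w[:, None] / s - FP4_GRID).argmin(-1)]

def decoupled_scale_search(w, s_deq_candidates, n_sq=32):
    """DSS sketch: search a high-precision quantization scale s_q separately
    from the hardware-constrained dequantization scale s_d, keeping the pair
    that minimizes block reconstruction error."""
    best, best_err = None, np.inf
    for s_d in s_deq_candidates:                  # e.g. FP8-representable scales
        for s_q in np.linspace(0.5 * s_d, 1.5 * s_d, n_sq):
            err = ((quant_block(w, s_q) * s_d - w) ** 2).sum()
            if err < best_err:
                best, best_err = (s_q, s_d), err
    return best, best_err
```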
Results
SOAR achieved an accuracy of 70.68 on the Qwen3-8B model across five zero-shot tasks under NVFP4 quantization, surpassing the previous state-of-the-art accuracy of 70.12. This demonstrates the effectiveness of the proposed methods in improving quantization performance.
Implications
The advancements presented in SOAR have significant implications for the deployment of large language models, particularly in environments with limited computational resources. By enhancing quantization accuracy without additional hardware costs, SOAR facilitates more efficient inference, making it easier to integrate LLMs into various applications requiring low latency and reduced memory usage.
Deep Minds and Shallow Probes
Theory
- Probing methods should be stable under representation symmetries to avoid artifacts from arbitrary coordinate choices.
- A unique hierarchy of shallow probes is established, with linear probes as the degree-1 member and higher-order probes introduced systematically.
- The concept of probe-visible quotients is introduced for effective cross-model transfer, focusing on directions visible to the probes rather than the full hidden state.
- Experiments show that degree-2 probes significantly improve performance over linear probes in specific tasks.
Deep Minds and Shallow Probes
Summary
This paper explores the concept of probing in neural networks, particularly focusing on the stability of probes under representation symmetries. The authors argue that neural representations are not unique and can differ by reparameterization, which necessitates that probing methods should be stable under these transformations. They introduce a hierarchy of shallow probes, with linear probes as the simplest form, and establish a framework for deriving probe families based on the induced group actions of neural representations. The study emphasizes the importance of affine invariance in the final readout layer of models, leading to the identification of a polynomial hierarchy of probes. The authors conduct experiments on both synthetic and real-world tasks, demonstrating that higher-order probes (degree-2) outperform linear probes in certain scenarios and that a quotient-based transfer mechanism allows for effective cross-model probe transfer. These findings suggest a geometric theory of neural probing design, with implications for the portability of monitoring systems across different model architectures.
Methodology
The authors analyze the final readout layer of neural networks to study representation symmetries and derive a hierarchy of probes based on affine invariance. They conduct experiments on synthetic and real-world tasks to validate their theoretical claims, comparing the performance of linear and higher-order probes and exploring the effectiveness of quotient-based transfer mechanisms.
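A minimal sketch of a degree-2 probe as a linear term plus a rank-R CP quadratic form in the hidden state; whether the hierarchy's degree-2 member includes the linear term, and the initialization used, are our assumptions:

```python
import torch
import torch.nn as nn

class Degree2Probe(nn.Module):
    """Degree-1 (linear) plus low-rank degree-2 (CP) terms; sketch only."""
    def __init__(self, d, rank=8):
        super().__init__()
        self.w = nn.Linear(d, 1)
        self.A = nn.Parameter(torch.randn(rank, d) / d ** 0.5)
        self.B = nn.Parameter(torch.randn(rank, d) / d ** 0.5)

    def forward(self, h):                       # h: (batch, d)
        quad = ((h @ self.A.T) * (h @ self.B.T)).sum(-1, keepdim=True)
        return self.w(h) + quad                 # degree-1 + degree-2 score
```

Setting rank=0 (dropping A and B) recovers the linear probe as the degree-1 member of the hierarchy.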
Results
The experiments confirm that the proposed degree hierarchy of probes is effective, with degree-2 probes (low-rank canonical polyadic, i.e. CP, probes) outperforming linear probes by 16.8–20.0 percentage points in AUROC on a cross-token agreement task. Additionally, the quotient-based transfer mechanism successfully enables the transfer of safety monitors across different model families without requiring target labels.
Implications
The findings suggest a new framework for designing neural probes that can be applied across various architectures, potentially improving model interpretability and the portability of monitoring systems. This could have significant implications for the development of robust AI systems that require consistent performance across different models.
FedOUI: OUI-Guided Client Weighting for Federated Aggregation
Federated Learning
- FedOUI introduces a new aggregation rule based on the Overfitting-Underfitting Indicator (OUI).
- The method improves client weighting by considering the internal activation structure of client models.
- Empirical results show significant improvements in aggregation quality under strong data heterogeneity.
- FedOUI remains lightweight and interpretable, making it suitable for practical federated learning applications.
FedOUI: OUI-Guided Client Weighting for Federated Aggregation
Summary
The paper introduces FedOUI, a novel aggregation method for federated learning that leverages the Overfitting-Underfitting Indicator (OUI) to improve client weighting during model updates. Traditional federated learning methods often rely on dataset size or gradient information for aggregation, which may not fully capture the heterogeneity of client models. FedOUI addresses this by allowing each client to compute an OUI value based on the activation patterns of their model during training. This OUI value, sent alongside the local model update, serves as a measure of the client's structural typicality. The server then uses these OUI values to assign weights to clients, down-weighting those that are structurally atypical. The method was evaluated on the CIFAR-10 dataset under conditions of strong non-IID data distribution and noisy clients, demonstrating that OUI-based weighting significantly enhances aggregation quality compared to standard methods like FedAvg and FedProx, particularly in heterogeneous environments. The results indicate that internal activation structures can provide valuable insights for federated aggregation, suggesting a shift towards more nuanced client weighting strategies in federated learning.
Methodology
FedOUI operates within a synchronous federated learning framework where each client computes an OUI value based on a fixed probe batch of data. Clients send their local model updates along with their OUI values to the server. The server then estimates the distribution of these OUI values and assigns weights to clients based on their structural typicality, using a Beta distribution to model the OUI values and compute bilateral structural scores for client weighting.
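A hedged sketch of the server-side weighting, assuming OUI values lie in (0, 1) and reading the "bilateral structural score" as density under the fitted Beta distribution; the paper's exact scoring rule may differ:

```python
import numpy as np
from scipy.stats import beta

def oui_weights(ouis, eps=1e-6):
    """Fit a Beta distribution to the clients' reported OUI values and
    down-weight structurally atypical clients (sketch; the typicality score
    below is our interpretation)."""
    ouis = np.clip(np.asarray(ouis, dtype=float), eps, 1 - eps)
    a, b, _, _ = beta.fit(ouis, floc=0, fscale=1)  # fit on the unit interval
    w = beta.pdf(ouis, a, b)                       # typicality under the fit
    return w / w.sum()                             # aggregation weights
```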
Results
The evaluation of FedOUI on CIFAR-10 showed that it outperformed standard aggregation methods like FedAvg and FedProx, particularly in scenarios with strong non-IID data distributions and noisy clients. The OUI-based weighting led to improved aggregation quality, demonstrating the effectiveness of using internal activation structures for client weighting.
Implications
FedOUI's approach to client weighting could lead to more robust federated learning systems, particularly in heterogeneous environments. By incorporating internal model signals, future federated learning frameworks may achieve better performance and generalization, paving the way for more adaptive and intelligent aggregation strategies.
A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling
Time Series
- Introduction of a modular digital twin architecture for multiple diabetes types.
- Transition from correlation-based prediction to decision-aware modeling.
- Reproducible proof-of-concept evaluation using an open dataset.
- Discussion of practical deployment considerations including interpretability and safety.
A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling
Summary
This paper introduces a proof-of-concept digital twin framework aimed at enhancing diabetes modeling through simulation-driven approaches. The framework leverages benchmark clinical data and synthetic temporal augmentation to generate interpretable simulated trajectories, focusing on decision-aware analysis rather than solely predictive outcomes. By integrating temporal modeling, causal reasoning, and counterfactual simulation, the framework allows for the evaluation of intervention strategies across different types of diabetes, including Type 1, Type 2, and gestational diabetes. The evaluation is conducted using a public dataset combined with controlled synthetic scenarios, demonstrating the feasibility of this approach in providing actionable insights for diabetes management. While the framework is not yet clinically validated, it lays the groundwork for future research in simulation-driven digital twin systems in healthcare, emphasizing the need for decision-oriented models that can adapt to the complexities of diabetes management.
Methodology
The methodology involves creating a digital twin framework that integrates heterogeneous data sources, temporal modeling techniques, and counterfactual simulation. The framework is evaluated using a combination of real clinical data and synthetic scenarios to illustrate its capability in simulating intervention effects and temporal behavior.
Results
The results indicate that the proposed framework successfully integrates prediction with counterfactual simulation, allowing for a nuanced understanding of how different interventions can impact diabetes management. The evaluation showcases the framework's potential to support decision-making processes in clinical settings, although it is acknowledged that further validation is needed before clinical application.
Implications
The implications of this work suggest that the digital twin framework could significantly enhance personalized diabetes management by providing clinicians and patients with actionable insights into intervention strategies. This approach could lead to improved health outcomes through better-informed decision-making in diabetes care.
On the Approximation Complexity of Matrix Product Operator Born Machines
Theory
Generative Models
Efficient ML
- Proved that KL approximation for MPO-BMs is NP-hard in the continuous setting.
- Identified conditions under which MPO-BMs can achieve efficient approximation with polynomial bond dimension.
- Demonstrated that polynomially many score queries are sufficient for estimating the induced Hamiltonian.
- Established a connection between score-based variational inference and the ground-state problem in physics.
On the Approximation Complexity of Matrix Product Operator Born Machines
Summary
This paper investigates the approximation complexity of Matrix Product Operator Born Machines (MPO-BMs), which are tensor-network models used for probabilistic modeling. The authors provide a comprehensive characterization of the conditions under which MPO-BMs can efficiently approximate target distributions. They establish that approximating arbitrary distributions with bounded Kullback-Leibler (KL) divergence is NP-hard, indicating that MPO-BMs cannot universally serve as efficient approximators in the worst-case scenario. Conversely, they identify a structured regime where efficient approximation is possible, specifically when the loss-induced Hamiltonian exhibits locality and a constant spectral gap. In this regime, MPO-BMs can achieve polynomial bond dimensions and provable KL guarantees. Furthermore, the authors demonstrate that polynomially many score queries are sufficient to estimate the induced Hamiltonian, thus avoiding the curse of dimensionality. The findings are supported by numerical experiments, providing a theoretical framework for understanding the limits and capabilities of MPO-BMs in probabilistic modeling.
Methodology
The authors utilized theoretical analysis to establish the NP-hardness of KL approximation for MPO-BMs and explored score-based variational inference to identify conditions for efficient approximation. They analyzed the locality and spectral-gap conditions of the Hamiltonian induced by target distributions and connected these findings to the estimation of induced Hamiltonians through score queries.
Results
The main results include the proof of NP-hardness for KL approximation in MPO-BMs, identification of a structured regime for efficient approximation, and the establishment that polynomially many score queries can be used to estimate the induced Hamiltonian, leading to efficient learning of MPO-BMs.
Implications
The findings have significant implications for the design of efficient algorithms in probabilistic modeling using tensor networks, guiding future research on approximation capabilities and learning strategies in high-dimensional spaces.
Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation
Large Language Models
NLP
- PCAP conditions adversarial searches on diverse attacker personas to explore realistic attack vectors.
- Empirical evaluation shows a significant increase in attack success rates and prompt diversity.
- Fine-tuning on PCAP-generated data dramatically enhances model robustness with minimal false positives.
- The approach provides a practical pipeline for automated vulnerability discovery and mitigation.
Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation
Summary
This paper introduces Persona-Conditioned Adversarial Prompting (PCAP), a novel approach to automated red-teaming for large language models (LLMs) that enhances the discovery of adversarial vulnerabilities. Traditional automated red-teaming methods often fail to capture the diversity of real-world threats due to their reliance on narrow attack tactics and limited attacker personas. PCAP addresses these shortcomings by conditioning adversarial searches on a variety of attacker personas, such as doctors, students, and malicious actors, allowing for the exploration of realistic attack scenarios. The methodology involves running parallel persona-conditioned searches to identify transferable jailbreaks and generate rich defense datasets with automatic metadata tracking. Empirical results demonstrate that PCAP significantly increases the attack success rate from 57% to 97% on GPT-OSS 120B and produces 2-6 times more diverse prompts. Furthermore, fine-tuning lightweight adapters on the data generated by PCAP leads to substantial improvements in model robustness, with F1 scores rising from 0.53 to 0.96, while maintaining low false positive rates. The study presents a complete closed-loop approach from vulnerability discovery to automated mitigation, showcasing the practical applicability of the proposed method.
Methodology
The methodology involves conditioning adversarial prompting on multiple attacker personas and strategy sets, allowing for a comprehensive exploration of diverse attack scenarios. This is achieved through parallel searches that generate a rich dataset of adversarial prompts, which are then used for fine-tuning lightweight adapters to improve model robustness.
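A hedged sketch of the outer search loop; the persona and strategy lists and the attacker, target, and judge callables are hypothetical stand-ins for the paper's components:

```python
PERSONAS = ["doctor", "student", "security researcher", "malicious actor"]
STRATEGIES = ["roleplay", "authority appeal", "hypothetical framing"]

def pcap_search(target_model, generate_attack, judge, goals, n_rounds=10):
    """generate_attack(goal, persona, strategy) -> prompt string;
    judge(response, goal) -> bool jailbreak verdict. Both hypothetical."""
    dataset = []                                   # prompts + metadata for defense
    for goal in goals:
        for persona in PERSONAS:                   # searches run in parallel
            for strategy in STRATEGIES:
                for _ in range(n_rounds):
                    prompt = generate_attack(goal, persona, strategy)
                    response = target_model(prompt)
                    success = judge(response, goal)
                    dataset.append({"prompt": prompt, "persona": persona,
                                    "strategy": strategy, "success": success})
                    if success:
                        break                      # record the found jailbreak
    return dataset                                 # later used for adapter tuning
```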
Results
PCAP increased the attack success rate on GPT-OSS 120B from 57% to 97% and produced 2-6 times more diverse prompts. Fine-tuning on PCAP-generated data improved model robustness significantly, with F1 scores increasing from 0.53 to 0.96, while maintaining low false positive rates.
Implications
The findings suggest that PCAP can enhance the safety and robustness of LLMs by providing a more comprehensive understanding of adversarial threats. This approach could be applied in various domains where LLMs are deployed, ensuring better alignment with safety protocols and reducing the risk of harmful outputs.
Efficient Adjoint Matching for Fine-tuning Diffusion Models
Generative Models
Optimization
Efficient ML
- EAM significantly improves training efficiency by reformulating the SOC problem.
- The method eliminates the need for backward adjoint simulation, reducing computational costs.
- EAM converges up to 4× faster than traditional Adjoint Matching while maintaining or exceeding performance metrics.
- The approach leverages a linear base drift to facilitate efficient trajectory sampling.
Efficient Adjoint Matching for Fine-tuning Diffusion Models
Summary
The paper introduces Efficient Adjoint Matching (EAM), a novel approach for fine-tuning diffusion models to align them with human preferences in text-to-image generation. Traditional methods like Adjoint Matching (AM) face significant computational challenges due to the stochastic simulation of generative trajectories and the need for backward ODE simulation for adjoint states. EAM addresses these inefficiencies by reformulating the stochastic optimal control (SOC) problem with a linear base drift and a modified terminal cost, which allows for efficient trajectory construction using deterministic ODE solvers and closed-form adjoint solutions. The authors demonstrate that EAM can converge up to four times faster than AM while achieving comparable or superior performance across various metrics, including PickScore and CLIPScore, on standard text-to-image reward fine-tuning benchmarks.
Methodology
The authors propose EAM by redesigning the base drift of the SOC problem to be linear, which simplifies the trajectory simulation process. This allows for the use of a few-step deterministic ODE solver for generating endpoint images and sampling intermediate states from the original noising kernel. The adjoint state is computed using a closed-form solution, thus removing the need for backward simulation.
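For context, reward fine-tuning in this line of work is usually posed as a stochastic optimal control problem of the following form (notation ours); per the summary, EAM's reformulation replaces the learned base drift b with a linear one and modifies the terminal cost so that trajectories and adjoint states admit deterministic, closed-form treatment:

```latex
\[
  \min_{u}\; \mathbb{E}\!\left[\int_0^1 \tfrac{1}{2}\,\lVert u(X_t,t)\rVert^2
  \,\mathrm{d}t \;-\; r(X_1)\right],
  \qquad
  \mathrm{d}X_t = \bigl(b(X_t,t) + \sigma(t)\,u(X_t,t)\bigr)\,\mathrm{d}t
  + \sigma(t)\,\mathrm{d}W_t ,
\]
% with r the reward model and u the fine-tuning control. AM must simulate
% these SDE trajectories and a backward adjoint ODE; a linear b removes both
% requirements.
```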
Results
EAM achieves convergence rates up to four times faster than AM and matches or surpasses AM in various performance metrics, including PickScore, ImageReward, HPSv2.1, CLIPScore, and Aesthetics on standard text-to-image reward fine-tuning benchmarks.
Implications
The findings suggest that EAM can be a more efficient alternative for fine-tuning diffusion models, potentially leading to faster deployment of text-to-image generation systems that better align with human preferences. This could enhance applications in creative industries, advertising, and content generation.
Physics-Informed Teacher-Student Ensemble Learning for Traffic State Estimation with a Varying Speed Limit Scenario
Theory
Optimization
Time Series
- Integration of teacher-student ensemble learning with PIDL for TSE under VSL scenarios.
- Teacher models encode local traffic physics while the student model selects appropriate estimations.
- Demonstrated superior performance in TSE compared to traditional methods.
- Addresses the challenges of varying traffic characteristics due to dynamic speed limits.
Physics-Informed Teacher-Student Ensemble Learning for Traffic State Estimation with a Varying Speed Limit Scenario
Summary
This paper presents a novel framework that integrates teacher-student ensemble learning with physics-informed deep learning (PIDL) for traffic state estimation (TSE) in scenarios with varying speed limits (VSLs). The authors identify that traditional PIDL architectures struggle to adapt to the dynamic characteristics of traffic flow influenced by VSLs. To address this, they propose a teacher-student model where teacher networks encode the physics of flow conservation locally, while a student model, implemented as a multi-layer perceptron (MLP), identifies traffic characteristics and selects appropriate teacher models for TSE. The framework effectively captures the heterogeneity of VSLs, allowing for accurate traffic state estimations. A case study demonstrates that the proposed ensemble approach outperforms existing baseline methods, as evidenced by a lower relative L2 error, thus validating the effectiveness of the approach in real-world traffic management scenarios.
Methodology
The methodology involves training teacher PIDL neural networks to model local traffic characteristics and a student PIDL neural network that generalizes across the entire road segment. The teacher networks minimize a loss function that includes both estimation and physics compliance losses, while the student network focuses on minimizing estimation loss across the entire dataset.
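A minimal sketch of a physics-compliance residual for a teacher network, assuming the LWR conservation law d(rho)/dt + d(rho*v)/dx = 0 as the encoded physics; the paper's exact flux model under varying speed limits is not reproduced:

```python
import torch

def physics_residual(net, x, t):
    """Conservation residual at collocation points (x, t).
    net maps stacked (x, t) -> (rho, v): density and speed."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    rho, v = net(torch.stack([x, t], dim=-1)).unbind(-1)
    q = rho * v                                    # flow = density * speed
    rho_t = torch.autograd.grad(rho.sum(), t, create_graph=True)[0]
    q_x = torch.autograd.grad(q.sum(), x, create_graph=True)[0]
    return rho_t + q_x

# Teacher loss = estimation loss + lam * physics_residual(...).pow(2).mean();
# the student MLP minimizes estimation loss alone over the whole segment.
```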
Results
The proposed ensemble learning framework significantly reduces the relative L2 error in traffic state estimation compared to popular baseline methods, indicating enhanced accuracy and reliability in TSE under varying speed limit conditions.
Implications
The findings suggest that the proposed framework can be effectively utilized in real-time traffic management systems, enabling better control of traffic flow and congestion through adaptive speed limit adjustments. This could lead to improved road safety and efficiency in urban transportation networks.
A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning
Graph Learning
Large Language Models
NLP
- UniGraphLM is the first model to integrate a multi-domain, multi-task GNN encoder with LLMs for unified graph token generation.
- The proposed graph-text pair pretraining strategy enhances the alignment of GNN representations with textual semantics.
- A curriculum alignment tuning strategy is introduced to adaptively manage varying alignment difficulties across diverse graph data.
- Extensive experiments validate the superiority of UniGraphLM over existing GLM baselines in multiple domains and tasks.
Read more
A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning
Summary
This paper introduces UniGraphLM, a Unified Graph Language Model that addresses the challenges of aligning Graph Neural Network (GNN) encoded representations with Large Language Models (LLMs) for multi-domain and multi-task graph alignment instruction tuning. Existing Graph Language Models (GLMs) often struggle with generalizing across diverse graph data due to the lack of a unified approach for aligning GNN representations with LLM token spaces. The authors identify two main challenges: the difficulty in learning GNN-encoded representations that are generalizable across domains and tasks, and the varying alignment difficulties presented by diverse graph data. To overcome these challenges, the authors propose a graph-text pair pretraining strategy that trains a tailored GNN encoder on large-scale graph-text datasets, enabling the model to learn representations that are naturally aligned with textual semantics. Additionally, they introduce a curriculum alignment tuning strategy that adaptively adjusts the alignment process based on the varying degrees of compatibility between graph data and the LLM token space. The extensive experiments conducted demonstrate that UniGraphLM consistently outperforms state-of-the-art baselines across various graph datasets, showcasing its effectiveness in multi-domain and multi-task learning scenarios.
Methodology
The authors developed a graph-text pair pretraining strategy using a tailored GNN encoder trained on large-scale datasets, followed by a curriculum alignment tuning strategy that adjusts the alignment process based on the compatibility of graph data with LLMs.
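As a rough illustration of what "aligned with textual semantics" can look like in practice, the following sketch pairs GNN graph embeddings with text embeddings via a symmetric CLIP-style contrastive loss. This objective shape is an assumption; the paper's actual pretraining loss and the function names below are not taken from it.

```python
import torch
import torch.nn.functional as F

def graph_text_contrastive_loss(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE between GNN and text embeddings (CLIP-style).

    graph_emb, text_emb: (batch, dim) tensors for paired graph-text data.
    This objective shape is an assumption about the pretraining strategy.
    """
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / temperature            # (batch, batch) similarities
    targets = torch.arange(g.size(0), device=g.device)
    # Each graph should match its own text and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```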
Results
UniGraphLM consistently outperformed state-of-the-art GLM baselines across various graph datasets, demonstrating its effectiveness in achieving generalizable representations and improved alignment with LLMs.
Implications
The findings suggest that UniGraphLM can significantly enhance the performance of graph-based tasks in diverse domains, paving the way for more robust and generalizable graph language models.
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models
Reinforcement Learning
Large Language Models
NLP
- Introduces the concept of domain block size conflict in multi-domain RL for dLLMs.
- Develops the Block-R1-41K dataset with optimal block sizes for individual samples.
- Establishes Block-R1 as a benchmark for cross-domain RL post-training.
- Proposes a sample-level block-conditioned training method for improved policy updates.
Read more
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models
Summary
This paper addresses the challenges of block size in multi-domain reinforcement learning (RL) for diffusion large language models (dLLMs). The authors identify that existing RL methods, which typically use a fixed block size, lead to domain block size conflicts that hinder the effectiveness of multi-domain post-training. They propose a novel framework, Block-R1, which formulates the concept of domain block size conflict and introduces a new dataset, Block-R1-41K, that assigns optimal block sizes to individual samples. This dataset is used to create a benchmark for flexible RL post-training across multiple domains. The authors also present a cross-domain training method that conditions each policy update on the block size found to yield the best improvement for that sample, allowing for more effective updates. Extensive experiments demonstrate the advantages of their approach across 13 datasets and various RL algorithms, showing significant improvements in performance compared to traditional methods.
Methodology
The authors conducted a comprehensive analysis of block size effects in multi-domain RL settings, formulating the domain block size conflict. They constructed the Block-R1-41K dataset through a teacher-student evaluation pipeline to determine optimal block sizes for training samples. The proposed Block-R1 benchmark facilitates flexible RL post-training, allowing for sample-level block size adjustments during policy updates. Extensive experiments were performed using 13 datasets and 7 RL algorithms to validate the effectiveness of their approach.
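A hypothetical sketch of the per-sample assignment step: decode each prompt at every candidate block size, score the outputs with a teacher evaluator, and record the best-performing block size for that sample. The candidate set and both callables below are stand-ins, not details from the paper.

```python
# Hypothetical sketch: assign each sample the block size that maximizes a
# teacher evaluator's reward. `generate` and `score` are stand-in callables.
CANDIDATE_BLOCK_SIZES = [4, 8, 16, 32]        # assumed candidate set

def assign_block_size(prompt, generate, score):
    """Return the best-performing block size for one training sample."""
    rewards = {bs: score(prompt, generate(prompt, bs))
               for bs in CANDIDATE_BLOCK_SIZES}
    return max(rewards, key=rewards.get)

# Toy stand-ins so the sketch runs end to end.
def generate(prompt, block_size):
    return f"{prompt} [decoded with block size {block_size}]"

def score(prompt, text):
    return -abs(len(prompt) - len(text) // 2)  # placeholder reward

print(assign_block_size("Prove that 2+2=4.", generate, score))
```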
Results
The experiments revealed that the Block-R1 approach significantly outperforms traditional fixed block size methods, particularly in multi-domain scenarios. The sample-level block-conditioned training method led to improved reasoning capabilities in dLLMs, demonstrating that tailored block sizes enhance model performance across diverse tasks.
Implications
The findings suggest that reconsidering block size in RL for dLLMs can lead to better generalization across domains, reducing overfitting to specific patterns. This work opens avenues for more adaptive and effective training methodologies in multi-domain settings, potentially benefiting a wide range of applications in natural language processing and beyond.
Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation
Graph Learning
- Introduction of a label-free adaptive signed affinity to prevent sign cancellation in graph learning.
- Development of the first framework that constructs an orthonormal, multi-scale, sparse spectral basis in near-linear time.
- Empirical evidence showing that conventional GNNs suffer from hub domination, oversmoothing, and oversquashing, which HMH effectively mitigates.
- Achievement of state-of-the-art accuracy on both node and graph classification tasks while ensuring near-linear scalability.
Read more
Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation
Summary
This paper introduces Hierarchical Multi-view Haar (HMH), a novel spectral graph-learning framework designed to effectively handle heterophilous graphs, where adjacent nodes tend to have different labels. Traditional spectral Graph Neural Networks (GNNs) struggle with issues like hub-dominated aggregation and oversmoothing due to their reliance on polynomial filters that can introduce approximation errors and mix signals from distant nodes. HMH addresses these challenges by first learning feature- and structure-aware signed affinities through a heterophily-aware encoder. It then constructs a soft graph hierarchy based on these embeddings. At each level of the hierarchy, HMH uses a sparse, orthonormal Haar basis to apply learnable spectral filters, allowing for better localization of signals. The architecture incorporates skip-connection unpooling layers to combine outputs from all levels, effectively mitigating hub domination and oversquashing. Experimental results demonstrate that HMH outperforms existing state-of-the-art spectral methods, achieving up to a 3% improvement in node classification and a 7% improvement in graph classification tasks, all while maintaining near-linear scalability.
Methodology
The HMH framework employs a heterophily-aware encoder to learn signed affinities, constructs a soft graph hierarchy, and utilizes a sparse Haar basis for spectral filtering. It incorporates skip-connection unpooling to reintegrate filtered signals and reduce the effects of oversquashing.
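To make the Haar-basis idea concrete, here is a toy construction for four nodes grouped into two clusters: an orthonormal basis with one global scaling vector, one cross-cluster difference, and one difference inside each cluster, with a per-vector gain standing in for a learnable spectral filter. The hierarchy and gains are illustrative, not HMH's actual construction.

```python
import numpy as np

# Hypothetical 4-node graph with hierarchy: root -> clusters {0,1} and {2,3}.
# Haar-style orthonormal basis: one global scaling vector, one cross-cluster
# difference, and one difference inside each cluster.
H = np.array([
    [0.5,  0.5,  0.5,  0.5],              # level 0: global average
    [0.5,  0.5, -0.5, -0.5],              # level 1: cluster A vs. cluster B
    [1/np.sqrt(2), -1/np.sqrt(2), 0, 0],  # level 2: within cluster A
    [0, 0, 1/np.sqrt(2), -1/np.sqrt(2)],  # level 2: within cluster B
])
assert np.allclose(H @ H.T, np.eye(4))    # orthonormality check

x = np.array([1.0, 3.0, 2.0, 8.0])        # node signal
coeffs = H @ x                            # analysis: project onto the basis

# A "spectral filter" here is just a per-basis-vector gain; in HMH such
# gains would be learnable per hierarchy level.
gains = np.array([1.0, 1.0, 0.5, 0.5])    # e.g. damp the finest scale
x_filtered = H.T @ (gains * coeffs)       # synthesis back to node space
print(x_filtered)
```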
Results
HMH demonstrated superior performance compared to existing spectral GNNs, achieving up to a 3% improvement in node classification accuracy and a 7% improvement in graph classification accuracy, while maintaining near-linear scalability.
Implications
The proposed HMH framework has significant implications for applications in social networks, molecular interactions, and other domains where heterophilous graphs are common, enabling more accurate and efficient graph classification and learning.
Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
Reinforcement Learning
Robotics
Optimization
- Introduction of Reach-Avoid Probability Certificates (RAPCs) for enforcing probabilistic reach-avoid constraints.
- Development of a contraction-based Bellman formulation that integrates safety and cost optimization.
- Proposal of RAPCPO, a reinforcement learning algorithm that converges to locally optimal policies under probabilistic constraints.
- Demonstration of improved cost efficiency and high satisfaction rates in stochastic environments through experiments.
Read more
Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
Summary
This paper addresses the challenge of stochastic minimum-cost reach-avoid reinforcement learning, where an agent must achieve a reach-avoid specification with a certain probability while minimizing expected cumulative costs in stochastic environments. Existing methods often struggle to enforce probabilistic reach-avoid constraints while optimizing costs. To overcome this, the authors introduce Reach-Avoid Probability Certificates (RAPCs), which identify states from which the reach-avoid constraints can be satisfied. They develop a contraction-based Bellman formulation that integrates these constraints into reinforcement learning, allowing for cost optimization under probabilistic requirements. The proposed Reach-Avoid Probability-Constrained Policy Optimization (RAPCPO) algorithm is shown to converge almost surely to locally optimal policies. Experimental results in the MuJoCo simulator demonstrate that RAPCPO significantly reduces cumulative costs while maintaining high satisfaction rates for reach-avoid specifications, outperforming state-of-the-art baselines.
Methodology
The authors propose a contraction-based Bellman formulation that provides a probabilistic interpretation of the value function, allowing for the integration of reach-avoid constraints into reinforcement learning. They introduce RAPCs to certify the satisfaction of reach-avoid specifications and develop the RAPCPO algorithm, which optimizes a surrogate objective while ensuring almost sure convergence to locally optimal policies.
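The flavor of the underlying fixed point can be sketched in tabular form: a value function pinned to 1 on goal states and 0 on avoid states, with a discounted Bellman backup elsewhere so the operator is a contraction. The discounting device and all quantities below are simplifying assumptions; the paper's contraction construction and the RAPC machinery are richer than this.

```python
import numpy as np

# Tabular sketch of a reach-avoid probability fixed point:
#   V(s) = 1 on goal states, 0 on avoid states,
#   V(s) = gamma * max_a sum_s' P(s' | s, a) V(s')  elsewhere.
# gamma < 1 makes the backup a contraction; the paper's exact
# contraction construction may differ from this simplification.
n_states, n_actions = 5, 2
goal, avoid = {4}, {0}
gamma = 0.99

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']

V = np.zeros(n_states)
for _ in range(1000):
    Q = gamma * np.einsum("sap,p->sa", P, V)    # expected next-step value
    V_new = Q.max(axis=1)
    for s in goal:
        V_new[s] = 1.0                          # reach-avoid satisfied
    for s in avoid:
        V_new[s] = 0.0                          # constraint violated
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)   # per-state reach-avoid probability (up to discounting)
```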
Results
The experimental evaluation in the MuJoCo simulator shows that RAPCPO achieves substantially lower cumulative costs compared to existing methods while consistently satisfying the desired probabilistic reach-avoid specifications, indicating its effectiveness in high-dimensional control tasks.
Implications
The proposed methods have significant implications for autonomous decision-making systems in safety-critical domains, such as robotics and autonomous driving, where ensuring safety while minimizing operational costs is crucial.