gistml

By James Asher

Daily summaries of the latest Machine Learning research papers from arXiv.

2026-01-21 • Found 24 papers

A Boolean Function-Theoretic Framework for Expressivity in GNNs with Applications to Fair Graph Mining

Manjish Pal
  • Introduces Subpopulation Boolean Isomorphism (SBI) as a new expressivity measure for GNNs, subsuming existing frameworks like WL and homomorphism-based approaches.
  • Identifies theoretical barriers to GNN expressivity, including Fourier degree, circuit class, and influence, particularly in fairness-aware contexts.
  • Develops a circuit-traversal-based algorithm to ensure fairness across subpopulations defined by high-complexity Boolean functions.
  • Demonstrates superior performance in reducing fairness gaps across diverse subpopulations in real-world graph datasets.
  • Provides the first principled framework for analyzing GNN expressivity tailored to fairness-sensitive applications.
Abstract
This paper introduces a novel framework for analyzing the expressivity of Graph Neural Networks (GNNs) using Boolean function theory. The proposed framework, termed Subpopulation Boolean Isomorphism (SBI), extends existing expressivity measures such as the Weisfeiler-Lehman (WL) test and homomorphism-based approaches by incorporating both structural properties and node features. The study identifies key theoretical barriers to GNN expressivity, including Fourier degree, circuit class, and influence, particularly in fairness-sensitive applications. A new fairness-aware algorithm is developed, leveraging circuit traversal techniques to handle subpopulations defined by complex Boolean functions, such as parity, which are challenging for existing methods. Experimental evaluations on real-world graphs demonstrate that the proposed method achieves lower fairness gaps across intersectional subpopulations compared to state-of-the-art approaches, marking a significant advancement in fairness-aware GNN design.
Methodology
The paper models subpopulations in graphs using binary indicator vectors and Boolean functions, introducing the concept of Subpopulation Boolean Isomorphism (SBI) to analyze GNN expressivity. It leverages circuit traversal techniques to design a fairness-aware algorithm capable of handling complex Boolean functions. The framework is evaluated theoretically and experimentally on real-world graph datasets, comparing its fairness performance against existing methods.
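To make the setting concrete, here is a toy sketch (not the paper's code) of a subpopulation defined by a parity Boolean function over two binary node attributes, together with a simple fairness gap measured as the difference in positive-prediction rates inside and outside that subpopulation; all names and data below are hypothetical.

```python
import numpy as np

# Toy illustration (not the paper's code): a subpopulation defined by the parity
# of two binary node attributes, and a fairness gap measured as the difference
# in positive-prediction rates across the induced subpopulations.
rng = np.random.default_rng(0)
n_nodes = 1000
attrs = rng.integers(0, 2, size=(n_nodes, 2))   # binary sensitive attributes
preds = rng.integers(0, 2, size=n_nodes)        # stand-in for GNN predictions

# Parity-defined subpopulation indicator: 1 if an odd number of attributes is set.
subpop = attrs.sum(axis=1) % 2

rate_in = preds[subpop == 1].mean()    # positive rate inside the subpopulation
rate_out = preds[subpop == 0].mean()   # positive rate outside it
fairness_gap = abs(rate_in - rate_out)
print(f"parity-subpopulation fairness gap: {fairness_gap:.3f}")
```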
Results
The proposed framework achieves lower fairness gaps across intersectional subpopulations in real-world graphs, outperforming state-of-the-art fairness-aware GNN methods. It successfully handles subpopulations defined by complex Boolean functions, such as parity, which existing methods fail to address.
Implications
This work provides a robust theoretical foundation for analyzing GNN expressivity and designing fairness-aware algorithms. It has potential applications in fairness-sensitive domains such as social network analysis, financial risk modeling, and bioinformatics, where equitable treatment across diverse subpopulations is critical.
View on arXiv

A Graph Prompt Fine-Tuning Method for WSN Spatio-Temporal Correlation Anomaly Detection

Miao Ye, Jing Cui, Yuan Huang, Qian He, Yong Wang, Jiwen Zhang
  • Introduces a graph neural network-based anomaly detection framework tailored for WSN spatio-temporal data.
  • Incorporates a multi-task self-supervised learning strategy with pre-training, graph prompting, and fine-tuning to reduce annotation costs and improve generalization.
  • Improves the Mamba model with multi-scale strategies, inter-modal fusion, and variational graph convolution to enhance spatio-temporal feature extraction.
  • Achieves superior performance compared to existing methods, with F1 scores exceeding 91% on both public and real-world datasets.
  • Addresses challenges such as long-term temporal dependencies, spatio-temporal correlations, and data imbalance in WSN anomaly detection.
Abstract
This paper addresses the challenges of anomaly detection in Wireless Sensor Networks (WSNs), particularly in scenarios involving multi-modal, spatio-temporal data. Existing methods often fail to fully capture spatio-temporal correlations, suffer from high annotation costs for anomaly samples, and struggle with imbalanced datasets. To overcome these limitations, the authors propose a novel graph neural network (GNN)-based framework that incorporates spatio-temporal correlation features and a multi-task self-supervised training strategy. The framework consists of three main components: a backbone anomaly detection network, a pre-training mechanism with three subtasks (contrastive learning, prediction, and reconstruction), and a graph prompting fine-tuning mechanism. The backbone network is built on an improved Mamba model with a multi-scale strategy, inter-modal fusion, and a variational graph convolution module to effectively extract spatio-temporal features. The proposed method achieves state-of-the-art performance on both public and real-world WSN datasets, with F1 scores of 91.30% and 92.31%, respectively.
Methodology
The proposed method consists of three main components: (1) a backbone anomaly detection network based on an improved Mamba model with multi-scale strategies, inter-modal fusion, and variational graph convolution to capture spatio-temporal correlations; (2) a pre-training mechanism with three subtasks (contrastive learning, prediction, and reconstruction) to learn generic features from unlabeled data; and (3) a graph prompting fine-tuning mechanism to guide the pre-trained model for task-specific parameter optimization. This approach reduces training costs and enhances detection generalization.
Results
The proposed framework achieves F1 scores of 91.30% on a public dataset and 92.31% on a real-world WSN dataset, outperforming existing anomaly detection methods. These results demonstrate the model's ability to effectively capture spatio-temporal correlations and generalize well across datasets.
Implications
This work has significant implications for improving the reliability and stability of WSNs in various applications, such as environmental monitoring, industrial inspection, and intelligent transportation. By reducing annotation costs and enhancing generalization, the proposed method can be widely adopted for real-time anomaly detection in complex, multi-modal WSN environments.
View on arXiv

A Learnable Wavelet Transformer for Long-Short Equity Trading and Risk-Adjusted Return Optimization

Shuozhe Li, Du Cheng, Leqi Liu
  • WaveLSFormer combines a learnable wavelet filter bank with a Transformer backbone for multi-scale financial time series analysis.
  • The low-guided high-frequency injection (LGHI) module enhances low-frequency representations with high-frequency signals while ensuring stability.
  • The model is trained end-to-end with a trading-aware objective and incorporates risk-budget rescaling for practical portfolio construction.
  • WaveLSFormer achieves significant improvements in ROI and Sharpe ratio compared to baseline models, demonstrating superior profitability and risk-adjusted returns.
  • The approach addresses key challenges in financial time series, including noise, non-stationarity, and cross-sectional dependencies.
Abstract
This paper introduces WaveLSFormer, a novel deep learning model designed for intraday long-short equity trading and risk-adjusted return optimization. The model integrates a learnable wavelet-based front-end with a Transformer backbone to jointly perform multi-scale decomposition and trading-oriented decision-making. Unlike traditional approaches that rely on fixed wavelet preprocessing, WaveLSFormer employs an end-to-end learnable wavelet filter bank optimized for task-specific frequency bands. A key innovation is the low-guided high-frequency injection (LGHI) module, which refines low-frequency representations with high-frequency cues while maintaining training stability. The model also incorporates a risk-budget rescaling mechanism to ensure practical portfolio construction. Extensive experiments on five years of hourly U.S. equity data across six industry groups demonstrate that WaveLSFormer significantly outperforms baseline models, including MLPs, LSTMs, and Transformers, in terms of both profitability and risk-adjusted returns. The proposed model achieves a substantial improvement in return on investment (ROI) and Sharpe ratio, highlighting its effectiveness in addressing the challenges of noisy, non-stationary financial time series and cross-sectional dependencies.
Methodology
WaveLSFormer employs a learnable wavelet front-end to decompose financial time series into low- and high-frequency components. The Transformer backbone processes these components, and the LGHI module explicitly fuses multi-scale information. The model outputs long/short positions for a portfolio, rescaled to satisfy a fixed risk budget. Training is conducted end-to-end using a trading-aware objective function, incorporating spectral regularization and risk-aware constraints.
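A minimal PyTorch sketch of this kind of pipeline is below: depthwise 1-D convolutions standing in for a learnable two-band filter bank, a Transformer encoder over the fused bands, and a rescaling of the output long/short positions onto a fixed gross-exposure risk budget. The module names, dimensions, and fusion step are assumptions for illustration and omit the LGHI module and trading-aware loss; this is not the paper's architecture.

```python
import torch
import torch.nn as nn

class WaveletTraderSketch(nn.Module):
    """Illustrative only: learnable 2-band filter bank + Transformer + risk rescaling."""
    def __init__(self, n_assets: int, d_model: int = 64):
        super().__init__()
        # Learnable "wavelet" filter bank: one low-pass and one high-pass conv per series.
        self.low = nn.Conv1d(n_assets, n_assets, kernel_size=8, padding=4, groups=n_assets)
        self.high = nn.Conv1d(n_assets, n_assets, kernel_size=8, padding=4, groups=n_assets)
        self.proj = nn.Linear(2 * n_assets, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, n_assets)  # raw long/short scores per asset

    def forward(self, prices: torch.Tensor, risk_budget: float = 1.0) -> torch.Tensor:
        # prices: (batch, n_assets, time)
        bands = torch.cat([self.low(prices), self.high(prices)], dim=1)   # (B, 2*A, T')
        tokens = self.proj(bands.transpose(1, 2))                         # (B, T', d_model)
        h = self.encoder(tokens)[:, -1]                                   # last time step
        raw = self.head(h)                                                # (B, n_assets)
        # Risk-budget rescaling: force the position vector onto a fixed gross exposure.
        positions = risk_budget * raw / raw.abs().sum(dim=-1, keepdim=True).clamp_min(1e-8)
        return positions

model = WaveletTraderSketch(n_assets=6)
pos = model(torch.randn(2, 6, 128))
print(pos.abs().sum(dim=-1))  # each row sums to roughly the risk budget
```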
Results
WaveLSFormer outperforms baseline models (MLP, LSTM, and Transformer) across five years of hourly U.S. equity data in six industries. It achieves an average cumulative ROI of 0.607 ± 0.045 and a Sharpe ratio of 2.157 ± 0.166, compared to the strongest baseline ROI of 0.317 ± 0.050 and Sharpe ratio of 1.879 ± 0.158. These results highlight its ability to deliver robust profitability and risk-adjusted returns.
Implications
WaveLSFormer has significant implications for quantitative finance, particularly in systematic trading and portfolio management. Its ability to learn task-specific frequency bands and optimize for trading performance rather than forecasting metrics makes it a valuable tool for developing robust, risk-aware trading strategies. The model's success in handling noisy, non-stationary data and cross-sectional dependencies could inspire further advancements in financial machine learning and time-series analysis.
View on arXiv

A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model

Jinhao Li, Hao Wang
  • PRAIM is a unified imputation framework that uses large language models and retrieval-augmented memory to handle missing EV charging data.
  • The framework encodes multimodal data, including time-series, calendar, and geospatial features, into a cohesive representation.
  • PRAIM dynamically retrieves relevant examples from the charging network, enabling a single model to impute data across multiple stations.
  • It outperforms traditional and deep learning-based imputation methods in accuracy and preserving the original data's statistical properties.
  • The framework improves downstream tasks like demand forecasting, making it highly applicable to EV infrastructure management.
Abstract
This paper introduces PRAIM, a novel probabilistic variational imputation framework designed to address the challenges of missing data in electric vehicle (EV) charging datasets. Unlike traditional methods that rely on station-specific models or fail to capture the multimodal nature of charging data, PRAIM leverages the power of pre-trained large language models (LLMs) and retrieval-augmented memory to create a unified, semantically rich representation of the data. The framework integrates diverse data types, including time-series demand, calendar features, and geospatial context, into a single model. By dynamically retrieving relevant examples from the entire charging network, PRAIM overcomes data sparsity and captures inter-station correlations. Extensive experiments on four public datasets demonstrate that PRAIM significantly outperforms existing imputation methods in terms of accuracy and its ability to preserve the statistical distribution of the original data. This leads to improved performance in downstream tasks such as demand forecasting, highlighting the framework's potential for real-world applications in EV infrastructure management.
Methodology
PRAIM employs a pre-trained large language model to encode heterogeneous data into a unified representation. It incorporates retrieval-augmented memory to dynamically retrieve relevant examples from the charging network, enabling the model to leverage inter-station correlations. A variational neural architecture is used to perform probabilistic imputation, addressing data sparsity and multimodal dependencies. The framework is evaluated on four public EV charging datasets, comparing its performance against baseline statistical, machine learning, and deep learning methods.
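The retrieval step can be sketched independently of the LLM encoder: embed the partially observed charging window, score it against a network-wide memory bank by cosine similarity, and return the top-k examples as context for imputation. Shapes and names below are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def retrieve_context(query: torch.Tensor, memory: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Illustrative cosine-similarity read over a network-wide memory bank.

    query:  (d,)    embedding of the partially observed charging window
    memory: (n, d)  embeddings of complete windows from all stations
    returns (k, d)  the k most similar stored examples
    """
    sims = F.cosine_similarity(query.unsqueeze(0), memory, dim=-1)  # (n,)
    topk = sims.topk(k).indices
    return memory[topk]

memory = torch.randn(1000, 32)   # hypothetical station-window embeddings
query = torch.randn(32)
context = retrieve_context(query, memory)
print(context.shape)             # torch.Size([5, 32])
```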
Results
PRAIM demonstrated superior imputation accuracy compared to baseline methods, effectively preserving the statistical distribution of the original data. This led to significant improvements in downstream tasks such as EV charging demand forecasting. The framework's ability to handle multimodal data and leverage inter-station correlations was key to its performance advantage.
Implications
The PRAIM framework has significant implications for EV infrastructure management, particularly in improving the reliability of data-driven applications such as demand forecasting and grid load management. Its ability to handle multimodal data and impute missing records across multiple stations makes it a valuable tool for optimizing EV charging networks and supporting the broader transition to electric mobility.
View on arXiv

AdaNODEs: Test Time Adaptation for Time Series Forecasting Using Neural ODEs

Ting Dang, Soumyajit Chatterjee, Hong Jia, Yu Wu, Flora Salim, Fahim Kawsar
  • AdaNODEs is a source-free TTA method specifically designed for time series forecasting, addressing temporal distribution shifts.
  • The framework uses Neural Ordinary Differential Equations (NODEs) with two adaptive parameters (α and γ) to capture temporal dynamics during test time.
  • A novel loss function combining negative log-likelihood (NLL) and KL divergence is introduced to improve forecasting performance.
  • AdaNODEs requires minimal parameter updates, making it computationally efficient and memory-friendly.
  • The method achieves significant performance improvements over SOTA baselines, particularly in scenarios with high-severity distribution shifts.
Abstract
This paper introduces AdaNODEs, a novel test-time adaptation (TTA) framework specifically designed for time series forecasting under distribution shifts. Unlike existing TTA methods that focus on independent data or classification tasks, AdaNODEs leverages Neural Ordinary Differential Equations (NODEs) to model temporal dynamics and adapt to unseen data distributions without requiring labeled target data or access to source data. The framework incorporates two learnable parameters, α and γ, into the NODEs to adjust for temporal distribution shifts during test time. Additionally, the authors propose a new loss function combining negative log-likelihood (NLL) and Kullback-Leibler (KL) divergence, tailored for forecasting tasks. Extensive experiments on both one-dimensional and high-dimensional time series datasets demonstrate that AdaNODEs outperforms state-of-the-art (SOTA) baselines, achieving relative improvements of 5.88% and 28.4% in forecasting accuracy, particularly under severe distribution shifts.
Methodology
AdaNODEs employs a variational encoder-decoder architecture with a latent NODEs module. The encoder transforms input time series into a latent representation, which is then processed by the NODEs to model temporal dynamics. During test-time adaptation, two additional parameters (α and γ) are introduced to adapt the NODEs to unseen data distributions. The decoder reconstructs the forecasted time series from the adapted latent representation. The training process optimizes a novel loss function that combines negative log-likelihood (NLL) and KL divergence to handle the challenges of forecasting under distribution shifts.
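A minimal sketch of the adaptation pattern: a latent ODE integrated with a simple Euler scheme, with only two scalar parameters updated at test time while the pretrained dynamics stay frozen. How α and γ enter the dynamics, and the unsupervised adaptation signal, are guesses made for illustration; the paper's actual objective combines NLL and KL divergence.

```python
import torch
import torch.nn as nn

class LatentODESketch(nn.Module):
    """Illustrative latent NODE with two test-time adaptation scalars (alpha, gamma).

    How alpha and gamma enter the dynamics is an assumption for this sketch;
    the paper defines their exact role.
    """
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))
        self.alpha = nn.Parameter(torch.ones(1))   # scales the vector field
        self.gamma = nn.Parameter(torch.zeros(1))  # shifts the vector field

    def forward(self, z0: torch.Tensor, n_steps: int = 20, dt: float = 0.05) -> torch.Tensor:
        z = z0
        for _ in range(n_steps):                   # simple Euler integration
            z = z + dt * (self.alpha * self.f(z) + self.gamma)
        return z

# Test-time adaptation: freeze the pretrained dynamics, update only alpha and gamma.
model = LatentODESketch()
for p in model.f.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam([model.alpha, model.gamma], lr=1e-2)

z0 = torch.randn(8, 16)
target = torch.randn(8, 16)            # stand-in for an unsupervised adaptation signal
loss = ((model(z0) - target) ** 2).mean()   # placeholder; the paper uses NLL + KL
loss.backward()
opt.step()
```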
Results
AdaNODEs demonstrated superior performance in extensive experiments on both one-dimensional and high-dimensional time series datasets. It achieved relative improvements of 5.88% and 28.4% over SOTA baselines, particularly excelling in scenarios with severe distribution shifts. The method effectively captured temporal dependencies while maintaining computational efficiency by updating only a limited number of parameters during test time.
Implications
AdaNODEs has significant implications for real-world applications where time series data is subject to distribution shifts, such as healthcare monitoring, financial forecasting, and IoT systems. Its source-free and label-free test-time adaptation capabilities make it particularly valuable in scenarios with privacy concerns or limited access to labeled data. The approach also opens new avenues for applying TTA methods to regression tasks and advancing the use of NODEs in time series modeling.
View on arXiv

Autoregressive Deep Learning for Real-Time Simulation of Soft Tissue Dynamics During Virtual Neurosurgery

Fabian Greifeneder, Wolfgang Fenz, Benedikt Alkin, Johannes Brandstetter, Michael Giretzlehner, Philipp Moser
  • The paper proposes a deep learning-based surrogate model for real-time simulation of soft tissue dynamics in neurosurgical training environments.
  • The model leverages Universal Physics Transformers (UPT) to operate directly on large-scale mesh data, avoiding the inefficiencies of grid-based approaches.
  • A stochastic teacher forcing strategy is introduced to improve long-term prediction stability during autoregressive inference.
  • The model delivers accurate predictions, reducing the maximum error from 6.7 mm to 3.5 mm, and scales to meshes with up to 150,000 nodes.
  • The trained model integrates into an interactive simulation environment, achieving sub-10 millisecond runtimes per simulation step on consumer-grade hardware.
Abstract
This paper introduces a deep learning-based surrogate model for simulating soft tissue dynamics in real-time during virtual neurosurgery. Traditional numerical solvers, such as finite element methods (FEM), struggle to meet the computational demands of real-time, high-fidelity simulations required for interactive surgical training environments. To address this, the authors propose a model based on Universal Physics Transformers (UPT) that operates directly on large-scale mesh data, avoiding the inefficiencies of grid-based projections. The model is trained on a dataset generated from nonlinear finite element simulations, capturing a wide range of surgical tool-tissue interaction scenarios. A novel stochastic teacher forcing strategy is introduced during training to improve the stability of long-term predictions. The model achieves accurate predictions of transient brain deformations, scales to meshes with up to 150,000 nodes, and delivers simulation runtimes below 10 milliseconds per step on consumer-grade hardware. This framework enables realistic, interactive neurosurgical simulations, paving the way for advanced surgical training and planning tools.
Methodology
The authors use Universal Physics Transformers (UPT) as the backbone architecture for their surrogate model, which directly processes large-scale mesh data. The model is trained on a dataset generated from nonlinear finite element simulations, covering diverse tool-tissue interaction scenarios. To enhance long-term prediction stability, a stochastic teacher forcing strategy is applied during training, gradually replacing ground truth inputs with model-generated predictions in short rollout sequences.
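Stochastic teacher forcing itself is easy to sketch: during a short training rollout, each step's input is the ground-truth state with some probability and the model's own previous prediction otherwise. The toy model, shapes, and mixing probability below are placeholders, not the UPT surrogate or the paper's schedule.

```python
import torch
import torch.nn as nn

def rollout_loss(model: nn.Module, states: torch.Tensor, p_model_input: float = 0.3) -> torch.Tensor:
    """Illustrative stochastic teacher forcing over a short rollout.

    states: (T, n_nodes, d) ground-truth mesh states from the reference solver.
    With probability p_model_input, the model's own previous prediction is fed
    back as input instead of the ground truth, exposing it to its own errors.
    """
    loss = torch.tensor(0.0)
    current = states[0]
    for t in range(1, states.shape[0]):
        pred = model(current)
        loss = loss + ((pred - states[t]) ** 2).mean()
        use_pred = torch.rand(()) < p_model_input
        # Detached here for simplicity; a real setup may backpropagate through the rollout.
        current = pred.detach() if use_pred else states[t]
    return loss / (states.shape[0] - 1)

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))  # stand-in for UPT
loss = rollout_loss(model, torch.randn(5, 100, 3))
loss.backward()
```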
Results
The proposed model achieves accurate and efficient predictions of transient brain deformations, with a significant reduction in maximum prediction error (from 6.7 mm to 3.5 mm). It scales to large meshes with up to 150,000 nodes and achieves runtimes below 10 milliseconds per simulation step on consumer-grade hardware. These results demonstrate the model's suitability for real-time, interactive neurosurgical simulations.
Implications
This work has significant implications for the development of realistic and interactive neurosurgical training environments. By enabling real-time, high-fidelity simulations of soft tissue dynamics, the proposed framework can enhance surgical training, preoperative planning, and intraoperative guidance. The approach also highlights the potential of deep learning-based surrogate models to replace computationally expensive numerical solvers in other domains requiring real-time physical simulations.
View on arXiv

Auxiliary-predicted Compress Memory Model (ApCM Model): A Neural Memory Storage Model Based on Invertible Compression and Learnable Prediction

Weinuo Ou
  • Proposes the ApCM Model, which combines invertible compression with learnable prediction for efficient memory storage and reconstruction.
  • Introduces a global memory bank with a cosine-similarity-based read mechanism and a frequency-based write policy for dynamic memory management.
  • Achieves compression efficiency comparable to PCA while offering superior nonlinear modeling and reconstruction capabilities.
  • Optimizes memory storage by decoupling compressed storage (zcomp) from auxiliary information (zaux), enabling lossy yet high-fidelity reconstruction.
  • Demonstrates potential for enhancing runtime memory in AI systems, particularly for tasks requiring dynamic knowledge updates and personalized interactions.
Abstract
The paper introduces the Auxiliary-predicted Compress Memory (ApCM) Model, a novel neural memory storage architecture designed to address the lack of runtime memory in artificial intelligence systems, particularly in Large Language Models (LLMs). The ApCM Model integrates invertible dimensionality reduction with a learnable auxiliary predictor to enable efficient, lossy compression and high-fidelity reconstruction of input data. The architecture employs an invertible neural network to encode data into a latent space, splitting it into a compressed representation (zcomp) for storage and an auxiliary representation (zaux) that is discarded. A lightweight predictor network is trained to estimate zaux from zcomp, enabling reconstruction of the original data via the inverse transform. Additionally, the model incorporates a global memory bank with a cosine-similarity-based read mechanism and an access-frequency-based write policy for dynamic memory management. Experimental results demonstrate that the ApCM Model achieves compression efficiency comparable to Principal Component Analysis (PCA) while outperforming it in nonlinear data reconstruction, making it a promising solution for runtime memory in AI systems.
Methodology
The ApCM Model uses an invertible neural network with stacked affine coupling layers and random permutation layers to encode input data into a latent space. The latent representation is split into zcomp (compressed storage) and zaux (auxiliary information). A lightweight predictor network is trained to estimate zaux from zcomp. The model also includes a global memory bank for storing zcomp, with a cosine-similarity-based read mechanism and a frequency-based write policy for dynamic memory management. The entire system is trained end-to-end by minimizing reconstruction loss.
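A single RealNVP-style affine coupling layer is enough to illustrate the encode/split/discard/predict-back loop described above; this generic sketch is not the paper's network, and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible affine coupling layer (RealNVP-style), used here only to
    illustrate the encode/split/predict idea; not the paper's exact network."""
    def __init__(self, dim: int):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):
        a, b = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, b * torch.exp(log_s) + t], dim=-1)

    def inverse(self, y):
        a, b = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, (b - t) * torch.exp(-log_s)], dim=-1)

dim, comp_dim = 8, 4
coupling = AffineCoupling(dim)
predictor = nn.Sequential(nn.Linear(comp_dim, 64), nn.ReLU(), nn.Linear(64, dim - comp_dim))

x = torch.randn(16, dim)
z = coupling(x)
z_comp, z_aux = z[:, :comp_dim], z[:, comp_dim:]   # store z_comp, discard z_aux
z_aux_hat = predictor(z_comp)                      # learn to predict the discarded part
x_hat = coupling.inverse(torch.cat([z_comp, z_aux_hat], dim=-1))
loss = ((x_hat - x) ** 2).mean()                   # end-to-end reconstruction loss
loss.backward()
```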
Results
The ApCM Model achieves compression efficiency comparable to PCA while surpassing it in nonlinear data reconstruction. The model demonstrates strong performance in storing and retrieving data dynamically, with high reconstruction fidelity enabled by the learnable auxiliary predictor. The proposed memory management mechanisms further enhance its adaptability and efficiency.
Implications
The ApCM Model has significant implications for enhancing runtime memory in AI systems, particularly in applications like Large Language Models (LLMs) that require dynamic knowledge updates, long-context understanding, and personalized interactions. Its efficient and learnable memory storage paradigm could reduce computational and storage bottlenecks, enabling more scalable and adaptive AI systems.
View on arXiv

Beyond Softmax and Entropy: Improving Convergence Guarantees of Policy Gradients by f-SoftArgmax Parameterization with Coupled Regularization

Safwan Labbi, Daniil Tiapkin, Paul Mangold, Eric Moulines
  • The softmax parameterization in policy gradient methods often leads to ill-conditioned optimization landscapes and slow convergence, which can be mitigated by the proposed f-softargmax parameterization.
  • The f-softargmax parameterization is coupled with a regularizer induced by the same f-divergence, improving the optimization landscape and ensuring a uniform Polyak–Łojasiewicz inequality.
  • The proposed approach provides the first explicit non-asymptotic last-iterate convergence guarantees for stochastic policy gradient methods in finite MDPs without requiring preconditioning.
  • Using Tsallis divergence within the f-softargmax framework results in polynomial sample complexity, significantly outperforming the exponential complexity of softmax-entropy methods.
  • The method generalizes beyond softmax-entropy regularization, offering a more robust and efficient alternative for policy gradient optimization in RL.
Abstract
This paper addresses the limitations of the widely-used softmax parameterization in policy gradient methods for reinforcement learning (RL). The authors propose a novel family of parameterizations called f-softargmax, which are derived from f-divergence generators. By coupling these parameterizations with regularizers induced by the same f-divergence, the authors improve the optimization landscape and establish stronger theoretical guarantees for convergence. Specifically, they demonstrate that the proposed approach satisfies a Polyak–Łojasiewicz inequality, enabling explicit non-asymptotic last-iterate convergence guarantees for stochastic policy gradient methods in finite Markov Decision Processes (MDPs). The paper highlights the advantages of using Tsallis divergence within this framework, showing that it achieves polynomial sample complexity, in contrast to the exponential complexity associated with the standard softmax-entropy pairing. The proposed method eliminates the need for computationally expensive preconditioning while improving convergence rates significantly.
Methodology
The authors introduce the f-softargmax parameterization, derived from f-divergence generators, and couple it with a regularizer based on the same f-divergence. They analyze the f-regularized value function and demonstrate that it satisfies a non-uniform Łojasiewicz inequality and a monotonicity property. This leads to a uniform Polyak–Łojasiewicz inequality over the optimization region, enabling improved convergence guarantees. The theoretical analysis is supported by comparisons of optimization landscapes and convergence rates between the proposed method and traditional softmax-entropy approaches.
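For context, the divergences named above all belong to the standard f-divergence family; the definitions below are textbook background, not notation taken from the paper.

```latex
D_f(p \,\|\, \mu) = \sum_x \mu(x)\, f\!\left(\tfrac{p(x)}{\mu(x)}\right),
\qquad f \text{ convex},\ f(1) = 0. \\[4pt]
f(t) = t \log t \ \Rightarrow\ \text{KL divergence (the softmax/entropy pairing)}; \\
f(t) = (t-1)^2 \ \Rightarrow\ \text{Pearson } \chi^2 \text{ divergence}; \\
f(t) = \frac{t^{q} - t}{q - 1} \ \Rightarrow\ \text{Tsallis-type divergence of order } q,
\text{ recovering KL as } q \to 1.
```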
Results
The proposed f-softargmax parameterization, particularly with Tsallis divergence, achieves polynomial sample complexity and significantly faster convergence rates compared to the exponential rates of softmax-entropy methods. The authors provide explicit global last-iterate convergence guarantees for stochastic policy gradient methods in finite MDPs, without relying on preconditioning or large batch sizes. The optimization landscape under the f-softargmax parameterization is shown to be better conditioned, avoiding flat regions that hinder convergence.
Implications
The proposed f-softargmax parameterization with coupled regularization has the potential to improve the efficiency and scalability of policy gradient methods in reinforcement learning. By addressing the limitations of softmax-entropy regularization, this approach could lead to faster and more reliable training of RL agents, particularly in complex environments. The framework may also inspire further research into alternative parameterizations and regularization techniques for optimization in machine learning.
View on arXiv

Differentiable Logic Synthesis: Spectral Coefficient Selection via Sinkhorn-Constrained Composition

Gorgi Pavlov
  • Introduces Hierarchical Spectral Composition, a differentiable architecture for Boolean logic synthesis using Fourier analysis.
  • Adapts the Sinkhorn-constrained routing framework from mHC to stabilize optimization and enable Boolean negations via column-sign modulation.
  • Demonstrates 100% accuracy for n = 2 and n = 4 variable Boolean operations, with sparsity and hardware-efficient ternary representations.
  • Achieves 10,959 MOps/s throughput on GPU, enabling single-cycle combinational logic inference.
  • Establishes the viability of spectral synthesis for scalable neuro-symbolic logic tasks.
Abstract
This paper introduces a novel differentiable architecture for Boolean logic synthesis called Hierarchical Spectral Composition. The approach leverages Boolean Fourier analysis to select spectral coefficients from a pre-computed Fourier basis and composes them using Sinkhorn-constrained routing with column-sign modulation. This method addresses the challenges of learning precise Boolean logic through gradient descent, which often results in fuzzy approximations that degrade under quantization. By adapting the Manifold-Constrained Hyper-Connections (mHC) framework, the authors stabilize the optimization process and extend its expressivity to include Boolean negations. The architecture is validated across increasing levels of complexity (n = 2, 3, 4 variables), achieving high accuracy and sparsity in learned representations. Furthermore, the models are hardware-efficient, enabling single-cycle combinational logic inference with high throughput on GPUs. The work bridges the gap between symbolic reasoning and gradient-based learning, offering a scalable and efficient solution for neuro-symbolic logic synthesis.
Methodology
The proposed method uses Boolean Fourier analysis to represent Boolean functions as spectral coefficients. These coefficients are selected and composed using Sinkhorn-constrained routing matrices projected onto the Birkhoff polytope, ensuring stability and norm preservation. Column-sign modulation is introduced to extend expressivity for Boolean negations. The architecture is validated through gradient descent optimization and refined using MCMC techniques for higher-dimensional cases. The learned ternary polynomial threshold functions are compiled into hardware-efficient combinational logic blocks.
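As background, Boolean Fourier analysis expands a function on {-1, 1}^n over parity characters χ_S(x) = ∏_{i∈S} x_i. The small numpy sketch below (illustrative, not the paper's code) computes these coefficients by brute force and shows that two-bit parity puts all of its Fourier weight on the top-degree coefficient, which is exactly what makes it hard for low-degree models.

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """Boolean Fourier expansion of f: {-1,1}^n -> R.
    Returns {S: f_hat(S)} where f_hat(S) = E_x[ f(x) * prod_{i in S} x_i ]."""
    points = np.array(list(itertools.product([-1, 1], repeat=n)))
    values = np.array([f(x) for x in points])
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            chi = points[:, list(S)].prod(axis=1) if S else np.ones(len(points))
            coeffs[S] = float((values * chi).mean())
    return coeffs

# XOR/parity of two bits: its entire Fourier weight sits on the top coefficient.
print(fourier_coefficients(lambda x: x[0] * x[1], 2))
# {(): 0.0, (0,): 0.0, (1,): 0.0, (0, 1): 1.0}
```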
Results
The architecture achieves 100% accuracy for n = 2 (16 Boolean operations) and n = 4 (10 operations) with zero-loss quantization and sparsity of up to 39%. For n = 3, it achieves 76% accuracy via gradient descent but reaches 100% accuracy with exhaustive enumeration. The models demonstrate hardware efficiency with 10,959 MOps/s throughput on GPU, supporting single-cycle combinational logic inference.
Implications
This work has significant implications for neuro-symbolic AI, enabling precise and scalable Boolean logic synthesis that bridges symbolic reasoning and gradient-based learning. The hardware-efficient design makes it suitable for edge inference applications, where resource constraints are critical. It also opens avenues for integrating logic synthesis into larger differentiable systems, such as neural networks for reasoning tasks.
View on arXiv

Distribution Shift Is Key to Learning Invariant Prediction

Hong Zheng, Fei Teng
  • Distribution shift across training domains is critical for learning invariant prediction models.
  • ERM can achieve performance comparable to invariant prediction methods under specific data conditions, such as large distribution shifts or causality-related assumptions.
  • Theoretical results show that sufficient distribution shift ensures minimal generalization error and stable predictions across domains.
  • Empirical experiments demonstrate a linear correlation between distribution shift and model performance, supporting the theoretical claims.
  • The findings provide a new perspective on why ERM can perform well in OOD tasks, even with a small number of training domains.
Abstract
This paper investigates why Empirical Risk Minimization (ERM) sometimes outperforms specialized methods for out-of-distribution (OOD) tasks, despite its simplicity. The authors identify distribution shift across training domains as a key factor that influences model learning and facilitates invariant prediction. They provide theoretical and empirical evidence showing that a large degree of distribution shift can improve model performance, enabling ERM to approximate invariant prediction models that generalize well across domains. The study derives upper bounds indicating that distribution shift directly impacts prediction ability, and under certain conditions, ERM solutions can achieve performance comparable to invariant prediction models. Empirical experiments on tasks like CMNIST validate these findings, showing a positive correlation between distribution shift and model performance. The work challenges the assumption that ERM is inherently inferior for OOD tasks and highlights the importance of data conditions in achieving robust generalization.
Methodology
The authors derive theoretical bounds to analyze the relationship between distribution shift and model performance, using KL divergence to measure the degree of shift. They also conduct empirical experiments on classification tasks, such as CMNIST, to validate their theoretical findings. The analysis focuses on regression and classification cases to demonstrate how distribution shift affects learning invariant predictors.
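As a toy illustration of quantifying shift with KL divergence (the measure used in the analysis), the closed-form KL between two 1-D Gaussians fitted to two training domains can be computed as below; the data are synthetic and not from the paper.

```python
import numpy as np

def kl_gaussian_1d(mu_p, var_p, mu_q, var_q):
    """Closed-form KL( N(mu_p, var_p) || N(mu_q, var_q) ) for 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

rng = np.random.default_rng(1)
domain_a = rng.normal(loc=0.0, scale=1.0, size=10_000)   # training domain 1
domain_b = rng.normal(loc=1.5, scale=1.2, size=10_000)   # training domain 2 (shifted)

shift = kl_gaussian_1d(domain_a.mean(), domain_a.var(), domain_b.mean(), domain_b.var())
print(f"estimated KL shift between domains: {shift:.3f}")
```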
Results
The study shows that a large degree of distribution shift in training data leads to better model performance, enabling ERM to approximate invariant prediction models. Theoretical bounds confirm that distribution shift is necessary for achieving stable and generalizable predictions. Empirical results on CMNIST reveal a linear relationship between distribution shift and model accuracy, aligning with the theoretical insights.
Implications
The findings suggest that distribution shift should be considered a critical factor in designing and evaluating machine learning models for domain generalization. This work challenges the notion that ERM is inherently unsuitable for OOD tasks and highlights the importance of data diversity in training. It could inform the development of new algorithms that leverage distribution shift to improve generalization and robustness in real-world applications.
View on arXiv

GRADE: Replacing Policy Gradients with Backpropagation for LLM Alignment

Lukas Abrie Nel
  • GRADE replaces policy gradient methods like PPO with direct backpropagation using Gumbel-Softmax relaxations, addressing high-variance gradient estimation issues.
  • The proposed GRADE-STE method combines Gumbel-Softmax with straight-through estimation to enable differentiable gradient flow while maintaining realistic discrete token generation.
  • GRADE-STE achieves a 50% relative improvement in test reward over PPO on sentiment-controlled text generation tasks and reduces gradient variance by over 14× compared to REINFORCE.
  • The method demonstrates stable training dynamics and superior generalization to held-out data, validated through rigorous train/validation/test splits.
  • GRADE simplifies the LLM alignment process by eliminating the need for complex hyperparameter tuning and computationally expensive policy gradient methods.
Abstract
This paper introduces GRADE (Gumbel-softmax Relaxation for Alignment via Differentiable Estimation), a novel approach to aligning large language models (LLMs) with human preferences by replacing traditional reinforcement learning from human feedback (RLHF) methods, such as Proximal Policy Optimization (PPO), with direct backpropagation. GRADE leverages the Gumbel-Softmax reparameterization and straight-through estimation (STE) to enable end-to-end differentiable gradient flow through the token sampling process. This eliminates the high-variance gradient estimation issues inherent in policy gradient methods. The authors demonstrate that GRADE-STE achieves superior performance on sentiment-controlled text generation tasks using the IMDB dataset, with a 50% relative improvement in test reward over PPO and significantly reduced gradient variance. The method also exhibits stable training dynamics and better generalization to unseen data. By simplifying the optimization process and improving efficiency, GRADE offers a promising alternative to RLHF for LLM alignment.
Methodology
GRADE uses the Gumbel-Softmax reparameterization to create a differentiable relaxation of the discrete token sampling process. This allows for direct backpropagation of gradients from reward signals through the token generation process to model parameters. The GRADE-STE variant employs straight-through estimation, which uses hard samples during the forward pass and soft gradients during the backward pass, enabling realistic text generation while maintaining differentiability. The method was evaluated on sentiment-controlled text generation using the IMDB dataset with strict data splits to ensure robust generalization testing.
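The hard-forward/soft-backward mechanic is available directly in PyTorch as F.gumbel_softmax(..., hard=True). The sketch below backpropagates a toy differentiable reward through sampled one-hot tokens; the linear head, vocabulary, and per-token reward are stand-ins, not the paper's model or reward function.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, d = 100, 8, 32
lm_head = torch.nn.Linear(d, vocab_size)    # stand-in for an LLM's output head
reward_vec = torch.randn(vocab_size)        # toy per-token reward; a real reward model
                                            # would score the generated text instead

hidden = torch.randn(seq_len, d)            # stand-in hidden states
logits = lm_head(hidden)                    # (seq_len, vocab)

# Hard one-hot samples in the forward pass, soft Gumbel-Softmax gradients backward.
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)   # (seq_len, vocab)

# Downstream computation sees discrete tokens, yet the reward stays differentiable
# with respect to the logits through the straight-through relaxation.
reward = (one_hot * reward_vec).sum(dim=-1).mean()
(-reward).backward()                        # maximize reward by backprop, no policy gradient
print(lm_head.weight.grad.abs().sum() > 0)  # gradients reached the model parameters
```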
Results
GRADE-STE achieved a test reward of 0.763 ± 0.344, compared to 0.510 ± 0.313 for PPO and 0.617 ± 0.378 for REINFORCE, representing a 50% relative improvement over PPO. It also reduced gradient variance by over 14× compared to REINFORCE and maintained stable training dynamics throughout optimization. The method demonstrated superior generalization characteristics on held-out test data.
Implications
GRADE offers a simpler, more stable, and computationally efficient alternative to traditional RLHF methods for aligning LLMs with human preferences. Its ability to reduce gradient variance and improve generalization could make it a valuable tool for training safer and more reliable language models. This approach has potential applications in areas requiring fine-grained control over LLM outputs, such as content moderation, personalized assistants, and creative writing tools.
View on arXiv

Inverting Self-Organizing Maps: A Unified Activation-Based Framework

Alessandro Londei, Matteo Benati, Denise Lanzieri, Vittorio Loreto
  • The authors prove that SOM activation patterns (squared distances to prototypes) can be inverted to recover the exact input vector under certain geometric conditions.
  • They introduce the MUSIC update rule, which enables controlled, geometry-aware transformations in the latent space by modifying specific prototype distances while preserving others.
  • The proposed method is deterministic and does not rely on probabilistic sampling, generative decoders, or latent priors, unlike traditional generative models.
  • MUSIC enables semantically meaningful data augmentation and latent-space exploration while maintaining consistency with the data manifold.
  • The framework is validated on synthetic and real-world datasets, demonstrating its ability to produce smooth, interpretable, and coherent transformations.
Abstract
This paper introduces a novel framework for inverting Self-Organizing Maps (SOMs) by leveraging the geometric information encoded in their activation patterns. The authors demonstrate that the squared distances between an input vector and SOM prototypes can be inverted to recover the exact input under mild geometric conditions. Building on this, they propose the Manifold-Aware Unified SOM Inversion and Control (MUSIC) update rule, which enables controlled, semantically meaningful transformations in the latent space. Unlike generative models such as VAEs or diffusion models, MUSIC operates deterministically and relies solely on prototype geometry without requiring sampling or latent priors. The framework is validated on synthetic Gaussian mixtures, MNIST, and natural face datasets, showing its ability to produce smooth, interpretable trajectories that align with the underlying data manifold. This work positions SOMs as a powerful tool for controllable latent-space exploration and data augmentation.
Methodology
The authors derive a linear system to invert SOM activation patterns based on Euclidean distance geometry, ensuring exact recovery of input vectors when prototypes span the input space. They introduce the MUSIC update rule, which modifies squared distances to selected prototypes while preserving others, enforcing Euclidean realizability through Tikhonov regularization. The method is tested on synthetic Gaussian mixtures, MNIST, and natural face datasets to evaluate its performance in reconstructing inputs and generating controlled transformations.
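The exact-recovery result rests on elementary distance geometry: differencing squared distances against a reference prototype cancels the ‖x‖² term and leaves a linear system in the input. A self-contained numpy check of that step is below (the MUSIC updates and Tikhonov regularization are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_proto = 5, 12
prototypes = rng.normal(size=(n_proto, d))        # SOM prototype vectors
x_true = rng.normal(size=d)                       # hidden input
act = ((prototypes - x_true) ** 2).sum(axis=1)    # SOM "activations": squared distances

# Differencing squared distances against prototype 0 cancels ||x||^2:
#   2 (p_i - p_0) . x = ||p_i||^2 - ||p_0||^2 - (d_i^2 - d_0^2)
A = 2.0 * (prototypes[1:] - prototypes[0])
b = (prototypes[1:] ** 2).sum(axis=1) - (prototypes[0] ** 2).sum() - (act[1:] - act[0])
x_rec, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_rec, x_true))                 # True: exact recovery when prototypes span the space
```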
Results
The proposed framework successfully recovers exact inputs from SOM activations under appropriate conditions. The MUSIC update rule generates smooth, interpretable trajectories that align with the data manifold. Experiments on synthetic Gaussian mixtures, MNIST, and natural face datasets demonstrate the method's ability to produce semantically meaningful transformations, outperforming unsupervised clustering and random reference sets in terms of coherence and interpretability.
Implications
This work redefines the role of SOMs from static quantizers to dynamic tools for latent-space exploration and data augmentation. The deterministic and geometry-aware nature of the framework makes it suitable for applications in data visualization, semantic interpolation, and controlled data generation in fields such as computer vision, robotics, and genomics.
View on arXiv

Knowledge-Integrated Representation Learning for Crypto Anomaly Detection under Extreme Label Scarcity; Relational Domain-Logic Integration with Retrieval-Grounded Context and Path-Level Explanations

Gyuyeon Na, Minjung Park, Soyoun Kim, Jungbin Shin, Sangmi Chai
  • RDLI embeds expert-derived heuristics as logic-aware latent signals to detect complex transactional flows that evade standard GNNs.
  • The Retrieval-Grounded Context (RGC) module integrates real-time regulatory and macroeconomic data to mitigate false positives in volatile markets.
  • RDLI provides path-level explanations, ensuring audit-ready transparency and compliance with regulatory standards.
  • Under extreme label scarcity (0.01%), RDLI outperforms GNN baselines by 28.9% in F1-score.
  • A micro-expert user study confirms RDLI's improved trustworthiness and perceived usefulness compared to existing methods.
Abstract
This paper addresses the challenge of detecting anomalous transactions in decentralized cryptocurrency networks, where extreme label scarcity and adaptive evasion strategies by illicit actors hinder effective anomaly detection. The authors propose a novel framework called Relational Domain-Logic Integration (RDLI), which embeds expert-derived heuristics as differentiable, logic-aware latent signals into representation learning. RDLI overcomes limitations of Graph Neural Networks (GNNs) by capturing multi-hop, logic-driven motifs such as fund dispersal and layering, which are critical for identifying sophisticated money laundering activities. Additionally, the framework incorporates a Retrieval-Grounded Context (RGC) module to condition anomaly scoring on real-time regulatory and macroeconomic contexts, reducing false positives caused by benign market shifts. RDLI also provides path-level explanations that enhance interpretability and compliance with regulatory requirements. Experimental results demonstrate RDLI's superior performance under extreme label scarcity, achieving a 28.9% improvement in F1-score over state-of-the-art GNN baselines. A user study further highlights RDLI's effectiveness in improving trustworthiness and perceived usefulness for forensic investigators.
Methodology
The RDLI framework integrates expert-derived heuristics into representation learning as differentiable latent signals, enabling the detection of multi-hop, logic-driven patterns. It incorporates a Retrieval-Grounded Context (RGC) module to condition anomaly scoring on external regulatory and macroeconomic contexts. RDLI also generates path-level explanations by mapping flagged activities to salient subgraphs and domain-logic cues, enhancing interpretability and auditability.
Results
RDLI achieves a 28.9% improvement in F1-score over state-of-the-art GNN baselines under extreme label scarcity (0.01%). A micro-expert user study (n=24) shows significant improvements in trustworthiness, perceived usefulness, and clarity of explanations compared to conventional methods, with statistical significance (p < 0.001).
Implications
The RDLI framework has significant implications for financial anomaly detection in decentralized networks, particularly in combating money laundering and other illicit activities. Its integration of domain-specific logic and contextual grounding enhances both accuracy and explainability, making it suitable for regulatory compliance and forensic investigations. RDLI's ability to operate effectively under extreme label scarcity also makes it valuable for real-world applications where reliable ground truth is limited.
View on arXiv

MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning

Zheng Fang, Wolfgang Mayer, Zeyu Zhang, Jian Wang, Hong-Yu Zhang, Wanli Li, Zaiwen Feng
  • A novel dataset was developed, comprising 155 tools across seven domains and 9,377 user queries, to evaluate tool selection performance in LLMs.
  • MetaToolAgent (MTA) employs a bi-level meta-learning framework to improve cross-tool generalization, addressing limitations of supervised fine-tuning and in-context learning.
  • MTA significantly outperforms baseline methods in selecting appropriate tools for unseen scenarios, demonstrating robust generalization capabilities.
  • The framework leverages dynamic optimization principles to refine tool-specific policies while capturing cross-tool patterns.
  • The study highlights the importance of scalable and flexible tool-learning systems for real-world applications of LLMs.
Abstract
This paper introduces MetaToolAgent (MTA), a meta-learning framework designed to enhance the ability of large language models (LLMs) to generalize tool usage across diverse and unseen tools. Tool learning is critical for LLMs to perform complex tasks by integrating external tools effectively. Existing approaches, such as supervised fine-tuning and in-context learning, face limitations in generalizing to novel tools and require significant resources or optimization efforts. To address these challenges, the authors constructed a comprehensive dataset spanning seven domains, containing 155 tools and 9,377 question-answer pairs, simulating realistic tool integration scenarios. MTA employs a bi-level meta-learning optimization framework that captures cross-tool patterns and dynamically refines tool-specific policies. Experimental results demonstrate that MTA significantly outperforms baseline methods in tool selection tasks, particularly for unseen tools, showcasing its potential for building scalable and flexible LLM systems capable of dynamic tool coordination.
Methodology
The authors propose a bi-level meta-learning framework (MTA) that decomposes tool optimization into hierarchical objectives. The outer-level optimization minimizes expected loss over unseen tasks, while the inner-level optimization refines policies for known tools. The dataset construction involved defining seven real-world scenarios, designing 155 tools, generating natural language queries, and synthesizing structured data for tool selection analysis. MTA uses recursive bi-level optimization to simulate dynamic tool ecosystems and enhance generalization capabilities.
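The bi-level pattern can be illustrated with a Reptile-style first-order loop: an inner loop adapts a copy of the model to one tool-selection task, and an outer loop nudges the shared initialization toward each adapted copy. This is a standard stand-in for the optimization pattern, not MTA's exact procedure, and the task sampler is hypothetical.

```python
import copy
import torch
import torch.nn as nn

# Reptile-style first-order bi-level loop (a simple stand-in for bi-level meta-learning):
# inner loop adapts to one tool's queries, outer loop moves the shared initialization.
meta_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5
loss_fn = nn.CrossEntropyLoss()

def sample_tool_task():
    # Hypothetical task: 16-d query features -> one of 4 candidate tools.
    x = torch.randn(32, 16)
    y = torch.randint(0, 4, (32,))
    return x, y

for _ in range(100):                                   # outer (meta) iterations
    x, y = sample_tool_task()
    fast = copy.deepcopy(meta_model)                   # inner loop: adapt to this task
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        loss_fn(fast(x), y).backward()
        opt.step()
    # Outer loop: move meta-parameters toward the task-adapted parameters.
    with torch.no_grad():
        for p_meta, p_fast in zip(meta_model.parameters(), fast.parameters()):
            p_meta += meta_lr * (p_fast - p_meta)
```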
Results
MetaToolAgent demonstrated superior performance in tool selection tasks compared to baseline methods, particularly for unseen tools. The framework effectively generalized across diverse tool sets and scenarios, showcasing its ability to dynamically coordinate tools without relying on static prompts or extensive fine-tuning. The dataset and algorithm evaluations validated the robustness and scalability of MTA in real-world tool-learning applications.
Implications
The proposed MetaToolAgent framework has significant implications for improving the adaptability and scalability of LLMs in real-world applications requiring dynamic tool usage. It can be applied in domains such as software development, education, IoT, and mobile applications, where LLMs need to interact with diverse tools. The approach also paves the way for more efficient and resource-friendly methods for tool learning, reducing reliance on extensive labeled datasets or costly prompt engineering.
View on arXiv

NeuroShield: A Neuro-Symbolic Framework for Adversarial Robustness

Ali Shafiee Sarvestani, Jason Schmidt, Arman Roohi
  • NeuroShield integrates symbolic reasoning into neural networks to improve adversarial robustness and interpretability.
  • Logical constraints based on domain knowledge (e.g., traffic sign attributes) are enforced during training via semantic and symbolic logic losses.
  • The framework achieves significantly higher adversarial robustness compared to standard adversarial training, with improvements of 18.1% and 17.35% for FGSM and PGD attacks, respectively.
  • NeuroShield maintains clean-sample accuracy while providing robustness gains, unlike many traditional adversarial defenses.
  • The method is computationally efficient, achieving strong results with a ResNet18 backbone trained for only 10 epochs, outperforming heavier transformer-based defenses.
Abstract
NeuroShield is a neuro-symbolic framework designed to enhance the adversarial robustness and interpretability of deep neural networks (DNNs). By integrating symbolic reasoning into the training pipeline, the framework enforces domain-specific logical constraints, such as rules about the shape and color of traffic signs, to improve model performance under adversarial attacks. The symbolic reasoning component acts as a semantic filter, ensuring that predictions adhere to logical relationships, even when adversarial perturbations are present. Using the GTSRB dataset, the authors demonstrate that NeuroShield significantly outperforms standard adversarial training methods in terms of robustness against FGSM and PGD attacks, achieving up to three times the robustness improvement compared to baseline methods, without sacrificing clean-sample accuracy. The framework also offers a lightweight alternative to transformer-based defenses, achieving comparable or superior results with simpler architectures like ResNet18.
Methodology
The authors integrate symbolic reasoning into the training process of DNNs by encoding domain knowledge as logical constraints. These constraints are enforced using semantic and symbolic logic losses, which guide the model to adhere to predefined rules during training. The framework is evaluated using adversarial training variants (FGSM-Neuro-Symbolic and PGD-Neuro-Symbolic) on the GTSRB dataset, with a ResNet18 backbone used for experiments.
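One common way to implement such constraints is soft (product t-norm) logic, turning a rule like "octagonal ⇒ red" into a differentiable penalty added to the task loss. The rule, class indices, and loss weight below are hypothetical and only illustrate the mechanism, not the paper's constraint set.

```python
import torch
import torch.nn.functional as F

def implication_loss(shape_logits: torch.Tensor, color_logits: torch.Tensor,
                     octagon_idx: int = 0, red_idx: int = 0) -> torch.Tensor:
    """Soft-logic penalty for the rule: octagonal => red.

    Under product semantics the rule is violated to the degree
    P(octagon) * (1 - P(red)); minimizing this pushes predictions toward
    logically consistent label combinations.
    """
    p_octagon = F.softmax(shape_logits, dim=-1)[:, octagon_idx]
    p_red = F.softmax(color_logits, dim=-1)[:, red_idx]
    return (p_octagon * (1.0 - p_red)).mean()

shape_logits = torch.randn(8, 4, requires_grad=True)   # hypothetical shape head
color_logits = torch.randn(8, 5, requires_grad=True)   # hypothetical color head
task_loss = torch.tensor(0.0)                          # stand-in for the usual CE/adversarial loss
loss = task_loss + 0.5 * implication_loss(shape_logits, color_logits)
loss.backward()
```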
Results
NeuroShield achieves 18.1% and 17.35% improvements in adversarial accuracy over the FGSM and PGD adversarial training baselines, respectively, representing a three-fold increase in robustness compared to standard adversarial training. These gains are achieved without reducing clean-sample accuracy. Additionally, the framework performs comparably or better than transformer-based defenses like LNL-MoEx, while requiring less computational overhead.
Implications
NeuroShield provides a promising approach for improving the safety and reliability of AI systems in safety-critical applications, such as autonomous driving. By combining symbolic reasoning with neural learning, the framework enhances both robustness to adversarial attacks and interpretability, paving the way for more trustworthy AI systems. Its computational efficiency also makes it a practical alternative to more resource-intensive defenses.
View on arXiv

Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays

Chang-Wei Shi, Shi-Shang Wang, Wu-Jun Li
  • OrLoMo is the first method to implement asynchronous distributed MSGD with local updates.
  • The method aggregates local momentum updates in an ordered fashion based on global iteration indices.
  • The convergence of OrLoMo is proven for non-convex problems under arbitrary delays without relying on restrictive assumptions.
  • OrLoMo reduces communication frequency while maintaining strong convergence properties.
  • Experiments show that OrLoMo outperforms synchronous and other asynchronous methods.
Abstract
This paper introduces Ordered Local Momentum (OrLoMo), a novel method for asynchronous distributed learning that incorporates momentum SGD (MSGD) with local updates. OrLoMo addresses the challenges of asynchronous distributed learning, particularly under arbitrary delays caused by heterogeneous computing capabilities in a cluster. Unlike existing methods, OrLoMo aggregates local momentum updates from workers in an ordered manner based on their global iteration indices. The authors provide a theoretical convergence proof for OrLoMo on non-convex optimization problems without relying on restrictive assumptions like bounded delays or gradients. Experimental results demonstrate that OrLoMo outperforms both its synchronous counterpart and other asynchronous methods in terms of convergence speed and performance.
Methodology
The proposed OrLoMo method allows each worker to run MSGD locally and then aggregates the local momentum updates at the server in an ordered manner based on global iteration indices. The authors provide a theoretical analysis proving convergence for non-convex optimization problems under arbitrary delays, without requiring assumptions like bounded delays or gradients. The method is evaluated empirically against synchronous and asynchronous distributed learning methods using large-scale deep learning tasks.
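One plausible reading of "ordered" aggregation, sketched below, is that the server buffers updates arriving with arbitrary delays and applies them sorted by the global iteration index at which they were generated rather than by arrival order; the paper's precise aggregation rule and momentum bookkeeping are not reproduced here.

```python
import numpy as np

# Illustrative only: asynchronous workers send (global_iteration_index, momentum_update)
# pairs with arbitrary delays; the server applies them sorted by that index rather than
# by arrival order. This is a toy reading of "ordered" aggregation, not the paper's rule.
rng = np.random.default_rng(0)
params = np.zeros(4)
lr = 0.1

# Updates computed at global iterations 0..5, arriving out of order due to delays.
arrivals = [(3, rng.normal(size=4)), (0, rng.normal(size=4)), (5, rng.normal(size=4)),
            (1, rng.normal(size=4)), (4, rng.normal(size=4)), (2, rng.normal(size=4))]

for global_iter, momentum_update in sorted(arrivals, key=lambda item: item[0]):
    params -= lr * momentum_update   # apply in the order the updates were generated
print(params)
```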
Results
OrLoMo achieves superior performance compared to its synchronous counterpart and other asynchronous methods. It demonstrates faster convergence and better generalization in experiments, validating its effectiveness in handling arbitrary delays in heterogeneous computing environments.
Implications
OrLoMo has significant implications for large-scale distributed deep learning, particularly in environments with heterogeneous computing resources. By addressing the challenges of asynchronous learning under arbitrary delays, it enables more efficient training of deep models while reducing communication overhead. This makes it particularly relevant for applications in federated learning and distributed systems with varying computational capabilities.
View on arXiv

Orthogonalized Policy Optimization: Decoupling Sampling Geometry from Optimization Geometry in RLHF

Wang Zixian
  • Existing RLHF alignment methods conflate sampling geometry and optimization geometry, leading to instability and gradient saturation.
  • OPO decouples these two axes using α-divergence for sampling geometry and Pearson χ² divergence for optimization geometry.
  • The proposed framework avoids the numerical instability and gradient saturation caused by KL divergence in high-confidence regimes.
  • OPO provides a unified perspective on existing alignment methods and introduces a stable, well-conditioned optimization objective.
  • The framework enables independent control over sample weighting and penalty structure, improving robustness in reasoning-oriented training.
Abstract
This paper introduces Orthogonalized Policy Optimization (OPO), a novel framework for aligning large language models (LLMs) in reinforcement learning from human feedback (RLHF). The author identifies a key issue in existing alignment methods, such as PPO, DPO, and IPO, which conflate two independent design choices: sampling geometry (which determines the dominance of samples in the gradient signal) and optimization geometry (which governs how deviations in value are penalized). This conflation, often implemented through KL divergence, leads to numerical instability and gradient saturation, particularly in high-confidence regimes. OPO addresses this by decoupling these two axes, using α-divergence-based sampling weights for sampling geometry and Pearson χ²-induced quadratic regularization for optimization geometry. This decoupling results in a well-conditioned objective with linear gradient dynamics, stable optimization, and robust peak-seeking behavior. The framework unifies existing methods under a generalized perspective and provides a principled foundation for more stable and effective alignment in RLHF.
Methodology
The author formalizes alignment objectives as the minimization of a generalized distance between policy energy and target energy, parameterized by α-divergence-based sampling weights and Bregman divergence-based value metrics. OPO introduces a new objective function that operates in ratio coordinates, where the Pearson χ² divergence induces a quadratic penalty. This decoupling allows independent control over sampling geometry (via α) and optimization geometry (via a quadratic regularization term). The framework is analyzed theoretically to demonstrate its stability and linear gradient dynamics.
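To make the decoupling concrete, the sketch below shows one way a quadratic (χ²-style) penalty in ratio coordinates can be paired with a separate, α-controlled sampling weight; the specific functional forms, the advantage-weighted reward term, and all hyperparameters are assumptions for illustration and do not reproduce the paper's exact objective.

    # Illustrative PyTorch sketch: quadratic penalty in ratio coordinates plus a
    # separate sampling weight shaped by alpha. Every functional form here is an
    # assumption; this is not the OPO objective itself.
    import torch

    def opo_style_loss(logp, logp_ref, advantage, alpha=1.0, beta=0.1):
        ratio = torch.exp(logp - logp_ref)        # r = pi(y|x) / pi_ref(y|x)
        # Sampling geometry: per-sample weights controlled by alpha, detached so
        # they reweight the signal without contributing gradients of their own.
        weight = ratio.detach() ** alpha
        weight = weight / weight.mean()
        # Optimization geometry: quadratic penalty on deviations in ratio space,
        # whose gradient grows linearly in r instead of saturating like log-ratios.
        penalty = beta * (ratio - 1.0) ** 2
        # Advantage-weighted term pushes probability toward preferred samples.
        return (-(weight * advantage * ratio) + weight * penalty).mean()

    logp = torch.randn(8, requires_grad=True)     # toy policy log-probs
    logp_ref = torch.randn(8)                     # reference-model log-probs
    advantage = torch.randn(8)                    # stand-in preference signal
    opo_style_loss(logp, logp_ref, advantage).backward()
    print(logp.grad)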
Results
The proposed OPO framework achieves stable optimization without gradient saturation, even in high-confidence regimes. It maintains peak-seeking behavior while avoiding the numerical instability associated with KL divergence. The theoretical analysis shows that OPO has a unique stable equilibrium and linear gradient dynamics, providing a robust foundation for RLHF alignment.
Implications
OPO has significant implications for improving the stability and robustness of RLHF in training large language models. By decoupling sampling and optimization geometries, it enables more precise control over alignment objectives, potentially enhancing the reasoning capabilities of LLMs. This framework could be applied to various RLHF tasks, including preference optimization, reward modeling, and fine-tuning of LLMs for specific applications.
View on arXiv

Press Start to Charge: Videogaming the Online Centralized Charging Scheduling Problem

Alireza Ghahtarani, Martin Cousineau, Amir-massoud Farahmand, Jorge E. Mendoza
  • The OCCSP is reformulated as a video game, reducing model complexity and enabling more effective learning-based solutions.
  • The gamified framework supports convolutional image models, which scale better and generalize more effectively than vector-based approaches.
  • Learning-based policies trained with DAgger outperform heuristic baselines, vector-based methods, and supervised learning agents in load balancing and robustness.
  • The proposed methods achieve significant economic benefits, reducing system costs by tens of millions of dollars annually in a real-world case study for the Greater Montréal Area.
  • Gamification provides a flexible modeling paradigm that bridges optimization and machine learning for dynamic scheduling problems.
Read More
Abstract
This paper addresses the Online Centralized Charging Scheduling Problem (OCCSP), where a central authority must schedule the charging of dynamically arriving electric vehicles (EVs) in real time to balance grid load while adhering to capacity constraints. The authors propose a novel gamification-based approach, modeling the problem as a video game where charging blocks are placed on a grid under temporal and capacity constraints. This gamified representation reduces model complexity and improves generalization compared to traditional vector-based formulations. The authors develop a range of solution methods, including heuristic policies, mixed-integer programming benchmarks, and learning-based approaches trained using expert demonstrations and refined with Dataset Aggregation (DAgger). Experimental results, based on real-world data from the Greater Montréal Area, demonstrate that the proposed gamified learning framework significantly outperforms traditional methods in load balancing, robustness, and economic efficiency. The study highlights the potential of gamification and machine learning to address complex, dynamic optimization problems in EV charging and grid management.
Methodology
The authors gamify the OCCSP by representing it as a sequential placement game on a grid, enabling the use of convolutional image models. They develop and compare multiple solution methods, including mixed-integer programming benchmarks, heuristic policies, and learning-based approaches trained with expert demonstrations and improved using Dataset Aggregation (DAgger). Extensive experiments are conducted using real-world EV arrival patterns and utility cost data from the Greater Montréal Area.
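The grid-placement view is easy to picture with a small occupancy array: charging blocks occupy rows (capacity units) across columns (time slots), and the resulting image-like array is what a convolutional policy would consume. The grid size, block shape, and feasibility rule below are assumed purely for illustration.

    # Toy occupancy-grid view of online charging scheduling as a placement game.
    # Dimensions, block shapes, and the feasibility rule are assumptions.
    import numpy as np

    T, C = 24, 10                          # time slots and capacity units (assumed)
    grid = np.zeros((C, T), dtype=int)

    def can_place(grid, start, duration, demand):
        # A block fits if it stays within the horizon and every covered slot
        # still has at least `demand` units of spare capacity.
        if start + duration > grid.shape[1]:
            return False
        used = (grid[:, start:start + duration] > 0).sum(axis=0)
        return bool(np.all(used + demand <= grid.shape[0]))

    def place(grid, start, duration, demand):
        # Fill the lowest free rows of each covered column, Tetris-style.
        for t in range(start, start + duration):
            free_rows = np.flatnonzero(grid[:, t] == 0)[:demand]
            grid[free_rows, t] = 1
        return grid

    # An arriving EV that needs 3 capacity units for 4 consecutive slots.
    if can_place(grid, start=5, duration=4, demand=3):
        grid = place(grid, start=5, duration=4, demand=3)

    print(grid)                            # this image-like array feeds a CNN policy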
Results
The gamified learning framework achieves superior load balancing compared to heuristic and vector-based methods. The image-to-movement model trained with DAgger consistently outperforms baselines in both performance and robustness. The proposed methods also demonstrate significant economic benefits, reducing system costs by tens of millions of dollars annually and potentially delaying costly grid infrastructure upgrades.
Implications
The study demonstrates the potential of gamification and machine learning to address complex, real-time optimization problems in EV charging and grid management. The proposed methods can enhance grid stability, reduce operational costs, and delay infrastructure investments, making them highly relevant for utility companies and policymakers in regions with growing EV adoption.
View on arXiv

Principled Latent Diffusion for Graphs via Laplacian Autoencoders

Antoine Siraudin, Christopher Morris
  • LG-Flow introduces a latent graph diffusion framework that reduces computational complexity from quadratic to linear by operating in a compressed latent space.
  • The proposed Laplacian Graph Variational Autoencoder (LG-VAE) ensures provable near-lossless reconstruction of adjacency matrices, addressing a key challenge in graph generation.
  • The framework employs a Diffusion Transformer (DiT) with flow matching in the latent space, achieving competitive generative performance while maintaining scalability.
  • LG-Flow achieves significant speed-ups (up to 1000×) compared to traditional graph diffusion models, enabling the generation of larger and more complex graphs.
  • The approach is modular and unifies the generation of undirected graphs and DAGs, making it applicable to diverse domains such as molecular design and circuit synthesis.
Read More
Abstract
This paper introduces LG-Flow, a novel latent graph diffusion framework designed to address the inefficiencies and scalability challenges of existing graph generative models. Current graph diffusion models operate directly in graph space, leading to quadratic computational complexity and inefficiencies in modeling sparse graphs. LG-Flow overcomes these limitations by employing a Laplacian Graph Variational Autoencoder (LG-VAE) to compress graphs into a low-dimensional latent space while ensuring near-lossless reconstruction of adjacency matrices. This latent representation scales linearly with the number of nodes, enabling efficient and scalable graph generation. In the latent space, a Diffusion Transformer (DiT) with flow matching is used for generative modeling. LG-Flow achieves competitive performance on synthetic and real-world benchmarks, with speed-ups ranging from 10× to 1000× compared to traditional graph diffusion models. The framework is modular, allowing the use of generic denoising architectures, and unifies the generation of both undirected graphs and directed acyclic graphs (DAGs).
Methodology
LG-Flow combines a Laplacian Graph Variational Autoencoder (LG-VAE) with a Diffusion Transformer (DiT) in a latent space. LG-VAE maps each node to a fixed-dimensional embedding, ensuring adjacency matrix recoverability and linear scaling with graph size. The DiT is trained using flow matching to perform efficient and expressive graph generation in the latent space. The framework is designed to maintain permutation equivariance and structural validity of generated graphs.
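As a rough illustration of the latent-space stage, the snippet below runs one flow-matching training step on per-node latent vectors, with a tiny MLP standing in for the paper's Diffusion Transformer; the shapes, the linear interpolation path, and all hyperparameters are assumptions.

    # Minimal flow-matching step on per-node latents. The MLP is a stand-in for
    # the DiT denoiser; shapes and hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn

    latent_dim = 16
    velocity_net = nn.Sequential(
        nn.Linear(latent_dim + 1, 64), nn.SiLU(), nn.Linear(64, latent_dim)
    )

    def flow_matching_loss(z1):
        # Regress the straight-line velocity from noise z0 toward the latents z1.
        z0 = torch.randn_like(z1)                            # noise sample
        t = torch.rand(z1.shape[0], 1)                       # random interpolation time
        zt = (1 - t) * z0 + t * z1                           # linear interpolant
        target_velocity = z1 - z0                            # constant-speed target
        pred = velocity_net(torch.cat([zt, t], dim=-1))
        return ((pred - target_velocity) ** 2).mean()

    # Pretend these are per-node latents produced by an LG-VAE-style encoder.
    node_latents = torch.randn(32, latent_dim)
    loss = flow_matching_loss(node_latents)
    loss.backward()
    print(float(loss))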
Results
LG-Flow achieves competitive performance on both synthetic and real-world graph generation benchmarks. It significantly reduces computational complexity, achieving speed-ups of up to 1000× compared to traditional graph diffusion models. The framework demonstrates scalability to larger graphs while maintaining high reconstruction fidelity and generative quality.
Implications
LG-Flow enables scalable and efficient graph generative modeling, making it suitable for applications in domains such as molecular design, combinatorial optimization, and circuit synthesis. Its modular design allows for the reuse of generic denoising architectures, bridging the gap between graph generation and advancements in other modalities like image generation.
View on arXiv

Self-Improvement as Coherence Optimization: A Theoretical Account

Tianyi Qiu, Ahmed Hani Ismail, Zhonghao He, Shi Feng
  • The paper unifies methods like debate, bootstrap, and ICM under the framework of coherence optimization, explaining their shared mechanism for feedback-free self-improvement.
  • Coherence optimization is shown to be equivalent to description-length regularization and is proven to be the optimal regularization scheme for semi-supervised learning when using a pretrained model as a prior.
  • The authors propose a scalable Gibbs-sampling-based algorithm that optimizes coherence efficiently and converges to high-coherence policies under mild conditions.
  • Preliminary experiments demonstrate that coherence-based regularizers outperform other methods, such as LLM-as-a-judge, in terms of truthfulness and generalization.
  • The theoretical framework predicts when feedback-free self-improvement methods are likely to succeed or fail.
Read More
Abstract
This paper provides a theoretical framework to explain how language models can improve their accuracy without external supervision, a phenomenon observed in methods like debate, internal coherence maximization (ICM), and iterative bootstrap. The authors propose that these methods are all special cases of 'coherence optimization,' which involves finding a context-to-behavior mapping that maximizes joint predictability and compressibility. They demonstrate that coherence optimization is equivalent to description-length regularization and prove that it is the optimal regularization scheme for semi-supervised learning when the regularizer is derived from a pretrained model. The paper also introduces a scalable algorithm based on Gibbs sampling to efficiently optimize coherence. Preliminary experiments validate the theoretical claims, showing that coherence-based regularizers outperform other methods, such as using large language models (LLMs) as judges, in terms of truthfulness and generalization.
Methodology
The authors develop a theoretical framework that formalizes coherence optimization as a description-length regularization problem. They prove its optimality for semi-supervised learning using a pretrained model as a prior. To address the computational intractability of directly optimizing coherence, they propose a Gibbs sampling-based algorithm that iteratively updates context-to-behavior mappings to maximize coherence. Preliminary experiments were conducted to validate the theoretical claims and compare the performance of coherence-based regularizers with other methods.
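A toy version of the coordinate-wise (Gibbs-style) update is sketched below: each item's label is resampled given all the others so as to raise a joint score. The pairwise-agreement score is only a crude stand-in for the model-derived coherence used in the paper, and every numerical choice is an assumption.

    # Toy Gibbs-style coordinate updates over binary labels to raise a joint
    # "coherence" score. The agreement score is a crude proxy, not the paper's
    # model-derived coherence.
    import numpy as np

    rng = np.random.default_rng(0)

    def coherence(labels, similarity):
        # Higher when similar items share a label.
        agree = (labels[:, None] == labels[None, :]).astype(float)
        return float((similarity * agree).sum())

    n = 20
    similarity = rng.random((n, n))
    similarity = (similarity + similarity.T) / 2           # symmetric similarities
    labels = rng.integers(0, 2, size=n)

    for sweep in range(10):                                # Gibbs-style sweeps
        for i in range(n):
            scores = []
            for candidate in (0, 1):
                labels[i] = candidate
                scores.append(coherence(labels, similarity))
            scores = np.array(scores)
            # Sample the label with probability proportional to exp(score).
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            labels[i] = rng.choice([0, 1], p=probs)

    print("final coherence:", coherence(labels, similarity))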
Results
The paper demonstrates that coherence optimization is the optimal regularization scheme for semi-supervised learning with a pretrained prior, maximizing the worst-case lower bound on expected accuracy. Preliminary experiments show that coherence-based regularizers outperform other methods, such as LLM-as-a-judge, in terms of truthfulness and generalization. The proposed Gibbs sampling algorithm effectively converges to high-coherence policies, making it a scalable and practical solution for coherence optimization.
Implications
The findings provide a unified theoretical explanation for feedback-free self-improvement methods, offering insights into their mechanisms and limitations. The proposed coherence optimization framework and algorithm could be applied to improve the performance of language models in semi-supervised learning tasks, potentially reducing the need for large-scale labeled datasets. This work also contributes to the broader field of AI alignment by providing a formal basis for scalable oversight methods that enhance model capabilities without external supervision.
View on arXiv

Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction

Long D. Nguyen, Kelin Xia, Binh P. Nguyen
  • MI-MoE introduces a multiscale mixture of experts framework that models molecular interactions across short-, mid-, and long-range geometric regimes.
  • A topology-aware gating mechanism, based on filtration-derived descriptors like persistent homology, adaptively routes inputs to the appropriate experts.
  • The proposed framework is compatible with and improves the performance of multiple 3D molecular GNN architectures.
  • MI-MoE achieves state-of-the-art results on diverse molecular and polymer property prediction benchmarks, including both regression and classification tasks.
  • The approach highlights the importance of integrating multiscale geometric representations with global topological cues in molecular graph learning.
Read More
Abstract
This paper introduces the Multiscale Interaction Mixture of Experts (MI-MoE), a novel framework for molecular property prediction that leverages topology-aware multiscale modeling. The authors address limitations in existing 3D molecular graph neural networks (GNNs), which often rely on rigid, globally fixed interaction heuristics such as single distance cutoffs and maximum neighbor limits. MI-MoE introduces a set of specialized experts that operate across multiple geometric regimes, defined by distinct distance cutoffs, and a topology-aware gating mechanism that adaptively routes inputs to these experts. The gating mechanism uses filtration-based topological descriptors, including persistent homology features, to capture how molecular connectivity evolves across interaction radii. This approach enables MI-MoE to adaptively model short-, mid-, and long-range molecular interactions, improving the representation of complex molecular structures. Extensive experiments on diverse molecular and polymer property prediction benchmarks demonstrate that MI-MoE consistently outperforms state-of-the-art models across both regression and classification tasks. The framework is also shown to be a plug-and-play module that enhances the performance of multiple 3D molecular GNN backbones.
Methodology
The MI-MoE framework consists of two main components: (1) a multiscale mixture of experts module, where each expert operates on molecular graphs induced by different distance cutoffs to capture interactions at various spatial scales, and (2) a topology-aware gating network that uses filtration-based topological descriptors, such as persistent homology features, to adaptively route inputs to the appropriate experts. The framework is designed as a plug-in module that can be integrated into existing 3D molecular GNNs.
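The sketch below shows the overall shape of such a module: one small expert per distance cutoff and a gate fed by simple per-cutoff connectivity statistics, which here stand in for the persistent-homology descriptors; the cutoff values, features, and network sizes are all assumptions.

    # Rough multiscale mixture-of-experts sketch over distance cutoffs with a
    # gate driven by toy filtration statistics. All choices are assumptions.
    import torch
    import torch.nn as nn

    cutoffs = [2.0, 4.0, 8.0]              # short-, mid-, long-range radii (assumed)

    def scale_features(pos, cutoff):
        # Per-atom neighbor counts within the cutoff, a toy per-scale descriptor.
        dist = torch.cdist(pos, pos)
        return (dist < cutoff).float().sum(dim=-1, keepdim=True)

    experts = nn.ModuleList(
        [nn.Sequential(nn.Linear(1, 16), nn.SiLU(), nn.Linear(16, 1)) for _ in cutoffs]
    )
    gate = nn.Sequential(nn.Linear(len(cutoffs), 16), nn.SiLU(), nn.Linear(16, len(cutoffs)))

    def mi_moe_readout(pos):
        feats = [scale_features(pos, c) for c in cutoffs]       # one view per scale
        # Graph-level filtration summary: mean connectivity at each cutoff,
        # standing in here for persistent-homology descriptors.
        summary = torch.cat([f.mean().view(1) for f in feats]).unsqueeze(0)
        weights = torch.softmax(gate(summary), dim=-1).squeeze(0)
        outputs = torch.stack([experts[i](feats[i]).mean() for i in range(len(cutoffs))])
        return (weights * outputs).sum()                        # gated property prediction

    atoms = torch.randn(12, 3) * 3.0                            # toy 3D coordinates
    print(float(mi_moe_readout(atoms)))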
Results
MI-MoE consistently outperforms state-of-the-art models and single-scale baselines across multiple molecular and polymer property prediction benchmarks. It demonstrates superior performance in both regression and classification tasks, showcasing its ability to generalize across diverse datasets. The framework also improves the performance of various 3D molecular GNN backbones, validating its versatility and effectiveness.
Implications
The MI-MoE framework has significant implications for drug discovery and materials science, where accurate molecular property prediction is critical. By enabling adaptive multiscale interaction modeling, the approach can improve the efficiency and accuracy of virtual screening and molecular optimization. Additionally, the integration of topological features into molecular graph learning could inspire further research into topology-aware deep learning methods for other scientific domains.
View on arXiv

Trend-Adjusted Time Series Models with an Application to Gold Price Forecasting

Sina Kazemdehbashi
  • The TATS model reframes time series forecasting as a two-part task: trend prediction and value forecasting.
  • The model integrates a binary classifier for trend prediction, a value forecaster (e.g., LSTM), and an adjustment function to refine predictions.
  • TATS outperforms standard LSTM and Bi-LSTM models in forecasting daily gold prices, achieving lower forecasting errors.
  • The study emphasizes the importance of trend detection accuracy as a complementary evaluation metric for time series models.
  • The approach is particularly suited for volatile and non-stationary time series, such as financial data.
Read More
Abstract
This paper introduces the Trend-Adjusted Time Series (TATS) model, a novel approach to time series forecasting that integrates trend prediction as a subtask within the broader forecasting framework. The model consists of three components: a trend predictor (binary classifier), a value forecaster (e.g., LSTM or Bi-LSTM), and an adjustment function that refines the forecasted value based on the predicted trend. The TATS model is applied to the challenging task of forecasting daily gold prices, a volatile financial time series. Experimental results demonstrate that TATS achieves superior performance compared to standard LSTM and Bi-LSTM models, with significantly lower forecasting errors. The study also highlights the limitations of traditional evaluation metrics like Mean Squared Error (MSE) and Mean Absolute Error (MAE) in assessing time series models, advocating for the inclusion of trend detection accuracy as a complementary metric. The proposed methodology provides a unified perspective on trend prediction and value forecasting, offering a robust framework for handling non-stationary and volatile time series data.
Methodology
The TATS model combines three components: (1) a trend predictor using binary classification models (e.g., Logistic Regression, Random Forest, XGBoost) to predict the directional movement of the time series, (2) a value forecaster using models like LSTM or Bi-LSTM to predict the next time step's value, and (3) an adjustment function that modifies the forecasted value based on the predicted trend. The model is validated through theoretical analysis and empirical evaluation on daily gold price data.
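A small end-to-end sketch of the three-part pipeline follows, with a logistic-regression trend classifier, a naive last-value forecaster in place of the LSTM, and a simple nudging rule as the adjustment function; the synthetic price series and all of these component choices are illustrative assumptions.

    # Toy TATS-style pipeline: trend classifier + value forecaster + adjustment.
    # The synthetic series, naive forecaster, and nudging rule are assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    prices = np.cumsum(rng.normal(0.1, 1.0, size=500)) + 100     # synthetic series

    window = 5
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    up = (prices[window:] > prices[window - 1:-1]).astype(int)   # next-step direction

    trend_clf = LogisticRegression(max_iter=1000).fit(X[:-50], up[:-50])

    def value_forecast(history):
        # Naive last-value forecaster, standing in for the LSTM/Bi-LSTM.
        return history[-1]

    def adjust(forecast, last_value, trend_up, delta=0.2):
        # Nudge the forecast toward the predicted direction if they disagree.
        if trend_up and forecast <= last_value:
            return last_value + delta
        if not trend_up and forecast >= last_value:
            return last_value - delta
        return forecast

    history = prices[-window:]
    raw = value_forecast(history)
    direction = bool(trend_clf.predict(history.reshape(1, -1))[0])
    print("adjusted forecast:", adjust(raw, prices[-1], direction))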
Results
The TATS model consistently outperforms standard LSTM and Bi-LSTM models in forecasting daily gold prices, achieving significantly lower forecasting errors. Additionally, the inclusion of trend detection accuracy as an evaluation metric highlights the model's ability to effectively capture directional movements in the time series.
Implications
The TATS model provides a robust framework for forecasting volatile and non-stationary time series, with potential applications in finance, healthcare, marketing, and other domains where accurate trend and value predictions are critical. Its ability to integrate trend prediction into the forecasting process could improve decision-making in areas like investment strategies, risk management, and resource allocation.
View on arXiv

Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models

Megha Thukral, Cyrus Tanade, Simon A. Lee, Juhyeon Lee, Hao Zhou, Keum San Chun, Migyeong Gwak, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Mehrab Bin Morshed, Subramaniam Venkatraman, Sharanya Arcot Desai
  • Introduces Masked Multiscale Reconstruction (MMR), a self-supervised framework for PPG representation learning using wavelet-based multiresolution decomposition.
  • Pretrained on a large-scale dataset of 17 million PPG segments from 32,000 smartwatch users, capturing hierarchical time-frequency features.
  • Achieves state-of-the-art performance on 17 out of 19 diverse health-related tasks, demonstrating robust generalization.
  • Wavelet-based representations are shown to capture physiologically grounded features, validated through extensive ablation studies.
  • Highlights the potential of wavelet-driven approaches for generalizable PPG foundation models in digital health applications.
Read More
Abstract
This paper introduces a novel self-supervised learning framework, Masked Multiscale Reconstruction (MMR), for pretraining photoplethysmography (PPG) foundation models. PPG signals, commonly collected via wearable devices, are inherently multi-scale, with physiological information encoded across both fine-grained waveform morphology and broader rhythmic dynamics. The proposed MMR framework leverages wavelet-based multiresolution decomposition to capture these hierarchical time-frequency features. Specifically, the model is trained to reconstruct masked wavelet coefficients from PPG signals, encouraging the learning of rich, physiologically meaningful embeddings. The authors pretrain their model on a large-scale dataset of approximately 17 million 10-second PPG segments from 32,000 smartwatch users, totaling 48,000 hours of data. The pretrained model demonstrates state-of-the-art performance on 17 out of 19 downstream health-related tasks, outperforming or matching existing PPG foundation models and other self-supervised baselines. Ablation studies further highlight the importance of wavelet-based representations and the impact of design choices such as wavelet family, decomposition scales, and patch size. This work underscores the potential of wavelet-driven approaches for building generalizable and robust PPG foundation models, enabling advancements in digital health applications such as cardiovascular monitoring and stress detection.
Methodology
The authors propose a self-supervised pretraining framework, Masked Multiscale Reconstruction (MMR), which uses wavelet-based multiresolution decomposition of PPG signals. The model employs a transformer encoder to reconstruct randomly masked wavelet coefficients across multiple time-frequency scales. The training dataset consists of 17 million 10-second PPG segments from 32,000 smartwatch users, processed using the Discrete Wavelet Transform (DWT). The model is evaluated on 19 downstream health-related tasks, with systematic ablations to assess the impact of design choices such as wavelet family, decomposition scales, and patch size.
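The masked-reconstruction objective can be illustrated in a few lines: decompose a pulse-like signal with a discrete wavelet transform (here via the PyWavelets package), mask a fraction of the coefficients, and treat the masked values as targets for a model to predict. The synthetic signal, wavelet choice, and mask ratio are assumptions, and the paper's transformer encoder is not shown.

    # Toy masked wavelet-coefficient setup for a PPG-like signal.
    # Signal, wavelet family, level, and mask ratio are assumptions.
    import numpy as np
    import pywt                                  # requires the PyWavelets package

    rng = np.random.default_rng(0)
    fs, seconds = 64, 10
    t = np.arange(fs * seconds) / fs
    ppg = (np.sin(2 * np.pi * 1.2 * t)           # crude pulse-like signal
           + 0.3 * np.sin(2 * np.pi * 2.4 * t)
           + 0.05 * rng.normal(size=t.size))

    coeffs = pywt.wavedec(ppg, "db4", level=4)   # multiresolution coefficient bands
    flat = np.concatenate(coeffs)

    mask = rng.random(flat.size) < 0.3           # mask 30% of the coefficients
    visible = np.where(mask, 0.0, flat)          # what the encoder would see
    targets = flat[mask]                         # what it would learn to reconstruct

    print("band sizes:", [len(c) for c in coeffs])
    print("masked coefficients to reconstruct:", targets.size)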
Results
The MMR framework achieves state-of-the-art performance on 17 out of 19 health-related tasks, outperforming or matching existing PPG foundation models, time-series foundation models, and other self-supervised baselines. The learned embeddings capture robust, physiologically meaningful features, as demonstrated through extensive analysis and ablation studies. The results validate the effectiveness of wavelet-based representations for generalizable PPG modeling.
Implications
The proposed MMR framework has significant implications for digital health, particularly in wearable technology and continuous cardiovascular monitoring. By leveraging wavelet-based multiscale representations, the model enables robust and generalizable PPG foundation models, which can support a wide range of health-related applications, including blood pressure estimation, arrhythmia detection, and stress monitoring. This work also highlights the potential of wavelet-driven approaches for advancing self-supervised learning in biosignal modeling.
View on arXiv

Who Should Have Surgery? A Comparative Study of GenAI vs Supervised ML for CRS Surgical Outcome Prediction

Sayeed Shafayet Chowdhury, Snehasis Mukhopadhyay, Shiaofen Fang, Vijay R. Ramakrishnan
  • This is the first study to comprehensively evaluate generative AI (GenAI) systems for predictive clinical outcome modeling using structured pre-operative data in CRS.
  • Supervised ML models, particularly a custom multi-layer perceptron (MLP), achieved superior performance (85% accuracy) compared to GenAI models in predicting surgical outcomes.
  • GenAI models underperformed in discrimination and calibration but provided justifications that aligned with clinician heuristics and ML feature importance.
  • The study introduces a standardized evaluation protocol for comparing tabular ML and GenAI systems in clinical decision-making tasks.
  • The authors propose an ML-first, GenAI-augmented workflow for surgical candidate triage and shared decision-making.
Read More
Abstract
This paper investigates the use of generative AI (GenAI) models and supervised machine learning (ML) models for predicting surgical outcomes in chronic rhinosinusitis (CRS) patients. The study focuses on pre-operative prediction of meaningful clinical improvement, defined as a ≥8.9-point reduction in the SNOT-22 score at six months post-surgery. Using a prospectively collected dataset of CRS patients who underwent surgery, the authors benchmarked multiple GenAI systems (e.g., ChatGPT, Claude, Gemini, Perplexity) against traditional supervised ML models (logistic regression, tree ensembles, and a custom multi-layer perceptron). The results show that the supervised ML models, particularly the MLP, outperformed GenAI models in terms of accuracy, calibration, and decision-curve net benefit. However, GenAI models provided justifications that aligned with clinical heuristics and the feature importance rankings of the ML models, suggesting their potential as complementary tools for enhancing transparency and shared decision-making. The authors propose an ML-first, GenAI-augmented workflow for clinical decision support in CRS surgical planning. They also introduce a reproducible evaluation framework for comparing tabular ML and GenAI systems in clinical applications.
Methodology
The study used a prospectively collected dataset of CRS patients who underwent endoscopic sinus surgery (ESS). The authors trained and evaluated supervised ML models (logistic regression, tree ensembles, and a custom multi-layer perceptron) on structured pre-operative data to predict whether patients would achieve a clinically meaningful improvement in SNOT-22 scores. They also tested multiple generative AI models (ChatGPT, Claude, Gemini, Perplexity) using standardized zero-shot prompts to generate binary recommendations with confidence scores. The models were compared on metrics such as accuracy, calibration, and decision-curve net benefit. Subgroup analyses and feature importance evaluations were also conducted.
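To make the tabular-ML side of the protocol concrete, the sketch below trains a small scikit-learn MLP on synthetic pre-operative features and reports accuracy alongside a calibration metric (Brier score); the feature set, synthetic labels, and metric choices are assumptions and do not reflect the study's data or code.

    # Hedged sketch of a tabular MLP with accuracy and calibration reporting.
    # Features, labels, and data are synthetic placeholders, not the study's dataset.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import accuracy_score, brier_score_loss

    rng = np.random.default_rng(0)
    n = 400
    X = np.column_stack([
        rng.normal(50, 20, n),        # hypothetical baseline SNOT-22 score
        rng.integers(0, 25, n),       # hypothetical CT severity score
        rng.integers(0, 2, n),        # hypothetical polyp phenotype flag
        rng.integers(0, 2, n),        # hypothetical pain/psych comorbidity flag
    ])
    # Synthetic label loosely tied to baseline severity (for illustration only).
    y = (X[:, 0] + 5 * X[:, 2] + rng.normal(0, 15, n) > 55).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)

    proba = clf.predict_proba(X_te)[:, 1]
    print("accuracy:", accuracy_score(y_te, proba > 0.5))
    print("Brier score (calibration):", brier_score_loss(y_te, proba))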
Results
The custom MLP model achieved the highest accuracy (85%) and demonstrated superior calibration and decision-curve net benefit compared to other ML models and GenAI systems. GenAI models underperformed in terms of discrimination and calibration but provided explanations that aligned with clinical reasoning and the MLP's feature importance rankings. Key predictive features included baseline SNOT-22 scores, CT/endoscopy severity, polyp phenotype, and comorbidities such as psychological and pain conditions.
Implications
The findings suggest that supervised ML models are more reliable for primary triage in CRS surgical decision-making, while GenAI models can serve as complementary tools to enhance transparency and support shared decision-making. The proposed ML-first, GenAI-augmented workflow has the potential to improve clinical decision support systems by combining high accuracy with explainability. The standardized evaluation framework introduced in this study could also guide future comparisons of ML and GenAI systems in other clinical domains.
View on arXiv