AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
68 papers today · Updated every 8 hours · 7 days of history
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Generative Models
Optimization
- Introduces Proteína-Complexa, a unified framework for protein binder design.
- Constructs a large-scale dataset, Teddymer, for effective pretraining.
- Achieves state-of-the-art performance in binder design benchmarks.
- Utilizes advanced test-time optimization techniques for improved efficiency.
Summary
This paper presents Proteína-Complexa, a novel framework for atomistic protein binder design that integrates generative modeling and inference-time optimization. The authors argue against the traditional dichotomy between generative and hallucination methods in binder design, proposing a unified approach that leverages a flow-based generative model and a large-scale dataset of synthetic binder-target pairs, called Teddymer. This dataset is constructed from predicted protein structures and domain-domain interactions, allowing for effective pretraining. The framework employs advanced test-time optimization strategies, such as beam search and Monte Carlo Tree Search, to enhance the quality of generated binders. The results demonstrate that Proteína-Complexa achieves state-of-the-art performance in computational binder design benchmarks, outperforming existing generative and hallucination methods in terms of success rates and efficiency. The framework also shows versatility by optimizing hydrogen bonds and extending to small molecule targets and enzyme design tasks, indicating its broad applicability in computational biology and drug discovery.
Methodology
The methodology involves creating a flow-based generative model for binder design, utilizing a large dataset of synthetic binder-target pairs derived from predicted protein structures. The framework employs a staged training scheme and incorporates advanced test-time optimization strategies, including beam search and Monte Carlo Tree Search, to enhance the generation of high-quality binders.
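The test-time search the summary describes can be sketched generically. Below, `sample_refinements` and `score` are hypothetical stand-ins for the flow-based sampler and the in-silico quality metric; only the beam-search skeleton reflects the summarized technique (the paper also uses Monte Carlo Tree Search, not shown):

```python
import random

random.seed(0)

def sample_refinements(design, k):
    """Hypothetical stand-in for one step of a flow-based generative
    sampler: propose k stochastic refinements of a candidate design."""
    return [design + [random.random()] for _ in range(k)]

def score(design):
    """Hypothetical in-silico quality score (higher is better)."""
    return sum(design)

def beam_search(init, steps=3, beam=4, branch=3):
    """Keep the `beam` best partial designs after each refinement step."""
    frontier = [init]
    for _ in range(steps):
        candidates = [c for d in frontier for c in sample_refinements(d, branch)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

best = beam_search([])
print(len(best), score(best))
```

Spending more test-time compute here simply means widening `beam` or `branch`, which is the knob the paper's inference-time optimization turns.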
Results
Proteína-Complexa sets a new benchmark in computational binder design, achieving significantly higher in-silico success rates compared to existing generative and hallucination methods. The framework also demonstrates effective optimization of hydrogen bonds and adaptability to small molecule targets and enzyme design tasks, outperforming prior approaches in these areas.
Implications
The findings suggest that Proteína-Complexa can significantly advance the field of protein design, with potential applications in drug discovery and therapeutic development. Its ability to efficiently generate high-quality binders and optimize binding interactions could lead to more effective treatments and a deeper understanding of protein interactions.
A Perturbation Approach to Unconstrained Linear Bandits
Optimization
Theory
- The perturbation approach reduces uBLO to a standard OLO problem.
- Expected-regret guarantees are derived for comparator-adaptive OLO algorithms.
- Dynamic regret analysis achieves optimal √P_T dependencies without prior knowledge.
- First high-probability guarantees for static and dynamic regret in uBLO are established.
Summary
This paper revisits the perturbation-based approach to Unconstrained Bandit Linear Optimization (uBLO) originally proposed by Abernethy et al. (2008). The authors demonstrate that this approach can effectively reduce Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem in the unconstrained setting. The framework presented improves upon previous work by deriving expected-regret guarantees when combined with comparator-adaptive OLO algorithms, providing new insights into the influence of various adversarial models on comparator-adaptive rates. The authors extend their analysis to dynamic regret, achieving optimal √P_T path-length dependencies without prior knowledge of P_T. They also introduce the first high-probability guarantees for both static and dynamic regret in uBLO and discuss lower bounds on static regret, proving the Ω(√dT) rate for adversarial linear bandits on the unit Euclidean ball, which is of independent interest.
Methodology
The authors utilize a modular reduction technique that allows the use of any OLO learner under bandit feedback by providing suitably perturbed loss estimates. They analyze the performance of their approach in terms of expected and dynamic regret, focusing on the implications of different adversarial models and the structure of the comparator sequence.
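As a concrete (if simplified) illustration of feeding an OLO learner perturbed loss estimates under bandit feedback, here is the classical one-point estimator of Flaxman, Kalai, and McMahan; the paper's construction for uBLO differs in detail, so treat this purely as the flavor of the reduction:

```python
import math
import random

random.seed(1)
d, delta = 3, 0.1
theta = [0.5, -0.2, 0.3]          # unknown linear loss: f(x) = <theta, x>

def rand_unit(d):
    """Uniform random direction on the unit sphere in R^d."""
    g = [random.gauss(0, 1) for _ in range(d)]
    n = math.sqrt(sum(v * v for v in g))
    return [v / n for v in g]

def one_point_estimate(x):
    """Play x + delta*u, observe only the scalar loss, and return the
    classical (d/delta) * f * u loss-vector estimate an OLO learner
    can consume as if it had full information."""
    u = rand_unit(d)
    f = sum(t * (xi + delta * ui) for t, xi, ui in zip(theta, x, u))
    return [(d / delta) * f * ui for ui in u]

# For a linear loss the estimator is unbiased: averaging many estimates
# recovers theta exactly in expectation.
n = 50_000
avg = [0.0] * d
for _ in range(n):
    g = one_point_estimate([0.0] * d)
    avg = [a + gi / n for a, gi in zip(avg, g)]
print(avg)
```

Unbiasedness follows from E[u uᵀ] = I/d for a uniform direction u, which is why any comparator-adaptive OLO algorithm fed these estimates behaves (in expectation) as if it saw the true losses.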
Results
The paper presents new expected-regret guarantees for uBLO when using comparator-adaptive OLO algorithms. It establishes high-probability guarantees for both static and dynamic regret, achieving optimal dependencies on the path length without prior knowledge of the comparator sequence. Additionally, it proves lower bounds on static regret, confirming the Ω(√dT) rate for adversarial linear bandits.
Implications
The findings have significant implications for online learning and decision-making under uncertainty, particularly in scenarios where exploration must be conducted under budget constraints. The results can enhance the design of algorithms in various applications, including finance, resource allocation, and adaptive systems.
FedDES: Graph-Based Dynamic Ensemble Selection for Personalized Federated Learning
Federated Learning
Graph Learning
- FedDES provides a decentralized approach to personalized federated learning, allowing for model heterogeneity.
- The use of a Graph Neural Network enables dynamic ensemble selection tailored to individual test samples.
- FedDES effectively suppresses contributions from non-beneficial peer models, enhancing performance and reducing negative transfer.
- The framework supports asynchronous peer-to-peer communication, avoiding bottlenecks associated with centralized coordination.
Summary
The paper introduces FedDES, a novel decentralized framework for personalized federated learning (pFL) that addresses the challenges posed by statistical heterogeneity in client data distributions. Traditional pFL methods often treat peer contributions uniformly, which can lead to negative transfer and suboptimal performance. FedDES enhances personalization by implementing dynamic ensemble selection through a Graph Neural Network (GNN) that models interactions between data samples and candidate classifiers. This GNN dynamically selects and weights peer models for each test query, allowing for instance-level personalization. The framework operates in a decentralized manner, enabling heterogeneous model architectures and peer-to-peer communication. Experimental results on CIFAR-10 and real-world ICU healthcare data demonstrate that FedDES significantly outperforms existing pFL baselines in non-IID settings, effectively mitigating negative transfer and improving prediction accuracy.
Methodology
FedDES employs a Graph Neural Network to create a heterogeneous graph representing the relationships between data samples and classifiers. The GNN processes this graph to produce sample embeddings, which are used to dynamically select and weight classifiers for each test sample, ensuring that only the most competent models contribute to the final prediction.
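A minimal sketch of instance-level ensemble selection: the paper's GNN produces per-peer affinities; here a plain dot product between hypothetical embeddings stands in for it, with a threshold to suppress weak peers (the negative-transfer mechanism the summary describes):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dynamic_ensemble(test_emb, peer_embs, peer_preds, tau=0.1):
    """Weight each peer model by its embedding-space affinity with the
    test sample; peers below `tau` are dropped to curb negative
    transfer. In FedDES a GNN over the sample-classifier graph would
    produce these affinities; a dot product is a stand-in."""
    scores = [sum(a * b for a, b in zip(test_emb, e)) for e in peer_embs]
    w = softmax(scores)
    w = [wi if wi >= tau else 0.0 for wi in w]
    z = sum(w) or 1.0
    w = [wi / z for wi in w]
    # weighted average of per-class probability vectors
    n_cls = len(peer_preds[0])
    return [sum(wi * p[c] for wi, p in zip(w, peer_preds)) for c in range(n_cls)]

pred = dynamic_ensemble(
    test_emb=[1.0, 0.0],
    peer_embs=[[0.9, 0.1], [-1.0, 0.0], [0.8, 0.2]],   # peer 2 is misaligned
    peer_preds=[[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]],
)
print(pred)
```

The misaligned second peer is zeroed out entirely, so its contrary prediction never dilutes the ensemble.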
Results
The experiments conducted on CIFAR-10 and ICU healthcare data indicate that FedDES outperforms traditional pFL methods, particularly in non-IID scenarios. The dynamic ensemble selection mechanism significantly reduces the risk of negative transfer, leading to improved accuracy and robustness in model predictions.
Implications
FedDES has potential applications in various fields requiring personalized models, such as healthcare diagnostics, where patient data can vary significantly. The framework's ability to adaptively select models based on individual sample characteristics could enhance the effectiveness of federated learning in real-world scenarios with diverse data distributions.
Liquid Networks with Mixture Density Heads for Efficient Imitation Learning
Robotics
Efficient ML
Generative Models
- Liquid neural networks with mixture density heads outperform diffusion policies in imitation learning tasks.
- Liquid policies require fewer parameters while achieving significantly lower prediction errors and faster inference times.
- The proposed shared-backbone comparison protocol ensures a fair evaluation of policy head performance.
- Liquid models show increased robustness, particularly in scenarios with limited training data.
Summary
This paper presents a comparative study of liquid neural networks equipped with mixture density heads against diffusion policies in the context of imitation learning across three robotics tasks: Push-T, RoboMimic Can, and PointMaze. The authors introduce a fair shared-backbone comparison protocol that ensures identical inputs, training budgets, and evaluation settings to isolate the effects of policy heads. The findings reveal that liquid policies, which utilize approximately half the parameters of diffusion policies (around 4.3M compared to 8.6M), achieve a 2.4× reduction in offline prediction error and are 1.8× faster during inference. Additionally, liquid models demonstrate superior robustness in sample efficiency experiments, particularly in low and medium data regimes. Closed-loop results indicate that while offline density modeling contributes to deployment success, it does not fully dictate closed-loop performance. The study concludes that liquid recurrent multimodal policies offer a compact and efficient alternative to iterative denoising methods in imitation learning, leveraging continuous-time modeling to enhance sample efficiency and deployment simplicity.
Methodology
The authors employ liquid neural networks, specifically Closed-form Continuous-time (CfC) architectures, combined with mixture density network heads to model multimodal action distributions. They conduct experiments across three robotics tasks using a shared-backbone protocol to ensure fair comparisons. The performance of liquid networks is evaluated against diffusion policies in terms of parameter efficiency, prediction accuracy, inference speed, and sample efficiency.
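The mixture density head itself follows the standard MDN recipe and can be sketched independently of the liquid backbone; the raw output layout (K logits, K means, K log-stds) is an assumption for illustration, not taken from the paper:

```python
import math
import random

random.seed(2)

def mdn_sample(raw, k):
    """Split raw head outputs into K mixture logits, K means, and K
    log-stds, then sample an action: pick a component from the softmax
    weights, then draw from its Gaussian. The liquid (CfC) backbone
    that would produce `raw` is elided."""
    logits, means, log_stds = raw[:k], raw[k:2 * k], raw[2 * k:3 * k]
    m = max(logits)
    ws = [math.exp(l - m) for l in logits]
    s = sum(ws)
    ws = [w / s for w in ws]
    # sample a mixture component by inverse-CDF on the weights
    r, acc, idx = random.random(), 0.0, 0
    for i, w in enumerate(ws):
        acc += w
        if r <= acc:
            idx = i
            break
    return random.gauss(means[idx], math.exp(log_stds[idx]))

raw = [2.0, -1.0, 0.0,    # component logits
       0.5, -0.5, 1.5,    # component means
       -2.0, -2.0, -2.0]  # log-stds (std ~ 0.135)
actions = [mdn_sample(raw, 3) for _ in range(500)]
print(min(actions), max(actions))
```

The multimodality matters for imitation learning because demonstrations often contain several valid actions for the same state; a unimodal head would average them into an invalid one.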
Results
Liquid networks demonstrated approximately half the parameter count of diffusion policies while achieving 2.4× lower offline prediction error and 1.8× faster inference. In sample efficiency tests, liquid models consistently outperformed diffusion policies, especially in low-data and medium-data scenarios. Closed-loop performance results were directionally consistent with offline evaluations, indicating the practical advantages of liquid policies in real-world applications.
Implications
The findings suggest that liquid neural networks could serve as a more efficient and practical approach for imitation learning in robotics, particularly in environments with limited data. This could lead to advancements in robotic control systems, enabling faster and more accurate learning and deployment of robotic tasks.
Physics-Embedded Feature Learning for AI in Medical Imaging
Interpretability
- Introduction of PhysNet, a physics-embedded deep learning framework for medical imaging.
- Integration of tumor growth dynamics into the feature learning process of CNNs.
- Dual branch architecture enables simultaneous tumor classification and learning of tumor behavior.
- PhysNet outperforms state-of-the-art models in classification tasks on brain MRI datasets.
Summary
This paper introduces PhysNet, a novel physics-embedded deep learning framework designed to enhance the interpretability and robustness of AI in medical imaging, specifically for brain tumor classification. Traditional deep learning models often function as black boxes, lacking consideration for the physical processes governing tumor growth, which can lead to issues with interpretability and clinical trust. PhysNet addresses these limitations by integrating a reaction-diffusion model of tumor growth directly into the feature learning process of a convolutional neural network (CNN). This integration allows for the simultaneous classification of multi-class tumors while learning a latent tumor density field and biologically relevant parameters such as tumor diffusion and growth rates. The architecture employs a dual branch design that facilitates the learning of tumor behavior alongside accurate predictions. Experimental results on a large brain MRI dataset demonstrate that PhysNet significantly outperforms several state-of-the-art deep learning baselines, achieving superior classification accuracy and F1-scores. Furthermore, the model provides interpretable latent representations and learned parameters that align with established medical knowledge, showcasing the potential of physics-embedded representation learning in developing trustworthy medical AI systems.
Methodology
The PhysNet framework incorporates a reaction-diffusion model of tumor growth within the intermediate feature representations of a ResNet backbone. This approach allows for end-to-end training that jointly optimizes for multi-class tumor classification and the learning of latent tumor density fields and physical parameters. The architecture is designed to embed physical principles directly into the feature learning process rather than applying them as post hoc constraints.
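The reaction-diffusion dynamics referred to are classically the Fisher-KPP equation for tumor cell density; a minimal explicit finite-difference step (with illustrative values for the diffusion rate D and growth rate rho, not the paper's learned parameters) looks like:

```python
def fisher_kpp_step(u, D=0.1, rho=0.5, dx=1.0, dt=0.1):
    """One explicit Euler step of du/dt = D * d2u/dx2 + rho * u * (1 - u),
    the classical reaction-diffusion model of tumor cell density that
    physics-embedded frameworks like PhysNet build into the network."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]        # zero-flux boundaries
        right = u[i + 1] if i < n - 1 else u[i]
        lap = (left - 2 * u[i] + right) / dx ** 2
        out.append(u[i] + dt * (D * lap + rho * u[i] * (1 - u[i])))
    return out

# a small density bump both spreads (diffusion) and grows (logistic term)
u = [0.0] * 10
u[5] = 0.2
for _ in range(50):
    u = fisher_kpp_step(u)
print(u)
```

In PhysNet the latent density field and the parameters playing the role of D and rho are learned jointly with the classifier, rather than simulated forward as here.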
Results
PhysNet demonstrated superior performance compared to multiple state-of-the-art deep learning models, including MobileNetV2, VGG16, and VGG19, achieving higher classification accuracy and F1-scores on a large brain MRI dataset. The model also produced interpretable latent representations and learned parameters consistent with established medical knowledge.
Implications
The development of PhysNet suggests a pathway toward more interpretable and clinically trustworthy AI systems in medical imaging. By embedding physical principles into deep learning frameworks, it enhances the robustness and transparency of predictions, which is crucial for high-stakes medical applications.
Automating Early Disease Prediction Via Structured and Unstructured Clinical Data
NLP
- Introduces an automated methodology for early disease prediction using structured and unstructured data.
- Utilizes natural language processing to extract relevant information from discharge reports.
- Demonstrates improved predictive accuracy for atrial fibrillation progression compared to traditional methods.
- Addresses challenges of missing or incomplete data in electronic health records.
Summary
This paper introduces a fully automated methodology for early disease prediction in clinical settings, utilizing both structured and unstructured clinical data, specifically discharge reports. The proposed pipeline enhances the prediction process through three main steps: cohort selection, dataset generation, and outcome labeling. By employing natural language processing (NLP) techniques to analyze discharge reports, the authors can efficiently identify relevant patient cohorts, enrich structured datasets with additional clinical variables, and generate high-quality outcome labels without manual intervention. This approach effectively addresses the common issue of missing or incomplete data in electronic health records (EHR), capturing clinically relevant information that is often overlooked. The methodology is evaluated in predicting the progression of atrial fibrillation (AF), demonstrating that predictive models trained on datasets enriched with discharge report information achieve higher accuracy and correlation with actual outcomes compared to those relying solely on structured EHR data. Furthermore, the models outperform traditional clinical scoring systems. The findings suggest that automating the integration of unstructured clinical text can streamline early prediction studies, enhance data quality, and improve the reliability of predictive models for clinical decision-making.
Methodology
The methodology involves an end-to-end pipeline that automates cohort selection, dataset generation, and outcome labeling by processing unstructured discharge reports with NLP techniques. This allows for the enrichment of structured EHR data and the generation of high-quality labels without manual intervention.
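A toy version of the text-enrichment step might look like the following; the field names and regex patterns are invented for illustration, and the actual pipeline would use far richer NLP than two regular expressions:

```python
import re

def enrich_record(structured, discharge_text):
    """Fill missing structured fields from free text. Field names and
    patterns are hypothetical stand-ins for the paper's NLP extraction."""
    enriched = dict(structured)
    if enriched.get("ef_percent") is None:
        m = re.search(r"ejection fraction[^0-9]*(\d{1,3})\s*%",
                      discharge_text, re.I)
        if m:
            enriched["ef_percent"] = int(m.group(1))
    if enriched.get("af_mentioned") is None:
        enriched["af_mentioned"] = bool(
            re.search(r"\batrial fibrillation\b|\bAF\b", discharge_text)
        )
    return enriched

rec = enrich_record(
    {"age": 67, "ef_percent": None, "af_mentioned": None},
    "Patient with paroxysmal atrial fibrillation. Echo: ejection fraction 45%.",
)
print(rec)
```

The point the paper makes is exactly this: clinically relevant values often live only in the discharge narrative, so extracting them fills gaps the structured EHR leaves open.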
Results
The study shows that predictive models trained on datasets enriched with information from discharge reports achieve higher accuracy and correlation with true outcomes than models trained only on structured EHR data. These models also surpass traditional clinical scores for predicting atrial fibrillation progression.
Implications
The findings indicate that automating the integration of unstructured clinical text can significantly improve early disease prediction efforts, enhance data quality, and support better clinical decision-making, potentially leading to improved patient outcomes and reduced healthcare costs.
Symbolic Density Estimation: A Decompositional Approach
Theory
Interpretability
- Introduction of AI-Kolmogorov for Symbolic Density Estimation (SymDE).
- Multi-stage pipeline includes clustering, nonparametric density estimation, and symbolic regression.
- Demonstrated efficacy on synthetic and high-energy physics-related datasets.
- Addresses challenges of validity constraints, dimensionality, and complex expression discovery.
Summary
The paper introduces AI-Kolmogorov, a novel framework for Symbolic Density Estimation (SymDE), which leverages symbolic regression to produce interpretable models for density estimation tasks. Traditional density estimation methods either rely on parametric assumptions, which can limit expressiveness, or nonparametric approaches, which often lack interpretability. AI-Kolmogorov addresses these challenges through a multi-stage pipeline that includes problem decomposition via clustering and probabilistic graphical models, nonparametric density estimation, support estimation, and symbolic regression on the density estimate. The authors demonstrate the effectiveness of their approach on various datasets, including synthetic mixture models and distributions relevant to high-energy physics, showing that AI-Kolmogorov can uncover underlying distributions and provide insights into their mathematical representations.
Methodology
The methodology involves a multi-stage pipeline: (1) problem decomposition using clustering or probabilistic graphical models, (2) nonparametric density estimation, (3) support estimation, and (4) applying symbolic regression to the density estimate. The framework utilizes evolutionary algorithms, particularly PySR, to explore mathematical expressions that describe the probability distributions.
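A miniature of stages (2) and (4): a Gaussian KDE gives the nonparametric estimate, and exhaustive scoring of a tiny hand-written expression library stands in for PySR's evolutionary search over symbolic forms:

```python
import math
import random

random.seed(3)

def gaussian_kde(samples, h=0.3):
    """Plain Gaussian kernel density estimate with bandwidth h."""
    n = len(samples)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / (
            n * h * math.sqrt(2 * math.pi))
    return density

# Stage 2: nonparametric estimate from standard-normal samples
samples = [random.gauss(0, 1) for _ in range(2000)]
kde = gaussian_kde(samples)

# Stage 4 (miniature): score a tiny library of candidate symbolic forms
# against the KDE on a grid -- a stand-in for PySR's evolutionary search.
candidates = {
    "normal": lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi),
    "laplace": lambda x: 0.5 * math.exp(-abs(x)),
    "uniform": lambda x: 0.25 if -2 <= x <= 2 else 0.0,
}
grid = [i / 10 for i in range(-30, 31)]
mse = {name: sum((f(x) - kde(x)) ** 2 for x in grid) / len(grid)
       for name, f in candidates.items()}
best = min(mse, key=mse.get)
print(best, mse)
```

The real framework searches an open-ended expression space and must also enforce validity (non-negativity, normalization), which is where the support-estimation stage comes in.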
Results
The experiments conducted show that AI-Kolmogorov effectively discovers underlying distributions and provides interpretable mathematical expressions for various datasets, including multivariate normal distributions and complex distributions from high-energy physics. The results indicate that the framework can yield valid probability densities while maintaining interpretability.
Implications
The implications of this work extend to fields requiring interpretable models for density estimation, such as high-energy physics, finance, and any domain where understanding the underlying distribution is crucial. The framework could enhance model transparency and facilitate scientific discovery by providing clear mathematical representations of complex data.
KMM-CP: Practical Conformal Prediction under Covariate Shift via Selective Kernel Mean Matching
Theory
Efficient ML
- KMM-CP framework utilizes Kernel Mean Matching for conformal prediction under covariate shift.
- Introduces a selective extension to improve stability in low-overlap regions.
- Establishes a connection between moment-matching quality and effective sample size for coverage guarantees.
- Demonstrates significant performance improvements in molecular property prediction tasks.
Summary
The paper introduces KMM-CP, a novel framework for conformal prediction that addresses the challenges posed by covariate shift in machine learning. Conformal Prediction (CP) is a method that provides finite-sample coverage guarantees, but its validity is often compromised in real-world scenarios due to distribution shifts. The authors propose using Kernel Mean Matching (KMM) to align calibration and test distributions without the need for explicit density estimation, thereby enhancing stability. They also introduce a selective extension that focuses on regions of reliable support overlap, which further improves the framework's performance in low-overlap scenarios. The paper provides a thorough analysis connecting moment-matching quality to effective sample size and conformal coverage, demonstrating that KMM-CP significantly reduces the coverage gap in molecular property prediction tasks with substantial covariate shift, outperforming existing methods by over 50%.
Methodology
The KMM-CP framework employs Kernel Mean Matching to align the weighted calibration distribution with the test covariate distribution in a reproducing kernel Hilbert space (RKHS). This is achieved by minimizing the Maximum Mean Discrepancy (MMD) under explicit weight constraints. The selective extension optimizes calibration weights and target selection variables to restrict correction to regions of shared support, enhancing the bias-variance tradeoff.
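The final calibration step any such weighting scheme feeds into is the weighted split-conformal quantile (Tibshirani et al., 2019). A sketch of that step, independent of how KMM-CP actually computes the weights:

```python
def weighted_conformal_quantile(scores, weights, test_weight, alpha=0.1):
    """Level-(1 - alpha) quantile of the weighted empirical distribution
    of calibration nonconformity scores, with the test point's weight
    placed at +infinity -- the standard weighted split-conformal
    construction that covariate-shift weights (KMM-CP's included)
    plug into. Weights need not be normalized."""
    pairs = sorted(zip(scores, weights))
    target = (1 - alpha) * (sum(weights) + test_weight)
    acc = 0.0
    for s, w in pairs:
        acc += w
        if acc >= target:
            return s
    return float("inf")  # coverage then forces the trivial (infinite) set

# uniform weights recover the usual ceil((1 - alpha) * (n + 1))-th score
q_uniform = weighted_conformal_quantile(list(range(1, 101)), [1.0] * 100, 1.0)
# upweighting large scores (a shift toward harder inputs) enlarges the set
q_shifted = weighted_conformal_quantile(
    list(range(1, 101)), [1.0] * 50 + [3.0] * 50, 1.0)
print(q_uniform, q_shifted)
```

KMM-CP's contribution sits upstream of this function: choosing the weights by MMD minimization (and restricting to regions of shared support) so that the effective sample size, and hence this quantile, stays stable.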
Results
KMM-CP was evaluated on molecular property prediction benchmarks, showing a reduction in coverage gap by over 50% compared to existing approaches. The method demonstrated robustness and efficiency, particularly in high-dimensional settings with significant covariate shifts.
Implications
The KMM-CP framework has potential applications in high-stakes domains such as healthcare and scientific discovery, where reliable uncertainty quantification is crucial. Its ability to handle covariate shifts effectively makes it a valuable tool for deploying machine learning models in real-world scenarios.
GIFT: Bootstrapping Image-to-CAD Program Synthesis via Geometric Feedback
Generative Models
Computer Vision
Optimization
- GIFT leverages geometric feedback to enhance training data diversity for CAD program synthesis.
- The framework reduces inference compute by 80% while improving performance metrics.
- GIFT outperforms traditional supervised fine-tuning methods and remains competitive with complex models.
- The approach addresses the critical bottleneck of limited training data in generative CAD design.
Summary
The paper introduces Geometric Inference Feedback Tuning (GIFT), a novel framework aimed at enhancing the generation of executable CAD programs from images. Current methods struggle with the alignment of visual geometry and symbolic program representations, particularly as design complexity increases. The authors argue that the main limitation lies in the scarcity of diverse training examples rather than the model's capacity. GIFT addresses this by employing geometric feedback to create high-quality training samples from test-time computations. It incorporates two key mechanisms: Soft-Rejection Sampling (GIFT-REJECT), which retains diverse high-fidelity programs, and Failure-Driven Augmentation (GIFT-FAIL), which generates synthetic training examples from near-miss predictions. This approach significantly reduces inference compute by 80% while improving mean Intersection over Union (IoU) by 12% over a strong supervised baseline. GIFT demonstrates competitive performance against more complex multimodal systems without requiring additional human annotations or specialized architectures.
Methodology
GIFT employs a data augmentation framework that utilizes geometric feedback to generate high-quality training samples. It combines Soft-Rejection Sampling to retain diverse programs and Failure-Driven Augmentation to create synthetic examples from near-miss predictions, effectively turning test-time computations into valuable training data.
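The two mechanisms can be sketched as a single sampling loop; `generate` and its IoU scores are simulated here, since the real image-to-CAD model and geometric checker are not reproducible from the summary:

```python
import random

random.seed(4)

def generate(image_id, temperature=1.0):
    """Hypothetical stand-in for the image-to-CAD model: returns a
    candidate program and its IoU against the target geometry."""
    iou = min(1.0, max(0.0, random.gauss(0.6, 0.2 * temperature)))
    return {"prog": f"prog-{image_id}-{random.randint(0, 999)}", "iou": iou}

def bootstrap(image_ids, n_samples=8, keep_thresh=0.8, near_miss=(0.5, 0.8)):
    """GIFT-style data bootstrapping: soft-rejection keeps diverse
    high-IoU programs as new training pairs, while near-misses are
    routed to an augmentation pass (stubbed here) instead of being
    discarded."""
    accepted, to_repair = [], []
    for img in image_ids:
        for _ in range(n_samples):
            cand = generate(img)
            if cand["iou"] >= keep_thresh:
                accepted.append(cand)      # GIFT-REJECT: new training pair
            elif near_miss[0] <= cand["iou"] < near_miss[1]:
                to_repair.append(cand)     # GIFT-FAIL: augmentation source
    return accepted, to_repair

acc, rep = bootstrap(range(10))
print(len(acc), len(rep))
```

The thresholds here are invented; the design point is that geometric feedback turns test-time samples, including failures, into training signal, which is why inference compute spent once need not be spent again.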
Results
GIFT achieves a 12% improvement in mean IoU compared to a strong supervised baseline and solves 53% more problems than the baseline. It maintains higher resilience against increased task complexity compared to other models, demonstrating superior accuracy and robustness.
Implications
The GIFT framework has the potential to significantly advance generative CAD design by providing a more efficient and effective means of training models, thereby enabling more complex engineering designs and reducing reliance on expensive and limited datasets.
FeDMRA: Federated Incremental Learning with Dynamic Memory Replay Allocation
Federated Learning
- FeDMRA addresses the limitations of traditional fixed memory allocation in federated learning.
- The framework incorporates dynamic memory allocation based on client data distribution and contribution.
- It effectively mitigates catastrophic forgetting through optimized exemplar storage.
- Extensive experiments show significant performance improvements on medical image datasets.
Summary
The paper introduces FeDMRA, a novel framework for Federated Class-Incremental Learning (FCIL) that addresses the challenges posed by non-IID data distributions in federated healthcare systems. Traditional methods often rely on fixed memory allocations for exemplar storage, which can lead to performance disparities among clients due to varying data characteristics. FeDMRA proposes a dynamic memory allocation strategy that adjusts the storage resources based on each client's data distribution and contribution to the global model. This approach not only enhances model performance but also mitigates catastrophic forgetting by ensuring a fair representation of data across clients. The authors conducted extensive experiments on three medical image datasets, demonstrating significant improvements over existing baseline models, thereby validating the effectiveness of their proposed method in real-world scenarios.
Methodology
The authors developed a dynamic memory allocation strategy that allows the server to allocate memory portions to clients based on their private data distributions and contributions to the global model. This method integrates regularization and knowledge distillation techniques to enhance training and reduce forgetting.
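One plausible reading of the allocation rule, sketched with an invented scoring blend (the summary does not specify FeDMRA's exact formula, so `beta` and the blend are assumptions):

```python
def allocate_memory(total_slots, data_sizes, contributions, beta=0.5):
    """Split a global exemplar budget across clients in proportion to a
    blend of data size and contribution score (beta weighs the two);
    largest-remainder rounding keeps the total exact. The blend is
    illustrative, not FeDMRA's published rule."""
    n = len(data_sizes)
    size_share = [s / sum(data_sizes) for s in data_sizes]
    contrib_share = [c / sum(contributions) for c in contributions]
    score = [beta * a + (1 - beta) * b
             for a, b in zip(size_share, contrib_share)]
    raw = [total_slots * s for s in score]
    alloc = [int(r) for r in raw]
    # hand leftover slots to the largest fractional remainders
    leftovers = sorted(range(n), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in leftovers[: total_slots - sum(alloc)]:
        alloc[i] += 1
    return alloc

alloc = allocate_memory(100, data_sizes=[500, 300, 200],
                        contributions=[0.2, 0.5, 0.3])
print(alloc, sum(alloc))
```

Note how the middle client, smallest-but-one in data, still receives the largest exemplar budget because of its high contribution score; this is the dynamic behavior a fixed per-client quota cannot express.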
Results
The experiments conducted on three medical image datasets revealed that FeDMRA outperformed existing baseline models, showcasing its effectiveness in improving model performance and addressing the challenges of data heterogeneity in federated learning settings.
Implications
FeDMRA has significant implications for federated learning applications in healthcare, particularly in scenarios involving continuous learning from diverse and evolving datasets. It can enhance the accuracy and reliability of models used for medical diagnosis and other critical applications.
Personalizing Mathematical Game-based Learning for Children: A Preliminary Study
Theory
- The study proposes a framework for personalizing game-based learning using AI techniques.
- A dataset of 206 player-generated game levels was analyzed to develop a classifier.
- The Random Forest model was identified as the most effective classifier for predicting valid game levels.
- The research emphasizes the importance of adaptive learning in enhancing student engagement and learning outcomes.
Summary
This study explores the integration of artificial intelligence (AI) techniques into game-based learning (GBL) systems to enhance the personalization of mathematical learning experiences for children. The authors identify a significant challenge in GBL: the need for a large number of high-quality game levels that are tailored to individual learning abilities. To address this, they propose a framework guided by adaptive learning theory, which utilizes a machine learning classifier to evaluate player-generated game levels. The dataset comprises 206 distinct levels created by both experts and advanced players using a tool called Creative Mode within a math learning app. The study evaluates four machine learning models—k-nearest neighbors, decision trees, support vector machines, and random forests—to classify and predict valid game levels. The results indicate that the Random Forest model outperforms the others, suggesting its effectiveness in filtering and personalizing game content. This research highlights the potential of AI in developing GBL systems that can adapt to the unique needs of learners, ultimately aiming to improve engagement and learning outcomes in mathematics education.
Methodology
The authors collected a dataset of 206 game levels created by users in a math learning app's Creative Mode. They developed a machine learning classifier to extract features from these levels and predict their validity using four different classification models: k-nearest neighbors, decision trees, support vector machines, and random forests.
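The random-forest recipe itself (bootstrap sampling, random feature subsets, majority voting) can be shown in miniature with decision stumps and a toy stand-in for the level features; the study presumably used full decision trees on its real level-feature set:

```python
import random

random.seed(5)

def best_stump(X, y, feat_ids):
    """Pick the (feature, threshold, orientation) split with the fewest
    training errors."""
    best = None
    for f in feat_ids:
        for t in sorted({x[f] for x in X}):
            for flip in (0, 1):
                pred = [(x[f] > t) ^ flip for x in X]
                err = sum(p != yi for p, yi in zip(pred, y))
                if best is None or err < best[0]:
                    best = (err, f, t, flip)
    _, f, t, flip = best
    return lambda x: (x[f] > t) ^ flip

def random_forest(X, y, n_trees=15):
    """Bootstrap-sample the data, restrict each 'tree' (a stump here)
    to a random feature subset, and take a majority vote -- the
    random-forest recipe in miniature."""
    n, d = len(X), len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        feats = random.sample(range(d), max(1, d // 2))
        stumps.append(best_stump([X[i] for i in idx],
                                 [y[i] for i in idx], feats))
    return lambda x: sum(s(x) for s in stumps) * 2 > n_trees

# toy rule: a level is "valid" iff it has enough coins AND a reachable exit
X = [[c, e] for c in range(6) for e in (0, 1)]
y = [int(c >= 2 and e == 1) for c, e in X]
predict = random_forest(X, y)
acc = sum(predict(x) == yi for x, yi in zip(X, y)) / len(X)
print(acc)
```

Stumps cannot individually represent the AND rule, so the ensemble's vote does the work; with real level features and full trees (as in the study) the same mechanism filters invalid player-generated levels.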
Results
The study found that the Random Forest model was the optimal classifier for predicting valid player-generated game levels, outperforming the other models tested. This indicates a promising approach to filtering and personalizing game content in educational settings.
Implications
The findings suggest that AI can significantly enhance the personalization of game-based learning experiences, potentially leading to improved engagement and learning outcomes in mathematics education for children. This approach could be applied to other educational contexts where personalized learning is beneficial.
A Comparative Investigation of Thermodynamic Structure-Informed Neural Networks
Theory
- Comparison of different thermodynamic formulations in PINNs.
- Newtonian-residual-based PINNs struggle with physical consistency.
- Structure-preserving formulations improve parameter identification and robustness.
- Numerical experiments demonstrate the effectiveness of various thermodynamic models.
Summary
This paper presents a systematic comparison of physics-informed neural networks (PINNs) informed by thermodynamic structure, incorporating various formulations: Newtonian, Lagrangian, and Hamiltonian mechanics for conservative systems, as well as the Onsager variational principle and extended irreversible thermodynamics for dissipative systems. The authors conduct comprehensive numerical experiments on ordinary and partial differential equations to evaluate the impact of these formulations on accuracy, physical consistency, noise robustness, and interpretability. The findings reveal that while Newtonian-residual-based PINNs can reconstruct system states, they struggle to recover key physical and thermodynamic quantities reliably. In contrast, structure-preserving formulations significantly enhance parameter identification, thermodynamic consistency, and robustness. This work provides practical guidance for designing thermodynamics-consistent models and lays the groundwork for integrating more general nonequilibrium thermodynamic structures into physics-informed machine learning.
Methodology
The authors constructed multiple PINNs models incorporating various thermodynamic formalisms to evaluate their performance in solving both forward and inverse problems. They conducted numerical experiments using representative conservative and dissipative systems, analyzing the models' accuracy, consistency, and robustness.
Results
The results indicate that structure-preserving formulations significantly outperform Newtonian-residual-based PINNs in terms of parameter identification and thermodynamic consistency. The experiments showed that the latter could reconstruct system states but failed to recover essential physical quantities reliably.
Implications
The findings suggest that incorporating thermodynamic structures into PINNs can enhance their applicability in scientific machine learning, particularly in modeling complex dynamical systems. This work opens avenues for future research into integrating nonequilibrium thermodynamic structures into machine learning frameworks.
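The gap between residual-only and structure-preserving formulations has a classical analogue in numerical integration, which the sketch below illustrates on a harmonic oscillator (a hand-rolled toy, not the paper's networks): a plain explicit update lets the energy drift, while a symplectic update that respects the Hamiltonian structure keeps it bounded.

```python
def energy(q, p):
    # Hamiltonian of the unit harmonic oscillator.
    return 0.5 * (p * p + q * q)

def explicit_euler(q, p, dt, steps):
    # Residual-style update with no structure preserved: energy drifts.
    for _ in range(steps):
        q, p = q + dt * p, p - dt * q
    return q, p

def symplectic_euler(q, p, dt, steps):
    # Structure-preserving update: energy stays near its initial value.
    for _ in range(steps):
        p = p - dt * q
        q = q + dt * p
    return q, p

q0, p0, dt, steps = 1.0, 0.0, 0.01, 5000
qe, pe = explicit_euler(q0, p0, dt, steps)
qs, ps = symplectic_euler(q0, p0, dt, steps)
drift_plain = abs(energy(qe, pe) - energy(q0, p0))
drift_struct = abs(energy(qs, ps) - energy(q0, p0))
```

The same fitting budget spent on a structure-aware update buys far more physical consistency, which is the paper's finding in miniature.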
PEANUT: Perturbations by Eigenvalue Alignment for Attacking GNNs Under Topology-Driven Message Passing
Graph Learning
- PEANUT is a gradient-free, black-box attack that degrades GNNs by injecting virtual nodes into the input graph.
- The attack is applicable during the inference phase, making it practical for real-world scenarios.
- No features are required for the injected nodes, showcasing the significance of connectivity in GNNs.
- The method demonstrates effectiveness across various graph tasks, including graph-level regression.
Summary
This paper addresses the vulnerabilities of Graph Neural Networks (GNNs) to small perturbations in graph structure, which can significantly impact their performance. The authors introduce PEANUT, a novel black-box attack method that injects virtual nodes into the graph to exploit these vulnerabilities. Unlike traditional graph modification attacks, PEANUT operates at the inference phase without requiring access to the original graph's structure or features of the injected nodes. This makes it a practical and realistic threat model, particularly relevant for deployed GNN systems. The method is simple, gradient-free, and does not involve complex optimizations or surrogate model training, allowing for immediate application. The authors conduct extensive experiments across various graph tasks, demonstrating that even minimal perturbations can lead to substantial degradation in GNN performance, particularly in graph-level regression tasks. The findings highlight the critical importance of connectivity design in GNNs and the need for improved robustness against such attacks.
Methodology
The authors propose a black-box attack method called PEANUT, which injects virtual nodes into the graph to maximize differences between clean and perturbed graph representations. The attack operates without requiring gradient information or surrogate models, focusing on the evasion setting where the GNN is already trained.
Results
Extensive experiments reveal that PEANUT can significantly degrade GNN performance across multiple tasks, including graph-level regression, even with minimal perturbations. The results indicate that the attack is effective despite its simplicity, underscoring the vulnerabilities of GNNs to structural changes.
Implications
The findings suggest that GNNs may be susceptible to adversarial attacks in practical applications, emphasizing the need for enhanced robustness measures. The study also opens avenues for further research into the security of GNN architectures and the development of defenses against such attacks.
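The core mechanism, a featureless virtual node altering message passing, can be illustrated in a few lines of plain Python (a toy mean-aggregation layer, not the authors' code): wiring a zero-feature node to every original node visibly shifts all node representations.

```python
# One round of mean-neighbour aggregation on a 3-node path graph,
# before and after injecting a featureless virtual node.
def mean_aggregate(adj, feats):
    out = []
    for i in sorted(adj):
        group = [feats[i]] + [feats[j] for j in adj[i]]
        out.append(sum(group) / len(group))
    return out

adj = {0: [1], 1: [0, 2], 2: [1]}
feats = [1.0, 2.0, 3.0]
clean = mean_aggregate(adj, feats)

# Inject node 3 with a zero feature vector, connected to all original nodes.
adj_atk = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
feats_atk = feats + [0.0]
perturbed = mean_aggregate(adj_atk, feats_atk)[:3]

shift = max(abs(a - b) for a, b in zip(clean, perturbed))
```

Connectivity alone, with no crafted features, is enough to move every representation, which is the vulnerability the paper exploits.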
Match or Replay: Self Imitating Proximal Policy Optimization
Reinforcement Learning
Robotics
Optimization
- Introduction of Self-Imitating Proximal Policy Optimization (SIPP) for improved exploration and sample efficiency.
- Development of the MATCH strategy utilizing optimal transport to enhance learning in dense reward environments.
- Implementation of the REPLAY strategy to reinforce learning from successful trajectories in sparse reward scenarios.
- Empirical validation across various environments demonstrating significant improvements in learning efficiency.
Summary
This paper addresses the challenge of inefficient exploration in Reinforcement Learning (RL) agents, particularly in environments with sparse rewards. Traditional exploration strategies often lead to slow learning and suboptimal performance due to the inability of agents to build on successful past experiences. The authors propose a novel self-imitating on-policy algorithm called Self-Imitating Proximal Policy Optimization (SIPP), which enhances exploration and sample efficiency by leveraging past high-reward state-action pairs for policy updates. The method incorporates two strategies: MATCH, which uses optimal transport to prioritize state-action distributions that align with rewarding trajectories in dense reward environments, and REPLAY, which replays successful trajectories in sparse reward settings to facilitate structured exploration. Experimental results demonstrate that SIPP significantly improves learning efficiency across diverse environments, including MuJoCo for dense rewards and the 3D Animal-AI Olympics for sparse rewards, achieving faster convergence and higher success rates compared to existing self-imitating RL methods. The findings highlight the potential of self-imitation as a robust strategy for enhancing exploration in RL, applicable to more complex tasks.
Methodology
The authors propose SIPP, an on-policy RL algorithm that integrates self-imitation into the Proximal Policy Optimization (PPO) framework. The MATCH strategy employs optimal transport to prioritize rewarding state-action pairs, while the REPLAY strategy maintains a buffer of successful trajectories for replay in sparse environments. This dual approach enhances exploration and sample efficiency without relying on off-policy corrections.
Results
Experimental evaluations show that SIPP achieves faster convergence and higher success rates compared to state-of-the-art self-imitating RL baselines across various environments, including dense and sparse reward settings. The results indicate substantial improvements in learning efficiency, particularly in complex tasks like MuJoCo and the Animal-AI Olympics.
Implications
The proposed self-imitation strategies could significantly enhance the performance of RL agents in both dense and sparse reward environments, making them applicable to a wide range of complex tasks in robotics, game playing, and other domains requiring efficient exploration.
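The REPLAY side of the method can be sketched as a best-trajectory buffer; the interface below is hypothetical and only illustrates the keep-the-top-k idea, not the authors' implementation.

```python
import heapq

class SuccessBuffer:
    """Keep the k highest-return trajectories for self-imitation replay."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []        # min-heap of (return, counter, trajectory)
        self._count = 0

    def add(self, trajectory, ret):
        self._count += 1       # counter breaks ties without comparing lists
        item = (ret, self._count, trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif ret > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)   # evict the worst kept one

    def best(self):
        return max(self._heap)[2] if self._heap else None

buf = SuccessBuffer(capacity=2)
for i, ret in enumerate([0.1, 0.9, 0.3, 0.7]):
    buf.add([("s%d" % i, "a%d" % i)], ret)
returns_kept = sorted(r for r, _, _ in buf._heap)
```

In sparse-reward settings, replaying the kept trajectories gives the policy gradient something to imitate long before fresh exploration rediscovers a success.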
Benchmarking Tabular Foundation Models for Conditional Density Estimation in Regression
Theory
- Tabular foundation models like TabPFN and TabICL are effective for conditional density estimation.
- These models outperform traditional CDE methods in terms of loss, log-likelihood, and CRPS across various datasets.
- Calibration performance is competitive but may require post-hoc adjustments for larger datasets.
- A case study in photometric redshift estimation highlights the practical advantages of using foundation models.
Summary
This paper investigates the effectiveness of recent tabular foundation models, specifically TabPFN and TabICL, for conditional density estimation (CDE) in regression tasks. CDE aims to estimate the full conditional distribution of a response variable given tabular covariates, which is crucial in scenarios with heteroscedasticity, multimodality, or asymmetric uncertainty. The authors benchmark these models against a variety of parametric, tree-based, and neural CDE methods across 39 real-world datasets, varying training sizes from 50 to 20,000. The evaluation employs six metrics that assess density accuracy, calibration, and computation time. The findings reveal that foundation models consistently achieve superior CDE loss, log-likelihood, and continuous ranked probability score (CRPS) across most datasets. While calibration performance is competitive at smaller sample sizes, it sometimes lags behind specialized neural baselines at larger sizes, indicating that post-hoc recalibration might enhance performance. A case study on photometric redshift estimation demonstrates that TabPFN, trained on 50,000 galaxies, outperforms all baselines trained on a larger dataset of 500,000 galaxies. Overall, the results position tabular foundation models as robust off-the-shelf options for CDE tasks.
Methodology
The authors conducted an empirical benchmark comparing TabPFN and TabICL against a range of classical and modern CDE methods across 39 real-world datasets. They varied training sizes and utilized six evaluation metrics to assess performance, focusing on density accuracy, calibration, and computation time.
Results
The results indicate that tabular foundation models achieve the best performance in terms of CDE loss, log-likelihood, and CRPS in the majority of cases. Calibration is competitive at smaller sample sizes but shows variability at larger sizes, where specialized methods sometimes perform better. The case study on photometric redshift estimation demonstrates the superiority of TabPFN over traditional methods.
Implications
The findings suggest that tabular foundation models can serve as strong, general-purpose tools for conditional density estimation, potentially simplifying the modeling process in various applications, including risk analysis and treatment-response modeling.
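One of the benchmark's metrics, the continuous ranked probability score, is simple to compute for a sample-based predictive distribution; the sketch below uses the standard empirical estimator.

```python
# Empirical CRPS for samples x_1..x_m and observation y:
#   CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|
def crps_ensemble(samples, y):
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2

# A degenerate (point) forecast reduces CRPS to plain absolute error.
point = crps_ensemble([2.0, 2.0, 2.0], 3.5)

# A spread-out forecast centred on y scores strictly better.
spread = crps_ensemble([3.0, 3.5, 4.0], 3.5)
```

Because CRPS rewards both calibration and sharpness, it is a natural score for comparing full conditional densities rather than point predictions.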
Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence
Generative Models
Optimization
Theory
- Introduction of GibbsPCDSolver, a scalable method for MaxEnt population synthesis.
- Utilizes Persistent Contrastive Divergence to approximate expectations without full enumeration.
- Demonstrates superior performance in terms of mean relative error and effective sample size compared to traditional methods.
- Validated on a new demographic benchmark, Syn-ISTAT, with significant implications for urban modeling.
Summary
This paper introduces GibbsPCDSolver, a novel approach for scalable maximum entropy (MaxEnt) population synthesis that addresses the computational bottleneck of exact expectation computation in existing methods. Traditional MaxEnt modeling requires summing over the entire tuple space, which becomes infeasible with more than approximately 20 categorical attributes due to the exponential growth of the space. GibbsPCDSolver utilizes Persistent Contrastive Divergence (PCD) to maintain a persistent pool of synthetic individuals, updating them through Gibbs sweeps at each gradient step, thus providing a stochastic approximation of model expectations without the need to enumerate the entire tuple space. The method is validated on controlled benchmarks and a new K=15 Italian demographic benchmark, Syn-ISTAT, which features analytically exact marginal targets. The results demonstrate that GibbsPCDSolver maintains a mean relative error (MRE) between 0.010 and 0.018 as the number of attributes increases, while the runtime scales linearly with the number of attributes rather than exponentially with the size of the tuple space. Additionally, it achieves a significant diversity advantage in the generated populations, making it particularly suitable for agent-based urban simulations.
Methodology
The methodology involves the development of GibbsPCDSolver, which replaces the exact expectation computation in MaxEnt modeling with a stochastic approximation using Persistent Contrastive Divergence. This approach maintains a persistent pool of synthetic individuals and updates them through Gibbs sweeps, allowing for efficient computation of model expectations without enumerating the entire tuple space.
Results
GibbsPCDSolver was validated on the Syn-ISTAT benchmark, achieving an MRE of 0.03 after training and outperforming generalized raking in terms of effective sample size and population diversity. The method maintained an MRE between 0.010 and 0.018 across various scaling experiments, demonstrating its robustness as the number of categorical attributes increased.
Implications
The findings suggest that GibbsPCDSolver can significantly enhance the generation of synthetic populations for urban simulations, providing a more diverse and representative set of individual profiles. This has potential applications in urban planning, demographic studies, and agent-based modeling, where accurate population synthesis is crucial.
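The PCD loop can be sketched in miniature. The toy below uses K = 3 independent binary attributes so the marginals are easy to check; the actual solver operates over general tuple spaces with Gibbs sweeps that condition on the other attributes, but the persistent pool and the stochastic gradient step have the same shape.

```python
import math, random

random.seed(0)

K = 3
targets = [0.8, 0.3, 0.5]               # target marginals E[x_k]
lams = [0.0] * K                        # MaxEnt natural parameters
pool = [[0] * K for _ in range(400)]    # persistent pool of synthetic individuals

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

lr = 0.5
for _ in range(300):
    # Gibbs sweep over the persistent pool (stochastic expectation,
    # no enumeration of the tuple space).
    for ind in pool:
        for k in range(K):
            ind[k] = 1 if random.random() < sigmoid(lams[k]) else 0
    # Gradient ascent on the MaxEnt dual: match model marginals to targets.
    for k in range(K):
        model_marg = sum(ind[k] for ind in pool) / len(pool)
        lams[k] += lr * (targets[k] - model_marg)

marginals = [sum(ind[k] for ind in pool) / len(pool) for k in range(K)]
```

The cost per step scales with the pool size and the number of attributes, not with the exponential tuple space, which is the scalability argument in the paper.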
Stepwise Credit Assignment for GRPO on Flow-Matching Models
Reinforcement Learning
Generative Models
Computer Vision
- Introduction of Stepwise-Flow-GRPO for improved credit assignment in reinforcement learning.
- Utilization of Tweedie's formula for intermediate reward estimation to enhance learning efficiency.
- Development of a new SDE inspired by DDIM for better image quality in generated outputs.
- Demonstrated superior sample efficiency and faster convergence compared to traditional Flow-GRPO.
Summary
The paper introduces Stepwise-Flow-GRPO, a novel approach to reinforcement learning applied to flow-matching models for text-to-image generation. Traditional methods like Flow-GRPO utilize uniform credit assignment across all steps in a trajectory, which fails to account for the temporal structure of the diffusion process. This can lead to suboptimal learning as it rewards all steps based solely on the final image quality, ignoring the contributions of intermediate steps. The authors propose a stepwise credit assignment mechanism that evaluates each denoising step's impact on reward improvement, leveraging Tweedie's formula for intermediate reward estimation. This method not only enhances sample efficiency but also accelerates convergence rates. Additionally, the authors present an improved stochastic differential equation (SDE) inspired by DDIM to generate higher-quality images while maintaining the stochastic nature required for policy gradients. The results demonstrate that Stepwise-Flow-GRPO significantly outperforms standard Flow-GRPO in terms of sample efficiency and convergence speed, making it a promising advancement in the application of reinforcement learning to generative models.
Methodology
The authors propose Stepwise-Flow-GRPO, which assigns credit based on the reward improvement of each denoising step rather than the final image quality. They utilize Tweedie's formula to estimate intermediate rewards and calculate stepwise gains to optimize the policy. Additionally, they introduce a new SDE inspired by DDIM to improve the quality of generated images while preserving necessary stochasticity.
Results
The experiments show that Stepwise-Flow-GRPO achieves significantly better sample efficiency and faster convergence rates compared to the standard Flow-GRPO approach, leading to higher final rewards and improved image quality.
Implications
This work has the potential to enhance the performance of text-to-image generation models by providing a more nuanced approach to credit assignment in reinforcement learning, which could lead to better alignment between generated images and textual prompts.
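The credit-assignment idea reduces to a telescoping difference of intermediate reward estimates; the function below is a hedged reconstruction of that step, not the authors' code.

```python
# Given intermediate reward estimates r_t for each denoised prediction along
# a trajectory (e.g. scored via Tweedie's formula), credit each step by its
# gain g_t = r_{t+1} - r_t instead of giving every step the final reward.
def stepwise_gains(intermediate_rewards):
    r = intermediate_rewards
    return [r[t + 1] - r[t] for t in range(len(r) - 1)]

rewards = [0.10, 0.25, 0.20, 0.60, 0.90]   # r_0 .. r_T along one trajectory
gains = stepwise_gains(rewards)

# The gains telescope: they sum to the total improvement r_T - r_0, so credit
# is redistributed across steps without changing the trajectory's total.
total = sum(gains)
```

Steps that actively improve the denoised estimate get positive credit, steps that hurt it get negative credit, rather than every step sharing the final image's reward uniformly.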
PruneFuse: Efficient Data Selection via Weight Pruning and Network Fusion
Efficient ML
- PruneFuse introduces a two-stage approach for efficient data selection using pruned networks.
- The method significantly reduces computational costs associated with traditional active learning techniques.
- Fusing the pruned network with the original model enhances training efficiency and generalization.
- Extensive experiments show PruneFuse outperforms state-of-the-art methods across multiple datasets.
Summary
The paper introduces PruneFuse, a novel approach to efficient data selection that improves the training efficiency of deep neural networks while minimizing annotation requirements. Traditional data selection methods, particularly in active learning, often incur high computational costs due to the need for extensive training cycles on large models. PruneFuse addresses this challenge with a two-stage process. First, it applies structured pruning to create a smaller network that is structurally coherent with the original, and uses this pruned network to select the most informative samples from the dataset. Second, the pruned network is fused with the original network, so that the insights gained during its training carry over to the fused model. The authors demonstrate that PruneFuse significantly reduces the computational cost of data selection and accelerates the overall training process while outperforming existing state-of-the-art active learning methods across various datasets, including CIFAR-10, CIFAR-100, Tiny-ImageNet-200, ImageNet-1K, and text datasets.
Methodology
PruneFuse operates in two stages: first, it applies structured pruning to create a smaller pruned network that selects informative samples from the dataset. Second, this pruned network is fused with the original network, allowing the insights from the pruned network to enhance the training of the fused model.
Results
The experimental results indicate that PruneFuse achieves better performance than traditional active learning methods while significantly reducing computational costs. It demonstrates broad applicability across different datasets and network architectures.
Implications
PruneFuse has the potential to make active learning more scalable and cost-effective, particularly in resource-constrained environments. It can be utilized in various deep learning applications where efficient data selection is critical.
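The fusion step can be sketched under assumed conventions (the paper does not spell out this exact interface here): copy the pruned network's trained weights into the matching units of the full network and leave the rest at their fresh initialization.

```python
# Toy fuse step: structured pruning kept a subset of units, and the fused
# model inherits the pruned network's trained rows for exactly those units.
def fuse(full_weights, pruned_weights, kept_units):
    fused = [row[:] for row in full_weights]        # copy of the full layer
    for pruned_row, unit in zip(pruned_weights, kept_units):
        fused[unit] = pruned_row[:]
    return fused

full = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]         # 3-unit layer, fresh init
pruned = [[1.0, 2.0], [3.0, 4.0]]                   # trained 2-unit pruned layer
fused = fuse(full, pruned, kept_units=[0, 2])
```

Because the pruned network is structurally coherent with the original, the copy is index-for-index, so training the fused model starts from the pruned network's knowledge instead of from scratch.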
Q-BIOLAT: Binary Latent Protein Fitness Landscapes for QUBO-Based Optimization
Optimization
- Q-BioLat provides a framework for modeling protein fitness landscapes in binary latent spaces.
- The approach emphasizes the significance of representation in optimization landscapes, showing that different representations can yield different optimization outcomes.
- PCA-based binary representations consistently outperform learned representations in terms of optimization effectiveness.
- Classical combinatorial optimization methods are effective in structured binary latent spaces, enabling efficient exploration of protein fitness landscapes.
Summary
The paper introduces Q-BioLat, a novel framework designed for modeling and optimizing protein fitness landscapes using binary latent representations. Traditional approaches to protein fitness optimization often rely on continuous representations and focus on predictive accuracy, which can be inadequate for the inherently discrete nature of protein sequences. Q-BioLat addresses this by utilizing pretrained protein language model embeddings to create compact binary representations that facilitate the formulation of protein fitness as a quadratic unconstrained binary optimization (QUBO) problem. This allows for the application of various combinatorial optimization techniques, such as simulated annealing and genetic algorithms, to efficiently explore the fitness landscapes. The authors highlight the importance of representation in shaping optimization landscapes, demonstrating that different representations can lead to significantly different optimization behaviors. Empirical results show that PCA-based binary latent representations outperform other methods in terms of decoding quality and protein design performance across multiple datasets. The framework also reveals a trade-off between expressivity, generalization, and searchability based on latent dimensionality. Overall, Q-BioLat bridges the gap between modern machine learning and discrete optimization, offering a pathway for integration with quantum optimization methods.
Methodology
The methodology involves transforming pretrained protein language model embeddings into binary latent representations, which are then used to formulate the protein fitness optimization problem as a QUBO. This allows for the application of combinatorial optimization techniques to explore the resulting fitness landscapes.
Results
The empirical results demonstrate that PCA-based binary latent representations lead to superior decoding quality and protein design performance compared to other methods. Classical optimization techniques effectively identify high-fitness regions in the binary latent spaces, even in data-scarce scenarios. The study also identifies a trade-off between the dimensionality of the latent space and optimization performance.
Implications
The Q-BioLat framework has significant implications for protein engineering, drug discovery, and synthetic biology by providing a more effective method for exploring protein fitness landscapes. It also opens avenues for integrating machine learning with quantum optimization techniques, potentially enhancing the efficiency of protein design processes.
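Once fitness is written as a QUBO, any binary optimizer applies. The sketch below solves a tiny, illustrative QUBO with simulated annealing and checks the answer by brute force; the paper's latent spaces are far larger, and the Q matrix here is arbitrary.

```python
import math, random

random.seed(1)

# Minimise x^T Q x over binary x for a small illustrative Q.
Q = [[-2.0, 1.0, 0.0],
     [1.0, -1.0, 1.0],
     [0.0, 1.0, -1.5]]

def qubo_energy(x):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def anneal(n, steps=2000, t0=2.0, t1=0.01):
    x = [random.randint(0, 1) for _ in range(n)]
    e = qubo_energy(x)
    best_x, best_e = x[:], e
    for s in range(steps):
        t = t0 * (t1 / t0) ** (s / steps)      # geometric cooling schedule
        i = random.randrange(n)
        x[i] ^= 1                              # propose a single-bit flip
        e_new = qubo_energy(x)
        if e_new <= e or random.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            x[i] ^= 1                          # reject: undo the flip
    return best_x, best_e

best_x, best_e = anneal(3)

# Brute force over all 8 candidates confirms the annealer's answer.
brute = min(([(i >> k) & 1 for k in range(3)] for i in range(8)),
            key=qubo_energy)
```

In the paper's setting the binary variables are the latent bits of a protein embedding and the solution is decoded back to a sequence; the search loop itself is unchanged.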
AMIGO: Agentic Multi-Image Grounding Oracle Benchmark
Multimodal
- AMIGO introduces a long-horizon benchmark for hidden-target identification in multi-image settings.
- The benchmark employs a constrained questioning protocol with explicit penalties for invalid actions.
- It allows for controlled oracle imperfections to assess model robustness and verification behavior.
- The evaluation metrics cover identification success, interaction quality, and protocol compliance.
Summary
The paper introduces AMIGO, a novel benchmark designed to evaluate agentic vision-language models (VLMs) through interactive hidden-target identification across multiple images. Unlike traditional benchmarks that focus on single-image, single-turn evaluations, AMIGO emphasizes long-horizon planning and the ability to ask attribute-focused questions to identify a target image from a gallery of visually similar candidates. The model must navigate uncertainty, maintain consistent constraints, and adapt its questioning strategy based on feedback received. The benchmark incorporates a Yes/No/Unsure questioning protocol, penalizing invalid actions with a 'Skip' response, allowing for the measurement of protocol compliance and interaction quality. The authors instantiate AMIGO with the 'Guess My Preferred Dress' task, which involves identifying a specific dress from a collection based on fine-grained attributes. This setup not only tests the model's identification success but also its efficiency, evidence verification, and robustness against oracle inconsistencies. The paper highlights the importance of evaluating VLMs in a more dynamic and interactive context, paving the way for future research in agentic behavior and multimodal reasoning.
Methodology
The authors developed AMIGO as an interactive benchmark where a model identifies a target image by asking binary questions about observable attributes. The model receives feedback in the form of Yes/No/Unsure responses, and any protocol violations result in a 'Skip' response. This setup allows for tracking the model's decision-making process and evaluating its performance across multiple turns.
Results
The paper reports on the implementation of AMIGO with the 'Guess My Preferred Dress' task, detailing metrics such as identification success rates, efficiency of question selection, and adherence to the interaction protocol. The results indicate the effectiveness of AMIGO in diagnosing agentic behaviors and the challenges faced by VLMs in maintaining consistent state and adapting to feedback.
Implications
AMIGO serves as a controlled testbed for evaluating agentic behaviors in VLMs, focusing on long-term planning and interaction strategies. It highlights the importance of developing models that can handle uncertainty and maintain compliance with interaction protocols, which are critical for practical applications in multimodal systems.
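The questioning loop can be sketched as greedy attribute splitting. The candidates below are hypothetical, and the oracle always answers truthfully; the benchmark's Unsure responses, Skip penalties, and controlled oracle imperfections are omitted for brevity.

```python
# Each turn asks the binary attribute that splits the remaining candidates
# most evenly, then filters the pool on the oracle's answer.
candidates = {
    "dress_a": {"long_sleeve": 1, "floral": 0, "red": 1},
    "dress_b": {"long_sleeve": 1, "floral": 1, "red": 0},
    "dress_c": {"long_sleeve": 0, "floral": 1, "red": 0},
    "dress_d": {"long_sleeve": 0, "floral": 0, "red": 1},
}
target = "dress_c"

def best_question(pool):
    attrs = next(iter(pool.values())).keys()
    # Most even split: positive count closest to half the pool.
    return min(attrs,
               key=lambda a: abs(sum(v[a] for v in pool.values()) - len(pool) / 2))

pool, questions = dict(candidates), 0
while len(pool) > 1:
    q = best_question(pool)
    answer = candidates[target][q]          # truthful oracle
    pool = {k: v for k, v in pool.items() if v[q] == answer}
    questions += 1

identified = next(iter(pool))
```

With evenly splitting questions the pool halves each turn, so the target is pinned down in roughly log2(n) questions, which is the efficiency behaviour the benchmark's metrics reward.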
Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version
Reinforcement Learning
Theory
Optimization
- Establishes stability conditions for relative TD learning with linear function approximation.
- Demonstrates that the choice of baseline distribution is crucial for algorithm stability.
- Shows that asymptotic bias and covariance remain bounded as the discount factor approaches one.
- Provides empirical validation through simulations on finite-state MDPs.
Summary
This paper investigates the stability and sensitivity of relative temporal-difference (TD) learning, particularly focusing on its application with linear function approximation. Relative TD learning aims to enhance convergence rates when the discount factor approaches one by utilizing a baseline in the TD update. The authors establish stability conditions for the algorithm, emphasizing the significance of the baseline distribution. They demonstrate that when the baseline is the empirical distribution of the state-action process, the algorithm remains stable for any non-negative baseline weight and discount factor. Furthermore, the paper provides a sensitivity analysis of parameter estimates, revealing that both asymptotic bias and covariance are uniformly bounded as the discount factor nears one. The findings are supported by simulations conducted on finite-state finite-action Markov Decision Processes (MDPs) and speed scaling scenarios, illustrating the practical implications of the theoretical results.
Methodology
The authors analyze relative TD learning using linear function approximation and derive stability conditions based on the choice of baseline distribution. They employ stochastic approximation techniques to establish convergence properties and conduct simulations to validate their theoretical findings.
Results
The paper concludes that relative TD learning is stable for any non-negative baseline weight when the baseline is chosen as the empirical distribution of the state-action process. Additionally, the asymptotic bias and covariance of parameter estimates are shown to remain uniformly bounded as the discount factor approaches one, indicating robust performance of the algorithm.
Implications
The findings suggest that careful selection of baseline distributions can significantly enhance the stability and convergence of TD learning algorithms, which is particularly relevant for applications in reinforcement learning where slow convergence can be a critical issue.
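A tabular sketch of the update (one reading of the setup, not the paper's exact algorithm): the TD target is shifted by a baseline weight kappa times the average value under the empirical state distribution, which keeps the iterates bounded even with the discount factor near one.

```python
import random

random.seed(0)

# Two-state Markov reward process: each state moves to 0 or 1 with prob 1/2.
P = {0: [0, 1], 1: [0, 1]}
R = {0: 1.0, 1: 0.0}
gamma, kappa, alpha = 0.99, 1.0, 0.05

V = {0: 0.0, 1: 0.0}
counts = {0: 1, 1: 1}               # empirical state visitation counts
s = 0
for _ in range(20000):
    s_next = random.choice(P[s])
    # Baseline: average value under the empirical state distribution.
    vbar = (counts[0] * V[0] + counts[1] * V[1]) / (counts[0] + counts[1])
    delta = R[s] + gamma * V[s_next] - V[s] - kappa * vbar
    V[s] += alpha * delta
    counts[s] += 1
    s = s_next

diff = V[0] - V[1]
```

Without the baseline term the values in this chain would grow toward 1/(1 - gamma); with it they stay near zero while their difference still reflects the relative advantage of the rewarding state.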
PiCSRL: Physics-Informed Contextual Spectral Reinforcement Learning
Reinforcement Learning
Efficient ML
Optimization
- PiCSRL effectively addresses HDLSS constraints in environmental monitoring through improved representation mechanisms.
- The framework is the first to apply reinforcement learning to hyperspectral sensing under HDLSS conditions for sample-efficient policy learning.
- Demonstrates significant improvements in predictive modeling for cyanobacterial gene concentrations using hyperspectral imagery.
- Achieves superior performance compared to traditional sampling methods, enhancing detection efficiency.
Summary
The paper introduces PiCSRL (Physics-Informed Contextual Spectral Reinforcement Learning), a novel framework designed to address the challenges posed by high-dimensional low-sample-size (HDLSS) datasets in environmental monitoring. Traditional reinforcement learning (RL) methods struggle in HDLSS contexts due to the sparsity of labeled data, which is crucial for effective model training. PiCSRL incorporates domain knowledge into the RL state representation through physics-informed embeddings, enhancing adaptive sensing capabilities. The authors validate their approach by applying it to the task of adaptive sampling for predicting cyanobacterial gene concentrations using NASA PACE hyperspectral imagery over Lake Erie. The results demonstrate that PiCSRL significantly outperforms baseline methods, achieving a root mean square error (RMSE) of 0.153 and a bloom detection rate of 98.4%. Additionally, the framework shows improved generalization in semi-supervised learning scenarios and effectively scales to larger networks, indicating its potential for broader applications in Earth observation and environmental monitoring.
Methodology
The methodology involves formulating the problem of adaptive sensing as a reinforcement learning task, where the agent selects locations to sample based on a belief state. The authors employ a semi-supervised learning approach that utilizes physics-informed features derived from hyperspectral data to reduce dimensionality and improve model training. A ridge regression model serves as a teacher to generate pseudo-labels for unlabeled data, which are then used to train a multi-layer perceptron (MLP) student model.
Results
PiCSRL achieved an RMSE of 0.153 and a bloom detection rate of 98.4%, outperforming random sampling (0.296 RMSE) and Upper Confidence Bound (UCB) methods (0.178 RMSE). The ablation studies indicated that the use of physics-informed features improved test generalization (0.52 R², +0.11 over raw bands) in semi-supervised learning. Scalability tests confirmed that PiCSRL effectively handles large networks with over 50 stations and 2 million combinations, showing significant performance improvements (p = 0.002) over baseline methods.
Implications
The findings suggest that PiCSRL can serve as a sample-efficient adaptive sensing method across various Earth observation domains, potentially leading to enhanced monitoring of environmental phenomena such as water quality and algal blooms. The integration of physics-informed features into reinforcement learning frameworks may also inspire further research in other domains facing similar HDLSS challenges.
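The teacher-student step can be sketched in one dimension. To stay dependency-free, the toy below uses a second ridge fit as the student instead of an MLP; the pseudo-labelling flow is the same, and all data here are made up for illustration.

```python
# Closed-form ridge through the origin: w = sum(x*y) / (sum(x*x) + alpha)
def ridge_fit_1d(xs, ys, alpha=0.1):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

# Small labelled set with true slope close to 2.0.
x_lab = [1.0, 2.0, 3.0]
y_lab = [2.0, 4.1, 5.9]
w_teacher = ridge_fit_1d(x_lab, y_lab)

# Teacher generates pseudo-labels for the larger unlabelled pool.
x_unlab = [0.5 * i for i in range(1, 21)]
pseudo = [w_teacher * x for x in x_unlab]

# Student is trained on labelled + pseudo-labelled data together.
w_student = ridge_fit_1d(x_lab + x_unlab, y_lab + pseudo)
```

With the student in the same hypothesis class as the teacher, pseudo-labels simply reproduce the teacher's fit; the payoff in the paper comes from a more expressive student (an MLP) generalizing beyond the teacher on physics-informed features.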
D-GATNet: Interpretable Temporal Graph Attention Learning for ADHD Identification Using Dynamic Functional Connectivity
Graph Learning
Time Series
Interpretability
- D-GATNet leverages dynamic functional connectivity for ADHD classification, addressing limitations of static approaches.
- The framework incorporates a Graph Attention Network for spatial learning and temporal convolution for dynamic modeling.
- Interpretability is achieved through attention mechanisms that highlight significant ROI interactions and temporal windows.
- The model outperforms existing methods, achieving 85.18% balanced accuracy and 0.881 AUC on the ADHD-200 dataset.
Summary
This paper presents D-GATNet, an innovative framework for the automated classification of Attention Deficit Hyperactivity Disorder (ADHD) using dynamic functional connectivity (dFC) derived from resting-state functional MRI (rs-fMRI) data. The authors highlight the challenges in ADHD diagnosis, particularly the limitations of static functional connectivity approaches and the lack of interpretability in existing deep learning models. D-GATNet employs a temporal graph-based architecture that captures both spatial and temporal dynamics of brain connectivity. The framework utilizes a sliding-window approach to construct functional brain graphs, where regions of interest (ROIs) serve as nodes and connectivity strengths as edges. A multi-layer Graph Attention Network (GAT) is used to learn spatial dependencies, while temporal dynamics are modeled through 1D convolution and temporal attention mechanisms. The interpretability of the model is enhanced by analyzing graph attention weights and temporal attention scores, which reveal significant ROI interactions and highlight informative connectivity segments. The model was evaluated on the ADHD-200 dataset, achieving a balanced accuracy of 85.18% ± 5.64 and an AUC of 0.881, outperforming existing state-of-the-art methods. The findings suggest potential neuroimaging biomarkers related to ADHD, particularly disruptions in the cerebellar and default mode networks.
Methodology
The D-GATNet framework consists of five main modules: Dynamic Connectivity Representation, Graph Construction, Spatial Graph Modeling, Temporal Dynamics Modeling, and Classification. It utilizes a sliding-window approach to compute dynamic functional connectivity from rs-fMRI data, constructs brain graphs, and applies a multi-layer Graph Attention Network to learn spatial dependencies, complemented by temporal convolution and attention mechanisms to capture time-varying dynamics.
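The sliding-window dynamic connectivity step can be illustrated with a minimal numpy sketch (not the authors' code; the toy data, window length, and stride are arbitrary choices):

```python
import numpy as np

def sliding_window_fc(ts, win_len, stride):
    """Dynamic functional connectivity: one ROI-by-ROI Pearson
    correlation matrix per sliding window.

    ts: (T, R) array of R ROI time series of length T.
    Returns an array of shape (n_windows, R, R).
    """
    T, R = ts.shape
    mats = []
    for start in range(0, T - win_len + 1, stride):
        window = ts[start:start + win_len]            # (win_len, R)
        mats.append(np.corrcoef(window, rowvar=False))
    return np.stack(mats)

# Toy example: 120 time points, 5 ROIs.
rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 5))
fc = sliding_window_fc(ts, win_len=30, stride=10)
print(fc.shape)  # (10, 5, 5)
```

Each windowed matrix then becomes the edge weights of one brain graph in the pipeline described above.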
Results
The D-GATNet model achieved a balanced accuracy of 85.18 ± 5.64% and an AUC of 0.881 on the ADHD-200 dataset, outperforming existing state-of-the-art methods. Attention analysis revealed significant disruptions in the cerebellar and default mode networks, suggesting potential neuroimaging biomarkers for ADHD.
Implications
The findings from this study could enhance the diagnostic process for ADHD by providing a more interpretable and accurate machine learning framework. The identification of specific neuroimaging biomarkers may also contribute to a better understanding of ADHD and inform future research and clinical practices.
EcoFair: Trustworthy and Energy-Aware Routing for Privacy-Preserving Vertically Partitioned Medical Inference
Efficient ML
Federated Learning
Multimodal
- EcoFair maintains data privacy by transmitting only embeddings instead of raw data.
- The framework employs a dynamic routing mechanism that activates heavier processing based on clinical risk and uncertainty.
- Experimental results show significant energy savings in edge-side inference without compromising classification accuracy.
- Selective routing improves performance for subgroup-sensitive malignant cases.
Summary
The paper presents EcoFair, a framework designed for privacy-preserving medical inference, specifically targeting dermatological diagnosis through a vertically partitioned architecture. By keeping raw image and tabular data local and transmitting only modality-specific embeddings, EcoFair addresses the challenges of data privacy and energy efficiency in edge computing environments. The framework introduces a lightweight-first routing mechanism that selectively activates a heavier image encoder based on local uncertainty and clinical risk, combining predictive uncertainty with a neurosymbolic risk score derived from patient demographics. The authors evaluate EcoFair on three dermatology benchmarks, demonstrating that it significantly reduces edge-side inference energy while maintaining competitive classification performance. Additionally, the selective routing mechanism enhances the diagnostic behavior for vulnerable subgroups without altering the global training objective, positioning EcoFair as a practical solution for energy-aware and privacy-preserving medical inference.
Methodology
EcoFair utilizes a simulated vertically partitioned inference architecture where raw data remains local. It employs a lightweight-first routing mechanism that decides whether to use a lightweight or heavyweight image encoder based on predictive uncertainty and a neurosymbolic risk score. The framework is evaluated using various pretrained image encoders to assess energy consumption and classification performance across multiple dermatology benchmarks.
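The lightweight-first routing rule can be sketched as a simple threshold test on normalized predictive entropy and the risk score (an illustration only; the thresholds and the [0, 1] risk scale are assumptions, not the paper's calibrated values):

```python
import math

def route(probs, risk_score, entropy_thresh=0.5, risk_thresh=0.7):
    """Lightweight-first routing: escalate to the heavy image encoder
    only when the lightweight model is uncertain or the neurosymbolic
    clinical risk score is high.

    probs: class probabilities from the lightweight head.
    risk_score: risk in [0, 1] (illustrative scale).
    Returns 'heavy' or 'light'.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))        # normalize entropy to [0, 1]
    uncertain = entropy / max_entropy > entropy_thresh
    risky = risk_score > risk_thresh
    return "heavy" if (uncertain or risky) else "light"

print(route([0.95, 0.05], risk_score=0.1))  # confident, low risk -> 'light'
print(route([0.55, 0.45], risk_score=0.1))  # uncertain -> 'heavy'
```

The energy saving comes from the `'light'` branch dominating on easy inputs, while high-risk cases still reach the heavier encoder.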
Results
The experiments reveal that EcoFair can substantially reduce energy consumption during edge-side inference while achieving classification performance comparable to existing methods. The selective routing mechanism also demonstrates improved diagnostic performance for clinically significant malignant cases, indicating its effectiveness in addressing subgroup-sensitive issues.
Implications
EcoFair's approach has significant implications for the deployment of AI in healthcare, particularly in ensuring data privacy while optimizing energy use. It can be applied in various medical settings where data sensitivity and computational resources are critical, paving the way for more efficient and trustworthy medical AI solutions.
Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards
Reinforcement Learning
Optimization
Theory
- Introduces a primal-dual policy optimization algorithm for linear mixture CMDPs with adversarial rewards.
- Achieves near-optimal regret and constraint violation bounds, matching minimax lower bounds up to logarithmic factors.
- Utilizes a regularized dual update and weighted ridge regression for tighter confidence intervals.
- Addresses limitations of existing algorithms that either assume fixed rewards or do not scale well.
Summary
This paper addresses the challenge of safe reinforcement learning in finite-horizon linear mixture constrained Markov decision processes (CMDPs) with adversarial rewards. The authors propose a novel primal-dual policy optimization algorithm that operates under full-information feedback and an unknown transition kernel. The algorithm achieves regret and constraint violation bounds of Õ(√(d²H³K)), where d is the feature dimension, H is the horizon, and K is the number of episodes. This is the first algorithm to provide provably efficient performance in this setting, achieving near-optimal regret bounds that align with the known minimax lower bounds for unconstrained linear mixture MDPs, up to logarithmic factors. The key innovation lies in the introduction of a regularized dual update that facilitates a drift-based analysis, circumventing the limitations of strong duality-based approaches when dealing with non-stationary reward functions. Additionally, the authors extend weighted ridge regression for parameter estimation within the constrained framework, leading to tighter confidence intervals essential for deriving the near-optimal regret bounds.
Methodology
The authors develop a primal-dual policy optimization algorithm that combines a regularized dual update for drift-based analysis with weighted ridge regression for parameter estimation. This approach allows for efficient learning in linear mixture CMDPs with adversarial rewards, even when the transition kernel is unknown and rewards change across episodes.
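The weighted ridge regression estimator at the core of the parameter-estimation step has the closed form θ = (XᵀWX + λI)⁻¹XᵀWy, which a short numpy sketch can illustrate (uniform weights here for simplicity; the paper's variance-dependent weighting is what yields the tighter confidence sets):

```python
import numpy as np

def weighted_ridge(X, y, w, lam=1.0):
    """Weighted ridge regression: theta = (X^T W X + lam I)^-1 X^T W y,
    with W = diag(w). Illustrative of the estimator family, not the
    paper's exact weighting scheme."""
    d = X.shape[1]
    A = X.T @ (w[:, None] * X) + lam * np.eye(d)
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

# Recover a known parameter vector from noisy linear observations.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.01 * rng.standard_normal(200)
theta = weighted_ridge(X, y, w=np.ones(200), lam=1e-3)
print(np.round(theta, 2))
```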
Results
The proposed algorithm achieves regret and constraint violation bounds of Õ(√(d²H³K)), which are near-optimal and align with the minimax lower bounds for unconstrained linear mixture MDPs. This demonstrates the algorithm's efficiency and effectiveness in handling adversarial rewards in CMDPs.
Implications
The findings of this paper have significant implications for safe reinforcement learning applications, particularly in environments where rewards are adversarial and non-stationary. The proposed algorithm could be applied in various domains such as robotics, finance, and automated decision-making systems where safety and performance are critical.
Shapley meets Rawls: an integrated framework for measuring and explaining unfairness
Theory
Interpretability
- Introduces an integrated framework combining Shapley values with fairness measurement.
- Demonstrates the application of the framework on the Census Income dataset.
- Identifies key features contributing to gender unfairness in classifiers.
- Offers a computationally efficient alternative to traditional methods for measuring unfairness.
Summary
This paper presents an integrated framework that utilizes the Shapley value to define and explain unfairness in machine learning models, particularly under standard group fairness criteria. The authors argue that fairness and explainability, often treated separately, can be effectively combined through the Shapley value, which allows for the estimation of unfairness and the identification of contributing features. The framework can be extended to Efficient-Symmetric-Linear (ESL) values, which provide more robust fairness definitions and improved computational efficiency. The authors illustrate their approach using the Census Income dataset, demonstrating that features such as 'Age', 'Number of hours', and 'Marital status' contribute to gender unfairness, achieving results with shorter computation times compared to traditional Bootstrap tests. This work emphasizes the importance of integrating fairness and explainability in AI systems, providing a novel method for assessing and addressing unfairness in algorithmic decision-making.
Methodology
The authors develop a framework that employs Shapley values to quantify unfairness and explain the contributions of various features to this unfairness. They also explore the extension of this framework to ESL values, which are computationally efficient and provide robust definitions of fairness. The methodology includes empirical analysis using the Census Income dataset to illustrate the effectiveness of the proposed approach.
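For intuition, the exact Shapley value can be computed by brute force over feature subsets when the feature set is small (a generic sketch with a hypothetical additive value function standing in for the paper's unfairness measure):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a small player set.
    value: function mapping a frozenset of players to a number
    (here, the unfairness of a model using those features)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(S | {p}) - value(S))
        phi[p] = total
    return phi

# Hypothetical additive "unfairness" contributions per feature.
contrib = {"Age": 0.3, "Hours": 0.1, "Marital": 0.0}
v = lambda S: sum(contrib[f] for f in S)
phi = shapley_values(list(contrib), v)
print(phi)
```

In an additive game like this, each feature's Shapley value equals its own contribution, and the values sum to the total unfairness (the efficiency axiom).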
Results
The analysis reveals that features such as 'Age', 'Number of hours', and 'Marital status' are significant contributors to gender unfairness in the dataset. The proposed framework demonstrates a reduction in computation time compared to traditional Bootstrap tests while effectively measuring and explaining unfairness.
Implications
This research has significant implications for the development of fair and explainable AI systems. By providing a method to quantify and explain unfairness, it aids in the ethical deployment of AI technologies across various domains, such as hiring and credit assessment, where fairness is critical. The integration of fairness and explainability can enhance trust in AI systems and support regulatory compliance.
High dimensional theory of two-phase optimizers
Optimization
Theory
- Two-phase optimizers like LA and LA-DiLoCo provide a different noise structure compared to SGD, which can be beneficial in high-dimensional optimization tasks.
- The one-worker variant of LA shows a favorable trade-off between signal and noise, outperforming SGD under optimal learning rates.
- LA-DiLoCo's multi-worker implementation generates more noise, but this can be controlled with appropriate hyperparameter choices.
- The introduction of momentum in the Super Lookahead variant enhances optimization performance by non-linearly transforming the Hessian spectrum.
Summary
This paper explores the high-dimensional theory of two-phase optimizers, particularly focusing on the Lookahead (LA) and LA-DiLoCo algorithms in the context of linear regression. The study highlights the advantages of two-phase optimizers over traditional stochastic gradient descent (SGD), particularly in terms of the trade-off between signal and noise during optimization. The author demonstrates that the one-worker variant of LA offers a different balance between signal and noise compared to SGD, leading to improved performance in certain scenarios. Furthermore, the multi-worker version, LA-DiLoCo, generates more noise than its single-worker counterpart, but this can be mitigated through careful hyperparameter tuning. The analysis also extends to the Super Lookahead (SLA) variant, which incorporates momentum, revealing that stacking momentum operators can enhance convergence rates by transforming the effective Hessian spectrum. Overall, the findings suggest that two-phase optimizers present a promising avenue for advancing training algorithms in high-dimensional settings.
Methodology
The paper employs high-dimensional learning dynamics theory to analyze the performance of two-phase optimizers, specifically Lookahead (LA) and LA-DiLoCo, in a linear regression framework. The author derives evolution equations for the expected loss and examines the noise characteristics and convergence rates associated with these optimizers.
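The two-phase structure of Lookahead — k fast inner steps followed by an interpolated slow update — can be sketched on a toy quadratic (illustrative hyperparameters only, not values from the paper):

```python
import numpy as np

def lookahead_sgd(grad, x0, inner_lr=0.1, alpha=0.5, k=5, outer_steps=40):
    """Lookahead (LA): run k fast gradient steps from the slow weights,
    then move the slow weights a fraction alpha toward the fast weights.
    A minimal two-phase optimization loop."""
    slow = np.asarray(x0, dtype=float)
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):
            fast -= inner_lr * grad(fast)
        slow += alpha * (fast - slow)
    return slow

# Quadratic loss 0.5 * x^T H x with H = diag(1, 10); gradient is H x.
H = np.array([1.0, 10.0])
x = lookahead_sgd(lambda x: H * x, x0=[1.0, 1.0])
print(np.round(x, 4))  # converges toward the minimizer at the origin
```

The slow interpolation is what averages out gradient noise relative to plain SGD, the trade-off the analysis above quantifies.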
Results
The analysis reveals that LA provides a distinct trade-off between signal and noise compared to SGD, leading to superior performance in certain conditions. The multi-worker LA-DiLoCo variant produces more noise than the single-worker version, but this can be minimized through hyperparameter adjustments. Additionally, the study shows that the SLA variant with momentum can significantly improve convergence rates by leveraging the effective Hessian spectrum.
Implications
The findings suggest that two-phase optimizers could enhance the efficiency and effectiveness of training algorithms, particularly in large-scale machine learning applications. This research may lead to the development of more robust optimization techniques that can handle high-dimensional data more effectively.
Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints
NLP
Large Language Models
Optimization
- Identifies limitations of existing per-query LLM routing methods under batch inference and strict constraints.
- Introduces a robust batch-level routing framework that optimizes model assignments while considering performance uncertainty.
- Explores optimal allocation of computational resources prior to inference to enhance efficiency.
- Demonstrates significant improvements in routing accuracy and resource management through extensive experiments.
Summary
This paper addresses the challenges of routing queries to large language models (LLMs) while adhering to cost, GPU resource, and concurrency constraints. Traditional per-query routing methods often struggle to manage batch-level costs, particularly under non-uniform or adversarial batching conditions. The authors propose a novel batch-level, resource-aware routing framework that optimizes model assignments for each batch, ensuring compliance with cost and capacity limits. A robust variant of this framework is introduced to handle uncertainties in predicted LLM performance, coupled with an offline instance allocation strategy that balances quality and throughput across multiple models. The proposed framework is model-agnostic and can be efficiently solved using existing optimization solvers. Experimental results on two multi-task LLM benchmarks demonstrate that the robust approach enhances accuracy by 1-14% compared to non-robust methods, while batch-level routing outperforms per-query methods by up to 24% in adversarial scenarios. Additionally, optimized instance allocation provides further accuracy improvements of up to 3%, all while maintaining strict control over costs and GPU resources.
Methodology
The authors employ Integer Linear Programming (ILP) to formulate a batch-level routing framework that maximizes average routing quality while enforcing cost and capacity constraints. A robust optimization variant is introduced to mitigate the effects of estimation uncertainty in model performance. The framework is designed to be model-agnostic and compatible with various performance estimators, allowing for practical deployment in large-scale settings.
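For a tiny instance, the batch-level assignment problem can be brute-forced to show the objective and budget constraint (an illustration of the formulation only; the paper solves it with an ILP solver at scale, and all numbers below are hypothetical):

```python
from itertools import product

def route_batch(quality, cost, budget):
    """Brute-force the batch-level assignment for a tiny instance:
    maximize total predicted quality subject to a batch cost budget.
    quality[i][m]: predicted quality of query i on model m.
    cost[m]: per-call cost of model m."""
    n, m = len(quality), len(cost)
    best, best_assign = -1.0, None
    for assign in product(range(m), repeat=n):
        total_cost = sum(cost[a] for a in assign)
        if total_cost > budget:
            continue
        total_q = sum(quality[i][a] for i, a in enumerate(assign))
        if total_q > best:
            best, best_assign = total_q, assign
    return best_assign, best

# 3 queries, 2 models (cheap vs. strong); the budget forces a trade-off.
quality = [[0.6, 0.9], [0.5, 0.95], [0.8, 0.85]]
cost = [1.0, 3.0]
assign, q = route_batch(quality, cost, budget=5.0)
print(assign, q)
```

The optimum spends the remaining budget on the query that gains most from the strong model, which is exactly the batch-level coupling a per-query router cannot express.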
Results
The proposed routing framework achieves improved routing accuracy, with robustness enhancing performance by 1-14% over non-robust methods. Batch-level routing outperforms traditional per-query methods by up to 24% under adversarial batching conditions. Additionally, optimized instance allocation results in accuracy gains of up to 3% compared to non-optimized allocations, all while adhering to cost and GPU resource constraints.
Implications
The findings suggest that implementing batch-level routing and robust optimization can significantly enhance the efficiency and effectiveness of LLM inference systems, making them more suitable for industrial applications where resource constraints are critical. This approach could lead to better performance in real-world scenarios, particularly in environments with varying query complexities and resource limitations.
Mixture-Model Preference Learning for Many-Objective Bayesian Optimization
Optimization
- Introduces a mixture of preference archetypes for many-objective optimization, moving beyond a single utility function.
- Develops information-theoretic methods for active query selection that focus on both mode identity and trade-off shapes.
- Provides diagnostics for mixture-aware evaluation that go beyond simple regret measures.
- Demonstrates superior performance on synthetic and real-world datasets compared to existing methods.
Summary
This paper addresses the challenges of preference-based many-objective optimization, particularly in scenarios with multiple conflicting objectives and heterogeneous human value structures. The authors propose a novel Bayesian framework that models preferences as a mixture of latent archetypes rather than relying on a single fixed utility function. This approach utilizes a Dirichlet-process mixture to capture the uncertainty over both the archetypes and their weights. The framework includes hybrid queries designed to efficiently gather information about the identity of the active mode and the trade-offs within that mode. The authors provide a simple regret guarantee for their mixture-aware Bayesian optimization procedure. Empirical results demonstrate that their method outperforms standard baselines on both synthetic and real-world benchmarks, revealing structures in the data that traditional regret measures fail to capture. The proposed method enhances preference-driven many-objective optimization by allowing for the coexistence of multiple trade-off archetypes and improving the calibration of user preferences.
Methodology
The authors formulate a preference-based many-objective Bayesian optimization framework using a Dirichlet-process mixture model over Chebyshev weights. They design inter-mode and intra-mode queries to gather information about the active preference mode and its trade-offs. The methodology combines Gaussian process surrogates with a focus on user preferences, allowing for effective sampling in decision-relevant regions of the objective space.
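The role of a mixture over Chebyshev weight archetypes can be illustrated with a two-mode toy example (a simplified stand-in for the Dirichlet-process posterior; the weights and candidate points below are invented):

```python
import numpy as np

def chebyshev_utility(y, w, ideal):
    """(Negative) Chebyshev scalarization: smaller max-weighted distance
    to the ideal point is better under weight vector w."""
    return -np.max(w * (ideal - y))

def mixture_utility(y, archetypes, mix, ideal):
    """Expected utility under a mixture of preference archetypes."""
    return sum(p * chebyshev_utility(y, w, ideal)
               for w, p in zip(archetypes, mix))

ideal = np.array([1.0, 1.0])
archetypes = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]  # two trade-off modes
mix = [0.5, 0.5]
a = np.array([0.9, 0.2])  # strong on objective 1 only
b = np.array([0.6, 0.6])  # balanced candidate
print(mixture_utility(a, archetypes, mix, ideal),
      mixture_utility(b, archetypes, mix, ideal))
```

Under mode uncertainty the balanced candidate scores higher, even though a single-archetype utility would prefer the specialist — the behavior the hybrid queries are designed to disambiguate.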
Results
The proposed method shows significant improvements over standard baselines in both synthetic and real-world scenarios. The mixture-aware diagnostics reveal important structural insights that are not captured by traditional regret metrics, indicating a more nuanced understanding of user preferences and trade-offs.
Implications
This work has potential applications in various fields requiring many-objective optimization, such as urban design, materials science, and autonomous systems. By effectively modeling heterogeneous preferences, the framework can enhance decision-making processes in complex environments where multiple objectives must be balanced.
Online Learning for Dynamic Constellation Topologies
Optimization
Theory
- Introduces a novel convex optimization framework for dynamic satellite network topology management.
- Does not assume fixed orbital structures, allowing for flexibility in satellite maneuvers.
- Demonstrates a trade-off between computational complexity and convergence in online learning.
- Empirical results show performance matching that of established offline methods.
Summary
This paper addresses the challenges of configuring dynamic network topologies in satellite constellations using an online learning framework. With the increasing deployment of Low Earth Orbit (LEO) satellites, the need for adaptive network management is critical due to the continuous movement and potential collisions of satellites. The authors propose a novel convex optimization problem that does not rely on predefined structures, such as known orbital planes, which may be disrupted by satellite maneuvers. The approach allows for constrained online learning, balancing computational complexity with convergence to an optimal strategy. Empirical results indicate that the proposed method performs comparably to state-of-the-art offline techniques, demonstrating its effectiveness in dynamic environments. The findings highlight the importance of adaptable network topologies for maintaining service quality in satellite communications.
Methodology
The authors formulated the problem of dynamic topology configuration as a convex optimization problem within the online learning framework. They employed constrained online learning techniques to manage the trade-off between computational efficiency and convergence speed, allowing for real-time adjustments to the network topology as satellites move or change status.
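The constrained online learning loop can be sketched as projected online gradient descent tracking a drifting target (a generic illustration; box constraints stand in for the paper's feasible topology set, and the drifting target mimics link qualities changing as satellites move):

```python
import numpy as np

def projected_ogd(grads, x0, lr, lo=0.0, hi=1.0):
    """Projected online gradient descent: one convex loss per round,
    with the iterate projected (here, clipped) back into the
    constraint set after each step."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for g in grads:
        x = np.clip(x - lr * g(x), lo, hi)
        history.append(x.copy())
    return history

# Round-t loss ||x - c_t||^2 with a slowly drifting target c_t.
targets = [np.array([0.2 + 0.01 * t, 0.8 - 0.01 * t]) for t in range(50)]
grads = [lambda x, c=c: 2 * (x - c) for c in targets]
hist = projected_ogd(grads, x0=[0.5, 0.5], lr=0.2)
print(np.round(hist[-1], 2))
```

A larger step size tracks the drift faster but amplifies per-round noise — the complexity/convergence trade-off the paper analyzes.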
Results
The proposed method was empirically validated against state-of-the-art offline methods, showing comparable performance. The results indicate that the online learning approach is effective in adapting to the dynamic nature of satellite constellations, maintaining service quality while managing computational resources.
Implications
The findings suggest that the proposed online learning framework can significantly enhance the management of satellite networks, particularly in scenarios with frequent changes in topology. This has implications for improving the reliability and efficiency of satellite communications, especially as the number of satellites in orbit continues to grow.
Are LLM-Enhanced Graph Neural Networks Robust against Poisoning Attacks?
Graph Learning
Large Language Models
- Introduces a robustness assessment framework for LLM-enhanced GNNs against poisoning attacks.
- Evaluates 24 victim models using diverse LLM/LM feature enhancers and GNN architectures.
- Demonstrates that LLM-enhanced GNNs show superior performance and robustness compared to shallow embedding baselines.
- Identifies critical factors contributing to robustness, such as effective node representation encoding.
Summary
This paper investigates the robustness of Large Language Model (LLM)-enhanced Graph Neural Networks (GNNs) against poisoning attacks, a critical concern as these models integrate both graph structures and textual attributes. The authors propose a comprehensive robustness assessment framework that evaluates 24 different victim models, combining eight LLM/Language Model (LM)-based feature enhancers with three GNN backbones. The study incorporates a diverse range of poisoning attacks, including six structural and three textual attacks, across four real-world datasets to ensure a fair evaluation and avoid ground truth leakage. The results demonstrate that LLM-enhanced GNNs significantly outperform shallow embedding-based baselines in terms of accuracy and exhibit lower Relative Drop in Accuracy (RDA) across various attack scenarios. Key factors contributing to this robustness include effective encoding of structural and label information in node representations. The paper also outlines future research directions and proposes a new combined attack along with a graph purification defense strategy, aiming to enhance the resilience of LLM-enhanced GNNs against adversarial threats.
Methodology
The authors developed a robustness assessment framework that systematically evaluates LLM-enhanced GNNs under various poisoning attacks. They integrated multiple LLM/LM-based feature enhancers with classic GNN backbones, covering a wide range of attack types and real-world datasets to ensure comprehensive evaluation.
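The Relative Drop in Accuracy (RDA) metric used throughout the evaluation is straightforward to compute (the accuracies below are hypothetical, chosen only to show the comparison):

```python
def relative_drop(acc_clean, acc_attacked):
    """Relative Drop in Accuracy (RDA): fraction of clean accuracy lost
    under a poisoning attack; lower means more robust."""
    return (acc_clean - acc_attacked) / acc_clean

# Hypothetical numbers: an LLM-enhanced model vs. a shallow-embedding one.
llm_rda = relative_drop(0.90, 0.81)      # loses 10% of its clean accuracy
shallow_rda = relative_drop(0.80, 0.56)  # loses 30% of its clean accuracy
print(round(llm_rda, 2), round(shallow_rda, 2))
```

Normalizing by clean accuracy lets models with different baseline performance be compared on robustness alone.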
Results
The experiments revealed that LLM-enhanced GNNs achieved significantly higher accuracy and lower RDA compared to shallow embedding-based models across multiple attack settings. The analysis highlighted the importance of structural and label information encoding in enhancing model robustness.
Implications
The findings suggest that LLM-enhanced GNNs can be effectively utilized in applications requiring robust performance against adversarial attacks, such as social network analysis and citation networks. The proposed framework and insights can guide future research in developing more resilient graph learning models.
Physics-Guided Transformer (PGT): Physics-Aware Attention Mechanism for PINNs
Theory
Efficient ML
Optimization
- Introduction of Physics-Guided Transformer (PGT) for improved reconstruction of physical fields.
- Embedding of physical structure into self-attention mechanisms to enhance model performance.
- Demonstrated significant error reduction in sparse data scenarios compared to existing methods.
- Unified training framework combining PDE residuals and data fidelity for robust learning.
Summary
The paper introduces the Physics-Guided Transformer (PGT), a novel neural architecture designed to reconstruct continuous physical fields from sparse observations, particularly for nonlinear systems governed by partial differential equations (PDEs). Traditional physics-informed approaches often struggle with gradient imbalance and instability, especially when data is limited. PGT addresses these challenges by embedding physical structures directly into the self-attention mechanism, incorporating a heat-kernel-derived additive bias that aligns with diffusion physics and temporal causality. The architecture utilizes physics-conditioned context tokens to enhance feature representation, which drives a FiLM-modulated sinusoidal implicit decoder for adaptive spectral response. The authors evaluate PGT on benchmark systems, including the one-dimensional heat equation and the two-dimensional incompressible Navier–Stokes equations, demonstrating significant improvements in reconstruction accuracy and stability compared to existing methods. PGT achieves a relative L2 error of 5.9×10⁻³ in 1D reconstruction with only 100 observations, marking a 38-fold improvement over physics-informed neural networks. In the 2D cylinder-wake problem, it achieves a governing-equation residual of 8.3×10⁻⁴ and a relative L2 error of 0.034, outperforming all comparable methods. The findings underscore the importance of integrating physical priors at the representational level, enhancing both optimization stability and physical coherence under data-scarce conditions.
Methodology
The PGT architecture integrates a physics-guided attention mechanism that incorporates an additive bias derived from the heat-kernel Green’s function. This allows the model to respect the causal and diffusive structures of PDEs. It employs a FiLM-modulated SIREN decoder to adaptively control frequency response based on the learned context, enabling accurate reconstruction of high-frequency details. The training framework combines PDE residuals, boundary/initial conditions, and data fidelity terms into a composite uncertainty-weighted loss.
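The idea of an additive, physics-derived attention bias can be sketched with a 1D heat-kernel term added to the attention logits (a minimal illustration, not the PGT architecture; the dimensions and diffusion time t are arbitrary):

```python
import numpy as np

def heat_kernel_bias(coords, t=0.1):
    """Additive attention bias from the 1D heat kernel
    G(x, x') ∝ exp(-(x - x')^2 / (4t)); the log is taken so the term
    adds to the logits, favoring diffusion-consistent point pairs."""
    d2 = (coords[:, None] - coords[None, :]) ** 2
    return -d2 / (4 * t)

def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the logits."""
    logits = q @ k.T / np.sqrt(q.shape[1]) + bias
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v

coords = np.linspace(0.0, 1.0, 5)      # 5 spatial query points
bias = heat_kernel_bias(coords)
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((5, 4))
out = biased_attention(q, k, v, bias)
print(out.shape)  # (5, 4)
```

Because the bias enters before the softmax, the physics prior reshapes the attention pattern without hard-masking any interactions.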
Results
PGT achieved a relative L2 error of 5.9×10⁻³ in one-dimensional heat equation reconstruction with only 100 observations, a 38-fold improvement over traditional PINNs. In the two-dimensional cylinder-wake problem, it reached a governing-equation residual of 8.3×10⁻⁴ and a relative L2 error of 0.034, outperforming all other methods in terms of both accuracy and physical consistency.
Implications
The integration of physical inductive biases into neural architectures like PGT could significantly enhance the reliability and efficiency of scientific machine learning applications, particularly in fields requiring accurate modeling of complex physical systems under data constraints.
Hybrid Deep Learning with Temporal Data Augmentation for Accurate Remaining Useful Life Prediction of Lithium-Ion Batteries
Time Series
- Introduction of CDFormer, a hybrid deep learning model for RUL prediction of lithium-ion batteries.
- Integration of CNNs, DRSNs, and Transformers for improved feature extraction and modeling of degradation dynamics.
- Implementation of novel temporal data augmentation techniques to enhance model robustness.
- Demonstrated superior performance over existing RUL prediction methods with significant error reductions.
Summary
This paper addresses the challenge of accurately predicting the remaining useful life (RUL) of lithium-ion batteries, which is crucial for effective health monitoring and maintenance. The authors propose a novel hybrid deep learning model called CDFormer, which integrates convolutional neural networks (CNNs), deep residual shrinkage networks (DRSNs), and Transformer encoders to capture multiscale temporal features from battery measurement signals such as voltage, current, and capacity. This architecture allows for the joint modeling of local and global degradation dynamics, significantly enhancing prediction accuracy. To further improve predictive reliability, the authors introduce a composite temporal data augmentation strategy that includes Gaussian noise, time warping, and time resampling, effectively addressing measurement noise and variability. The model is evaluated on two real-world datasets, demonstrating superior performance compared to conventional recurrent neural network-based and Transformer-based baselines. The results indicate a substantial reduction in prediction errors, showcasing the model's potential for practical applications in battery health monitoring and maintenance strategies.
Methodology
The proposed CDFormer model combines CNNs for local feature extraction, DRSNs for noise-resilient representation, and Transformer encoders for long-range temporal modeling. Additionally, a composite temporal data augmentation strategy is employed, incorporating Gaussian noise, time warping, and time resampling to simulate diverse battery degradation behaviors and operational variations.
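Two of the augmentations — additive Gaussian noise and time resampling — can be sketched in a few lines of numpy (illustrative parameters; time warping is omitted for brevity):

```python
import numpy as np

def augment(signal, rng, noise_std=0.01, resample_len=None):
    """Temporal data augmentation sketch: additive Gaussian noise plus
    time resampling via linear interpolation to a new length."""
    x = signal + rng.normal(0.0, noise_std, size=signal.shape)
    if resample_len is not None:
        old = np.linspace(0.0, 1.0, len(x))
        new = np.linspace(0.0, 1.0, resample_len)
        x = np.interp(new, old, x)
    return x

rng = np.random.default_rng(0)
capacity = np.linspace(1.0, 0.8, 100)   # toy battery degradation curve
aug = augment(capacity, rng, noise_std=0.005, resample_len=80)
print(aug.shape)  # (80,)
```

Noise injection mimics measurement error, while resampling simulates cycles logged at different rates — both diversify the degradation trajectories the model sees.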
Results
The CDFormer model outperformed conventional baselines and other deep learning variants, achieving average error reductions of 24.6% in root mean square error (RMSE), 30.4% in mean absolute error (MAE), and 25.9% in relative error (RE) compared to the state-of-the-art AttMoE method, highlighting its effectiveness in RUL prediction.
Implications
The findings suggest that CDFormer can significantly enhance the reliability and accuracy of RUL predictions for lithium-ion batteries, supporting better health monitoring and maintenance strategies in various industrial and everyday applications.
TinyML for Acoustic Anomaly Detection in IoT Sensor Networks
Audio & Speech
Efficient ML
Time Series
- Introduction of a compact TinyML pipeline for acoustic anomaly detection.
- Utilization of Mel Frequency Cepstral Coefficients (MFCCs) for sound feature extraction.
- Achieved 91% test accuracy and balanced F1-scores of 0.91 on the UrbanSound8K dataset.
- Demonstrates the effectiveness of on-device processing for real-time anomaly detection.
Summary
This paper presents a novel Tiny Machine Learning (TinyML) pipeline designed for real-time acoustic anomaly detection in Internet of Things (IoT) sensor networks. The authors highlight the challenges associated with cloud-based audio processing, such as latency, power consumption, and privacy concerns, particularly in resource-constrained environments. To address these issues, the proposed pipeline extracts Mel Frequency Cepstral Coefficients (MFCCs) from environmental sound signals and employs a lightweight neural network classifier optimized for deployment on microcontrollers. The model was trained and evaluated using the UrbanSound8K dataset, achieving a test accuracy of 91% and balanced F1-scores of 0.91 for both normal and anomalous sound classes. This work demonstrates the feasibility of embedded acoustic anomaly detection, providing a scalable and responsive solution for integrating acoustic intelligence into IoT systems.
Methodology
The authors developed a TinyML pipeline that extracts MFCCs from audio signals and trains a lightweight neural network classifier. The model is quantized and converted to TensorFlow Lite Micro format for compatibility with microcontroller platforms, enabling efficient on-device inference.
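The MFCC feature-extraction step can be sketched for a single frame using only numpy (a simplified extractor for illustration — real pipelines add pre-emphasis, windowing, and framing; all sizes below are arbitrary):

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_mels=10, n_mfcc=5):
    """Minimal MFCC for one frame: power spectrum -> triangular mel
    filterbank -> log -> DCT-II. A sketch of the feature step, not a
    production extractor."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    mel = lambda f: 2595 * np.log10(1 + f / 700)   # Hz -> mel
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)   # mel -> Hz
    pts = inv(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        l, c, r = pts[i], pts[i + 1], pts[i + 2]
        fb[i] = np.clip(np.minimum((freqs - l) / (c - l),
                                   (r - freqs) / (r - c)), 0, None)
    logmel = np.log(fb @ spec + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n + 0.5)[None, :]
                 * np.arange(n_mfcc)[:, None])
    return dct @ logmel

# One 25 ms frame of a 440 Hz tone at 16 kHz.
frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
feats = mfcc_frame(frame)
print(feats.shape)  # (5,)
```

On a microcontroller the same computation runs in fixed-point over streamed frames, feeding the quantized classifier.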
Results
The proposed model achieved a test accuracy of 91% and balanced F1-scores of 0.91 across normal and anomalous sound classes, confirming the effectiveness of the approach for real-time anomaly detection in IoT sensor networks.
Implications
This research has significant implications for enhancing safety and context awareness in IoT applications, particularly in urban environments. The ability to perform acoustic anomaly detection on edge devices can lead to improved privacy, reduced latency, and lower power consumption, making it suitable for a wide range of applications in smart cities and environmental monitoring.
Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes
Reinforcement Learning
Theory
Optimization
- Introduces an optimistic actor-critic framework for linear MDPs using parametric log-linear policies.
- Utilizes a logit-matching regression objective for the actor and Langevin Monte Carlo for the critic.
- Achieves state-of-the-art sample complexity in both on-policy and off-policy settings.
- Demonstrates practical effectiveness through experiments in linear MDPs and Atari environments.
Summary
This paper addresses the gap between the theoretical analyses and the practical implementations of actor-critic methods in reinforcement learning, focusing on linear Markov Decision Processes (MDPs). The authors propose an optimistic actor-critic framework that utilizes parametric log-linear policies, which are more efficient for sampling compared to traditional implicit policies. The actor employs a tractable logit-matching regression objective, while the critic leverages approximate Thompson sampling through Langevin Monte Carlo to derive optimistic value estimates. The authors demonstrate that their algorithm achieves state-of-the-art sample complexity, specifically Õ(ϵ⁻⁴) for on-policy and Õ(ϵ⁻²) for off-policy settings, aligning theoretical performance with practical applicability. Experimental results validate the effectiveness of their approach in both linear MDPs and more complex environments like Atari games, showcasing its robustness and efficiency in real-world applications.
Methodology
The proposed framework consists of an actor that employs a logit-matching regression objective to optimize parametric log-linear policies, and a critic that uses Langevin Monte Carlo for approximate Thompson sampling to provide optimistic value estimates. The algorithm is analyzed for its sample complexity in both on-policy and off-policy settings.
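The logit-matching idea can be illustrated with a toy sketch: given critic-produced optimistic Q-estimates over linear features, fit the log-linear policy's parameters by least squares so that its logits track the temperature-scaled Q-values. The feature dimensions, temperature, and random targets below are illustrative assumptions, not the paper's exact objective or its Langevin Monte Carlo critic.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_states, n_actions = 8, 50, 4

# Random features phi(s, a) and optimistic Q-estimates from a critic
phi = rng.normal(size=(n_states, n_actions, d))
q_optimistic = rng.normal(size=(n_states, n_actions))

# Logit matching: choose theta so that theta^T phi(s,a) ~ Q(s,a)/tau,
# a tractable regression instead of a policy-gradient step.
tau = 0.5
X = phi.reshape(-1, d)
y = (q_optimistic / tau).reshape(-1)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The resulting log-linear policy is cheap to sample from
logits = phi @ theta
policy = np.exp(logits - logits.max(axis=1, keepdims=True))
policy /= policy.sum(axis=1, keepdims=True)
print(policy.shape)
```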
Results
The algorithm achieves sample complexities of Õ(ϵ⁻⁴) for on-policy and Õ(ϵ⁻²) for off-policy settings, matching the best-known theoretical results while being more aligned with practical implementations. Experimental results confirm the algorithm's effectiveness in linear MDPs and its applicability to more complex environments like Atari games.
Implications
The findings suggest that the proposed optimistic actor-critic framework can enhance the efficiency of reinforcement learning algorithms, making them more applicable to real-world scenarios where exploration and sample efficiency are critical. This could lead to advancements in various fields, including robotics and automated decision-making systems.
The Unreasonable Effectiveness of Scaling Laws in AI
Theory
Efficient ML
Interpretability
- Classical scaling laws effectively predict AI progress despite diminishing returns.
- The compute variable should be interpreted as logical compute, abstracting from implementation details.
- Diminishing returns indicate rising operational burdens rather than merely a flatter performance curve.
- Efficiency improvements in hardware and algorithms are crucial for continued AI progress.
Summary
This paper explores the effectiveness of classical AI scaling laws, emphasizing their predictive power and portability across different AI architectures and regimes. The author argues that the compute variable in these laws should be interpreted as 'logical compute,' a measure that abstracts away from specific implementation details. This abstraction allows for continued AI progress despite diminishing returns, as it highlights the importance of efficiency in realizing compute. The paper introduces a framework that separates logical compute from the physical burden of resource realization, suggesting that diminishing returns reflect rising operational burdens rather than a simple flattening of performance curves. By making efficiency explicit and allowing it to compound over time, the author posits that ongoing progress in AI relies on repeated efficiency improvements in hardware, algorithms, and systems. The paper aims to clarify the classical scaling laws and their implications for future AI development.
Methodology
The author restates classical scaling laws, identifies omitted factors, and introduces a framework that separates logical compute from physical resource burdens. The paper employs theoretical analysis to derive a time-indexed efficiency-doubling extension of the classical scaling law.
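The efficiency-doubling idea can be made concrete with a toy sketch: keep the classical power law L(C) = a·C^(−α) in terms of logical compute, and let a fixed physical budget buy exponentially more logical compute as efficiency compounds over time. The functional form, constants, and doubling time below are illustrative assumptions, not the paper's derived extension.

```python
def loss(logical_compute, a=10.0, alpha=0.05):
    """Classical power-law scaling: L = a * C^(-alpha)."""
    return a * logical_compute ** (-alpha)

def logical_compute(physical_compute, t, doubling_time=2.0):
    """Efficiency compounds: the same physical budget buys 2^(t/T_d) more
    logical compute after t time units (illustrative form)."""
    return physical_compute * 2.0 ** (t / doubling_time)

c_phys = 1e21  # a fixed physical compute budget
for t in [0, 2, 4, 6]:
    print(t, loss(logical_compute(c_phys, t)))  # loss falls with no extra hardware
```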
Results
The paper demonstrates that classical scaling laws remain robust across different architectures and training regimes. It establishes a clear distinction between logical compute and the physical resources required to achieve it, showing that efficiency plays a critical role in sustaining AI progress.
Implications
The findings suggest that future AI advancements will depend on continuous improvements in efficiency across various dimensions, including hardware and algorithms. This perspective may guide researchers and practitioners in optimizing AI systems and understanding the limits of scaling.
Bit-Identical Medical Deep Learning via Structured Orthogonal Initialization
Time Series
Theory
Efficient ML
- Introduces a framework for verified bit-identical training in deep learning.
- Eliminates randomness from weight initialization, batch ordering, and GPU operations.
- Structured orthogonal initialization outperforms traditional Kaiming initialization.
- Demonstrates significant reductions in variance for rare clinical classes in ECG classification.
Summary
This paper addresses the non-deterministic nature of deep learning training, particularly in medical applications where reproducibility is critical. The author proposes a framework for achieving verified bit-identical training by eliminating three sources of randomness: weight initialization, batch ordering, and non-deterministic GPU operations. The method employs structured orthogonal basis functions for weight initialization, golden ratio scheduling for batch ordering, and deterministic GPU architectures. The framework was tested on PTB-XL ECG rhythm classification, demonstrating that structured initialization significantly outperforms Kaiming initialization, reducing aggregate variance and per-class variability, especially for rare clinical classes. The results indicate that the proposed method not only achieves reproducibility but also maintains performance across various medical tasks, confirming its utility in clinical settings where consistent model behavior is essential.
Methodology
The methodology involves structured orthogonal initialization using basis functions (DCT, Hadamard, Hartley) to produce deterministic weights, golden ratio scheduling for consistent batch ordering, and the selection of GPU architectures that ensure deterministic operations. The training process is verified through MD5 hash comparisons to ensure bit-identical outputs across runs.
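A minimal sketch of two of these ingredients, assuming an orthonormal DCT-II basis for deterministic weights and a simple golden-ratio permutation for batch ordering (the paper's exact constructions may differ):

```python
import hashlib
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II matrix: fully deterministic, no RNG involved."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    W = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    W[0] /= np.sqrt(2.0)
    return W

def golden_ratio_order(n_batches):
    """Deterministic batch ordering via the golden ratio (illustrative)."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    return np.argsort((np.arange(n_batches) * phi) % 1.0)

W = dct_basis(64)
assert np.allclose(W @ W.T, np.eye(64), atol=1e-10)  # orthogonal weights

# Bit-identical check: the same construction always hashes identically
h1 = hashlib.md5(dct_basis(64).tobytes()).hexdigest()
h2 = hashlib.md5(dct_basis(64).tobytes()).hexdigest()
print(h1 == h2)  # True
print(golden_ratio_order(8))
```

The MD5 comparison mirrors the paper's verification step: because every source of randomness is removed, two independent runs produce byte-identical weights.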
Results
The study found that structured initialization reduced aggregate variance by 2-3 times and per-class variability on rare rhythms by up to 7.5 times compared to Kaiming initialization. Cross-domain validation showed no performance penalties on standard tasks, and external ECG database evaluations confirmed high generalization performance with AUC values exceeding 0.93 for AFIB detection.
Implications
The findings suggest that deterministic training methods can enhance the reliability of deep learning models in medical applications, which is crucial for clinical deployment and regulatory compliance. This approach may lead to improved patient outcomes by ensuring consistent model predictions.
Kernel Dynamics under Path Entropy Maximization
Theory
- The kernel function is treated as a dynamical variable, allowing for a new perspective on kernel evolution.
- The optimization landscape is endogenous, meaning that the geometry of the probability space changes as kernels evolve.
- Fixed points of the dynamics correspond to self-consistent kernels that reinforce their own distinction structures.
- The thermodynamic cost of kernel change is quantitatively linked to the mutual information gained.
Summary
This paper introduces a variational framework that treats the kernel function as a dynamical variable subject to path entropy maximization, known as Maximum Caliber (MaxCal). The kernel function is viewed as the foundational object that encodes the distinctions an agent can represent, leading to an effective geometry on the probability space. The author formulates fixed-point conditions for self-consistent kernels and proposes the renormalization group (RG) flow as a special case. The evolution of the neural tangent kernel (NTK) during deep network training is suggested as an empirical instantiation of this framework. The work establishes that the thermodynamic cost of kernel change is bounded by the mutual information unlocked by the updated kernel, linking information theory with kernel dynamics. The paper situates its findings within the broader context of assembly theory and MaxCal literature, and it poses open questions to guide future empirical and mathematical exploration.
Methodology
The paper employs a variational approach to maximize path entropy over trajectories in kernel space, deriving fixed-point conditions and stability criteria. It also discusses the implications of the framework in various contexts, including RG flow and NTK evolution.
Results
The main results include the formulation of a new framework for understanding kernel dynamics, the establishment of fixed-point conditions for self-consistent kernels, and the identification of thermodynamic bounds on kernel change. The paper also provides conjectural interpretations of these results in biological and scientific contexts.
Implications
This work has potential implications for understanding the dynamics of learning systems, particularly in how kernels evolve over time in response to new information. It may also inform approaches in biological evolution and adaptive systems, as well as contribute to the development of more effective machine learning algorithms.
Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation
Time Series
Audio & Speech
- The study compares three sample selection methods for annotating biomedical time-series data.
- Interactive 2D visualizations (2DVs) significantly enhance the annotation process, particularly in capturing rare classes.
- Variability in label distribution from 2DV can decrease classification performance when using individual annotator labels.
- Farthest-first traversal (FAFT) excels in scenarios with limited annotation budgets.
Summary
This paper addresses the challenges of annotating biomedical time-series data, which is crucial for developing reliable machine learning models in healthcare. The authors compare three sample selection methods for data annotation: random sampling (RND), farthest-first traversal (FAFT), and a novel graphical user interface-based method that utilizes interactive 2D visualizations (2DVs). The study involves twelve annotators, categorized as experts or non-experts, who annotate data under a constrained budget across two biomedical tasks: infant motility assessment (IMA) and speech emotion recognition (SER). The results indicate that the 2DV method outperforms the others in aggregating labels across annotators, particularly excelling in capturing rare classes in IMA. However, it also leads to higher variability in label distribution, which can negatively impact classification performance when models are trained on individual annotators' labels. In contrast, FAFT performed better in scenarios with limited annotations. The study highlights the potential of 2DV-based sampling for enhancing the annotation process, especially when resources are not severely limited, and provides insights into the subjective experiences of annotators, noting that 2DV made the task more engaging.
Methodology
The authors conducted a proof-of-concept study involving twelve human annotators (experts and non-experts) who annotated data using three sample selection methods: random sampling (RND), farthest-first traversal (FAFT), and a GUI-based approach with interactive 2D visualizations (2DVs) through the Time-Series Explorer (TSExplorer). The study evaluated these methods across two biomedical datasets and four classification tasks.
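Farthest-first traversal itself is a standard greedy max-min heuristic, sketched here over stand-in embeddings (the distance metric and feature space used in the study are not specified in this summary):

```python
import numpy as np

def farthest_first(X, k, start=0):
    """Greedy max-min selection: each new sample is the point farthest
    from everything chosen so far (the classic FAFT / k-center heuristic)."""
    chosen = [start]
    d_min = np.linalg.norm(X - X[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d_min))
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))  # stand-in for time-series embeddings
picks = farthest_first(X, k=10)
print(picks)
```

Because each pick maximizes the minimum distance to the selected set, the budget is spread across the embedding space, which is why FAFT covers diverse samples well under tight budgets.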
Results
The 2DV method outperformed RND and FAFT in aggregating labels across annotators and effectively captured rare classes in IMA. However, it also resulted in greater variability in label distribution, which negatively affected classification performance when models were trained on individual annotators' labels. FAFT showed superior performance in cases with limited annotation budgets, while RND was identified as the safest option when annotator expertise was uncertain.
Implications
The findings suggest that 2DV-based sampling could significantly improve the efficiency and effectiveness of biomedical time-series data annotation, particularly in settings where the annotation budget is not highly constrained. This approach may lead to better model performance and more engaging annotation experiences for users.
Geometric Evolution Graph Convolutional Networks: Enhancing Graph Representation Learning via Ricci Flow
Graph Learning
- Introduction of discrete Ricci flow into deep graph representation learning.
- Integration of LSTM and GCN for enhanced node representation learning.
- Demonstrated state-of-the-art performance on multiple benchmark datasets.
- Particularly effective in capturing structural heterophily and long-range interactions.
Summary
The paper introduces the Geometric Evolution Graph Convolutional Network (GEGCN), a novel framework aimed at enhancing graph representation learning by modeling the geometric evolution of graphs through discrete Ricci flow. Traditional Graph Neural Networks (GNNs) often treat graphs as static structures, neglecting the rich global geometric properties. GEGCN addresses this limitation by employing Long Short-Term Memory (LSTM) networks to model the structural sequences generated by Ricci flow, allowing for dynamic representations that are integrated into a Graph Convolutional Network (GCN). The proposed method captures multi-scale structural information, leading to improved node embeddings for various downstream tasks. Experimental results demonstrate that GEGCN outperforms existing state-of-the-art methods, particularly excelling in tasks involving heterophilic graphs, thereby highlighting the importance of geometric dynamics in graph learning.
Methodology
GEGCN generates a sequence of evolving graphs using discrete Ricci flow, capturing structural dynamics at multiple scales. An LSTM is employed to model these dynamics, and the resulting representations are incorporated into a GCN to enhance node embeddings for classification tasks.
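As a rough illustration of the Ricci-flow ingredient, the sketch below uses the simplified Forman curvature for unweighted graphs, 4 − deg(u) − deg(v), and a toy flow update that grows negatively curved edges to produce a sequence of weighted graphs; the paper's discrete Ricci flow variant and update rule may differ.

```python
from collections import defaultdict

def forman_curvature(edges, deg):
    """Simplified Forman curvature on an unweighted graph: 4 - deg(u) - deg(v)."""
    return {(u, v): 4 - deg[u] - deg[v] for (u, v) in edges}

def ricci_flow_sequence(edges, n_steps=3, eps=0.1):
    """Evolve edge weights along a toy discrete Ricci flow, yielding one
    weighted graph per step (the sequence an LSTM could then model)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    kappa = forman_curvature(edges, deg)  # held fixed here for simplicity
    w = {e: 1.0 for e in edges}
    sequence = []
    for _ in range(n_steps):
        w = {e: max(w[e] * (1.0 - eps * kappa[e]), 1e-6) for e in edges}
        sequence.append(dict(w))
    return sequence

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # a triangle with a pendant node
seq = ricci_flow_sequence(edges)
print(len(seq), seq[-1])
```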
Results
Extensive experiments show that GEGCN consistently achieves superior performance on node classification tasks across various benchmark datasets, particularly excelling on heterophilic graphs. The model effectively captures long-range interactions and structural heterophily, outperforming strong baseline methods.
Implications
The findings suggest that integrating geometric dynamics into graph representation learning can significantly enhance model performance, paving the way for future research that unifies geometric analysis with graph neural networks. This approach could be applied in various domains where graph structures are prevalent, such as social networks, biological networks, and recommendation systems.
Machine Learning-Assisted High-Dimensional Matrix Estimation
Optimization
Theory
Efficient ML
- Introduces a machine learning-assisted approach to high-dimensional matrix estimation.
- Enhances LADMM with learnable parameters for improved accuracy and convergence speed.
- Proves theoretical convergence and faster convergence rates for the reparameterized LADMM.
- Validates the proposed method against classical optimization techniques.
Summary
This paper addresses the computational challenges associated with high-dimensional matrix estimation, particularly focusing on covariance and precision matrices, which are essential in multivariate statistics. The authors propose a novel approach that integrates machine learning with optimization techniques, specifically using the Linearized Alternating Direction Method of Multipliers (LADMM). They enhance the traditional LADMM by introducing learnable parameters and modeling proximal operators with neural networks, which improves estimation accuracy and accelerates convergence. The paper establishes the theoretical convergence of both the original and reparameterized LADMM, demonstrating that the latter achieves a faster convergence rate. The proposed methodology is validated through comparisons with classical optimization algorithms across various matrix structures and dimensions, showcasing its effectiveness in practical applications. This work bridges the gap between theoretical advancements in high-dimensional statistics and the computational realities faced in empirical studies.
Methodology
The authors utilize the Linearized Alternating Direction Method of Multipliers (LADMM) as a foundational optimization technique. They introduce learnable parameters and model proximal operators using neural networks to enhance the optimization process. Theoretical proofs are provided for the convergence properties of both the standard and reparameterized LADMM.
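The proximal step at the heart of such schemes is easy to illustrate: soft-thresholding, the proximal operator of the ℓ1 penalty, applied here as a single shrinkage step on a noisy sample covariance. In a learned LADMM the threshold and step sizes would become trainable parameters; the setup below is a generic illustration, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau*||.||_1 -- the subproblem LADMM iterates;
    a learned variant would make tau (and step sizes) trainable."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(0)
p, n = 30, 200
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))  # banded truth
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)

# One proximal step: shrink noisy off-diagonal entries toward zero
est = soft_threshold(S, 0.15)
np.fill_diagonal(est, np.diag(S))
print((est == 0).mean())  # fraction of entries zeroed out
```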
Results
The proposed machine learning-assisted LADMM demonstrates improved estimation accuracy and faster convergence rates compared to traditional optimization algorithms. The effectiveness of the method is validated through empirical comparisons across different high-dimensional matrix structures.
Implications
This research has significant implications for fields requiring high-dimensional matrix estimation, such as genomics, finance, and neuroscience. The integration of machine learning with optimization can lead to more efficient and accurate statistical analyses in these domains.
Rethinking Language Model Scaling under Transferable Hypersphere Optimization
NLP
Large Language Models
Optimization
- Introduction of HyperP framework for optimal learning rate transfer across various model configurations.
- Demonstration of transferable stability in training dynamics under hypersphere optimization.
- Development of SqrtGate mechanism for improved MoE performance and load balancing.
- Achieved 1.58× compute efficiency over a strong baseline at large scales.
Summary
This paper introduces HyperP (Hypersphere Parameterization), a novel framework for optimizing large language models (LLMs) by transferring optimal learning rates across various model configurations while ensuring stability during training. The authors highlight the limitations of existing hyperparameter transfer laws, which primarily focus on first-order optimizers and do not guarantee stability at scale. By employing hypersphere optimization, which constrains weight matrices to a fixed-norm hypersphere, the authors demonstrate that it is possible to achieve both optimal scaling efficiency and transferable stability. They derive new learning rate transfer laws applicable across model width, depth, training tokens, and Mixture-of-Experts (MoE) granularity. The paper also presents SqrtGate, a gating mechanism that enhances MoE performance. Empirical results show that HyperP achieves significant compute efficiency improvements and maintains stability indicators as training scales up, making it a promising approach for future LLM development.
Methodology
The authors developed the HyperP framework, which utilizes hypersphere optimization to derive learning rate transfer laws across different model configurations. They employed the Muon optimizer under the Frobenius-sphere constraint and conducted empirical evaluations to assess stability and efficiency. The SqrtGate mechanism was also introduced to enhance MoE performance.
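The core constraint is straightforward to sketch: after every optimizer update, retract the weight matrix back onto a fixed Frobenius-norm sphere. The sketch below uses plain gradient steps with random stand-in gradients purely to show the constraint; the paper pairs it with the Muon optimizer.

```python
import numpy as np

def project_to_sphere(W, radius=1.0):
    """Constrain a weight matrix to a fixed Frobenius-norm hypersphere."""
    return W * (radius / np.linalg.norm(W))

rng = np.random.default_rng(0)
W = project_to_sphere(rng.normal(size=(64, 64)))
lr = 0.1
for _ in range(100):
    grad = rng.normal(size=W.shape)        # stand-in for a real gradient
    W = project_to_sphere(W - lr * grad)   # retraction after each update
print(np.linalg.norm(W))  # stays at 1.0 regardless of step count
```

Because the weight norm can never drift, norm-driven instability indicators stay bounded by construction, which is the property the transfer laws build on.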
Results
HyperP demonstrated a 1.58× compute efficiency improvement over a strong Muon baseline at 6 × 10^21 FLOPs. The framework ensured that all monitored instability indicators remained bounded and non-increasing as training FLOPs increased. The SqrtGate mechanism significantly reduced Z-value peaks and improved expert balance in MoE models.
Implications
The findings suggest that hypersphere optimization can fundamentally change how large language models are trained, allowing for more stable and efficient scaling. This could lead to better performance in LLMs and more effective use of computational resources in future AI applications.
SIMR-NO: A Spectrally-Informed Multi-Resolution Neural Operator for Turbulent Flow Super-Resolution
Theory
- Introduces SIMR-NO, a novel framework for turbulent flow super-resolution.
- Combines deterministic interpolation with spectral corrections for improved accuracy.
- Achieves significant error reduction compared to existing methods like FNO and EDSR.
- Successfully reproduces energy and enstrophy spectra, ensuring physical fidelity.
Summary
The paper addresses the challenge of reconstructing high-resolution turbulent flow fields from severely under-resolved observations, a critical problem in computational fluid dynamics and scientific machine learning. Traditional interpolation methods are inadequate for capturing fine-scale structures, while existing deep learning approaches often rely on convolutional architectures that lack the necessary spectral and multiscale inductive biases. The authors propose the Spectrally-Informed Multi-Resolution Neural Operator (SIMR-NO), a hierarchical operator learning framework that decomposes the inverse mapping across intermediate spatial resolutions. This method integrates deterministic interpolation priors with spectrally gated Fourier residual corrections and local refinement modules to recover fine-scale features beyond the truncated Fourier basis. The effectiveness of SIMR-NO is evaluated on Kolmogorov-forced two-dimensional turbulence, reconstructing 128×128 vorticity fields from coarse 8×8 observations, achieving a mean relative ℓ2 error of 26.04% across 201 independent test realizations. The method outperforms existing techniques, reducing reconstruction error by significant margins and accurately reproducing the ground-truth energy and enstrophy spectra, demonstrating its capability for physically consistent super-resolution of turbulent flow fields.
Methodology
The SIMR-NO framework employs a hierarchical operator learning approach that factorizes the inverse mapping across multiple spatial resolutions. It integrates deterministic interpolation with spectrally gated Fourier residual corrections at each stage and incorporates local refinement modules to enhance the recovery of fine-scale spatial features.
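The deterministic interpolation prior can be illustrated with Fourier zero-padding, which upsamples a periodic field exactly on the resolved wavenumbers; the spectrally gated residual corrections and refinement modules are the learned parts and are omitted here. Grid sizes below are illustrative.

```python
import numpy as np

def spectral_upsample(field, out_n):
    """Deterministic interpolation prior: zero-pad the Fourier spectrum of a
    coarse periodic field onto a finer grid (exact for band-limited signals)."""
    n = field.shape[0]
    F = np.fft.fftshift(np.fft.fft2(field))
    pad = (out_n - n) // 2
    F_big = np.pad(F, pad) * (out_n / n) ** 2  # rescale for the larger grid
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_big)))

# Coarse 8x8 observation of a low-wavenumber mode
x = np.arange(8) * 2 * np.pi / 8
coarse = np.sin(x)[None, :] + np.cos(x)[:, None]
fine = spectral_upsample(coarse, 128)
print(fine.shape)
```

Everything beyond the truncated Fourier basis of the coarse grid is invisible to this prior, which is exactly the gap the residual corrections are trained to fill.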
Results
The proposed method achieved a mean relative ℓ2 error of 26.04% in reconstructing high-resolution vorticity fields from coarse observations. It demonstrated a 31.7% reduction in reconstruction error compared to the Fourier Neural Operator (FNO), 26.0% compared to Enhanced Deep Super-Resolution (EDSR), and 9.3% compared to LapSRN. Additionally, SIMR-NO was the only method that accurately reproduced the ground-truth energy and enstrophy spectra across the full resolved wavenumber range.
Implications
The findings suggest that SIMR-NO can significantly enhance the accuracy of turbulent flow reconstructions, which is crucial for various applications in computational fluid dynamics, climate modeling, and other scientific fields where understanding turbulent dynamics is essential.
TED: Training-Free Experience Distillation for Multimodal Reasoning
Multimodal
Efficient ML
Large Language Models
- TED enables knowledge distillation without parameter updates, making it suitable for resource-constrained environments.
- The framework utilizes a teacher-guided experience generation and compression mechanism to distill reusable reasoning principles.
- Experiments show substantial performance improvements on multimodal reasoning tasks with only 100 training samples.
- TED reduces training costs by over 20 times compared to conventional parameter-based distillation methods.
Summary
The paper introduces TED, a novel training-free framework for knowledge distillation that focuses on context-based experience rather than parameter updates. Traditional knowledge distillation methods require extensive training data and frequent parameter adjustments, which can be impractical in resource-constrained environments. TED addresses this by allowing a student model to generate multiple reasoning trajectories for each input while a teacher model independently produces its own reasoning. The teacher evaluates these trajectories against its own reasoning and the ground-truth answer, extracting generalized experiences that encapsulate effective reasoning patterns. A significant challenge in this approach is managing the growth of experiences and noise accumulation, which TED mitigates through an experience compression mechanism that tracks usage statistics and selectively merges or removes low-utility experiences. The framework was evaluated on multimodal reasoning benchmarks, demonstrating that TED can significantly enhance performance with minimal training data while reducing training costs by over 20 times compared to traditional methods. This indicates that meaningful knowledge transfer can occur through contextual experience without the need for parameter updates.
Methodology
TED employs a context-based distillation approach where the student model generates reasoning trajectories, and the teacher model evaluates these against its own reasoning and ground-truth answers. It utilizes an experience compression mechanism to manage the accumulation of experiences by tracking their utility and selectively merging or removing low-value experiences.
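A minimal sketch of the compression mechanism, assuming a simple win-rate utility with pseudocounts and a fixed-capacity pool (the paper's actual utility tracking and merge rules are not detailed in this summary):

```python
from dataclasses import dataclass

@dataclass
class Experience:
    text: str
    uses: int = 0
    wins: int = 0  # times this experience helped reach the right answer

class ExperienceBank:
    """Bounded pool of distilled experiences; low-utility entries are
    evicted, mimicking TED's compression mechanism (simplified)."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pool = []

    def add(self, text):
        self.pool.append(Experience(text))
        self._compress()

    def record(self, text, helped):
        for e in self.pool:
            if e.text == text:
                e.uses += 1
                e.wins += int(helped)

    def _compress(self):
        if len(self.pool) > self.capacity:
            # Utility = win rate with a pseudocount; drop the weakest entry
            self.pool.sort(key=lambda e: (e.wins + 1) / (e.uses + 2), reverse=True)
            self.pool.pop()

bank = ExperienceBank(capacity=2)
bank.add("check units before the final step")
bank.record("check units before the final step", helped=True)
bank.add("re-read the figure axes")
bank.add("try a smaller sub-problem first")  # forces an eviction
print([e.text for e in bank.pool])
```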
Results
On the MathVision benchmark, TED improved the performance of the Qwen3-VL-8B model from 0.627 to 0.702, and on VisualPuzzles from 0.517 to 0.561, all while using only 100 training samples. The results indicate that TED achieves performance comparable to fully trained parameter-based distillation methods under low-data conditions.
Implications
TED's training-free approach to knowledge distillation can be particularly beneficial for applications in edge computing and scenarios where model retraining is impractical. It opens avenues for efficient model adaptation and deployment in rapidly changing environments.
Hierarchy-Guided Topology Latent Flow for Molecular Graph Generation
Generative Models
Graph Learning
- HLTF explicitly generates bond topology alongside 3D coordinates to improve molecular validity.
- The model employs a planner-executor framework that integrates a latent hierarchy for global context.
- HLTF achieves high stability and validity rates on benchmark datasets, outperforming existing methods.
- The approach reduces false-valid samples that pass basic validation but fail stricter checks.
Summary
The paper addresses the challenge of generating chemically valid 3D molecular structures, which is often hindered by discrete bond topology. The authors introduce the Hierarchy-Guided Latent Topology Flow (HLTF), a planner-executor model that generates bond graphs alongside 3D coordinates. This model incorporates a latent multi-scale plan to provide global context and employs a constraint-aware sampler to mitigate topology-driven failures. The HLTF framework evolves bond logits through feasibility-preserving dynamics and conditions topology decisions based on a hierarchical plan. Evaluations on the QM9 and GEOM-DRUGS datasets demonstrate that HLTF achieves high atom stability and valid-and-unique rates, outperforming existing methods in terms of plausibility and reducing false-valid samples. The contributions of the paper include a novel approach to topology generation, a hierarchy-conditioned prediction mechanism, and a constraint-aware sampling technique that collectively enhance the feasibility of generated molecular structures.
Methodology
The HLTF framework consists of three main components: a latent hierarchy plan that encodes multi-scale structure, a topology executor that predicts bond types conditioned on this hierarchy, and an E(3)-equivariant geometry predictor for generating 3D coordinates. The sampling process integrates a coupled ODE in logit space for categorical variables and Euclidean space for coordinates, utilizing annealed energy guidance to ensure topology feasibility.
Results
On the QM9 dataset, HLTF achieves 98.8% atom stability and 92.9% valid-and-unique rates, surpassing the best reported baseline by 0.9%. For the GEOM-DRUGS dataset, it reaches 85.5% validity and 85.0% valid-unique-novel rates without post-processing, and 92.2% validity after standardized relaxation, remaining competitive with the best post-processed methods.
Implications
The HLTF model has significant implications for drug discovery and molecular design, providing a robust framework for generating valid molecular structures that adhere to chemical constraints. Its ability to reduce false-valid samples can enhance the reliability of generated molecules for practical applications in pharmaceuticals and materials science.
From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification
Large Language Models
Theory
Efficient ML
- Extension of the Semantic Router DSL to multi-step agent workflows.
- Introduction of a multi-target compilation framework for generating artifacts across different layers.
- Establishment of a four-pillar analysis framework for evaluating the proposed approach.
- Guarantees for auditability, cost efficiency, verifiability, and tunability are maintained across all targets.
Summary
This paper introduces an extension of the Semantic Router DSL, a non-Turing-complete policy language originally designed for per-request LLM inference routing, to support stateful, multi-step agent workflows. The authors propose a multi-target compilation approach where a single declarative source file can generate verified decision nodes for orchestration frameworks, Kubernetes artifacts, and protocol-boundary gates. This unified policy language aims to eliminate policy drift by ensuring that changes in signal definitions propagate across all targets seamlessly. The paper outlines four main contributions: a multi-target compilation framework, a four-pillar analysis of auditability, cost efficiency, verifiability, and tunability, a complementarity argument for consistent policy across different layers, and a critical analysis of the proposed approach. The findings suggest that the unified policy framework enhances the reliability and efficiency of LLM-powered applications by providing a structured and verifiable decision-making process across various operational layers.
Methodology
The authors developed a multi-target compiler that translates the same Semantic Router DSL source file into various deployment artifacts, including orchestration decision nodes, Kubernetes configurations, and protocol-boundary gates. The methodology emphasizes the non-Turing-complete nature of the DSL, which allows for guaranteed conflict-free compilation and comprehensive verification across different operational layers.
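The multi-target idea can be sketched in miniature as follows. This is an invented illustration, not the actual Semantic Router DSL or its compiler: the policy schema, signal name, and artifact formats below are all hypothetical stand-ins for the paper's targets.

```python
# Hypothetical sketch: one declarative policy source compiled to several
# deployment targets. All field names and target formats are invented.
policy = {
    "name": "route-sensitive-queries",
    "signal": "contains_pii",        # shared signal definition
    "action": "route_to_private_model",
}

def compile_orchestration(p):
    # Decision node for an agent-orchestration framework.
    return {"node": p["name"], "if": p["signal"], "then": p["action"]}

def compile_k8s(p):
    # Kubernetes-style config fragment carrying the same policy.
    return {"kind": "ConfigMap",
            "metadata": {"name": p["name"]},
            "data": {"signal": p["signal"], "action": p["action"]}}

def compile_gateway(p):
    # Protocol-boundary gate: enforce the routed action at the network edge.
    return {"gate": p["name"], "require": p["action"], "on_signal": p["signal"]}

# Every artifact is derived from the same source, so editing the signal
# definition once propagates to all layers -- the drift-elimination property.
artifacts = [f(policy) for f in (compile_orchestration, compile_k8s, compile_gateway)]
```

Because the source language is non-Turing-complete, a compiler like this can statically check every generated artifact, which is what enables the verification guarantees described above.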
Results
The proposed approach successfully demonstrates that a single source file can generate consistent and verified artifacts for multiple layers of an LLM-powered application. The analysis shows that the guarantees established for inference routing extend to agent workflows, with clear boundaries on what each pillar of the analysis covers. The critical examination of the approach highlights its strengths and potential limitations, providing a nuanced understanding of its applicability.
Implications
The findings suggest that adopting a unified policy language can significantly streamline the development and deployment of LLM applications, reducing the risk of policy drift and enhancing compliance with safety and privacy standards. This approach could lead to more robust and efficient agent orchestration in various applications, including automated decision-making systems and complex multi-agent environments.
ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing
Large Language Models
Efficient ML
NLP
- ITQ3_S utilizes FWHT for rotation-domain adaptive quantization, improving weight distribution for better quantization fidelity.
- The method achieves zero-error round-trip fidelity between quantization and inference, outperforming traditional 3-bit quantization methods.
- Empirical results show ITQ3_S achieves competitive perplexity with FP16 models while enhancing throughput significantly.
- The approach is specifically designed for consumer-grade GPUs, addressing the challenges of deploying large language models efficiently.
Summary
The paper introduces ITQ3_S (Interleaved Ternary Quantization – Specialized), a novel 3-bit weight quantization method designed for large language models (LLMs). ITQ3_S integrates TurboQuant, a rotation-domain adaptive quantization strategy based on the Fast Walsh-Hadamard Transform (FWHT), to address the precision loss typically associated with conventional 3-bit quantization methods. By pre-rotating the weight space, ITQ3_S mitigates the impact of heavy-tailed weight distributions and inter-channel outliers, leading to a near-Gaussian distribution suitable for uniform ternary coding. The authors present a mathematically rigorous dequantization procedure that ensures zero-error fidelity between offline quantization and online inference. Empirical results demonstrate that ITQ3_S achieves perplexity comparable to FP16 baselines while providing over 1.5× throughput compared to 4-bit alternatives on NVIDIA RTX 5090 hardware. This work establishes ITQ3_S as a practical solution for deploying high-fidelity LLMs on consumer-grade hardware, effectively bridging the gap between model capability and deployability.
Methodology
The methodology involves a deterministic FWHT-based rotation of weight vectors to minimize quantization error, followed by an interleaved ternary coding scheme that optimally packs 3-bit values into 32-bit words. A fused 256-point Inverse FWHT CUDA kernel is employed to reconstruct weights in shared memory, eliminating off-chip memory traffic penalties.
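The rotate-quantize-unrotate pipeline can be sketched as follows. The FWHT itself is standard, but the 3-bit signed grid and the max-based scale rule here are illustrative stand-ins for the paper's interleaved ternary coding, and no bit-packing or CUDA kernel is reproduced.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (length must be a power of two)."""
    x = x.copy().astype(np.float64)
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)   # orthonormal scaling, so fwht(fwht(w)) == w

# Rotate a weight vector, quantize on a coarse signed grid, invert the rotation.
rng = np.random.default_rng(0)
w = rng.standard_normal(256)
w_rot = fwht(w)                     # rotation near-Gaussianizes heavy tails
scale = np.abs(w_rot).max() / 3.0   # illustrative 3-bit grid: levels -3..3
q = np.clip(np.round(w_rot / scale), -3, 3).astype(np.int8)
w_hat = fwht(q.astype(np.float64) * scale)   # dequantize via inverse FWHT
err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
```

Because the orthonormal FWHT is exactly its own inverse, the same deterministic transform can be applied offline (quantization) and online (the fused inverse-FWHT kernel), which is what makes the zero-error round-trip property possible.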
Results
ITQ3_S demonstrated competitive perplexity with FP16 baselines and achieved over 1.5× throughput compared to 4-bit quantization methods on the NVIDIA RTX 5090. The reconstruction error was shown to be strictly smaller than that of any uniform 3-bit baseline under equal bit-budget constraints.
Implications
The findings suggest that ITQ3_S can enable the deployment of larger and more complex language models on consumer hardware, making advanced AI capabilities more accessible. This could lead to broader applications of LLMs in various domains, including natural language processing and conversational AI.
Taming the Instability: A Robust Second-Order Optimizer for Federated Learning over Non-IID Data
Optimization
Federated Learning
- FedRCO is the first comprehensive framework addressing second-order optimization instability in federated learning.
- It incorporates mechanisms to monitor gradient anomalies and reset states during numerical instability.
- The proposed aggregation strategy preserves local curvature while integrating global knowledge.
- FedRCO shows superior performance in terms of convergence speed and accuracy compared to existing methods.
Summary
This paper introduces Federated Robust Curvature Optimization (FedRCO), a novel second-order optimization framework aimed at enhancing convergence speed and reducing communication costs in Federated Learning (FL) systems, particularly under non-IID data conditions. Traditional second-order optimization methods face challenges such as high computational costs and numerical instability in distributed environments. FedRCO addresses these issues by integrating an efficient approximate curvature optimizer with a stability mechanism. The framework consists of three main components: a Gradient Anomaly Monitor to detect and mitigate exploding gradients, a Fail-Safe Resilience protocol to reset optimization states during instability, and a Curvature-Preserving Adaptive Aggregation strategy to safely incorporate global knowledge while maintaining local curvature geometry. Theoretical analyses demonstrate that FedRCO effectively mitigates instability and prevents unbounded updates while ensuring optimization efficiency. Extensive experiments reveal that FedRCO outperforms both state-of-the-art first-order and second-order methods in terms of robustness against non-IID scenarios, achieving higher accuracy and faster convergence.
Methodology
The authors developed FedRCO by analyzing the causes of instability in second-order optimization within federated learning. They designed three key components: a Gradient Anomaly Monitor, a Fail-Safe Resilience protocol, and a Curvature-Preserving Adaptive Aggregation strategy. The framework was theoretically analyzed to ensure it controls update magnitudes and stabilizes second-order information. Extensive experiments were conducted on standard benchmarks to validate the framework's performance.
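The first two components can be sketched together as a running-statistics monitor that triggers a state reset. The z-score rule, window size, and threshold below are invented illustrations of the idea, not FedRCO's actual criteria.

```python
import math

class GradientAnomalyMonitor:
    """Sketch in the spirit of FedRCO's Gradient Anomaly Monitor.

    Flags a gradient norm as anomalous when it deviates strongly from the
    recent history; thresholds here are illustrative, not the paper's.
    """
    def __init__(self, window=20, z_thresh=3.0):
        self.norms, self.window, self.z_thresh = [], window, z_thresh

    def check(self, grad_norm):
        """Return True if grad_norm is anomalous relative to recent history."""
        history = self.norms[-self.window:]
        anomalous = False
        if len(history) >= 5:
            mean = sum(history) / len(history)
            var = sum((g - mean) ** 2 for g in history) / len(history)
            std = math.sqrt(var)
            anomalous = std > 0 and (grad_norm - mean) / std > self.z_thresh
        if not anomalous:
            self.norms.append(grad_norm)   # only track in-distribution norms
        return anomalous

monitor = GradientAnomalyMonitor()
curvature_state = {"precond": "approx-hessian"}   # stand-in optimizer state
for g in [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 50.0]:
    if monitor.check(g):
        # Fail-Safe Resilience: discard the (possibly corrupted) second-order
        # state instead of applying an unbounded update.
        curvature_state = {"precond": None}
```

The final norm of 50.0 simulates an exploding gradient: the monitor flags it and the second-order state is reset rather than used.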
Results
FedRCO demonstrated significantly faster convergence and improved final accuracy in various non-IID scenarios compared to both baseline first-order methods and state-of-the-art second-order methods. The framework also reduced the number of communication rounds required for convergence, highlighting its efficiency in bandwidth-constrained environments.
Implications
The development of FedRCO has significant implications for federated learning applications, particularly in environments with heterogeneous data distributions. It enables more efficient and robust model training while preserving data privacy, making it suitable for IoT devices and other edge computing scenarios.
Data-Driven Plasticity Modeling via Acoustic Profiling
Audio & Speech
Time Series
Theory
- Introduces a data-driven approach to model plasticity in crystalline metals using acoustic emissions.
- Utilizes wavelet transforms for improved detection of AE events compared to traditional methods.
- Identifies 266 unique AE events, revealing insights into the mechanics of dislocation dynamics.
- Establishes a correlation between AE events and stress drops, validating the detection methodology.
Summary
This paper addresses the challenge of understanding plastic deformation in crystalline metals, which occurs through abrupt, localized dislocation events. These events generate acoustic emissions (AEs) that can be analyzed to gain insights into the underlying mechanics of material deformation. The author proposes a novel approach that extends traditional AE analysis by employing modern nonstationary modeling techniques, focusing on temporal modeling and forecasting of AE waveforms. The methodology includes detecting AE events using wavelet transforms, validating these events through physics-based analysis, and creating a labeled dataset for further study. The experimental setup involves a micropillar of Nickel subjected to compressive stress, with AEs detected via a piezo-electric device. The study employs Morlet wavelets for event detection, leading to the identification of 266 unique AE events across various frequency bands. The results indicate a strong correlation between detected AE events and stress drops in the material, suggesting that the proposed methods can enhance the understanding of dislocation dynamics and plasticity in metals.
Methodology
The methodology involves detecting AE events using Morlet wavelets to analyze the frequency spectrum of acoustic emissions. A zero-phase bandpass filter is applied to reduce noise, and an 'instantaneous band energy' metric is computed to identify AE events. The approach also includes physics-based validation of detected events against stress curves.
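A minimal stand-in for the detection pipeline is sketched below: a zero-phase Butterworth bandpass replaces the paper's Morlet wavelet analysis, and the synthetic 16 kHz burst, band edges, and window length are all invented for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def band_energy(signal, fs, lo, hi, win=64):
    """Zero-phase bandpass filter, then a sliding 'instantaneous band energy'.

    Simple surrogate for the wavelet-based pipeline described above.
    """
    b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, signal)              # zero-phase: no time shift
    return np.convolve(filtered ** 2, np.ones(win) / win, mode="same")

# Synthetic trace: noise plus a short 16 kHz burst standing in for an AE event.
fs = 200_000
t = np.arange(0, 0.01, 1 / fs)
rng = np.random.default_rng(1)
x = 0.1 * rng.standard_normal(t.size)
burst = (t > 0.004) & (t < 0.005)
x[burst] += np.sin(2 * np.pi * 16_000 * t[burst])

e = band_energy(x, fs, 12_000, 20_000)
event_idx = int(np.argmax(e))   # detected event location (sample index)
```

Thresholding the band-energy curve rather than the raw waveform is what lets in-band bursts stand out against broadband sensor noise; in the paper, the detected event times are then cross-checked against drops in the stress curve.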
Results
The study successfully detected 266 unique AE events across different frequency bands, with a significant concentration of events in the 16 kHz band. The correlation between AE events and stress drops indicates that the detection method effectively captures the dynamics of plastic deformation.
Implications
The findings have potential implications for advancing the understanding of material behavior under stress, improving predictive models for plasticity, and enhancing the design of materials in engineering applications.
QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
Time Series
- QUITOBENCH addresses the scarcity of high-quality benchmarks in time series forecasting.
- The benchmark categorizes time series based on intrinsic properties rather than application domains.
- Deep learning models outperform foundation models at short context lengths, while the reverse is true at longer lengths.
- Forecastability is the dominant factor affecting model performance, leading to significant MAE differences.
Summary
The paper introduces QUITOBENCH, a novel benchmark for time series forecasting aimed at addressing the lack of high-quality, large-scale evaluation frameworks in the field. QUITOBENCH is built on QUITO, a comprehensive dataset of application traffic from Alipay, encompassing nine business domains and designed to cover eight distinct trend×seasonality×forecastability (TSF) regimes. This benchmark allows for a more nuanced evaluation of forecasting models based on intrinsic statistical properties rather than arbitrary domain labels. The authors benchmarked ten different forecasting models, including deep learning and foundation models, across 232,200 evaluation instances. Key findings reveal that deep learning models excel with shorter context lengths, while foundation models perform better with longer contexts. Forecastability emerged as the primary factor influencing model performance, with deep learning models achieving competitive results with significantly fewer parameters compared to foundation models. The study emphasizes the importance of scaling training data over model size for improved performance. The open-source release of QUITOBENCH aims to facilitate reproducible and regime-aware evaluations in time series forecasting research.
Methodology
The authors developed QUITOBENCH using a billion-scale dataset (QUITO) from Alipay's application traffic, ensuring a regime-balanced evaluation across eight TSF regimes. They benchmarked ten forecasting models, analyzing their performance across 232,200 instances while focusing on context length, forecastability, and model parameters.
Results
The study found that deep learning models lead in performance at short context lengths (L = 96), while foundation models dominate at longer contexts (L ≥ 576). A significant MAE gap of 3.64× was observed across different regimes, with deep learning models achieving comparable performance with 59 times fewer parameters than foundation models. Additionally, increasing the amount of training data yielded greater performance improvements than scaling model size.
Implications
The introduction of QUITOBENCH has the potential to standardize evaluations in time series forecasting, guiding practitioners in model selection and improving the reliability of benchmarking in the field. This could lead to advancements in various applications, including finance, healthcare, and cloud computing.
Automatic feature identification in least-squares policy iteration using the Koopman operator framework
Reinforcement Learning
- Introduction of KAE-LSPI algorithm for automatic feature identification in RL.
- Reformulation of classical LSPI using the Koopman operator framework.
- Comparison with existing LSPI and KLSPI methods shows competitive performance.
- Elimination of the need for manual feature/kernel selection.
Summary
This paper introduces the Koopman autoencoder-based least-squares policy iteration (KAE-LSPI) algorithm, which addresses the challenge of feature selection in reinforcement learning (RL) by leveraging the Koopman operator framework. The KAE-LSPI reformulates the classical least-squares fixed-point approximation method using extended dynamic mode decomposition (EDMD), allowing for automatic feature learning. The motivation behind this work stems from the limitations of existing linear RL techniques, which often require manual selection of features or kernels. The authors compare KAE-LSPI with classical least-squares policy iteration (LSPI) and kernel-based least-squares policy iteration (KLSPI) using stochastic chain walk and inverted pendulum control problems. The results indicate that KAE-LSPI can learn a reasonable number of features without prior fixed selections, achieving convergence to optimal or near-optimal policies comparable to the other methods. This approach not only simplifies the feature selection process but also enhances the efficiency of RL algorithms by integrating automatic feature learning into the policy iteration framework.
Methodology
The KAE-LSPI algorithm reformulates the classical least-squares fixed-point approximation method within the Koopman operator framework, utilizing the Koopman autoencoder (KAE) to learn the dictionary of basis functions from data. This approach avoids the manual selection of features or kernels, which is a common limitation in traditional RL methods.
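Once a feature dictionary is available, the least-squares fixed-point step itself is a small linear solve. The sketch below shows that step with a random feature matrix standing in for the Koopman autoencoder's learned dictionary; the regularizer and discount value are illustrative choices.

```python
import numpy as np

def lspi_fixed_point(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    """Least-squares fixed-point solve for Q(s, a) ≈ φ(s, a)ᵀ w.

    In KAE-LSPI, φ would be the dictionary learned by the Koopman
    autoencoder; here it is simply given (one row per sample).
    """
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# Tiny illustration with random features standing in for a learned dictionary.
rng = np.random.default_rng(0)
phi = rng.standard_normal((200, 8))        # features of (s, a)
phi_next = rng.standard_normal((200, 8))   # features of (s', π(s'))
rewards = rng.standard_normal(200)
w = lspi_fixed_point(phi, phi_next, rewards)
```

Policy iteration alternates this solve with greedy policy improvement; KAE-LSPI's contribution is that the eight basis functions here no longer have to be hand-picked or chosen via a kernel.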
Results
Empirical evaluations on stochastic chain walk and inverted pendulum control problems reveal that KAE-LSPI learns a manageable number of features and achieves convergence to optimal or near-optimal policies, comparable to classical LSPI and KLSPI methods. The results indicate that the KAE technique effectively addresses the feature selection challenge in RL.
Implications
The KAE-LSPI algorithm has the potential to streamline the feature selection process in reinforcement learning applications, making it easier to apply RL techniques to complex real-world problems without the burden of manual feature engineering. This could lead to more efficient and effective RL solutions across various domains.
Neuro-Symbolic Process Anomaly Detection
Theory
Interpretability
- Proposes a neuro-symbolic approach for process anomaly detection that integrates domain knowledge.
- Utilizes Logic Tensor Networks to enhance neural network models with symbolic reasoning.
- Demonstrates improved anomaly detection performance with as few as 10 conformant traces.
- Highlights the importance of Declare constraints in refining the detection process.
Summary
This paper presents a novel approach to process anomaly detection by integrating neuro-symbolic AI techniques with traditional machine learning methods. The authors highlight the limitations of current neural network-based methods, which often misclassify rare but conformant traces as anomalies due to their statistical nature and lack of human domain knowledge. To address this, the authors propose using Logic Tensor Networks (LTN) to incorporate symbolic knowledge into the anomaly detection process. The approach involves training an autoencoder model on event logs while embedding Declare constraints, which serve as soft logical guiderails, to improve the model's ability to distinguish between anomalous and rare but conformant behaviors. The evaluation of the proposed method on both synthetic and real-world datasets demonstrates significant improvements in F1 scores, even with a minimal number of conformant traces. The results indicate that the choice of Declare constraints and the integration of domain knowledge play a crucial role in enhancing anomaly detection performance.
Methodology
The methodology involves training an autoencoder model on event logs to learn control flow patterns, while simultaneously mining and incorporating Declare constraints as soft logical guiderails using Logic Tensor Networks. This integration allows the model to optimize for both reconstruction error and the satisfaction of the constraints, thereby improving the detection of anomalies.
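The combined objective can be sketched as a weighted sum of reconstruction error and constraint violation, with constraint satisfactions aggregated by a t-norm as in Logic Tensor Networks. The product t-norm and the weighting below are illustrative choices, not necessarily the paper's exact formulation.

```python
import numpy as np

def product_tnorm(sats):
    # LTN-style conjunction of fuzzy truth values in [0, 1].
    return float(np.prod(sats))

def neuro_symbolic_loss(recon_error, constraint_sats, lam=0.5):
    """Autoencoder reconstruction error plus a soft penalty for violating
    the mined Declare constraints (the 'logical guiderails')."""
    sat = product_tnorm(constraint_sats)
    return recon_error + lam * (1.0 - sat)

# A rare-but-conformant trace: high reconstruction error, constraints satisfied.
rare_conformant = neuro_symbolic_loss(0.9, [1.0, 1.0, 0.95])
# An anomalous trace: similar reconstruction error, but one constraint violated.
anomalous = neuro_symbolic_loss(0.9, [1.0, 0.2, 0.95])
```

The key effect is visible in the two scores: equal reconstruction error no longer implies equal anomaly score, so a rare trace that satisfies the Declare constraints is penalized less than a genuinely non-conformant one.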
Results
The proposed neuro-symbolic approach significantly improves the F1 scores in anomaly detection compared to baseline models, particularly effective with limited conformant traces. The study shows that the choice of Declare constraints directly influences the performance gains in detecting anomalies.
Implications
This research has potential applications in various domains where process compliance and anomaly detection are critical, such as finance, healthcare, and manufacturing. The integration of domain knowledge into machine learning models can lead to more accurate and interpretable anomaly detection systems.
Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks
Theory
- The tail index α of the bottleneck layer predicts test accuracy with high precision under label noise conditions.
- Under hyperparameter variation, spectral and conventional measures are weak predictors of test accuracy.
- The spectral signature is concentrated at the information-processing bottleneck layer.
- The study provides a comprehensive comparison of spectral measures against conventional metrics.
Summary
This paper investigates the spectral properties of neural network weight matrices to predict test accuracy, particularly in the presence of label noise. The author finds that the tail index (α) of the eigenvalue distribution at the bottleneck layer of the network serves as a strong predictor of test accuracy, achieving a leave-one-out R² of 0.984 across various levels of controlled label noise, significantly outperforming conventional metrics. The study encompasses three neural network architectures (MLP, CNN, ResNet-18) and two datasets (MNIST, CIFAR-10). However, when examining hyperparameter variations at fixed data quality, the predictive power of both spectral and conventional measures diminishes, with R² values below 0.25. The paper posits that while the tail index is a robust diagnostic for identifying label corruption and training set degradation, it should not be viewed as a universal predictor of generalization. Additionally, a noise detector calibrated on synthetic noise effectively identifies real human annotation errors in CIFAR-10N. The research connects these findings to the BBP phase transition in random matrix theory and reports that the level spacing ratio is uninformative for weight matrices due to Wishart universality.
Methodology
The study employs a controlled experimental design to assess spectral predictors under two perturbation axes: label noise variation and hyperparameter variation. It calculates eigenvalues from weight matrices and measures the tail index, effective rank, outlier fraction, and deviations from the Marchenko-Pastur distribution. The research compares these spectral observables against conventional norm-based measures across different neural network architectures and datasets.
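Measuring the tail index from a weight matrix can be sketched with a Hill estimator over the top eigenvalues of WᵀW. The Hill estimator is a standard choice used here as a stand-in; the paper's exact fitting procedure for α may differ, and the random matrix below simply illustrates a clean (near Marchenko-Pastur, light-tailed) spectrum.

```python
import numpy as np

def hill_tail_index(eigvals, k=50):
    """Hill estimator of the power-law tail index from the top-k eigenvalues.

    Small alpha = heavy tail; large alpha = light tail. A standard estimator,
    not necessarily the paper's exact procedure.
    """
    top = np.sort(np.asarray(eigvals))[-k:]   # top[0] is the k-th largest
    return (k - 1) / np.sum(np.log(top[1:] / top[0]))

# Eigenvalue spectrum of W^T W for a random (untrained, clean) weight matrix:
# close to Marchenko-Pastur, with its bulk edge near 4 for this aspect ratio.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)) / np.sqrt(512)
eigs = np.linalg.eigvalsh(W.T @ W)
alpha = hill_tail_index(eigs)
```

In the paper's setting, the diagnostic compares α at the bottleneck layer across noise levels: label corruption drags the spectrum toward a heavier tail, so α tracks data quality rather than generalization per se.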
Results
The tail index α demonstrated a strong correlation with test accuracy (LOO R² = 0.984) under varying label noise levels, while conventional metrics performed poorly (R² ≤ 0.149). In contrast, under hyperparameter variations, all measures were weak predictors (R² < 0.25), with simple baselines slightly outperforming spectral measures. The noise detector successfully identified real annotation errors in CIFAR-10N, detecting 9% noise with only 3% error.
Implications
The findings suggest that the tail index can be a valuable tool for diagnosing data quality issues in neural networks, particularly in identifying label noise. This could have significant implications for improving model robustness and accuracy in real-world applications where data quality is often compromised.
Temporal Credit Is Free
Time Series
Optimization
Efficient ML
- Jacobian propagation is not necessary for online adaptation in RNNs; immediate derivatives are sufficient.
- Eligibility traces fail due to miscalibrated decay rates and lack of normalization, not because of the absence of Jacobian information.
- The proposed method scales to larger networks (n = 1024) while using 1000× less memory than RTRL.
- An architectural rule is established to determine when normalization is required based on the presence of nonlinear state updates.
Summary
This paper challenges the conventional wisdom regarding online training of recurrent neural networks (RNNs) by demonstrating that Jacobian propagation is unnecessary for effective adaptation. The author argues that the hidden state in RNNs inherently carries temporal credit, allowing immediate derivatives to suffice for online learning. The paper identifies two critical issues with existing eligibility trace methods: the miscalibration of decay rates and the lack of per-parameter normalization, which lead to ineffective gradient updates. By correcting these issues—specifically, using zero decay and incorporating the Adam optimizer's β2 normalization—the proposed method matches or surpasses the performance of full Real-Time Recurrent Learning (RTRL) while significantly reducing memory requirements. The findings are validated across various architectures and real-world benchmarks, showing that immediate derivatives can effectively replace traditional Jacobian computations, thus enabling scalable online learning in RNNs.
Methodology
The author revisits the assumptions behind eligibility traces and Jacobian propagation in RNNs, proposing a method that utilizes immediate derivatives with corrected decay rates and normalization. The performance of this method is tested across ten different architectures and real-world benchmarks, including BCI data and language models.
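The corrected recipe can be sketched on a toy single-unit RNN: use only the immediate derivative of the hidden state (a zero-decay eligibility trace, i.e. no Jacobian propagation) and normalize each update with Adam's β2 accumulator. The teacher network and hyperparameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.standard_normal(2000)

def run_teacher(w_true=0.8):
    # Target sequence from a single-unit RNN with a known input weight.
    h, ys = 0.0, []
    for x in xs:
        h = np.tanh(w_true * x + 0.5 * h)
        ys.append(h)
    return np.array(ys)

ys = run_teacher()

w, h, v, t = 0.0, 0.0, 0.0, 0   # parameter, hidden state, 2nd moment, step
losses = []
for x, y in zip(xs, ys):
    h = np.tanh(w * x + 0.5 * h)
    # Immediate derivative only: dh/dw with the previous state held fixed
    # (zero-decay trace), instead of propagating the recurrent Jacobian.
    g = 2.0 * (h - y) * (1.0 - h ** 2) * x
    t += 1
    v = 0.999 * v + 0.001 * g ** 2            # Adam's beta2 accumulator
    v_hat = v / (1.0 - 0.999 ** t)            # bias correction
    w -= 0.05 * g / (np.sqrt(v_hat) + 1e-8)   # per-parameter normalization
    losses.append(float((h - y) ** 2))
```

Note the memory cost: one trace value per parameter, versus RTRL's full state-parameter Jacobian, which is where the orders-of-magnitude savings reported above come from.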
Results
The proposed method achieves recovery rates exceeding 100% of full RTRL's performance on various tasks, demonstrating improved accuracy and stability. It also shows significant memory efficiency, operating at 12.6 MB compared to 12.9 GB for RTRL, while effectively adapting to pretrained language models.
Implications
This research has the potential to simplify the training of recurrent neural networks, making them more efficient and scalable for real-time applications. It could lead to advancements in fields requiring online learning, such as robotics, time series analysis, and real-time decision-making systems.
Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids
Reinforcement Learning
Optimization
Robotics
- Introduces a DRL framework for maritime CPP on irregular hexagonal grids.
- Utilizes a Transformer-based pointer policy for constructing coverage tours.
- Implements a critic-free GRPO scheme for stable training in long-horizon tasks.
- Achieves a 99.0% success rate in unseen environments, outperforming classical heuristics.
Summary
This paper addresses the challenge of Coverage Path Planning (CPP) in maritime environments characterized by irregular geometries, such as coastlines and islands. Traditional CPP methods struggle with these complexities, often requiring computationally intensive re-planning. The authors propose a novel Deep Reinforcement Learning (DRL) framework that formulates CPP as a neural combinatorial optimization problem, utilizing a Transformer-based pointer policy to construct coverage tours on hexagonal grids. A key innovation is the implementation of a critic-free Group-Relative Policy Optimization (GRPO) scheme, which estimates advantages through comparisons of sampled trajectories rather than relying on a value function. This approach stabilizes training and enhances performance in long-horizon routing tasks. The experimental results demonstrate that the proposed method achieves a 99.0% Hamiltonian success rate across 1,000 unseen synthetic maritime environments, significantly outperforming traditional heuristics in terms of path length and heading changes, while maintaining real-time feasibility for on-board deployment.
Methodology
The authors formulated the CPP problem as a graph traversal task on hexagonal grids, employing a Transformer-based pointer policy that constructs coverage paths through dynamic action masking. They adapted the GRPO method to estimate advantages based on trajectory comparisons, avoiding the instability associated with traditional actor-critic methods.
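The critic-free advantage estimate at the heart of GRPO can be shown in a few lines: each sampled trajectory's return is standardized against its own group, so no learned value function is needed. The group returns below are invented (e.g. negative path length, so shorter tours score higher).

```python
import numpy as np

def grpo_advantages(returns, eps=1e-8):
    """Group-relative advantages: z-score each trajectory's return within
    its sampled group, replacing a learned critic/value baseline."""
    r = np.asarray(returns, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Returns for eight coverage tours sampled on one grid instance (invented).
group_returns = [-12.0, -10.5, -11.2, -9.8, -13.1, -10.0, -11.7, -10.9]
adv = grpo_advantages(group_returns)
```

The resulting advantages are zero-mean within the group, so the policy gradient pushes probability toward the better-than-average tours without the variance and instability of training a separate critic over long horizons.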
Results
The proposed policy achieved a 99.0% Hamiltonian success rate, produced paths that were 7% shorter, and required 24% fewer heading changes compared to the closest baseline. All inference modes operated under 50 ms per instance, confirming the method's suitability for real-time applications.
Implications
The findings suggest that the proposed DRL framework can significantly enhance the efficiency of maritime surveillance missions, enabling better resource allocation and operational effectiveness in complex environments. This approach could be applied to various maritime applications, including search and rescue, environmental monitoring, and security operations.
AcTTA: Rethinking Test-Time Adaptation via Dynamic Activation
Computer Vision
- AcTTA introduces an activation-aware approach to Test-Time Adaptation, focusing on dynamic modulation of activation functions.
- The framework allows for adaptive adjustments of activation behavior without modifying network weights or requiring source data.
- Extensive experiments show that AcTTA outperforms traditional normalization-based TTA methods across multiple datasets.
- The study highlights the importance of activation functions in representation dynamics and their potential for improving adaptation to domain shifts.
Summary
The paper introduces AcTTA, a novel framework for Test-Time Adaptation (TTA) that emphasizes the role of activation functions in adapting neural networks to distribution shifts during inference. Traditional TTA methods primarily focus on recalibrating normalization layers, often neglecting the significant impact of activation functions on representation dynamics. AcTTA addresses this gap by reformulating conventional activation functions into parameterized forms that allow for adaptive adjustments of their response thresholds and gradient sensitivities. This enables the model to dynamically modulate its activation behavior without altering network weights or requiring source data. The authors demonstrate that AcTTA achieves robust and stable adaptation across various datasets, including CIFAR10-C, CIFAR100-C, and ImageNet-C, consistently outperforming normalization-based TTA methods. The findings suggest that activation adaptation is a compact and effective strategy for enhancing domain-shift robustness in test-time learning, thus broadening the current understanding of adaptation mechanisms in neural networks.
Methodology
AcTTA reformulates conventional activation functions into parameterized forms that can be adaptively adjusted during inference. This involves modulating the response thresholds and gradient sensitivities of activation functions, allowing for continuous adaptation to input variations without changing the overall network architecture.
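One way to parameterize an activation along these lines is a softplus relaxation of ReLU with an adjustable response threshold and gradient sensitivity. This is our illustration of the idea, not necessarily AcTTA's exact functional form.

```python
import numpy as np

def param_act(x, tau=0.0, beta=1.0):
    """ReLU reformulated with an adjustable response threshold (tau) and
    gradient sensitivity (beta) via a softplus relaxation; as beta grows,
    this recovers ReLU(x - tau). Illustrative parameterization only."""
    # logaddexp keeps the softplus numerically stable for large beta.
    return np.logaddexp(0.0, beta * (x - tau)) / beta

x = np.linspace(-3.0, 3.0, 7)
y_sharp = param_act(x, tau=0.0, beta=50.0)    # ~ standard ReLU
y_shifted = param_act(x, tau=1.0, beta=50.0)  # threshold raised at test time
```

At inference, only the scalar (tau, beta) per activation would be adapted to the shifted inputs, leaving all network weights frozen, which is what makes the approach source-free and lightweight.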
Results
The experimental results demonstrate that AcTTA consistently achieves better performance and stability compared to existing normalization-based TTA methods across CIFAR10-C, CIFAR100-C, and ImageNet-C datasets, indicating its effectiveness in handling distribution shifts.
Implications
The findings suggest that incorporating activation adaptability into TTA frameworks can significantly enhance the robustness of neural networks in real-world applications where distribution shifts are common. This approach could lead to more resilient models in various domains, including computer vision and beyond.
ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment
Graph Learning
Multimodal
- ORACAL integrates heterogeneous graph models with LLMs for enhanced vulnerability detection.
- The framework employs a causal attention mechanism to improve robustness against adversarial attacks.
- PGExplainer is used for generating explainable outputs, aiding in understanding vulnerability paths.
- ORACAL achieves state-of-the-art performance, significantly outperforming existing models.
Summary
The paper introduces ORACAL, a novel framework designed to enhance the detection of vulnerabilities in smart contracts through a multimodal approach that integrates various graph representations and large language models (LLMs). Traditional methods for vulnerability detection, particularly those relying on homogeneous graphs, have been found lacking in capturing the complex interactions within smart contract code. ORACAL addresses these limitations by employing a heterogeneous graph model that combines Control Flow Graphs (CFG), Data Flow Graphs (DFG), and Call Graphs (CG). It enriches critical subgraphs with contextual information derived from Retrieval-Augmented Generation (RAG) and LLMs, thereby providing a deeper semantic understanding of the code. A causal attention mechanism is implemented to distinguish genuine vulnerability indicators from misleading correlations, enhancing the model's robustness against adversarial attacks. Additionally, the framework incorporates PGExplainer to generate explainable outputs, detailing the paths that lead to identified vulnerabilities. The experimental results demonstrate that ORACAL outperforms existing models in vulnerability detection, achieving a peak Macro F1 score of 91.28% and maintaining strong generalization across various datasets. The framework also shows resilience against adversarial attacks, with minimal performance degradation.
Methodology
The methodology involves creating a heterogeneous multimodal graph learning framework that integrates CFG, DFG, and CG. It enriches critical nodes with contextual information from RAG and LLMs, and employs a causal attention mechanism to focus on true vulnerability indicators. The explainability aspect is addressed using PGExplainer to provide insights into the vulnerability detection process.
Results
ORACAL achieved a peak Macro F1 score of 91.28% on the primary benchmark, outperforming other models by up to 39.6 percentage points. It maintained strong generalization with scores of 91.8% on CGT Weakness and 77.1% on DAppScan. In terms of explainability, PGExplainer achieved a Mean Intersection over Union (MIoU) of 32.51% against manually annotated paths. The framework also limited performance degradation to approximately 2.35% under adversarial conditions, with an Attack Success Rate of only 3%.
Implications
The implications of this research are significant for the blockchain and smart contract development communities, as it provides a robust and explainable tool for vulnerability detection, potentially reducing financial losses due to exploits. The integration of LLMs with graph-based models could pave the way for more advanced security analysis tools in software engineering.
Interpretable long-term traffic modelling on national road networks using theory-informed deep learning
Interpretability
Theory
Time Series
- DeepDemand integrates travel demand theory with deep learning for improved traffic volume predictions.
- The model outperforms traditional and machine learning baselines in predictive accuracy.
- It demonstrates good geographic transferability, making it suitable for long-term planning.
- Interpretability analysis provides insights into travel-time deterrence and socioeconomic factors.
Read more
Interpretable long-term traffic modelling on national road networks using theory-informed deep learning
Summary
This paper addresses the challenges of long-term traffic modeling for transport planning, which often involves a trade-off between interpretability, transferability, and predictive accuracy. Traditional travel demand models provide a behavioral structure but require extensive calibration and strong assumptions, while generic deep learning models capture complex patterns but lack theoretical grounding. The authors propose DeepDemand, a theory-informed deep learning framework that integrates key components of travel demand theory to predict long-term highway traffic volumes. The framework utilizes socioeconomic features and road-network structure, employing a competitive two-source Dijkstra procedure for local origin-destination (OD) region extraction and a differentiable architecture for modeling OD interactions and travel-time deterrence. Evaluated on eight years of data from the UK strategic road network, DeepDemand outperforms various baseline models, achieving an R² of 0.718 and an MAE of 7,406 vehicles under random cross-validation, and maintains strong performance (R² = 0.665) under spatial cross-validation. The interpretability analysis reveals stable nonlinear travel-time deterrence patterns and key socioeconomic drivers, demonstrating the framework's potential for practical planning applications.
Methodology
The methodology involves a five-stage framework including data collection and preprocessing, local OD region extraction, deep learning model training, model evaluation, and explainability analysis. The model is trained on observed link volumes using a differentiable architecture that captures the core steps of traditional travel demand models.
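The OD-interaction and travel-time-deterrence structure that DeepDemand makes differentiable follows the classical gravity model. A minimal sketch of that classical structure is below; the parameter names (beta, gamma) and values are illustrative assumptions, not the paper's learned components:

```python
import numpy as np

def gravity_flows(origins, destinations, travel_times, beta=1.0, gamma=0.1):
    """Gravity-style OD flows: flow[i, j] ∝ O_i * D_j * t_ij^-beta * exp(-gamma * t_ij).

    The combined power-law/exponential term is the travel-time deterrence
    function; DeepDemand replaces this fixed form with a learned one.
    """
    deterrence = travel_times ** (-beta) * np.exp(-gamma * travel_times)
    return np.outer(origins, destinations) * deterrence

O = np.array([100.0, 50.0])          # origin-region trip production
D = np.array([80.0, 120.0])          # destination-region attraction
t = np.array([[10.0, 30.0],          # travel times between regions (min)
              [25.0, 5.0]])
F = gravity_flows(O, D, t)
```

Shorter travel times yield larger flows for a given OD pair, which is the qualitative behavior the paper's interpretability analysis recovers as a stable nonlinear deterrence pattern.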
Results
DeepDemand achieves an R² of 0.718 and an MAE of 7,406 vehicles under random cross-validation, outperforming linear, ridge, random forest, and gravity-style models. It also shows strong performance under spatial cross-validation (R² = 0.665), indicating good transferability across geographic regions.
Implications
The integration of transport theory with deep learning in DeepDemand provides a robust framework for long-term traffic modeling, offering valuable insights for transport planning, infrastructure investment, and policy-making. Its interpretability can aid in understanding traffic dynamics and informing decision-making processes.
Generative Modeling in Protein Design: Neural Representations, Conditional Generation, and Evaluation Standards
Generative Models
Multimodal
- Generative modeling is transforming protein design beyond traditional structure prediction.
- The survey categorizes methods into representations, architectures, and task settings.
- Best practices for evaluation emphasize the importance of physical validity and leakage-aware splits.
- Identifies key challenges in the field, including biosecurity risks and modeling complexities.
Read more
Generative Modeling in Protein Design: Neural Representations, Conditional Generation, and Evaluation Standards
Summary
This paper presents a comprehensive survey of generative modeling in protein design, addressing the fragmented landscape of methodologies, representations, and evaluation standards in the field. The authors categorize generative AI applications into foundational representations (sequence, geometric, and multimodal encodings), generative architectures (including SE(3)-equivariant diffusion and flow matching), and various task settings such as structure prediction and protein interactions. The survey emphasizes the importance of comparing assumptions, conditioning mechanisms, and controllability across different models. Additionally, it synthesizes best practices for evaluation, highlighting the need for leakage-aware splits and physical validity checks. The authors identify critical challenges in the field, including modeling conformational dynamics, scaling to large assemblies, and addressing biosecurity risks associated with dual-use technologies. By unifying architectural advances with practical evaluation standards, the paper aims to facilitate the transition from predictive modeling to reliable, function-driven protein engineering.
Methodology
The authors conducted a systematic review of existing literature on generative modeling in protein design, categorizing methods based on their representations, architectures, and applications. They compared various models and synthesized best practices for evaluation, focusing on the integration of generative design with practical considerations.
Results
The survey highlights the rapid advancements in generative modeling techniques and their applications in protein design, showcasing the convergence of structure prediction and generative design. It identifies gaps in the literature and proposes a framework for evaluating generative models that emphasizes practical applicability and safety.
Implications
The findings suggest that generative modeling can significantly enhance protein engineering, leading to the development of novel therapeutics and biomaterials. However, the associated biosecurity risks necessitate careful consideration of governance frameworks to prevent misuse of these technologies.
Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
Reinforcement Learning
Theory
Optimization
- Introduces a robust estimator for offline MARLHF against data corruption.
- Achieves O(ϵ^{1−o(1)}) and O(√ϵ) bounds on Nash-equilibrium gaps under different coverage assumptions.
- Develops a quasi-polynomial-time algorithm for coarse correlated equilibria to address computational challenges.
- First systematic approach to handle adversarial data corruption in multi-agent settings.
Read more
Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
Summary
This paper addresses the challenge of robustness against data corruption in offline multi-agent reinforcement learning from human feedback (MARLHF). The authors propose a framework that considers a strong-contamination model where an ϵ-fraction of the data can be arbitrarily corrupted. They model the problem using linear Markov games and introduce a robust estimator that guarantees a Nash-equilibrium gap bound of O(ϵ^{1−o(1)}) under uniform coverage assumptions. In scenarios with unilateral coverage, they achieve a Nash-gap bound of O(√ϵ). The paper also tackles the computational intractability of these procedures by relaxing the solution concept to coarse correlated equilibria (CCE), leading to a quasi-polynomial-time algorithm with a CCE gap scaling as O(√ϵ). This work is significant as it is the first systematic treatment of adversarial data corruption in offline MARLHF, highlighting the complexities and potential solutions in multi-agent settings where data integrity is critical.
Methodology
The authors model the problem using linear Markov games and develop robust estimators for reward functions. They utilize value-based backward induction for Nash equilibrium computation under uniform coverage and projected gradient ascent for unilateral coverage. To address computational intractability, they relax the Nash equilibrium concept to coarse correlated equilibria and apply the Optimistic Hedge algorithm.
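The Optimistic Hedge update used for the CCE computation can be sketched generically: each round plays a softmax over negated cumulative losses plus the most recent loss as an optimistic prediction of the next one. This is the textbook form of the algorithm; the paper's exact instantiation inside the Markov game may differ:

```python
import numpy as np

def optimistic_hedge(loss_sequence, eta=0.5):
    """Generic Optimistic Hedge over a finite action set.

    Round t plays softmax(-eta * (sum of past losses + last loss)),
    using the previous loss vector as a prediction of the next.
    """
    n_actions = len(loss_sequence[0])
    cum = np.zeros(n_actions)       # cumulative loss per action
    last = np.zeros(n_actions)      # optimistic prediction (previous loss)
    plays = []
    for loss in loss_sequence:
        logits = -eta * (cum + last)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        plays.append(p)
        cum += loss
        last = np.asarray(loss, dtype=float)
    return plays

# Action 0 is bad for two rounds, then action 1 is bad.
losses = [np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
plays = optimistic_hedge(losses)
```

The play starts uniform and shifts weight toward the action with smaller predicted loss, which is the no-regret behavior the CCE guarantee builds on.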
Results
The proposed algorithms yield bounds on the Nash gap of O(nϵ^{1−o(1)} + n/√m) under uniform coverage and O(n√ϵ + n/√m + n/√T₁) under unilateral coverage. The CCE approach results in a gap bound of O(n√ϵ + n/√m + n/√T₁ + n/T₂), demonstrating effective robustness against data corruption.
Implications
The findings suggest that robust methods for MARLHF can significantly enhance the reliability of multi-agent systems in real-world applications, particularly in safety-critical environments where data integrity is paramount. This work lays the groundwork for future research in adversarial robustness in multi-agent reinforcement learning.
DPD-Cancer: Explainable Graph-based Deep Learning for Small Molecule Anti-Cancer Activity Prediction
Graph Learning
- DPD-Cancer utilizes a Graph Attention Transformer for improved prediction of small molecule anti-cancer activity.
- The model outperforms existing methods, achieving high AUC scores and correlation coefficients for pGI50 predictions.
- Attention mechanisms in DPD-Cancer enhance explainability by identifying and visualizing important molecular features.
- The framework incorporates a multi-stage, chemistry-aware data partitioning strategy for robust performance evaluation.
Read more
DPD-Cancer: Explainable Graph-based Deep Learning for Small Molecule Anti-Cancer Activity Prediction
Summary
The paper presents DPD-Cancer, a novel deep learning framework based on a Graph Attention Transformer (GAT) architecture, aimed at predicting the anti-cancer activity of small molecules. The challenge of accurately predicting drug responses in cancer research is addressed by modeling the complex interactions between molecular structures and cellular contexts, particularly in light of tumor heterogeneity and genomic variability. DPD-Cancer is benchmarked against existing state-of-the-art methods, demonstrating superior performance with an Area Under ROC Curve (AUC) of up to 0.87 on the NCI60 dataset and up to 0.98 on other datasets. The model also predicts growth inhibition concentration (pGI50) across various cancer types, achieving Pearson's correlation coefficients of up to 0.72 on independent test sets. A key feature of DPD-Cancer is its explainability, leveraging attention mechanisms to visualize specific molecular substructures, thus providing actionable insights for drug candidate prioritization and lead optimization. The framework is made publicly available as a web server, facilitating its use in drug discovery.
Methodology
DPD-Cancer employs a Graph Attention Transformer architecture to model the relationships between molecular structures and their biological effects. It uses a multi-stage, chemistry-aware data partitioning strategy to ensure robust validation against novel chemical spaces, enhancing the reliability of its predictions.
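The attention mechanism at the core of such an architecture can be sketched as a single-head graph attention layer in NumPy. Dimensions, weights, and the toy molecular graph are illustrative, not the paper's configuration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention_layer(H, adj, W, a):
    """One single-head GAT-style layer.

    H: (n, d) node features; adj: (n, n) 0/1 adjacency (with self-loops);
    W: (d, d') shared linear map; a: (2*d',) attention vector.
    """
    Z = H @ W
    n = Z.shape[0]
    out = np.zeros_like(Z)
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i, j]]
        scores = np.array([a @ np.concatenate([Z[i], Z[j]]) for j in nbrs])
        scores = np.maximum(scores, 0.2 * scores)      # LeakyReLU
        alpha = softmax(scores)                        # attention over neighbors
        out[i] = sum(w * Z[j] for w, j in zip(alpha, nbrs))
    return out

# Toy 3-atom graph with self-loops.
H = np.eye(3)
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
out = graph_attention_layer(H, adj, W, a)
```

The per-neighbor attention weights `alpha` are the quantities a model like DPD-Cancer visualizes to highlight influential molecular substructures.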
Results
DPD-Cancer achieved an AUC of up to 0.87 on the NCI60 dataset and up to 0.98 on other datasets. For pGI50 predictions across 10 cancer types and 73 cell lines, it reached Pearson's correlation coefficients of up to 0.72 on independent test sets, indicating strong predictive performance.
Implications
The development of DPD-Cancer has significant implications for drug discovery, particularly in oncology, as it provides a powerful tool for predicting drug responses and optimizing lead compounds. Its explainability features can guide researchers in understanding the underlying molecular interactions, potentially leading to more effective cancer therapies.
Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL
Reinforcement Learning
Robotics
Multimodal
- ROVED combines vision-language embeddings with selective oracle feedback for efficient PbRL.
- The framework reduces the need for high-quality oracle feedback by leveraging noisy VLE outputs.
- A parameter-efficient fine-tuning method enhances the VLE's performance using sparse oracle feedback.
- ROVED achieves oracle-level performance while cutting annotation costs by 50-80%.
Read more
Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL
Summary
This paper presents ROVED, a novel framework designed to enhance preference-based reinforcement learning (PbRL) by integrating lightweight vision-language embeddings (VLEs) with selective oracle feedback. The authors identify the challenge of high costs associated with obtaining oracle feedback, which limits the scalability of PbRL methods. ROVED addresses this by utilizing VLEs to generate segment-level preferences while relying on oracle feedback only for uncertain cases, thus reducing the overall need for costly feedback. The framework includes a parameter-efficient fine-tuning method that adapts the VLE based on oracle feedback, improving its performance over time. The evaluation of ROVED on robotic manipulation tasks demonstrates that it can achieve performance comparable to traditional oracle-only methods while significantly reducing the number of required oracle queries by 50-80%. Additionally, the fine-tuned VLE shows strong generalization across different tasks, leading to cumulative annotation savings of up to 90%. This work highlights the potential of combining scalable embedding techniques with precise oracle supervision to make PbRL more practical and efficient.
Methodology
The ROVED framework employs a hybrid approach where vision-language embeddings generate segment-level preferences, and oracle feedback is sought only for uncertain cases. It incorporates a parameter-efficient adaptation scheme that improves VLE preference labels using dynamics-aware objectives and sparse oracle feedback. Additionally, it utilizes a confidence-aware training strategy to minimize oracle queries by focusing on uncertain samples.
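The routing logic behind this hybrid scheme can be sketched simply: the VLE scores both segments of each pair, confident pairs are auto-labeled, and low-margin pairs go to the oracle. The margin threshold and scores below are illustrative assumptions:

```python
import numpy as np

def route_queries(vle_scores_a, vle_scores_b, margin=0.2):
    """Split preference pairs into VLE auto-labels and oracle queries.

    A pair is auto-labeled only when the VLE score gap exceeds `margin`;
    otherwise its index is routed to the (expensive) oracle.
    """
    diffs = np.asarray(vle_scores_a) - np.asarray(vle_scores_b)
    confident = np.abs(diffs) >= margin
    auto_labels = {int(i): int(diffs[i] > 0) for i in np.flatnonzero(confident)}
    oracle_indices = np.flatnonzero(~confident).tolist()
    return auto_labels, oracle_indices

# Three segment pairs: clear win for A, near-tie, clear win for B.
labels, to_oracle = route_queries([0.9, 0.55, 0.1], [0.3, 0.5, 0.8], margin=0.2)
```

Only the near-tie reaches the oracle, which is how the framework trims oracle queries while reserving ground-truth labels for genuinely ambiguous comparisons.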
Results
ROVED matches or exceeds the performance of prior preference-based methods on robotic manipulation tasks while reducing oracle queries by up to 80%. The fine-tuned VLE generalizes well across tasks, achieving cumulative annotation savings of 75-90%.
Implications
The findings suggest that integrating scalable VLE models with selective oracle feedback can significantly enhance the efficiency and practicality of preference-based reinforcement learning, making it more accessible for real-world applications in robotics and beyond.
DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease
Generative Models
Efficient ML
- DRiffusion introduces a draft-and-refine process for parallelizing diffusion models.
- The method employs skip transitions to generate multiple draft states for parallel noise computation.
- Theoretical acceleration rates of 1/n or 2/(n+1) are achieved depending on the operational mode.
- Empirical results show speedups of 1.4× to 3.7× with minimal quality degradation.
Read more
DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease
Summary
The paper introduces DRiffusion, a novel framework designed to parallelize the sampling process of diffusion models, which are known for generating high-fidelity content but suffer from slow iterative sampling. The authors propose a draft-and-refine approach that utilizes skip transitions to create multiple draft states for future timesteps, allowing for parallel computation of their corresponding noise predictions. This method significantly accelerates the diffusion inference process, achieving theoretical acceleration rates of 1/n or 2/(n+1) based on the mode of operation. Empirical results demonstrate that DRiffusion can achieve speedups of 1.4× to 3.7× across various diffusion models while maintaining generation quality, as evidenced by minimal degradation in metrics such as FID and CLIP on the MS-COCO dataset. The paper highlights the importance of unlocking inherent parallelism in diffusion models, providing a practical implementation that combines theoretical insights with effective computational strategies.
Methodology
DRiffusion utilizes a draft-and-refine process that leverages skip transitions within the diffusion sampling chain. By generating multiple draft states in parallel and computing their corresponding noise predictions simultaneously, the method consolidates the computational bottleneck into a single parallel step, enhancing overall efficiency.
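The draft-and-refine idea can be illustrated with a toy sketch: cheap skip transitions propose draft states for several future timesteps, and the expensive noise predictor is then evaluated once on the whole batch of drafts instead of once per step. The skip rule and the stand-in `predict_noise` are illustrative placeholders, not the paper's model:

```python
import numpy as np

def predict_noise(x_batch, t_batch):
    """Placeholder for the diffusion network's batched noise prediction."""
    return 0.1 * x_batch

def draft_and_refine_step(x_t, t, n_drafts, skip):
    """Draft n future states via cheap skip transitions, then refine
    them all with one parallel noise-prediction call."""
    drafts = np.stack([x_t * (1 - skip * k) for k in range(1, n_drafts + 1)])
    ts = np.array([t - k for k in range(1, n_drafts + 1)])
    eps = predict_noise(drafts, ts)   # single batched call, not n serial ones
    refined = drafts - eps
    return refined[-1], ts[-1]

x, t = np.ones(4), 50
x_next, t_next = draft_and_refine_step(x, t, n_drafts=4, skip=0.05)
```

The speedup comes from consolidating n serial network evaluations into one batched evaluation, which is the computational bottleneck the paper's parallel step removes.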
Results
The implementation of DRiffusion resulted in speedups ranging from 1.4× to 3.7× across multiple diffusion models. Quality metrics such as FID and CLIP remained comparable to the original models, with only minor drops in PickScore and HPSv2.1, indicating that the method effectively balances speed and output fidelity.
Implications
The DRiffusion framework has significant implications for the deployment of diffusion models in interactive applications, where low latency is critical. By improving sampling speed without sacrificing quality, it opens up new possibilities for real-time content generation in various domains, including image and video synthesis, audio generation, and more.
Skillful Kilometer-Scale Regional Weather Forecasting via Global and Regional Coupling
Time Series
- Introduction of a global-regional coupling framework for high-resolution weather forecasting.
- Development of the ScaleMixer module for dynamic identification of cross-scale interactions.
- Significant performance improvement over operational NWP and AI baselines in forecasting accuracy.
- Ability to capture complex weather phenomena in challenging terrains.
Read more
Skillful Kilometer-Scale Regional Weather Forecasting via Global and Regional Coupling
Summary
This paper addresses the challenges of high-resolution regional weather forecasting by proposing a novel global-regional coupling framework that integrates a pretrained Transformer-based global model with a high-resolution regional network. The framework employs a unique module called ScaleMixer, which dynamically identifies meteorologically critical regions through adaptive key-position sampling and facilitates cross-scale feature interaction using dedicated attention mechanisms. The proposed method produces forecasts at a resolution of 0.05° (approximately 5 km) and 1-hour intervals, significantly outperforming traditional numerical weather prediction (NWP) systems and existing AI baselines. The model demonstrates exceptional skill in capturing fine-grained weather phenomena, such as orographic wind patterns and Foehn warming, while maintaining global coherence. The results indicate that the framework is not only efficient, taking less than 3 minutes for a 48-hour forecast on a single GPU, but also effective in operational settings, showcasing its potential for practical applications in regional weather forecasting.
Methodology
The authors developed a global-regional coupling framework that integrates a pretrained Transformer model for synoptic-scale context with a high-resolution regional model. The ScaleMixer module plays a crucial role in identifying key regions for cross-scale interactions and enables bidirectional feature fusion between global and regional data.
Results
The proposed framework outperformed operational NWP and leading AI models in both hindcast and real-time settings. It demonstrated notable skill in accurately forecasting fine-grained weather phenomena, particularly in complex terrains in China, and achieved efficient inference times, making it suitable for practical applications.
Implications
The framework has significant implications for improving the accuracy and efficiency of regional weather forecasting, which is critical for disaster mitigation, agriculture, and energy management. Its ability to capture complex weather dynamics can enhance decision-making processes in various sectors reliant on accurate weather predictions.
From Independent to Correlated Diffusion: Generalized Generative Modeling with Probabilistic Computers
Generative Models
Optimization
Efficient ML
- Introduction of correlated diffusion that incorporates Ising couplings into the sampling process.
- Demonstration of improved sampling efficiency and accuracy using probabilistic computers (p-computers).
- Validation of the framework on benchmark systems, showing closer alignment with MCMC distributions.
- Establishment of a hybrid architecture combining p-computers for sampling and GPUs for neural network evaluation.
Read more
From Independent to Correlated Diffusion: Generalized Generative Modeling with Probabilistic Computers
Summary
This paper presents a novel approach to diffusion models in generative modeling by integrating Markov chain Monte Carlo (MCMC) dynamics into the stochastic sampling process. Traditional diffusion models primarily rely on independent noise injection, but this work generalizes the sampling component to incorporate known interaction structures, specifically Ising couplings. The authors propose a hybrid architecture utilizing probabilistic computers (p-computers) built from probabilistic bits (p-bits) for efficient sampling, which significantly enhances sampling throughput and energy efficiency compared to conventional GPU-based methods. The framework is validated through experiments on the 2D ferromagnetic Ising model and the 3D Edwards-Anderson spin glass, demonstrating that correlated diffusion yields samples that are more consistent with MCMC reference distributions than those produced by independent diffusion. This advancement opens pathways for new classes of diffusion algorithms that leverage structured probabilistic sampling for generative tasks.
Methodology
The authors developed a generalized diffusion framework that integrates Ising-structured Gibbs dynamics into both the forward noising and reverse inference processes. They utilized p-computers for stochastic sampling, allowing for the incorporation of known couplings in the sampling distributions, and compared the performance against traditional independent diffusion methods.
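The Ising-structured Gibbs update at the heart of this sampling process is standard and can be sketched directly; the lattice size, couplings, and temperature below are illustrative, not the paper's benchmark configuration:

```python
import numpy as np

def gibbs_sweep(spins, J, beta, rng):
    """One full Gibbs sweep over +/-1 spins with coupling matrix J.

    Each spin is resampled from its exact conditional given its
    neighbors' local field, the update a p-bit performs natively.
    """
    n = len(spins)
    for i in range(n):
        field = J[i] @ spins - J[i, i] * spins[i]   # local field from neighbors
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        spins[i] = 1 if rng.random() < p_up else -1
    return spins

rng = np.random.default_rng(0)
n = 16
J = np.zeros((n, n))
for i in range(n):                     # 1D ferromagnetic ring
    J[i, (i + 1) % n] = J[(i + 1) % n, i] = 1.0
spins = rng.choice([-1, 1], size=n)
for _ in range(100):
    spins = gibbs_sweep(spins, J, beta=2.0, rng=rng)
```

In the paper's framework these coupled updates replace the independent noise injections of standard diffusion, which is what makes the forward and reverse processes "correlated."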
Results
The experiments on the 2D ferromagnetic Ising model and the 3D Edwards-Anderson spin glass demonstrated that the proposed correlated diffusion approach produced samples that were significantly closer to MCMC reference distributions compared to those generated by independent diffusion, indicating enhanced accuracy and efficiency.
Implications
This research suggests that the integration of structured probabilistic sampling into diffusion models can lead to more accurate generative modeling, particularly in systems with known interactions. The use of p-computers may revolutionize the efficiency of generative tasks in deep learning, paving the way for advanced applications in various fields such as physics, optimization, and machine learning.
Improving Risk Stratification in Hypertrophic Cardiomyopathy: A Novel Score Combining Echocardiography, Clinical, and Medication Data
Multimodal
Interpretability
- Development of a novel ML risk score for HCM using echocardiographic, clinical, and medication data.
- The Random Forest model significantly outperformed the ESC score in predicting 5-year cardiovascular outcomes.
- The model provides high interpretability through SHAP analysis, identifying both established and novel predictors.
- Longitudinal analysis shows the model's stability over time, allowing for dynamic risk monitoring in clinical settings.
Read more
Improving Risk Stratification in Hypertrophic Cardiomyopathy: A Novel Score Combining Echocardiography, Clinical, and Medication Data
Summary
This study addresses the critical need for improved risk stratification in patients with hypertrophic cardiomyopathy (HCM), a condition associated with a significant risk of sudden cardiac death. Current risk models, such as the European Society of Cardiology (ESC) score, have shown moderate performance in predicting outcomes. The authors propose a novel machine learning (ML) risk score that integrates echocardiographic, clinical, and medication data from Electronic Health Records (EHRs) to predict a 5-year composite cardiovascular outcome. The model was developed using a large cohort of 1,201 patients from the SHARE registry and validated on an independent cohort of 382 patients from Rennes Hospital. The final Random Forest ensemble model achieved an internal Area Under the Curve (AUC) of 0.85±0.02, significantly outperforming the ESC score (0.56±0.03). Survival curve analysis indicated superior risk separation for the ML score compared to the ESC score. Additionally, the model demonstrated stability over time in event-free patients, highlighting its potential for longitudinal risk monitoring. The study emphasizes the importance of explainability in ML models, utilizing Shapley additive explanations (SHAP) to identify key predictors, thereby enhancing the interpretability and clinical applicability of the risk score.
Methodology
The study utilized a cohort of 2,244 HCM patients, applying strict inclusion criteria to ensure adequate follow-up data for 5-year risk prediction. A Random Forest ensemble model was trained on echocardiographic, clinical, and medication data, with internal validation on a large cohort and external validation on an independent cohort. SHAP analysis was employed to enhance model interpretability and identify significant predictors.
Results
The Random Forest model achieved an internal AUC of 0.85±0.02, significantly surpassing the ESC score's AUC of 0.56±0.03. Survival curve analysis showed superior risk separation for the ML score (Log-rank p = 8.62 × 10⁻⁴) compared to the ESC score (p = 0.0559). The model also demonstrated stability in risk predictions over time for event-free patients.
Implications
The proposed ML risk score offers a promising tool for personalized clinical management of HCM, enhancing the ability to identify high-risk patients for ICD therapy and improving overall patient outcomes. Its explainability and longitudinal monitoring capabilities could lead to more informed clinical decisions and better patient management strategies.
Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems
Reinforcement Learning
Efficient ML
- Knowledge Distillation effectively reduces the size and computational requirements of Transformer-based reinforcement learning models.
- The distilled student models can outperform teacher models in terms of electricity cost efficiency.
- Significant reductions in model parameters, memory usage, and inference time were achieved without sacrificing performance.
- The approach enhances the applicability of reinforcement learning in resource-constrained environments, such as residential energy management systems.
Read more
Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems
Summary
This paper explores the application of Knowledge Distillation (KD) to enhance the efficiency of Transformer-based reinforcement learning models, specifically the Decision Transformer (DT), for use in hardware-constrained residential energy management systems. The authors address the challenge of deploying high-capacity models, which typically require significant computational resources, on embedded systems with limited memory and processing power. By utilizing the Ausgrid dataset, the researchers train teacher models using a sequence-based Decision Transformer framework on multi-building data. They then distill these teacher models into smaller student models that maintain control quality while significantly reducing model size. The results demonstrate that the distilled student models can achieve up to 96% reduction in parameters, 90% reduction in memory usage, and 63% reduction in inference time, while often preserving or even improving control performance. This work highlights the potential of KD to make advanced reinforcement learning techniques more applicable in real-world energy management scenarios, where resource limitations are a critical concern.
Methodology
The methodology involves training high-capacity teacher models using the Decision Transformer framework on a diverse dataset of multi-building energy data. Knowledge Distillation is then applied to create smaller student models by matching the actions of the teacher models, ensuring that the distilled models retain effective decision-making capabilities.
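Action matching of this kind is commonly implemented as a temperature-softened cross-entropy between teacher and student action distributions. The sketch below shows that generic loss; the temperature and logits are illustrative, and the paper's exact objective may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student
    action distributions; minimized when the student matches the teacher."""
    p_t = softmax(teacher_logits / temperature)
    log_p_s = np.log(softmax(student_logits / temperature))
    return -(p_t * log_p_s).sum(axis=-1).mean()

teacher = np.array([[4.0, 1.0, 0.0]])                      # teacher prefers action 0
aligned = distillation_loss(teacher, np.array([[4.0, 1.0, 0.0]]))
off = distillation_loss(teacher, np.array([[0.0, 1.0, 4.0]]))
```

A student reproducing the teacher's logits attains the minimum of this loss (the teacher's entropy), while a mismatched student is penalized, which drives the smaller model toward the teacher's control policy.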
Results
The study found that the distilled student models achieved up to 96% fewer parameters, 90% less memory usage, and 63% faster inference times compared to the teacher models. Additionally, the control performance was largely preserved, with some configurations showing up to a 1% improvement in electricity cost efficiency.
Implications
The findings suggest that Knowledge Distillation can significantly enhance the deployment of advanced reinforcement learning models in practical applications, particularly in energy management systems where computational resources are limited. This could lead to more efficient energy usage and cost savings in residential settings.
Information-Theoretic Limits of Safety Verification for Self-Improving Systems
Theory
- Establishes dual conditions for safety in self-improving systems: bounded risk and unbounded utility.
- Proves that classifiers under power-law risk schedules cannot achieve both safety and utility simultaneously.
- Introduces a verification escape mechanism that allows for zero risk with positive true positive rates.
- Demonstrates a universal finite-horizon ceiling for classifier utility, which is subpolynomial compared to verifiers.
Read more
Information-Theoretic Limits of Safety Verification for Self-Improving Systems
Summary
This paper addresses the challenge of ensuring safety in self-improving AI systems while allowing for beneficial modifications. It formalizes the problem using dual conditions: bounded risk (Σ δn < ∞) and unbounded utility (Σ TPRn = ∞). The author establishes a series of theorems demonstrating the incompatibility of these conditions under certain risk schedules, particularly power-law distributions. The first major result (Theorem 1) shows that for classifiers operating under overlapping safe/unsafe distributions, the true positive rate (TPRn) is bounded, leading to a limitation on utility. A second result (Theorem 5) provides a universal finite-horizon ceiling for classifier utility, indicating that it grows subpolynomially, significantly lower than the linear growth achievable by verifiers. The paper also introduces a verification escape mechanism (Theorem 2), where sound verification gates can achieve zero risk while maintaining a positive true positive rate. The findings highlight the structural limitations of classifier-based safety mechanisms and suggest that verification methods can circumvent these limitations, providing a pathway for safe self-improvement in AI systems.
Methodology
The paper employs theoretical proofs to establish the impossibility results and utility bounds. It utilizes techniques from information theory, including Hölder's inequality and NP counting methods, to derive the main results. The author also validates findings through empirical evaluations on models like GPT-2.
Results
Key results include: (1) Impossibility of achieving both bounded risk and unbounded utility for classifiers under power-law risk schedules (Theorem 1); (2) A universal finite-horizon ceiling for classifier utility that grows subpolynomially (Theorem 5); (3) A verification escape mechanism that allows for achieving zero risk with a positive true positive rate (Theorem 2).
Implications
The findings have significant implications for the design of safety mechanisms in AI systems, particularly those that are self-improving. They suggest that reliance on classifiers for safety may be insufficient, and that verification methods could provide a more robust approach to ensuring safe self-modification.