AI-generated summaries
Today's ML research,
without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
66 papers today
Updated every 8 hours
7 days of history
NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
Large Language Models
Theory
Efficient ML
- NANOZK provides a cryptographic mechanism for verifying LLM inference, addressing trust issues in LLM APIs.
- The layerwise proof framework allows for independent layer computations, significantly improving scalability and efficiency.
- Lookup table approximations handle non-arithmetic operations without degrading model accuracy or weakening the verification guarantees.
- NANOZK achieves a 52× speedup over existing ZKP methods while ensuring soundness guarantees.
Summary
The paper introduces NANOZK, a novel zero-knowledge proof (ZKP) system designed to ensure verifiable inference for large language models (LLMs). As users increasingly rely on LLM APIs, they face a trust gap due to the lack of cryptographic assurance that the claimed model is indeed being used. NANOZK addresses this issue by leveraging the inherent layerwise structure of transformer models, allowing for independent proofs for each layer. This layerwise decomposition not only reduces the computational burden associated with traditional monolithic proving approaches but also enables parallel proof generation. The authors develop lookup table approximations for non-arithmetic operations, ensuring no degradation in model accuracy while achieving significant speed improvements in proof generation and verification. The results demonstrate that NANOZK can generate proofs for GPT-2 scale transformers in just 43 seconds with a proof size of 6.9KB, achieving a 52× speedup over existing methods while maintaining formal soundness guarantees.
Methodology
The authors propose a layerwise decomposition of transformer inference into independent proofs, connected by cryptographic commitments. They develop lookup table approximations for non-arithmetic operations and utilize Fisher information to guide verification prioritization. This approach allows for parallel proof generation and selective verification based on layer importance.
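The ZKP machinery itself is beyond a summary, but the layerwise chaining idea — commit to every intermediate activation so each layer's proof can be generated and verified independently — can be sketched with hash commitments standing in for a real commitment scheme. This is an illustrative sketch, not the paper's construction; all names and the use of SHA-256 are assumptions.

```python
import hashlib
import numpy as np

def commit(arr, nonce):
    """Binding commitment to a tensor (SHA-256 stands in for a real scheme)."""
    return hashlib.sha256(nonce + arr.tobytes()).hexdigest()

def layerwise_commitments(x, layers, rng):
    """Chain per-layer commitments: layer i's proof would attest that the
    value opening commits[i], pushed through layers[i], opens commits[i+1].
    Sharing each boundary commitment is what lets independently generated
    per-layer proofs compose into a proof of the whole forward pass."""
    nonces = [rng.bytes(16)]
    commits = [commit(x, nonces[0])]
    h = x
    for f in layers:
        h = f(h)                      # the layer computation being proven
        nonces.append(rng.bytes(16))
        commits.append(commit(h, nonces[-1]))
    return h, commits, nonces
```

Because adjacent layers share a boundary commitment, each layer's proof can be produced in parallel, which is the source of the scalability gains described above.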
Results
NANOZK generates proofs for GPT-2 scale transformers in 43 seconds with a constant proof size of 6.9KB and a verification time of 23ms. This represents a 52× speedup compared to the EZKL toolkit, particularly beneficial for larger models where traditional methods struggle with memory constraints. The lookup approximations preserve model perplexity across various benchmarks, confirming that verification does not compromise output quality.
Implications
NANOZK has significant implications for industries relying on LLMs, such as healthcare and legal sectors, where trust and verification of model outputs are critical. It enables users to confidently utilize LLM APIs while ensuring they receive the promised model capabilities, thereby enhancing the reliability of AI-driven decision-making processes.
Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions
Large Language Models
Reinforcement Learning
Optimization
- Algorithm rankings are scale-dependent, with significant inversions observed between different model sizes.
- Most modifications to DPO variants do not significantly outperform the vanilla DPO algorithm.
- Algorithm effectiveness is highly task-specific, with performance varying greatly across different benchmarks.
- A hierarchy of performance factors is established: model scale > training paradigm > online vs. offline methods > loss function modifications.
Summary
This paper addresses the lack of controlled comparisons among various post-training algorithms used for aligning large language models (LLMs) with human preferences. The author introduces OXRL, a unified framework that implements 51 post-training algorithms under identical conditions, enabling a large-scale evaluation across different model scales (0.5B to 7B) and evaluation domains. The study reveals three major findings: (1) Algorithm rankings are unstable across model scales, with significant ranking inversions observed; (2) Modifications to loss functions yield negligible improvements, with most variants of DPO not outperforming the original; (3) The effectiveness of algorithms is highly task-specific, indicating that algorithm choice is crucial primarily within the training distribution. The findings suggest a hierarchy of factors influencing performance: model scale has the most significant impact, followed by training paradigm, online vs. offline methods, and loss function modifications. The paper concludes by releasing OXRL as a community benchmark for post-training algorithms, providing tools for future research.
Methodology
The study employs a controlled empirical comparison of post-training algorithms using a unified framework (OXRL) that standardizes model loading, data pipelines, and evaluation harnesses. It evaluates 8 algorithms across 4 model scales and 20 DPO variants, conducting 100 runs for statistical robustness.
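For reference, the vanilla DPO objective that most of the 20 variants fail to significantly beat can be written in a few lines. This is a toy numpy sketch of the standard published loss, not OXRL's implementation; argument names are illustrative.

```python
import numpy as np

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Vanilla DPO: -log sigmoid(beta * (policy log-ratio - reference log-ratio)),
    averaged over preference pairs. Inputs are per-pair sequence log-probs."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return np.mean(np.logaddexp(0.0, -margin))  # -log sigmoid(margin), computed stably
```

When the policy matches the reference, the margin is zero and the loss sits at log 2; favoring chosen responses more than the reference does pushes it below that.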
Results
Key results include scale-dependent ranking inversions, in which algorithms such as SGRPO and SimPO swap between the top and bottom of the ranking as model size grows from 1.5B to 7B. Additionally, none of the 20 DPO variants significantly outperform the original DPO after correction for multiple comparisons, and algorithm performance varies significantly across tasks.
Implications
The findings highlight the importance of model scale and task specificity in algorithm selection for post-training processes, guiding practitioners in choosing the most effective algorithms for their specific applications. The release of OXRL as a benchmark may facilitate further research and development in this area.
GoAgent: Group-of-Agents Communication Topology Generation for LLM-based Multi-Agent Systems
NLP
Large Language Models
Graph Learning
- GoAgent shifts the communication topology generation paradigm from node-centric to group-centric.
- The method utilizes LLMs to identify task-relevant collaborative groups for efficient problem-solving.
- Incorporation of a Conditional Information Bottleneck (CIB) reduces communication redundancy.
- GoAgent achieves 93.84% average accuracy and reduces token consumption by about 17% across benchmarks.
Summary
The paper introduces GoAgent, a novel method for generating communication topologies in large language model (LLM)-based multi-agent systems (MAS). Traditional approaches often rely on node-centric paradigms, which can lead to suboptimal coordination and excessive communication overhead due to implicit group structures. GoAgent addresses these limitations by explicitly treating collaborative groups as atomic units in the topology construction process. It begins by using an LLM to enumerate task-relevant candidate groups and then employs an autoregressive model to select and connect these groups, ensuring strong intra-group cohesion and effective inter-group coordination. To further enhance communication efficiency, GoAgent integrates a Conditional Information Bottleneck (CIB) mechanism that compresses inter-group communication, filtering out redundant signals while preserving task-relevant information. The experimental results demonstrate that GoAgent achieves state-of-the-art performance with an average accuracy of 93.84% while reducing token consumption by approximately 17%. This work not only proposes a new paradigm for topology generation but also highlights the importance of group structures in enhancing the performance of MAS.
Methodology
GoAgent employs a two-step process: first, it uses a large language model to enumerate candidate groups relevant to a specific task. Then, it utilizes an autoregressive graph generation model to select and connect these groups, forming a communication topology that emphasizes intra-group cohesion and inter-group coordination. The Conditional Information Bottleneck (CIB) is integrated to compress communication and filter out noise.
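The group-centric output of this process is easiest to see as an adjacency matrix: dense edges inside each group, sparse bridge edges between groups. The sketch below shows only that structure; the LLM-driven group enumeration, the autoregressive generator, and the CIB compression are not reproduced, and the bridge-link representation is an assumption.

```python
import numpy as np

def group_topology(n_agents, groups, bridges):
    """Build a communication adjacency matrix from explicit group structure.

    groups  : list of agent-index lists; members of a group are fully connected
              (intra-group cohesion).
    bridges : list of (agent_a, agent_b) cross-group links that carry the
              compressed inter-group messages (inter-group coordination).
    """
    A = np.zeros((n_agents, n_agents), dtype=int)
    for g in groups:
        for i in g:
            for j in g:
                if i != j:
                    A[i, j] = 1
    for a, b in bridges:
        A[a, b] = A[b, a] = 1
    return A
```

Treating groups as atomic units means the generator chooses among a handful of group-level connections rather than all O(n²) agent pairs.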
Results
GoAgent demonstrated state-of-the-art performance with an average accuracy of 93.84% on six benchmarks, while also achieving a reduction in token consumption by approximately 17%, indicating improved efficiency in communication.
Implications
The findings suggest that explicitly modeling group structures can significantly enhance the performance and efficiency of multi-agent systems, paving the way for more effective applications in complex task-solving scenarios such as collaborative decision-making and automated reasoning.
Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition
Time Series
- Trojan horse attacks pose significant security risks to deep forecasting models used in spacecraft telemetry.
- The Trojan Horse Hunt competition engaged over 200 teams to identify hidden triggers in forecasting models.
- The competition highlighted the lack of effective methods for detecting and characterizing trojan triggers in time series data.
- The results emphasize the necessity for robust security measures in AI applications, particularly in high-stakes environments like space operations.
Summary
This paper addresses the security risks posed by trojan horse attacks in deep forecasting models, particularly in the context of spacecraft telemetry. The authors organized the Trojan Horse Hunt competition, where over 200 teams were tasked with identifying hidden triggers in deep forecasting models. The competition aimed to fill the gap in research regarding the detection and characterization of trojan triggers in time series forecasting, a relatively underexplored area compared to image classification and language models. The paper details the competition's task formulation, evaluation protocols, and highlights the best solutions. It emphasizes the importance of developing robust methods for detecting adversarial triggers to ensure the safety and reliability of AI systems in critical applications like space operations. The findings suggest a need for ongoing research into effective identification techniques for time series forecasting models, which are essential for maintaining the integrity of AI applications in safety-critical domains.
Methodology
The authors organized a data science competition where participants were tasked with reconstructing multivariate triggers injected into deep forecasting models. The competition provided a benchmark dataset and evaluation protocol to assess the effectiveness of various solutions proposed by the teams.
Results
The competition yielded innovative solutions from participants, showcasing diverse approaches to trigger detection and reconstruction. The best-performing teams demonstrated significant advancements in identifying adversarial triggers, contributing valuable insights into the security of time series forecasting models.
Implications
The findings from this research have critical implications for the development of secure AI systems in safety-critical applications, particularly in space operations. Enhanced methods for detecting trojan triggers can improve the reliability and trustworthiness of AI models, facilitating their broader adoption in high-stakes environments.
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
Optimization
Efficient ML
Generative Models
- SOL-ExecBench benchmarks GPU kernels against hardware limits rather than software baselines.
- The benchmark includes 235 optimization problems from diverse AI models, targeting NVIDIA Blackwell GPUs.
- Performance is measured using Speed-of-Light (SOL) bounds derived from hardware specifications.
- A scoring system quantifies the optimization potential of kernels relative to hardware capabilities.
Summary
The paper introduces SOL-ExecBench, a novel benchmarking framework designed to evaluate GPU kernels against hardware limits rather than traditional software baselines. This benchmark encompasses 235 CUDA kernel optimization problems derived from 124 AI models across various domains, including language, vision, and audio, specifically targeting NVIDIA Blackwell GPUs. Unlike existing benchmarks that primarily focus on speedup over software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds, which represent the theoretical maximum performance achievable on the hardware. The authors developed a pipeline called SOLAR to compute these SOL bounds based on FLOP counts and GPU throughput. The benchmark also includes a scoring system that quantifies the extent to which a kernel closes the gap between a predefined baseline and the SOL bound, thereby encouraging optimizations that approach hardware efficiency. Additionally, a sandboxed evaluation harness is provided to ensure reliable and reproducible results, addressing potential reward-hacking behaviors in AI-driven optimizers. The framework aims to redefine GPU kernel benchmarking by focusing on hardware efficiency and providing a robust infrastructure for evaluating agentic AI systems.
Methodology
The authors developed SOL-ExecBench by extracting computational subgraphs from a variety of AI models and curating them into benchmark problems. They utilized a pipeline called SOLAR to analytically derive Speed-of-Light bounds based on hardware performance metrics. The evaluation framework includes a sandboxed environment to ensure reproducibility and mitigate reward-hacking strategies.
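The SOLAR pipeline itself is not reproduced here, but a roofline-style speed-of-light bound and a gap-closing score of the kind described can be sketched in a few lines. The exact scoring formula is an assumption; peak-throughput numbers in the test are placeholders, not Blackwell specifications.

```python
def sol_bound_s(flops, bytes_moved, peak_flops, peak_bw):
    """Speed-of-light time: a kernel can go no faster than either its compute
    lower bound or its memory-traffic lower bound, so take the max of the two."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

def sol_score(t_kernel, t_baseline, t_sol):
    """Fraction of the baseline-to-SOL gap a kernel closes, clipped to [0, 1]."""
    if t_baseline <= t_sol:
        return 1.0
    frac = (t_baseline - t_kernel) / (t_baseline - t_sol)
    return max(0.0, min(1.0, frac))
```

A score of 1.0 means the kernel runs at the hardware limit; anything below the baseline scores 0, so the metric rewards only genuine progress toward the SOL bound.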
Results
The paper reports the establishment of a comprehensive benchmark that effectively measures GPU kernel performance against hardware limits. The SOL Score provides a quantifiable metric for evaluating kernel optimizations, highlighting the remaining optimization headroom relative to the maximum achievable performance.
Implications
SOL-ExecBench has the potential to significantly impact GPU kernel optimization by providing a more accurate assessment of performance, guiding future hardware design, and facilitating the development of more efficient AI systems. It encourages the creation of kernels that leverage the full capabilities of emerging hardware architectures.
Scalable Learning of Multivariate Distributions via Coresets
Efficient ML
Theory
Optimization
- Introduction of the first coresets for semi-parametric distributional models.
- Significant data reduction achieved through importance sampling.
- High probability bounds on log-likelihood accuracy maintained.
- Enhanced adaptability for complex distributions and non-linear relationships.
Summary
This paper addresses the challenge of scalable learning in the context of multivariate conditional transformation models (MCTMs) by introducing a novel coreset construction. The authors highlight the limitations of existing methods in handling large-scale data and propose a semi-parametric approach that utilizes importance sampling for data reduction while ensuring high probability bounds on log-likelihood accuracy. The method enhances adaptability in modeling complex distributions and non-linear relationships, which are often inadequately captured by traditional parametric models. The authors also tackle numerical stability issues through a geometric approximation based on the convex hull of input data, facilitating accurate inference in large datasets. Experimental results demonstrate significant improvements in computational efficiency, paving the way for broader applications in statistics and machine learning.
Methodology
The authors develop a coreset construction for MCTMs, which model both unconditional and conditional distributions using monotonic transformation functions and Gaussian copulas. The approach involves minimizing the negative log-likelihood through optimization of linear basis coefficients and covariance structures, while addressing numerical stability with geometric approximations.
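The importance-sampling step underlying the coreset can be sketched generically: sample points with probability proportional to an upper bound on their contribution (their "sensitivity") and reweight by the inverse probability so that weighted sums remain unbiased. This is the standard sensitivity-sampling recipe, not the paper's MCTM-specific construction.

```python
import numpy as np

def importance_coreset(sens_upper, m, rng):
    """Sensitivity-style importance sampling: draw m points with probability
    proportional to an upper bound on each point's log-likelihood contribution,
    and reweight by inverse probability so weighted sums stay unbiased."""
    p = sens_upper / sens_upper.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    return idx, 1.0 / (m * p[idx])
```

Fitting the model on the m weighted points then stands in for fitting on the full dataset, with the high-probability log-likelihood guarantees supplied by the sensitivity bounds.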
Results
The proposed method demonstrates substantial improvements in computational efficiency when applied to large and complex datasets, maintaining statistical accuracy with high probability bounds on log-likelihood. The experiments validate the scalability and effectiveness of the coreset approach in practical applications.
Implications
The findings suggest that the proposed coreset construction can significantly enhance the scalability of non-parametric and semi-parametric models in statistics and machine learning, making it applicable to a wide range of real-world problems involving large and complex datasets.
FedRG: Unleashing the Representation Geometry for Federated Learning with Noisy Clients
Federated Learning
- FedRG redefines noise identification in federated learning by focusing on representation geometry rather than scalar loss values.
- The framework utilizes self-supervised learning to create robust, label-agnostic representations.
- A spherical vMF mixture model is employed to capture semantic clusters and identify noisy samples effectively.
- Extensive experiments validate the superior performance of FedRG over state-of-the-art methods in heterogeneous data environments.
Summary
The paper introduces FedRG, a novel framework designed to enhance federated learning (FL) in the presence of noisy client data. Traditional methods for identifying noisy labels often rely on scalar loss values, which can be unreliable in heterogeneous scenarios. FedRG shifts the focus from loss values to the representation geometry of data. It employs self-supervised learning to create label-agnostic spherical representations and utilizes a spherical von Mises-Fisher (vMF) mixture model to capture semantic clusters based on previously identified clean samples. By measuring the divergence between label-free and label-conditioned feature spaces, FedRG robustly identifies noisy samples and updates the vMF model accordingly. Additionally, a personalized noise absorption matrix is applied to optimize the model against noisy labels. Experimental results demonstrate that FedRG significantly outperforms existing methods in various noisy client scenarios, showcasing its effectiveness in addressing the challenges posed by data heterogeneity in federated learning.
Methodology
FedRG employs a representation geometry approach to identify noisy labels. It creates spherical representations using self-supervised learning and fits a spherical vMF mixture model to capture semantic clusters. The divergence between label-free and label-conditioned feature spaces is measured to identify noisy samples. A personalized noise absorption matrix is then used to optimize the model against these noisy labels.
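The geometric intuition — flag samples whose representation points away from their own class's cluster on the unit sphere — can be sketched with mean directions and a cosine threshold. This is a heavily reduced stand-in for FedRG's vMF mixture (mean direction only, no concentration parameter, no divergence measure); the threshold value is an assumption.

```python
import numpy as np

def flag_noisy(feats, labels, clean_mask, thresh=0.5):
    """Project features onto the unit sphere, estimate each class's mean
    direction from trusted clean samples, and flag points whose cosine
    similarity to their own class mean is low."""
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    flags = np.zeros(len(z), dtype=bool)
    for c in np.unique(labels):
        sel = clean_mask & (labels == c)
        mu = z[sel].mean(axis=0)
        mu /= np.linalg.norm(mu)          # class mean direction on the sphere
        flags |= (labels == c) & (z @ mu < thresh)
    return flags
```

Because the representations come from self-supervised learning, the clusters exist independently of the (possibly corrupted) labels, which is what makes the geometric test robust.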
Results
The experimental results indicate that FedRG significantly outperforms existing federated learning methods in various scenarios characterized by data heterogeneity and noisy clients. The framework's design and components were validated through comprehensive ablation studies, confirming their effectiveness and necessity.
Implications
FedRG has the potential to improve the robustness and reliability of federated learning systems in real-world applications where data is distributed and often noisy, such as in healthcare, finance, and other privacy-sensitive domains.
TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
Large Language Models
Efficient ML
NLP
- TTQ enables on-the-fly quantization of large models during inference, addressing domain shift issues.
- The framework incorporates low-complexity activation-aware quantization with negligible overhead.
- TTQ integrates low-rank decomposition to further enhance model compression.
- Experiments show that TTQ outperforms existing quantization methods on several LLM benchmarks.
Summary
The paper introduces a novel framework called Test-Time Quantization (TTQ) aimed at addressing the computational demands of large language models (LLMs) during inference. Traditional activation-aware compression techniques often rely on calibration data, which can lead to domain shift issues when applied to unseen tasks. The proposed TTQ framework allows for on-the-fly compression of large models at inference time, utilizing an efficient online calibration process that adapts to each prompt dynamically. This method not only accelerates inference but also improves quantization performance compared to existing state-of-the-art methods. The authors demonstrate the effectiveness of TTQ through various experiments, showcasing its ability to enhance LLM performance without the need for extensive offline calibration.
Methodology
The TTQ framework employs an activation-aware quantization method that dynamically calculates the diagonal correlation of input tokens during inference. This allows for real-time adaptation of scale and zero-point parameters without requiring offline calibration data. The method leverages groupwise quantization and a simplified activation-aware quantization approach to minimize approximation loss while maintaining computational efficiency.
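The groupwise quantization that TTQ builds on can be sketched directly: each group of consecutive weights gets its own scale and zero-point from its min/max. TTQ additionally rescales these parameters per prompt using activation statistics; that adaptive step is not shown here, so treat this as the plain baseline, not the paper's method.

```python
import numpy as np

def quantize_groupwise(w, group=4, bits=4):
    """Asymmetric groupwise quantization: each group of `group` consecutive
    weights gets its own scale and zero-point derived from its min/max."""
    qmax = 2 ** bits - 1
    w = w.reshape(-1, group)
    lo, hi = w.min(axis=1, keepdims=True), w.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / qmax
    q = np.clip(np.round((w - lo) / scale), 0, qmax)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo
```

Small groups keep the per-group range tight, which bounds the rounding error at half a quantization step per weight.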
Results
The experimental results indicate that TTQ significantly improves quantization performance over state-of-the-art methods, achieving faster inference times and better adaptability to various downstream tasks. The framework demonstrates a minimal computational overhead, making it suitable for real-time applications.
Implications
The TTQ framework has the potential to enhance the accessibility and efficiency of large language models in practical applications, particularly in scenarios where computational resources are limited or where models need to adapt to new tasks quickly. This could lead to broader adoption of LLMs in various domains, including natural language processing and beyond.
MLOW: Interpretable Low-Rank Frequency Magnitude Decomposition of Multiple Effects for Time Series Forecasting
Time Series
- MLOW provides an interpretable frequency-based decomposition for time series forecasting.
- Introduces Hyperplane-NMF, a new low-rank method that enhances interpretability and efficiency.
- Addresses challenges of spectral leakage and input horizon limitations in time series analysis.
- Demonstrates robustness to noise and effective disentanglement of multiple effects.
Summary
The paper introduces MLOW, a novel approach for time series forecasting that focuses on interpretable multi-effect decomposition. Traditional time series forecasting models often struggle with effectively separating multiple effects such as trends and seasonality due to their reliance on smoothing-based techniques. MLOW leverages a frequency-based decomposition pipeline that represents a time series as a magnitude spectrum multiplied by phase-aware basis functions. The authors propose a new low-rank method called Hyperplane-NMF, which combines the strengths of existing methods like PCA and NMF while addressing their limitations in interpretability and generalization. MLOW also incorporates a mathematical mechanism to allow flexible selection of input horizons and frequency levels, mitigating issues related to spectral leakage. The results demonstrate that MLOW achieves robust, interpretable, and hierarchical multiple-effect decomposition, showing significant performance improvements when integrated into existing time series forecasting frameworks with minimal architectural changes.
Methodology
MLOW employs a frequency-based decomposition approach, representing time series as a magnitude spectrum and phase-aware basis functions. It utilizes a sliding window technique to sample data and applies Hyperplane-NMF to learn low-rank representations of the frequency magnitude spectrum, addressing issues of interpretability and generalization. The method also incorporates a mathematical framework to select input horizons and frequency levels flexibly.
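The two ingredients of the pipeline — a magnitude spectrum with phase kept aside, and a low-rank nonnegative factorization of those magnitudes — can be sketched with an FFT and vanilla multiplicative-update NMF. Note this is plain NMF, not the paper's Hyperplane-NMF; the update rule and iteration count are generic choices.

```python
import numpy as np

def magnitude_spectra(windows):
    """Magnitude spectrum of each sliding window; phase is kept separately
    so the signal remains reconstructable."""
    F = np.fft.rfft(windows, axis=1)
    return np.abs(F), np.angle(F)

def nmf(M, rank, iters=500, seed=0):
    """Plain multiplicative-update NMF: M ≈ W @ H with W, H >= 0, giving a
    low-rank, nonnegative (hence interpretable) basis for the spectra."""
    rng = np.random.default_rng(seed)
    n, d = M.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, d)) + 0.1
    for _ in range(iters):
        H *= (W.T @ M) / (W.T @ W @ H + 1e-12)
        W *= (M @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

Each row of H plays the role of a frequency-magnitude "effect" (e.g., a seasonal component), and W gives its strength per window.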
Results
The experimental results indicate that MLOW achieves superior performance in time series forecasting tasks compared to traditional smoothing-based methods. It successfully captures dominant trends and seasonal effects while maintaining interpretability. Visual analyses confirm the method's ability to provide clear and hierarchical decompositions of multiple effects, even in the presence of noise.
Implications
MLOW has significant implications for various real-world applications requiring time series forecasting, such as demand prediction, financial risk assessment, and environmental monitoring. Its interpretable nature allows practitioners to better understand the underlying dynamics of time series data, facilitating more informed decision-making.
SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
Large Language Models
Efficient ML
NLP
- SpecForge provides a scalable and efficient framework for training speculative decoding models.
- The framework supports EAGLE-3 and incorporates advanced techniques like target-draft decoupling and hybrid parallelism.
- SpecBundle offers a suite of high-quality draft models that enhance inference speed and quality.
- The proposed methods lead to significant reductions in inference latency for large language models.
Summary
The paper introduces SpecForge, an open-source framework designed to enhance the training of speculative decoding models, particularly addressing the challenges posed by large language models (LLMs) that experience high inference latency due to their autoregressive nature. SpecForge aims to facilitate the creation of high-quality draft models that can significantly speed up inference processes by using a lightweight draft model to propose multiple tokens for verification by a larger target model. The framework incorporates several innovative features, including target-draft decoupling, hybrid parallelism, and optimized training kernels, which collectively enable a remarkable training speedup of up to 9.9 times for the EAGLE-3 model. Additionally, the authors present SpecBundle, a collection of production-grade draft models trained using SpecForge, which addresses the scarcity of effective draft models in the open-source community. The results demonstrate that these draft models can achieve up to 4.48 times end-to-end inference speedup, establishing SpecForge as a practical solution for deploying speculative decoding in real-world applications.
Methodology
The authors developed SpecForge by implementing a flexible architecture that supports target-draft decoupling and hybrid parallelism. They optimized training kernels to enhance performance and integrated the framework with production-grade inference engines. A systematic study of training recipes for speculative decoding was conducted to improve the quality of draft models.
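The draft-propose/target-verify loop that SpecForge trains models for can be sketched in its simplest greedy form: the draft proposes k tokens, the target keeps the longest agreeing prefix and appends its own correction at the first mismatch. This toy sketch is not SpecForge code, and real systems use probabilistic acceptance rather than exact greedy matching.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=12):
    """Greedy speculative decoding sketch. The cheap draft proposes k tokens;
    the target verifies them, so the output always matches plain greedy
    decoding with the target alone -- the draft only affects speed."""
    seq = list(prompt)
    end = len(prompt) + max_new
    while len(seq) < end:
        ctx, props = list(seq), []
        for _ in range(k):
            t = draft_next(ctx)
            props.append(t)
            ctx.append(t)
        for t in props:
            want = target_next(seq)  # in practice: one batched target pass
            seq.append(t if t == want else want)
            if t != want or len(seq) == end:
                break
    return seq
```

The speedup comes from verifying all k draft tokens in a single target forward pass, which is why draft quality (SpecBundle's focus) determines the acceptance rate and therefore the end-to-end latency.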
Results
SpecForge achieved up to 9.9 times faster training for the EAGLE-3 model and enabled draft models that provide up to 4.48 times end-to-end inference speedup on the SGLang benchmark, demonstrating substantial improvements in both training efficiency and inference performance.
Implications
The introduction of SpecForge and SpecBundle has the potential to significantly improve the deployment of speculative decoding in various applications, particularly in environments where low latency is critical. This framework could facilitate broader adoption of speculative decoding techniques in the industry, enhancing the performance of large language models.
Beyond Weighted Summation: Learnable Nonlinear Aggregation Functions for Robust Artificial Neurons
Theory
Optimization
Computer Vision
- Introduction of two learnable nonlinear aggregation functions: F-Mean and Gaussian Support neurons.
- Development of hybrid neurons that combine linear and nonlinear aggregation for improved robustness.
- Evaluation on CIFAR-10 demonstrates significant improvements in noise robustness and slight gains in clean data performance.
- Learned parameters converge to sub-linear aggregation strategies, indicating effective noise handling.
Summary
This paper addresses the limitations of the traditional weighted summation aggregation method used in artificial neurons, which is sensitive to noise and outliers. The author proposes two novel differentiable aggregation mechanisms: the F-Mean neuron, which utilizes a learnable power-weighted aggregation rule, and the Gaussian Support neuron, which employs distance-aware affinity weighting. To enhance optimization stability, hybrid neurons are introduced that blend linear and nonlinear aggregation through a learnable parameter. The proposed methods are evaluated on multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) using the CIFAR-10 dataset and a noisy variant with additive Gaussian corruption. The results indicate that hybrid neurons significantly improve robustness against noise, with the three-way hybrid achieving robustness scores of up to 0.991 compared to 0.890 for the standard baseline. Additionally, the F-Mean hybrids show modest performance gains on clean data. The findings suggest that neuron-level aggregation is a crucial yet underexplored aspect for developing noise-tolerant neural networks.
Methodology
The paper proposes two new aggregation mechanisms: F-Mean, which uses a power-weighted mean, and Gaussian Support, which applies distance-aware weighting. Hybrid neurons are created by blending these nonlinear methods with traditional linear aggregation through a learnable parameter. The methods are tested in MLP and CNN architectures on CIFAR-10 datasets, both clean and noisy.
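The power-mean family behind the F-Mean neuron, and the α-gated blend used by the hybrids, can be sketched as follows. The paper's exact F-Mean rule and the argument passed to the nonlinear term are not specified here, so both are assumptions; only the general shape (learnable power mean, learnable linear/nonlinear gate) follows the description above.

```python
import numpy as np

def power_mean(x, p, eps=1e-8):
    """Generalized (power) mean over the last axis for nonnegative inputs:
    p = 1 is the arithmetic mean; larger p leans toward max, smaller toward min."""
    return np.mean(np.clip(x, eps, None) ** p, axis=-1) ** (1.0 / p)

def hybrid_neuron(x, w, alpha, p):
    """Gate a linear aggregation (weighted sum) against a nonlinear power-mean
    aggregation via alpha in [0, 1]; the |x * w| nonlinear term is illustrative,
    not the paper's exact formulation."""
    return alpha * (x @ w) + (1.0 - alpha) * power_mean(np.abs(x * w), p)
```

With alpha = 1 the neuron reduces to the standard weighted sum, which is what makes the hybrid easy to optimize while still allowing sub-linear aggregation to emerge during training.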
Results
The hybrid neurons consistently outperform standard neurons in terms of robustness to noise, achieving a maximum robustness score of 0.991 compared to 0.890 for the baseline. The F-Mean hybrids also provide modest improvements in performance on clean data. The learned parameters indicate a trend towards sub-linear aggregation, suggesting effective strategies for noise tolerance.
Implications
The findings imply that rethinking aggregation methods at the neuron level can lead to more robust neural network architectures, particularly in noisy environments. This could have applications in various domains where data quality is variable, such as computer vision and real-world sensor data processing.
Automatic Configuration of LLM Post-Training Pipelines
Large Language Models
Reinforcement Learning
Optimization
- AutoPipe is a novel framework for budget-aware configuration selection in LLM post-training pipelines.
- It employs a dataset-conditioned ranking surrogate to provide transferable guidance across datasets.
- The framework adapts online using Bayesian optimization and a Gaussian-process residual model.
- An early-stop predictor is introduced to minimize evaluation costs by leveraging early training signals.
Summary
The paper addresses the challenges of configuring large language model (LLM) post-training pipelines, which involve supervised fine-tuning (SFT) and reinforcement learning (RL). The authors propose AutoPipe, a budget-aware two-stage framework designed to optimize configuration selection under realistic compute constraints. The framework consists of an offline phase, where a dataset-conditioned learning-to-rank surrogate is learned from historical runs to capture configuration preferences, and an online phase that utilizes this guidance for Bayesian optimization on new datasets. AutoPipe also incorporates an early-stop predictor to reduce evaluation costs by mapping early training signals to a proxy for final performance. Experiments demonstrate that AutoPipe consistently outperforms offline-only baselines and achieves competitive results with leading online hyperparameter optimization (HPO) methods while using less than 10% of their computational resources.
Methodology
AutoPipe operates in two phases: an offline phase where it learns a ranking surrogate from historical data to guide configuration selection, and an online phase where it applies Bayesian optimization to refine configurations for new datasets. The online phase also utilizes an early-stop predictor to evaluate configurations based on early training signals, allowing for cost-effective optimization.
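AutoPipe's actual surrogate is a dataset-conditioned learning-to-rank model; the sketch below stands in a simple one-dimensional prior function plus a Gaussian-process residual correction, which is the general "offline prior + online GP residual" pattern rather than the paper's implementation. All names are hypothetical.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF kernel matrix between two 1-D arrays of config encodings."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_residual_mean(x_seen, resid_seen, x_query, noise=1e-6):
    """Posterior mean of a zero-mean GP fitted to observed residuals."""
    K = rbf(x_seen, x_seen) + noise * np.eye(len(x_seen))
    return rbf(x_query, x_seen) @ np.linalg.solve(K, resid_seen)

def corrected_score(prior_fn, x_seen, y_seen, x_query):
    """Offline prior plus online GP residual correction:
    score(x) = prior(x) + E[y - prior | observations](x)."""
    resid = y_seen - prior_fn(x_seen)
    return prior_fn(x_query) + gp_residual_mean(x_seen, resid, x_query)
```

Near observed configurations the correction dominates; far from them the RBF kernel decays and the score falls back to the transferable offline prior.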
Results
In experiments focused on biomedical reasoning tasks, AutoPipe outperformed strong offline-only baselines and matched the performance of leading online HPO methods while consuming less than 10% of their computational cost.
Implications
The proposed AutoPipe framework can significantly streamline the configuration process for LLM post-training, making it more accessible and efficient for practitioners. Its ability to adapt to new datasets while minimizing resource expenditure could enhance the deployment of LLMs in various applications, particularly in domains with limited computational budgets.
A Dynamic Bayesian and Machine Learning Framework for Quantitative Evaluation and Prediction of Operator Situation Awareness in Nuclear Power Plants
Theory
Interpretability
Time Series
- Introduces a dynamic Bayesian-machine learning framework for real-time evaluation of operator situation awareness.
- Identifies and quantifies interdependencies among 11 performance shaping factors affecting situation awareness.
- Predicts situation awareness scores from performance shaping factors with a mean absolute percentage error of 13.8%.
- Demonstrates the importance of training quality and stress dynamics in maintaining operator situation awareness.
A Dynamic Bayesian and Machine Learning Framework for Quantitative Evaluation and Prediction of Operator Situation Awareness in Nuclear Power Plants
Summary
This paper presents a novel framework, the Dynamic Bayesian-Machine Learning Framework for Situation Awareness (DBML-SA), aimed at quantitatively evaluating and predicting operator situation awareness (SA) in nuclear power plants (NPPs). Traditional assessment methods like SAGAT and SART are criticized for being static and retrospective, failing to capture the dynamic nature of cognitive processes in high-stakes environments. The DBML-SA framework integrates probabilistic reasoning with machine learning to provide a real-time, interpretable model of SA. Utilizing 212 operational event reports from 2007 to 2021, the authors identified 11 performance shaping factors (PSFs) that influence SA and developed a dynamic Bayesian network (DBN) to model the temporal dependencies of these factors. Additionally, a neural network was employed to predict SART scores based on PSF inputs, achieving a mean absolute percentage error of 13.8% and demonstrating statistical consistency with subjective evaluations. The findings indicate that training quality and stress dynamics are critical factors affecting SA degradation. Overall, the DBML-SA framework offers a significant advancement over traditional methods by enabling real-time cognitive monitoring and early-warning predictions, which are essential for enhancing human-machine reliability in modern digital control rooms.
Methodology
The DBML-SA framework combines statistical analysis of operational event data to identify causal relationships among performance shaping factors, a dynamic Bayesian network to model probabilistic and temporal dependencies, and machine learning techniques to predict situation awareness scores based on measurable cognitive indicators.
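The paper's network structure over the 11 PSFs is not reproduced here; the following is a minimal forward-filtering step for a generic discrete dynamic Bayesian network, with hypothetical two-level states and PSF evidence, to show the kind of temporal update involved.

```python
def dbn_filter_step(belief, transition, likelihood, evidence):
    """One forward-filtering step of a discrete dynamic Bayesian network.

    belief:     dict state -> P(state at t-1 | evidence so far)
    transition: dict (prev_state, psf_level) -> dict state -> prob
    likelihood: dict state -> dict observation -> prob
    evidence:   (psf_level, observation) at time t
    Returns the normalized belief over states at time t.
    """
    psf, obs = evidence
    predicted = {s: 0.0 for s in belief}
    for prev, p_prev in belief.items():          # predict via transition
        for s, p_trans in transition[(prev, psf)].items():
            predicted[s] += p_prev * p_trans
    updated = {s: p * likelihood[s][obs] for s, p in predicted.items()}
    z = sum(updated.values())                     # normalize
    return {s: p / z for s, p in updated.items()}
```

Iterating this step over a scenario timeline yields the real-time SA trajectory that the neural network component then maps to SART-style scores.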
Results
The framework was validated through simulator-based experiments, showing consistent trends between predicted and measured situation awareness levels. The model effectively captured the dynamic nature of operator cognition and provided insights into the factors influencing SA degradation.
Implications
The DBML-SA framework has the potential to transform how situation awareness is monitored and evaluated in high-stakes environments like nuclear power plants, paving the way for improved human-machine interaction and reliability management in advanced digital control systems.
Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning
NLP
Large Language Models
Efficient ML
- Introduction of Sparse Token Embedding Unlearning (STEU) for parameter-efficient unlearning.
- STEU modifies only a small fraction of model parameters, making it suitable for deployment-constrained environments.
- Demonstrated effectiveness across multiple clinical datasets and transformer architectures.
- Achieves near-complete forgetting of targeted information while preserving model utility.
Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning
Summary
This paper addresses the challenge of machine unlearning in clinical language models, which is crucial for complying with privacy regulations that necessitate the removal of sensitive information from deployed systems. The authors introduce Sparse Token Embedding Unlearning (STEU), a novel method that allows for class-level unlearning by making targeted edits to a small subset of token embeddings, rather than modifying the entire encoder layers. This approach is designed to balance effective forgetting of specific information while maintaining the overall utility of the model. The authors conduct extensive experiments using various transformer architectures (BioClinicalBERT, BERT-base, and DistilBERT) across multiple clinical datasets (MIMIC-IV, MIMIC-III, and eICU). Results indicate that STEU achieves near-complete forgetting of the target class with minimal parameter modification (only 0.19% of model parameters), while retaining competitive performance on the remaining tasks. This suggests that localized embedding edits can effectively induce behavioral changes in the model without extensive retraining.
Methodology
The methodology involves the Sparse Token Embedding Unlearning (STEU) approach, which selects token embeddings for modification based on pointwise mutual information (PMI). The method updates only these selected embeddings along with a lightweight classifier head, keeping all encoder layers frozen to minimize changes to the model.
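The PMI-based selection step can be sketched as follows; this uses document-level co-occurrence as a simplifying assumption, and `pmi_top_tokens` is an illustrative name rather than the paper's API.

```python
import math
from collections import Counter

def pmi_top_tokens(docs, labels, target, k=5):
    """Rank tokens by pointwise mutual information with a target class:
    PMI(t, c) = log[ P(t, c) / (P(t) P(c)) ].
    docs are token lists, labels are class names; returns the k tokens
    most associated with `target` (candidates for embedding edits)."""
    tok, tok_cls = Counter(), Counter()
    n_cls = Counter(labels)
    n = len(docs)
    for d, y in zip(docs, labels):
        for t in set(d):                 # document-level co-occurrence
            tok[t] += 1
            if y == target:
                tok_cls[t] += 1
    p_c = n_cls[target] / n
    scores = {t: math.log((joint / n) / ((tok[t] / n) * p_c))
              for t, joint in tok_cls.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Only the embeddings of the returned tokens (plus the classifier head) would be updated; the frozen encoder is what keeps the edit at ~0.19% of parameters.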
Results
In experiments, STEU achieved a forget F1 score of 0.0004 in the MIMIC-IV dataset, indicating near-complete forgetting of the target class, while maintaining a retained average F1 score of 0.4766. This was accomplished with only 0.19% of the model parameters being modified, showcasing the method's efficiency and effectiveness.
Implications
The findings suggest that STEU can be a practical solution for healthcare institutions needing to comply with data privacy regulations, allowing for the removal of sensitive information from clinical models without the need for extensive retraining. This has significant implications for the deployment of AI in sensitive environments where data privacy is paramount.
Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction
Audio & Speech
- Multi-corpus training can lead to performance degradation in spoofing detection due to dataset-specific biases.
- The proposed IDFE framework effectively reduces corpus-specific information in embeddings, improving generalization.
- The IDFE framework achieves a 20% reduction in average EER compared to baseline models across multiple datasets.
- The study emphasizes the need for robust training methodologies to enhance the reliability of anti-spoofing systems.
Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction
Summary
This paper addresses the challenges of speech spoofing detection, particularly the performance variability across different training and evaluation corpora. The authors propose a novel framework called Invariant Domain Feature Extraction (IDFE) that utilizes multi-task learning and a gradient reversal layer to minimize corpus-specific biases in learned embeddings. Their experiments reveal that multi-corpus training does not consistently enhance performance and can lead to degradation due to dataset-specific biases. The IDFE framework effectively reduces the average equal error rate (EER) by 20% compared to baseline methods across four diverse datasets, demonstrating its potential to improve generalization and robustness in spoofing detection systems. The study highlights the importance of addressing dataset biases to enhance the reliability of automatic speaker verification systems against spoofing attacks.
Methodology
The authors employed a multi-task learning approach with a gradient reversal layer to create the IDFE framework, which suppresses dataset-specific information in the embedding space. They conducted experiments using four different datasets to analyze the impact of multi-corpus training and dataset biases on model performance.
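In practice a gradient reversal layer lives inside an autograd framework; the framework-free sketch below makes the mechanism explicit: identity in the forward pass, sign-flipped (and scaled) gradient in the backward pass, so the shared encoder is trained to remove corpus-identifying information. Function names are illustrative.

```python
def grl_forward(x):
    """Gradient reversal layer: the identity in the forward pass."""
    return x

def grl_backward(grad_from_domain_head, lam=1.0):
    """Backward pass: flip and scale the gradient flowing from the
    corpus/domain classifier into the shared encoder."""
    return [-lam * g for g in grad_from_domain_head]

def encoder_update(params, task_grad, domain_grad, lr=0.1, lam=1.0):
    """Multi-task update: descend the spoofing-task gradient while
    ascending the (reversed) corpus-classification gradient."""
    reversed_grad = grl_backward(domain_grad, lam)
    return [p - lr * (g_t + g_d)
            for p, g_t, g_d in zip(params, task_grad, reversed_grad)]
```

When the two gradients agree, the reversed domain term cancels part of the update, pushing the embedding toward features the corpus classifier cannot exploit.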
Results
The IDFE framework demonstrated a significant improvement in detection performance, achieving a 20% reduction in average EER compared to the baseline methods across four evaluation datasets. This indicates that the framework effectively mitigates the negative effects of dataset-specific biases.
Implications
The findings suggest that addressing dataset biases is crucial for developing more reliable spoofing detection systems. The IDFE framework can be applied to various detection architectures, potentially enhancing the robustness of automatic speaker verification systems against spoofing attacks.
Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers
Federated Learning
- Development of a cross-facility FL framework tailored for heterogeneous HPC environments.
- Systematic characterization of performance variations due to computational throughput and communication costs.
- Evaluation of existing FL algorithms under realistic HPC scheduling conditions.
- Validation of the framework's applicability through fine-tuning a large language model on scientific data.
Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers
Summary
This paper addresses the challenges of training large scientific models using Federated Learning (FL) across multiple High Performance Computing (HPC) facilities. Traditional FL frameworks are not well-suited for the unique constraints of HPC environments, such as job scheduling delays, heterogeneous architectures, and strict security policies. The authors present a comprehensive cross-facility FL framework built on the Advanced Privacy-Preserving Federated Learning (APPFL) framework, utilizing Globus Compute and Transfer for orchestration. The framework is evaluated across four U.S. Department of Energy leadership-class supercomputers, demonstrating that cross-facility FL is feasible and effective for scientific applications. The study characterizes the performance variations due to computational heterogeneity and evaluates various FL algorithms under realistic scheduling conditions. The authors validate the framework by fine-tuning a large language model on a chemistry instruction dataset, highlighting the importance of scheduler-aware algorithm design for future deployments.
Methodology
The authors implemented a cross-facility FL framework that orchestrates training across diverse HPC facilities. They utilized the APPFL framework and Globus Compute for task dispatching and Globus Transfer for model updates. The framework was evaluated through experiments on four DOE supercomputers, focusing on both large-scale co-scheduled runs and smaller queue-based runs to assess performance under realistic conditions.
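APPFL supports several aggregation strategies; as a baseline illustration of what the server does with the model updates shipped back via Globus Transfer, here is plain sample-weighted FedAvg over flat parameter lists (one entry per facility).

```python
def fedavg(client_updates):
    """Sample-weighted federated averaging of model parameters.

    client_updates: list of (num_samples, params) pairs, where params
    is a flat list of floats from one facility/supercomputer."""
    total = sum(n for n, _ in client_updates)
    avg = [0.0] * len(client_updates[0][1])
    for n, params in client_updates:
        w = n / total                    # weight by local sample count
        for i, p in enumerate(params):
            avg[i] += w * p
    return avg
```

The scheduler-awareness the paper argues for enters around this step: under queue-based runs, updates arrive asynchronously, so when each client's contribution is folded in matters as much as how it is weighted.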
Results
The experiments demonstrated that cross-HPC facility FL is practically achievable, revealing significant performance variations influenced by heterogeneity in computational resources and scheduling dynamics. The evaluation of FL algorithms showed that their performance can be significantly affected by the specific scheduling conditions and memory constraints of the HPC environments.
Implications
The findings suggest that cross-facility FL can facilitate collaborative scientific model training while addressing privacy and data sovereignty challenges. The framework's design principles can guide future research and deployments in federated learning for scientific applications, particularly in environments with diverse computational resources.
A Visualization for Comparative Analysis of Regression Models
Theory
Interpretability
- Traditional regression metrics like MAE and RMSE can mask important differences in model performance.
- The proposed 2D Error Space visualization allows for a more nuanced understanding of regression model errors.
- The methodology includes a colormap for error distribution visualization and uses Mahalanobis distance for better comparison.
- The approach is validated on three real datasets, showcasing its practical relevance.
A Visualization for Comparative Analysis of Regression Models
Summary
This paper addresses the limitations of traditional metrics used to evaluate regression models, such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), which collapse performance into a single aggregate number and thereby obscure critical behavioral differences between models. The authors propose a novel visualization methodology that enhances the comparative analysis of regression models by focusing on the distribution of errors. The methodology consists of two main steps: first, selecting the best models using 1D visualizations, and second, analyzing the errors of these models in a 2D Error Space. This 2D Error Space incorporates a colormap to visualize the percentile-based distribution of errors, allowing for the identification of dense regions and outliers, and utilizes the Mahalanobis distance to account for correlations and differences in scale. The proposed visualization approach is illustrated using three real datasets, demonstrating its effectiveness in revealing patterns that traditional metrics may obscure.
Methodology
The methodology involves two main steps: (1) selecting the best-performing regression models using 1D visualizations based on traditional metrics, and (2) analyzing the errors of these selected models in a 2D Error Space that visualizes error distributions using a colormap and applies Mahalanobis distance for comparison.
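Assuming the 2D Error Space plots per-sample errors of a pair of models (one model per axis — an interpretation, since the summary does not spell out the axes), the Mahalanobis step looks like this:

```python
import numpy as np

def mahalanobis_2d(errors_a, errors_b):
    """Mahalanobis distance of each point in a 2-D error space.

    errors_a, errors_b: per-sample errors of two regression models.
    Whitening by the joint covariance removes scale differences and
    correlation between the models before measuring distance from
    the mean error, so outliers are comparable across axes."""
    pts = np.column_stack([errors_a, errors_b])
    diff = pts - pts.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pts, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
```

Because the distance is affine-invariant, rescaling one model's errors (e.g. a change of units) leaves the ranking of outliers untouched, which is exactly why it is preferable to Euclidean distance here.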
Results
The proposed visualization technique successfully highlights the distribution of errors across different regression models, revealing patterns and discrepancies that traditional metrics fail to capture. The application of this method on three real datasets demonstrated its ability to provide a comprehensive view of model performance.
Implications
This visualization approach can significantly enhance the interpretability of regression models, particularly in fields where understanding error patterns is crucial, such as medical diagnosis and autonomous driving. It encourages a more detailed analysis of model performance beyond traditional metrics, potentially leading to better model selection and deployment strategies.
Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
Optimization
Efficient ML
Computer Vision
- Tula optimizes large-batch training by balancing time, cost, and model quality.
- The service predicts training time and cost with an error margin of 7.5-14%.
- Achieves up to 20× speedup and approximately 9% improvement in test accuracy over standard methods.
- Introduces a gradient-scaling technique to mitigate the generalization gap associated with large-batch training.
Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
Summary
The paper introduces Tula, an online service designed to optimize time, cost, and convergence quality in large-batch training of convolutional models. It addresses the challenges of distributed training, where simply increasing batch size can lead to diminishing returns in performance due to increased communication overhead and the generalization gap. Tula employs a combination of parallel-systems modeling and statistical performance prediction to determine the optimal batch size tailored to specific models, datasets, and computational resources. The authors present a comprehensive analysis of the trade-offs between parallel and statistical efficiency in large-batch training, develop a performance model that estimates resource requirements and execution overhead, and introduce a gradient-scaling technique that enhances model accuracy. The effectiveness of Tula is demonstrated through extensive evaluations across various vision tasks, showing significant improvements in training speed and model accuracy compared to standard large-batch training methods.
Methodology
Tula utilizes a black-box approach combining empirical performance modeling with profiling-based compute analysis to estimate memory demands and a parallel model to predict compute and synchronization costs. It incorporates a gradient-scaling technique to improve accuracy in large-batch training.
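Tula's actual performance model is empirical; the toy model below only illustrates the trade-off it navigates, under two stated assumptions: per-step time is per-sample compute plus a fixed all-reduce cost, and the generalization gap grows with log batch size. Neither assumption is taken from the paper.

```python
import math

def epoch_time(n, batch, t_sample, t_comm):
    """Toy per-epoch time: n/batch steps, each paying compute
    proportional to the batch plus a fixed all-reduce cost."""
    return (n / batch) * (batch * t_sample + t_comm)

def objective(n, batch, t_sample, t_comm, gap_coef):
    """Time plus a crude log-batch proxy for the large-batch
    generalization gap (an assumption, not the paper's model)."""
    return epoch_time(n, batch, t_sample, t_comm) + gap_coef * math.log(batch)

def best_batch(n, candidates, t_sample, t_comm, gap_coef):
    """Pick the candidate batch size minimizing the combined objective."""
    return min(candidates,
               key=lambda b: objective(n, b, t_sample, t_comm, gap_coef))
```

Time alone always favors the largest batch (fewer synchronizations); adding the quality penalty produces an interior optimum, which is the batch size a service like Tula is meant to find.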
Results
Tula demonstrated up to a 20× overall speedup in training and a roughly 9% average improvement in test accuracy across various vision tasks compared to traditional large-batch training methods. The performance model accurately predicted training time and cost within a 7.5-14% error range.
Implications
The findings suggest that Tula can significantly enhance the efficiency and effectiveness of distributed training in deep learning, making it applicable in environments with heterogeneous computing resources and large datasets. This could lead to faster model development cycles and improved model performance in real-world applications.
A Mathematical Theory of Understanding
Theory
- The ability to decode information is dependent on the learner's prerequisite knowledge structure.
- Teaching is modeled as sequential communication, where the effectiveness of signals varies based on the learner's current knowledge state.
- Two limits on learning speed are identified: structural (prerequisite reachability) and epistemic (uncertainty about the target).
- Threshold effects in learning imply that resource allocation strategies should focus on depth rather than uniform distribution.
A Mathematical Theory of Understanding
Summary
This paper presents a mathematical model addressing the learner-side bottleneck in understanding information, particularly in the context of generative AI. It posits that the value of information is contingent upon the learner's ability to decode it, which is influenced by their prerequisite knowledge structure. The author defines a 'mind' as a learning system whose capacity to interpret signals is shaped by previously acquired concepts. The model explores the relationship between teaching, learning, and the structural prerequisites necessary for understanding. It identifies two primary limits on learning speed: a structural limit based on prerequisite reachability and an epistemic limit tied to uncertainty about the target concept. The findings suggest that effective teaching requires a careful consideration of the learner's current knowledge state, with implications for instructional resource allocation. The paper concludes that personalized instruction may outperform common broadcast curricula, especially in heterogeneous learner environments, due to the varying prerequisite structures among learners.
Methodology
The paper employs a formal mathematical framework to model the concept of a 'mind' as a learning system characterized by a prerequisite structure. It analyzes the flow of information through this model to derive insights about teaching and learning dynamics.
Results
The model establishes a lower bound on teaching time that incorporates both structural and epistemic barriers. It reveals that once the prerequisite structure allows for a target concept to be reachable, a single additional signal can suffice for identification. The study also indicates that instructional time yields non-concave returns, emphasizing the importance of strategic resource allocation in teaching.
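The bound's exact form is not given in this summary; one hedged schematic consistent with the two barriers named above (notation assumed, not the paper's):

```latex
% depth(c): length of the longest prerequisite chain below target c
%           (structural barrier)
% H: the learner's hypothesis set for the target once c is reachable
%    (epistemic barrier)
T(c) \;\ge\; \underbrace{\operatorname{depth}(c)}_{\text{structural}}
      \;+\; \underbrace{\big\lceil \log_2 |H| \big\rceil}_{\text{epistemic}}
```

On this reading, the observation that "a single additional signal can suffice" corresponds to the epistemic term collapsing to 1 once the hypothesis set has been narrowed to two candidates.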
Implications
The findings have significant implications for educational strategies, particularly in designing curricula that account for varying learner prerequisites. It suggests that personalized teaching approaches may enhance learning outcomes, especially in environments with diverse learner capabilities. Additionally, the insights could inform the development of AI systems that adaptively tailor information delivery based on user understanding.
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
Optimization
Reinforcement Learning
Theory
- Establishes finite-time convergence bounds for stochastic approximation under heavy-tailed and LRD noise.
- Demonstrates that convergence rates degrade with the presence of heavy-tailed and LRD noise compared to classical models.
- Introduces a noise-averaging technique that improves moment bounds without modifying the iteration process.
- Provides the first finite-time guarantees for SGD under LRD noise and for gradient play under both heavy-tailed and LRD noise.
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
Summary
This paper addresses the limitations of classical stochastic approximation (SA) methods, which typically assume martingale difference or Markov noise with bounded second moments. The authors investigate the effects of heavy-tailed and long-range dependent (LRD) noise on SA, particularly in the context of finding the root of strongly monotone operators. They establish the first finite-time moment bounds for SA under these non-classical noise models, providing explicit convergence rates that reflect the impact of heavy tails and temporal dependence. The analysis employs a noise-averaging technique that regularizes the noise's impact without altering the iteration process. The framework is applied to stochastic gradient descent (SGD) and gradient play, with numerical experiments validating the theoretical findings. The results indicate that heavy-tailed noise and LRD significantly slow convergence rates compared to classical noise assumptions, with specific decay rates derived for both types of noise.
Methodology
The authors utilize a noise-averaging argument to regularize the impact of heavy-tailed and LRD noise on the stochastic approximation process. They derive moment bounds for the error in finding the root of a strongly monotone operator, analyzing the convergence rates under different noise conditions. The framework is applied to specific algorithms like SGD and gradient play, with numerical experiments conducted to support the theoretical analysis.
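A toy simulation of the setup (not the paper's operator, constants, or proof technique): stochastic approximation on a strongly monotone operator x - target, driven by symmetrized Pareto noise, with each update using a window average of several noisy evaluations in the spirit of the noise-averaging argument.

```python
import random

def pareto_noise(alpha, rng):
    """Symmetrized Pareto draw: zero-mean, heavy-tailed for small alpha."""
    mag = rng.random() ** (-1.0 / alpha) - 1.0
    return mag if rng.random() < 0.5 else -mag

def sa_step(x, k, target, alpha, window, rng):
    """One stochastic-approximation step with window-averaged noise.
    Averaging `window` noisy evaluations before the update mimics the
    idea of regularizing the noise's impact without changing the
    iteration itself."""
    noise = sum(pareto_noise(alpha, rng) for _ in range(window)) / window
    grad = (x - target) + noise      # strongly monotone operator + noise
    return x - grad / (k + 1)        # diminishing step size 1/(k+1)
```

Running this with heavier tails (smaller alpha) visibly slows the error decay, matching the qualitative message of the finite-time bounds.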
Results
The paper presents finite-time moment bounds for stochastic approximation under heavy-tailed noise, showing a decay rate of O(k^(-(p-1))) for the p-th moment error, and, for LRD noise, a mean square error bound of O(k^(-δ)), where δ is determined by the autocovariance decay. These results indicate that heavier-tailed noise and stronger temporal dependence lead to slower convergence rates.
Implications
The findings have significant implications for various applications in optimization and reinforcement learning, particularly in fields like finance and communications where heavy-tailed and LRD noise are prevalent. The established finite-time guarantees can enhance the robustness and reliability of algorithms in practical scenarios.
GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models
NLP
Large Language Models
Interpretability
- GeoLAN treats token representations as geometric trajectories to improve interpretability in LLMs.
- Two differentiable regularizers are introduced to promote isotropy and diverse attention.
- Experiments show that GeoLAN maintains task performance while enhancing geometric metrics and reducing biases.
- The approach reveals scale-dependent trade-offs, particularly beneficial for mid-sized models.
GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models
Summary
The paper introduces GeoLAN, a novel training framework aimed at enhancing the interpretability of large language models (LLMs) by treating token representations as geometric trajectories. This approach addresses the transparency crisis in LLMs, which often operate as black boxes, making it difficult to audit their decision-making processes. GeoLAN employs two differentiable regularizers, Katz-Tao Convex Wolff (KT-CW) and Katz-Tao Attention (KT-Attn), designed to promote isotropy and encourage diverse attention patterns. The authors conducted experiments using various model sizes, including Gemma-3 and Llama-3-8B, demonstrating that GeoLAN can maintain task accuracy while improving geometric metrics and reducing fairness biases, particularly in mid-sized models. The findings suggest that geometry-aware training can enhance mechanistic interpretability, revealing scale-dependent trade-offs between geometric precision and model performance.
Methodology
The authors developed a geometric theory for internal explainability by formalizing token representations as geometric trajectories. They introduced two differentiable loss functions that enforce 'stickiness' in the latent space to prevent representation collapse and promote the use of the full geometric space during training. The methodology also includes a comprehensive evaluation protocol to assess the impact of geometric constraints on explainability.
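The KT-CW and KT-Attn losses themselves are not specified in this summary; as a stand-in, here is a generic differentiable-style isotropy penalty on token embeddings, zero exactly when the covariance is a multiple of the identity (the property an isotropy regularizer promotes).

```python
import numpy as np

def isotropy_penalty(embeddings):
    """Generic isotropy penalty (a stand-in, not the paper's KT-CW
    loss): variance of the normalized covariance eigenvalues of the
    token embeddings. Zero iff the covariance is a multiple of I,
    i.e. the representations fill all directions equally."""
    X = embeddings - embeddings.mean(axis=0)
    cov = (X.T @ X) / len(X)
    eig = np.linalg.eigvalsh(cov)
    return float(np.var(eig / eig.mean()))
```

Adding such a term to the training loss penalizes representation collapse into a few dominant directions, which is the failure mode geometry-aware training targets.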
Results
The experiments indicated that GeoLAN effectively maintains task accuracy across various model sizes while improving geometric metrics and reducing fairness biases. The most significant improvements were observed in mid-sized models, highlighting the scale-dependent nature of the proposed geometric constraints.
Implications
GeoLAN's approach could lead to more interpretable LLMs, making them safer and more accountable in critical applications such as legal and medical domains. The findings suggest that geometry-aware training could be a viable strategy for enhancing the interpretability of complex models.
From exp to poly: Gaussian Splatting with Polynomial Kernels
Computer Vision
Efficient ML
- Introduction of an N-th-order polynomial kernel for Gaussian Splatting that is computationally efficient and compatible with existing datasets.
- Significant performance improvements (4%-15%) with negligible degradation in image quality.
- Formal mathematical derivation proving invariance of anti-aliasing normalization factors for arbitrary kernel functions.
- Methodology for fitting polynomial coefficients using L1 loss tailored to practical quadric distributions in 3DGS.
From exp to poly: Gaussian Splatting with Polynomial Kernels
Summary
This paper presents a novel approach to Gaussian Splatting (3DGS) by introducing a polynomial kernel approximation that enhances computational efficiency while maintaining compatibility with existing datasets. The authors replace the traditional exponential kernel with a polynomial approximation combined with a ReLU function, allowing for more aggressive culling of splats. This modification leads to a performance improvement of 4%-15% with minimal impact on image quality. The paper includes a detailed mathematical analysis of the new kernel, demonstrating its benefits for 3DGS implementations, particularly on NPU hardware. The authors also propose a methodology for fitting polynomial coefficients optimized for real-world rendering scenarios, ensuring that the new kernel is practical for various applications in neural rendering.
Methodology
The authors propose a polynomial kernel approximation for Gaussian Splatting, focusing on maintaining compatibility with existing datasets. They derive a universal bounding radius for splats and implement a fitting methodology for polynomial coefficients using L1 loss. The performance of the new kernel is evaluated across multiple 3DGS implementations and rendering APIs.
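The fitting step can be sketched as below, with one stated simplification: the paper fits coefficients with an L1 loss matched to practical quadric distributions, whereas this sketch substitutes ordinary least squares (`np.polyfit`). The ReLU clamp is what enables aggressive culling, since clamped regions contribute exactly zero.

```python
import numpy as np

def fit_poly_kernel(order=4, r_max=3.0, n=400):
    """Fit a polynomial approximation to the Gaussian falloff
    exp(-r^2 / 2) on [0, r_max]. Least squares stands in for the
    paper's L1 fit."""
    r = np.linspace(0.0, r_max, n)
    return np.polyfit(r, np.exp(-0.5 * r ** 2), order)

def poly_kernel(r, coeffs):
    """Evaluate the fitted kernel, clamped with a ReLU so negative
    polynomial values contribute nothing to the splat."""
    return np.maximum(np.polyval(coeffs, r), 0.0)
```

Evaluating the polynomial needs only multiplies and adds (Horner's scheme inside `np.polyval`), which is what makes it attractive on NPU hardware without a fast exponential unit.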
Results
The proposed polynomial kernel shows a performance improvement of 4%-15% in various 3DGS implementations, with minimal impact on image quality. The mathematical analysis confirms that anti-aliasing normalization factors remain invariant across different kernel functions.
Implications
This work has significant implications for real-time rendering applications, particularly in optimizing performance for neural rendering pipelines. The compatibility with existing datasets facilitates broader adoption of the proposed methods in practical scenarios.
GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression
Generative Models
Efficient ML
Optimization
- Introduces a goal-oriented approach to data sampling and compression in network telemetry.
- Combines adaptive sampling with generative AI for efficient data acquisition.
- Utilizes a hybrid compression scheme to balance fidelity and efficiency.
- Demonstrates significant cost reductions in data transfer while preserving analytical performance.
GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression
Summary
The paper presents GO-GenZip, a novel framework for network telemetry that leverages generative AI to optimize data sampling and compression from a goal-oriented perspective. Traditional telemetry systems struggle with the increasing volume of data, leading to inefficiencies in storage and transmission. GO-GenZip addresses this by integrating adaptive sampling policies with generative modeling to selectively acquire and compress data based on its relevance to specific downstream tasks. The framework employs adaptive masking techniques to identify critical features in the data while utilizing a hybrid compression approach that combines traditional lossless coding with generative AI-driven lossy compression. Experimental results demonstrate that GO-GenZip achieves over 50% reductions in sampling and data transfer costs while maintaining high reconstruction accuracy and analytical fidelity for various tasks. This work highlights the potential of generative AI in enhancing the efficiency of network telemetry systems, making it applicable to diverse contexts beyond telecommunications.
Methodology
The methodology involves designing an adaptive masking policy for sampling relevant telemetry data, implementing a hybrid compression strategy that integrates generative AI with traditional lossless methods, and developing a goal-oriented end-to-end training method to optimize both masking and compression policies. The framework is validated using real network telemetry data collected from multiple base stations.
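A minimal sketch of the hybrid idea only (not the paper's pipeline): goal-relevant features are kept at full precision and losslessly compressed, while the rest are coarsely quantized first; the quantization stands in for the generative lossy side, and all names are hypothetical.

```python
import json
import zlib

def goal_oriented_compress(records, keep_keys, quant=1):
    """Hybrid compression sketch: features relevant to the downstream
    goal (keep_keys) survive at full precision; the rest are rounded
    to `quant` decimals (the lossy side) before lossless deflate."""
    reduced = [{k: (v if k in keep_keys else round(v, quant))
                for k, v in rec.items()}
               for rec in records]
    return zlib.compress(json.dumps(reduced).encode())

def decompress(blob):
    """Recover the (partially quantized) records."""
    return json.loads(zlib.decompress(blob).decode())
```

The goal-oriented part of the framework is deciding `keep_keys` adaptively from the downstream task, rather than fixing it by hand as in this sketch.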
Results
The experimental results indicate that GO-GenZip achieves over 50% reductions in both sampling and data transfer costs compared to traditional methods, while maintaining comparable reconstruction accuracy and goal-oriented analytical fidelity across various tasks.
Implications
The proposed framework has significant implications for improving the efficiency of network telemetry systems, particularly in the context of next-generation networks. It can be applied to various use cases, including channel charting and Integrated Sensing and Communication (ISAC) scenarios, thereby enhancing the adaptability and performance of network management systems.
Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL
NLP
Large Language Models
Theory
- Demonstration effectiveness is quantified using Lipschitz constants, linking quality to ICL performance.
- CoT prompting benefits ICL by decomposing tasks into manageable subtasks, contingent on well-selected demonstrations.
- The influence of prompt templates on ICL performance varies with the number of demonstrations, exhibiting diminishing returns.
- Theoretical results are supported by empirical experiments, confirming the model's ability to generalize beyond pretraining.
Read more
Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL
Summary
This paper presents a theoretical analysis of In-Context Learning (ICL) in Large Language Models (LLMs), addressing gaps in existing literature that often rely on strong assumptions or overlook practical factors affecting performance. The authors establish a framework that links demonstration selection, Chain-of-Thought (CoT) prompting, the number of demonstrations, and prompt templates to the generalization behavior of ICL. They derive an upper bound on ICL test loss, identifying three key factors influencing performance: the quality of demonstrations (measured by Lipschitz constants), the intrinsic ICL capability of the model, and the degree of distribution shift. The analysis reveals that CoT prompting can enhance ICL by decomposing tasks into simpler subtasks, provided that demonstrations are well-chosen. Additionally, the interaction between prompt templates and the number of demonstrations is characterized, showing that the impact of templates diminishes as the number of demonstrations increases. The theoretical insights are validated through experiments, demonstrating that pretraining equips models to generalize to unseen tasks effectively.
Methodology
The authors develop a theoretical framework under mild assumptions to analyze ICL performance. They derive mathematical relationships linking demonstration quality, CoT prompting, and prompt templates to ICL outcomes. Experiments are conducted to validate the theoretical findings.
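Schematically, the three factors might enter an upper bound additively; the display below is our illustrative notation, not the paper's stated result.

```latex
\mathcal{L}_{\mathrm{test}}
\;\lesssim\;
\underbrace{L_{\mathrm{demo}}\,\varepsilon_{\mathrm{demo}}}_{\text{demonstration quality}}
\;+\;
\underbrace{\varepsilon_{\mathrm{ICL}}}_{\text{intrinsic ICL capability}}
\;+\;
\underbrace{D\!\left(\mathcal{P}_{\mathrm{pretrain}},\mathcal{P}_{\mathrm{test}}\right)}_{\text{distribution shift}}
```

Here $L_{\mathrm{demo}}$ stands for the Lipschitz constant measuring demonstration quality and $D(\cdot,\cdot)$ for a divergence quantifying the shift between pretraining and test distributions; the paper's actual bound should be consulted for the precise terms.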
Results
The study establishes that ICL performance is governed by the quality of demonstrations, the model's intrinsic capabilities, and the distribution shift. CoT prompting is shown to improve performance when subtasks are well-defined. The interaction between prompt templates and demonstration count reveals that templates have a significant impact when few demonstrations are used, but this effect diminishes with more demonstrations.
Implications
The findings suggest that careful selection of demonstrations and effective use of CoT prompting can significantly enhance the performance of LLMs in ICL scenarios. This has implications for designing better prompting strategies and understanding the generalization capabilities of pretrained models.
MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Large Language Models
Reinforcement Learning
Graph Learning
- MemReward utilizes a graph-based structure to enhance reward prediction in LLMs with limited labeled data.
- The framework achieves 97.3% of Oracle performance with only 20% of the required labels.
- MemReward outperforms fully-supervised models on out-of-domain tasks, showcasing its generalization capabilities.
- MemReward's performance scales with the label budget, reaching 99.4% of Oracle performance at 70% labels.
Read more
MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Summary
The paper introduces MemReward, a novel graph-based experience memory framework designed to enhance reward prediction for large language models (LLMs) in scenarios where labeled data is scarce. Traditional reinforcement learning (RL) fine-tuning of LLMs relies heavily on reward labels, which can be expensive and time-consuming to obtain. MemReward addresses this challenge by leveraging a heterogeneous graph structure where queries, thinking processes, and answers are represented as nodes. A graph neural network (GNN) is employed to propagate rewards from labeled to unlabeled rollouts, allowing for effective online optimization. The framework was evaluated on two model sizes (Qwen2.5-3B and 1.5B) across various tasks, including mathematics, question answering, and code generation. The results demonstrate that MemReward can achieve performance close to that of fully-supervised models, even with only 20% of the labels, and surpasses the Oracle performance on out-of-domain tasks, indicating its potential for generalization and efficiency in RL fine-tuning.
Methodology
MemReward employs a heterogeneous graph structure where nodes represent queries, thinking processes, and answers. A GNN is trained on labeled nodes to predict rewards for unlabeled rollouts during online optimization. The model constructs edges based on semantic similarity and relationships between different nodes, allowing for effective information aggregation and reward propagation.
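As a toy stand-in for the GNN, the reward-propagation idea can be sketched as similarity-weighted label propagation over the rollout graph. Function names, node labels, and edge weights below are ours:

```python
# Minimal sketch of propagating rewards from labeled to unlabeled rollouts
# over a similarity graph (a hand-rolled stand-in for the paper's GNN).

def propagate_rewards(edges, labeled):
    """One propagation step: an unlabeled node's reward is the
    similarity-weighted average of its labeled neighbours."""
    predicted = dict(labeled)
    neighbours = {}
    for u, v, w in edges:
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))
    for node, nbrs in neighbours.items():
        if node in labeled:
            continue
        known = [(labeled[n], w) for n, w in nbrs if n in labeled]
        if known:
            total = sum(w for _, w in known)
            predicted[node] = sum(r * w for r, w in known) / total
    return predicted

# Toy graph: rollouts a and c carry reward labels, b is unlabeled.
edges = [("a", "b", 1.0), ("b", "c", 3.0)]
rewards = propagate_rewards(edges, labeled={"a": 0.0, "c": 1.0})
print(rewards["b"])  # weighted toward c: 0.75
```

A trained GNN replaces this fixed averaging rule with learned message functions, but the information flow, from labeled to unlabeled rollouts along similarity edges, is the same.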
Results
MemReward demonstrated impressive results, achieving 97.3% of Oracle performance on the Qwen2.5-3B model and 96.6% on the 1.5B model with only 20% of the labels. It also surpassed fully-supervised Oracle performance on out-of-domain tasks, indicating its robustness and ability to generalize across different domains.
Implications
The findings suggest that MemReward can significantly reduce the reliance on extensive labeled datasets for training LLMs, making it a valuable approach in scenarios where labeling is challenging or costly. This could lead to more efficient training processes and broader applications of LLMs in various fields.
MeanFlow Meets Control: Scaling Sampled-Data Control for Swarms
Robotics
Optimization
Generative Models
- Introduces a control-space learning framework for swarm steering under sampled-data control.
- Focuses on learning a coefficient for finite-horizon minimum-energy control rather than instantaneous velocity fields.
- Demonstrates a scalable approach to few-step swarm steering consistent with real control systems.
- Establishes integral and differential representations for the learned control coefficient.
Read more
MeanFlow Meets Control: Scaling Sampled-Data Control for Swarms
Summary
This paper addresses the challenge of steering large-scale swarms with limited control updates, particularly in sampled-data systems where control inputs are applied intermittently. The authors introduce a control-space learning framework inspired by MeanFlow, focusing on linear time-invariant dynamics. Instead of relying on instantaneous velocity fields, the framework learns a coefficient that parameterizes the finite-horizon minimum-energy control for each sampling interval. This approach allows for a scalable method of swarm steering that respects the sampled-data structure of real control systems. The paper demonstrates that the learned coefficient can be represented both integrally and through local differential identities, leading to a straightforward stop-gradient training objective. The implementation uses the learned coefficient directly in sampled-data updates, ensuring compliance with prescribed dynamics and actuation maps. The proposed framework effectively bridges the gap between flow-based generative modeling and control theory, providing a robust solution for few-step swarm steering.
Methodology
The methodology involves learning a coefficient that parameterizes the finite-horizon minimum-energy control for each sampling interval in a sampled-data control framework. The training process utilizes bridge-based supervision to align the learned quantity with the actual control applied over finite windows, ensuring that the learned controller respects the system's dynamics and actuation maps.
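The finite-horizon minimum-energy control that the coefficient parameterizes has a closed form for simple systems. A hedged scalar sketch (the paper works with general LTI dynamics and learns this quantity rather than computing it):

```python
# Closed-form minimum-energy control for a scalar LTI system
# x[k+1] = a*x[k] + b*u[k], computed exactly here for illustration.

def min_energy_control(a, b, x0, x_target, n_steps):
    """Inputs u[0..n-1] of least energy driving x0 to x_target in n steps."""
    # Influence of u[j] on the final state: a^(n-1-j) * b.
    gains = [a ** (n_steps - 1 - j) * b for j in range(n_steps)]
    gap = x_target - (a ** n_steps) * x0
    gram = sum(g * g for g in gains)       # scalar controllability Gramian
    return [g * gap / gram for g in gains]

def rollout(a, b, x0, controls):
    x = x0
    for u in controls:
        x = a * x + b * u
    return x

u = min_energy_control(a=1.0, b=1.0, x0=0.0, x_target=4.0, n_steps=4)
print(u)                                   # equal effort each step
print(rollout(1.0, 1.0, 0.0, u))           # reaches the target: 4.0
```

For an integrator (a = b = 1) the least-energy solution spreads the effort evenly, which is the kind of structure the learned coefficient must reproduce per sampling interval.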
Results
The results indicate that the proposed framework successfully scales swarm steering in sampled-data settings, demonstrating effective control with fewer updates. The learned coefficients align well with the minimum-energy control requirements, showing robustness in implementation and adherence to the system's dynamics.
Implications
The findings have significant implications for robotics, autonomous transportation, and distributed sensing, where effective swarm control is crucial. The framework can enhance the performance of large-scale swarm systems by providing a method that respects the constraints of real-world sampled-data control systems.
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
Reinforcement Learning
Large Language Models
NLP
- SLEA-RL retrieves experiences at each decision step, improving relevance and adaptability.
- The framework includes a self-evolving experience library that maintains quality under continuous updates.
- Empirical results show superior performance on multi-turn benchmarks compared to standard RL methods.
Read more
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
Summary
The paper introduces SLEA-RL, a novel framework for reinforcement learning that enhances the training of Large Language Model (LLM) agents in multi-turn tool-use tasks. Traditional methods fail to leverage accumulated experiences across episodes, leading to static retrieval of experiences that become less relevant as the task progresses. SLEA-RL addresses this by implementing step-level experience retrieval, where relevant experiences are fetched at each decision step based on the current observation. The framework consists of three main components: (i) step-level observation clustering for efficient retrieval, (ii) a self-evolving experience library that distills successful strategies and failure patterns, and (iii) policy optimization with step-level credit assignment for improved advantage estimation. The experience library evolves alongside the policy through semantic analysis rather than gradient updates, allowing for continuous learning from past interactions. Experiments demonstrate that SLEA-RL significantly outperforms various reinforcement learning baselines on long-horizon multi-turn agent benchmarks, showcasing its effectiveness in adapting to dynamic environments.
Methodology
SLEA-RL employs step-level observation clustering to group structurally similar environmental states for efficient experience retrieval. It utilizes a self-evolving experience library that distills strategies through score-based admission and rate-limited extraction. The policy optimization incorporates step-level credit assignment to enhance learning from intermediate actions.
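Step-level retrieval against observation clusters can be sketched as a nearest-centroid lookup into the experience library. Cluster names, vectors, and stored experiences below are hypothetical:

```python
# Illustrative sketch (our naming) of step-level experience retrieval:
# observations are grouped into clusters, and at each decision step the
# agent fetches the experiences stored under the nearest cluster centroid.

def nearest_cluster(obs_vec, centroids):
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda c: dist(obs_vec, centroids[c]))

def retrieve(obs_vec, centroids, library):
    """Return the experiences filed under the cluster closest to obs_vec."""
    return library.get(nearest_cluster(obs_vec, centroids), [])

centroids = {"kitchen": (0.0, 1.0), "web_search": (1.0, 0.0)}
library = {"kitchen": ["open fridge before microwave"],
           "web_search": ["filter by price first"]}
print(retrieve((0.9, 0.1), centroids, library))  # closest: web_search
```

Because the lookup keys on the current observation rather than the initial query, the retrieved experiences stay relevant as the episode progresses, which is the contrast the paper draws with episode-level retrieval.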
Results
SLEA-RL achieved faster convergence and higher success rates on benchmarks such as ALFWorld and WebShop compared to existing reinforcement learning algorithms, demonstrating its effectiveness in multi-turn environments.
Implications
The proposed framework can significantly enhance the training of LLM agents in complex, dynamic environments, potentially improving their performance in real-world applications such as interactive web search and tool-use tasks.
Warm-Start Flow Matching for Guaranteed Fast Text/Image Generation
Generative Models
Efficient ML
Multimodal
- Introduces Warm-Start Flow Matching (WS-FM) to enhance sample generation speed in flow matching algorithms.
- Utilizes lightweight generative models to create initial draft samples that are of decent quality.
- Reduces the number of time steps required for sample generation, ensuring a guaranteed speed-up.
- Demonstrates effectiveness on both synthetic and real-world datasets.
Read more
Warm-Start Flow Matching for Guaranteed Fast Text/Image Generation
Summary
This paper addresses the inefficiencies in sample generation of flow matching (FM) algorithms, which are often computationally intensive and time-consuming due to the large number of function evaluations required. The author proposes a novel approach called Warm-Start Flow Matching (WS-FM) that utilizes lightweight generative models to produce initial draft samples that serve as a better starting point for the FM algorithm. By generating these draft samples quickly, the method allows for a significant reduction in the number of time steps needed to reach the target data distribution, thus guaranteeing a speed-up in the sample generation process without compromising the quality of the outputs. The paper demonstrates the effectiveness of WS-FM through experiments on synthetic data as well as real-world text and image generation tasks, showing that it can achieve faster generation times while maintaining high sample quality.
Methodology
The methodology involves training lightweight generative models (such as an LSTM or a GAN) to produce draft samples from a given dataset. These draft samples are then used as the initial distribution for the FM algorithm, allowing the process to start closer to the target distribution. This approach contrasts with traditional FM methods that start from pure noise, thereby reducing the number of necessary time steps for accurate sample generation.
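A minimal sketch of the warm-start effect, using a toy one-dimensional velocity field of our construction (not the paper's model): starting the integration from a draft sample shortens the remaining time window, so far fewer solver steps are needed.

```python
# Hedged sketch of the warm-start idea: a flow-matching sampler integrates a
# velocity field over time; starting from a draft sample near the target
# lets it use a shorter time window (fewer steps) than starting from noise.

def euler_sample(x, velocity, t_start, n_steps):
    """Integrate dx/dt = velocity(x, t) from t_start to t = 1 with Euler steps."""
    dt = (1.0 - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

# Toy linear field whose trajectories all converge on the target 1.0.
velocity = lambda x, t: (1.0 - x) / max(1.0 - t, 1e-8)

cold = euler_sample(0.0, velocity, t_start=0.0, n_steps=10)   # from noise
warm = euler_sample(0.9, velocity, t_start=0.9, n_steps=2)    # from a draft
print(cold, warm)  # both close to 1.0, warm with far fewer steps
```

The draft generator plays the role of producing the warm starting point (0.9 here); the guaranteed speed-up comes from the shortened integration horizon, not from changing the flow itself.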
Results
The results indicate that the WS-FM approach significantly reduces sample generation time compared to conventional FM methods while ensuring that the quality of the generated samples remains high. Experiments on synthetic and real-world datasets validate the effectiveness of the proposed method.
Implications
The implications of this research suggest that WS-FM could be widely applicable in scenarios requiring fast and efficient text/image generation, particularly in resource-constrained environments. It opens avenues for further exploration of hybrid generative models that leverage the strengths of both lightweight and complex models.
Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning
Reinforcement Learning
Optimization
- Formulates resource allocation as a constrained RMAB process addressing asynchronous cluster arrivals.
- Proposes a hierarchical reinforcement learning framework that separates global coordination from local decision-making.
- Implements a generalized local DQN that adapts to varying resource constraints without retraining.
- Achieves a 20%-30% improvement in outbreak control effectiveness compared to heuristic strategies.
Read more
Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning
Summary
This paper addresses the challenge of optimizing resource allocation for non-pharmaceutical interventions (NPIs) during infectious disease outbreaks, particularly when resources are limited and multiple outbreak clusters emerge asynchronously. The authors formulate the problem as a constrained restless multi-armed bandit (RMAB) and propose a hierarchical reinforcement learning framework to manage resource allocation effectively. The framework consists of a global controller that adjusts resource demand through a continuous action cost multiplier and a local policy that evaluates the marginal value of resource allocation within individual clusters. The proposed method is evaluated using a realistic agent-based simulator of SARS-CoV-2, demonstrating its ability to outperform existing RMAB-inspired and heuristic approaches. The results show a significant improvement in outbreak control effectiveness, highlighting the framework's scalability and efficiency in decision-making under resource constraints.
Methodology
The authors developed a hierarchical reinforcement learning framework that includes a global controller using Proximal Policy Optimization (PPO) to manage resource allocation across clusters and a local policy based on a generalized Deep-Q Network (DQN) enhanced with Transformer architecture to make individual resource allocation decisions. This design allows for dynamic adjustment of resource usage while maintaining computational efficiency.
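The global/local split can be caricatured as a budget-constrained threshold on per-cluster marginal values: the global controller's cost multiplier acts like the threshold below, while the local policy supplies the values. This is a simplification of the learned PPO/DQN pipeline, and the numbers are hypothetical:

```python
# Sketch (our formulation) of the global/local split: each cluster reports a
# marginal value for receiving resources; a global cost multiplier is raised
# until the set of funded clusters fits the budget.

def allocate(marginal_values, budget):
    """Fund clusters whose marginal value exceeds the smallest multiplier
    that keeps the allocation within budget."""
    if budget >= len(marginal_values):
        return [True] * len(marginal_values)
    lam = sorted(marginal_values, reverse=True)[budget]  # threshold multiplier
    return [v > lam for v in marginal_values]

values = [0.9, 0.2, 0.7, 0.4]   # hypothetical per-cluster marginal values
print(allocate(values, budget=2))  # fund the two most valuable clusters
```

In the paper both pieces are learned: the multiplier is adjusted continuously by the PPO controller, and the marginal values come from the generalized local DQN, which lets the scheme adapt to varying budgets without retraining.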
Results
The proposed framework consistently outperformed RMAB-inspired and heuristic baselines, improving outbreak control effectiveness by 20%-30%. It also demonstrated scalability to scenarios with up to 40 concurrently active clusters and achieved approximately 5× faster decision-making compared to traditional methods.
Implications
The findings suggest that the hierarchical reinforcement learning approach can significantly enhance public health responses to infectious disease outbreaks by enabling more effective and efficient resource allocation strategies. This could lead to better management of NPIs in real-world scenarios, particularly in the early stages of outbreaks when resources are critically limited.
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
NLP
Large Language Models
Efficient ML
- KV cache entries can be exactly reconstructed from the residual stream, proving their redundancy.
- Removing the KV cache yields token-identical outputs across various transformer models.
- KV-Direct reduces peak memory usage by 2.5x and improves latency compared to traditional caching methods.
- Cross-task residual patching shows that the residual stream satisfies a Markov property.
Read more
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
Summary
This paper challenges the conventional wisdom that the key-value (KV) cache is essential for transformer inference. The authors demonstrate that the keys and values stored in the KV cache are deterministic projections of the residual stream, meaning they can be reconstructed from a single residual vector per token without any loss of information. This finding is validated across six models from four architecture families, confirming that removing the KV cache entirely does not affect output quality. The authors introduce KV-Direct, a new inference scheme that reduces memory usage significantly by storing only the residual vectors and recomputing keys and values on demand. This approach not only decreases peak memory requirements but also improves latency, making it a more efficient alternative to traditional caching methods. The paper provides empirical evidence supporting these claims, including experiments showing that KV-Direct maintains 100% token match across various models while traditional methods degrade in performance. Overall, the findings suggest that the KV cache can be eliminated without compromising model performance, leading to more efficient transformer implementations.
Methodology
The authors conducted empirical experiments across six transformer models to validate their hypothesis about the redundancy of the KV cache. They analyzed the relationship between the residual stream and KV entries, performed cross-task residual patching, and introduced the KV-Direct inference scheme, which checkpoints residual vectors instead of caching KV pairs. They also conducted latency analyses to compare the performance of KV-Direct against traditional caching methods.
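The core observation is that keys and values are fixed linear maps of the residual stream, so storing the residual vector per token suffices. A tiny sketch with hypothetical weights:

```python
# Minimal sketch of the KV-Direct idea: because keys and values are fixed
# linear projections of the residual stream, caching the residual vector per
# token and recomputing K and V on demand reproduces the cached values exactly.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical tiny layer: projection weights and one token's residual vector.
W_k = [[1.0, 0.0], [0.5, 0.5]]
W_v = [[0.0, 1.0], [1.0, 1.0]]
residual = [2.0, 4.0]

# Standard path: cache K and V at prefill time.
cached_k, cached_v = matvec(W_k, residual), matvec(W_v, residual)

# KV-Direct path: store only `residual`, recompute projections when needed.
recomputed_k, recomputed_v = matvec(W_k, residual), matvec(W_v, residual)
print(recomputed_k == cached_k and recomputed_v == cached_v)  # True
```

The memory saving follows because one residual vector replaces a key and a value per layer per head; the trade is extra matrix multiplies at decode time, which the paper's latency analysis finds favorable.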
Results
The results showed that the KV cache can be completely removed without affecting the output, achieving 100% token identity in outputs across all tested models. The KV-Direct method maintained a peak memory usage of 42 MB compared to over 100 MB for standard caching, while also demonstrating faster recomputation times, up to 5 times faster than reading from cached tensors.
Implications
The findings have significant implications for the design of transformer models, suggesting that memory efficiency can be greatly improved by eliminating the KV cache. This could lead to more scalable and efficient implementations of large language models, particularly in resource-constrained environments.
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
Efficient ML
Computer Vision
NLP
- Introduces InfoMamba, an attention-free hybrid model combining SSM and global filtering.
- Develops a consistency boundary analysis to identify limitations in existing models.
- Implements a concept-bottleneck linear filtering layer to reduce interaction complexity.
- Demonstrates superior performance over existing Transformer and SSM models across multiple tasks.
Read more
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
Summary
The paper introduces InfoMamba, a novel hybrid architecture that integrates a selective state-space model (SSM) with a lightweight global aggregation pathway, addressing the challenges of balancing fine-grained local modeling and long-range dependency capture in sequence modeling. Traditional Transformers, while effective at token mixing, suffer from quadratic complexity, making them less suitable for long contexts. In contrast, Mamba-style SSMs offer linear scaling but often underperform in capturing high-rank global interactions. The authors conduct a consistency boundary analysis to identify the limitations of diagonal short-memory SSMs in approximating causal attention. To overcome these limitations, InfoMamba replaces token-level self-attention with a concept-bottleneck linear filtering layer, which acts as a minimal-bandwidth global interface. This is coupled with a selective recurrent stream through information-maximizing fusion (IMF), which dynamically injects global context into SSM dynamics while enforcing complementary information usage through a mutual-information-inspired objective. The experimental results demonstrate that InfoMamba consistently outperforms state-of-the-art Transformer and SSM baselines across various tasks, achieving a favorable accuracy-efficiency trade-off with near-linear scaling.
Methodology
The authors propose a hybrid architecture that integrates a concept-bottleneck linear filtering layer with a selective recurrent SSM. The model employs information-maximizing fusion (IMF) to couple the global filtering path with the local recurrent stream, guided by a mutual-information-inspired redundancy-reduction objective. This approach allows for efficient global context aggregation while preserving local detail.
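The concept-bottleneck global path can be sketched in miniature: all tokens are pooled into a small summary that each token then reads back, giving global mixing at minimal bandwidth. This is our drastic simplification (one mean-pooled concept, no learned weights, no SSM fusion):

```python
# Sketch (our simplification) of a concept-bottleneck global path: tokens are
# pooled into a single summary "concept", then every token reads the summary
# back, so global context flows without pairwise token-to-token attention.

def bottleneck_mix(tokens):
    """Pool all tokens into one mean concept, then broadcast it back."""
    pooled = sum(tokens) / len(tokens)      # the bottleneck summary
    return [t + pooled for t in tokens]     # each token receives global context

print(bottleneck_mix([1.0, 2.0, 3.0]))  # mean 2.0 added to every token
```

The cost is linear in sequence length, in contrast to the quadratic pairwise interactions of self-attention; InfoMamba's IMF then fuses such a global summary with the selective recurrent stream.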
Results
Extensive experiments show that InfoMamba outperforms state-of-the-art Transformer and SSM baselines in classification, dense prediction, and non-vision tasks. The model achieves strong accuracy-efficiency trade-offs, demonstrating near-linear scaling in computational complexity.
Implications
InfoMamba's architecture could be applied in various domains requiring efficient sequence modeling, such as natural language processing, computer vision, and time-series forecasting, where balancing local and global context is crucial.
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Reinforcement Learning
Large Language Models
Theory
- CausalRM addresses the challenges of noisy and biased observational feedback in reward modeling for RLHF.
- The framework introduces a noise-aware surrogate loss to correct for user annotation errors.
- Propensity scores are used to reweight training samples, counteracting user preference bias.
- Extensive experiments show substantial performance improvements in RLHF tasks using CausalRM.
Read more
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Summary
The paper introduces CausalRM, a novel framework for reward modeling in reinforcement learning from human feedback (RLHF) that leverages observational user feedback, such as clicks and upvotes, as a cost-effective alternative to traditional experimental feedback. The authors identify two main challenges in using observational feedback: the noise introduced by user annotation errors and the bias stemming from user preferences, which can lead to a distribution shift between training and inference data. To address these challenges, CausalRM employs a noise-aware surrogate loss that models the error generation process, ensuring that the learning objective remains valid even in the presence of noise. Additionally, it utilizes propensity scores to reweight training samples, mitigating the bias from user preferences. The framework is validated through extensive experiments across various large language model (LLM) architectures and benchmark datasets, demonstrating its effectiveness in learning accurate reward signals from noisy and biased data. The results show significant performance improvements in downstream RLHF tasks, indicating the potential of CausalRM to enhance the alignment of LLMs with human values.
Methodology
CausalRM employs a causal-theoretic framework that incorporates a noise-aware surrogate loss to model annotation errors and uses propensity scores to reweight training samples, addressing biases in observational feedback. This approach allows for unbiased learning objectives despite the inherent noise and bias in user feedback.
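Propensity reweighting is a standard causal-inference device; a minimal inverse-propensity-weighted loss sketch (the paper's surrogate loss also corrects annotation noise, which this omits, and the numbers are hypothetical):

```python
# Hedged sketch of inverse-propensity reweighting: samples that users were
# unlikely to label get larger weights, counteracting the preference bias
# in observational feedback such as clicks and upvotes.

def ipw_loss(losses, propensities):
    """Average per-sample losses weighted by 1 / propensity of being observed."""
    weighted = [l / p for l, p in zip(losses, propensities)]
    return sum(weighted) / len(weighted)

losses = [0.2, 0.8]          # per-sample reward-model losses
propensities = [0.8, 0.2]    # chance each sample receives user feedback
print(ipw_loss(losses, propensities))  # rare samples dominate: 2.125
```

Under-represented samples (low propensity) are up-weighted so that the training objective matches the inference-time distribution in expectation.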
Results
The experiments conducted demonstrate that CausalRM significantly improves the accuracy of reward signals derived from observational feedback, achieving a 49.2% gain on the WildGuardMix benchmark and a 32.7% improvement on HarmBench, showcasing its effectiveness across various LLM architectures.
Implications
CausalRM has the potential to revolutionize the way reward models are trained in RLHF, making the process more scalable and cost-effective. This could lead to better alignment of AI systems with human values, facilitating the deployment of RLHF in industrial applications and enhancing user experience.
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States
Large Language Models
Reinforcement Learning
Theory
- Reintroducing Markov states can break the performance ceiling of RL in LLM post-training.
- Markov models demonstrate superior out-of-distribution generalization compared to history-dependent models.
- Theoretical guarantees indicate that Markovian learning achieves lower sample complexity.
- Empirical results show significant improvements in solving complex logic puzzles.
Read more
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States
Summary
This paper addresses the limitations of reinforcement learning (RL) in post-training large language models (LLMs), which often encounter a 'capability ceiling' where they fail to discover novel strategies and merely refine existing patterns. The authors identify that this issue stems from the reliance on an expansive history of actions instead of compact, informative Markov states, which are central to classical RL. They propose reintroducing explicit Markov states into the post-training process of LLMs. Theoretical foundations are provided to demonstrate that using estimated Markov states can significantly reduce sample complexity. Empirical results show that models utilizing Markov states consistently outperform traditional RL post-training methods on complex logic puzzles, indicating that structured Markovian representations are crucial for enhancing reasoning capabilities and generalization in generative AI.
Methodology
The authors revisit the concept of explicit Markov states in reinforcement learning and provide both theoretical analysis and empirical testing. They benchmark the performance of LLMs with and without Markov states on a suite of complex logic puzzles to evaluate improvements in reasoning and generalization capabilities.
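The intuition can be illustrated with a toy environment whose optimal action depends only on a compact sufficient statistic of the history (our example, not the paper's benchmark):

```python
# Toy illustration (ours) of why a compact Markov state can replace a long
# action history: in a running-total game, the sum alone determines the
# optimal next move, so all histories with the same sum are equivalent.

def markov_state(history):
    """Compress an action history into its sufficient statistic: the sum."""
    return sum(history)

def best_action(state, target):
    """Greedy move toward the target given only the Markov state."""
    return max(-1, min(1, target - state))

h1, h2 = [1, 2, 3], [3, 3]          # different histories, same state
assert markov_state(h1) == markov_state(h2) == 6
print(best_action(markov_state(h1), target=7))  # both imply the same move: 1
```

Collapsing equivalent histories onto one state is what shrinks the effective search space and, per the paper's analysis, lowers sample complexity compared to conditioning on the full action history.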
Results
The introduction of Markov states led to consistent performance improvements in LLMs, surpassing traditional RL post-training methods. The models with Markov states achieved higher success rates on complex tasks and demonstrated better generalization to out-of-distribution scenarios. The theoretical analysis confirmed that Markovian approaches require fewer samples to achieve comparable performance.
Implications
The findings suggest that integrating Markov states into LLM post-training could unlock new reasoning capabilities and facilitate the development of more advanced generative AI systems. This approach may pave the way for achieving artificial general intelligence by enhancing the models' ability to explore and discover novel strategies.
Authority-Level Priors: An Under-Specified Constraint in Hierarchical Predictive Processing
Theory
- Introduction of Authority-Level Priors (ALPs) as constraints on identity-level hypotheses in predictive processing.
- ALPs explain the persistence of maladaptive predictions despite belief updating and evidence accumulation.
- The model provides a formal mechanism for understanding regulatory dominance among competing identity-level hypotheses.
- Falsifiable predictions regarding stress-reactivity and behavioral change dynamics are generated from the proposed framework.
Read more
Authority-Level Priors: An Under-Specified Constraint in Hierarchical Predictive Processing
Summary
This paper addresses a gap in hierarchical predictive processing frameworks, which explain adaptive behavior through precision-weighted inference. The author introduces Authority-Level Priors (ALPs) as meta-structural constraints that define a subset of identity-level hypotheses that can regulate autonomic and behavioral control. Unlike traditional priors, ALPs do not represent additional states but rather determine which hypotheses are admissible for regulatory control. This distinction elucidates why belief updating can alter representational beliefs while autonomic responses remain stable despite contradictory evidence. The paper provides a minimal computational formalization where policy optimization is limited to policies generated by authorized hypotheses, leading to testable predictions about stress-reactivity dynamics and behavioral persistence. Neurobiologically, ALPs are proposed to function through prefrontal networks involved in rule maintenance and conflict monitoring. The model generates falsifiable predictions regarding governance shifts and their impact on stress responses, suggesting a need for computational modeling and longitudinal studies to evaluate the proposed framework.
Methodology
The paper presents a minimal computational formalization that restricts policy optimization to policies generated by authorized hypotheses, allowing for the derivation of testable predictions related to stress-reactivity and behavioral control.
Results
The proposed model yields predictions that governance shifts will lead to measurable changes in stress-reactivity curves, recovery dynamics, and cognitive effort, providing a framework for understanding the stability of identity-level predictions under uncertainty.
Implications
The introduction of ALPs could enhance therapeutic approaches in clinical settings by providing insights into the mechanisms behind identity regulation and stress responses, potentially leading to more effective interventions for maladaptive behaviors.
Deep Hilbert–Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control
Optimization
Theory
Reinforcement Learning
- Introduction of Hilbert–Galerkin Neural Operators (HGNOs) for approximating solutions to infinite-dimensional PDEs.
- Establishment of Universal Approximation Theorems (UATs) for functions on Hilbert spaces and their derivatives.
- Development of numerical methods that minimize PDE residuals across the entire Hilbert space.
- Successful application of the proposed methods to Kolmogorov and HJB PDEs in optimal control scenarios.
Read more
Deep Hilbert–Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control
Summary
This paper introduces deep learning-based approximation methods for fully nonlinear second-order partial differential equations (PDEs) defined on separable Hilbert spaces, particularly focusing on Hamilton-Jacobi-Bellman (HJB) equations relevant to infinite-dimensional control problems. The authors present the first Universal Approximation Theorems (UATs) applicable to these PDEs, leveraging novel topologies for Hessian terms and continuity assumptions on fully nonlinear operators. These UATs ensure that the proposed Hilbert–Galerkin Neural Operators (HGNOs) can effectively approximate all necessary PDE terms, including Fréchet derivatives up to second order. The authors also extend their results to optimal feedback controls derived from the approximating value function HGNO. The paper proposes two numerical training methods: Deep Hilbert–Galerkin and Hilbert Actor-Critic methods, which minimize the L^2_μ(H)-norm of the PDE residual across the entire Hilbert space, marking a significant advancement in the field. The effectiveness of these methods is demonstrated through numerical solutions of Kolmogorov and HJB PDEs related to optimal control of deterministic and stochastic heat and Burgers' equations, showcasing the potential of deep learning in solving complex infinite-dimensional PDEs.
Methodology
The authors develop a framework using Hilbert–Galerkin Neural Operators (HGNOs) to parameterize solutions of infinite-dimensional PDEs. They prove Universal Approximation Theorems (UATs) for these operators and derive numerical training methods that minimize the $L^2_\mu(H)$-norm of the PDE residuals across the entire Hilbert space, rather than limiting to finite-dimensional projections.
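As a rough intuition for residual minimization, here is a finite-dimensional analogue (an illustration only, not the paper's infinite-dimensional method or its HGNO architecture): approximate the solution of a simple 1D boundary-value problem by a truncated basis expansion and minimize the squared PDE residual at Monte Carlo sample points.

```python
import numpy as np

# Finite-dimensional analogue of PDE-residual minimization (illustrative only):
# solve u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0 using a truncated sine
# expansion u(x) = sum_k c_k sin(k*pi*x), minimizing the squared residual at
# randomly sampled points -- a stand-in for minimizing the residual norm over
# the whole (in the paper, infinite-dimensional) space.
rng = np.random.default_rng(0)
K = 5                                  # basis truncation level
xs = rng.uniform(0.0, 1.0, size=200)   # residual sample points

f = lambda x: -np.pi**2 * np.sin(np.pi * x)   # exact solution: sin(pi*x)

# u'' of the basis function sin(k*pi*x) is -(k*pi)^2 sin(k*pi*x)
A = np.stack([-(k * np.pi) ** 2 * np.sin(k * np.pi * xs)
              for k in range(1, K + 1)], axis=1)
coeffs, *_ = np.linalg.lstsq(A, f(xs), rcond=None)  # least-squares residual fit
```

Since the chosen f lies exactly in the span of the basis, the residual fit recovers the first coefficient as 1 and the rest as 0.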
Results
The proposed methods successfully approximate solutions to Kolmogorov and HJB PDEs, demonstrating their capability in handling both deterministic and stochastic control problems. Numerical experiments validate the effectiveness of the HGNOs in achieving accurate solutions.
Implications
The findings suggest that deep learning techniques can significantly enhance the numerical treatment of complex infinite-dimensional PDEs, with potential applications in various fields such as physics, finance, and engineering, where such equations frequently arise.
Kolmogorov-Arnold causal generative models
Generative Models
Interpretability
Theory
- Introduction of KaCGM, a causal generative model that enhances interpretability in causal inference.
- Utilization of Kolmogorov-Arnold Networks (KANs) for parameterizing structural equations.
- Development of a validation pipeline for assessing model performance using observational data.
- Demonstration of competitive performance on synthetic and real-world datasets.
Read more
Kolmogorov-Arnold causal generative models
Summary
The paper introduces the Kolmogorov-Arnold causal generative model (KaCGM), which aims to bridge the gap between expressive causal generative modeling and functional interpretability in high-stakes decision-making contexts. KaCGM is designed for mixed-type tabular data, where each structural equation is parameterized by a Kolmogorov-Arnold Network (KAN). This approach allows for direct inspection of causal mechanisms, facilitating symbolic approximations and visualization of relationships between variables. The authors propose a validation pipeline that utilizes distributional matching and independence diagnostics to assess model performance using observational data. Experimental results demonstrate that KaCGM achieves competitive performance against state-of-the-art methods on synthetic and semi-synthetic benchmarks, and a real-world cardiovascular case study illustrates its ability to extract interpretable causal effects. The findings suggest that KaCGM can provide both expressive modeling capabilities and functional transparency, making it suitable for deployment in sensitive applications.
Methodology
The methodology involves the use of Kolmogorov-Arnold Networks (KANs) to parameterize structural equations within a causal generative model framework. The model supports mixed data types and employs an additive noise structural causal model (SCM) approach. A validation pipeline is introduced for model assessment based on distributional matching and independence testing of inferred exogenous variables.
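The additive-noise SCM and the independence diagnostic can be sketched in a few lines (a toy two-variable illustration; the mechanism f below is an arbitrary nonlinear stand-in for a fitted KAN, and the correlation test is a simplification of the paper's independence diagnostics):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Toy additive-noise SCM: x2 = f(x1) + u2, with f standing in for a KAN.
f = lambda x: np.sin(2.0 * x) + 0.5 * x
u1 = rng.normal(size=n)       # exogenous noise for x1
u2 = rng.normal(size=n)       # exogenous noise for x2
x1 = u1
x2 = f(x1) + u2

# Validation step in the spirit of the pipeline: infer the exogenous noise as
# the residual under the (here, known) mechanism, then check independence of
# the inferred noises via their correlation.
u2_hat = x2 - f(x1)
corr = abs(np.corrcoef(u1, u2_hat)[0, 1])
```

When the mechanism is correct, the inferred residual equals the true noise and is (near-)uncorrelated with the parent's noise; a large correlation would flag model misfit.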
Results
KaCGM shows competitive performance against existing state-of-the-art causal generative models on both synthetic and semi-synthetic datasets. The real-world application in a cardiovascular case study successfully extracts simplified structural equations and interpretable causal effects, demonstrating the model's practical utility.
Implications
The findings suggest that KaCGM can enhance the transparency and interpretability of causal generative models, making it suitable for high-stakes applications such as personalized medicine and public policy, where understanding causal relationships is critical for decision-making.
What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
Reinforcement Learning
Large Language Models
NLP
- SCRL mitigates label noise amplification in TTRL by enforcing strict consensus criteria.
- Introduces negative supervision for the first time in TTRL to prune incorrect trajectories.
- Demonstrates substantial performance improvements over baseline methods in challenging scenarios.
- Maintains robust generalization and training stability under limited rollout budgets.
Read more
What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
Summary
This paper introduces Selective-Complementary Reinforcement Learning (SCRL), a novel framework for Test-Time Reinforcement Learning (TTRL) aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) in challenging scenarios. Traditional TTRL methods rely heavily on positive pseudo-labeling derived from majority voting, which can lead to incorrect trajectory reinforcement when answer distributions are dispersed. SCRL addresses this limitation by implementing Selective Positive Pseudo-Labeling, which applies strict consensus criteria to filter out unreliable majorities, and Entropy-Gated Negative Pseudo-Labeling, the first negative supervision mechanism in TTRL, to prune incorrect trajectories based on uncertainty. This dual approach mitigates label noise amplification and improves the robustness of the learning process. The authors conducted extensive experiments across multiple reasoning benchmarks, demonstrating that SCRL significantly outperforms existing methods, particularly under constrained rollout budgets, while maintaining generalization and training stability.
Methodology
SCRL employs a two-pronged approach: Selective Positive Pseudo-Labeling enforces strict consensus criteria to ensure that positive supervision is only applied when answer distributions are concentrated, while Entropy-Gated Negative Pseudo-Labeling identifies and penalizes incorrect trajectories based on their uncertainty. This is complemented by Dynamic Reward Shaping, which integrates both positive and negative signals to guide the learning process effectively.
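The two gates can be sketched as a small reward-assignment routine (one plausible reading of the mechanism; the function name, thresholds, and exact gating rule are illustrative assumptions, not the paper's formulation):

```python
from collections import Counter
import math

def pseudo_label_rewards(answers, tau=0.7, h_max=0.6):
    """Illustrative sketch of SCRL-style gating over sampled answers.

    - Positive gate: +1 for the majority answer only when its vote share
      reaches the strict consensus threshold tau.
    - Negative gate: -1 for non-majority answers only when the answer
      distribution's entropy is low (the model is confidently wrong or right).
    Otherwise the reward abstains (0), avoiding label-noise amplification.
    """
    counts = Counter(answers)
    n = len(answers)
    top, top_count = counts.most_common(1)[0]
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)

    rewards = []
    for a in answers:
        if a == top and top_count / n >= tau:
            rewards.append(1.0)    # selective positive pseudo-label
        elif a != top and entropy <= h_max:
            rewards.append(-1.0)   # entropy-gated negative pseudo-label
        else:
            rewards.append(0.0)    # abstain: consensus unreliable
    return rewards, entropy
```

A concentrated vote such as ["4", "4", "4", "4", "5"] triggers both gates, while a fully dispersed vote like ["1", "2", "3", "4"] yields no supervision at all.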
Results
The experiments showed that SCRL consistently outperformed baseline methods across various reasoning benchmarks, particularly in scenarios with limited rollout budgets. The results highlighted the effectiveness of the proposed mechanisms in reducing label noise and improving the overall learning stability and performance of LLMs.
Implications
SCRL has the potential to enhance the performance of LLMs in unsupervised reasoning tasks, making it applicable in fields requiring robust decision-making under uncertainty, such as automated reasoning, natural language understanding, and complex problem-solving.
HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning
Reinforcement Learning
Large Language Models
- HISR enhances credit assignment in multi-turn RL by aligning rewards with sub-goals.
- The segment-level process reward model avoids overly fine-grained reward allocation.
- A hindsight model captures action importance based on trajectory outcomes.
- Extensive experiments show HISR achieves state-of-the-art performance on benchmark tasks.
Read more
HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning
Summary
The paper introduces HISR (Hindsight Information Modulated Segmental Process Rewards), a novel approach aimed at enhancing the performance of large language models (LLMs) in complex long-horizon agentic decision-making tasks. Traditional reinforcement learning (RL) methods struggle with delayed rewards and unreliable credit assignment, particularly in multi-turn scenarios. HISR addresses these issues by aligning rewards with sub-goals and emphasizing significant segments of the task trajectory. The authors propose a segment-level process reward model that assigns rewards to sub-goals rather than individual turns, thus avoiding overly granular reward allocation. Additionally, a hindsight model is developed to assess action importance based on the likelihood of actions after observing the trajectory outcome. This model uses ratios of sequence likelihoods between the hindsight and policy models to aggregate segment importance scores, which modulate the segmental process rewards. The effectiveness of HISR is validated through extensive experiments on three publicly available benchmarks, demonstrating state-of-the-art performance and improved credit assignment reliability.
Methodology
The methodology involves developing a segment-level process reward model that assigns rewards to sub-goals instead of individual turns. A hindsight model is utilized to evaluate action importance based on the likelihood of actions after knowing the trajectory outcome. Ratios of sequence likelihoods between the hindsight and policy models are calculated to aggregate segment importance scores, which modulate the segmental process rewards.
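The likelihood-ratio weighting can be sketched as follows (an illustrative reading of the aggregation step, not the paper's exact estimator; the normalization choice is an assumption):

```python
import math

def segment_importance(hindsight_logps, policy_logps):
    """Per-segment importance from sequence-likelihood ratios.

    For each segment, importance = exp(sum of hindsight-model log-probs minus
    sum of policy-model log-probs); weights are normalized across segments.
    Segments whose actions become much more likely once the outcome is known
    receive larger weights.
    """
    ratios = [math.exp(sum(h) - sum(p))
              for h, p in zip(hindsight_logps, policy_logps)]
    z = sum(ratios)
    return [r / z for r in ratios]

def modulated_rewards(segment_rewards, weights):
    # Segment-level process rewards, scaled by hindsight importance.
    return [r * w for r, w in zip(segment_rewards, weights)]
```

For example, a segment whose token log-probs rise by 2 nats under the hindsight model dominates a segment whose likelihood is unchanged.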
Results
The experimental results indicate that HISR outperforms existing methods on three publicly available agentic benchmarks, achieving state-of-the-art performance. The case studies further validate the effectiveness of the proposed approach in enhancing credit assignment reliability.
Implications
The findings suggest that HISR could significantly improve the performance of LLMs in complex decision-making tasks, paving the way for more advanced applications in areas such as household assistance, automated planning, and interactive AI systems.
CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing
Large Language Models
NLP
Interpretability
- Introduction of CLARE, a lightweight technique for predicting ripple effects in LLM editing.
- Achieves an average 62.2% improvement in Spearman correlation with observed ripple effects over gradient-based methods.
- Utilizes a curated corpus of 11,427 facts for systematic analysis of model edits.
- Significantly faster and more memory-efficient than existing techniques.
Read more
CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing
Summary
This paper introduces CLARE (Critical Layer Representation Entanglement), a novel technique designed to quantify representational entanglement in large language models (LLMs) to predict ripple effects resulting from model editing. As LLMs often contain outdated or incorrect knowledge, model-editing techniques are employed to update factual associations. However, these edits can lead to unintended ripple effects that propagate through the model, affecting unrelated outputs. CLARE addresses this issue by utilizing forward activations from a single intermediate layer to measure entanglement, avoiding the computational costs associated with gradient-based methods. The authors curated a corpus of 11,427 facts from diverse datasets to analyze how edits propagate through the model's knowledge base. The results demonstrate that CLARE significantly outperforms existing methods, achieving an average of 62.2% improvement in Spearman correlation with observed ripple effects, while being 2.74 times faster and using 2.85 times less peak GPU memory. The entanglement graphs generated by CLARE facilitate safer model editing, enabling stronger preservation sets, audit trails, and scalable evaluations, thus enhancing the reliability and interpretability of LLMs.
Methodology
CLARE quantifies representational entanglement using forward activations from a single intermediate layer of LLMs, avoiding the need for backward passes or gradient computations. The authors prepared a large corpus of facts to analyze how local edits propagate through the model's representational space, generating entanglement graphs for multiple models.
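A minimal sketch of this idea, assuming cosine similarity over single-layer activations and a fixed threshold (the paper's exact entanglement measure and layer-selection procedure are not reproduced here):

```python
import numpy as np

def entanglement_graph(acts, threshold=0.5):
    """Build an entanglement graph from forward activations.

    Rows of `acts` are per-fact activations taken from one intermediate layer;
    facts whose representations are highly similar are predicted to co-move
    when one of them is edited. No backward pass is needed.
    """
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    sim = unit @ unit.T                  # pairwise cosine similarity
    adj = np.abs(sim) >= threshold
    np.fill_diagonal(adj, False)         # no self-edges
    return sim, adj

def predicted_ripple_set(adj, edited_fact):
    # Facts entangled with the edited fact: candidates for ripple effects,
    # and hence for inclusion in a preservation set before editing.
    return set(np.flatnonzero(adj[edited_fact]))

# Toy example: facts 0 and 1 share a representation direction; fact 2 does not.
acts = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
sim, adj = entanglement_graph(acts)
```

Editing fact 0 would then flag fact 1 (cosine ≈ 0.99) but not fact 2 (cosine 0) as a likely ripple target.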
Results
CLARE demonstrated an average of 62.2% improvement in Spearman correlation with observed ripple effects compared to baseline methods. It was also 2.74 times faster and utilized 2.85 times less peak GPU memory, while requiring significantly less storage for fact representations.
Implications
The findings suggest that CLARE can enhance the safety and interpretability of model editing in LLMs, making it a valuable tool for researchers and practitioners. The entanglement graphs can be used for constructing preservation sets, conducting audits, and facilitating red-teaming efforts, ultimately leading to more reliable and interpretable AI systems.
The $\textbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus
Large Language Models
Theory
Efficient ML
- Introduction of λ-RLM, a structured framework for long-context reasoning in LLMs.
- Replacement of arbitrary code generation with a typed functional runtime based on λ-calculus.
- Formal guarantees of termination, predictable computation, and improved reliability.
- Empirical results show significant improvements in accuracy and latency over standard RLMs.
Read more
The $\textbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus
Summary
This paper introduces λ-RLM, a novel framework designed to enhance long-context reasoning in large language models (LLMs) by addressing the limitations of existing recursive language models (RLMs). Traditional RLMs utilize an open-ended read-eval-print loop (REPL) for recursive problem-solving, which can lead to unpredictable execution and verification challenges. In contrast, λ-RLM employs a typed functional runtime based on λ-calculus, replacing arbitrary code generation with a structured set of pre-verified combinators. This approach allows for explicit control flow and guarantees such as termination and predictable accuracy scaling with recursion depth. The authors demonstrate that λ-RLM significantly outperforms standard RLMs across various long-context reasoning tasks, achieving up to a 21.9-point increase in accuracy and reducing latency by up to a factor of 4.1. The implementation of λ-RLM is open-sourced, promoting further research and application in the community.
Methodology
The authors developed λ-RLM by integrating a library of deterministic combinators into a functional runtime, allowing for structured recursive reasoning. The model operates by executing neural inference only on bounded leaf subproblems while managing higher-level control through a planner that ensures predictable execution.
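As a flavor of what a pre-verified combinator looks like, here is a minimal map-reduce sketch (the combinator name, signature, and bound are illustrative assumptions; the `leaf` function stands in for a bounded LLM call):

```python
def map_reduce(doc, split, leaf, combine, max_chunk=100):
    """One deterministic combinator: split the long input, run the bounded
    `leaf` solver on each piece, and combine results.

    Termination is guaranteed structurally: the chunk list is finite and each
    leaf call sees at most `max_chunk` items, so no open-ended recursion or
    arbitrary generated code is ever executed.
    """
    chunks = split(doc, max_chunk)
    assert all(len(c) <= max_chunk for c in chunks)   # bounded leaf subproblems
    return combine([leaf(c) for c in chunks])

# Usage: count occurrences of a token across a "long" document, with the
# planner-level control flow fixed and only the leaves doing inference-like work.
split_items = lambda d, n: [d[i:i + n] for i in range(0, len(d), n)]
doc = ["needle" if i % 7 == 0 else "hay" for i in range(1000)]
total = map_reduce(doc, split_items, lambda c: c.count("needle"), sum)
```

The control flow is fully inspectable and verifiable, which is the point of swapping an open-ended REPL for a fixed combinator library.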
Results
λ-RLM outperformed standard RLMs in 29 out of 36 model-task comparisons, with accuracy improvements of up to 21.9 points and latency reductions of up to a factor of 4.1 across four long-context reasoning tasks and nine base models.
Implications
The findings suggest that structured symbolic control can enhance the reliability and efficiency of LLMs in handling long-context reasoning tasks, potentially leading to broader applications in areas requiring complex problem-solving and evidence gathering.
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3
Optimization
Theory
- Proves global convergence of a multiplicative update iteration for a nuclear norm optimization problem.
- Demonstrates the utility of AI (Gemini 3) in assisting mathematical proofs.
- Closes a previously open problem regarding the convergence of fixed-point iterations in private machine learning contexts.
- Includes a narrative on the collaborative process of using AI in mathematics.
Read more
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3
Summary
This paper addresses the global convergence of a fixed-point iteration arising from the optimization of a regularized nuclear norm objective, specifically in the context of private machine learning. The author proves that the iteration defined by $v^{(k+1)} = \operatorname{diag}\big((D_{v^{(k)}}^{1/2} M D_{v^{(k)}}^{1/2})^{1/2}\big)$ converges monotonically to the unique global optimizer of the potential function $J(v) = 2 \operatorname{Tr}\big((D_v^{1/2} M D_v^{1/2})^{1/2}\big) - \sum_i v_i$. This result closes a gap left open in previous literature, where local convergence was established but global convergence remained unproven. The proof was significantly aided by the AI model Gemini 3, which identified a crucial variational characterization of the nuclear norm that facilitated the diagonalization of the problem. The paper also discusses the collaborative process of using AI in mathematical proofs, providing insights into the interaction between human mathematicians and AI systems.
Methodology
The methodology involves analyzing a fixed-point iteration derived from a nuclear norm optimization problem. The author employs mathematical proofs to establish the monotonic ascent property of the potential function, ensuring global convergence. The proof leverages properties of the nuclear norm and involves a collaborative approach with the AI model Gemini 3 to identify critical mathematical insights.
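The iteration and its potential are simple enough to check numerically. The sketch below assumes a symmetric positive-definite M and $D_v = \operatorname{diag}(v)$ (a simplified reading of the setting; the paper's exact assumptions may differ) and observes the monotone-ascent property:

```python
import numpy as np

def psd_sqrt(a):
    # Symmetric PSD matrix square root via eigendecomposition.
    w, q = np.linalg.eigh(a)
    return (q * np.sqrt(np.clip(w, 0.0, None))) @ q.T

def step(v, m):
    # One multiplicative update: v <- diag((D_v^{1/2} M D_v^{1/2})^{1/2}).
    d = np.sqrt(v)
    return np.diag(psd_sqrt(d[:, None] * m * d[None, :]))

def potential(v, m):
    # J(v) = 2 Tr((D_v^{1/2} M D_v^{1/2})^{1/2}) - sum_i v_i.
    d = np.sqrt(v)
    return 2.0 * np.trace(psd_sqrt(d[:, None] * m * d[None, :])) - v.sum()

# Sanity check of the monotone-ascent claim on a random positive-definite M.
rng = np.random.default_rng(0)
b = rng.normal(size=(5, 5))
m = b @ b.T + 0.1 * np.eye(5)
v = np.ones(5)
js = []
for _ in range(200):
    js.append(potential(v, m))
    v = step(v, m)
```

On such instances the sequence of J values is nondecreasing at every iterate, consistent with the ascent property the paper proves.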
Results
The main result is the proof that the iteration $v^{(k+1)} = \operatorname{diag}\big((D_{v^{(k)}}^{1/2} M D_{v^{(k)}}^{1/2})^{1/2}\big)$ converges monotonically to the unique global maximum of the potential function $J(v)$. This result confirms the global convergence of the previously studied local iteration and provides a framework for future research in similar optimization problems.
Implications
The findings have significant implications for the field of private machine learning, particularly in optimizing algorithms that require privacy guarantees. The successful collaboration with AI models suggests new avenues for mathematical research and problem-solving, potentially leading to more efficient algorithms in various applications.
Integrating Meta-Features with Knowledge Graph Embeddings for Meta-Learning
Graph Learning
- KGmetaSP utilizes knowledge graph embeddings to enhance meta-learning tasks.
- The approach captures dataset-pipeline interactions by integrating past experiment metadata.
- A large-scale benchmark of 144,177 experiments was created to validate the method.
- KGmetaSP shows significant improvements in both pipeline performance estimation and dataset similarity estimation.
Read more
Integrating Meta-Features with Knowledge Graph Embeddings for Meta-Learning
Summary
This paper introduces KGmetaSP, a novel approach that integrates knowledge graph embeddings (KGEs) with meta-features for enhancing meta-learning tasks, specifically pipeline performance estimation (PPE) and dataset performance-based similarity estimation (DPSE). Traditional methods for these tasks primarily utilize numeric dataset representations, often neglecting the rich metadata from past experiments available on platforms like OpenML. By constructing a unified knowledge graph (MetaExe-KG) that encapsulates both datasets and their associated pipeline configurations, the authors leverage KGEs to capture the latent relationships between datasets and pipelines. This enables a more nuanced understanding of dataset-pipeline interactions, which is crucial for accurate PPE and DPSE. The authors validate their approach using a large-scale benchmark of 144,177 OpenML experiments, demonstrating that KGmetaSP significantly improves the accuracy of PPE and enhances DPSE compared to existing methods. The proposed methodology and benchmark are made publicly available, setting a new standard for meta-learning research.
Methodology
The authors constructed a unified knowledge graph (MetaExe-KG) that integrates semantic representations of datasets and their pipeline configurations. They employed knowledge graph embedding techniques to learn vector representations for datasets and pipelines, which were then used in pipeline-agnostic meta-models for PPE and distance-based retrieval for DPSE.
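To make the embedding-based retrieval concrete, here is a toy sketch (the specific KGE model used by the paper is not stated here; TransE-style scoring and the entity names below are illustrative assumptions):

```python
import numpy as np

def transe_score(h, r, t):
    # TransE-style triple plausibility: higher (less negative) is more
    # plausible, since a true triple should satisfy h + r ≈ t.
    return -np.linalg.norm(h + r - t)

# Toy embedding table standing in for learned MetaExe-KG representations.
emb = {
    "dataset_a": np.array([0.0, 0.0]),
    "dataset_b": np.array([0.1, 0.0]),   # behaves like dataset_a
    "dataset_c": np.array([5.0, 5.0]),   # behaves very differently
}

def most_similar(query, entities):
    # Distance-based retrieval, as used for performance-based similarity
    # estimation: the nearest dataset in embedding space.
    return min((e for e in entities if e != query),
               key=lambda e: np.linalg.norm(emb[query] - emb[e]))
```

Datasets that interacted similarly with pipelines in past experiments end up close in embedding space, so nearest-neighbor retrieval surfaces them first.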
Results
KGmetaSP demonstrated accurate pipeline performance estimation using a single meta-model and improved dataset performance-based similarity estimation over existing baselines. The evaluation showed that the approach effectively captures the interactions between datasets and pipelines, leading to better predictive performance.
Implications
The integration of knowledge graph embeddings into meta-learning tasks has the potential to significantly enhance the efficiency and effectiveness of machine learning workflows, particularly in scenarios where historical experiment data is abundant. This could lead to more informed decision-making in selecting and optimizing machine learning pipelines.
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
NLP
Large Language Models
Optimization
- Developed a theoretical framework connecting parameter and logit spaces to analyze learning dynamics.
- Identified the squeezing effect as a result of rapid expansion of residuals along high-curvature directions.
- Introduced logits-SAM, a computationally efficient variant of SAM that improves DPO performance.
- Demonstrated consistent performance gains across multiple datasets and benchmarks.
Read more
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
Summary
This paper addresses the challenges faced by Direct Preference Optimization (DPO) in aligning large language models with human preferences, particularly the 'squeezing effect' that leads to a decrease in the probability of preferred responses during training. The authors develop a theoretical framework that models learning dynamics in logit space, revealing that negative-gradient updates can cause residuals to expand along high-curvature directions, exacerbating the squeezing effect. To mitigate this, they introduce Sharpness-Aware Minimization (SAM) as a solution, demonstrating its effectiveness in reducing curvature-related issues. The authors propose a computationally efficient variant, logits-SAM, which only perturbs the output layer, thus incurring minimal overhead. Extensive experiments on various large language models (Pythia-2.8B, Mistral-7B, and Gemma-2B-IT) across multiple datasets show that logits-SAM significantly enhances the performance of DPO, making it a valuable addition to the training process of language models.
Methodology
The authors establish a theoretical framework to analyze the learning dynamics in both parameter and logit spaces. They utilize Sharpness-Aware Minimization (SAM) to regularize curvature during training and propose logits-SAM, which focuses on perturbing only the output layer parameters to enhance efficiency. The effectiveness of logits-SAM is validated through extensive experiments on large language models across various datasets.
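The SAM step itself is easy to sketch. The toy below applies it to the weights of a tiny logistic model, which play the role of the output-layer parameters that logits-SAM perturbs (a minimal illustration; in a real LLM all other parameters would be updated without the perturbation, and the learning rate and radius here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y = (X @ true_w > 0).astype(float)

def loss_and_grad(w):
    # Logistic loss; `w` stands in for the output-layer parameters.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def sam_step(w, lr=0.5, rho=0.05):
    # SAM restricted to these parameters: ascend to the worst-case point in a
    # rho-ball around w, then descend using the gradient evaluated there.
    _, g = loss_and_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    _, g_pert = loss_and_grad(w + eps)
    return w - lr * g_pert

w = np.zeros(4)
losses = [loss_and_grad(w)[0]]
for _ in range(100):
    w = sam_step(w)
    losses.append(loss_and_grad(w)[0])
```

The overhead over plain gradient descent is one extra gradient evaluation per step, and restricting the perturbation to the output layer keeps that extra cost negligible in a deep model.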
Results
The experiments reveal that logits-SAM consistently improves the effectiveness of DPO, leading to better alignment of language models with human preferences. The results indicate that logits-SAM incurs virtually no additional computational overhead while significantly enhancing model performance.
Implications
The findings suggest that incorporating curvature-aware training methods like logits-SAM can improve the stability and effectiveness of preference optimization in large language models, potentially leading to better alignment with human values and preferences in AI systems.
MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasoning Models
Large Language Models
Reinforcement Learning
Generative Models
- Introduction of MOLRGEN, a large-scale benchmark for de novo molecular generation.
- Development of a diversity-aware top-k scoring system for evaluating generated molecules.
- Successful training of a 24B LLM using reinforcement learning for molecular generation.
- Emphasis on the challenges of exploring chemical space in drug discovery.
Read more
MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasoning Models
Summary
The paper introduces MOLRGEN, a novel benchmark and dataset aimed at enhancing the training and evaluation of reasoning-based large language models (LLMs) for de novo molecular generation. Traditional approaches in molecular design often rely on ground-truth labels, which are not available in de novo generation where the goal is to create new molecules without prior knowledge of high-scoring candidates. The authors propose a threefold contribution: first, they establish a training and evaluation setting specifically for de novo molecular generation and property prediction; second, they introduce a diversity-aware top-k score that assesses both the quality and diversity of generated molecules; and third, they demonstrate the feasibility of training a 24B parameter LLM using reinforcement learning to optimize molecular generation. The study highlights the challenges of exploring chemical space and the potential of reasoning-based approaches to improve the quality of generated molecules through structured reasoning.
Methodology
The authors created a large-scale dataset comprising 4.5k protein structures and associated molecular property prediction tasks. They evaluated various open-source LLMs using the proposed diversity-aware top-k score and employed reinforcement learning to train a 24B parameter LLM for de novo molecular generation, focusing on optimizing a desirability score without prior knowledge of high-scoring candidates.
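One simple way to realize a diversity-aware top-k score is greedy selection with a similarity penalty (a sketch of the general idea only; the paper's exact scoring rule, the penalty weight, and the similarity matrix below are assumptions):

```python
def diverse_top_k(scores, sim, k, lam=0.5):
    """Greedily pick k candidates, trading raw score against redundancy:
    each pick maximizes score minus lam times its maximum similarity to
    anything already selected."""
    selected = []
    remaining = set(range(len(scores)))
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda i: scores[i]
                   - lam * max((sim[i][j] for j in selected), default=0.0))
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicate molecules; 2 scores lower but is novel.
scores = [1.0, 0.95, 0.8]
sim = [[1.0, 0.9, 0.0],
       [0.9, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
picked = diverse_top_k(scores, sim, k=2)
```

With a plain top-2 the near-duplicate pair {0, 1} would win; the diversity penalty instead selects {0, 2}, rewarding generators that explore chemical space rather than resampling one good molecule.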
Results
The study successfully demonstrated that the proposed dataset and scoring methods could effectively train LLMs for generating high-reward molecules. The analysis revealed limitations in exploring the chemical space, indicating that while the approach shows promise, challenges remain in identifying viable drug candidates.
Implications
The findings suggest that reasoning-based LLMs can significantly enhance the de novo molecular generation process, potentially accelerating drug discovery and molecular design. The proposed benchmark and scoring methods could serve as a foundation for future research in this area, facilitating the development of more effective generative models in chemistry.
BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection
Time Series
Reinforcement Learning
Optimization
- Introduces a reconstruction-driven framework for generating hard negatives in TSAD.
- Utilizes reinforcement learning to adaptively control the negative sample generation process.
- Improves temporal semantic consistency and decision-boundary supervision in anomaly detection.
- Achieves competitive performance compared to existing TSAD methods.
Read more
BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection
Summary
The paper presents BoundAD, a novel framework for time series anomaly detection (TSAD) that focuses on improving the quality of negative sample generation. Traditional contrastive learning methods often rely on random perturbations or pseudo-anomaly injections, which can compromise temporal semantic consistency and decision-boundary supervision. BoundAD addresses this by employing a reconstruction-driven approach that generates hard negatives directly from normal samples during the reconstruction process. A reconstruction network captures normal temporal patterns, while a reinforcement learning strategy adaptively adjusts the optimization updates based on the reconstruction state. This allows the model to generate boundary-shifted samples that are close to the normal data manifold, enhancing the contrastive representation learning process. The experimental results demonstrate that BoundAD significantly improves anomaly representation learning and achieves competitive detection performance across various datasets, highlighting its effectiveness over existing methods that depend on predefined anomaly patterns.
Methodology
The BoundAD framework employs a reconstruction network to learn normal temporal patterns and uses reinforcement learning to dynamically adjust the generation of hard negatives based on the current state of reconstruction. This approach allows for the creation of boundary-shifted samples that enhance the contrastive learning process without relying on explicit anomaly injections.
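The boundary-shifting step can be sketched with a linear stand-in for the reconstruction network (an illustration only: here the "normal manifold" is a fixed subspace, and the RL controller that adapts the step size is reduced to a constant `alpha`):

```python
import numpy as np

# Reconstruction stand-in: orthogonal projection onto the "normal" subspace
# z = 0 (BoundAD learns this network from normal temporal patterns).
basis = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]).T
P = basis @ basis.T

def recon_error(x):
    return float(np.sum((x - P @ x) ** 2))

def hard_negative(x, alpha=0.3):
    # Step along the gradient of the reconstruction error, grad = 2(I - P)x,
    # to produce a boundary-shifted sample just outside the normal manifold.
    grad = 2.0 * (np.eye(len(x)) - P) @ x
    return x + alpha * grad

x = np.array([1.0, 2.0, 0.1])   # near-normal sample (small off-manifold part)
x_neg = hard_negative(x)
```

The generated negative has strictly higher reconstruction error than the source sample while staying close to it, which is exactly the "hard negative near the boundary" that the contrastive objective needs.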
Results
Experimental evaluations show that BoundAD significantly enhances the quality of anomaly representation learning and achieves competitive performance in anomaly detection tasks, outperforming traditional methods that rely on fixed anomaly patterns.
Implications
The proposed method has significant implications for various real-world applications in industrial monitoring, healthcare, and cybersecurity, where accurate anomaly detection is crucial. By improving the generation of negative samples, BoundAD can lead to more robust and reliable anomaly detection systems.
Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation
NLP
Large Language Models
Interpretability
- Introduction of Dual Path Attribution (DPA) for efficient model attribution.
- DPA operates with O(1) time complexity, making it scalable for long sequences.
- The method decomposes the SwiGLU Transformer into control and content pathways.
- Extensive experiments show DPA achieves state-of-the-art faithfulness and efficiency.
Read more
Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation
Summary
This paper presents Dual Path Attribution (DPA), a novel framework designed to enhance the interpretability of transformer-based large language models (LLMs), specifically focusing on SwiGLU Transformers. The authors address the challenge of efficiently attributing model predictions to inputs and internal components, a task that has been hindered by the computational expense of existing methods. DPA operates by performing a single forward and backward pass through the model, effectively tracing the information flow without requiring counterfactual examples. The framework decomposes the computational structure of the SwiGLU Transformers into distinct control and content pathways, allowing for efficient propagation of a targeted unembedding vector. This approach achieves constant time complexity with respect to the number of model components, making it scalable for long input sequences. The authors validate DPA through extensive experiments on standard interpretability benchmarks, demonstrating that it outperforms existing attribution methods in terms of both faithfulness and efficiency, thereby providing deeper insights into the causal mechanisms of LLMs.
Methodology
The DPA framework involves two main stages: a forward pass where the model processes the input and caches necessary activations, followed by a backward pass that propagates a targeted unembedding vector through the transformer layers. This method leverages the bilinear structure of SwiGLU Transformers to decompose the computational graph into distinct pathways for effective attribution.
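A single-block sketch of the dual-path idea (a simplified illustration, not the paper's full layer-wise procedure; the weight shapes and the choice to freeze the gate are assumptions consistent with the control/content split described above):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16
Wg = rng.normal(size=(d, h))   # gate weights  (control pathway)
Wv = rng.normal(size=(d, h))   # value weights (content pathway)
Wd = rng.normal(size=(h, d))   # down-projection

silu = lambda z: z / (1.0 + np.exp(-z))

# Forward pass: cache the gate activations (control pathway).
x = rng.normal(size=d)
g = silu(x @ Wg)
y = (g * (x @ Wv)) @ Wd        # SwiGLU block output

# Backward target propagation: push the target (unembedding) vector t through
# Wd, modulate by the frozen gate, and map back through the content weights.
# No gradients and no counterfactual forward passes are required.
t = rng.normal(size=d)
input_attribution = Wv @ (g * (Wd @ t))
```

Because the gate is held fixed, the content pathway is linear, so the attribution is complete: the input contributions sum exactly to the output's projection onto the target vector (x · attribution = t · y).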
Results
The experiments conducted on standard interpretability benchmarks indicate that DPA achieves superior performance in terms of both faithfulness and computational efficiency compared to existing state-of-the-art attribution methods. This includes better identification of causal mechanisms within the model.
Implications
The development of DPA has significant implications for the deployment of transformer-based models in real-world applications, as it enhances the understanding of model behavior and decision-making processes. This can lead to more reliable AI systems and facilitate the debugging and improvement of LLMs.
RiboSphere: Learning Unified and Efficient Representations of RNA Structures
Generative Models
Graph Learning
Interpretability
- RiboSphere combines vector quantization and flow matching to learn discrete representations of RNA structures.
- The framework captures biologically meaningful motifs, enhancing interpretability and generalization.
- RiboSphere achieves state-of-the-art performance in structure reconstruction and inverse folding tasks.
- The model demonstrates effective transferability to RNA-ligand binding predictions, even in data-scarce conditions.
Read more
RiboSphere: Learning Unified and Efficient Representations of RNA Structures
Summary
The paper presents RiboSphere, a novel framework designed to learn discrete geometric representations of RNA structures, addressing the challenges posed by RNA's flexible backbone, prevalent non-canonical interactions, and the scarcity of experimentally determined 3D structures. RiboSphere integrates vector quantization with flow matching to capture the modular organization of RNA architecture, where complex folds are composed of recurring structural motifs. The framework employs a geometric transformer encoder to generate SE(3)-invariant features, which are then discretized into a finite vocabulary of latent codes using finite scalar quantization (FSQ). A flow-matching decoder reconstructs atomic coordinates from these discrete codes, enabling high-fidelity structure generation. The results indicate that the learned code indices are enriched for specific RNA motifs, demonstrating that RiboSphere captures motif-level compositional structure. The framework achieves strong performance in structure reconstruction, inverse folding, and RNA-ligand binding prediction, showcasing its robustness in data-scarce scenarios.
Methodology
RiboSphere employs a geometric transformer encoder to produce SE(3)-invariant features, which are discretized using finite scalar quantization (FSQ) into a codebook of latent representations. A flow-matching decoder reconstructs atomic coordinates from these discrete codes, facilitating high-fidelity RNA structure generation.
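The FSQ step is compact enough to sketch: each latent dimension is bounded with tanh and rounded onto a small integer grid, and the resulting grid point doubles as an index into an implicit codebook of prod(levels) entries. The level counts below are illustrative (odd values keep the rounding symmetric about zero), not the ones RiboSphere uses.

```python
import numpy as np

def fsq_quantize(z, levels=(7, 7, 7, 5, 5)):
    """Finite scalar quantization: bound each latent dim with tanh,
    then round it onto an integer grid with `levels[d]` values."""
    L = np.asarray(levels, dtype=float)
    half = (L - 1) / 2.0                      # odd levels keep this integer
    return np.round(np.tanh(np.asarray(z, dtype=float)) * half)

def fsq_code_index(quantized, levels=(7, 7, 7, 5, 5)):
    """Mixed-radix encoding of a quantized vector into a single code
    index in the implicit codebook of prod(levels) entries."""
    L = np.asarray(levels)
    digits = (np.asarray(quantized) + (L - 1) // 2).astype(int)  # 0..L-1
    index = 0
    for d, base in zip(digits, L):
        index = index * int(base) + int(d)
    return index

z = np.array([0.3, -1.2, 2.0, 0.0, -0.5])     # encoder output (made up)
q = fsq_quantize(z)                           # array([ 1., -3.,  3.,  0., -1.])
idx = fsq_code_index(q)                       # integer code in [0, 8575)
```

During training the rounding would use a straight-through gradient estimator; the sketch covers only the inference-time discretization and indexing.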
Results
RiboSphere achieves a root-mean-square deviation (RMSD) of 1.25 Å and a TM-score of 0.84 in structure reconstruction. It also records a 63.0% sequence recovery rate in inverse folding tasks and outperforms existing models in RNA-ligand binding prediction, particularly on challenging data splits.
Implications
The RiboSphere framework has significant implications for computational biology, particularly in RNA structure prediction and design. Its ability to learn interpretable structural motifs could enhance our understanding of RNA functionality and facilitate advancements in RNA-targeted therapeutics.
AgenticRS-EnsNAS: Ensemble-Decoupled Self-Evolving Architecture Search
Theory
Efficient ML
Optimization
- Introduces Ensemble-Decoupled Architecture Search to reduce validation costs in NAS.
- Establishes a theoretical condition for ensemble error improvement based on architecture properties.
- Decouples architecture search from full ensemble training, enabling faster iterations.
- Categorizes solution strategies for different types of architecture searches.
Summary
This paper addresses the computational challenges of Neural Architecture Search (NAS) in industrial recommender systems, particularly the validation bottleneck caused by the need to evaluate multiple ensemble models. The authors propose a novel framework called Ensemble-Decoupled Architecture Search, which allows for the prediction of system-level performance using a single or a few learners instead of the full ensemble. The framework is built on Ensemble-Decoupled Theory, which establishes a condition for guaranteed improvement in ensemble error based on estimable architecture properties. This approach significantly reduces the per-candidate search cost from O(M) to O(1), while maintaining the O(M) deployment cost only for validated architectures. The authors categorize solution strategies into three types based on the nature of the architecture being searched: closed-form optimization, constrained differentiable optimization, and LLM-driven search. The paper also reveals two mechanisms for improvement: base diversity gain and accuracy gain, providing actionable insights for NAS in resource-constrained environments. The theoretical foundations are rigorously derived, and the paper outlines plans for comprehensive empirical validation in future work.
Methodology
The methodology involves establishing the Ensemble-Decoupled Theory, which provides a sufficient condition for ensemble error reduction. The authors utilize lightweight dual-learner training to estimate key architecture properties (expected error change, correlation, and variance) that inform the search process. They categorize solution strategies into closed-form optimization, constrained differentiable optimization, and LLM-driven search, allowing for flexibility based on the architecture type.
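The summary does not reproduce the Ensemble-Decoupled Theory itself, but the quantities it estimates (member error, correlation, variance) are closely related to the classical ambiguity decomposition for averaging ensembles, which makes the idea of predicting ensemble error without training the full ensemble concrete. A minimal numerical check, with all data synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic target plus M "learners" whose errors share a common
# (correlated) component and an independent (diverse) component.
n, M = 2000, 5
y = rng.normal(size=n)
common = rng.normal(scale=0.3, size=n)
preds = np.stack([y + common + rng.normal(scale=0.4, size=n)
                  for _ in range(M)])                  # shape (M, n)

member_mse = ((preds - y) ** 2).mean(axis=1)           # per-learner error
ens = preds.mean(axis=0)                               # averaging ensemble
ens_mse = ((ens - y) ** 2).mean()

# Ambiguity decomposition: ensemble error = mean member error minus the
# mean spread of the members around the ensemble prediction (diversity).
ambiguity = ((preds - ens) ** 2).mean()
assert np.isclose(ens_mse, member_mse.mean() - ambiguity)
```

The identity holds exactly for averaging ensembles under squared error, which is why estimable per-member error and diversity terms can bound system-level performance without evaluating all M learners.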
Results
The proposed framework reduces the computational cost of validating candidate architectures from O(M) to O(1), while ensuring that the deployment cost remains O(M) only for architectures that are validated as superior. The theoretical derivations are supported by rigorous proofs, and the authors plan to validate their approach empirically in future work.
Implications
The implications of this work are significant for industrial applications of NAS, particularly in environments where computational resources are limited. By enabling faster iterations and reducing costs, this framework can enhance the robustness and accuracy of recommender systems and other applications that rely on ensemble models.
Discounted Beta–Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards
Reinforcement Learning
Large Language Models
Efficient ML
- Introduces Discounted Beta–Bernoulli (DBB) reward estimation to improve sample efficiency in RLVR.
- DBB leverages historical reward statistics to reduce variance and avoid variance collapse.
- Empirical results show significant accuracy improvements over naive GRPO methods.
- DBB achieves lower mean squared error in low-sample scenarios compared to traditional point estimation.
Summary
This paper addresses the inefficiencies in reinforcement learning with verifiable rewards (RLVR), particularly in group-based methods that suffer from high sample inefficiency due to reliance on point estimation of rewards. The authors reformulate the reward estimation problem by modeling rewards as samples from a policy-induced distribution and propose a new method called Discounted Beta–Bernoulli (DBB) reward estimation. This method utilizes historical reward statistics to provide a more stable and informative training signal, reducing variance and avoiding variance collapse. The DBB estimator, while biased, demonstrates lower mean squared error compared to standard point estimation, particularly in low-sample scenarios. The authors conduct extensive experiments on various reasoning benchmarks, showing that their approach consistently outperforms naive group relative policy optimization (GRPO) methods without incurring additional computational costs. The results indicate significant improvements in accuracy across both in-distribution and out-of-distribution tasks, highlighting the effectiveness of the DBB approach in enhancing the performance of large language models in RLVR settings.
Methodology
The authors adopt a statistical perspective on reward estimation, modeling rewards as stochastic outcomes drawn from a distribution induced by the policy. They propose the DBB estimator, which tracks the evolving reward distribution by discounting historical observations. This Bayesian framework allows for explicit modeling of uncertainty and temporal dynamics in rewards, leading to reduced variance and more stable training signals.
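The core estimator is compact enough to sketch directly: the Beta pseudo-counts are decayed each step before the new binary reward is added, which caps the effective sample size and lets the posterior mean track a drifting success rate. The discount factor and prior below are illustrative choices, not values taken from the paper.

```python
import numpy as np

def dbb_update(alpha, beta, reward, gamma=0.9):
    """One Discounted Beta-Bernoulli posterior update: decay the Beta
    pseudo-counts, then add the new binary (verifiable) reward."""
    return gamma * alpha + reward, gamma * beta + (1.0 - reward)

def dbb_mean(alpha, beta):
    """Posterior mean estimate of the current success probability."""
    return alpha / (alpha + beta)

# Track a success rate that jumps from 0.2 to 0.8 halfway through;
# discounting caps the effective sample size near 1/(1-gamma) = 10, so
# the estimate keeps moving instead of freezing on stale history.
rng = np.random.default_rng(0)
alpha, beta = 1.0, 1.0                        # uniform Beta(1, 1) prior
estimates = []
for t in range(200):
    p = 0.2 if t < 100 else 0.8
    r = float(rng.random() < p)
    alpha, beta = dbb_update(alpha, beta, r)
    estimates.append(dbb_mean(alpha, beta))
```

Because the posterior mean pools a discounted window of past rewards, it is biased toward recent history but has far lower variance than the single-batch point estimate, which is the trade-off the paper exploits.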
Results
The proposed GRPO with DBB (GRPO-DBB) outperformed naive GRPO across six in-distribution benchmarks, achieving average accuracy improvements of 3.22 points for the 1.7B model and 2.42 points for the 8B model. In out-of-distribution benchmarks, GRPO-DBB achieved average accuracy gains of 12.49 points and 6.92 points for the respective models, demonstrating the method's robustness and effectiveness.
Implications
The findings suggest that DBB reward estimation can significantly enhance the performance of large language models in RLVR settings, making it a valuable approach for improving reasoning capabilities in various applications. This could lead to more efficient training processes and better utilization of computational resources in reinforcement learning tasks.
Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training
Optimization
Theory
- Introduces a mathematical framework for population-based neural network training dynamics.
- Establishes connections between population-based learning, bilevel optimization, and replicator-mutator models.
- Demonstrates the role of noise and diversity in optimizing hyperparameters and model parameters.
- Validates theoretical results through numerical experiments, highlighting the benefits of effective fitness measures.
Summary
This paper presents a theoretical framework for understanding neural network training through a population-based lens, specifically focusing on two-time-scale learning dynamics. The authors model a population of neural networks as an interacting agent system, where network parameters are updated rapidly via stochastic gradient descent (SGD) or Langevin dynamics, while hyperparameters evolve more slowly through selection and mutation processes. The framework allows for the derivation of a selection-mutation equation for hyperparameter density under the assumption of strong time-scale separation. The study connects population-based learning with bilevel optimization and classical replicator-mutator models, providing insights into how noise and diversity can balance optimization and exploration. Numerical experiments validate the theoretical findings, demonstrating that access to effective fitness measures can enhance population-level updates, thereby improving training efficiency and outcomes.
Methodology
The authors develop a theoretical model that describes the dynamics of a population of neural networks, employing fast noisy gradient updates for parameters and slower selection-mutation dynamics for hyperparameters. They prove the large-population limit for the joint distribution of parameters and hyperparameters and derive a selection-mutation equation under strong time-scale separation. Numerical experiments are conducted to illustrate the dynamics and validate the theoretical framework.
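A toy instance of the two-time-scale picture, with every modeling choice ours rather than the paper's: the fast scale runs a few gradient steps on a quadratic loss, and the slow scale applies a replicator step (fitness-proportional resampling) followed by a mutator step (Gaussian perturbation) to a population of step-size hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def fast_loss(lam, steps=5):
    """Fast time scale: a few gradient steps with step size `lam` on
    f(theta) = theta**2, starting from theta = 1."""
    theta = 1.0
    for _ in range(steps):
        theta -= lam * 2.0 * theta
    return theta ** 2

def selection_mutation(pop, temp=0.01, mut=0.02):
    """Slow time scale: replicator step (fitness-proportional
    resampling) followed by a mutator step (Gaussian perturbation)."""
    losses = np.array([fast_loss(l) for l in pop])
    w = np.exp(-(losses - losses.min()) / temp)      # softmax fitness
    pop = rng.choice(pop, size=pop.size, p=w / w.sum())   # selection
    pop = pop + rng.normal(scale=mut, size=pop.size)      # mutation
    return np.clip(pop, 1e-3, 0.999)

pop = rng.uniform(0.01, 0.9, size=32)         # population of step sizes
initial = np.mean([fast_loss(l) for l in pop])
for _ in range(30):
    pop = selection_mutation(pop)
final = np.mean([fast_loss(l) for l in pop])
```

The mutation term is what keeps diversity in the population, mirroring the paper's point that noise balances exploitation of the currently fittest hyperparameter against exploration.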
Results
The study shows that the averaged dynamics of the population converge towards the fittest hyperparameter, and that the fast parameter dynamics relax to a Boltzmann-Gibbs measure. The results indicate that effective fitness measures can significantly improve population-level updates, leading to better training outcomes.
Implications
This framework provides a deeper understanding of population-based training methods, which could enhance the design of more efficient machine learning algorithms, particularly in hyperparameter optimization and reinforcement learning contexts. The insights gained may also inform future research on multi-agent systems and evolutionary strategies in AI.
ODySSeI: An Open-Source End-to-End Framework for Automated Detection, Segmentation, and Severity Estimation of Lesions in Invasive Coronary Angiography Images
Computer Vision
- ODySSeI provides an automated solution for lesion detection and severity estimation in ICA images.
- The Pyramidal Augmentation Scheme (PAS) significantly enhances model performance, especially in complex tasks.
- The framework achieves high accuracy in estimating lesion severity, with minimal deviation from ground truth.
- ODySSeI processes ICA images rapidly, making it suitable for real-time clinical applications.
Summary
The paper presents ODySSeI, an open-source framework designed for the automated detection, segmentation, and severity estimation of lesions in Invasive Coronary Angiography (ICA) images. ICA is the gold standard for assessing coronary artery disease, but its interpretation is subjective and varies significantly among operators. ODySSeI addresses this issue by integrating deep learning models for lesion detection and segmentation, utilizing a novel Pyramidal Augmentation Scheme (PAS) to enhance performance across diverse patient cohorts. The framework also introduces a quantitative coronary angiography-free Lesion Severity Estimation (LSE) technique that computes the Minimum Lumen Diameter (MLD) and diameter stenosis directly from predicted lesion geometries. Evaluation on clinical datasets shows ODySSeI's strong generalizability and efficiency, processing images in seconds on a CPU and fractions of a second on a GPU. The results indicate significant performance improvements, particularly in lesion detection, with a 2.5-fold increase compared to baseline methods. ODySSeI is available as a plug-and-play web interface, promoting reproducible and scalable ICA analysis for clinical decision-making.
Methodology
ODySSeI employs deep learning models for lesion detection and segmentation, enhanced by a Pyramidal Augmentation Scheme (PAS) that applies various augmentation techniques to improve model robustness. The framework processes raw ICA images to detect lesions, segments them, and estimates their severity using a novel algorithm that does not rely on traditional quantitative coronary angiography methods.
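The severity estimate rests on standard angiographic quantities. As a hedged sketch (the diameter profile and the reference-diameter choice below are invented, and ODySSeI's actual algorithm operates on predicted lesion geometries), percent diameter stenosis follows directly from the minimum lumen diameter and a reference vessel diameter:

```python
import numpy as np

def diameter_stenosis(lumen_diameters, reference_diameter):
    """Percent diameter stenosis from a lumen-diameter profile along
    the lesion: DS% = (1 - MLD / RVD) * 100."""
    mld = float(np.min(lumen_diameters))      # minimum lumen diameter
    return mld, (1.0 - mld / reference_diameter) * 100.0

# Diameters (pixels) sampled along a segmented lesion centerline; the
# healthy ends of the segment provide the reference vessel diameter.
profile = np.array([30, 29, 24, 16, 12, 15, 23, 28, 30], dtype=float)
rvd = (profile[0] + profile[-1]) / 2.0        # 30 px reference
mld, ds = diameter_stenosis(profile, rvd)     # 12 px MLD, 60% stenosis
```

The reported 2-3 pixel MLD deviation from ground truth translates into a bounded stenosis error through exactly this ratio.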
Results
The framework demonstrated a 2.5-fold increase in lesion detection performance and a 1-3% increase in lesion segmentation performance compared to baseline models. The Lesion Severity Estimation technique achieved high accuracy, with predicted MLD values differing by only 2-3 pixels from ground truths. ODySSeI processes images in seconds on a CPU and fractions of a second on a GPU.
Implications
ODySSeI has the potential to standardize the interpretation of ICA images, reducing operator variability and improving clinical decision-making. Its real-time processing capabilities make it a valuable tool for cardiologists, facilitating timely interventions for coronary artery disease.
Target Concept Tuning Improves Extreme Weather Forecasting
Time Series
Interpretability
- Introduces Target Concept Tuning (TaCT) for fine-tuning deep learning models in extreme weather forecasting.
- Utilizes Sparse Autoencoders to identify failure-related concepts for targeted model adaptation.
- Achieves improved forecasting accuracy for typhoons while maintaining performance on other meteorological variables.
- Reveals model biases through interpretable concepts corresponding to meteorological structures.
Summary
This paper addresses the challenges faced by deep learning models in meteorological forecasting, particularly during rare but impactful extreme weather events like typhoons. Traditional fine-tuning methods struggle with a trade-off between neglecting these extreme events and overfitting them, which can degrade overall performance. The authors propose a novel framework called Target Concept Tuning (TaCT), which enhances model adaptation specifically for failure cases while maintaining performance in common scenarios. TaCT utilizes Sparse Autoencoders to identify failure-related internal concepts and employs a concept-gated fine-tuning approach, updating model parameters only when these concepts are activated. This method not only improves typhoon forecasting accuracy across various regions but also preserves the model's performance on other meteorological variables. The identified concepts correspond to physically meaningful circulation patterns, which help reveal model biases and support trustworthy adaptations in scientific forecasting tasks. The results demonstrate consistent improvements in forecasting accuracy without compromising the model's general predictive capabilities.
Methodology
The authors developed TaCT, an interpretable concept-gated fine-tuning framework that employs Sparse Autoencoders to discover and disentangle failure-related internal concepts. Fine-tuning is performed selectively, updating model parameters only when the identified concepts are activated, thus preserving the model's overall performance.
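The gating idea can be sketched with a linear head and a single SAE-style concept direction (all tensors here are random placeholders, not the paper's model): gradients flow only through samples whose failure concept fires, so behaviour on common cases is left untouched.

```python
import numpy as np

def concept_gate(hidden, concept_dir, threshold=1.0):
    """Gate = 1 for samples whose (ReLU) activation along one learned
    concept direction exceeds the threshold, 0 otherwise."""
    act = np.maximum(hidden @ concept_dir, 0.0)
    return (act > threshold).astype(float)

def gated_grad_step(w, X, y, hidden, concept_dir, lr=0.1):
    """One fine-tuning step of a linear head in which only gated
    (failure-concept) samples contribute to the gradient."""
    gate = concept_gate(hidden, concept_dir)
    err = X @ w - y
    grad = X.T @ (gate * err) / max(gate.sum(), 1.0)
    return w - lr * grad, gate

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))                  # input features
y = X @ np.array([1.0, -2.0, 0.5, 0.0])       # regression targets
hidden = rng.normal(size=(64, 8))             # stand-in activations
concept = rng.normal(size=8)                  # stand-in SAE direction
w, gate = gated_grad_step(np.zeros(4), X, y, hidden, concept)
```

Masking the per-sample contribution, rather than maintaining a separate model, is what lets a single set of parameters improve on failure cases without drifting on everything else.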
Results
The experiments showed that TaCT consistently improved typhoon forecasting accuracy across different cyclone basins (Northern Atlantic, Western Pacific, and Eastern Pacific) without degrading the performance on other meteorological variables. The identified concepts were linked to meaningful meteorological patterns, enhancing interpretability.
Implications
The findings suggest that TaCT can be a valuable tool for improving the reliability of AI-based weather forecasting models, particularly in high-stakes scenarios where accurate predictions of extreme weather events are critical for disaster preparedness and response.
DPxFin: Adaptive Differential Privacy for Anti-Money Laundering Detection via Reputation-Weighted Federated Learning
Federated Learning
- Introduction of DPxFin, a reputation-driven differential privacy framework for federated learning in finance.
- Dynamic adjustment of differential privacy noise based on client reputation enhances model utility and privacy.
- Extensive experiments show improved performance in fraud detection on AML datasets, particularly under non-IID conditions.
- DPxFin effectively mitigates risks of data leakage, proving its robustness in financial applications.
Summary
The paper presents DPxFin, a novel federated learning framework designed to enhance anti-money laundering (AML) detection while addressing data privacy concerns. The framework integrates reputation-guided adaptive differential privacy, which dynamically assigns differential privacy noise to client updates based on their calculated reputation. This reputation is determined by the alignment of locally trained models with a global model, allowing clients with higher reputations to contribute more effectively to the model while minimizing the risk of privacy leakage from lower-reputation clients. The authors validate DPxFin using an AML dataset under both IID and non-IID conditions, employing a Multi-Layer Perceptron (MLP) for model training. The results indicate that DPxFin achieves a superior balance between accuracy and privacy compared to traditional federated learning and fixed-noise differential privacy approaches. Furthermore, the framework demonstrates resilience against tabular data leakage attacks, confirming its effectiveness in real-world financial scenarios.
Methodology
The methodology involves a reputation-based dynamic differential privacy aggregation mechanism where client reputations are calculated using the Euclidean distance between local models and a temporary global model. Clients with high reputations receive lower differential privacy noise, while those with low reputations are assigned higher noise levels to protect data integrity.
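The reputation-to-noise mapping can be sketched as follows. The distance metric matches the summary (Euclidean distance to a temporary global model), but the inverse-distance reputation and the linear noise schedule are our illustrative choices, not the paper's exact formulas.

```python
import numpy as np

def reputations(local_updates, global_update):
    """Reputation from the Euclidean distance between each client's
    update and the temporary global (mean) update: closer -> higher."""
    d = np.array([np.linalg.norm(u - global_update) for u in local_updates])
    return 1.0 / (1.0 + d)

def adaptive_noise_std(rep, sigma_min=0.1, sigma_max=1.0):
    """Min-max rescale reputations, then map high reputation to low DP
    noise and low reputation to high DP noise."""
    r = (rep - rep.min()) / max(rep.max() - rep.min(), 1e-12)
    return sigma_max - r * (sigma_max - sigma_min)

rng = np.random.default_rng(0)
updates = [rng.normal(0.0, 0.1, size=10) for _ in range(4)]   # honest
updates.append(rng.normal(3.0, 0.1, size=10))                 # outlier
g = np.mean(updates, axis=0)
rep = reputations(updates, g)
sigma = adaptive_noise_std(rep)
noisy = [u + rng.normal(scale=s, size=u.shape)                # DP noise
         for u, s in zip(updates, sigma)]
```

The outlier client, being farthest from the global model, receives the lowest reputation and therefore the largest noise scale, limiting both its influence and the leakage risk it poses.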
Results
The experimental results demonstrate that DPxFin outperforms existing federated learning and fixed-noise differential privacy methods in terms of accuracy and privacy trade-offs. The framework successfully withstands tabular data leakage attacks, validating its practical applicability in financial fraud detection.
Implications
The implications of this research extend to enhancing the effectiveness of AML systems in financial institutions by leveraging federated learning while ensuring data privacy. The adaptive differential privacy approach can be applied to other sensitive domains requiring collaborative learning without compromising individual data security.
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
NLP
Large Language Models
Efficient ML
- AFBS-BO automates hyperparameter tuning for sparse attention, eliminating the need for manual grid search.
- The framework achieves 3.4× faster hyperparameter discovery with 8.8× fewer evaluations than traditional methods.
- Configurations discovered by AFBS-BO outperform existing sparse attention baselines while closely matching dense attention quality.
- The method leverages multi-fidelity evaluation to efficiently explore hyperparameter spaces.
Summary
The paper addresses the usability gap in sparse attention mechanisms for transformers, which are hindered by the need for optimal hyperparameters that vary across layers and models. The authors propose AFBS-BO (Adaptive Fidelity Binary Search with Bayesian Optimization), an automated framework that identifies layer- and head-specific hyperparameters without human intervention. This hybrid algorithm combines Bayesian Optimization for global exploration with binary search for local refinement, utilizing multi-fidelity evaluation to reduce tuning costs. The results show that AFBS-BO accelerates hyperparameter discovery by 3.4 times and requires 8.8 times fewer evaluations compared to traditional grid search methods. It also identifies high-sparsity configurations that outperform existing sparse attention methods while maintaining quality comparable to dense attention. By transforming sparse attention into a self-optimizing component, AFBS-BO facilitates easier integration across various transformer architectures and applications.
Methodology
The AFBS-BO framework employs a three-stage hybrid algorithm: (1) Bayesian Optimization for global exploration of the hyperparameter space using low-fidelity evaluations, (2) Binary Search Refinement for precise tuning using high-fidelity evaluations, and (3) Multi-Input Validation to ensure robustness across diverse inputs.
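Stage (2) is the easiest to sketch in isolation. Under the assumption (ours, not stated in the summary) that the quality penalty grows monotonically with sparsity, bisection finds the highest sparsity whose penalty stays within a budget; the toy penalty curve below stands in for real high-fidelity evaluations such as perplexity.

```python
def binary_search_sparsity(penalty_at, budget, lo=0.0, hi=1.0, iters=20):
    """Refinement-stage sketch: assuming the quality penalty grows
    monotonically with sparsity, bisect for the highest sparsity whose
    penalty stays within `budget`."""
    best = lo
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if penalty_at(mid) <= budget:
            best, lo = mid, mid          # acceptable: push sparser
        else:
            hi = mid                     # too lossy: back off
    return best

def penalty(s):
    """Toy perplexity-increase curve: flat until ~0.7 sparsity, then
    rising quadratically (stands in for a high-fidelity evaluation)."""
    return max(0.0, s - 0.7) ** 2 * 100.0

s_star = binary_search_sparsity(penalty, budget=0.5)
# highest sparsity with penalty <= 0.5, i.e. 0.7 + sqrt(0.005) ~ 0.771
```

Each bisection step halves the search interval, so 20 high-fidelity evaluations pin the threshold to about one part in a million, which is where most of the evaluation savings over grid search come from.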
Results
AFBS-BO achieved a hyperparameter discovery time of 3.0 seconds for a 12-layer Llama-2-7B model, compared to 10.1 seconds for grid search, while requiring only 240 evaluations instead of 2100. It discovered configurations that achieved a perplexity of 7.45 at 70.7% sparsity, outperforming the state-of-the-art H2O method and closely approaching the theoretical Top-K oracle.
Implications
The automated tuning of sparse attention mechanisms can significantly enhance the deployment of transformers in various applications, reducing the expertise barrier and enabling more efficient model training and inference in natural language processing and other domains.
FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment
Large Language Models
Federated Learning
Optimization
- FedPDPO is the first framework for aligning LLMs with human preferences in federated learning while preserving privacy.
- The framework utilizes a frozen LLM backbone with a shared LoRA adapter and personalized client-specific heads to address non-IID data challenges.
- A personalized DPO training strategy is introduced to enhance generalization and mitigate the limitations of implicit rewards.
- The proposed bottleneck adapter effectively bridges global and local knowledge, improving model performance.
Summary
The paper presents FedPDPO, a novel framework designed to align large language models (LLMs) with human preferences in a federated learning (FL) setting. The authors address the challenges posed by decentralized, privacy-sensitive, and non-IID preference data, which can lead to performance degradation when using Direct Preference Optimization (DPO) in FL. FedPDPO employs a parameter-efficient fine-tuning architecture where each client maintains a frozen pretrained LLM backbone augmented with a Low-Rank Adaptation (LoRA) adapter. This setup allows for efficient communication and aggregation. The framework introduces three key innovations: a globally shared LoRA adapter with a personalized client-specific LLM head to tackle non-IID data heterogeneity, a personalized DPO training strategy with a client-specific explicit reward head to enhance generalization, and a bottleneck adapter to balance global and local features. Theoretical analyses support the framework's soundness, and extensive experiments demonstrate its effectiveness, achieving significant accuracy improvements across various preference datasets.
Methodology
FedPDPO employs a federated learning architecture where each client uses a frozen pretrained LLM with a fine-tuned LoRA adapter. It introduces a personalized DPO training strategy with explicit rewards and a bottleneck adapter to manage feature integration from global and local sources.
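FedPDPO's personalized reward heads sit on top of the standard DPO objective; the vanilla per-pair loss it builds on (with made-up log-probabilities) looks like this:

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * implicit reward margin),
    where the margin compares policy vs. reference log-probs of the
    chosen (w) and rejected (l) responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(np.log1p(np.exp(-margin)))   # stable -log sigmoid form

# Policy identical to the reference: zero margin, loss = log(2).
assert np.isclose(dpo_loss(-10.0, -12.0, -10.0, -12.0), np.log(2.0))
# Up-weighting the chosen response lowers the loss.
assert dpo_loss(-9.0, -12.0, -10.0, -12.0) < np.log(2.0)
```

The margin term is the "implicit reward" whose limitations the paper addresses by adding an explicit, client-specific reward head.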
Results
The framework demonstrated state-of-the-art performance on multiple preference datasets, achieving up to a 4.80% average accuracy improvement in both intra-domain and cross-domain federated settings.
Implications
FedPDPO has significant implications for the deployment of LLMs in privacy-sensitive applications across various domains, such as healthcare and finance, where aligning models with user preferences is crucial.
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
Reinforcement Learning
Optimization
- DeepStock integrates classical inventory management concepts into DRL to enhance performance.
- Policy regularizations significantly reduce hyperparameter tuning time and improve training outcomes.
- The approach has been successfully deployed in a real-world setting, managing inventory for Alibaba's Tmall.
- Synthetic experiments indicate a re-evaluation of the best DRL methods for inventory management.
Summary
The paper presents DeepStock, a novel approach to inventory management using Deep Reinforcement Learning (DRL) enhanced by policy regularizations. Traditional DRL methods often struggle with hyperparameter sensitivity and interpretability, leading to inconsistent performance in inventory control tasks. The authors propose incorporating classical inventory management concepts, such as 'Base Stock', into the DRL framework to mitigate these issues. By applying policy regularizations, they demonstrate a significant acceleration in hyperparameter tuning and an improvement in the performance of DRL methods. The effectiveness of DeepStock is validated through a full-scale deployment on Alibaba's Tmall platform, managing inventory for over 1 million SKU-warehouse combinations. Additionally, extensive synthetic experiments reveal that these policy regularizations reshape the understanding of optimal DRL methods for inventory management, suggesting a shift in how DRL can be applied in practical scenarios.
Methodology
The authors utilize Deep Reinforcement Learning with policy regularizations that encode inventory management principles. They define structured mappings from neural network outputs to order quantities, allowing for a more interpretable and efficient learning process. Two DRL methods, DDPG and PPO, are tested with these regularizations to evaluate their effectiveness in inventory management.
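A 'Base Stock' regularization can be sketched as a structured output mapping (the softplus choice and all numbers are ours): the network predicts an order-up-to level rather than a raw order quantity, which guarantees classically sensible behaviour such as never ordering when the inventory position already exceeds the target.

```python
import numpy as np

def base_stock_order(base_stock, on_hand, on_order):
    """Classical base-stock rule: order up to the target level S.
    Inventory position = on-hand stock + outstanding orders."""
    return max(0.0, base_stock - (on_hand + on_order))

def regularized_action(nn_output, on_hand, on_order):
    """Structured mapping sketch: the network outputs an unconstrained
    scalar that becomes a non-negative base-stock target via softplus,
    and the base-stock rule converts it into an order quantity."""
    base_stock = np.log1p(np.exp(nn_output))  # softplus keeps S >= 0
    return base_stock_order(base_stock, on_hand, on_order)

# Inventory position already above target: the policy orders nothing.
assert regularized_action(2.0, on_hand=10.0, on_order=5.0) == 0.0
```

Constraining the policy class this way shrinks the hyperparameter-sensitive part of the search space, which is the mechanism behind the reported faster tuning.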
Results
The implementation of policy regularizations led to a full-scale deployment of DRL at Alibaba, managing inventory for all products on Tmall. The results from synthetic experiments showed that the proposed regularizations improved the performance of traditional DRL methods, demonstrating their effectiveness in real-world applications.
Implications
The findings suggest that incorporating classical inventory management principles into DRL can enhance the applicability and performance of machine learning in operational settings. This approach could be beneficial for other e-commerce platforms and industries facing similar inventory management challenges.
Fine-tuning Timeseries Predictors Using Reinforcement Learning
Reinforcement Learning
Time Series
- Reinforcement learning can enhance the performance of pre-trained time series predictors.
- The proposed fine-tuning methodology eliminates the need for human feedback, making it cost-effective.
- The study demonstrates the transfer learning properties of fine-tuned models.
- A systematic implementation plan for RL fine-tuning is provided for practitioners.
Summary
This chapter explores the application of reinforcement learning (RL) for fine-tuning financial time series predictors that were initially trained using supervised learning. The authors present three major RL algorithms and propose a systematic implementation plan for backpropagating the loss from RL tasks to pre-trained models. The study highlights the performance improvements achieved through fine-tuning, demonstrating the transfer learning capabilities of the models. The authors argue that RL can effectively align time series predictors with domain-specific constraints, such as risk management, while eliminating the need for human feedback, thus reducing costs and subjectivity. The chapter is structured to provide a comprehensive overview of the literature, data used, framework for evaluation, benchmarking against standard RL tasks, and hyperparameter tuning, culminating in empirical results that validate the effectiveness of the proposed methods.
Methodology
The authors utilized a pre-trained model as a backbone for reinforcement learning implementation, setting up an environment closely reflecting the training data. They focused on designing a reward structure that guides the model's learning process, allowing for backpropagation of loss through the backbone to update weights according to the RL policy. The chapter includes a literature review, data description, framework for fine-tuning, benchmarking against standard RL tasks, and hyperparameter tuning.
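A reward structure of the kind described can be sketched as follows, with every number and functional form ours: PnL from the position implied by the predictor's signal, minus a transaction cost and a crude quadratic risk penalty. In an actual fine-tuning loop this reward would be backpropagated into the pre-trained backbone via a policy-gradient method.

```python
import numpy as np

def trading_reward(position, realized_return, prev_position,
                   cost=1e-3, risk_lambda=0.1):
    """PnL of the taken position, minus a transaction cost on position
    changes and a crude quadratic risk penalty."""
    pnl = position * realized_return
    tc = cost * abs(position - prev_position)
    return pnl - tc - risk_lambda * pnl ** 2

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=250)     # synthetic daily returns
# Stand-in for a pre-trained predictor: a noisy peek at the true return.
signal = returns + rng.normal(0.0, 0.01, size=250)
positions = np.sign(signal)                   # long / flat / short
prev = np.concatenate([[0.0], positions[:-1]])
rewards = [trading_reward(p, r, q)
           for p, r, q in zip(positions, returns, prev)]
```

Encoding risk management and trading frictions directly in the reward, rather than in a human feedback signal, is what makes the alignment step cost-free in the sense the chapter describes.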
Results
The results indicate a marked increase in performance of the time series predictors after fine-tuning with reinforcement learning. The empirical evaluations confirm that the fine-tuned models exhibit enhanced predictive capabilities and demonstrate effective transfer learning properties.
Implications
The findings suggest that reinforcement learning can be a viable alternative to traditional supervised fine-tuning methods for time series prediction, particularly in financial contexts. This approach could lead to more efficient model training processes and improved predictive accuracy in various applications, including finance and risk management.
Online Learning and Equilibrium Computation with Ranking Feedback
Theory
Optimization
- Sublinear regret is unattainable with instantaneous utility ranking feedback.
- Sublinear regret can be achieved under time-average utility ranking feedback with certain assumptions.
- The proposed algorithms yield approximate coarse correlated equilibria in normal-form games.
- The study highlights the relevance of ranking feedback in real-world applications, such as recommendation systems.
Summary
This paper investigates online learning in environments where feedback is provided as rankings rather than numeric utilities, which is particularly relevant in human-in-the-loop applications. The authors study two ranking mechanisms: one based on instantaneous utility and another based on time-average utility. They show that sublinear regret is impossible with instantaneous utility rankings, and also with time-average utility rankings under certain conditions. They then propose new algorithms that achieve sublinear regret whenever the utility sequence exhibits sublinear total variation, and show that this assumption can be dropped in the full-information setting. The paper also connects these findings to equilibrium computation in normal-form games: when players use these algorithms, their repeated interactions lead to an approximate coarse correlated equilibrium. Finally, the effectiveness of the proposed algorithms is validated in an online large-language-model routing application.
Methodology
The authors analyze online learning models with ranking feedback, categorizing the feedback into instantaneous utility rankings and time-average utility rankings. They establish theoretical limits on regret minimization and develop algorithms that achieve sublinear regret under specific conditions. The algorithms are tested in both theoretical frameworks and practical applications, including large-language-model routing tasks.
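This is not the paper's algorithm, but a minimal reduction showing how ranking-only feedback can drive a no-regret learner: exponential weights with the normalized rank position used as a surrogate loss.

```python
import numpy as np

def hedge_from_rankings(rankings, n_arms, eta=0.1):
    """Exponential weights driven purely by ranking feedback: each
    round the learner observes a permutation of the arms (best first)
    and uses normalized rank position as a surrogate loss."""
    w = np.ones(n_arms)
    for ranking in rankings:
        loss = np.empty(n_arms)
        loss[list(ranking)] = np.arange(n_arms) / (n_arms - 1)
        w *= np.exp(-eta * loss)
        w /= w.sum()
    return w

# Arm 0 always ranked best: its weight should come to dominate, the
# way an LLM router would concentrate on the consistently best model.
rankings = [(0, 1, 2, 3)] * 200
w = hedge_from_rankings(rankings, n_arms=4)
```

The rank surrogate discards utility magnitudes, which is exactly why the paper's impossibility results bite: without extra structure such as sublinear total variation, rankings alone cannot certify sublinear regret against the true utilities.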
Results
The paper concludes that sublinear regret is impossible under instantaneous utility rankings and under certain deterministic conditions with time-average utility rankings. However, new algorithms are introduced that can achieve sublinear regret when the utility sequence has sublinear total variation. In full-information settings, this assumption can be removed, leading to approximate coarse correlated equilibria in repeated normal-form games.
Implications
The findings suggest that ranking feedback can be a viable alternative to numeric utility feedback in online learning scenarios, particularly in applications involving human preferences. This has implications for designing more effective recommendation systems and understanding equilibrium computation in multi-agent settings.
Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks
Graph Learning
Optimization
Theory
- Introduction of unlearning corruption attacks that exploit graph unlearning processes.
- Formulation of the attack as a bi-level optimization problem to address technical challenges.
- Demonstration of significant accuracy degradation in GNNs due to carefully crafted unlearning requests.
- Highlighting the stealthy nature of these attacks, which can evade detection during training.
Summary
This paper investigates a novel form of adversarial attack on Graph Neural Networks (GNNs) that exploits graph unlearning, the process required for compliance with privacy regulations. The authors introduce unlearning corruption attacks, in which adversaries inject carefully crafted nodes into a training graph and subsequently request their deletion. Because privacy laws oblige providers to honor such deletion requests, the attack is stealthy and can cause significant performance degradation after unlearning. The authors formulate the attack as a bi-level optimization problem, addressing challenges such as black-box unlearning and label scarcity by using gradient-based updates and surrogate models for pseudo-label generation. Their extensive experiments reveal that even small, strategically designed unlearning requests can severely degrade the accuracy of GNNs, raising concerns about the robustness of unlearning methods in real-world applications.
Methodology
The authors propose a bi-level optimization framework to model the unlearning corruption attack. They approximate the unlearning process using gradient-based updates and utilize a surrogate model to generate pseudo-labels for unlabeled nodes. This approach allows for the optimization of node injections that maximize post-unlearning performance degradation while maintaining stealthiness during training.
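As a heavily simplified illustration of the bi-level structure (a linear-regression stand-in, not the paper's GNN formulation), the sketch below trains on clean plus injected data (inner level), approximates deletion of the injected point with a single gradient-ascent step, and searches over injections to maximize post-unlearning error on clean data (outer level). The model, the injection parameterization, and the unlearning approximation are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -1.0])          # clean regression targets

def fit(Xa, ya):
    # inner level: "training" on clean + injected data (least squares)
    return np.linalg.lstsq(Xa, ya, rcond=None)[0]

def unlearn(w, x_inj, y_inj, lr=0.5):
    # gradient-based approximation of deleting the injected point:
    # ascend the deleted point's loss to undo its influence
    return w + lr * (x_inj @ w - y_inj) * x_inj

def post_unlearn_damage(v):
    x_inj, y_inj = np.array([v, v]), 5.0           # crafted injection
    w = fit(np.vstack([X, x_inj]), np.append(y, y_inj))
    w2 = unlearn(w, x_inj, y_inj)
    return np.mean((X @ w2 - y) ** 2)              # error on clean data

# outer level: choose the injection that maximizes post-unlearning damage
best_v = max(np.linspace(-2.0, 2.0, 21), key=post_unlearn_damage)
```

The injected point looks harmless during training (the model fits around it), yet its deletion step drags the weights away from the clean solution, mirroring the stealthiness the paper describes.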
Results
The experiments conducted across various benchmarks and unlearning algorithms demonstrate that the proposed unlearning corruption attacks can lead to substantial accuracy degradation in GNNs, even with minimal and carefully designed unlearning requests. This highlights vulnerabilities in current graph unlearning methods and raises alarms about their robustness under regulatory demands.
Implications
The findings underscore the need for improved defenses against adversarial attacks that leverage unlearning processes, particularly in contexts where GNNs are deployed in sensitive applications subject to privacy regulations. This research may inform the development of more resilient graph unlearning techniques and contribute to the broader discourse on security in machine learning.
Off-Policy Learning with Limited Supply
Reinforcement Learning
Theory
Optimization
- Conventional greedy OPL methods are suboptimal in limited supply scenarios.
- Theoretical proof exists that superior policies can be developed under supply constraints.
- OPLS focuses on relative expected rewards to improve item allocation efficiency.
- Empirical results demonstrate OPLS's superiority over traditional OPL methods.
Summary
This paper addresses the challenges of off-policy learning (OPL) in contextual bandits under limited supply, a condition common in real-world applications such as recommendation systems and online advertising. Traditional OPL methods assume an unconstrained environment in which any item can be selected without limit. In scenarios such as coupon allocation or e-commerce, however, scarce supply can make greedy selection strategies suboptimal. The authors provide a theoretical analysis demonstrating that conventional greedy approaches may fail to maximize expected rewards when items are scarce. They introduce a novel method called Off-Policy Learning with Limited Supply (OPLS), which prioritizes items based on their expected rewards relative to other users, rather than simply selecting the item with the highest expected reward. Empirical evaluations on synthetic and real-world datasets show that OPLS outperforms existing OPL methods, highlighting its effectiveness in managing limited supply situations.
Methodology
The authors formulate the problem of OPL with limited supply and analyze a simplified setting where each item is available in a single unit. They propose the OPLS method, which selects items based on the relative reward gap, defined as a user's expected reward minus the average expected reward across all users. This approach allows for more efficient allocation of limited items without increasing computational costs.
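A minimal sketch of the relative-reward-gap idea, assuming the single-unit-per-item setting described above (the function name and the greedy-matching loop are illustrative, not the authors' implementation):

```python
import numpy as np

def opls_allocate(expected_rewards, supply=1):
    """Match users to scarce items by relative reward gap.

    expected_rewards: (n_users, n_items) predicted rewards.
    Gap = a user's expected reward for an item minus that item's average
    expected reward across all users; (user, item) pairs are served in
    decreasing gap order while supply lasts.
    """
    gap = expected_rewards - expected_rewards.mean(axis=0, keepdims=True)
    remaining = [supply] * expected_rewards.shape[1]
    assignment = {}
    flat = np.argsort(-gap, axis=None)
    for u, i in zip(*np.unravel_index(flat, gap.shape)):
        if u not in assignment and remaining[i] > 0:
            assignment[int(u)] = int(i)
            remaining[i] -= 1
    return assignment

rewards = np.array([[0.9, 0.8],
                    [0.5, 0.1]])
assignment = opls_allocate(rewards)
# Sequential greedy sends both users after item 0, leaving user 1 with
# the 0.1 item (total 1.0); the gap rule matches user 0 to item 1 and
# user 1 to item 0, for a total expected reward of 1.3.
```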
Results
The empirical results indicate that OPLS significantly outperforms traditional OPL methods in both synthetic and real-world datasets, demonstrating its effectiveness in maximizing expected rewards under limited supply conditions.
Implications
The findings suggest that OPLS can be effectively applied in various domains where item availability is constrained, such as e-commerce, coupon distribution, and other recommendation systems, potentially leading to better user satisfaction and resource management.
MSNet and LS-Net: Scalable Multi-Scale Multi-Representation Networks for Time Series Classification
Time Series
- Introduction of MSNet and LS-Net for scalable time series classification.
- Demonstrated the importance of structured multi-representation inputs for improved performance.
- MSNet achieves the best calibration, while the adapted LiteMV baseline attains the highest accuracy.
- LS-Net offers a favorable efficiency-accuracy trade-off, suitable for resource-constrained environments.
Summary
This paper presents MSNet and LS-Net, two novel architectures designed for time series classification (TSC) that leverage a scalable multi-scale convolutional framework. The authors argue that the performance of TSC models can be significantly improved by integrating diverse input representations, such as derivatives and frequency-domain projections, rather than relying solely on raw time-domain inputs. MSNet is a hierarchical multi-scale convolutional network focused on robustness and calibration, while LS-Net is a lightweight variant aimed at efficiency. The authors adapt LiteMV, originally for multivariate inputs, to handle multi-representation univariate signals, facilitating cross-representation interaction. The models are evaluated across 142 benchmark datasets using a unified experimental protocol, revealing that structured multi-representation learning enhances performance, with MSNet excelling in calibration, LiteMV achieving the highest accuracy, and LS-Net providing an optimal efficiency-accuracy trade-off. The findings suggest that scalable multi-representation multi-scale learning is a promising direction for modern TSC.
Methodology
The authors developed two architectures, MSNet and LS-Net, employing a multi-scale convolutional framework that integrates structured multi-representation inputs. They adapted LiteMV for univariate signals and evaluated the models using Monte Carlo re-sampling across various metrics, including accuracy, macro-F1, AUC, NLL, and runtime. Statistical validation was performed using Critical Difference analysis.
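As a sketch of what a structured multi-representation input might look like (the exact representations used in the paper may differ), the snippet below stacks a univariate series with its first derivative and a zero-padded frequency-domain magnitude as channels:

```python
import numpy as np

def multi_representation(x):
    """Stack structured views of a univariate series as channels:
    raw signal, first derivative, and frequency-domain magnitude
    (zero-padded back to the original length)."""
    deriv = np.gradient(x)
    spec = np.abs(np.fft.rfft(x))
    spec = np.pad(spec, (0, len(x) - len(spec)))
    return np.stack([x, deriv, spec])   # shape: (3, len(x))

t = np.linspace(0, 1, 128, endpoint=False)
views = multi_representation(np.sin(2 * np.pi * 4 * t))  # 4 Hz sine
```

A multi-scale convolutional network can then consume the `(channels, length)` array directly, letting early layers mix information across representations.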
Results
The evaluation revealed that structured multi-representation learning consistently outperforms raw input models. MSNet provided the best calibration performance, LiteMV achieved the highest mean accuracy, and LS-Net established an effective efficiency-accuracy Pareto frontier, demonstrating the flexibility of the proposed framework.
Implications
The findings suggest that adopting multi-representation and multi-scale approaches can significantly enhance the performance of time series classification models, making them more robust and efficient. This could lead to better applications in various fields that rely on time series data, such as finance, healthcare, and IoT.
SHAPCA: Consistent and Interpretable Explanations for Machine Learning Models on Spectroscopy Data
Interpretability
- SHAPCA combines PCA and SHAP to enhance interpretability of machine learning models on spectroscopy data.
- The method provides explanations in the original input space, facilitating better understanding for practitioners.
- Numerical analysis shows improved consistency of feature importance across repeated model training.
- The framework allows for both global and local analysis of model predictions.
Summary
The paper introduces SHAPCA, an explainable machine learning pipeline designed to provide consistent and interpretable explanations for models applied to high-dimensional spectroscopy data. The authors highlight the challenges of using machine learning in spectroscopy, particularly the high dimensionality and collinearity of the data, which complicate model training and the interpretability of predictions. Traditional feature extraction methods, such as Principal Component Analysis (PCA), often obscure the connection between model predictions and the original spectral data. SHAPCA addresses these issues by combining PCA for dimensionality reduction with Shapley Additive Explanations (SHAP) for post hoc interpretability. This approach allows for the analysis of model behavior from both global and local perspectives, revealing important spectral bands that influence predictions. The framework ensures that explanations are provided in the original input space, making them more interpretable for practitioners. The authors demonstrate the effectiveness of SHAPCA through numerical analysis, showing improved consistency and interpretability of results across different training runs.
Methodology
The SHAPCA framework integrates Principal Component Analysis (PCA) for dimensionality reduction with Shapley Additive Explanations (SHAP) for generating post hoc explanations. It captures the correlation structure of spectral data by grouping highly correlated wavelengths into a low-dimensional latent representation, which is then used for classification. SHAP values are computed at the component level and back-projected to the original input space to provide interpretable explanations.
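The back-projection step can be sketched as a linear map through the PCA loadings. Assuming SHAP values have already been computed at the component level (the attribution values below are hypothetical), component attributions are carried back to the wavelength axis by multiplying with the component matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))        # spectra: 100 samples, 300 wavelengths
X -= X.mean(axis=0)                    # center before PCA

# PCA via SVD: each row of Vt is a principal direction over wavelengths
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
components = Vt[:k]                    # shape: (k, n_wavelengths)
Z = X @ components.T                   # low-dimensional scores fed to the classifier

# Hypothetical SHAP attributions for one sample, one value per component
shap_components = np.array([0.4, -0.1, 0.05, 0.0, 0.02])

# Back-project component-level attributions onto the original input space
shap_wavelengths = shap_components @ components   # one value per wavelength
```

This linear back-projection is one natural choice for an orthogonal transform like PCA; the paper's exact mapping may include additional normalization.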
Results
The application of SHAPCA demonstrated that the explanations provided are more stable and consistent across different training runs compared to traditional methods. The results highlighted specific spectral bands that significantly influence model predictions, thus enhancing the interpretability of the machine learning models applied to spectroscopy data.
Implications
The SHAPCA framework has significant implications for the deployment of machine learning models in clinical and safety-critical applications, where understanding model predictions is essential. By providing interpretable explanations, SHAPCA can help build trust in AI systems used for chemical and biomedical analysis, facilitating their integration into decision-making processes.
From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models
Robotics
Multimodal
Efficient ML
- Conventional efficiency metrics for VLA models do not capture real-world performance on robotic platforms.
- Embodied efficiency metrics provide a more accurate assessment of robotic execution behaviors.
- Reducing computational costs can lead to increased end-to-end execution time and degraded motion quality.
- Common adaptation techniques show limited improvements in embodied efficiency and may involve trade-offs.
Summary
This paper critiques the existing efficiency metrics used in Vision-Language-Action (VLA) models, which primarily focus on inference efficiency through parameters, FLOPs, and token throughput. The authors argue that these metrics do not accurately reflect the performance of robotic platforms in real-world applications. Instead, they propose that embodied efficiency—measured through task completion time, trajectory smoothness, cumulative joint rotation, and motion energy—should be prioritized. Through controlled experiments involving model compression, token sparsification, and action sequence compression, the authors reveal that traditional efficiency improvements can lead to increased execution costs and degraded motion quality. They find that common adaptation methods yield only marginal improvements in embodied efficiency and often come with trade-offs in other performance metrics. The study emphasizes the need for a more comprehensive evaluation framework that incorporates embodied efficiency to better assess and compare VLA models in practical scenarios.
Methodology
The authors conducted controlled studies examining various efficiency-improving techniques, including model compression (weight pruning and quantization), token sparsification, and action sequence compression. They evaluated these techniques using multiple embodied efficiency metrics to analyze their impact on robotic performance.
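A minimal sketch of such embodied metrics computed from a joint-angle trajectory; the definitions below (jerk-based smoothness, squared-velocity energy) are common robotics choices and may not match the paper's exact formulas:

```python
import numpy as np

def embodied_metrics(joints, dt):
    """Embodied-efficiency metrics from a joint-angle trajectory.

    joints: (T, n_joints) angles in radians, sampled every dt seconds.
    """
    vel = np.diff(joints, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    return {
        "completion_time": (len(joints) - 1) * dt,
        "cumulative_rotation": float(np.abs(np.diff(joints, axis=0)).sum()),
        "smoothness": float(np.mean(jerk ** 2)),      # lower = smoother
        "motion_energy": float(np.sum(vel ** 2) * dt),
    }

traj = np.linspace(0.0, 1.0, 51)[:, None]   # one joint sweeping 1 rad in 5 s
m = embodied_metrics(traj, dt=0.1)
```

Note that none of these quantities appear in parameter counts or FLOPs, which is precisely the gap between inference efficiency and embodied efficiency the paper highlights.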
Results
The findings indicate that methods aimed at reducing inference costs do not necessarily enhance embodied efficiency. For instance, models with reduced parameters may still incur higher overall system energy costs and longer task completion times. The study highlights significant discrepancies between traditional efficiency metrics and actual embodied performance.
Implications
The results suggest that future research and development in VLA models should focus on embodied efficiency to ensure practical applicability in real-world robotic tasks. This shift could lead to improved design and evaluation frameworks that better align with the operational demands of embodied agents.
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
NLP
Large Language Models
Interpretability
- Introduces per-layer supervision to enhance modularity in transformer models.
- Demonstrates that per-layer supervision leads to significantly larger ablation effects compared to standard training.
- Establishes a methodology for capturing computational dynamics independent of vocabulary structure.
- Validates the approach through causal experiments showing functional reorganization in attention heads.
Summary
This paper addresses the challenge of interpretability in transformer models, in particular the difficulty of exerting surgical control: owing to the 'Hydra effect,' ablating critical components often produces minimal behavioral change because redundant circuits compensate. The author proposes a novel approach that combines dual-stream processing, per-layer supervision, and gated attention to expose hidden modularity in transformers. By implementing per-layer supervision, the study shows that models exhibit significantly larger ablation effects, enabling greater control over targeted behaviors. The results indicate that models trained with per-layer supervision have a wider variance in ablation effects, revealing which predictions depend on specific circuits. The methodology includes engineered features that capture computational dynamics rather than vocabulary structure, and causal experiments demonstrate functional reorganization in the model's attention heads. This work establishes a framework for transforming interpretability from passive observation to active control, suggesting that modularity can be engineered through architectural constraints and training objectives.
Methodology
The methodology involves three main components: dual-stream processing to separate token and contextual representations, per-layer supervision to provide independent gradient signals at each layer, and gated attention to regularize activation patterns. This architecture is compared against a control model that lacks per-layer supervision.
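The core idea of giving every layer its own supervision signal can be sketched with a toy feedforward network. This omits the paper's dual-stream processing and gated attention; the architecture, sizes, and random data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Tiny 3-layer network; each layer gets its own readout head, so every
# layer receives an independent gradient signal against the target
# instead of only the final layer being supervised.
d, n_classes, n_layers = 8, 4, 3
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers)]
heads = [rng.normal(size=(d, n_classes)) * 0.1 for _ in range(n_layers)]

def forward_with_per_layer_loss(x, y):
    h, total = x, 0.0
    for W, head in zip(Ws, heads):
        h = np.tanh(h @ W)
        probs = softmax(h @ head)
        total += -np.log(probs[np.arange(len(y)), y]).mean()  # per-layer CE
    return total / n_layers

loss = forward_with_per_layer_loss(rng.normal(size=(16, d)),
                                   rng.integers(0, n_classes, 16))
```

Because each layer is pushed toward a task-relevant representation on its own, ablating a layer or head can no longer be silently compensated downstream, which is the mechanism behind the larger ablation effects reported.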
Results
Models trained with per-layer supervision exhibited ablation effects that were 5 to 23 times larger than those of control models, with a standard deviation of 6.32% compared to 0.63% in controls. This indicates a significant increase in control leverage over targeted behaviors, such as capitalization, where changes in attention head scaling produced predictable output variations.
Implications
The findings suggest that transformer models can be designed to have verifiable modularity, allowing for more interpretable and controllable AI systems. This could have applications in various domains where understanding model behavior is critical, such as natural language processing and decision-making systems.
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
NLP
Large Language Models
Reinforcement Learning
- FIPO enhances reasoning in LLMs by addressing limitations of uniform reward systems.
- The algorithm incorporates future-KL divergence for more granular credit assignment.
- FIPO significantly increases reasoning chain lengths and accuracy on benchmarks.
- The approach outperforms existing models, demonstrating its effectiveness.
Summary
The paper introduces Future-KL Influenced Policy Optimization (FIPO), a novel reinforcement learning algorithm aimed at enhancing reasoning capabilities in large language models (LLMs). Traditional methods like Group Relative Policy Optimization (GRPO) utilize outcome-based rewards that apply a uniform advantage across all tokens in a trajectory, which can hinder performance by not distinguishing between critical reasoning steps and trivial tokens. FIPO addresses this limitation by incorporating discounted future-KL divergence into the policy update, allowing for a more nuanced advantage formulation that re-weights tokens based on their impact on future behavior. The authors demonstrate that FIPO significantly extends the average length of reasoning chains from approximately 4,000 to over 10,000 tokens and improves accuracy on the AIME 2024 benchmark from 50.0% to a peak of 58.0%. This performance surpasses existing models such as DeepSeek-R1-Zero-Math-32B and o1-mini. The findings suggest that dense advantage formulations are crucial for unlocking the reasoning potential of LLMs, and the authors have open-sourced their training system to facilitate further research in this area.
Methodology
FIPO modifies the policy update process by integrating discounted future-KL divergence, which allows for a more detailed advantage calculation that considers the influence of tokens on future trajectory behavior. This is coupled with mechanisms for stability, such as influence weight clipping and filtering.
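A speculative sketch of discounted future-KL reweighting; the exact advantage formula, discount factor, and clipping scheme below are assumptions, not the paper's equations. The idea illustrated is that each token's shared trajectory-level advantage is scaled by the discounted KL it precedes, so tokens that steer later behavior receive more credit than trivial ones.

```python
import numpy as np

def fipo_like_token_advantages(outcome_adv, token_kl, gamma=0.9, clip=5.0):
    """Reweight a trajectory-level advantage per token.

    outcome_adv: scalar advantage shared by the whole trajectory (as in GRPO).
    token_kl: per-token KL between updated and reference policy.
    Each token's weight grows with the discounted sum of *future* KL;
    clipping keeps the update stable.
    """
    T = len(token_kl)
    future = np.zeros(T)
    acc = 0.0
    for t in range(T - 1, -1, -1):      # backward pass: discounted future KL
        future[t] = acc
        acc = token_kl[t] + gamma * acc
    weights = np.clip(1.0 + future, 0.0, clip)
    return outcome_adv * weights

# The token at position 0 precedes a large behavioral shift (KL spike at
# position 1), so it receives extra credit; later tokens keep weight 1.
adv = fipo_like_token_advantages(outcome_adv=1.0,
                                 token_kl=np.array([0.0, 2.0, 0.0, 0.0]))
```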
Results
FIPO achieved an average chain-of-thought length increase from 4,000 to over 10,000 tokens and improved AIME 2024 Pass@1 accuracy from 50.0% to a peak of 58.0%, outperforming both DeepSeek-R1-Zero-Math-32B (∼47.0%) and o1-mini (∼56.0%).
Implications
The findings indicate that refining reward structures in reinforcement learning can unlock deeper reasoning capabilities in LLMs, potentially leading to advancements in tasks requiring complex reasoning such as competitive mathematics and coding.
Ternary Gamma Semirings: From Neural Implementation to Categorical Foundations
Theory
- Standard neural networks fail the paper's compositional generalization task, achieving 0% accuracy on unseen combinations.
- Introducing Ternary Gamma Semirings allows neural networks to achieve 100% accuracy on novel combinations.
- The learned feature space corresponds to a unique algebraic structure classified in mathematics.
- Neural networks' generalization capabilities stem from their internalization of algebraic axioms.
Summary
This paper presents a theoretical framework that connects neural network learning with abstract algebraic structures, specifically through the introduction of Ternary Gamma Semirings. The author demonstrates that standard neural networks fail to achieve compositional generalization, achieving 0% accuracy on tasks that require inferring unseen combinations based on learned rules. By introducing a logical constraint in the form of the Ternary Gamma Semiring, the same neural architecture can learn a structured feature space, resulting in 100% accuracy on novel combinations. The paper proves that this learned feature space is a finite commutative ternary Γ-semiring, characterized by a ternary operation that implements the majority vote rule. The findings suggest that the success of neural networks in generalization can be attributed to their ability to internalize algebraic axioms, and that logical constraints can guide them toward canonical forms. This work lays the groundwork for a new interdisciplinary field called Computational Γ-Algebra, merging machine learning with abstract algebra and category theory.
Methodology
The study employs a minimal counterexample to demonstrate the failure of standard neural networks on compositional generalization tasks. It introduces the Ternary Gamma Semiring as a logical constraint, reformulating the neural architecture to learn a structured feature space. The paper also includes a categorical perspective on commutative ternary Γ-semirings and their properties.
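The majority-vote ternary operation at the heart of the construction can be written down directly on Booleans; the check below verifies only two simple properties (full symmetry and idempotence), not the complete Γ-semiring axiom set.

```python
from itertools import product

def tern(a, b, c):
    """Ternary majority vote on bits: returns the value appearing
    at least twice among the three arguments."""
    return 1 if a + b + c >= 2 else 0

# Full symmetry: the result is invariant under any swap of arguments.
for a, b, c in product((0, 1), repeat=3):
    assert tern(a, b, c) == tern(b, a, c) == tern(c, b, a)

# Idempotence: applying the operation to three copies returns the input.
for a in (0, 1):
    assert tern(a, a, a) == a
```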
Results
The results indicate that standard neural networks misclassify all test samples in compositional generalization tasks, while the implementation of the Ternary Gamma Semiring achieves perfect accuracy. The learned feature space is shown to satisfy key algebraic properties, confirming its classification as a unique Boolean-type ternary Γ-semiring.
Implications
The findings suggest that enhancing neural networks with algebraic structures can significantly improve their reasoning capabilities and generalization performance. This could lead to more robust AI systems capable of understanding and inferring rules rather than merely memorizing examples. The establishment of Computational Γ-Algebra may open new avenues for research at the intersection of machine learning and mathematics.