AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
24 papers today · updated every 8 hours · 7 days of history
Auction-Based Online Policy Adaptation for Evolving Objectives
Reinforcement Learning
Robotics
Optimization
- Introduces a modular framework for multi-objective reinforcement learning using auction-based policy adaptation.
- Local policies compete through bids reflecting urgency, allowing for dynamic prioritization of objectives.
- Demonstrates superior performance compared to monolithic policies in dynamic environments.
- Enhances interpretability by allowing clear identification of active policies and objectives.
Summary
This paper addresses the challenge of multi-objective reinforcement learning (MORL) in dynamic environments where objectives can appear or disappear at runtime. The authors propose a modular framework that utilizes an auction-based mechanism for policy adaptation. Each objective is managed by a selfish local policy that bids for the right to execute actions based on the urgency of its corresponding state. This auction system allows for a dynamic trade-off among competing objectives, enabling the system to adapt quickly when objectives change. The framework is designed to be modular, allowing for easy addition or removal of policies as objectives evolve. The authors demonstrate that this approach outperforms traditional monolithic policies trained with proximal policy optimization (PPO) in complex environments, such as Atari Assault and a gridworld path-planning task. The modular design not only enhances performance but also improves interpretability, as it allows for clear identification of the active policy at any moment.
Methodology
The authors implemented a compositional reinforcement learning framework where each objective is managed by a local policy. These policies engage in a general-sum game, competing for action execution rights through an auction mechanism. Policies are trained concurrently using proximal policy optimization (PPO), with penalties imposed for dishonest bidding to ensure truthful urgency estimation.
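The bidding loop can be sketched as follows. This is a minimal illustration of the auction mechanism only: the `urgency` function and list-valued state are hypothetical stand-ins for the learned, PPO-trained bids described above.

```python
def urgency(state, objective):
    """Toy urgency signal: distance of this objective's tracked quantity
    from its setpoint (a hypothetical stand-in for a learned bid)."""
    return abs(state[objective])

def auction_step(state, objectives):
    """One auction round: each local policy bids its urgency and the
    highest bidder wins the right to execute its action this step."""
    bids = {obj: urgency(state, obj) for obj in objectives}
    winner = max(bids, key=bids.get)
    return winner, bids

# Objective 1 is currently far from its setpoint, so its policy outbids
# objective 0 and takes control for this step.
state = [0.2, -0.9]
winner, bids = auction_step(state, objectives=[0, 1])
```

Because objectives enter and leave only through the `objectives` list, adding or removing a policy at runtime requires no retraining of the others, which is the modularity claim above.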
Results
The proposed auction-based framework significantly outperformed monolithic policies in both Atari Assault and a gridworld path-planning task, achieving higher payoffs and demonstrating effective adaptation to changing objectives. The modular approach also facilitated faster adaptation and clearer interpretability of policy actions.
Implications
This work has potential applications in robotics, particularly in environments where tasks and objectives are dynamic and uncertain, such as autonomous navigation and resource allocation. The framework can be adapted to various multi-objective scenarios, enhancing decision-making in real-time systems.
Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
Reinforcement Learning
Theory
Optimization
- Introduces a novel passive Langevin-based algorithm for adaptive inverse reinforcement learning.
- Utilizes Malliavin calculus to efficiently estimate counterfactual gradients conditioned on measure-zero events.
- Achieves optimal convergence rates without requiring trajectory resampling or kernel smoothing.
- Provides a comprehensive algorithmic framework for counterfactual gradient estimation.
Summary
This paper addresses the challenge of adaptive inverse reinforcement learning (IRL), which aims to reconstruct the loss function of a forward learner by passively observing its gradient dynamics during reinforcement learning (RL). The authors propose a novel Langevin-based algorithm that utilizes Malliavin calculus to efficiently estimate counterfactual gradients, which are essential for adaptive IRL but are conditioned on events of probability zero under the forward learner's trajectory. Traditional Monte Carlo methods are inefficient for this purpose, and kernel smoothing techniques suffer from slow convergence. By reformulating the counterfactual conditioning as a ratio of unconditioned expectations involving Malliavin derivatives, the authors achieve standard estimation rates. The paper details the derivation of necessary Malliavin derivatives and their adjoint Skorohod integral formulations, leading to a concrete algorithmic approach for counterfactual gradient estimation. The proposed method overcomes limitations of existing kernel-based Langevin algorithms and demonstrates improved convergence rates without the need for resampling or kernel smoothing. Numerical implementations validate the effectiveness of the proposed algorithm in recovering the forward learner's loss function in real time.
Methodology
The authors employ Malliavin calculus to reformulate counterfactual gradient estimation as a ratio of unconditioned expectations. They derive necessary derivatives and integral formulations to create an efficient algorithm for adaptive IRL, which replaces traditional kernel-based methods.
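The kernel-smoothing baseline that the Malliavin approach replaces can itself be written as a ratio of two unconditioned Monte Carlo expectations; a minimal sketch of that baseline is below (the paper swaps the Gaussian kernel weight for a Malliavin-derivative weight, removing the bandwidth bottleneck that slows this estimator down).

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel_conditional_mean(g, samples, x0, h):
    """Kernel-smoothed estimate of E[g(X) | X = x0], written as a ratio
    of two unconditioned Monte Carlo expectations; accuracy degrades as
    the bandwidth h shrinks, which is the bottleneck the paper removes."""
    w = np.exp(-0.5 * ((samples - x0) / h) ** 2)  # Gaussian kernel weights
    return float(np.sum(w * g(samples)) / np.sum(w))

# X ~ N(0, 1); conditioning on the measure-zero event {X = 1} gives
# E[X^2 | X = 1] = 1 exactly.
samples = rng.normal(size=200_000)
est = kernel_conditional_mean(lambda x: x ** 2, samples, x0=1.0, h=0.05)
```

Only a vanishing fraction of the 200,000 draws lands near x0, which is why kernel smoothing converges slowly and why reweighting by Malliavin derivatives, which uses every sample, recovers the standard Monte Carlo rate.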
Results
The proposed Malliavin-based gradient estimator yields unbiased Monte Carlo estimators for counterfactual conditional expectations, achieving optimal convergence rates. Numerical experiments demonstrate effective recovery of the forward learner's loss function.
Implications
This work has significant implications for real-time adaptive IRL applications, particularly in scenarios where observing the complete trajectory of the forward learner is impractical. The methodology could enhance the efficiency and accuracy of learning algorithms in various domains, including robotics and automated decision-making systems.
Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
Theory
Optimization
- Introduces a diffusion-based framework for uncertainty quantification in industrial models.
- Eliminates the need for post-hoc calibration by providing intrinsically calibrated predictive uncertainty.
- Demonstrates significant improvements in uncertainty calibration and predictive accuracy over existing methods.
- Evaluated on synthetic datasets and real-world industrial case studies.
Summary
This paper addresses the critical challenge of uncertainty quantification (UQ) in industrial data-driven models, which are essential for real-time monitoring of performance indicators that are difficult to measure directly. The authors propose a novel diffusion-based posterior sampling framework that inherently generates well-calibrated predictive uncertainty, thus eliminating the need for post-hoc calibration. The method is evaluated extensively on synthetic distributions, a Raman-based phenylacetic acid soft sensor benchmark, and a real ammonia synthesis case study. The results demonstrate significant improvements in both uncertainty calibration and predictive accuracy compared to existing UQ techniques. This work highlights the potential of diffusion samplers as a principled and scalable approach for enhancing uncertainty-aware modeling in industrial applications, ultimately fostering greater trust and reliability in data-driven decision-making processes.
Methodology
The authors developed a diffusion-based posterior sampling framework that utilizes Bayesian inference principles to produce calibrated predictive distributions. This approach focuses on faithful posterior sampling to accurately represent uncertainty without requiring additional calibration steps.
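Calibration of any posterior sampler can be checked by comparing nominal and empirical interval coverage; a minimal sketch follows, using a toy sampler that is calibrated by construction rather than the paper's diffusion model.

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_coverage(posterior_samples, y_true, level):
    """Fraction of test points whose true value falls inside the central
    `level` predictive interval built from posterior samples. For a
    well-calibrated sampler this should match `level` itself."""
    alpha = (1.0 - level) / 2.0
    lo = np.quantile(posterior_samples, alpha, axis=1)
    hi = np.quantile(posterior_samples, 1.0 - alpha, axis=1)
    return float(np.mean((y_true >= lo) & (y_true <= hi)))

# Toy sampler that is calibrated by construction: posterior draws come
# from the same N(0, 1) distribution that generated the observations.
n_points, n_draws = 2_000, 500
samples = rng.normal(size=(n_points, n_draws))
y_true = rng.normal(size=n_points)
cov90 = empirical_coverage(samples, y_true, level=0.90)
```

An intrinsically calibrated method is one whose raw samples pass this check across all levels, with no post-hoc recalibration step in between.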
Results
The proposed method achieved practical improvements in uncertainty calibration and predictive accuracy across various evaluations, including synthetic distributions and real-world industrial applications. The results indicate that the diffusion sampler effectively captures the true posterior distribution, leading to more reliable uncertainty estimates.
Implications
The findings suggest that the diffusion-based UQ framework can enhance the deployment of data-driven models in safety-critical industrial settings, enabling better decision-making and risk management. This approach may lead to broader acceptance and trust in data-driven technologies within process industries.
Feature Weighting Improves Pool-Based Sequential Active Learning for Regression
Optimization
Theory
Efficient ML
- Introduces feature weighting in distance computation for active learning in regression.
- Proposes five new active learning approaches that incorporate feature weighting.
- Demonstrates improved performance of feature-weighted methods over traditional unweighted methods.
- Extends the applicability of feature weighting to both single-task and multi-task regression problems.
Summary
This paper addresses the challenge of pool-based sequential active learning for regression (ALR), which aims to select a small number of unlabeled samples to label in order to build a more accurate regression model within a limited labeling budget. The author identifies that existing ALR methods fail to account for the varying importance of different features when calculating distances between samples, leading to sub-optimal sample selection. To remedy this, the paper proposes three feature-weighted single-task ALR approaches (FW-RD, FW-GSx, FW-iGS) and two multi-task approaches (FW-MT-GSx, FW-MT-iGS) that utilize ridge regression coefficients from previously labeled samples to weight features in distance computations. Extensive experiments demonstrate that these feature-weighted approaches consistently outperform their unweighted counterparts across both single-task and multi-task regression scenarios, indicating that feature weighting can enhance the performance of various regression models.
Methodology
The paper develops feature-weighted versions of existing active learning approaches by integrating ridge regression coefficients to adjust the importance of features in distance calculations. The proposed methods include FW-RD, FW-GSx, FW-iGS for single-task learning, and FW-MT-GSx, FW-MT-iGS for multi-task learning. The performance of these methods is evaluated through extensive experiments comparing them against their unweighted versions.
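The core idea, ridge coefficients as feature weights inside a greedy GSx-style distance criterion, can be sketched as below; the exact FW-GSx/FW-iGS update rules in the paper differ in detail, and the toy data here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def ridge_coefs(X, y, lam=1.0):
    """Closed-form ridge coefficients fitted on the labeled samples."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def fw_greedy_select(pool, labeled, w):
    """GSx-style diversity selection with feature-weighted distances:
    pick the pool sample farthest from its nearest labeled sample,
    where each feature is scaled by |ridge coefficient|."""
    scale = np.abs(w)
    dists = np.linalg.norm((pool[:, None, :] - labeled[None, :, :]) * scale,
                           axis=2)          # (n_pool, n_labeled)
    return int(np.argmax(dists.min(axis=1)))

# Toy problem: only feature 0 drives the target.
X_lab = rng.normal(size=(20, 2))
y_lab = 3.0 * X_lab[:, 0] + 0.1 * rng.normal(size=20)
w = ridge_coefs(X_lab, y_lab)
pool = np.array([[5.0, 0.0],    # far along the informative feature
                 [0.0, 5.0]])   # far along the irrelevant feature
pick = fw_greedy_select(pool, X_lab, w)
```

An unweighted criterion would treat the two pool points as equally novel; the ridge weights make the selector favor diversity along the feature that actually matters for the regression target.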
Results
The experimental results show that all five proposed feature-weighted ALR approaches significantly outperform their corresponding unweighted versions. This improvement is consistent across both linear and nonlinear regression models, indicating the robustness and effectiveness of the feature weighting strategy.
Implications
The findings suggest that incorporating feature weighting can lead to more efficient sample selection in active learning scenarios, potentially reducing labeling costs and improving model accuracy. The proposed methods can be easily adapted for use in other domains such as stream-based active learning and classification tasks.
Neural network methods for two-dimensional finite-source reflector design
Optimization
- Introduces a neural network parameterization for reflector design that addresses finite-source light distribution.
- Develops two differentiable objective functions for optimizing reflector height.
- Demonstrates superior performance of the neural network approach over traditional deconvolution methods.
- Provides a comprehensive evaluation across multiple benchmarks, including height constraints.
Summary
This paper addresses the inverse design problem of creating two-dimensional reflectors that can transform light from a finite, extended source into a desired far-field distribution. The authors propose a novel approach using neural network parameterization to model the reflector height, coupled with two differentiable objective functions. The first function is a direct change-of-variables loss that facilitates the mapping of the source distribution through the learned inverse function. The second is a mesh-based loss that allows for continuous mapping back to the source, even in cases of discontinuous sources. The gradients for optimization are computed using automatic differentiation and a robust quasi-Newton method. The authors also establish a baseline comparison with a deconvolution method based on a simplified finite-source approximation. Through four benchmark tests, including scenarios with continuous and discontinuous sources, the neural network approach demonstrates faster convergence and lower normalized mean absolute error (NMAE) compared to the deconvolution method, while naturally accommodating height constraints. The paper concludes with a discussion on extending the method to three-dimensional designs using iterative correction schemes.
Methodology
The authors utilize a neural network to parameterize the reflector height and develop two differentiable objective functions: a direct change-of-variables loss and a mesh-based loss. They employ automatic differentiation for gradient computation and optimize using a quasi-Newton method. A baseline deconvolution method is also formulated for comparison, based on a simplified finite-source approximation.
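A one-dimensional analogue of the change-of-variables loss looks roughly like the sketch below, with a monotone discretized map standing in for the paper's neural network parameterization and a quasi-Newton optimizer (L-BFGS-B) in place of their setup; the uniform source on [0, 1] and the target density are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Source: uniform light on [0, 1]. Target far-field density on [0, 1],
# chosen so that it integrates to 1:
q = lambda u: 1.0 + 0.5 * np.cos(2.0 * np.pi * u)

x = np.linspace(0.0, 1.0, 201)

def transport_map(params):
    """Monotone map T: [0, 1] -> [0, 1], built from unconstrained
    parameters via positive (softplus) increments."""
    inc = np.log1p(np.exp(params))
    T = np.concatenate([[0.0], np.cumsum(inc)])
    return T / T[-1]

def cov_loss(params):
    """Change-of-variables residual: a correct map for a uniform source
    satisfies q(T(x)) * T'(x) = 1 everywhere on [0, 1]."""
    T = transport_map(params)
    return np.mean((q(T) * np.gradient(T, x) - 1.0) ** 2)

p0 = np.zeros(len(x) - 1)                    # initial map is the identity
res = minimize(cov_loss, p0, method="L-BFGS-B", options={"maxiter": 100})
initial_loss, final_loss = cov_loss(p0), cov_loss(res.x)
```

The softplus increments enforce monotonicity by construction, which is the same role height constraints play in the reflector setting; in the paper the map is implied by the learned reflector height and gradients come from automatic differentiation rather than finite differences.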
Results
The neural network approach converges more rapidly and achieves consistently lower NMAE across all benchmarks compared to the deconvolution method. It effectively handles height constraints and demonstrates robustness in both continuous and discontinuous source scenarios.
Implications
The proposed method has significant implications for optical design, particularly in applications requiring precise control of light propagation, such as advanced illumination systems, solar concentrators, and optical communications. The ability to extend the method to three-dimensional designs opens up further possibilities in complex beam shaping and freeform optics.
MATA-Former & SIICU: Semantic Aware Temporal Alignment for High-Fidelity ICU Risk Prediction
Time Series
Multimodal
- Introduction of MATA-Former, a transformer architecture that aligns clinical semantics with temporal dynamics.
- Development of Plateau-Gaussian Soft Labeling (PSL) for continuous risk modeling instead of binary classification.
- Creation of the SIICU dataset with over 506,000 expert-annotated clinical events to enhance evaluation of ICU risk prediction models.
- Demonstration of superior performance in risk prediction from text-intensive, irregular clinical time series.
Summary
This paper addresses the challenge of predicting clinical risks in Intensive Care Units (ICUs) by proposing a novel framework called the Medical-semantics Aware Time-ALiBi Transformer (MATA-Former). The authors argue that traditional methods fail to capture the complex relationships between clinical events due to their reliance on chronological proximity rather than intrinsic pathological dependencies. MATA-Former utilizes event semantics to dynamically adjust attention weights, allowing the model to prioritize causal relevance over mere time lags. Additionally, the authors introduce Plateau-Gaussian Soft Labeling (PSL), which reformulates binary classification into a continuous multi-horizon regression framework, enabling a more nuanced understanding of risk evolution over time. The framework is evaluated on a newly constructed dataset, the Semantic-Integrated Intensive Care Unit (SIICU), which includes over 506,000 expert-annotated clinical events. The results demonstrate that MATA-Former outperforms existing methods in capturing risks from both structured and unstructured clinical data, showcasing robust generalization capabilities across different datasets.
Methodology
The authors propose MATA-Former, which integrates unified clinical embeddings with a semantic-guided temporal attention mechanism to dynamically generate query-specific focus windows. This allows the model to prioritize historical events based on their pathological relevance rather than their physical proximity. PSL is introduced to transform binary classification into a continuous regression framework, enabling the capture of dynamic risk trajectories throughout the ICU stay.
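One plausible form of Plateau-Gaussian soft labels — risk pinned to 1 on a plateau just before the event and decaying as a Gaussian further back — is sketched below; the `plateau`/`sigma` parameters and the exact functional form are assumptions, not the paper's specification.

```python
import numpy as np

def psl_labels(times, event_time, plateau=4.0, sigma=6.0):
    """Illustrative Plateau-Gaussian soft labels: risk equals 1 on a
    plateau of length `plateau` just before the event, decays as a
    Gaussian of width `sigma` further back, and is zero afterwards."""
    gap = np.maximum(event_time - times - plateau, 0.0)
    return np.exp(-0.5 * (gap / sigma) ** 2) * (times <= event_time)

times = np.arange(0, 25, 1.0)                 # hours into the ICU stay
labels = psl_labels(times, event_time=20.0)   # adverse event at hour 20
```

Regressing against such a curve gives the model a graded target for every timestep, rather than a single binary flag at the event, which is what lets it represent how risk builds over the stay.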
Results
The evaluation of MATA-Former on the SIICU dataset and the MIMIC-IV dataset shows that it significantly outperforms existing methods in terms of predictive accuracy and generalization. The framework effectively captures the complexities of clinical risk evolution, demonstrating its capability to utilize both structured and unstructured data.
Implications
The proposed framework has the potential to improve Clinical Decision Support Systems (CDSS) in ICUs by providing more accurate risk predictions, ultimately leading to better patient outcomes. The SIICU dataset can serve as a valuable resource for future research in clinical risk modeling.
Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
Generative Models
Graph Learning
Efficient ML
- Introduction of the Geometry Enhancement Module (GEM) for direct geometric biasing in Transformers.
- Replacement of one-hot atom representations with a compact chemically informed tokenization.
- Crystalite achieves state-of-the-art results in crystal structure prediction and generation.
- Significantly faster sampling compared to traditional geometry-heavy models.
Summary
The paper introduces Crystalite, a lightweight diffusion Transformer designed for efficient modeling of crystalline materials. Traditional generative models for crystals often utilize equivariant graph neural networks (GNNs), which, while effective, are computationally expensive and slow. Crystalite addresses these challenges by incorporating two novel components: Subatomic Tokenization, which replaces high-dimensional one-hot atom representations with a more compact and chemically structured format, and the Geometry Enhancement Module (GEM), which integrates periodic geometric information directly into the attention mechanism of the Transformer. This approach maintains the simplicity and efficiency of standard Transformers while enhancing their capability to model crystal structures. The authors demonstrate that Crystalite achieves state-of-the-art performance on crystal structure prediction benchmarks and excels in de novo generation tasks, outperforming existing geometry-heavy alternatives in terms of sampling speed.
Methodology
Crystalite employs a lightweight diffusion Transformer architecture that integrates the GEM to inject periodic geometric information into the attention mechanism. The model uses Subatomic Tokenization for atom representation, enhancing the efficiency of the diffusion process. The architecture preserves the standard multi-head attention framework while incorporating additive geometric biases to improve performance on crystal modeling tasks.
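The additive-bias idea can be sketched as standard attention whose logits are shifted by a term built from periodic pairwise distances; the minimum-image convention on a diagonal lattice and the simple `-gamma * dist` bias below are illustrative assumptions, since GEM's exact form is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)

def periodic_distances(frac_coords, lattice):
    """Pairwise atom distances under periodic boundary conditions via
    the minimum-image convention (exact for a diagonal lattice)."""
    diff = frac_coords[:, None, :] - frac_coords[None, :, :]
    diff -= np.round(diff)                     # wrap into [-0.5, 0.5)
    return np.linalg.norm(diff @ lattice, axis=-1)

def biased_attention(Q, K, V, dist, gamma=1.0):
    """Scaled dot-product attention with an additive geometric bias:
    atoms at small periodic distance receive larger logits."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1]) - gamma * dist
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return (w / w.sum(axis=-1, keepdims=True)) @ V

n_atoms, d = 4, 8
frac = rng.random((n_atoms, 3))                # fractional coordinates
lattice = np.diag([5.0, 5.0, 5.0])             # cubic cell
dist = periodic_distances(frac, lattice)
out = biased_attention(rng.normal(size=(n_atoms, d)),
                       rng.normal(size=(n_atoms, d)),
                       rng.normal(size=(n_atoms, d)), dist)
```

Because the bias is additive inside otherwise standard multi-head attention, the architecture keeps the speed of a plain Transformer instead of paying for equivariant message passing.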
Results
Crystalite demonstrates superior performance on crystal structure prediction benchmarks, achieving the best S.U.N. discovery score among evaluated models. It also shows enhanced de novo generation capabilities while significantly reducing sampling time compared to more complex, geometry-heavy alternatives.
Implications
The development of Crystalite has significant implications for materials science, particularly in the discovery and design of novel crystalline materials with desired properties. Its efficiency and performance could facilitate faster exploration of the vast compositional space in materials research, potentially accelerating advancements in various applications such as electronics, photonics, and catalysis.
Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
NLP
Large Language Models
Reinforcement Learning
Efficient ML
- Introduction of Batched Contextual Reinforcement (BCR) for efficient reasoning in LLMs.
- Discovery of a task-scaling law where increasing concurrent problems reduces token usage while maintaining accuracy.
- Demonstration of a 'free lunch' phenomenon where accuracy improves despite reduced verbosity.
- Emergence of self-regulated efficiency in models, eliminating redundant reasoning loops.
Summary
This paper introduces Batched Contextual Reinforcement (BCR), a novel training paradigm aimed at enhancing the efficiency of reasoning in Large Language Models (LLMs) while maintaining or improving accuracy. Traditional methods for improving efficiency often lead to degraded reasoning quality or require complex training processes. BCR simplifies this by allowing models to solve multiple problems simultaneously within a shared context window, rewarding them based solely on per-instance accuracy. The authors identify a new task-scaling law, showing that as the number of concurrent problems increases, per-problem token usage decreases while accuracy remains relatively stable. This challenges the conventional accuracy-efficiency trade-off, revealing a 'free lunch' phenomenon where models can achieve better accuracy with reduced verbosity. The study demonstrates that BCR can reduce token usage by 15.8% to 62.6% across different model sizes while improving performance on major mathematical benchmarks. Furthermore, qualitative analyses indicate that models trained with BCR develop self-regulated efficiency, autonomously eliminating redundant reasoning processes. The findings suggest that BCR provides a stable, constraint-based alternative for length control in LLMs, unlocking latent high-density reasoning capabilities without explicit supervision.
Methodology
The authors propose BCR, which involves training models to solve N problems simultaneously within a shared context window, rewarded by per-instance accuracy. This method creates an implicit token budget that encourages efficient reasoning without the need for explicit length penalties or complex training structures.
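The reward computation reduces to parsing N answers out of one shared completion and scoring each problem independently; a sketch follows, with a hypothetical `Answer i:` output format standing in for whatever format the trained model actually uses.

```python
def bcr_reward(completion, answers, tag="Answer {i}:"):
    """Per-instance rewards for a batched prompt: the model answers N
    problems in one completion; each is scored independently on
    exact-match accuracy (the answer format here is illustrative)."""
    rewards = []
    for i, gold in enumerate(answers, start=1):
        marker = tag.format(i=i)
        pred = None
        for line in completion.splitlines():
            if line.startswith(marker):
                pred = line[len(marker):].strip()
        rewards.append(1.0 if pred == gold else 0.0)
    return rewards

completion = "Answer 1: 42\nAnswer 2: 17\nAnswer 3: 9"
rewards = bcr_reward(completion, answers=["42", "18", "9"])
```

Since all N solutions must fit in one context window, the shared budget itself penalizes verbosity, with no explicit length term in the reward.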
Results
BCR achieves a reduction in token usage by 15.8% to 62.6% across model sizes (1.5B and 4B) while consistently maintaining or improving accuracy on five major mathematical benchmarks. The method reveals a task-scaling law that allows for controllable throughput and accuracy trade-offs.
Implications
The findings suggest that BCR can significantly enhance the efficiency of reasoning in LLMs, making it a valuable framework for practical applications in areas requiring complex reasoning, such as mathematical problem-solving and other cognitive tasks. This could lead to more efficient deployment of LLMs in real-world applications, reducing computational costs while improving performance.
Robust Graph Representation Learning via Adaptive Spectral Contrast
Graph Learning
Theory
- Identifies a spectral dilemma in graph contrastive learning regarding the trade-off between high-frequency signal utility and noise sensitivity.
- Introduces ASPECT, a framework that utilizes a reliability-aware spectral gating mechanism to improve robustness in graph representation learning.
- Demonstrates that existing global spectral fusion strategies are suboptimal for mixed graphs with varying node-wise frequency preferences.
- Achieves state-of-the-art performance on 8 out of 9 benchmarks, particularly on heterophilic graphs.
Summary
This paper addresses the challenges of spectral graph contrastive learning, particularly the vulnerability of high-frequency signals to noise, which is critical for encoding heterophilic structures. The authors identify a spectral dilemma where high-frequency components, while essential for capturing heterophily, exhibit higher variance under perturbations. They propose ASPECT, a novel framework that employs a reliability-aware spectral gating mechanism to dynamically adjust the reliance on frequency channels based on their stability against adversarial perturbations. This approach is formulated as a minimax game, optimizing a node-wise gate against a spectral adversary targeting energy distributions. Empirical evaluations demonstrate that ASPECT achieves state-of-the-art performance on 8 out of 9 benchmarks, effectively distinguishing meaningful structural heterophily from incidental noise, thereby enhancing robustness in graph representation learning.
Methodology
The authors develop ASPECT, which formulates a minimax game to optimize a node-wise gate that adjusts the reliance on frequency channels based on their stability against perturbations. This is achieved through a Rayleigh quotient penalty targeting spectral energy distributions, allowing the encoder to learn robust representations while filtering out unreliable high-frequency noise.
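The two ingredients — a Rayleigh-quotient measure of spectral smoothness and a node-wise gate mixing frequency channels — can be sketched on a toy path graph; the fixed gate value below is illustrative, whereas ASPECT learns it adversarially per node.

```python
import numpy as np

def rayleigh_quotient(L, x):
    """Spectral smoothness of signal x on a graph with Laplacian L:
    small for low-frequency (smooth) signals, large for high-frequency."""
    return float(x @ L @ x) / float(x @ x)

def gated_mix(h_low, h_high, gate):
    """Node-wise convex combination of low- and high-frequency channels;
    the gate leans on whichever channel is more reliable at each node."""
    g = gate[:, None]
    return g * h_low + (1.0 - g) * h_high

# Path graph on 4 nodes.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
              [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
smooth = np.ones(4)                        # constant signal: frequency 0
rough = np.array([1.0, -1.0, 1.0, -1.0])   # alternating: high frequency
rq_smooth = rayleigh_quotient(L, smooth)
rq_rough = rayleigh_quotient(L, rough)
mixed = gated_mix(np.ones((4, 2)), -np.ones((4, 2)), gate=np.full(4, 0.75))
```

The spectral adversary in the paper perturbs exactly this kind of energy distribution, and the gate learns to down-weight the high-frequency channel wherever its Rayleigh quotient is unstable under perturbation.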
Results
ASPECT outperforms existing methods on 8 out of 9 benchmarks, particularly excelling in scenarios involving heterophilic graphs. The analysis of the learned gate values indicates a strong correlation with local homophily, confirming the framework's effectiveness in disentangling structural signals from noise.
Implications
The findings suggest that enhancing robustness in spectral graph learning is crucial for developing models that generalize well under mixed structural conditions. This work could inform future research in graph representation learning, particularly in applications involving complex graph structures with varying node characteristics.
Coupled Query-Key Dynamics for Attention
NLP
Large Language Models
Efficient ML
- Introduces Coupled QK Dynamics, enhancing attention mechanisms by evolving queries and keys jointly.
- Achieves significant improvements in language modeling perplexity with minimal additional parameters.
- Structural ablation studies confirm that coupling is the key factor for performance gains.
- Effectiveness varies by corpus, with benefits observed in domain-coherent texts but not in heterogeneous datasets.
Summary
This paper introduces a novel framework for attention mechanisms in neural networks, termed Coupled Query-Key (QK) Dynamics. Unlike standard attention, which computes scores from static and independent projections of the input, the proposed method evolves queries and keys jointly through shared learned dynamics prior to scoring. This coupling enhances language modeling performance and training stability, as evidenced by significant reductions in perplexity on the WikiText-103 dataset. The authors demonstrate that coupled dynamics achieves a perplexity of 22.55–22.62 at 60M parameters, outperforming standard attention's 24.22 with only a marginal increase in parameters. Through structural ablation studies, they isolate the benefits of coupling from other factors, revealing that the coupling itself, rather than the specific integrator used (Hamiltonian or Euler), is crucial for performance improvements. The paper also characterizes the conditions under which coupling is beneficial, noting its effectiveness on domain-coherent text while showing degradation on heterogeneous datasets. The findings suggest that coupled dynamics can serve as a sample-efficiency mechanism, requiring fewer tokens for similar performance compared to standard attention when trained for longer durations.
Methodology
The authors propose a framework for evolving queries and keys through shared learned dynamics before scoring, utilizing both Hamiltonian and Euler integrators. They conduct structural ablation studies to isolate the effects of coupling and evaluate performance across various datasets and model sizes.
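A minimal Euler-integrator version of the coupling might look like the sketch below; the `tanh`-of-a-shared-matrix dynamics is an assumption (a random matrix stands in for the learned dynamics), and the paper's Hamiltonian variant is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(10)

def couple_qk(Q, K, W, steps=3, eps=0.1):
    """Evolve queries and keys jointly before scoring: each stream's
    Euler update depends on the *other* stream through the shared
    dynamics W, which is what couples them."""
    for _ in range(steps):
        Q, K = Q + eps * np.tanh(K @ W), K + eps * np.tanh(Q @ W)
    return Q, K

def attention_scores(Q, K):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    s -= s.max(axis=-1, keepdims=True)     # numerical stability
    w = np.exp(s)
    return w / w.sum(axis=-1, keepdims=True)

n, d = 5, 16
Q0 = rng.normal(size=(n, d))
K0 = rng.normal(size=(n, d))
W = rng.normal(scale=0.1, size=(d, d))     # stand-in for learned dynamics
Qc, Kc = couple_qk(Q0, K0, W)
scores = attention_scores(Qc, Kc)
```

Only W is added relative to standard attention, which is consistent with the marginal parameter overhead reported above; the ablations suggest the coupling itself, not the choice of integrator, drives the gains.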
Results
Coupled QK Dynamics achieves a perplexity of 22.55–22.62 on WikiText-103 at 60M parameters, a 6.6-6.9% improvement over standard attention. The method shows consistent benefits on domain-coherent datasets like WikiText-103 and PubMed, while performance degrades on heterogeneous web text. At larger model sizes (350M), the advantage narrows, with Differential Attention surpassing coupled dynamics.
Implications
The findings suggest that incorporating coupled dynamics into attention mechanisms can lead to more stable training and improved performance in language modeling tasks. This approach may also inform future developments in transformer architectures and other applications requiring efficient attention mechanisms.
Koopman-Based Nonlinear Identification and Adaptive Control of a Turbofan Engine
Optimization
Theory
Robotics
- Development of a physics-based component-level model for turbofan engine control.
- Introduction of a meta-heuristic extended dynamic mode decomposition for accurate dynamic modeling.
- Creation of two controllers: AKMPC and K-FBLC, with AKMPC showing superior robustness.
- Demonstration of the Koopman model's flexibility across different control objectives.
Summary
This paper explores the application of Koopman operator-based methods for the multivariable control of a two-spool turbofan engine. A physics-based component-level model is developed to generate training data and validate the controllers. The author introduces a meta-heuristic extended dynamic mode decomposition, which utilizes a cost function to effectively capture spool-speed dynamics and engine pressure ratio (EPR). This allows for the creation of a single Koopman model that can be adapted for various control objectives. Two controllers are developed based on the identified time-varying Koopman model: an adaptive Koopman-based model predictive controller (AKMPC) with a disturbance observer and a Koopman-based feedback linearization controller (K-FBLC) as a benchmark. The performance of these controllers is evaluated across two control strategies—spool speeds and EPR—under both sea-level and varying flight conditions. The findings indicate that the identification approach provides accurate predictions for spool speeds and EPR, facilitating the flexible reuse of the Koopman model across different control formulations. While both control strategies yield similar performance in steady conditions, the AKMPC demonstrates enhanced robustness compared to the K-FBLC under varying flight conditions, effectively compensating for model mismatches. Additionally, the EPR control strategy is shown to improve thrust response, underscoring the potential of the Koopman-based control framework for robust turbofan engine management.
Methodology
The study employs a physics-based component-level model to generate training data and validate control strategies. A meta-heuristic extended dynamic mode decomposition is developed to create a single Koopman model. Two control strategies are implemented: an adaptive Koopman-based model predictive controller (AKMPC) and a feedback linearization controller (K-FBLC). The performance of these controllers is assessed under different flight conditions.
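Plain extended dynamic mode decomposition — without the paper's meta-heuristic dictionary search or turbofan-specific cost function — fits a linear Koopman matrix on lifted snapshot pairs by least squares; a toy scalar example:

```python
import numpy as np

rng = np.random.default_rng(11)

def edmd(X, Y, psi):
    """Extended dynamic mode decomposition: lift snapshot pairs (X, Y)
    with dictionary psi and solve the least-squares Koopman matrix K
    such that psi(Y) ~= psi(X) @ K."""
    PX, PY = psi(X), psi(Y)
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return K

# Toy nonlinear system x_{t+1} = 0.9 x_t + 0.1 x_t^2, which is exactly
# linear in the first component of the dictionary [x, x^2, x^3, x^4].
psi = lambda x: np.column_stack([x, x ** 2, x ** 3, x ** 4])
x = rng.uniform(-0.5, 0.5, size=400)
y = 0.9 * x + 0.1 * x ** 2
K = edmd(x, y, psi)

# One-step prediction through the lifted linear model.
pred = (psi(np.array([0.3])) @ K)[0, 0]
true = 0.9 * 0.3 + 0.1 * 0.3 ** 2
```

Once K is identified, linear tools such as MPC or feedback linearization apply directly in the lifted space, which is what the AKMPC and K-FBLC controllers above exploit for the engine model.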
Results
The proposed identification approach successfully predicts spool speeds and EPR, allowing for flexible application of the Koopman model. The AKMPC outperforms the K-FBLC in terms of robustness under varying flight conditions, while both controllers achieve comparable performance in steady conditions. The EPR control strategy is found to improve thrust response.
Implications
The findings suggest that Koopman-based control methodologies can significantly enhance the robustness and adaptability of turbofan engine management systems, potentially leading to improved fuel efficiency and operational flexibility in aviation.
Residuals-based Offline Reinforcement Learning
Reinforcement Learning
Optimization
Theory
- Introduces a residuals-based framework for offline reinforcement learning that addresses data coverage limitations.
- Defines a residuals-based Bellman optimality operator that incorporates estimation errors into policy optimization.
- Develops a residuals-based offline deep Q-learning algorithm and demonstrates its effectiveness in a stochastic environment.
- Provides finite-sample guarantees and conditions for asymptotic optimality of the proposed methods.
Read more
Residuals-based Offline Reinforcement Learning
Summary
This paper addresses the challenges of offline reinforcement learning (RL), which relies on previously collected data without real-time interaction with the environment. The authors propose a novel residuals-based offline RL framework that incorporates estimation errors in transition dynamics into policy optimization. By defining a residuals-based Bellman optimality operator, the framework allows for learning policies without the stringent requirement of data coverage across all state-action pairs. The authors develop a residuals-based offline deep Q-learning (DQN) algorithm and demonstrate its effectiveness in a stochastic CartPole environment. The proposed method not only mitigates issues related to distribution shift but also enables the generation of unseen states through empirical residuals, thereby enhancing the learning process in high-stakes applications where traditional online RL methods are impractical.
Methodology
The authors construct an estimated transition model from static offline data using supervised learning. They compute empirical residuals to capture discrepancies between the learned model and true dynamics. By sampling these residuals, they generate trajectories for training policies, allowing for on-policy training and addressing distribution shift.
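The residual-replay idea can be sketched in a few lines; everything below (the 1-D dynamics, the hand-fixed model, the linear policy) is illustrative only, not the paper's implementation:

```python
import random

random.seed(0)

def true_step(s, a):
    # Unknown stochastic dynamics the offline data came from (toy example).
    return 0.9 * s + a + random.gauss(0.0, 0.1)

# Offline dataset of (state, action, next_state) transitions.
data = [(s, a, true_step(s, a)) for s in [0.0, 0.5, 1.0] for a in [-1.0, 0.0, 1.0]]

def model_step(s, a):
    # "Learned" transition model; here just the deterministic part, assumed
    # known for simplicity. In practice it is fit by supervised learning.
    return 0.9 * s + a

# Empirical residuals capture the discrepancy between the learned model
# and the true dynamics.
residuals = [s_next - model_step(s, a) for (s, a, s_next) in data]

def rollout(s0, policy, horizon):
    # Generate an unseen trajectory by replaying sampled residuals on-policy.
    traj, s = [s0], s0
    for _ in range(horizon):
        s = model_step(s, policy(s)) + random.choice(residuals)
        traj.append(s)
    return traj

traj = rollout(0.2, lambda s: -0.5 * s, horizon=5)
```

Sampling residuals rather than assuming a noise model is what lets the framework generate plausible unseen states from purely static data.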
Results
The proposed residuals-based offline DQN algorithm was tested in a stochastic CartPole environment, demonstrating improved performance over traditional offline RL methods. The framework showed that it could effectively generate unseen states and mitigate the impact of distribution shift, leading to more reliable policy evaluations.
Implications
This work has significant implications for high-stakes applications in fields such as healthcare, transportation, and energy, where offline RL can be safely deployed without the risks associated with online trial-and-error learning. The framework can enhance decision-making processes in environments where real-time interaction is not feasible.
UQ-SHRED: uncertainty quantification of shallow recurrent decoder networks for sparse sensing via engression
Time Series
Theory
Efficient ML
- UQ-SHRED provides a distributional learning framework for valid uncertainty quantification in sparse sensing.
- The method combines noise injection with energy score minimization, maintaining computational efficiency.
- Theoretical guarantees are established for the learned conditional distribution, supporting its use in uncertainty-aware applications.
- UQ-SHRED is validated across multiple scientific datasets, showcasing its effectiveness in various domains.
Read more
UQ-SHRED: uncertainty quantification of shallow recurrent decoder networks for sparse sensing via engression
Summary
The paper introduces UQ-SHRED, a novel framework for uncertainty quantification in the context of reconstructing high-dimensional spatiotemporal fields from sparse sensor measurements. Building on the SHallow REcurrent Decoder (SHRED) architecture, UQ-SHRED addresses the critical limitation of uncertainty estimation in complex and data-scarce environments. The framework employs a distributional learning approach through a method called engression, which allows for the modeling of predictive distributions conditioned on sensor history. By injecting stochastic noise into sensor inputs and utilizing an energy score loss for training, UQ-SHRED efficiently generates well-calibrated predictive distributions without the need for extensive computational resources or multiple network architectures. The authors validate UQ-SHRED on various real-world datasets, including turbulent flow and atmospheric dynamics, demonstrating its robustness and effectiveness across diverse scientific applications. The paper also includes ablation studies to analyze the impact of different model settings on performance, confirming the framework's capability for valid uncertainty quantification in sparse sensing scenarios.
Methodology
UQ-SHRED utilizes a distributional learning framework that incorporates noise injection into the input of the SHRED architecture. The model is trained using an energy score loss to optimize the predictive distribution of spatial states based on sensor measurements. This approach allows for uncertainty to be modeled throughout the network without requiring additional architectural modifications. At inference, the model generates samples from the conditional predictive distribution by propagating input noise through the trained network.
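The energy score that drives engression-style training is a standard proper scoring rule; a minimal 1-D version (UQ-SHRED applies it to full spatial fields, not scalars) looks like this:

```python
import random

def energy_score(samples, y):
    # ES = E|X - y| - 0.5 * E|X - X'| for samples X, X' from the
    # predictive distribution; lower is better.
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return term1 - term2

random.seed(1)
obs = 0.0
good = [random.gauss(0.0, 1.0) for _ in range(200)]  # matches the observation
bad = [random.gauss(3.0, 1.0) for _ in range(200)]   # shifted predictive dist.

# A sample cloud centred on the observation scores lower (better).
print(energy_score(good, obs) < energy_score(bad, obs))  # prints True
```

Minimizing this score rewards both accuracy (first term) and sample diversity (second term), which is why noise injection alone suffices to produce calibrated predictive distributions.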
Results
The UQ-SHRED framework demonstrated effective uncertainty quantification across five complex real-world datasets, including sea-surface temperature, turbulent flows, neural activity, solar activity, and propulsion physics. The results indicated that UQ-SHRED produced well-calibrated confidence intervals and maintained robustness across diverse applications. The ablation studies provided insights into how various hyperparameters affected the quality of uncertainty estimates.
Implications
The development of UQ-SHRED has significant implications for scientific applications that require reliable uncertainty quantification, such as risk assessment, anomaly detection, and decision-making under uncertainty. The framework's ability to provide valid uncertainty estimates can enhance the safety and reliability of systems in fields like fluid dynamics, neuroscience, and atmospheric sciences.
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
NLP
Large Language Models
Efficient ML
- FourierMoE integrates MoE architecture with inverse discrete Fourier transform (IDFT) for frequency-aware adaptation.
- The method addresses task interference and representation deficiency in multi-task fine-tuning settings.
- FourierMoE employs a frequency-adaptive router and learns complex coefficients to capture both phase and amplitude information.
- Extensive evaluations show superior performance across various benchmarks with fewer trainable parameters compared to existing methods.
Read more
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
Summary
The paper introduces FourierMoE, a novel adaptation method for large language models (LLMs) that leverages the mixture-of-experts (MoE) architecture in the spectral domain. Traditional parameter-efficient fine-tuning (PEFT) methods face challenges in multi-task settings due to task interference and representational limitations. FourierMoE addresses these issues by reformulating adaptation through spectral analysis, revealing that different tasks exhibit unique frequency energy distributions and that LLM layers have varying frequency sensitivities. The proposed method employs a frequency-adaptive router to allocate tokens to experts that specialize in distinct frequency bands, allowing for more effective adaptation. Each expert learns conjugate-symmetric complex coefficients, ensuring lossless reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks demonstrate that FourierMoE consistently outperforms existing methods in both single-task and multi-task scenarios while utilizing significantly fewer trainable parameters, showcasing the potential of spectral-domain adaptation for efficient LLM fine-tuning.
Methodology
FourierMoE reformulates the adaptation of LLMs in the spectral domain, utilizing a frequency-adaptive router to direct tokens to specialized experts based on distinct frequency bands. Each expert learns conjugate-symmetric complex coefficients, allowing for a comprehensive representation of spectral information while ensuring lossless reconstruction into real-valued weights.
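The conjugate-symmetry property the experts rely on is standard Fourier analysis: a spectrum with c[n-k] = conj(c[k]) inverts to purely real values. A toy-sized check (not the paper's code):

```python
import cmath

def idft(coeffs):
    # Plain inverse discrete Fourier transform.
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

n = 8
# Build coefficients satisfying c[n-k] = conj(c[k]), with c[0] real.
c = [0j] * n
c[0] = 4.0 + 0j
c[1] = 1.0 + 2.0j
c[n - 1] = c[1].conjugate()
c[3] = -0.5 + 0.25j
c[n - 3] = c[3].conjugate()

weights = idft(c)
# Imaginary parts vanish up to floating-point error: real spatial weights.
```

This is what makes the reconstruction into real-valued spatial weights lossless: no information is discarded when the imaginary parts are dropped.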
Results
The results indicate that FourierMoE outperforms competitive baselines across 28 benchmarks, demonstrating enhanced performance in both single-task and multi-task settings while significantly reducing the number of trainable parameters required for adaptation.
Implications
The findings suggest that spectral-domain expert adaptation can serve as an effective and parameter-efficient approach for fine-tuning large language models, potentially leading to advancements in multi-task learning and applications in natural language processing.
Universal Hypernetworks for Arbitrary Models
Computer Vision
Graph Learning
NLP
- UHN is a fixed-architecture generator that can produce weights for various models without redesigning the hypernetwork.
- It supports multi-model generalization and multi-task learning across different architectures.
- UHN allows for stable recursive generation of hypernetworks, enhancing flexibility in model creation.
- Empirical results show UHN's competitive performance across diverse benchmarks.
Read more
Universal Hypernetworks for Arbitrary Models
Summary
The paper introduces the Universal Hypernetwork (UHN), a novel approach that addresses the limitations of conventional hypernetworks, which are typically tied to specific model architectures. UHN is a fixed-architecture generator that predicts neural network weights based on deterministic descriptors related to parameters, architecture, and tasks. This decoupling allows UHN to generate diverse models across various architectures and tasks without the need for redesign or retraining. The authors present three main empirical claims: (1) UHN maintains competitive performance with direct training across multiple benchmarks in vision, graph, text, and formula-regression; (2) it supports both multi-model generalization within a family and multi-task learning across heterogeneous models; and (3) UHN enables stable recursive generation, allowing for the creation of intermediate hypernetworks before producing the final model. The findings suggest that UHN can effectively scale to larger and more diverse target networks while remaining efficient and versatile.
Methodology
The UHN predicts each scalar parameter from deterministic descriptors that encode parameter indices, architecture information, and task details. This method utilizes Gaussian Fourier features to model complex weight fields, allowing a single hypernetwork to generate parameters for various target models efficiently.
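Gaussian Fourier features themselves are a standard embedding; a minimal scalar version (UHN's actual descriptors and dimensions are richer) is:

```python
import math
import random

random.seed(0)
# Fixed random frequencies drawn from a Gaussian, shared across all inputs.
B = [random.gauss(0.0, 1.0) for _ in range(16)]

def fourier_features(x):
    # Maps a scalar descriptor x to a 32-dim embedding
    # [cos(2*pi*b*x) for b in B] + [sin(2*pi*b*x) for b in B].
    return ([math.cos(2 * math.pi * b * x) for b in B] +
            [math.sin(2 * math.pi * b * x) for b in B])

phi = fourier_features(0.37)
```

The embedding lets a small network represent high-frequency variation in the weight field that raw scalar indices could not express.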
Results
The empirical evaluations demonstrate that UHN is competitive with direct training methods across multiple benchmarks, including CIFAR-10, Cora, and AG News. It effectively generalizes across model families and tasks while maintaining performance stability during recursive generation.
Implications
The UHN framework has significant implications for model design in machine learning, particularly in scenarios requiring flexibility across different architectures and tasks. It can streamline the process of model adaptation and deployment, making it easier to leverage hypernetworks in diverse applications.
Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids
Reinforcement Learning
Graph Learning
Optimization
- Introduces a physics-informed RL methodology for topology control in power grids.
- Utilizes a Gibbs prior to select a small, state-dependent set of feasible actions.
- Employs a graph neural network to predict overload risks for action evaluation.
- Achieves significant improvements in reward and decision time compared to existing methods.
Read more
Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids
Summary
This paper addresses the complex problem of topology control in power grids, which involves sequential decision-making with a combinatorial action space that grows with grid size. The authors propose a physics-informed Reinforcement Learning (RL) framework that integrates semi-Markov control with a Gibbs prior to encode the system's physical dynamics. The decision-making process is triggered only when the grid enters hazardous conditions, while a graph neural network (GNN) surrogate predicts the overload risk of feasible topology actions. This approach reduces exploration difficulties and online simulation costs, maintaining the flexibility of learned policies. The method is evaluated across three benchmark environments, demonstrating strong performance: achieving oracle-level results while being significantly faster and more efficient than existing baselines. The proposed framework effectively balances control quality and computational efficiency, making it a promising solution for real-world power grid operations.
Methodology
The proposed method formulates the topology control problem as a semi-Markov decision process, intervening only during hazardous conditions. It constructs a time-dependent candidate action set using a graph-based policy and a physics-informed prior that ranks actions based on predicted overload risks. The prior is learned from simulator rollouts and is used to reweight action scores before selection.
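A Gibbs-style reweighting of action scores can be sketched as follows; the scores, risks, and temperature here are illustrative, not the paper's values:

```python
import math

def gibbs_reweight(scores, risks, beta=2.0):
    # Multiply each policy score by exp(-beta * predicted_risk), then
    # renormalize, so low-overload-risk actions dominate selection.
    weighted = [s * math.exp(-beta * r) for s, r in zip(scores, risks)]
    z = sum(weighted)
    return [w / z for w in weighted]

scores = [0.5, 0.3, 0.2]   # raw policy scores per candidate topology action
risks = [0.9, 0.1, 0.5]    # GNN-predicted overload risk per action
probs = gibbs_reweight(scores, risks)
best = max(range(len(probs)), key=probs.__getitem__)  # lowest-risk trade-off wins
```

The exponential weighting is what encodes the physics prior: an action the GNN flags as risky is suppressed even if the learned policy scores it highly.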
Results
The method achieves oracle-level performance while being approximately 6× faster on the first benchmark, reaches 94.6% of oracle reward with about 200× lower decision time on the second benchmark, and improves over a PPO baseline by up to 255% in reward and 284% in survived steps on the most challenging benchmark, while remaining about 2.5× faster than a specialized engineering baseline.
Implications
The findings suggest that the proposed physics-informed RL framework can significantly enhance decision-making processes in power grid operations, potentially leading to safer and more efficient management of electrical networks under varying operational conditions.
Application of parametric Shallow Recurrent Decoder Network to magnetohydrodynamic flows in liquid metal blankets of fusion reactors
Time Series
- SHRED effectively reconstructs MHD states from sparse measurements.
- The integration of SVD with SHRED enhances computational efficiency.
- The framework generalizes well to unseen magnetic field configurations.
- SHRED can infer magnetic field dynamics from temperature data alone.
Read more
Application of parametric Shallow Recurrent Decoder Network to magnetohydrodynamic flows in liquid metal blankets of fusion reactors
Summary
This paper presents a novel data-driven framework for reconstructing magnetohydrodynamic (MHD) states in liquid metal blankets of fusion reactors using a parametric Shallow Recurrent Decoder Network (SHRED). MHD phenomena are critical in nuclear fusion systems where electrically conducting fluids interact with magnetic fields, influencing flow dynamics. Traditional numerical solutions for MHD models are computationally intensive, especially in real-time or multi-query contexts. The authors propose integrating dimensionality reduction via Singular Value Decomposition (SVD) with SHRED to reconstruct full spatio-temporal states from sparse measurements. The methodology is applied to a three-dimensional model of a water-cooled tube surrounded by lead-lithium flows, examining various magnetic field configurations. Results demonstrate that SHRED achieves high accuracy and robustness in reconstructing MHD states, even under previously unseen conditions, including time-varying magnetic fields. Notably, the framework can infer the evolution of the magnetic field using only temperature measurements. The findings highlight SHRED's potential as a computationally efficient tool for real-time monitoring and control in fusion reactor blanket systems.
Methodology
The study employs a combination of Singular Value Decomposition (SVD) for dimensionality reduction and the SHallow REcurrent Decoder (SHRED) neural network architecture to reconstruct MHD states from sparse time-series measurements. The methodology is tested on a three-dimensional model representing a portion of a water-cooled blanket cell.
Results
SHRED demonstrated high reconstruction accuracy and robustness across various magnetic field configurations, including constant and time-dependent fields. The model effectively generalized to conditions not encountered during training, accurately inferring the temporal evolution of magnetic fields using temperature measurements.
Implications
The findings suggest that SHRED can serve as a powerful tool for real-time monitoring, diagnostics, and control in fusion reactor blanket systems, potentially improving the design and operation of nuclear fusion reactors.
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Reinforcement Learning
Large Language Models
Robotics
- SKILL0 is the first RL framework explicitly designed for skill internalization, enabling zero-shot autonomous behavior.
- In-context reinforcement learning (ICRL) is introduced to transition from context-dependent execution to intrinsic competence.
- Dynamic Curriculum adaptively withdraws skills based on their on-policy helpfulness, optimizing the learning process.
- SKILL0 achieves substantial performance improvements over traditional RL baselines while maintaining a low token context size.
Read more
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Summary
The paper introduces SKILL0, a novel framework for skill internalization in reinforcement learning (RL) that allows agents to autonomously perform tasks without relying on inference-time skill retrieval. Traditional methods of skill augmentation involve injecting skills into the model's context during inference, which can lead to retrieval noise, token overhead, and a lack of true knowledge acquisition. SKILL0 addresses these limitations by implementing an in-context reinforcement learning (ICRL) approach, where skills are initially provided as guidance during training but are completely removed during inference. This transition is facilitated through a Dynamic Curriculum that evaluates the helpfulness of each skill, retaining only those that contribute to the agent's performance. The framework demonstrates significant improvements over standard RL baselines, achieving better performance while maintaining a compact context size, thus reducing inference overhead. The results indicate that SKILL0 effectively enables zero-shot autonomous behavior, marking a significant advancement in the field of agent-based learning.
Methodology
The methodology involves a training regime that starts with full skill context and progressively removes it, utilizing in-context reinforcement learning (ICRL) to optimize the transition from context-dependent execution to autonomous behavior. Skills are grouped and rendered with interaction history into a compact visual context, and a Dynamic Curriculum evaluates the on-policy helpfulness of skills to determine their retention during training.
Results
SKILL0 shows substantial improvements over standard RL baselines, achieving a +9.7% increase for ALFWorld and a +6.6% increase for Search-QA. The framework maintains an efficient context of fewer than 0.5k tokens per step, significantly reducing inference overhead while enhancing task performance.
Implications
The implications of this research suggest that skill internalization can lead to more efficient and capable autonomous agents, reducing reliance on external skill retrieval and enhancing the scalability of agent-based systems in complex environments.
DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
Reinforcement Learning
Generative Models
Large Language Models
- DISCO-TAB synthesizes clinical data while preserving privacy and ensuring clinical validity.
- The framework uses a hierarchical reinforcement learning approach to evaluate data quality at multiple granularities.
- It incorporates techniques to preserve medical logic and address class imbalances in synthetic data.
- DISCO-TAB shows significant improvements in clinical classifier utility and statistical fidelity compared to existing methods.
Read more
DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
Summary
The paper presents DISCO-TAB, a novel framework designed to synthesize complex clinical data while preserving privacy and ensuring clinical validity. Traditional methods for generating synthetic data from Electronic Health Records (EHR) often fail to capture the intricate dependencies and class imbalances present in biomedical datasets. DISCO-TAB addresses these challenges by integrating a fine-tuned Large Language Model (LLM) with a multi-objective discriminator system, optimized through a hierarchical reinforcement learning approach. This framework evaluates the quality of synthetic data at multiple levels—token, sentence, feature, and row—allowing for a more nuanced assessment of data validity. The authors introduce techniques such as Automated Constraint Discovery and Inverse-Frequency Reward Shaping to maintain medical logic and mitigate issues related to minority class representation. The framework is validated on various benchmarks, including datasets related to heart failure and Parkinson's disease, demonstrating significant improvements in downstream clinical classifier utility and statistical fidelity. The results indicate that DISCO-TAB outperforms existing methods, achieving up to a 38.2% enhancement in utility while maintaining robust defenses against membership inference attacks. This work sets a new benchmark for generating trustworthy synthetic tabular data in healthcare applications.
Methodology
DISCO-TAB combines a fine-tuned Large Language Model with a hierarchical reinforcement learning optimization strategy. It evaluates synthetic data quality at four levels: token, sentence, feature, and row, using multi-objective feedback to ensure compliance with clinical constraints. The framework employs Automated Constraint Discovery and Inverse-Frequency Reward Shaping to maintain medical logic and address minority class collapse.
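Inverse-frequency reward shaping can be illustrated with a toy label distribution; the weighting scheme below is a generic sketch of the idea, not the paper's exact formula:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # Weight each class inversely to its frequency, normalized so a
    # balanced dataset gives every class weight 1.0.
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

labels = ["healthy"] * 90 + ["heart_failure"] * 10
w = inverse_frequency_weights(labels)

def shaped_reward(base_reward, label):
    # Rewards for generating a rare class are scaled up, countering
    # minority-class collapse in the synthetic data.
    return base_reward * w[label]
```

Here the 10%-prevalence class receives ten times the weight of the 90% class, so the generator is not rewarded for only ever producing majority-class rows.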
Results
The framework achieved up to a 38.2% improvement in downstream clinical classifier utility compared to baseline methods such as GANs and diffusion models. It also demonstrated exceptional statistical fidelity with Jensen-Shannon Divergence (JSD) values below 0.01 and strong resistance to membership inference attacks.
Implications
DISCO-TAB has significant implications for the development of reliable clinical decision support systems, enabling the generation of synthetic data that is both useful for training AI models and compliant with privacy regulations. This could facilitate advancements in precision medicine and improve patient care by providing high-quality, explainable data for AI applications.
Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular Grids
Graph Learning
Efficient ML
- VIRSO provides accurate sparse-to-dense reconstruction for irregular geometries.
- The method integrates spectral and spatial analysis for improved performance.
- Achieves mean relative L2 errors below 1% while reducing energy-delay product significantly.
- Demonstrates edge-deployability with low power consumption and latency.
Read more
Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular Grids
Summary
The paper presents VIRSO (Virtual Irregular Real-Time Sparse Operator), a novel graph-based neural operator designed for sparse-to-dense reconstruction on irregular geometries, addressing the challenges of real-time virtual sensing in resource-constrained environments. Traditional physics-based solvers are often too slow and power-intensive for real-time applications, particularly in fields like nuclear thermal-hydraulics where accurate sensing is critical but instrumentation is limited. VIRSO integrates both spectral and spatial analysis to enhance reconstruction accuracy while minimizing latency and power consumption. The authors introduce a variable-connectivity algorithm, Variable KNN (V-KNN), for efficient graph construction tailored to mesh geometries. Evaluations on three nuclear thermal-hydraulic benchmarks demonstrate that VIRSO achieves mean relative L2 errors below 1% across various reconstruction ratios, outperforming existing operators with fewer parameters. The implementation on an NVIDIA Jetson Orin Nano shows sub-10 W power consumption and sub-second latency, highlighting its suitability for edge deployment. This work establishes a new paradigm for compute-aware operator learning, emphasizing the importance of hardware constraints in the design of virtual sensing instruments.
Methodology
The authors developed VIRSO, a graph-based neural operator that utilizes a variable-connectivity algorithm (V-KNN) for mesh-informed graph construction. The approach combines spectral and spatial analysis to enhance reconstruction accuracy from sparse boundary measurements, focusing on hardware constraints for edge deployment.
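One plausible reading of variable-connectivity KNN is that each node's neighbour count adapts to local point spacing; the rule below is a hedged sketch only, and the paper's actual V-KNN criterion may differ:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def variable_knn(points, k_min=2, k_max=4):
    # Build an undirected graph whose per-node degree depends on how
    # tightly packed that node's neighbourhood is (illustrative rule).
    edges = set()
    for i, p in enumerate(points):
        d = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        # Dense regions (small nearest-neighbour distance) get fewer edges.
        k = k_min if d[0][0] < 0.5 else k_max
        for _, j in d[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

pts = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (2.0, 2.0), (2.5, 2.0), (3.0, 2.5)]
graph = variable_knn(pts)
```

Adapting connectivity to the mesh avoids both over-connecting refined regions and under-connecting coarse ones, which matters for message passing on irregular grids.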
Results
VIRSO was evaluated on three nuclear thermal-hydraulic benchmarks, achieving mean relative L2 errors below 1% and demonstrating significant improvements in energy-delay product (EDP), reducing it from approximately 206 J·ms to 10.1 J·ms on an NVIDIA H200. The implementation on an NVIDIA Jetson Orin Nano maintained sub-10 W power consumption and sub-second latency across all configurations.
Implications
The findings suggest that VIRSO can serve as a viable solution for real-time virtual sensing in environments where traditional instrumentation is impractical, such as in advanced nuclear energy systems. This work paves the way for more efficient and deployable sensing technologies in various fields requiring real-time monitoring and control.
Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
Optimization
Efficient ML
Theory
- Sven optimizes neural networks by treating each data point's residual as a separate condition.
- The algorithm approximates the Moore-Penrose pseudoinverse using truncated SVD, leading to lower computational costs.
- Sven significantly outperforms Adam and other first-order methods in regression tasks.
- The method is particularly suited for over-parameterized models and can be applied in scientific computing.
Read more
Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
Summary
This paper introduces Sven, a novel optimization algorithm for neural networks that leverages the natural decomposition of loss functions into individual data point contributions. Unlike traditional methods that reduce the entire loss to a single scalar, Sven treats each data point's residual as a separate condition to be satisfied simultaneously. The algorithm employs the Moore-Penrose pseudoinverse of the loss Jacobian to compute a minimum-norm parameter update that addresses all conditions at once. To enhance computational efficiency, Sven approximates this pseudoinverse using a truncated singular value decomposition (SVD), retaining only the k most significant directions, which results in a computational overhead proportional to k, significantly lower than the square of the number of parameters typical in natural gradient methods. The authors demonstrate that Sven outperforms standard first-order optimization methods like Adam in terms of convergence speed and final loss on regression tasks, while also being competitive with LBFGS at a reduced computational cost. The paper discusses challenges related to memory overhead and proposes strategies for mitigation, highlighting Sven's potential applications in scientific computing where custom loss functions can be decomposed into multiple conditions.
Methodology
Sven employs a linear algebra approach to optimization by using the Moore-Penrose pseudoinverse of the loss Jacobian, approximated through truncated singular value decomposition (SVD). This allows for simultaneous updates to model parameters based on individual data point conditions, rather than aggregating them into a single loss value.
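The minimum-norm update Sven approximates reduces, for a single residual condition, to a closed form: the Moore-Penrose solution of J @ dtheta = -r is dtheta = -Jᵀ r / ||J||². A toy one-condition case (Sven handles many conditions at once via a truncated SVD of the full Jacobian):

```python
def min_norm_update(J_row, r):
    # Minimum-norm dtheta satisfying J_row . dtheta = -r
    # (the pseudoinverse of a single Jacobian row).
    norm_sq = sum(j * j for j in J_row)
    return [-j * r / norm_sq for j in J_row]

J = [3.0, 4.0]   # gradient of one data point's residual w.r.t. parameters
r = 5.0          # that data point's residual
dtheta = min_norm_update(J, r)

# The update exactly zeros this condition: J . dtheta = -r.
resid_change = sum(j * d for j, d in zip(J, dtheta))
```

With many conditions the rows of J conflict, which is where the truncated SVD comes in: keeping only the k dominant singular directions resolves the conditions approximately at O(k) overhead.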
Results
The experimental results show that Sven converges faster and achieves lower final loss compared to standard optimization methods like Adam on various regression tasks, while also being competitive with LBFGS at a fraction of the computational cost.
Implications
Sven's methodology has significant implications for optimizing neural networks, particularly in scenarios where loss functions can be decomposed into multiple conditions. Its efficiency and performance suggest potential applications in scientific computing and other fields requiring complex loss structures.
Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling
Computer Vision
Interpretability
Theory
- Expert evaluations significantly enhance the quality of uncertainty estimates in medical AI.
- The proposed two-ensemble method effectively separates epistemic and aleatoric uncertainty.
- The framework shows substantial improvements in various medical tasks, outperforming state-of-the-art methods.
- A simplified one-ensemble method offers comparable performance with greater efficiency.
Read more
Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling
Summary
This paper addresses the critical challenge of AI reliability in healthcare by proposing a novel framework that integrates expert knowledge into uncertainty estimation. The authors highlight the importance of uncertainty estimation in medical AI systems, particularly in quantifying aleatoric uncertainty, which is often overlooked. They introduce a two-ensemble approach that utilizes expert disagreement to generate soft labels for training machine learning models, allowing for separate estimation of epistemic and aleatoric uncertainties. The method is validated across various medical tasks, including binary image classification and multiple-choice question answering, demonstrating significant improvements in uncertainty estimation quality. The authors also present a simplified one-ensemble variant that maintains performance while enhancing efficiency. Overall, the study emphasizes the value of expert input in developing risk-aware AI systems for healthcare applications.
Methodology
The authors propose a two-ensemble approach where one ensemble predicts hard labels for epistemic uncertainty, while a second ensemble, trained on expert-generated soft labels, estimates aleatoric uncertainty. This method leverages the law of total variance to decompose uncertainty into its components. A simplified one-ensemble alternative is also introduced for improved efficiency.
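The law-of-total-variance decomposition underlying the two-ensemble split can be checked on toy numbers (the predictions below are illustrative, not from the paper):

```python
import statistics

# Each inner list: one ensemble member's sampled predictions.
members = [[0.1, 0.3, 0.2], [0.4, 0.6, 0.5], [0.2, 0.2, 0.2]]

means = [statistics.mean(m) for m in members]
variances = [statistics.pvariance(m) for m in members]

aleatoric = statistics.mean(variances)   # E[Var(Y | member)]: data noise
epistemic = statistics.pvariance(means)  # Var(E[Y | member]): model disagreement
total = aleatoric + epistemic

# With equal per-member sample counts, `total` equals the population
# variance of all predictions pooled together.
```

Disagreement between members drives the epistemic term, while spread within each member's predictions drives the aleatoric term, which is exactly the separation the hard-label and soft-label ensembles are designed to expose.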
Results
The proposed method achieved a 9% improvement in multiple-choice question answering, a 50% improvement in image classification, a 7% improvement in binary image segmentation, and a 49% improvement in multiclass image segmentation compared to the second-best solution across various datasets.
Implications
The findings suggest that integrating expert knowledge into AI systems can significantly enhance their reliability and effectiveness in medical applications, potentially leading to better patient outcomes and more efficient healthcare workflows.
Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics
Time Series
Optimization
- The study systematically compares deep learning models with traditional statistical methods for demand forecasting.
- N-BEATS outperforms MSTL in forecasting accuracy, making it the best-performing model for this dataset.
- The proposed framework integrates forecasting with operational decision-making through integer linear programming.
- The research demonstrates the practical application of improved forecasting in logistics planning.
Read more
Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics
Summary
This paper addresses the challenges of demand forecasting in supply chain management, particularly the difficulties posed by seasonality, irregular spikes, and noise in retail data. The authors propose a three-step analytical framework that integrates forecasting with operational analytics. The first step involves exploratory data analysis of 180,519 transactions to identify trends and seasonal patterns. The second step compares the forecasting performance of the N-BEATS and N-HiTS deep learning models against the MSTL statistical model. Results indicate that both deep learning models significantly outperform MSTL, with N-BEATS achieving the lowest forecasting error. In the final step, the forecasts are utilized in an integer linear programming (ILP) model to optimize delivery plans, minimizing total delivery time while adhering to budget and capacity constraints. The study highlights the practical impact of accurate forecasting and interpretable model optimization in logistics, demonstrating a cohesive workflow from predictive analytics to prescriptive decision-making.
Methodology
The methodology consists of three stages: (1) exploratory data analysis to identify trends and seasonal components in the dataset, (2) comparative analysis of forecasting models (N-BEATS, N-HiTS, and MSTL) to determine the most accurate model, and (3) application of the selected forecasting model in an integer linear programming framework to optimize delivery plans.
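The prescriptive third stage can be illustrated with a toy version of the delivery problem. The paper formulates it as an integer linear program for a real solver; the brute-force search, the `plan_delivery` name, the `(time, cost, capacity)` mode tuples, and all numbers below are hypothetical stand-ins that show the same objective and constraints.

```python
from itertools import product

def plan_delivery(demand, modes, budget):
    """Toy exhaustive solve of the delivery plan: choose integer unit counts
    per shipping mode to minimize total delivery time, subject to meeting
    forecast demand, a cost budget, and per-mode capacity limits.

    modes: list of (time_per_unit, cost_per_unit, capacity) tuples.
    """
    best = None
    ranges = [range(cap + 1) for _, _, cap in modes]  # capacity constraints
    for x in product(*ranges):
        if sum(x) < demand:                            # demand constraint
            continue
        cost = sum(c * xi for (_, c, _), xi in zip(modes, x))
        if cost > budget:                              # budget constraint
            continue
        time = sum(t * xi for (t, _, _), xi in zip(modes, x))
        if best is None or time < best[0]:
            best = (time, x)
    return best  # (total_time, units_per_mode), or None if infeasible
```

For example, with a fast expensive mode `(1, 5, 4)` and a slow cheap mode `(3, 1, 10)`, demand 6, and budget 22, the search fills the fast mode's capacity first and tops up with the cheap mode. A production formulation would pass the same objective and constraints to an ILP solver instead of enumerating.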
Results
The results show that both N-BEATS and N-HiTS significantly outperform the MSTL model in forecasting accuracy, with N-BEATS being the most effective. The optimized delivery plan generated through the ILP model resulted in a feasible and cost-effective shipping strategy, minimizing delivery time under budget and capacity constraints.
Implications
The findings suggest that integrating advanced forecasting techniques with optimization models can enhance decision-making in supply chain management. This approach can lead to more efficient logistics operations and reduced costs, making it valuable for businesses facing complex demand patterns.
Test-Time Scaling Makes Overtraining Compute-Optimal
Large Language Models
Optimization
Theory
- Introduces Train-to-Test (T2) scaling laws that optimize pretraining and test-time decisions jointly.
- Demonstrates that optimal pretraining strategies shift towards overtraining when factoring in inference costs.
- Validates the T2 scaling approach by showing improved performance of overtrained models across various tasks.
- Findings remain relevant even after post-training, suggesting practical implications for model deployment.
Test-Time Scaling Makes Overtraining Compute-Optimal
Summary
This paper addresses the gap between pretraining scaling laws and test-time scaling strategies for large language models (LLMs). The authors introduce Train-to-Test (T2) scaling laws that jointly optimize model size, training tokens, and inference samples under a fixed compute budget, extending existing pretraining scaling laws to account for inference. The study reveals that optimal pretraining decisions shift towards overtraining when inference costs are considered, a departure from traditional scaling recommendations such as Chinchilla's. Through extensive evaluations across eight downstream tasks, the authors demonstrate that heavily overtrained models, when pre-trained according to T2 scaling forecasts, significantly outperform those trained under standard pretraining scaling laws. Furthermore, the findings persist even after post-training, indicating the robustness of T2 scaling in practical deployments. The paper emphasizes the need for a unified approach to pretraining and inference scaling, highlighting the nonlinear relationship between model size, training duration, and inference quality.
Methodology
The authors propose a joint optimization framework that incorporates model size, dataset size, and inference compute under a total budget. They evaluate two approaches: one based on loss and another on accuracy (pass@k). The methodology includes extensive experiments with over 100 models across different compute levels to validate the T2 scaling laws.
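The joint optimization can be sketched as a grid search over the compute split. This is only an illustration of the idea, not the paper's fitted law: it assumes a Chinchilla-style loss surface with illustrative coefficients, the standard 6ND approximation for training FLOPs, roughly 2N FLOPs per generated token at inference, and a made-up mapping from loss to per-sample success probability.

```python
import math

def chinchilla_loss(n, d):
    # Chinchilla-style loss surface; coefficients are illustrative,
    # not the paper's fitted values.
    return 1.69 + 406.4 / n**0.34 + 410.7 / d**0.28

def pass_at_k(n, d, k):
    # Assumed proxy: per-sample success probability decays with loss;
    # pass@k then follows from k independent samples.
    p = math.exp(-chinchilla_loss(n, d))
    return 1.0 - (1.0 - p) ** k

def t2_allocate(total_flops, tokens_per_query=1000, queries=1e6):
    """Grid-search the split of a fixed compute budget between model size,
    training tokens, and inference samples, in the spirit of T2 scaling."""
    best = None
    for n in [1e8 * 2**i for i in range(10)]:      # candidate model sizes
        for k in (1, 2, 4, 8, 16, 32):             # samples per query
            # Inference cost: ~2N FLOPs per token, per sample, per query.
            inference = 2 * n * tokens_per_query * k * queries
            train_budget = total_flops - inference
            if train_budget <= 0:
                continue
            d = train_budget / (6 * n)             # tokens via the 6ND rule
            score = pass_at_k(n, d, k)
            if best is None or score > best[0]:
                best = (score, n, d, k)
    return best  # (pass@k, model size, training tokens, samples) or None
```

The qualitative behavior this sketch reproduces is the paper's central trade-off: reserving compute for repeated sampling at test time pushes the optimum toward smaller models trained on more tokens than training-only scaling laws would prescribe.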
Results
The results indicate that when test-time compute is accounted for, optimal pretraining decisions favor smaller, more heavily overtrained models than traditional scaling laws prescribe. The T2 scaling laws consistently predict improved performance across eight tasks, confirming the advantages of overtraining in the context of inference costs. Additionally, the benefits of T2 scaling persist after post-training adjustments.
Implications
The findings suggest that practitioners should reconsider their pretraining strategies based on expected test-time usage, potentially leading to more efficient and effective model deployments. The T2 scaling laws could guide future research in optimizing LLMs for various applications, particularly in scenarios requiring repeated sampling.