AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
48 papers today · 8-hour update frequency · 7 days of history
Cross-Subject Generalization for EEG Decoding: A Survey of Deep Learning Methods
Time Series
- Inter-subject variability poses significant challenges for deep learning in EEG decoding.
- The survey categorizes methodologies into distinct families addressing cross-subject generalization.
- Rigorous evaluation protocols are essential for assessing the effectiveness of these methodologies.
- The paper emphasizes the importance of utilizing subject identity and metadata in model training.
Summary
This survey addresses the challenge of cross-subject generalization in EEG decoding using deep learning methods, which is significantly affected by inter-subject variability. The authors formalize the cross-subject setting as a multi-source domain problem and propose rigorous, subject-independent evaluation protocols. They categorize existing methodologies into families such as feature alignment, adversarial learning, feature disentanglement, and contrastive learning. The survey highlights the theoretical limitations of current approaches, the importance of subject identity, and the potential of EEG foundation models. By focusing exclusively on deep learning techniques and expanding the application scope to include various tasks like emotion recognition and motor imagery, this survey provides a comprehensive analysis of how to leverage subject-level information to enhance model generalization and robustness in real-world applications.
Methodology
The authors systematically categorize and analyze deep learning methodologies aimed at cross-subject generalization, including feature alignment, adversarial learning, feature disentanglement, contrastive learning, and meta-learning. They also discuss the importance of rigorous evaluation protocols and the use of subject-level information to improve model performance.
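A core evaluation protocol the survey advocates is subject-independent splitting, where every test subject is entirely unseen during training. The sketch below shows a minimal leave-one-subject-out loop with scikit-learn; the features, labels, and classifier are placeholders for illustration, not anything taken from the survey.

```python
# Minimal leave-one-subject-out (LOSO) evaluation sketch. Assumes per-trial
# features X, labels y, and a subject id per trial; all data here is synthetic.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))           # 300 trials, 64 EEG-derived features
y = rng.integers(0, 2, size=300)         # binary decoding labels
subjects = np.repeat(np.arange(10), 30)  # 10 subjects, 30 trials each

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"LOSO accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```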
Results
The survey does not present original experimental results but synthesizes findings from existing literature, highlighting the effectiveness of various methodologies in addressing the cross-subject generalization problem. It identifies gaps in current research and suggests directions for future work.
Implications
The findings of this survey have significant implications for the development of robust EEG decoding systems applicable in clinical diagnostics, brain-computer interfaces, and cognitive state analysis. By improving cross-subject generalization, these methodologies can enhance the reliability and applicability of EEG-based technologies in real-world settings.
Open Problems in Frontier AI Risk Management
Theory
- Frontier AI systems introduce novel safety risks that existing risk management frameworks are ill-equipped to handle.
- The paper identifies and categorizes open problems in the risk management process for frontier AI.
- Different types of open problems require tailored responses from various stakeholders.
- The authors provide a structured review of the literature to highlight unresolved challenges in risk management.
Summary
The paper addresses the emerging safety risks associated with frontier AI systems, which are general-purpose and capable of performing a wide range of tasks. It highlights that existing AI-specific risk management standards were primarily developed for narrow AI systems and are inadequate for the unique challenges posed by frontier AI. The authors systematically identify open problems in frontier AI risk management by examining the entire risk management process, including planning, identification, analysis, evaluation, and mitigation. They classify these problems into three categories: lack of scientific or technical consensus, misalignment with established frameworks, and implementation shortcomings. The paper aims to clarify the necessary steps for achieving robust consensus in frontier AI risk management and serves as a problem-oriented reference document to guide future research and governance efforts. It does not propose specific solutions but emphasizes the importance of collaboration among various stakeholders, including developers, regulators, and researchers.
Methodology
The authors adopted a problem-oriented approach, systematically reviewing the literature on risk management processes and identifying unresolved challenges at each stage, including planning, identification, analysis, evaluation, and mitigation.
Results
The paper identifies several open problems in frontier AI risk management, categorizing them based on their nature and the stakeholders best positioned to address them. It highlights the need for improved consensus and alignment in risk management practices.
Implications
The findings underscore the necessity for updated risk management frameworks that can accommodate the complexities of frontier AI. The paper encourages collaboration among various actors in the AI ecosystem to address these challenges effectively.
Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification
Audio & Speech
- Existing bioacoustic systems are limited by pre-trained models that only utilize the 0-8 kHz baseband.
- The proposed multi-band encoding framework effectively captures higher-frequency information from animal vocalizations.
- Fused representations from multi-band encoding outperform traditional methods in classification tasks.
- The study provides an open-source toolkit for the bioacoustics community to implement the proposed methods.
Summary
This paper addresses the limitations of existing computational bioacoustics systems that primarily rely on audio models pre-trained at 16 kHz, which restricts their analysis to the 0-8 kHz baseband and discards higher-frequency information crucial for understanding animal vocalizations. The authors propose an adaptive multi-band encoding framework that decomposes the full spectrum of animal calls into multiple frequency bands, allowing for the extraction of band features that are then fused into a unified representation. The study conducts classification experiments on three bioacoustic datasets using eight pre-trained models and five fusion strategies, demonstrating that the multi-band approach consistently outperforms both the conventional baseband and time-expansion methods. The findings suggest that utilizing high-frequency information can significantly enhance the classification of animal calls, paving the way for more effective bioacoustic analysis.
Methodology
The authors developed a heterodyning-based multi-band processing approach that involves three main stages: spectral band decomposition, per-band representation extraction using pre-trained encoders, and fusion of the band-level representations into a unified embedding for classification. This method allows for the effective utilization of high-frequency information in animal vocalizations.
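To make the heterodyning idea concrete, the rough sketch below shifts a high-frequency band of a 32 kHz recording down to the 0-8 kHz range so that a 16 kHz-pretrained encoder can process it. The filter orders, band edges, and sample data are assumptions for illustration, not the authors' exact pipeline.

```python
# Illustrative heterodyne band-shift: move an 8-15.5 kHz band of a 32 kHz
# signal down to baseband before encoding; not the paper's exact implementation.
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

def shift_band_to_baseband(x, fs, band_lo, band_hi, target_fs=16_000):
    # isolate the band of interest
    sos = butter(8, [band_lo, band_hi], btype="band", fs=fs, output="sos")
    band = sosfiltfilt(sos, x)
    # heterodyne: a carrier at the band's lower edge shifts it down to start at 0 Hz
    t = np.arange(len(x)) / fs
    shifted = 2.0 * band * np.cos(2 * np.pi * band_lo * t)
    # remove the sum-frequency image, then resample so a 16 kHz encoder can ingest it
    sos_lp = butter(8, band_hi - band_lo, btype="low", fs=fs, output="sos")
    return resample_poly(sosfiltfilt(sos_lp, shifted), target_fs, fs)

fs = 32_000
x = np.random.randn(fs)                                   # 1 s of dummy audio
baseband = resample_poly(x, 16_000, fs)                   # conventional 0-8 kHz view
high_band = shift_band_to_baseband(x, fs, 8_000, 15_500)  # previously discarded band
# each band would then be embedded by a pretrained 16 kHz encoder and the embeddings fused
```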
Results
The classification experiments revealed that the multi-band representations significantly improved class separation and classification accuracy compared to baseband and time-expansion baselines across two of the three datasets tested. The results indicate that the proposed method can leverage previously unused high-frequency information to enhance bioacoustic classification tasks.
Implications
The findings suggest that adaptive multi-band encoding can revolutionize bioacoustic classification by enabling researchers to analyze a broader spectrum of animal vocalizations. This could lead to improved understanding of animal communication and behavior, as well as advancements in ecological monitoring and conservation efforts.
FiLMMeD: Feature-wise Linear Modulation for Cross-Problem Multi-Depot Vehicle Routing
Optimization
- FiLMMeD is the first MTL model explicitly targeting the MDVRP.
- Introduces Feature-wise Linear Modulation to enhance generalization across diverse constraints.
- Demonstrates Preference Optimization as a superior alternative to Reinforcement Learning in MTL.
- Employs a targeted curriculum learning strategy to improve model training.
Summary
The paper presents FiLMMeD, a novel neural-based model designed to tackle the Multi-Depot Vehicle Routing Problem (MDVRP) through multi-task learning (MTL). Traditional methods struggle with the computational complexity of MDVRP, especially given the diverse constraints that arise in real-world logistics. FiLMMeD addresses these challenges by introducing Feature-wise Linear Modulation (FiLM) to dynamically adjust internal representations based on active constraints, enhancing model generalization across 24 MDVRP variants. The authors also demonstrate Preference Optimization as a more effective alternative to Reinforcement Learning within the MTL framework. Additionally, a targeted curriculum learning strategy is employed to progressively introduce the model to complex constraint interactions, mitigating generalization gaps. Experimental results show that FiLMMeD consistently outperforms state-of-the-art baselines across various MDVRP formulations, establishing its effectiveness and versatility in solving complex vehicle routing problems.
Methodology
The methodology involves augmenting a standard Transformer encoder with Feature-wise Linear Modulation (FiLM) to condition internal representations based on active constraints. The model is trained using a multi-task learning approach that allows it to learn from multiple MDVRP variants simultaneously. A targeted curriculum learning strategy is implemented to progressively expose the model to more complex constraints, enhancing its ability to generalize across different problem formulations.
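The FiLM mechanism itself is simple: a constraint vector produces per-channel scale and shift parameters that modulate the encoder's hidden states. The sketch below shows the generic FiLM operation; layer sizes, the constraint encoding, and where the layer sits in the network are assumptions, not the paper's exact architecture.

```python
# Minimal FiLM conditioning sketch: constraints -> (gamma, beta) -> feature modulation.
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    def __init__(self, constraint_dim: int, hidden_dim: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(constraint_dim, 2 * hidden_dim)

    def forward(self, h: torch.Tensor, constraints: torch.Tensor) -> torch.Tensor:
        # h: (batch, nodes, hidden), constraints: (batch, constraint_dim)
        gamma, beta = self.to_gamma_beta(constraints).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * h + beta.unsqueeze(1)

h = torch.randn(4, 50, 128)                               # 4 instances, 50 customer nodes
constraints = torch.tensor([[1.0, 0.0, 1.0]]).repeat(4, 1)  # active-constraint flags
film = FiLMLayer(constraint_dim=3, hidden_dim=128)
print(film(h, constraints).shape)                          # torch.Size([4, 50, 128])
```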
Results
Extensive experiments demonstrate that FiLMMeD outperforms existing state-of-the-art methods across 24 MDVRP variants, including 8 new formulations. The model shows significant improvements in solution quality and computational efficiency, validating its design and approach.
Implications
The implications of this work extend to logistics and transportation sectors, where efficient routing solutions are critical for operations, especially in the context of e-commerce. The ability to adapt to varying constraints without retraining can lead to more agile and responsive logistics systems.
Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
Time Series
Interpretability
- Introduction of an efficient SHAP-based explainability algorithm for TSFMs.
- Evaluation of Chronos-2 and TabPFN-TS for load forecasting against state-of-the-art models.
- Demonstration of meaningful use of covariates in load predictions.
- Alignment of model explanations with established domain knowledge.
Summary
This paper addresses the challenge of explainability in Time Series Foundation Models (TSFMs) for load forecasting in energy systems. The authors propose an efficient algorithm for computing Shapley Additive Explanations (SHAP) tailored to TSFMs, which enhances transparency and trust in model predictions. The approach utilizes temporal and covariate masking to create coalition samples, allowing for scalable explanations without extensive computational costs. The study evaluates two TSFMs, Chronos-2 and TabPFN-TS, on a day-ahead load forecasting task using operational data from a transmission system operator in Germany. The results demonstrate that both models achieve competitive predictive performance compared to a Transformer model trained on extensive historical data. Furthermore, the explanations generated align with established domain knowledge, indicating that the models effectively utilize relevant covariates such as weather and calendar information. Overall, the findings suggest that TSFMs can be reliable and transparent tools for operational energy forecasting.
Methodology
The authors developed a SHAP-based algorithm that employs temporal and covariate masking to efficiently compute explanations for TSFMs. This method allows for the generation of coalition samples by selectively withholding inputs, which are then processed to derive SHAP values without the need for extensive background sampling.
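The essence of masking-based attribution is to treat groups of inputs (history blocks, individual covariates) as players in a cooperative game and to measure how withholding each group changes the forecast. The tiny sketch below computes exact Shapley values over four hand-named feature groups with a toy forecaster; the grouping, the zero-baseline masking, and the model are assumptions, and the paper's algorithm uses a more scalable sampling scheme rather than this exhaustive enumeration.

```python
# Exact Shapley attribution over masked feature groups (toy example).
import itertools
import math

groups = ["recent_load", "older_load", "temperature", "calendar"]
x = {"recent_load": 1.2, "older_load": 0.8, "temperature": 0.5, "calendar": 1.0}

def forecaster(inp):                       # stand-in for a TSFM point forecast
    return 2.0 * inp["recent_load"] + 0.5 * inp["temperature"] + 0.3 * inp["calendar"]

def masked(coalition):                     # withheld groups get a neutral baseline
    return {g: (x[g] if g in coalition else 0.0) for g in groups}

def shapley(target):
    others = [g for g in groups if g != target]
    value = 0.0
    for r in range(len(others) + 1):
        for S in itertools.combinations(others, r):
            gain = forecaster(masked(set(S) | {target})) - forecaster(masked(set(S)))
            weight = (math.factorial(len(S)) * math.factorial(len(others) - len(S))
                      / math.factorial(len(groups)))
            value += weight * gain
    return value

for g in groups:
    print(f"{g}: {shapley(g):+.3f}")
```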
Results
Chronos-2 and TabPFN-TS achieved competitive predictive performance in a zero-shot setting, comparable to a Transformer model trained on multiple years of data. The explanations derived from the models were consistent with domain knowledge, confirming their effective use of covariates.
Implications
The proposed approach enhances the transparency of TSFMs, making them suitable for deployment in operational energy forecasting. This is particularly relevant in light of regulatory frameworks emphasizing the need for explainability in AI applications within critical infrastructure.
A Multi-Dataset Benchmark of Multiple Instance Learning for 3D Neuroimage Classification
Computer Vision
Efficient ML
- Mean pooling MIL outperforms or matches advanced MIL and 3D CNN methods on several tasks.
- Attention-based methods do not significantly improve performance compared to simple mean pooling.
- The study highlights the efficiency of mean pooling MIL, being 25 times faster to train than complex alternatives.
- Quality of learned attention in existing MIL methods is questioned, with trivial baselines performing comparably.
Summary
This paper presents a comprehensive evaluation of multiple instance learning (MIL) techniques for classifying 3D neuroimages, specifically focusing on CT and MRI scans. The authors systematically compare various methods, including simple MIL, attention-based MIL, 3D convolutional neural networks (CNNs), and 3D vision transformers (ViTs) across three CT and four MRI datasets, which include two large datasets with over 10,000 scans each. The study aims to guide practitioners in selecting effective neural network architectures for neuroimage classification, particularly in resource-constrained settings. The findings reveal that a straightforward mean pooling MIL approach often matches or surpasses the performance of more complex attention-based methods and 3D CNNs on moderate-sized tasks while being significantly faster to train. The authors also analyze the quality of learned attention in these models and propose that the lack of substantial gains from advanced MIL methods suggests room for innovation in future designs. The paper concludes by providing a reproducible framework for further research in this area.
Methodology
The authors conducted a systematic comparison of different MIL architectures, pooling strategies, and encoders across multiple datasets. They evaluated the performance of simple mean pooling against attention-based methods and 3D CNNs, analyzing training efficiency and classification accuracy. Additionally, they examined the quality of learned attention using instance-level labels from a specific dataset and created a semi-synthetic dataset to evaluate classifier performance against a Bayes estimator.
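The baseline at the center of the study is deliberately simple: encode each 2D slice of a scan independently, average the slice embeddings into one bag vector, and classify that vector. The sketch below illustrates the pattern; the encoder, sizes, and slice count are placeholders rather than the benchmark's configuration.

```python
# Minimal mean-pooling MIL sketch: slices are instances, the bag is their mean embedding.
import torch
import torch.nn as nn

class MeanPoolMIL(nn.Module):
    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, slices: torch.Tensor) -> torch.Tensor:
        # slices: (num_slices, channels, H, W) for a single scan
        z = self.encoder(slices)           # (num_slices, embed_dim)
        bag = z.mean(dim=0)                # order-invariant mean pooling
        return self.head(bag)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(1 * 64 * 64, 256), nn.ReLU())
model = MeanPoolMIL(encoder, embed_dim=256, num_classes=2)
scan = torch.randn(40, 1, 64, 64)          # 40 axial slices of one scan
print(model(scan).shape)                   # torch.Size([2])
```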
Results
The results indicated that mean pooling MIL consistently matched or outperformed attention-based MIL methods and 3D CNNs on moderate-sized tasks, while remaining competitive on larger datasets. The analysis of attention quality revealed that no learned attention method surpassed a simple Gaussian baseline, raising questions about the effectiveness of current attention mechanisms in MIL.
Implications
The findings suggest that simpler models like mean pooling MIL should be considered as strong baselines in neuroimage classification tasks. The study also indicates potential avenues for future research in improving MIL methodologies, particularly in understanding the limitations of current attention-based approaches.
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
Large Language Models
NLP
Efficient ML
- AutoSP is the first automated solution for optimizing LLM training for long-context tasks.
- It integrates sequence parallelism and activation-checkpointing into the PyTorch-2.0 compilation stack.
- AutoSP significantly increases training context lengths without compromising runtime performance.
- The solution simplifies the implementation of complex long-context optimizations for developers.
Summary
The paper introduces AutoSP, a novel automated solution designed to optimize the training of large language models (LLMs) for long-context tasks, which involve processing input sequences of tens to hundreds of thousands of tokens. Existing training libraries primarily focus on optimizing for large parameter counts, making it challenging for developers to implement long-context optimizations such as sequence parallelism without extensive expertise. AutoSP addresses this by compiling models and applying targeted optimizations, including automated sequence parallelism and long-context aware activation-checkpointing. The evaluation of AutoSP on both NVIDIA and AMD hardware shows significant improvements in training context lengths—up to 2.7× for NVIDIA and 2.5× for AMD—while maintaining negligible costs to runtime performance. This advancement allows for more efficient training of LLMs in scenarios requiring long-context processing, thereby enhancing productivity and accessibility for developers.
Methodology
AutoSP employs a compiler-based approach to implement sequence parallelism within the PyTorch-2.0 ecosystem. It consists of two main components: a sequence-parallel transformation pass that automatically manages communication collectives and reshapes activations, and a sequence-aware activation checkpointing pass that optimizes memory usage during long-context training. This allows for seamless integration of long-context optimizations into existing training pipelines with minimal code changes.
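For context, the sketch below shows the stock PyTorch activation-checkpointing mechanism that the long-context checkpointing pass described above builds on: activations inside each block are recomputed during the backward pass instead of being stored. This is the generic primitive only, not AutoSP's compiler pass, and the model sizes are placeholders.

```python
# Generic activation checkpointing with stock PyTorch (illustrative only).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    for _ in range(4)
)

def forward_with_checkpointing(x):
    for block in blocks:
        # per-block activations are recomputed in backward, trading compute
        # for the memory that long sequences would otherwise consume
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(1, 2048, 256, requires_grad=True)   # longer-than-usual sequence
y = forward_with_checkpointing(x)
y.sum().backward()
print(x.grad.shape)
```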
Results
The evaluation of AutoSP revealed that it can increase the training context length by up to 2.7× on NVIDIA hardware and 2.5× on AMD hardware compared to competitive hand-written baselines, all while incurring negligible costs to runtime performance. This demonstrates AutoSP's effectiveness in enhancing the trainability of LLMs for long-context tasks.
Implications
The development of AutoSP has significant implications for the field of natural language processing, particularly in applications requiring long-context understanding such as document analysis, multi-step reasoning, and extended dialogues. By simplifying the implementation of long-context training, AutoSP can enhance developer productivity and broaden access to advanced LLM capabilities.
Random Cloud: Finding Minimal Neural Architectures Without Training
Efficient ML
- Introduces a training-free method for neural architecture search called Random Cloud.
- Achieves significant parameter reduction while maintaining or improving accuracy compared to traditional pruning methods.
- Evaluates networks without backpropagation, reducing computational costs associated with full training cycles.
- Demonstrates effectiveness across seven diverse classification datasets.
Summary
The paper introduces the Random Cloud method, a novel approach to neural architecture search that eliminates the need for training during the topology discovery phase. This method focuses on identifying minimal feedforward network architectures through a stochastic exploration process, followed by a progressive reduction of network structure. Unlike traditional methods that require full training cycles for pruning, Random Cloud evaluates randomly initialized networks solely based on their classification accuracy without backpropagation. The approach involves generating a 'cloud' of networks, assessing their performance, and iteratively reducing their complexity until the best-performing candidate is identified. The final candidate is then trained using standard backpropagation. The method was empirically tested across seven classification benchmarks, demonstrating that it matches or surpasses the performance of magnitude and random pruning techniques while achieving significant parameter reduction and reduced computational costs. Notably, the Random Cloud method achieved a 4.9 percentage point increase in accuracy on the Sonar dataset compared to magnitude pruning, with an 87% reduction in parameters, and was faster than both pruning methods in four out of five datasets evaluated.
Methodology
The Random Cloud method consists of three phases: Exploration, where a cloud of networks is generated and evaluated based on classification accuracy without training; Selection, where the best-performing network is chosen based on accuracy and minimal topology; and Refinement, where the selected network is trained using backpropagation. The topology reduction is performed by sequentially removing neurons from the last hidden layer until no further reduction is possible.
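The sketch below gives a rough, simplified picture of the exploration-and-reduction loop: score a cloud of untrained random networks by classification accuracy alone, shrink the hidden layer while accuracy holds, and only then hand the survivor to backpropagation. Cloud size, the stopping rule, and the data are assumptions for illustration, not the paper's procedure.

```python
# Simplified Random Cloud-style search: no training during topology discovery.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def random_net(hidden):
    return {"W1": rng.normal(size=(10, hidden)), "W2": rng.normal(size=(hidden, 2))}

def accuracy(net):
    h = np.tanh(X @ net["W1"])
    pred = (h @ net["W2"]).argmax(axis=1)
    return (pred == y).mean()

def best_of_cloud(hidden, cloud_size=50):
    # Exploration: evaluate a "cloud" of untrained candidates at this width
    return max((random_net(hidden) for _ in range(cloud_size)), key=accuracy)

hidden, best = 32, best_of_cloud(32)
while hidden > 1:
    candidate = best_of_cloud(hidden - 1)
    if accuracy(candidate) < accuracy(best):   # stop shrinking once accuracy drops
        break
    hidden, best = hidden - 1, candidate

print(f"selected width: {hidden}, untrained accuracy: {accuracy(best):.2f}")
# Refinement (not shown): train the selected topology with backpropagation.
```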
Results
The Random Cloud method was evaluated on seven classification datasets, achieving statistically significant improvements in accuracy and parameter reduction. It matched or outperformed magnitude and random pruning methods in six out of seven datasets, with notable results such as a 4.9 percentage point increase in accuracy on the Sonar dataset and an 87% reduction in parameters. The method was also faster than both pruning techniques in four out of five datasets.
Implications
The Random Cloud method offers a promising alternative for neural architecture search, particularly in scenarios where computational resources are limited. Its ability to discover minimal architectures without extensive training cycles can lead to more efficient model deployment in various machine learning applications.
AutoREC: A software platform for developing reinforcement learning agents for equivalent circuit model generation from electrochemical impedance spectroscopy data
Reinforcement Learning
- AutoREC is an open-source platform for automating ECM generation from EIS data using reinforcement learning.
- The platform formulates ECM construction as a sequential decision-making problem, improving scalability and efficiency.
- The RL agent achieved over 99.6% success on synthetic datasets and showed strong generalization to real-world EIS data.
- AutoREC addresses the limitations of traditional manual ECM identification methods, enabling faster and more consistent analysis.
Summary
This paper presents AutoREC, an open-source Python package designed for the development of reinforcement learning (RL) agents that automatically generate equivalent circuit models (ECMs) from electrochemical impedance spectroscopy (EIS) data. Traditional ECM identification relies on manual trial-and-error methods, which are time-consuming and require expert knowledge, limiting scalability in automated experimental setups. AutoREC reformulates ECM construction as a sequential decision-making problem within a Markov Decision Process framework. The platform employs a Double Deep Q-Network with prioritized experience replay and a dead-loop mitigation strategy to navigate the complex action space for circuit generation. The authors trained an RL agent using AutoREC and assessed its performance across various datasets, achieving a success rate of over 99.6% on synthetic datasets and demonstrating strong generalization capabilities on unseen experimental EIS data from diverse applications, including batteries and corrosion systems. The findings suggest that AutoREC can significantly enhance the efficiency and adaptability of ECM generation, making it a valuable tool for integration into automated electrochemical workflows.
Methodology
AutoREC utilizes a Double Deep Q-Network (DDQN) framework with prioritized experience replay to train reinforcement learning agents. The construction of ECMs is modeled as a Markov Decision Process, allowing the agent to make sequential decisions about circuit elements and their connections. A dead-loop mitigation strategy is implemented to enhance exploration within the action space.
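As background for the learning component, the sketch below shows the standard Double-DQN target computation that DDQN agents such as this one rely on: the online network selects the next action and the target network scores it. The tiny networks, state dimension, and batch are illustrative assumptions with no circuit-construction specifics.

```python
# Generic Double-DQN TD-target computation (illustrative building block only).
import torch
import torch.nn as nn

n_actions, gamma = 6, 0.99
online = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, n_actions))
target = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, n_actions))
target.load_state_dict(online.state_dict())

state = torch.randn(16, 8)
action = torch.randint(0, n_actions, (16,))
reward = torch.randn(16)
next_state = torch.randn(16, 8)
done = torch.zeros(16)

with torch.no_grad():
    # Double DQN: online net picks the next action, target net evaluates it
    next_action = online(next_state).argmax(dim=1, keepdim=True)
    next_q = target(next_state).gather(1, next_action).squeeze(1)
    td_target = reward + gamma * (1.0 - done) * next_q

q = online(state).gather(1, action.unsqueeze(1)).squeeze(1)
loss = nn.functional.smooth_l1_loss(q, td_target)
loss.backward()
```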
Results
The trained RL agent demonstrated a success rate exceeding 99.6% on synthetic datasets and effectively generalized to unseen experimental EIS data from various electrochemical systems, including batteries and corrosion processes.
Implications
AutoREC has significant implications for the automation of electrochemical data analysis, potentially accelerating materials discovery and optimizing experimental workflows in self-driving laboratories. Its ability to generate ECMs autonomously can reduce the reliance on human expertise and improve the scalability of electrochemical research.
Mini-Batch Class Composition Bias in Link Prediction
Graph Learning
- GNNs trained for link prediction may learn trivial heuristics rather than meaningful representations.
- Mini-batch class composition introduces bias that affects the learning of graph features.
- Randomizing mini-batch composition improves feature alignment with node classification tasks.
- Current link prediction methods may overestimate their ability to generalize across tasks.
Summary
This paper investigates the performance of Graph Neural Networks (GNNs) in link prediction tasks, particularly focusing on the biases introduced by mini-batch class composition. The authors argue that while GNNs have shown the ability to learn transferable representations across graphs for node classification, this does not hold true for link prediction tasks. They identify that popular link prediction models can exploit a mini-batch dependent heuristic due to batch normalization layers, allowing them to predict edges without learning complex node class features. This leads to an overestimation of the models' capabilities to generalize across tasks. To address this issue, the authors propose a method to randomize the composition of positive and negative edges in mini-batches. Their experiments reveal that while this adjustment decreases link prediction performance, it enhances the alignment of learned features with those relevant to node classification, suggesting that GNNs can indeed learn meaningful representations when trained appropriately.
Methodology
The authors analyze common link prediction models and their training procedures, focusing on the impact of mini-batch composition on learning. They implement a randomized approach to mini-batch construction, varying the ratio of positive to negative edges, and evaluate the resulting performance and feature alignment with node classification tasks.
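The intervention itself amounts to drawing a fresh positive/negative edge ratio for every mini-batch instead of a fixed split, so batch statistics stop encoding the label. The sketch below shows one way to do that; the sampling range and batch size are assumptions, not the paper's exact settings.

```python
# Randomized positive/negative composition for link-prediction mini-batches.
import torch

def sample_link_batch(pos_edges, neg_edges, batch_size=512):
    # draw a fresh positive fraction each step instead of a fixed 50/50 split
    pos_frac = torch.empty(1).uniform_(0.1, 0.9).item()
    n_pos = int(batch_size * pos_frac)
    n_neg = batch_size - n_pos
    pos = pos_edges[torch.randint(len(pos_edges), (n_pos,))]
    neg = neg_edges[torch.randint(len(neg_edges), (n_neg,))]
    edges = torch.cat([pos, neg])
    labels = torch.cat([torch.ones(n_pos), torch.zeros(n_neg)])
    perm = torch.randperm(batch_size)          # shuffle so order carries no signal
    return edges[perm], labels[perm]

pos_edges = torch.randint(0, 1000, (5000, 2))
neg_edges = torch.randint(0, 1000, (5000, 2))
edges, labels = sample_link_batch(pos_edges, neg_edges)
print(edges.shape, labels.mean().item())       # composition varies batch to batch
```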
Results
The results indicate that while randomizing mini-batch composition leads to a decrease in link prediction performance, it significantly improves the alignment of the learned representations with features relevant to node classification. This suggests that GNNs can learn more meaningful graph representations when trained under the right conditions.
Implications
The findings imply that practitioners should be cautious about the training procedures used for link prediction tasks in GNNs, as standard methods may lead to biased learning. The study encourages the exploration of alternative training regimes to enhance the generalization capabilities of GNNs across different graph-related tasks.
NeuroPlastic: A Plasticity-Modulated Optimizer for Biologically Inspired Learning Dynamics
Optimization
- NeuroPlastic introduces a biologically inspired optimization approach that combines multiple learning signals.
- The optimizer shows significant improvements over traditional gradient-only methods, particularly in challenging datasets.
- A stabilization mechanism is included to maintain stable optimization dynamics across various learning rates.
- The framework provides reproducible benchmarks and diagnostic tools for analyzing optimization behavior.
Summary
The paper introduces NeuroPlastic, a novel optimizer that enhances traditional gradient-based updates by incorporating a plasticity modulation mechanism inspired by biological learning dynamics. Unlike conventional optimizers that rely solely on local gradient statistics, NeuroPlastic employs a multi-signal modulation approach that integrates gradient magnitude, activity-like, and memory-like statistics. This design allows for dynamic scaling of gradient updates, making it particularly effective in scenarios with limited or noisy learning signals. The authors evaluate NeuroPlastic against standard optimizers like SGD and Adam across various image classification benchmarks, demonstrating consistent performance improvements, especially in challenging tasks and reduced-data settings. The findings suggest that leveraging multi-signal plasticity can significantly enhance the adaptability and effectiveness of gradient-driven optimization in deep learning.
Methodology
NeuroPlastic augments gradient updates by using a modulation coefficient derived from three normalized signals: gradient magnitude, an exponential moving average of gradient activity, and a memory term based on the ratio of first- to second-moment estimates. This modulation factor scales the effective gradient update, allowing for more nuanced learning dynamics. The optimizer was tested against gradient-only and standard optimizers on various image classification benchmarks.
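The sketch below illustrates the general shape of such an update: a per-parameter modulation factor built from a gradient-magnitude signal, an EMA of gradient activity, and a first/second-moment ratio scales a plain gradient step. The normalization, combination weights, and clamp range are assumptions made for illustration, not the paper's formula.

```python
# Hedged sketch of a plasticity-modulated gradient step (not the exact optimizer).
import torch

@torch.no_grad()
def plasticity_modulated_step(p, state, lr=1e-2, betas=(0.9, 0.999), eps=1e-8):
    g = p.grad
    state.setdefault("m", torch.zeros_like(p))        # first moment
    state.setdefault("v", torch.zeros_like(p))        # second moment
    state.setdefault("act", torch.zeros_like(p))      # EMA of gradient activity
    state["m"].mul_(betas[0]).add_(g, alpha=1 - betas[0])
    state["v"].mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])
    state["act"].mul_(0.99).add_(g.abs(), alpha=0.01)

    magnitude = g.abs() / (g.abs().mean() + eps)             # gradient-magnitude signal
    activity = state["act"] / (state["act"].mean() + eps)    # activity-like signal
    memory = state["m"].abs() / (state["v"].sqrt() + eps)    # memory-like signal
    modulation = (magnitude + activity + memory) / 3.0
    modulation.clamp_(0.1, 10.0)                             # stabilisation guard
    p.add_(modulation * g, alpha=-lr)

w = torch.randn(5, requires_grad=True)
(w ** 2).sum().backward()
state = {}
plasticity_modulated_step(w, state)
print(w)
```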
Results
NeuroPlastic consistently outperformed gradient-only updates across several benchmarks, with notable improvements on Fashion-MNIST and stability on CIFAR-10 using ResNet-18. The method demonstrated enhanced performance particularly in scenarios with limited data, suggesting its robustness and adaptability.
Implications
The findings indicate that incorporating multi-signal plasticity into optimization algorithms can lead to more effective learning in deep neural networks, especially in environments where learning signals are weak or noisy. This approach may pave the way for developing more sophisticated optimizers that better mimic biological learning processes.
Differentiable latent structure discovery for interpretable forecasting in clinical time series
Time Series
Interpretability
Optimization
- Introduction of StructGP for continuous-time multi-task Gaussian process modeling.
- Differentiable structure learning enables the discovery of interpretable dependency structures.
- LP-StructGP captures cross-patient progression patterns through latent pathways.
- Models demonstrate superior forecasting accuracy and uncertainty calibration on clinical datasets.
Summary
This paper presents StructGP, a continuous-time multi-task Gaussian process model designed for interpretable forecasting in clinical time series data derived from electronic health records (EHRs). The model integrates process convolutions with differentiable structure learning to uncover a sparse, ordered directed acyclic graph (DAG) that represents inter-variable dependencies while maintaining uncertainty quantification. An extension, LP-StructGP, incorporates latent pathways to capture shared, temporally shifted trajectories across patients. The models are trained under constraints of sparsity and acyclicity using scalable low-rank updates. In simulations, StructGP effectively recovers ground-truth graphs and pathway assignments, demonstrating superior performance in short-horizon forecasting compared to independent-task baselines and unstructured kernels. On a real-world MIMIC-IV septic shock cohort, StructGP shows significant improvements in forecasting accuracy and calibration. The results indicate that the proposed models provide interpretable, scalable, and well-calibrated forecasting for irregular clinical time series, which is crucial for timely clinical decision-making.
Methodology
The authors developed StructGP and LP-StructGP, utilizing process convolutions and differentiable structure learning to model dependencies in clinical time series. The models were trained using an augmented Lagrangian approach with Adam optimizer, incorporating sparsity and acyclicity constraints. The training involved scalable low-rank updates, with StructGP maximizing exact marginal likelihood and LP-StructGP employing an online conditional pseudo-marginal likelihood.
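Differentiable DAG discovery of this kind typically relies on a smooth acyclicity penalty such as the NOTEARS-style term h(A) = tr(exp(A ∘ A)) − d, which is zero exactly when the weighted adjacency matrix A is acyclic. The sketch below optimizes that generic penalty with Adam and an L1 sparsity term as a stand-in for the data likelihood; the exact constraint handling in the paper may differ.

```python
# Generic differentiable acyclicity penalty for structure learning (illustrative).
import torch

def acyclicity_penalty(A: torch.Tensor) -> torch.Tensor:
    d = A.shape[0]
    return torch.matrix_exp(A * A).diagonal().sum() - d   # 0 iff A is a DAG

d = 5
A = torch.nn.Parameter(0.1 * torch.randn(d, d))
optimizer = torch.optim.Adam([A], lr=0.05)
rho = 10.0                                                 # constraint weight
for step in range(200):
    optimizer.zero_grad()
    # a full model would add the (negative) data log-likelihood here
    loss = rho * acyclicity_penalty(A) + 0.01 * A.abs().sum()
    loss.backward()
    optimizer.step()

print(f"acyclicity penalty after optimisation: {acyclicity_penalty(A).item():.4f}")
```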
Results
In simulations, StructGP achieved a Structural Hamming Distance approaching 0 as cohort sizes increased, indicating effective recovery of ground-truth graphs. On the MIMIC-IV septic shock cohort, StructGP improved short-horizon forecasting (RMSE of 0.68 vs. 0.88) and outperformed unstructured kernels (0.63 vs. 3.02) in terms of calibration. For long-horizon predictions, LP-StructGP further reduced errors for key variables and improved overall coverage.
Implications
The proposed models can enhance clinical decision-making by providing interpretable and accurate forecasts from irregular EHR data, potentially reducing delays in critical care interventions. Their ability to model inter-variable dependencies and patient-level trajectories may lead to better understanding and management of patient conditions.
A Randomized PDE Energy driven Iterative Framework for Efficient and Stable PDE Solutions
Theory
Efficient ML
- Introduces a PDE energy-driven framework that avoids classical matrix-based solvers and data-driven training.
- Utilizes physically constrained diffusion iterations and Gaussian smoothing for evolving initial fields.
- Demonstrates stable convergence and accurate resolution of sharp gradients in various PDEs.
- Achieves competitive accuracy and stability compared to traditional numerical methods.
Summary
This paper presents a novel framework for solving partial differential equations (PDEs) that emphasizes efficiency and stability without relying on traditional matrix-based discretization or costly data-driven training methods. The proposed PDE energy-driven framework utilizes physically constrained diffusion iterations combined with Gaussian smoothing to evolve arbitrary random initial fields. This approach enforces boundary conditions at each iteration, ensuring physical consistency. The authors apply this framework to one-dimensional Poisson, Heat, and viscous Burgers equations, demonstrating its capability to handle both steady-state and transient problems. The numerical results indicate stable convergence to unique physical solutions from random initializations, with effective resolution of sharp gradients and controlled Mean Squared Error (MSE) across various discretization parameters. Comparisons with analytical solutions reveal that the framework achieves competitive accuracy and stability, positioning it as a fast, flexible, and physically consistent alternative to traditional numerical solvers, with potential applications in both research and engineering.
Methodology
The authors developed a PDE energy-driven iterative framework that evolves random initial fields through implicit iterations while enforcing boundary conditions. The method integrates Gaussian smoothing to ensure stability and accuracy in the solution process.
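As a rough illustration of this idea on the 1D Poisson problem -u'' = f with u(0) = u(1) = 0, the sketch below evolves a random initial field with diffusion-style (Jacobi) relaxation, applies gentle Gaussian smoothing, and re-imposes the boundary conditions each iteration. Iteration count, smoothing width, and grid size are assumptions, not the paper's settings.

```python
# Toy diffusion-plus-smoothing iteration for the 1D Poisson equation.
import numpy as np
from scipy.ndimage import gaussian_filter1d

n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)               # exact solution is sin(pi x)

u = np.random.default_rng(0).normal(size=n)    # arbitrary random initial field
for _ in range(20_000):
    u[1:-1] = 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])   # Jacobi / diffusion step
    u = gaussian_filter1d(u, sigma=0.5)                  # gentle field smoothing
    u[0], u[-1] = 0.0, 0.0                               # enforce boundary conditions

mse = np.mean((u - np.sin(np.pi * x)) ** 2)
print(f"MSE vs analytical solution: {mse:.2e}")
```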
Results
The framework demonstrated stable convergence to unique physical solutions from random initializations, with effective handling of sharp gradients and controlled MSE across a range of discretization parameters. The results were validated against analytical solutions, showing competitive accuracy and stability.
Implications
This framework provides a promising alternative to traditional numerical solvers, potentially enhancing the efficiency and stability of PDE solutions in various scientific and engineering applications, particularly in dynamic and real-time predictive environments.
A Unified Framework of Hyperbolic Graph Representation Learning Methods
Graph Learning
- Introduction of HypeGRL, a unified framework for hyperbolic graph representation learning.
- Facilitates reproducible research and systematic evaluation of hyperbolic embedding methods.
- Experimental evaluation highlights performance differences in link prediction and node classification tasks.
- Provides practical insights into the strengths and limitations of existing hyperbolic GRL approaches.
Summary
This paper presents HypeGRL, an open-source framework designed to unify various hyperbolic graph representation learning (GRL) methods. Hyperbolic geometry is highlighted for its effectiveness in representing complex networks due to its ability to capture hierarchical structures and heterogeneous connectivity patterns with low-dimensional embeddings. Despite the growing number of hyperbolic GRL methods, their practical application has been hindered by fragmented implementations and a lack of standardized evaluation tools. HypeGRL addresses these issues by providing a consistent optimization interface, visualization tools, and evaluation metrics, facilitating reproducible research. The authors conduct a comprehensive experimental study using HypeGRL to evaluate the performance of hyperbolic embedding methods on two key tasks: link prediction and node classification across real-world networks. The findings reveal systematic differences in computational costs, representation efficiency, and task-dependent performance among the methods, offering valuable insights for researchers in selecting appropriate techniques for their specific applications.
Methodology
The authors developed HypeGRL as an open-source Python framework that integrates multiple hyperbolic GRL methods under a common optimization interface. This framework allows for consistent training, visualization, and evaluation of hyperbolic embeddings. The experimental study involved assessing the performance of various hyperbolic embedding methods on real-world networks, focusing on link prediction and node classification tasks.
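The common primitive across the methods such a framework unifies is the hyperbolic distance itself; the sketch below implements the standard Poincaré-ball distance formula. This is textbook geometry, not HypeGRL's API, and the sample points are arbitrary.

```python
# Standard Poincare-ball distance between two points inside the open unit ball.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * diff / (denom + eps))

u = np.array([0.1, 0.2])
v = np.array([0.7, -0.3])
print(poincare_distance(u, v))   # distances grow rapidly near the ball's boundary
```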
Results
The experimental results demonstrated that hyperbolic embedding methods can achieve comparable or superior performance to Euclidean counterparts while often requiring lower-dimensional representations. The study also revealed systematic differences in computational efficiency and effectiveness across different methods, providing insights into their practical applicability.
Implications
The introduction of HypeGRL could significantly enhance the adoption of hyperbolic geometry in graph learning tasks, enabling researchers to conduct reproducible experiments and make informed decisions when selecting embedding methods. This framework may lead to improved performance in various applications such as social network analysis, recommendation systems, and biological network modeling.
BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning
Computer Vision
Efficient ML
- BrainDINO is a self-supervised model that generalizes across diverse brain MRI tasks.
- It was trained on a large dataset of 6.6 million unlabeled axial slices from 20 different sources.
- The model outperformed existing self-supervised learning baselines, especially under conditions of limited labeled data.
- BrainDINO's representations are anatomically organized and pathology-sensitive, enhancing its clinical applicability.
Summary
The paper introduces BrainDINO, a self-supervised foundation model designed for brain MRI representation learning that generalizes across various clinical tasks. Traditional learning methods in brain MRI are often task-specific and require extensive labeled data, which is not always available. BrainDINO addresses this by utilizing a self-distilled framework trained on approximately 6.6 million unlabeled axial slices from 20 diverse datasets. This model demonstrates the ability to transfer learned representations across multiple tasks, including tumor segmentation, classification of neurodegenerative and neurodevelopmental conditions, brain age estimation, and survival modeling, among others. The authors found that BrainDINO consistently outperformed existing self-supervised baselines, particularly in scenarios with limited labeled data. The representation learned by BrainDINO is anatomically organized and sensitive to pathology, indicating its potential for robust and efficient brain imaging analysis without the need for extensive task-specific fine-tuning. This work establishes a scalable foundation for clinical applications in neuroimaging, highlighting the effectiveness of large-scale self-supervised learning in creating unified representations that can adapt to various clinical endpoints.
Methodology
BrainDINO employs a self-distillation framework similar to DINOv3, optimizing for both global semantic alignment and local structural consistency through masked patch-token prediction and multi-scale cropping. The model is pretrained on a large-scale dataset of unlabeled brain MRI slices, allowing it to learn transferable representations without requiring full-network fine-tuning for downstream tasks.
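The sketch below shows the core DINO-style self-distillation step in miniature: a student matches the teacher's centred, sharpened output on a different view, and the teacher is an exponential moving average of the student. The tiny backbone, temperatures, and momentum are illustrative assumptions, not BrainDINO's configuration.

```python
# Minimal DINO-style self-distillation step with an EMA teacher (illustrative).
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Sequential(torch.nn.Linear(256, 128), torch.nn.GELU(),
                              torch.nn.Linear(128, 64))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
center = torch.zeros(64)

def dino_loss(view_s, view_t, t_s=0.1, t_t=0.04):
    p_t = F.softmax((teacher(view_t) - center) / t_t, dim=-1).detach()
    log_p_s = F.log_softmax(student(view_s) / t_s, dim=-1)
    return -(p_t * log_p_s).sum(dim=-1).mean()

view_a, view_b = torch.randn(32, 256), torch.randn(32, 256)  # two crops of one slice
loss = dino_loss(view_a, view_b) / 2 + dino_loss(view_b, view_a) / 2
loss.backward()

with torch.no_grad():                          # EMA teacher and center updates
    m = 0.996
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1 - m)
    center.mul_(0.9).add_(0.1 * teacher(view_b).mean(dim=0))
```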
Results
The evaluation of BrainDINO showed that it consistently matched or exceeded the performance of both natural-image and MRI-specific self-supervised baselines across various tasks and data regimes. The model demonstrated strong performance in tumor segmentation, neurodevelopmental and neurodegenerative classification, brain age estimation, and survival modeling, particularly excelling in scenarios with limited labeled data.
Implications
The findings suggest that BrainDINO can serve as a foundational model for various neuroimaging applications, potentially improving the efficiency and effectiveness of clinical analyses in brain MRI. Its ability to generalize across tasks without extensive retraining could lead to advancements in personalized medicine and the development of more robust diagnostic tools.
Mind the Gap: Structure-Aware Consistency in Preference Learning
NLP
Large Language Models
Theory
- Standard surrogate minimization in preference learning can lead to vacuous consistency guarantees.
- The authors introduce a margin-shifted ranking framework to enforce H-consistency in preference learning.
- SA-DPO adapts margins based on semantic distances, improving handling of synonyms and ambiguous pairs.
- The Margin-Capacity Profile quantifies the trade-off between theoretical consistency and model capacity.
Summary
This paper addresses the theoretical inconsistencies in preference learning methods, particularly those used for aligning Large Language Models (LLMs) with human intent. The authors critique the prevalent Direct Preference Optimization (DPO) approach, which minimizes surrogate losses as proxies for the true pairwise ranking loss. They demonstrate that standard surrogates can yield vacuous generalization guarantees for equicontinuous hypothesis sets typical of neural networks. To overcome this issue, the authors propose a margin-shifted ranking framework that introduces H-consistency bounds dependent on a separation margin. They further develop a novel objective called Structure-Aware DPO (SA-DPO), which adapts the margin based on the semantic distance between responses, effectively handling synonyms and hard pairs. The paper also introduces the Margin-Capacity Profile, analyzing the trade-off between consistency and model limitations, and shows that heavy-tailed surrogates provide better consistency guarantees than standard logistic loss. Overall, the work bridges theoretical insights with practical applications in preference learning, offering a principled approach to improve model alignment.
Methodology
The authors formulate LLM preference learning as a pairwise ranking problem and derive H-consistency bounds for margin-shifted surrogates. They introduce the SA-DPO objective, which dynamically adjusts margins based on semantic distances, and analyze the Margin-Capacity Profile to understand the implications of margin enforcement on model performance.
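The sketch below shows one way a margin-shifted DPO objective in this spirit can look: the usual DPO logit is reduced by a margin that grows with the semantic distance between the chosen and rejected responses, so near-synonymous pairs are penalised less. The margin schedule and the toy log-probabilities are assumptions, not the paper's exact SA-DPO objective.

```python
# Sketch of a semantic-distance-dependent margin added to the DPO loss.
import torch
import torch.nn.functional as F

def margin_shifted_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                            ref_chosen_logps, ref_rejected_logps,
                            semantic_distance, beta=0.1, margin_scale=2.0):
    # standard DPO logit: difference of policy-vs-reference log-ratios
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    margin = margin_scale * semantic_distance   # larger required gap for distant pairs
    return -F.logsigmoid(logits - margin).mean()

# toy batch of 4 preference pairs
pol_c = torch.tensor([-5.0, -6.0, -4.0, -7.0])
pol_r = torch.tensor([-8.0, -6.5, -9.0, -7.2])
ref_c = torch.tensor([-6.0, -6.0, -5.0, -7.0])
ref_r = torch.tensor([-7.0, -6.5, -8.0, -7.0])
dist = torch.tensor([0.9, 0.1, 0.7, 0.05])      # e.g. 1 - cosine similarity
print(margin_shifted_dpo_loss(pol_c, pol_r, ref_c, ref_r, dist))
```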
Results
The paper proves that unconstrained surrogate minimization leads to vacuous consistency bounds and establishes that a confidence gap is essential for H-consistency. The introduction of SA-DPO shows improved performance in managing synonyms and hard pairs, while the analysis of the Margin-Capacity Profile reveals that heavy-tailed losses provide superior consistency guarantees compared to logistic loss.
Implications
The findings have significant implications for the alignment of LLMs with human preferences, suggesting that adopting structure-aware methods can enhance model performance and reliability. The theoretical foundations laid out in this work can guide future research in preference learning and model optimization.
A Multimodal and Explainable Machine Learning Approach to Diagnosing Multi-Class Ejection Fraction from Electrocardiograms
Multimodal
- Developed a multimodal machine learning framework for LVEF assessment using ECG and EHR data.
- Achieved high classification performance across four LVEF categories, outperforming unimodal models.
- Utilized SHAP for feature attribution to enhance model explainability.
- Demonstrated potential for practical application in resource-limited healthcare settings.
Summary
This paper presents a novel multimodal machine learning framework for assessing left ventricular ejection fraction (LVEF) from electrocardiograms (ECGs) and electronic health records (EHRs). Traditional LVEF assessment relies on echocardiography, which is often inaccessible in resource-limited settings. The authors developed a model that combines engineered features from 12-lead ECG time series with structured EHR data to classify LVEF into four clinically relevant categories: normal (≥50%), mildly reduced (40–50%), moderately reduced (30–40%), and severely reduced (<30%). Using retrospective data from Hartford HealthCare, they trained an XGBoost model on 36,784 ECG-echocardiogram pairs from 30,952 outpatients and validated it on 19,966 ECGs from a subsequent period. The multimodal model demonstrated superior performance compared to unimodal baselines, achieving one-vs-rest AUROCs of 0.95 for severe, 0.92 for moderate, 0.82 for mild, and 0.91 for normal LVEF. The study emphasizes the importance of explainability, utilizing SHAP attributions to identify influential features, thereby supporting clinical decision-making. The findings suggest that this ECG-based approach can serve as a practical screening tool to prioritize further imaging in settings where resources are constrained.
Methodology
The authors employed a multimodal approach that integrates engineered features from 12-lead ECG time series with structured EHR variables. They utilized an XGBoost classifier to model LVEF stratification into four categories, focusing on computational efficiency and clinical deployment. The model was trained on a large dataset of ECG-echocardiogram pairs and validated on a separate temporal dataset.
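The overall recipe is a gradient-boosted tree classifier over concatenated ECG-derived and EHR features, explained with TreeSHAP. The sketch below shows that recipe end to end on synthetic data; the feature names, sizes, and hyperparameters are placeholders rather than the study's configuration.

```python
# Hedged sketch: XGBoost over combined ECG + EHR features, explained with TreeSHAP.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(0.4, 0.1, n),     # placeholder ECG-derived feature
    rng.normal(70, 12, n),       # placeholder EHR feature, e.g. age
    rng.normal(0.0, 1.0, n),     # another engineered ECG feature
])
y = rng.integers(0, 4, n)        # 0=normal, 1=mild, 2=moderate, 3=severe LVEF class

model = xgb.XGBClassifier(n_estimators=100, max_depth=4, eval_metric="mlogloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])   # per-class feature attributions
print(np.shape(shap_values))
print(model.predict_proba(X[:1]))
```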
Results
The multimodal model achieved one-vs-rest AUROCs of 0.95 for severe LVEF reduction, 0.92 for moderate, 0.82 for mild, and 0.91 for normal LVEF. The model outperformed both ECG-only and EHR-only baselines, with the largest improvement noted in the moderate LVEF category. The study also highlighted the model's robustness under temporal validation.
Implications
This research supports the use of ECG-based multimodal approaches for LVEF stratification, which could facilitate early detection and triage of heart failure in diverse healthcare settings, particularly where echocardiography is not readily available. It underscores the potential for AI to enhance clinical decision-making and improve patient outcomes.
Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
Large Language Models
Reinforcement Learning
Theory
- Introduces Entrocraft for precise entropy control in RL training of LLMs.
- Theoretical insights connect entropy changes to advantage distributions.
- Linear annealing of entropy schedules is found to be the most effective.
- Empirical results show significant improvements in generalization and output diversity.
Summary
This paper addresses the issue of performance saturation in reinforcement learning (RL) for large language models (LLMs), which is characterized by the collapse of entropy during training. Traditional methods to prevent entropy collapse, such as regularization and clipping, often lead to instability in entropy curves, hindering performance improvements. The authors introduce 'Entrocraft', a novel rejection-sampling approach that allows for precise control over the entropy curve by biasing advantage distributions without the need for objective regularization. Theoretical analysis reveals a negative relationship between entropy changes and advantages, emphasizing the impact of model confidence. Through systematic experimentation, the authors determine that a linear annealing schedule for entropy performs best. Empirical results demonstrate that Entrocraft significantly enhances generalization, output diversity, and prolongs the training duration before performance plateaus, with a 4B model outperforming an 8B baseline and achieving a 50% increase in pass@K metrics.
Methodology
The authors propose a rejection-sampling method called Entrocraft that filters rollouts based on current entropy levels, allowing for user-customized entropy schedules. This method modifies the advantage distribution to achieve desired entropy levels without requiring regularization, making it compatible with existing RL algorithms. Theoretical analysis supports the design by linking entropy changes to advantages under minimal assumptions.
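The sketch below gives a loose illustration of entropy-curve control by rollout filtering: a linear annealing schedule sets the target entropy at each step, and rollouts are rejected so that the retained advantage distribution pushes entropy toward that target (exploiting the negative relation between advantage and entropy change described above). The specific filtering rule and keep fraction are assumptions, not the Entrocraft algorithm.

```python
# Illustrative entropy-targeted rollout filtering with a linear annealing schedule.
import numpy as np

def target_entropy(step, total_steps, h_start=2.0, h_end=0.5):
    frac = min(step / total_steps, 1.0)
    return h_start + frac * (h_end - h_start)      # linear annealing

def filter_rollouts(advantages, current_entropy, target, keep_frac=0.75):
    order = np.argsort(advantages)
    k = int(len(advantages) * keep_frac)
    if current_entropy < target:
        return order[:k]       # drop highest-advantage rollouts so entropy can rise
    return order[-k:]          # drop lowest-advantage rollouts so entropy can fall

rng = np.random.default_rng(0)
advantages = rng.normal(size=64)
kept = filter_rollouts(advantages, current_entropy=0.9,
                       target=target_entropy(step=1000, total_steps=4000))
print(f"kept {len(kept)} of {len(advantages)} rollouts")
```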
Results
Entrocraft demonstrates substantial improvements in RL performance, with a 4B model surpassing an 8B baseline in generalization. The method sustains performance improvements for up to four times longer before plateauing and increases pass@K metrics by 50% compared to the baseline. The linear annealing schedule for entropy is identified as the most effective approach.
Implications
The findings suggest that precise control over entropy can significantly enhance the training of LLMs using RL, potentially leading to more effective models that better align with human preferences and exhibit improved reasoning capabilities. This could have broad applications in NLP tasks requiring complex decision-making.
Detecting is Easy, Adapting is Hard: Local Expert Growth for Visual Model-Based Reinforcement Learning under Distribution Shift
Reinforcement Learning
Computer Vision
Robotics
- OOD detection alone is insufficient for effective adaptation in visual MBRL under dynamics shift.
- JEPA-Indexed Local Expert Growth separates problem indexing from action correction, preserving baseline controller performance.
- The proposed method demonstrates significant OOD improvements while maintaining in-distribution performance.
- Learned experts can be reused for recurring shifts, supporting incremental knowledge growth.
Summary
This paper addresses the challenges faced by visual model-based reinforcement learning (MBRL) agents when encountering distribution shifts. While detecting such shifts is relatively straightforward, adapting to them effectively proves to be a significant hurdle. The author critiques common strategies for responding to shifts, such as planning penalties and direct policy adaptation, which often fail to enhance closed-loop control or may even degrade in-distribution performance. To tackle this issue, the paper introduces a novel approach called JEPA-Indexed Local Expert Growth. This method utilizes a frozen Joint Embedding Predictive Architecture (JEPA) for problem indexing while employing cluster-specific residual experts to provide localized action corrections without altering the baseline controller. Experimental results demonstrate that this approach yields statistically significant out-of-distribution (OOD) improvements across various shift conditions while maintaining strong in-distribution performance. Additionally, the learned experts are shown to be reusable for recurring shifts, emphasizing the concept of incremental knowledge growth rather than complete retraining. The findings suggest that effective adaptation in visual MBRL hinges on the ability to apply appropriate local action corrections after recognizing a shift.
Methodology
The paper employs a systematic empirical study to evaluate various strategies for responding to distribution shifts in visual MBRL. It introduces JEPA-Indexed Local Expert Growth, which uses a frozen JEPA representation for indexing and local experts for action correction. The approach is tested on the DMControl walker-walk task under torso-mass shifts.
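The sketch below illustrates the "index, then locally correct" pattern described above: a frozen embedding assigns the current observation to a cluster, and that cluster's small residual expert adds a correction on top of the untouched baseline action. The stand-in encoder, shapes, and threshold are assumptions made for illustration.

```python
# Sketch of frozen-embedding indexing with cluster-specific residual action experts.
import torch
import torch.nn as nn

embed_dim, action_dim, n_clusters = 32, 6, 4
frozen_encoder = nn.Linear(128, embed_dim).requires_grad_(False)  # stand-in for JEPA
centroids = torch.randn(n_clusters, embed_dim)                    # known shift modes
experts = nn.ModuleList(nn.Linear(embed_dim, action_dim) for _ in range(n_clusters))

def act(obs_features, base_action, ood_threshold=5.0):
    z = frozen_encoder(obs_features)               # frozen embedding of the observation
    dists = torch.norm(centroids - z, dim=-1)      # distance to each cluster centroid
    nearest = int(dists.argmin())
    if dists[nearest] > ood_threshold:             # far from every known mode: no expert
        return base_action
    return base_action + experts[nearest](z)       # localized residual correction

obs = torch.randn(128)
base_action = torch.zeros(action_dim)
print(act(obs, base_action))
```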
Results
The results indicate that the JEPA-Indexed Local Expert Growth method significantly improves OOD performance across four evaluated shift conditions while preserving in-distribution performance. The learned experts remain effective for subsequent encounters with the same shift, highlighting the method's capability for incremental adaptation.
Implications
This research has implications for the development of more robust reinforcement learning agents capable of adapting to changing environments without compromising their performance on previously learned tasks. It suggests a modular approach to adaptation that could be beneficial in various real-world applications where distribution shifts are common.
Super-resolution Multi-signal Direction-of-Arrival Estimation by Hankel-structured Sensing and Decomposition
Robotics
Optimization
Theory
- Introduction of a Hankel-structured sensing framework for DoA estimation.
- L2-norm estimator achieves maximum-likelihood optimality in Gaussian noise.
- L1-norm estimator shows robustness in Laplace noise, suitable for real-world applications.
- Extensive simulations demonstrate significant improvements in super-resolution capabilities.
Summary
This paper presents a novel framework for super-resolution multi-signal direction-of-arrival (DoA) estimation, addressing the challenges posed by hardware-constrained spatial sampling in modern autonomous systems. The authors introduce Hankel-structured sensing and data matrix decomposition methods that operate under both L2 and L1-norm formulations. The L2-norm estimator is shown to be maximum-likelihood optimal in the presence of white Gaussian noise, while the L1-norm estimator excels in scenarios with independent, identically distributed isotropic Laplace noise, demonstrating robustness against impulsive interference and corrupted measurements. Through extensive simulations, the proposed methods exhibit superior super-resolution capabilities, requiring lower signal-to-noise ratios (SNR) and achieving higher resolution probabilities compared to existing approaches. This work is particularly relevant for applications in wireless communications, remote sensing, and real-time localization in challenging environments.
Methodology
The authors developed a framework based on Hankel-structured sensing and matrix decomposition techniques. They formulated two estimators: one based on the L2-norm for Gaussian noise and another based on the L1-norm for Laplace noise. The performance of these estimators was evaluated through extensive simulations to assess their super-resolution capabilities and robustness against noise.
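The structuring step itself is the classic one: a single uniform-linear-array snapshot is rearranged into a Hankel matrix whose numerical rank reflects the number of sources, which decomposition-based estimators then exploit. The sketch below shows that construction with two synthetic sources; array size, angles, and noise level are illustrative assumptions.

```python
# Hankel structuring of a uniform-linear-array snapshot (illustrative).
import numpy as np
from scipy.linalg import hankel

N, d_over_lambda = 32, 0.5
angles_deg = np.array([-10.0, 12.0])                  # two incoming signals
n = np.arange(N)
steering = np.exp(2j * np.pi * d_over_lambda
                  * np.outer(n, np.sin(np.deg2rad(angles_deg))))
x = steering @ np.array([1.0, 0.8]) \
    + 0.01 * (np.random.randn(N) + 1j * np.random.randn(N))

L = N // 2                                            # pencil / window parameter
H = hankel(x[:L], x[L - 1:])                          # L x (N - L + 1) Hankel matrix
sv = np.linalg.svd(H, compute_uv=False)
print("leading singular values:", np.round(sv[:4], 2))  # two dominant values expected
```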
Results
The proposed L2-norm estimator was found to be maximum-likelihood optimal under white Gaussian noise conditions, while the L1-norm estimator demonstrated robustness in the presence of impulsive noise. The simulations indicated that the proposed methods required significantly lower SNR and achieved higher resolution probabilities than competing methods, showcasing their effectiveness in real-world scenarios.
Implications
This research has significant implications for the design of DoA estimation systems in various applications, including wireless communications, autonomous systems, and real-time tracking in complex environments. The ability to operate effectively with limited data and in the presence of noise enhances the feasibility of deploying these systems in practical scenarios.
Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction
Theory
Optimization
Efficient ML
- Introduction of Linear-Core Surrogates that combine smoothness with linear consistency rates.
- Proven differentiability and strict linear H-consistency bounds for the proposed loss functions.
- Significant computational advantages in structured prediction, allowing for unbiased stochastic gradient estimation.
- Empirical results show a 23× speedup over Structured SVMs and improved robustness to label noise.
Read more
Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction
Summary
This paper addresses the trade-off in selecting loss functions for classification tasks, particularly between smooth losses that allow for fast optimization but come with slow consistency rates, and piecewise-linear losses that offer fast, linear consistency rates but are non-differentiable. The authors introduce Linear-Core (LC) Surrogates, a novel family of convex loss functions that combine a linear core with a smooth tail, achieving differentiability while maintaining linear H-consistency bounds. This approach not only enhances optimization efficiency but also provides significant computational advantages in structured prediction tasks, where traditional methods suffer from high complexity. The proposed LC surrogates enable an unbiased stochastic gradient estimator that circumvents the quadratic complexity of exact inference, resulting in substantial speed improvements in practical applications. Empirical results demonstrate a 23× speedup over Structured SVMs in large-vocabulary sequence tagging and improved robustness against instance-dependent label noise, outperforming Cross-Entropy on corrupted CIFAR-10 datasets.
Methodology
The authors construct Linear-Core Surrogates by stitching a linear core to a smooth tail, ensuring differentiability and linear H-consistency. They extend this framework to structured prediction, developing a stochastic optimization algorithm that reduces computational complexity. Theoretical analyses are provided to establish the consistency bounds, and empirical evaluations are conducted on various datasets to demonstrate performance improvements.
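The construction can be pictured with a toy surrogate: below the knee the loss is exactly the hinge's linear core, and above it an exponential tail matches the value and slope at the junction so the result is differentiable everywhere. This is only one plausible instantiation under assumed parameters, not the paper's exact LC family.

```python
import numpy as np

def linear_core_surrogate(margin, eps=0.5):
    """Linear for margins below 1 - eps, smooth exponential tail above; C^1 at the knee."""
    m = np.asarray(margin, dtype=float)
    knee = 1.0 - eps
    linear_part = 1.0 - m                             # same slope as the hinge loss
    tail_part = eps * np.exp(-(m - knee) / eps)       # value eps and slope -1 at the knee
    return np.where(m <= knee, linear_part, tail_part)

print(linear_core_surrogate(np.linspace(-2.0, 3.0, 6)))
```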
Results
The proposed Linear-Core Surrogates achieve a 23× speedup over Structured SVMs on large-vocabulary sequence tagging tasks and show a 2.6% improvement over Cross-Entropy on corrupted CIFAR-10 datasets, highlighting both efficiency and robustness.
Implications
The findings suggest that Linear-Core Surrogates can be effectively utilized in various classification and structured prediction tasks, offering a balance between optimization efficiency and statistical consistency. This could lead to advancements in applications requiring fast and reliable predictions, such as natural language processing and computer vision.
Strait: Perceiving Priority and Interference in ML Inference Serving
Efficient ML
Optimization
Theory
- Strait enhances deadline satisfaction for dual-priority inference traffic under high GPU utilization.
- The system models data transfer and kernel execution interference to improve latency estimation.
- Priority-aware scheduling is implemented to differentiate handling of high and low-priority tasks.
- Strait reduces deadline violations for high-priority tasks significantly while keeping low-priority task performance acceptable.
Read more
Strait: Perceiving Priority and Interference in ML Inference Serving
Summary
The paper introduces Strait, a machine learning inference serving system aimed at improving deadline satisfaction for dual-priority inference traffic under high GPU utilization. Existing systems often struggle with task prioritization and accurate latency estimation, particularly in on-premises scenarios where the urgency of tasks can vary significantly. Strait addresses these challenges by modeling potential contention during data transfer and kernel execution interference through an adaptive prediction model. This model allows Strait to perform priority-aware scheduling, ensuring that high-priority tasks are handled more effectively while minimizing the impact on low-priority tasks. The authors demonstrate that Strait can reduce deadline violations for high-priority tasks by 1.02 to 11.18 percentage points compared to traditional methods, while maintaining acceptable performance for lower-priority tasks. The evaluation highlights the system's ability to adapt to varying workloads and GPU characteristics, making it a robust solution for real-time inference serving.
Methodology
Strait employs an adaptive prediction model to estimate interference during data transfer and kernel execution. It integrates this model into a priority-aware scheduling algorithm that accounts for task priority levels and runtime overheads when making scheduling decisions.
Results
Evaluation results indicate that Strait reduces deadline violations for high-priority tasks by 1.02 to 11.18 percentage points compared to existing approaches, while incurring acceptable performance costs for low-priority tasks. The system also demonstrates more equitable performance compared to software-defined preemption methods.
Implications
Strait's approach can significantly enhance the reliability and efficiency of ML inference serving systems in on-premises environments, particularly in applications where task prioritization is critical, such as industrial monitoring and quality control.
Predicting Covariate-Driven Spatial Deformation for Nonstationary Gaussian Processes
Theory
- Introduces a covariate-driven approach to model spatial deformation in nonstationary Gaussian processes.
- Establishes a connection between diffeomorphic deformations and covariate vectors using velocity fields in a Lie algebra.
- Develops an efficient estimation-inference algorithm for out-of-sample predictions.
- Demonstrates the method's effectiveness through simulation and case studies in manufacturing and geostatistics.
Read more
Predicting Covariate-Driven Spatial Deformation for Nonstationary Gaussian Processes
Summary
This paper addresses the limitations of traditional nonstationary Gaussian processes (GPs) in modeling complex spatial data influenced by local covariates. The authors propose a novel approach that models spatial deformation as a function of covariates, enhancing the predictive capability of the deformation method. By connecting diffeomorphic deformations with Euclidean covariate vectors through velocity fields in a Lie algebra, the authors establish a concise functional form for deformations. They also tackle the challenge of high-order interactions among covariates by proving that these can be truncated under reasonable physical assumptions. An efficient estimation-inference algorithm is developed for out-of-sample nonstationary GP predictions, even with limited covariate-deformation sample pairs. The effectiveness of the proposed method is validated through simulation studies and two real-world case studies in manufacturing and geostatistics, demonstrating its generalizability and practical applicability.
Methodology
The authors model spatial deformation as a function of covariates, utilizing velocity fields from Lie algebra to connect diffeomorphic deformations with covariate vectors. They derive a functional form for the deformations and develop an estimation-inference algorithm suitable for limited data scenarios.
Results
The proposed method shows significant improvements in predicting nonstationary Gaussian processes across new spatial domains, outperforming traditional methods. The simulation and case studies confirm its effectiveness and generalizability in real-world applications.
Implications
This research has potential applications in various fields requiring spatial data analysis, such as environmental monitoring, manufacturing quality control, and geostatistics, where understanding and predicting spatial nonstationarity is crucial.
Momentum-Conserving Graph Neural Networks for Deformable Objects
Graph Learning
Robotics
- Introduction of MomentumGNN, a GNN architecture that conserves momentum.
- Utilization of per-edge impulses to ensure accurate momentum tracking.
- Layer-by-layer architecture for sequential updates of vertex positions.
- Unsupervised training using a physics-based loss function.
Read more
Momentum-Conserving Graph Neural Networks for Deformable Objects
Summary
This paper introduces MomentumGNN, a novel graph neural network (GNN) architecture specifically designed to accurately model the dynamics of deformable materials while conserving momentum. Traditional GNNs struggle with momentum conservation, often leading to non-physical behaviors in simulations. MomentumGNN addresses this by predicting per-edge stretching and bending impulses instead of unconstrained nodal accelerations, ensuring the preservation of both linear and angular momentum. The architecture employs a layer-by-layer approach where each layer updates vertex positions using momentum-conserving impulses, enhancing the model's physical fidelity. The network is trained in an unsupervised manner using a physics-based loss function, demonstrating superior performance over existing methods in scenarios where momentum is critical, such as in free motion and collisions. The proposed model shows promise for applications in various fields, including computer graphics, soft robotics, and simulation-based training.
Methodology
The authors modify the MeshGraphNets architecture by replacing per-vertex decoders with per-edge decoders that predict momentum-conserving impulses. The network is trained using a physics-based loss function in an unsupervised manner, allowing it to learn the dynamics of deformable objects while enforcing physical constraints.
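The momentum argument is easy to see in code: if every edge applies equal-and-opposite impulses to its two endpoints, the total linear momentum of the mesh cannot change regardless of what the network predicts. The sketch below checks this with random impulses; it is a simplified illustration, not the MeshGraphNets-based architecture itself.

```python
import numpy as np

def apply_edge_impulses(velocities, masses, edges, impulses):
    """velocities: (V, 3), masses: (V,), edges: (E, 2), impulses: (E, 3)."""
    v = velocities.copy()
    for (i, j), p in zip(edges, impulses):
        v[i] += p / masses[i]       # impulse on one endpoint
        v[j] -= p / masses[j]       # equal and opposite on the other
    return v

rng = np.random.default_rng(0)
vel, mass = rng.normal(size=(4, 3)), np.array([1.0, 2.0, 0.5, 1.5])
edges = np.array([[0, 1], [1, 2], [2, 3]])
new_vel = apply_edge_impulses(vel, mass, edges, rng.normal(size=(3, 3)))
before = (mass[:, None] * vel).sum(axis=0)
after = (mass[:, None] * new_vel).sum(axis=0)
print(np.allclose(before, after))   # True: linear momentum is conserved by construction
```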
Results
MomentumGNN demonstrates improved accuracy in simulating the dynamics of deformable materials, particularly in preserving linear and angular momentum. The model outperforms baseline methods in various scenarios, effectively addressing issues of drift and unnatural spin that are common in existing GNN approaches.
Implications
The proposed MomentumGNN architecture has significant implications for real-time simulations in computer graphics, soft robotics, and other applications where accurate modeling of deformable objects is crucial. Its ability to conserve momentum can enhance the realism and reliability of simulations in video games, animated films, and training environments.
STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
Graph Learning
Time Series
Optimization
- STLGT reduces tail latency prediction error by an average of 8.5% MAPE compared to PERT-GNN.
- Achieves up to 12× faster CPU inference at scale, enhancing efficiency.
- Utilizes a linear graph transformer to model cross-service dependencies effectively.
- Incorporates a decoupled temporal module for better handling of workload dynamics.
Read more
STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
Summary
The paper introduces STLGT (Scalable Trace-based Linear Graph Transformer), a novel approach for predicting tail latency in microservices, which is crucial for proactive service level objective (SLO) management. Traditional methods struggle with modeling long-range dependencies and handling non-stationary, bursty workloads efficiently. STLGT addresses these challenges by encoding traces as span graphs and employing a structure-aware linear graph transformer to propagate cross-service dependencies, ensuring linear inference time relative to the span graph size. Additionally, it incorporates a decoupled temporal module to effectively capture workload dynamics. The authors validate STLGT on a personalized education microservice application and various open-source benchmarks, demonstrating significant improvements in forecasting accuracy over existing methods, such as PERT-GNN, with an average reduction of 8.5% in mean absolute percentage error (MAPE) and achieving up to 12 times faster CPU inference. Ablation studies confirm the effectiveness of STLGT's components, particularly under bursty traffic conditions, highlighting its potential for enhancing auto-scaling capabilities in cloud-native environments.
Methodology
STLGT employs a structure-aware linear graph transformer to encode traces as span graphs, allowing for efficient propagation of dependencies across services. It also integrates a decoupled temporal module to capture the dynamics of workload variations, addressing the challenges of bursty traffic and long-range dependency modeling.
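The "linear" in the transformer refers to attention that never materializes the N x N attention matrix. A generic kernelized form is sketched below with an elu+1 feature map; STLGT's structure-aware variant additionally injects span-graph information, which this minimal sketch omits.

```python
import numpy as np

def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))        # elu(x) + 1: a positive feature map

def linear_attention(Q, K, V):
    """O(N * d * d_v) attention: softmax(QK^T)V replaced by phi(Q)(phi(K)^T V)."""
    Qf, Kf = phi(Q), phi(K)                           # (N, d)
    kv = Kf.T @ V                                     # (d, d_v), independent of N^2
    z = Qf @ Kf.sum(axis=0)                           # (N,) normalizer
    return (Qf @ kv) / z[:, None]

rng = np.random.default_rng(0)
N, d = 6, 4
print(linear_attention(rng.normal(size=(N, d)),
                       rng.normal(size=(N, d)),
                       rng.normal(size=(N, d))).shape)   # (6, 4)
```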
Results
The results indicate that STLGT outperforms existing methods in terms of forecasting accuracy and inference speed. Specifically, it achieves an average 8.5% improvement in MAPE over PERT-GNN and up to 12 times faster inference times, demonstrating its scalability and efficiency in real-world applications.
Implications
The findings suggest that STLGT can significantly enhance proactive auto-scaling strategies in microservice architectures, reducing the risk of SLO violations and improving overall system performance. Its ability to handle bursty workloads makes it particularly relevant for cloud-native applications, such as digital education platforms.
Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts
Theory
Multimodal
- Unweighted evaluation metrics can mislead performance assessments in imbalanced classification due to within-class heterogeneity.
- The proposed predicted-weighted balanced accuracy (pBA) metric utilizes predicted posterior probabilities to provide a more accurate evaluation.
- Empirical results show that pBA outperforms traditional metrics in scenarios with uneven subconcept distributions.
- The study emphasizes the importance of considering minority subconcept performance for reliable model deployment.
Read more
Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts
Summary
This paper addresses the issue of performance estimation bias in imbalanced classification, particularly focusing on the disparities among minority subconcepts within the same class. Traditional evaluation metrics often fail to capture the heterogeneity within classes, leading to misleading conclusions about model performance. The authors propose a novel utility-weighted evaluation metric, termed predicted-weighted balanced accuracy (pBA), which utilizes predicted posterior probabilities from a multiclass subconcept model to replace unavailable subconcept labels at test time. This approach allows for a more nuanced assessment of model performance, particularly in scenarios where class distributions are uneven. The paper presents empirical evaluations across various datasets, including medical imaging and text, demonstrating that pBA offers more reliable insights into model performance compared to standard unweighted measures. The findings suggest that unweighted scores can mask significant performance disparities, and the proposed method provides a more stable and interpretable evaluation framework that is sensitive to the complexities of within-class heterogeneity.
Methodology
The authors developed a utility-weighted evaluation framework that replaces unavailable test-time subconcept labels with predicted posterior probabilities from a multiclass subconcept model. Evaluation weights are computed as expected utility based on these probabilities, leading to the proposed pBA metric. The methodology was tested on various datasets, transforming multiclass problems into binary class-imbalance scenarios while retaining subconcept knowledge.
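A hedged sketch of the weighting idea: with subconcept labels unavailable at test time, each sample's evaluation weight becomes an expectation of a utility over its predicted subconcept posterior. The inverse-frequency utility below is an illustrative choice, not the paper's exact expected-utility definition.

```python
import numpy as np

def predicted_weighted_accuracy(y_true, y_pred, subconcept_post, subconcept_freq):
    """subconcept_post: (n, S) predicted P(subconcept | x); subconcept_freq: (S,) training frequencies."""
    utilities = 1.0 / np.asarray(subconcept_freq, dtype=float)   # rarer subconcepts weigh more
    weights = np.asarray(subconcept_post) @ utilities            # expected utility per sample
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    return float(np.sum(weights * correct) / np.sum(weights))

post = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(predicted_weighted_accuracy([1, 1, 0], [1, 0, 0], post, subconcept_freq=[0.8, 0.2]))
```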
Results
The experiments revealed that standard unweighted scores often overestimate model performance, particularly in the presence of class heterogeneity. In contrast, the pBA metric provided more stable and interpretable assessments, effectively correcting the bias towards larger minority subconcepts. The results indicated that utility weighting significantly improves the reliability of performance evaluations.
Implications
The findings of this study have significant implications for the deployment of machine learning models in critical applications, such as medical diagnostics, where performance on minority subpopulations is crucial. By adopting the proposed evaluation framework, practitioners can make more informed decisions and mitigate risks associated with underperformance on important subgroups.
Efficient and Interpretable Transformer for Counterfactual Fairness
Efficient ML
Interpretability
Theory
- Introduction of FCorrTransformer, an efficient and interpretable model for tabular data.
- Development of Counterfactual Attention Regularization (CAR) to enforce fairness without causal assumptions.
- Empirical results show strong performance in counterfactual fairness and predictive accuracy.
- The approach addresses bias at a structural level rather than through feature exclusion.
Read more
Efficient and Interpretable Transformer for Counterfactual Fairness
Summary
This paper addresses the challenges of achieving predictive performance, interpretability, and regulatory fairness in machine learning models, particularly in high-stakes domains like finance and insurance. The authors propose the Feature Correlation Transformer (FCorrTransformer), an attention-light architecture designed for tabular data, which interprets the attention matrix as pairwise feature dependencies. This enhances both interpretability and efficiency. Additionally, they introduce Counterfactual Attention Regularization (CAR), a framework that promotes counterfactually fair predictions by enforcing group-invariant representations of sensitive features at the attention level. The proposed methods do not rely on explicit causal assumptions, making them more practical in real-world applications. Empirical evaluations demonstrate that FCorrTransformer combined with CAR achieves strong counterfactual fairness while maintaining competitive predictive performance and significantly reducing model complexity compared to standard transformer-based approaches. This work bridges the gap between fairness theory and machine learning, providing a practical framework for responsible AI in regulated industries.
Methodology
The authors developed the FCorrTransformer architecture, which utilizes an attention-light mechanism to model feature dependencies in tabular data. They introduced CAR to regularize attention mechanisms, ensuring that sensitive features do not lead to biased predictions. The methodology emphasizes interpretability and efficiency while maintaining fairness.
Results
The empirical evaluations indicated that the FCorrTransformer with CAR achieved strong counterfactual fairness metrics while also delivering competitive predictive performance. The model complexity was significantly reduced compared to traditional transformer-based models, demonstrating the effectiveness of the proposed approach.
Implications
This research has significant implications for the deployment of machine learning in regulated industries, as it provides a framework that balances predictive accuracy with fairness and interpretability. It can be applied in sectors such as finance and insurance, where compliance with fairness regulations is critical.
Privacy-Preserving Federated Learning Framework for Distributed Chemical Process Optimization
Federated Learning
Optimization
Time Series
- Application of federated learning to distributed chemical process systems.
- Development of a cross-plant learning framework for heterogeneous environments.
- Incorporation of secure parameter aggregation mechanisms for data protection.
- Demonstration of improved prediction accuracy through experimental evaluations.
Read more
Privacy-Preserving Federated Learning Framework for Distributed Chemical Process Optimization
Summary
This paper presents a novel privacy-preserving federated learning (FL) framework aimed at optimizing chemical processes across distributed plants while adhering to strict data confidentiality requirements. The authors highlight the challenges faced in industrial settings, such as data heterogeneity, communication overhead, and privacy risks, which have hindered the broader application of FL. The proposed framework allows each plant to locally train a neural network model using its own time-series sensor data, transmitting only model parameters to a central server for aggregation through secure mechanisms. This approach facilitates knowledge sharing across plants without compromising sensitive operational data. Experimental evaluations conducted on datasets from three independent chemical plants demonstrate the framework's effectiveness, showing rapid convergence of the federated model and significant improvements in prediction accuracy compared to local-only training. The findings suggest that federated learning can serve as a scalable and effective solution for collaborative industrial analytics, enabling privacy-preserving predictive modeling and process optimization in the chemical industry.
Methodology
The methodology involves a federated learning framework where each chemical plant trains a local neural network model using its own data. Only the model parameters are sent to a central aggregation server, which combines these parameters securely. The framework addresses challenges like data heterogeneity and communication overhead while ensuring privacy through secure aggregation techniques.
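A minimal FedAvg-style sketch of the aggregation step, assuming each plant exposes a train_locally method and a sample count; the secure-aggregation layer (masking or encrypting the transmitted parameters) is deliberately left out, and the toy plant class is purely illustrative.

```python
import numpy as np

def federated_round(global_params, plants):
    """One communication round: local training at each plant, sample-weighted averaging on the server."""
    updates, sizes = [], []
    for plant in plants:
        updates.append(plant.train_locally(dict(global_params)))  # only parameters leave the plant
        sizes.append(plant.num_samples)
    total = float(sum(sizes))
    return {name: sum(w * u[name] for w, u in zip(sizes, updates)) / total
            for name in global_params}

class ToyPlant:
    def __init__(self, shift, num_samples):
        self.shift, self.num_samples = shift, num_samples
    def train_locally(self, params):                 # stand-in for local SGD on private sensor data
        return {k: v + self.shift for k, v in params.items()}

print(federated_round({"w": np.zeros(3)}, [ToyPlant(1.0, 100), ToyPlant(3.0, 300)]))
# sample-weighted average shift: (100*1 + 300*3) / 400 = 2.5
```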
Results
The experimental results indicate that the federated model achieved a global mean squared error reduction from approximately 2369 to below 50 within the first five communication rounds, stabilizing around 35 after 40 rounds. The federated framework significantly outperformed local-only training in terms of prediction accuracy, achieving performance comparable to centralized training.
Implications
The proposed framework has significant implications for the chemical industry, allowing for collaborative model training across geographically separated plants while maintaining data confidentiality. This could lead to enhanced predictive modeling and optimization of chemical processes, ultimately improving operational efficiency and safety.
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
Reinforcement Learning
Robotics
Theory
- Introduction of Self-Alignment for Safety (SAS) for test-time adaptation in offline safe RL.
- Utilization of Lyapunov stability as an occupancy-measure criterion for ensuring safety.
- Hierarchical RL interpretation through Bayesian inference over latent skills.
- Empirical results show SAS outperforms existing safe RL methods, reducing costs and failures.
Read more
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
Summary
This paper addresses the challenge of ensuring safety in offline reinforcement learning (RL) agents during deployment, where discrepancies between training datasets and real-world environments can lead to unsafe behaviors. The authors propose a novel framework called Self-Alignment for Safety (SAS), which utilizes a transformer-based architecture to facilitate test-time adaptation without requiring retraining. The core mechanism of SAS is self-alignment, where the pretrained agent generates multiple imagined trajectories and selects those that meet the Lyapunov stability condition. These selected trajectories serve as in-context prompts, guiding the agent's behavior towards safety. The framework effectively transforms Lyapunov-guided imagination into control-invariant prompts, allowing for a hierarchical interpretation of RL through Bayesian inference over latent skills. The authors demonstrate that SAS significantly reduces costs and failures while maintaining or improving returns across various benchmarks, including Safety Gymnasium and MuJoCo.
Methodology
The SAS framework employs a transformer architecture to generate imagined trajectories at test time. It identifies safe segments based on the Lyapunov condition and uses these segments as prompts to realign the agent's behavior towards safety. This approach leverages the principles of hierarchical reinforcement learning and Bayesian inference, allowing for effective adaptation without retraining.
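One simple reading of the selection step is sketched below: imagined rollouts are kept only if a learned safety value is non-increasing along them, a discrete Lyapunov-style condition, and the survivors become in-context prompts. The criterion, slack, and prompt count are assumptions; the transformer prompting mechanics are not shown.

```python
import numpy as np

def lyapunov_stable(trajectory, V, slack=0.0):
    """Keep a rollout only if the safety value V never increases by more than `slack` per step."""
    values = np.array([V(s) for s in trajectory])
    return bool(np.all(np.diff(values) <= slack))

def select_safe_prompts(imagined_trajectories, V, k=4):
    safe = [tau for tau in imagined_trajectories if lyapunov_stable(tau, V)]
    return safe[:k]                                   # used as in-context prompts at test time

V = lambda s: float(np.sum(np.square(s)))             # toy safety value: distance from a safe set
rollouts = [np.linspace(2.0, 0.0, 5), np.linspace(0.0, 2.0, 5)]
print(len(select_safe_prompts(rollouts, V)))          # 1: only the decreasing rollout survives
```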
Results
The empirical evaluation of SAS on Safety Gymnasium and MuJoCo benchmarks reveals that it consistently reduces operational costs and failure rates by up to a factor of two while maintaining or enhancing the overall return of the RL agent. This demonstrates the effectiveness of the proposed self-alignment mechanism in ensuring safety during deployment.
Implications
The SAS framework has significant implications for the deployment of RL agents in real-world environments, particularly in scenarios where safety is paramount. By enabling safe test-time adaptation without retraining, SAS can facilitate the practical application of offline RL in various domains, including robotics and autonomous systems.
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
Graph Learning
- Classical machine learning models outperform larger models in many tasks related to molecular property prediction.
- Graph neural networks perform well but do not consistently surpass compact models.
- Pretrained molecular sequence models show limited effectiveness in this context.
- Performance is highly dependent on the specific task and data characteristics.
Read more
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
Summary
This paper investigates the effectiveness of larger models in drug discovery, particularly in predicting molecular properties and activities. The study challenges the prevailing notion that larger, pretrained models consistently outperform smaller, specialized models. It evaluates 22 molecular property and activity endpoints using a comprehensive dataset that includes public ADMET and Tox21 benchmarks, as well as internal datasets for anti-infective activity. The research employs a structure-similarity-separated five-fold cross-validation approach across 167,056 evaluations. Results indicate that classical machine learning models, such as Random Forest and ExtraTrees, outperform larger models in ten primary-metric tasks, while graph neural networks (GNNs) excel in nine tasks. Pretrained molecular sequence models show limited success, winning only three tasks. The findings suggest that compact, specialized models remain highly effective and that performance is often dependent on the alignment between molecular representation, inductive bias, data regime, and endpoint biology. Although larger models may offer advantages in zero-shot reasoning and SAR interpretation, they do not universally guarantee better predictive performance.
Methodology
The study utilizes a benchmark assessment involving 22 molecular property and activity endpoints, applying structure-similarity-separated five-fold cross-validation to evaluate various model types, including classical ML models, GNNs, and pretrained molecular sequence models.
Results
The results reveal that classical ML models win ten tasks, GNNs win nine, and pretrained models win three. Rule-based SAR reasoning models do not win under primary metrics but show some gains in SAR interpretation. Overall, compact models demonstrate superior predictive performance in many scenarios.
Implications
The findings suggest that drug discovery efforts should not solely rely on larger models but consider the specific context and characteristics of the data. Compact models may be more effective for certain tasks, and the alignment of model type with the task is crucial for optimal performance.
SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations
Multimodal
Robotics
Efficient ML
- SWAN is the first multimodal network that adapts resource allocation based on modality quality, sample complexity, and a user-defined budget.
- The QoI-aware controller optimally selects layer configurations while maintaining end-to-end differentiability using NeuralSort.
- The SkipGate module conditionally executes layers based on input features, enhancing efficiency.
- SWAN achieves significant reductions in computational load while maintaining high detection performance in autonomous driving tasks.
Read more
SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations
Summary
The paper presents SWAN (Sample and World-Aware Multimodal Network), a novel adaptive multimodal network designed to handle runtime variations in real-world environments, particularly in autonomous driving scenarios. Traditional networks struggle with fluctuations in modality quality, input complexity, and available computational resources, leading to inefficient resource utilization. SWAN addresses these challenges by employing a quality-aware controller that allocates resources among modalities based on a user-defined maximum budget. Additionally, it features an adaptive gating module that optimizes layer utilization according to sample complexity and a token dropping mechanism that filters out irrelevant multimodal features. The evaluation of SWAN on the nuScenes dataset demonstrates significant improvements in computational efficiency, achieving up to a 49% reduction in FLOPs with minimal performance degradation. This work highlights the importance of adapting to runtime variations and provides a comprehensive solution that integrates multiple adaptation strategies into a single framework.
Methodology
SWAN employs a layer-wise adaptive multimodal network architecture that integrates a quality-aware controller for resource allocation, an adaptive gating module for layer utilization, and a token dropping mechanism to filter out irrelevant features. The controller is trained using NeuralSort for effective layer prioritization based on input characteristics.
Results
SWAN outperformed related baselines, achieving improvements of up to 11.2 NDS and 13.7 mAP while reducing FLOPs by up to 49% compared to fully-provisioned networks. The controller effectively prioritized high-QoI modalities under computational constraints, and the SkipGate and token pruning modules contributed to significant reductions in computation.
Implications
SWAN's approach to adaptive multimodal networks can enhance the performance and efficiency of autonomous vehicles and other applications requiring robust real-time processing in dynamic environments. Its methodologies could be applied to various domains where multimodal data is utilized, improving resource management and system responsiveness.
CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs
NLP
Large Language Models
Efficient ML
- CoQuant introduces a joint weight-activation subspace projection method for mixed-precision quantization.
- The method balances activation and weight covariances to optimize the selection of high-precision subspaces.
- Extensive experiments show CoQuant outperforms strong PTQ baselines in perplexity and reasoning tasks.
- The approach addresses the limitations of existing methods that rely solely on activation statistics.
Read more
CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs
Summary
The paper introduces CoQuant, a novel method for post-training quantization (PTQ) aimed at optimizing mixed-precision large language models (LLMs). Traditional mixed-precision methods primarily focus on activation statistics to construct high-precision subspaces, neglecting the joint influence of activation and weight quantization noise on output perturbation. CoQuant addresses this limitation by jointly modeling the quantization effects of both activations and weights. The authors derive a closed-form weighted PCA solution that balances the covariances of activations and weights to select the optimal high-precision subspace. Extensive experiments conducted on Llama-3.2 and Qwen2.5 models demonstrate that CoQuant consistently outperforms existing PTQ baselines in terms of perplexity and zero-shot reasoning accuracy. This highlights the effectiveness of joint weight-activation modeling for low-bit quantization in LLMs, providing a principled approach to enhancing inference efficiency without sacrificing model performance.
Methodology
CoQuant employs a theoretical model of expected output error to derive a closed-form solution based on weighted PCA. This method integrates both activation and weight statistics to jointly determine the high-precision subspace, thereby reducing quantization error more effectively than previous methods that focused only on activation statistics.
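To gesture at the subspace-selection step, the sketch below blends activation and weight second-moment statistics and keeps the top eigenvectors as the high-precision subspace. The simple convex combination with a hand-set alpha is an assumption; CoQuant derives the weighting in closed form from its output-error model.

```python
import numpy as np

def high_precision_subspace(acts, weight, k, alpha=0.5):
    """acts: (n, d) calibration activations; weight: (m, d) layer weight rows; returns (d, k) basis."""
    cov_a = acts.T @ acts / len(acts)
    cov_w = weight.T @ weight / len(weight)
    blended = alpha * cov_a + (1.0 - alpha) * cov_w    # joint activation-weight statistics
    _, eigvecs = np.linalg.eigh(blended)               # ascending eigenvalues
    return eigvecs[:, -k:]                             # top-k directions kept in high precision

rng = np.random.default_rng(0)
U = high_precision_subspace(rng.normal(size=(256, 16)), rng.normal(size=(64, 16)), k=4)
print(U.shape)   # (16, 4)
```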
Results
The experimental results indicate that CoQuant achieves superior performance compared to existing PTQ methods, yielding lower perplexity scores and improved zero-shot reasoning accuracy across various model scales, demonstrating a better accuracy-efficiency trade-off.
Implications
The findings suggest that joint modeling of weight and activation statistics can significantly enhance the efficiency of LLMs, making CoQuant a promising approach for deploying large models in resource-constrained environments without compromising accuracy.
Statistical Channel Fingerprint Construction for Massive MIMO: A Unified Tensor Learning Framework
Optimization
Theory
Efficient ML
- Introduction of a unified tensor learning framework for constructing statistical channel fingerprints (sCF).
- Establishment of a relationship between channel spatial covariance matrix (CSCM) and channel power angular spectrum (CPAS).
- Development of LPWTNet architecture that utilizes Laplacian pyramid decomposition for efficient inference.
- Implementation of a shared mask learning strategy for adaptive refinement of high-frequency components.
Read more
Statistical Channel Fingerprint Construction for Massive MIMO: A Unified Tensor Learning Framework
Summary
This paper introduces a novel approach to constructing statistical channel fingerprints (sCF) for massive MIMO communication systems, focusing on the acquisition of statistical channel state information (sCSI). The authors establish a relationship between the channel spatial covariance matrix (CSCM) and the channel power angular spectrum (CPAS), leading to a unified tensor representation of the sCF. The proposed method, LPWTNet, employs a Laplacian pyramid decomposition and reconstruction framework, enhancing inference efficiency while capturing multi-scale frequency characteristics of the sCF. The architecture also incorporates a shared mask learning strategy to refine high-frequency components adaptively. Additionally, a small-kernel convolution mechanism based on wavelet transform is introduced to improve feature extraction without over-parameterization. Extensive experiments demonstrate that the proposed method achieves competitive reconstruction accuracy and computational efficiency compared to state-of-the-art techniques across various scenarios.
Methodology
The authors propose a unified tensor-based learning architecture called LPWTNet, which integrates a closed-form Laplacian pyramid decomposition and reconstruction framework. This approach replaces traditional encoder-decoder structures, allowing for efficient inference and capturing multi-scale characteristics of the sCF. The methodology also includes a shared mask learning strategy and a small-kernel convolution mechanism based on wavelet transform to enhance feature extraction.
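The pyramid itself is a closed-form, invertible decomposition, which is why it can replace a learned encoder-decoder. The sketch below uses the plainest possible 2x pooling and nearest-neighbor upsampling as stand-ins for the paper's operators and checks exact reconstruction.

```python
import numpy as np

def down(x):                        # 2x average pooling
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):                          # nearest-neighbor 2x upsampling
    return np.kron(x, np.ones((2, 2)))

def laplacian_pyramid(x, levels=3):
    pyr = []
    for _ in range(levels):
        coarse = down(x)
        pyr.append(x - up(coarse))  # high-frequency residual at this scale
        x = coarse
    pyr.append(x)                   # coarsest approximation
    return pyr

def reconstruct(pyr):
    x = pyr[-1]
    for detail in reversed(pyr[:-1]):
        x = up(x) + detail
    return x

img = np.random.default_rng(0).random((32, 32))
print(np.allclose(reconstruct(laplacian_pyramid(img)), img))   # True: exact reconstruction
```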
Results
The proposed LPWTNet architecture shows significant improvements in reconstruction accuracy and computational efficiency across various scenarios for sCF construction, outperforming existing state-of-the-art methods.
Implications
The findings suggest that the proposed framework can effectively facilitate the acquisition of channel state information in massive MIMO systems, potentially leading to enhanced performance in next-generation communication networks, including 6G applications.
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
NLP
Large Language Models
- Introduces a trajectory-level measurement protocol for analyzing refusal geometry in language models.
- Demonstrates that R2D2 fine-tuning traces a robustness-utility frontier, showing that robust refusal cannot be solely evaluated by attack success.
- Provides evidence for the reorganization of refusal carriers rather than simple drift, with significant implications for model training.
- Causal interventions reveal that control over refusal is low-dimensional but closely linked to utility, challenging previous assumptions about independent refusal pathways.
Read more
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
Summary
This paper investigates the mechanisms behind robust refusal in safety-aligned language models, particularly focusing on how dynamic adversarial fine-tuning (R2D2) influences refusal geometry during training. The authors highlight the challenge of balancing the model's ability to refuse harmful requests while maintaining usability for benign requests. They present a measurement-driven study using a 7B backbone model, comparing standard supervised fine-tuning (SFT) with R2D2. The study employs a comprehensive protocol that integrates various evaluation metrics, including fixed-source HarmBench, StrongREJECT, and XSTest, alongside causal interventions. The findings reveal that R2D2 initially achieves a high level of refusal effectiveness but later experiences a partial recovery of benign utility, indicating a complex relationship between robustness and utility. The results suggest that refusal carriers reorganize rather than merely drift, with evidence showing that effective control is low-dimensional yet coupled with utility. Overall, the paper emphasizes the importance of understanding the trajectory of refusal geometry in enhancing the safety and usability of language models.
Methodology
The authors employed a measurement-driven approach, utilizing a combination of dense online monitoring, sparse five-anchor admissible-carrier analysis, and various evaluation metrics (HarmBench, StrongREJECT, XSTest). They conducted causal interventions and a direct benign-utility audit to track changes in refusal geometry during training.
Results
The study found that R2D2 fine-tuning significantly reduces attack success rates (ASR) early in training but leads to a partial recovery of benign utility later. The best admissible refusal carrier shifts from late-layer to early-layer during training, while effective rank remains stable. In contrast, SFT showed less robustness despite greater drift, indicating that raw drift does not correlate with improved performance.
Implications
The findings suggest that understanding the dynamics of refusal geometry can inform the design of safer language models. By recognizing the balance between robustness and usability, developers can create models that effectively refuse harmful requests without compromising their functionality in benign contexts.
PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
NLP
Large Language Models
Reinforcement Learning
- PAINT improves reasoning in LLMs by adapting the exposure of solution context based on rollout-reference overlap.
- The method employs sparse teacher energy interpolation to target specific token positions for better supervision.
- Empirical results show consistent gains over prior self-distillation methods and competitive performance against GRPO.
- PAINT achieves better rollout-token efficiency with shorter training rollouts compared to traditional methods.
Read more
PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
Summary
The paper introduces PAINT (Partial-solution Adaptive Interpolated Training), a novel training method aimed at enhancing the reasoning capabilities of large language models (LLMs) by providing more effective supervision during training. The authors argue that traditional methods such as reinforcement learning with verifiable rewards (RLVR) and supervised fine-tuning (SFT) either suffer from sparse rewards or rely on fixed trajectories, which may not align well with the model's inference states. PAINT addresses these issues by employing a contextual re-scoring approach that determines how much of the solution context should be revealed during training and which token positions should receive targeted supervision. The method combines overlap-adaptive solution masking with sparse teacher energy interpolation, allowing for a more nuanced training process that encourages broader reasoning. Empirical evaluations demonstrate that PAINT consistently outperforms existing self-distillation methods across multiple competition-level math benchmarks and various model scales, indicating its effectiveness in improving reasoning performance in LLMs.
Methodology
PAINT utilizes a contextual re-scoring framework to adaptively mask solution context based on the overlap between the model's rollouts and reference solutions. It applies energy-space interpolation selectively on token positions where there is a mismatch in entropy between the student and the privileged scorer, thus refining the training process to enhance reasoning capabilities.
Results
The implementation of PAINT across different scales of the Qwen3 model (8B, 4B, and 1.7B) resulted in macro Avg@12 improvements ranging from 0.8 to 2.1 points over the previous on-policy self-distillation baseline. It also matched or exceeded the performance of GRPO while requiring significantly less rollout budget.
Implications
The findings suggest that PAINT could be a valuable approach for enhancing the reasoning abilities of LLMs in various applications, particularly in complex mathematical and scientific tasks. This method may lead to more efficient training protocols and improved performance in real-world reasoning scenarios.
Optimized Deferral for Imbalanced Settings
NLP
Large Language Models
Efficient ML
- Introduces a novel cost-sensitive learning framework for deferral in imbalanced expert settings.
- Develops new margin-based loss functions and algorithms specifically for expert imbalance.
- Presents the MILD algorithm, which shows significant performance improvements in practical tasks.
- Demonstrates the effectiveness of the proposed methods through extensive empirical evaluations.
Read more
Optimized Deferral for Imbalanced Settings
Summary
This paper addresses the challenges of learning to defer in imbalanced expert settings, where certain experts are favored over others, leading to suboptimal performance. The authors propose a novel cost-sensitive learning framework that optimizes deferral loss by considering the imbalance among experts. They introduce new margin-based loss functions and develop algorithms tailored for these settings, specifically the MILD (Margin-based Imbalanced Learning to Defer) algorithm. The study demonstrates the effectiveness of these methods through extensive experiments on image classification and real-world Large Language Model (LLM) routing tasks, showing significant improvements over existing baselines. The findings highlight the importance of addressing expert imbalance to enhance the performance of deferral algorithms in various applications.
Methodology
The authors cast the deferral loss optimization as a cost-sensitive learning problem, deriving new margin-based loss functions. They develop the MILD algorithm, which is designed to effectively route inputs to the most suitable experts while accounting for the imbalance in expert performance. The methodology includes extensive empirical testing on both synthetic and real-world datasets to validate the proposed approach.
Results
The experiments reveal that the MILD algorithm significantly outperforms existing baselines in both image classification and LLM routing tasks. The results indicate that addressing expert imbalance leads to improved accuracy and resource efficiency, demonstrating the practical applicability of the proposed methods.
Implications
The findings of this paper have significant implications for various domains, including natural language processing, medical diagnosis, and computer vision, where effective expert selection can enhance accuracy and reduce computational costs. The proposed methods can be applied to improve decision-making processes in systems that rely on multiple specialized experts.
Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture
NLP
Large Language Models
Generative Models
- The Dual-Stream Memory Architecture effectively separates patient narratives from clinical records, enhancing data integrity.
- The Reconciliation Engine actively flags discrepancies, allowing for timely clinical evaluations rather than silent updates.
- The study quantifies a 13.6% error cascade, identifying upstream memory extraction errors as critical bottlenecks in clinical AI pipelines.
- Continuous validation of patient statements against clinical records is essential for safe healthcare applications of AI.
Read more
Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture
Summary
This paper addresses the challenges faced by Large Language Model (LLM) agents in managing longitudinal healthcare journeys, particularly the reconciliation of two imperfect sources of truth: patient self-reports and Electronic Health Records (EHRs). The authors propose a Dual-Stream Memory Architecture that separates patient narratives from structured clinical records, governed by a Reconciliation Engine that evaluates and classifies discrepancies. The architecture was tested on 26 patients across 675 wellness coaching sessions, using a hybrid dataset of real and synthetic clinical scenarios. The results showed that the engine detected 84.4% of designed clinical discrepancies with an 86.7% recall for safety-critical errors. The study highlights the importance of validating patient-reported memories against clinical records to ensure the safe deployment of health coaching agents.
Methodology
The authors implemented a Dual-Stream Memory Architecture that isolates patient narratives from structured clinical records. A Reconciliation Engine was developed to evaluate and classify discrepancies between the two sources. The architecture was tested using a hybrid dataset that combined real provider-patient transcripts with synthetic clinical scenarios, allowing for rigorous evaluation across multiple sessions.
Results
The Reconciliation Engine achieved an 84.4% detection rate for designed clinical discrepancies and an 86.7% recall for safety-critical errors. The analysis revealed a 13.6% error cascade traced back to memory extraction errors, emphasizing the need for component-level evaluation in clinical AI systems.
Implications
The findings suggest that integrating a robust reconciliation process into health coaching agents can significantly enhance patient safety and data integrity. This approach can be applied to various digital health applications, improving the reliability of AI-driven patient interactions and decision-making.
Low Rank Adaptation for Adversarial Perturbation
Theory
Optimization
Efficient ML
- Adversarial perturbations possess an inherently low-rank structure.
- The proposed method improves the efficiency and effectiveness of black-box adversarial attacks.
- The approach utilizes auxiliary data and a reference model to construct a low-rank subspace.
- Integrating low-rank optimization can reduce computational overhead in adversarial training.
Read more
Low Rank Adaptation for Adversarial Perturbation
Summary
This paper investigates the low-rank structure of adversarial perturbations, drawing parallels to Low-Rank Adaptation (LoRA) techniques used in training large language models (LLMs). The authors provide both theoretical and empirical evidence that adversarial perturbations exhibit low-rank characteristics, which can be leveraged to enhance the efficiency of black-box adversarial attacks. The proposed method involves a two-step approach: first, using a reference model and auxiliary data to project gradients into a low-dimensional subspace; second, constraining the perturbation search to this low-rank subspace. This approach addresses the challenges of excessive query requirements in black-box settings and demonstrates significant improvements in attack performance across various methods, models, and datasets. The findings suggest that integrating low-rank optimization into adversarial training can also reduce memory overhead, thereby offering new avenues for both attack and defense strategies in machine learning security.
Methodology
The authors conducted a theoretical analysis to prove the low-rank nature of adversarial perturbations and performed empirical evaluations across various attack methods, model architectures, and datasets. They developed a two-step method that involves projecting gradients into a low-dimensional subspace and constraining the search for adversarial perturbations within this subspace.
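A hedged sketch of the two steps as they might look in practice: build a low-rank basis from reference-model gradients on auxiliary data, then spend black-box queries searching only over coefficients in that basis. The random-search inner loop and the coefficient clipping are simplifications for illustration, not the paper's attack.

```python
import numpy as np

def low_rank_basis(reference_grads, k):
    """reference_grads: (n, d) gradients from a reference model on auxiliary data; returns (d, k) basis."""
    _, _, vt = np.linalg.svd(reference_grads, full_matrices=False)
    return vt[:k].T

def subspace_attack(x, loss_fn, basis, eps=0.05, steps=200, seed=0):
    """Black-box random search over k subspace coefficients instead of all d input dimensions."""
    rng = np.random.default_rng(seed)
    best_c = np.zeros(basis.shape[1])
    best_loss = loss_fn(x)
    for _ in range(steps):
        c = np.clip(best_c + 0.01 * rng.standard_normal(basis.shape[1]), -eps, eps)
        loss = loss_fn(x + basis @ c)
        if loss > best_loss:
            best_c, best_loss = c, loss
    return x + basis @ best_c
```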
Results
The results showed substantial improvements in the performance of low-rank adversarial attacks compared to conventional methods, with enhanced efficiency and effectiveness in generating adversarial examples across different scenarios.
Implications
The insights from this study can inform the design of more efficient adversarial attacks and defenses, potentially leading to more robust machine learning models against adversarial threats. Additionally, the findings may influence future research in optimizing adversarial training processes.
Anomaly Detection in Soil Heavy Metal Contamination Using Unsupervised Learning for Environmental Risk Assessment
Theory
- Unsupervised machine learning effectively detects soil heavy metal contamination anomalies.
- Isolation Forest and PCA reconstruction error identified significant anomalies in soil samples.
- Anomalies exhibited 70-80% higher Hazard Index values compared to normal samples.
- Three distinct types of contamination anomalies were identified at specific sites.
Read more
Anomaly Detection in Soil Heavy Metal Contamination Using Unsupervised Learning for Environmental Risk Assessment
Summary
This study addresses the critical issue of soil contamination by heavy metals in Ghana, particularly at unregulated waste disposal sites. The authors apply an unsupervised machine learning framework to detect and characterize anomalous heavy metal contamination patterns in soil samples from twelve waste sites and residential controls in the Central Region of Ghana. They analyze concentrations of eight metals (As, Cd, Cr, Cu, Hg, Ni, Pb, Zn) alongside health risk indices such as the Hazard Index (HI) and Incremental Lifetime Cancer Risk (ILCR). The study employs three anomaly detection algorithms: Isolation Forest, DBSCAN, and Principal Component Analysis (PCA) reconstruction error. The Isolation Forest and PCA methods identified 12 anomalous samples (15.4% of 78 samples), while DBSCAN did not detect any density-isolated noise points. A consensus approach revealed six robust anomalies (7.7%), all located at a single site (S3), with these anomalies showing significantly higher mean HI values than normal samples. The study identifies three distinct types of anomalies, highlighting the effectiveness of unsupervised learning in providing detailed insights into soil contamination, which can enhance environmental risk assessment and management strategies.
Methodology
The study utilized an unsupervised learning framework, applying Isolation Forest, DBSCAN, and PCA reconstruction error to analyze soil samples for heavy metal concentrations. The methods aimed to identify anomalous patterns in the data without requiring pre-labeled contamination events.
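A scikit-learn sketch of the consensus step, assuming the same contamination rate for both detectors: a sample counts as a robust anomaly only when the Isolation Forest and the PCA reconstruction-error criterion both flag it. Thresholds and component counts here are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

def consensus_anomalies(X, contamination=0.15, n_components=3):
    iso = IsolationForest(contamination=contamination, random_state=0)
    iforest_flags = iso.fit_predict(X) == -1
    pca = PCA(n_components=n_components).fit(X)
    recon_error = np.linalg.norm(X - pca.inverse_transform(pca.transform(X)), axis=1)
    pca_flags = recon_error > np.quantile(recon_error, 1.0 - contamination)
    return iforest_flags & pca_flags        # consensus: flagged by both detectors

X = np.vstack([np.random.default_rng(0).normal(size=(70, 8)),
               np.random.default_rng(1).normal(loc=6.0, size=(8, 8))])
print(consensus_anomalies(X).sum())         # number of consensus anomalies in a toy two-cluster dataset
```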
Results
The analysis revealed 12 anomalous samples using Isolation Forest and PCA, with a consensus approach isolating six robust anomalies. These anomalies were concentrated at one site (S3) and exhibited significantly elevated health risk indices, indicating a strong correlation between multivariate deviations and health risks.
Implications
The findings suggest that unsupervised machine learning can enhance environmental risk assessments by providing detailed insights into contamination patterns. This approach can inform targeted interventions and improve monitoring strategies in vulnerable areas, ultimately contributing to better public health outcomes.
On the Expressive Power of GNNs to Solve Linear SDPs
Graph Learning
Optimization
Theory
- Standard GNN architectures fail to recover solutions for linear SDPs.
- The VC-2-FWL architecture is proposed as a more expressive alternative capable of capturing the structure of SDPs.
- Empirical results show that VC-2-FWL achieves lower prediction errors and objective gaps compared to weaker models.
- Utilizing predictions from VC-2-FWL to warm-start traditional solvers can lead to significant computational speedups.
Read more
On the Expressive Power of GNNs to Solve Linear SDPs
Summary
This paper investigates the ability of Graph Neural Networks (GNNs) to effectively solve Linear Semidefinite Programs (SDPs), which are crucial in convex optimization but computationally intensive to solve. The authors first demonstrate that standard GNN architectures are inadequate for recovering solutions to linear SDPs. They then propose a more expressive architecture, termed VC-2-FWL, which is capable of capturing the essential structure of SDPs and can emulate the updates of a standard first-order solver. Empirical evaluations on both synthetic datasets and SDPLIB benchmarks reveal that the VC-2-FWL architecture consistently outperforms weaker baselines in terms of prediction error and objective gap. Furthermore, the high-quality predictions generated by this architecture can be utilized to warm-start traditional solvers, leading to significant speedups of up to 80%. This work establishes a theoretical foundation for the expressivity required in GNNs to solve linear SDPs and provides a blueprint for future research in learning-based optimization.
Methodology
The authors analyze the expressivity of various GNN architectures, proving that standard message-passing methods are insufficient for representing linear SDPs. They introduce the VC-2-FWL architecture, which incorporates a two-dimensional Weisfeiler–Leman approach to effectively capture the necessary structural symmetries of SDPs. Empirical validation is conducted using synthetic and real-world benchmarks to compare the performance of VC-2-FWL against standard and weaker GNN architectures.
Results
The VC-2-FWL architecture consistently demonstrates lower prediction errors and objective gaps on both synthetic and SDPLIB benchmarks. Additionally, it enables warm-starting of traditional solvers, resulting in speed improvements of up to 80% in convergence time.
Implications
This research has significant implications for the field of optimization, particularly in developing efficient machine learning models that can serve as surrogates for traditional optimization methods. The findings could lead to advancements in solving large-scale SDPs and potentially other complex optimization problems using GNNs.
Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making
Optimization
Reinforcement Learning
Theory
- BCCB is an online framework that combines HTE estimation, exploration, and budget pacing into a unified decision-making process.
- It operates effectively without the need for historical data, making it suitable for cold-start scenarios.
- BCCB shows 3-5x lower performance variance than traditional offline methods, enhancing predictability in campaign planning.
- The framework consistently outperforms existing online methods, particularly at higher budget levels.
Read more
Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making
Summary
This paper addresses the challenge of treatment allocation under budget constraints in digital advertising, where advertisers must efficiently allocate limited budgets to users with varying responses to ads. The traditional two-stage offline approach, which estimates heterogeneous treatment effects (HTE) from historical data before optimizing budget allocation, struggles in cold-start scenarios with insufficient data. The author proposes Budget-Constrained Causal Bandits (BCCB), an online framework that learns user responses in real-time while managing budget constraints. BCCB integrates HTE estimation, exploration of uncertain responses, and adaptive budget pacing into a single sequential decision-making process. Evaluations on the Criteo Uplift dataset reveal that BCCB operates effectively from the first user, requiring no pre-collected historical data, and exhibits significantly lower performance variance compared to offline methods. Furthermore, BCCB consistently outperforms standard Thompson Sampling and other online methods across various budget levels, particularly excelling at higher budgets.
Methodology
The proposed BCCB framework employs a sequential decision-making approach that integrates heterogeneous treatment effect estimation with Thompson Sampling for exploration and adaptive budget pacing. This allows for real-time learning and treatment allocation based on individual user responses.
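As a rough illustration of how these pieces fit together, here is a minimal sketch of a budget-paced Thompson Sampling step for uplift-style allocation. The segment structure, Beta/Bernoulli model, and pacing rule are illustrative assumptions, not the paper's exact BCCB algorithm.

```python
import numpy as np

class BudgetedThompsonUplift:
    """Illustrative budget-paced Thompson Sampling for treatment allocation."""

    def __init__(self, n_segments: int, budget: int, horizon: int):
        # Beta(1, 1) priors for control (column 0) and treatment (column 1) response per segment
        self.alpha = np.ones((n_segments, 2))
        self.beta = np.ones((n_segments, 2))
        self.budget_left = budget
        self.steps_left = horizon

    def allocate(self, segment: int) -> int:
        """Return 1 to treat the incoming user, 0 otherwise."""
        # Thompson sampling: draw response rates for control and treatment
        p = np.random.beta(self.alpha[segment], self.beta[segment])
        uplift = p[1] - p[0]
        # Simple pacing: spend the remaining budget roughly evenly over the remaining horizon
        pace_ok = self.budget_left > 0 and np.random.rand() < self.budget_left / max(self.steps_left, 1)
        treat = int(uplift > 0 and pace_ok)
        self.budget_left -= treat
        self.steps_left -= 1
        return treat

    def update(self, segment: int, treated: int, converted: int):
        # Posterior update for the arm that was actually played
        self.alpha[segment, treated] += converted
        self.beta[segment, treated] += 1 - converted
```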
Results
BCCB operated effectively from its first observation, whereas offline methods needed approximately 10,000 historical observations before becoming competitive. It also achieved 3-5x lower performance variance across repeated runs and outperformed both standard and budgeted Thompson Sampling across all tested budget levels.
Implications
The BCCB framework has significant implications for digital advertising and other domains where budget-constrained treatment allocation is critical. Its ability to function effectively in cold-start scenarios can enhance campaign efficiency and effectiveness, leading to better resource utilization and improved outcomes.
Generalizing the Geometry of Model Merging Through Fréchet Averages
Theory
Optimization
Efficient ML
- Model merging can be fragile due to architectural symmetries; traditional averaging methods may fail.
- Fréchet averaging provides a symmetry-aware merging solution by minimizing geodesic distances on a manifold.
- The choice of geometry (metric and manifold) is critical for effective model merging.
- The paper introduces a practical algorithm for merging low-rank adapters (LoRA) that addresses alignment issues.
Read more
Generalizing the Geometry of Model Merging Through Fréchet Averages
Summary
This paper addresses the challenge of model merging, which seeks to combine multiple trained models into a single model without additional training. Traditional methods of parameter-space averaging can fail due to architectural symmetries, leading to fragile merges. The authors propose a novel approach that utilizes Fréchet averaging, which minimizes a sum of geodesic distances on an appropriate manifold, ensuring that the merging process is symmetry-invariant. They emphasize that the choice of geometry, including the metric and manifold, is crucial for defining 'closeness' between models. The paper also discusses the specific case of low-rank adapters (LoRA), which exhibit unique symmetries that necessitate a distinct geometric approach. The authors critique existing LoRA merging methods and introduce a practical algorithm that aligns task-updates in a shared basis before applying merge operations. This approach demonstrates improved performance compared to traditional methods, particularly in scenarios where models are fine-tuned using parameter-efficient techniques.
Methodology
The authors develop a geometric framework called GeoMerge, which formulates model merging as computing a Fréchet mean in a chosen Riemannian representation space. This involves selecting an appropriate parameter manifold, metric, and equivalence relation to ensure that the merging process respects the underlying symmetries of the models being merged.
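The Fréchet mean itself is the standard Karcher-style construction. The sketch below computes it on the unit sphere via repeated log/exp maps; GeoMerge, as described, would apply the same recipe to a task-appropriate parameter manifold and metric, which are design choices not reproduced here.

```python
import numpy as np

def sphere_log(base, x):
    """Log map on the unit sphere: tangent vector at `base` pointing toward `x`."""
    v = x - np.dot(base, x) * base
    norm = np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(base, x), -1.0, 1.0))
    return np.zeros_like(base) if norm < 1e-12 else theta * v / norm

def sphere_exp(base, v):
    """Exp map on the unit sphere: move from `base` along tangent vector `v`."""
    theta = np.linalg.norm(v)
    return base if theta < 1e-12 else np.cos(theta) * base + np.sin(theta) * v / theta

def frechet_mean(points, iters=50):
    """Fréchet (Karcher) mean of unit vectors by averaging in the tangent space."""
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        # Average the tangent vectors to all points, then step along the geodesic
        tangent = np.mean([sphere_log(mean, p) for p in points], axis=0)
        mean = sphere_exp(mean, tangent)
    return mean
```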
Results
The proposed Fréchet averaging method successfully addresses the limitations of naive merging techniques, particularly in the context of low-rank adapters. The authors provide empirical evidence demonstrating that their approach yields better performance than existing methods, especially in scenarios where models exhibit significant architectural symmetries.
Implications
This work has significant implications for the field of model merging, particularly in applications involving ensembles of models or parameter-efficient fine-tuning techniques. The proposed methods could enhance the robustness and effectiveness of model merging in various machine learning workflows, especially in scenarios where data is limited or inaccessible.
Hierarchical adaptive control for real-time dynamic inference at the edge
Efficient ML
- Introduction of a budgeted specialized predictor cascade that adheres to worst-case latency constraints.
- Development of a hierarchical control system that adapts to data and resource changes in real-time.
- Demonstration of substantial reductions in latency (up to 2.45x) and energy consumption (up to 2.86x) with minimal accuracy drop (<4%) compared to static baselines.
- Focus on enhancing the deployment of dynamic ML models in edge computing environments.
Read more
Hierarchical adaptive control for real-time dynamic inference at the edge
Summary
This paper addresses the challenges of deploying dynamic machine learning (ML) models in industrial edge systems, which must operate under strict latency, energy, and memory constraints. The authors propose a two-tier adaptive architecture that co-optimizes model and system decisions to enhance efficiency. At the global level, a constraint-driven scheduler configures a cascade of lightweight specialized classifiers and a generalist fallback model for each edge node, ensuring compliance with latency and memory requirements. At the node level, a local controller monitors data drift and hardware resources, adaptively managing the specialized predictors to maintain energy efficiency and avoid latency violations. The proposed architecture allows for longer operational times without necessitating a full redeployment, even when the global controller is unreachable. The evaluation on two vision datasets demonstrates significant improvements in latency and energy efficiency, with minimal accuracy loss compared to static models.
Methodology
The authors designed a two-tier adaptive architecture comprising a global scheduler and a local controller. The global scheduler selects a cascade of specialized predictors and a fallback model for each edge node, optimizing for latency and memory constraints. The local controller tracks resource usage and data drift, enabling dynamic adjustments to the model configuration during runtime.
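A minimal sketch of the node-level cascade logic follows, assuming each model exposes a predict_proba method and per-model latency costs are known; the confidence threshold and budget check are illustrative stand-ins for the paper's controller, not its actual policy.

```python
import numpy as np

def cascade_predict(x, specialists, generalist, generalist_cost_ms,
                    conf_threshold=0.9, latency_budget_ms=50.0):
    """Run specialized predictors in order, falling back to the generalist model.

    specialists: list of (model, cost_ms) pairs; all names here are hypothetical.
    """
    spent = 0.0
    for model, cost_ms in specialists:
        # Respect the worst-case budget: always leave room for the generalist fallback
        if spent + cost_ms + generalist_cost_ms > latency_budget_ms:
            break
        probs = model.predict_proba(x)
        spent += cost_ms
        if np.max(probs) >= conf_threshold:
            return int(np.argmax(probs))          # a specialist is confident enough
    return int(np.argmax(generalist.predict_proba(x)))  # fall back to the generalist
```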
Results
The experimental evaluation on two vision datasets under controlled distribution mismatch scenarios showed that the proposed architecture achieved average reductions in latency of up to 2.45 times and energy consumption of up to 2.86 times, while maintaining less than a 4% drop in accuracy compared to static models.
Implications
This work has significant implications for the deployment of machine learning in industrial applications, particularly in edge computing environments where latency and energy efficiency are critical. The proposed architecture can enhance the performance of real-time systems in various domains, including remote healthcare and autonomous machinery.
PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures
Time Series
- Comparison of CNN, CNN-LSTM, Transformer, and Mamba architectures for PPG-based affect recognition.
- Transformers and Mamba models show comparable performance to CNNs but do not consistently outperform them.
- CNNs achieve the highest accuracy and efficiency, making them the most effective overall.
- Transformers provide a better balance of F1 scores for arousal and relaxation states.
Read more
PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures
Summary
This paper investigates the effectiveness of various deep learning architectures for affect recognition using photoplethysmography (PPG) signals. The authors compare Convolutional Neural Networks (CNNs), CNN-LSTM hybrids, Transformers, and Mamba models in classifying emotional states such as arousal, valence, and relaxation. The study is motivated by the increasing use of PPG in wearable devices and the need for robust models that can generalize well despite the challenges posed by small and noisy datasets. The authors employ a subject-independent 5-fold cross-validation protocol, ensuring consistent preprocessing and training across all models. Results indicate that while Transformers and Mamba models perform comparably to CNNs, they do not consistently outperform them. CNNs demonstrate the highest accuracy and efficiency, while Transformers provide a better balance of F1 scores for specific emotional states. This research marks the first evaluation of Transformer and Mamba architectures in the context of PPG-based affect recognition, offering insights into model selection for wearable affective computing applications.
Methodology
The study employs a measurement-driven comparison of four deep learning architectures (CNN, CNN-LSTM hybrid, Transformers, and Mamba) using the WARM-VR dataset. A subject-independent 5-fold cross-validation protocol is utilized, ensuring identical preprocessing, segmentation, and training pipelines across all models.
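Subject-independent folds of this kind can be expressed with scikit-learn's GroupKFold, grouping segments by subject so that no subject appears in both training and test sets; the data and model below are placeholders, not the authors' WARM-VR pipeline.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholder data: PPG segments, affect labels, and the subject each segment came from
X = np.random.randn(1000, 256)
y = np.random.randint(0, 2, size=1000)
subjects = np.random.randint(0, 20, size=1000)

for fold, (train_idx, test_idx) in enumerate(GroupKFold(n_splits=5).split(X, y, groups=subjects)):
    # No subject's segments appear in both splits
    assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
    # Fit any of the compared architectures (CNN, CNN-LSTM, Transformer, Mamba) here
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test segments")
```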
Results
The results reveal that CNNs outperform other architectures in terms of accuracy and model size. Transformers and Mamba models achieve performance levels comparable to CNNs but do not consistently exceed their performance across all tasks. Specifically, Transformers exhibit a better balance of F1 scores for arousal and relaxation classifications.
Implications
The findings suggest that while advanced models like Transformers and Mamba may offer benefits in certain contexts, traditional CNN architectures remain the most effective choice for PPG-based affect recognition, particularly in scenarios with limited data. This has implications for the design of wearable affective computing systems, guiding practitioners in model selection based on specific application needs.
reward-lens: A Mechanistic Interpretability Library for Reward Models
NLP
Large Language Models
Reinforcement Learning
- Introduction of 'reward-lens', the first toolkit for mechanistic interpretability of reward models.
- The library organizes interpretability tools around the weight vector of the reward head.
- Includes five theory-grounded extensions to enhance interpretability.
- Empirical validation shows linear attribution does not predict causal importance.
Read more
reward-lens: A Mechanistic Interpretability Library for Reward Models
Summary
The paper introduces 'reward-lens', an open-source library designed to enhance mechanistic interpretability for reward models used in reinforcement learning from human feedback (RLHF) trained language models. Traditional interpretability tools have primarily focused on generative language models, often failing to accommodate the unique structure of reward models, which utilize a scalar regression head instead of a vocabulary unembedding. The library is built around the observation that the weight vector of the reward head serves as a natural axis for interpretability inquiries. It includes various tools such as the Reward Lens, component attribution, contrastive activation patching, and a suite for probing reward hacking. Additionally, the library features five extensions based on recent alignment theory results, including a distortion index and a misalignment cascade detector. The framework was validated on two production reward models, revealing that linear attribution does not predict causal importance, a finding treated as a feature of the framework rather than a flaw. This work aims to bridge the gap in mechanistic evaluation of reward models and provide a comprehensive toolkit for researchers.
Methodology
The reward-lens library was developed to provide a systematic approach to interpretability for reward models. It includes tools for projecting intermediate model states onto the reward head's weight vector, allowing for various interpretability analyses such as component attribution and activation patching. The library was validated through empirical evaluations on two reward models using approximately 695 RewardBench preference pairs.
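The core projection is simple to sketch: intermediate hidden states are scored along the reward head's weight direction. The snippet below illustrates that idea only; the actual reward-lens API and attribute names may differ.

```python
import torch

@torch.no_grad()
def reward_lens(hidden_states, reward_head_weight):
    """Project per-layer hidden states onto the reward head's weight vector.

    hidden_states: list of (seq_len, d_model) tensors, one per layer.
    reward_head_weight: (d_model,) weight vector of the scalar reward head.
    Returns one reward-direction score per layer for the final token.
    """
    w = reward_head_weight / reward_head_weight.norm()
    return torch.stack([h[-1] @ w for h in hidden_states])
```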
Results
The validation of the reward-lens library on two production reward models revealed that linear attribution methods did not accurately predict causal importance, with mean Spearman correlation coefficients of -0.256 for Skywork and -0.027 for ArmoRM. This negative result indicates that traditional interpretability methods may not be suitable for reward models, highlighting the need for the new framework.
Implications
The reward-lens library has the potential to significantly advance the interpretability of reward models in RLHF, providing researchers with tools to better understand and evaluate the safety and alignment of these models. It may also facilitate the development of more robust reward models by exposing their structural properties and failure modes.
PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking
Time Series
- Introduction of PROMISE-AD, a leakage-safe survival framework for AD progression prediction.
- Development of progression-aware visit tokenization to handle irregular clinical histories.
- Utilization of a temporal Transformer for effective risk estimation.
- Achieved state-of-the-art performance in predicting AD conversion, with low Brier scores and a high C-index.
Read more
PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking
Summary
The paper presents PROMISE-AD, a novel framework designed for predicting the progression of Alzheimer's disease (AD) from cognitively normal (CN) status to mild cognitive impairment (MCI) and from MCI to AD dementia. The framework addresses critical challenges in AD progression prediction, including irregular visit patterns, censoring of data, and diagnostic leakage. PROMISE-AD utilizes a unique visit tokenization approach that encodes various clinical measurements, missingness indicators, and longitudinal changes while excluding diagnostic labels to prevent leakage. A temporal Transformer model is employed to fuse different representations of patient data, allowing for the estimation of a progression score and latent discrete-time mixture hazards. The training process incorporates multiple objectives, including survival likelihood and horizon-specific risk loss, followed by isotonic calibration for risk estimation at multiple time horizons. The framework was evaluated using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and TADPOLE Challenge, demonstrating superior performance in predicting CN-to-MCI and MCI-to-AD conversion compared to existing methods.
Methodology
PROMISE-AD employs a hybrid approach combining progression-aware visit tokenization, a temporal Transformer for representation learning, and latent mixture-based survival modeling. The framework encodes clinical visit data while avoiding leakage, and it integrates various training objectives to enhance risk estimation accuracy across multiple time horizons.
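The discrete-time survival component follows the standard construction: per-interval hazards are turned into cumulative multi-horizon risk. The sketch below shows only that step; PROMISE-AD's mixture parameterization and isotonic calibration are not reproduced here.

```python
import torch

def multi_horizon_risk(hazards: torch.Tensor) -> torch.Tensor:
    """hazards: (batch, T) per-interval conversion probabilities in [0, 1].

    Returns (batch, T) cumulative risk of conversion by the end of each interval,
    i.e. 1 minus the product of per-interval survival probabilities.
    """
    survival = torch.cumprod(1.0 - hazards, dim=1)  # P(no event through interval t)
    return 1.0 - survival                           # P(event by horizon t)
```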
Results
In testing, PROMISE-AD achieved an integrated Brier score (IBS) of 0.085 ± 0.012 and a C-index of 0.808 ± 0.015 for CN-to-MCI conversion, outperforming other methods. For MCI-to-AD conversion, it reached a C-index of 0.894 ± 0.018 and near-ceiling performance in 5-year discrimination metrics (AUROC 0.997 ± 0.003; AUPRC 0.999 ± 0.001).
Implications
The findings suggest that PROMISE-AD can significantly improve the accuracy of AD progression predictions, which is crucial for early intervention and personalized treatment planning. The methodology can be adapted for other diseases with similar progression patterns, enhancing clinical decision-making.
Diagnosing Capability Gaps in Fine-Tuning Data
NLP
Large Language Models
Reinforcement Learning
- GOALCOVER enables systematic detection of capability gaps in fine-tuning datasets.
- The framework decomposes high-level goals into atomic subgoals for better evaluation.
- Controlled experiments validate GOALCOVER's effectiveness in identifying targeted capability impacts.
- Training on GOALCOVER-filtered data improves model performance in downstream tasks.
Read more
Diagnosing Capability Gaps in Fine-Tuning Data
Summary
The paper addresses the challenge of identifying capability gaps in fine-tuning datasets for large language models (LLMs) before costly training runs. The authors introduce GOALCOVER, a framework that enables practitioners to systematically detect these gaps through interactive goal decomposition and automated coverage assessment. GOALCOVER assists users in breaking down high-level goals into specific, independently evaluable subgoals, assigns alignment scores to training samples, and highlights missing capabilities through analysis of low-scoring samples. The framework was validated through controlled corruption experiments in three domains (medical QA, legal summarization, code generation), demonstrating its ability to distinguish between targeted and non-targeted capability impacts. Additionally, the authors showcased the downstream utility of GOALCOVER in a financial summarization Reinforcement Fine-Tuning (RFT) task, where training on GOALCOVER-filtered data significantly improved model performance. Overall, GOALCOVER serves as a practical diagnostic tool for detecting capability gaps and providing actionable insights for dataset refinement prior to fine-tuning.
Methodology
GOALCOVER operates in two phases: (1) an interactive goal-clarification system that helps practitioners decompose objectives into specific subgoals, and (2) an automated coverage pipeline that scores each training sample against these subgoals. The framework uses LLM-based evaluators to assess alignment and surfaces missing capabilities through structured explanations.
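A stripped-down version of the coverage phase might look like the following, with a placeholder score_fn standing in for the LLM-based evaluator; the report structure is illustrative, not GOALCOVER's actual output format.

```python
from statistics import mean

def coverage_report(samples, subgoals, score_fn, low_threshold=0.5):
    """Score every training sample against every subgoal and rank subgoals by coverage.

    score_fn(sample, goal) is assumed to return an alignment score in [0, 1];
    in GOALCOVER this role is played by an LLM-based evaluator.
    """
    report = {}
    for goal in subgoals:
        scores = [score_fn(sample, goal) for sample in samples]
        report[goal] = {
            "mean_alignment": mean(scores),
            "n_low": sum(s < low_threshold for s in scores),
        }
    # Subgoals with the lowest mean alignment surface first: likely capability gaps
    return dict(sorted(report.items(), key=lambda kv: kv[1]["mean_alignment"]))
```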
Results
The framework demonstrated a reliable distinction between targeted and non-targeted capability impacts, with target subgoals degrading by an average of 25.6% compared to 2.1% for non-target subgoals. In a financial summarization RFT task, training on GOALCOVER-filtered data improved the LLM-judge reward from 3.77 to 4.12, with the best configuration reaching 4.20.
Implications
GOALCOVER has significant implications for practitioners in various domains, as it provides a structured approach to ensure that fine-tuning datasets adequately cover necessary capabilities. This can lead to more reliable and effective deployment of LLMs in high-stakes applications, reducing the risk of production failures due to capability gaps.
Who Trains Matters: Federated Learning under Enrollment and Participation Selection Biases
Federated Learning
- Introduces a two-stage selection framework for Federated Learning, addressing both enrollment and participation biases.
- Develops FEDIPW, an inverse-probability-weighted aggregation method to recover the target-population mean update.
- Proposes a limited-information aggregate-calibration extension for scenarios where client-level data is unavailable.
- Demonstrates through experiments that enrollment correction effectively reduces target-population error.
Read more
Who Trains Matters: Federated Learning under Enrollment and Participation Selection Biases
Summary
This paper addresses the issue of selection bias in Federated Learning (FL), which arises from two stages: enrollment bias and participation bias. Enrollment bias occurs when eligibility rules determine which clients are reachable for training, while participation bias arises from factors affecting which enrolled clients contribute updates during training. The author formalizes FL under a two-stage selection model and introduces FEDIPW, an inverse-probability-weighted aggregation scheme that aims to recover the target-population mean update. The paper also discusses a limited-information aggregate-calibration extension to partially correct enrollment bias when client-level covariates are unavailable. An algorithm-agnostic optimization analysis reveals that incomplete selection correction can lead to a non-vanishing bias floor. Experiments using synthetic federated logistic regression validate the proposed methods, demonstrating that enrollment correction can significantly reduce target-population error.
Methodology
The paper formalizes a two-stage selection model in Federated Learning and derives the FEDIPW aggregation scheme. It also introduces an aggregate-calibration extension for limited-information scenarios and conducts an optimization analysis to understand the impact of residual weighting errors on bias.
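In the spirit of FEDIPW, a self-normalized inverse-probability-weighted aggregation of client updates can be sketched as below, assuming the enrollment-and-participation probabilities are known, as in the full-information setting; this is an illustration, not the paper's exact estimator.

```python
import numpy as np

def fedipw_aggregate(updates, inclusion_probs):
    """Inverse-probability-weighted mean of client updates.

    updates: list of client model updates (np.ndarray, same shape).
    inclusion_probs: per-client probability of being enrolled and participating.
    """
    weights = 1.0 / np.asarray(inclusion_probs)
    weighted = sum(w * u for w, u in zip(weights, updates))
    return weighted / weights.sum()   # self-normalized IPW estimate of the population mean update
```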
Results
The experiments show that naive aggregation can converge toward a selected-client solution rather than the target-population objective. FEDIPW recovers the target-population objective when inclusion probabilities are known, and the aggregate calibration provides a useful partial correction in limited-information settings.
Implications
The findings suggest that addressing enrollment bias is crucial for improving the performance of Federated Learning systems, particularly in applications where the representativeness of client data is critical for model accuracy. This work can inform the design of more robust FL algorithms that better align with target populations.