AI-generated summaries
Today's ML research,
without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
24 papers today · 8h update frequency · 7 days of history
AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction
Multimodal
- AIM-DDI is a model-agnostic integration module for DDI prediction.
- It maps heterogeneous drug modalities into a shared latent space as tokens.
- The module improves prediction performance, particularly in unseen-drug scenarios.
- AIM-DDI shows significant relative improvements in accuracy and recall metrics.
AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction
Summary
The paper addresses the critical task of drug-drug interaction (DDI) prediction, which is essential for ensuring patient safety in clinical settings. A significant challenge in this domain is unseen-drug generalization, where models must predict interactions involving drugs not present in the training data. The authors propose AIM-DDI, a model-agnostic multimodal integration module that allows for the effective fusion of diverse drug-related information, such as structural, chemical, and semantic signals, into a shared latent space. This approach overcomes the limitations of existing multimodal DDI models, which are often tied to specific architectures, making them less flexible. AIM-DDI enables the integration of heterogeneous modality information through a unified fusion module, enhancing the robustness of DDI predictions, especially in scenarios where both drugs in a test pair are unseen. The authors conduct extensive evaluations using various DDI prediction models and demonstrate that AIM-DDI consistently improves prediction performance, achieving significant gains in accuracy, macro-F1, and macro-recall metrics. The findings suggest that treating multimodal integration as a reusable module is an effective strategy for enhancing unseen-drug DDI prediction.
Methodology
The authors developed AIM-DDI as a multimodal integration module that operates independently of specific prediction architectures. It utilizes a shared latent space to represent various drug modalities as tokens and employs a unified fusion module to model dependencies across these modalities. The effectiveness of AIM-DDI was evaluated using a DrugBank-based multimodal DDI benchmark with three different prediction models, focusing on both one-unseen and both-unseen drug settings.
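To make the token-level fusion concrete, here is a minimal sketch of one way the idea could be realized: each modality vector is projected into a shared latent space as a token and a small Transformer encoder fuses them into a single drug embedding. The dimensions, the learned fusion token, and the Transformer fusion are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): project heterogeneous drug modalities
# (e.g. structural, chemical, semantic vectors) into a shared latent space as
# tokens and fuse them with a small Transformer encoder.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, modality_dims, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # One projection per modality maps it into the shared token space.
        self.projections = nn.ModuleList(nn.Linear(d, d_model) for d in modality_dims)
        self.fusion_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, n_layers)

    def forward(self, modality_features):
        # modality_features: list of (batch, dim_m) tensors, one per modality.
        tokens = [proj(x).unsqueeze(1)
                  for proj, x in zip(self.projections, modality_features)]
        batch = tokens[0].shape[0]
        seq = torch.cat([self.fusion_token.expand(batch, -1, -1)] + tokens, dim=1)
        return self.fusion(seq)[:, 0]          # fused drug embedding

# A downstream DDI predictor could then score a pair of fused drug embeddings.
fuser = MultimodalFusion(modality_dims=[2048, 300, 768])
drug_a = [torch.randn(4, 2048), torch.randn(4, 300), torch.randn(4, 768)]
print(fuser(drug_a).shape)  # torch.Size([4, 128])
```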
Results
AIM-DDI demonstrated substantial improvements in DDI prediction performance, achieving relative gains of up to 23.34% in accuracy, 66.04% in macro-F1, and 86.00% in macro-recall when both drugs in a test pair were unseen. These results were consistent across multiple DDI prediction models and additional DrugBank-based frameworks.
Implications
The AIM-DDI module can enhance the robustness of DDI prediction systems, making them more reliable for clinical applications. Its model-agnostic nature allows for broader applicability across different architectures, facilitating safer drug prescriptions and improved pharmacovigilance.
Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement
Generative Models
- Identification of a stability-novelty trade-off in crystal generation.
- Development of Crys-JEPA, an energy-aware latent surrogate for stability evaluation.
- Introduction of a screening-and-refinement pipeline to improve generative model performance.
- Significant performance improvements over baseline models on crystal generation metrics.
Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement
Summary
The paper presents Crys-JEPA, a novel approach to crystal discovery that addresses the limitations of existing generative models which often prioritize stability over novelty. The authors identify a significant trade-off between stability and novelty in current crystal generation methods, where maximizing the likelihood of observed crystals tends to restrict the exploration of new, stable materials. To overcome this challenge, Crys-JEPA employs a joint embedding predictive architecture that creates an energy-aware latent space, allowing for efficient stability assessments without relying heavily on computationally expensive energy evaluations. The methodology includes a screening-and-refinement pipeline that selects promising generated crystals and refines the generative model based on these selections. The results demonstrate substantial improvements in generation quality, achieving up to 81.4% and 82.6% on the V.S.U.N metric across two datasets, MP-20 and Alex-MP-20, respectively. This work not only enhances the efficiency of crystal discovery but also opens avenues for the development of new materials with desirable properties.
Methodology
Crys-JEPA utilizes a joint embedding predictive architecture to create an energy-aware latent space that preserves formation-energy differences. This allows for stability assessments based on comparisons with training crystals rather than expensive energy calculations. The approach includes a refinement loop where generated crystals are screened for stability and novelty, and the generative model is fine-tuned based on selected samples.
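The energy-aware latent space can be illustrated with a toy sketch: an encoder with a scalar head is trained so that predicted energy differences between crystal pairs match true formation-energy differences, and generated candidates are then screened by comparing their latent-space energy estimates with nearby training crystals. The architecture, pairwise loss, and screening threshold below are assumptions for illustration only.

```python
# Toy sketch (assumptions throughout, not the paper's implementation): learn an
# embedding whose scalar head preserves formation-energy differences between
# crystal pairs, then screen generated candidates against embedded training data.
import torch
import torch.nn as nn

class EnergyAwareEncoder(nn.Module):
    def __init__(self, in_dim, d_latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, d_latent))
        self.energy_head = nn.Linear(d_latent, 1)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.energy_head(z).squeeze(-1)

def pairwise_energy_loss(e_pred, e_true):
    # Match predicted and true formation-energy differences over all pairs.
    diff_pred = e_pred[:, None] - e_pred[None, :]
    diff_true = e_true[:, None] - e_true[None, :]
    return ((diff_pred - diff_true) ** 2).mean()

model = EnergyAwareEncoder(in_dim=32)
x, e_true = torch.randn(16, 32), torch.randn(16)
z, e_pred = model(x)
pairwise_energy_loss(e_pred, e_true).backward()

# Screening: keep generated candidates whose estimated energy is competitive
# with their nearest training crystals in latent space (threshold is arbitrary).
with torch.no_grad():
    z_train, e_train = model(x)
    z_gen, e_gen = model(torch.randn(8, 32))
    nearest = torch.cdist(z_gen, z_train).argmin(dim=1)
    keep = e_gen <= e_train[nearest] + 0.1
print(keep)
```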
Results
The proposed method outperformed baseline models, reaching V.S.U.N scores of up to 81.4% on the MP-20 dataset and 82.6% on the Alex-MP-20 dataset, demonstrating the effectiveness of the Crys-JEPA architecture and the screening-and-refinement pipeline.
Implications
The findings suggest that Crys-JEPA can significantly accelerate the discovery of new stable and novel crystals, which has important implications for various applications in materials science, including the development of advanced materials for energy storage, catalysis, and other technological applications.
R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
Reinforcement Learning
Robotics
Theory
- R2R2 addresses representation-level instability in SPL under high UTD regimes.
- The method avoids zero-centering to preserve critical global dynamics information.
- R2R2 improves the performance of TD7 by approximately 22% at a UTD ratio of 20.
- SimbaV2-SPL, enhanced with R2R2, sets a new state-of-the-art in continuous control benchmarks.
R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
Summary
The paper introduces R2R2, a novel regularization method designed to enhance the efficiency of reinforcement learning (RL) in data-scarce environments, particularly in robotics. The authors identify a gap in existing research regarding representation-level instability in Self-Predictive Learning (SPL) when subjected to high Update-to-Data (UTD) ratios. R2R2 employs redundancy reduction principles to stabilize SPL performance, diverging from traditional zero-centering methods that can eliminate critical global dynamics information. The authors validate R2R2 on SPL-native algorithms like TD7, demonstrating significant improvements in performance, particularly at high UTD ratios. Additionally, they extend the state-of-the-art SimbaV2 architecture by integrating a tailored SPL module, termed SimbaV2-SPL, which establishes new benchmarks in continuous control tasks. The results show that R2R2 not only mitigates overfitting but also enhances the performance of existing architectures, confirming its effectiveness and compatibility with prior advancements.
Methodology
The authors propose R2R2 as a regularization method that incorporates redundancy reduction principles to stabilize the performance of SPL. They conduct theoretical analysis to demonstrate the conflict between SPL's spectral properties and zero-centering, leading to the development of a non-centered objective. R2R2 is applied to SPL-native algorithms like TD7 and integrated into the SimbaV2 architecture to create SimbaV2-SPL.
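A hedged sketch of the redundancy-reduction idea follows: the penalty decorrelates latent dimensions using a non-centered second-moment (Gram) matrix rather than a zero-centered covariance, so mean-level information about the global dynamics is not removed. The exact form of R2R2's objective is not reproduced here.

```python
# Hedged sketch of a redundancy-reduction regulariser that, unlike typical
# Barlow-Twins-style losses, does NOT zero-center the features, so global
# (mean-level) information about the dynamics is preserved.
import torch

def redundancy_reduction_loss(z, off_diag_weight=5e-3, eps=1e-8):
    # z: (batch, dim) latent features from the self-predictive encoder.
    # Normalise each dimension by its non-centered second moment instead of
    # subtracting the mean.
    scale = z.pow(2).mean(dim=0, keepdim=True).sqrt() + eps
    z_n = z / scale
    gram = (z_n.T @ z_n) / z.shape[0]          # (dim, dim), no centering
    on_diag = (torch.diagonal(gram) - 1).pow(2).sum()
    off_diag = (gram - torch.diag(torch.diagonal(gram))).pow(2).sum()
    return on_diag + off_diag_weight * off_diag

z = torch.randn(256, 128, requires_grad=True)
loss = redundancy_reduction_loss(z)
loss.backward()
print(float(loss))
```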
Results
Experiments across 11 continuous control tasks show that R2R2 effectively mitigates overfitting, improving TD7's performance by approximately 22% at a UTD ratio of 20. The integration of R2R2 into SimbaV2-SPL yields further performance gains, establishing a new state-of-the-art in continuous control benchmarks.
Implications
The findings suggest that R2R2 can significantly enhance the efficiency of reinforcement learning in data-scarce environments, making it particularly relevant for real-world robotics applications. The method's compatibility with existing architectures also opens avenues for further improvements in RL algorithms.
GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation
Multimodal
- GeoViSTA integrates gridded imagery and tabular socioeconomic data for comprehensive geospatial analysis.
- The model employs bilateral cross-attention and a geography-aware attention mechanism for effective feature fusion.
- Training is conducted using a self-supervised joint masked autoencoding objective, requiring no labeled data.
- GeoViSTA outperforms traditional models in predicting health-related and environmental outcomes.
GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation
Summary
The paper introduces GeoViSTA, a novel Geospatial Vision-Tabular Transformer designed to bridge the gap between visual and tabular data in geospatial analysis. Traditional geospatial models have primarily focused on gridded imagery, neglecting the structured socioeconomic data often stored in tabular formats. GeoViSTA addresses this limitation by employing a bilateral cross-attention mechanism that facilitates the exchange of spatial and semantic information across modalities. The model is trained using a self-supervised joint masked autoencoding objective, which encourages it to reconstruct missing elements from both image patches and tabular rows using local spatial context and cross-modal cues. The authors demonstrate the effectiveness of GeoViSTA through experiments on the Contiguous United States, where it integrates gridded visual representations with tabular socioeconomic data. The results indicate that GeoViSTA's unified embeddings significantly enhance performance in predicting critical outcomes such as disease-specific mortality and fire hazard frequency, outperforming existing unimodal and feature-concatenation baselines. The findings suggest that a holistic approach to modeling the physical environment alongside socioeconomic factors can yield more transferable representations for geospatial inference.
Methodology
GeoViSTA utilizes a vision-tabular architecture that incorporates bilateral cross-attention to facilitate interaction between visual and tabular data. It employs a geography-aware attention mechanism to align spatially relevant features, and is trained with a self-supervised joint masked autoencoding objective, which involves reconstructing missing data from both modalities.
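The bilateral cross-attention step can be sketched as two cross-attention blocks in which image-patch tokens attend to tabular tokens and vice versa; the masked-autoencoding objective and the geography-aware attention are omitted, and the layer sizes are assumptions rather than the paper's configuration.

```python
# Minimal sketch (architecture details are assumptions): bilateral cross-attention
# between image-patch tokens and tabular tokens as one building block of a
# vision-tabular fusion model.
import torch
import torch.nn as nn

class BilateralCrossAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.img_from_tab = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tab_from_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(d_model)
        self.norm_tab = nn.LayerNorm(d_model)

    def forward(self, img_tokens, tab_tokens):
        # img_tokens: (batch, n_patches, d); tab_tokens: (batch, n_fields, d)
        img_upd, _ = self.img_from_tab(img_tokens, tab_tokens, tab_tokens)
        tab_upd, _ = self.tab_from_img(tab_tokens, img_tokens, img_tokens)
        return self.norm_img(img_tokens + img_upd), self.norm_tab(tab_tokens + tab_upd)

block = BilateralCrossAttention()
img, tab = torch.randn(2, 196, 128), torch.randn(2, 24, 128)
img_out, tab_out = block(img, tab)
print(img_out.shape, tab_out.shape)
```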
Results
GeoViSTA's unified embeddings significantly improve linear probing performance on downstream tasks, including predicting disease-specific mortality rates and fire hazard frequency. The model demonstrates strong performance even when tested on held-out regions, indicating its robustness and transferability.
Implications
The development of GeoViSTA has significant implications for various fields, including public health, environmental risk assessment, and disaster response, by enabling more accurate and holistic geospatial analyses that incorporate both physical and socioeconomic factors.
Action-Inspired Generative Models
Generative Models
- Introduction of Action-Inspired Generative Models (AGMs) to improve generative model training.
- Utilization of a lightweight learned scalar potential, Vφ, to score bridge samples and modulate drift objectives.
- Significant improvements in generation quality by selectively penalizing uninformative transport paths.
- Vφ adds negligible overhead to training and no cost during inference, making it a practical enhancement.
Action-Inspired Generative Models
Summary
This paper introduces Action-Inspired Generative Models (AGMs), a novel dual-network generative framework aimed at improving the training of generative models by addressing the uniform treatment of stochastic transitions in existing bridge-matching methods. The authors propose a lightweight learned scalar potential, Vφ, which scores bridge samples and modulates the drift objective through importance weights. This approach allows the model to selectively penalize uninformative transport paths, enhancing the quality of generated samples. Vφ is designed to be minimal in terms of parameter count, comprising only about 1.4% of the primary drift network, and adds no overhead during inference. The framework is evaluated on the CelebA-HQ dataset, demonstrating consistent improvements across various generative metrics, indicating that the proposed method effectively enhances the learning process by focusing on structurally coherent trajectories.
Methodology
The AGM framework employs a dual-network architecture where a potential network, Vφ, is trained alongside the primary drift network. Vφ assigns importance scores to bridge samples without feeding gradients back into the drift objective, using a stop-gradient mechanism to prevent adversarial feedback. This allows for a more informed training process that emphasizes structurally coherent paths in the data manifold.
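The stop-gradient importance-weighting mechanism can be sketched as follows; the softmax normalization of the scores and the drift parameterization are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the stop-gradient importance-weighting idea: a small potential network
# scores bridge samples, and those scores reweight the per-sample drift loss
# without feeding gradients back into the drift objective.
import torch
import torch.nn as nn

drift_net = nn.Sequential(nn.Linear(65, 128), nn.ReLU(), nn.Linear(128, 64))
potential = nn.Sequential(nn.Linear(65, 32), nn.ReLU(), nn.Linear(32, 1))  # "V_phi"

x_t = torch.randn(32, 64)       # bridge samples at time t
t = torch.rand(32, 1)
drift_target = torch.randn(32, 64)

inp = torch.cat([x_t, t], dim=-1)
per_sample = ((drift_net(inp) - drift_target) ** 2).mean(dim=-1)

# Importance weights come through a stop-gradient (detach), so the potential
# cannot adversarially shape the drift loss it is weighting.
with torch.no_grad():
    scores = potential(inp).squeeze(-1)
weights = torch.softmax(scores, dim=0) * len(scores)   # mean weight is ~1
drift_loss = (weights * per_sample).mean()
drift_loss.backward()
print(float(drift_loss))
```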
Results
The evaluation of AGMs on the CelebA-HQ dataset revealed consistent improvements in generation quality across fidelity and coverage metrics, demonstrating the effectiveness of the proposed importance weighting mechanism in enhancing the learning process.
Implications
The findings suggest that incorporating action-inspired principles into generative modeling can lead to more efficient training and improved sample quality. This approach could be applied to various generative tasks, potentially enhancing performance in fields such as image synthesis, video generation, and other applications requiring high-quality generative outputs.
PreFT: Prefill-only finetuning for efficient inference
NLP
Large Language Models
Efficient ML
- PreFT optimizes inference efficiency by applying adapters only during the prefill phase.
- The approach significantly increases throughput, achieving up to 1.9× the throughput of traditional PEFTs.
- PreFTs maintain competitive performance with traditional PEFTs, especially in reinforcement learning tasks.
- The implementation is available on the vLLM inference engine, facilitating practical applications.
PreFT: Prefill-only finetuning for efficient inference
Summary
The paper introduces PreFT (Prefill-only Finetuning), a novel approach to enhance the efficiency of serving personalized large language models (LLMs) at scale. Traditional parameter-efficient finetuning methods (PEFTs) face challenges in throughput when serving multiple user-specific adapters due to the mismatch between prefill and decode phases of inference. PreFT optimizes performance by applying adapters only during the prefill phase, significantly increasing throughput while maintaining competitive performance. The authors implement and benchmark two prefill-only PEFTs, LoRA and ReFT, on the vLLM inference engine, demonstrating that PreFTs can achieve up to 1.9 times the throughput of traditional PEFTs when serving multiple adapters. Although PreFTs show a slightly higher evaluation loss in supervised finetuning tasks, they can compensate for this by increasing rank with minimal impact on throughput. The results indicate that PreFTs are a favorable option for efficient multi-user serving of LLMs, balancing accuracy and throughput effectively.
Methodology
The authors developed prefill-only variants of existing PEFTs (LoRA and ReFT) and implemented them in the vLLM inference engine. They conducted extensive benchmarking to compare the throughput and performance of PreFTs against traditional PEFTs across various tasks, including supervised finetuning and reinforcement learning.
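The core idea, an adapter that is active during prefill and skipped during decode, can be sketched with a LoRA-style linear layer; this is an illustration, not the authors' vLLM implementation.

```python
# Illustrative sketch: a LoRA-style linear layer whose low-rank update is applied
# only while processing the prompt (prefill) and skipped during token-by-token
# decoding, which is the core idea behind prefill-only finetuning.
import torch
import torch.nn as nn

class PrefillOnlyLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x, prefill: bool):
        y = self.base(x)
        if prefill:  # adapter is active only during the prefill phase
            y = y + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
        return y

layer = PrefillOnlyLoRALinear(512, 512)
prompt_states = torch.randn(1, 128, 512)     # prefill over the full prompt
decode_state = torch.randn(1, 1, 512)        # one token at a time during decode
print(layer(prompt_states, prefill=True).shape,
      layer(decode_state, prefill=False).shape)
```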
Results
The experimental results showed that PreFTs achieved significantly higher throughput compared to traditional PEFTs, with gains observed across different model scales. While PreFTs exhibited higher evaluation loss in supervised finetuning tasks, they maintained comparable accuracy in downstream evaluations. In reinforcement learning tasks, PreFTs approached the performance of standard PEFTs, demonstrating their effectiveness in practical applications.
Implications
The findings suggest that PreFTs can be effectively deployed for personalizing large language models in multi-user settings, offering a better trade-off between accuracy and inference efficiency. This approach could facilitate the deployment of LLMs in applications requiring rapid personalization for large user bases.
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
Large Language Models
NLP
- RxEval shifts the evaluation of medication recommendation from admission-level to prescription-level, capturing the dynamic nature of clinical decision-making.
- The benchmark includes 1,547 MCQs based on real patient data, enhancing the realism of the evaluation process.
- Evaluation results show that even state-of-the-art LLMs perform poorly on RxEval, indicating significant challenges in automated medication recommendation.
- Common errors identified include overlooking patient information and failing to derive clinical conclusions, suggesting areas for model improvement.
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
Summary
The paper introduces RxEval, a novel benchmark designed to evaluate the medication recommendation capabilities of large language models (LLMs) at a prescription level, rather than the admission level used in previous benchmarks. Traditional benchmarks have inadequately represented the complexities of real-world prescribing, which involves selecting specific medications, doses, and routes as patient conditions evolve. RxEval addresses these limitations by formulating the task as multiple-choice questions (MCQs) that present detailed patient profiles and clinical trajectories, requiring the selection of specific medication-dose-route triples. The benchmark consists of 1,547 questions derived from real electronic health records, covering 584 patients, 18 diagnostic categories, and 969 unique medications. The evaluation of 16 LLMs on RxEval reveals that the benchmark is challenging and discriminative, with F1 scores ranging from 45.18 to 77.10 and an Exact Match accuracy of only 46.10% for the best-performing model. Error analysis indicates that even advanced models struggle with oversight and reasoning errors, highlighting the need for further improvements in LLMs for clinical applications.
Methodology
The authors developed RxEval by constructing multiple-choice questions from real electronic health records, focusing on specific medication-dose-route triples. They employed a reasoning-chain perturbation method to create patient-specific distractors for the MCQs, ensuring that the questions require nuanced clinical reasoning rather than general medical knowledge.
Results
The evaluation of 16 LLMs on the RxEval benchmark showed F1 scores ranging from 45.18 to 77.10, with the highest Exact Match accuracy being only 46.10%. This indicates that current models struggle with the complexities of medication recommendation, particularly in late-phase admissions where longer clinical trajectories are involved.
Implications
RxEval serves as a critical tool for assessing the prescribing capabilities of LLMs, providing insights into their limitations and guiding future research to enhance automated medication recommendation systems, especially in resource-constrained healthcare settings.
NeuroAtlas: Benchmarking Foundation Models for Clinical EEG and Brain-Computer Interfaces
Time Series
- NeuroAtlas is the largest EEG benchmark with 42 datasets and ~260k hours of EEG data.
- EEG-specific foundation models do not consistently outperform generic time-series models.
- Standard metrics are inadequate for assessing clinical utility; bespoke evaluation measures are necessary.
- Model performance varies significantly across datasets within the same domain.
NeuroAtlas: Benchmarking Foundation Models for Clinical EEG and Brain-Computer Interfaces
Summary
The paper introduces NeuroAtlas, the largest EEG benchmark to date, comprising 42 datasets and approximately 260,000 hours of clinical EEG data covering epilepsy, sleep medicine, brain age estimation, and brain-computer interfaces (BCIs). The authors highlight the limitations of existing evaluations in the EEG domain, which often rely on single datasets and metrics that do not capture clinical relevance. NeuroAtlas aims to address these gaps by providing comprehensive data and model coverage, evaluating EEG foundation models (FMs) against generic time-series FMs, and employing clinically grounded evaluation metrics. The study reveals that EEG-specific FMs do not consistently outperform generic time-series FMs, and that standard machine learning metrics are insufficient for assessing clinical utility. The findings emphasize the variability in model performance across datasets and the need for bespoke evaluation measures that reflect real-world usability. The authors conclude that current models have not yet realized the potential of a unified EEG model, and they provide NeuroAtlas as a resource for future research in this area.
Methodology
The authors compiled a comprehensive benchmark, NeuroAtlas, which includes a diverse set of EEG datasets and evaluates multiple model families, including EEG-specific foundation models, generic time-series models, and supervised models. They employed both standard machine learning metrics and clinically relevant evaluation measures tailored to each domain, such as event-level decision-making and hypnogram-derived features.
Results
The study found that EEG-specific models do not consistently outperform generic time-series models, and that performance varies significantly across different datasets within the same domain. Additionally, the use of standard metrics often masked important clinical usability issues, highlighting the need for more appropriate evaluation methods.
Implications
NeuroAtlas serves as a critical resource for researchers aiming to develop and evaluate EEG foundation models, facilitating reproducible progress in the field. It underscores the importance of using clinically relevant metrics and diverse datasets to better understand model performance and utility in real-world applications.
Comparative Evaluation of Machine Learning Approaches for Minority-Class Financial Distress Prediction Under Class Imbalance Constraints
Interpretability
- The study emphasizes the importance of sensitivity to minority-class predictions in financial distress scenarios.
- A comprehensive machine learning workflow is proposed, integrating preprocessing, imbalance mitigation, and explainability.
- Multiple machine learning models are evaluated, showcasing the effectiveness of ensemble methods in handling class imbalance.
- SHAP explainability methods are applied to enhance interpretability and governance in financial distress predictions.
Comparative Evaluation of Machine Learning Approaches for Minority-Class Financial Distress Prediction Under Class Imbalance Constraints
Summary
This paper addresses the challenge of predicting financial distress in firms, particularly focusing on the minority-class events of bankruptcy, which are often underrepresented in financial datasets. The authors highlight the limitations of conventional classification models that may yield high overall accuracy but fail to adequately identify minority-class instances. The study employs a CRISP-DM-oriented machine learning workflow that includes structured preprocessing, feature filtering, and imbalance mitigation using the Synthetic Minority Oversampling Technique (SMOTE). A comparative evaluation of various models, including Logistic Regression, Random Forest, AdaBoost, XGBoost, CatBoost, and LightGBM, is conducted, with performance metrics emphasizing minority-class precision, recall, F1-score, and ROC-AUC. The authors place particular importance on recall due to the operational risks associated with false negatives in financial contexts. Additionally, SHAP-based explainability methods are utilized to analyze feature contributions to bankruptcy predictions. The overarching goal is to develop reproducible, interpretable, and imbalance-aware machine learning workflows for financial distress prediction, enhancing enterprise risk management and decision-making.
Methodology
The authors adopt a CRISP-DM-oriented framework that includes structured preprocessing, feature filtering, and the application of SMOTE for imbalance mitigation. They evaluate various machine learning models, including classical statistical methods and ensemble learning approaches, focusing on performance metrics relevant to minority-class predictions. SHAP-based methods are used for explainability analysis.
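A compact sketch of this kind of imbalance-aware workflow is shown below, using synthetic data as a stand-in; the specific features, models, and hyperparameters from the paper are not reproduced, and the imbalanced-learn, xgboost, and shap packages are assumed to be installed.

```python
# Sketch of an imbalance-aware workflow of the kind described: SMOTE on the
# training split, an ensemble classifier, minority-class metrics, and SHAP
# feature attributions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
import shap

# Synthetic stand-in for an imbalanced financial-distress dataset (~5% positives).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample only the training split, never the test split.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model.fit(X_res, y_res)

# Emphasise minority-class recall/F1 rather than raw accuracy.
print(classification_report(y_te, model.predict(X_te), digits=3))

# Per-feature attributions for each test firm, usable for governance reporting.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
```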
Results
The comparative experimentation reveals that ensemble learning models, particularly XGBoost and LightGBM, outperform traditional statistical methods in terms of minority-class recall and overall predictive performance. The study highlights the critical need for imbalance-aware optimization strategies to enhance sensitivity towards bankruptcy events.
Implications
The findings suggest that adopting advanced machine learning techniques and explainability methods can significantly improve financial distress prediction, aiding organizations in better risk management and decision-making processes. The research contributes to the development of interpretable and reproducible machine learning frameworks in financial contexts.
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
Federated Learning
Computer Vision
NLP
- MetaMoE enables the unification of independently trained experts without sharing private data, ensuring privacy.
- The framework employs diversity-aware proxy selection using a relevance-weighted DPP to enhance the representation of client domains.
- A proxy-aligned expert training strategy is introduced, aligning expert behavior with proxy data for better coordination.
- The context-aware router improves expert assignment across diverse input types.
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
Summary
The paper introduces MetaMoE, a novel framework designed to unify independently trained, domain-specialized experts into a single Mixture-of-Experts (MoE) model while ensuring data privacy. Traditional MoE approaches typically rely on centralized access to training data, which is often impractical due to privacy constraints. MetaMoE addresses this challenge by utilizing public proxy data to approximate inaccessible private data distributions. A key innovation of MetaMoE is its diversity-aware proxy selection mechanism, which employs a relevance-weighted determinantal point process (DPP) to select diverse and representative proxy samples from public data. This selection process enhances the training of experts by aligning their behavior with the proxy data used for router training, thereby improving coordination among experts during unification. Additionally, a context-aware router is implemented to optimize expert selection based on heterogeneous inputs. The effectiveness of MetaMoE is validated through experiments on computer vision and natural language processing benchmarks, where it consistently outperforms existing privacy-preserving MoE unification methods.
Methodology
MetaMoE utilizes a diversity-aware proxy selection mechanism based on a relevance-weighted determinantal point process (DPP) to choose diverse proxy samples from public data. This is followed by proxy-aligned expert training, where experts are fine-tuned on both private data and selected proxies. A context-aware router is then employed to enhance expert selection, leading to improved coordination among experts during the unification process.
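The relevance-weighted DPP selection can be approximated with a simple greedy log-determinant heuristic, sketched below; the paper's actual kernel, relevance scores, and selection algorithm may differ.

```python
# Sketch of relevance-weighted DPP-style proxy selection using a greedy
# log-determinant heuristic (not the paper's exact algorithm).
import numpy as np

def greedy_dpp_select(features, relevance, k):
    # L = diag(r) @ S @ diag(r): relevance scales the similarity kernel so that
    # selected proxies are both relevant to the private domain and diverse.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = feats @ feats.T
    L = relevance[:, None] * similarity * relevance[None, :]
    selected = []
    for _ in range(k):
        best_item, best_gain = None, -np.inf
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(
                L[np.ix_(idx, idx)] + 1e-6 * np.eye(len(idx)))
            if sign > 0 and logdet > best_gain:
                best_item, best_gain = i, logdet
        selected.append(best_item)
    return selected

rng = np.random.default_rng(0)
public_feats = rng.normal(size=(200, 32))        # embeddings of public samples
relevance = rng.uniform(0.1, 1.0, size=200)      # similarity to the private domain
print(greedy_dpp_select(public_feats, relevance, k=10))
```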
Results
Experiments conducted on computer vision and natural language processing benchmarks show that MetaMoE consistently outperforms recent privacy-preserving MoE unification methods, demonstrating its effectiveness in maintaining performance while ensuring data privacy.
Implications
MetaMoE has significant implications for organizations that require the use of specialized models trained on private data while adhering to privacy regulations. It allows for the effective unification of models without compromising sensitive information, potentially benefiting various applications in fields such as healthcare, finance, and personalized services.
LiSA: Lifelong Safety Adaptation via Conservative Policy Induction
NLP
Large Language Models
Reinforcement Learning
- LiSA formulates lifelong guardrail adaptation for AI agents using sparse, noisy user feedback.
- The framework incorporates broad policy abstraction, conflict-aware local rules, and confidence-gated memory reuse.
- Empirical results show LiSA consistently outperforms strong baselines and maintains robustness against noisy inputs.
- LiSA enhances boundary-sensitive decision-making and pushes the latency-performance frontier beyond traditional scaling methods.
LiSA: Lifelong Safety Adaptation via Conservative Policy Induction
Summary
The paper introduces LiSA (Lifelong Safety Adaptation), a framework designed to enhance the safety of AI agents operating in dynamic environments where guardrails must adapt to local norms and user expectations. Traditional guardrails often fail to account for the contextual nuances of deployment, leading to potential risks such as privacy breaches or inappropriate refusals. LiSA addresses this by employing a conservative policy induction approach that utilizes structured memory to convert sparse user-reported failures into reusable policy abstractions. The framework includes mechanisms for broad policy abstraction, conflict-aware local refinement, and confidence-gated memory reuse, allowing for effective adaptation without the need for constant fine-tuning. Empirical evaluations on datasets like PrivacyLens+, ConFaide+, and AgentHarm demonstrate that LiSA significantly outperforms existing memory-based methods, remains robust against noisy feedback, and enhances decision-making in boundary-sensitive scenarios. The results indicate that LiSA not only improves safety but also optimizes performance and latency, providing a practical solution for securing AI agents in real-world applications.
Methodology
LiSA employs a structured policy memory framework organized as an online-offline loop. It abstracts sparse failure reports into broad policy items, creates conflict-aware local rules for mixed-label contexts, and utilizes confidence gating to ensure that memory reuse is based on accumulated evidence rather than empirical accuracy alone.
Results
LiSA demonstrated superior performance across multiple datasets, consistently outperforming fixed guardrails and memory-based baselines. It maintained robustness even with a 20% label-flip rate in user feedback, and the incorporation of local policies significantly contributed to performance improvements. The latency analysis indicated that LiSA's structured memory approach is more efficient than simply scaling existing models.
Implications
LiSA provides a practical framework for developing safer AI agents capable of adapting to complex and evolving real-world environments, potentially reducing risks associated with privacy violations and inappropriate actions. Its methodologies could be applied in various domains where AI systems interact with sensitive data or require nuanced decision-making.
XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference
Large Language Models
Efficient ML
Optimization
- XFP allows operators to specify quality thresholds instead of bit-widths, enhancing flexibility in quantization.
- The quantizer automatically determines codebook size and outlier handling, simplifying the quantization process.
- XFP achieves effective bits as low as 3.4 while maintaining high throughput for large models.
- Two codebook storage modes provide options for balancing precision and memory usage.
XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference
Summary
The paper introduces XFP, a novel dynamic weight quantizer designed for large language model (LLM) inference that reverses traditional quantization processes. Instead of requiring operators to specify bit-widths, XFP allows them to set reconstruction quality thresholds based on per-channel cosine similarity. The quantizer automatically determines the necessary codebook size, outlier budget, and packing for each layer without needing Hessian information, calibration data, or manual adjustments. XFP decomposes weight matrices into a sparse fp16 outlier residual and a dense sub-byte index tensor, utilizing a learned codebook for efficient representation. The paper presents two codebook storage modes: V2, which employs a per-channel Lloyd codebook, and V2a, which uses a shared library of codebooks, optimizing memory usage. The contributions of XFP include a quality-targeted quantization approach that converges to effective bits as low as 3.4, two flexible codebook storage modes, and a quality-driven iteration process (H-Process) that enables fitting large models into limited memory while maintaining high throughput. Experimental results demonstrate that XFP achieves significant speed and quality improvements over existing quantization methods, making it a promising solution for efficient LLM inference.
Methodology
XFP employs a quality-targeted quantization approach where operators set two reconstruction quality floors. The quantizer then determines the minimum bit width required to meet these thresholds, separating outlier weights into a sparse residual. It utilizes a learned codebook for weight representation, adapting to different model architectures without manual tuning.
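A much-simplified sketch of quality-targeted quantization for a single weight channel follows: the largest-magnitude weights are split off as a sparse residual, and a small Lloyd-style codebook is grown until the reconstruction meets an operator-chosen cosine-similarity floor. The thresholds, outlier rule, and stopping logic are illustrative assumptions, not XFP's actual procedure.

```python
# Much-simplified sketch: per-channel quantisation where the bit-width is chosen
# by a reconstruction-quality floor rather than fixed in advance, with the
# largest-magnitude weights separated into a sparse residual.
import numpy as np

def quantize_channel(w, quality_floor=0.995, outlier_frac=0.01, max_bits=8):
    w = w.astype(np.float32)
    n_out = max(1, int(outlier_frac * w.size))
    outlier_idx = np.argsort(np.abs(w))[-n_out:]
    residual = np.zeros_like(w)
    residual[outlier_idx] = w[outlier_idx]          # sparse fp16-style outliers
    dense = w - residual
    for bits in range(2, max_bits + 1):
        k = 2 ** bits
        # Lloyd / 1-D k-means on the dense part, initialised from quantiles.
        codebook = np.quantile(dense, np.linspace(0, 1, k))
        for _ in range(20):
            idx = np.abs(dense[:, None] - codebook[None, :]).argmin(axis=1)
            for c in range(k):
                if np.any(idx == c):
                    codebook[c] = dense[idx == c].mean()
        recon = codebook[idx] + residual
        cos = np.dot(recon, w) / (np.linalg.norm(recon) * np.linalg.norm(w) + 1e-12)
        if cos >= quality_floor:
            return bits, cos      # bit-width chosen by the quality floor
    return max_bits, cos

rng = np.random.default_rng(0)
channel = rng.normal(size=4096) + np.where(rng.random(4096) < 0.002, 8.0, 0.0)
print(quantize_channel(channel))
```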
Results
XFP demonstrated effective bits of approximately 3.97 on the Qwen3.5-122B-A10B model and 3.4 on Qwen3.5-397B-A17B, achieving 138 tokens per second on workstation hardware with a strict match accuracy of 94.49% on GSM8K. The H-Process allowed fitting a 397B MoE model into 2×96 GB memory, achieving a throughput of 100.9 tokens per second with a 66.72% strict match on a comprehensive problem set.
Implications
The advancements presented in XFP could significantly enhance the efficiency of LLM inference, making it feasible to deploy larger models on limited hardware resources. This could lead to broader applications of LLMs in real-time systems and environments with stringent memory constraints.
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
Reinforcement Learning
Optimization
Theory
- Identifies an objective misalignment problem in existing O2O RL approaches.
- Proposes a bi-level optimization framework for adaptive data mixing.
- Utilizes a multi-armed bandit mechanism for real-time optimization of data mixing ratios.
- Achieves superior stability and performance compared to static and heuristic-based methods.
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
Summary
The paper introduces ROAD (Reinforcement Learning with Optimized Adaptive Data-mixing), a novel framework addressing the challenges of offline-to-online reinforcement learning (O2O RL). The authors highlight the critical issue of non-stationary distribution shifts between offline datasets and evolving online policies, which can lead to suboptimal performance when using static mixing strategies. ROAD proposes a bi-level optimization approach where the data mixing strategy is treated as a meta-decision impacting the performance of the online policy. The inner-level employs standard Q-learning updates, while the outer-level focuses on optimizing the data mixing strategy to enhance the expected performance. To implement this, the authors develop a practical algorithm utilizing a multi-armed bandit mechanism to optimize data mixing ratios in real-time, guided by a surrogate objective that mitigates value overestimation. The empirical results demonstrate that ROAD consistently outperforms existing data replay methods across various datasets, achieving improved stability and performance without the need for manual adjustments.
Methodology
The authors formulate the data selection problem as a bi-level optimization process, where the outer-level focuses on maximizing the expected performance of the online policy through adaptive data mixing, while the inner-level employs conventional Q-learning updates. A multi-armed bandit mechanism is used to optimize the data mixing ratios in real-time, guided by a surrogate objective that approximates the bi-level gradient and addresses value overestimation.
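The bandit-driven mixing can be illustrated with a toy EXP3 sketch in which each arm is a candidate offline/online mixing ratio and the reward stands in for the surrogate objective; the paper's actual surrogate and update rule are not reproduced.

```python
# Toy sketch of adaptive data mixing with a multi-armed bandit (EXP3 here).
# Each arm is a candidate offline-data fraction; rewards in [0, 1] stand in for
# the surrogate signal measuring how much a step with that ratio helped.
import numpy as np

class Exp3Mixer:
    def __init__(self, ratios, gamma=0.1):
        self.ratios = ratios                     # candidate offline-data fractions
        self.weights = np.ones(len(ratios))
        self.gamma = gamma

    def probabilities(self):
        w = self.weights / self.weights.sum()
        return (1 - self.gamma) * w + self.gamma / len(self.ratios)

    def sample(self, rng):
        p = self.probabilities()
        arm = rng.choice(len(self.ratios), p=p)
        return arm, self.ratios[arm]

    def update(self, arm, reward):
        p = self.probabilities()
        # Importance-weighted reward estimate keeps the update unbiased.
        self.weights[arm] *= np.exp(self.gamma * reward / (p[arm] * len(self.ratios)))

rng = np.random.default_rng(0)
mixer = Exp3Mixer(ratios=[0.0, 0.25, 0.5, 0.75, 1.0])
for step in range(200):
    arm, offline_frac = mixer.sample(rng)
    # Placeholder reward; in practice this would come from the surrogate
    # objective evaluating improvement of the online policy.
    reward = rng.uniform(0, 1) * (1 - abs(offline_frac - 0.25))
    mixer.update(arm, reward)
print(np.round(mixer.probabilities(), 3))
```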
Results
ROAD outperforms existing static and heuristic-based data selection strategies across extensive benchmarks, demonstrating enhanced training stability and asymptotic performance. The empirical evaluations confirm that the adaptive data mixing strategy effectively addresses the distribution shift challenges inherent in O2O RL.
Implications
The proposed ROAD framework has significant implications for improving the efficiency and effectiveness of reinforcement learning in real-world applications, particularly in scenarios where offline data is available but online interaction is necessary for performance enhancement. This approach can be particularly beneficial in safety-critical environments such as healthcare and autonomous driving.
Compositional Sparsity as an Inductive Bias for Neural Architecture Design
Theory
Efficient ML
Interpretability
- Introduces a novel architecture combining IFNs and HNNs to exploit compositional sparsity.
- HNNs are significantly sparser than standard DNNs, requiring less hyperparameter tuning.
- Empirical results show HNNs outperform dense architectures in both synthetic and real-world datasets.
- The approach provides a stable and interpretable framework for high-dimensional learning.
Compositional Sparsity as an Inductive Bias for Neural Architecture Design
Summary
This paper addresses the challenge of identifying structural priors that enable Deep Neural Networks (DNNs) to effectively learn in high-dimensional spaces, particularly in the context of compositional sparsity. The authors propose a novel architecture that combines Information Filtering Networks (IFNs) and Homological Neural Networks (HNNs) to leverage sparse dependency structures derived from data. The approach formalizes design principles that allow for the emergence of abstraction through hierarchical composition, resulting in HNNs that are significantly sparser than traditional DNNs and require minimal hyperparameter tuning. The empirical validation demonstrates that HNNs can recover underlying compositional structures in synthetic tasks and outperform dense baselines across various real-world datasets, particularly in high-dimensional settings. The findings suggest that compositional sparsity serves as a beneficial inductive bias, enhancing interpretability and stability while reducing sensitivity to hyperparameters.
Methodology
The authors utilize Information Filtering Networks (IFNs) to extract sparse dependency structures from data, which are then mapped into fixed-wiring sparse neural graphs using Homological Neural Networks (HNNs). This method allows for a hierarchical interaction structure where higher-order units aggregate signals from lower-order constituents, enabling the architecture to be determined by data rather than extensive trial-and-error.
Results
The empirical validation shows that HNNs consistently outperform dense multilayer perceptron (MLP) baselines on structured tasks, particularly in high-dimensional scenarios. HNNs maintain competitive performance with fewer parameters and exhibit lower variance and reduced sensitivity to hyperparameters compared to traditional DNNs.
Implications
The findings suggest that adopting compositional sparsity as an inductive bias can lead to more efficient neural architectures that are interpretable and robust in high-dimensional learning contexts. This approach may have significant implications for various applications in machine learning where data structures are complex and high-dimensional.
Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization
Optimization
Efficient ML
Theory
- CoCD is a deterministic and memory-efficient ZO optimizer that utilizes stale gradients effectively.
- Theoretical grounding connects CoCD to established optimization methods, proving its equivalence to BCCD with warm starts.
- Larger finite-difference step sizes can improve convergence stability by smoothing the optimization landscape.
- Empirical results show CoCD significantly outperforms existing methods in terms of sample efficiency and accuracy.
Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization
Summary
This paper addresses the challenges of Zeroth-Order (ZO) optimization, particularly in scenarios where backpropagation is not feasible. The authors introduce Coherent Coordinate Descent (CoCD), a deterministic and sample-efficient ZO optimizer that leverages stale gradients as computational assets rather than liabilities. By formalizing the concept of gradient coherence, CoCD is shown to be equivalent to Block Cyclic Coordinate Descent (BCCD) with warm starts, allowing for O(1) query complexity per step while maintaining effective descent directions. The paper also presents a counter-intuitive finding that larger finite-difference step sizes can induce implicit smoothing effects on the optimization landscape, enhancing convergence stability. Empirical evaluations demonstrate that CoCD outperforms BCCD in sample efficiency and convergence accuracy across various neural network architectures, suggesting that structured deterministic updates are superior to randomized methods for lightweight ZO optimization.
Methodology
The CoCD algorithm employs a cyclic coordinate optimization framework that maintains a FIFO buffer of past gradient estimates. This allows for structured updates that reduce variance while controlling both computational and memory budgets. The theoretical analysis connects CoCD to BCCD, providing convergence guarantees and insights into the benefits of larger finite-difference intervals.
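A toy sketch of treating stale gradients as assets: only the current coordinate block's finite-difference estimate is refreshed each step, and the descent direction is assembled from the buffered (possibly stale) per-block estimates. CoCD's precise schedule, buffer policy, and step-size theory are not captured here.

```python
# Toy sketch of cyclic coordinate descent with reused (stale) finite-difference
# gradient estimates: only one block is refreshed per step, but the update uses
# the full buffered vector.
import numpy as np

def cocd(f, x0, n_blocks=4, lr=0.05, h=1e-3, steps=400):
    x = x0.astype(np.float64).copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    grad_buffer = np.zeros_like(x)                 # holds stale per-block estimates
    for t in range(steps):
        block = blocks[t % n_blocks]
        # Forward finite differences for the current block only.
        fx = f(x)
        for i in block:
            e = np.zeros_like(x)
            e[i] = h
            grad_buffer[i] = (f(x + e) - fx) / h
        x -= lr * grad_buffer                      # descend along the full buffer
    return x

# Smooth test function: a shifted quadratic.
target = np.linspace(-1, 1, 16)
f = lambda x: np.sum((x - target) ** 2)
x_final = cocd(f, np.zeros(16))
print(round(f(x_final), 4))
```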
Results
CoCD was tested on various neural network architectures, including MLPs, CNNs, and ResNet-20, demonstrating superior performance compared to BCCD in terms of sample efficiency and final accuracy. The method also exhibited greater stability than randomized zeroth-order methods like SPSA.
Implications
The findings suggest that deterministic, structure-aware updates can enhance the efficiency and stability of zeroth-order optimization, making CoCD a promising approach for applications in black-box optimization, memory-constrained environments, and scenarios where gradient information is unavailable.
Optimal Pattern Detection Tree for Symbolic Rule-Based Classification
Interpretability
Optimization
- Introduction of the Optimal Pattern Detection Tree (OPDT) for symbolic rule-based classification.
- Utilization of mixed-integer programming to discover a single optimal pattern in data.
- Incorporation of Branching Structure Constraints (BSC) to encode domain knowledge and compliance requirements.
- Demonstrated optimality guarantees and effective performance on real-world datasets.
Optimal Pattern Detection Tree for Symbolic Rule-Based Classification
Summary
This paper presents the Optimal Pattern Detection Tree (OPDT), a novel rule-based machine learning model designed for symbolic rule discovery in data. Unlike black-box deep learning models, OPDT focuses on generating human-interpretable rules that enhance transparency and explainability, which is crucial in high-stakes domains such as healthcare and risk assessment. The OPDT utilizes mixed-integer programming to discover a single optimal pattern through binary classification, maximizing coverage while minimizing false positives. To further enhance its applicability, the authors introduce the Branching Structure Constraints (BSC) framework, allowing decision-makers to incorporate prior knowledge and compliance requirements directly into the model. The computational experiments demonstrate that OPDT effectively identifies optimal patterns in moderately sized datasets within reasonable runtimes, showcasing its potential for practical applications in various fields.
Methodology
The OPDT model is developed using mixed-integer programming to extract a single optimal rule from data. It incorporates Branching Structure Constraints (BSC) that control the tree's topology and feature assignments, ensuring that the model adheres to user-defined structural constraints and incorporates domain knowledge.
Results
The computational experiments indicate that OPDT successfully discovers optimal patterns in various datasets, achieving high interpretability and performance while maintaining reasonable runtimes. The results highlight the model's ability to minimize false positive rates and maximize coverage in binary classification tasks.
Implications
The OPDT model has significant implications for fields requiring interpretable machine learning solutions, such as healthcare, finance, and criminal justice. By providing human-readable rules, it enhances decision-making processes and ensures accountability and transparency in automated systems.
A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures
Theory
Efficient ML
Time Series
- Introduction of a Schur-stable state-matrix weight-projection scheme.
- Alternative stable state-matrix parameterization for improved computational efficiency.
- Demonstrated performance on synthetic and real-world datasets.
- Achieves comparable accuracy and convergence rates to state-of-the-art methods.
A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures
Summary
This paper addresses the challenge of building black-box models for dynamical systems while ensuring asymptotic stability. The authors introduce a novel projection scheme based on Schur decomposition for the state matrix of linear discrete-time state-space layers. This method dynamically projects the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable counterpart, thus ensuring stable dynamics with minimal overparameterization. The proposed approach is backpropagation-compatible, making it suitable for training neural networks. Experimental results on synthetic linear systems indicate that the method achieves accuracy and convergence rates comparable to state-of-the-art techniques for stable system identification, despite a slight increase in computational complexity. Additionally, the reduced weight count enhances convergence during training without compromising accuracy in stacked neural-network architectures with static nonlinearities. Overall, the Schur-based projection method provides a robust framework for identifying complex dynamics while adhering to strict stability requirements.
Methodology
The authors propose a Schur-decomposition-based projection method that dynamically adjusts the state matrix of neural networks to ensure stability. This involves projecting the quasi-triangular factor of the Schur decomposition onto its nearest stable matrix. The methodology includes a pre-factorized formulation to enhance computational efficiency during training.
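A much-simplified illustration of Schur-based stabilization is given below: compute the real Schur form, shrink the quasi-triangular factor so its spectral radius falls below one, and reconstruct the state matrix. The paper's projection onto the nearest stable factor is more refined than this uniform rescaling.

```python
# Much-simplified illustration (uniform rescaling, not the paper's nearest-stable
# projection): use the real Schur form A = Z T Z^T of a discrete-time state
# matrix and shrink T so all eigenvalues lie inside the unit disc.
import numpy as np
from scipy.linalg import schur

def project_schur_stable(A, margin=1e-3):
    T, Z = schur(A, output="real")             # A = Z @ T @ Z.T
    rho = np.max(np.abs(np.linalg.eigvals(T)))
    if rho >= 1.0 - margin:
        T = T * (1.0 - margin) / rho           # shrink eigenvalues into the unit disc
    return Z @ T @ Z.T

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
A_stable = project_schur_stable(A)
print(np.max(np.abs(np.linalg.eigvals(A))),
      np.max(np.abs(np.linalg.eigvals(A_stable))))
```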
Results
The proposed methods were benchmarked against state-of-the-art techniques, showing that they maintain high accuracy and convergence rates on both synthetic and real-world datasets. The results indicate that the Schur-based projection method effectively balances stability and performance, with a lower weight count aiding in faster convergence during training.
Implications
This work has significant implications for the development of stable neural network architectures suitable for real-time control applications and other dynamical systems modeling tasks. The methods can be applied to enhance the stability and efficiency of various machine learning models that require robust performance in dynamic environments.
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
Reinforcement Learning
Optimization
Robotics
- DRATS addresses imbalanced data allocation in MTRL by prioritizing tasks with the largest return gap.
- The algorithm is derived from a minimax optimization framework, focusing on minimizing the worst-case return gap.
- Empirical results show DRATS achieves higher data efficiency and better performance on harder tasks compared to existing methods.
- DRATS can be integrated with existing multi-task learning architectures, enhancing their effectiveness.
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
Summary
This paper addresses the challenge of imbalanced learning in Multi-Task Reinforcement Learning (MTRL), where agents tend to quickly solve easier tasks while struggling with harder ones due to uniform data allocation. The authors propose a novel algorithm, Distributionally Robust Adaptive Task Sampling (DRATS), which adaptively prioritizes tasks based on their return gap from a target return. By formalizing MTRL as a feasibility problem and deriving a minimax objective, DRATS aims to minimize the worst-case return gap. The algorithm improves data efficiency and enhances performance on harder tasks by focusing on those that are furthest from being solved. Empirical evaluations on benchmarks such as MetaWorld-MT10 and MT50 demonstrate that DRATS outperforms existing task sampling methods in terms of data efficiency and worst-task performance, while also being compatible with various multi-task architectures.
Methodology
The authors formalize MTRL as a feasibility problem and derive a minimax optimization objective to minimize the maximum return gap. DRATS adaptively adjusts the task-sampling distribution to prioritize tasks that are furthest from achieving their target returns, thereby improving data allocation efficiency.
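A minimal sketch of gap-driven task sampling follows; the softmax-over-gaps form and the temperature are assumptions, whereas the paper derives its sampler from the minimax objective.

```python
# Minimal sketch: sample tasks in proportion to how far their returns fall short
# of a target, so the hardest (least solved) tasks receive the most data.
import numpy as np

def task_sampling_distribution(current_returns, target_returns, temperature=1.0):
    gaps = np.maximum(np.asarray(target_returns) - np.asarray(current_returns), 0.0)
    logits = gaps / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

current = [480.0, 120.0, 300.0, 45.0]      # per-task returns so far
target = [500.0, 500.0, 500.0, 500.0]      # target return for "solved"
p = task_sampling_distribution(current, target, temperature=100.0)
print(np.round(p, 3))                      # hardest tasks get the most probability

rng = np.random.default_rng(0)
next_task = rng.choice(len(p), p=p)
print(next_task)
```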
Results
In experiments on MetaWorld-MT10 and MT50 benchmarks, DRATS demonstrated improved data efficiency, higher aggregate returns, and superior worst-task performance compared to existing curriculum learning baselines. The algorithm converges to within an epsilon error of the optimal worst-task return gap at a standard rate of O(1/sqrt(T)).
Implications
The findings suggest that adaptive task sampling can significantly enhance the performance of multi-task reinforcement learning systems, particularly in environments with varying task difficulties. This approach could be beneficial in real-world applications where tasks are not uniformly easy or difficult, such as robotics and autonomous systems.
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
Large Language Models
NLP
Theory
- TFGN enables continual pre-training without task labels or data replay.
- Achieves minimal backward transfer and high retention rates across diverse text domains.
- Demonstrates positive cross-domain forward transfer, enhancing model performance.
- Introduces extensions for autonomous continual learning and effective forward-pass behavior.
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
Summary
The paper introduces TFGN, a novel architectural overlay for transformer language models designed to enable continual pre-training on diverse text domains without the need for task labels or data replay. This approach addresses the limitations of existing continual learning methods that typically require a buffer of previous data, task identifiers, or regularization penalties that do not scale well with model size. TFGN operates by producing input-conditioned, parameter-efficient updates within the existing transformer architecture, allowing for dense forward passes across all tokens. The authors demonstrate TFGN's effectiveness through extensive experiments across six heterogeneous text domains, achieving minimal backward transfer and high retention rates without the use of replay buffers or task IDs. Additionally, TFGN exhibits positive cross-domain forward transfer, indicating that training on one domain can benefit performance on another. The paper also explores two extensions that further enhance the model's capabilities, including a meta-control layer that reduces forgetting and an operator-level plan vector that reshapes forward-pass behavior. Overall, TFGN represents a significant advancement in continual learning for large language models, showcasing a unique combination of properties not previously achieved in the literature.
Methodology
TFGN is implemented as an architectural overlay within transformer models, allowing for input-conditioned updates while maintaining the integrity of the existing architecture. The model is evaluated across six text domains using a large-scale continual pre-training setup, with experiments conducted at multiple parameter scales and training regimes. The methodology emphasizes dense forward passes and internal mechanisms for structuring parameter updates to prevent catastrophic forgetting.
Results
TFGN achieved a backward transfer of -0.007 at LLaMA 3.1 8B Retrofit with high retention rates across training phases. The model demonstrated ≥99.59% L2-orthogonal gradient separation between domain pairs and significant reductions in perplexity for held-out domains, indicating effective cross-domain transfer. The extensions further improved performance, with the meta-control layer reducing forgetting by 81% and the plan vector maintaining high fidelity across parameter jumps.
Implications
The findings suggest that TFGN could be applied in scenarios requiring continual learning without the constraints of traditional methods, such as in dynamic environments where task labels are not available. This could enhance the adaptability and longevity of large language models in real-world applications.
Watch your neighbors: Training statistically accurate chaotic systems with local phase space information
Theory
Time Series
Optimization
- Introduces a framework that bridges the gap between Jacobian accuracy and long-term statistical behavior in chaotic systems.
- Utilizes local coverings of chaotic attractors to analyze dynamics and improve model training.
- Demonstrates significant improvements in Jacobian accuracy even in the presence of noise.
- Provides a practical solution for training surrogate models without requiring ground-truth dynamics.
Watch your neighbors: Training statistically accurate chaotic systems with local phase space information
Summary
This paper addresses the challenges of modeling chaotic systems through data-driven approaches, where small errors can lead to significant discrepancies in trajectory predictions. The authors propose a novel framework that combines the accurate reproduction of the Jacobian of chaotic dynamics with the long-term statistical behavior of these systems. By constructing a local covering of a chaotic attractor in phase space, the method analyzes how these coverings evolve over time under the dynamics. The surrogate model is trained by minimizing the maximum mean discrepancy between the pushforward distributions of the coverings from both the surrogate and the ground-truth dynamics. The results demonstrate that this approach significantly enhances Jacobian accuracy while maintaining competitiveness with existing state-of-the-art methods for statistically accurate dynamics learning. The framework is particularly robust against noise, making it suitable for real-world applications where data may be corrupted.
Methodology
The authors develop a training method for Neural ODEs that focuses on local phase space information, specifically the evolution of local sets over time. A specialized loss function is designed to regularize training based on the local expansion and contraction of chaotic attractors. The method minimizes the maximum mean discrepancy between the pushforward distributions of the surrogate and ground-truth dynamics.
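The core loss ingredient can be sketched as a Gaussian-kernel maximum mean discrepancy between the surrogate's and the true dynamics' pushforwards of a local neighbourhood; the kernel bandwidth, neighbourhood size, and the Lorenz-63 stand-in below are arbitrary choices for illustration.

```python
# Sketch of the core loss ingredient: a Gaussian-kernel MMD between the
# surrogate's pushforward of a local neighbourhood and the ground-truth
# pushforward of the same neighbourhood.
import torch

def gaussian_mmd(x, y, bandwidth=1.0):
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# A local covering: points sampled around a state on the attractor.
center = torch.randn(3)
neighbourhood = center + 0.05 * torch.randn(64, 3)

# Stand-in ground truth: one explicit Euler step of the Lorenz-63 system.
true_step = lambda x: x + 0.01 * torch.stack(
    [10 * (x[:, 1] - x[:, 0]),
     x[:, 0] * (28 - x[:, 2]) - x[:, 1],
     x[:, 0] * x[:, 1] - (8 / 3) * x[:, 2]], dim=1)

surrogate = torch.nn.Linear(3, 3)      # placeholder for a Neural ODE step
loss = gaussian_mmd(surrogate(neighbourhood), true_step(neighbourhood))
loss.backward()
print(float(loss))
```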
Results
The experiments conducted show that the proposed method significantly improves the accuracy of the Jacobians compared to existing methods while also being competitive in terms of long-term statistical accuracy. The approach is effective even when dealing with noisy data, demonstrating its robustness and versatility.
Implications
This work has potential implications for various scientific fields that rely on modeling chaotic systems, such as meteorology, fluid dynamics, and neuroscience. The ability to accurately model chaotic dynamics can enhance predictive capabilities and improve understanding of complex systems.
Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction
Optimization
Time Series
- Introduction of DBS-Adam, an optimiser that adapts learning rates based on batch difficulty.
- Integration of DBS-Adam with Bi-LSTM networks for predicting injury severity in vehicular accidents.
- Demonstrated significant improvements in model performance metrics compared to traditional optimisers.
- Achieved a test accuracy of 95.22% and improved precision, recall, and F1-score.
Read more
Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction
Summary
This paper introduces the Dynamic Batch-Sensitive Adam (DBS-Adam) optimiser, designed to enhance the training of deep learning models on imbalanced and sequential datasets, particularly in the context of predicting injury severity from vehicular accidents. Traditional optimisers often struggle with such datasets, leading to slow convergence and instability. DBS-Adam addresses these issues by dynamically adjusting the learning rate based on a batch difficulty score, which is calculated from exponential moving averages of gradient norms and batch loss. The authors integrated DBS-Adam with Bi-Directional Long Short-Term Memory (Bi-LSTM) networks to predict accident injury severity, employing SMOTE-ENN resampling and Focal Loss to tackle class imbalance. The performance of DBS-Adam was rigorously evaluated against state-of-the-art optimisers like AMSGrad, AdamW, and AdaBound across multiple experimental configurations. The results showed that DBS-Adam significantly improved precision and achieved a test accuracy of 95.22%, with notable metrics including 96.11% precision, 95.28% recall, and 95.39% F1-score. This study demonstrates that DBS-Adam not only enhances model performance but also provides a robust framework for real-time accident severity classification, which can inform emergency response strategies and improve road safety interventions.
Methodology
The study employed a novel optimiser, DBS-Adam, which dynamically adjusts the learning rate based on a difficulty score derived from gradient norms and batch loss. It was integrated with Bi-LSTM networks for predicting injury severity, and class imbalance was addressed using SMOTE-ENN resampling and Focal Loss. The performance was evaluated against several baseline models and state-of-the-art optimisers across multiple experimental configurations.
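To make the idea concrete, here is a minimal sketch of a batch-difficulty learning-rate wrapper around Adam, assuming the difficulty score is the average of the current gradient norm and batch loss relative to their exponential moving averages. The scoring formula and hyperparameters are assumptions for illustration and do not reproduce DBS-Adam's exact update rule.

```python
import torch

class BatchSensitiveLR:
    """Illustrative wrapper: rescales Adam's learning rate using a batch
    difficulty score built from EMAs of the gradient norm and batch loss."""

    def __init__(self, optimizer: torch.optim.Adam, base_lr: float, beta: float = 0.9):
        self.opt, self.base_lr, self.beta = optimizer, base_lr, beta
        self.ema_gnorm, self.ema_loss = None, None

    def _ema(self, prev, val):
        return val if prev is None else self.beta * prev + (1 - self.beta) * val

    def step(self, loss: torch.Tensor):
        # Global gradient norm over all parameter groups.
        gnorm = sum(float((p.grad ** 2).sum())
                    for g in self.opt.param_groups
                    for p in g["params"] if p.grad is not None) ** 0.5
        self.ema_gnorm = self._ema(self.ema_gnorm, gnorm)
        self.ema_loss = self._ema(self.ema_loss, float(loss))
        # Harder-than-average batches (large relative grad norm / loss) get a smaller step.
        difficulty = 0.5 * (gnorm / (self.ema_gnorm + 1e-12)
                            + float(loss) / (self.ema_loss + 1e-12))
        for g in self.opt.param_groups:
            g["lr"] = self.base_lr / max(difficulty, 1e-3)
        self.opt.step()
```

In a training loop one would call `loss.backward()` and then `wrapper.step(loss)` in place of `optimizer.step()`.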
Results
DBS-Adam outperformed traditional optimisation methods, achieving a test accuracy of 95.22%, precision of 96.11%, recall of 95.28%, F1-score of 95.39%, and a test loss of 0.0086. Statistical analysis indicated significant improvements in precision (p=0.020).
Implications
The findings suggest that DBS-Adam can effectively enhance the training of deep learning models on imbalanced sequential data, making it a valuable tool for real-time accident severity classification. This has implications for improving emergency response strategies and road safety interventions, potentially reducing fatalities and enhancing resource allocation in crisis situations.
Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows
NLP
Large Language Models
Optimization
- Lang2MLIP reduces the need for domain expertise in MLIP development by using natural language inputs.
- The framework employs a multi-agent system to manage the workflow dynamically, allowing for self-correction.
- Evaluation on a solid electrolyte interphase system shows the effectiveness of the approach in adapting to complex materials.
- Lang2MLIP represents a shift from fixed pipelines to a more flexible, decision-based model for MLIP development.
Read more
Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows
Summary
The paper introduces Lang2MLIP, a novel multi-agent framework designed to facilitate the development of machine learning interatomic potentials (MLIPs) for complex materials systems. Traditional MLIP development is hindered by the need for extensive domain expertise and fixed workflows, which are inadequate for heterogeneous materials where optimal strategies are not predetermined. Lang2MLIP addresses these challenges by employing large language models (LLMs) to interpret natural language inputs and manage the MLIP development process as a sequential decision-making task. The framework consists of two main phases: an interactive preparation phase where specialized agents gather necessary information and generate initial structures, and an autonomous training phase where a central decision-making agent selects actions based on the current state of the dataset and model. This approach allows for dynamic adjustments and self-correction, enhancing the accessibility of MLIP development for non-experts. The framework was evaluated on a solid electrolyte interphase (SEI) system, demonstrating its capability to autonomously construct a curriculum for model training that adapts to observed failures, thus showcasing the potential of LLM-based systems in automating complex scientific workflows.
Methodology
Lang2MLIP utilizes a multi-agent framework where specialized agents interpret natural language task specifications, generate initial structures, and manage the iterative training process. The central decision-making agent selects actions based on the current dataset, model state, and evaluation results, enabling dynamic adjustments to the workflow.
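A highly simplified sketch of the autonomous training phase as a decision loop is given below, in which a central agent repeatedly inspects the workflow state and picks the next tool to run. The action names, state fields, and stopping condition are hypothetical placeholders and are not Lang2MLIP's actual interface.

```python
# Illustrative decision loop for an agentic MLIP workflow; the action set,
# state fields, and stopping rule are hypothetical stand-ins.
ACTIONS = ["generate_structures", "label_with_dft", "train_mlip", "evaluate", "stop"]

def autonomous_training_loop(decision_agent, tools, max_steps: int = 50):
    state = {"dataset_size": 0, "last_eval": None, "failures": []}
    for _ in range(max_steps):
        # The central agent inspects the current state and picks the next action.
        action = decision_agent.choose(state, ACTIONS)
        if action == "stop":
            break
        result = tools[action](state)        # each tool updates dataset/model/metrics
        state.update(result)
        if action == "evaluate" and result.get("failed_configs"):
            # Self-correction: reallocate sampling toward observed failure modes.
            state["failures"].extend(result["failed_configs"])
    return state
```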
Results
The evaluation on a solid electrolyte interphase (SEI) system demonstrated that Lang2MLIP could autonomously develop a three-stage curriculum that effectively addressed the complexities of the materials system. The framework successfully reallocated sampling effort in response to model failures, indicating its adaptability and effectiveness in MLIP development.
Implications
Lang2MLIP has the potential to democratize the development of machine learning interatomic potentials, making it accessible to non-experts and facilitating advancements in materials science. Its approach could be applied to other scientific workflows requiring iterative model refinement and adaptive learning.
Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic
Reinforcement Learning
Federated Learning
Robotics
- Introduces a federated actor-critic framework that supports personalized policy training.
- Establishes finite-time convergence rates for critic error and policy gradient norms.
- Demonstrates linear speedup with respect to the number of agents in heterogeneous environments.
- Develops new perturbation analysis techniques for projected subspace updates.
Read more
Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic
Summary
This paper introduces a novel federated actor-critic framework that enables collaborative yet personalized policy training among agents. Unlike existing methods that either ignore environmental heterogeneity or rely on a single shared policy, the proposed approach allows agents to maintain personalized local policy components while sharing a common linear subspace representation. The authors establish finite-time convergence for the proposed single-timescale personalized federated actor-critic algorithm (pFedAC) through a joint linear approximation framework. They demonstrate that both the critic error and policy gradient norm converge to zero at rates dependent on the number of agents and the discount factor. The paper also addresses challenges posed by heterogeneous Markovian trajectories and develops new perturbation analyses for the updates. Experimental results on a federated Hopper-v5 benchmark show that the proposed method outperforms traditional approaches like Single PPO and FedAvg PPO, highlighting the benefits of personalization in federated reinforcement learning.
Methodology
The authors propose a single-timescale personalized federated actor-critic algorithm (pFedAC) where agents collaboratively estimate a shared linear subspace while updating their local critic heads and personalized policies. The framework employs Markovian sampling and utilizes a joint linear approximation for convergence analysis, addressing the complexities of heterogeneous environments and coupled learning dynamics.
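Below is a minimal sketch, under several simplifying assumptions (a linear critic V(s) ≈ φ(s)ᵀ B wᵢ with a shared subspace B and per-agent head wᵢ, schematic TD updates, exact averaging at the server), of what one communication round of such a scheme could look like. It is not the paper's pFedAC algorithm; the agent interface and update rules are placeholders.

```python
import numpy as np

def pfedac_round(agents, B, local_steps: int = 10, lr: float = 1e-2):
    """One illustrative communication round: agents refine the shared critic
    subspace B and their private heads/policies locally; only B is averaged."""
    new_Bs = []
    for agent in agents:
        B_local = B.copy()
        for _ in range(local_steps):
            s, a, r, s_next = agent.sample_transition()            # Markovian sampling
            feat, feat_next = agent.phi(s), agent.phi(s_next)
            td = r + agent.gamma * feat_next @ B_local @ agent.w - feat @ B_local @ agent.w
            B_local += lr * td * np.outer(feat, agent.w)           # shared representation
            agent.w += lr * td * (B_local.T @ feat)                # personalized critic head
            agent.update_policy(td)                                # personalized actor step
        new_Bs.append(B_local)
    return np.mean(new_Bs, axis=0)                                 # server averages the subspace only
```

Averaging only the subspace while keeping heads and policies local is what makes the training collaborative yet personalized.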
Results
The convergence analysis shows that the critic error converges to zero at a rate of Õ(1/((1-γ)⁴√(TK))) and the policy gradient norm at a rate of Õ(1/((1-γ)⁶√(TK))), where T is the number of iterations and K the number of agents. Experimental results indicate that the proposed pFedAC outperforms Single PPO and FedAvg PPO on the federated Hopper-v5 benchmark, demonstrating the advantages of personalization in policy training.
Implications
The findings suggest that personalized federated reinforcement learning can significantly enhance training efficiency and adaptability in complex environments, making it applicable to various domains such as robotics and embodied AI, where agents must collaborate while adapting to diverse local conditions.
Peng's Q(λ) for Conservative Value Estimation in Offline Reinforcement Learning
Reinforcement Learning
Theory
- CPQL is the first multi-step Q-learning algorithm for model-free offline RL.
- The method adapts the PQL operator for conservative value estimation without requiring additional models.
- Theoretical analyses confirm that CPQL achieves performance at least as good as that of the behavior policy.
- Extensive experiments show CPQL significantly outperforms existing offline single-step RL algorithms.
Read more
Peng's Q(λ) for Conservative Value Estimation in Offline Reinforcement Learning
Summary
This paper introduces Conservative Peng's Q(λ) (CPQL), a novel model-free offline multi-step reinforcement learning algorithm that adapts Peng's Q(λ) operator for conservative value estimation, offering an alternative to the traditional Bellman operator. CPQL is the first method to theoretically and empirically validate the effectiveness of conservative multi-step value estimation in offline reinforcement learning (RL) by fully utilizing offline trajectories. The fixed point of the PQL operator is shown to be closer to the value function of the behavior policy, which provides implicit behavior regularization. CPQL effectively mitigates over-pessimistic value estimation and guarantees performance at least equal to that of the behavior policy, achieving near-optimal performance, a significant advancement over previous conservative approaches. The authors conducted extensive experiments on the D4RL benchmark, demonstrating that CPQL consistently outperforms existing offline single-step methods. Additionally, CPQL contributes to the offline-to-online learning framework by enabling the online PQL agent to maintain robust performance during fine-tuning, thus avoiding the typical performance drop. The code for CPQL is publicly available.
Methodology
The authors developed CPQL by adapting Peng's Q(λ) operator to facilitate conservative value estimation, leveraging complete offline trajectories instead of fragmented single-step transitions. The method avoids importance sampling, thus preventing distribution-mismatch issues. Theoretical proofs were provided to validate the performance guarantees of CPQL.
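For reference, here is a minimal sketch of how Peng's Q(λ) multi-step targets can be computed by a backward pass over a single offline trajectory, assuming a greedy (max) bootstrap at each next state; continuous-control variants typically bootstrap with Q(s', π(s')) instead, and the conservative modification that defines CPQL is not reproduced here.

```python
import torch

def pengs_qlambda_targets(rewards, next_q_max, gamma: float = 0.99, lam: float = 0.7):
    """Backward recursion for Peng's Q(lambda) returns over one offline trajectory.
    G_t = r_t + gamma * (lam * G_{t+1} + (1 - lam) * max_a Q(s_{t+1}, a))."""
    T = rewards.shape[0]
    targets = torch.empty_like(rewards)
    g = next_q_max[-1]                       # bootstrap from the final next-state value
    for t in reversed(range(T)):
        g = rewards[t] + gamma * (lam * g + (1 - lam) * next_q_max[t])
        targets[t] = g
    return targets
```

These multi-step targets would then replace the one-step Bellman targets in a standard offline Q-learning loss.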
Results
CPQL demonstrated consistent and significant improvements over existing offline single-step RL algorithms on the D4RL benchmark. The theoretical results indicated that CPQL's mixture policy outperforms the behavior policy and reduces the sub-optimality gap compared to conservative Q-learning (CQL).
Implications
The development of CPQL has significant implications for offline reinforcement learning, particularly in enhancing the reliability and performance of learned policies from static datasets. It also opens avenues for improved offline-to-online learning strategies, potentially benefiting various applications in robotics and autonomous systems.