AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
Papers today: 24
Update frequency: 8h
Days of history: 7
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
NLP
Large Language Models
Efficient ML
- QAOD introduces a geometric decomposition method for hallucination detection in LLMs.
- The framework efficiently separates question-aligned components from answer representations.
- Fisher-based selection identifies the most informative layers and neurons for probing.
- QAOD achieves superior in-domain and out-of-domain performance compared to existing methods.
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
Summary
This paper addresses the challenge of hallucination detection in large language models (LLMs), where models generate factually incorrect or unsupported content. The authors propose a novel framework called QAOD (Question-Answer Orthogonal Decomposition), which enhances detection efficiency and robustness against domain shifts. Unlike traditional black-box methods that require multiple inferences or white-box methods that analyze answer representations in isolation, QAOD employs a geometric approach to separate question-aligned components from answer representations in a single pass. This allows for the extraction of a question-orthogonal component that minimizes domain variation while retaining informative signals. The framework incorporates a Fisher-based scoring mechanism to select the most discriminative layers and neurons, ensuring effective probing. QAOD features two probing strategies: a joint probe that combines the orthogonal component with question context for improved in-domain performance, and an orthogonal-only probe that maintains domain-agnostic factuality signals for better cross-domain generalization. The results demonstrate that the joint probe achieves the highest in-domain AUROC across all tested model-dataset pairs, while the orthogonal-only probe surpasses the best white-box baseline by up to 21% on the BioASQ dataset, all while operating at less than 2% of generation cost.
Methodology
QAOD consists of an offline branch that identifies informative layers and neurons during training and an online branch that performs single-pass hallucination detection at test time. It utilizes geometric decomposition to project out question-aligned components from answer representations, allowing for the extraction of a question-orthogonal component. Fisher scoring is employed to select the most discriminative features.
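The geometric decomposition and Fisher-based selection described above can be sketched in a few lines. This is an illustrative reading, not the paper's implementation: the QR-based projection, the function names, and the pooled-variance Fisher score are all assumptions.

```python
import numpy as np

def question_orthogonal(h_answer, H_question):
    """Remove the question-aligned component from an answer representation.

    h_answer:   (d,) hidden state of an answer token
    H_question: (k, d) hidden states of the question tokens
    Projects h_answer onto the subspace spanned by the question states
    and returns the residual (question-orthogonal) component.
    """
    Q, _ = np.linalg.qr(H_question.T)       # (d, k) orthonormal basis
    return h_answer - Q @ (Q.T @ h_answer)  # residual after projection

def fisher_score(x_pos, x_neg):
    """Per-feature Fisher discriminant score for selecting layers/neurons.

    x_pos, x_neg: (n, d) features for hallucinated / faithful examples.
    Higher scores mark more discriminative features.
    """
    mu_p, mu_n = x_pos.mean(0), x_neg.mean(0)
    var = x_pos.var(0) + x_neg.var(0) + 1e-8
    return (mu_p - mu_n) ** 2 / var
```

The joint probe would concatenate the orthogonal residual with question context, while the orthogonal-only probe uses the residual alone.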
Results
The joint probe of QAOD achieves the best in-domain AUROC across all evaluated model-dataset pairs. The orthogonal-only probe shows a significant improvement in out-of-domain transfer, outperforming the best white-box baseline by up to 21% AUROC points on the BioASQ dataset, all while maintaining under 2% of generation cost.
Implications
The findings suggest that QAOD can enhance the reliability of LLMs in critical applications such as healthcare and legal services by effectively detecting hallucinations. Its efficiency and robustness make it a promising tool for real-world deployment of LLMs.
Neural Fields for NV-Center Inverse Sensing
Optimization
Theory
- NeTMY is introduced as an innovative neural-field solver for NV relaxometry inversion, addressing limitations of traditional methods.
- The proposed tensor power-summed forward operator improves fidelity by avoiding nonphysical cross terms found in scalar solvers.
- NeTMY achieves superior performance in reconstructing sparse density and spectral fields without the need for paired density labels.
- The method effectively mitigates center-collapse issues and enhances optimization stability through its parameterization.
Neural Fields for NV-Center Inverse Sensing
Summary
This paper addresses the challenges of solving inverse problems in scientific sensing, particularly in the context of nitrogen-vacancy (NV) center noise sensing in diamond. Traditional methods often rely on hand-designed regularizers or supervised networks trained on simulated data, which can fail under nonlinear and complex conditions. The authors propose a novel approach, NeTMY, which utilizes an amortization-free coordinate neural field that is coupled with a differentiable NV forward model. This method incorporates annealed positional encoding, multiscale optimization, and spectrum-fidelity losses to improve the reconstruction of sparse spin sources from magnetic-noise spectra. The authors demonstrate that their approach significantly enhances localization and distribution metrics compared to existing methods, while also mitigating common failure modes such as center-collapse. The study highlights the importance of forward-model fidelity in shaping the optimization landscape for inverse problems, positioning NV quantum sensing as a promising testbed for exploring physics-faithful neural inverse problems.
Methodology
The authors developed NeTMY, a coordinate neural field that optimizes parameters for each measurement using a differentiable tensor power-summed forward operator. The method employs annealed positional encoding and multiscale optimization to recover high-frequency structures while maintaining stability. It integrates spectrum consistency, sparsity, total variation regularization, and density gating to enhance reconstruction quality.
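The annealed positional encoding mentioned above can be sketched as follows. The cosine ramp schedule is an assumption borrowed from standard coarse-to-fine coordinate-network training, not necessarily the paper's exact scheme.

```python
import numpy as np

def annealed_encoding(x, num_freqs, alpha):
    """Coarse-to-fine positional encoding for a coordinate neural field.

    x:         (n,) input coordinates in [0, 1]
    num_freqs: number of frequency bands
    alpha:     annealing progress in [0, num_freqs]; band k is fully
               active once alpha >= k + 1, so high frequencies enter late.
    """
    feats = [x]
    for k in range(num_freqs):
        # cosine window ramps band k in as alpha passes through [k, k+1]
        t = np.clip(alpha - k, 0.0, 1.0)
        w = 0.5 * (1.0 - np.cos(np.pi * t))
        feats.append(w * np.sin(2.0 ** k * np.pi * x))
        feats.append(w * np.cos(2.0 ** k * np.pi * x))
    return np.stack(feats, axis=-1)
```

Starting with low frequencies and ramping in higher bands is what lets the field recover fine structure without the optimization instability the paper attributes to naive high-frequency encodings.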
Results
NeTMY demonstrated the best localization and distributional metrics in benchmark tests involving sparse synthetic reconstructions. Mechanistic experiments revealed that NeTMY's parameterization effectively smooths and redistributes updates, reducing the risk of center-collapse and improving overall reconstruction fidelity.
Implications
The findings suggest that NeTMY could be applied to various scientific sensing applications where traditional methods struggle, particularly in scenarios involving nonlinear and complex forward models. This work also opens avenues for further research into physics-faithful neural inverse problems, potentially enhancing the accuracy of quantum sensing technologies.
Generalized Priority-Aware Shapley Value
Theory
Interpretability
Graph Learning
- GPASV handles cyclic and weighted priority graphs, overcoming limitations of existing Shapley value methods.
- The method incorporates individual soft priorities, enhancing the valuation framework.
- GPASV is validated through simulations and practical applications, confirming its accuracy and scalability.
- The priority sweeping diagnostic reveals larger effects of soft priorities compared to previous methods.
Generalized Priority-Aware Shapley Value
Summary
The paper introduces the Generalized Priority-Aware Shapley Value (GPASV), a novel valuation method that extends traditional Shapley value approaches to accommodate cyclic and weighted priority graphs, which are common in real-world applications like aggregated human preferences and multi-criterion comparisons. Existing methods typically rely on binary and acyclic pairwise priorities, limiting their applicability. GPASV allows for arbitrary directed weighted priority graphs, where pairwise edges can penalize order violations rather than strictly forbidding them. The authors provide an axiomatic characterization of GPASV, develop computational methods for its implementation, and introduce a priority sweeping diagnostic that reveals the effects of individual soft priorities. The method is validated through extensive simulations and applied to a large-scale experiment involving LLM ensemble valuation on the cyclic Chatbot Arena preference graph, demonstrating its practicality and scalability. The findings indicate that GPASV can yield significantly different valuations based on the balance of pairwise graph priority and individual soft priority, emphasizing the importance of context in valuation processes.
Methodology
The authors establish GPASV through an axiomatic framework that generalizes previous priority-aware Shapley values. They develop computational techniques including a local adjacent-swap Metropolis-Hastings ratio, a stage-wise greedy initialization for cyclic graphs, and integrate these with Monte Carlo estimation and utility caching to enhance scalability. The priority sweeping diagnostic is also extended to analyze the impact of soft priorities.
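A minimal sketch of the sampling machinery described above, assuming the permutation energy is the summed weight of violated priority edges and the target distribution is Boltzmann in that energy. All names and the additive-utility setup are illustrative; the paper's estimator additionally uses stage-wise greedy initialization and utility caching.

```python
import numpy as np

def order_energy(perm, priority):
    """Summed weight of violated priority edges in a permutation.

    priority[(i, j)] = w means player i should precede j; a permutation
    placing j before i pays penalty w (soft, not forbidden).
    """
    pos = {p: r for r, p in enumerate(perm)}
    return sum(w for (i, j), w in priority.items() if pos[i] > pos[j])

def sample_permutation(n, priority, steps, rng):
    """Adjacent-swap Metropolis-Hastings over permutations, targeting
    probabilities proportional to exp(-energy)."""
    perm = list(rng.permutation(n))
    e = order_energy(perm, priority)
    for _ in range(steps):
        k = rng.integers(n - 1)
        cand = perm.copy()
        cand[k], cand[k + 1] = cand[k + 1], cand[k]
        e_cand = order_energy(cand, priority)
        if rng.random() < np.exp(e - e_cand):  # accept with ratio exp(-dE)
            perm, e = cand, e_cand
    return perm

def priority_shapley(n, utility, priority, num_perms=200, steps=50, seed=0):
    """Monte Carlo estimate of priority-aware Shapley values: average
    marginal contributions over priority-weighted permutations."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(num_perms):
        perm = sample_permutation(n, priority, steps, rng)
        coalition, prev = set(), utility(frozenset())
        for p in perm:
            coalition.add(p)
            cur = utility(frozenset(coalition))
            phi[p] += cur - prev
            prev = cur
    return phi / num_perms
```

Because only adjacent elements swap, the energy difference is driven by the single reordered pair, which is what makes the local Metropolis-Hastings ratio cheap to evaluate.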
Results
The simulations validate GPASV's accuracy and theoretical predictions. The application to LLM ensemble valuation on the Chatbot Arena preference graph demonstrates that GPASV can produce significantly different valuations based on the interplay between pairwise graph priorities and individual soft priorities, highlighting the method's flexibility and robustness.
Implications
GPASV has potential applications in various fields where valuation of contributors is necessary, such as in machine learning for data attribution, feature importance analysis, and model evaluation. The ability to handle complex priority structures makes it particularly useful in scenarios involving human preferences and multi-criteria decision-making.
Exploitation of Hidden Context in Dynamic Movement Forecasting: A Neural Network Journey from Recurrent to Graph Neural Networks and General Purpose Transformers
Time Series
Graph Learning
Optimization
- Traditional forecasting methods are inadequate for capturing nonlinear dynamics in sports.
- Hybrid LSTM models that incorporate contextual information significantly improve forecasting accuracy.
- Machine learning methods outperform traditional models in predicting short-term movements.
- No single architecture is optimal for all metrics in trajectory prediction; task-specific considerations are crucial.
Exploitation of Hidden Context in Dynamic Movement Forecasting: A Neural Network Journey from Recurrent to Graph Neural Networks and General Purpose Transformers
Summary
This paper investigates the challenges of dynamic movement forecasting in sports, particularly focusing on predicting the movements of NBA players. Traditional forecasting methods, such as ARIMA and Kalman filters, struggle with the nonlinear and unpredictable nature of sports dynamics. The authors evaluate various machine learning models, including LSTMs, GNNs, and Transformers, to assess their ability to capture temporal dependencies and contextual interactions. A key contribution of the study is the introduction of a hybrid LSTM model that incorporates contextual information, which significantly improves forecasting accuracy. The experimental results demonstrate that ML methods outperform traditional models, achieving a final displacement error of 1.51m with the hybrid LSTM, while also requiring less data and training time compared to other architectures like GAT and Transformers. The findings highlight the importance of task-specific considerations in trajectory prediction, indicating that no single architecture excels across all metrics in fast-paced environments like NBA games.
Methodology
The authors conducted a comparative analysis of various machine learning models, including LSTMs, GNNs, TCNNs, and Transformers. They developed a hybrid LSTM architecture that integrates contextual modeling to enhance the representation of spatial and temporal features. The models were evaluated based on performance trade-offs across input history length, generalization capacity, and computational complexity.
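The displacement metrics used to compare the architectures reduce to a few lines. This is a generic sketch of the standard ADE/FDE definitions, not the authors' evaluation code.

```python
import numpy as np

def displacement_errors(pred, true):
    """Average (ADE) and final (FDE) displacement error for one trajectory.

    pred, true: (T, 2) arrays of predicted / observed x-y positions
    in meters; FDE is the error at the last forecast step.
    """
    d = np.linalg.norm(pred - true, axis=1)  # per-step Euclidean error
    return d.mean(), d[-1]
```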
Results
The hybrid LSTM model achieved the lowest final displacement error of 1.51m, outperforming other tested architectures such as TCNN, GAT, and Transformers. The ML-based methods showed substantial improvements over traditional linear models across forecast horizons of up to 2 seconds, demonstrating enhanced accuracy and efficiency.
Implications
The findings suggest that integrating contextual information into forecasting models can lead to significant improvements in real-time sports analytics, offering tactical advantages and potentially reducing injury risks. This research can inform the development of more effective predictive tools in sports and other dynamic environments.
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
Reinforcement Learning
Large Language Models
NLP
- Introduces the concept of Action Bottleneck in agentic RL, highlighting the disparity in informative training signals between action and reasoning tokens.
- Proposes ACTFOCUS, a token reweighting approach that downweights reasoning tokens and prioritizes action tokens based on their predictive uncertainty.
- Demonstrates that ACTFOCUS consistently outperforms traditional methods like PPO and GRPO in various environments.
- Finds a strong correlation between action token uncertainty and reward variance, emphasizing the importance of targeted training signal allocation.
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
Summary
This paper addresses the challenge of credit assignment in agentic reinforcement learning (RL) for large language models (LLMs) by introducing a novel approach called ACTFOCUS. The authors identify a phenomenon termed the 'Action Bottleneck,' where action tokens, despite being a small fraction of generated tokens, carry the majority of informative training signals. They demonstrate that traditional policy-gradient methods like PPO and GRPO treat all tokens uniformly, leading to ineffective training signal allocation. By employing an energy-based modeling perspective, the authors quantify token-level training signals and show that action tokens correlate strongly with reward variance, while reasoning tokens do not. ACTFOCUS reweights gradients to downweight reasoning tokens and prioritize action tokens with higher uncertainty, effectively redirecting gradient mass towards the critical action spans. The method is evaluated across four multi-turn environments, showing significant performance improvements over standard methods without additional computational costs.
Methodology
The authors analyze token-level attributions in agentic RL using an energy-based modeling approach. They quantify training signals based on the predictive uncertainty of tokens and their correlation with reward variance. The ACTFOCUS method involves downweighting gradients for reasoning tokens and redistributing weights to action tokens based on their uncertainty, thus enhancing the focus on critical action tokens during training.
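A hedged sketch of the reweighting idea described above, assuming predictive entropy as the uncertainty measure and a constant floor weight for reasoning tokens; the paper's exact weighting and normalization may differ.

```python
import numpy as np

def actfocus_weights(logprobs, is_action, beta=0.1):
    """Per-token gradient weights: downweight reasoning tokens and scale
    action tokens by their predictive entropy (uncertainty).

    logprobs:  (T, V) per-token predictive log-probabilities
    is_action: (T,) boolean mask marking action tokens
    beta:      constant weight assigned to reasoning tokens
    """
    p = np.exp(logprobs)
    entropy = -(p * logprobs).sum(axis=1)      # predictive uncertainty
    w = np.full(len(is_action), beta)
    act = entropy[is_action]
    w[is_action] = act / (act.mean() + 1e-8)   # uncertainty-scaled actions
    # renormalize so total gradient mass matches uniform weighting
    return w * len(w) / w.sum()
```

The weights would then multiply the per-token policy-gradient terms, redirecting gradient mass toward uncertain action spans.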
Results
ACTFOCUS achieved final success-rate improvements of up to 65.2 percentage points over PPO and 63.7 percentage points over GRPO across four evaluated environments. Additionally, it improved training stability by reducing peak-to-final performance degradation.
Implications
The findings suggest that more effective credit assignment strategies in agentic RL can lead to better performance in LLMs, particularly in complex, multi-turn tasks. This could enhance the deployment of LLMs as autonomous agents in various applications, including conversational agents and decision-making systems.
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
Large Language Models
NLP
- RxEval shifts the evaluation of medication recommendation from admission-level to prescription-level, capturing the dynamic nature of patient care.
- The benchmark includes a reasoning-chain perturbation method to create patient-specific distractors for multiple-choice questions.
- Evaluation of 16 LLMs demonstrates that current models have significant limitations in accurately recommending medications.
- The results highlight systematic errors in LLMs, including oversight of patient information and reasoning failures.
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
Summary
The paper introduces RxEval, a novel benchmark designed to evaluate the medication recommendation capabilities of large language models (LLMs) at the prescription level. Unlike existing benchmarks that assess admission-level predictions using coarse drug codes, RxEval focuses on the detailed, time-ordered clinical trajectories of patients, requiring the selection of specific medication-dose-route triples. The benchmark consists of 1,547 multiple-choice questions derived from real electronic health records, covering 584 patients and 969 unique medications across 18 diagnostic categories. The authors evaluate 16 different LLMs using RxEval, revealing that the models struggle with the complexity of the task, achieving F1 scores between 45.18 and 77.10, with the best Exact Match accuracy at only 46.10%. Error analysis indicates that even advanced models frequently overlook critical patient information or fail to draw appropriate clinical conclusions, underscoring the need for further improvements in LLMs for practical medical applications.
Methodology
The authors developed RxEval by constructing a set of multiple-choice questions based on real electronic health records, focusing on specific medication-dose-route triples. They employed a reasoning-chain perturbation method to generate patient-specific distractors, ensuring that the questions require nuanced clinical reasoning rather than general medical knowledge. The benchmark was then used to evaluate 16 LLMs, measuring their performance through F1 scores and Exact Match accuracy.
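The reported metrics over medication-dose-route triples can be sketched as set-level comparisons. This is an illustrative reading of "F1" and "Exact Match" for this task, not the benchmark's released scorer; the triple values are invented examples.

```python
def prescription_metrics(pred, gold):
    """Exact Match and set-level F1 over (medication, dose, route) triples.

    pred, gold: sets of triples for one prescribing decision.
    Exact Match requires the full predicted set to equal the gold set;
    F1 gives partial credit per correctly recovered triple.
    """
    em = 1.0 if pred == gold else 0.0
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return em, f1
```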
Results
The evaluation of 16 LLMs on RxEval revealed F1 scores ranging from 45.18 to 77.10. Even the top-performing model, Gemini-3.1-Pro, achieved only 46.10% Exact Match accuracy. The analysis identified common error patterns, including oversight errors and reasoning errors, indicating that LLMs struggle with the complexities of real-world prescribing tasks.
Implications
The findings suggest that while LLMs show promise for assisting in medication recommendations, there is substantial room for improvement in their reasoning capabilities. RxEval can serve as a foundational tool for future research aimed at enhancing LLM performance in clinical settings, particularly in areas with clinician shortages.
Woodelf++: A Fast and Unified Partial Dependence Plot Algorithm for Decision Tree Ensembles
Efficient ML
Interpretability
- WOODELF++ provides a unified approach for computing PDPs, Joint-PDPs, and PDIVs.
- The algorithm achieves significant computational efficiency, particularly for Any-Order-PDIVs.
- Implementation in pure Python with GPU support enhances accessibility for large datasets.
- The method computes PDPs and Joint-PDPs up to 6x faster than existing algorithms and reduces Any-Order-PDIV computation from an estimated 1,000,000 years to minutes.
Woodelf++: A Fast and Unified Partial Dependence Plot Algorithm for Decision Tree Ensembles
Summary
The paper introduces WOODELF++, an efficient algorithm designed for computing Partial Dependence Plots (PDPs), Joint Partial Dependence Plots (Joint-PDPs), and Partial Dependence Interaction Values (PDIVs) for decision tree ensembles. These tools are essential for interpreting machine learning models, particularly in understanding how individual features and their interactions influence predictions. WOODELF++ builds upon the existing WOODELF algorithm, enhancing its capabilities by deriving metrics over pseudo-Boolean functions. This unified framework allows for both exact and approximate computations of PDPs, Joint-PDPs, and Any-Order-PDIVs, significantly improving computational efficiency. The authors demonstrate that WOODELF++ can compute these metrics up to 6 times faster than current state-of-the-art methods and up to five orders of magnitude faster than popular libraries like scikit-learn. The algorithm is implemented in pure Python and supports GPU acceleration, making it accessible for large datasets. The paper highlights the importance of these explainability tools in real-world applications, emphasizing the need for faster computation methods to facilitate trust and decision-making in machine learning.
Methodology
The authors derive suitable metrics over pseudo-Boolean functions to compute PDPs, Joint-PDPs, and Any-Order-PDIVs in a unified framework. WOODELF++ improves upon the naive computation methods by leveraging the structure of decision tree ensembles, allowing for faster evaluations and computations.
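For reference, the naive Monte Carlo PDP that WOODELF++ accelerates looks like the sketch below. WOODELF++ itself exploits the structure of the tree ensemble instead of repeated model calls; this generic baseline only fixes what quantity is being computed.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Naive PDP: average prediction with one feature forced to each
    grid value over a background dataset.

    predict: function mapping an (n, d) array to (n,) predictions
    X:       (n, d) background dataset
    feature: column index to sweep
    grid:    values at which to evaluate the partial dependence
    """
    pdp = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v        # intervene on one feature
        pdp.append(predict(Xv).mean())
    return np.array(pdp)
```

For an additive model the PDP of a feature is linear in that feature, which gives a cheap sanity check.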
Results
WOODELF++ computes PDPs and Joint-PDPs up to 6 times faster than the state of the art and significantly faster than scikit-learn. For Any-Order-PDIVs, it completes computations in 5 minutes compared to the estimated 1,000,000 years required by existing methods.
Implications
The advancements presented in WOODELF++ have significant implications for the interpretability of machine learning models, particularly in fields where decision tree ensembles are prevalent. Faster computation of explainability metrics can enhance model trust and facilitate better decision-making in critical applications such as healthcare and finance.
DeepTokenEEG: Enhancing Mild Cognitive Impairment and Alzheimer's Classification via Tokenized EEG Features
Time Series
Efficient ML
- Introduction of DeepTokenEEG, a lightweight model for AD classification using EEG signals.
- Utilization of spatial and temporal tokenization to capture relevant biomarkers.
- Achieved 100% accuracy on specific frequency bands, surpassing existing methods.
- Constructed a large-scale dataset for comprehensive benchmarking.
DeepTokenEEG: Enhancing Mild Cognitive Impairment and Alzheimer's Classification via Tokenized EEG Features
Summary
The paper presents DeepTokenEEG, a novel lightweight deep learning model designed to enhance the classification of Alzheimer's disease (AD) and mild cognitive impairment (MCI) using electroencephalogram (EEG) signals. The authors highlight the importance of early detection of AD for improving patient outcomes and address the limitations of traditional diagnostic methods, which are often subjective and time-consuming. DeepTokenEEG employs a spatial and temporal tokenizer to effectively capture AD-related biomarkers in both temporal and frequency domains, achieving high performance with only 0.29 million parameters. The model was trained on a combined dataset of 274 subjects, including 180 AD cases and 94 healthy controls, and demonstrated a maximum accuracy of 100% on specific frequency bands, outperforming existing state-of-the-art methods by 1.41-15.35%. The study emphasizes the potential of DeepTokenEEG for early detection and screening of AD, making it suitable for deployment in clinical settings due to its compact size and efficiency.
Methodology
The authors developed a deep learning framework called DeepTokenEEG, which uses a tokenization approach to transform EEG signals into tokens that preserve temporal order and long-range dependencies. The model was trained on a dataset comprising various neurological conditions, including AD, MCI, and healthy controls, allowing for multi-class classification.
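Temporal tokenization can be sketched as non-overlapping windowing of the recording, preserving temporal order; this is an assumption about the tokenizer's mechanics (the spatial tokenizer and window length are not specified here).

```python
import numpy as np

def tokenize_eeg(signal, token_len):
    """Split a multi-channel EEG recording into non-overlapping temporal
    tokens while preserving their order.

    signal:    (channels, time) array
    token_len: samples per token
    returns:   (num_tokens, channels, token_len)
    """
    c, t = signal.shape
    n = t // token_len                               # drop the ragged tail
    tokens = signal[:, : n * token_len].reshape(c, n, token_len)
    return tokens.transpose(1, 0, 2)                 # token-major layout
```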
Results
DeepTokenEEG achieved a maximum accuracy of 100% on specific frequency bands, representing a significant improvement over state-of-the-art methods by 1.41-15.35%. The model's performance was validated on a dataset of 274 subjects, demonstrating its effectiveness in distinguishing AD from other conditions.
Implications
The findings suggest that DeepTokenEEG could serve as a reliable tool for the early detection of Alzheimer's disease, potentially improving patient outcomes through timely intervention. Its lightweight nature makes it suitable for deployment in various clinical settings, enhancing accessibility to diagnostic tools for neurological conditions.
Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods
NLP
Large Language Models
- The rise of LLMs has increased the demand for effective AI-generated text detection methods.
- Binoculars-inclusive ensembles demonstrate the highest detection performance but are most susceptible to paraphrasing attacks.
- Human detection of AI-generated text is often unreliable, underscoring the need for automated solutions.
- The study categorizes detection methods into training-based and training-free paradigms, with Binoculars showing significant advantages.
Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods
Summary
This paper investigates the resilience of various AI-generated text detection methods against paraphrasing attacks, which are increasingly used to bypass detection systems. The authors evaluate three primary detection approaches: fine-tuned RoBERTa, Binoculars, and text feature analysis, as well as their ensembles using Random Forest classifiers. The study highlights the growing need for reliable detection mechanisms in light of the proliferation of large language models (LLMs) and the associated risks of plagiarism and misinformation. The findings reveal that while ensembles including Binoculars achieve the best performance, they also exhibit significant vulnerability to paraphrasing attacks. This dichotomy between detection performance and resilience complicates the understanding of the reliability of current state-of-the-art techniques. The paper emphasizes the necessity for precise automated detection methods to combat the challenges posed by AI-generated content.
Methodology
The authors evaluated three detection methods: fine-tuned RoBERTa, the Binoculars approach, and text feature analysis. They also explored ensemble methods using Random Forest classifiers to combine these techniques. The study focused specifically on the impact of paraphrasing attacks on these detection systems.
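The Binoculars approach scores text as the ratio of an observer model's log-perplexity to the cross-model log-perplexity between observer and performer models. A sketch from precomputed model outputs follows; the array layout is an assumption, and this is not the original implementation.

```python
import numpy as np

def binoculars_score(obs_token_logprobs, performer_dist, observer_dist_log):
    """Log-perplexity / cross-log-perplexity ratio for AI-text detection.

    obs_token_logprobs: (T,) observer log-prob of each observed token
    performer_dist:     (T, V) performer next-token distributions
    observer_dist_log:  (T, V) observer next-token log-distributions
    Lower scores suggest machine-generated text.
    """
    log_ppl = -obs_token_logprobs.mean()
    log_xppl = -(performer_dist * observer_dist_log).sum(axis=1).mean()
    return log_ppl / log_xppl
```

When the observed tokens are exactly as surprising to the observer as the performer's own distribution, the score is 1; human text tends to score higher.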
Results
The results indicated that ensembles incorporating the Binoculars method yielded the strongest detection performance. However, these ensembles experienced the most significant declines in accuracy when subjected to paraphrasing attacks, highlighting a critical trade-off between performance and resilience.
Implications
The findings suggest that while certain detection methods can achieve high accuracy, their vulnerability to paraphrasing attacks poses a challenge for their reliability in real-world applications. This underscores the need for ongoing research into more robust detection mechanisms to address the evolving landscape of AI-generated content.
Mini-JEPA Foundation Model Fleet Enables Agentic Hydrologic Intelligence
Computer Vision
Multimodal
Efficient ML
- Mini-JEPAs outperform generalist models on specialized hydrologic tasks.
- Each Mini-JEPA is specialized for a specific satellite sensor, enhancing predictive accuracy.
- The embedding manifolds of the Mini-JEPAs exhibit distinct geometric structures.
- A routing LLM effectively selects the appropriate Mini-JEPA for hydrologic queries.
Mini-JEPA Foundation Model Fleet Enables Agentic Hydrologic Intelligence
Summary
This paper introduces a fleet of small, sensor-specialized Joint Embedding Predictive Architecture (JEPA) foundation models, termed Mini-JEPAs, designed to enhance hydrologic intelligence. Unlike single, large planetary-scale models that may compromise on specialized hydrologic signals, the Mini-JEPAs are tailored to specific satellite sensors, allowing for more precise environmental variable reconstruction. The study pretrains five Mini-JEPAs, each with 22 million parameters, utilizing images from various satellite sources including Sentinel-1, Sentinel-2, and MODIS. Each model demonstrates high accuracy in predicting the environmental variable associated with its respective sensor, achieving R2 values of up to 0.97 for elevation and mean temperature. The models exhibit distinct geometric structures in their embedding manifolds, reflecting the physical characteristics of the sensors. A routing agent, implemented as a large language model (LLM), effectively selects the appropriate Mini-JEPA for specific queries, outperforming a generalist model (AlphaEarth) in targeted retrieval tasks. The findings suggest that locally-trained Mini-JEPAs can serve as a viable alternative or complement to larger models, facilitating more accessible and efficient hydrologic intelligence systems.
Methodology
The study employs a fleet of five Mini-JEPAs, each trained on data from different satellite sensors. The models share a common Vision Transformer backbone and training recipe, focusing on predicting masked latent representations rather than pixel values. A routing agent, implemented as a large language model, is used to select the appropriate Mini-JEPA based on the query context.
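As a toy stand-in for the LLM routing agent, keyword matching illustrates the selection step; the paper uses an LLM for this, and the model names and keywords below are invented for illustration.

```python
def route_query(query, fleet):
    """Pick the Mini-JEPA whose sensor keywords best match a query.

    fleet: mapping of model name -> list of keywords describing the
    sensor domain that model was specialized for.
    """
    q = query.lower()
    scores = {m: sum(k in q for k in kws) for m, kws in fleet.items()}
    return max(scores, key=scores.get)
```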
Results
The Mini-JEPAs achieved high R2 values for environmental variable predictions, with 0.97 for elevation and mean temperature, and 0.81 for precipitation. The models demonstrated distinct embedding geometries, with varying global participation ratios and local intrinsic dimensionalities. The routing system outperformed the AlphaEarth model in targeted retrieval tasks, indicating the effectiveness of the specialized approach.
Implications
The findings suggest that Mini-JEPAs can significantly enhance hydrologic intelligence systems, making them more accessible for research groups with limited computational resources. This approach could lead to improved environmental monitoring and decision-making processes in hydrology and related fields.
Architecture-Aware Explanation Auditing for Industrial Visual Inspection
Computer Vision
Interpretability
- The native-readout hypothesis suggests that explanation methods must align structurally with the model's decision mechanism to ensure faithfulness.
- Experiments on the WM-811K dataset reveal significant differences in explanation quality among various models and methods.
- Swin-Tiny's architecture allows for better explanation compatibility compared to traditional models like ResNet and DenseNet.
- Model-agnostic methods like RISE underperform compared to native methods, highlighting the importance of architectural alignment.
Architecture-Aware Explanation Auditing for Industrial Visual Inspection
Summary
This paper addresses the challenge of ensuring that explanation methods for deep learning models in industrial visual inspection accurately reflect the model's decision-making process. The authors propose an architecture-aware explanation audit protocol based on the native-readout hypothesis, which posits that the faithfulness of an explanation method is limited by its structural alignment with the model's decision mechanism. They conduct experiments using the WM-811K wafer map dataset, which consists of 172,950 labeled images across nine classes. The study compares various models and explanation methods, revealing that traditional post-hoc explanation methods like Grad-CAM can produce misleading results. The authors find that models with a native readout structure, such as Swin-Tiny, yield more faithful explanations than those relying on post-hoc proxies. Additionally, a model-agnostic explainer (RISE) demonstrates lower performance than native methods, suggesting that the architecture's representation plays a critical role in explanation quality. The findings indicate that explanation pathways should be co-designed with model architectures, and the authors emphasize the need for quantitative metrics to assess explanation faithfulness.
Methodology
The authors developed an architecture-aware explanation audit protocol based on the native-readout hypothesis. They conducted experiments using the WM-811K wafer map dataset, comparing various deep learning models and explanation methods (e.g., Grad-CAM, Attention Rollout) under a perturbation-based evaluation framework. They analyzed the faithfulness of explanations through metrics like Deletion AUC and Insertion AUC, and performed sensitivity analyses to assess the impact of different perturbation strategies.
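The Deletion AUC used in the audit can be sketched as follows, assuming the common formulation: progressively remove pixels from most to least salient, track the model's confidence, and average it along the deletion curve (a faithful explanation yields a steep drop and low AUC). The step count and zero fill value are illustrative choices.

```python
import numpy as np

def deletion_auc(predict, image, saliency, num_steps=10):
    """Deletion-curve AUC for a saliency map; lower means more faithful.

    predict:  function mapping an (H, W) image to a class probability
    image:    (H, W) array
    saliency: (H, W) importance map from the explanation method
    """
    order = np.argsort(saliency.ravel())[::-1]  # most salient first
    img = image.copy().ravel()
    scores = [predict(img.reshape(image.shape))]
    chunk = max(1, order.size // num_steps)
    for i in range(0, order.size, chunk):
        img[order[i : i + chunk]] = 0.0         # delete salient pixels
        scores.append(predict(img.reshape(image.shape)))
    return float(np.mean(scores))               # area under the curve
```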
Results
The results showed that ViT-Tiny + Attention Rollout achieved a Deletion AUC of 0.211, while Swin-Tiny, ResNet18+CBAM, and DenseNet121 with Grad-CAM achieved AUCs ranging from 0.432 to 0.525. The findings indicated that the faithfulness of explanation methods varies significantly based on their structural alignment with the model's decision-making process. Additionally, RISE, a model-agnostic explainer, compressed the faithfulness ranking across families to approximately 0.1, suggesting that the architecture's representation is crucial for explanation quality.
Implications
The findings have significant implications for the design of explainable AI systems in industrial settings, particularly in visual inspection tasks. The study advocates for the co-design of explanation pathways with model architectures to enhance trustworthiness and reliability. Furthermore, it highlights the necessity of quantitative metrics for evaluating explanation faithfulness, which can improve human validation processes in critical decision-making environments.
GenAI for Energy-Efficient and Interference-Aware Compressed Sensing of GNSS Signals on a Google Edge TPU
Generative Models
Efficient ML
Interpretability
- Introduces a GenAI-based approach for real-time GNSS jamming signal classification and compression.
- Achieves significant data compression (> 42×) while maintaining high classification accuracy.
- Utilizes 8-bit quantization for energy-efficient deployment on Google Edge TPUs.
- Explores various autoencoder architectures and their impact on reconstruction and classification performance.
Read more
GenAI for Energy-Efficient and Interference-Aware Compressed Sensing of GNSS Signals on a Google Edge TPU
Summary
This paper presents a novel approach to the classification and compression of Global Navigation Satellite System (GNSS) jamming signals using generative artificial intelligence (GenAI) techniques, specifically variational autoencoders (VAEs). Traditional methods for detecting GNSS interference rely on extensive data transmission to cloud-based systems, which is not feasible in power-constrained environments. The proposed method compresses GNSS data streams directly at the hardware receiver while enabling real-time classification of jamming and spoofing attacks. The authors evaluate various autoencoder architectures to optimize the compression of GNSS signals, focusing on maintaining interference characteristics and minimizing data size. The study employs 8-bit quantization for energy-efficient deployment on Google Edge TPUs. Results indicate that the system achieves over 42× compression and accurately classifies approximately 72 types of interference with an F2-score of 0.915, closely matching the original signals (F2-score of 0.923). The paper also explores latent feature disentanglement through ablation studies on conditional and factorized VAEs, enhancing model interpretability and trust in machine learning applications for interference detection.
Methodology
The authors investigate various state-of-the-art GenAI models, including autoencoders, vanilla VAEs, conditional VAEs, and factorized VAEs. These models are trained on diverse GNSS data representations, such as raw in-phase/quadrature data, FFT spectrum data, and handcrafted features. The pipeline is optimized for deployment on Google Edge TPUs using 8-bit quantization, and performance is evaluated across multiple metrics including power efficiency, compressibility, reconstruction accuracy, and classification performance.
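The 8-bit quantization step can be sketched in isolation. This is a simplified affine quantizer, not the Edge TPU toolchain (which applies per-tensor calibration during conversion); the latent vector is hypothetical.

```python
def quantize(values, bits=8):
    # affine quantization: map floats in [lo, hi] to integers in [0, 2**bits - 1]
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # guard against constant input
    return [round((v - lo) / scale) for v in values], lo, scale

def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

latent = [-1.2, 0.0, 0.37, 2.5]  # hypothetical VAE latent vector
codes, lo, scale = quantize(latent)
recovered = dequantize(codes, lo, scale)
```

Rounding bounds the per-element reconstruction error by half the quantization step, which is the trade-off behind the reported compression-versus-accuracy results.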
Results
The proposed system achieves over 42× compression of GNSS data while accurately classifying 72 interference types with an F2-score of 0.915, closely matching the original signals' F2-score of 0.923. The approach significantly reduces data transmission costs and enhances the feasibility of real-time interference detection in power-constrained environments.
Implications
This work provides a practical solution for real-time GNSS interference detection and mitigation, which is crucial for maintaining the functionality of localization systems in the presence of jamming. The energy-efficient and cost-effective nature of the proposed method makes it suitable for deployment in various applications, including IoT and mobile GNSS receivers.
Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning
Theory
Optimization
- Validation convergence is fastest at an intermediate dataset size in NW matrix generation, not at the largest size.
- Partial rule learning can reduce the number of updates needed to achieve high training accuracy in the weak-validation regime.
- The study introduces a two-pressure account of learning, where initial rule discovery aids fitting, but further data can increase fitting costs.
- The findings refine the classical understanding of grokking by presenting a broader timing diagram of training and validation convergence.
Read more
Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning
Summary
This paper investigates the relationship between dataset size and validation convergence in algorithmic learning tasks, specifically focusing on the Needleman-Wunsch (NW) matrix generation. The authors challenge the conventional wisdom that larger datasets always lead to faster validation convergence after the onset of generalization. They identify a 'dataset-size sweet spot' where smaller datasets can lead to quicker validation accuracy, while larger datasets may require more gradient updates to achieve the same level of accuracy. The study reveals that, in structured-output tasks like NW matrix generation, the process of learning the underlying rule can initially accelerate fitting, but as the dataset size increases, the need to fit residual details can slow down convergence. The findings suggest that memorization and generalization are not strictly sequential processes, and that partial rule learning can facilitate fitting in certain contexts. This nuanced understanding of dataset size effects has implications for how we approach training in structured-output tasks.
Methodology
The authors conducted experiments using small Transformers on the Needleman-Wunsch matrix generation task, comparing validation convergence across varying dataset sizes. They analyzed the relationship between dataset size and the number of gradient updates required to reach specific accuracy thresholds, contrasting these results with a multiplication baseline to illustrate differences in learning dynamics.
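The core measurement — gradient updates needed to reach an accuracy threshold — reduces to a scan over logged validation curves. The helper and the two curves below are hypothetical, illustrating only the bookkeeping, not the training itself.

```python
def updates_to_threshold(acc_curve, threshold):
    # index of the first logged update at which accuracy reaches the threshold
    for step, acc in enumerate(acc_curve):
        if acc >= threshold:
            return step
    return None  # threshold never reached

# hypothetical validation curves for a smaller vs. a larger dataset
small = [0.1, 0.3, 0.8, 0.95, 0.99]
large = [0.1, 0.2, 0.4, 0.7, 0.9, 0.99]

# the "sweet spot" claim: the smaller dataset crosses the threshold first
sweet_spot = updates_to_threshold(small, 0.95) < updates_to_threshold(large, 0.95)
```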
Results
The study found that smaller datasets led to faster validation convergence in NW matrix generation, while larger datasets required more updates to achieve the same accuracy. This indicates a 'sweet spot' in dataset size that optimizes validation convergence, challenging the assumption that larger datasets are always beneficial post-generalization onset.
Implications
These findings could influence how practitioners select dataset sizes for training models on structured-output tasks, suggesting that in some cases, smaller datasets may be more effective for achieving rapid validation convergence. The insights also contribute to the theoretical understanding of the learning dynamics in algorithmic tasks, potentially guiding future research in this area.
Peng's Q(λ) for Conservative Value Estimation in Offline Reinforcement Learning
Reinforcement Learning
Theory
Optimization
- CPQL is the first multi-step Q-learning algorithm for model-free offline RL.
- The algorithm effectively utilizes offline trajectories without requiring additional model estimations.
- CPQL mitigates over-pessimistic value estimation while ensuring performance is at least equal to the behavior policy.
- Theoretical analyses confirm CPQL's ability to reduce sub-optimality compared to existing methods.
Read more
Peng's Q(λ) for Conservative Value Estimation in Offline Reinforcement Learning
Summary
This paper introduces Conservative Peng's Q(λ) (CPQL), a novel model-free offline multi-step reinforcement learning algorithm that adapts the Peng's Q(λ) operator for conservative value estimation, serving as an alternative to the traditional Bellman operator. The authors claim this is the first work to theoretically and empirically validate the effectiveness of conservative multi-step value estimation in offline RL by fully utilizing offline trajectories. CPQL's fixed point aligns closely with the behavior policy's value function, promoting implicit behavior regularization while effectively mitigating over-pessimistic value estimates. The algorithm guarantees performance that is equal to or better than the behavior policy and provides near-optimal performance assurances, a significant advancement over previous conservative methods. Extensive experiments on the D4RL benchmark reveal that CPQL consistently outperforms existing offline single-step baselines. Additionally, CPQL enhances the offline-to-online learning framework, allowing Q-functions pre-trained in offline settings to facilitate smoother online fine-tuning, thereby avoiding performance drops typically seen at the start of this process.
Methodology
The CPQL algorithm employs the Peng's Q(λ) operator to leverage multi-step trajectories for value estimation in offline RL. It avoids the pitfalls of importance sampling and behavior policy estimation, thus reducing distributional shift issues. Theoretical analyses are provided to support the claims of improved performance and reduced over-pessimism.
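The Peng's Q(λ) return underlying CPQL admits a simple backward recursion, G_t = r_t + γ[(1−λ)·max_a Q(s_{t+1}, a) + λ·G_{t+1}]; the sketch below shows that recursion without the conservative adjustment the paper adds on top.

```python
def pengs_q_lambda_targets(rewards, q_next_max, gamma=0.99, lam=0.9):
    # rewards[t] = r_t; q_next_max[t] = max_a Q(s_{t+1}, a)
    # backward recursion: G_t = r_t + gamma*((1-lam)*q_next_max[t] + lam*G_{t+1})
    T = len(rewards)
    targets = [0.0] * T
    G = rewards[-1] + gamma * q_next_max[-1]  # full bootstrap at the horizon
    targets[-1] = G
    for t in range(T - 2, -1, -1):
        G = rewards[t] + gamma * ((1 - lam) * q_next_max[t] + lam * G)
        targets[t] = G
    return targets
```

Setting λ = 0 recovers one-step targets r_t + γ·max_a Q(s_{t+1}, a), while λ = 1 accumulates the full observed trajectory return with a single bootstrap at the end.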
Results
Numerical experiments on the D4RL benchmark indicate that CPQL significantly outperforms traditional single-step offline RL algorithms, demonstrating its effectiveness in conservative value estimation and policy evaluation.
Implications
The findings suggest that CPQL can enhance offline reinforcement learning applications by providing more reliable value estimates and improving the transition from offline to online learning, which could be beneficial in real-world scenarios where data collection is limited.
Test-Time Learning with an Evolving Library
NLP
Large Language Models
Efficient ML
- EVOLIB allows LLMs to learn and adapt during inference without parameter updates or external supervision.
- The framework maintains a structured library of knowledge abstractions, promoting efficient knowledge transfer across tasks.
- A novel credit assignment mechanism based on Information Gain (IG) and Future Information Gain (Future IG) supports continual learning.
- EVOLIB shows consistent performance improvements across diverse benchmarks compared to traditional test-time learning methods.
Read more
Test-Time Learning with an Evolving Library
Summary
The paper introduces EVOLIB, a novel test-time learning framework designed for large language models (LLMs) that allows them to accumulate, reuse, and evolve knowledge across different problem instances without requiring parameter updates or external supervision. Unlike traditional methods that depend on gradient updates or external feedback, EVOLIB maintains a shared library of knowledge abstractions, which includes modular skills and reflective insights derived from the model's own inference trajectories. The framework employs a principled mechanism for weighting and consolidating knowledge, optimizing for both immediate utility and long-term value. This enables simple, instance-specific abstractions to evolve into more general and reusable forms over time. The authors demonstrate the effectiveness of EVOLIB across various challenging benchmarks, including mathematical reasoning, code generation, and multi-turn agentic tasks, showing substantial improvements over existing test-time scaling and learning methods that do not rely on ground-truth feedback.
Methodology
EVOLIB operates by extracting modular skills and reflective insights from the model's inference trajectories. It utilizes a self-supervised process to evaluate generated solutions and distill experiences into reusable knowledge units. The library is continuously updated through iterative abstraction extraction and consolidation, guided by a credit assignment mechanism that balances immediate usefulness with future potential.
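The paper's Information-Gain credit assignment is not reproduced here; as a purely hypothetical sketch of the shape of such a mechanism, a library whose entries carry smoothed utility scores, with top-k retrieval and capacity-bounded consolidation, might look like:

```python
class AbstractionLibrary:
    # Hypothetical sketch only: entries carry an exponential-moving-average
    # utility score; retrieval returns the top-k entries and consolidation
    # prunes low-value ones. The paper's IG / Future-IG weighting is richer.
    def __init__(self, beta=0.9, capacity=4):
        self.beta, self.capacity, self.entries = beta, capacity, {}

    def record(self, name, gain):
        old = self.entries.get(name, gain)  # first observation seeds the score
        self.entries[name] = self.beta * old + (1 - self.beta) * gain

    def consolidate(self):
        keep = sorted(self.entries, key=self.entries.get, reverse=True)[: self.capacity]
        self.entries = {k: self.entries[k] for k in keep}

    def retrieve(self, k=2):
        return sorted(self.entries, key=self.entries.get, reverse=True)[:k]
```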
Results
The evaluation of EVOLIB across various benchmarks indicates significant performance enhancements over leading test-time scaling and learning approaches. The method demonstrates more efficient token usage and supports continual learning that is less sensitive to task ordering compared to existing methods.
Implications
EVOLIB's self-supervised learning framework has the potential to enhance the adaptability of LLMs in real-world applications where external feedback is scarce or costly. It opens avenues for more efficient knowledge accumulation and reuse, which could lead to improved performance in dynamic and complex environments.
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
Theory
Optimization
Robotics
- Introduces BBCritic, a paradigm shift in GUI critique from binary classification to continuous semantic alignment.
- Identifies and addresses structural defects in existing GUI critic models: Affordance Collapse and Noise Sensitivity.
- Presents BBBench, the first GUI critic benchmark with a dense action space and a hierarchical four-level taxonomy.
- Demonstrates that BBCritic outperforms larger binary models with fewer parameters and no additional annotations.
Read more
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
Summary
This paper addresses the limitations of existing GUI critic models that rely on binary classification for evaluating actions in graphical user interfaces (GUIs). The authors identify two main issues: Affordance Collapse, where the hierarchical affordance space is compressed into binary labels, and Noise Sensitivity, where binary objectives overfit to noisy decision boundaries. To overcome these challenges, they propose BBCritic, a new paradigm based on the Functional Equivalence Hypothesis, which aligns instructions and actions in a shared Affordance Space through two-stage contrastive learning. This approach allows for a more nuanced evaluation of actions, distinguishing between optimal, suboptimal, and irrelevant actions. Additionally, the authors introduce BBBench, a novel benchmark that features a dense action space and a hierarchical taxonomy for evaluating GUI critic performance. Experimental results demonstrate that BBCritic outperforms state-of-the-art binary models, showing strong zero-shot transferability and robustness to label noise. The findings suggest that GUI critique should be viewed as a metric-learning problem rather than a classification task.
Methodology
The authors employ a two-stage contrastive learning framework to align user instructions and actions in a shared Affordance Space. This approach allows for the recovery of the hierarchical structure of actions, enabling fine-grained ranking of GUI actions based on their functional alignment with user intent.
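The continuous-alignment idea can be illustrated with cosine similarity in a shared embedding space. The embeddings below are hand-made toys; the paper learns the actual space with two-stage contrastive training.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_actions(instruction_emb, actions):
    # continuous critique: score every candidate action by its alignment with
    # the instruction instead of emitting a binary correct/incorrect label
    return sorted(actions, key=lambda item: -cosine(instruction_emb, item[1]))

instruction = [1.0, 0.0, 0.2]
actions = {
    "click_submit":  [0.9, 0.1, 0.2],  # optimal: near-parallel
    "scroll_down":   [0.5, 0.8, 0.1],  # suboptimal: partial alignment
    "open_settings": [0.0, 0.0, 1.0],  # irrelevant: near-orthogonal
}
ranking = [name for name, _ in rank_actions(instruction, actions.items())]
```

The scores induce exactly the optimal / suboptimal / irrelevant ordering that a binary critic collapses.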
Results
BBCritic, with 3 billion parameters, significantly outperforms state-of-the-art binary models with 7 billion parameters across various benchmarks. It also exhibits strong zero-shot transferability and robustness to label noise, validating the effectiveness of the continuous semantic alignment approach.
Implications
The proposed methods and benchmark can enhance the performance of generalist GUI agents, making them more reliable in real-world applications. The shift towards continuous semantic alignment may lead to improved user experience in GUI interactions and more effective automation of tasks.
A Mutual Information Lower Bound for Multimodal Regression Active Learning
Multimodal
Theory
Generative Models
- Introduction of the Two-Index framework to separate epistemic and aleatoric uncertainties.
- Development of the Mutual Information Lower Bound (MI-LB) acquisition function for active learning in multimodal regression.
- MI-LB consistently outperforms existing acquisition functions across various multimodal benchmarks.
- The framework provides a unified approach to uncertainty quantification applicable to a wide range of model families.
Read more
A Mutual Information Lower Bound for Multimodal Regression Active Learning
Summary
This paper addresses the challenge of active learning in continuous regression settings, particularly when dealing with multimodal predictive distributions. Traditional acquisition functions often conflate epistemic uncertainty (uncertainty reducible by data) with aleatoric uncertainty (intrinsic to the data), leading to suboptimal performance in multimodal scenarios. The authors introduce a Two-Index framework that separates these uncertainties by employing two stochastic indices: one for epistemic sources and another for aleatoric sources. They derive a Mutual Information Lower Bound (MI-LB) acquisition function that serves as a tractable approximation for the mutual information between outputs and epistemic indices, specifically designed for Mixture Density Network (MDN) ensembles. The MI-LB function is shown to consistently outperform existing baselines across multiple multimodal benchmarks, demonstrating its effectiveness in capturing the nuances of multimodal distributions and providing a principled approach to active learning in these contexts.
Methodology
The authors extend the Epistemic Neural Network framework by introducing two independent stochastic indices to model epistemic and aleatoric uncertainties. They derive the MI-LB acquisition function by utilizing entropy decomposition and bounds on Gaussian mixtures, allowing for efficient computation in the context of MDNs.
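The MI-LB itself rests on entropy bounds for Gaussian mixtures; as a simpler illustration of the epistemic/aleatoric split it builds on, the law of total variance decomposes an ensemble's predictive variance into the spread of the component means (epistemic) plus the mean of the component variances (aleatoric).

```python
def decompose_uncertainty(means, variances):
    # law of total variance across an ensemble of Gaussian predictors:
    # total variance = epistemic (spread of means) + aleatoric (mean of variances)
    m = sum(means) / len(means)
    epistemic = sum((mu - m) ** 2 for mu in means) / len(means)
    aleatoric = sum(variances) / len(variances)
    return epistemic, aleatoric
```

An ensemble whose members agree has zero epistemic term regardless of how noisy each member's prediction is, which is the separation acquisition functions need.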
Results
The MI-LB acquisition function matches or exceeds the performance of all evaluated baselines (Random, Variance, BAIT, Core-Set) across three multimodal benchmarks. It is the only method that consistently performs well, particularly in scenarios where traditional geometric and Fisher-based methods fail due to the nature of multimodal distributions.
Implications
The findings suggest that the MI-LB acquisition function can significantly enhance active learning strategies in multimodal regression tasks, potentially leading to more efficient data collection and improved model performance in real-world applications where multimodal outputs are common.
A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures
Theory
Efficient ML
Time Series
- Introduction of a Schur-stable weight projection method for state-space neural networks.
- Dynamic projection ensures stable dynamics with reduced overparameterization.
- Experimental validation shows comparable performance to state-of-the-art methods.
- Lower weight count enhances training convergence without sacrificing accuracy.
Read more
A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures
Summary
This paper addresses the challenge of building black-box models for dynamical systems while ensuring asymptotic stability. The authors propose a novel weight projection method based on Schur decomposition for linear discrete-time state-space layers, which is compatible with backpropagation. The method dynamically projects the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable counterpart, thereby ensuring stable dynamics with minimal overparameterization. The paper presents an alternative pre-factorized formulation that enhances computational efficiency during model training. Experimental results on synthetic linear systems show that the proposed method achieves accuracy and convergence rates comparable to state-of-the-art techniques, with a slight increase in computational complexity. Additionally, the reduced weight count aids convergence in training without compromising accuracy in stacked neural network architectures with static nonlinearities. The findings suggest that the Schur-based projection offers a robust framework for identifying complex dynamics while adhering to strict stability requirements.
Methodology
The authors developed a weight projection scheme using Schur decomposition to ensure stability in state-space neural networks. They introduced a pre-factorized formulation to improve computational efficiency and conducted experiments on synthetic and real-world datasets to benchmark their methods against existing techniques.
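A simplified version of the projection step can be sketched for the case of real eigenvalues, which sit on the diagonal of the (quasi-)triangular Schur factor; the paper's operator also handles the 2×2 blocks that encode complex-conjugate pairs, which this sketch omits.

```python
def project_triangular_stable(T, rho=0.99):
    # Clip the diagonal of an upper-triangular Schur factor so every
    # eigenvalue magnitude stays strictly below 1 (Schur stability).
    # Simplification: assumes real eigenvalues, i.e. no 2x2 blocks.
    out = [row[:] for row in T]
    for i in range(len(T)):
        lam = out[i][i]
        if abs(lam) >= rho:
            out[i][i] = rho if lam > 0 else -rho
    return out
```

Because only the diagonal moves, an already-stable factor passes through unchanged, matching the "nearest stable counterpart" intent.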
Results
The proposed Schur-based projection method demonstrated accuracy and convergence rates similar to state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. The reduced weight count facilitated better convergence during training in neural network architectures targeting real-world datasets.
Implications
The findings suggest that the Schur-decomposition-based projection method can be effectively utilized in real-time control applications and other domains requiring stable dynamical system modeling. The approach may also inspire further research into efficient neural network architectures that prioritize stability.
TopoPrimer: The Missing Topological Context in Forecasting Models
Time Series
- TopoPrimer encodes the global topological structure of time series data, improving forecasting accuracy.
- The framework uses persistent homology to analyze cross-series correlation, producing a shared persistence landscape vector.
- Spectral sheaf coordinates provide a per-series relational context without requiring extensive training.
- TopoPrimer shows significant performance gains, especially under peak seasonal demand and cold-start conditions.
Read more
TopoPrimer: The Missing Topological Context in Forecasting Models
Summary
This paper introduces TopoPrimer, a novel framework that incorporates the global topological structure of time series data into forecasting models. By leveraging persistent homology and spectral sheaf coordinates, TopoPrimer enhances forecasting accuracy across various domains, stabilizes predictions during seasonal demand spikes, and mitigates the cold-start problem. The framework computes a shared topological context vector for all series in a domain, which is then integrated into forecasting models either as a direct input for fully-trained models or as a lightweight adapter for pre-trained backbones. The results demonstrate significant improvements in forecasting accuracy, particularly in challenging scenarios, showcasing the utility of population-level topological information as a valuable forecasting feature.
Methodology
TopoPrimer employs persistent homology to analyze the cross-series correlation manifold, generating a 125-dimensional persistence landscape vector that captures global clustering and cyclic co-movement. Additionally, it utilizes spectral sheaf theory to derive a 256-dimensional coordinate for each series, reflecting its relational position within the population. These components are combined into a context vector that is integrated into forecasting models, either directly or through a lightweight adapter.
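The persistence-landscape vectorization is standard and compact enough to sketch directly (the grid and birth-death pairs below are hypothetical): each (birth, death) interval contributes a tent function min(t − birth, death − t) clipped at zero, and λ_k(t) takes the k-th largest tent value at t.

```python
def landscape(pairs, k, t):
    # k-th persistence landscape function evaluated at t
    vals = sorted((max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True)
    return vals[k - 1] if k <= len(vals) else 0.0

def landscape_vector(pairs, k, grid):
    # discretized landscape, as used to build fixed-length feature vectors
    return [landscape(pairs, k, t) for t in grid]
```

Evaluating λ_1, …, λ_K on a fixed grid is how a variable-size persistence diagram becomes a fixed-length vector like the 125-dimensional one described above.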
Results
The implementation of TopoPrimer led to a 7.9% reduction in MAE on the Monash Weather dataset with fully-trained models, and a 7.3% reduction in MSE with Chronos and 6.8% with TimesFM on ECL. Under peak seasonal demand, TopoPrimer maintained performance degradation below 10%, while classical models suffered up to 50% degradation. In cold-start scenarios, it achieved a 27% reduction in MAE compared to a baseline without topological context.
Implications
The findings suggest that incorporating topological information can significantly enhance forecasting models, particularly in domains with complex interdependencies among time series. This approach could be beneficial in various applications such as supply chain management, energy forecasting, and traffic prediction, where understanding the relational structure of data is crucial.
Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition
Reinforcement Learning
Optimization
Robotics
- Introduction of a second-order actor-critic method for discounted MDPs.
- Utilization of a two-timescale framework to stabilize second-order updates.
- Efficient computation of Hessian-vector products to enhance convergence.
- Demonstrated effectiveness across multiple benchmark environments.
Read more
Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition
Summary
This paper addresses the challenges of value approximation in reinforcement learning (RL) within the framework of discounted Markov Decision Processes (MDPs). The authors propose a novel second-order actor-critic method that utilizes policy Hessian decomposition to enhance convergence rates. Traditional actor-critic methods rely on first-order updates, which can be inefficient due to their inability to account for curvature information in the optimization landscape. The proposed method leverages a two-timescale actor-critic framework, where the critic evolves faster than the actor, allowing for a quasi-stationary approximation of the action-value function during updates. This approach stabilizes second-order approximations by reducing high-variance components and enables efficient Hessian-vector product computations. The authors validate their method across various benchmark environments, demonstrating improved performance in both discrete and continuous control tasks.
Methodology
The authors analyze second-order approximations for actor updates by employing a two-timescale actor-critic framework. They treat the action-value function as locally constant during updates and utilize Hessian-vector products to compute curvature information efficiently. This structured approach mitigates the high variance typically associated with second-order methods in RL.
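Hessian-vector products can be formed without materializing the Hessian. A central-difference sketch over the gradient function is shown below (exact for quadratics up to float error); in practice autodiff double-backpropagation is the usual route, but the idea is the same.

```python
def hvp(grad, theta, v, eps=1e-5):
    # Hessian-vector product without forming the Hessian:
    # Hv ~= (grad(theta + eps*v) - grad(theta - eps*v)) / (2*eps)
    tp = [t + eps * vi for t, vi in zip(theta, v)]
    tm = [t - eps * vi for t, vi in zip(theta, v)]
    gp, gm = grad(tp), grad(tm)
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]
```

Each product costs two gradient evaluations, which is why curvature information can be used without the O(d²) storage of an explicit Hessian.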
Results
The proposed second-order actor-critic method showed significant improvements in convergence speed and stability compared to traditional first-order methods. The experiments conducted on benchmark environments, including discrete tasks like CartPole and LunarLander, as well as continuous control tasks from MuJoCo, confirmed the effectiveness of the approach.
Implications
The findings suggest that incorporating second-order optimization techniques in RL can lead to more efficient learning algorithms, particularly in complex environments. This could enhance the performance of RL applications in robotics, autonomous systems, and other domains requiring sequential decision-making under uncertainty.
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
Efficient ML
Theory
Optimization
- Introduces a framework to replace Layer Normalization with RMSNorm in DNNs without affecting model predictions.
- Defines 'foldable LNs' and develops a graph-based algorithm for their detection.
- Demonstrates that many LNs in widely used architectures can be replaced, leading to significant inference-time acceleration.
- Shows that the proposed method maintains competitive performance compared to traditional LN, especially in long-sequence tasks.
Read more
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
Summary
This paper addresses the computational inefficiencies of Layer Normalization (LN) in deep neural networks (DNNs) by proposing a framework to replace LN with RMSNorm, which is more efficient due to its lack of centering operations. The authors introduce the concept of 'foldable LNs,' which are LNs that can be replaced by RMSNorm without altering model predictions. They develop a graph-based detection algorithm to identify these foldable LNs and the necessary upstream layers for applying column-based weight centering (CBWC). The analysis reveals that many LNs in popular architectures are foldable, allowing for inference-time conversion that accelerates performance by 2% to 12%. The experiments demonstrate that even when the exact conditions for folding are not met during training, the proposed method remains competitive with traditional LN while enhancing efficiency, particularly in long-sequence tasks.
Methodology
The authors analyze the conditions under which LN's centering operation can be folded into upstream linear layers using the column-centered constraint (CCC) and column-based weight centering (CBWC). They extend this analysis to arbitrary DNNs, defining foldable LNs and developing a graph-based detection algorithm to identify these layers and their corresponding upstream components.
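The folding condition can be checked numerically: if every column of the upstream weight matrix (and the bias) is zero-mean, the pre-activations are zero-mean across features, so LayerNorm's centering is a no-op and it coincides with RMSNorm. A minimal sketch, omitting LN's learnable affine parameters:

```python
import math

def layer_norm(v, eps=1e-6):
    m = sum(v) / len(v)
    var = sum((x - m) ** 2 for x in v) / len(v)
    return [(x - m) / math.sqrt(var + eps) for x in v]

def rms_norm(v, eps=1e-6):
    ms = sum(x * x for x in v) / len(v)
    return [x / math.sqrt(ms + eps) for x in v]

def center_columns(W, b):
    # column-based weight centering (CBWC): subtract each column's mean from W
    # and the mean from b, so W @ x + b is zero-mean over output features
    d, n = len(W), len(W[0])
    col_means = [sum(W[i][j] for i in range(d)) / d for j in range(n)]
    Wc = [[W[i][j] - col_means[j] for j in range(n)] for i in range(d)]
    bm = sum(b) / d
    return Wc, [bi - bm for bi in b]

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 2.0, 3.0]
Wc, bc = center_columns(W, b)
x = [0.5, -0.25]
y = [sum(Wc[i][j] * x[j] for j in range(len(x))) + bc[i] for i in range(len(Wc))]
```

After centering, `layer_norm(y)` and `rms_norm(y)` agree elementwise, which is exactly the equivalence the folding exploits at inference time.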
Results
The proposed method achieves inference-time acceleration of 2% to 12% by replacing foldable LNs with RMSNorm. Experiments across various task families indicate that the method remains effective even when exact equivalence is not fully maintained during training, achieving performance comparable to vanilla LN while improving efficiency.
Implications
This work has significant implications for the design of efficient deep learning architectures, particularly in scenarios where computational resources are limited. The ability to replace LN with RMSNorm without sacrificing performance can lead to faster inference times and lower operational costs in deploying deep learning models.
Uncovering Trajectory and Topological Signatures in Multimodal Pediatric Sleep Embeddings
Multimodal
Time Series
Generative Models
- The study explores the latent trajectory and topological signatures in pediatric sleep embeddings derived from PSG data.
- Augmenting embeddings with geometric features and EHR improves model performance and generalization across apnea-hypopnea index (AHI) strata.
- The research demonstrates that geometric and topological features provide complementary information for sleep event detection.
- The use of persistent homology allows for stable and compact signatures of sleep continuity and fragmentation.
Read more
Uncovering Trajectory and Topological Signatures in Multimodal Pediatric Sleep Embeddings
Summary
This paper investigates the latent structure of multimodal embeddings derived from pediatric polysomnography (PSG) data, focusing on the diagnostic information contained in sequences of 30-second epochs. The authors utilize a multimodal masked autoencoder to generate embeddings and augment these with PHATE-derived coordinates, persistent homology summaries, and electronic health records (EHR) to enhance the interpretability and performance of sleep event detection models. The study employs simple linear and multi-layer perceptron (MLP) models to assess the contributions of geometric, topological, and clinical features. The results indicate that these features provide complementary gains in predictive performance, with notable improvements in area under the precision-recall curve (AUPRC) for various sleep events. The findings highlight the importance of understanding the latent geometry and topology of sleep data, suggesting that these characteristics can improve model calibration and robustness, particularly in imbalanced datasets. Overall, the research presents a novel approach to interpreting generative models in sleep medicine, emphasizing the need for a session-wide analysis of pediatric sleep dynamics.
Methodology
The authors employed a multimodal masked autoencoder to generate embeddings from pediatric PSG data, followed by trajectory analysis using PHATE for per-epoch coordinates and persistent homology for topological features. They integrated EHR data to enhance model interpretability and performance, utilizing linear and MLP models for classification tasks.
Results
The study reported improvements in AUPRC for various sleep events: desaturation (0.26 to 0.34), EEG arousal (0.31 to 0.48), hypopnea (0.09 to 0.22), and apnea (0.05 to 0.14). The full fusion model exhibited the best calibration across all tasks, as measured by Brier score and Expected Calibration Error.
Implications
The findings suggest that incorporating trajectory and topological analyses into pediatric sleep studies can enhance diagnostic accuracy and treatment decisions. This approach may lead to better understanding and management of sleep disorders in children, ultimately improving developmental health outcomes.
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
Reinforcement Learning
Robotics
Optimization
- Introduction of Hybrid Policy Optimization (HPO) framework for hybrid discrete-continuous action spaces.
- HPO utilizes a mixed gradient estimator combining PW and SF gradients for improved credit assignment.
- Empirical results show HPO outperforms PPO, especially in high-dimensional continuous action settings.
- The mixed gradient structure allows for decentralized updates, enhancing efficiency in learning.
Summary
This paper addresses the challenges of reinforcement learning (RL) in hybrid discrete-continuous action spaces, which are common in robotics and control problems. Traditional model-free policy gradient methods, such as REINFORCE and Proximal Policy Optimization (PPO), struggle with credit assignment in high-dimensional settings, leading to poor gradient quality. The authors propose a new framework called Hybrid Policy Optimization (HPO), which combines pathwise (PW) and score-function (SF) gradient estimators to improve gradient estimation while maintaining unbiasedness. HPO backpropagates through a simulator where smoothness allows, thus enhancing credit assignment. The paper also reformulates problems with action discontinuities into a hybrid form to broaden applicability. Empirical results demonstrate that HPO significantly outperforms PPO in inventory control and switched linear-quadratic regulator problems, particularly as the continuous action dimension increases. The mixed gradient structure is characterized, revealing that its cross term diminishes near a discrete best response, allowing for decentralized updates and reduced variance near optimality.
Methodology
The authors developed the Hybrid Policy Optimization (HPO) framework, which employs a structured policy representation that combines a stochastic discrete policy with a deterministic continuous controller. This framework allows for the use of a mixed gradient estimator that blends PW and SF gradients, optimizing gradient estimation in environments with known smoothness.
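The PW/SF blend can be illustrated on a one-dimensional Gaussian policy with a smooth quadratic reward. This is a toy sketch of the general mixed-estimator idea, not the paper's HPO algorithm (which routes each gradient type to the continuous or discrete part of a structured policy); all constants below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, target = 0.0, 0.5, 1.0
n = 200_000

# Reward r(a) = -(a - target)^2 with a ~ N(mu, sigma^2).
# True gradient of E[r] w.r.t. mu is -2 * (mu - target) = 2.
eps = rng.standard_normal(n)
a = mu + sigma * eps                       # reparameterized actions

# Pathwise (PW) estimator: differentiate r(a) through the sampling path.
pw = np.mean(-2.0 * (a - target))          # dr/da * da/dmu, with da/dmu = 1

# Score-function (SF) estimator: grad log pi(a|mu) * (reward - baseline).
r = -(a - target) ** 2
score = (a - mu) / sigma**2                # d log N(a; mu, sigma^2) / d mu
sf = np.mean(score * (r - r.mean()))       # mean-reward baseline cuts variance

# Mixed estimator: a convex combination of two unbiased estimators is unbiased.
alpha = 0.7
mixed = alpha * pw + (1 - alpha) * sf
print(pw, sf, mixed)  # all three are close to the true gradient, 2.0
```

With the same sample budget, the PW estimate is visibly tighter than the SF one, which is the variance gap HPO exploits wherever the simulator is smooth enough to backpropagate through.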
Results
HPO demonstrated substantial performance improvements over PPO in empirical evaluations, particularly in tasks involving high-dimensional continuous actions. The performance gap increased with the continuous action dimension, confirming that the mixed gradient estimator provided a significantly better gradient signal compared to pure SF estimators.
Implications
The HPO framework can be applied to various domains requiring hybrid action spaces, such as robotics, inventory management, and control systems. Its ability to improve gradient estimation and learning efficiency could lead to advancements in real-time decision-making systems and complex control tasks.
Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing
Theory
Optimization
Efficient ML
- Introduction of Shodh-MoE architecture to address negative transfer in multi-physics models.
- Utilization of a physics-informed autoencoder for generating compressed physical latents.
- Implementation of a Top-1 soft-semantic router for dynamic expert assignment based on latent semantics.
- Demonstration of significant improvements in model convergence and performance across distinct physical regimes.
Summary
This paper addresses the challenge of negative transfer in Scientific Machine Learning (SciML) when training universal foundation models on diverse partial differential equation (PDE) regimes. The authors introduce Shodh-MoE, a novel sparse-activated latent transformer architecture designed to mitigate the interference caused by simultaneous training on incompatible physical systems, specifically open-channel fluid dynamics and porous media flows. The architecture employs a physics-informed autoencoder to produce compressed physical latents, ensuring that the decoded states adhere to divergence-free velocity manifolds, thus guaranteeing mass conservation. The model utilizes a Top-1 soft-semantic router to dynamically assign latent patches to specialized expert subnetworks, allowing for tailored parameter paths for different physical mechanisms while maintaining shared experts for universal symmetries. The results demonstrate that Shodh-MoE effectively converges across both physical regimes, achieving low mean squared errors (MSE) for latent and decoded physical states, thereby validating the efficacy of sparse expert routing in overcoming multi-physics interference in neural operators.
Methodology
The Shodh-MoE architecture combines a physics-informed latent autoencoder with a sparse-activated transformer. It employs a routing mechanism that directs latent patches to specialized expert subnetworks, allowing for conditional computation and reducing destructive interference from conflicting physical regimes. The model ensures physical constraints are embedded within the architecture, enhancing conservation properties during inference.
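The Top-1 routing step can be sketched in a few lines of numpy. Token count, model dimension, expert count, and the use of plain linear experts are all illustrative assumptions here, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 16, 8, 4

# Hypothetical latent patches and router/expert parameters.
tokens = rng.normal(size=(n_tokens, d_model))
w_router = rng.normal(size=(d_model, n_experts))
experts = rng.normal(size=(n_experts, d_model, d_model))  # one linear expert each

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Top-1 soft routing: each token goes to its single highest-scoring expert,
# and the expert output is scaled by that expert's softmax gate value.
logits = tokens @ w_router
gates = softmax(logits)
choice = np.argmax(gates, axis=-1)                 # exactly one expert per token

out = np.zeros_like(tokens)
for e in range(n_experts):
    mask = choice == e
    if mask.any():                                 # conditional computation:
        out[mask] = gates[mask, e, None] * (tokens[mask] @ experts[e])

print(out.shape, np.bincount(choice, minlength=n_experts))
```

Because each token activates only one expert's parameters, tokens from incompatible physical regimes can follow disjoint parameter paths, which is the mechanism the paper credits for suppressing destructive interference.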
Results
The Shodh-MoE model achieved latent validation mean squared errors (MSEs) of 2.46 × 10⁻⁵ and 9.76 × 10⁻⁶, and decoded physical MSEs of 2.48 × 10⁻⁶ and 1.76 × 10⁻⁶, for open-channel and porous media flows, respectively. The model demonstrated effective autonomous domain bifurcation, routing tokens to the appropriate expert based on their physical characteristics.
Implications
The findings suggest that sparse expert routing can significantly enhance the performance of multi-physics foundation models, making them more robust to the challenges posed by diverse physical systems. This approach could be applied to various fields requiring the integration of multiple physical phenomena, potentially leading to advancements in scientific simulations and predictive modeling.