AI-generated summaries
Today's ML research,
without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
65 papers today · Updated every 8 hours · 7 days of history
Predicting Trajectories of Long COVID in Adult Women: The Critical Role of Causal Disentanglement
NLP
Large Language Models
Time Series
- Developed an LLM-based causal network model to predict PASC severity in women.
- Achieved 86.7% classification accuracy in predicting clinical severity.
- Successfully differentiated between active pathology symptoms and confounding factors like menopause.
- Utilized wearable data to enhance prediction accuracy and reduce diagnostic ambiguity.
Predicting Trajectories of Long COVID in Adult Women: The Critical Role of Causal Disentanglement
Summary
This study addresses the challenge of predicting the severity of Post-Acute Sequelae of SARS-CoV-2 (PASC) in adult women, particularly in the context of confounding factors such as menopause. The authors conducted a retrospective analysis of 1,155 women using the NIH RECOVER dataset, integrating static clinical profiles with four weeks of longitudinal wearable data to develop a causal network based on a Large Language Model (LLM). The model achieved a classification accuracy of 86.7% in predicting clinical severity, demonstrating its ability to distinguish between active pathology and baseline noise. The research highlights the importance of causal disentanglement in improving diagnostic accuracy and treatment pathways for PASC, especially in populations with overlapping symptoms due to hormonal transitions. The findings suggest that wearable technology can provide valuable real-time physiological data, enhancing the predictive capabilities of machine learning models in clinical settings.
Methodology
The study employed a causal-disentangled architecture integrating historical PASC scores, clinical comorbidities, and wearable data. A multi-seed validation strategy was used to ensure robustness, with performance metrics evaluated across five random seeds. Saliency scores were extracted to assess the model's focus on causal versus confounding features.
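The multi-seed protocol itself is generic and easy to sketch; here `run_fn` is a placeholder for training and evaluating the paper's model, which is not reproduced:

```python
import numpy as np

def multi_seed_metric(run_fn, seeds=(0, 1, 2, 3, 4)):
    """Report mean and std of a scalar metric across random seeds, as in
    the paper's five-seed validation. `run_fn(seed)` is assumed to train
    and evaluate the model, returning one metric value."""
    scores = np.array([run_fn(s) for s in seeds])
    return scores.mean(), scores.std()
```

Metrics such as the reported 0.867 ± 0.044 accuracy are then mean ± std over the seed runs.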
Results
The model demonstrated competitive classification accuracy (0.867 ± 0.044) and superior precision (0.836 ± 0.116) compared to a baseline XGBoost model. Saliency analysis revealed that direct indicators of pathology had maximum saliency scores, while confounding factors were effectively suppressed, indicating the model's capability to prioritize clinically relevant signals.
Implications
The findings suggest that integrating causal disentanglement in predictive models can significantly improve the accuracy of clinical assessments for conditions like PASC. This approach may lead to better diagnostic frameworks and treatment strategies tailored to women's health, particularly during hormonal transitions.
FoMo X: Modular Explainability Signals for Outlier Detection Foundation Models
Interpretability
- FoMo-X enhances the explainability of outlier detection models by integrating modular diagnostic heads.
- The framework leverages frozen embeddings from pretrained PFNs to provide efficient, context-aware diagnostics.
- Two diagnostic heads are introduced: one for severity assessment and another for uncertainty estimation.
- Extensive evaluations show high fidelity in recovering diagnostic signals with negligible inference cost.
FoMo X: Modular Explainability Signals for Outlier Detection Foundation Models
Summary
The paper introduces FoMo-X, a modular framework designed to enhance the explainability of outlier detection (OD) foundation models, specifically Prior-Data Fitted Networks (PFNs). While PFNs have shown significant promise in unsupervised zero-shot adaptation for OD, they typically operate as opaque black boxes, providing only scalar outlier scores without contextual explanations necessary for safety-critical applications. FoMo-X addresses this limitation by integrating lightweight diagnostic capabilities into PFNs. The framework utilizes frozen embeddings from a pretrained PFN backbone, attaching auxiliary diagnostic heads that are trained offline. These heads distill complex properties, such as epistemic uncertainty, into efficient, single-pass inferences. The authors instantiate FoMo-X with two heads: a Severity Head that categorizes deviations into interpretable risk tiers, and an Uncertainty Head that offers calibrated confidence measures. Extensive evaluations on synthetic and real-world benchmarks demonstrate that FoMo-X effectively recovers ground-truth diagnostic signals with minimal inference overhead, bridging the gap between model performance and operational explainability. This work represents a scalable approach toward trustworthy zero-shot outlier detection.
Methodology
FoMo-X employs a modular framework that attaches auxiliary diagnostic heads to the frozen embeddings of a pretrained PFN. These heads are trained offline using a generative simulator prior, allowing for the extraction of operationally meaningful signals without modifying the underlying detector. The Severity Head discretizes deviations into risk tiers, while the Uncertainty Head provides calibrated confidence measures, both derived from the context-conditioned embeddings.
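The paper's heads and simulator prior are not detailed in this summary, but the underlying pattern, training a cheap head on frozen backbone embeddings, can be sketched with a toy ridge-regression probe standing in for a diagnostic head:

```python
import numpy as np

def train_linear_head(frozen_embeddings, targets, l2=1e-3):
    """Fit a ridge-regression head on frozen backbone embeddings.

    The backbone is never updated: diagnostic signals come from a cheap,
    separately trained head, which is the FoMo-X pattern. The linear
    head here is illustrative, not the paper's Severity/Uncertainty Head."""
    X = np.hstack([frozen_embeddings, np.ones((len(frozen_embeddings), 1))])
    w = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ targets)
    return w

def head_predict(frozen_embeddings, w):
    # Single extra matrix product per input: negligible inference cost.
    X = np.hstack([frozen_embeddings, np.ones((len(frozen_embeddings), 1))])
    return X @ w
```

Because the head is a small map on precomputed embeddings, its inference cost is a single extra matrix product, consistent with the "negligible overhead" claim.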
Results
The evaluation of FoMo-X on synthetic and real-world benchmarks indicates that it successfully recovers ground-truth diagnostic signals with high fidelity and minimal additional computational overhead. The framework demonstrates its potential to enhance the interpretability of OD models while maintaining their performance.
Implications
FoMo-X offers a scalable solution for integrating explainability into outlier detection systems, particularly in safety-critical domains such as healthcare and cybersecurity. By providing actionable diagnostic signals, it supports better decision-making and trust calibration in operational settings.
CLeAN: Continual Learning Adaptive Normalization in Dynamic Environments
Theory
Optimization
Efficient ML
- CLeAN addresses the limitations of traditional normalization methods in continual learning contexts.
- The technique employs learnable parameters updated via Exponential Moving Average (EMA) for adaptive normalization.
- CLeAN improves model performance on new data while reducing catastrophic forgetting.
- The study emphasizes the critical role of adaptive normalization in dynamic environments.
CLeAN: Continual Learning Adaptive Normalization in Dynamic Environments
Summary
The paper introduces Continual Learning Adaptive Normalization (CLeAN), a novel technique aimed at addressing the challenges of data normalization in continual learning scenarios, particularly in dynamic environments where data distributions frequently shift. Traditional normalization methods, which assume access to the entire dataset, are inadequate for continual learning, where data is presented sequentially. CLeAN utilizes learnable parameters updated through an Exponential Moving Average (EMA) to estimate global feature scales, allowing models to adapt to evolving data distributions. The authors conduct comprehensive evaluations across two datasets and various continual learning strategies, including Reservoir Experience Replay, A-GEM, and EWC (Elastic Weight Consolidation). The results demonstrate that CLeAN not only enhances model performance on new data but also effectively mitigates catastrophic forgetting, highlighting the significance of adaptive normalization in maintaining stability and effectiveness in tabular data learning. This work provides a new perspective on normalization's role in preserving knowledge in dynamic learning environments.
Methodology
The authors developed CLeAN, which estimates global feature scales using learnable parameters updated through an Exponential Moving Average (EMA). They conducted experiments on two datasets employing various continual learning strategies, including Reservoir Experience Replay, A-GEM, and EWC, to evaluate the performance of CLeAN compared to traditional normalization methods.
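CLeAN's exact parameterization is not reproduced here; a minimal sketch of EMA-based adaptive normalization for a sequential stream of tabular batches might look like:

```python
import numpy as np

class EMANormalizer:
    """Adaptive normalization with Exponential-Moving-Average statistics.

    Illustrative sketch only: CLeAN's learnable parameters and update
    schedule may differ from this fixed-momentum running estimate."""

    def __init__(self, n_features, momentum=0.99, eps=1e-8):
        self.momentum = momentum
        self.eps = eps
        self.mean = np.zeros(n_features)
        self.var = np.ones(n_features)

    def update(self, batch):
        # Blend the current batch's statistics into the running estimates,
        # so the scales track a drifting distribution without full-dataset access.
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * batch.mean(axis=0)
        self.var = m * self.var + (1 - m) * batch.var(axis=0)

    def normalize(self, batch):
        return (batch - self.mean) / np.sqrt(self.var + self.eps)
```

In a continual-learning loop one would call `update` on each incoming batch before normalizing it, letting the statistics follow the shifting distribution.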
Results
The evaluations showed that CLeAN significantly improved model performance on new data and effectively mitigated catastrophic forgetting. The results underscored the importance of adaptive normalization in enhancing the stability and effectiveness of models in dynamic environments.
Implications
CLeAN has potential applications in fields such as cybersecurity, autonomous transportation, and finance, where data distributions are subject to continuous change. By improving the adaptability of models to evolving data, CLeAN can enhance the reliability and performance of AI systems in real-world scenarios.
Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition
Reinforcement Learning
Time Series
Generative Models
- Introduces CTFG, a novel framework for feature extraction in HAR that addresses cross-user variability.
- Utilizes a Transformer-based autoregressive generator for sequential feature token generation.
- Employs Group-Relative Policy Optimization to optimize feature generation without a critic.
- Achieves state-of-the-art accuracy on benchmark datasets while reducing training variance.
Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition
Summary
This paper addresses the challenge of cross-user variability in Human Activity Recognition (HAR) using wearable inertial sensors, which is crucial for applications in healthcare and fitness analytics. Traditional domain generalization methods often overlook temporal dependencies in sensor data or require impractical target-domain annotations. The authors propose a novel framework called Collaborative Temporal Feature Generation (CTFG), which utilizes a Transformer-based autoregressive generator to incrementally create feature token sequences based on prior context and encoded sensor input. The generator is optimized using Group-Relative Policy Optimization (GRPO), a critic-free reinforcement learning algorithm that evaluates generated sequences against alternatives from the same input, thus avoiding biases associated with critic-based methods. The optimization process incorporates a tri-objective reward system focusing on class discrimination, cross-user invariance, and temporal fidelity. Evaluations on benchmark datasets DSADS and PAMAP2 demonstrate that CTFG achieves state-of-the-art cross-user accuracy, significantly reduces inter-task training variance, accelerates convergence, and exhibits robust generalization across varying action-space dimensionalities.
Methodology
The methodology involves modeling feature extraction as a collaborative sequential generation process governed by reinforcement learning. The CTFG framework employs a Transformer-based autoregressive generator that constructs feature token sequences incrementally. The optimization is performed using Group-Relative Policy Optimization, which evaluates sequences based on a tri-objective reward system focused on class discrimination, cross-user invariance, and temporal fidelity.
Results
CTFG achieved cross-user accuracy rates of 88.53% on the DSADS dataset and 75.22% on the PAMAP2 dataset. The framework demonstrated a significant reduction in inter-task training variance, faster convergence rates, and robust generalization across different action-space dimensionalities.
Implications
The proposed framework has significant implications for improving the reliability and accuracy of HAR systems in real-world applications, particularly in healthcare monitoring and fitness analytics, by effectively addressing the challenges posed by cross-user variability.
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
Reinforcement Learning
Optimization
Multimodal
- Introduction of the Log-Fidelity Modulator (LFM) for stable gradient optimization.
- Implementation of Decoupled Hazard Penalty (DHP) for independent regulation of policy shifts.
- Demonstrated superior performance and stability in RL training across diverse benchmarks.
- Mitigation of risks associated with extreme policy shifts and high-variance outlier tokens.
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
Summary
The paper introduces Modulated Hazard-aware Policy Optimization (MHPO), a novel framework aimed at enhancing the stability of reinforcement learning (RL) training processes, particularly in Group Relative Policy Optimization (GRPO) contexts. Traditional methods for controlling importance ratios, such as hard clipping, often lead to non-differentiable boundaries and gradient issues, which can destabilize training. MHPO addresses these challenges by incorporating a Log-Fidelity Modulator (LFM) that maps unbounded importance ratios into a bounded, differentiable domain, thus preventing high-variance outlier tokens from destabilizing the optimization landscape. Additionally, the framework employs a Decoupled Hazard Penalty (DHP) that utilizes cumulative hazard functions from survival analysis to regulate positive and negative policy shifts independently. This dual regulation allows for fine-grained control over policy shifts, mitigating risks of mode collapse and policy erosion. The proposed methodology is evaluated across various reasoning benchmarks in both text-based and vision-language tasks, demonstrating that MHPO consistently outperforms existing methods, achieving superior performance and enhanced training stability.
Methodology
The MHPO framework consists of two main components: the Log-Fidelity Modulator (LFM), which uses a scaled tanh transformation to ensure bounded, differentiable importance ratios, and the Decoupled Hazard Penalty (DHP), which employs cumulative hazard functions to regulate policy shifts. This combination allows for stable optimization while maintaining high fidelity in gradient calculations.
Results
Extensive evaluations show that MHPO significantly outperforms baseline methods in terms of performance gains and training stability. The framework maintains lower gradient spikes and achieves higher overall rewards earlier in the training process compared to existing approaches.
Implications
The findings suggest that MHPO can be effectively utilized in various RL applications, particularly those requiring stable training in complex environments. Its hazard-aware mechanisms may also inspire future research in policy optimization and reinforcement learning stability.
Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies
Reinforcement Learning
Theory
Optimization
- Introduction of a benchmarking framework for RL based on stochastic converse optimality.
- Systematic generation of environments with known optimal policies for rigorous evaluation.
- Validation through diverse environments and assessment of standard RL methods against ground-truth optima.
- Provision of absolute metrics for performance evaluation, enhancing reproducibility in RL research.
Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies
Summary
This paper addresses the challenges of benchmarking Reinforcement Learning (RL) algorithms, which are often sensitive to environmental design and stochasticity. The authors propose a new benchmarking framework that extends the concept of converse optimality to discrete-time, control-affine, nonlinear systems with noise. This framework allows for the systematic generation of benchmark families with known optimal policies, enabling a more rigorous evaluation of RL algorithms. The authors validate their approach by automatically constructing diverse environments and comparing standard RL methods against a ground-truth optimum. The framework provides necessary and sufficient conditions for optimality, facilitating a controlled evaluation of algorithms. The results demonstrate that the proposed method allows for absolute metrics such as optimality gaps and regret to be reported against certified optima, thus offering a reproducible foundation for RL benchmarking.
Methodology
The authors extend the concept of converse optimality to stochastic, discrete-time, control-affine, nonlinear systems. They derive necessary and sufficient conditions for optimality and develop a framework for generating benchmark environments through homotopy variations and randomized parameters. A paired evaluation protocol using Common Random Numbers is employed to ensure fair comparisons.
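Paired evaluation with Common Random Numbers means every algorithm sees identical noise draws, so return differences reflect the policies rather than sampling luck. A toy sketch on a 1-D control-affine system (the paper's generated benchmarks are far more general):

```python
import numpy as np

def paired_rollout_returns(policies, noise_seed, horizon=50):
    """Evaluate each policy on the *same* noise sequence (Common Random
    Numbers) on a toy 1-D control-affine system x' = x + u + w.

    Illustrative only; the benchmark families in the paper are generated
    systems with certified optimal policies, not this toy dynamics."""
    returns = []
    for policy in policies:
        rng = np.random.default_rng(noise_seed)  # identical draws per policy
        x, total = 1.0, 0.0
        for _ in range(horizon):
            u = policy(x)
            total -= x * x + u * u          # quadratic cost as negative reward
            x = x + u + 0.1 * rng.normal()  # shared process noise
        returns.append(total)
    return returns
```

With a certified optimum available, the same paired rollouts yield absolute metrics: the optimality gap is the optimum's return minus the candidate's return on identical noise.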
Results
The proposed framework successfully generates diverse environments with known optimal policies, allowing for a comprehensive evaluation of RL algorithms. The results indicate that standard methods can be assessed against certified optima, providing absolute performance metrics such as optimality gaps and regret.
Implications
This work has significant implications for the field of RL by providing a reproducible and rigorous benchmarking methodology. It can help researchers better understand the performance of various RL algorithms and facilitate the development of more effective learning strategies in complex environments.
Classifier Pooling for Modern Ordinal Classification
Theory
Efficient ML
- Introduces a model-agnostic approach for ordinal classification using any non-ordinal classifier.
- Develops two algorithms: DifferenceOrdinalClassifier for cumulative classification and TreeOrdinalClassifier for hierarchical classification.
- Provides an open-source Python package 'statlab' for easy implementation of the proposed methods.
- Demonstrates superior performance of the proposed methods over traditional non-ordinal classifiers in various datasets.
Classifier Pooling for Modern Ordinal Classification
Summary
This paper addresses the challenges of ordinal classification, which is prevalent in various fields such as clinical data analysis and surveys. The authors propose a model-agnostic method that allows any non-ordinal classification algorithm to be adapted for ordinal tasks. They introduce two specific algorithms: DifferenceOrdinalClassifier and TreeOrdinalClassifier, which utilize cumulative and hierarchical approaches, respectively. The paper also presents an open-source Python package named 'statlab' that implements these algorithms, making them accessible for practical use. Through experiments on multiple real-world datasets, the authors demonstrate that their methods often outperform traditional non-ordinal classification techniques, particularly in scenarios with smaller datasets or numerous outcome classes. This work not only enhances the capabilities of machine learning in handling ordinal data but also provides a valuable software tool for researchers and practitioners.
Methodology
The authors developed two object-oriented classes in Python that allow for model-agnostic ordinal classification. The DifferenceOrdinalClassifier performs cumulative ordinal classification, while the TreeOrdinalClassifier implements hierarchical ordinal classification. Both methods involve training classifiers on binary outcomes corresponding to adjacent classes and estimating probabilities based on these classifiers. The implementation is encapsulated in the 'statlab' package, which is compatible with sklearn-style classifiers.
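The cumulative paradigm (one binary model per threshold estimating P(y > k), with class probabilities as differences of adjacent cumulative probabilities) can be sketched independently of 'statlab'; the centroid-based base learner below is a toy stand-in for the sklearn-style classifiers the package actually wraps:

```python
import numpy as np

class CentroidBinary:
    """Toy base learner: sigmoid of the signed squared-distance gap
    between the two class centroids. Stand-in for any sklearn-style
    binary classifier with probability outputs."""
    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self
    def predict_proba1(self, X):
        d = ((X - self.c0) ** 2).sum(axis=1) - ((X - self.c1) ** 2).sum(axis=1)
        return 1.0 / (1.0 + np.exp(-d))

class CumulativeOrdinal:
    """Cumulative ordinal classification: one binary model per threshold
    estimates P(y > k); class probabilities are differences of adjacent
    cumulative probabilities. Sketch of the paradigm, not statlab's API."""
    def __init__(self, base_factory):
        self.base_factory = base_factory
    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        self.models_ = [self.base_factory().fit(X, (y > k).astype(int))
                        for k in self.classes_[:-1]]
        return self
    def predict(self, X):
        # P(y > k) per threshold, clipped to be monotone non-increasing in k.
        gt = np.stack([m.predict_proba1(X) for m in self.models_], axis=1)
        gt = np.minimum.accumulate(gt, axis=1)
        cum = np.hstack([np.ones((len(X), 1)), gt, np.zeros((len(X), 1))])
        probs = cum[:, :-1] - cum[:, 1:]
        return self.classes_[np.argmax(probs, axis=1)]
```

Any base learner exposing fit/probability methods slots in, which is what makes the approach model-agnostic.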
Results
The experiments conducted on real-world datasets showed that the proposed ordinal classification methods often outperformed traditional non-ordinal classification approaches, especially when the dataset size was small or when there were many classes. The results indicate that the cumulative and hierarchical paradigms effectively leverage the ordinal nature of the data.
Implications
The findings suggest that modern machine learning algorithms can be effectively utilized for ordinal classification tasks, which have been historically underserved. The availability of the 'statlab' package facilitates broader adoption of these methods in various fields, including healthcare and social sciences, where ordinal data is common.
RaDAR: Relation-aware Diffusion-Asymmetric Graph Contrastive Learning for Recommendation
Graph Learning
- RaDAR addresses structural semantics degradation and limited relational expressiveness in recommendation systems.
- The framework employs a dual-view generation architecture combining graph generative and denoising models.
- Innovations include asymmetric contrastive learning and diffusion-guided augmentation for enhanced robustness.
- RaDAR outperforms existing methods on multiple benchmarks, especially under high noise and sparsity.
RaDAR: Relation-aware Diffusion-Asymmetric Graph Contrastive Learning for Recommendation
Summary
The paper introduces RaDAR, a novel framework designed to enhance collaborative filtering recommendation systems by addressing two main challenges: structural semantics degradation and limited relational expressiveness. Traditional methods often distort critical structural signals through random edge perturbations, leading to degraded semantic consistency across augmented views. Additionally, data sparsity hampers the propagation of collaborative signals, limiting generalization. RaDAR combines a graph generative model and a relation-aware denoising model to generate two complementary views. Key innovations include asymmetric contrastive learning with global negative sampling to maintain semantic alignment, diffusion-guided augmentation for robustness through progressive noise injection and denoising, and relation-aware edge refinement that dynamically adjusts edge weights based on latent node semantics. Extensive experiments on various public benchmarks demonstrate that RaDAR consistently outperforms state-of-the-art methods, particularly in noisy and sparse conditions, showcasing its effectiveness in improving recommendation accuracy.
Methodology
RaDAR employs a dual-view generation architecture that integrates a graph generative model based on variational autoencoders and a relation-aware denoising model. It utilizes asymmetric contrastive learning with global negative sampling and a diffusion-guided augmentation strategy that applies Gaussian noise to node representations, maintaining semantic integrity while generating robust graph views.
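The noise-injection half of diffusion-guided augmentation follows the standard DDPM-style forward process; a sketch on node embeddings, with an illustrative (not the paper's) variance schedule:

```python
import numpy as np

def diffuse_embeddings(z, t, betas, rng):
    """Forward diffusion step t on node embeddings z: blend toward
    Gaussian noise per the closed-form q(z_t | z_0) of a DDPM-style
    variance schedule.

    Sketch of the augmentation idea only; RaDAR's schedule, denoiser,
    and relation-aware conditioning are not reproduced here."""
    alphas = np.cumprod(1.0 - np.asarray(betas))
    a_t = alphas[t]
    noise = rng.normal(size=z.shape)
    return np.sqrt(a_t) * z + np.sqrt(1.0 - a_t) * noise
```

Progressively larger `t` produces progressively noisier augmented views; the denoising model is then trained to recover the clean embeddings, yielding the robustness the paper reports.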
Results
The experimental results indicate that RaDAR consistently outperforms state-of-the-art recommendation methods across multiple datasets, including Last.FM, Yelp, and Tmall, particularly excelling in scenarios characterized by high data sparsity and noise.
Implications
RaDAR's approach can significantly enhance the performance of recommendation systems in real-world applications, particularly in domains where data is sparse or noisy. Its ability to capture complex relational patterns may lead to more personalized and accurate recommendations.
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data
Large Language Models
Theory
Efficient ML
- Specialized pretraining (SPT) improves domain performance while preserving general capabilities.
- SPT reduces the pretraining tokens needed to achieve a given domain performance by up to 1.75×.
- Incorporating domain data early in training is more effective than reserving it for finetuning.
- SPT outperforms traditional finetuning approaches, especially in underrepresented domains.
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data
Summary
This paper explores the concept of 'specialized pretraining' (SPT), a strategy that integrates domain-specific data into the pretraining phase of model training, rather than reserving it solely for finetuning. The authors argue that traditional approaches, which treat pretraining and finetuning as separate stages, may not be optimal, especially when the target domain is underrepresented in the pretraining corpus. By interleaving domain data throughout pretraining, SPT allows models to better retain general knowledge and achieve improved performance on specialized tasks. The study evaluates SPT across three domains: ChemPile, MusicPile, and ProofPile, demonstrating that it not only enhances domain-specific performance but also reduces the amount of pretraining tokens needed to achieve desired results. The findings suggest that incorporating domain data early in the training process can mitigate overfitting and enhance generalization, ultimately leading to better model performance with fewer parameters and less computational cost. The authors also derive scaling laws to guide practitioners in optimizing the use of domain data during pretraining.
Methodology
The authors conducted experiments using specialized pretraining (SPT), where a small domain dataset was mixed into the pretraining phase as a fraction of the total tokens, repeated multiple times. They compared the performance of models trained with SPT against those trained with standard pretraining followed by finetuning across three specialized domains. The study also included an analysis of overfitting scaling laws to understand the impact of domain data repetition during training.
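The mixing recipe can be sketched as a stream transformer that interleaves (repeated) domain documents into the general corpus at roughly a target fraction; the fraction below is illustrative, not the paper's tuned value:

```python
import random

def specialized_pretraining_stream(general_docs, domain_docs,
                                   domain_frac=0.05, seed=0):
    """Yield a pretraining stream with roughly `domain_frac` of examples
    drawn from the small domain set (sampled with replacement, i.e.
    repeated), the rest from the general corpus.

    Sketch of the SPT mixing idea, not the paper's exact scheduler."""
    rng = random.Random(seed)
    for doc in general_docs:
        if rng.random() < domain_frac:
            yield rng.choice(domain_docs)  # interleave a repeated domain doc
        yield doc
```

The contrast with standard practice is only *when* the domain data is seen: here it is spread across pretraining rather than concentrated in a final finetuning phase.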
Results
The results showed that models trained with SPT achieved lower domain test loss and retained general knowledge more effectively during finetuning compared to standard approaches. SPT models required significantly less pretraining compute to reach the same performance levels, with notable improvements in downstream task accuracy. For domains poorly represented in the pretraining corpus, SPT demonstrated superior performance, with a 1B model outperforming a 3B standard pretrained model.
Implications
The findings suggest that practitioners should reconsider their training strategies, particularly in scenarios with limited domain data. By adopting SPT, organizations can achieve better model performance with reduced computational resources, making it a valuable approach for deploying models in specialized domains.
RangeAD: Fast On-Model Anomaly Detection
Efficient ML
Theory
- Introduction of the On-Model AD framework for anomaly detection.
- Development of RangeAD, which uses internal neural activation ranges for real-time anomaly detection.
- Demonstration of superior performance in high-dimensional tasks with lower inference costs.
- Comprehensive ablation study validating the efficacy of the proposed method.
RangeAD: Fast On-Model Anomaly Detection
Summary
The paper introduces RangeAD, a novel approach to anomaly detection (AD) that operates within the On-Model AD framework, which leverages the information encoded in a primary machine learning model to enhance anomaly detection efficiency. Traditional anomaly detection methods often run as separate systems, leading to increased computational overhead and potential misalignment with the primary model's operational domain. RangeAD addresses these issues by utilizing neuron-wise output ranges from the primary model to derive anomaly scores, allowing for real-time detection with minimal additional computational cost. The authors demonstrate that RangeAD outperforms existing methods, particularly in high-dimensional tasks, while also providing a comprehensive ablation study to validate their design choices. The proposed method not only enhances detection accuracy but also aligns closely with the specific goals of the application, making it a practical solution for real-world deployment.
Methodology
RangeAD leverages the activation ranges of neurons from a trained primary model to compute anomaly scores during the model's forward pass. This approach eliminates the need for a separate anomaly detection model, thus reducing computational overhead and aligning detection with the primary model's objectives.
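The mechanism is easy to sketch: record per-neuron activation ranges on normal data during calibration, then score new inputs by their total out-of-range excess. RangeAD's actual score may be weighted or layer-aggregated differently:

```python
import numpy as np

class RangeMonitor:
    """Track per-neuron [min, max] activation ranges seen on normal data,
    then score new activations by how far they leave those ranges.

    Sketch of the on-model idea only; it reuses activations the primary
    model already computes, so no separate detector runs."""

    def __init__(self, n_neurons):
        self.lo = np.full(n_neurons, np.inf)
        self.hi = np.full(n_neurons, -np.inf)

    def calibrate(self, activations):
        # activations: (batch, n_neurons) from a layer of the primary model.
        self.lo = np.minimum(self.lo, activations.min(axis=0))
        self.hi = np.maximum(self.hi, activations.max(axis=0))

    def score(self, activations):
        below = np.clip(self.lo - activations, 0.0, None)
        above = np.clip(activations - self.hi, 0.0, None)
        return (below + above).sum(axis=1)  # ~0 for in-range inputs
```

Since scoring is a clip-and-sum over activations from the forward pass already being run, the added inference cost is essentially free.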
Results
The results indicate that RangeAD achieves higher accuracy in detecting anomalies compared to traditional methods, particularly in high-dimensional datasets, while maintaining low inference costs. The ablation study confirms the effectiveness of the design choices made in the development of the algorithm.
Implications
RangeAD has significant implications for real-time anomaly detection in various applications, including fraud detection, industrial monitoring, and healthcare systems, where efficient and accurate anomaly detection is critical for system reliability and safety.
TimeAPN: Adaptive Amplitude-Phase Non-Stationarity Normalization for Time Series Forecasting
Time Series
- TimeAPN addresses non-stationarity in time series forecasting by modeling amplitude and phase changes.
- The framework utilizes discrete wavelet transform for frequency domain analysis.
- It incorporates adaptive normalization mechanisms to handle abrupt fluctuations in signal energy.
- TimeAPN is model-agnostic, allowing integration with various forecasting backbones.
TimeAPN: Adaptive Amplitude-Phase Non-Stationarity Normalization for Time Series Forecasting
Summary
The paper addresses the challenge of non-stationarity in multivariate long-term time series forecasting, which is characterized by rapid changes in amplitude and phase that can degrade predictive performance. Existing normalization methods often rely on first- and second-order statistics, which overlook fine-grained temporal dynamics. To overcome these limitations, the authors propose TimeAPN, an Adaptive Amplitude-Phase Non-Stationarity Normalization framework that models and predicts non-stationary factors in both time and frequency domains. TimeAPN employs discrete wavelet transform (DWT) to decompose time series into frequency components, estimating the mean sequence adaptively for energy compensation. It captures phase discrepancies and incorporates an adaptive normalization mechanism for amplitude variations. The framework is model-agnostic and integrates seamlessly with various forecasting models. Extensive experiments on seven real-world datasets demonstrate that TimeAPN consistently enhances long-term forecasting accuracy across multiple horizons and outperforms existing reversible normalization methods.
Methodology
TimeAPN employs a dual-domain approach, utilizing discrete wavelet transform to decompose time series into frequency components. It models the mean sequence adaptively in both time and frequency domains, captures phase discrepancies, and integrates amplitude information through an adaptive normalization mechanism. The predicted non-stationary factors are combined with the backbone forecasting outputs via a collaborative de-normalization process.
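TimeAPN's frequency-domain analysis rests on the discrete wavelet transform; a minimal single-level Haar DWT with perfect reconstruction shows the decomposition involved (TimeAPN's adaptive estimation on the resulting coefficients is not reproduced):

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar wavelet transform of an even-length series:
    returns (approximation, detail) coefficients, splitting the signal
    into coarse energy and fine fluctuations."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_idwt(a, d):
    """Inverse single-level Haar transform (perfect reconstruction),
    analogous to the de-normalization direction of a reversible scheme."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x
```

Perfect invertibility is what makes wavelet-domain normalization reversible: factors estimated on the coefficients can be undone exactly after forecasting.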
Results
The proposed TimeAPN framework consistently improves long-term forecasting accuracy across multiple prediction horizons on seven real-world multivariate datasets. It outperforms existing state-of-the-art reversible normalization methods, demonstrating lower prediction errors in both time and frequency domains.
Implications
TimeAPN has potential applications in various fields requiring accurate long-term time series forecasting, such as finance, energy management, traffic control, and weather prediction. Its model-agnostic nature allows for broad applicability across different forecasting architectures.
Transition Flow Matching
Generative Models
- Introduction of Transition Flow Matching for efficient few-step generative modeling.
- Derivation of the Transition Flow Identity and a new training objective for generative models.
- Establishment of a unified theoretical perspective connecting Transition Flow Matching with Mean Velocity models.
- Demonstration of competitive performance in image generation benchmarks.
Transition Flow Matching
Summary
This paper introduces a novel framework called Transition Flow Matching for generative modeling, which addresses the limitations of traditional flow matching methods that focus on learning local velocity fields. These conventional methods require multiple integration steps during generation, which can be inefficient. In contrast, Transition Flow Matching directly learns the transition flow as a global quantity, allowing for single-step generation or generation at arbitrary future time points. The paper establishes a theoretical foundation by deriving the Transition Flow Identity and proposing a training objective that enables end-to-end learning from scratch. This approach not only generalizes previous models but also clarifies the relationship with Mean Velocity Flow models. Extensive experiments demonstrate the effectiveness of the proposed method across various image generation tasks, showcasing competitive performance and providing insights through ablation studies on key design choices.
Methodology
The methodology involves deriving the Transition Flow Identity and formulating a training objective that allows generative models to learn transition flows directly. This approach contrasts with traditional methods that regress local velocity fields, instead modeling the generation trajectory itself. The framework supports arbitrary step sizes and numbers of steps, facilitating efficient generation.
Results
The experiments conducted validate the proposed method's effectiveness, showing that Transition Flow Matching achieves competitive performance in image generation tasks across multiple datasets. The results indicate that the new framework can generate high-quality outputs while simplifying the generative process.
Implications
The implications of this work suggest that Transition Flow Matching can significantly enhance the efficiency of generative modeling, particularly in applications requiring rapid generation of complex data distributions. This could be beneficial in fields such as computer vision, where quick and accurate image generation is crucial.
Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control
Reinforcement Learning
Large Language Models
Robotics
- GuidedSAC leverages LLMs for action-level guidance in reinforcement learning.
- The algorithm maintains convergence guarantees of the original SAC while enhancing speed.
- GuidedSAC outperforms standard SAC and advanced exploration methods in various tasks.
- The approach addresses the inefficiencies of exploration in vast state-action spaces.
Summary
This paper introduces GuidedSAC, a novel reinforcement learning (RL) algorithm designed to enhance exploration efficiency in large state-action spaces. GuidedSAC integrates large language models (LLMs) as intelligent supervisors that provide action-level guidance to the Soft Actor-Critic (SAC) algorithm. By analyzing recent trajectories and utilizing visual replays, the LLM-based supervisor offers targeted interventions that facilitate exploration of high-value states. The authors provide a theoretical framework demonstrating that GuidedSAC maintains the convergence guarantees of SAC while improving convergence speed. Experimental evaluations across various environments, including discrete toy tasks and complex MuJoCo benchmarks, show that GuidedSAC consistently outperforms standard SAC and other state-of-the-art exploration methods, such as RND, ICM, and E3B, in terms of sample efficiency and overall performance. The paper emphasizes the importance of valuable states in exploration and proposes a real-time supervisory approach to guide RL agents effectively.
Methodology
GuidedSAC builds upon the Soft Actor-Critic (SAC) framework, incorporating LLMs to provide real-time action-level guidance. The LLM analyzes the agent's recent trajectory and generates interventions that help the agent explore high-value states more effectively. The theoretical analysis confirms that this guidance improves sample efficiency without compromising the convergence properties of SAC.
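The supervision loop described above can be sketched as follows. The LLM call is stubbed out, and the decaying intervention probability is an illustrative choice for keeping the base policy's guarantees intact — the paper's actual mechanism may differ:

```python
import random

def supervisor_suggest(trajectory):
    """Stub for the LLM supervisor: a real system would prompt an LLM
    with the recent trajectory (or visual replays) and parse its proposed
    action. The rule below is a hypothetical stand-in."""
    last_state = trajectory[-1]
    return 1 if last_state < 10 else -1  # nudge toward an assumed high-value region

def sac_policy_action(state):
    """Stand-in for the SAC policy's sampled action
    (state-dependent in reality; random here)."""
    return random.choice([-1, 1])

random.seed(0)
state, trajectory = 0, [0]
for step in range(50):
    # Intervene with probability eps; otherwise follow the base policy.
    # Annealing eps to zero leaves SAC's own updates untouched in the limit.
    eps = max(0.0, 0.5 - 0.01 * step)
    if random.random() < eps:
        action = supervisor_suggest(trajectory)
    else:
        action = sac_policy_action(state)
    state += action
    trajectory.append(state)

print(len(trajectory))  # prints 51: initial state plus 50 transitions
```

The key design point is that guidance enters only at the action level, so the critic and actor losses are unchanged — which is what allows the convergence argument to go through.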
Results
Experiments demonstrate that GuidedSAC achieves superior sample efficiency and performance compared to standard SAC and other exploration-enhanced methods across both discrete and continuous control tasks. The results indicate that the integration of LLMs significantly aids in the exploration of valuable states, leading to improved learning outcomes.
Implications
The findings suggest that incorporating LLMs as supervisors in reinforcement learning can substantially enhance exploration strategies, making it feasible to tackle complex robotic tasks and other applications requiring efficient learning in high-dimensional spaces. This approach may pave the way for more effective RL algorithms that leverage external knowledge sources.
Efficient Reasoning on the Edge
NLP
Large Language Models
Efficient ML
- Introduces a lightweight approach for enabling reasoning in small LLMs using LoRA adapters.
- Implements budget forcing via reinforcement learning to minimize verbosity in reasoning outputs.
- Utilizes parallel test-time scaling to improve accuracy without significantly increasing latency.
- Presents a dynamic adapter-switching mechanism to optimize resource usage during inference.
Summary
This paper addresses the challenges of deploying large language models (LLMs) with reasoning capabilities on edge devices, which are constrained by memory, latency, and power consumption. The authors propose a novel framework that utilizes lightweight LoRA (Low-Rank Adaptation) adapters combined with supervised fine-tuning and reinforcement learning to optimize reasoning performance while minimizing resource usage. Key innovations include budget forcing to reduce verbosity in reasoning outputs, parallel test-time scaling to enhance accuracy, and a dynamic adapter-switching mechanism that activates reasoning only when necessary. The proposed methods were evaluated using the Qwen2.5-7B model, demonstrating that efficient reasoning can be achieved under strict resource constraints, making LLMs practical for mobile applications. The results indicate that the framework significantly reduces token generation costs and improves response times, thereby facilitating the deployment of intelligent personal assistants and other reasoning-capable applications on mobile devices.
Methodology
The authors developed an end-to-end pipeline that starts with a base non-reasoning instruct model and enables reasoning through LoRA adapters. The training process involves supervised fine-tuning followed by reinforcement learning to optimize reasoning performance. Key techniques include masked LoRA training for KV-cache sharing and dynamic switching between reasoning and non-reasoning modes based on query context.
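A minimal sketch of the dynamic adapter-switching idea, with a keyword router standing in for whatever gating the paper actually uses; the PEFT-style calls in the comments are indicative only:

```python
def needs_reasoning(query: str) -> bool:
    """Hypothetical router: decide whether to activate the reasoning
    LoRA adapter. A deployed system might use a small classifier or the
    base model itself; this keyword heuristic is purely illustrative."""
    triggers = ("prove", "step by step", "how many", "plan")
    return any(t in query.lower() for t in triggers)

def answer(query: str) -> str:
    if needs_reasoning(query):
        # model.set_adapter("reasoning-lora")  # e.g. via PEFT; base weights stay shared
        return f"[reasoning mode] {query}"
    # model.disable_adapters()                 # cheap non-reasoning path
    return f"[direct mode] {query}"

print(answer("What is the capital of France?"))   # direct mode
print(answer("Plan my trip step by step."))       # reasoning mode
```

Because a LoRA adapter adds only low-rank deltas, switching modes costs almost nothing in memory, and the shared KV-cache trick mentioned above avoids recomputing the prompt when the adapter toggles.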
Results
Experiments with the Qwen2.5-7B model showed that the proposed framework achieved efficient reasoning with reduced response lengths and minimal accuracy loss. The methods led to significant improvements in token generation efficiency and response times, making LLM reasoning feasible for mobile scenarios.
Implications
The findings suggest that LLMs can be effectively deployed on edge devices, enabling advanced reasoning capabilities for applications such as intelligent personal assistants, autonomous task planning, and contextual user interactions, all while maintaining data privacy and reducing latency.
Federated Multi Agent Deep Learning and Neural Networks for Advanced Distributed Sensing in Wireless Networks
Reinforcement Learning
Federated Learning
Graph Learning
- Presents a comprehensive taxonomy of multi-agent deep learning in wireless networks.
- Emphasizes the integration of federated learning with multi-agent systems for privacy-aware intelligence.
- Highlights various application domains including MEC, UAV networks, and intrusion detection.
- Identifies key challenges such as scalability, security, and real-time constraints in 6G deployments.
Summary
This paper surveys the integration of multi-agent deep learning (MADL) and federated learning (FL) in the context of advanced distributed sensing within wireless networks, particularly as they evolve towards 5G-Advanced and 6G systems. The authors present a comprehensive taxonomy that categorizes MADL approaches based on learning formulations, neural architectures, advanced techniques, and application domains. They emphasize the importance of decentralized decision-making in resource-constrained environments, where multi-agent deep reinforcement learning (MADRL) can facilitate scalable and efficient solutions. The paper also discusses the challenges of federated learning in wireless settings, including privacy concerns and communication overhead. The authors provide a critical synthesis of advanced techniques such as hierarchical and over-the-air federated learning, and they highlight various applications, including mobile edge computing (MEC), UAV networks, and intrusion detection. Finally, the paper identifies open issues and future research directions necessary for the realization of 6G-native systems that effectively integrate sensing, communication, and computation.
Methodology
The authors conducted a survey of existing literature and synthesized findings related to multi-agent deep learning and federated learning in wireless networks. They developed a taxonomy to categorize various approaches and techniques, and provided comparative analyses of algorithms and system-level trade-offs.
Results
The survey reveals a rich landscape of MADL and FL techniques applicable to distributed sensing in wireless networks. It identifies critical areas for future research, including the need for scalable solutions, enhanced security measures, and improved real-time performance in the context of 6G systems.
Implications
The findings of this paper have significant implications for the design and deployment of future wireless networks, particularly in enhancing the capabilities of autonomous systems, improving resource management, and ensuring privacy in data-sensitive applications. The integration of advanced learning techniques could lead to more efficient and robust wireless communication systems.
Minimum-Action Learning: Energy-Constrained Symbolic Model Selection for Physical Law Identification from Noisy Data
Optimization
Interpretability
Theory
- MAL effectively identifies physical laws from noisy data by minimizing a Triple-Action functional.
- The wide-stencil acceleration-matching technique reduces noise variance significantly, enabling learnability.
- MAL achieved 100% identification accuracy for the true force law in all tested cases.
- The framework combines symbolic model selection with energy-constrained optimization, enhancing interpretability.
Summary
The paper introduces Minimum-Action Learning (MAL), a novel framework aimed at identifying physical laws from noisy observational data. The approach focuses on selecting symbolic force laws from a predefined library by minimizing a Triple-Action functional that incorporates trajectory reconstruction, architectural sparsity, and energy conservation. A significant innovation is the wide-stencil acceleration-matching technique, which drastically reduces noise variance, transforming an initially intractable problem into a learnable one. The authors demonstrate the effectiveness of MAL through two benchmarks: Kepler's law and Hooke's law. The results show that MAL can accurately recover the correct force law with a high degree of efficiency, achieving a significant reduction in energy consumption compared to traditional methods. The framework not only identifies the true physical laws but also generates interpretable models that exhibit characteristics of evolved biological systems, suggesting a deeper connection between energy optimization in biology and machine learning.
Methodology
MAL is implemented as a differentiable neural network (MinActionNet) that utilizes a Triple-Action objective function to enforce constraints related to information maximization, energy minimization, and symmetry. The model incorporates a Noether Force Basis to parameterize forces and employs bimodal glial-neural optimization to draw structural analogies from biological systems.
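The wide-stencil idea can be isolated on a toy oscillator (the Triple-Action functional and force basis themselves are not reproduced here). A central difference with half-width k estimates acceleration, and for i.i.d. observation noise of variance s², the noise variance of the estimate is 6s²/(k·dt)⁴ — so widening the stencil suppresses noise dramatically:

```python
import numpy as np

def accel_stencil(x, dt, k=1):
    """Central-difference acceleration with stencil half-width k.
    For i.i.d. noise of variance s^2, the estimate's noise variance is
    6*s^2/(k*dt)^4: widening k trades a little truncation bias for a
    large reduction in noise."""
    return (x[2 * k:] - 2 * x[k:-k] + x[:-2 * k]) / (k * dt) ** 2

# Noisy harmonic oscillator: the true acceleration is -x.
dt = 0.01
t = np.arange(0, 10, dt)
x_true = np.cos(t)
x_noisy = x_true + np.random.default_rng(0).normal(scale=1e-3, size=t.size)

a_narrow = accel_stencil(x_noisy, dt, k=1)
a_wide = accel_stencil(x_noisy, dt, k=20)

err_narrow = np.mean((a_narrow - (-x_true[1:-1])) ** 2)
err_wide = np.mean((a_wide - (-x_true[20:-20])) ** 2)
print(err_narrow > err_wide)  # prints True: the wide stencil tracks -x far better
```

With dt = 0.01 and millinoise, the narrow stencil's acceleration estimate is dominated by amplified noise, which is exactly the "initially intractable" regime the summary describes; the wide stencil is what makes the symbolic selection problem learnable.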
Results
MAL successfully recovered the correct force law for Kepler's law with an exponent of p = 3.01 ± 0.01 in 835 seconds and an energy consumption of approximately 0.07 kWh, representing a 40% reduction compared to baseline methods. The raw correct-basis rates were 40% for Kepler and 90% for Hooke, with a 100% identification rate achieved through energy-conservation-based model selection.
Implications
The findings suggest that integrating biological principles of energy optimization into machine learning can enhance the efficiency and interpretability of models used for physical law identification. This approach may have broader applications in scientific discovery and the development of interpretable AI systems.
Sample-Efficient Adaptation of Drug-Response Models to Patient Tumors under Strong Biological Domain Shift
Efficient ML
- Proposes a novel staged transfer-learning framework for drug-response prediction.
- Demonstrates that unsupervised pretraining improves few-shot adaptation to patient tumors.
- Highlights the importance of separating representation learning from task supervision.
- Provides insights into the latent-space geometry affecting adaptation efficiency.
Summary
This paper addresses the challenge of predicting drug responses in patients using preclinical data, highlighting the significant biological differences between in vitro cell lines and patient tumors. The authors propose a staged transfer-learning framework that separates representation learning from task supervision, aiming for sample-efficient adaptation of drug-response models to patient data. The framework utilizes autoencoder-based representation learning to first learn cellular and drug representations from large unlabeled pharmacogenomic datasets. These representations are then aligned with drug-response labels on cell-line data and adapted to patient tumors using few-shot supervision. The study systematically evaluates the framework across various settings, revealing that while unsupervised pretraining offers limited benefits when source and target domains overlap, it significantly enhances adaptation to patient tumors with minimal labeled data. The results indicate that structured and transferable representations can reduce the need for extensive clinical supervision, facilitating the translation of preclinical findings to clinical applications in oncology.
Methodology
The authors developed a staged transfer-learning framework that consists of three main components: unsupervised representation learning using autoencoders on unlabeled pharmacogenomic data, task-specific alignment with drug-response labels on cell-line data, and few-shot adaptation to patient tumors. The framework was evaluated through systematic experiments across in-domain, cross-dataset, and patient-level settings to assess its effectiveness under varying degrees of biological domain shift.
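The three stages can be sketched with linear stand-ins — a least-squares "autoencoder" and linear heads in place of the paper's deep networks, on synthetic data. All names and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: unsupervised representation learning on unlabeled profiles.
# Top principal directions serve as a cheap stand-in for an autoencoder.
def fit_encoder(X, dim=8):
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    W = Vt[:dim].T
    return lambda Z: (Z - X.mean(0)) @ W

X_unlabeled = rng.normal(size=(500, 50))   # synthetic pharmacogenomic profiles
encode = fit_encoder(X_unlabeled)

# Stage 2: supervised alignment on labeled cell-line drug responses.
X_cell, y_cell = rng.normal(size=(200, 50)), rng.normal(size=200)
H = encode(X_cell)
w, *_ = np.linalg.lstsq(H, y_cell, rcond=None)   # linear response head

# Stage 3: few-shot adaptation on a handful of labeled patient tumors.
# Only the head is corrected; the representation is frozen.
X_pat, y_pat = rng.normal(size=(10, 50)), rng.normal(size=10)
H_pat = encode(X_pat)
w_adapted = w + np.linalg.lstsq(H_pat, y_pat - H_pat @ w, rcond=None)[0]
```

The separation is the point: stage 1 never sees labels, so its representation is not overfit to the cell-line task, which is what makes the stage-3 correction sample-efficient under domain shift.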
Results
The proposed framework achieved faster performance improvements during few-shot adaptation to patient tumors while maintaining comparable accuracy to traditional single-phase baselines on cell-line benchmarks. The systematic evaluation revealed that unsupervised pretraining is particularly beneficial when adapting to patient tumors with limited labeled data, effectively reducing the number of required labeled samples for successful transfer.
Implications
The findings suggest that the proposed framework can significantly enhance the practical application of drug-response prediction models in clinical settings, particularly in scenarios where labeled patient data is scarce. This approach may lead to more personalized treatment strategies in oncology by enabling efficient adaptation of preclinical models to real-world patient data.
Manifold-Matching Autoencoders
Theory
Generative Models
Efficient ML
- Introduction of Manifold-Matching Autoencoders (MMAE) for improved dimensionality reduction.
- Focus on aligning pairwise distances in latent space with input data distances.
- MMAE shows superior performance in preserving geometric and topological structures.
- Scalable approximation of Multidimensional Scaling (MDS) is achieved.
Summary
This paper introduces Manifold-Matching Autoencoders (MMAE), autoencoders trained with an unsupervised regularization term that aligns pairwise distances in the latent space with those in the input data space. By minimizing the mean squared error between these distance matrices, MMAE enhances the preservation of geometric and topological structures in the data, which is crucial for effective dimensionality reduction. The authors highlight that this method can be applied flexibly to lower-dimensional representations, making it adaptable for various datasets. The study demonstrates that MMAE outperforms existing methods in preserving closest neighbor distances and persistence homology measures. Additionally, it provides a scalable approximation of Multidimensional Scaling (MDS), which is significant given the computational challenges associated with traditional MDS. The paper includes experiments on synthetic datasets, such as the nested spheres, and real-world benchmarks, showcasing the effectiveness of MMAE in visualizing and maintaining the underlying structure of complex data.
Methodology
The methodology involves adding a regularization term, called Manifold-Matching (MM-reg), to the standard autoencoder objective function. This term minimizes the mean squared error between the pairwise distance matrix of the latent space and a reference distance matrix derived from the input data or its embedding. This approach allows for flexibility in dimensionality, enabling the use of higher-dimensional references for lower-dimensional latent spaces.
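The MM-reg term is simple enough to state directly. This numpy sketch computes it for a toy latent code; the reference matrix here comes from the raw inputs, though as noted above the paper also allows a precomputed embedding as the reference:

```python
import numpy as np

def pairwise_dists(X):
    """Euclidean pairwise distance matrix."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def mm_reg(Z, D_ref):
    """Manifold-Matching regularizer: mean squared error between the
    latent pairwise-distance matrix and a reference distance matrix."""
    return ((pairwise_dists(Z) - D_ref) ** 2).mean()

X = np.random.default_rng(0).normal(size=(16, 10))  # input batch
Z = X[:, :3]                                        # toy stand-in for latent codes
loss = mm_reg(Z, pairwise_dists(X))
# total objective would be: reconstruction_mse + lambda * mm_reg(Z, D_ref)
print(loss > 0)  # prints True: the 3-D slice distorts the 10-D distances
```

Because the term only compares distance matrices, the latent dimension is free to differ from the reference's, which is the flexibility the methodology paragraph describes; when the reference is the input distance matrix, minimizing it approximates metric MDS batch-wise.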
Results
The results indicate that MMAE significantly improves the preservation of topological features and geometric structures compared to traditional autoencoders and other topological variants. The experiments reveal that MMAE effectively recovers complex structures in synthetic datasets and performs competitively on real-world benchmarks, demonstrating its potential as a robust tool for dimensionality reduction.
Implications
The findings suggest that MMAE can be a valuable method for tasks requiring dimensionality reduction while maintaining the integrity of data structures, such as anomaly detection, visualization of high-dimensional data, and generative modeling. Its scalability and flexibility make it suitable for various applications across different domains.
Abstraction as a Memory-Efficient Inductive Bias for Continual Learning
Theory
Efficient ML
Graph Learning
- AAT introduces a lightweight, loss-level abstraction mechanism for online continual learning.
- The method stabilizes learning by optimizing over both concrete instances and their abstract representations.
- AAT outperforms standard instance-only learning and matches or exceeds experience replay baselines.
- The paper introduces two new benchmarks for evaluating continual learning methods.
Summary
This paper addresses the challenges of continual learning in non-stationary environments, where models must learn new information without forgetting previously acquired knowledge. The authors propose a novel approach called Abstraction-Augmented Training (AAT), which introduces a memory-efficient inductive bias by encouraging models to capture latent relational structures across examples. AAT modifies the training loss to optimize both concrete instances and their abstract representations, thereby stabilizing learning without the need for a replay buffer. The paper evaluates AAT on two benchmarks: the Relational Cycle Benchmark, which uses entity masking to assess relational reasoning, and the Narrative Abstraction Benchmark, which focuses on shared narrative structures. The results demonstrate that AAT achieves performance comparable to or exceeding strong experience replay baselines while requiring no additional memory and minimal changes to the training objective. The findings highlight the effectiveness of structural abstraction as a powerful alternative to traditional memory-based methods in continual learning.
Methodology
The authors developed AAT as a loss-level modification that encourages models to learn abstract representations alongside concrete instances. They evaluated AAT on two benchmarks designed to capture relational and narrative abstractions, analyzing training dynamics and performance metrics to demonstrate its effectiveness in mitigating catastrophic forgetting and plasticity loss.
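A toy rendering of the loss-level idea, with entity masking as the abstraction (the summary mentions entity masking for the Relational Cycle Benchmark; the masking rule and stand-in loss below are purely illustrative):

```python
def abstract(batch):
    """Entity masking: replace concrete entities (here, any Title-cased
    token -- a hypothetical rule) with a shared placeholder, exposing the
    relational structure common to the examples."""
    return [tuple("<ENT>" if tok.istitle() else tok for tok in ex) for ex in batch]

def model_loss(batch):
    """Stand-in loss: distinct-token count of the batch (smaller = easier
    to fit). A real model would compute its usual training loss here."""
    return float(len({tok for ex in batch for tok in ex}))

def aat_loss(batch, alpha=0.5):
    """Loss-level abstraction: optimize concrete instances and their
    abstracted versions jointly; no replay buffer is needed."""
    return model_loss(batch) + alpha * model_loss(abstract(batch))

batch = [("Alice", "pays", "Bob"), ("Bob", "pays", "Carol")]
print(aat_loss(batch))  # prints 5.0: 4 concrete tokens + 0.5 * 2 abstract tokens
```

Because the abstract term is recomputed from the current batch, the memory footprint is identical to instance-only training — the inductive bias lives entirely in the loss.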
Results
AAT achieved performance levels comparable to or exceeding those of strong experience replay methods, while requiring zero additional memory. The results from both benchmarks indicated that AAT effectively stabilized learning and improved generalization in continual learning scenarios.
Implications
The findings suggest that structural abstraction can serve as a viable alternative to memory-intensive methods in continual learning, potentially leading to more efficient and scalable learning systems in dynamic environments. This approach could be applied in various domains where continual learning is essential, such as robotics, natural language processing, and real-time data analysis.
Unsupervised Symbolic Anomaly Detection
Interpretability
- SYRAN provides a transparent and interpretable approach to anomaly detection using symbolic regression.
- The method generates human-readable equations that describe normal data patterns, allowing for direct inspection and validation.
- SYRAN achieves competitive anomaly detection performance compared to existing state-of-the-art methods.
- The approach is applicable across various domains without the need for labeled anomaly data.
Summary
This paper introduces SYRAN, an innovative unsupervised anomaly detection method that leverages symbolic regression to model normality through human-readable equations. Unlike traditional methods that often operate as black boxes, SYRAN generates symbolic expressions that represent invariant functions, which are approximately constant on normal data. Deviations from these functions yield interpretable anomaly scores, enhancing the transparency of the detection process. The authors argue that this approach addresses significant limitations in existing anomaly detection techniques, particularly in high-stakes applications where understanding model behavior is crucial. SYRAN is evaluated against standard anomaly detection datasets, demonstrating strong performance comparable to state-of-the-art methods while maintaining high explainability. The results indicate that the symbolic equations produced by SYRAN correspond to known scientific and medical relationships, making them valuable for further analysis and decision-making. The code for SYRAN is made publicly available to promote reproducibility and further research in this area.
Methodology
SYRAN employs symbolic regression to learn a collection of scalar functions that are approximately constant on normal data. It assigns anomaly scores based on deviations from these functions, producing closed-form equations that are interpretable by design. The method addresses challenges such as avoiding trivial solutions and balancing data fit with expression complexity.
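The scoring side can be sketched once an invariant is in hand. Here the symbolic-regression step is assumed to have already recovered f(x) = x₁·x₂ on synthetic data where that product is approximately constant; everything below that assumption is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal data obeys an (unknown to the detector) invariant: x1 * x2 ~ 3.
x1 = rng.uniform(1, 2, size=200)
normal = np.stack([x1, 3.0 / x1 + rng.normal(scale=0.01, size=200)], axis=1)

# Suppose symbolic regression has recovered the closed-form invariant:
def f(x):
    return x[..., 0] * x[..., 1]

mu, sigma = f(normal).mean(), f(normal).std()

def anomaly_score(x):
    """Standardized deviation of the learned invariant from its
    normal-data value -- interpretable by construction."""
    return np.abs(f(x) - mu) / sigma

in_dist = anomaly_score(np.array([1.5, 2.0]))   # product = 3.0, near mu
outlier = anomaly_score(np.array([1.5, 4.0]))   # product = 6.0, far from mu
print(outlier > in_dist)  # prints True
```

The transparency claim follows directly: an analyst can read the equation f and see why a point scored high, instead of interrogating a black-box density model.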
Results
Experimental evaluations show that SYRAN maintains strong anomaly detection performance while offering unparalleled explainability. The symbolic equations generated correspond to established scientific and medical relationships, enhancing their utility in practical applications.
Implications
The development of SYRAN has significant implications for fields requiring reliable anomaly detection, such as healthcare and predictive maintenance. Its interpretable nature allows practitioners to understand model behavior, fostering trust and facilitating the integration of anomaly detection into critical decision-making processes.
Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting
Time Series
- TIPS integrates multiple inductive biases into a unified Transformer model for financial forecasting.
- The framework utilizes knowledge distillation to synthesize the strengths of bias-specialized teacher models.
- TIPS outperforms existing state-of-the-art models in financial time series forecasting across multiple metrics.
- The model demonstrates significant computational efficiency, requiring only 38% of the inference-time computation compared to alternatives.
Summary
This paper addresses the limitations of Transformer-based models in financial time series forecasting, where traditional assumptions of stationarity and stable dynamics often fail. The authors introduce TIPS (Transformer with Inductive Prior Synthesis), a novel framework that integrates diverse inductive biases—such as causality, locality, and periodicity—into a single Transformer model through knowledge distillation. TIPS first trains specialized teacher models that embody these biases and then distills their knowledge into a student model that adapts to different market regimes. The empirical results demonstrate that TIPS significantly outperforms state-of-the-art models across four major equity markets, achieving superior annual returns, Sharpe ratios, and Calmar ratios while maintaining lower computational costs. The findings emphasize the necessity of utilizing regime-dependent inductive biases for robust financial forecasting in non-stationary environments.
Methodology
The authors propose TIPS, which involves training bias-specialized Transformer teacher models using attention masking. These teachers are then distilled into a single student model that aligns with various inductive biases depending on market regimes. This approach allows for the effective integration of diverse temporal priors while addressing the merging penalty that often arises from naively combining biases.
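The distillation objective can be sketched as a task term plus a term matching a regime-weighted mix of teacher forecasts. The weighting scheme below is illustrative — it stands in for however the paper gates between causal, local, and periodic teachers:

```python
import numpy as np

def distill_loss(student_pred, teacher_preds, y, regime_weights, lam=0.5):
    """Sketch of distilling bias-specialized teachers into one student:
    fit the targets while matching a regime-weighted combination of
    teacher forecasts. `regime_weights` is a hypothetical gating signal."""
    task = np.mean((student_pred - y) ** 2)
    soft_target = np.tensordot(regime_weights, teacher_preds, axes=1)
    kd = np.mean((student_pred - soft_target) ** 2)
    return task + lam * kd

rng = np.random.default_rng(0)
y = rng.normal(size=32)  # future returns to forecast
# Three teachers of varying quality, e.g. causal / local / periodic biases.
teachers = np.stack([y + rng.normal(scale=s, size=32) for s in (0.1, 0.3, 0.5)])
student = y + rng.normal(scale=0.2, size=32)
loss = distill_loss(student, teachers, y, regime_weights=np.array([0.6, 0.3, 0.1]))
print(loss > 0)  # prints True
```

Only the single student runs at inference, which is where the computational saving over serving a teacher ensemble comes from.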
Results
TIPS achieved state-of-the-art performance, outperforming strong ensemble baselines by 55% in annual return, 9% in Sharpe ratio, and 16% in Calmar ratio. Additionally, TIPS demonstrated statistically significant excess returns compared to vanilla Transformers and its teacher ensembles, showcasing its effectiveness in adapting to different market conditions.
Implications
The findings suggest that incorporating regime-dependent inductive biases can enhance the robustness and accuracy of financial forecasting models. This approach may be applicable to other domains characterized by non-stationarity and varying dynamics, potentially leading to improved predictive performance across various fields.
The Importance of Being Smoothly Calibrated
Theory
- Introduces a new omniprediction guarantee for smoothly calibrated predictors.
- Characterizes smooth calibration using the earth mover's distance to the nearest perfectly calibrated distribution.
- Demonstrates that estimating the upper distance to calibration is fundamentally limited.
- Unifies and extends prior results on omniprediction from smooth calibration.
Summary
This paper emphasizes the significance of smooth calibration as a robust measure of calibration error in machine learning. The authors generalize and unify previous findings on smooth calibration, presenting a new omniprediction guarantee for smoothly calibrated predictors across all bounded proper losses. By introducing noise to the predictor, they demonstrate that the omniprediction error is bounded by the smooth calibration error and the earth mover's distance from a benchmark predictor. The paper also provides a new characterization of smooth calibration in relation to the earth mover's distance to the nearest perfectly calibrated distribution, simplifying previous proofs. Furthermore, it highlights the limitations in estimating the upper distance to calibration, contrasting it with the known impossibility of estimating the distance to calibration with a finite number of samples. Overall, the work advances the understanding of calibration in predictive modeling and its implications for decision-making under uncertainty.
Methodology
The authors develop theoretical frameworks to analyze smooth calibration and omniprediction. They introduce noise to the predictors and establish bounds on omniprediction error in relation to smooth calibration error and earth mover's distance. They also provide new characterizations and proofs related to calibration distances.
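In schematic form (exact constants, loss normalization, and the precise noising procedure are in the paper; this only records the shape of the guarantee described above), the omniprediction error of the noised predictor is controlled by two measurable quantities:

```latex
\mathrm{omniErr}_{\mathcal{L}}(\tilde f)
  \;\le\; O\!\big(\mathrm{smCE}(f)\big) \;+\; W_1(f, g),
```

where $f$ is the smoothly calibrated predictor, $\tilde f$ its noised version, $g$ the benchmark predictor, $\mathrm{smCE}$ the smooth calibration error, $W_1$ the earth mover's (1-Wasserstein) distance, and $\mathcal{L}$ ranges over bounded proper losses.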
Results
The paper presents a new omniprediction guarantee that shows the relationship between smooth calibration error and omniprediction performance. It establishes a clear characterization of smooth calibration and demonstrates the inapproximability of the upper distance to calibration, providing a comprehensive understanding of the calibration landscape.
Implications
The findings have significant implications for machine learning practitioners, particularly in areas where calibration is crucial for decision-making. The results can enhance the reliability of predictive models in various applications, ensuring that downstream decision-makers can trust the predictions made by these models.
Cohomological Obstructions to Global Counterfactuals: A Sheaf-Theoretic Foundation for Generative Causal Models
Generative Models
Theory
Graph Learning
- Identifies fundamental flaws in the assumption that local causal mechanisms yield global counterfactual coherence.
- Introduces a sheaf-theoretic framework to model structural causal models over Wasserstein spaces.
- Develops the Entropic Wasserstein Causal Sheaf Laplacian to resolve topological conflicts without singularities.
- Demonstrates the effectiveness of the proposed framework in high-dimensional scRNA-seq counterfactuals.
Summary
This paper addresses a critical assumption in continuous generative models, which is that locally consistent causal mechanisms lead to globally coherent counterfactuals. The authors demonstrate that this assumption fails in the presence of non-trivial homology in causal graphs, such as structural conflicts or hidden confounders. They formalize structural causal models using cellular sheaves over Wasserstein spaces, introducing the concept of cohomological obstructions in measure spaces. To mitigate computational challenges and avoid deterministic singularities, the authors propose entropic regularization and derive the Entropic Wasserstein Causal Sheaf Laplacian, a new system of coupled non-linear Fokker-Planck equations. They present an entropic pullback lemma that integrates with the Implicit Function Theorem to create a direct algorithmic link to automatic differentiation, achieving efficient memory usage for reverse-mode gradients. Empirical results show that their framework effectively utilizes thermodynamic noise to overcome topological barriers in high-dimensional single-cell RNA sequencing counterfactuals. Additionally, they introduce the Topological Causal Score, which serves as a sensitive detector for topology-aware causal discovery.
Methodology
The authors formalize structural causal models as cellular sheaves over Wasserstein spaces and introduce entropic regularization to derive the Entropic Wasserstein Causal Sheaf Laplacian. They utilize the Implicit Function Theorem and automatic differentiation to achieve efficient computation of gradients.
Results
The proposed framework successfully navigates topological barriers in counterfactual inference, demonstrating robustness in high-dimensional settings. The Topological Causal Score effectively detects causal relationships informed by topological structures.
Implications
This work has significant implications for causal inference in complex systems, particularly in fields like genomics and social sciences, where understanding causal relationships is crucial. The framework could enhance the development of generative models that account for topological complexities.
Only relative ranks matter in weight-clustered large language models
Large Language Models
Efficient ML
Theory
- Relative ranks of weights are more important than their exact values in LLMs.
- Weight clustering can significantly compress LLMs without retraining, preserving accuracy.
- Fine-tuning cluster means can recover a portion of accuracy loss at low cost.
- Rank distortion leads to substantial performance degradation, while rank preservation maintains model quality.
Summary
This paper investigates the significance of weight values in large language models (LLMs), proposing that the relative ranks of weights are more critical than their exact numerical values. The authors apply weight clustering using K-means to pretrained models, reducing the number of unique weight values while maintaining accuracy. They demonstrate that models like Llama 3.1-8B-Instruct and SmolLM2-135M can be compressed to 16-64 distinct values without retraining, achieving substantial storage savings. Fine-tuning the cluster means can recover 30-40% of the accuracy gap at minimal cost. The study reveals that scrambling the relative ranks of clusters significantly degrades model performance, while rank-preserving modifications have minimal impact. The findings suggest that scale drift, rather than rank distortion, is the primary mechanism leading to performance collapse when multiple layers are perturbed. The authors conclude that understanding the rank-based structure of weights can enhance model compression and robustness.
Methodology
The authors employed K-means clustering to group weights in pretrained models into K clusters, replacing each weight with its cluster representative. They systematically modified cluster means while keeping assignments fixed to analyze the impact on model performance, focusing on the preservation of relative ranks.
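The clustering step can be sketched as follows; this is an illustrative 1-D K-means over a weight tensor, not the authors' code, and `cluster_weights` is a hypothetical helper.

```python
import numpy as np

def cluster_weights(w, k=16, iters=25):
    """Quantise a weight tensor to k shared values with 1-D K-means
    (Lloyd's algorithm); each weight is replaced by its cluster mean."""
    flat = w.ravel()
    # quantile initialisation keeps every cluster populated
    means = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        assign = np.abs(flat[:, None] - means[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                means[j] = members.mean()
    return assign.reshape(w.shape), means

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
assign, means = cluster_weights(w, k=16)
w_q = means[assign]                      # every weight replaced by its cluster mean
```

Storing `assign` (4 bits per weight for k=16) plus the 16 float means is where the storage saving comes from; the rank-scrambling probe described in the paper amounts to permuting `means` while leaving `assign` fixed.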
Results
The application of weight clustering allowed for a reduction in unique weight values to as few as 16-64, maintaining strong accuracy across tasks without retraining. Fine-tuning cluster means recovered 30-40% of the accuracy gap. The study found that scrambling relative ranks led to significant performance degradation, while rank-preserving changes caused minimal loss.
Implications
These findings suggest a new perspective on model compression and robustness, emphasizing the importance of relative weight ranks. This could lead to more efficient compression techniques for LLMs and a better understanding of the underlying structures that contribute to model performance.
Evidential Domain Adaptation for Remaining Useful Life Prediction with Incomplete Degradation
Time Series
- EviAdapt addresses the limitations of existing domain adaptation methods in RUL prediction with incomplete degradation data.
- The method segments data into distinct degradation stages for accurate stage-wise alignment.
- Evidential uncertainty alignment is introduced to manage varying degradation patterns across domains.
- Extensive experiments show that EviAdapt significantly outperforms current state-of-the-art techniques.
Summary
This paper addresses the challenge of predicting Remaining Useful Life (RUL) in scenarios where the target domain lacks labeled data, particularly in cases of incomplete degradation trajectories. Existing domain adaptation (DA) methods often fail in such contexts due to two main limitations: they typically focus on global alignment, which can misalign degradation stages, and they do not account for varying degradation patterns under different operating conditions. To overcome these challenges, the authors propose a novel approach called EviAdapt, which utilizes evidential learning to enhance domain adaptation. EviAdapt segments both source and target domain data into distinct degradation stages based on degradation rates, allowing for stage-wise alignment of samples. Additionally, it introduces an evidential uncertainty alignment technique to align uncertainty levels across matched stages. The effectiveness of EviAdapt is validated through experiments on multiple datasets, demonstrating significant improvements over state-of-the-art methods in RUL prediction under incomplete degradation scenarios.
Methodology
EviAdapt segments source and target domain data into distinct degradation stages based on degradation rates, enabling accurate stage-wise alignment. It also employs evidential learning to estimate and align uncertainty levels across matched degradation stages, addressing the misalignment and varying patterns in the data.
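The stage-segmentation idea can be sketched by quantile-binning the local degradation rate; the Gaussian-free thresholding below is an illustrative stand-in, not EviAdapt's actual segmentation rule.

```python
import numpy as np

def segment_stages(signal, n_stages=3):
    """Partition a degradation trajectory into stages by quantile-binning
    its local degradation rate (larger rate -> later stage)."""
    rate = np.gradient(np.asarray(signal, dtype=float))
    edges = np.quantile(rate, np.linspace(0.0, 1.0, n_stages + 1))
    stages = np.searchsorted(edges, rate, side="right") - 1
    return np.clip(stages, 0, n_stages - 1)

t = np.linspace(0.0, 1.0, 200)
wear = t ** 3                            # damage accumulates faster over time
stages = segment_stages(wear, n_stages=3)
```

Stage labels computed this way let source and target samples be matched stage-by-stage before alignment, rather than aligning the two domains globally.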
Results
The experiments conducted on the C-MAPSS, N-CMAPSS, and PHM2010 datasets demonstrate that EviAdapt significantly outperforms existing domain adaptation methods, showcasing its effectiveness in handling incomplete degradation scenarios for RUL prediction.
Implications
The proposed EviAdapt method has significant implications for improving RUL prediction in industrial applications, particularly where data from late degradation stages is scarce. This can lead to better maintenance strategies, reduced costs, and enhanced safety in operational environments.
Federated Learning with Multi-Partner OneFlorida+ Consortium Data for Predicting Major Postoperative Complications
Federated Learning
- Federated learning models were developed to predict major postoperative complications using multicenter data.
- The study included a large cohort of 358,644 patients and 494,163 surgical procedures.
- Federated learning models showed superior or comparable predictive performance compared to local and central models.
- The approach preserves patient data privacy while enhancing model generalizability.
Summary
This study investigates the application of federated learning (FL) models to predict major postoperative complications and mortality using a large, multicenter dataset from the OneFlorida Data Trust. The authors hypothesize that FL models can enhance generalizability while ensuring data privacy. The retrospective cohort study analyzed data from 358,644 adult patients who underwent 494,163 major surgical procedures across five healthcare institutions from 2012 to 2023. The study developed and validated FL models to predict risks of ICU admission, mechanical ventilation, acute kidney injury, and in-hospital mortality. The performance of these FL models was compared against local models trained on single-center data and central models trained on pooled datasets. Results indicated that FL models exhibited strong predictive performance, with area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores that were comparable or superior to local models. The findings demonstrate the feasibility of using federated learning in clinical decision support systems, highlighting its potential to leverage diverse data while preserving patient privacy.
Methodology
The study employed a retrospective, longitudinal cohort design, utilizing electronic health record (EHR) data from the OneFlorida Data Trust. It included 358,644 adult patients and developed federated learning models to predict postoperative risks. Performance was evaluated using AUROC and AUPRC metrics, comparing federated models with local and central models.
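The aggregation step in federated learning of this kind can be sketched with standard federated averaging; this is a generic FedAvg illustration, not the study's production pipeline.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """One aggregation round of federated averaging: a weighted mean of
    client parameter vectors, weighted by local sample counts."""
    total = float(sum(client_sizes))
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

# Two hospitals with different cohort sizes contribute local coefficients
site_a = np.array([0.2, 0.4])            # trained on 3000 patients
site_b = np.array([0.6, 0.0])            # trained on 1000 patients
global_params = fedavg([site_a, site_b], [3000, 1000])
```

Only parameters leave each site, never patient records, which is what allows the approach to preserve privacy while still pooling statistical strength across institutions.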
Results
The federated learning models demonstrated strong predictive performance across all outcomes, with AUROC and AUPRC scores consistently comparable or superior to local models at each site. This indicates the robustness and generalizability of the federated learning approach.
Implications
The successful application of federated learning in this context suggests its potential for broader use in clinical decision support systems, allowing for improved predictive analytics while maintaining patient data privacy. This could lead to better patient outcomes and resource allocation in surgical settings.
DSS-GAN: Directional State Space GAN with Mamba backbone for Class-Conditional Image Synthesis
Generative Models
Computer Vision
- Introduction of DSS-GAN, the first GAN to use Mamba as a generator backbone for noise-to-image synthesis.
- Development of the Directional Latent Routing (DLR) mechanism for improved class conditioning.
- Achieves better performance metrics (FID, KID, precision-recall) than StyleGAN2-ADA with significantly fewer parameters.
- Demonstrates that directional subvectors in the latent space allow for structured changes in synthesized images.
Summary
The paper introduces DSS-GAN, a novel generative adversarial network that utilizes Mamba as a hierarchical generator backbone for class-conditional image synthesis. The primary innovation is the Directional Latent Routing (DLR) mechanism, which decomposes the latent vector into direction-specific subvectors. Each subvector is conditioned with a class embedding to modulate the Mamba scan's feature map, allowing for a more nuanced and spatially coherent integration of class identity and latent structure. This approach contrasts with traditional methods that apply a global conditioning signal. DSS-GAN demonstrates significant improvements in Fréchet Inception Distance (FID), Kernel Inception Distance (KID), and precision-recall scores compared to StyleGAN2-ADA across various datasets, while using fewer than one-third as many parameters. The analysis of the latent space indicates that the directional subvectors lead to structured, direction-correlated changes in the generated images, enhancing the model's interpretability and control over the synthesis process.
Methodology
DSS-GAN employs a hierarchical generator architecture based on Mamba, utilizing the DLR mechanism to decompose the latent vector into direction-specific subvectors. These subvectors are conditioned with class embeddings to modulate the Mamba scan's feature map, ensuring consistent application across all generative scales. The model's performance is evaluated using standard metrics such as FID, KID, and precision-recall scores across multiple datasets.
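The routing idea can be sketched as splitting the latent into per-direction subvectors and modulating each with a class-derived gain; the FiLM-style scaling below is an assumed form for illustration, and the coupling to the Mamba scan is not reproduced.

```python
import numpy as np

def directional_latent_routing(z, class_emb, n_dir=4):
    """Split a latent vector into n_dir direction-specific subvectors and
    modulate each with a scalar gain derived from the class embedding."""
    subs = np.split(z, n_dir)                 # direction-specific subvectors
    gains = 1.0 + np.tanh(class_emb[:n_dir])  # one gain per direction (assumed form)
    return [g * s for g, s in zip(gains, subs)]

rng = np.random.default_rng(0)
z = rng.normal(size=64)
routed = directional_latent_routing(z, class_emb=rng.normal(size=16))
```

Because each direction receives its own class-conditioned modulation, class identity can influence different spatial directions of the scan differently, instead of acting as a single global conditioning signal.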
Results
DSS-GAN outperforms StyleGAN2-ADA in terms of FID, KID, and precision-recall scores across all tested datasets while using fewer than one-third as many parameters. The latent space analysis reveals that the directional subvectors have distinct influences on the spatial features of the generated images, allowing for more controlled and interpretable image synthesis.
Implications
The advancements presented in DSS-GAN could significantly enhance applications requiring precise class-conditional image generation, such as in art generation, virtual reality, and other creative industries. The model's efficiency and interpretability may also facilitate its adoption in real-time applications where speed and control are critical.
Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models
NLP
Large Language Models
Interpretability
- Introduction of Capability-Guided Compression (CGC) framework for LLMs.
- Capability density maps derived from Sparse Autoencoders provide a new signal for compression budget allocation.
- Theoretical foundation linking capability density to component-level phase transitions.
- Experimental validation shows independence of capability density from existing importance metrics.
Summary
This paper addresses the limitations of existing large language model (LLM) compression techniques, which often allocate compression budgets without understanding the functional contributions of individual model components. The author introduces the concept of Capability-Guided Compression (CGC), which utilizes Sparse Autoencoder (SAE)-derived capability density maps to inform differential budget allocation across transformer components. This approach aims to mitigate the 'capability-blind compression problem,' which has been linked to performance degradation and abrupt phase transitions in model capabilities during compression. The paper theoretically establishes that components with higher capability density exhibit lower structural redundancy and can withstand higher compression ratios before experiencing performance drops. The CGC framework integrates with existing phase avoidance strategies, extending the effective compression capabilities of LLMs. Experimental results on the GPT-2 Medium model demonstrate that capability density is statistically independent of traditional importance metrics, suggesting a novel approach to compression that prioritizes functional relevance over mere statistical significance. The findings indicate the need for further validation on larger models with properly trained SAEs to fully realize the potential of CGC.
Methodology
The paper proposes a novel framework called Capability-Guided Compression (CGC), which employs Sparse Autoencoder (SAE)-derived capability density maps to allocate compression budgets differentially across transformer components. The methodology includes theoretical analysis of capability density in relation to phase transitions and experimental validation using the GPT-2 Medium model.
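Following the stated finding that higher-density components tolerate more compression, a toy allocator might map densities to per-component keep ratios as below; the inverse-proportional rule, the clipping bounds, and the helper name are all assumptions for illustration, not the paper's allocator.

```python
import numpy as np

def allocate_keep_ratios(density, avg_keep=0.5, floor=0.05):
    """Turn per-component capability densities into keep ratios: denser
    components are compressed harder (smaller keep ratio)."""
    d = np.asarray(density, dtype=float)
    inv = 1.0 / d
    keep = avg_keep * len(d) * inv / inv.sum()   # inverse-proportional sharing
    return np.clip(keep, floor, 1.0)

density = np.array([0.9, 0.5, 0.1, 0.1])         # hypothetical per-layer densities
keep = allocate_keep_ratios(density)
```

The average keep ratio stays at the global budget while the distribution across components shifts with the density map, which is the differential allocation the CGC framework argues for.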
Results
The experiments confirm that capability density is orthogonal to existing importance metrics, with a Spearman correlation of ρ = -0.054. The results also indicate that the PPL-based compression comparison yielded negative results, attributed to the limitations of the GPT-2 Medium model and the evaluation metric used.
Implications
The findings suggest a paradigm shift in how compression is approached for large language models, emphasizing the importance of understanding component capabilities. This could lead to more efficient and effective compression techniques that maintain model performance while reducing resource requirements, ultimately making LLMs more accessible for practical applications.
QuantFL: Sustainable Federated Learning for Edge IoT via Pre-Trained Model Quantisation
Federated Learning
Efficient ML
- QUANTFL combines pre-trained model initialisation with structured quantisation to reduce communication costs in federated learning.
- The framework achieves a 40% reduction in total communication while maintaining or exceeding accuracy compared to uncompressed baselines.
- QUANTFL employs bucket-based quantisation schemes that adapt to the distribution of model updates, enhancing efficiency.
- The method demonstrates robustness under non-IID data conditions, making it suitable for diverse IoT applications.
Summary
The paper introduces QUANTFL, a sustainable federated learning (FL) framework designed to minimize the carbon footprint associated with frequent uplink transmissions in edge IoT environments. By leveraging pre-trained models, QUANTFL enables aggressive quantisation of model updates, significantly reducing communication costs. The authors demonstrate that pre-training concentrates update statistics, allowing for memory-efficient bucket quantisation without the need for complex error-feedback mechanisms. The framework is evaluated on datasets such as MNIST and CIFAR-100, showing a 40% reduction in total communication and achieving high accuracy with fewer bits transmitted. QUANTFL's approach not only enhances communication efficiency but also maintains robust performance under heterogeneous data distributions, making it a practical solution for battery-constrained IoT networks.
Methodology
QUANTFL utilizes pre-trained models to initialize client updates, which are then quantised using two bucket-based schemes: bucket-uniform (BU) and bucket-quantile (BQ). This method reduces the bit-length of model updates while preserving learning performance. The framework is benchmarked against existing methods like QSGD and FedAvg, focusing on communication efficiency and accuracy.
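The bucket-uniform scheme can be sketched as below; bucket size and bit width are illustrative choices, not the paper's tuned settings.

```python
import numpy as np

def bucket_uniform(update, bucket=256, bits=4):
    """Bucket-uniform (BU) quantisation sketch: split a flattened model
    update into fixed-size buckets and uniformly quantise each bucket to
    2**bits levels over its own min-max range."""
    levels = 2 ** bits - 1
    out = np.empty_like(update, dtype=np.float64)
    for s in range(0, len(update), bucket):
        chunk = update[s:s + bucket]
        lo, hi = chunk.min(), chunk.max()
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = np.round((chunk - lo) / scale)       # integers in [0, levels]
        out[s:s + bucket] = lo + codes * scale       # dequantised update
    return out

rng = np.random.default_rng(0)
update = rng.normal(scale=0.01, size=1024)           # pre-training concentrates updates
deq = bucket_uniform(update)
```

Because pre-training concentrates update statistics, each bucket's min-max range is narrow, so even 4-bit codes reconstruct the update closely without error-feedback machinery.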
Results
The experiments conducted on MNIST and CIFAR-100 datasets reveal that QUANTFL reduces total communication by 40%, achieving 89.00% accuracy on MNIST and 66.89% on CIFAR-100 with significantly fewer bits transmitted. The results indicate that QUANTFL can match or exceed the performance of uncompressed models while operating under strict bandwidth constraints.
Implications
QUANTFL's framework has significant implications for deploying federated learning in edge IoT environments, particularly in scenarios where energy efficiency and communication costs are critical. Its ability to leverage pre-trained models for efficient training can facilitate broader adoption of federated learning in real-world applications, enhancing privacy and reducing environmental impact.
Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication
Computer Vision
Theory
Optimization
- Introduction of TopoJSCC, a topology-aware DeepJSCC framework.
- Integration of persistent-homology regularizers for topology preservation.
- Improved performance in topology preservation and PSNR under low SNR conditions.
- End-to-end learning without the need for side information.
Summary
This paper introduces TopoJSCC, a novel topology-aware framework for Deep Joint Source-Channel Coding (DeepJSCC) aimed at enhancing semantic communication in wireless vision applications, particularly those requiring preservation of global structural information, such as autonomous driving. Traditional DeepJSCC methods focus on pixel-wise losses, which often fail to maintain the connectivity and topology of structured images. TopoJSCC addresses this gap by integrating persistent-homology regularizers into the end-to-end training process. The framework penalizes Wasserstein distances between persistence diagrams of original and reconstructed images, as well as between Vietoris–Rips complexes of latent features before and after transmission. This approach promotes a robust latent manifold without requiring additional side information. Experimental results demonstrate that TopoJSCC significantly improves topology preservation and peak signal-to-noise ratio (PSNR) in low signal-to-noise ratio (SNR) and bandwidth-limited conditions, outperforming existing methods such as DeepJSCC and TopoCode.
Methodology
The authors propose a topology-aware framework that augments traditional DeepJSCC with persistent-homology-based losses. This includes an image-domain topological loss using Wasserstein distances between persistence diagrams and a latent-space topological loss applied to Vietoris–Rips complexes derived from encoder features. The framework is trained end-to-end, allowing for direct optimization of topology preservation during the coding process.
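Under assumed notation, the two topological penalties described above combine with the pixel loss into a single training objective (the λ weights are hypothetical hyperparameters; PD denotes a persistence diagram and VR a Vietoris–Rips filtration):

```latex
\mathcal{L} \;=\; \|x - \hat{x}\|_2^2
  \;+\; \lambda_{\mathrm{img}}\, W_p\!\big(\mathrm{PD}(x),\, \mathrm{PD}(\hat{x})\big)
  \;+\; \lambda_{\mathrm{lat}}\, W_p\!\big(\mathrm{PD}(\mathrm{VR}(z)),\, \mathrm{PD}(\mathrm{VR}(\tilde{z}))\big)
```

where x̂ is the reconstructed image, z and z̃ are the latent features before and after transmission, and W_p is the p-Wasserstein distance between persistence diagrams.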
Results
Experiments conducted on topology-rich datasets under various channel conditions (AWGN and Rayleigh fading) indicate that TopoJSCC significantly reduces persistence-diagram distortion and topological errors while maintaining PSNR levels comparable to traditional DeepJSCC methods. The results highlight the effectiveness of the proposed topology-preserving approach in enhancing semantic communication.
Implications
The findings suggest that TopoJSCC can be particularly beneficial for safety-critical applications in autonomous driving and telemedicine, where maintaining the structural integrity of visual data is crucial. This approach could lead to more reliable communication systems that prioritize semantic understanding over mere pixel fidelity.
Translation Invariance of Neural Operators for the FitzHugh-Nagumo Model
Theory
Efficient ML
Time Series
- Introduces a novel training strategy exploiting translation invariance in the FHN model.
- Benchmarks seven different Neural Operator architectures for modeling excitable cell dynamics.
- CNOs excel in translated dynamics but require higher training costs.
- FNOs achieve low training error but have high inference times and less accuracy on translated data.
Summary
This paper explores the capabilities of Neural Operators (NOs) in modeling the FitzHugh-Nagumo (FHN) model, which describes the dynamics of excitable cells. The study introduces a novel training strategy that leverages the translation invariance property of the FHN model, allowing for the generation of training datasets with varying spatial locations and intensities of applied current while keeping time fixed. The authors benchmark seven NO architectures, including Convolutional Neural Operators (CNOs), Deep Operator Networks (DONs), and Fourier Neural Operators (FNOs), among others, assessing their performance based on training and test accuracy, computational efficiency, and inference speed. The findings indicate that while CNOs perform well on translated dynamics, they incur higher training costs. FNOs achieve the lowest training error but have the highest inference time and less accurate predictions on translated dynamics. DONs and their variants show high efficiency but struggle with generalization to the test set. Overall, the paper provides a comprehensive evaluation of NOs in capturing complex dynamics and highlights their current limitations and capabilities.
Methodology
The study employs a novel training strategy that generates datasets with varying applied current spatial locations and intensities while keeping time fixed. The test set introduces a challenging scenario with translated applied currents in both time and space. Seven NO architectures are benchmarked based on their performance metrics, including training and test accuracy, computational efficiency, and inference speed.
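The data-generation strategy can be sketched with a translated pulse; the Gaussian shape and width are illustrative assumptions, not the paper's exact current profile.

```python
import numpy as np

def applied_current(x, centre, amp, width=0.05):
    """Applied-current profile for one sample: a Gaussian bump whose spatial
    centre and intensity vary across the dataset at a fixed time, exploiting
    the FHN model's translation invariance."""
    return amp * np.exp(-((x - centre) ** 2) / (2.0 * width ** 2))

x = np.linspace(0.0, 1.0, 128)
# training set: vary spatial location and intensity, time held fixed
train_inputs = [applied_current(x, c, a)
                for c in np.linspace(0.2, 0.8, 7)
                for a in (0.5, 1.0)]
# test set (harder): the same pulse translated beyond the training range
test_input = applied_current(x, centre=0.9, amp=1.0)
```

Each input profile would be paired with the corresponding FHN solution to form operator-learning training pairs; the test set's translated currents probe whether a trained NO has actually internalised the invariance.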
Results
The results show that CNOs perform well on translated dynamics but require higher training costs. FNOs have the lowest training error but the highest inference time, and they provide less accurate predictions for translated dynamics. DONs and their variants are efficient in training and inference but do not generalize well to the test set.
Implications
The findings suggest that while NOs can capture complex dynamics in the FHN model, there are trade-offs between accuracy, computational efficiency, and generalization. This research could inform future developments in Scientific Machine Learning applications in computational electrophysiology and related fields.
Binary Latent Protein Fitness Landscapes for Quantum Annealing Optimization
Optimization
- Q-BioLat models protein fitness landscapes in binary latent spaces for efficient optimization.
- The framework utilizes pretrained protein language models to create continuous embeddings that are binarized for optimization.
- Empirical results show that Q-BioLat effectively identifies high-fitness protein variants.
- Different optimization strategies exhibit distinct behaviors based on latent space dimensionality.
Summary
The paper introduces Q-BioLat, a novel framework for modeling and optimizing protein fitness landscapes using binary latent representations. The approach begins with protein sequences that are transformed into continuous embeddings via pretrained protein language models. These embeddings are then binarized to create compact representations suitable for quadratic unconstrained binary optimization (QUBO). This formulation allows for efficient combinatorial search using classical optimization techniques such as simulated annealing and genetic algorithms. The authors evaluate Q-BioLat on the ProteinGym benchmark, demonstrating its ability to capture significant structures in protein fitness landscapes and identify high-fitness variants. The results indicate that the method can retrieve sequences that are closely aligned with the top-performing sequences in the training dataset. Furthermore, the study reveals that different optimization strategies yield varying results based on the dimensionality of the latent space, emphasizing the importance of representation design. Q-BioLat not only enhances protein fitness prediction but also establishes a connection between protein representation learning and combinatorial optimization, paving the way for future integration with quantum computing technologies.
Methodology
The methodology involves encoding protein sequences using pretrained protein language models to obtain continuous embeddings, which are then transformed into binary latent representations. These representations allow the formulation of protein fitness as a QUBO problem, enabling the use of combinatorial optimization techniques such as simulated annealing and genetic algorithms for efficient exploration of the fitness landscape.
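The binarisation and combinatorial-search steps can be sketched as follows; the mean-threshold binarisation, the toy QUBO matrix, and the annealing schedule are illustrative stand-ins, not the paper's pipeline.

```python
import numpy as np

def binarise(embedding):
    """Threshold a continuous embedding into a {0,1} latent code."""
    return (embedding > embedding.mean()).astype(int)

def anneal_qubo(Q, steps=2000, seed=0):
    """Minimise the QUBO energy b^T Q b by single-bit-flip simulated
    annealing, the classical stand-in for quantum annealing hardware."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    b = rng.integers(0, 2, n)
    energy = b @ Q @ b
    for t in range(steps):
        i = rng.integers(n)
        b[i] ^= 1                                     # propose one bit flip
        e_new = b @ Q @ b
        temp = max(1.0 - t / steps, 1e-3)
        if e_new <= energy or rng.random() < np.exp((energy - e_new) / temp):
            energy = e_new                            # accept
        else:
            b[i] ^= 1                                 # reject: flip back
    for i in range(n):                                # final greedy sweep
        b[i] ^= 1
        e_new = b @ Q @ b
        if e_new < energy:
            energy = e_new
        else:
            b[i] ^= 1
    return b, energy

Q = np.diag([1.0, -1.0, 1.0, -1.0])                   # toy landscape, optimum b = 0101
bits, energy = anneal_qubo(Q)
```

In the full framework the QUBO matrix would be fitted so that low-energy binary codes correspond to high-fitness protein variants, and the same objective could be handed directly to a quantum annealer.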
Results
The evaluation on the ProteinGym benchmark shows that Q-BioLat captures meaningful structures in protein fitness landscapes, consistently retrieving sequences that rank within the top fraction of the training fitness distribution. The study also highlights the varying effectiveness of different optimization strategies based on the dimensionality of the latent space.
Implications
Q-BioLat provides a significant advancement in the field of protein engineering by enabling structured optimization of protein sequences. Its compatibility with quantum annealing hardware opens new possibilities for quantum-assisted approaches in protein design and optimization, potentially leading to breakthroughs in enzyme engineering and drug discovery.
FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
Efficient ML
- FEAT addresses the O(N^2) complexity issue of traditional LDMs by utilizing linear-complexity encoding methods.
- The model combines local and global attention mechanisms to preserve expressive representations in structured data.
- FEAT incorporates a hybrid structural causal model for improved robustness in pre-training.
- Empirical evaluations show significant performance improvements over existing models on real-world datasets.
Summary
The paper introduces FEAT, a novel linear-complexity foundation model designed to handle extremely large structured data, addressing significant limitations of existing large structured-data models (LDMs). Traditional LDMs struggle with the quadratic complexity of self-attention mechanisms, which restricts their ability to process large datasets effectively. FEAT employs a multi-layer dual-axis encoding architecture that integrates two linear-complexity encoding layers: adaptive-fusion bi-Mamba-2 (AFBM) for local dependencies and convolutional gated linear attention (Conv-GLA) for global memory. This innovative design allows FEAT to maintain expressive representations while scaling linearly with the number of samples. Additionally, the model incorporates a hybrid structural causal model pipeline and a stable reconstruction objective to enhance robustness during pre-training, particularly for heavy-tailed structured distributions. Experimental results demonstrate that FEAT outperforms several baseline models in zero-shot performance across 11 real-world structured datasets, achieving up to 40 times faster inference speeds.
Methodology
FEAT employs a multi-layer dual-axis encoding architecture featuring two linear-complexity layers: AFBM for local dependencies and Conv-GLA for global memory. It also utilizes a hybrid structural causal model pipeline and a stable reconstruction objective tailored for heavy-tailed distributions.
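The complexity argument behind such layers can be illustrated with a causal linear-attention recurrence; this shows only why the cost is O(N) rather than O(N²), and does not reproduce FEAT's Conv-GLA gating or the bi-Mamba-2 block.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Linear-complexity causal attention sketch: carry a running sum of
    outer products k_t v_t^T so each step costs O(d^2), independent of
    sequence length N."""
    state = np.zeros((q.shape[1], v.shape[1]))
    out = np.empty((len(q), v.shape[1]))
    for t in range(len(q)):
        state += np.outer(k[t], v[t])    # update running memory
        out[t] = q[t] @ state            # read it with the current query
    return out

rng = np.random.default_rng(0)
N, d = 32, 8
q, k, v = rng.normal(size=(3, N, d))
y = causal_linear_attention(q, k, v)
```

Full self-attention recomputes all N pairwise scores per step; keeping a fixed-size state instead is what lets such encoders scale linearly with the number of samples.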
Results
FEAT consistently outperformed several representative baselines in zero-shot performance across 11 real-world structured datasets, demonstrating its capability to scale linearly with the number of samples and achieving up to 40 times faster inference.
Implications
The development of FEAT has significant implications for various domains that rely on structured data, such as healthcare, finance, and e-commerce, enabling more efficient and effective modeling of large datasets for tasks like classification, regression, and decision support.
Federated Distributional Reinforcement Learning with Distributional Critic Regularization
Reinforcement Learning
Federated Learning
Robotics
- Introduction of FedDistRL, which federates distributional critics while keeping policies local.
- Development of TR-FedDistRL, a barycentric regularization method that biases critic updates towards a risk-aware reference distribution.
- Empirical demonstration of reduced mean-smearing and improved safety metrics compared to mean-oriented and non-federated baselines.
- Theoretical stability results for the constrained critic update under a Wasserstein metric.
Summary
This paper addresses the challenges of federated reinforcement learning (FRL), particularly the issue of mean-smearing that arises from parameter averaging, which can obscure important statistical properties of value distributions in safety-critical applications. The authors introduce a novel framework called Federated Distributional Reinforcement Learning (FedDistRL), which focuses on federating distributional critics while keeping the policy networks local. They propose a method termed TR-FedDistRL, which constructs a risk-aware Wasserstein barycenter from recent critic outputs to serve as a reference for constraining the critic updates. This approach mitigates the loss of distributional information during the aggregation process, ensuring that multimodal and heavy-tailed return distributions are preserved. The paper provides theoretical stability results for the constrained critic updates and demonstrates empirical improvements in safety metrics across various environments, including bandit problems, multi-agent gridworlds, and continuous highway scenarios.
Methodology
The authors formalize a federated distributional reinforcement learning framework where clients maintain local policy networks and share distributional critics. They implement a trust-region mechanism using a CVaR-weighted Wasserstein barycenter to constrain critic updates, ensuring that critical distributional information is retained during the federated learning process. The methodology includes theoretical analysis of the update stability and empirical validation through experiments in various environments.
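For critics that represent 1-D return distributions by quantile vectors, the Wasserstein-2 barycenter reduces to a weighted average of quantiles, which can be sketched as below; the CVaR weighting used in TR-FedDistRL is an extra layer not shown, and uniform weights are assumed by default.

```python
import numpy as np

def w2_barycenter(client_quantiles, weights=None):
    """Wasserstein-2 barycenter of 1-D distributions given by their quantile
    vectors: the weighted average of the quantile functions."""
    q = np.asarray(client_quantiles, dtype=float)    # (clients, n_quantiles)
    if weights is None:
        weights = np.full(len(q), 1.0 / len(q))
    return np.asarray(weights) @ q

# Two clients whose critics see return distributions shifted by 2 units
client_a = np.array([0.0, 1.0, 2.0])
client_b = np.array([2.0, 3.0, 4.0])
bary = w2_barycenter([client_a, client_b])
```

Averaging quantile functions preserves distributional shape (here, a pure shift), whereas averaging critic network parameters directly can blur multimodal or heavy-tailed structure, which is the mean-smearing the paper targets.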
Results
The experiments conducted demonstrate that TR-FedDistRL significantly reduces mean-smearing effects and improves safety proxies, such as catastrophe and accident rates, compared to traditional mean-oriented approaches. The proposed method also shows lower critic and policy drift, indicating better alignment with local distributional characteristics.
Implications
The findings suggest that incorporating distributional information in federated reinforcement learning can enhance safety and performance in applications where data privacy is crucial, such as autonomous driving and personal robotics. This work opens avenues for further research into risk-aware federated learning methodologies.
Personalized Fall Detection by Balancing Data with Selective Feedback Using Contrastive Learning
Time Series
- The proposed framework effectively balances fall and non-fall activity data using semi-supervised contrastive learning.
- Personalized models show improved recall and precision compared to traditional models trained on imbalanced datasets.
- The Training from Scratch approach outperforms other retraining strategies, highlighting the importance of tailored data in model training.
- The method simplifies the personalization process by automating sample selection, reducing the need for manual labeling.
Summary
This paper presents a novel framework for personalized fall detection that addresses the challenges posed by imbalanced datasets, particularly the scarcity of fall data compared to non-fall activities. The authors propose a method that integrates semi-supervised clustering with contrastive learning to selectively balance user feedback samples, enhancing the model's sensitivity to fall events. The framework is evaluated through three retraining strategies: Training from Scratch (TFS), Transfer Learning (TL), and Few-Shot Learning (FSL). Real-time experiments conducted with ten participants demonstrate that the TFS approach yields the highest performance, achieving up to a 25% improvement over baseline models, while FSL shows a 7% improvement. The results indicate that personalized models trained with selectively chosen data significantly outperform generalized models, improving recall without increasing false positives, thereby enhancing the practical applicability of fall detection systems in real-world scenarios.
Methodology
The authors employed a semi-supervised clustering approach combined with contrastive learning to selectively choose and balance training data for personalized fall detection models. They evaluated three retraining strategies: Training from Scratch (TFS), Transfer Learning (TL), and Few-Shot Learning (FSL), using feedback data collected from participants during real-time experiments.
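The selection step can be sketched as follows. The greedy farthest-point heuristic below is a simple stand-in for the paper's semi-supervised clustering over contrastive embeddings; `select_balanced` and its Euclidean distance are assumptions for illustration:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_balanced(non_fall, n_fall):
    """Pick n_fall diverse non-fall windows by greedy farthest-point
    selection so the retraining set is class-balanced: start anywhere,
    then repeatedly add the window farthest from everything chosen so far."""
    chosen = [non_fall[0]]
    while len(chosen) < n_fall:
        nxt = max(non_fall, key=lambda p: min(dist(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen
```

Any such rule that matches the minority fall count while covering the majority class's variability automates the balancing that would otherwise require manual labeling.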
Results
The results showed that the TFS approach achieved a 25% improvement over the baseline model, while the FSL approach achieved a 7% improvement. Personalized models trained with selectively chosen activities-of-daily-living (ADL) data demonstrated better precision and recall than both the initial common model and models trained with combined feedback data.
Implications
This research has significant implications for the development of personalized healthcare monitoring systems, particularly for elderly individuals living independently. By improving the accuracy of fall detection systems, the framework can enhance user safety and acceptance, leading to better health outcomes.
PhasorFlow: A Python Library for Unit Circle Based Computing
Theory
Optimization
Time Series
- Introduction of the Phasor Circuit model with a comprehensive gate library.
- Development of Variational Phasor Circuits for classical machine learning optimization.
- Implementation of a Phasor Transformer that enhances token mixing without parameter overhead.
- Validation of PhasorFlow on diverse tasks, showcasing its versatility and efficiency.
Summary
PhasorFlow is an open-source Python library that introduces a novel computational paradigm based on the unit circle S¹, allowing for the encoding of inputs as complex phasors. The library operates through unitary wave interference gates while preserving global norms, enabling algorithms to utilize continuous geometric gradients for predictive learning. The paper presents three main contributions: (1) the formalization of the Phasor Circuit model, which includes a library of 22 gates for various operations; (2) the introduction of Variational Phasor Circuits (VPC) that optimize continuous phase parameters for classical machine learning tasks; and (3) the development of the Phasor Transformer, which replaces traditional attention mechanisms with a DFT-based token mixing layer. The authors validate PhasorFlow across multiple tasks, including non-linear spatial classification and time-series prediction, demonstrating its effectiveness as a deterministic and lightweight alternative to classical neural networks and quantum circuits. PhasorFlow operates on classical hardware while leveraging the mathematical foundations of quantum mechanics.
Methodology
The methodology involves defining a circuit-based programming model where users create circuits of phasor threads and apply gate operations. The library supports analytical evaluation through matrix multiplication, ensuring deterministic outputs. The authors also introduce trainable phasor circuits and a novel token mixing layer inspired by existing architectures.
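A minimal sketch of the unit-circle paradigm, assuming a phase encoding e^{i2πx}, a unitary phase gate, and a normalized DFT for token mixing; the actual PhasorFlow gate library and API are not reproduced here:

```python
import cmath

def encode(x):
    """Encode a real feature in [0, 1) as a unit-circle phasor e^{i 2*pi*x}."""
    return cmath.exp(2j * cmath.pi * x)

def phase_gate(z, theta):
    """A single-thread unitary gate: rotate the phasor by theta (norm-preserving)."""
    return z * cmath.exp(1j * theta)

def dft_mix(tokens):
    """Parameter-free token mixing: a normalized (unitary) DFT across the
    token dimension, so the global L2 norm is preserved."""
    n = len(tokens)
    return [sum(t * cmath.exp(-2j * cmath.pi * k * m / n)
                for m, t in enumerate(tokens)) / n ** 0.5
            for k in range(n)]
```

The norm-preservation property is what lets such circuits be stacked deterministically without the activations exploding or vanishing.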
Results
PhasorFlow was validated on various tasks, including non-linear spatial classification, time-series prediction, and financial volatility detection. The results indicate that unit circle computing provides a robust framework for handling complex spatio-temporal dynamics, outperforming traditional methods in certain applications.
Implications
PhasorFlow has potential applications in fields requiring the analysis of oscillatory or cyclical data, such as neuroscience, finance, and systems biology. Its deterministic nature and compatibility with classical hardware make it an attractive alternative for developing efficient machine learning models.
Objective Mispricing Detection for Shortlisting Undervalued Football Players via Market Dynamics and News Signals
NLP
- Introduces a reproducible framework for detecting undervalued football players based on objective mispricing.
- Combines structured market data with NLP-derived signals from news articles to improve player valuation.
- Demonstrates that market dynamics are the primary indicators of undervaluation, with NLP features providing additional insights.
- Utilizes SHAP analyses for interpretability, enhancing trust in the model's recommendations.
Summary
This paper presents a novel framework for identifying undervalued football players based on objective mispricing, moving away from subjective expert evaluations. The authors estimate an expected market value using structured data, including historical market dynamics and player features, and compare it to observed valuations to define mispricing. The study integrates Natural Language Processing (NLP) features derived from news articles to enhance the identification process. A gradient-boosted regression model is employed to explain market value variance, and a chronological evaluation method is used to avoid data leakage. The findings indicate that market dynamics are the primary signal for undervaluation, while NLP features provide supplementary benefits that enhance robustness and interpretability. The proposed framework emphasizes ranking and shortlisting players rather than binary classification, making it applicable for scouting workflows. The paper also includes a reproducibility and ethics statement, highlighting its practical implications for decision support in football analytics.
Methodology
The authors developed a leakage-aware framework that estimates expected market values without using current valuations as inputs. They employed gradient-boosted regression for modeling and assessed the contribution of NLP features through ROC-AUC-based ablation studies. SHAP analyses were used to interpret model outputs and understand the influence of various features.
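Once the expected-value model exists, mispricing reduces to a residual in log space. A minimal sketch, assuming model predictions are already available (the gradient-boosted regressor itself, and the player names, are omitted/hypothetical):

```python
import math

def mispricing_scores(observed, expected):
    """Mispricing = log(expected) - log(observed); a positive score means the
    market prices the player below the model's expectation (undervalued)."""
    return {p: math.log(expected[p]) - math.log(observed[p]) for p in observed}

def shortlist(observed, expected, k=2):
    """Rank players by mispricing and return the top-k undervalued names,
    matching the paper's emphasis on ranking over binary classification."""
    scores = mispricing_scores(observed, expected)
    return [p for p, _ in sorted(scores.items(),
                                 key=lambda kv: kv[1], reverse=True)[:k]]
```

Working in log space makes a 2x undervaluation of a EUR 5M player and of a EUR 50M player score the same, which is usually the desired scouting behavior.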
Results
The model explained a significant portion of the variance in log-transformed market values. The ablation studies revealed that while market dynamics were the dominant signal for identifying undervalued players, the inclusion of NLP features consistently improved the model's robustness and interpretability. SHAP analyses highlighted the importance of market trends and player age, with news-derived signals amplifying insights in uncertain contexts.
Implications
The framework has practical applications in football scouting and player valuation, providing teams with a data-driven approach to identify undervalued players. By integrating market dynamics with news signals, the model enhances decision-making processes in transfer negotiations and squad planning.
OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning
Multimodal
Large Language Models
Interpretability
- OMNIFLOW is the first training-free framework for generalized fluid physical reasoning using LLMs.
- The architecture enables zero-shot generalization to different governing equations with high prediction accuracy.
- OMNIFLOW generates interpretable structured analysis reports, enhancing scientific discovery and decision-making.
Summary
The paper introduces OMNIFLOW, a neuro-symbolic architecture designed to enhance the reasoning capabilities of large language models (LLMs) in the context of scientific problems governed by Partial Differential Equations (PDEs). Traditional LLMs often struggle with continuous spatiotemporal dynamics, leading to non-physical outputs. OMNIFLOW addresses this issue by grounding LLMs in fundamental physical laws without requiring domain-specific fine-tuning. The architecture features a Semantic-Symbolic Alignment mechanism that translates high-dimensional flow tensors into topological linguistic descriptors, allowing the model to better understand physical structures. Additionally, the Physics-Guided Chain-of-Thought (PG-CoT) workflow facilitates reasoning through dynamic constraint injection and iterative verification. The authors evaluate OMNIFLOW on benchmarks related to microscopic turbulence, theoretical Navier-Stokes equations, and global weather forecasting, demonstrating superior performance in zero-shot generalization and few-shot adaptation tasks compared to traditional deep learning models. OMNIFLOW not only improves prediction accuracy but also provides interpretable reasoning reports, marking a significant advancement in scientific reasoning and decision support.
Methodology
OMNIFLOW employs a neuro-symbolic mechanism that decouples physical computation from cognitive reasoning. It utilizes a Visual Symbolic Projector to convert raw flow data into semantic tokens and implements a Physics-Guided Chain-of-Thought (PG-CoT) for reasoning, which includes dynamic retrieval of physical knowledge and consistency checks during output generation.
Results
Empirical evaluations show that OMNIFLOW significantly outperforms traditional deep learning baselines in zero-shot generalization and few-shot adaptation tasks across various scientific benchmarks, achieving prediction accuracy comparable to specialized models while providing transparent and interpretable reasoning.
Implications
The advancements presented by OMNIFLOW could revolutionize how scientific reasoning is conducted, enabling more accurate and interpretable models for fluid dynamics and other scientific domains, thereby enhancing decision support systems in research and industry.
Causal Representation Learning on High-Dimensional Data: Benchmarks, Reproducibility, and Evaluation Metrics
Theory
- Causal representation learning (CRL) models are essential for understanding causal relationships in high-dimensional data.
- The paper critiques existing datasets and proposes characteristics for ideal datasets in CRL development.
- An integrated evaluation framework is introduced to consolidate multiple performance metrics into a single score.
- Reproducibility is highlighted as a critical issue, with recommendations for best practices in sharing code and results.
Summary
This paper addresses the challenges in causal representation learning (CRL) for high-dimensional data, focusing on the need for robust evaluation metrics and reproducibility in model development. CRL aims to transform high-dimensional data into a latent space that captures causal relationships, facilitating counterfactual reasoning and interventions. The authors analyze existing synthetic and real-world datasets used in CRL, highlighting their limitations and proposing essential characteristics for effective dataset design. They also introduce a unified aggregate metric to evaluate model performance across multiple dimensions, including reconstruction, disentanglement, causal discovery, and counterfactual reasoning. Furthermore, the paper reviews existing implementations for reproducibility, identifying gaps and best practices. The findings emphasize the importance of comprehensive evaluation frameworks and the need for publicly available source code to enhance reproducibility in CRL research.
Methodology
The authors conducted a comprehensive review of existing literature on CRL, critically analyzing synthetic and real-world datasets, and proposing a set of characteristics for effective dataset design. They developed an integrated evaluation framework that aggregates multiple performance metrics into a single score for holistic model assessment.
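One way such an aggregate can be formed is a weighted geometric mean of normalized sub-metrics; this is an illustrative construction under the assumption that each sub-metric is scaled to (0, 1], not necessarily the paper's exact formula:

```python
import math

def aggregate_score(metrics, weights=None):
    """Hypothetical aggregate CRL score: weighted geometric mean of
    sub-metrics (reconstruction, disentanglement, causal discovery,
    counterfactual reasoning), each normalized to (0, 1]. Unlike an
    arithmetic mean, the geometric mean penalizes a model that fails
    badly on any single axis."""
    if weights is None:
        weights = {k: 1.0 for k in metrics}
    total = sum(weights.values())
    return math.exp(sum(weights[k] * math.log(metrics[k]) for k in metrics)
                    / total)
```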
Results
The analysis revealed significant limitations in current datasets used for CRL, and the proposed aggregate metric allows for a more comprehensive evaluation of model performance across different dimensions. The review of existing implementations underscored the lack of reproducibility in the field, with many studies failing to provide publicly available source code.
Implications
The findings of this paper can guide future research in CRL by establishing standards for dataset design and evaluation metrics. Improved reproducibility practices can enhance the reliability of CRL models, making them more applicable in real-world scenarios where causal inference is critical.
Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization
Optimization
Interpretability
- Integration of AutoML with deep unfolding for waveform optimization.
- Achieves high spectral efficiency with significantly fewer training samples.
- Introduces a hybrid layer for learnable gradient transformation.
- Addresses gradient normalization for improved training consistency.
Summary
This paper presents a novel approach that integrates automated machine learning (AutoML) with model-based deep unfolding (DU) to optimize wireless beamforming and waveform generation. The proposed method, termed Auto-Unrolled Proximal Gradient Descent (Auto-PGD), transforms the iterative proximal gradient descent (PGD) algorithm into a deep neural network, allowing for the learning of layer parameters rather than relying on predetermined values. A key innovation is the introduction of a hybrid layer that performs a learnable linear gradient transformation before the proximal projection. The architecture is optimized using AutoGluon with a Tree-structured Parzen Estimator (TPE) for hyperparameter optimization across a broad search space, including network depth and learning rate schedules. The results demonstrate that Auto-PGD achieves 98.8% of the spectral efficiency of a traditional 200-iteration PGD solver using only five unrolled layers and just 100 training samples. The study also addresses gradient normalization issues to ensure consistent performance and introduces per-layer sum-rate logging for enhanced interpretability. Overall, this work significantly reduces training-data requirements and inference costs while maintaining high interpretability compared to conventional black-box models.
Methodology
The methodology involves converting the iterative PGD algorithm into a deep neural network format, where the parameters are learned. The architecture is optimized using AutoGluon for hyperparameter tuning, allowing for automatic discovery of optimal configurations. The study also incorporates a hybrid layer for gradient transformation and addresses normalization issues to maintain performance.
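The unrolling idea can be sketched on a toy unit-power beamforming problem: each "layer" is one gradient step with its own step size followed by a proximal projection. The per-layer step sizes stand in for the learned parameters, and the toy objective and `project_unit_norm` helper are illustrative assumptions:

```python
import math

def project_unit_norm(x):
    """Proximal step for a unit-power waveform constraint: project onto
    the unit sphere."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n > 0 else x

def unrolled_pgd(grad, x0, step_sizes):
    """K unrolled PGD layers; layer k applies its own (in Auto-PGD, learned)
    step size before the proximal projection, mirroring the deep-unfolded
    architecture in which iterations become network layers."""
    x = list(x0)
    for mu in step_sizes:          # one 'layer' per step size
        g = grad(x)
        x = [xi - mu * gi for xi, gi in zip(x, g)]
        x = project_unit_norm(x)
    return x
```

With only a handful of layers, a well-tuned step schedule can reach nearly the same fixed point as hundreds of plain PGD iterations, which is the efficiency claim the paper quantifies.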
Results
The Auto-PGD method achieves 98.8% of the spectral efficiency of a traditional PGD solver while utilizing only five unrolled layers and 100 training samples. This represents a significant reduction in training overhead compared to conventional deep learning approaches.
Implications
The proposed approach has potential applications in real-time optimization for 6G wireless networks, where low-latency and high reliability are critical. It also demonstrates the feasibility of combining AutoML with structured models to enhance interpretability and efficiency in signal processing tasks.
What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover
Interpretability
Multimodal
Efficient ML
- Introduces a functional interpretability framework for GAEF embeddings.
- Identifies a hierarchical organization of embedding dimensions based on their roles.
- Demonstrates that high classification accuracy can be achieved with only a few dimensions.
- Highlights the redundancy in the embedding space, suggesting potential for computational efficiency.
Summary
This paper presents a novel framework for interpreting the embeddings generated by Google AlphaEarth Foundations (GAEF), a geospatial foundation model designed for global land cover classification. The authors investigate the hierarchical structure of the embedding space, revealing that dimensions can be categorized based on their functional roles. Through large-scale experimentation and structural analysis, the study identifies specialist, low-, mid-, and high-generalist dimensions, each contributing differently to land cover classification. Remarkably, the findings indicate that high classification accuracy (up to 98% of baseline performance) can be achieved using only a subset of the available dimensions, highlighting redundancy in the embedding space. This work not only enhances the interpretability of GAEF embeddings but also provides practical guidance for dimension selection, which can lead to reduced computational costs and improved operational efficiency in land cover mapping tasks.
Methodology
The authors employed a combination of large-scale experimentation and structural analysis to investigate the relationships between embedding dimensions and land cover classes. They utilized feature importance patterns and progressive ablation techniques to characterize the contributions of different dimensions to classification tasks.
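The ranking-and-ablation procedure can be illustrated with a simple between-class-separation score; the helpers below are hypothetical stand-ins for the paper's feature-importance analysis, not its implementation:

```python
def dim_importance(X, y):
    """Score each embedding dimension by the between-class spread of its
    class-conditional means (a crude importance proxy)."""
    classes = sorted(set(y))
    d = len(X[0])
    scores = []
    for j in range(d):
        means = [sum(x[j] for x, c in zip(X, y) if c == cl) /
                 sum(1 for c in y if c == cl) for cl in classes]
        mu = sum(means) / len(means)
        scores.append(sum((m - mu) ** 2 for m in means))
    return scores

def top_dims(X, y, k):
    """Progressive ablation keeps only the k most discriminative dimension
    indices; the paper's finding is that small k already recovers most
    of the 64-dimensional baseline accuracy."""
    s = dim_importance(X, y)
    return sorted(range(len(s)), key=lambda j: -s[j])[:k]
```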
Results
The study found that embedding dimensions exhibit consistent functional behavior, allowing them to be categorized into a hierarchical spectrum. The results showed that accurate land cover classification could be achieved using as few as 2 to 12 dimensions out of the 64 available, indicating significant redundancy and potential for computational savings.
Implications
The findings have important implications for the use of geospatial foundation models in various applications, including climate adaptation, ecological monitoring, and infrastructure planning. By improving the interpretability of embeddings, the study facilitates their integration into scientific decision-making processes, enhancing the reliability of environmental assessments.
ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery
Federated Learning
- ARES enables high-fidelity reconstruction of training samples from large batches without architectural modifications.
- The attack formulates the recovery problem as a noisy sparse recovery task using Lasso.
- The incorporation of the imprint method allows for scalable reconstruction of individual samples.
- Theoretical guarantees are established for the recovery rate and reconstruction error.
Summary
The paper introduces ARES (Activation REcovery via Sparse inversion), a novel gradient inversion attack (GIA) tailored for Federated Learning (FL) environments. While FL aims to protect user privacy by sharing model updates instead of raw data, it has been shown that these updates can leak sensitive information through GIAs. Existing active GIAs often require architectural modifications, limiting their practical use. ARES addresses this limitation by reconstructing training samples from large batches without altering the model architecture. The authors reformulate the recovery problem as a noisy sparse recovery task and utilize the generalized Least Absolute Shrinkage and Selection Operator (Lasso) to solve it. ARES also employs the imprint method to disentangle activations, allowing for scalable per-sample reconstruction. The paper provides theoretical guarantees regarding the expected recovery rate and upper bounds on reconstruction error. Extensive experiments demonstrate that ARES achieves high-fidelity reconstructions across various datasets, significantly outperforming previous GIAs, especially under large batch sizes. The findings emphasize the privacy risks posed by intermediate activations in FL, highlighting the need for improved defenses against such attacks.
Methodology
ARES reformulates the gradient inversion problem as a noisy sparse recovery task, leveraging the generalized Lasso for solving it. The method incorporates the imprint technique to disentangle activations, facilitating the reconstruction of individual training samples from large batches without requiring changes to the model architecture.
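The sparse-recovery formulation can be illustrated with a basic ISTA (iterative soft-thresholding) Lasso solver; this is a generic sketch of noisy sparse recovery, not ARES's generalized-Lasso implementation:

```python
def soft(v, t):
    """Soft-thresholding: the proximal operator of the L1 penalty."""
    return max(v - t, 0.0) if v > 0 else min(v + t, 0.0)

def ista(A, b, lam=0.05, step=0.1, iters=500):
    """Minimize ||Ax - b||^2 + lam * ||x||_1 by iterative soft thresholding.
    In the ARES setting, x would be the sparse activation vector to recover
    and (A, b) would be derived from the shared gradients."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i]
             for i in range(len(A))]                    # residual Ax - b
        g = [2 * sum(A[i][j] * r[i] for i in range(len(A)))
             for j in range(n)]                         # gradient of the LS term
        x = [soft(x[j] - step * g[j], step * lam) for j in range(n)]
    return x
```

The L1 penalty drives the non-active coordinates exactly to zero, which is what lets per-sample activations be disentangled from a large-batch aggregate.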
Results
The experiments conducted on CNNs and MLPs demonstrate that ARES achieves high-fidelity reconstructions across diverse datasets, outperforming previous gradient inversion attacks, particularly in scenarios involving large batch sizes. The theoretical framework provides guarantees on recovery rates and reconstruction errors.
Implications
The findings underscore the significant privacy risks associated with intermediate activations in Federated Learning, indicating an urgent need for stronger privacy-preserving mechanisms. ARES could inform the development of more robust defenses against gradient inversion attacks in FL systems.
Beyond Reward Suppression: Reshaping Steganographic Communication Protocols in MARL via Dynamic Representational Circuit Breaking
Reinforcement Learning
Theory
Optimization
- Introduces DRCB as a novel defense mechanism against steganographic collusion in MARL.
- Demonstrates that existing static monitoring techniques are ineffective in reducing collusion.
- Shows significant improvements in observer accuracy and reduced volatility under DRCB governance.
- Highlights the Transparency Paradox, where agents achieve predictability while retaining covert communication capabilities.
Summary
This paper addresses the issue of steganographic collusion in decentralized Multi-Agent Reinforcement Learning (MARL), where agents develop covert communication protocols that evade monitoring while maximizing joint utility. Existing defenses focus on behavioral outputs or rewards, failing to detect covert coordination in high-dimensional communication channels. The author introduces the Dynamic Representational Circuit Breaker (DRCB), a multi-layered defense that operates at the optimization level, utilizing the AI Mother Tongue (AIM) framework to transform agent messages into auditable statistical objects. DRCB monitors three signals to compute a unified Collusion Score, triggering interventions when thresholds are breached. Experiments on a Contextual Prisoner’s Dilemma demonstrate that static monitoring is ineffective, while DRCB significantly improves observer accuracy and reduces volatility without sacrificing mean joint reward. The findings suggest that monitoring reshapes communication protocols rather than eliminating collusion, leading to a Transparency Paradox where agents maintain covert capacities despite surface-level predictability. This work establishes a path for MICA-compliant auditing applicable to systems involving neural networks, reinforcement learning, and inter-agent communication.
Methodology
The study employs the Dynamic Representational Circuit Breaker (DRCB) framework, which monitors communication channels as statistical objects rather than semantic carriers. It utilizes the AI Mother Tongue (AIM) framework to compress agent intents into discrete symbols, allowing for granular monitoring of token usage frequencies, transition matrix stability, and policy-symbol covariance. Upon detecting collusion, DRCB triggers a four-layer intervention process to disrupt collusive behaviors.
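Two of the three monitored signals can be sketched directly on symbol streams; the entropy/drift combination below is an illustrative Collusion Score (the paper's third signal, policy-symbol covariance, and its thresholds are omitted):

```python
import math
from collections import Counter

def token_entropy(tokens):
    """Shannon entropy of symbol usage; a collapsing entropy can signal an
    emerging covert code."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def transition_drift(tokens_a, tokens_b, vocab):
    """L1 distance between bigram transition frequencies of two windows;
    sudden protocol reshaping shows up as drift."""
    def freq(ts):
        c = Counter(zip(ts, ts[1:]))
        total = max(1, len(ts) - 1)
        return {p: c.get(p, 0) / total
                for p in [(a, b) for a in vocab for b in vocab]}
    fa, fb = freq(tokens_a), freq(tokens_b)
    return sum(abs(fa[p] - fb[p]) for p in fa)

def collusion_score(tokens_prev, tokens_now, vocab, max_entropy):
    """Illustrative unified score: entropy collapse plus transition drift.
    A circuit breaker would intervene when this crosses a threshold."""
    collapse = 1.0 - token_entropy(tokens_now) / max_entropy
    return collapse + transition_drift(tokens_prev, tokens_now, vocab)
```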
Results
Experiments reveal that static monitoring does not significantly reduce joint reward (p = 0.3517), while DRCB governance improves observer accuracy from 0.858 to 0.938 (+9.3%), raises worst-case accuracy 11.7-fold, and reduces observer volatility by 43%, all while preserving mean joint reward (p = 0.854). Policy-Symbol Covariance analysis indicates forced Semantic Degradation, confirming DRCB's effectiveness in preventing complex steganographic encodings.
Implications
The findings suggest a need to rethink monitoring strategies in MARL systems, emphasizing the importance of structural interventions over surface-level oversight. The DRCB framework provides a technical pathway for enhancing safety in multi-agent systems, potentially applicable to various domains involving neural networks and reinforcement learning.
DISCOVER: A Solver for Distributional Counterfactual Explanations
Optimization
Interpretability
- DISCOVER is a model-agnostic solver that preserves the DCE objective while avoiding gradient-based optimization.
- The method utilizes a sparse propose-and-select search to focus on the most influential samples for counterfactual generation.
- An OT-guided cone sampling technique enhances the efficiency of candidate generation without relying on predictor gradients.
- The approach successfully extends distributional counterfactual reasoning to non-differentiable models, making it applicable to a wider range of real-world scenarios.
Summary
The paper introduces DISCOVER, a model-agnostic solver for Distributional Counterfactual Explanations (DCE), which aims to provide insights into model decisions by identifying input modifications that lead to different predictions. Unlike traditional counterfactual explanations that focus on individual instances, DCE optimizes an objective that balances the proximity to a factual input distribution with alignment to a target output distribution, while ensuring statistical certification through chance-constrained bounds. The challenge with existing DCE methods is their reliance on gradient-based optimization, which is unsuitable for many non-differentiable models prevalent in real-world applications. DISCOVER addresses this limitation by employing a sparse propose-and-select search paradigm, which allows for efficient exploration of the solution space without requiring gradients. The methodology includes a sample-wise decomposition of the transport objective to compute impact scores for individual samples, focusing interventions on the most influential ones. Additionally, an OT-guided cone sampling technique is introduced to generate candidate counterfactual distributions effectively. Experimental results demonstrate that DISCOVER achieves strong alignment of input and output distributions across various tabular datasets, thus extending the applicability of distributional counterfactual reasoning to modern black-box learning pipelines.
Methodology
DISCOVER employs a sparse propose-and-select search paradigm to generate counterfactual distributions. It utilizes a sample-wise decomposition of the optimal transport objective to compute per-row impact scores and enforces a top-k intervention budget. The method also introduces an OT-guided cone sampling primitive to guide candidate generation based on input-side transport geometry, allowing for efficient exploration of the solution space without gradients.
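In one dimension the sample-wise decomposition is easy to see: equal-weight OT matches sorted samples, so the transport objective splits into per-sample squared costs. The sketch below uses that special case to illustrate impact scoring and the top-k budget (the helpers are assumptions; DISCOVER operates on full tabular distributions):

```python
def per_sample_transport_cost(source, target):
    """1-D OT with equal weights matches sorted samples; the objective then
    decomposes into per-sample squared costs, which serve as impact scores."""
    order = sorted(range(len(source)), key=lambda i: source[i])
    tgt = sorted(target)
    costs = [0.0] * len(source)
    for rank, i in enumerate(order):
        costs[i] = (source[i] - tgt[rank]) ** 2
    return costs

def top_k_interventions(source, target, k):
    """Enforce a top-k intervention budget: move only the k most influential
    samples to their matched target positions, leaving the rest untouched."""
    costs = per_sample_transport_cost(source, target)
    chosen = sorted(range(len(source)), key=lambda i: -costs[i])[:k]
    tgt = sorted(target)
    order = sorted(range(len(source)), key=lambda i: source[i])
    rank_of = {i: r for r, i in enumerate(order)}
    new = list(source)
    for i in chosen:
        new[i] = tgt[rank_of[i]]
    return new, chosen
```

Note that nothing here touches predictor gradients, which is the point: the same propose-and-select logic applies to non-differentiable models.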
Results
The experiments conducted on multiple tabular datasets demonstrate that DISCOVER achieves effective alignment of input and output distributions, outperforming traditional gradient-based methods in scenarios involving non-differentiable models. The results indicate that DISCOVER can successfully generate distributional counterfactual explanations that are both statistically certified and interpretable.
Implications
The development of DISCOVER has significant implications for explainable AI, particularly in applications requiring population-level interventions, such as policy-making and financial decision-making. By enabling effective distributional counterfactual reasoning in non-differentiable settings, DISCOVER can facilitate more informed decision-making processes across various domains.
Symmetry-Reduced Physics-Informed Learning of Tensegrity Dynamics
Theory
Efficient ML
Robotics
- Introduces SymPINN, a framework that incorporates geometric symmetries into tensegrity dynamics modeling.
- Reduces computational complexity by using a symmetry basis for nodal coordinates.
- Ensures predicted configurations satisfy symmetry constraints through symmetry transformations.
- Demonstrates improved prediction accuracy and efficiency in numerical experiments.
Summary
This paper introduces a novel framework called Symmetry-Reduced Physics-Informed Neural Network (SymPINN) aimed at improving the modeling of tensegrity dynamics by explicitly incorporating geometric symmetries. Tensegrity structures, known for their lightweight and mechanically efficient designs, exhibit intrinsic symmetries that influence their dynamic behavior. Traditional physics-informed neural networks (PINNs) often overlook these symmetries, leading to increased computational complexity and optimization challenges. The SymPINN framework addresses these issues by embedding group-theory-based symmetry into the neural network architecture and solution representation. By decomposing nodes into symmetry orbits and using a symmetry basis for nodal coordinates, the method constructs a reduced coordinate representation that maintains the geometric symmetry of the structure. The full coordinates are recovered through symmetry transformations, ensuring that predicted configurations adhere to symmetry constraints. The framework enforces equivariance through orbit-based coordinate generation, symmetry-consistent message passing, and physics residual constraints. Additionally, it enhances training effectiveness by encoding initial conditions as hard constraints, utilizing Fourier feature encoding for dynamic motion representation, and implementing a two-stage optimization strategy. Numerical experiments on symmetric T-bars and lander structures demonstrate that SymPINN significantly improves prediction accuracy and computational efficiency compared to standard PINN models, showcasing the potential of symmetry-aware learning for modeling tensegrity dynamics.
Methodology
The SymPINN framework employs group-theory-based symmetry to create a reduced coordinate representation of tensegrity structures. It uses orbit-based coordinate generation and symmetry-consistent message passing to enforce equivariance. The framework integrates physics-informed loss functions to ensure compliance with governing physical laws while training the neural network in a reduced coordinate space.
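The reduced-coordinate idea can be sketched for a planar structure with cyclic rotational symmetry: the network only predicts one representative node per orbit, and the full configuration is recovered by applying the group, so the symmetry constraint holds by construction. The C_n example below is an illustrative assumption:

```python
import math

def rot(theta, p):
    """Rotate a 2-D point by angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def recover_full(representatives, order=3):
    """Recover all nodal coordinates from one representative per symmetry
    orbit by applying the cyclic group C_order. Predictions made in this
    reduced space automatically satisfy the symmetry constraint, and the
    learnable coordinate count drops by a factor of `order`."""
    full = []
    for p in representatives:
        for k in range(order):
            full.append(rot(2 * math.pi * k / order, p))
    return full
```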
Results
Extensive numerical experiments indicate that the SymPINN framework significantly outperforms standard PINN models in terms of prediction accuracy and computational efficiency when applied to symmetric T-bars and lander structures.
Implications
The proposed SymPINN framework has the potential to enhance the modeling and analysis of tensegrity structures in various fields, including aerospace engineering, robotics, and architectural design, by providing a more efficient and accurate approach to capturing their dynamic behavior.
SENSE: Efficient EEG-to-Text via Privacy-Preserving Semantic Retrieval
NLP
Large Language Models
Multimodal
- Introduces a lightweight EEG-to-text framework that avoids LLM fine-tuning.
- Utilizes a CLIP-aligned EEG representation for semantic grounding and keyword inference.
- Ensures privacy by keeping raw EEG data on-premises and only sharing extracted keywords.
- Achieves comparable or improved performance over fine-tuned LLMs in generating text from EEG signals.
Summary
The paper presents SENSE, a novel framework designed to decode brain activity from electroencephalography (EEG) into natural language text without the need for fine-tuning large language models (LLMs). Traditional Brain-Computer Interface (BCI) methods often require extensive computational resources and raise privacy concerns due to the sensitive nature of neural data. SENSE addresses these issues by employing a two-stage process: first, it performs on-device semantic retrieval to extract a non-sensitive Bag-of-Words (BoW) representation from EEG signals, and then it utilizes this representation to prompt an off-the-shelf LLM for text generation. The EEG-to-keyword module is lightweight, containing approximately 6 million parameters, and operates entirely on-device, ensuring that raw EEG data remains local. The framework was evaluated using a 128-channel EEG dataset from six subjects, demonstrating that SENSE can achieve generative quality comparable to fully fine-tuned models like THOUGHT2TEXT while significantly reducing computational overhead. This approach not only enhances accessibility and scalability but also ensures privacy by limiting external interactions to abstract semantic cues.
Methodology
The SENSE framework decouples the EEG-to-text decoding process into two main stages: (1) EEG-to-keyword extraction, where EEG signals are mapped to a discrete textual space using a multichannel EEG encoder aligned with a pretrained CLIP visual encoder, and (2) prompt-based text generation, where extracted keywords are used to prompt an off-the-shelf LLM for natural language synthesis. This modular approach allows for efficient processing and privacy preservation.
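The two-stage decoupling above can be sketched with stand-in embeddings. Everything here (the vocabulary, dimensions, random "encoder" output, and prompt template) is an illustrative assumption; only the pipeline shape, retrieval of keywords by cosine similarity followed by prompting, mirrors the description.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["face", "house", "tool", "animal"]
text_emb = rng.normal(size=(len(vocab), 8))   # stand-in keyword embeddings
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

def eeg_to_keywords(eeg_emb, k=2):
    """Stage 1 (on-device): retrieve top-k keywords by cosine
    similarity; only these non-sensitive cues leave the device."""
    e = eeg_emb / np.linalg.norm(eeg_emb)
    sims = text_emb @ e
    top = np.argsort(-sims)[:k]
    return [vocab[i] for i in top]

def build_prompt(keywords):
    """Stage 2: prompt an off-the-shelf LLM with the extracted cues."""
    return "Describe a scene involving: " + ", ".join(keywords)

eeg_emb = text_emb[0] + 0.1 * rng.normal(size=8)  # stand-in encoder output
kws = eeg_to_keywords(eeg_emb)
print(build_prompt(kws))
```

Because only the keyword list crosses the device boundary, the raw EEG signal and its full embedding never need to be shared.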
Results
The evaluation on a public 128-channel EEG dataset showed that the prompt-based generation from EEG-derived keywords achieved quality comparable to, and in some cases exceeding, that of fine-tuned models like THOUGHT2TEXT. The SENSE framework demonstrated significant reductions in computational overhead while maintaining high generative quality.
Implications
The SENSE framework has potential applications in assistive communication for individuals with neurological disorders, lightweight BCI interfaces for everyday use, and brain-driven interactions in augmented and virtual reality environments. Its privacy-preserving design makes it suitable for sensitive applications involving neural data.
Variational Rectification Inference for Learning with Noisy Labels
Theory
Optimization
- Introduces Variational Rectification Inference (VRI) for robust learning with noisy labels.
- Formulates loss rectification as an amortized variational inference problem.
- Utilizes a hierarchical Bayesian model to treat the rectifying vector as a latent variable.
- Demonstrates improved generalization performance and avoids model collapse.
Summary
This paper addresses the challenge of learning from noisy labels, which is prevalent in real-world datasets and can significantly hinder the performance of deep learning models. The authors propose a novel approach called Variational Rectification Inference (VRI), which formulates the rectification of loss functions as an amortized variational inference problem within a meta-learning framework. By treating the rectifying vector as a latent variable, VRI enhances robustness against label noise through a hierarchical Bayesian model. The method employs an amortization meta-network to approximate the conditional posterior of the rectifying vector, thus avoiding model collapse and improving generalization performance. The authors demonstrate that the proposed VRI method can effectively learn rectification strategies without the need for manual specification of weighting functions or hyperparameter tuning, which are common limitations in existing methods. Theoretical analyses support the efficiency of the meta-network learning process, and extensive experiments validate the effectiveness of VRI, particularly in scenarios with open-set noise.
Methodology
The methodology involves constructing a hierarchical Bayesian model where the rectifying vector is treated as a latent variable. The authors derive the evidence lower bound (ELBO) under a meta-learning framework and utilize an amortization meta-network to estimate the posterior distribution of the rectifying vector. This approach allows for rectified predictions via Monte Carlo sampling, enhancing the model's robustness against label noise.
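The Monte Carlo rectification step can be sketched as follows. The Gaussian form of the approximate posterior, the additive action of the rectifying vector on the logits, and all variable names are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rectified_predict(logits, mu, log_var, n_samples=100):
    """Average class probabilities over Monte Carlo samples of the
    latent rectifying vector v ~ q(v) = N(mu, exp(log_var))."""
    std = np.exp(0.5 * log_var)
    probs = []
    for _ in range(n_samples):
        v = mu + std * rng.normal(size=mu.shape)  # sample rectifier
        probs.append(softmax(logits + v))         # rectified logits
    return np.mean(probs, axis=0)

logits = np.array([2.0, 0.5, -1.0])
# Stand-ins for what the amortization meta-network would output:
mu, log_var = np.zeros(3), np.full(3, -2.0)
p = rectified_predict(logits, mu, log_var)
assert np.isclose(p.sum(), 1.0)
```

In the actual method the meta-network conditions `mu` and `log_var` on the input and noisy label, so the amount of rectification adapts per example.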
Results
The results indicate that VRI significantly improves generalization performance compared to existing methods, particularly in the presence of noisy labels. The method effectively mitigates the issues of model collapse and enhances the learning process by leveraging the intrinsic smoothness of data.
Implications
The proposed VRI method has potential applications in various domains where noisy labels are common, such as medical image segmentation and other tasks requiring accurate annotations. It offers a scalable solution for robust learning without the need for extensive manual tuning or assumptions about data distribution.
Discovering the Hidden Role of Gini Index In Prompt-based Classification
NLP
Large Language Models
Optimization
- The Gini Index serves as a valuable tool for detecting and optimizing class accuracy disparities in prompt-based classification.
- Significant relative accuracy imbalances exist in both text and image classification tasks, regardless of dimensionality.
- A post-hoc model-agnostic bias mitigation method based on the Gini Index can effectively reduce accuracy imbalances.
- The proposed method enhances the performance of minority classes while limiting the dominance of frequently seen head classes.
Summary
This paper investigates the Gini Index's role in addressing accuracy imbalances in classification tasks, particularly in the context of prompt-based classification using large language models (LLMs) and vision models. The author identifies that long-tailed minority classes often yield critical predictions but suffer from low accuracy due to their underrepresentation in training data. Traditional methods to mitigate class imbalance typically focus on data-level adjustments, which can be costly and inefficient. Instead, this work proposes a shift towards output-level corrections, leveraging the Gini Index as a metric for detecting and optimizing disparities in class accuracy. The author empirically demonstrates the presence of relative accuracy imbalances across various classification scenarios and introduces a model-agnostic post-hoc bias mitigation method that utilizes the Gini Index. Experimental results across different classification tasks show that this method effectively reduces both relative and absolute accuracy imbalances, thereby improving the performance of minority classes while minimizing the dominance of head classes.
Methodology
The study employs empirical analysis to benchmark Gini scores in real-world LLMs and vision models. It introduces a post-hoc bias mitigation method that utilizes the Gini Index as an optimization metric to address output-level accuracy imbalances in classification tasks.
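The Gini Index over per-class accuracies can be computed as below. The mean-absolute-difference estimator is the standard Gini coefficient; whether the paper uses this exact estimator is an assumption, and the accuracy vectors are made up for illustration.

```python
import numpy as np

def gini(values):
    """Gini coefficient: 0 for perfectly equal values, higher for
    more unequal ones (mean-absolute-difference form)."""
    v = np.asarray(values, dtype=float)
    diffs = np.abs(v[:, None] - v[None, :])
    return diffs.sum() / (2 * len(v) ** 2 * v.mean())

balanced = [0.80, 0.81, 0.79, 0.80]    # head and tail classes alike
imbalanced = [0.95, 0.92, 0.40, 0.15]  # tail classes collapse

print(gini(balanced))    # small (~0.005)
print(gini(imbalanced))  # much larger (~0.30)
```

A post-hoc mitigation method of the kind described can then treat this scalar as the objective to minimize when adjusting output-level class scores.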
Results
The experimental results indicate that the Gini-based bias mitigation method significantly reduces both relative and absolute accuracy imbalances across various classification tasks, leading to improved performance for minority classes and a decrease in the dominance of head classes.
Implications
The findings suggest that leveraging the Gini Index for output-level adjustments can enhance fairness in classification tasks, particularly in critical applications where minority class performance is crucial. This approach could be beneficial in fields such as medical diagnosis, fraud detection, and anomaly identification.
Evaluating Causal Discovery Algorithms for Path-Specific Fairness and Utility in Healthcare
Graph Learning
- Establishment of causal graph benchmarks for synthetic and real-world clinical datasets.
- Evaluation of causal discovery algorithms on structural recovery and path-specific fairness.
- Identification of significant variations in fairness-utility ratios across different algorithms.
- Highlighting the necessity for graph-aware fairness evaluations in clinical applications.
Summary
This paper addresses the challenges of evaluating causal discovery algorithms in healthcare, particularly when the ground truth is unknown. The authors collaborate with domain experts to construct proxy ground-truth graphs for synthetic datasets related to Alzheimer's disease and heart failure. They evaluate three causal discovery algorithms: Peter-Clark, Greedy Equivalence Search, and Fast Causal Inference, focusing on structural recovery and path-specific fairness decomposition. The study emphasizes the importance of graph-aware fairness evaluation and fine-grained path-specific analysis in clinical applications. The results indicate that Peter-Clark excels in structural recovery on synthetic data, while Fast Causal Inference provides the highest utility on heart failure data. The analysis reveals significant variations in the fairness-utility ratio across algorithms, underscoring the need for a nuanced understanding of causal pathways in health disparities.
Methodology
The authors developed a pipeline to evaluate causal discovery algorithms, utilizing expert-driven inputs to establish ground truth causal graphs. They applied various causal discovery algorithms to recover the underlying causal structures and assessed them based on utility and causal fairness metrics, particularly focusing on the causal fairness utility ratio for fine-grained analysis.
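A common structural-recovery metric for comparing a recovered graph against a proxy ground truth is the structural Hamming distance (SHD); that the paper uses SHD specifically is an assumption, and the tiny graphs below are illustrative.

```python
import numpy as np

def shd(true_adj, est_adj):
    """Structural Hamming distance between two directed graphs given
    as adjacency matrices: edge insertions + deletions + reversals."""
    t, e = np.asarray(true_adj), np.asarray(est_adj)
    diff = np.abs(t - e)
    # A reversed edge appears as two mismatches (i->j and j->i);
    # count it as a single error.
    reversed_ = ((t == 1) & (t.T == 0) & (e == 0) & (e.T == 1)).sum()
    return int(diff.sum() - reversed_)

truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])   # X0 -> X1 -> X2
est = np.array([[0, 1, 0],
                [0, 0, 0],
                [0, 1, 0]])    # X1 -> X2 recovered reversed
assert shd(truth, est) == 1
```

Path-specific fairness analysis then goes further than SHD, asking which recovered directed paths from a sensitive attribute to the outcome carry the disparity.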
Results
Peter-Clark achieved the best structural recovery on synthetic data, while Fast Causal Inference yielded the highest utility on heart failure data. The study found that ejection fraction contributed significantly to the indirect effects in the ground truth, and variations in the fairness-utility ratio were observed across different algorithms, emphasizing the importance of detailed path-specific analysis.
Implications
The findings suggest that incorporating path-specific fairness evaluations in causal discovery can enhance the understanding of health disparities and improve the targeting of interventions in clinical settings. This approach can lead to more equitable healthcare outcomes by identifying which causal pathways contribute to disparities.
Determinism in the Undetermined: Deterministic Output in Charge-Conserving Continuous-Time Neuromorphic Systems with Temporal Stochasticity
Theory
Efficient ML
- Development of a unified continuous-time framework for charge-conserving SNNs.
- Establishment of deterministic output under temporal stochasticity through rigorous proof.
- Exact representational correspondence between charge-conserving SNNs and QANNs.
- Demonstration of unique terminal states that are invariant to spike timing.
Summary
This paper addresses the challenge of achieving deterministic computation in asynchronous neuromorphic systems, which are affected by temporal stochasticity. The authors propose a unified continuous-time framework for spiking neural networks (SNNs) that integrates the Law of Charge Conservation (LoCC) with minimal neuron-level constraints. This framework ensures that the terminal state of the system is determined solely by the aggregate input charge, making the output invariant to the timing of spikes. The authors prove that this mapping is invariant in acyclic networks, while recurrent networks may introduce temporal sensitivity. Additionally, they establish a correspondence between charge-conserving SNNs and quantized artificial neural networks (QANNs), allowing for deterministic outputs in continuous-time systems without approximation errors. The work provides a theoretical foundation for designing neuromorphic systems that leverage asynchronous processing while ensuring algorithmic determinism.
Methodology
The authors formulated a model-independent Law of Charge Conservation for continuous-time spiking systems and identified minimal neuron-level design conditions necessary for coupling charge conservation with unique steady-state decoding. They conducted theoretical proofs to demonstrate the invariance of terminal outputs in acyclic networks and established a bidirectional mapping between SNNs and QANNs.
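The terminal-state invariance can be illustrated with a single charge-conserving integrate-and-fire unit: its output spike count depends only on the aggregate injected charge, not on when the charge arrives. This is a didactic stand-in under simplified assumptions, not the paper's neuron model.

```python
def terminal_spikes(charges, threshold=1.0):
    """Accumulate input charge events; emit one spike per threshold
    crossing, carrying the residual charge forward (no charge is
    created or destroyed)."""
    membrane, spikes = 0.0, 0
    for q in charges:
        membrane += q
        while membrane >= threshold:
            membrane -= threshold   # charge passed downstream
            spikes += 1
    return spikes

# Two different temporal orderings of the same aggregate charge (3.0):
early = [0.9, 0.9, 0.9, 0.3]   # bursty arrival
late = [0.3, 0.9, 0.9, 0.9]    # same events, different order
assert terminal_spikes(early) == terminal_spikes(late) == 3
```

In the paper's framework the analogous invariance is proven for acyclic networks of such charge-conserving units, while recurrence can reintroduce timing sensitivity.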
Results
The results show that SNNs adhering to the proposed framework yield unique terminal outputs that depend solely on the aggregate injected charge, independent of the timing of spike events. The correspondence with QANNs indicates that SNNs can achieve inference accuracy comparable to established static models while operating in an event-driven manner.
Implications
This work has significant implications for the design of neuromorphic computing systems, potentially enhancing their reliability and efficiency. It opens avenues for integrating SNNs with existing deep learning frameworks, facilitating the development of more robust and energy-efficient computational models.
On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings
Multimodal
- Introduces a post-hoc mechanism to adjust modality gap in VLMs without retraining.
- Demonstrates that the modality gap significantly affects performance in medical datasets.
- Finds that optimal separation is task-dependent, challenging the notion of universally minimizing the modality gap.
- Highlights the pronounced cone effect in medical domains due to lower diversity in data.
Summary
This paper investigates the 'cone effect' and 'modality gap' in Vision-Language Models (VLMs), particularly in the context of medical applications. The authors introduce a lightweight post-hoc mechanism that allows for the adjustment of cross-modal separation without retraining the models. By controlling a single hyperparameter λ, they systematically analyze the impact of the modality gap on downstream multimodal performance across various medical and natural datasets. The study reveals that while reducing the modality gap generally enhances performance, the optimal level of separation is task-dependent, especially in medical datasets where the cone effect is more pronounced due to lower semantic and visual diversity. The findings suggest that the modality gap can be treated as a tunable property rather than a fixed quantity to minimize, highlighting the need for careful tuning to maintain modality-specific information in medical VLMs.
Methodology
The authors propose a framework that keeps pretrained VLM encoders frozen while allowing for the adjustment of the modality gap through a hyperparameter λ. They evaluate this approach on both generalist and medically specialized models across multiple datasets, employing linear probing to assess performance in supervised multimodal tasks.
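A post-hoc gap adjustment in this spirit can be sketched by shifting each modality toward the midpoint of the two embedding centroids, with a hyperparameter interpolating between the original gap and a fully closed one. The exact mechanism and parameterization in the paper may differ; the data here are synthetic.

```python
import numpy as np

def adjust_gap(img_emb, txt_emb, lam):
    """lam=0 keeps the original gap; lam=1 collapses the centroids.
    Encoders stay frozen; only the embeddings are translated."""
    mu_i, mu_t = img_emb.mean(0), txt_emb.mean(0)
    mid = 0.5 * (mu_i + mu_t)
    return (img_emb + lam * (mid - mu_i),
            txt_emb + lam * (mid - mu_t))

rng = np.random.default_rng(0)
img = rng.normal(size=(50, 16)) + 3.0   # image "cone", offset
txt = rng.normal(size=(50, 16)) - 3.0   # text "cone", offset

for lam in (0.0, 0.5, 1.0):
    i2, t2 = adjust_gap(img, txt, lam)
    gap = np.linalg.norm(i2.mean(0) - t2.mean(0))
    print(f"lam={lam:.1f}  centroid gap={gap:.3f}")
```

Sweeping `lam` and evaluating a linear probe at each value is then enough to locate the task-dependent optimum the paper reports, without any retraining.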
Results
The experiments show that reducing the modality gap consistently improves downstream performance, particularly in medical datasets. However, the results indicate that fully collapsing the gap is not always optimal, and intermediate separation tailored to specific tasks yields the best outcomes.
Implications
The findings suggest that practitioners in medical imaging and related fields should consider the modality gap as a tunable aspect of VLMs, allowing for improved performance in multimodal tasks. This approach can lead to more effective applications of VLMs in clinical settings where data diversity is limited.
Formal verification of tree-based machine learning models for lateral spreading
Theory
Interpretability
- Introduces formal verification via SMT solvers for tree-based geotechnical ML models.
- Formalizes four key geotechnical specifications for model compliance.
- Demonstrates the limitations of post-hoc explainability methods in ensuring model consistency.
- Establishes a verify-fix-verify engineering loop for improving model reliability.
Summary
This paper addresses the challenge of ensuring physical consistency in machine learning models used for geotechnical hazard prediction, particularly in the context of lateral spreading. Traditional methods for model verification, such as SHAP and LIME, provide only approximate diagnostics and do not guarantee compliance with physical specifications. The author introduces a novel approach that utilizes Satisfiability Modulo Theories (SMT) solvers to formally verify tree-based models, specifically XGBoost ensembles and Explainable Boosting Machines (EBMs), against four geotechnical specifications derived from the 2011 Christchurch earthquake dataset. The specifications include thresholds for water table depth, monotonicity of peak ground acceleration (PGA), distance safety, and flat-ground safety. The methodology involves encoding the models as logical formulas and checking these against the specifications across the entire input domain. Results indicate that unconstrained models often violate these specifications, while applying monotonic constraints can improve compliance but may still leave some specifications unfulfilled. The study establishes a 'verify-fix-verify' engineering loop, demonstrating that formal verification can guide the iterative improvement of model consistency, ultimately laying the groundwork for the formal certification of geotechnical ML models prior to deployment.
Methodology
The paper employs Satisfiability Modulo Theories (SMT) solvers to encode trained tree ensembles as logical formulas and checks these against specified geotechnical conditions. The approach formalizes the specifications into quantifier-free satisfiability queries in linear real arithmetic, allowing for comprehensive verification across the input domain.
Results
The study finds that unconstrained XGBoost models violate all four specifications, while constrained models show varying degrees of compliance. A fully constrained EBM satisfies three out of four specifications, demonstrating that iterative application of constraints based on verification results can enhance physical consistency. However, no model variant achieves both high accuracy (>80%) and full compliance with all specifications.
Implications
The findings suggest a critical need for formal verification in the deployment of machine learning models in safety-critical applications, particularly in geotechnical engineering. This approach could lead to improved trust and reliability in ML predictions, facilitating safer operational practices in the field.
FederatedFactory: Generative One-Shot Learning for Extremely Non-IID Distributed Scenarios
Federated Learning
Generative Models
Computer Vision
- FederatedFactory achieves centralized performance in extreme single-class silo scenarios, significantly improving accuracy from 11.36% to 90.57% on CIFAR-10.
- The framework operates with zero dependency on external pre-trained models, relying solely on localized generative priors.
- It utilizes a one-shot communication strategy, enhancing efficiency by avoiding multiple rounds of data transmission.
- The architecture supports exact modular unlearning, allowing for the removal of specific client contributions without data leakage.
Summary
The paper introduces FederatedFactory, a novel framework for Federated Learning (FL) that addresses the challenges posed by extremely non-IID (i.e., not independent and identically distributed) data distributions, particularly in scenarios where clients possess mutually exclusive label sets. Traditional FL methods struggle under these conditions due to conflicting optimization trajectories and reliance on pre-trained foundation models, which can introduce biases. FederatedFactory shifts the focus from aggregating discriminative parameters to exchanging generative priors, allowing for the synthesis of class-balanced datasets in a single communication round. This approach not only eliminates gradient conflicts but also avoids external prior biases, leading to significant improvements in performance across various medical imaging benchmarks. The framework also supports modular unlearning, enabling the deterministic removal of specific generative modules without compromising the overall system integrity.
Methodology
FederatedFactory employs a generative one-shot learning architecture that inverts the traditional federated approach by focusing on localized generative prior parameters instead of discriminative parameters. Clients independently train and transmit generative models, which are then aggregated to synthesize class-balanced datasets from a standard latent space. This method avoids the pitfalls of traditional FL by eliminating the need for overlapping class distributions and external model dependencies.
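The one-shot generative exchange can be sketched with Gaussian "generators" standing in for the paper's generative modules. Each single-class client fits and ships only generator parameters, and the server synthesizes a class-balanced dataset in one round; the Gaussian form and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def client_fit(local_x):
    """Client-side: summarize local data as generative parameters
    (here just a per-feature mean and std). Raw data never leaves."""
    return local_x.mean(0), local_x.std(0)

def server_synthesize(client_params, per_class=100):
    """Server-side: sample an equal number of points per class from
    every transmitted generator (single communication round)."""
    xs, ys = [], []
    for label, (mu, sigma) in enumerate(client_params):
        xs.append(mu + sigma * rng.normal(size=(per_class, mu.size)))
        ys.append(np.full(per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

# Three silos, each holding exactly one (mutually exclusive) class:
silos = [rng.normal(loc=c, size=(200, 5)) for c in (0.0, 4.0, 8.0)]
params = [client_fit(x) for x in silos]
X, y = server_synthesize(params)
assert X.shape == (300, 5)
```

A global classifier trained on `(X, y)` then sees all classes at once, which is why the single-class-silo pathology of gradient averaging never arises. Modular unlearning corresponds to dropping one client's entry from `params` and resynthesizing.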
Results
The evaluations demonstrate that FederatedFactory recovers centralized upper-bound performance across diverse medical imaging benchmarks. Specifically, it improved CIFAR-10 accuracy from 11.36% to 90.57% and restored the ISIC2019 AUROC to 90.57%. These results indicate that the framework effectively addresses the challenges of extreme non-IID data distributions.
Implications
FederatedFactory has significant implications for applications in medical imaging and other fields where data privacy and sovereignty are paramount. By enabling effective learning from highly heterogeneous data distributions without compromising data security, it opens up new avenues for collaborative learning in sensitive domains.
Learning Permutation Distributions via Reflected Diffusion on Ranks
Generative Models
Optimization
- Introduction of Soft-Rank Diffusion for learning permutation distributions.
- Utilization of a continuous soft-rank representation to enable smoother diffusion processes.
- Development of contextualized generalized Plackett–Luce (cGPL) denoisers for enhanced expressivity.
- Demonstrated superior performance on permutation generation tasks compared to existing methods.
Summary
This paper addresses the challenge of learning probability distributions over permutations, which is complicated by the factorial growth of the symmetric group S_n and its discrete structure. The authors propose a novel framework called Soft-Rank Diffusion, which replaces traditional shuffle-based corruption methods with a structured soft-rank forward process. This approach involves lifting permutations to a continuous latent representation by relaxing discrete ranks into soft ranks, allowing for smoother and more manageable trajectories. For the reverse process, the authors introduce contextualized generalized Plackett–Luce (cGPL) denoisers that enhance expressivity for sequential decision-making tasks. The proposed method is evaluated on sorting and combinatorial optimization benchmarks, demonstrating consistent performance improvements over existing permutation diffusion methods, particularly in settings with longer sequences. The results suggest that Soft-Rank Diffusion provides a scalable and principled approach to generative modeling of permutations.
Methodology
The authors propose Soft-Rank Diffusion, which defines a forward noising process as a reflected diffusion bridge in a continuous soft-rank space. This involves representing permutations through continuous soft ranks, allowing for smooth stochastic dynamics. The reverse process is augmented with a hybrid sampler that combines intractable discrete updates with tractable continuous updates in the soft-rank space, followed by a projection back to permutations via sorting.
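The soft-rank relaxation and the projection back to a permutation can be sketched as below. The pairwise-sigmoid relaxation (with temperature `tau`) is one standard way to define soft ranks; whether the paper uses this exact form is an assumption.

```python
import numpy as np

def soft_rank(scores, tau=0.05):
    """Continuous relaxation of the rank of each entry: sum of
    pairwise sigmoids. As tau -> 0 this recovers hard ranks 0..n-1."""
    d = scores[:, None] - scores[None, :]
    return (1.0 / (1.0 + np.exp(-d / tau))).sum(axis=1) - 0.5

def project(latent):
    """Map a (noisy) continuous rank vector back to a permutation
    by sorting, as in the final projection step."""
    return np.argsort(np.argsort(latent))

s = np.array([0.3, 2.0, 1.1, -0.5])
r = soft_rank(s)                       # close to [1, 3, 2, 0]
# Diffusion operates on such continuous trajectories; a small amount
# of noise leaves the decoded permutation unchanged:
noisy = r + 0.1 * np.random.default_rng(0).normal(size=4)
assert project(noisy).tolist() == [1, 3, 2, 0]
```

Because the latent lives in a bounded continuous space, the forward process can be a reflected diffusion with smooth trajectories rather than discrete shuffles.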
Results
Soft-Rank Diffusion consistently outperformed prior permutation diffusion baselines and differentiable sorting methods across various benchmarks, including 4-digit MNIST sorting and the traveling salesperson problem (TSP). The performance improvements were particularly pronounced in scenarios involving longer sequences, indicating the method's scalability and effectiveness.
Implications
The proposed Soft-Rank Diffusion framework has potential applications in various fields requiring permutation modeling, such as ranking systems, recommendation algorithms, and combinatorial optimization problems. Its ability to handle longer sequences effectively can enhance the performance of generative models in these areas.
The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle
Time Series
Efficient ML
Theory
- Introduction of the Phasor Transformer block as a phase-native alternative to dense attention layers.
- Achieves global token mixing with O(N log N) complexity using DFT token coupling.
- Demonstrates competitive performance in time-series forecasting with fewer parameters than traditional Transformers.
- Establishes a new efficiency-performance frontier for long-context temporal modeling.
Summary
This paper introduces the Phasor Transformer, a novel architecture designed to address the computational bottlenecks associated with traditional self-attention mechanisms in Transformers, particularly for long-context time-series data. The Phasor Transformer block operates on the unit-circle manifold, utilizing lightweight trainable phase-shifts and parameter-free Discrete Fourier Transform (DFT) token coupling to achieve global mixing with O(N log N) complexity, circumventing the need for explicit attention maps. The architecture is stacked to form the Large Phasor Model (LPM), which is validated through autoregressive time-series prediction on synthetic multi-frequency benchmarks. The LPM demonstrates a compact parameter budget while effectively learning stable global dynamics and achieving competitive forecasting performance compared to conventional self-attention models. The findings suggest that leveraging geometric constraints and phase computation can lead to scalable and interpretable temporal modeling in oscillatory domains, establishing a new frontier in efficiency-performance trade-offs for time-series forecasting.
Methodology
The paper develops the Phasor Transformer block, which combines trainable phase-shift layers with deterministic global DFT token mixing. The architecture is structured as a deep stack of these blocks, allowing for efficient global context propagation without the need for explicit attention maps. The model is empirically benchmarked against conventional Transformer baselines on time-series tasks to evaluate its efficiency, scalability, and accuracy.
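The core mixing step can be sketched as a trainable per-token phase shift followed by parameter-free DFT token coupling, which an FFT performs in O(N log N). Shapes, the single-channel setting, and the absence of residual wiring are illustrative simplifications.

```python
import numpy as np

def phasor_mix(x, phases):
    """x: (N,) complex tokens on/near the unit circle;
    phases: (N,) trainable phase shifts.
    Phase shift is O(N); the unitary DFT couples every output token
    to every input token in O(N log N), with no attention map."""
    shifted = x * np.exp(1j * phases)         # phase-shift layer
    return np.fft.fft(shifted, norm="ortho")  # global DFT coupling

N = 8
rng = np.random.default_rng(0)
x = np.exp(1j * rng.uniform(0, 2 * np.pi, N))  # unit-circle tokens
phases = rng.uniform(0, 2 * np.pi, N)          # trainable in practice
y = phasor_mix(x, phases)

# Both steps are unitary, so signal energy is preserved exactly.
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```

Stacking such blocks propagates global context at every layer while the per-layer cost stays N log N instead of the N^2 of dense attention.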
Results
The Large Phasor Model (LPM) outperforms conventional self-attention baselines in autoregressive time-series prediction tasks while maintaining a compact parameter budget. The results indicate that the LPM can effectively learn stable global dynamics and achieve competitive forecasting behavior, demonstrating the practical applicability of phase-native computation in temporal modeling.
Implications
The findings suggest that the Phasor Transformer architecture could revolutionize time-series modeling by providing a more efficient and interpretable framework for capturing oscillatory dynamics. This could have significant implications in fields such as finance, climate modeling, and biosignal analysis, where understanding phase relationships is crucial.
Multi-Agent Reinforcement Learning for Dynamic Pricing: Balancing Profitability, Stability and Fairness
Reinforcement Learning
Optimization
- MAPPO outperforms other algorithms in terms of profit and stability.
- MADDPG achieves fairer profit distribution among agents despite lower overall profit.
- The study highlights the importance of stability and reproducibility in MARL for dynamic pricing.
- Insights on trade-offs between exploration and reliability are provided, particularly regarding MASAC.
Summary
This paper investigates the application of multi-agent reinforcement learning (MARL) techniques for dynamic pricing in competitive retail markets. The authors conduct a systematic empirical evaluation of three MARL algorithms—MAPPO (Multi-Agent Proximal Policy Optimization), MADDPG (Multi-Agent Deep Deterministic Policy Gradient), and MASAC (Multi-Agent Soft Actor-Critic)—against an Independent DDPG (IDDPG) baseline. The evaluation is performed in a simulated marketplace environment based on real-world retail data, focusing on metrics such as profit performance, stability across random seeds, fairness, and training efficiency. The results indicate that MAPPO consistently yields the highest average returns with low variance, making it a stable and reproducible choice for dynamic pricing. In contrast, MADDPG, while slightly less profitable, provides the fairest profit distribution among agents. The findings underscore the advantages of MARL methods, particularly MAPPO, as scalable and reliable alternatives to independent learning approaches in retail pricing strategies.
Methodology
The authors benchmarked MAPPO, MADDPG, and MASAC against an IDDPG baseline in a simulated retail marketplace environment. They evaluated the algorithms based on profit performance, stability across random seeds, and sample efficiency, using real-world retail transaction data to model demand elasticity and competitive interactions.
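The evaluation quantities above can be made concrete with toy numbers: seed-to-seed mean and standard deviation for stability, and Jain's fairness index over per-agent profits. Using Jain's index is an assumption (the paper's fairness metric may differ), and the returns below are invented for illustration.

```python
import numpy as np

def stability(returns_per_seed):
    """Mean return and its seed-to-seed variability."""
    r = np.asarray(returns_per_seed, dtype=float)
    return r.mean(), r.std()

def jain_fairness(profits):
    """Jain's index: 1.0 for perfectly equal profits, 1/n when a
    single agent captures everything."""
    p = np.asarray(profits, dtype=float)
    return p.sum() ** 2 / (len(p) * (p ** 2).sum())

mappo_seeds = [102.0, 104.0, 101.0, 103.0, 102.5]  # hypothetical
masac_seeds = [95.0, 130.0, 60.0, 110.0, 80.0]     # hypothetical
assert stability(mappo_seeds)[1] < stability(masac_seeds)[1]

equal = jain_fairness([10.0, 10.0, 10.0])   # 1.0: perfectly fair
skewed = jain_fairness([25.0, 3.0, 2.0])
assert skewed < equal
```

Reporting all three axes together, profit, cross-seed stability, and fairness, is what exposes the trade-offs between MAPPO, MADDPG, and MASAC.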
Results
The results demonstrated that MAPPO consistently achieved the highest mean profits with significantly lower variance compared to MASAC and MADDPG. MADDPG, while yielding slightly lower profits, provided a more equitable profit distribution among agents. The analysis of stability revealed MAPPO's superior performance in terms of reproducibility across different random seeds.
Implications
The findings suggest that MARL methods, especially MAPPO, can be effectively utilized in real-world dynamic pricing systems, offering a robust framework for retailers to optimize pricing strategies in competitive environments. This research provides valuable insights for practitioners looking to implement adaptive pricing mechanisms that balance profitability, stability, and fairness.
SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding
Time Series
- Introduction of a Gaussian-smoothed masking strategy for EEG signal pretraining.
- Development of SpecHi-Net, a hierarchical architecture for multi-scale feature extraction.
- Implementation of a spectral gating mechanism in a mixture of experts framework.
- Demonstration of state-of-the-art performance in diverse EEG decoding tasks.
Summary
The paper presents SpecMoE, a novel foundation model designed for decoding electroencephalography (EEG) signals across species. The authors address limitations in existing EEG decoding frameworks, which often rely on separate temporal and spectral masking that biases learning towards high-frequency oscillations. SpecMoE introduces a Gaussian-smoothed masking strategy applied to short-time Fourier transform (STFT) maps, enhancing the model's ability to learn intricate neural patterns across both high- and low-frequency domains. The architecture, SpecHi-Net, is a U-shaped hierarchical model that captures multi-scale temporal and spectral features through multiple encoding and decoding stages. To optimize performance, the authors implement a mixture of experts framework (SpecMoE) that utilizes a learned spectral gating mechanism to dynamically weight expert contributions based on the signal's Power Spectral Density (PSD). The model demonstrates state-of-the-art performance across various EEG decoding tasks, including sleep staging, emotion recognition, motor imagery classification, abnormal signal detection, and drug effect prediction, while exhibiting strong cross-species and cross-subject generalization.
Methodology
The authors propose a Gaussian-smoothed masking strategy applied to STFT spectrograms, which replaces sharp masking boundaries with smooth transitions. This approach prevents bias towards non-physiological transients and low-frequency leakage. The SpecHi-Net architecture captures multi-scale features through multiple encoding and decoding stages. The mixture of experts framework, SpecMoE, utilizes a learned spectral gating mechanism to adaptively weight expert contributions based on the task's rhythmic content.
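As a rough illustration of this idea, the sketch below builds a hard binary time-frequency mask over an STFT magnitude map and blurs it with a separable Gaussian kernel so boundaries fade smoothly. All names, defaults, and the mask-sampling scheme here are our assumptions, not the paper's implementation:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalised to sum to 1."""
    radius = radius if radius is not None else int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_smoothed_mask(spec, mask_ratio=0.3, sigma=2.0, rng=None):
    """Soft time-frequency masking of an STFT magnitude map (a sketch).

    A hard 0/1 mask is blurred so masked regions fade in smoothly
    instead of having sharp boundaries.
    """
    rng = rng or np.random.default_rng(0)
    hard = (rng.uniform(size=spec.shape) < mask_ratio).astype(float)
    k = gaussian_kernel(sigma)
    # separable blur: convolve along the frequency axis, then the time axis
    soft = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, hard)
    soft = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, soft)
    soft = np.clip(soft, 0.0, 1.0)
    return spec * (1.0 - soft), soft  # attenuated input + soft mask target
```

Because the kernel is normalised, the blurred mask stays in [0, 1] and contains intermediate values in place of hard edges, which is exactly the property the paper attributes to its masking strategy.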
Results
SpecMoE achieves state-of-the-art performance across multiple EEG decoding tasks, including sleep staging, emotion recognition, motor imagery classification, abnormal signal detection, and drug effect prediction. The model shows robust generalization across different species, maintaining high accuracy on both human and murine EEG datasets.
Implications
The findings suggest that the proposed spectral-aware Gaussian-smoothed masking and hierarchical feature integration can significantly enhance the performance of EEG foundation models and brain-computer interface systems. This could lead to improved diagnostic tools and applications in neuroscience and clinical settings.
A foundation model for electrodermal activity data
Time Series
- Introduction of UME, the first foundation model specifically for EDA data.
- Compilation of EDAMAME, a large-scale EDA dataset from 24 public sources.
- UME outperforms baseline models and matches generalist models with significantly lower computational costs.
- Challenges in EDA modeling are acknowledged, indicating the need for further research.
Summary
This paper introduces UME, the first dedicated foundation model for electrodermal activity (EDA) data, addressing the significant gap in large-scale, curated datasets for EDA. The authors compiled EDAMAME, a comprehensive collection of EDA traces from 24 public datasets, totaling over 25,000 hours of data from 634 users. UME was trained on approximately 275 million 60-second windows of EDA data and evaluated across various downstream tasks. The results demonstrated that UME outperformed baseline models in 8 out of 10 scenarios and matched the performance of generalist time series foundation models while requiring 20 times less computational resources. However, the study also highlighted the intrinsic challenges of EDA modeling, such as variability in balanced accuracy scores, which rarely exceeded 0.7. The authors emphasize the need for further research to fully exploit EDA's potential in both unimodal and multimodal sensing applications. All datasets, model weights, and code are made publicly available to facilitate ongoing research.
Methodology
The authors created the EDAMAME dataset by integrating EDA traces from 24 public datasets. UME was trained using self-supervised learning techniques on this dataset, focusing on 60-second windows of EDA data. The model's performance was evaluated against various baseline models across multiple tasks.
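For concreteness, the windowing step can be sketched as follows; the paper fixes 60-second windows, but the sampling rate and hop length here are our illustrative assumptions:

```python
import numpy as np

def segment_windows(eda, fs=4, window_s=60, hop_s=60):
    """Slice a 1-D EDA trace into fixed-length windows (a sketch).

    fs (Hz) and hop_s are illustrative defaults; the paper only
    specifies 60-second windows, not the sampling rate or overlap.
    """
    win, hop = int(window_s * fs), int(hop_s * fs)
    starts = range(0, len(eda) - win + 1, hop)  # assumes len(eda) >= win
    return np.stack([eda[s:s + win] for s in starts])
```

A 10-minute trace at 4 Hz, for example, yields ten non-overlapping 240-sample windows ready for self-supervised pretraining.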
Results
UME outperformed baseline models in 8 out of 10 evaluation scenarios and matched the performance of generalist time series foundation models while using 20 times fewer computational resources. However, the balanced accuracy scores were generally low, indicating variability and challenges in EDA modeling.
Implications
The development of UME and the EDAMAME dataset opens new avenues for research in physiological signal analysis, particularly in applications related to cognitive load, stress, and engagement assessment. The public availability of resources encourages further exploration and innovation in EDA modeling.
Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions
Reinforcement Learning
Theory
Optimization
- Introduces a hybrid framework combining game theory with multi-agent reinforcement learning (MARL).
- Achieves significant improvements in training efficiency, with higher rewards and faster convergence.
- Utilizes the Apollonius Circle for Nash equilibrium computation, allowing for early termination of RL episodes.
- Demonstrates effectiveness across different team sizes in border defense scenarios.
Summary
This paper presents a novel hybrid approach that integrates game-theoretic insights with reinforcement learning (RL) to enhance the efficiency of training in a border defense scenario. Traditional game theory provides optimality guarantees under certain assumptions, but these can falter in real-world applications where information is imperfect. Conversely, RL is adaptive but often suffers from sample inefficiency, particularly in complex environments. The authors focus on a border defense game characterized by limited perceptual range, where the defenders' success hinges on effective search and pursuit strategies. They utilize the Apollonius Circle (AC) to derive equilibrium solutions for the post-detection phase of the game, allowing for early termination of RL episodes. This method enables RL to focus on optimizing search strategies while ensuring optimal outcomes post-detection. The results demonstrate that this hybrid approach yields 10-20% higher rewards, accelerates convergence, and produces more efficient search trajectories across various defender configurations. The findings underscore the potential of combining analytical solutions with RL to improve learning efficiency in adversarial settings.
Methodology
The authors formulate a two-phase border defense problem and employ the Apollonius Circle to compute the Nash equilibrium for the pursuit phase of the game. They integrate this analytical solution into the MARL training loop, allowing RL to focus on learning search strategies while leveraging closed-form rewards for the pursuit phase. The experimental evaluation is conducted using the MAPPO algorithm.
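The Apollonius Circle itself is cheap to compute, which is what makes the closed-form pursuit reward practical. For an evader at E with speed v_e and a faster pursuer at P with speed v_p, the points X both can reach simultaneously satisfy |X − E| / |X − P| = k with k = v_e / v_p, which is a circle. This standalone sketch uses our own variable names, not the paper's:

```python
import numpy as np

def apollonius_circle(evader, pursuer, speed_ratio):
    """Apollonius circle of simultaneous-arrival points (illustrative).

    For k = v_e / v_p in (0, 1), the locus |X - E| / |X - P| = k is a
    circle with center (E - k^2 P) / (1 - k^2) and
    radius k |E - P| / (1 - k^2), surrounding the slower evader.
    """
    E, P = np.asarray(evader, float), np.asarray(pursuer, float)
    k = speed_ratio
    assert 0.0 < k < 1.0, "pursuer must be strictly faster than evader"
    center = (E - k**2 * P) / (1 - k**2)
    radius = k * np.linalg.norm(E - P) / (1 - k**2)
    return center, radius
```

Any point on this circle is reached by both agents at the same time, so the circle's position relative to the border determines the equilibrium outcome of the pursuit phase without simulating it step by step.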
Results
The proposed method results in 10-20% higher rewards compared to traditional end-to-end learning approaches. It also demonstrates faster convergence rates and more efficient search trajectories in both single and multi-defender scenarios, validating the effectiveness of the hybrid framework.
Implications
This research has significant implications for developing more efficient training methods in adversarial environments, particularly in military and security applications where border defense strategies are critical. The integration of analytical solutions with RL could lead to advancements in other domains requiring strategic decision-making under uncertainty.
CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning
Reinforcement Learning
Theory
Efficient ML
- The paper formulates the problem of arithmetic circuit synthesis as a single-player game for RL agents.
- Two RL methods are compared: PPO+MCTS and SAC, with SAC showing better performance on simpler tasks.
- PPO+MCTS demonstrates scalability to more complex polynomial instances.
- The study suggests that RL can effectively navigate the vast search space of arithmetic circuits.
Summary
This paper addresses the challenge of discovering efficient arithmetic circuits for computing polynomials using reinforcement learning (RL). The authors model the problem as a single-player game where an RL agent constructs circuits through a series of operations, aiming to minimize the number of gates used. They implement two RL approaches: Proximal Policy Optimization combined with Monte Carlo Tree Search (PPO+MCTS) and Soft Actor-Critic (SAC). The study finds that SAC performs better on simpler two-variable polynomials, while PPO+MCTS scales effectively to more complex three-variable cases. The results indicate that learning-based methods can uncover optimal or near-optimal circuit designs, providing insights into algebraic complexity and potential applications in automated proof generation.
Methodology
The authors employ a reinforcement learning framework, specifically using Proximal Policy Optimization (PPO) combined with Monte Carlo Tree Search (MCTS) and Soft Actor-Critic (SAC) methods. The problem is modeled as a Markov Decision Process (MDP) where the agent selects algebraic operations to construct circuits for target polynomials, with a focus on minimizing gate count and generalizing to unseen polynomials.
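To make the MDP concrete, here is a toy single-player environment in that spirit: wires start as the input variables plus the constant 1, each action appends an add or multiply gate, and success is checked by evaluating all wires against the target polynomial on random test points. The class name, reward shaping, and gate budget are our illustrative guesses, not the paper's implementation:

```python
import numpy as np

class CircuitEnv:
    """Toy single-player arithmetic-circuit construction game (a sketch)."""

    def __init__(self, target_fn, n_vars, n_points=16, max_gates=8, seed=0):
        rng = np.random.default_rng(seed)
        self.points = rng.uniform(-2, 2, size=(n_points, n_vars))
        self.target = target_fn(*self.points.T)  # target values on test points
        self.max_gates = max_gates
        self.reset()

    def reset(self):
        # wires 0..n_vars-1 are the inputs; the last wire is the constant 1
        self.wires = [self.points[:, i] for i in range(self.points.shape[1])]
        self.wires.append(np.ones(len(self.points)))
        self.n_gates = 0
        return len(self.wires)

    def step(self, op, i, j):
        a, b = self.wires[i], self.wires[j]
        # any op other than "add" is treated as multiplication
        self.wires.append(a + b if op == "add" else a * b)
        self.n_gates += 1
        solved = np.allclose(self.wires[-1], self.target)
        done = solved or self.n_gates >= self.max_gates
        reward = (1.0 - 0.05 * self.n_gates) if solved else 0.0  # fewer gates, more reward
        return reward, done
```

An agent picks actions `(op, i, j)`; with this probabilistic identity check, two gates recover x² + x: multiply wire 0 by itself, then add wire 0 to the result.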
Results
The experiments reveal that SAC achieves higher success rates for two-variable polynomial targets, while PPO+MCTS is more effective for three-variable targets, showing consistent improvement on more challenging instances. The study demonstrates that both methods can recover optimal or near-optimal circuits, indicating the potential of RL in this domain.
Implications
The findings suggest that reinforcement learning can be a powerful tool for discovering efficient computational structures in algebra, with implications for automated proof generation and deeper insights into algebraic complexity theory. This could lead to advancements in understanding the VP vs. VNP conjecture and other related problems.
WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation
Reinforcement Learning
Generative Models
Robotics
- WINFlowNets introduces a co-training framework for flow and retrieval networks, enhancing adaptability in dynamic environments.
- The two-phase training strategy (Warm-Up and Dual-Training) eliminates the need for pre-training the retrieval network.
- Experimental results show significant improvements in performance and stability over standard CFlowNets and leading RL algorithms.
- WINFlowNets demonstrates strong adaptability in fault environments, making it suitable for real-world robotic applications.
Summary
This paper introduces WINFlowNets, a novel framework for training continuous Generative Flow Networks (CFlowNets), aimed at dynamic robotic control tasks and adaptation to machine faults. Traditional CFlowNets rely on a pre-trained retrieval network, which limits their effectiveness in environments where pre-training data may not be available or representative. WINFlowNets addresses this limitation by enabling co-training of the flow and retrieval networks through a two-phase training strategy: a Warm-Up phase for initial policy bootstrapping of the retrieval network, followed by a Dual-Training phase where both networks collaboratively learn from a shared replay buffer. Experimental results in simulated robotic environments demonstrate that WINFlowNets outperforms standard CFlowNets and state-of-the-art reinforcement learning algorithms (PPO, SAC) in average reward and training stability, particularly in out-of-distribution (OOD) scenarios. The findings suggest that WINFlowNets is well suited for deployment in dynamic and malfunction-prone robotic systems, where traditional pre-training methods are impractical.
Methodology
The methodology involves a two-phase training process: a Warm-Up phase where the retrieval network independently interacts with the environment to gather initial experience, followed by a Dual-Training phase where both the flow and retrieval networks co-train using a shared replay buffer. This approach allows for simultaneous learning and adaptation to changing environments without the need for pre-training.
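The two-phase structure can be sketched as a single training loop; the function and class interfaces below are our structural guesses (the paper does not publish this API), with the phase switch marked inline:

```python
import random
from collections import deque

def train_winflownets(env, retrieval, flow, warmup_steps=500, dual_steps=2000):
    """Two-phase co-training loop (structural sketch; names hypothetical).

    Phase 1 (Warm-Up): only the retrieval network acts and learns,
    seeding the shared replay buffer. Phase 2 (Dual-Training): both
    networks update from the same buffer, so no pre-trained retrieval
    network is ever required.
    """
    buffer = deque(maxlen=10_000)  # shared replay buffer
    state = env.reset()
    for step in range(warmup_steps + dual_steps):
        action = retrieval.act(state)
        next_state, reward, done = env.step(action)
        buffer.append((state, action, reward, next_state))
        batch = random.sample(buffer, min(64, len(buffer)))
        retrieval.update(batch)
        if step >= warmup_steps:   # Dual-Training phase begins here
            flow.update(batch)
        state = env.reset() if done else next_state
    return retrieval, flow
```

The key design point is visible in the condition: the flow network only starts learning once the buffer already holds warm-up experience, replacing the offline pre-training stage of standard CFlowNets.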
Results
WINFlowNets was shown to outperform standard CFlowNets and state-of-the-art RL algorithms in terms of average reward and training stability in simulated robotic environments. The framework also exhibited strong performance in adapting to out-of-distribution scenarios, highlighting its effectiveness in dynamic and fault-prone settings.
Implications
The findings suggest that WINFlowNets can be effectively deployed in real-world robotic systems that require rapid adaptation to changing conditions and faults, potentially improving the robustness and efficiency of robotic operations in various applications.
Baguan-TS: A Sequence-Native In-Context Learning Model for Time Series Forecasting with Covariates
Time Series
- Baguan-TS unifies end-to-end representation learning with in-context learning for time series forecasting.
- The model employs a 3D Transformer architecture that attends to temporal, variable, and context dimensions.
- A Y-space retrieval-based calibration module improves model stability and forecasting accuracy.
- The context-overfitting strategy enhances robustness by balancing denoising and sample selection.
Summary
The paper introduces Baguan-TS, a novel framework that integrates in-context learning (ICL) with raw sequence representation for time series forecasting. Traditional ICL approaches often rely on tabularized features, while end-to-end models lack the ability to adapt at inference time. Baguan-TS addresses this gap by employing a 3D Transformer architecture that simultaneously attends to temporal, variable, and context dimensions, allowing for effective adaptation without the need for feature engineering. The authors tackle two significant challenges: calibration and output oversmoothing. They propose a Y-space retrieval-based local calibration module to enhance model stability and accuracy, and a context-overfitting strategy to mitigate oversmoothing by focusing on relevant support examples. The framework demonstrates superior performance on benchmark datasets, achieving the highest win rates and significant reductions in forecasting metrics compared to established baselines. Overall, Baguan-TS provides a robust solution for time series forecasting under varying data conditions, emphasizing the importance of leveraging raw sequences for effective learning.
Methodology
The authors developed Baguan-TS as a 3D Transformer model that processes raw multivariate time series data. The model incorporates a local calibration mechanism (Y-space RBfcst) for improved prediction accuracy and a context-overfitting strategy to manage output oversmoothing. This allows the model to adaptively learn from a support set during inference without relying on pre-engineered features.
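One plausible reading of Y-space retrieval calibration can be sketched in a few lines: retrieve the k support examples whose *predictions* are closest to the query's prediction, then shift the query forecast by their mean residual. This is our hypothetical simplification, not the paper's exact module:

```python
import numpy as np

def y_space_calibrate(query_pred, support_preds, support_truths, k=3):
    """Retrieval-based local calibration in output (Y) space (a sketch).

    Neighbours are found by distance between *predictions* rather than
    inputs, and the query forecast is shifted by their mean residual
    (truth - prediction). All names and the residual rule are illustrative.
    """
    d = np.linalg.norm(support_preds - query_pred, axis=1)
    nearest = np.argsort(d)[:k]
    residual = (support_truths[nearest] - support_preds[nearest]).mean(axis=0)
    return query_pred + residual
```

The appeal of calibrating in Y-space is that a systematic local bias of the forecaster shows up directly as a shared residual among retrieved neighbours and can be subtracted out at inference time.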
Results
Baguan-TS achieved the highest average win rate and significantly reduced scaled quantile loss (SQL) and mean absolute scaled error (MASE) on the fev-bench-cov benchmark, outperforming established models like TabPFN-TS by 4.8% in SQL. The model also demonstrated robustness across various real-world energy datasets.
Implications
The framework has potential applications in various domains requiring time series forecasting, such as finance, energy, and supply chain management. Its ability to adapt quickly to new tasks and handle distribution shifts makes it a valuable tool for practitioners dealing with dynamic data environments.
The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning
Theory
Optimization
Large Language Models
- Introduces a five-level taxonomy of AI integration in research.
- Presents an open-source framework for using CLI coding agents as autonomous research assistants.
- Demonstrates the framework's application through case studies in mathematics and machine learning.
- Emphasizes the importance of human oversight and augmentation in AI-assisted research.
Summary
This paper presents a practical guide for integrating AI tools into research practices in mathematics and machine learning. The authors propose a five-level taxonomy of AI integration, ranging from full human control to high agent autonomy. They introduce an open-source framework that utilizes command-line interface (CLI) coding agents as autonomous research assistants, enhancing productivity in research workflows. The framework operates within a sandboxed environment, allowing for seamless experimentation across various computational resources. The authors provide case studies demonstrating the framework's application in deep learning and mathematical proofs, emphasizing that it is designed to augment rather than replace human researchers. The longest autonomous session recorded lasted over 20 hours, showcasing the framework's capability to manage multiple experiments without human intervention. The paper aims to bridge the gap between the capabilities of AI systems and their practical application in everyday research, offering actionable insights for researchers looking to leverage AI effectively.
Methodology
The authors developed a taxonomy of AI integration levels and created an open-source framework that includes methodological rules for using CLI coding agents. This framework allows researchers to run autonomous experiments in a sandboxed environment, facilitating both mathematical derivations and computational tasks.
Results
The framework successfully enables researchers to conduct long-duration autonomous research sessions, with the longest recorded session lasting over 20 hours. It demonstrated effective management of multiple independent experiments across various computational nodes, highlighting its scalability and efficiency.
Implications
The findings suggest that AI can significantly enhance research productivity by automating routine tasks and managing complex experiments. This framework could be applied across various fields beyond mathematics and machine learning, potentially transforming research methodologies and workflows.
Conditional Inverse Learning of Time-Varying Reproduction Numbers Inference
Time Series
- CIRL addresses the ill-posed inverse problem of estimating time-varying reproduction numbers from epidemic data.
- The framework combines epidemiological constraints with data-driven modeling to enhance adaptability to changing dynamics.
- CIRL employs a Conditional Inverse Mapping Network and a Statistical Observation and Consistency Module to improve estimation accuracy.
- Experiments validate the robustness of CIRL against observation noise and its responsiveness to abrupt transmission changes.
Summary
This paper addresses the challenge of estimating time-varying reproduction numbers (Rt) from epidemic incidence data, a critical task in infectious disease surveillance. Traditional methods often rely on rigid structural assumptions that can hinder adaptability to non-stationary transmission dynamics, resulting in delayed detection of changes and reduced estimation accuracy. The authors propose a novel framework called Conditional Inverse Reproduction Learning (CIRL), which learns a conditional mapping from historical incidence patterns and explicit time information to latent reproduction numbers. Unlike conventional approaches that impose strict parametric constraints, CIRL integrates epidemiological structures with flexible likelihood-based statistical modeling. The framework employs a renewal equation as a forward operator to ensure dynamical consistency while allowing for robust estimates that are responsive to abrupt changes and noise in the data. The effectiveness of CIRL is demonstrated through experiments on synthetic epidemic data with controlled regime changes and real-world data from SARS and COVID-19, showcasing its ability to provide accurate and timely reproduction number estimates.
Methodology
The CIRL framework consists of two main components: a Conditional Inverse Mapping Network that learns a flexible mapping from historical incidence data and time to reproduction numbers, and a Statistical Observation and Consistency Module that uses a probabilistic objective to handle the heterogeneity and sparsity of real-world surveillance data. This approach allows for soft consistency constraints that mitigate noise while maintaining sensitivity to significant transmission changes.
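The renewal-equation forward operator referenced here is the standard one from epidemiology: expected incidence follows I_t = R_t · Σ_s w_s · I_{t−s}, where w is the generation-interval distribution. A minimal implementation (our variable names, with w normalised for safety):

```python
import numpy as np

def renewal_forward(Rt, w, I0=10.0):
    """Forward operator of the renewal equation (standard epidemiology).

    I_t = R_t * sum_{s=1}^{t} w_s * I_{t-s}, where w is the
    (normalised) generation-interval distribution and I0 seeds the series.
    """
    w = np.asarray(w, float)
    w = w / w.sum()
    Rt = np.asarray(Rt, float)
    I = np.zeros(len(Rt))
    I[0] = I0
    for t in range(1, len(Rt)):
        s = np.arange(1, min(t, len(w)) + 1)   # lags with support in w
        I[t] = Rt[t] * np.sum(w[s - 1] * I[t - s])
    return I
```

In a framework like CIRL, this operator maps candidate Rt trajectories to expected incidence, so dynamical consistency can be enforced by comparing that output against observed case counts.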
Results
The experiments conducted on both synthetic and real-world epidemic data demonstrate that CIRL provides accurate estimates of time-varying reproduction numbers, effectively adapting to regime changes and noise in the data. The results indicate that CIRL outperforms traditional methods that rely on rigid structural assumptions, particularly in scenarios with abrupt changes in transmission dynamics.
Implications
The CIRL framework has significant implications for public health surveillance and epidemic modeling, offering a more flexible and accurate method for estimating reproduction numbers. This can enhance the responsiveness of health interventions and improve the understanding of epidemic dynamics in real-time.
FlashSampling: Fast and Memory-Efficient Exact Sampling
NLP
Large Language Models
Efficient ML
- FlashSampling fuses exact sampling into the LM-head matmul, eliminating the need for full logits tensor materialization.
- The method computes logits tile-by-tile and retains only essential candidates, reducing memory traffic and improving efficiency.
- FlashSampling achieves exact sampling without approximations, maintaining accuracy while enhancing performance.
- The approach demonstrates significant speedups in end-to-end vLLM experiments across multiple GPU architectures.
Summary
FlashSampling introduces a novel exact sampling method designed to optimize the sampling process from categorical distributions, particularly in large-vocabulary decoding scenarios. Traditional methods often incur significant memory overhead due to the need to materialize logits tensors in high-bandwidth memory (HBM) and execute multiple kernels for sampling. FlashSampling addresses this inefficiency by integrating the sampling process directly into the language model (LM) head's matrix multiplication (matmul) operation. The approach computes logits in a tile-by-tile manner on-chip, applies Gumbel noise, and retains only one maximizer per row and vocabulary tile, thus avoiding materialization of the full logits tensor. This method is exact due to the argmax decomposition over partitions and can be adapted to online and tensor-parallel settings through hierarchical factorization. Experimental results demonstrate that FlashSampling reduces the time per output token by up to 19% across various GPU architectures, showcasing its effectiveness in bandwidth-bound decoding scenarios.
Methodology
FlashSampling employs a two-stage design where logits are computed tile-by-tile in the LM-head epilogue. It adds Gumbel noise on-chip and stores only one candidate per row and vocabulary tile, followed by a lightweight reduction. The method leverages hierarchical factorization to ensure exactness in both online and distributed settings.
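The exactness argument — argmax distributes over any partition of the vocabulary — can be checked with a toy tile-by-tile Gumbel-max sampler. This is pure NumPy for a single row; the real kernel fuses this into the matmul epilogue on-chip, and the tile size and names here are illustrative:

```python
import numpy as np

def flash_sample(hidden, W, tile=1024, rng=None):
    """Gumbel-max sampling computed tile-by-tile over the vocabulary (a sketch).

    Sampling from softmax(logits) equals argmax(logits + Gumbel noise), and
    argmax decomposes over vocabulary tiles: the global winner is the best
    of the per-tile winners, so no full logits vector is ever materialized.
    """
    rng = rng or np.random.default_rng(0)
    V = W.shape[1]
    g = -np.log(-np.log(rng.uniform(size=V)))   # Gumbel(0, 1) noise
    best_val, best_idx = -np.inf, -1
    for start in range(0, V, tile):
        logits = hidden @ W[:, start:start + tile]  # only this tile is computed
        noisy = logits + g[start:start + tile]
        k = int(np.argmax(noisy))                   # per-tile winner
        if noisy[k] > best_val:                     # lightweight reduction
            best_val, best_idx = noisy[k], start + k
    return best_idx
```

Because each tile keeps a single candidate, peak memory scales with the tile size rather than the vocabulary, while the returned token is bit-identical to sampling from the fully materialized distribution with the same noise.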
Results
The implementation of FlashSampling resulted in a reduction of time per output token by up to 19% in end-to-end vLLM experiments across various NVIDIA GPUs (H100, H200, B200, B300). The method also demonstrated consistent performance improvements in memory-bandwidth-bound decode regimes, highlighting its efficiency.
Implications
FlashSampling has the potential to enhance the performance of large language models (LLMs) by streamlining the sampling process, which is critical for autoregressive decoding. This efficiency can lead to faster model inference times and reduced resource consumption, making it suitable for real-time applications in natural language processing and other fields relying on large categorical distributions.