Abstract
This paper introduces the concept of 'chunky post-training,' a phenomenon where large language models (LLMs) learn unintended generalizations from discrete chunks of post-training data. These chunks, designed to teach specific behaviors, often encode spurious correlations that lead to miscalibrated or unexpected model behaviors. For example, models may incorrectly associate specific prompt features (e.g., formatting or phrasing) with certain behaviors, resulting in failures such as rejecting true statements or misinterpreting user intent. To address this, the authors propose two tools: SURF (Surfacing Unintended Response Failures), a black-box pipeline for identifying these unintended behaviors during inference, and TURF (Tracing Unintended Responses via Features), which traces these failures back to specific patterns in the training data. The study demonstrates that these failures are widespread across both frontier models (e.g., GPT-5.1, Claude 4.5) and open models (e.g., TĂĽlu 3). The authors argue that understanding and mitigating these issues is critical for improving user trust, evaluation reliability, and the overall alignment of LLMs with intended behaviors.
Methodology
The authors developed two tools: SURF, a black-box auditing pipeline that identifies unintended behaviors during inference, and TURF, which maps these behaviors to specific features in the post-training data. These tools were applied to several state-of-the-art LLMs (e.g., Claude 4.5, GPT-5.1, Gemini 3, Grok 4.1) and an open-source model (TĂĽlu 3). The study analyzed model responses to varied prompts and identified patterns of misgeneralization linked to artifacts in the post-training data.
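The summary describes SURF only at a high level, so the sketch below is a plausible illustration of what a SURF-style black-box audit loop might look like, not the authors' pipeline: rewrite the same question with different surface features (formatting, phrasing) and flag questions whose answers change. The `query_model` interface, the rewrites, and the string-based consistency check are hypothetical stand-ins.

```python
# Hypothetical sketch of a SURF-style black-box audit (not the authors' code).
# Idea: vary only surface features of a question, then flag cases where those
# surface changes alone flip the model's answer -- a sign of unintended
# generalization from post-training data.

from typing import Callable, Dict, List


def audit_prompt_variants(
    query_model: Callable[[str], str],            # assumed black-box LLM interface
    base_question: str,
    surface_rewrites: List[Callable[[str], str]],
) -> Dict[str, str]:
    """Collect the model's answer for each surface-level rewrite of the same question."""
    responses = {}
    for rewrite in surface_rewrites:
        prompt = rewrite(base_question)
        responses[prompt] = query_model(prompt)
    return responses


def is_inconsistent(responses: Dict[str, str]) -> bool:
    """Flag the question if the normalized answers disagree across rewrites."""
    normalized = {answer.strip().lower() for answer in responses.values()}
    return len(normalized) > 1


# Semantically identical rewrites that vary only formatting and phrasing --
# the kind of surface features the paper says models spuriously latch onto.
rewrites = [
    lambda q: q,
    lambda q: f"QUESTION: {q.upper()}",
    lambda q: f"For a quick quiz, answer in one word:\n- {q}",
]
```

In practice the consistency check would need something stronger than string equality (for example, an answer classifier or a judge model), but the structure stays the same: the audit requires only input-output access to the model, which is what makes it black-box.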
Results
The study demonstrates that chunky post-training failures are widespread across both proprietary and open-source LLMs. These failures often manifest as miscalibrated behaviors, such as rejecting true statements or misinterpreting user intent, and, using TURF, can be traced back to specific patterns in the post-training data. The authors provide empirical evidence that these issues stem from imbalanced or underspecified data chunks used during post-training.
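The results attribute the failures to imbalanced or underspecified data chunks. As a rough illustration of how such an attribution could be computed when the post-training data is available, the sketch below scores each chunk by how often a flagged surface feature co-occurs with the overgeneralized behavior; the chunk structure and the `has_feature` and `shows_behavior` detectors are assumptions for illustration, not the paper's TURF method.

```python
# Hypothetical sketch of TURF-style tracing (not the authors' implementation):
# given a surface feature flagged during auditing and a post-training set split
# into chunks, measure how strongly each chunk ties that feature to the behavior
# the model is overgeneralizing.

from typing import Callable, Dict, List, Tuple


def score_chunks(
    chunks: Dict[str, List[Tuple[str, str]]],  # chunk name -> (prompt, target) pairs
    has_feature: Callable[[str], bool],        # assumed detector, e.g. "prompt is all-caps"
    shows_behavior: Callable[[str], bool],     # assumed detector, e.g. "target is a refusal"
) -> Dict[str, float]:
    """Fraction of feature-bearing examples in each chunk whose target shows the behavior."""
    scores: Dict[str, float] = {}
    for name, examples in chunks.items():
        flagged = [(p, t) for p, t in examples if has_feature(p)]
        if not flagged:
            scores[name] = 0.0
            continue
        hits = sum(1 for _, t in flagged if shows_behavior(t))
        scores[name] = hits / len(flagged)
    return scores


# Chunks whose score is near 1.0 while the behavior's overall base rate is low
# are candidate sources of the spurious correlation.
```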
Implications
The findings highlight the need for more rigorous auditing and curation of post-training datasets to mitigate unintended behaviors in LLMs. The proposed tools, SURF and TURF, can help developers identify and address these issues, potentially improving model reliability, user trust, and evaluation accuracy. The research also underscores how strongly training data shapes model behavior, an understanding that is essential for building aligned and trustworthy AI systems.