AI-generated summaries

Today's ML research,
without the noise.

Summaries of the latest machine learning papers from arXiv, refreshed every 8 hours.

64 papers today
Updated every 8 hours
7 days of history
ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations
Ricardo Knauer, Andre Beinrucker, Erik Rodner
Interpretability
  • ConceptTracer is an interactive application for analyzing neural representations based on human-understandable concepts.
  • It incorporates information-theoretic measures to quantify concept saliency and selectivity in neural activations.
  • The tool was validated using representations from TabPFN, demonstrating its utility in identifying interpretable neurons.
  • ConceptTracer enhances the understanding of how neural networks encode information, contributing to mechanistic interpretability.
Read more
Fraud Detection System for Banking Transactions
Ranya Batsyas, Ritesh Yaduwanshi
Theory Efficient ML Optimization
  • The framework utilizes the PaySim dataset to simulate financial transactions for effective fraud detection.
  • Employs CRISP-DM methodology for structured data analysis and model development.
  • Implements SMOTE to address class imbalance in the dataset, improving minority class detection.
  • Evaluates multiple machine learning models, with hyperparameter tuning to enhance performance.
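A minimal sketch of the SMOTE rebalancing step above, using imbalanced-learn on synthetic stand-in data (the feature count and fraud rate are illustrative, not the PaySim schema):

```python
# Illustrative sketch: SMOTE oversampling before model training.
# Synthetic data stands in for PaySim; columns and rates are hypothetical.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))              # transaction features
y = (rng.random(10_000) < 0.01).astype(int)   # ~1% fraud: heavy imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the training split only, so the test distribution stays honest.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
print("class counts before:", np.bincount(y_tr), "after:", np.bincount(y_res))

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_res, y_res)
print("held-out accuracy:", clf.score(X_te, y_te))
```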
Read more
ODE-free Neural Flow Matching for One-Step Generative Modeling
Xiao Shou
Generative Models
  • OT-NFM allows for one-step image generation without ODE solvers at inference.
  • Mean collapse is identified as a unique failure mode in neural flow models, necessitating optimal transport for effective learning.
  • Two scalable optimal transport coupling strategies are introduced, enhancing the practicality of OT-NFM for large-scale applications.
  • Empirical results show OT-NFM's competitive performance in generating high-quality samples with reduced computational cost.
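The two coupling strategies themselves are not detailed in this summary; the sketch below shows the generic minibatch optimal-transport pairing that flow-matching methods commonly build on, via a Hungarian assignment (a reconstruction of the standard technique, not the authors' code):

```python
# Minibatch OT coupling sketch: pair noise and data samples so the
# flow-matching regression targets are transport-aligned within the batch.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_couple(noise, data):
    """Reorder the batch to minimize total squared transport cost."""
    cost = ((noise[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    row, col = linear_sum_assignment(cost)   # Hungarian algorithm, O(n^3)
    return noise[row], data[col]

rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 2))                # noise minibatch
x1 = rng.normal(loc=3.0, size=(64, 2))       # data minibatch
x0c, x1c = ot_couple(x0, x1)

# Straight-path interpolation x_t = (1 - t) x0 + t x1 and its velocity,
# which a one-step model would regress against.
t = rng.random((64, 1))
xt = (1 - t) * x0c + t * x1c
velocity_target = x1c - x0c
```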
Read more
PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
Mohsen Amiri, Ali Beikmohammadi, Sindri Magnússon, Mehdi Hosseinzadeh
Reinforcement Learning Robotics Optimization
  • Introduces PriPG-RL, a framework for RL in partially observable environments using a privileged planner.
  • Utilizes an anytime-feasible MPC algorithm (REAP) to provide structured guidance to the learning agent.
  • Develops the Planner-to-Policy Soft Actor-Critic (P2P-SAC) method to distill knowledge from the planner.
  • Demonstrates improved sample efficiency and policy performance in simulations and real-world applications.
Read more
A Novel Edge-Assisted Quantum-Classical Hybrid Framework for Crime Pattern Learning and Classification
Niloy Das, Apurba Adhikary, Sheikh Salman Hassan, Yu Qiao, Zhu Han, Tharmalingam Ratnarajah, Choong Seon Hong
Optimization
  • First comprehensive quantum-classical comparison for crime analytics with statistical validation.
  • Novel quantum circuit architecture exploits crime feature correlations through targeted entanglement.
  • Hybrid architectures (Q→C and C→Q) enhance classification performance and efficiency.
  • Quantum-inspired methods show competitive accuracy and reduced parameter requirements.
Read more
Bayesian Optimization for Mixed-Variable Problems in the Natural Sciences
Yuhao Zhang, Ti John, Matthias Stosiek, Patrick Rinke
Optimization
  • Generalizes the probabilistic reparameterization approach to handle non-equidistant discrete variables.
  • Demonstrates the effectiveness of Bayesian optimization in mixed-variable settings using Gaussian process surrogates.
  • Conducts systematic benchmarks to optimize kernel formulations and acquisition functions.
  • Establishes a practical framework for optimizing mixed-variable problems in natural sciences.
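A toy version of the general recipe, a GP surrogate plus expected improvement scored over a mixed continuous/discrete candidate set; the paper's probabilistic reparameterization and benchmarked kernels are not reproduced, and the objective and discrete levels below are invented:

```python
# Toy mixed-variable Bayesian optimization: GP surrogate + expected
# improvement, enumerating a non-equidistant discrete variable.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x_cont, x_disc):               # hypothetical black box
    return np.sin(3 * x_cont) + 0.5 * x_disc - 0.1 * x_disc ** 2

rng = np.random.default_rng(0)
levels = np.array([0.0, 1.0, 2.5, 7.0])      # non-equidistant discrete values
X = np.column_stack([rng.random(5), rng.choice(levels, 5)])
y = objective(X[:, 0], X[:, 1])

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = np.array([[c, d] for c in np.linspace(0, 1, 50) for d in levels])
    mu, sd = gp.predict(cand, return_std=True)
    z = (mu - y.max()) / np.maximum(sd, 1e-9)
    ei = (mu - y.max()) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(*x_next))

print("best found:", X[np.argmax(y)], "value:", y.max())
```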
Read more
DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting
Yeran Zhang, Pengwei Yang, Guoqing Wang, Tianyu Li
Time Series Graph Learning Interpretability
  • DSPR effectively decouples stable trends from regime-dependent dynamics in industrial time series forecasting.
  • The framework incorporates an Adaptive Window for transport delays and a Physics-Guided Dynamic Graph for interaction structures.
  • DSPR achieves state-of-the-art predictive performance with high accuracy and physical plausibility.
  • The model provides interpretable insights consistent with known domain mechanisms, enhancing scientific understanding.
Read more
Extraction of linearized models from pre-trained networks via knowledge distillation
Fumito Kimura, Jun Ohkubo
Efficient ML Theory
  • Proposes a framework for extracting linearized models from pre-trained neural networks using knowledge distillation.
  • Integrates Koopman operator theory to approximate nonlinear transformations as linear systems.
  • Demonstrates improved classification accuracy and numerical stability over conventional methods.
  • Utilizes principal component analysis to incorporate weak nonlinearity in the model.
Read more
SBBTS: A Unified Schrödinger-Bass Framework for Synthetic Financial Time Series
Alexandre Alouadi, Grégoire Loeper, Célian Marsala, Othmane Mazhar, Huyên Pham
Generative Models Time Series Optimization
  • Introduces SBBTS, a unified framework for generating synthetic financial time series.
  • Jointly models drift and stochastic volatility, overcoming limitations of existing methods.
  • Demonstrates improved forecasting performance and data augmentation capabilities.
  • Empirical validation on both synthetic benchmarks and real financial data.
Read more
Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo
Computer Vision Generative Models Multimodal
  • Introduces BrainCoDec, a training-free method for cross-subject brain decoding.
  • Achieves generalization to novel subjects without fine-tuning or anatomical alignment.
  • Utilizes a hierarchical inference process for robust visual decoding.
  • Demonstrates strong performance across diverse visual backbones and scanning protocols.
Read more
Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings
Yunxiang Peng, Mengmeng Ma, Ziyu Yao, Xi Peng
Computer Vision Interpretability
  • Introduces two new metrics for evaluating model generalization based on internal mechanisms.
  • Dependency Depth Bias (DDB) quantifies reliance on deep versus shallow features for model selection.
  • Circuit Shift Score (CSS) detects performance degradation under distribution shifts.
  • Both metrics show improved correlation with OOD performance, outperforming existing methods.
Read more
Automating aggregation strategy selection in federated learning
Dian S. Y. Pang, Endrias Y. Ergetu, Eric Topham, Ahmed E. Fetit
Federated Learning
  • Introduces an automated framework for selecting aggregation strategies in Federated Learning.
  • Utilizes large language models for single-trial strategy inference and genetic search for multi-trial exploration.
  • Demonstrates improved robustness and generalization under non-IID conditions.
  • Reduces reliance on manual intervention in strategy selection.
Read more
ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification
Paul Quinlan, Qingguo Li, Xiaodan Zhu
Time Series
  • Introduction of the ADAPT framework for many-to-one pre-training in time-series classification.
  • Achieves state-of-the-art performance on 162 diverse time-series datasets.
  • Utilizes average adaptive pooling for mixed-batch training, accommodating varying input dimensions.
  • Addresses fundamental challenges in building generalist models for time-series data.
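The adaptive-pooling mechanism the summary credits for mixed-batch training can be shown in a few lines: series of any length are pooled to one fixed width, so a single head can serve them all (a sketch of the idea, not the ADAPT code):

```python
# Sketch: adaptive average pooling maps variable-length series to a fixed
# size, letting one classifier head handle heterogeneous datasets.
import torch
import torch.nn.functional as F

def pool_to_fixed(series, out_len=32):
    """series: (batch, channels, length), any length."""
    return F.adaptive_avg_pool1d(series, out_len)

short = torch.randn(4, 3, 57)      # 57-step series, 3 channels
long = torch.randn(4, 3, 412)      # 412-step series
print(pool_to_fixed(short).shape, pool_to_fixed(long).shape)
# both -> torch.Size([4, 3, 32])
```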
Read more
Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge
Wonseon Lim, Jaesung Lee, Dae-Won Kim
Computer Vision Efficient ML Robotics
  • Introduction of CPS-Prompt framework for efficient continual learning on edge devices.
  • Utilization of Critical Patch Sampling (CPS) for effective token reduction.
  • Implementation of Decoupled Prompt and Classifier Training (DPCT) to reduce backpropagation overhead.
  • Demonstrated significant improvements in memory usage, training time, and energy efficiency.
Read more
Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training
Jinhong Lin, Pan Wang, Zitong Zhan, Lin Zhang, Pedro Morgado
Generative Models Efficient ML Computer Vision
  • Identifies the mismatch between data complexity and model readiness as a source of inefficiency in diffusion training.
  • Introduces a semantic-aware image complexity metric that combines foreground dominance and typicality.
  • Demonstrates significant improvements in generation quality metrics (IS and FID) with the proposed curriculum strategy.
  • Establishes that the order of image complexity presentation is critical for performance gains.
Read more
Decisions and Deployment: The Five-Year SAHELI Project (2020-2025) on Restless Multi-Armed Bandits for Improving Maternal and Child Health
Shresth Verma, Arpan Dasgupta, Neha Madhiwalla, Aparna Taneja, Milind Tambe
Optimization Reinforcement Learning Theory
  • Restless Multi-Armed Bandits effectively optimize limited public health interventions.
  • Decision-focused learning enhances outcomes in predict-then-optimize scenarios.
  • Long-term studies showed significant improvements in adherence to mHealth programs and maternal health behaviors.
Read more
How Does Machine Learning Manage Complexity?
Lance Fortnow
Theory
  • Machine learning models can effectively manage complexity through probabilistic outcomes.
  • The paper abstracts machine learning to P/poly-computable distributions with polynomially-bounded max-entropy.
  • A key theorem shows that learned distributions from cryptographic pseudorandom generators are close to uniform.
  • The strength of machine learning models is derived from their ability to generate random guesses rather than specific answers.
Read more
Production-Ready Automated ECU Calibration using Residual Reinforcement Learning
Andreas Kampmeier, Kevin Badalian, Lucas Koch, Sung-Yong Lee, Jakob Andert
Reinforcement Learning Optimization Interpretability
  • Introduces a residual reinforcement learning approach for automated ECU calibration.
  • Demonstrates the methodology using a map-based air path controller in a HiL environment.
  • Achieves faster calibration with minimal human intervention compared to traditional methods.
  • Ensures explainability and safety in the calibration process.
Read more
Weighted Bayesian Conformal Prediction
Xiayin Lou, Peng Luo
Theory
  • WBCP generalizes BQ-CP to importance-weighted settings, addressing distribution shifts.
  • Theoretical results confirm the calibration consistency and improved coverage guarantees.
  • Geographical BQ-CP offers spatial diagnostics, enhancing interpretability in spatial predictions.
  • WBCP maintains coverage guarantees while providing richer uncertainty information.
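For orientation, a minimal sketch of the importance-weighted conformal quantile that this line of work builds on; WBCP's Bayesian-quadrature machinery is not reproduced, and the weights below are invented:

```python
# Weighted conformal sketch: under covariate shift, calibration scores are
# reweighted by importance ratios w(x) = p_test(x) / p_cal(x).
import numpy as np

def weighted_quantile(scores, weights, alpha):
    """Smallest score whose weighted CDF reaches 1 - alpha."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cdf, 1 - alpha)]

rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(size=500))   # calibration nonconformity scores
w = rng.uniform(0.5, 2.0, size=500)        # hypothetical importance weights

# Append an infinite score carrying the test point's weight (taken as 1.0),
# the usual conservative correction in weighted conformal prediction.
q = weighted_quantile(np.append(residuals, np.inf), np.append(w, 1.0), alpha=0.1)
print("interval half-width:", q)           # predict y_hat +/- q
```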
Read more
CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics
Ziyi Ding, Xianxin Lai, Weiyu Chen, Xiao-Ping Zhang, Jiayu Chen
Generative Models Graph Learning Reinforcement Learning
  • CausalVAE is proposed as a plug-in module for latent world models to enhance counterfactual dynamics.
  • The integration of a structured causal disentanglement module allows for the identification of causal relationships among latent variables.
  • A staged training strategy is introduced to stabilize sequential training and improve model interpretability.
  • Significant performance gains were observed in counterfactual retrieval tasks, especially in physics-related benchmarks.
Read more
Preference Redirection via Attention Concentration: An Attack on Computer Use Agents
Dominik Seip, Matthias Hein
Computer Vision Multimodal
  • Introduction of PRAC, a novel attack on CUAs that manipulates attention via adversarial image patches.
  • Demonstration of PRAC's effectiveness in redirecting CUA preferences in online shopping scenarios.
  • Highlighting the vulnerability of vision modalities in CUAs, which has been less explored compared to language modalities.
  • Validation of the attack in realistic deployment settings, showing high success rates.
Read more
Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee
NLP Large Language Models Graph Learning
  • Introduces a feedforward graph architecture leveraging frozen LLMs for enhanced performance.
  • Achieves strong benchmark results, outperforming single constituent models and parameter-matched classifiers.
  • Demonstrates effective gradient flow through frozen model boundaries, enabling end-to-end training.
  • Emergent selective routing behavior observed in the output node without explicit supervision.
Read more
GAN-based Domain Adaptation for Image-aware Layout Generation in Advertising Poster Design
Chenchen Xu, Min Zhou, Tiezheng Ge, Weiwei Xu
Generative Models Computer Vision
  • Introduction of the CGL-Dataset for training image-aware layout generation models.
  • Development of two GAN-based models: CGL-GAN and PDA-GAN, with the latter using pixel-level domain adaptation.
  • Proposal of three novel content-aware metrics for evaluating layout generation quality.
  • PDA-GAN outperforms CGL-GAN, achieving significant improvements across multiple evaluation metrics.
Read more
Learning to Query History: Nonstationary Classification via Learned Retrieval
Jimmy Gammell, Bishal Thapaliya, Yoon Jung, Riyasat Ohib, Bilel Fehri, Deepayan Chakrabarti
Time Series
  • Introduces a learned retrieval mechanism for nonstationary classification.
  • Reframes nonstationary classification as a time series prediction problem.
  • Demonstrates improved robustness to distribution shifts compared to standard classifiers.
  • Allows for the use of large historical data corpora without requiring them to fit in memory.
Read more
Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency
Mingqing Xiao, Yansen Wang, Dongqi Han, Caihua Shan, Dongsheng Li
Computer Vision Efficient ML Multimodal
  • Introduction of Kuramoto Oscillatory Phase Encoding (KoPE) to Vision Transformers.
  • KoPE enhances learning efficiency through synchronization of phase and rate representations.
  • Demonstrated improvements in training, parameter, and data efficiency across multiple vision tasks.
  • Facilitates attention learning and structural understanding in neural networks.
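For reference, the classical Kuramoto dynamics the encoding draws on can be simulated directly: each oscillator's phase is pulled toward its neighbors, and above a critical coupling the population synchronizes (the base model only, not KoPE's integration into Vision Transformers):

```python
# Classical Kuramoto model:
# dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
import numpy as np

rng = np.random.default_rng(0)
N, K, dt = 100, 4.0, 0.01
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases
omega = rng.normal(0, 1, N)            # natural frequencies

for _ in range(2000):
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    theta += dt * (omega + (K / N) * coupling)

# Order parameter r in [0, 1]: ~0 incoherent, ~1 fully synchronized.
r = np.abs(np.exp(1j * theta).mean())
print(f"synchronization order parameter r = {r:.2f}")
```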
Read more
Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning
Teng Pang, Zhiqiang Dong, Yan Zhang, Rongjian Xu, Guoqiang Wu, Yilong Yin
Reinforcement Learning Generative Models Robotics
  • Introduction of VGM2P, a flow-based policy learning framework for offline MARL.
  • Utilization of global advantage values to guide agent collaboration.
  • Implementation of classifier-free guidance MeanFlow for efficient action generation.
  • Demonstrated comparable performance to advanced methods using only conditional behavior cloning.
Read more
PolicyLong: Towards On-Policy Context Extension
Junlong Jia, Ziyang Chen, Xing Wu, Chaochen Gao, TingHao Yu, Feng Zhang, Songlin Hu
NLP Large Language Models
  • PolicyLong proposes an on-policy framework for long-context training, addressing the off-policy gap in traditional methods.
  • The iterative self-curriculum allows the model to continuously adapt its training data based on its evolving capabilities.
  • Both positive contexts and hard negatives are derived from the current model's entropy landscape, enhancing learning efficiency.
  • Experiments show significant performance improvements over baseline methods, especially with longer context lengths.
Read more
Improving Semantic Uncertainty Quantification in Language Model Question-Answering via Token-Level Temperature Scaling
Tom A. Lamb, Desi R. Ivanova, Philip H. S. Torr, Tim G. J. Rudner
NLP Large Language Models
  • Systematic evaluation of semantic calibration and discrimination reveals limitations of existing methods.
  • Optimized token-level temperature scaling significantly improves semantic UQ compared to fixed-temperature heuristics.
  • The proposed method enhances both semantic calibration and discrimination in question-answering tasks.
  • A principled approach to response selection based on semantic confidence distributions yields better results.
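Standard response-level temperature scaling, the baseline the paper refines to the token level, fits one scalar T on held-out logits by minimizing negative log-likelihood; a self-contained sketch on synthetic logits (an illustration of the baseline, not the authors' method):

```python
# Baseline temperature scaling: calibrate softmax(logits / T) by fitting T.
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels):
    log_t = torch.zeros(1, requires_grad=True)    # optimize log T, keeps T > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

torch.manual_seed(0)
logits = torch.randn(1000, 10) * 4.0              # sharply peaked: overconfident
labels = torch.where(torch.rand(1000) < 0.7,      # but only ~70% accurate
                     logits.argmax(1), torch.randint(0, 10, (1000,)))
print(f"fitted temperature: {fit_temperature(logits, labels):.2f}")  # T > 1
```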
Read more
Cluster Attention for Graph Machine Learning
Oleg Platonov, Liudmila Prokhorenkova
Graph Learning
  • Introduction of Cluster Attention (CLATT) as a new attention mechanism for graph learning.
  • CLATT allows nodes to attend to other nodes within their clusters, enhancing receptive fields.
  • Augmenting MPNNs and Graph Transformers with CLATT improves performance on diverse graph datasets.
  • The method retains strong graph-structure-based inductive biases, crucial for GML tasks.
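The core mechanism, letting each node attend only to others in its cluster, reduces to masking the attention logits; a minimal sketch with given cluster assignments (how CLATT obtains the clustering is not shown here):

```python
# Sketch: attention restricted to within-cluster node pairs via a logit mask.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 8, 16
x = torch.randn(n, d)                                 # node features
clusters = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])     # given assignments

Wq, Wk, Wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv

logits = (q @ k.T) / d ** 0.5
same_cluster = clusters[:, None] == clusters[None, :]
logits = logits.masked_fill(~same_cluster, float("-inf"))  # block cross-cluster

out = F.softmax(logits, dim=-1) @ v   # each node attends within its cluster
print(out.shape)                      # torch.Size([8, 16])
```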
Read more
Equivariant Efficient Joint Discrete and Continuous MeanFlow for Molecular Graph Generation
Rongjian Xu, Teng Pang, Zhiqiang Dong, Guoqiang Wu
Generative Models Graph Learning
  • Introduction of Equivariant MeanFlow (EQUIMF) for joint modeling of discrete and continuous graph components.
  • Development of a new discrete MeanFlow model that enables efficient few-step sampling.
  • Implementation of synchronized MeanFlow dynamics with mutual conditioning for improved generation quality.
  • EQUIMF shows superior performance in molecular generation benchmarks compared to existing methods.
Read more
KV Cache Offloading for Context-Intensive Tasks
Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov
NLP Large Language Models Efficient ML
  • Introduction of the Text2JSON benchmark for evaluating context-intensive tasks.
  • Significant performance degradation observed in existing KV-cache offloading methods.
  • Identification of low-rank projection and unreliable landmarks as key issues affecting accuracy.
  • Proposal of a simpler alternative strategy that improves performance across multiple LLMs.
Read more
Tracking Adaptation Time: Metrics for Temporal Distribution Shift
Lorenzo Iovine, Giacomo Ziffer, Emanuele Della Valle
Theory Time Series
  • Existing metrics fail to distinguish between adaptation lag and intrinsic data difficulty.
  • Three new metrics are proposed to evaluate model adaptation under temporal distribution shifts.
  • The study reveals that performance degradation may be misinterpreted as poor adaptation.
  • Results indicate that the ID-OOD accuracy gap often reflects adaptation lag rather than a lack of generalization.
Read more
Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing
Ning Yang, Chuangxin Cheng, Haijun Zhang
Large Language Models Reinforcement Learning Optimization
  • COMLLM integrates GRPO and LACS for effective task offloading in MEC.
  • The framework captures long-term impacts of decisions on future system states.
  • Achieves near-optimal latency and improved load-balancing fairness.
  • Exhibits zero-shot scalability, generalizing to larger topologies without retraining.
Read more
SOLAR: Communication-Efficient Model Adaptation via Subspace-Oriented Latent Adapter Reparametrization
Seyed Mahmoud Sajjadi Mohammadabadi, Xiaolong Ma, Lei Yang, Feng Yan, Junshan Zhang
Efficient ML Federated Learning Large Language Models
  • SOLAR reduces communication costs of PEFT methods by reparameterizing updates as linear combinations of basis vectors.
  • The framework is model-agnostic and compatible with existing PEFT techniques, allowing for flexible integration.
  • The method achieves up to 98% reduction in adapter sizes while preserving task performance.
  • A theoretical analysis provides bounds on reconstruction error, ensuring reliability in performance.
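The communication saving follows from transmitting coefficients instead of full adapter matrices; a back-of-the-envelope sketch under the premise that updates lie mostly in a shared subspace (the random basis here is a placeholder, not SOLAR's construction):

```python
# Sketch: reparameterize an adapter update as a linear combination of shared
# basis matrices, so only the coefficient vector crosses the network.
import numpy as np

rng = np.random.default_rng(0)
d, n_basis = 128, 32
basis = rng.normal(size=(n_basis, d, d)) / np.sqrt(d)   # shared, fixed basis

# Premise: the client's local update lies (mostly) in the shared subspace.
true_c = rng.normal(size=n_basis)
delta_w = np.tensordot(true_c, basis, axes=1) + 1e-3 * rng.normal(size=(d, d))

# Client side: least-squares projection onto the basis; send coefficients only.
B = basis.reshape(n_basis, -1).T                        # (d*d, n_basis)
coeff, *_ = np.linalg.lstsq(B, delta_w.ravel(), rcond=None)

# Server side: reconstruct the full update from the coefficient vector.
recon = np.tensordot(coeff, basis, axes=1)
rel_err = np.linalg.norm(recon - delta_w) / np.linalg.norm(delta_w)
print(f"payload: {coeff.nbytes} B vs {delta_w.nbytes} B, rel. error {rel_err:.4f}")
```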
Read more
On the Price of Privacy for Language Identification and Generation
Xiaoyu Li, Andi Han, Jiaojiao Jiang, Junbin Gao
NLP Large Language Models Theory
  • Approximate differential privacy incurs no cost for language identification and generation tasks.
  • Under pure differential privacy, the degradation in performance is characterized by a factor of min{1,ε}.
  • Language generation exhibits a tighter privacy-utility tradeoff compared to language identification.
  • The study provides a complete characterization of the price of privacy for language learning tasks.
Read more
Bias-Constrained Diffusion Schedules for PDE Emulations: Reconstruction Error Minimization and Efficient Unrolled Training
Constantin Le Cleï, Nils Thürey, Xiaoxiang Zhu
Generative Models Optimization Time Series
  • Introduction of the Reconstruction Exposure-Bias concept linking training and inference errors.
  • Development of an Adaptive Noise Schedule to optimize reconstruction error and exposure bias.
  • Proposal of a fast Proxy Unrolled Training method to enhance stability and reduce computational costs.
  • Demonstrated improvements in accuracy and stability over existing diffusion and deterministic models.
Read more
QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch
Hao Gu, Hao Wang, Jiacheng Liu, Lujun Li, Qiyuan Zhu, Bei Liu, Binxing Xu, Lei Wang, Xintong Yang, Sida Lin, Sirui Han, Yike Guo
Reinforcement Learning Large Language Models Efficient ML
  • QaRL minimizes the training-inference mismatch in quantized rollouts for LLMs.
  • Introduces TBPO to stabilize training by controlling updates within a trust region.
  • Demonstrates significant performance improvements over traditional quantized rollout training.
  • Achieves a 1.3x training speedup while maintaining high accuracy.
Read more
Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
Tao Li, Kaiyuan Hou, Tuan Vinh, Monika Raj, Zhichun Guo, Carl Yang
Reinforcement Learning Large Language Models Optimization
  • MOLREACT integrates reinforcement learning with LLMs to optimize lead compounds in drug discovery.
  • The framework ensures synthesizability by using validated reaction templates for molecular modifications.
  • A caching mechanism significantly reduces computational costs during the optimization process.
  • MOLREACT outperforms existing methods in property optimization tasks while maintaining sample efficiency.
Read more
SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization
Wooin Lee, Hyun-Tae Kim
Large Language Models Optimization Efficient ML
  • SAGE optimizes memory usage by replacing the second moment state of AdamW with a dimension-wise adaptive scale.
  • The optimizer effectively addresses the unique challenges posed by high-variance gradients in embedding layers.
  • SAGE achieves state-of-the-art perplexity results while significantly reducing memory requirements compared to existing methods.
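Reading only from the summary, the update direction comes from the gradient sign and its magnitude from one per-dimension adaptive scale; a speculative sketch of an optimizer with that shape, keeping a single state tensor per parameter where AdamW keeps two (a guess at the mechanism, not SAGE's actual rule):

```python
# Speculative sketch: sign-based update scaled by a running per-dimension
# estimate of gradient magnitude; one state tensor per parameter.
import torch

class SignAdaptive(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, beta=0.9):
        super().__init__(params, dict(lr=lr, beta=beta))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "scale" not in state:
                    state["scale"] = torch.zeros_like(p)
                scale = state["scale"]
                # Single first-order statistic: running mean of |grad|.
                scale.mul_(group["beta"]).add_(p.grad.abs(), alpha=1 - group["beta"])
                p.add_(p.grad.sign() * scale, alpha=-group["lr"])

model = torch.nn.Linear(10, 1)
opt = SignAdaptive(model.parameters(), lr=0.01)
loss = model(torch.randn(32, 10)).pow(2).mean()
loss.backward()
opt.step()
```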
Read more
The Impact of Dimensionality on the Stability of Node Embeddings
Tobias Schumacher, Simon Reichelt, Markus Strohmaier
Graph Learning
  • Dimensionality significantly affects the stability of node embeddings.
  • Different embedding methods exhibit varied stability patterns with increasing dimensions.
  • Higher dimensionality does not guarantee optimal performance in downstream tasks.
  • The study highlights the trade-offs between stability, performance, and computational efficiency in graph representation learning.
Read more
Approximation of the Basset force in the Maxey-Riley-Gatignol equations via universal differential equations
Finn Sommer, Vamika Rathi, Sebastian Goetschel, Daniel Ruprecht
Theory Optimization Time Series
  • Introduces a neural network-based approximation for the Basset force in MaRGE.
  • Transforms complex integro-differential equations into solvable ordinary differential equations.
  • Compares FNN and LSTM architectures for capturing historical effects in particle motion.
  • Demonstrates the feasibility of using standard numerical solvers for the modified equations.
Read more
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
Quantong Qiu, Zhiyi Hong, Yi Yang, Haitian Wang, Kebin Liu, Qingqing Dang, Juntao Li, Min Zhang
Large Language Models Efficient ML
  • Flux Attention dynamically optimizes attention computation at the layer level, improving efficiency.
  • The method integrates a lightweight Layer Router to adaptively assign layers to Full or Sparse Attention based on context.
  • The approach achieves significant speed improvements in inference without sacrificing performance.
  • Training is efficient, requiring only 12 hours on 8 GPUs, making it accessible for practical applications.
Read more
Asymptotic-Preserving Neural Networks for Viscoelastic Parameter Identification in Multiscale Blood Flow Modeling
Giulia Bertaglia, Raffaella Fiamma Cabini
Theory
  • Introduction of Asymptotic-Preserving Neural Networks for viscoelastic parameter identification.
  • Integration of physical principles into the neural network training process.
  • Use of non-invasive patient-specific data for estimating pressure waveforms.
  • Demonstration of methodology effectiveness through synthetic and real patient data simulations.
Read more
Learning is Forgetting: LLM Training As Lossy Compression
Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant
NLP Large Language Models Theory
  • LLMs are conceptualized as instances of lossy compression, retaining only relevant information from training data.
  • Pre-training dynamics of LLMs align with Information Bottleneck theory, showing a two-phase trajectory of representation expansion followed by compression.
  • The optimality of a model's compression correlates significantly with its performance across multiple benchmarks.
  • Different LLMs compress information differently, influenced by their training data and methodologies.
Read more
Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees
Marek Gagolewski
Theory Graph Learning Efficient ML
  • Lumbermark is a robust clustering algorithm that can handle varying cluster sizes, densities, and shapes.
  • The algorithm utilizes mutual reachability distances to enhance data distribution and reduce noise impact.
  • Lumbermark allows users to specify the number of clusters, unlike HDBSCAN.
  • An open-source implementation is available for both Python and R.
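The backbone named in the title, mutual reachability distances plus a minimum spanning tree with edges cut, follows a well-known recipe; a compact sketch (Lumbermark's robustness heuristics for choosing which edges to chop are not reproduced):

```python
# Sketch: cluster by cutting the k-1 heaviest edges of a minimum spanning
# tree built over mutual reachability distances.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters(X, k=2, m=5):
    d = squareform(pdist(X))
    core = np.sort(d, axis=1)[:, m]          # distance to m-th nearest neighbor
    mreach = np.maximum(d, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0)
    mst = minimum_spanning_tree(mreach).toarray()
    edges, weights = np.argwhere(mst > 0), mst[mst > 0]
    keep = weights.argsort()[:-(k - 1)] if k > 1 else slice(None)
    adj = np.zeros_like(mst)
    for i, j in edges[keep]:
        adj[i, j] = adj[j, i] = 1
    return connected_components(adj)[1]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
print(np.bincount(mst_clusters(X, k=2)))     # roughly [50, 50]
```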
Read more
Bi-level Heterogeneous Learning for Time Series Foundation Models: A Federated Learning Approach
Shengchao Chen, Guodong Long, Dikai Liu, Jing Jiang
Time Series Federated Learning
  • Introduction of FedTRL, a federated learning method for bi-level heterogeneous learning in time series data.
  • Mitigation of inter-domain and intra-domain conflicts through domain-adversarial optimization and prototype alignment.
  • Demonstrated superior performance of FedTRL over centralized and federated TSFM baselines in forecasting tasks.
  • Provision of a flexible and scalable approach for training TSFMs in heterogeneous environments.
Read more
Cognitive-Causal Multi-Task Learning with Psychological State Conditioning for Assistive Driving Perception
Keito Inoshita, Nobuhiro Hayashida, Akira Imanishi
Multimodal
  • CauPsi models the hierarchical dependencies among multiple driving-related tasks, enhancing inter-task information transfer.
  • The framework introduces a Causal Task Chain for soft-label propagation, reflecting cognitive processes in driving.
  • CTPC integrates psychological state signals into all tasks, addressing the impact of driver emotions on performance.
  • CauPsi achieves state-of-the-art accuracy on the AIDE dataset with a compact model size.
Read more
Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
Changkun Guan, Mengfan Xu
Theory Optimization
  • Pareto regret in stochastic MO-MABs is governed by the maximum sub-optimality gap g†.
  • The proposed algorithm achieves optimal Pareto regret of O(K log T/g†).
  • The algorithm utilizes a two-layer uncertainty quantification strategy for effective exploration and exploitation.
  • Numerical experiments validate the algorithm's performance, showing lower finite-horizon Pareto regret compared to existing methods.
Read more
Distributed Interpretability and Control for Large Language Models
Dev Arpan Desai, Shaoyi Huang, Zining Zhu
Large Language Models Interpretability Efficient ML
  • Introduces a tensor-parallel inference architecture for LLMs that supports activation-level interpretability and steering.
  • Achieves up to 7x reduction in activation memory and up to 41x increase in throughput compared to baseline methods.
  • Enables real-time behavioral steering with stable, monotonic output modifications without additional passes.
  • Demonstrates effectiveness across multiple large models, including LLaMA and Qwen.
Read more
How to sketch a learning algorithm
Sam Gunn
Theory Interpretability Efficient ML
  • Introduces a data deletion scheme that predicts model behavior with diminishing error ε.
  • The scheme's precomputation and prediction are computationally efficient, only slightly slower than traditional methods.
  • The concept of 'stability' is central to the methodology, allowing for robust predictions despite data exclusion.
  • Experiments with microgpt support the stability assumption and its applicability to powerful models.
Read more
Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making
Fan Zhaowen
Robotics Interpretability Reinforcement Learning
  • Introduces an event-centric framework for world modeling in autonomous agents.
  • Utilizes memory-augmented retrieval for decision-making based on prior experiences.
  • Ensures interpretability and consistency with physical constraints in decision-making.
  • Demonstrates effectiveness in real-time UAV flight scenarios.
Read more
Bi-Lipschitz Autoencoder With Injectivity Guarantee
Qipeng Zhan, Zhuoping Zhou, Zexuan Wang, Qi Long, Li Shen
Theory Optimization Generative Models
  • Introduces the Bi-Lipschitz Autoencoder (BLAE) to address non-injectivity in autoencoders.
  • Proposes an injective regularization scheme to eliminate local minima during optimization.
  • Implements a bi-Lipschitz relaxation to ensure geometric preservation and robustness to distribution shifts.
  • Demonstrates superior performance of BLAE over existing methods in preserving manifold structure.
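The bi-Lipschitz property asks that the encoder neither collapse nor blow up pairwise distances: c * ||x1 - x2|| <= ||f(x1) - f(x2)|| <= C * ||x1 - x2||. Below is a sketch of a soft penalty enforcing this on minibatch pairs (an illustration of the constraint, not the paper's exact relaxation scheme):

```python
# Sketch: penalize encoder distance ratios that fall outside [c, C].
import torch

def bilipschitz_penalty(x, z, c=0.5, C=2.0, eps=1e-8):
    dx = torch.cdist(x, x)
    dz = torch.cdist(z, z)
    mask = ~torch.eye(len(x), dtype=torch.bool)
    ratio = dz[mask] / (dx[mask] + eps)
    return (torch.relu(ratio - C) + torch.relu(c - ratio)).mean()

encoder = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                              torch.nn.Linear(128, 32))
x = torch.randn(64, 784)
penalty = bilipschitz_penalty(x, encoder(x))
# Add to the reconstruction loss with a weight, e.g. recon + lam * penalty.
print(penalty.item())
```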
Read more
From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures
Srinidhi Madabhushi, Pranesh Vyas, Swathi Vaidyanathan, Mayur Kurup, Elliott Nash, Yegor Silyutin
Graph Learning
  • Introduces a graph-based anomaly detection system for microservices.
  • Utilizes unsupervised learning with GCN-GAE for structural representation.
  • Achieves high precision and low false positive rates in anomaly detection.
  • Identifies the limitations of traditional load tests in simulating real traffic.
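A GCN-based graph autoencoder for this purpose fits in plain tensor operations: encode with normalized-adjacency convolutions, decode edge probabilities by inner products, and flag nodes with high reconstruction error (a generic GCN-GAE sketch, not the production system described in the paper):

```python
# Generic GCN graph autoencoder: adjacency reconstruction error serves as
# the anomaly signal for services whose call structure shifts.
import torch

def normalize_adj(A):
    A_hat = A + torch.eye(len(A))             # add self-loops
    d = A_hat.sum(1).pow(-0.5)
    return d[:, None] * A_hat * d[None, :]    # D^-1/2 A_hat D^-1/2

class GAE(torch.nn.Module):
    def __init__(self, n_feat, hidden=16, latent=8):
        super().__init__()
        self.w1 = torch.nn.Linear(n_feat, hidden)
        self.w2 = torch.nn.Linear(hidden, latent)

    def forward(self, A_norm, X):
        h = torch.relu(self.w1(A_norm @ X))   # GCN layer 1
        z = self.w2(A_norm @ h)               # GCN layer 2
        return torch.sigmoid(z @ z.T)         # inner-product edge decoder

torch.manual_seed(0)
n = 20
A = (torch.rand(n, n) < 0.2).float()
A = ((A + A.T) > 0).float()
A.fill_diagonal_(0)
X = torch.randn(n, 5)                         # node (service) features

model, A_norm = GAE(5), normalize_adj(A)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy(model(A_norm, X), A)
    loss.backward()
    opt.step()

err = (model(A_norm, X) - A).abs().mean(dim=1)   # per-node reconstruction error
print("most anomalous node:", err.argmax().item())
```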
Read more
Spectral Edge Dynamics Reveal Functional Modes of Learning
Yongzhong Xu
Theory Interpretability Optimization
  • Identification of a spectral edge that distinguishes grokking from non-grokking regimes.
  • Standard interpretability methods fail to capture the spectral edge, indicating its non-localized nature.
  • Functional modes exhibit structured behavior in symmetry-adapted bases, revealing harmonic structures for simpler tasks.
  • Complex tasks require richer functional descriptions beyond simple harmonic bases.
Read more
Reinforcement Learning with Reward Machines for Sleep Control in Mobile Networks
Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp
Reinforcement Learning Optimization
  • Introduces a reinforcement learning framework that incorporates reward machines for sleep control in mobile networks.
  • Addresses the non-Markovian nature of QoS constraints by tracking historical performance through RMs.
  • Balances immediate energy savings with long-term QoS impacts, enhancing energy efficiency in telecommunications.
  • Demonstrates scalability and adaptability in complex network environments with varying traffic patterns.
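A reward machine is a small finite-state automaton over abstract events whose current state is appended to the agent's observation, which makes a history-dependent reward Markovian again; a minimal sketch of the data structure (states, events, and reward values are invented for illustration):

```python
# Minimal reward machine: a finite automaton over abstract network events.
class RewardMachine:
    def __init__(self):
        # (state, event) -> (next_state, reward); values are illustrative.
        self.delta = {
            ("ok", "sleep"):          ("ok", +1.0),       # energy saved, QoS fine
            ("ok", "qos_violation"):  ("degraded", -5.0),
            ("degraded", "wake"):     ("recovering", 0.0),
            ("recovering", "qos_ok"): ("ok", +2.0),        # bonus for recovery
        }
        self.state = "ok"

    def step(self, event):
        self.state, reward = self.delta.get((self.state, event),
                                            (self.state, 0.0))
        return self.state, reward   # state is appended to the RL observation

rm = RewardMachine()
for event in ["sleep", "sleep", "qos_violation", "wake", "qos_ok"]:
    print(event, "->", rm.step(event))
```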
Read more
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu
NLP Large Language Models Efficient ML
  • Introduces the Master Key Hypothesis for cross-model capability transfer.
  • Proposes Unlock, a training-free and label-free framework for capability extraction and transfer.
  • Demonstrates significant performance improvements in reasoning tasks without retraining.
  • Shows that capability transfer can approach the performance of post-trained models.
Read more
Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang, Hong Zhou
Reinforcement Learning Efficient ML Robotics
  • Introduction of the ANDROID COACH framework for online RL in Android agents.
  • Shift from Single State Single Action to Single State Multiple Actions paradigm to enhance exploration.
  • Utilization of a critic for action value estimation to reduce emulator interaction overhead.
  • Integration of a process reward model for improved supervision of agent actions.
Read more
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
Baihui Liu, Kaiyuan Tian, Wei Wang, Zhaoning Zhang, Linbo Qiao, Dongsheng Li
Large Language Models Efficient ML
  • Introduces the concept of activation budget for expert activations in MoE models.
  • Proposes Alloc-MoE, a unified framework optimizing budget allocation at both layer and token levels.
  • Alloc-L utilizes sensitivity profiling and dynamic programming for optimal layer-level activation allocation.
  • Alloc-T dynamically reallocates expert activations based on routing scores to enhance performance.
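Layer-level budget allocation via dynamic programming is a classic resource-allocation pattern; a toy version with made-up per-layer utility tables (Alloc-L's actual sensitivity profiling and objective may differ):

```python
# Toy DP: split a total activation budget across layers to maximize summed
# utility, given per-layer utility-of-k-active-experts tables.
import numpy as np

rng = np.random.default_rng(0)
n_layers, max_k, budget = 6, 4, 14
# utility[l, k]: hypothetical gain from k active experts at layer l,
# nondecreasing in k (the sorted draw gives diminishing returns).
utility = np.sort(rng.random((n_layers, max_k + 1)), axis=1)
utility[:, 0] = 0.0

dp = np.full((n_layers + 1, budget + 1), -np.inf)   # dp[l, b]: best with b used
dp[0, 0] = 0.0
choice = np.zeros((n_layers, budget + 1), dtype=int)
for l in range(n_layers):
    for b in range(budget + 1):
        for k in range(min(max_k, b) + 1):
            cand = dp[l, b - k] + utility[l, k]
            if cand > dp[l + 1, b]:
                dp[l + 1, b], choice[l, b] = cand, k

alloc, b = [], budget                                # backtrack expert counts
for l in reversed(range(n_layers)):
    alloc.append(choice[l, b])
    b -= choice[l, b]
print("experts per layer:", alloc[::-1], "total:", sum(alloc))
```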
Read more
The Rhetoric of Machine Learning
Robert C. Williamson
Theory
  • Machine learning is inherently rhetorical, aiming to persuade rather than merely report facts.
  • The concept of 'manipulation as a service' highlights the commercial use of ML technologies.
  • Viewing ML through the lens of rhetoric can provide fresh insights and stimulate new discussions.
  • The paper critiques the notion of ML as a purely objective technology, emphasizing its socio-technical implications.
Read more
Mathematical analysis of one-layer neural network with fixed biases, a new activation function and other observations
Fabricio Macià, Shu Nakamura
Theory
  • Rigorous proof of convergence for one-hidden-layer neural networks with fixed biases using L2 loss and gradient descent.
  • Introduction of a new activation function, FReX, which maintains convergence properties similar to ReLU.
  • Establishment of the spectral bias property for the learning process.
  • Discussion on the representability of functions and the uniqueness of parameterization in the proposed models.
Read more
An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
Andreas Plesner, Francisco Guzmán, Anish Athalye
Reinforcement Learning Large Language Models
  • RLVR is robust to noisy rewards, with noise rates up to 15% showing minimal impact on performance.
  • Imperfect verification does not fundamentally hinder RLVR effectiveness.
  • Precision in verification is more critical than recall.
  • Diminishing returns are observed when improving verifier accuracy beyond a certain point.
Read more
Benchmark Shadows: Data Alignment, Parameter Footprints, and Generalization in Large Language Models
Hongjian Zou, Yidan Wang, Qi Ding, Yixuan Liao, Xiaoxin Chen
Large Language Models NLP Multimodal
  • Introduces a regime-centric framework linking data distribution to learning dynamics in LLMs.
  • Demonstrates that benchmark-aligned data improves narrow metrics but limits broader capabilities.
  • Shows that coverage-expanding data leads to better generalization and parameter adaptation.
  • Presents parameter-space diagnostics to reveal structural signatures of training regimes.
Read more
Tree-of-Evidence: Efficient 'System 2' Search for Faithful Multimodal Grounding
Micky C. Nnamdi, Benoit L. Marteau, Yishan Zhong, J. Ben Tamo, May D. Wang
Multimodal Interpretability Optimization
  • Tree-of-Evidence (ToE) improves interpretability of Large Multimodal Models (LMMs) by framing it as a discrete optimization problem.
  • ToE employs lightweight Evidence Bottlenecks and a beam search strategy to identify compact evidence sets for model predictions.
  • The method retains over 98% predictive performance while significantly reducing the number of evidence units needed.
  • ToE adapts its search strategy based on the ambiguity of the evidence, effectively integrating both time-series and textual data.
Read more
Latent Structure of Affective Representations in Large Language Models
Benjamin J. Choi, Melanie Weber
NLP Large Language Models Interpretability
  • LLMs learn coherent latent representations of affective emotions that align with psychological valence-arousal models.
  • The representations exhibit modest nonlinearity, challenging the purely linear representation hypothesis.
  • The geometry of these representations can be leveraged to quantify uncertainty in emotion recognition tasks.
  • Complementary evidence includes parallels with human neural data and causal steering experiments for emotional valence.
Read more