AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning
Zhicong Lu, Zichuan Lin, Wei Jia, Changyuan Tian, Deheng Ye, Peiguang Li, Li Jin, Nayu Liu, Guangluan Xu, Wei Feng
Reinforcement Learning Large Language Models
  • Introduction of HISR for improved credit assignment in multi-turn reinforcement learning.
  • Segment-level process rewards align more effectively with sub-goals compared to traditional turn-level rewards.
  • Hindsight model captures action importance, enhancing the reliability of reward assignment.
  • Extensive experiments show HISR achieves state-of-the-art performance on multiple benchmarks.
Read more
Rigorous Error Certification for Neural PDE Solvers: From Empirical Residuals to Solution Guarantees
Amartya Mukherjee, Maxwell Fitzsimmons, David C. Del Rey Fernández, Jun Liu
Theory
  • Establishes a theoretical link between residual-based training objectives and solution-space error for PINNs.
  • Proves that vanishing residual error ensures convergence to the true solution under certain conditions.
  • Derives generalization bounds that can be computed without access to the true solution.
  • Demonstrates the applicability of the theoretical results through numerical experiments on various PDEs.
Read more
Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse
Dip Roy, Rajiv Misra, Sanjay Kumar Singh
Interpretability
  • Extreme neural network sparsification leads to catastrophic interpretability collapse.
  • Global representation quality remains stable while local interpretability degrades significantly.
  • The phenomenon of interpretability collapse is intrinsic to the sparsification process, not algorithm-specific.
  • Extended training does not recover dead neurons, indicating irreversibility of the collapse.
Read more
BVSIMC: Bayesian Variable Selection-Guided Inductive Matrix Completion for Improved and Interpretable Drug Discovery
Sijian Fan, Liyan Xiong, Dayuan Wang, Guoshuai Cai, Ray Bai
Interpretability
  • BVSIMC improves predictive accuracy and interpretability in drug discovery by incorporating variable selection.
  • The model effectively handles high-dimensional and noisy side information using Bayesian variable selection techniques.
  • BVSIMC outperforms existing methods in predicting drug resistance and drug repositioning.
  • The approach reveals clinically meaningful side features that contribute to drug-disease interactions.
Read more
When Differential Privacy Meets Wireless Federated Learning: An Improved Analysis for Privacy and Convergence
Chen Yaoling, Liang Hao, Tu Xiaotong
Federated Learning Theory Optimization
  • Introduces a precise characterization of privacy loss in DPWFL that converges to a constant.
  • Incorporates both device selection and mini-batch sampling in the analysis.
  • Establishes convergence guarantees for general non-convex objectives while considering gradient clipping.
  • Derives an explicit privacy-utility trade-off, improving upon existing methods.
Read more
Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction
Anh-Tuan Dao, Driss Matrouf, Mickael Rouvier, Nicholas Evans
Audio & Speech
  • Multi-corpus training can degrade performance in spoofing detection due to dataset-specific biases.
  • The proposed IDFE framework effectively minimizes corpus-specific information in embeddings.
  • The IDFE framework achieved a 20% reduction in average EER compared to baseline models.
  • The study emphasizes the need for improved generalization in spoofing detection systems.
Read more
A Family of Adaptive Activation Functions for Mitigating Failure Modes in Physics-Informed Neural Networks
Krishna Murari
Theory Optimization Efficient ML
  • Introduction of adaptive wavelet-based activation functions for PINNs.
  • Significant improvements in training stability and accuracy over traditional activation functions.
  • Evaluation across multiple PDE classes demonstrating robustness.
  • Validation against various models including PINNsFormer and other deep learning architectures.
Read more
MST-Direct: Matching via Sinkhorn Transport for Multivariate Geostatistical Simulation with Complex Non-Linear Dependencies
Tchalies Bachmann Schmitz
Optimization Theory Generative Models
  • MST-Direct preserves complex non-linear dependencies in multivariate geostatistical simulations.
  • The algorithm uses Optimal Transport theory and the Sinkhorn algorithm for direct distribution matching.
  • MST-Direct processes all variables simultaneously, enhancing computational efficiency.
  • Comprehensive experiments show 100% shape preservation across various complex relationship types.
Read more
Are complicated loss functions necessary for teaching LLMs to reason?
Gabriele Carrino, Andrea Sassella, Nicolo Brunello, Federico Toschi, Mark James Carman
Large Language Models Reinforcement Learning Optimization
  • Negative feedback is crucial for effective learning in LLMs.
  • PPO-style constraints are not necessary for improving mathematical reasoning.
  • RGRA, a simplified variant of GRPO, can outperform GRPO on reasoning tasks.
  • Simpler reinforcement learning methods can enhance reasoning in LLMs.
Read more
Communication-Efficient and Robust Multi-Modal Federated Learning via Latent-Space Consensus
Mohamed Badi, Chaouki Ben Issaid, Mehdi Bennis
Federated Learning Multimodal
  • Introduction of CoMFed, a framework for multi-modal federated learning that enhances communication efficiency.
  • Utilization of learnable projection matrices to create compressed latent representations for heterogeneous clients.
  • Implementation of a robust alignment regularizer based on geometric-median consensus to improve resilience against outliers.
  • Demonstration of competitive accuracy in human activity recognition tasks with minimal communication costs.
Read more
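CoMFed's alignment regularizer is built around a geometric-median consensus, and the geometric median itself is classically computed with Weiszfeld's iteration. A minimal sketch of that standard routine (my own illustration of the consensus primitive, not the paper's code; the toy client vectors are invented):

```python
import numpy as np

def geometric_median(points, n_iter=100, tol=1e-9):
    """Weiszfeld iteration: the point minimizing total Euclidean distance."""
    z = points.mean(axis=0)                 # start from the ordinary mean
    for _ in range(n_iter):
        d = np.linalg.norm(points - z, axis=1)
        d = np.maximum(d, tol)              # guard against division by zero
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

# Three well-behaved clients plus one outlier: the median stays near
# the honest cluster, while the plain mean is dragged far away.
clients = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [100.0, 100.0]])
med = geometric_median(clients)
```

The outlier-resistance shown here is exactly why a geometric-median consensus is more robust than simple averaging across heterogeneous clients.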
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
Arundhathi Dev, Justin Zhan
NLP Large Language Models Optimization
  • AFBS-BO automates hyperparameter tuning for sparse attention, eliminating the need for manual grid search.
  • The framework achieves a 3.4× speedup in hyperparameter discovery and requires 8.8× fewer evaluations than traditional methods.
  • Configurations discovered by AFBS-BO outperform existing sparse attention methods while maintaining high model quality.
  • The approach leverages multi-fidelity evaluation to efficiently explore hyperparameter spaces.
Read more
Hierarchical Latent Structure Learning through Online Inference
Ines Aitsahalia, Kiyohito Iigaya
Theory Time Series Efficient ML
  • HOLMES integrates hierarchical representation with online inference for latent structure learning.
  • The model uses a nested Chinese Restaurant Process prior for dynamic latent tree construction.
  • HOLMES achieves compact representations that support efficient transfer learning.
  • It demonstrates improved predictive performance in context-dependent tasks compared to flat models.
Read more
Variational Phasor Circuits for Phase-Native Brain-Computer Interface Classification
Dibakar Sigdel
Theory Efficient ML Time Series
  • Introduction of the Variational Phasor Circuit (VPC) as a phase-native learning architecture.
  • VPC utilizes trainable phase shifts and local unitary mixing for BCI classification.
  • Demonstrated competitive accuracy with fewer parameters compared to traditional classifiers.
  • VPC serves as a bridge between classical oscillatory signal processing and quantum systems.
Read more
AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
An Luo, Jin Du, Xun Xian, Robert Specht, Fangqiao Tian, Ganghua Wang, Xuan Bi, Charles Fleming, Ashish Kundu, Jayanth Srinivasa, Mingyi Hong, Rui Zhang, Tianxi Li, Galin Jones, Jie Ding
Multimodal
  • AI agents struggle with domain-specific reasoning and often perform poorly on tasks requiring specialized knowledge.
  • Human expertise is crucial for diagnosing issues, incorporating domain knowledge, and making strategic decisions.
  • Human-AI collaboration yields superior results compared to either humans or AI working independently.
  • The AgentDS benchmark provides a structured way to evaluate and improve human-AI collaboration in data science.
Read more
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
J. Clayton Kerce
Interpretability NLP Large Language Models
  • Introduces per-layer supervision to enhance modularity in transformer architectures.
  • Demonstrates that per-layer supervision leads to larger and more predictable ablation effects.
  • Establishes a feature engineering methodology that captures computational dynamics independent of vocabulary.
  • Shows that different tasks can route through different attention heads, indicating functional reorganization.
Read more
MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data
Masoumeh Shafieinejad, Xi He, Mahshid Alinoori, John Jewell, Sana Ayromlou, Wei Pang, Veronica Chatrath, Garui Sharma, Deval Pandya
Generative Models
  • The MIDST Challenge quantitatively evaluates the privacy of synthetic tabular data generated by diffusion models.
  • It introduces novel membership inference attacks tailored for complex tabular data.
  • The challenge encompasses both single-table and multi-table data synthesis scenarios.
  • The results aim to inform industry practices regarding privacy-preserving technologies.
Read more
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Naoki Morihira, Amal Nahar, Kartik Bharadwaj, Yasuhiro Kato, Akinobu Hayashi, Tatsuya Harada
Reinforcement Learning Computer Vision Efficient ML
  • Introduces R2-Dreamer, a decoder-free MBRL framework that eliminates the need for data augmentation.
  • Utilizes a self-supervised redundancy-reduction objective to prevent representation collapse.
  • Achieves competitive performance on standard benchmarks and superior results on DMC-Subtle.
  • Trains significantly faster than existing models like DreamerV3.
Read more
Seasoning Generative Models for a Generalization Aftertaste
Hisham Husain, Valentin De Bortoli, Richard Nock
Generative Models Theory
  • Introduces a discriminator-guided recipe for refining generative models.
  • Establishes a strong duality result for f-divergences that enhances understanding of generative model training.
  • Demonstrates that refined generative models show improved generalization capabilities.
  • Connects theoretical insights to practical applications in score-based diffusion models.
Read more
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
Gregory N. Frank
NLP Large Language Models Theory
  • Detection of concepts in language models is trivial and does not indicate alignment effectiveness.
  • Surgical ablation can effectively remove censorship mechanisms, leading to improved factual outputs.
  • Routing mechanisms are lab- and model-specific, affecting how concepts are expressed in outputs.
  • Refusal-based evaluations fail to capture the complexity of censorship, as models may still be heavily influenced by steering mechanisms.
Read more
Signals of Success and Struggle: Early Prediction and Physiological Signatures of Human Performance across Task Complexity
Yufei Cao, Penny Sweetser, Ziyu Chen, Xuanying Zhu
Multimodal
  • Early physiological signals can predict future performance outcomes in interactive tasks.
  • High performers show targeted gaze and stable cardiac activation under increasing task complexity.
  • The study achieved a balanced accuracy of 0.86 using an ocular-cardiac fusion model.
  • Physiological measures provide insights into cognitive processes and emotional states during task execution.
Read more
AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models
Chengxuan Lu, Shukuan Wang, Yanjie Li, Wei Liu, Shiji Jin, Fuyuan Qian, Peiming Li, Baigui Sun, Yang Liu
Reinforcement Learning Multimodal Robotics
  • AcceRL introduces a fully asynchronous and decoupled RL framework for VLA models.
  • The framework integrates a trainable world model to generate synthetic experiences, enhancing sample efficiency.
  • AcceRL achieves state-of-the-art performance on the LIBERO benchmark.
  • The architecture exhibits super-linear scaling in throughput and efficient hardware utilization.
Read more
Context Bootstrapped Reinforcement Learning
Saaket Agashe, Jayanth Srinivasa, Gaowen Liu, Ramana Kompella, Xin Eric Wang
Reinforcement Learning Large Language Models NLP
  • Introduction of Context Bootstrapped Reinforcement Learning (CBRL) to enhance RLVR.
  • CBRL uses a stochastic injection of few-shot demonstrations to improve exploration efficiency.
  • Demonstrated consistent performance improvements across multiple tasks and model families.
  • CBRL is algorithm-agnostic, yielding gains with different reinforcement learning algorithms.
Read more
Off-Policy Learning with Limited Supply
Koichi Tanaka, Ren Kishimoto, Bushun Kawagishi, Yusuke Narita, Yasuo Yamamoto, Nobuyuki Shimizu, Yuta Saito
Theory Optimization Reinforcement Learning
  • Conventional greedy OPL methods can lead to suboptimal performance in limited supply scenarios.
  • The paper introduces a novel method, OPLS, that focuses on relative expected rewards for better item allocation.
  • Theoretical proofs confirm the existence of superior policies under limited supply conditions.
  • Empirical results demonstrate OPLS's effectiveness over existing OPL methods in various datasets.
Read more
Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training
Wenshuo Wang, Fan Zhang
Optimization Time Series Theory
  • Introduces Gradient-Informed Temporal Sampling (GITS) for improved data selection in PDE surrogate training.
  • GITS optimizes local gradients and temporal coverage to enhance rollout accuracy.
  • Demonstrates superior performance over traditional sampling methods across multiple PDE systems.
  • Ablation studies validate the importance of GITS's dual optimization objectives.
Read more
Path-Constrained Mixture-of-Experts
Zijin Gu, Tatiana Likhomanenko, Vimal Thilak, Jason Ramapuram, Navdeep Jaitly
NLP Large Language Models Efficient ML
  • PathMoE constrains the expert path space by sharing router parameters across consecutive layers.
  • The method shows consistent performance improvements over conventional independent routing in language modeling tasks.
  • PathMoE eliminates the need for auxiliary load balancing losses while maintaining balanced expert utilization.
  • Improved cross-layer coordination leads to better specialization and robustness in routing.
Read more
SINDy-KANs: Sparse identification of non-linear dynamics through Kolmogorov-Arnold networks
Amanda A. Howard, Nicholas Zolman, Bruno Jacob, Steven L. Brunton, Panos Stinis
Theory Interpretability
  • SINDy-KANs combine KANs and SINDy to enhance model interpretability.
  • The framework allows for symbolic regression of function compositions.
  • SINDy-KANs enforce learning of parsimonious equations directly.
  • The method is validated through multiple symbolic regression tasks.
Read more
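The SINDy half of this combination rests on sequentially thresholded least squares over a library of candidate functions. A hedged sketch of that core regression step (plain SINDy on synthetic data, not the KAN-based variant), recovering dx/dt = -2x from a library [1, x, x²]:

```python
import numpy as np

def stlsq(Theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (the SINDy regression step)."""
    xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune weak library terms
        big = ~small
        if big.any():                        # refit on the surviving terms
            xi[big] = np.linalg.lstsq(Theta[:, big], dxdt, rcond=None)[0]
    return xi

# Synthetic data from dx/dt = -2x, with library columns [1, x, x^2]
x = np.linspace(0.1, 2.0, 50)
dxdt = -2.0 * x
Theta = np.column_stack([np.ones_like(x), x, x**2])
xi = stlsq(Theta, dxdt)   # expect roughly [0, -2, 0]
```

Thresholding is what enforces the parsimony the summary mentions: only the terms that genuinely explain the dynamics survive.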
RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
Yifan Zhang, Liang Zheng
Reinforcement Learning
  • Introduces RE-SAC framework to disentangle aleatoric and epistemic uncertainties in bus fleet control.
  • Employs IPM-based weight regularization to stabilize Q-value estimates against aleatoric risks.
  • Utilizes a diversified Q-ensemble to address epistemic risks and prevent overconfidence in sparse data regions.
  • Demonstrates superior performance and stability in simulations compared to standard DRL approaches.
Read more
Evaluating Model-Free Policy Optimization in Masked-Action Environments via an Exact Blackjack Oracle
Kevin Song
Reinforcement Learning Optimization Theory
  • Development of an exact dynamic programming oracle for blackjack, providing a rigorous benchmark for policy optimization.
  • Comparison of three model-free optimizers (REINFORCE, SPSA, CEM) in recovering optimal policies, with REINFORCE showing the best performance.
  • Significant cell-conditional regret observed across all methods, indicating persistent policy-level errors despite smooth reward convergence.
  • Establishment and empirical validation of the minimum-bet optimality theorem under no-count constraints.
Read more
Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control
Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman
Reinforcement Learning Time Series
  • Introduces an autoencoder-based mechanism for regime detection that adapts to market conditions.
  • Utilizes dual node transformer architecture for specialized processing of stable and volatile market states.
  • Employs a Soft Actor-Critic reinforcement learning controller for dynamic adjustment of regime detection thresholds.
  • Achieves a 26% reduction in MAPE and a 7 percentage point improvement in directional accuracy over baseline models.
Read more
Enhancing the Parameterization of Reservoir Properties for Data Assimilation Using Deep VAE-GAN
Marcio Augusto Sampaio, Paulo Henrique Ranazzi, Martin Julian Blunt
Generative Models
  • Introduces a hybrid VAE-GAN model to enhance data assimilation in reservoir simulations.
  • Addresses limitations of traditional ESMDA methods, particularly regarding finite ensemble sizes and Gaussian assumptions.
  • Demonstrates improved quality of reservoir descriptions and effective history matching through two case studies.
Read more
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
Prince Zizhuang Wang, Shuli Jiang
Reinforcement Learning Large Language Models NLP
  • SLEA-RL retrieves experiences at each decision step based on current observations, unlike traditional methods that use static retrieval.
  • The framework includes a self-evolving experience library that maintains quality through score-based admission and rate-limited extraction.
  • Empirical results show SLEA-RL outperforms various reinforcement learning and experience-augmented baselines on multiple benchmarks.
  • The approach allows agents to adaptively leverage accumulated experiences, enhancing their learning and decision-making capabilities.
Read more
Frayed RoPE and Long Inputs: A Geometric Perspective
Davis Wertheimer, Aozhong Zhang, Derrick Liu, Penghang Yin, Naigang Wang
NLP Large Language Models Theory
  • RoPE causes performance degradation for long inputs due to the disruption of key/query cluster separation.
  • The concept of sink tokens is critical for preventing over-mixing of information in attention mechanisms.
  • RoPE-ID is proposed as a modification that maintains cluster separation and improves generalization to longer contexts.
  • Empirical validation shows that RoPE-ID outperforms prior tuning-free methods on long-context tasks.
Read more
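For context on the geometry being discussed: standard RoPE rotates each pair of query/key dimensions by a position-dependent angle, so attention scores depend only on relative position. A minimal sketch of vanilla RoPE (the baseline, not the paper's RoPE-ID modification):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector x at position pos.

    Dimension pairs (2i, 2i+1) are rotated by angle pos * base**(-2i/d).
    """
    d = x.shape[-1]
    freqs = base ** (-np.arange(d // 2) * 2.0 / d)
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Key property: the q.k score depends only on the relative offset,
# so shifting both positions by 100 leaves the score unchanged.
q = np.random.default_rng(0).normal(size=8)
k = np.random.default_rng(1).normal(size=8)
s1 = rope(q, 5) @ rope(k, 3)
s2 = rope(q, 105) @ rope(k, 103)
```

Because each rotation is norm-preserving, any long-context degradation must come from how these rotations interact with the learned key/query geometry, which is the cluster-separation argument the paper develops.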
Towards Differentiating Between Failures and Domain Shifts in Industrial Data Streams
Natalia Wojak-Strzelecka, Szymon Bobek, Grzegorz J. Nalepa, Jerzy Stefanowski
Time Series Interpretability
  • Introduces a method to differentiate between failures and domain shifts in industrial data streams.
  • Utilizes a modified Page-Hinkley changepoint detector for identifying changes in data distribution.
  • Incorporates supervised domain-adaptation algorithms for fast online anomaly detection.
  • Demonstrates the method's effectiveness through experiments on a steel factory dataset.
Read more
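The Page-Hinkley test at the heart of this method tracks the cumulative deviation of a stream from its running mean and flags a change when that statistic drifts past a threshold. A hedged sketch of the classical (unmodified) detector, with invented toy data:

```python
def page_hinkley(stream, delta=0.05, threshold=5.0):
    """Classical Page-Hinkley test for an upward shift in the mean.

    Returns the 0-based index at which a change is flagged, or None.
    delta is a tolerance on small fluctuations; threshold sets sensitivity.
    """
    mean = 0.0
    cum = 0.0      # cumulative deviation statistic m_t
    cum_min = 0.0  # running minimum of m_t
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t              # incremental running mean
        cum += x - mean - delta
        cum_min = min(cum_min, cum)
        if cum - cum_min > threshold:       # PH_t = m_t - min(m_t)
            return t - 1
    return None

# Mean jumps from 0 to 3 at index 50; detection follows shortly after
stream = [0.0] * 50 + [3.0] * 50
idx = page_hinkley(stream)
```

The delta/threshold trade-off here is the usual sensitivity knob: the paper's contribution is distinguishing which of these flagged changes are failures versus benign domain shifts.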
Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization
Takato Yasuno
NLP Large Language Models Efficient ML
  • Optimal training scale for domain-specific Japanese LMs is identified as 4,000 samples.
  • Llama-3 models with Japanese pre-training outperform multilingual models in technical domains.
  • Quantization effects vary by architecture, with Llama-3 models improving under Q4 quantization.
  • The study provides a complete reproducible pipeline for deploying QLoRA on consumer hardware.
Read more
Book your room in the Turing Hotel! A symmetric and distributed Turing Test with multiple AIs and humans
Christian Di Maio, Tommaso Guidi, Luigi Quarantiello, Jack Bell, Marco Gori, Stefano Melacci, Vincenzo Lomonaco
NLP Large Language Models Theory
  • Introduction of TuringHotel as a distributed Turing Test framework.
  • Implementation on the UNaIVERSE platform, allowing for mixed communities of humans and AIs.
  • Findings indicate that current LLMs can sometimes be indistinguishable from humans, but human traits are still identifiable.
  • Advocacy for open AI practices to ensure transparency and public oversight in AI evaluations.
Read more
Towards Noise-Resilient Quantum Multi-Armed and Stochastic Linear Bandits
Zhuoyue Chen, Kechao Cai
Theory Optimization Efficient ML
  • Introduction of a noise-robust QMC algorithm (BQMC) for improved estimation in noisy environments.
  • Development of noise-resilient quantum bandit algorithms (NR-QUCB and NR-QLinUCB) that integrate BQMC.
  • Demonstration of logarithmic regret behavior under realistic noise conditions.
  • Extensive experimental validation showing improved performance across multiple noise models.
Read more
STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation
Chen Zhang, Liwei Liu, Jun Tao, Xiaoyu Yang, Xuenan Xu, Kai Chen, Bowen Zhou, Wen Wu, Chao Zhang
Time Series
  • STEP framework leverages cross-domain distillation to enhance scientific time-series representation learning.
  • Introduces adaptive patching and statistics compensation to handle diverse and extreme-length sequences.
  • Demonstrates the transferability of foundation models from related time series domains.
  • Achieves strong performance across various scientific time series tasks.
Read more
Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning
Jiaxin Liu
Reinforcement Learning Robotics Theory
  • Introduces Interventional Boundary Discovery (IBD) for causal identification in RL.
  • IBD uses the agent's actions as interventions to distinguish causal dimensions from confounded distractors.
  • Demonstrates that traditional observational methods can misidentify relevant features when distractors are present.
  • IBD closely tracks oracle performance and is effective across various RL algorithms.
Read more
Mathematical Foundations of Deep Learning
Xiaojing Ye
Theory Optimization Generative Models
  • Deep learning is fundamentally a mathematical enterprise, requiring a solid understanding of function approximation and optimization.
  • The book emphasizes the importance of theoretical guarantees and mathematical rigor in the design and training of neural networks.
  • Integration of deep learning with optimal control and reinforcement learning showcases its versatility and applicability in various fields.
  • The text serves as a bridge for readers from different backgrounds, offering insights into both the mathematical and practical aspects of deep learning.
Read more
Discounted Beta–Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards

Haechan Kim, Soohyun Ryu, Gyouk Chu, Doohyuk Jang, Eunho Yang
Reinforcement Learning Large Language Models Efficient ML
  • Introduces Discounted Beta–Bernoulli (DBB) reward estimation to improve sample efficiency in RLVR.
  • Addresses issues of variance collapse and high estimation variance in existing group-based RLVR methods.
  • DBB leverages historical reward statistics, providing more stable training signals.
  • Empirical results show significant performance improvements on both in-distribution and out-of-distribution benchmarks.
Read more
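The core estimator described here can be pictured as a Beta posterior over a Bernoulli success rate whose counts are exponentially discounted, so recent rewards dominate stale history. A hedged sketch of that idea (class name, discount scheme, and toy data are my assumptions, not the paper's exact formulation):

```python
class DiscountedBetaBernoulli:
    """Beta posterior over a Bernoulli success rate with discounted counts.

    Older observations are down-weighted by gamma, so the posterior mean
    tracks a drifting success probability instead of averaging all history.
    """
    def __init__(self, gamma=0.9, prior_a=1.0, prior_b=1.0):
        self.gamma = gamma
        self.a = prior_a   # discounted success count (plus Beta prior)
        self.b = prior_b   # discounted failure count (plus Beta prior)

    def update(self, success: bool):
        self.a = self.gamma * self.a + (1.0 if success else 0.0)
        self.b = self.gamma * self.b + (0.0 if success else 1.0)

    @property
    def mean(self):
        return self.a / (self.a + self.b)

est = DiscountedBetaBernoulli(gamma=0.9)
for _ in range(30):       # a long early run of failures...
    est.update(False)
for _ in range(10):       # ...then recent successes dominate the estimate
    est.update(True)
```

A plain average over this history would sit at 10/40 = 0.25; the discounted posterior mean ends up well above 0.5, which illustrates how discounting keeps the training signal responsive.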
From exp to poly: Gaussian Splatting with Polynomial Kernels
Joerg H. Mueller, Martin Winter, Markus Steinberger
Computer Vision Efficient ML
  • Introduction of a polynomial kernel approximation for Gaussian Splatting that is computationally efficient and compatible with existing datasets.
  • Demonstration of significant performance improvements (4%-15%) with negligible impact on image quality.
  • Mathematical derivation proving invariance of anti-aliasing normalization factors for arbitrary kernel functions.
  • Methodology for fitting polynomial coefficients using an L1 loss with a sampling strategy tailored for practical rendering distributions.
Read more
Foundations of Schrödinger Bridges for Generative Modeling
Sophia Tang
Generative Models Theory Optimization
  • Introduces Schrödinger bridges as a unifying framework for generative modeling.
  • Develops mathematical foundations linking optimal transport and stochastic control.
  • Provides a comprehensive toolkit for constructing Schrödinger bridges.
  • Explores various applications of Schrödinger bridges in generative modeling.
Read more
Enactor: From Traffic Simulators to Surrogate World Models
Yash Ranjan, Rahul Sengupta, Anand Rangarajan, Sanjay Ranka
Generative Models Reinforcement Learning Robotics
  • Enactor utilizes a transformer-based architecture to model actor interactions in traffic simulations.
  • The model generates physically consistent trajectories over long time periods, addressing limitations of traditional methods.
  • It operates in a 'simulation-in-the-loop' framework, allowing for real-time control of actor dynamics.
  • Enactor requires fewer training samples than traditional agent-centric generative approaches.
Read more
FlowMS: Flow Matching for De Novo Structure Elucidation from Mass Spectra
Jianan Nie, Peng Gao
Generative Models Graph Learning
  • FlowMS is the first discrete flow matching framework for spectrum-conditioned molecular generation.
  • It achieves state-of-the-art performance on 5 out of 6 metrics on the NPLIB1 benchmark.
  • The model enforces chemical formula constraints during generation, enhancing structural plausibility.
  • FlowMS demonstrates a 9.15% top-1 accuracy, representing a 9.7% improvement over the previous best model.
Read more
Enhancing Pretrained Model-based Continual Representation Learning via Guided Random Projection
Ruilin Li, Heming Zou, Xiufeng Yan, Zheming Liang, Jie Yang, Chenliang Li, Xue Yang
Theory Efficient ML Optimization
  • Introduction of SCL-MGSM, a method that enhances RPL construction using a data-guided approach.
  • MGSM selects informative and non-redundant random bases, improving expressivity without high dimensionality.
  • Theoretical convergence analysis supports the stability of the proposed method during updates.
  • Extensive experiments show SCL-MGSM outperforms existing methods in exemplar-free CIL benchmarks.
Read more
Position: Spectral GNNs Are Neither Spectral Nor Superior for Node Classification
Qin Jiang, Chengjia Wang, Michael Lones, Dongdong Chen, Wei Pang
Graph Learning Theory
  • Spectral GNNs are theoretically flawed and do not represent true spectral mechanisms.
  • Graph Fourier bases used in Spectral GNNs lack the properties of classical Fourier bases.
  • Polynomial approximations in Spectral GNNs can exactly interpolate spectral responses, challenging their theoretical justification.
  • The performance of GCNs is attributed to message-passing dynamics rather than spectral filtering.
Read more
Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
Sahil Tyagi, Feiyi Wang
Optimization Efficient ML Computer Vision
  • Tula optimizes large-batch training by balancing time, cost, and model quality.
  • The service predicts training time and cost to within 7.5-14% across multiple models.
  • It achieves up to 20× speedup and improves test accuracy by approximately 9% on average.
  • A gradient-scaling technique is introduced to mitigate the generalization gap associated with large batches.
Read more
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
Youjin Wang, Jiaqiao Zhao, Rong Fu, Run Zhou, Ruizhe Zhang, Jiani Liang, Suisuai Cao, Feng Zhou
Efficient ML Computer Vision NLP
  • InfoMamba integrates a lightweight global aggregation pathway with a selective recurrent pathway.
  • The architecture replaces traditional self-attention with a concept-bottleneck linear filtering layer.
  • Information-Maximizing Fusion (IMF) dynamically injects global context into local SSM dynamics.
  • Extensive experiments show superior performance compared to existing Transformer and SSM models.
Read more