AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

68 Papers today
8h Update frequency
7 Days of history
Multimodal Graph-based Classification of Esophageal Motility Disorders
Alexander Geiger, Lars Wagner, Daniel Rueckert, Alois Knoll, Dirk Wilhelm, Alissa Jell
Multimodal Graph Learning
  • Proposes a multimodal ML approach combining high-resolution impedance manometry (HRIM) data with patient-specific information.
  • Uses graph-based modeling to represent HRIM data, enhancing the analysis of esophageal motility.
  • Demonstrates improved classification accuracy over traditional methods and vision-based classifiers.
  • Highlights the importance of integrating multiple data modalities for better diagnostic outcomes.
Read more
HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts
Tao Zhong, Dongzhe Zheng, Christine Allen-Blanchette
Efficient ML Large Language Models Theory
  • Introduces HodgeCover, a learning-free expert-selection method based on higher-order topological structures.
  • Identifies the harmonic kernel of the simplicial Laplacian as a key component for expert mergeability.
  • Demonstrates that HodgeCover can achieve significant expert reduction while maintaining or improving model accuracy.
  • Presents a hybrid approach (HodgeCover+Wanda) that combines expert selection with weight pruning for enhanced compression.
Read more
What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions
Shuqi Gu, Yongxiang Zhao, Baoyu Jing, Kan Ren
Time Series Multimodal
  • Introduction of counterfactual time series forecasting with textual conditions.
  • Development of the TADIFF model that utilizes a text-attribution mechanism.
  • Creation of a comprehensive evaluation framework for factual and counterfactual settings.
  • Implementation of counterfactual data augmentation to improve model adaptability.
Read more
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
Cristian Hinostroza, Rodrigo Toro Icarte, Christ Devia, Andres Carvallo De Ferari, Eugenio Herrera-Berg, Denis Parra, Jorge F Silva
NLP Large Language Models Interpretability
  • Cosine similarity is an unreliable metric for assessing layer relevance in LLMs.
  • The correlation between cosine similarity and performance degradation is often weak or moderate.
  • A proposed alternative metric based on the actual accuracy drop offers a more accurate assessment of layer importance (both metrics are contrasted in the sketch below).
  • Empirical results show that significant performance can be maintained even after removing a substantial number of layers.
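A minimal sketch contrasting the two metrics, assuming precomputed per-layer hidden states and a hypothetical eval_accuracy(l) callable that re-evaluates the task with layer l skipped (neither interface is from the paper):

```python
import torch
import torch.nn.functional as F

def layer_relevance_scores(hidden_states, eval_accuracy, base_accuracy):
    # hidden_states[l]: (n, d) representations entering layer l.
    # Cosine similarity between a layer's input and output is the cheap
    # proxy the paper critiques; the accuracy drop when the layer is
    # skipped is the proposed direct measure.
    cos_scores, drop_scores = [], []
    for l in range(len(hidden_states) - 1):
        cos = F.cosine_similarity(hidden_states[l], hidden_states[l + 1], dim=-1).mean()
        cos_scores.append(1.0 - cos.item())                   # low similarity => "relevant"
        drop_scores.append(base_accuracy - eval_accuracy(l))  # direct measure
    return cos_scores, drop_scores
```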
Read more
Proposal and study of statistical features for string similarity computation and classification
E.O. Rodrigues, D. Casanova, M. Teixeira, V. Pegorini, F. Favarim, E. Clua, A. Conci, Panos Liatsis
NLP
  • Introduction of COM and RLM features for string similarity computation.
  • Features are language-agnostic and purely statistical.
  • COM and RLM outperform traditional statistical measures in synthetic experiments.
  • RLM features achieve the best results in a real text plagiarism dataset.
Read more
NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
Qazi Mamunur Rashid, Xuan Yang, Zhengzhe Yang, Yanzhou Pan, Erin van Liemt, Darlene Neal, Kshitij Pancholi, Jamila Smith-Loud
NLP Large Language Models Generative Models
  • Introduction of NodeSynth, a methodology for generating socially aligned synthetic data for AI evaluation.
  • NodeSynth significantly outperforms human-authored benchmarks in eliciting model failures.
  • The methodology utilizes a fine-tuned taxonomy generator (TaG) grounded in real-world evidence.
  • Ablation studies confirm the importance of granular taxonomic depth in identifying model vulnerabilities.
Read more
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
Yuxin Guo, Yihao Yue, Yunhao Ni, Yizhou Ruan, Jie Luo, Wenjun Wu, Lei Huang
Efficient ML
  • Introduces a framework for replacing Layer Normalization with RMSNorm in DNNs.
  • Defines 'foldable LNs' and develops a graph-based detection algorithm (the folding identity is sketched below).
  • Achieves 2% to 12% inference-time acceleration without changing model predictions.
  • Maintains competitive performance compared to standard Layer Normalization in practical training settings.
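The folding identity behind 'foldable LNs' can be made concrete. A minimal sketch, assuming a Linear feeding directly into a LayerNorm (the paper's graph-based detection of where folding is legal is not reproduced):

```python
import torch
import torch.nn as nn

def fold_centering(linear: nn.Linear) -> nn.Linear:
    # LayerNorm(x) == RMSNorm(C x) with C = I - (1/d) * ones(d, d),
    # because Var[x] equals mean((C x)^2).  If x = W z + b, then
    # C x = (C W) z + C b, so the mean subtraction folds into the Linear.
    d = linear.out_features
    C = torch.eye(d) - torch.full((d, d), 1.0 / d)
    folded = nn.Linear(linear.in_features, d, bias=linear.bias is not None)
    with torch.no_grad():
        folded.weight.copy_(C @ linear.weight)
        if linear.bias is not None:
            folded.bias.copy_(C @ linear.bias)
    return folded
```

After folding, the LayerNorm can be swapped for an RMSNorm with the same scale parameter (recent PyTorch versions ship nn.RMSNorm); a LayerNorm bias, if present, stays behind as a separate additive term.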
Read more
Contextual Bandits for Resource-Constrained Devices using Probabilistic Learning
Marco Angioli, Kevin Johansson, Antonello Rosato, Amy Loutfi, Denis Kleyko
Reinforcement Learning Efficient ML Theory
  • Introduces probabilistic HD-CB, a low-precision variant of HD-CB for resource-constrained devices.
  • Replaces deterministic accumulation with a probabilistic update rule to enhance decision-making efficiency.
  • Demonstrates improved performance over binarized HD-CB while maintaining low precision.
  • Addresses the overflow issue in low-precision components without the need for periodic binarization.
Read more
Discovery of Hidden Miscalibration Regimes
Katarzyna Kobalczyk, Mihaela van der Schaar
Large Language Models NLP Interpretability
  • Introduces the concept of hidden miscalibration regimes that are not detectable through traditional calibration methods.
  • Defines an input-dependent miscalibration field to measure calibration error across the input space.
  • Demonstrates the prevalence of calibration heterogeneity in large language models across various datasets.
  • Provides a diagnostic framework that supports local confidence corrections, enhancing model reliability.
Read more
Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise
Puyu Wang, Jan Schuchardt, Nikita Kalinin, Junyu Zhou, Sophie Fellenz, Christoph Lampert, Marius Kloft
Theory Optimization
  • First population risk bounds for KANs trained with mini-batch SGD and correlated noise.
  • Establishes bounds for both non-private and differentially private settings.
  • Introduces a novel analysis framework for correlated-noise DP training in non-convex regimes.
  • Demonstrates that correlated noise can improve the privacy-utility tradeoff compared to independent noise.
Read more
Reliability-Gated Source Anchoring for Continual Test-Time Adaptation
Vikash Singh, Debargha Ganguly, Weicong Chen, Sabyasachi Sahoo, Sreehari Sankar, Biyao Zhang, Mohsen Hariri, Shouren Wang, Osama Zafar, Christian Gagné, Vipin Chaudhary
Computer Vision Theory Optimization
  • Identification of 'blind anchoring' as a systematic failure in CTTA methods when relying on unreliable sources.
  • Introduction of RMEMSAFE, which gates source-coupled updates using a runtime reliability signal derived from source entropy.
  • Demonstration of an analytical graceful-decay property, ensuring performance stability as source reliability decreases.
  • Empirical validation showing RMEMSAFE outperforms existing methods across multiple benchmarks and degradation scenarios.
Read more
Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning
Alberto Tamajo, Srinandan Dasmahapatra, Rahman Attar
Theory
  • Imbalanced forgetting is a systematic issue in rehearsal-based CIL, leading to unequal class retention.
  • Three last-layer coefficients were developed to quantify gradient-level interference affecting class performance.
  • Self-induced interference is identified as the most significant predictor of class forgetting.
  • The study provides a mechanistic understanding of how rehearsal impacts class retention in CIL.
Read more
Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks
Solomiia Kurchaba, Angela Meyer
Time Series
  • Introduces a novel deep learning model for downscaling LST from geostationary to high-resolution satellite data.
  • Achieves high accuracy in LST forecasting with low RMSE and bias errors.
  • Demonstrates the applicability of the model across major European cities.
  • Provides a framework for intraday LST nowcasting, enhancing urban climate studies.
Read more
Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction
Daniel Asare Kyei, Alimatu Saadia-Yussiff, Maame G. Asante-Mensah, Abdul Lateef-Yussiff, Charles Roland Haruna, Derry Emmanuel
Optimization Time Series
  • Introduction of DBS-Adam, a novel optimiser for deep learning.
  • DBS-Adam dynamically adjusts learning rates based on batch difficulty (see the sketch below).
  • Integration with Bi-LSTM networks improves prediction of injury severity.
  • Significant performance improvements over traditional optimisers.
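A rough sketch of the batch-difficulty idea, using the current batch loss relative to its running mean as the difficulty signal; the paper's actual measure and schedule may differ:

```python
import torch

class DBSAdamSketch:
    # Illustrative only: scale Adam's step size up on "hard" batches
    # (loss above its running mean) and down on easy ones.
    def __init__(self, params, base_lr=1e-3, beta=0.98):
        self.opt = torch.optim.Adam(params, lr=base_lr)
        self.base_lr, self.beta, self.running = base_lr, beta, None

    def step(self, batch_loss: float):
        self.running = batch_loss if self.running is None else (
            self.beta * self.running + (1 - self.beta) * batch_loss)
        scale = batch_loss / (self.running + 1e-12)   # > 1 on hard batches
        for group in self.opt.param_groups:
            group["lr"] = self.base_lr * scale
        self.opt.step()

    def zero_grad(self):
        self.opt.zero_grad()
```

Usage mirrors a plain optimizer: call loss.backward(), then step(loss.item()).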
Read more
TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability
Krish Sharma, Omar Naim, Soumadeep Saha, Nicholas Asher
Large Language Models NLP Efficient ML
  • Task-aware pruning significantly improves performance on OOD data but not on ID data.
  • OOD inputs induce a mismatch in representation geometry compared to ID inputs.
  • Certain layers can amplify distortions in representation, affecting model performance based on input distribution.
  • Pruning layers that amplify mismatches can realign OOD representations with the adapted geometry.
Read more
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents
Jiaqi Liu, Xinyu Ye, Peng Xia, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao
Large Language Models NLP Optimization
  • EVOLVEMEM allows for self-evolution of both memory content and retrieval mechanisms.
  • The architecture employs a closed-loop diagnosis system powered by LLMs to optimize retrieval configurations.
  • The AutoResearch process enables the system to autonomously improve its performance without manual tuning.
  • EVOLVEMEM shows significant performance improvements over strong baselines on multiple benchmarks.
Read more
Learning with Shallow Neural Networks on Cluster-Structured Features
Elisabetta Cornacchia, Laurent Massoulié
Theory Efficient ML Optimization
  • Introduces a model for learning with shallow neural networks on clustered, correlated features.
  • Demonstrates that sample complexity can be independent of input dimension in high SNR regimes.
  • Proposes a layerwise gradient descent method that leverages correlations among input features.
  • Empirical tests support theoretical claims using synthetic and real-world data.
Read more
Mini-JEPA Foundation Model Fleet Enables Agentic Hydrologic Intelligence
Mashrekur Rahman
Multimodal Computer Vision Efficient ML
  • Mini-JEPAs achieve high accuracy in predicting environmental variables specific to their satellite sensors.
  • The fleet of Mini-JEPAs demonstrates distinct embedding manifold geometries, reflecting the physics of their respective sensors.
  • A routing agent effectively selects the appropriate Mini-JEPA for specialized hydrologic questions, enhancing retrieval performance.
  • Mini-JEPAs provide a cost-effective alternative to large-scale foundation models for hydrologic intelligence applications.
Read more
SeesawNet: Towards Non-stationary Time Series Forecasting with Balanced Modeling of Common and Specific Dependencies
Hao Li, Lu Zhang, Liu Chong, Yankai Chen, Pengyang Wang, Yingjie Zhou
Time Series
  • SeesawNet addresses the challenge of balancing common and instance-specific dependencies in non-stationary time series forecasting.
  • The architecture utilizes Adaptive Stationary–Nonstationary Attention (ASNA) for dynamic dependency modeling.
  • Incorporates specialized layers for temporal and cross-channel dependency learning.
  • Demonstrates superior performance compared to state-of-the-art forecasting methods on real-world datasets.
Read more
Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions
Qirui Liu, Hao Chen, Weijie Shi, Jiajie Xu, Jia Zhu
NLP Efficient ML Interpretability
  • Introduces a two-stage knowledge distillation framework for classifying student misconceptions.
  • Addresses challenges of data scarcity, annotation noise, and model deployment paradox.
  • Utilizes cognitive uncertainty to identify critical samples for improved model training.
  • Achieves significant performance improvements over existing large models with a lightweight approach.
Read more
Active Learners as Efficient PRP Rerankers
Jeremías Figueiredo Paschmann, Juan Kaplan, Francisco Nattero, Santiago Mauricio Barron Bucolo, Juan Wisznia, Luciano del Corro
NLP Large Language Models Efficient ML
  • Active learning can improve the efficiency of PRP reranking by adaptively selecting comparisons.
  • A randomized-direction oracle reduces the cost of pairwise comparisons by halving the number of calls needed (sketched below).
  • Active rankers significantly outperform traditional sorting algorithms in terms of NDCG@10 under call constraints.
  • The proposed methods maintain robustness against noise and position bias in LLM judgments.
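A sketch of the randomized-direction idea: instead of querying the LLM judge in both document orders (two calls per pair), query once in a random order, which remains unbiased in expectation. llm_judge is a hypothetical interface, not the paper's API:

```python
import random

def prp_compare(llm_judge, query, doc_a, doc_b):
    # llm_judge(query, first, second) -> True if `first` is preferred.
    # Randomizing which document goes first averages out position bias
    # across calls while paying for only one call per pair.
    if random.random() < 0.5:
        return llm_judge(query, doc_a, doc_b)
    return not llm_judge(query, doc_b, doc_a)
```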
Read more
Strategic PAC Learnability via Geometric Definability
Yuval Filmus, Shay Moran, Elizaveta Nesterova, Nir Rosenfeld, Alexander Shlimovich
Theory
  • Strategic behavior can significantly impact the learnability of hypothesis classes.
  • The authors provide a counterexample showing that learnability is not preserved under strategic behavior in simple cases.
  • Introducing geometric definability allows for the preservation of learnability and manageable sample complexity.
  • The framework accommodates a variety of cost functions and hypothesis classes commonly used in machine learning.
Read more
R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
Sanghyeob Song, Donghyeok Lee, Jinsik Kim, Sungroh Yoon
Reinforcement Learning Robotics Theory
  • Introduction of R2R2, a regularization method that reduces representation-level instability in SPL.
  • Theoretical analysis reveals the conflict between zero-centering and SPL's spectral properties.
  • Integration of SPL into the SimbaV2 architecture, creating SimbaV2-SPL, which achieves state-of-the-art performance.
  • R2R2 improves TD7 performance by approximately 22% at high UTD ratios.
Read more
MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse
Donghwan Kim, Hyunsoo Yoon
Theory Computer Vision
  • Identification of a structural asymmetry in class-wise Mahalanobis distances between ID and OOD samples.
  • Theoretical grounding of the observation in Neural Collapse geometry, linking variance to OOD detection.
  • Introduction of MahaVar, an effective post-hoc OOD detector that incorporates class-wise distance variance (one plausible scoring rule is sketched below).
  • MahaVar achieves state-of-the-art performance on CIFAR-100 and ImageNet benchmarks.
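One plausible reading of the variance-based score, assuming a shared covariance as in standard Mahalanobis OOD detection; MahaVar's exact combination of variance and distance may differ:

```python
import numpy as np

def mahavar_score(x, class_means, shared_cov_inv):
    # ID samples sit close to one class mean and far from the rest
    # (high variance of class-wise distances); OOD samples are roughly
    # equidistant to all classes (low variance).
    diffs = class_means - x                       # (C, d)
    d = np.einsum("cd,de,ce->c", diffs, shared_cov_inv, diffs)
    return d.var()                                # higher => more ID-like
```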
Read more
A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning
Jason Gaitonde, Frederic Koehler, Elchanan Mossel, Joonhyung Shin, Allan Sly
NLP Large Language Models Theory
  • Introduces synthetic languages with hierarchical structures for precise analysis of context and reasoning in autoregressive generation.
  • Derives explicit asymptotic predictions for distributional statistics in two broadcast process settings.
  • Establishes a lower bound on context length for faithful sampling and demonstrates an exponential improvement using reasoning models.
  • Empirical results validate theoretical predictions, showing the relationship between context size and model performance.
Read more
The Rate-Distortion-Polysemanticity Tradeoff in SAEs
Tommaso Mencattini, Francesco Montagna, Francesco Locatello
Theory Interpretability
  • Introduction of the Rate-Distortion-Polysemanticity tradeoff in Sparse Autoencoders.
  • Theoretical and empirical evidence that enforcing monosemanticity increases rate and distortion.
  • Polysemanticity is determined by the co-occurrence patterns of features in the training data.
  • Development of necessary conditions for evaluating polysemanticity measures in real-world applications.
Read more
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
Chien Van Nguyen, Chaitra Hegde, Van Cuong Pham, Ryan A. Rossi, Franck Dernoncourt, Thien Huu Nguyen
NLP Large Language Models Efficient ML
  • Introduces Orthrus, a dual-architecture framework that combines autoregressive and diffusion models.
  • Achieves up to 7.8× speedup in token generation while maintaining exact predictive fidelity.
  • Utilizes a shared Key-Value cache to eliminate redundant memory usage.
  • Incorporates a consensus mechanism for lossless inference.
Read more
Support Before Frequency in Discrete Diffusion
Adrian Müller, Antoine Gonon, Zebang Shen, Ya-Ping Hsieh, Niao He
NLP Large Language Models Generative Models
  • DLMs first learn the structure of valid sequences (support) before refining the probabilities of these sequences (frequency).
  • The reverse edit probabilities can be decomposed into support and frequency components, influenced by the corruption mechanism.
  • Uniform diffusion shows a trichotomy of edits, while absorbing diffusion focuses on validity-improving moves.
  • Experiments demonstrate that support localization emerges before frequency ranking in DLMs.
Read more
Artificial Intelligence-Assisted Cardiotocography: Unified Model for Signal Reconstruction, Fetal Heart Rate Analysis, and Variability Assessment
Xiaohua Wang, Kai Yu, XuXiao Liang, Liang Wang, Chao Han
Time Series
  • Development of a unified AI model for FHR monitoring that addresses the limitations of traditional methods.
  • High sensitivity and specificity in detecting critical FHR changes, improving clinical decision-making.
  • Utilization of a large dataset for training and validation, enhancing model robustness.
  • Introduction of the IOL approach for more accurate categorical analysis of FHR data.
Read more
bde: A Python Package for Bayesian Deep Ensembles via MILE
Vyron Arvanitis, Angelos Aslanidis, Emanuel Sommer, David Rügamer
Optimization Theory Efficient ML
  • bde provides a user-friendly implementation of Bayesian Deep Ensembles using MILE.
  • The package integrates seamlessly with scikit-learn, enhancing accessibility for practitioners.
  • It offers robust uncertainty quantification metrics, crucial for modern machine learning applications.
  • Benchmarks show bde's competitive performance in predictive accuracy and uncertainty estimation.
Read more
Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines
Katherine Lambert, Sasha Luccioni
Large Language Models Efficient ML
  • Introduces a comprehensive energy accounting framework for distillation pipelines.
  • Highlights the significant teacher-side energy costs often ignored in previous studies.
  • Provides empirical measurements and comparisons of energy use across different distillation methods.
  • Establishes design rules for selecting distillation methods based on energy and budget constraints.
Read more
Architecture-Aware Explanation Auditing for Industrial Visual Inspection
Sibo Jia, Zihang Zhao, Kunrong Li
Computer Vision Interpretability
  • Introduces an architecture-aware explanation audit protocol based on the native-readout hypothesis.
  • Demonstrates that explanation methods' faithfulness is influenced by their structural alignment with the model's decision mechanism.
  • Finds that ViT-Tiny + Attention Rollout, despite lower accuracy, provides more faithful explanations than other models.
  • Highlights the importance of co-designing explanation pathways with model architectures.
Read more
Peng's Q(λ) for Conservative Value Estimation in Offline Reinforcement Learning
Byeongchan Kim, Min-hwan Oh
Reinforcement Learning Theory Optimization
  • CPQL is the first multi-step Q-learning algorithm for model-free offline reinforcement learning (the underlying Peng's Q(λ) recursion is sketched below).
  • The algorithm effectively mitigates over-pessimistic value estimation without requiring additional models or networks.
  • Theoretical analyses guarantee that CPQL's learned policy performs at least as well as the behavior policy.
  • Extensive experiments show CPQL consistently outperforms existing offline single-step algorithms.
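For reference, the classical Peng's Q(λ) backward recursion that CPQL builds on (CPQL's conservative modifications for the offline setting are not shown):

```python
def pengs_qlambda_returns(rewards, next_max_q, gamma=0.99, lam=0.7):
    # rewards[t] = r_t; next_max_q[t] = max_a Q(s_{t+1}, a).
    # G_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * G_{t+1})
    T = len(rewards)
    G = [0.0] * T
    G[T - 1] = rewards[T - 1] + gamma * next_max_q[T - 1]
    for t in reversed(range(T - 1)):
        G[t] = rewards[t] + gamma * ((1 - lam) * next_max_q[t] + lam * G[t + 1])
    return G
```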
Read more
Uncertainty-Aware Prediction of Lung Tumor Growth from Sparse Longitudinal CT Data via Bayesian Physics-Informed Neural Networks
Lingfei Kong, Haoran Ma
Time Series
  • Introduces a Bayesian physics-informed framework for tumor growth prediction under sparse CT data.
  • Combines mechanistic Gompertz constraints with probabilistic inference for improved prediction accuracy.
  • Utilizes a two-stage inference procedure for stable posterior inference and efficient sampling.
  • Demonstrates the model's capability to provide calibrated uncertainty estimates alongside predictions.
Read more
WriteSAE: Sparse Autoencoders for Recurrent State
Jack Young
NLP Large Language Models Theory
  • WriteSAE is the first sparse autoencoder that effectively addresses matrix cache write operations in recurrent language models.
  • The method allows for closed-form predictions of logit shifts, achieving high accuracy (R² = 0.98).
  • Substitution of learned rank-1 atoms consistently outperforms traditional matched-norm ablation tests.
  • WriteSAE demonstrates significant improvements in performance metrics, including a 3× lift in midrank target-in-continuation tasks.
Read more
Tight Sample Complexity Bounds for Entropic Best Policy Identification
Amer Essakine, Claire Vernade
Reinforcement Learning Theory
  • Introduces a new lower bound for best policy identification in risk-sensitive reinforcement learning.
  • Develops the Entropic-BPI algorithm that achieves optimal sample complexity.
  • Improves concentration bounds for exponential utilities, enhancing exploration strategies.
  • Demonstrates that the maximal achievable reward G_max is a better metric for sample complexity than the horizon H.
Read more
Multi-Quantile Regression for Extreme Precipitation Downscaling
Hamed Najafi, Gareth Lagerwall, Jayantha Obeysekera, Jason Liu
Time Series Generative Models Theory
  • Q-SRDRN significantly improves detection rates of extreme precipitation events compared to traditional methods.
  • The use of pinball loss allows for better handling of heavy-tailed distributions in precipitation data (the loss is sketched below).
  • Data augmentation through cVAE is beneficial when aligned with the model architecture and regional characteristics.
  • The architecture shows strong performance across diverse climatic conditions, indicating its robustness.
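The pinball (quantile) loss itself is standard and easy to state:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    # Under-prediction of the q-th quantile is penalised by q,
    # over-prediction by (1 - q); a high q therefore pushes the model
    # to track heavy upper tails such as extreme precipitation.
    e = y_true - y_pred
    return np.mean(np.maximum(q * e, (q - 1) * e))
```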
Read more
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
Sinjae Kang, Chanyoung Kim, Kaixin Wang, Li Zhao, Kimin Lee
Robotics Generative Models Reinforcement Learning
  • WarmPrior replaces the standard Gaussian source with a temporally grounded prior, improving robotic manipulation success rates.
  • The method includes two variants, WP-Past and WP-Preview, which leverage recent action history for better performance (a WP-Past-style training pair is sketched below).
  • WarmPrior enhances sample efficiency and final performance in prior-space reinforcement learning.
  • Empirical results show significant improvements over traditional methods, especially in complex tasks.
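A WP-Past-flavored sketch of the idea: draw the flow-matching source near the most recently executed action instead of from N(0, I); the paper's exact prior construction and the WP-Preview variant may differ:

```python
import torch

def warm_flow_matching_pair(action_target, recent_action, sigma=0.1):
    # Standard flow matching, except x0 is a temporally grounded prior
    # centred on the recent action rather than a standard Gaussian.
    x1 = action_target
    x0 = recent_action + sigma * torch.randn_like(recent_action)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1        # point on the straight-line path
    v_target = x1 - x0                # velocity the policy should predict
    return xt, t, v_target
```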
Read more
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
Yang Bai, Kaiyuan Liu, Ziyuan Zhuang, Jiahong Zhou, Rongxiang Weng, Xin Chen, Jingang Wang, Xunliang Cai
Reinforcement Learning Large Language Models NLP
  • Introduction of Reward-Decorrelated Policy Optimization (RDPO) for stabilizing multi-objective reinforcement learning.
  • Utilization of Magnitude-Aware Quantile Normalization and Mahalanobis whitening to address reward heterogeneity and correlation (the whitening step is sketched below).
  • Demonstrated improvements in model performance on instruction following and writing quality through RDPO.
  • Introduction of Effective Information Efficiency (η_eff) as a metric for assessing mixed-reward aggregation quality.
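The Mahalanobis whitening step is standard and can be sketched directly; RDPO's quantile normalization and the rest of the pipeline are not reproduced:

```python
import numpy as np

def mahalanobis_whiten(rewards):
    # rewards: (batch, n_objectives).  Whitening with Cov^{-1/2} removes
    # both scale differences and correlations between reward channels.
    centered = rewards - rewards.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(rewards.shape[1])
    eigval, eigvec = np.linalg.eigh(cov)
    W = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return centered @ W               # identity covariance afterwards
```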
Read more
TILBench: A Systematic Benchmark for Tabular Imbalanced Learning Across Data Regimes
Ruizhe Liu, Jiaqi Luo
Theory Optimization Efficient ML
  • TILBench evaluates over 40 imbalanced learning algorithms across 57 datasets, resulting in extensive empirical insights.
  • The effectiveness of imbalanced learning methods varies significantly based on dataset characteristics such as sample size and imbalance severity.
  • No single method is universally superior; practical recommendations for method selection are provided based on data properties.
  • The benchmark assesses not only predictive performance but also computational scalability and efficiency.
Read more
Uncovering Trajectory and Topological Signatures in Multimodal Pediatric Sleep Embeddings
Scott Ye, Harlin Lee
Multimodal Time Series Generative Models
  • Introduces a novel approach to analyze session-wide trajectories in pediatric PSG embeddings.
  • Applies persistent homology to characterize topological features of sleep data.
  • Demonstrates that augmenting embeddings with clinical EHR data improves predictive performance.
  • Shows significant improvements in AUPRC for various sleep event detection tasks.
Read more
Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic
Leo Muxing Wang, Pengkun Yang, Lili Su
Reinforcement Learning Federated Learning Robotics
  • Introduction of a federated actor-critic framework that supports personalized policy training.
  • Establishment of finite-time convergence rates for critic error and policy gradient norms.
  • Development of a new perturbation analysis to handle complexities in heterogeneous environments.
  • Experimental validation showing improved performance over traditional federated learning methods.
Read more
LoMETab: Beyond Rank-1 Ensembles for Tabular Deep Learning
Changryeol Choi, Hyewon Park, Yujin Kwon, Gowun Jeong
Theory Efficient ML
  • LoMETab introduces a rank-r multiplicative implicit ensemble framework for tabular MLPs.
  • The model allows for member-specific deviations from a shared weight, enhancing diversity control.
  • Empirical results indicate that LoMETab sustains higher predictive diversity compared to traditional methods.
  • The framework provides practical trade-offs among rank, ensemble size, and initialization scale.
Read more
Separating Shortcut Transition from Cross-Family OOD Failure in a Minimal Model
Hongmin Li
Theory
  • Introduces a minimal binary model to study shortcut features and OOD failure.
  • Demonstrates that training-side observations can indicate potential cross-family failures.
  • Establishes that positive training shortcut correlation and shortcut-rule transitions are distinct phenomena.
  • Shows that the same training solution can yield different outcomes depending on the held-out family.
Read more
How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization
Leena Chennuru Vankadara, Moritz Haas, Luke Hayward, Sebastian Bordt, Alessandro Breccia
NLP Large Language Models Theory
  • Introduces a novel Dynamical Mean Field Theory (DMFT) for analyzing MoE training dynamics.
  • Identifies limitations of the Maximal Update Parameterization (µP) in achieving stable learning-rate transfer.
  • Proposes the Maximally Scale-Stable Parameterization (MSSP) to enhance stability and performance across scaling regimes.
  • Empirical results demonstrate that MSSP outperforms µP in terms of learning-rate transfer and monotonic improvement.
Read more
Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks
Kai Sun, Peibo Duan, Yongsheng Huang, Guowei Zhang, Benjamin Smith, Nanxu Gong, Levin Kuhlmann
Efficient ML Theory Time Series
  • Introduces SeAl-KD, a selective knowledge distillation framework for SNNs.
  • Highlights the mismatch between intermediate and final predictions in SNNs.
  • Proposes Error-aware Logit Alignment (ELA) and Selective Temporal Alignment (STA) for improved supervision.
  • Demonstrates significant performance improvements on various datasets.
Read more
Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations
Zhonghao Li, Chaoyu Liu, Qian Zhang
Efficient ML Generative Models Theory
  • Di-BiLPS effectively addresses both forward and inverse PDE problems under extreme data sparsity.
  • The framework utilizes a combination of variational autoencoders, latent diffusion models, and contrastive learning.
  • It achieves state-of-the-art performance with significantly reduced computational costs.
  • The proposed denoising algorithm integrates physical constraints for improved inference.
Read more
Scaling Laws for Mixture Pretraining Under Data Constraints
Anastasiia Sedova, Skyler Seto, Natalie Schluter, Pierre Ablin
NLP Large Language Models Optimization
  • Mixture training allows for higher repetition of target data compared to single-source training.
  • Optimal repetition rates for target data range from 15 to 20 times, depending on various factors.
  • A new scaling law is introduced that predicts target-domain loss based on mixture configurations.
  • Empirical findings demonstrate that larger models can extract more from limited data despite faster overfitting.
Read more
RISED: A Pre-Deployment Safety Evaluation Framework for Clinical AI Decision-Support Systems
Rohith Reddy Bellibatlu
Theory Interpretability
  • RISED Framework introduces a five-dimension evaluation for clinical AI systems.
  • Framework identifies critical deployment risks not captured by traditional metrics.
  • Validation across multiple cohorts shows varying failure patterns, supporting construct validity.
  • Equity dimension highlights the need for independent measures of clinical need.
Read more
Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations
Bardh Hoxha, Oliver Schön, Hideki Okamoto, Lars Lindemann, Georgios Fainekos
Computer Vision Robotics Theory
  • Introduction of a semantic basis as a minimal reusable interface for monitoring ptSTL fragments.
  • Development of a rolling prediction monitor that updates predicate values online, improving learning efficiency.
  • Demonstration of compositional conformal certification that allows simultaneous certification of multiple formulas.
  • Empirical validation on real-world data showing effectiveness and tightness of certified bounds.
Read more
GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction
Yifan Duan, Siyuan Zheng, Lihuan Li, Chao Xue, Flora Salim
Multimodal Time Series Optimization
  • GHGbench is the first open dataset and benchmark for joint evaluation of company and building-level carbon emissions.
  • Building emissions are structurally more difficult to predict compared to company emissions due to additional influencing factors.
  • The in-distribution to out-of-distribution performance gap is larger than within-model variations.
  • Multimodal remote-sensing embeddings significantly improve prediction accuracy in challenging scenarios.
Read more
Language-Induced Priors for Domain Adaptation
Qiyuan Chen, Jiayu Zhou, Raed Al Kontar
NLP Large Language Models Reinforcement Learning
  • Introduction of Language-Induced Prior (LIP) for source relevance in domain adaptation.
  • Integration of LIP into a Bayesian hierarchical model using Expectation-Maximization (EM) for improved performance.
  • Theoretical guarantees validate the effectiveness of the proposed framework.
  • Empirical results demonstrate superior performance in various tasks, especially under data scarcity.
Read more
DeepTokenEEG: Enhancing Mild Cognitive Impairment and Alzheimer's Classification via Tokenized EEG Features
Thinh Nguyen-Quang, Minh Long Ngo, Ngoc-Son Nguyen, Nguyen Thanh Vinh, Huy-Dung Han, Bui Thanh Tung, Nguyen Quang Linh, Khuong Vo, Manoj Vishwanath, Hung Cao
Time Series Efficient ML
  • Introduction of DeepTokenEEG, a lightweight model for EEG-based AD classification.
  • Utilization of tokenization to enhance feature extraction from EEG signals.
  • Achieved 100% accuracy on specific frequency bands, surpassing existing methods.
  • Constructed a large-scale dataset for comprehensive benchmarking.
Read more
Not All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communication
Albert Shaju, Christo Kurisummoottil Thomas, Mayukh Roy Chowdhury
Reinforcement Learning Generative Models Optimization
  • Introduces a joint semantic-physical layer framework for communication systems.
  • Develops a learned semantic-aware M-QAM constellation that prioritizes task-relevant symbols.
  • Proposes novel metrics (SSV and SPP) to evaluate the protection of semantically important information.
  • Demonstrates significant improvements in semantic quality and compression ratios compared to traditional methods.
Read more
A Systematic Evaluation of Imbalance Handling Methods in Biomedical Binary Classification
Jiandong Chen, Lingjie Su, Le Peng, Yash Travadi, Rui Zhang, Ju Sun
Optimization Multimodal
  • Imbalance handling methods (IHMs) vary in effectiveness with model complexity and data modality.
  • ROS and RW consistently improve performance in complex models (both are sketched below).
  • RUS and SMOTE generally degrade performance and are not recommended.
  • Direct F1-score optimization is beneficial mainly for unstructured data.
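Minimal forms of the two recommended handlers, random oversampling (ROS) and class reweighting (RW), under the common 'balanced' weighting convention:

```python
import numpy as np

def ros_and_rw(y, rng=np.random.default_rng()):
    # y: (n,) integer class labels.
    classes, counts = np.unique(y, return_counts=True)
    # RW: weight ~ N / (K * n_c), i.e. inverse class frequency.
    weights = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
    # ROS: resample every class up to the majority count.
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], counts.max(), replace=True)
        for c in classes
    ])
    return idx, weights
```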
Read more
EMO: Frustratingly Easy Progressive Training of Extendable MoE
Linghao Jin, Chufan Shi, Huijuan Wang, Nuan Wen, Zhengzhong Liu, Eric Xing, Xuezhe Ma
Large Language Models Efficient ML
  • EMO allows for progressive expansion of the expert pool during training, improving efficiency.
  • The framework is based on a sparsity scaling law that optimizes token allocation across training stages.
  • EMO matches or exceeds the performance of fixed-expert models while reducing training time and costs.
  • The approach leverages the principle that MoE capacity should grow with data availability.
Read more
Spectral Energy Centroid: a Metric for Improving Performance and Analyzing Spectral Bias in Implicit Neural Representations
Tomasz Dądela, Adam Kania, Maciej Rut, Przemysław Spurek
Computer Vision Generative Models Theory
  • Introduces the Spectral Energy Centroid (SEC) as a metric for analyzing spectral bias in INRs (a minimal definition is sketched below).
  • Proposes a data-driven hyperparameter selection strategy (SEC-Conf) that outperforms existing methods.
  • Demonstrates that SEC serves as a reliable proxy for signal complexity and reconstruction quality.
  • Reveals the significant impact of model depth on spectral bias and INR performance.
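Taking SEC at face value as an energy-weighted mean frequency gives a one-line definition for 1-D signals; the paper's formulation (e.g. for 2-D images) may differ in detail:

```python
import numpy as np

def spectral_energy_centroid(signal):
    # Centroid of the power spectrum: where the signal's energy lives
    # along the frequency axis (in cycles per sample).
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal))
    return (freqs * spectrum).sum() / spectrum.sum()
```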
Read more
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
William Lehn-Schiøler, Magnus Ruud Kjær, Rahul Thapa, Magnus Guldberg Pedersen, Anton Storgaard Mosquera, Nick Williams, Radu Gatej, Tue Lehn-Schiøler, Sándor Beniczky, Sadasivan Puthusserypady, James Zou, Lars Kai Hansen
Interpretability Time Series Multimodal
  • Introduces a unified framework for interpreting EEG transformers using Sparse Autoencoders.
  • Proposes a clinical semanticity taxonomy to audit encoder representations.
  • Develops a selectivity metric for evaluating model interventions and their effects.
  • Demonstrates the ability to translate latent manipulations into interpretable physiological features.
Read more
Modeling Heterophily in Multiplex Graphs: An Adaptive Approach for Node Classification
Kamel Abdous, Nairouz Mrabah, Mohamed Bouguessa
Graph Learning
  • HAAM explicitly models both homophilic and heterophilic interactions in multiplex graphs.
  • The use of dimension-specific compatibility matrices allows for tailored representation learning.
  • Product-composed Chebyshev filters enable the model to capture non-linear interactions effectively.
  • The framework improves node classification performance compared to existing methods.
Read more
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
Siyang Yao, Erhu Feng, Yubin Xia
NLP Large Language Models Efficient ML
  • QAOD introduces a geometric approach to hallucination detection by decoupling question and answer representations (the decomposition is sketched below).
  • The framework utilizes Fisher scoring for efficient selection of informative layers and neurons.
  • QAOD achieves superior performance in both in-domain and cross-domain settings with a single inference pass.
  • The joint probing strategy enhances in-domain discriminability, while the orthogonal-only probe excels in OOD scenarios.
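The orthogonal decomposition itself is simple to sketch, treating pooled question and answer representations as vectors; QAOD's Fisher-scored layer and neuron selection is not shown:

```python
import torch

def qa_orthogonal_decomposition(q_vec, a_vec):
    # Split the answer representation into its component along the
    # question direction and the orthogonal residual; a large residual
    # would indicate the answer "straying" from the question.
    q_hat = q_vec / q_vec.norm()
    parallel = (a_vec @ q_hat) * q_hat
    orthogonal = a_vec - parallel
    return parallel, orthogonal
```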
Read more
Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models
Yuehao Liu, Shanyan Guan, Weijia Zhang, Xuanming Shang, Yanhao Ge, Wei Li, Chao Ma
NLP Large Language Models Multimodal
  • Introduces a history-free approach to gradient orthogonalization for continual learning.
  • Decouples task adaptation from regularization to enhance model performance.
  • Achieves state-of-the-art results on the UCIT benchmark, outperforming previous methods.
  • Addresses privacy and storage concerns associated with rehearsal-based methods.
Read more
A Unified Geometric Framework for Weighted Contrastive Learning
Raphael Vock, Edouard Duchesnay, Benoit Dufumier
Theory
  • Weighted InfoNCE objectives can be viewed as Distance Geometry Problems, linking the weighting scheme to target geometry (a generic weighted InfoNCE is sketched below).
  • SupCon and Soft SupCon collapse class samples to prototypes differently under class imbalance, affecting inter-class similarities.
  • y-Aware CL struggles to reach its entropic optimum due to inconsistencies between label-space geometry and latent-space similarity.
  • The framework offers practical guidance for designing contrastive learning objectives by aligning geometry in weightings and embeddings.
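A generic weighted InfoNCE of the kind the framework analyzes, where the weight matrix encodes the target geometry (uniform within-class weights recover SupCon; a kernel on label distance gives y-aware CL). This is the family of objectives, not the paper's code:

```python
import torch
import torch.nn.functional as F

def weighted_infonce(z, weights, temperature=0.1):
    # z: (n, d) embeddings; weights: (n, n) nonnegative, zero diagonal.
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    w = weights / weights.sum(dim=1, keepdim=True).clamp_min(1e-12)
    return -(w * log_prob).sum(dim=1).mean()
```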
Read more
A Unified Three-Stage Machine Learning Framework for Diabetes Detection, Subtype Discrimination, and Cognitive-Metabolic Hypothesis Testing
Vishal Pandey, Ruzina Haque Laskar, Rishav Tewari
Interpretability
  • Introduces a three-stage framework for diabetes detection and subtype discrimination.
  • Achieves high performance metrics with SVM-RBF and Logistic Regression on diabetes prediction.
  • Utilizes unsupervised K-Means clustering to identify diabetes subtypes without ground-truth labels.
  • Demonstrates a significant association between glycaemic control and cognitive function.
Read more
Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning
Joana Pasquali, Ramiro N. Barros, Arthur S. Bianchessi, Vinícius Conte Turani, João Vitor Boer Abitante, Rafaela Cappelari Ravazio, Christian Mattjie, Otávio Parraga, Lucas S. Kupssinskü, Rodrigo C. Barros
NLP Large Language Models Efficient ML
  • Slice is a new initialization method for LoRA adapters that mitigates catastrophic forgetting in continual learning.
  • The method uses gradient surgery to align current task objectives with previously learned knowledge (a PCGrad-style projection is sketched below).
  • Slice outperforms existing methods (vanilla LoRA, LoRA-GA, LoRAM) in terms of stability and performance metrics.
  • The paper introduces adversarial task sequences to better evaluate the performance of continual learning methods.
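Gradient surgery here is presumably of the PCGrad flavor: remove the component of the new-task gradient that conflicts with a direction representing prior knowledge. A minimal projection, with the construction of g_old being the paper's contribution:

```python
import torch

def project_conflict(g_new, g_old):
    # If the gradients conflict (negative inner product), project the
    # conflicting component out of g_new; otherwise leave it untouched.
    dot = torch.dot(g_new.flatten(), g_old.flatten())
    if dot < 0:
        g_new = g_new - (dot / g_old.norm() ** 2) * g_old
    return g_new
```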
Read more
Bayesian Model Merging
Kaiyang Li, Shaobo Han, Qing Su, Shihao Ji
Optimization Efficient ML Computer Vision
  • BMM leverages strong anchor models to improve the merging process.
  • The framework employs bi-level optimization for effective hyperparameter tuning.
  • A data-free variant of BMM allows for regression without auxiliary data.
  • BMM shows significant performance improvements over existing model merging techniques.
Read more
Rethinking Molecular OOD Generalization via Target-Aware Source Selection
Zhuohao Lin, Kun Li, Jiameng Chen, Jiajun Yu, Duanhua Cao, Yizhen Zheng, Wenbin Hu
Optimization Graph Learning Reinforcement Learning
  • Introduction of SCOPE-BENCH, a rigorous OOD evaluation benchmark that mitigates evaluation biases.
  • Development of POMA, a policy-guided framework that enhances knowledge transfer and reduces negative transfer.
  • Demonstration of significant performance degradation of existing models under stricter OOD conditions.
  • Achievement of up to an 11.2% reduction in mean absolute error across diverse architectures.
Read more
A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
Earl Killian
Large Language Models Efficient ML Optimization
  • Introduces Scaled Outer Product (SOP) for efficient post-training quantization of LLMs.
  • Achieves near-lossless fidelity at 4.5–6 bits per weight with lower reconstruction error than conventional methods.
  • Utilizes a hardware-efficient LUT output format to enhance performance and reduce costs.
  • Employs a flexible, per-layer optimization approach tailored to individual model characteristics.
Read more
Fast Rates for Inverse Reinforcement Learning
Andreas Schlaginhaufen, Maryam Kamgarpour
Reinforcement Learning Theory Robotics
  • Establishes equivalence between MLE and Min-Max-IRL at population and empirical levels.
  • Proves fast convergence rates of O(1/n) for trajectory-level KL divergence and parameter estimation.
  • Extends reward identifiability results to general Borel spaces.
  • Derives novel results on the derivatives of the soft-optimal value function with respect to reward parameters.
Read more