AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Advancing multi-site emission control: A physics-informed transfer learning framework with mixture of experts for carbon-pollutant synergy
Yuxuan Ying, Hanqing Yang, Kaige Wang, Yu Hu, Zhiming Zheng, Yunliang Jiang, Xiaoqing Lin, Xiaodong Li, Jun Chen
Theory · Efficient ML · Interpretability
  • Introduces a physics-informed transfer learning framework for emission control in MSWI systems.
  • Demonstrates the importance of considering physical constraints and operational heterogeneity in modeling.
  • Achieves high predictive accuracy for emissions across multiple incineration plants.
  • Shows that adaptation occurs through structured re-weighting of operating regimes rather than complete model re-learning.
Read more
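The "structured re-weighting of operating regimes" in the summary above reads like a mixture-of-experts gate over frozen per-regime models. A minimal sketch under that assumption (the experts and gate here are hypothetical stand-ins, not the paper's architecture or physics constraints):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def moe_predict(x, experts, gate_logits):
    """Gated mixture: a softmax over gate logits re-weights fixed per-regime
    experts. Transfer to a new plant then reduces to re-fitting only the
    gate logits while the experts themselves stay frozen."""
    w = softmax(gate_logits)
    return sum(wi * f(x) for wi, f in zip(w, experts))

# Hypothetical per-regime emission models (stand-ins for trained experts).
experts = [lambda x: 2.0 * x, lambda x: 0.5 * x]
y = moe_predict(4.0, experts, gate_logits=[0.0, 0.0])  # equal re-weighting
```

With equal gate logits the two regimes contribute equally; adapting to a new site would shift the logits toward whichever regimes match its operating conditions.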
reward-lens: A Mechanistic Interpretability Library for Reward Models
Mohammed Suhail B Nadaf
Reinforcement Learning · Interpretability · Large Language Models
  • Introduction of 'reward-lens', a toolkit for mechanistic interpretability of reward models.
  • Unifies various interpretability techniques under a common framework based on the reward head's weight vector.
  • Includes five theory-grounded extensions to enhance interpretability tools.
  • Empirical validation shows that linear attribution fails to predict causal importance in reward models.
Read more
Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile
Andrea Maurino
Theory
  • Introduction of the Error Sensitivity Profile (ESP) for assessing model sensitivity to data errors.
  • Development of the Dirtify tool suite to facilitate error injection and analysis.
  • Extensive evaluation across 14 classification models reveals complex relationships between data errors and model performance.
  • ESP allows for prioritization of data-cleaning efforts based on specific error types and features.
Read more
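The ESP idea (inject a controlled error type at increasing rates and record the accuracy drop relative to clean data) can be sketched in a few lines. This is a hedged illustration: `inject_missing` and the toy model are placeholders of ours, not the paper's Dirtify API.

```python
import random

def inject_missing(X, rate, col, rng, fill=0.0):
    """Error injector: replace a fraction of one feature column with a fill
    value, mimicking missing-data corruption. (Illustrative only.)"""
    Xc = [row[:] for row in X]
    for row in Xc:
        if rng.random() < rate:
            row[col] = fill
    return Xc

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def sensitivity_profile(model, X, y, col, rates, seed=0):
    """ESP-style curve: accuracy degradation vs. injected error rate for one
    feature, relative to the clean baseline."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    return [
        (r, base - accuracy(model, inject_missing(X, r, col, rng), y))
        for r in rates
    ]

# Toy classifier: predict by the sign of feature 0.
model = lambda x: int(x[0] > 0)
X = [[v, 0.0] for v in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
y = [0, 0, 0, 1, 1, 1]
profile = sensitivity_profile(model, X, y, col=0, rates=[0.0, 0.5, 1.0])
```

Comparing such curves across error types and features is what lets data-cleaning effort be prioritized where the model is actually sensitive.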
Shearlet Neural Operators for Anisotropic-Shock-Dominated and Multi-scale parametric partial differential equations
Fabio Pereira dos Santos, Julio de Castro Vargas Fernandes, Adriano Mauricio de Almeida Cortes
Theory · Efficient ML
  • Introduction of Shearlet Neural Operator (SNO) to enhance neural operator architectures for PDEs.
  • SNO replaces Fourier transforms with shearlet representations, improving handling of anisotropic features.
  • Demonstrated significant accuracy improvements over Fourier Neural Operators across multiple PDE benchmarks.
  • SNO integrates shearlet transforms into the neural operator pipeline for end-to-end training.
Read more
Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
Wenshuo Zhao, Qi Zhu, Xingshan Zeng, Fei Mi, Lifeng Shang, Yiren Feng
NLP · Large Language Models · Efficient ML
  • Introduces High Entropy Phases (HEPs) as a stable measure of model uncertainty during inference.
  • Defines the Entropy Centroid as a weighted average of HEP positions to guide response selection.
  • Proposes the Lowest Centroid method for selecting responses based on intrinsic rewards derived from model uncertainty.
  • Demonstrates consistent performance improvements across various tasks and model sizes.
Read more
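The centroid construction above can be sketched directly: compute per-step token entropies, keep the high-entropy positions, and take their entropy-weighted average position; the response whose centroid falls earliest wins. The threshold, normalization, and selection rule below are our assumptions about the method, not the paper's exact definitions.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_centroid(token_dists, threshold=1.0):
    """Entropy-weighted average of high-entropy positions, with positions
    normalized to [0, 1]. A low centroid means uncertainty resolved early."""
    ents = [entropy(p) for p in token_dists]
    n = len(ents) - 1 or 1
    hi = [(i / n, h) for i, h in enumerate(ents) if h >= threshold]
    if not hi:
        return 0.0
    total = sum(h for _, h in hi)
    return sum(pos * h for pos, h in hi) / total

def lowest_centroid(responses, threshold=1.0):
    """Pick the response whose uncertainty mass sits earliest."""
    return min(range(len(responses)),
               key=lambda i: entropy_centroid(responses[i], threshold))

uniform = [0.25] * 4                     # high-entropy step (entropy ~ 1.39)
peaked = [0.97, 0.01, 0.01, 0.01]        # confident step (entropy ~ 0.17)
resp_early = [uniform, peaked, peaked]   # uncertainty resolved early
resp_late = [peaked, peaked, uniform]    # uncertainty lingers to the end
best = lowest_centroid([resp_early, resp_late])
```

The intrinsic reward needs no external verifier: everything is derived from the model's own next-token distributions.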
Investigation into In-Context Learning Capabilities of Transformers
Rushil Chandrupatla, Leo Bangayan, Sebastian Leng, Arya Mazumdar
Theory · Efficient ML
  • Transformers can perform in-context learning effectively for unseen tasks using example input-output pairs.
  • The study identifies critical factors affecting in-context test accuracy, including input dimension and the number of examples.
  • Benign overfitting allows models to generalize well despite memorizing noisy labels under certain conditions.
  • The research provides an empirical framework for understanding the scaling behavior of in-context classification.
Read more
Biased Dreams: Limitations to Epistemic Uncertainty Quantification in Latent Space Models
Julia Berger, Bernd Frauenknecht, Sebastian Trimpe, Bastian Leibe
Reinforcement Learning
  • Latent dynamics models can exhibit attractor behavior, biasing transitions towards well-represented regions of latent space.
  • This attractor behavior can obscure discrepancies between latent and true environment dynamics, undermining the reliability of epistemic uncertainty estimates.
  • Latent rollouts systematically overestimate predicted rewards due to the bias towards high-reward regions.
  • The findings highlight the inadequacy of directly transferring epistemic uncertainty quantification methods from physical to latent dynamics models.
Read more
Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors
Shiyi Du, Jiayuan Liu, Weihua Du, Yue Huang, Jiayi Li, Yingtao Luo, Xiangliang Zhang, Vincent Conitzer, Carl Kingsford
NLP · Large Language Models · Optimization
  • SWIFT reframes workflow design from iterative search to amortized synthesis, significantly reducing computational costs.
  • The framework distills reusable structural priors from past workflows, enhancing efficiency and performance.
  • SWIFT outperforms existing search-based methods across multiple benchmarks and generalizes well to unseen tasks.
  • The study reveals that workflow demonstrations primarily transfer topological structures rather than specific operator names.
Read more
Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking
Disha Singha
Reinforcement Learning
  • Introduces a dual-source uncertainty-aware reward framework to mitigate reward hacking.
  • Employs a confidence-adjusted Reliability Filter to balance exploitation and caution in action selection.
  • Achieves a 93.7% reduction in reward-hacking behavior across various environments.
  • Demonstrates robustness to supervisory noise up to 30%, while maintaining statistical significance.
Read more
CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs
Zhe Ding, Su Pan, Duowei Pan
Large Language Models · Efficient ML
  • CoQuant proposes a joint weight-activation subspace projection method for mixed-precision quantization.
  • The method addresses the limitations of existing quantization techniques that rely solely on activation statistics.
  • CoQuant demonstrates superior performance in perplexity and zero-shot reasoning tasks compared to strong PTQ baselines.
  • The approach provides a principled framework for low-bit quantization in Large Language Models.
Read more
A Multi-Dataset Benchmark of Multiple Instance Learning for 3D Neuroimage Classification
Ethan Harvey, Dennis Johan Loevlie, Amir Ali Satani, Wansu Chen, David M. Kent, Michael C. Hughes
Computer Vision · Efficient ML
  • Mean pooling MIL outperforms or matches advanced MIL and 3D CNN methods on multiple datasets.
  • Attention-based MIL methods do not provide significant gains in performance compared to simple mean pooling.
  • The study highlights the efficiency of mean pooling MIL, being 25 times faster to train than complex alternatives.
  • A semi-synthetic dataset analysis reveals limitations in current MIL approaches, indicating potential for future improvements.
Read more
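The mean-pooling baseline the study favors is essentially this: average the instance embeddings of a bag, then score the pooled vector with a linear head. A minimal sketch (the toy bag and weights are illustrative):

```python
def mean_pool_mil(bag, weights, bias):
    """Mean-pooling MIL head: average the per-instance embeddings of a bag,
    then apply a linear classifier to the bag-level embedding."""
    d = len(bag[0])
    pooled = [sum(inst[j] for inst in bag) / len(bag) for j in range(d)]
    return sum(w * v for w, v in zip(weights, pooled)) + bias

# Two instance embeddings (e.g. features of 2D slices from one 3D scan).
bag = [[1.0, 0.0], [3.0, 2.0]]
score = mean_pool_mil(bag, weights=[0.5, -1.0], bias=0.1)
```

There is no attention or gating to learn, which is why this head trains far faster than the attention-based MIL variants it is compared against.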
Optimization-Free Topological Sort for Causal Discovery via the Schur Complement of Score Jacobians
Rui Wu, Hong Xie
Graph Learning · Theory · Efficient ML
  • Introduction of Score-Schur Topological Sort (SSTS) for causal discovery.
  • Decoupling of representation learning from structural optimization to improve scalability.
  • Exact algebraic mapping of acyclicity in linear Gaussian models using Schur complements.
  • Development of Block-SSTS to address non-linear systems and reduce structural error.
Read more
PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures
Karim Alghoul, Hussein Al Osman, Abdulmotaleb El Saddik
Time Series
  • Comparison of CNN, CNN–LSTM, Transformer, and Mamba architectures for PPG-based affect recognition.
  • Transformers and Mamba models show comparable performance to CNNs but do not consistently outperform them.
  • CNNs provide the highest accuracy with the smallest model size, making them the most effective overall.
  • Transformers achieve better F1 scores for specific emotional states like arousal and relaxation.
Read more
VAE-Inf: A statistically interpretable generative paradigm for imbalanced classification
Hongfei Wu, Ruijian Han, Yancheng Yuan
Generative Models · Theory · Interpretability
  • Introduces VAE-Inf, a two-stage framework for imbalanced classification.
  • First stage involves training a VAE on majority-class data to establish a reference distribution.
  • Second stage fine-tunes the model with minority samples using a distribution-aware loss.
  • Provides a statistically interpretable inference strategy with controlled error rates.
Read more
Unifying Runtime Monitoring Approaches for Safety-Critical Machine Learning: Application to Vision-Based Landing
Mathieu Dario, Florent Chenevier, Kévin Delmas, Joris Guerin, Jérémie Guiochet
Computer Vision · Robotics
  • Introduction of a unified framework for runtime monitoring in safety-critical ML applications.
  • Categorization of monitoring approaches into ODD, OOD, and OMS types.
  • Demonstration of the framework's application through runway detection in aviation.
  • Establishment of common safety-oriented metrics for evaluating monitoring methods.
Read more
Cheeger–Hodge Contrastive Learning for Structurally Robust Graph Representation Learning
Mengyang Zhao, Longlong Li, Cunquan Qu
Graph Learning
  • Introduction of Cheeger–Hodge joint signature for robust graph representation learning.
  • CHCL framework aligns graph embeddings with a perturbation-stable structural consistency target.
  • Demonstrated effectiveness through extensive experiments on standard benchmarks.
  • Improves upon traditional GCL methods by reducing reliance on augmentation heuristics.
Read more
Multiple Additive Neural Networks for Structured and Unstructured Data
Janis Mohr, Jörg Frochte
Multimodal · Theory · Efficient ML
  • MANN replaces decision trees with shallow neural networks in the Gradient Boosting framework.
  • The approach integrates Capsule Neural Networks for feature extraction in structured data and CNNs for unstructured data.
  • MANN incorporates continuous learning mechanisms to adapt to new data and combat overfitting.
  • Empirical results show MANN's superior accuracy compared to traditional boosting methods like XGB.
Read more
KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment
Attila Pintér, Javier Rico, Attila Répai, Jalal Al-Afandi, Adrienn Éva Borsy, András Kozma, Hajnalka Andrikovics, György Cserey
Computer Vision
  • KAYRA utilizes a microservice architecture for karyotyping, enhancing deployment flexibility.
  • The system integrates multiple machine learning models for improved segmentation and classification accuracy.
  • Clinical evaluation shows KAYRA outperforms existing commercial karyotyping systems in key metrics.
  • The architecture supports both cloud and on-premise deployments, addressing patient-data residency concerns.
Read more
Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment
Chayanon Kitkana, Shivam Arora
Theory
  • Positive gradient alignment between trait and distillation gradients persists throughout multi-step training.
  • Removing the trait-aligned component of the distillation gradient effectively stops trait acquisition.
  • Liminal training reduces alignment but does not prevent trait acquisition, indicating limitations in current mitigation methods.
  • The study provides empirical evidence supporting the causal relationship between gradient alignment and subliminal learning.
Read more
Negative Ontology of True Target for Machine Learning: Towards Evaluation and Learning under Democratic Supervision
Yongquan Yang
Theory
  • Challenges the traditional assumption of the objective existence of the true target in ML.
  • Introduces the concept of Democratic Supervision, promoting a participatory approach to supervision.
  • Defines Multiple Inaccurate True Targets (MIATTs) as a practical application of Democratic Supervision.
  • Develops the EL-MIATTs framework for evaluation and learning in ML.
Read more
Hankel and Toeplitz Rank-1 Decomposition of Arbitrary Matrices with Applications to Signal Direction-of-Arrival Estimation
Georgios I. Orfanidis, Dimitris A. Pados, George Sklivanitis, Elizabeth Serena Bentley
Time Series · Optimization · Robotics
  • Development of efficient algorithms for Hankel and Toeplitz rank-1 matrix approximations.
  • Estimation methods are shown to be maximum-likelihood optimal under specific noise conditions.
  • Robustness against outliers is achieved through the use of L1-norm formulations.
  • Extensive validation through simulations and real-world data demonstrates practical applicability.
Read more
Mini-Batch Class Composition Bias in Link Prediction
Kieran Maguire, Srinandan Dasmahapatra
Graph Learning
  • GNNs trained for link prediction may learn trivial heuristics based on mini-batch composition rather than meaningful graph features.
  • Randomizing the class distribution in mini-batches improves alignment with node classification features, albeit at the cost of link prediction performance.
  • The study challenges the assumption that link prediction models can generalize representations learned from node classification tasks.
Read more
Momentum-Conserving Graph Neural Networks for Deformable Objects
Jiahong Wang, Logan Numerow, Stelian Coros, Christian Theobalt, Vahid Babaei, Bernhard Thomaszewski
Graph Learning · Robotics
  • Introduction of MomentumGNN, a GNN architecture that conserves momentum.
  • Utilizes per-edge impulses for predicting bending and stretching forces.
  • Employs a layer-by-layer update mechanism for vertex positions.
  • Trained using a physics-based loss function in an unsupervised manner.
Read more
Knowledge Distillation Must Account for What It Loses
Wenshuo Wang
Theory
  • Knowledge distillation should account for lost capabilities, not just retained scores.
  • Current evaluation methods often overlook critical off-metric capabilities.
  • A taxonomy of off-metric losses is proposed to better understand distillation impacts.
  • Scenario-specific preservation targets and a Distillation Loss Statement are introduced.
Read more
GCA-BULF: A Bottom-Up Framework for Short-Term Load Forecasting Using Grouped Critical Appliances
Yunhao Yao, Jinwei Fang, Puhan Luo, Zhiqiang Wang, Jiahui Hou, Xiang-Yang Li
Time Series
  • GCA-BULF is the first bottom-up STLF framework that selects critical appliances and incorporates appliance correlations.
  • The framework improves forecasting accuracy by 20.85%-57.88% compared to top-down methods and by 33.03%-92.48% compared to existing bottom-up methods.
  • The Critical Appliance Filtering module effectively identifies a minimal set of appliances that significantly influence total load trends.
  • The Related Appliance Grouping module clusters appliances based on their usage correlations, enhancing group-level forecasting.
Read more
NeuroPlastic: A Plasticity-Modulated Optimizer for Biologically Inspired Learning Dynamics
Douglas Jiang, Yuechen Wang, Jiayi Wang, Jiaying Geng, Qinglong Wang, Feng Tian
Optimization
  • NeuroPlastic introduces a plasticity-modulated optimizer that combines gradient updates with additional signals inspired by biological learning mechanisms.
  • The optimizer features a stabilization mechanism to regulate update magnitudes, ensuring stable optimization dynamics across different learning rates.
  • Empirical evaluations show significant performance improvements over traditional gradient-only methods, particularly in data-scarce and challenging tasks.
  • NeuroPlastic remains competitive without requiring retuning, making it a practical alternative for standard deep learning applications.
Read more
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
Seungyub Han, Hyungjin Kim, Jungwoo Lee
Reinforcement Learning · Robotics · Theory
  • Introduction of Self-Alignment for Safety (SAS) for offline safe RL.
  • Utilization of Lyapunov stability as an occupancy-measure criterion for safety.
  • Transformer-based architecture allows for hierarchical RL interpretation.
  • SAS enables safe test-time adaptation without retraining.
Read more
Laplace-Bridged Randomized Smoothing for Fast Certified Robustness
Miao Lin, MD Saifur Rahman Mazumder, Feng Yu, Daniel Takabi, Rui Ning
Computer Vision · Efficient ML · Theory
  • LBS eliminates the need for noise-augmented training, preserving clean accuracy.
  • The method significantly reduces the computational cost of certification, making it feasible for edge devices.
  • LBS achieves up to 494× speedup compared to traditional RS methods on devices like NVIDIA Jetson Orin Nano and Raspberry Pi 4.
  • Theoretical foundations of LBS are established, ensuring the validity of the certification process.
Read more
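For context, the standard randomized-smoothing baseline that LBS accelerates works as sketched below. This is vanilla RS (Cohen-style majority vote plus a Gaussian certified radius), not LBS itself, and it clamps the top-class probability estimate instead of using a proper confidence bound.

```python
import random
import statistics
from collections import Counter

def smoothed_predict(f, x, sigma, n, rng):
    """Vanilla randomized smoothing: classify n Gaussian-perturbed copies of
    x and take the majority class. The certified L2 radius is
    sigma * Phi^{-1}(p_A), with p_A a lower bound on the top-class
    probability. Real certifications use a Clopper-Pearson bound; here the
    raw Monte Carlo estimate is clamped purely for illustration."""
    votes = Counter(
        f([xi + rng.gauss(0, sigma) for xi in x]) for _ in range(n)
    )
    top, count = votes.most_common(1)[0]
    p_a = min(count / n, 0.999)
    radius = sigma * statistics.NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius

f = lambda x: int(x[0] > 0)   # toy base classifier on 1-D inputs
rng = random.Random(0)
label, radius = smoothed_predict(f, [2.0], sigma=0.25, n=200, rng=rng)
```

The n forward passes per input are exactly the cost that makes certification expensive on edge devices, and what the speedups above target.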
Compute Aligned Training: Optimizing for Test Time Inference
Adam Ousherovitch, Ambuj Tewari
NLP · Large Language Models · Reinforcement Learning
  • Introduction of Compute Aligned Training (CAT) to align training objectives with test-time inference strategies.
  • Derivation of new loss functions that improve performance during test-time scaling for LLMs.
  • Empirical validation of CAT across multiple test-time strategies, showing substantial performance improvements.
  • Unified framework that generalizes existing methods and addresses misalignment issues in training and inference.
Read more
Heterogeneous Variational Inference for Markov Degradation Hazard Models: Discretized Mixture with Interpretable Clusters
Takato Yasuno
Time Series · Interpretability · Efficient ML
  • Introduces an 8-state discretization method that significantly improves the detection of degradation events.
  • Develops a comprehensive feature engineering strategy that integrates various data types for better model performance.
  • Establishes practical interpretability rules for model selection to prevent overfitting and ensure meaningful clusters.
  • Demonstrates that ADVI outperforms traditional MCMC methods in terms of speed and stability for finite mixture models.
Read more
Categorical Optimization with Bayesian Anchored Latent Trust Regions for Structural Design under High-Dimensional Uncertainty
Zhangyong Liang, Huanhuan Gao
Optimization
  • COBALT effectively tackles high-dimensional categorical optimization under uncertainty with costly evaluations.
  • The framework locks latent catalog instances as discrete physical anchors to maintain design integrity.
  • Additive SAAS-GP is used to model sparse effects in the presence of heteroscedastic noise.
  • The trust-region graph acquisition method allows for valid design selection without rounding errors.
Read more
Semi-supervised learning with max-margin graph cuts
Branislav Kveton, Michal Valko, Ali Rahimi, Ling Huang
Graph Learning · Theory · Optimization
  • Introduction of a max-margin graph cuts algorithm for semi-supervised learning.
  • Theoretical proof of a generalization error bound for the proposed method.
  • Demonstrated superior performance compared to existing methods like manifold regularization of SVMs.
  • Stability improvements for harmonic function solutions with soft labeling constraints.
Read more
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence
Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Arushi Goel, Mike Ranzinger, Greg Heinrich, Guo Chen, Lukas Voegtle, Philipp Fischer, Timo Roman, Karan Sapra, Collin McCarthy, Shaokun Zhang, Fuxiao Liu, Hanrong Ye, Yi Dong, Mingjie Liu, Yifan Peng, Piotr Zelasko, Zhehuai Chen, Nithin Rao Koluguri, Nune Tadevosyan, Lilit Grigoryan, Ehsan Hosseini Asl, Pritam Biswas, Leili Tavabi, Yuanhang Su, Zhiding Yu, Peter Jin, Alexandre Milesi, Netanel Haber, Yao Xu, Sarah Amiraslani, Nabin Mulepati, Eric Tramel, Jaehun Jung, Ximing Lu, Brandon Cui, Jin Xu, Zhiqi Li, Shihao Wang, Yuanguo Kuang, Shaokun Zhang, Huck Yang, Boyi Li, Hongxu Yin, Song Han, Pavlo Molchanov, Adi Renduchintala, Charles Wang, David Mosallanezhad, Soumye Singhal, Luis Vega, Katherine Cheung, Sreyan Ghosh, Yian Zhang, Alexander Bukharin, Venkat Srinivasan, Johnny Greco, Andre Manoel, Maarten Van Segbroeck, Suseella Panguliri, Rohit Watve, Divyanshu Kakwani, Shubham Pachori, Jeffrey Glick, Radha Sri-Tharan, Aileen Zaman, Khanh Nguyen, Shi Chen, Jiaheng Fang, Qing Miao, Wenfei Zhou, Yu Wang, Zaid Pervaiz Bhat, Varun Praveen, Arihant Jain, Ramanathan Arunachalam, Tomasz Kornuta, Ashton Sharabiani, Amy Shen, Wei Huang, Yi-Fu Wu, Ali Roshan Ghias, Huiying Li, Brian Yu, Nima Tajbakhsh, Chen Cui, Wenwen Gao, Li Ding, Terry Kong, Manoj Kilaru, Anahita Bhiwandiwalla
Multimodal · Efficient ML · Audio & Speech
  • Introduces native audio support alongside text, images, and video.
  • Achieves significant accuracy improvements over previous models.
  • Incorporates multimodal token-reduction techniques for efficiency.
  • Demonstrates leading results in various multimodal tasks.
Read more
PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
Zhiquan Tan, Yinrong Hong
NLP · Large Language Models · Reinforcement Learning
  • PAINT combines overlap-adaptive solution masking with sparse teacher energy interpolation for enhanced reasoning.
  • The method provides a contextual re-scoring view that identifies key bottlenecks in reasoning tasks.
  • Empirical results show consistent improvements over existing self-distillation methods and GRPO.
  • PAINT achieves better rollout-token efficiency with shorter training sequences.
Read more
Knowledge-Data Dually Driven Paradigm for Accurate Landslide Susceptibility Prediction under Data-Scarce Conditions Using Geomorphic Priors and Tabular Foundation Model
Yuting Yang, Gang Mei, Feng Chen, Yongshuang Zhang, Jianbing Peng
Theory · Efficient ML
  • Introduces a novel paradigm for landslide susceptibility prediction that integrates geomorphic prior knowledge with limited data.
  • Demonstrates comparable predictive accuracy to traditional methods using only 30% of available data in a data-rich region.
  • Validates the approach in a data-scarce environment, confirming its applicability in complex geological settings.
  • Utilizes a foundation model (TabPFN) tailored for small datasets to enhance prediction reliability.
Read more
DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
Tianhao Hu, Xiangcheng Liu, Youshao Xiao, Yang Zheng, Xuan Huang, Jinrui Ding, Yufei Zhang, Tao Liang, Hongyu Zang, Quan Chen, Yueqing Sun, Wenjie Shi, Chao Zhang, Wei Wang, Qi Gu, Yerui Sun, Yucheng Xie, Xunliang Cai
Reinforcement Learning · Large Language Models · Efficient ML
  • DORA introduces a scalable asynchronous training paradigm for RL in LLMs.
  • The system maintains multiple policy versions to enhance rollout efficiency.
  • DORA achieves significant improvements in training throughput without compromising convergence.
  • The centralized load-balancing orchestrator optimizes resource allocation dynamically.
Read more
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
Hayate Iso, Tiyasa Mitra, Sudipta Mondal, Rasoul Shafipour, Venmugil Elango, Terry Kong, Yuki Huang, Seonjin Na, Izzy Putterman, Benjamin Chislett, Maor Ashkenazi, Joseph Guman, Gerald Shen, Tugrul Konuk, Ashwath Aithal, Ritika Borkar, Ran Zilberstein, Bita Rouhani
Reinforcement Learning · Large Language Models · Efficient ML
  • Speculative decoding is introduced as a method to accelerate RL rollouts without altering the output distribution of the target model.
  • The integration of speculative decoding into the NeMo-RL framework supports both synchronous and asynchronous execution.
  • The proposed method achieves a 1.8× throughput improvement in rollout generation for 8B-scale models.
  • Combining speculative decoding with asynchronous RL can potentially lead to a 2.5× speedup in end-to-end training at larger model scales.
Read more
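Speculative decoding is straightforward to sketch in its greedy form: a cheap draft model proposes k tokens, the target verifies them in one pass, and the longest agreeing prefix is accepted plus one corrected token, so the output matches plain greedy decoding with the target alone. The toy next-token models below are illustrative, not NeMo-RL code.

```python
def speculative_generate(target, draft, prefix, k, max_new):
    """Greedy speculative decoding: the draft proposes k tokens, the target
    verifies them; the longest agreeing prefix is accepted plus one corrected
    token. Output is identical to greedy decoding with the target alone."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        ctx = list(out)
        proposal = []
        for _ in range(k):                 # cheap draft proposes k tokens
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        ctx = list(out)
        accepted = 0
        for t in proposal:                 # target verifies each position
            if target(ctx) != t:
                break
            out.append(t)
            ctx.append(t)
            accepted += 1
        if accepted < k:                   # target supplies the correction
            out.append(target(ctx))
        if len(out) - len(prefix) > max_new:
            out = out[:len(prefix) + max_new]
    return out

# Toy next-token "models": count mod 5; the draft is wrong after token 3.
target = lambda ctx: (ctx[-1] + 1) % 5
draft = lambda ctx: 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 5
tokens = speculative_generate(target, draft, prefix=[0], k=3, max_new=6)
```

Each accepted run of draft tokens costs the target only one verification pass, which is where the rollout throughput gain comes from; sampled (non-greedy) variants preserve the target distribution via an accept/reject rule instead of exact matching.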
On the Trainability of Masked Diffusion Language Models via Blockwise Locality
Yuxiang Wang, Yu Xiang, Baojian Zhou, Qifang Zhao, Keyue Jiang, Yanghua Xiao, Xiaoxiao Xu
NLP · Large Language Models · Generative Models
  • Standard random-masking MDMs exhibit instability and high variance in training for certain tasks.
  • Proposed models Jigsaw and Scatter incorporate left-to-right locality to improve trainability and stability.
  • Jigsaw matches AR-LLM stability on linear regression while maintaining performance on Sudoku.
  • Scatter retains diffusion's advantages in planning tasks like path-finding.
Read more
Learning with Embedded Linear Equality Constraints via Variational Bayesian Inference
Matthew Marsh, Benoît Chachuat, Antonio del Rio Chanona
Theory · Optimization
  • Introduces a Bayesian framework for embedding linear equality constraints in BNNs.
  • Utilizes variational inference to provide uncertainty quantification while enforcing constraints.
  • Demonstrates improved performance on a single particle battery model compared to standard BNNs.
  • Treats constraint tolerance as a learnable random variable, enhancing model flexibility.
Read more
Comparative Study of Bending Analysis using Physics-Informed Neural Networks and Numerical Dynamic Deflection in Perforated nanobeam
Ramanath Garai, Iswari Sahu, S. Chakraverty
Theory · Optimization
  • Introduction of a novel Physics-Informed Functional Link Constrained Framework (DFL-TFC) for analyzing perforated nanobeams.
  • Establishment of a relationship between static and dynamic deflections in perforated nanobeams.
  • Demonstration of computational efficiency and accuracy without the need for complex neural network architectures.
  • Utilization of the Theory of Functional Connections to embed differential equation constraints effectively.
Read more
Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model
Maixent Chenebaux
NLP · Large Language Models · Reinforcement Learning
  • Introduction of SeqCond Attention (SCA) as a novel sequence operator for language models.
  • Demonstration of SCA's expressiveness, capable of retrieving individual tokens and replicating softmax attention outputs.
  • Hybrid architecture comprising 16 SCA layers and 8 transformer layers for efficient reasoning.
  • Innovative training strategies, including gradient-balanced GRPO and scored self-distillation, leading to improved accuracy.
Read more
A Multimodal and Explainable Machine Learning Approach to Diagnosing Multi-Class Ejection Fraction from Electrocardiograms
Catherine Ning, Yu Ma, Cindy Beini Wang, Sean McMahon, Joseph Radojevic, Steven Zweibel, Dimitris Bertsimas
Multimodal · Time Series · Interpretability
  • Developed a multimodal machine learning framework for LVEF classification using ECG and EHR data.
  • Achieved high AUROC scores across multiple LVEF categories, outperforming unimodal models.
  • Utilized SHAP attributions to enhance model explainability and identify key features influencing predictions.
  • Demonstrated the model's temporal generalizability, indicating robustness over time.
Read more
Transformer Approximations from ReLUs
Jerry Yao-Chieh Hu, Mingcheng Lu, Yi-Chen Lee, Han Liu
Theory · Efficient ML · Generative Models
  • Introduction of a translation theorem for ReLU to softmax Transformer approximations.
  • Establishment of constructive transformer-native universal approximation results.
  • Derivation of economic transformer constructions for specific approximation targets.
  • Improved resource bounds for Transformers compared to traditional universal approximation methods.
Read more
A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication
Valentin Cuzin-Rambaud, Laetitia Matignon, Maxime Morge
Reinforcement Learning · Graph Learning
  • Communication enhances coordination among agents in MARL.
  • GNNs provide a robust framework for modeling communication in multi-agent systems.
  • The paper identifies a lack of classification frameworks for GNN-based communication methods in MARL.
  • A generalized GNN-based communication process is proposed to clarify existing approaches.
Read more
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
Ishan Patel, Ishan Joshi
Large Language Models · Efficient ML · NLP
  • Introduces a shared KV cache pool for concurrent inference agents, reducing memory overhead.
  • Achieves a stable 2.91x compression ratio across different configurations.
  • Demonstrates significant memory savings, reducing KV cache memory from 19.8 GB to 0.45 GB.
  • Shows that perplexity degradation is minimal and improves with longer context lengths.
Read more
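The shared-pool idea (distinct from PolyKV's asymmetric compression, which this sketch does not model) can be illustrated with a content-addressed cache: agents that reuse the same prompt prefix hit one stored entry instead of each materializing its own copy.

```python
import hashlib

class SharedKVPool:
    """Toy content-addressed cache pool: entries are keyed by a hash of the
    prompt-prefix tokens, so concurrent agents sharing a prefix reuse one
    stored entry. (Illustrates the sharing idea only.)"""
    def __init__(self):
        self.pool = {}
        self.hits = 0

    def get_or_compute(self, prefix_tokens, compute_kv):
        key = hashlib.sha256(repr(prefix_tokens).encode()).hexdigest()
        if key in self.pool:
            self.hits += 1
        else:
            self.pool[key] = compute_kv(prefix_tokens)
        return self.pool[key]

pool = SharedKVPool()
fake_kv = lambda toks: [(t, t) for t in toks]   # stand-in for real KV tensors
for _agent in range(4):                          # four agents, same system prompt
    kv = pool.get_or_compute([101, 7, 7, 42], fake_kv)
```

With four agents sharing one prefix, the pool stores a single entry and serves three cache hits; compressing the stored entries on top of this is what drives the memory figures quoted above.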
Diverse Image Priors for Black-box Data-free Knowledge Distillation
Tri-Nhan Vo, Dang Nguyen, Trung Le, Kien Do, Sunil Gupta
Computer Vision · Efficient ML · Theory
  • Introduces Diverse Image Priors as a new class of synthetic images for knowledge distillation.
  • Utilizes a primer student for contrastive optimization, enhancing the quality of distillation signals.
  • Demonstrates that data diversity is crucial for effective knowledge transfer in black-box scenarios.
  • Achieves state-of-the-art performance across multiple benchmarks in black-box data-free KD.
Read more
A Randomized PDE Energy driven Iterative Framework for Efficient and Stable PDE Solutions
Yi Bing, Zheng Ran, Fu Jinyang, Liu Long, Peng Xiang
Efficient ML · Theory · Optimization
  • Introduces a PDE energy-driven framework that avoids traditional matrix-based solvers.
  • Utilizes physically constrained diffusion iterations for solving PDEs.
  • Demonstrates stable convergence and accurate resolution of sharp gradients.
  • Achieves competitive accuracy compared to analytical solutions.
Read more
Asynchronous Federated Unlearning with Invariance Calibration for Medical Imaging
Zhaoyuan Cai, Xinglin Zhang
Federated Learning
  • Introduces an asynchronous paradigm for Federated Unlearning in medical imaging.
  • Addresses the latency issues of synchronous unlearning methods.
  • Implements a server-side invariance calibration mechanism to ensure genuine data erasure.
  • Achieves unlearning efficacy comparable to retraining while maintaining high fidelity.
Read more