AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

62 Papers today
8h Update frequency
7 Days of history
Characterization and forecasting of national-scale solar power ramp events
Luca Lanzilao, Angela Meyer
Time Series
  • The study provides a comprehensive national-scale characterization of solar ramp events using data from 6434 PV stations.
  • Quantitative metrics were developed to define and analyze the occurrence, frequency, and magnitude of solar ramp events.
  • Meteorological factors, particularly cloud dynamics, were identified as significant drivers of ramp events.
  • SHADECast was found to be the most reliable forecasting model, outperforming others in terms of continuous ranked probability score (CRPS).
Read more
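For readers unfamiliar with the metric in the last bullet, the continuous ranked probability score (CRPS) compares a forecast ensemble against an observed value. A minimal generic ensemble estimator, shown for illustration only and not taken from the paper:

```python
def crps_ensemble(members, obs):
    """Ensemble estimator of the Continuous Ranked Probability Score:
    CRPS = E|X - y| - 0.5 * E|X - X'| over ensemble members X and
    observation y. Lower is better; for a single member it reduces
    to the absolute error."""
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(x - y) for x in members for y in members) / (m * m)
    return term1 - 0.5 * term2

# An ensemble tightly clustered around the observation scores low.
print(crps_ensemble([0.9, 1.0, 1.1], 1.0))
```

With a single member, the score equals the absolute forecast error, which is why CRPS is often described as a probabilistic generalization of MAE.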
Why Safety Probes Catch Liars But Miss Fanatics
Kristiyan Haralambiev
Theory Reinforcement Learning Interpretability
  • Distinction between deceptive misalignment and coherent misalignment in AI systems.
  • Probes are effective against models that hide harmful intentions but fail against those that believe in their harmful actions.
  • Theoretical proof that detecting coherent misalignment is computationally hard under standard assumptions.
  • Empirical validation showing that models trained with rationalizations evade detection despite similar outputs to deceptive models.
Read more
Offline Decision Transformers for Neural Combinatorial Optimization: Surpassing Heuristics on the Traveling Salesman Problem
Hironori Ohigashi, Shinichiro Hamada
Reinforcement Learning Optimization
  • Introduces a novel offline RL framework using Decision Transformers for the Traveling Salesman Problem.
  • Integrates Pointer Networks to effectively handle variable action spaces in node selection.
  • Employs expectile regression for optimistic conditioning of Return-to-Go, enhancing solution quality.
  • Demonstrates that the proposed method consistently outperforms classical heuristics in generating TSP solutions.
Read more
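The third bullet mentions expectile regression for optimistic Return-to-Go conditioning. As background, the standard expectile loss (a generic sketch, not the paper's exact formulation) is an asymmetric squared error:

```python
def expectile_loss(pred, target, tau=0.9):
    """Expectile regression loss L_tau(u) = |tau - 1[u < 0]| * u^2
    with residual u = target - pred. For tau > 0.5, under-predictions
    are penalized more heavily, so the minimizer is an optimistic
    (upper-expectile) estimate of the target distribution."""
    u = target - pred
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u

# Under-predicting by 1 costs 9x more than over-predicting by 1 at tau=0.9.
print(expectile_loss(0.0, 1.0, tau=0.9), expectile_loss(2.0, 1.0, tau=0.9))
```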
Parameter-Free Dynamic Regret for Unconstrained Linear Bandits
Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale
Theory Optimization
  • Introduces the first parameter-free algorithm for dynamic regret in linear bandits.
  • Achieves optimal regret guarantees without prior knowledge of the comparator variability.
  • Utilizes a novel technique for combining multiple bandit algorithms to enhance performance.
  • Resolves a long-standing open problem in the field of online learning.
Read more
Amplified Patch-Level Differential Privacy for Free via Random Cropping
Kaan Durmaz, Jan Schuchardt, Sebastian Schmidt, Stephan Günnemann
Computer Vision Theory Efficient ML
  • Random cropping can amplify differential privacy in machine learning models without requiring changes to the training process.
  • The authors introduce a patch-level neighboring relation that aligns better with the structure of privacy-sensitive content in images.
  • The proposed method enhances the privacy-utility trade-off in segmentation tasks, demonstrating practical applicability.
  • This approach leverages existing randomness in training pipelines, offering a drop-in improvement for DP-SGD.
Read more
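The paper's patch-level analysis is not reproduced here, but the intuition parallels the classical privacy-amplification-by-subsampling result: running a DP mechanism on a random subset of the data yields a strictly smaller effective epsilon. A sketch of that standard bound (background only, not the paper's method):

```python
import math

def amplified_epsilon(eps, q):
    """Classical privacy amplification by subsampling: applying an
    eps-DP mechanism to a uniformly random fraction q of the data
    satisfies log(1 + q * (exp(eps) - 1))-DP, which is < eps
    whenever q < 1."""
    return math.log1p(q * math.expm1(eps))

# Subsampling 10% of the data shrinks the effective epsilon substantially.
print(amplified_epsilon(1.0, 0.1))
```

`log1p`/`expm1` are used instead of `log`/`exp` for numerical stability at small `eps` and `q`.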
Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory
Juno Kim, Eshaan Nichani, Denny Wu, Alberto Bietti, Jason D. Lee
NLP Large Language Models Optimization
  • Muon significantly improves storage efficiency compared to SGD, recovering more items in a single step.
  • The performance of Muon benefits from larger batch sizes, saturating at a much higher critical batch size than SGD.
  • Muon accelerates early in training, achieving better recovery rates than SGD from the outset.
  • The analysis provides a quantitative understanding of signal amplification in Muon, laying groundwork for future studies on scaling laws in language modeling.
Read more
Can AI Scientist Agents Learn from Lab-in-the-Loop Feedback? Evidence from Iterative Perturbation Discovery
Gilles Wainrib, Barbara Bodinier, Haitem Dakhli, Josep Monserrat, Almudena Espin Perez, Sabrina Carpentier, Roberta Codato, John Klein
Large Language Models Optimization NLP
  • LLMs can learn from experimental feedback, leading to significant improvements in scientific discovery.
  • A random feedback control demonstrated that performance gains are dependent on the structure of feedback, not just prior knowledge recall.
  • Model capability plays a crucial role in the effectiveness of in-context learning from feedback.
  • The study provides empirical evidence against previous claims that LLMs do not genuinely learn from experimental design feedback.
Read more
Topology-Aware Graph Reinforcement Learning for Energy Storage Systems Optimal Dispatch in Distribution Networks
Shuyi Gao, Stavros Orfanoudakis, Shengren Hou, Peter Palensky, Pedro P. Vergara
Reinforcement Learning Graph Learning Optimization
  • Introduction of a topology-aware GNN encoder in RL for ESS dispatch.
  • Significant reduction in voltage violations using GNN-based controllers.
  • Case-dependent transfer learning benefits, with zero-shot transfer often degrading performance.
  • Demonstrated effectiveness on both 34-bus and 69-bus distribution systems.
Read more
On the Complexity of Optimal Graph Rewiring for Oversmoothing and Oversquashing in Graph Neural Networks
Mostafa Haghir Chehreghani
Graph Learning Optimization Theory
  • Introduces a theoretical framework for understanding the complexity of graph rewiring in GNNs.
  • Proves that optimizing for oversmoothing and oversquashing is NP-hard.
  • Establishes a connection between graph topology and the performance limitations of GNNs.
  • Justifies the use of heuristic methods for graph optimization in GNNs.
Read more
How Class Ontology and Data Scale Affect Audio Transfer Learning
Manuel Milling, Andreas Triantafyllopoulos, Alexander Gebhard, Simon Rampp, Björn W. Schuller
Audio & Speech
  • Transfer learning benefits from both the scale of pre-training data and the similarity of tasks.
  • Increasing the number of samples and classes in pre-training data positively impacts performance.
  • Task similarity is a more significant factor than data scale in determining transfer learning success.
  • The study provides a systematic evaluation of audio transfer learning across multiple tasks.
Read more
EngineAD: A Real-World Vehicle Engine Anomaly Detection Dataset
Hadi Hojjati, Christopher Roth, Rory Woods, Ken Sills, Narges Armanfard
Time Series
  • Introduction of EngineAD, a real-world dataset for vehicle engine anomaly detection.
  • Dataset includes high-resolution telemetry data with expert annotations for reliable labeling.
  • Significant performance variability observed across different vehicles in anomaly detection.
  • Classical anomaly detection methods often outperform deep learning techniques in this dataset.
Read more
DyMRL: Dynamic Multispace Representation Learning for Multimodal Event Forecasting in Knowledge Graph
Feng Zhao, Kangzheng Liu, Teng Peng, Yu Yang, Guandong Xu
Multimodal Graph Learning Time Series
  • Introduces DyMRL for dynamic multispace representation learning.
  • Addresses the limitations of static knowledge acquisition and fusion methods.
  • Integrates multiple geometric spaces for deep representation learning.
  • Employs dual fusion-evolution attention mechanisms for dynamic feature fusion.
Read more
Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards
Cheng Jiayang, Xin Liu, Zhihan Zhang, Haoyang Wen, Zixuan Zhang, Qingyu Yin, Shiyang Li, Priyanka Nigam, Bing Yin, Chao Zhang, Yangqiu Song
Large Language Models Reinforcement Learning
  • Introduces a framework for training LLMs in multi-step tool orchestration using real API responses.
  • Develops a graduated reward system that enhances learning signals for partial correctness.
  • Demonstrates substantial improvements in model accuracy on ComplexFuncBench.
  • Identifies and addresses the limitations of existing RL environments for complex orchestration tasks.
Read more
Spatiotemporal System Forecasting with Irregular Time Steps via Masked Autoencoder
Kewei Zhu, Yanze Xin, Jinwei Hu, Xiaoyuan Cheng, Yiming Yang, Sibo Cheng
Time Series
  • Introduces the Physics-Spatiotemporal Masked Autoencoder (P-STMAE) for forecasting irregular time series.
  • Integrates convolutional autoencoders with masked autoencoders to enhance spatial and temporal feature extraction.
  • Demonstrates significant improvements in prediction accuracy and robustness over traditional methods.
  • Eliminates the need for data preprocessing techniques like interpolation or resampling.
Read more
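The core masked-autoencoder idea underlying P-STMAE (generic sketch; the paper's physics-guided details are not shown) is to hide a random subset of input patches and train the model to reconstruct them from the visible remainder:

```python
import random

def random_mask(num_patches, mask_ratio=0.75, rng=random):
    """Masked-autoencoder-style patch masking: randomly partition
    patch indices into a masked set (to be reconstructed) and a
    visible set (fed to the encoder)."""
    idx = list(range(num_patches))
    rng.shuffle(idx)
    n_mask = int(num_patches * mask_ratio)
    return sorted(idx[:n_mask]), sorted(idx[n_mask:])  # (masked, visible)

random.seed(0)
masked, visible = random_mask(16)
print(len(masked), len(visible))  # 12 4
```

A high mask ratio (commonly around 0.75) forces the encoder to learn global structure rather than interpolate locally.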
PEANUT: Perturbations by Eigenvalue Alignment for Attacking GNNs Under Topology-Driven Message Passing
Bhavya Kohli, Biplab Sikdar
Graph Learning
  • PEANUT is a novel black-box attack that targets GNNs by injecting virtual nodes.
  • The attack operates during the inference phase, making it practical for real-world applications.
  • No features are required for the injected nodes, yet significant performance degradation is observed.
  • The method generalizes beyond node classification to include graph-level regression tasks.
Read more
SPECTRA: An Efficient Spectral-Informed Neural Network for Sensor-Based Activity Recognition
Deepika Gurung, Lala Shakti Swarup Ray, Mengxi Liu, Bo Zhou, Paul Lukowicz
Efficient ML Time Series
  • SPECTRA integrates spectral inductive bias with lightweight temporal modeling for efficient HAR.
  • The architecture captures spectral-temporal dependencies while minimizing computational costs.
  • SPECTRA achieves comparable accuracy to larger models while drastically reducing parameters and energy consumption.
  • Real-time deployments demonstrate the feasibility of SPECTRA on edge devices like smartphones and microcontrollers.
Read more
Do Neurons Dream of Primitive Operators? Wake-Sleep Compression Rediscovers Schank's Event Semantics
Peter Balogh
NLP Theory Interpretability
  • The study successfully adapts a wake-sleep learning algorithm to discover event primitives from data.
  • The discovered operators align closely with Schank's core primitives and introduce novel emotional state operators.
  • The algorithm achieves a 100% explanation rate for events in both synthetic and real-world commonsense data.
  • The findings challenge the completeness of Schank's original taxonomy, highlighting the dominance of mental/emotional operators in naturalistic data.
Read more
A Boltzmann-machine-enhanced Transformer For DNA Sequence Classification
Zhixuan Cao, Yishu Xu, Xuang Wu
Interpretability
  • Introduction of a Boltzmann-machine-enhanced Transformer for DNA sequence classification.
  • Utilization of structured binary gating variables to model query-key connections.
  • Adoption of mean-field variational inference and Gumbel-Softmax for training discrete gating structures.
  • Joint optimization of classification and energy loss to ensure both accuracy and interpretability.
Read more
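The third bullet names the Gumbel-Softmax trick, the standard way to backpropagate through discrete gating variables. A minimal generic sampler (illustrative only, not the paper's code):

```python
import math, random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """One relaxed categorical sample: perturb each logit with
    Gumbel(0, 1) noise -log(-log(U)), then apply a temperature-tau
    softmax. Low tau pushes the output toward a one-hot vector while
    keeping it differentiable in the logits."""
    g = [-math.log(-math.log(rng.random())) for _ in logits]
    z = [(l + n) / tau for l, n in zip(logits, g)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

random.seed(0)
probs = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)
print(probs)  # a probability vector over the three gating choices
```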
Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
Haishan Ye
Optimization Theory
  • Introduces the first high-probability regret bound for OCO with two-point feedback.
  • Achieves a minimax optimal regret bound of O(d(log T + log(1/δ))/µ) for strongly convex losses.
  • Improves the dimension dependency of regret from O(d²) to O(d).
  • Develops a novel analytical framework that enhances robustness against variance in estimators.
Read more
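In the two-point feedback model, the learner queries the loss at two nearby points and forms a gradient estimate from the difference. A generic sketch of the standard estimator (not the paper's algorithm):

```python
import random

def two_point_grad(f, x, delta=1e-3, rng=random):
    """Two-point bandit gradient estimator: for a random unit
    direction u, query f(x + delta*u) and f(x - delta*u); the vector
    d/(2*delta) * (f(x+) - f(x-)) * u is an unbiased estimate of the
    gradient of a smoothed version of f."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(v * v for v in u) ** 0.5
    u = [v / norm for v in u]
    xp = [xi + delta * ui for xi, ui in zip(x, u)]
    xm = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(xp) - f(xm)) / (2.0 * delta)
    return [scale * ui for ui in u]

random.seed(1)
f = lambda x: sum(v * v for v in x)  # f(x) = ||x||^2, true grad = 2x
print(two_point_grad(f, [1.0, 2.0]))  # noisy estimate of [2.0, 4.0]
```

Each single estimate is noisy, but averaging over directions recovers the true gradient in expectation, which is what the regret analysis exploits.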
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó
NLP Large Language Models Interpretability
  • Rare features survive pruning better than frequent features, indicating implicit feature selection.
  • Wanda pruning preserves feature structure up to 3.7 times better than magnitude pruning.
  • Pre-trained Sparse Autoencoders remain viable on Wanda-pruned models up to 50% sparsity.
  • Seed stability is low, but the degradation pattern is consistent across conditions.
Read more
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Selim An, Il Hong Suh, Yeseong Kim
Large Language Models Efficient ML Optimization
  • GlowQ introduces a group-shared low-rank approximation to enhance quantized LLMs.
  • The method reduces latency and memory overhead by caching a single shared right factor per input-sharing group.
  • GlowQ-S, a selective variant, further optimizes performance by applying corrections only where needed.
  • Empirical results show significant improvements in efficiency and accuracy compared to strong baselines.
Read more
DPD-Cancer: Explainable Graph-based Deep Learning for Small Molecule Anti-Cancer Activity Prediction
Magnus H. Strømme, Alex G. C. de Sá, David B. Ascher
Graph Learning
  • DPD-Cancer utilizes a Graph Attention Transformer for predicting small molecule anti-cancer activities.
  • The model outperforms existing methods with AUC scores of up to 0.98 on benchmark datasets.
  • It provides explainability by visualizing molecular substructures relevant to predictions.
  • DPD-Cancer employs a multi-stage, chemistry-aware data partitioning strategy for robust performance validation.
Read more
Identification of Bivariate Causal Directionality Based on Anticipated Asymmetric Geometries
Alex Glushkovsky
Theory
  • Introduction of two methods for identifying causal directionality in bivariate data: AAG and Monotonicity Index.
  • AAG method outperforms existing methods with a top accuracy of 77.9%.
  • Both methods utilize conditional distributions and assume stochastic properties of bivariate data.
  • Hyperparameter tuning is crucial for improving the accuracy of the proposed methods.
Read more
Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems
Pascal Henrich, Jonas Sievers, Maximilian Beichter, Thomas Blank, Ralf Mikut, Veit Hagenmeyer
Reinforcement Learning Efficient ML
  • Knowledge Distillation effectively compresses transformer models for deployment in hardware-constrained environments.
  • The smallest student models can outperform their teacher models in terms of electricity cost savings.
  • The proposed method achieves up to 96% reduction in parameters, 90% in memory usage, and 63% in inference time.
  • KD maintains control performance while enabling the use of complex models in practical applications.
Read more
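Knowledge distillation conventionally trains the student on the teacher's temperature-softened output distribution. A minimal generic soft-target loss (a sketch of the standard formulation, not this paper's training setup):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-T softmax over a list of logits."""
    m = max(logits)
    e = [math.exp((l - m) / T) for l in logits]
    s = sum(e)
    return [v / s for v in e]

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Soft-target distillation loss: KL(p_teacher || p_student)
    between temperature-softened class distributions, scaled by T^2
    to keep gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; a mismatch gives a positive loss.
print(distill_kl([3.0, 1.0, 0.2], [3.0, 1.0, 0.2]))
print(distill_kl([3.0, 1.0, 0.2], [0.2, 1.0, 3.0]))
```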
Not a fragment, but the whole: Map-based evaluation of data-driven Fire Danger Index models
Shahbaz Alvi, Italo Epicoco, Jose Maria Costa Saura
Time Series
  • Traditional evaluation metrics for wildfire prediction models often overlook the importance of false positive rates.
  • The proposed evaluation framework aligns model performance with real-world decision-making needs.
  • An ensemble of machine learning models enhances fire detection accuracy while reducing false alarms.
  • The study highlights the economic and operational implications of false positives in wildfire management.
Read more
Pure and Physics-Guided Deep Learning Solutions for Spatio-Temporal Groundwater Level Prediction at Arbitrary Locations
Matteo Salis, Gabriele Sartor, Rosa Meo, Stefano Ferraris, Abdourrahmane M. Atto
Time Series Theory Interpretability
  • Introduction of STAINet, an attention-based deep learning model for groundwater level prediction.
  • Integration of physics-guided strategies to enhance model trustworthiness and generalization.
  • STAINet-ILB variant achieved the best performance metrics, indicating effective incorporation of physical principles.
  • Model provides insights into groundwater flow dynamics, improving interpretability.
Read more
Foundation Model for Cardiac Time Series via Masked Latent Attention
Moritz Vandenhirtz, Samuel Ruipérez-Campillo, Simon Böhi, Sonia Laguna, Irene Cannistraci, Andrea Agostini, Ece Ozkan, Thomas M. Sutter, Julia E. Vogt
Time Series
  • Introduction of LAMAE, a foundation model that exploits ECG structural redundancy.
  • Utilization of latent attention to model higher-order interactions across ECG leads.
  • Empirical validation on the Mimic-IV-ECG database demonstrating improved representation quality.
  • Demonstration that LAMAE outperforms traditional independent-lead approaches in clinical tasks.
Read more
H-Node Attack and Defense in Large Language Models
Eric Yocam, Varghese Vaidyan, Yong Wang
Large Language Models NLP Interpretability
  • Introduction of H-Node ANC framework for hallucination detection and mitigation in LLMs.
  • Identification of Hallucination Nodes (H-Nodes) using logistic regression probes with high accuracy.
  • Development of a white-box adversarial attack that effectively amplifies hallucination signals.
  • Adaptive ANC defense significantly reduces hallucination effects while preserving model performance.
Read more
Empowering Epidemic Response: The Role of Reinforcement Learning in Infectious Disease Control
Mutong Liu, Yang Liu, Jiming Liu
Reinforcement Learning
  • Reinforcement Learning is increasingly being utilized for optimizing infectious disease control strategies.
  • The paper categorizes RL applications into four main areas: resource allocation, balancing health risks and socioeconomic costs, mixed intervention policies, and inter-regional coordination.
  • A systematic review identified 19 relevant studies, highlighting the growing interest in RL for public health applications.
  • RL can effectively address the complexities and uncertainties in epidemic response decision-making.
Read more
A Systematic Empirical Study of Grokking: Depth, Architecture, Activation, and Regularization
Shalima Binta Manir, Anamika Paul Rupa
Optimization Theory
  • Depth requires stabilization for effective grokking, with depth-4 MLPs failing while depth-8 residual networks succeed.
  • The differences between Transformers and MLPs are largely mitigated under matched hyperparameters, emphasizing the role of optimization and regularization.
  • Activation functions exhibit regime-dependent effects, with GELU outperforming ReLU only when regularization allows for memorization.
  • Weight decay is identified as a dominant control parameter, with a narrow range necessary for successful grokking.
Read more
AcTTA: Rethinking Test-Time Adaptation via Dynamic Activation
Hyeongyu Kim, Geonhui Han, Dosik Hwang
Computer Vision
  • AcTTA introduces an activation-aware framework for TTA, focusing on adaptive modulation of activation functions.
  • The method reformulates conventional activation functions into parameterized forms for dynamic adjustment during inference.
  • AcTTA achieves superior performance and stability compared to traditional normalization-based TTA methods.
  • The framework allows for continuous adaptation without altering network weights or requiring source data.
Read more
Contrastive Learning Boosts Deterministic and Generative Models for Weather Data
Nathan Bailey
Time Series Generative Models Graph Learning
  • Contrastive learning effectively generates robust embeddings for high-dimensional weather data.
  • The SPARTA method aligns sparse and complete samples to improve representation quality.
  • Incorporating temporal awareness and cycle-consistency enhances latent space structure.
  • A novel graph neural network fusion technique integrates physical knowledge into the learning process.
Read more
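The SPARTA method itself is not detailed above; as background, contrastive objectives of this kind are typically instances of the InfoNCE loss, sketched generically here (illustrative only):

```python
import math

def info_nce(anchor, positive, negatives, temp=0.1):
    """InfoNCE contrastive loss: cross-entropy of the anchor's cosine
    similarity to its positive sample against its similarities to
    negatives. Small when anchor and positive align and negatives
    do not."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    sims = [cos(anchor, positive)] + [cos(anchor, n) for n in negatives]
    logits = [s / temp for s in sims]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]  # -log softmax-probability of the positive

# Aligned positive, orthogonal negative -> near-zero loss.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
```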
Neural Network Conversion of Machine Learning Pipelines
Man-Ling Sung, Jan Silovsky, Man-Hung Siu, Herbert Gish, Chinnu Pittapally
Theory Efficient ML Optimization
  • Introduces a method for converting traditional ML pipelines into neural networks using a student-teacher framework.
  • Focuses on replacing random forest classifiers with neural networks while maintaining performance.
  • Demonstrates the effectiveness of hyper-parameter selection in training NN students to mimic teacher models.
  • Explores the benefits of unified inference engines for multiple ML tasks and improved generalization capabilities.
Read more
Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation
Einari Vaaras, Manu Airaksinen, Okko Räsänen
Time Series Audio & Speech
  • The study compares three sample selection methods for annotating biomedical time-series data.
  • Interactive 2D visualization (2DV) outperformed other methods in aggregating labels and capturing rare classes.
  • Farthest-first traversal (FAFT) excelled in scenarios with limited annotation budgets.
  • The variability in label distribution from 2DV can negatively impact classification performance when training on individual annotators' labels.
Read more
Benchmarking Tabular Foundation Models for Conditional Density Estimation in Regression
Rafael Izbicki, Pedro L. C. Rodrigues
Theory
  • Tabular foundation models like TabPFN and TabICL show strong performance in conditional density estimation tasks.
  • These models outperform traditional CDE methods in most scenarios, particularly in terms of CDE loss and log-likelihood.
  • Calibration performance is competitive at smaller sample sizes but may require improvements at larger sizes.
  • A case study in photometric redshift estimation highlights the effectiveness of TabPFN over traditional methods.
Read more
PQuantML: A Tool for End-to-End Hardware-aware Model Compression
Roope Niemi, Anastasiia Petrovych, Arghya Ranjan Das, Enrico Lupi, Chang Sun, Dimitrios Danopoulos, Marlon Joshua Helbing, Mia Liu, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini
Efficient ML
  • PQuantML integrates pruning and quantization in a single framework for model compression.
  • The library is designed for real-time applications, particularly in high-energy physics environments.
  • It achieves significant reductions in model parameters and bit-width while maintaining accuracy.
  • PQuantML simplifies the adoption of advanced compression techniques for physicists.
Read more
D-GATNet: Interpretable Temporal Graph Attention Learning for ADHD Identification Using Dynamic Functional Connectivity
Qurat Ul Ain, Alptekin Temizel, Soyiba Jawed
Graph Learning Time Series Interpretability
  • D-GATNet leverages dynamic functional connectivity for improved ADHD classification.
  • The framework incorporates both spatial and temporal modeling using graph attention and convolutional layers.
  • Interpretability is enhanced through attention weights that identify key brain regions and connectivity patterns.
  • D-GATNet outperforms existing methods on the ADHD-200 dataset, achieving high accuracy and AUC.
Read more
Generative Modeling in Protein Design: Neural Representations, Conditional Generation, and Evaluation Standards
Senura Hansaja Wanasekara, Minh-Duong Nguyen, Xiaochen Liu, Nguyen H. Tran, Ken-Tye Yong
Generative Models Multimodal
  • Generative modeling is transforming protein design by enabling sequence and structure generation.
  • The paper categorizes existing methods into representations, architectures, and task settings, addressing fragmentation in the literature.
  • Robust evaluation standards are essential for assessing generative models in protein design.
  • Key challenges include modeling dynamics, scaling, and addressing biosecurity concerns.
Read more
Knowledge-Guided Retrieval-Augmented Generation for Zero-Shot Psychiatric Data: Privacy Preserving Synthetic Data Generation
Adam Jakobsen, Sushant Gautam, Hugo Lewi Hammer, Susanne Olofsdotter, Miriam S Johanson, Pål Halvorsen, Vajira Thambawita
Generative Models Large Language Models NLP
  • Introduces a zero-shot, knowledge-guided framework for synthetic psychiatric data generation.
  • Utilizes large language models and Retrieval-Augmented Generation to create privacy-preserving datasets.
  • Demonstrates competitive performance against state-of-the-art models while ensuring patient privacy.
  • Finds that clinical retrieval significantly improves data fidelity.
Read more
Preventing Data Leakage in EEG-Based Survival Prediction: A Two-Stage Embedding and Transformer Framework
Yixin Zhou, Zhixiang Liu, Vladimir I. Zadorozhny, Jonathan Elmer
Time Series
  • Identified a critical form of data leakage in EEG modeling pipelines that inflates validation metrics.
  • Proposed a two-stage framework to prevent data leakage, enhancing model reliability.
  • Achieved stable performance in predicting neurological outcomes post-cardiac arrest.
  • Emphasized the necessity of strict patient-level data partitioning in clinical applications.
Read more
On Neural Scaling Laws for Weather Emulation through Continual Training
Shashank Subramanian, Alexander Kiefer, Arnur Nigmetov, Amir Gholami, Dmitriy Morozov, Michael W. Mahoney
Time Series Efficient ML Theory
  • Adoption of a minimalist Swin Transformer architecture for weather forecasting.
  • Continual training with constant learning rates and cooldowns enhances model performance.
  • Constructed IsoFLOP curves to identify compute-optimal training regimes.
  • Demonstrated predictable scaling trends that can guide resource allocation.
Read more
MAGNET: Autonomous Expert Model Generation via Decentralized Autoresearch and BitNet Training
Yongwan Kim, Sungchul Park
Large Language Models Optimization Efficient ML
  • MAGNET automates the ML research process, enabling decentralized model generation and training.
  • The system includes a novel autoresearch pipeline validated through multiple case studies.
  • BitNet b1.58 allows for efficient CPU-native inference, making model deployment accessible on commodity hardware.
  • DiLoCo enables effective merging of independently trained models into stronger collective models.
Read more
CVA: Context-aware Video-text Alignment for Video Temporal Grounding
Sungho Moon, Seunghun Lee, Jiwan Seo, Sunghoon Im
Computer Vision Multimodal
  • Introduction of Query-aware Context Diversification (QCD) to enhance data augmentation.
  • Development of Context-invariant Boundary Discrimination (CBD) loss for improved semantic consistency.
  • Design of Context-enhanced Transformer Encoder (CTE) for effective multi-scale temporal context modeling.
  • Achieved state-of-the-art performance on major Video Temporal Grounding benchmarks.
Read more
Neuro-Symbolic Process Anomaly Detection
Devashish Gaikwad, Wil M. P. van der Aalst, Gyunam Park
Theory Interpretability
  • Introduces a neuro-symbolic approach for process anomaly detection integrating LTN and Declare constraints.
  • Addresses the misclassification of rare but conformant traces as anomalies in traditional methods.
  • Demonstrates improved F1 scores in anomaly detection with limited conformant traces.
  • Highlights the influence of domain knowledge on the effectiveness of anomaly detection.
Read more
PruneFuse: Efficient Data Selection via Weight Pruning and Network Fusion
Humaira Kousar, Hasnain Irshad Bhatti, Jaekyun Moon
Efficient ML
  • Introduction of PruneFuse, a two-stage data selection strategy leveraging pruned networks.
  • Significant reduction in computational costs associated with data selection compared to traditional methods.
  • Improved performance and generalization through the fusion of pruned and original networks.
  • Broad applicability across various datasets and network architectures.
Read more
Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?
Sounak Dutta, Fin Amin, Sushil Panda, Jonathan Rabe, Yuejiang Wen, Paul Franzon
Optimization
  • Introduction of an Actor-Critic framework for analog design optimization.
  • Separation of proposal and evaluation roles enhances search efficiency.
  • ACOF improves top-10 figure of merit by an average of 38.9% over baseline methods.
  • Reduces regret by an average of 24.7%, indicating more effective exploration of the design space.
Read more
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao
NLP Large Language Models Optimization
  • Token-level OPD is biased compared to sequence-level OPD but has lower variance in long-horizon training.
  • Three failure modes of sampled-token OPD are identified: imbalanced signals, unreliable guidance, and tokenizer mismatches.
  • The proposed teacher top-K local support matching improves stability and performance over traditional methods.
  • Empirical results demonstrate better optimization behavior in both single-task and multi-task settings.
Read more
Light Cones For Vision: Simple Causal Priors For Visual Hierarchy
Manglam Kartik, Neel Tushar Shah
Computer Vision Theory
  • Introduction of Worldline Slot Attention to model hierarchical structures in visual data.
  • Demonstration that Euclidean geometry fails to capture necessary causal relationships, while Lorentzian geometry succeeds.
  • Empirical validation of the method across three datasets with significant performance improvements.
  • Highlighting the importance of geometric structure in object-centric learning.
Read more
Curvature-aware Expected Free Energy as an Acquisition Function for Bayesian Optimization
Ajith Anil Meera, Wouter Kouw
Optimization Robotics Theory
  • Introduction of Expected Free Energy as a general acquisition function for Bayesian optimization.
  • Mathematical proofs showing EFE's reduction to UCB, LCB, and EIG under specific conditions.
  • Establishment of unbiased convergence guarantees for EFE on concave functions.
  • Development of a curvature-aware update rule that improves exploration and exploitation balance.
Read more
SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
Xinyu Wang, Fei Dou, Jinbo Bi, Minghu Song
Generative Models Graph Learning Optimization
  • SIGMA addresses trajectory divergence in ChemLMs by enforcing latent isotropy through dense trajectory alignment.
  • The Structure-Invariant Contrastive Loss maximizes mutual information between equivalent generation paths, decoupling chemical semantics from syntactic variations.
  • IsoBeam eliminates isomorphic redundancy during inference, improving computational efficiency.
  • Empirical results show that SIGMA outperforms strong baselines in sample efficiency and structural diversity.
Read more
Missing-Aware Multimodal Fusion for Unified Microservice Incident Management
Wenzhuo Qian, Hailiang Zhao, Ziqi Wang, Zhipeng Gao, Jiayi Chen, Zhiwei Ling, Shuiguang Deng
Multimodal
  • Introduces ARMOR, a framework for incident management that handles missing modalities in multimodal data.
  • Utilizes a modality-specific asymmetric encoder to address distribution disparities among different data types.
  • Employs a missing-aware gated fusion mechanism to reduce cross-modal interference from incomplete inputs.
  • Optimizes anomaly detection, failure triage, and root cause localization in a unified manner without relying heavily on fault labels.
Read more
An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability
Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff
Reinforcement Learning Optimization Theory
  • Introduces a novel policy (UCB-LP-A) for stochastic MAB problems with side-observations and dynamic action availability.
  • Models action availability using discrete activation sets, capturing correlated unavailability in real-world scenarios.
  • Derives a theoretical upper bound on the regret of the proposed policy, considering network structure and activation probabilities.
  • Demonstrates superior performance of UCB-LP-A over existing heuristics through extensive simulations.
Read more
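UCB-LP-A builds on upper-confidence-bound indices; the classical UCB1 index that such policies extend (background sketch, not the paper's LP-based policy) is:

```python
import math

def ucb1_index(mean_reward, pulls, t):
    """Classical UCB1 index for a bandit arm: empirical mean plus an
    exploration bonus sqrt(2 ln t / n) that shrinks as the arm's
    pull count n grows relative to the round t."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)

# At round 100, a rarely pulled arm can outrank a better-looking,
# well-explored arm, driving exploration.
print(ucb1_index(0.5, 5, 100), ucb1_index(0.6, 50, 100))
```

Side-observations, as studied in the paper, let one pull update several arms' counts at once, which is what tightens the regret bound over vanilla UCB1.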
Incorporating contextual information into KGWAS for interpretable GWAS discovery
Cheng Jiang, Brady Ryan, Megan Crow, Kipper Fletez-Brant, Kashish Doshi, Sandra Melo Carlos, Kexin Huang, Burkhard Hoeckendorf, Heming Yao, David Richmond
Graph Learning Interpretability
  • Proposes a context-aware KGWAS framework that utilizes cell-type specific knowledge graphs.
  • Demonstrates that pruning a general-purpose KG does not degrade performance in GWAS.
  • Incorporates Perturb-seq data to enhance gene-gene relationship mapping.
  • Achieves improved retrieval of significant loci in small cohorts.
Read more
Second-Order, First-Class: A Composable Stack for Curvature-Aware Training
Mikalai Korbit, Mario Zanon
Optimization
  • Introduction of Somax, a composable stack for curvature-aware training in JAX.
  • Provides a unified API for second-order optimization, enhancing usability and flexibility.
  • Separation of planning from execution reduces computational overhead and improves efficiency.
  • Empirical evaluations show significant impacts of module choices on performance metrics.
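As background for what "curvature-aware" means here, a second-order optimizer preconditions the gradient by (damped) curvature information. The minimal NumPy sketch below shows one damped Newton step on an ill-conditioned quadratic; it is a generic illustration, not Somax's API or its planning/execution split.

```python
import numpy as np

def newton_step(grad_fn, hess_fn, w, damping=1e-4):
    """One damped Newton update: solve (H + damping*I) d = g and step by -d."""
    g = grad_fn(w)
    H = hess_fn(w)
    return w - np.linalg.solve(H + damping * np.eye(len(w)), g)

# ill-conditioned quadratic f(w) = 0.5 * w^T A w - b^T w
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b
hess = lambda w: A
w = newton_step(grad, hess, np.zeros(2))
# a single curvature-aware step lands (up to damping) at the minimizer A^{-1} b,
# where plain gradient descent would need many iterations on this conditioning
```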
Read more
Energy-Efficient Hierarchical Federated Anomaly Detection for the Internet of Underwater Things via Selective Cooperative Aggregation
Kenechi Omeke, Michael Mollel, Lei Zhang, Qammer H. Abbasi, Muhammad Ali Imran
Federated Learning Efficient ML Time Series
  • Proposes a three-tier hierarchical federated learning framework for anomaly detection in IoUT.
  • Introduces feasibility-aware sensor-to-fog associations and selective cooperative aggregation to optimize energy use.
  • Demonstrates significant energy savings while maintaining detection accuracy compared to traditional flat FL methods.
  • Evaluates the framework using a physics-grounded model to realistically assess communication and participation.
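The hierarchical structure can be sketched with nested sample-size-weighted averaging (FedAvg): sensors train locally, fog nodes aggregate their clusters, and a surface tier aggregates the fog models. The sketch below shows only the three-tier averaging; the paper's selective, energy-aware choice of which fog nodes cooperate is not modeled, and all numbers are illustrative.

```python
import numpy as np

def fedavg(models, sizes):
    """Sample-size-weighted average of model parameter vectors (FedAvg)."""
    sizes = np.asarray(sizes, float)
    return (sizes[:, None] * np.stack(models)).sum(axis=0) / sizes.sum()

# tier 1: sensors train locally; tier 2: fog nodes aggregate their clusters;
# tier 3: the surface station aggregates fog models, weighted by cluster size
sensor_models = [np.full(4, v) for v in (1.0, 2.0, 3.0, 10.0)]
fog_a = fedavg(sensor_models[:2], [10, 10])
fog_b = fedavg(sensor_models[2:], [10, 10])
global_model = fedavg([fog_a, fog_b], [20, 20])
```

Because the weights are proportional to sample counts, the two-stage aggregation here matches what a flat FedAvg over all four sensors would produce, while each sensor only ever transmits to its nearby fog node.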
Read more
Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML
Yassien Shaalan
Efficient ML Time Series Audio & Speech
  • HYPERTINYPW replaces stored pointwise (PW) convolution weights with generated ones to reduce memory usage.
  • The method maintains the first PW layer in INT8 format for stability in early mixing.
  • Achieves a 6.31x reduction in model size while retaining over 95% of macro-F1 score on ECG tasks.
  • Provides a detailed analysis of deployment strategies, including boot vs. lazy synthesis.
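The generative-compression idea can be sketched as a hypernetwork: a shared generator maps a tiny per-layer embedding to the full 1x1-convolution weight matrix, so each additional layer costs only its embedding. This is an illustrative sketch of weight generation in general; the shapes, the generator form, and the names are assumptions, not the HYPERTINYPW architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
in_ch, out_ch, emb_dim = 64, 64, 8

# shared generator + tiny per-layer embedding replace the stored weight matrix
proj = rng.normal(size=(out_ch * in_ch, emb_dim))   # generator, shared across layers
layer_emb = rng.normal(size=emb_dim)                # stored per layer
w_pw = (proj @ layer_emb).reshape(out_ch, in_ch)    # generated 1x1-conv weights

stored_direct = out_ch * in_ch   # params per layer if weights are stored directly
stored_generated = emb_dim       # marginal params per extra layer once proj is shared
```

The generator itself is larger than any single layer, so the savings only materialize when many PW layers amortize it; that is also why synthesis timing (boot vs. lazy) becomes a deployment question.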
Read more
Local learning for stable backpropagation-free neural network training towards physical learning
Yaqi Guo, Fabian Braun, Bastiaan Ketelaar, Stephanie Tan, Richard Norte, Siddhant Kumar
Optimization Efficient ML Theory
  • Introduction of FFzero, a backpropagation-free learning framework.
  • Utilizes local learning and directional-derivative optimization for stable training.
  • Demonstrated effectiveness on multilayer perceptrons and convolutional networks.
  • Addresses environmental concerns and physical limitations of traditional deep learning.
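Directional-derivative optimization can be sketched without backpropagation: sample a random direction, measure the loss change along it by finite differences, and scale the direction by that directional derivative. This "forward gradient" estimator is a standard technique and is shown here only to illustrate the ingredient the bullets name; the details are not from the FFzero paper.

```python
import numpy as np

def forward_gradient(loss_fn, w, eps=1e-5, seed=0):
    """Backprop-free gradient estimate: for random v ~ N(0, I),
    (grad L . v) * v is an unbiased estimate of grad L."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=w.shape)
    d = (loss_fn(w + eps * v) - loss_fn(w - eps * v)) / (2 * eps)  # ≈ grad L . v
    return d * v

loss = lambda w: np.sum(w ** 2)        # true gradient is 2w
w = np.array([1.0, -2.0, 3.0])
g = forward_gradient(loss, w)
# averaged over many random directions v, g converges to 2w
```

Only two forward evaluations of the loss are needed per estimate, which is what makes such schemes attractive for physical learning systems where backpropagation is unavailable.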
Read more
A Lyapunov Analysis of Softmax Policy Gradient for Stochastic Bandits
Tor Lattimore
Reinforcement Learning Theory Optimization
  • Adapts continuous-time analysis of softmax policy gradient to discrete-time stochastic bandits.
  • Establishes a regret bound dependent on the learning rate and action gaps.
  • Utilizes a Lyapunov argument to ensure well-behaved sample paths.
  • Identifies limitations in the learning rate requirements for optimal performance.
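The dynamics under study can be sketched in their deterministic form: softmax policy gradient ascent on known arm means, which is the discrete-time counterpart of the continuous-time flow the Lyapunov analysis addresses. The stochastic version replaces the true means with sampled rewards; the constants below are illustrative, not the paper's.

```python
import numpy as np

def softmax_pg(means, eta=0.5, steps=2000):
    """Softmax policy gradient on known means: theta gets
    eta * pi * (mu - pi . mu), the exact gradient of expected reward."""
    mu = np.asarray(means, float)
    theta = np.zeros(len(mu))
    for _ in range(steps):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        theta += eta * p * (mu - p @ mu)   # grad of sum_a pi_a * mu_a w.r.t. theta
    return p

p = softmax_pg([0.2, 0.5, 0.9])
# the policy concentrates on the optimal arm; the action gaps (0.7, 0.4)
# govern how fast the suboptimal probabilities decay
```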
Read more
Interpretable long-term traffic modelling on national road networks using theory-informed deep learning
Yue Li, Shujuan Chen, Akihiro Shimoda, Ying Jin
Interpretability
  • DeepDemand integrates travel demand theory with deep learning for improved traffic volume predictions.
  • The model outperforms traditional methods in predictive accuracy and geographic transferability.
  • Interpretability analysis reveals significant socioeconomic factors influencing traffic demand.
  • The framework addresses limitations of existing traffic models by combining structured logic with modern machine learning.
Read more
Machine Unlearning under Retain-Forget Entanglement
Jingpu Cheng, Ping Liu, Qianxiao Li, Chi Zhang
Optimization Theory
  • Introduces a two-phase optimization framework for machine unlearning.
  • Focuses on the issue of retain-forget entanglement in unlearning tasks.
  • Demonstrates improved performance in accuracy retention and removal fidelity over existing methods.
  • Utilizes augmented Lagrangian methods and Wasserstein-2 distance regularization.
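The augmented Lagrangian ingredient can be sketched on a toy constrained problem: minimize an objective (standing in for the forget objective) subject to an equality constraint (standing in for preserving retain-set behavior). This is the generic method only, with made-up functions; it is not the paper's two-phase framework and omits the Wasserstein-2 regularizer.

```python
import numpy as np

def augmented_lagrangian(f_grad, g, g_grad, w, rho=10.0, outer=20, inner=200, lr=0.01):
    """Minimize f subject to g(w) = 0 via
    L(w, lam) = f(w) + lam * g(w) + (rho / 2) * g(w)^2."""
    lam = 0.0
    for _ in range(outer):
        for _ in range(inner):                       # primal: gradient descent on L
            w = w - lr * (f_grad(w) + (lam + rho * g(w)) * g_grad(w))
        lam += rho * g(w)                            # dual: multiplier update
    return w

# toy problem: minimize ||w - a||^2 subject to w_0 + w_1 = 1
a = np.array([2.0, 0.0])
f_grad = lambda w: 2 * (w - a)
g = lambda w: w[0] + w[1] - 1.0
g_grad = lambda w: np.array([1.0, 1.0])
w = augmented_lagrangian(f_grad, g, g_grad, np.zeros(2))
# analytic solution: the projection of a onto the constraint, (1.5, -0.5)
```

The multiplier update lets the constraint be enforced exactly without driving the penalty weight rho to infinity, which is the method's advantage over a plain quadratic penalty.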
Read more
Hardware-Aware Tensor Networks for Real-Time Quantum-Inspired Anomaly Detection at Particle Colliders
Sagar Addepalli, Prajita Bhattarai, Abhilasha Dave, Julia Gonski
Theory Efficient ML
  • Introduction of Spaced Matrix Product Operators (SMPOs) for anomaly detection in collider events.
  • Demonstration of real-time implementation on FPGA hardware, addressing latency and resource constraints.
  • Development of a cascaded SMPO (CSMPO) architecture that maintains performance while reducing computational demands.
  • Potential for quantum-inspired ML to enhance anomaly detection capabilities beyond classical methods in high-energy physics.
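For readers unfamiliar with matrix product operators, the core compression idea can be sketched by contracting a chain of MPO cores into the dense operator they represent. This shows plain MPOs only, not the paper's spaced (SMPO) or cascaded (CSMPO) variants, and the sizes are illustrative.

```python
import numpy as np

def mpo_to_dense(cores):
    """Contract a chain of MPO cores, each shaped (left, out, in, right),
    into the dense operator they parameterize."""
    op = cores[0]                                    # (1, o, i, r)
    for c in cores[1:]:
        op = np.einsum('aoib,bpjc->aopijc', op, c)   # contract the shared bond b
        l, o1, o2, i1, i2, r = op.shape
        op = op.reshape(l, o1 * o2, i1 * i2, r)      # merge physical indices
    return op[0, :, :, 0]                            # drop trivial boundary bonds

rng = np.random.default_rng(0)
bond = 3
cores = [rng.normal(size=(1, 2, 2, bond)),
         rng.normal(size=(bond, 2, 2, bond)),
         rng.normal(size=(bond, 2, 2, 1))]
dense = mpo_to_dense(cores)            # acts on a 2^3-dimensional input space
n_mpo = sum(c.size for c in cores)     # parameters actually stored
n_dense = dense.size                   # parameters of the dense operator
```

With three sites the saving is modest, but the dense operator grows as 4^n in the number of sites while the MPO grows linearly at fixed bond dimension, which is what makes the representation attractive for latency- and resource-bound FPGA deployment.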
Read more
On the Objective and Feature Weights of Minkowski Weighted k-Means
Renato Cordeiro de Amorim, Vladimir Makarenkov
Theory Optimization
  • The mwk-means objective can be expressed as a power-mean aggregation of within-cluster dispersions.
  • The Minkowski exponent p influences the selective and uniform use of features in clustering.
  • The structure of feature weights follows a power-law relationship with dispersion ratios.
  • Convergence guarantees for the mwk-means algorithm are established.
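The power-law structure in the third bullet can be made concrete with the standard closed-form mwk-means weight update, w_v = 1 / sum_u (D_v / D_u)^(1/(p-1)), where D_v is the within-cluster dispersion of feature v. A sketch, not the paper's code; the dispersion values below are illustrative.

```python
import numpy as np

def minkowski_feature_weights(dispersions, p):
    """Closed-form mwk-means feature weights: each weight is a power-law
    function of the pairwise dispersion ratios, with exponent 1/(p-1)."""
    D = np.asarray(dispersions, float)
    ratios = D[:, None] / D[None, :]                 # D_v / D_u for all pairs
    return 1.0 / (ratios ** (1.0 / (p - 1))).sum(axis=1)

D = np.array([1.0, 4.0, 16.0])                # within-cluster dispersion per feature
w2 = minkowski_feature_weights(D, p=2.0)      # small p: selective, low-dispersion features dominate
w9 = minkowski_feature_weights(D, p=9.0)      # large p: weights flatten toward uniform
```

Varying the Minkowski exponent p interpolates between selective and near-uniform feature use, matching the role the bullets ascribe to it.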
Read more