AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
A Systematic Empirical Study of Grokking: Depth, Architecture, Activation, and Regularization
Shalima Binta Manir, Anamika Paul Rupa
Optimization Theory
  • Depth requires stabilization for effective grokking, with depth-4 MLPs failing while depth-8 residual networks succeed.
  • The gap between Transformers and MLPs diminishes under matched hyperparameters, indicating confounding factors in previous studies.
  • Activation function performance is dependent on the regularization regime, with GELU being advantageous only under certain conditions.
  • Weight decay is the dominant factor influencing grokking, with a narrow optimal range necessary for generalization.
Read more
Longitudinal Digital Phenotyping for Early Cognitive-Motor Screening
Diego Jimenez-Oviedo, Ruben Vera-Rodriguez, Ruben Tolosana, Juan Carlos Ruiz-Garcia, Jaime Herreros-Rodriguez
Time Series Multimodal
  • Introduces an AI-driven framework for continuous monitoring of cognitive-motor development in children.
  • Identifies three distinct developmental profiles based on tablet interaction data.
  • Demonstrates high stability of low-performance profiles, suggesting that deficits may persist.
  • Utilizes unsupervised learning techniques to uncover natural patterns of cognitive growth.
Read more
Anchored-Branched Steady-state WInd Flow Transformer (AB-SWIFT): a metamodel for 3D atmospheric flow in urban environments
Armand de Villeroché, Rem-Sophia Mouradi, Vincent Le Guen, Sibo Cheng, Marc Bocquet, Alban Farchi, Patrick Armand, Patrick Massin
Theory Graph Learning Efficient ML
  • AB-SWIFT is the first transformer-based neural operator specifically designed for local-scale atmospheric flow modeling.
  • The model is trained on a new dataset that includes various urban geometries and atmospheric stratification conditions.
  • AB-SWIFT achieves superior accuracy compared to existing transformer and graph neural network models.
  • The model's internal branched structure allows for effective representation of complex urban environments.
Read more
Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards
Cheng Jiayang, Xin Liu, Zhihan Zhang, Haoyang Wen, Zixuan Zhang, Qingyu Yin, Shiyang Li, Priyanka Nigam, Bing Yin, Chao Zhang, Yangqiu Song
NLP Large Language Models Reinforcement Learning
  • Introduces a framework for training LLMs in multi-step tool orchestration using real API responses.
  • Develops a graduated reward system that provides fine-grained feedback on correctness.
  • Demonstrates substantial improvements in model performance on ComplexFuncBench.
  • Identifies the limitations of existing training environments and reward structures in RL-based tool use.
Read more
Knowledge-Guided Retrieval-Augmented Generation for Zero-Shot Psychiatric Data: Privacy Preserving Synthetic Data Generation
Adam Jakobsen, Sushant Gautam, Hugo Lewi Hammer, Susanne Olofsdotter, Miriam S Johanson, Pål Halvorsen, Vajira Thambawita
Generative Models Large Language Models NLP
  • Introduces a zero-shot, knowledge-guided framework for synthetic psychiatric data generation.
  • Utilizes large language models (LLMs) with Retrieval-Augmented Generation to avoid reliance on real patient data.
  • Demonstrates competitive performance in generating high-fidelity synthetic data while preserving privacy.
  • Shows that clinical retrieval enhances the fidelity of generated datasets.
Read more
Not a fragment, but the whole: Map-based evaluation of data-driven Fire Danger Index models
Shahbaz Alvi, Italo Epicoco, Jose Maria Costa Saura
Time Series
  • Traditional evaluation metrics for wildfire prediction models often neglect the impact of false positives.
  • The proposed evaluation framework aligns model performance assessment with real-world operational needs.
  • Ensemble machine learning models both improve fire detection accuracy and reduce false alarms.
  • The study highlights the importance of incorporating a comprehensive set of fire predictors beyond meteorological variables.
Read more
Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela
Optimization Large Language Models
  • Classical HPO methods outperform LLM-based methods in fixed search spaces.
  • LLM agents that edit training code can significantly reduce the performance gap with classical methods.
  • The Centaur hybrid method, which shares CMA-ES's internal state with an LLM, achieves the best results.
  • Reliability in optimization methods is more critical than exploration breadth.
Read more
SEVerA: Verified Synthesis of Self-Evolving Agents
Debangshu Banerjee, Changming Xu, Gagandeep Singh
Large Language Models Generative Models Theory
  • Introduces a formal framework for synthesizing self-evolving agents with safety guarantees.
  • Combines hard formal specifications with soft performance objectives in agent synthesis.
  • Utilizes Formally Guarded Generative Models (FGGM) to enforce formal output contracts.
  • Achieves zero constraint violations across multiple evaluation tasks.
Read more
SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
Xinyu Wang, Fei Dou, Jinbo Bi, Minghu Song
Generative Models Graph Learning Optimization
  • SIGMA addresses trajectory divergence in ChemLMs by enforcing latent isotropy through dense trajectory alignment.
  • The Structure-Invariant Contrastive Loss maximizes mutual information between equivalent generation paths, decoupling chemical semantics from syntactic variations.
  • IsoBeam eliminates isomorphic redundancy during inference, optimizing computational resources and enhancing exploration of structurally distinct molecular scaffolds.
  • Empirical results show that SIGMA outperforms strong baselines in sample efficiency and structural diversity.
Read more
Contrastive Learning Boosts Deterministic and Generative Models for Weather Data
Nathan Bailey
Time Series Generative Models Multimodal
  • Contrastive learning effectively generates robust low-dimensional embeddings from high-dimensional weather data.
  • The proposed SPARTA method improves performance over traditional autoencoders in downstream tasks.
  • Incorporating domain-specific knowledge through graph neural networks enhances the contrastive learning approach.
  • The methodology addresses the challenges of data sparsity and multimodality in weather datasets.
Read more
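The summary above names contrastive learning for embedding weather data. As a minimal, generic illustration of the standard InfoNCE objective behind such methods (not the paper's SPARTA method, and the function name is hypothetical):

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE contrastive loss: row i of z_a and row i of z_b
    are two views of the same sample (positives); all other rows act as
    negatives. A minimal numpy sketch of the standard objective only."""
    # L2-normalize embeddings so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; loss is the mean negative log-likelihood.
    return -np.mean(np.diag(log_prob))

# Usage with random "two-view" embeddings: small perturbations of the
# same rows should yield a loss well below the random-guess baseline.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
loss = info_nce_loss(z + 0.01 * rng.standard_normal((8, 16)), z)
```

Minimizing this loss pulls paired views together and pushes other samples apart, which is the mechanism the summary credits for producing robust low-dimensional embeddings.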
Gap Safe Screening Rules for Fast Training of Robust Support Vector Machines under Feature Noise
Tan-Hau Nguyen, Thu-Le Tran, Kien Trung Nguyen
Optimization Efficient ML Theory
  • Introduction of safe sample screening rules for R-SVMs to reduce training complexity.
  • First application of safe screening techniques to worst-case robust models in supervised learning.
  • Methodology based on Lagrangian duality rather than Fenchel-Rockafellar duality.
  • Experimental results show significant reduction in training time with preserved accuracy.
Read more
Local learning for stable backpropagation-free neural network training towards physical learning
Yaqi Guo, Fabian Braun, Bastiaan Ketelaar, Stephanie Tan, Richard Norte, Siddhant Kumar
Efficient ML Theory Optimization
  • Introduction of FFzero, a backpropagation-free learning framework.
  • Utilizes layer-wise local learning and directional-derivative optimization.
  • Demonstrates effectiveness in multilayer perceptron and convolutional networks.
  • Provides a viable path for in-situ physical learning using simulated photonic networks.
Read more
DyMRL: Dynamic Multispace Representation Learning for Multimodal Event Forecasting in Knowledge Graph
Feng Zhao, Kangzheng Liu, Teng Peng, Yu Yang, Guandong Xu
Multimodal Graph Learning Time Series
  • DyMRL integrates time-sensitive structural features from multiple geometric spaces for deep representation learning.
  • The approach incorporates dual fusion-evolution attention mechanisms for dynamic multimodal feature fusion.
  • Extensive experiments show that DyMRL outperforms existing dynamic and static methods in event forecasting.
  • The framework reflects human-like cognitive processes in associative thinking and logical reasoning.
Read more
Interpretable PM2.5 Forecasting for Urban Air Quality: A Comparative Study of Operational Time-Series Models
Moazzam Umer Gondal, Hamad ul Qudous, Asma Ahmad Farhan, Sultan Alamri
Time Series Interpretability Efficient ML
  • Lightweight forecasting models can achieve competitive accuracy in PM2.5 prediction.
  • Facebook Prophet demonstrated the best performance in terms of accuracy and execution time.
  • Residual correction significantly improved the robustness of forecasts.
  • The study emphasizes the importance of interpretability and computational efficiency in air quality forecasting.
Read more
From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents
Shuoling Liu, Zhiquan Tan, Kun Yi, Hui Wu, Yihan Li, Jiangpeng Yan, Liyuan Chen, Kai Chen, Qiang Yang
Theory
  • Introduces a formal framework for evaluating Deep Research Agents using category theory.
  • Develops a benchmark with 296 questions to rigorously test DRA capabilities.
  • Finds that state-of-the-art models achieve only 19.9% average accuracy, revealing significant evaluation challenges.
  • Identifies a dichotomy in AI capabilities: strengths in dynamic reasoning but weaknesses in multi-hop synthesis.
Read more
Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
Haishan Ye
Optimization Theory
  • Introduces the first high-probability regret bound for two-point feedback in OCO.
  • Achieves a minimax optimal regret bound of O(d(log T + log(1/δ))/µ) for strongly convex losses.
  • Improves the dimension dependency from O(d²) to O(d), enhancing efficiency.
  • Develops a novel analytical framework that is robust to variance in gradient estimators.
Read more
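For readers unfamiliar with two-point bandit feedback: the learner only observes function values at two nearby points and must build a gradient estimate from them. A sketch of the classic two-point (zeroth-order) estimator, which is the generic ingredient behind such bounds, not the paper's specific algorithm (helper name is hypothetical):

```python
import numpy as np

def two_point_gradient(f, x, delta=1e-3, rng=None):
    """Classic two-point zeroth-order gradient estimate: query f at
    x + delta*u and x - delta*u along a random unit direction u, and
    scale the finite difference by the dimension. Unbiased for linear
    functions and unbiased in expectation for quadratics."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                        # random unit direction
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Usage: estimate the gradient of a quadratic and take a descent step.
f = lambda x: 0.5 * np.sum(x ** 2)                # true gradient is x
x = np.array([1.0, -2.0, 3.0])
g = two_point_gradient(f, x)
x_next = x - 0.1 * g
```

Each step costs exactly two function evaluations, which is why regret bounds under this feedback model scale with the dimension d rather than with full-gradient access.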
Neural Network Conversion of Machine Learning Pipelines
Man-Ling Sung, Jan Silovsky, Man-Hung Siu, Herbert Gish, Chinnu Pittapally
Theory Efficient ML Optimization
  • Introduces a novel approach to convert traditional ML pipelines into neural networks using student-teacher learning.
  • Demonstrates that neural networks can effectively mimic the performance of random forest classifiers across multiple tasks.
  • Explores the use of random forests for hyper-parameter selection in neural networks.
  • Highlights the benefits of unified inference engines for joint optimization of ML components.
Read more
How unconstrained machine-learning models learn physical symmetries
Michelangelo Domina, Joseph William Abbott, Paolo Pegolo, Filippo Bigi, Michele Ceriotti
Theory Graph Learning Efficient ML
  • Unconstrained ML models can learn physical symmetries effectively through data augmentation.
  • The paper introduces metrics to measure the symmetry content of learned representations.
  • Analysis of symmetry processing across model layers provides insights into model performance.
  • Strategic injection of inductive biases can improve model stability and accuracy.
Read more
Grokking as a Falsifiable Finite-Size Transition
Yuda Bi, Chenyu Zhang, Qiheng Wang, Vince D Calhoun
Theory
  • Introduces a framework for testing grokking as a finite-size transition using statistical mechanics principles.
  • Identifies the group order p of Zp as an extensive variable and spectral head–tail contrast as an order parameter.
  • Demonstrates that grokking exhibits a shared finite-size boundary, challenging the smooth-crossover interpretation.
  • Applies a rigorous diagnostic protocol that distinguishes genuine transitions from mere fitting exercises.
Read more
An Explainable Ensemble Learning Framework for Crop Classification with Optimized Feature Pyramids and Deep Networks
Syed Rayhan Masud, SK Muktadir Hossain, Md. Ridoy Sarkar, Mohammad Sakib Mahmood, Md. Kishor Morol, Rakib Hossain Sajib
Interpretability
  • Introduction of a high-performance meta-ensemble framework for crop classification.
  • Integration of Explainable AI methods to enhance transparency and interpretability.
  • Identification of key soil and climate features impacting crop suitability, consistent with agronomic knowledge.
  • Achieved 98.80% across accuracy, precision, recall, and F1-score, outperforming individual models.
Read more
Physics-Informed Neural Network Digital Twin for Dynamic Tray-Wise Modeling of Distillation Columns under Transient Operating Conditions
Debadutta Patra, Ayush Bardhan Tripathy, Soumya Ranjan Sahu, Sucheta Panda
Optimization Theory Time Series
  • Introduction of a PINN framework that embeds thermodynamic constraints for distillation modeling.
  • Development of a sigmoid-scheduled adaptive loss-weighting strategy for training.
  • Creation of a comprehensive synthetic dataset for evaluating the model's performance.
  • Demonstration of superior performance compared to traditional data-driven models.
Read more
Layer-Specific Lipschitz Modulation for Fault-Tolerant Multimodal Representation Learning
Diyar Altinses, Andreas Schwung
Multimodal Theory Efficient ML
  • Introduces a unified framework for fault-tolerant multimodal representation learning.
  • Develops a dual-regularization mechanism to balance sensitivity for anomaly detection and correction.
  • Demonstrates improved performance on multimodal fault datasets compared to existing methods.
  • Integrates theoretical insights on perturbation effects into practical applications.
Read more
Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
Jiajun Hu, Nuria Armengol Urpi, Jin Cheng, Stelian Coros
Reinforcement Learning Robotics
  • Introduction of FB-MEBE, an online zero-shot RL algorithm for quadrupedal robots.
  • Maximizes entropy of behavior distribution to enhance exploration and policy diversity.
  • Combines unsupervised exploration with a regularization critic for physically plausible behaviors.
  • Demonstrates improved performance in simulated tasks compared to other exploration strategies.
Read more
Process-Aware AI for Rainfall-Runoff Modeling: A Mass-Conserving Neural Framework with Hydrological Process Constraints
Mohammad A. Farmani, Hoshin V. Gupta, Ali Behrangi, Muhammad Jawad, Sadaf Moghisi, Guo-Yue Niu
Interpretability Time Series
  • Embedding hydrological process constraints enhances interpretability in rainfall-runoff predictions.
  • Vertical drainage significantly improves model performance in arid and snow-dominated basins.
  • Process-aware AI models can achieve deep-learning predictive skill while retaining physical interpretability.
Read more
Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback
Jungtaek Kim, Thomas Zeng, Ziqian Lin, Minjae Lee, Chungpa Lee, Jy-yong Sohn, Hyung Il Koo, Kangwook Lee
Large Language Models Reinforcement Learning Theory
  • Introduces a new framework for evaluating LLMs' search capabilities using external feedback.
  • Demonstrates that Transformers can represent and approximate various search strategies.
  • Shows that targeted training can significantly improve LLM performance in search tasks.
  • Highlights the limitations of current LLMs compared to traditional search algorithms.
Read more
Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization
Kalle Kujanpää, Yuying Zhu, Kristina Klinkner, Shervin Malmasi
Reinforcement Learning Large Language Models Optimization
  • Development of a Transformer-GNN architecture for offline RL that improves throughput by 2.4%.
  • LLMs require substantial task-specific adaptation; prompting alone is inadequate.
  • Supervised fine-tuning and preference optimization enable LLMs to match historical performance.
  • Iterative feedback loops can facilitate human-AI collaboration in decision-making.
Read more
How Class Ontology and Data Scale Affect Audio Transfer Learning
Manuel Milling, Andreas Triantafyllopoulos, Alexander Gebhard, Simon Rampp, Björn W. Schuller
Audio & Speech
  • Transfer learning performance in audio tasks is significantly influenced by the similarity between pre-training and downstream tasks.
  • Increasing the number of samples and classes in pre-training data positively impacts transfer learning, but not as much as task similarity.
  • The study provides a comprehensive analysis of various subsets of AudioSet for pre-training DNNs.
  • Findings challenge the assumption that larger datasets with broader ontologies are always the best choice for pre-training in audio tasks.
Read more
Flow matching on homogeneous spaces
Francesco Ruscelli
Generative Models Theory Efficient ML
  • Introduces a framework for flow matching on homogeneous spaces by lifting to Lie groups.
  • Avoids complex geometry by simplifying the problem to Euclidean flow matching on Lie algebras.
  • Eliminates the need for premetrics or geodesics, making the approach simpler and faster.
  • Demonstrates the framework's effectiveness through case studies on specific homogeneous spaces.
Read more
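The summary says the paper reduces flow matching on homogeneous spaces to the Euclidean case on Lie algebras. As a sketch of only that generic Euclidean ingredient (the standard straight-path conditional flow-matching target, not the paper's lifting construction; function name is hypothetical):

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Standard Euclidean conditional flow matching: along the straight
    path x_t = (1 - t) * x0 + t * x1, the regression target for the
    velocity field is the constant x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1   # point on the interpolation path
    v_t = x1 - x0                   # target velocity (independent of t)
    return x_t, v_t

# Usage: sample noise x0, data x1, and a random time t; the pair
# (x_t, v_t) is one regression example for a velocity network.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)         # base sample (e.g. Gaussian noise)
x1 = rng.standard_normal(4)         # data sample
t = rng.uniform()
x_t, v_t = flow_matching_pair(x0, x1, t)
```

Because the path is linear and the target constant, no premetrics or geodesic computations are needed, which matches the simplicity the summary highlights.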
Epistemic Compression: The Case for Deliberate Ignorance in High-Stakes AI
Steffen Lukas
Theory
  • Foundation models struggle in high-stakes environments due to the Fidelity Paradox, where they memorize noise instead of capturing useful signals.
  • Epistemic Compression emphasizes the importance of model architecture in enforcing parsimony to enhance robustness.
  • The Regime Index effectively differentiates between Shifting and Stable Regimes, guiding model complexity decisions.
  • High-capacity models can lead to overfitting in unstable environments, necessitating a focus on simpler, more robust models.
Read more
CVA: Context-aware Video-text Alignment for Video Temporal Grounding
Sungho Moon, Seunghun Lee, Jiwan Seo, Sunghoon Im
Computer Vision Multimodal
  • Introduction of Query-aware Context Diversification (QCD) to enhance data augmentation.
  • Development of Context-invariant Boundary Discrimination (CBD) loss for improved semantic consistency.
  • Design of Context-enhanced Transformer Encoder (CTE) for capturing multi-scale temporal context.
  • Achieves state-of-the-art performance on major Video Temporal Grounding benchmarks.
Read more
Offline Decision Transformers for Neural Combinatorial Optimization: Surpassing Heuristics on the Traveling Salesman Problem
Hironori Ohigashi, Shinichiro Hamada
Reinforcement Learning Optimization
  • Introduces a novel Decision Transformer framework for the Traveling Salesman Problem (TSP).
  • Demonstrates that conditioning on appropriate Return-to-Go (RTG) is crucial for surpassing heuristic performance.
  • Employs expectile regression to improve the quality of solutions in combinatorial optimization.
  • Shows that offline RL can effectively leverage heuristic datasets to generate superior solutions.
Read more
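The "Return-to-Go" conditioning mentioned above is a standard Decision Transformer quantity: the sum of rewards remaining in a trajectory. A generic illustration of how RTG sequences are computed from rewards (not the paper's TSP-specific setup; function name is hypothetical):

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at step t: the (discounted) sum of rewards from t
    to the end of the trajectory. A Decision Transformer conditions its
    action predictions on this value at every step."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# For a TSP-style trajectory, rewards might be negative edge lengths,
# so the RTG at step 0 is minus the remaining tour length.
rtg = returns_to_go(np.array([-2.0, -3.0, -1.0]))
# rtg == [-6.0, -4.0, -1.0]
```

At inference time, prompting the model with a more ambitious initial RTG than the dataset's behavior achieved is what lets such policies attempt to surpass the heuristics that generated the data.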
A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study
Yongda Fan, John Wu, Andrea Fitzpatrick, Naveen Baskaran, Jimeng Sun, Adam Cross
Time Series Interpretability
  • Attention mechanisms can effectively enhance interpretability in clinical predictive models.
  • Black-box interpreters like KernelSHAP and LIME are not suitable for time-series clinical prediction tasks.
  • Many interpretability approaches lack reliability and cannot be trusted for clinical applications.
  • The study provides guidelines for improving interpretability in clinical predictive workflows.
Read more
The Order Is The Message
Jordan LeDoux
Theory Efficient ML Interpretability
  • Example ordering in neural network training is a significant information channel, not just a nuisance variable.
  • Counterfactual gradient decomposition reveals that ordering contributes approximately 85% to the cumulative gradient norm.
  • Consistent ordering can enhance learning efficiency, achieving high accuracy with significantly less training data compared to IID shuffling.
  • The study highlights the importance of temporal structure in learning, which is often overlooked in traditional training methodologies.
Read more
Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML
Yassien Shaalan
Efficient ML Time Series Audio & Speech
  • HYPERTINYPW replaces stored pointwise (PW) convolution weights with generated weights to reduce memory usage.
  • The method maintains compatibility with standard integer operators, ensuring efficient inference.
  • Validation on ECG benchmarks shows significant compression without sacrificing performance.
  • The approach enforces a shared latent basis across layers, reducing redundancy.
Read more
A Unified Memory Perspective for Probabilistic Trustworthy AI
Xueji Zhao, Likai Pei, Jianbo Liu, Kai Ni, Ningyuan Cao
Theory Efficient ML
  • Introduces a unified probabilistic memory abstraction for analyzing deterministic and stochastic operations.
  • Identifies a scaling mismatch between compute throughput, memory bandwidth, and entropy generation, leading to 'entropy wall' issues.
  • Examines architectural trade-offs between conventional von Neumann systems and emerging probabilistic compute-in-memory approaches.
  • Defines memory-level evaluation criteria to assess the effectiveness of memory systems in probabilistic computation.
Read more
Uncertainty-Guided Label Rebalancing for CPS Safety Monitoring
John Ayotunde, Qinghua Xu, Guancheng Wang, Lionel C. Briand
Time Series
  • Introduces U-Balance, a novel approach for rebalancing imbalanced datasets in CPS safety monitoring.
  • Utilizes behavioral uncertainty as a key signal correlated with safety outcomes.
  • Demonstrates significant improvement in safety prediction performance over traditional methods.
  • Achieves a notable F1 score of 0.806 on a challenging UAV dataset.
Read more
Causal-INSIGHT: Probing Temporal Models to Extract Causal Structure
Benjamin Redden, Hui Wang, Shuyan Li
Time Series Interpretability Graph Learning
  • Causal-INSIGHT is a model-agnostic framework for extracting causal structures from temporal predictors.
  • The framework utilizes input clamping to analyze model responses and construct directed temporal influence signals.
  • Qbic, a new graph selection criterion, balances predictive accuracy and structural complexity.
  • Causal-INSIGHT shows competitive performance across various architectures and improves temporal delay localization.
Read more
Spatiotemporal System Forecasting with Irregular Time Steps via Masked Autoencoder
Kewei Zhu, Yanze Xin, Jinwei Hu, Xiaoyuan Cheng, Yiming Yang, Sibo Cheng
Time Series
  • Introduces the Physics-Spatiotemporal Masked Autoencoder (P-STMAE) for forecasting irregular time series.
  • Integrates convolutional autoencoders with masked autoencoders to handle missing data without imputation.
  • Achieves significant improvements in prediction accuracy and computational efficiency over traditional methods.
  • Demonstrates robustness to nonlinearities in high-dimensional dynamical systems.
Read more
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Selim An, Il hong Suh, Yeseong Kim
Large Language Models Efficient ML Optimization
  • GlowQ utilizes a group-shared low-rank approximation to enhance quantized LLMs.
  • The method reduces latency and memory overhead by caching a single shared right factor per input-sharing group.
  • GlowQ-S, a selective variant, further optimizes performance by applying corrections only where beneficial.
  • Empirical results show significant improvements in efficiency and accuracy over strong baselines.
Read more
A CDF-First Framework for Free-Form Density Estimation
Chenglong Song, Mazharul Islam, Lin Wang, Bing Chen, Bo Yang
Generative Models Theory
  • Introduces a CDF-first framework that reframes density estimation as learning a valid CDF, minimizing inductive bias.
  • Extends the framework to multivariate outputs using an autoregressive decomposition with SMM-based conditional CDFs.
  • Demonstrates superior performance in capturing multi-modality, skewness, and topological complexity compared to existing methods.
Read more
Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Li Qing, Ke Tang
Reinforcement Learning Large Language Models Efficient ML
  • HIVE framework improves prompt selection efficiency in RL training for LLMs.
  • Identifies the 'learning edge' where the most informative prompts reside.
  • Utilizes historical data and real-time entropy to select high-utility prompts.
  • Demonstrates significant reductions in computational overhead during training.
Read more
Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?
Sounak Dutta, Fin Amin, Sushil Panda, Jonathan Rabe, Yuejiang Wen, Paul Franzon
Optimization
  • Introduces an Actor-Critic Optimization Framework (ACOF) for analog design optimization.
  • Separates proposal and evaluation roles to enhance search efficiency and interpretability.
  • Achieves significant improvements in design metrics over existing optimization methods.
  • Maintains compatibility with standard simulation workflows.
Read more
Missing-Aware Multimodal Fusion for Unified Microservice Incident Management
Wenzhuo Qian, Hailiang Zhao, Ziqi Wang, Zhipeng Gao, Jiayi Chen, Zhiwei Ling, Shuiguang Deng
Multimodal
  • ARMOR effectively addresses the issue of missing modalities in multimodal data for incident management.
  • The framework employs a self-supervised approach, eliminating the need for extensive fault labels.
  • It features a modality-specific asymmetric encoder and a missing-aware gated fusion mechanism.
  • ARMOR demonstrates state-of-the-art performance in anomaly detection, failure triage, and root cause localization.
Read more
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim
NLP Large Language Models Reinforcement Learning
  • Multi-Answer RL enables language models to generate multiple plausible answers simultaneously.
  • The approach improves diversity and coverage of responses while providing calibrated uncertainty estimates.
  • Empirical results show over 50% improvement in accuracy on coding tasks with reduced token usage.
  • The method is applicable to various domains, including medical diagnosis and ambiguous question answering.
Read more
Energy-Efficient Hierarchical Federated Anomaly Detection for the Internet of Underwater Things via Selective Cooperative Aggregation
Kenechi Omeke, Michael Mollel, Lei Zhang, Qammer H. Abbasi, Muhammad Ali Imran
Federated Learning Efficient ML Time Series
  • Proposes a three-tier hierarchical federated learning framework for anomaly detection in IoUT.
  • Introduces feasibility-aware sensor-to-fog associations and compressed model-update transmissions.
  • Demonstrates that selective cooperative aggregation reduces energy consumption significantly while maintaining detection accuracy.
  • Evaluates the framework using a physics-grounded model to assess communication energy and network participation.
Read more
Amplified Patch-Level Differential Privacy for Free via Random Cropping
Kaan Durmaz, Jan Schuchardt, Sebastian Schmidt, Stephan Günnemann
Computer Vision Theory Efficient ML
  • Random cropping can enhance differential privacy by probabilistically excluding sensitive content from model inputs.
  • A new patch-level neighboring relation is introduced to better align privacy definitions with the structure of vision data.
  • The method does not require changes to existing training algorithms or additional computational resources.
  • Empirical results demonstrate improved privacy-utility trade-offs across multiple segmentation architectures and datasets.
Read more
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó
NLP Large Language Models Interpretability
  • Rare features survive pruning better than frequent features, indicating implicit feature selection.
  • Wanda pruning preserves feature structure up to 3.7 times better than magnitude pruning.
  • Pre-trained Sparse Autoencoders remain effective on Wanda-pruned models up to 50% sparsity.
  • Geometric feature survival does not correlate with causal importance in model outputs.
Read more
Learning Mesh-Free Discrete Differential Operators with Self-Supervised Graph Neural Networks
Lucas Gerken Starepravo, Georgios Fourtakas, Steven Lind, Ajay B. Harish, Tianning Tang, Jack R. C. King
Graph Learning Theory Efficient ML
  • Introduction of a self-supervised graph neural network framework for learning mesh-free differential operators.
  • Operators are learned based on polynomial consistency constraints, enhancing accuracy while maintaining computational efficiency.
  • Demonstrated improved accuracy over traditional SPH methods and favorable trade-offs in computational cost.
  • Framework is resolution-agnostic and applicable across various particle configurations and governing equations.
Read more