AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

62 Papers today
8h Update frequency
7 Days of history
Not a fragment, but the whole: Map-based evaluation of data-driven Fire Danger Index models
Shahbaz Alvi, Italo Epicoco, Jose Maria Costa Saura
Time Series
  • Proposes a novel evaluation method for Fire Danger Index models that incorporates real-world decision-making.
  • Highlights the critical importance of minimizing false positive rates in wildfire prediction models.
  • Demonstrates that ensemble machine learning models improve fire identification accuracy while reducing false alarms.
  • Addresses the limitations of traditional evaluation metrics in capturing operational performance.
Read more
LineMVGNN: Anti-Money Laundering with Line-Graph-Assisted Multi-View Graph Neural Networks
Chung-Hoo Poon, James Kwok, Calvin Chow, Jang-Hyeon Choi
Graph Learning
  • Introduction of LineMVGNN, a novel GNN model for AML detection.
  • Utilization of line graphs to enhance transaction information propagation.
  • Demonstrated superior performance over existing state-of-the-art methods.
  • Discussion on scalability and robustness of the proposed method.
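The line-graph idea behind LineMVGNN can be sketched in a few lines of plain Python (an illustration of the general construction, not the paper's implementation): each transaction edge becomes a node, and two transactions are linked when the first one's receiver is the second one's sender, so money-flow chains become paths a GNN can message-pass over.

```python
def line_graph(edges):
    """Directed line graph of a transaction graph.
    edges: list of (sender, receiver) transactions."""
    by_sender = {}
    for e in edges:
        by_sender.setdefault(e[0], []).append(e)
    # Connect transaction (u, v) to every follow-on transaction (v, w).
    return {e: list(by_sender.get(e[1], [])) for e in edges}

# A -> B -> C chain plus an unrelated transfer D -> E.
adj = line_graph([("A", "B"), ("B", "C"), ("D", "E")])
```

In `adj`, the A→B transaction is adjacent to B→C (a potential layering chain), while D→E stays isolated.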
Read more
Language-Assisted Image Clustering Guided by Discriminative Relational Signals and Adaptive Semantic Centers
Jun Ma, Xu Zhang, Zhengxing Jiao, Yaxin Hou, Hui Liu, Junhui Hou, Yuheng Jia
Computer Vision NLP Multimodal
  • Introduces a new LAIC framework that enhances clustering performance by addressing limitations in existing methods.
  • Utilizes cross-modal relations to create more discriminative self-supervision signals for clustering.
  • Implements learnable category-wise semantic centers through prompt learning for improved clustering assignments.
  • Achieves an average performance improvement of 2.6% over state-of-the-art methods across multiple datasets.
Read more
How Class Ontology and Data Scale Affect Audio Transfer Learning
Manuel Milling, Andreas Triantafyllopoulos, Alexander Gebhard, Simon Rampp, Björn W. Schuller
Audio & Speech
  • Transfer learning benefits from both the scale of pre-training data and the similarity to downstream tasks.
  • The study highlights the importance of task similarity over mere data quantity in achieving better transfer performance.
  • The findings challenge existing assumptions about the optimal characteristics of pre-training datasets for audio tasks.
  • The research provides a systematic approach to evaluate the impact of class ontology on audio transfer learning.
Read more
Experiential Reflective Learning for Self-Improving LLM Agents
Marc-Antoine Allard, Arnaud Teinturier, Victor Xing, Gautier Viaud
Large Language Models NLP Reinforcement Learning
  • Introduces Experiential Reflective Learning (ERL) for LLM agents to improve adaptability.
  • ERL generates heuristics from past experiences to guide future task execution.
  • Achieves a 7.8% improvement in success rates on the Gaia2 benchmark compared to existing methods.
  • Highlights the importance of selective heuristic retrieval for performance enhancement.
Read more
Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards
Cheng Jiayang, Xin Liu, Zhihan Zhang, Haoyang Wen, Zixuan Zhang, Qingyu Yin, Shiyang Li, Priyanka Nigam, Bing Yin, Chao Zhang, Yangqiu Song
Large Language Models Reinforcement Learning
  • Introduces a framework for training LLMs in multi-step tool orchestration using real API responses.
  • Develops a graduated reward system that provides fine-grained feedback on correctness.
  • Demonstrates substantial improvements in model performance on ComplexFuncBench.
  • Highlights the importance of addressing dependency handling in multi-step workflows.
Read more
Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
Haishan Ye
Optimization Theory
  • Introduces the first high-probability regret bound for two-point feedback in OCO with strongly convex losses.
  • Achieves a regret bound of O(d(log T + log(1/δ))/µ), which is minimax optimal.
  • Implements a novel analytical framework that improves robustness against variance in estimators.
  • Reduces the dimension dependency from O(dΒ²) in prior work to O(d).
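The two-point feedback scheme underlying such bounds queries the loss at x+δu and x−δu for a random unit direction u. A minimal numpy sketch of the standard estimator (the step size, averaging, and test function are illustrative choices, not the paper's algorithm):

```python
import numpy as np

def two_point_gradient(f, x, delta=1e-3, rng=None):
    """One two-point gradient estimate:
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    with u drawn uniformly from the unit sphere."""
    rng = rng or np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Averaging many estimates recovers the true gradient of
# f(x) = ||x||^2 / 2, which is x itself.
rng = np.random.default_rng(0)
f = lambda v: 0.5 * v @ v
x = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
est = np.mean([two_point_gradient(f, x, rng=rng) for _ in range(20000)], axis=0)
```

Each single estimate is unbiased but high-variance; the high-probability analysis in the paper controls exactly this variance without averaging.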
Read more
Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
Anand Swaroop
Theory Interpretability
  • Grokking is characterized by a delayed rise in validation accuracy after memorization in neural networks.
  • ReLU MLPs learn near-binary square wave input weights, differing from the sinusoidal weights found in transformers.
  • The study establishes a phase-sum relation among output weights, which holds even under noisy training conditions.
  • An idealized MLP model based on extracted weight characteristics achieves high accuracy, indicating the latent algorithmic structure is refined during grokking.
Read more
Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML
Yassien Shaalan
Efficient ML Time Series Audio & Speech
  • HYPERTINYPW reduces memory usage by generating most PW weights at load time using a shared micro-MLP.
  • The method retains the first PW layer in INT8 format to stabilize early mixing processes.
  • Achieves a 6.31x reduction in model size while retaining over 95% of the original model's performance on ECG tasks.
  • Compatible with standard integer operations, ensuring ease of integration with existing TinyML frameworks.
Read more
Contrastive Learning Boosts Deterministic and Generative Models for Weather Data
Nathan Bailey
Time Series Generative Models Graph Learning
  • Contrastive learning effectively compresses high-dimensional weather data into low-dimensional embeddings.
  • The SPARTA method aligns sparse and complete data samples, enhancing the robustness of embeddings.
  • Incorporating temporal awareness and cycle-consistency loss improves the latent space structure.
  • The proposed graph neural network fusion technique enhances the contrastive learning approach by integrating physical knowledge.
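The contrastive objective that such embedding methods typically build on is an InfoNCE loss over paired views. A numpy sketch of the generic loss (not the SPARTA code; the orthogonal toy embeddings are a contrived example):

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss: row i of z1 should match row i of z2 and repel
    every other row. z1, z2: (n, d) L2-normalized embeddings."""
    logits = (z1 @ z2.T) / tau                    # similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Views that agree pairwise give a low loss; mismatched pairings a high one.
z = np.eye(4)
aligned = info_nce(z, z)
shuffled = info_nce(z, z[[1, 2, 3, 0]])
```

Aligning sparse and complete samples of the same weather state, as SPARTA does, amounts to making them positives in this kind of objective.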
Read more
Boost Like a (Var)Pro: Trust-Region Gradient Boosting via Variable Projection
Abhijit Chowdhary, Elizabeth Newman, Deepanshu Verma
Optimization Theory Efficient ML
  • VPBoost bridges the gap between function space and parameter space in gradient boosting.
  • The algorithm operates as a trust-region method, enhancing adaptivity and reducing hyperparameter tuning.
  • Convergence guarantees are established under new subspace regularity conditions.
  • Empirical results show VPBoost consistently outperforms existing boosting methods across multiple tasks.
Read more
Flow matching on homogeneous spaces
Francesco Ruscelli
Generative Models Theory Efficient ML
  • Introduces a framework for flow matching on homogeneous spaces, simplifying the geometry involved.
  • Reformulates the flow matching problem to operate directly on Lie groups, avoiding complex computations.
  • Eliminates the need for geodesics or premetrics, leading to a more efficient and intrinsic approach.
  • Demonstrates the framework's applicability through case studies on specific homogeneous spaces.
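For reference, the standard Euclidean conditional flow-matching objective with a linear interpolation path, which frameworks like this generalize beyond Euclidean space (this is the textbook form, not the paper's homogeneous-space formulation):

```latex
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim p_0,\; x_1 \sim p_1}
    \big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2,
\qquad x_t = (1 - t)\, x_0 + t\, x_1 .
```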
Read more
Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters
Nan Cui, Wendy Hui Wang, Yue Ning
NLP Large Language Models Efficient ML
  • Proposes a lightweight bias mitigation method for LLM-based recommendations.
  • Combines kernelized Iterative Null-space Projection with a gated Mixture-of-Experts adapter.
  • Achieves bias removal without additional trainable parameters, ensuring computational efficiency.
  • Demonstrates significant fairness improvements while maintaining recommendation quality.
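The kernelized Iterative Null-space Projection the paper combines with its adapter builds on linear INLP. A minimal linear sketch, with the bias direction taken as the difference of group means (a simplification of the trained-classifier version used in practice):

```python
import numpy as np

def inlp_step(X, w):
    """Project embeddings X onto the null space of bias direction w:
    x <- x - (x . w) w, removing the component predictive of the attribute."""
    w = w / np.linalg.norm(w)
    return X - np.outer(X @ w, w)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
groups = rng.integers(0, 2, 100)
X[groups == 1] += 2.0                                 # inject a group shift
w = X[groups == 1].mean(0) - X[groups == 0].mean(0)   # bias direction
X_clean = inlp_step(X, w)
```

After the projection no linear probe along `w` can separate the groups, which is the sense in which bias is "removed without additional trainable parameters".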
Read more
GraphER: An Efficient Graph-Based Enrichment and Reranking Method for Retrieval-Augmented Generation
Ruizhong Miao, Yuying Wang, Rongguang Wang, Chenyang Li, Tao Sheng, Sujith Ravi, Dan Roth
NLP Graph Learning Efficient ML
  • GraphER enhances retrieval-augmented generation by leveraging graph-based enrichment and reranking.
  • The method operates independently of knowledge graphs, allowing for efficient integration with existing vector stores.
  • GraphER captures multiple forms of proximity beyond semantic similarity, improving retrieval completeness.
  • The approach is retriever-agnostic and introduces negligible latency overhead.
Read more
Process-Aware AI for Rainfall-Runoff Modeling: A Mass-Conserving Neural Framework with Hydrological Process Constraints
Mohammad A. Farmani, Hoshin V. Gupta, Ali Behrangi, Muhammad Jawad, Sadaf Moghisi, Guo-Yue Niu
Time Series Interpretability
  • Embedding hydrological process constraints within a mass-conserving AI framework enhances rainfall-runoff prediction interpretability.
  • Incorporating vertical drainage improves performance in arid and snow-dominated basins but may reduce skill in rainfall-dominated areas.
  • Process-aware AI models can achieve deep-learning predictive skill while retaining physically interpretable storage-flux dynamics.
Read more
Vision Hopfield Memory Networks
Jianfeng Wang, Amine M'Charrak, Luk Koska, Xiangtao Wang, Daniel Petriceanu, Mykyta Smyrnov, Ruizhi Wang, Michael Bumbar, Luca Pinchetti, Thomas Lukasiewicz
Computer Vision Multimodal Interpretability
  • V-HMN integrates hierarchical memory mechanisms for improved interpretability and data efficiency.
  • The model employs local and global Hopfield modules for associative memory and contextual modulation.
  • Iterative refinement updates enhance the model's error correction capabilities.
  • V-HMN achieves competitive results on computer vision benchmarks while being more interpretable and data-efficient.
Read more
Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Li Qing, Ke Tang
Reinforcement Learning Large Language Models Efficient ML
  • HIVE framework improves efficiency in RL training for LLMs by selecting high-utility prompts.
  • The concept of 'learning edge' is introduced, highlighting the dynamic nature of sample utility.
  • HIVE reduces computational overhead significantly, achieving up to 3.8× speedup in rollout and 2.2× faster training time.
  • Real-time verification using prompt entropy helps mitigate metadata staleness in prompt selection.
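The "learning edge" intuition can be illustrated with Bernoulli entropy over each prompt's recent pass rate: prompts the model passes about half the time are maximally informative, while solved or hopeless prompts carry little signal. A sketch of entropy-based selection (not HIVE's implementation; the prompt IDs and rates are made up):

```python
import math

def bernoulli_entropy(p, eps=1e-12):
    """Entropy of a pass/fail outcome with success probability p."""
    p = min(max(p, eps), 1 - eps)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_prompts(success_rates, k):
    """Keep the k prompts whose pass rate is most uncertain (near 0.5)."""
    ranked = sorted(success_rates,
                    key=lambda pid: -bernoulli_entropy(success_rates[pid]))
    return ranked[:k]

rates = {"p1": 0.02, "p2": 0.55, "p3": 0.97, "p4": 0.40}
picked = select_prompts(rates, 2)
```

Because pass rates drift as the policy trains, these scores go stale, which is the staleness problem the framework's real-time verification addresses.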
Read more
A Theory of LLM Information Susceptibility
Zhuo-Yang Song, Hua Xing Zhu
Theory Large Language Models Optimization
  • Introduces the concept of LLM information susceptibility and its implications for optimization in agentic systems.
  • Develops a multi-variable utility-function framework to generalize the hypothesis across various architectures.
  • Empirical validation shows that fixed LLM configurations do not improve performance susceptibility in large-budget scenarios.
  • Demonstrates that nested architectures can provide additional response channels for optimization.
Read more
From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents
Shuoling Liu, Zhiquan Tan, Kun Yi, Hui Wu, Yihan Li, Jiangpeng Yan, Liyuan Chen, Kai Chen, Qiang Yang
Theory
  • Introduces a categorical framework for evaluating Deep Research Agents (DRAs).
  • Develops a novel benchmark with 296 questions to rigorously test agent capabilities.
  • Finds that state-of-the-art models achieve only 19.9% average accuracy, revealing evaluation challenges.
  • Identifies a dichotomy in AI capabilities, with strengths in certain areas but weaknesses in multi-hop synthesis.
Read more
DyMRL: Dynamic Multispace Representation Learning for Multimodal Event Forecasting in Knowledge Graph
Feng Zhao, Kangzheng Liu, Teng Peng, Yu Yang, Guandong Xu
Multimodal Graph Learning Time Series
  • DyMRL integrates multiple geometric spaces for dynamic structural modality learning.
  • The approach incorporates dual fusion-evolution attention mechanisms for effective multimodal feature fusion.
  • Extensive experiments show that DyMRL outperforms existing methods in event forecasting.
  • The method reflects human-like cognitive processes in associative thinking and logical reasoning.
Read more
Amplified Patch-Level Differential Privacy for Free via Random Cropping
Kaan Durmaz, Jan Schuchardt, Sebastian Schmidt, Stephan Günnemann
Computer Vision Theory Efficient ML
  • Random cropping can amplify differential privacy in machine learning models without requiring changes to the training process.
  • A new patch-level neighboring relation is introduced, allowing for more precise privacy accounting in vision tasks.
  • The method enhances the privacy-utility trade-off, demonstrating improved performance in semantic segmentation tasks.
  • The approach is computationally efficient, requiring no additional overhead.
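For intuition, the classical amplification-by-subsampling bound shows how sampling only part of each input can tighten a privacy guarantee (the paper's patch-level accounting is finer-grained than this; the 10% crop probability below is a made-up illustration):

```python
import math

def amplified_epsilon(eps, q):
    """Classical privacy amplification by subsampling: if a mechanism is
    eps-DP and each record is included with probability q, the subsampled
    mechanism satisfies log(1 + q*(e^eps - 1))-DP."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

full = amplified_epsilon(2.0, 1.0)   # q = 1: guarantee unchanged
crop = amplified_epsilon(2.0, 0.1)   # e.g. a crop keeping 10% of patches
```

The smaller the inclusion probability, the stronger the amplified guarantee, which is why random cropping can yield privacy "for free".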
Read more
Marchuk: Efficient Global Weather Forecasting from Mid-Range to Sub-Seasonal Scales via Flow Matching
Arsen Kuzhamuratov, Mikhail Zhirnov, Andrey Kuznetsov, Ivan Oseledets, Konstantin Sobolev
Generative Models Time Series Efficient ML
  • Marchuk is a generative latent flow-matching model for weather forecasting.
  • It effectively predicts weather up to 30 days ahead, addressing the limitations of traditional models.
  • The model uses trainable positional embeddings and extended context windows to improve long-range forecasting.
  • Marchuk maintains high predictive performance with significantly fewer parameters than existing models.
Read more
Dual-Criterion Curriculum Learning: Application to Temporal Data
Gaspard Abel, Eloi Campagne, Mohamed Benloughmari, Argyris Kalogeratos
Time Series
  • Introduction of Dual-Criterion Curriculum Learning (DCCL) framework combining loss-based and density-based difficulty assessments.
  • DCCL addresses the limitations of traditional difficulty measures in Curriculum Learning.
  • Empirical evaluations show significant improvements in time-series forecasting tasks using DCCL.
  • The framework is modular and applicable to a wide range of data types beyond temporal data.
Read more
Missing-Aware Multimodal Fusion for Unified Microservice Incident Management
Wenzhuo Qian, Hailiang Zhao, Ziqi Wang, Zhipeng Gao, Jiayi Chen, Zhiwei Ling, Shuiguang Deng
Multimodal
  • Introduces ARMOR, a self-supervised framework for incident management in microservices.
  • Addresses the issue of missing modalities in multimodal data, which is common in real-world applications.
  • Utilizes a modality-specific asymmetric encoder and a missing-aware gated fusion mechanism.
  • Achieves state-of-the-art performance in anomaly detection, failure triage, and root cause localization.
Read more
Light Cones For Vision: Simple Causal Priors For Visual Hierarchy
Manglam Kartik, Neel Tushar Shah
Computer Vision Theory
  • Introduction of Worldline Slot Attention for modeling visual hierarchies.
  • Demonstration that Lorentzian geometry outperforms Euclidean and hyperbolic embeddings.
  • Establishment of the necessity of geometric structure for effective hierarchical object discovery.
  • Lightweight architecture with only 11K parameters achieving significant performance improvements.
Read more
Wireless communication empowers online scheduling of partially-observable transportation multi-robot systems in a smart factory
Yaxin Liao, Qimei Cui, Kwang-Cheng Chen, Xiong Li, Jinlian Chen, Xiyu Zhao, Xiaofeng Tao, Ping Zhang
Robotics Optimization
  • Proposes a communication-enabled framework for online scheduling in T-MRS.
  • Integrates M2M communication with route scheduling to enhance AGV coordination.
  • Utilizes simulated annealing and congestion-aware A* methods for task assignment and routing.
  • Demonstrates significant improvements in scheduling efficiency under high AGV loads.
Read more
BXRL: Behavior-Explainable Reinforcement Learning
Ram Rachum, Yotam Amitai, Yonatan Nakar, Reuth Mirsky, Cameron Allen
Reinforcement Learning Interpretability
  • Introduces BXRL, a framework for explaining behaviors in RL as first-class objects.
  • Defines behavior quantitatively, allowing for targeted explanations of agent actions.
  • Analyzes and adapts existing explainability methods for behavior measures.
  • Presents HighJax, a new environment for defining and measuring behaviors in RL.
Read more
Can we generate portable representations for clinical time series data using LLMs?
Zongliang Ji, Yifei Sun, Andre Amaral, Anna Goldenberg, Rahul G. Krishnan
NLP Large Language Models Time Series
  • Introduces a novel 'summarize-then-embed' pipeline for creating portable patient embeddings using LLMs.
  • Demonstrates competitive performance across multiple clinical tasks and cohorts, with reduced performance drops in new hospital settings.
  • Highlights the significance of structured prompts in minimizing variance in predictive models.
  • Shows that the proposed method improves few-shot learning without increasing privacy risks related to demographic information.
Read more
Attack Assessment and Augmented Identity Recognition for Human Skeleton Data
Joseph G. Zalameda, Megan A. Witherow, Alexander M. Glandon, Jose Aguilera, Khan M. Iftekharuddin
Generative Models Computer Vision Theory
  • Introduction of Attack-AAIRS framework to enhance model robustness against adversarial attacks.
  • Utilization of GAN to generate synthetic adversarial samples for training.
  • Demonstrated significant improvement in robustness against various adversarial attack methods.
  • Maintained consistent accuracy on real data despite the introduction of adversarial training.
Read more
Interpretable PM2.5 Forecasting for Urban Air Quality: A Comparative Study of Operational Time-Series Models
Moazzam Umer Gondal, Hamad ul Qudous, Asma Ahmad Farhan, Sultan Alamri
Time Series Interpretability
  • Lightweight forecasting models can achieve competitive performance compared to complex models.
  • Facebook Prophet demonstrated the best predictive accuracy and efficiency in the study.
  • Residual correction significantly improved model robustness and reduced operational costs.
  • The study emphasizes the importance of interpretability in forecasting models for public health applications.
Read more
Manifold Generalization Provably Precedes Memorization in Diffusion Models
Zebang Shen, Ya-Ping Hsieh, Niao He
Generative Models Theory
  • Diffusion models can generate high-quality samples without memorizing training data.
  • Generalization is achieved through capturing the geometry of the data manifold rather than full density estimation.
  • Coarse score accuracy in diffusion models allows for faster convergence to a target distribution.
  • The study introduces a coverage criterion for evaluating the performance of diffusion models on manifolds.
Read more
Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?
Sounak Dutta, Fin Amin, Sushil Panda, Jonathan Rabe, Yuejiang Wen, Paul Franzon
Optimization
  • Introduction of an Actor-Critic framework for analog design optimization.
  • Separation of proposal and evaluation roles enhances search efficiency.
  • ACOF improves top-10 figure of merit by 38.9% over existing methods.
  • Reduces regret by an average of 24.7%, with peak improvements of 70.5%.
Read more
Towards Safe Learning-Based Non-Linear Model Predictive Control through Recurrent Neural Network Modeling
Mihaela-Larisa Clement, Mónika Farsang, Agnes Poks, Johannes Edelmann, Manfred Plöchl, Radu Grosu, Ezio Bartocci
Optimization Robotics Reinforcement Learning
  • Introduction of Sequential-AMPC, a recurrent neural policy for NMPC.
  • Significant reduction in expert MPC rollouts required for training.
  • Improved feasibility rates and closed-loop safety compared to traditional methods.
  • Better learning dynamics and performance in high-dimensional systems.
Read more
How unconstrained machine-learning models learn physical symmetries
Michelangelo Domina, Joseph William Abbott, Paolo Pegolo, Filippo Bigi, Michele Ceriotti
Theory Graph Learning Efficient ML
  • Unconstrained ML models can learn physical symmetries effectively through data augmentation.
  • The paper introduces new metrics to assess the symmetry content and equivariance of model outputs.
  • Analysis of two transformer-based models reveals insights into how symmetry information is processed.
  • Strategic injection of inductive biases can enhance model performance without sacrificing expressivity.
Read more
Self Paced Gaussian Contextual Reinforcement Learning
Mohsen Sahraei Ardakani, Rui Song
Reinforcement Learning Optimization Theory
  • SPGL avoids costly numerical optimizations by using a closed-form update for Gaussian contexts.
  • The method maintains sample efficiency and adaptability while reducing computational overhead.
  • SPGL shows improved performance on benchmark tasks compared to existing curriculum methods.
  • Theoretical guarantees on convergence are provided, enhancing the method's reliability.
Read more
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction
Haoyu Wang, Yuxin Chen, Liang Luo, Buyun Zhang, Ellie Dingqiao Wen, Pan Li
Reinforcement Learning Large Language Models Optimization
  • ITPO leverages implicit process rewards to derive turn-wise rewards, improving robustness and training stability.
  • The method outperforms existing reinforcement learning baselines in multi-turn collaborative tasks.
  • ITPO integrates seamlessly with various advantage functions, enhancing policy optimization.
  • Empirical analysis shows that ITPO's turn-wise preferences align closely with human judgment.
Read more
A Unified Memory Perspective for Probabilistic Trustworthy AI
Xueji Zhao, Likai Pei, Jianbo Liu, Kai Ni, Ningyuan Cao
Theory Efficient ML
  • Introduces a unified probabilistic memory abstraction for analyzing deterministic and stochastic operations.
  • Identifies a scaling mismatch between compute throughput, memory bandwidth, and entropy generation.
  • Examines architectural trade-offs between conventional von Neumann systems and emerging probabilistic compute-in-memory approaches.
  • Outlines pathways for scalable hardware solutions to meet the demands of probabilistic computation.
Read more
A Systematic Empirical Study of Grokking: Depth, Architecture, Activation, and Regularization
Shalima Binta Manir, Anamika Paul Rupa
Optimization Theory
  • Depth requires stabilization for effective grokking; depth-4 MLPs fail while depth-8 residual networks succeed.
  • The performance gap between Transformers and MLPs is largely due to optimization and regularization confounds.
  • Activation function effects are dependent on the regularization regime, with GELU outperforming ReLU under specific conditions.
  • Weight decay is crucial for grokking, with a narrow optimal range necessary for effective generalization.
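The weight-decay dependence comes through the decoupled update that AdamW-style optimizers apply: weights are shrunk directly each step rather than via an L2 term folded into the gradient. A plain-SGD sketch of the decoupled rule (illustrative, not the study's training code; hyperparameters are arbitrary):

```python
import numpy as np

def sgd_decoupled_wd(w, grad, lr=0.1, wd=1e-2):
    """Decoupled weight decay (AdamW-style): take the gradient step,
    then shrink the weights by lr * wd independently of the loss."""
    return w - lr * grad - lr * wd * w

w = np.ones(4)
for _ in range(100):
    # With zero task gradient the weights decay geometrically by (1 - lr*wd).
    w = sgd_decoupled_wd(w, grad=np.zeros(4))
```

This steady shrinkage is what eventually forces the network off the memorizing solution, which is why the study finds a narrow optimal range for `wd`.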
Read more
Spatiotemporal System Forecasting with Irregular Time Steps via Masked Autoencoder
Kewei Zhu, Yanze Xin, Jinwei Hu, Xiaoyuan Cheng, Yiming Yang, Sibo Cheng
Time Series
  • Introduces a novel Physics-Spatiotemporal Masked Autoencoder (P-STMAE) for forecasting irregular time series.
  • Integrates convolutional autoencoders with masked autoencoders to handle missing data without imputation.
  • Achieves significant improvements in prediction accuracy and computational efficiency over traditional methods.
  • Demonstrates robustness to nonlinearities in high-dimensional dynamical systems.
Read more
Kirchhoff-Inspired Neural Networks for Evolving High-Order Perception
Tongfei Chen, Jingying Yang, Linlin Yang, Jinhu Lü, David Doermann, Chunyu Xie, Long He, Tian Wang, Juan Zhang, Guodong Guo, Baochang Zhang
Theory Time Series Computer Vision
  • Introduction of Kirchhoff-Inspired Neural Network (KINN) for modeling higher-order state evolution.
  • KINN utilizes Kirchhoff's current law to derive stable state updates from ordinary differential equations.
  • The architecture allows for explicit encoding of higher-order evolutionary components within a single layer.
  • Extensive experiments show KINN outperforms existing methods in PDE solving and image classification tasks.
Read more
The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation
Mingyi Liu
NLP Large Language Models Theory
  • Aligned language models exhibit significant response homogenization, reducing response diversity.
  • Sampling-based uncertainty estimation methods fail on homogenized responses, while free token entropy retains some effectiveness.
  • The alignment tax is task-dependent, with varying performance across different types of questions.
  • A novel cascade architecture for uncertainty estimation improves accuracy significantly.
Read more
DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving
Pengxuan Yang, Yupeng Zheng, Deheng Qian, Zebin Xing, Qichao Zhang, Linbo Wang, Yichen Zhang, Shaoyu Guo, Zhongpu Xia, Qiang Chen, Junyu Han, Lingyun Xu, Yifeng Pan, Dongbin Zhao
Reinforcement Learning Robotics Efficient ML
  • DreamerAD compresses diffusion sampling from 100 steps to 1, achieving 80× speedup.
  • The framework maintains visual interpretability while enhancing RL efficiency.
  • Introduces shortcut forcing, autoregressive dense reward modeling, and Gaussian vocabulary sampling.
  • Achieves state-of-the-art performance on the NavSim v2 benchmark with 87.7 EPDMS.
Read more
Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization
Kalle Kujanpää, Yuying Zhu, Kristina Klinkner, Shervin Malmasi
Reinforcement Learning Large Language Models Optimization
  • Development of a Transformer-GNN architecture for offline RL leading to a 2.4% throughput improvement.
  • LLMs require significant task-specific adaptation; prompting alone is inadequate.
  • Supervised fine-tuning and preference optimization enable LLMs to match historical performance.
  • The framework allows for future integration of real manager feedback into the decision-making process.
Read more
Epistemic Compression: The Case for Deliberate Ignorance in High-Stakes AI
Steffen Lukas
Theory
  • High-capacity models often fail in high-stakes environments due to overfitting and noise memorization.
  • Epistemic Compression advocates for model simplicity aligned with data relevance rather than increased complexity.
  • The Regime Index effectively categorizes environments to guide modeling strategies.
  • In an analysis of 15 domains, the proposed index matched superior modeling strategies in 86.7% of cases.
Read more
Social Hippocampus Memory Learning
Liping Yi, Zhiming Zhao, Qinghua Hu
Federated Learning
  • SoHip introduces a memory-centric approach to social machine learning, focusing on memory sharing for collaboration.
  • The framework preserves privacy by keeping raw data and local model parameters on-device.
  • Theoretical guarantees on convergence and privacy preservation are established.
  • Experimental results show SoHip achieves up to 8.78% accuracy improvements over existing methods.
Read more
Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback
Jungtaek Kim, Thomas Zeng, Ziqian Lin, Minjae Lee, Chungpa Lee, Jy-yong Sohn, Hyung Il Koo, Kangwook Lee
Large Language Models Reinforcement Learning Theory
  • Introduces a controlled framework for evaluating LLMs' search capabilities.
  • Demonstrates that Transformers can represent and approximate various search strategies.
  • Finds that existing LLMs underperform compared to traditional search algorithms.
  • Shows that targeted training for search tasks improves LLM performance significantly.
Read more
Steering Code LLMs with Activation Directions for Language and Library Control
Md Mahbubur Rahman, Arjun Guha, Harshitha Menon
Large Language Models
  • Code LLMs exhibit strong implicit preferences for specific programming languages and libraries.
  • Layer-wise activation directions can be used to steer model outputs effectively.
  • Interventions remain effective even against conflicting prompts, although strength varies by model and target.
  • Overly strong steering interventions can degrade the quality of generated outputs.
Read more
Causal-INSIGHT: Probing Temporal Models to Extract Causal Structure
Benjamin Redden, Hui Wang, Shuyan Li
Time Series Interpretability Graph Learning
  • Causal-INSIGHT provides a novel approach to interpret temporal models by focusing on their response to input clamping.
  • The framework constructs directed temporal graphs using a new sparsity-aware criterion, Qbic, enhancing interpretability without needing ground-truth data.
  • Causal-INSIGHT is model-agnostic, allowing it to be applied uniformly across various temporal predictor architectures.
  • Experiments show significant improvements in temporal delay localization and competitive structural accuracy compared to existing methods.
Read more
Deep Convolutional Neural Networks for predicting highest priority functional group in organic molecules
Kunal Khatri, Vineet Mehta, Manish Narwaria, Bhaskar Chaudhary
Computer Vision
  • Introduction of a CNN model for predicting the highest priority functional group in organic molecules.
  • Utilization of a large dataset of FTIR spectra for training the model.
  • Demonstration of CNN's superiority over traditional ML methods like SVM in this context.
  • Detailed methodology for data preparation and model training.
Read more
Unveiling Hidden Convexity in Deep Learning: a Sparse Signal Processing Perspective
Emi Zeger, Mert Pilanci
Theory Optimization Interpretability
  • Establishes a connection between ReLU neural networks and sparse signal processing models.
  • Reveals hidden convexities in the loss landscapes of certain neural network architectures.
  • Proposes a reformulation of neural network training as a convex optimization problem.
  • Demonstrates improved interpretability and robustness in neural network training.
Read more
Energy-Efficient Hierarchical Federated Anomaly Detection for the Internet of Underwater Things via Selective Cooperative Aggregation
Kenechi Omeke, Michael Mollel, Lei Zhang, Qammer H. Abbasi, Muhammad Ali Imran
Federated Learning Efficient ML Time Series
  • Proposes a three-tier hierarchical federated learning framework for anomaly detection in IoUT.
  • Introduces feasibility-aware sensor-to-fog association and selective cooperative aggregation to enhance energy efficiency.
  • Demonstrates significant energy savings while maintaining detection accuracy in underwater environments.
  • Evaluates the framework using a physics-grounded model to realistically assess communication costs and participation.
Read more
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó
NLP Large Language Models Interpretability
  • Rare features survive pruning better than frequent features, indicating implicit feature selection.
  • Wanda pruning preserves feature structure up to 3.7 times better than magnitude pruning.
  • Pre-trained Sparse Autoencoders remain effective on Wanda-pruned models up to 50% sparsity.
  • Geometric feature survival does not predict causal importance, challenging assumptions in interpretability.
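The two pruning criteria being compared differ in one factor: Wanda scores each weight as |W_ij| · ||X_j||₂ (weight magnitude times the norm of its input feature's activations), while magnitude pruning uses |W_ij| alone. A numpy sketch with a contrived two-weight example:

```python
import numpy as np

def prune_mask(W, X=None, sparsity=0.5):
    """Per-row pruning mask. With calibration activations
    X (n_samples, in_features), score = |W| * ||X_j|| (Wanda);
    otherwise score = |W| (magnitude pruning)."""
    score = np.abs(W) * (np.linalg.norm(X, axis=0) if X is not None else 1.0)
    k = int(W.shape[1] * sparsity)
    cutoff = np.sort(score, axis=1)[:, k - 1:k]   # k smallest per row pruned
    return score > cutoff

W = np.array([[0.1, 1.0]])      # tiny weight vs. large weight
X = np.array([[100.0, 1.0]])    # but the tiny weight sees huge activations
wanda_mask = prune_mask(W, X)   # keeps the 0.1 weight (score 10 vs 1)
mag_mask = prune_mask(W)        # keeps the 1.0 weight
```

A small weight on a high-activation feature survives Wanda but not magnitude pruning, which is one plausible reading of why Wanda preserves feature structure better in the paper's SAE analysis.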
Read more
Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela
Optimization Large Language Models
  • Classical HPO methods outperform LLM-based agents in fixed hyperparameter search spaces.
  • An LLM agent that edits training code can significantly narrow the performance gap with classical methods.
  • The hybrid method 'Centaur', which combines CMA-ES with LLMs, achieves the best results in the study.
  • Reliability in optimization methods is more critical than search diversity.
Read more
CVA: Context-aware Video-text Alignment for Video Temporal Grounding
Sungho Moon, Seunghun Lee, Jiwan Seo, Sunghoon Im
Computer Vision Multimodal
  • Introduction of Query-aware Context Diversification (QCD) to enhance data augmentation while preventing false negatives.
  • Development of Context-invariant Boundary Discrimination (CBD) loss to ensure semantic consistency at temporal boundaries.
  • Design of Context-enhanced Transformer Encoder (CTE) for effective multi-scale temporal context modeling.
  • Achievement of state-of-the-art performance on Video Moment Retrieval and Highlight Detection benchmarks.
Read more
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao
NLP Large Language Models Optimization
  • Token-level OPD is biased compared to sequence-level OPD but has lower variance in long-horizon settings.
  • Identified failure modes of sampled-token OPD include imbalanced signals and unreliable teacher guidance.
  • Proposed teacher top-K local support matching improves optimization stability and performance.
  • Empirical results show better performance in math reasoning and multi-task training with the new method.
Read more
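The proposed top-K local support matching can be pictured as a KL divergence restricted to the teacher's K most probable tokens, with both distributions renormalized over that local support. The renormalization and direction of the KL below are assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def topk_support_kl(teacher_logits, student_logits, k):
    """KL(teacher || student) over the teacher's top-k tokens only, so the
    student is matched where the teacher's guidance is most reliable
    (a sketch of 'top-K local support matching')."""
    idx = np.argsort(teacher_logits)[-k:]
    p = softmax(teacher_logits[idx])
    q = softmax(student_logits[idx])
    return float(np.sum(p * (np.log(p) - np.log(q))))

t = np.array([2.0, 1.0, 0.5, -1.0])
s = np.array([0.0, 1.0, 0.5, -1.0])
```

Restricting the loss to the teacher's high-probability support avoids penalizing the student on low-probability tokens where sampled-token signals are noisy.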
AI Generalisation Gap In Comorbid Sleep Disorder Staging
Saswata Bose, Suvadeep Maiti, Shivam Kumar Sharma, Mythirayee S, Tapabrata Chakraborti, Srijitesh Rajendran, Raju S. Bapi
Time Series Interpretability
  • Introduction of iSLEEPS, a new dataset for ischemic stroke patients with sleep disorders.
  • Demonstration of poor generalization of deep learning models trained on healthy data to clinical populations.
  • Use of Grad-CAM for model interpretability, revealing focus on non-informative EEG regions.
  • Statistical analyses highlight significant differences in sleep architecture between healthy and stroke cohorts.
Read more
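The Grad-CAM analysis mentioned above follows the standard recipe: weight each convolutional channel by its spatially averaged gradient, sum, and apply a ReLU. A minimal NumPy sketch, given precomputed activations and gradients (how those are obtained from the sleep-staging model is not shown here):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the class score w.r.t. those activations (same shape):
    channel weights are the spatially averaged gradients; the map is the
    ReLU of the weighted channel sum."""
    alphas = gradients.mean(axis=(1, 2))             # (C,) channel weights
    cam = np.tensordot(alphas, activations, axes=1)  # (H, W) weighted sum
    return np.maximum(cam, 0.0)                      # keep positive evidence

# Toy check: two channels with opposite gradient signs.
acts = np.stack([2 * np.ones((3, 3)), np.ones((3, 3))])
grads = np.stack([np.ones((3, 3)), -np.ones((3, 3))])
cam = grad_cam(acts, grads)
```

Overlaying such a map on the input EEG is how one can observe a model attending to non-informative regions, as the study reports.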
A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study
Yongda Fan, John Wu, Andrea Fitzpatrick, Naveen Baskaran, Jimeng Sun, Adam Cross
Time Series Interpretability
  • Attention mechanisms can effectively enhance interpretability in clinical predictive models.
  • Black-box interpreters like KernelSHAP and LIME are not suitable for time-series clinical prediction tasks due to computational constraints.
  • Many existing interpretability approaches lack reliability and trustworthiness.
  • The study provides a systematic evaluation framework that is extensible and reproducible.
Read more
Longitudinal Digital Phenotyping for Early Cognitive-Motor Screening
Diego Jimenez-Oviedo, Ruben Vera-Rodriguez, Ruben Tolosana, Juan Carlos Ruiz-Garcia, Jaime Herreros-Rodriguez
Time Series Multimodal
  • Introduces an AI-driven framework for continuous monitoring of cognitive-motor development in children.
  • Identifies three distinct performance profiles (low, medium, high) based on longitudinal data.
  • Demonstrates high stability in low-performance clusters, indicating persistent early deficits.
  • Utilizes unsupervised learning techniques to analyze touchscreen interaction data.
Read more
CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control
Yifeng Zhang, Harsh Goel, Peizhuo Li, Mehul Damani, Sandeep Chinchali, Guillaume Sartoretti
Reinforcement Learning Optimization
  • Introduces Queue Dynamic State Encoding (QDSE) for improved traffic state representation.
  • Develops Neighbor-aware Policy Optimization (NAPO) to enhance coordination among traffic signal agents.
  • Demonstrates superior performance over existing traffic signal control methods across multiple datasets.
  • Addresses challenges of partial observability and agent coordination in decentralized environments.
Read more
An Integrative Genome-Scale Metabolic Modeling and Machine Learning Framework for Predicting and Optimizing Biofuel-Relevant Biomass Production in Saccharomyces cerevisiae
Neha K. Nair, Aaron D'Souza
Optimization Generative Models Interpretability
  • Integration of genome-scale metabolic modeling with machine learning for biomass production optimization.
  • High predictive accuracy achieved using Random Forest and XGBoost models.
  • Identification of key metabolic reactions influencing biomass yield through SHAP analysis.
  • Significant increase in predicted biomass flux through in silico overexpression and Bayesian optimization.
Read more
An Explainable Ensemble Learning Framework for Crop Classification with Optimized Feature Pyramids and Deep Networks
Syed Rayhan Masud, SK Muktadir Hossain, Md. Ridoy Sarkar, Mohammad Sakib Mahmood, Md. Kishor Morol, Rakib Hossain Sajib
Interpretability
  • Introduction of a high-performance meta-ensemble framework combining multiple advanced techniques for crop classification.
  • Integration of Explainable AI methods to enhance model transparency and provide actionable insights.
  • Identification of key soil and climate features impacting crop suitability, validated against agronomic knowledge.
  • Demonstration of superior performance metrics compared to individual machine learning models.
Read more
Local learning for stable backpropagation-free neural network training towards physical learning
Yaqi Guo, Fabian Braun, Bastiaan Ketelaar, Stephanie Tan, Richard Norte, Siddhant Kumar
Theory Efficient ML Optimization
  • FFzero enables stable neural network training without backpropagation or automatic differentiation.
  • The framework combines local learning, prototype-based representations, and directional-derivative optimization.
  • FFzero is effective for multilayer perceptron and convolutional networks across various tasks.
  • Demonstrated viability using a simulated photonic neural network, paving the way for physical learning.
Read more
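The directional-derivative optimization mentioned above can be illustrated with a generic forward-gradient update: estimate the loss's directional derivative along a random direction by finite differences, then step along that direction scaled by the derivative, with no backpropagation anywhere. This is a minimal sketch of the general technique on toy linear regression, not the paper's FFzero rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y):
    """Mean-squared error of a linear model."""
    return float(np.mean((X @ w - y) ** 2))

def forward_gradient_step(w, X, y, lr=0.05, eps=1e-6):
    """Backprop-free update: the finite-difference directional derivative
    along a random direction v gives (v . grad); stepping by -lr * (v . grad) * v
    is an unbiased gradient-descent step in expectation."""
    v = rng.standard_normal(w.shape)
    d = (loss(w + eps * v, X, y) - loss(w, X, y)) / eps
    return w - lr * d * v

# Toy problem: recover w_true using only forward loss evaluations.
X = rng.standard_normal((64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
before = loss(w, X, y)
for _ in range(500):
    w = forward_gradient_step(w, X, y)
after = loss(w, X, y)
```

Because each update needs only forward evaluations, such schemes are candidates for physical hardware (e.g. photonic networks) where automatic differentiation is unavailable.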