AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

24 Papers today
8h Update frequency
7 Days of history
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
Aadyot Bhatnagar, Peter Mørch Groth, Ali Madani
Reinforcement Learning Optimization Large Language Models
  • Introduces STOMP, a novel offline RL algorithm for multi-objective optimization.
  • Utilizes smooth Tchebysheff scalarization to effectively capture non-convex regions of the Pareto front.
  • Demonstrates superior performance over existing methods in protein engineering tasks.
  • Addresses the limitations of linear reward scalarization in multi-objective RL.
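The summary gives no formulas; as background, the weighted Tchebysheff scalarization max_i w_i (z_i* - f_i(x)) is commonly smoothed by replacing the max with a temperature-scaled log-sum-exp. A minimal NumPy sketch of that idea, not the paper's STOMP algorithm (the mu temperature is illustrative):
```python
import numpy as np

def smooth_tchebysheff(objectives, weights, ideal, mu=0.1):
    """Smooth approximation of the weighted Tchebysheff scalarization.

    The classic form max_i w_i * (z_i* - f_i(x)) is non-smooth;
    log-sum-exp with temperature mu recovers the max as mu -> 0.
    """
    terms = weights * (ideal - objectives)        # per-objective gaps
    return mu * np.log(np.sum(np.exp(terms / mu)))

# Example: two objectives to maximize, equal weights
f = np.array([0.7, 0.4])       # current objective values
z_star = np.array([1.0, 1.0])  # ideal point
w = np.array([0.5, 0.5])
print(smooth_tchebysheff(f, w, z_star))  # ~0.320, smooth scalarized loss
print(np.max(w * (z_star - f)))          # 0.3, non-smooth counterpart
```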
Read more
Drowsiness-Aware Adaptive Autonomous Braking System based on Deep Reinforcement Learning for Enhanced Road Safety
Hossem Eddine Hafidi, Elisabetta De Giovanni, Teodoro Montanaro, Ilaria Sergi, Massimo De Vittorio, Luigi Patrono
Reinforcement Learning Robotics Time Series
  • Integration of real-time drowsiness detection into an autonomous braking system.
  • Utilization of ECG signals for accurate drowsiness monitoring.
  • Development of a Double Dual Deep Q-Network (DD-DQN) for adaptive braking policies.
  • Achieved a 99.99% success rate in avoiding accidents in both drowsy and non-drowsy scenarios.
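The "Double Dual" DD-DQN architecture isn't detailed in this summary; as background, a minimal sketch of the standard Double DQN target it presumably builds on, which selects the next action with the online network but evaluates it with the target network (the toy state layout is illustrative):
```python
import torch
import torch.nn as nn

def double_dqn_target(online_net, target_net, reward, next_state, gamma, done):
    """Double DQN target: select the next action with the online network,
    evaluate it with the target network (reduces Q overestimation)."""
    with torch.no_grad():
        next_actions = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_actions).squeeze(1)
        return reward + gamma * next_q * (1.0 - done)

# Toy usage: 4-dim state (e.g., speed, headway, drowsiness score, ...), 3 actions
online_net = nn.Linear(4, 3)
target_net = nn.Linear(4, 3)
batch = 8
y = double_dqn_target(online_net, target_net,
                      reward=torch.randn(batch),
                      next_state=torch.randn(batch, 4),
                      gamma=0.99,
                      done=torch.zeros(batch))
print(y.shape)  # torch.Size([8])
```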
Read more
Enhancing Confidence Estimation in Telco LLMs via Twin-Pass CoT-Ensembling
Anton Saenko, Pranshav Gajjar, Abiodun Ganiyu, Vijay K. Shah
NLP Large Language Models
  • Identifies systematic overconfidence in LLM-generated confidence scores in telecommunications.
  • Proposes a Twin-Pass CoT-Ensembling method to improve confidence estimation.
  • Achieves up to 88% reduction in Expected Calibration Error (ECE) across benchmarks.
  • Provides empirically validated confidence thresholds and recommendations for telecom applications.
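The Twin-Pass CoT-Ensembling procedure itself isn't described in this summary, but the headline metric, Expected Calibration Error, is standard. A minimal NumPy sketch of binned ECE:
```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: average |accuracy - confidence| over equal-width
    confidence bins, weighted by the fraction of samples per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

conf = np.array([0.9, 0.8, 0.95, 0.6, 0.7])  # model-reported confidences
hit = np.array([1, 0, 1, 1, 0])              # 1 if the answer was correct
print(expected_calibration_error(conf, hit))
```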
Read more
Hardware-Efficient Neuro-Symbolic Networks with the Exp-Minus-Log Operator
Eymen Ipek
Efficient ML Interpretability Theory
  • Introduction of the Exp-Minus-Log (EML) operator as a unifying primitive for DNNs.
  • Development of a DNN-EML hybrid architecture that enhances interpretability and reduces hardware complexity.
  • Establishment of computational-cost bounds and analysis of inference and training acceleration.
  • Identification of a gap in the neuro-symbolic literature: existing approaches do not build on a single hardware-realizable primitive.
Read more
Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection
Xuanyan Liu, Ignacio Cabrera Martin, Marcello Trovati, Xiaolong Xu, Nikolaos Polatidis
Theory
  • Model evaluation is often reduced to a few aggregate metrics, risking misleading conclusions.
  • Common pitfalls in evaluation include data leakage, class imbalance, and inappropriate metric selection.
  • Evaluation should be treated as a decision-oriented and context-dependent process.
  • The paper emphasizes the importance of aligning evaluation methods with operational objectives.
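As a concrete illustration of the class-imbalance pitfall (this example is ours, not the paper's): a degenerate model that always predicts the majority class looks strong under plain accuracy while being useless on the minority class:
```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# 95% negatives, 5% positives; a degenerate model predicts all negative
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))             # 0.95 -- looks great
print(balanced_accuracy_score(y_true, y_pred))    # 0.50 -- chance level
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 -- misses every positive
```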
Read more
Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
Theory Reinforcement Learning Time Series
  • Establishes a minimax lower bound for classification risk under Markov dependence.
  • Demonstrates that uniform bagging is suboptimal, with a significant risk gap.
  • Proposes adaptive spectral routing to achieve optimal performance in Markov settings.
  • Validates theoretical predictions through extensive experiments on various datasets.
Read more
A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models
Jason Kong, Nilesh Prasad Pandey, Flavio Ponzina, Tajana Rosing
NLP Large Language Models Efficient ML
  • Introduces a gradient-free sensitivity analysis framework for hybrid SSM-Transformer models.
  • Demonstrates that KL divergence is a superior metric for quantization sensitivity in language models.
  • Validates the proposed method through extensive experiments and real-world profiling.
  • Achieves significant model compression with minimal accuracy loss, suitable for edge deployment.
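A minimal, forward-only sketch of the general idea: measure a layer's sensitivity as the KL divergence between the full-precision and weight-quantized output distributions. The naive quantizer and toy model below are placeholders, not the paper's method:
```python
import copy
import torch
import torch.nn.functional as F

def quantize_weights(layer, bits=4):
    """Naive symmetric uniform quantization of a layer's weight (illustrative)."""
    w = layer.weight.data
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    layer.weight.data = torch.round(w / scale) * scale

@torch.no_grad()
def kl_sensitivity(model, layer_name, inputs, bits=4):
    """KL(full-precision || quantized) over output distributions;
    no gradients are needed, only two forward passes."""
    ref_logits = model(inputs)
    quant_model = copy.deepcopy(model)
    quantize_weights(dict(quant_model.named_modules())[layer_name], bits)
    q_logits = quant_model(inputs)
    return F.kl_div(F.log_softmax(q_logits, dim=-1),
                    F.log_softmax(ref_logits, dim=-1),
                    log_target=True, reduction="batchmean").item()

# Toy usage with a 2-layer MLP standing in for an SSM/Transformer block
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 10))
x = torch.randn(4, 16)
print(kl_sensitivity(model, "0", x, bits=4))
print(kl_sensitivity(model, "2", x, bits=4))
```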
Read more
Some Theoretical Limitations of t-SNE
Rupert Li, Elchanan Mossel
Theory
  • t-SNE can lose important data features during dimensionality reduction.
  • In high-dimensional spaces, t-SNE may map distinct points to the same location in lower dimensions.
  • The paper provides mathematical propositions demonstrating the limitations of t-SNE in preserving data structure.
  • The findings suggest that t-SNE may not be appropriate for all datasets, particularly those with high dimensionality.
Read more
When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration
Yiping Li, Zhiyu An, Wan Du
Large Language Models NLP Efficient ML
  • Introduces Orthogonal Backfill (OBF) to enhance KV compression in multi-agent LLM communication.
  • Achieves a significant reduction in communication costs (79.8%–89.4%) while maintaining competitive performance.
  • Demonstrates that preserving useful information is more critical than merely relaying large amounts of data.
  • Evaluates the method across nine diverse benchmarks, showing superior results in several cases.
Read more
Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments
Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar
Reinforcement Learning Robotics Theory
  • Introduction of Adaptive Memory Crystallization (AMC) for continual reinforcement learning.
  • Development of a three-phase memory hierarchy (Liquid, Glass, Crystal) to manage memory stability and plasticity.
  • Rigorous mathematical proofs establishing the convergence and performance guarantees of the proposed SDE.
  • Empirical results show substantial improvements in learning efficiency and memory management.
Read more
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
Sumeet Ramesh Motwani, Daniel Nichols, Charles London, Peggy Li, Fabio Pizzati, Acer Blake, Hasan Hammoud, Tavish McDonald, Akshat Naik, Alesia Ivanova, Vignesh Baskaran, Ivan Laptev, Ruben Glatt, Tal Ben-Nun, Philip Torr, Natasha Jaques, Ameya Prabhu, Brian Bartoldson, Bhavya Kailkhura, Christian Schroeder de Witt
Large Language Models NLP Theory
  • LongCoT is a novel benchmark for evaluating long-horizon reasoning in language models.
  • The benchmark consists of 2,500 expert-designed problems across multiple domains.
  • Current top models achieve less than 10% accuracy on LongCoT, highlighting significant reasoning limitations.
  • The problems require navigating complex interdependencies, emphasizing the need for planning and error management.
Read more
MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models
Gabriel Afriat, Xiang Meng, Shibal Ibrahim, Hussein Hazimeh, Rahul Mazumder
Computer Vision Large Language Models Efficient ML
  • MOONSHOT enhances one-shot pruning by optimizing multiple objectives simultaneously.
  • The framework is scalable and efficient, suitable for billion-parameter models.
  • Experimental results show significant improvements in performance and accuracy across various models.
  • The study reveals that different pruning criteria can yield complementary insights into parameter importance.
Read more
Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning
Zekai Lin, Chao Xue, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng
NLP Large Language Models Optimization
  • Parameter importance in supervised fine-tuning is dynamic, not static.
  • Evolving Parameter Isolation (EPI) adapts isolation masks based on online gradient estimates.
  • EPI improves stability and generalization in multi-task learning scenarios.
  • The framework effectively balances the retention of established knowledge with the acquisition of new capabilities.
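EPI's exact masking rule isn't given in this summary; a minimal sketch of the general pattern it describes: track per-parameter importance as an exponential moving average of squared gradients and periodically rebuild a binary mask (the decay and keep fraction are illustrative):
```python
import torch

class EvolvingMask:
    """Maintain an EMA of squared gradients per parameter and rebuild a
    binary 'important' mask periodically (illustrative, not the paper's EPI)."""

    def __init__(self, param, decay=0.99, keep_frac=0.2):
        self.ema = torch.zeros_like(param)
        self.decay, self.keep_frac = decay, keep_frac
        self.mask = torch.ones_like(param)

    def update(self, grad):
        # Online importance estimate from the latest gradient
        self.ema.mul_(self.decay).add_(grad.pow(2), alpha=1 - self.decay)

    def rebuild(self):
        # Keep the top fraction of parameters by estimated importance
        k = max(1, int(self.keep_frac * self.ema.numel()))
        threshold = self.ema.flatten().topk(k).values.min()
        self.mask = (self.ema >= threshold).float()

# Usage: after loss.backward(), call update(p.grad); periodically rebuild()
p = torch.randn(100, requires_grad=True)
m = EvolvingMask(p)
m.update(torch.randn(100))
m.rebuild()
print(int(m.mask.sum()))  # ~20 parameters flagged as important
```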
Read more
Unsupervised domain transfer: Overcoming signal degradation in sleep monitoring by increasing scoring realism
Mohammad Ahangarkiasari, Andreas Tind Damgaard, Casper Haurum, Kaare B. Mikkelsen
Time Series
  • The study investigates the potential of unsupervised domain transfer for sleep monitoring amidst signal degradation.
  • A discriminator-guided approach is proposed to enhance the realism of hypnograms, which can improve scoring accuracy.
  • The unsupervised method improves performance across various signal-distortion scenarios without degrading it overall.
  • Real-life application of the method revealed limited benefits, indicating the need for further refinement.
Read more
MAny: Merge Anything for Multimodal Continual Instruction Tuning
Zijian Gao, Wangwang Jia, Xingxing Zhang, Pengfei Qian, Tao Sun, Bo Ding, Yong Dou, Huaimin Wang, Kele Xu
Multimodal Large Language Models Efficient ML
  • Identification of a dual-forgetting phenomenon in MLLMs affecting both perception and reasoning.
  • Introduction of Cross-modal Projection Merging (CPM) for adaptive merging of visual features.
  • Development of Low-rank Parameter Merging (LPM) using Recursive Least Squares for optimal parameter merging.
  • MAny achieves state-of-the-art performance on UCIT and MLLM-DCL benchmarks without GPU training.
Read more
RPS: Information Elicitation with Reinforcement Prompt Selection
Tao Wang, Jingyao Lu, Xibo Wang, Haonan Huang, Su Yao, Zhiqiang Hu, Xingyan Chen, Enmao Diao
NLP Large Language Models Reinforcement Learning
  • Proposes Reinforcement Prompt Selection (RPS) for adaptive information elicitation in dialogues.
  • Introduces IELegal, a benchmark dataset for evaluating information elicitation in legal contexts.
  • RPS outperforms static prompt baselines, demonstrating the effectiveness of adaptive strategies.
  • Addresses the limitations of existing prompt engineering methods by reducing reliance on static prompts.
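The summary doesn't specify RPS's RL formulation; one common way to frame adaptive prompt selection is as a bandit over a prompt pool. A minimal epsilon-greedy sketch (the prompts and reward signal are placeholders):
```python
import random

class EpsilonGreedyPromptSelector:
    """Pick prompts by estimated reward, exploring with probability eps."""

    def __init__(self, prompts, eps=0.1):
        self.prompts = prompts
        self.eps = eps
        self.counts = [0] * len(prompts)
        self.values = [0.0] * len(prompts)  # running mean reward per prompt

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.prompts))
        return max(range(len(self.prompts)), key=lambda i: self.values[i])

    def update(self, i, reward):
        # Incremental running-mean update of the chosen prompt's value
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

selector = EpsilonGreedyPromptSelector([
    "Ask an open-ended question.",
    "Ask a targeted yes/no question.",
    "Summarize and ask for confirmation.",
])
i = selector.select()
# The reward would score how much new information the reply elicited
selector.update(i, reward=0.7)
```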
Read more
First-See-Then-Design: A Multi-Stakeholder View for Optimal Performance-Fairness Trade-Offs
Kavya Gupta, Nektarios Kalampalikis, Christoph Heitz, Isabel Valera
Theory Optimization
  • Introduces a multi-stakeholder framework for fair algorithmic decision-making.
  • Shifts focus from prediction-centric fairness to utility-based fairness.
  • Utilizes post-hoc multi-objective optimization to explore performance-fairness trade-offs.
  • Demonstrates that stochastic policies can yield better outcomes than deterministic ones.
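As a toy illustration of exploring a performance-fairness trade-off post hoc (not the paper's multi-objective optimizer): sweep a decision threshold and record accuracy against a demographic-parity gap:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)                # binary protected attribute
score = rng.beta(2 + group, 2, n)            # group-dependent model scores
label = (rng.random(n) < score).astype(int)  # labels correlated with score

for t in np.linspace(0.3, 0.7, 5):
    pred = (score >= t).astype(int)
    acc = (pred == label).mean()
    # Demographic parity gap: difference in positive-prediction rates
    dp_gap = abs(pred[group == 0].mean() - pred[group == 1].mean())
    print(f"threshold={t:.2f}  accuracy={acc:.3f}  DP gap={dp_gap:.3f}")
```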
Read more
Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
Shentong Mo
Reinforcement Learning Large Language Models Optimization
  • Introduction of CoUR framework for efficient reward function design in RL.
  • Integration of code uncertainty quantification to streamline reward component reuse.
  • Utilization of Bayesian optimization for independent optimization of reward terms.
  • Extensive evaluation showing CoUR outperforms traditional methods in performance and cost.
Read more
Soft Q(λ): A multi-step off-policy method for entropy-regularised reinforcement learning using eligibility traces
Pranav Mahajan, Ben Seymour
Reinforcement Learning
  • Introduces Soft Q(λ), a multi-step off-policy method for entropy-regularized reinforcement learning.
  • Develops a novel Soft Tree Backup operator to handle entropy terms across multiple time steps.
  • Eliminates the on-policy bias inherent in traditional n-step soft Q-learning methods.
  • Demonstrates the ability to learn entropy-regularized value functions under arbitrary behavior policies.
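The multi-step Soft Tree Backup operator is beyond a short sketch, but the one-step soft backup it generalizes is compact: the hard max over next-state values is replaced by a temperature-scaled log-sum-exp. A tabular sketch:
```python
import numpy as np

def soft_q_update(Q, s, a, r, s_next, alpha=0.1, lr=0.5, gamma=0.99):
    """One-step soft Q-learning backup; Soft Q(lambda) extends this base
    case to multi-step returns via eligibility traces."""
    # Soft value: V(s') = alpha * log sum_a exp(Q(s', a) / alpha)
    v_next = alpha * np.log(np.sum(np.exp(Q[s_next] / alpha)))
    target = r + gamma * v_next
    Q[s, a] += lr * (target - Q[s, a])

Q = np.zeros((5, 2))  # 5 states, 2 actions
soft_q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])        # ~0.534 after one update
```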
Read more
Golden Handcuffs make safer AI agents
Aram Ebtekar, Michael K. Cohen
Reinforcement Learning Theory
  • Introduces the 'Golden Handcuffs' mechanism to enhance safety in AI agents.
  • Expands the reward range to include negative values, promoting risk aversion.
  • Proves that the agent can achieve sublinear regret against the best mentor.
  • Ensures that unsafe actions are only triggered by mentors, not the optimizing policy.
Read more
Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate
Jaemin Kim, Sungkyun Kim, Junyeol Lee, Jiwon Seo
Large Language Models Efficient ML Optimization
  • DASH-Q improves robustness in ultra low-bit quantization by using diagonal Hessian approximations.
  • The framework effectively filters out noise from calibration data, enhancing feature preservation.
  • Achieves significant accuracy improvements over existing PTQ methods, particularly in low-bit regimes.
  • Demonstrates strong performance with minimal calibration data, making it suitable for resource-limited environments.
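No formulas appear in this summary; a rough sketch of the generic diagonal-curvature proxy such methods build on (quantization loss ≈ 0.5 * H_ii * (w_i - w_q_i)^2, with H_ii estimated from calibration activations). Everything here is illustrative, not DASH-Q itself:
```python
import torch

def diag_hessian_estimate(activations):
    """For a linear layer y = x W^T under squared-error output loss, the
    Hessian w.r.t. a weight row is ~ 2 X^T X; its diagonal is 2 * mean(x_j^2)."""
    return 2.0 * activations.pow(2).mean(dim=0)  # one entry per input dim

def quantization_sensitivity(weight, weight_q, h_diag):
    """Second-order loss proxy: 0.5 * sum_i H_ii * (w_i - w_q_i)^2 per row."""
    return 0.5 * (h_diag * (weight - weight_q).pow(2)).sum(dim=1)

X = torch.randn(256, 64)      # calibration activations
W = torch.randn(32, 64)       # layer weight
W_q = torch.round(W * 4) / 4  # toy uniform quantizer
print(quantization_sensitivity(W, W_q, diag_hessian_estimate(X)))
```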
Read more
Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification
Mohammad Nooraiepour, Zezhang Song, Wei Li, Sarah Perez
Theory Interpretability Efficient ML
  • Introduces a physics-informed transfer learning framework for methane sorption prediction.
  • Achieves a 227% improvement over classical isotherm models in predictive accuracy.
  • Monte Carlo Dropout is identified as the best method for uncertainty quantification.
  • Demonstrates the importance of moisture-volatile interactions in methane sorption.
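Monte Carlo Dropout is standard: keep dropout active at inference, run several stochastic forward passes, and report the spread as uncertainty. A minimal PyTorch sketch (the toy regressor stands in for the paper's sorption model):
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                      nn.Dropout(p=0.2), nn.Linear(64, 1))

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=100):
    """MC Dropout: dropout stays on at inference; the spread of the
    stochastic predictions approximates predictive uncertainty."""
    model.train()  # keeps Dropout layers active
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.randn(10, 3)  # e.g., pressure, temperature, moisture features
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)
```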
Read more
Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
Zijian Zhao, Jing Gao, Sen Li
Reinforcement Learning Robotics Optimization
  • CMAT bridges MARL and SARL, addressing key challenges in cooperative multi-agent settings.
  • The framework utilizes a Transformer encoder and a hierarchical decision-making mechanism for effective coordination.
  • Simultaneous action generation based on a consensus vector reduces sensitivity to action order.
  • CMAT shows superior performance on benchmark tasks compared to existing methods.
Read more
Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
Kamer Ali Yuksel, Hassan Sawaf
Theory Optimization Efficient ML
  • Introduction of top-k goodness function, significantly outperforming the traditional sum-of-squares method.
  • Development of entmax-weighted energy for adaptive sparse weighting, leading to improved accuracy.
  • Implementation of separate label–feature forwarding (FFCL) enhances performance across all goodness functions.
  • Establishment of a unifying principle that emphasizes the importance of sparsity in goodness functions for FF networks.
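The standard Forward-Forward goodness is the sum of squared activations; the top-k variant described above counts only the k largest. A minimal PyTorch sketch (k is illustrative):
```python
import torch

def goodness_sum_squares(h):
    """Standard Forward-Forward goodness: sum of squared activations."""
    return h.pow(2).sum(dim=1)

def goodness_top_k(h, k=16):
    """Top-k goodness: only the k largest squared activations contribute,
    encouraging sparse, selective representations."""
    return h.pow(2).topk(k, dim=1).values.sum(dim=1)

h = torch.randn(8, 128)  # batch of layer activations
print(goodness_sum_squares(h))
print(goodness_top_k(h, k=16))
```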
Read more