AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

55 Papers today
8h Update frequency
7 Days of history
Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo
Computer Vision Generative Models Interpretability
  • Introduces BrainCoDec, a training-free method for cross-subject brain decoding.
  • Utilizes a two-stage hierarchical inference process for visual decoding.
  • Achieves generalization across subjects without anatomical alignment or stimulus overlap.
  • Demonstrates robustness to input variability and effective reconstruction of visual stimuli.
Read more
Playing DOOM with 1.3M Parameters: Specialized Small Models vs Large Language Models for Real-Time Game Control
David Golchinfar, Daryoush Vaziri, Alexander Marquardt
Reinforcement Learning Efficient ML Robotics
  • With significantly fewer parameters, SauerkrautLM-Doom-MultiVec outperforms LLMs in real-time gameplay.
  • Innovative use of ModernBERT architecture and depth-aware token representations enhances performance.
  • Trained on 31,000 human gameplay demonstrations, the model exhibits superior engagement in gameplay.
  • Demonstrates the effectiveness of specialized models for real-time decision-making tasks.
Read more
Structured Distillation of Web Agent Capabilities Enables Generalization
Xing Han Lù, Siva Reddy
Large Language Models
  • Introduction of AGENT-AS-ANNOTATORS framework for web agent capability distillation.
  • Generation of a high-quality training dataset (A3-SYNTH) using a frontier LLM as a teacher.
  • Significant performance improvements on WebArena and unseen environments.
  • Ablation studies confirm the meaningful contributions of each pipeline component.
Read more
SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation
Grace Jiarui Fan, Chengpiao Huang, Tianyi Peng, Kaizheng Wang, Yuhang Wu
NLP Large Language Models Generative Models
  • SYN-DIGITS is a lightweight, model-agnostic calibration framework for digital twin simulations.
  • The framework successfully aligns LLM predictions with human ground truth using latent structure learning.
  • Empirical evaluations show significant improvements in prediction accuracy and reduction of biases.
  • SYN-DIGITS can be integrated with various simulation approaches, including naïve simulation and fine-tuning.
Read more
Quantization Impact on the Accuracy and Communication Efficiency Trade-off in Federated Learning for Aerospace Predictive Maintenance
Abdelkarim Loukili
Federated Learning Time Series Efficient ML
  • AeroConv1D model designed for efficient predictive maintenance in aerospace using federated learning.
  • INT4 quantization achieves accuracy similar to FP32 while reducing communication costs by 8x (the basic quantize/dequantize step is sketched below).
  • Non-IID evaluation reveals the limitations of IID client partitioning in assessing quantization performance.
  • INT2 quantization leads to instability in performance metrics, making it impractical for deployment.
Read more
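A minimal sketch of the mechanism behind the 8x figure above: symmetric uniform quantization of a client update to signed INT4, then dequantization on the server. This is the generic recipe, not AeroConv1D's exact scheme; the array shape and the per-tensor scale are illustrative.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Symmetric uniform quantization to signed ints in [-(2^(b-1)-1), 2^(b-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for INT4
    scale = np.abs(w).max() / qmax                 # per-tensor scale (illustrative)
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)       # a client's model update
q, scale = quantize_symmetric(w, bits=4)
err = np.abs(w - dequantize(q, scale)).mean()
# Each transmitted value shrinks from 32 bits to 4: the 8x saving cited above.
print(f"mean abs error: {err:.4f}")
```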
KV Cache Offloading for Context-Intensive Tasks
Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov
NLP Large Language Models Efficient ML
  • Introduces the Text2JSON benchmark for evaluating KV-cache offloading on context-intensive tasks.
  • Identifies significant performance degradation in existing KV offloading methods for Llama 3 and Qwen 3 models (the underlying offloading pattern is sketched below).
  • Proposes a new strategy to improve accuracy in KV-cache offloading.
  • Highlights the inadequacy of current benchmarks in capturing the challenges of context-intensive tasks.
Read more
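For readers unfamiliar with the mechanism being benchmarked, here is a minimal sketch of KV-cache offloading itself: park per-layer key/value tensors in host RAM and copy them back to the GPU before each layer's attention call. The class and method names are hypothetical; production systems hide the copies with pinned memory and CUDA streams.

```python
import torch

class OffloadedKVCache:
    """Minimal sketch: per-layer key/value tensors live in host RAM and are
    reloaded to the GPU just before each layer's attention call."""

    def __init__(self, device: str = "cuda"):
        self.device = device
        self.cpu_store = {}  # layer index -> (K, V) on CPU

    def offload(self, layer: int, k: torch.Tensor, v: torch.Tensor) -> None:
        # Evict this layer's KV pairs from GPU memory to host RAM.
        self.cpu_store[layer] = (k.detach().cpu(), v.detach().cpu())

    def fetch(self, layer: int):
        # Reload the layer's KV pairs on demand.
        k, v = self.cpu_store[layer]
        return k.to(self.device), v.to(self.device)
```

Exact movement like this is lossless but slow; the accuracy degradation the benchmark measures typically comes from the approximations (partial offload, eviction, compression) that methods layer on top to reduce traffic.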
Flow Learners for PDEs: Toward a Physics-to-Physics Paradigm for Scientific Computing
Yilong Dai, Shengyu Chen, Xiaowei Jia, Runlong Yu
Theory Generative Models Optimization
  • Current learned PDE solvers often misrepresent the underlying physics by focusing on state prediction rather than transport dynamics.
  • Flow learners provide a more accurate framework by parameterizing transport vector fields, enabling better modeling of uncertainty and continuous dynamics.
  • The proposed paradigm supports improved predictions over long time horizons and in chaotic or partially observed environments.
  • The authors advocate for a shift in the research agenda towards a physics-to-physics approach in the design of learned solvers.
Read more
Pruning Extensions and Efficiency Trade-Offs for Sustainable Time Series Classification
Raphael Fischer, Angus Dempster, Sebastian Buschjäger, Matthias Jakobs, Urav Maniar, Geoffrey I. Webb
Time Series Efficient ML
  • Introduces a unified methodology for evaluating performance and efficiency trade-offs in TSC.
  • Presents a pruning strategy for hybrid classifiers Hydra and Quant, leading to the development of Hydrant.
  • Demonstrates significant energy savings (up to 80%) with minimal impact on accuracy (less than 5%).
  • Conducts extensive experiments across diverse datasets and hardware setups to validate findings.
Read more
DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting
Yeran Zhang, Pengwei Yang, Guoqing Wang, Tianyu Li
Time Series Graph Learning Interpretability
  • DSPR effectively decouples stable trends from regime-dependent dynamics in industrial time series forecasting.
  • The framework incorporates an Adaptive Window module and a Physics-Guided Dynamic Graph to enhance physical plausibility.
  • DSPR achieves state-of-the-art performance with over 99% Mean Conservation Accuracy and 97.2% Total Variation Ratio.
  • The model provides interpretable insights that align with known physical mechanisms, aiding scientific analysis.
Read more
SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective
Yuyao Wang, Min Yang, Meng Chen, Weiming Huang, Yongshun Gong
Optimization Graph Learning Theory
  • SCOT addresses the challenge of explicit soft correspondence in cross-city transfer learning.
  • The framework utilizes Sinkhorn-based entropic optimal transport for aligning region representations (see the sketch below).
  • An OT-weighted contrastive objective enhances semantic separation and transferability.
  • SCOT shows significant improvements in transfer accuracy and robustness across various urban prediction tasks.
Read more
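The Sinkhorn step SCOT builds on is standard entropic optimal transport. A self-contained sketch, with an illustrative cosine cost between region embeddings (the shapes and cost choice are assumptions, not SCOT's exact setup):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=200):
    """Entropic OT via alternating scaling: returns a transport plan P with
    row marginals a and column marginals b that (softly) minimizes <P, C>."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # P = diag(u) K diag(v)

# Soft correspondences between source-city and target-city region embeddings:
src = np.random.randn(8, 16); tgt = np.random.randn(5, 16)
C = 1 - (src / np.linalg.norm(src, axis=1, keepdims=True)) @ \
        (tgt / np.linalg.norm(tgt, axis=1, keepdims=True)).T
P = sinkhorn(np.full(8, 1/8), np.full(5, 1/5), C)
print(P.sum(axis=1))                     # rows marginalize back to a (≈ 1/8 each)
```

The resulting plan P is the explicit soft correspondence: P[i, j] says how much source region i maps onto target region j.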
An Illusion of Unlearning? Assessing Machine Unlearning Through Internal Representations
Yichen Gao, Altay Unal, Akshay Rangamani, Zhihui Zhu
Computer Vision Theory
  • Current MU methods often fail to erase the internal representations of forgotten data, leading to potential vulnerabilities.
  • Feature–classifier misalignment is a significant issue that can result in the re-emergence of forgotten concepts.
  • A new MU method based on class-mean features (CMF) is proposed to enhance alignment between features and classifiers.
  • CMF-based unlearning effectively reduces forgotten information while preserving high accuracy on retained classes.
Read more
Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Binxing Xu, Hao Gu, Lujun Li, Hao Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Xintong Yang, Chao Li, Sirui Han, Yike Guo
Large Language Models Efficient ML Optimization
  • Introduces a progressive QAT framework that enhances stability during low-bit training.
  • Employs outlier channel splitting to mitigate quantization errors effectively (the general trick is sketched below).
  • Achieves significant speed improvements with custom operators for low-bit configurations.
  • Demonstrates superior performance on LLaMA-2/3 compared to existing QAT baselines.
Read more
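Outlier channel splitting predates this paper; the sketch below shows the generic trick on a plain weight matrix: duplicate the input channel holding the largest weight and halve both copies, which leaves the layer's function unchanged while shrinking the weight range a low-bit grid must cover. The paper's progressive-QAT integration is more involved.

```python
import numpy as np

def split_outlier_channel(W: np.ndarray, x: np.ndarray):
    """Generic outlier channel splitting for a linear layer y = x @ W.T:
    the outlier input channel is split into two half-magnitude copies and
    the corresponding activation is duplicated, preserving the output."""
    j = np.abs(W).max(axis=0).argmax()             # input channel with the outlier
    W_split = np.concatenate([W, W[:, j:j+1] / 2], axis=1)
    W_split[:, j] /= 2                             # original copy also halved
    x_split = np.concatenate([x, x[j:j+1]])        # duplicate the activation
    return W_split, x_split

W = np.random.randn(4, 6); W[2, 5] = 20.0          # plant an outlier weight
x = np.random.randn(6)
W2, x2 = split_outlier_channel(W, x)
print(np.allclose(x @ W.T, x2 @ W2.T))             # True: function preserved
print(np.abs(W).max(), np.abs(W2).max())           # 20.0 -> 10.0
```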
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
Baihui Liu, Kaiyuan Tian, Wei Wang, Zhaoning Zhang, Linbo Qiao, Dongsheng Li
NLP Large Language Models Efficient ML
  • Introduces the concept of an activation budget for expert routing in MoE models (a toy allocation rule is sketched below).
  • Presents Alloc-L and Alloc-T strategies for optimizing expert allocation at layer and token levels, respectively.
  • Demonstrates that Alloc-MoE maintains model performance while significantly improving inference speed.
  • Achieves notable speedups on DeepSeek-V2-Lite with reduced expert activations.
Read more
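To make the activation-budget idea concrete, here is a hypothetical token-level allocation rule, not the paper's Alloc-L/Alloc-T: guarantee every token one expert, then spend the remaining budget on the globally highest-scoring (token, expert) pairs.

```python
import torch

def budgeted_topk_routing(logits: torch.Tensor, budget: int) -> torch.Tensor:
    """Hypothetical budgeted router: `logits` is [tokens, experts]; `budget`
    caps the total number of expert activations across the whole batch."""
    T, E = logits.shape
    probs = logits.softmax(dim=-1)
    chosen = torch.zeros(T, E, dtype=torch.bool)
    chosen[torch.arange(T), probs.argmax(dim=-1)] = True  # top-1 guaranteed
    extra = budget - T
    if extra > 0:
        # Spend the remainder on the best unchosen (token, expert) pairs.
        flat = probs.masked_fill(chosen, float("-inf")).flatten()
        chosen.view(-1)[flat.topk(extra).indices] = True
    return chosen  # boolean mask of activated experts per token

mask = budgeted_topk_routing(torch.randn(4, 8), budget=10)
print(mask.sum().item(), "activations for 4 tokens")  # == 10
```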
Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
Jaden Zhang, Gardenia Liu, Oliver Johansson, Hileamlak Yitayew, Kamryn Ohly, Grace Li
Reinforcement Learning
  • Prediction Arena benchmarks AI models in real-world prediction markets, providing objective evaluation metrics.
  • Cohort 1 models showed significant performance differences across platforms, with Polymarket yielding better returns than Kalshi.
  • The study identifies key factors influencing model performance, including initial prediction accuracy and capitalizing on correct predictions.
  • Computational efficiency does not correlate with performance, challenging assumptions about model complexity.
Read more
Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity
Yucheng Zhou, Jianbing Shen
Generative Models Optimization Efficient ML
  • Fewer-Frames method reduces training time but increases error and inconsistency in generated videos.
  • Local Optimization method improves training efficiency and reduces error accumulation compared to Fewer-Frames.
  • Representation Continuity strategy enhances video consistency and robustness while maintaining training speed.
  • Experimental results show the proposed methods outperform existing autoregressive video generation techniques.
Read more
What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha
NLP Large Language Models Interpretability
  • Introduces a multi-token activation patching framework for analyzing steering vectors in LLMs.
  • Finds that refusal steering interacts mainly with the OV circuit of the attention mechanism.
  • Demonstrates that freezing attention scores has a negligible effect on steering performance.
  • Reveals that steering vectors can be sparsified by up to 90-99% while retaining performance.
Read more
The Impact of Dimensionality on the Stability of Node Embeddings
Tobias Schumacher, Simon Reichelt, Markus Strohmaier
Graph Learning
  • Dimensionality significantly affects the stability of node embeddings.
  • Different embedding methods exhibit varying stability patterns with increased dimensionality.
  • Maximum stability does not necessarily align with optimal performance in downstream tasks.
  • The study emphasizes the importance of selecting appropriate embedding dimensions.
Read more
Rethinking Residual Errors in Compensation-based LLM Quantization
Shuaiting Li, Juncan Deng, Kedong Xu, Rongtao Deng, Hong Gu, Minghan Jiang, Haibin Shen, Kejie Huang
Large Language Models Efficient ML Optimization
  • Introduces a refined calibration objective for quantization that aligns outputs with the original model rather than compensated weights.
  • Defines 'compensation-aware error' to capture intra-layer discrepancies introduced by weight compensation.
  • Utilizes neuron decomposition techniques to efficiently incorporate the new error formulation into weight updates.
  • Demonstrates significant performance improvements in quantization for LLMs with minimal modifications to existing methods.
Read more
Decisions and Deployment: The Five-Year SAHELI Project (2020-2025) on Restless Multi-Armed Bandits for Improving Maternal and Child Health
Shresth Verma, Arpan Dasgupta, Neha Madhiwalla, Aparna Taneja, Milind Tambe
Reinforcement Learning Optimization
  • Restless Multi-Armed Bandits effectively optimize limited public health interventions.
  • Decision-focused learning enhances the predict-then-optimize approach in healthcare settings.
  • Long-term interventions led to improved adherence to mHealth programs and better health behaviors.
Read more
An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
Andreas Plesner, Francisco Guzmán, Anish Athalye
Reinforcement Learning Large Language Models
  • RLVR is robust to noise, with up to 15% noise rates yielding minimal performance drops.
  • Precision in verification is more important than recall for effective training.
  • Diminishing returns are observed when improving verifier accuracy beyond a certain threshold.
  • The findings apply across different model families and noise types.
Read more
A Novel Edge-Assisted Quantum-Classical Hybrid Framework for Crime Pattern Learning and Classification
Niloy Das, Apurba Adhikary, Sheikh Salman Hassan, Yu Qiao, Zhu Han, Tharmalingam Ratnarajah, Choong Seon Hong
Optimization Theory Efficient ML
  • Introduction of a comprehensive quantum-classical comparison framework for crime analytics.
  • Development of a novel quantum circuit architecture that leverages crime feature correlations.
  • Demonstration of competitive performance of quantum-inspired models compared to classical baselines.
  • Hybrid architectures show promise for deployment in resource-constrained environments.
Read more
Bias-Constrained Diffusion Schedules for PDE Emulations: Reconstruction Error Minimization and Efficient Unrolled Training
Constantin Le Cleï, Nils Thürey, Xiaoxiang Zhu
Generative Models Optimization Time Series
  • Introduction of the Reconstruction Exposure-Bias concept, linking training and inference errors.
  • Development of an Adaptive Noise Schedule to optimize reconstruction error while maintaining stability.
  • Proposal of a fast Proxy Unrolled Training method to enhance computational efficiency.
  • Demonstrated improvements in accuracy and stability over traditional diffusion and deterministic models.
Read more
Zero-shot Multivariate Time Series Forecasting Using Tabular Prior Fitted Networks
Mayuka Jayawardhana, Nihal Sharma, Kazem Meidani, Bayan Bruss, Tom Goldstein, Doron Bergman
Time Series
  • Introduces a framework for zero-shot multivariate time series forecasting using tabular models.
  • Addresses the limitation of treating MTS as independent univariate problems by modeling inter-channel dependencies.
  • Utilizes a 'rolled out' tabular format to capture spatial correlations and temporal dependencies (one plausible encoding is sketched below).
  • Demonstrates competitive performance against state-of-the-art methods in empirical evaluations.
Read more
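One plausible reading of the 'rolled out' format, sketched under the assumption that it means lag-stacking all channels into a single feature vector per row so a tabular model sees cross-channel and temporal structure at once; the paper's exact encoding may differ.

```python
import numpy as np

def roll_out(series: np.ndarray, lags: int):
    """Flatten a multivariate series [T, C] into a tabular design matrix:
    each row holds the previous `lags` timesteps of all C channels."""
    T, C = series.shape
    X = np.stack([series[t - lags:t].ravel() for t in range(lags, T)])
    y = series[lags:]                    # next-step targets, all channels
    return X, y                          # X: [T-lags, lags*C], y: [T-lags, C]

series = np.cumsum(np.random.randn(100, 3), axis=0)  # 3-channel toy series
X, y = roll_out(series, lags=8)
print(X.shape, y.shape)                               # (92, 24) (92, 3)
```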
Tree-of-Evidence: Efficient "System 2" Search for Faithful Multimodal Grounding
Micky C. Nnamdi, Benoit L. Marteau, Yishan Zhong, J. Ben Tamo, May D. Wang
Multimodal Interpretability Optimization
  • Introduces Tree-of-Evidence (ToE) for improved interpretability of multimodal models.
  • Frames interpretability as a discrete optimization problem using Evidence Bottlenecks.
  • Maintains high predictive performance with minimal evidence units.
  • Achieves better decision agreement and lower errors compared to traditional methods.
Read more
BLEG: LLM Functions as Powerful fMRI Graph-Enhancer for Brain Network Analysis
Rui Dong, Zitong Wang, Jiaxing Li, Weihuang Zheng, Youyong Kong
Graph Learning Large Language Models NLP
  • Introduces BLEG, a framework combining LLMs and GNNs for brain network analysis.
  • Addresses limitations of GNNs due to feature sparsity and lack of domain knowledge.
  • Demonstrates a three-stage methodology for enhancing GNN performance.
  • Achieves superior results on various downstream tasks compared to existing methods.
Read more
Sinkhorn doubly stochastic attention rank decay analysis
Michela Lapenna, Rita Fioresi, Bahman Gharesifard
Theory NLP Computer Vision
  • Doubly stochastic attention mitigates rank collapse more effectively than row-stochastic attention (see the sketch below).
  • Rank decay in self-attention using Sinkhorn normalization occurs doubly exponentially with depth.
  • Skip connections are crucial for maintaining rank in self-attention networks.
  • Empirical validation shows improved performance in sentiment analysis and image classification tasks.
Read more
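Doubly stochastic attention in a few lines: replace softmax's row-only normalization with log-domain Sinkhorn iterations so that rows and columns both sum to one. A generic sketch of the operator the analysis studies, not the paper's experimental code:

```python
import torch

def sinkhorn_attention(scores: torch.Tensor, iters: int = 5) -> torch.Tensor:
    """Normalize an attention score matrix toward doubly stochastic form by
    alternately normalizing rows and columns in log space."""
    log_p = scores
    for _ in range(iters):
        log_p = log_p - log_p.logsumexp(dim=-1, keepdim=True)  # rows
        log_p = log_p - log_p.logsumexp(dim=-2, keepdim=True)  # columns
    return log_p.exp()

A = sinkhorn_attention(torch.randn(6, 6))
print(A.sum(dim=-1), A.sum(dim=-2))  # both ≈ 1 after a few iterations
```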
A Machine Learning Framework for Turbofan Health Estimation via Inverse Problem Formulation
Milad Leyli-Abadi, Lucas Thil, Sébastien Razakarivony, Guillaume Doquet, Jesse Read
Time Series
  • Introduction of a realistic turbofan dataset that captures real-world health monitoring challenges.
  • Comprehensive evaluation of established methods for health state estimation from sparse measurements.
  • Investigation of self-supervised learning approaches to recover health states without true labels.
  • Comparison of traditional Bayesian filters and data-driven models, establishing strong baselines.
Read more
TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis
Sikai Bai, Haoxi Li, Jie Zhang, Yongjiang Liu, Song Guo
Reinforcement Learning Large Language Models
  • TTVS enables dynamic augmentation of training data from unlabeled test queries.
  • The framework consists of two modules: Online Variational Synthesis and Test-time Hybrid Exploration.
  • TTVS outperforms existing test-time adaptation methods and state-of-the-art RL techniques using only unlabeled data.
  • The approach encourages models to learn underlying problem logic rather than superficial patterns.
Read more
A Systematic Framework for Tabular Data Disentanglement
Ivan Tjuawinata, Andre Gunawan, Anh Quan Tran, Nitish Kumar, Payal Pote, Harsh Bansal, Chu-Hung Chi, Kwok-Yan Lam, Parventanis Murthy
Theory Generative Models Optimization
  • Introduces a systematic framework for tabular data disentanglement.
  • Modularizes the disentanglement process into four core components.
  • Identifies limitations of existing methods and proposes a comprehensive view.
  • Highlights the unique challenges posed by tabular data compared to other data types.
Read more
A Graph Foundation Model for Wireless Resource Allocation
Yucheng Sheng, Jiacheng Wang, Le Liang, Hao Ye, Shi Jin
Graph Learning Optimization
  • Introduces a novel Graph Foundation Model for resource allocation in wireless networks.
  • Utilizes an interference-aware Transformer architecture for improved adaptability.
  • Employs a hybrid self-supervised pre-training strategy for effective representation learning.
  • Achieves state-of-the-art performance and sample efficiency in various scenarios.
Read more
Tensor-based computation of the Koopman generator via operator logarithm
Tatsuya Kishimoto, Jun Ohkubo
Theory Time Series Efficient ML
  • Introduces a tensor-based method for computing the Koopman generator in low-rank TT format (the underlying operator-logarithm identity is sketched below).
  • Avoids the curse of dimensionality by leveraging eigendecomposition for efficient computation.
  • Demonstrates effectiveness on both 4D and 10D dynamical systems, achieving accurate recovery of vector fields.
  • Provides a scalable solution for system identification in nonlinear dynamics.
Read more
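The operator-logarithm identity at the core of the method, shown densely on a toy system (the paper's contribution is carrying this out in low-rank tensor-train format via eigendecomposition): over a time step dt the Koopman operator satisfies K = exp(dt·L), so the generator is recovered as L = logm(K)/dt.

```python
import numpy as np
from scipy.linalg import expm, logm

dt = 0.01
L_true = np.array([[0.0, 1.0], [-1.0, -0.1]])  # toy generator (damped oscillator)
K = expm(dt * L_true)                          # one-step Koopman operator
L_rec = logm(K).real / dt                      # generator via operator logarithm
print(np.max(np.abs(L_rec - L_true)))          # ≈ 0
```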
Multimodal Latent Reasoning via Predictive Embeddings
Ashutosh Adhikari, Mirella Lapata
Multimodal
  • PEARL eliminates the need for explicit tool invocation at inference time, reducing overhead.
  • The framework supports multi-step reasoning and avoids training-inference mismatches.
  • PEARL outperforms traditional supervised fine-tuning and reconstruction-based methods in various benchmarks.
  • The approach focuses on predictive embedding learning, which is shown to be more effective than reconstruction-based methods.
Read more
Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
Yue Huang, Haomin Zhuang, Jiayi Ye, Han Bao, Yanbo Wang, Hang Hua, Siyuan Wu, Pin-Yu Chen, Xiangliang Zhang
NLP Large Language Models Reinforcement Learning
  • Introduction of the GaaA framework as a soft-gating alternative to traditional hard-gated safety mechanisms.
  • Development of GuardSet, a large-scale dataset with over 208,000 examples for training guardian models.
  • Training of GuardAdvisor using a combination of supervised fine-tuning and reinforcement learning.
  • Demonstration of GuardAdvisor's competitive performance and significant reduction in unnecessary refusals.
Read more
Preference Redirection via Attention Concentration: An Attack on Computer Use Agents
Dominik Seip, Matthias Hein
Multimodal
  • Introduction of PRAC, a novel attack on CUAs that manipulates attention in vision models.
  • Demonstration of the attack's effectiveness in redirecting product selection on online shopping platforms.
  • Highlighting the security vulnerabilities of CUAs in trusted environments, particularly through visual perception.
  • Validation of the attack in realistic scenarios, indicating high success rates.
Read more
Implicit Regularization and Generalization in Overparameterized Neural Networks
Zeran Johannsen
Theory Optimization
  • Overparameterized neural networks can generalize well despite classical predictions of overfitting.
  • Optimization dynamics, particularly through SGD, play a crucial role in implicit regularization.
  • Smaller batch sizes lead to better generalization and flatter minima in the loss landscape.
  • Sparse subnetworks can achieve performance comparable to full models, supporting the Lottery Ticket Hypothesis.
Read more
Automating aggregation strategy selection in federated learning
Dian S. Y. Pang, Endrias Y. Ergetu, Eric Topham, Ahmed E. Fetit
Federated Learning
  • Introduces an automated framework for selecting aggregation strategies in Federated Learning (a canonical candidate, FedAvg, is sketched below).
  • Operates in single-trial and multi-trial modes to accommodate different resource constraints.
  • Utilizes large language models for strategy inference and a genetic search for optimization.
  • Demonstrates improved robustness and generalization in non-IID scenarios through extensive experiments.
Read more
Optimal Decay Spectra for Linear Recurrences
Yang Cao
NLP Large Language Models Theory
  • Introduces Position-Adaptive Spectral Tapering (PoST) for improved long-range memory in linear recurrent models.
  • Establishes a design blueprint for memory channels based on logarithmic equipartition of information.
  • Demonstrates minimax optimality through Spectral Reparameterization for geometrically spaced decay rates (geometric spacing is sketched below).
  • Implements Position-Adaptive Scaling to dynamically adjust memory channel contributions based on sequence position.
Read more
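A loose sketch of the geometric spacing the blueprint argues for, with the caveat that PoST's actual parameterization and position-adaptive scaling are richer than this: decay rates whose memory timescales 1/(1-λ) form a geometric progression, run through a diagonal linear recurrence.

```python
import numpy as np

def geometric_decays(n_channels: int, lam_min=0.5, lam_max=0.999) -> np.ndarray:
    """Decay rates whose timescales 1/(1-lam) are geometrically spaced, so
    each memory channel covers a different octave of context length."""
    taus = np.geomspace(1 / (1 - lam_min), 1 / (1 - lam_max), n_channels)
    return 1 - 1 / taus

def run_recurrence(x: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Diagonal linear recurrence h_t = lam * h_{t-1} + x_t per channel."""
    h, hs = np.zeros_like(lam), []
    for x_t in x:
        h = lam * h + x_t
        hs.append(h.copy())
    return np.stack(hs)

lam = geometric_decays(8)
states = run_recurrence(np.random.randn(256), lam)  # scalar input, 8 channels
print(lam.round(3))                                 # 0.5 ... 0.999
```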
Physics-informed neural operators for the in situ characterization of locally reacting sound absorbers
Jonas M. Schmid, Johannes D. Schmid, Martin Eser, Steffen Marburg
Audio & Speech Theory Optimization
  • Introduces a physics-informed neural operator approach for estimating acoustic surface admittance.
  • Avoids the need for explicit forward models by embedding governing acoustic equations into the training process.
  • Demonstrates improved robustness to noise and sparse data compared to traditional methods.
  • Validates the approach using synthetic data from simulations of porous absorbers.
Read more
Benchmark Shadows: Data Alignment, Parameter Footprints, and Generalization in Large Language Models
Hongjian Zou, Yidan Wang, Qi Ding, Yixuan Liao, Xiaoxin Chen
NLP Large Language Models Multimodal
  • Introduces a regime-centric framework linking data distribution to learning dynamics in LLMs.
  • Demonstrates that benchmark-aligned data improves narrow metrics but limits broader representational development.
  • Shows that coverage-expanding data leads to better generalization and distributed parameter adaptation.
  • Presents parameter-space diagnostics to characterize training regime effects.
Read more
Leveraging Complementary Embeddings for Replay Selection in Continual Learning with Small Buffers
Danit Yanowsky, Daphna Weinshall
Computer Vision Efficient ML Theory
  • MERS integrates supervised and self-supervised embeddings to improve replay selection in continual learning.
  • The method employs a non-parametric alignment strategy based on k-NN density estimation for adaptive selection (the estimator is sketched below).
  • MERS achieves state-of-the-art performance on Split CIFAR-100 and Split TinyImageNet datasets.
  • The approach is efficient and can be seamlessly integrated into existing replay-based continual learning frameworks.
Read more
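The k-NN density estimator underlying the selection step, in generic form: a point whose k-th nearest neighbor is close sits in a dense region of embedding space. This is a sketch of the standard estimator, not MERS's exact alignment objective.

```python
import numpy as np

def knn_density(emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Non-parametric density score: inverse distance to the k-th nearest
    neighbor (row index 0 of the sorted distances is the point itself)."""
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)  # pairwise distances
    kth = np.sort(d, axis=1)[:, k]
    return 1.0 / (kth + 1e-8)

emb = np.random.randn(200, 32)          # e.g. self-supervised embeddings
scores = knn_density(emb)
buffer_idx = np.argsort(-scores)[:20]   # keep the 20 densest-region samples
```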
DMax: Aggressive Parallel Decoding for dLLMs
Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang
NLP Large Language Models Generative Models
  • DMax mitigates error accumulation in parallel decoding of dLLMs.
  • Introduces On-Policy Uniform Training (OPUT) for effective self-correction.
  • Proposes Soft Parallel Decoding (SPD) to enhance decoding robustness.
  • Achieves significant improvements in tokens per forward (TPF) without sacrificing accuracy.
Read more
The ecosystem of machine learning competitions: Platforms, participants, and their impact on AI development
Ioannis Nasios
Theory
  • MLCs significantly contribute to AI innovation and skill development.
  • Major platforms like Kaggle dominate participation and prize distribution.
  • Competitions bridge the gap between academic research and industrial applications.
  • MLCs foster collaboration and knowledge sharing within the AI community.
Read more
Multimodal Large Language Models for Multi-Subject In-Context Image Generation
Yucheng Zhou, Dubing Chen, Huan Zheng, Jianbing Shen
Multimodal Generative Models Computer Vision
  • MUSIC is the first MLLM designed for multi-subject in-context image generation.
  • An automatic data generation pipeline is introduced, removing the need for manual annotation.
  • The vision chain-of-thought mechanism enhances the model's understanding of multi-subject relationships.
  • A novel semantics-driven spatial layout planning method is proposed to reduce semantic conflicts.
Read more
Validated Synthetic Patient Generation for Small Longitudinal Cohorts: Coagulation Dynamics Across Pregnancy
Jeffrey D. Varner, Maria Cristina Bravo, Carole McBride, Thomas Orfeo, Ira Bernstein
Generative Models Time Series Theory
  • Introduces multiplicity-weighted Stochastic Attention (SA) for synthetic patient generation.
  • SA preserves the geometry of real patient data while generating new synthetic profiles.
  • Successfully applied to a longitudinal coagulation dataset from pregnant patients.
  • Synthetic patients were validated to be statistically and mechanistically similar to real patients.
Read more
Learning is Forgetting: LLM Training As Lossy Compression
Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant
NLP Large Language Models Theory
  • LLMs are conceptualized as instances of lossy compression, retaining only relevant information from training data.
  • Pre-training dynamics align with Information Bottleneck theory, showing a trajectory of initial expansion followed by compression.
  • The optimality of compression correlates significantly with performance across multiple benchmarks.
  • Quantifying preference information in models predicts downstream performance effectively.
Read more
The Role of Emotional Stimuli and Intensity in Shaping Large Language Model Behavior
Ameen Patel, Felix Lee, Kyle Liang, Joseph Thomas
NLP Large Language Models
  • Emotional prompting can enhance LLM performance but may increase sycophantic behavior.
  • The study evaluates four emotions: joy, encouragement, anger, and insecurity, across varying intensities.
  • A prompt-generation pipeline was developed to create a comprehensive dataset for analysis.
  • Positive emotional stimuli lead to more accurate and less toxic outputs from LLMs.
Read more
Approximation of the Basset force in the Maxey-Riley-Gatignol equations via universal differential equations
Finn Sommer, Vamika Rathi, Sebastian Goetschel, Daniel Ruprecht
Theory Optimization Time Series
  • Introduces a neural network-based approximation for the Basset force in MaRGE.
  • Transforms the integro-differential equations into ordinary differential equations for easier numerical solutions.
  • Compares FNN and LSTM architectures to capture historical effects in particle motion.
  • Demonstrates the effectiveness of the proposed method through numerical experiments in various flow fields.
Read more
Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems
Tolga Dimlioglu, Nadine Chang, Maying Shen, Rafid Mahmood, Jose M. Alvarez
Optimization Robotics Efficient ML
  • MOSAIC optimizes data selection by clustering data into domains and modeling their impact on performance metrics.
  • The framework significantly reduces the amount of data needed for training while improving model performance.
  • MOSAIC outperforms traditional data selection methods, achieving better results with up to 82% less data.
  • The approach is robust across different clustering strategies and emphasizes the importance of scaling laws in data selection.
Read more
Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training
Jinhong Lin, Pan Wang, Zitong Zhan, Lin Zhang, Pedro Morgado
Generative Models Computer Vision Efficient ML
  • Data Warmup addresses inefficiencies in diffusion training by aligning data complexity with model readiness.
  • A semantic-aware complexity metric is introduced, combining foreground dominance and typicality for image scoring.
  • The curriculum's simple-to-complex ordering is critical for performance improvements, as reversing it degrades results.
  • Data Warmup significantly improves IS by up to 6.11 and FID by up to 3.41 on ImageNet datasets.
Read more
Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge
Wonseon Lim, Jaesung Lee, Dae-Won Kim
Computer Vision Efficient ML Robotics
  • CPS-Prompt improves training-time efficiency for continual learning on edge devices.
  • The framework reduces memory usage and computational cost with minimal accuracy loss.
  • Critical Patch Sampling (CPS) and Decoupled Prompt and Classifier Training (DPCT) are the two main components.
  • CPS-Prompt shows significant improvements in peak memory and energy efficiency over existing methods.
Read more
EgoEverything: A Benchmark for Human Behavior Inspired Long Context Egocentric Video Understanding in AR Environment
Qiance Tang, Ziqi Wang, Jieyu Lin, Ziyun Li, Barbara De Salvo, Sai Qian Zhang
Computer Vision Multimodal
  • EgoEverything benchmark incorporates human attention signals for question generation.
  • It includes over 5,000 question-answer pairs based on realistic AR scenarios.
  • The methodology employs a multi-agent VQA pipeline and attention-inspired sampling.
  • Evaluation shows current VLMs struggle with the complexities of real-world AR interactions.
Read more
Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions
Jing Wang, Yu-Yang Qian, Ke Xue, Chao Qian, Peng Zhao, Zhi-Hua Zhou
Large Language Models NLP Efficient ML
  • Output length prediction is critical for efficient LLM serving and resource allocation.
  • Existing methods treat output length as a deterministic scalar, which is misaligned with the heavy-tailed, prompt-conditioned distributions observed in practice.
  • The proposed ProD framework captures the heavy-tailed nature of output length distributions.
  • ProD-M and ProD-D provide robust point and distributional predictions, respectively.
Read more
SOLAR: Communication-Efficient Model Adaptation via Subspace-Oriented Latent Adapter Reparametrization
Seyed Mahmoud Sajjadi Mohammadabadi, Xiaolong Ma, Lei Yang, Feng Yan, Junshan Zhang
Efficient ML Federated Learning Large Language Models
  • SOLAR significantly reduces communication and storage costs of PEFT adapters.
  • The method utilizes subspace similarity to create compact adapter representations.
  • It is model-agnostic and compatible with existing PEFT methods.
  • The framework allows for post-training compression without modifying the fine-tuning process.
Read more
Provably Adaptive Linear Approximation for the Shapley Value and Beyond
Weida Li, Yaoliang Yu, Bryan Kian Hsiang Low
Theory Efficient ML Interpretability
  • Establishes a theoretical framework for approximating semi-values with improved query complexities.
  • Develops a linear-space algorithm requiring O(n/ε² log(1/δ)) utility queries.
  • Introduces Adalina, an adaptive algorithm that minimizes mean square error in linear time and space.
  • Bridges existing algorithms and clarifies the benefits of paired sampling (sketched below).
Read more
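Paired sampling is easiest to see in the classic permutation estimator for the Shapley value: evaluate each sampled permutation forward and reversed, which cancels part of the variance. A minimal sketch; the paper's Adalina algorithm is an adaptive refinement well beyond this.

```python
import numpy as np

def shapley_paired_mc(utility, n: int, samples: int = 400, seed: int = 0):
    """Permutation-based Shapley estimator with paired sampling. `utility`
    maps a boolean inclusion mask over the n players to a scalar value."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(samples // 2):
        perm = rng.permutation(n)
        for order in (perm, perm[::-1]):      # the "paired" trick
            mask = np.zeros(n, dtype=bool)
            prev = utility(mask)
            for i in order:
                mask[i] = True
                cur = utility(mask)
                phi[i] += cur - prev          # marginal contribution of i
                prev = cur
    return phi / samples

# Toy additive game: the estimate recovers the weights exactly.
w = np.array([3.0, 1.0, -2.0, 0.5])
print(shapley_paired_mc(lambda m: w[m].sum(), n=4))
```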
GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control
Prakul Sunil Hiremath
Reinforcement Learning Generative Models Robotics
  • GIRL addresses imagination drift in MBRL through cross-modal grounding and trust-region constraints.
  • The framework utilizes a frozen DINOv2 model to ensure semantic consistency in imagined trajectories.
  • GIRL shows a 38-61% reduction in latent rollout drift compared to DreamerV3.
  • It achieves higher asymptotic returns with 40-55% fewer environment steps on long-horizon tasks.
Read more