AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

61 Papers today
8h Update frequency
7 Days of history
LongFlow: Efficient KV Cache Compression for Reasoning Models
Yi Su, Zhenxu Tian, Dan Qiao, Yuechi Zhou, Juntao Li, Min Zhang
NLP Large Language Models Efficient ML
  • Introduction of LongFlow, a lightweight KV cache compression algorithm tailored for long-output generation.
  • Development of an efficient importance estimation metric that requires negligible computational overhead.
  • Creation of a custom Triton kernel that fuses multiple operations to enhance performance.
  • Demonstration of significant improvements in throughput and KV cache size reduction with minimal accuracy loss.
Read more
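The general pattern behind importance-based KV cache compression can be sketched as follows. This is a generic illustration, not LongFlow's actual metric or kernel: it scores each cached token by its accumulated attention mass and evicts all but the top-k entries.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_weights, keep: int):
    """Generic score-based KV cache compression (illustrative only):
    score each cached token by its accumulated attention mass and
    keep only the top-`keep` entries, preserving original order.

    keys, values: (seq_len, head_dim) cached tensors
    attn_weights: (num_queries, seq_len) attention probabilities
    """
    scores = attn_weights.sum(axis=0)           # importance per cached token
    kept = np.sort(np.argsort(scores)[-keep:])  # top-k indices, original order
    return keys[kept], values[kept], kept

rng = np.random.default_rng(0)
K, V = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
logits = rng.normal(size=(8, 128))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
k2, v2, idx = compress_kv_cache(K, V, attn, keep=32)
print(k2.shape)  # (32, 64)
```

A real implementation would fuse the scoring and gather steps into a single GPU kernel, which is where the summarized paper's Triton work would come in.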
Security Considerations for Artificial Intelligence Agents
Ninghui Li, Kaiyuan Zhang, Kyle Polley, Jerry Ma
Large Language Models Theory Optimization
  • AI agent systems introduce unique security vulnerabilities distinct from traditional software due to the blurring of code and data.
  • Current security mechanisms are often mismatched for the autonomous and adaptable nature of AI agents.
  • The paper identifies critical attack surfaces and emphasizes the need for layered defense strategies.
  • There is a significant gap in standards and research for secure multi-agent system design.
Read more
AutoScout: Structured Optimization for Automating ML System Configuration
Jimmy Shong, Yuhan Ding, Yihan Jiang, Liheng Jing, Haonan Chen, Gaokai Zhang, Aditya Akella, Fan Lai
Optimization Efficient ML
  • AutoScout addresses the challenges of optimizing ML system configurations in a mixed-discrete/continuous space.
  • It employs a hybrid optimization framework that combines tree-based search and gradient-guided optimization.
  • The system achieves 2.7–3.0× training speedup compared to expert-tuned settings.
  • AutoScout is 13.7–16.5× faster than existing system configurators in generating high-performance configurations.
Read more
Disentangled Representation Learning through Unsupervised Symmetry Group Discovery
Dang-Nhu Barthélémy, Annabi Louis, Argentieri Sylvain
Reinforcement Learning Robotics Theory
  • Introduces a method for unsupervised discovery of symmetry groups in representation learning.
  • Proves the identifiability of symmetry group decomposition under minimal assumptions.
  • Develops algorithms for both symmetry group discovery and LSBD representation learning.
  • Demonstrates improved performance over existing LSBD methods in various environments.
Read more
Separable neural architectures as a primitive for unified predictive and generative intelligence
Reza T. Batley, Apurba Sarker, Rajib Mostakim, Andrew Klichine, Sourav Saha
Generative Models Reinforcement Learning Theory
  • Introduces separable neural architectures (SNA) as a framework for exploiting factorisable structures in intelligent systems.
  • Demonstrates the ability of SNAs to unify predictive and generative modeling across various domains.
  • Highlights the application of SNAs in reinforcement learning, inverse generation, turbulent flow modeling, and language modeling.
  • Establishes SNAs as a lightweight architecture capable of real-time operation with minimal parameters.
Read more
Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
Sai V R Chereddy
Computer Vision Interpretability
  • Video models can represent nuanced action outcomes even when final classifications are correct.
  • Mechanistic interpretability techniques reveal distinct internal mechanisms in video models.
  • Attention heads gather evidence while MLP blocks compose concepts for action outcomes.
  • The model's internal representation showcases hidden knowledge beyond explicit tasks.
Read more
KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation
Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang
NLP Large Language Models Graph Learning
  • KEPo is a novel poisoning attack method specifically designed for GraphRAG systems.
  • The method generates toxic events and manipulates knowledge evolution paths to poison the KG.
  • KEPo outperforms existing poisoning attack methods in both single-target and multi-target scenarios.
  • The research exposes significant security vulnerabilities in GraphRAG frameworks.
Read more
Algorithmic Capture, Computational Complexity, and Inductive Bias of Infinite Transformers
Orit Davidovich, Zohar Ringel
Theory Large Language Models NLP
  • Formal definition of Algorithmic Capture and its implications for neural networks.
  • Transformers exhibit an inductive bias towards low-complexity algorithms, limiting their ability to learn higher-complexity tasks.
  • Upper bounds on inference-time complexity show that infinite-width transformers cannot capture algorithms with heuristic complexity beyond O(T^(2+ϵ)).
  • The study contrasts statistical learning with genuine algorithmic learning, emphasizing the importance of generalization to large problem sizes.
Read more
Language Generation with Replay: A Learning-Theoretic View of Model Collapse
Giorgio Racca, Michal Valko, Amartya Sanyal
NLP Large Language Models Theory
  • Introduces a learning-theoretic framework to analyze model collapse in LLMs.
  • Defines a replay adversary that simulates the re-entry of generated text into training data.
  • Demonstrates that replay affects generatability differently across various definitions.
  • Aligns theoretical results with practical strategies like data cleaning and watermarking.
Read more
Harnessing Data Asymmetry: Manifold Learning in the Finsler World
Thomas Dagès, Simon Weber, Daniel Cremers, Ron Kimmel
Theory Optimization
  • Introduction of Finsler geometry to capture asymmetric dissimilarities in manifold learning.
  • Development of a Finsler manifold learning pipeline that broadens the applicability of asymmetric embeddings.
  • Generalization of existing methods like t-SNE and UMAP to accommodate asymmetric data.
  • Empirical results show superior performance of Finsler embeddings over traditional Euclidean methods.
Read more
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh
Reinforcement Learning Robotics Optimization
  • Introduction of cross-domain Bellman consistency to measure model transferability.
  • Development of QAvatar framework for knowledge transfer between domains with distinct state and action spaces.
  • Demonstration of QAvatar's convergence properties and sample efficiency improvements.
  • Experimental validation showing superior performance of QAvatar over existing CDRL benchmarks.
Read more
Effective Resistance Rewiring: A Simple Topological Correction for Over-Squashing
Bertran Miquel-Oliver, Manel Gil-Sorribes, Victor Guallar, Alexis Molina
Graph Learning
  • Introduces Effective Resistance Rewiring (ERR) to combat over-squashing in GNNs.
  • Utilizes effective resistance as a global measure to identify structural bottlenecks.
  • Demonstrates a trade-off between over-squashing and oversmoothing in GNNs.
  • Combines ERR with normalization techniques to improve model performance.
Read more
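Effective resistance is a standard graph quantity computable from the Laplacian pseudoinverse; a minimal sketch (the rewiring policy built on top of it is this paper's contribution and is not reproduced here):

```python
import numpy as np

def effective_resistance(adj):
    """Pairwise effective resistance from the Laplacian pseudoinverse L+:
    R(u, v) = L+[u,u] + L+[v,v] - 2 * L+[u,v]."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    L_pinv = np.linalg.pinv(laplacian)
    d = np.diag(L_pinv)
    return d[:, None] + d[None, :] - 2 * L_pinv

# Path graph 0-1-2-3: three unit edges in series, so R(0, 3) = 3.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
R = effective_resistance(A)
print(round(R[0, 3], 6))  # 3.0
```

An ERR-style rewiring would then add edges between node pairs with the highest resistance, since those pairs sit across the graph's structural bottlenecks.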
Sharpness-Aware Minimization for Generalized Embedding Learning in Federated Recommendation
Fengyuan Yu, Xiaohua Feng, Yuyuan Li, Changwang Zhang, Jun Wang, Chaochao Chen
Federated Learning Optimization
  • FedRecGEL reformulates federated recommendation as a multi-task learning problem focused on generalized item embeddings.
  • Sharpness-aware minimization is utilized to address the generalization challenges in embedding learning.
  • The proposed framework stabilizes the training process and enhances recommendation performance.
  • Extensive experiments show significant performance improvements over existing federated recommendation methods.
Read more
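Sharpness-aware minimization (SAM) itself follows a well-known two-step recipe: ascend to the approximate worst case within a small L2 ball, then descend using the gradient taken there. A minimal sketch on a toy quadratic loss (the federated and embedding-specific machinery of FedRecGEL is not modeled):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: perturb weights toward the approximate worst case
    within an L2 ball of radius rho, then descend using the gradient
    evaluated at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
    g_sharp = grad_fn(w + eps)                   # gradient at worst-case point
    return w - lr * g_sharp

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
grad = lambda w: w
w = np.array([2.0, -1.0])
for _ in range(100):
    w = sam_step(w, grad)
print(np.linalg.norm(w) < 1e-2)  # True
```

The extra gradient evaluation per step is the price paid for seeking flat minima, which is the generalization mechanism the summary refers to.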
On the Role of Reversible Instance Normalization
Gaspard Berthelier, Tahar Nabil, Etienne Le Naour, Richard Niamke, Samir Perlaza, Giovanni Neglia
Time Series
  • Identification of three key challenges in normalization for time series forecasting: temporal, spatial, and conditional distribution shifts.
  • Ablation studies reveal redundancies and limitations in the components of Reversible Instance Normalization (RevIN).
  • The paper critiques the effectiveness of RevIN in addressing distribution shifts, challenging its widespread adoption.
  • Proposes new perspectives for improving normalization strategies tailored to time series data.
Read more
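For readers unfamiliar with RevIN, the mechanism under discussion is simple: normalize each series with its own statistics before the model, then invert the transform on the forecast. A minimal sketch with an identity "forecaster" standing in for the model:

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """RevIN forward pass: normalize each series with its own mean/std
    and return the statistics so the forecast can be mapped back."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True) + eps
    return (x - mu) / sigma, (mu, sigma)

def revin_denormalize(y, stats):
    mu, sigma = stats
    return y * sigma + mu

x = np.array([[10.0, 12.0, 14.0, 16.0]])  # one series on a shifted scale
x_norm, stats = revin_normalize(x)
model_output = x_norm                     # placeholder "forecast"
restored = revin_denormalize(model_output, stats)
print(np.allclose(restored, x))  # True
```

The paper's critique targets exactly this per-instance mean/std assumption, which cannot capture spatial or conditional distribution shifts.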
STAMP: Selective Task-Aware Mechanism for Text Privacy
Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon
NLP Large Language Models Theory
  • STAMP selectively allocates privacy budgets based on token importance and sensitivity.
  • Introduces the polar mechanism for perturbing token embeddings directionally.
  • Maintains semantic neighborhoods in embedding space, enhancing downstream utility.
  • Demonstrates superior performance on multiple text datasets compared to traditional methods.
Read more
Frequentist Consistency of Prior-Data Fitted Networks for Causal Inference
Valentyn Melnychuk, Vahid Balazadeh, Stefan Feuerriegel, Rahul G. Krishnan
Theory
  • PFNs can exhibit prior-induced confounding bias, preventing frequentist consistency.
  • A one-step posterior correction (OSPC) is proposed to restore frequentist consistency.
  • The OSPC leads to a semi-parametric Bernstein-von Mises theorem for calibrated PFNs.
  • Martingale posteriors are utilized to implement the OSPC effectively.
Read more
A Learning-Based Superposition Operator for Non-Renewal Arrival Processes in Queueing Networks
Eliran Sherzer
Theory Efficient ML Time Series
  • Introduces a learning-based superposition operator for non-renewal arrival processes.
  • Utilizes deep learning to map statistical descriptors of arrival streams to their superposition.
  • Demonstrates significant performance improvements over classical renewal-based methods.
  • Enables accurate distributional performance analysis in queueing networks.
Read more
Slack More, Predict Better: Proximal Relaxation for Probabilistic Latent Variable Model-based Soft Sensors
Zehua Zou, Yiran Ma, Yulong Zhang, Zhengnan Li, Zeyu Yang, Jinhao Xie, Xiaoyu Jiang, Zhichao Chen
Generative Models Optimization Theory
  • Introduction of KProxNPLVM to improve soft sensor modeling accuracy.
  • Use of Wasserstein distance as a proximal operator to relax the learning objective.
  • Rigorous derivation and proof of convergence for the proposed optimization algorithm.
  • Extensive experimental validation on synthetic and real-world datasets.
Read more
Monitoring and Prediction of Mood in Elderly People during Daily Life Activities
Daniel Bautista-Salinas, Joaquín Roca González, Inmaculada Méndez, Oscar Martinez Mozos
Time Series
  • Development of a wearable system for mood monitoring in elderly people.
  • Utilization of ecological momentary assessment (EMA) for real-time mood tracking.
  • Machine learning classifier trained on physiological data from a wristband.
  • Promising results in mood prediction accuracy, especially for happiness and activeness.
Read more
Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study
Mohammad Shihab Uddin, Md Hasibul Amin, Nusrat Jahan Ema, Bushra Uddin, Tanvir Ahmed, Arif Hassan Zidan
NLP
  • Explores financial fraud detection in a multilingual Bangla-English context.
  • Compares classical machine learning models with transformer-based architectures.
  • Highlights the impact of linguistic characteristics and low-resource constraints on model performance.
  • Demonstrates that classical models can outperform transformers in certain metrics.
Read more
Bayesian Optimization of Partially Known Systems using Hybrid Models
Eike Cramer, Luis Kutschat, Oliver Stollenwerk, Joel A. Paulson, Alexander Mitsos
Optimization
  • Introduces a hybrid Bayesian optimization framework that combines mechanistic models with probabilistic Gaussian processes.
  • Demonstrates significant improvements in convergence speed and optimization quality over traditional Bayesian optimization methods.
  • Applies the hybrid model to a single-stage distillation optimization, achieving better designs with fewer iterations.
  • Transforms the optimization problem into a constrained, nonlinear stochastic program for effective solution.
Read more
Structure-Aware Epistemic Uncertainty Quantification for Neural Operator PDE Surrogates
Haoze Song, Zhihao Li, Mengyi Deng, Xin Li, Duyi Pan, Zhilu Lai, Wei Wang
Theory Optimization Efficient ML
  • Introduces a structure-aware UQ scheme for neural operator PDE surrogates.
  • Focuses on perturbations in the lifting module to improve uncertainty estimates.
  • Demonstrates improved reliability and tighter uncertainty bands in experiments.
  • Addresses the limitations of existing UQ methods like MCDropout and Deep Ensembles.
Read more
Exponential-Family Membership Inference: From LiRA and RMIA to BaVarIA
Rickard Brännvall
Theory
  • Unification of LiRA, RMIA, and BASE under a single exponential-family framework.
  • Introduction of BaVarIA, a Bayesian approach that improves variance estimation.
  • Empirical results show BaVarIA outperforms existing methods in low-shadow-model budgets.
  • The framework clarifies the relationship between different MIA methods and their assumptions.
Read more
ARROW: Augmented Replay for RObust World models
Abdulaziz Alyahya, Abdallah Al Siyabi, Markus R. Ernst, Luke Yang, Levin Kuhlmann, Gideon Kowadlo
Reinforcement Learning Robotics Efficient ML
  • ARROW introduces a dual-buffer system for memory-efficient continual reinforcement learning.
  • The algorithm significantly reduces catastrophic forgetting compared to traditional methods.
  • ARROW maintains comparable forward transfer while preserving task diversity.
  • The approach is inspired by neuroscience, leveraging principles of memory systems.
Read more
Multi-Task Anti-Causal Learning for Reconstructing Urban Events from Residents' Reports
Liangkai Zhou, Susu Xu, Shuqi Zhong, Shan Lin
Theory
  • Introduces a novel framework (MTAC) for anti-causal learning in multi-task settings.
  • Utilizes a structured multi-task structural equation model to separate task-invariant and task-specific causal mechanisms.
  • Implements MAP-based inference for cause reconstruction from observed outcomes.
  • Demonstrates significant improvements in urban event reconstruction accuracy using real-world data.
Read more
Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability
Xingyu Xie, Zhaochen Yu, Yue Liao, Tao Wang, Kim-Chuan Toh, Shuicheng Yan
NLP Large Language Models Efficient ML
  • Identification of within-sentence support stability, where attention support remains stable over short coherent spans.
  • Introduction of Slow-Fast Inference (SFI) framework that alternates between low-cost fast steps and dense slow steps.
  • Development of a training-free Selector that converts dense-attention evidence into reusable memory.
  • Achieved significant throughput improvements (1.6× to 14.4×) without retraining, maintaining quality on par with full-KV baselines.
Read more
Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
Reinforcement Learning Robotics Multimodal
  • Simple Sequential Fine-Tuning (Seq. FT) with LoRA achieves high performance in continual learning for VLA models.
  • Contrary to previous beliefs, Seq. FT exhibits little to no catastrophic forgetting.
  • The synergy between pretrained models, parameter-efficient adaptation, and on-policy RL enhances stability and plasticity.
  • The study provides a principled starting point for scalable lifelong embodied intelligence.
Read more
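The parameter-efficient adaptation the summary credits is standard LoRA: a frozen base weight plus a trained low-rank update, so each sequentially learned task touches only a small adapter. A minimal sketch (the VLA models and RL loop are not modeled):

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a low-rank update: W + (alpha/r) * B @ A.
    Only A and B are trained, so sequential fine-tuning leaves the base
    model untouched and each task keeps its own small adapter."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                 # frozen base weight
        self.A = rng.normal(0, 0.01, (r, W.shape[1]))
        self.B = np.zeros((W.shape[0], r))         # zero-init: no update at start
        self.scale = alpha / r

    def forward(self, x):
        return (self.W + self.scale * self.B @ self.A) @ x

W = np.eye(3)
layer = LoRALinear(W)
x = np.ones(3)
print(np.allclose(layer.forward(x), W @ x))  # True: B = 0 means no change yet
```

For a d×d layer the adapter holds 2·r·d parameters instead of d², which is why forgetting stays confined to a small, swappable subspace.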
Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives
Taeho Lee, Donghwan Lee
Reinforcement Learning Robotics Optimization
  • Introduction of MMDDPG framework for robust policy learning in continuous control tasks.
  • Formulation of a minimax optimization problem between user and adversarial policies.
  • Use of a fractional objective to stabilize the interaction and prevent excessive disturbances.
  • Demonstrated improved robustness in experimental evaluations against external disturbances.
Read more
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
Indranil Halder, Annesya Banerjee, Cengiz Pehlevan
NLP Large Language Models Theory
  • Adversarial prompt injection can significantly amplify the attack success rate of LLMs.
  • The scaling of attack success rates transitions from polynomial to exponential based on the length of injected prompts.
  • A theoretical model based on spin-glass theory provides insights into the dynamics of adversarial attacks on LLMs.
  • Empirical validation of the model's predictions was conducted using various large language models.
Read more
Single molecule localization microscopy challenge: a biologically inspired benchmark for long-sequence modeling
Fatemeh Valeh, Monika Farsang, Radu Grosu, Gerhard Schütz
Computer Vision Time Series Theory
  • Introduction of SMLM-C as a benchmark for evaluating long-sequence models in biological imaging.
  • Focus on the unique challenges of sparse, irregular, and heavy-tailed temporal processes in SMLM data.
  • Evaluation of state space models S5 and Mamba reveals significant limitations in handling extreme sparsity and noise.
  • Highlights the necessity for methodological advancements in sequence modeling for biological applications.
Read more
Causal Representation Learning with Optimal Compression under Complex Treatments
Wanting Liang, Haoang Chi, Zhiheng Zhang
Theory Efficient ML Generative Models
  • Introduces a novel estimator for optimal balancing weight α, eliminating heuristic tuning.
  • Proposes Treatment Aggregation strategy for O(1) scalability in multi-treatment settings.
  • Extends the framework to a generative architecture preserving Wasserstein geodesic structure.
  • Demonstrates significant improvements in estimation accuracy and efficiency over traditional models.
Read more
Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
Mohammad Aflah Khan, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander
NLP Large Language Models Efficient ML
  • Partial RoPE can achieve comparable convergence to full RoPE with significant memory savings.
  • Applying RoPE to only 10% of dimensions maintains training stability across various architectures.
  • Higher-quality data correlates with lower loss and similar performance benchmarks.
  • Models without positional encoding may face instability, which can be addressed with minimal RoPE.
Read more
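Partial RoPE can be sketched directly: rotate only the first fraction of head dimensions and pass the rest through untouched. A minimal illustration (the exact dimension split and frequency schedule studied in the paper may differ):

```python
import numpy as np

def partial_rope(x, positions, frac=0.1, base=10000.0):
    """Apply rotary position embedding (RoPE) to only the first `frac`
    of the head dimensions; remaining dimensions pass through unrotated.

    x: (seq_len, head_dim), positions: (seq_len,)
    """
    d = x.shape[-1]
    d_rot = max(2, int(d * frac)) // 2 * 2           # even count of rotated dims
    inv_freq = base ** (-np.arange(0, d_rot, 2) / d_rot)
    ang = positions[:, None] * inv_freq[None, :]     # (seq_len, d_rot // 2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0:d_rot:2], x[:, 1:d_rot:2]
    out = x.copy()
    out[:, 0:d_rot:2] = x1 * cos - x2 * sin          # 2-D rotation per pair
    out[:, 1:d_rot:2] = x1 * sin + x2 * cos
    return out

x = np.ones((4, 64))
y = partial_rope(x, np.arange(4), frac=0.25)
print(np.allclose(y[:, 16:], 1.0))  # True: unrotated dims are unchanged
```

The memory saving comes from caching rotation tables for only d_rot dimensions rather than the full head dimension.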
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models
Jae-Won Chung, Jeff J. Ma, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury
Multimodal
  • Cornserve is the first distributed serving system specifically for Any-to-Any multimodal models.
  • It allows for flexible task abstractions and model fission, enabling independent scaling of model components.
  • The system utilizes a record-and-replay execution model for efficient data management.
  • Cornserve is built on Kubernetes and supports a variety of multimodal models.
Read more
Geometry-Aware Probabilistic Circuits via Voronoi Tessellations
Sahil Sidheekh, Sriraam Natarajan
Generative Models Theory Interpretability
  • Introduces Voronoi tessellations as a method to enhance probabilistic circuits with geometric awareness.
  • Formalizes the incompatibility between Voronoi-based routing and tractable inference in PCs.
  • Develops two solutions: approximate inference with certified bounds and a structural condition for exact inference.
  • Presents Hierarchical Factorized Voronoi circuits that enable tractable inference.
Read more
Beyond Barren Plateaus: A Scalable Quantum Convolutional Architecture for High-Fidelity Image Classification
Radhakrishnan Delhibabu
Computer Vision Theory Efficient ML
  • Introduction of a scalable QCNN architecture that mitigates barren plateaus.
  • Achieved a classification accuracy of 98.7% on the MNIST dataset.
  • Demonstrated a significant reduction in the number of required trainable parameters compared to classical CNNs.
  • Utilized localized cost functions and tensor-network initialization to enhance training efficiency.
Read more
UniHetCO: A Unified Heterogeneous Representation for Multi-Problem Learning in Unsupervised Neural Combinatorial Optimization
Kien X. Nguyen, Ilya Safro
Optimization Graph Learning
  • Introduces a unified heterogeneous graph representation for multiple combinatorial optimization problems.
  • Employs a gradient-norm-based dynamic weighting scheme to address gradient imbalance during training.
  • Demonstrates competitive performance against state-of-the-art unsupervised NCO methods.
  • Shows strong cross-problem adaptation potential and effective warm starts for classical solvers.
Read more
Heavy-Tailed Principal Component Analysis
Mario Sayde, Christopher Khater, Jihad Fahs, Ibrahim Abou-Faycal
Theory
  • Introduces a robust PCA framework for heavy-tailed data using a superstatistical model.
  • Utilizes a logarithmic loss function to maintain performance without relying on finite variance assumptions.
  • Demonstrates that principal components from heavy-tailed data coincide with those from Gaussian covariance.
  • Proposes new robust covariance estimators that outperform classical methods in challenging noise conditions.
Read more
Inverse Neural Operator for ODE Parameter Optimization
Zhi-Song Liu, Wenqing Peng, Helmi Toropainen, Ammar Kheder, Andreas Rupp, Holger Froning, Xiaojie Lin, Michael Boy
Optimization Theory Time Series
  • Introduction of the Inverse Neural Operator (INO) framework for ODE parameter recovery.
  • Utilization of Conditional Fourier Neural Operator (C-FNO) with cross-attention to enhance trajectory reconstruction.
  • Development of Amortized Drifting Model (ADM) to stabilize parameter optimization without backpropagation.
  • Demonstrated superior performance in parameter recovery accuracy and inference speed compared to traditional methods.
Read more
Personalized Federated Learning via Gaussian Generative Modeling
Peng Hu, Jianwei Ma
Federated Learning Generative Models
  • Introduction of pFedGM, a method for personalized federated learning using Gaussian generative modeling.
  • Focus on balancing global collaboration and personalization through a dual objective approach.
  • Decoupling of the Gaussian classifier into a navigator and a statistic extractor to enhance representation learning.
  • Utilization of Bayesian inference for class probability estimation based on representation distributions.
Read more
Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization
Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao
Reinforcement Learning Robotics Theory
  • H-EARS framework combines potential-based reward shaping with energy-aware action regularization.
  • Achieves linear modeling complexity by focusing on dominant energy components.
  • Establishes a theoretical foundation for optimizing task performance and energy efficiency.
  • Demonstrates significant improvements in convergence speed and stability in empirical tests.
Read more
Bridging Discrete Marks and Continuous Dynamics: Dual-Path Cross-Interaction for Marked Temporal Point Processes
Yuxiang Liu, Qiao Liu, Tong Luo, Yanglei Gan, Peng He, Yao Liu
Time Series
  • NEXTPP integrates discrete event marks and continuous dynamics for improved event prediction.
  • The model employs a dual-channel architecture with self-attention and Neural ODEs.
  • Cross-attention enables bidirectional interaction between discrete and continuous representations.
  • Extensive experiments show NEXTPP outperforms existing models on real-world datasets.
Read more
Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control
Ihor Kendiukhov
Interpretability
  • Exhaustive circuit tracing reveals 1,393,850 significant edges, highlighting massive redundancy in feature interactions.
  • A heavy-tailed hub architecture is identified, with 1.8% of features accounting for disproportionate connectivity.
  • Systematic annotation bias is exposed, with 40% of top hubs lacking biological annotation.
  • Redundancy in feature interactions increases with interaction order, confirming a fundamentally subadditive architecture.
Read more
Duration Aware Scheduling for ASR Serving Under Workload Drift
Darshan Makwana, Yash Jogi, Harsh Kotta, Aayush Kubba
Audio & Speech Optimization Efficient ML
  • Duration-aware scheduling can significantly reduce end-to-end latency in ASR systems.
  • SJF (shortest-job-first) reduces median latency by up to 73% but can cause increased tail latency.
  • HRRN (highest-response-ratio-next) provides a balanced approach, improving median latency while controlling tail latency degradation.
  • Both scheduling algorithms incur less than 0.1 ms overhead per request.
Read more
Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
Yuval Ran-Milo
Theory NLP Large Language Models
  • Attention sinks are necessary for softmax Transformers to compute certain trigger-conditional tasks.
  • Normalization in softmax attention forces attention to concentrate on a fixed position, leading to sink behavior.
  • ReLU attention can solve the same tasks without inducing attention sinks, indicating the role of normalization.
  • The findings have implications for understanding attention mechanisms in various contexts, including multimodal and vision tasks.
Read more
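The normalization argument can be seen in a toy comparison: softmax must place its unit of attention mass somewhere even when no token is relevant, whereas un-normalized ReLU attention can abstain. This is only an illustration of the mechanism, not the paper's formal construction:

```python
import numpy as np

logits = np.full(8, -5.0)          # no token is relevant
softmax = np.exp(logits) / np.exp(logits).sum()
relu = np.maximum(logits, 0.0)     # un-normalized ReLU attention

print(softmax.sum())   # 1.0 — normalization forces mass somewhere
print(relu.sum())      # 0.0 — ReLU attention can abstain entirely
```

Under softmax, that forced mass tends to land on a fixed position such as the first token, which is the sink behavior the paper proves necessary.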
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich
NLP Large Language Models Reinforcement Learning
  • Introduces a feature-matching loss for language model fine-tuning targeting sequence-level statistics.
  • Proposes Energy-Based Fine-Tuning (EBFT) as a method to optimize the feature-matching loss.
  • Demonstrates that EBFT outperforms traditional supervised fine-tuning and matches RLVR in downstream tasks.
  • Highlights the limitations of token-level supervision and the need for sequence-level calibration.
Read more
CAETC: Causal Autoencoding and Treatment Conditioning for Counterfactual Estimation over Time
Nghia D. Nguyen, Pablo Robles-Granda, Lav R. Varshney
Time Series Theory Optimization
  • CAETC addresses time-dependent confounding bias in counterfactual estimation.
  • The method is model-agnostic and can be applied to various sequence architectures.
  • An entropy maximization adversarial game is introduced to ensure balanced representation.
  • CAETC shows significant improvements over existing methods in empirical evaluations.
Read more
The Latent Color Subspace: Emergent Order in High-Dimensional Chaos
Mateusz Pach, Jessica Bader, Quentin Bouniot, Serge Belongie, Zeynep Akata
Generative Models Computer Vision Interpretability
  • Introduction of the Latent Color Subspace (LCS) in the VAE latent space of FLUX, reflecting HSL representation.
  • Demonstration of a training-free method for color intervention in image generation.
  • Validation of the LCS interpretation through accurate prediction and control of color in generated images.
  • Facilitation of fine-grained color control over specific objects in images using semantic segmentation.
Read more
Deep Learning Network-Temporal Models For Traffic Prediction
Yufeng Xin, Ethan Fan
Time Series Graph Learning Large Language Models
  • Introduction of two deep learning models for multivariate time series prediction in network traffic.
  • The GAT model captures both temporal and topological correlations, while the LLM model enhances generalization.
  • Extensive performance evaluations demonstrate the superiority of the LLM model over traditional methods.
  • Insights into prediction variance and correlation variability are provided, emphasizing the need for robust evaluation metrics.
Read more
Teleodynamic Learning: A New Paradigm for Interpretable AI
Enrique ter Horst, Juan Diego Zambrano
Theory Interpretability Optimization
  • Teleodynamic Learning redefines learning as a dynamic process rather than static optimization.
  • The framework incorporates both continuous and discrete adaptations, reflecting biological learning processes.
  • DE11, the proposed teleodynamic learner, achieves high accuracy on benchmark datasets while providing interpretable outputs.
  • The approach emphasizes the co-evolution of structure, parameters, and resources under constraints.
Read more
Causal Matrix Completion under Multiple Treatments via Mixed Synthetic Nearest Neighbors
Minrui Luo, Zhiheng Zhang
Theory Efficient ML
  • Introduction of Mixed Synthetic Nearest Neighbors (MSNN) for causal matrix completion under MNAR conditions.
  • MSNN integrates data across multiple treatment levels, enhancing sample efficiency for sparse treatments.
  • The method retains the finite-sample error bounds and asymptotic normality of the original SNN estimator.
  • Empirical results show MSNN's effectiveness in estimating causal effects in data-scarce environments.
Read more
A Quantitative Characterization of Forgetting in Post-Training
Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan
Generative Models Theory Optimization
  • Introduces a theoretical framework for understanding forgetting in continual learning.
  • Distinguishes between mass forgetting and old-component drift in generative models.
  • Shows that forward-KL objectives lead to mass forgetting, while reverse-KL objectives help retain old knowledge.
  • Quantifies the impact of replay mechanisms on forgetting dynamics.
Read more
EnTransformer: A Deep Generative Transformer for Multivariate Probabilistic Forecasting
Rajdeep Pathak, Rahul Goswami, Madhurima Panja, Palash Ghosh, Tanujit Chakraborty
Time Series Generative Models
  • EnTransformer is a new generative Transformer framework for multivariate probabilistic forecasting.
  • It integrates the engression principle to learn predictive distributions without restrictive assumptions.
  • The model effectively captures long-range dependencies and cross-series interactions.
  • Empirical evaluations show that EnTransformer outperforms benchmark models across multiple datasets.
Read more
Reference-Guided Machine Unlearning
Jonas Mirlach, Sonia Laguna, Julia E. Vogt
Computer Vision Theory Efficient ML
  • REGUN introduces a structured approach to machine unlearning using held-out data as a reference.
  • The framework emphasizes distributional indistinguishability between forgotten and unseen data.
  • REGUN outperforms traditional unlearning methods that rely on performance degradation heuristics.
  • Empirical validation shows improved forgetting-utility trade-offs across multiple architectures and datasets.
Read more
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
Xiangwen Wang, Ananth Balashankar, Varun Chandrasekaran
NLP Large Language Models Optimization
  • Introduces a scaling-law framework for analyzing jailbreak attacks on LLMs.
  • Demonstrates that prompting-based methods are more compute-efficient than optimization-based methods.
  • Establishes a relationship between attacker budget and attack success using a saturating exponential fit.
  • Identifies distinct success-stealthiness operating points for different attack paradigms.
Read more
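The budget-success relationship described above can be written as ASR(b) = asr_max * (1 - exp(-rate * b)). The sketch below uses hypothetical parameters and synthetic measurements, not values from the paper, and recovers the rate by a simple log-linear least-squares fit.

```python
import numpy as np

def asr_model(budget, asr_max, rate):
    """Saturating exponential: success rises with budget, plateaus at asr_max."""
    return asr_max * (1.0 - np.exp(-rate * budget))

# Hypothetical (budget, ASR) measurements generated from the model itself.
budgets = np.array([1.0, 5.0, 20.0, 100.0, 500.0])
asr = asr_model(budgets, asr_max=0.8, rate=0.02)

# With asr_max known, -log(1 - asr/asr_max) is linear in budget, so the
# rate falls out of a one-parameter least-squares fit.
y = -np.log(1.0 - asr / 0.8)
rate_hat = float(np.sum(y * budgets) / np.sum(budgets ** 2))
```

The plateau is what makes compute-efficiency comparisons meaningful: past the knee of the curve, extra attacker budget buys little additional success.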
Chemical Reaction Networks Learn Better than Spiking Neural Networks
Sophie Jaffard, Ivo F. Sbalzarini
Theory
  • CRNs can solve classification tasks without the need for hidden layers, unlike SNNs.
  • The paper provides mathematical guarantees for the learning behavior of CRNs.
  • Numerical experiments show CRNs outperform SNNs in classifying handwritten digits.
  • The study highlights the potential of CRNs in machine learning applications.
Read more
Flowcean - Model Learning for Cyber-Physical Systems
Maximilian Schmidt, Swantje Plambeck, Markus Knitt, Hendrik Rose, Goerschwin Fey, Jan Christian Wieck, Stephan Balduin
Optimization Theory Efficient ML
  • Flowcean automates model generation for Cyber-Physical Systems using data-driven learning.
  • The framework emphasizes modularity and usability, allowing for integration of diverse learning strategies.
  • It addresses the challenges of CPS modeling, which often requires significant manual effort and domain expertise.
  • Flowcean customizes learning pipelines to the specific characteristics of each CPS, enhancing adaptability.
Read more
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
Hong Yang, Devroop Kar, Qi Yu, Travis Desell, Alex Ororbia
Computer Vision Theory
  • Identification of Domain-Sensitivity Collapse (DSC) as a critical failure mode in single-domain OOD detection.
  • Introduction of Teacher-Guided Training (TGT) to transfer domain-sensitive features from a multi-domain teacher model to a single-domain student model.
  • Demonstration of significant improvements in OOD detection performance across multiple benchmarks.
  • TGT maintains or slightly improves in-domain classification accuracy while reducing OOD false positive rates.
Read more
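Teacher-guided transfer of features is commonly implemented as a feature-matching term between teacher and student embeddings. The sketch below is a generic distillation loss of that kind, assumed for illustration rather than TGT's exact objective.

```python
import numpy as np

def feature_match_loss(student_feat, teacher_feat):
    """Mean squared distance between L2-normalized student and teacher
    embeddings; minimizing it transfers the teacher's feature geometry
    (here, its domain sensitivity) to the student."""
    s = student_feat / np.linalg.norm(student_feat, axis=-1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=-1, keepdims=True)
    return float(np.mean(np.sum((s - t) ** 2, axis=-1)))

# Normalization makes the loss scale-invariant: aligned features cost 0
# even at different magnitudes, orthogonal features cost the maximum.
aligned = feature_match_loss(np.array([[2.0, 0.0]]), np.array([[1.0, 0.0]]))
orthogonal = feature_match_loss(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
```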
Survival Meets Classification: A Novel Framework for Early Risk Prediction Models of Chronic Diseases
Shaheer Ahmad Khan, Muhammad Usamah Shahid, Muddassar Farooq
Interpretability
  • Integration of survival analysis with classification techniques for chronic disease prediction.
  • Development of models using only EMR data, excluding lab results, for early risk assessment.
  • Demonstration of survival-model performance comparable to state-of-the-art classifiers.
  • Utilization of SHAP for model explainability, validated by expert physicians.
Read more
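One standard bridge between survival analysis and classification is the person-period expansion: each subject contributes one row per interval at risk, labeled by whether the event occurred then, after which any binary classifier estimates a discrete-time hazard. This construction is a generic illustration, not necessarily the paper's framework.

```python
def person_period(records, horizon):
    """Expand (subject_id, event_time, observed) triples into person-period
    rows (subject_id, interval, event) usable by a binary classifier.
    Censored subjects contribute event=0 rows up to their last follow-up."""
    rows = []
    for sid, t, observed in records:
        last = min(t, horizon)
        for interval in range(1, last + 1):
            event = int(observed and interval == t)
            rows.append((sid, interval, event))
    return rows

# Subject 1 has the event at t=3; subject 2 is censored at t=2.
rows = person_period([(1, 3, True), (2, 2, False)], horizon=5)
```

The classifier's per-interval probabilities then compose into a survival curve, which is what makes its performance directly comparable to survival models.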
Relaxed Efficient Acquisition of Context and Temporal Features
Yunni Qu, Dzung Dinh, Grant King, Whitney Ringwald, Bing Cai Kok, Kathleen Gates, Aiden Wright, Junier Oliva
Efficient ML Optimization Time Series
  • Introduces a unified framework for onboarding and longitudinal feature acquisition in biomedical applications.
  • Employs Gumbel-Sigmoid relaxation for efficient gradient-based optimization of discrete acquisition decisions.
  • Demonstrates improved predictive performance and reduced costs compared to existing methods.
  • Addresses the practical challenges of measurement acquisition in real-world clinical workflows.
Read more
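The Gumbel-Sigmoid relaxation referenced above replaces a hard acquire/skip decision with a differentiable one so that acquisition policies can be trained by gradient descent. A minimal sketch, with hypothetical acquisition logits:

```python
import numpy as np

def gumbel_sigmoid(logits, temperature=0.5, rng=None):
    """Relaxed Bernoulli draw: a differentiable stand-in for a hard 0/1
    acquisition decision. Lower temperature -> closer to a hard mask."""
    rng = np.random.default_rng() if rng is None else rng
    # The difference of two Gumbel(0, 1) samples is Logistic(0, 1) noise,
    # sampled here via the inverse CDF of a uniform draw.
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=np.shape(logits))
    logistic = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic) / temperature))

# One relaxed acquisition mask over five candidate measurements.
mask = gumbel_sigmoid(np.array([2.0, -2.0, 0.0, 1.0, -1.0]),
                      temperature=0.3, rng=np.random.default_rng(0))
```

At evaluation time the relaxed mask is typically thresholded back to hard decisions, so the cost of each acquisition is only paid when the measurement is actually taken.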
On the Robustness of Langevin Dynamics to Score Function Error
Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Yuchen Wu
Generative Models Theory
  • Langevin dynamics is sensitive to L2 errors in score function estimates, leading to significant deviations from target distributions.
  • Diffusion models maintain robustness under small L2 errors, unlike Langevin dynamics.
  • The paper provides a formal proof of the relationship between score estimation errors and the performance of Langevin dynamics.
  • The findings caution against the use of Langevin dynamics with estimated scores in high-dimensional settings.
Read more
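The sensitivity in question is visible even in one dimension: with an unadjusted Langevin chain targeting N(0, 1), a constant bias in the score estimate shifts the stationary distribution by that bias. A toy sketch (the step size, chain length, and bias are illustrative choices, not the paper's setup):

```python
import numpy as np

def langevin_samples(score, x0=0.0, step=0.05, n_steps=4000, burn=1000, seed=0):
    """Unadjusted Langevin algorithm in 1-D:
    x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for t in range(n_steps):
        x = x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal()
        if t >= burn:
            out.append(x)
    return np.array(out)

# Target N(0, 1) has score -x. A constant error of 0.5 in the estimated
# score moves the chain's stationary mean to 0.5 in this linear setting.
exact = langevin_samples(lambda x: -x)
biased = langevin_samples(lambda x: -x + 0.5)
```

In high dimensions such errors compound, which is the regime where the paper's caution applies.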
A Multi-Label Temporal Convolutional Framework for Transcription Factor Binding Characterization
Pietro Demurtas, Ferdinando Zanchetta, Giovanni Perini, Rita Fioresi
Time Series
  • Introduces a multi-label classification framework for predicting TF binding sites.
  • Utilizes Temporal Convolutional Networks (TCNs) to capture interactions among multiple TFs.
  • Demonstrates that TCNs outperform traditional RNNs and attention-based models in biological sequence analysis.
  • Reveals biologically meaningful motifs and potential new TF interactions.
Read more
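The building block of a TCN is a dilated causal 1-D convolution, which the sketch below applies to one-hot DNA with a per-position sigmoid head (one output per transcription factor). The encoding, layer widths, and random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """x: (T, C_in) sequence, w: (K, C_in, C_out) kernel. Left-padding keeps
    the output length T and makes position t depend only on positions <= t."""
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])
    out = np.zeros((x.shape[0], c_out))
    for t in range(x.shape[0]):
        for k in range(K):
            out[t] += xp[pad + t - k * dilation] @ w[k]
    return out

# One-hot encode a short DNA sequence over the alphabet A, C, G, T.
seq = "ACGTACGT"
onehot = np.eye(4)[["ACGT".index(b) for b in seq]]

# Two dilated layers (receptive field grows with dilation) and a
# multi-label head: one independent sigmoid per hypothetical TF.
rng = np.random.default_rng(0)
h = np.maximum(causal_dilated_conv1d(onehot, rng.normal(size=(3, 4, 8))), 0.0)
h = np.maximum(causal_dilated_conv1d(h, rng.normal(size=(3, 8, 8)), dilation=2), 0.0)
probs = 1.0 / (1.0 + np.exp(-causal_dilated_conv1d(h, rng.normal(size=(1, 8, 5)))))
```

Stacking dilations lets the receptive field cover long motifs without recurrence, which is the property the paper leverages over RNN baselines.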