AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

55 papers today
Updates every 8 hours
7 days of history
Resource-Efficient Iterative LLM-Based NAS with Feedback Memory
Xiaojie Gu, Dmitry Ignatov, Radu Timofte
Computer Vision Large Language Models Efficient ML
  • Introduces a closed-loop iterative NAS pipeline utilizing LLMs for architecture generation and refinement.
  • Employs a historical feedback memory mechanism to learn from past attempts, enhancing iterative learning.
  • Achieves significant improvements in model accuracy on CIFAR datasets with minimal computational resources.
  • Demonstrates the feasibility of conducting NAS on a single consumer-grade GPU without cloud infrastructure.
Read more
Entropy-Preserving Reinforcement Learning
Aleksei Petrenko, Ben Lipkin, Kevin Chen, Erik Wijmans, Marco Cusumano-Towner, Raja Giryes, Philipp Krähenbühl
Reinforcement Learning NLP Large Language Models
  • Entropy reduction in policy gradient algorithms can limit exploration and lead to suboptimal policies.
  • Active monitoring and control of entropy during training can enhance policy performance (a generic form of such control is sketched after this entry).
  • The paper introduces REPO and ADAPO as mechanisms for effective entropy regulation.
  • Maintaining diversity in explored trajectories is crucial for robust learning in RL.
Read more
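A minimal illustration of the kind of entropy control this entry argues for, not the paper's REPO or ADAPO algorithms: a discrete policy-gradient loss with a SAC-style adaptive entropy coefficient driven toward a target entropy. The function name, the `target_entropy` parameter, and the dual update are our assumptions.

```python
# Hedged sketch: adaptive entropy bonus for a policy-gradient loss (PyTorch).
# A generic SAC-style mechanism, NOT the paper's REPO/ADAPO methods.
import torch
from torch.distributions import Categorical

def entropy_controlled_pg_loss(logits, actions, advantages, log_coef, target_entropy):
    """logits: [B, A] policy logits; actions, advantages: [B];
    log_coef: learnable scalar tensor (log of the entropy coefficient)."""
    dist = Categorical(logits=logits)
    pg_loss = -(dist.log_prob(actions) * advantages).mean()
    entropy = dist.entropy().mean()
    coef = log_coef.exp()
    # Dual update: if entropy drops below target, the coefficient grows,
    # strengthening the bonus and preventing exploration collapse.
    coef_loss = -(log_coef * (target_entropy - entropy).detach())
    return pg_loss - coef.detach() * entropy + coef_loss
```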
Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
Mynampati Sri Ranganadha Avinash
Large Language Models Interpretability Efficient ML
  • Introduction of routing signatures as a representation of expert activation patterns (a minimal construction is sketched after this entry).
  • Demonstration of strong task-conditioned clustering of routing signatures in MoE transformers.
  • Validation of routing patterns against permutation and load-balancing baselines.
  • High accuracy in task classification using routing signatures.
Read more
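A minimal sketch of what a routing signature could look like under the description above: the normalized histogram of top-1 expert assignments over a task's tokens, compared across tasks with cosine distance. The function names and the top-1 simplification are our assumptions, not the paper's exact definition.

```python
# Hedged sketch of a routing signature: expert-usage histogram per task.
import numpy as np

def routing_signature(expert_ids, num_experts):
    """expert_ids: top-1 expert index for each token of a task's examples
    at one MoE layer (int array). Returns a normalized usage histogram."""
    counts = np.bincount(expert_ids, minlength=num_experts)
    return counts / counts.sum()

def signature_distance(a, b):
    # Cosine distance; clustering these across tasks would expose the
    # task-conditioned structure the summary reports.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```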
abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance
Joyce Lee, Seth Blumberg
Reinforcement Learning Optimization Theory
  • Introduces a simulation environment for antibiotic prescribing policy optimization under AMR.
  • Allows customization of patient populations and antibiotic resistance dynamics.
  • Compatible with the Gymnasium RL API for training reinforcement learning agents (usage sketched after this entry).
  • Models antibiotic prescribing as a Markov Decision Process (MDP) with partial observability.
Read more
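Because the environment advertises Gymnasium compatibility, a training loop would follow the standard Gymnasium API, roughly as below. The environment id `AbxAmrSim-v0` is hypothetical; consult the package for its actual registration name.

```python
# Usage sketch via the standard Gymnasium API; the env id is hypothetical.
import gymnasium as gym

env = gym.make("AbxAmrSim-v0")              # hypothetical registration name
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()      # stand-in for a trained RL policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```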
Graph Tokenization for Bridging Graphs and Transformers
Zeyuan Guo, Enmao Diao, Cheng Yang, Chuan Shi
Graph Learning
  • Introduction of a graph tokenization framework that combines reversible graph serialization with byte-pair encoding (BPE); a toy serialization step is sketched after this entry.
  • Structure-guided serialization process that addresses ordering ambiguities in graphs.
  • Enables standard Transformer models to achieve state-of-the-art results on 14 graph benchmarks.
  • Outperforms traditional GNNs and specialized Graph Transformers in various tasks.
Read more
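To make "reversible graph serialization" concrete, here is a deliberately simplistic variant: a canonical sorted edge list rendered as a token string, which BPE could then compress. The paper's structure-guided ordering is more sophisticated; this sketch assumes nonnegative integer node ids.

```python
# Toy reversible serialization (sorted edge list); not the paper's scheme.
import networkx as nx

def serialize(G):
    edges = sorted(tuple(sorted(e)) for e in G.edges())
    return " ".join(f"{u}-{v}" for u, v in edges)   # e.g. "0-1 0-2 1-2"

def deserialize(text):
    G = nx.Graph()
    G.add_edges_from(tuple(map(int, tok.split("-"))) for tok in text.split())
    return G  # reversible up to isolated nodes and node ordering
```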
Effective Resistance Rewiring: A Simple Topological Correction for Over-Squashing
Bertran Miquel-Oliver, Manel Gil-Sorribes, Victor Guallar, Alexis Molina
Graph Learning
  • Introduces Effective Resistance Rewiring (ERR) to address over-squashing in GNNs.
  • ERR uses effective resistance as a global measure to identify structural bottlenecks (the computation is sketched after this entry).
  • Demonstrates a trade-off between over-squashing and over-smoothing in GNNs.
  • Combining ERR with normalization techniques enhances model performance.
Read more
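A sketch of the effective-resistance computation underlying ERR, via the Laplacian pseudoinverse, plus a naive rewiring step that connects the highest-resistance non-adjacent pairs. This illustrates the summary; it is not the authors' implementation.

```python
# Hedged sketch: effective resistance from the Laplacian pseudoinverse,
# then naive rewiring at the highest-resistance non-edges.
import numpy as np
import networkx as nx

def effective_resistance(G):
    L = nx.laplacian_matrix(G).toarray().astype(float)
    Lp = np.linalg.pinv(L)                        # Moore-Penrose pseudoinverse
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp       # R_ij = Lp_ii + Lp_jj - 2*Lp_ij

def err_rewire(G, k=1):
    """Add the k non-edges with the largest effective resistance."""
    R = effective_resistance(G)
    nodes = list(G.nodes)
    cands = [(R[i, j], nodes[i], nodes[j])
             for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if not G.has_edge(nodes[i], nodes[j])]
    for _, u, v in sorted(cands, reverse=True)[:k]:
        G.add_edge(u, v)
    return G
```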
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich
NLP Large Language Models Reinforcement Learning
  • Introduces a feature-matching loss for fine-tuning language models targeting sequence-level statistics.
  • Proposes Energy-Based Fine-Tuning (EBFT) as an efficient method to optimize the feature-matching objective.
  • EBFT outperforms traditional supervised fine-tuning (SFT) and matches reinforcement learning with verifiable rewards (RLVR) in downstream tasks.
  • Demonstrates that EBFT achieves lower validation cross-entropy while improving downstream accuracy.
Read more
Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
Yuval Ran-Milo
Theory NLP Large Language Models
  • Attention sinks are a necessary feature of softmax transformers for certain tasks.
  • A trigger-conditional task is introduced to formalize the need for attention sinks.
  • Softmax normalization drives the formation of attention sinks, unlike ReLU attention (see the numeric illustration after this entry).
  • Empirical experiments validate theoretical predictions regarding attention behavior.
Read more
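A tiny numeric illustration of the softmax-vs-ReLU contrast the theory formalizes: softmax weights must sum to one, so when no key is relevant the probability mass has to land somewhere, whereas ReLU attention can simply output zeros. The numbers are arbitrary.

```python
# Softmax must place its mass somewhere; ReLU attention can abstain.
import numpy as np

scores = np.array([-4.0, -4.0, -4.0, 2.0])    # last entry: a sink-like token
softmax = np.exp(scores) / np.exp(scores).sum()
relu = np.maximum(scores, 0.0)                # unnormalized ReLU "attention"
print(softmax.round(3))   # ~[0.002 0.002 0.002 0.993]: the sink absorbs the mass
print(relu)               # [0. 0. 0. 2.]: irrelevant keys contribute nothing
```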
Security Considerations for Artificial Intelligence Agents
Ninghui Li, Kaiyuan Zhang, Kyle Polley, Jerry Ma
Large Language Models Theory
  • AI agents introduce new security vulnerabilities distinct from traditional software systems.
  • The distinction between code and data is increasingly blurred in LLM-powered systems.
  • Existing security mechanisms may not be suitable for the autonomous and adaptable nature of AI agents.
  • A layered defense strategy is necessary to address the unique risks associated with AI agents.
Read more
Survival Meets Classification: A Novel Framework for Early Risk Prediction Models of Chronic Diseases
Shaheer Ahmad Khan, Muhammad Usamah Shahid, Muddassar Farooq
Theory Interpretability
  • Integration of survival analysis and classification for chronic disease risk prediction.
  • Development of models that do not rely on lab results, enhancing early intervention capabilities.
  • Performance metrics of the proposed models are competitive with leading machine learning models.
  • Clinically validated explanations of model predictions using SHAP, ensuring relevance to healthcare.
Read more
STAMP: Selective Task-Aware Mechanism for Text Privacy
Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon
NLP Large Language Models Efficient ML
  • STAMP provides a selective approach to text privatization, enhancing privacy without sacrificing task utility.
  • The polar mechanism applies direction-only perturbations to embeddings, preserving semantic meaning (sketched after this entry).
  • Experimental results show STAMP outperforms traditional methods in maintaining privacy-utility balance.
  • The framework is applicable to various contexts, including inference-time privacy and privacy-preserving text rewriting.
Read more
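A bare-bones illustration of a direction-only perturbation like the polar mechanism described above: add noise in the tangent direction and restore the original norm. STAMP's actual mechanism calibrates this noise for formal privacy guarantees, which this sketch omits.

```python
# Hedged sketch: perturb an embedding's direction while preserving its norm.
import numpy as np

def polar_perturb(x, scale, rng=None):
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(x)
    noise = rng.normal(size=x.shape)
    noise -= (noise @ x) / norm**2 * x        # drop the radial component
    y = x + scale * noise                     # rotate on the sphere...
    return y / np.linalg.norm(y) * norm       # ...and restore the norm
```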
Representation Finetuning for Continual Learning
Haihua Luo, Xuming Ran, Tommi Kärkkäinen, Huiyan Xue, Zhonghua Chen, Qi Xu, Fengyu Cong
Efficient ML Robotics Theory
  • CoRe is the first framework to integrate representation finetuning into continual learning.
  • It performs task-specific interventions in low-rank subspaces of hidden representations.
  • CoRe achieves superior parameter efficiency and mitigates catastrophic forgetting.
  • Extensive experiments show CoRe outperforms existing parameter-efficient fine-tuning methods.
Read more
Heavy-Tailed Principal Component Analysis
Mario Sayde, Christopher Khater, Jihad Fahs, Ibrahim Abou-Faycal
Theory
  • Introduces a robust PCA framework for heavy-tailed data using a superstatistical model.
  • Formulates PCA with a logarithmic loss function, applicable even without finite moments.
  • Demonstrates that principal components from heavy-tailed data coincide with those from Gaussian covariance.
  • Proposes new robust covariance estimators that outperform classical methods in challenging noise conditions.
Read more
LongFlow: Efficient KV Cache Compression for Reasoning Models
Yi Su, Zhenxu Tian, Dan Qiao, Yuechi Zhou, Juntao Li, Min Zhang
NLP Large Language Models Efficient ML
  • Introduction of LongFlow, a lightweight KV cache compression algorithm tailored for long-output generation.
  • Efficient importance estimation derived from the attention computation, requiring negligible overhead (the general eviction idea is sketched after this entry).
  • Development of a custom Triton kernel that fuses multiple operations to enhance performance.
  • Achieves up to 11.8× throughput improvement and 80% KV cache compression with minimal accuracy loss.
Read more
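The general shape of attention-guided KV eviction, to make the second bullet concrete: score cached tokens by the attention mass they received and keep the top fraction. LongFlow's actual estimator and fused Triton kernel differ; the names and the keep-ratio heuristic here are ours.

```python
# Hedged sketch of attention-based KV cache eviction (not LongFlow's kernel).
import torch

def compress_kv(keys, values, attn_weights, keep_ratio=0.2):
    """keys, values: [T, d] cached entries; attn_weights: [Q, T] softmax
    weights from recent queries. Keeps the most-attended tokens."""
    importance = attn_weights.sum(dim=0)                 # total mass per token
    k = max(1, int(keep_ratio * keys.shape[0]))
    idx = importance.topk(k).indices.sort().values       # keep temporal order
    return keys[idx], values[idx]
```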
Context-dependent manifold learning: A neuromodulated constrained autoencoder approach
Jérôme Adriaens, Guillaume Drion, Pierre Sacré
Theory Interpretability Robotics
  • Introduction of the Neuromodulated Constrained Autoencoder (NcAE) for context-dependent manifold learning.
  • Integration of a neuromodulatory mechanism to adaptively tune geometric constraints based on static context.
  • Demonstrated effectiveness on dynamical systems, capturing manifold geometry variations.
  • Maintains rigorous projection properties, ensuring physical consistency in latent space.
Read more
Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
Jongwoo Ko, Sara Abdali, Young Jin Kim, Tianyi Chen, Pashmina Cameron
Reinforcement Learning Large Language Models Efficient ML
  • REOPOLD stabilizes on-policy distillation by relaxing strict imitation constraints.
  • The framework utilizes modern RL insights to improve sample efficiency and test-time scaling.
  • Empirical results show REOPOLD outperforms traditional methods in various reasoning tasks.
  • The approach allows smaller models to achieve performance levels comparable to much larger models.
Read more
Efficient Generative Modeling with Unitary Matrix Product States Using Riemannian Optimization
Haotong Duan, Zhongming Chen, Ngai Wong
Generative Models Optimization Efficient ML
  • Introduction of a unitary MPS framework that enhances generative modeling by enforcing tensor-norm constraints.
  • Development of a Riemannian optimization technique that improves training stability and efficiency for MPS.
  • Demonstration of strong generative performance on benchmark datasets, validating the advantages of the proposed method.
Read more
H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code
Amit Singh, Vedant Nipane, Pulkit Agrawal, Jatin Kishnani
Large Language Models NLP Generative Models
  • Development of a large-scale training corpus for embedded systems code using repository-datasheet pairs.
  • Identification of optimal hyperparameters for continual pretraining using Bayesian optimization and grid search.
  • Significant improvements in perplexity and generative accuracy over existing models in specialized embedded domains.
  • Demonstration that smaller models can rival larger frontier models in specific technical tasks.
Read more
Disentangled Representation Learning through Unsupervised Symmetry Group Discovery
Dang-Nhu Barthélémy, Annabi Louis, Argentieri Sylvain
Robotics Reinforcement Learning Theory
  • Introduces a method for autonomous discovery of symmetry group structures in representation learning.
  • Proves the identifiability of true symmetry group decomposition under minimal assumptions.
  • Develops two algorithms: one for symmetry group discovery and another for LSBD representation learning.
  • Demonstrates improved performance over existing LSBD methods in various environments.
Read more
Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control
Ihor Kendiukhov
Interpretability
  • Exhaustive circuit tracing reveals a heavy-tailed hub distribution in feature connectivity.
  • Massive redundancy in feature interactions is confirmed, with no synergy found in higher-order interactions.
  • Late-layer features are causally linked to promoting cellular maturity, while early-layer features push cells away from maturity.
  • Systematic annotation bias is identified, with many significant features lacking biological annotations.
Read more
Slack More, Predict Better: Proximal Relaxation for Probabilistic Latent Variable Model-based Soft Sensors
Zehua Zou, Yiran Ma, Yulong Zhang, Zhengnan Li, Zeyu Yang, Jinhao Xie, Xiaoyu Jiang, Zhichao Chen
Generative Models Optimization Theory
  • Introduction of KProxNPLVM to improve soft sensor modeling accuracy.
  • Theoretical proof of approximation error in conventional NPLVM training methods.
  • Utilization of Wasserstein distance as a proximal operator for objective relaxation.
  • Rigorous derivation of optimization implementation and convergence proof.
Read more
Frequentist Consistency of Prior-Data Fitted Networks for Causal Inference
Valentyn Melnychuk, Vahid Balazadeh, Stefan Feuerriegel, Rahul G. Krishnan
Theory
  • PFNs can exhibit prior-induced confounding bias, hindering frequentist consistency.
  • A one-step posterior correction (OSPC) is proposed to address this bias.
  • The OSPC restores frequentist consistency and leads to a semi-parametric Bernstein-von Mises theorem.
  • Martingale posteriors are utilized to implement the OSPC effectively.
Read more
Causal Matrix Completion under Multiple Treatments via Mixed Synthetic Nearest Neighbors
Minrui Luo, Zhiheng Zhang
Theory Efficient ML
  • Introduction of Mixed Synthetic Nearest Neighbors (MSNN) for causal matrix completion under multiple treatments.
  • MSNN retains the statistical properties of SNN while improving sample efficiency for sparse treatment levels.
  • Demonstrates the feasibility of estimating causal effects using data from multiple treatment levels.
  • Empirical results show MSNN's effectiveness in real-world applications, particularly in data-scarce scenarios.
Read more
Flowcean - Model Learning for Cyber-Physical Systems
Maximilian Schmidt, Swantje Plambeck, Markus Knitt, Hendrik Rose, Goerschwin Fey, Jan Christian Wieck, Stephan Balduin
Optimization Theory Efficient ML
  • Flowcean automates model generation for Cyber-Physical Systems, addressing the complexity and diversity of these systems.
  • The framework supports a variety of learning strategies and data processing methods, enhancing flexibility and usability.
  • Flowcean integrates multiple learning libraries, streamlining the modeling process and making it more efficient.
  • Data-driven modeling reduces the need for manual effort and domain expertise, facilitating easier model generation.
Read more
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Yulu Gan, Phillip Isola
Large Language Models Optimization Efficient ML
  • Large pretrained models have a dense neighborhood of task-specific solutions, unlike small models.
  • The density of effective solutions scales with model size, making random sampling feasible for post-training.
  • RandOpt, a simple ensemble method based on random perturbations, achieves competitive performance with traditional methods (the sampling idea is sketched after this entry).
  • Diversity in the neighborhood allows for task-specific improvements, where perturbations can enhance performance on some tasks while degrading others.
Read more
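A minimal sketch of the random-sampling idea RandOpt builds on, as we read the summary: draw Gaussian perturbations of the pretrained weights and keep the candidates that score best on a task. The `sigma` scale and the selection rule are our assumptions.

```python
# Hedged sketch of sampling task solutions near pretrained weights (PyTorch).
import copy
import torch

def perturb(model, sigma=1e-3):
    candidate = copy.deepcopy(model)
    with torch.no_grad():
        for p in candidate.parameters():
            p.add_(sigma * torch.randn_like(p))   # isotropic Gaussian step
    return candidate

def random_sample_experts(model, evaluate, n=16, top=4):
    """evaluate: callable(model) -> task score. Returns the best candidates,
    which could then be ensembled RandOpt-style."""
    candidates = [perturb(model) for _ in range(n)]
    return sorted(candidates, key=evaluate, reverse=True)[:top]
```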
Duration Aware Scheduling for ASR Serving Under Workload Drift
Darshan Makwana, Yash Jogi, Harsh Kotta, Aayush Kubba
Audio & Speech Optimization Efficient ML
  • Duration-aware scheduling can significantly reduce end-to-end latency in ASR systems.
  • Shortest Job First (SJF) reduces median latency by up to 73% but can increase tail latency.
  • Highest Response Ratio Next (HRRN) provides a balanced approach, reducing median latency by up to 28% while controlling tail latency degradation (both rules are sketched after this entry).
  • The proposed methods incur less than 0.1 ms scheduling overhead per request.
Read more
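The two scheduling rules compared above, in schematic form over requests carrying an arrival time and an estimated duration (in ASR, predictable from audio length). This is a generic restatement of SJF and HRRN, not the paper's serving-system code.

```python
# SJF vs. HRRN over requests {"arrival": t, "est_duration": d} (generic sketch).
def pick_sjf(queue):
    # Shortest Job First: great median latency, but long jobs can starve,
    # which is the tail-latency risk the summary notes.
    return min(queue, key=lambda r: r["est_duration"])

def pick_hrrn(queue, now):
    # Highest Response Ratio Next: (wait + service) / service. Waiting
    # inflates a job's ratio, so long jobs eventually get served too.
    def ratio(r):
        wait = now - r["arrival"]
        return (wait + r["est_duration"]) / r["est_duration"]
    return max(queue, key=ratio)
```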
UniHetCO: A Unified Heterogeneous Representation for Multi-Problem Learning in Unsupervised Neural Combinatorial Optimization
Kien X. Nguyen, Ilya Safro
Optimization Graph Learning
  • UniHetCO introduces a unified heterogeneous graph representation for multiple combinatorial optimization problems.
  • The framework allows for unsupervised learning without requiring ground-truth solutions.
  • Dynamic weighting based on gradient norms is employed to balance contributions from different problem classes during training.
  • Experiments show competitive performance against existing unsupervised NCO methods and effective cross-problem adaptation.
Read more
Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives
Taeho Lee, Donghwan Lee
Reinforcement Learning Robotics Optimization
  • Introduction of MMDDPG framework for robust policy learning in continuous control tasks.
  • Formulation of training as a minimax optimization problem between the user and the adversary.
  • Use of a fractional objective to balance performance and disturbance magnitude.
  • Demonstrated improved robustness against external disturbances and model uncertainties.
Read more
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
Indranil Halder, Annesya Banerjee, Cengiz Pehlevan
NLP Large Language Models Theory
  • Adversarial prompt injection can significantly amplify the attack success rate of large language models.
  • The scaling of attack success rates transitions from polynomial to exponential growth based on the length of injected prompts.
  • A theoretical model based on spin-glass theory provides insights into the dynamics of language generation and adversarial behavior in LLMs.
  • The proposed SpinLLM model allows for the analysis of inference-time scaling and the effects of prompt injection on attack success rates.
Read more
FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning
Yijun Pan, Weikang Qiu, Qiyao Ma, Mingxuan Ju, Tong Zhao, Neil Shah, Rex Ying
Reinforcement Learning Large Language Models NLP
  • FlexRec enables LLM-based recommenders to adapt to dynamic user needs and business objectives.
  • The framework introduces item-level rewards and uncertainty modeling to enhance training stability.
  • FlexRec outperforms traditional recommenders and LLM-based baselines in multiple recommendation scenarios.
  • The approach allows for efficient generalization to unseen needs with a single LLM model.
Read more
CAETC: Causal Autoencoding and Treatment Conditioning for Counterfactual Estimation over Time
Nghia D. Nguyen, Pablo Robles-Granda, Lav R. Varshney
Time Series Theory Optimization
  • CAETC addresses time-dependent confounding bias in counterfactual estimation.
  • The method is model-agnostic and can be applied to various sequence architectures.
  • An entropy maximization adversarial game is proposed to ensure balanced representation.
  • CAETC shows significant improvements over existing counterfactual estimation methods.
Read more
Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification
Hang Yu, Huidong Liu, Qingchen Zhang, William Joy, Kateryna Nikulina, Andreas A. Schuppert, Sina Saffaran, Declan Bates
Reinforcement Learning Time Series Optimization
  • Introduction of T-CQL, a novel offline RL framework that incorporates temporal modeling and safety measures.
  • Development of a clinically relevant reward function that captures early indicators of ventilator-induced lung injury (VILI).
  • Validation of the framework using digital twin simulations for real-time policy evaluation.
  • Demonstration of improved performance over existing offline RL methods in optimizing mechanical ventilation.
Read more
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
Hong Yang, Devroop Kar, Qi Yu, Travis Desell, Alex Ororbia
Computer Vision Theory
  • Identification of Domain-Sensitivity Collapse (DSC) as a critical failure mode in single-domain OOD detection.
  • Introduction of Teacher-Guided Training (TGT) to enhance domain sensitivity in feature representations.
  • Demonstration of significant improvements in OOD detection performance across multiple benchmarks.
  • TGT maintains in-domain classification accuracy while reducing OOD detection false positives.
Read more
Personalized Federated Learning via Gaussian Generative Modeling
Peng Hu, Jianwei Ma
Federated Learning Generative Models Optimization
  • Introduces pFedGM, a personalized federated learning method using Gaussian generative modeling.
  • Balances global collaboration and personalization through a dual objective approach.
  • Decouples the Gaussian classifier into a navigator and a statistic extractor for improved representation learning.
  • Employs a dual-scale fusion framework for personalized classifier head development.
Read more
Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
Sai V R Chereddy
Computer Vision Interpretability
  • The study reveals that pre-trained Video Vision Transformers can represent nuanced action outcomes distinctly, despite producing the same final classification.
  • Mechanistic interpretability techniques, including delta analysis and activation patching, are employed to uncover the internal workings of the model.
  • Attention Heads and MLP Blocks have distinct roles, with Attention Heads gathering evidence and MLP Blocks composing concepts for outcome representation.
  • The findings highlight the potential for AI models to develop hidden knowledge, necessitating careful oversight for trustworthy AI deployment.
Read more
ARROW: Augmented Replay for RObust World models
Abdulaziz Alyahya, Abdallah Al Siyabi, Markus R. Ernst, Luke Yang, Levin Kuhlmann, Gideon Kowadlo
Reinforcement Learning Robotics Efficient ML
  • ARROW introduces a dual-buffer system for memory-efficient experience replay in continual reinforcement learning.
  • The method significantly reduces catastrophic forgetting while maintaining performance on previously learned tasks.
  • ARROW is evaluated in both non-shared and shared structure environments, showcasing its versatility.
  • The findings suggest that model-based reinforcement learning can effectively address challenges in continual learning.
Read more
Huntington Disease Automatic Speech Recognition with Biomarker Supervision
Charles L. Wang, Cady Chen, Ziwei Gong, Julia Hirschberg
Audio & Speech
  • Introduces a high-fidelity clinical corpus for HD speech ASR, the first of its kind for end-to-end evaluation.
  • Demonstrates that different ASR architectures exhibit unique error patterns when processing HD speech.
  • Achieves a significant reduction in WER through HD-specific adaptations of the Parakeet-TDT model.
  • Proposes the use of biomarker-based auxiliary supervision to enhance ASR performance and analyzes its effects on error behavior.
Read more
Chemical Reaction Networks Learn Better than Spiking Neural Networks
Sophie Jaffard, Ivo F. Sbalzarini
Theory
  • CRNs can solve classification tasks without requiring hidden layers, unlike SNNs.
  • The study provides mathematical guarantees for the learning behavior of CRNs.
  • Numerical experiments show CRNs outperform SNNs in accuracy and efficiency for digit classification.
  • The findings suggest potential advantages of biochemical networks over neuronal networks in learning tasks.
Read more
Meta-Reinforcement Learning with Self-Reflection for Agentic Search
Teng Xiao, Yige Yuan, Hamish Ivison, Huaisheng Zhu, Faeze Brahman, Nathan Lambert, Pradeep Dasigi, Noah A. Smith, Hannaneh Hajishirzi
Reinforcement Learning Large Language Models NLP
  • MR-Search leverages self-reflection to improve exploration strategies in agentic search tasks.
  • The method conditions on past episodes, allowing for adaptive learning across multiple interactions.
  • A novel multi-turn RL algorithm is introduced for precise credit assignment during training.
  • Empirical results show significant performance improvements over baseline RL methods.
Read more
A Multi-Label Temporal Convolutional Framework for Transcription Factor Binding Characterization
Pietro Demurtas, Ferdinando Zanchetta, Giovanni Perini, Rita Fioresi
Time Series
  • Introduces a multi-label classification approach for predicting TF binding sites, moving beyond binary classification.
  • Utilizes Temporal Convolutional Networks (TCNs) to capture correlations among multiple TFs effectively.
  • Demonstrates that TCNs outperform traditional RNNs and attention-based models in biological sequence analysis.
  • Reveals biologically meaningful motifs and novel TF interactions through model explainability.
Read more
Mitigating the Multiplicity Burden: The Role of Calibration in Reducing Predictive Multiplicity of Classifiers
Mustafa Cavus
Theory Interpretability
  • Predictive multiplicity can lead to conflicting outcomes for the same individual due to multiple near-optimal models.
  • Minority class observations are disproportionately affected by predictive multiplicity.
  • Post-hoc calibration methods can significantly reduce predictive multiplicity and improve prediction stability.
  • Platt Scaling and Isotonic Regression are the most effective calibration techniques tested (see the scikit-learn sketch after this entry).
Read more
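The two calibrators the summary singles out are available through scikit-learn's standard wrapper, as sketched below on synthetic imbalanced data ("sigmoid" is Platt scaling). Measuring predictive multiplicity across near-optimal models is the paper's contribution and is not reproduced here.

```python
# Post-hoc calibration with Platt scaling and isotonic regression (sklearn).
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for method in ("sigmoid", "isotonic"):        # sigmoid == Platt scaling
    calibrated = CalibratedClassifierCV(
        LogisticRegression(max_iter=1000), method=method, cv=5)
    calibrated.fit(X_tr, y_tr)
    proba = calibrated.predict_proba(X_te)[:, 1]   # calibrated probabilities
    print(method, proba[:3].round(3))
```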
CFD-HAR: User-controllable Privacy through Conditional Feature Disentanglement
Alex Gn, Fan Li, S Kuniyilh, Ada Axan
Time Series
  • CFD allows for user-controllable privacy by separating sensitive attributes from activity features.
  • The technique provides dynamic privacy filtering tailored to individual user preferences.
  • CFD outperforms traditional perturbation methods by maintaining high recognition performance.
  • A comparative analysis shows that few-shot HAR methods excel in label efficiency but compromise on privacy.
Read more
Topological DeepONets and a generalization of the Chen-Chen operator approximation theorem
Vugar Ismailov
Theory
  • Introduction of topological DeepONets that operate on locally convex spaces.
  • Generalization of the Chen-Chen operator approximation theorem to a broader context.
  • Construction of neural networks using continuous linear functionals from dual spaces.
  • Demonstration of uniform approximation of continuous operators on compact sets.
Read more
Algorithmic Capture, Computational Complexity, and Inductive Bias of Infinite Transformers
Orit Davidovich, Zohar Ringel
Theory Large Language Models NLP
  • Formal definition of Algorithmic Capture and algorithmic learning.
  • Transformers show a bias towards low-complexity algorithms, limiting their ability to learn higher-complexity tasks.
  • Upper bounds on inference-time complexity for infinite-width transformers are established.
  • Examples of captured algorithms include induction head search and sorting, while complex problems like shortest path and max flow are not captured.
Read more
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models
Jae-Won Chung, Jeff J. Ma, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury
Multimodal Generative Models Efficient ML
  • Cornserve is the first distributed serving system specifically for Any-to-Any multimodal models.
  • It allows for flexible task abstraction and model fission, enabling independent scaling of model components.
  • The system improves serving throughput by up to 3.81× and reduces tail latency by up to 5.79×.
  • Built on Kubernetes, Cornserve supports diverse multimodal models and enhances resource efficiency.
Read more
On the Role of Reversible Instance Normalization
Gaspard Berthelier, Tahar Nabil, Etienne Le Naour, Richard Niamke, Samir Perlaza, Giovanni Neglia
Time Series
  • Identifies three key challenges in normalization for time series forecasting: temporal, spatial, and conditional distribution shifts.
  • Conducts ablation studies on RevIN, revealing redundancies and limitations in its components.
  • Challenges the effectiveness of RevIN in mitigating distribution shifts in time series data.
  • Proposes new perspectives for improving normalization strategies in forecasting applications.
Read more
Monitoring and Prediction of Mood in Elderly People during Daily Life Activities
Daniel Bautista-Salinas, Joaquín Roca González, Inmaculada Méndez, Oscar Martinez Mozos
Time Series
  • Development of a wearable system for mood monitoring in elderly individuals.
  • Utilization of ecological momentary assessment (EMA) to simplify mood state evaluation.
  • Machine learning classifier trained on physiological data from a wristband.
  • Promising results in mood prediction accuracy, especially for happiness and activeness.
Read more
Deep Learning Network-Temporal Models For Traffic Prediction
Yufeng Xin, Ethan Fan
Time Series Graph Learning Large Language Models
  • Introduction of two deep learning models for multivariate time-series traffic prediction: one GAT-based and one LLM-based.
  • The GAT model effectively reduces prediction variance across time series and horizons.
  • The LLM model shows superior overall prediction and generalization performance compared to traditional methods.
  • Comprehensive analysis reveals insights into correlation variability and prediction distribution discrepancies.
Read more
EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering
Nicolas Deutschmann, Constance Ferragu, Jonathan D. Ziegler, Shayan Aziznejad, Eli Bixby
Optimization Generative Models
  • EvoFlows enables controlled edit-based mutations in protein sequences, predicting both mutation type and location.
  • The model captures evolutionary patterns and protein distributions effectively, outperforming traditional models in generating realistic protein variants.
  • A new grid-free inference procedure enhances the model's efficiency in vectorized operations.
  • EvoFlows is particularly suited for protein optimization tasks, preserving structural and functional integrity while modifying properties.
Read more
Teleodynamic Learning: A New Paradigm for Interpretable AI
Enrique ter Horst, Juan Diego Zambrano
Theory Interpretability Optimization
  • Teleodynamic Learning shifts the focus from optimization to the co-evolution of structure, parameters, and resources.
  • The framework introduces emergent stabilization and phase-structured behavior in learning processes.
  • DE11, the proposed teleodynamic learner, achieves high accuracy on benchmark datasets while producing interpretable rules.
  • The approach integrates concepts from biology and physics to enhance understanding of adaptive systems.
Read more
Bayesian Optimization of Partially Known Systems using Hybrid Models
Eike Cramer, Luis Kutschat, Oliver Stollenwerk, Joel A. Paulson, Alexander Mitsos
Optimization
  • Introduction of a hybrid model-based Bayesian Optimization framework that combines mechanistic models with Gaussian processes.
  • Demonstrated significant improvements in optimization efficiency, achieving convergence in as few as one iteration.
  • The hybrid model formulation allows for the inclusion of physical constraints, enhancing the robustness of the optimization process.
  • Outperformed standard BO methods in an in-silico optimization case study of a single-stage distillation.
Read more
Multi-Task Anti-Causal Learning for Reconstructing Urban Events from Residents' Reports
Liangkai Zhou, Susu Xu, Shuqi Zhong, Shan Lin
Theory Optimization
  • MTAC framework effectively disentangles task-invariant and task-specific causal mechanisms.
  • Utilizes a multi-task structural equation model (SEM) for causal discovery and inference.
  • Demonstrates significant improvements in urban event reconstruction accuracy over strong baselines.
  • Achieves up to 34.61% reduction in mean absolute error (MAE) in real-world applications.
Read more
A Learning-Based Superposition Operator for Non-Renewal Arrival Processes in Queueing Networks
Eliran Sherzer
Theory Efficient ML Time Series
  • Introduces a learning-based superposition operator for non-renewal arrival processes in queueing networks.
  • Utilizes deep learning to accurately reconstruct higher-order moments and dependence structures of merged arrival streams.
  • Demonstrates superior performance compared to classical renewal-based approximations through extensive computational experiments.
  • Enables decomposition-based evaluation of queueing networks, preserving critical variability and dependence information.
Read more
Procedural Fairness via Group Counterfactual Explanation
Gideon Popoola, John Sheppard
Theory Interpretability
  • Formalizes procedural fairness as group counterfactual explanation invariance.
  • Introduces GCIG, a regularization approach that minimizes cross-group variation in explanations.
  • Demonstrates that GCIG effectively reduces explanation disparity while preserving predictive performance.
  • Highlights the necessity of integrating fairness constraints during the training process.
Read more
AutoScout: Structured Optimization for Automating ML System Configuration
Jimmy Shong, Yuhan Ding, Yihan Jiang, Liheng Jing, Haonan Chen, Gaokai Zhang, Aditya Akella, Fan Lai
Optimization Efficient ML
  • AutoScout formulates ML system configuration as a mixed discrete-continuous optimization problem.
  • It employs a hybrid optimization framework that combines tree-based search and gradient-guided optimization.
  • AutoScout achieves 2.7–3.0× training speedup over expert-tuned configurations.
  • The system is 13.7–16.5× faster than existing system configurators in identifying optimal configurations.
Read more