AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
FedUP: One-Shot Federated Unlearning via Centroid-Guided Plug-in Filters
Feihong Nan, Zhengyi Zhong, Pan Wang, Weidong Bao, Xiongtao Zhang, Quan Wen, Ji Wang
Federated Learning
  • FedUP provides a one-shot federated unlearning solution that reduces latency from minutes to seconds.
  • The framework uses lightweight pluggable filters to maintain model performance while unlearning specific data.
  • It supports reversibility, allowing for easy restoration of previously forgotten knowledge.
  • Extensive experiments validate the effectiveness of FedUP across diverse tasks.
Read more
3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy
Amirhossein Kardoost, Lion Gleiter, Tingying Peng, Carsten Marr
Computer Vision Multimodal
  • 3D Masked Autoencoders outperform 2D models in cellular representation learning.
  • Channel cross-attention and frequency-domain regularization enhance volumetric representation quality.
  • Integration of protein sequence information via a pretrained model improves downstream task performance.
  • MAE-3D achieves state-of-the-art results in protein localization and interaction tasks.
Read more
Structure-Aware Graph Multi-Task Learning for Dynamic Sparse OD Demand Prediction
Ming Xu, Jiawei Cao
Graph Learning Time Series Optimization
  • SAGMTL effectively addresses the challenges of dynamic sparsity and long-tailed distributions in OD demand prediction.
  • The framework decomposes OD prediction into structural state modeling and flow intensity estimation.
  • A node-edge collaborative representation module captures essential regional and temporal dynamics.
  • SAGMTL outperforms classical spatiotemporal forecasting models and advanced OD prediction methods.
Read more
FlowPipe: LLM-Enhanced Conditional Generative Flow Networks for Data Preparation Pipeline Construction
Kunyu Ni, Lei Cao, Jie He, Xiaotong Zhang, Jianfeng Jin, Junyu Dong, Yanwei Yu
Reinforcement Learning Generative Models Large Language Models
  • FlowPipe reformulates data preparation pipeline construction as conditional probabilistic flow generation.
  • It utilizes Conditional Generative Flow Networks optimized for effective credit assignment.
  • Deep Semantic Modulation is introduced to enhance decision-making by integrating LLM-derived logical priors.
  • The framework improves exploration efficiency by incorporating failure awareness to prune invalid states.
Read more
Provable Benefits of RLVR over SFT for Reasoning Models: Learning to Backtrack Efficiently
Stanley Wei, Juno Kim
NLP Large Language Models Reinforcement Learning
  • RLVR outperforms SFT in teaching efficient backtracking for reasoning tasks.
  • SFT fails to learn backtracking strategies due to lack of exposure to dead ends.
  • RLVR-trained models achieve exponential improvements in inference-time compute.
  • Distilling reasoning traces from RLVR can enhance base model performance.
Read more
AsyncOPD: How Stale Can On-Policy Distillation Be?
Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjun Kang, Sanghyun Park, Donghoon Kim, Minjae Lee, Minseo Kim, Rishabh Tiwari, Yuchen Zeng, Hyung Il Koo, Kangwook Lee
NLP Large Language Models Reinforcement Learning
  • Asynchronous OPD can improve training efficiency but introduces challenges related to stale data.
  • The direction of KL divergence (forward vs. reverse) affects the robustness of OPD to stale rollouts.
  • Simpler OPD-specific methods outperform complex asynchronous RL techniques in mitigating staleness.
  • Finite teacher-score caches create a bias-variance tradeoff that can be addressed with multi-sample Monte Carlo methods.
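The forward/reverse distinction in the second bullet can be made concrete with a small numeric sketch. It assumes the usual distillation convention (forward KL takes the expectation under the teacher, reverse KL under the student); the two distributions are made-up illustrations, not values from the paper.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 2-token vocabulary.
teacher = [0.5, 0.5]
student = [0.9, 0.1]

forward_kl = kl_divergence(teacher, student)  # expectation under the teacher
reverse_kl = kl_divergence(student, teacher)  # expectation under the student
```

The two values differ because KL divergence is asymmetric, which is why the choice of direction can change how a method reacts to rollouts sampled from an outdated student.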
Read more
DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty
Rowan Martnishn
Theory Optimization Computer Vision
  • DREG outperforms traditional regularizers in terms of accuracy and robustness, particularly under data scarcity.
  • It works especially well with the GELU activation function, which is common in modern transformer architectures.
  • DREG requires only a single hyperparameter and no per-dataset tuning, making it a practical drop-in solution.
  • The method focuses regularization on layers with the highest activation derivatives, enhancing stability without global constraints.
Read more
Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization
Kanishk Awadhiya
NLP Large Language Models Theory
  • LLMs can be viewed as Dense Associative Memories that store reasoning patterns as latent attractors.
  • Correct reasoning chains correspond to deep attractor basins, while hallucinations correspond to sharp local minima.
  • The proposed Gibbs-Weighted Basin Selection mechanism improves reasoning performance by sampling and weighting trajectories based on their stability.
  • Empirical results show a 5.38% performance improvement on GSM8K.
Read more
One Ruler: A Same-Hands Re-Evaluation of Bivariate Causal Direction on Tübingen, with a Parameter-Free Compression Baseline
Wietse Stienstra
Theory
  • Introduces a 'same-hands' evaluation protocol for bivariate causal direction methods.
  • Presents a parameter-free baseline method using sorted-conditional compression.
  • Finds that existing method rankings change significantly under standardized evaluation.
  • Documents mechanisms that inflate reported accuracy figures in the literature.
Read more
Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks
Leona Hennig, Marius Lindauer
Efficient ML Optimization Computer Vision
  • Introduces Deep Shift Neural Networks (DSNNs) as a solution to reduce computational demands in deep learning.
  • Combines multi-fidelity and multi-objective optimization techniques to optimize DSNN configurations.
  • Achieves a 20% increase in performance and over 60% reduction in emissions compared to default DSNNs.
  • Demonstrates that quantizing smaller network portions can optimize energy consumption while maintaining performance.
Read more
Efficient Network Inference via Hardware-Aware Architecture Search, Model Pruning & Quantization
Lucas Heublein, Mark Deutel, Axel Plinge, Felix Ott
Efficient ML
  • Investigates efficient network inference for GNSS interference characterization under strict resource constraints.
  • Utilizes a combination of iterative structured pruning, static quantization, and hardware-aware zero-shot NAS.
  • Demonstrates the trade-offs between predictive performance and deployment efficiency through experimental evaluations.
  • Identifies compact model configurations that maintain competitive performance compared to uncompressed baselines.
Read more
Information-Theoretic Classifier-Free Guidance with Adaptive Schedule Optimization
Haobo Chen, Xiangxiang Xu, Yuheng Bu
Generative Models Optimization Computer Vision
  • Introduces an information-theoretic framework for optimizing Classifier-Free Guidance schedules.
  • Derives trajectory-level formulas for estimating consistency and coverage without explicit density estimation.
  • Develops an adaptive schedule optimization method that allocates guidance selectively across noise levels.
  • Demonstrates improved consistency and coverage in image generation tasks compared to constant guidance methods.
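For readers unfamiliar with guidance schedules, here is a minimal sketch of standard classifier-free guidance with a per-step weight. It uses the common convention `uncond + w * (cond - uncond)`; the linear schedule is only a hypothetical placeholder and is not the optimized schedule the paper derives.

```python
def guided_prediction(eps_uncond, eps_cond, weight):
    """Combine unconditional and conditional noise predictions.

    weight = 0 gives the unconditional prediction, weight = 1 the
    conditional one, and weight > 1 extrapolates past the conditional one.
    """
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def linear_schedule(num_steps, w_start, w_end):
    """Hypothetical schedule: one guidance weight per denoising step."""
    if num_steps == 1:
        return [w_start]
    step = (w_end - w_start) / (num_steps - 1)
    return [w_start + i * step for i in range(num_steps)]

# Constant guidance uses the same weight at every step; a schedule
# lets the weight vary with the noise level.
weights = linear_schedule(3, 1.0, 3.0)
```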
Read more
EML Trees Are Universal Approximators
Joe Germany, Elie Abdo, Joseph Bakarji
Theory Interpretability Optimization
  • EML trees can universally approximate functions in Sobolev spaces W^{k,∞}.
  • A generalization of the EML function incorporates six learnable parameters per unit.
  • The paper provides an explicit construction of EML representations of multivariate polynomials.
  • Empirical validation shows accurate approximation capabilities of EML trees on benchmark functions.
Read more
RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting
Cheng He, Zhenyu Guan, Xijie Liang, Defu Lian, Jiajia Li, Enhong Chen, Patrick P. C. Lee, Geng Hu, Zehao Chen
Time Series
  • RAVEN addresses the limitations of fixed context windows in financial time series forecasting.
  • The framework uses a Mixture-of-Experts approach with adaptive context selection.
  • RAVEN significantly improves predictive performance on financial datasets, outperforming state-of-the-art models.
  • The introduction of a Global Compressed Representation branch enhances temporal coherence.
Read more
Forget Without Compromise: Nexus Sampling for Streaming KV-Cache Eviction Under Fixed Budgets
Duc Duong, Hoang Anh Duy Le, Jianwen Xie, Anshumali Shrivastava, Zhaozhuo Xu
NLP Large Language Models Efficient ML
  • Deterministic top-K eviction is myopic, so it permanently loses important tokens.
  • Nexus Sampling combines Nexus scoring and weighted reservoir sampling to improve token retention.
  • The proposed method shows theoretical advantages in long-run token survival compared to deterministic top-K.
  • Empirical results indicate that Nexus Sampling performs comparably to dense attention while being more memory efficient.
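Weighted reservoir sampling, the second ingredient named above, can be sketched as follows. This is the classic Efraimidis-Spirakis scheme under a fixed budget; the per-token weights stand in for whatever importance score is used (the paper's Nexus score is not reproduced here), and the function name is hypothetical.

```python
import heapq
import random

def weighted_reservoir(stream, budget, rng=None):
    """Keep at most `budget` items from a stream of (token_id, weight) pairs.

    Each item draws key = u ** (1 / weight) with u uniform in [0, 1);
    the items with the largest keys survive. Higher weights make
    survival more likely, but no item is evicted deterministically.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, token_id); heap[0] is the weakest survivor
    for token_id, weight in stream:
        if weight <= 0:
            continue  # non-positive weights are never kept
        key = rng.random() ** (1.0 / weight)
        if len(heap) < budget:
            heapq.heappush(heap, (key, token_id))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, token_id))
    return [token_id for _, token_id in heap]
```

Unlike deterministic top-K, a low-scoring token still has a nonzero chance of staying in the cache, which is the property the long-run survival argument relies on.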
Read more
Concept-Constrained Prompt Learning for Few-Shot CLIP Adaptation
Na Sang, Ding Ma, Rui Sang, Yuxuan Liu
Computer Vision NLP Multimodal
  • CCPL introduces a lightweight regularization framework for few-shot CLIP adaptation.
  • The method anchors class prompts to frozen concept prototypes, enhancing generalization to unseen classes.
  • Concept dropout is utilized to reduce over-reliance on specific concepts during training.
  • Empirical results show significant improvements in new-class accuracy on certain datasets.
Read more
Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression
Yong Yi Bay, Kathleen A. Yearick
Theory Efficient ML
  • KORE provides an analytical solution for optimal resolution in spline regression, bypassing the need for exhaustive hyperparameter search.
  • The method achieves comparable accuracy to traditional cross-validation techniques while significantly reducing the number of model fits required.
  • The approach is based on classical approximation theory, linking bias and variance to the resolution parameter in a closed-form expression.
  • KORE is effective across a wide range of input dimensions and interaction orders, outperforming 21 other methods in terms of accuracy per compute unit.
Read more
Expressivity Saturation: Reduced Affine Region Usage Under Increasing Task Complexity
Xuan Qi, Yi Wei, Fanqi Yu, Manuel Lecha
Theory Optimization
  • The paper introduces a rigorous theorem that bounds the number of affine regions in piecewise-affine MLPs.
  • Empirical evidence shows that increasing task complexity leads to reduced usage of affine regions, termed expressivity saturation.
  • The reduction in realized affine regions correlates with degraded decision boundaries, impacting classification performance.
  • The study connects theoretical insights with empirical observations, highlighting the gap between expressive capacity and utilization.
Read more
Causal Variational Deep Embedding: A Family of Interventional Generators for Confounded Images
Jingyuan Chen, Kangrui Ruan, Junzhe Zhang
Generative Models Computer Vision Theory
  • Introduces CAUVADE, a framework for generating images that account for unobserved confounders.
  • Proves that the proposed canonical augmented SCM is dense in the class of all augmented SCMs compatible with a given causal diagram.
  • Demonstrates the ability to produce diverse interventional distributions that span the feasible region of causal explanations.
  • Shows improved performance in generating unconfounded images compared to traditional generative models.
Read more
Learning to Place Guards by Reinforcement: A Geo-Free Neural Policy for the Vertex-Guard Art Gallery Problem
Domagoj Ševerdija, Jurica Maltar, Nathan Chappel, Domagoj Matijević
Reinforcement Learning Optimization Theory
  • Introduces a geo-free neural policy for the vertex-guard Art Gallery Problem using reinforcement learning.
  • Demonstrates that the policy can achieve competitive guard placements without explicit geometric input during inference.
  • Utilizes a probing method to analyze the encoder's representation, revealing its capacity to encode necessary geometric information.
  • Shows significant improvements in feasibility when using a classifier on the encoder's embeddings, reducing the number of under-covered polygons.
Read more
Sesame: Structure-Aware Molecular Generation via Spatial Density-Map Conditioning
Konstantin Yatsenko, Arvind Thiagarajan
Generative Models
  • Introduces a novel density map conditioning architecture for structure-aware molecular generation.
  • Supports both de novo generation and fragment-conditioned growth through a unified mechanism.
  • Implements a hybrid discrete-continuous diffusion process for effective molecular generation.
  • Utilizes trajectory finetuning to enhance the quality of generated molecules.
Read more
A Survey on Federated Causal Discovery and Inference
Xianjie Guo, Yuwei Wang, Guodu Xiang, Xiaoli Tang, Kui Yu, Han Yu, Qiang Yang
Federated Learning Graph Learning Theory
  • The paper provides a systematic review of Federated Causal Discovery and Inference, addressing the lack of comprehensive surveys in the field.
  • FCD and FCI are organized along three axes: methodological paradigms, federation topologies, and structural scopes.
  • The authors formalize the relationship between FCD and FCI as complementary stages of a unified federated causal reasoning pipeline.
  • Key practical dimensions such as data heterogeneity and missing data are examined in the context of federated causal analysis.
Read more
GRACE: Gated Refinement for Accurate Causal Edge Discovery in High-Dimensional Time Series
Mohammad Fesanghary, Abhinav Havaldar
Time Series Graph Learning Theory
  • GRACE improves causal edge discovery in high-dimensional time series using a two-stage framework.
  • The method employs Hard Concrete gates with L0 regularization for robust binary decision-making.
  • GRACE significantly enhances F1 scores while maintaining high precision compared to existing methods.
  • The framework is computationally efficient, achieving results comparable to nonlinear CI tests while running 75x faster.
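The Hard Concrete gate mentioned above can be sketched in a few lines. This follows the standard formulation of the Hard Concrete distribution with commonly used constants (stretch limits -0.1 and 1.1, temperature 2/3); how GRACE wires the gates into its two-stage framework is not shown, and the function names are illustrative.

```python
import math
import random

# Commonly used constants: stretch interval (GAMMA, ZETA) and temperature BETA.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

def _sigmoid(x):
    # Adequate for moderate |x|; very negative x would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-x))

def sample_gate(log_alpha, rng):
    """Draw a stochastic gate in [0, 1] with point mass at exactly 0 and 1."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
    s = _sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))

def prob_nonzero(log_alpha):
    """Probability that the gate is nonzero; summed over gates this is the L0 penalty."""
    return _sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))

def deterministic_gate(log_alpha):
    """Gate value used at evaluation time, without sampling noise."""
    return min(1.0, max(0.0, _sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))
```

Because the stretched sigmoid is clamped, strongly negative `log_alpha` values give a gate of exactly 0, which is what turns a continuous relaxation into a binary keep-or-drop decision for each candidate edge.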
Read more
Stage-dependent integer-binary encoding in factorization-machine black-box optimization
Ryo Ogawa, Mayumi Nakano, Yuya Seki, Shu Tanaka
Optimization
  • Introduces a stage-dependent encoding framework for black-box optimization using factorization machines.
  • Derives conversion formulas between different integer-binary encoding methods to maintain surrogate objectives.
  • Demonstrates that one-hot encoding consistently outperforms other encoding methods during the learning stage.
  • Shows that the effectiveness of domain-wall encoding for solution search varies based on problem conditions.
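As background for the encodings compared above, the sketch below shows one-hot and domain-wall encodings of an integer and a bit-level conversion between them. The domain-wall convention used here (the first `value` bits set to 1) is one common choice and is an assumption; the paper's conversion formulas operate on surrogate objectives and are not reproduced.

```python
def one_hot(value, num_levels):
    """num_levels bits, with a single 1 at position `value`."""
    return [1 if i == value else 0 for i in range(num_levels)]

def domain_wall(value, num_levels):
    """num_levels - 1 bits: ones before the wall at `value`, zeros after."""
    return [1 if i < value else 0 for i in range(num_levels - 1)]

def domain_wall_to_one_hot(bits):
    """Recover the one-hot vector as differences of neighbouring bits.

    The bit string is padded with a fixed 1 on the left and a fixed 0
    on the right, so exactly one neighbouring pair differs.
    """
    padded = [1] + list(bits) + [0]
    return [padded[i] - padded[i + 1] for i in range(len(padded) - 1)]
```

Domain-wall encoding needs one fewer bit per integer variable, while one-hot needs a constraint that exactly one bit is set; that trade-off is one reason the preferred encoding can differ between the learning stage and the search stage.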
Read more
Noise is Signal: Density-Based Outliers as Leading Indicators of Occupational Emergence in Labor Market Text
Shreyash Rawat
NLP
  • Introduces the Emergence-Density Inversion (EDI) hypothesis, suggesting that noise in job postings indicates novelty.
  • Demonstrates that high-EOS outlier groups transition to stable clusters faster than low-EOS groups.
  • Implements an extended Emerging Occupation Score (EOS) that improves prediction accuracy for cluster formation.
  • Validates the EOS metric through retrospective studies of emerging roles, indicating its predictive capabilities.
Read more
Formalizing Task-Space Complexity for Zero-Shot Generalization
Jung-Hoon Cho, Heling Zhang, Siqi Du, Roy Dong, Cathy Wu
Reinforcement Learning Robotics Theory
  • Introduces signed divergence as a performance-based measure of task dissimilarity.
  • Defines task-space complexity in terms of ε-tolerance sets and provides geometric certificates.
  • Develops a greedy selection strategy for source contexts with an H(n) approximation guarantee.
  • Demonstrates the effectiveness of the proposed methods on both linear and nonlinear control systems.
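An H(n) guarantee is the classic bound for greedy set cover, so the selection step can be illustrated with that algorithm. This sketch assumes each candidate source context "covers" a set of target tasks (for example, those inside its ε-tolerance set); the mapping and names are hypothetical and the paper's actual procedure may differ.

```python
def greedy_cover(universe, candidates):
    """Pick candidates greedily until every element of `universe` is covered.

    `candidates` maps a name to the set of elements it covers. Greedy
    selection uses at most H(n) times the optimal number of candidates,
    where H(n) is the n-th harmonic number and n = len(universe).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered elements.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("remaining targets cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen

# Hypothetical source contexts and the target tasks each one handles well.
coverage = {"a": {1, 2, 3}, "b": {2, 4}, "c": {3, 4}, "d": {4, 5}}
selected = greedy_cover({1, 2, 3, 4, 5}, coverage)
```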
Read more
One-Step Flow Matching for Generative Modeling of Path-Dependent Physical Fields
Yijing Zhou, Jasmin Jelovica
Generative Models
  • Introduction of a transformer-based flow matching model for generating path-dependent stress fields.
  • Direct generation of stress fields across all time steps, improving efficiency and reducing computational costs.
  • Non-Gaussian source distribution reduces complexity in training, allowing for one-step generation of samples.
  • Demonstrated significant speedup over traditional finite element methods, making it feasible for complex simulations.
Read more
PG-MAP: Joint MAP Optimization for Inference-Time Alignment of Diffusion and Flow-Matching Models
Ruolan Sun, Pawel Polak
Generative Models Optimization Computer Vision
  • PG-MAP enables joint optimization of conditioning and latent states during inference, improving generative model performance.
  • The framework is training-free and adapts to both diffusion and flow-matching models, demonstrating versatility.
  • Empirical results show significant improvements in alignment metrics and human preference evaluations.
  • The methodology includes a schedule-adaptive trust region for optimizing variables at each denoising step.
Read more
The Two-Hump Problem: Bridging the Difficulty Gap in Mathematical Reinforcement Learning
Lucas Fagan, Michele Tarquini, Ali Shehper, Maksymilian Manko, Angus Gruen, Coco Huang, Giorgi Butbaia, Davide Passaro, Sergei Gukov
Reinforcement Learning Theory Optimization
  • Identification of the 'Two-Hump' difficulty distribution in the AC problem.
  • Introduction of substitution supermoves to enhance the action space for RL agents.
  • Development of the Dual-Ring Transformer architecture to effectively handle large action spaces.
  • Creation of two large benchmark datasets, AC-19 and AC-1M, for training and evaluation.
Read more
EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
Tristan Maidment, JB Lanier, Chase McDonald, Nathan Tsang, Eugene Vinitsky, Roy Fox, Albert Wang, Wesley N. Kerr
Reinforcement Learning
  • EMAgnet introduces an adaptive regularization target using an exponential moving average of policy parameters.
  • The method outperforms traditional uniform regularization in terms of exploitability in various game environments.
  • EMAgnet effectively discards dominated strategies while maintaining coverage over strategically relevant options.
  • The approach is applicable to deep reinforcement learning, extending previous tabular methods to more complex settings.
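The idea of regularizing toward an exponential moving average of the parameters can be sketched as below. The squared-distance penalty and the default constants are assumptions chosen for illustration; the summary does not state which distance or coefficients EMAgnet uses.

```python
def ema_update(ema_params, params, decay=0.99):
    """Move the EMA target a small step toward the current parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

def regularized_loss(policy_loss, params, ema_params, strength=0.1):
    """Policy-gradient loss plus a penalty for drifting from the EMA target.

    The squared L2 distance is an illustrative choice of penalty.
    """
    penalty = sum((p - e) ** 2 for p, e in zip(params, ema_params))
    return policy_loss + strength * penalty
```

In contrast to regularizing toward a fixed uniform policy, the target here follows the learner slowly, so strategies the policy has consistently abandoned stop being pulled back in.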
Read more
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
Chenhao Dang, Jing Ma, Mingjie Liao
Large Language Models Reinforcement Learning Optimization
  • Introduction of the Holistic Data Scheduler (HDS) for LLM pre-training.
  • HDS utilizes a multi-objective reward function to optimize data mixing.
  • Achieved 44% fewer training iterations on The Pile dataset compared to existing methods.
  • Demonstrated a 7.2% improvement in 0-shot accuracy on the MMLU benchmark.
Read more
NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction
Wenhao Gao, Yifan Wang, Yijia Ma, Carl Yang, Wen Li, Chenyu You
Multimodal Audio & Speech Generative Models
  • NeuroSonic introduces a conditional flow-matching framework for EEG-to-speech reconstruction.
  • The method learns a deterministic velocity field for transporting corrupted acoustic states to clean speech.
  • Utilizes a time-conditioned gated Transformer for joint processing of EEG and audio signals.
  • Demonstrates significant improvements over existing methods, especially in challenging artifact-heavy conditions.
Read more
When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs
Lucky Verma, Pratik Yadav
NLP Large Language Models Generative Models
  • Top-1 argmax concentration is ineffective as a stability warning in DLM fine-tuning.
  • No training collapse occurred in any of the 816 configurations, even though the top-1 warning fired in all of them.
  • Max gradient norm is proposed as a more reliable indicator of training stability, achieving higher precision and F1 scores.
  • Calibration of monitoring thresholds should be done per DLM family rather than using a universal constant.
Read more
SOAP-Bubbles: Structured Weight Uncertainty for Neural Networks
Adrian Robert Minut, Nico Daheim, Marco Miani, Mohammad Emtiyaz Khan, Wu Lin, Thomas Möllenhoff
Optimization Large Language Models Theory
  • SOAP-Bubbles provide a method for transforming diagonal covariances into non-diagonal ones using SOAP's preconditioner.
  • The Eigenspace-VON (EVON) optimizer allows for efficient optimization of structured posteriors without extensive changes to training pipelines.
  • EVON shows improved performance over existing methods like IVON in both training speed and final loss for language models.
  • The approach captures richer representations of weight uncertainty, beneficial for various applications in deep learning.
Read more
Topological Out-of-Domain Generalization in Dynamical Systems Reconstruction
Georg Trede, Charlotte Ricarda Doll, Elias Weber, Daniel Durstewitz
Theory Time Series
  • Identified three core shortcomings in existing DSR models that limit out-of-domain generalization.
  • Proposed feature splitting as a key remedy to improve model performance across different dynamical regimes.
  • Derived a closed-form bound on the reliable extrapolation range for predictions.
  • Demonstrated significant improvements in zero-shot prediction capabilities through empirical validation.
Read more
Rapid FinFET Modelling Using an Autoencoder
Amit Sarkar, Suman Sau, Swagata Mandal
Efficient ML
  • Utilizes an autoencoder for efficient FinFET modeling.
  • Incorporates drain-to-source voltage (VDS) as an input feature.
  • Achieves high accuracy in reconstructing I-V curves with minimal training data.
  • Extracts critical device metrics directly from the model.
Read more
An LLM-based Two-Stage Transformer Framework for Cross-Domain Bearing Fault Diagnosis with Limited Data
Jinghan Wang, Feng Cheng, Wentao Wu, Hang Li, Gaoliang Peng, Tianchen Liu
Time Series
  • Introduces a knowledge-guided framework that handles simultaneous dataset and condition shifts in bearing fault diagnosis.
  • Establishes explicit knowledge transfer mechanisms that outperform traditional implicit alignment methods.
  • Develops a dynamic classification head for seamless adaptation across heterogeneous fault taxonomies.
  • Demonstrates superior performance with limited labeled data, achieving 92.61% accuracy.
Read more
Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search
Yashkumar R Lukhi, Harsh Rameshbhai Moradiya, Radu Timofte, Dmitry Ignatov
Computer Vision Efficient ML Optimization
  • Introduces an automated pipeline for exploring heterogeneous MoE4 architectures.
  • Identifies a coverage bias in the search space, anchored to a single architecture family.
  • Proposes a stratified random sampling method to mitigate coverage bias.
  • Identifies ShuffleNet and MobileNetV3 as high-yield families for ensemble accuracy.
Read more
A Verifiable Search Is Not a Learnable Chain-of-Thought
Harsh Patel
Theory Large Language Models Reinforcement Learning
  • The assumption that all solvable tasks can be learned as a chain-of-thought is challenged.
  • A significant gap exists between the accuracy of verifiable solvers and the performance of fine-tuned models, especially for cryptarithm tasks.
  • The concept of 'verdict-as-token' highlights the limitations of model outputs in decision-making despite high arithmetic accuracy.
  • Forward-derivable tasks are learnable, while those requiring backtracking search are not, unless the search is precomputed.
Read more
Physiology-Aware CNN and Zero-Shot Multimodal LLMs for ECG Image Classification: A Comparative Study
Khalil Ahammad, Derek Abbott, Mohsen Dorraki
Computer Vision Large Language Models Multimodal
  • Physiology-aware CNNs outperform zero-shot LLMs in ECG image classification.
  • LeadGroupECG model effectively captures anatomical relationships among ECG leads.
  • Zero-shot multimodal LLMs show limited diagnostic discrimination capabilities.
  • CNN models achieved high ROC-AUC scores, indicating strong classification performance.
Read more
Weight-Space Geometry of Offline Reasoning Training
Aleksandr Nikolich, Igor Kiselev, Vladimir Platonov, Karina Romanova
Reinforcement Learning Large Language Models Theory
  • SFT, RFT, and RIFT produce nearly collinear weight updates with similar performance metrics.
  • DFT diverges significantly in weight direction compared to reward-weighted methods.
  • Offline GRPO introduces a substantial orthogonal component while remaining in the SFT loss basin.
  • DPO achieves the highest accuracy on GSM8K and AIME26, despite using a smaller learning rate.
Read more
MGI: Member vs Generated Inference
Bihe Zhao, Michel Meintz, Juangui Xu, Franziska Boenisch, Adam Dziedzic
Generative Models Computer Vision
  • Introduction of the Member vs Generated Inference (MGI) task.
  • Existing membership inference methods are inadequate for distinguishing between training members and generated outputs.
  • The proposed Data Circuit Breaker (DCB) method effectively addresses the MGI challenge.
  • DCB combines signals from an autoencoder and latent generator to improve classification accuracy.
Read more
Learning the Koopman Operator using Attention Free Transformers
Mohammed Nagdi, Evangelos-Marios Nikolados, Alexey Yermakov, Mars Gao, Nathan Kutz, Filippo Menolascina
Time Series Theory Optimization
  • Introduction of an attention-free latent memory block for improved prediction accuracy.
  • Dynamic re-encoding mechanism to correct latent drift and maintain model robustness.
  • Demonstrated effectiveness across multiple benchmark systems with significant error reduction.
  • Lower inference latency compared to traditional models while maintaining accuracy.
Read more
Temporal-Spectral Alignment with Frequency Adaptation for Source-Free Time-Series Adaptation
Shichang Meng, Linquan Wu, Xuan Ai, Linqi Song
Time Series
  • Introduces a novel SFDA framework for time-series data that adapts at the signal level rather than the feature level.
  • Proposes a lightweight Frequency Adaptation Layer (FAL) for spectral alignment, enhancing adaptation efficiency.
  • Demonstrates superior performance on benchmark datasets, achieving state-of-the-art results in macro F1-score.
  • Addresses both temporal and spectral shifts in time-series data, which are often overlooked in existing methods.
Read more
When Is an LLM Worth It for Hyperparameter Optimization? A Budget-Matched Study on Tabular Data Finds the Warm-Start Is a Default Configuration, Not the Model
Carson Rodrigues, Oysturn Vas
Optimization Large Language Models
  • The warm-start advantage in LLM-HPO is attributed to a fixed default configuration rather than the model's suggestions.
  • LLM proposals add minimal improvement to cross-validation accuracy and none to held-out test accuracy.
  • Classical search methods seeded with a sensible default outperform LLM advisors within a few evaluations.
  • The study emphasizes the importance of proper baseline comparisons in evaluating LLM performance in HPO.
Read more
KLip-PPO: A per-sample KL perspective on PPO-Clip
Riccardo Colletti, Robin Holzinger
Reinforcement Learning Optimization Theory
  • Establishes a per-sample equivalence between PPO-Clip and PPO-KL surrogates.
  • Demonstrates that both surrogates produce indistinguishable training outcomes on benchmark tasks.
  • Clarifies the implicit structure of the PPO-Clip surrogate as a per-sample KL penalty.
  • Suggests new avenues for algorithmic generalization based on the identified structural features.
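For reference, the standard per-sample PPO-Clip surrogate that the paper reinterprets looks like this. The sketch shows only the well-known clipped objective; the per-sample KL form derived in the paper is not reproduced here.

```python
def ppo_clip_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample PPO-Clip objective (to be maximized).

    `ratio` is pi_new(a|s) / pi_old(a|s). Taking the minimum of the
    unclipped and clipped terms removes any incentive to push the ratio
    outside [1 - epsilon, 1 + epsilon] in the direction that would
    increase the objective.
    """
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + epsilon`, and with a negative advantage it stops growing once the ratio drops below `1 - epsilon`, so each sample carries its own implicit trust region.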
Read more
CLIP-guided Diffusion Model for Backdoor Generation in Sensor-based Human Activity Recognition
Toby Briston, Illya Kosyk, Kuniyih S
Generative Models Time Series Multimodal
  • Introduction of IMU-DM-CLIP, a backdoor training technique for HAR models.
  • Demonstration of the effectiveness of backdoor attacks with low injection rates.
  • Identification of challenges in generating unbiased synthetic data for HAR.
  • Evaluation of the stealthiness and performance of the proposed backdoor generation technique.
Read more
Collapsed Effective Operators for Higher-order Structures
Maximilian Krahn, Lennart Bastian, Vikas Garg, Björn Schuller, Tolga Birdal
Graph Learning Theory
  • Introduction of Collapsed Effective Operators for higher-order structures.
  • The operator condenses higher-order interactions into a single vertex-level representation.
  • Preserves positive semi-definiteness and lowers system energy under higher-order connectivity.
  • Empirical improvements in spectral clustering and signal smoothing.
Read more