AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

62 papers today · 8-hour update frequency · 7 days of history
Deep Learning-Based Metamodeling of Nonlinear Stochastic Dynamic Systems under Parametric and Predictive Uncertainty
Haimiti Atila, Seymour M.J. Spence
Time Series Theory Optimization
  • Introduces three metamodeling frameworks for nonlinear dynamic systems that account for both loading and parameter uncertainties.
  • Utilizes advanced deep learning architectures (MLP, MPNN, AE) combined with LSTM for effective feature extraction and time-series prediction.
  • Demonstrates low prediction errors across different structural models, validating the effectiveness of the proposed methods.
  • Establishes a correlation between predictive variance and actual error, enhancing model reliability and confidence in predictions.
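As a rough illustration of one such combination (not the paper's exact architectures; all sizes and names below are hypothetical), a parameter-conditioned MLP-plus-LSTM surrogate in PyTorch might look like:

    import torch
    import torch.nn as nn

    class ParamConditionedLSTM(nn.Module):
        """Toy surrogate: an MLP encodes the uncertain system parameters, and
        an LSTM maps the stochastic loading time series to the response time
        series. Purely illustrative sizes, not the paper's configuration."""
        def __init__(self, n_params=4, load_dim=1, hidden=64, out_dim=1):
            super().__init__()
            self.param_mlp = nn.Sequential(
                nn.Linear(n_params, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
            self.lstm = nn.LSTM(load_dim + hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, out_dim)

        def forward(self, params, loading):
            # params: (B, n_params); loading: (B, T, load_dim)
            ctx = self.param_mlp(params).unsqueeze(1).expand(-1, loading.size(1), -1)
            h, _ = self.lstm(torch.cat([loading, ctx], dim=-1))
            return self.head(h)  # (B, T, out_dim) predicted response

    model = ParamConditionedLSTM()
    y = model(torch.randn(8, 4), torch.randn(8, 200, 1))  # -> (8, 200, 1)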
Read more
Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors
Wei W. Xing, Kaiqi Huang, Jiazhan Liu, Hong Qiu, Shan Shen
Optimization Efficient ML Theory
  • Introduces a learned prior framework that eliminates the need for hyperparameter tuning in yield analysis.
  • Achieves state-of-the-art accuracy with mean relative errors as low as 0.11%.
  • Reduces total validation cost by more than a factor of 10 compared to traditional methods.
  • Demonstrates effective cross-corner knowledge transfer through an attention mechanism.
Read more
A Multi-Label Temporal Convolutional Framework for Transcription Factor Binding Characterization
Pietro Demurtas, Ferdinando Zanchetta, Giovanni Perini, Rita Fioresi
Time Series
  • Introduces a multi-label classification framework for predicting TF binding sites.
  • Utilizes Temporal Convolutional Networks (TCNs) for improved performance over traditional methods.
  • Demonstrates the ability to capture correlations among multiple TFs and their cooperative mechanisms.
  • Reveals biologically meaningful motifs and novel TF interactions.
Read more
A Learning-Based Superposition Operator for Non-Renewal Arrival Processes in Queueing Networks
Eliran Sherzer
Theory Efficient ML Optimization
  • Introduces a scalable, data-driven superposition operator for non-renewal arrival processes.
  • Utilizes deep learning to accurately reconstruct statistical descriptors of merged arrival streams.
  • Demonstrates significant performance improvements over classical renewal-based methods.
  • Enables decomposition-based analysis of queueing networks with merging flows.
Read more
RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction
Hanbum Ko, Chanhui Lee, Ye Rin Kim, Rodrigo Hormazabal, Sehui Han, Sungbin Lim, Sungwoong Kim
Large Language Models Reinforcement Learning Generative Models
  • RetroReasoner incorporates a stepwise reasoning process that aligns with chemists' strategies for retrosynthesis.
  • The model is trained using a novel framework, SyntheticRetro, which generates structured reasoning text.
  • RetroReasoner employs reinforcement learning with round-trip accuracy as a reward to enhance prediction feasibility.
  • Experimental results indicate significant performance improvements over existing retrosynthesis prediction models.
Read more
A Stable Neural Statistical Dependence Estimator for Autoencoder Feature Analysis
Bo Hu, Jose C Principe
Theory Efficient ML Generative Models
  • Introduces a stable neural dependence estimator for analyzing autoencoders.
  • Avoids input concatenation and re-pairing, improving computational efficiency.
  • Demonstrates that Gaussian noise assumptions enable meaningful statistical dependence measurements.
  • Proposes a scalar objective based on NMF for enhanced stability.
Read more
Duration Aware Scheduling for ASR Serving Under Workload Drift
Darshan Makwana, Yash Jogi, Harsh Kotta, Aayush Kubba
Audio & Speech Optimization Efficient ML
  • Duration-aware scheduling significantly improves end-to-end latency in ASR systems.
  • Shortest Job First (SJF) can reduce median latency by up to 73%, but may cause increased tail latency.
  • Highest Response Ratio Next (HRRN) balances latency reduction and tail latency control effectively.
  • Both scheduling algorithms incur less than 0.1 ms overhead per request.
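HRRN itself is a classical policy; a minimal sketch (independent of the paper's serving stack), using the predicted audio duration as the service-time estimate:

    def hrrn_pick(queue, now):
        """Highest Response Ratio Next: ratio = (wait + service) / service.
        queue: list of (arrival_time, predicted_duration, request_id)."""
        def ratio(req):
            arrival, duration, _ = req
            return ((now - arrival) + duration) / duration
        return max(queue, key=ratio)

    # A long job that has waited overtakes a fresh short one, avoiding the
    # starvation that pure SJF can cause:
    queue = [(0.0, 8.0, "long"), (9.0, 1.0, "short")]
    print(hrrn_pick(queue, now=10.0))  # long: 18/8 = 2.25 beats short: 2/1 = 2.0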
Read more
Retrieval-Enhanced Real Estate Appraisal
Simon Popelier, Matthieu X. B. Sarazin, Maximilien Bohm, Mathieu Gierski, Hanna Mergui, Matthieu Ospici, Adrien Bernhardt
Efficient ML Interpretability
  • Introduces a new comparable selection framework based on retrieval-enhanced machine learning (REML).
  • Demonstrates that learning to select comparables yields higher-quality comparables compared to traditional methods.
  • Achieves similar performance with up to 22 times fewer parameters than state-of-the-art models.
  • Enhances model explainability and confidence for decision-makers by simplifying the examination of retrieved properties.
Read more
CAETC: Causal Autoencoding and Treatment Conditioning for Counterfactual Estimation over Time
Nghia D. Nguyen, Pablo Robles-Granda, Lav R. Varshney
Time Series Theory Optimization
  • CAETC addresses time-dependent confounding bias in counterfactual estimation.
  • The method is model-agnostic and can be applied to various sequence architectures.
  • An entropy maximization adversarial game is introduced to ensure balanced representations.
  • CAETC shows significant improvements over existing counterfactual estimation methods.
Read more
Learning Pore-scale Multiphase Flow from 4D Velocimetry
Chunyang Wang, Linqi Zhu, Yuxuan Gu, Robert van der Merwe, Xin Ju, Catherine Spurin, Samuel Krevor, Rex Ying, Tobias Pfaff, Martin J. Blunt, Tom Bultreys, Gege Wen
Graph Learning Multimodal Time Series
  • Introduces a multimodal learning framework for pore-scale multiphase flow prediction.
  • Combines graph network simulation with 3D U-Net architecture for enhanced accuracy.
  • Achieves significant reduction in computational time for predictions, enabling real-time applications.
  • Captures complex flow dynamics and interface evolution effectively, including transient phenomena.
Read more
Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers
Yue Zhang, Chuanlong Qiu, Xinfa Liao, Yiqun Zhang
Federated Learning
  • Introduces Fed-k∗-HC, a federated clustering framework that automatically determines the optimal number of clusters.
  • Addresses the issue of imbalanced cluster distributions in federated learning scenarios.
  • Utilizes a hierarchical merging process to explore clusters of varying sizes and shapes.
  • Demonstrates improved clustering performance through extensive experiments on diverse datasets.
Read more
Bridging Discrete Marks and Continuous Dynamics: Dual-Path Cross-Interaction for Marked Temporal Point Processes
Yuxiang Liu, Qiao Liu, Tong Luo, Yanglei Gan, Peng He, Yao Liu
Time Series
  • NEXTPP integrates discrete event marks and continuous dynamics through a dual-channel architecture.
  • The model employs self-attention for discrete encoding and Neural ODE for continuous evolution.
  • A cross-attention mechanism allows for bidirectional interaction between discrete and continuous representations.
  • Extensive evaluations show superior performance compared to existing models on real-world datasets.
Read more
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh
Reinforcement Learning Robotics Optimization
  • Introduction of cross-domain Bellman consistency to measure model transferability.
  • Development of the QAvatar framework for effective knowledge transfer between domains with distinct state and action spaces.
  • Establishment of convergence properties for the QAvatar algorithm.
  • Demonstration of QAvatar's superior performance on various reinforcement learning benchmarks.
Read more
Probing Length Generalization in Mamba via Image Reconstruction
Jan Rathjens, Robin Schiewer, Laurenz Wiskott, Anand Subramoney
Computer Vision NLP Efficient ML
  • Mamba's performance degrades on sequences longer than those encountered during training.
  • The study uses image reconstruction tasks to probe Mamba's length generalization capabilities.
  • A length-adaptive variant of Mamba is introduced, improving performance on varying sequence lengths.
  • The research highlights the importance of understanding internal processing mechanisms in sequence models.
Read more
Chem4DLLM: 4D Multimodal LLMs for Chemical Dynamics Understanding
Xinyu Li, Zhen Zhang, Qi Chen, Anton van den Hengel, Lina Yao, Javen Qinfeng Shi
NLP Large Language Models Multimodal
  • Introduction of Chemical Dynamics Understanding (ChemDU) to model dynamic chemical phenomena.
  • Development of Chem4DBench, the first dataset linking 4D molecular trajectories with natural language explanations.
  • Proposal of Chem4DLLM, a model that combines graph encoding with large language models for enhanced molecular understanding.
  • Focus on generating coherent narratives that describe chemical events, improving interpretability of dynamic processes.
Read more
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
Callum McLean, Luke Y. Prince, Alexandre Payot, Paul Balança, Carlo Luschi
Large Language Models Efficient ML
  • MXNorm reduces the computational overhead of normalization by reusing block scales from MXFP8 quantization.
  • By reusing the per-block scales (one per 32 elements in MXFP8), the method shrinks the reductions needed for normalization by 32x; the toy sketch below illustrates the idea.
  • Validation on Llama 3 models shows minimal loss of training accuracy compared to RMSNorm.
  • MXNorm provides kernel speedups of up to 2.4x over RMSNorm, enhancing efficiency in large language models.
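A toy numpy sketch of the general idea, estimating the RMS statistic from per-block amax scales (block size 32, as in MXFP8) instead of a full-tensor reduction; the calibration constant here is a crude Gaussian assumption of ours, not the paper's estimator:

    import numpy as np

    def rmsnorm(x, g, eps=1e-6):
        return x / np.sqrt(np.mean(x * x) + eps) * g

    def block_scale_rmsnorm(x, g, block=32, eps=1e-6):
        # Reduce over len(x)/32 per-block scales instead of all elements.
        scales = np.abs(x.reshape(-1, block)).max(axis=1)
        # Crude calibration: for N(0, s), the max of 32 magnitudes is ~2.6 s.
        # This constant is an assumption for the toy, not MXNorm's method.
        rms_est = scales.mean() / 2.6
        return x / (rms_est + eps) * g

    x = np.random.randn(4096).astype(np.float32)
    g = np.ones_like(x)
    print(np.abs(rmsnorm(x, g) - block_scale_rmsnorm(x, g)).max())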
Read more
A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric
Keru Wang, Yixin Deng, Yao Lyu, Stephen Redmond, Shengbo Eben Li
Reinforcement Learning Theory
  • The distributional Bellman operator is contractive under the Cramér metric, ensuring stability in policy evaluation.
  • Existing analyses lack insight into the structural dynamics of the Bellman update on distributions.
  • The authors introduce a two-level analytical framework to analyze Bellman dynamics at the CDF level and construct regularized Hilbert spaces.
  • The framework preserves the intrinsic Cramér geometry while enabling operator-theoretic analysis.
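For reference, the Cramér distance between return CDFs F and G, and the standard square-root-gamma contraction of the distributional Bellman operator under its supremal (over states) form — the known result such analyses build on:

    \ell_2(F, G) = \left( \int_{\mathbb{R}} \bigl(F(x) - G(x)\bigr)^{2} \, dx \right)^{1/2},
    \qquad
    \bar{\ell}_2\!\left(\mathcal{T}^{\pi}\eta_1,\, \mathcal{T}^{\pi}\eta_2\right) \le \sqrt{\gamma}\,\bar{\ell}_2(\eta_1, \eta_2).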
Read more
H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code
Amit Singh, Vedant Nipane, Pulkit Agrawal, Jatin Kishnani
Large Language Models Generative Models
  • H2LooP Spark Preview adapts a 7-billion parameter model for low-level embedded systems programming.
  • A large-scale training corpus was created from repository-datasheet pairs, enabling effective domain adaptation.
  • The model achieved a 70.4% reduction in in-domain perplexity and surpassed larger models in generative benchmarks.
  • Extensive hyperparameter tuning established optimal configurations for continual pretraining.
Read more
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov, Abdul Ahad Butt, Gül Varol, Pascal Fua, Fabio Pizzati, Ivan Laptev
Generative Models Optimization Robotics
  • PhysMoDPO integrates a Whole-Body Controller into the training of diffusion models for humanoid motion generation.
  • The framework uses physics-based and task-specific rewards to ensure generated motions are both realistic and condition-faithful.
  • Extensive experiments show consistent improvements in physical realism and task metrics in simulation.
  • PhysMoDPO enables zero-shot motion transfer to real robots, demonstrating its practical applicability.
Read more
Beyond Barren Plateaus: A Scalable Quantum Convolutional Architecture for High-Fidelity Image Classification
Radhakrishnan Delhibabu
Computer Vision Theory Efficient ML
  • Introduction of a novel QCNN architecture that mitigates barren plateaus.
  • Achieved a classification accuracy of 98.7% on the MNIST dataset.
  • Demonstrated a significant reduction in required trainable parameters compared to classical CNNs.
  • Utilized localized cost functions and tensor-network initialization to enhance trainability.
Read more
NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation
Yuxin Yang, Haoran Zhang, Mingxuan Li, Jiachen Xu, Ruoxi Shen, Zhenyu Wang, Tianhao Liu, Siqi Chen, Weilin Huang
NLP Large Language Models Efficient ML
  • NeuroLoRA introduces a context-aware neuromodulation mechanism to enhance expert selection in multi-task adaptation.
  • The framework retains the computational efficiency of frozen random projections while allowing for dynamic adjustments based on input context.
  • A Contrastive Orthogonality Loss is proposed to improve task decoupling and mitigate catastrophic forgetting in continual learning.
  • Extensive experiments show that NeuroLoRA consistently outperforms existing methods like FlyLoRA in various adaptation scenarios.
Read more
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
Jiawei Hao, Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Dan Zeng
Large Language Models Efficient ML
  • Introduces a novel expert replacing paradigm for MoE models.
  • Achieves significant memory efficiency without sacrificing performance.
  • Demonstrates superior performance compared to existing compression methods.
  • Utilizes adaptive expert selection and hierarchical expert construction.
Read more
Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty
Marcell T. Kurbucz
Optimization
  • Introduces a four-phase framework for spectral risk optimization under decision-dependent uncertainty.
  • Utilizes Generalised Random Forests for adaptive conditional sampling to address distribution shifts.
  • Implements a two-stage oracle reranking mechanism to enhance solution quality.
  • Demonstrates superior performance in reducing variance and improving reliability compared to existing methods.
Read more
Differentiable Thermodynamic Phase-Equilibria for Machine Learning
Karim K. Ben Hicham, Moreno Ascani, Jan G. Rittig, Alexander Mitsos
Optimization Theory
  • Introduction of DISCOMAX, a differentiable algorithm for phase-equilibrium calculations.
  • Ensures thermodynamic consistency during training and inference.
  • Outperforms existing surrogate-based methods for binary liquid-liquid equilibrium data.
  • Provides a general framework for learning from different types of equilibrium data.
Read more
GeoChemAD: Benchmarking Unsupervised Geochemical Anomaly Detection for Mineral Exploration
Yihao Ding, Yiran Zhang, Chris Gonzalez, Eun-Jung Holden, Wei Liu
Theory
  • Introduction of GeoChemAD, a comprehensive benchmark dataset for unsupervised geochemical anomaly detection.
  • Benchmarking of various unsupervised anomaly detection methods, establishing the first unified performance comparison.
  • Development of GeoChemFormer, a transformer-based framework that enhances anomaly detection through self-supervised learning.
  • Demonstration of superior performance of GeoChemFormer across diverse geochemical datasets.
Read more
Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models
Yijun Quan, Wentai Wu, Giovanni Montana
Federated Learning Theory Efficient ML
  • Introduces exact federated continual unlearning for ridge heads on frozen foundation models.
  • Develops a communication protocol that allows for efficient handling of add/delete requests.
  • Proves deterministic exactness and invariance properties of the proposed methods.
  • Demonstrates experimental validation matching centralized retraining with minimal error.
Read more
Comparison of Outlier Detection Algorithms on String Data
Philip Maus
Theory
  • Introduces a modified local outlier factor algorithm for string data using Levenshtein distance.
  • Presents a new outlier detection algorithm based on hierarchical left regular expression learning.
  • Demonstrates the effectiveness of both algorithms in identifying outliers in string datasets.
  • Highlights the conditions under which each algorithm performs optimally.
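The paper's modified algorithm is its own; the basic recipe — LOF over a precomputed Levenshtein distance matrix — can be sketched with scikit-learn as follows:

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    def levenshtein(a: str, b: str) -> int:
        """Classic two-row dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    strings = ["error_42", "error_43", "error_44", "zzzzzzzz"]
    D = np.array([[levenshtein(a, b) for b in strings] for a in strings], dtype=float)
    lof = LocalOutlierFactor(n_neighbors=2, metric="precomputed")
    print(lof.fit_predict(D))  # -1 flags outliers; the last string should stand out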
Read more
Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents
Sky Chenwei Wan, Tianjun Hou, Yifei Wang, Xiqing Chang, Aymeric Jan
Time Series Interpretability
  • Introduction of Knowledge-Guided TSED (K-TSED) for event detection using natural language descriptions.
  • Development of the Event Logic Tree (ELT) framework to represent temporal-logic structures of events.
  • Creation of a neuro-symbolic VLM agent system (SELA) for zero-shot event detection.
  • Validation through a benchmark demonstrating superior performance compared to existing methods.
Read more
FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
Jun Xue, Junze Wang, Xinming Zhang, Shanze Wang, Yanjun Chen, Wei Zhang
Reinforcement Learning Robotics
  • FastDSAC effectively scales Maximum Entropy RL for high-dimensional humanoid control.
  • Dimension-wise Entropy Modulation (DEM) improves exploration efficiency by redistributing the exploration budget.
  • A continuous distributional critic enhances value fidelity and reduces overestimation errors.
  • FastDSAC achieves performance gains of 180% and 400% over deterministic baselines on challenging tasks.
Read more
High-resolution weather-guided surrogate modeling for data-efficient cross-location building energy prediction
Piragash Manmatharasan, Girma Bitsuamlak, Katarina Grolinger
Optimization Time Series Efficient ML
  • Introduces a high-resolution weather-informed surrogate modeling approach for building energy prediction.
  • Achieves effective cross-location generalization with minimal simulation effort.
  • Maintains high predictive accuracy when trained on a single location and applied to others within the same climate zone.
  • Utilizes weekly weather data to capture short-term energy demand patterns, improving model transferability.
Read more
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
Alliot Nagle, Jakhongir Saydaliev, Dhia Garbaya, Michael Gastpar, Ashok Vardhan Makkuva, Hyeji Kim
Large Language Models Efficient ML Optimization
  • Introduction of hindsight-optimal reasoning length (HORL) for determining optimal exit points in CoT reasoning.
  • Development of TERMINATOR, an inference-time early-exit algorithm that significantly reduces CoT lengths.
  • Creation of a novel dataset for training the early-exit strategy based on the first logical arrival of final answers.
  • Demonstrated substantial reductions in reasoning lengths (14%-55%) across multiple datasets.
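Not the TERMINATOR policy itself (which must decide online), but the hindsight label the bullets describe — the first step after which the extracted answer never changes — is easy to state:

    def hindsight_optimal_exit(interim_answers):
        """Earliest step whose suffix of extracted answers already equals the
        final answer -- one reading of 'first logical arrival'."""
        final = interim_answers[-1]
        for i in range(len(interim_answers)):
            if all(a == final for a in interim_answers[i:]):
                return i

    print(hindsight_optimal_exit(["?", "12", "7", "7", "7"]))  # -> 2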
Read more
Statistical and structural identifiability in representation learning
Walter Nelson, Marco Fumero, Theofanis Karaletsos, Francesco Locatello
Theory Generative Models Computer Vision
  • Introduces statistical and structural identifiability as distinct concepts in representation learning.
  • Proposes model-agnostic definitions of near-identifiability allowing for error tolerance.
  • Demonstrates that ICA can resolve linear ambiguities in representation learning models.
  • Achieves state-of-the-art disentanglement using a vanilla autoencoder combined with ICA.
Read more
Fingerprinting Concepts in Data Streams with Supervised and Unsupervised Meta-Information
Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet, Russel Pears
Time Series
  • FiCSUM framework combines supervised and unsupervised meta-information for concept representation.
  • Dynamic weighting strategy enhances the adaptability of the framework across different datasets.
  • FiCSUM significantly outperforms existing methods in classification accuracy and concept drift detection.
  • Concept fingerprints allow for effective reuse of classifiers for recurring concepts.
Read more
Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models
Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam
NLP Large Language Models Interpretability
  • TreeKD enhances LLM performance in MPP by distilling knowledge from tree-based models.
  • The method verbalizes decision tree rules into natural language for LLM training.
  • Rule-consistency improves prediction robustness by ensembling outputs from diverse rules.
  • TreeKD narrows the performance gap between LLMs and specialist models in MPP tasks.
Read more
Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks
Kun Wang, Reinhard Heckel
Large Language Models Reinforcement Learning NLP
  • Direct evaluation of LLMs can misrepresent their capabilities due to task familiarity.
  • The proposed TTRA method aligns models effectively without requiring a specific training dataset.
  • Post-alignment, base models show performance comparable to fine-tuned models, especially in reasoning tasks.
  • Many reported performance gains from RL and SFT may be artifacts of task familiarity rather than genuine reasoning improvements.
Read more
No More DeLuLu: Physics-Inspired Kernel Networks for Geometrically-Grounded Neural Computation
Taha Bouhsine
Theory Efficient ML Interpretability
  • Introduction of the ⵟ-product kernel operator that combines alignment and proximity.
  • Neural Matter Networks (NMNs) utilize the ⵟ-product, eliminating the need for separate activation functions.
  • Empirical results show NMNs match linear classifiers on MNIST and outperform GPT-2 in language modeling.
  • The framework offers a unified approach to kernel learning and gradient stability.
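The ⵟ-product is the paper's own construction; purely as an illustration of coupling alignment with proximity in a single operator (this particular combination is our assumption, not the paper's definition), one could write:

    import numpy as np

    def align_prox(x, w, sigma=1.0):
        """Illustrative only: cosine alignment modulated by RBF proximity."""
        align = x @ w / (np.linalg.norm(x) * np.linalg.norm(w) + 1e-12)
        prox = np.exp(-np.sum((x - w) ** 2) / (2 * sigma**2))
        return align * prox

    print(align_prox(np.array([1.0, 0.0]), np.array([0.9, 0.1])))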
Read more
Causal Representation Learning with Optimal Compression under Complex Treatments
Wanting Liang, Haoang Chi, Zhiheng Zhang
Theory Efficient ML
  • Introduces a novel generalization bound for multi-treatment causal representation learning.
  • Proposes a consistent estimator for the optimal balancing weight α, eliminating heuristic tuning.
  • Demonstrates O(1) scalability with the Treatment Aggregation strategy.
  • Extends the framework to a generative architecture preserving Wasserstein geodesic structure.
Read more
Thermodynamics of Reinforcement Learning Curricula
Jacob Adamczyk, Juan Sebastian Rojas, Rahul V. Kulkarni
Reinforcement Learning Optimization Theory
  • Introduces a geometric framework for curriculum learning in reinforcement learning.
  • Optimal curricula are shown to correspond to geodesics in a task manifold defined by reward parameters.
  • Presents the 'MEW' algorithm for temperature annealing in maximum-entropy RL.
  • Challenges the assumption of a flat task space in traditional RL approaches.
Read more
IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
Zhoujun Cheng, Yutao Xie, Yuxiao Qu, Amrith Setlur, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar
Reinforcement Learning Large Language Models Optimization
  • Optimal number of parallel rollouts increases with compute budget and saturates at high levels.
  • Scaling trends differ between easy and hard problems, with distinct underlying mechanisms.
  • Performance is relatively insensitive to the number of unique problems per batch compared to rollouts per problem.
  • Guidelines for compute allocation can help maximize performance in LLM RL training.
Read more
Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs
Zhangyong Liang, Ji Zhang
Theory Optimization Time Series
  • DLDMF provides a novel approach to generalizing neural surrogate models for parameterized PDEs.
  • The framework utilizes a deterministic feed-forward mapping for encoding PDE parameters, avoiding unstable test-time auto-decoding.
  • DLDMF integrates spatial, temporal, and parameter information into a cohesive latent representation.
  • Extensive experiments validate DLDMF's superior performance in predictive accuracy and extrapolation robustness.
Read more
PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
Chenlong Yin, Runpeng Geng, Yanting Wang, Jinyuan Jia
NLP Large Language Models Reinforcement Learning
  • PISmith is a reinforcement learning-based framework for evaluating prompt injection defenses.
  • The framework addresses reward sparsity issues in training attack LLMs through adaptive entropy regularization and dynamic advantage weighting.
  • Extensive evaluations show that state-of-the-art defenses are vulnerable to adaptive attacks.
  • PISmith consistently achieves higher attack success rates compared to seven baseline methods.
Read more
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
Indranil Halder, Annesya Banerjee, Cengiz Pehlevan
NLP Large Language Models Theory
  • Adversarial prompt-injection attacks can significantly increase the attack success rate of LLMs.
  • The scaling of attack success rate transitions from polynomial to exponential with increased inference-time samples.
  • A theoretical model based on spin-glass theory provides insights into the behavior of LLMs under adversarial conditions.
  • Short prompts lead to power-law scaling, while long prompts result in exponential scaling of attack success rates.
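As a baseline for such crossovers (not the paper's spin-glass model): with independent samples that each succeed with probability p, the attack success rate after n samples is

    \mathrm{ASR}(n) = 1 - (1 - p)^{n}, \qquad
    \mathrm{ASR}(n) \approx n p \ \ (np \ll 1), \qquad
    1 - \mathrm{ASR}(n) = e^{\, n \ln(1 - p)},

i.e., near-linear (hence polynomial) growth while np is small, with the failure probability decaying exponentially thereafter.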
Read more
Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization
Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao
Reinforcement Learning Robotics Theory
  • H-EARS unifies potential-based reward shaping with energy-aware action regularization.
  • The framework achieves linear complexity by focusing on dominant energy components.
  • Theoretical foundations include convergence guarantees and performance-modeling trade-offs.
  • Empirical results show improved convergence speed and energy efficiency across benchmarks.
Read more
As Language Models Scale, Low-order Linear Depth Dynamics Emerge
Buddhika Nettasinghe, Geethu Joseph
NLP Large Language Models Theory
  • Low-order linear surrogates can accurately capture the depth dynamics of large language models.
  • Agreement between linear surrogates and full models improves with increasing model size.
  • The linear surrogate enables more efficient intervention strategies than standard heuristics.
  • The study provides a systems-theoretic framework for analyzing transformer dynamics.
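One plausible reading of a low-order linear surrogate (our reading; the paper's construction may differ) is a single linear map fitted by least squares across the residual stream:

    import numpy as np

    # hidden[l]: residual-stream state after layer l from a forward pass;
    # random stand-ins here -- substitute real activations in practice.
    num_layers, T, d = 12, 256, 64
    hidden = np.random.randn(num_layers + 1, T, d)

    X = hidden[:-1].reshape(-1, d)  # states entering each layer
    Y = hidden[1:].reshape(-1, d)   # states leaving each layer
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shared (d, d) depth map

    h = hidden[0]
    for _ in range(num_layers):     # roll the surrogate through depth
        h = h @ A
    print(np.corrcoef(h.ravel(), hidden[-1].ravel())[0, 1])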
Read more
Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness
Arman Bolatov, Samuel Horváth, Martin Takáč, Eduard Gorbunov
Optimization Federated Learning Theory
  • Introduction of Byz-NSGDM, a robust optimization method for distributed systems facing Byzantine attacks.
  • The algorithm operates under the $(L_0, L_1)$-smoothness condition, which is more general than traditional $L$-smoothness.
  • Proven convergence rate of $O(K^{-1/4})$ with a bias floor dependent on robustness and gradient heterogeneity.
  • Empirical validation shows strong performance against various Byzantine attack strategies.
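A hedged sketch of the general normalized-momentum recipe with a robust aggregator (coordinate-wise median as a stand-in; the paper's aggregator and update details may differ):

    import numpy as np

    def byz_nsgdm_step(x, worker_grads, momenta, beta=0.9, lr=0.01):
        """Server step: per-worker momentum, robust aggregation, then a
        *normalized* step -- normalization is what suits (L0, L1)-smoothness."""
        for i, g in enumerate(worker_grads):
            momenta[i] = beta * momenta[i] + (1 - beta) * g
        agg = np.median(np.stack(momenta), axis=0)  # tolerates outlier workers
        return x - lr * agg / (np.linalg.norm(agg) + 1e-12), momenta

    x, momenta = np.zeros(5), [np.zeros(5) for _ in range(4)]
    grads = [np.ones(5)] * 3 + [-100 * np.ones(5)]  # one Byzantine worker
    x, momenta = byz_nsgdm_step(x, grads, momenta)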
Read more
L2GTX: From Local to Global Time Series Explanations
Ephrem Tibebe Mekonnen, Luca Longo, Lucas Rizzo, Pierpaolo Dondio
Time Series
  • L2GTX is a fully model-agnostic method for generating global explanations in time series classification.
  • The method aggregates local explanations from a selective set of instances to create class-wise global insights.
  • L2GTX effectively reduces redundancy in explanations by merging local clusters and constructing an instance-cluster importance matrix.
  • Experimental results indicate that L2GTX maintains high interpretability and global faithfulness across different datasets.
Read more
Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs
Zixin Wen, Yifu Cai, Kyle Lee, Sam Estep, Josh Sunshine, Aarti Singh, Yuejie Chi, Wode Ni
Multimodal Large Language Models Generative Models
  • FEYNMAN effectively decouples knowledge elicitation from visual production, enhancing diagram generation.
  • The agent generated over 100,000 well-aligned diagram-caption pairs at a low cost.
  • A new benchmark, DIAGRAMMA, was created for evaluating visual reasoning in multi-modal models.
  • The use of PENROSE allows for diverse and semantically consistent diagram rendering.
Read more
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models
Jae-Won Chung, Jeff J. Ma, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury
Multimodal
  • Cornserve is the first distributed serving system specifically designed for Any-to-Any multimodal models.
  • It offers a flexible task abstraction for expressing complex computation graphs in Python.
  • The system enables model fission, allowing independent scaling of model components.
  • Cornserve utilizes a record-and-replay execution model for efficient tensor data forwarding.
Read more
Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers
Yiqun Zhang, Zexi Tan, Xiaopeng Luo, Yunlin Liu
Graph Learning
  • Introduces a novel outlier detection paradigm using graph structures.
  • Distinguishes between scatterliers and clusterliers for better anomaly detection.
  • Utilizes hierarchical reference sets for local and global anomaly evaluation.
  • Demonstrates effectiveness through extensive experiments and performance analysis.
Read more
RXNRECer Enables Fine-grained Enzymatic Function Annotation through Active Learning and Protein Language Models
Zhenkun Shi, Jun Zhu, Dehang Wang, BoYu Chen, Qianqian Yuan, Zhitao Mao, Fan Wei, Weining Wu, Xiaoping Liao, Hongwu Ma
NLP Large Language Models Interpretability
  • RXNRECer directly predicts enzyme-catalyzed reactions, bypassing the limitations of EC number reliance.
  • The framework integrates protein language modeling and active learning for enhanced prediction accuracy.
  • Significant performance improvements were observed over traditional EC-based methods.
  • RXNRECer supports scalable annotation and provides interpretable prediction rationales.
Read more
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
Hong Yang, Devroop Kar, Qi Yu, Travis Desell, Alex Ororbia
Computer Vision
  • Identification of Domain-Sensitivity Collapse (DSC) as a critical failure mode in single-domain OOD detection.
  • Introduction of Teacher-Guided Training (TGT) to enhance domain sensitivity during training.
  • Demonstration of significant improvements in OOD detection performance without increasing inference costs.
  • Validation of TGT across multiple benchmarks, showing consistent reductions in false positive rates.
Read more
DirPA: Addressing Prior Shift in Imbalanced Few-shot Crop-type Classification
Joana Reuss, Ekaterina Gikalo, Marco Körner
Computer Vision
  • DirPA method effectively mitigates prior shifts in imbalanced few-shot learning scenarios.
  • The study evaluates DirPA across eight European countries, demonstrating cross-dataset stability.
  • A strong correlation exists between class imbalance and performance improvements with DirPA.
  • DirPA enhances hierarchical classification accuracy, ensuring reliable land-cover identification.
Read more
Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
Xinyan Jiang, Wenjing Yu, Di Wang, Lijie Hu
NLP Large Language Models Theory
  • GER-steer provides a training-free solution for refining activation steering in LLMs.
  • The method utilizes the first principal component of tangent semantic directions to enhance steering robustness.
  • Extensive evaluations show GER-steer outperforms existing baselines across various tasks and models.
  • The framework ensures consistent control without requiring manual tuning or heuristic layer selection.
Read more
Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection
Kadir-Kaan Özer, René Ebeling, Markus Enzweiler
Time Series
  • AxonAD introduces a novel approach to anomaly detection by focusing on predictable query dynamics in multi-head attention mechanisms.
  • The model effectively captures structural dependency shifts in multivariate time series data, addressing limitations of traditional residual-based detectors.
  • A dual scoring mechanism combines reconstruction error with a query mismatch score to enhance sensitivity to anomalies.
  • Extensive evaluations show significant improvements in anomaly detection performance over existing methods.
Read more
Deconstructing the Failure of Ideal Noise Correction: A Three-Pillar Diagnosis
Chen Feng, Zhuo Zhi, Zhao Huang, Jiawei Ge, Ling Xiao, Nicu Sebe, Georgios Tzimiropoulos, Ioannis Patras
Theory
  • The failure of ideal noise correction methods is not solely due to errors in estimating the noise transition matrix T.
  • Controlled experiments with a perfect transition matrix still show performance collapse.
  • A unified analysis links macroscopic, microscopic, and information-theoretic perspectives.
  • The study provides insights into the inherent instabilities of noise correction methods.
Read more
When Drafts Evolve: Speculative Decoding Meets Online Learning
Yu-Yang Qian, Hao-Cong Wu, Yichao Fu, Hao Zhang, Peng Zhao
NLP Large Language Models Efficient ML
  • Introduction of OnlineSPEC framework that combines speculative decoding with online learning.
  • Establishment of a theoretical link between acceleration rates and online learning performance.
  • Development of novel algorithms leveraging interactive feedback for draft model refinement.
  • Demonstrated up to 24% speedup over existing methods while preserving output quality.
Read more
Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages
Vishnu Teja Kunde, Fatemeh Doudi, Mahdi Farahbakhsh, Dileep Kalathil, Krishna Narayanan, Jean-Francois Chamberland
Reinforcement Learning Large Language Models Generative Models
  • Introduces a finite-horizon MDP framework for DLMs to facilitate RL applications.
  • Derives an exact policy gradient that allows for stepwise advantage estimation.
  • Implements entropy-guided step selection to optimize compute allocation during training.
  • Achieves state-of-the-art results on coding and logical reasoning tasks, surpassing existing RL methods for DLMs.
Read more
Causal Matrix Completion under Multiple Treatments via Mixed Synthetic Nearest Neighbors
Minrui Luo, Zhiheng Zhang
Theory Efficient ML
  • Introduction of Mixed Synthetic Nearest Neighbors (MSNN) for causal matrix completion under multiple treatments.
  • MSNN retains the statistical properties of the original SNN method while improving sample efficiency for sparse treatment levels.
  • The method leverages shared latent structures across treatments to enhance causal effect estimation.
  • Empirical results show MSNN's effectiveness in data-scarce environments, outperforming existing methods.
Read more
Deep Distance Measurement Method for Unsupervised Multivariate Time Series Similarity Retrieval
Susumu Naito, Kouta Nakata, Yasunori Taguchi
Time Series
  • DDMM improves retrieval accuracy by focusing on minute differences in multivariate time series data.
  • The method uses a unique weighting system for anchor-positive pairs based on Euclidean distance.
  • Empirical studies show significant performance improvements over existing methods in industrial applications.
  • Combining DDMM with feature extraction methods can lead to further accuracy enhancements.
Read more
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Xingli Fang, Jung-Eun Kim
Theory
  • Privacy vulnerability is concentrated in a small fraction of weights.
  • Critical weights for utility performance overlap with privacy-vulnerable weights.
  • The importance of weights is determined more by their locations than their values.
  • The proposed fine-tuning strategy selectively rewinds only privacy-vulnerable weights.
Read more
When LLM Judge Scores Look Good but Best-of-N Decisions Fail
Eddie Landesberg
NLP Large Language Models Optimization
  • Global metrics can misrepresent the effectiveness of LLM judges in best-of-n selection tasks.
  • A judge with moderate global correlation may perform poorly in actual selection scenarios.
  • Within-prompt ranking is crucial for effective candidate selection, as it differs from global agreement.
  • Explicit pairwise judging significantly improves recovery rates in selection tasks.
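A worked toy example of the gap (all data synthetic): a judge that only tracks prompt difficulty earns a respectable global correlation yet selects almost at random within each prompt.

    import numpy as np

    rng = np.random.default_rng(0)
    P, N = 200, 8                                   # prompts, candidates each
    difficulty = rng.normal(0, 2, (P, 1))           # varies across prompts
    truth = difficulty + rng.normal(0, 1, (P, N))   # true candidate quality
    judge = difficulty + rng.normal(0, 1, (P, N))   # ignores within-prompt signal

    global_r = np.corrcoef(judge.ravel(), truth.ravel())[0, 1]
    picked = truth[np.arange(P), judge.argmax(axis=1)]
    regret = (truth.max(axis=1) - picked).mean()
    print(f"global r ~ {global_r:.2f}, best-of-{N} regret ~ {regret:.2f}")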
Read more
On Linear Separability of the MNIST Handwritten Digits Dataset
Ákos Hajnal
Theory
  • The MNIST dataset remains a critical benchmark for evaluating image classification models.
  • Linear separability is a key concept in machine learning, yet its status for MNIST has been unclear.
  • The paper distinguishes between pairwise and one-vs-rest linear separability in its analysis.
  • The findings may confirm the prevailing belief that MNIST is not linearly separable.
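One standard, one-sided test (dataset fetch and solver settings below are illustrative): zero training error from a near-hard-margin linear classifier proves a digit pair is linearly separable, while failure to reach zero proves nothing.

    import numpy as np
    from sklearn.datasets import fetch_openml
    from sklearn.svm import LinearSVC

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    mask = (y == "3") | (y == "5")            # a look-alike digit pair
    Xp, yp = X[mask] / 255.0, (y[mask] == "3").astype(int)

    clf = LinearSVC(C=1e6, max_iter=100_000)  # large C approximates hard margin
    clf.fit(Xp, yp)
    print(f"training error: {1.0 - clf.score(Xp, yp):.4%}")  # 0 => separable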
Read more