AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 papers today, updated every 8 hours, 7 days of history

Zeus: Towards Tuning-Free Foundation Model for Time Series Analysis
Yisong Fu, Zezhi Shao, Chengqing Yu, Yujie Li, Yongjun Xu, Xueqi Cheng, Fei Wang
Time Series
  • ZEUS is a unified time series foundation model (TSFM) that operates without task-specific fine-tuning.
  • It incorporates a multi-scale Transformer architecture for efficient long-sequence modeling.
  • MOTM allows ZEUS to learn diverse task-specific inductive biases in a single framework.
  • Experimental results show competitive performance across five key time series tasks.
A Filtered Mixture-of-Generators for Fully Synthetic Survival Training
Niccolò Maria Rizzi, Eugenio Lomurno, Alberto Archetti, Matteo Matteucci
Generative Models, Optimization, Time Series
  • FoGS introduces a novel pipeline for synthetic data construction in survival analysis, focusing on sample selection from multiple generators.
  • The method improves downstream performance metrics on the majority of evaluated datasets compared to traditional single-generator approaches.
  • FoGS maintains privacy margins while providing a viable alternative to real-data training in clinical settings.
  • The study identifies key trade-offs in synthetic data selection, emphasizing the importance of balancing plausibility and population coverage.
Fixed-Set Robustness in Programming by Example: Example Corruption and Semantic Partition Recovery
Yuan Si, Jialu Zhang
Theory
  • Introduces the concept of fixed-set worst-case corruption in programming-by-example (PBE) systems.
  • Proposes version-space partition aggregation (VPA) as a defense mechanism.
  • Demonstrates that low-margin PBE tasks are particularly susceptible to adversarial attacks.
  • Shows that VPA can recover from certain corruptions but struggles with low semantic vote margins.
Neural Certificate Pricing for Combinatorial Optimization Problems
Jingyi Chen, Xinyuan Zhang, Xinwu Qian
Optimization, Theory, Graph Learning
  • NCP transforms the certification process into a learnable optimization pipeline.
  • The framework separates learned price signals from structural certificates for effective recovery.
  • Local stability results indicate robustness of the recovery process against price prediction errors.
  • NCP outperforms state-of-the-art methods in various combinatorial optimization (CO) problem classes.
Predicting Early Stages Of Alzheimer's Disease And Identifying Key Biomarkers Using Deep Artificial Neural Network And Ensemble Of Machine Learning Methodologies
Debopriya Ghosh
Theory
  • Developed an automatic diagnostic system for early-stage Alzheimer's Disease.
  • Addressed data challenges including missing values and class imbalance.
  • Utilized advanced feature selection techniques to identify significant biomarkers.
  • Implemented both ensemble and deep learning models for comparative analysis.
SA-HGNN: Sample-Adaptive Hyperbolic Graph Neural Network for EEG-Based Depression Recognition
Yang Li, Pan Hu, Yan Zhang, Wenfan Yang, Tao Wu, Lianbo Guo
Graph Learning
  • SA-HGNN introduces a Sample-Adaptive Graph Construction module for personalized brain network topologies.
  • Utilizes hyperbolic graph convolution to effectively capture hierarchical relationships in brain connectivity.
  • Incorporates an Attention Pooling module to mitigate noise interference in EEG signals.
  • Demonstrates superior performance over traditional GNNs in EEG-based depression recognition tasks.
Learning Generalizable Skill Policy with Data-Efficient Unsupervised RL
Jongchan Park, Seungjun Oh, Seungho Baek, Yusung Kim
Reinforcement Learning, Robotics, Efficient ML
  • Introduces the GenDa framework to improve data efficiency and generalizability in unsupervised RL.
  • Uses a skill relabeling mechanism to address non-stationary skill semantics and improve pre-training efficiency.
  • Adds a Complementary Information Bottleneck (CIB) to ensure robustness against distribution shifts.
  • Demonstrates superior performance on diverse benchmarks compared to state-of-the-art methods.
ZO-Act: Efficient Zeroth-Order Fine-Tuning via One-Shot Activation-Informed Low-Rank Subspaces
Xun Dong, Yibo Xu, Naigang Wang, Xin Li, Penghang Yin, Zi Yang
NLP, Large Language Models, Optimization
  • ZO-Act utilizes activation-informed low-rank subspaces for efficient fine-tuning of large language models.
  • The method reduces perturbation dimensions, leading to lower variance in gradient estimation.
  • It supports momentum-based optimizers and quantized model fine-tuning by freezing original weights.
  • Experiments show ZO-Act outperforms strong ZO fine-tuning baselines across multiple tasks.
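The link between fewer perturbation dimensions and a cheaper estimate can be illustrated with a generic two-point zeroth-order estimator whose perturbation is confined to an r-dimensional subspace. This is a minimal sketch and not the ZO-Act method itself: the random orthonormal `basis` is only a stand-in for an activation-informed subspace, and `zo_subspace_grad`, the quadratic `loss`, and all sizes are hypothetical.

```python
import numpy as np


def zo_subspace_grad(loss, theta, basis, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate restricted to span(basis).

    basis: (d, r) matrix with orthonormal columns. Here it is random,
    a hypothetical stand-in for an activation-informed subspace.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(basis.shape[1])   # perturb only r coordinates
    direction = basis @ z                     # lift back to d dimensions
    # Central finite difference along the sampled direction (no backprop).
    slope = (loss(theta + eps * direction) - loss(theta - eps * direction)) / (2 * eps)
    return slope * direction


def loss(w):
    """Toy quadratic objective used only for illustration."""
    return 0.5 * np.sum((w - target) ** 2)


rng = np.random.default_rng(0)
d, r = 50, 4
basis, _ = np.linalg.qr(rng.standard_normal((d, r)))  # random orthonormal basis
target = rng.standard_normal(d)

theta = np.zeros(d)
grad_est = zo_subspace_grad(loss, theta, basis, rng=rng)
```

Only `r` random numbers are drawn per step instead of `d`, and the estimate always lies in the chosen subspace, so its quality depends on how well that subspace captures the true gradient.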
Self-Gating Attention for Efficient Time Series Forecasting
Dezheng Wang, Tong Chen, Wei Yuan, Congyan Chen, Shihua Li, Hongzhi Yin
Time Series, Efficient ML
  • Introduces Self-Gating Attention (SGA) to improve efficiency in time series forecasting.
  • Reduces computational complexity from quadratic to linear with respect to look-back length.
  • Utilizes a shared attention matrix for common patterns and a residual component for input-specific variations.
  • Demonstrates competitive performance against state-of-the-art attention mechanisms across multiple datasets.
DecompRL: Solving Harder Problems by Learning Modular Code Generation
Juliette Decugis, Fabian Gloeckle, Francis Bach, Taco Cohen, Gabriel Synnaeve
Reinforcement Learning, Large Language Models, Generative Models
  • DecompRL decomposes complex problems into smaller, independently solvable modules.
  • The framework significantly reduces GPU costs by shifting the computational burden to CPU evaluations.
  • DecompRL outperforms traditional RL methods and achieves higher success rates on challenging benchmarks.
  • The approach enhances exploration and maximizes the utility of recombined solutions.
Do LLMs Truly Generalize in the Molecular Domain? A Perturbation-Based Analysis
Jiatong Li, Weida Wang, Changmeng Zheng, Shufei Zhang, Yatao Bian, Xiao-yong Wei, Qing Li
Large Language Models, Graph Learning
  • LLMs exhibit limited generalization in the molecular domain, with performance sensitive to small structural changes.
  • The Molecular Perturbation framework allows for systematic evaluation of model robustness through controlled structural edits.
  • In-Context Tuning (ICT) can enhance model stability by anchoring predictions to structurally similar molecules.
  • The study highlights the disconnect between probabilistic modeling in LLMs and the rigid topological constraints of chemical structures.
Unveiling the Non-Monotonic Effect of Privacy on Generalization under Byzantine Robustness
Thomas Boudou, Batiste Le Bars, Nirupam Gupta, Aurélien Bellet
Federated Learning, Theory, Optimization
  • The privacy-robustness-optimization trilemma does not extend to generalization error.
  • In high-noise regimes, increasing privacy improves generalization performance.
  • In low-noise regimes, increased privacy can lead to worse generalization due to the influence of Byzantine participants.
  • The effectiveness of membership inference attacks is critical in determining generalization behavior.
Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling
Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan
NLP, Large Language Models, Reinforcement Learning
  • Identifies 'dimensional blind spots' as a critical failure in single-voiced rubric generation.
  • Introduces Multi-Role Rubric Generation (MRRG) to aggregate diverse evaluative perspectives.
  • Demonstrates that MRRG outperforms existing single-role rubric generation methods.
  • Provides a unified scoring interface applicable to both LLM evaluation and reinforcement learning with verifiable rewards (RLVR).
QFedAgent: Quantum-Enhanced Personalized Federated Learning for Multi-Agent Activity Recognition
Quoc Bao Phan, Tuy Tan Nguyen
Federated Learning, Multimodal, Robotics
  • Introduction of QFedAgent, a quantum-enhanced personalized federated learning framework.
  • Utilization of variational quantum circuits for efficient multimodal data fusion.
  • Achieved a 10× reduction in parameters compared to classical fusion methods.
  • Demonstrated high accuracy (97.7%) on the OPPORTUNITY dataset under non-IID conditions.
WARP: Weight-Space Analysis for Recovering Training Data Portfolios
Tzu-Heng Huang, Aditya Goyal, John Cooper, Frederic Sala
NLP, Large Language Models, Interpretability
  • WARP recovers domain mixtures from fine-tuned model weights, addressing the access asymmetry in AI research.
  • The framework generates pseudo-checkpoints through model merging, allowing for the estimation of training data distributions.
  • WARP outperforms traditional membership inference methods and variants with access to true training trajectories.
  • The method remains robust across different training recipes, including overtraining scenarios.
TiRex-2: Generalizing TiRex to Multivariate Data and Streaming
Patrick Podest, Marco Pichler, Elias Bürger, Levente Zólyomi, Bernhard Voggenberger, Wilhelm Berghammer, Daniel Klotz, Sebastian Böck, Günter Klambauer, Sepp Hochreiter
Time Series
  • TiRex-2 generalizes the original TiRex model to multivariate time series forecasting.
  • The model allows for streaming inference with constant computational costs per time step.
  • It incorporates both past and future covariates while preserving causality.
  • A synthetic coupling pipeline is introduced for scalable multivariate pretraining.
GAIA: Geometry-Adaptive Operator Learning for Forward and Inverse Problems
Meenakshi Krishnan, Pranav Pulijala, Ke Chen, Haizhao Yang, Ramani Duraiswami
Optimization, Theory, Efficient ML
  • GAIA provides a unified framework for solving both forward and inverse problems on arbitrary geometries without retraining.
  • The model utilizes a dual-pathway tokenization to explicitly encode geometric information, enhancing adaptability to varying geometries.
  • GAIA sets new state-of-the-art results on multiple benchmarks, significantly reducing error rates in inverse problem tasks.
  • The approach maintains competitive performance on forward problems while ensuring stable accuracy across varying resolutions.
Geometry-Aware R-Structured Kolmogorov-Arnold Networks
Sergei Kucherenko, Nilay Shah
Theory, Interpretability, Efficient ML
  • Introduction of GRS-KAN, which integrates R-functions into Kolmogorov-Arnold Networks (KANs) for enhanced interpretability and accuracy.
  • Explicit analytical representation of geometric constraints improves predictive performance on regression tasks.
  • Demonstrated up to a 67% reduction in test RMSE compared with traditional KANs.
  • Agnostic variant can automatically determine the relevance of geometric priors for learning tasks.
Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability
Rodrigo Mendoza-Smith
Interpretability, Efficient ML, Large Language Models
  • Introduction of Expander SAEs, a parameter-efficient architecture for sparse coding.
  • Demonstrated a significant reduction in learned decoder values while maintaining high reconstruction fidelity.
  • Proposed a parallel implementation of orthogonal matching pursuit (OMP) that optimizes inference speed and fidelity.
  • Provided theoretical guarantees for identifiability of sparse codes under specific conditions.
EVOTS: Evolutionary Transformer Search for Time Series Forecasting
AbdElRahman ElSaid, Damir Pulatov
Time Series, Optimization
  • Introduction of EVOTS, a modular evolutionary architecture search framework for time-series forecasting.
  • Demonstration of superior performance of evolved architectures over hand-designed Transformer variants.
  • Effective exploration of diverse architecture space without fixed design constraints.
  • Strong performance gains in long-horizon and multivariate forecasting settings.
Learning the Supports for Categorical Critic in Reinforcement Learning
Jen-Yen Chang, Takayuki Osa, Tatsuya Harada
Reinforcement Learning, Optimization, Theory
  • Introduces a dynamic support learning method that eliminates the need for pre-defined support intervals in value function estimation.
  • Demonstrates that the mean-squared Bellman error is upper-bounded by the HL-Gauss loss, motivating the need for tighter support intervals.
  • Formulates the dynamic support learning as a constrained optimization problem, allowing for automatic adaptation of supports.
  • Empirical results show that the proposed method matches or improves upon existing HL-Gauss-based algorithms in continuous-control tasks.
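For context on why the support interval matters, the sketch below builds an HL-Gauss-style categorical target as it is commonly described: a scalar value is spread over fixed histogram bins by integrating a Gaussian centered on it. This illustrates the hand-picked support that the paper aims to replace and is not the paper's learned-support method; `hl_gauss_target`, the bin count, and the `sigma` ratio are assumptions.

```python
import math


def hl_gauss_target(value, edges, sigma):
    """Spread a scalar target over histogram bins using a Gaussian CDF."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - value) / (sigma * math.sqrt(2.0))))

    cdfs = [cdf(e) for e in edges]
    mass = [hi - lo for lo, hi in zip(cdfs[:-1], cdfs[1:])]
    total = cdfs[-1] - cdfs[0]          # probability mass inside the support
    return [m / total for m in mass]    # renormalize to sum to 1


v_min, v_max, n_bins = -1.0, 1.0, 20    # fixed support chosen by hand (assumption)
width = (v_max - v_min) / n_bins
edges = [v_min + i * width for i in range(n_bins + 1)]
centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

probs = hl_gauss_target(0.3, edges, sigma=0.75 * width)
# The scalar prediction is read back as the expected bin center.
decoded = sum(p * c for p, c in zip(probs, centers))
```

If `v_min` and `v_max` are set too wide, most bins receive almost no mass; if they are too narrow, targets near or beyond the edges are represented poorly. That sensitivity is the kind of issue a learned support is meant to avoid.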
Muon as a Residual Connection
Hao Huang
Optimization, Theory
  • Muon can be interpreted as an implicit residual connection, enhancing representation preservation.
  • Orthogonalizing updates sacrifices immediate gradient fidelity for better downstream usability.
  • The paper provides a mechanistic explanation of Muon's effectiveness, accessible to a broader audience.
  • The findings suggest new avenues for optimizer design that balance local and downstream performance.
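As a rough illustration of what "orthogonalizing updates" means, the sketch below replaces an update matrix with its nearest semi-orthogonal matrix, so every singular value becomes 1. It uses an exact SVD for clarity, whereas Muon is usually described as approximating this step with a Newton-Schulz iteration; the `orthogonalize` helper and the random `momentum` matrix are hypothetical.

```python
import numpy as np


def orthogonalize(update):
    """Return the polar factor of `update` (nearest semi-orthogonal matrix)."""
    u, _, vt = np.linalg.svd(update, full_matrices=False)
    return u @ vt    # keep the singular vectors, set all singular values to 1


rng = np.random.default_rng(0)
momentum = rng.standard_normal((6, 4))    # stand-in for a momentum buffer
step = orthogonalize(momentum)
singular_values = np.linalg.svd(step, compute_uv=False)
```

The resulting `step` keeps the directions present in the momentum but discards their relative magnitudes, which is one way to read the trade-off between immediate gradient fidelity and downstream usability.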
SemiScope: Disentangling Classifier Tuning and Joint Optimization in Semi-Supervised Security Classification
Rui Shu, Tianpei Xia, Jingzhu He
Optimization, Theory, Efficient ML
  • SemiScope effectively disentangles the contributions of semi-supervised learning (SSL) optimization and classifier tuning.
  • Significant performance improvements were observed with SemiScope compared to default SSL methods.
  • Classifier hyperparameter optimization alone accounts for a substantial portion of the gains from the joint pipeline.
  • A simpler approach using Self-Training and classifier tuning can achieve similar results with less complexity.
I2RiMA: Spectral Riemannian Representation with Temporal Attention for Mental Stress Detection based on EEG Signals
Cheng He, Kunyu Peng, Shangen Han, Jinming Ma, Jinhong Ding, Likun Xia
Time Series
  • I2RiMA constructs frequency-specific spatial covariance matrices and maps them to the tangent space of symmetric positive definite (SPD) matrices.
  • The model employs frequency cluster aggregation for effective feature selection and redundancy reduction.
  • An intra-inter slice attention module captures both local and global temporal dependencies in EEG data.
  • I2RiMA achieves state-of-the-art performance in cross-subject EEG stress detection.
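The covariance-to-tangent-space step in the first bullet can be sketched with a standard construction: take the matrix logarithm of a covariance matrix and flatten its upper triangle. This is a generic illustration under assumptions, not the I2RiMA pipeline: it maps at the identity reference point, uses synthetic data instead of band-filtered EEG, and the names `spd_log_map` and `tangent_vector` are hypothetical.

```python
import numpy as np


def spd_log_map(cov):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def tangent_vector(cov):
    """Flatten the upper triangle of log(cov) into a feature vector."""
    log_cov = spd_log_map(cov)
    rows, cols = np.triu_indices_from(log_cov)
    # Off-diagonal entries are scaled by sqrt(2) to preserve the Frobenius norm.
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return scale * log_cov[rows, cols]


rng = np.random.default_rng(0)
signal = rng.standard_normal((8, 500))     # 8 channels, 500 samples (synthetic)
cov = signal @ signal.T / signal.shape[1]  # spatial covariance matrix
features = tangent_vector(cov)
```

After this mapping the features live in an ordinary vector space, so standard layers such as attention can operate on them without violating the SPD constraint.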
Decomposer: Learning to Decompile Symbolic Music to Programs
Yewon Kim, Apurva Gandhi, David Chung, Graham Neubig, Chris Donahue
Generative Models, Reinforcement Learning, Audio & Speech
  • DECOMPOSER addresses the challenge of converting MIDI to Strudel code, enhancing the readability and editability of musical programs.
  • The framework utilizes a two-stage approach combining supervised fine-tuning and reinforcement learning to optimize both faithfulness and readability.
  • A synthetic corpus, STRUDEL-SYNTH, is created to facilitate the training process, addressing the lack of naturally paired data.
  • Experimental results show DECOMPOSER achieves superior performance compared to existing LLMs and heuristic converters.
QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
Michael Y. Li, Anthony Zhan, Kanishk Gandhi, Noah D. Goodman, Emily B. Fox
NLP, Large Language Models, Reinforcement Learning, Efficient ML
  • QuasiMoTTo improves sample efficiency by generating correlated samples instead of independent ones.
  • The method utilizes quasi-Monte Carlo techniques to ensure better coverage of the output space.
  • Empirical results show that QuasiMoTTo can achieve similar accuracy with significantly fewer samples.
  • The approach is applicable to both language model inference and reinforcement learning.
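The coverage idea behind quasi-Monte Carlo can be seen in one dimension with the van der Corput sequence, a classic low-discrepancy construction. This sketch only illustrates quasi-Monte Carlo in general; how QuasiMoTTo maps such points onto language model samples is not described in the summary, and the integrand below is an arbitrary example.

```python
def van_der_corput(index, base=2):
    """Return the index-th point of the base-b van der Corput sequence."""
    point, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        point += digit / denom    # mirror the digits around the radix point
    return point


points = [van_der_corput(i) for i in range(1, 9)]
# Estimate the integral of x**2 over [0, 1] (exact value 1/3).
estimate = sum(x * x for x in points) / len(points)
```

Each new point falls into the largest remaining gap, so the points are deliberately correlated rather than independent, and the unit interval is covered evenly even with few samples.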
DemoPSD: Disagreement-Modulated Policy Self-Distillation
Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song
NLP, Large Language Models, Reinforcement Learning
  • DemoPSD selectively adopts teacher guidance based on distribution consistency.
  • The framework mitigates privileged information leakage and preserves exploration.
  • DemoPSD outperforms existing methods like GRPO and SDPO in experiments.
  • The approach balances learning from the teacher with the student's reasoning.
When Context Compensates for Sparse Event History: AlphaEarth for Spatio-Temporal Point-Process Forecasting
Yahya Aalaila, Mouad Elhamdi, Gerrit Großmann, Daniel Jenson, Elizaveta Semenova, Sebastian Vollmer
Time Series
  • AlphaEarth embeddings provide a standardized method for incorporating spatial context into forecasting models.
  • The integration of contextual information significantly improves predictive performance in spatio-temporal point-process models, especially when local event histories are sparse.
  • The study demonstrates that the benefits of using external spatial context diminish as more event history is accumulated, but remain positive even with longer histories.
  • The research emphasizes the need for models to leverage both event history and contextual information for better forecasting accuracy.
Efficient Temporal Point Processes via Monotone Alternating Splines
Cheng Wan, Quyu Kong, Feng Zhou
Time Series, Efficient ML, Theory
  • Identifies fundamental limitations of Monotone Neural Networks in CCIF modeling.
  • Proposes Monotone Alternating Splines (MAS) to enhance flexibility and efficiency.
  • Establishes a theoretical foundation for MAS, including generalization error analysis.
  • Demonstrates superior performance of MAS on synthetic and real-world datasets.
StateFlow: Dual-State Recurrent Modeling for Long-Horizon Time Series Forecasting
Haroon Gharwi, Yue Dai, Kai Shu
Time Series
  • Introduces StateFlow, a dual-state recurrent framework for long-horizon time series forecasting.
  • Extends VARNN to capture both primary temporal dynamics and structured local prediction deviations.
  • Employs a two-stage optimization strategy to enhance forecasting stability and performance.
  • Achieves competitive results against linear, recurrent, convolutional, and Transformer-based models.
PRISM: Prioritized Channel Importance with Semi-supervised Domain Adaptation for Cross-Subject EEG Emotion Recognition
Xin Zhou, Xiang Zhang, Hao Deng, Lijun Yin
Time Series
  • PRISM utilizes a lightweight expert ensemble for adaptive channel prioritization in EEG emotion recognition.
  • The framework integrates semi-supervised domain adaptation to enhance cross-subject generalization under limited labels.
  • PRISM achieves superior performance on benchmark datasets compared to existing state-of-the-art methods.
  • The model is designed to be plug-and-play, allowing easy integration with existing architectures.
From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training
Jinwen Wang, Youfang Lin, Xiaobo Hu, Siyu Yang, Sheng Han, Shuo Wang, Kai Lv
Reinforcement Learning, Computer Vision, Robotics
  • Introduction of a temporal correlation space for better representation learning in RL.
  • Development of Multi-scale Temporal Contrastive Learning (MTCL) to model temporal correlations.
  • Balanced attention to different elements in videos enhances representation quality.
  • Extensive experiments show significant improvements in sample efficiency and performance.
Automatic Detection of Stress from Speech in the Trier Social Stress Test
Hanna Drimalla, Wieland R. Cremer, Christine Kraus, Oliver T. Wolf
Audio & Speech
  • Automatic speech analysis can effectively differentiate between stressed and non-stressed speech.
  • Physiological stress responses can be predicted from acoustic-prosodic features of speech.
  • The study utilized a between-subject design to enhance the reliability of stress detection.
  • Feature importance analysis identified key predictors for stress detection performance.
Model Merging as Probabilistic Inference in Fine-Tuning Parameter Space
Long Minh Bui, Tuan Anh Le Van, Tung Phi Duc, Phi Le Nguyen, Jana Doppa, Trong Nghia Hoang
Optimization, Theory, Efficient ML
  • Introduces a probabilistic framework for model merging that improves upon traditional geometric methods.
  • Models each task-specific solution as an energy-based expert, allowing for better aggregation of update directions.
  • Addresses the limitations of Gaussian assumptions in existing methods by employing a heavy-tailed PoE design.
  • Demonstrates significant performance improvements over state-of-the-art merging techniques in empirical tests.
Fourier Neural Operators for Rayleigh-Bénard Convection
Chelsea Maria John, Thibaut Lunet, Sebastian Götschel, Andreas Herten, Stefan Kesselheim, Daniel Ruprecht
Theory, Efficient ML, Time Series
  • Introduction of a lean Fourier Neural Operator (FNO) architecture for predicting time increments in Rayleigh-Bénard convection (RBC).
  • Achieved higher accuracy than standard FNOs while maintaining a compact model size.
  • Demonstrated the model's ability to generalize across spatial and temporal resolutions.
  • Ablation study indicates that stacked 1D convolutional layers enhance performance.
A Lightweight Self-Supervised Learning Framework for Multivariate Time Series using Hierarchical-JEPA on ECG Data
Siwon Kim
Time Series, Efficient ML
  • Introduction of ER-JEPA, a hierarchical self-supervised learning (SSL) framework for ECG data analysis.
  • Two-stage structure allows for efficient multichannel and temporal analysis.
  • Achieves state-of-the-art performance on ECG benchmarks with minimal resource usage.
  • Demonstrates the effectiveness of hierarchical representation learning without representation collapse.
AdaBoosting Text Prompts for Vision-Language Models
Seokhee Jin, Changhwan Sung, Sunung Mun, Hoyoung Kim, Jungseul Ok
Multimodal, Computer Vision, NLP
  • TPB combines AdaBoost principles with natural-language text prompts for enhanced few-shot learning.
  • The framework explicitly focuses on misclassified examples to improve prompt quality and classification accuracy.
  • TPB demonstrates superior shot scalability and cross-model transferability compared to existing methods.
  • Extensive evaluations on multiple benchmarks confirm TPB's effectiveness across various vision-language model (VLM) architectures.
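The focus on misclassified examples follows the classic AdaBoost re-weighting rule, sketched below for the binary case. This is textbook AdaBoost rather than TPB itself: treating each text prompt as a weak learner is an assumption made for illustration, and `adaboost_reweight` and the toy `correct` mask are hypothetical.

```python
import math


def adaboost_reweight(weights, correct):
    """One AdaBoost round: return (alpha, new_weights) for a weak learner.

    weights: current example weights summing to 1.
    correct: booleans, True where the weak learner was right.
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)   # learner's vote strength
    # Shrink weights of correct examples, grow weights of mistakes.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [s / total for s in scaled]


weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]     # hypothetical prompt misses example 4
alpha, new_weights = adaboost_reweight(weights, correct)
```

After the update, the misclassified example carries half of the total weight, so the next weak learner (here, the next prompt) is pushed to get it right.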
Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space
Di Wu, Huan Liu, Zhixiang Chi, Yuanhao Yu, Konstantinos N. Plataniotis, Yang Wang
Graph Learning, Optimization, Theory
  • Introduction of dynamic neural graphs for modeling neural network parameters.
  • Development of the Dynamic Neural Graph Encoder (DNG-Encoder) to process these dynamic graphs.
  • Creation of INR2JLS for mapping implicit neural representation (INR) weights into a joint latent space.
  • Demonstration of significant improvements in INR classification accuracy on CIFAR datasets.
EPC: A Standardized Protocol for Measuring Evaluator Preference Dynamics in LLM Agent Systems
Zewen Liu
Large Language Models, Reinforcement Learning, NLP
  • Introduction of EPC, a standardized protocol for measuring evaluator preference dynamics.
  • Establishment of a four-phase isolation paradigm for systematic evaluation.
  • Provision of a versioned Reference Snapshot for reproducibility and comparison.
  • Focus on community governance and versioning to maintain measurement validity.
Adaptive Group-Based Counterfactual Explanations for Time-Series Rehabilitation Data
Emmanuel C. Chukwu, Rianne M. Schouten, Monique Tabak, Mykola Pechenizkiy
Time Series, Interpretability, Optimization
  • Introduces a two-stage framework for generating group-based counterfactual explanations in rehabilitation data.
  • Implements a Learnable Gate mechanism to optimize sensor group relevance and enhance interpretability.
  • Demonstrates improved modality-group sparsity and validity over traditional channel-level methods.
  • Validates the approach using the KneE-PAD dataset, showing clinically meaningful corrective feedback.
Flow-Map GRPO: Reinforcement Learning for Few-Step Flow-Map Generators via Anchored Stochastic Composition
Zhiqi Li, Wen Zhang, Bo Zhu
Reinforcement Learning, Generative Models, Computer Vision
  • Introduces Flow-Map GRPO, a framework for optimizing deterministic few-step flow-map generators using RL.
  • Proposes Anchored Stochastic Flow Map Composition (ASFMC) to introduce stochasticity while preserving the original probability path.
  • Demonstrates that existing SDE-based stochasticization techniques are not applicable to long-range flow maps.
  • Empirical results show improvements in performance metrics for text-to-image generation tasks.
CausalMix: Data Mixture as Causal Inference for Language Model Training
Zinan Tang, Yukun Zhang, Shaomian Zheng, Zhuoshi Pan, Qizhi Pei, Dingnan Jin, Jun Zhou, Yujun Wang, Biqing Huang
NLP, Large Language Models, Optimization
  • CAUSALMIX optimizes data mixtures by framing mixture selection as a causal inference problem.
  • The framework allows for dynamic adjustment of mixture weights based on the current data state.
  • Extensive experiments show significant performance improvements over traditional methods.
  • CAUSALMIX provides interpretability through the analysis of Conditional Average Treatment Effects.
Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
Gil Harari, Yoel Zimmermann, Ola Tangen Kulseng, Laura Zichi, Chuin Wei Tan, Marc L. Descoteaux, Boris Kozinsky
Optimization, Efficient ML
  • SOAP and SOAP-Muon optimizers outperform Adam in training machine learning interatomic potentials (MLIPs), showing faster convergence and higher accuracy.
  • These optimizers maintain strong performance even with reduced force supervision, indicating potential for label-efficient training.
  • SOAP-Muon achieves robust results, particularly in scenarios where force labels are expensive or limited.
  • The resulting MLIPs demonstrate physical fidelity, accurately reproducing ab initio calculations and experimental data.
Interpretable vs Learned Encoders for High-Cardinality Fraud Detection
Xiao Han, Jingjing Liu, Moxuan Zheng, Zhen Zhang, Chenyu Wu
Interpretability
  • Entity embeddings provided the highest AUC-ROC score, indicating their effectiveness in high-cardinality fraud detection.
  • The study operationalized auditor-readable tier grouping, demonstrating its competitive performance against learned encodings.
  • Controlled comparisons across different encoders highlight the importance of isolating encoding methods from model architectures.
  • Interpretability and computational efficiency are critical factors in selecting encoding methods for fraud detection in regulated environments.
EPnG: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
Ahin Lee, Sehyun Yun, Taesik Gong
NLP, Large Language Models, Efficient ML
  • EPnG optimizes parameter-efficient fine-tuning for Mixture-of-Experts models by reallocating resources based on expert importance.
  • The prune-and-grow mechanism allows for dynamic adjustment of expert utilization while maintaining a fixed parameter budget.
  • EPnG achieves performance comparable to full fine-tuning while updating significantly fewer parameters (0.55%–0.72%).
  • The framework addresses the inefficiencies of existing PEFT methods that do not consider MoE routing dynamics.
Diffeomorphic Optimization
Ludwig Winkler, Andrew Leaver-Fay, Joseph Kleinhenz, Pan Kessel
Optimization, Generative Models
  • Diffeomorphic optimization enables smoother optimization on low-dimensional manifolds by utilizing diffusion and flow models.
  • The method maintains on-manifold trajectories, reducing the risk of drifting into out-of-distribution solutions.
  • It extends to matrix Lie groups, facilitating efficient backpropagation for complex protein structures.
  • Diffeomorphic optimization outperforms existing techniques in protein design tasks, achieving better results in less time.
Balancing Expressivity and Learnability in Quantum Kernel Bandit Optimization
Yuqi Huang, Vincent Y. F. Tan, Sharu Theresa Jose
Optimization, Theory, Efficient ML
  • Identifies the expressivity of quantum kernels as a fundamental learnability barrier in Gaussian process (GP) bandit optimization.
  • Proposes new algorithms that utilize lower-dimensional quantum subspaces and classical approximations to reduce model complexity.
  • Derives regret bounds that quantify the trade-off between information gain and kernel misspecification.
  • Empirical results show improved sample efficiency and reduced computational overhead compared to full quantum kernels.
GSRQ: Gain-Shape Residual Quantization for Sub-1-bit KV Cache
Soosung Kim, Minjae Park, Eui-Young Chung, Jaeyong Chung
NLP, Large Language Models, Efficient ML
  • Introduces Gain-Shape K-means (GSKM) to address centroid shrinkage in high-dimensional vector quantization.
  • Develops Gain-Shape Residual Quantization (GSRQ) for efficient KV cache compression in LLMs.
  • Demonstrates substantial improvements in accuracy over existing quantization baselines, particularly at 1-bit quantization.
  • Highlights the importance of directional preservation in high-dimensional quantization tasks.
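One plausible reading of "centroid shrinkage" is that averaging high-dimensional vectors pointing in different directions gives a centroid much shorter than any member, so the centroid's length no longer reflects the data. The sketch below shows that effect and the gain-shape remedy of storing norm and direction separately. It is an illustration under assumptions, not the GSKM or GSRQ algorithm: the random `vectors` stand in for cached keys and `gain_shape_split` is a hypothetical helper.

```python
import numpy as np


def gain_shape_split(x):
    """Split a vector into its norm (gain) and unit direction (shape)."""
    gain = np.linalg.norm(x)
    return gain, x / gain


rng = np.random.default_rng(0)
vectors = rng.standard_normal((256, 64))          # stand-in for cached keys
shapes = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

plain_centroid = shapes.mean(axis=0)              # ordinary k-means centroid
plain_norm = np.linalg.norm(plain_centroid)       # far shorter than its members
shape_centroid = plain_centroid / plain_norm      # renormalized direction only
```

Keeping the centroid on the unit sphere and quantizing the gain as a separate scalar preserves direction, which matches the summary's emphasis on directional preservation.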